A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time. Which solution will run the Glue jobs in the MOST cost-effective way?
Answer(s): A
A) The FLEX execution class is the most cost-effective choice for jobs with flexible start and finish times. Flex jobs run on spare compute capacity at a lower rate per DPU-hour, in exchange for start times and runtimes that can vary.
B) Spot Instances are not a configuration option for AWS Glue jobs; Glue is serverless and exposes execution classes, not EC2 purchasing options.
C) The STANDARD execution class is billed at the regular rate and is meant for time-sensitive jobs, so it costs more than necessary when timing does not matter.
D) GlueVersion selects the runtime and its features; it is not a cost-optimization setting, and a newer version does not by itself reduce cost for flexible schedules.
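As a rough sketch of option A, the Flex execution class can be requested when the job is defined. The job name, role ARN, and script path below are hypothetical placeholders. The keys follow the boto3 Glue `create_job` parameters, and the sketch assumes a Spark ETL job on Glue 3.0 or later with G.1X workers.

```python
def flex_job_definition(name: str, role_arn: str, script_s3_path: str) -> dict:
    """Build keyword arguments for glue.create_job using the Flex execution class."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumption: any version that supports Flex
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "ExecutionClass": "FLEX",  # run on spare capacity instead of STANDARD
    }


# Hypothetical values, for illustration only.
job_kwargs = flex_job_definition(
    name="daily-sales-etl",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    script_s3_path="s3://example-bucket/scripts/daily_sales.py",
)

# To create the job (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**job_kwargs)
```

The execution class can also be passed per run through `start_job_run`, which is useful when only some runs are time-insensitive.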
A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?
A is correct because S3 event notifications can filter for object creation events with a .csv suffix and directly invoke the Lambda function, minimizing components and operational overhead. B is incorrect because tag-based triggers require tagging policy and do not guarantee the file is a CSV, adding complexity. C is incorrect because s3:* is overly broad and not needed; it would generate excessive events and complicate processing. D is incorrect because using SNS introduces an additional service and subscription step, increasing latency and maintenance versus direct Lambda invocation from S3 events.
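The suffix filter described for option A could look roughly like the following. The Lambda ARN and bucket name are hypothetical; the structure follows the boto3 `put_bucket_notification_configuration` request shape.

```python
def csv_upload_notification(lambda_arn: str) -> dict:
    """Build an S3 notification configuration that fires only for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # any kind of object creation
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


# Hypothetical ARN, for illustration only.
config = csv_upload_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"
)

# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-upload-bucket", NotificationConfiguration=config
#   )
```

The function also needs a resource-based permission that lets Amazon S3 invoke it; that step is omitted here.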
A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column. Which solution will MOST speed up the Athena query performance?
Answer(s): C
Athena runs faster on compressed, columnar formats. Parquet with Snappy compression lets Athena read only the columns a query selects (column pruning), which suits queries that select a specific column.
A) JSON is not columnar, so Athena still reads every field of every record, even with Snappy compression.
B) Snappy compression on .csv helps slightly but does not enable column pruning, so it reduces I/O far less than Parquet.
C) Parquet is columnar and Snappy-compressed, so Athena scans only the selected column and reads substantially less data, which speeds up the queries.
D) gzip-compressed .csv is still row-based, and gzip files cannot be split for parallel reads, so the gain is smaller than with Parquet.
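One way to produce the Parquet copy is an Athena CTAS statement. The sketch below only builds the SQL text; the table names and S3 location are hypothetical, and `write_compression` is assumed to be available on the Athena engine version in use.

```python
def ctas_to_parquet(source_table: str, target_table: str, s3_location: str) -> str:
    """Return an Athena CTAS statement that rewrites a table as Snappy-compressed Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{s3_location}'\n"
        ") AS\n"
        f"SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
sql = ctas_to_parquet(
    source_table="sales_csv",
    target_table="sales_parquet",
    s3_location="s3://example-bucket/sales_parquet/",
)
print(sql)
```

Queries then target the new table, so a `SELECT` of one column reads only that column's data.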
A manufacturing company collects sensor data from its factory floor to monitor and enhance operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon S3 bucket. The company needs to display a real-time view of operational efficiency on a large screen in the manufacturing facility. Which solution will meet these requirements with the LOWEST latency?
A) Amazon Managed Service for Apache Flink reads directly from Kinesis Data Streams and processes records as they arrive, Amazon Timestream stores the results as time series, and a Grafana dashboard visualizes them in near real time. This path has the lowest latency.
B) A Lambda function triggered by S3 events sees data only after Firehose has buffered and delivered it as objects, which adds latency; Aurora with QuickSight is batch-oriented rather than real time.
C) Flink is suitable, but routing its output to Timestream through a dedicated Firehose stream adds hops and buffering delay compared with inline Flink processing and Grafana.
D) AWS Glue bookmarks are a batch mechanism and are not suited to real-time dashboards; Grafana over Timestream would work, but the end-to-end path is slower than A.
A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data. The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog. Which solution will meet these requirements?
Answer(s): B
Option B is correct. It uses an IAM role with the AWSGlueServiceRole policy, which is the appropriate role for Glue crawlers and keeps permissions close to least privilege. Specifying the S3 source, a daily crawl schedule, and a database name for the output places the catalog metadata in a known Data Catalog database.
A) Uses AmazonS3FullAccess, which grants more permissions than required; an output destination path is not needed, because a crawler writes metadata to a Data Catalog database.
C) Allocates DPUs, which is unnecessary for catalog registration, uses broad S3 access, and does not name an output database.
D) Has the same DPU and output-path issues as C, and uses a full-access role rather than the Glue service role.
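The crawler setup described for option B can be sketched as a single definition. The crawler name, role, database, and S3 path are hypothetical; the keys follow the boto3 Glue `create_crawler` parameters, and the cron expression (01:00 UTC every day) is only an example schedule.

```python
def daily_crawler_definition(name: str, role: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for glue.create_crawler with a daily schedule."""
    return {
        "Name": name,
        "Role": role,  # role that carries the AWSGlueServiceRole policy
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": "cron(0 1 * * ? *)",  # once a day at 01:00 UTC
    }


# Hypothetical values, for illustration only.
crawler_kwargs = daily_crawler_definition(
    name="portfolio-daily-crawler",
    role="example-glue-crawler-role",
    database="portfolio_db",
    s3_path="s3://example-bucket/portfolio/",
)

# To create the crawler (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_crawler(**crawler_kwargs)
```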
A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded. A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB. How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?
B) The Redshift Data API can publish events to EventBridge, which can trigger the Lambda function to write load statuses to DynamoDB. This gives decoupled, serverless, event-driven updates that follow Redshift activity.
A) Requires a separate Lambda function and CloudWatch events; it is more complex and less direct than the EventBridge integration with the Redshift Data API.
C) An SQS-to-Lambda path adds unnecessary queueing and is not the idiomatic Redshift event notification mechanism.
D) CloudTrail events are audit logs and are not intended as real-time workflow triggers between Redshift and Lambda.
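A minimal sketch of the Lambda side of option B follows. The event `source` and `detail-type` strings and the `detail` field names are assumptions based on the Redshift Data API event format, and the DynamoDB table name and attribute names are hypothetical. The sketch also assumes the load statement was named after the table it loads.

```python
# Event pattern for the EventBridge rule (assumed values, see lead-in).
EVENT_PATTERN = {
    "source": ["aws.redshift-data"],
    "detail-type": ["Redshift Data Statement Status Change"],
}


def load_status_item(event: dict) -> dict:
    """Map a statement status event to a DynamoDB item describing the load."""
    detail = event["detail"]
    return {
        "table_name": detail.get("statementName", "unknown"),
        "statement_id": detail["statementId"],
        "state": detail["state"],
        "updated_at": event["time"],
    }


def lambda_handler(event, context):
    """Write the load status to DynamoDB (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the mapping above runs without AWS access

    table = boto3.resource("dynamodb").Table("redshift_load_status")  # hypothetical
    table.put_item(Item=load_status_item(event))


# Hand-written sample event, for illustration only.
sample_event = {
    "source": "aws.redshift-data",
    "detail-type": "Redshift Data Statement Status Change",
    "time": "2024-01-31T23:59:00Z",
    "detail": {
        "statementName": "daily_transactions",
        "statementId": "example-statement-id",
        "state": "FINISHED",
    },
}
item = load_status_item(sample_event)
```

For events to be emitted at all, the statement is assumed to be submitted with `WithEvent=True` on the Data API `execute_statement` call.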
A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically. Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?
A) AWS DataSync is correct because it enables secure, automated, periodic transfer of large on-premises datasets to S3, transfers only incremental changes, handles files in any format, and can schedule transfers. It handles ongoing updates efficiently without manual scripting.
B) AWS Glue is optimized for ETL processing and data cataloging, not for secure, ongoing bulk transfer from on-premises to S3 with scheduling and incremental sync.
C) AWS Direct Connect provides a dedicated network connection, not data movement orchestration or scheduling of transfers to S3.
D) Amazon S3 Transfer Acceleration speeds up individual uploads over long distances but is not designed for automated, scheduled, incremental sync from on-premises with ongoing updates.
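The scheduled, incremental transfer in option A can be sketched as a DataSync task definition. The location ARNs and task name are hypothetical, the schedule is an example, and the keys follow the boto3 DataSync `create_task` parameters.

```python
def scheduled_sync_task(source_location_arn: str, destination_location_arn: str) -> dict:
    """Build keyword arguments for datasync.create_task with a daily schedule."""
    return {
        "Name": "daily-onprem-to-s3",
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # 02:00 UTC daily
        "Options": {
            "TransferMode": "CHANGED",  # copy only data that differs from the destination
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
    }


# Hypothetical ARNs, for illustration only.
task_kwargs = scheduled_sync_task(
    "arn:aws:datasync:us-east-1:123456789012:location/loc-source-example",
    "arn:aws:datasync:us-east-1:123456789012:location/loc-dest-example",
)

# To create the task (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("datasync").create_task(**task_kwargs)
```

With `TransferMode` set to `CHANGED`, each scheduled run moves only the roughly 5% of data that changed since the previous run.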
A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently. The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database. Which AWS service should the company use to meet these requirements?
A) AWS Lambda is event-driven compute that is unsuitable for large data migrations and lacks built-in data replication capabilities.
B) Correct: AWS DMS is designed for ongoing or batch database migrations with minimal downtime. It supports continuous replication from on-premises SQL Server to RDS for SQL Server and is cost-effective for monthly migrations.
C) AWS Direct Connect provides dedicated network connectivity but does not handle data transformation or ongoing replication between on-premises and AWS.
D) AWS DataSync focuses on high-speed transfer of files and objects, not relational database replication to RDS.
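A DMS replication task that combines a full load with ongoing change capture might be defined as follows. The ARNs, task identifier, and `dbo` schema are hypothetical; the keys follow the boto3 DMS `create_replication_task` parameters.

```python
import json


def replication_task_definition(
    source_endpoint_arn: str, target_endpoint_arn: str, instance_arn: str
) -> dict:
    """Build keyword arguments for dms.create_replication_task (full load plus CDC)."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-dbo-tables",
                "object-locator": {"schema-name": "dbo", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "monthly-sqlserver-to-rds",
        "SourceEndpointArn": source_endpoint_arn,
        "TargetEndpointArn": target_endpoint_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",  # keep replicating changes after the load
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


# Hypothetical ARNs, for illustration only.
task = replication_task_definition(
    "arn:aws:dms:us-east-1:123456789012:endpoint:SOURCEEXAMPLE",
    "arn:aws:dms:us-east-1:123456789012:endpoint:TARGETEXAMPLE",
    "arn:aws:dms:us-east-1:123456789012:rep:INSTANCEEXAMPLE",
)

# To create the task (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("dms").create_replication_task(**task)
```

Ongoing change capture is what keeps downtime low: applications keep using the source database until the cutover.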
Share your comments for Amazon Amazon-DEA-C01 exam with other users:
Question 5: Question 5 asks how to identify min and max values for each column in a Dataflow result. Correct options: B and E.
Question 18: Why not A?
Question 4: Question 4 is about when to use batch processing.
Question 5: I can’t see the [Image] in Question 5, but I can explain the likely reasoning.
Question 12: Here’s why Question 12’s correct choices are C and D.
Question 3: Question 3 asks for two valid ways to meet the purchase order creation validation (warn if the vendor is on the exclusion list for the customer/product and block/alert accordingly). Correct answers: C and D.
Question 12: Here’s how to understand question 12.
Question 6: Here’s how question 6 works. Key constraint: All new and extended objects must be in an existing model named FinanceExt. Creating a brand-new model is not allowed. Why the two correct options work:
Question 2: I don’t have the text for Question 2 here. Please paste the exact Question 2 (including all answer choices) or describe the topic it covers. Once I have it, I’ll:
Which statement is true about using default environment variables?
The environment variables can be read in workflows using the ENV: variable_name syntax.
The environment variables created should be prefixed with GITHUB_ to ensure they can be accessed in workflows.
The environment variables can be set in the defaults: sections of the workflow.
The GITHUB_WORKSPACE environment variable should be used to access files from within the runner.
Correct answer: The statement "The GITHUB_WORKSPACE environment variable should be used to access files from within the runner." is true. Why the others are false:
Environment variables are read with ${{ env.VARIABLE }} in expressions or $VARIABLE in a shell step, not with ENV: variable_name.
The GITHUB_ prefix is reserved for the default variables that GitHub sets; custom variables do not need it.
The defaults: section sets default settings for run steps, not environment variables.
GITHUB_WORKSPACE, also available as ${{ github.workspace }}, is the working directory on the runner, so files are addressed as $GITHUB_WORKSPACE/... or ${{ github.workspace }}/...
As an administrator for this subscription, you have been tasked with recommending a solution that prohibits users from copying corporate information from managed applications installed on unmanaged devices. Which of the following should you recommend? Windows Virtual Desktop. Microsoft Intune. Windows AutoPilot. Azure AD Application Proxy.
Q23: Fabric Admin is correct because Domain admin cannot create domains; among the given options, only Fabric Admin can. Q51: Wrapping @pipeline.parameter.param1 inside {} returns a string, but the question requires the expression to return an Int, so the correct answer should be @pipeline.parameter.param1 (no {}).
Question 23: The correct answer is Domain admin (option B), not Fabric admin.
Question 2: For question 2, the key concept is the Longest Prefix Match. Routers pick the route whose subnet mask is the most specific (largest prefix length) that still matches the destination IP. From the options:
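The longest prefix match rule described in this comment can be sketched with Python's standard `ipaddress` module. The routing table and destination addresses are made-up examples, not the options from the question.

```python
import ipaddress
from typing import List, Optional


def longest_prefix_match(destination: str, routes: List[str]) -> Optional[str]:
    """Return the matching route with the largest prefix length, or None if none match."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    matches = [net for net in networks if dest in net]
    if not matches:
        return None
    # The most specific route is the one with the longest prefix.
    return str(max(matches, key=lambda net: net.prefixlen))


# Made-up routing table, for illustration only.
routes = ["0.0.0.0/0", "10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24"]
best = longest_prefix_match("10.1.2.77", routes)
fallback = longest_prefix_match("192.168.1.1", routes)
```

Here `10.1.2.77` matches all four routes and the /24 wins, while `192.168.1.1` matches only the default route.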
Question 129: Correct answer: CNAME
Recommend using AI for solutions rather than the Answer(s) submitted here.
This is very interesting
Are these the same questions you have to pay for in ExamTopics?
For Question 7, the answer description indicates the correct answer, but the option number mentioned is incorrect. Nice and comprehensive. Thank you.