Amazon Amazon-DEA-C01 Dumps in PDF and Online

QUESTION: 9

A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time.
Which solution will run the Glue jobs in the MOST cost-effective way?

A) Choose the FLEX execution class in the Glue job properties.
B) Use the Spot Instance type in Glue job properties.
C) Choose the STANDARD execution class in the Glue job properties.
D) Choose the latest version in the GlueVersion field in the Glue job properties.

Answer(s): A

Explanation:

A) Correct. The FLEX execution class is the most cost-effective choice for jobs that do not have to start or finish at a specific time. Flex jobs run on spare capacity at a lower rate per DPU-hour than standard jobs, in exchange for variable start times and run times.
B) Glue is serverless, and its job properties do not include a Spot Instance type, so this option cannot be configured.
C) STANDARD is the default execution class, intended for time-sensitive workloads that need fast startup and dedicated resources. It is billed at the full rate, so it costs more than FLEX for jobs with no timing requirement.
D) GlueVersion selects the runtime versions (for example, Spark and Python) that the job uses. It is not a pricing setting, and choosing the latest version does not by itself reduce cost.
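
As a minimal sketch of where the execution class is set, the function below builds keyword arguments in the shape that boto3's Glue `create_job` call accepts. The job name, role ARN, script path, Glue version, and worker settings are hypothetical placeholders, not values from the question.

```python
# Sketch only: builds keyword arguments for boto3's Glue create_job call.
# Every name, ARN, and path below is a hypothetical placeholder.

def flex_job_args(name: str, role_arn: str, script_s3_path: str) -> dict:
    """Return create_job arguments for a Spark ETL job that uses Flex."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3_path},
        "GlueVersion": "4.0",       # assumed version for this sketch
        "WorkerType": "G.1X",       # assumed worker type for this sketch
        "NumberOfWorkers": 2,
        "ExecutionClass": "FLEX",   # the default is "STANDARD"
    }


args = flex_job_args(
    "daily-etl",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/daily_etl.py",
)
print(args["ExecutionClass"])

# With AWS credentials configured, the dict could be passed as:
#   import boto3
#   boto3.client("glue").create_job(**args)
```

The daily schedule itself would be defined separately, for example with a Glue workflow or trigger.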


QUESTION: 10

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?

A) Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
B) Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
C) Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
D) Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.

Answer(s): A

Explanation:

A) Correct. An S3 event notification for s3:ObjectCreated:* with a .csv suffix filter invokes the Lambda function directly, and only for uploaded .csv objects, with no additional components to operate.
B) Incorrect. s3:ObjectTagging:* events fire when tags are added to or removed from an object, not when an object is uploaded. This approach also depends on users tagging every file correctly, and a tag does not guarantee that the file is a CSV.
C) Incorrect. The requirement is to react to uploads only. s3:* is far broader than object creation and would, at best, deliver events that the function must then discard.
D) Incorrect. Routing the notification through SNS adds a topic and a subscription to manage, plus an extra hop, with no benefit over invoking Lambda directly from S3.
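
The configuration described in option A can be sketched as the dictionary that boto3's S3 `put_bucket_notification_configuration` call takes as its `NotificationConfiguration` argument. The Lambda ARN and bucket name are hypothetical placeholders.

```python
# Sketch only: builds an S3 notification configuration that invokes a
# Lambda function for newly created objects whose key ends in ".csv".
# The function ARN is a hypothetical placeholder.

def csv_upload_notification(lambda_arn: str) -> dict:
    """Return a NotificationConfiguration for .csv object-created events."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    }


config = csv_upload_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"
)
print(config["LambdaFunctionConfigurations"][0]["Events"])

# With AWS credentials configured, the dict could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-upload-bucket",
#       NotificationConfiguration=config,
#   )
```

The Lambda function also needs a resource-based permission that allows S3 to invoke it; that step is not shown here.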


QUESTION: 11

A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.
Which solution will MOST speed up the Athena query performance?

A) Change the data format from .csv to JSON format. Apply Snappy compression.
B) Compress the .csv files by using Snappy compression.
C) Change the data format from .csv to Apache Parquet. Apply Snappy compression.
D) Compress the .csv files by using gzip compression.

Answer(s): C

Explanation:

Athena reads less data, and so runs faster, when files are columnar and compressed. Parquet with Snappy lets Athena read only the columns that a query selects (column pruning), which fits the single-column access pattern described.
A) JSON is a row-oriented text format. Athena must still read every field of every record, so Snappy compression alone does not help column-specific queries.
B) Compressing the .csv files with Snappy reduces the bytes scanned, but CSV is still row-oriented, so Athena must read whole rows to extract one column.
C) Correct. Parquet is columnar, so Athena reads only the selected column, and Snappy compression reduces the bytes scanned further. Parquet also stores column statistics that let Athena skip data that cannot match a filter (predicate pushdown).
D) gzip shrinks the .csv files, but the data remains row-oriented, and a gzip file cannot be split, which limits how much Athena can parallelize reads of large files.
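
The question does not say how the conversion is done. One possible way, shown here only as a sketch, is an Athena CREATE TABLE AS SELECT (CTAS) statement that rewrites the CSV-backed table as Snappy-compressed Parquet. The table names and S3 location are hypothetical.

```python
# Sketch only: builds an Athena CTAS statement that copies a CSV-backed
# table into Snappy-compressed Parquet. Table names and the S3 location
# are hypothetical placeholders.

def parquet_ctas(source_table: str, target_table: str, s3_location: str) -> str:
    """Return a CTAS query string that writes Parquet with Snappy."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{s3_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


sql = parquet_ctas(
    "sales_csv", "sales_parquet", "s3://example-bucket/sales_parquet/"
)
print(sql)

# With AWS credentials configured, the string could be submitted through
# boto3's Athena start_query_execution call as its QueryString argument.
```

After the conversion, a query that selects one column from the Parquet table scans only that column's data instead of every full row.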


QUESTION: 12

A manufacturing company collects sensor data from its factory floor to monitor and enhance operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon S3 bucket.
The company needs to display a real-time view of operational efficiency on a large screen in the manufacturing facility.
Which solution will meet these requirements with the LOWEST latency?

A) Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to process the sensor data. Use a connector for Apache Flink to write data to an Amazon Timestream database. Use the Timestream database as a source to create a Grafana dashboard.
B) Configure the S3 bucket to send a notification to an AWS Lambda function when any new object is created. Use the Lambda function to publish the data to Amazon Aurora. Use Aurora as a source to create an Amazon QuickSight dashboard.
C) Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to process the sensor data. Create a new Data Firehose delivery stream to publish data directly to an Amazon Timestream database. Use the Timestream database as a source to create an Amazon QuickSight dashboard.
D) Use AWS Glue bookmarks to read sensor data from the S3 bucket in real time. Publish the data to an Amazon Timestream database. Use the Timestream database as a source to create a Grafana dashboard.

Answer(s): A

Explanation:

A) Correct. Amazon Managed Service for Apache Flink processes records from Kinesis Data Streams as they arrive and writes the results straight to Timestream through a connector. Grafana can then query Timestream and refresh the dashboard at short intervals, which gives the shortest end-to-end path.
B) Firehose buffers records before it writes each object to S3, so the Lambda function sees data only after that delay. Loading the results into Aurora and visualizing them in QuickSight adds further steps that are not designed for a continuously updating display.
C) Flink is suitable for the processing, but adding a new Firehose delivery stream introduces buffering and an extra hop before the data reaches the dashboard, and QuickSight is less suited to a continuously refreshing display than Grafana.
D) AWS Glue job bookmarks track which data a batch job has already processed; they do not provide real-time reads. This path also inherits the Firehose-to-S3 delivery delay.


QUESTION: 13

A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data.
The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog.
Which solution will meet these requirements?

A) Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Configure the output destination to a new path in the existing S3 bucket.
B) Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Specify a database name for the output.
C) Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Specify a database name for the output.
D) Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Configure the output destination to a new path in the existing S3 bucket.

Answer(s): B

Explanation:

B) Correct. The crawler's IAM role needs the AWSGlueServiceRole managed policy so that AWS Glue can run the crawler and write to the Data Catalog (the role also needs read access to the source S3 path). The crawler targets the S3 path of the source data, runs on a daily schedule, and writes table metadata to the named Data Catalog database.
A) AmazonS3FullAccess grants S3 permissions only, not the Glue permissions that a crawler needs, and it is broader than required. A crawler writes metadata to a Data Catalog database, not to an S3 output path.
C) Uses AmazonS3FullAccess instead of AWSGlueServiceRole. Allocating DPUs does not schedule a crawler; a crawler runs daily through a schedule.
D) Uses the correct policy, but has the same DPU problem as C and sends the output to an S3 path instead of a Data Catalog database.
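
The pieces named in option B (role, S3 target, daily schedule, output database) map onto the arguments of boto3's Glue `create_crawler` call. The sketch below only builds those arguments; the crawler name, role ARN, database name, S3 path, and the 01:00 UTC run time are all hypothetical.

```python
# Sketch only: builds keyword arguments for boto3's Glue create_crawler call.
# All names, ARNs, and paths are hypothetical placeholders.

def daily_crawler_args(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Return create_crawler arguments for a crawler that runs once a day."""
    return {
        "Name": name,
        "Role": role_arn,                     # role with AWSGlueServiceRole attached
        "DatabaseName": database,             # Data Catalog database for the output
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": "cron(0 1 * * ? *)",      # every day at 01:00 UTC (assumed time)
    }


crawler = daily_crawler_args(
    "portfolio-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    "portfolio_db",
    "s3://example-bucket/portfolio/daily/",
)
print(crawler["Schedule"])

# With AWS credentials configured, the dict could be passed as:
#   import boto3
#   boto3.client("glue").create_crawler(**crawler)
```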


QUESTION: 14

A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded.
A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB.
How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?

A) Use a second Lambda function to invoke the first Lambda function based on Amazon CloudWatch events.
B) Use the Amazon Redshift Data API to publish an event to Amazon EventBridge. Configure an EventBridge rule to invoke the Lambda function.
C) Use the Amazon Redshift Data API to publish a message to an Amazon Simple Queue Service (Amazon SQS) queue. Configure the SQS queue to invoke the Lambda function.
D) Use a second Lambda function to invoke the first Lambda function based on AWS CloudTrail events.

Answer(s): B

Explanation:

B) Correct. A statement run through the Redshift Data API can send an event to EventBridge when it finishes. An EventBridge rule that matches the event invokes the Lambda function, which writes the load status to DynamoDB. The result is a decoupled, serverless, event-driven update tied to the load itself.
A) A second Lambda function adds code to maintain and does nothing that an EventBridge rule targeting the first function cannot do.
C) The Redshift Data API does not publish messages to SQS, so this path would need custom code, and a queue adds nothing here.
D) CloudTrail records API activity for auditing; it is not meant to drive data workflow triggers, and this option also needs a second Lambda function.
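
A rough sketch of the EventBridge side of option B follows. The rule pattern and the `detail` field names (`statementName`, `statementId`, `state`) reflect the Redshift Data API status-change event as understood here and should be checked against the AWS documentation. Using the statement name to carry the table name, and the DynamoDB attribute names, are assumptions made for this example.

```python
# Sketch only. The event field names are assumptions to verify against the
# Redshift Data API documentation; the DynamoDB attribute names and the
# "statement name = table name" convention are hypothetical.

EVENT_PATTERN = {
    "source": ["aws.redshift-data"],
    "detail-type": ["Redshift Data Statement Status Change"],
}


def load_status_item(event: dict) -> dict:
    """Turn a statement status event into a DynamoDB item (low-level format)."""
    detail = event["detail"]
    return {
        "table_name": {"S": detail["statementName"]},
        "load_state": {"S": detail["state"]},
        "statement_id": {"S": detail["statementId"]},
        "updated_at": {"S": event["time"]},
    }


sample_event = {
    "source": "aws.redshift-data",
    "detail-type": "Redshift Data Statement Status Change",
    "time": "2024-01-31T23:59:00Z",
    "detail": {
        "statementName": "daily_transactions",
        "statementId": "example-statement-id",
        "state": "FINISHED",
    },
}
item = load_status_item(sample_event)
print(item["load_state"]["S"])

# Inside the Lambda handler, the item could be written with boto3's
# DynamoDB put_item call (TableName plus Item=item).
```

For the event to be emitted, the load statement would be submitted through the Data API's `execute_statement` call with `WithEvent=True`.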


QUESTION: 15

A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically.
Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?

A) AWS DataSync
B) AWS Glue
C) AWS Direct Connect
D) Amazon S3 Transfer Acceleration

Answer(s): A

Explanation:

A) AWS DataSync is correct because it enables secure, automated, periodic transfer of large on-premises datasets to S3, supports incremental changes, multiple file formats, and can schedule transfers; it handles continuous updates efficiently without manual scripting.
B) AWS Glue is optimized for ETL processing and data cataloging, not for secure, ongoing bulk transfer from on-premises to S3 with scheduling and incremental sync.
C) AWS Direct Connect provides a dedicated network connection, not data movement orchestration or scheduling of transfers to S3.
D) Amazon S3 Transfer Acceleration speeds individual uploads over long distances but is not designed for automated, scheduled, incremental sync from on-premises with ongoing updates.
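
To make the numbers concrete: 5% of 5 TB is about 250 GB of changed data per day (taking 1 TB as 1,000 GB), which is why an incremental, scheduled transfer matters. The sketch below computes that figure and builds arguments in the shape that boto3's DataSync `create_task` call accepts. The location ARNs, task name, and 02:00 UTC schedule are hypothetical.

```python
# Sketch only: daily change volume plus keyword arguments for boto3's
# DataSync create_task call. ARNs, the task name, and the schedule time
# are hypothetical placeholders.

TOTAL_GB = 5 * 1000          # 5 TB, using 1 TB = 1,000 GB
DAILY_CHANGE_PERCENT = 5

daily_delta_gb = TOTAL_GB * DAILY_CHANGE_PERCENT // 100


def daily_sync_task_args(source_location_arn: str, dest_location_arn: str) -> dict:
    """Return create_task arguments for a daily incremental transfer."""
    return {
        "Name": "onprem-to-s3-daily",
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": dest_location_arn,
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # 02:00 UTC daily
        "Options": {"TransferMode": "CHANGED"},  # copy only data that differs
    }


task = daily_sync_task_args(
    "arn:aws:datasync:us-east-1:123456789012:location/loc-source-example",
    "arn:aws:datasync:us-east-1:123456789012:location/loc-dest-example",
)
print(daily_delta_gb, task["Options"]["TransferMode"])

# With AWS credentials configured, the dict could be passed as:
#   import boto3
#   boto3.client("datasync").create_task(**task)
```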


QUESTION: 16

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.
The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database.
Which AWS service should the company use to meet these requirements?

A) AWS Lambda
B) AWS Database Migration Service (AWS DMS)
C) AWS Direct Connect
D) AWS DataSync

Answer(s): B

Explanation:

B) Correct. AWS DMS is built for database migrations with minimal downtime. It can perform a full load and then replicate ongoing changes from the on-premises SQL Server database to RDS for SQL Server while the source stays in use. Because it is a managed service billed for the replication resources it uses, it is also a cost-effective fit for a monthly migration.
A) AWS Lambda is event-driven compute. It has no built-in database replication and is not suited to large data migrations.
C) AWS Direct Connect provides a dedicated network connection between on premises and AWS; it does not migrate or replicate data.
D) AWS DataSync transfers files and objects between storage systems; it does not replicate a relational database to RDS.


Amazon AWS Certified Data Engineer - Associate Amazon-DEA-C01 Dumps in PDF
