Amazon AWS Certified Data Engineer - Associate Amazon-DEA-C01 Dumps in PDF

Free Amazon Amazon-DEA-C01 Real Questions (page: 27)

A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs perform the functionality of the website. A data engineer needs to write a Python script that can be occasionally invoked through API Gateway. The code must return results to API Gateway.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS) cluster.
  B. Create an AWS Lambda Python function with provisioned concurrency.
  C. Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic Kubernetes Service (Amazon EKS).
  D. Create an AWS Lambda function. Ensure that the function is warm by scheduling an Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock events.

Answer(s): B

Explanation:

A) Amazon ECS requires provisioning and managing a cluster, task definitions, and container images, which is far more operational overhead than a single function for a script that is invoked only occasionally.
B) Correct: AWS Lambda Python function with provisioned concurrency minimizes latency and management effort; API Gateway integration is direct, and no infrastructure provisioning is required.
C) EKS adds substantial operational overhead for Kubernetes management and does not align with “least overhead” for a small script invoked via API Gateway.
D) Regularly pinging a Lambda to stay warm is unnecessary with provisioned concurrency and adds unnecessary scheduling, increasing operational overhead and complexity.
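
For illustration, a minimal sketch of the kind of Python handler option B describes, assuming API Gateway uses the Lambda proxy integration. The `name` query parameter and the response message are hypothetical and not part of the question.

```python
import json


def lambda_handler(event, context):
    """Return a response in the shape the Lambda proxy integration expects."""
    # queryStringParameters is None when the request carries no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter, for illustration only

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized to JSON.
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

The handler has no server or container to manage; API Gateway passes the request in `event` and relays the returned dictionary to the caller.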



A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs.
The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account.
Which solution will meet these requirements?

  A. Create a destination data stream in the production AWS account. In the security AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the production AWS account.
  B. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the security AWS account.
  C. Create a destination data stream in the production AWS account. In the production AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the security AWS account.
  D. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the production AWS account.

Answer(s): D

Explanation:

The Kinesis data stream must reside in the destination (security) account, where the logs will be stored and analyzed. CloudWatch Logs needs an IAM role whose trust policy lets it put records into that stream, and the subscription filter must be created in the production account because that is where the log groups live.
A) Wrong: destination stream in production requires cross-account Kinesis permissions; not aligned with central security account ownership.
B) Wrong: the stream and the IAM role are set up correctly, but the subscription filter is created in the security account. A subscription filter is attached to the log groups it reads from, and those are in the production account.
C) Wrong: destination stream in production; cross-account role in production to security is unnecessary and misaligned with security-centric design.
D) Correct: destination stream in security account; IAM trust policy allows CloudWatch Logs to write; subscription filter enables delivery from production logs to the security account.
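
As a rough sketch of the IAM role in option D, the two policy documents could be built as follows. The account IDs, Region, and stream name are hypothetical placeholders; a real setup would typically also narrow the trust policy with conditions and define a CloudWatch Logs destination that the production account is allowed to subscribe to.

```python
import json

# Hypothetical placeholders: replace with real account ID, Region, and stream name.
SECURITY_ACCOUNT_ID = "999999999999"
REGION = "us-east-1"
STREAM_NAME = "security-logs"


def build_trust_policy():
    """Trust policy that lets the CloudWatch Logs service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def build_permissions_policy():
    """Permissions policy that allows writing records to the destination stream."""
    stream_arn = f"arn:aws:kinesis:{REGION}:{SECURITY_ACCOUNT_ID}:stream/{STREAM_NAME}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kinesis:PutRecord",
            "Resource": stream_arn,
        }],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(build_permissions_policy(), indent=2))
```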



A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes.
A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day and ingests the changed data into the data lake.
Which solution will capture the changed data MOST cost-effectively?

  A. Create an AWS Lambda function to identify the changes between the previous data and the current data. Configure the Lambda function to ingest the changes into the data lake.
  B. Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.
  C. Use an open source data lake format to merge the data source with the S3 data lake to insert the new data and update the existing data.
  D. Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.

Answer(s): C

Explanation:

The correct answer is C because using an open source data lake format (such as Apache Iceberg or Delta Lake) enables ACID-compliant upserts/merges on a large-scale S3 data lake, allowing efficient CDC by merging daily full snapshots with existing data without heavy per-row processing or data movement. It minimizes storage and compute costs for tens of terabytes and small files, and supports scalable incremental updates.
A) Lambda-based diffing on large datasets is prohibitively expensive and slow for multi-terabyte files.
B) DMS with RDS MySQL adds relational DB maintenance and ongoing replication cost; CDC via DMS is not optimal for bulk S3 lake merging.
D) Aurora Serverless with DMS adds database compute cost and complexity; not the most cost-effective for bulk lake merges.
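
The merge in option C inserts new rows and updates changed ones. The toy sketch below illustrates only those semantics on in-memory rows; it is not Apache Iceberg or Delta Lake code, and the `id` key column is an assumption made for the example.

```python
def merge_snapshot(current, snapshot, key="id"):
    """Upsert snapshot rows into current rows.

    Returns (merged_rows, inserted_keys, updated_keys). Rows missing from
    the snapshot are left untouched, matching an insert-and-update merge.
    """
    merged = {row[key]: row for row in current}
    inserted, updated = [], []
    for row in snapshot:
        existing = merged.get(row[key])
        if existing is None:
            inserted.append(row[key])   # new row: insert
        elif existing != row:
            updated.append(row[key])    # changed row: update
        else:
            continue                    # unchanged row: nothing to write
        merged[row[key]] = row
    return list(merged.values()), inserted, updated


current = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
snapshot = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
merged, inserted, updated = merge_snapshot(current, snapshot)
print(inserted, updated)
```

A table format performs the equivalent matching at the file and row-group level, so unchanged data does not have to be rewritten.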



A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.
The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.
Which solutions will meet these requirements? (Choose two.)

  A. Create an AWS Glue partition index. Enable partition filtering.
  B. Bucket the data based on a column that the data have in common in a WHERE clause of the user query.
  C. Use Athena partition projection based on the S3 bucket prefix.
  D. Transform the data that is in the S3 bucket to Apache Parquet format.
  E. Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.

Answer(s): A,C

Explanation:

Athena planning is sped up by reducing partition discovery and enabling predicate pushdown through partition metadata, which Glue partition index and partition projection provide.
A) Creates a Glue partition index and enables partition filtering to prune partitions at query planning time.
B) Bucketing by a common column does not affect partition discovery or metadata pruning in Athena when using Glue Catalog; it mainly affects data layout for certain query engines but not partition pruning in this setup.
C) Enables partition projection to avoid enumerating large numbers of partitions, speeding up planning by computing partitions from the S3 prefix without metadata lookups.
D) Parquet formatting improves scan efficiency but does not directly reduce partition discovery or planning time; it affects I/O and scan cost rather than planning bottlenecks.
E) S3DistCP merging objects reduces small object overhead but does not impact Athena’s partition planning or metadata pruning.
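
Partition projection (option C) is configured through table properties. A minimal sketch that generates the DDL for a single date-typed partition column is shown below; the table name, column name, bucket prefix, and start date are hypothetical and would need to match the real table layout.

```python
def projection_ddl(table, column, bucket_prefix, start_date):
    """Build an ALTER TABLE statement that enables Athena partition projection
    for one date partition column laid out as <prefix>/yyyy/MM/dd/."""
    props = {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.format": "yyyy/MM/dd",
        f"projection.{column}.range": f"{start_date},NOW",
        # ${column} is a literal placeholder that Athena fills in per partition.
        "storage.location.template": f"{bucket_prefix}/${{{column}}}/",
    }
    body = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {body}\n)"


print(projection_ddl("logs", "dt", "s3://example-bucket/logs", "2023/01/01"))
```

With these properties set, Athena computes partition locations from the template instead of listing them from the Data Catalog.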



A data engineer must manage the ingestion of real-time streaming data into AWS. The data engineer wants to perform real-time analytics on the incoming streaming data by using time-based aggregations over a window of up to 30 minutes. The data engineer needs a solution that is highly fault tolerant.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Use an AWS Lambda function that includes both the business and the analytics logic to perform time-based aggregations over a window of up to 30 minutes for the data in Amazon Kinesis Data Streams.
  B. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to analyze the data that might occasionally contain duplicates by using multiple types of aggregations.
  C. Use an AWS Lambda function that includes both the business and the analytics logic to perform aggregations for a tumbling window of up to 30 minutes, based on the event timestamp.
  D. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to analyze the data by using multiple types of aggregations to perform time-based analytics over a window of up to 30 minutes.

Answer(s): D

Explanation:

D) Fully managed Apache Flink service provides stateful, fault-tolerant stream processing with native support for time-based windows (up to 30 minutes) and multiple aggregations, minimizing operational overhead. It handles exactly-once semantics and scalable stateful processing for real-time analytics.
A) Lambda with both business and analytics logic is operationally heavy and lacks built-in fault-tolerant streaming state management for long windows; potential reprocessing and state management increase overhead.
B) Wrong: Managed Service for Apache Flink is the right service, but this option focuses on handling occasional duplicates and does not address the required time-based aggregations over a window of up to 30 minutes.
C) Lambda-based tumbling window on event time is less suitable for long-running, continuous aggregations and still requires complex orchestration and fault handling.
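
To make the windowing idea concrete, here is a toy sketch of a 30-minute tumbling-window sum keyed on event time. It is plain Python, not Flink code, and it keeps no fault-tolerant state; the event shape `(epoch_seconds, value)` is an assumption for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute tumbling window


def tumbling_window_sums(events, window_seconds=WINDOW_SECONDS):
    """Sum values per non-overlapping window.

    events: iterable of (epoch_seconds, value) pairs.
    Returns {window_start_epoch_seconds: sum_of_values}.
    """
    sums = defaultdict(float)
    for ts, value in events:
        # Each event belongs to exactly one window, found by flooring its timestamp.
        window_start = ts - (ts % window_seconds)
        sums[window_start] += value
    return dict(sums)


events = [(0, 1.0), (1799, 2.0), (1800, 5.0), (3599, 1.0)]
print(tumbling_window_sums(events))
```

A managed Flink application adds what this sketch lacks: checkpointed state, handling of late events, and recovery after failures.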



A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2 instances that will cause data loss during the migration to the upgraded storage.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Create snapshots of the gp2 volumes. Create new gp3 volumes from the snapshots. Attach the new gp3 volumes to the EC2 instances.
  B. Create new gp3 volumes. Gradually transfer the data to the new gp3 volumes. When the transfer is complete, mount the new gp3 volumes to the EC2 instances to replace the gp2 volumes.
  C. Change the volume type of the existing gp2 volumes to gp3. Enter new values for volume size, IOPS, and throughput.
  D. Use AWS DataSync to create new gp3 volumes. Transfer the data from the original gp2 volumes to the new gp3 volumes.

Answer(s): C

Explanation:

Amazon EBS Elastic Volumes lets you change a volume's type from gp2 to gp3 in place, while the volume stays attached and in use, and adjust IOPS and throughput without migrating data. This minimizes downtime and operational overhead.
A) Incorrect: snapshots then create and attach new volumes introduces downtime during detachment/attachment and data consistency concerns; extra steps increase risk and cost.
B) Incorrect: gradual data transfer requires coordination and can still affect availability; mounting changes can cause brief interruption.
D) Incorrect: AWS DataSync is a file and object transfer service and does not create EBS volumes; it adds complexity with no benefit when the volume can be modified in place.
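
A minimal sketch of the in-place change from option C, expressed as the parameters for the EC2 ModifyVolume API. The volume ID is a hypothetical placeholder, and the defaults shown are the gp3 baseline values; the actual call is left as a comment because it needs boto3 and AWS credentials.

```python
def gp3_modify_request(volume_id, iops=3000, throughput=125):
    """Build keyword arguments for the EC2 ModifyVolume API call."""
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,              # gp3 baseline is 3,000 IOPS
        "Throughput": throughput,  # MiB/s; gp3 baseline is 125
    }


# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ec2").modify_volume(**gp3_modify_request("vol-0123456789abcdef0"))
print(gp3_modify_request("vol-0123456789abcdef0"))
```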



A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must export large data elements every day until the migration is complete. The data elements are the result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics team must store the data in Amazon S3.
Which solution will meet these requirements in the MOST operationally efficient way?

  A. Create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create an AWS Glue job that selects the data directly from the view and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  B. Schedule SQL Server Agent to run a daily SQL query that selects the desired data elements from the EC2 instance-based SQL Server databases. Configure the query to direct the output .csv objects to an S3 bucket. Create an S3 event that invokes an AWS Lambda function to transform the output format from .csv to Parquet.
  C. Use a SQL query to create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create and run an AWS Glue crawler to read the view. Create an AWS Glue job that retrieves the data and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  D. Create an AWS Lambda function that queries the EC2 instance-based databases by using Java Database Connectivity (JDBC). Configure the Lambda function to retrieve the required data, transform the data into Parquet format, and transfer the data into an S3 bucket. Use Amazon EventBridge to schedule the Lambda function to run every day.

Answer(s): C

Explanation:

A) Using Glue to read directly from a live view on EC2 is not straightforward since Glue typically crawls data stores for cataloging; accessing an on-premises or EC2-hosted SQL Server view would require a persistent connection and proper JDBC/ODBC setup, which adds operational overhead.
B) SQL Server Agent on EC2 producing CSV to S3 and Lambda to convert adds unnecessary steps and potential ETL drift; managing two services increases operational burden.
C) Creating a view, then using a Glue crawler to catalog the view and a Glue job to extract and convert to Parquet into S3 provides a fully managed, serverless, repeatable, and scalable workflow with minimal maintenance.
D) Lambda with JDBC requires continuous connection management, cold start considerations, and scripting for Parquet conversion, which is less operationally efficient than Glue-based ETL.



A data engineering team is using an Amazon Redshift data warehouse for operational reporting. The team wants to prevent performance issues that might result from long-running queries. A data engineer must choose a system table in Amazon Redshift to record anomalies when a query optimizer identifies conditions that might indicate performance issues.
Which table views should the data engineer use to meet this requirement?

  A. STL_USAGE_CONTROL
  B. STL_ALERT_EVENT_LOG
  C. STL_QUERY_METRICS
  D. STL_PLAN_INFO

Answer(s): B

Explanation:

A) STL_USAGE_CONTROL is related to usage controls and does not capture optimizer anomaly events.
B) Correct: STL_ALERT_EVENT_LOG is the system view that records anomalies and alerts detected by the query optimizer when potential performance issues are identified, making it the appropriate source for monitoring long-running or problematic queries.
C) STL_QUERY_METRICS contains per-query metrics but does not specifically log anomalies identified by the optimizer.
D) STL_PLAN_INFO provides plan details but not a centralized anomaly/alert log.
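
As a small illustration of how the view might be consulted, the sketch below builds a query for the most recent alerts. The helper name and the row limit are hypothetical; running the query requires a Redshift connection, which is not shown.

```python
def recent_alerts_sql(limit=20):
    """Build a query that lists the newest optimizer alerts and suggested fixes."""
    limit = int(limit)  # coerce to int so only a number is interpolated
    return (
        "SELECT query, event, solution, event_time\n"
        "FROM stl_alert_event_log\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {limit};"
    )


print(recent_alerts_sql(10))
```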


