Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam (page: 6)
Updated on: 25-Dec-2025

A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.
Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

  A. Use Hadoop Distributed File System (HDFS) as a persistent data store.
  B. Use Amazon S3 as a persistent data store.
  C. Use x86-based instances for core nodes and task nodes.
  D. Use Graviton instances for core nodes and task nodes.
  E. Use Spot Instances for all primary nodes.

Answer(s): B,D
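Why B and D: persisting data in Amazon S3 (through EMRFS) decouples storage from the cluster, so the cluster can be resized or terminated without data loss, while HDFS ties the data to the cluster's lifetime. Graviton instances offer better price-performance than comparable x86 instances for Spark workloads. Spot Instances are unsuitable for primary nodes because losing the primary node terminates the cluster. Below is a minimal boto3 sketch of such a cluster; the region, bucket, role names, and m7g instance types are illustrative assumptions, not values from the question.

    import boto3  # AWS SDK for Python

    emr = boto3.client("emr", region_name="us-east-1")

    # Hypothetical cluster: Graviton (m7g) nodes, data persisted in S3 via EMRFS.
    response = emr.run_job_flow(
        Name="spark-analytics-cluster",
        ReleaseLabel="emr-7.1.0",
        Applications=[{"Name": "Spark"}],
        LogUri="s3://example-emr-logs/",   # assumed bucket
        ServiceRole="EMR_DefaultRole",     # assumed default roles
        JobFlowRole="EMR_EC2_DefaultRole",
        Instances={
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m7g.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m7g.2xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    )
    print(response["JobFlowId"])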



A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.
  B. Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.
  C. Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.
  D. Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.

Answer(s): C
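Why C: Amazon Redshift streaming ingestion reads directly from Kinesis Data Streams through an external schema, and an auto-refreshing materialized view keeps the data queryable by existing BI tools with no staging pipeline to operate. Below is a sketch of the DDL, submitted through the Redshift Data API; the stream name, IAM role ARN, workgroup, and database are assumptions.

    import boto3  # Redshift Data API client

    rsd = boto3.client("redshift-data", region_name="us-east-1")

    # Map the Kinesis stream into Redshift (assumed role ARN).
    create_schema = """
    CREATE EXTERNAL SCHEMA kds
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';
    """

    # Materialize the stream and let Redshift refresh the view automatically.
    create_mv = """
    CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kds."example-stream";
    """

    rsd.batch_execute_statement(
        WorkgroupName="example-workgroup",  # or ClusterIdentifier for provisioned
        Database="dev",
        Sqls=[create_schema, create_mv],
    )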



A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's applications. The company uses AWS Glue jobs to process data for the dashboard. The company stores the data in a single Amazon S3 bucket. The company adds new data every day.
A data engineer discovers that dashboard queries are becoming slower over time. The data engineer determines that the root cause of the slowing queries is long-running AWS Glue jobs.
Which actions should the data engineer take to improve the performance of the AWS Glue jobs? (Choose two.)

  A. Partition the data that is in the S3 bucket. Organize the data by year, month, and day.
  B. Increase the AWS Glue instance size by scaling up the worker type.
  C. Convert the AWS Glue schema to the DynamicFrame schema class.
  D. Adjust the AWS Glue job scheduling frequency so the jobs run half as many times each day.
  E. Modify the IAM role that grants access to AWS Glue to grant access to all S3 features.

Answer(s): A,B
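Why A and B: partitioning the S3 data by year, month, and day lets each Glue job (and the queries behind the QuickSight dashboard) read only the partitions it needs instead of scanning the whole bucket, and a larger worker type (for example, G.2X instead of G.1X) gives each worker more memory and compute. Below is a PySpark sketch of writing partitioned output in a Glue job; the paths, format, and partition columns are assumptions.

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the raw daily drop (assumed location and format).
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/raw/"]},
        format="json",
    )

    # Write back partitioned by year/month/day so downstream reads can prune
    # partitions. The year/month/day columns are assumed to exist in the data.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={
            "path": "s3://example-bucket/curated/",
            "partitionKeys": ["year", "month", "day"],
        },
        format="parquet",
    )

The worker type itself is set on the Glue job definition (its WorkerType and NumberOfWorkers settings), not inside the script.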



A data engineer needs to use AWS Step Functions to design an orchestration workflow. The workflow must process a large collection of data files in parallel and apply a specific transformation to each file.
Which Step Functions state should the data engineer use to meet these requirements?

  A. Parallel state
  B. Choice state
  C. Map state
  D. Wait state

Answer(s): C
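Why C: a Map state runs the same sub-workflow for every element of an input array, with configurable concurrency, which matches "apply a specific transformation to each file." A Parallel state instead runs a fixed set of different branches. Below is a minimal Amazon States Language definition built as a Python dict; the Lambda ARN and the $.files input shape are assumptions.

    import json

    # Hypothetical state machine: fan out over $.files and apply the same
    # transformation Lambda to each file, up to 10 at a time.
    definition = {
        "StartAt": "TransformEachFile",
        "States": {
            "TransformEachFile": {
                "Type": "Map",
                "ItemsPath": "$.files",  # input: {"files": [ ... ]}
                "MaxConcurrency": 10,
                "Iterator": {
                    "StartAt": "Transform",
                    "States": {
                        "Transform": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-file",
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }

    print(json.dumps(definition, indent=2))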



A company is migrating a legacy application to an Amazon S3-based data lake. A data engineer reviewed the data that is associated with the legacy application and found that it contains some duplicate information.
The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Write a custom extract, transform, and load (ETL) job in Python. Import the Pandas library, and use its DataFrame.drop_duplicates() function to perform data deduplication.
  B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to perform data deduplication.
  C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
  D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.

Answer(s): B
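Why B: FindMatches is a built-in AWS Glue ML transform, so deduplication requires no custom code or third-party library to write, package, and maintain. Below is a sketch of applying a FindMatches transform inside a Glue ETL script; the transform ID, catalog names, and output path are assumptions (the transform itself is created and trained separately in the Glue console).

    from awsglue.context import GlueContext
    from awsglue.ml import FindMatches
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Load the legacy data from the Data Catalog (assumed database and table).
    legacy = glue_context.create_dynamic_frame.from_catalog(
        database="legacy_db",
        table_name="legacy_app_data",
    )

    # Apply the pre-trained transform; matching records receive the same
    # match_id, which downstream steps can use to drop duplicates.
    matched = FindMatches.apply(
        frame=legacy,
        transformId="tfm-0123456789abcdef",  # assumed transform ID
    )

    glue_context.write_dynamic_frame.from_options(
        frame=matched,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/deduplicated/"},
        format="parquet",
    )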



Viewing Page 6 of 43


