A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour. Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)
Answer(s): A,D
Scheduled triggers in AWS Glue run ETL jobs automatically on a serverless basis, which matches the hourly requirement with the least operational overhead. A) Glue triggers can schedule ETL jobs to run every hour without manual intervention. D) Glue connections provide secure, managed connectivity to RDS, MongoDB, and Redshift from within Glue’s managed environment, which simplifies data movement and transformation without custom networking setup. B) DataBrew is primarily a data preparation and cleaning tool, not a full ETL workflow that moves data from multiple stores into Redshift. C) Scheduling the jobs with Lambda would add orchestration and state management overhead. E) The Redshift Data API is for issuing SQL from applications, not for orchestrating and loading ETL pipelines.
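As a rough illustration of option A, the sketch below builds the arguments for the Glue `create_trigger` call with an hourly cron schedule. The trigger name and job name are hypothetical placeholders, and the actual API call is left as a comment because it needs AWS credentials.

```python
def hourly_trigger_request(trigger_name: str, job_name: str) -> dict:
    """Build keyword arguments for the Glue CreateTrigger API (scheduled type)."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Six-field cron expression: minute 0 of every hour, every day (UTC).
        "Schedule": "cron(0 * * * ? *)",
        # The job name is a placeholder; it must match an existing Glue job.
        "Actions": [{"JobName": job_name}],
        # Activate the trigger as soon as it is created.
        "StartOnCreation": True,
    }


# Hypothetical names, used only for illustration.
request = hourly_trigger_request("hourly-etl-trigger", "rds-mongodb-to-redshift")
print(request["Schedule"])

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("glue").create_trigger(**request)
```

Keeping the request as plain data makes it easy to review the schedule before anything is created in the account.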
A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling. Which solution will meet this requirement?
Answer(s): B
Concurrency scaling in Redshift is enabled at the WLM queue level for a provisioned cluster, which allows automatic scaling of read/write workloads to handle bursts without user intervention. A) is incorrect because concurrency scaling applies to cluster-based WLM, not Serverless workgroups. B) is correct. C) is incorrect because concurrency scaling is not toggled globally at cluster creation; it is configured per WLM queue. D) is incorrect because daily usage quotas are unrelated to concurrency scaling behavior.
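The per-queue setting described above lives in the cluster parameter group's `wlm_json_configuration` parameter. The following is a minimal sketch, assuming a single manual WLM queue; the concurrency value and the parameter group name in the comment are illustrative assumptions, not values from the question.

```python
import json


def wlm_parameter(queue_concurrency: int = 5) -> dict:
    """Build a wlm_json_configuration parameter with concurrency scaling on."""
    queues = [
        {
            # Empty groups make this the default queue for all queries.
            "query_group": [],
            "user_group": [],
            "query_concurrency": queue_concurrency,
            # "auto" turns concurrency scaling on for this queue; "off" disables it.
            "concurrency_scaling": "auto",
        }
    ]
    return {
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(queues),
    }


param = wlm_parameter()
print(param["ParameterValue"])

# With credentials configured, and a hypothetical parameter group name:
#   import boto3
#   boto3.client("redshift").modify_cluster_parameter_group(
#       ParameterGroupName="my-ra3-parameter-group", Parameters=[param])
```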
A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes. Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
Answer(s): A,B
Each query can outlast the 15-minute maximum timeout of a Lambda function, so Lambda should only start the query and Step Functions should wait for completion; this avoids paying for idle compute. A) A Lambda function that calls the Athena Boto3 start_query_execution API is a cost-efficient, short-lived way to launch each query on a daily schedule without provisioning servers. B) A Step Functions workflow with a Wait state and a get_query_execution check provides reliable polling and sequencing across multiple queries without a continuously running poller, which reduces compute waste. C) A Glue Python shell job is more expensive and brings in an ETL service that the workload does not need. D) A Glue Python shell script that polls in a sleep loop is billed for idle wait time while each query runs and adds maintenance overhead. E) Amazon MWAA with AWS Batch adds a managed Airflow environment and an extra service, which is not cost-optimal for simple sequential tasks.
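The start-then-poll pattern from options A and B could be expressed as an Amazon States Language definition along the following lines. This is a sketch under stated assumptions: the two Lambda ARNs are hypothetical, and the status-check function is assumed to return an object with a `state` field copied from the Athena `get_query_execution` response.

```python
import json


def athena_polling_definition(start_fn_arn: str, status_fn_arn: str,
                              wait_seconds: int = 60) -> dict:
    """Build a state machine that starts one query and polls until it finishes."""
    return {
        "StartAt": "StartQuery",
        "States": {
            # Lambda calls start_query_execution and returns the execution ID.
            "StartQuery": {"Type": "Task", "Resource": start_fn_arn,
                           "Next": "WaitForQuery"},
            # The Wait state pauses the workflow; no compute runs during the pause.
            "WaitForQuery": {"Type": "Wait", "Seconds": wait_seconds,
                             "Next": "CheckStatus"},
            # Lambda calls get_query_execution and returns {"state": ...} (assumed shape).
            "CheckStatus": {"Type": "Task", "Resource": status_fn_arn,
                            "Next": "IsDone"},
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.state", "StringEquals": "SUCCEEDED", "Next": "Done"},
                    {"Variable": "$.state", "StringEquals": "FAILED", "Next": "QueryFailed"},
                    {"Variable": "$.state", "StringEquals": "CANCELLED", "Next": "QueryFailed"},
                ],
                # QUEUED or RUNNING: loop back and wait again.
                "Default": "WaitForQuery",
            },
            "Done": {"Type": "Succeed"},
            "QueryFailed": {"Type": "Fail", "Error": "AthenaQueryFailed"},
        },
    }


# Placeholder ARNs for illustration only.
definition = athena_polling_definition(
    "arn:aws:lambda:us-east-1:123456789012:function:start-athena-query",
    "arn:aws:lambda:us-east-1:123456789012:function:check-athena-query",
)
print(json.dumps(definition, indent=2))
```

To run a series of queries, the same four states could be repeated per query or driven from a list of queries.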
A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS. Which extract, transform, and load (ETL) service will meet these requirements?
Amazon EMR is a strong fit because it provides managed clusters for the big data frameworks already in use (Hadoop, Spark, HBase, Flink, Pig, Oozie). This enables scalable ETL at performance comparable to the on-premises environment, with on-demand capacity and serverless-like flexibility through options such as EMR on EKS and Step Functions integration, which reduces operational overhead. A) AWS Glue is serverless but primarily targets data cataloging and ETL for structured data; it may not natively support Pig, Oozie, HBase, or Flink, so existing Pig and Oozie workflows would not carry over at petabyte scale. C) AWS Lambda is serverless compute but is not suitable for long-running, heavy ETL workloads and complex big data pipelines at petabyte scale. D) Amazon Redshift is a data warehouse, not an ETL service, and lacks direct support for Pig and Oozie workflows and for HBase- or Flink-based processing.
A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?
The Detect PII transform in AWS Glue Studio provides built-in PII detection with minimal setup. Combining it with an obfuscation step and orchestrating the ingest pipeline with AWS Step Functions yields a low-operational-effort, serverless solution that profiles and masks the data before it is stored in S3. A) Requires a custom Lambda transform and SDK code, which increases operational overhead and maintenance risk. C) Uses Glue Studio for detection but relies on Glue Data Quality for obfuscation, which adds extra tools and steps. D) Involves DynamoDB and Lambda for both detection and obfuscation, plus manual data movement to S3, which raises complexity and latency.
A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3 based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data. The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort. Which solution will meet these requirements with the LEAST operational overhead?
Automated ETL orchestration with minimal manual effort is best achieved with AWS Step Functions, which can coordinate Glue jobs, EMR steps, and other AWS services in serverless workflows with built-in retries, error handling, and visual monitoring. A) AWS Glue workflows are Glue-native but provide limited cross-service orchestration and less flexibility for complex state machines than Step Functions. C) AWS Lambda functions require custom orchestration logic and may not handle long-running tasks efficiently, which increases operational effort. D) Amazon MWAA provides Airflow-based orchestration but introduces more management overhead and is not as lightweight as Step Functions for serverless, event-driven workflows.
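To make the cross-service coordination concrete, here is a minimal sketch of a state machine that runs a Glue job and then an EMR step, each through a synchronous (`.sync`) service integration with a retry policy. The job name, cluster ID, and script location are hypothetical placeholders.

```python
import json


def etl_orchestration_definition(glue_job: str, emr_cluster_id: str,
                                 spark_script: str) -> dict:
    """Build a two-step workflow: Glue job first, then a Spark step on EMR."""
    retry = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 30,
              "MaxAttempts": 2, "BackoffRate": 2.0}]
    return {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync makes Step Functions wait until the Glue job run finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Retry": retry,
                "Next": "RunEmrStep",
            },
            "RunEmrStep": {
                "Type": "Task",
                # .sync makes Step Functions wait until the EMR step completes.
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId": emr_cluster_id,
                    "Step": {
                        "Name": "spark-transform",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", spark_script],
                        },
                    },
                },
                "Retry": retry,
                "End": True,
            },
        },
    }


# Placeholder identifiers for illustration only.
workflow = etl_orchestration_definition(
    "ingest-operational-db", "j-EXAMPLECLUSTER", "s3://example-bucket/transform.py"
)
print(json.dumps(workflow, indent=2))
```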
A company currently stores all of its data in Amazon S3 by using the S3 Standard storage class. A data engineer examined data access patterns to identify trends. During the first 6 months, most data files are accessed several times each day. Between 6 months and 2 years, most data files are accessed once or twice each month. After 2 years, data files are accessed only once or twice each year. The data engineer needs to use an S3 Lifecycle policy to develop new data storage rules. The new storage solution must continue to provide high availability. Which solution will meet these requirements in the MOST cost-effective way?
Transitioning to S3 Standard-IA after 6 months preserves high availability while reducing cost for infrequently accessed data, and transitioning to S3 Glacier Flexible Retrieval after 2 years lowers cost further for long-term archival. This matches the tiered access pattern: frequent access early, then infrequent access, then archival, without sacrificing availability during the lifecycle. A) S3 One Zone-IA stores data in a single Availability Zone, so it offers lower availability and is not resilient to the loss of that AZ. C) S3 Glacier Deep Archive after 2 years offers the lowest storage cost but has longer retrieval times and is not the most cost-effective given typical retrieval requirements. D) S3 One Zone-IA followed by S3 Glacier Deep Archive further compromises availability and retrieval performance compared with S3 Standard-IA followed by S3 Glacier Flexible Retrieval.
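The two transitions described in this explanation could be written as an S3 Lifecycle configuration like the sketch below. It assumes 6 months is approximated as 180 days and 2 years as 730 days, and the rule ID and bucket name are hypothetical; `GLACIER` is the API storage class value for S3 Glacier Flexible Retrieval.

```python
def tiered_lifecycle_configuration() -> dict:
    """Build a lifecycle configuration: Standard-IA at ~6 months, Glacier at ~2 years."""
    return {
        "Rules": [
            {
                "ID": "tiered-archive",      # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},    # empty prefix applies to every object
                "Transitions": [
                    # Assumption: 6 months approximated as 180 days.
                    {"Days": 180, "StorageClass": "STANDARD_IA"},
                    # Assumption: 2 years approximated as 730 days.
                    {"Days": 730, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = tiered_lifecycle_configuration()
print(lifecycle["Rules"][0]["Transitions"])

# With credentials configured, and a hypothetical bucket name:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=lifecycle)
```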
A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks. The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster. The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster. Which solution will meet these requirements?
Answer(s): A
Redshift data sharing allows a consumer cluster (the sales BI cluster) to query live data from the producer cluster (the ETL cluster) without duplicating the data or loading the ETL cluster, which keeps resource impact low and lets the sales team join current data across both clusters. A) Correct. Redshift data sharing enables cross-cluster query access with minimal compute on the producer and avoids disrupting the ETL workload. B) Incorrect. Materialized views would require data duplication or periodic refresh, and granting direct access to the ETL cluster increases load and risks contention. C) Incorrect. Database views alone offer no cross-cluster sharing; direct access puts the sales queries on the ETL cluster and can degrade its performance. D) Incorrect. Unloading to S3 and querying with Redshift Spectrum adds another data movement step on the ETL cluster, introduces latency, and does not provide real-time joins between clusters.
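For orientation, the sketch below assembles the SQL statements that a data sharing setup of this kind typically involves, split by the cluster they run on. The datashare name, schema, database name, and namespace IDs are hypothetical placeholders; the statements would be run with whatever SQL client the team already uses.

```python
def datashare_statements(share: str, schema: str,
                         producer_namespace: str, consumer_namespace: str,
                         consumer_db: str) -> dict:
    """Return the SQL to run on the producer (ETL) and consumer (BI) clusters."""
    producer = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        # Grant the consumer cluster's namespace access to the datashare.
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]
    consumer = [
        # Expose the shared objects as a local database on the BI cluster.
        f"CREATE DATABASE {consumer_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]
    return {"producer": producer, "consumer": consumer}


# Placeholder names and namespace IDs for illustration only.
statements = datashare_statements(
    share="etl_share",
    schema="public",
    producer_namespace="<producer-namespace-id>",
    consumer_namespace="<consumer-namespace-id>",
    consumer_db="etl_shared_db",
)
for side, sql_list in statements.items():
    for sql in sql_list:
        print(f"[{side}] {sql}")
```

Queries against the shared database run on the consumer cluster's compute, which is why the producer's ETL workload is not interrupted.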