A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time. Which solution will run the Glue jobs in the MOST cost-effective way?
Answer(s): A
A) The FLEX execution class is the most cost-effective choice for jobs with flexible start and finish times. Flex jobs run on spare capacity at a lower per-DPU-hour rate, in exchange for possible delays in start-up and completion.
B) Spot Instances are not a Glue job option; Glue is serverless and billed per DPU-hour, so there are no instances to choose.
C) The STANDARD execution class runs at the regular rate with fast start-up, which costs more than FLEX for scheduled jobs that do not need guaranteed timing.
D) GlueVersion selects the runtime and its features, not the pricing model; a newer version does not by itself reduce cost for flexible schedules.
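As a minimal sketch of option A, the Flex class can be requested through the `ExecutionClass` parameter of the Glue `StartJobRun` API. The job name and timeout below are hypothetical, and the helper only builds the request so that it can be inspected without AWS credentials.

```python
def flex_job_run_params(job_name: str, timeout_minutes: int = 120) -> dict:
    """Build keyword arguments for Glue StartJobRun using the Flex execution class."""
    return {
        "JobName": job_name,          # hypothetical job name supplied by the caller
        "ExecutionClass": "FLEX",     # the other accepted value is "STANDARD"
        "Timeout": timeout_minutes,   # job run timeout in minutes (illustrative value)
    }


params = flex_job_run_params("nightly-orders-etl")
print(params)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").start_job_run(**params)
```

The same `ExecutionClass` setting can also be stored on the job definition, so that every scheduled run of the workflow uses Flex.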
A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?
A is correct because an S3 event notification can filter object-creation events by the .csv suffix and invoke the Lambda function directly, which keeps the number of components and the operational overhead to a minimum.
B is incorrect because tag-based triggers require a tagging policy and do not guarantee that the file is a CSV, which adds complexity.
C is incorrect because s3:* is overly broad and not needed; it would generate excessive events and complicate processing.
D is incorrect because SNS introduces an additional service and a subscription step, which increases latency and maintenance compared with invoking Lambda directly from S3 events.
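The notification described in option A can be sketched as the configuration document that `put_bucket_notification_configuration` accepts. The Lambda ARN and bucket name are placeholders, and the helper only builds the dictionary.

```python
def csv_upload_notification(lambda_arn: str) -> dict:
    """Build an S3 notification configuration that fires only for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # any kind of object creation
                "Filter": {
                    "Key": {
                        # Only keys ending in .csv trigger the function.
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    }


# Placeholder ARN for illustration only.
config = csv_upload_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"
)
print(config)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-upload-bucket", NotificationConfiguration=config
#   )
```

The Lambda function also needs a resource-based permission that allows S3 to invoke it; that step is not shown here.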
A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column. Which solution will MOST speed up the Athena query performance?
Answer(s): C
Athena performs best on columnar, compressed formats. Parquet with Snappy lets Athena read only the selected column (column pruning) and scan far less data.
A) JSON is a row-oriented text format; even with Snappy compression, Athena must still read every field of every record.
B) Snappy compression on CSV reduces the bytes read but does not enable column pruning, so it helps less than Parquet.
C) Parquet is columnar and Snappy-compressed, so queries that select a specific column read only that column's data, which substantially reduces I/O and speeds up queries.
D) gzip-compressed CSV is still row-oriented, and gzip files cannot be split for parallel reads, so the gain is smaller than with Parquet.
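One way to perform the conversion in option C is an Athena CTAS statement. The sketch below only builds the SQL text; the table names and S3 location are hypothetical.

```python
def parquet_ctas(source_table: str, target_table: str, s3_location: str) -> str:
    """Build an Athena CTAS statement that rewrites a table as Snappy-compressed Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{s3_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical table names and output location.
sql = parquet_ctas("sales_csv", "sales_parquet", "s3://example-bucket/sales_parquet/")
print(sql)
```

After the new table exists, a query such as `SELECT amount FROM sales_parquet` reads only the `amount` column rather than every row in full.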
A manufacturing company collects sensor data from its factory floor to monitor and enhance operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon S3 bucket. The company needs to display a real-time view of operational efficiency on a large screen in the manufacturing facility. Which solution will meet these requirements with the LOWEST latency?
A) Amazon Managed Service for Apache Flink reads directly from Kinesis Data Streams and writes to Timestream, a time-series store, and a Grafana dashboard on Timestream provides the real-time visualization. This path has the lowest latency.
B) A Lambda function triggered by S3 events runs only after Firehose has buffered and delivered objects to S3, which adds latency; Aurora with QuickSight is oriented toward periodic reporting rather than a live display.
C) Flink is suitable, but sending its output to Timestream through a dedicated Firehose stream adds an unnecessary hop and potential latency compared with inline Flink processing and Grafana.
D) AWS Glue bookmarks are batch-oriented and not suitable for real-time dashboards; Grafana over Timestream would be possible, but the end-to-end path is slower than A.
A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data. The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog. Which solution will meet these requirements?
Answer(s): B
B) Correct. It uses AWSGlueServiceRole, the policy intended for Glue crawlers, and specifies the S3 source, a daily crawl schedule, and a database name for the output, so the catalog metadata lands in a known database.
A) Uses AmazonS3FullAccess, which grants excessive permissions that are not required; an output destination path is not needed for catalog integration.
C) Allocates DPUs, which is unnecessary for catalog registration, lacks a proper output database name, and uses broad S3 access.
D) Has the same DPU and output path problems as C, and uses a full-access role rather than the Glue service role.
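The crawler in option B can be sketched as the arguments to the Glue `CreateCrawler` API. The crawler name, role ARN, database name, S3 path, and the 01:00 UTC schedule are all illustrative assumptions.

```python
def daily_crawler_params(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for Glue CreateCrawler with a daily schedule."""
    return {
        "Name": name,
        "Role": role_arn,                       # role that carries AWSGlueServiceRole
        "DatabaseName": database,               # Data Catalog database for the output
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": "cron(0 1 * * ? *)",        # every day at 01:00 UTC (assumed time)
    }


# Hypothetical names for illustration only.
crawler = daily_crawler_params(
    "portfolio-daily-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    "portfolio_db",
    "s3://example-portfolio-bucket/daily/",
)
print(crawler)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("glue").create_crawler(**crawler)
```

In practice the role also needs read access to the source bucket in addition to the Glue service policy.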
A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded. A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB. How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?
B) The Redshift Data API can publish statement events to EventBridge, and an EventBridge rule can invoke the Lambda function to write load statuses to DynamoDB. This gives decoupled, serverless, event-driven updates that follow Redshift activity.
A) Requires a separate Lambda function and CloudWatch events; it is more complex and less direct than the EventBridge integration with the Redshift Data API.
C) An SQS-to-Lambda path adds unnecessary queueing and is not the idiomatic mechanism for Redshift event notification.
D) CloudTrail events are audit logs and are not intended as real-time workflow triggers between Redshift and Lambda.
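A rough sketch of option B: the load statement is submitted through the Redshift Data API with `WithEvent` enabled, and an EventBridge rule matches the resulting status-change events. The cluster, database, user, and SQL below are hypothetical, and the rule-to-Lambda target wiring is not shown.

```python
import json


def load_statement_params(table: str) -> dict:
    """Build keyword arguments for redshift-data ExecuteStatement with events enabled."""
    return {
        "ClusterIdentifier": "example-cluster",   # hypothetical cluster
        "Database": "dev",                         # hypothetical database
        "DbUser": "loader",                        # hypothetical database user
        "Sql": f"CALL load_daily_{table}()",       # hypothetical stored procedure
        "StatementName": f"load-{table}",          # lets the Lambda identify the table
        "WithEvent": True,                         # publish status changes to EventBridge
    }


# Event pattern for an EventBridge rule that targets the Lambda function.
event_pattern = {
    "source": ["aws.redshift-data"],
    "detail-type": ["Redshift Data Statement Status Change"],
}

params = load_statement_params("transactions")
print(params)
print(json.dumps(event_pattern))

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("redshift-data").execute_statement(**params)
```

Naming each statement after its table is one possible convention that lets the Lambda function map an event back to a row in the DynamoDB status table.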
A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically. Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?
A) AWS DataSync is correct because it provides secure, automated, scheduled transfer of large on-premises datasets to S3, copies only the data that has changed, and works with files of any format, all without manual scripting.
B) AWS Glue is optimized for ETL processing and data cataloging, not for secure, ongoing bulk transfer from on-premises to S3 with scheduling and incremental sync.
C) AWS Direct Connect provides a dedicated network connection, not data movement orchestration or scheduling of transfers to S3.
D) Amazon S3 Transfer Acceleration speeds up individual uploads over long distances but is not designed for automated, scheduled, incremental sync from on-premises with ongoing updates.
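The scheduled, incremental transfer in option A can be sketched as the arguments to the DataSync `CreateTask` API. The location ARNs, task name, and the nightly schedule are placeholders chosen for illustration.

```python
def scheduled_sync_task(source_arn: str, destination_arn: str) -> dict:
    """Build keyword arguments for DataSync CreateTask with a nightly schedule."""
    return {
        "Name": "onprem-to-s3-nightly",                           # hypothetical name
        "SourceLocationArn": source_arn,                          # on-premises location
        "DestinationLocationArn": destination_arn,                # S3 location
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # 02:00 UTC (assumed)
        "Options": {"TransferMode": "CHANGED"},                   # copy only changed data
    }


# Placeholder ARNs for illustration only.
task = scheduled_sync_task(
    "arn:aws:datasync:us-east-1:123456789012:location/loc-source-example",
    "arn:aws:datasync:us-east-1:123456789012:location/loc-dest-example",
)
print(task)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("datasync").create_task(**task)
```

With roughly 5% of the data changing each day, the `CHANGED` transfer mode keeps each run limited to the modified files instead of the full 5 TB.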
A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently. The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database. Which AWS service should the company use to meet these requirements?
B) Correct. AWS DMS is designed for one-time and ongoing database migrations with minimal downtime. It supports continuous replication from on-premises SQL Server to RDS for SQL Server and is a cost-effective fit for the monthly migration.
A) AWS Lambda is event-driven compute that is unsuitable for large data migrations and lacks built-in data replication capabilities.
C) AWS Direct Connect provides dedicated network connectivity but does not handle data transformation or ongoing replication between on-premises and AWS.
D) AWS DataSync focuses on high-speed transfer of files and objects, not relational database replication to RDS.
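As an illustration of option B, a DMS replication task that combines a full load with ongoing change capture can be sketched as the arguments to `CreateReplicationTask`. The ARNs, the task identifier, and the `dbo` schema selection are assumptions rather than details from the question.

```python
import json


def replication_task_params(source_arn: str, target_arn: str, instance_arn: str) -> dict:
    """Build keyword arguments for DMS CreateReplicationTask (full load plus CDC)."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-dbo",
                # Assumed schema: include every table in dbo.
                "object-locator": {"schema-name": "dbo", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "sqlserver-to-rds-monthly",  # hypothetical name
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",   # keeps the source online during migration
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


# Placeholder ARNs for illustration only.
dms_task = replication_task_params(
    "arn:aws:dms:us-east-1:123456789012:endpoint:SOURCEEXAMPLE",
    "arn:aws:dms:us-east-1:123456789012:endpoint:TARGETEXAMPLE",
    "arn:aws:dms:us-east-1:123456789012:rep:INSTANCEEXAMPLE",
)
print(dms_task["MigrationType"])

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("dms").create_replication_task(**dms_task)
```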