A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs perform the functionality of the website. A data engineer needs to write a Python script that can be occasionally invoked through API Gateway. The code must return results to API Gateway. Which solution will meet these requirements with the LEAST operational overhead?
Answer(s): B
A) Incorrect: the least operational overhead comes from a Lambda function that API Gateway invokes directly, with no servers or containers to manage, which this option does not provide.
B) Correct: an AWS Lambda Python function with provisioned concurrency integrates directly with API Gateway, requires no infrastructure provisioning, and avoids cold-start latency.
C) Incorrect: Amazon EKS adds substantial Kubernetes management overhead, which does not fit “least overhead” for a small script invoked through API Gateway.
D) Incorrect: invoking a Lambda function on a schedule to keep it warm is unnecessary when provisioned concurrency is available, and the extra scheduling adds operational overhead and complexity.
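As a rough illustration of the Lambda approach, the sketch below shows a Python handler that returns a result in the shape API Gateway's Lambda proxy integration expects. It assumes proxy integration is used; the `name` query parameter and the response message are hypothetical and only there to make the example concrete.

```python
import json


def lambda_handler(event, context):
    """Entry point that API Gateway (Lambda proxy integration) would invoke."""
    # queryStringParameters is None when the caller sends no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # "name" is a hypothetical parameter

    # Proxy integration expects a status code, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Provisioned concurrency is configured on the function version or alias, not in the handler code, so the script itself stays this small.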
A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs. The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account. Which solution will meet these requirements?
Answer(s): D
The Kinesis data stream must reside in the destination (security) account, where CloudWatch Logs delivers the log events through a subscription filter. Creating the destination stream in the security account and granting CloudWatch Logs permission to put records into it, together with a subscription filter, provides cross-account delivery without requiring cross-account Kinesis Data Streams permissions in the source account.
A) Incorrect: a destination stream in the production account requires cross-account Kinesis permissions and does not match ownership by the central security account.
B) Incorrect: the subscription filter targets a Kinesis stream in the security account, but the option lacks the cross-account trust that CloudWatch Logs in the production account needs, so the workflow is incomplete.
C) Incorrect: the destination stream is in the production account, and a cross-account role from production to security is unnecessary and misaligned with a security-centric design.
D) Correct: the destination stream is in the security account, an IAM trust policy allows CloudWatch Logs to write to it, and a subscription filter delivers the production logs to the security account.
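The permissions involved can be sketched as policy documents. The example below is a minimal sketch that follows AWS's documented cross-account subscription pattern, in which the security account also exposes a CloudWatch Logs destination in front of the stream; that destination resource is an assumption not spelled out in the explanation above. The account IDs, Region, and resource names are hypothetical.

```python
import json

# Hypothetical identifiers used only for illustration.
PRODUCTION_ACCOUNT_ID = "111111111111"
SECURITY_ACCOUNT_ID = "222222222222"
REGION = "us-east-1"
STREAM_ARN = f"arn:aws:kinesis:{REGION}:{SECURITY_ACCOUNT_ID}:stream/security-logs"
DESTINATION_ARN = (
    f"arn:aws:logs:{REGION}:{SECURITY_ACCOUNT_ID}:destination:security-logs-destination"
)


def trust_policy():
    """Trust policy that lets the CloudWatch Logs service assume the delivery role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def role_permissions_policy():
    """Permissions that let the delivery role put records into the stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kinesis:PutRecord",
            "Resource": STREAM_ARN,
        }],
    }


def destination_access_policy():
    """Lets the production account attach a subscription filter to the destination."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": PRODUCTION_ACCOUNT_ID},
            "Action": "logs:PutSubscriptionFilter",
            "Resource": DESTINATION_ARN,
        }],
    }


if __name__ == "__main__":
    print(json.dumps(destination_access_policy(), indent=2))
```

The first two policies belong to an IAM role in the security account; the third is attached to the destination so that the production account can point its subscription filter at it.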
A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes. A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day and ingests the changed data into the data lake. Which solution will capture the changed data MOST cost-effectively?
Answer(s): C
The correct answer is C because an open source data lake format (such as Apache Iceberg or Delta Lake) enables ACID-compliant upserts and merges on a large-scale S3 data lake. CDC becomes a merge of each daily full snapshot into the existing data, without heavy per-row processing or data movement. This minimizes storage and compute costs for both tens-of-terabyte files and small files, and supports scalable incremental updates.
A) Incorrect: Lambda-based diffing on large datasets is prohibitively expensive and slow for multi-terabyte files.
B) Incorrect: DMS with RDS MySQL adds relational database maintenance and ongoing replication cost; CDC through DMS is not optimal for bulk merges into an S3 data lake.
D) Incorrect: Aurora Serverless with DMS adds database compute cost and complexity and is not the most cost-effective option for bulk lake merges.
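To make the merge idea concrete, the toy sketch below shows what a snapshot merge has to determine: which rows are new, which changed, and which disappeared. It is a plain in-memory illustration under the assumption that every row carries a unique `id` key; it is not the API of Apache Iceberg or Delta Lake, which perform the equivalent work at table scale.

```python
def diff_snapshots(previous, current, key="id"):
    """Classify rows of two full snapshots into inserts, updates, and deletes."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Rows whose key appears only in the new snapshot.
    inserts = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Rows present in both snapshots whose contents differ.
    updates = [
        row for k, row in curr_by_key.items()
        if k in prev_by_key and prev_by_key[k] != row
    ]
    # Rows whose key appears only in the old snapshot.
    deletes = [row for k, row in prev_by_key.items() if k not in curr_by_key]

    return {"inserts": inserts, "updates": updates, "deletes": deletes}
```

A table format's merge operation applies the same three-way classification, but writes only the affected data files instead of rewriting the whole dataset.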
A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table. The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time. Which solutions will meet these requirements? (Choose two.)
Answer(s): A,C
Athena query planning speeds up when fewer partitions have to be enumerated from the catalog. An AWS Glue partition index and Athena partition projection both achieve this.
A) Correct: a Glue partition index with partition filtering enabled lets Athena prune partitions at query planning time.
B) Incorrect: bucketing by a common column affects data layout, not partition discovery or metadata pruning in Athena with the Glue Data Catalog.
C) Correct: partition projection computes partition values and locations from table properties instead of looking them up in the Glue Data Catalog, so planning does not enumerate large numbers of partitions.
D) Incorrect: Parquet improves scan efficiency but does not directly reduce partition discovery or planning time; it affects I/O and scan cost rather than planning bottlenecks.
E) Incorrect: merging objects with S3DistCp reduces small-object overhead but does not affect Athena's partition planning or metadata pruning.
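The idea behind partition projection can be sketched in a few lines: partition locations are computed from a template and a value range rather than fetched from the catalog. The table properties below use Athena's documented property names, but the bucket, prefix, partition column `dt`, and date range are hypothetical, and the function is only an illustration of the computation, not Athena's implementation.

```python
from datetime import date, timedelta

# Hypothetical table properties of the kind partition projection relies on.
TABLE_PROPERTIES = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.format": "yyyy/MM/dd",
    "projection.dt.range": "2024/01/01,NOW",
    "storage.location.template": "s3://example-bucket/logs/${dt}/",
}


def projected_locations(template, start, end):
    """Compute one partition location per day instead of listing the catalog."""
    locations = []
    day = start
    while day <= end:
        # Substitute the partition value into the location template.
        locations.append(template.replace("${dt}", day.strftime("%Y/%m/%d")))
        day += timedelta(days=1)
    return locations
```

For a query that filters on three days, only three locations are computed, regardless of how many partitions exist in the bucket.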
A data engineer must manage the ingestion of real-time streaming data into AWS. The data engineer wants to perform real-time analytics on the incoming streaming data by using time-based aggregations over a window of up to 30 minutes. The data engineer needs a solution that is highly fault tolerant. Which solution will meet these requirements with the LEAST operational overhead?
Answer(s): D
A) Incorrect: a Lambda function that holds both business and analytics logic is operationally heavy and lacks built-in fault-tolerant state management for long windows; reprocessing and state handling add overhead.
B) Incorrect: Flink is the right service, but the option's claim about occasional duplicates is not aligned with Flink's exactly-once semantics and windowing.
C) Incorrect: a Lambda tumbling window on event time is less suitable for long-running, continuous aggregations. Lambda tumbling windows are limited to 15 minutes, so a 30-minute window would still require complex orchestration and fault handling.
D) Correct: the fully managed Apache Flink service provides stateful, fault-tolerant stream processing with native support for time-based windows (up to 30 minutes) and multiple aggregations, which minimizes operational overhead. It supports exactly-once semantics and scalable stateful processing for real-time analytics.
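The aggregation itself is simple to state. The sketch below groups events into 30-minute tumbling windows and sums their values in plain Python. It is only an illustration of the windowing logic under the assumption that events arrive as `(epoch_seconds, value)` pairs; it is not the Flink API, and it has none of the checkpointed state or fault tolerance that the managed Flink service provides.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute tumbling window


def tumbling_window_sums(events, window_seconds=WINDOW_SECONDS):
    """Sum event values per fixed, non-overlapping time window.

    events: iterable of (epoch_seconds, value) pairs.
    Returns a dict that maps each window's start time to the sum of its values.
    """
    sums = defaultdict(float)
    for timestamp, value in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        sums[window_start] += value
    return dict(sums)
```

Keeping these running sums durable across failures and restarts is the hard part, and that is the work a managed stream processor takes over.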
A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2 instances that will cause data loss during the migration to the upgraded storage. Which solution will meet these requirements with the LEAST operational overhead?
Answer(s): C
C) Correct: gp3 supports in-place volume type conversion and allows adjusting IOPS and throughput without data migration, which minimizes downtime and operational overhead.
A) Incorrect: taking snapshots and then creating and attaching new volumes introduces downtime during detachment and attachment, raises data consistency concerns, and adds steps that increase risk and cost.
B) Incorrect: a gradual data transfer requires coordination and can still affect availability; mount changes can cause brief interruptions.
D) Incorrect: AWS DataSync is unnecessary for an in-place change to the same block volume; it adds complexity and latency without any reliability benefit.
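The in-place conversion can be requested through the EC2 `ModifyVolume` API. The sketch below wraps boto3's `modify_volume` call; it takes the client as a parameter so that the function can be exercised without AWS credentials. The volume ID in the usage comment is hypothetical, and error handling and progress monitoring are left out.

```python
def migrate_gp2_to_gp3(ec2_client, volume_ids):
    """Request an in-place type change to gp3 for each volume.

    ec2_client: a boto3 EC2 client (or any object with a compatible
    modify_volume method). Returns the volume IDs that were submitted.
    """
    submitted = []
    for volume_id in volume_ids:
        # ModifyVolume changes the type while the volume stays attached.
        ec2_client.modify_volume(VolumeId=volume_id, VolumeType="gp3")
        submitted.append(volume_id)
    return submitted


# Usage (requires boto3 and AWS credentials; the volume ID is hypothetical):
#   import boto3
#   migrate_gp2_to_gp3(boto3.client("ec2"), ["vol-0123456789abcdef0"])
```

Optional `Iops` and `Throughput` arguments to `modify_volume` can be added when the gp3 defaults are not sufficient.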
A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must export large data elements every day until the migration is complete. The data elements are the result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics team must store the data in Amazon S3. Which solution will meet these requirements in the MOST operationally efficient way?
A) Using Glue to read directly from a live view on EC2 is not straightforward since Glue typically crawls data stores for cataloging; accessing an on-premises or EC2-hosted SQL Server view would require a persistent connection and proper JDBC/ODBC setup, which adds operational overhead.
B) SQL Server Agent on EC2 producing CSV to S3 and Lambda to convert adds unnecessary steps and potential ETL drift; managing two services increases operational burden.
C) Creating a view, then using a Glue crawler to catalog the view and a Glue job to extract and convert to Parquet into S3 provides a fully managed, serverless, repeatable, and scalable workflow with minimal maintenance.
D) Lambda with JDBC requires continuous connection management, cold start considerations, and scripting for Parquet conversion, which is less operationally efficient than Glue-based ETL.
A data engineering team is using an Amazon Redshift data warehouse for operational reporting. The team wants to prevent performance issues that might result from long-running queries. A data engineer must choose a system table in Amazon Redshift to record anomalies when a query optimizer identifies conditions that might indicate performance issues. Which table views should the data engineer use to meet this requirement?
Answer(s): B
A) Incorrect: STL_USAGE_CONTROL is related to usage controls and does not capture optimizer anomaly events.
B) Correct: STL_ALERT_EVENT_LOG records anomalies and alerts that the query optimizer raises when it identifies potential performance issues, which makes it the appropriate source for monitoring long-running or problematic queries.
C) Incorrect: STL_QUERY_METRICS contains per-query metrics but does not specifically log anomalies identified by the optimizer.
D) Incorrect: STL_PLAN_INFO provides plan details but is not a centralized anomaly or alert log.