Amazon DAS-C01 Exam (page: 3)
Amazon AWS Certified Data Analytics - Specialty (DAS-C01)
Updated on: 29-Mar-2026

Viewing Page 3 of 22

A team of data scientists plans to analyze market trend data for their company's new investment strategy. The trend data comes from five different data sources in large volumes. The team wants to utilize Amazon Kinesis to support their use case. The team uses SQL-like queries to analyze trends and wants to send notifications based on certain significant patterns in the trends. Additionally, the data scientists want to save the data to Amazon S3 for archival and historical re-processing, and use AWS managed services wherever possible. The team wants to implement the lowest-cost solution.
Which solution meets these requirements?

  1. Publish data to one Kinesis data stream. Deploy a custom application using the Kinesis Client Library (KCL) for analyzing trends, and send notifications using Amazon SNS. Configure Kinesis Data Firehose on the Kinesis data stream to persist data to an S3 bucket.
  2. Publish data to one Kinesis data stream. Deploy Kinesis Data Analytics to the stream for analyzing trends, and configure an AWS Lambda function as an output to send notifications using Amazon SNS. Configure Kinesis Data Firehose on the Kinesis data stream to persist data to an S3 bucket.
  3. Publish data to two Kinesis data streams. Deploy Kinesis Data Analytics to the first stream for analyzing trends, and configure an AWS Lambda function as an output to send notifications using Amazon SNS. Configure Kinesis Data Firehose on the second Kinesis data stream to persist data to an S3 bucket.
  4. Publish data to two Kinesis data streams. Deploy a custom application using the Kinesis Client Library (KCL) to the first stream for analyzing trends, and send notifications using Amazon SNS. Configure Kinesis Data Firehose on the second Kinesis data stream to persist data to an S3 bucket.

Answer(s): B

Explanation:

A concise explanation of the correct choice and why others are incorrect follows.
B) Correct: Kinesis Data Analytics runs SQL queries over streaming data, which matches the team's SQL-like analysis requirement; a Lambda output can send notifications through Amazon SNS, and Kinesis Data Firehose reading from the same stream persists the data to S3 for archival. Every component is managed, and a single stream keeps ingestion costs to a minimum.
A) Incorrect: a custom KCL application must run on compute the team provisions and maintains, increasing operational burden and cost compared with managed Kinesis Data Analytics for the SQL-style analysis.
C) Incorrect: publishing the same data to two streams doubles ingestion cost for no benefit; both Kinesis Data Analytics and Kinesis Data Firehose can consume from a single stream.
D) Incorrect: combines the drawbacks of A and C: a self-managed KCL application plus a redundant second stream.
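A minimal sketch of the Lambda output function in option B. The event shape (base64-encoded `records` delivered by the Kinesis Data Analytics SQL application) and the injectable `publish` callable are illustrative assumptions; in production, `publish` would wrap boto3's `sns.publish` with a real topic ARN.

```python
import base64
import json

def handler(event, publish=None):
    """Receive output records from a Kinesis Data Analytics SQL
    application and forward each one as an SNS notification.
    `publish` is injected to keep the sketch testable; real code
    would call boto3: sns.publish(TopicArn=..., Message=...)."""
    results = []
    for record in event["records"]:
        trend = json.loads(base64.b64decode(record["data"]))
        if publish is not None:
            publish(Message=json.dumps(trend),
                    Subject="Significant market trend detected")
        # Acknowledge delivery back to Kinesis Data Analytics.
        results.append({"recordId": record["recordId"], "result": "Ok"})
    return {"records": results}
```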



A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.
What should the company do to achieve this goal?

  1. Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.
  2. Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2.
  3. Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.
  4. Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.

Answer(s): B

Explanation:

Amazon S3 paths are globally addressable, so an AWS Glue crawler running in us-west-2 can crawl buckets in any Region and register their tables in a single us-west-2 Data Catalog, which Athena in us-west-2 then queries directly.
B) Correct: one crawler in us-west-2 catalogs the datasets in both Regions, enabling centralized queries at minimal cost and without duplicating data.
A) Incorrect: AWS DMS migrates databases, not the AWS Glue Data Catalog, and no migration is needed for this goal.
C) Incorrect: cross-Region replication duplicates storage and incurs transfer costs; the data does not need to be copied in order to be cataloged and queried.
D) Incorrect: a Glue resource policy grants cross-account/cross-Region catalog access but does not register the us-east-1 tables in the us-west-2 catalog for Athena to query; the tables still must be cataloged.
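Option B can be sketched as a single crawler definition in us-west-2 whose S3 targets span both Regions, since bucket paths are global. The bucket names, role ARN, and database name below are hypothetical placeholders.

```python
# Parameters for boto3's glue.create_crawler(**crawler_config); the
# commented-out calls would be issued against the us-west-2 endpoint.
crawler_config = {
    "Name": "global-datasets-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical
    "DatabaseName": "global_db",
    "Targets": {
        "S3Targets": [
            {"Path": "s3://company-data-us-east-1/"},  # bucket physically in us-east-1
            {"Path": "s3://company-data-us-west-2/"},  # bucket physically in us-west-2
        ]
    },
}
# import boto3
# glue = boto3.client("glue", region_name="us-west-2")
# glue.create_crawler(**crawler_config)
# glue.start_crawler(Name=crawler_config["Name"])
```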



A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?

  1. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
  2. Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
  3. Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
  4. Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.

Answer(s): B

Explanation:

The correct answer is B. COPY loads input files in parallel, one file per slice at a time, so making the number of files a multiple of the cluster's slice count keeps every slice busy and maximizes load throughput. Splitting into roughly equal-sized, gzip-compressed files also reduces transfer time to Amazon S3.
A is incorrect because running COPY on individual files as they arrive (or on one large combined file) cannot exploit parallelism across all slices. C is incorrect because COPY parallelism is determined by the number of slices, not the number of compute nodes; each node contains multiple slices. D is incorrect because aligning files to distkey values helps join locality, not load speed; Redshift redistributes rows according to the table's distribution style regardless of how the input files are sharded.
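The splitting step from option B can be sketched in Python. The slice count per node varies by node type (query STV_SLICES on the cluster to confirm it), and the line-oriented split below is a simplification that assumes text files.

```python
import gzip
from pathlib import Path

def split_and_gzip(src: Path, out_dir: Path, n_slices: int, multiple: int = 1):
    """Split one large text file into n_slices * multiple roughly
    equal parts and gzip each, so a single COPY with the common
    prefix can load them in parallel across all slices."""
    n_parts = n_slices * multiple
    lines = src.read_text().splitlines(keepends=True)
    chunk = -(-len(lines) // n_parts)  # ceiling division
    paths = []
    for i in range(n_parts):
        part = out_dir / f"{src.stem}.part{i:03d}.gz"
        with gzip.open(part, "wt") as f:
            f.writelines(lines[i * chunk:(i + 1) * chunk])
        paths.append(part)
    return paths
```

After uploading the parts under one key prefix, a single `COPY ... FROM 's3://bucket/prefix' GZIP` picks them all up.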


Reference:

https://docs.aws.amazon.com/redshift/latest/dg/t_splitting-data-files.html



A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
A trips fact table for information on completed rides.
A drivers dimension table for driver profiles.
A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?

  1. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
  2. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  3. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  4. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.

Answer(s): C

Explanation:

The optimal design distributes the trips fact table by destination so rows for the same destination are collocated for the region-level profitability analysis, with a date sort key to prune scans by date. Because the drivers table rarely changes, DISTSTYLE ALL replicates it to every node and makes joins fast at negligible maintenance cost. Because the customers table changes frequently, DISTSTYLE EVEN spreads it uniformly and avoids the write amplification of maintaining a full copy on every node.
A) Incorrect: DISTSTYLE ALL on the frequently changing customers table forces every update to be applied on every node, adding maintenance overhead.
B) Incorrect: DISTSTYLE EVEN on trips scatters rows for the same destination across slices, hurting the destination-based aggregations the company runs.
D) Incorrect: DISTSTYLE ALL on large, frequently changing tables is impractical, and EVEN on drivers gives up the cheap replicated joins available for a small, stable dimension.
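Option C translates into DDL along the following lines. The column names and types are assumed for illustration; only the DISTSTYLE/DISTKEY/SORTKEY choices come from the answer.

```python
# Sketch of the Redshift DDL implied by option C.
TRIPS_DDL = """
CREATE TABLE trips (
    trip_id     BIGINT,
    trip_date   DATE,
    destination VARCHAR(64),
    driver_id   INTEGER,
    customer_id INTEGER,
    fare        DECIMAL(10, 2)
)
DISTSTYLE KEY DISTKEY (destination)
SORTKEY (trip_date);
"""

DRIVERS_DDL = """
CREATE TABLE drivers (
    driver_id INTEGER,
    name      VARCHAR(128)
)
DISTSTYLE ALL;
"""

CUSTOMERS_DDL = """
CREATE TABLE customers (
    customer_id INTEGER,
    name        VARCHAR(128)
)
DISTSTYLE EVEN;
"""
```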



Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.
Which steps will satisfy the security requirements?

  1. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the additional IAM roles to the cluster's EMR role for the EC2 trust policy. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  2. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  3. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  4. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the base IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

Answer(s): B

Explanation:

The correct option enforces least privilege: the EMR EC2 service role itself grants no S3 access, per-team IAM roles grant access only to each team's bucket, those roles trust the EMR EC2 service role so instances can assume them, and the EMR security configuration maps each role to the corresponding Active Directory group.
A) Incorrect: the trust relationship belongs on the per-team roles (naming the EC2 service role as the trusted principal), not on the cluster's role, so the instance profile could not assume the team roles as described.
C) Incorrect: granting the service role full S3 access defeats per-team isolation and violates least privilege.
D) Incorrect: editing the trust policies of the base roles instead of the per-team bucket roles leaves the permission scoping wrong, in addition to the overly broad S3 access.
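The trust relationship from option B can be sketched as a policy document attached to each per-team role. The account ID and role names are hypothetical placeholders.

```python
import json

# Trust policy for a per-team bucket role: it names the EMR EC2
# service role as the principal allowed to assume the team role, so
# the security configuration can map the role to the team's AD group.
TEAM_ROLE_TRUST_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/EMR_EC2_NoS3_Role"},
        "Action": "sts:AssumeRole",
    }],
})
# iam.create_role(RoleName="TeamA-BucketAccess",
#                 AssumeRolePolicyDocument=TEAM_ROLE_TRUST_POLICY)
```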



A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited. Which combination of components can meet these requirements? (Choose three.)

  1. AWS Glue Data Catalog for metadata management
  2. Amazon EMR with Apache Spark for ETL
  3. AWS Glue for Scala-based ETL
  4. Amazon EMR with Apache Hive for JDBC clients
  5. Amazon Athena for querying data in Amazon S3 using JDBC drivers
  6. Amazon EMR with Apache Hive, using an Amazon RDS with MySQL-compatible backed metastore

Answer(s): A,C,E

Explanation:

A) AWS Glue Data Catalog for metadata management
C) AWS Glue for Scala-based ETL
E) Amazon Athena for querying data in Amazon S3 using JDBC drivers
A) Glue Data Catalog provides centralized metadata management with federation-compatible IAM and fine-grained access control, enabling cross-account and cross-service metadata access suitable for a data lake with tiered storage.
C) Glue supports PySpark and Scala-based ETL jobs via AWS Glue Studio/ETL, aligning with batch-based processing requirements.
E) Athena natively queries S3 data and supports JDBC-based connections through drivers, enabling legacy JDBC clients without heavy operational overhead.
B) Incorrect: EMR with Spark requires provisioning and operating a cluster, conflicting with the limited-operations requirement when AWS Glue can run equivalent PySpark and Scala jobs serverlessly.
D) Incorrect: EMR with Hive can serve JDBC clients, but it adds ongoing cluster operations and does not provide the federated metadata access control required.
F) Incorrect: a self-managed Hive metastore backed by Amazon RDS adds an unnecessary database dependency and operational overhead compared with the AWS Glue Data Catalog.
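As a sketch of option C, a Scala ETL job in AWS Glue is selected through the `--job-language` default argument. The job name, role ARN, script location, and class name below are hypothetical placeholders.

```python
# Parameters for boto3's glue.create_job(**scala_job); "glueetl" is
# the command name for Spark-based Glue jobs, and the default
# arguments switch the job from the PySpark default to Scala.
scala_job = {
    "Name": "tiered-lake-etl",
    "Role": "arn:aws:iam::123456789012:role/GlueJobRole",  # hypothetical
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://my-etl-scripts/TieredLakeEtl.scala",
    },
    "DefaultArguments": {
        "--job-language": "scala",
        "--class": "TieredLakeEtl",
    },
    "GlueVersion": "4.0",
}
# import boto3
# glue = boto3.client("glue")
# glue.create_job(**scala_job)
```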


Reference:

https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf



A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?

  1. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  2. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  3. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
  4. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.

Answer(s): A

Explanation:

A) Correct: converting to a compressed, partitioned columnar format (such as Apache Parquet) minimizes the data Athena scans, keeping the typical query under 1 minute and reducing per-query cost. Moving processed data to S3 Standard-IA 5 years after creation matches the access pattern, and moving raw data to S3 Glacier 7 days after creation retains it indefinitely for compliance at the lowest storage cost.
B) and D) Incorrect: a row-based format forces Athena to scan far more data per query, increasing both latency and cost.
C) and D) Incorrect: S3 lifecycle transitions are based on object age (days since creation), not on when an object was last accessed, so these policies cannot be configured as described.
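The two lifecycle rules from option A can be sketched as a single bucket lifecycle configuration. The `raw/` and `processed/` prefixes are an assumed bucket layout, and 1825 days approximates 5 years.

```python
# Payload for boto3's
# s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)
lifecycle = {
    "Rules": [
        {
            "ID": "raw-to-glacier-7d",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
        },
        {
            "ID": "processed-to-ia-5y",
            "Filter": {"Prefix": "processed/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 1825, "StorageClass": "STANDARD_IA"}],
        },
    ]
}
```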



An energy company collects voltage data in real time from sensors that are attached to buildings. The company wants to receive notifications when a sequence of two voltage drops is detected within 10 minutes of a sudden voltage increase at the same building. All notifications must be delivered as quickly as possible. The system must be highly available. The company needs a solution that will automatically scale when this monitoring feature is implemented in other cities. The notification system is subscribed to an Amazon Simple Notification Service (Amazon SNS) topic for remediation.
Which solution will meet these requirements?

  1. Create an Amazon Managed Streaming for Apache Kafka cluster to ingest the data. Use an Apache Spark Streaming with Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Use the Spark Streaming application to detect the known event sequence and send the SNS message.
  2. Create a REST-based web service by using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS to meet current demand. Configure the Lambda function to store incoming events in the RDS for PostgreSQL database, query the latest data to detect the known event sequence, and send the SNS message.
  3. Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.
  4. Create an Amazon Kinesis data stream to capture the incoming sensor data. Create another stream for notifications. Set up AWS Application Auto Scaling on both streams. Create an Amazon Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream. Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.

Answer(s): A

Explanation:

A) Correct: Amazon MSK provides highly available, scalable ingestion, and Spark Streaming on an automatically scaling EMR cluster can hold the per-building state needed to detect the spike-then-two-drops sequence within a 10-minute window and publish directly to SNS with low latency.
B) Incorrect: API Gateway with Lambda writing events to RDS and repeatedly querying for patterns adds latency and does not scale well for high-volume streaming sequence detection.
C) Incorrect: Kinesis Data Firehose is a delivery service; its Lambda transformation operates on buffered batches of individual records and cannot maintain the cross-event, per-building state a windowed sequence detection requires.
D) Incorrect: the second stream plus a polling Lambda adds latency and operational moving parts without a clear advantage over the single streaming application in option A.
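The detection logic itself can be prototyped independently of any streaming service. The sketch below assumes events have already been classified upstream as a sudden "increase" or a "drop" and arrive in timestamp order; a production Spark Streaming job would keep equivalent keyed state per building.

```python
from collections import deque

def detect(events, window_s=600):
    """Return (building, ts) alerts where two voltage drops occur
    within `window_s` seconds of a sudden increase at the same
    building. `events` are (ts, building, kind) tuples, kind in
    {"increase", "drop"}, sorted by ts."""
    increases = {}  # building -> timestamp of the latest sudden increase
    drops = {}      # building -> timestamps of drops since that increase
    alerts = []
    for ts, building, kind in events:
        if kind == "increase":
            increases[building] = ts
            drops[building] = deque()
        elif kind == "drop" and building in increases:
            d = drops[building]
            d.append(ts)
            if len(d) >= 2 and d[-1] - increases[building] <= window_s:
                alerts.append((building, ts))
                del increases[building]  # alert at most once per spike
    return alerts
```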


Reference:

https://aws.amazon.com/kinesis/data-streams/faqs/


