Databricks Certified Data Engineer Professional Exam Questions


A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. A recent review of data files shows that most data files are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.

Which of the following likely explains these smaller file sizes?

  A. Databricks has autotuned to a smaller target file size to reduce the duration of MERGE operations
  B. Z-order indices calculated on the table are preventing file compaction
  C. Bloom filter indices calculated on the table are preventing file compaction
  D. Databricks has autotuned to a smaller target file size based on the overall size of data in the table
  E. Databricks has autotuned to a smaller target file size based on the amount of data in each partition

Answer(s): A
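Databricks documentation describes two kinds of file size autotuning: one based on overall table size, and one that lowers the target for tables whose workload is dominated by MERGE rewrites (associated with the `delta.tuneFileSizesForRewrites` table property). The plain-Python sketch below models only the size-based rule, using the thresholds that documentation gave at the time of writing (256 MB below about 2.56 TB, growing linearly to 1 GB at 10 TB); treat those numbers as assumptions and check current docs. It shows why option D does not fit: a table over 10 TB would target roughly 1 GB files, so files under 64 MB point to MERGE-driven tuning instead.

```python
# Toy model of table-size-based file size autotuning. The thresholds are
# assumptions taken from Databricks documentation, not a definitive spec.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

SMALL_TABLE_LIMIT = int(2.56 * TB)  # below this, the target stays at 256 MB
LARGE_TABLE_LIMIT = 10 * TB         # at or above this, the target is 1 GB
MIN_TARGET = 256 * MB
MAX_TARGET = 1 * GB


def size_based_target(table_bytes: int) -> int:
    """Return the assumed autotuned target file size for a table of this size."""
    if table_bytes < SMALL_TABLE_LIMIT:
        return MIN_TARGET
    if table_bytes >= LARGE_TABLE_LIMIT:
        return MAX_TARGET
    # Between the two limits, the target grows linearly from 256 MB to 1 GB.
    fraction = (table_bytes - SMALL_TABLE_LIMIT) / (LARGE_TABLE_LIMIT - SMALL_TABLE_LIMIT)
    return int(MIN_TARGET + fraction * (MAX_TARGET - MIN_TARGET))


if __name__ == "__main__":
    # The table in the question is over 10 TB, so size-based tuning alone
    # would keep files near 1 GB rather than shrinking them below 64 MB.
    print(size_based_target(11 * TB) // MB)  # 1024
```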



Which statement regarding stream-static joins and static Delta tables is correct?

  A. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch.
  B. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job's initialization.
  C. The checkpoint directory will be used to track state information for the unique keys present in the join.
  D. Stream-static joins cannot use static Delta tables because of consistency issues.
  E. The checkpoint directory will be used to track updates to the static Delta table.

Answer(s): A
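In a stream-static join, the static Delta side is not tracked in the checkpoint and holds no join state; each micro-batch reads whichever version of the static table is current when that micro-batch runs. The toy model below, in plain Python with a hypothetical `VersionedTable` class standing in for a Delta table, illustrates that behavior: an update written between micro-batches is visible to the next one. It sketches the semantics only, not how Spark implements the join.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical stand-in for a static Delta table: each write adds a version."""
    versions: list = field(default_factory=lambda: [{}])

    def write(self, updates: dict) -> None:
        # Copy the latest snapshot and apply the updates as a new version.
        snapshot = dict(self.versions[-1])
        snapshot.update(updates)
        self.versions.append(snapshot)

    def latest(self) -> dict:
        return self.versions[-1]


def run_microbatch(batch: list, static: VersionedTable) -> list:
    """Inner-join a micro-batch of (key, value) rows to the static side.

    The static snapshot is read when the micro-batch runs, so each micro-batch
    sees the latest version; nothing about the static side is kept between batches.
    """
    snapshot = static.latest()
    return [(key, value, snapshot[key]) for key, value in batch if key in snapshot]


if __name__ == "__main__":
    dims = VersionedTable()
    dims.write({"d1": "Paris"})
    print(run_microbatch([("d1", 20.5), ("d2", 18.0)], dims))  # [('d1', 20.5, 'Paris')]
    dims.write({"d2": "Oslo"})  # static table updated between micro-batches
    print(run_microbatch([("d2", 17.5)], dims))  # [('d2', 17.5, 'Oslo')]
```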



A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:



Choose the response that correctly fills in the blank within the code block to complete this task.

  A. to_interval("event_time", "5 minutes").alias("time")
  B. window("event_time", "5 minutes").alias("time")
  C. "event_time"
  D. window("event_time", "10 minutes").alias("time")
  E. lag("event_time", "10 minutes").alias("time")

Answer(s): B
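`window("event_time", "5 minutes")` from `pyspark.sql.functions` produces tumbling (non-overlapping) windows because no slide duration is given; a 10-minute window would produce the wrong interval. As a dependency-free illustration of what that grouping computes, the sketch below floors each `event_time` to a 5-minute boundary aligned to the Unix epoch (Spark's default alignment, assumed here with naive UTC datetimes) and averages `temp` and `humidity` per window. The helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)  # windows are aligned to the Unix epoch


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of its non-overlapping (tumbling) window."""
    return EPOCH + ((ts - EPOCH) // size) * size


def windowed_averages(events):
    """Average temp and humidity for each 5-minute window.

    events: iterable of (device_id, event_time, temp, humidity) tuples that
    mirror the schema in the question; device_id is carried but not grouped on.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # window -> [temp_sum, humidity_sum, count]
    for _device_id, event_time, temp, humidity in events:
        acc = sums[window_start(event_time)]
        acc[0] += temp
        acc[1] += humidity
        acc[2] += 1
    return {start: (t / n, h / n) for start, (t, h, n) in sums.items()}
```

Events at 12:00 through 12:04 land in the 12:00 window and events at 12:05 start a new one, so no event is counted twice.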



A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.

The proposed directory structure is displayed below:



Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

  A. No; Delta Lake manages streaming checkpoints in the transaction log.
  B. Yes; both of the streams can share a single checkpoint directory.
  C. No; only one stream can write to a Delta Lake table.
  D. Yes; Delta Lake supports infinite concurrent writers.
  E. No; each of the streams needs to have its own checkpoint directory.

Answer(s): E
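A checkpoint records which source offsets one particular query has processed (Spark keeps per-batch files under an `offsets/` folder, among others), so it belongs to exactly one query. The toy sketch below uses hypothetical `commit_batch` and `latest_offsets` helpers and a simplified JSON format that is not Spark's real one. It shows what goes wrong when two streams share a checkpoint: both commit batch 0, the second write overwrites the first, and stream A loses track of its Kafka offsets.

```python
import json
import os
import tempfile


def commit_batch(checkpoint_dir: str, batch_id: int, offsets: dict) -> None:
    """Record the source offsets a stream processed for one micro-batch (toy format)."""
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    os.makedirs(offsets_dir, exist_ok=True)
    with open(os.path.join(offsets_dir, str(batch_id)), "w") as f:
        json.dump(offsets, f)


def latest_offsets(checkpoint_dir: str) -> dict:
    """Return the offsets stored for the highest committed batch id."""
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    last = max(int(name) for name in os.listdir(offsets_dir))
    with open(os.path.join(offsets_dir, str(last))) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "checkpoint")
        commit_batch(shared, 0, {"topic_a": 100})  # stream A
        commit_batch(shared, 0, {"topic_b": 50})   # stream B reuses batch id 0
        # On restart, stream A no longer finds its own Kafka offsets.
        print(latest_offsets(shared))  # {'topic_b': 50}

        # With one checkpoint directory per stream, each keeps its own progress.
        commit_batch(os.path.join(root, "stream_a"), 0, {"topic_a": 100})
        commit_batch(os.path.join(root, "stream_b"), 0, {"topic_b": 50})
        print(latest_offsets(os.path.join(root, "stream_a")))  # {'topic_a': 100}
```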



A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds.

Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

  A. Decrease the trigger interval to 5 seconds; triggering batches more frequently allows idle executors to begin processing the next batch while longer-running tasks from previous batches finish.
  B. Increase the trigger interval to 30 seconds; setting the trigger interval near the maximum execution time observed for each batch is always best practice to ensure no records are dropped.
  C. The trigger interval cannot be modified without modifying the checkpoint directory; to maintain the current stream state, increase the number of shuffle partitions to maximize parallelism.
  D. Use the trigger once option and configure a Databricks job to execute the query every 10 seconds; this ensures all backlogged records are processed with each batch.
  E. Decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.

Answer(s): E
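A rough way to see why a shorter trigger helps: under a simplified model (an assumption, not Spark's actual scheduler), a record that arrives just after a micro-batch starts waits for the next trigger, or for the current batch to finish if it overruns the interval, and then for its own batch to complete. The hypothetical helper below computes that worst case.

```python
def worst_case_latency(trigger_interval_s: float, batch_duration_s: float) -> float:
    """Rough upper bound on end-to-end latency for a record under a fixed trigger.

    Simplified model: the next micro-batch starts at the later of the trigger
    time and the end of the current batch, and every batch takes the same time.
    """
    wait_for_next_batch = max(trigger_interval_s, batch_duration_s)
    return wait_for_next_batch + batch_duration_s


if __name__ == "__main__":
    print(worst_case_latency(10, 3))  # 13: misses the 10-second requirement
    print(worst_case_latency(5, 3))   # 8: meets it
```

The model holds batch duration fixed; in practice, smaller and more frequent batches also tend to run faster because they are less likely to spill, which is the reasoning behind option E.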



Which statement describes Delta Lake Auto Compaction?

  A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
  B. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
  C. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
  D. Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
  E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB.

Answer(s): E
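Auto compaction groups small files into larger ones toward a 128 MB target (per Databricks docs, adjustable through `spark.databricks.delta.autoCompact.maxFileSize`), smaller than the 1 GB that a manual OPTIMIZE aims for. Note that current Databricks documentation describes auto compaction as running synchronously on the writing cluster after a write succeeds, so the "asynchronous" wording in the options may reflect older documentation. The sketch below is a toy first-fit-decreasing bin-packing plan with a hypothetical `plan_compaction` helper; it is not Databricks' actual compaction algorithm.

```python
MB = 1024 ** 2
TARGET = 128 * MB  # auto compaction target size named in option E


def plan_compaction(file_sizes, target=TARGET):
    """Greedily group small files into bins of at most `target` bytes.

    Files already at or above the target are left alone. Uses first-fit
    decreasing; only bins that merge two or more files are returned, since
    a bin holding a single file would not reduce the file count.
    """
    small = sorted((s for s in file_sizes if s < target), reverse=True)
    bins = []
    for size in small:
        for b in bins:
            if sum(b) + size <= target:
                b.append(size)
                break
        else:
            bins.append([size])
    return [b for b in bins if len(b) > 1]
```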



Which statement characterizes the general programming model used by Spark Structured Streaming?

  A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
  B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
  C. Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.
  D. Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
  E. Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.

Answer(s): D
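The unbounded-table model can be sketched in a few lines of plain Python: each trigger appends new rows to a conceptual input table, and the query result is updated incrementally from just those rows. The `UnboundedTableQuery` class is hypothetical; Spark does not actually materialize the full input table, and the toy keeps it only so the incremental result can be checked against a full recomputation.

```python
from collections import Counter


class UnboundedTableQuery:
    """Toy model: stream data as rows appended to an ever-growing input table."""

    def __init__(self):
        self.input_table = []    # conceptually unbounded; grows every trigger
        self.result = Counter()  # running state for a count-per-key aggregation

    def trigger(self, new_rows):
        self.input_table.extend(new_rows)  # new data arrives as appended rows
        self.result.update(new_rows)       # incremental update from new rows only
        return dict(self.result)


if __name__ == "__main__":
    q = UnboundedTableQuery()
    q.trigger(["cat", "dog"])
    print(q.trigger(["cat"]))  # {'cat': 2, 'dog': 1}
```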



Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?

  A. spark.sql.files.maxPartitionBytes
  B. spark.sql.autoBroadcastJoinThreshold
  C. spark.sql.files.openCostInBytes
  D. spark.sql.adaptive.coalescePartitions.minPartitionNum
  E. spark.sql.adaptive.advisoryPartitionSizeInBytes

Answer(s): A
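`spark.sql.files.maxPartitionBytes` (default 128 MB) caps how many bytes of file data are packed into one partition when reading file-based sources. The sketch below is a Python port of the split-size calculation as it appears in Spark 3.x's `FilePartition.maxSplitBytes`, based on a reading of the source and worth re-checking against your Spark version. `spark.sql.files.openCostInBytes` (default 4 MB) acts as a per-file overhead and a lower bound, and for small inputs the split size drops below `maxPartitionBytes` so that every core gets work. The `default_parallelism` parameter stands in for `spark.sql.files.minPartitionNum` or, when that is unset, the cluster's default parallelism.

```python
MB = 1024 ** 2


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB, open_cost_in_bytes=4 * MB):
    """Approximate the per-partition byte budget Spark uses when reading files.

    Defaults mirror spark.sql.files.maxPartitionBytes and
    spark.sql.files.openCostInBytes.
    """
    # Each file is charged its length plus the estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))
```

For example, ten 1 MB files on 8 cores are charged 50 MB in total, giving a split size of 6.25 MB, while a single 10 GB file is capped at 128 MB per partition.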



