Databricks Certified Data Engineer Professional Databricks-Certified-Professional-Data-Engineer Exam Questions in PDF


Which REST API call can be used to review the notebooks configured to run as tasks in a multi-task job?

  A. /jobs/runs/list
  B. /jobs/runs/get-output
  C. /jobs/runs/get
  D. /jobs/get
  E. /jobs/list

Answer(s): D
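
To see why /jobs/get is the call that exposes the task definitions, here is a minimal sketch. It assumes the Jobs API 2.1 response shape, in which each task sits under settings.tasks and a notebook task carries a notebook_task.notebook_path field. The helper name notebook_tasks and the sample payload are hypothetical; a real request would be an authenticated GET to /api/2.1/jobs/get?job_id=<id> on your workspace.

```python
def notebook_tasks(job: dict) -> dict:
    """Map task_key -> notebook_path for every notebook task in a /jobs/get response."""
    tasks = job.get("settings", {}).get("tasks", [])
    return {
        task["task_key"]: task["notebook_task"]["notebook_path"]
        for task in tasks
        if "notebook_task" in task  # skip non-notebook task types
    }


# Hypothetical, hand-written payload in the assumed response shape.
sample_response = {
    "job_id": 123,
    "settings": {
        "name": "example-multi-task-job",
        "tasks": [
            {"task_key": "A", "notebook_task": {"notebook_path": "/Jobs/ingest"}},
            {
                "task_key": "B",
                "depends_on": [{"task_key": "A"}],
                "notebook_task": {"notebook_path": "/Jobs/transform"},
            },
            {
                "task_key": "C",
                "depends_on": [{"task_key": "A"}],
                "spark_jar_task": {"main_class_name": "com.example.Main"},
            },
        ],
    },
}

print(notebook_tasks(sample_response))
```

The /jobs/runs/* endpoints describe individual executions of a job, whereas the job definition returned by /jobs/get is where the configured tasks live.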



A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on task A.

If tasks A and B complete successfully but task C fails during a scheduled run, which statement describes the resulting state?

  A. All logic expressed in the notebook associated with tasks A and B will have been successfully completed; some operations in task C may have completed successfully.
  B. All logic expressed in the notebook associated with tasks A and B will have been successfully completed; any changes made in task C will be rolled back due to task failure.
  C. All logic expressed in the notebook associated with task A will have been successfully completed; tasks B and C will not commit any changes because of stage failure.
  D. Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.
  E. Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task C failed, all commits will be rolled back automatically.

Answer(s): A
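
The behavior described in the answer can be illustrated with a toy model. This is only a sketch in plain Python, not Databricks code: the list committed stands in for writes to Lakehouse tables, the task functions and run_job are hypothetical, and B and C run one after the other here rather than in parallel. The point it shows is that a job has no job-level transaction, so work committed before a failure stays committed.

```python
committed = []  # stands in for changes already written to tables


def task_a():
    committed.append("A: step 1")


def task_b():
    committed.append("B: step 1")


def task_c():
    committed.append("C: step 1")  # this write lands before the failure
    raise RuntimeError("simulated failure in task C")


# (task_key, callable, upstream dependencies), listed in a valid run order.
JOB = [("A", task_a, []), ("B", task_b, ["A"]), ("C", task_c, ["A"])]


def run_job(job):
    """Run tasks in order; skip a task when any upstream task did not succeed."""
    states = {}
    for key, fn, deps in job:
        if any(states.get(dep) != "SUCCESS" for dep in deps):
            states[key] = "SKIPPED"
            continue
        try:
            fn()
            states[key] = "SUCCESS"
        except Exception:
            states[key] = "FAILED"  # nothing is rolled back here
    return states


states = run_job(JOB)
print(states)
print(committed)
```

After the run, task C is marked as failed, yet its first write is still present alongside the writes from A and B.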



A Delta Lake table was created with the below query:



Realizing that the original query had a typographical error, the below code was executed:

ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store

Which result will occur after running the second command?

  A. The table reference in the metastore is updated and no data is changed.
  B. The table name change is recorded in the Delta transaction log.
  C. All related files and metadata are dropped and recreated in a single ACID transaction.
  D. The table reference in the metastore is updated and all data files are moved.
  E. A new Delta transaction log is created for the renamed table.

Answer(s): A



The data engineering team maintains a table of aggregate statistics through batch nightly updates. This includes total sales for the previous day alongside totals and averages for a variety of time periods including the 7 previous days, year-to-date, and quarter-to-date. This table is named store_sales_summary and the schema is as follows:



The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is:

store_id INT, sales_date DATE, total_sales FLOAT

If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest to generate accurate reports in the store_sales_summary table?

  A. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and overwrite the store_sales_summary table with each update.
  B. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and append new rows nightly to the store_sales_summary table.
  C. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  D. Implement the appropriate aggregate logic as a Structured Streaming read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  E. Use Structured Streaming to subscribe to the change data feed for daily_store_sales and apply changes to the aggregates in the store_sales_summary table with each update.

Answer(s): C
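
To make the batch-plus-upsert approach concrete, here is a toy sketch in plain Python. It is an illustration under simplifying assumptions, not the pipeline itself: the summary is reduced to a single running total per store_id, the helper names summarize and upsert are hypothetical, and on Delta Lake the upsert step would normally be a MERGE INTO keyed on store_id. It shows that recomputing from the audited source and upserting picks up a corrected total_sales value without duplicating rows, whereas appending would.

```python
def summarize(daily_rows):
    """Batch aggregate: total sales per store from (store_id, sales_date, total_sales) rows."""
    totals = {}
    for store_id, _sales_date, total_sales in daily_rows:
        totals[store_id] = totals.get(store_id, 0.0) + total_sales
    return totals


def upsert(summary, new_totals):
    """Update existing keys and insert new ones, one row per store_id."""
    summary.update(new_totals)


daily_store_sales = [
    (1, "2024-01-01", 100.0),
    (1, "2024-01-02", 50.0),
    (2, "2024-01-01", 70.0),
]

summary = {}        # upsert target
appended_rows = []  # what an append-only target would accumulate

# Night 1.
upsert(summary, summarize(daily_store_sales))
appended_rows.extend(summarize(daily_store_sales).items())

# A manual audit corrects a value in place (Type 1: no history kept).
daily_store_sales[0] = (1, "2024-01-01", 120.0)

# Night 2: recompute from the full source and upsert again.
upsert(summary, summarize(daily_store_sales))
appended_rows.extend(summarize(daily_store_sales).items())

print(summary)
print(appended_rows)
```

The upsert target ends with one corrected row per store, while the append-only list holds both the stale and the corrected totals for the same stores.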



A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.



Which command should be removed from the notebook before scheduling it as a job?

  A. Cmd 2
  B. Cmd 3
  C. Cmd 4
  D. Cmd 5
  E. Cmd 6

Answer(s): E



The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their dashboards is 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

  A. Manually trigger a job anytime the business reporting team refreshes their dashboards
  B. Schedule a job to execute the pipeline once an hour on a new job cluster
  C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
  D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
  E. Configure a job that executes every time new data lands in a given directory

Answer(s): B
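
For reference, an hourly schedule on a new job cluster might be expressed roughly as follows. This is a sketch of a Jobs API 2.1 create payload written as a Python dict; the job name, notebook path, Spark version, node type, and worker count are placeholder assumptions, not values from the question. The relevant parts are the Quartz cron expression, which fires at the top of every hour, and the use of new_cluster instead of existing_cluster_id, so that compute exists only for the roughly 10 minutes each run needs.

```python
hourly_job = {
    "name": "hourly-dashboard-refresh",  # placeholder name
    "schedule": {
        # Quartz fields: second minute hour day-of-month month day-of-week
        "quartz_cron_expression": "0 0 * * * ?",
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/reporting/etl"},  # placeholder path
            # A job cluster is created for the run and terminated afterwards.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # placeholder node type
                "num_workers": 2,
            },
        }
    ],
}

uses_job_cluster = all(
    "new_cluster" in task and "existing_cluster_id" not in task
    for task in hourly_job["tasks"]
)
print(uses_job_cluster)
```

Pointing the task at an always-on interactive cluster through existing_cluster_id would also meet the hourly requirement, but that cluster would sit idle for most of each hour.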



A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:

SELECT COUNT(*) FROM table

Which of the following describes how results are generated each time the dashboard is updated?

  A. The total count of rows is calculated by scanning all data files
  B. The total count of rows will be returned from cached results unless REFRESH is run
  C. The total count of records is calculated from the Delta transaction logs
  D. The total count of records is calculated from the parquet file metadata
  E. The total count of records is calculated from the Hive metastore

Answer(s): C
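
The idea behind this answer can be sketched without Spark. In the Delta transaction log, each add action may carry a stats field, a JSON string that includes numRecords for that data file, and a remove action retires a file. The sketch below assumes stats are present on every add action (they are optional in the protocol) and uses hand-written log lines; the function name count_from_log is hypothetical. It replays the log and sums numRecords over the files that are still active, without opening any data file.

```python
import json


def count_from_log(log_lines):
    """Replay add/remove actions and sum numRecords over the active files."""
    active = {}  # path -> numRecords
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            stats = json.loads(add["stats"])  # stats is itself a JSON string
            active[add["path"]] = stats["numRecords"]
        elif "remove" in action:
            active.pop(action["remove"]["path"], None)
    return sum(active.values())


# Hand-written log lines in the assumed shape (most fields omitted).
log = [
    json.dumps({"add": {"path": "part-0.parquet", "stats": json.dumps({"numRecords": 3})}}),
    json.dumps({"add": {"path": "part-1.parquet", "stats": json.dumps({"numRecords": 5})}}),
    json.dumps({"remove": {"path": "part-0.parquet"}}),
    json.dumps({"add": {"path": "part-2.parquet", "stats": json.dumps({"numRecords": 4})}}),
]

print(count_from_log(log))
```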



A Delta Lake table was created with the below query:



Consider the following query:

DROP TABLE prod.sales_by_store

If this statement is executed by a workspace admin, which result will occur?

  A. Nothing will occur until a COMMIT command is executed.
  B. The table will be removed from the catalog but the data will remain in storage.
  C. The table will be removed from the catalog and the data will be deleted.
  D. An error will occur because Delta Lake prevents the deletion of production data.
  E. Data will be marked as deleted but still recoverable with Time Travel.

Answer(s): C



Share your comments for Databricks Databricks-Certified-Professional-Data-Engineer exam with other users:

Anonymous User
4/16/2026 10:54:18 AM

Question 1:

  • Correct answer: E. date = sys.argv[1]
  • Why this is correct:
- When a Databricks Job passes parameters to a notebook, those parameters are supplied to the notebook's Python process as command-line arguments. The first argument after the script name is sys.argv[1], so date = sys.argv[1] captures the passed date value directly.
  • How it compares to other options:
- date = spark.conf.get("date") reads from Spark config, not from job parameters.
- input() waits for user input at runtime, which isn’t how job parameters are provided.
- date = dbutils.notebooks.getParam("date") would work if the notebook were invoked via dbutils.notebook.run with parameters, not
