Databricks Databricks Certified Associate Developer for Apache Spark 3.0 Exam (page: 4)
Databricks Certified Associate Developer for Apache Spark
Updated on: 09-Apr-2026

Which of the following describes a shuffle?

  A. A shuffle is a process that is executed during a broadcast hash join.
  B. A shuffle is a process that compares data across executors.
  C. A shuffle is a process that compares data across partitions.
  D. A shuffle is a Spark operation that results from DataFrame.coalesce().
  E. A shuffle is a process that allocates partitions to executors.

Answer(s): C

Explanation:

A shuffle is a Spark operation that results from DataFrame.coalesce().
No, DataFrame.coalesce() does not result in a shuffle.
A shuffle is a process that allocates partitions to executors.
No, this is incorrect. A shuffle exchanges data between partitions; it does not assign partitions to executors.
A shuffle is a process that is executed during a broadcast hash join.
No, broadcast hash joins avoid shuffles and yield performance benefits if at least one of the two tables is small (<= 10 MB by default). Instead of exchanging partitions between executors, Spark broadcasts the small table to all executors, which then perform the rest of the join locally.
A shuffle is a process that compares data across executors.
No, in a shuffle, data is compared across partitions, not executors.
More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)
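
To make the idea concrete, here is a minimal plain-Python sketch of what a shuffle does conceptually: rows held in several input partitions are redistributed so that all rows with the same key end up in the same output partition. This is a toy model and not Spark code; the function name shuffle_by_key, the use of zlib.crc32 as the partitioner, and the sample data are assumptions made for illustration.

```python
import zlib


def shuffle_by_key(partitions, num_output_partitions):
    """Toy model of a shuffle: redistribute (key, value) rows by key hash."""
    output = [[] for _ in range(num_output_partitions)]
    for partition in partitions:
        for key, value in partition:
            # A deterministic hash sends equal keys to the same target partition.
            target = zlib.crc32(str(key).encode("utf-8")) % num_output_partitions
            output[target].append((key, value))
    return output


# Hypothetical input: three partitions, with each key spread over two of them.
input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]
shuffled = shuffle_by_key(input_partitions, 2)
```

After the call, every key lives in exactly one output partition, which is what lets a later aggregation or join work on each partition independently.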



Which of the following describes Spark's Adaptive Query Execution?

  A. Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
  B. Adaptive Query Execution is enabled in Spark by default.
  C. Adaptive Query Execution reoptimizes queries at execution points.
  D. Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.
  E. Adaptive Query Execution applies to all kinds of queries.

Answer(s): D

Explanation:

Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
This is almost correct. All of these features, except for dynamically injecting scan filters, are part of Adaptive Query Execution. Dynamically injecting scan filters for join operations to limit the amount of data to be considered in a query is part of Dynamic Partition Pruning and not of Adaptive Query Execution.
Adaptive Query Execution reoptimizes queries at execution points.
No, Adaptive Query Execution reoptimizes queries at materialization points.
Adaptive Query Execution is enabled in Spark by default.
No, in Spark 3.0 Adaptive Query Execution is disabled by default and needs to be enabled through the spark.sql.adaptive.enabled property (it has been enabled by default since Spark 3.2.0).
Adaptive Query Execution applies to all kinds of queries.
No, Adaptive Query Execution applies only to queries that are not streaming queries and that contain at least one exchange (typically expressed through a join, aggregate, or window operator) or one subquery.
More info: How to Speed up SQL Queries with Adaptive Query Execution, Learning Spark, 2nd Edition, Chapter 12 (https://bit.ly/3tOh8M1)
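
As a rough illustration of the configuration side, the sketch below collects the AQE switch together with the switches for two of the features named above (coalescing shuffle partitions and skew join handling) and applies them to a builder object. The property names are Spark configuration keys; the helper with_aqe and the constant AQE_SETTINGS are hypothetical names, and the helper only assumes that the object passed in offers a config(key, value) method that returns a builder, as SparkSession.builder does.

```python
# Spark SQL properties for AQE and two of its features.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def with_aqe(builder, settings=AQE_SETTINGS):
    """Apply each setting to a SparkSession.builder-like object and return it."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


# Intended use in a PySpark environment (not executed here):
#   from pyspark.sql import SparkSession
#   spark = with_aqe(SparkSession.builder).getOrCreate()
```

Setting the properties on the builder keeps the choice explicit, so the same code behaves identically on Spark versions with different defaults.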



The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.
Code block:
transactionsDf.join(itemsDf, "itemId", how="broadcast")

  A. The syntax is wrong, how= should be removed from the code block.
  B. The join method should be replaced by the broadcast method.
  C. Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
  D. The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
  E. broadcast is not a valid join type.

Answer(s): E

Explanation:

broadcast is not a valid join type.
Correct! The code block should read transactionsDf.join(broadcast(itemsDf), "itemId"). This would imply an inner join (this is the default in DataFrame.join()), but since the join type is not given in the question, this would be a valid choice.
The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
This option does not apply here, since the syntax around broadcasting is incorrect.
Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
No, it is enabled by default, since the spark.sql.autoBroadcastJoinThreshold property is set to 10 MB by default. If that property were set to -1, broadcast joining would be disabled.
More info: Performance Tuning - Spark 3.1.1 Documentation (https://bit.ly/3gCz34r)
The join method should be replaced by the broadcast method.
No, DataFrame has no broadcast() method.
The syntax is wrong, how= should be removed from the code block.
No, having the keyword argument how= is totally acceptable.
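
The mechanics behind a broadcast hash join can be sketched in plain Python. The toy model below is not Spark code: it builds one hash table from the small table, hands it to every partition of the large table, and joins each partition locally, so the rows of the large table never move. The function name broadcast_hash_join and the sample rows are assumptions made for illustration.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Toy model of an inner broadcast hash join on column `key`."""
    # "Broadcast" step: one hash table built from the small table,
    # conceptually copied to every executor.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined_partitions = []
    for partition in large_partitions:
        # Local step: each partition probes the hash table on its own.
        joined = [
            {**left, **right}
            for left in partition
            for right in lookup.get(left[key], [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical data: the large table in two partitions, the small table as a list.
transactions = [
    [{"itemId": 1, "value": 10}, {"itemId": 2, "value": 20}],
    [{"itemId": 1, "value": 30}, {"itemId": 3, "value": 40}],
]
items = [{"itemId": 1, "itemName": "pen"}, {"itemId": 2, "itemName": "ink"}]
result = broadcast_hash_join(transactions, items, "itemId")
```

The output keeps the partitioning of the large table, and the row with itemId 3 is dropped because an inner join finds no match for it in the small table.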



Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

  A. transactionsDf.repartition(24, boost=True)
  B. transactionsDf.repartition()
  C. transactionsDf.repartition("itemId", 24)
  D. transactionsDf.coalesce(24)
  E. transactionsDf.repartition(24)

Answer(s): E

Explanation:

transactionsDf.coalesce(24)
No, the coalesce() method can only reduce, but not increase, the number of partitions.
transactionsDf.repartition()
No, repartition() requires a numPartitions argument.
transactionsDf.repartition("itemId", 24)
No, here the cols and numPartitions arguments have been mixed up. If the code block were transactionsDf.repartition(24, "itemId"), this would be a valid solution.
transactionsDf.repartition(24, boost=True)
No, there is no boost argument in the repartition() method.
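
The difference between the two methods can be modelled in a few lines of plain Python. This is a simplified sketch and not Spark's implementation: the toy coalesce only merges whole existing partitions, so it can never produce more than it was given, while the toy repartition redistributes every row and can reach any target count. The function names mirror the DataFrame methods for readability only, and the round-robin distribution is an assumption of the model.

```python
def coalesce(partitions, n):
    """Toy model of coalesce(): merge whole partitions, never create new ones."""
    if not partitions:
        return []
    # Asking for more partitions than exist keeps the current count.
    n = max(1, min(n, len(partitions)))
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Neighbouring partitions are concatenated; rows are not moved one by one.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy model of repartition(): full shuffle into exactly n partitions."""
    rows = [row for part in partitions for row in part]
    out = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)  # round-robin distribution of individual rows
    return out


# Hypothetical DataFrame stand-in: 12 partitions with 4 rows each.
twelve = [[p * 10 + r for r in range(4)] for p in range(12)]

print(len(coalesce(twelve, 24)))     # 12
print(len(repartition(twelve, 24)))  # 24
```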



Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

  A. transactionsDf.dropna("any")
  B. transactionsDf.dropna(thresh=4)
  C. transactionsDf.drop.na("",2)
  D. transactionsDf.dropna(thresh=2)
  E. transactionsDf.dropna("",4)

Answer(s): B

Explanation:

transactionsDf.dropna(thresh=4)
Correct. Note that when the thresh keyword argument is given, the how argument (the first parameter of dropna()) is ignored. Figuring out which value to set for thresh can be difficult, especially under pressure in the exam. Here, I recommend you use the notes to create a "simulation" of what different values for thresh would do to a DataFrame. thresh is the minimum number of non-null values a row must have to be kept. In a 6-column DataFrame, a row with missing data in at least 3 columns has at most 3 non-null values, so thresh=4 removes exactly those rows.
transactionsDf.dropna(thresh=2)
Almost right, but thresh=2 keeps every row with at least 2 non-null values, so rows with 3 or 4 missing values would remain.
transactionsDf.dropna("any")
No, this would remove all rows that have at least one missing value.
transactionsDf.drop.na("",2)
No, drop.na is not a proper DataFrame method.
transactionsDf.dropna("",4)
No, this does not work and will throw an error in Spark because Spark cannot understand the first argument.
More info: pyspark.sql.DataFrame.dropna - PySpark 3.1.1 documentation (https://bit.ly/2QZpiCp)
Static notebook | Dynamic notebook: See test 1, Question: 20 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/20.html, https://bit.ly/sparkpracticeexams_import_instructions)
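
The "simulation" recommended above can also be written out in plain Python. The sketch below is a toy model of the thresh rule and not the PySpark implementation: it keeps a row only if it has at least thresh non-null values, and it treats only None as missing. The function name dropna_thresh and the sample rows are assumptions made for illustration.

```python
def dropna_thresh(rows, thresh):
    """Toy model of dropna(thresh=...): keep rows with at least `thresh` non-null values."""
    return [row for row in rows if sum(v is not None for v in row) >= thresh]


# Seven rows of a 6-column table, with 0, 1, ..., 6 missing values respectively.
rows = [tuple([None] * missing + [1] * (6 - missing)) for missing in range(7)]

for thresh in (2, 4):
    kept = dropna_thresh(rows, thresh)
    # Print the number of missing values in each surviving row.
    print(thresh, sorted(row.count(None) for row in kept))
# prints:
# 2 [0, 1, 2, 3, 4]
# 4 [0, 1, 2]
```

With thresh=4 only rows with 0, 1, or 2 missing values survive, so every row with missing data in at least 3 columns is removed; with thresh=2, rows with 3 or 4 missing values would still be kept.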





