Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Updated on: 02-Jan-2026

Which of the following is the deepest level in Spark's execution hierarchy?

  A. Job
  B. Task
  C. Executor
  D. Slot
  E. Stage

Answer(s): B

Explanation:

The hierarchy is, from top to bottom: Job, Stage, Task.
Executors and slots facilitate the execution of tasks, but they are not directly part of the hierarchy. Executors are launched by the driver on worker nodes for the purpose of running a specific Spark application. Slots help Spark parallelize work. An executor can have multiple slots which enable it to process multiple tasks in parallel.
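The hierarchy above can be sketched as a toy model (plain Python, not a Spark API): one action triggers a job, the job is split into stages at shuffle boundaries, and each stage runs one task per partition. The stage names and partition counts below are illustrative.

```python
# Toy model of Spark's execution hierarchy (not a Spark API):
# one action -> one job; stages split at shuffle boundaries;
# each stage runs one task per partition.
job = {
    "stages": [
        {"name": "read+map", "partitions": 8},  # narrow transformations
        {"name": "reduce",   "partitions": 4},  # after a shuffle
    ],
}

# Tasks are the deepest level: one per partition, per stage.
total_tasks = sum(stage["partitions"] for stage in job["stages"])
print(total_tasks)  # 12 tasks across 2 stages in 1 job
```

Executors and slots sit outside this model: they are the resources the tasks are scheduled onto, not levels of the hierarchy itself.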



Which of the following statements about garbage collection in Spark is incorrect?

  A. Garbage collection information can be accessed in the Spark UI's stage detail view.
  B. Optimizing garbage collection performance in Spark may limit caching ability.
  C. Manually persisting RDDs in Spark prevents them from being garbage collected.
  D. In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.
  E. Serialized caching is a strategy to increase the performance of garbage collection.

Answer(s): C

Explanation:

Manually persisting RDDs in Spark prevents them from being garbage collected.
This statement is incorrect, and thus the correct answer to the question. Spark's garbage collector will remove even persisted objects, albeit in an "LRU" fashion. LRU stands for least recently used.
So, during a garbage collection run, the objects that were used the longest time ago will be garbage collected first.
See the linked StackOverflow post below for more information.
Serialized caching is a strategy to increase the performance of garbage collection.
This statement is correct. The more Java objects Spark needs to collect during garbage collection, the longer it takes. Storing a collection of many Java objects, such as a DataFrame with a complex schema, through serialization as a single byte array thus increases performance. This means that garbage collection takes less time on a serialized DataFrame than an unserialized DataFrame.
Optimizing garbage collection performance in Spark may limit caching ability.
This statement is correct. A full garbage collection run slows down a Spark application. When talking about "tuning" garbage collection, we mean reducing the amount or duration of these slowdowns.
A full garbage collection run is triggered when the Old generation of the Java heap space is almost full. (If you are unfamiliar with this concept, check out the link to the Garbage Collection Tuning docs below.) Thus, one measure to avoid triggering a garbage collection run is to keep the Old generation share of the heap space from becoming almost full.
To achieve this, one may decrease its size. Objects larger than the Old generation space will then be discarded instead of cached (stored) in the space, where they would help fill it up.
This will decrease the number of full garbage collection runs, increasing overall performance. Inevitably, however, discarded objects will need to be recomputed when they are needed again. So, this mechanism only works well when a Spark application needs to reuse cached data as little as possible.
Garbage collection information can be accessed in the Spark UI's stage detail view.
This statement is correct. The task table in the Spark UI's stage detail view has a "GC Time" column, indicating the garbage collection time needed per task.
In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.
This statement is correct. The G1 garbage collector, also known as garbage first garbage collector, is an alternative to the default Parallel garbage collector.
While the default Parallel garbage collector divides the heap into a few static regions, the G1 garbage collector divides the heap into many small regions that are created dynamically. The G1 garbage collector has certain advantages over the Parallel garbage collector which improve performance particularly for Spark workloads that require high throughput and low latency.
The G1 garbage collector is not enabled by default, and you need to explicitly pass an argument to Spark to enable it. For more information about the two garbage collectors, check out the Databricks article linked below.
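As a sketch of how the G1 collector is enabled, one can pass JVM options to the executors via Spark configuration. The application name, memory value, and GC-logging flags below are illustrative assumptions, not a prescribed setup.

```shell
# Sketch (illustrative values): enable the G1 garbage collector on executors
# and log GC activity so "GC Time" hotspots can be investigated.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -verbose:gc" \
  --conf spark.executor.memory=4g \
  my_app.py
```

Serialized caching, mentioned above, is chosen separately at persist time (for example with a serialized storage level such as MEMORY_ONLY_SER in Scala/Java), not via JVM flags.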



Which of the following describes characteristics of the Dataset API?

  A. The Dataset API does not support unstructured data.
  B. In Python, the Dataset API mainly resembles Pandas' DataFrame API.
  C. In Python, the Dataset API's schema is constructed via type hints.
  D. The Dataset API is available in Scala, but it is not available in Python.
  E. The Dataset API does not provide compile-time type safety.

Answer(s): D

Explanation:

The Dataset API is available in Scala, but it is not available in Python.
Correct. The Dataset API uses fixed typing and is typically used for object-oriented programming. It is available when Spark is used with the Scala programming language, but not with Python. In Python, you use the DataFrame API, which is based on the Dataset API.
The Dataset API does not provide compile-time type safety.
No – compile-time type safety is precisely what the Dataset API provides; depending on the use case, this type safety is an advantage.
The Dataset API does not support unstructured data.
Wrong, the Dataset API supports both structured and unstructured data.
In Python, the Dataset API's schema is constructed via type hints.
No, this is not applicable, since the Dataset API is not available in Python.
In Python, the Dataset API mainly resembles Pandas' DataFrame API.
Wrong – the Dataset API does not exist in Python, only in Scala and Java.



Which of the following describes the difference between client and cluster execution modes?

  A. In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.
  B. In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.
  C. In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.
  D. In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.
  E. In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

Answer(s): A

Explanation:

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.
This is wrong, since execution modes do not specify whether workloads are run in the cloud or on-premises.
In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.
Wrong, since in both cases executors run on worker nodes.
In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.
Wrong – in cluster mode, the driver runs on a worker node. In client mode, the driver runs on the client machine.
In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.
No. In both modes, the cluster manager is typically on a separate node – not on the same host as the driver. It only runs on the same host as the driver in local execution mode.
More info: Learning Spark, 2nd Edition, Chapter 1, and Spark: The Definitive Guide, Chapter 15.
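The distinction above is selected with the --deploy-mode flag of spark-submit. The master URL and application name below are illustrative assumptions.

```shell
# Client mode (the default): the driver runs on the machine that
# invoked spark-submit; executors still run on the worker nodes.
spark-submit --master yarn --deploy-mode client my_app.py

# Cluster mode: the driver itself is launched on one of the cluster's
# worker nodes, so the submitting machine can disconnect.
spark-submit --master yarn --deploy-mode cluster my_app.py
```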



Which of the following statements about executors is correct, assuming that one can consider each of the JVMs working as executors as a pool of task execution slots?

  A. Slot is another name for executor.
  B. There must be fewer executors than tasks.
  C. An executor runs on a single core.
  D. There must be more slots than tasks.
  E. Tasks run in parallel via slots.

Answer(s): E

Explanation:

Tasks run in parallel via slots.
Correct. Given the assumption, an executor then has one or more "slots", defined by the equation spark.executor.cores / spark.task.cpus. With the executor's resources divided into slots, each task takes up a slot, and multiple tasks can be executed in parallel.
Slot is another name for executor.
No, a slot is part of an executor.
An executor runs on a single core.
No, an executor can occupy multiple cores. This is set via the spark.executor.cores option.
There must be more slots than tasks.
No. Slots just process tasks. One could imagine a scenario with just a single slot for multiple tasks, processing one task at a time. Granted – this is the opposite of what Spark should be used for: distributed data processing over multiple cores and machines, performing many tasks in parallel.
There must be fewer executors than tasks.
No, there is no such requirement.
More info: Spark Architecture | Distributed Systems Architecture (https://bit.ly/3x4MZZt)
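The slot equation above can be sketched as a plain-Python calculation (not a Spark API); the executor counts and core values below are illustrative.

```python
# Toy calculation of task slots, following the equation
# spark.executor.cores / spark.task.cpus described above.
def slots_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Number of task slots one executor offers."""
    return executor_cores // task_cpus

def max_parallel_tasks(num_executors: int, executor_cores: int,
                       task_cpus: int = 1) -> int:
    # Each slot processes one task at a time, so this bounds parallelism.
    return num_executors * slots_per_executor(executor_cores, task_cpus)

# 10 executors with 4 cores each, 1 CPU per task:
print(max_parallel_tasks(10, 4))  # 40 tasks can run in parallel
```

Note that this bounds concurrency only; a stage with more tasks than slots simply runs them in waves, which is why "more slots than tasks" is not required.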



Viewing Page 2 of 37


