Databricks Certified Associate Developer for Apache Spark 3.0 Exam (Page 6 of 37)
Updated on: 02-Jan-2026

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

  A. transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))
  B. transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))
  C. transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
  D. transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))
  E. transactionsDf.withColumn("predErrorSquared", "predError"**2)

Answer(s): C

Explanation:

While only one of these code blocks works, the DataFrame API is pretty flexible when it comes to accepting columns into the pow() method. The following code blocks would also work:

transactionsDf.withColumn("predErrorSquared", pow("predError", 2))
transactionsDf.withColumn("predErrorSquared", pow("predError", lit(2)))
Static notebook | Dynamic notebook: see test 1, question 26 (https://flrs.github.io/spark_practice_tests_code/#1/26.html; Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions)
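As a quick illustration, here is a minimal, self-contained sketch of the correct variant. The toy transactionsDf below is a hypothetical stand-in, not the exam's dataset:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, pow

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the exam's transactionsDf
transactionsDf = spark.createDataFrame([(1, 3.0), (2, -2.0)], ["transactionId", "predError"])

# Keeps all existing columns and adds predErrorSquared = predError squared
result = transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
result.show()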



The code block displayed below contains an error. The code block should return a new DataFrame
that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error. Code block:
transactionsDf.where("col(predError) >= 5")

  A. The argument to the where method should be "predError >= 5".
  B. Instead of where(), filter() should be used.
  C. The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
  D. The argument to the where method cannot be a string.
  E. Instead of >=, the SQL operator GEQ should be used.

Answer(s): A

Explanation:

The argument to the where method should be "predError >= 5": correct. The string passed to where() is interpreted as a SQL expression, so the column is referenced directly by name; col() is a Python function and has no meaning inside a SQL string.

The argument to the where method cannot be a string: incorrect, a string is perfectly acceptable here.

Instead of where(), filter() should be used: no, that does not matter. In PySpark, where() and filter() are equivalent.

Instead of >=, the SQL operator GEQ should be used: incorrect.

The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5"): no, Spark returns a new DataFrame.
Static notebook | Dynamic notebook: see test 1, question 27 (https://flrs.github.io/spark_practice_tests_code/#1/27.html; Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions)
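A minimal sketch of the corrected code block, again assuming a hypothetical toy transactionsDf:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the exam's transactionsDf
transactionsDf = spark.createDataFrame([(1, 3.0), (2, 7.0)], ["transactionId", "predError"])

# The SQL expression references the column by name; col() does not belong inside the string
transactionsDf.where("predError >= 5").show()

# Equivalent alternative using a Column expression instead of a SQL string
transactionsDf.filter(col("predError") >= 5).show()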



Which of the following code blocks saves DataFrame transactionsDf in location
/FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

  A. transactionsDf.write.save("/FileStore/transactions.csv")
  B. transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")
  C. transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")
  D. transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")
  E. transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

Answer(s): E

Explanation:

Static notebook | Dynamic notebook: see test 1, question 28 (https://flrs.github.io/spark_practice_tests_code/#1/28.html; Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions)
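For context: mode("error") (also accepted as "errorifexists") is the save mode that raises an error if data already exists at the target location, and save() is the action that actually triggers the write. DataFrameWriter has no path() method, and write is a property rather than a callable, which rules out the other variants. A minimal sketch, assuming a toy transactionsDf and using the path from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the exam's transactionsDf
transactionsDf = spark.createDataFrame([(1, 3.0)], ["transactionId", "predError"])

# Fails with an error if /FileStore/transactions.csv already exists
transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")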



The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows in DataFrame itemsDf in which the column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

  A. 1. filter
    2. array_contains("cozy")
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  B. 1. where
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. itemId
    5. explode
    6. attributes
  C. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. map
    6. "attributes"
  D. 1. filter
    2. "array_contains(attributes, cozy)"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  E. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"

Answer(s): E

Explanation:

The correct code block is:

itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))

The key here is understanding how to use array_contains(). You can either use it as an expression in a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:

itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))

Static notebook | Dynamic notebook: see test 1, question 29 (https://flrs.github.io/spark_practice_tests_code/#1/29.html; Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions)
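A self-contained sketch of the same idea. The itemsDf below is a hypothetical toy version, since the exam's sample table is not reproduced on this page:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, explode

spark = SparkSession.builder.getOrCreate()

# Hypothetical toy version of itemsDf; not the exam's sample data
itemsDf = spark.createDataFrame(
    [(1, ["cozy", "winter"]), (2, ["summer"])],
    ["itemId", "attributes"],
)

# Keep only rows whose attributes array contains 'cozy', then explode the array into one row per element
result = itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
result.show()

# Equivalent using the pyspark.sql.functions variant of array_contains
result2 = itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))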



The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block:
transactionsDf.agg("storeId").avg("value")

  A. Instead of avg("value"), avg(col("value")) should be used.
  B. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  C. All column names should be wrapped in col() operators.
  D. agg should be replaced by groupBy.
  E. "storeId" and "value" should be swapped.

Answer(s): D

Explanation:

Static notebook | Dynamic notebook: see test 1, question 30 (https://flrs.github.io/spark_practice_tests_code/#1/30.html; Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions)
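For context: agg() is meant for aggregations over the whole DataFrame, while per-group averages require groupBy(), so the corrected block is transactionsDf.groupBy("storeId").avg("value"). A minimal sketch with a hypothetical toy transactionsDf:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the exam's transactionsDf
transactionsDf = spark.createDataFrame(
    [(1, 10.0), (1, 20.0), (2, 5.0)],
    ["storeId", "value"],
)

# Average of column value per unique storeId
transactionsDf.groupBy("storeId").avg("value").show()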





