Databricks Certified Associate Developer for Apache Spark 3.0 Exam (page: 6)
Updated on: 09-Apr-2026

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

  1. transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))
  2. transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))
  3. transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
  4. transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))
  5. transactionsDf.withColumn("predErrorSquared", "predError"**2)

Answer(s): C

Explanation:

While only one of these code blocks works as written, pow() is fairly flexible about its arguments: it accepts Column objects, column-name strings, and plain literals. The following code blocks would therefore also work:
transactionsDf.withColumn("predErrorSquared", pow("predError", 2))
transactionsDf.withColumn("predErrorSquared", pow("predError", lit(2)))
Static notebook | Dynamic notebook: See test 1, Question 26 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/26.html, https://bit.ly/sparkpracticeexams_import_instructions



The code block displayed below contains an error. The code block should return a new DataFrame
that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error. Code block:
transactionsDf.where("col(predError) >= 5")

  1. The argument to the where method should be "predError >= 5".
  2. Instead of where(), filter() should be used.
  3. The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
  4. The argument to the where method cannot be a string.
  5. Instead of >=, the SQL operator GEQ should be used.

Answer(s): A

Explanation:

Correct answer: the argument to the where method should be "predError >= 5". Inside a SQL expression string, column names are written bare; col() is a Python function and is not understood within the string.
The argument to the where method cannot be a string: incorrect. A string containing a SQL expression is perfectly acceptable.
Instead of where(), filter() should be used: no, that does not matter. In PySpark, where() and filter() are equivalent.
Instead of >=, the SQL operator GEQ should be used: incorrect. >= is valid in Spark SQL expressions.
The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5"): no, Spark transformations return a new DataFrame, and there is no toNewDataFrame() method.
Static notebook | Dynamic notebook: See test 1, Question 27 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/27.html, https://bit.ly/sparkpracticeexams_import_instructions



Which of the following code blocks saves DataFrame transactionsDf in location
/FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

  1. transactionsDf.write.save("/FileStore/transactions.csv")
  2. transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")
  3. transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")
  4. transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")
  5. transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

Answer(s): E

Explanation:

The correct block chains the DataFrameWriter methods in the right way: format("csv") selects the output format, mode("error") (alias "errorifexists") makes Spark throw an error if the target location already exists, and save() writes to the path. There is no DataFrameWriter.path() method, write is an attribute rather than a callable taking a format argument, and mode("ignore") silently skips the write instead of raising an error.
Static notebook | Dynamic notebook: See test 1, Question 28 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/28.html, https://bit.ly/sparkpracticeexams_import_instructions



The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows in DataFrame itemsDf in which the column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:
itemsDf. 1 ( 2 ). 3 ( 4 , 5 ( 6 ))

  1. 1. filter
    2. array_contains("cozy")
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  2. 1. where
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. itemId
    5. explode
    6. attributes
  3. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. map
    6. "attributes"
  4. 1. filter
    2. "array_contains(attributes, cozy)"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  5. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"

Answer(s): E

Explanation:

The correct code block is:
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
The key here is understanding how to use array_contains(). You can either use it as an expression in a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))
Static notebook | Dynamic notebook: See test 1, Question 29 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/29.html, https://bit.ly/sparkpracticeexams_import_instructions



The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block: transactionsDf.agg("storeId").avg("value")

  1. Instead of avg("value"), avg(col("value")) should be used.
  2. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  3. All column names should be wrapped in col() operators.
  4. agg should be replaced by groupBy.
  5. "storeId" and "value" should be swapped.

Answer(s): D

Explanation:

agg() on its own aggregates over the entire DataFrame; to compute a per-store average, the rows must first be grouped with groupBy("storeId"). The corrected code block is:
transactionsDf.groupBy("storeId").agg(avg("value"))
Equivalently, transactionsDf.groupBy("storeId").avg("value") also works.
Static notebook | Dynamic notebook: See test 1, Question 30 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/30.html, https://bit.ly/sparkpracticeexams_import_instructions


