Databricks Certified Associate Developer for Apache Spark 3.0 Exam (page 7)
Updated on: 09-Apr-2026

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

  1. itemsDf.withColumn(["supplier", "manufacturer"])
  2. itemsDf.withColumn("supplier").alias("manufacturer")
  3. itemsDf.withColumnRenamed("supplier", "manufacturer")
  4. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
  5. itemsDf.withColumnsRenamed("supplier", "manufacturer")

Answer(s): C

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the relatively trivial DataFrame method withColumnRenamed for renaming column supplier to manufacturer.
Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in a DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied.

itemsDf.withColumnsRenamed("supplier", "manufacturer")
Incorrect. Spark's DataFrame API does not have a withColumnsRenamed() method.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
No. Watch out – although the col() function works with many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings. Also note that the two arguments appear here in the wrong order.

itemsDf.withColumn(["supplier", "manufacturer"])
Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns. withColumn is typically used to add columns to DataFrames, taking the name of the new column as its first argument and a Column as its second. Learn more via the documentation linked below.

itemsDf.withColumn("supplier").alias("manufacturer")
No. While DataFrame.withColumn() exists, it requires two arguments. Furthermore, the alias() method on DataFrames would not help much with renaming a column. DataFrame.alias() can be useful for addressing the inputs of join statements. However, this is far outside the scope of this question. If you are curious nevertheless, check out the link below.
More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.1 documentation, pyspark.sql.DataFrame.withColumn — PySpark 3.1.1 documentation, and pyspark.sql.DataFrame.alias — PySpark 3.1.2 documentation (https://bit.ly/3aSB5tm , https://bit.ly/2Tv4rbE , https://bit.ly/2RbhBd2)
Static notebook | Dynamic notebook: See test 1, Question 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html , https://bit.ly/sparkpracticeexams_import_instructions)



Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

  1. transactionsDf.sort(asc_nulls_last("predError"))
  2. transactionsDf.orderBy("predError").desc_nulls_last()
  3. transactionsDf.sort("predError", ascending=False)
  4. transactionsDf.desc_nulls_last("predError")
  5. transactionsDf.orderBy("predError").asc_nulls_last()

Answer(s): C

Explanation:

transactionsDf.sort("predError", ascending=False)
Correct! When using DataFrame.sort() with ascending=False, the DataFrame will be sorted by the specified column in descending order, putting all missing values last. An alternative, although not listed as an answer here, would be transactionsDf.sort(desc_nulls_last("predError")).

transactionsDf.sort(asc_nulls_last("predError"))
Incorrect. While this is valid syntax, the DataFrame will be sorted on column predError in ascending order, not in descending order, putting missing values last.

transactionsDf.desc_nulls_last("predError")
Wrong, this is invalid syntax. There is no method DataFrame.desc_nulls_last() in the Spark API. There is, however, a Spark function desc_nulls_last() (link below).

transactionsDf.orderBy("predError").desc_nulls_last()
No. While transactionsDf.orderBy("predError") is correct syntax (although it sorts the DataFrame by column predError in ascending order) and returns a DataFrame, there is no method DataFrame.desc_nulls_last() in the Spark API.

transactionsDf.orderBy("predError").asc_nulls_last()
Incorrect. There is no method DataFrame.asc_nulls_last() in the Spark API (see above).
More info: pyspark.sql.functions.desc_nulls_last — PySpark 3.1.2 documentation and pyspark.sql.DataFrame.sort — PySpark 3.1.2 documentation (https://bit.ly/3g1JtbI , https://bit.ly/2R90NCS)
Static notebook | Dynamic notebook: See test 1, Question 32 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/32.html , https://bit.ly/sparkpracticeexams_import_instructions)



The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

  1. The "outer" argument should be eliminated, since "outer" is the default join type.
  2. The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.
  3. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.
  4. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").
  5. The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Answer(s): C

Explanation:

Correct code block:
transactionsDf.join(itemsDf, itemsDf.itemId == transactionsDf.productId, "outer")
The join condition needs to be a single boolean Column expression such as itemsDf.itemId == transactionsDf.productId; a list of two Column objects from different DataFrames does not express the intended equality condition.
Static notebook | Dynamic notebook: See test 1, Question 33 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/33.html , https://bit.ly/sparkpracticeexams_import_instructions)



Which of the following code blocks performs a join in which the small DataFrame transactionsDf is sent to all executors where it is joined with DataFrame itemsDf on columns storeId and itemId, respectively?

  1. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "right_outer")
  2. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "broadcast")
  3. itemsDf.merge(transactionsDf, "itemsDf.itemId == transactionsDf.storeId", "broadcast")
  4. itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)
  5. itemsDf.join(transactionsDf, broadcast(itemsDf.itemId == transactionsDf.storeId))

Answer(s): D

Explanation:

The issue with all answers that have "broadcast" as the very last argument is that "broadcast" is not a valid join type. While the entry with "right_outer" is a valid statement, it is not a broadcast join. The option where broadcast() is wrapped around the equality condition is not valid Spark code: broadcast() needs to be wrapped around the name of the small DataFrame that should be broadcast.

More info: Learning Spark, 2nd Edition, Chapter 7
Static notebook | Dynamic notebook: See test 1, Question 34 (Databricks import instructions)



Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

  1. DataFrame.repartition(12)
  2. DataFrame.coalesce(6).shuffle()
  3. DataFrame.coalesce(6)
  4. DataFrame.coalesce(6, shuffle=True)
  5. DataFrame.repartition(6)

Answer(s): E

Explanation:

DataFrame.repartition(6)
Correct. repartition() always triggers a full shuffle (different from coalesce()).

DataFrame.repartition(12)
No, this would just leave the DataFrame with 12 partitions instead of 6.

DataFrame.coalesce(6)
Incorrect. coalesce() does not perform a full shuffle of the data. Whenever you see "full shuffle", you know that you are not dealing with coalesce(). While coalesce() can perform a partial shuffle when required, it tries to minimize shuffle operations, that is, the amount of data sent between executors. Here, 12 partitions can easily be reduced to 6 simply by stitching every two partitions into one.

DataFrame.coalesce(6, shuffle=True) and DataFrame.coalesce(6).shuffle()
These statements are not valid Spark API syntax.
More info: Spark Repartition & Coalesce - Explained and Repartition vs Coalesce in Apache Spark - Rock the JVM Blog



Viewing Page 7 of 37


