Certified Associate Developer for Apache Spark Dumps PDF Free Download

QUESTION: 21

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

A. from pyspark import StorageLevel
   transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B. transactionsDf.cache()
C. transactionsDf.storage_level('MEMORY_ONLY')
D. transactionsDf.persist()
E. transactionsDf.clear_persist()
F. from pyspark import StorageLevel
   transactionsDf.persist(StorageLevel.MEMORY_ONLY)

Answer(s): F

Explanation:

from pyspark import StorageLevel
transactionsDf.persist(StorageLevel.MEMORY_ONLY)
Correct. The storage level MEMORY_ONLY means that all partitions that do not fit into memory are recomputed when they are needed.

from pyspark import StorageLevel
transactionsDf.cache(StorageLevel.MEMORY_ONLY)
Wrong. DataFrame.cache() does not accept a storage level argument.

transactionsDf.cache()
Wrong. The default storage level of DataFrame.cache() is MEMORY_AND_DISK, meaning that partitions that do not fit into memory are stored on disk.

transactionsDf.persist()
Wrong. The default storage level of DataFrame.persist() is MEMORY_AND_DISK.

transactionsDf.clear_persist()
Wrong. clear_persist() is not a method of DataFrame.

transactionsDf.storage_level('MEMORY_ONLY')
Wrong. storage_level is not a method of DataFrame.
More info: RDD Programming Guide - Spark 3.0.0 Documentation (https://bit.ly/3sxHLVC), pyspark.sql.DataFrame.persist - PySpark 3.0.0 documentation (https://bit.ly/3j2N6B9)


QUESTION: 22

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.
A sample of DataFrame itemsDf is below.

Code block:
itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

A. Since itemId is the index, it does not need to be an argument to the select() method.
B. The alias() method needs to be called after the select() method.
C. The explode() method expects a Column object rather than a string.
D. explode() is not a method of DataFrame. explode() should be used inside the select() method instead.
E. The split() method should be used inside the select() method instead of the explode() method.

Answer(s): D

Explanation:

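The correct code block looks like this (with explode imported from pyspark.sql.functions):
itemsAttributesDf = itemsDf.select("itemId", explode("attributes").alias("attribute"))

Then, the first couple of rows of itemsAttributesDf look like this: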

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.
This is correct.

The split() method should be used inside the select() method instead of the explode() method.
No, the split() method is used to split strings into parts. However, column attributes is an array of strings, so explode() is the appropriate method here.

Since itemId is the index, it does not need to be an argument to the select() method.
No. Spark DataFrames have no index; itemId is an ordinary column and must be selected to appear in the result.

The explode() method expects a Column object rather than a string.
No, a string works just fine here. That said, valid alternatives to passing in a string include col("attributes") and itemsDf.attributes.

The alias() method needs to be called after the select() method.
No. alias() should be called on the Column returned by explode(), inside select(), so that the exploded column is named attribute.
More info: pyspark.sql.functions.explode - PySpark 3.1.1 documentation (https://bit.ly/2QUZI1J)
Static notebook | Dynamic notebook: See test 1, Question: 22 (https://flrs.github.io/spark_practice_tests_code/#1/22.html); Databricks import instructions (https://bit.ly/sparkpracticeexams_import_instructions)


QUESTION: 23

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

A. spark.mode("parquet").read("/FileStore/imports.parquet")
B. spark.read.path("/FileStore/imports.parquet", source="parquet")
C. spark.read().parquet("/FileStore/imports.parquet")
D. spark.read.parquet("/FileStore/imports.parquet")
E. spark.read().format('parquet').open("/FileStore/imports.parquet")

Answer(s): D

Explanation:

Static notebook | Dynamic notebook: See test 1, Question: 23 (https://flrs.github.io/spark_practice_tests_code/#1/23.html); Databricks import instructions (https://bit.ly/sparkpracticeexams_import_instructions)


QUESTION: 24

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__(__4__)

A. 1. filter  2. "storeId"==25  3. collect  4. 5
B. 1. filter  2. col("storeId")==25  3. toLocalIterator  4. 5
C. 1. select  2. storeId==25  3. head  4. 5
D. 1. filter  2. col("storeId")==25  3. take  4. 5
E. 1. filter  2. col("storeId")==25  3. collect  4. 5

Answer(s): D

Explanation:

The correct code block is:
transactionsDf.filter(col("storeId")==25).take(5)
Neither option with collect will work because collect does not take any arguments, and both pass the argument 5.
The option with toLocalIterator will not work because its only argument is prefetchPartitions, a boolean, so passing 5 does not make sense. It also returns an iterator rather than a list.
The option using head will not work because storeId==25 refers to an undefined Python name. Even with col("storeId")==25, select would return a boolean column for every row instead of keeping only the matching rows.
Static notebook | Dynamic notebook: See test 1, Question: 24 (https://flrs.github.io/spark_practice_tests_code/#1/24.html); Databricks import instructions (https://bit.ly/sparkpracticeexams_import_instructions)


QUESTION: 25

Which of the following code blocks reads JSON file imports.json into a DataFrame?

A. spark.read().mode("json").path("/FileStore/imports.json")
B. spark.read.format("json").path("/FileStore/imports.json")
C. spark.read("json", "/FileStore/imports.json")
D. spark.read.json("/FileStore/imports.json")
E. spark.read().json("/FileStore/imports.json")

Answer(s): D

Explanation:

Static notebook | Dynamic notebook: See test 1, Question: 25 (https://flrs.github.io/spark_practice_tests_code/#1/25.html); Databricks import instructions (https://bit.ly/sparkpracticeexams_import_instructions)

