The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in the attributes column of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from itemsDf. The new DataFrame should only include rows from itemsDf in which the attributes column contains the element cozy.
A sample of DataFrame itemsDf is below.
Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))
A.
1. filter
2. array_contains("cozy")
3. select
4. "itemId"
5. explode
6. "attributes"

B.
1. where
2. "array_contains(attributes, 'cozy')"
3. select
4. itemId
5. explode
6. attributes

C.
1. filter
2. "array_contains(attributes, 'cozy')"
3. select
4. "itemId"
5. map
6. "attributes"

D.
1. filter
2. "array_contains(attributes, cozy)"
3. select
4. "itemId"
5. explode
6. "attributes"

E.
1. filter
2. "array_contains(attributes, 'cozy')"
3. select
4. "itemId"
5. explode
6. "attributes"
Answer(s): E
Explanation:
The correct code block is:
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
The key here is understanding how to use array_contains(). You can either use it as an expression inside a string, or you can import it from pyspark.sql.functions and call it directly. (Note that explode must be imported from pyspark.sql.functions in either case.) The other options fail for different reasons: in A, array_contains() is called with only one argument, but it requires both a column and a value; in B, itemId and attributes are unquoted and would be treated as undefined Python names; in C, map is not the right tool for flattening an array column into rows, explode is; in D, cozy is unquoted inside the SQL expression, so Spark would interpret it as a column reference rather than the string literal 'cozy'. Using the imported function instead of the string expression, the following would also work:
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))
Static notebook | Dynamic notebook: see test 1, question 29 (https://flrs.github.io/spark_practice_tests_code/#1/29.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions
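To see both variants in action, here is a minimal, self-contained sketch. The sample rows are made up for illustration, since the question's itemsDf sample is not reproduced here; only the schema (an itemId column and an array-typed attributes column) follows from the question.

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, explode

spark = SparkSession.builder.appName("array_contains_demo").getOrCreate()

# Hypothetical sample data with the schema implied by the question
itemsDf = spark.createDataFrame(
    [
        (1, ["blue", "winter", "cozy"]),
        (2, ["red", "summer", "fresh"]),
    ],
    ["itemId", "attributes"],
)

# Variant 1: array_contains() as a SQL expression string (answer E)
itemsDf.filter("array_contains(attributes, 'cozy')") \
    .select("itemId", explode("attributes")).show()

# Variant 2: array_contains() imported from pyspark.sql.functions
itemsDf.filter(array_contains("attributes", "cozy")) \
    .select("itemId", explode("attributes")).show()

Both variants print three rows, one per attribute of itemId 1, with the exploded values in a column that Spark names col by default, matching the two columns (itemId and col) the question asks for.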