Cloudera CCA Spark and Hadoop Developer Exam CCA175 Dumps in PDF

Free Cloudera CCA175 Real Questions (page: 2)

Problem Scenario 74 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of order table : (orderjd , order_date , ordercustomerid, order status}
Columns of orderjtems table : (order_item_td , order_item_order_id , order_item_product_id,
order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish following activities.

1. Copy "retaildb.orders" and "retaildb.orderjtems" table to hdfs in respective directory
p89_orders and p89_order_items .
2. Join these data using orderjd in Spark and Python
3. Now fetch selected columns from joined data Orderld, Order date and amount collected
on this order.
4. Calculate total order placed for each date, and produced the output sorted by date.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:Step 1: Import Single table .
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba - password=cloudera -table=orders --target-dir=p89_orders - -m1 sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba - password=cloudera -table=order_items ~target-dir=p89_ order items -m 1 Note : Please check you dont have space between before or after '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs
Step 2: Read the data from one of the partition, created using above command, hadoopfs -cat p89_orders/part-m-00000 hadoop fs -cat p89_order_items/part-m-00000

Step 3: Load these above two directory as RDD using Spark and Python (Open pyspark terminal and do following). orders = sc.textFile("p89_orders") orderitems = sc.textFile("p89_order_items")
Step 4: Convert RDD into key value as (orderjd as a key and rest of the values as a value)
#First value is orderjd
ordersKeyValue = orders.map(lambda line: (int(line.split(", ")[0]), line))
#Second value as an Orderjd
orderltemsKeyValue = orderltems.map(lambda line: (int(line.split(", ")[1]), line))
Step 5: Join both the RDD using orderjd
joinedData = orderltemsKeyValue.join(ordersKeyValue)
#print the joined data
tor line in joinedData.collect():
print(line)
Format of joinedData as below.
[Orderld, 'All columns from orderltemsKeyValue', 'All columns from orders Key Value']
Step 6: Now fetch selected values Orderld, Order date and amount collected on this order. revenuePerOrderPerDay = joinedData.map(lambda row: (row[0]( row[1][1].split(", ")[1]( f!oat(row[1][0].split('\M}[4]}}}
#printthe result
for line in revenuePerOrderPerDay.collect():
print(line)
Step 7: Select distinct order ids for each date.
#distinct(date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split('\")[1] + ", " + str(row[0])).distinct()
for line in distinctOrdersDate.collect(): print(line)
Step 8: Similar to word count, generate (date, 1) record for each row. newLineTuple = distinctOrdersDate.map(lambda line: (line.split(", ")[0], 1))
Step 9: Do the count for each key(date), to get total order per date. totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b}
#print results
for line in totalOrdersPerDate.collect():
print(line)

Step 10: Sort the results by date sortedData=totalOrdersPerDate.sortByKey().collect()
#print results
for line in sortedData:
print(line)



Problem Scenario 34 : You have given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write a Spark code in scala which will remove the header part and create RDD of values as below, for all rows. And also if id is myself" than filter out row.
Map(id -> om, topic -> scala, hits -> 120)

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create file in hdfs (We will do using Hue). However, you can first create in local filesystem and then upload it to hdfs.
Step 2: Load user.csv file from hdfs and create PairRDDs val csv = sc.textFile("spark6/user.csv")
Step 3: split and clean data
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
Step 4: Get header row
val header = headerAndRows.first
Step 5: Filter out header (We need to check if the first val matches the first header name) val data = headerAndRows.filter(_(0) != header(O))
Step 6: Splits to map (header/value pairs)
val maps = data.map(splits => header.zip(splits).toMap)

Step 7: Filter out the user "myself
val result = maps.filter(map => mapf'id") != "myself")
Step 8: Save the output as a Text file. result.saveAsTextFile("spark6/result.txt")



Problem Scenario 39 : You have been given two files
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two tiles as Spark RDD and join them to produce the below results
(l, ((9, 5), (g, h)))
(2, ((7, 4), (i, j))) (3, ((8, 3), (k, l)))
And write code snippet which will sum the second columns of above joined results (5+4+3).

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create tiles in hdfs using Hue.
Step 2: Create pairRDD for both the files.
val one = sc.textFile("spark16/file1.txt").map{
_.split(", ", -1) match {
case Array(a, b, c) => (a, ( b, c))
} }
val two = sc.textFHe(Mspark16/file2.txt").map{
_.split('7\-1) match {
case Array(a, b, c) => (a, (b, c))
} }
Step 3: Join both the RDD. val joined = one.join(two)
Step 4: Sum second column values.
val sum = joined.map {
case (_, ((_, num2), (_, _))) => num2.tolnt
}.reduce(_ + _)



Problem Scenario 58 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2) val b = a.keyBy(_.length)
operation1
Write a correct code snippet for operationl which will produce desired output, shown below.
Array[(lnt, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle}}}

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key-component of each pair will automatically be presented to the partitioner.
Listing Variants
def groupByKeyQ: RDD[(K, lterable[V]}]
def groupByKey(numPartittons: Int): RDD[(K, lterable[V] )] def groupByKey(partitioner: Partitioner): RDD[(K, lterable[V])]



Problem Scenario 63 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1

Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String}] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.reduceByKey(_ + _).collect
reduceByKey JPair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide, should be commutative in order to generate reproducible results.



Share your comments for Cloudera CCA175 exam with other users:

S
Sanjay
8/15/2023 10:22:00 AM

informative for me.

T
Tom
12/12/2023 8:53:00 PM

question 134s answer shoule be "dlp"

A
Alex
11/7/2023 11:02:00 AM

in 72 the answer must be [sys_user_has_role] table.

F
Finn
5/4/2023 10:21:00 PM

i appreciated the mix of multiple-choice and short answer questions. i passed my exam this morning.

A
AJ
7/13/2023 8:33:00 AM

great to find this website, thanks

C
Curtis Nakawaki
6/29/2023 9:11:00 PM

examination questions seem to be relevant.

U
Umashankar Sharma
10/22/2023 9:39:00 AM

planning to take psm test

E
ED SHAW
7/31/2023 10:34:00 AM

please allow to download

A
AD
7/22/2023 11:29:00 AM

please provide dumps

A
Ayyjayy
11/6/2023 7:29:00 AM

is the answer to question 15 correct ? i feel like the answer should be b

B
Blessious Phiri
8/12/2023 11:56:00 AM

its getting more technical

J
Jeanine J
7/11/2023 3:04:00 PM

i think these questions are what i need.

A
Aderonke
10/23/2023 2:13:00 PM

helpful assessment

T
Tom
1/5/2024 2:32:00 AM

i am confused about the answers to the questions. do you know if the answers are correct?

V
Vinit N.
8/28/2023 2:33:00 AM

hi, please make the dumps available for my upcoming examination.

S
Sanyog Deshpande
9/14/2023 7:05:00 AM

good practice

T
Tyron
9/8/2023 12:12:00 AM

so far it is really informative

B
beast
7/30/2023 2:22:00 PM

hi i want it please please upload it

M
Mirex
5/26/2023 3:45:00 AM

am preparing for exam ,just nice questions

E
exampei
8/7/2023 8:05:00 AM

please upload c_tadm_23 exam

A
Anonymous
9/12/2023 12:50:00 PM

can we get tdvan4 vantage data engineering pdf?

A
Aish
10/11/2023 5:51:00 AM

want to clear the exam.

S
Smaranika
6/22/2023 8:42:00 AM

could you please upload the dumps of sap c_sac_2302

B
Blessious Phiri
8/15/2023 1:56:00 PM

asm management configuration is about storage

L
Lewis
7/6/2023 8:49:00 PM

kool thumb up

M
Moreece
5/15/2023 8:44:00 AM

just passed the az-500 exam this last friday. most of the questions in this exam dumps are in the exam. i bought the full version and noticed some of the questions which were answered wrong in the free version are all corrected in the full version. this site is good but i wish the had it in an interactive version like a test engine simulator.

T
Terry
5/24/2023 4:41:00 PM

i can practice for exam

E
Emerys
7/29/2023 6:55:00 AM

please i need this exam.

G
Goni Mala
9/2/2023 12:27:00 PM

i need the dump

L
Lenny
9/29/2023 11:30:00 AM

i want it bad, even if cs6 maybe retired, i want to learn cs6

M
MilfSlayer
12/28/2023 8:32:00 PM

i hate comptia with all my heart with their "choose the best" answer format as an argument could be made on every question. they say "the "comptia way", lmao no this right here boys is the comptia way 100%. take it from someone whos failed this exam twice but can configure an entire complex network that these are the questions that are on the test 100% no questions asked. the pbqs are dead on! nice work

S
Swati Raj
11/14/2023 6:28:00 AM

very good materials

K
Ko Htet
10/17/2023 1:28:00 AM

thanks for your support.

P
Philippe
1/22/2023 10:24:00 AM

iam impressed with the quality of these dumps. they questions and answers were easy to understand and the xengine app was very helpful to use.

AI Tutor 👋 I’m here to help!