Cloudera CCA Spark and Hadoop Developer Exam CCA175 Dumps in PDF

Free Cloudera CCA175 Real Questions (page: 2)

Problem Scenario 74 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of order table : (orderjd , order_date , ordercustomerid, order status}
Columns of orderjtems table : (order_item_td , order_item_order_id , order_item_product_id,
order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish following activities.

1. Copy "retaildb.orders" and "retaildb.orderjtems" table to hdfs in respective directory
p89_orders and p89_order_items .
2. Join these data using orderjd in Spark and Python
3. Now fetch selected columns from joined data Orderld, Order date and amount collected
on this order.
4. Calculate total order placed for each date, and produced the output sorted by date.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:Step 1: Import Single table .
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba - password=cloudera -table=orders --target-dir=p89_orders - -m1 sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba - password=cloudera -table=order_items ~target-dir=p89_ order items -m 1 Note : Please check you dont have space between before or after '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs
Step 2: Read the data from one of the partition, created using above command, hadoopfs -cat p89_orders/part-m-00000 hadoop fs -cat p89_order_items/part-m-00000

Step 3: Load these above two directory as RDD using Spark and Python (Open pyspark terminal and do following). orders = sc.textFile("p89_orders") orderitems = sc.textFile("p89_order_items")
Step 4: Convert RDD into key value as (orderjd as a key and rest of the values as a value)
#First value is orderjd
ordersKeyValue = orders.map(lambda line: (int(line.split(", ")[0]), line))
#Second value as an Orderjd
orderltemsKeyValue = orderltems.map(lambda line: (int(line.split(", ")[1]), line))
Step 5: Join both the RDD using orderjd
joinedData = orderltemsKeyValue.join(ordersKeyValue)
#print the joined data
tor line in joinedData.collect():
print(line)
Format of joinedData as below.
[Orderld, 'All columns from orderltemsKeyValue', 'All columns from orders Key Value']
Step 6: Now fetch selected values Orderld, Order date and amount collected on this order. revenuePerOrderPerDay = joinedData.map(lambda row: (row[0]( row[1][1].split(", ")[1]( f!oat(row[1][0].split('\M}[4]}}}
#printthe result
for line in revenuePerOrderPerDay.collect():
print(line)
Step 7: Select distinct order ids for each date.
#distinct(date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split('\")[1] + ", " + str(row[0])).distinct()
for line in distinctOrdersDate.collect(): print(line)
Step 8: Similar to word count, generate (date, 1) record for each row. newLineTuple = distinctOrdersDate.map(lambda line: (line.split(", ")[0], 1))
Step 9: Do the count for each key(date), to get total order per date. totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b}
#print results
for line in totalOrdersPerDate.collect():
print(line)

Step 10: Sort the results by date sortedData=totalOrdersPerDate.sortByKey().collect()
#print results
for line in sortedData:
print(line)



Problem Scenario 34 : You have given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write a Spark code in scala which will remove the header part and create RDD of values as below, for all rows. And also if id is myself" than filter out row.
Map(id -> om, topic -> scala, hits -> 120)

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create file in hdfs (We will do using Hue). However, you can first create in local filesystem and then upload it to hdfs.
Step 2: Load user.csv file from hdfs and create PairRDDs val csv = sc.textFile("spark6/user.csv")
Step 3: split and clean data
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
Step 4: Get header row
val header = headerAndRows.first
Step 5: Filter out header (We need to check if the first val matches the first header name) val data = headerAndRows.filter(_(0) != header(O))
Step 6: Splits to map (header/value pairs)
val maps = data.map(splits => header.zip(splits).toMap)

Step 7: Filter out the user "myself
val result = maps.filter(map => mapf'id") != "myself")
Step 8: Save the output as a Text file. result.saveAsTextFile("spark6/result.txt")



Problem Scenario 39 : You have been given two files
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two tiles as Spark RDD and join them to produce the below results
(l, ((9, 5), (g, h)))
(2, ((7, 4), (i, j))) (3, ((8, 3), (k, l)))
And write code snippet which will sum the second columns of above joined results (5+4+3).

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create tiles in hdfs using Hue.
Step 2: Create pairRDD for both the files.
val one = sc.textFile("spark16/file1.txt").map{
_.split(", ", -1) match {
case Array(a, b, c) => (a, ( b, c))
} }
val two = sc.textFHe(Mspark16/file2.txt").map{
_.split('7\-1) match {
case Array(a, b, c) => (a, (b, c))
} }
Step 3: Join both the RDD. val joined = one.join(two)
Step 4: Sum second column values.
val sum = joined.map {
case (_, ((_, num2), (_, _))) => num2.tolnt
}.reduce(_ + _)



Problem Scenario 58 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2) val b = a.keyBy(_.length)
operation1
Write a correct code snippet for operationl which will produce desired output, shown below.
Array[(lnt, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle}}}

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key-component of each pair will automatically be presented to the partitioner.
Listing Variants
def groupByKeyQ: RDD[(K, lterable[V]}]
def groupByKey(numPartittons: Int): RDD[(K, lterable[V] )] def groupByKey(partitioner: Partitioner): RDD[(K, lterable[V])]



Problem Scenario 63 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1

Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String}] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.reduceByKey(_ + _).collect
reduceByKey JPair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide, should be commutative in order to generate reproducible results.



Share your comments for Cloudera CCA175 exam with other users:

S
shime
10/23/2023 10:03:00 AM

this is really very very helpful for mcd level 1

V
Vnu
6/3/2023 2:39:00 AM

very helpful!

S
Steve
8/17/2023 2:19:00 PM

question #18s answer should be a, not d. this should be corrected. it should be minvalidityperiod

R
RITEISH
12/24/2023 4:33:00 AM

thanks for the exact solution

S
SB
10/15/2023 7:58:00 AM

need to refer the questions and have to give the exam

M
Mike Derfalem
7/16/2023 7:59:00 PM

i need it right now if it was possible please

I
Isak
7/6/2023 3:21:00 AM

i need it very much please share it in the fastest time.

M
Maria
6/23/2023 11:40:00 AM

correct answer is d for student.java program

N
Nagendra Pedipina
7/12/2023 9:10:00 AM

q:37 c is correct

J
John
9/16/2023 9:37:00 PM

q6 exam topic: terramearth, c: correct answer: copy 1petabyte to encrypted usb device ???

S
SAM
12/4/2023 12:56:00 AM

explained answers

A
Andy
12/26/2023 9:35:00 PM

plan to take theaws certified developer - associate dva-c02 in the next few weeks

S
siva
5/17/2023 12:32:00 AM

very helpfull

M
mouna
9/27/2023 8:53:00 AM

good questions

B
Bhavya
9/12/2023 7:18:00 AM

help to practice csa exam

M
Malik
9/28/2023 1:09:00 PM

nice tip and well documented

R
rodrigo
6/22/2023 7:55:00 AM

i need the exam

D
Dan
6/29/2023 1:53:00 PM

please upload

A
Ale M
11/22/2023 6:38:00 PM

prepping for fsc exam

A
ahmad hassan
9/6/2023 3:26:00 AM

pd1 with great experience

Ž
Žarko
9/5/2023 3:35:00 AM

@t it seems like azure service bus message quesues could be the best solution

S
Shiji
10/15/2023 1:08:00 PM

helpful to check your understanding.

D
Da Costa
8/27/2023 11:43:00 AM

question 128 the answer should be static not auto

B
bot
7/26/2023 6:45:00 PM

more comments here

K
Kaleemullah
12/31/2023 1:35:00 AM

great support to appear for exams

B
Bsmaind
8/20/2023 9:26:00 AM

useful dumps

B
Blessious Phiri
8/13/2023 8:37:00 AM

making progress

N
Nabla
9/17/2023 10:20:00 AM

q31 answer should be d i think

V
vladputin
7/20/2023 5:00:00 AM

is this real?

N
Nick W
9/29/2023 7:32:00 AM

q10: c and f are also true. q11: this is outdated. you no longer need ownership on a pipe to operate it

N
Naveed
8/28/2023 2:48:00 AM

good questions with simple explanation

C
cert
9/24/2023 4:53:00 PM

admin guide (windows) respond to malicious causality chains. when the cortex xdr agent identifies a remote network connection that attempts to perform malicious activity—such as encrypting endpoint files—the agent can automatically block the ip address to close all existing communication and block new connections from this ip address to the endpoint. when cortex xdrblocks an ip address per endpoint, that address remains blocked throughout all agent profiles and policies, including any host-firewall policy rules. you can view the list of all blocked ip addresses per endpoint from the action center, as well as unblock them to re-enable communication as appropriate. this module is supported with cortex xdr agent 7.3.0 and later. select the action mode to take when the cortex xdr agent detects remote malicious causality chains: enabled (default)—terminate connection and block ip address of the remote connection. disabled—do not block remote ip addresses. to allow specific and known s

Y
Yves
8/29/2023 8:46:00 PM

very inciting

M
Miguel
10/16/2023 11:18:00 AM

question 5, it seems a instead of d, because: - care plan = case - patient = person account - product = product2;

AI Tutor 👋 I’m here to help!