Cloudera CCA Spark and Hadoop Developer Exam CCA175 Dumps in PDF

Free Cloudera CCA175 Real Questions (page: 6)

Problem Scenario 51 : You have been given below code snippet.
val a = sc.parallelize(List(1, 2, 1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
Operation_xyz
Write a correct code snippet for Operationxyz which will produce below output.
Output:
Array[(lnt, (lterable[String], lterable[String]))] = Array(
(2, (ArrayBuffer(b), ArrayBuffer(c))),
(3, (ArrayBuffer(b), ArrayBuffer(c))),
(1, (ArrayBuffer(b, b), ArrayBuffer(c, c)))
)

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.cogroup(c).collect
cogroup [Pair], groupWith [Pair]
A very powerful set of functions that allow grouping up to 3 key-value RDDs together using their keys.
Another example
val x = sc.parallelize(List((1, "apple"), (2, "banana"), (3, "orange"), (4, "kiwi")), 2) val y = sc.parallelize(List((5, "computer"), (1, "laptop"), (1, "desktop"), (4, "iPad")), 2)
x.cogroup(y).collect
Array[(lnt, (lterable[String], lterable[String]))] = Array( (4, (ArrayBuffer(kiwi), ArrayBuffer(iPad))),
(2, (ArrayBuffer(banana), ArrayBuffer())),
(3, (ArrayBuffer(orange), ArrayBuffer())),
(1 , (ArrayBuffer(apple), ArrayBuffer(laptop, desktop))), (5, {ArrayBuffer(), ArrayBuffer(computer))))



Problem Scenario 24 : You have been given below comma separated employee information.
Data Set:
name, salary, sex, age
alok, 100000, male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki, 123000, male, 29
Requirements:
Use the netcat service on port 44444, and nc above data line by line. Please do the following activities.

1. Create a flume conf file using fastest channel, which write data in hive warehouse
directory, in a table called flumemaleemployee (Create hive table as well tor given data).
2. While importing, make sure only male employee data is stored.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Step 1: Create hive table for flumeemployee.'
CREATE TABLE flumemaleemployee
(
name string,
salary int,
sex string,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
Step 2: Create flume configuration file, with below configuration for source, sink and channel and save it in flume4.conf.
#Define source , sink, channel and agent.
agent1 .sources = source1
agent1 .sinks = sink1
agent1 .channels = channel1
# Describe/configure source1
agent1 .sources.source1.type = netcat
agent1 .sources.source1.bind = 127.0.0.1
agent1.sources.sourcel.port = 44444
#Define interceptors
agent1.sources.source1.interceptors=il
agent1 .sources.source1.interceptors.i1.type=regex_filter agent1 .sources.source1.interceptors.i1.regex=female agent1 .sources.source1.interceptors.i1.excludeEvents=true
## Describe sink1
agent1 .sinks, sinkl.channel = memory-channel
agent1.sinks.sink1.type = hdfs
agent1 .sinks, sinkl. hdfs. path = /user/hive/warehouse/flumemaleemployee hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
agentl .sinks.sink1.hdfs.fileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channell.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1 .sources.source1.channels = channel1
agent1 .sinks.sink1.channel = channel1
Step 3: Run below command which will use this configuration file and append data in hdfs.

Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume4.conf --name agentl
Step 4: Open another terminal and use the netcat service, nc localhost 44444
Step 5: Enter data line by line.
alok, 100000, male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki.123000.male.29
Step 6: Open hue and check the data is available in hive table or not.
Step 7: Stop flume service by pressing ctrl+c
Step 8: Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;



Problem Scenario 27 : You need to implement near real time solutions for collecting information when submitted in file with below information.
Data
echo "IBM, 100, 20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM, 103, 20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After few mins
echo "IBM, 100.2, 20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM, 103.1, 20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Requirements:
You have been given below directory location (if not available than create it) /tmp/spooldir . You have a finacial subscription for getting stock prices from BloomBerg as well as
Reuters and using ftp you download every hour new files from their respective ftp site in directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.
As soon as file committed in this directory that needs to be available in hdfs in /tmp/flume/finance location in a single directory.
Write a flume configuration file named flume7.conf and use it to load data in hdfs with following additional properties .

1. Spool /tmp/spooldir/bb and /tmp/spooldir/dr
2. File prefix in hdfs sholuld be events
3. File suffix should be .log
4. If file is not commited and in use than it should have _ as prefix.
5. Data should be written as text to hdfs

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create directory mkdir /tmp/spooldir/bb mkdir /tmp/spooldir/dr
Step 2: Create flume configuration file, with below configuration for
agent1.sources = source1 source2
agent1 .sinks = sink1
agent1.channels = channel1
agent1 .sources.source1.channels = channel1
agentl .sources.source2.channels = channell agent1 .sinks.sinkl.channel = channell agent1 .sources.source1.type = spooldir
agent1 .sources.sourcel.spoolDir = /tmp/spooldir/bb
agent1 .sources.source2.type = spooldir
agent1 .sources.source2.spoolDir = /tmp/spooldir/dr
agent1 .sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = /tmp/flume/finance
agent1-sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1 .sinks.sink1.hdfs.inUsePrefix = _
agent1 .sinks.sink1.hdfs.fileType = Data Stream
agent1.channels.channel1.type = file
Step 4: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/fIumeconf/fIume7.conf --name agent1
Step 5: Open another terminal and create a file in /tmp/spooldir/ echo "IBM, 100, 20160104" » /tmp/spooldir/bb/.bb.txt
echo "IBM, 103, 20160105" » /tmp/spooldir/bb/.bb.txt mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After few mins
echo "IBM, 100.2, 20160104" » /tmp/spooldir/dr/.dr.txt echo "IBM, 103.1, 20160105" »/tmp/spooldir/dr/.dr.txt mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt



Problem Scenario 60 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"}, 3}
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "woif", "bear", "bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operationl which will produce desired output, shown below.
Array[(lnt, (String, String))] = Array((6, (salmon, salmon)), (6, (salmon, rabbit)), (6, (salmon, turkey)), (6, (salmon, salmon)), (6, (salmon, rabbit)),
(6, (salmon, turkey)), (3, (dog, dog)), (3, (dog, cat)), (3, (dog, gnu)), (3, (dog, bee)), (3, (rat, dog)), (3, (rat, cat)), (3, (rat, gnu)), (3, (rat, bee)))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

solution:
b.join(d).collect
join [Pair]: Performs an inner join using two key-value RDDs. Please note that the keys must be generally comparable to make this work. keyBy : Constructs two-component tuples (key-value pairs) by applying a function on each data item. The result of the function becomes the data item becomes the key and the original value of the newly created tuples.



Problem Scenario 90 : You have been given below two files
course.txt
id, course
1, Hadoop
2, Spark
3, HBase
fee.txt
id, fee
2, 3900
3, 4200
4, 2900
Accomplish the following activities.

1. Select all the courses and their fees , whether fee is listed or not.
2. Select all the available fees and respective course. If course does not exists still list the
fee
3. Select all the courses and their fees , whether fee is listed or not. However, ignore
records having fee as null.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :

Step 1:
hdfs dfs -mkdir sparksql4
hdfs dfs -put course.txt sparksql4/
hdfs dfs -put fee.txt sparksql4/
Step 2: Now in spark shell
// load the data into a new RDD
val course = sc.textFile("sparksql4/course.txt")
val fee = sc.textFile("sparksql4/fee.txt")
// Return the first element in this RDD
course.fi rst()
fee.fi rst()
//define the schema using a case class case class Course(id: Integer, name: String) case class Fee(id: Integer, fee: Integer)
// create an RDD of Product objects
val courseRDD = course.map(_.split(", ")).map(c => Course(c(0).tolnt, c(1))) val feeRDD =fee.map(_.split(", ")).map(c => Fee(c(0}.tolnt, c(1}.tolnt)) courseRDD.first()
courseRDD.count(}
feeRDD.first()
feeRDD.countQ
// change RDD of Product objects to a DataFrame val courseDF = courseRDD.toDF(} val feeDF = feeRDD.toDF{)
// register the DataFrame as a temp table courseDF. registerTempTable("course") feeDF.
registerTempTablef'fee")
// Select data from table
val results = sqlContext.sql(......SELECT' FROM course """ ) results. showQ
val results = sqlContext.sql(......SELECT' FROM fee......) results. showQ
val results = sqlContext.sql(......SELECT * FROM course LEFT JOIN fee ON course.id = fee.id......)
results-showQ
val results ="sqlContext.sql(......SELECT * FROM course RIGHT JOIN fee ON course.id = fee.id "MM )
results. showQ
val results = sqlContext.sql(......SELECT' FROM course LEFT JOIN fee ON course.id = fee.id where fee.id IS NULL"
results. show()



Share your comments for Cloudera CCA175 exam with other users:

S
sk
5/13/2023 2:07:00 AM

excelent material

C
Christopher
9/5/2022 10:54:00 PM

the new versoin of this exam which i downloaded has all the latest questions from the exam. i only saw 3 new questions in the exam which was not in this dump.

S
Sam
9/7/2023 6:51:00 AM

question 8 - can cloudtrail be used for storing jobs? based on aws - aws cloudtrail is used for governance, compliance and investigating api usage across all of our aws accounts. every action that is taken by a user or script is an api call so this is logged to [aws] cloudtrail. something seems incorrect here.

T
Tanvi Rajput
8/14/2023 10:55:00 AM

question 13 tda - c01 answer : quick table calculation -> percentage of total , compute using table down

P
PMSAGAR
9/19/2023 2:48:00 AM

pls share teh dump

Z
zazza
6/16/2023 10:47:00 AM

question 44 answer is user risk

P
Prasana
6/23/2023 1:59:00 AM

please post the questions for preparation

T
test user
9/24/2023 3:15:00 AM

thanks for the questions

D
Draco
7/19/2023 5:34:00 AM

please reopen it now ..its really urgent

M
Megan
4/14/2023 5:08:00 PM

these practice exam questions were exactly what i needed. the variety of questions and the realistic exam-like environment they created helped me assess my strengths and weaknesses. i felt more confident and well-prepared on exam day, and i owe it to this exam dumps!

A
abdo casa
8/9/2023 6:10:00 PM

thank u it very instructuf

D
Danny
1/15/2024 9:10:00 AM

its helpful?

H
hanaa
10/3/2023 6:57:00 PM

is this dump still valid???

G
Georgio
1/19/2024 8:15:00 AM

question 205 answer is b

M
Matthew Dievendorf
5/30/2023 9:37:00 PM

question 39, should be answer b, directions stated is being sudneted from /21 to a /23. a /23 has 512 ips so 510 hosts. and can make 4 subnets out of the /21

A
Adhithya
8/11/2022 12:27:00 AM

beautiful test engine software and very helpful. questions are same as in the real exam. i passed my paper.

S
SuckerPumch88
4/25/2022 10:24:00 AM

the questions are exactly the same in real exam. just make sure not to answer all them correct or else they suspect you are cheating.

S
soheib
7/24/2023 7:05:00 PM

question: 78 the right answer i think is d not a

S
srija
8/14/2023 8:53:00 AM

very helpful

T
Thembelani
5/30/2023 2:17:00 AM

i am writing this exam tomorrow and have dumps

A
Anita
10/1/2023 4:11:00 PM

can i have the icdl excel exam

B
Ben
9/9/2023 7:35:00 AM

please upload it

A
anonymous
9/20/2023 11:27:00 PM

hye when will post again the past year question for this h13-311_v3 part since i have to for my test tommorow…thank you very much

R
Randall
9/28/2023 8:25:00 PM

on question 22, option b-once per session is also valid.

T
Tshegofatso
8/28/2023 11:51:00 AM

this website is very helpful

P
philly
9/18/2023 2:40:00 PM

its my first time exam

B
Beexam
9/4/2023 9:06:00 PM

correct answers are device configuration-enable the automatic installation of webview2 runtime. & policy management- prevent users from submitting feedback.

R
RAWI
7/9/2023 4:54:00 AM

is this dump still valid? today is 9-july-2023

A
Annie
6/7/2023 3:46:00 AM

i need this exam.. please upload these are really helpful

S
Shubhra Rathi
8/26/2023 1:08:00 PM

please upload the oracle 1z0-1059-22 dumps

S
Shiji
10/15/2023 1:34:00 PM

very good questions

R
Rita Rony
11/27/2023 1:36:00 PM

nice, first step to exams

A
Aloke Paul
9/11/2023 6:53:00 AM

is this valid for chfiv9 as well... as i am reker 3rd time...

C
Calbert Francis
1/15/2024 8:19:00 PM

great exam for people taking 220-1101

AI Tutor 👋 I’m here to help!