Infosys Certified Spark Professional

Question 1

Which of the following statements are TRUE about the Spark framework? Choose THREE CORRECT options from below.

Accepted Answer

Supports in-memory data processing

Question 2

How many stages will be generated from the DAG when the code below is executed?

val rdd1 = sc.textFile("Customer")
val rdd2 = rdd1.map(_.split(",")).map(arr1 => (arr1(2), arr1(4).toInt))
rdd2.cache()
val rdd3 = rdd2.reduceByKey(_ max _)
rdd3.saveAsTextFile("output1")

Accepted Answer

2 (reduceByKey introduces a shuffle, which splits the job into two stages)
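
A rough way to reason about the stage count: each shuffle dependency in the lineage starts a new stage, so a linear lineage gives shuffle boundaries plus one. The sketch below assumes a local SparkContext and made-up sample rows in place of the "Customer" file; `StageCountSketch` and `shuffleBoundaries` are hypothetical names, not Spark APIs.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object StageCountSketch {
  // Counts shuffle dependencies reachable from `rdd`.
  // Assumes a linear lineage; a diamond-shaped lineage would be counted twice.
  def shuffleBoundaries(rdd: RDD[_]): Int =
    rdd.dependencies.map {
      case s: ShuffleDependency[_, _, _] => 1 + shuffleBoundaries(s.rdd)
      case d                             => shuffleBoundaries(d.rdd)
    }.sum

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("stage-count-sketch")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical rows standing in for the "Customer" file.
      val rdd1 = sc.parallelize(Seq("1,a,east,x,10", "2,b,west,y,20", "3,c,east,z,30"))
      val rdd2 = rdd1.map(_.split(",")).map(arr1 => (arr1(2), arr1(4).toInt))
      rdd2.cache() // caching does not add a stage
      val rdd3 = rdd2.reduceByKey(_ max _)
      val boundaries = shuffleBoundaries(rdd3)
      println(s"shuffle boundaries = $boundaries, stages = ${boundaries + 1}")
    } finally sc.stop()
  }
}
```

The two map calls are narrow transformations and are pipelined into the same stage; only reduceByKey adds a boundary here.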

Question 3

During Spark job execution, the program is converted into a lineage graph. Which of the following statements are TRUE with respect to the RDD lineage graph? Choose THREE CORRECT options from below.

Accepted Answer

The graph is not submitted for execution until an action is called
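
The lazy behaviour can be observed with an accumulator: it stays at zero while only transformations have been declared and changes once an action runs. This is a minimal sketch assuming a local SparkContext; `LazyLineageSketch` and `run` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyLineageSketch {
  // Returns (records seen before the action, records seen after it, sum of results).
  def run(sc: SparkContext): (Long, Long, Int) = {
    val seen = sc.longAccumulator("records seen by map")
    // Declaring the transformation only extends the lineage graph.
    val doubled = sc.parallelize(1 to 5).map { x => seen.add(1L); x * 2 }
    val before = seen.sum
    // reduce is an action, so the graph is submitted for execution here.
    val total = doubled.reduce(_ + _)
    (before, seen.sum, total)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lazy-lineage-sketch")
    val sc = new SparkContext(conf)
    try {
      val (before, after, total) = run(sc)
      println(s"seen before action = $before, after action = $after, total = $total")
    } finally sc.stop()
  }
}
```

Accumulator updates made inside transformations can be applied more than once if a task is re-executed, so the count after the action is illustrative rather than guaranteed.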

Question 4

Which of the following methods return the number of partitions of an RDD? Choose THREE CORRECT options.

Accepted Answer

rdd.getNumPartitions
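
For illustration, the partition count can also be read from the `partitions` array. A minimal sketch, assuming a local SparkContext and an RDD created with an explicit partition count of 4; `PartitionCountSketch` and `counts` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCountSketch {
  // Returns the partition count obtained in three equivalent ways.
  def counts(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    (rdd.getNumPartitions, rdd.partitions.length, rdd.partitions.size)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("partition-count-sketch")
    val sc = new SparkContext(conf)
    try println(counts(sc)) // each element is 4 for this RDD
    finally sc.stop()
  }
}
```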

Question 5

Which of the following methods is used to get the RDD lineage graph in Spark?

Accepted Answer

rdd.toDebugString
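
In the RDD API, `toDebugString` returns a textual description of an RDD and its recursive dependencies, which is the usual way to inspect lineage. A minimal sketch, assuming a local SparkContext and made-up key-value data; `LineageSketch` and `lineageOf` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  // Builds a small lineage with one shuffle and returns its description.
  def lineageOf(sc: SparkContext): String = {
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val maxPerKey = pairs.mapValues(_ * 10).reduceByKey(_ max _)
    maxPerKey.toDebugString
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lineage-sketch")
    val sc = new SparkContext(conf)
    try println(lineageOf(sc)) // indentation in the output marks the shuffle boundary
    finally sc.stop()
  }
}
```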

Question 6

Select the option that displays the records of an RDD that start with the word "hadoop".

Accepted Answer

rdd.filter(x => x.startsWith("hadoop")).collect
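
To see the filter in action, the sketch below applies it to a few made-up lines, assuming a local SparkContext; `StartsWithSketch` and `hadoopLines` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StartsWithSketch {
  // Keeps only the records whose text begins with "hadoop".
  def hadoopLines(sc: SparkContext): Array[String] = {
    val rdd = sc.parallelize(Seq("hadoop is a framework", "spark runs on hadoop", "hadoop yarn"))
    rdd.filter(x => x.startsWith("hadoop")).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("starts-with-sketch")
    val sc = new SparkContext(conf)
    try hadoopLines(sc).foreach(println)
    finally sc.stop()
  }
}
```

Note that `startsWith` is case-sensitive, and `collect` brings every matching record to the driver, so it suits small results only.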

Question 7

Which of the given Scala functions is used in Spark to change the number of partitions in an RDD?

Accepted Answer

rdd.repartition(newNumberOfPartitions)
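
The RDD API exposes `repartition` and `coalesce` for changing the partition count. A minimal sketch of both, assuming a local SparkContext and an RDD that starts with 4 partitions; `RepartitionSketch` and `counts` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Returns the partition counts after repartition(8), coalesce(2) and coalesce(8).
  def counts(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    val more = rdd.repartition(8)    // always shuffles; can grow or shrink
    val fewer = rdd.coalesce(2)      // no shuffle by default; merges partitions
    val unchanged = rdd.coalesce(8)  // without a shuffle, coalesce cannot grow the count
    (more.getNumPartitions, fewer.getNumPartitions, unchanged.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("repartition-sketch")
    val sc = new SparkContext(conf)
    try println(counts(sc))
    finally sc.stop()
  }
}
```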

Question 8

Consider a sales dataset with the columns CustomerID, Location, Merchant, and Amount. The requirement is to create a paired RDD with CustomerID as the key and Amount as the value. Which of the code snippets below correctly creates the paired RDD?

Accepted Answer

val SalesData = sc.textFile("HDFS Path")
val PairedSalesData = SalesData.map{record => (record.split(",")(0), record.split(",")(3).toLong)}
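
The same idea can be tried on in-memory data, splitting each record once instead of twice. This is a sketch under assumptions: the sample rows are made up, a local SparkContext is used instead of an HDFS path, and `PairedSalesSketch` and `totalsByCustomer` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairedSalesSketch {
  // Builds (CustomerID, Amount) pairs and sums the amounts per customer.
  def totalsByCustomer(sc: SparkContext): Map[String, Long] = {
    // Columns: CustomerID, Location, Merchant, Amount
    val salesData = sc.parallelize(Seq("C1,Pune,ShopA,100", "C2,Delhi,ShopB,250", "C1,Pune,ShopC,50"))
    val pairedSalesData = salesData.map { record =>
      val fields = record.split(",")
      (fields(0), fields(3).toLong)
    }
    // Key-based operations such as reduceByKey become available on a paired RDD.
    pairedSalesData.reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("paired-sales-sketch")
    val sc = new SparkContext(conf)
    try println(totalsByCustomer(sc))
    finally sc.stop()
  }
}
```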

Question 9

What is the output of the given code snippet?

val RDD1 = sc.parallelize(Array(1, 2))
val RDD2 = sc.parallelize(Array(2, 3))
val product = RDD1.cartesian(RDD2)
val RDD3 = sc.parallelize(Seq((1, 2), (2, 3), (3, 4)))
product.join(RDD3).collect

Accepted Answer

Array((1,(2,2)), (1,(3,2)), (2,(2,3)), (2,(3,3))), in no guaranteed order. cartesian returns an RDD of (Int, Int) pairs, so the join on the first element compiles and runs
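
The types involved can be checked directly: `cartesian` on two RDD[Int] values yields an RDD[(Int, Int)], which Spark treats as a pair RDD keyed by the first element. A minimal sketch, assuming a local SparkContext; `CartesianJoinSketch` and `joined` are hypothetical names, and the result is sorted only to make it deterministic.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CartesianJoinSketch {
  // Joins the cartesian product of two Int RDDs with a pair RDD on the first element.
  def joined(sc: SparkContext): Array[(Int, (Int, Int))] = {
    val rdd1 = sc.parallelize(Array(1, 2))
    val rdd2 = sc.parallelize(Array(2, 3))
    val product: RDD[(Int, Int)] = rdd1.cartesian(rdd2) // (1,2), (1,3), (2,2), (2,3)
    val rdd3 = sc.parallelize(Seq((1, 2), (2, 3), (3, 4)))
    // Key 3 exists only in rdd3, so the inner join drops it.
    product.join(rdd3).collect().sorted
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cartesian-join-sketch")
    val sc = new SparkContext(conf)
    try println(joined(sc).mkString(", "))
    finally sc.stop()
  }
}
```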

Question 10

Which of the following functions are RDD actions? Choose THREE CORRECT options from below.

Accepted Answer

foreach()
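
One way to tell the two kinds of operation apart is the return type: transformations return another RDD, while actions return a plain value (or Unit) and trigger a job. A minimal sketch with a few common actions, assuming a local SparkContext; `ActionsSketch` and `run` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ActionsSketch {
  // Returns the results of count, collect and take on a small mapped RDD.
  def run(sc: SparkContext): (Long, Array[Int], Array[Int]) = {
    val nums = sc.parallelize(Seq(3, 1, 2))
    val mapped: RDD[Int] = nums.map(_ + 1)   // transformation: returns an RDD, runs nothing
    val n: Long = mapped.count()             // action
    val all: Array[Int] = mapped.collect()   // action
    val firstTwo: Array[Int] = mapped.take(2) // action
    mapped.foreach(x => println(x))          // action: returns Unit, runs on the executors
    (n, all, firstTwo)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("actions-sketch")
    val sc = new SparkContext(conf)
    try {
      val (n, all, firstTwo) = run(sc)
      println(s"count = $n, collect = ${all.mkString(",")}, take(2) = ${firstTwo.mkString(",")}")
    } finally sc.stop()
  }
}
```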
