Infosys Certified Spark Professional
Practice with real exam-pattern questions for Infosys Certified Spark Professional. Each question includes a detailed explanation to help you understand the concept, not just memorise the answer. Try 10 questions free — no login required.
10 Infosys Certified Spark Professional practice questions with answers
Real Lex exam-pattern multiple-choice questions for the Infosys Certified Spark Professional certification. Each question includes the correct answer. The full question bank is available to Premium members.
- Question 1
Which of the following statements are TRUE about the Spark framework? Choose THREE correct options.
- A
Supports in-memory data processing
- B
Follows the lazy evaluation principle
- C
Does not provide machine learning libraries
- D
Supports parallel processing
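The statements about lazy evaluation and parallel processing can be checked with a small experiment. The following is a minimal sketch, assuming a local Spark installation; the object name `LazyEvalSketch`, the accumulator name, and the `local[2]` master are illustrative choices, not part of the question. It counts how many records `map()` has processed before and after the first action. Spark also ships a machine learning library (MLlib), so the statement that it provides none does not hold.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvalSketch {
  // Returns how many records map() had processed before and after the first action.
  def run(sc: SparkContext): (Long, Long) = {
    val touched = sc.longAccumulator("touched")
    val doubled = sc.parallelize(1 to 4, 2).map { x => touched.add(1); x * 2 }
    val before = touched.sum   // map() is only recorded in the lineage so far
    doubled.count()            // the action submits the job; the 2 partitions run as parallel tasks
    val after = touched.sum
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption for a quick local run, not a cluster setting.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lazy-eval-sketch"))
    try println(run(sc)) finally sc.stop()
  }
}
```

Nothing is counted until `count()` runs, which is the practical meaning of lazy evaluation.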
- Question 2
How many stages will be generated from the DAG when the code below is executed?
val rdd1 = sc.textFile("Customer")
val rdd2 = rdd1.map(_.split(",")).map(arr1 => (arr1(2),arr1(4).toInt))
rdd2.cache()
val rdd3 = rdd2.reduceByKey(_ max _)
rdd3.saveAsTextFile("output1")
- A
1
- B
2
- C
3
- D
4
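Stage boundaries fall at shuffles, so one way to reason about this question is to count the shuffle dependencies in the lineage and add one. The sketch below does that by walking `rdd.dependencies`. It assumes a linear lineage like the one in the question (a branching lineage would be double counted), and it replaces the `Customer` file with hypothetical in-memory rows so it can run anywhere. `StageCountSketch`, `shuffleCount`, and `build` are illustrative names.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object StageCountSketch {
  // Counts shuffle dependencies by walking the lineage (fine for linear lineages only).
  def shuffleCount(rdd: RDD[_]): Int =
    rdd.dependencies.map { dep =>
      val here = if (dep.isInstanceOf[ShuffleDependency[_, _, _]]) 1 else 0
      here + shuffleCount(dep.rdd)
    }.sum

  def build(sc: SparkContext): RDD[(String, Int)] = {
    // Hypothetical rows standing in for the "Customer" file.
    val rdd1 = sc.parallelize(Seq("1,a,north,x,10", "2,b,north,y,30", "3,c,south,z,20"))
    val rdd2 = rdd1.map(_.split(",")).map(arr1 => (arr1(2), arr1(4).toInt))
    rdd2.cache()                 // marks rdd2 for caching; does not add a stage boundary
    rdd2.reduceByKey(_ max _)    // wide transformation: requires a shuffle
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("stage-count-sketch"))
    try {
      val shuffles = shuffleCount(build(sc))
      println(s"shuffles = $shuffles, stages = ${shuffles + 1}")
    } finally sc.stop()
  }
}
```

The two `map` calls are narrow transformations and are pipelined into the same stage; only `reduceByKey` introduces a boundary, which gives one shuffle and therefore two stages for the single action.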
- Question 3
During Spark job execution, the program is converted into a lineage graph. Which of the following statements are TRUE with respect to the RDD lineage graph? Choose THREE correct options.
- A
The graph is not submitted for execution until an action is called
- B
Transformations that result in data shuffling are mandatory in the lineage graph
- C
The lineage graph is generated from the transformations in the program
- D
Lost RDD partitions can be recovered using the lineage graph
- Question 4
Which of the following return the number of partitions of an RDD? Choose THREE correct options.
- A
rdd.getNumPartitions
- B
rdd.partitions.length
- C
rdd.partitions.size
- D
rdd.partitions
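The difference between the four expressions is their return type, which a short sketch makes visible. This assumes a local Spark context; `PartitionCountSketch` and the choice of 4 partitions are illustrative.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext}

object PartitionCountSketch {
  def run(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, 4)          // ask for 4 partitions explicitly
    val parts: Array[Partition] = rdd.partitions   // the partition objects themselves, not a count
    (rdd.getNumPartitions, parts.length, parts.size)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("partition-count-sketch"))
    try println(run(sc)) finally sc.stop()
  }
}
```

`rdd.partitions` on its own yields an `Array[Partition]`, so it only becomes a count once `.length` or `.size` is applied.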
- Question 5
Which of the following methods is used to get the RDD lineage graph in Spark?
- A
Stats
- B
toDebugString
- C
dependencies
- D
glom
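To see what a lineage printout looks like, the sketch below builds a small word-count style RDD and returns its `toDebugString`. It assumes a local Spark context; `LineageSketch` and the sample words are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  def lineage(sc: SparkContext): String = {
    val counts = sc.parallelize(Seq("a", "b", "a"), 2).map(w => (w, 1)).reduceByKey(_ + _)
    counts.toDebugString   // textual description of this RDD and its recursive dependencies
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lineage-sketch"))
    try println(lineage(sc)) finally sc.stop()
  }
}
```

By contrast, `dependencies` lists only the direct parents of an RDD, and `glom` gathers the elements of each partition into an array.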
- Question 6
Select the option that displays the records of an RDD that start with the word "hadoop".
- A
rdd.filter(x => x.startsWith("hadoop")).collect
- B
rdd.filter(x => x.contains("hadoop")).collect
- C
rdd.filter(x => x.contains("hadoop")).first
- D
rdd.filter(x => x.starts("hadoop")).collect
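The difference between `startsWith` and `contains` shows up as soon as a record mentions "hadoop" somewhere other than its beginning. A minimal sketch, assuming a local Spark context and three hypothetical sample records; `PrefixFilterSketch` is an illustrative name.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PrefixFilterSketch {
  def run(sc: SparkContext): (Array[String], Array[String]) = {
    // Hypothetical sample records.
    val rdd = sc.parallelize(Seq("hadoop is a framework", "spark runs on hadoop", "hadoop yarn"))
    val prefixed = rdd.filter(x => x.startsWith("hadoop")).collect()  // only records beginning with "hadoop"
    val anywhere = rdd.filter(x => x.contains("hadoop")).collect()    // also matches "spark runs on hadoop"
    (prefixed, anywhere)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("prefix-filter-sketch"))
    try {
      val (prefixed, anywhere) = run(sc)
      println(prefixed.mkString(" | "))
      println(anywhere.mkString(" | "))
    } finally sc.stop()
  }
}
```

`String` has no `starts` method, so that variant does not compile, and `first` returns a single record rather than every match.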
- Question 7
Which of the given Scala functions is used in Spark to change the number of partitions of an RDD?
- A
rdd.changePartition(newnumberOfPartitions)
- B
rdd.changePartition(oldnumberOfPartitions,newnumberOfPartitions)
- C
rdd.repartition(newnumberOfPartitions)
- D
rdd.repartition().change(newnumberOfPartitions)
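As a quick check of how `repartition` behaves, the sketch below compares partition counts before and after the call. It assumes a local Spark context; `RepartitionSketch` and the partition counts 4 and 8 are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  def run(sc: SparkContext): (Int, Int) = {
    val rdd = sc.parallelize(1 to 100, 4)
    val wider = rdd.repartition(8)   // returns a new RDD; the original is left unchanged
    (rdd.getNumPartitions, wider.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repartition-sketch"))
    try println(run(sc)) finally sc.stop()
  }
}
```

Because RDDs are immutable, `repartition` does not modify `rdd` in place; the result has to be assigned to a new value.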
- Question 8
Consider a sales dataset with the columns CustomerID, Location, Merchant, Amount. The requirement is to create a paired RDD with CustomerID as the key and Amount as the value. Which of the code snippets below correctly creates the paired RDD?
- A
val SalesData = sc.textFile("HDFS Path")
val PairedSalesData = SalesData.map{record => (record.split(",")(0),record.split(",")(3).toLong)}
- B
val SalesData = sc.textFile("HDFS Path")
val PairedSalesData = SalesData.flatMap{record => (record.split(",")(1),record.split(",")(4).toLong)}
- C
val SalesData = sc.textFile("HDFS Path")
val PairedSalesData = SalesData.reduce{record => (record.split(",")(0),record.split(",")(3).toLong)}
- D
val SalesData = sc.textFile("HDFS Path")
val PairedSalesData = SalesData.filter{record => (record.split(",")(0),record.split(",")(3).toLong)}
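The column positions matter here: with four columns, CustomerID sits at index 0 and Amount at index 3. The sketch below builds the pair with `map`, splitting each record once rather than twice. It assumes a local Spark context and uses hypothetical in-memory rows in place of the HDFS file; `PairedRddSketch` and the sample values are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairedRddSketch {
  def run(sc: SparkContext): Array[(String, Long)] = {
    // Hypothetical rows in the order CustomerID, Location, Merchant, Amount.
    val salesData = sc.parallelize(Seq("C001,Pune,StoreA,1200", "C002,Mysore,StoreB,450"))
    val pairedSalesData = salesData.map { record =>
      val fields = record.split(",")     // split once and reuse the result
      (fields(0), fields(3).toLong)      // key = CustomerID, value = Amount
    }
    pairedSalesData.collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("paired-rdd-sketch"))
    try run(sc).foreach(println) finally sc.stop()
  }
}
```

The other operators do not fit the job: `flatMap` expects a collection per record, `reduce` takes a two-argument function, and `filter` needs a function that returns a Boolean.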
- Question 9
What is the output of the given code snippet?
val RDD1 = sc.parallelize(Array(1,2))
val RDD2 = sc.parallelize(Array(2,3))
val product=RDD1.cartesian(RDD2)
val RDD3 = sc.parallelize(Seq((1,2),(2,3),(3,4)))
product.join(RDD3).collect
- A
Type mismatch error. Since integer RDD is combined with pairedRDD in the join
- B
Syntax error since it should be collect(), not collect
- C
Array(((1,(2,2)), (1,(3,2)), (2,(2,3)), (2,(3,3))))
- D
Array((1,(2,2,3)),(2,(2,3,3)),(3,(0,0,4)))
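The snippet can be traced in two steps: `cartesian` produces an RDD of pairs, which is why it can then be joined by key with another pair RDD. The sketch below runs the same operations on a local Spark context and sorts the result, since `collect` on a joined RDD gives no ordering guarantee. `CartesianJoinSketch` is an illustrative name.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CartesianJoinSketch {
  def run(sc: SparkContext): Seq[(Int, (Int, Int))] = {
    val rdd1 = sc.parallelize(Array(1, 2))
    val rdd2 = sc.parallelize(Array(2, 3))
    val product = rdd1.cartesian(rdd2)   // RDD[(Int, Int)]: (1,2), (1,3), (2,2), (2,3)
    val rdd3 = sc.parallelize(Seq((1, 2), (2, 3), (3, 4)))
    // Inner join on the first element; key 3 has no match in product and is dropped.
    product.join(rdd3).collect().toSeq.sorted
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("cartesian-join-sketch"))
    try println(run(sc)) finally sc.stop()
  }
}
```

Each key from `product` is paired with the matching value from `RDD3`, giving the four elements (1,(2,2)), (1,(3,2)), (2,(2,3)) and (2,(3,3)).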
- Question 10
Which of the following functions are RDD action statements? Choose THREE correct options.
- A
foreach()
- B
collect()
- C
reduceByKey()
- D
take(n)
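One way to tell actions from transformations is to look at what each call returns: a transformation returns another RDD, while an action returns a plain value (or `Unit`) to the driver. A minimal sketch, assuming a local Spark context and hypothetical sample pairs; `ActionVsTransformationSketch` is an illustrative name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ActionVsTransformationSketch {
  def run(sc: SparkContext): (Map[String, Int], Array[(String, Int)]) = {
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val summed: RDD[(String, Int)] = pairs.reduceByKey(_ + _)  // transformation: returns an RDD, runs nothing yet
    val all: Array[(String, Int)] = summed.collect()           // action: returns an Array to the driver
    val firstTwo: Array[(String, Int)] = pairs.take(2)         // action: returns the first 2 elements
    pairs.foreach(_ => ())                                     // action: runs on the executors, returns Unit
    (all.toMap, firstTwo)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("action-sketch"))
    try {
      val (totals, firstTwo) = run(sc)
      println(totals)
      println(firstTwo.mkString(", "))
    } finally sc.stop()
  }
}
```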