Infosys Certified PySpark Professional
Practice with real exam-pattern questions for Infosys Certified PySpark Professional. Each question includes a detailed explanation to help you understand the concept, not just memorise the answer. Try 10 questions free — no login required.
10 Infosys Certified PySpark Professional practice questions with answers
Real Lex exam-pattern multiple-choice questions for the Infosys Certified PySpark Professional certification. Each question includes the correct answer. The full question bank is available to Premium members.
- Question 1
Consider a scenario where an HDFS file is divided into four blocks and is to be processed by a Spark application. When the RDD is created, the data in each block is represented as which of the following?
- A ✓ Correct
partitions
- B
blocks
- C
tasks
- D
executors
- Question 2
Consider a scenario where a large amount of data is processed in a distributed manner by a Spark program. Where is the actual data to be processed stored?
- A
Driver
- B
Cluster Manager
- C ✓ Correct
Worker nodes
- D
Tasks
- Question 3
Consider a scenario where a few partitions of an RDD are lost while a Spark job is executing. Which of the following components makes it possible to recreate the lost partitions?
- A
accumulators
- B ✓ Correct
Lineage graph
- C
replication
- D
key operations
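The idea behind lineage-based recovery can be illustrated with a toy model in plain Python. This is only a sketch, not Spark code: `ToyRDD`, `lose` and `get` are hypothetical names invented here, and the toy computes eagerly whereas real Spark is lazy. What it shows is that a derived dataset which remembers its parent and its transformation can rebuild a lost partition without any replica.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: it remembers how it was derived."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self.parent = parent            # lineage: the dataset this one came from
        self.fn = fn                    # lineage: the transformation applied
        self._partitions = partitions   # materialised data; a lost entry is None

    def map(self, fn):
        # Record the transformation and compute each partition eagerly
        # (real Spark is lazy; eagerness keeps the toy short).
        data = [[fn(x) for x in part] for part in self._partitions]
        return ToyRDD(partitions=data, parent=self, fn=fn)

    def lose(self, index):
        # Simulate losing one partition, e.g. because an executor died.
        self._partitions[index] = None

    def get(self, index):
        # If the partition is missing, rebuild it from the parent via the lineage.
        if self._partitions[index] is None:
            parent_part = self.parent.get(index)
            self._partitions[index] = [self.fn(x) for x in parent_part]
        return self._partitions[index]


source = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = source.map(lambda x: x * 2)
doubled.lose(1)               # partition 1 of the derived dataset is gone
recovered = doubled.get(1)    # recomputed from partition 1 of the source
print(recovered)              # [6, 8]
```

Only the lost partition is recomputed; partition 0 is served from the data that is still held.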
- Question 4
Which of the following are features of the PySpark framework? (More than one option applies.)
- A ✓ Correct
In-memory computation
- B ✓ Correct
Lazy evaluation
- C ✓ Correct
Parallel processing
- D
More disk I/O operations while processing data
- Question 5
Consider a scenario where a PySpark job is being deployed to a cluster. What does the "--master" parameter in the spark-submit command indicate?
- A ✓ Correct
Provides the details of the cluster manager used to run the Spark application.
- B
Indicates that the Spark application runs only on the master node and not on worker nodes
- C
Provides the driver program details
- D
Provides the worker node details
- Question 6
Sam works for a banking client and performs data analysis using Spark. Which of the following can he use to get the URL of the Spark web UI, where the details of jobs and executors can be viewed?
- A
sc.webUI
- B
sc.web
- C ✓ Correct
sc.uiWebUrl
- D
sc.displayUI
- Question 7
Which of the following operations may result in data skew, that is, an uneven distribution of data across partitions?
- A
cache()
- B
collect()
- C
persist()
- D ✓ Correct
coalesce()
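Why merging partitions can leave data unevenly spread is easy to see in a toy model. The sketch below is plain Python, not Spark: `toy_coalesce` and `toy_repartition` are hypothetical helpers, the partition sizes are invented, and grouping neighbouring partitions is a simplifying assumption about how partitions get combined. The point it illustrates is that combining whole partitions without a shuffle keeps any existing imbalance, whereas redistributing individual records evens it out.

```python
# Hypothetical partition sizes: one large partition and three small ones.
partitions = [list(range(1000)), list(range(10)), list(range(10)), list(range(10))]


def toy_coalesce(parts, n):
    # Assumption for illustration: merge neighbouring partitions whole,
    # without moving individual records (no shuffle).
    group = -(-len(parts) // n)  # ceiling division
    return [sum(parts[i:i + group], []) for i in range(0, len(parts), group)]


def toy_repartition(parts, n):
    # Full-shuffle stand-in: deal every record out round-robin.
    records = [r for p in parts for r in p]
    return [records[i::n] for i in range(n)]


coalesced_sizes = [len(p) for p in toy_coalesce(partitions, 2)]
repartitioned_sizes = [len(p) for p in toy_repartition(partitions, 2)]
print(coalesced_sizes)       # [1010, 20]
print(repartitioned_sizes)   # [515, 515]
```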
- Question 8
Shane works on a data analytics project and needs to perform analysis on employee data (Employee.csv file).
Schema: EmployeeID, EmployeeName, Age, Salary, Department.
Which of the following code snippets can be used to sort the employees by department in descending order?
- A ✓ Correct
logsRDD = sc.textFile("/dataset/Employee")
FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
sortdata= logdata.sortByKey(ascending=False,keyfunc=lambda k: k)
- B
logsRDD = sc.textFile("/dataset/Employee")
FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
sortdata= logdata.sortByKey(ascending=True,keyfunc=lambda k: k)
- C
logsRDD = sc.textFile("/dataset/Employee")
FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
sortdata= logdata.sort(ascending=True,keyfunc=lambda k: k)
- D
logsRDD = sc.textFile("/dataset/Employee")
FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
logdata= FieldsRDD.map(lambda var1: [(var1[0], var1[1],var1[2],var1[3], var1[4]]))
sortdata= logdata.sort(ascending=False,keyfunc=lambda k: k)
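The reshaping into (Department, rest-of-row) pairs followed by a descending sort on the key can be checked locally without a cluster. The sketch below is a plain-Python analogue, not PySpark: the three sample rows are invented for illustration, and `sorted(..., reverse=True)` stands in for `sortByKey(ascending=False)`.

```python
# Hypothetical sample rows in the Employee.csv schema:
# EmployeeID, EmployeeName, Age, Salary, Department
lines = [
    "101,Asha,30,50000,Finance",
    "102,Ravi,41,72000,Sales",
    "103,Mei,28,61000,HR",
]

# Same steps as the RDD pipeline: split each line, then key by Department.
fields = [line.split(",") for line in lines]
pairs = [(f[4], [f[0], f[1], f[2], f[3]]) for f in fields]

# Stand-in for sortByKey(ascending=False): sort on the key, descending.
sorted_pairs = sorted(pairs, key=lambda kv: kv[0], reverse=True)

departments = [dept for dept, _ in sorted_pairs]
print(departments)  # ['Sales', 'HR', 'Finance']
```

Because the department is the key of each pair, the sort needs no custom key function; the identity `keyfunc=lambda k: k` in the snippets has the same effect as omitting it.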
- Question 9
Which of the following storage levels in the persist() method is equivalent to the cache() method?
- A ✓ Correct
MEMORY_ONLY
- B
DISK_ONLY
- C
MEMORY_DISK_ONLY
- D
DISK_ONLY_2
- Question 10
Consider the following code snippet:
def fun(x):
    return x.split(",")
rdd=sc.parallelize(["1002,John,20000","1003,harry,7000","1004,lookie,900"])
newrdd=rdd.map(fun) # Line 1
newrdd.first() # Line 2
Predict the correct output.
- A ✓ Correct
[1002,John,20000]
- B
(1002,John,20000)
- C
An error will occur due to Line 1
- D
An error will occur due to Line 2
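What the snippet returns can be confirmed without Spark, because `map` simply applies `fun` to each element and `first()` returns the first transformed element. The sketch below is a plain-Python stand-in (a list comprehension replaces `rdd.map`, and indexing replaces `first()`); the variable names `records`, `mapped` and `first` are introduced here for illustration.

```python
def fun(x):
    # Split one comma-separated record into its fields.
    return x.split(",")


records = ["1002,John,20000", "1003,harry,7000", "1004,lookie,900"]

# rdd.map(fun) applies fun to every element; first() returns the first result.
# Plain-Python stand-in: map over the list and take element 0.
mapped = [fun(r) for r in records]
first = mapped[0]
print(first)  # ['1002', 'John', '20000']
```

The result is a Python list whose elements are strings, since `str.split` never converts the fields to numbers.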