TETABDAPRFIC3031

Infosys Certified PySpark Professional

Practice with real exam-pattern questions for Infosys Certified PySpark Professional. Each question includes a detailed explanation to help you understand the concept, not just memorise the answer. Try 10 questions free — no login required.

Intermediate · Big Data · 60 min
Free questions

10 Infosys Certified PySpark Professional practice questions with answers

Real Lex exam-pattern multiple-choice questions for the Infosys Certified PySpark Professional certification. Each question includes the correct answer. The full question bank is available to Premium members.

  1. Question 1

    Consider a scenario where an HDFS file is divided into four blocks and is to be processed by a Spark application. When the RDD is created, the data in each block is represented as:

    • A

      partitions

      Correct
    • B

      blocks

    • C

      tasks

    • D

      executors
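Why four blocks usually become four partitions can be sketched with the split-sizing rule of Hadoop's FileInputFormat, which sc.textFile relies on. Each input split becomes one RDD partition. The function below is a simplified pure-Python model, not a Spark API, and it rests on these assumptions:

- a single uncompressed (splittable) file;
- default split settings;
- sc.textFile's default minPartitions of 2, since Spark uses min(defaultParallelism, 2).

`compute_splits` and the file sizes are hypothetical.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_splits(file_size, block_size, min_partitions=2, min_size=1):
    """Simplified model of FileInputFormat split sizing for one splittable file.

    Returns a list of (offset, length) splits; each split would become one partition.
    """
    goal_size = file_size // max(min_partitions, 1)
    split_size = max(min_size, min(goal_size, block_size))
    splits = []
    offset, remaining = 0, file_size
    # Cut full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left (up to 1.1 x split_size) becomes the final split.
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


MB = 1024 * 1024
# Hypothetical 512 MB file stored as four 128 MB HDFS blocks.
splits = compute_splits(512 * MB, 128 * MB)
print(len(splits))  # 4
print([(off // MB, length // MB) for off, length in splits])
# [(0, 128), (128, 128), (256, 128), (384, 128)]
```

Because the goal size (256 MB) is larger than the block size, the block size wins, so each block maps to exactly one split and therefore one partition.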

  2. Question 2

    Consider a scenario where a Spark program processes a large amount of data in a distributed manner. Where is the data being processed actually stored?

    • A

      Driver

    • B

      Cluster Manager

    • C

      Worker nodes

      Correct
    • D

      Tasks

  3. Question 3

    Consider a scenario where a few partitions of an RDD are lost while a Spark job is executing. Which of the following components facilitates the recreation of the lost partitions?

    • A

      accumulators

    • B

      Lineage graph

      Correct
    • C

      replication

    • D

      key operations
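Lineage-based recovery can be illustrated with a toy model. The `ToyRDD` class below is a hypothetical pure-Python stand-in, not Spark's implementation. Each derived dataset keeps a reference to its parent and to the function that produced it. A lost partition can therefore be recomputed from the parent, instead of being restored from a replica.

```python
class ToyRDD:
    """Toy model: a source holds data; a derived RDD holds its lineage
    (parent + function) and optionally a persisted copy of its partitions."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self.parent = parent
        self.fn = fn
        self.stored = [list(p) for p in partitions] if partitions is not None else None

    def map(self, f):
        # Record the lineage: apply f to every element of the matching parent partition.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def num_partitions(self):
        return len(self.stored) if self.parent is None else self.parent.num_partitions()

    def compute(self, i):
        # Use the stored copy if present; otherwise rebuild partition i from the lineage.
        if self.stored is not None and self.stored[i] is not None:
            return self.stored[i]
        if self.parent is None:
            raise RuntimeError(f"source partition {i} is lost and has no lineage")
        return self.fn(self.parent.compute(i))

    def persist(self):
        self.stored = [self.compute(i) for i in range(self.num_partitions())]
        return self

    def lose_partition(self, i):
        # Simulate a failure that drops one stored partition.
        self.stored[i] = None


source = ToyRDD(partitions=[[1, 2], [3, 4], [5, 6]])
squared = source.map(lambda x: x * x).persist()
squared.lose_partition(1)
print(squared.compute(1))  # [9, 16], rebuilt from source via the recorded map
```

In real Spark, the source of the lineage is typically the input data itself (for example, HDFS blocks), which is read again when a lost partition has to be rebuilt.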

  4. Question 4

    Which of the following are features of the PySpark framework? (Select all that apply.)

    • A

      In-memory computation

      Correct
    • B

      Lazy evaluation

      Correct
    • C

      Parallel processing

      Correct
    • D

      More disk I/O operations while processing data
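Lazy evaluation can be illustrated with a small model. Transformations such as map and filter only record steps in a plan, and nothing runs until an action such as collect() is called. `LazyDataset` below is a hypothetical pure-Python toy, not a PySpark class. It only mimics that ordering of events.

```python
class LazyDataset:
    """Toy model of lazy evaluation: transformations are recorded, not run,
    until an action such as collect() is called."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)  # the recorded plan of transformations

    def map(self, f):
        return LazyDataset(self.data, self.ops + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self.data, self.ops + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded plan executed, one element at a time.
        out = []
        for x in self.data:
            keep = True
            for kind, f in self.ops:
                if kind == "map":
                    x = f(x)
                elif kind == "filter" and not f(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out


calls = []


def double(x):
    calls.append(x)  # record every time the function actually runs
    return 2 * x


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)         # [] (nothing has run yet)
print(ds.collect())  # [6, 8]
print(calls)         # [1, 2, 3, 4]
```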

  5. Question 5

    Consider a scenario where a PySpark job is being deployed to a cluster. What does the "--master" parameter of the spark-submit command indicate?

    • A

      Provides the cluster manager details used to run the Spark application.

      Correct
    • B

      Indicates that the Spark application runs only on the master node and not on the worker nodes

    • C

      Provides the driver program details

    • D

      Provides the worker node details

  6. Question 6

    Sam works for a banking client and performs data analysis using Spark. Which of the following commands can he use to get the URL of the Spark web UI, where he can view job and executor details?

    • A

      sc.webUI

    • B

      sc.web

    • C

      sc.uiWebUrl

      Correct
    • D

      sc.displayUI

  7. Question 7

    Which of the following operations may result in data skew, that is, an uneven distribution of data across partitions?

    • A

      cache()

    • B

      collect()

    • C

      persist()

    • D

      coalesce()

      Correct
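The following is a rough model of why coalesce() can leave partitions skewed. Without a shuffle, coalesce(n) only merges existing partitions, so one oversized partition stays oversized. By contrast, repartition(n) shuffles records across the cluster. The functions below are a simplified pure-Python sketch, and they rest on these assumptions:

- There are no data-locality preferences. In that case, Spark's default coalescer merges contiguous ranges of parent partitions.
- repartition's shuffle is modelled as a plain round-robin deal.
- The partition sizes are hypothetical.

```python
def coalesce_sizes(partition_sizes, n):
    """Toy model of coalesce(n) without a shuffle: contiguous parent partitions
    are merged, so records never move between groups and skew is preserved."""
    p = len(partition_sizes)
    groups = []
    for i in range(n):
        start, end = i * p // n, (i + 1) * p // n
        groups.append(sum(partition_sizes[start:end]))
    return groups


def repartition_sizes(partition_sizes, n):
    """Toy model of repartition(n): a full shuffle that deals records out round-robin."""
    total = sum(partition_sizes)
    return [total // n + (1 if i < total % n else 0) for i in range(n)]


sizes = [1000, 10, 10, 10]          # hypothetical record counts per partition
print(coalesce_sizes(sizes, 2))     # [1010, 20]  (skewed)
print(repartition_sizes(sizes, 2))  # [515, 515]  (balanced)
```

The trade-off is cost. coalesce() avoids a shuffle and is cheap, while repartition() pays for a shuffle to rebalance the data.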

  8. Question 8

    Shane works on a data analytics project and needs to perform analysis on employee data (Employee.csv file).

    Schema: EmployeeID, EmployeeName, Age, Salary, Department.

    Which of the following code snippets can be used to sort the employees by department in descending order?

    • A

      logsRDD = sc.textFile("/dataset/Employee")
      FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
      logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
      sortdata= logdata.sortByKey(ascending=False,keyfunc=lambda k: k)

      Correct
    • B

      logsRDD = sc.textFile("/dataset/Employee")
      FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
      logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
      sortdata= logdata.sortByKey(ascending=True,keyfunc=lambda k: k)

    • C

      logsRDD = sc.textFile("/dataset/Employee")
      FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
      logdata= FieldsRDD.map(lambda var1: (var1[4], [var1[0],var1[1],var1[2], var1[3]]))
      sortdata= logdata.sort(ascending=True,keyfunc=lambda k: k)

    • D

      logsRDD = sc.textFile("/dataset/Employee")
      FieldsRDD = logsRDD.map(lambda var1: var1.split(","))
      logdata= FieldsRDD.map(lambda var1: [(var1[0], var1[1],var1[2],var1[3], var1[4]]))
      sortdata= logdata.sort(ascending=False,keyfunc=lambda k: k)
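The correct option's three steps can be mirrored in plain Python on a few rows to check the expected ordering without a cluster. `sort_by_department_desc` and the sample rows are hypothetical. Python's `sorted` stands in for `sortByKey(ascending=False)`. One difference to keep in mind: Python's sort is stable, whereas Spark does not promise any order among records that share a department.

```python
def sort_by_department_desc(lines):
    """Plain-Python mirror of the correct option: key each record by
    Department (index 4) and sort by that key in descending order."""
    fields = [line.split(",") for line in lines]                # like logsRDD.map(... split)
    keyed = [(f[4], [f[0], f[1], f[2], f[3]]) for f in fields]  # like FieldsRDD.map(...)
    return sorted(keyed, key=lambda kv: kv[0], reverse=True)    # like sortByKey(ascending=False)


# Hypothetical rows in the schema EmployeeID,EmployeeName,Age,Salary,Department
rows = [
    "101,Asha,29,50000,Finance",
    "102,Ravi,35,65000,Sales",
    "103,Meera,41,72000,HR",
]
for dept, rest in sort_by_department_desc(rows):
    print(dept, rest)  # Sales, then HR, then Finance
```

If the real Employee.csv starts with a header row, sc.textFile returns it as an ordinary line. It would need to be filtered out before sorting.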

  9. Question 9

    Which storage level passed to the persist() method is equivalent to calling the cache() method on an RDD?

    • A

      MEMORY_ONLY

      Correct
    • B

      DISK_ONLY

    • C

      MEMORY_DISK_ONLY

    • D

      DISK_ONLY_2

  10. Question 10

    Consider the following code snippet:

    def fun(x):
        return x.split(",")

    rdd=sc.parallelize(["1002,John,20000","1003,harry,7000","1004,lookie,900"])
    newrdd=rdd.map(fun) # Line 1
    newrdd.first() # Line 2

    Predict the output.

    • A

      ['1002', 'John', '20000']

      Correct
    • B

      ('1002', 'John', '20000')

    • C

      Error will occur due to Line 1

    • D

      Error will occur due to Line 2
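Because `fun` is plain Python, the result of Line 2 can be checked locally. Python's built-in `map` is lazy, much like `rdd.map`, so `next(map(fun, data))` evaluates only the first element. This loosely mirrors `first()`, but it is a local analogy, not Spark itself. Neither line raises an error. The result is a list of strings, not a tuple and not integers.

```python
def fun(x):
    return x.split(",")


data = ["1002,John,20000", "1003,harry,7000", "1004,lookie,900"]
# Local analogy of rdd.map(fun).first(): apply fun lazily and take the first result.
first = next(map(fun, data))
print(first)        # ['1002', 'John', '20000']
print(type(first))  # <class 'list'>
```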

Pricing

Pay once. Clear every cert this year.

One subscription, full Telegram channel access, every PDF posted during your membership.

Monthly
50% OFF
₹1,300 (was ₹2,600)
Per month · cancel anytime
  • Full access to all 1,357+ certifications
  • Monthly updated question banks
  • Telegram private channel access
  • Cancel anytime
Quarterly (Popular)
44% OFF
₹1,800 (was ₹3,200)
That's ₹600/mo · billed for 3 months
  • Everything in Monthly
  • Save ₹2,100 vs monthly billing
  • Priority answer key requests
  • Best for increasing DQ score fast
Lifetime (Best value)
52% OFF
₹2,400 (was ₹5,000)
One-time · lifetime access
  • Everything in Quarterly
  • Lifetime channel access — no renewals
  • All future certifications included
  • Priority response from admin team
FAQ

Common questions, straight answers.

A monthly-updated Telegram channel where we post real exam-pattern question banks and detailed answer keys for 1,357+ Infosys Lex certifications. You join once, you get every PDF posted during your membership.

Right after payment on our Graphy page, you'll receive a private invite link to the Telegram channel. Access is instant — usually under 30 seconds.

We compile question banks from the actual Lex test pattern, sourced and verified by 180K+ community members who've recently cleared these exams. Match rate is consistently 85–95%.

Every single month. When Infosys rolls out new versions of certifications, we post updated dumps within 7–10 days. You'll see channel activity weekly.

Clearing certifications is one of the highest-weighted DQ factors. Members typically clear 3–5 certifications in their first 3 months, which moves DQ scores up by a full band.

InfyLexDumps

Independent exam preparation platform for Infosys Lex certifications. Real exam-pattern question banks, monthly updates, 180K+ community members.

Join Premium Telegram
Contact
  • @prepflixadmin
  • admin@prepflix.net
This platform is an independent educational resource and is not affiliated with or endorsed by Infosys Ltd. All certification names referenced are property of their respective owners.
© 2026 InfyLexDumps