Latest Associate-Developer-Apache-Spark Free Dumps - Databricks Certified Associate Developer for Apache Spark 3.0
Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?
Answer: E
Explanation: (available to DumpTOP members only)
Which of the following statements about data skew is incorrect?
Answer: D
Explanation: (available to DumpTOP members only)
Which of the following statements about Spark's configuration properties is incorrect?
Answer: B
Explanation: (available to DumpTOP members only)
Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate format for this kind of data?
Answer: C
Explanation: (available to DumpTOP members only)
Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?
Answer: B
Explanation: (available to DumpTOP members only)
The code block displayed below contains an error. The code block should use Python method find_most_freq_letter to find the letter present most in column itemName of DataFrame itemsDf and return it in a new column most_frequent_letter. Find the error.
Code block:
find_most_freq_letter_udf = udf(find_most_freq_letter)
itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))
Answer: C
Explanation: (available to DumpTOP members only)
Which of the following code blocks returns a one-column DataFrame for which every row contains an array of all integer numbers from 0 up to and including the number given in column predError of DataFrame transactionsDf, and null if predError is null?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
Answer: C
Explanation: (available to DumpTOP members only)
The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.
spark.sql.shuffle.partitions
__1__.__2__.__3__(__4__, 100)
Answer: E
Explanation: (available to DumpTOP members only)
Which of the following describes the difference between client and cluster execution modes?
Answer: C
Explanation: (available to DumpTOP members only)
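The distinction is where the driver process runs: in client mode it runs on the machine that submitted the application, while in cluster mode it runs on a node inside the cluster. The mode is selected with spark-submit's `--deploy-mode` flag; a sketch (the cluster manager and script name are placeholders):

```shell
# Client mode (the default): driver runs on the submitting machine.
spark-submit --master yarn --deploy-mode client app.py

# Cluster mode: driver runs on a worker node inside the cluster.
spark-submit --master yarn --deploy-mode cluster app.py
```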
The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__
Answer: C
Explanation: (available to DumpTOP members only)