Latest Databricks-Certified-Data-Engineer-Associate Free Dumps - Databricks Certified Data Engineer Associate

Question 1

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

A. Data warehouse

B. None of these

C. Data lake

D. Data lakehouse

E. All of these

Answer: D

Question 2

A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?

A. The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

B. The table being created is a live table.

C. The customers table is a streaming live table.

D. The STREAM function is not needed and will cause an error.

E. The data in the customers table has been updated since its last run.

Answer: C

Question 3

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job's current run. The data engineer asks a tech lead for help in identifying why this might be the case.
Which of the following approaches can the tech lead use to identify why the notebook is running slowly as part of the Job?

A. They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.

B. There is no way to determine why a Job task is running slowly.

C. They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.

D. They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.

E. They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.

Answer: D

Question 4

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.
The data engineer runs the following query to join these tables together:
Which of the following will be returned by the above query?

A. Option E

B. Option D

C. Option C

D. Option B

E. Option A

Answer: E

Question 5

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

A. Worker node

B. Databricks Filesystem

C. Databricks web application

D. Driver node

E. JDBC data source

Answer: C

Question 6

What is stored in a Databricks customer's cloud account?

A. Cluster management metadata

B. Data

C. Notebooks

D. Databricks web application

Answer: B

Question 7

Which file format is used for storing a Delta Lake table?

A. CSV

B. Parquet

C. JSON

D. Delta

Answer: B
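
To make the distinction between the options concrete: a Delta Lake table keeps its data in Parquet files, alongside a transaction log directory named _delta_log whose commit entries are JSON. The following is a minimal sketch in plain Python, not a Delta Lake API; the function name classify_delta_files and the file names in the listing are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def classify_delta_files(paths):
    """Split relative paths from a Delta table directory into data and log files."""
    data_files, log_files = [], []
    for raw in paths:
        path = PurePosixPath(raw)
        if "_delta_log" in path.parts:
            # Anything under _delta_log belongs to the transaction log.
            log_files.append(raw)
        elif path.suffix == ".parquet":
            # The table's rows live in Parquet data files.
            data_files.append(raw)
    return data_files, log_files


# Hypothetical directory listing (file names are made up for this sketch).
listing = [
    "_delta_log/00000000000000000000.json",
    "_delta_log/00000000000000000001.json",
    "part-00000-example.snappy.parquet",
    "part-00001-example.snappy.parquet",
]

if __name__ == "__main__":
    data, log = classify_delta_files(listing)
    print("data files:", data)
    print("log files:", log)
```

The JSON files describe the table's history; the rows themselves are stored as Parquet, which is why Parquet is the answer here.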

Question 8

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

A. trigger(once="5 seconds")

B. trigger(continuous="5 seconds")

C. trigger(processingTime="5 seconds")

D. trigger("5 seconds")

E. trigger()

Answer: C

Question 9

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.
Which of the following tools can the data engineer use to solve this problem?

A. Auto Loader

B. Databricks SQL

C. Unity Catalog

D. Data Explorer

E. Delta Lake

Answer: A
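
The requirement in this question (leave the source files in place and ingest only the ones that arrived since the previous run) comes down to remembering which files have already been processed. The sketch below illustrates that idea in plain Python. It is not Auto Loader and not how Auto Loader is implemented; the function name find_new_files and the JSON state file are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def find_new_files(source_dir: Path, state_file: Path) -> list[Path]:
    """Return files in source_dir that earlier runs have not seen, then remember them."""
    # Load the file names recorded by previous runs (empty on the first run).
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    # The source files are never moved or deleted; only their names are compared.
    new_files = sorted(
        p for p in source_dir.iterdir() if p.is_file() and p.name not in seen
    )
    # Persist the updated set so the next run skips everything returned here.
    seen.update(p.name for p in new_files)
    state_file.write_text(json.dumps(sorted(seen)))
    return new_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp) / "landing"    # shared directory owned by the source system
        landing.mkdir()
        state = Path(tmp) / "seen.json"    # state kept outside the shared directory

        (landing / "orders_1.csv").write_text("id\n1\n")
        print([p.name for p in find_new_files(landing, state)])  # first run

        (landing / "orders_2.csv").write_text("id\n2\n")
        print([p.name for p in find_new_files(landing, state)])  # second run
```

Auto Loader handles this bookkeeping itself, which is why it fits the scenario better than the other options listed.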

Question 10

A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE
What is the expected behavior when a batch of data containing data that violates these constraints is processed?

A. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

B. Records that violate the expectation cause the job to fail.

C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D. Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

E. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Answer: B
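
Several answer options describe the other expectation modes, so it can help to see the three behaviors side by side: an expectation with no ON VIOLATION clause keeps the violating records and reports them, ON VIOLATION DROP ROW discards them, and ON VIOLATION FAIL UPDATE stops the update. The following plain-Python sketch only mimics those semantics. It is not the Delta Live Tables API; apply_expectation, ExpectationFailed, and the mode names "warn", "drop", and "fail" are hypothetical names used for this illustration.

```python
class ExpectationFailed(Exception):
    """Raised when a record violates an expectation in 'fail' mode."""


def apply_expectation(records, predicate, on_violation="warn"):
    """Return (kept_records, violation_count) for one batch.

    on_violation:
      "warn" - keep violating records and only count them (no ON VIOLATION clause)
      "drop" - discard violating records (like ON VIOLATION DROP ROW)
      "fail" - abort the whole batch (like ON VIOLATION FAIL UPDATE)
    """
    if on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown mode: {on_violation}")
    kept, violations = [], 0
    for record in records:
        if predicate(record):
            kept.append(record)
            continue
        violations += 1
        if on_violation == "fail":
            # Nothing from this batch is written once a violation is seen.
            raise ExpectationFailed(f"record violates expectation: {record}")
        if on_violation == "warn":
            kept.append(record)
        # In "drop" mode the record is simply skipped.
    return kept, violations


def valid_timestamp(record):
    # ISO date strings compare correctly as plain strings.
    return record["timestamp"] > "2020-01-01"


if __name__ == "__main__":
    batch = [{"timestamp": "2021-05-01"}, {"timestamp": "2019-12-31"}]
    print(apply_expectation(batch, valid_timestamp, "warn"))
    print(apply_expectation(batch, valid_timestamp, "drop"))
    try:
        apply_expectation(batch, valid_timestamp, "fail")
    except ExpectationFailed as exc:
        print("update failed:", exc)
```

With the clause from the question, the behavior corresponds to the "fail" branch: a single violating record stops the update.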
