Latest Professional-Data-Engineer Free Dumps - Google Certified Professional Data Engineer
In order to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster you should use a(n) _____.
Answer: A
Explanation: (Visible only to DumpTOP members)
You are using BigQuery and Data Studio to design a customer-facing dashboard that displays large quantities of aggregated data. You expect a high volume of concurrent users. You need to optimize the dashboard to provide quick visualizations with minimal latency. What should you do?
Answer: B
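One generic way to cut dashboard latency, regardless of which option the exam intends, is to pre-aggregate the raw data into a small summary table that the dashboard queries instead of the raw events. The sketch below uses the google-cloud-bigquery client; the project, dataset, and column names are placeholder assumptions, not from the question.

```python
# A hedged sketch: materialize an aggregated summary table so dashboard queries
# scan far less data per viewer. All names below are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()
dest = bigquery.TableReference.from_string("my-project.reporting.daily_summary")

job_config = bigquery.QueryJobConfig(
    destination=dest,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

client.query(
    """
    SELECT DATE(event_ts) AS day, country, SUM(amount) AS total_amount
    FROM `my-project.raw.events`
    GROUP BY day, country
    """,
    job_config=job_config,
).result()  # the dashboard then reads the small summary table, not raw events
```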
You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains the latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?
Answer: D
Explanation: (Visible only to DumpTOP members)
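A common way to encode the "location drives price" dependency for a fully connected network is to bucketize latitude and longitude and cross the buckets, so each grid cell becomes its own categorical feature. The sketch below is illustrative only; the bucket boundaries, column names, and sample values are assumptions, not from the question.

```python
# A minimal sketch of crossing binned latitude and longitude so a dense network
# can learn location-specific effects. Boundaries and data are made up.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "latitude": [37.77, 34.05, 40.71],
    "longitude": [-122.42, -118.24, -74.01],
    "price": [1_200_000, 850_000, 990_000],
})

# Discretize each coordinate, then cross the buckets: every (lat_bucket,
# lon_bucket) cell becomes its own categorical value.
lat_buckets = np.digitize(df["latitude"], bins=np.linspace(32, 42, 21))
lon_buckets = np.digitize(df["longitude"], bins=np.linspace(-125, -70, 111))
df["lat_x_lon"] = lat_buckets * 1000 + lon_buckets  # unique id per grid cell

# One-hot encode the crossed feature before feeding it to the network.
features = pd.get_dummies(df["lat_x_lon"], prefix="cell")
```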
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)
Answer: B, D, E
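One building block for least-privilege access in BigQuery is an authorized view: users query a view that exposes only the columns and rows they need, without having read access to the underlying dataset. A minimal sketch with the google-cloud-bigquery client follows; the project, dataset, and view names are placeholders.

```python
# A hedged sketch of authorizing a view on a private dataset. End users are
# granted access to the view's dataset only, never to the source tables.
from google.cloud import bigquery

client = bigquery.Client()

source_dataset = client.get_dataset("my-project.private_data")      # assumption
view_ref = {"projectId": "my-project", "datasetId": "shared_views",
            "tableId": "minimal_columns_view"}                       # assumption

# Authorize the view to read the source dataset on behalf of its users.
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view_ref))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```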
Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?
Answer: A
Explanation: (Visible only to DumpTOP members)
Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is imported successfully and completely; however, the imported data does not match the source file byte-for-byte. What is the most likely cause of this problem?
Answer: C
Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?
Answer: B
You have designed an Apache Beam processing pipeline that reads from a Pub/Sub topic, which has a message retention duration of one day, and writes to a Cloud Storage bucket. You need to select a bucket location and processing strategy to prevent data loss in case of a regional outage with an RPO of 15 minutes. What should you do?
Answer: B
Explanation: (Visible only to DumpTOP members)
You need to compose visualization for operations teams with the following requirements:
Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
The report must not be more than 3 hours delayed from live data.
The actionable report should only show suboptimal links.
Most suboptimal links should be sorted to the top.
Suboptimal links can be grouped and filtered by regional geography.
User response time to load the report must be <5 seconds.
You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?
Answer: D
You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?
Answer: B
Explanation: (Visible only to DumpTOP members)
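One way to avoid pinning a batch job to a single zone at submission time is to specify only a worker region and let the Dataflow service place workers in any healthy zone within it. A minimal Apache Beam sketch follows; the project ID, region, and bucket are placeholder assumptions, and running it would actually submit a job.

```python
# A hedged sketch: submit the job with a worker region rather than a fixed zone.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                 # assumption: placeholder project id
    region="us-central1",                 # region-level placement, no --worker_zone
    temp_location="gs://my-bucket/tmp",   # assumption: placeholder bucket
)

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["a", "b", "c"])
     | beam.Map(str.upper))
```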
You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?
Answer: B
Explanation: (Visible only to DumpTOP members)
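In the Beam/Dataflow programming model, the data processing operations themselves are expressed as transforms, and arbitrary control flow (conditionals, loops) lives inside the DoFns and composite transforms that the pipeline graph composes. A small illustrative composite transform follows; the names are made up for the example.

```python
# A hedged sketch of a custom composite transform containing conditional logic.
import apache_beam as beam

class FilterAndFormat(beam.PTransform):
    def expand(self, pcoll):
        return (pcoll
                | beam.Filter(lambda x: x % 2 == 0)   # conditional logic
                | beam.Map(lambda x: f"even:{x}"))

with beam.Pipeline() as p:
    p | beam.Create(range(10)) | FilterAndFormat() | beam.Map(print)
```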
You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to connection failure. You verified that VPC Service Controls and shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?
Answer: A
Explanation: (Visible only to DumpTOP members)
Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three.)
Answer: C, E, F
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?
Answer: C
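When the reference data fits in a single worker's memory, a typical pattern is a streaming job that reads the reference table once and broadcasts it to the enrichment step as a side input. The sketch below assumes placeholder project, topic, and table names, and that the destination table already exists with a matching schema.

```python
# A hedged sketch: enrich Pub/Sub messages with small BigQuery reference data
# passed as an in-memory side input, then append results to BigQuery.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)

with beam.Pipeline(options=opts) as p:
    # Small reference table, read once and broadcast to every worker.
    ref = (p
           | "ReadRef" >> beam.io.ReadFromBigQuery(
                 query="SELECT id, label FROM `my-project.refs.lookup`",
                 use_standard_sql=True)
           | "ToKV" >> beam.Map(lambda row: (row["id"], row["label"])))

    events = p | "ReadEvents" >> beam.io.ReadFromPubSub(
        topic="projects/my-project/topics/events")

    enriched = events | "Enrich" >> beam.Map(
        lambda msg, lookup: {"raw": msg.decode("utf-8"),
                             "label": lookup.get(msg.decode("utf-8"), "unknown")},
        lookup=beam.pvalue.AsDict(ref))

    # Assumes my-project:analytics.enriched already exists.
    enriched | "Write" >> beam.io.WriteToBigQuery(
        "my-project:analytics.enriched",
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
```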
Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of errors in the input data, and you need to improve the reliability of the pipeline (including the ability to reprocess all failing data).
What should you do?
Answer: D
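A widely used way to make such a pipeline reliable is the dead-letter pattern: elements that fail processing are emitted to a separate tagged output and persisted, so they can be inspected, fixed, and reprocessed later instead of failing the whole job. A minimal local sketch follows; the sinks shown are placeholders.

```python
# A hedged sketch of the dead-letter pattern with a tagged side output.
import json
import apache_beam as beam
from apache_beam.pvalue import TaggedOutput

class ParseOrDeadLetter(beam.DoFn):
    def process(self, element):
        try:
            yield json.loads(element)                   # good records: main output
        except Exception:
            yield TaggedOutput("dead_letter", element)  # failures are kept, not dropped

with beam.Pipeline() as p:
    results = (p
               | beam.Create(['{"a": 1}', "not-json"])
               | beam.ParDo(ParseOrDeadLetter()).with_outputs("dead_letter", main="parsed"))

    results.parsed | "UseGood" >> beam.Map(print)
    # In a real job this sink would be GCS/BigQuery/Pub/Sub for later reprocessing.
    results.dead_letter | "StoreBad" >> beam.Map(lambda x: print("DEAD:", x))
```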
Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?
Answer: B
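Whatever the intended answer, deduplication generally hinges on keying each record by a stable identifier that ignores the transmission timestamp (or by a GUID assigned at the source), so re-transmissions collapse to a single copy. The sketch below is a plain-Python illustration with made-up field names and an in-memory batch.

```python
# A hedged sketch: key each record by a hash of its payload (excluding the
# transmission timestamp) and keep one copy per key.
import hashlib
import json

records = [
    {"sku": "A1", "qty": 10, "sent_at": "2024-01-01T00:00:00Z"},
    {"sku": "A1", "qty": 10, "sent_at": "2024-01-01T00:05:00Z"},  # re-transmission
]

def content_key(rec):
    payload = {k: v for k, v in rec.items() if k != "sent_at"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

deduped = {}
for rec in records:
    deduped.setdefault(content_key(rec), rec)   # first copy wins

print(list(deduped.values()))
```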