Latest Databricks-Machine-Learning-Associate Free Dump - Databricks Certified Machine Learning Associate
A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?
Answer: B
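A point often raised in this comparison is training cost: k-fold cross-validation fits k models per hyperparameter configuration, while a single train-validation split fits only one. The following is a minimal sketch of that arithmetic. The helper `count_model_fits` and the grid of 12 configurations are hypothetical and chosen only for illustration.

```python
def count_model_fits(num_configs: int, k_folds: int = 1) -> int:
    """Return how many model fits are needed to evaluate every configuration.

    k_folds=1 stands for a single train-validation split;
    k_folds=k stands for k-fold cross-validation.
    """
    if num_configs < 0 or k_folds < 1:
        raise ValueError("num_configs must be >= 0 and k_folds must be >= 1")
    return num_configs * k_folds


# Assumed example: a hyperparameter grid with 12 configurations.
split_fits = count_model_fits(num_configs=12)          # train-validation split
cv_fits = count_model_fits(num_configs=12, k_folds=5)  # 5-fold cross-validation

print(f"train-validation split: {split_fits} fits")
print(f"5-fold cross-validation: {cv_fits} fits")
```

The trade-off runs the other way for estimate quality: the single split evaluates each configuration on one validation set only, so its score depends more heavily on how that split happened to fall.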
A new data scientist has started working on an existing machine learning project. The project is a scheduled Job that retrains every day. The project currently exists in a Repo in Databricks. The data scientist has been tasked with improving the feature engineering of the pipeline's preprocessing stage. The data scientist wants to make necessary updates to the code that can be easily adopted into the project without changing what is being run each day.
Which approach should the data scientist take to complete this task?
Answer: A
A data scientist has replaced missing values in their feature set with each respective feature variable's median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.
Which of the following approaches can they take to include as much information as possible in the feature set?
Answer: B
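A common way to keep the information that a value was missing is to add a binary indicator column next to the imputed feature. The sketch below assumes that technique and uses only the standard library. The function `impute_with_indicator` and the `ages` data are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def impute_with_indicator(
    values: List[Optional[float]],
) -> Tuple[List[float], List[int]]:
    """Median-impute a feature and return a parallel 0/1 missingness flag."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a median with no observed values")
    fill_value = median(observed)
    imputed = [fill_value if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


# Assumed example data: None marks a missing entry.
ages = [31.0, None, 45.0, 28.0, None]
ages_imputed, ages_missing = impute_with_indicator(ages)

print(ages_imputed)
print(ages_missing)
```

With the indicator kept as its own feature, a model can still learn from the fact that a value was absent even though the gap itself has been filled.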
A machine learning engineer is trying to scale a machine learning pipeline by distributing its single-node model tuning process. After broadcasting the entire training data onto each core, each core in the cluster can train one model at a time. Because the tuning process is still running slowly, the engineer wants to increase the level of parallelism from 4 cores to 8 cores to speed up the tuning process. Unfortunately, the total memory in the cluster cannot be increased.
In which of the following scenarios will increasing the level of parallelism from 4 to 8 speed up the tuning process?
Answer: A
Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
Answer: D
A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.
Which of the following possible explanations for this difference is invalid?
Answer: C
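One explanation consistent with this scenario is that the second model's predictions are on the log scale and were not exponentiated before being compared with actual prices. The sketch below illustrates that effect with made-up numbers. The `rmse` helper, the prices, and the predictions are all assumptions for illustration.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], actuals: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(actuals) or len(actuals) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(actuals))


# Assumed data: actual prices, and a model that predicts log(price).
actual_prices = [100.0, 250.0, 400.0]
log_predictions = [math.log(110.0), math.log(240.0), math.log(390.0)]

# Comparing log-scale predictions directly with prices mixes two scales.
rmse_raw = rmse(log_predictions, actual_prices)

# Exponentiating first puts predictions back on the price scale.
price_predictions = [math.exp(p) for p in log_predictions]
rmse_exp = rmse(price_predictions, actual_prices)

print(f"RMSE without exponentiating: {rmse_raw:.2f}")
print(f"RMSE after exponentiating:   {rmse_exp:.2f}")
```

In this toy case every back-transformed prediction is off by 10, so the second RMSE is about 10, while the first is far larger purely because of the scale mismatch.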
A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:
Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?
Answer: B
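In PySpark, a pandas UDF typed as `Iterator[pd.Series] -> Iterator[pd.Series]` receives an iterator of batches, which lets expensive setup such as loading a model run once before the loop instead of once per batch. The sketch below is not the code block from the question and does not use Spark. It is a plain-Python simulation of that pattern, and `load_model`, `predict_per_batch`, and `predict_iterator` are hypothetical names.

```python
from typing import Callable, Dict, Iterator, List

# Counts how often the (hypothetical) expensive model load runs.
load_stats: Dict[str, int] = {"loads": 0}


def load_model() -> Callable[[float], float]:
    """Stand-in for an expensive model load; the 'model' doubles its input."""
    load_stats["loads"] += 1
    return lambda x: 2.0 * x


def predict_per_batch(batch: List[float]) -> List[float]:
    """Scalar-style function: the model is loaded again for every batch."""
    model = load_model()
    return [model(x) for x in batch]


def predict_iterator(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    """Iterator-style function: the model is loaded once, then reused."""
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


# Assumed input: three small batches standing in for Arrow record batches.
batches = [[1.0, 2.0], [3.0], [4.0, 5.0]]

load_stats["loads"] = 0
per_batch_out = [predict_per_batch(b) for b in batches]
per_batch_loads = load_stats["loads"]

load_stats["loads"] = 0
iterator_out = list(predict_iterator(iter(batches)))
iterator_loads = load_stats["loads"]

print(f"per-batch loads: {per_batch_loads}, iterator loads: {iterator_loads}")
```

Both variants return the same predictions. The difference is only in how many times the setup cost is paid, which matters more as the number of batches grows.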
A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult.
Which of the following describes why?
Answer: D
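Boosting builds its trees sequentially: each new tree is fit to the errors left by the ensemble so far, so tree t cannot be trained until tree t-1 is finished. The sketch below shows that dependency with a deliberately simplified squared-error booster that uses one-split stumps on a single feature. All names (`fit_stump`, `predict_stump`, `boost`, `mse`) and the data are hypothetical, and this is not how any particular library implements gradient boosted trees.

```python
from typing import List, Optional, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a one-split regression stump that minimizes squared error."""
    best_sse: Optional[float] = None
    best_stump: Optional[Stump] = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (threshold, left_mean, right_mean)
    if best_stump is None:
        raise ValueError("need at least two distinct x values")
    return best_stump


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs: List[float], ys: List[float], n_trees: int,
          learning_rate: float = 0.5) -> List[float]:
    """Return training predictions after n_trees sequential boosting rounds."""
    predictions = [0.0] * len(ys)
    for _ in range(n_trees):
        # The residuals depend on every stump fitted so far, which is
        # why the rounds cannot simply run in parallel.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return predictions


def mse(ys: List[float], predictions: List[float]) -> float:
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Assumed toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]

mse_0 = mse(ys, boost(xs, ys, n_trees=0))
mse_1 = mse(ys, boost(xs, ys, n_trees=1))
mse_10 = mse(ys, boost(xs, ys, n_trees=10))
print(mse_0, mse_1, mse_10)
```

Work inside a single round, such as scoring candidate thresholds, can still be parallelized. It is the round-to-round dependency on the residuals that resists it.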