AWS Certified Machine Learning Specialty Exam Notes and Practice Tests

Question 1 of 65

1. Question
A fintech company has developed an XGBoost model to assess credit risk using historical loan repayment data. The model performs well on the training dataset but underperforms on a separate validation dataset, suggesting overfitting. Which hyperparameter adjustments should the data science team make to improve generalization? (Choose TWO)
- Decrease the 'colsample_bytree' parameter in XGBoost
- Reduce the 'eta' parameter in XGBoost
- Increase the 'max_depth' parameter in XGBoost
- Decrease the 'subsample' parameter in XGBoost
- Increase the 'gamma' parameter in XGBoost

2. Question
You are developing a machine learning system to detect your company's logo in images. You have unlabeled image data, some containing the logo and others not. Which approach requires the least effort to prepare this dataset for supervised learning?
- Use Amazon Mechanical Turk to manually label the dataset by assigning tasks to human workers
- Deploy Amazon Rekognition Custom Labels to automatically identify and label images with minimal manual effort
- Utilize Amazon SageMaker Ground Truth to automatically label the dataset
- Train an object detection model on SageMaker without labeling the dataset

3. Question
Given the following confusion matrix, what is the F1 score of the model? (Columns represent actual values; rows represent predicted values.)

                    Actual Positive   Actual Negative
Predicted Positive         40                10
Predicted Negative         20                30

- 0.67
- 0.73
- 0.60
- 0.36
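To check the arithmetic, the sketch below reads TP, FP and FN from the matrix (rows are predictions, columns are actual values) and computes F1 as the harmonic mean of precision and recall. The function name is illustrative only.

```python
def f1_from_confusion(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # share of predicted positives that are truly positive
    recall = tp / (tp + fn)     # share of actual positives the model found
    return 2 * precision * recall / (precision + recall)

# TP = 40 (predicted +, actual +), FP = 10 (predicted +, actual -),
# FN = 20 (predicted -, actual +); TN = 30 does not enter the F1 formula.
score = f1_from_confusion(tp=40, fp=10, fn=20)
print(round(score, 2))  # 0.73
```

Equivalently, F1 = 2TP / (2TP + FP + FN) = 80 / 110.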

4. Question
A brokerage firm's data scientist reports that their Amazon SageMaker Linear Learner model fails to converge despite data normalization being enabled. Which of the following best explains or resolves the model's failure to converge?
- The training dataset is too large for the instance type being used, causing memory allocation failures
- Apply a uniform distribution to initialize all model weights
- Disable data normalization, as it may interfere with model training
- Enable automatic data shuffling in SageMaker training configuration to prevent convergence issues

5. Question
While training a SageMaker Linear Learner regression model to predict individual income based on age and education level, the dataset includes multiple distinct groups. To ensure optimal model performance, which two preprocessing steps should be taken? (Select TWO)
- Add random noise to the training data
- Normalize the feature data to have zero mean and unit standard deviation
- Shuffle the input data
- Scale the feature data to match the range of the income variable
- Use SMOTE (Synthetic Minority Over-sampling Technique) to balance the dataset
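Linear Learner can normalize features itself, but the two transforms are easy to see in isolation. This is a minimal sketch on hypothetical records (the group labels, values and helper names are assumptions): standardization rescales a feature to zero mean and unit standard deviation, and shuffling breaks up rows that arrive sorted by group so each mini-batch sees a mix of groups.

```python
import random
import statistics

def standardize(values):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def shuffled(rows, seed=42):
    """Return a shuffled copy so records from each group spread across mini-batches."""
    rng = random.Random(seed)
    out = list(rows)
    rng.shuffle(out)
    return out

# Hypothetical records sorted by group: (group, age, education_years, income).
rows = [
    ("A", 22, 12, 28_000), ("A", 25, 12, 31_000),
    ("B", 41, 16, 72_000), ("B", 45, 18, 85_000),
]
ages = standardize([r[1] for r in rows])
training_rows = shuffled(rows)
```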

6. Question
A data science team at an energy analytics company uses a neural network to predict electricity consumption based on environmental variables. Initially, the model underperformed, so they added more layers to capture complex patterns. However, after this modification, training accuracy remains low, and the model fails to converge. Which change is most likely to address the underlying cause of this behavior?
- Increase the number of training epochs in SageMaker to allow more time for convergence
- Use AWS Lambda function layering to split the neural network into smaller tasks for improved parallelization
- Switch to ReLU activation functions to mitigate the vanishing gradient problem
- Enable Amazon S3 cross-region replication to improve training dataset availability

7. Question
An online retailer increased the batch size during training of their deep neural network-powered recommendation engine. Following this adjustment, the accuracy of recommendations declined. What is the most likely reason for this drop in model performance?
- Increased batch size led to overfitting
- Increased batch size caused the optimizer to converge to suboptimal local minima
- Increased batch size improved the model's generalization ability
- Increased batch size reduced training data variability

8. Question
You are tasked with classifying terabytes of news articles into topics using Amazon SageMaker and Latent Dirichlet Allocation (LDA). Processing this vast dataset in a single batch is not feasible. What strategy should be used to improve performance?
- Use multiple GPUs for LDA model training
- Convert the articles to CSV format and use Pipe mode for streaming data
- Transform articles into RecordIO format and use Pipe mode for efficient data ingestion
- Deploy multiple instances in SageMaker to parallelize LDA training

9. Question
You are training Amazon SageMaker BlazingText in supervised mode using File mode. Which of the following represents a correctly formatted training sample?
- __label__4 linux ready for prime time, intel says.
- __label__4 Linux ready for prime time, Intel says.
- __label__4 linux ready for prime time , intel says .
- __label4 linux ready for prime time, intel says.
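In supervised mode, each line of a BlazingText training file starts with `__label__` followed by the class, then the text as space-separated tokens, with punctuation split into its own tokens. The sketch below shows one way to produce such a line; the regex tokenizer and the lowercasing step are simplifying assumptions, and a real pipeline might use a dedicated tokenizer instead.

```python
import re

def to_blazingtext_line(label, text):
    """Format one example as '__label__<label> <space-separated tokens>'."""
    text = text.lower()  # normalization choice, not a format requirement
    # Surround common punctuation with spaces so each mark becomes its own token.
    text = re.sub(r"([.,!?;:()])", r" \1 ", text)
    tokens = text.split()
    return f"__label__{label} " + " ".join(tokens)

print(to_blazingtext_line(4, "Linux ready for prime time, Intel says."))
# __label__4 linux ready for prime time , intel says .
```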

10. Question
A digital media company has deployed an Amazon SageMaker recommendation engine. After developing an improved model, the team wants to evaluate its performance in production while ensuring minimal risk and disruption. Which strategy is the most effective?
- Use Amazon CloudWatch Alarms to automatically trigger a rollback if the updated model underperforms
- Modify SageMaker Endpoint's traffic-splitting configuration to route 50% of traffic to the new model and monitor user engagement
- Leverage Amazon SageMaker Endpoints with Production Variants to gradually roll out the updated model to a subset of users
- Use Amazon API Gateway and AWS Lambda to swap model versions dynamically without impacting production traffic

11. Question
A smart home automation company is developing a regression model to predict daily energy consumption based on various environmental and usage factors. The team initially applied L1 regularization to improve model simplicity, but it resulted in underfitting and poor predictive performance. Which two adjustments could potentially enhance model accuracy? (SELECT TWO)
- Use AWS Glue DataBrew to remove insignificant features before training the model
- Increase the L1 regularization parameter in Amazon SageMaker
- Reduce the L1 regularization parameter in Amazon SageMaker
- Increase the sample rate to capture more granular patterns in the dataset
- Switch to L2 regularization in Amazon SageMaker Linear Learner

12. Question
A machine learning team at an education technology company built a Random Forest classifier to automatically grade handwritten numbers (0–9) on student math tests. After deployment, they evaluated model performance using the following confusion matrix comparing predicted and actual digits (rows represent predicted digits; columns represent actual digits):

Actual digit:  0    1    2    3    4    5    6    7    8    9
Predicted 0   85    0    1    0    0    0    0    0    0    0
Predicted 1    0   92    0    0    0    0    0    0    0    0
Predicted 2    0    0   60    5    0    0    0    0    0    0
Predicted 3    0    0    2   40    0    0    0    0    0    0
Predicted 4    0    0    0    0   75    0    0    0    0    0
Predicted 5    0    0    0    0    0   30    0    0    0    0
Predicted 6    0    0    0    0    0    0   88    0    0    0
Predicted 7    0    0    0    0    0    0    0   50    0    0
Predicted 8    0    0    0    0    0    0    0    0   90    0
Predicted 9    0    0    0    0    0    0    0    0    0   95

Based on the confusion matrix, which digit had the lowest classification accuracy?
- 7
- 3
- 2
- 5
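Per-class accuracy here means: of all samples whose actual digit is d, what fraction was predicted as d. With rows as predictions and columns as actual values, that is the diagonal entry divided by the column sum. A small sketch (the helper name is illustrative):

```python
# Rows = predicted digit, columns = actual digit (as in the question).
matrix = [
    [85, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 92, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 60, 5, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 40, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 75, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 30, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 88, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 50, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 90, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 95],
]

def per_class_accuracy(m):
    """Correct predictions for each actual class divided by that class's column total."""
    n = len(m)
    result = {}
    for actual in range(n):
        column_total = sum(m[pred][actual] for pred in range(n))
        result[actual] = m[actual][actual] / column_total
    return result

acc = per_class_accuracy(matrix)
worst = min(acc, key=acc.get)
print(worst, round(acc[worst], 3))  # 3 0.889
```

Note that a class with few samples (such as 5, with 30) can still score perfectly; sample count alone does not determine accuracy.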

13. Question
A MapReduce job that previously processed data from HDFS must now operate on data migrated to Amazon S3. What is the most efficient approach to integrate S3-stored data with MapReduce jobs running on Amazon EMR?
- Use Amazon EMR File System (EMRFS) with the s3:// prefix for data access
- Deploy Apache Hive as an intermediary layer between MapReduce and S3
- Utilize Amazon Elastic File System (EFS) to bridge data between MapReduce and S3
- Enable MapReduce to directly access S3 using the s3a:// prefix

14. Question
A media analytics company wants to analyze tweets from influential figures to identify trends and sentiment similarities over time. The company needs to compute embeddings to capture semantic meaning from the tweets. Which AWS solution would be most effective for this task?
- Use Amazon Comprehend’s topic modeling for thematic extraction
- Utilize Amazon Kendra to index and search for semantically similar tweets
- Train an Object2Vec model in Amazon SageMaker
- Implement SageMaker BlazingText using the Skip-gram model

15. Question
An AI startup is developing a complex image recognition model using TensorFlow on Amazon SageMaker. Due to a large dataset and high model complexity, training on a single GPU instance is insufficient. What is the best approach to scale training across multiple GPUs?
- TensorFlow does not support distributed training; consider using Apache MXNet
- Use Horovod for distributed training in Amazon SageMaker
- Wrap TensorFlow code in PySpark and use sagemaker-spark for distributed training
- Deploy TensorFlow to multiple EC2 P3 instances and let SageMaker manage distribution

16. Question
A census-based dataset contains multiple correlated features, including age, but 10% of age values are missing. To maximize model accuracy, what is the best approach for handling these missing values?
- Randomly assign values to missing entries in the age column
- Train an auxiliary model to predict missing age values using the other features
- Replace missing values with the mean age from the dataset
- Drop the age column from the dataset
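Model-based (regression) imputation fits a model on the rows where age is known and uses it to predict the missing ages from correlated features. The sketch below uses a single hypothetical predictor, `years_in_workforce`, and ordinary least squares for brevity; in practice one would use all correlated features and possibly a stronger model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_with_model(feature, target):
    """Fill None entries in `target` using a line fit on the complete rows."""
    known = [(x, y) for x, y in zip(feature, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    return [slope * x + intercept if y is None else y for x, y in zip(feature, target)]

# Hypothetical census rows: years_in_workforce and age (one age missing).
years_in_workforce = [5, 10, 15, 20, 25]
age = [27, 32, None, 42, 47]
print(impute_with_model(years_in_workforce, age))  # [27, 32, 37.0, 42, 47]
```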

17. Question
A tech startup is developing a deep learning model to predict stock market trends using Amazon SageMaker. To enhance predictive accuracy, which two hyperparameter tuning strategies should be employed? (CHOOSE TWO)
- Adjust 'max_jobs' and 'max_parallel_jobs' in SageMaker Hyperparameter Optimization (HPO) for efficient training
- Use Amazon SageMaker Debugger to dynamically adjust hyperparameters during training
- Enable early stopping in SageMaker HPO to terminate underperforming models and conserve resources
- Use Amazon S3 lifecycle policies to archive unused training data
- Implement EC2 Auto Scaling with SageMaker HPO to dynamically adjust training instances

18. Question
A digital marketing firm needs to analyze consumer demographic data, which arrives continuously in JSON format. They seek a cost-efficient, serverless approach for storing, querying, and visualizing the data. Which solution is most appropriate?
- Use Amazon Kinesis Firehose to convert data to Parquet, store in S3, catalog with AWS Glue, query via Athena, and visualize in QuickSight
- Stream JSON data directly into Amazon S3 and use an EMR cluster to convert it to Parquet before querying it with Athena
- Store JSON data in S3 and visualize directly with Amazon QuickSight
- Stream the data into Amazon Aurora and use Aurora’s JDBC integration for visualization in QuickSight

19. Question
A fintech startup is automating its nightly ML model training pipeline, which consists of sequential ETL tasks. The company wants an approach that manages dependencies and errors automatically. Which AWS service best meets these requirements?
- Use Amazon SQS with AWS Lambda to decouple and manage the workflow
- Implement Managed Workflows for Apache Airflow (MWAA) for flexible orchestration
- Leverage AWS Batch’s job scheduling capabilities
- Use AWS Step Functions to orchestrate the ML workflow

20. Question
A company wants to restrict access to Amazon SageMaker notebooks so that only certain IAM groups can use them. What is the best approach to enforce this security policy?
- Configure a Network ACL (NACL) to restrict access based on IP addresses associated with IAM groups
- Use AWS KMS encryption on SageMaker notebooks and grant decryption access only to specific IAM groups
- Apply an IAM policy to SageMaker notebooks specifying allowed IAM groups
- Set up an Amazon Cognito user pool for SageMaker notebook authentication

21. Question
A data scientist has initialized an Amazon SageMaker notebook instance to work on a predictive modeling project. The dataset is stored in Amazon S3, and the notebook needs to access it. Which method determines the notebook instance’s ability to interact with S3 data?
- S3 access is granted automatically to all SageMaker notebooks by default
- Access to S3 must be explicitly configured in the IAM role attached to the notebook instance
- S3 access is managed through a VPC endpoint, not IAM policies
- SageMaker notebooks can only access S3 buckets with "sagemaker" in their name unless manually overridden

22. Question
An e-commerce retailer wants to predict website traffic on an hourly basis for the upcoming year. The prediction must account for daily and seasonal variations while requiring minimal development effort. Which AWS service should they use?
- Use Amazon Forecast to generate automatic time-series forecasts
- Manually build and deploy a prediction model using AWS Lambda
- Use Amazon Athena to query past traffic data and extrapolate future trends
- Leverage Amazon Rekognition to analyze website traffic pattern images

23. Question
An AI researcher is training a deep neural network for image classification. The model achieves 99% accuracy on the training dataset but only 90% on the test dataset. Expert analysts consistently achieve 98% accuracy on similar tasks. What two actions should be taken to reduce overfitting? (SELECT TWO)
- Reduce network size
- Train for additional epochs
- Increase dropout rate
- Decrease learning rate
- Increase batch size

24. Question
A fintech company has deployed an ML model to classify insurance claims as fraudulent or legitimate. Given that missing a fraudulent claim (and paying it out) is costlier than investigating a legitimate claim that was flagged as a false positive, which evaluation metric should be prioritized?
- Optimize for overall model accuracy using Amazon SageMaker’s built-in metrics
- Maximize recall to capture all potential fraudulent claims, even at the cost of more investigations
- Balance precision and recall using a standard F1-score metric
- Optimize for precision to minimize false positives and reduce unnecessary investigations

25. Question
A company manages an S3 data lake storing clickstream data and wants to analyze and visualize it without provisioning servers. Which AWS services can be used?
- S3, Kinesis, Amazon Elasticsearch
- S3, Glue, Athena, and QuickSight
- S3, EMR, and QuickSight
- S3, DMS, and RDS

26. Question
A retail chain stores daily sales data in Amazon S3. It needs to analyze trends over the last 30 days and archive data older than 90 days in the most cost-effective manner. Which AWS solution meets this requirement?
- Use Amazon Timestream for time-series storage and auto-retention of data beyond 90 days
- Store data in Amazon Redshift and run a script to delete records older than 90 days
- Use AWS Glue ETL to filter and remove data older than 90 days
- Partition data in S3 by date, query using Athena, and set S3 lifecycle policies to move older data to Glacier

27. Question
A team is training an XGBoost model in Amazon SageMaker to classify videos into genres. The video metadata must be converted into LibSVM format before training. Which two AWS services could be used for data preprocessing? (SELECT TWO)
- Use PySpark with XGBoostSageMakerEstimator to preprocess and train data
- Use Kinesis Analytics for real-time transformation into LibSVM format
- Use Spark on Amazon EMR to preprocess data and store results in S3
- Use AWS Glue ETL to convert data into LibSVM format
- Use scikit-learn within a SageMaker notebook for preprocessing

28. Question
A company is developing a “Universal Translator” application to transcribe speech, translate it into English, and synthesize speech output. Which sequence of AWS services should be used?
- Amazon Polly → Amazon Transcribe → Amazon Translate
- Amazon Transcribe → Amazon Translate → Amazon Polly
- AWS Lambda → Amazon Translate → Amazon Polly
- Amazon Rekognition → Amazon Translate → Amazon Polly

29. Question
A startup is creating AI-generated music by training an ML model on sequential music data. Given the nature of music generation, which neural network architectures are best suited for this task? (SELECT TWO)
- Use Reinforcement Learning to optimize musical note selection
- Use LSTMs to process sequential music patterns
- Use Generative Adversarial Networks (GANs) to generate synthetic music
- Use a Support Vector Machine (SVM) to classify and generate new musical notes
- Use a Convolutional Neural Network (CNN) for music composition

30. Question
A healthcare analytics firm needs to train ML models on sensitive patient data stored in Amazon S3. The SageMaker training jobs run in a VPC with no internet access for security compliance. How should the training jobs securely access S3 data?
- Use AWS Direct Connect to establish a private connection to S3
- Configure an Internet Gateway to enable SageMaker access to S3
- Set up a VPC endpoint for direct private access to Amazon S3
- Deploy a NAT Gateway in each subnet running SageMaker training jobs

31. Question
A classifier is designed to detect fraudulent credit card transactions. The following confusion matrix (columns represent actual values; rows represent predicted values) was generated after testing the model:

                  Actual Fraud (Positive)   Actual Legit (Negative)
Predicted Fraud             45                        15
Predicted Legit              5                       135

What is the precision of this classifier?
- 67%
- 72%
- 75%
- 80%
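Precision only looks at the "Predicted Fraud" row: TP / (TP + FP). Recall, by contrast, looks at the "Actual Fraud" column; computing both side by side helps avoid mixing them up. A minimal sketch:

```python
def precision(tp, fp):
    """Fraction of predicted frauds that really are fraud: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual frauds that were caught: TP / (TP + FN)."""
    return tp / (tp + fn)

# Rows = predicted, columns = actual.
tp, fp = 45, 15   # predicted fraud: actually fraud / actually legit
fn, tn = 5, 135   # predicted legit: actually fraud / actually legit (tn unused here)
print(f"precision = {precision(tp, fp):.0%}")  # precision = 75%
print(f"recall    = {recall(tp, fn):.0%}")     # recall    = 90%
```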

32. Question
A news organization is digitizing its extensive article archive to make it easily searchable and categorized based on topics. The archive consists of raw text without any pre-assigned labels. Which AWS services would you use to automatically classify and organize the articles with minimal manual effort? (SELECT TWO)
- Use Amazon SageMaker's Neural Topic Model (NTM) to identify and categorize topics
- Use Amazon Comprehend’s topic modeling (LDA) to assign topics automatically
- Leverage Amazon Kendra for semantic search without needing pre-classified topics
- Utilize Amazon Textract to extract text-based topics from the articles
- Manually label a subset of articles using Amazon SageMaker Ground Truth and train a text classification model

33. Question
A retail company processes large volumes of customer transaction data stored in Amazon EMR with Apache Spark. The workload fluctuates significantly, especially during the holiday season. How can the company efficiently scale resources to meet seasonal demand while minimizing costs?
- Use EC2 Spot instances for Spark task nodes while keeping core and master nodes on On-Demand instances
- Deploy Spot instances for both core and task nodes, while reserving On-Demand instances for the master node
- Use Spot instances across all node types, including master, core, and task nodes
- Use Reserved instances for task nodes and Spot instances for core nodes

34. Question
A genomics research team is developing a model to predict genetic disease risks. The dataset has thousands of genomic features, many of which are highly correlated. What preprocessing technique should be used to reduce dimensionality and improve model performance?
- Apply Principal Component Analysis (PCA) to remove redundant features
- One-hot encode all categorical variables
- Increase the dataset size to compensate for high dimensionality
- Manually select features based on expert domain knowledge

35. Question
A public transportation agency wants to monitor real-time subway ridership at different stations. Data is collected every minute, and the agency wants to detect anomalies in rider volume and send alerts when unusual spikes or drops occur. Which is the most efficient and cost-effective approach?
- Use Amazon Kinesis Data Firehose, process the data using Amazon Kinesis Data Analytics with Random Cut Forest (RCF) for anomaly detection, and trigger alerts via AWS Lambda and Amazon SNS
- Ingest the data into Amazon S3, run AWS Glue ETL for anomaly detection, and trigger SNS alerts
- Use Kinesis Data Streams, process anomalies using Amazon SageMaker with Random Cut Forest (RCF), and send alerts via SNS
- Use Kinesis Firehose to load data into S3 and set up Amazon CloudWatch alerts for anomaly detection

36. Question
A healthcare research lab is using machine learning to predict patient medical costs. One of the features, blood pressure, is recorded with two decimal places but needs to be transformed into a more compact format. The values are highly skewed, and extreme values are critical in prediction. What preprocessing technique should be applied?
- Normalize the blood pressure values to scale them within a standard range
- Apply quantile binning to convert blood pressure into a set of categories
- Use interval binning with predefined categories for classification
- Apply boosting techniques to increase the impact of underrepresented ranges in the dataset
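Quantile (equal-frequency) binning picks cut points from the sorted data so each bin holds roughly the same number of observations, which keeps skewed values from piling into a single bin the way fixed-width intervals can. A minimal sketch on hypothetical blood pressure readings; the helper names and the simple index-based cut rule are assumptions:

```python
def quantile_bin_edges(values, n_bins):
    """Cut points so each bin holds roughly the same number of observations."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the bin a value falls into (0 .. len(edges))."""
    return sum(value >= edge for edge in edges)

# Hypothetical, right-skewed readings recorded to two decimal places.
readings = [110.25, 112.5, 115.75, 118.0, 120.3, 122.1, 125.6, 131.2, 168.4, 210.9]
edges = quantile_bin_edges(readings, n_bins=4)
print(edges)                                   # [115.75, 122.1, 131.2]
print([assign_bin(v, edges) for v in readings])
```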

37. Question
A global tech company is developing an AI-powered translation system using Amazon SageMaker’s sequence-to-sequence (seq2seq) model. How should the training data be formatted?
- Use TFRecord format with byte-pair encoding
- Use CSV format with one-hot encoding
- Use JSON format with word embeddings
- Use RecordIO-Protobuf format with integer tokenized sequences

38. Question
A hospital network is developing an ML model to detect thyroid cancer. The dataset consists of 900 non-cancer cases and 100 cancer cases, leading to class imbalance. Despite achieving 90% accuracy, the model has low recall for detecting cancer cases. Which two methods can improve recall? (SELECT TWO)
- Adjust the classification threshold or implement cost-sensitive learning
- Use feature selection to remove irrelevant variables
- Apply data augmentation to negative (non-cancer) cases
- Use SMOTE (Synthetic Minority Over-sampling Technique) to generate additional cancer cases
- Employ ensemble techniques like boosting to improve minority class classification
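Threshold adjustment is the simplest of these levers to see in numbers: lowering the cut-off at which a score counts as "cancer" catches more true cases (higher recall), usually at the cost of more false positives. A sketch with hypothetical model scores:

```python
def recall_at_threshold(scores, labels, threshold):
    """Recall of the positive class when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return tp / sum(labels)

# Hypothetical predicted probabilities: label 1 = cancer, 0 = non-cancer.
scores = [0.92, 0.65, 0.41, 0.35, 0.30, 0.22, 0.15, 0.08]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
for t in (0.5, 0.3):
    print(t, recall_at_threshold(scores, labels, t))
# 0.5 0.5
# 0.3 1.0
```

At 0.3 the non-cancer case scored 0.35 is also flagged, which is the precision cost of the higher recall.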

39. Question
A software development team is deploying a custom ML model using Amazon SageMaker. The model has been containerized for deployment. What are the mandatory requirements for the container to function correctly on SageMaker? (SELECT TWO)
- Must respond to /invocations and /ping requests on port 80
- Must respond to /invocations and /ping requests on port 8080
- Must respond to GET requests on /ping within 5 seconds
- Must be compressed in ZIP format for deployment
- Must accept all socket connection requests within 250 ms
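SageMaker hosting sends health checks as `GET /ping` and inference requests as `POST /invocations` to the container on port 8080. The standard-library sketch below only illustrates that contract; the JSON payload shape and the "model" (summing a list) are hypothetical stand-ins, and a production image would usually run a sturdier web server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    """Serves the two routes SageMaker calls on a hosting container."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: answer 200 when the container is ready to serve.
        if self.path == "/ping":
            self._send_json(200, {"status": "healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Hypothetical stand-in for a real model: sum the "instances" list.
        prediction = sum(request.get("instances", []))
        self._send_json(200, {"prediction": prediction})

def make_server(port=8080):
    """Bind on all interfaces; SageMaker expects port 8080."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

Inside the container, `make_server().serve_forever()` would start listening.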

40. Question
A financial institution is evaluating a fraud-detection classifier using its ROC curve. The curve rises sharply toward the top-left corner, reaching a true-positive rate (TPR) of 0.90 at a false-positive rate (FPR) of 0.10, and the area under the curve (AUC) is 0.90. What can be inferred from the classifier’s ROC curve?
- The model has no discrimination capability (AUC ≈ 0.5)
- The AUC value is 1.0, indicating a perfect classifier
- The classifier achieves 100% accuracy across all thresholds
- The model has strong discrimination ability between fraud and non-fraud cases
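AUC is the area under the (FPR, TPR) curve, typically approximated with the trapezoidal rule. As a simplification, the sketch below connects only the stated operating point (FPR 0.10, TPR 0.90) to the corners (0, 0) and (1, 1); a real ROC curve has many more points, but this three-point curve already yields the stated AUC of 0.90, compared with 0.5 for the diagonal of a random classifier.

```python
def roc_auc(fpr, tpr):
    """Trapezoidal area under an ROC curve; points must be sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        area += width * (tpr[i] + tpr[i - 1]) / 2
    return area

# Simplified curve: (0, 0) -> stated operating point (0.10, 0.90) -> (1, 1).
print(round(roc_auc([0.0, 0.10, 1.0], [0.0, 0.90, 1.0]), 2))  # 0.9
# Diagonal of a random classifier, for comparison.
print(roc_auc([0.0, 1.0], [0.0, 1.0]))  # 0.5
```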

41. Question
A sentiment analysis company is optimizing their text processing pipeline using SageMaker’s BlazingText in Word2Vec mode to capture word associations. Which of the following statements regarding BlazingText’s Word2Vec mode is true?
- BlazingText’s Word2Vec supports both Skip-gram and CBOW architectures and does not require maintaining word order
- Word2Vec automatically performs stop-word removal and stemming during preprocessing
- BlazingText only supports CPU-based training for Word2Vec embeddings
- Word2Vec does not support subword information, making it ineffective for morphologically complex languages

42. Question
A sports event organizer wants to create a system where cameras automatically detect attendees wearing branded T-shirts to participate in a promotional contest. Which approach would be the most viable?
- Use AWS DeepLens cameras with Amazon Rekognition to identify attendees wearing branded T-shirts without pre-training on specific images
- Train a Convolutional Neural Network (CNN) with Amazon SageMaker using labeled images of branded T-shirts, then deploy it on AWS DeepLens
- Use Amazon Lex and Polly for image recognition and real-time speech interaction
- Implement a Recurrent Neural Network (RNN) on AWS DeepLens for pattern-based T-shirt recognition

43. Question
A telecom company is planning to train an XGBoost model on SageMaker (version 1.2 or newer) to predict customer churn. Given the computational demands of the model, which instance type is the most cost-effective for training?
- T3 instance type
- M5 instance type
- P3 instance type
- R5 instance type

44. Question
A consumer electronics brand is developing a voice-activated virtual assistant that will understand and respond to user queries. What combination of AWS services is best suited for processing voice commands and executing actions?
- Amazon Lex → Amazon Polly
- Amazon Transcribe → Amazon Comprehend → Amazon Polly
- Amazon Lex → Amazon Transcribe → Amazon Polly
- Amazon Rekognition → Amazon Polly

45. Question
A deep learning team is training an image recognition model. After 100 epochs, training accuracy continues to increase, but validation accuracy starts declining. What is the best approach to mitigate this issue?
- Use Amazon SageMaker’s early stopping mechanism to halt training when validation accuracy decreases
- Increase SageMaker’s resource allocation to handle a larger validation set
- Reduce the learning rate in SageMaker’s hyperparameter optimization
- Revert to the model state at the 100th epoch using Amazon S3 versioning

46. Question
A media streaming company is optimizing its recommendation system by increasing the learning rate in its deep neural network. Following this adjustment, the system’s prediction accuracy dropped significantly. What is the most likely reason for this?
- The increased learning rate caused the model to skip over the global minimum of the loss function, leading to suboptimal performance
- The increased learning rate caused the model to become stuck in local minima
- The training dataset was shuffled incorrectly, leading to poor learning patterns
- The model complexity increased due to adding too many layers
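A toy one-dimensional example shows what an overly large learning rate does: on f(x) = x², each gradient step multiplies x by (1 - 2·lr), so a small rate shrinks x toward the minimum at 0 while a rate above 1 overshoots it further on every step. This quadratic is an illustrative assumption; real loss surfaces are far more complex.

```python
def gradient_descent(lr, steps=20, x0=5.0):
    """Minimize f(x) = x**2 (minimum at x = 0); its gradient is 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # step against the gradient
    return x

small = gradient_descent(lr=0.1)   # x shrinks by a factor of 0.8 per step
large = gradient_descent(lr=1.1)   # x flips sign and grows by 1.2x per step
print(abs(small), abs(large))
```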

47. Question
A financial institution is training a deep learning fraud detection model on Amazon SageMaker. The model is deployed in a VPC with highly sensitive financial data. How can you secure data in transit during training?
- Enable SSL/TLS encryption for data transfers within the VPC
- Use VPC Peering to ensure encrypted data transfers between SageMaker and other services
- Configure a NAT Gateway to encrypt outbound data from the VPC
- Apply IAM policies for SageMaker instances to encrypt data at rest

48. Question
A medical research team is analyzing a clinical dataset with various features, including Mean Arterial Pressure (MAP). The dataset is almost complete, but 1% of MAP values are missing. What is the best method to handle missing MAP values?
- Impute missing values using the median of existing MAP values to handle outliers
- Fill missing MAP values with random noise to preserve dataset volume
- Drop the MAP column due to missing values
- Replace missing values with the mean MAP value
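The practical difference between mean and median imputation shows up when outliers are present: one extreme reading drags the mean but barely moves the median. A minimal sketch with hypothetical MAP readings in mmHg:

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values (robust to outliers)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical MAP readings with one missing value and one outlier (160.0).
map_values = [85.0, 90.0, None, 88.0, 92.0, 160.0]
print(statistics.fmean(v for v in map_values if v is not None))  # 103.0
print(impute_median(map_values))  # [85.0, 90.0, 90.0, 88.0, 92.0, 160.0]
```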

49. Question
A hedge fund wants to automate the analysis of market transactions, execution costs, and investment risks. The firm requires a scalable solution that dynamically adjusts computing resources for highly variable workloads. Which AWS service is best suited for this use case?
- AWS Step Functions for orchestrating workflows with AWS Lambda
- Amazon SQS for queueing tasks with manually managed compute resources
- Amazon SWF for manually configuring EC2-based workflows
- AWS Batch for scheduling and dynamically provisioning computing resources

50. Question
A retail corporation wants to forecast revenue trends while dealing with irregular historical revenue patterns. The company also seeks a low-maintenance visualization solution. What is the simplest approach?
- Upload sales data to Amazon S3 and use Amazon QuickSight for forecasting and visualization
- Store sales data in Amazon RDS and visualize reports using Tableau
- Use Amazon SageMaker for forecasting and Amazon QuickSight for visualization
- Process sales data with Apache Spark on Amazon EMR, then visualize with QuickSight

51. Question
You are building a linear regression model to predict annual income based on features such as age and occupation. The dataset contains extreme outliers due to high-net-worth individuals. To prevent these outliers from distorting the model’s predictions, what is the most effective approach?
- Use quantile binning to categorize income levels before training the model
- Leverage Amazon SageMaker Data Wrangler to detect and remove income outliers
- Apply log transformation to normalize the income distribution
- Use Amazon SageMaker Model Monitor to filter outliers during inference
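A log transform compresses the right tail so a few very large incomes no longer dominate a squared-error fit. The sketch below uses hypothetical incomes and a rough "max over median" ratio as a skew indicator (an illustrative choice, not a standard metric). `log1p` keeps zero incomes defined, and predictions made on the log scale are mapped back with `expm1`.

```python
import math
import statistics

# Hypothetical annual incomes; the last one is a high-net-worth outlier.
incomes = [42_000, 55_000, 61_000, 75_000, 12_000_000]

# log1p(x) = log(1 + x), which stays defined for an income of 0.
log_incomes = [math.log1p(x) for x in incomes]

def spread(values):
    """Ratio of the largest value to the median: a rough skew indicator."""
    return max(values) / statistics.median(values)

print(f"raw spread: {spread(incomes):.1f}x")
print(f"log spread: {spread(log_incomes):.2f}x")

# A model trained on log_incomes predicts in log space; expm1 undoes log1p.
print(round(math.expm1(log_incomes[2])))
```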

52. Question
An online news portal has built a time-series forecasting model to predict the number of daily visitors to its website. After deployment, the following observations were made from the predicted vs. actual web traffic trends:
• The model captures weekly seasonality accurately, with peaks and troughs aligning closely with real traffic patterns.
• However, over several months, the predicted traffic gradually diverges from the actual upward trend in overall visitors.
How would you assess the model’s performance?
- The model effectively captures seasonal trends but struggles with long-term trends
- The model successfully captures both short-term seasonality and long-term trends
- The model struggles with both seasonal patterns and overall trends
- The model effectively tracks long-term trends but does not capture seasonality well

53. Question
A retail company maintains a large Amazon S3 data lake containing structured and unstructured CSV files. The data needs transformation and cleaning before analysts can query it via SQL. Which AWS service combination would require the least effort and maintenance?
- AWS Glue for ETL processing, Amazon Athena for querying
- Amazon EMR for data transformation, Amazon RDS for querying
- Amazon SageMaker for data preprocessing, Amazon Redshift for querying
- AWS Glue DataBrew for transformation, AWS Lambda for automated cleaning

54. Question
A financial services company processes real-time transaction data, where each record contains hundreds of columns. Many of these columns are irrelevant to a fraud detection model. What is the most efficient way to filter and preprocess the data before model training?
- Use AWS Glue DataBrew to remove irrelevant columns and transform key features
- Process the data in real-time using Amazon Kinesis Data Analytics
- Utilize Apache Spark on Amazon EMR to remove unnecessary features
- Trigger an AWS Lambda function for data transformation before storing it in S3

55. Question
A media company is using Amazon SageMaker Factorization Machines for a movie recommendation system. What is the correct training data format for Factorization Machines?
- Dense matrix in CSV format where each row represents a complete user-movie rating
- Sparse matrix in RecordIO format with user ID, movie ID, and ratings
- Sparse matrix in RecordIO float32 format with user-movie interactions and optional extra features
- RecordIO integer format including user IDs, movie IDs, and categorical features
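Factorization Machines expect each training example as a sparse row: one-hot columns for the user, one-hot columns for the movie, optionally extra feature columns, all stored as 32-bit floats. The sketch below builds such rows with the standard library only, using hypothetical sizes and ratings, and stops short of serialization. Converting a SciPy sparse matrix to RecordIO-protobuf is typically done with the SageMaker Python SDK helper `sagemaker.amazon.common.write_spmatrix_to_sparse_tensor` (check the documentation for your SDK version); this sketch does not call it.

```python
from array import array

# Hypothetical sizes; real datasets have far more users and movies.
NUM_USERS = 4
NUM_MOVIES = 5
NUM_FEATURES = NUM_USERS + NUM_MOVIES  # total sparse width (feature dimension)

def encode(user_idx, movie_idx):
    """Return (column indices, float32 values) for one sparse row.

    Columns 0..NUM_USERS-1 one-hot encode the user; the next
    NUM_MOVIES columns one-hot encode the movie.
    """
    indices = [user_idx, NUM_USERS + movie_idx]
    values = array("f", [1.0, 1.0])  # 'f' is a C float, 32 bits on common platforms
    return indices, values

# Hypothetical (user, movie, rating) interactions.
ratings = [(0, 2, 4.0), (1, 2, 5.0), (3, 4, 2.0)]

rows = [encode(u, m) for u, m, _ in ratings]
labels = array("f", [r for _, _, r in ratings])

for (idx, vals), label in zip(rows, labels):
    print(idx, list(vals), label)
```

Only two of the nine columns are non-zero in each row, which is why a sparse float32 representation is used rather than a dense CSV matrix.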

56. Question
A machine learning operations (MLOps) team wants to restrict Amazon SageMaker notebook instance access to specific IAM groups. What is the best method to enforce access control?
- Assign an IAM policy to specific groups, granting access only to authorized users
- Use AWS Single Sign-On (SSO) to restrict SageMaker access by user role
- Attach a VPC endpoint to the notebook instance and manage access via security groups
- Apply an AWS Glue security configuration to enforce IAM restrictions
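An identity-based policy attached to an IAM group looks roughly like the one below. This is a sketch: the region, account ID, and notebook name are hypothetical placeholders, and a real setup may need further actions (for example, to list notebook instances in the console). The code only builds the policy as a Python dict and prints the JSON document you would attach to the group.

```python
import json

# Hypothetical values; replace with your own region, account ID, and notebook name.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
NOTEBOOK_NAME = "fraud-team-notebook"

notebook_arn = (
    f"arn:aws:sagemaker:{REGION}:{ACCOUNT_ID}:notebook-instance/{NOTEBOOK_NAME}"
)

# Identity-based policy meant to be attached to an IAM group, so only
# members of that group are allowed to open this notebook instance.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OpenSpecificNotebook",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedNotebookInstanceUrl",
                "sagemaker:DescribeNotebookInstance",
            ],
            "Resource": notebook_arn,
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping `Resource` to a single notebook ARN, rather than `*`, is what keeps other notebook instances out of reach for the group.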

57. Question
A publishing company is receiving duplicate book data from multiple sources into Amazon S3. How can the company remove duplicate records efficiently before further processing?
- Use AWS Glue FindMatches ML Transform to detect and remove duplicates
- Run an Apache Spark job on Amazon EMR to remove duplicates
- Load the data into Amazon Redshift and apply a PRIMARY KEY constraint
- Use AWS Lambda to validate and filter out duplicates upon arrival
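AWS Glue FindMatches learns from labeled examples which records refer to the same entity. The sketch below is not FindMatches and does not call any Glue API; it only illustrates the underlying idea of fuzzy record matching with `difflib.SequenceMatcher` from the standard library, using hypothetical book titles and an assumed similarity threshold of 0.8.

```python
from difflib import SequenceMatcher

# Hypothetical records from different sources; the second and third are
# near-duplicates of the first (trailing space, misspelling).
records = [
    "The Pragmatic Programmer",
    "The Pragmatic Programmer ",
    "The Pragmatic Programer",
    "Clean Code",
]

THRESHOLD = 0.8  # assumed cut-off; FindMatches learns its matching rules instead

def normalize(title):
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(title.lower().split())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

unique = []
for title in records:
    # Keep a record only if it is not similar enough to one already kept.
    if all(similarity(title, kept) < THRESHOLD for kept in unique):
        unique.append(title)

print(unique)
```

Exact-match deduplication would miss the misspelled title; a similarity score catches it, which is the gap an ML-based matcher is meant to close at scale.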

58. Question
A biodiversity research group wants to extend its image classification model to classify flower species. They already have a CNN model that detects flowers but lacks species classification. What is the most efficient way to enhance their model?
- Apply transfer learning to fine-tune the existing CNN model on species classification
- Retrain the CNN from scratch using Amazon SageMaker’s built-in image classification algorithm
- Deploy the model on AWS DeepLens and update it with real-time feedback loops
- Use Amazon Lex to enhance the model’s ability to classify species via voice commands

59. Question
An advertising company is developing a machine learning model to predict purchase likelihood. The dataset includes hundreds of demographic features. What is the best way to reduce dimensionality while preserving predictive power?
- Use Principal Component Analysis (PCA) to extract the most important features
- Apply K-Means clustering to categorize users into groups based on shared features
- Train a Factorization Machine model to identify feature interactions
- Apply a Random Cut Forest (RCF) algorithm to filter out low-impact features
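PCA replaces the original features with a smaller set of uncorrelated components, each a linear combination of the inputs. The standard-library-only sketch below finds the first principal component of some hypothetical 2-D points by power iteration on the covariance matrix and reports the share of variance it explains. In practice you would use NumPy, scikit-learn, or the SageMaker built-in PCA algorithm instead.

```python
import math

# Hypothetical 2-D samples lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

def covariance_matrix(data):
    """Sample covariance matrix of a list of equal-length tuples."""
    n = len(data)
    dims = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in data]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]

def top_component(cov, iterations=100):
    """Power iteration: (largest eigenvalue, its unit eigenvector)."""
    dims = len(cov)
    vec = [1.0] + [0.0] * (dims - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector gives its eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)
    )
    return eigenvalue, vec

cov = covariance_matrix(points)
eigenvalue, component = top_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of eigenvalues
explained = eigenvalue / total_variance
print("first principal component:", [round(c, 3) for c in component])
print(f"share of variance explained: {explained:.3f}")
```

Here one component keeps almost all of the variance of two correlated features, which is the kind of reduction that scales to hundreds of demographic columns.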

60. Question
A financial institution is developing a handwritten digit recognition model to process customer documents. The model classifies digits from 0 to 9. What label preprocessing should be applied for optimal training?
- Convert labels into one-hot encoded vectors (e.g., digit ‘5’ → [0,0,0,0,0,1,0,0,0,0])
- Normalize labels to a range between 0 and 1 for compatibility with deep learning models
- Transform labels into hexadecimal format for efficient processing
- Keep labels as raw integer values without transformation
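One-hot encoding for ten digit classes can be sketched in a few lines of Python. The helper names `one_hot` and `decode` are illustrative only; deep learning frameworks ship their own utilities for this.

```python
NUM_CLASSES = 10  # digits 0-9

def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with 1 at the label's index and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside 0..{num_classes - 1}")
    vector = [0] * num_classes
    vector[label] = 1
    return vector

def decode(vector):
    """Inverse mapping: index of the largest entry (also works on softmax outputs)."""
    return max(range(len(vector)), key=vector.__getitem__)

print(one_hot(5))
```

Encoding this way avoids implying that digit 9 is "closer" to 8 than to 0, which raw integer or normalized labels would suggest to the loss function.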

61. Question
A marketing analyst wants to assess the effectiveness of an A/B email campaign, assuming that each recipient has a 40% probability of opening an email. The goal is to model the probability of observing a given number of email opens within a batch of recipients. Which probability distribution is best suited for this scenario?
- Poisson Distribution
- Binomial Distribution
- Normal Distribution
- Exponential Distribution
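With a fixed number of recipients n, a constant open probability p, and independent recipients, the number of opens X follows P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below evaluates that formula with `math.comb` for a hypothetical batch of 10 recipients and the question's p = 0.4.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N = 10        # hypothetical batch of recipients
P_OPEN = 0.4  # open probability from the question

pmf = [binomial_pmf(k, N, P_OPEN) for k in range(N + 1)]
expected_opens = N * P_OPEN
print(f"expected opens: {expected_opens}")
for k, prob in enumerate(pmf):
    print(f"P({k} opens) = {prob:.4f}")
```

The distribution is defined only on whole counts from 0 to n, which is what separates it from the continuous normal and exponential options.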

62. Question
A telecommunications company is analyzing customer churn patterns to segment users into distinct categories for targeted retention campaigns. The data scientist is using k-Means clustering. What is the best approach to determine the optimal number of clusters (k)?
- Use the "elbow method" by plotting the within-cluster sum of squares (WSS) against k values.
- Apply hierarchical clustering to identify the best k before running k-Means.
- Use Principal Component Analysis (PCA) to reduce dimensionality and set k automatically.
- Leverage AWS Glue DataBrew to perform silhouette analysis before clustering.
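The elbow method runs k-means for several values of k and records the within-cluster sum of squares (WSS); the "elbow" is where adding another cluster stops reducing WSS much. The sketch below uses a hand-written 1-D Lloyd's algorithm with deterministic initialization on hypothetical data with three obvious groups, so WSS should drop sharply up to k = 3 and flatten afterwards. Real projects would typically use scikit-learn's `KMeans` (its `inertia_` attribute is the WSS) or SageMaker's built-in k-means.

```python
# Hypothetical 1-D feature (for example, a usage score) with three visible groups.
data = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 9.8, 10.0, 10.2]

def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm with deterministic, evenly spaced initial centroids."""
    ordered = sorted(values)
    if k == 1:
        centroids = [sum(ordered) / len(ordered)]
    else:
        step = (len(ordered) - 1) / (k - 1)
        centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move centroids to cluster means (keep the old one if empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def wss(k):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    centroids, clusters = kmeans_1d(data, k)
    return sum((v - centroids[i]) ** 2 for i, c in enumerate(clusters) for v in c)

scores = {k: wss(k) for k in range(1, 6)}
for k, score in scores.items():
    print(f"k={k}: WSS={score:.3f}")
```

Plotting `scores` against k would show the bend at k = 3: going from two to three clusters removes almost all remaining WSS, while a fourth cluster barely helps.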

63. Question
A sports analytics company has installed a high-speed camera at a stadium entrance to identify VIP ticket holders using facial recognition. The objective is to automatically notify security when a VIP is detected. Which AWS service combination would be most efficient with minimal development effort?
- Amazon Rekognition → Amazon SNS
- Amazon SageMaker → AWS Lambda → Amazon SNS
- Amazon Rekognition → AWS Lambda → Amazon SES
- Amazon Rekognition → Amazon Kinesis → AWS Lambda → Amazon SNS

64. Question
A healthcare research team is developing an ML model to classify patient conditions based on X-ray scan results. The dataset contains two types of diagnoses, and a scatterplot of patient biometrics indicates a non-linear separation between the two classes. Which two ML algorithms would be the best fit for classification? (Select Two)
- Support Vector Machine (SVM) with an RBF Kernel
- Linear Regression
- Principal Component Analysis (PCA) before classification
- Support Vector Machine (SVM) with a Linear Kernel
- k-Nearest Neighbors (kNN)
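To illustrate a non-linear boundary, the sketch below places one hypothetical class in the center and the other in a ring around it, a layout no straight line can separate. A k-nearest-neighbors vote classifies new points by their closest training examples, and `rbf_kernel` shows the similarity function an RBF-kernel SVM relies on; `gamma=0.5` is an arbitrary assumed value.

```python
import math
from collections import Counter

# Hypothetical 2-D biometrics: class "A" sits in the center and class "B" forms a
# ring around it, so no single straight line separates the two classes.
training = [
    ((0.0, 0.0), "A"), ((0.5, 0.0), "A"), ((0.0, 0.5), "A"),
    ((-0.5, 0.0), "A"), ((0.0, -0.5), "A"),
    ((2.0, 0.0), "B"), ((0.0, 2.0), "B"), ((-2.0, 0.0), "B"), ((0.0, -2.0), "B"),
    ((1.5, 1.5), "B"), ((-1.5, 1.5), "B"), ((1.5, -1.5), "B"), ((-1.5, -1.5), "B"),
]

def knn_predict(point, k=3):
    """Majority label among the k training points closest to `point`."""
    neighbours = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def rbf_kernel(x, y, gamma=0.5):
    """Similarity used by an RBF-kernel SVM: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * math.dist(x, y) ** 2)

for query in [(0.2, 0.1), (1.8, 0.3), (-1.4, -1.6)]:
    print(query, "->", knn_predict(query))
```

Both approaches judge points by local distance rather than by which side of a single line they fall on, which is why they handle curved class boundaries.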

65. Question
A social media monitoring system needs to analyze images and text from posts to identify people, objects, and topics being discussed. To ensure both text and image data are efficiently labeled, which two AWS services should be integrated into the solution? (Select Two)
- Amazon Rekognition for image subject identification
- Amazon Textract for extracting text from images
- Amazon Comprehend for natural language processing on text data
- AWS Lambda for orchestrating manual labeling tasks
- Amazon SageMaker for training a custom entity recognition model from scratch

AWS Certified Machine Learning Specialty – Practice Test #1
