A fintech company has developed an XGBoost model to assess credit risk using historical loan repayment data. The model performs well on the training dataset but underperforms on a separate validation dataset, suggesting overfitting. Which hyperparameter adjustments should the data science team make to improve generalization? (Choose TWO)
You are developing a machine learning system to detect your company's logo in images. You have unlabeled image data, some containing the logo and others not. Which approach requires the least effort to prepare this dataset for supervised learning?
Given the following confusion matrix, what is the F1 score of the model? (Columns represent actual values, rows represent predicted values.)
                     Actual Positive   Actual Negative
Predicted Positive   40                10
Predicted Negative   20                30
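As a worked sketch of the arithmetic this question asks for, the snippet below reads the four counts from the matrix above (rows are predictions, columns are actual values) and applies the standard precision, recall, and F1 definitions. The variable names are illustrative only.

```python
# Counts read from the confusion matrix (rows = predicted, columns = actual).
tp, fp = 40, 10  # "Predicted Positive" row: true positives, false positives
fn, tn = 20, 30  # "Predicted Negative" row: false negatives, true negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The same F1 value follows from the equivalent form 2·TP / (2·TP + FP + FN).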
A brokerage firm's data scientist reports that their Amazon SageMaker Linear Learner model fails to converge despite data normalization being enabled. What is the most likely reason for the model's failure to converge?
While training a SageMaker Linear Learner regression model to predict individual income based on age and education level, the dataset includes multiple distinct groups. To ensure optimal model performance, which two preprocessing steps should be taken? (Select TWO)
A data science team at an energy analytics company uses a neural network to predict electricity consumption based on environmental variables. Initially, the model underperformed, so they added more layers to capture complex patterns. However, after this modification, training accuracy remains low, and the model fails to converge. What is the most probable reason for this behavior?
An online retailer increased the batch size during training of their deep neural network-powered recommendation engine. Following this adjustment, the accuracy of recommendations declined. What is the most likely reason for this drop in model performance?
You are tasked with classifying terabytes of news articles into topics using Amazon SageMaker and Latent Dirichlet Allocation (LDA). Processing this vast dataset in a single batch is not feasible. What strategy should be used to improve performance?
You are training Amazon SageMaker BlazingText in supervised mode using File mode. Which of the following represents a correctly formatted training sample?
A digital media company has deployed an Amazon SageMaker recommendation engine. After developing an improved model, the team wants to evaluate its performance in production while ensuring minimal risk and disruption. Which strategy is the most effective?
A smart home automation company is developing a regression model to predict daily energy consumption based on various environmental and usage factors. The team initially applied L1 regularization to improve model simplicity, but it resulted in underfitting and poor predictive performance. Which two adjustments could potentially enhance model accuracy? (SELECT TWO)
A machine learning team at an education technology company built a Random Forest classifier to automatically grade handwritten numbers (0–9) on student math tests. After deployment, they evaluated model performance using the following confusion matrix comparing predicted and actual digits:
             Actual 0  Actual 1  Actual 2  Actual 3  Actual 4  Actual 5  Actual 6  Actual 7  Actual 8  Actual 9
Predicted 0  85        0         1         0         0         0         0         0         0         0
Predicted 1  0         92        0         0         0         0         0         0         0         0
Predicted 2  0         0         60        5         0         0         0         0         0         0
Predicted 3  0         0         2         40        0         0         0         0         0         0
Predicted 4  0         0         0         0         75        0         0         0         0         0
Predicted 5  0         0         0         0         0         30        0         0         0         0
Predicted 6  0         0         0         0         0         0         88        0         0         0
Predicted 7  0         0         0         0         0         0         0         50        0         0
Predicted 8  0         0         0         0         0         0         0         0         90        0
Predicted 9  0         0         0         0         0         0         0         0         0         95
Based on the confusion matrix, which digit had the lowest classification accuracy?
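A minimal sketch of how the per-digit figures can be derived from the matrix above. It assumes "classification accuracy" for a digit means per-class recall (correct predictions divided by all actual instances of that digit, a column-wise ratio). Per-class precision (a row-wise ratio) is computed as well, since the two readings can point to different digits. All names are illustrative.

```python
# Rows = predicted digit, columns = actual digit, copied from the matrix above.
matrix = [
    [85, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 92, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 60, 5, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 40, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 75, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 30, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 88, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 50, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 90, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 95],
]

n = len(matrix)
recall, precision = {}, {}
for digit in range(n):
    correct = matrix[digit][digit]
    actual_total = sum(matrix[row][digit] for row in range(n))  # column sum
    predicted_total = sum(matrix[digit])                        # row sum
    recall[digit] = correct / actual_total
    precision[digit] = correct / predicted_total

lowest_recall = min(recall, key=recall.get)
lowest_precision = min(precision, key=precision.get)
print("lowest recall:", lowest_recall, round(recall[lowest_recall], 3))
print("lowest precision:", lowest_precision, round(precision[lowest_precision], 3))
```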
A MapReduce job that previously processed data from HDFS must now operate on data migrated to Amazon S3. What is the most efficient approach to integrate S3-stored data with MapReduce jobs running on Amazon EMR?
A media analytics company wants to analyze tweets from influential figures to identify trends and sentiment similarities over time. The company needs to compute embeddings to capture semantic meaning from the tweets. Which AWS solution would be most effective for this task?
An AI startup is developing a complex image recognition model using TensorFlow on Amazon SageMaker. Due to a large dataset and high model complexity, training on a single GPU instance is insufficient. What is the best approach to scale training across multiple GPUs?
A census-based dataset contains multiple correlated features, including age, but 10% of age values are missing. To maximize model accuracy, what is the best approach for handling these missing values?
A tech startup is developing a deep learning model to predict stock market trends using Amazon SageMaker. To enhance predictive accuracy, which two hyperparameter tuning strategies should be employed? (CHOOSE TWO)
A digital marketing firm needs to analyze consumer demographic data, which arrives continuously in JSON format. They seek a cost-efficient, serverless approach for storing, querying, and visualizing the data. Which solution is most appropriate?
A fintech startup is automating its nightly ML model training pipeline, which consists of sequential ETL tasks. The company wants an approach that manages dependencies and errors automatically. Which AWS service best meets these requirements?
A company wants to restrict access to Amazon SageMaker notebooks so that only certain IAM groups can use them. What is the best approach to enforce this security policy?
A data scientist has initialized an Amazon SageMaker notebook instance to work on a predictive modeling project. The dataset is stored in Amazon S3, and the notebook needs to access it. Which method determines the notebook instance’s ability to interact with S3 data?
An e-commerce retailer wants to predict website traffic on an hourly basis for the upcoming year. The prediction must account for daily and seasonal variations while requiring minimal development effort. Which AWS service should they use?
An AI researcher is training a deep neural network for image classification. The model achieves 99% accuracy on the training dataset but only 90% on the test dataset. Expert analysts consistently achieve 98% accuracy on similar tasks. What two actions should be taken to reduce overfitting? (SELECT TWO)
A fintech company has deployed an ML model to classify insurance claims as fraudulent or legitimate. Given that processing a fraudulent claim is costlier than investigating false positives, which evaluation metric should be prioritized?
A company manages an S3 data lake storing clickstream data and wants to analyze and visualize it without provisioning servers. Which AWS services can be used?
A retail chain stores daily sales data in Amazon S3. It needs to analyze trends over the last 30 days and archive older data beyond 90 days in the most cost-effective manner. Which AWS solution meets this requirement?
A team is training an XGBoost model in Amazon SageMaker to classify videos into genres. The video metadata must be converted into LibSVM format before training. Which two AWS services could be used for data preprocessing? (SELECT TWO)
A company is developing a “Universal Translator” application to transcribe speech, translate it into English, and synthesize speech output. Which sequence of AWS services should be used?
A startup is creating AI-generated music by training an ML model on sequential music data. Given the nature of music generation, which neural network architectures are best suited for this task? (SELECT TWO)
A healthcare analytics firm needs to train ML models on sensitive patient data stored in Amazon S3. The SageMaker training jobs run in a VPC with no internet access for security compliance. How should the training jobs securely access S3 data?
A classifier is designed to detect fraudulent credit card transactions. The following confusion matrix (columns represent actual values, rows represent predicted values) was generated after testing the model:
                  Actual Fraud (Positive)   Actual Legit (Negative)
Predicted Fraud   45                        15
Predicted Legit   5                         135
What is the precision of this classifier?
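The precision calculation can be sketched as follows, using only the "Predicted Fraud" row of the matrix above. The helper name `precision_from_counts` is hypothetical and introduced here for illustration.

```python
def precision_from_counts(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP): the share of flagged cases that are truly positive."""
    return true_positives / (true_positives + false_positives)

# "Predicted Fraud" row: 45 actual fraud (TP), 15 actual legit (FP).
fraud_precision = precision_from_counts(45, 15)
print(fraud_precision)
```

Recall, by contrast, would use the "Actual Fraud" column (45 true positives and 5 false negatives).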
A news organization is digitizing its extensive article archive to make it easily searchable and categorized based on topics. The archive consists of raw text without any pre-assigned labels. Which AWS services would you use to automatically classify and organize the articles with minimal manual effort? (SELECT TWO)
A retail company processes large volumes of customer transaction data on Amazon EMR with Apache Spark. The workload fluctuates significantly, especially during the holiday season. How can the company efficiently scale resources to meet seasonal demand while minimizing costs?
A genomics research team is developing a model to predict genetic disease risks. The dataset has thousands of genomic features, many of which are highly correlated. What preprocessing technique should be used to reduce dimensionality and improve model performance?
A public transportation agency wants to monitor real-time subway ridership at different stations. Data is collected every minute, and the agency wants to detect anomalies in rider volume and send alerts when unusual spikes or drops occur. Which is the most efficient and cost-effective approach?
A healthcare research lab is using machine learning to predict patient medical costs. One of the features, blood pressure, is recorded with two decimal places but needs to be transformed into a more compact format. The values are highly skewed, and extreme values are critical in prediction. What preprocessing technique should be applied?
A global tech company is developing an AI-powered translation system using Amazon SageMaker’s sequence-to-sequence (seq2seq) model. How should the training data be formatted?
A hospital network is developing an ML model to detect thyroid cancer. The dataset consists of 900 non-cancer cases and 100 cancer cases, leading to class imbalance. Despite achieving 90% accuracy, the model has low recall for detecting cancer cases. Which two methods can improve recall? (SELECT TWO)
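To illustrate why 90% accuracy says little on this dataset, the sketch below scores a hypothetical baseline that labels every case as non-cancer. This baseline is an assumption for illustration, not the model described in the question.

```python
# Class counts from the question.
n_negative, n_positive = 900, 100

# Hypothetical baseline: predict "non-cancer" for every case.
tp, fn = 0, n_positive   # every cancer case is missed
tn, fp = n_negative, 0   # every non-cancer case is correct

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(accuracy, recall)
```

Accuracy is dominated by the majority class here, so recall on the cancer class is the more informative metric.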
A software development team is deploying a custom ML model using Amazon SageMaker. The model has been containerized for deployment. What are the mandatory requirements for the container to function correctly on SageMaker? (SELECT TWO)
A financial institution is evaluating a fraud-detection classifier using its ROC curve. The curve rises sharply toward the top-left corner, reaching a true-positive rate (TPR) of 0.90 at a false-positive rate (FPR) of 0.10, and the area under the curve (AUC) is 0.90. What can be inferred from the classifier’s ROC curve?
A sentiment analysis company is optimizing their text processing pipeline using SageMaker’s BlazingText in Word2Vec mode to capture word associations. Which of the following statements regarding BlazingText’s Word2Vec mode is true?
A sports event organizer wants to create a system where cameras automatically detect attendees wearing branded T-shirts to participate in a promotional contest. Which approach would be the most viable?
A telecom company is planning to train a model with the Amazon SageMaker XGBoost algorithm (XGBoost version 1.2 or newer) to predict customer churn. Given the computational demands of the model, which instance type is the most cost-effective for training?
A consumer electronics brand is developing a voice-activated virtual assistant that will understand and respond to user queries. What combination of AWS services is best suited for processing voice commands and executing actions?
A deep learning team is training an image recognition model. After 100 epochs, training accuracy continues to increase, but validation accuracy starts declining. What is the best approach to mitigate this issue?
A media streaming company is optimizing its recommendation system by increasing the learning rate in its deep neural network. Following this adjustment, the system’s prediction accuracy dropped significantly. What is the most likely reason for this?
A financial institution is training a deep learning fraud detection model on Amazon SageMaker. The model is deployed in a VPC with highly sensitive financial data. How can you secure data in transit during training?
A medical research team is analyzing a clinical dataset with various features, including Mean Arterial Pressure (MAP). The dataset is almost complete, but 1% of MAP values are missing. What is the best method to handle missing MAP values?
A hedge fund wants to automate the analysis of market transactions, execution costs, and investment risks. The firm requires a scalable solution that dynamically adjusts computing resources for highly variable workloads. Which AWS service is best suited for this use case?
A retail corporation wants to forecast revenue trends while dealing with irregular historical revenue patterns. The company also seeks a low-maintenance visualization solution. What is the simplest approach?
You are building a linear regression model to predict annual income based on features such as age and occupation. The dataset contains extreme outliers due to high-net-worth individuals. To prevent these outliers from distorting the model’s predictions, what is the most effective approach?
An online news portal has built a time-series forecasting model to predict the number of daily visitors to its website. After deployment, the following observations were made from the predicted vs. actual web traffic trends:
• The model captures weekly seasonality accurately, with peaks and troughs aligning closely with real traffic patterns.
• However, over several months, the predicted traffic gradually diverges from the actual upward trend in overall visitors.
How would you assess the model’s performance?
A retail company maintains a large Amazon S3 data lake containing structured and unstructured CSV files. The data needs transformation and cleaning before analysts can query it via SQL. Which AWS service combination would require the least effort and maintenance?
A financial services company processes real-time transaction data, where each record contains hundreds of columns. Many of these columns are irrelevant to a fraud detection model. What is the most efficient way to filter and preprocess the data before model training?
A media company is using Amazon SageMaker Factorization Machines for a movie recommendation system. What is the correct training data format for Factorization Machines?
A machine learning operations (MLOps) team wants to restrict Amazon SageMaker notebook instance access to specific IAM groups. What is the best method to enforce access control?
A publishing company is receiving duplicate book data from multiple sources into Amazon S3. How can the company remove duplicate records efficiently before further processing?
A biodiversity research group wants to extend its image classification model to classify flower species. They already have a CNN model that detects flowers but lacks species classification. What is the most efficient way to enhance their model?
An advertising company is developing a machine learning model to predict purchase likelihood. The dataset includes hundreds of demographic features. What is the best way to reduce dimensionality while preserving predictive power?
A financial institution is developing a handwritten digit recognition model to process customer documents. The model classifies digits from 0 to 9. What label preprocessing should be applied for optimal training?
A marketing analyst wants to assess the effectiveness of an A/B email campaign, assuming that each recipient has a 40% probability of opening an email. The goal is to model the probability of email openings for a given batch of recipients. Which probability distribution is best suited for this scenario?
A telecommunications company is analyzing customer churn patterns to segment users into distinct categories for targeted retention campaigns. The data scientist is using k-Means clustering. What is the best approach to determine the optimal number of clusters (k)?
A sports analytics company has installed a high-speed camera at a stadium entrance to identify VIP ticket holders using facial recognition. The objective is to automatically notify security when a VIP is detected. Which AWS service combination would be most efficient with minimal development effort?
A healthcare research team is developing an ML model to classify patient conditions based on X-ray scan results. The dataset contains two types of diagnoses, and a scatterplot of patient biometrics indicates a non-linear separation between the two classes. Which two ML algorithms would be the best fit for classification? (Select Two)
A social media monitoring system needs to analyze images and text from posts to identify people, objects, and topics being discussed. To ensure both text and image data are efficiently labeled, which two AWS services should be integrated into the solution? (Select Two)