Math Problem Statement

1. Which of the following metrics measure the ‘goodness of fit’ of a regression model?
  • Mean absolute deviation
  • Root mean squared error
  • R-square
  • Total sum of squared errors

2. Which of the following statements is true about outliers in all scenarios?
  • Outliers should be identified and removed from a dataset.
  • Outliers should be part of the training set but should not be present in the test set.
  • Outliers should be part of the test set but should not be present in the training set.
  • Nature of the problem determines how to deal with the outliers.

3. Which of the following is the average positive difference between estimated and actual outcome values?
  • Root mean squared error
  • Mean squared error
  • Mean absolute error
  • None of the above

4. Which of the following are common uses of unsupervised techniques like clustering?
  • Detect outliers
  • Find a relevant set of input features for supervised learning
  • Find meaningful relationships in the dataset
  • None of the above

6. Which of the following are accurate axes labels for an ROC curve?
  • Vertical axis: % of true negatives; Horizontal axis: % of false negatives
  • Vertical axis: % of true positives; Horizontal axis: % of false positives
  • Vertical axis: % of false negatives; Horizontal axis: % of false positives
  • Vertical axis: % of false positives; Horizontal axis: % of true negatives

7. Which of the following are correct scenarios to use oversampling?
  • When the cost of failing rare events is low.
  • To de-emphasize rare events to the learning algorithm.
  • When a binary response variable includes significantly more items with one choice of response than the other choice, and we seek to accurately predict both the choices.
  • When a binary response variable includes roughly the same number of items for each choice.

8. Which is not included as a step when partitioning is done with oversampling?
  • Half the records from class 1 stratum are randomly selected into training partition.
  • Randomly select class 0 records for training partition equal to no. of class 1 records.
  • Randomly select class 0 records to maintain the original ratio of class 0 to class 1 records for validation partition.
  • None of the above.

9. Which of the following statements is true about prediction problems in general?
  • The output attribute must be categorical.
  • The output attribute must be numeric.
  • The resultant model is designed to determine future outcomes.
  • The resultant model is designed to classify current behavior.

10. Which of the following are true about asymmetric misclassification cost?
  • Considered when the misclassification error for a class of interest is more costly than for the other class.
  • Goal of performance metrics changes to either minimization of costs or maximisation of profits.
  • Performance metrics which incorporate asymmetric misclassification costs are used.
  • None of the above.

Solution

Here are the answers for each of the questions:

1. Which of the following metrics measure the ‘goodness of fit’ of a regression model?

  • Answer: R-square
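    R-squared (the coefficient of determination) is the proportion of the variance in the outcome variable that the model explains, R^2 = 1 - (SS_res / SS_tot). For a least-squares fit with an intercept it lies between 0 and 1 on the training data, and values closer to 1 indicate a better fit.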
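The definition R^2 = 1 - (SS_res / SS_tot) can be checked numerically. Below is a minimal Python sketch; the function name `r_squared` and the sample values are illustrative and not part of the original problem.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total sum of squares: variation of the outcome around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
print(r_squared(actual, predicted))
```

Here SS_tot = 20 and SS_res = 0.75, so R^2 = 1 - 0.75/20 = 0.9625: the model explains about 96% of the variance in the outcome.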

2. Which of the following statements is true about outliers in all scenarios?

  • Answer: Nature of the problem determines how to deal with the outliers.
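    No single rule holds in all scenarios. In some problems, such as fraud or fault detection, the outliers are exactly the cases of interest and must be kept; in others they are data-entry errors or distort the fit and should be corrected or removed. Each of the other options prescribes one fixed treatment, so none of them is always true.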

3. Which of the following is the average positive difference between estimated and actual outcome values?

  • Answer: Mean absolute error
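    Mean absolute error (MAE) is the average of the absolute differences between the predicted and actual values. Taking absolute values makes every difference positive, which is what the question describes; MSE and RMSE square the differences instead.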
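The difference between the error measures in this question can be seen on a small example. This is a minimal Python sketch; `error_metrics` is a hypothetical helper name and the numbers are made up for illustration.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and total SSE for paired actual/predicted values."""
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)        # total sum of squared errors
    mae = sum(abs(e) for e in errors) / n   # average positive (absolute) difference
    mse = sse / n                           # average squared difference
    rmse = math.sqrt(mse)                   # square root brings it back to the outcome's units
    return {"mae": mae, "mse": mse, "rmse": rmse, "sse": sse}


print(error_metrics([10, 20, 30, 40], [12, 18, 33, 40]))
```

The errors are -2, 2, -3 and 0, so MAE = 7/4 = 1.75, while MSE = 17/4 = 4.25 and RMSE is about 2.06. Squaring gives larger errors more weight, which is why RMSE is always at least as large as MAE.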

4. Which of the following are common uses of unsupervised techniques like clustering?

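  • Answer: Detect outliers; Find a relevant set of input features for supervised learning; Find meaningful relationships in the dataset
    Clustering groups similar records, which reveals meaningful relationships (natural segments or concepts) in the data. Records that do not fit well into any cluster can be flagged as outliers, and the clusters, together with the attributes that separate them, can guide which input features to use in a later supervised model.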
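The grouping step that clustering performs can be illustrated with k-means, one common clustering algorithm (the original answer does not name a specific method). This is a minimal pure-Python sketch under simplifying assumptions: numeric 2-D points, a fixed number of iterations, and centroids seeded from the first `k` points rather than at random. `kmeans` and `squared_distance` are hypothetical helpers, not a library API.

```python
def squared_distance(p, q):
    """Squared Euclidean distance (same nearest centroid as the plain distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids are seeded from the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With this data the first, third and fifth points end up in one cluster and the rest in the other, matching the two visible groups. Production code would normally use random or k-means++ seeding and stop when the assignments no longer change.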

6. Which of the following are accurate axes labels for an ROC curve?

  • Answer: Vertical axis: % of true positives; Horizontal axis: % of false positives
    An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR).
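The two axes come from sweeping a classification threshold over the model's scores: at each threshold, TPR = TP / (all actual positives) and FPR = FP / (all actual negatives). A minimal Python sketch, assuming labels coded 1 (positive) and 0 (negative) with at least one record of each class; `roc_points` and the sample data are illustrative.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; records with score >= threshold are predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(labels, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the returned pairs with FPR on the horizontal axis and TPR on the vertical axis traces the ROC curve from (0, 0) to (1, 1).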

7. Which of the following are correct scenarios to use oversampling?

  • Answer: When a binary response variable includes significantly more items with one choice of response than the other choice, and we seek to accurately predict both the choices.
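    Oversampling addresses class imbalance: when one class is much rarer than the other but both must be predicted well, the rare class is over-represented in the training data so the algorithm sees enough of its examples. The other options describe the opposite situations: a low cost of missing rare events, de-emphasizing them, or classes that are already balanced.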

8. Which is not included as a step when partitioning is done with oversampling?

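  • Answer: None of the above.
    All three listed steps belong to the standard procedure: half of the class 1 records are randomly selected into the training partition (the rest go to validation), class 0 records are added to training in equal number so that training is balanced, and class 0 records for the validation partition are selected so that validation keeps the original ratio of class 0 to class 1 records. Keeping the original ratio in validation lets performance be assessed under realistic class proportions.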
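The partitioning-with-oversampling steps can be sketched as follows. This is a minimal Python sketch, assuming a binary outcome coded 0/1 with class 1 the rare class and at least one class 1 record; `oversample_partition` and its `label_of` parameter are hypothetical names, and the fixed seed is only for reproducibility.

```python
import random


def oversample_partition(records, label_of, seed=0):
    """Split records into a balanced training set and a validation set with the original class ratio."""
    rng = random.Random(seed)
    # Separate the two classes into strata and shuffle each one.
    ones = [r for r in records if label_of(r) == 1]
    zeros = [r for r in records if label_of(r) == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)

    # Half of the class 1 records go to training, the rest to validation.
    half = len(ones) // 2
    train_ones, valid_ones = ones[:half], ones[half:]

    # Class 0 training records: as many as the class 1 training records (50:50).
    train_zeros = zeros[:len(train_ones)]
    remaining_zeros = zeros[len(train_ones):]

    # Class 0 validation records: keep the original class 0 : class 1 ratio.
    ratio = len(zeros) / len(ones)
    n_valid_zeros = min(len(remaining_zeros), round(ratio * len(valid_ones)))
    valid_zeros = remaining_zeros[:n_valid_zeros]

    return train_ones + train_zeros, valid_ones + valid_zeros


records = [(i, 1) for i in range(20)] + [(i, 0) for i in range(20, 200)]
train, valid = oversample_partition(records, label_of=lambda r: r[1])
print(len(train), len(valid))
```

With 20 class 1 and 180 class 0 records (a 9:1 ratio), training gets 10 records of each class, while validation gets the other 10 class 1 records plus 90 class 0 records, preserving the 9:1 ratio.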

9. Which of the following statements is true about prediction problems in general?

  • Answer: The resultant model is designed to determine future outcomes.
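    Prediction models estimate future or unseen outcomes. The output attribute can be either numeric or categorical, so neither of the first two options holds in general, and classifying current behavior describes a classification task rather than prediction.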

10. Which of the following are true about asymmetric misclassification cost?

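  • Answer: The first three statements: Considered when the misclassification error for a class of interest is more costly than for the other class; Goal of performance metrics changes to either minimization of costs or maximisation of profits; Performance metrics which incorporate asymmetric misclassification costs are used.
    Asymmetric costs arise when misclassifying one class (for example, missing a fraudulent transaction) costs more than misclassifying the other. Overall accuracy then stops being the right target: the goal becomes minimizing total cost or maximizing profit, and performance is judged with metrics that weight each type of error by its cost.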
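One common way to act on asymmetric costs is to score each candidate classification cutoff by its total misclassification cost and keep the cheapest. A minimal Python sketch, assuming labels coded 1 for the class of interest and 0 otherwise, hypothetical per-error costs `cost_fn` (missing a class 1 record) and `cost_fp` (false alarm), and zero cost for correct classifications; the helper names and data are illustrative.

```python
def total_cost(actual, predicted, cost_fn, cost_fp):
    """Total misclassification cost; correct predictions cost nothing."""
    cost = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 1 and y_hat == 0:
            cost += cost_fn   # false negative: a class-of-interest record was missed
        elif y == 0 and y_hat == 1:
            cost += cost_fp   # false positive: a false alarm
    return cost


def cheapest_cutoff(actual, scores, cost_fn, cost_fp):
    """Return the score cutoff (predict 1 when score >= cutoff) with the lowest total cost."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf means "never predict 1"
    return min(
        candidates,
        key=lambda t: total_cost(actual, [1 if s >= t else 0 for s in scores], cost_fn, cost_fp),
    )


actual = [1, 0, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.1]
print(cheapest_cutoff(actual, scores, cost_fn=1, cost_fp=1))
print(cheapest_cutoff(actual, scores, cost_fn=10, cost_fp=1))
```

With equal costs the cheapest cutoff is 0.9 (one missed class 1 record, total cost 1). When a miss costs ten times as much as a false alarm, the cutoff drops to 0.5: catching both class 1 records is worth two false alarms (total cost 2), whereas the 0.9 cutoff would now cost 10.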


Math Problem Analysis

Mathematical Concepts

Regression Analysis
Goodness of Fit
R-squared

Formulas

R^2 = 1 - (SS_res / SS_tot)

Theorems

Coefficient of determination (R-squared)

Suitable Grade Level

Grades 10-12