CSE416 Introduction to Machine Learning
Q1 Learning a Tree
1 Point
Select one option.
Consider the following dataset.
If we use the decision tree algorithm to learn a decision tree from this dataset, what feature would be used as the split for the root node?
h1(x)
h2(x)
h3(x)
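The dataset for this question is shown as an image, so the sketch below uses a small hypothetical binary dataset; it only illustrates the mechanics the question relies on: split on each candidate feature, predict the majority class in each branch, and pick the feature with the lowest classification error.

```python
# Sketch of the root-split computation on a hypothetical binary dataset
# (the real dataset is shown as an image above). For each candidate feature:
# split the data, predict the majority class in each branch, and pick the
# feature with the lowest classification error.
import numpy as np

def split_error(feature_values, labels):
    """Classification error after splitting on one binary feature."""
    mistakes = 0
    for v in np.unique(feature_values):
        branch = labels[feature_values == v]
        mistakes += len(branch) - np.bincount(branch).max()  # minority count
    return mistakes / len(labels)

# Hypothetical data: columns are h1(x), h2(x), h3(x); y holds the labels.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 0]])
y = np.array([1, 0, 1, 0])

for j, name in enumerate(["h1(x)", "h2(x)", "h3(x)"]):
    print(name, split_error(X[:, j], y))
# The root split is the feature with the lowest error (ties broken arbitrarily).
```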
Q2 Decision Boundaries
4 Points
Which of the following pictures show decision boundaries that are possible to represent with a decision tree only using the features Age and Income? The regions with the green background are predicted positive and the regions with the orange background are predicted negative.
Q2.1
1 Point
This decision boundary can be learnt by a decision tree classifier only using the features Age and Income.
This decision boundary cannot be learnt by a decision tree classifier only using the features Age and Income.
Q2.2
1 Point
This decision boundary can be learnt by a decision tree classifier only using the features Age and Income.
This decision boundary cannot be learnt by a decision tree classifier only using the features Age and Income.
Q2.3
1 Point
This decision boundary can be learnt by a decision tree classifier only using the features Age and Income.
This decision boundary cannot be learnt by a decision tree classifier only using the features Age and Income.
Q2.4
1 Point
This decision boundary can be learnt by a decision tree classifier only using the features Age and Income.
This decision boundary cannot be learnt by a decision tree classifier only using the features Age and Income.
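A minimal sketch, using made-up Age/Income data, of why decision-tree boundaries are always axis-aligned: every internal node tests a single feature against a threshold, so the rules printed below all have the form "Age <= t" or "Income <= t", and the resulting decision regions are unions of axis-aligned rectangles in the (Age, Income) plane.

```python
# Sketch on made-up Age/Income data: every split is a single-feature threshold,
# so the printed rules are all axis-aligned cuts.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(low=[18, 10_000], high=[70, 150_000], size=(200, 2))  # [Age, Income]
y = ((X[:, 0] > 40) & (X[:, 1] > 60_000)).astype(int)                 # toy labels

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["Age", "Income"]))
```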
Q3 Tree Depth
2 Points
Q3.1 Bias/Variance
1 Point
A shallower decision tree will have __ bias and __ variance than a deeper decision tree.
higher bias, higher variance
higher bias, lower variance
lower bias, higher variance
lower bias, lower variance
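A minimal sketch on synthetic data (the dataset and depths are made up) of the trade-off behind this question: as depth grows, training fit keeps improving while test performance eventually stops improving or degrades.

```python
# Sketch on synthetic data: shallow trees underfit (higher bias, lower variance),
# deep trees fit the training set very closely (lower bias, higher variance).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10):
    model = DecisionTreeRegressor(max_depth=depth).fit(X_tr, y_tr)
    print(depth, model.score(X_tr, y_tr), model.score(X_te, y_te))
# Training fit keeps improving with depth; test performance eventually degrades.
```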
Q3.2 Comparing Trees
1 Point
If decision tree T1 has lower training error than decision tree T2, then T1 will always have lower test error than T2.
True
False
Q4 Calculating Classification Error
1 Point
Provide a numeric answer.
Based on the implementation in lecture, compute the classification error of the following tree, which has two output classes, Safe and Risky.
Please give your answer to two decimal places. Do not start your answer with a dot: write 0.5, not .5.
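The tree itself is shown as an image, so the sketch below uses hypothetical per-leaf class counts; it only shows the computation from lecture: each leaf predicts its majority class, so the classification error is the total number of minority-class examples divided by the total number of examples.

```python
# Sketch with hypothetical per-leaf class counts (the actual tree is an image).
# Each leaf predicts its majority class, so the error is mistakes / total.
leaves = [
    # (number of Safe examples, number of Risky examples) reaching each leaf
    (18, 2),
    (4, 9),
    (1, 6),
]

mistakes = sum(min(safe, risky) for safe, risky in leaves)  # minority counts
total = sum(safe + risky for safe, risky in leaves)
print(mistakes / total)  # classification error; report to two decimal places
```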
Q5 Splitting on a numeric feature
1 Point
Provide a numeric answer.
In building a decision tree to classify whether a loan is risky or not, we choose to split on the feature Annual Income using 10 training examples.
Here are the values of this column for examples of each output class:
Risky: 10k, 15k, 40k, 100k
Safe: 20k, 61k, 82k, 89k, 95k, 96k
The image below shows another view of the same data:
What is the classification error of the best split?
Please give your answer to one decimal place. Do not start your answer with a dot: write 0.5, not .5.
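A sketch of the standard procedure for a numeric split, using the income values listed above: sort the values, consider a threshold between each pair of consecutive values, predict the majority class on each side, and keep the split with the fewest mistakes.

```python
# Sketch using the income values given in the question: try a threshold between
# each pair of consecutive sorted values, predict the majority class on each
# side, and keep the split with the fewest mistakes.
risky = [10_000, 15_000, 40_000, 100_000]
safe = [20_000, 61_000, 82_000, 89_000, 95_000, 96_000]
data = sorted([(x, "risky") for x in risky] + [(x, "safe") for x in safe])

def mistakes(threshold):
    total = 0
    for side in ([l for x, l in data if x < threshold],
                 [l for x, l in data if x >= threshold]):
        if side:
            majority = max(set(side), key=side.count)
            total += sum(label != majority for label in side)
    return total

thresholds = [(data[i][0] + data[i + 1][0]) / 2 for i in range(len(data) - 1)]
print(min(mistakes(t) for t in thresholds) / len(data))  # best split's error
```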
Q6 Comparing Ensembles
2 Points
Q6.1 Which Model?
1 Point
Select one option.
Which of the following choices describes an ensemble model where each of the models in the ensemble can easily be trained in parallel (i.e., in any order)?
Decision Tree
Random Forest
AdaBoost
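A minimal sketch (synthetic data) of parallel ensemble fitting in scikit-learn: each tree in a bagged ensemble depends only on its own bootstrap sample, so the trees can be trained in any order, whereas boosting fits its models sequentially because each one depends on the previous models' mistakes.

```python
# Sketch (synthetic data): trees in a bagged ensemble are independent given
# their bootstrap samples, so scikit-learn can fit them on all cores at once.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X, y)  # n_jobs=-1 trains the trees in parallel on all available cores
```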
Q6.2 More Trees?
1 Point
Consider the following claim. Select the ensemble models discussed in class for which the claim is generally true.
Claim: For this ensemble model, the number of trees used in the ensemble needs to be chosen carefully so as to avoid overfitting.
Random Forest
AdaBoost
Q7 AdaBoost
1 Point
Select one option.
Suppose we are running AdaBoost using decision tree stumps. At a particular iteration, the data points have weights according to the figure (larger points indicate heavier weights).
Which of the following decision tree stumps is most likely to be fit in the next iteration?
Hint: Notice the labels on the decision boundary. Each label shows the prediction for the side of the boundary that lies under or to the right of the word "Predict".
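The figure is not reproduced here, so the sketch below uses made-up points and weights; it only shows the mechanism the question tests: the next stump is chosen to minimize the weighted classification error, so heavily weighted points have the most influence on where the split is placed.

```python
# Sketch with made-up points and weights: the next stump minimizes the
# *weighted* classification error, so heavily weighted points matter most.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = (X[:, 0] > 0).astype(int)

weights = np.ones(len(X))
weights[:5] = 10.0  # pretend these points were misclassified earlier and up-weighted

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=weights)  # a weighted decision stump, as in AdaBoost
```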
Q8 ML Practitioner Scenarios
4 Points
Consider the scenarios below and determine whether you would or would not recommend the suggested idea. In your answer, state whether the suggestion is "Correct" or "Incorrect" and justify why it would or would not be a good idea.
Q8.1
2 Points
Pavan's computer has 8 cores in the CPU, and each core can be responsible for a parallel task. He plans to use all of them for training a random forest classifier. On each core, he makes an exact copy of the original training dataset and trains a decision tree on that copy.
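For reference when judging the scenario, here is a minimal sketch (made-up data; joblib assumed for the parallelism) of how bagging-style training is usually parallelized: each core fits its tree on its own bootstrap sample drawn with replacement from the training set.

```python
# Sketch (made-up data; joblib assumed for the parallelism): in bagging-style
# ensembles such as random forests, each core trains its tree on its own
# bootstrap sample drawn with replacement from the training set.
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def fit_one_tree(X, y, seed):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # rows sampled with replacement
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# One tree per core, each on a different bootstrap sample (8 cores in the scenario).
trees = Parallel(n_jobs=8)(delayed(fit_one_tree)(X, y, s) for s in range(8))
```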