Assignment 1: 161.762 Multivariate Statistics for Big Data
S1, 2025
Due: 25th April 2025
Provide the code used for each item, where applicable.
1) [14 marks] The principal components of pizza
The dataset pizza2025.sas7.bdat contains measurements that capture key factors influencing the flavour and overall quality of a pizza.
The variables in the data set are:
Brand: Pizza brand (class label)
Id: Sample analysed
Mois: Amount of water per 100 grams in the sample
Prot: Amount of protein per 100 grams in the sample
Fat: Amount of fat per 100 grams in the sample
Ash: Amount of ash per 100 grams in the sample
Sodium: Amount of sodium per 100 grams in the sample
Carb: Amount of carbohydrates per 100 grams in the sample
Cal: Number of calories per 100 grams in the sample
a. [3 marks] Create a draftsman’s plot for the variables from Mois to Cal, with colours based on the Brand variable. Discuss any noticeable patterns or trends.
b. [2 marks] Calculate the correlation matrix for the given variables. Briefly comment on any noticeable patterns that might suggest relationships between variables.
c. [2 marks] Perform a principal component analysis (PCA) on the pizza tastiness variables Mois to Cal and create a biplot colour-coded by Brand.
d. [2 marks] How much of the total variation in the original variables is explained by the first two principal component (PC) axes together?
e. [3 marks] Explain the characteristics of the brands observed in the plot from item 1c and the reasons for these distinctions.
f. [2 marks] What is the difference between the correlation matrix you obtained in item 1b and the one produced by the PCA in item 1c? Why is this difference important for the analysis?
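Items 1b and 1d rest on two small computations: a Pearson correlation matrix, and the share of total variance carried by the leading eigenvalues. The following is only a minimal sketch in plain Python, not the expected submission. The function names `correlation_matrix` and `explained_share` are hypothetical, and the numbers are invented toy values, not taken from the pizza data set.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]

def explained_share(eigenvalues, k=2):
    """Fraction of total variance explained by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Toy values invented for illustration only (two variables, five samples).
fat = [10.0, 12.0, 14.0, 16.0, 18.0]
carb = [40.0, 37.0, 30.0, 29.0, 20.0]

R = correlation_matrix([fat, carb])
r = R[0][1]

# For two standardised variables, the eigenvalues of the 2 x 2
# correlation matrix [[1, r], [r, 1]] are 1 + |r| and 1 - |r|.
eigenvalues = [1 + abs(r), 1 - abs(r)]
share_pc1 = explained_share(eigenvalues, k=1)
```

With the seven pizza variables, the eigenvalues would come from the PCA output of whichever package is used, and `explained_share(eigenvalues, k=2)` would then give the quantity asked for in item 1d.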
2) [10 marks] The correspondent sandwich
The dataset sandwiches_2025 records the variables: brand, name, type of sandwich (SandType) and fiber content (FbCont) for different products.
a. [1 mark] Perform a Multiple Correspondence Analysis for the variables brand, SandType and FbCont. Adjust the inertia using Benzécri’s method.
b. [2 marks] How are the brands related to the fiber content?
c. [2 marks] How are the brands related to the sandwich type?
d. [2 marks] Can you see a relationship between fiber content and sandwich type?
e. [3 marks] What are the inertia levels for the first two dimensions? What does the inertia tell you in the analysis, and why is Benzécri’s method applied?
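The Benzécri adjustment mentioned in items 2a and 2e can be illustrated with a short calculation. This is a sketch only, assuming the principal inertias come from an MCA of the indicator matrix with Q = 3 variables (brand, SandType, FbCont). The function name `benzecri_adjust` and the input inertias are hypothetical and do not come from the sandwiches data set.

```python
def benzecri_adjust(indicator_inertias, n_vars):
    """Benzécri-adjusted inertias and their shares of the adjusted total.

    For Q variables, only principal inertias above 1/Q are kept, and each
    is rescaled as ((Q / (Q - 1)) * (lambda - 1/Q)) ** 2.
    """
    threshold = 1.0 / n_vars
    scale = n_vars / (n_vars - 1)
    adjusted = [
        (scale * (lam - threshold)) ** 2
        for lam in indicator_inertias
        if lam > threshold
    ]
    total = sum(adjusted)  # assumes at least one inertia exceeds 1/Q
    shares = [a / total for a in adjusted]
    return adjusted, shares

# Hypothetical principal inertias, invented for illustration only.
raw = [0.60, 0.45, 0.35, 0.30, 0.20]
adjusted, shares = benzecri_adjust(raw, n_vars=3)

# Unadjusted share of the first dimension, for comparison.
raw_share_first = raw[0] / sum(raw)
```

Comparing `shares[0]` with `raw_share_first` shows why the adjustment matters: the unadjusted percentages in MCA tend to understate how much the leading dimensions explain, because the indicator coding inflates the total inertia.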