Assignment 1: 161.762 Multivariate Statistics for Big Data

S1, 2025

Due: 25th April 2025

Provide the code used for each item, where applicable.

1) [14 marks] The principal components of pizza

The dataset pizza2025.sas7.bdat contains measurements that capture key factors influencing the flavour and overall quality of a pizza.

The variables in the data set are:

Brand: Pizza brand (class label)

Id: Sample analysed

Mois: Amount of water per 100 grams in the sample

Prot: Amount of protein per 100 grams in the sample

Fat: Amount of fat per 100 grams in the sample

Ash: Amount of ash per 100 grams in the sample

Sodium: Amount of sodium per 100 grams in the sample

Carb: Amount of carbohydrates per 100 grams in the sample

Cal: Number of calories per 100 grams in the sample

a. [3 marks] Create a draftsman’s plot (scatterplot matrix) for the variables from Mois to Cal, with points coloured by the Brand variable. Discuss any noticeable patterns or trends.

b. [2 marks] Calculate the correlation matrix for the given variables. Briefly comment on any noticeable patterns that might suggest relationships between variables.

c. [2 marks] Perform a principal component analysis (PCA) on the pizza tastiness variables Mois to Cal and create a biplot colour-coded by Brand.

d. [2 marks] How much of the total variation in the original variables is explained by the first two principal component (PC) axes together?

e. [3 marks] Explain the characteristics of the brands observed in the plot from item 1c and the reasons for these distinctions.

f. [2 marks] What is the difference between the correlation matrix you obtained in item 1b and the one obtained from the PCA in item 1c? Why is this difference important for the analysis?
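
Note: as a starting point for items 1c and 1d, the sketch below shows one way to read the proportion of variance explained by each principal component from the eigenvalues of a correlation matrix. It is written in Python with NumPy and uses synthetic data, not the pizza dataset. The array `X`, the variable names, and the choice of a correlation-based (standardised) PCA are illustrative assumptions, not part of the assignment specification.

```python
import numpy as np

# Synthetic stand-in data (NOT the pizza dataset): three columns on very
# different scales, two of which share a common latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    10.0 * latent + rng.normal(size=(200, 1)),
    -3.0 * latent + rng.normal(size=(200, 1)),
    50.0 * rng.normal(size=(200, 1)),
])

# Correlation matrix of the columns (equivalent to the covariance matrix
# of the standardised variables).
R = np.corrcoef(X, rowvar=False)

# Eigenvalues of a symmetric matrix, sorted from largest to smallest.
eigvals = np.linalg.eigvalsh(R)[::-1]

# Each eigenvalue divided by the total gives that component's share of variance.
explained = eigvals / eigvals.sum()
first_two = explained[:2].sum()

print("Proportion explained per PC:", np.round(explained, 3))
print("First two PCs together:", round(float(first_two), 3))
```

For a correlation matrix the eigenvalues sum to the number of variables, so the shares are the eigenvalues divided by that count.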

2) [10 marks] The correspondent sandwich

The dataset sandwiches_2025 records the variables: brand, name, type of sandwich (SandType) and fiber content (FbCont) for different products.

a. [1 mark] Perform a multiple correspondence analysis (MCA) for the variables brand, SandType and FbCont. Adjust the inertia using Benzécri’s method.

b. [2 marks] How are the brands related to the fiber content?

c. [2 marks] How are the brands related to the sandwich type?

d. [2 marks] Can you see a relationship between fiber content and sandwich type?

e. [3 marks] What are the inertia levels for the first two dimensions, what does the inertia tell you in the analysis, and why is Benzécri’s method applied?
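
Note: the Benzécri adjustment mentioned in items 2a and 2e can be sketched as follows. Given the principal inertias of the indicator matrix and the number of categorical variables Q, only inertias above 1/Q are kept, and each is rescaled as ((Q / (Q - 1)) * (inertia - 1/Q))^2. The function name `benzecri_adjust` and the input values are hypothetical and chosen only for illustration; they are not results from the sandwiches data.

```python
import numpy as np

def benzecri_adjust(indicator_inertias, n_vars):
    """Benzécri-adjusted inertias from indicator-matrix principal inertias.

    Hypothetical helper for illustration; n_vars is the number of
    categorical variables (Q) in the analysis.
    """
    lam = np.asarray(indicator_inertias, dtype=float)
    # Keep only dimensions whose inertia exceeds the 1/Q threshold.
    keep = lam > 1.0 / n_vars
    adjusted = (n_vars / (n_vars - 1.0) * (lam[keep] - 1.0 / n_vars)) ** 2
    # Share of the adjusted total carried by each retained dimension.
    share = adjusted / adjusted.sum()
    return adjusted, share

# Made-up principal inertias for an analysis with Q = 3 variables.
adjusted, share = benzecri_adjust([0.60, 0.45, 0.35, 0.30, 0.20], n_vars=3)
print("Adjusted inertias:", adjusted)
print("Share of adjusted inertia:", share)
```

Statistical packages typically report these adjusted values directly, so the helper is mainly useful for checking software output by hand.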




