Programming Assignment 1

(Programming)

Please paste your code, together with the tables and plots it produces, into your solution.

1.          NumPy is a package that provides convenient matrix/vector computations: (10%)

a.   Please generate an 8 × 8 matrix A and find the minimum, mean, and maximum values of each row and column using NumPy. (3%)

b.   Please generate another 8 × 8 matrix B and find the transpose and inverse of B. (3%)

c.    Please compute the element-wise multiplication and matrix multiplication of A and B. (4%)
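The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the matrices are filled with uniform random numbers, and the seed is an arbitrary choice made for reproducibility; the assignment does not say how A and B should be generated.

```python
import numpy as np

# Assumption: uniform random entries; the seed is arbitrary.
rng = np.random.default_rng(seed=0)

# a. An 8 x 8 matrix and its per-row / per-column statistics.
A = rng.random((8, 8))
row_min, row_mean, row_max = A.min(axis=1), A.mean(axis=1), A.max(axis=1)
col_min, col_mean, col_max = A.min(axis=0), A.mean(axis=0), A.max(axis=0)

# b. A second matrix, its transpose, and its inverse.
B = rng.random((8, 8))
B_T = B.T
B_inv = np.linalg.inv(B)  # raises numpy.linalg.LinAlgError if B is singular

# c. Element-wise (Hadamard) product versus matrix product.
elementwise = A * B
matmul = A @ B
```

Here `axis=1` reduces across the columns of each row and `axis=0` reduces down the rows of each column, so every statistic is a vector of length 8.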

2.          A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns); that is, data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns. (10%)

a.   Given a table of NBA players’ stats as follows, please generate a Pandas DataFrame based on the table. (3%)

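| Player | GP | MIN | PTS | FGM | FGA | FG% | 3PM | 3PA | 3P% |
|---|---|---|---|---|---|---|---|---|---|
| James Harden | 11 | 38.5 | 31.6 | 9.9 | 24 | 41.3 | 4.4 | 12.5 | 35 |
| Kawhi Leonard | 24 | 39.1 | 30.5 | 10.1 | 20.7 | 49 | 2.3 | 6 | 37.9 |
| Paul George | 5 | 40.8 | 28.6 | 8.8 | 20.2 | 43.6 | 3 | 9.4 | 31.9 |
| Stephen Curry | 22 | 38.5 | 28.2 | 8.6 | 19.6 | 44.1 | 4.2 | 11.1 | 37.7 |
| Damian Lillard | 16 | 40.6 | 26.9 | 8.6 | 20.6 | 41.8 |  | 9.9 | 37.3 |
| Giannis Antetokounmpo | 15 | 34.3 | 25.5 | 8.6 | 17.4 | 49.4 | 1.2 | 3.7 | 32.7 |
| Nikola Jokic | 14 | 39.7 | 25.1 | 9.4 | 18.6 | 50.6 | 1.6 | 4 | 39.3 |
| CJ McCollum | 16 | 39.7 | 24.7 |  | 21.9 | 44 | 2.9 | 7.3 | 39.3 |
| Russell Westbrook | 5 | 39.4 | 22.8 | 8 | 22.2 | 36 | 2.2 | 6.8 | 32.4 |
| DeMar DeRozan | 7 | 35.9 | 22 | 8.3 | 17 | 48.7 | 0 | 0.1 | 0 |
| James Harden | 11 | 38.5 | 31.6 | 9.9 | 24 | 41.3 | 4.4 | 12.5 | 35 |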

b.   Please check how many values are missing and fill each missing value with the average of the other players. (4%)

c.    We now have the stats of another player, shown below. Please add his information to our DataFrame. (3%)

| Player | GP | MIN | PTS | FGM | FGA | FG% | 3PM | 3PA | 3P% |
|---|---|---|---|---|---|---|---|---|---|
| Lou Williams | 6 | 29.3 | 21.7 | 7.5 | 17.3 | 43.3 | 1 | 3 | 33.3 |
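One possible way to approach parts a to c is sketched below. The numbers are copied from the two tables. Two rows in the first table list only eight values; the sketch assumes the empty cells are Damian Lillard's 3PM and CJ McCollum's FGM, which is an inference from the value ranges of the neighbouring columns rather than something the table states. It also reads "average of the other players" as the column mean over the non-missing rows, and keeps the repeated James Harden row as given.

```python
import numpy as np
import pandas as pd

columns = ["Player", "GP", "MIN", "PTS", "FGM", "FGA", "FG%", "3PM", "3PA", "3P%"]

# a. Build the DataFrame; np.nan marks the cells assumed to be missing.
rows = [
    ["James Harden", 11, 38.5, 31.6, 9.9, 24, 41.3, 4.4, 12.5, 35],
    ["Kawhi Leonard", 24, 39.1, 30.5, 10.1, 20.7, 49, 2.3, 6, 37.9],
    ["Paul George", 5, 40.8, 28.6, 8.8, 20.2, 43.6, 3, 9.4, 31.9],
    ["Stephen Curry", 22, 38.5, 28.2, 8.6, 19.6, 44.1, 4.2, 11.1, 37.7],
    ["Damian Lillard", 16, 40.6, 26.9, 8.6, 20.6, 41.8, np.nan, 9.9, 37.3],
    ["Giannis Antetokounmpo", 15, 34.3, 25.5, 8.6, 17.4, 49.4, 1.2, 3.7, 32.7],
    ["Nikola Jokic", 14, 39.7, 25.1, 9.4, 18.6, 50.6, 1.6, 4, 39.3],
    ["CJ McCollum", 16, 39.7, 24.7, np.nan, 21.9, 44, 2.9, 7.3, 39.3],
    ["Russell Westbrook", 5, 39.4, 22.8, 8, 22.2, 36, 2.2, 6.8, 32.4],
    ["DeMar DeRozan", 7, 35.9, 22, 8.3, 17, 48.7, 0, 0.1, 0],
    ["James Harden", 11, 38.5, 31.6, 9.9, 24, 41.3, 4.4, 12.5, 35],
]
df = pd.DataFrame(rows, columns=columns)

# b. Count missing values, then fill each one with its column mean.
#    mean() skips NaN, so the mean is taken over the players that have a value.
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())
df = df.fillna(df.mean(numeric_only=True))

# c. Append the new player as a one-row DataFrame.
new_player = pd.DataFrame(
    [["Lou Williams", 6, 29.3, 21.7, 7.5, 17.3, 43.3, 1, 3, 33.3]],
    columns=columns,
)
df = pd.concat([df, new_player], ignore_index=True)
```

Because the repeated James Harden row is kept, it counts twice in the column means; calling `df.drop_duplicates()` before `fillna` would be one way to avoid that if the repetition is unintended.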

3.          The Parkinson Dataset with replicated acoustic features (http://archive.ics.uci.edu/ml/datasets/Parkinson+Dataset+with+replicated+acoustic+features+) contains acoustic features extracted from 3 voice recording replications of the sustained /a/ phonation for each of the 80 subjects (some of them with Parkinson's Disease, i.e., status=1). Please find the data in the Parkinson.csv file. (Hint: the columns ‘ID’ and ‘Recording’ cannot be used as features.) (40%)

a.   As we discussed in class, given a dataset to analyze, we need to understand the structure and statistics of the data (e.g., the distribution of class labels and the distribution of each feature) before designing a supervised or unsupervised learning model. Please implement such a data analysis using Python. (10%)

b.   Considering each record as an individual sample, please train a decision tree classifier (max_depth = 3) to predict the status of each sample. Please plot your decision tree. (15%)

c.   As discussed in class, grid search can help us tune the model parameters to find the optimal solution. Please tune your decision tree classifier to improve its predictive performance. (15%)
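The workflow in parts a to c might look roughly like the sketch below. Parkinson.csv is not reproduced here, so the code builds a small synthetic stand-in with 80 subjects and 3 recordings each; the column names `Status` and `feat_0` to `feat_4`, the train/test split, and the parameter grid are all assumptions made for illustration, not values taken from the dataset or the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for Parkinson.csv (hypothetical feature names and values).
rng = np.random.default_rng(0)
n_subjects, n_reps, n_features = 80, 3, 5
status = np.repeat(rng.integers(0, 2, n_subjects), n_reps)
features = rng.normal(size=(n_subjects * n_reps, n_features)) + status[:, None]
df = pd.DataFrame(features, columns=[f"feat_{i}" for i in range(n_features)])
df.insert(0, "Recording", np.tile(np.arange(1, n_reps + 1), n_subjects))
df.insert(0, "ID", np.repeat(np.arange(n_subjects), n_reps))
df["Status"] = status

# a. Structure and statistics: class balance and per-feature summary.
X = df.drop(columns=["ID", "Recording", "Status"])  # ID and Recording are not features
y = df["Status"]
class_counts = y.value_counts()
feature_summary = X.describe()

# b. Depth-3 decision tree, treating every recording as its own sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
baseline_acc = tree.score(X_test, y_test)
tree_text = export_text(tree, feature_names=list(X.columns))  # text view of the tree

# c. Grid search with 5-fold cross-validation over a small, arbitrary grid.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
tuned_acc = search.score(X_test, y_test)
```

`export_text` is used so the sketch runs without a plotting backend; `sklearn.tree.plot_tree(tree, feature_names=list(X.columns), filled=True)` draws the same tree as a matplotlib figure. The selected parameters are available as `search.best_params_`.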

4.          The Indian Liver Patient Dataset (https://archive.ics.uci.edu/ml/datasets/ILPD+%28Indian+Liver+Patient+Dataset%29; please find the data in the ILPD.csv file) provides the age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos of patients. Please train a KNN classifier and a Logistic Regression classifier to predict the class label of each patient. (For the KNN classifier, please refer to: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) (40%)

Note: some data are missing.
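A minimal sketch of how the two classifiers could be trained in the presence of missing values is given below. ILPD.csv is not reproduced here, so the code uses a small synthetic frame; the column names, the median imputation strategy, `n_neighbors=5`, and the 5-fold cross-validation are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for ILPD.csv (hypothetical column names and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(10, 80, n),
    "gender": rng.choice(["Male", "Female"], n),
    "total_bilirubin": rng.gamma(2.0, 1.5, n),
    "ag_ratio": rng.normal(1.0, 0.3, n),
})
df["label"] = (df["total_bilirubin"] + rng.normal(0, 1, n) > 3).astype(int)
df.loc[rng.choice(n, 5, replace=False), "ag_ratio"] = np.nan  # inject missing values

# Encode gender as 0/1 so every feature is numeric.
X = df.drop(columns="label").assign(gender=lambda d: (d["gender"] == "Male").astype(int))
y = df["label"]

def make_model(estimator):
    """Impute missing values, standardize the features, then fit the estimator."""
    return make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), estimator)

models = {
    "knn": make_model(KNeighborsClassifier(n_neighbors=5)),
    "logreg": make_model(LogisticRegression(max_iter=1000)),
}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
```

Placing the imputer and scaler inside the pipeline means they are refit on each training fold, so no statistics leak from the validation fold. Scaling matters for KNN in particular, because its distances would otherwise be dominated by the features with the largest numeric range.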

