STA 141A, Homework 2
Due April 30th 2019 (by 8 am)

Name:

Student ID:

Section:

Names of your study mates:

Please submit on Canvas as a compiled R Markdown file (PDF or HTML).

All code in this assignment should be cleanly written and well commented, with appropriate use of functions/arguments. Imagine you are sending this code to your colleagues or supervisors for review, which they can only do if they can understand it.

We are interested in understanding the biomolecular pathways in tumor cells that make them susceptible to the immune system. To this end, we need to investigate genes whose expression is related to the quantity of necrotic tumor tissue. In this exercise, we will estimate the relationship between gene-expression levels in tumor cells and the existence of necrotic tissue using the NOAH data.

The data can be found on Canvas in the following files:

clinical_data.csv: clinical/phenotypic information.*
expression_data_probeID.csv: expression information (by probeset).
annotation.csv: genename identifiers corresponding to each probeset.
You can read about the probesets by searching for the Affymetrix Human Genome U133 Plus 2.0 Array annotation.
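
As a rough illustration of how the three files might be read and aligned, here is a minimal sketch in base R. The assignment does not describe the file layouts, so everything about them here is an assumption: the column names `sample_id`, `probe_id`, and `gene_name`, the probesets-in-rows orientation of the expression file, and the helper name `load_noah` are all hypothetical. Toy stand-ins for the files are written to a temporary directory so the sketch runs without the Canvas data; adjust it to the real layout after inspecting the files.

```r
# Toy stand-ins for the three Canvas files (layout is ASSUMED, not confirmed).
data_dir <- tempfile("noah_toy_")
dir.create(data_dir)

write.csv(data.frame(sample_id = c("S1", "S2", "S3"),
                     necrotic_cells.pct = c(0, 10, 35)),
          file.path(data_dir, "clinical_data.csv"), row.names = FALSE)

# Assumed orientation: one row per probeset, one column per sample.
# Sample columns are deliberately in a different order than the clinical file.
write.csv(data.frame(probe_id = c("1007_s_at", "1053_at"),
                     S2 = c(6.3, 8.0),
                     S1 = c(5.1, 7.2),
                     S3 = c(4.9, 7.7)),
          file.path(data_dir, "expression_data_probeID.csv"), row.names = FALSE)

write.csv(data.frame(probe_id = c("1007_s_at", "1053_at"),
                     gene_name = c("DDR1", "RFC2")),
          file.path(data_dir, "annotation.csv"), row.names = FALSE)

# Hypothetical helper: read the files and return a samples-by-probesets
# matrix whose rows are in the same order as the clinical response.
load_noah <- function(data_dir) {
  clinical <- read.csv(file.path(data_dir, "clinical_data.csv"),
                       stringsAsFactors = FALSE)
  expression <- read.csv(file.path(data_dir, "expression_data_probeID.csv"),
                         stringsAsFactors = FALSE, check.names = FALSE)
  annotation <- read.csv(file.path(data_dir, "annotation.csv"),
                         stringsAsFactors = FALSE)

  # Transpose so that samples are rows and probesets are columns.
  x <- t(as.matrix(expression[, -1]))
  colnames(x) <- expression$probe_id

  # Reorder rows of x to match the sample order in the clinical data.
  x <- x[match(clinical$sample_id, rownames(x)), , drop = FALSE]

  list(x = x, y = clinical$necrotic_cells.pct, annotation = annotation)
}

noah <- load_noah(data_dir)
```

Matching on an explicit sample identifier, rather than relying on row order, guards against the expression and clinical files listing samples in different orders.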

Your task here is to build a prediction model for the phenotypic variable necrotic_cells.pct (the percentage of necrotic tissue in a tumor, as determined by pathology) using gene expression levels. In particular, you will

select genes that lead to good prediction performance, so that scientists can follow up on them in future studies,
and properly control for over-fitting in your analysis.
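
One common way to combine the two goals above is to perform gene selection inside each cross-validation fold, so that held-out samples never influence which genes are chosen. The sketch below is one possible approach, not the method the assignment prescribes. It uses simulated data in place of the NOAH files, and the function name `cv_mse`, the correlation-based screening rule, the linear model, and the defaults `n_genes = 5` and `n_folds = 5` are all illustrative assumptions.

```r
set.seed(1)

# Simulated stand-in for the expression matrix (samples x genes) and response.
n <- 60
p <- 200
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)  # toy response, NOT real data

# Hypothetical helper: k-fold cross-validated mean squared error, with
# gene screening repeated on the training rows of every fold.
cv_mse <- function(x, y, n_genes = 5, n_folds = 5) {
  fold <- sample(rep(seq_len(n_folds), length.out = nrow(x)))
  sq_err <- numeric(nrow(x))

  for (k in seq_len(n_folds)) {
    test <- fold == k

    # Rank genes by absolute correlation with y, using training rows only.
    score <- abs(cor(x[!test, , drop = FALSE], y[!test]))[, 1]
    keep <- order(score, decreasing = TRUE)[seq_len(n_genes)]

    # Fit a linear model on the selected genes, then predict the held-out fold.
    train_df <- data.frame(y = y[!test], x[!test, keep, drop = FALSE])
    test_df <- data.frame(x[test, keep, drop = FALSE])
    fit <- lm(y ~ ., data = train_df)
    sq_err[test] <- (y[test] - predict(fit, newdata = test_df))^2
  }

  mean(sq_err)
}

mse <- cv_mse(x, y)
```

If the genes were instead screened once on the full data set and only the model fit were cross-validated, the error estimate would be optimistically biased, because the held-out samples would already have helped choose the predictors.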