Heading Home (Due Monday November 4th)
You will analyze a sample of mortgage applications in the next several assignments. Each assignment requires an analysis of the NY.Rda data and a written discussion of the results in Word. I suggest that you write one program and continually add commands to it.
Begin by downloading the data file NY.Rda to a folder on your computer. Use the load command to bring the data into R. For example, load(file="NY.Rda", verbose = TRUE).
Paragraph 1: Sample selection
1. Write a sentence giving the number of observations and variables in the dataset.
You will use the functions load(), dim(), names(), and str(). The load() function takes the name of the file; the other three take the name of the data frame as an argument.
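A minimal sketch of these four functions, assuming a small made-up data frame named toy in place of the real data (the toy values and the file name toy.Rda are invented for illustration and are not part of NY.Rda):

```r
# Toy stand-in for NY.Rda; every value below is invented for illustration.
toy <- data.frame(
  State = c("NY", "NY", "NY", "NY"),
  Sex = c("Female", "Male", "Male", "Female"),
  ApplicantIncome = c(52, 87, 140, 61)
)

# Save the toy data to a temporary .Rda file so load() has something to read.
path <- file.path(tempdir(), "toy.Rda")
save(toy, file = path)
rm(toy)

# load() takes a file name and restores the saved objects under their own names.
loaded <- load(file = path, verbose = TRUE)
print(loaded)   # names of the objects that were restored

# The other three functions take the data frame itself.
dim(toy)        # number of rows (observations), then columns (variables)
names(toy)      # variable names
str(toy)        # type and first few values of each variable
```

With the real file, the object that load() restores determines the data frame name you pass to dim(), names(), and str().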
2. Look at the names of the variables. Write a sentence summarizing the type of information you have.
3. I selected the sample using the variables State, PropType, LoanPurpose, and Occupancy. Perform an analysis of each of these categorical variables. Use the table command to examine each variable; for example, table(NY$State). Write two sentences describing the sample based on your results.
Note the two-part variable name: the data frame, then the variable, separated by “$”. Remember that capitalization matters in R.
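As a rough illustration of the table command and the “$” notation, here is a sketch on a made-up data frame. The name toy and the category labels ("Purchase", "Refinance", "Owner") are assumptions for the example and may not match the labels in NY.Rda.

```r
# Toy data; the category labels are invented, not taken from NY.Rda.
toy <- data.frame(
  LoanPurpose = c("Purchase", "Purchase", "Refinance", "Purchase", "Refinance"),
  Occupancy = c("Owner", "Owner", "Owner", "Owner", "Owner")
)

# One-way frequency table: the number of rows in each category.
purpose_counts <- table(toy$LoanPurpose)
print(purpose_counts)

# A variable with only one category suggests the sample was restricted on it.
occupancy_counts <- table(toy$Occupancy)
print(occupancy_counts)
length(occupancy_counts) == 1

# Capitalization matters: toy$loanpurpose is not a column, so R returns NULL.
is.null(toy$loanpurpose)
```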
Paragraph 2: Demography
4. The key demographic variables in this data set are Race, Ethnicity, Sex, and CoApplicant. Perform an analysis of these categorical variables. Use the prop.table command to examine each variable; for example, prop.table(table(NY$Race)). Include the results in your paper. Copy/paste or compile your document to get the results into Word. Add the title “Exhibit 1 Demography”.
5. Write a paragraph describing the “typical” mortgage applicant based on your demographic analysis.
6. Look up the demographic makeup of New York State. Write a sentence explaining whether the sample of applicants is similar to or different from the state.
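The prop.table step can be sketched as follows on made-up data. The data frame toy and its race labels are assumptions for illustration; the real categories come from NY$Race.

```r
# Toy data; the labels and counts are invented for illustration.
toy <- data.frame(
  Race = c("White", "White", "Black", "Asian", "White", "Black", "White", "White")
)

# Counts first, then proportions: prop.table() divides each count by the total.
race_counts <- table(toy$Race)
race_share <- prop.table(race_counts)
print(round(race_share, 3))

# The proportions sum to 1.
sum(race_share)

# The most common category is one way to describe the "typical" applicant.
names(race_share)[which.max(race_share)]
```

Rounding with round() only affects what is printed for the exhibit; the stored proportions keep full precision.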
Paragraph 3: Ethnicity and the relationship to other Demographic variables
7. Our primary interest is the differences between Hispanic and non-Hispanic mortgage applicants. However, ethnicity may be related to other variables. Create a frequency table to examine whether ethnicity is independent of race; the command for this is table(NY$Ethnicity, NY$Race). Note the use of two variables in the table command, separated by a comma.
Use the results to calculate the conditional probabilities P(Hispanic | White), P(Hispanic | Black), and P(Hispanic | Asian). The easiest way to do this is to use R as a calculator. Write a sentence explaining whether race and ethnicity are independent, and cite the probabilities.
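One way the calculator approach might look, sketched on made-up data. The data frame toy and the label "Not Hispanic" are assumptions; only "Hispanic" appears in the assignment, so check your own table for the actual labels. The prop.table(..., margin = 2) line is an optional shortcut, not something the assignment requires.

```r
# Toy data; labels other than "Hispanic" are assumed, and all counts are invented.
toy <- data.frame(
  Ethnicity = c("Hispanic", "Not Hispanic", "Not Hispanic", "Hispanic",
                "Not Hispanic", "Not Hispanic", "Not Hispanic", "Not Hispanic"),
  Race = c("White", "White", "White", "Black",
           "Black", "Asian", "Asian", "White")
)

# Two-way table: rows are Ethnicity, columns are Race.
counts <- table(toy$Ethnicity, toy$Race)
print(counts)

# R as a calculator:
# P(Hispanic | White) = count(Hispanic and White) / count(White)
p_hisp_given_white <- counts["Hispanic", "White"] / sum(counts[, "White"])
p_hisp_given_white

# Optional shortcut: margin = 2 divides every column by its column total,
# giving P(Ethnicity | Race) for all races at once.
cond <- prop.table(counts, margin = 2)
print(round(cond, 3))

# Overall P(Hispanic). Under independence, every conditional
# probability above would equal this value.
p_hisp <- sum(counts["Hispanic", ]) / sum(counts)
p_hisp
```

Comparing each conditional probability with the overall P(Hispanic) is the check for independence: large gaps between them indicate that race and ethnicity are related.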
8. Repeat the analysis for ethnicity and gender. Create the frequency table.
Compare P(Female | Hispanic) with P(Female | not Hispanic).
Compare P(Hispanic | Female) with P(Hispanic | Male).
Write a sentence about this result.
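The two comparisons condition in opposite directions, which is easy to mix up. A sketch on made-up data, assuming the labels "Not Hispanic", "Female", and "Male" (check table(NY$Sex) and table(NY$Ethnicity) for the real ones):

```r
# Toy data; the labels are assumed and the counts are invented.
toy <- data.frame(
  Ethnicity = c("Hispanic", "Hispanic", "Hispanic", "Hispanic",
                "Not Hispanic", "Not Hispanic", "Not Hispanic",
                "Not Hispanic", "Not Hispanic", "Not Hispanic"),
  Sex = c("Female", "Female", "Male", "Male",
          "Female", "Male", "Male", "Male", "Male", "Male")
)

# Rows are Ethnicity, columns are Sex.
counts <- table(toy$Ethnicity, toy$Sex)
print(counts)

# margin = 1: each row sums to 1, so entries are P(Sex | Ethnicity).
given_ethnicity <- prop.table(counts, margin = 1)
given_ethnicity["Hispanic", "Female"]       # P(Female | Hispanic)
given_ethnicity["Not Hispanic", "Female"]   # P(Female | not Hispanic)

# margin = 2: each column sums to 1, so entries are P(Ethnicity | Sex).
given_sex <- prop.table(counts, margin = 2)
given_sex["Hispanic", "Female"]             # P(Hispanic | Female)
given_sex["Hispanic", "Male"]               # P(Hispanic | Male)
```

The same four numbers can be computed by hand from the counts, as in the previous step; the margin argument just does the division for every cell at once.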
9. Create an indicator variable for whether the applicant is Hispanic. The syntax is
NY$Hispanic <- ifelse(NY$Ethnicity == "Hispanic", 1, 0)
Create another indicator variable for whether the applicant is female. Look at a table of the variable Sex and then write a similar ifelse command.
Perform a statistical test of the difference in proportions. The syntax is
t.test(NY$Female ~ NY$Hispanic)
Notice that the group means match your calculations from the previous question. This command also provides a confidence interval for the difference between the groups.
Include the test of the difference in proportions in your document. Add the title “Exhibit 2 Race and Gender Relationship”.
I claim that a Hispanic applicant is more likely to be female than a non-Hispanic applicant. Write a sentence discussing my hypothesis using the confidence interval from your analysis.
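A sketch of the indicator and t.test steps on made-up data. The data frame toy, the labels "Not Hispanic", "Female", and "Male", and all counts are assumptions for illustration; with the real data, use the labels that table(NY$Sex) shows.

```r
# Toy data; labels other than "Hispanic" are assumed, counts are invented.
toy <- data.frame(
  Ethnicity = c("Hispanic", "Hispanic", "Hispanic", "Hispanic",
                "Not Hispanic", "Not Hispanic", "Not Hispanic",
                "Not Hispanic", "Not Hispanic", "Not Hispanic"),
  Sex = c("Female", "Female", "Male", "Male",
          "Female", "Male", "Male", "Male", "Male", "Male")
)

# Indicator variables: 1 if the condition holds, 0 otherwise.
toy$Hispanic <- ifelse(toy$Ethnicity == "Hispanic", 1, 0)
toy$Female <- ifelse(toy$Sex == "Female", 1, 0)

# The mean of a 0/1 indicator is a proportion, so comparing group means
# compares the share of female applicants in the two ethnicity groups.
result <- t.test(toy$Female ~ toy$Hispanic)
print(result)

# Group means: share female where Hispanic == 0, then where Hispanic == 1.
result$estimate

# Confidence interval for (mean in group 0) minus (mean in group 1).
result$conf.int
```

Because the difference is computed as group 0 minus group 1, an interval lying entirely below zero would support the claim that Hispanic applicants are more often female; an interval that contains zero would not.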
Paragraph 4: The distribution of Income
10. Use the mean(), sd(), median(), summary(), and hist() commands to analyze the ApplicantIncome variable.
Include the graph in your paper and write a sentence describing the distribution; comment on the shape, center, and spread using your calculations.
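A sketch of the five commands on made-up income values. The data frame toy and its numbers are invented, and the units are arbitrary; nothing here reflects the actual ApplicantIncome values. The histogram is written to a temporary PDF so the example also runs as a script; in an interactive session you can call hist() directly.

```r
# Toy incomes in arbitrary units; one large value mimics a long right tail.
toy <- data.frame(
  ApplicantIncome = c(35, 42, 48, 55, 60, 66, 75, 90, 120, 410)
)

# Center: mean and median. Spread: standard deviation.
income_mean <- mean(toy$ApplicantIncome)
income_median <- median(toy$ApplicantIncome)
income_sd <- sd(toy$ApplicantIncome)
income_mean
income_median
income_sd

# Minimum, quartiles, median, mean, and maximum in one call.
summary(toy$ApplicantIncome)

# A mean well above the median is a sign of right skew.
income_mean > income_median

# Shape: draw the histogram into a temporary PDF file.
plot_path <- file.path(tempdir(), "income_hist.pdf")
pdf(plot_path)
hist(toy$ApplicantIncome,
     main = "Applicant income (toy data)",
     xlab = "Income")
invisible(dev.off())
```

If the real data contain missing incomes, mean(), sd(), and median() return NA unless you add na.rm = TRUE.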