STAT6160 DATA ANALYTICS FOR BUSINESS
Assignment 1 Questions
Number of questions: 3, Total marks: 25, Weight: 25%
Due time and date: 11:59pm AEST, Sunday end of Week 7
Submission instructions and general marking criteria
• Prepare your assignment in Word, LaTeX, R Markdown, or any other appropriate document preparation system.
• Submit a copy in PDF format via Canvas.
• Assignments submitted by other means (e.g., email) or in other forms (e.g., scanned copy) will attract no marks.
• Late Submission Penalty: As detailed in the Course Outline.
• It is expected that R is used to assist with calculations and preparation of appropriate graphs. All relevant R scripts and output must be included with your assignment. However, raw computer output without explanatory text is not acceptable. Answers must be written in clear English sentences that are explicitly linked to the appropriate supporting computer output.
• You will need to demonstrate understanding of types of data, the use of graphs to explore distributions of variables and relationships between variables, and of statistical tests. Marks will be awarded based on the quality of your assessment of the data and how clearly that assessment is communicated.
• The assessment requires you to use concepts from Modules 1-4 (plus a small amount from Module 5) to choose the correct analysis for each scenario and data set, and to write up the results of a statistical analysis.
Question 1 (Total 6 Marks)
A lost-time injury is defined by Australian Workplace Standards as an occurrence that resulted in a fatality, permanent disability or time lost from work of one day/shift or more. The data is provided in the file “Question1.csv”: Columns A and B contain the causes of lost-time injuries and their percentage of occurrence across the previous year at a mining site.
(i) What type of variable (Continuous, Discrete, Ordinal or Nominal) is Cause? Justify your answer. [2 Marks]
(ii) Which is the appropriate graphical display to use for the variable type you have identified in part (i)? [1 Mark]
(iii) Use R to create an appropriate chart to graphically display the data provided. [1 Mark]
(iv) Comment on the key finding from this chart. [2 Marks]
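Note: the following is a minimal R sketch of one way to chart percentages by category, not a model answer. The data frame is made up for illustration, and the column names Cause and Percentage are assumptions; check the actual headers in “Question1.csv” and substitute the real file.

```r
# Hypothetical stand-in data; the real causes and percentages will differ.
# With the real file, something like: injuries <- read.csv("Question1.csv")
injuries <- data.frame(
  Cause = c("Cause A", "Cause B", "Cause C", "Cause D"),  # assumed column name
  Percentage = c(20, 40, 10, 30)                          # assumed column name
)

# Sort categories from most to least frequent so the largest bar comes first.
injuries <- injuries[order(injuries$Percentage, decreasing = TRUE), ]

# Draw one bar per category, with labelled axes and a title.
barplot(
  height = injuries$Percentage,
  names.arg = injuries$Cause,
  xlab = "Cause of lost-time injury",
  ylab = "Percentage of injuries (%)",
  main = "Lost-time injuries by cause (illustrative data)"
)
```

Sorting the bars is a presentation choice that makes the dominant categories easy to spot; whether it suits the real data depends on your answer to part (i).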
Question 2 (Total 10 Marks)
The Australian Bureau of Statistics regularly reports on large percentages of small businesses failing. In a bid to identify potential indicators, or symptoms, of business failure, a national study of small businesses was undertaken. A random sample of 100 small businesses was obtained and characteristics measured. One of the recorded variables was the ratio of current assets to current liabilities (variable name “Asset_Liability_Ratio”); roughly speaking, this is the amount that the firm is worth divided by what it owes.
Five years later these same small businesses were revisited. Among the variables collected was whether the small business was still operating or not; the latter meaning the business had failed or closed. This is an example of what is known as a longitudinal study.
The study was interested in, amongst many measures of performance, assessing whether the previously recorded ratio of current assets to liabilities differed between small businesses which were still operating five years later and those that were not.
The data is provided in the file “Question2.csv”: Columns A and B contain the two variables.
(i) Use R to construct a histogram of “Asset Liability Ratio” for the 100 small businesses. How would you describe the shape? Include your histogram. [2 Marks]
(ii) There are two common graphical presentations used to compare the “Asset Liability Ratio” for the “still operational” and “now-closed” small businesses. Which one is preferred for this study? Name it, justify your answer and provide the visual display using R. [2 Marks]
(iii) Use R to find the mean, median, standard deviation and interquartile range of the “Asset Liability Ratio” for the “still operational” and “now-closed” small businesses. [3 Marks]
(iv) Using your output created in parts (i)-(iii), give a brief report comparing the “Asset Liability Ratio” for the “still operational” and “now-closed” small businesses. (Hint: Think 3 S’s for each group and provide a comparison summary.) [3 Marks]
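Note: the following is a minimal R sketch of the mechanics behind parts (i)-(iii), not a model answer. It runs on simulated values, not the study data. The column name Asset_Liability_Ratio comes from the question; the column name Status and its labels are assumptions, so check the actual headers in “Question2.csv”. The boxplot is shown only as one of the common comparison displays; the choice and its justification remain yours.

```r
set.seed(1)

# Simulated stand-in data for 100 businesses; all values are made up.
# With the real file, something like: firms <- read.csv("Question2.csv")
firms <- data.frame(
  Asset_Liability_Ratio = round(rlnorm(100, meanlog = 0.5, sdlog = 0.4), 2),
  Status = rep(c("Operating", "Closed"), times = c(60, 40))  # assumed name and labels
)

# Histogram of the ratio across all businesses.
hist(
  firms$Asset_Liability_Ratio,
  xlab = "Current assets / current liabilities",
  main = "Asset liability ratio (simulated data)"
)

# Side-by-side boxplots, one per group, on a common scale.
boxplot(
  Asset_Liability_Ratio ~ Status,
  data = firms,
  ylab = "Current assets / current liabilities",
  main = "Asset liability ratio by status (simulated data)"
)

# Mean, median, standard deviation and interquartile range within each group.
group_summary <- aggregate(
  Asset_Liability_Ratio ~ Status,
  data = firms,
  FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
)
print(group_summary)
```

Computing all four statistics in a single aggregate() call keeps the two groups in one table, which makes the comparison in part (iv) easier to write up.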
Question 3 (Total 9 Marks)
The TCS Management Group randomly selected 100 clients and sent them a survey about their satisfaction with their dealings with TCS. Survey scores could range from 0 to 100, with a higher score indicating a higher level of satisfaction. The mean score of the sample was 72.44 with a standard deviation of 8.18. Management expects the mean survey score to be higher than 70.
(i) Provide a hypothesis test, at a 5% significance level, to test if the population mean survey score is higher than 70. [6 Marks]
(ii) Discuss why there is no need to have the actual dataset (e.g., a csv file containing several rows of records) to complete this hypothesis test. [3 Marks]
(Hint: Implement all steps of the hypothesis test. This question requires some knowledge from Module 5. You do not need R to complete Question 3.)
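Note: although R is not required for Question 3, a hand calculation can be checked with a few lines. The sketch below assumes a one-sample, upper-tailed t test, which is a common choice when the population standard deviation is unknown; follow the Module 5 notes if they specify a different procedure. The variable names are illustrative.

```r
# Summary statistics given in the question.
n     <- 100     # sample size
xbar  <- 72.44   # sample mean
s     <- 8.18    # sample standard deviation
mu0   <- 70      # mean under the null hypothesis
alpha <- 0.05    # significance level

# Standard error of the mean and the test statistic.
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se

# Upper-tailed critical value and p-value on n - 1 degrees of freedom.
df      <- n - 1
t_crit  <- qt(1 - alpha, df)
p_value <- pt(t_stat, df, lower.tail = FALSE)

cat("t =", round(t_stat, 3), " critical value =", round(t_crit, 3),
    " p-value =", signif(p_value, 3), "\n")
```

Every input above is a summary statistic from the question, which is worth keeping in mind when answering part (ii).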