STA304H5F: Surveys, Sampling and Observational Data
Technical Report Instructions
The typical flow of a technical article is: abstract, introduction, methodology, analysis, discussion/results, conclusion/limitations, appendix. Your report does NOT need to follow this exact format (some journal articles have slightly different formats).
Depending on how much you want to elaborate, you can split certain sections into two parts (e.g., Analysis could become Quantitative Analysis and Qualitative Analysis). You have free rein to decide your outline insofar as it makes logical sense.
Journal articles are often written for an audience that knows statistics well. That is, you will not need to explain terms such as p-values, null and alternative hypotheses, etc. It will be the responsibility of the reader to understand the statistical analysis method. However, it would be nice to have some plain-language conclusions at the end for readability.
We do not expect students to write a literature review, ethical statements, conflicts of interest, or funding disclosures. (However, it would be nice to include a literature review if you have the time.)
We are expecting students to use either LaTeX, R-Markdown, or Quarto for the report.
WARNING: DO NOT SAY AFFECT AND OTHER WORDS THAT IMPLY CAUSALITY.
(SUCH AS: Influence, impact, contribute…)
This is not a randomized controlled trial! Use words such as association, relationship, or connection instead.
1. Abstract
In prior versions of this course, we did not request an abstract. However, abstracts are necessary for any academic article. Your abstract should be a quick summary of your paper and is shown before the introduction. Other researchers use abstracts to tell whether a paper is worth reading. It should address the following:
• A brief introduction to the topic.
• Aim of the paper.
• A brief statement regarding the data collection methodology.
• A summary of key findings.
• A brief overview of implications of results, or what needs to be improved.
Here’s an example of an abstract (colour coded to match the above description):
In a statistics program, most courses emphasize statistical theory over practical applications, often resulting in a focus on examinations rather than assignments. However, many statistics programs include a course on survey and sampling design, where students can be assessed through projects. Designing these projects requires more resources than tests, including increased grading workload and time spent resolving group conflicts. This study examines whether students prefer projects over traditional examinations and identifies the benefits they perceive from project-based learning. In a third-year statistics survey and sampling course, we asked students about their perceptions of project-based learning through Google Forms after completion of the final project. The results show that 81% of students prefer projects and found them useful for developing skill sets necessary for the workforce. The main reasons for disliking projects were group conflicts and unclear instructions. In response, we plan to provide more resources to support student success and to find ways to mitigate potential group disputes in the future.
2. Introduction
Within the introduction, the following questions should be answered:
• What is your study about and why should we care?
• What background information is necessary to let the reader understand?
• What are your research questions?
• What are your hypotheses?
• What is the brief outline of the rest of your paper?
Below, a student wrote two different drafts of an introduction for a study involving cannabis usage:
Draft 1:
Those opposed to the decriminalization of cannabis will often cite that it is destroying the youth and their cognitive ability to function in society. Cannabis is a popular drug that people occasionally will smoke for leisure activity. There are studies done that show that prolonged cannabis consumption is not good for brain development, and perhaps this could influence students to care less about school. We aim to analyze the following research question:
• (RQ1) What are the impacts of cannabis usage on lecture attendance?
o Null hypothesis: cannabis usage has no association on lecture attendance.
o Alternative hypothesis: cannabis usage is correlated with a lower attendance rate.
• (RQ2) What are the impacts of cannabis usage on students’ grades?
o Null hypothesis: cannabis usage has no association on students’ grades.
o Alternative hypothesis: cannabis usage is correlated with lower grades.
• (RQ3) Do students perceive cannabis to be a positive asset in their life?
o Null hypothesis: cannabis usage is not associated with being an advantage to one’s life.
o Alternative hypothesis: cannabis usage is linked to improving one’s lifestyle.
There are some issues with this introduction:
1. Fairly bland, boring, and sometimes awkward.
2. It doesn’t transition well, i.e., the introduction of the research questions was quite abrupt.
3. What is their exact population? Where could they be gathering this information?
4. What is the outline for the rest of the paper?
Draft 2:
This study examines the impact of cannabis consumption on university students' lecture attendance. Cannabis, a psychoactive substance frequently used by young adults for recreational purposes, has garnered increasing attention due to its potential implications for cognitive and behavioral outcomes, particularly among students.
Extensive research has suggested that prolonged and frequent cannabis consumption may have adverse effects on brain development and cognitive function (Iversen, 2003).
In light of this, we formulate the hypothesis that individuals who engage in regular, weekly cannabis use are more prone to reduced lecture attendance. This research endeavors to investigate the relationship between cannabis consumption patterns and student engagement with academic activities, shedding light on an area of growing concern in contemporary education.
The structure of the paper is as follows: Section 2 outlines our data collection methods. Section 3 presents our quantitative analysis. In Section 4, we explore feedback from students and the common themes that arise. Section 5 discusses the qualitative results and addresses our research question. Section 6 covers the limitations of our study, and Section 7 concludes our analysis.
Again, there are some issues with this introduction:
1. This is not concise (mostly the second paragraph), and honestly boring to read. It sounds like there is an abundance of “fluff” to over-compensate for lack of substance. (What is contemporary education?)
2. The research questions and hypotheses are not clear. (They also don’t mention enough.)
3. Again, we are unsure of the exact population and where they could possibly be gathering this information.
We can combine drafts 1 and 2, and integrate the missing pieces, to create a superior introduction (note: longer doesn’t always mean better; the previous two drafts were simply missing information).
Final Draft:
Cannabis faces significant stigma from older generations due to its history of criminalization and negative stereotypes, such as the belief that it increases laziness. Moreover, extensive research suggests that prolonged and frequent cannabis consumption may have adverse effects on brain development and cognitive function (Iversen, 2003). While some individuals may use cannabis for recreational purposes, it can also provide relief from medical conditions that induce high levels of pain.
In this study, we analyze whether the stigma against cannabis is well-deserved. Specifically, we examine the association between cannabis consumption and university students' lecture attendance and grades at a research-intensive North American university. In October 2024, we deployed a survey via email to collect data on students’ cannabis usage, academic performance, demographic factors, and attitudes towards cannabis. The survey included both users and non-users of cannabis for comparison. Additionally, we carefully differentiated between prescribed and recreational cannabis use. We aim to study the following research questions:
• (RQ1) How is cannabis usage associated with lecture attendance?
o Null hypothesis: cannabis usage has no association with lecture attendance.
o Alternative hypothesis: cannabis usage is correlated with a lower attendance rate.
• (RQ2) How is cannabis usage associated with students’ grades?
o Null hypothesis: cannabis usage has no association with students’ grades.
o Alternative hypothesis: cannabis usage is correlated with lower grades.
• (RQ3) Do students perceive cannabis to be a positive asset in their life?
o Null hypothesis: cannabis usage is not associated with being an advantage to one’s life.
o Alternative hypothesis: cannabis usage is linked to improving one’s lifestyle.
The structure of the paper is as follows: Section 2 outlines our data collection methods. Section 3 presents our quantitative analysis. In Section 4, we explore feedback from students and the common themes that arise. Section 5 discusses the qualitative results and addresses our research question. Section 6 covers the limitations of our study, and Section 7 concludes our analysis.
3. Methodology
This is where you outline your data collection methodology in detail.
● Where and when was the data collected? (Piazza, lectures, tutorials, online databases…)
● What sampling method did you use? (SRS, stratified, etc.)
○ How did you ensure randomness? (randomly sampling with an R program, systematically deploying surveys in lecture based on seating arrangements…)
● What were your strata, if any?
● What was your sample size?
● What is a general summary of the questions you asked? (What are your variables?)
Below we present an example.
Between May 2023 and August 2023, a survey meant to understand students’ academic performance and recreational drug usage was deployed within an introductory statistics course at a research-intensive North American university. We used simple random sampling, drawing 50 students with a Python random number generator from an email list of all students taking the introductory statistics course (N = 242). Of these 50 students, 19 did not respond, leaving us with n = 31 responses. The survey consisted of 10 short-answer questions asking for students’ average cannabis usage per week, lecture attendance, and miscellaneous demographic factors (gender, program of study, age).
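The sampling step described in this example could be sketched in Python as follows. This is a minimal illustration, not the code used in the example study: the email addresses are placeholders and the seed value is an arbitrary choice.

```python
import random

# Hypothetical sampling frame: a real study would load the course email list.
# These addresses are placeholders, not real data.
population_size = 242  # N, taken from the example above
sample_size = 50       # number of students to contact

email_list = [f"student{i:03d}@example.edu" for i in range(1, population_size + 1)]

# A fixed seed (arbitrary value) makes the draw reproducible for the appendix.
rng = random.Random(304)

# random.sample draws without replacement, so every subset of 50 students
# is equally likely to be chosen: this is simple random sampling.
sampled_emails = rng.sample(email_list, k=sample_size)

print(len(sampled_emails), "students selected")
```

Recording the seed and the code in the appendix lets a reader confirm that the sample was drawn at random rather than by convenience.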
4. Analysis
WARNING: In a journal article, code is not provided except in an appendix. Do not put R code here.
Things that are included in this section:
● Show relevant graphs & tables.
● Show computations for your sample size.
● Necessary assumptions for tests are shown before utilizing them. (If assumptions are not satisfied, you should not be using that test!)
○ For example, the assumptions for the two-sample t-test for the mean are the following:
■ The samples are independent from each other & are obtained randomly.
■ The samples are normally distributed.
■ The variances for the two independent groups are equal.
● Show the outputs of the computations and statistical tests (e.g., p-values and test statistics).
○ Remark: the tests for significance should be done at the 0.05 level.
● Some statistical information that may be useful to reference (depending on the context): standard deviation, confidence interval, median, IQR.
If you have used a questionnaire, use N = 200 for computing the sample size. Unfortunately, n is already known, so you will work backwards: compute the bound on the error of estimation that your actual sample size corresponds to.
You do NOT need to show EVERYTHING you computed. Show only the computations that you will DISCUSS in the report. Normally, a lot of the side analyses you compute at first will have no statistically significant results. (E.g., gender may have no connection with the amount of cannabis a person smokes.)
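The reverse calculation might look like the sketch below. It assumes a proportion estimated under simple random sampling and the common textbook bound B = 2·sqrt(p(1 − p)/n · (N − n)/(N − 1)), which the source does not state explicitly. The value n = 31 is a hypothetical number of responses, and p = 0.5 is the conservative default.

```python
import math


def bound_for_proportion(n: int, N: int, p: float = 0.5) -> float:
    """Bound on the error of estimation for a proportion under SRS.

    Assumed formula: B = 2 * sqrt(p * (1 - p) / n * (N - n) / (N - 1)),
    which is the sample size formula solved for B.
    """
    q = 1.0 - p
    variance = (p * q / n) * ((N - n) / (N - 1))
    return 2.0 * math.sqrt(variance)


# N = 200 as instructed above; n = 31 is a hypothetical number of responses.
bound = bound_for_proportion(n=31, N=200)
print(f"Bound on the error of estimation: {bound:.3f}")
```

If your research question concerns a mean rather than a proportion, the same idea applies with the sample variance in place of p(1 − p).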
Part of this project is having the ability to decipher what is important to report. This is a snippet of what to include in the analysis section:
For our study, our population size is N = 300 and we plan to collect data using simple random sampling. To determine the sample size, we use the calculation for estimating a population proportion. We assume equal proportions of students who prefer projects and students who prefer examinations (p = 0.5). Hence, given a bound on the error of estimation of 0.13, the sample size calculation is:
Hence, we sampled 50 students for our analysis.
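The calculation above can be reproduced with a short sketch. The formula itself does not appear in the text, so the code assumes the standard finite-population formula for a proportion under simple random sampling, n = Npq / ((N − 1)D + pq) with D = B²/4. This assumption is consistent with the reported result of 50 students.

```python
import math


def sample_size_for_proportion(N: int, bound: float, p: float = 0.5) -> int:
    """Required SRS sample size to estimate a proportion within `bound`.

    Assumed formula: n = N*p*q / ((N - 1)*D + p*q), where D = bound**2 / 4.
    """
    q = 1.0 - p
    D = bound ** 2 / 4.0
    n = (N * p * q) / ((N - 1) * D + p * q)
    return math.ceil(n)  # round up so the bound is still met


# Values from the example: N = 300, B = 0.13, p = 0.5.
n_required = sample_size_for_proportion(N=300, bound=0.13)
print(n_required)  # prints 50 (49.56 before rounding up)
```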
We found that 40% of our participants identified as male (n = 20), and the rest were female (n = 30). We also had 64% domestic students (n = 32) and 36% international students (n = 18). Surprisingly, only 40% said they preferred projects over examinations (n = 20).
To answer RQ1, we conduct a one-sample proportion test. All of our assumptions are satisfied: we have a random sample, a binary outcome (so the count follows a binomial distribution), and at least 5 observations for each outcome (np = 20, n(1 − p) = 30). We obtain a chi-square test statistic of 2, df = 1, and p = 0.1573.
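The reported statistic can be checked with the sketch below. It assumes a null proportion of 0.5 and no continuity correction. Neither is stated in the text, but together they reproduce the reported values. (In R, `prop.test` applies a continuity correction by default, so `correct = FALSE` would be needed to match.)

```python
import math


def one_sample_proportion_test(successes: int, n: int, p0: float):
    """Two-sided one-sample proportion test without continuity correction.

    Returns the chi-square statistic (z squared, df = 1) and its p-value.
    """
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    chi_square = z ** 2
    # For a chi-square variable with df = 1, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# 20 of 50 students preferred projects; p0 = 0.5 is the assumed null value.
chi_square, p_value = one_sample_proportion_test(successes=20, n=50, p0=0.5)
print(f"X-squared = {chi_square:.4f}, df = 1, p-value = {p_value:.4f}")
```

Since the p-value is above 0.05, the test fails to reject the null hypothesis at the 0.05 level.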
RQ2 asks whether certain demographic factors are associated with whether someone prefers assignments. Again, we need to check whether the assumptions for the two-sample proportion test are satisfied. We have the same binomial distribution, and the samples are still independent from each other.
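A two-sample proportion test and its count check could be sketched as follows. The group sizes (20 male, 30 female) come from the example, but the split of preferences within each group is hypothetical and chosen only for illustration. The "at least 5 per cell" rule is assumed by analogy with the one-sample check.

```python
import math


def counts_large_enough(successes: int, n: int, minimum: int = 5) -> bool:
    """Check that both outcomes occur at least `minimum` times in a group."""
    return successes >= minimum and (n - successes) >= minimum


def two_sample_proportion_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for equal proportions using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# HYPOTHETICAL counts: 9 of 20 male and 11 of 30 female students prefer projects.
male_yes, male_n = 9, 20
female_yes, female_n = 11, 30

assumptions_ok = (counts_large_enough(male_yes, male_n)
                  and counts_large_enough(female_yes, female_n))
z, p_value = two_sample_proportion_test(male_yes, male_n, female_yes, female_n)
print(f"assumptions satisfied: {assumptions_ok}, z = {z:.3f}, p-value = {p_value:.3f}")
```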
Below is a table that summarizes the last assumptions. All assumptions were satisfied, so we included a brief result of the statistical test.
We also analyzed the Likert scale data, whose options were: “strongly disagree”, “disagree”, “slightly disagree”, “neutral”, “slightly agree”, “agree”, and “strongly agree”. We coded “strongly disagree” as 1 and “strongly agree” as 7.
To see whether students perceived their skills to be enhanced through project courses in general, we conducted a one-sample t-test for the mean, where option 4 (neutral) denotes no change. The assumptions for the one-sample test are that the data are randomly selected, independent from each other, and approximately normally distributed. The first two conditions have been addressed, and the last condition can be satisfied by the central limit theorem because our sample size of 50 is greater than 30.
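The coding and the test described above might be sketched as below. The responses are hypothetical, made up only to show the mechanics. The critical value 2.0096 is the 97.5th percentile of the t distribution with 49 degrees of freedom, hard-coded here so the sketch needs only the standard library.

```python
import math
import statistics

# Coding described in the text: "strongly disagree" = 1, ..., "strongly agree" = 7.
LIKERT_SCORES = {
    "strongly disagree": 1, "disagree": 2, "slightly disagree": 3, "neutral": 4,
    "slightly agree": 5, "agree": 6, "strongly agree": 7,
}

# HYPOTHETICAL responses from 50 students (not data from the example study).
responses = (["agree"] * 18 + ["slightly agree"] * 12 + ["neutral"] * 10
             + ["slightly disagree"] * 6 + ["strongly agree"] * 4)
scores = [LIKERT_SCORES[r] for r in responses]

n = len(scores)
mean = statistics.mean(scores)
std_error = statistics.stdev(scores) / math.sqrt(n)  # stdev uses n - 1

NEUTRAL = 4               # null value: "neutral" means no perceived change
T_CRITICAL_49 = 2.0096    # t quantile for a two-sided test at 0.05, df = 49

t_statistic = (mean - NEUTRAL) / std_error
confidence_interval = (mean - T_CRITICAL_49 * std_error,
                       mean + T_CRITICAL_49 * std_error)
reject_null = abs(t_statistic) > T_CRITICAL_49

print(f"mean = {mean:.2f}, t = {t_statistic:.2f}, reject H0: {reject_null}")
print(f"95% CI: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")
```

Reporting the confidence interval alongside the test shows readers how far the mean response sits from neutral, not just whether it differs.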
Below is a summary of the Likert scale questions, as well as the one-sample test for the mean:
Remark 1: some interesting data points to include would be the confidence interval, which I omitted due to laziness. I also did not include “advanced tests”.
Remark 2: it was unrelated to any of the research questions, but if you peek at the attached code file I tested to see if there were any differences amongst the perceptions of Likert scale answers between male & female, and international & domestic. There were no differences, so I did not include them. If you have time for more results, and they do end up being significant, it would be interesting to include them.
5. Discussion/Results
In this section you should write the interpretation of your results, ideally with plain-language conclusions.
You may also use this section to elaborate facts regarding a table or a graph in detail, such as pointing out abnormalities or highlighting the stark differences between two groups in your data.
A discussion section is typically longer the more interesting the results, but it can be made relatively short if no results were found.
6. Limitations
Briefly mention limitations and what should be done in the future. Consider what you would have changed in your survey if you could go back in time to redo the study. Common limitations include an inadequate sample size, failure to reject the null hypothesis, missing confounding variables, a biased sample and/or survey, and poor survey questions.
A common issue in prior years is that students mentioned limitations that were more of a critique of the method as opposed to their own study. They probably stole these ideas from ChatGPT. As a warning, the following are not sufficient limitations (and I will address why):
• Lack of causal connections. STA304’s official name is “Survey, Sampling, & Observational Data”. Observational data, by nature, will never lead to causal results. If you want to make a causal connection, you’re in the wrong course.
• Self-reported data. You want to stalk and hack into a database that reveals information about your fellow classmates just for your project!? The course name also includes “surveys”! Surveys will always be tied with self-reported data.
• Temporal factors. Maybe this would cause an effect … But realistically, are you going to spend a year collecting survey data? I doubt most people’s opinions change drastically unless your project is entirely season dependent …
There are more nonsensical answers that generative AI tools will spout out. I am begging you to take a second to think about your limitations rather than consulting AI. (In fact, I may have literally given the answers in the first paragraph of this section.)
Here’s an example of an adequate limitation section:
In our study we tried to see which demographic factors (gender, ethnicity, domestic or international, and year of study) are linked with an overreliance on AI tools using the chi-square test for independence or the non-parametric Fisher’s exact test (depending on whether the assumptions of the chi-square test were satisfied).
Unfortunately, in all cases we failed to reject the null hypothesis. Hence, we found no evidence of ties between the various demographic factors and whether students tend to use AI tools. In fact, we found that 90% of students admitted to using generative AI tools. As a result, we are less likely to see any relationships due to a high majority already using them. In the future, we will incorporate open-ended questions to ask students to elaborate on their preferences.
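The choice between the two tests mentioned in this example could be automated along the following lines. The sketch assumes the common rule of thumb that the chi-square test needs every expected cell count to be at least 5, which the text does not state. The table is hypothetical, built so that 90% of 50 students use AI tools.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def choose_test(table, minimum_expected=5.0):
    """Pick the chi-square test only if all expected counts are large enough."""
    expected = expected_counts(table)
    if all(cell >= minimum_expected for row in expected for cell in row):
        return "chi-square test for independence"
    return "Fisher's exact test"


# HYPOTHETICAL 2x2 table.
# Rows: domestic, international. Columns: uses AI tools, does not use AI tools.
observed = [[29, 3],
            [16, 2]]

print(expected_counts(observed))
print(choose_test(observed))
```

With so few non-users, the expected counts in the second column fall below 5, which illustrates why a lopsided response makes relationships hard to detect.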
7. Conclusion
In this section you should have the following:
• Summarize your findings and explicitly answer your research questions.
o No new information should be provided; everything mentioned here should have been mentioned earlier in the paper.
• Discuss in more detail what researchers should do in the future.
Naturally, people may wonder about the difference between an abstract and the conclusion.
• The abstract is supposed to be a concise summary of the entire paper. It goes through each phase: introduction, methodology, analysis, and results.
• The conclusion does summarize the paper but emphasizes the latter half of the paper (results and limitations). It does not include background information. It will also re-address limitations and go into depth about future directions.