SUBJECT OUTLINE
36103 Statistical Thinking for Data Science, Autumn 2024
Subject description
Statistical thinking is the foundational mindset in data science, emphasizing the use of statistical principles and methods to understand, analyze, and derive meaningful insights from data. It serves as the core of data science. This subject equips students with essential skills and concepts for applying statistical thinking in the context of applied data science. Initially, students are introduced to fundamental statistical principles, developing a simultaneous understanding of modern methods for statistical inference, and gaining valuable hands-on experience with real-world data. Subsequently, they delve into a range of statistical models and estimation techniques, applying their acquired knowledge to engage in a complete data science research cycle. Collaborating in teams, students learn how to formulate research inquiries, employ formal statistics and real-world datasets to address them, and effectively communicate their findings through both oral presentations and written reports.
The progression of this subject starts with more teaching-intensive methods such as workshops and lectures to give students the technical and conceptual know-how to work as practicing data scientists. As the subject progresses, students increasingly move towards an individually driven learning mode, allowing both teams and individuals the flexibility to enhance their statistical thinking and skills.
Upon completion of the subject, students possess a robust foundation in technical, conceptual, and practical aspects, empowering them to continue their development as Data Scientists.
Subject learning objectives (SLOs)
Upon successful completion of this subject students should be able to:
1. Manage the complexity of real data science projects and their inevitable compromises
2. Formulate authentic data science questions precise enough to be answered by valid statistical techniques
3. Justify the use of different statistical concepts and tools to audiences from a wide range of backgrounds
4. Find, clean, and merge datasets from a range of sources to answer real world data science problems
5. Apply statistical methods that are appropriate to a dataset and stakeholder requirements
6. Interpret the results of a statistical analysis correctly, visualizing and reporting upon them in ways that create value for, and are sensitive to the needs of, a wide range of stakeholders
7. Collaborate with and contribute to the professional community of data scientists, both local and global
Course intended learning outcomes (CILOs)
This subject also contributes specifically to the development of the following course outcomes:
Exploring and testing models and describing behaviours of complex systems
Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders (1.2)
Making the invisible visible
Use transdisciplinary approaches to seeing and doing to uncover underrepresented, or misrepresented, elements of a system (1.4)
Exploring, interpreting and visualising data
Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)
Designing and managing data investigations
Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)
Developing strategies for innovation
Explore, interrogate, generate, apply, test and evaluate problem-solving strategies to extract economic, business, social, strategic or other value from data (3.1)
Working together
Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)
Engaging audiences
Explore and craft interpretative narratives that engage key audiences with data analytics and potential significance for action, at a societal, industrial, organisational, group or individual levels (4.2)
Informing decision making
Develop, test, justify and deliver data project propositions, methodologies, analytics outcomes and recommendations for informing decision-making, both to specialist and non-specialist audiences (4.3)
Contribution to the development of graduate attributes
Your experiences as a student in this subject support you to develop the following graduate attributes (GA):
GA 1 Sociotechnical systems thinking
GA 2 Creative, analytical and rigorous sense making
GA 3 Create value in problem solving and inquiry
GA 4 Persuasive and robust communication
Teaching and learning strategies
Authentic problem based learning: This subject relies heavily upon the principle that students learn best by doing. It offers a range of authentic data science problems to solve that help to develop students’ statistical thinking about complex problems. Students work on real world data analysis problems using datasets that they create using modern data harvesting techniques. These are used to answer realistic data science questions in broad areas of topical interest. This exposes them to the true ambiguities, constraints, and complexities of working as a data scientist for a variety of different stakeholders.
Blend of online and face to face activities: This subject is offered through a series of block sessions blending online with face-to-face learning. Students interact face-to-face with each other and the teaching team in three intensive modules that require the completion of both preparation and after class activities. They concurrently use a range of complementary online resources to develop their statistical thinking according to identified weaknesses in their background knowledge. They are expected to engage in online discussion and to actively participate in other blended activities.
Collaborative work: We place a strong emphasis on group activities and collaboration in diverse teams. As a data science professional you need to approach professional projects and challenges by working with people from different backgrounds, expectations, and expertise. This course simulates that environment by requiring students to work with a team of peers who come from many different backgrounds. Group assessments help students to develop effective strategies for working as a part of a data science team, as well as an appreciation that there are diverse perspectives on many different topics in data science and innovation.
Self paced evaluation and improvement: This subject takes students from an exceptionally wide range of backgrounds, some of whom are better versed in statistical methods and Python than others. We help all students to self-diagnose their weaknesses and strengths, and to work to improve in areas that they identify as a priority for the professional niche that they would like to occupy as a practicing data scientist. Students choose their own path through a wide variety of curated resources as needed.
Embedding English Language: An aim of this subject is to help you develop academic and professional language and communication skills in order to succeed at university and in the workplace. To determine your current academic language proficiency, you are required to complete an online language screening task, OPELA (information available at https://www.edu.au/research-and-teaching/learning-and-teaching/enhancing/language-and-learning/about-opela-student). If you receive a Basic grade for OPELA, you must attend additional Language Development Tutorials (each week from week [3/4] to week [11/12]) in order to pass the subject. These tutorials are designed to support you to develop your language and communication skills. Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.
Assessment
This subject is 100% coursework based, with no exams. A detailed assessment brief describing each assessment task is available on Canvas; please refer to it throughout the course. Assessments are a blend of individual and team-based work.
Assessment task 1: Exploration of data skills and issues
Objective(s): 3 and 5
Type: Report
Groupwork: Individual
Weight: 20%
Task: In this assessment, students conduct exploratory data analysis (EDA) on a marketing campaign dataset from a telecommunication company. The company recently launched a marketing campaign to promote the adoption of its new subscription plan among customers. It seeks assistance in gaining a comprehensive understanding of its customers and identifying the customer segments that are most responsive to marketing campaigns. The response variable, subscribed, indicates whether the client subscribed to a new plan, which was the objective of the campaign.
The dataset may have issues such as missing information and data errors. Identifying and handling such issues is part of the assessment.
Students must apply at least three distinct exploratory data analysis techniques to gain preliminary insights from the data.
Length: A maximum of 7 pages
Due: 11.59pm Sunday 10 March 2024
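As an illustration of what three distinct EDA techniques might look like, here is a minimal Python sketch. It is not part of the assessment brief. The inline sample data, the `age` and `plan_type` columns, and the 18 to 100 plausibility range are all assumptions made for the example; only the `subscribed` response variable comes from the task description. The sketch uses only the standard library so that it runs anywhere; in practice a library such as pandas would be a common choice.

```python
import csv
import io
from collections import Counter, defaultdict
from statistics import median

# Hypothetical sample standing in for the campaign dataset. Only the
# `subscribed` column is named in the task; the others are assumptions.
RAW = """age,plan_type,subscribed
34,prepaid,yes
51,postpaid,no
,postpaid,yes
29,prepaid,no
420,postpaid,yes
45,,yes
38,postpaid,yes
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Technique 1: count missing (empty) values per column.
missing = Counter(col for row in rows for col, val in row.items() if val == "")


# Technique 2: screen for data errors, here ages outside an assumed
# plausible range of 18 to 100, and summarise the remaining values.
def parse_age(text):
    """Return the age as an int, or None if it is missing or implausible."""
    if text == "":
        return None
    age = int(text)
    return age if 18 <= age <= 100 else None


valid_ages = [a for a in (parse_age(row["age"]) for row in rows) if a is not None]
median_age = median(valid_ages)

# Technique 3: response rate by customer segment, skipping rows whose
# segment label is missing.
counts = defaultdict(lambda: [0, 0])  # segment -> [subscribed, total]
for row in rows:
    segment = row["plan_type"]
    if segment == "":
        continue
    counts[segment][0] += row["subscribed"] == "yes"
    counts[segment][1] += 1
response_rate = {seg: yes / total for seg, (yes, total) in counts.items()}

print("missing values per column:", dict(missing))
print("median of valid ages:", median_age)
print("response rate by plan type:", response_rate)
```

Whether to drop, flag, or impute the problem rows is a judgment call that depends on the dataset; the sketch simply excludes them from each summary.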
Assessment task 2: Data analysis project
Objective(s): 1, 2, 3, 4, 5, 6 and 7
Type: Project
Groupwork: Group, group assessed
Weight: 30%
Task: Students work in teams of 5-7 people with complementary skills and backgrounds. Each team selects a context and works to define research questions that help them to propose, execute, and disseminate a data science project.
Project presentation (group), worth 15%: Students work in teams to carry out their proposed project. Projects are presented to the class.
Project report (group), worth 15%: Students work in teams to carry out their proposed project. Project reports are submitted in written format.
Length: Group presentation: 10-15 minutes. Group report: 500-700 words.
Due: See Further information.
Further information: Presentation: Saturday 27 April, online. Report: due 11:59 pm Sunday 12 May.
Assessment task 3: Individual project exploration
Objective(s): 2, 3 and 6
Type: Project
Groupwork: Individual
Weight: 50%
Task: Assessment 3 builds on Assessment 1.
The objective of this assessment is to develop data science models that provide insights into the business question of which customer segments are most responsive to marketing campaigns. Your report must show results for at least two different sets of predictions.
• At least one of your models should be a parametric model.
• At least one of your models should be a non-parametric model.
You should use at least one estimation method introduced in Module 3. Your report should include the following elements:
1. Justification for model selection, including an explanation of the configuration and training choices made.
2. Parametric estimates and their corresponding interpretations.
3. A comparative analysis of the models, employing cross-validation or validation metrics.
4. Proficiency in data mining, demonstrated by the ability to extract relevant business insights from the data and articulate them effectively.
Length: 700 to 1000 words (Canvas submission)
Due: 11.59pm Sunday 2 June 2024
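To make the parametric versus non-parametric comparison concrete, here is a minimal Python sketch under stated assumptions. It pairs logistic regression (parametric, fitted by maximum likelihood using gradient ascent) with k-nearest neighbours (non-parametric) and compares them by 5-fold cross-validated accuracy. The synthetic data, the two features, and all function names are hypothetical. These model choices are illustrative only and are not necessarily the methods introduced in Module 3. The standard library is used so the sketch is self-contained; a library such as scikit-learn would be a common choice in practice.

```python
import math
import random

random.seed(0)


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def make_data(n=200):
    """Synthetic stand-in: two standardised features and a binary response."""
    data = []
    for _ in range(n):
        x = (random.gauss(0, 1), random.gauss(0, 1))
        p = sigmoid(1.5 * x[0] - 1.0 * x[1])
        data.append((x, 1 if random.random() < p else 0))
    return data


def fit_logistic(train, lr=0.5, epochs=300):
    """Parametric model: maximise the log-likelihood by gradient ascent."""
    w = [0.0, 0.0, 0.0]  # intercept, coefficient of x1, coefficient of x2
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in train:
            err = y - sigmoid(w[0] + w[1] * x1 + w[2] * x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi + lr * g / len(train) for wi, g in zip(w, grad)]
    return w


def predict_logistic(w, x):
    return 1 if sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) >= 0.5 else 0


def predict_knn(train, x, k=7):
    """Non-parametric model: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def cross_validate(data, k_folds=5):
    """Return mean held-out accuracy for (logistic regression, k-NN)."""
    folds = [data[i::k_folds] for i in range(k_folds)]
    acc_logit, acc_knn = [], []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        w = fit_logistic(train)
        acc_logit.append(sum(predict_logistic(w, x) == y for x, y in test) / len(test))
        acc_knn.append(sum(predict_knn(train, x) == y for x, y in test) / len(test))
    return sum(acc_logit) / k_folds, sum(acc_knn) / k_folds


data = make_data()
w_full = fit_logistic(data)
acc_logit, acc_knn = cross_validate(data)

# Each coefficient is a change in log-odds per unit of the feature;
# exponentiating gives an odds ratio.
print("coefficients (intercept, x1, x2):", [round(v, 2) for v in w_full])
print("odds ratio for x1:", round(math.exp(w_full[1]), 2))
print("5-fold CV accuracy: logistic =", round(acc_logit, 3), "k-NN =", round(acc_knn, 3))
```

The logistic coefficients supply the interpretable parametric estimates, while the cross-validated accuracies give a like-for-like basis for the comparative analysis. Accuracy is only one possible metric; with an imbalanced response, alternatives such as precision, recall, or AUC may be more informative.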
Minimum requirements
To meet the minimum requirement for the subject, students must attain at least 50% of the available marks.
Additionally, it is a requirement of this subject that all students complete OPELA. Students who received a Basic grade in the OPELA are required to attend 80% of the Language Development Tutorials in order to pass the subject.
Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.
Recommended texts
Other learning resources: Depending on your background and what you are planning to learn, you will find at least one of these useful. You are not expected to read all of these resources cover-to-cover. Use them to help you solve specific problems.
To learn statistical concepts:
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (Second Edition). New York: Springer. (statlearning.com)
Bruce, P. and Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly Media, Inc. We will refer to it as PSDS in this subject.
To learn linear regression modelling:
Caffo, B. Regression Models for Data Science in R. Leanpub. You can get a free copy here: leanpub.com/regmods/read. It is written as a companion book to the Coursera Regression Models class, and also has a series of YouTube videos accompanying it. We will refer to it as RM throughout this subject.
To run a good data science project:
Godsey, B. (2017). Think Like a Data Scientist: Tackle the data science process step-by-step. Manning Publications Co.