
Module code and Title: DTS208TC Data Analytics and Visualisation

School Title: School of AI and Advanced Computing

Assignment Title: Coursework 2

Submission Deadline: 03/Apr/2025

Final Word Count: N/A

DTS208TC Data Analytics and Visualisation

Coursework 2

Submission deadline: 11:59pm, 03/Apr/2025

Percentage in final mark: 50%

Learning outcomes assessed: C. Select appropriate data analysis and visualisation methods to highlight particular features for a given data type and a set of analysis objectives or user requirements.

Late policy: 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.
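As a worked illustration of the late policy arithmetic, here is a minimal sketch. The function name `late_penalty` and the default of 100 total marks are assumptions made for this example; the brief does not say what happens after five working days, so the sketch returns `None` in that case rather than guessing.

```python
def late_penalty(raw_mark, working_days_late, total_marks=100.0):
    """Return the mark after the late deduction described in the brief.

    Hypothetical helper: 5% of the total marks available is deducted per
    working day late. Returns None beyond five working days, because the
    brief does not state the outcome in that case.
    """
    if working_days_late < 0:
        raise ValueError("working_days_late must be non-negative")
    if working_days_late > 5:
        return None
    deduction = total_marks * 5 * working_days_late / 100
    # Assumption: a mark cannot drop below zero.
    return max(raw_mark - deduction, 0.0)


print(late_penalty(68, 2))  # 58.0
```

Under these assumptions, a raw mark of 68 submitted two working days late loses 10 marks.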

Risks:

•    Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in loss of marks.

•    Plagiarism results in a mark of ZERO.

•    The formal procedure for submitting coursework at XJTLU is strictly followed. The submission link on Learning Mall will be provided in due course. The submission timestamp on Learning Mall will be used to check for late submission.

•    The Academic Integrity Policy is strictly followed.

Overview

In this individual coursework, you will use Python to analyse air quality data from the United States across multiple years and states, focusing specifically on California. The dataset includes various air quality metrics, population estimates, and yearly statistics from 2000 to 2022. Your task is to explore trends, identify patterns, and predict California’s Median AQI for 2022 using both data visualization and machine learning. Additionally, you will analyse and compare the predictions from these two methods.

Dataset

The dataset AQI By State 2000-2022 contains the following columns:

Geo_Loc - Geographic location identifier.

Year - The year of the observation.

State - The name of the U.S. state.

Pop_Est - Estimated population for the year and state.

Dys_Blw_Thr - Number of days with air quality below a specific threshold.

Dys_Abv_Thr - Number of days with air quality above a specific threshold.

Good Days - Number of days classified as “Good” air quality.

Moderate Days - Number of days classified as “Moderate” air quality.

Unhealthy for Sensitive Groups Days - Number of days classified as “Unhealthy for Sensitive Groups” air quality.

Unhealthy Days - Number of days classified as “Unhealthy” air quality.

Very Unhealthy Days - Number of days classified as “Very Unhealthy” air quality.

Hazardous Days - Number of days categorized as “Hazardous”.

Max AQI - Maximum Air Quality Index recorded in a year.

Median AQI - Median Air Quality Index recorded in a year.
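The column list above can be exercised with a short loading sketch. The rows below are made-up placeholder values, not taken from the real dataset, and the `Geo_Loc` values are hypothetical because the brief does not describe their format. With the real file, `pd.read_csv` would be pointed at the file path instead of the in-memory string.

```python
import io

import pandas as pd

# Made-up illustrative rows using the documented column names.
SAMPLE_CSV = """Geo_Loc,Year,State,Pop_Est,Dys_Blw_Thr,Dys_Abv_Thr,Good Days,Moderate Days,Unhealthy for Sensitive Groups Days,Unhealthy Days,Very Unhealthy Days,Hazardous Days,Max AQI,Median AQI
loc_ca,2000,California,33900000,300,66,100,180,44,30,10,2,250,70
loc_ca,2021,California,39100000,320,45,140,170,35,15,4,1,220,62
loc_ca,2022,California,39000000,325,40,150,165,32,14,3,1,210,60
loc_tx,2022,Texas,30000000,340,25,220,120,18,5,1,1,160,45
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Keep California only, ordered by year.
california = (
    df[df["State"] == "California"].sort_values("Year").reset_index(drop=True)
)

# 2000-2021 is used for fitting; the 2022 row is held back for comparison.
train = california[california["Year"] <= 2021]
target_2022 = california[california["Year"] == 2022]

print(train[["Year", "Median AQI"]])
```

Several column names contain spaces, so bracket indexing such as `df["Median AQI"]` is needed rather than attribute access.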

Submission and Requirements

You are required to submit the following files as part of your coursework:

1.      Task-Specific Python Files:

•       Each task must be implemented in a separate Python script file.

•       The files should be named task1.py and task2.py.

•       Your code needs to include appropriate comments and be well-documented.

2. Report:

•       Complete the provided CW2_Report.docx

•       Please include all source code and results in the report.

•       Ensure that any non-obvious parts of your implementation are explained clearly in the report.

•       The report should be submitted in .pdf format.

3.      Other

•       The original dataset

Tasks

Given the dataset, you are expected to complete the following tasks using the Python programming language. You are allowed to use existing Python libraries to solve the tasks.

T1. Nationwide Visualisation of Air Quality (45 marks)

Explore how air quality has changed across the U.S. over time and analyse its geographic distribution, focusing on patterns and regional differences. Based on the given requirements, you need to select the most appropriate visualization designs (e.g., marks, channels, etc.).

T1-1: Create a visualization showing the trends of Max AQI for all states from 2000 to 2022.

T1-2: Create a visualization showing the distribution of Max AQI by different states for year 2022.

T1-3: Create a visualization showing the distribution of air quality days (Good Days, Moderate Days, Unhealthy Days, Very Unhealthy Days and Hazardous Days) in California for the year 2000.

T1-4: Describe the design of T1-1, T1-2 and T1-3. Please fill in the required information in the report.
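One possible starting point for the three T1 plots is sketched below with pandas and matplotlib. The chart types (line chart, sorted bar chart, category bar chart) are assumptions chosen for illustration; the brief leaves the design choice open, and other designs such as a map may suit T1-2 better. The data frame holds made-up values for two states only, not values from the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values, NOT taken from the real dataset.
df = pd.DataFrame({
    "Year": [2000, 2000, 2022, 2022],
    "State": ["California", "Texas", "California", "Texas"],
    "Max AQI": [250, 180, 210, 160],
    "Good Days": [100, 200, 150, 220],
    "Moderate Days": [180, 140, 170, 130],
    "Unhealthy Days": [40, 10, 20, 5],
    "Very Unhealthy Days": [10, 2, 5, 1],
    "Hazardous Days": [2, 0, 1, 0],
})

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

# T1-1: one line per state, year on the x-axis.
trend = df.pivot(index="Year", columns="State", values="Max AQI")
trend.plot(ax=ax1, marker="o")
ax1.set_ylabel("Max AQI")
ax1.set_title("Max AQI by state over time")

# T1-2: states ranked by Max AQI for 2022.
latest = df[df["Year"] == 2022].sort_values("Max AQI", ascending=False)
ax2.bar(latest["State"], latest["Max AQI"])
ax2.set_ylabel("Max AQI")
ax2.set_title("Max AQI by state, 2022")

# T1-3: day counts per air quality category for California in 2000.
day_cols = ["Good Days", "Moderate Days", "Unhealthy Days",
            "Very Unhealthy Days", "Hazardous Days"]
ca_2000 = df[(df["State"] == "California") & (df["Year"] == 2000)][day_cols].iloc[0]
ax3.bar(day_cols, ca_2000.to_numpy())
ax3.tick_params(axis="x", rotation=45)
ax3.set_ylabel("Number of days")
ax3.set_title("California air quality days, 2000")

fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk for inclusion in the report.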

T2. Predictive Analysis for California (55 marks)

Focus on California’s air quality data to predict its Median AQI for 2022 using two approaches: visual analysis and model-based prediction.

T2-1: Create 5 data visualisation results to show the relationships between California’s Median AQI and its influencing factors (Year (2000 - 2021), Pop_Est, Good Days, Moderate Days, Unhealthy Days) with suitable designs.

T2-2: Based on the visualisation results, describe the relationship between these influencing factors and the Median AQI. Using these relationships and the 2022 influencing factors data for California, predict the Median AQI for California in 2022 without relying on model training. Justify your prediction.

T2-3: Train a regression model using California’s data from 2000 to 2021. The model should aim to learn the relationships between Median AQI (target variable) and its influencing factors (Year (2000 - 2021), Pop_Est, Good Days, Moderate Days, Unhealthy Days). Choose 2 evaluation metrics to evaluate your model and discuss the performance.

T2-4: Using the trained model and the corresponding factor data for California in 2022, predict the 2022 Median AQI value for California.

T2-5: Compare the results of the visual prediction from T2-2, the model-based prediction from T2-4, and the ground truth in the dataset. Discuss the differences and explain which approach you find more reliable and why.
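The model-based steps in T2-3 to T2-5 can be sketched roughly as follows. This is an illustration under stated assumptions: ordinary least squares via NumPy stands in for whichever regression model is chosen, MAE and R² are picked as the two metrics, and all feature values are synthetic numbers generated for the example rather than California's real data. The metrics here are computed on the training rows; with only 22 rows, a scheme such as leave-one-out validation may give a more honest estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(23)          # offsets for the years 2000..2022
year = 2000 + t

# Synthetic stand-ins for California's columns (made-up values).
pop_est = 34.0e6 + 0.25e6 * t + rng.normal(0, 5e4, t.size)
good = 80 + 3.0 * t + rng.normal(0, 5, t.size)
moderate = 200 - 1.0 * t + rng.normal(0, 5, t.size)
unhealthy = 60 - 1.5 * t + rng.normal(0, 4, t.size)
median_aqi = 70 - 0.10 * good + 0.30 * unhealthy + rng.normal(0, 1, t.size)

X = np.column_stack([year, pop_est, good, moderate, unhealthy]).astype(float)
is_train = year <= 2021
X_train, y_train = X[is_train], median_aqi[is_train]
X_2022, y_2022 = X[~is_train], median_aqi[~is_train]

# Standardise with training statistics only, so 2022 does not leak in.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)


def design(features):
    """Scale the features and prepend an intercept column."""
    scaled = (features - mu) / sigma
    return np.column_stack([np.ones(len(scaled)), scaled])


# T2-3: fit by least squares, then evaluate with two metrics.
coef, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
fitted = design(X_train) @ coef
mae = np.mean(np.abs(y_train - fitted))
r2 = 1.0 - np.sum((y_train - fitted) ** 2) / np.sum((y_train - y_train.mean()) ** 2)

# T2-4: predict 2022 from its factor values.
pred_2022 = float((design(X_2022) @ coef)[0])

# T2-5: compare with the held-out ground truth.
abs_error_2022 = abs(pred_2022 - float(y_2022[0]))

print(f"MAE={mae:.2f}  R2={r2:.3f}  pred={pred_2022:.1f}  |error|={abs_error_2022:.2f}")
```

The visual estimate from T2-2 can be compared against the ground truth in the same way, by taking its absolute difference from the 2022 value.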


