Project - Predicting Airline Delays with Hadoop

One of the main goals is to use machine learning algorithms to build predictive models with Python packages and data analysis programs. Training on the original datasets is important for building models and evaluating their performance. Finding a good combination of technologies and programming languages is crucial to a successful project.
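
To make this goal concrete, here is a minimal sketch of a predictive model built with scikit-learn. It is trained on synthetic data, not the airline dataset: the two features (departure hour and a hypothetical congestion score) and the rule that generates the delay labels are invented purely so the example runs on its own. Logistic regression is only one possible choice of algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in features (NOT real airline data):
# column 0 = departure hour (0-23), column 1 = hypothetical congestion score.
n = 1000
hour = rng.integers(0, 24, size=n)
congestion = rng.random(n)
X = np.column_stack([hour, congestion])

# Invented labelling rule: later, more congested flights are delayed more often.
p_delay = 1.0 / (1.0 + np.exp(-(0.15 * hour + 3.0 * congestion - 4.0)))
y = (rng.random(n) < p_delay).astype(int)

# Hold out a quarter of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

With the real data, the synthetic arrays would be replaced by features derived from the downloaded files, and the held-out score would be one of the results to discuss in the report.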

Dataset

The data can be downloaded from the Bureau of Transportation Statistics, where it is described in detail. Another link, to more detailed field descriptions, is listed under "Detail" below.

Bureau of Transportation Statistics:

https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp

Detail: https://www.transtats.bts.gov/Fields.asp?Table_ID=236
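
As a starting point, the downloaded CSV can be inspected with Python's standard library. The sketch below assumes the export contains columns named `carrier`, `arr_flights`, and `arr_del15` (arrivals delayed by 15 minutes or more). These names are assumptions and should be checked against the field descriptions linked above. The inline sample rows are made up for illustration only.

```python
import csv
import io

# Made-up sample rows; the column names are assumptions about the BTS export.
SAMPLE_CSV = """year,month,carrier,airport,arr_flights,arr_del15
2016,1,AA,JFK,1000,200
2016,1,AA,LAX,500,50
2016,1,DL,ATL,2000,300
2016,1,DL,JFK,,
"""


def delay_rate_by_carrier(csv_file):
    """Return {carrier: delayed arrivals / total arrivals} from a CSV file object."""
    flights = {}
    delayed = {}
    for row in csv.DictReader(csv_file):
        # Skip rows with missing counts instead of guessing a value.
        if not row["arr_flights"] or not row["arr_del15"]:
            continue
        carrier = row["carrier"]
        flights[carrier] = flights.get(carrier, 0.0) + float(row["arr_flights"])
        delayed[carrier] = delayed.get(carrier, 0.0) + float(row["arr_del15"])
    return {c: delayed[c] / flights[c] for c in flights if flights[c] > 0}


rates = delay_rate_by_carrier(io.StringIO(SAMPLE_CSV))
for carrier, rate in sorted(rates.items()):
    print(f"{carrier}: {rate:.3f}")
```

To run this on a real download, pass an opened file (for example `open(path, newline="")`, where `path` is wherever the CSV was saved) instead of the in-memory sample.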

Possible tools

- Apache Pig - Hadoop

- Python

- scikit-learn
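
If Hadoop is combined with Python, one common route is Hadoop Streaming, where a mapper and a reducer exchange tab-separated key/value lines. The sketch below is a local simulation of that pattern, not a cluster job: the function names, the assumed three-column input layout, and the sample lines are all hypothetical and chosen only for illustration.

```python
from itertools import groupby

# Assumed layout of each input line: carrier,arr_flights,arr_del15
SAMPLE_LINES = [
    "AA,1000,200",
    "DL,2000,300",
    "AA,500,50",
    "bad line",
]


def mapper(lines):
    """Emit 'carrier<TAB>flights<TAB>delayed' for every well-formed line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # ignore malformed records
        carrier, flights, delayed = parts
        try:
            yield f"{carrier}\t{float(flights)}\t{float(delayed)}"
        except ValueError:
            continue  # ignore non-numeric counts (e.g. a header row)


def reducer(sorted_pairs):
    """Sum counts per carrier; the input must already be sorted by key."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for carrier, group in groupby(keyed, key=lambda kv: kv[0]):
        flights = delayed = 0.0
        for _, f, d in group:
            flights += float(f)
            delayed += float(d)
        yield carrier, (delayed / flights if flights else 0.0)


# Local stand-in for the shuffle/sort phase between map and reduce.
shuffled = sorted(mapper(SAMPLE_LINES), key=lambda pair: pair.split("\t")[0])
result = dict(reducer(shuffled))
print(result)
```

On a real cluster the mapper and reducer would typically live in separate scripts that read from standard input and write to standard output, and the framework would perform the sort between them. The per-carrier rates produced this way could then serve as features for a scikit-learn model.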

Report

The report should briefly cover the following topics:

- Problem Definition: What problem are you trying to solve? What are the challenges of this problem?

- Methodology: What is your methodology for attacking the problem and its associated challenges? What are the computational and space complexities of your solution in terms of input size?

- Results and Discussion: What are the outcomes of the project?

- Guideline: Briefly explain which code was used for which task.

Note that your report should not exceed 8 pages.

