Project - Predicting Airline Delays with Hadoop
One of the main goals is to use machine learning algorithms to build predictive
models with Python packages and data analysis programs. Training on the original
datasets is important both for building the models and for assessing their
performance. Finding a good combination of technologies and programming
languages is crucial to a successful project.
Dataset
The data can be downloaded from the Bureau of Transportation Statistics,
where it is described in detail. A second link leads to more detailed data.
Both links are listed below.
Bureau of Transportation Statistics:
https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
Detail: https://www.transtats.bts.gov/Fields.asp?Table_ID=236
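Before any model can be trained, the raw records need a target variable. The sketch below shows one possible way to parse CSV rows and attach a binary "delayed" label. The column names (CARRIER, ORIGIN, DEST, ARR_DELAY), the sample rows, and the 15-minute cutoff are all assumptions made for illustration. They are not taken from the brief, so check them against the field descriptions linked above.

```python
import csv
import io

# Hypothetical column names and made-up rows, for illustration only.
# The real schema should be checked against the field list linked above.
SAMPLE_CSV = """CARRIER,ORIGIN,DEST,ARR_DELAY
AA,JFK,LAX,22
AA,JFK,LAX,-5
DL,ATL,ORD,
UA,SFO,SEA,15
"""

DELAY_THRESHOLD_MINUTES = 15  # assumed cutoff for calling a flight "delayed"


def load_labeled_rows(text, threshold=DELAY_THRESHOLD_MINUTES):
    """Parse CSV text and attach a binary 'delayed' label to each usable row."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        raw = (record.get("ARR_DELAY") or "").strip()
        if not raw:
            # No arrival delay recorded (for example a cancelled flight): skip it.
            continue
        delay = float(raw)
        rows.append({
            "carrier": record["CARRIER"],
            "origin": record["ORIGIN"],
            "dest": record["DEST"],
            "arr_delay": delay,
            "delayed": int(delay >= threshold),
        })
    return rows


if __name__ == "__main__":
    for row in load_labeled_rows(SAMPLE_CSV):
        print(row)
```

Skipping rows without an arrival delay is only one option. Depending on the problem definition, such flights could instead be treated as a separate class.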
Possible tools
- Apache Pig - Hadoop
- Python
- scikit-learn
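One way Hadoop and Python can be combined is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines and write tab-separated key/value pairs. The brief does not prescribe this approach. The following is a minimal sketch under stated assumptions: the input format `carrier,arr_delay`, the 15-minute cutoff, and the helper `run_locally` are hypothetical, and the shuffle/sort phase is simulated with `sorted` so the logic can be checked without a cluster.

```python
from itertools import groupby

DELAY_THRESHOLD_MINUTES = 15  # assumed cutoff for calling a flight "delayed"


def mapper(lines):
    """Emit 'carrier<TAB>flag' for each line of the assumed form 'carrier,arr_delay'."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[1]:
            continue  # malformed line or missing delay
        try:
            delay = float(parts[1])
        except ValueError:
            continue  # non-numeric value, such as a header row
        flag = 1 if delay >= DELAY_THRESHOLD_MINUTES else 0
        yield f"{parts[0]}\t{flag}"


def reducer(sorted_lines):
    """Yield (carrier, delayed_count, total_count); input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for carrier, group in groupby(pairs, key=lambda kv: kv[0]):
        flags = [int(flag) for _, flag in group]
        yield carrier, sum(flags), len(flags)


def run_locally(lines):
    """Hypothetical helper: stand in for the shuffle/sort step with sorted()."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["carrier,arr_delay", "AA,22", "DL,3", "AA,-5", "DL,40", "AA,15"]
    for carrier, delayed, total in run_locally(sample):
        print(carrier, delayed, total)
```

In an actual streaming job the two functions would live in separate scripts that read from standard input and print to standard output. Per-carrier counts like these could then serve as features for a scikit-learn model.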
Report
The report should briefly cover the following topics:
- Problem Definition: What problem are you trying to solve? What are the
challenges of this problem?
- Methodology: What is your methodology for attacking the problem and its associated
challenges? What are the time and space complexity of your solution in terms of
input size?
- Results and Discussion: What are the outcomes of the project?
- Guideline: Briefly explain which code was used for which task.
Note that your report should not exceed 8 pages.