Tutoring for international students: PCA, Python programming, Jupyter

2018.12.18

Requirement

1. Pick ONE (or more, if you like) of the challenges below. If you would like to work on a different problem outside the candidates we propose, please email the course instructors with your proposal. Brave hearts for exploration will be encouraged!

2. Team work: we encourage you to form small teams, of up to THREE persons per group, to work on the same problem. Each team submits just ONE report, with a clear remark on each person’s contribution. The report may be a Python (Jupyter) notebook with detailed documentation, a poster (see the appendix), or a technical report of at most 8 pages, e.g. in NIPS conference style: https://nips.cc/Conferences/2016/PaperInformation/StyleFiles.

In the report, state the scientific questions you propose to explore and present your main results, with a careful analysis showing how the results answer those questions. Remember: scientific analysis and reasoning are more important than performance tables alone. Separate source code may be submitted by email as a .zip file, as a GitHub link, or as an appendix if it is not large. There is no restriction on the programming language, but Python is recommended.

Project List:

1. Transfer Learning

You are required to do transfer learning on at least one of the datasets below. The following procedure is for your reference.

Feature extraction by pre-trained deep neural networks, e.g. VGG19, and resnet18, etc.;

Visualize these features using classical unsupervised learning methods, e.g. PCA, clustering, etc.;

Image classifications using traditional supervised learning methods based on the features extracted, e.g. LDA, logistic regression, SVM, random forests, etc.;

(Optional) Train the last layer or fine-tune the deep neural network of your choice, which may need GPUs to speed up;

Compare the results you obtained and give your own analysis explaining the phenomena you observe. Candidate datasets are given in 1.1 and 1.2.
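As a rough sketch of the visualization step in the procedure above, the snippet below projects a feature matrix onto its top two principal components using NumPy's SVD. The `features` array is random stand-in data, not real VGG19 or resnet18 output; the function name `pca_project` and the 512-dimensional feature size are assumptions made for illustration only.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    # PCA is defined on mean-centered data, so center each feature dimension.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of the total variance captured by each kept component.
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return projected, explained


# Stand-in for features extracted by a pre-trained network (hypothetical shape).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))
projected, explained = pca_project(features)
print(projected.shape, explained)
```

With real features, the two columns of `projected` could be drawn as a scatter plot colored by class label, and the same feature matrix could be passed to a classifier such as logistic regression or an SVM.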

1.1 MNIST dataset – a Warmup

Yann LeCun’s website contains the original MNIST dataset of 60,000 training images and 10,000 test images.

http://yann.lecun.com/exdb/mnist/

There are various ways to download and parse MNIST files. For example, Python users may refer to the following website:

https://github.com/datapythonista/mnist

or the MXNet tutorial on MNIST:

https://mxnet.incubator.apache.org/tutorials/python/mnist.html
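For those who prefer to parse the files directly, here is a minimal sketch using only the Python standard library. It assumes the gzip-compressed IDX files have already been downloaded locally; the function names are illustrative and are not taken from either resource above.

```python
import gzip
import struct


def parse_idx_images(raw):
    """Parse IDX image bytes into (count, rows, cols, flat pixel bytes)."""
    # Header: four big-endian unsigned 32-bit ints; magic 2051 marks images.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    # Pixels follow as one unsigned byte each, image by image, row by row.
    pixels = raw[16:16 + count * rows * cols]
    return count, rows, cols, pixels


def parse_idx_labels(raw):
    """Parse IDX label bytes into a list of integer class labels."""
    # Header: two big-endian unsigned 32-bit ints; magic 2049 marks labels.
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"unexpected label magic number: {magic}")
    return list(raw[8:8 + count])


def read_gz(path):
    """Read and decompress a .gz file into bytes."""
    with gzip.open(path, "rb") as f:
        return f.read()
```

For example, `parse_idx_images(read_gz("train-images-idx3-ubyte.gz"))` should report 60,000 images of 28 by 28 pixels if the file is the standard training set.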

1.2 Fashion-MNIST dataset

Zalando’s Fashion-MNIST dataset contains 60,000 training images and 10,000 test images, each a 28-by-28 grayscale image.

https://github.com/zalandoresearch/fashion-mnist

Reminder: Be careful when reporting your test error. For example, you cannot tune parameters (in shallow learning or fine-tuning) to minimize your leave-one-out error and then report that error as the test error estimate. In this problem it is easy to find hyperparameters that overfit because the dataset is small. Even if you augment the training set, batch effects make the augmented crops from the same painting similar in feature space, so finding hyperparameters that overfit is basically as easy as before.
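One common way to follow this reminder is to hold out a validation subset of the training images, tune on that subset only, and evaluate on the provided test images once at the end. The sketch below shows the idea with the standard library; `train_val_split`, `select_on_validation`, and the 10% validation fraction are illustrative assumptions, not a prescribed protocol.

```python
import random


def train_val_split(n_samples, val_fraction=0.1, seed=0):
    """Split training-set indices into disjoint train and validation parts."""
    indices = list(range(n_samples))
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


def select_on_validation(candidates, validation_error):
    """Return the candidate hyperparameter with the lowest validation error."""
    # `validation_error` is a caller-supplied function: candidate -> error.
    return min(candidates, key=validation_error)


# Split the 60,000 training images; the 10,000 test images stay untouched
# until the final hyperparameter has been chosen.
train_idx, val_idx = train_val_split(60000)
print(len(train_idx), len(val_idx))
```

The error reported as the test error should then come from a single evaluation on the test images with the selected hyperparameter.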

2. From Project Warmup: Kaggle contest classification: Predict survival on the Titanic

The following website contains the Kaggle contest on predicting survival (binary classification) on the Titanic:

https://www.kaggle.com/c/titanic/

Register on Kaggle and join the contest by submitting your predictions. Report your methods and the corresponding scores (accuracy) on the leaderboard, together with your registered name and ranking.
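A submission is a CSV file of predictions. The sketch below writes one with the standard library, assuming the two-column layout `PassengerId,Survived` used by the competition's sample submission; please confirm the exact format on the contest page. The function name and the example IDs are illustrative.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out):
    """Write a header row plus one (id, 0/1 prediction) row per passenger."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, prediction in zip(passenger_ids, predictions):
        # Cast to int so booleans or floats such as 1.0 are written as 0/1.
        writer.writerow([passenger_id, int(prediction)])


# Example with made-up predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
print(buffer.getvalue())
```

To produce a file for upload, pass an open file instead of the buffer, e.g. `open("submission.csv", "w", newline="")`.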

3. From Project Warmup: Kaggle contest regression: Predict house sales prices

The following website contains a Kaggle contest on predicting house sales prices (regression) using the Ames Housing dataset:

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/

It is intended for practicing feature engineering, random forests, gradient boosting, etc. Register on Kaggle and join the contest by submitting your predictions. Report your methods and the corresponding scores (RMSE) on the leaderboard, together with your registered name and ranking.
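To estimate the score locally before submitting, RMSE can be computed as in the sketch below. As far as we recall, the contest scores RMSE between the logarithms of the predicted and observed prices, so a log variant is included; please check the contest's evaluation page to confirm. The function names and the example prices are illustrative.

```python
import math


def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def log_rmse(predicted_prices, actual_prices):
    """RMSE computed on the natural logarithm of (positive) prices."""
    return rmse([math.log(p) for p in predicted_prices],
                [math.log(a) for a in actual_prices])


# Made-up prices for illustration only.
print(log_rmse([200000.0, 150000.0], [210000.0, 140000.0]))
```

Working on the log scale means that a given relative error counts the same for cheap and expensive houses.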