Homework 6
1. From the Indiegogo dataset (https://webrobots.io/indiegogo-dataset/), download at least 5 JSON
(or CSV) files. Use the content of the “tagline” or “title” field from the downloaded files.
2. Extract the article title from your downloaded dataset and use “bag of words” to convert each
article title into a set of words (3 points).
3. Use Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA) to cluster the titles (4
points). You need to research online how to use LSA and LDA and how they work. We did not cover
them in class, so that you do the research on your own.
4. Visualize the article clustering results in a dendrogram or heatmap (3 points). Please be sure
to show a clean dendrogram, not a sloppy one. Delivering an imbalanced, unclear dendrogram, or a
heatmap with text that is too small, will reduce your grade. You might need to annotate your
dendrogram by hand and add information to it. Please make the dendrogram nice and clean, with
each line representing one thing.
Please make the visualization nice by adjusting more of its settings. Spend time creating a
proper visualization for your experiment. At this point, you should be able to create a proper
visualization without relying on default font settings.
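
The steps above can be sketched roughly as follows. This is only a minimal illustration under
stated assumptions, not the required solution: the titles are made-up placeholders for the
“title” field of the downloaded files, the function names (tokenize, bag_of_words, lsa,
plot_dendrogram) are hypothetical, and LSA is approximated here with a plain NumPy SVD of the raw
count matrix. scikit-learn's CountVectorizer and TruncatedSVD are a common library alternative.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical titles standing in for the "title" field of the downloaded files.
titles = [
    "Smart solar backpack for travel",
    "Solar charger for travel and camping",
    "Indie film about city life",
    "Short documentary film about life",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one row per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[w] for w in vocab] for c in counts], dtype=float)
    return vocab, matrix


def lsa(matrix, n_topics=2):
    """Project documents onto the top singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_topics] * s[:n_topics]


def plot_dendrogram(doc_vectors, labels, path="dendrogram.png"):
    """Cluster hierarchically and draw a dendrogram with explicit font settings."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    tree = linkage(doc_vectors, method="ward")
    fig, ax = plt.subplots(figsize=(8, 4))
    dendrogram(tree, labels=labels, orientation="right", leaf_font_size=10, ax=ax)
    ax.set_xlabel("Ward distance", fontsize=12)
    ax.set_title("Campaign titles clustered in LSA space", fontsize=14)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)


vocab, X = bag_of_words(titles)       # step 2: bag of words
doc_vectors = lsa(X, n_topics=2)      # step 3: LSA document vectors
```

Calling plot_dendrogram(doc_vectors, titles) would then write the figure to a file (this needs
SciPy and Matplotlib). With real data you would likely also remove stop words and choose the
number of topics and the linkage method deliberately rather than use the values assumed here.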
You need to prepare a report on your tasks and findings, along with a video file describing what
you have done. You can copy and paste your code, its results, and your description into a Word
document, a Python notebook, or an R notebook.
The deadline for delivering this homework is posted online on Blackboard. Please send
your questions to the TA and, if required, to the RA.