INF 510 Homework 6 (60 points)
Due Thursday, April 25th at 11:59pm (via Blackboard)
AKA “Project” Milestone #2
In this assignment, we’re taking the raw data we obtained in HW5 and building a data model for
it. This can be anything you like (for example: SQL relationships, a class hierarchy, pandas
DataFrames, SQLAlchemy, etc.; this list is not exhaustive!). You have the freedom to interface
with your data however you’d like, but keep in mind that, however simple you think the data is,
your solution will be graded on how useful, extensible, modular, and robust it is. Better
solutions get better scores!
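As one illustration only, here is a minimal sketch of a data model built on Python’s built-in sqlite3 module, with the SQL hidden behind a small class so the rest of the script never writes queries directly. The table name locations, its columns and the class name LocationStore are hypothetical, loosely matching the lat/long example in this assignment; your schema should follow your own HW5 data.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, float, float]  # (name, lat, lon)

# Hypothetical schema: replace these columns with the fields from your HW5 data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS locations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    lat  REAL NOT NULL,
    lon  REAL NOT NULL
)
"""
SORTABLE = {"name", "lat", "lon"}


class LocationStore:
    """Hides the SQL behind a small interface the rest of the script can call."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" is handy for tests; pass a filename to persist to disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def add_many(self, rows: Iterable[Row]) -> None:
        # Parameterized inserts avoid quoting bugs and SQL injection.
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.executemany(
                "INSERT INTO locations (name, lat, lon) VALUES (?, ?, ?)", rows
            )

    def find(self, min_lat: Optional[float] = None, order_by: str = "name") -> List[Row]:
        # Column names cannot be bound as parameters, so whitelist them instead.
        if order_by not in SORTABLE:
            raise ValueError(f"cannot sort by {order_by!r}")
        sql = "SELECT name, lat, lon FROM locations"
        params: List[float] = []
        if min_lat is not None:
            sql += " WHERE lat >= ?"  # only a subset of the data
            params.append(min_lat)
        sql += f" ORDER BY {order_by}"  # re-ordering the data
        return self.conn.execute(sql, params).fetchall()
```

For example, after store = LocationStore() and store.add_many([("Los Angeles", 34.05, -118.24), ("Seattle", 47.61, -122.33)]), calling store.find(min_lat=40) returns only the Seattle row. Passing a file path instead of ":memory:" keeps the data on disk between runs.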
You are to turn in your Python code for your project so far, including the code you wrote in
HW5 (i.e., this new code should integrate with the old code). You can turn in any number of
supporting files (libraries, modules, etc.), but you must follow the same format as before:
Name your script: LASTNAME_FIRSTNAME_hw6.py (you will LOSE points if you don’t do this!)
Your script should be modular in that it allows you to obtain the data from the scraper/API (as
in HW5) but also from local storage. How you implement this (text files, CSV, cached webpages,
SQL files, Feather-serialized DataFrames, etc.) is up to you. There should be a
--source=remote or --source=local command-line parameter (remember the lecture on args and
kwargs!)
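One way to wire up the --source parameter is Python’s standard argparse module. In this sketch, fetch_remote() stands in for your HW5 scraper/API code and CACHE_PATH for your local storage; both are hypothetical placeholders, and a JSON file is only one of the storage options listed above.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional

# Hypothetical cache location; CSV, SQLite or Feather files would work as well.
CACHE_PATH = Path("hw6_cache.json")


def fetch_remote() -> List[dict]:
    """Placeholder for the HW5 scraper/API code; replace with your own."""
    return [{"name": "Los Angeles", "lat": 34.05, "lon": -118.24}]


def load_data(source: str, cache: Path = CACHE_PATH) -> List[dict]:
    """Return raw records from the remote source or from the local cache."""
    if source == "remote":
        records = fetch_remote()
        # Save a copy so later runs can use --source=local without network access.
        cache.write_text(json.dumps(records))
        return records
    if not cache.exists():
        raise FileNotFoundError(f"{cache} not found; run once with --source=remote")
    return json.loads(cache.read_text())


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="HW6: load data into the data model")
    # argparse accepts both "--source=local" and "--source local".
    parser.add_argument("--source", choices=["remote", "local"], required=True,
                        help="fetch from the scraper/API or read local storage")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    records = load_data(args.source)
    print(f"Loaded {len(records)} record(s) from the {args.source} source")
```

In LASTNAME_FIRSTNAME_hw6.py you would call main() under an if __name__ == "__main__": guard, run the script once with --source=remote to populate the cache, and use --source=local afterwards. Because choices is set, argparse rejects any other value with a usage message instead of failing later.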
When invoked, your Python script should grab the data (either locally or remotely), stick it into
your data model, and then retrieve it and manipulate it in some way. How you do this is up to
you; just imagine performing one of the computations you’ll end up doing for the final project.
For example, if your data sources were, say, lat/long combinations, a Google API and voting
records, you might grab the lat/long, ask the Google API for the closest city, and then get the
voting records for that city. You’d display a “result” (just one!). [You’ll save the “final”
result/conclusion for the last part of the project.]
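The grab, store, retrieve and manipulate steps for the lat/long example above might be sketched as follows. nearest_city() is a hypothetical stand-in for the Google API call (it does no real geocoding), and the table layouts and sample rows are invented for illustration; only the overall flow is meant to carry over.

```python
import sqlite3
from typing import List, Optional, Tuple

Point = Tuple[float, float]        # (lat, lon)
VoteRow = Tuple[str, str, int]     # (city, candidate, total votes)


def nearest_city(lat: float, lon: float) -> str:
    """Hypothetical stand-in for a reverse-geocoding API call."""
    return "Los Angeles" if lon < -100 else "New York"


def build_model(points: List[Point], votes: List[VoteRow]) -> sqlite3.Connection:
    """Store the raw data plus the derived city in an in-memory SQLite model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, lat REAL, lon REAL, city TEXT)")
    conn.execute("CREATE TABLE votes (city TEXT, candidate TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO points (lat, lon, city) VALUES (?, ?, ?)",
        [(lat, lon, nearest_city(lat, lon)) for lat, lon in points],
    )
    conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", votes)
    return conn


def leading_candidate(conn: sqlite3.Connection, point_id: int) -> Optional[Tuple[str, str]]:
    """One 'result': the candidate with the most votes in the point's nearest city."""
    return conn.execute(
        """
        SELECT p.city, v.candidate
        FROM points AS p JOIN votes AS v ON v.city = p.city
        WHERE p.id = ?
        ORDER BY v.total DESC
        LIMIT 1
        """,
        (point_id,),
    ).fetchone()
```

For example, with a single point at (34.05, -118.24) and votes [("Los Angeles", "A", 10), ("Los Angeles", "B", 25)], leading_candidate(conn, 1) returns ("Los Angeles", "B"), which would be the single result the script prints. A point whose city has no voting records yields None.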
In addition, you should turn in a plain text file named LASTNAME_FIRSTNAME_hw6.txt (NO
DOC, PDF, OR ANYTHING ELSE) that answers the following questions:
1. What are the strengths of your data modeling format?
2. What are the weaknesses? (Does your data model support sorting the information?
Re-ordering it? Obtaining only a certain subset of the information?)
3. How do you store your data on disk?
4. Let’s say you find another data source that relates to all 3 of your data sources (i.e., a
data source that relates to your existing data). How would you extend your model to
include this new data source? How would that change the interface?
5. How would you add a new attribute to your data? (e.g., imagine you had a lat/long
column in a database. You might use it to query an API for a city name. How
would you add the city name to your data?)
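For question 5, one possible pattern with a SQL-backed model is a small migration: add the column with ALTER TABLE, then backfill it from the existing lat/long values. The sketch assumes a hypothetical locations table with id, lat and lon columns, and takes the API call as a lookup_city function argument so it can be swapped out or faked in tests.

```python
import sqlite3
from typing import Callable

CityLookup = Callable[[float, float], str]  # e.g. a wrapper around a geocoding API


def add_city_column(conn: sqlite3.Connection, lookup_city: CityLookup) -> int:
    """Add a nullable 'city' column to locations and fill it in; returns rows updated."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(locations)")}
    if "city" not in columns:  # lets the migration run more than once safely
        conn.execute("ALTER TABLE locations ADD COLUMN city TEXT")
    # Only look up rows that do not have a city yet.
    pending = conn.execute(
        "SELECT id, lat, lon FROM locations WHERE city IS NULL"
    ).fetchall()
    with conn:  # commit all updates together
        conn.executemany(
            "UPDATE locations SET city = ? WHERE id = ?",
            [(lookup_city(lat, lon), row_id) for row_id, lat, lon in pending],
        )
    return len(pending)
```

Because only rows whose city IS NULL are looked up, re-running the function after new rows arrive makes one lookup per new row rather than one per row in the table.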
The rubric for HW6 is as follows:
Python Code
o Code is modular and robust: /20
o Code displays a result: /5
o Remote and local source command line parameter: /5
o Code is poorly documented: -5
o Runtime error: -5
Data Model
o Question 1: /3
o Question 2: /5
o Question 3: /2
o Question 4: /10
o Question 5: /10