COMP 8042程序辅导、C/C++,Python编程讲解、辅导Java语言编程 辅导Database|讲解Python程序

- 首页 >> Java编程
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
All work should be done individually.
Geographic Information System
Geographic information systems organize information pertaining to geographic features and
provide various kinds of access to the information. A geographic feature may possess many
attributes (see below). In particular, a geographic feature has a specific location. There
are a number of ways to specify location. For this project, we will use latitude and longitude,
which will allow us to deal with geographic features at any location on Earth. A
reasonably detailed tutorial on latitude and longitude can be found in the Wikipedia at
http://en.wikipedia.org/wiki/Latitude and http://en.wikipedia.org/wiki/Longitude.
The GIS record files were obtained from the website for the USGS Board on Geographic
Names (http://geonames.usgs.gov). The file begins with a descriptive header line, followed
by a sequence of GIS records, one per line, which contain the fields provided in Table 1 in
Appendix in the indicated order.
Notes:
• See https://geonames.usgs.gov/docs/pubs/Nat State Topic File formats.pdf for the field
descriptions.
• The type specifications used here have been modified from the source (URL above) to
better reflect the realities of your programming environment.
• Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W)
format, or DEC (real number, -82.1417975) format. In DMS format, latitude will always
be expressed using 6 digits followed by a single character specifying the hemisphere, and
longitude will always be expressed using 7 digits1
followed by a hemisphere designator
(for more information look at [This Link]).
• Although some fields are mandatory, some may be omitted altogether. Best practice is
to treat every field as if it may be left unspecified. Certain fields are necessary in order
to index a record: the feature name and the primary latitude and primary longitude.
If a record omits any of those fields, you may discard the record, or index it as far as
possible.
In the GIS record file, each record will occur on a single line, and the fields will be separated
by pipe (‘|’) symbols. Empty fields will be indicated by a pair of pipe symbols with no
characters between them. See the posted VA Monterey.txt file for many examples.
1While latitude lines range between -90 and +90 degrees, longitude coordinates are between -180 and
+180 degrees.
1
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
GIS record files are guaranteed to conform to this syntax, so there is no explicit requirement
that you validate the files. On the other hand, some error-checking during parsing may help
you detect errors in your parsing logic.
The file can be thought of as a sequence of bytes, each at a unique offset from the beginning
of the file, just like the cells of an array. So, each GIS record begins at a unique offset from
the beginning of the file.
Line Termination
Each line of a text file ends with a particular marker (known as the line terminator). In
MS-DOS/Windows file systems, the line terminator is a sequence of two ASCII characters
(CR + LF, 0X0D0A). In Unix systems, the line terminator is a single ASCII character (LF).
Other systems may use other line termination conventions.
Why should you care? Which line termination is used has an effect on the file offsets for all
but the first record in the data file. As long as we’re all testing with files that use the same
line termination, we should all get the same file offsets. But if you change the file format (of
the posted data files) to use different line termination, you will get different file offsets than
are shown in the posted log file(s). Most good text editors will tell you what line termination
is used in an opened file, and also let you change the line termination scheme.
In Figure 1, note that some record fields are optional, and that when there is no given value
for a field, there are still delimiter symbols for it.
Also, some of the lines are “wrapped” to fit into the text box; lines are never “wrapped” in
the actual data files.
Figure 1: Sample Geographic Data Records
2
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Project Description
You will implement a system that indexes and provides search features for a file of GIS
records, as described above.
Your system will build and maintain several in-memory index data structures to support
these operations:
• Importing new GIS records into the database file
• Retrieving data for all GIS records matching given geographic coordinates
• Retrieving data for all GIS records matching a given feature name and state
• Retrieving data for all GIS records that fall within a given (rectangular) geographic
region
• Displaying the in-memory indices in a human-readable manner
You will implement a single software system in C++ to perform all system functions.
Program Invocation
The program will take the names of three files from the command line, like this:
./GIS
Note that this implies your main class must be named GIS, and be able to be compiled
simply using a g++ compile command. Preferably, you are encouraged to create make files
for the project and provide the required dependency files in your submission.
The database file should be created as an empty file; note that the specified database file may
already exist, in which case the existing file should be truncated or deleted and recreated.
If the command script file is not found the program should write an error message to the
console and exit. The log file should be rewritten every time the program is run, so if the file
already exists it should be truncated or deleted and recreated.
3
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
System Overview
The system will create and maintain a GIS database file that contains all the records that
are imported as the program runs. The GIS database file will be empty initially. All the
indexing of records will be done relative to this file.
There is no guarantee that the GIS record file will not contain two or more distinct records
that have the same geographic coordinates. In fact, this is natural since the coordinates are
expressed in the usual DMS system. So, we cannot treat geographic coordinates as a primary
(unique) key.
The GIS records will be indexed by the Feature Name and State (abbreviation) fields. This
name index will support finding offsets of GIS records that match a given feature name and
state abbreviation.
The GIS records will also be indexed by geographic coordinate. This coordinate index will
support finding offsets of GIS records that match a given primary latitude and primary
longitude.
The system will include a buffer pool, as a front end for the GIS database file, to improve
search speed. See the discussion of the buffer pool below for detailed requirements. When
performing searches, retrieving a GIS record from the database file must be managed through
the buffer pool. During an import operation, when records are written to the database file,
the buffer pool will be bypassed, since the buffer pool would not improve performance during
imports.
When searches are performed, complete GIS records will be retrieved from the GIS database
file that your program maintains. The only complete GIS records that are stored in memory
at any time are those that have just been retrieved to satisfy the current search, or individual
GIS records created while importing data or GIS records stored in the buffer pool.
Aside from where specific data structures are required, you may use any suitable STL library
implementation you like (but not other libraries).
Each index should have the ability to write a nicely-formatted display of itself to an output
stream.
Name Index Internals
The name index will use a hash table for its physical organization2
. Each hash table entry
will store a feature name and state abbreviation (separately or concatenated, as you like)
and the file offset(s) of the matching record(s). Since each GIS record occupies one line in
the file, it is a trivial matter to locate and read a record given nothing but the file offset at
which the record begins.
2You are not allowed to use std::map here. You may use and modify any of the lab provided starter
codes as your own implementation of a hash table.
4
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Your table will use quadratic probing to resolve collisions, with the quadratic function (n
2+n)
2
to compute the step size. The hash table must use a contiguous physical structure (array).
The initial size of the table will be 1024, and the table will resize itself automatically, by
doubling its size whenever the table becomes 70% full. Since the specified table sizes given
above are powers of 2, an empty slot will always be found unless the table is full.
You can use your desired hash function (e.g. elfhash), and apply it to the concatenation of
the feature name and state abbreviation field of the data records.
You must be able to display the contents of the hash table in a readable manner.
Coordinate Index Internals
The coordinate index will use a bucket PR quadtree3
for the physical organization. In a bucket
PR quadtree, each leaf stores up to K data objects (for some fixed value of K). Upon insertion,
if the added value would fall into a leaf that is already full, then the region corresponding
to the leaf will be partitioned into quadrants and the K+1 data objects will be inserted into
those quadrants as appropriate. As is the case with the regular PR quadtree, this may lead
to a sequence of partitioning steps, extending the relevant branch of the quadtree by multiple
levels. In this project, K will probably equal 4, but I reserve the right to specify a different
bucket size with little notice, so this should be easy to modify.
The index entries held in the quadtree will store a geographic coordinate and a collection of
the file offsets of the matching GIS records in the database file.
Note: do not confuse the bucket size with any limit on the number of GIS records that
may be associated with a single geographic coordinate. A quadtree node can contain index
objects for up to K different geographic coordinates. Each such index object can contain
references to an unlimited number of different GIS records.
The PR quadtree implementation should follow good design practices, and its interface should
be somewhat similar to that of the BST.
You must be able to display the PR quadtree in a readable manner. The display must clearly
indicate the structure of the tree, the relationships between its nodes, and the data objects
in the leaf nodes (think of this problem as an in-order traversal of tree with four-children).
Buffer Pool Details
The buffer pool for the database file should be capable of buffering up to 15 records, and
will use LRU replacement. You may use any structure you like to organize the pool slots;
however, since the pool will have to deal with record replacements, some structures will be
more efficient (and simpler) to use. You may use any classes from STL library you think are
appropriate.
3A quadtree is a tree each node of which can have up to four children. A PR Quadtree limits quadtree
nodes to either 4 children or none!
5
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
It is up to you to decide whether your buffer pool stores interpreted or raw data; i.e., whether
the buffer pool stores GIS record objects or just strings.
You must be able to display the contents of the buffer pool, listed from MRU to LRU entry,
in a readable manner. The order in which you retrieve records when servicing a multi-match
search is not specified, so such searches may result in different orderings of the records within
the buffer pool. That is OK.
A Note on Coordinates and Spatial Regions
It is important to remember that there are fundamental differences between the notion that
a geographic feature has specific coordinates (which may be thought of as a point) and the
notion that each node of the PR quadtree corresponds to a particular sub-region of the
coordinate space (which may contain many geographic features).
In this project, coordinates of geographic features are specified as latitude/longitude pairs,
and the minimum resolution is one second of arc. Thus, you may think of the geographic
coordinates as being specified by a pair of integer values.
On the other hand, the boundaries of the sub-regions are determined by performing arithmetic
operations, including division, starting with the values that define the boundaries of the
“world”. Unless the dimensions of the world happen to be powers of 2, this can quickly lead
to regions whose boundaries cannot be expressed exactly as integer values. You may use
floating-point values or integer values to represent region boundaries when computing region
boundaries during splitting and quadtree traversals. If you use integers, be careful not to
unintentionally create “gaps” between regions.
Your implementation should view the boundary between regions as belonging to one of those
regions. The choice of a particular rule for handling this situation is left to you.
When carrying out a region search, you must determine whether the search region overlaps
with the region corresponding to a subtree node before descending into that subtree. Think
about a proper divide and conquer approach to solve the region search problem.
Other System Elements
There should be an overall controller that validates the command line arguments and manages
the initialization of the various system components. The controller should hand off execution
to a command processor that manages retrieving commands from the script file, and making
the necessary calls to other components in order to carry out those commands.
Naturally, there should be a data type that models a GIS record.
There may well be additional system elements, whether data types or data structures, or
system components that are not mentioned here. The fact no additional elements are explicitly
identified here does not imply that you will not be expected to analyze the design issues
carefully, and to perhaps include such elements.
6
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Aside from the command-line interface, there are no specific requirements for interfaces of
any of the classes that will make up your software; it is up to you to analyze the specification
and come up with an appropriate set of classes, and to design their interfaces to facilitate the
necessary interactions. It is probably worth pointing out that an index (e.g., a geographic
coordinate index) should not simply be a naked container object (e.g, quadtree); if that’s not
clear to you, think more carefully about what sort of interface would be appropriate for an
index, as opposed to a container.
Command File
The execution of the program will be driven by a script file. Lines beginning with a semicolon
character (‘;’) are comments and should be ignored. Blank lines are possible. Each line in
the command file consists of a sequence of tokens, which will be separated by single tab
characters. A line terminator will immediately follow the final token on each line. The
command file is guaranteed to conform to this specification, so you do not need to worry
about error-checking when reading it.
The first non-comment line will specify the world boundaries to be used:
world
This will be the first command in the file, and will occur once. It specifies the
boundaries of the coordinate space to be modeled. The four parameters will be
longitude and latitudes expressed in DMS format, representing the vertical and
horizontal boundaries of the coordinate space.
It is certainly possible that the GIS record file will contain records for features
that lie outside the specified coordinate space. Such records should be ignored;
i.e., they will not be indexed.
Each subsequent non-comment line of the command file will specify one of the commands
described below.
One command is used to load records into your database from external files:
import
Add all the valid GIS records in the specified file to the database file. This means
that the records will be appended to the existing database file, and that those
records will be indexed in the manner described earlier. When the import is
completed, log the number of entries added to each index, and the longest probe
sequence that was needed when inserting to the hash table. (A valid record is one
that lies within the specified world boundaries.)
Note: will be a relative address (e.g. ./VA Monterey.txt
will point to the file named VA Monterey.txt which is placed besides the GIS
executable.
7
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Another command requires producing a human-friendly display of the contents of an index
structure:
debug[ quad | hash | pool | world]
Log the contents of the specified index structure in a fashion that makes the
internal structure and contents of the index clear. It is not necessary to be overly
verbose here, but you should include information like key values and file offsets
where appropriate (implementing debugworld command is optional, if you
don’t like to implement it, you can just output that this is an optional feature. It
will help you debug the QuadTree much easier, tho, if you implement it).
Another simply terminates execution, which is handy if you want to process only part of a
command file:
quit
Terminate program execution.
The other commands involve searches of the indexed records. For the following commands,
if a is specified for the command, it will be expressed as a pair
of latitude/longitude values, expressed in the same DMS format that is used in the GIS
record files. In the script files, will show up as coordinate latitude>.
what is at
For every GIS record in the database file that matches the given coordinate>, log the offset at which the record was found, and the feature name,
county name, and state abbreviation. Do not log any other data from the matching
records.
what is
For every GIS record in the database file that matches the given
and , log the offset at which the record was found, and
the county name, the primary latitude, and the primary longitude. Do not log
any other data from the matching records.
what is in
For every GIS record in the database file whose coordinates fall within the closed
rectangle with the specified height and width, centered at the inate>, log the offset at which the record was found, and the feature name, the
8
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
state name, and the primary latitude and primary longitude. Do not log any other
data from the matching records. The half-height and half-width are specified as
total seconds.
The what is in command takes an optional modifier, -long, which causes the
display of a long listing of the relevant records. The switch will be the first token
following the name of the command. If this switch is present, then for every GIS
record in the database file whose coordinates fall within the closed rectangle with
the specified height and width, centered at the , log
every important non-empty field, nicely formatted and labeled. See the posted log
file(s) for an example. Do not log any empty fields. The half-height and half-width
are specified as seconds.
The what is in command also takes an optional modifier, causing the search
results to be filtered: -filter [ pop | water | structure ]
The switch and its modifier will be the first and second tokens following the name
of the command (for simplicity we assume that either -filter will be provided or
-long optional modifier, but not both). If present, this causes the set of matching
records to be filtered to only show those whose feature type field corresponds to
the given filter specifier. See Table 2 in the Appendix for instructions on how to
interpret the feature types shown above.
Sample command script(s), and corresponding log file(s), are provided alongside this description
file. As a general rule, every command should result in some output. In particular, a
descriptive message should be logged if a search yields no matching records. Also, to make
sure you are logging everything properly, please avoid using std::cout in your code unless
you are debugging.
Log File Description
Your output should be clear, concise, well labeled, and correct. You should begin the log
with a few lines identifying yourself, and listing the names of the input files that are being
used.
The remainder of the log file output should come directly from your processing of the command
file. You are required to echo each comment line, and each command that you process
to the log file so that it’s easy to determine which command each section of your output
corresponds to. Each command (except for “world”) should be numbered, starting with 1,
and the output from each command should be well formatted, and delimited from the output
resulting from processing other commands.
9
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Quick Start
Reading through this whole document might be quite overwhelming and could make it hard
to get started. This section is intended to reduce that complexity and give you a road-map
on the tasks that need to be done in order. Following this road-map will help you save time
while enabling you to finish up each part as soon as you learn about it in the class. Lets get
started:
1. This project has a certain structure described throughout the document. Get started
by creating a GIS.cpp file and populating the project structure in it. You will need
a main function, as well as the classes: GISRecord, NameIndex, CoordinateIndex,
BufferPool, Logger, SystemManager, and CommandProcessor. You will also need a
enum Command structure for the different commands described in the project description
and a struct DMS to keep the parsed coordinates.
2. Once done with the project structure, get started on filling up the GISRecord and
CommandProcessor classes. By the end of this step your project should be able to read
the script file and parse its content reporting different commands and the arguments
passed to each. Your project should also be able to create the database file and log file
and fill out the initial and final lines of the execution logs as well as copying the comments
and commands using Logger class. Note that the results of running commands
are not expected in this step.
3. Next tackle implementing the first command, world. In this step, you don’t need to
create indices with it but later you will modify the code to initiate the indices along
with world creation.
4. Fill up the BufferPool to write to a file, read from a file, and fill up/use its cache.
Make sure it has a str() command implemented to be used for debug pool.
5. Work on import command. In this step, you will need to create your HashTable and
PRQuadtree classes to support NameIndex and CoordinateIndex classes. Fill out your
support classes with the help of what you have learnt about Trees and Hashing. You
can use the starter code provided in the labs as a starting point for this step as well.
6. To complete what is command, finish up your NameIndex. Make sure it has a str()
command implemented to be used for debug hash.
7. To complete what is at and what is in commands, finish up your CoordinateIndex.
Make sure it has str() and (optionally) visualize() functions implemented to be
used for debug quad and debug world.
8. Wrap up the project by re-reading through the project description and taking care of
the details that have been left out.
9. You are done!
10
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Submission
For this project, you must submit an archive (zip or tar) file containing all the source code
files for your implementation (i.e., .cpp files). Submit only the source files. Do not submit
the compiled files or any of object files. If you use packages in your implementation (and
that’s good practice), your archive file must include the correct directory structure for those
packages, and your GIS.cpp file must be in the top directory when the archive file is unpacked.
Your code must be ready to compile using g++ -std=c++11 or a simple Makefile if
you have a more complicated structure. Make sure no visual studio related dependencies
or solution files are there when submitting the result, since I certainly will not use visual
studio to test and grade your project.
Alongside your source files, I need a pdf file describing your solution, general architecture
of your code, and the list of data structures you have implemented or used from STL. Run
script01.txt and put a screen shot of its created log file, as well.
The correctness of your solution will be evaluated by executing your solution on a collection
of test data files. Be sure to test your solution with all of the data sets that are posted, since
I will use a variety of data sets, including at least one very large data one (perhaps hundreds
of thousands of records) in my evaluation.
As it is stated in the beginning of the file description this project must be done individually.
You are not allowed to copy a single function from another person nor from the
internet without citing them. You may use any of the previously provided lab source
codes completed by yourself to reduce the implementation time of this project
(don’t forget to mention which lab codes you have used in your final report).
11
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Appendix
Table 1: Geographic Data Record Format
Name Type Length/
Decimals Short Description
Feature ID Integer 10 Permanent, unique feature record identifier
and official feature nameFeature Name String 120
Feature Class String 50 See Table 2 later in this specification
State
Alpha String 2 The unique two letter alphabetic code
and the unique two number code for a US State
State
Numeric String 2
County
Name String 100 The name and unique three number code
for a county or county equivalent County
Numeric String 3
Primary
Latitude
DMS String 7 The official feature location
DMS-degrees/minutes/seconds
DEC-decimal degrees
Note: Records showing “Unknown” and zeros for
the latitude and longitude DMS and decimal fields,
respectively, indicate that the coordinates of the
feature are unknown. They are recorded in the
database as zeros to satisfy the format requirements
of a numerical data type. They are not errors and
do not reference the actual geographic coordinates
at 0 latitude, 0 longitude.
Primary
Longitude
DMS String 8
Primary
Latitude
DEC
Real
Number 11/7
Primary
Longitude
DEC
Real
Number 12/7
Source
Latitude
DMS String 7 Source coordinates of linear feature only
(Class = Stream, Valley, Arroyo)
DMS-degrees/minutes/seconds
DEC-decimal degrees
Note: Records showing “Unknown” and zeros for
the latitude and longitude DMS and decimal fields,
respectively, indicate that the coordinates of
the feature are unknown. They are recorded in the
database as zeros to satisfy the format requirements
of a numerical data type. They are not errors and
do not reference the actual geographic coordinates
at 0 latitude, 0 longitude.
Source
Longitude
DMS String 8
12
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Source
Latitude
DEC
Real
Number 11/7
Source
Longitude
DEC
Real
Number 12/7
Elevation
(meters) Integer 5 Elevation in meters above (-below) sea level
of the surface at the primary coordinates
Elevation
(feet) Integer 6 Elevation in feet above (-below) sea level
of the surface at the primary coordinates
Map Name String 100 Name of USGS base series topographic
map containing the primary coordinates.
Date Created String Date The date the feature was initially
committed to the database.
Date Edited String Date The date any attribute of an existing
feature was last edited.
The Feature Class field of the GIS records may contain any of the following designators. We care
about these because of the -filter switch that may be used with the what is in commands. The
table below indicates which of the standard correspond to one of the filter specifiers.
Table 2: Feature Class Designators
Class Type Description
Airport structure Manmade facility maintained for the use of aircraft
(airfield, airstrip, landing field, landing strip).
Arch Natural arch-like opening in a rock mass
(bridge, natural bridge, sea arch).
Area
Any one of several areally extensive natural features
not included in other categories
(badlands, barren, delta, fan, garden).
Arroyo water Watercourse or channel through which water may
occasionally flow (coulee, draw, gully, wash).
Bar
Natural accumulation of sand, gravel, or alluvium
forming an underwater or exposed embankment
(ledge, reef, sandbar, shoal, spit).
Basin Natural depression or relatively low area enclosed
by higher land (amphitheater, cirque, pit, sink).
Bay water
Indentation of a coastline or shoreline enclosing a
part of a body of water; a body of water partly
surrounded by land
(arm, bight, cove, estuary, gulf, inlet, sound).
13
Put your Name & Id
on the report
Final Project
Due April 12th, 2021 11:45pm
COMP 8042
Hassan S. Shavarani
Beach
The sloping shore along a body of water that is
washed by waves or tides and is usually covered
by sand or gravel
(coast, shore, strand).
Bench
Area of relatively level land on the flank of an
elevation such as a hill, ridge, or mountain
where the slope of the land rises on one side
and descends on the opposite side (level).
Bend water
Curve in the course of a stream and (or) the land
within the curve; a curve in a linear body of water
(bottom, loop, meander).
Bridge structure
Manmade structure carrying a trail, road, or other
transportation system across a body of water or
depression (causeway, overpass, trestle).
Building structure
站长地图