讲解CS 2505、辅导GIS System、辅导C++编程设计、C++讲解
- 首页 >> 其他 CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 1
C Programming Structured Types, Files and Parsing, Indexing
Geographic information systems organize information pertaining to geographic features and provide various kinds of access
to the information. A geographic feature may possess many attributes (see below). In particular, a geographic feature has a
specific location. There are a number of ways to specify location. For this project, we will use latitude and longitude,
which will allow us to deal with geographic features at any location on Earth. A reasonably detailed tutorial on latitude and
longitude can be found in the Wikipedia at en.wikipedia.org/wiki/Latitude and en.wikipedia.org/wiki/Longitude.
The GIS record files were obtained from the website for the USGS Board on Geographic Names (geonames.usgs.gov). The
file begins with a descriptive header line, followed by a sequence of GIS records, one per line, which contain the following
fields in the indicated order:
Figure 1: Geographic Data Record Format
Name Type Length/
Decimals Short Description
Feature ID Integer 10
Permanent, unique feature record identifier and official feature name Feature
Name String 120
Feature
Class String 50 See Figure 3 later in this specification
State Alpha String 2
State The unique two letter alphabetic code and the unique two number code for a US State
Numeric String 2
County
Name String 100
The name and unique three number code for a county or county equivalent
County
Numeric String 3
Primary
Latitude
DMS
String 7
The official feature location
DMS-degrees/minutes/seconds
DEC-decimal degrees.
Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Primary
Longitude
DMS
String 8
Primary
Latitude
DEC
Real
Number 11/7
Primary
Longitude
DEC
Real
Number 12/7
Source
Latitude
DMS
String 7
Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
DMS-degrees/minutes/seconds
DEC-decimal degrees.
Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Source
Longitude
DMS
String 8
Source
Latitude
DEC
Real
Number 11/7
Source
Longitude
DEC
Real
Number 12/7CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 2
Elevation
(meters) Integer 5 Elevation in meters above (-below) sea level of the surface at the primary coordinates
Elevation
(feet) Integer 6 Elevation in feet above (-below) sea level of the surface at the primary coordinates
Map Name String 100 Name of USGS base series topographic map containing the primary coordinates.
Date
Created String The date the feature was initially committed to the database.
Date Edited String The date any attribute of an existing feature was last edited.
Notes:
See https://geonames.usgs.gov/domestic/states_fileformat.htm for the full field descriptions.
The type specifications used here have been modified from the source (URL above) to better reflect the realities of
your programming environment.
Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W) format, or DEC (real
number, -82.1417975) format. In DMS format, latitude will always be expressed using 6 digits followed by a
single character specifying the hemisphere, and longitude will always be expressed using 7 digits followed by a
hemisphere designator.
Although some fields are mandatory, some may be omitted altogether. Best practice is to treat every field as if it
may be left unspecified. Certain fields are necessary in order to index a record: the feature ID, the feature name,
and the state abbreviation. Every record will always include each of those fields.
In the GIS record file, each record will occur on a single line, and the fields will be separated by pipe ('|') symbols.
Empty fields will be indicated by a pair of pipe symbols with no characters between them. See the posted
VA_Highland.txt file for many examples.
GIS record files are guaranteed to conform to this syntax, so there is no explicit requirement that you validate the files. On
the other hand, some error-checking during parsing may help you detect errors in your parsing logic.
The file can be thought of as a sequence of bytes, each at a unique offset from the beginning of the file, just like the cells of
an array. So, each GIS record begins at a unique offset from the beginning of the file.
Line Termination
Each line of a text file ends with a particular marker (known as the line terminator). In MS-DOS/Windows file systems, the
line terminator is a sequence of two ASCII characters (CR + LF, 0X0D0A). In Unix systems, the line terminator is a single
ASCII character (LF). Other systems may use other line termination conventions.
Why should you care? Which line termination is used has an effect on the file offsets for all but the first record in the data
file. As long as we’re all testing with files that use the same line termination, we should all get the same file offsets. But if
you change the file format (of the posted data files) to use different line termination, you will get different file offsets than
are shown in the posted log files. Most good text editors will tell you what line termination is used in an opened file, and
also let you change the line termination scheme.
All that being said, when this project is graded, Unix line termination will be used, which will also be true of all the test
data files supplied for the project.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 3
Figure 2: Sample Geographic Data
Records
Note that some record fields are optional, and that
when there is no given value for a field, there are
still delimiter symbols for it.
Also, some of the lines are "wrapped" to fit into
the text box; lines are never "wrapped" in the
actual data files.
FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA|STATE_NUMERIC|COUNTY_NAME|COUNTY_NUMERIC|PRIMARY_LAT_DMS|PRIM_LONG_DMS|PRIM_LAT_DEC|PRIM_LON
G_DEC|SOURCE_LAT_DMS|SOURCE_LONG_DMS|SOURCE_LAT_DEC|SOURCE_LONG_DEC|ELEV_IN_M|ELEV_IN_FT|MAP_NAME|DATE_CREATED|DATE_EDITED
1479116|Monterey Elementary School|School|VA|51|Roanoke (city)|770|371906N|0795608W|37.3183753|- 79.9355857|||||323|1060|Roanoke|09/28/1979|09/15/2010
1481345|Asbury Church|Church|VA|51|Highland|091|382607N|0793312W|38.4353981|-79.5533807|||||818|2684|Monterey|09/28/1979|
1481852|Blue Grass|Populated Place|VA|51|Highland|091|383000N|0793259W|38.5001188|-79.5497702|||||777|2549|Monterey|09/28/1979|
1481878|Bluegrass Valley|Valley|VA|51|Highland|091|382953N|0793222W|38.4981745|-79.539492|382601N|0793800W|38.4337309|- 79.6333833|759|2490|Monterey|09/28/1979|
1482110|Buck Hill|Summit|VA|51|Highland|091|381902N|0793358W|38.3173452|-79.5661577|||||1003|3291|Monterey SE|09/28/1979|
1482176|Burners Run|Stream|VA|51|Highland|091|382509N|0793409W|38.4192873|-79.5692144|382531N|0793538W|38.4252778|- 79.5938889|848|2782|Monterey|09/28/1979|
1482324|Mount Carlyle|Summit|VA|51|Highland|091|381556N|0793353W|38.2656799|-79.5647682|||||698|2290|Monterey SE|09/28/1979|
1482434|Central Church|Church|VA|51|Highland|091|382953N|0793323W|38.4981744|-79.5564371|||||773|2536|Monterey|09/28/1979|
1482557|Claylick Hollow|Valley|VA|51|Highland|091|381613N|0793238W|38.2704021|-79.5439343|381733N|0793324W|38.2925|- 79.5566667|573|1880|Monterey SE|09/28/1979|
1482785|Crab Run|Stream|VA|51|Highland|091|381707N|0793144W|38.2854018|-79.528934|381903N|0793415W|38.3175|-79.5708333|579|1900|Monterey
SE|09/28/1979|
1482950|Davis Run|Stream|VA|51|Highland|091|381824N|0793053W|38.3067903|-79.5147671|382057N|0793505W|38.3491667|-79.5847222|601|1972|Monterey
SE|09/28/1979|
1483281|Elk Run|Stream|VA|51|Highland|091|382936N|0793153W|38.4934524|-79.5314362|383121N|0793056W|38.5226185|- 79.5156027|757|2484|Monterey|09/28/1979|
1483492|Forks of Waters|Locale|VA|51|Highland|091|382856N|0793031W|38.4823417|-79.5086575|||||705|2313|Monterey|09/28/1979|
1483527|Frank Run|Stream|VA|51|Highland|091|382953N|0793310W|38.4981744|-79.5528258|383304N|0793341W|38.5512285|- 79.5614381|780|2559|Monterey|09/28/1979|
1483647|Ginseng Mountain|Summit|VA|51|Highland|091|382850N|0793139W|38.480675|-79.527547|||||978|3209|Monterey|09/28/1979|
1483860|Gulf Mountain|Summit|VA|51|Highland|091|382940N|0793103W|38.4945636|-79.5175468|||||1006|3300|Monterey|09/28/1979|
1483916|Hamilton Chapel|Church|VA|51|Highland|091|381740N|0793707W|38.2945677|-79.6186591|||||823|2700|Monterey SE|09/28/1979|
1484097|Highland High School|School|VA|51|Highland|091|382426N|0793444W|38.4071387|-79.5789333|||||879|2884|Monterey|09/28/1979|09/15/2010
1484099|Highland Wildlife Management Area|Park|VA|51|Highland|091|381905N|0793439W|38.3181785|-79.577547|||||954|3130|Monterey SE|09/28/1979|
. . .CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 4
Assignment
You will implement a system that indexes and provides search features for a file of GIS records, as described above.
Your system will build and maintain several in-memory index data structures to support these operations:
Retrieving data for all GIS records matching a given feature name and state
Retrieving data for the unique GIS record matching a given feature ID
Reporting the distance between two features specified by their feature IDs
Displaying the in-memory indices in a human-readable manner; this is purely for debugging purposes and will not
be evaluated for grading purposes.
You will implement a single software system in C to perform all system functions.
Program Invocation
The program will take the names of two files from the command line: (assuming that your executable is named gis):
gis
The command-line specifies:
the name of the file containing the search commands to be carried out, and other information described below
the name of the file to which the program will write its output
The database file will exist, and conform to the format description given above. If the command script file is not found the
program should write an error message to the console and exit gracefully. The log file may or may not already exist; if it
does not exist, a file with the specified name should be created; if it does exist, the old contents should be replaced.
System Overview
Among other requirements, your solution will implement indexing to support finding records that match certain criteria.
The basic notion of an index is that the user gives the index a key value, and the index returns a collection of locations at
which matching records can be found, or an indication that no records match the given key:
The client code neither knows, nor needs to know, how the index manages information internally (i.e., what data structure
is used); the client only needs to understand the "public" aspect of the transaction. That is, "I send the index a key value of
the appropriate type" and "I get back a collection of one or more file offsets in some agreed format".
client code
key value
offsets of
matching
records
in db file
index
suitable
data
structureCS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 5
It's not the job of the index to actually retrieve records for the client. In fact, the index doesn't need to know where the
records are stored, or in what format the records are stored.
The GIS records will be indexed by the Feature Name and State (abbreviation) fields. This name index will support
finding offsets of GIS records that match a given feature name and state abbreviation. It is entirely possible that the file
may contain several records that have the same Feature Name and State fields; for example, there are, or were, actually
three places in Virginia named Blacksburg.
The GIS records will also be indexed by their Feature ID fields. The feature ID index will support finding offsets of the
unique GIS record that matches a given feature ID. The feature ID is a unique identifier; that is, there will never be two
different features with the same ID.
See the section on Index Data Structures and Algorithms for the requirements your implementations of the two index
structures must satisfy.
When searches are performed, complete GIS records will be retrieved from the GIS database file that your program
maintains. The only complete GIS records that are stored in memory at any time are those that have just been retrieved to
satisfy the current search, or individual GIS records retrieved while building the index structures.
Command File
The execution of the program will be driven by a script file. Lines beginning with a semicolon character (';') are
comments and should be ignored. Blank lines are possible. Each command consists of a sequence of tokens, which will be
separated by single tab characters. A line terminator will immediately follow the final token on each line. The command file
is guaranteed to conform to this specification, so you do not need to worry about error-checking when reading it.
The first two lines in the command file that are not comments will specify the name of the GIS record file you will be
processing, and the size to use for the hash table (details on that later):
db_file
table_sz
The named file will be one that's supplied for testing. The command file and GIS record file should both be in the same
directory as your executable when you run your program. The remaining commands involve searches related to the
indexed records:
exists
Report the number of records in the database file that match the given feature name and state abbreviation, in the
following format:
N occurrences:
See the posted log files for examples.
details_of
For each record in the database file that matches the given feature name and state abbreviation, report the values of
the following information, with descriptive labels: file offset of record, feature ID, feature name, feature type, state
abbreviation, county, primary longitude, and primary latitude.
If more than one record matches the given criteria, list them in ascending order by feature ID. (This may be easier to
achieve if you make certain decisions about how to organize your feature name index.)
See the posted log files for examples.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 6
distance_between
Compute and display the orthodromic distance between the two features. See the following section on great circles
for instructions on how to do this. The calculation will require a number of trigonometric functions from the C math
libraries, so this time we will use the -lm switch when doing builds. Round the distance to the nearest tenth of a
kilometer when you print it.
If there is no record in the database file for one or both given feature IDs, log an informative message to that effect.
See the posted log files for examples.
The script files are guaranteed to conform to the syntax descriptions given above, so you do not need to worry about
checking for such errors.
Your program must echo each comment found in the command file to the log file, and echo each command, labeled and
numbered (see posted log files for the format).
Sample command scripts, and corresponding log files, will be provided on the website. As a general rule, every command
should result in some output. In particular, a descriptive message must be logged if a search yields no matching records.
Index Data Structures and Algorithms
Feature ID Index
You will use a simple array of struct variables as the physical data structure for the FID index; it's up to you to decide
what types to use for the fields in those struct variables will have, but they absolutely will not contain representations of
any information except the feature ID and the file offset of a single matching record.
The FID index must support searches that are O(log N), where N is the number of records in the database file; that means
you need to use binary search within the FID index, and so the entries there must be sorted by FID. There are two ways to
achieve this:
keep the entries in sorted order as you add them to the array
put all the entries into the array and then sort them
The first approach is somewhat less efficient, but can be done rather easily by shifting entries in the array when a new entry
is added. The second approach requires using some sensible sorting algorithm.
The Standard Library, from stdlib.h, supplies a function that performs Quicksort:
void qsort(void *base, size_t nelem, size_t width,
int (*compare)(const void*, const void*));
The interface may seem a bit confusing:
base a pointer to the array that is to be sorted; it's specified as void* so you can use qsort() on an array
holding any kind of data that you like
nelem the number of elements to be sorted
width the size (in bytes) of each element in the array; this is usually specified by using sizeof()
int (*compare) (const void *left, const void *right)
the name of a function that takes two void* parameters and returns an integer value; this function is
used by qsort() to compare the elements in the array; returning < 0, 0 or >0 depending on whether
*left < *right, *left == *right, or *left > *right.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 7
There's a relevant example of how to use this in the Advice section later in this specification.
When implementing the FID index, make the initial array of size 256. If the array becomes full as you build the index,
resize it by doubling the size. This may be accomplished by using realloc(). You may have to resize the array more
than once.
Feature Name Index
You will implement the feature name index as a hash table, using an array of pointers to struct variables as the physical
structure. The struct variables in the array will store three pieces of information: a string that you construct from the
feature name and the state abbreviation for a feature, an array of file offsets of matching records, and a pointer.
Here's an example of the sort of table you'll be implementing:
Dots represent NULL pointers, and the notation used for the strings just indicates that the two parts are combined into a
single string in some way that you can choose. You are free to combine the feature name and state abbreviation in any way
you like.
Elements are mapped into a hash table by making use of a "magic function" that takes a key value as input and produces a
nonnegative integer value, which is called the hash value of the key. The function is called a hash function. For this
assignment, the key values will be the feature name/state abbreviation pairs.
In the example above, let's assume the array has dimension 1000; in other words, we'd say the hash table has 1000 slots.
The hash function has taken the key ["Wigdahl Farm","IA"] and has produced an integer K (which we do not
know), such that K % 1000 == 1.
That is, the hash function supplies a nonnegative integer, computed in some fashion from the key for some record, and that
integer is modded by the table size to produce a valid table index at which the record will be stored. We call that slot the
home slot for the key.
Ideally, the hash function would never map two different keys to the same table slot. Do not count on achieving that except
in special circumstances. If the hash function does map two different keys to the same table slot, we say the keys have
collided in that slot. For this assignment, we will deal with collisions by simply treating each table slot as a linked list of
index entries (struct variables), where each index entry corresponds to a different key value.
Searching in a hash table proceeds by using the hash function to compute a table index from the key we are interested in,
and then searching the list in that table slot to see if there's an entry for the given key value. In the ideal case, searches in
the hash table are O(1), which is why you are using one.
["Elk Creek","IA"]
{532, 7842, 10943}
["Wigdahl Farm","IA"]
{5830}
["Abbes Fork","IA"]
{3714}
table slots
index entriesCS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 8
The assignment also involves an annoyance: feature names are not necessarily unique, even within the same state. For
example, the GIS database mentioned earlier lists three different places in Virginia named Blacksburg! Since we never
want to put two different entries (struct variables) that store the same key in a hash table, each entry in the table needs to
store the file offset for every record that matches the given key. That's illustrated in the table above with the key field
["Elk Creek","IA"], which has three matches.
There are lots of other, interesting things to say about the general concept of a hash table, but we will leave that for a course
on data structures.
The question remains: what sort of function could we use to compute an integer from a string? There are actually lots of
ways to do this. You will use the following hash function, which is based on one that was implemented as part of the first
UNIX implementation at Bell Labs:
uint32_t hashKey(char* key) {
uint32_t hashValue = 0;
for (int pos = 0; key[pos] != '\0'; pos++) { // use all elements
hashValue = (hashValue << 4) + key[pos]; // shift/mix
uint32_t hiBits = hashValue & 0xF0000000; // get high nybble
if (hiBits != 0) {
hashValue ^= hiBits >> 24; // fold high nybble into second nybble
}
hashValue &= ~hiBits; // clear high nybble
}
return hashValue;
}
Permitted Assumptions and Some Advice
You may assume that:
no GIS record will ever be longer than 500 characters
no line in the command file will ever be longer than 256 characters
the GIS record file and the command file will follow the specified syntax described earlier
The following observations are purely advisory, but are based on my experience, including that of implementing a solution
to this assignment. These comments are advice, not requirements.
First, and most basic, analyze what your GIS system must do and design a sensible, logical framework for making those
things happen.
Second, and also basic, practice incremental development! This is a fairly sizeable program, especially so if it's done
properly. My solution, including comments, approaches 1000 lines of code. It takes quite a bit of work before you have
enough working code to test on full input files, but incremental testing is extremely valuable. Try to think of implementing
the system feature by feature, and test as you go. Implement one index first, and test searches using that index. Then work
on the other index.
Record your design decisions in some way; a simple text file is often useful for tracking your deliberations, the alternatives
you considered, and the conclusions you reached. That information is invaluable as your implementation becomes more
complete, and hence more complex, and you are attempting to extend it to incorporate additional features.
Write useful comments in your code, as you go. Leave notes to yourself about things that still need to be done, or that you
are currently handling in a clumsy manner.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 9
Take advantage of tools. You should already have a working knowledge of gdb. Use it! The debugger is invaluable when
pinning down the location of segfaults; but it is also useful for tracking down lesser issues if you make good use of
breakpoints and watchpoints. Some memory-related errors yield mysterious behavior, and confusing runtime error reports.
That's especially true when you have written past the end of a dynamically-allocated array and corrupted the heap. This
sort of error can often be diagnosed by using valgrind.
Enumerated types are extremely useful for representing various kinds of information, especially about type attributes of
structured variables. For example, if implementing a different sort of system, we might find the following type useful:
// Vehicle.h
enum _VehicleMake {ACURA, CHEVROLET, DODGE, FORD, . . . , VOLVO};
typedef enum _ VehicleMake VehicleMake;
...
struct _Vehicle {
VehicleMake make;
char* model; // could also be an enumerated type
uint16_t year;
char* licenseNum;
char* vin;
...
};
typedef struct _Vehicle Vehicle;
...
Enumerated types let me use descriptive labels in my code, which makes it easier to understand the logic of the code.
You will need to use static tables of structures to organize language information; by static, I mean a table that has static
storage duration, and is private to the file in which it’s created. In some cases, such a table may be initialized directly, with
fixed data, when it is declared. For example:
// VehicleData.c
#define NUMRECORDS 50
static Vehicle VehicleTable[NUMRECORDS] = {
{FORD, "Fiesta", 1978, "LXR 804", . . .},
{GMC, "Safari", 1997, "ZFL 8473", . . .},
...
{CORD, "812 Westchester", 1937, "PAM 445", . . .}
};
In other cases, the table may be declared as a static, file-scoped array, and initialized after your program starts, by reading
data from a file. That will be the case for the index structures in this assignment.
Once a static table has been created, we can provide access to it by writing functions that can be called from other parts of
the system. For example:
// VehicleData.c
uint16_t lookupYearByLicense(char* license) { . . . }
Since this function is implemented in the same file as the table, the function can access the table to perform a search. We
would put the declaration of the function in a header file, so it can be called from elsewhere:
// VehicleData.h
uint16_t lookupYearByLicense(char* license);
Obviously, the sample code shown above does not play a role in my solution. On the other hand, I used this approach to
implement the data structures (arrays) that my indexing depends on. CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 10
Write lots of "utility" functions because they simplify things tremendously; e.g., string manipulators, comparison functions,
command handlers, I/O functions, etc. My solution involves all of those.
Data types, like the structure shown above, play a major role in a good solution. I wrote a number of them, in addition to
the ones I used in indexing.
One key to becoming proficient and productive in C, as in most programming languages, is to take full advantage of the
library that comes with that language.
Explore string.h carefully (Google and find it at pubs.opengroup.org). Useful functions include strncpy(),
strncmp(), memcpy() and strtok(). There are lots of useful functions in the C Standard Library, not just in
string.h.
There are some advanced parsing features in C, involving the concept of a scanset used in a format specifier. Consult the
notes on this, linked from the Assignments page.
Another potentially useful function in the Standard Library, from stdlib.h, performs Quicksort:
void qsort(void *base, size_t nelem, size_t width,
int (*compare)(const void*, const void*));
The interface may seem a bit confusing:
base a pointer to the array that is to be sorted; it's specified as void* so you can use qsort() on an array
holding any kind of data that you like
nelem the number of elements to be sorted
width the size (in bytes) of each element in the array; this is usually specified by using sizeof()
int (*compare) (const void *left, const void *right)
the name of a function that takes two void* parameters and returns an integer value; this function is
used by qsort() to compare the elements in the array; returning < 0, 0 or >0 depending on whether
*left < *right, *left == *right, or *left > *right.
Here's a little example. Recall the Vehicle type defined earlier; suppose we had an array of Vehicle objects, and we
wanted to use qsort() to put them in ascending order by the model field. The following comparison function will do:
int compareModels(const void *left, const void *right) {
const Vehicle* pLeft = (Vehicle*) left;
const Vehicle* pRight = (Vehicle*) right;
return ( strcmp( pLeft->model, pRight->model) );
}
The call to qsort() would look something like:
qsort(vehicleList, nVehicles, sizeof(Vehicle), compareModels);
The use of void* in the implementation of the comparison function is typical, old-school C. It would be dangerous, if a
call to the comparison function passed pointers to anything other than Vehicle objects, and there's no way to check for
that since the use of void* prevents type-checking. On the other hand, since the calls to the comparison function are made
by qsort(), as long as we provide qsort() with an array of Vehicle objects, all will be well.
Now, why might sorting be useful? Since each of the index structures uses an array of struct variables to hold
information, the only way to achieve log(N) searches is if the arrays are sorted on the struct element that is used as the
search key.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 11
One approach would be to build the array, and then sort it. Another would be to keep the array in sorted order as it's built,
by placing each new entry into the array so that everything stays sorted. Either approach is acceptable. The second
approach would be better if your application required adding new elements on the fly, rather than all at once.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 12
What to Submit
You will submit your solution in a single, flat, uncompressed tar file to the Curator, via the collection point C07. That file
must include all of the .c and .h files involved in your solution, and nothing else.
Your submission will be graded by running the supplied test/grading code on it, with several different scripts and data files.
Grading
A tar file with a few scripts and reference logs is posted on the course website. For example:
; Script 1: very small db file, testing only the "exists" command
;
; Execution: gis script01.txt log01.txt
;
db_file VA_Highland.txt
table_sz 500
;
; Existence searches that should find a single match:
exists Devils Backbone VA
exists Jerkemtight Trail VA
exists Opossum Hollow VA
;
; Existence searches that should find no match:
exists Devils Backbone AK
exists McCowan Spring VA
You can compare your log file to the reference logs visually to determine if your results are logically correct:
; Existence searches that should find a single match:
Cmd 1: exists Devils Backbone VA
1 occurrences: Devils Backbone VA
----------------------------------------------------------------------
Cmd 2: exists Jerkemtight Trail VA
1 occurrences: Jerkemtight Trail VA
----------------------------------------------------------------------
Cmd 3: exists Opossum Hollow VA
1 occurrences: Opossum Hollow VA
----------------------------------------------------------------------
;
; Existence searches that should find no match:
Cmd 4: exists Devils Backbone AK
Not found in index: Devils Backbone AK
----------------------------------------------------------------------
Cmd 5: exists McCowan Spring VA
Not found in index: McCowan Spring VA
----------------------------------------------------------------------
You should match the contents of the reference logs more or less exactly. An updated set of test data, with a comparison
tool and shell script will be posted in a week or so.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 13
Pledge:
Each of your program submissions must be pledged to conform to the Honor Code requirements for this course.
Specifically, you must include the following pledge statement in the submitted .c file containing main():
// On my honor:
//
// - I have not discussed the C language code in my program with
// anyone other than my instructor or the teaching assistants
// assigned to this course.
//
// - I have not used C language code obtained from another student,
// the Internet, or any other unauthorized source, either modified
// or unmodified.
//
// - If any C language code or documentation used in my program
// was obtained from an authorized source, such as a text book or
// course notes, that has been clearly noted with a proper citation
// in the comments of my program.
//
// - I have not designed this program in such a way as to defeat or
// interfere with the normal operation of the Curator System.
//
//
//
We reserve the option of assigning a score of zero to any submission that is undocumented
or does not contain this statement.
Change Log
Version Posted Pg Change
2.00 April 1 Base document.
2.10 April 1 Multiple changes in phrasing but no adjustments to requirements.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 14
Orthodromic (great-circle) Distance
Given a sphere, two points on its surface are antipodal if the line connecting the points contains
the center of the sphere. u and v are antipodal.
Given a sphere, a great circle is a circle whose radius equals that of the sphere and having the
same center as the sphere.
Given two points P and Q on the surface of a sphere, if the two points are not at opposite ends of
a diameter of the sphere, there is a unique great circle containing P and Q.
The length of the shorter arc between P and Q is the orthodromic distance between them; that's shown in red in the
diagram. (Of course, if P and Q are antipodal, there are infinitely many great circles containing them, and the distance
between them is just half the circumference of the sphere.)
Now, there's a simple fact about the length of an arc of a circle. Given two points on a circle of
radius R, A and B, let the angle formed by the radii containing A and B be Θ. This is called the
central angle.
Then the length of the arc L between A and B is simply
R? .
But, how do we calculate the central angle of that arc, if the points are specified by giving their coordinates as longitude
and latitude values? The key lies in spherical trigonometry.
First of all, let's pass a plane through the center of the sphere (for our GIS system, this
would be the plane determined by the equator). Then, let's choose a line in that plane from
the center of the sphere to the surface (for our GIS system, that would simply be the zero
meridian).
Given the point P, we have a unique right triangle determined by the radius containing P
and a perpendicular from P to the plane of the equator:
The angle lambda between the zero meridian line and the base of that triangle is simply the
longitude of P.
The angle ? between the base and hypotenuse of the triangle is simply the latitude of P.
And, we can calculate the central angle, Δσ, by using the spherical law of cosines:
Δσ = arccos sin1 2 1 2 × sin + cos × cos × cos?Δλ??
where
is the latitude of
is the longitude of
is the latitude of
is the longitude of
Note that all the angles are expressed in radians (remember those?); you'll have to convert since the GIS data files give
longitude and latitude in degrees. And, we'll use the decimal values for longitude and latitude that are given in the GIS data
files, because those a slightly more precise than the values you'd get by converting the DMS values.
Figure 1
Figure 2
Figure 3CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 15
Standard C does not define a value for π, and you need a value for converting degrees to radians. So you should add the
following statement in your code:
#define PI 3.14159265358979323846 // best approximation as double
Now, the Earth is not a sphere. In fact, the radius through the poles is about 6334.439 km, while the radius at the equator is
about 6378.137 km. For the purposes of this assignment, we will trust the IUGG and include the following statement:
#define RADIUS 6371.0088 // IUGG mean radius of Earth, in km
Credits
Figure 1, 3:
By CheCheDaWaff - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48187293
Figure 2:
By Lfahlberg - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/wiki/File:Angle_central_convex.svg
Version 2.10 This is a purely individual assignment! 1
C Programming Structured Types, Files and Parsing, Indexing
Geographic information systems organize information pertaining to geographic features and provide various kinds of access
to the information. A geographic feature may possess many attributes (see below). In particular, a geographic feature has a
specific location. There are a number of ways to specify location. For this project, we will use latitude and longitude,
which will allow us to deal with geographic features at any location on Earth. A reasonably detailed tutorial on latitude and
longitude can be found in the Wikipedia at en.wikipedia.org/wiki/Latitude and en.wikipedia.org/wiki/Longitude.
The GIS record files were obtained from the website for the USGS Board on Geographic Names (geonames.usgs.gov). The
file begins with a descriptive header line, followed by a sequence of GIS records, one per line, which contain the following
fields in the indicated order:
Figure 1: Geographic Data Record Format
Name Type Length/
Decimals Short Description
Feature ID Integer 10
Permanent, unique feature record identifier and official feature name Feature
Name String 120
Feature
Class String 50 See Figure 3 later in this specification
State Alpha String 2
State The unique two letter alphabetic code and the unique two number code for a US State
Numeric String 2
County
Name String 100
The name and unique three number code for a county or county equivalent
County
Numeric String 3
Primary
Latitude
DMS
String 7
The official feature location
DMS-degrees/minutes/seconds
DEC-decimal degrees.
Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Primary
Longitude
DMS
String 8
Primary
Latitude
DEC
Real
Number 11/7
Primary
Longitude
DEC
Real
Number 12/7
Source
Latitude
DMS
String 7
Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
DMS-degrees/minutes/seconds
DEC-decimal degrees.
Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Source
Longitude
DMS
String 8
Source
Latitude
DEC
Real
Number 11/7
Source
Longitude
DEC
Real
Number 12/7CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 2
Elevation
(meters) Integer 5 Elevation in meters above (-below) sea level of the surface at the primary coordinates
Elevation
(feet) Integer 6 Elevation in feet above (-below) sea level of the surface at the primary coordinates
Map Name String 100 Name of USGS base series topographic map containing the primary coordinates.
Date
Created String The date the feature was initially committed to the database.
Date Edited String The date any attribute of an existing feature was last edited.
Notes:
See https://geonames.usgs.gov/domestic/states_fileformat.htm for the full field descriptions.
The type specifications used here have been modified from the source (URL above) to better reflect the realities of
your programming environment.
Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W) format, or DEC (real
number, -82.1417975) format. In DMS format, latitude will always be expressed using 6 digits followed by a
single character specifying the hemisphere, and longitude will always be expressed using 7 digits followed by a
hemisphere designator.
Although some fields are mandatory, some may be omitted altogether. Best practice is to treat every field as if it
may be left unspecified. Certain fields are necessary in order to index a record: the feature ID, the feature name,
and the state abbreviation. Every record will always include each of those fields.
In the GIS record file, each record will occur on a single line, and the fields will be separated by pipe ('|') symbols.
Empty fields will be indicated by a pair of pipe symbols with no characters between them. See the posted
VA_Highland.txt file for many examples.
GIS record files are guaranteed to conform to this syntax, so there is no explicit requirement that you validate the files. On
the other hand, some error-checking during parsing may help you detect errors in your parsing logic.
The file can be thought of as a sequence of bytes, each at a unique offset from the beginning of the file, just like the cells of
an array. So, each GIS record begins at a unique offset from the beginning of the file.
Line Termination
Each line of a text file ends with a particular marker (known as the line terminator). In MS-DOS/Windows file systems, the
line terminator is a sequence of two ASCII characters (CR + LF, 0X0D0A). In Unix systems, the line terminator is a single
ASCII character (LF). Other systems may use other line termination conventions.
Why should you care? Which line termination is used has an effect on the file offsets for all but the first record in the data
file. As long as we’re all testing with files that use the same line termination, we should all get the same file offsets. But if
you change the file format (of the posted data files) to use different line termination, you will get different file offsets than
are shown in the posted log files. Most good text editors will tell you what line termination is used in an opened file, and
also let you change the line termination scheme.
All that being said, when this project is graded, Unix line termination will be used, which will also be true of all the test
data files supplied for the project.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 3
Figure 2: Sample Geographic Data
Records
Note that some record fields are optional, and that
when there is no given value for a field, there are
still delimiter symbols for it.
Also, some of the lines are "wrapped" to fit into
the text box; lines are never "wrapped" in the
actual data files.
FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA|STATE_NUMERIC|COUNTY_NAME|COUNTY_NUMERIC|PRIMARY_LAT_DMS|PRIM_LONG_DMS|PRIM_LAT_DEC|PRIM_LON
G_DEC|SOURCE_LAT_DMS|SOURCE_LONG_DMS|SOURCE_LAT_DEC|SOURCE_LONG_DEC|ELEV_IN_M|ELEV_IN_FT|MAP_NAME|DATE_CREATED|DATE_EDITED
1479116|Monterey Elementary School|School|VA|51|Roanoke (city)|770|371906N|0795608W|37.3183753|- 79.9355857|||||323|1060|Roanoke|09/28/1979|09/15/2010
1481345|Asbury Church|Church|VA|51|Highland|091|382607N|0793312W|38.4353981|-79.5533807|||||818|2684|Monterey|09/28/1979|
1481852|Blue Grass|Populated Place|VA|51|Highland|091|383000N|0793259W|38.5001188|-79.5497702|||||777|2549|Monterey|09/28/1979|
1481878|Bluegrass Valley|Valley|VA|51|Highland|091|382953N|0793222W|38.4981745|-79.539492|382601N|0793800W|38.4337309|- 79.6333833|759|2490|Monterey|09/28/1979|
1482110|Buck Hill|Summit|VA|51|Highland|091|381902N|0793358W|38.3173452|-79.5661577|||||1003|3291|Monterey SE|09/28/1979|
1482176|Burners Run|Stream|VA|51|Highland|091|382509N|0793409W|38.4192873|-79.5692144|382531N|0793538W|38.4252778|- 79.5938889|848|2782|Monterey|09/28/1979|
1482324|Mount Carlyle|Summit|VA|51|Highland|091|381556N|0793353W|38.2656799|-79.5647682|||||698|2290|Monterey SE|09/28/1979|
1482434|Central Church|Church|VA|51|Highland|091|382953N|0793323W|38.4981744|-79.5564371|||||773|2536|Monterey|09/28/1979|
1482557|Claylick Hollow|Valley|VA|51|Highland|091|381613N|0793238W|38.2704021|-79.5439343|381733N|0793324W|38.2925|- 79.5566667|573|1880|Monterey SE|09/28/1979|
1482785|Crab Run|Stream|VA|51|Highland|091|381707N|0793144W|38.2854018|-79.528934|381903N|0793415W|38.3175|-79.5708333|579|1900|Monterey
SE|09/28/1979|
1482950|Davis Run|Stream|VA|51|Highland|091|381824N|0793053W|38.3067903|-79.5147671|382057N|0793505W|38.3491667|-79.5847222|601|1972|Monterey
SE|09/28/1979|
1483281|Elk Run|Stream|VA|51|Highland|091|382936N|0793153W|38.4934524|-79.5314362|383121N|0793056W|38.5226185|- 79.5156027|757|2484|Monterey|09/28/1979|
1483492|Forks of Waters|Locale|VA|51|Highland|091|382856N|0793031W|38.4823417|-79.5086575|||||705|2313|Monterey|09/28/1979|
1483527|Frank Run|Stream|VA|51|Highland|091|382953N|0793310W|38.4981744|-79.5528258|383304N|0793341W|38.5512285|- 79.5614381|780|2559|Monterey|09/28/1979|
1483647|Ginseng Mountain|Summit|VA|51|Highland|091|382850N|0793139W|38.480675|-79.527547|||||978|3209|Monterey|09/28/1979|
1483860|Gulf Mountain|Summit|VA|51|Highland|091|382940N|0793103W|38.4945636|-79.5175468|||||1006|3300|Monterey|09/28/1979|
1483916|Hamilton Chapel|Church|VA|51|Highland|091|381740N|0793707W|38.2945677|-79.6186591|||||823|2700|Monterey SE|09/28/1979|
1484097|Highland High School|School|VA|51|Highland|091|382426N|0793444W|38.4071387|-79.5789333|||||879|2884|Monterey|09/28/1979|09/15/2010
1484099|Highland Wildlife Management Area|Park|VA|51|Highland|091|381905N|0793439W|38.3181785|-79.577547|||||954|3130|Monterey SE|09/28/1979|
. . .CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 4
Assignment
You will implement a system that indexes and provides search features for a file of GIS records, as described above.
Your system will build and maintain several in-memory index data structures to support these operations:
Retrieving data for all GIS records matching a given feature name and state
Retrieving data for the unique GIS record matching a given feature ID
Reporting the distance between two features specified by their feature IDs
Displaying the in-memory indices in a human-readable manner; this is purely for debugging purposes and will not
be evaluated for grading purposes.
You will implement a single software system in C to perform all system functions.
Program Invocation
The program will take the names of two files from the command line: (assuming that your executable is named gis):
gis
The command-line specifies:
the name of the file containing the search commands to be carried out, and other information described below
the name of the file to which the program will write its output
The database file will exist, and conform to the format description given above. If the command script file is not found the
program should write an error message to the console and exit gracefully. The log file may or may not already exist; if it
does not exist, a file with the specified name should be created; if it does exist, the old contents should be replaced.
System Overview
Among other requirements, your solution will implement indexing to support finding records that match certain criteria.
The basic notion of an index is that the user gives the index a key value, and the index returns a collection of locations at
which matching records can be found, or an indication that no records match the given key:
The client code neither knows, nor needs to know, how the index manages information internally (i.e., what data structure
is used); the client only needs to understand the "public" aspect of the transaction. That is, "I send the index a key value of
the appropriate type" and "I get back a collection of one or more file offsets in some agreed format".
client code
key value
offsets of
matching
records
in db file
index
suitable
data
structureCS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 5
It's not the job of the index to actually retrieve records for the client. In fact, the index doesn't need to know where the
records are stored, or in what format the records are stored.
The GIS records will be indexed by the Feature Name and State (abbreviation) fields. This name index will support
finding offsets of GIS records that match a given feature name and state abbreviation. It is entirely possible that the file
may contain several records that have the same Feature Name and State fields; for example, there are, or were, actually
three places in Virginia named Blacksburg.
The GIS records will also be indexed by their Feature ID fields. The feature ID index will support finding offsets of the
unique GIS record that matches a given feature ID. The feature ID is a unique identifier; that is, there will never be two
different features with the same ID.
See the section on Index Data Structures and Algorithms for the requirements your implementations of the two index
structures must satisfy.
When searches are performed, complete GIS records will be retrieved from the GIS database file that your program
maintains. The only complete GIS records that are stored in memory at any time are those that have just been retrieved to
satisfy the current search, or individual GIS records retrieved while building the index structures.
Command File
The execution of the program will be driven by a script file. Lines beginning with a semicolon character (';') are
comments and should be ignored. Blank lines are possible. Each command consists of a sequence of tokens, which will be
separated by single tab characters. A line terminator will immediately follow the final token on each line. The command file
is guaranteed to conform to this specification, so you do not need to worry about error-checking when reading it.
The first two lines in the command file that are not comments will specify the name of the GIS record file you will be
processing, and the size to use for the hash table (details on that later):
db_file
table_sz
The named file will be one that's supplied for testing. The command file and GIS record file should both be in the same
directory as your executable when you run your program. The remaining commands involve searches related to the
indexed records:
exists
Report the number of records in the database file that match the given feature name and state abbreviation, in the
following format:
N occurrences:
See the posted log files for examples.
details_of
For each record in the database file that matches the given feature name and state abbreviation, report the values of
the following information, with descriptive labels: file offset of record, feature ID, feature name, feature type, state
abbreviation, county, primary longitude, and primary latitude.
If more than one record matches the given criteria, list them in ascending order by feature ID. (This may be easier to
achieve if you make certain decisions about how to organize your feature name index.)
See the posted log files for examples.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 6
distance_between
Compute and display the orthodromic distance between the two features. See the following section on great circles
for instructions on how to do this. The calculation will require a number of trigonometric functions from the C math
libraries, so this time we will use the -lm switch when doing builds. Round the distance to the nearest tenth of a
kilometer when you print it.
If there is no record in the database file for one or both given feature IDs, log an informative message to that effect.
See the posted log files for examples.
The script files are guaranteed to conform to the syntax descriptions given above, so you do not need to worry about
checking for such errors.
Your program must echo each comment found in the command file to the log file, and echo each command, labeled and
numbered (see posted log files for the format).
Sample command scripts, and corresponding log files, will be provided on the website. As a general rule, every command
should result in some output. In particular, a descriptive message must be logged if a search yields no matching records.
Index Data Structures and Algorithms
Feature ID Index
You will use a simple array of struct variables as the physical data structure for the FID index; it's up to you to decide
what types to use for the fields in those struct variables will have, but they absolutely will not contain representations of
any information except the feature ID and the file offset of a single matching record.
The FID index must support searches that are O(log N), where N is the number of records in the database file; that means
you need to use binary search within the FID index, and so the entries there must be sorted by FID. There are two ways to
achieve this:
keep the entries in sorted order as you add them to the array
put all the entries into the array and then sort them
The first approach is somewhat less efficient, but can be done rather easily by shifting entries in the array when a new entry
is added. The second approach requires using some sensible sorting algorithm.
The Standard Library, from stdlib.h, supplies a function that performs Quicksort:
void qsort(void *base, size_t nelem, size_t width,
int (*compare)(const void*, const void*));
The interface may seem a bit confusing:
base a pointer to the array that is to be sorted; it's specified as void* so you can use qsort() on an array
holding any kind of data that you like
nelem the number of elements to be sorted
width the size (in bytes) of each element in the array; this is usually specified by using sizeof()
int (*compare) (const void *left, const void *right)
the name of a function that takes two void* parameters and returns an integer value; this function is
used by qsort() to compare the elements in the array; returning < 0, 0 or >0 depending on whether
*left < *right, *left == *right, or *left > *right.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 7
There's a relevant example of how to use this in the Advice section later in this specification.
When implementing the FID index, make the initial array of size 256. If the array becomes full as you build the index,
resize it by doubling the size. This may be accomplished by using realloc(). You may have to resize the array more
than once.
Feature Name Index
You will implement the feature name index as a hash table, using an array of pointers to struct variables as the physical
structure. The struct variables in the array will store three pieces of information: a string that you construct from the
feature name and the state abbreviation for a feature, an array of file offsets of matching records, and a pointer.
Here's an example of the sort of table you'll be implementing:
Dots represent NULL pointers, and the notation used for the strings just indicates that the two parts are combined into a
single string in some way that you can choose. You are free to combine the feature name and state abbreviation in any way
you like.
Elements are mapped into a hash table by making use of a "magic function" that takes a key value as input and produces a
nonnegative integer value, which is called the hash value of the key. The function is called a hash function. For this
assignment, the key values will be the feature name/state abbreviation pairs.
In the example above, let's assume the array has dimension 1000; in other words, we'd say the hash table has 1000 slots.
The hash function has taken the key ["Wigdahl Farm","IA"] and has produced an integer K (which we do not
know), such that K % 1000 == 1.
That is, the hash function supplies a nonnegative integer, computed in some fashion from the key for some record, and that
integer is modded by the table size to produce a valid table index at which the record will be stored. We call that slot the
home slot for the key.
Ideally, the hash function would never map two different keys to the same table slot. Do not count on achieving that except
in special circumstances. If the hash function does map two different keys to the same table slot, we say the keys have
collided in that slot. For this assignment, we will deal with collisions by simply treating each table slot as a linked list of
index entries (struct variables), where each index entry corresponds to a different key value.
Searching in a hash table proceeds by using the hash function to compute a table index from the key we are interested in,
and then searching the list in that table slot to see if there's an entry for the given key value. In the ideal case, searches in
the hash table are O(1), which is why you are using one.
["Elk Creek","IA"]
{532, 7842, 10943}
["Wigdahl Farm","IA"]
{5830}
["Abbes Fork","IA"]
{3714}
table slots
index entriesCS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 8
The assignment also involves an annoyance: feature names are not necessarily unique, even within the same state. For
example, the GIS database mentioned earlier lists three different places in Virginia named Blacksburg! Since we never
want to put two different entries (struct variables) that store the same key in a hash table, each entry in the table needs to
store the file offset for every record that matches the given key. That's illustrated in the table above with the key field
["Elk Creek","IA"], which has three matches.
There are lots of other, interesting things to say about the general concept of a hash table, but we will leave that for a course
on data structures.
The question remains: what sort of function could we use to compute an integer from a string? There are actually lots of
ways to do this. You will use the following hash function, which is based on one that was implemented as part of the first
UNIX implementation at Bell Labs:
uint32_t hashKey(char* key) {
uint32_t hashValue = 0;
for (int pos = 0; key[pos] != '\0'; pos++) { // use all elements
hashValue = (hashValue << 4) + key[pos]; // shift/mix
uint32_t hiBits = hashValue & 0xF0000000; // get high nybble
if (hiBits != 0) {
hashValue ^= hiBits >> 24; // fold high nybble into second nybble
}
hashValue &= ~hiBits; // clear high nybble
}
return hashValue;
}
Permitted Assumptions and Some Advice
You may assume that:
no GIS record will ever be longer than 500 characters
no line in the command file will ever be longer than 256 characters
the GIS record file and the command file will follow the specified syntax described earlier
The following observations are purely advisory, but are based on my experience, including that of implementing a solution
to this assignment. These comments are advice, not requirements.
First, and most basic, analyze what your GIS system must do and design a sensible, logical framework for making those
things happen.
Second, and also basic, practice incremental development! This is a fairly sizeable program, especially so if it's done
properly. My solution, including comments, approaches 1000 lines of code. It takes quite a bit of work before you have
enough working code to test on full input files, but incremental testing is extremely valuable. Try to think of implementing
the system feature by feature, and test as you go. Implement one index first, and test searches using that index. Then work
on the other index.
Record your design decisions in some way; a simple text file is often useful for tracking your deliberations, the alternatives
you considered, and the conclusions you reached. That information is invaluable as your implementation becomes more
complete, and hence more complex, and you are attempting to extend it to incorporate additional features.
Write useful comments in your code, as you go. Leave notes to yourself about things that still need to be done, or that you
are currently handling in a clumsy manner.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 9
Take advantage of tools. You should already have a working knowledge of gdb. Use it! The debugger is invaluable when
pinning down the location of segfaults; but it is also useful for tracking down lesser issues if you make good use of
breakpoints and watchpoints. Some memory-related errors yield mysterious behavior, and confusing runtime error reports.
That's especially true when you have written past the end of a dynamically-allocated array and corrupted the heap. This
sort of error can often be diagnosed by using valgrind.
Enumerated types are extremely useful for representing various kinds of information, especially about type attributes of
structured variables. For example, if implementing a different sort of system, we might find the following type useful:
// Vehicle.h
enum _VehicleMake {ACURA, CHEVROLET, DODGE, FORD, . . . , VOLVO};
typedef enum _ VehicleMake VehicleMake;
...
struct _Vehicle {
VehicleMake make;
char* model; // could also be an enumerated type
uint16_t year;
char* licenseNum;
char* vin;
...
};
typedef struct _Vehicle Vehicle;
...
Enumerated types let me use descriptive labels in my code, which makes it easier to understand the logic of the code.
You will need to use static tables of structures to organize language information; by static, I mean a table that has static
storage duration, and is private to the file in which it’s created. In some cases, such a table may be initialized directly, with
fixed data, when it is declared. For example:
// VehicleData.c
#define NUMRECORDS 50
static Vehicle VehicleTable[NUMRECORDS] = {
{FORD, "Fiesta", 1978, "LXR 804", . . .},
{GMC, "Safari", 1997, "ZFL 8473", . . .},
...
{CORD, "812 Westchester", 1937, "PAM 445", . . .}
};
In other cases, the table may be declared as a static, file-scoped array, and initialized after your program starts, by reading
data from a file. That will be the case for the index structures in this assignment.
Once a static table has been created, we can provide access to it by writing functions that can be called from other parts of
the system. For example:
// VehicleData.c
uint16_t lookupYearByLicense(char* license) { . . . }
Since this function is implemented in the same file as the table, the function can access the table to perform a search. We
would put the declaration of the function in a header file, so it can be called from elsewhere:
// VehicleData.h
uint16_t lookupYearByLicense(char* license);
Obviously, the sample code shown above does not play a role in my solution. On the other hand, I used this approach to
implement the data structures (arrays) that my indexing depends on. CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 10
Write lots of "utility" functions because they simplify things tremendously; e.g., string manipulators, comparison functions,
command handlers, I/O functions, etc. My solution involves all of those.
Data types, like the structure shown above, play a major role in a good solution. I wrote a number of them, in addition to
the ones I used in indexing.
One key to becoming proficient and productive in C, as in most programming languages, is to take full advantage of the
library that comes with that language.
Explore string.h carefully (Google and find it at pubs.opengroup.org). Useful functions include strncpy(),
strncmp(), memcpy() and strtok(). There are lots of useful functions in the C Standard Library, not just in
string.h.
There are some advanced parsing features in C, involving the concept of a scanset used in a format specifier. Consult the
notes on this, linked from the Assignments page.
Another potentially useful function in the Standard Library, from stdlib.h, performs Quicksort:
void qsort(void *base, size_t nelem, size_t width,
int (*compare)(const void*, const void*));
The interface may seem a bit confusing:
base a pointer to the array that is to be sorted; it's specified as void* so you can use qsort() on an array
holding any kind of data that you like
nelem the number of elements to be sorted
width the size (in bytes) of each element in the array; this is usually specified by using sizeof()
int (*compare) (const void *left, const void *right)
the name of a function that takes two void* parameters and returns an integer value; this function is
used by qsort() to compare the elements in the array; returning < 0, 0 or >0 depending on whether
*left < *right, *left == *right, or *left > *right.
Here's a little example. Recall the Vehicle type defined earlier; suppose we had an array of Vehicle objects, and we
wanted to use qsort() to put them in ascending order by the model field. The following comparison function will do:
int compareModels(const void *left, const void *right) {
const Vehicle* pLeft = (Vehicle*) left;
const Vehicle* pRight = (Vehicle*) right;
return ( strcmp( pLeft->model, pRight->model) );
}
The call to qsort() would look something like:
qsort(vehicleList, nVehicles, sizeof(Vehicle), compareModels);
The use of void* in the implementation of the comparison function is typical, old-school C. It would be dangerous, if a
call to the comparison function passed pointers to anything other than Vehicle objects, and there's no way to check for
that since the use of void* prevents type-checking. On the other hand, since the calls to the comparison function are made
by qsort(), as long as we provide qsort() with an array of Vehicle objects, all will be well.
Now, why might sorting be useful? Since each of the index structures uses an array of struct variables to hold
information, the only way to achieve log(N) searches is if the arrays are sorted on the struct element that is used as the
search key.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 11
One approach would be to build the array, and then sort it. Another would be to keep the array in sorted order as it's built,
by placing each new entry into the array so that everything stays sorted. Either approach is acceptable. The second
approach would be better if your application required adding new elements on the fly, rather than all at once.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 12
What to Submit
You will submit your solution in a single, flat, uncompressed tar file to the Curator, via the collection point C07. That file
must include all of the .c and .h files involved in your solution, and nothing else.
Your submission will be graded by running the supplied test/grading code on it, with several different scripts and data files.
Grading
A tar file with a few scripts and reference logs is posted on the course website. For example:
; Script 1: very small db file, testing only the "exists" command
;
; Execution: gis script01.txt log01.txt
;
db_file VA_Highland.txt
table_sz 500
;
; Existence searches that should find a single match:
exists Devils Backbone VA
exists Jerkemtight Trail VA
exists Opossum Hollow VA
;
; Existence searches that should find no match:
exists Devils Backbone AK
exists McCowan Spring VA
You can compare your log file to the reference logs visually to determine if your results are logically correct:
; Existence searches that should find a single match:
Cmd 1: exists Devils Backbone VA
1 occurrences: Devils Backbone VA
----------------------------------------------------------------------
Cmd 2: exists Jerkemtight Trail VA
1 occurrences: Jerkemtight Trail VA
----------------------------------------------------------------------
Cmd 3: exists Opossum Hollow VA
1 occurrences: Opossum Hollow VA
----------------------------------------------------------------------
;
; Existence searches that should find no match:
Cmd 4: exists Devils Backbone AK
Not found in index: Devils Backbone AK
----------------------------------------------------------------------
Cmd 5: exists McCowan Spring VA
Not found in index: McCowan Spring VA
----------------------------------------------------------------------
You should match the contents of the reference logs more or less exactly. An updated set of test data, with a comparison
tool and shell script will be posted in a week or so.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 13
Pledge:
Each of your program submissions must be pledged to conform to the Honor Code requirements for this course.
Specifically, you must include the following pledge statement in the submitted .c file containing main():
// On my honor:
//
// - I have not discussed the C language code in my program with
// anyone other than my instructor or the teaching assistants
// assigned to this course.
//
// - I have not used C language code obtained from another student,
// the Internet, or any other unauthorized source, either modified
// or unmodified.
//
// - If any C language code or documentation used in my program
// was obtained from an authorized source, such as a text book or
// course notes, that has been clearly noted with a proper citation
// in the comments of my program.
//
// - I have not designed this program in such a way as to defeat or
// interfere with the normal operation of the Curator System.
//
//
//
We reserve the option of assigning a score of zero to any submission that is undocumented
or does not contain this statement.
Change Log
Version Posted Pg Change
2.00 April 1 Base document.
2.10 April 1 Multiple changes in phrasing but no adjustments to requirements.CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 14
Orthodromic (great-circle) Distance
Given a sphere, two points on its surface are antipodal if the line connecting the points contains
the center of the sphere. u and v are antipodal.
Given a sphere, a great circle is a circle whose radius equals that of the sphere and having the
same center as the sphere.
Given two points P and Q on the surface of a sphere, if the two points are not at opposite ends of
a diameter of the sphere, there is a unique great circle containing P and Q.
The length of the shorter arc between P and Q is the orthodromic distance between them; that's shown in red in the
diagram. (Of course, if P and Q are antipodal, there are infinitely many great circles containing them, and the distance
between them is just half the circumference of the sphere.)
Now, there's a simple fact about the length of an arc of a circle. Given two points on a circle of
radius R, A and B, let the angle formed by the radii containing A and B be Θ. This is called the
central angle.
Then the length of the arc L between A and B is simply
R? .
But, how do we calculate the central angle of that arc, if the points are specified by giving their coordinates as longitude
and latitude values? The key lies in spherical trigonometry.
First of all, let's pass a plane through the center of the sphere (for our GIS system, this
would be the plane determined by the equator). Then, let's choose a line in that plane from
the center of the sphere to the surface (for our GIS system, that would simply be the zero
meridian).
Given the point P, we have a unique right triangle determined by the radius containing P
and a perpendicular from P to the plane of the equator:
The angle lambda between the zero meridian line and the base of that triangle is simply the
longitude of P.
The angle ? between the base and hypotenuse of the triangle is simply the latitude of P.
And, we can calculate the central angle, Δσ, by using the spherical law of cosines:
Δσ = arccos sin1 2 1 2 × sin + cos × cos × cos?Δλ??
where
is the latitude of
is the longitude of
is the latitude of
is the longitude of
Note that all the angles are expressed in radians (remember those?); you'll have to convert since the GIS data files give
longitude and latitude in degrees. And, we'll use the decimal values for longitude and latitude that are given in the GIS data
files, because those a slightly more precise than the values you'd get by converting the DMS values.
Figure 1
Figure 2
Figure 3CS 2505 Computer Organization I C07: GIS System in C
Version 2.10 This is a purely individual assignment! 15
Standard C does not define a value for π, and you need a value for converting degrees to radians. So you should add the
following statement in your code:
#define PI 3.14159265358979323846 // best approximation as double
Now, the Earth is not a sphere. In fact, the radius through the poles is about 6334.439 km, while the radius at the equator is
about 6378.137 km. For the purposes of this assignment, we will trust the IUGG and include the following statement:
#define RADIUS 6371.0088 // IUGG mean radius of Earth, in km
Credits
Figure 1, 3:
By CheCheDaWaff - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48187293
Figure 2:
By Lfahlberg - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/wiki/File:Angle_central_convex.svg