ADT for a Directed Graph (digraph)
- 首页 >> 其他1
The purpose of this assignment is to implement an ADT for a Directed Graph (digraph) and associated
operations in C. PA3 will utilize your List ADT once again, and some parts of this project will look very
familiar if you completed PA2.
The Adjacency List representation of a digraph also consists of an array of Lists; However, the edges in a
digraph are directed (like one-way streets); there may be an edge (3, 1) between vertex 3 and vertex 1
although there is no edge (1, 3) between vertex 1 and vertex 3. You will create a Digraph ADT that
represents a directed graph as an array of Lists. As in PA2, each vertex is associated with an integer label
in the range 1 to numVertices, where numVertices is the number of vertices in the digraph.
Your client program will use this Digraph ADT to perform various operations on the digraph that are
described below. The client program using your Digraph ADT will be called DigraphProperties, and
will take two command line arguments (here % denotes the unix prompt):
% DigraphProperties input_file output_file
For PA3, you are to write a makefile that creates the executable file, called DigraphProperties, similar to
the makefiles that you used for PA1 and PA2. Include a clean utility in your makefile.
2
Input and Output File Formats
The first line of the input file describes the digraph, and the other lines describe operations to be
performed on the digraph. Most of these operations simply print values returned by functions
implemented in a Digraph ADT that you will implement in Digraph.c and Digraph.h
The first line starts with an integer that we’ll call numVertices that tells you the number of vertices in the
digraph. The rest of that line gives pair of distinct numbers in the range 1 to numVertices, separated by a
space. These numbers are the vertices for an edge. There is a comma between numVertices and the first
edge, and a comma between edges. We’ll also put a space after each comma for readability. All edges
are directed.
4, 1 3, 2 4, 3 4, 4 3, 3 2
Note that the same digraph could be represented by each of the following input lines:
4, 3 2, 4 3, 1 3, 3 4, 2 4
4, 3 2, 4 3, 1 3, 3 4, 2 4, 1 3
even though the edge (1, 3) appears in the last digraph twice.
However, the following digraph is different:
4, 3 2, 4 3, 1 3, 3 4, 2 4, 3 1
because edges {1, 3} and {3, 1} are different edges.
The rest of the lines in the input file that correspond to digraph operations on operands. Each line begins
with a keyword, which is followed by the operands. You should output the entire input line into the
output file as a separate line. That line should be followed by another line that just has the result of the
operation. If an input line does not follow one of the specified formats, the output on the second line
should just say ERROR, and processing of the input file should continue. For example, it’s an error if the
operation isn’t one of the legal operations (capitalization matters), or if the operation has the wrong
number of operands, or if an operand that’s supposed to be a vertex isn’t a number between 1 and
numVertices.
So the output file should always have twice as many lines as the input file (not counting the line that
describes the digraph). For example, if an input file has 7 lines, the first of which is the digraph, then the
output should have 12 lines in it, containing each digraph operation lines, each followed immediately by
the result of that operation (or ERROR).
As in PA2: If the digraph in the first line of the input file is illegal, then you should just print out that
first line, followed by another line with the word ERROR on it. Do not process any subsequent lines in
the input file. What’s an illegal digraph? A digraph is illegal if the number of vertices isn’t positive, or if
an edge specifies a vertex that doesn’t exist in the digraph. For example, if there are 6 vertices, both
vertex 0 and vertex 7 are illegal. Also, there shouldn’t be any “self-loop” edges between a vertex and
itself, so an edge (3, 3) would also make the digraph illegal.
3
Here are the operation tokens, their operands, and descriptions of the results. All of these operations
simply print values returned by functions implemented in the Digraph ADT that you will implement in
Digraph.c and Digraph.h.
• PrintDigraph takes no operands. It prints out the current digraph, in the same format as an input line.
Each edge should be printed out exactly once, following the same format as the input line for the
digraph, starting with the number of vertices, then a comma, then the edges.
For PA3, you must print the edges sorted by source and then by destination. If u < v, print all the
edges whose source is vertex u before all the edges whose source is vertex v. Also, if (u, v1) and
(u, v2) are edges, and v1 < v2, print (u, v1) before (u, v2).
Strong suggestion: Keep the destinations for each vertex as a sorted list. That will make it easy to
print out all the edges in the current digraph sorted as described.
• GetOrder takes no operands. It returns the total number of vertices in the current digraph.
• GetSize takes no operands. It returns the total number of edges in the current digraph.
• GetOutDegree takes a vertex u as operand. It returns the number of vertices that are outgoing
neighbors of u in the current digraph. The out degree of a vertex u is the number of vertices v such
that (u, v) is an edge in the digraph.
• AddEdge takes two vertices u and v as operands. It adds the edge (u, v) to the current digraph. If that
edge didn’t exist in the current digraph, it returns 0. If that edge already existed in the current digraph,
it returns 1, since no action was taken. (This is not treated as an error.)
• DeleteEdge takes two vertices u and v as operands. It deletes the edge (u, v) from the current digraph.
If that edge already existed in the current digraph, it returns 0. If that edge didn’t exist in the current
digraph, it returns 1, since no action was taken. (This is not treated as an error.)
• Distance takes a pair of vertices u and v as operands, and outputs the distance from u to v in the
current digraph, and INF if there is no path from u to v. Of course, this must be a directed path. Yes,
if the operands u and v are the same vertex, then there is a path from u to v that has no edges in it, so
the output will be 0. The distance from u to v is the length of the shortest path from u to v.
• Acyclic takes no operands. It outputs YES if the current digraph is acyclic, and NO otherwise.
• TopoSort takes no operands. It outputs a topological sort of the current digraph, if the current digraph
is acyclic. (There may be more than one correct topological sort.) However, if the digraph is not
acyclic, it outputs IMPOSSIBLE.
4
Example: For an input file that consists of the following lines:
4, 1 3, 2 4, 3 4, 4 3, 3 2
PrintDigraph
GetSize
Distance 1 4
Distance 3 1
Acyclic
TopoSort
GetOutDegree 4
GetOutDegree 5
DeleteEdge 2 4
Acyclic
DeleteEdge 1 2
DeleteEdge 4 3
AddEdge 1 2
Acyclic
TopoSort
the output file should be:
PrintDigraph
4, 1 3, 2 4, 3 2, 3 4, 4 3
GetSize
5
Distance 1 4
2
Distance 3 1
INF
Acyclic
NO
TopoSort
IMPOSSIBLE
GetOutDegree 4
1
GetOutDegree 5
ERROR
DeleteEdge 2 4
0
Acyclic
NO
DeleteEdge 1 2
1
DeleteEdge 4 3
0
AddEdge 1 2
0
Acyclic
YES
TopoSort
1 3 4 2
5
Your Digraph ADT will be implemented in files Digraph.c and Digraph.h. Digraph.c defines a struct
called DigraphObj, and Digraph.h should define a type called Digraph that points to this struct.
Your DigraphObj struct is required to have the following fields. It is okay for you to have other fields, but
you should document them in your README file.
• numVertices, the number of vertices in the digraph, called the order of the digraph.
• numEdges, the number of edges in the digraph, called the size of the digraph.
• An array of Lists, whose jth element contains the outgoing neighbors (destinations) from source
vertex j, the vertices k such that (j, k) is an edge in the digraph.. This is the outgoing Adjacency
List for vertex j. You’ll use your List ADT from PA1 and PA2 to represent the outgoing
Adjacency List for each vertex. We strongly suggest that you keep the outgoing neighbors in
sorted order, to make printing the digraph simpler.
• Other arrays similar to the marks that you used in PA2. You’ll have to determine yourselves what
arrays (if any) you’ll need for PA3.
Since indexes of C arrays begin at 0, you might choose to use arrays that have length numVertices +1, but
only use indices 1 through numVertices. That’s so that array indices can be directly identified with vertex
numbers (and index 0 is ignored). Of course, if you prefer to have arrays of length numVertices, rather
than numVertices + 1,that works, as long as you always remember to look at position j-1 when you’re
accessing information about vertex j.
6
Your Digraph ADT should export the following operations through the file Digraph.h:
/*** Constructors-Destructors ***/
Digraph newDigraph(int numVertices);
// Returns a Digraph that points to a newly created DigraphObj representing a digraph which has
// n vertices and no edges.
void freeDigraph(Digraph* pG);
// Frees all dynamic memory associated with its Digraph* argument, and sets
// *pG to NULL.
/*** Access functions ***/
int getOrder(Digraph G);
// Returns the order of G, the number of vertices in G.
int getSize(Digraph G);
// Returns the size of G, the number of edges in G.
int getOutDegree(Digraph G, int u);
// Returns the number of outgoing neighbors that vertex u has in G, the number of vertices v such
// that (u, v) is an edge in G. Returns -1 if v is not a legal vertex.
List getNeighbors(Digraph G, int u);
// Returns a list that has all the vertices that are outgoing neighbors of vertex u, i.e.,
// a list that has all the vertices v such that (u, v) is an edge in G.
// There is no input operation that corresponds to getNeighbors.
/*** Manipulation procedures ***/
int addEdge(Digraph G, int u, int v);
// Adds v to the adjacency list of u, if that edge doesn’t already exist.
// If the edge does already exist, does nothing. Used when edges are entered or added.
// Returns 0 if (u, v) is a legal edge, and the edge didn’t already exist.
// Returns 1 if (u, v) is a legal edge and the edge did already exist.
// Returns -1 if (u, v) is not a legal edge.
int deleteEdge(Digraph G, int u, int v);
// Deletes v from the adjacency list of u, if that edge exists.
// If the edge doesn’t exist, does nothing. Used when edges are deleted.
// Returns 0 if (u, v) is a legal edge, and the edge did already exist.
// Returns 1 if (u, v) is a legal edge and the edge didn’t already exist.
// Returns -1 if (u, v) is not a legal edge.
You probably will want to have other functions in Digraph.c/Digraph.h that are similar to the unvisitAll,
getMark and setMark functions in PA). But you’ll have to figure out what those functions are yourselves.
7
/*** Other operations ***/
void printDigraph(FILE* out, Digraph G);
// Outputs the digraph G in the same format as an input line, including the number of vertices
// and the edges. The edges should be in sorted order, as described above.
void distance(FILE* out, Digraph G, int u, int v);
// Outputs the distance between vertices u and v if there is a directed path between them in the
// digraph. Otherwise, outputs INF
void acyclic(FILE* out, Digraph G);
// Outputs YES if the digraph is acyclic. Otherwise, outputs NO.
void topoSort(FILE* out, Digraph G);
// Outputs a topological sort of the digraph. If the digraph is not acyclic, outputs IMPOSSIBLE.
In addition to the above prototypes, Digraph.h will define the type Digraph, as well as #define constant
macros for constants that you decide that you need to implement the functions described above.
8
What does DigraphProperties.c do?
DigraphProperties reads in the first line of the input file, and uses newDigraph() and addEdge() to create a
digraph that corresponds to that line, as described above. When it reads in a subsequent line whose
operation token is one of GetOrder, GetSize or GetOutDegree, it calls the corresponding function in
Digraph.c (using the digraph that was read in as the digraph argument), and outputs the value returned.
For other operations tokens (Distance, Acyclic, TopoSort), it calls the corresponding function in Digraph.c
which performs its own output. Obvious requirements about number of operands should be checked in
DigraphProperties.c; obvious requirements about operand values in should be checked in Digraph.c. (If G
isn’t a digraph, your program has a bug, and should exit.)
About distance: Find the distance between two vertices of a digraph using BFS.
About acyclic: A digraph is acyclic if and only if a DFS of the digraph produces no back edges, as
described in Section 22.3 of CLRS. You’ll have to figure out how to implement this.
• About topological sort: You may do topological sort using either DFS either from in-degree 0 vertices or
(more traditionally) from all vertices. We’ll accept any legal topological sort result, not just some specific
results.
9
Submitting your Solution
Since the Digraph ADT includes uses a List to represent the outgoing neighbors of each vertex, and
getNeighbors() returns a List, the file Digraph.h should #include the header file List.h. (See the handout
C Header File Guidelines, which is posted under Resources→General Resources, for commonly accepted
policies on using .h files.) What files should Digraph.c and DigraphProperties.c include?
You will submit 7 files for this project in the directory PA3.
List.c, List.h, Digraph.c, Digraph.h, DigraphProperties.c, makefile, README
Added in readme: Your README must briefly explain your algorithms for of distance, acyclic and topoSort, and
provide complexity of your algorithms (e.g., O(n+m) ).
• We’ll deduct credit for inefficient algorithms.
Based on the makefile from PA1, you should know how to construct a makefile that will create an
executable called DigraphProperties, and will include a clean utility that removes all object files,
including DigraphProperties. (There is no ListClientfor PA3.) We will post some public input
files that you can use for testing, but we will also test your program on other private input files.
Remember that the compile operations mentioned in the Makefile must invoke the gcc compiler
with the flags -std=c99. You may develop your programs on any system, using any editor or IDE.
But it is a requirement of this and all other assignments in C that your program compile without
warnings or errors under gcc, and run properly in the Linux computing environment on the UNIX
Timeshare unix.ucsc.edu provided by ITS. In particular, you should not use the cc compiler. Your
C programs must also run without memory leaks. Test them using valgrind on unix.ucsc.edu by
doing:
% valgrind --leak-check=full program_name argument_list
You also must submit a README file for this (and every) assignment. The README file should list each
file (other than the README file itself) that you submitted, together with a brief description of its role in
the project, and any special notes to the people grading your assignment. The README is essentially a
table of contents for the assignment. For PA3, you must also describe all the fields that you have in your
DigraphObj struct besides the ones provided to you (numVertices, numEdges and the array of Lists)
earlier in this document. Also, your README must briefly explain your algorithms for distance, acyclic
and topoSort, and provide complexity of your algorithms (e.g., O(n+m) ). We’ll deduct credit for
inefficient algorithms.
10
Note1 on Lecture 16 and PA3
This is the first of 3 notes on PA3 that should clarify and help you complete PA3.
An edge means that the software package that's the source of the edge must be installed before the software package
that's the destination of the edge. So (for example) dpkg must be installed before coreutils, and coreutils must be
installed before libbz2. I said that backwards during the Feb 15 presentation of Lecture 16, and I wanted to fix that.
Topological sort uses Depth-First Search, produces an ordering for installation of the software packages that always
install a software package before other software packages that depend on it. So the ordering shown on what is now
Slide 45 of Lecture 16 for package installation is an ordering that works, starting with dpkg and ending with
multiarch_support:
dpkg
coreutils
tar
libbz2
libselinux1
multiarch_support
That topological sort (different from ordinary sort) is obtained by starting with an empty stack, and pushing a vertex onto
the stack when it gets marked as AllDone during the DFS of the graph. So multiarch_support is the first vertex pushed
onto the stack (hence installed last), and dpkg is the last vertex on the stack (hence installed first).
The Finish time for a vertex is the term that it gets marked as AllDone. Vertices that have smaller "Finish times" will
be lower down in the stack that vertices that have larger "finish times". For example, the "Finish time" for libbz2 is 6,
and the "Finish time" for coreutils is 10. That means that libbz2 will be lower down in the stack than coreutils.
Time isn't clock time; it's just an integer that starts at 0 and keeps getting incremented. We used Finish time to prove
that topological sort works; we haven't used Start time yet. But (since you have to implement topological sort in PA3)
it's worth noting that you don't have to keep track of time in your implementation! Instead, you can just push vertices on
your stack when they are marked as AllDone. (Of course, vertices that are marked AllDone are never visited a second
time.) And the topological sort result is the list of vertices in the stack, read from the top of the stack down. Of course,
in PA3 you'll just be using vertex numbers, not strings such as "coreutils".
Note2 on Lecture 16 and PA3 will discuss DFS for Undirected Graphs, and explain how to determine if an Undirected
Graph is Acyclic. The last note, Note3, will continue the discussion for Directed Graphs.
11
Note2 on Lecture 16 and PA3
This is the second of 3 notes on Lecture 16 and PA3 that should clarify Lecture 16 and help you complete PA3. This
note discusses DFS and detecting cycles in Undirected Graphs. Note3 discusses Directed Graphs.
For Undirected Graphs, let's review DFS (which we've already discussed in class), as well as how to use DFS to
test whether an Undirected Graph is Acyclic.
As we saw, to do a DFS of an Undirected Graph G, you begin by marking all vertices as Unvisited. Then you perform
DFS recursively starting at some vertex v.
DFS(v):
Mark v as InProgress
For all vertices w that are Neighbors(v),
If w is Unvisited
DFS(w)
Mark v as AllDone
Return
That's a recursive DFS; MA3 had you write the equivalent non-recursive DFS.
But that doesn't complete DFS for G. If Undirected Graph G is not Connected, then it has more than one Connected
Component, and you've only "visited" the Connected Component that has vertex v in it. So we need to perform DFS(v)
for every vertex v that's unvisited. And this takes time O(n+m), where n is the number of vertices in G and m is the
number of edges in G. [Reason: Every edge is traveled once in each direction, and every vertex has to be touched
once as we loop through all the vertices to do DFS on each unvisited vertex.]
Now how can we use this algorithm to decide if there's a cycle in Undirected Graph G? There's a cycle in G if an only if
during DFS we reach a vertex that's InProgress. That means that we're following a path in the DFS tree and there's an
edge that goes back to a vertex on that same path! Such an edge is called a "Back Edge"; see page 609 of CLRS. If
we find a Back Edge, then of course there must be a cycle in G. And if there is a cycle in G, then we must find a Back
Edge. (That may be less obvious.)
So here's how you modify DFS to tell if G has a cycle. In what follows, Neighbors of a vertex v means Outgoing
Neighbors, the vertices w such that there is an edge (v,w) in the digraph.
Mark all vertices as Unvisited.
Boolean DFS_Test_Acyclic(v):
Mark v as InProgress
For all vertices w that are Neighbors(v)
If w is InProgress
Return(TRUE) // Found a Back Edge
If DFS(w)
Return(TRUE) // Found a Back Edge
Mark v as AllDone
Return(FALSE) // Didn't find a Back Edge
If the value returned is TRUE, G has a cycle. If the value returned is FALSE, we didn't find a cycle. But that doesn't
prove that G is Acyclic, since it could have more than one Connected Component.
Once again, we have to loop going through all the vertices of G, performing DFS_Test_Acyclic(v) for
each Unvisited vertex so that we can determine whether any Connected Component of G has a cycle. As soon as
DFS_Test_Acyclic(v) returns TRUE for a vertex v, we know that there's a cycle in G. But if DFS_Test_Acyclic(v)
returns FALSE for every vertex v, then G is Acyclic.
In Note3, we'll continue this discussion for Directed Graphs, since PA3 addresses Directed Graphs.
12
Note3 on Lecture 16 and PA3
This is the third of 3 notes on Lecture 16 and PA3 that should clarify Lecture 16 and help you complete PA3. This note
discusses DFS and detecting cycles in Directed Graphs.
In this note, we'll assume that G is a Directed Graph (Digraph). Of course, edges in a directed graph are "one-way".
If there is an edge (u, v) in G, then (v, u) might also be an edge in G, or (v, u) might not also be an edge in G.
You should already know how to do Topological Sort on a Digraph using DFS based on Lecture 16 and Note1 in this
series. So we'll explain how to determine whether a Digraph is Acyclic. You might be able to figure this out for
yourselves based on Lecture 16 plus Note1 and Note2. Try to figure this out for yourselves before reading further ...
... Okay, if you're still reading, here's a simple way to determine whether a Digraph is Acyclic. We're going to describe
this in words, rather than giving the algorithm, since you already know how Topological Sort on a Digraph using DFS
works. And once again, it's not important to keep track of Start time and Finish time, but it is important to know whether
a vertex is marked as Unvisited, InProgress or AllDone. (By the way, CLRS uses the terms White, Gray and Black to
describe these marks.)
Begin by marking all vertices as Unvisited. Now start off at a vertex that has in-degree 0. If there are no such vertices,
there you know that the graph has a cycle (as we discussed in class). Perform Topological Sort, counting how many
Unvisited vertices you've reached, including the vertex that you started from (that has in-degree 0). Of course, you're
following directed edges when you go to (outgoing) neighbors of a vertex, since this is a Directed Graph.
• If a neighbor is Unvisited, you're going to mark it as InProgress and visit it using DFS (which should return count of
number unvisited vertices it reached, which gets added in).
• If a neighbor is AllDone, you don't have to do anything with that neighbor. That doesn't mean that there's a cycle.
We'll explain why below; look for ***.
• But if a neighbor is InProgress, then the Digraph has a cycle. No need to keep looking; you're done. Return -1
instead of count. (If your DFS function invocation returns -1, its caller should also immediately return -1, rather
than adding that to the unvisited count.)
If DFS returns -1, then there's a cycle. If the DFS invocation from the starting vertex (which has in-degree 0) doesn't
return 0, no cycle was found. But remember the number that it returned, the number of Unvisited vertices that were
found starting at that in-degree 0 vertex.
Of course, there may be multiple vertices that have in-degree 0. Do the same thing for each of the vertices that have
in-degree 0, counting the number of Unvisited vertices that you found. Of course, you keep old marks; you only
marked vertices as Unvisited at the beginning, before you took care of the first vertex that has in-degree 0. And as
soon as you discover a cycle, you're done; the graph has a cycle.
Now suppose that you completed this for every vertex that has in-degree 0. That does not prove that the graph is
Acyclic. Example: Suppose that you had 6 vertices, with edges (A, B), (B, C), (C, A), (D, E) and (D, F). The only
vertex that has in-degree 0 is D; it's count is 3 (for D, E and F). But the graph has a cycle A B C A.
So you have to add up the counts for all the vertices that have in-degree 0. If that count equals the number of vertices
in G, then the digraph is Acyclic. If that count doesn't equal the number of vertices in the digraph, then the graph has a
cycle.
Note that the basic idea is very similar to the way you detect cycles in an Undirected Graph: If DFS ever reaches a
node that's InProgress, then there's a cycle. But because it's a Directed Graph, that's not the entire algorithm.
Two additional important comments:
13
• *** Why is it okay for us to reach a vertex that's marked AllDone during the DFS? That's because there could be a
vertex that is reachable from two different vertices that have in-degree 0. Here's a simple example with that
property: The 3 vertex digraph that has edges (A, C) and (B, C). Both A and B have in-degree 0, and C is
reachable from both of them.
• What happens if you do topological sort on a graph that has a cycle? You might have an infinite loop going
around the cycle forever, if you're careless; that's very bad. [Consider what would happen on the 5 vertex graph
with edges (A, B1) (B1, B2), (B2, B3), (B3, B4), (B4, B1).]
Or you might not reach all of the vertices in the graph, as with the6 vertex graph (A, B), (B, C), (C, A), (D, E) and
(D, F) mentioned earlier. That's also bad.
But in PA3, the output for topoSort on a graph that has a cycle is supposed to be IMPOSSIBLE, so you'll have to
handle topoSort correctly on a graph that isn't Acyclic. Perhaps you can do topological sort and test for cycles in
the same algorithm? Hmmm ...
• And if you want to use Start and Finish time (instead of Unvisited, InProgress and AllDone marks), then you may
do that, and you can figure out if a vertex is Unvisited, InProgress or AllDone based on Start and Finish. Start and
Finish times are initialized for all vertices to a special "NIL" value. (You might use a -1 constant for NIL, if all times
are non-negative.)
• A vertex is Unvisited if Start is NIL. (In that case, Finish also is NIL.)
• A vertex is InProgress if Start isn't NIL but Finish is NIL.
• A vertex is AlDone if Finish isn't NIL. (In that case, Start also isn't NIL.)
So there no need to have both a) Start and Finish time and b) Unvisited, InProgress
and AllDone marks for PA3; either will do.