B-Tree辅导、辅导C++B-Tree编程

2018.04.19 - 首页 >> C/C++编程

Developed by Jerry Cain of Stanford University and adapted by Jingling Xue.

We’ve learned that C++ considers the preservation of type information to be much more

impor-tant than C ever does. With that understanding, C++’s support for generic containers and

algorithms is much more sophisticated, and you should practise writing template containers in

C++—afterwards, you’ll be in a better position to compare and contrast the two languages,

because you’ll have written serious, industrial-strength data structures in C++. In general, it’s

hard to write quality generic code no matter what language you’re using!

1. Aims

In this assignment you will get a chance to implement a reasonably complex, generic

C++ container, together with proper iterators. You will practice working with template

classes and everything this entails. You will write a bi-directional iterator from scratch,

use iterator adaptors to provide reverse iterators. You will also employ lots of the C++

features you have learnt about in this course. You will also have a second chance to

practice working move semantics if you didn’t get it right in Assignment 3.

2. The B-Tree Container

B-Trees are generalised binary search trees, in that a binary-search-tree-like property is

maintained throughout the entire structure. The primary difference—and it’s a big

one—is that each node of the tree has many children, from a handful to thousands.

Actually, it’s the number of client elements stored within the node that determines the

number of children. In the same way that a single element partitions a binary search tree

into two sub-trees, a

B-Tree node storing four

clients elements

partitions the B-Tree into

five sub-B-Trees. If the

four elements are stored

in sorted order, it’s very

easy to constrain what

type of elements can

reside in each of the five

sub-B-Trees.

Check out the B-Tree node

we’ve drawn on the next

page. Note that C, J, S, and W

split the rest of the B-Tree

into sub-B-Trees that store elements A through B, D through I, K through R, T through

V, and X through Z. This B-Tree property is maintained throughout the entire structure.

Think of each node as a mother with quintuplets instead of twins.

The above picture implies that the stored elements are characters, but any element

capable of comparing itself to others can be stored in a B-Tree. And the decision to store

four keys per node was really arbitrary. While all nodes will store the same number of

elements (except along the fringe—sometimes there just aren’t enough elements to fill up

the fringe of the B-Tree), the actual node size can be tweaked to maximise performance.

In practice, the number is typically even, and is chosen so that the total amount of

memory devoted to one node is as close to a memory page size as possible. That way, all

of your elements are likely to be right next to each other in memory, and the operating

system maximises the number of elements in the cache at any one time.

We’re not going to burden you with those optimisations though. Your task is to

implement a generic B-Tree template capable of storing any type whatsoever. Here’s a

very brief look at the template interface. You’ll need to provide implementations for the

constructor and destructor, the insert operation, the two versions of find (which are

precisely the same, save the const versus non-const business), and the output

operator <<. You will also need to provide full, explicit copy control using value

semantics, i.e., no shared pointees.

template <typename T>

class btree {

public:

btree(size_t maxNodeElems = 40);

btree(const btree<T>&);

btree(btree<T>&&);

btree<T>& operator=(const btree<T>&);

btree<T>& operator=(btree<T>&&);

friend ostream& operator<< <> (ostream&, const btree<T>&);

// typedef’s for iterators and declarations

// for begin(), end(), rbegin(), rend() and

// cbegin(), cend(), crbegin() and crend() go here

iterator find(const T&);

const_iterator find(const T&) const;

pair<iterator, bool> insert(const T&);

// make sure your destructor does not leak memory

// (if raw pointers are used)

~btree();

private:

// your internal representation goes here

};

The btree.h header file tells you everything you need to know about what you’re

implementing and what you aren’t. You must implement what’s outlined above, and in

doing so, you should write clean, compact C++ code that you’d be proud to show mum

and dad.

3. Implementation Details

3.1 Internal Representation

You are free to choose whatever suitable STL container to to represent a b-tree node. As

for the pointers linking a node with its child nodes and parent node, you can either use

raw pointers or smart pointers. This is completely up to you. However, either raw

pointers or smarter pointers must be used. failure to do so will result in a final mark of 0 for

this deliverable.

3.2 Insertion

Remember that you are not required to balance the B-Tree.

We’re leaving the details of the concrete representation up to you, but the details of

insertion are complicated enough that you deserve a few illustrations. Once you have

insertion down, search is more or less straightforward.

For the purposes of illustration assume that nodes are engineered to store up to four

characters. Initially, a B-Tree is empty, but after inserting two elements, the state of the

tree would look as follows:

Because there are only two elements in the tree, we needn’t give birth to any child nodes.

In fact, it won’t be until we saturate the root node above with two more elements before

the addition of new elements will mandate the allocation of a child.

To insert the letter P at this point would require that the X be shifted over to make room.

The node is designed to store a total of four elements, so the P really belongs in between

the M and the X:

Throw in a G for good measure, and voilà, you have a saturated node of size four:

At this point, the node is structured this way for the lifetime of the B-Tree. But now that

it’s completely full, it will need to give birth to child nodes as needed.

Consider now the insertion of the letter T. Since five’s a crowd, a new node—one capable

of storing all types of characters ranging from Q through W—is allocated, and a pointer

to this node is recorded as being ‘in between’ elements P and X of the root:

Curious how the tree grows after we insert a B, a Z, an N, and then an R? Check out the

following film in slow motion:

Note that each of the new nodes was allocated because the first element inserted into it

was required to reside there. T had to go in the node stemming between P and X because

anything alphabetically in between P and X needs to be placed there. B is inserted before

G, Z after X and N between M and P. Finally R is T’s sibling.

As more and more characters are inserted, it’s likely that even the child nodes will

saturate, and when that happens, each of the children starts its own family. Consider the

insertion of S and W into the Frame-5 B-Tree above. R and T will allow S and W to join

them:

The insertion of Q and V at this point would require the introduction of third-level

nodes, as with:

In theory, there’s no limit whatsoever to how deep the tree can get. At the risk of

repea-ting ourselves and seeming monotonous, let us say some important things again:

The tree above stored four elements per node, but B-Trees can store up to any

prede-fined maximum. The default value of the constructor argument implies that you

should store a default of 40 elements per node. Of course, the code you write should

work for any value greater than 0.

The tree above stored characters, but the B-Tree is templated, so we could have just as

readily used a B-Tree of strings or doubles or records. Of course, the code you write

shouldn’t be character-specific, but general enough to store any type whatsoever. That’s

where templates come in.

The underlying implementation of the B-Tree is for you to decide, but the

implemen-tation certainly must make use of a multiply-linked structure in line with the

drawings above. Failure to do so is likely to result in a nice, round mark for this

deliverable (0!). How do you store the client elements and the child nodes? That’s your

job.

The insert function returns a pair that is a class template provided in the C++ STL. The

first element in the pair is an iterator positioned at the element inserted. The second

element is a Boolean value indicating whether the insertion is successful or not. The

second element is false if the element to be inserted is found already in the B-Tree and

true otherwise.

3.3 Searching

The find function returns an iterator positioned at the element found in the B-Tree. If

the element being searched is not found in the B-Tree, an iterator that is equal to the

return value of end() is returned.

3.4 Copy Control

All copy-control members must be implemented correctly. The move constructor and

move-assignment operators are implemented in a few lines each. For a moved-from

btree object, you should know by now how to put it in a state correctly according to the

C++14 requirement. For this assignment, a moved-fromm b-tree should be empty.

3.5 Iterators

You are required to implement a bonafide bi-directional iterator type capable of doing

an inorder walk of the B-Tree. An inoder traversal will allow all values in a B-Tree to be

printed in the increasing order of their values, where the comparison operator is defined

by the type of the elements in the B-Tree. Since the iterators are bi-directional they will

also support the reverse inorder traversal. You MUST implement your iterators as an

external class (btree_iterator), failure to do so will result in a final mark of 0 for this

deliverable.

The iterators will come in const and non-const forms, just like they would for any

built-in STL container. The iterators should respond to the *, ->, ++, --, == and !=

operators and be able to march over and access all of the sorted elements in sequence,

from 1 to 32767, from “aadvark” to “zygote”, etc., as well as support backward

move-ment. The challenge is working out how ++/-- behave when a neighboring

element resides in another part of the B-Tree.

Your iterators should be appropriately declared as bi-directional. You should implement

and test the complete bi-directional iterator functionality. This includes pre/post

incre-ment and decrement. The post-increment/decrement operations should return a

copy of the value pointed to before the increment/decrement is performed.

It can be argued that the non-const iterators to a B-Tree are dangerous, since changing

the values of elements is likely to break the B-Tree invariant. This is indeed the case, but

you still have to implement this feature to develop some practical experience and we

will assume the responsibility. Your non-const iterator will be tested, e.g., increment

every value in the B-Tree by one. It is important that you get to practice writing a

non-const iterator.

In addition, you are to implement const and non-const forms of reverse iterators.

These reverse iterators should be returned by the rbegin() and rend() B-Tree

mem-ber functions. You should use iterator adaptors to implement your reverse

iterators. Once you have managed to make the const and non-const forms of the “forward”

iterators work for you, it takes usually a few extra minutes to implement the reverse iterators by

using iterator adaptors.

You can now implement the C++14 begin and end member functions, cbegin(), cend(),

crbegin() and crend(). This can be done in a few minutes by implementing them in terms of the

member functions that you have already implemented for the “forward” and reverse iterator.

3.6 Output Operator

You are to implement the output operator for the B-Tree class. This operator should just

print the values of the nodes in the B-Tree in breadth-first (level) order. The values

should be separated by spaces. No newline is to be output. For example, the output

operator would print the following when called on the last version of the tree in 3.1:

G M P X B N R S T W Z Q V

You may assume that the types stored in your B-Tree themselves support the output

operator. Note that we will use the output operator in combination with the iterator

traversal to ensure that your B-Tree structure is above-board correct. This means that

you must construct your B-Tree exactly as indicated, any other approach, including any

balancing scheme, is likely to lead to failed tests.

4. Testing the B-Tree

We’ve provided a simple C++ program that’ll exercise your B-Tree just enough to start

believing in it when it passes. It’s hardly exhaustive, as it only builds B-Trees of integers.

It does, however, do a good job of confirming that find and insert do their jobs, and it

does an even better job of showing you how quickly B-Trees support the insertion of

roughly 5,000,000 random integers in less than 90 seconds. Try that with a hash table or a

binary search tree and you’ll see it flail in comparison. This is cool and you should think

so too! Exciting!

Apart from this test program, a couple of other simple tests are provided.

You are responsible for making sure your B-Tree is bullet proof, and if some bug in your

code isn’t flagged by our tests, that’s still your crisis and not ours. You should try

buil-ding your own test programs to make sure that everything checks out okay when

you store doubles, strings, and structures. This is a great opportunity to try some unit

testing, because the class you’re testing is very small and easily stressed.

Your code will be compiled together with a test file containing the main function and

with these compiler flags:

% g++ -Wall -Werror -O2 –std=c++14 -fsanitize=address...

No matter where you do your development work, when done, be sure your deliverable works on

the school machines.

5. Getting Started

You are provided with a stub that contains the files you need to get started on this

deliverable. The stub is available on the subject account. Login to CSE and then type

something like this:

% mkdir -p cs6771/btree

% cp ~cs6771/soft/17s2/btree.zip .

% unzip btree.zip

% ls

% more README

In the stub, you should find a fully documented btree.h file, a btree_iterator.h

stub file, and some test files designed to exercise the B-Tree and make sure it’s working.

You’ll see a Makefile in the mix as well.

You are free to introduce any new classes you like, e.g., a node class. To make marking

easier please place any new classes in the submission files. Take the time to do this thing right,

since it’s really at the heart of what C++ generics are all about. You should want to do

these things the right way.

You are not allowed to change the names of the btree/btree_iterator classes.

Neither can you change the public interface of the btree class template as defined in

btree.h, of course, you will agument it with the appropriate iterator related types and

operations.

Please note that the supplied files don’t yet compile since there is no definition for some

of the files.

Our implementation with generous comments consists of about 400 lines of C++ code.

6. Marking

This deliverable is worth 10% of your final mark.

Your submission will be given a mark out of 100 with a 80/100 automarked component

for output correctness and a 20/100 manually marked component for code style and

quality.

As always, failure to implement a proper B-Tree representation and/or an external,

iterator class, will earn you a total mark of 0 for this deliverable.

As this is a third-year course we expect that your code will be well formatted,

docu-mented and structured. We also expect that you will use standard formatting and

naming conventions. However, the style marks will be awarded for writing C++ code

rather than C code.

As for the performance of your solution, we are lenient. When compared to a

straightforward implementation, your solution is expected not to run >20X slower.

7. Submission

Copy your code to your CSE account and make sure it compiles without any errors or

warnings. Then run your test cases. If all is well then submit using the command (from

within your del4 directory):

% give cs6771 ass4 btree.h btree_iterator.h

Note that you are to submit specific files only, this means that these are the only files you

should modify. You should also ensure they work within the supplied infrastructure.

If you submit and later decide to submit an even better version, go ahead and submit a

second (or third, or seventh) time; we’ll only mark the last one. Be sure to give yourself

more than enough time before the deadline to submit.

Late submissions will be penalised unless you have legitimate reasons to convince the

LIC otherwise. Any submission after the due date will attract a reduction of 20% per day

to the maximum mark. A day is defined as a 24-hour day and includes weekends and

holidays. Precisely, a submission x hours after the due date is considered to be ceil(x/24)

days late. No submissions will be accepted more than five days after the due date.

Plagiarism constitutes serious academic misconduct and will not be tolerated.

Further details about lateness and plagiarism can be found in the Course Outline.