CS-453程序代做、代写C++编程语言
- 首页 >> Database Project description (CS-453)
September 24, 2024
Contents
1 Software transactional memory 2
1.1 Gentle introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Speciffcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Possible implementations (in English) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Using a coarse-lock (i.e. the “reference” implementation) . . . . . . . . . . . . . . . 5
1.3.2 Dual-versioned transactional memory . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 The project 9
2.1 Practical information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Testing and submission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Grading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3 Other rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Interface of the software transactional memory . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Functions to implement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Concurrent programming with shared memory 17
3.1 Atomic operations since C11/C++11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Memory ordering in C11/C++11: a quick overview . . . . . . . . . . . . . . . . . . . . . . 17
3.3 From C11/C++11 to the model used in the course . . . . . . . . . . . . . . . . . . . . . . 19
11 Software transactional memory
Your goal is to implement a software transactional memory library.
The project repository (see Section 2.1) contains a very simple reference implementation of such a
library, along with the skeleton for another implementation: the one you’ll have to (design and) code.
The project repository includes the source code for, and a Makeffle to run the program that will evaluate
your implementation on the evaluation server (provided by the Distributed Computing Laboratory).
But ffrst, let’s informally describe what a (software) transactional memory is, and why it can be useful.
1.1 Gentle introduction
Let me introduce you to Alice. Alice runs a bank. As for any bank today, her system must be able to
process electronic orders. And it should be fast (Alice’s bank is very successful and has many customers).
At least two decades ago, to increase the system throughput, Alice could have only needed to buy
newer hardware, and her program would run faster. But 15–20 years ago, this free lunch ended [1]: Alice
now has to use the computational power of several hardware threads to get the most of modern processors.
This is where a transactional memory can come in handy, and where you step in to help Alice.
Alice’s program is currently written to run on a single thread. When Alice runs her code on her
single-core processor (no concurrency), it runs as expected.
For instance, let’s say that two customers, Bob and Charlie, want to transfer money: both Bob and
Charlie wants to wire each other (and they did not agree beforehand on “who owes who” to make only
one transfer. . .). The pseudo-code for the “money transfer” function could be the following:
shared accounts as integer[];
fn transfer(src, dst, amount) {
if (accounts[src] < amount) // Not enough funds
return; // => no transfer
accounts[dst] = accounts[dst] + amount;
accounts[src] = accounts[src] - amount;
}
Bob wants to send 10 CHF to Charlie, and Charlie 15 CHF to Bob. Since there is only one hardware
thread, each transfer will naturally be processed one after the other. That is, if Bob’s request was
received just before Charlie’s request:
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−→
Bob’s request
Bob 2 CHF
Charlie 20 CHF
−−−−−−−−−−→
Charlie’s request
Bob 17 CHF
Charlie 5 CHF
Another possible execution, if instead Charlie’s request was processed ffrst by Alice’s banking system:
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−−−→
Charlie’s request
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−→
Bob’s request
Bob 2 CHF
Charlie 20 CHF
Even if Bob may not be very happy with this second possible outcome, it is still a consistent execution
as far as the speciffcation of the bank is concerned: when the funds are insufffcient, no transfer happens.
Now, Alice wants to try her program (by assumption correct on a single-core processor) on brandnew,
multi-core hardware. This way her system will be able to process transfers concurrently, (hopefully)
answering her customers faster. So Alice, for testing purposes, runs the same program on multiple
threads, each taking customer requests as they come. To analyze what outcomes are possible when two
threads process the two respective transfers from Bob and Charlie in concurrence, we will rewrite the
pseudo code to highlight each shared memory access, and in which order they are written in the program:
fn transfer(src, dst, amount) {
let v0 = accounts[src]; // 0th read
if (v0 < amount) // Not enough funds
return; // => no transfer
let v1 = accounts[dst]; // 1st read
accounts[dst] = v1 + amount; // 1st write
let v2 = accounts[src]; // 2nd read
accounts[src] = v2 - amount; // 2nd write
}
2We assume that this pseudo-code runs on a pseudo-machine with atomic register
1
, and that this
pseudo-machine does not reorder memory accesses
2
: memory is always accessed in the same order as
written in the program, and there exist a total order following which each atomic memory access happens.
Supposing the account of Bob is at index 0 in accounts, and the account of Charlie at index 1, a possible,
concurrent execution is then (for clarity, v0, v1, v2 have been replaced by accounts[. . . ]):
Bob request’s thread Charlie request’s thread
0th read 12 == accounts[0]
1st read 10 == accounts[1]
1st write accounts[1] = 20
0th read 20 == accounts[1]
1st read 12 == accounts[0]
2nd read 12 == accounts[0]
2nd write accounts[0] = 2
1st write accounts[0] = 27
2nd read 20 == accounts[1]
2nd write accounts[1] = 5
Figure 1: A concurrent execution that leads to an inconsistent state.
Under this concurrent execution, starting from Bob and Charlie having respectively 12 + 10 = 22
CHF in total on their bank accounts, they end up with a total of 27 + 5 = 32 CHF. Oups! Alice’s system
involutarily “offered” 10 CHF to Bob, by letting the thread from Charlie’s request overwrites (in this
speciffc execution) the deduction of 10 CHF made by the thread concurrently handling Bob’s request.
From the standpoint of the speciffcation of the bank, this is an inconsistency.
A possible immediate solution could be to use a mutual exclusion mechanism, i.e. a lock:
fn transfer(src, dst, amount) {
lock accounts { /* Only one thread at a time can enter this block:
...concurrent threads wait until no thread is in this block */
let v0 = accounts[src]; // 0th read
if (v0 < amount) // Not enough funds
return; // => no transfer
let v1 = accounts[dst]; // 1st read
accounts[dst] = v1 + amount; // 1st write
let v2 = accounts[src]; // 2nd read
accounts[src] = v2 - amount; // 2nd write
}
}
Such a mechanism would prevent the execution from Figure 1, as only one thread accessing the shared
variable accounts can run at any given time. But this mechanism also prevents multiple threads from
processing transfers concurrently. Alice wanted to use multiple cores speciffcally to process multiple
requests at the same time, answering faster her (many) customers. Always taking a single, coarse lock
when accessing the shared memory is thus not a satisfying solution.
Actually, Alice noticed that some transfers can always run concurrently, without the need for any
synchronization. For instance, if Bob wires Charlie and Dan wires Erin at the same time, no ordering
of memory accesses can lead to an inconsistency. This is simply because the two sets of shared
memory segments respectively accessed by these two transfers, i.e. {accounts[0], accounts[1]} and
{accounts[2], accounts[3]}, do not overlap
3
: they do not conffict.
This is where the transactional memory is useful: it runs in parallel transactions that do not conffict,
and synchronizes these which do conffict, (hopefully) allowing to make the most out of multi-core
processors with minimal coding efforts. To get a glimpse of the general ideas behind the interface of the
software transactional memory proposed in this project
4
, let’s rewrite (again) Alice’s pseudo-code:
1The notion of atomic register is taught at the beginning of the course.
2The memory consistency models of modern hardware is outside the scope of this course.
3Also, memory segments that are only read by each thread can overlap without synchronization.
4Please refer to Section 2.2 for a precise deffnition of the interface of the library you have to implement.
3// Initialization of the transactional memory, which holds the accounts
shared tm = new TransactionalMemory(/* ... */);
/* ...Initialize the array of accounts,
at the "start address" of the shared memory... */
fn transfer(src, dst, amount) {
let accounts = tm.get start address() as integer[];
while (true) {
try {
let tx = tm.begin();
if (tx.read(&accounts[src]) < amount) {
tx.commit();
break;
}
tx.write(&accounts[dst], tx.read(&accounts[dst]) + amount);
tx.write(&accounts[src], tx.read(&accounts[src]) - amount);
tx.commit();
break;
} except (RetryTransaction) {
continue;
}
}
}
A transaction is enclosed, between calls to tm.begin and tx.end. Unlike with a lock, these functions
(especially begin) may not block, allowing for concurrent executions.
Reads and writes (and allocs and frees) onto the shared memory are all controlled by the software
transactional memory library: Alice’s code is not allowed to e.g. directly read accounts[dst]. Instead,
a call to tx.read must be made (tx.write, tx.alloc, tx.free for other operations). Notably, each of
these operations can abort, requesting the caller to try again the same transaction.
1.2 Speciffcation
The software transactional memory library sees and controls every access to the shared memory. The
library can delay execution, return two different values when reading the same address from two different,
concurrent transactions, etc. Basically, the goal of the library is to make concurrent transactions appear
as if they were executed serially, without concurrency. (In the course, this notion is called opacity.) Also,
once a transaction successfully committed, its modiffcations must be visible in subsequent transactions.
A software transactional memory library that achieves this goal will be deemed correct.
Think of the serial executions of Bob’s and Charlie’s concurrent transfers (in Section 1.1): either the
request from Bob was processed ffrst, not seeing any modiffcation from Charlie’s request, or the opposite.
They executed one after the other. Also, if Bob makes a new transaction (e.g. requesting its account
details) after these two transactions committed, Bob must see both of their outcomes on his screen.
We can deffne three notions which, taken together, would satisfy the (informal) goal described above.
These notions are: snapshot isolation, atomicity, and consistency. Namely:
Snapshot isolation ensures that: (1) no write/alloc/free
5
in any non-committed transaction can
be read/become available/unavailable in any other transaction, (2) all the reads/writes/allocs/frees
made in a transaction appear to be made from the same atomic snapshot of the whole shared
memory, excepted for (3) a transaction observes its own modiffcations (i.e. writes/allocs/frees).
Example: if snapshot isolation had been satisffed in Figure 1, as Charlie’s transaction already read
v1 == 12, then the 2
nd
read from Charlie’s thread could not have read v2 == 20, which either: (1)
is the effect of a write in a non-committed transaction, if Bob’s transaction had not committed, or
(2) belongs to a different snapshot, if Bob’s transaction had committed between the two reads.
Atomicity ensures that all the memory writes/allocs/frees of any committed transaction seem to
have all happened at one indivisible point in time.
Example: without atomicity, Bob’s transaction could successfully commit, and another transaction
(e.g. Charlie’s transaction) could take a snapshot where only one of the two writes had been made.
5When Alice’s program calls tx.free, the transactional memory library is not required to immediately call libc’s free.
4 Consistency ensures, for a committing transaction T, that: (1) none of the transactions that
committed since the atomic snapshot of T (a) freed a segment of memory read or written by T or
(b) allocated a segment that overlaps with a segment allocated by T, and (2) each read made by T
in its snapshot would read the same value in a snapshot taken after the last
6
committed transaction.
Example: let’s consider again Figure 1. Both Bob’s and Charlie’s transactions run concurrently with
(in this example) the same snapshot: Bob has 12 CHF and Charlie 10 CHF. It is consistent for Bob’s
transaction to commit ffrst, as there has been no committed transaction since Bob’s transaction
begun and thus the shared memory remained the same since the snapshot of Bob’s transaction.
Regarding Charlie’s committing transaction, which reads memory locations that were modiffed since
its atomic snapshot, it would not be consistent to commit: Charlie’s transaction must be retried
7
.
Now if we consider the other example of Bob wiring Charlie while Dan wires Erin, it would be
consistent for both transactions to commit, since they accessed different words of the shared memory.
Side notes:
There are several deffnitions of consistency in the literature. Here we provide a sensible deffnition
for this project, while being arguably more precise/speciffc/actionable than some other existing
deffnitions (e.g. “Data is in a consistent state when a transaction starts and when it ends.” [3]).
Point (3) of snapshot isolation is usually found in consistency.
1.3 Possible implementations (in English)
This section describes the reference implementation and a possible transactional memory for this project.
You are free to implement any other software transactional memory (see the rules in Section 2.1.3). More
ideas can be found while studying:
TinySTM: https://github.com/patrickmarlier/tinystm
LibLTX: https://sourceforge.net/projects/libltx
stmmap: https://github.com/skaphan/stmmap
1.3.1 Using a coarse-lock (i.e. the “reference” implementation)
This implementation is very simple to describe: the transactional memory library uses a single mutex to
(trivially) serialize transactions made onto the shared memory.
When a transaction begins (i.e. tm.begin is called), the single mutex is taken. When the transaction
ends, it always commits (i.e. never retries), and the single mutex is released. Read, write, allocation and
deallocation operations are directly mapped to reading/writing the memory at the requested addresses,
and essentially calling libc’s malloc/free
8
.
This approach is obviously correct, as it directly consists in both (1) executing transactions serially
and (2) ensuring new transactions see all the writes/allocs/frees from the last committed transaction.
From the prism of the three notions laid above, snapshot isolation is guaranteed because no (concurrent)
transaction can run to observe the memory effects of the pending transaction holding the lock,
until the pending transaction commits, and so, releases the lock. Atomicity is directly guaranteed by the
lock, and ditto for consistency, since there cannot be two transactions running concurrently: the pending
transaction is the only one able to alter the state of the shared memory.
1.3.2 Dual-versioned transactional memory
This transactional memory is only provided as a suggestion. You are free to look for other implementations
in the litterature. It is not the fastest possible for the grading workload
9
, and not even the simplest you can
implement. You thus have two choices: 1) be guided and follow the pseudocode in this document, which
will get you an STM that requires a good amount of non-trivial optimizations to get the maximum grade,
6The set of committed transactions must be linearizable, and for any valid linearization, the last committed transaction
for any (but the ffrst) transaction T is simply the transaction that is considered to have happened right before T.
7As long as tm end (see Section 2.2) for Bob’s transaction has not returned control, an implementation could as well,
without violating linearizability, make Bob’s transaction abort and instead let Charlie’s transaction commit.
8The implementation also needs to keep track of the allocated segments, to prevent any memory leak: see Section 2.2.
9Any reasonably optimized implementation will nevertheless be faster than the reference.
52) explore the world of transactional memories by yourself and implement a more efficient algorithm such
as TL2 or LSA, which should make getting the maximum grade easier. If you want to get help from the
TAs, and are not that much of an explorer, it is recommended you stick with the algorithm described
below. Also, only the high-level concepts and logic will be presented. Low-level details and optimizations,
e.g. actual memory layouts, are omitted on purpose, and for you to find.
Through the use of a very basic multiversionning scheme, this implementation aims at making sure
read-only transactions can both: run concurrently with read-write transactions, and never fail. Two copies
of every (aligned) word of the size of the requested alignment (see Section 2.2.2) are kept. When a transaction
reads/writes words in shared memory, the transactional library has to decide for each word which
of the two copies must be read/written, or if none can be read/written and the transaction must abort.
One key structure that can considerably simplify the design (at the expense of performance though)
is what we will call the Thread Batcher (or more simply, the Batcher ). The goal of the Batcher is to
artificially create points in time in which no transaction is running. This goal might sound uncanny, but
it is not: a mere mutex, as the one used in the reference implementation, creates such points in time.
Indeed, when the transaction that got the global lock releases it, no transaction is running.
You can think of the Batcher as a special kind of mutex: while a mutex lets only one blocked thread
enter and leave no matter how many threads were blocked waiting, the Batcher lets each and every
blocked thread(s) enter together when the last thread (from the previous batch) leaves. So the interface
of the Batcher is pretty much the same as the one of a mutex: enter and leave (analogous to lock and
unlock), and one special function called get epoch that can only be called by a thread after it entered
(and before it leaved) the Batcher. get epoch returns an integer that is unique to each batch of threads
(e.g. a counter, incrementing when the last thread of each batch leaves, would do well).
Batcher pseudo-code, assuming all functions execute in one indivisible point in time (they are atomic):
Data: counter : integer, remaining: integer, blocked: list of threads
fn get epoch()
return counter;
end
fn enter()
if remaining = 0 then
remaining = 1;
else
append current thread to blocked;
wait until “woken up”;
end
end
fn leave()
decrement remaining by 1;
if remaining = 0 then
increment counter by 1;
remaining = length of blocked;
“wake up” every thread in blocked;
empty list blocked;
end
end
With this implementation, each shared memory region holds one instance of Batcher. When a transaction
begins on a shared memory region, enter must be called. Symmetrically, when a transaction ends
(no matter if it committed or aborted), leave must be called as its last action.
At this point, we have “batched” transactions running onto the shared memory region. We will call
an “epoch” the time span during which each batch of transactions runs.
With our dual-versioned implementation, each segment of n word(s) (the size of a word here is the
requested alignment) is duplicated, and for each word there is a control structure to decide which version
to read/write, or if the transaction must abort. A schematic representation of one segment is given below:
6word 0 word 1 word 2 word 3
control control control control
copy A copy A copy A copy A
copy B copy B copy B copy B
. . .
For the sake of the presentation laid below, the word indexes above (i.e 0, 1, . . . ) will be assumed unique
for the whole shared memory region: there are no two words in the same region that have the same index.
The control structure contains the following pseudo-fields10:
Which copy is “valid” from the previous epoch (i.e. either A or B).
This copy will be called the “readable copy”, and the other copy will be called the “writable copy”.
The “access set” of read-write transaction(s) which have accessed the word in the current epoch.
Do not implement an actual set in any (optimized) implementation: this set will only be used to tell
whether a transaction can write to the word. Namely, if at least one other transaction has accessed
(i.e. read or written) this word in the same epoch, the write cannot happen. Said differently, if two
transactions are in the access set, it doesn’t matter which one they are.
Whether the word has been written in the current epoch.
The pseudo-code of the read/write/alloc/free/commit operations (+ 2 helper functions) can then be:
fn read word(index, target)
if the transaction is read-only then
read the readable copy into target;
return the transaction can continue;
else
if the word has been written in the current epoch then
if the transaction is already in the “access set” then
read the writable copy into target;
return the transaction can continue;
else
return the transaction must abort;
end
else
read the readable copy into target;
add the transaction into the “access set” (if not already in);
return the transaction can continue;
end
end
end
10Your implementation may not declare a control structure with this exact list of fields or with the exact same semantic:
your implementation should probably have more fields, with adjusted semantic, for the purpose of optimization. Moreover,
multiple fields can be merged using bit fields.
7fn write word(source, index)
if the word has been written in the current epoch then
if the transaction is already in the “access set” then
write the content at source into the writable copy;
return the transaction can continue;
else
return the transaction must abort;
end
else
if at least one other transaction is in the “access set” then
return the transaction must abort;
else
write the content at source into the writable copy;
add the transaction into the “access set” (if not already in);
mark that the word has been written in the current epoch;
return the transaction can continue;
end
end
end
fn read(source, size, target)
foreach word index within [source, source + size[ do
result = read word(word index , target + offset);
if result = transaction must abort then
return the transaction must abort;
end
end
return the transaction can continue;
end
fn write(source, size, target)
foreach word index within [target, target + size[ do
result = write word(source + offset, word index );
if result = transaction must abort then
return the transaction must abort;
end
end
return the transaction can continue;
end
fn alloc(size, target)
allocate enough space for a segment of size size;
if the allocation failed then
return the allocated failed;
end
initialize the control structure(s) (one per word) in the segment;
initialize each copy of each word in the segment to zeroes;
register the segment in the set of allocated segments;
return the transaction can continue;
end
fn free(target)
mark target for deregistering (from the set of allocated segments)
and freeing once the last transaction of the current epoch leaves
the Batcher, if the calling transaction ends up being committed;
return the transaction can continue;
end
8fn commit()
foreach written word index do
defer the swap, of which copy for word index is the “valid”
copy, just after the last transaction from the current epoch
leaves the Batcher and before the next batch starts running;
end
return the transaction has committed;
end
As for the Batcher pseudo-code, we assume the functions outlined above all execute atomically. This
means that the execution of a function should not be affected by the concurrent execution of other
functions. Practically, this implies that you should use synchronization primitives to control the access
to shared variables. Numerous implementation details, e.g. how to access the control structure of each
word, how to keep track of the allocated segments, etc, are intentionally left completely unspecified.
For reference: (very optimized) implementations can reach speedups above ×2.5 against the grading
workload, and the implementation of the TA is less than 1000 lines long (all comments and spaces
included). The proposed algorithms can be extended with speculative execution of read-only transactions
(for you to figure out) to get a higher speedup or to compensate for the lack of optimizations. You should
look for such optimizations after getting an initial version that works correctly.
2 The project
You are expected to be reasonably fluent in C11 or C++17 for this project. Picking one or the other
shouldn’t impact the performance of your implementation (granted you don’t use overly expensive abstractions).
Pick the one you’re the most familiar with. If you’re unsure, we advise you to stick to C11.
A (very brief) introduction to the memory model of C11/C++11, along with available atomic operations,
are provided in Section 3.
2.1 Practical information
This project is associated with a git repository, hosted on GitHub. It will be available at https://
github.com/LPD-EPFL/CS453-2024-project when the semester starts. Its layout is the following:
grading/ Directory containing the source code of the evaluation program, to test your solution on
your own machine. The evaluation server (see below) runs the exact same binary, possibly with a
different seed. Section 2.1.1 further details how to test and later submit your code.
include/ C11 and C++17 header files that define the public interface of the STM.
template/ Template (in C11) of your implementation, but you are free to replace everything and
write C++17 instead. See section 2.1.3 for more information.
reference/ Reference, single lock-based implementation (see Section 1.3.1).
Note that you are not allowed to write an implementation that is equivalent to this reference
implementation. See section 2.1.3 for more information.
sync-examples/ Examples of how to use synchronization primitives (locks, condition variables and
atomics).
The specification of the evaluation server, on which your submission will be run, are the following:
CPU Virtualized Intel(R) Xeon(R) E5-2680: 24 cores at 2.80GHz
RAM 128 GB
OS GNU/Linux (distribution: Ubuntu 24.04)
2.1.1 Testing and submission
At this point, you should have received your unique (and secret) user identifier by mail.
If not, ask the TAs as soon as possible: without this identifier, you cannot submit your implementation.
The proposed workflow is the following:
91. Clone/download the repository.
2. Make a local copy of/rename the template directory with your SCIPER number:
you@your-pc:/path/to/repository$ cp -r template 123456
Be aware make is sensitive to spaces in the path/names.
3. Complete or completely rewrite (your copy of) the template with your own code.
(a) Complete/rewrite your code; only the interface should be kept identical.
(b) Compile and test locally with:
you@your-pc:/path/to/repository/grading$ make build-libs run
(c) Send your code for testing on the evaluation server; see below.
(d) Repeat any of the previous steps until you are satisfied with correctness and performance.
The procedure to send your code for evaluation is the following:
1. Zip your modified (copy of the) template directory (not directly its content).
2. Send your code for evaluation with the submit.py script:
you@your-pc:/path/to/repository$ python3 submit.py --uuid <. . .> 123456.zip
The UUID (Unique User IDentifier) to use is the one you should have received by mail.
Your request will be queued on the evaluation server, and you will receive the results of both the compilation
and the execution of your implementation (including its speedup vs the reference one) as soon as
it finishes running. You can submit code as often as you want until the deadline.
Only the (correct) submission with the highest speedup will be stored persistently and considered for
grading, so no need to resubmit your best implementation just before the deadline. The TAs may still
ask you to provide a copy of your best implementation though (possibly after the deadline), so do not
delete anything before the end of the semester (2025-01-30).
As an option from the submit.py script, you can force-replace your submission with another one. In
this case no compilation nor performance check is performed. The submission script also allows you to
download the submission considered for your grade, so you can make sure the system has the version you
want to be considered for grading. You can display the command-line help for this tool with:
you@your-pc:/path/to/repository$ python3 submit.py -h
Also note that the automated submission system has no false negative, i.e. a correct implementation
will always be deemed correct; unless it is more than 16 times slower than the reference (see Section
2.1.2).
On the other hand, the system has false positive (i.e. incorrect implementations deemed correct).
Unless your submission includes portions of code that appear to be specifically designed to trick the
submission system into considering an incorrect implementation correct, if the submission system qualifies
your implementation as correct, it will be graded as a correct one.
2.1.2 Grading
Correctness is a must
The executed transactions must satisfy the specification (sections 1.2 and 2.2).
No correctness ⇒ no “passing” grade for the project (i.e. grade < 4).
Performance is required to get the maximum grade
The only considered metric is the number of transactions committed per second.
Providing a correct implementation guarantees you at least a grade of 4 for the project, although an
implementation that is at least 16 times slower than the reference (coarse lock-based) implementation
will not be deemed correct , regardless of whether a longer execution time would have allowed it to finish.
Please also note: an implementation that leaks memory will have its grade capped at 5.
Given its speedup relative to the reference, the grade of a correct implementation (that does not leak) is:
10Speedup s Grade g
s < 1/16 g < 4
s ∈ [
1/16, 1] g = 4 + 16s−1
15
s ∈ [1, 2.5] g = 5 + 2s−2
3
s > 2.5 g = 6
For reference, the maximum speedup obtained on this project is ×15.04 with an optimized (algorithmwise)
version of TL2. The assoc
September 24, 2024
Contents
1 Software transactional memory 2
1.1 Gentle introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Speciffcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Possible implementations (in English) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Using a coarse-lock (i.e. the “reference” implementation) . . . . . . . . . . . . . . . 5
1.3.2 Dual-versioned transactional memory . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 The project 9
2.1 Practical information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Testing and submission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Grading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3 Other rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Interface of the software transactional memory . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Functions to implement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Concurrent programming with shared memory 17
3.1 Atomic operations since C11/C++11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Memory ordering in C11/C++11: a quick overview . . . . . . . . . . . . . . . . . . . . . . 17
3.3 From C11/C++11 to the model used in the course . . . . . . . . . . . . . . . . . . . . . . 19
11 Software transactional memory
Your goal is to implement a software transactional memory library.
The project repository (see Section 2.1) contains a very simple reference implementation of such a
library, along with the skeleton for another implementation: the one you’ll have to (design and) code.
The project repository includes the source code for, and a Makeffle to run the program that will evaluate
your implementation on the evaluation server (provided by the Distributed Computing Laboratory).
But ffrst, let’s informally describe what a (software) transactional memory is, and why it can be useful.
1.1 Gentle introduction
Let me introduce you to Alice. Alice runs a bank. As for any bank today, her system must be able to
process electronic orders. And it should be fast (Alice’s bank is very successful and has many customers).
At least two decades ago, to increase the system throughput, Alice could have only needed to buy
newer hardware, and her program would run faster. But 15–20 years ago, this free lunch ended [1]: Alice
now has to use the computational power of several hardware threads to get the most of modern processors.
This is where a transactional memory can come in handy, and where you step in to help Alice.
Alice’s program is currently written to run on a single thread. When Alice runs her code on her
single-core processor (no concurrency), it runs as expected.
For instance, let’s say that two customers, Bob and Charlie, want to transfer money: both Bob and
Charlie wants to wire each other (and they did not agree beforehand on “who owes who” to make only
one transfer. . .). The pseudo-code for the “money transfer” function could be the following:
shared accounts as integer[];
fn transfer(src, dst, amount) {
if (accounts[src] < amount) // Not enough funds
return; // => no transfer
accounts[dst] = accounts[dst] + amount;
accounts[src] = accounts[src] - amount;
}
Bob wants to send 10 CHF to Charlie, and Charlie 15 CHF to Bob. Since there is only one hardware
thread, each transfer will naturally be processed one after the other. That is, if Bob’s request was
received just before Charlie’s request:
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−→
Bob’s request
Bob 2 CHF
Charlie 20 CHF
−−−−−−−−−−→
Charlie’s request
Bob 17 CHF
Charlie 5 CHF
Another possible execution, if instead Charlie’s request was processed ffrst by Alice’s banking system:
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−−−→
Charlie’s request
Bob 12 CHF
Charlie 10 CHF
−−−−−−−−→
Bob’s request
Bob 2 CHF
Charlie 20 CHF
Even if Bob may not be very happy with this second possible outcome, it is still a consistent execution
as far as the speciffcation of the bank is concerned: when the funds are insufffcient, no transfer happens.
Now, Alice wants to try her program (by assumption correct on a single-core processor) on brandnew,
multi-core hardware. This way her system will be able to process transfers concurrently, (hopefully)
answering her customers faster. So Alice, for testing purposes, runs the same program on multiple
threads, each taking customer requests as they come. To analyze what outcomes are possible when two
threads process the two respective transfers from Bob and Charlie in concurrence, we will rewrite the
pseudo code to highlight each shared memory access, and in which order they are written in the program:
fn transfer(src, dst, amount) {
let v0 = accounts[src]; // 0th read
if (v0 < amount) // Not enough funds
return; // => no transfer
let v1 = accounts[dst]; // 1st read
accounts[dst] = v1 + amount; // 1st write
let v2 = accounts[src]; // 2nd read
accounts[src] = v2 - amount; // 2nd write
}
2We assume that this pseudo-code runs on a pseudo-machine with atomic register
1
, and that this
pseudo-machine does not reorder memory accesses
2
: memory is always accessed in the same order as
written in the program, and there exist a total order following which each atomic memory access happens.
Supposing the account of Bob is at index 0 in accounts, and the account of Charlie at index 1, a possible,
concurrent execution is then (for clarity, v0, v1, v2 have been replaced by accounts[. . . ]):
Bob request’s thread Charlie request’s thread
0th read 12 == accounts[0]
1st read 10 == accounts[1]
1st write accounts[1] = 20
0th read 20 == accounts[1]
1st read 12 == accounts[0]
2nd read 12 == accounts[0]
2nd write accounts[0] = 2
1st write accounts[0] = 27
2nd read 20 == accounts[1]
2nd write accounts[1] = 5
Figure 1: A concurrent execution that leads to an inconsistent state.
Under this concurrent execution, starting from Bob and Charlie having respectively 12 + 10 = 22
CHF in total on their bank accounts, they end up with a total of 27 + 5 = 32 CHF. Oups! Alice’s system
involutarily “offered” 10 CHF to Bob, by letting the thread from Charlie’s request overwrites (in this
speciffc execution) the deduction of 10 CHF made by the thread concurrently handling Bob’s request.
From the standpoint of the speciffcation of the bank, this is an inconsistency.
A possible immediate solution could be to use a mutual exclusion mechanism, i.e. a lock:
fn transfer(src, dst, amount) {
lock accounts { /* Only one thread at a time can enter this block:
...concurrent threads wait until no thread is in this block */
let v0 = accounts[src]; // 0th read
if (v0 < amount) // Not enough funds
return; // => no transfer
let v1 = accounts[dst]; // 1st read
accounts[dst] = v1 + amount; // 1st write
let v2 = accounts[src]; // 2nd read
accounts[src] = v2 - amount; // 2nd write
}
}
Such a mechanism would prevent the execution from Figure 1, as only one thread accessing the shared
variable accounts can run at any given time. But this mechanism also prevents multiple threads from
processing transfers concurrently. Alice wanted to use multiple cores speciffcally to process multiple
requests at the same time, answering faster her (many) customers. Always taking a single, coarse lock
when accessing the shared memory is thus not a satisfying solution.
Actually, Alice noticed that some transfers can always run concurrently, without the need for any
synchronization. For instance, if Bob wires Charlie and Dan wires Erin at the same time, no ordering
of memory accesses can lead to an inconsistency. This is simply because the two sets of shared
memory segments respectively accessed by these two transfers, i.e. {accounts[0], accounts[1]} and
{accounts[2], accounts[3]}, do not overlap
3
: they do not conffict.
This is where the transactional memory is useful: it runs in parallel transactions that do not conffict,
and synchronizes these which do conffict, (hopefully) allowing to make the most out of multi-core
processors with minimal coding efforts. To get a glimpse of the general ideas behind the interface of the
software transactional memory proposed in this project
4
, let’s rewrite (again) Alice’s pseudo-code:
1The notion of atomic register is taught at the beginning of the course.
2The memory consistency models of modern hardware is outside the scope of this course.
3Also, memory segments that are only read by each thread can overlap without synchronization.
4Please refer to Section 2.2 for a precise deffnition of the interface of the library you have to implement.
3// Initialization of the transactional memory, which holds the accounts
shared tm = new TransactionalMemory(/* ... */);
/* ...Initialize the array of accounts,
at the "start address" of the shared memory... */
fn transfer(src, dst, amount) {
let accounts = tm.get start address() as integer[];
while (true) {
try {
let tx = tm.begin();
if (tx.read(&accounts[src]) < amount) {
tx.commit();
break;
}
tx.write(&accounts[dst], tx.read(&accounts[dst]) + amount);
tx.write(&accounts[src], tx.read(&accounts[src]) - amount);
tx.commit();
break;
} except (RetryTransaction) {
continue;
}
}
}
A transaction is enclosed, between calls to tm.begin and tx.end. Unlike with a lock, these functions
(especially begin) may not block, allowing for concurrent executions.
Reads and writes (and allocs and frees) onto the shared memory are all controlled by the software
transactional memory library: Alice’s code is not allowed to e.g. directly read accounts[dst]. Instead,
a call to tx.read must be made (tx.write, tx.alloc, tx.free for other operations). Notably, each of
these operations can abort, requesting the caller to try again the same transaction.
1.2 Speciffcation
The software transactional memory library sees and controls every access to the shared memory. The
library can delay execution, return two different values when reading the same address from two different,
concurrent transactions, etc. Basically, the goal of the library is to make concurrent transactions appear
as if they were executed serially, without concurrency. (In the course, this notion is called opacity.) Also,
once a transaction successfully committed, its modiffcations must be visible in subsequent transactions.
A software transactional memory library that achieves this goal will be deemed correct.
Think of the serial executions of Bob’s and Charlie’s concurrent transfers (in Section 1.1): either the
request from Bob was processed ffrst, not seeing any modiffcation from Charlie’s request, or the opposite.
They executed one after the other. Also, if Bob makes a new transaction (e.g. requesting its account
details) after these two transactions committed, Bob must see both of their outcomes on his screen.
We can deffne three notions which, taken together, would satisfy the (informal) goal described above.
These notions are: snapshot isolation, atomicity, and consistency. Namely:
Snapshot isolation ensures that: (1) no write/alloc/free
5
in any non-committed transaction can
be read/become available/unavailable in any other transaction, (2) all the reads/writes/allocs/frees
made in a transaction appear to be made from the same atomic snapshot of the whole shared
memory, excepted for (3) a transaction observes its own modiffcations (i.e. writes/allocs/frees).
Example: if snapshot isolation had been satisffed in Figure 1, as Charlie’s transaction already read
v1 == 12, then the 2
nd
read from Charlie’s thread could not have read v2 == 20, which either: (1)
is the effect of a write in a non-committed transaction, if Bob’s transaction had not committed, or
(2) belongs to a different snapshot, if Bob’s transaction had committed between the two reads.
Atomicity ensures that all the memory writes/allocs/frees of any committed transaction seem to
have all happened at one indivisible point in time.
Example: without atomicity, Bob’s transaction could successfully commit, and another transaction
(e.g. Charlie’s transaction) could take a snapshot where only one of the two writes had been made.
5When Alice’s program calls tx.free, the transactional memory library is not required to immediately call libc’s free.
4 Consistency ensures, for a committing transaction T, that: (1) none of the transactions that
committed since the atomic snapshot of T (a) freed a segment of memory read or written by T or
(b) allocated a segment that overlaps with a segment allocated by T, and (2) each read made by T
in its snapshot would read the same value in a snapshot taken after the last
6
committed transaction.
Example: let’s consider again Figure 1. Both Bob’s and Charlie’s transactions run concurrently with
(in this example) the same snapshot: Bob has 12 CHF and Charlie 10 CHF. It is consistent for Bob’s
transaction to commit ffrst, as there has been no committed transaction since Bob’s transaction
begun and thus the shared memory remained the same since the snapshot of Bob’s transaction.
Regarding Charlie’s committing transaction, which reads memory locations that were modiffed since
its atomic snapshot, it would not be consistent to commit: Charlie’s transaction must be retried
7
.
Now if we consider the other example of Bob wiring Charlie while Dan wires Erin, it would be
consistent for both transactions to commit, since they accessed different words of the shared memory.
Side notes:
There are several deffnitions of consistency in the literature. Here we provide a sensible deffnition
for this project, while being arguably more precise/speciffc/actionable than some other existing
deffnitions (e.g. “Data is in a consistent state when a transaction starts and when it ends.” [3]).
Point (3) of snapshot isolation is usually found in consistency.
1.3 Possible implementations (in English)
This section describes the reference implementation and a possible transactional memory for this project.
You are free to implement any other software transactional memory (see the rules in Section 2.1.3). More
ideas can be found while studying:
TinySTM: https://github.com/patrickmarlier/tinystm
LibLTX: https://sourceforge.net/projects/libltx
stmmap: https://github.com/skaphan/stmmap
1.3.1 Using a coarse-lock (i.e. the “reference” implementation)
This implementation is very simple to describe: the transactional memory library uses a single mutex to
(trivially) serialize transactions made onto the shared memory.
When a transaction begins (i.e. tm.begin is called), the single mutex is taken. When the transaction
ends, it always commits (i.e. never retries), and the single mutex is released. Read, write, allocation and
deallocation operations are directly mapped to reading/writing the memory at the requested addresses,
and essentially calling libc’s malloc/free
8
.
This approach is obviously correct, as it directly consists in both (1) executing transactions serially
and (2) ensuring new transactions see all the writes/allocs/frees from the last committed transaction.
From the prism of the three notions laid above, snapshot isolation is guaranteed because no (concurrent)
transaction can run to observe the memory effects of the pending transaction holding the lock,
until the pending transaction commits, and so, releases the lock. Atomicity is directly guaranteed by the
lock, and ditto for consistency, since there cannot be two transactions running concurrently: the pending
transaction is the only one able to alter the state of the shared memory.
1.3.2 Dual-versioned transactional memory
This transactional memory is only provided as a suggestion. You are free to look for other implementations
in the litterature. It is not the fastest possible for the grading workload
9
, and not even the simplest you can
implement. You thus have two choices: 1) be guided and follow the pseudocode in this document, which
will get you an STM that requires a good amount of non-trivial optimizations to get the maximum grade,
6The set of committed transactions must be linearizable, and for any valid linearization, the last committed transaction
for any (but the ffrst) transaction T is simply the transaction that is considered to have happened right before T.
7As long as tm end (see Section 2.2) for Bob’s transaction has not returned control, an implementation could as well,
without violating linearizability, make Bob’s transaction abort and instead let Charlie’s transaction commit.
8The implementation also needs to keep track of the allocated segments, to prevent any memory leak: see Section 2.2.
9Any reasonably optimized implementation will nevertheless be faster than the reference.
52) explore the world of transactional memories by yourself and implement a more efficient algorithm such
as TL2 or LSA, which should make getting the maximum grade easier. If you want to get help from the
TAs, and are not that much of an explorer, it is recommended you stick with the algorithm described
below. Also, only the high-level concepts and logic will be presented. Low-level details and optimizations,
e.g. actual memory layouts, are omitted on purpose, and for you to find.
Through the use of a very basic multiversionning scheme, this implementation aims at making sure
read-only transactions can both: run concurrently with read-write transactions, and never fail. Two copies
of every (aligned) word of the size of the requested alignment (see Section 2.2.2) are kept. When a transaction
reads/writes words in shared memory, the transactional library has to decide for each word which
of the two copies must be read/written, or if none can be read/written and the transaction must abort.
One key structure that can considerably simplify the design (at the expense of performance though)
is what we will call the Thread Batcher (or more simply, the Batcher ). The goal of the Batcher is to
artificially create points in time in which no transaction is running. This goal might sound uncanny, but
it is not: a mere mutex, as the one used in the reference implementation, creates such points in time.
Indeed, when the transaction that got the global lock releases it, no transaction is running.
You can think of the Batcher as a special kind of mutex: while a mutex lets only one blocked thread
enter and leave no matter how many threads were blocked waiting, the Batcher lets each and every
blocked thread(s) enter together when the last thread (from the previous batch) leaves. So the interface
of the Batcher is pretty much the same as the one of a mutex: enter and leave (analogous to lock and
unlock), and one special function called get epoch that can only be called by a thread after it entered
(and before it leaved) the Batcher. get epoch returns an integer that is unique to each batch of threads
(e.g. a counter, incrementing when the last thread of each batch leaves, would do well).
Batcher pseudo-code, assuming all functions execute in one indivisible point in time (they are atomic):
Data: counter : integer, remaining: integer, blocked: list of threads
fn get epoch()
return counter;
end
fn enter()
if remaining = 0 then
remaining = 1;
else
append current thread to blocked;
wait until “woken up”;
end
end
fn leave()
decrement remaining by 1;
if remaining = 0 then
increment counter by 1;
remaining = length of blocked;
“wake up” every thread in blocked;
empty list blocked;
end
end
With this implementation, each shared memory region holds one instance of Batcher. When a transaction
begins on a shared memory region, enter must be called. Symmetrically, when a transaction ends
(no matter if it committed or aborted), leave must be called as its last action.
At this point, we have “batched” transactions running onto the shared memory region. We will call
an “epoch” the time span during which each batch of transactions runs.
With our dual-versioned implementation, each segment of n word(s) (the size of a word here is the
requested alignment) is duplicated, and for each word there is a control structure to decide which version
to read/write, or if the transaction must abort. A schematic representation of one segment is given below:
6word 0 word 1 word 2 word 3
control control control control
copy A copy A copy A copy A
copy B copy B copy B copy B
. . .
For the sake of the presentation laid below, the word indexes above (i.e 0, 1, . . . ) will be assumed unique
for the whole shared memory region: there are no two words in the same region that have the same index.
The control structure contains the following pseudo-fields10:
Which copy is “valid” from the previous epoch (i.e. either A or B).
This copy will be called the “readable copy”, and the other copy will be called the “writable copy”.
The “access set” of read-write transaction(s) which have accessed the word in the current epoch.
Do not implement an actual set in any (optimized) implementation: this set will only be used to tell
whether a transaction can write to the word. Namely, if at least one other transaction has accessed
(i.e. read or written) this word in the same epoch, the write cannot happen. Said differently, if two
transactions are in the access set, it doesn’t matter which one they are.
Whether the word has been written in the current epoch.
The pseudo-code of the read/write/alloc/free/commit operations (+ 2 helper functions) can then be:
fn read word(index, target)
if the transaction is read-only then
read the readable copy into target;
return the transaction can continue;
else
if the word has been written in the current epoch then
if the transaction is already in the “access set” then
read the writable copy into target;
return the transaction can continue;
else
return the transaction must abort;
end
else
read the readable copy into target;
add the transaction into the “access set” (if not already in);
return the transaction can continue;
end
end
end
10Your implementation may not declare a control structure with this exact list of fields or with the exact same semantic:
your implementation should probably have more fields, with adjusted semantic, for the purpose of optimization. Moreover,
multiple fields can be merged using bit fields.
7fn write word(source, index)
if the word has been written in the current epoch then
if the transaction is already in the “access set” then
write the content at source into the writable copy;
return the transaction can continue;
else
return the transaction must abort;
end
else
if at least one other transaction is in the “access set” then
return the transaction must abort;
else
write the content at source into the writable copy;
add the transaction into the “access set” (if not already in);
mark that the word has been written in the current epoch;
return the transaction can continue;
end
end
end
fn read(source, size, target)
foreach word index within [source, source + size[ do
result = read word(word index , target + offset);
if result = transaction must abort then
return the transaction must abort;
end
end
return the transaction can continue;
end
fn write(source, size, target)
foreach word index within [target, target + size[ do
result = write word(source + offset, word index );
if result = transaction must abort then
return the transaction must abort;
end
end
return the transaction can continue;
end
fn alloc(size, target)
allocate enough space for a segment of size size;
if the allocation failed then
return the allocated failed;
end
initialize the control structure(s) (one per word) in the segment;
initialize each copy of each word in the segment to zeroes;
register the segment in the set of allocated segments;
return the transaction can continue;
end
fn free(target)
mark target for deregistering (from the set of allocated segments)
and freeing once the last transaction of the current epoch leaves
the Batcher, if the calling transaction ends up being committed;
return the transaction can continue;
end
8fn commit()
foreach written word index do
defer the swap, of which copy for word index is the “valid”
copy, just after the last transaction from the current epoch
leaves the Batcher and before the next batch starts running;
end
return the transaction has committed;
end
As for the Batcher pseudo-code, we assume the functions outlined above all execute atomically. This
means that the execution of a function should not be affected by the concurrent execution of other
functions. Practically, this implies that you should use synchronization primitives to control the access
to shared variables. Numerous implementation details, e.g. how to access the control structure of each
word, how to keep track of the allocated segments, etc, are intentionally left completely unspecified.
For reference: (very optimized) implementations can reach speedups above ×2.5 against the grading
workload, and the implementation of the TA is less than 1000 lines long (all comments and spaces
included). The proposed algorithms can be extended with speculative execution of read-only transactions
(for you to figure out) to get a higher speedup or to compensate for the lack of optimizations. You should
look for such optimizations after getting an initial version that works correctly.
2 The project
You are expected to be reasonably fluent in C11 or C++17 for this project. Picking one or the other
shouldn’t impact the performance of your implementation (granted you don’t use overly expensive abstractions).
Pick the one you’re the most familiar with. If you’re unsure, we advise you to stick to C11.
A (very brief) introduction to the memory model of C11/C++11, along with available atomic operations,
are provided in Section 3.
2.1 Practical information
This project is associated with a git repository, hosted on GitHub. It will be available at https://
github.com/LPD-EPFL/CS453-2024-project when the semester starts. Its layout is the following:
grading/ Directory containing the source code of the evaluation program, to test your solution on
your own machine. The evaluation server (see below) runs the exact same binary, possibly with a
different seed. Section 2.1.1 further details how to test and later submit your code.
include/ C11 and C++17 header files that define the public interface of the STM.
template/ Template (in C11) of your implementation, but you are free to replace everything and
write C++17 instead. See section 2.1.3 for more information.
reference/ Reference, single lock-based implementation (see Section 1.3.1).
Note that you are not allowed to write an implementation that is equivalent to this reference
implementation. See section 2.1.3 for more information.
sync-examples/ Examples of how to use synchronization primitives (locks, condition variables and
atomics).
The specification of the evaluation server, on which your submission will be run, are the following:
CPU Virtualized Intel(R) Xeon(R) E5-2680: 24 cores at 2.80GHz
RAM 128 GB
OS GNU/Linux (distribution: Ubuntu 24.04)
2.1.1 Testing and submission
At this point, you should have received your unique (and secret) user identifier by mail.
If not, ask the TAs as soon as possible: without this identifier, you cannot submit your implementation.
The proposed workflow is the following:
91. Clone/download the repository.
2. Make a local copy of/rename the template directory with your SCIPER number:
you@your-pc:/path/to/repository$ cp -r template 123456
Be aware make is sensitive to spaces in the path/names.
3. Complete or completely rewrite (your copy of) the template with your own code.
(a) Complete/rewrite your code; only the interface should be kept identical.
(b) Compile and test locally with:
you@your-pc:/path/to/repository/grading$ make build-libs run
(c) Send your code for testing on the evaluation server; see below.
(d) Repeat any of the previous steps until you are satisfied with correctness and performance.
The procedure to send your code for evaluation is the following:
1. Zip your modified (copy of the) template directory (not directly its content).
2. Send your code for evaluation with the submit.py script:
you@your-pc:/path/to/repository$ python3 submit.py --uuid <. . .> 123456.zip
The UUID (Unique User IDentifier) to use is the one you should have received by mail.
Your request will be queued on the evaluation server, and you will receive the results of both the compilation
and the execution of your implementation (including its speedup vs the reference one) as soon as
it finishes running. You can submit code as often as you want until the deadline.
Only the (correct) submission with the highest speedup will be stored persistently and considered for
grading, so no need to resubmit your best implementation just before the deadline. The TAs may still
ask you to provide a copy of your best implementation though (possibly after the deadline), so do not
delete anything before the end of the semester (2025-01-30).
As an option from the submit.py script, you can force-replace your submission with another one. In
this case no compilation nor performance check is performed. The submission script also allows you to
download the submission considered for your grade, so you can make sure the system has the version you
want to be considered for grading. You can display the command-line help for this tool with:
you@your-pc:/path/to/repository$ python3 submit.py -h
Also note that the automated submission system has no false negative, i.e. a correct implementation
will always be deemed correct; unless it is more than 16 times slower than the reference (see Section
2.1.2).
On the other hand, the system has false positive (i.e. incorrect implementations deemed correct).
Unless your submission includes portions of code that appear to be specifically designed to trick the
submission system into considering an incorrect implementation correct, if the submission system qualifies
your implementation as correct, it will be graded as a correct one.
2.1.2 Grading
Correctness is a must
The executed transactions must satisfy the specification (sections 1.2 and 2.2).
No correctness ⇒ no “passing” grade for the project (i.e. grade < 4).
Performance is required to get the maximum grade
The only considered metric is the number of transactions committed per second.
Providing a correct implementation guarantees you at least a grade of 4 for the project, although an
implementation that is at least 16 times slower than the reference (coarse lock-based) implementation
will not be deemed correct , regardless of whether a longer execution time would have allowed it to finish.
Please also note: an implementation that leaks memory will have its grade capped at 5.
Given its speedup relative to the reference, the grade of a correct implementation (that does not leak) is:
10Speedup s Grade g
s < 1/16 g < 4
s ∈ [
1/16, 1] g = 4 + 16s−1
15
s ∈ [1, 2.5] g = 5 + 2s−2
3
s > 2.5 g = 6
For reference, the maximum speedup obtained on this project is ×15.04 with an optimized (algorithmwise)
version of TL2. The assoc