CIS 548 - Project 2
- 首页 >> 其他CIS 548 - Project 2 - Spring 2019
CIS 548 Project 2: penn-mmu
MILESTONE 1 : Feb 25 @ 10pm
DUE : March 12 @10 pm
Directions
You must work in the same group of 2 that you were a part of for Project 1. If you are caught using code
from other groups or any sources from public code repositories, your entire group will receive ZERO for
this assignment, and will be sent to the Office of Student Conduct where there will be additional sanctions
imposed by the university.
Overview
In this assignment you will implement an in-memory simulation of the Memory Management Unit and Page
Replacement, penn-mmu. The primary learning goal of this assignment is to gain a thorough understanding
of Memory Allocation, Page Tables, Page Replacement Algorithms (also referred to as Replacement Policies in this document) and the Translation Lookaside Buffer.
While we recommend that your simulation stays as true to real working systems as possible, we will make
simplifying assumptions that gloss over certain hardware limitations that may exist in real systems.
DISCLAIMER: This is a never-before offered assignment that the teaching staff has spent time vetting and
reviewing. That being said, there may be unforeseen bugs, or instructions that are unintentionally ambiguous. Your feedback will help us make this better for this offering of the class and years to come. If you find
something that is unclear, please reach out to the teaching staff sooner rather than later and help us help
you.
This assignment is fairly different from the other projects of the class in that you will not be making use of
Linux system calls. A framework structure is provided in order to streamline the assignment without sacri-
ficing design freedom. Be sure to read through the entire spec before beginning, and make no assumptions
about what is expected. Whiteboard design is your friend, use it to design a coherent system before you dive
into the details. As always, take advantage of the teaching staff and attend as many office hours as possible.
We are here to help you and make sure you get the most out of the course!
IMPORTANT: Under no circumstances is your team allowed to use any personal git repository for
this project. Please make sure you use the repository that has been provided to you by the teaching
staff. In adherence to Penn’s Academic Integrity Policies, you are not allowed to upload any of your
work (now or in the future) to any git repository open to the public. Violating this will warrant serious
retroactive action!
Compiled 2019-02-18 08:35
CIS 548 - Project 2 - Spring 2019
1 Specification
This project can be broadly classified into two main parts - Memory Allocation and Virtual Memory Management. The Memory Allocation part of the project will have you implement your own version of malloc
and free. The Virtual Memory Management part of the project will involve implementing virtual memory
address translation (including the use of a Translation Lookaside Buffer) and page replacement algorithms.
In this assignment, you will use the given framework to build a simulation of these components. Your
simulation will run as a single user level process.
1.1 Memory Allocation
In order to read and write data, penn-mmu must allocate memory. In this project, you are required to
implement two functions - pemmu_malloc and pemmu_free (see interface.h in the provided framework
folder, for more on this). Though these functions are intended to mimic Linux’s implementation of malloc
and free, it is sufficient if they support the following basic functionalities :
1. The following data types must be supported - uint8_t (byte), uint16_t (word), uint32_t (dword),
and uint64_t (qword)
2. Memory Allocation should follow the First-Fit algorithm.
3. Unlike Unix, where the memory allocated may be equal to or greater than the size requested (due to
alignment boundaries and minimum block sizes), your implementation should allocate the exact size
requested. You are not required to align your addresses to any specific boundary since the granularity
of memory access for penn-mmu is 1 byte.
4. Your implementation should track memory allocated such that it should should not allow data to be
written to addresses that are not allocated.
5. Any address that lies within the range of allocated memory is a valid address for read/write operations.
6. Only addresses that are returned by pemmu_malloc can be freed. Calling pemmu_free on any other
address or on an address that has already been freed should result in an error.
7. In order to simplify the implementation, you are not required to support defragmentation.
1.1.1 First Fit Memory Allocation Algorithm
In order to ensure the virtual addresses allocated by your implementation is in sync with the testing framework you are required to follow the First Fit Memory Allocation Algorithm. The upper limit on the size of
a malloc is the page size. Your allocator design should make use of a particular page only if the requested
size fit’s that page. Hence a single malloc request should not span across page boundaries. Your implementation of the First Fit algorithm should be efficient enough to make use of every free contiguous address
block in the virtual memory address range before your implementation of malloc can return an error due
to insufficient memory. We emphasize on the phrase "free contiguous address block" since you are not required to support memory defragmentation. Be assured that the testing framework will stretch your memory
allocation algorithm to its absolute limit.
CIS 548 - Project 2 - Spring 2019
1.2 Virtual Memory Address Translation
It is the job of the Memory Management Unit (MMU) to translate virtual addresses to physical addresses.
Recall that a virtual address is really a pair of address (p,o) where the lower order bits of the address give
the offset within a given page o and the higher order bits specify the page number p. So the job of the
MMU is really to translate (virtual) page number p to (physical) frame number f. The mapping between
page numbers and frame numbers is maintained by the Operating System in page tables. There is no need
to map the offset bits since that remains the same across virtual and physical addresses.
penn-mmu makes certain simplifying assumptions outlined here and illustrated by Figure 1
1. There is no support for multiple processes to run at the same time. We will work with a single process.
Thus, penn-mmu need only maintain a single page table
2. Given the small size of memory we will be working with, the page table maintained by penn-mmu is
a single-level page table
3. Memory is divided into fixed sized pages, the size of which is configurable. Since penn-mmu is byte
addressable, the granularity of page addresses is 1 byte (uint8_t). Hence after an address range is
allocated, any address within that range is a valid address for write/read calls.
4. The size of virtual and physical memories are specified via a set_memory_configuration function
(see interface.h in the provided framework, for more on this)
5. Swap memory, as referred to in all parts of this spec, will be a single file that resides on the host’s
file system. (The terms swap space, swap memory and disk are used interchangeably throughout this
spec and all represent the same underlying idea of a place on "disk" where pages can live when they
are swapped out of physical memory.)
6. The swap file should only increase in size as pages are evicted from the physical memory. Pages are
never evicted from swap memory.
7. Since this project is a software simulation of memory management, you do not need to worry about
the memory footprint of the data structures that you plan to use for book keeping.
Figure 1 shows a snapshot of a sample configuration of penn-mmu with a virtual memory size of 60
bytes, physical memory size of 30 bytes and page size of 6 bytes. In this sample snapshot, the entire virtual
memory is filled with an up-counter data of type uint8_t (1 byte). Since the size of each page is 6 bytes, it
can hold 6 entries of uint8_t. It is extremely important you note that the virtual memory is just a logical
construct. It should never actually occupy any physical space in your program. The data movement for the
read and write calls should be via the physical memory. Study this figure carefully as it gives you a good
visualization of penn-mmu. All our test configurations will be small, since it is a simulation that will run
on the limited resources of the speclab machines. However, make no assumptions on the max size of the
memory configurations. Your implementation should be able to scale to any configuration. Always make
use of dynamic memory allocation to scale your data structures.
As part of your design process, you will have to decide on your representation of each of these components (including your representation of swap space which should be a single file on the host system). You are
free to use any reasonable data structure of your choosing for this purpose. Remember that this assignment,
like all others in this class, must be implemented only in C. You may not use any standard external libraries
CIS 548 - Project 2 - Spring 2019
Figure 1: A simple penn-mmu memory configuration where each page holds 6 bytes of data
for data structures, but must implement your own wherever needed. Since this is a software simulation, your
implementation need not be constrained by resources the way that an actual MMU would.
1.2.1 Data Types
Your design of physical memory should be such that it supports the storage and retrieval data types uint8_t
(byte), uint16_t (word), uint32_t (dword), and uint64_t (qword). These data types are of size 1 byte,
2 bytes, 4 bytes and 8 bytes respectively. For example, if the size of each page is 16 bytes, then each page
should be able to hold either
1. 16 units of uint8_t data
2. 8 units of uint16_t data
3. 4 units of uint32_t data
4. 2 units of uint64_t data
It is entirely possible for each page to hold any combinations of the above data types (provided space is
available).
1.2.2 Page Table Entries
The page table must at minimum contain the following data
1. Frame number - the physical frame that a virtual page translates to. You may not use a special
placeholder value to indicate that the page table entry is invalid. That is the purpose of the valid
bit
CIS 548 - Project 2 - Spring 2019
2. Valid bit - indicates whether a given page table entry is valid or not
3. Dirty bit - indicates if a write has been performed to the page since it was last written to swap memory.
If the dirty bit is not set, a page’s information does not have to be written back to disk on eviction. If
it is set, the page must be written back to ensure no loss of data.
You may (and will) require that the page table store additional information, which will include metadata
required for different page replacement algorithms. The above is just the minimum requirement that we will
look for while grading.
1.3 Page Replacement
The other important component of Virtual Memory Management is page replacement. Since virtual memory
is always larger than physical memory, there is a need to swap pages out of physical memory and in to disk
(the swap file), and bring in pages from disk as required. penn-mmu will implement the following
1. First In First Out (FIFO)
2. Clock Algorithm (CLOCK)
3. Least Recently Used (LRU)
The replacement policy to be used during the run of a particular test case will be specified using the
set_replacement_policy function that can be found in virtual_memory_controller.h.
Each of the replacement algorithms is to be implemented in adherence with the class slides, and the explanation given in class. Please use Piazza or Office Hours to clarify any ambiguity that you face in this
regard.
1.4 Translation Lookaside Buffer - TLB
Recollect that the TLB behaves as a cache for virtual memory address translation in that it stores the mapping from (virtual) page number to (physical) frame number, much like a single-level page table does. The
true gains of a TLB are seen when the system uses multi-level page tables. Since this is not the case in pennmmu, you might wonder what the benefit of the TLB is here. This implementation will hopefully serve as a
learning tool, to help understand all of the different bookkeeping and synchronization that the OS must do
in order to maintain consistency. It will also play a part in the analysis portion of this project as described in
Section 4.
In penn-mmu, you will implement the TLB as a configurable part of the system. The boolean tlb_enabled
in interface.h (which is a part of the configurations set by the testing framework) must be used to turn on
and off the functionality of the TLB as required. Your implementation will be subject to a series of tests
both with and without the TLB.
The size of the TLB is also configurable, and will set by the testing framework as the number of TLB entries. Please go through the section on testing (Section 5) for more details on configurations and the testing
framework.
In hardware, the TLB is implemented as associative memory (or content addressable memory), so that all
entries can be accessed in parallel. You are not required to maintain this illusion and can scan through entries
CIS 548 - Project 2 - Spring 2019
iteratively since we are not concerned about access latency in a software simulation. Having said that, it is
useful to remember that the TLB is not an indexed data structure like the page table!
At minimum, your TLB must contain the following entries
1. (virtual) page number
2. (physical) frame number
3. modified bit
Your TLB may (and will) require additional information stored in each entry. The above is the minimum
requirement for grading purposes. You will also find that it is necessary to synchronize the TLB and page
table, so that page replacement is performed correctly. Be careful not to overdo this and sync the two unnecessarily often.
Your TLB must implement an LRU based eviction of entries.
1.5 End-to-End Behavior
penn-mmu supports only custom read and write calls which are a part of the framework that has been
provided to you. At a high level, these are the operations that the testing framework will make use of
1. pemmu_malloc - in order to perform any write or read operation, memory must be allocated for the
variable via the pemmu_malloc function call. This function is used to invoke your address allocation
algorithm and should return a virtual address that can be used by the read and write routines. As
always, the virtual memory is just a logical allocation. All reads and writes MUST be via the physical
memory. Note that pemmu_malloc should only allocate memory (book keeping) and NOT cause a
page fault.
2. pemmu_free - frees the address that was returned by a pemmu_malloc call.
3. pemmu_write - write a value via the pemmu_write function to an address that was ’allocated’ using
the pemmu_malloc function call. If the address does not currently have a valid page table mapping, a
page fault occurs, which is then appropriately handled by bringing in the page from disk to physical
memory. If there is no free frame in physical memory, a frame is freed up in accordance with the page
replacement policy (which was specified by the testing framework).
4. pemmu_read - follows the same flow as write, except it reads the value stored at a specified address
via the pemmu_read function call.
An address must only be written to or read from after it has been allocated via the pemmu_malloc function
call. Any reads/writes to addresses that are not malloced by pemmu_malloc should throw an error. However,
you are assured that a read will only be called after a write.
The pemmu_malloc, pemmu_free, pemmu_write and pemmu_read functions provided to you in interface.h
should not be modified. These are the top level functions that the testing framework will interact with. These
functions in turn call corresponding API’s listed in Section 2. You are required to implement these functions. Please ensure that you do not change the signature of any of the given APIs as they are critical to
grading.
CIS 548 - Project 2 - Spring 2019
2 Framework
You have been provided with a framework that is intended to assist you in the implementation of penn-mmu.
Any files that are outside the submission folder must not be modified by you, we will use a stock version of
these during grading so any changes you make will not be seen by us. All your code files must be within the
submission folder.
You are expected to modify the Makefile inside the submission folder based on the source files you
decide to add. This Makefile will be referenced by the top-level Makefile present outside the submission
folder. Please do not modify the top-level Makefile. Ensure that your code compiles and runs on speclab
with this top level Makefile. The teaching staff will not spend time debugging Makefiles.
2.1 Given code files:
(Inside the framework folder - not to be modified)
1. main.c is the driver of the program, which invokes the required initialization and then calls a single
test program
2. interface.c implements the top level functions - pemmu_malloc, pemmu_read, and pemmu_write. It
also contains the set_memory_configuration function which sets the memory configuration for
penn-mmu and a helper function split_virtual_address that splits a virtual address into page
number and offset.
3. logger.c contains important logging functions that must be used at appropriate points in your code to
comply with grading
4. perf_counters.c implements important performance counters that must be used at appropriate points
in your code to comply with grading and assist with analysis
2.2 Functions that you must implement for grading compliance
(found in the virtual_memory_controller.c file):
1. master_initialization must hold all of the initialization calls that your implementation requires
since this is the only initialization that the testing framework will call
2. set_replacement_policy will be invoked by the testing framework and must be used to set the
replacement policy to be used by the system
3. mmu_write will be invoked by a higher level write function (from interface.c), and is expected to
follow the API that has been provided
4. mmu_read will be invoked by a higher level read function (from interface.c), and is expected to follow
the API that has been provided
5. mmu_allocate_variable will be invoked by a higher level malloc function (from interface.c), and
is expected to follow the API that has been provided
CIS 548 - Project 2 - Spring 2019
6. mmu_free_variable will be invoked by a higher level free function (from interface.c), and is expected to follow the API that has been provided
7. master_clean_up must hold all of the clean up calls that your implementation requires before the
program exits. This is the only clean up that the testing framework will invoke.
8. (found in the analysis_programs.c file):
All the function signatures. Please refer to the analysis section (Section 4) for more information on
these functions. This portion will count for extra credit.
3 Logging and Grading
It is imperative that your solution uses the following logging functions appropriately.
1. log_page_fault
2. log_translate
3. log_eviction
4. log_disk_write
5. log_add_TLB_entry
6. log_tlb_hit
7. log_tlb_miss
There are a number of other logging utilities, none of which needs to be invoked by you. The invocations of
these functions are scattered throughout the framework as necessary.
You have been provided test programs and their expected outcomes, it is in your interest to ensure that
your solution’s log output matches the sample solutions EXACTLY, as this portion is autograded and is the
primary basis for grading in this assignment.
You have been provided with a few test cases, representing the flavor of testing that your implementation of
penn-mmu will go through. Feel free to modify and play around with these to gain a complete understanding
of the framework.
4 Analysis (Extra Credit)
A significant component of this project is to understand that no one page replacement policy is universally
the best solution. Data access patterns have a huge impact on the performance of replacement policies. In
this section of the project, you will analyze and understand the different policies that penn-mmu implements.
To help with this we have provided certain performance counters that must be used in your code.
CIS 548 - Project 2 - Spring 2019
4.1 Performance Counters
Source files perf_counters.c and perf_counters.h define a number of counters that you must use in your
implementation. These performance counters are intended to help you with the analysis that we ask you
to perform. In addition, a component of your grade will depend on the numbers that these counter values
accumulate during testing, so do not forget to include them at the appropriate places in your code. You will
have to use all of the following functions
1. perf_stats_increment_disk_write() every time a disk write is carried out
2. perf_stats_increment_evictions() every time a page is evicted from the page table
3. perf_stats_increment_page_faults() whenever there is a page fault, i.e. when the mapping
from virtual page number to physical frame number is missing from the page table
4. perf_stats_increment_tlb_hit() on every TLB hit (this will be 0 in test runs where the TLB is
configured as disabled)
5. perf_stats_increment_tlb_miss() on every TLB miss (this will be 0 in test runs where the TLB
is configured as disabled)
6. perf_stats_increment_tlb_sync() is to be used whenever your implementation syncs the TLB
with the page table (this will be 0 in test runs where the TLB is configured as disabled)
4.2 Analysis Requirement
To facilitate a thorough understanding of replacement policies we ask that you submit one test program per
condition outlined below. Your test programs are not required to be any more complicated than they need to
be to satisfy the specific condition. You are free to choose any configuration of physical memory and virtual
memory, so long as physical memory is at least 32 bytes large and virtual memory is strictly larger than
physical memory. All of your test cases need only work with the TLB disabled.
Please take the time to understand the provided testing framework and submit test programs that comply
with the specified APIs. We will run your test cases on our reference implementation, so do not directly reference any custom functions for which an API was not provided to you. Test programs that do not comply
with these requirements will not be graded.
Write one test case each that satisfies the condition where
1. FIFO performs better than Clock (fifo_over_clock)
2. Clock performs better than FIFO (clock_over_fifo)
3. LRU performs better than Clock (lru_over_clock)
4. Clock performs better than LRU (clock_over_lru)
5. FIFO performs better than LRU (fifo_over_lru)
6. LRU better than FIFO (lru_over_fifo)
where "performs better" in this context simply means has less evictions from main memory. To streamline
grading, stub code with the function signatures are provided under analysis_programs.c. Place your test
programs in these functions.
CIS 548 - Project 2 - Spring 2019
5 Testing Framework
This project will be auto-graded with the help of an open source testing framework called CUTest, which
enables Java style unit testing in C. The test framework contains a vast suite of targeted unit tests that are
written to test different areas of your penn-mmu design. A small portion of these tests are provided to you
as part of the skeleton code along with the logs from the reference solution. These logs are provided to
help you understand what is expected from your solution. The auto grading script will run the complete
suite of tests against your submission and will compare the output logs against the reference solution logs.
The testing framework and the sample test suite is provided under the tests folder. The top level testing
framework functions are implemented in test_framework.c. These are :
1. test_framework_initialize: Performs all the initialization before the start of a test
2. test_framework_cleanup: Performs all the clean up once the test completes
3. run_cutest_framework: Runs the testing framework
The logs from each test are written into separate files which are created and closed before and after the
test respectively. The memory configuration (including enabling and disabling the TLB) can be configured
according to the requirement of each test using the set_memory_configuration function.
6 Developing and Testing Your Code
Typing ’make’ in the src folder should compile your code.
Various tests can be run using:
./penn-vm
The various , and codes and descriptions are available
in the test_framework.h file in the tests folder. codes range between 1 and 17.
The different codes are available in the interface.h file in the framework folder.
Also you can run all configurations of a particular test type using the following:
./penn-vm
The various codes are also available in the test_framework.h file in the tests folder. They
range between 22 and 38.
./penn-vm 0 runs all tests.
For milestone 1 you should be able to pass the following tests:
./penn-vm 22
./penn-vm 23
./penn-vm 24
For this project you should run and test your code on SpecLab (but not Eniac). We will be grading this
CIS 548 - Project 2 - Spring 2019
project on specLab and not on the VM. You can access SpecLab remotely over ssh. There are a number of
SpecLab machines up at any time. To check which machines are online, visit http://www.seas.upenn.
edu/cets/checklab/index.php?lab=speclab. When you are ready to develop, you can either choose a
machine ID number that is online and issue this command:
bash# ssh specDD.seas.upenn.edu
where DD is replaced with the machine ID number you chose, OR you can ssh into a random machine with
the following command:
ssh pennkey@speclab.seas.upenn.edu
You must be on SEASnet to directly ssh into SpecLab. If you are not on SEASnet, you may still remotely
access SpecLab by first ssh-ing into eniac and then ssh-ing into SpecLab.
7 What to turn in
You must provide each of the following for your submission, no matter what. Failure to do so may reflect
in your grade.
1. README file. In the README you will provide
• Your name and eniac username
• A list of submitted source files (not including anything outside the submission folder)
• Overview of work accomplished
• Description of code and code layout
• Whether you completed the extra credit (Analysis) portion
• General comments and anything that can help us grade your code
2. Makefile
3. Your code. You may think this is a joke, but at least once a year someone forgets to include it.
8 Submission
This project will be submitted on Canvas. After ensuring all relevant code and the README, from the
parent directory, compress the directory into a tarball named “mygroupnum-project2”. Finally, copy this file
to your local machine so you can upload through Canvas’s web interface. E.g.
spec01:~/CIS548a> tar --exclude-vcs -cvaf mygroupnum-project2.tar.gz submission
spec01:~/CIS548a> logout
eniac:~> logout
local:~> scp mypennkey@eniac.seas.upenn.edu:~/DIR/TO/mygroupnum-project2.tar.gz .
Replace “mygroupnum” with your group number. You can then upload the file from your local home
directory to the Canvas website.
Upon submission, a good sanity check would be to download your submission and verify that it has all of
the required files.
CIS 548 - Project 2 - Spring 2019
9 Tentative Grading Guidelines
Each team will receive a group grade for this project; however, individual grades may differ. This can occur
due to lack of individual effort, or other group dynamics. The team Git repository will be monitored and
used to determine individual contribution. In most cases, everyone on a team will receive the same grade.
Below is the grading guideline.
• 10% Milestone
• 2% Version Control
• 8% Documentation
• 8% Code Design
• 72% Functionality
• 10% Analysis (Extra Credit)
Please note that the grading scheme may be modified as the assignment progresses. The above is more of a
guideline that highlights all of the factors that you will be graded on.
Also note that general deductions may occur for a variety of programming errors including memory violations, lack of error checking, etc. Also, do not take the documentation lightly, it provides us (the graders) a
road-map of your code, and without it, it is quite difficult to grade the implementation.
Use the provided sample test suite and log files to achieve the best version of your solution. Do not rely
solely on the sample tests, as they are just a subset of the final test bench. Use the provided framework to
develop your own test cases.
Your programs will be graded on SpecLab, and must execute as specified there. You may develop and test on
your local machine, but you should ensure that your program functions properly on the SpecLab machine.
10 Milestone (10%)
In order to ensure that students stay on track and receive early feedback, we require all groups to submit an
intermediate solution before the deadline. For the milestone, students are expected to have completed the
designing of the different data structures to be used in the project along with the Memory Allocation part of
the MMU (mmu_malloc() and mmu_free())
Do include a README in this submission, including:
• An accounting of the work you have completed thus far
• Description of all the data structures
This milestone is designed to guide your workflow. As such, we will not grade it in significant detail.
CIS 548 - Project 2 - Spring 2019
11 Attribution
There is a large amount of material available online that explains and implements different kinds of virtual
memory simulators. It is in your best interest to refrain from looking any of these specific implementations
up. Not only will you be disciplined for violating the Academic Integrity Policy, but it will also be unlikely
to help since there is a specific framework and format that must be followed. That being said, there may be
sources online that you use to further your understanding of the working of the system. The primary rule
to keep you safe from plagiarism/cheating in this project is to attribute in your documentation any outside
sources you use. This includes both resources used to help you understand the concepts, and resources
containing sample code you incorporated in your simulator. The course text and APUE need only be cited
in the later case. Use external code sparingly.