SWE3021辅导、讲解C/C++编程、辅导Multicore Computing留学生、讲解C/C++程序
- 首页 >> C/C++编程Project #2 - LU Decomposition using OpenMP
SWE3021 Multicore Computing - Fall 2018
Due date: October 19 (Fri) 17:00pm
1 Notes
1. For this project, please use swui, swye, and swji server. Since this project will use lots of
CPU cycles, I do not want you to mess up swin that is used for OS and System Programming
classes.
2. Please do not run more than 16 threads. And, please do not run a job that runs for more
than 5 minutes.
2 Parallel LU Decomposition
LU Decomposition is a classical method for transforming an N x N matrix A into the product of
a lower-triangular matrix L and an upper-triangular matrix U,
A = LU.
You will use a process known as Gaussian elimination to create an LU decomposition of a
square matrix. By performing elementary row operations, Gaussian elimination transforms a
square matrix A into an equivalent upper-triangular matrix U. The lower-triangular matrix L
consists of the row multipliers used in Gaussian elimination.
In this assignment, you will develop an OpenMP parallel implementation of LU decomposition
that use Gaussian elimination to factor a dense N x N matrix into an upper-triangular one and
a lower-triangular one. In matrix computations, pivoting involves finding the largest magnitude
value in a row, column, or both and then interchanging rows and/or columns in the matrix for the
next step in the algorithm. The purpose of pivoting is to reduce round-off error, which enhances
numerical stability. In your assignment, you will use row pivoting, a form of pivoting involves
interchanging rows of a trailing submatrix based on the largest value in the current column. To
perform LU decomposition with row pivoting, you will compute a permutation matrix P such that
PA = LU. The permutation matrix keeps track of row exchanges performed. Below is pseudocode
for a sequential implementation of LU decomposition with row pivoting.
i n p u t s : a ( n , n )
o u tpu t s : π( n ) , l ( n , n ) , and u ( n , n )
i n i t i a l i z e π a s a v e c t o r o f l e n g t h n
i n i t i a l i z e u a s an n x n ma trix with 0 s below the di a g o n al
i n i t i a l i z e l a s an n x n ma trix with 1 s on the di a g o n al and 0 s
above the di a g o n al
f o r i = 1 t o n
π [ i ] = i
f o r k = 1 t o n
max = 0
f o r i = k t o n
i f max < | a ( i , k ) |
max = | a ( i , k ) |
k ’ = i
i f max == 0
e r r o r ( s i n g u l a r ma trix )
swap π [ k ] and π [ k ’ ]
1
swap a ( k , : ) and a ( k ’ , : )
swap l ( k , 1 : k?1) and l ( k ’ , 1 : k?1)
u ( k , k ) = a ( k , k )
f o r i = k+1 t o n
l ( i , k ) = a ( i , k ) / u ( k , k )
u ( k , i ) = a ( k , i )
f o r i = k+1 t o n
f o r j = k+1 t o n
a ( i , j ) = a ( i , j ) ? l ( i , k )?u ( k , j )
Here , the v e c t o r π i s a compact r e p r e s e n t a t i o n o f a permuta tion
ma trix p ( n , n ) , which i s very s p a r s e . For the i t h row o f p , π( i )
s t o r e s the column inde x o f the s o l e p o s i t i o n th a t c o n t ai n s a 1 .
You will write a shared-memory parallel program that performs LU decomposition using row
pivoting. You will develop the solution using OpenMP. Your LU decomposition implementation
should accept four arguments:
1. n - size of a matrix
2. r - random seed number
3. t - number of threads (smaller than 16)
4. p - print flag (if 1, print matirx L, U, and A. if 0, print nothing)
Your output format must be as follows. Note that you should print out floating point values
using “%.4e” format specifier in printf.
$ a.out 4 1 8 1
L:
1.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
1.0851e+00 1.0000e+00 0.0000e+00 0.0000e+00
3.3061e-01 -1.8387e+00 1.0000e+00 0.0000e+00
4.3417e-01 -1.4853e+00 2.0883e-01 1.0000e+00
U:
1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09
0.0000e+00 -4.9473e+08 -1.1048e+09 -2.1071e+08
0.0000e+00 0.0000e+00 -1.5622e+09 3.9619e+08
0.0000e+00 0.0000e+00 0.0000e+00 8.2737e+08
A:
1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09
1.9577e+09 4.2424e+08 7.1989e+08 1.6498e+09
5.9652e+08 1.1896e+09 1.0252e+09 1.3505e+09
7.8337e+08 1.1025e+09 2.0449e+09 1.9675e+09
Your programs will allocate an n x n matrix A of double precision (64-bit) floating point
variables.
You should initialize the matrix with uniform random numbers computed using a random
number generator rand(). Note that you must call srand(r) before you initialize the matrix. The
srand() function sets the starting point for producing a series of pseudo-random integers. If srand()
is not called, the rand() seed is set as if srand(1) were called at program start. Any other value
for seed sets the generator to a different starting point. Then, you should apply LU decomposition
with partial pivoting to factor the matrix into an upper-triangular one and a lower-triangular one.
Have your program time the LU decomposition phase by reading the real-time clock before and
after and printing the difference.
Note: To evaluate the performance, you should use a problem size larger than n = 1000. But
do not run with too large parameter as in-ui-ye-ji cluster is shared by many students.
2
3 How to Submit
To submit your project, you must rename your code to project2.cpp and run multicore_submit
command in your project directory in swji.skku.edu server.
$ multicore_submit project2 project2.cpp
This command will submit your code to the instructor’s account.
For any questions, please post them in Piazza so that we can share your questions and answers
with other students. Please feel free to raise any issues and post any questions. Also, if you can
answer other students’ questions, please don’t hesitate to post your answer. You would get some
credits for posting questions and answers in Piazza.