SWE3021辅导、讲解C/C++编程、辅导Multicore Computing留学生、讲解C/C++程序

2018.10.12 - 首页 >> C/C++编程

Project #2 - LU Decomposition using OpenMP

SWE3021 Multicore Computing - Fall 2018

Due date: October 19 (Fri) 17:00pm

1 Notes

1. For this project, please use swui, swye, and swji server. Since this project will use lots of

CPU cycles, I do not want you to mess up swin that is used for OS and System Programming

classes.

2. Please do not run more than 16 threads. And, please do not run a job that runs for more

than 5 minutes.

2 Parallel LU Decomposition

LU Decomposition is a classical method for transforming an N x N matrix A into the product of

a lower-triangular matrix L and an upper-triangular matrix U,

A = LU.

You will use a process known as Gaussian elimination to create an LU decomposition of a

square matrix. By performing elementary row operations, Gaussian elimination transforms a

square matrix A into an equivalent upper-triangular matrix U. The lower-triangular matrix L

consists of the row multipliers used in Gaussian elimination.

In this assignment, you will develop an OpenMP parallel implementation of LU decomposition

that use Gaussian elimination to factor a dense N x N matrix into an upper-triangular one and

a lower-triangular one. In matrix computations, pivoting involves finding the largest magnitude

value in a row, column, or both and then interchanging rows and/or columns in the matrix for the

next step in the algorithm. The purpose of pivoting is to reduce round-off error, which enhances

numerical stability. In your assignment, you will use row pivoting, a form of pivoting involves

interchanging rows of a trailing submatrix based on the largest value in the current column. To

perform LU decomposition with row pivoting, you will compute a permutation matrix P such that

PA = LU. The permutation matrix keeps track of row exchanges performed. Below is pseudocode

for a sequential implementation of LU decomposition with row pivoting.

i n p u t s : a ( n , n )

o u tpu t s : π( n ) , l ( n , n ) , and u ( n , n )

i n i t i a l i z e π a s a v e c t o r o f l e n g t h n

i n i t i a l i z e u a s an n x n ma trix with 0 s below the di a g o n al

i n i t i a l i z e l a s an n x n ma trix with 1 s on the di a g o n al and 0 s

above the di a g o n al

f o r i = 1 t o n

π [ i ] = i

f o r k = 1 t o n

max = 0

f o r i = k t o n

i f max < | a ( i , k ) |

max = | a ( i , k ) |

k ’ = i

i f max == 0

e r r o r ( s i n g u l a r ma trix )

swap π [ k ] and π [ k ’ ]

swap a ( k , : ) and a ( k ’ , : )

swap l ( k , 1 : k?1) and l ( k ’ , 1 : k?1)

u ( k , k ) = a ( k , k )

f o r i = k+1 t o n

l ( i , k ) = a ( i , k ) / u ( k , k )

u ( k , i ) = a ( k , i )

f o r i = k+1 t o n

f o r j = k+1 t o n

a ( i , j ) = a ( i , j ) ? l ( i , k )?u ( k , j )

Here , the v e c t o r π i s a compact r e p r e s e n t a t i o n o f a permuta tion

ma trix p ( n , n ) , which i s very s p a r s e . For the i t h row o f p , π( i )

s t o r e s the column inde x o f the s o l e p o s i t i o n th a t c o n t ai n s a 1 .

You will write a shared-memory parallel program that performs LU decomposition using row

pivoting. You will develop the solution using OpenMP. Your LU decomposition implementation

should accept four arguments:

1. n - size of a matrix

2. r - random seed number

3. t - number of threads (smaller than 16)

4. p - print flag (if 1, print matirx L, U, and A. if 0, print nothing)

Your output format must be as follows. Note that you should print out floating point values

using “%.4e” format specifier in printf.

$ a.out 4 1 8 1

1.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00

1.0851e+00 1.0000e+00 0.0000e+00 0.0000e+00

3.3061e-01 -1.8387e+00 1.0000e+00 0.0000e+00

4.3417e-01 -1.4853e+00 2.0883e-01 1.0000e+00

1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09

0.0000e+00 -4.9473e+08 -1.1048e+09 -2.1071e+08

0.0000e+00 0.0000e+00 -1.5622e+09 3.9619e+08

0.0000e+00 0.0000e+00 0.0000e+00 8.2737e+08

1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09

1.9577e+09 4.2424e+08 7.1989e+08 1.6498e+09

5.9652e+08 1.1896e+09 1.0252e+09 1.3505e+09

7.8337e+08 1.1025e+09 2.0449e+09 1.9675e+09

Your programs will allocate an n x n matrix A of double precision (64-bit) floating point

variables.

You should initialize the matrix with uniform random numbers computed using a random

number generator rand(). Note that you must call srand(r) before you initialize the matrix. The

srand() function sets the starting point for producing a series of pseudo-random integers. If srand()

is not called, the rand() seed is set as if srand(1) were called at program start. Any other value

for seed sets the generator to a different starting point. Then, you should apply LU decomposition

with partial pivoting to factor the matrix into an upper-triangular one and a lower-triangular one.

Have your program time the LU decomposition phase by reading the real-time clock before and

after and printing the difference.

Note: To evaluate the performance, you should use a problem size larger than n = 1000. But

do not run with too large parameter as in-ui-ye-ji cluster is shared by many students.

3 How to Submit

To submit your project, you must rename your code to project2.cpp and run multicore_submit

command in your project directory in swji.skku.edu server.

$ multicore_submit project2 project2.cpp

This command will submit your code to the instructor’s account.

For any questions, please post them in Piazza so that we can share your questions and answers

with other students. Please feel free to raise any issues and post any questions. Also, if you can

answer other students’ questions, please don’t hesitate to post your answer. You would get some

credits for posting questions and answers in Piazza.