Erick Purwanto and Teng Ma – July 2023
CPT111 2223 Resit-CW Task Sheet
Overview
Resit Coursework (Resit-CW) is the final coursework component of the course for resit
students. It contributes 100% of your final mark.
You will use the object-oriented techniques, file processing, and data structures you
have learned throughout the semester to solve a problem that consists of two main
tasks. In addition, you will create a video presentation to showcase your
problem-solving knowledge and algorithm analysis skills, which mainly involve string
processing. You need to complete the Java Code, the Ethic Quiz, and the MP4 and PPT presentation.
Timeline
Resit 1st Week, July 11, 2023: Resit-CW package is released, containing this task sheet, skeleton code, and partial test cases.
Resit 2nd Week, July 21, 2023, 18:00 CST: Resit-CW Java Code Online Quiz and Ethic Online Quiz are open; they close at 23:59 CST.
Resit 2nd Week, July 17, 2023, 14:00 CST: Video MP4 Dropbox and PPT Dropbox are open; they close on July 21, 2023, 23:59 CST.
Late Submission Period: 5% lateness penalty per day, only for Video and PPT. No lateness allowed for Code / Ethic Quiz.
July 28, 2023, 23:59 CST: End of Late Submission Period. No submissions are accepted thereafter.
University Lateness Policy
Video MP4 and PPT may be submitted late, with a penalty, for a maximum of 5 days.
There will be no late Code or Ethic Quiz submissions since feedback is given by the
quiz. This is consistent with the University lateness policy of not having a late
submission period for assessments with feedback.
Outline
The rest of the task sheet will describe the background of the problem, detailed
specification of the two main tasks, and the deliverables you have to submit.
Resit-CW – DNA for Profiling and Disease Detection
Background
DNA carries the genetic information in living beings. Interestingly, it has been used in
the criminal justice system for profiling work, as well as for disease diagnosis in medicine. In
this resit coursework, your task is to develop algorithms for those two purposes.
DNA
Deoxyribonucleic acid (DNA) is a sequence of molecules called nucleotides, arranged
into a double helix shape. Each nucleotide of DNA contains one of four different
bases: Adenine (A), Cytosine (C), Guanine (G), or Thymine (T).
Every human cell has billions of these nucleotides arranged in sequence. Some
portions of this sequence are the same, or very similar, across almost all humans.
However, some portions of the sequence have a higher genetic diversity
and thus vary more across the population.
Short Tandem Repeats (STRs)
One place where DNA tends to have high genetic diversity is in Short Tandem
Repeats (STRs). An STR is a short sequence of DNA bases that is repeated
continuously numerous times at specific locations in DNA. The number of times any
particular STR repeats varies a lot among different people.
In the DNA samples below, for example, Alice has the STR AAGT repeated back-to-back
three times in her DNA, while Bob has the same STR repeated back-to-back four
times.
DNA Profiling and Database
DNA profiling is a procedure used to identify individuals on the basis of their unique
genetic makeup. Recording the STR counts of the population in a DNA database,
and searching that database first, can help speed up the identification process.
Using multiple STRs, we can improve the accuracy of DNA profiling. If the probability
that two people have the same number of repeats of a single STR is 5% and we look at 10
different STRs, then the probability that two DNA samples match solely by chance
(assuming independence of all STRs) is 0.05^10, or about 1 in 10 trillion. So, if two DNA
samples match in the number of continuous repeats for each of the STRs, we can
have enough confidence that they came from the same person.
Let us have a very simple DNA database in the form of a CSV file. Each row
corresponds to an individual, and each column corresponds to a particular STR.
For example, database.csv contains:
name,AAGT,ACTC,TATG
Alice,22,35,18
Bob,16,20,18

The data in the above CSV file would suggest that Alice has the sequence AAGT
repeated 22 times consecutively somewhere in her DNA, the sequence ACTC
repeated 35 times, and TATG repeated 18 times. Bob, meanwhile, has those same
three STRs repeated 16 times, 20 times, and 18 times, respectively.
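As a rough illustration of how such a CSV could be held in memory, the sketch below parses the header and rows into an ArrayList of STR names and a TreeMap from each name to its counts. The class name DnaDatabaseSketch and its fields are hypothetical and not part of the coursework API. It takes the lines as a list instead of opening a file, so that it runs without database.csv; in a real program the lines might come from something like Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not the required DnaProfileDiagnosis class.
class DnaDatabaseSketch {
    // STR names from the header row, in column order (e.g. AAGT, ACTC, TATG).
    public final ArrayList<String> strNames = new ArrayList<>();
    // Each individual's name mapped to their STR counts, in the same column order.
    public final TreeMap<String, ArrayList<Integer>> profiles = new TreeMap<>();

    public DnaDatabaseSketch(List<String> csvLines) {
        // First row: the word "name" followed by the STR sequences.
        String[] header = csvLines.get(0).split(",");
        for (int i = 1; i < header.length; i++) {
            strNames.add(header[i].trim());
        }
        // Remaining rows: a name followed by one count per STR.
        for (int r = 1; r < csvLines.size(); r++) {
            String line = csvLines.get(r).trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines (an assumption, not stated in the task sheet)
            }
            String[] cells = line.split(",");
            ArrayList<Integer> counts = new ArrayList<>();
            for (int i = 1; i < cells.length; i++) {
                counts.add(Integer.parseInt(cells[i].trim()));
            }
            profiles.put(cells[0].trim(), counts);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "name,AAGT,ACTC,TATG",
                "Alice,22,35,18",
                "Bob,16,20,18");
        DnaDatabaseSketch db = new DnaDatabaseSketch(lines);
        System.out.println(db.strNames);  // [AAGT, ACTC, TATG]
        System.out.println(db.profiles);  // {Alice=[22, 35, 18], Bob=[16, 20, 18]}
    }
}
```

Keeping the STR names in a separate list preserves the column order, so the i-th count of every individual always refers to the i-th STR.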
Next, a DNA sequence is queried against the database. Given that sequence of DNA,
how can one identify to whom it belongs? For example, one may first search for
the longest run of consecutive repeats of AAGT in the sequence, followed
similarly by ACTC and TATG. If one then finds that the longest run of AAGT is
22 repeats long, ACTC is 35 repeats long, and TATG is 18 repeats long, one may
conclude that the DNA is Alice's. Finally, it is also possible that after one takes the
counts for each of the STRs, they do not match anyone in the DNA database, in which
case one reports no match.
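The search described above could be sketched as follows. This is one simple approach and not necessarily the intended solution: for every starting position it counts how many copies of the STR follow back-to-back, and keeps the maximum. The class and method names (StrMatchSketch, longestRun, match) are hypothetical, the sequence is passed as a String for brevity, and the return value "None matches" follows the wording of the specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; names are not from the task sheet.
class StrMatchSketch {
    // Longest number of back-to-back copies of str found anywhere in dna.
    public static int longestRun(String dna, String str) {
        if (str.isEmpty()) {
            return 0; // guard against an endless loop on an empty pattern
        }
        int best = 0;
        int k = str.length();
        for (int start = 0; start + k <= dna.length(); start++) {
            int run = 0;
            int pos = start;
            // Keep stepping forward by one STR length while the STR repeats.
            while (dna.startsWith(str, pos)) {
                run++;
                pos += k;
            }
            best = Math.max(best, run);
        }
        return best;
    }

    // Compare the observed runs with every profile; counts follow the order of strNames.
    public static String match(String dna, ArrayList<String> strNames,
                               TreeMap<String, ArrayList<Integer>> profiles) {
        ArrayList<Integer> observed = new ArrayList<>();
        for (String str : strNames) {
            observed.add(longestRun(dna, str));
        }
        for (Map.Entry<String, ArrayList<Integer>> entry : profiles.entrySet()) {
            if (entry.getValue().equals(observed)) {
                return entry.getKey();
            }
        }
        return "None matches";
    }

    public static void main(String[] args) {
        ArrayList<String> strNames = new ArrayList<>(Arrays.asList("AAGT", "TATG"));
        TreeMap<String, ArrayList<Integer>> profiles = new TreeMap<>();
        profiles.put("Alice", new ArrayList<>(Arrays.asList(3, 1)));
        profiles.put("Bob", new ArrayList<>(Arrays.asList(4, 2)));

        System.out.println(match("CCAAGTAAGTAAGTCTATGG", strNames, profiles)); // Alice
        System.out.println(match("ACGT", strNames, profiles));                 // None matches
    }
}
```

Restarting the count at every position is not the fastest option, but it is easy to reason about, and the task sheet places no requirement on running time.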
One of your tasks is to write a program that will first take a CSV file containing STR
counts for a list of individuals, build a DNA database of your own, take another TXT
file that contains a DNA sequence, and then output to whom the DNA belongs or
report no match.
Huntington's Disease Diagnosis
Huntington’s disease (HD) is an inherited and terminal neurological disorder. It is a
condition that stops parts of the brain working properly over time, and is usually
fatal after a period of up to 20 years.
At this time, there is no cure for HD. However, in 1993, a group of scientists
discovered a very accurate genetic test for diagnosing HD. The gene that causes HD
is located on Chromosome 4 and contains consecutive repeats of CAG. The
normal range of CAG repeats is between 10 and 35. Individuals with HD have
between 36 and 180 repeats.
Doctors use a certain DNA test to count the number of CAG repeats; and consult the
following table to produce a diagnosis:
Number of Repeats    Diagnosis
0 - 9                Faulty Test
10 - 35              Normal
36 - 39              High Risk
40 - 180             Huntington's
>= 181               Faulty Test
Your other task is to write a method that, based on the DNA sequence
read before, will analyze that sequence for Huntington's disease and produce a
diagnosis following the table above.
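The table lookup could be sketched as below. The sketch assumes that "number of CAG repeats" means the longest run of consecutive CAG copies in the sequence, which the task sheet implies but does not state outright. The class and method names (HdDiagnosisSketch, longestCagRun, diagnose) are hypothetical.

```java
// Hypothetical helper for illustration only; names are not from the task sheet.
class HdDiagnosisSketch {
    // Longest run of back-to-back "CAG" copies in dna (assumed meaning of "CAG repeats").
    public static int longestCagRun(String dna) {
        int best = 0;
        for (int start = 0; start + 3 <= dna.length(); start++) {
            int run = 0;
            int pos = start;
            while (dna.startsWith("CAG", pos)) {
                run++;
                pos += 3;
            }
            best = Math.max(best, run);
        }
        return best;
    }

    // Map a repeat count to a diagnosis, following the ranges in the table.
    public static String diagnose(int repeats) {
        if (repeats <= 9) {
            return "Faulty Test";
        } else if (repeats <= 35) {
            return "Normal";
        } else if (repeats <= 39) {
            return "High Risk";
        } else if (repeats <= 180) {
            return "Huntington's";
        }
        return "Faulty Test"; // 181 or more
    }

    public static void main(String[] args) {
        // Build a small sequence with 12 consecutive CAG repeats.
        StringBuilder sb = new StringBuilder("TT");
        for (int i = 0; i < 12; i++) {
            sb.append("CAG");
        }
        sb.append("A");
        int repeats = longestCagRun(sb.toString());
        System.out.println(repeats + " -> " + diagnose(repeats)); // 12 -> Normal
    }
}
```

Checking the upper bound of each range in ascending order keeps the conditions short, and the final return covers the 181-and-above row.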
Specification and Deliverables
In this section, you will find details about your implementation and the files that you
have to submit.
Specification and Use Cases
Your implementation must satisfy the following specification and use cases:
1. You will implement your program in DnaProfileDiagnosis.java.
2. A new object of DnaProfileDiagnosis is created by calling
DnaProfileDiagnosis constructor. The name of the CSV file containing
the DNA database would be passed to the constructor.
3. Your program should open the CSV file and read its contents into the instance
variables. You may assume that the first row of the CSV file will be the
column names. The first column will be the word name and the remaining
columns will be the STR sequences. The following rows will contain the
actual name and the corresponding STR counts.
4. The name of the TXT file containing the DNA sequence would be passed to
the readDna instance method. Your program should open the TXT file and
read its contents into the instance variables.
5. The DNA sequence in the TXT file may contain some whitespace (spaces,
tabs, newlines). Your program should remove any whitespace before storing
and computing on it.
6. The method checkProfile could then be called, after setting the query
sequence. Your algorithm will try to match the STR counts of the database
and the DNA sequence. If a match is found, the name of the individual will be
returned as a String, such as "Alice". Otherwise, the String "None
matches" will be returned.
You may assume the STR counts will not match more than one individual.
7. Calling the checkProfile method before setting the DNA sequence would
cause an IllegalArgumentException to be thrown.
8. The method diagnoseHd could also then be called after setting the DNA
sequence.
Your algorithm will perform a diagnosis based on the CAG repeats and the
table in the previous section. The output of the method would be one of the
following Strings: "Faulty Test", "Normal", "High Risk", or
"Huntington's".
9. Calling the diagnoseHd method before setting the DNA sequence would
cause an IllegalArgumentException to be thrown.
10. Further readDna calls may be made to change the DNA sequence.
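Two of the use cases above, removing whitespace and throwing when no sequence has been set, could be sketched as follows. The class DnaHolderSketch and its methods are hypothetical stand-ins and not the required API: setDna takes the raw text directly where readDna would open a TXT file, and the sequence is kept in an ArrayList of characters on the assumption that this is what the instance variable requirement calls for.

```java
import java.util.ArrayList;

// Hypothetical helper for illustration only; not the required DnaProfileDiagnosis class.
class DnaHolderSketch {
    // null means that no DNA sequence has been set yet.
    private ArrayList<Character> dna = null;

    // Store the sequence with every whitespace character (spaces, tabs, newlines) removed.
    public void setDna(String rawText) {
        ArrayList<Character> cleaned = new ArrayList<>();
        for (int i = 0; i < rawText.length(); i++) {
            char c = rawText.charAt(i);
            if (!Character.isWhitespace(c)) {
                cleaned.add(c);
            }
        }
        dna = cleaned; // a later call simply replaces the earlier sequence
    }

    // Return the stored sequence as a String, or throw if none has been set.
    public String currentDna() {
        if (dna == null) {
            throw new IllegalArgumentException("DNA sequence has not been set");
        }
        StringBuilder sb = new StringBuilder(dna.size());
        for (char c : dna) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DnaHolderSketch holder = new DnaHolderSketch();
        try {
            holder.currentDna();
        } catch (IllegalArgumentException e) {
            System.out.println("thrown: " + e.getMessage()); // thrown: DNA sequence has not been set
        }
        holder.setDna("AAGT \tAAGT\nCAG\r\n");
        System.out.println(holder.currentDna()); // AAGTAAGTCAG
    }
}
```

Methods such as checkProfile and diagnoseHd could begin with the same null check, so that both fail in the same way before any sequence has been read.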
Instance Variable and Complexity Requirements
In this Resit Coursework, to store, query and compute on the DNA database and the
DNA sequence, you must use ArrayList and/or TreeMap, and their methods. Failing
to satisfy this by using other data structures would result in getting 0 marks.
There are no requirements on the running time of your program.
Public API
public class DnaProfileDiagnosis {
    // build a database from database.csv
    public DnaProfileDiagnosis(String database)

    // store a dna sequence with no whitespace from dna.txt
    public void readDna(String dna)

    // based on the STR counts, return either a name in
    // database, or "None Matches"
    // throws IllegalArgumentException if dna has not been set
    public String checkProfile()

    // based on the CAG repeats, return either "Faulty Test",
    // "Normal", "High Risk", or "Huntington's"
    // throws IllegalArgumentException if dna has not been set
    public String diagnoseHd()
}
Sample Client
Your program should behave as the example below:
public class TestCoursework {
    public static void main(String[] args) {
        DnaProfileDiagnosis test = new DnaProfileDiagnosis(db1);
        test.readDna(dna1);
        System.out.println(test.checkProfile()); // Alice
        System.out.println(test.diagnoseHd());   // Normal
        test.readDna(dna2);
        System.out.println(test.checkProfile()); // Bob
        System.out.println(test.diagnoseHd());   // Huntington's
        DnaProfileDiagnosis test2 = new DnaProfileDiagnosis(db2);
        System.out.println(test2.checkProfile()); // IllegalArgumentException thrown
    }
}
Video Requirements
Create a video and make a submission to Learning Mall with the following
requirements:
1. The video must contain a description and discussion of the algorithms you use
to complete both the profiling and the diagnosis tasks, followed by their
running time analysis.
2. The length of the video must be less than or equal to 4 minutes.
Violating the length requirement will result in 0 marks for your video grade.
3. Your video must show your face for the purpose of authenticity verification.
Violating the showing face requirement will result in 0 marks for your video
grade.
4. You may want to make your video look nicer; however, the grade will not be
based on the looks. Only the quality and clarity of the algorithm description,
discussion and analysis will count.
A simple recording of a PPT explanation, showing the presenter's face in
a box via screen sharing with BBB or Tencent Meeting, would be sufficient.
5. Submit to Learning Mall the following:
a. The video file in .mp4
b. The PPT file you used to create the video
Grades
The marks of your submission:
1. Correctness of all the methods: 70 marks
(your code will be tested on a new set of test cases)
2. Algorithm discussion, analysis and clarity of the video: 25 marks
3. Ethics Online Quiz: 5 marks
Total: 100 marks
Academic Integrity
1. Plagiarism, e.g. copying materials from other sources without proper acknowledgement, copying, or collusion are serious academic offences. Plagiarism,
copying, or collusion will not be tolerated and will be dealt with in accordance
with the University Code of Practice on Academic Integrity.
2. In some cases, individual students may be invited to explain parts of their code
in person, and if they fail to demonstrate an understanding of the code, no
credit will be given for that part.
3. In more severe cases, the violation will be reported to the Exam Officer for
further investigation and will be permanently recorded in the student's official
academic transcript.