COMP4121辅导、辅导C++/Java编程
- 首页 >> Database Advanced Algorithms COMP4121 2022
Topics for the Major Project
Abbreviation BHK refers to the textbook “Foundations of Data Science” by
Blum, Hopcroft and Kannan, available for free at https://www.cs.cornell.
edu/jeh/book.pdf.
Abbreviation MC refers to the book “Networked Life” by Mung Chiang.
Abbreviation KT refers to the COMP3121 textbook Algorithms Design by
Kleinberg and Tardos.
Here are a few topics for those undecided about what to do as a Major Project
for this class (50% of your class mark). You can choose several kinds of projects.
Here are some possible choices.
(I) You can choose the topic yourselves, in consultation with me. That would
be the best choice, choosing something which you are personally interested in,
preferably in a field that you would like to make a career in. If you have lined
up a job already, talk to your future manager and ask them for a project related
to what you will be working on. I’ll keep your project report confidential if
your future employer requests that. You can also choose to implement an app
that you have an idea for, in any field.
(II) You can write a scholarly essay, explaining a topic on the level equal to
my lectures or in more detail. Here are a few options.
(1) Local Search. See pages 661-700 in KT. This is an important technique
for finding an approximate solution to problems which are too hard to
solve exactly in a reasonable amount of time. For a high mark, include
solutions of the exercises at the end of this chapter of KT.
1
(2) Random Walks and Markov Chains See pages 76-116 in BHK. This
is an extremely important material of which the PageRank is probably
the most important example. For a high mark include solutions to a few
exercises at the end of that chapter of BHK.
(3) Random Graphs See pages 245-299 in BHK. For a high mark include
solutions to a few exercises at the end of the chapter.
(4) Topic Models, Nonnegative Matrix Factorisation, Hidden Markov
Models, and Graphical Models See pages 310-355 in BHK. These
are extremely important topics for those interested in advanced Machine
Learning. For a high mark include solutions to a few exercises at the end
of this chapter of BHK.
(5) Compressed Sensing and Sparse Vectors. See pages 360-373 in BHK.
A relatively new field important for dealing with massive data. For a high
mark include solutions to a few exercises at the end of that chapter of
BHK.
(6) Wavelets. See pages 385-402. An important method for signal represen-
tation, used for example, in JPEG2000. An alternative to the standard
sinc-based signal representation. For a high mark include solutions to a
few exercises at the end of that chapter of BHK.
(7) How does Google sell ad spaces? See pages 25 - 44 in MC. As an
alternative, choose any chapter from MC and extend it with more details
by finding relevant sources using Google.
(8) You can start reading “A first course in Machine Learning” by Simon
Rogers and Mark Girolami. If you have not taken a machine learning class
already, you are making a huge mistake - this is where most of the future
jobs will be, especially future good jobs. Then summarise the part you
managed to cover during the next couple of weeks. But, by all means, for
your own good, finish reading the whole book (after the trimester is over
and you are done with exams). In my opinion, this is by far the best
introduction to Machine Learning. It accomplishes something I would
think impossible: it assumes essentially only high school mathematics and
no statistics background, and yet, by introducing math, probability and
statistics as needed, it manages to do an entirely rigorous intro to Ma-
chine Learning. No black boxes whatsoever, goes fairly deep and is really
enjoyable to read. You can get a Kindle copy on Amazon and it will be
one of the best investments you have ever made!
2
(III) You can choose to implement yourselves a few algorithms for the
same task and compare their performance. Examples include:
(1) Implement several clustering algorithms and apply them to synthetic
and real life data sets. You can find such real life data sets at https://
archive.ics.uci.edu/ml/datasets.php?format=&task=clu&att=&area=
&numAtt=&numIns=&type=&sort=nameUp&view=table.
(2) Implement several regression algorithms (such as the Ridge Regres-
sion or LASO) and apply them to synthetic and real life datasets, such
as the ones that can be found at https://archive.ics.uci.edu/ml/
datasets.php?format=&task=reg&att=&area=&numAtt=&numIns=&type=
&sort=nameUp&view=table.
(3) Implement several classification algorithms and apply them to syn-
thetic and real life data sets, such as https://archive.ics.uci.edu/ml/
datasets.php?format=&task=cla&att=&area=&numAtt=&numIns=&type=
&sort=nameUp&view=table
(4) Join Kaggle competitions by going to https://www.kaggle.com/competitions
to join; you can start with the Titanic competition https://www.kaggle.
com/c/titanic; see also the tutorial https://www.kaggle.com/alexisbcook/
titanic-tutorial and then write a report on what you have learned.
This website is accessible as a starter to students with little exposure to
Machine Learning.
Marking: There is no strict requirement how many pages your report should
have; as a rough guideline, it usually takes 25-30 pages to present a topic well. Your
submission must be self contained, and someone who reads it should be able to learn
at least as much as you would learn by studying one topic which I have covered in
class and to the same depth level or higher. You should include the bibliography,
referencing any sources you have used. If you cite something verbatim, state so
and give a reference. The main criterion in marking will be how profitable
your project is for the advancement of your own knowledge and skill. This
includes being able to document what you have learned and what you have done. I
do not want you to do something just to pass the course; it has to be of real use to
you and to anyone who might read your submission.
Please start working on your project straight away. Feel free to come to the online
office hours on Friday 4PM - 5PM or you can send me a meeting invitation to discuss
3
what you want to do. Our Online Consultation meeting link can be found at our
course website: https://webcms3.cse.unsw.edu.au/COMP4121/22T3/resources/
77815.
Topics for the Major Project
Abbreviation BHK refers to the textbook “Foundations of Data Science” by
Blum, Hopcroft and Kannan, available for free at https://www.cs.cornell.
edu/jeh/book.pdf.
Abbreviation MC refers to the book “Networked Life” by Mung Chiang.
Abbreviation KT refers to the COMP3121 textbook Algorithms Design by
Kleinberg and Tardos.
Here are a few topics for those undecided about what to do as a Major Project
for this class (50% of your class mark). You can choose several kinds of projects.
Here are some possible choices.
(I) You can choose the topic yourselves, in consultation with me. That would
be the best choice, choosing something which you are personally interested in,
preferably in a field that you would like to make a career in. If you have lined
up a job already, talk to your future manager and ask them for a project related
to what you will be working on. I’ll keep your project report confidential if
your future employer requests that. You can also choose to implement an app
that you have an idea for, in any field.
(II) You can write a scholarly essay, explaining a topic on the level equal to
my lectures or in more detail. Here are a few options.
(1) Local Search. See pages 661-700 in KT. This is an important technique
for finding an approximate solution to problems which are too hard to
solve exactly in a reasonable amount of time. For a high mark, include
solutions of the exercises at the end of this chapter of KT.
1
(2) Random Walks and Markov Chains See pages 76-116 in BHK. This
is an extremely important material of which the PageRank is probably
the most important example. For a high mark include solutions to a few
exercises at the end of that chapter of BHK.
(3) Random Graphs See pages 245-299 in BHK. For a high mark include
solutions to a few exercises at the end of the chapter.
(4) Topic Models, Nonnegative Matrix Factorisation, Hidden Markov
Models, and Graphical Models See pages 310-355 in BHK. These
are extremely important topics for those interested in advanced Machine
Learning. For a high mark include solutions to a few exercises at the end
of this chapter of BHK.
(5) Compressed Sensing and Sparse Vectors. See pages 360-373 in BHK.
A relatively new field important for dealing with massive data. For a high
mark include solutions to a few exercises at the end of that chapter of
BHK.
(6) Wavelets. See pages 385-402. An important method for signal represen-
tation, used for example, in JPEG2000. An alternative to the standard
sinc-based signal representation. For a high mark include solutions to a
few exercises at the end of that chapter of BHK.
(7) How does Google sell ad spaces? See pages 25 - 44 in MC. As an
alternative, choose any chapter from MC and extend it with more details
by finding relevant sources using Google.
(8) You can start reading “A first course in Machine Learning” by Simon
Rogers and Mark Girolami. If you have not taken a machine learning class
already, you are making a huge mistake - this is where most of the future
jobs will be, especially future good jobs. Then summarise the part you
managed to cover during the next couple of weeks. But, by all means, for
your own good, finish reading the whole book (after the trimester is over
and you are done with exams). In my opinion, this is by far the best
introduction to Machine Learning. It accomplishes something I would
think impossible: it assumes essentially only high school mathematics and
no statistics background, and yet, by introducing math, probability and
statistics as needed, it manages to do an entirely rigorous intro to Ma-
chine Learning. No black boxes whatsoever, goes fairly deep and is really
enjoyable to read. You can get a Kindle copy on Amazon and it will be
one of the best investments you have ever made!
2
(III) You can choose to implement yourselves a few algorithms for the
same task and compare their performance. Examples include:
(1) Implement several clustering algorithms and apply them to synthetic
and real life data sets. You can find such real life data sets at https://
archive.ics.uci.edu/ml/datasets.php?format=&task=clu&att=&area=
&numAtt=&numIns=&type=&sort=nameUp&view=table.
(2) Implement several regression algorithms (such as the Ridge Regres-
sion or LASO) and apply them to synthetic and real life datasets, such
as the ones that can be found at https://archive.ics.uci.edu/ml/
datasets.php?format=&task=reg&att=&area=&numAtt=&numIns=&type=
&sort=nameUp&view=table.
(3) Implement several classification algorithms and apply them to syn-
thetic and real life data sets, such as https://archive.ics.uci.edu/ml/
datasets.php?format=&task=cla&att=&area=&numAtt=&numIns=&type=
&sort=nameUp&view=table
(4) Join Kaggle competitions by going to https://www.kaggle.com/competitions
to join; you can start with the Titanic competition https://www.kaggle.
com/c/titanic; see also the tutorial https://www.kaggle.com/alexisbcook/
titanic-tutorial and then write a report on what you have learned.
This website is accessible as a starter to students with little exposure to
Machine Learning.
Marking: There is no strict requirement how many pages your report should
have; as a rough guideline, it usually takes 25-30 pages to present a topic well. Your
submission must be self contained, and someone who reads it should be able to learn
at least as much as you would learn by studying one topic which I have covered in
class and to the same depth level or higher. You should include the bibliography,
referencing any sources you have used. If you cite something verbatim, state so
and give a reference. The main criterion in marking will be how profitable
your project is for the advancement of your own knowledge and skill. This
includes being able to document what you have learned and what you have done. I
do not want you to do something just to pass the course; it has to be of real use to
you and to anyone who might read your submission.
Please start working on your project straight away. Feel free to come to the online
office hours on Friday 4PM - 5PM or you can send me a meeting invitation to discuss
3
what you want to do. Our Online Consultation meeting link can be found at our
course website: https://webcms3.cse.unsw.edu.au/COMP4121/22T3/resources/
77815.