INFS7410 Project - Part 2
Preamble
The due date for this assignment is 19 September 2019 17:00 Eastern Australia Standard Time,
together with part 1.
This part of the project is worth 10% of the overall mark for INFS7410 (part 1 is worth 5%, so
the whole submission of part 1 + 2 is worth 15%). A detailed marking sheet for this
assignment is provided at the end of this document.
Aim
Project aim: The aim of this project is to implement a state-of-the-art information retrieval
method, evaluate it and compare it to the baseline and rank fusion methods obtained in part 1 in
the context of a real use-case.
Project Part 2 aim
The aim of part 2 is to:
• use the evaluation infrastructure set up for part 1
• implement state-of-the-art information retrieval methods, based on query reduction
• evaluate, compare and analyse the developed state-of-the-art methods against baseline and
rank fusion methods
The Information Retrieval Task: Ranking of studies for Systematic Reviews
Part 2 of the project considers the same problem described in part 1: re-rank a set of documents
retrieved for the compilation of a systematic review. A description of the wider task is provided in
part 1.
What we provide you with (same as part 1)
We provide:
• for each dataset, a list of topics to be used for training. Each topic is organised into a file.
Each topic contains a title and a Boolean query.
• for each dataset, a list of topics to be used for testing. Each topic is organised into a file. Each
topic contains a title and a Boolean query.
• each topic file (both those for training and those for testing) includes a list of retrieved
documents in the form of their PMIDs: these are the documents that you have to rank. Take
note: you do not need to perform the retrieval from scratch (i.e. execute the query against
the whole index); instead you need to rank (order) the provided documents.
• for each dataset, and for each train and test partition, a qrels file containing relevance
assessments for the documents to be ranked. This is to be used for evaluation.
• for each dataset, and for test partitions, a set of runs from retrieval systems that
participated in CLEF 2017/2018, to be considered for fusion.
• a Terrier index of the entire PubMed collection. This index has been produced using the
Terrier stopword list and Porter stemmer.
• a Java Maven project that contains the Terrier dependencies and skeleton code to give you
a start. NOTE: Tip #1 provides you with restructured skeleton code to make the processing
of queries more efficient.
• a template for your project report.
What you need to produce
You need to produce:
• correct implementations of the state-of-the-art methods required by this project
specification
• correct evaluation, analysis and comparison of the state-of-the-art methods, including
comparison with the methods implemented in part 1. This should be written up into a
report following the provided template.
• a project report that, following the provided template, details: an explanation of the
state-of-the-art retrieval methods used (in your own words), an explanation of the evaluation
settings followed, the evaluation of results (as described above), inclusive of analysis, and a
discussion of the findings. Note that you will need to provide a single report that
encompasses both part 1 and part 2.
Required methods to implement
In part 2 of the project you are required to implement the following query reduction retrieval
methods:
• Query reduction using IDF-r. We have discussed this method in the week 6 lecture (online
video) and in the week 6 tutorial. This method is described in Koopman, Bevan, Liam
Cripwell, and Guido Zuccon, "Generating clinical queries from patient narratives: A
comparison between machines and humans." Proceedings of the 40th international ACM SIGIR
conference on Research and development in information retrieval. ACM, 2017 (see the first
paragraph of section 3.1 if you want a description from the literature; ignore the settings
described in that publication). You may have already implemented this for part 1 for
reducing the Boolean queries (tip 4), and in the relevant tutorial.
• Query reduction using Kullback-Leibler informativeness (KLI). This reduction method is
partially described in Daniel Locke, Guido Zuccon, and Harrisen Scells, "Automatic Query
Generation from Legal Texts for Case Law Retrieval." Asia Information Retrieval Symposium.
Springer, Cham, 2017 (top of page 187).
For IDF-r, we ask you to explore reduction on the query formed by the title query. Queries will be
reduced at a reduction of $r$, where $r$ is the retention rate, i.e. $r = 0.85$ means retaining 85%
of the original terms. We ask you to explore three retention rates on the training set: 85%, 50% and
30%. When rounding the number of query terms to retain to an integer number, use the ceiling
function.
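The retention rule can be sketched in plain Java as follows. This is a minimal illustration, not the provided skeleton code: the class and method names are hypothetical, the IDF formula `log(N / df)` is only one common variant (the report should state whichever formula you actually implement), and the document frequencies are assumed to have been looked up from the index beforehand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the provided Maven project.
final class IdfReduction {

    // Number of terms to keep: ceil(r * n), using the ceiling function as specified.
    static int retainCount(int numTerms, double r) {
        return (int) Math.ceil(r * numTerms);
    }

    // Assumed IDF variant: log(N / df). Requires docFreq > 0.
    static double idf(long numDocs, long docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // Ranks the query terms by decreasing IDF and keeps the top ceil(r * n).
    // Assumes every term has an entry in docFreq with a value > 0.
    static List<String> reduce(List<String> terms, Map<String, Long> docFreq,
                               long numDocs, double r) {
        List<String> ranked = new ArrayList<>(terms);
        // List.sort is stable, so terms with equal IDF keep their query order.
        ranked.sort(Comparator.comparingDouble(
                (String t) -> idf(numDocs, docFreq.get(t))).reversed());
        return new ArrayList<>(ranked.subList(0, retainCount(terms.size(), r)));
    }
}
```

For a seven-term query, the three retention rates keep 6, 4 and 3 terms respectively, since the ceiling always rounds a fractional count up.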
For implementing KLI, consider the following, revised definition of this method. The KLI of a term
$t$ is formally defined by
$$KLI(t) = P(t|d) \log \frac{P(t|d)}{P(t|C)}$$
where $d$ is the set of documents provided to rank (i.e. the documents initially retrieved by the
Boolean query), and $C$ is the entire collection as indexed in the provided index. Thus, you need to
compute, for each query term, the probability of the term appearing in the provided retrieved set
(i.e. term frequency in the set; note that here $d$ does not represent one document, but the set
of initially retrieved documents): use MLE to compute this. Similarly, use MLE to compute the
probability of the term appearing in the collection. Query reduction is then performed by ranking
query terms in decreasing value of $KLI(t)$, and applying the retention rate $r$. For KLI, perform a
similar exploration of retention rates as for IDF-r.
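A minimal Java sketch of the KLI computation follows. It assumes the usual form of the score, the probability of the term in the retrieved set multiplied by the log of the ratio between that probability and the collection probability, with both probabilities estimated by MLE (term count divided by total token count). The class and method names are hypothetical, and the four counts are assumed to have been gathered from the index beforehand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the provided Maven project.
final class KliReduction {

    // KLI(t) = P(t|set) * log(P(t|set) / P(t|collection)), with MLE estimates.
    // tfInSet / setLength: term count and token count over ALL retrieved documents.
    // tfInCollection / collectionLength: the same counts over the whole index.
    static double kli(long tfInSet, long setLength,
                      long tfInCollection, long collectionLength) {
        if (tfInSet == 0) {
            return 0.0; // limit of x * log(x) as x -> 0; avoids log(0)
        }
        double pSet = (double) tfInSet / setLength;
        double pCollection = (double) tfInCollection / collectionLength;
        return pSet * Math.log(pSet / pCollection);
    }

    // Ranks terms by decreasing KLI and keeps the top ceil(r * n).
    // Assumes every term has an entry in kliScores.
    static List<String> reduce(List<String> terms, Map<String, Double> kliScores, double r) {
        List<String> ranked = new ArrayList<>(terms);
        ranked.sort(Comparator.comparingDouble(
                (String t) -> kliScores.get(t)).reversed());
        int keep = (int) Math.ceil(r * terms.size());
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

A term that is as frequent in the retrieved set as in the collection scores zero, while a term that is relatively more frequent in the retrieved set scores positive and is retained first.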
For both methods, rank documents according to the reduced queries using BM25 with the best
parameters found from part 1 for the dataset you are experimenting in.
When tuning, tune with respect to MAP.
We strongly recommend you use and extend the Maven project provided for part 1 to implement
these methods. You should have already attempted the implementation of IDF-r as part of the
relevant tutorial exercise.
In the report, detail how the methods were implemented, including which formula you
implemented.
What queries to use
For part 2, we ask you to consider the queries for each topic created from the title field of each
topic. For example, consider the example (partial) topic listed below: the query will be Rapid
diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic
countries (you may consider performing text processing). This is the same query type used in
part 1.
Above: example topic file
Required evaluation to perform
In part 2 of the project you are required to perform the following evaluation:
1. For all methods, train on the training set for the 2017 topics (with respect to the retention
rate) and test on the testing set for the 2017 topics (using the parameter value you selected
from the training set). Report the results of every method on the training (the best selected)
and on the testing set, separately, into one table. Perform statistical significance analysis
across the results of the methods.
2. Comment on the results reported in the previous table by comparing the methods on the
2017 dataset.
3. For all methods, train on the training set for the 2018 topics (with respect to the retention
rate) and test on the testing set for the 2018 topics (using the parameter value you selected
from the training set). Report the results of every method on the training (the best selected)
and on the testing set, separately, into one table. Perform statistical significance analysis
across the results of the methods.
4. Comment on the results reported in the previous table by comparing the methods on the
2018 dataset.
5. Perform a topic-by-topic gains/losses analysis for both 2017 and 2018 results on the testing
datasets, by considering as baseline (tuned) BM25.
6. Comment on trends and differences observed when comparing the findings from 2017 and
2018 results. Is there a query reduction method that consistently outperforms the others?
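For the topic-by-topic gains/losses analysis in item 5, one possible starting point is to compute, per topic, the difference between a method's average precision and that of the tuned BM25 baseline. The sketch below is illustrative only: the class and method names are hypothetical, and the per-topic scores are assumed to have been parsed already from per-topic trec_eval output.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the provided Maven project.
final class GainsLosses {

    // Returns method score minus baseline score for every topic present in both maps.
    // A positive value is a gain over the baseline, a negative value is a loss.
    // TreeMap keeps the topics sorted by identifier for stable plotting.
    static Map<String, Double> perTopicDelta(Map<String, Double> baseline,
                                             Map<String, Double> method) {
        Map<String, Double> delta = new TreeMap<>();
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double methodScore = method.get(entry.getKey());
            if (methodScore != null) {
                delta.put(entry.getKey(), methodScore - entry.getValue());
            }
        }
        return delta;
    }
}
```

The resulting deltas can be plotted as a bar chart with one bar per topic, which makes it easy to see whether an overall improvement comes from many small gains or from a few large ones.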
In terms of evaluation measures, evaluate the retrieval methods with respect to mean average
precision (MAP) using trec_eval. Remember to set the cut-off value (-M, i.e. the maximum
number of documents per topic to use in evaluation) to the number of documents to be re-ranked
for each of the queries. Using trec_eval, also compute R-precision (Rprec), which is the
precision after R documents have been retrieved (by default, R is the total number of relevant
docs for the topic).
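To make the two measures concrete, the following Java sketch computes average precision and R-precision for a single topic from a ranked list of document identifiers and the set of relevant identifiers. It illustrates the definitions only and is not a substitute for trec_eval, which must be used for the reported results; the class name is hypothetical, and returning 0.0 for a topic with no relevant documents is an arbitrary choice made for this sketch.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the definitions; use trec_eval for reported results.
final class RankingMetrics {

    // Average precision: mean of precision@k over the ranks k of relevant documents,
    // divided by the total number of relevant documents for the topic.
    static double averagePrecision(List<String> ranking, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // arbitrary choice for this sketch
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    // R-precision: precision after R documents, where R is the number of relevant documents.
    static double rPrecision(List<String> ranking, Set<String> relevant) {
        int r = relevant.size();
        if (r == 0) {
            return 0.0; // arbitrary choice for this sketch
        }
        int hits = 0;
        for (int i = 0; i < Math.min(r, ranking.size()); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
            }
        }
        return (double) hits / r;
    }
}
```

MAP is then the mean of the per-topic average precision values over all topics in the partition.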
For all statistical significance analysis, use a paired t-test; distinguish between p<0.05 and p<0.01.
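As an illustration of what the paired t-test operates on, the sketch below computes the t statistic from two arrays of per-topic scores (for example, per-topic average precision for two methods, aligned by topic). The class name is hypothetical. The sketch stops at the t statistic: turning it into a p-value requires the t-distribution with n - 1 degrees of freedom, for which a statistics library or tool is assumed.

```java
// Hypothetical helper, not part of the provided Maven project.
final class PairedTTest {

    // t = mean(d) / (sd(d) / sqrt(n)), where d[i] = a[i] - b[i] for each topic i.
    // If all differences are identical, the standard deviation is zero and the
    // result is NaN or infinite.
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two aligned samples of size >= 2");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (a[i] - b[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double sd = Math.sqrt(sumSquares / (n - 1)); // sample standard deviation
        return mean / (sd / Math.sqrt(n));
    }
}
```

The test is paired because both methods are evaluated on the same topics, so the scores must be aligned topic by topic before the differences are taken.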
Topic: CD008122
Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum
malaria in endemic countries
Query:
1. Exp Malaria/
2. Exp Plasmodium/
3. Malaria.ti,ab
4. 1 or 2 or 3
5. Exp Reagent kits, diagnostic/
6. rapid diagnos* test*.ti,ab
7. RDT.ti,ab
8. Dipstick*.ti,ab
How to submit
You will have to submit 3 files:
1. the report, formatted according to the provided template, saved as a PDF or MS Word
document. Note: write the report by combining part 1 (the previous assignment) and part 2
(this assignment) results and methods. Make sure you clearly label methods and results that
belong to the different assignments.
2. a zip file containing a folder called runs-part2, which itself contains the runs (result files)
you have created for the implemented methods.
3. a zip file containing a folder called code-part2, which itself contains all the code to re-run
your experiments. You do not need to include in this zip file the runs we have given to you.
You may need to include additional files, e.g. if you manually process the topic files into an
intermediate format (rather than automatically process them from the files we provide you),
so that we can re-run your experiments to confirm your results and implementation.
If your set of runs is too big, please do the following:
• include in the zip the test run
• include in the zip the best train run you used to decide upon the parameter tuning
• create a separate zip file with all the runs; upload it to a file sharing service like Dropbox or
Google Drive (or similar), then make sure it is visible without login and add the link to it to
your report. Please ensure that the link to the resources is available for at least 6 days after
the submission of the assignment.
All items need to be submitted via the relevant Turnitin link in the INFS7410 Blackboard site, by 19
September 2019 17:00 Eastern Australia Standard Time, together with part 1, unless you have
been granted an extension (according to UQ policy) before the due date of the assignment. Note:
appropriate, separate links are provided in the Assignment 2 folder in Blackboard for submission
of the report, runs-part1, runs-part2, code-part1, and code-part2.
INFS 7410 Project Part 2 – Marking Sheet
• Correct empirical evaluation has