INFS7410 Project - Part 2

Preamble

The due date for this assignment is 19 September 2019 17:00 Eastern Australia Standard Time,

together with part 1.

This part of the project is worth 10% of the overall mark for INFS7410 (part 1 is worth 5% -- and

thus the whole submission of part 1 + 2 is worth 15%). A detailed marking sheet for this

assignment is provided at the end of this document.

Aim

Project aim: The aim of this project is to implement a state-of-the-art information retrieval

method, evaluate it and compare it to the baseline and rank fusion methods obtained in part 1 in

the context of a real use-case.

Project Part 2 aim

The aim of part 2 is to:

use the evaluation infrastructure set up for part 1

implement state-of-the-art information retrieval methods, based on query reduction

evaluate, compare and analyse the developed state-of-the-art methods against baseline and

ranking fusion methods

The Information Retrieval Task: Ranking of studies for

Systematic Reviews

Part 2 of the project considers the same problem described in part 1: re-rank a set of documents

retrieved for the compilation of a systematic review. A description of the wider task is provided in

part 1.

What we provide you with (same as part 1)

We provide:

for each dataset, a list of topics to be used for training. Each topic is organised into a file.

Each topic contains a title and a Boolean query.

for each dataset, a list of topics to be used for testing. Each topic is organised into a file. Each

topic contains a title and a Boolean query.

each topic file (both those for training and those for testing), includes a list of retrieved

documents in the form of their PMIDs: these are the documents that you have to rank. Take

note: you do not need to perform the retrieval from scratch (i.e. execute the query against

the whole index); instead you need to rank (order) the provided documents.

for each dataset, and for each train and test partition, a qrels file, containing relevance

assessments for the documents to be ranked. This is to be used for evaluation.

for each dataset, and for test partitions, a set of runs from retrieval systems that

participated in CLEF 2017/2018 to be considered for fusion.

a Terrier index of the entire Pubmed collection. This index has been produced using the

Terrier stopword list and Porter stemmer.

a Java Maven project that contains the Terrier dependencies and skeleton code to give you a start. NOTE: Tip #1 provides you with restructured skeleton code to make the processing of queries more efficient.

a template for your project report.

What you need to produce

You need to produce:

correct implementations of the state-of-the-art methods required by this project specification

correct evaluation, analysis and comparison of the state-of-the-art method, including

comparison with the methods implemented in part 1. This should be written up into a

report following the provided template.

a project report that, following the provided template, details: an explanation of the state-of-the-art retrieval method used (in your own words), an explanation of the evaluation settings followed, the evaluation of results (as described above), inclusive of analysis, and a discussion of the findings. Note that you will need to provide a single report that encompasses both part 1 and part 2.

Required methods to implement

In part 2 of the project you are required to implement the following query reduction retrieval

method:

Query reduction using IDF-r. We have discussed this method in the week 6 lecture (online

video) and in the week 6 tutorial. This method is described in Koopman, Bevan, Liam

Cripwell, and Guido Zuccon, "Generating clinical queries from patient narratives: A

comparison between machines and humans." Proceedings of the 40th international ACM SIGIR

conference on Research and development in information retrieval. ACM, 2017. (see the first

paragraph of section 3.1 if you want a description from the literature -- ignore the settings described in that publication). You may have already implemented this for part 1 for reducing the Boolean queries (tip 4), and in the relevant tutorial.

Query reduction using Kullback-Leibler informativeness (KLI). This reduction method is

partially described in Daniel Locke, Guido Zuccon, and Harrisen Scells, "Automatic Query

Generation from Legal Texts for Case Law Retrieval." Asia Information Retrieval Symposium.

Springer, Cham, 2017. (top of page 187)

For IDF-r, we ask you to explore reduction on the query formed from the topic title. Queries will be reduced at a rate of $r$, where $r$ is the retention rate, i.e. $r = 0.85$ means retaining 85% of the original terms. We ask you to explore three retention rates on the training set: 85%, 50% and 30%. When rounding the number of query terms to retain to an integer, use the ceiling function.
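The reduction step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `IdfReduction` and its methods are hypothetical names (they are not part of the provided skeleton), the IDF variant `log(N / df)` is one common choice rather than a prescribed formula, and document frequencies are passed in as a plain map where in practice they would come from the provided index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the provided skeleton) sketching IDF-r reduction. */
final class IdfReduction {

    /** Number of terms to keep for retention rate r, rounded up with the ceiling function. */
    static int retainCount(int numTerms, double r) {
        return (int) Math.ceil(r * numTerms);
    }

    /** Assumed IDF variant: log(N / df). Giving 0.0 to terms with df = 0 is also an assumption. */
    static double idf(long numDocs, long docFreq) {
        return docFreq <= 0 ? 0.0 : Math.log((double) numDocs / docFreq);
    }

    /** Keeps the ceil(r * n) query terms with the highest IDF. */
    static List<String> reduce(List<String> terms, Map<String, Long> docFreqs,
                               long numDocs, double r) {
        List<String> ranked = new ArrayList<>(terms);
        // List.sort is stable, so terms with equal IDF keep their original query order.
        ranked.sort(Comparator.comparingDouble(
                (String t) -> idf(numDocs, docFreqs.getOrDefault(t, 0L))).reversed());
        int keep = Math.min(ranked.size(), retainCount(terms.size(), r));
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

For a query of 7 terms, `retainCount` keeps 6, 4 and 3 terms at retention rates of 0.85, 0.50 and 0.30 respectively, because the ceiling function always rounds up.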

For implementing KLI, consider the following revised definition of this method. The KLI of a term $t$ is formally defined by

$$\mathrm{KLI}(t) = P(t \mid D) \log \frac{P(t \mid D)}{P(t \mid C)}$$

where $D$ is the set of documents provided to rank (i.e. the documents initially retrieved by the Boolean query), and $C$ is the entire collection as indexed in the provided index. Thus, you need to compute, for each query term, the probability of the term appearing in the provided retrieved set (i.e. term frequency in the set -- note, here $D$ does not represent one document, but the set of initially retrieved documents): use MLE to compute this. Similarly, use MLE to compute the probability of the term appearing in the collection. Query reduction is then performed by ranking query terms in decreasing value of $\mathrm{KLI}(t)$, and applying the retention rate $r$. For KLI, perform a similar exploration of retention rates as for IDF-r.
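A minimal sketch of the KLI computation is given below. It assumes the usual pointwise Kullback-Leibler form, P(t|D) log(P(t|D) / P(t|C)), with both probabilities estimated by MLE as a term count divided by the total number of tokens. The class `KliReduction`, its method names and the count maps are hypothetical; in practice the counts would be gathered from the provided index and from the documents to rank. Scoring a term that is absent from either source as 0.0 is also an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the provided skeleton) sketching KLI reduction. */
final class KliReduction {

    /** Assumed form: KLI(t) = P(t|D) * log(P(t|D) / P(t|C)), with MLE estimates. */
    static double kli(long tfInSet, long tokensInSet, long tfInCollection, long tokensInCollection) {
        if (tfInSet <= 0 || tfInCollection <= 0) {
            return 0.0; // assumption: terms missing from either source are uninformative
        }
        double pSet = (double) tfInSet / tokensInSet;                      // MLE of P(t|D)
        double pCollection = (double) tfInCollection / tokensInCollection; // MLE of P(t|C)
        return pSet * Math.log(pSet / pCollection);
    }

    /** Keeps the ceil(r * n) query terms with the highest KLI. */
    static List<String> reduce(List<String> terms,
                               Map<String, Long> tfInSet, long tokensInSet,
                               Map<String, Long> tfInCollection, long tokensInCollection,
                               double r) {
        List<String> ranked = new ArrayList<>(terms);
        // Stable sort: terms with equal KLI keep their original query order.
        ranked.sort(Comparator.comparingDouble((String t) -> kli(
                tfInSet.getOrDefault(t, 0L), tokensInSet,
                tfInCollection.getOrDefault(t, 0L), tokensInCollection)).reversed());
        int keep = Math.min(ranked.size(), (int) Math.ceil(r * terms.size()));
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

A term that is exactly as frequent in the retrieved set as in the whole collection gets a KLI of 0 under this form, so terms that are over-represented in the retrieved set are the ones that survive reduction.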

For both methods, rank documents according to the reduced queries using BM25 with the best

parameters found from part 1 for the dataset you are experimenting in.

When tuning, tune with respect to MAP.
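Tuning with respect to MAP amounts to evaluating each candidate retention rate on the training topics and keeping the one with the highest MAP. The sketch below assumes the MAP values have already been obtained (for example from trec_eval) and collected into a map; the class `RateSelection` is a hypothetical name, and breaking ties in favour of the higher retention rate is an arbitrary choice made here.

```java
import java.util.Map;

/** Hypothetical helper: picks the retention rate with the best training MAP. */
final class RateSelection {

    /** trainMapByRate maps a retention rate (e.g. 0.85) to the MAP measured on the training set. */
    static double bestRate(Map<Double, Double> trainMapByRate) {
        if (trainMapByRate.isEmpty()) {
            throw new IllegalArgumentException("no retention rates were evaluated");
        }
        double bestRate = Double.NaN;
        double bestMap = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Double, Double> e : trainMapByRate.entrySet()) {
            double rate = e.getKey();
            double map = e.getValue();
            // Assumption: on equal MAP, prefer the higher retention rate.
            if (map > bestMap || (map == bestMap && rate > bestRate)) {
                bestMap = map;
                bestRate = rate;
            }
        }
        return bestRate;
    }
}
```

The selected rate is then applied unchanged when producing the run for the testing topics.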

We strongly recommend you use and extend the Maven project provided for part 1 to implement

these methods. You should have already attempted the implementation of IDF-r as part of the

relevant tutorial exercise.

In the report, detail how the methods were implemented, including which formula you

implemented.

What queries to use

For part 2, we ask you to consider the queries for each topic created from the title field of each topic. For example, for the (partial) example topic CD008122 listed further below, the query will be "Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries" (you may consider performing text processing). This is the same query type used in part 1.
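As one possible form of text processing, a title can be lowercased and split into alphanumeric tokens before reduction. The sketch below is an assumption about what such processing might look like, not a required pipeline; `TitleQuery` is a hypothetical class name. Stopword removal and stemming are deliberately left out here, since they should match what was applied when the provided index was built (the Terrier stopword list and Porter stemmer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns a topic title into a list of query tokens. */
final class TitleQuery {

    /** Lowercases the title and splits it on any run of non-alphanumeric characters. */
    static List<String> tokenize(String title) {
        List<String> tokens = new ArrayList<>();
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty string
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Note that this simple splitting turns "P. falciparum" into the two tokens `p` and `falciparum`; whether the single-letter token should be kept is a design decision worth discussing in the report.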

Required evaluation to perform

In part 2 of the project you are required to perform the following evaluation:

1. For all methods, train on the training set for the 2017 topics with respect to the retention

rate and test on the testing set for the 2017 topics (using the parameter value you selected

from the training set). Report the results of every method on the training (the best selected)

and on the testing set, separately, into one table. Perform statistical significance analysis

across the results of the methods.

2. Comment on the results reported in the previous table by comparing the methods on the

2017 dataset.

3. For all methods, train on the training set for the 2018 topics (with respect to the retention rate) and test on the testing set for the 2018 topics (using the parameter value you selected

from the training set). Report the results of every method on the training (the best selected)

and on the testing set, separately, into one table. Perform statistical significance analysis

across the results of the methods.

4. Comment on the results reported in the previous table by comparing the methods on the

2018 dataset.

5. Perform a topic-by-topic gains/losses analysis for both 2017 and 2018 results on the testing

datasets, by considering as baseline (tuned) BM25.

6. Comment on trends and differences observed when comparing the findings from 2017 and

2018 results. Is there a query reduction method that consistently outperforms the others?
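The topic-by-topic gains/losses analysis in step 5 can be prepared by subtracting the baseline's average precision from the method's average precision for each topic. The sketch below assumes per-topic AP values have already been read into maps keyed by topic id (for example from per-query trec_eval output); `GainsLosses` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: per-topic AP differences against the tuned BM25 baseline. */
final class GainsLosses {

    /** Returns method AP minus baseline AP for every topic present in both maps, sorted by topic id. */
    static Map<String, Double> deltas(Map<String, Double> baselineAp, Map<String, Double> methodAp) {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Double> e : baselineAp.entrySet()) {
            Double method = methodAp.get(e.getKey());
            if (method != null) {
                result.put(e.getKey(), method - e.getValue());
            }
        }
        return result;
    }

    /** Returns {gains, losses, ties}: topics where the method is better, worse, or equal. */
    static int[] counts(Map<String, Double> deltas) {
        int gains = 0, losses = 0, ties = 0;
        for (double d : deltas.values()) {
            if (d > 0) gains++;
            else if (d < 0) losses++;
            else ties++;
        }
        return new int[] {gains, losses, ties};
    }
}
```

The per-topic differences are typically plotted as a bar chart sorted by value, which makes it easy to see whether a method's average gain comes from many topics or from a few outliers.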

In terms of evaluation measures, evaluate the retrieval methods with respect to mean average precision (MAP) using trec_eval. Remember to set the cut-off value (-M, i.e. the maximum number of documents per topic to use in evaluation) to the number of documents to be reranked for each of the queries. Using trec_eval, also compute R-precision (Rprec), which is the precision after R documents have been retrieved (by default, R is the total number of relevant docs for the topic).
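To make the two measures concrete, the sketch below computes average precision and R-precision for a single topic. It is meant only to illustrate what the measures compute; reported results should still come from trec_eval. `RankMetrics` is a hypothetical class name, and the sketch assumes the ranking contains no duplicate document ids and that relevance is binary.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating AP and R-precision for one topic. */
final class RankMetrics {

    /** Average precision: mean of precision@k over the ranks k at which a relevant document appears. */
    static double averagePrecision(List<String> ranking, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        // Dividing by all relevant documents penalises relevant documents that were never ranked.
        return sum / relevant.size();
    }

    /** R-precision: precision after R documents, where R is the number of relevant documents. */
    static double rPrecision(List<String> ranking, Set<String> relevant) {
        int r = relevant.size();
        if (r == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < Math.min(r, ranking.size()); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
            }
        }
        return (double) hits / r;
    }
}
```

MAP is then the mean of the per-topic average precision values across all topics in the set.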

For all statistical significance analysis, use a paired t-test; distinguish between p<0.05 and p<0.01.
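The paired t-test operates on per-topic scores of two methods over the same topics. The sketch below computes only the t statistic, using the standard library; `PairedT` is a hypothetical class name. Turning the statistic into a p-value requires the Student t distribution with n - 1 degrees of freedom, for which a statistics library is the practical choice (for instance, Apache Commons Math provides a `TTest` class with a paired test, assuming you add it as a dependency).

```java
/** Hypothetical helper: t statistic of a paired t-test over per-topic scores. */
final class PairedT {

    /** a[i] and b[i] must be the scores of the two methods on the same topic i. */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equally sized samples with at least 2 topics");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            sumSquares += dev * dev;
        }
        double stdDev = Math.sqrt(sumSquares / (n - 1)); // sample standard deviation of the differences
        // If all differences are identical, stdDev is 0 and the result is infinite or NaN.
        return mean / (stdDev / Math.sqrt(n));
    }
}
```

The test is paired because each topic contributes one difference; comparing the two sets of scores as independent samples would ignore the fact that some topics are simply harder than others for every method.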

Example topic file (partial):

Topic: CD008122

Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries

Query:

1. Exp Malaria/

2. Exp Plasmodium/

3. Malaria.ti,ab

4. 1 or 2 or 3

5. Exp Reagent kits, diagnostic/

6. rapid diagnos* test*.ti,ab

7. RDT.ti,ab

8. Dipstick*.ti,ab

How to submit

You will have to submit 3 files:

1. the report, formatted according to the provided template, saved as PDF or MS Word

document. Note, write the report by combining part 1 (the previous assignment) and part 2

(this assignment) results and methods. Make sure you clearly label methods and results that

belong to the different assignments.

2. a zip file containing a folder called runs-part2, which itself contains the runs (result files)

you have created for the implemented methods.

3. a zip file containing a folder called code-part2, which itself contains all the code to re-run

your experiments. You do not need to include in this zip file the runs we have given to you.

You may need to include additional files e.g. if you manually process the topic files into an

intermediate format (rather than automatically process them from the files we provide you),

so that we can re-run your experiments to confirm your results and implementation.

If your set of runs is too big, please do the following:

include in the zip the test run

include in the zip the best train run you used to decide upon the parameter tuning

create a separate zip file with all the runs; upload it to a file sharing service like Dropbox or Google Drive (or similar), then make sure it is visible without login and add the link to it to

your report. Please ensure that the link to the resources is available for at least 6 days after

the submission of the assignment.

All items need to be submitted via the relevant Turnitin link in the INFS7410 Blackboard site, by 19

September 2019 17:00 Eastern Australia Standard Time, together with part 1, unless you have

been given an extension (according to UQ policy), before the due date of the assignment. Note:

appropriate, separate links are provided in the Assignment 2 folder in Blackboard for submission

of the report, runs-part1, runs-part2, code-part1, and code-part2.

INFS 7410 Project Part 2 – Marking Sheet

• Correct empirical evaluation has