Computer Science 4140 Project Description

Using the resources of the Dialog State Tracking Challenge 2 (DSTC 2), you will conduct a series of computational investigations that address the following aspects of a set of speaker-specific dialogs.

1. Basic performance analysis of the speech quality.
2. Automatic annotation of the change in dialog state based on what was actually spoken.
3. Automatic generation of dialog feature information that can be used to improve detection of miscommunication in information-seeking dialog.

Every student will be assigned a different speaker. Each speaker interacted with the entire set of dialog systems used in the data collection process. Associated with each dialog in the data set are two files. log.json is the raw dialog data collected during the actual dialog interaction. label.json is the dialog data created by a combination of human and automatic annotation of the dialog data based on listening to the recording of what was actually said.

More specific details about the required analysis are provided in the following sections.

COMPUTATIONAL REQUIREMENTS

1. Basic performance analysis of the speech quality

For each dialog turn of each dialog in your data set, you will generate the following information.

- The turn number
- The number of words actually spoken
- The number of words in the highest scored live speech recognition hypothesis.
- The total number of unique words (i.e. word types) found in the union of all of the live speech recognition hypotheses.
- A label describing whether or not the utterance was understood: U – understood; P – partially understood; N – not understood (I will be providing a tool to assist with the labeling process at a later date)

The primary data source for this analysis is the raw dialog data found in the log.json file. label.json is a secondary data source that will help with producing the label that indicates the level of system understanding for the dialog turn.

2. Automatic annotation of dialog state change

For each dialog turn of each dialog in your data set, you will generate the following information.

- The set of attributes for which a new value has been supplied.
- The set of attributes for which a modified value has been supplied.
- The set of attributes for which a value had been removed.

This annotation will be for the “informable” and “requestable” slots of the dialog state. You must use a dictionary format for the annotation, one dictionary per turn. An example format is provided on the project WWW page. You will need information from both the label.json and log.json files to produce the annotation.

3. Automatic generation of dialog feature information for miscommunication detection

For each dialog turn of each dialog in your data set, you will generate feature information that could be helpful in constructing a program that would try to detect when miscommunication has occurred. The choice of information will be based on your analysis of the transcript of the dialogs in your data set as well as the analysis you will have performed in completing tasks 1 and 2. You are required to propose at least three possible features to be used. Based on a meeting with me on Friday April 13, we will select a specific feature for which you will then write a program that will produce the desired information.

PROJECT DELIVERABLES

For each of the three investigations, you will submit your Python source code. For all submissions, use proj as the .

- spkdata.py – task 1, basic performance analysis of the speech quality
- dlgann.py – task 2, automatic annotation of dialog state change
- featuregen.py – task 3, generation of dialog feature information

In addition, you will submit a written report. Report requirements will be provided at a later date.

DAILY LOG

Though not a part of your project deliverables, I STRONGLY encourage you to maintain a daily log of your activities (electronic or handwritten). This is invaluable in being able to maintain an efficient flow of work. Among other things it can help you keep track of key insights, ideas for future exploration, ideas that didn’t work (and why), and a reminder of what task needs to be done next.

Project Deadlines

April 24, 9:00 A.M.
Deadline for submitting all deliverables, including the report. The should be proj

Project Evaluation

Your grade on this assignment will be weighted as follows:

- Code Quality and Correctness 70%
- Report Quality 30%
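
To make the word counts in requirement 1 concrete, here is a minimal sketch of how they might be computed for a single turn. It is not a reference solution. The field names ("turn-index", "input", "live", "asr-hyps", "asr-hyp", "score", "transcription") are assumptions about the DSTC 2 file layout and should be checked against your own log.json and label.json. The function name turn_word_stats, the output keys, and the sample turn are hypothetical, and the U/P/N label is left out.

```python
# Sketch only: field names are assumed from the DSTC 2 data layout.
# Verify them against your own log.json and label.json files.

def turn_word_stats(log_turn, label_turn):
    """Return the task 1 word counts for one dialog turn (hypothetical helper)."""
    # Assumed: a list of {"asr-hyp": text, "score": number} entries.
    hyps = log_turn["input"]["live"]["asr-hyps"]
    # Assumed: the transcription of what was actually spoken.
    spoken = label_turn["transcription"].split()
    # Words in the highest scored hypothesis; no hypotheses means zero words.
    if hyps:
        best = max(hyps, key=lambda h: h["score"])["asr-hyp"].split()
    else:
        best = []
    # Word types found in the union of all live hypotheses.
    word_types = {w for h in hyps for w in h["asr-hyp"].split()}
    return {
        "turn": log_turn["turn-index"],
        "spoken_words": len(spoken),
        "top_hyp_words": len(best),
        "hyp_word_types": len(word_types),
    }


# Made-up turn for illustration; not taken from the corpus.
sample_log_turn = {
    "turn-index": 1,
    "input": {"live": {"asr-hyps": [
        {"asr-hyp": "cheap restaurant", "score": -0.5},
        {"asr-hyp": "cheap restaurants", "score": -1.2},
        {"asr-hyp": "sheep restaurant", "score": -2.0},
    ]}},
}
sample_label_turn = {"turn-index": 1, "transcription": "a cheap restaurant"}

stats = turn_word_stats(sample_log_turn, sample_label_turn)
print(stats)
```

In practice each file would first be parsed with json.load, and the two turn lists would be walked in parallel, pairing turns by their index.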
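
The new / modified / removed sets in requirement 2 can be sketched as a comparison of the slot values in one turn against those of the previous turn. The sketch below covers only the informable slots and reads them from a "goal-labels" dictionary in each label.json turn, which is an assumption about the file layout. The output keys "new", "modified" and "removed" are hypothetical and do not reproduce the example format on the project WWW page. Requestable slots and anything drawn from log.json are not handled here.

```python
# Sketch only: "goal-labels" is an assumed field name, and the output keys
# ("new", "modified", "removed") are hypothetical, not the required format.

def state_change(prev_goals, curr_goals):
    """Compare two slot->value dictionaries and report what changed."""
    new = sorted(s for s in curr_goals if s not in prev_goals)
    modified = sorted(
        s for s in curr_goals
        if s in prev_goals and curr_goals[s] != prev_goals[s]
    )
    removed = sorted(s for s in prev_goals if s not in curr_goals)
    return {"new": new, "modified": modified, "removed": removed}


def annotate_dialog(label_turns):
    """Produce one change dictionary per turn, starting from an empty state."""
    annotations = []
    prev_goals = {}
    for turn in label_turns:
        curr_goals = turn.get("goal-labels", {})
        annotations.append(state_change(prev_goals, curr_goals))
        prev_goals = curr_goals
    return annotations


# Made-up dialog for illustration; not taken from the corpus.
sample_turns = [
    {"goal-labels": {"food": "chinese"}},
    {"goal-labels": {"food": "indian", "area": "north"}},
    {"goal-labels": {"area": "north"}},
]

changes = annotate_dialog(sample_turns)
for index, change in enumerate(changes):
    print(index, change)
```

Sorting the slot names keeps the output stable from run to run, which makes it easier to compare annotations across dialogs.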