MODL5007M: Introduction to Corpus linguistics for translators (15 credits)

1 Overview

Corpus linguistics is the empirical study of how language is used. The basis for the study is provided by corpora, i.e. large databanks of texts in natural language. This module explores basic methods in corpus linguistics and aims to equip you with the ability to develop and use monolingual and multilingual corpora for learning foreign languages and doing translations. It most closely complements the core modules in Translation and in Translation Memories. Traditional bilingual dictionaries and their electronic versions provide basic information on translation equivalence, but there are typically more possibilities for translating words in context than dictionaries offer. Translation memory tools, in contrast, are designed to provide examples of translations in their context, but the size of the database available to a translator is typically limited. A corpus can help you study how words are used in a foreign language and compare their uses in two languages when translating.

2 Available corpora and corpus tools

From the Internet you can access some reference corpora, such as the British National Corpus, as well as general purpose corpora for Arabic, Chinese, Czech, German, Italian, Japanese, Portuguese, Russian, Spanish (and some other languages). A software application that produces lines with keywords and their contexts is a concordancer. The course will also teach you to use concordancers for studying uses of words and testing translation equivalents.
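To make the idea of a concordancer concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is only an illustration, not one of the tools used in the module: the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for this example, and real concordancers work on tokenised, often annotated corpora rather than on a raw string.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration: matches whole words,
    ignores case, and shows `width` characters on each side.
    """
    # \b word boundaries suit space-delimited languages; unsegmented
    # scripts (e.g. Chinese, Japanese) would need a tokeniser instead.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Invented sample text, not taken from any corpus.
sample = "The bank raised its rates. She sat on the river bank. Bank holidays vary."
for line in kwic(sample, "bank", width=15):
    print(line)
```

Aligning the keyword in a central column is what lets you scan many contexts at once and spot recurring patterns, for example the different senses of "bank" that a translator would need to distinguish.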

3 Objectives

On completion of this module, you should be able to:

• describe and exemplify goals and methods of corpus linguistics

• describe basic types of corpora

• understand principles of corpus querying

• know relevant statistical methods

• design your own specialised corpora

• compare word uses in the source and target languages using parallel and comparable corpora

• use corpus data to build glossaries and task-specific dictionaries

4 Learning approaches

To achieve the module aims, you need a combination of conceptual knowledge and practical experience. Accordingly, you have weekly lectures (1 hour) combined with seminars (1 hour) or practical sessions (1 hour). Supervised practical sessions in ERIN will focus on basic IT skills for querying corpora and using concordancers. The lectures covering basic topics of the module alternate with seminars and practical sessions in which theory is tested against practice and explored further through exercises.

5 Syllabus

Date Session Topic

W1 Lecture Theoretical foundations: Using corpora in research and practice

W1 Practical Using online corpus interfaces

W2 Lecture Quantitative study of corpora: frequency lists and collocations

W2 Seminar Analysing and comparing frequencies

W3 Lecture Methods for exploiting corpora: making queries

W3 Seminar Making queries and recording your work

W4 Lecture Quantitative study of corpora: collocations

W4 Seminar Using collocations and word sketches

W5 Seminar Linguistic annotation

W5 Seminar Experiments with explicit annotation

W6 Lecture Corpus-based dictionary development

W6 Practical Development of dictionaries in XML

W7 Reading week

W8 Lecture Building corpora from the Web

W8 Practical Building your own corpus

W9 Lecture Know your corpus: assessing corpus composition

W9 Seminar Assessing composition of your corpus

W10 Lecture Introduction to using Python

W10 Practical Building your corpus in Python

W11 Seminar More experiments with Python and XML

6 Assessment

At the end of the course you must complete a case study (of 2000 words) reporting on a project that compares the uses of several lexical items in two languages, using data both from large corpora and from corpora you have collected yourself. The purpose of the case study is to demonstrate your ability to use the tools for corpus querying and to analyse the evidence these tools provide. As an outcome of this case study you will also create a bilingual dictionary in XML for the lexical items, with contexts of their uses, to demonstrate how you can apply annotation methods.
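As a rough illustration of what a bilingual dictionary entry in XML with usage contexts might look like, the sketch below builds one with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`dictionary`, `entry`, `headword`, `translation`, `example`, `source`, `target`, `lang`) and the English-German example are assumptions invented for this sketch; the format actually expected for submissions is described in the module materials, not here.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, source_lang, translation, target_lang, contexts):
    """Build one dictionary entry with paired example contexts.

    `contexts` is a list of (source sentence, target sentence) pairs.
    All element names are hypothetical, chosen only for this sketch.
    """
    entry = ET.Element("entry")
    hw = ET.SubElement(entry, "headword", {"lang": source_lang})
    hw.text = headword
    tr = ET.SubElement(entry, "translation", {"lang": target_lang})
    tr.text = translation
    for source_text, target_text in contexts:
        example = ET.SubElement(entry, "example")
        ET.SubElement(example, "source").text = source_text
        ET.SubElement(example, "target").text = target_text
    return entry


# Invented example data; in the case study the contexts would come
# from your corpus queries.
dictionary = ET.Element("dictionary")
dictionary.append(
    build_entry(
        "strong", "en", "stark", "de",
        [("She prefers strong tea.", "Sie bevorzugt starken Tee.")],
    )
)

xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

Keeping each translation together with the contexts that support it means the dictionary records the corpus evidence behind every equivalent, not just the equivalent itself.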

Progress in the course will also be monitored through participation in the seminars.

For more information, including examples of expected submissions and the reading lists, please see the Minerva area and the Corpus module website: http://corpus.leeds.ac.uk/teaching/modl5007