Natural Language Understanding

DS-GA / LING-GA 1012, Spring 2025

Overview

Building computational systems that can communicate with humans using natural language has been a central goal for what we now think of as AI research. Understanding real, naturally occurring human language is the key to reaching this goal. This course will briefly survey the fundamental technical methods that have led to successes in language understanding, and will focus on methods for determining to what extent they are successful (evaluation), illuminating how they work (interpretability), and comparing them to humans (cognitive modeling). Analytical ideas from linguistics and the psychology of reasoning will be introduced as necessary. A major goal of the course is to prepare students to do original research in this area, culminating in a substantial final project that should meet the standards of published work in this field.

Prerequisites

Ideally, students will have had some experience with most of the following concepts. That said, since this is a graduate-level course with students from a diverse array of backgrounds (data science, computer science, and linguistics, as well as undergraduates), we recognize that many students will be unfamiliar with one or more of the topics below. This is okay, as long as you feel comfortable looking up anything that you don’t understand or asking for help when necessary.

Calculus and Linear Algebra

Partial derivatives, gradients, vectors, matrices, matrix multiplication, vector spaces

Probability and Statistics

Probability distributions, conditional probabilities, Bayes’s theorem, linear regression

Machine Learning and Data Science

Features (discrete vs. continuous), optimization, train/dev/test, dimensionality reduction (e.g., PCA), deep learning

Python Programming

Basic syntax, iterables/comprehension, Jupyter notebooks, package managers (e.g., pip), modules, object-oriented programming, data types

Textbooks

Many of the readings, especially those assigned during the first half of the semester, will come from the following textbooks.

SLP: Speech and Language Processing, 3rd Edition Draft by Dan Jurafsky and James H. Martin

D2L: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola

Ling1: Linguistic Fundamentals for Natural Language Processing by Emily M. Bender

Ling2: Linguistic Fundamentals for Natural Language Processing II by Emily M. Bender and Alex Lascarides

EOL: Essentials of Linguistics, 2nd Edition by Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi

Schedule

The list of topics for each week is subject to change. Most lectures will have a methodological section on how to conduct or report research; those are indicated in italics.

Week 1 (1/24): Course overview; NLP fundamentals (slides, recording)

● No lab due to holiday (1/20)

Topics:

● Symbolic and neural representations

● Challenges in language understanding

● Course overview

Readings:

● SLP Chapter 6 (vector semantics and word embeddings)

● D2L Chapter 2 (skip Section 2.5), Section 4.1, and Sections 15.1–15.7

● Ling1 Chapter 2

● Ling2 Chapters 3 and 4

● EOL Sections 5.1–5.4 and 4 and 7.5

Week 2 (1/31): NLP fundamentals (slides, recording)

● Lab (1/27): Word Vectors & Scaling up N-Gram Models, Colab

Topics:

● N-gram language models

● Classification and logistic regression

● Word embeddings and vector semantics

● Stochastic gradient descent

● Neural networks

● How to find a research question and do a literature review

● Structuring team meetings

Readings:

● SLP Chapter 3 (N-gram Language Models) pages 1–10, Chapter 5 (Logistic Regression), Chapter 7 (Neural Networks)

● D2L Chapter 1, Section 2.5, Chapter 5, Chapter 12 (skip Sections 12.7–12.9), Section 16.1, and Section 19.1 (the rest of the chapter is optional)

Week 3 (2/7): Deep learning for NLP (slides, recording - starts 15 minutes late)

● Lab 002 (2/3) - Bag-of-Words Sentiment Classification, Colab

● Lab 003 (2/3) - Bag-of-Words + MLP Sentiment Classification, Colab

Topics:

● Neural networks as LMs

● Transformers

● Writing an abstract

Readings:

● SLP Chapter 9 (Transformers)

Week 4 (2/14): Language models (slides, recording), lab (2/10)

Topics:

● Masked LMs

● Fine-tuning and transfer learning

● Zero-shot prompting for LMs

● In-context learning

● Instruction tuning

● Learning from human feedback

● Abstract rubric

● Finding good papers to cite

● Why and how to cite a paper

Readings:

● SLP Chapter 11 (Masked Language Models), Chapter 12 (Model Alignment, Prompting, and In-Context Learning)

● Sanh et al. (2022): instruction tuning

● Prompt engineering guide

● Brown et al. (2020): language models as few-shot learners

● Ouyang et al. (2022): reinforcement learning from human feedback (RLHF)

● Rafailov et al. (2023): direct preference optimization (DPO)

Week 5 (2/21): Evaluation, lab (2/18)

Topics:

● Classic NLU classification tasks (natural language inference, question answering)

● Commonsense knowledge and Winograd schemas

● Machine translation and reference-based evaluation

● Popular aggregate benchmarks (GLUE, BigBench)

● Human annotations, inter-annotator agreement, and inherent disagreements

● Automatic evaluation with LLMs

● Evaluating chatbots

● LM scaling behavior and “emergent abilities”

● How to write a proposal

Week 6 (2/28): Generalization, lab (2/24)

Topics:

● Syntax

● Targeted evaluations

● Heuristics (“right for the wrong reason”)

● Robustness

● Adversarial evaluation

● Memorization

● Sensitivity to pretraining distribution (e.g., “embers of autoregression”)

● Compositional generalization

● Long context evaluation

Week 7 (3/7): Fairness and safety, lab (3/3)

Topics:

● Bias and fairness

● Evaluating safety

● Red teaming

● The NLP/AI publication landscape

● The structure of a research paper

Week 8 (3/14): Interpretability: basic methods, lab (3/10)

Topics:

● Probing tasks

● Dependency and constituency parsing

● Structural probes

● Attention head analysis

● Circuits

● Sparse autoencoders

● How to create good figures

Week 9 (3/21): Interpretability: causal methods (guest lecture: Shauli Ravfogel), lab (3/17)

Topics: TBD

3/28: No class (spring break)

Week 10 (4/4): Reasoning, lab (3/31)

Topics:

● Deductive reasoning

● Pragmatics

● Theory of mind

● Chain of thought and search

● How to cite and why

Week 11 (4/11): Knowledge, lab (4/7)

Topics:

● Faithfulness, attribution, and factuality

● Retrieval-augmented generation

● Model editing

● How to make a great poster

Week 12 (4/18): Language models and humans: acquisition, lab (4/14)

Guest: Michael Hu

Topics:

● Inductive bias

● What can LMs teach us about human language acquisition?

● Synthetic data and meta-learning

● Curriculum learning

● Sample efficiency and the LM data gap

● The BabyLM challenge

Week 13 (4/25): Language models and humans: comprehension, lab (4/21)

Guest: Byung-Doh Oh

Topics:

●

Week 14 (5/2): Project presentations, lab (4/28)