|
Course description: Machine learning is a multi-faceted area of research, with many subfields that have been explored over the past 50 years. This class will provide a state-of-the-art overview of the field, with a strong emphasis on core mathematical foundations. It is recommended that students taking the course attend the weekly Machine Learning lunch, which will provide additional exposure to current applications in the field. This is the first of two classes, with the Advanced ML class in Spring semesters providing a deeper exposure to current research in the field.
Machine learning is the formalized study of the intuitive notion of "learning" (which itself can be interpreted in a myriad of ways, from "self-improvement" to "pattern discovery"). We will study four major forms of learning: supervised learning, semi-supervised learning, reinforcement learning, and unsupervised learning. While some of the specifics differ, the course will stress foundations that underly these seemingly disparate forms of learning.
Most forms of learning can be modeled as extracting regularities from observed data. Formally, this process involves projecting or embedding the data onto some measurable hypothesis space (e.g, a parametric probability distribution, a vector space with an inner product defined on it, or a manifold or graph). In this course, we will cover three approaches in detail: parametric approaches based on Bayesian inference and graphical models; nonparametric approaches based on kernel methods; and finally, spectral techniques using manifolds and graphs.
In the graphical models paradigm, observed data is viewed as having been generated by sampling from a probability distribution. "Learning" corresponds to statistical estimation of the parameters of an underlying distribution that can be viewed as a generator of the data. In the kernels approach, data is embedded in a Hilbert space (a vector space with an inner product that directly provides a similarity metric on the space). "Learning" in nonparametric kernel-based models expresses the hypothesis in an instance-specific manner, as a weighted sum of kernel functions. Finally, in the manifold framework, a data-dependent kernel is used to embed the data onto a vector space, where a graph defines the similarity between instances. Here, distances are defined using non-Euclidean geometry based on random walks on the graph (or more formally, the graph Laplacian).
Lecture: Tuesday & Thursday 1:00-2:15, Room 140
Prerequisites: Good undergraduate level exposure to basic concepts in artificial intelligence and machine learning; linear algebra, probability theory and statistics, algorithmic analysis; knowledge of MATLAB (helpful, not required), and high-level (C, Java etc.) computer programming. Please talk with the instructor if you want to take the course but have doubts about your qualifications. This is a CORE class for CMPSCI graduate students.
Credit: 3 units
Instructor: Sridhar Mahadevan
Teaching assistant: TBA
References: There is no single text that covers all the material in the course. The first book by Bishop is recommended since it is a comprehensive modern treatment. The second book by Mackay covers ML from an information-theoretic perspective. The third book by Jordan is unpublished, and draft chapters will be circulated via email. The fourth book by Scholkopf and Smola, portions available online, covers material related to kernel methods. In addition, it is essential to have access to background texts on linear algebra (e.g. Strang) and statistics (e.g. Casella and Berger), since many ideas in ML are grounded in linear algebra and statistics.
Mathematical Foundations: It is highly recommended that you have access to a standard undergraduate level text on linear algebra (e.g, Strang), and a graduate level text on statistics (e.g, Casella and Berger).
Other resources:
Research Forums:
Exams: There will be a take-home midterm and no final exam.
Grading: