CMPSCI 689

Machine Learning

Fall 2008


Course description: Machine learning is a multi-faceted area of research, with many subfields that have been explored over the past 50 years. This class will provide a state-of-the-art overview of the field, with a strong emphasis on core mathematical foundations. It is recommended that students taking the course attend the weekly Machine Learning lunch, which will provide additional exposure to current applications in the field. This is the first of two classes, with the Advanced ML class in Spring semesters providing a deeper exposure to current research in the field.

Machine learning is the formalized study of the intuitive notion of "learning" (which itself can be interpreted in a myriad of ways, from "self-improvement" to "pattern discovery"). We will study four major forms of learning: supervised learning, semi-supervised learning, reinforcement learning, and unsupervised learning. While some of the specifics differ, the course will stress foundations that underly these seemingly disparate forms of learning.

Most forms of learning can be modeled as extracting regularities from observed data. Formally, this process involves projecting or embedding the data onto some measurable hypothesis space (e.g, a parametric probability distribution, a vector space with an inner product defined on it, or a manifold or graph). In this course, we will cover three approaches in detail: parametric approaches based on Bayesian inference and graphical models; nonparametric approaches based on kernel methods; and finally, spectral techniques using manifolds and graphs.

In the graphical models paradigm, observed data is viewed as having been generated by sampling from a probability distribution. "Learning" corresponds to statistical estimation of the parameters of an underlying distribution that can be viewed as a generator of the data. In the kernels approach, data is embedded in a Hilbert space (a vector space with an inner product that directly provides a similarity metric on the space). "Learning" in nonparametric kernel-based models expresses the hypothesis in an instance-specific manner, as a weighted sum of kernel functions. Finally, in the manifold framework, a data-dependent kernel is used to embed the data onto a vector space, where a graph defines the similarity between instances. Here, distances are defined using non-Euclidean geometry based on random walks on the graph (or more formally, the graph Laplacian).

Tentative Course Outline

Lecture: Tuesday & Thursday 1:00-2:15, Room 140

Prerequisites: Good undergraduate level exposure to basic concepts in artificial intelligence and machine learning; linear algebra, probability theory and statistics, algorithmic analysis; knowledge of MATLAB (helpful, not required), and high-level (C, Java etc.) computer programming. Please talk with the instructor if you want to take the course but have doubts about your qualifications. This is a CORE class for CMPSCI graduate students.

Credit: 3 units

Instructor: Sridhar Mahadevan

Teaching assistant: TBA

References: There is no single text that covers all the material in the course. The first book by Bishop is recommended since it is a comprehensive modern treatment. The second book by Mackay covers ML from an information-theoretic perspective. The third book by Jordan is unpublished, and draft chapters will be circulated via email. The fourth book by Scholkopf and Smola, portions available online, covers material related to kernel methods. In addition, it is essential to have access to background texts on linear algebra (e.g. Strang) and statistics (e.g. Casella and Berger), since many ideas in ML are grounded in linear algebra and statistics.

  • Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • David MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003. (There is an electronic version of this book).
  • Michael Jordan, Graphical Models, unpublished. (handouts will be emailed to class participants).
  • Bernhard Scholkopff and Alex Smola, Learning from Kernels, MIT Press, 2001. ( Search Google for "learning with kernels" for a free downloadable sample of some chapters).

    Mathematical Foundations: It is highly recommended that you have access to a standard undergraduate level text on linear algebra (e.g, Strang), and a graduate level text on statistics (e.g, Casella and Berger).

  • Weber, Statistics , a concise overview of statistics from a course taught at Cambridge University, 2000.
  • Casella and Berger, Statistical Inference, 2001.
  • Gilbert Strang, Linear Algebra, 2003. See this link for a video version of this book from an MIT course.
  • Fan Chung Graham, Spectral Graph Theory, American Mathematical Society, 1996.

    Other resources:

  • Duda, Hart, and Stork: Pattern Recognition (Wiley), Sixth printing. Early editions have a lot of errors. There is a detailed list of errata for earlier editions at this web site .
  • Hastie, Tibshirani, and Friedman, Statistical Learning, Springer-Verlag, 2001.
  • Mitchell, Tom: Machine Learning, McGraw-Hill, 1997. Excellent undergraduate level introduction to the field.
  • Research Forums:

  • The two premier forums for publication of current research in machine learning are the annual International Conference on Machine Learning (ICML) and the International Neural Information Processing systems (NIPS) conference. You will be required to present 2-3 papers from recent ICML/NIPS proceedings at the end of the class.

  • The two premier journal forums for publication of current research are the Journal for Machine Learning Research, and the Machine Learning journal.

  • Besides the above, ML research is reported in a vast array of auxiliary conferences and journals, including AAAI, UAI, IEEE conferences and journals, statistics conferences and journals.

    Exams: There will be a take-home midterm and no final exam.

    Grading: