Class Schedule

Event TypeDateTopicReferencesAnnouncements
Lecture Jan 23 Class overview. Reservoir Sampling Presentation-1
Mini-Ex 1 due on 1/30
Lecture Jan 25 Probability Review. Expectation, Variance, Markov Inequality, Chebyshev’s Inequality Presentation-2
Lecture Jan 30 Chernoff Bound Presentation-3
Lecture Feb 1 Sampling Presentation-4
Lecture Feb 6 Hashing Presentation-5
Lecture Feb 8 Bloom filter Presentation-6
Bloom Filter-1
Bloom Filter-2
Lecture Feb 13 Data Streaming Algorithms and Heavy Hitter Presentation-10
Lecture Feb 15 Lower bounds for Streaming Algorithms slides
HW-1 out, due on Mar 5
Lecture Feb 20 Count-Min Sketch Presentation-10
lecture note
Lecture Feb 22 Frequency Moment Estimation Presentation-11
lecture note
No class Feb 27 Class cancelled.
Lecture Mar 1 Frequency Moment Estimation Presentation-11
lecture note
Lecture Mar 6 Graph Streaming Presentation-12
Lecture Mar 8 Introduction to MapReduce Presentation-7
No class Mar 13 Spring recess
No class Mar 15 Spring Recess
Lecture Mar 20 MapReduce Algorithms mapreduce-notes Mini-Ex 2 due on Mar 29
Exam Mar 22 Midterm Exam: In class between 11:30am-12:45pm.
Lecture Mar 27 MapReduce Algorithms
Lecture Mar 29 Finding similar items Presentation-13 HW-2 due on Apr 16
Lecture Apr 3 Locality Sensitive Hashing Presentation-14
lecture note
Lecture Apr 5 Locality Sensitive Hashing Presentation-14
lecture note
Lecture Apr 10 Clustering: k-means, k-means++, k-center, k-median Presentation-15
lecture note
Lecture Apr 12 Correlation Clustering Presentation-16
No class Apr 17 Monday class schedule will be followed
Lecture Apr 19 Interactive Clustering Presentation-17
lecture notes
HW-3 due on May 1
Lecture Apr 24 Learning Algorithms Presentation-18
Ch 12 : section 12.1,12.2 from
textbook by Leskovec
Perceptron ref
Lecture Apr 26 Learning Algorithms Presentation-19
Ch 12 : section 12.3 from
textbook by Leskovec
Lecture May 1 Course Overview
Exam May 3 Final Exam: In Room 119, Engineering & CmpSci Ctr II between 1:00pm - 3:00pm.

Some student-written mostly unedited class notes from a previous course. If you find any typos/errors, please report to the instructor.

  1. Data-stream-Introduction: Introduction, computing frequent item deterministically, lower bound, basic concentration inequality.
  2. Count-Distinct: Chernoff’s bound, Universal Hash Function, Counting Distinct Element
  3. Count-Min Sketch
  4. Estimating F_2
  5. Dimensionality Reduction: Linear sketch as dimensionality reduction technique, JL Lemma, Near Neighbor Search
  6. Locality Sensitivity Hashing
  7. Sampling: Reservoir, Priority
  8. Clustering
  9. Intro to Semistreaming
  10. Graph Sparsifiers
  11. Sketching_Graphs
  12. Some MapReduce Algorithms