Class Schedule

Event TypeDateInstructorTopicReferencesAnnouncements
Lecture Sep 5 BS Class overview. Reservoir Sampling Presentation-1
Link-1
Mini Exercise-1 due on Sep 12
Lecture Sep 7 BS Probability Review. Expectation, Variance, Markov Inequality, Chebyshev’s Inequality Presentation-2
Lecture-2
Lecture Sep 12 BS Chernoff Bound Presentation-3
Lecture-2
Lecture Sep 14 BS Sampling Presentation-4
Lecture-3
Lecture-4
Lecture Sep 19 BS Hashing Presentation-5
Hashing
Lecture-5
Hw-1 due on Oct 2
Lecture Sep 21 BS Bloom filter Presentation-6
Bloom Filter-1
Bloom Filter-2
No class Sep 26 Career Fair
Lecture Sep 28 DW MapReduce Presentation-7
Lecture Oct 3 DW MapReduce implementations Presentation-8 Hw-2 due on Oct 16
Lecture Oct 5 DW MapReduce Algorithms mapreduce-notes
No class Oct 10 Monday Schedule
Lecture Oct 12 DW More advanced, distributed setting Presentation-9
Exam Oct 17 Midterm
Lecture Oct 19 BS Data Streaming Algorithms and Heavy Hitter Presentation-10 Mini Exercise-2 due Oct 31
Lecture Oct 24 BS Count-Min Sketch Presentation-10
lecture note
Lecture Oct 26 BS Frequency Moment Estimation Presentation-11
lecture note
Lecture Oct 31 BS Semi-streaming and MapReduce revisited Presentation-12
Lecture Nov 2 BS More graph algorithms
Lecture Nov 7 BS Finding similar items Presentation-13 Homework 3 posted, due on Dec 1st 4pm
Lecture Nov 9 BS Locality Sensitive Hashing Presentation-14
lecture note
Lecture Nov 14 BS Clustering Presentation-15
lecture note
Lecture Nov 16 BS Correlation Clustering and Interactive Clustering Presentation-16
No class Nov 21 Thanksgiving recess
No class Nov 23 Thanksgiving recess
Lecture Nov 28 DW Learning Algorithms
Lecture Nov 30 DW Learning Algorithms Mini-Exercise 3 posted, due on Dec 7th
Lecture Dec 5 DW Learning Algorithms
Lecture Dec 7 BS Lower bounds Presentation-17
lower-bounds
No class Dec 12 No class due to Instructor travel
Exam Dec 19 In class final exam between 10:30am - 12:30pm

Some student-written mostly unedited class notes from a previous course. If you find any typos/errors, please report to the instructor.

  1. Data-stream-Introduction: Introduction, computing frequent item deterministically, lower bound, basic concentration inequality.
  2. Count-Distinct: Chernoff’s bound, Universal Hash Function, Counting Distinct Element
  3. Count-Min Sketch
  4. Estimating F_2
  5. Dimensionality Reduction: Linear sketch as dimensionality reduction technique, JL Lemma, Near Neighbor Search
  6. Locality Sensitivity Hashing
  7. Sampling: Reservoir, Priority
  8. Clustering
  9. Intro to Semistreaming
  10. Graph Sparsifiers
  11. Sketching_Graphs
  12. Some MapReduce Algorithms