CS 145

Fall 2025 • Intro to Big Data Systems

Course Overview


This course covers how to build data applications and the key principles for building and scaling big data systems. It is taught by Shiva Shivakumar.

A few key topics:

  1. How to use a database and SQL (Structured Query Language)? Spotify stores and manages 1 Billion+ songs/podcasts for 100M+ users and 10M+ artists. Let's build Spotify's features with SQL. E.g., artist search, popular songs, song recommendations. Intro to SQL alternatives.
  2. How to optimize queries with well-designed algorithms and data structures? Speed up Spotify queries with row and column stores, parallelism, BigSort, Hashing. See how OpenAI's GPT and Google Maps index and query complex data types like text and geo data.
  3. How to build reliable Transactions? Ticketmaster-like orgs sell Taylor Swift concert tickets to 1 million+ fans across ~50 concerts in transactions – no double selling, guaranteeing tickets after payment. How does it work?
  4. How do distributed big data systems scale? Lessons from Google Ads and Discord in scaling.

Course Logistics and Policies

  1. Conflict in exams or course schedule
    • It's a big class. We will not accommodate alternate exam schedules for those who have exam conflicts (both midterm and final).
    • For OAE accommodations, the tests will be on the same day, but with extended hours.
    • Please make sure you do not have a conflict in exam schedules when enrolling in CS 145.
  2. Mandatory attendance: You must attend at least three of the four guest lectures in class (short in-class quiz). This does not apply to SCPD students.
  3. Grading base
    1. Exam #1: 20%, Exam #2: 30%
    2. Project1: 10%, Project2: 25%
    3. PSETs: 12%, Guest Lectures: 3%
    4. Extra credit: Up to 2%, at the teaching team's discretion, for students with insightful in-class and Ed participation. Review "Gradescope - Extra credit" for instructions.
    5. A+ > 97%, A > 93%, A- > 90%, B+ > 87%, B > 83%, B- > 80%, C+ > 77%, C > 73%, C- > 70%, D+ > 67%, D > 63%, D- > 60%, F < 60%
    6. For credit/no-credit, you need to score equivalent to at least a C- grade to pass the course.
  4. Canvas, Ed, Gradescope: Lecture videos will be posted on Canvas. Please also use Canvas to access the course Ed, Gradescope, and office hour schedules.
  5. Audit requests (Stanford only): For audit requests and access to internal tools, please fill out the form.
  6. Timings and Late Days
    1. Projects and PSETs are due at 11:59 PM on the due date.
    2. For PSETs, there are no late days, unfortunately. Each section builds on the last, and solutions are released on time so the full class can stay on pace.
    3. For projects, you can use a total of two late days (24 hours each, not prorated) shared between both project deadlines. You do not lose any credit when using a late day. If you run out of late days and need additional time, you can still submit after the deadline: you will receive a 10% deduction for the first 24 hours after the deadline, a 25% deduction for the next 24 hours, and zero credit after that. Penalties apply per 24-hour block, with no proration for partial blocks (seconds, hours, etc.).
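
The grading weights, letter cutoffs, and project late policy above come down to a little arithmetic. The Python sketch below is only an illustration, not the teaching team's grading code. All function and variable names are hypothetical, and it makes several assumptions the policy text does not settle: a score exactly on a cutoff falls to the lower grade, late days are applied automatically before any penalty, the penalty clock starts once late days are used up, and the 25% deduction is a total rather than an additional amount.

```python
import math

# Weights from the grading list, as a percent of the course grade.
WEIGHTS = {
    "exam1": 20, "exam2": 30,
    "project1": 10, "project2": 25,
    "psets": 12, "guest_lectures": 3,
}

# Letter cutoffs as listed. ASSUMPTION: a score must be strictly above
# the cutoff, so a score exactly on a boundary gets the lower grade.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]

# Credit multiplier by number of penalized 24-hour blocks.
# ASSUMPTION: 25% is the total deduction in the second block.
PENALTY_MULTIPLIER = {0: 1.0, 1: 0.90, 2: 0.75}


def late_periods(hours_late: float) -> int:
    """24-hour blocks started after the deadline (no proration)."""
    return math.ceil(hours_late / 24) if hours_late > 0 else 0


def project_multiplier(hours_late: float, late_days_left: int):
    """Return (credit multiplier, late days used) for one project.

    ASSUMPTION: free late days are consumed first, and penalties
    apply only to the blocks that remain after that.
    """
    periods = late_periods(hours_late)
    used = min(periods, late_days_left)
    penalized = periods - used
    # Three or more penalized blocks means zero credit.
    return PENALTY_MULTIPLIER.get(penalized, 0.0), used


def course_percent(scores: dict, extra_credit: float = 0.0) -> float:
    """Weighted course percent; each score is on a 0-100 scale."""
    base = sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)
    return base + min(extra_credit, 2.0)  # extra credit is capped at 2%


def letter_grade(percent: float) -> str:
    """Map a course percent to a letter using the listed cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percent > cutoff:
            return letter
    return "F"
```

For example, under these assumptions a project submitted 30 hours late spans two 24-hour blocks: with two late days left it keeps full credit, with one it takes the 10% deduction, and with none it takes the 25% deduction.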

Honor Code/Collaboration Policy

Students must adhere to The Stanford Honor Code and The Stanford Honor Code as it pertains to CS courses.

We encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solution independently, and without referring to written notes from the joint session.

It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo.

The teaching staff will be using plagiarism detection software, and if we have reason to believe that you are in violation of the honor code, we will follow university policy and report it.