Timeline:
- Pick teams (it's fine to work alone or in a group of 2-3)
and brainstorm possible topics: October through early November
- Project proposal: Friday November 8
- Project presentations: December 9 and 11
- Project writeup: December 11
Some tips for coming up with a project:
- Find a cool paper (or two) to read.
- Find a big data set and implement an algorithm from class
on it. If you want to do more: compare the algorithm from class to
a linear-time algorithm; try a few versions of the sublinear-time
algorithm; or invent a new version of the class algorithm and try it out.
Perhaps the sublinear algorithm isn't actually faster because of
the machine's memory model, e.g. sequential disk scans are faster
than jumping around and sampling -- can you take
advantage of sampling plus fast scans? (See the sketch below for
the kind of timing comparison this suggests.)
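A minimal Python sketch of this kind of timing comparison (illustrative
only; the data, sample size, and all names here are made up):

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.integers(0, 100, size=10_000_000)  # stand-in for a big data set

    # Linear-time baseline: one sequential pass over all of the data.
    t0 = time.perf_counter()
    exact = data.mean()
    t_scan = time.perf_counter() - t0

    # Sublinear estimate: average over s uniformly random positions
    # (a random-access pattern, which the memory system may penalize).
    s = 10_000
    t0 = time.perf_counter()
    estimate = data[rng.integers(0, len(data), size=s)].mean()
    t_sample = time.perf_counter() - t0

    print(f"scan:   {exact:.3f} in {t_scan:.4f}s")
    print(f"sample: {estimate:.3f} in {t_sample:.4f}s")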
- Can you get a better dependence on n or epsilon for
some algorithm from class? (Check with me before you work too hard on
that -- I may already know a better bound.) Can you get a better lower
bound? Can you get a better dependence for a special class of inputs
(say, subgraphs of the grid, planar graphs, expander graphs, or random graphs)?
- What if you add a stronger assumption on the form of the input?
For example, maybe the input is a set of points in sorted order
(in which case you can do binary search and a number of other
things -- see the sketch below). What can you do that is interesting
in this case for other problems that we considered in class?
Another way of viewing this question: what if I give you access
to some well-known data structure for the input (say, range queries,
nearest-neighbor queries, or something like that) -- can you do anything
interesting with that?
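As a toy illustration of what sorted order buys you, range counting
drops to O(log n) time via binary search (a Python sketch; the function
name is made up):

    from bisect import bisect_left, bisect_right

    def count_in_range(sorted_points, lo, hi):
        """Number of points x with lo <= x <= hi, in O(log n) time."""
        return bisect_right(sorted_points, hi) - bisect_left(sorted_points, lo)

    points = [3, 7, 7, 10, 15, 22, 40]    # input promised to be sorted
    print(count_in_range(points, 7, 20))  # 4, namely 7, 7, 10, 15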
- What if I give you a weaker assumption on the form of the input -- maybe
you can only get random examples of the input rather than query access.
Can you solve any interesting property testing problems in this case?
For example, estimating the diameter of a graph or a point set, or
estimating the average degree of a graph? (A toy illustration of this
access model follows below.)
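A toy Python illustration of why random examples alone can be weak here
(a sketch; it assumes each sample is a uniformly random vertex together
with its degree):

    import random

    def naive_avg_degree(degrees, num_samples):
        """Average the degrees of num_samples uniformly random vertices."""
        return sum(random.choice(degrees) for _ in range(num_samples)) / num_samples

    # A star graph: one hub of degree n-1 and n-1 leaves of degree 1,
    # so the true average degree is 2(n-1)/n, roughly 2.
    n = 100_000
    degrees = [n - 1] + [1] * (n - 1)
    print(naive_avg_degree(degrees, 100))  # almost always 1.0 -- the hub is missed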
- Most of our distance-related results are in Hamming distance or L1 distance.
Is the problem easier or harder for a different distance metric -- e.g. EMD
(earth mover's distance), or any other favorite distance?
For example, are there sublinear-time tests for whether the data is well
approximated by a histogram under EMD (as opposed to the L1 and L2 distances,
for which this has been studied)? (See the sketch below for why 1-D EMD is
easy to compute.)
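For intuition: in one dimension, EMD (Wasserstein-1) between two
distributions equals the L1 distance between their CDFs, so it can be
computed exactly. A Python sketch (the function name is made up):

    import numpy as np

    def emd_1d(p, q):
        """EMD between distributions p and q supported on points 0, 1, ..., n-1."""
        return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

    p = np.array([1.0, 0.0, 0.0, 0.0])
    q = np.array([0.0, 0.0, 0.0, 1.0])
    print(emd_1d(p, q))           # 3.0: all the mass travels distance 3
    print(np.abs(p - q).sum())    # 2.0: plain L1 ignores how far mass moves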
- Some specific questions that may or may not be doable
(come see me for explanations):
- Sampling correctors for distributions that are one of: bimodal,
juntas, or monotone over the plane.
- LCAs for sparse subgraphs that aren't so sparse (i.e., average degree
10), improving on the general results of Levi, Ron, and Rubinfeld,
which look for sparse subgraphs of average degree (1+eps).
Project writeup:
The project writeup will depend on the type of project being done.
Aim to keep it under five pages (shorter is fine too).
- Explain the project: the problem you are solving (or the problem solved
by the papers you are reading or the algorithm you are implementing),
a summary of existing work, and your results/contributions.
- Make sure to define any new terms that weren't discussed in the course.
- If the project is to read a couple of papers, then the writeup
should give an overview of the results and more detail on the
interesting theorems or lemmas (the extent to which this is done depends
on the difficulty of the papers and the size of the group).
- If the project is to implement an algorithm, then the writeup should
describe the algorithm, the datasets chosen, and the rationale behind
the experiments. Did you learn anything interesting?
- If the project is to prove theorems, then the theorem statements and
proofs should be in the writeup. If you weren't successful, where did
you get stuck? Do you have a conjecture? What would be an interesting
question to pursue?
Some interesting data sets (it's great if you can find other
interesting data sets!)
Project presentations:
Make sure to describe the project, say something about previous work
(if relevant), and definitely define any new terms before you use them!
Note that the allotted time goes very quickly, so think of high-level
ways to explain the interesting parts of the project.
Examples of past projects
Note that it is standard in the theory community to include the
whole project team in the list of authors, perhaps along with
others who joined the project later.
A few links to research that began with previous projects in this class:
Examples of possibly interesting papers to read:
(list under construction, more may be added)
(links not included)
If you go to the dblp pages of these authors and their coauthors,
you will find many other good examples of papers.
- Abhiruk Lahiri, Ilan Newman, Nithin Varma: Parameterized Convexity Testing. To appear in SOSA 2022; also on arXiv.
- Ilan Newman, Nithin Varma: Strongly Sublinear Algorithms for Testing Pattern Freeness.
- Jasine Babu, Areej Khoury, Ilan Newman: Every Property of Outerplanar Graphs is Testable. APPROX-RANDOM 2016: 21:1-21:19.
- Bhattacharyya, Canonne, Yang: Independence Testing for Bounded Degree Bayesian Network.
- Talya Eden, Dana Ron, Will Rosenbaum: Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs. ICALP 2022: 56:1-56:19.
- Talya Eden, Shyam Narayanan, Jakub Tetek: Sampling an Edge in Sublinear Time Exactly and Optimally. CoRR abs/2211.04981 (2022).
- Talya Eden, Saleet Mossel, Dana Ron: Approximating the Arboricity in Sublinear Time. SODA 2022: 2404-2425.
- Laxman Dhulipala, Quanquan C. Liu, Sofya Raskhodnikova, Jessica Shi, Julian Shun, Shangdi Yu: Differential Privacy from Locally Adjustable Graph Algorithms: k-Core Decomposition, Low Out-Degree Ordering, and Densest Subgraphs. FOCS 2022.
- Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Erik Waingarten: Approximating the Distance to Monotonicity of Boolean Functions. Random Struct. Algorithms 60(2): 233-260 (2022).
- Amit Levi, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Nithin Varma: Erasure-Resilient Sublinear-Time Graph Algorithms. ACM Trans. Comput. Theory 14(1): 1:1-1:22 (2022).
- Sepehr Assadi, Chen Wang: Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions. ITCS 2022.
- A. Filtser, M. Kapralov, M. Makarov: Expander Decomposition in Dynamic Streams. To appear in ITCS 2023.
- M. Kapralov, M. Makarov, S. Silwal, C. Sohler, J. Tardos: Motif Cut Sparsifiers. To appear in FOCS 2022.
- M. Kapralov, S. Lattanzi, N. Nouri, J. Tardos: Efficient and Local Parallel Random Walks. NeurIPS 2021.
- M. Kapralov, S. Khanna, M. Sudan: Approximating Matching Size from Random Streams. SODA 2014.
- A. Shapira, L. Gishboliner: Hypergraph Removal with Polynomial Bounds.
- Nathaniel Harms, Yuichi Yoshida: Downsampling for Testing and Learning in Product Distributions. ICALP 2022.
- Pan Peng, Yuichi Yoshida: Sublinear-Time Algorithms for Max Cut, Max E2Lin(q), and Unique Label Cover on Expanders. CoRR abs/2210.12601 (2022).
- Pan Peng, Daniel Lopatta, Yuichi Yoshida, Gramoz Goranci: Local Algorithms for Estimating Effective Resistance. KDD 2021: 1329-1338.
- Hubie Chen, Matthew Valeriote, Yuichi Yoshida: Testing Assignments to Constraint Satisfaction Problems. FOCS 2016: 525-534.
- Nithin Varma, Yuichi Yoshida: Average Sensitivity of Graph Algorithms. SODA 2021: 684-703.
A good resource for open problems: o(n)