Timeline:
- Pick teams (it's fine to work alone or in a group of 2-3)
and brainstorm possible topics: October through early November
- Project proposal: Friday November 8
- Project presentations: December 9 and 11
- Project writeup: December 11
Some tips for coming up with a project:
- Find a cool paper (or two) to read.
- Find a big data set and implement an algorithm from class
on it. If you want to do more: compare the algorithm from class to
a linear-time algorithm; try a few versions of the sublinear-time
algorithm; or invent a new version of the class algorithm and try it out.
Perhaps the sublinear algorithm isn't actually faster because of
the machine's memory model, e.g. sequential disk scans are faster
than jumping around and sampling -- can you take
advantage of sampling plus fast scans? (See the sketch below for
the kind of timing comparison this suggests.)
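A minimal Python sketch of this kind of timing comparison (illustrative
only; the data, sample size, and all names here are made up):

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.integers(0, 100, size=10_000_000)  # stand-in for a big data set

    # Linear-time baseline: one sequential pass over all of the data.
    t0 = time.perf_counter()
    exact = data.mean()
    t_scan = time.perf_counter() - t0

    # Sublinear estimate: average over s uniformly random positions
    # (a random-access pattern, which the memory system may penalize).
    s = 10_000
    t0 = time.perf_counter()
    estimate = data[rng.integers(0, len(data), size=s)].mean()
    t_sample = time.perf_counter() - t0

    print(f"scan:   {exact:.3f} in {t_scan:.4f}s")
    print(f"sample: {estimate:.3f} in {t_sample:.4f}s")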
- Can you get a better dependence on n or epsilon for
some algorithm from class? (Check with me before you work too hard on
that -- I may already know a better bound.) Can you get a better lower
bound? Can you get a better dependence for a special class of inputs
(say, subgraphs of the grid, planar graphs, expander graphs, or random graphs)?
- What if you add a stronger assumption on the form of the input?
For example, maybe the input is a set of points in sorted order
(in which case you can do binary search and a number of other
things -- see the sketch below). What can you do that is interesting
in this case for other problems that we considered in class?
Another way of viewing this question: what if I give you access
to some well-known data structure for the input (say, range queries,
nearest-neighbor queries, or something like that) -- can you do anything
interesting with that?
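As a toy illustration of what sorted order buys you, range counting
drops to O(log n) time via binary search (a Python sketch; the function
name is made up):

    from bisect import bisect_left, bisect_right

    def count_in_range(sorted_points, lo, hi):
        """Number of points x with lo <= x <= hi, in O(log n) time."""
        return bisect_right(sorted_points, hi) - bisect_left(sorted_points, lo)

    points = [3, 7, 7, 10, 15, 22, 40]    # input promised to be sorted
    print(count_in_range(points, 7, 20))  # 4, namely 7, 7, 10, 15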
- What if I give you a weaker assumption on the form of the input -- maybe
you can only get random examples of the input rather than query access.
Can you solve any interesting property testing problems in this case?
For example, estimating the diameter of a graph or a point set, or
estimating the average degree of a graph? (A toy illustration of this
access model follows below.)
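A toy Python illustration of why random examples alone can be weak here
(a sketch; it assumes each sample is a uniformly random vertex together
with its degree):

    import random

    def naive_avg_degree(degrees, num_samples):
        """Average the degrees of num_samples uniformly random vertices."""
        return sum(random.choice(degrees) for _ in range(num_samples)) / num_samples

    # A star graph: one hub of degree n-1 and n-1 leaves of degree 1,
    # so the true average degree is 2(n-1)/n, roughly 2.
    n = 100_000
    degrees = [n - 1] + [1] * (n - 1)
    print(naive_avg_degree(degrees, 100))  # almost always 1.0 -- the hub is missed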
- Most of our distance-related results are in Hamming distance or L1 distance.
Is the problem easier or harder for a different distance metric -- e.g. EMD
(earth mover's distance), or any other favorite distance?
For example, are there sublinear-time tests for whether the data is well
approximated by a histogram under EMD (as opposed to the L1 and L2 distances,
for which this has been studied)? (See the sketch below for why 1-D EMD is
easy to compute.)
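For intuition: in one dimension, EMD (Wasserstein-1) between two
distributions equals the L1 distance between their CDFs, so it can be
computed exactly. A Python sketch (the function name is made up):

    import numpy as np

    def emd_1d(p, q):
        """EMD between distributions p and q supported on points 0, 1, ..., n-1."""
        return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

    p = np.array([1.0, 0.0, 0.0, 0.0])
    q = np.array([0.0, 0.0, 0.0, 1.0])
    print(emd_1d(p, q))           # 3.0: all the mass travels distance 3
    print(np.abs(p - q).sum())    # 2.0: plain L1 ignores how far mass moves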
- Some specific questions that may or may not be doable
(come see me for explanations):
- Sampling correctors for distributions that are one of: bimodal,
juntas, or monotone over the plane.
- LCAs for sparse subgraphs that aren't so sparse (i.e., average degree
10), improving on the general results of Levi, Ron, and Rubinfeld,
which look for sparse subgraphs of average degree (1+eps).
Project writeup:
The project writeup will depend on the type of project being done.
Aim to keep it under five pages (shorter is fine too).
- Explain the project: the problem you are solving (or the problem solved
by the papers you are reading or the algorithm you are implementing),
a summary of existing work, and your results/contributions.
- Make sure to define any new terms that weren't discussed in the course.
- If the project is to read a couple of papers, then the writeup
should give an overview of the results and more detail on the
interesting theorems or lemmas (the extent to which this is done depends
on the difficulty of the papers and the size of the group).
- If the project is to implement an algorithm, then the writeup should
describe the algorithm, the datasets chosen, and the rationale behind
the experiments. Did you learn anything interesting?
- If the project is to prove theorems, then the theorem statements and
proofs should be in the writeup. If you weren't successful, where did
you get stuck? Do you have a conjecture? What would be an interesting
question to pursue?
Some interesting data sets (it's great if you can find other
interesting data sets!)
Project presentations:
Make sure to describe the project, say something about previous work
(if relevant), and definitely define any new terms before you use them!
Note that the allotted time goes very quickly, so think of high-level
ways to explain the interesting parts of the project.
Examples of past projects
Note that it is standard in the theory community to include the
whole project team in the list of authors, perhaps along with
others who joined the project later.
A few links to research that began with previous projects in this class:
Examples of possibly interesting papers to read:
(list under construction, more may be added)
(links not included)
If you go to the dblp pages of these authors and their coauthors,
you will find many other good examples of papers.
- Abhiruk Lahiri, Ilan Newman, Nithin Varma: Parameterized Convexity Testing. To appear in SOSA 2022; also on arXiv.
- Ilan Newman, Nithin Varma: Strongly Sublinear Algorithms for Testing Pattern Freeness.
- Jasine Babu, Areej Khoury, Ilan Newman: Every Property of Outerplanar Graphs is Testable. APPROX-RANDOM 2016: 21:1-21:19.
- Bhattacharyya, Canonne, Yang: Independence Testing for Bounded Degree Bayesian Network.
- Talya Eden, Dana Ron, Will Rosenbaum: Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs. ICALP 2022: 56:1-56:19.
- Talya Eden, Shyam Narayanan, Jakub Tetek: Sampling an Edge in Sublinear Time Exactly and Optimally. CoRR abs/2211.04981 (2022).
- Talya Eden, Saleet Mossel, Dana Ron: Approximating the Arboricity in Sublinear Time. SODA 2022: 2404-2425.
- Laxman Dhulipala, Quanquan C. Liu, Sofya Raskhodnikova, Jessica Shi, Julian Shun, Shangdi Yu: Differential Privacy from Locally Adjustable Graph Algorithms: k-Core Decomposition, Low Out-Degree Ordering, and Densest Subgraphs. FOCS 2022.
- Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Erik Waingarten: Approximating the Distance to Monotonicity of Boolean Functions. Random Struct. Algorithms 60(2): 233-260 (2022).
- Amit Levi, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Nithin Varma: Erasure-Resilient Sublinear-Time Graph Algorithms. ACM Trans. Comput. Theory 14(1): 1:1-1:22 (2022).
- Sepehr Assadi, Chen Wang: Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions. ITCS 2022.
- A. Filtser, M. Kapralov, M. Makarov: Expander Decomposition in Dynamic Streams. To appear in ITCS 2023.
- M. Kapralov, M. Makarov, S. Silwal, C. Sohler, J. Tardos: Motif Cut Sparsifiers. To appear in FOCS 2022.
- M. Kapralov, S. Lattanzi, N. Nouri, J. Tardos: Efficient and Local Parallel Random Walks. NeurIPS 2021.
- M. Kapralov, S. Khanna, M. Sudan: Approximating Matching Size from Random Streams. SODA 2014.
- A. Shapira, L. Gishboliner: Hypergraph Removal with Polynomial Bounds.
- Nathaniel Harms, Yuichi Yoshida: Downsampling for Testing and Learning in Product Distributions. ICALP 2022.
- Pan Peng, Yuichi Yoshida: Sublinear-Time Algorithms for Max Cut, Max E2Lin(q), and Unique Label Cover on Expanders. CoRR abs/2210.12601 (2022).
- Pan Peng, Daniel Lopatta, Yuichi Yoshida, Gramoz Goranci: Local Algorithms for Estimating Effective Resistance. KDD 2021: 1329-1338.
- Hubie Chen, Matthew Valeriote, Yuichi Yoshida: Testing Assignments to Constraint Satisfaction Problems. FOCS 2016: 525-534.
- Nithin Varma, Yuichi Yoshida: Average Sensitivity of Graph Algorithms. SODA 2021: 684-703.
A good resource for open problems: o(n)