Some tips for coming up with a project:
- Find a big data set and implement an algorithm from class
on that data set. If you want to do more:
Compare the algorithm from class to
a linear time algorithm? Try a few versions of the sublinear time
algorithm? Invent a new version of the class algorithm and try it out?
Perhaps the sublinear algorithm isn't actually faster because of
the memory model in the machine, e.g. disk scans are faster
than jumping around and sampling -- can you take
advantage of sampling + fast scans?
- Can you get a better dependence on n or epsilon for
some algorithm in class? (check with me before you work too hard on that..
I may already know a better bound) Can you get a better lower bound?
Can you get a better dependence for a special class of inputs?
(say subgraphs of the grid? planar graphs? expanding graphs? random graphs?)
-
What if you add a stronger assumption on the form of the input?
For example, maybe the input is a set of points in sorted order?
(in this case, you can do binary search and a number of other things)
What can you do that is interesting in this case for other
problems that we considered in class?
Another way of viewing this question -- what if I give you access
to some sort of famous data structure for the input (say range queries,
nearest neighbor queries, or something like that) -- can you do anything
interesting with that?
-
What if I give you a weaker assumption on the form of the input -- maybe
you can only get random examples of the input rather than query access.
Can you solve any interesting property testing problems in this case?
For example, for estimating the diameter of a graph or a point set?
For estimating the average degree of a graph?
-
Most of our distance related results are in Hamming distance or L1 distance.
Is it easier or harder for a different distance metric -- e.g. EMD, or
any other favorite distance?
e.g., are there sublinear time tests for whether the data is well approximated by a histogram under EMD (as opposed to L1,L2 distances, which has been studied).
- Some specific questions that may or may not be doable:
(come see me for explanations)
-
Sampling correctors for distributions that are one of (bimodal,
juntas, monotone over the plane).
-
LCAs for sparse subgraphs that aren't so sparse (i.e., average degree
10) (improving on general questions of Levi, Ron, Rubinfeld
which look for sparse subgraphs that are (1+eps) average degree)
Timeline:
- (Suggested) Week of October 19 and October 26: pick "teams" and brainstorm
on what type of project as well as possible topics.
- (Suggested) November: run topics by me to get feedback, ideas and suggestions. (This can be done as often as you like during the semester, and is
not limited to November).
- Tuesday November 16: Project proposals due. This can be a simple paragraph describing the topics, plans and goals for your project.
- Draft of project writeup: Tuesday, Dec 1.
- Project presentations: Thursday Dec 3, Tuesday Dec. 8 in class.
- Project writeup duedate: Dec 10.
Project writeup:
The project writeup will depend on the type of project that is being done.
-
If the project is to read a couple of papers, then the project writeup
should describe an overview of the results and give more details on
interesting theorems or lemmas (the extent to which this is done depends
on the difficulty of the paper and the size of the group).
- If the project is to implement an algorithm, then the writeup should describe the algorithm, the datasets chosen and the rational behind the experiments. Did you learn anything interesting?
- If the project is to prove theorems, then the theorem statements and proofs should be in the writeup. If you weren't successful, where did you get stuck? do you have a conjecture? What would be an interesting question to pursue?
A few links to research that came out of previous projects in this class: