Some tips for coming up with a project:
- Find a big data set and implement an algorithm from class
on that data set. If you want to do more:
Compare the algorithm from class to
a linear time algorithm? Try a few versions of the sublinear time
algorithm? Invent a new version of the class algorithm and try it out?
- Can you get a better dependence on n or epsilon for
some algorithm in class? (check with me before you work too hard on that..
I may already know a better bound) Can you get a better lower bound?
Can you get a better dependence for a special class of inputs?
(say subgraphs of the grid? planar graphs? expanding graphs? random graphs?)
-
What if you add a stronger assumption on the form of the input?
For example, maybe the input is a set of points in sorted order?
(in this case, you can do binary search and a number of other things)
What can you do that is interesting in this case for other
problems that we considered in class?
Another way of viewin this question -- what if I give you access
to some sort of famous data structure for the input (say range queries,
nearest neighbor queries, or something like that) -- can you do anything
interesting with that?
-
What if I give you a weaker assumption on the form of the input -- maybe
you can only get random examples of the input rather than query access.
Can you solve any interesting property testing problems in this case?
-
Most of our distance related results are in Hamming distance or L1 distance.
Is it easier or harder for a different distance metric -- e.g. EMD, or
any other favorite distance?
e.g., are there sublinear time tests for whether the data is well approximated by a histogram under EMD (as opposed to L1,L2 distances, which has been studied).