Some tips for coming up with a project:
- Find a big data set and implement an algorithm from class
on that data set. If you want to do more:
Compare the algorithm from class to
a linear time algorithm? Try a few versions of the sublinear time
algorithm? Invent a new version of the class algorithm and try it out?
- Can you get a better dependence on n or epsilon for
some algorithm in class? (check with me before you work too hard on that..
I may already know a better bound) Can you get a better lower bound?
Can you get a better dependence for a special class of inputs?
(say subgraphs of the grid? planar graphs? expanding graphs? random graphs?)
What if you add a stronger assumption on the form of the input?
For example, maybe the input is a set of points in sorted order?
(in this case, you can do binary search and a number of other things)
What can you do that is interesting in this case for other
problems that we considered in class?
Another way of viewin this question -- what if I give you access
to some sort of famous data structure for the input (say range queries,
nearest neighbor queries, or something like that) -- can you do anything
interesting with that?
What if I give you a weaker assumption on the form of the input -- maybe
you can only get random examples of the input rather than query access.
Can you solve any interesting property testing problems in this case?
Most of our distance related results are in Hamming distance or L1 distance.
Is it easier or harder for a different distance metric -- e.g. EMD, or
any other favorite distance?
e.g., are there sublinear time tests for whether the data is well approximated by a histogram under EMD (as opposed to L1,L2 distances, which has been studied).