Personal note:
For those who may be tempted to believe that I had at any point signed off on the outrageous claim that GPT-4 achieves a perfect solve rate on the entire EECS curriculum, I want to point out that when Iddo posted the paper on arXiv without my permission, he left more or less intact the introduction that I had originally written for the version submitted to the NeurIPS Datasets and Benchmarks track.
It wasn't my best writing (it was a draft, not yet meant for publication), but it reflects my understanding of the paper I had signed off on submitting. You will note that it makes no mention of the claim that GPT-4 can solve the entire curriculum. It was meant to be a dataset paper about a really compelling dataset and some of the neat things you can do with it,
such as understanding dependencies between courses. Even that paper, I now know, should never have been submitted, due to the unethical sourcing of the data. But I can assure everyone that if I ever want to submit a paper claiming that I can automatically ace the entire MIT curriculum, that claim will be stated prominently in the introduction, and the paper will not be submitted to a Datasets and Benchmarks track. What's the point of a benchmark if the state-of-the-art tool can already solve 100% of the problems?