"Online Monoculture and the End of the Niche" (Whimsley)
Models of cultural diversity arise from models of cultural transmission. But "transmission" is hard to observe, sometimes. Another way of thinking about the project from last night is that we were building something that tries to identify these vectors of transmission (and their direction). So we should be able to take some of those results, and plug them into a small-scale simulator of this type, and ...
"The complement of Atlas Shrugged" (Shtetl-Optimized)
"Overall, I got the impression that Rand didn’t really care for technology as such—only for what certain specific, 19th-century technologies symbolized to her about Man’s dominance over Nature." -- Ah, I get it now. W. W. isn't a closet-Victorian, he's a crypto-Randian! It all makes sense to me, now.
"The World Is Divided into Four Groups of People" (Grasping Reality with Both Hands)
I honestly expected a punchline (okay, fine, a *different* punchline).
"Social Collider would be useful with a real target" (birgerking)
"The creators of Social Collider describe that they build the service “With the Internet’s promise of instant and absolute connectedness, two things appear to be curiously underrepresented: both temporal and lateral perspective of our data-trails. Yet, the amount of data we are constantly producing provides a whole world of contexts, many of which can reveal astonishing relationships if only looked at through time.”"
Begelman et al. "Automated Tag Clustering: Improving search and exploration in the tag space" (2006)
"The use of clustering techniques enhances the user experience and thus the success of collaborative tagging services. Clustering techniques could improve the user experience of current tagging services. This document describes the current limitations of tagging services, gives an overview of existing approaches, describes the algorithms for tag clustering, and shows experimental results and a variety of conclusions."
Bollen et al. "Clickstream Data Yields High-Resolution Maps of Science" (PLoS ONE)
The "click streams map of science" paper I was telling you about...
"Tom Slee on monopoly populism and cultural niches" (Crooked Timber)
The "very interesting short paper" at the bottom is the one I was telling you about: "Social Media as Windows on the Social Life of the Mind"
"Ricardian Equivalence" (Matthew Yglesias)
"But stepping beyond this, why on earth would anyone believe that Ricardian equivalence holds? The theory is that a tax cut today can’t boost spending tomorrow because people will know that taxes will have to rise in the future. Do you believe that people act like that? If so, I have this interesting theorem about how since cigarette smoking today causes cancer in the future, nobody smokes cigarettes." --- Funny because it's true.
"The Brilliant Bechdel/Wallace Movie Test" (Whimsley)
My version would be, "To be worth watching, a movie must: (1) Have at least one intelligent computer in it, (2) which talks to humans, (3) and then kills them." You'll notice that 2001: A Space Odyssey passes *this* test with flying effin' colors. Hrm.
hdl 2010 : helsinki design lab
Bryan's new project! Looks great... also, the 1968 picture of Fuller and his "Dymaxion Map" is pretty awesome.
"Will Wolfram make bioinformatics obsolete?" (john hawks weblog)
"Alpha could turn into an online robot armed with basic genetics knowledge. And if not Alpha -- genetics is a logical priority for Wolfram, but it may not be the first or primary one -- certainly some other system using similar technology will emerge. Put it to work on public databases of genetic information, and you have a system that can resolve the incompatibilities by adding semantic knowledge. A bit of effort on existing databases would allow the resolution of discrepancies in ascertainment." --- You know, I obviously think this is a worthy goal. But any engagement with modern genomics data will quickly show that just adding a few description logics to "resolve discrepancies" will fail spectacularly. (Which is why the answer to the title's question is a resounding 'No.') --- "Or, more likely, another couple of years of whole-genome sequencing will solve most of ascertainment biases by drowning them in new data." --- Yeah, maybe.
Sniffing keystrokes via laser, power lines - Hack a Day
Sniffing keystrokes, by either shining a laser at the back of a laptop, or by monitoring an electrical outlet into which the computer was plugged. Uhhh... (data in the weirdest places, hm?)
"Automating scientific grunt-work" (Mailund on the Internet)
Asking (some of) the right questions, but in the wrong way.
arthegall's response and eric-lander bookmarks on delicious
André Joyal explained to me that he likes to think of entropy as something like cardinality. More precisely, the exponential of entropy is like cardinality.
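A quick way to see the "exponential of entropy is like cardinality" idea is a minimal Python sketch. The function names are mine, made up for illustration, and nothing here comes from Joyal. For a uniform distribution over n outcomes, exp(H) comes out as n; for a lopsided distribution it comes out as a smaller "effective" number of outcomes.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in nats, of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def effective_cardinality(probs):
    """exp(entropy): behaves like a (possibly fractional) count of outcomes."""
    return math.exp(shannon_entropy(probs))

uniform = [1 / 6] * 6                 # a fair die
skewed = [0.97, 0.01, 0.01, 0.01]     # four outcomes, but really "about one"

print(effective_cardinality(uniform))  # approximately 6
print(effective_cardinality(skewed))   # somewhere between 1 and 4
```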
nowhere in sight « quantum of wantum
Thanks for the kind words, man -- it's been a rough couple of weeks, and I appreciate the encouragement. (And of course, if we ever did go behind a password wall, we'd still be more than happy to put you on the access list.) (And also, those lectures really *are* good, aren't they? Lander's such an aggressive and abrasive guy in person, in a research setting, but he really is a remarkable lecturer... I've heard him give a talk to a roomful of other scientists, and he has the same enthusiastic and engaging manner even when he's not lecturing students. And he's also a new member of one of the US science advisory boards to the president, which makes me happy too.)
#1154 (SSI from Apache mod_include directive to django fails) - Django - Trac
A patch, but listed in "wontfix."
"Only in England, part III" (Marginal Revolution)
Taxonomy? Also, the headline "only in England" is ridiculously off the mark. "Only in every nation that has ever existed on the face of the planet since the dawn of the printing press" is more like it.
Simon Anders, "Visualisation of genomic data with the Hilbert curve"
[ 10.1093/bioinformatics/btp152 -- Bioinformatics] Using Hilbert curves (that is, discrete approximations to a space-filling curve) to map 1-dimensional data (intensity along a chromosome) into a 2-dimensional space (yer screen), for visualization purposes. That's a pretty awesome idea. (Okay, so first question -- point data is great, but what do genes, or other annotation features, look like in this space? Probably some weird curvy worm-things, right? One problem is going to be that these kinds of features, which are disjoint in 1-d, won't necessarily be disjoint in 2 dimensions anymore, unless you adopt some spacing or visual padding requirements.)
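To make the mapping concrete, here is a minimal Python sketch of the standard index-to-coordinate conversion for a discrete Hilbert curve. This is the textbook bit-twiddling version, not Anders's implementation, and `interval_cells` is a hypothetical helper of mine for poking at the "what does a gene look like" question. A contiguous 1-d interval always maps to a connected path of cells, because consecutive positions land in neighbouring cells.

```python
def d2xy(n, d):
    """Map position d along the curve to (x, y) on an n-by-n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def interval_cells(n, start, end):
    """Grid cells covered by the 1-d interval [start, end)."""
    return [d2xy(n, d) for d in range(start, end)]

# A "gene" occupying positions 10..19 on a 64-position chromosome:
print(interval_cells(8, 10, 20))
```

Two intervals that are disjoint in 1-d still map to disjoint sets of cells, since each position gets its own cell; the visual problem is that their cells can sit side by side on the grid, which is where the padding would come in.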
"MIT adopts a university-wide Open Access mandate" (Peter Suber, Open Access News)
"To assist the Institute in distributing the scholarly articles, as of the date of publication, each Faculty member will make available an electronic copy of his or her final version of the article at no charge to a designated representative of the Provost's Office in appropriate formats (such as PDF) specified by the Provost's Office."
"Jonah Goldberg Asks A Question" (Crooked Timber)
"(Given that Goldberg claims to believe in classical liberalism, his failure to consider that any standard could possibly occupy the intellectual space occupied by the standards of classical liberalism is … an impressive feat of doxastic auto-evacuation. It doesn’t occur to him to believe what he believes, apparently.)" --- I heart Holbo. Srsly.
"Getting Past the Pie Chart" (Veronique Greenwood at Seed Magazine)
"Skimming through these visual databases, he’s found, can be much more effective than complicated visualization; Cleveland is working on a protocol to share with others soon. Yet even as he advocates the use of visualization databases, he emphasizes that numerical tools — statistical tests of variance and significance — are just as important in assessing trends. Current enthusiasm for putting numbers into pictures sometimes obscures the fact that science is, after all, a quantitative pursuit, and an image alone cannot replace numbers." --- But what about images *of* numbers?? In all seriousness, this is the problem (tools for analysis, not just visualization) that is constantly at the back of my mind.
"The propagation of false news in wartime." (Eric Rauchway at The Edge of the American West)
The historian as statistician. Among other things.
"More on the question of whether it's better to be one point behind at halftime" (Andrew Gelman)
A sneak-peek at the future of Peer Review. Or at least, one part of it.
"A downside of following good graphical practice" (Andrew Gelman)
"The funny story is that, by putting in the effort to apply sound graphical principles, I brought these difficulties upon myself--thus, among other things, spending several hours that I could otherwise have spent doing research, for example. Usually when discussing the costs of good graphical display, I think only of the costs of preparing the graph itself, not the hazards entailed by putting more numbers out there that have to be defended." --- and down the rabbit-hole he goes...
College Basketball - Scores & Schedule Monday November 10, 2008 - Rivals.com
rivals.com box scores. (Not going to have time to get this done before the tournament. Next year!)
"Students" (xkcd)
Oh, god, it's so true. I've had this nightmare about high school several times -- I find myself, today, but back in high school for some reason. And it's the end of the year, and I realize that I've been (accidentally) skipping my calculus course (of all things) all year. And I'm never going to make up the work, and I'm doomed, and I just wanttograduateandgetoutofhereohgod...
"Roman humor (swallow your coffee before proceeding)" (Brainiac)
"The "strange unnaturalness of the number system": killer material in any culture."
Questia version of Schumpeter's "Capitalism, Socialism and Democracy"
David Warsh, "Knowledge and the Wealth Of Nations: A Story of Economic Discovery"
John Sutton, "Technology and Market Structure" (MIT Press)
"One [way of studying market structure] looks to "industry characteristics" to explain why different industries develop in different ways; the other looks to the pattern of firm growth within a "typical" industry to describe the evolution of the size distribution of firms. In his new book, John Sutton sets out a unified theory that encompasses both approaches, while generating a series of novel predictions as to how markets evolve."
"The unfortunate uselessness of most ’state of the art’ academic monetary economics" (Willem Buiter)
"The conclusion, boys and girls, should be that trade - voluntary exchange - is the exception rather than the rule and that markets are inherently and hopelessly incomplete. Live with it and start from that fact. The benchmark is no trade - pre-Friday Robinson Crusoe autarky. For every good, service or financial instrument that plays a role in your ‘model of the world’, you should explain why a market for it exists - why it is traded at all. Perhaps we shall get somewhere this time." --- Buiter's criticism of (among others) Robert Lucas.
Steven N. Durlauf
"... empirics, as opposed to speculative theory and stylized facts, of economic growth..."
Acemoglu, "Introduction to Modern Economic Growth" (Princeton Press)
"AIG Bonuses: Some Perspective" (Capital Gains and Games)
"While the Senate was constructing the $787 billion stimulus last month, Dodd added an executive-compensation restriction to the bill. The provision, now called “the Dodd Amendment” by the Obama Administration provides an “exception for contractually obligated bonuses agreed on before Feb. 11, 2009” -- which exempts the very AIG bonuses Dodd and others are now seeking to tax. Dodd’s original amendment did not include that exemption, and the Connecticut Senator denied inserting the provision." --- This seems like it'd be an interesting story to get to the bottom of, you know? Is there any other reason to insert a clause like that, *other* than to specifically exempt the AIG-bonus-situation?
"sadly, he actually exists" (scatterplot)
"An immoral person is of course incapable of making a legitimate, intellectual, argument because they come from deceit." --- drektheuninteresting reads Conservapedia so I don't have to.
Links to Kottke's "How the Crash Will Reshape America"
I wrote a response to your email, but I can't tell if it got sent because my email is being wonky on this train. Apologies, if this is a duplicate then. At any rate, I've been randomly browsing for Robert Lucas papers on cultural capital, concentration, development, and technological innovation after my brother sent me this link a week or two ago. This all came after the argument my brother and father and I were having on my blog, about taxes and innovation and Will Wilkinson and whether or not Wilkinson is a Victorian Nerd or not, a few weeks ago. But I'm just a lazy computer scientist -- if you have better links on the subject, or advice on where to look, I'm all ears (and gratitude).
Robert Lucas, "On the Mechanics of Economic Development" (1987)
"This paper considers the prospects for constructing a neoclassical theory of growth and international trade that is consistent with some of the main features of economic development. Three models are considered and compared to evidence: a model emphasizing physical capital accumulation and technological change, a model emphasizing human capital accumulation through schooling, and a model emphasizing specialized human capital accumulation through learning-by-doing." --- Doesn't look like *exactly* what I'm looking for, but it's a start.
"anti-social capital" (orgtheory.net)
"After noting that Facebook reportedly adds 600,000 users a day (if you’re not on there yet, you will be), he wonders if Facebook will contribute to a buildup of anti-social capital, which is “a snarky (and imprecise) term for the absence of ties of a certain type, namely those whose main consequence is that you spend a lot of time online communicating with people who, like you, have a lot of time to spend socializing online.”" --- Uhoh.
"Saturday Night Live - Sloths!"
"Hunt in packs, and we eat algae off our fur... Sloths!Sloths!Sloths!!" (I can't believe I had forgotten about this...)
"Idle Question of the Day" (Three-Toed Sloth)
I don't know, but I trust Felix Salmon, and he seemed to think this would be a supremely bad idea (see here: http://www.portfolio.com/views/blogs/market-movers/2008/10/19/why-the-cds-market-didnt-fail -- notice the part about Karen Shaw Petrou at the end of the story, and then a follow-up a few days later here: http://www.portfolio.com/views/blogs/market-movers/2008/10/22/idiotic-cds-proposal-of-the-day-ben-stein-edition ). But I don't know, those were from the end of last year, so maybe he's changed his mind since then? Or maybe I shouldn't have so much confidence in his opinions, either.
Monsters from the ID
I'd love to watch it, definitely! When are you going to be in town?
IM Outtake of the Day, CDS Edition - Finance Blog - Felix Salmon - Market Movers - Portfolio.com
"In any case, I fail to see how the CDS market -- even broadly understood to include AIG -- was in any way responsible for the financial meltdown. Maybe it would have been, had AIG not been bailed out. But AIG was bailed out, so it wasn't." --- A philosophical question, if ever there was one. ("I broke my right hand in an accident, and therefore when I wrote my paper I wrote it with my left hand. But if I hadn't broken my hand, I would have written it with my right hand anyway. So we can't say that my accident was the cause of my paper being written...")
"Everyone is Irish" (gnarayan's flickr photostream)
Gaaaah. St. Patrick's Day is the worst. day. evar. I'm so not looking forward to wading through the empty beer cans and drunk people on the T and train home tonight.
The Django and Ubuntu Intrepid Almanac @ Irrational Exuberance
Setting up django on ubuntu, from start to finish. Where the eff is Mason?
Learning Experiment Databases
"An experiment database is a database designed to store learning experiments in full detail, aimed at providing a convenient platform for the study of learning algorithms." -- So, learning about learning algorithms, is it?
Jungle Disk - Reliable online storage powered by Amazon S3 and Rackspace - JungleDisk
Another option would be one of the several online storage services that are backed by S3 (an Amazon web-service). "Jungle Disk" is one, but there are several others. It might also be possible to build your own custom service using the S3 interface directly. It's not too hard, and it's likely to be cheaper and better-supported than an out of the box system running on some ISP's computer somewhere...
Live Mesh Beta
C -- John suggests "Live Mesh" from Microsoft as a storage/sharing option for data online. Oddly enough, Alex M. (a friend who also reads the blog) is a developer for the Mesh team, I think. We might could ask him what he thinks, too...
Preserve City Hall Plaza? - Brainiac
Ph'nglui mglw'nafh S'collay R'lyeh wgah'nagl P'ei fhtagn! --- "Like City Hall itself -- the brutalist brainchild of Kallmann, McKinnell, and Knowles -- I.M. Pei's windswept brick plaza has never been fully embraced by Bostonians. But it represents a historically important effort to bring new energy to an American urban center, argues Charles A. Birnbaum, president of the Cultural Landscape Foundation. Birnbaum believes it could still serve that purpose if some tweaks were made: new landscape features, for instance, programmed cultural events, and more vendors allowed nearby." --- Yeah, maybe. Or, maybe it's too late. Perhaps the surrounding urban culture has already learned that "nothing happens in the bleak wasteland of concrete that surrounds city hall?"
"Electronic medical records and what statistics can do" (Aleks Jakulin)
Did I not already tag this? I meant to... "In summary: (1) It is important to collect the data correctly. (2) Electronic medical records make it possible to deploy predictive models widely, improving health care. It is important to build user interfaces that make use of this. (3) There will be opportunities for centers that specialize in predictive models for specific symptoms or diseases, combining the background knowledge aggregated in medical profession over many years with the modern data collection and analysis."
Pearlmutter and Siskind, "Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator"
The TOPLAS paper ... nice title.
Barak Pearlmutter comments on "Conal Elliott » Paper: Beautiful differentiation"
"I will try to formulate my deeper technical comments later. But let me mention three things quickly. First, the paper is only about forward-mode accumulation automatic differentiation, i.e., forward AD. So “AD” should be qualified throughout. Maybe fAD, or AD(f), I dunno. Second, in the intro of a TOPLAS paper on reverse-mode AD, Jeff Siskind and I construct forward AD as a step in the definition of its adjoint. That construction involves compositionality, and has a bit of the flavour of some of the cool things you’re doing here." -- I gotta remember to look this paper up, later.
"AIGamemnon (A Fragment)" (Crooked Timber)
"So, my dear lord, dismount from your car, but do not set on common earth the foot that has trampled upon global markets. You to whom I have assigned the task to strew with bonuses, salary top-offs and the like, Quick! With something on the order of $160 million let his path be strewn, that Justice may usher him into a new quarter he never should have seen. The rest my unslumbering vigilance shall order duly, for if Geithner can be made to swallow this than, please god, pretty much fucking anything can be subsequently ordained."
Dropbox - Home - Secure backup, sync and sharing made easy.
A question from my brother -- John, what do you think?
"The World's Largest MMORPG: You're Playing it Right Now" (Coding Horror)
Someone should type "Luis von Ahn" into lmgtfy.com ("Let Me Google That For You"), and send it to Jeff Atwood.
"Littell’s Kindly Ones: 1" (Adam Roberts at The Valve)
Roberts is running a one-man reading group of The Kindly Ones? In seven parts. Amazon says my copy of Les Bienveillantes is waiting for me at the office... in an earlier comment, Roberts writes that "the reviews I’ve read suggest that The Kindly Ones is basically just a modern rewriting of Adam Bede. With, you know, Nazis." So maybe I should track down my copy of Adam Bede too, and finally read *that* as well.
Meloso et al. "Promoting Intellectual Discovery: Patents Versus Markets." (Science, 323 (5919): 1335)
Apropos of our discussion last week -- "Because they provide exclusive property rights, patents are generally considered to be an effective way to promote intellectual discovery. Here, we propose a different compensation scheme, in which everyone holds shares in the components of potential discoveries and can trade those shares in an anonymous market. In it, incentives to invent are indirect, through changes in share prices. In a series of experiments, we used the knapsack problem (in which participants have to determine the most valuable subset of objects that can fit in a knapsack of fixed volume) as a typical representation of intellectual discovery problems. We found that our "markets system" performed better than the patent system."
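For reference, the knapsack problem from the abstract is easy to state in code. This is a minimal Python sketch of the usual dynamic-programming solution for integer volumes; it has nothing to do with how the experiment's human participants (or their markets) actually searched, and the item values and volumes are made up for illustration.

```python
def best_knapsack(items, capacity):
    """Most valuable subset of items that fits in the knapsack.

    items: list of (value, volume) pairs with integer volumes.
    Returns (best_value, indices_of_chosen_items).
    """
    # best[c] holds the best (value, chosen indices) using volume at most c.
    best = [(0, ())] * (capacity + 1)
    for i, (value, volume) in enumerate(items):
        # Walk capacities downward so each item is used at most once.
        for c in range(capacity, volume - 1, -1):
            candidate = best[c - volume][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - volume][1] + (i,))
    return best[capacity]

# Hypothetical objects: (value, volume)
objects = [(60, 10), (100, 20), (120, 30)]
print(best_knapsack(objects, 50))  # (220, (1, 2))
```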
Clarkson et al. "Fingerprinting Blank Paper Using Commodity Scanners" (PDF)
The Ed Felten paper on using scanners to identify fiber patterns in paper (!), and then using those patterns to fingerprint *particular pages* of paper (!!). If nothing else, read the last two sections -- the discussion of what an adversary could try to do to forge a piece of paper fingerprinted under this system, and possible applications of being able to uniquely identify pieces of paper. The whole thing is pretty great.
"Kiton's Eye For Detail" (Felix Salmon)
I would've guessed at the answer proffered by commenter #1, ("neopolitan" as a portmanteau for "neo-cosmopolitan,") but then he refutes that suggestion by observing that Kiton is somehow based out of Naples. So Salmon's sarcasm was well-placed to begin with.
Salam Pax
Dude's blogging again. Pretty great.
"Newspapers and Thinking the Unthinkable" (Clay Shirky)
Effing the ineffable. Everybody and their mom (but not our Mom!) is linking to this post. And indeed, it is good! C, when you get back from DC, tell me what you think about it... and get Dad to read it. I've already ordered the (two-volume) Eisenstein book from Amazon, so I'll let you know when that arrives and I've had a chance to dig into it.
Understanding Bidirectional (BIDI) Text in Unicode
"But how does this work? Not magic, but science." A semi-detailed introduction to right-to-left and left-to-right orderings in Unicode, including examples of how to switch between them in the same string. And of course, when you get to the end, the other shoe drops: properly-formatted Unicode strings are actually a context-free grammar -- every 'opening marker' has to have a paired 'closing marker.' Otherwise, if you're embedding user-entered Unicode in your website and you don't have a valid pairing of markers, you risk flipping all the rest of the text on your webpage.
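The defensive move for user-entered text is to make sure every opening marker gets closed before the string goes into your page. Here is a minimal Python sketch; `close_bidi` is a hypothetical helper of mine, and the matching rule is a simplification of what the Unicode bidirectional algorithm actually specifies. It tracks the embedding and override openers (closed by U+202C) and the isolate openers (closed by U+2069), and appends whatever closers are still missing.

```python
EMBED_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI
PDF = "\u202c"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate

def close_bidi(text):
    """Return text with any unclosed directional markers closed at the end."""
    pending = []  # closers we still owe, innermost last
    for ch in text:
        if ch in EMBED_OPENERS:
            pending.append(PDF)
        elif ch in ISOLATE_OPENERS:
            pending.append(PDI)
        elif ch in (PDF, PDI) and pending and pending[-1] == ch:
            pending.pop()
        # Simplifying assumption: stray or mismatched closers are left alone.
    return text + "".join(reversed(pending))

comment = "nice post \u202eevil"        # opens a right-to-left override, never closes it
safe = close_bidi(comment)
print(safe.endswith(PDF))               # True
```

Stripping the control characters entirely is the blunter alternative, at the cost of mangling legitimately mixed-direction text.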
had me kinda chuckling. As if the decision about being "public" were simply a decision between "completely hidden" and "open to an avalanche of unknown people who drop in, leave slightly annoying and completely off-topic comments, and then disappear forever." Yes, surely, I must have failed to understand *something.*
Dolores Labs Vets Web Sites On The Cheap - Forbes.com
Holy sh*t, Tikhon Bernstam is one of Scribd's cofounders??
How the Crash Will Reshape America « Kottke
I really should look up and read some of Lucas's papers on this. (Thanks for the link, C!)
"Peer into the dark heart of a troll." (Acephalous)
"You troll me on a blog, you better wake up and apologize..." (SEK's trolls come back and ask for forgiveness? Truly, here is one who is touched by the angels!)
qwantz.com - dinosaur comics - March 09 2009
"DIRECTION IS A BUCKET THAT PEOPLE KEEP SNEAKING INTO."
rayogram NEWScan
"Cloud computing is a trap, warns GNU founder" (guardian.co.uk)
RMS is Admiral Ackbar... maybe he's right, who knows? But keep in mind, he also thinks that the electronic locking system on the doors here at the Stata Center is "a trap" too, so let's keep things in perspective.
"How Did I Miss This?" (Ta-Nehisi Coates)
THRU YOU | Kutiman mixes YouTube
Whoa. Fantastic. Beyond-fantastic. A mini-album, completely on YouTube, made out of a mix of other (unrelated) videos. That's truly tremendous... (Also, watch for the Ras Trent moment, right in the middle.)
"WU2WEI2: Do Nothing" (Language Log)
It's the ancient Chinese version of "No-Drama Obama."
Balding and Torney, "Optimal Pooling Designs with Error Detection" (1994)
[ScienceDirect - Journal of Combinatorial Theory, Series A]
Harismendy et al. "Genome-wide location of yeast RNA polymerase III transcription machinery" (EMBO 2003)
"Introducing Redis: a fast key-value database" (Zen and the Art of Programming)
"According to Apache’s benchmark data, Salvatore’s commodity server (a Pentium D which is also running several large sites) could handle 150 pageviews per second (6 milliseconds each) for each of the 50 concurrent users. This was possible while using the grand total of 1 MB of RAM for the database. Of course, this is just a quick benchmark and there wasn’t a huge deal of data in the database either, but the responsiveness was very impressive nevertheless." --- I have a question (a real question, *not* snark). In all seriousness... a bunch of key-value pairs, kept completely in memory, and subjected to a read-heavy (or read-only) load. Wouldn't we be much more surprised if this *wasn't* fast? Everything is blazingly fast, when (a) you can ignore disk, (b) you can ignore writes and transactions and consistency, (c) you can ignore recovery, and (d) you're not doing any joins. Or am I misunderstanding?
"Are big law firms built on implicit leverage?" (Marginal Revolution)
I find this description (these descriptions) of "leverage" to be somewhat opaque. Basically, you're talking about a relationship between an input (say: money spent) and output (say: money earned) -- and the relationship has a slope (that is, either it's linear, or it has an instantaneous tangent vector). And levered relationships are relationships where the slope is much greater than one... is that it? Or is this something deeper?
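For the plain financial version of leverage, at least, the "slope greater than one" reading checks out. A minimal sketch, under textbook assumptions that are mine and not the post's: you put up equity E, borrow debt D at interest rate i, and the assets return r. The return on your own money is then r + (D/E)(r - i), so its slope with respect to r is 1 + D/E. Whether a law firm's "implicit" leverage behaves the same way is exactly the part I'm unsure about.

```python
def levered_return(asset_return, borrow_rate, equity, debt):
    """Return on equity when assets worth equity + debt earn asset_return
    and the debt costs borrow_rate."""
    profit = asset_return * (equity + debt) - borrow_rate * debt
    return profit / equity

# Hypothetical numbers: 1 of your own money, 4 borrowed at 5%.
print(levered_return(0.10, 0.05, equity=1, debt=4))  # about 0.30
print(levered_return(0.00, 0.05, equity=1, debt=4))  # about -0.20

# Slope with respect to the asset return is 1 + debt / equity, here about 5.
slope = (levered_return(0.11, 0.05, 1, 4) - levered_return(0.10, 0.05, 1, 4)) / 0.01
print(slope)
```

The same slope that multiplies the upside multiplies the downside, which is the second line of output.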
"Self-experimentation, placebo testing, and the great Linus Pauling conspiracy" (Andrew Gelman)
"I'm reminded of the idea I heard once that Linus Pauling knew all along that megadoses of Vitamin C have no effect, and that he altruistically sacrificed his reputation as a scientist to trumpet Vitamin C's virtues, on the theory that it would reduce the suffering of millions via the placebo effect." --- Fits the facts!
I was positively impressed with Wolfram Alpha | Semantic Universe
"Alpha excels at not just retrieving the stored data but performing various appropriate numeric calculations on the data, and displaying the results in beautiful graphs and easily comprehended tables for the user." -- Fair enough. This seems like a much more realistic description than, "OmG, it's like Google, except it computes with words and understands things!!1!" -- "It does not have an ontology, so what it knows about, say, GDP, or population, or stock price, is no more nor less than the equations that involve that term." -- But just saying "no ontologies" doesn't make it so. Indeed, "equations" kinda implies "some ontology," whether implicit or not. So basically, this is a giant, hand-curated dataset with a nice interface on the front of it. That's totally reasonable, not earth-shattering. But warning bells should go off, when someone starts talking about "understanding." -- "One vulnerability... is that errors in the data may go unnoticed for a long time." -- Yeah, no kidding.
A Softer World: 408
"Come on, man. Use your evil cow brains for once."
"A Modest Proposal - Bikers, Take the High Road" (NYT)
Bikers, pedestrians, and cars, oh my. It's reaching the point with me, around campus, that I've started getting really angry at *other pedestrians* who can't seem to wait 15 gol-darn seconds for the crosswalk signal to illuminate their path. I feel like, if everyone were slightly more willing to only walk when they had the light, I'd feel a lot more comfortable yelling at cars who run red lights and swerve around pedestrians. Arrrgh.
"How easy is it to fill those Treasury jobs?" (Marginal Revolution)
"How many brilliant academics even manage to make good deans?" -- As close to a one-sentence refutation of an Yglesias post as you're going to see. (All his other points may be good, may be beside the point, who knows? But it's this last one... ooff.)
Real estate - Wikipedia
"Some have claimed that the word Real is derived from "royal" ... However, the "real" in "real property" is derived from the Latin for "thing"." --- Good to know.
"Does coverage matter?" (Radford Neal’s blog)
"I think part of the problem is that reports of experimental results should not be aimed at presenting conclusions, as may seem most natural from a Bayesian viewpoint, but rather at providing the information with which the readers may draw conclusions. This may be the source of some objections to the prior distribution in Bayesian analysis, which can be seen as corrupting the objective presentation of the experimental results, even though frequentist methods like p-values are not suitable presentations either."
"Banning Open Access II" (The n-Category Café)
More back-and-forth on the Conyers bill, with J.C. himself actually getting in on the act with a response to a criticism from Lawrence Lessig. The comments to this thread are really useful, too.
"Bet on a Pakistan Coup" (Rootless Cosmopolitan - By Tony Karon)
"Back to the future in Pakistan? Last week, General Ashfaq Kiyani flew to Washington for consultations with the Obama Administration; this week, it’s reported from Pakistan, he warned President Asif Ali Zardari to “set things right” in the country before March 16, when opposition forces are set to march on the capital. When the army puts the government on notice to clean up its act, you know what’s coming next." -- Eeesh.
"Journal Version : Two Choices + Bloom Filters" (My Biased Coin)
"This paper pretty much came about from a dare." -- link to Mitzenmacher's paper, "Using the Power of Two Choices to Improve Bloom Filters."
"Applying Kennan-esque Realism to the Problem of Terrorism" (Stephen M. Walt)
"A containment strategy places a ceiling on the threat while awaiting its eventual internal collapse... But a strategy of ‘rollback’ risks the more likely outcomes of financial haemorrhage, the erosion of constitutional liberties and the inflaming of other world crises. Consider this in blunt policy terms. An Al-Qaeda at large, trying full-time to stay alive, pursued by an ever-growing set of enemies, even with the remote chance that it inflicts a terrible blow, is less dangerous than wars with Iran or Pakistan, an emptied treasury or a shredded constitution. Trading off time and conceding longevity to the enemy for the sake of lowering the war’s costs is worth it. This is because A.Q.'s capacity to hurt America is less than America’s capacity to hurt itself. The ‘war on terror’ is a war declared on a tactical method rather than an identifiable group, for cosmic rather than achievable goals, with little grasp of ends, ways and means or weighing of vital versus peripheral interests."
"I have a reaction to [Facebook] as a consumer advocate and an advertiser: What in heaven's name made you think you could monetize the real estate in which somebody is breaking up with their girlfriend?" -- Via David Pennock's blog. (Also, I'm amused by the way in which a net-space like a Facebook page is referred to as "real estate" here. "Estate," surely, but "real?")
"Beyond the Data Deluge" -- Bell et al. 323 (5919): 1297 -- Science
A better take (overview) on "Google Science" and "Big Data" -- "The urgency for new tools and technologies to enable data-intensive research has been building for a decade or more (2, 7). In 2007, Jim Gray laid out his vision for a fourth research paradigm--data-intensive science--which he described as collaborative, networked, and data-driven (1, 10). He defined eScience as the synthesis of information technology and science that enables challenges on previously unimaginable scales to be tackled. Despite the enormous potential of this approach, data-intensive science has been slow to develop due to the subtleties of databases, schemas, and ontologies, and a general lack of understanding of these topics by the scientific community. ... Indeed, many areas of science lag commercial use and understanding of data analytics by at least a decade."
"Spherical Trigonometry"
A nice set of identities for spherical geometry -- John Cook asks, "I’m not sure why schools quit teaching spherical trig," but I don't think that's quite right. "Schools", writ large, probably *do* teach spherical geometry and trig. I just suspect that it's dressed up in some other mathematical notation -- for instance, it'll be written as the algebra of normalized quaternions, or something. Right?
Theano — theano v0.1 documentation
"Theano is a Python library aiming to allow definition, optimization and efficient evaluation of mathematical expressions involving multi-dimensional arrays (though it may be extended to support many other types). Theano melds some aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. This is particularly useful in fields such as machine learning where complicated algorithms must be run over large amounts of data."
Radul and Sussman, "The Art of the Propagator" (DSpace@MIT)
Alexey's project. Building up to presenting this "propagator" architecture was what Chris Hanson and Gerry Sussman were aiming for in the "Adventures in Symbolic Programming" course I took a year ago -- although I kinda found it a bit underwhelming when we got there. Still, there were connections to be made -- for instance, some of this stuff comes out looking like factor graphs and message passing. And they showed you could do neat back-tracking and "provenance" propagation using continuations, if you played your hand right, which was pretty cute. I saw, via someone's blog the other day, that Chris Hanson is still talking about this in other places (I think he's left MIT already), and obviously Alexey's writing about it, so maybe not a dead letter quite yet.
http://www.sagebase.org/publications.html
Ah, so this is what happened to Rosetta... "The foundation for Sage’s activities are the pioneering studies conducted by researchers at Rosetta Inpharmatics, a subsidiary of Merck & Co., Inc.. Here is a sampling of 2008 publications by these scientists that illustrate the value and potential of the advanced Sage technology." --- That list is fine, but likely not enough to build an entire company around, right?
"Gene Expression: You Haven't Been Thinking Big Enough?" (In the Pipeline)
"Well, here’s another crack at open-source science. Stephen Friend, the previous head of Rosetta (before and after being bought by Merck), is heading out on his own to form a venture in Seattle called Sage. The idea is to bring together genomic studies from all sorts of laboratories into a common format and database, with the expectation that interesting results will emerge that couldn’t be found from just one lab’s data. ... once you get down to the many labs that can do high-level genomics (or to the even larger number that can do less extensive sequencing), the problems will be many. Sage is also going to look at gene expression levels, something that's easier to do (although we're still not in weekend-garage territory yet). Some people would say that it's a bit too easy to do: there are a lot of different techniques in this field, not all of which always yield comparable data, to put it mildly. ... Then you've got the really hard issues: intellectual property, for one."
"Wolfram Alpha is Coming -- and It Could be as Important as Google"
"[WA] actually computes the answers to a wide range of questions... such as 'What country is Timbuktu in?' ... Think about that for a minute. It computes the answers." -- In what way is this "computation?" At least, in what way is it one in which Google is not also "computing"? -- "It understands and then computes answers to certain kinds of questions." -- assumes facts which are not in evidence. -- "Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge." -- Ah, I see. I'm glad Wolfram finally figured out this was the way to go. -- "But as intelligent as it seems, Wolfram Alpha is not HAL 9000, and it wasn't intended to be." -- Oh, thank goodness! -- "Instead, it is a system that has been engineered to provide really rich knowledge about human knowledge -- it's a very powerful calculator." -- A new *kind* of calculator!!
"Citigroup: World’s Worst Investment to Get Even Worse" (The Big Picture)
"So for about 100% of the market value of Citi, plus insurance guarantees worth of as much as 500% of its value (~$275 billion), we got less than 1/10 of a company that in total was worth 1/5 of our investment. ... Its just another example of why these insolvent banks should be nationalized, or for you squeemish free marketers, FDIC mandated, pre-packaged Chapter 11, government funded reorganization. If Obama continues to listen to the god-awful advice of Larry Summers and Tim Geithner, he will doom his presidency, and finsh marginally ahead of George W. Bush on the list of worst presidents." --- Barry Ritholtz is not happy.
"Atlantis" (lidsblog)
"Speaking of measuring the ocean and boats, after a talk by Marco Duarte a couple of weeks ago, my officemate Matt Johnson was saying how it would be a great demonstration of the power of compressed sensing (CS) if you made a version of the game Battleship in which one player could take CS measurements and exploit structured sparsity, while the other player played normally. The CS player would win, either every time or with overwhelming probability --- I'm not sure exactly which."
"Rules for Radicals" (Who Is IOZ?)
"The Donk complains that the Republicans are crass obstructionists. Would that it were true. The contemporary GOP wears the guise of obstructionism but lacks the wherewithal to oppose effectively. Superjesus Black Reagan rules the airwaves, and the supposed opposition is sequestered away in a chintzy hotel ballroom listening to C-list newsmedia celebrities extemporize around the posthumous legacy of Romulus and Remus Ronald Reagan."
Banach - SIMILE
"Banach is a collection of operators that work on RDF graphs to infer, extend, emerge or otherwise transform a graph into another." --- Think: the pieces of a Pipes-like system, but for (pure-ish) data. Maybe I should just keep re-bookmarking the whole SIMILE site? Anyhow, my friend Ted has a new Exhibit-based system for blogging+data, coming out on Monday. I'll link to it when it does, but this is the reason I was asking you about M. and the server he had been talking about setting up.
"Correlation" (xkcd)
No causation without correlation! (Assuming stability and faithfulness, which maybe Jamie Robins would say one shouldn't do?)
"Wolfram|Alpha Is Coming!" (Wolfram Blog)
Wolfram will save the children, but not the children who don't use Mathematica --- "With Mathematica, I had a symbolic language to represent anything—as well as the algorithmic power to do any kind of computation. And with NKS, I had a paradigm for understanding how all sorts of complexity could arise from simple rules. But what about all the actual knowledge that we as humans have accumulated? ... in effect, we can only answer questions that have been literally asked before. We can look things up, but we can’t figure anything new out." --- Sweet! Wolfram is going to solve the problem of inference! and also A.I.! On the other hand, that seems hard. How are you going to do that? --- "Armed with Mathematica and NKS I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable." --- Oh, *right*, of course. Thanks for clearing that up. Wolfram for teh win!!!1!one!
"Follow-up on Robins' Talk ("A Bold Vision of Artificial Intelligence and Philosophy")" (Social Science Statistics Blog)
"The point of the talk was not to defend faithfulness, but rather to show that it implies a lot more than was realized by researchers who currently employ it to uncover causal structure from joint densities." --- I am *so* bummed that I wasn't able to go to this talk. Gaaaah.
"Trusted Institutions" (Felix Salmon)
I'm not as optimistic about the chances for institutions to regain trust in the future, but Salmon is right -- that Daily Show clip is *gold*.
"Is an Education Revolving Door Such a Bad Thing?" (Matthew Yglesias)
"But though alternative certification programs exist in all 50 states, in many states they’re not very robust and/or there’s no clear vision of how what’s already in place could be expanded and built upon." --- Since we were just talking about this on the phone the other day... thoughts, Ms. C?
"Why Obamanomics Will Hurt Innovation" (Will Wilkinson)
"The concept of “animal spirits” recognizes that not all economic decisions are made entirely with spreadsheets. Some people start companies because they’re driven by a dream that transcends rational economic calculation. But most successful entrepreneurs are pretty serious about comparing risks with opportunities. Higher tax burdens raise the price of entrepreneurship. When you raise the price of something, then, all else held equal, you usually get less of it." --- "All else held equal?" That's why I love economics. It's those little caveats that get you. "I, Jim Manzi, have run a linear regression, and I will now tell you the policy implications of my coefficient-estimates." And of course, since it's very unlikely that there are any interactions with the "tax burden" predictor... *coughcough*.
"Basics: Significant Figures" (Good Math, Bad Math)
"Significant figures are a rather crude way of tracking precision. They're largely ad-hoc. The "right" way of tracking precision is error bars. ... Significant digits are basically a way of estimating error bars." -- What always bothered me (in school) was how the number of significant digits *seemed* (to me, at the time) to be specific to the base in which one was working. Significant-digits-in-binary vs. significant-digits-in-base-10 vs. some other base... depending on how you were counting, using significant digits in the same way would lead to different levels of precision. (But actually: that's just an intuition... is that even right?) But obviously, in practice, when you don't have access to error bars or some form of measurement-of-uncertainty in a quantitative sense, they're clearly useful.
"Ikea chairs" (the statistical mechanic)
"In a review article about agent-based models Dietrich Stauffer once wrote "Physicists not only know everything, they also know everything better."" -- They're more like a contagion (physicists, that is) than anything else. Having done physics as a physicist would, they then spread out to new, uncharted (to them) areas, to explain how science should be done. Economics? It's really physics. Biology? Start with theoretical models, as a physicist would. Bioinformatics? We've solved those problems already. Philosophy of Science? Sorry, I think you meant to say, "philosophy of Science as it would be if performed by a Physicist." Philosophy of Physics, really. I suppose most of this is a completely normal byproduct of the fact that we're at the tail end of a century when physics was, as a discipline, technology, and economic activity, remarkably successful.
"The Email Event Horizon" (Shtetl-Optimized)
"When I was a student, I used to wonder constantly about the professors who’d ignore my long, meticulously-crafted emails or fire off one-word replies, yet who might suddenly have an hour for me if I walked into their offices. Were they senile? Rude? Did they secretly despise me? Now I get it, now I understand—yet I doubt I could explain the warped spacetime Gmailometry I now inhabit to my own past self. On the other hand, the recognition of what’s happened is itself a sort of liberation. I’m starting to grasp what’s long been obvious to many of you, those who crossed the [Email Event Horizon] before I got my first AOL account in seventh grade: that it’s useless to struggle." -- Words to this effect could probably be posted over the doors of many grad students, to good effect. (Myself included, at times.)
"Ontology And Why I Am Not Obsessed With This Fancy Little Overrated Word" (Jeff Jonas)
Yeah, yeah, ontologies suck. Fine. They're awful things, hard to use, difficult to create, brittle over time, and terrible in every way ... except for the alternatives. "Automated. Untrained. Unguided. Self-organizing." Lafff. If you think that those are the "preferred model" in any real sense, then you've replaced a set of problems you know about with a set of problems you haven't discovered yet. (In other words, there's still no good solution here, even after 50 years of thinking and working. Keep Thinking.)
Challenges for the New Genomics - O'Reilly Radar
chl is right -- this comment by Thomas Lord is exactly right in the big picture as well as many of its details. Although, what's the deal with this sentence: "(Stonebraker's recent column-oriented work notwithstanding -- but it'll be a long time before that's directly applicable to genomics)"? Dude, some genomicists are already using column-stores for some of their data! There is plenty of large-scale biological data that's ideally suited for that sort of store (read-heavy, bulk-upload, highly-sparse), and I'd suggest that you could probably put "sequencing" data into that category too, if you thought hard enough about it.
"Another Karzai Forges Afghan Business Empire" (NYTimes)
This is the guy (or maybe the other brother) who owns the restaurant near my office, "The Helmand." Pretty good, actually -- good bread, good lamb, and not too overpriced for all that.
"Three detective novels that restore pleasure to reading." - By Ron Rosenbaum - Slate Magazine
"Ah yes... we had days like these, in New Haven." -- You're right, Ms. C, it *is* completely insufferable. Why are you still reading Slate?
"Fiscal Policy Using the Quantity Theory" (Alex Tabarrok at Marginal Revolution)
"... since there is also reverse causality..." -- reverse causality?
"Jamie Robins on A Bold Vision of Artificial Intelligence and Philosophy: Finding Causal Effects Without Background Knowledge or Statistical Independences" [Social Science Statistics Blog]
Damnit, right in the middle of the group meeting I am giving on Wednesday afternoon. I wonder if I could find someone who would want to go to this (instead of listening to me yammer on for an hour) and tell me what he said...
"How come they don't sell grape flavored Starbursts anymore?" (Andrew Gelman)
Pleasure from Now-and-Laters cannot be received, it must be *extracted.*
"Lovecraftian School Board Member Wants Madness Added To Curriculum" (The Onion)
"West says the school inadequately prepares students for the black seas of infinity." --- I couldn't agree more.
"Mautam" (Wikipedia)
Watched a NOVA special on "Mau tam" last night, which is the cyclical 48-year flowering of bamboo trees -- followed by a sweeping plague of black rats that (over) feed on the fallen fruit -- in areas around India and south-east Asia. At one point, the Indian government was offering bounties for individual rat-tails... and, remembering that Goldbarth poem ("of course -- a penny a tail"), I thought, "surely, they're going to catch some guy up in the hills, breeding rats." But no, they just followed a rat biologist around, doing field-dissections to pinpoint the number and timing of rat-reproduction "pulses." Pretty interesting, actually.
Brian Weatherson, "decision theory notes"
Tagging this a second time, noting that I like the analogy (at the end of ch. 15) of the current financial crisis to an insurance company that accidentally finds itself selling hurricane insurance. Also, I'm not-sure that a claim in the middle of the chapter (about the non-convergence of sums of non-independent random variables) is not-wrong.
"What Bruce Sterling Actually Said About Web 2.0 at Webstock 09"
"I really think it's the original sin of geekdom, a kind of geek thought-crime, to think that just because you yourself can think algorithmically, and impose some of that on a machine, that this is "intelligence." That is not intelligence. That is rules-based machine behavior. It's code being executed. It's a powerful thing, it's a beautiful thing, but to call that "intelligence" is dehumanizing. You should stop that. It does not make you look high-tech, advanced, and cool. It makes you look delusionary." --- The whole thing is a laundry-list shot to hell, wrong in so many ways, but that's what makes it great, I guess. Or at least: funny. Worth it for that one 'graf alone. Some of the conclusions he's reaching are insane, of course, but then I suppose that everyone wants to bite the invisible hand the second it stops feeding them...
U.S. Code
The entire US Code, in ASCII format. Snavely! Look at this. It's structured -- definitions, cross-references, hierarchically-formatted, with certain sub-sections that actually form logical functions. And yet -- try to find a good, open interface to this sort of thing online. At best, people just put the plain text on the web with a couple of the top levels of hierarchy separated as links and pages. The more hyper-linked versions are behind paywalls. I'm not saying "do this now," but think about how you could structure and represent this as data. Laws, by the way, basically form structured *edits* to this data -- so the current US Code is, in some sense, a snapshot of some source repository that's being edited over time. (Remember our structured text-editor idea?)
"Unboxed - How to Make Electronic Medical Records a Reality" (NYTimes)
"A crucial bridge to success, according to experts, will be how local organizations help doctors in small offices adopt and use electronic records. The new legislation calls for creation of “regional health I.T. extension centers.” In a letter to the White House and Congress last month, Dr. Middleton and 50 other experts emphasized the importance of these centers and pointed to the Primary Care Information Project in New York City as a model." -- This article pinpoints interfaces with billing systems as one of the key technical challenges here. What would be really nice would be if the government could help to mandate, or push along public standards for, billing systems *interfaces* that could be standardized and shared. (Maybe people are already doing this?) If you had those, then these "regional health I.T. centers" would be one place where local activists could make a real difference.
"Link between the Nuclear Export of mRNA and Decay" (The Daily Transcript)
To model...
"What It's Like To Get Grilled By The New York Attorney General" (Henry Blodget)
"All day long, Ken Lewis will feel like a soccer goalie in a sudden-death shootout: One careless answer and he's toast. Team Cuomo, meanwhile, can shoot away until they just can't think of anything else to ask. (This, after all, is perhaps the only public-private sector interaction other than the DMV in which the low-paid public servants actually have the upper hand)." --- Made me laugh and laugh. (To get really technical, though, there's really no such thing as a "sudden death" shootout in soccer; even past the normal five-shots, both teams always have a chance to keep it going, each round. Also, in a shoot-out, the shooter is *expected* to score. The chance of the goalie being a 'goat' is rather low, compared to the chance for heroism if he or she stops a shot that most people *expected* the kick-taker to make. Most of the pressure is on the shooter! But still, nice analogy.)
Antonakis and Dalgas, "Predicting Elections: Child's Play!" (Science) [Science 323 (5918): 1183]
Another paper to drive Andrew Gelman crazy. (http://www.stat.columbia.edu/~cook/movabletype/archives/2009/02/no-no-noooooooo-1.html) The earlier experiments showed a correlation between college-students' assessments of "competence," and electoral success. This work extends the "result" to children (telling them a story about a long journey on a boat, basically the Odyssey, and then asking them "who would you want to be the captain on your boat?"), who could, again, pick electoral winners at about a ~70% rate. They label this, again, a result based on "competence", because the children's picks correlated with adult ratings of "competence" (but not "intelligence," or "looks") -- and if that seems oddly circular to you, then we agree. But there it is: children. Simple question: did they control for stupid stuff, like skin-tone or facial size? I haven't read the paper in depth, but the interview with the author was oddly not-reassuring. 70% both times?
"Stimulus Ostriches" (Grasping Reality with Both Hands)
"Back at the start of 2004, America's banks discovered that they could borrow money cheaply from Asia and lend it out in higher-yielding domestic mortgages while using sophisticated financial engineering to wall off and strictly control their risks - or so they thought. Over the next two years, annual US spending on residential construction roared upward, from $624 billion to $798 billion, the US unemployment rate dropped from 5.7% to 4.6%, and the economy grew at a 3.1% real annual rate." --- Part of a larger complaint about the "treasury view," and people who still believe in it; but this paragraph is maybe the most succinct explanation of the last year of crisis that I've seen so far.
GEO SOFT Deposit
SOFT file format notes.
Romer Speech on the ARRA (PDF)
"Indeed, if you want to know why I am more optimistic than some, it is probably because I believe my own research." --- When she says, "omitted-variable bias," she's couching her argument in explicitly causal terms. Furthermore, her claim, contra Mankiw ("the best man at my wedding"), that "created or saved" is not a meaningless number is (also) essentially a counterfactual, and therefore causal, claim. Re-reading Pearl, I admit I see this stuff everywhere now...
Gresham et al. "The Repertoire and Dynamics of Evolutionary Adaptations to Controlled Nutrient-Limited Environments in Yeast" (PLoS Genetics)
David Botstein is second-to-last author...
ASN.1/XML Translator
Grrrr, ASN.1, why are people still using you to distribute non-legacy non-network-protocol-based data?
"Ride the Train" (Matthew Yglesias)
"Which isn’t to deny that a quality rail link could be useful; only to observe that there are a large number of potential projects—basically everything on the existing HSR corridor list plus all kinds of littler things like Phoenix-Tuscon, Worcester-Boston, DC-Norfolk, DC-Richmond—that would seem like a better idea." --- But there's *already* a train that goes from Worcester to Boston. It takes between 1:15 and 1:30, depending on which train you get on (express or local). What's a high-speed-rail version of this line going to do -- cut 30 or 40 minutes from this? And stop nowhere in between? And ... I dunno. Are people really dying to be able to make the trip from Boston to Worcester in <30 minutes? I would imagine that a better mass-transit option in the Boston area would be to re-expand the Green Line into JP, complete the Green Line expansion into Medford, and generally beef-up the woeful MBTA service (bus and T) everywhere else. Worcester's already got a train...
"Do you buy this?" (Brainiac)
Chris Shea tags the anonymous-live-blog-apologia for the Plantinga-Dennett exchange as "fishy." -- "Whether this person wanted anonymity for the stated reason or some other -- to provide cover for a comically slanted account of the debate, perhaps, or to claim authority (status as a philosopher) the author does not possess -- is the subject of debate in the comments section of that post, and elsewhere. I think the mawkishness ("my family") is a giveaway. Plantinga, anyway, seems to have survived, indeed thrived, as a Christian philosopher." -- Hmmm.
Fontana et al. "Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology" (PLoS ONE)
GAaah. Shaun! Very close to the "alignment free" stuff I was sketching a while ago. But really, we *could* do it better!
Zobel and Moffat, "Inverted files for text search engines" (CiteSeer)
"Bill Gates on the state of education" (Aleks Jakulin)
Bill Gates TED talk on teacher performance and evaluation.
"Musical protolanguage: Darwin’s theory of language evolution revisited" (Tecumseh Fitch at the Language Log)
I was listening to a description of a paper Darwin wrote, "A Biographical Sketch of an Infant," about the development of his own son over his first four years, and comparing it to observations he had made earlier about a baby orangutan. At some point, I'd like to come back to this and track down that paper...
Simple 4.1.5
"The primary focus of the project is to provide a truly embeddable Java based HTTP engine capable of handling enormous loads. Simple provides a truly asynchronous service model, request completion is driven using an internal, transparent, monitoring system. This allows Simple to vastly outperform most popular Java based servers in a multi-tier environment, as it requires only a very limited number of threads to handle very high quantities of concurrent clients." --- I keep thinking that these standalone, Java-based, easily-deployable server-options would be really useful in some kind of distributed graph database project. But I still haven't nailed down the details in my mind... obviously this isn't going to happen while I'm still at school.
Dharmapurikar, Krishnamurthy, Taylor "Longest Prefix Matching using Bloom Filters" (ResearchIndex)
Via a discussion on Michael Mitzenmacher's blog.
Open Yale Course on "Financial Markets"
"Color vocabulary and pre-attentive color perception" (Language Log)
Lots of references on color naming and comparisons, both within and across culture and language boundaries.
"Your Paper Is A Sack Of Raving Nonsense. Thank You." (In the Pipeline)
I think that "thanking Sir John Cornforth" would be a good euphemism for this kind of response in an academic setting. "Did you see that ridiculous claim in Cell last week?" "I did! One of us should write a letter, and thank Sir John Cornforth." It could even be a verb -- "they totally got Cornforthed on that one," or "we need to re-do this analysis, or the Cornforthing we receive will be painful."
RSS Hits the Big Time (Aaron Swartz's Raw Thought)
"For each of the near term reporting requirements (major communications, formula block grant allocations, weekly reports) agencies are required to provide a feed (preferred: Atom 1.0, acceptable: RSS) of the information so that content can be delivered via subscription." --- Now, someone needs to figure out where each of these feeds is (will be?) published, and write an aggregator so that there's a one-stop-shopping end-point.
Takahashi et al. "When Your Gain Is My Pain and Your Pain Is My Gain: Neural Correlates of Envy and Schadenfreude" [323 (5916): 937 -- Science]
"To elucidate the neurocognitive mechanisms of envy and schadenfreude, we conducted two functional magnetic resonance imaging studies." --- cue jokes about Lewd and Prude (again), or maybe some more serious reference to the fMRI-ization of the Theory of Moral Sentiments...
"Tech Trend: Shanzai" (bunnie’s blog)
"The contemporary shanzai are rebellious, individualistic, underground, and self-empowered innovators. They are rebellious in the sense that the shanzai are celebrated for their copycat products; they are the producers of the notorious knock-offs of the iPhone and so forth. They individualistic in the sense that they have a visceral dislike for the large companies; many of the shanzai themselves used to be employees of large companies (both US and Asian) who departed because they were frustrated at the inefficiency of their former employers.... They are self-empowered in the sense that they are universally tiny operations, bootstrapped on minimal capital, and they run with the attitude of “if you can do it, then I can as well”." --- Modulo a meltdown of the global capitalist system, now would probably be a good time to start studying Chinese. Ahem.
"Homework and Search Engines" (Luis von Blog)
"Allow Searching on the Web but Change the Problems. Pros: In real life they will be able to use Google. Cons: It's hard to come up with good ways to change the problems, and inventing brand new problems every year is even harder, especially if you want them to be as good as the classics. My advisor Manuel Blum has recently been thinking deeply about this and he told me a good strategy: for most problems (at least in theoretical CS), you can change them significantly by thinking "how can I make this problem be closer to reality?""
"What the Box Score Data Says About Shane Battier" (The Wages of Wins Journal)
Dave Berri, author of "Wages of Wins," talking about Shane Battier and statistics in the wake of that Michael Lewis article.
"Lewd and Prude" (John Holbo, at Crooked Timber)
Lewd & Prude, Goofus & Gallant, and Sen's Paradox. DeLong also commented, at one point, that he "didn't understand Sen's Paradox." (http://www.j-bradford-delong.net/movable_type/2003_archives/001366.html)
Building Scalable Web Applications with Google App Engine (Google I/O Session Videos and Slides)
"Toric Varieties and Fans" (Secret Blogging Seminar)
Second in the series.
"Google App Engine 1.1.9 boosts capacity and compatibility" (Niall Kennedy)
It looks like the resource requirements for Google's AppEngine, under the free plan, have solidified (and become much easier to satisfy). No more "high CPU" limits? Might be time to try writing something for it, again.
Burr Settles, "Active Learning Literature Survey"
Felix Salmon, "Recipe for Disaster: The Formula That Killed Wall Street"
The piece on the Gaussian copula and the financial meltdown, in Wired.
"Provost, Fawcett & Kohavi (1998) The Case Against Accuracy Estimation for Comparing Induction Algorithms" (LingPipe Blog)
I could probably learn a lot by just going back and re-reading all of Ron Kohavi's old papers...
"types in Statistics" (gustavolacerda)
"My Stats homework has a question of the type: "Given this joint distribution over X and Y, compute E(E(X|Y))."... This notation is extremely confusing. Given how unclear the notation is, I decided to do something about it, using the formal(ish?) language that I designed yesterday..." --- A(nother) interesting attempt at a "formal language with types" approach to probabilistic notation. It's good! But to be honest, I think ideas like this occur to a lot of Haskell-aware CS people when they first encounter probabilistic notation -- I remember John Barnett showing me something like this four or five years ago. I'd recommend looking at some of the papers of Avi Pfeffer and Claire Jones, if you want to see where this stuff ends up going. In particular, Pfeffer's 2000 thesis, or his IBAL paper, or his "Stochastic Lambda Calculus and Monads of Probability Distributions" paper, and the references therein. Things get interesting when you go to higher-order modeling languages...
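On the E(E(X|Y)) homework question quoted above, a small numeric sketch may make the notation less mysterious: the inner E(X|Y) is a *function of Y*, and the outer expectation averages that function over the marginal of Y, which lands you back at E(X). The joint table and the helper names below are made up for illustration; none of it comes from the linked post or its formal language.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint distribution P(X=x, Y=y); numbers invented for illustration.
joint = {
    (0, "a"): F(1, 10),
    (1, "a"): F(3, 10),
    (0, "b"): F(2, 10),
    (2, "b"): F(4, 10),
}

def marginal_y(joint):
    """P(Y=y), obtained by summing the joint over x."""
    p_y = defaultdict(F)
    for (_, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def cond_exp_x_given_y(joint):
    """The inner expectation: a mapping y -> E[X | Y=y]."""
    p_y = marginal_y(joint)
    weighted = defaultdict(F)
    for (x, y), p in joint.items():
        weighted[y] += x * p
    return {y: weighted[y] / p_y[y] for y in p_y}

def iterated_expectation(joint):
    """The outer expectation: average E[X | Y=y] over the marginal of Y."""
    p_y = marginal_y(joint)
    inner = cond_exp_x_given_y(joint)
    return sum(inner[y] * p_y[y] for y in p_y)

def expectation_x(joint):
    """E[X] computed directly from the joint."""
    return sum(x * p for (x, _), p in joint.items())

print(cond_exp_x_given_y(joint))    # the inner term, one value per y
print(iterated_expectation(joint))  # agrees with expectation_x(joint)
```

Exact fractions are used so the two routes can be compared with `==` instead of a floating-point tolerance. The "types" complaint shows up here too: `cond_exp_x_given_y` returns a mapping, not a number, which is exactly what the E(X|Y) notation hides.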
Plant Encyclopedia Database
"Can be bonsaied..."
Chapman and Liu, "Numeracy, Frequency, and Bayesian Reasoning"
A follow-up to that Gigerenzer and Hoffrage paper!
"The Color of Government Money" (Will Wilkinson)
"And someone please tell me how speeding up the process of picking winners doesn’t simply make well-prepared regulatory capture specialists like T. Boone Pickens (whose PR blitz seems to have worked to get him into the DoE inner circle) richer simply because they’ve got the resources to hoover up contracts." --- I don't see why W.W. is being so obtuse here -- the answer to his question is in the third 'graf of the quoted article he gives. Loan guarantees and simpler applications disproportionately aid people who *aren't* already billionaires or otherwise "in a position to hoover up government dollars." Rising tide, lower activation energy, etc. etc.
David Li, "On Default Correlation: A Copula Function Approach" (SSRN)
Everyone's all over the "Li copula function" beat these days (trailing in Felix Salmon's wake as always, right?) ... but it's worth it to actually read the original paper. Go through, and highlight every single sentence that uses the word "assumption" or its equivalent. (Seriously! Do it! I did it last night on the train, it was fun.)
Stranger et al. "Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes" (Science 2007) [Science 315 (5813): 848]
Picked up a copy of Science from the lab to take home with me last night and read on the train... and noticed this article towards the end. Seems relevant.
Caching Tutorial for Web Authors and Webmasters
In all its glory/gory detail.
everyone's bookmarks for "[0901.2640] promise and pitfalls of extending google's pagerank alg..." on delicious
Oh, no, I didn't mean to imply that there was anything wrong with the *paper*. I haven't read it in depth at all, I'm sure it's fine. And graph methods for exploring networks (of citations, or anything else), or spectral algorithms, are cool. My objection is with people who think that these metrics are reasonable ways of making research decisions, or of assessing quality of publication lists. I know I've had at least one paper where, after the first submission was rejected, the choice of a second journal to which to submit it was made (between two suggested destinations) based on one of these styles of impact and citation ratings. You can tell a computer scientist or a statistician all about rating bias, or the possibility of google-bombing or attacking the system... but a biologist just wants to see the number, and won't care about how it's calculated, or whether it can be manipulated. In practice, any role for a system like this in real research and credit-building is a problem.
"Note on point-free programming style" (The Universe of Discourse)
I'd read this before, but not tagged it ... it's Mark Dominus, writing about programming in point-free vs. ... not-point-free style. ('Argumentative' style?) This is related to a bunch of stuff, including thinking about things in a CPS-ey framework. As it turns out, our entire system for dealing with bio data here in the lab turns around the idea of building up your computation in point-free style using Iterators. It's pretty easy.
Maslov and Redner, "Promise and pitfalls of extending google's pagerank algorithm to citation networks" (arXiv)
Why are people even trying to go down this road? It's weird that people call it an "algorithm," spectral methods for summarizing graph-theoretic properties are all over the place (not that that makes it wrong), and anyway Google itself freely admits that they have to constantly monitor and tweak and adjust their scoring and evaluation in order to stay ahead of SEOs and Google-bombers. But exactly the same attacks will be relevant to a system that uses these sorts of scores to judge research quality and citation and influence networks! Choosing these kinds of metrics encourages people to start companies for the sole purpose of REO -- optimizing your research score. People will form little co-citation rings. It'll be PNAS Track III all over again, right? So... why?
"Quiz Answer: Continuation Passing Style for Converting Visitors to Iterators" (LingPipe Blog)
Is it annoying of me, to suggest that continuations are not functions? (In particular, java.util.Runnable in the Java SDK is *not* a way of "programming with continuations," even if it does lend itself to reasonable ways of expressing some ideas.) Functions are not continuations. Functions with no arguments (thunks) are not continuations. Iterators are not continuations. All of these constructs are useful, of course ... and I suppose that one could program in a continuation-passing *style* without actually having continuations, but this is beside the point. A continuation *is* a computation in a technical sense -- it's a value which picks up and holds a "point in the control flow," and it's got interesting (and useful) interactions with things like state and side-effects when implemented properly. To see the difference, go look at MIT Scheme and compare "let" and "fluid-let." I know this is nit-picky, but still.
Jerzy Karczmarczuk, "Functional Coding of Differential Forms"
Differential forms in Haskell. This is cited by the Conal Elliott paper, and looks like it'd be worth reading too.
John Hodgman on "meh" - Waxy.org
"It's part of the toxic Internet art of constant callous one upsmanship. And it is a sort of art, but not for me." --- And with that, Hodgman demonstrates his superior adherence to a deeper and more robustly-discomforting form of Internet Lifemanship. Truly awe-inspiring.
"Unforgettable Yoo" (Christopher Shea at Brainiac)
"No one can write about the law-professor/torture-memo-author John Yoo without reaching into the pun basket. But, people, there are standards even for the lowest form of humor. Suffice it to say that "Yoo Complete Me" fails the test."
"Book Review - 'Snark - A Polemic in Seven Fits,' by David Denby" (Walter Kirn)
"One almost wonders if what he so deplores about what he calls “the hunting of the snark” is that, invariably ... someday the snark would come for books like his." --- Notable in particular for this wonderful last line.
"The Economics of Charity Auctions" (Felix Salmon)
"I've been outbid this morning -- three times -- on a charity auction for a very good cause; the top bid has now increased by $81 as a result. Does that mean that without spending a penny, I've in some way managed to raise $81 for charity?" --- This is a question about causality! I think someone like David Lewis would say, "no."
"NASA’s carbon dioxide detector lost" (Climate Feedback)
Worse than the $280m lost is the 12-18 extra months that most of us will be forced to listen to George Will go on about climate change. Multiplied out over the entire newspaper-reading population of the U.S., that's gotta equal some large-ish fraction of the actual monetary cost of the mission in lost work (spent ranting or pulling one's own hair or shopping for sackcloth clothing). Where's Richard Posner or Cass Sunstein to do these kinds of calculations when you really *need* them?
"The Bad Bank Owns the Good Bank -- Brilliant!" (Andrew Samwick)
I've seen a couple of references to this "split things into a good bank and a bad bank, and then make the bad bank own equity in the good bank" plan for the last few days ... so, let me see if I can get this straight. The problem is that "sensitivity to failure" is a *transitive* relation: if I'm sensitive to your failure, and you're sensitive to your mom's failure, then I'm sensitive to your mom's failure (ka-ching). But *ownership* of "bad assets," and the corresponding accounting regulations, are *not* transitive relationships. So ... this good/bad splitting plan would be the government basically doing some kind of financial-accounting regulatory arbitrage (on rules that it presumably set up, itself, for everyone) to finesse one kind of transitivity into the other kind of non-transitivity. Is that right? (Is this what's happening whenever *anyone* forms some kind of limited-liability partnership or corporate entity?) If so... weird.
If you go thinking about Newcomb's Paradox (from that old post about thought experiments), John, you should go to Brian Weatherson's blog (linked to in this comment) and read some of his notes about it. He talks about it in an informative way, I think.
Rocket launches 'Predator' - Entertainment News, Film News, Media - Variety
"Will Clark is set to direct "Pride and Predator," which veers from the traditional period costume drama when an alien crash lands and begins to butcher the mannered protags, who suddenly have more than marriage and inheritance to worry about." --- OMfg, I must see this. Pleasepleaseplease let it not be terrible.
"Collaborative public policy-making, the Freiburg way" (Wikinomics)
"One innovative case of public policy consultation can be found in the city of Freiburg, Germany. In 2008, the municipal government of Freiburg invited its citizens to partake in a participatory budgeting exercise. The goal was to gather citizen input for the drafting of the 2009/2010 municipal budget. With the help of software company TuTech Innovation, the Freiburg government created a website that used discussion forums, wikis and a new innovation - the budget slider. Citizens who registered for the website could manipulate these sliders to create their own individual budgets, by moving the sliders up or down to either increase or decrease spending to any one of the 22 budget areas. The key constraint was that the total budget had to balance to 2008 levels, so spending increases in one area necessitated economizations in another. Citizens were also invited to provide written justifications for their changes." --- Budget-simulation-software is awesome.
"Paper: Beautiful differentiation" (Conal Elliott)
Ah! I understand, now! Automatic Differentiation != Symbolic Differentiation, and it's not just a matter of "AD works on actual subroutines." It's not an implementation thing, they're two different things. In C.E.'s words, "the symbolic method uses a series of rules... [for the] transformation of the source code," whereas AD is a method "to simultaneously manipulate values and derivatives," in which "numeric operations apply point-wise." Implementations given in Haskell, and then he shows how it generalizes to derivatives over non-single-variable functions (i.e. he's doing some portion of standard diff. geometry, right)? This is all pretty great. (I admit that I probably would have figured this out before, if I'd been reading Elliott's posts as carefully as I should have -- therefore, the fact that this is a revelation to me *now* is evidence only of my laziness, nothing else.)
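To make the "simultaneously manipulate values and derivatives" point concrete for myself, here is a minimal forward-mode sketch in Python using dual numbers. This is not Elliott's Haskell code; the `Dual` class, the `derivative` helper, and the test function `f` are all my own illustrative names, and it only handles first derivatives of one-variable functions built from `+`, `*`, and `sin`.

```python
import math

class Dual:
    """A value paired with its derivative; each operation updates both at once."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is zero.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule, applied to numbers rather than to expressions.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(x):
    x = Dual.lift(x)
    # Chain rule: d/dx sin(u) = cos(u) * u'.
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def derivative(f, x):
    """Evaluate f at x with derivative seed 1 and read off the derivative part."""
    return f(Dual(x, 1.0)).der

def f(x):
    return x * x * x + 2 * x + sin(x)
```

Nothing here ever builds or rewrites an expression for f'; `derivative(f, 2.0)` just runs `f` once on a pair of numbers. That seems to be the distinction being drawn against the rule-based symbolic method.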
"All we want are the facts, ma'am" (Peter Norvig)
"I figured they would either use the quote I gave them, paraphrase it, or drop it completely if it didn't fit with the point of the story. But when [the "End of Theory" issue] came out in June 2008, there was a fourth possibility that I hadn't even counted upon: they attributed to me a made-up quote that actually contradicts the reply I gave them... To set the record straight: That's a silly statement, I didn't say it, and I disagree with it. The ironic thing is that even the article's author, Chris Anderson, doesn't believe the idea. I saw him later that summer at Google and asked him about the article, and he said "I was going for a reaction." That is, he was being provocative, presenting a caricature of an idea, even though he knew the idea was not really true." --- Chris Anderson wants to take it all back! What about the Long Tail? Was that just "provocation" too? How am I to know what to believe when I see Anderson's byline?
"Beautiful differentiation" (Conal Elliott, draft paper)
"Automatic differentiation (AD) is a precise, efficient, and convenient method for computing derivatives of functions. Its implementation can be quite simple even when extended to compute all of the higher-order derivatives as well. The higher-dimensional case has also been tackled, though with extra complexity. This paper develops an implementation of higher-dimensional, higher-order differentiation in the extremely general and elegant setting of calculus on manifolds and derives that implementation from a simple and precise specification."
"Differences between French and English statistical models" (Statistical Modeling, Causal Inference, and Social Science)
"French social scientists prefer a visual display of statistical data, which is usually presented as "orthogonal projections" of a "cloud of points, constructed in a space having a large number of dimensions".... In contrast, he says, Anglo-Saxon social scientists prefer to use regressions, and write their articles in the language of formulas and equations, which include a deterministic part "plus a random residue." --- Starting points for a history of statistical *visualization*. (Alain Desrosières, in this case.) Someone buy this (http://www.ensmp.fr/Presses/consultation.php?livreplus=109--col10) for me, please? Also, probably file under "sociology of data" too.
"Data paparazzi" (Adventures in Ethics and Science)
"It now seems that some physicists have taken matters into their own hands. At least two papers recently appeared on the preprint server arXiv.org showing representations of PAMELA's latest findings (M. Cirelli et al. http://arxiv.org/abs/0808.3867; 2008, and L. Bergstrom et al. http://arxiv.org/abs/0808.3725; 2008). Both have recreated data from photos taken of a PAMELA presentation on 20 August at the Identification of Dark Matter conference in Stockholm, Sweden." --- Scientists taking pictures of unpublished data, and writing articles about it. Optical data recognition, intellectual property, academic politics... to be honest, this would probably drive some biologists insane. Not that it doesn't happen (the brain is the ultimate optical data recorder, right?)
"Facebook's use of our content has to have clear limits. * If I do not wish any of my content to be used for commercial purposes, or submitted to 3rd parties, I should be able to select this in my Privacy settings...." --- Good luck with that. Facebook *is* a "commercial purpose," doiii.
"The Limits of Free Scientist Chow?" (In the Pipeline)
This would make a perfect experiment on vultures.
"Mark Halpern on Language Log" (Language Log)
The reason that many people "feel that the most exciting parts of Paradise Lost are those in which Satan speaks," is because when Satan *does* show up, he doesn't spend two paragraphs bitching about how "no one told me they were talking about my book on the interwebs." Silence should have been his good, all hope to him is lost, etc etc. (While thus he spake, the blogging squadron bright turn'd fiery red, sharpening in mooned horns their phalanx, and began to hem him round with ported comments, as thick as when a field of Ceres ripe for harvest waving bends her bearded grove of ears, which way the wind sways them; the careful ploughman doubting stands, lest on the threshing floor his hopeful bullet-pointed-response proves chaff.)
Rush to Recovery
"Because, my friends, 'cold turkey' is something far worse than what you eat the day after Thanksgiving..." Still my favorite Rush Limbaugh parody (requires Real Player). Harry Shearer's "Le Show" from October 12 2003.
GeoCommons Maker!
indiemapper
"Indiemapper is the smarter, easier, more elegant way to make thematic maps from digital data."
"Mapping with Isotype" (DIY Cartography: Making Maps)
This page kills my home internet connection every time it tries to load ... but when it does, beautiful stuff. Links to the Gerd Arntz icons (which I had seen before, but not downloaded until now), along with links to a PDF of one of his atlases that's ... beautiful, pretty much.
Did Last.fm Just Hand Over User Listening Data To the RIAA?
"As a result, word is going around that the RIAA asked social music service Last.fm for data about its user’s listening habits to find people with unreleased tracks on their computers. And Last.fm, which is owned by CBS, actually handed the data over to the RIAA." ---- aaaahahahahahahahaaa. I mean, that's not funny at all. But still. Why would anyone share anything like this over a social network? (I'd heard of this "scrobble" thing, but I didn't know what it did.) The second that Facebook starts to collect data on what's in my computer, is the second that I delete my Facebook account. (Actually, they're probably already monitoring my web-browsing habits. Maybe I need to go on a cookie-cleaning expedition?)
"I’m Simply Wild About That Sled" (The Valve - A Literary Organ)
I think it'd be fun to compile a list of greatest last-lines (or dying-lines) from movies, in this style. This post mentions two of them -- Kane's "rosebud," and Plainview's "I'm finished." To those two, I'd add Emmett Ray's "I made a mistake" (from Sweet and Lowdown). Other suggestions?
"Batman as a Monster in a Classic Horror Film (Batman Begins)" (Acephalous)
"Someone sent me an email asking if I thought Nolan had shot Batman like a monster in a horror film in the fundraiser scene. I replied that I had not. But when I taught Batman Begins last quarter I taught it as a horror film." -- Exactly. You know, for reasons I can't explain, this all makes me think of the American version of "The Grudge," which I saw a couple of years ago. When I was watching it, I started to get the feeling that Shimizu was playing a game -- "how many sides of the screen can I make things jump out at you from?" Top, bottom, left, right, they were all there. And then he flips you, he surprises you from the *fifth* side of the screen: the middle. Sarah Michelle Gellar's in the shower, and the pans show you that nothing's around. ("See, there's nothing up my sleeves.") And then, with the camera still and focused on the back of her head -- the monster jumps out at you. From under her hair. In the middle of the screen. Still my fave movie moment of all time.
"The City Is A Prototyping Engine" (there is a lot to say, of this we are sure)
"If the largest cities are able to churn over constant prototypes it's because the abundance of density yields disproportionately large opportunities in the form of financing, know how, and other limited resources. The rural, on the other hand, typically has ample supplies of raw material and time. The rural ethos is to assemble what you have in the best way that you can and this kind of improvisation is what was missing from Paso Robles. As a place that now thinks of itself as a city, Paso Robles looked to other, larger cities for its missing expertise and equipment rather than taking the imperative of the event to test something new. This opportunity for civitas was treated as a chance to consume." -- There's a lot more to say about *this* in particular. I think the idea of "cities-that-consume-or-produce" is very similar to "cultures-that-learn." These are attributes we normally think of ascribing to people; which other characteristics can be ascribed to corporate entities?
Lefrancois et al. "Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing" (BMC Genomics)
Look at the second-to-last point before their conclusions -- they notice, and point out, the "towers" phenomenon, but they claim that barcoding removes it (and "regardless, these artifacts are readily eliminated by removing non-unique reads.") Hmm...
"Finally, A Use for Twitter" (A foot and a half)
"To all twitterers , if u c me n public come say hi, we r not the same we r from twitteronia, we connect."
"The Washington Post's 'Multi-Layer Editing Process'" (Obsidian Wings)
s/multi/2/ and s/layer/wafer/, maybe ... I mean, it's a funny story and all, but I'm really at a loss why hilzoy (and by extension, Brad DeLong) are getting so worked up about all of this.
"An implementation of the SPARQL protocol, exploring issues of robustness and scalability. Currently based on the Jena SDB/TDB libraries." It'd be nice to see more documentation, to understand exactly what's going on here.
"Found in space" (Code: Flickr Developer Blog)
"A robot intelligence has invaded Flickr. The “blind astrometry server” is a program which monitors the Astrometry group on Flickr, looking for new photos of the night sky. It then analyzes each photo, and from the unique star positions shown it figures out what part of the sky was photographed and what interesting planets, galaxies or nebulae are contained within. Not only does the photographer get a high-quality description of what’s in their photo, but the main Astrometry.net project gets a new image to add to its storehouse of knowledge." -- I do wonder how many of the vision groups (well, at least the ones that deal mainly with photographs) out there are building software that interacts with Flickr automatically...
Baxter et al. "The leaf ionome as a multivariable system to detect a plant's physiological status" (PNAS)
The "ionome?" Really?
Klimentidis, Shriver, "Estimating Genetic Ancestry Proportions from Faces" (PLoS ONE)
Coates, Gurnell, and Rustichini, "Second-to-fourth digit ratio predicts success among high-frequency financial traders" (PNAS)
"Decision by Vetocracy" (Machine Learning (Theory))
"John, I am the reviewer who have rejected your repetitive attempts to submit this worthless paper. I won’t go into details, since the review has all of them. I will, however, address your allegations w.r.t. to my evil intentions. I do recognize your paper and bid on it, but only as a service to other reviewers. If I have read your paper, why not save some time to unsuspecting reviewers? Yours, anonymous reviewer." --- Wow. What do they say about academic politics being so vicious? (Something about the stakes being so small?)
MetaPost - TeX Users Group
"This page has a list of links related to MetaPost, a powerful tool for creating graphics in scalable PostScript. It was written by John Hobby, based on Metafont by Donald Knuth." -- Because thinking about modular or interesting graphics languages is an interesting diversion.
David Wolpert, "Stacked generalization" (1992)
"This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation 's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question."
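A rough Python sketch of the recipe in that abstract, as I read it: teach each level-0 generalizer with part of the learning set, record its guess on the held-out rest, then fit a level-1 generalizer from those guesses to the correct outputs. Everything specific here is my own assumption for illustration rather than Wolpert's setup: the two toy generalizers, the leave-one-out partition, and the plain least-squares combiner.

```python
def fit_mean(train):
    """Level-0 generalizer: always guess the mean training output."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def fit_nearest(train):
    """Level-0 generalizer: guess the output of the nearest training input."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def held_out_guesses(data, fitters):
    """Teach each generalizer with all but one point, then have it guess that point."""
    rows = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        rows.append(([fit(rest)(x) for fit in fitters], y))
    return rows

def fit_level1(rows):
    """Level-1 generalizer: least-squares weights over the two level-0 guesses."""
    a = sum(g[0] * g[0] for g, _ in rows)
    b = sum(g[0] * g[1] for g, _ in rows)
    d = sum(g[1] * g[1] for g, _ in rows)
    p = sum(g[0] * y for g, y in rows)
    q = sum(g[1] * y for g, y in rows)
    det = a * d - b * b  # assumed nonzero, i.e. the two guess columns are not collinear
    return ((d * p - b * q) / det, (a * q - b * p) / det)

def stack(data, fitters):
    """Combine level-0 generalizers using weights learned in the second space."""
    weights = fit_level1(held_out_guesses(data, fitters))
    models = [fit(data) for fit in fitters]  # refit level 0 on the full learning set
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))

# Hypothetical learning set (points on y = 2x + 1).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
stacked = stack(data, [fit_mean, fit_nearest])
```

Winner-takes-all cross-validation corresponds to forcing the weights to be (1, 0) or (0, 1); the least-squares fit can do no worse than either on the held-out guesses, which is the sense in which stacking is the "more sophisticated" version.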
"How can one equivalent statement be stronger than another?" (Gowers’s Weblog)
"One final remark is that questions of this kind show just how incomplete a picture of mathematics was provided by some philosophers early in the twentieth century, who held that it was merely a giant collection of tautologies. The way that tautologies relate to each other is fascinating and important, so even if one believes that mathematics consists of tautologies, the “merely” is unacceptable. It should be said that this particular view of mathematics went out of fashion fairly soon after it came into fashion, so it doesn’t really need attacking, but it would be interesting to develop this particular line of attack."
Bartolome, et al - Widespread evidence for horizontal transfer of transposable elements across Drosophila genomes, 2009 Genome Biology
Very interesting and everything, but really "it's all just an elaborate exercise in causal inference." Ahem.
everyone's bookmarks for "everyone's bookmarks for "automatic differentiation: the most crimi..." on delicious
Shoehorning things into delicious is fun :-). Let me know if you notice Domke talking about this anywhere (and in the meantime, when I see "automatic differentiation" I'll just take that to mean "a particular implementation of a symbolic differentiator," and proceed from there...)
Ricci et al. "Magic Moments for Structured Output Prediction"
Whoa, that looks pretty interesting. To read.
Brachman and Levesque, "Knowledge Representation and Reasoning"
Something else to spend my fool money on.
"From The Desk of Schadenfreude Esq.: What’s Good for Facebook is Good for America" (Public School Intelligentsia)
"Review of a review of my review of Angrist and Pischke" (Andrew Gelman)
"The purpose of a multilevel model is not to "get the standard errors right" but rather to model structure in the data. An analogy that might help here for economists is time series analysis. If you have data with time series structure and you ignore it, you can get over-optimistic standard errors. But that's not the main reason people do time series modeling. The main reason is that the time series structure is interesting and important in its own right. We are interested in individual and contextual effects and unexplained variation at the individual and group levels, just as we are interested in autocorrelation, periodicity, long-range dependence, and so forth." --- Maybe it's the Pearl I've been reading recently, but this makes the hair on the back of my neck kinda rise up. (Not that I'm trying to disagree with him or anything.)
Everyone's bookmarks for "Automatic Differentiation: The most criminally underused tool in t..." on Delicious
No, I mean, I understand what distinction he's trying to make -- and I'm not trying to be argumentative, I swear. I just don't understand what you get by naming "differentiation of a function expressed in [say] C++" to be "automatic differentiation," as opposed to "symbolic". Presumably "C++" (or Haskell, or C, or Python, or whatever) is a symbolic language, no? What does the distinction buy you, except the opportunity for a distinct domain name? (On the other hand, I would argue that *introducing* a distinction when none really exists then becomes an opportunity to *miss* a prominent use of "symbolic" differentiation, just because it doesn't operate on a set of symbols that you're used to...)
Levesque and Lakemeyer, "The Logic of Knowledge Bases"
Just arrived in my office - w000oot!
Gomez, Kappen, Chertkov, "Approximate inference on planar graphs using Loop Calculus and Belief Propagation" (arXiv)
"The loop calculus (Chertkov et al 06) allows to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in (Chertkov et al 08) which represents an efficient truncation scheme and a new representation of the series in terms of pfaffians of a matrix for planar graphs."
"Targeted? Infrastructure Spending by Unemployment Rate" (Marginal Revolution)
"Here is a graph of planned infrastructure spending per person (by state) against the state unemployment rate. The spending doesn't look especially targeted." --- Basically, this is Alex Tabarrok complaining about R^2. File under: "Useful-Mainly-Because-It-Provides-A-Pointer-To-A-Site-With-Real-Data."
The Space Game - Real Time Strategy Game by David Scott – Candystand.com
"Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?" (Justin Domke’s Weblog)
I don't understand why he thinks that "automatic differentiation != symbolic differentiation." That seems (to me) to be pretty clearly what it is. As for the question of "why machine learning people don't use it more often," I suppose the answer has something to do with the answer to "why don't machine learning people use symbolic integration at all, either?" (A problem that has been solved, at least for a lot of the cases where it would be useful, for at least 30 years.) The answer to both questions could be the same: they really don't know about it. (OTOH, the other possibility here is that machine learning people *are* using symbolic differentiation all the time, and you just don't *recognize* it. I would argue that this actually is the case -- a lot of methods for dealing with graphical models are basically symbolic methods for calculating a derivative, most notably, belief propagation.)
my choices [Oded Goldreich]
"I would like to contribute to the project of regaining forums devoted to the presentation of ideas in a different way; specifically, by calling attention to works that have fascinated me although they were not necessarily labeled as "hot". Needless to say, my choices will be restricted to my own research areas, and even within these areas the choices will be confined to what I have heard and/or read (and understand well enough to write about...). Thus, the fact that a specific work is not mentioned does not indicate that I have a "not so high" opinion of this work; it may be the case that I do not know about this work or that I don't know it well enough to feel comfortable writing about it."
"Progress on NASA's Constellation" (Big Picture Blog)
Sometimes, it *is* rocket science.
"Social pressure and biased refereeing in Italian soccer" (Social Science Statistics Blog)
Officials are biased by the home crowd, but ... not in the way you'd immediately expect. (This is a result I definitely would not have predicted. On the other hand, there's a pretty big caveat at the end.) -- "One of the interesting things in the results is that refs showed no favoritism toward the home team in games with spectators -- they handed out about the same number of fouls and cards to the home and away teams in those games. The bias shows up in games without spectators, where they hand out more fouls and cards to the home team. (The difference is not statistically significant in games with spectators but is in games without spectators.) If we are to interpret the empty stadium games as indicative of what refs would do if not subjected to social pressure, then we should conclude from the data that refs are fundamentally biased against the home team and only referee in a balanced way when their bias is balanced by crowd pressure."
"What’s Wrong with SQL?" (Haystack Blog)
"There should be a general and declarative way to make big joiny queries like the above work efficiently, returning the data in exactly the hierarchical form we want it — strictly relational result sets are not expressive enough. I am currently working on a simple SQL-like query language that does just this: send my generalized middleware a single big, declarative (no for loops or outer joins here!) query, and you’ll get back the JSON equivalent of the relational result set with the data nested into arrays and objects any way you want it." --- Graphs-at-a-time, man. (I'm pretty sure that Eirik was in my Database class, last semester.)
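For reference, this is roughly the chore that such a query language would take off your hands, sketched in Python. The rows, column names, and the `nest` helper are hypothetical; this is the hand-written for-loop version that the post wants to make unnecessary, not the Haystack middleware itself.

```python
import json

# Hypothetical flat result set, as a relational join of authors and posts would return it.
rows = [
    {"author_id": 1, "author": "ann", "post_id": 10, "title": "first"},
    {"author_id": 1, "author": "ann", "post_id": 11, "title": "second"},
    {"author_id": 2, "author": "bob", "post_id": 12, "title": "hello"},
]

def nest(rows, key, parent_fields, child_name, child_fields):
    """Group flat rows by `key`, folding the child columns into a nested array."""
    grouped = {}
    for row in rows:
        if row[key] not in grouped:
            parent = {f: row[f] for f in parent_fields}
            parent[child_name] = []
            grouped[row[key]] = parent
        grouped[row[key]][child_name].append({f: row[f] for f in child_fields})
    return list(grouped.values())

nested = nest(rows, "author_id", ["author_id", "author"], "posts", ["post_id", "title"])
print(json.dumps(nested, indent=2))
```

The repeated parent columns in the flat rows are exactly the redundancy that a hierarchical result would avoid.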
"Little bit more on teaching The Dark Knight" (Acephalous)
When it comes to expository academic writing in a blog format, Scott sure knows what he's doing. (It'd be fun to contrast this whole camera-movement-showing-you-what's-not-there to the same kinda deal that goes on in horror movies all the time -- except, this time, Batman's the monster...)
"Temporal Scope for RDF Triples" (Jeni's Musings)
"There seem to be two acceptable ways of handling the problem, and one unacceptable way. The unacceptable way is to give a reified triple and hang metadata from that reified object. ... The reason that this is unacceptable is that reified statements aren’t incorporated into triplestores in the same way as normal statements, so you can’t query as naturally on these statements as you can on unreified statements." --- But that seems more like a problem with your triple-stores, than a problem with your language (RDF) or your representational format (reified triples vs. named graphs vs. whatever). In general, this kind of reasoning (choosing a description language based on its storage implementations and formats) seems somewhat ass-backward, and one of the reasons why the RDF/Semantic-Web community still weirds me out a little bit.
eRepublik - The New World | Online Social Strategy Game
"Lincoln’s Bicentennial" (The Unapologetic Mathematician)
"At last I said,- Lincoln, you never can make a lawyer if you do not understand what demonstrate means; and I left my situation in Springfield, went home to my father’s house, and stayed there till I could give any proposition in the six books of Euclid at sight. I then found out what demonstrate means, and went back to my law studies."
Palacios et al. "Allele-Specific Gene Expression Is Widespread Across the Genome and Biological Processes" (PLoS ONE)
"Allelic specific gene expression (ASGE) appears to be an important factor in human phenotypic variability and as a consequence, for the development of complex traits and diseases." --- HMmmmm.
Kappen, Gomez, Opper, "Optimal control as a graphical model inference problem" (arXiv)
"We reformulate a class of non-linear stochastic optimal control problems introduced by \cite{NIPS2006_691} as a KL minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls." -- To read.
Rennie, Shih, Teevan, and Karger (2003) Tackling the Poor Assumptions of Naive Bayes Text Classifiers « LingPipe Blog
Hey look at that -- LingPipe Blog's talking about one of Jason's papers...
Right, right right... but if you've been putting "your stuff" on Facebook, then you are acting recklessly already.
My notes on "Causal Network Discovery"
Robin, I'm putting links (to papers) and notes, as well as some organization, for the papers on causal network discovery that I've been reading. This is the start of an attempt to do the "network paper" review that you asked of me, last Friday. This page, in particular, will be changing over the next few days as I add more notes, read more of the papers, add additional papers as I find them, and generally try to get it to the point where it will be a sufficient review for our purposes. Let me know if you have any questions...
Arthur G. Powell, Eleanor Farrar, David K. Cohen, "The Shopping Mall High School: Winners and Losers in the Educational Marketplace"
Recommended by Harry Brighouse (http://crookedtimber.org/2009/02/16/ten-books-every-teacher-should-read-1-shopping-mall-high-school/) as the first of "ten books every teacher should read."
"Banning Open Access" (The n-Category Café)
[FYI, I think he means John Conyers here, not "Dave" Conyers.] The NIH public access system is actually a thorn in my side right now (you have to submit all articles you publish under NIH-funded grants to a "public access" web system, which most journals do automatically, but one conference we published in last year did *not* do, and so I had to do it manually, a slow process which held up a grant thingy...) ... but no matter: repealing or substantially changing the NIH public access policy would be a *horrible* idea.
"We’ve talked about groupiness in sports teams before. The idea is that team sports can’t be reduced to their individual components (or even to their individual statistics). Team performance is a group outcome. Each players’ ability to perform is affected by every other player’s performance. This is why you can take a team of good players and they do not automatically turn into a good team. Good teams figure out how to play together and take advantage of task and skill interdependence. Valuable team players like Battier know how to use their skills to complement the skills of the other players on their team." -- What would a model for team-groupiness look like? One thing to mention here is that basketball *isn't* unique in this respect; hockey has a similar problem (as does soccer, to a lesser extent). In hockey, there's the "plus/minus" system of accounting, which has (I think) been slowly picked up by some basketball stats guys.
"Picking up the phone" (Crooked Timber)
Hits several nails on their heads -- "I’ve noted a common theme in which journalists deplore bloggers’ habit of speculating about subjects instead of “just picking up the phone” and asking those directly involved... The implied (and sometimes expressed) view of bloggers is that of lazy amateurs [the distinction, then, must be that most journalists are lazy *professionals*]... It struck me though, that asking questions of total strangers is both a distinctively journalistic activity and one that implies and requires a special kind of professional license. In fact, “Journalists do interviews” comes much closer to a definition of what is distinctive about journalism than formulations like “journalists report news, bloggers do opinion”... By contrast, on the relatively rare occasions when bloggers and other non-journalists do interviews, the practice, almost invariably, is to publish the result verbatim."
"Complete Metric Spaces and the Interpretation of Probability" (Ars Mathematica)
I blew about 20 minutes yesterday, following a couple of troll-threads through the posts at Ars Mathematica. You've got a couple of confused people who insist that everyone *else* must be confused, one guy who's decided that mathematics disproves modern science (which should be replaced by his own "unique" blend of misogyny and creationism), and a couple of el-Naschie sock-puppets, all just in the last few months. It's a pretty entertaining collection of psychoceramica...
How Not To Sort By Average Rating
"PROBLEM: You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of "score" to sort by. ... CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter." --- What? In what world is this the "correct" solution?
"One shaman, two shamuses?" (Language Log)
"The usual plural of shaman is shamans. shamuses is the plural of shamus, American slang for "private detective", apparently from Yiddish shammes "sexton", due to an equation of the duties of the sexton of a synagogue with those of store security." -- What do you mean, like, an Irish monk?
Review of "The Black Swan: The Impact of the Highly Improbable."
Dennis Lindley doesn't like N.N. Taleb... (via Andrew Gelman's blog, http://www.stat.columbia.edu/~cook/movabletype/archives/2009/02/dennis-lindleys.html , which is funny because gexpr's 'razib' shows up and asks a question the answer to which is basically, "because of his other books").
Carnegie Mellon Department Of Philosophy: Peter Spirtes
Partial list of Peter Spirtes's papers...
"The No-Stats All-Star" (Michael Lewis, NYT)
What, is Michael Lewis just going down the roster of American sports? First baseball, then football, now basketball. Next, I predict articles about how Arsene Wenger has used new methods of identifying young talent for Arsenal, about boxing promoters who are promoting a new "scientific" approach to the Sweet Science, articles concerning the Mathematics of Sprinting (or Track and Field in general), the data-rich characteristics of modern hockey G.M.s, and maybe even an essay or two about an unknown coach who has some interesting ideas about the recruitment and training of ice curlers...
epigraph.pdf (application/pdf Object)
Instructions on using the 'epigraph' package.
Leominster Public Library
R and I finally moseyed down to the public library down the street from us this afternoon -- this place is actually pretty amazing. Large, recently renovated, lots of tables, power, network access. Looks like their collection isn't bad for a community library either. This is actually somewhat amazing.
"245b Real Analysis"
Via the Wordpress tags, all of Terry Tao's lecture notes on Real Analysis, that he's posted on his blog.
Alchemy - Open Source AI
Papers on Alchemy, a system for reasoning with Markov Logic Networks.
Thomas, Mi, and Lewis, "Ontology annotation: mapping genomic regions to biological function," [Curr Opin Chem Biol. 2007] - PubMed Result
"Elkan and Noto (2008): Learning Classifiers from Only Positive and Unlabeled Data" (LingPipe Blog)
For an idea about probabilistic models of path connectivity...
Kalvin et al. "Building perceptual color maps for visualizing interval data"
"This allowed us to study how the three perceptual dimensions represent magnitude information for test patterns varying in spatial frequency. This design also allowed us to test the hypothesis that the luminance channel best carries high-spatial frequency information while the saturation channel best represents low spatial-frequency information." --- It's the guys from IBM! Dennis, are you using this stuff (the info in this paper) at all, for your work? If not, do you want to try to work with me to try to put some of this into a package for visualization of some basic kinds of biological data? I need to read the paper carefully, first...
Chris Reeder, "A novel computational method for inferring dynamic genetic regulatory trajectories" (PDF)
Chris's Masters Thesis, from late last year.
"My favorite things Puerto Rican" (Marginal Revolution)
"Actress: Jennifer Lopez. Seriously. Out of Sight is quite good and the badly misunderstood The Cell makes perfect sense once you realize it is a retelling of parts of Sikh theology. Rita Moreno gets honorable mention." -- Unless "Sikh theology" involves wearing a hot-dog suit, I totally don't believe him.
"The junk heap of (blog) history" (The Universe of Discourse)
"I invite your suggestions for what to do with this stuff. Mailing list? Post brief descriptions in the blog and let people request them by mail? Post them on a wiki and let people hack on them? Stop pretending that my every passing thought is so fascinating that even my failures are worth reading?" -- I vote against the last suggestion.
"Retrofitting Suburbia" (Andrew Gelman)
Someone should buy this book for Dr. Wharton...
Cosma Shalizi's Notebook on "Causality"
Also, the papers listed here are another good place to start. (Again, I'll try to collect a reduced summary for you, this weekend probably.)
Ajtai, Gurevich, "Datalog vs first-order logic"
"Our main result is that every datalog query expressible in first-order logic is bounded; in terms of classical model theory it is a kind of compactness theorem for finite structures. In addition, we give some counter-examples delimiting the main result."
Johnson, "Life of Milton"
"His political notions were those of an acrimonious and surly republican, for which it is not known that he gave any better reason than that "a popular government was the most frugal; for the trappings of a monarchy would set up an ordinary commonwealth." It is surely very shallow policy, that supposes money to be the chief good; and even this without considering that the support and expence of a Court is for the most part only a particular kind of traffick, by which money is circulated without any national impoverishment." -- Thank you, Frank Kermode.
Zecchini & Mills, "Putting chromatin immunoprecipitation into context" (Journal of Cellular Biochemistry)
"This article will discuss the extrapolations involved in using ChIP data to draw conclusions about these themes and the discoveries that have resulted." -- Includes two whole paragraphs, and a figure, talking about the the Loh Oct4/Nanog ChIP-PET dataset, Laurie's comparable experiments in human, and the "Mathur et al." analysis (and to the extent there was an analysis, it was *my* analysis) of the overlap between the two, involving additional ChIP-chip experiments. The Mathur paper actually (originally) started as a follow-up to Laurie's work -- in fact, it was her paper before she handed it off to Divya. The "recovery curves" were my idea, and were meant to emphasize the arbitrariness of certain comparison cutoffs. But for all that, Zecchini and Mills still don't have the real story... anyway, the whole thing is worth a read, as a (welcome) warning about the dangers of interpretation in ChIP-based experiments.
The publications list for the TETRAD program from CMU -- which is quite a long list, but I'll go through and pick out a few of the relevant papers for you to read. Also, I'll work on a summary...
Hu & Qin, "Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection" (PLoS ONE)
Nice title.
Bryant et al. "Detection of Gene Expression in an Individual Cell Type within a Cell Mixture Using Microarray Analysis" (PLoS ONE)
Cell populations, mixtures, and microarrays. "This study quantified the proportion of lipopolysaccharide (LPS) induced differentially expressed monocyte genes that could be measured in peripheral blood mononuclear cells (PBMC), and determined the extent to which gene expression in the non-monocyte cell fraction diluted or obscured fold changes that could be detected in the cell mixture."
Jiang and Pugh, "Nucleosome positioning and gene regulation: advances through genomics" (Nature Reviews Genetics)
"In Saccharomyces cerevisiae, the TSS resides at the nucleosome border, suggesting that the transcription machinery must contend with the +1 nucleosome before initiation. In metazoans, the TSS resides in the NFR, suggesting that RNA polymerase II contends with the first nucleosome after initiation."
genepath.org
"GenePath uses abductive inference to elucidate network constraints and logic to derive consistent networks."
Yeo et al. "Sequential Logic Model Deciphers Dynamic Transcriptional Control of Gene Expressions" (PLoS ONE)
"Some obvious comments about school improvement and the achievement gap." (Crooked Timber)
Harry Brighouse writes about schools, evaluation, and teaching -- "The question isn’t just whether this is going on in your school. The question is, if it is not, how will teachers react to the sentence I have bolded? In plenty of comfortable suburban schools in the US it would be anathema. Some of them, are also going to fear that any move in this direction will, ultimately, be used to evaluate them (and they’re going to be right!). But one way or the other, both improving the school and addressing the achievement gap requires that managers give close scrutiny to the effectiveness of teachers, and requires teachers and administrators to understand that they have to see their own learning about teaching the way they see their students’ learning and even their own learning about every other complex and difficult activity."
"Heroic Milton: Happy Birthday" (Frank Kermode, NYRB)
Kermode's somewhat-late NYRB piece on Milton (presumably occasioned by several new biographies). Recommended by a podcast, and (yes) it's actually really good...
matplotlib: python plotting
"matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (ala matlab or mathematica), web application servers, and six graphical user interface toolkits."
Basemap -- Introduction
"The matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. It is similar in functionality to the matlab mapping toolbox, the IDL mapping facilities, GrADS, or the Generic Mapping Tools. PyNGL and CDAT are other libraries that provide similar capabilities in Python."
Lehigh University Benchmark (LUBM)
Software for generating "large" network datasets, for benchmarking purposes.
Simkus et al. "Efficient Query Answering over Conceptual Schemas of Relational Databases : Technical Report" (arXiv)
"We develop a query answering system, where at the core of the work there is an idea of query answering by rewriting. For this purpose we extend the DL DL-Lite with the ability to support n-ary relations, obtaining the DL DLR-Lite, which is still polynomial in the size of the data. ... The formalization of the conceptual schema and the developed reasoning technique allow checking for consistency between the database and the conceptual schema, thus improving the trustiness of the information system." --- Trustiness?
Trailer for "Inglourious Basterds" (kottke)
Tarantino, Pitt, and a ridiculous "southern" accent? Oh, I'll watch it....
"Defining Science" (Biocurious)
Seth Roberts would have a hernia.
"Musique Vol 1" (Daft Punk)
Daft Punk's "greatest hits" album, only 1.99 to download the mp3s of the entire album, today only. Definitely worth it.
"Frank Gehry’s Software Keeps Buildings on Budget" (NYTimes)
"Now, however, the economy is crumbling, the building is envisioned as rental apartments and Mr. Gehry is bringing a more potent tool to control costs than most architects can deliver. For the Forest City Ratner Companies, the developer of Beekman Tower, the project will test the idea that an architect can provide powerful (and expensive) modeling software to help keep costs down." --- Laaaaaffff.
"An Open Letter to Jona Bechtolt..." (analogindustries.com)
The subtle slip (separated by a couple paragraphs) from "illegal" to immoral ("wrong"), the weird "White Hen" analogy, the complete inability to see the contingent nature of his own particular view on intellectual property and "theft" -- it's totally, completely weird. I mean, I'm sure the guy writes really good software that a lot of people use to great effect. And *I* have no problem with him charging money for it. But, seriously?
"Optimizing SPARQL-DL" (Clark & Parsia)
"These problems led me to experiment with reordering queries dynamically during their execution. Even the first implemented method—greedy-like search—seems to be promising (contrary to the use of greedy search in most other applications). The algorithm starts with the list L of all query atoms and in each step the atom with best score in L is evaluated and removed from L. The scoring function I’m using is simple: for a given atom it sums up an estimate of the number of consistency checks that are required to evaluate this atom and the number of new variable bindings to be explored in next steps. This heuristic is linear in the number of atoms and thus cheap to compute."
"Towards SPARQL-DL Evaluation in Pellet" (Clark and Parsia)
"So, what is SPARQL-DL? It is a query language recently proposed by Evren and Bijan.
Actually, it has less to do with SPARQL than its name may suggest. It’s really an expressive language for querying OWL-DL ontologies, which can in turn be used to extend the semantics of SPARQL’s basic graph patterns." (So, yes, plenty of other people appreciate the similarity.)
Grohe and Schwandtner, "The Complexity of Datalog on Linear Orders" (arXiv)
"We study the program complexity of datalog on both finite and infinite linear orders. ...As an application, we show that the datalog nonemptiness problem on Allen's interval algebra is EXPTIME-complete." --- Funny how papers like this always seem to come along at the right time.
"Toric Varieties and Polytopes" (Secret Blogging Seminar)
Relevant tutorial for the algebraic-geometry-of-statistics stuff.
Eric Hanushek's Articles
Hanushek is an education researcher, one of the guys who pioneered the "value-added" methods of teacher evaluation. (I think.)
"Agricultural Sabotage" (BLDGBLOG)
"Briefly, I'm reminded of a design project from nearly half a decade ago called "Johnny Apple Sandal," where the soles of a pair of sandals had different varieties of wildflower seeds embedded in their plastic; as your soles wore down, the seeds were released – theoretically going on to form new landscapes. A kind of pedestrian agronomy." -- They should have sold these here in Leominster, in the summer.
Horrocks and Tessaris, "Querying the semantic web: a formal approach" (CiteSeer)
Boring the first time I read through it -- semi-revelatory the second time, though. The pair of figures at the bottom of page 7, in particular. This is ... something I need to think through more carefully. (Maybe the relationship between "query languages" as such, and description logics, is obvious to everyone else?)
"The CNBC Cacophony, Taleb-Roubini Edition" (Felix Salmon) "Nassim Taleb and Nouriel Roubini appear on CNBC to talk about big-picture economic crisis, while the anchors in the studio are interested in things like why "the financials are rallying today". After Taleb talks about the need for massive global deleveraging and deep-rooted systemic change, he's immediately faced with an absolute classic of a question: 'How is this actionable? How is this actionable? Nassim, Nassim, how do I sublimate that into action today?'" --- Do people really use the word "sublimate" like that? pv -- "Pipe Viewer" (Unix Utilities) "pv - Pipe Viewer - is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion." -- cute. Amazon.com: Kindle 2: Amazon's New Wireless Reading Device (Latest Generation): Kindle Store "You can email your PDFs wirelessly to your Kindle. Due to PDF's fixed layout format, some complex PDF files may not format correctly on your Kindle." --- If the Kindle 2 really does have better support for PDFs, I'd almost be ready to buy it. I want something I can put *all* my papers on. "Oops! Make That Four Forms of Pure Boron" (TierneyLab Blog - NYT) "So stick to wine reviews and profiles of Liev Schreiber, 'cause when it comes to science ..." (What kind of half-assed paper are you running over there?) "Classification of simple groups and computers" (Secret Blogging Seminar) "(Some people seem to consider computer proofs as bad because they are unreliable, these people are either lying or crazy. 
There are lots of issues with computer proofs, but reliability relative to a long computation in a journal article is not one of them.)"
"Being Acquitted Versus Being Searched (YANAL)" (Paul Ohm, at Freedom to Tinker)
"With this post, I'm launching a new, (very) occasional series I'm calling YANAL, for "You Are Not A Lawyer." In this series, I will try to disabuse computer scientists and other technically minded people of some commonly held misconceptions about the law (and the legal system)." --- OMG, what an opportunity missed! A "new series" of legal posts from a lawyer named "Paul Ohm," and he names it "YANAL"??? No, no, a thousand times no! This series *very obviously* should have been named, "Ohm's Law."
"Install JAGS and rjags in Fedora" (Yu-Sung Su's Blog)
"JAGS was developed to help those non-Window users to be able to use BUGS. However, I found it is very hard to install JAGS and its R package-rjags. Here is my note on how to install JAGS and rjags in Fedora 10. The note here should work for all Linux systems." --- R & BUGS on Linux? Sweeeeet.
"It Might Be Cool" (xkcd)
"... faithfully." Best inauguration joke yet. XKCD's been on a roll, recently.
prog21: Puzzle Languages
"Compare this to, say, Python. I can usually bang out a solution to just about anything in Python." -- Clearly this guy hasn't seen enough complicated problems.
lwjgl.org - Home of the Lightweight Java Game Library
ScenarioML : Thomas Alspaugh : UCI
"ScenarioML" appears to be a structured language for expressing "scenarios" about programs, or during design? (See also: Alspaugh's publications: http://www.cs.georgetown.edu/~alspaugh/pubs.html ). Not sure this is relevant to what I'm looking at right now, but I came across it while browsing some technical reports from Alspaugh about Allen's interval algebra, and wondered if it might be of interest to other people...
Dechter, Meiri, and Pearl, "Temporal constraint networks" (1991)
Interval constraints, expressed as numerical quantities and directed graphs between intervals. I think this is closer to my own original conception of the problem (17 years out of date, of course) -- good to know I was on the right track.
"Formality and interpretation" (Language Log)
"The one positive conclusion from Fish's work that I believe I've grasped, so far, is the crucial role of what he calls "interpretive communities" in providing enough of a shared context — even if ephemeral and unfounded — for some minimal communication to take place. So it's ironic that he so completely fails to understand Kempson's work in the context of her native interpretive community." --- That sound you hear is Stanley Fish getting smacked down *with logic*. (That is to say, not only is it a logical smackdown, but its actual performance includes a prominent use of the phrase, "model theory.") Also, the Kempson article is really good.
Time Ontology in OWL
OWL-Time, a 2006 W3C working draft for representing the Allen interval relations in RDF.
Allen, "Maintaining knowledge about temporal intervals" (CiteULike)
A 1983 paper on reasoning about intervals -- in his case, about "time" intervals. It might be overkill, but something like this seems like it'd be a reasonable component in a system for reasoning about genomic intervals.
"Prop 8 Donor Web Site Shows Disclosure Law Is 2-Edged Sword" (NYTimes.com)
"For the backers of Proposition 8, the state ballot measure to stop single-sex couples from marrying in California, victory has been soured by the ugly specter of intimidation. Some donors to groups supporting the measure have received death threats and envelopes containing a powdery white substance, and their businesses have been boycotted." -- Intimidation and death threats are awful (and probably illegal) but seriously, NYT, one of these things is not like the others.
"How to travel in the U.S."
(Marginal Revolution)
"Boston would please a European but not in a truly instructive way." But Boston != Beacon Hill + North End. I find it hard to believe that Medford or Dorchester would please a European in *any* way, although I do believe that they might be quite "instructive," in a certain sense.
Hartig and Heese, "The SPARQL Query Graph Model for Query Optimization" - semanticweb.org
Ruckhaus, Ruiz, Vidal, "Query evaluation and optimization in the semantic web" (Theory and Practice of Logic Programming)
The research article to go along with their 2007 slides.
Ruckhaus, Vidal, and Ruiz "SparQL Cost-based Query Optimization"
Slides from a 2007 talk in Madrid. Some notes on doing query optimization in RDF networks for which some of the edges are inferred. From page 10: "Sampling techniques are good for estimating statistical characteristics where data is not known a priori." Good. (Although, probably not all "sampling techniques" are created equal, in this case, hmm?)
Hu, Killion, Iyer, "Genetic reconstruction of a functional transcriptional regulatory network." (Nature Genetics, 2007)
The Hu/Iyer TF knockout paper.
GSE4654: the Iyer "genetic reconstruction" dataset on GEO
David Danks -- Selected publications
CMU guy -- psychology, perception, causal reasoning, and one author on that paper (Wimberly, etc) about aggregated measurements and causal networks for gene regulation.
uniprot RDF
Semantic Mashup of Biomedical Data, Journal of Biomedical Informatics.
"Journal of Biomedical Informatics, Volume 41, Issue 5, Pages 683-860 (October 2008)" (None of the PDFs appear to be online through the journal webpage. Most of them appear to be available elsewhere on the web, though, and about five of them look interesting. Also, I hate the word "mashup.")
Robert Koch
*Astrology*, jolene??
Electronic Reading Room | Office of Inspector General | US EPA
The EPA's OIG has a "reading room" with links to all their reports. (But no data!) Well laid out, and pretty reasonable.
Now all I need is a little OCR tool that can recognize digits, and that I can use to pull out the numbers from the graphs in PDFs like these.
"Why Obama’s Energy Savings Estimate May Be Skewed" (NYTimes.com)
"In December, the inspector general issued a report that said Energy Star’s savings claims were “not accurate or verifiable.” The report found that shipment data for Energy Star products were not being adequately reviewed and in some cases were based on estimates instead of actual shipping totals. In the other report, in August 2007, the inspector general addressed the integrity of the Energy Star label, noting that “E.P.A. does not have reasonable assurance” that the process allowing manufacturers to self-certify their products is effective." --- I'm not sure that the word 'skewed' is *entirely* appropriate here. It doesn't seem (at a cursory glance) to show up in the reports themselves. And "uncertainty" is a different thing than skewness. But more to the point, nowhere in the NYT article is there an actual link to the report. Here's the most recent one: http://www.epa.gov/oig/reports/2009/20081217-09-P-0061.pdf
Bacon, "Novum Organum" Book II (Wood) - Wikisource
"But natural and experimental history is so varied and diffuse, that it confounds and distracts the understanding unless it be fixed and exhibited in due order. We must, therefore, form tables and co-ordinations of instances, upon such a plan, and in such order that the understanding may be enabled to act upon them. Even when this is done, the understanding, left to itself and to its own operation, is incompetent and unfit to construct its axioms without direction and support. Our third ministration, therefore, must be true and legitimate induction, the very key of interpretation."
"UnitedHealth and I.B.M.
Test Health Care Plan" (NYTimes.com)
"UnitedHealth will try giving doctors more authority and money than usual in return for closely monitoring their patients’ progress, even when patients go to specialists or require hospitalization. The insurer will also move away from paying doctors solely on the basis of how many services they provide, and will start rewarding them more for the overall quality of care patients receive."
"Mapping Religiosity in the States" (The Monkey Cage)
John Sides discovers the "numbers don't map easily onto a color gradient" problem -- "But now a different problem emerges, one that affects both maps. The color gradations distort the underlying data. It looks as if something is qualitatively different when two states are different colors, even when the quantitative differences are small. In the Gallup map, Arizona and New Mexico are different colors, but only 5 points separate them. Mississippi and Alabama are the same color, but 5 points also separate them. In my map, Mississippi and Alabama look really religious, but in fact only 2-3 points separate them from South Carolina or Tennessee. ... I’m open to further thoughts on this. These seem to me potential problems in mapping lots of different quantitative data."
Gianoulis et al. "Quantifying environmental adaptation of metabolic pathways in metagenomics" (PNAS)
"Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint." --- Possibly, perhaps, maybe, modestly interesting.
Kurihara et al.
"Genome-wide suppression of aberrant mRNA-like noncoding RNAs by NMD in Arabidopsis" (PNAS) "Here, we noticed that, in Arabidopsis, most of the mRNA-like nonprotein-coding RNAs (ncRNAs) have the features of an NMD substrate. We examined the expression profiles of 2 Arabidopsis mutants, upf1-1 and upf3-1, using a whole-genome tiling array. The results showed that expression of not only protein-coding transcripts but also many mRNA-like ncRNAs (mlncRNAs), including natural antisense transcript RNAs (nat-RNAs) transcribed from the opposite strands of the coding strands, were up-regulated in both mutants. The percentage of the up-regulated mlncRNAs to all expressed mlncRNAs was much higher than that of the up-regulated protein-coding transcripts to all expressed protein-coding transcripts. This finding demonstrates that one of the most important roles of NMD is the genome-wide suppression of the aberrant mlncRNAs including nat-RNAs." Omics! Omics!: Ah, them gold rush days! "In some sense the genomics companies were just too early for their own good (though the late entrants such as DeCode haven't fared much better). There are no genomics companies -- yet genomics is everywhere. Basic biology fueled by the genome or the technologies pushed by genomics permeate the drug industry (based on the 2 large pharmas I interviewed at in the year MLNM laid me off & what I can read; constructive dissent on this point is welcomed). Probably no novel small molecule drug development history will be directly pinned back to a 1990's genomics effort -- but also virtually no drugs going forward will have their development unaffected by the knowledge of the genome. Everything is tangled up & confused & merged." --- That's funny, some of the people from Millennium sit in Stata these days (in the Science Commons consortium). They have a similar perspective, I think, but a different takeaway message (also: more optimism). 
Publications, Mark Johnson, Cognitive and Linguistic Sciences, Brown University
Saw Mark Johnson give a talk about "Adaptor Grammars" (man, that 'o' really bothers me) two days ago. It turned out to be ... an extremely boring talk, although the idea itself seems modestly interesting and it included several reasonable animations of hierarchical Chinese Restaurant processes that were modestly illuminating. At any rate, I sat in the back, doodled on my notebook, and started to idly wonder if issues of "frequentist consistency" for this sort of learning process had been examined (or were even worth examining) at all...
36-707: Regression Analysis, Fall 2007
Larry Wasserman's class notes from a course on regression analysis. Via cshalizi.
"Doubling Speed and Halving Memory in Binomial Naive Bayes" (LingPipe Blog)
"In practice, there’s no need to allocate a vector of token counts. Instead, as each token is streamed out of the tokenizer, its coefficient is looked up and added to the running total, which is initialized with the category log probability difference (aka the intercept). This recasting of naive Bayes as a linear classifier is not novel. The question becomes how best to choose the coefficient vector. Naive Bayes tends to be outperformed in terms of accuracy by discriminative methods like logistic regression or linear support vector machines, but these methods are orders of magnitude more expensive to train." (Includes a link to the Klein and Manning paper at the end.)
Lorenz, Cantor, and Collins, "A network biology approach to aging in yeast" (PNAS)
"In this study, a reverse-engineering strategy was used to infer and analyze the structure and function of an aging and glucose repressed gene regulatory network in the budding yeast Saccharomyces cerevisiae. The method uses transcriptional perturbations to model the functional interactions between genes as a system of first-order ordinary differential equations.
The resulting network model correctly identified the known interactions of key regulators in a 10-gene network from the Snf1 signaling pathway, which is required for expression of glucose-repressed genes upon calorie restriction." "computational social science" (orgtheory.net) "A slew of notable social scientists including David Lazer, Nicholas Christakis, Gary King, Michael Macy, and my colleague Noshir Contractor make the case that more funding, attention, and serious energy should be put into the study of social life on computer networks (e.g., the Internet, mobile phones)." -- I was listening to someone (Alan Blinder or Tyler Cowen) on This American Life the other day, explaining how economics "as a science" was hampered by the (in)ability to perform controlled experiments. Cowen was suggesting that the upcoming stimulus was going to be as close to a controlled experiment on "Keynesian spending" as we were going to get. But I was sitting in my seat on the train, shouting (to myself) "you need to be doing this research in MMORPGs!" I know that some people are already thinking along these lines... but it's the *obvious* opportunity. Jump on it, economists! digitalresearchtools An index of tools, divided by category and use. A lot of for-pay software is listed (bleagh), and some of the entries that I spot-checked are missing some obvious software. But still, there are some gems hidden throughout, so it might be worth a look. Campaign Donors : Fundrace 2008 - Huffington Post Location-mapped FEC donation data from the 2008 race. "See the data underlying our tax database" (guardian.co.uk) UK Tax data, collated by The Guardian, available (it appears) as XML. Anatomy of a Program in Memory : Gustavo Duarte For my own uses, I should really go through and create a total index of *all* of Duarte's posts/tutorials on the low-level structure of software and computers. It's like a mini-review of my architecture course, with a much greater focus on Intel-specific details. 
David Draper, "Bayesian Decision Theory in Biostatistics: the Utility of Utility." (PDF) Slides from a talk, via John Cook's blog (http://www.johndcook.com/blog/2009/01/28/cost-benefit-analysis-versus-benefit-only-analysis/). Cook's comment: "So if two variables make nearly equal contributions to a model, for example, the procedure would give preference to the variable that is cheaper to collect. In short, Draper recommended a cost-benefit analysis rather than the typical (statistical) benefit-only analysis. Very reasonable." --- but all this seems so very far removed from the kinds of biostatistical data gathering I'm mostly familiar with. How does the calculus change (at all?) when the experiments are *large* (a sequencing run? a microarray?) and when the results themselves depend on some complicated downstream analysis? I'm not sure how to even phrase these questions. The Logic of Knowledge Bases - The MIT Press Wantwantwant. (purchasedpurchasedpurchased.) Estacio-Moreno, Toussaint, and Bousquet, "Mining for adverse drug events with formal concept analysis" (arXiv) I haven't read it, but it sounded similar to some of what the guy from Genstruct was talking about. "And the race continues…" (Next Generation Sequencing) "Illumina expects to increase their Genome Analyzer sequencing throughput per run by about 6 times in 2009 - following a 15 fold increase in 2008 this gives them a whopping 90 fold increase in just two years." Kelly et al. "Cell Lineage and Regional Identity of Cultured Spinal Cord Neural Stem Cells and Comparison to Brain-Derived Neural Stem Cells" (PLoS ONE) Linux Assembly and Disassembly: an Introduction "Ten Years After: The Genomics Frenzy." (In the Pipeline) "So where are we now? So, in the end, there was no genomics gold rush, at least not in the way that everyone thought. The genomics players are out of business, or if not, they had to completely retool and find something else to do. 
Most of their patent applications were wastes of time and money, since they never issued, were generally hard/impossible to defend if they did, and are mostly heading for expiration without having made anyone a dime. The value of the genome is real, but it’s taken (and it’s still taking) a lot longer to realize it than anyone would have believed in 1999." The Article Search API A JSON interface to the NYTimes article-search system. "New USACM Policy Recommendations on Open Government" (Freedom to Tinker) Hear hear. (I don't know about the recommendation about having an API, but all the others seem eminently reasonable.) "Fan attacks ref who also is a state trooper" (nwi.com) "That's not fair." "Where's Markopolos's Blog?" (Felix Salmon) "Ray Pellecchia is right: if Harry Markopolos had taken all of his evidence about Bernie Madoff and put it on a blog, instead of submitting it to the SEC, there's a good chance that would have been the end of Madoff right there. All the time Markopolos was talking to the WSJ, trying to get them to run a story about Madoff, would have been much better spent setting up an anonymous Wordpress blog and just putting the information and analysis out there himself." --- Is Salmon really asking, "why didn't this random investor start an 'anonymous' blog that claimed a particular massively-wealthy person was committing fraud?" Isn't the obvious answer that, 'No blog is truly anonymous, and that even if the claims were true (which in retrospect they obviously were) he would have spent the majority of his life between now and then getting sued out the wiz-oose?' "The Role of Schema Matching in Large Enterprises" (David Karger at Haystack Blog) "They discovered belatedly that spreadsheets were actually a good way to show the pairwise matches of schema terms. But this doesn’t work beyond 2 schemas. Multi-way matching is hard and vital." "Writing Linux Programs in Raw Binary" "And that's all there is to it! 
Of course, a sequence of bits like that is unmaintainable, but now you know: how it works." VoiD - semanticweb.org "voiD (from "Vocabulary of Interlinked Datasets") is an RDF based schema to describe linked datasets. With voiD the discovery and usage of linked datasets can be performed both effectively and efficiently. A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint." "The Thing King" (Gustavo Duarte) "Things can only be zarked when they are in the workshop. Only the Thing King knows whether a thing is in the workshop or in a warehouse. The longer a thing goes without being zarked, the grubbier it is said to become. The way you get things is to ask the Thing King. He only gives out things by the crateful. This is to keep the royal overhead down." --- The Thing King's workshop management strategy probably shouldn't be hardcoded like that. (http://portal.acm.org/citation.cfm?id=1286772) "Worry less about releasing terrorists" (Marginal Revolution) "Fewer terrorists are better than more terrorists, to be sure. But a terrorist we release is not obviously worse than a terrorist who was free in the first place. We evaluate outcomes differently when we feel we are in control or should be in control. We should examine this intuition carefully, since it is not always justified." --- I'm tempted just to quote the entire post in this space. "Different meanings of Bayesian statistics" (Andrew Gelman) "Anyway, I posted the above discussion (basically, all except for the previous two paragraphs, to their blog and got the strangest comments. Not that people were saying anything wrong, just they were coming from a traditional theoretical computer science perspective. For them, Bayesian statistics is all about code lengths; for me it's all about hierarchical models. Which I guess is consistent with my original point. 
Still, it's frustrating for me (but perhaps frustrating to some of these people from the other side, that statisticians see Bayes as about models rather than philosophy and code lengths). I thought that communicating with econometricians and non-Bayesian statisticians was tough, but this is a whole new level of difficulty!" Kryders Law - Hard Disk Growth "This means Kryder's Law isn't any stronger than Moore's Law. In fact, Kryder's is exactly the same rate as Moore's Law. Its a wonder there is a separate name for the hard disk observation at all. A little checking shows the concept called Kryder's Law probably wouldn't even exist if it weren't for Wikipedia. The only time "Kryder's Law" is mentioned anywhere on the googleable Internet apart from works derived from the Wikipedia article is in the headline of the Scientific American article, which was senationalised, like most headlines, to attract interest. It seems an editor has read the article and used it as inspiration to create the wikipedia entry. A second editor has then misquoted the article when including the 10.5 year figure. Wikipedia's strong reputation did the rest." Cost of CPU Performance Through Time 1944-2003 More data on the growth of computing power. "Hard drive capacity over time" (Wikipedia) An image file, from wikipedia, with the relevant graph. "Boyfriend" (xkcd) That's.... awful. Just awful. "The (Ironic) Best Way to Make the Bailout Transparent" (Freedom to Tinker) "Why does this matter? Because without the underlying data, anyone who wants to provide a useful new tool for analysis must first try to reconstruct the underlying numbers from the "user-friendly visual presentations" or "printable reports" that the government publishes. Imagine trying to convert a nice-looking graph back into a list of figures, or trying to turn a printed transcript of a congressional debate into a searchable database of who said what and when. It's not easy." 
--- Not only a problem with government-generated and -released data, by the way. "What is Japan doing at 2:04pm?" (Social Science Statistics Blog) "October 18th is Statistics Day in Japan. There are posters. And a slogan: "Statistical Surveys Owe You and You Owe Statistical Data"!" "Follow me like a leopard." (The Edge of the American West) "... Bertie Wooster’s Uncle Percy (with a brief assist from Bertie), in P. G. Wodehouse, Joy in the Morning. Which, I arbitrarily assert, is the best of the Jeeves/Wooster novels." "Tom Coughlin Retires From Family To Spend More Time With Team" (The Onion) Pitch-perfect ESPN send-up... "How to read flow of funds accounts" (Investment Tutorial) "Obscure tables may be acceptable to economists, who thrive on murky concepts and mysterious mathematics, but for pragmatic investors and portfolio managers, time is of the essence." --- Hilarious. "Porteous et al. (2008) Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation" (LingPipe Blog) "Porteous et al. present a method for speeding up the sampling step of the inner loop of collapsed LDA, but their method is actually much more general. The approach is pretty straightforward, and reminds me of arithmetic coding. It relies on being able to visit outcomes in roughly decreasing order or probability, compute their unnormalized probability efficiently, and computing an upper bound on the normalization constant given only the unnormalized probabilities of the items seen so far (plus other knowledge if it’s easy to compute)." Green et al. "Provenance Semirings" (PDF) "We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. 
We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics." -- Via a link on Mark Dominus's blog (http://blog.plover.com/math/catalan-squared.html#ENIAC) Julius Plenz - Tunnel everything through SSH "In this Tutorial I'll cover how you can tunnel any TCP traffic through an encrypted SSH connection or a SOCKS server, even if a certain program doesn't support proxying of connections natively. The only requirement for SSH tunneling to work is a shell account on a machine connected to the internet (and, optionally, a HTTP Proxy server). I will refer to this account as your server (it doesn't matter if you may not become root)." --- Via the Hack-a-Day blog. This feels like it'll be useful at some point soon. A RDFa-based Templating Language proposal "RDFa is a W3C Recommendation for embedding RDF in XHTML. Since RDF represents structured data, we can utilise it to represent both invariant data and variables. The invariant data can be used to control a backend web application. By connecting invariant data and variables, data we wish to use to populate a document can be retrieved. This is the motivation for a new templating language, RDFa Templates." -- There's nothing that can't be fixed by adding an additional level of indirection. ("Virtual classes... layers of abstraction...") Clearly a proposal after my own heart. "Cohomology for dynamical systems" (What’s new) "When n=0, and if the action of G is transitive (in the discrete category), minimal (in the topological category), or ergodic (in the measure-theoretic category), the only 0-cocycles are the constants, and the only 0-coboundary is the zero function, so H^0(G,X,U) \equiv U. 
When n=1, it is not hard to see that the notion of 1-cocycle and 1-coboundary correspond to the notion of cocycle and coboundary discussed at the beginning of this post." "Reading graphs: How we do it, and what it tells us about making better ones" (Cognitive Daily) Extended comments, from a psychologist, on this paper: Raj M. Ratwani, J. Gregory Trafton, Deborah A. Boehm-Davis (2008). Thinking graphically: Connecting vision and cognition during graph comprehension. Journal of Experimental Psychology: Applied, 14 (1), 36-49. "McCloskey et al. on significance testing in economics" (Andrew Gelman) "Many years ago, Gary told me that his generic question to ask during seminars was, "What are your standard errors." Apparently in poli sci, that used to stop most speakers in their tracks. We've now become much more sophisticated--in a good way, I think. (By the way, it's good to have a few of these generic questions stored up, in case you fall asleep or weren't paying attention during the talk. My generic questions include: "Could you simulate some data from your fitted model and see if they look like your observed data?" and "How many data points would you have to remove for your effect estimate to go away?)" --- I live for these little asides, tucked away in Gelman's posts... Huffduffer Turning found audio files into podcasts. (aka, something like 'listenr'). Sharon Goldwater's Bayesian language modeling reading list A reading list that goes through (it appears) 2007, hitting most of the high points -- broad, but not overly deep. Bradford Plumer writing about David Chalmers talking about "the extended mind," cognition and objects. "One more thing about the extended mind. It ought to include other people, too. Shouldn't it? I know I have a bunch of stories and anecdotes I can't tell—or at least tell well—without some of the other participants around to help me fill in the details and color. In effect, I have to recreate the memory with someone else. 
I read somewhere once a devastating article about how older widows essentially lose a large chunk of their memory after their partner dies, because there are many events that could only recollect "together," with their spouse. No, wait, maybe this was in a novel? Or a This American Life podcast? Oy, my brain's a barren wasteland—off to wheedle the answer out of the Internet." Download PyRoom — distraction free writing Python version of the (fantastic) OS X software, "Write Room." This is really, really great. "How’s that workin’ for ya?" (Is there no sin in it?) "You can see where this is going." (The teacher-as-Gordon-Ramsay, not a take on his show that I had thought of before. Pretty great.) Ord, Hillerbrand, and Sandberg, "Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes" (arXiv) "Using the risk estimates from the Large Hadron Collider as a test case, we show how serious the problem can be when it comes to catastrophic risks and how best to address it." --- Via Felix Salmon, which is enough of a recommendation to warrant a second look, in my book. "Confidence tricks" (Kottke) "I can't find any evidence that the FL gift certificate incident ever happened or documentation of a trick called "The Ogged" anywhere aside from Wikipedia. Anyone?" --- Kottke *is* Standpipe, and his joke-explaining blog has been hiding in plain sight this entire time. "What is automatic differentiation, and why does it work?" (Conal Elliott) "Bertrand Russell remarked that: 'Everything is vague to a degree you do not realize till you have tried to make it precise.' I’m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it’s “just the chain rule”." --- Conal Elliott's series on automatic differentiation has been really good. 
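For my own reference, a minimal sketch of what "just the chain rule" looks like in forward mode, using dual numbers. This is not Elliott's formulation (his posts are in Haskell and are far more general); the names `Dual`, `derivative`, and `example` are made up here for illustration, and only `+`, `*`, and `sin` are covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (f * g)' = f' * g + f * g'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. derivative zero."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    x = _lift(x)
    # Chain rule: (sin f)' = cos(f) * f'
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(float(x), 1.0)).der


def example(x):
    # Hypothetical test function: x^3 + 2 sin(x)
    return x * x * x + 2 * sin(x)


if __name__ == "__main__":
    # Should agree with the hand-derived 3x^2 + 2 cos(x) at x = 1.5
    print(derivative(example, 1.5), 3 * 1.5 ** 2 + 2 * math.cos(1.5))
```

Every primitive carries its own local derivative rule, and composing the primitives composes the rules; that is all the "mechanics" amount to in this toy version. Note that `derivative` assumes `f` actually uses its argument, since a function returning a bare constant would not come back as a `Dual`.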
"All you ever wanted to know about writing bloom filters" (Jonathan Ellis's Programming Blog - Spyced) "Judging from the bloom filter implementations out there, generating appropriate hashes is surprisingly hard to get right. One implementation in java uses object.hashCode() to seed the stdlib's PRNG and calls Random.nextInt() to generate the bloom hashes. This works okay for small filters but the false positive rate is up to 140% of the expected rate for large filters. This one in python combines the stdlib hash and a pjw hash with simple arithmetic to achieve "only" an extra 10-15% false positives." --- That accords exactly with my experience. "HDR and Color Constancy--New Psychophysical Results" (YouTube) Looks like exactly what I wanted to know. To watch in full, when I get home tonight. "Exascale computing" (The Endeavour) "...Peter Kogge explains what it would take to increase our computing capacity by a factor of 1000. We can’t just do more of what we’re doing now and scale linearly. For example, one limitation is power consumption. Data centers now consume 1.5% of the electricity on the U.S. power grid. Nevertheless, Kogge outlines ways we could achieve a 1000-fold increase in computing capacity by 2015." "Eight Years Gone" (Carrie Brownstein, at her NPR blog) Andrew Bird's "Scythian Empire." "A simple trigonometric identity" (The Universe of Discourse) "I will let you all know if I come up with anything about the almost-equilateral lattice triangles. Clearly, you can approximate the equilateral triangle as closely as you like by making the lattice coordinates sufficiently large, just as you can approximate √3 as closely as you like with rationals by making the numerator and denominator sufficiently large. Proof: Your computer draws equilateral-seeming triangles on the screen all the time." "Good teaching" (Is there no sin in it?) "Caesaigh asks if we might have a thread on good teaching. What is it? 
Is it like pornography, and we know it when we see it, but it’s too elusive to be described? Can we give instances? I remember there was that Malcolm Gladwell article in the NYer a while back about how difficult it is to predict whether someone will be a good teacher. It’s not predictable by intelligence, and it’s also not predictable by charisma or any of that. Some of the smartest, most charming people of my acquaintance struggle in the classroom." --- Even AWB is in on the game. "Judging Presidents" (Greg Mankiw's Blog) "Consider a related question: How would you judge the competence of a doctor if you could observe him treating only a single patient? What you would not do is judge him by the outcome. Even the best physicians have patients die. And even witchdoctors can have patients recover. Randomness is a fact of life (and death). In the case of a medical doctor, the answer seems clear: Instead of looking at the outcome, you would judge him by the decisions he makes and treatments he prescribes. That is, you would examine whether he followed best practices for the circumstances he faced." --- It seems like the question of professional evaluation (whether it's quarterbacks, teachers, doctors, or the president) is on everyone's mind these days... "Doing science online" (Michael Nielsen) Fine, and interesting. But two (important) things: (a) not all science looks like Terry Tao collaboratively exploring deep mysteries of the primes, and (b) by my understanding, his account of Galileo is *completely wrong*. Shrader-Frechette on Sunstein on Risk I knew I'd seen this review somewhere before -- did I really not save it at the time (I guess not). I should probably go ahead and buy the book. "Although Sunstein correctly calls for “sound science” in risk policy, he often gets his science wrong and almost always attempts to reduce ethical to purely scientific questions." -- although that's a little question-begging itself. 
Then --- "Because most hazardous materials are not tested, most risk probabilities are determined through mathematical models. As such, the models describe events falling into the category of Bayesian “uncertainty,” where no accurate probabilities are available, because there are no frequency data. If data were available, there would be no need for risk analysis and its attendant models. Given this Bayesian uncertainty, virtually all risk experts accept the fact that risk analyses typically err by 4 to 6 orders of magnitude." -- What? Sunstein and Zeckhauser, "Overreaction to Fearsome Risks" (SSRN) "Fearsome risks are those that stimulate strong emotional responses. Such risks, which usually involve high consequences, tend to have low probabilities, since life today is no longer nasty, brutish and short. In the face of a low-probability fearsome risk, people often exaggerate the benefits of preventive, risk-reducing, or ameliorative measures." --- Maybe *Sunstein's* life isn't "nasty, brutish, and short," sure. In fact, I'd bet it isn't. But that's not a complete answer to the question... (link via Tyler Cowen) "Is Google Too Big To Fail?" (Felix Salmon) "I think there's definitely something to the theory that Nassim Taleb thinks the way he does partly because he grew up in Lebanon, where you couldn't trust the banks. By contrast, most of the US and Europe grew up in a culture where banks were old and venerable and were to be trusted implicitly... Today, much of that trust is gone -- but we still have trust in something more precarious still, which is the cloud, and the internet generally." --- When you get right down to it, what else is too big to fail? Microsoft, probably. Walmart, maybe? Surely we can think of others? One or more of the major wireless carriers? (Starbucks? *I* certainly wouldn't want it to fail...) Muxtape Information It's back... well, sort of. Not really. 
Shalev-Shwartz, Singer, and Srebro, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM" Nati's paper I was telling you about -- gradient-descent, but with stochastic sub-sampling so that the rate of convergence doesn't depend on (i.e. get worse with!) the number of data points. "A Lottery for People Who Are Good at Math" (Freakonomics Blog at NYTimes) Alex just pointed out this comment to me -- he said that the pool (of residents of Random Hall) is real, that they've been doing it for years, and that some of them have had individual takes in the tens-of-thousands of dollars. He mentioned that one former student has apparently been doing this *full time*, even taking the time to travel out-of-state to play other lotteries with similar characteristics. Wow. "Ask the Administrator: What the Fish?" (Confessions of a Community College Dean) Cosma Shalizi writes: "I'd add that when European universities began in the middle ages, the three big subjects were law, medicine, and theology, and the last was not about rapturous contemplation but producing higher officials for the Church." --- Dad, "Confessions..." is that blog that you noticed over my shoulder, when I was home for the holidays this year. "A longstanding principle in statistics" (Andrew Gelman) "This reminds me of a longstanding principle in statistics, which is that, whatever you do, somebody in psychometrics already did it long before." --- This reminds me of a comment Gelman made at the end of his lecture on Red State Blue State, about how "psychometrics is the most difficult area of statistics." Because, basically, "there's no ground truth." "Renting Big Data" (ben fry) Ben Fry talks about the "Public Data" datasets from Amazon S3. No opinion as to its utility, though -- a lot of that depends, I think, on the price involved. 
Speccing out how much time/money it would take (or what the time/money curve is) to do some basic alignment runs with short-sequence reads seems like something that Dave would be interested in... "Barry Schuler: Genomics 101" (TED.com) Barry Schuler talks about genomics and its application to food science (sponsored by Sean^H^H^H^HRobert Mondavi vineyards, "the finest genomics for your tablery"). I haven't watched this yet, I have no idea if it's worth anyone's time or not. Switching from scripting languages to Objective C and iPhone: useful libraries :: Hackdiary Ted taught a class in iPhone development this IAP, and now Dave has an iPhone. He *needs* a version of the Warpdrive visualizer for it. Somehow, these two observations need to be put together. Manual Protocol Reverse Engineering — BreakingPoint "With that said, I'd like to share some screenshots and discuss some of the techniques I've developed to deal with this arcane art." What it turns out he's doing, I think, is byte-wise alignment on the packets, followed by some by-hand analysis of covariation. Once again, it'd be interesting to see some of the same techniques for doing this in bioinformatics (Smith-Waterman alignment, some of the sequence conservation or RNA-secondary-structure-predictor algorithms applied to the result) used for the same purpose. Crowley on Disch II — Crooked Timber Standpipe? 50+ Semantic Web Pros to Follow on Twitter - ReadWriteWeb Hilarious. Bill Kristol "Ambivalent" About Staying At New York Times "By his own account, Kristol is the sort of person who browses through a used bookstore at the Milwaukee airport while waiting for a plane and picks up an old edition of Orwell's essays." --- Christ, *really*? Anyone who's been to the Milwaukee Airport has been to that book store (Renaissance Book Shop). 
It's a little played out now, but if he was actually there at least a couple of years ago, and he only came away with "an old edition of Orwell's essays," then he didn't actually look very hard for anything interesting. It'd be like saying, "he travelled to Paris, and managed to pick up a small faux-bronze figurine of the Eiffel Tower." In fact, this essay accomplishes the rare (for me) bi-fecta of reducing my opinion of Kristol and George Packer at the same time. Well-played, sir. "Moral arbitrage" (John Quiggin at Crooked Timber) I liked this example when I first read it. It's time to get it out of my FF tabs and into delicious, where I can forget it until I need it. --- "As this example shows, with arbitrage opportunities, all sorts of things can be made possible. A consistent virtue ethicist (for example, a Jeffersonian) might reasonably conclude that the criminal behavior inevitable in a long occupation of a largely hostile country is unacceptable to someone who wants to maintain a virtuous disposition. A deontologist would object to violations of well-established principles of just war theory. A consequentialist would certainly conclude that the foreseeable costs of a war exceed the benefits. But a moral arbitrageur can mix and match these principles to reach a conclusion none of them would individually support." "The Effects of Chart Size and Layering on the Perception of Time Series Visualizations" (information aesthetics) "Horizon graphs" were not "invented by ... Panopticon," yo. At the same time -- this is some cool research into UI. It'd be interesting to incorporate some of these relationships (height and stackability of "tracks" vs the height of the screen as a whole) into some of our genome-visualization software... "Whose Job Is This, Anyway?" 
(The Classroom Window) Kate P.'s mini-rant about testing and teacher evaluation -- it's probably true that just because "statistics are necessary" for a national education policy, doesn't mean that "any statistics will work." But this discussion needs to be grounded in the larger context of NCLB, probably. To blog. "Data availability" (The Monkey Cage) Andrew Gelman discussing data availability from the Catalist guys (the DNC-contracted group that did data mining and analysis for the Obama campaign in 2008). -- "My own experiences coming at it from the other side: I have data and code from my books on the web. And people email me all the time asking for cleaner code, or help with the data, or advice on debugging their code, or whatever. I don’t mind—really, it’s no problem, when I’m too busy I don’t have to reply—but it does seem to be true that when you give stuff away, people start asking for more…" "Readings in Distributed Systems," Marton Trencseni A list of papers. Garber and Skinner, "Is American Health Care Uniquely Inefficient?" (Journal of Economic Perspectives) In this paper, we consider two questions: First, is the U.S. healthcare system productively efficient relative to other wealthy countries, in the sense of producing better health for a given bundle of hospital beds, physicians, nurses, and other factor inputs? Second, is the United States allocatively efficient relative to other countries, in the sense of providing highly valued care to consumers? For both questions, the answer is most likely no. "Probability distributions and object oriented programming" (The Endeavour) I tend to think of this as more of a parable about the dangers of the interactions between side effects and subtyping. (There's a well-known example, where you show how just because "X is a subtype of Y" that, in the presence of side effects, you *can't* also assume that "X[] is a subtype of Y[]". 
I'd argue that this example with distributions is really an instance of the same dictum.) Computer Science: Past, Present, Future (Ed Lazowska, PDF Slides) Slides from a talk by Ed Lazowska, a well-known computer scientist at UW. I saw him give a version of this talk as a Dertouzos Lecture about a year ago. If you look at nothing else, check out the slides on pages 74 and 75 of the PDF. "Playing around with Google's AJAX APIs" (Google Code Blog) "Not marble nor the gilded monuments" (Language Log) Geoff Nunberg runs down the president's inaugural address. "There are 67 people in America who live for this stuff." "OCR and Neural Nets in JavaScript" (John Resig) It should be possible to write some software that takes in training data and a neural-network structure, learns the weights through training, and then compiles it all down into Javascript -- in other words, this kind of thing should be automatically generate-able. A Slime draws near! (Kriston's Flickr stream) I must have this. Cammarano, Dong, Chan, Klingler, Halevy, and Hanrahan, "Visualization of Heterogeneous Data" "When the schema is large and unfamiliar, this requirement inhibits exploratory visualization by requiring a costly up-front data integration step. To eliminate this step, we propose an automatic technique for mapping data attributes to visualization attributes. We formulate this as a schema matching problem, finding appropriate paths in the data model for each required visualization attribute in a visualization template." Vovk and Watkins, "Universal portfolio selection" (SIGACT) Kalai and Vempala, "Efficient algorithms for universal portfolios" (JMLR) "Is the world running out of oil, err I mean alpha?" (CASTrader Blog) Satyen Kale's Publications Jayram, Kale, and Vee, "Efficient Aggregation Algorithms for Probabilistic Data" "We study the problem of computing aggregation operators on probabilistic data in an I/O efficient manner. 
Algorithms for aggregation operations such as SUM, COUNT, AVG, and MIN/MAX are crucial to applications on probabilistic databases. We give a generalization of the classical data stream model to handle probabilistic data, called probabilistic streams, in order to analyze the I/O requirements of our algorithms."
"Thomas Cover's Universal Portfolio - Part II, Further Reading" (CASTrader Blog)
T. Cover Selected Publications on Portfolio Theory
SF conference for data mining mercenaries - Brendan O'Connor's Blog
"John Langford posted a very interesting email interview with one of the organizers for the event, about how machine learning gets applied in the real world. The guy seemed to think that data integration — getting all the data out of different information systems within an organization and in the same place — is the most critical and hardest step. This aligns with my experiences."
"Is it something I said?" (Andrew Gelman)
"You must have received some excellent proposals to fund that were even better than ours! So congratulations on that." -- I think these are the funniest two sentences I've read all week (nay! all month!). Plus, it's quickly on its way to becoming an academic-blogging meme (http://lingpipe-blog.com/2009/01/23/lacks-a-convincing-comparison-with-simple-non-bayesian-approaches/). So, congratulations on that.
How Do Hospitals Get Paid? A Primer - Economix Blog, Uwe Reinhardt
C, you should totally talk to Plainy about this. She was on the billing end of the MGH liver stuff, for a while.
George Harrison: I Got My Mind Set On You (1987) (YouTube)
This came out when I was in the third or fourth grade... and for years after I heard it I didn't know it was George Harrison who was singing (this was probably, in all fairness, before I knew who the Beatles were too). It's such a fluffy little piece of pop music, and the video is nearly unwatchable. But it's deeply weird to me, how deeply embedded in my psyche this song ends up being.
Owlgres: A Scalable OWL Reasoner | Thinking Clearly
"Owlgres a DL-Lite implementation for PostgreSQL. DL-Lite is a fragment of expressive DLs with some interesting properties. First, together with the standard reasoning services (e.g. subsumption) it supports conjunctive query answering over an ABox maintained in secondary storage (typically a RDBMS). Second, DL-Lite has some nice computational properties, i.e. the standard reasoning tasks are polynomial in the size of the TBox and query answering is polynomial in data complexity. Third, DL-Lite allows us to expand ABox queries with TBox knowledge and translate the expanded ABox query into SQL queries that are subsequently evaluated over the RDBMS." --- Ted was mentioning Clark&Parsia's OWLgres (OWL-lite, built on top of Postgres) to me yesterday.
"The 10,000 Year Explosion" (Marginal Revolution)
"The stories of "lots of recent change overall" and "current groups differ" are jammed together but of course they are very different. Epigenetics don't receive much attention, even critically, and the lower levels of Ashkenazi social achievement before 1800 are dismissed quickly." --- What does Tyler Cowen even *think* he means, when he uses the word "epigenetics" like this?
"Boosting and the Regularity Lemma" (in theory)
"As shown by Klivans and Servedio, the algorithm in Impagliazzo’s proof can be seen as a boosting algorithm in learning theory (and every other known boosting algorithm can be used to prove Impagliazzo’s Hardcore Lemma); this is why we think of our proof as a “boosting” proof. I must confess, I have never really understood Impagliazzo’s proof, and so I can’t say I really understand ours, other than being able to reproduce the steps."
"Tories then and now" (Language Log)
More on the Obama inauguration speech from Mark Liberman ...
"The Wikipedia article cites historians' estimate that 15-20% of the white population of the American colonies were Loyalists — about the same as Dick Cheney's approval rating today. Not that there's any connection." I Am Here: One Man's Experiment With the Location-Aware Lifestyle A letter from the future... "To test whether I was being paranoid, I ran a little experiment. On a sunny Saturday, I spotted a woman in Golden Gate Park taking a photo with a 3G iPhone. Because iPhones embed geodata into photos that users upload to Flickr or Picasa, iPhone shots can be automatically placed on a map. At home I searched the Flickr map, and score—a shot from today. I clicked through to the user's photostream and determined it was the woman I had seen earlier. After adjusting the settings so that only her shots appeared on the map, I saw a cluster of images in one location. Clicking on them revealed photos of an apartment interior—a bedroom, a kitchen, a filthy living room. Now I know where she lives." "Restoring Science"? - Jonathan Adler - The Corner on National Review Online "President Obama's remark that his administration would "restore science to its rightful place" was both a shot at the Bush Administration, as well as a misleading statement. First, this comment wrongly suggested that science can somehow resolve difficult policy questions. Insofar as policy differences are based upon normative concerns, science may inform our decisions, but it cannot resolve policy disputes." --- I'm so tired of this ridiculous claim ("science is about what we 'can do', not what we 'should do'"), since it always pops up and it's very often totally wrong. I was watching the inaugural speech in a packed auditorium here at MIT -- the "restoring science" line got the loudest cheer by far. "The Today Show" (will wilkinson) "It’s all like this. They can’t help themselves, apparently. 
But it’s also pretty clear that they really do see their job as mediating and engineering our emotional response, as manufacturing our consent." -- Wilkinson manufactures the (completely appropriate) sense of disgust.
The 17c/45 Caterpillar spaceship
There's a whole community of people (apparently) who are doing engineering *in cellular automata.* That is, they seem to be taking a compositional approach to building *really large* structures which move coherently ("spaceships," I guess), and they have a system for measuring their speed. This is ... insane. I guess it makes sense though -- once you see the little movers, you want to build bigger ones out of smaller pieces, right? It has a strange similarity-of-feel to the synthetic biology folks -- I wonder if the CA guys have similar "parts databases?"
"The true history of the Bush years" (Making Light)
All the Onion articles on Bush, from the last 8 years. An essential resource.
"Stochastic Approximation" (Justin Domke’s Weblog)
This was open on my tabs at work, but I was trying to remember it from home a few days ago. Three reasonable-looking papers indexed at the bottom...
Lacoste-Julien, Sha, and Jordan, "DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification" (NIPS 2008)
The paper I was telling you about over lunch today...
HttpClient - HttpClient Home
"Although the java.net package provides basic functionality for accessing resources via HTTP, it doesn't provide the full flexibility or functionality needed by many applications. The Jakarta Commons HttpClient component seeks to fill this void by providing an efficient, up-to-date, and feature-rich package implementing the client side of the most recent HTTP standards and recommendations. See the Features page for more details on standards compliance and capabilities."
"Persistence of regional economic disparities" (Andrew Gelman)
To blog, in conjunction with thoughts on GG&S.
"Naive Bayes, Binomials, and Bags of Words" (LingPipe Blog) A basic review of naive Bayes modeling, bag of words representations, and all that. "Distance between two lines" (Reasonable Deviations) To think through: connections with some convex approximation stuff. "Dojo: Building Blocks of the Web" (SitePen Blog) Simple Dojo example, worked through from beginning to end. "JSON Schema in Dojo" (SitePen Blog) To do: get straight, in my own head, the connection between stuff like JSON schema and XML schema (and RDFS?) and description logics "proper." Pronto—A Probabilistic Reasoner for OWL DL and Pellet "Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities." Semantic Web Application Platform From W3C (although not endorsed by them?) -- a python platform for working with a bunch of semantic web stuff. Description Logics Includes a link to the Description Logic Handbook (which I spent part of the weekend reading, and is very good) as well as lists of resources for reasoners, summaries of complexity results, and a few other implementations. "Gregg Nations’s Job: Keeping ‘Lost’ on Track" (NYTimes.com) "Before the show’s premiere in September 2004, the producers were unsure that “Lost” would last beyond a few episodes. They therefore spent little time keeping track of the interlocking, overlapping and often confounding story lines that began to emerge even in the first episode. But when the series proved to be an out-of-the-gate hit, “we quickly realized we needed some system to keep track of all the details, that we weren’t going to be able to do that by memory,” said Carlton Cuse, one of the show’s executive producers." --- This is so, totally not going to end well. 
Boehm, Atkinson, and Plass, "Ropes: An alternative to Strings"
Published in a software engineering journal -- as I read it, it's a replacement for fixed-length arrays, that makes things like concatenation much faster than simple bytes-in-memory.
GeoCommons Finder!
"Finder! is a browser-based application for finding, organizing and sharing GeoData in common formats." -- with a search box on the front end? Useful for, I suppose, tracking down already-existing resources.
"LIBRIS - Linked Library Data" (Nodalities)
"Earlier this year, LIBRIS was published as Linked Data on the web, exposing the entire library state with all its records, links and relations. As far as we know, this is the first union catalogue or national library catalogue to be published in its entirety as linked data. Not counting lcsh.info (which is great, but contains no bibliographic information) it is the first effort by a national library to actually be part of the semantic web." --- Alan R. made a joke about FOAF in the IAP course last week.
"Students learn what they need, not what is assigned" (Surprise and Coincidence)
"At the risk of sounding self-contradictory, I also think that when the student owns the task of learning, they will learn enormously better than when the task is imposed. This does not imply that the student discover everything, however. It merely implies that they need to discover the need for the things that they learn. Students who need to learn something can learn from almost any source, even from something as currently unfashionable as, say, sitting quietly through traditional lecture."
NodeBox | Colors
The "nodebox color library."
Brachman and Levesque, "The Tractability of Subsumption in Frame-Based Description Languages"
Spent part of the morning reading this -- exceptionally clear.
Bjorner, Tillmann, and Voronkov, "Path Feasibility Analysis for String-Manipulating Programs" (Microsoft Research)
"In the context of test-case generation, we are interested in an efficient finite model finding method for string constraints." -- Via Lambda the Ultimate. Related, I think, to some of the BDD/FSM thoughts that Bill Tozier linked to before, as well as some of the static-analysis-for-design stuff in general.
"Dojo charts that sing: box and whisker plots" (Information Retrieval on the Live Web)
The original post that got me thinking about Dojo charting.
Paul Ogilvie
Graduate student at CMU. Via Mark Reid's recent post (and the comments thereto).
"A Beginner’s Guide to Dojo Charting, Part 2 of 2" (SitePen Blog)
"A Beginner’s Guide to Dojo Charting, Part 1 of 2" (SitePen Blog)
I need to find a list of javascript-charting libraries, at some point...
"Via Andrew Gelman, an article at Pollster.com..." (A Well, With Two Buckets)
The thing that's missing from these heat-maps (I mean, beyond a scale for the colors) is a notion of "density," right? Presumably, registered voters aren't distributed evenly across the two scales. A useful ODR exercise would be -- try to recover relative ratios, out of these two figures, for the parts that aren't obscured by post-hoc explanatory text and graphics.
"The Concept of the Hero in Modern Civilization (or, the Best of the Akheans)" (Brad DeLong)
"... We cannot today--that nobody can, since World War I--see war as glorious, and see the skill of the warrior as as source of glory. We admire the honor of Hector. We admire the strategic genius of Odysseus. But we do not see sheer excellence in the techniques of war as glorious in itself. And an earlier generation would. An earlier generation would see the march of the 3rd Infantry Division from Kuwait to Baghdad as glorious, even though the strategic fruits of that operational victory were thrown away by the incompetence of Bush, Cheney, Rumsfeld, Franks, Bremer, and company.
We do not. And so, for us, it is Hector fighting to defend his home and family (even though the war waged by the Achaeans against Troy does, by their lights, have a just cause) who is the hero of the Illiad. Is it a good thing that we modern American liberals have the mindset that we do--that we cannot even suspend our disbelief for long enough to enter into a frame of mind in which Achilles is glorious?"
Obama Plays It Cool (Saturday Night Live)
I've been laughing about this for two days. "When I accomplish a mission, there isn't going to be a banner..."
Using XQuery
"This page describes how to use Saxon as an XQuery processor, either from the command line, or from the Java API." Jonathan Rees was using Saxon (Michael Kay's XSLT/XQuery processor) to great effect as a data processor and cleaner, during the Neurocommons IAP course this week.
"Google Blog Converters 1.0 Released" (Google Open Source Blog)
So, embedded in this code somewhere must be stuff that parses out content from different prominent blogging platforms... Hmm....
Marvin Minsky, "A Framework for Representing Knowledge" (MIT-AI Laboratory Memo 306, June, 1974.)
The (or one of the?) "frame" paper(s) from Marvin Minsky in the mid-'70s. --- "While my theory is thus addressed to basic problems of Gestalt psychology, the method is fundamentally different. In both approaches, one wants to explain the structuring of sensory data into wholes and parts. Gestalt theorists hoped this could be based primarily on the operation of a few general and powerful principles; but these never crystallized effectively and the proposal lost popularity. In my theory the analysis is based on many interactions between sensations and a huge network of learned symbolic information. While ultimately those interactions must themselves be based also on a reasonable set of powerful principles, the performance theory is separate from the theory of how the system might originate and develop."
Hulu - Saturday Night Live: SNL Digital Short: J*** in My Pants
So, I finally got around to watching this when I was bored in the office yesterday... jesus christ, that's hilarious.
"The gamma function" (The Endeavour)
Two nice visualizations of the gamma function, the second one on the complex plane.
"This is a Good-bye Post" (evolgen)
"This is the final post ever at evolgen. It was a fun 4+ years, the last three spent at ScienceBlogs, but it has come time for me to close up shop." --- I had noticed that he was posting less frequently. It's too bad that he's feeling the ennui and deciding to close up shop; it was a great blog.
The Visual LaTeX FAQ - LaTeX-Project
Awesome. Browsing the PDF, I was able to find the answer to (about) three or four questions that had been bugging me for a while. And just in time for thesis-writing, too!
"Fama's Fallacy V: Are There Ever Any Wrong Answers in Economics?" (Brad DeLong)
"I seek a Harvard undergraduate to take Greg Mankiw's course this spring, to write the following in an appropriate place: "the classical model of chapter 3 shows us that expansionary fiscal policies have no effect on output even where there are idle resources--unemployment," and to report back on the reaction of the course instructors." -- Hilarious.
Everyone's bookmarks for ""ML and Stats People on Twitter" (Inductio Ex Machina)" on Delicious
chl, do you have a blog?
Wang, Dong, Sun, & Sun "Reasoning support for Semantic Web ontology family languages using Alloy" (PDF)
Ah! Search, and ye shall find -- Alloy *has* been used to analyze or model some web ontology stuff, including some OWL work.
Dad's response to my question about Mars Methane @ "It’s the Freakiest Show" (Quantum of Wantum)
"Or the methane could be produced by some kind of Martian cows, buried no doubt. On Earth each Earth-type cow produces .6 metric tons of methane annually.
One of the comments to yesterday’s NASA press conference on this discovery pointed out that since the plume contained about 19,000 tons of CH4 that indicates (if Martian cows are similar to Earth cows) that we should be looking for about 19,000/.6 = 31,000 (hidden or buried) Martian cows. This possibility will probably be reported on extensively in supermarket publications over the next several weeks."
Peter F. Patel-Schneider's home page
Semantic web guy, working on OWL, Manchester syntax, and description logic reasoners.
“Molasses, waist deep” (The Edge of the American West)
"Several of the bodies were too battered and glazed to be properly identified." ----- This is a really awful story. But I'm an immoral and disgusting person, because this line made me laugh, it sounds too much like a line from a Wendy's commercial.
"Procrustes analysis" (Wikipedia)
Mentioned by mja on twitter.
"ML and Stats People on Twitter" (Inductio Ex Machina)
I am completely and totally susceptible to flattery.
"Obama’s Indonesian redux" (Language Log)
"I still share Bill Poser's doubt, however, that Obama would be able "to carry out political negotiations with Indonesian leaders in Indonesian, or even to understand discussions of topics like politics and technology in an Indonesian newspaper." But as far as I know he's never made any claims to that level of proficiency. When Obama does eventually make a trip to Indonesia, I'm sure that simply throwing out his conversational pleasantries will go a long way in the eyes of many Indonesians." ---- I knew Obama spoke a form of Indonesian, but this is pretty awesome. Totes getting the warm fuzzies.
Enterprise Business Intelligence & Analytics Applications - TIBCO Spotfire
"Spotfire" -- dynamic, automatic, data visualization. Jonathan Rees just told me that he loved using it, and that it costs $40k/seat/year (omFg...) but that he used it when he was at Millennium.
Everyone's bookmarks for "How do we find good teachers and QBs?" on Delicious
The Protocol Informatics Project
"The Protocol Informatics project is a software framework that allows for advanced sequence and protocol stream analysis by utilizing bioinformatics algorithms. The sole purpose of this software is to identify protocol fields in unknown or poorly documented network protocol formats. The algorithms that are used perform comparative analysis on a series of samples to better understand the underlying structure of the otherwise random-looking data." --- So... Smith-Waterman, but used to analyze network protocol data? That's ... neat. Sort of like that genetic-programming-for-file-parsing stuff from a week or two ago.
Good enough that I would actually consider using it as a template for a future abstract.
A Tiling Database
Via Metafilter.
seth's response, "The Power Law of Scientific Dismissiveness"
I'm not trying to be argumentative, so I'm not going to try to respond or whatever on his blog. (Who wakes up in the morning and thinks, "I want to be a troll?") But I still think he's missing something. Dismissive papers like the one he quotes, that is, papers that dichotomize or put things into binary categories, might still provide a useful social function. To wit: something like bounded rationality. Yeah, sure, in the limit of infinite personal time, I'd never be dismissive -- every paper I read, I'd examine it, evaluate its findings, maybe assign them a weight (or multiple weights), keep track of them all, use them (combine them) in ways which summarize my world-view (weighting reliable results more than less-reliable, a form of numerical caution), and so on. But I don't have infinite time or attention. Instead -- summarize. Give me the "take home" message. Tell me what you're *sure* of. It's a form of intellectual compression, more than anything else.
"Performance Bonuses" (Matthew Yglesias)
My proposal: pay professors based on the *later* performance of their students in higher-level courses of the same subject. So, if I taught CS 101, and then all my students in that class went on to do well in CS 201, boom! -- I get a bonus. Also, it provides extra incentive for instructors to teach lower level courses: the more "intro" a course is, the more chances there are to get paid for later performance. Two birds with one stone!
"Zizek and House" (An und für sich)
Kotsko analyzes the television show, "House." -- "At this point, we could say that House perfectly embodies Badiou’s ethics of the situation — and in fact, medicine is the prime example that Badiou uses in his Ethics. What House introduces, however, is a Lacanian twist: if medicine is House’s ethical duty, then it must be said that he enjoys doing his duty. Foreman is stuck in the mindset that whatever intellectual satisfaction he gains from diagnosis must be regarded strictly as a “side-effect,” lest his enjoyment somehow pollute the moral rectitude of his practice as a doctor — that is, prevent him from being a doctor who is nonetheless a “real person” underneath." --- How did I miss this, the first time around?
Hedge funds as Martingales: MIT Blackjack Team on the financial crisis
"Writing out-of-the-money puts." I really had my eyes opened to the world of financial engineering by two things: (a) reading Simon Peyton-Jones paper about using a functional language (Haskell) and type systems for evaluating financial contracts, and (b) a couple of papers on the "regulatory arbitrage" and the historical use of financial engineering in the world of Islamic finance to get around religious restrictions on things like lending and charging rent. Of course, it's one thing to just assign "a value" to some of these engineered instruments, even if that value is "risk-adjusted" -- but if you think that these techniques got us *into* the mess, then (I believe) they can probably get us out as well. Anyone interested in seeing links to either of those two (a or b) papers?
Papers of the War Department
Compilation of papers, collected from distributed archives, from 1784 to 1800. Mostly with an HTML search interface -- this is just *begging* to be scraped and put into a real format (coughsemanticwebprojectcough).
"The Power Law of Scientific Dismissiveness" (Seth’s blog)
So, can I be dismissive of his use of the "power law" analogy? Not every monotonically decreasing thing is a power law, right? The rest is pretty funny, and probably true, although I'm still waiting to see why "dismissiveness," even if wrong on a paper-by-paper (or study-by-study, or result-by-result) level, isn't (still) a good thing for a discipline *as a whole*. N'est-ce pas?
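A quick way to see the "not every decreasing thing is a power law" point: a power law y = C·x^(-α) is a straight line on log-log axes, while an exponential decay is not. The sketch below is only a crude check (a straight-line fit on log-log axes is not a proper statistical test), and the two sample curves are invented for illustration.

```python
import math


def loglog_r2(xs, ys):
    """R^2 of a least-squares line fitted to (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy * sxy / (sxx * syy)


xs = list(range(1, 21))
power_law = [x ** -2.0 for x in xs]          # y = x^-2: monotonically decreasing
exponential = [math.exp(-x) for x in xs]     # y = e^-x: also monotonically decreasing

r2_power = loglog_r2(xs, power_law)      # essentially a perfect line
r2_exp = loglog_r2(xs, exponential)      # visibly curved on log-log axes
```

Both curves decrease monotonically, but only the first one is linear in log-log space.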
"A New Kind of Big Science" (Aaron Hirsh on Olivia Judson's NYT Blog)
"In physics, a slow drift toward centralization was given a sudden shove during the Second World War — think Manhattan Project — so it is perhaps not surprising that colliders today epitomize what historians have called “Big Science.” But a similar evolution is now evident in virtually every discipline. ... When a biologist wants the sequence of a certain genome, he submits his proposal to a large sequencing center, where armies of automated machines read their way in parallel through different paragraphs of a genome’s text." --- Science is a moving target. What was true three years ago (genome sequencing at centers) won't be true three years from now. Also, the title of this piece is ironic. (Or maybe he meant it that way? In which case, it's "subtle.") At any rate, the rest of the piece seems reasonable in places -- the rise of institutional, or factory, science is a fact, although I wonder if that has more to do with funding structures than anything else.
IAP 2009 notes 2009-01-12 - NeuroCommons
The wiki-notes section for the IAP course.
Ontology Lookup Service (OLS)
Searching (I think) the OBO Foundry ontologies. Noted as part of the Science Commons/OWL course I'm taking for IAP this week.
Everyone's bookmarks for "The Post-Materialist | A Pattern's Math Magic - The Moment Blog - N..." on Delicious
I'd be up for a weekend programming project, sure. We'd really have to restrict it to a three or four hour chunk when we're both online, and maybe do a little (slow) planning first, but I could find time to do it.
The Post-Materialist | A Pattern’s Math Magic - The Moment Blog - NYTimes.com
John saved this in Google Reader a while ago... what's going on here, I guess, is that the tiles are defined, or constrained, by their interfaces? "Interface" here being, I guess, the location and number of points along their edges. Tiling with a fixed alphabet would probably be easy to write a search for. I can't get the sketch for the "magnetic tile" out of my head.
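A rough sketch of what that search could look like, assuming each tile is reduced to four edge labels (top, right, bottom, left), tiles can be reused freely, and rotation is not allowed. The labels and the two sample tiles are invented; this says nothing about how the pattern in the article was actually designed.

```python
from typing import List, Optional, Tuple

Tile = Tuple[str, str, str, str]  # edge labels: (top, right, bottom, left)


def tile_grid(tiles: List[Tile], rows: int, cols: int) -> Optional[List[List[Tile]]]:
    """Backtracking search: fill the grid row by row so touching edges match."""
    grid: List[List[Optional[Tile]]] = [[None] * cols for _ in range(rows)]

    def place(k: int) -> bool:
        if k == rows * cols:
            return True
        r, c = divmod(k, cols)
        for t in tiles:
            # Left neighbour's right edge must equal this tile's left edge.
            if c > 0 and grid[r][c - 1][1] != t[3]:
                continue
            # Upper neighbour's bottom edge must equal this tile's top edge.
            if r > 0 and grid[r - 1][c][2] != t[0]:
                continue
            grid[r][c] = t
            if place(k + 1):
                return True
        grid[r][c] = None  # undo and let the caller try another tile
        return False

    return grid if place(0) else None


# Two hypothetical tiles whose interfaces force a checkerboard.
A: Tile = ("x", "1", "y", "2")
B: Tile = ("y", "2", "x", "1")
checkerboard = tile_grid([A, B], 2, 2)
impossible = tile_grid([A], 1, 2)  # A cannot sit next to itself
```

With a small fixed alphabet of edge labels the search space stays manageable; the constraint check is just label equality on shared edges.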
New Ratatat? “9 Beats” Hits Hard - rob’s blog
John, have you heard the leaked track from the new "9 Beats" album yet?
"Weak on Dragons" (Grasping Reality)
"Eustace Scrubb, in the early chapters of The Voyage of the Dawn Treader, manages to get himself turned into a dragon largely because the books he has read have “a lot to say about exports and imports and governments and drains, but they were weak on dragons.”" -- I have to say, the Eustace Scrubb episode in VotDT *still* gives me nightmares (the clawing at his arm, to get off the armband which dug into his skin, very Atul-Gawande-like). Also, both DeLong and Cowen have praised Laura Miller's "The Magician's Book," to that's probably something that should go on the to-buy list.
"The spontaneous regression of breast cancer?" (Respectful Insolence)
Orac's skeptical comments on the paper about spontaneous regression of breast cancer [ http://archinte.ama-assn.org/cgi/content/abstract/168/21/2311 ], which I learned about from the Medline "Director's Comment" podcast [ http://www.learnoutloud.com/Podcast-Directory/Education-and-Professional/Medical/Medline-Plus-NLM-Directors-Comments-Podcast/18821# ].
"Untitled" (Ars Mathematica)
"It sounds like the metric was the reciprocal of physical exertion."
D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs - Chris Bizer
"D2RQ is a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies. The D2RQ Platform uses these mapping to enables applications to access a RDF-view on a non-RDF database through the Jena and Sesame APIs, as well as over the Web via the SPARQL Protocol and as Linked Data." -- recommended several times, by Alan R. from Science Commons.
"A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect." --- All the semantic web guys at W3C here are using the purl redirection service, for setting up their RDF-associated URLs. But OCLC is bad, right?
"The Renegades at the New York 'Times'" (New York Magazine)
The NY Magazine article about the NYTimes tech guys.
Linden, Florin, & McGuckin, "Mucin Dynamics in Intestinal Bacterial Infection" (PLoS ONE)
"Bacterial gastroenteritis causes morbidity and mortality in humans worldwide. Murine Citrobacter rodentium infection is a model for gastroenteritis caused by the human pathogens enteropathogenic Escherichia coli and enterohaemorrhagic E. coli. Mucin glycoproteins are the main component of the first barrier that bacteria encounter in the intestinal tract. ... Major changes in both the cell-surface and secreted mucins occur in response to intestinal infection."
Lenth, "Some Practical Guidelines for Effective Sample-Size Determination" (PDF)
Via Andrew Gelman, and a post of his on sample size and power calculations [ http://www.stat.columbia.edu/~cook/movabletype/archives/2005/06/sample-size-and.html ]. Power -- "what statisticians are always calculating, but never have."
"Cache: a place for concealment and safekeeping" (Gustavo Duarte)
"This post shows briefly how CPU caches are organized in modern Intel processors." Gustavo Duarte's technical writing (and his diagrams) are extraordinarily clear. We need some meta-blogging here, collecting and linking to his posts by topic...
"The Missing Heritability of Height" (Seth’s blog)
"I don’t think it’s so mysterious. ... I believe the answer is this: The heritability estimates were overestimates. As one researcher put it, “Heritability estimates are basically what clusters in families, and environment clusters in families.” Variations in environment make far more difference than variation in genes. What the researchers “don’t fundamentally understand,” I believe, is their own tendency toward religious thinking — the tendency, shared by all of us, to believe what we’re told regardless of the (lack of) evidence for it. The notion that genes make a big difference in practice is one of those beliefs, repeated endlessly by genetics researchers..."
"Upstream Transcription: A whole lotta stuff going on" (The Daily Transcript)
"Since I'm pressed for time I'll just point out a few manuscripts that appeared in the past month that indicate how widespread transcription throughout the genome affects the packing of DNA which ultimately affects the expression of protein-coding genes."
"Lab color space" (Wikipedia)
COIN-OR Open Solver Interface - Trac
"Osi (Open Solver Interface) provides an abstract base class to a generic linear programming (LP) solver, along with derived classes for specific solvers."
No Title
"... As you might expect!"
Cairo Tutorial for Python Programmers
"Lou Dobbs on "Cheap Science"" (Biocurious)
Another way to think about this is that "science" (that is, academic research science) isn't the only alternative for (American) postdocs. Presumably, post-docs that can't find academic positions take the next-best alternative, which would be ... private industry? Anyway, science doesn't exist in a vacuum, etc etc. (Shaun says: "Poorly informed drivel." Fair enough!)
"The Embrace" (Matthew Yglesias)
One need only watch an episode or two of "24" to see how deep it goes...
"Intel 4004" (Flylogic Engineering’s Analytical Blog)
"It’s beautiful to see how none of the inefficiencies we see in modern chips are found on the 4004 and how the available space is completely filled with logic. The entire 4004 has only some 2,300 transistors and makes for a perfect exercise in learning neat chip layout and logic gate design..."
How to publish Linked Data on the Web
Preparing for the IAP course, starting tomorrow (assuming I can make it in through the snow).
MIT 18.085 Computational Science and Engineering I - Fall 2007
Part I in Gilbert Strang's two-part course on "computational science" -- basically, numerical methods for linear algebra and diff eq, applied to lots of relevant stuff. I've been listening to the video lectures of 18.086 (Part II of the course) via iTunes U. (18.06 is a pre-req for this stuff.)
MIT OpenCourseWare | Mathematics | 18.06 Linear Algebra, Spring 2005 | Video Lectures
Sure, you didn't take linear algebra at The D, John... but set aside ~30 hours of your time (say, 6 hours a weekend over the next five weekends), and watch the video lectures at this site. There's a ton of other course material, too. The lectures are by Gilbert Strang, who not only wrote one of "the books" on the subject, but is also a great (and famous) lecturer. Do it!
Unwin, Theus, and Hofmann, "Graphics of Large Datasets: Visualizing a Million"
Via Andrew Gelman's blog... I probably should pick up a copy of this book. ($90, gah.)
Financial Meltdown | n+1
The second part of the interview, several months after the first. He starts by talking about Bear Stearns...
"Problem, Set, Match" (Choicelessness)
In the first two problems, it seems clear what you want -- a transformation that rotates (and probably, unskews) your view of the points on the ruler, so that the ruler appears "straight" and "horizontal" across the bottom of the image. Like I said on the phone, I think that's just a matter of finding the right linear transformation... although I might need to brush up on my homogeneous coordinates, first (since there's also, I think, a translation involved). The third problem, I think it's still unclear what you want -- although it's clear that it won't be a linear transformation. I'm betting it's *not* impossible, although it might require two rulers, or something.
Nuclear apocalypse and the Letter of Last Resort. - By Ron Rosenbaum - Slate Magazine
Fascinating, true. But I think his assertion that "this was uncovered a few months ago by the journalists behind 'The Human Button'" is wrong. A lot of this was written about (and I don't know if even he got to it first) by a guy named Peter Hennessy, in a book called "The Secret State." I read about it first, over a year ago, in this post: [http://www.armscontrolwonk.com/1741/mad-nukes-and-englishmen], which has more detail too (including possible contents of the letter). More recently, the Human Button program inspired this follow-up post: [http://www.armscontrolwonk.com/2121/trident-on-the-radio]. The ArmsControlWonk blog is really worth reading, in general, actually.
wolfgang's comment, at "backwards or twice as fast" (the statistical mechanic)
I think he's saying that *you* should ask Dennett...
My Genome, My Self - Steven Pinker Gets to the Bottom of his own Genetic Code - NYTimes.com
Pinker, writing about Church's PGP.
"Last fall I submitted to the latest high-tech way to bare your soul. I had my genome sequenced and am allowing it to be posted on the Internet, along with my medical history." Fine, fine... but where is it? (http://www.personalgenomes.org/public/6.html -- but I don't see any links that work.) Interview With a Hedge Fund Manager | n+1 C has been sending me links to this (two-part) interview. They're hilarious, at times. The story about them joking, "it's a six-sigma event! the first one we've seen in three months!" is funny, true. But it also reminds me of reading about people who are doing gene expression analyses, and they realize that their p-value distributions are far from uniform. "That's okay," they say, "I'll just *model the p-values!* I'll assume that they can be represented as a mixture of beta distributions, and we're good to go..." "Top 100 Stories of 2008" (DISCOVER Magazine) Via Kottke. No. 17 should be a *lot* higher on the list (probably in the top three, at least). Lieberman, Michel, Jackson, Tang, & Nowak "Quantifying the evolutionary dynamics of language" (Nature) The regularization of irregular verbs, and the death of old words. I remember reading about this (on the Language Log, probably?) when it was published. Job Dekker, Ph.D. - Faculty in PGFE - UMass Medical School "Losing Is Fun, Learning is Better: DF UltraTutorials" (Rock, Paper, Shotgun) Someday, I'll get back into Dwarf Fortress (this post is a link to 2+ hours of DF tutorials on Youtube). "R, the FUD argument, the self-cleaning oven, and how to you count "users"?" (Andrew Gelman) "Unit-testing for Bayesian models." A Formal Debate About George W. Bush With Some Unusual Players | MetaFilter An overview and summary of the Intelligence Squared debate, on NPR, between Weisber/Jenkins and Kristol/Rove. (!) We heard a preview of this during our drive back up from DC a few days ago, and I meant to track it down online... but then I had forgotten it again, until seeing this. 
The rules for who "won" seem like they're skewed towards favoring one side over the other, but maybe that's not the point.
"voyagers and voyeurs—supporting social data analysis" (Haystack Blog)
David Karger's thoughts on a talk about "social data analysis," which I think currently (basically) means "social data visualization." I think it would be interesting to think about making data websites not just about graphics, but about analysis itself -- for that, however, you're going to need to think of a language.
"The public choice economics of Star Wars: A Straussian reading" (Marginal Revolution)
"6. The prophecy was that Anakin (Darth) will restore order and balance to the force. How true this turns out to be. But none of the Jedi can begin to understand what this means. Yes, you have to get rid of the bad guys. But you also have to get rid of the Jedi. The Jedi are, after all, the primary supply source and training ground for the bad guys. Anakin/Darth manages to get rid of both, so he really is the hero of the story. (It is also interesting which group of "Jedi" Darth kills first, but that would be telling.)" --- Genius, throughout. Bravo!
myGrid » What is a workflow?
Everyone seems to be inventing these "dataflow languages for scientific computing" applications these days -- the Broad's got one, hell even I've got my private implementation of one (called "echo"). A lot of programmers who've had some experience with scheme or another functional language immediately begin thinking this way (and really, that's what these things are: distributed map and filter processes). But you have to ask yourself: "what is the data" that will be flowing through? And what language do I use to specify the flow-able parts of my computation? There are some pretty mature products that do this thing in the database space -- the Aurora paper is one place to start, but there are others, too.
As always, ignoring what the (smart) DB folks have been doing for the last 10 years, and just rolling your own workflow environment, is probably a recipe for ... well, getting it mostly wrong.
IAP 2009 - NeuroCommons
"Scientific Data Integration on the Semantic Web," an IAP course taught next week by (among others) Jonathan Rees. I'm in!
Hammer drill - Wikipedia, the free encyclopedia
The source of some discussion with Tom, the other night. "A drill, that is also a hammer."
Lewis-Mogridge Position - Wikipedia, the free encyclopedia
I was going to make a joke...
LizardFeeder | Mozilla
Cute AJAX-ey interface to a continuous feed of Mozilla-related events. Not that I'm interested in Mozilla, necessarily, but the stylez are great. John, check it (and also: email me when you get the package I mailed to you from NC.) Via: [[ http://decafbad.com/blog/2009/01/05/enter-the-lizardfeeder ]]
"Train Wreck" (The Edge of the American West)
"In addition to being a thoroughly wretched president, Franklin Pierce delivered the most inarguably depressing opening sentence in the history of American inaugural addresses." Judd Gregg in 2012!!1!
Genetic Algorithm in Python to Generate File Converters - biais.org
"I applied the algorithm to a problem that is not really one: trying to help lazy programmers to write file converters. I had to write file converters to unify all (more or less) formated input files into one kind of CSV file. Each of these converters is made using the right combination of filtering / regexp matching / line splitting."
"I Love You Man" Trailer (You Ain't No Picasso)
Paul Rudd and Andy Samberg? Tell me more.
"A conceptual drill" (kottke)
I was going to say, "So, ripping off Arthur Ganson gets you a link from Kottke and thousands of hits on YouTube, now?" But then I realized that this "dreaming machines" guy *is* Arthur Ganson. Dude! The concrete drill is awesome, but he also has about a dozen sculptures that are even cooler.
"On the Drug Money Trail" (Bradford Plumer) "In this month's New York Review of Books, Marcia Angell argues that "it is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines" when it comes to drugs or medical devices. Big Pharma, she argues, has corrupted the clinical trial process too thoroughly." -- With a discussion focusing on "off-label" uses of psychoactive drugs. I don't know why, but I'm suddenly glad that Brad Plumer's back blogging at his old place. "Democrats Debate Methods to End Stem Cell Ban" (NYT) "Stem cells from human embryos, “are the gold standard,” said Dr. George Q. Daley, a stem cell researcher at Children’s Hospital in Boston and the Harvard Stem Cell Institute. Before they can be replaced by cells derived from skin cells, researchers have to know, at a detailed molecular level, how similar the two types of stem cells are, and how different. “There are still so many unknowns,” Dr. Daley said. “I am going to continue to have my lab use both at the same time.”" Numbrary "Numbrary is a free online service dedicated to finding, using and sharing numbers on the web." "Scientific impossibility: Did FBI get their man in Bruce Ivins?" (Deborah Rudacille, in the Baltimore Examiner) "The DNA evidence linking the dry anthrax spores in the contaminated letters to the “wet” anthrax spores in the flask of RMR-1029 is not in dispute. “The part that seems still hotly debated is whether there was sufficient evidence to name Dr. Ivins as the perpetrator,” Fraser-Liggett says." Foster and Stine, "Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy" "We illustrate our methodology by predicting the onset of personal bankruptcy among users of credit cards. This applications presents many challenges, ranging from the rare frequency of bankruptcy to the size of the available database. 
Only 2,244 bankruptcy events appear among some 3 million months of customer activity. To predict these, we begin with 255 features to which we add missing value indicators and pairwise interactions that expand to a set of over 67,000 potential predictors. From these, our method selects a model with 39 predictors chosen by sequentially comparing estimates of their significance to a series of thresholds. The resulting model not only avoids over-fitting the data, it also predicts well out of sample. To find half of the 1800 bankruptcies hidden in a validation sample of 2.3 million observations, one need only search the 8500 cases having the largest model predictions."
Cambridge Researcher’s Special Christmas Gift - Microsoft Research
Chris Bishop will give the "Christmas Lectures", the annual lecture series (for kids!) from the Royal Institution of Great Britain. Note to self: make sure to look up his Lecture #5, online.
"In the winter of light" (BLDGBLOG)
"Give us those nice bright colors / give us the greens of summers / makes you think all the world's a sunny day." Oh, yeah.
Subsidyscope.com
Tom's (manifestdensity.net) new employer/project. He mentioned, over beers in DC two nights ago, that the 36-350 notes were being read-and-used-as-inspiration by members of his team.
Twitter Venn
I need to download the new Twitter dataset when they put it back up... but in the meantime, the more I think about it, the more I think that something like this tool should be part of the "large data"/"end of theory" discussion.
"Inside the Machine" (Bradford Plumer)
Plumer reviews Royko's bio of Richard Daley. The parallels to Robert Moses seem, at least on the surface, remarkable -- right down to the quotes, which are almost identical to "At least he got it built." I bought "Guns, Germs, and Steel" over the holidays -- maybe "Boss" should be next, as soon as I finish the Diamond. Cultural history of a different sort.
Netflix Update: Try This at Home
Simon Funk's Netflix work, in his own words. Getting this out of Firefox tabs and into delicious.
"Traditional lectures don't work" (Biocurious)
"As will be no surprise to anyone who has ever sat through a lecture, traditional lecturing doesn’t work, or at least, not as we wish." -- A statement that seems (to me) so obviously wrong that I'm surprised that anyone believes it! (I suppose some lecturers would respond, "As will be no surprise to anyone who has ever tried to teach, modern students don't work, or at least, not as we wish," but that would be too facile too.) There's a hidden variable here, which is: the quality of the teacher him/herself. He ends with a call for more interactive classroom systems ("clickers"), which I suppose is an advancement for certain kinds of courses. Maybe he's misinformed as to what the point of a classroom experience is supposed to be? Maybe there's an unstated assumption which I'm completely missing?
"Light Yourself a Candle" (Unqualified Offerings)
Yet another reason not to take the Weekly Standard seriously.
"The Zune Bug" (bit-player)
"Thus it turns out the Zune has a fixed lifespan of 101 years, from January 1, 1980, through December 31, 2080." Critiquing some of the code that led to the ZUNOCALYPSE. To be blogged, probably, when I get around to talking about test-driven-development and logical modeling (or formal methods).
"Lessons to learn from the financial crisis" (Cranial Darwinism)
Writing about the SEC and Madoff scandal: "Maybe their analysis is right - but personally, I'm not sure that wishful thinking and incompetence are the whole story. So much of life has become so complex that it's poorly understood, and we seem to mostly accept this. Case in point: I have a PhD in computer science and about 25 years experience, and I've done everything from soldiering together microprocessors to writing compilers to proving impossibility theorems.
Yesterday I spent 45 minutes fixing the wireless on my laptop - and I don't understand why what I did worked. I could in principle, I'm sure - but that would take even more time, which I'd rather spend doing other stuff." -- And three bullet-pointed suggestions for complex problems, which seem like obvious points but are all the more welcome for that.
"Eric Lander Teaches?" (evolgen)
"Eric Lander is a professor, not a teacher. And he's also taking steps into politics under the upcoming administration. This has got me wondering whether he may move into the position of director of the NHGRI (a post vacated by Francis Collins earlier this year)." -- But being "a teacher" means more than just "being assigned to teach classes at an accredited university." I've seen Lander give lectures to *other scientists* before (at a Whitehead retreat) or little impromptu lessons, in his office, to (potential) collaborators. Say what you will about the man, he is most definitely A Teacher.
"New book on the psychology and engineering of traffic" (Cognitive Daily)
I really meant to buy Vanderbilt's Traffic book for someone this Christmas, but it somehow (inexcusably) fell off my radar. It's definitely going to be given to someone for a birthday in the next couple of months.
How to Update the Book for Alloy 4
Notes on revising the notes and examples in the book to work for the latest version of Alloy.
Alloy Community
"Alloy" is the name of the formal analysis tool (which lets you build models using a nice logical language, and then does small-model counterexample searching by reducing the specification to a SAT problem and using a fast SAT solver to look for solutions) from Daniel Jackson's group here at MIT. An earlier version of Alloy was the basis for his Software Abstractions book.
"Is it Art?" (John Lanchester in the LRB)
"The only thing which isn’t ridiculous about Rand and her ‘objectivism’ is the number of people who take her seriously.
It would be a good time for someone to publish a work of fiction or make a movie going into Rand’s ideas and duffing them up a bit – for instance, imagining what it would look like if a society with no laws were turned over to the free will of self-denominated geniuses. Well, someone has done that, except it isn’t a book or movie, it’s a video game." --- Lanchester takes a look at video games, culture, history, and art in the LRB. I scanned and liked, but I need to go back and re-read carefully. This article got talked up a lot by Tom at Infovore.
NIH: Public Access Homepage
More NIH pages on public access paper submission...
NIH Public Access: Submission Methods
Guide to submitting a paper to NIH's public access system. Damn you, PSB, for not doing this for me!
Commission on Rationalizing New Jersey's Health Care Resources, Final Report 2008
The Dartmouth Atlas of Health Care
Via Uwe Reinhardt's column in the NYT.
Crossing the Quality Chasm: A New Health System for the 21st Century
From the "Committee on Quality of Health Care in America."
The Repugnant Conclusion (Stanford Encyclopedia of Philosophy)
It's like an impossibility theorem (cf. Kleinberg, or Arrow), but for population ethics. Pointed out by Yglesias. Relevant, I think, for thinking about the politics of health-care and funding.
"Cut Medicine In Half" (Overcoming Bias)
One of Robin Hanson's skepticism-about-health-care-expenditures posts.
McGlynn et al. "The Quality of Health Care Delivered to Adults in the United States" (NEJM)
U.S. Health Care Costs Part VI: At What Price Physician Autonomy? - Economix Blog - NYTimes.com
Part 6 in a multi-part series from Uwe Reinhardt.
"Physician, Heal Thyself (or at least do it more cheaply)" (Matthew Yglesias)
Source of several useful links and reports.
Nancy Park, "Imperial Chinese Justice and the Law of Torture" (Project MUSE)
I picked up a copy of the "Late Imperial China" journal from the desk of R's aunt, while we were helping her fix her iTunes installation this afternoon -- I opened it, flipped to this article, and read it while R fiddled with the computer. It turns out to be a really fascinating description of the laws concerning "judicial torture" during the Qing dynasty in China (17th to early 20th centuries). Most interesting were the laws *restricting* torture, as well as Park's short conclusion (which I thought was very cogent).
The Truth about BDD
As it turns out, I've got some thoughts about this that will probably turn into a blog post in the next day or two. In my mind, the connections between these methods and some research work in formal methods (cf. my offhand reference to Daniel Jackson's book a few days ago) are pretty strong; there's probably some wheel reinvention going on here too. But I'm not 100% sure. Thanks for the link, Bill!
Predicting Structured Data - Table of Contents - The MIT Press
Tracking down some of the papers listed in this table of contents has been the work of part of my holiday... (reference via cshalizi's notebook on "Statistics with Structured Data.")
"Let the Rent Seeking Begin" (Greg Mankiw's Blog)
Mankiw thinks it'd be better to drop money out of a helicopter than to give it to AZA-suggested zoo projects. Cuts a little close to home, does it not?
everyone's bookmarks for "36-350, data-mining: self-evaluation and lessons learned" on delicious
The "Jordan book" I remember was used as course notes for 6.867 (MIT's grad machine learning course) when I took it, several years ago. Looking back on some old course websites, I see that it was actually labeled as a draft of a book by Michael Jordan *and* Chris Bishop (it was distributed as draft PDFs, and I'm not sure I have copies of them anymore).
Cosma Shalizi gossiped to me, earlier today, that he believes there was a "parting of ways," and that Jordan left the project and the book became this Bishop book: http://research.microsoft.com/en-us/um/people/cmbishop/prml/ -- which, if you don't own, I recommend highly. I have a copy of it, and I hadn't realized it was the same book (although it makes sense, when pointed out to me). It's changed a *lot*, since it was in draft form...
Guttag and Horning, "Formal specification as a design tool" (1980)
"In this paper we outline a specification language combining algebraic axioms and predicate transformers, present part of a non-trivial example (the specification of a high-level interface to a display), and finally discuss the analysis of this specification." Daniel Jackson cites this paper (which describes the "Z" system for specification and analysis of programs) as an early, and key, inspiration to his Alloy system (I'm paraphrasing this from Jackson's book, "Software Abstractions.")
"XMP Labelling for Nature" (Nascent)
Nature has started putting XMP tags (historically used for MP3s) in their PDFs online.
oakland crime maps XI: how close, and how bad? (tecznotes)
Some discussion of zoomable geographic heat-maps for crime-statistics. Avoids the issue of color/scale, I think.
"lxml: an underappreciated web scraping library" (Ian Bicking)
"Archinect Sees 2009" (BLDGBLOG)
Bryan gets a shout-out from Geoff Manaugh! Including a link to the thesis, nice. Well done, bryan.
A Few Links On Building Zoomable Images for the Web
Some notes I wrote myself, for the future, on some of the rudimentary details of building a system of pre-cached image-pyramid blocks for zooming up "Maps-style" zoomable images on the web. Most of these links are derived from a single blog post, which I got via vielmetti.
"Variational Integrators" (Reasonable Deviations)
"One particular class of geometric integrators I have been interested in over the past year is the class of variational integrators.
Many physical laws are formulated as variational principles from which differential equations can be derived. Instead of discretizing the differential equations, why not discretize the variational principles instead?"
mikeash.com: Friday Q&A 2008-12-26
Obj-C gets first-class-functions with lexical scoping. (Why are all these languages, like Obj-C and Java, getting half-on the functional-programming-style bandwagon *now*?) As usual, this all just seems like syntactic window-dressing if you don't have TCO. Lots of people in the comments whose questions would be answered with a reasonable semantics for the modified language... Also, "Foo" asks the right question, and "J. Fortmann" seems to provide the answer: apparently it involves stack-to-heap magic. Sounds complicated.
36-350, Data-Mining: Self-Evaluation and Lessons Learned
Cosma Shalizi writes his end-of-semester thoughts on how his Data Mining course went, and what he'd change next time. C gave me a printed, notebook-bound copy of the 36-350 course notes as a Christmas gift this year, along with copies of several of the papers referenced in the notes (inserted at the appropriate places). I should go back, print out Tom Minka's course notes, dig up the course notes from some old sections of that Michael Jordan book (was that ever published?) and compare-and-contrast. Might be a fun exercise...
"Nuclear Urbanism" (BLDGBLOG)
"In this context, I have to say that the books of Richard Rhodes cannot be recommended highly enough. His The Making of the Atomic Bomb – which I have to confess to having read only partially – is required reading for anyone interested in what intensive, well-funded efforts of design can – in this case, unfortunately – produce. The bomb as an act of national infrastructure." -- I received the Rhodes book as a gift this year; it almost can't be recommended highly enough.
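Going back to the "Variational Integrators" quote a few entries up, here is a small sketch of what "discretize the variational principle" can look like. It assumes one particular discrete Lagrangian, L_d(q_k, q_{k+1}) = h [ (m/2) ((q_{k+1} - q_k)/h)^2 - V(q_k) ], which may not be the one the linked post uses. Making the sum of L_d stationary in q_k gives the update q_{k+1} = 2 q_k - q_{k-1} - (h^2/m) V'(q_k), which is the familiar Stormer-Verlet scheme. The function names and the unit-mass harmonic oscillator are illustrative choices.

```python
def variational_step(q_prev, q, h, dV, m=1.0):
    """Discrete Euler-Lagrange update: q_{k+1} from q_{k-1} and q_k."""
    return 2.0 * q - q_prev - (h * h / m) * dV(q)


def simulate_variational(q0, v0, h, steps, dV, m=1.0):
    """Positions q_0 .. q_steps obtained from the discretized action."""
    # The recurrence needs two starting positions; use a Taylor step for q_1.
    traj = [q0, q0 + h * v0 - (h * h / (2.0 * m)) * dV(q0)]
    for _ in range(steps - 1):
        traj.append(variational_step(traj[-2], traj[-1], h, dV, m))
    return traj


def simulate_euler(q0, v0, h, steps, dV, m=1.0):
    """Explicit Euler applied directly to the differential equation."""
    q, v = q0, v0
    states = [(q, v)]
    for _ in range(steps):
        q, v = q + h * v, v - (h / m) * dV(q)  # both use the old q and v
        states.append((q, v))
    return states


def oscillator_energies(h=0.1, steps=1000):
    """Energy histories for V(q) = q^2 / 2, starting at q = 1, v = 0."""
    V = lambda q: 0.5 * q * q
    dV = lambda q: q
    traj = simulate_variational(1.0, 0.0, h, steps, dV)
    # Central-difference velocity at the interior points of the trajectory.
    e_var = [
        0.5 * ((traj[k + 1] - traj[k - 1]) / (2.0 * h)) ** 2 + V(traj[k])
        for k in range(1, steps)
    ]
    e_euler = [0.5 * v * v + V(q) for q, v in simulate_euler(1.0, 0.0, h, steps, dV)]
    return e_var, e_euler
```

With these settings the energy along the variational trajectory stays close to its starting value of 0.5, while the explicit Euler energy grows by a factor of (1 + h^2) every step.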
Elser, Rankenburg, and Thibault "Searching with iterated maps" — PNAS
Still trying to think of something that will make the layout-search better (or easier)...
Confessions of a Community College Dean: Some Thoughts on the AFT Report
My dad thought this was intensely interesting, when he saw it sitting open on the computer screen the other day...
"T-Rex in: Secrets of the Medical Profession" qwantz.com - dinosaur comics - January 05 2005
Soon, C will be an initiate...
Machine Learning Theory
Course notes.
"Java Fork/Join + Groovy" (behind the times)
"Fork/Join is similar to MapReduce in that they are both algorithms for parallelizing tasks." Gah! Okay, look. Fork/Join looks pretty damn cool. Neat use of reflection, and Doug Lea is an uber-smart guy (and good writer!), and automatic ways of detecting and implementing parallelism are teh hottness. But! I think the quoted line gives a good example of what Stonebraker and DeWitt were complaining about, when they wrote their criticism of MapReduce. First of all, it's not even clear in what sense MapReduce is "an algorithm." But even if you grant that, it's this reflexive comparison of "parallel things" to MapReduce, as if the latter were any kind of gold standard for doing things in parallel... it's just that, look, MR gets its power from the fact that it can rely on the programmer having already arranged his or her data into independent key-value chunks. *That's* what's doing the heavy lifting, algorithmically, in MR!
"Beyond Proportional Analogy" (Apperceptual)
"For some time now, I’ve been experimenting with algorithms for solving proportional analogies." Haven't had enough time to read the underlying paper yet (apparently, my family and girlfriend are not cool with me reading papers at the dinner table on Christmas Eve? who knew...), but this looks totally sweet. Time to add Apperceptual to the RSS reader (I'm not sure why I hadn't done that already).
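To make the Fork/Join complaint above concrete, here is a single-process toy of the map / group-by-key / reduce shape. It is a sketch, not any real MapReduce framework's API; `map_reduce`, `word_mapper`, and `count_reducer` are made-up names. The thing to notice is that once the mapper has emitted key-value pairs, each key's values can be reduced without looking at any other key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": all the work is in the keys
    # Each call to reducer sees one key only, so the calls are independent
    # and could run on different machines in a real system.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Add up the counts emitted for a single word."""
    return sum(values)


def word_counts(lines):
    return map_reduce(lines, word_mapper, count_reducer)
```

Calling `word_counts(["to be or not to be", "to do"])` counts "to" three times and "be" twice; any parallelism a real system adds comes from the grouping step, which is what the note above is getting at.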
Everyone's bookmarks for "A Semantic Analysis, So Latent as to be Completely Hidden " Quantum..." on Delicious
Well, I didn't mean to imply that regularized or iterative matrix decompositions would be totally obvious -- on some level, all this stuff is fairly abstruse. And there are lots of ways to attack the collaborative filtering problem. Maybe it's just a function of who I learned about this material from, people like Jason Rennie and Nati Srebro, who've been preaching the gospel of regularized matrix decomp. and iterative methods for five or six years... anecdotally, I know that Jason's only taken a crack at the Netflix stuff in the past month, as he's been between jobs. I think my ultimate complaint is probably with the NYT itself, as I think their science journalism is uniformly pretty bad. Anyway, thank *you* for the great links throughout the year too. Delicious is slowly becoming more and more essential to my everyday thinking, and your links are no small part of that. All the best, and happy holidays.
"Uh oh. Somebody cut the cake. I told them to wait for you, but they cut it anyway. There is still some left, though, if you hurry back." (Crooked Timber)
Kieran Healy gives the game away with his struck-out text. "We are throwing a party in honor of your tremendous success. Please place the device on the ground, then lie on your stomach with your arms at your sides. A party associate will arrive shortly to collect you for your party." Heh.
"Obfuscated Perl Program" (Mark Dominus)
Reason enough why Perl People are insane. "I would have been proud to lose to Bruhat's entry. It is truly impressive, and deserves to be seen. It runs in Perl and in PostScript so be sure to send it to your printer to see what comes out..."
"What is (and what good is) a combinatorial prediction market?" (Oddhead Blog)
David Pennock is a good writer, I think, and I still believe that combinatorial markets are among the cooler ideas I've seen in the last five years... now if only someone could convince him that highlighting *and* underlining items on his blog is typographic overkill. Also: pull quotes?
4 Top Science Advisers Are Named by Obama - NYTimes.com
Varmus and Lander are among those named to Obama's Council of Advisors on Science and Technology.
JamesM's kernel development tutorials
Nice. To read, once thesis is finished... (and no one is hungry, and peace reigns around the globe, etc etc.)
"Unequal Protection" (A Little Urbanity)
Dave Wharton is my hero: "The News and Record reports this morning that the GPD has stepped up patrols at "shopping centers across the city" in order to make people feel safer during the Christmas shopping season. Here's the N&R's map showing the affected shopping centers... I'm sure people who shop at those centers appreciate the effort. But there's something missing -- literally -- from the map. That story should read, "across the city (if you don't count anything east of Church St. as part of the city)."
"Creating Effective Cartograms" (Tim Showers)
I feel duty-bound to forward this on to you...
"A Semantic Analysis, So Latent as to be Completely Hidden" (Quantum of Wantum)
Actually, I'm not in an ML group at all -- top-notch or otherwise. And my snark (which I admit is probably unwelcome) was directed at Netflix, not at the people (Funk, or anyone else) who were pursuing the prize. Anyone who's tried to do SVD on big matrices from real-world data immediately finds themselves looking at exactly the same solutions that Funk did -- iterative methods for large matrices, and regularizing methods for dealing with missing values -- but I didn't want to get into that stuff on my blog, because I'm writing (basically) for my parents there.
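For anyone wondering what "iterative methods for large matrices, and regularizing methods for dealing with missing values" might look like in practice, here is a minimal sketch: stochastic gradient descent over only the observed ratings, with an L2 penalty on the factors. It is in the spirit of that approach but is not Funk's actual code; the function names, the hyperparameters, and the tiny ratings list are all invented for illustration.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.05,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q so that P[u].Q[i] ~ rating.

    ratings is a list of (user, item, value) triples; missing entries are
    simply absent, so they never contribute to the updates.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error plus the L2 penalty.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings only."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Made-up data: (user, item, rating) with most of the 4 x 4 grid unobserved.
TOY_RATINGS = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5),
    (0, 2, 1), (2, 0, 1),
]
```

Running `factorize(TOY_RATINGS, 4, 4)` and comparing `rmse` before and after training (pass `epochs=0` for the untrained factors) shows the fit improving. The regularization term is what keeps the factors from chasing the handful of observed entries too closely.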
The upshot is that NFL quarterbacking and teaching are both jobs that need to be performed in order to find out if a certain person is good at them or not." --- Wait, waitwaitwait. See, this is my problem with Malcolm Gladwell and his interpreters. Are you trying to tell me that there are *no* other activities, in which success is correlated with successful teaching (or quarterbacking)? I understand that *perfect* prediction isn't possible, and I'm willing to grant that the "standard" predictors we use for predicting success at these tasks might be flawed. But it seems like a fallacy to presume, therefore, that there aren't any predictors available at all! More likely is: we just haven't found the right ones, yet.
The Shirt Project
"We diagram the news on shirts. These t-shirts are available by subscription, like a newspaper." (Pleasepleaseplease, someone buy me a subscription to this...)
"I AM HAPPY TO HELP THOUGH" qwantz.com - dinosaur comics - December 01 2008
"You're saying that in a breakup, entities should not be multiplied without necessity?" Occam's razor, people! OCCAM'S RAZOR.
Jaakkola and Haussler, "Exploiting generative models in discriminative classifiers"
Yet another paper I always have trouble remembering. "In this paper, we develop a natural way of achieving this combination by deriving kernel functions for use in discriminative methods such as support vector machines from generative probability models." It seems like one way of approaching the clustering stuff would be to combine this idea with work like Brun's and Kleinberg's. Assume that the distance functions which are input to your clustering algorithm are derived from these sorts of generative models -- then, presumably, you could classify some kinds of clustering functions in terms of their (derived) performance on different classes of generative models...
Brun et al. "Model-based evaluation of clustering validation measures" (Pattern Recognition)
To read.
"A cluster operator takes a set of data points and partitions the points into clusters (subsets). As with any scientific model, the scientific content of a cluster operator lies in its ability to predict results. This ability is measured by its error rate relative to cluster formation. To estimate the error of a cluster operator, a sample of point sets is generated, the algorithm is applied to each point set and the clusters evaluated relative to the known partition according to the distributions, and then the errors are averaged over the point sets composing the sample." DBLP: Marcel Brun Via a note in csantos' links... In particular, I need to go look up "Model-based evaluation of clustering validation measures." "Do people still use microarrays?" (evolgen) A topic which was (quite independently of this post) the center of discussion the other day... Jakulin and Bratko, "Quantifying and Visualizing Attribute Interactions" (arXiv) I'd seen this before, but I finally got around to reading it a little more carefully after seeing it linked on Cosma Shalizi's CS-350 course website. (Jakulin, you will remember, is co-blogger with Andrew Gelman on his blog -- and former student, maybe?) Diagrammatic representations of "interactions," which are N-way relationships between random variables that are mediated through measurements of N-way entropy and information, and which capture some aspect of context-specific independence or dependence characteristics... Paparizos, Lakshmanan, "Tree logical classes for efficient evaluation of XQuery" (SIGMOD 2004) "A set of XML elements, say of type book, may have members that vary greatly in structure, e.g. in the number of author sub-elements. This kind of heterogeneity may permeate the entire document in a recursive fashion: e.g., different authors of the same or different book may in turn greatly vary in structure. 
Even when the document conforms to a schema, the flexible nature of schemas for XML still allows such significant variations in structure among elements in a collection. Bulk processing of such heterogeneous sets is problematic. In this paper, we introduce the notion of logical classes (LC) of pattern tree nodes, and generalize the notion of pattern tree matching to handle node logical classes. This abstraction pays off significantly in allowing us to reason with an inherently heterogeneous collection of elements in a uniform, homogeneous way." Michigan Molecular Interactions (MiMI): putting the jigsaw puzzle together -- Jayapandian et al. 35 (Supplement 1): D566 -- Nucleic Acids Research "Michigan Molecular Interactions (MiMI) assists scientists searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information. Utilizing an identity function, molecules that may have different identifiers but represent the same real-world object are merged. Thus, MiMI allows the users to retrieve information from many different databases at once, highlighting complementary and contradictory information. To help scientists judge the usefulness of a piece of data, MiMI tracks the provenance of all data. Finally, a simple yet powerful user interface aids users in their queries, and frees them from the onerous task of knowing the data format or learning a query language." "Building a polynomial from its roots II" (Reasonable Deviations) "This graphical approach is (most likely) of no practical use. Nonetheless, this approach does reveal the “structure” of the process of constructing a polynomial from its roots, and such structure is of much greater beauty than cumbersome algebraic manipulation." LiveDosGames | Cuz the future is in the past Panzer General, Master of Orion, SimLife? Tell me more... Tian et al. 
"Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes" (Nature) "Matrix decompositions with sparsity constraints" (Reasonable Deviations) In comments: "I’ve just found out that every 5 \times 5 matrix can be factorized into matrices with the above sparsity pattern: To prove this, you only have to prove that every matrix can be reduced to a diagonal matrix using elementary row and column operation, and that the matrices corresponding to the operations can be written as products of matrices with the given sparsity pattern (and of cause that diagonal matrices have the given sparsity pattern). Of cause this generalize to n \times n matrices with a similar sparsity pattern." "Good Gift Games Guide 2008" (defective yeti) Jessica Fridrich Specializes in Problems That Only Seem Impossible to Solve - Biography - NYTimes.com 53 algorithms, ZOMG! (I'm betting it's more accurate to call this "an algorithm with 53 operations," but whatever. I do appreciate the phrase, "cube-obsessed.") "Eine Kleinberg Knocked Musing" (Quantum of Wantum) John -- don't worry about the content, just read to the bottom and check out the second "reward" link. Canto "Canto is an Atom/RSS feed reader for the console that is meant to be quick, concise, and colorful." --- Hmm, yes, this might be quite useful (when running in a screen, of course). Twitter Venn (Neoformix) This ... is actually really great. (Now take it to the next level -- what can you infer, from some of these patterns? Can you imagine providing a simple, programmatic API to exploring some of these interactions? What about combinations with order higher than 3?) "Tyrone runs monetary policy" (Marginal Revolution) "What about that guy who set up the phony investment company? Can the Treasury make a new one of those, only bigger? He took money away from people and gave it to charities and the needy and the arts and higher education. That sounds like stimulus so why are we sending him to jail? 
Wasn't he ahead of the curve?" Introduction to iPhone Application Development - IAP 2009 Ted(6.830 teammate)'s website for the IAP course he's teaching next month. Thinking of building a mobile genome visualizer... Hazan & Shashua, "Convergent Message-Passing Algorithms for Inference over General Graphs with Convex Free Energies." Something about the "free energy" approach to talking about belief propagation has always really confused me ... but I'll give it another shot. Tightening LP Relaxations for MAP using Message Passing David Sontag, Tommi, and Yair Weiss all on the author list. At first blush, it looks like the "integer programs relaxed to linear programs" approach, applied to inference in discrete models. "Spam, visualizations and obvious variables" (Aleks Jakulin) With Gelman in comments, making cracks about Mercator. Why didn't I bookmark this before? (Also: I submit to you that "obvious variable" should be a commonly-accepted synonym for "a constant"). "Publishing is Changing" (Unlikely Words) Dad, I think that posts like this (and also, read all the essays that he links to, too) should be a significant datapoint in your thinking about the death of the newspaper industry. "It’s not that publishing is over, or banking, or auto manufacturers, or the music industry. This isn’t a coincidence. These are all businesses that haven’t evolved from where they were and they’re getting punished for it." Will Wilkinson, "The Lost World" "For the well-heeled, perhaps the biggest problem with economic growth is that eventually one is forced to compete with the hoi polloi for non-manufacturable goods. In this example, to avoid entirely the snowboarding philistines, one ends up having to own a mountain. But in what kind of damnable world must a Yale man be that rich in order to carve virgin powder?" --- Wilkinson's snark is fine, indeed. Said the Gramophone: BEST SONGS OF 2008 Right in so many ways. Get 'em before they're gone. 
MIT OpenCourseWare: 14.03 Intermediate Applied Microeconomics, Fall 2004 MIT OpenCourseWare: 14.02 Principles of Macroeconomics, Fall 2004 MIT OpenCourseWare: 14.01 Principles of Microeconomics, Fall 2007 MIT OpenCourseWare: 15.433 Investments, Spring 2003 MIT OpenCourseWare: 15.414 Financial Management, Summer 2003 MIT OpenCourseWare: 15.402 Finance Theory II, Spring 2003 MIT OpenCourseWare: 15.070 Advanced Stochastic Processes, Fall 2005 Haug and Taleb, "Why We Have Never Used the Black-Scholes-Merton Option Pricing Formula" (SSRN) Gotta get this off my list of open tabs. "Your brain is now their open book." (Acephalous) "Seems the human brain represents letters on itself so precisely that the algorithms these scientists used to train local decoders to select relevant voxels and assign weight matched signifier to signified down to the last pixel." --- Scott calls bullshit on the Neuron journal article's graphics. Hmmmm. Maybe if we had some method of taking images and reading data back out... (the quantitative from the qualitative, as it were) T N T — The Network Thinker: Influencer Targeting Valdis Krebs is driven to new heights of shrillness... (trying to channel Brad DeLong, here). Listen, I don't see what the big deal is. If you believe that patents are the right kind of intellectual property regime for protecting software, and if you believe that you can draw a line around the word "algorithm" wide enough to capture stuff like this, then this patent shouldn't be a problem. Conversely, if you believe that not everything that some people call an algorithm is "an algorithm" (is a 'scoring function' properly considered an algorithm?) and if you believe that intellectual property systems designed to work with physical machines and processes probably aren't suited for modern software design and development, then this is (yet another) straw on your camel-like back. 
But none of this turns, it seems to me, on the issue of "prior work" from Pharma companies in the 60s (or any other decade). ZooBorns C, you should show this to J. Fernando, Karishma, & Szathmary, "Copying and Evolution of Neuronal Topology" (PLoS One) "We propose a mechanism for copying of neuronal networks that is of considerable interest for neuroscience for it suggests a neuronal basis for causal inference, function copying, and natural selection within the human brain. To date, no model of neuronal topology copying exists. We present three increasingly sophisticated mechanisms to demonstrate how topographic map formation coupled with Spike-Time Dependent Plasticity (STDP) can copy neuronal topology motifs. Fidelity is improved by error correction and activity-reverberation limitation. The high-fidelity topology-copying operator is used to evolve neuronal topologies. Possible roles for neuronal natural selection are discussed." -- No idea how plausible this is. "Whatever you do, don't panic: The drama of a call to the emergency services" Audio recordings of 999 (911, but in Britain, I guess?) calls. I'm too scared to actually listen. Schulte et al "Join Bayes Nets: A new type of Bayes net for relational data" (arXiv) "Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning has developed a number of new statistical models for such data. Instead of introducing a new model class, we propose using a standard model class--Bayes nets--in a new way: Join Bayes nets contain nodes that correspond to the descriptive attributes of the database tables, plus Boolean relationship nodes that indicate the presence of a link." ---- Everyone's got their own flavor of "higher-order" graphical model. 
Queries and Indexes - Google App Engine - Google Code In retrospect, getting the indices right seems like it was one of the key things (key things I got wrong) with the last App Engine attempt I made. I should revisit this at some point, with an eye towards thinking more carefully about the query workload and indexing aspect... "Paper tigers and hidden dragons" (Roy Fielding, Untangled) "Instead of a list of changed user ids or URIs, we can represent the state as a sparse bit array corresponding to all of Flickr’s users. I don’t know exactly how many users there are at Flickr, but let’s be generous and estimate it at one million. One million bits seems like a lot, but it is only 122kB in an uncompressed array. Considering that this array will only contain 1s when an update has occurred within the last minute, my guess is that it would average under 1kB per representation." ---- And then you have to publish a separate resource that maps bits-to-users, or bits-to-URLs?? It's like he's never even *heard* of a hash function! "Prudie on husbands who write books" (Marginal Revolution) Maybe I miss the distinction between Trudie and Prudie? At any rate, I find all the people who show up in comments and say things like, "not enough information has been given to give a correct answer!" to be completely confusing. How can they think that's what the question is about? Google Reader API Niall Kennedy deconstructs Google Reader. (Later confirmed, in some details, by Google?) Matono et al. "A path-based relational RDF database" More path-indexing. Baolin and Bo, "HPRD: A High Performance RDF Database" Path-indexing in an RDF triple-store. Continuous Audio Life Logs and Personal Audio Recordings Includes a link to a page that reviews some commercially-available audio recording units. 
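The arithmetic in the Fielding quote above checks out, and the hash-function complaint can be made concrete. What follows is a minimal Python sketch, assuming one bit per user in an uncompressed array. `bit_index` and its use of SHA-1 are hypothetical illustrations of mine, not anything taken from Fielding's post, and hashing users into a fixed-size array means two users can land on the same bit.

```python
import hashlib


def bit_array_kib(num_users: int) -> float:
    """Size in KiB of an uncompressed array holding one bit per user."""
    num_bytes = (num_users + 7) // 8  # round up to whole bytes
    return num_bytes / 1024


def bit_index(user_id: str, num_bits: int) -> int:
    """Hypothetical: derive a bit position from a user id by hashing,
    so no separate bits-to-users resource has to be published.
    Distinct users may collide on the same bit."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


# One million users: 125,000 bytes, i.e. roughly the 122kB in the quote.
print(round(bit_array_kib(1_000_000), 2))
```

A client that knows its own user id can compute its bit locally and check it; the cost of skipping the published mapping is the occasional false "you have an update" from a collision.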
"How to get your grocery shopping done quickly and effectively" (Vacuum - Edward Vielmetti) "Always take the same route through the store, and organize your shopping list according to the store layout. That means for our Whole Foods that the order goes vegetables, bulk foods, canned goods, cheese, bread, while at Trader Joes the order is bread, cheese, yogurt, frozen veg, canned goods, snacks. You should be able to walk through the store without backtracking to pick up everything you need." --- What would a generic grocery store map look like? (An open area, descriptions of rows, the topology of connections between different places.) I need to talk to John about the ability to have "approximate mapping" software. Jain, Kulis, and Grauman, "Fast Similarity Search for Learned Metrics." "Locality-sensitive hash functions." Ackerman and Ben-David, "Measures of Clustering Quality: A Working Set of Axioms for Clustering." Via Mark Reid. From NIPS 2008. Basically, a response to that Kleinberg paper about clustering and impossibility that can be summarized (roughly) as, "Suck it, Kleinberg." From the abstract: "We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used [in Kleinberg's paper]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatised. [What?] We show that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency." (Okay, so I need to read this in detail before adding any more comment.) "In GoogleEarth, Shadows are Your Friend" (ArmsControlWonk) "If you are anything like me, you suffer a death of a thousand cuts when you see something on GoogleEarth and want to know how tall it is." 
So he uses the date-stamp that (sometimes) appears at the bottom of the Google Earth images (but is noisy), and observed shadows, to estimate heights. Probably the best example of "pulling data back out of images" that I've seen recently. "2008 Storage Hierarchy" (ongoing) These tables get updated every so often -- this is a good recent one. Gaming these numbers (or optimizing for them) has pretty much been the name-of-the-game in databases for the last thirty years or so. Jon Kleinberg, "An Impossibility Theorem for Clustering" This is the paper I was desperately trying to remember the title of, the other day. "Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties [Scale Invariance, Richness, and Consistency] we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median." Miyawaki et al. "Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders" (Neuron) Wow. Just... wow. (See figure 2.) "What’s Appreciative Thinking?" (Seth’s blog) "[Some random guy] asks what I mean by appreciative thinking. A good question, since I invented the phrase. To learn appreciative thinking is to learn to appreciate, to learn to see the value of things. More or less the opposite of critical thinking." --- But all his examples of "appreciative" thinking actually seem like what I would have called "critical" thinking in the first place. Cowen Index: 1. 
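The shadow trick from the ArmsControlWonk entry above comes down to one line of trigonometry. This is a rough Python sketch under stated assumptions: the ground is flat, the shadow length has already been measured off the image, and the sun's elevation angle has been worked out separately from the image date, time, and latitude (that step is not shown here). The function name is my own.

```python
import math


def height_from_shadow(shadow_length_m: float, sun_elevation_deg: float) -> float:
    """Estimate an object's height from the length of its shadow.

    Assumes flat ground and a known sun elevation angle (degrees above
    the horizon) at the moment the image was taken.
    """
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


# A 10 m shadow with the sun 45 degrees up implies an object about 10 m tall.
print(height_from_shadow(10.0, 45.0))
```

Since the date-stamp is noisy, the elevation angle is the weak link: near the horizon, a small error in the angle changes the estimated height a lot.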
"The Black-Scholes equation" (What’s new) Terence Tao walks the reader through the Black-Scholes equation. "Super-Seniors: Your Questions Answered" (Felix Salmon) Part II of Felix Salmon's explanation of tranches, super-senior and otherwise. "What's a Super-Senior Tranche?" (Felix Salmon) Part I of Felix Salmon's two-part explanation of what a CDO is, what tranches are, and which one is the super-senior one. list.it .. before you forget it! a place to stash your information A firefox extension from the haystack group -- "list.it, the Latitudinal Information Scrap Trapper that Indexes Things - is a small, simple note-keeping tool for solving a big, complex task -- helping you manage the tons of little information bits you need to keep track of each day. list.it does this by focusing on speed and simplicity. We have gotten rid of everything except a way to get things in and out quickly, so that you can get things out of your head and somewhere you can access easily any time." ColorWiki - Delta E: The Color Difference Yes, this might be *exactly* what I want. "Fire: The Next Sharp Stick?" (John Hodgman at McSweeney's) "Well, before we go on, we all have to accept that not everything is going to appeal to Johnny Blunt Stick." Garcon (from Yahoo! Research) Recommendations based on delicious network. No details on how it works, and the recommendations (at least for me) don't seem very good. But at least it's fast? Can I use Google Reader offline? I did know, C, about the whole "Google Gears to use Reader offline" thing -- although, when I tried it a couple times when it first came out (a year or two ago?), it tended to choke when I tried to get back online after a couple of days. Apparently I had too many items in my feed. Maybe it's better now, I haven't tried it recently. As for getting your data out, Google Gears (as I understand it) works by storing its data in an embedded database (SQLite), in your Firefox browser. 
I would imagine that it'd be possible to figure out how to open up a connection to that database, separately from Reader, figure out what schema it's using, and read your stored links out that way ... I just haven't taken the time to track down exactly how one would do that (yet). "How many people do you know?: Efficiently estimating personal network size" (Andrew Gelman) I was browsing through my starred-reader posts yesterday, and I happened back across this entry from a few months ago: Andrew Gelman points out a paper on estimating "personal network sizes" (that is, degree estimates) in social networks, using "the scale-up method" from sampling (questionnaires, "how many people named Michael do you know?"). I've been thinking about database methods for graph data models recently, and one of the things that a good query planner needs to know (in this setting) is, what do the statistics of my data look like? This is possible to do in a straightforward way in a relational setting, but it's less clear in a semi-structured/graph setting... but something like this approach might be one way to estimate an answer, no? "Medical Coding: Accident Involving Spacecraft Injuring Other Person" (LingPipe Blog) "Being scientists, the presenters were not surprisingly, although unbeknownst to them, squarely within the logical positivist tradition. Clearly they never read Wittgenstein or Quine, or they’d be post-analytic. Much less Rorty, or they’d be neo-pragmatic, like me. Keep tuned for more philosophy once I clear out the technical backlog." Pellet: The Open Source OWL DL Reasoner "Pellet is an open source reasoner for OWL 2 DL in Java. It provides standard and cutting-edge reasoning services for OWL ontologies." We used this in our 6.830 class project, as a background example. Ted claims that Jena+Pellet is state-of-the-art for the Semantic Web, that it's "used by NASA." 
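On the "open up a connection to that database and figure out the schema" idea from the Google Gears note above: I haven't tracked down where Gears keeps its file or what tables it uses, so the path in the comment below is purely hypothetical. The generic SQLite part is simple, though. A minimal sketch using Python's standard `sqlite3` module:

```python
import sqlite3


def list_tables(conn: sqlite3.Connection):
    """Return (table name, CREATE statement) pairs for every table."""
    cursor = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return cursor.fetchall()


def dump_table(conn: sqlite3.Connection, table_name: str, limit: int = 10):
    """Return up to `limit` rows from one table."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    return conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,)).fetchall()


# Usage (hypothetical path; the real location of the Gears database is unknown to me):
# conn = sqlite3.connect("/path/to/gears-reader.sqlite")
# for name, create_sql in list_tables(conn):
#     print(name, create_sql)
```

Working on a copy of the file rather than the live one seems prudent, since the browser may hold a lock on the original.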
Databases, Lists, Maps, Rankings - Index - Data Desk - Los Angeles Times The LA Times "Data Desk" site -- I don't know why I didn't tag this before. An exercise for the reader: write a series of parsers that will take some of the LAT's data graphics and pull the data *back out*. "TQFTs via Planar Algebras (Part 3)" (Secret Blogging Seminar) If I told you that I looked at "planar algebras," and I saw something that reminded me of some of these dataflow languages, would you think I'm crazy? (Yes, it's probably crazy.) Michael Mitzenmacher, "Cuckoo Hashing and CAMs" (PPT) They're not labeled as such, but these are almost exactly (if not exactly) the slides that M.M. used for his talk here at the TOC colloquium yesterday. Questions to think about: Is there an 'algebra' of hashing schemes? And what would cuckoo hashing look like, if each level of the hash were on a different machine ('distributed' cuckoo hashing)? Does that make any sense at all? Picture 16 on Flickr - Photo Sharing! This is from the alpha-shapes-from-flickr-geocoded-images stuff. OpenStreetMap has info on Baghdad?? Sweeeeeeet. Okay, so, the flickr alpha shapes should be incorporated into some kind of larger mapping system, tout de suite. "Why on-site renewables don't add up" (David MacKay, Sustainable Energy - without the hot air) Dad, I've been meaning to ask you: you should write something about the observatory project (the one you were telling me about at T-giving) on the blog ---- "The bottom line: if you want to completely power a typical building, or even an amazing eco-building, from renewables, most of those renewables have to be offsite. There isn't room on-site! And it's probably a better use of resources to accept this fact up front, rather than force developers to squeeze uneconomic figleafs (such as micro-turbines) into their developments. 
We should modify the planning regulations for new buildings so that developers are still required to build renewables, but are encouraged to build new renewable capacity off-site." "A NIPS paper" (Machine Learning (Theory)) "I’m interested in this beyond the application to word prediction because it is relevant to the general normalization problem: If you want to predict the probability of one of a large number of events, often you must compute a predicted score for all the events and then normalize, a computationally inefficient operation. The problem comes up in many places using probabilistic models, but I’ve run into it with high-dimensional regression. There are a couple workarounds for this computational bug: (1) approximate, (2) avoid, (3) [what this paper does] use a self-normalizing structure." "Genetic Programming: Evolution of Mona Lisa" (Roger Alsing Weblog) Ummm.... (stochastic) gradient descent's a hell of a drug? Nice pics, but I'm not sure why this is interesting. "Languages and games" (Statistical Modeling, Causal Inference, and Social Science) "Chess is a game for two players with complete information; you can't "explain" what's going on in a chess game in terms of bridge, which is a game for two sets of partners with imperfect information, a mixture of skill and chance which depends on skilful sharing of information between partners. And you can't "explain" either in terms of poker, which is a game for an indeterminate number of players, a mixture of skill and chance in which sharing of information between players would in fact be collusion and outlawed. A game is intelligible on its own terms - which means, paradoxically, that you can play a game with someone whose language you don't know, provided you both know the rules of the game." --- Recapitulating Wittgenstein, poorly, right? (My copy of "Philosophical Investigations" is at home, though...) 
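The "normalization problem" in the Machine Learning (Theory) quote above is easy to see in code. This is a small Python sketch of plain softmax normalization, which is my own choice of illustration and not the self-normalizing structure the paper proposes: to get the probability of a single event, every event's score has to be touched.

```python
import math


def softmax_prob(scores, i):
    """Probability of event i under softmax normalization.

    Note the cost: the normalizer sums over every score, so asking
    about one event still takes time linear in the number of events.
    """
    m = max(scores)  # subtract the max for numerical stability
    normalizer = sum(math.exp(s - m) for s in scores)
    return math.exp(scores[i] - m) / normalizer


scores = [2.0, 0.5, -1.0, 0.0]
probs = [softmax_prob(scores, i) for i in range(len(scores))]
print(probs, sum(probs))
```

With a handful of events this is nothing; with a vocabulary-sized set of events, that sum is the expensive step the quote is complaining about.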
"The Malcolm Gladwell Backlash" (Matthew Yglesias) The vitriol you're starting to see in the reviews of Gladwell is the same vitriol present in (many) anonymous reviewers' comments -- people are often upset to see someone else take the simple or straightforward approach, and have it work. It's jealousy, the pain of getting (intellectually) scooped! Which is not to say that the criticism isn't sometimes warranted. It's even more frustrating to see a straightforward idea fleshed out *poorly*. (And honestly, do I go out and buy each of Gladwell's new books? No! I dislike him too! Because I'm jealous of him, of course. Let's call a spade a spade, right?) "Maps, wait!" (Quantum of Wantum) Bryan, this is the blog post describing the Baghdad map project that John and I worked on last year -- the one I described to you over lunch. GDAL: GDAL - Geospatial Data Abstraction Library "GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats." "Explore and Analyze Geographic Data with UUorld" (FlowingData) "UUorld (pronounced "world") is a 4-dimensional mapping tool that lets you explore geographic data - the fourth dimension being time." -- It's awesome that they had to clarify that. Also: "How effective is this method of visualization though? There's the usual argument of area perception, but does color-coding and vertical dimension make up for that?" --- but how in the hell could color-coding make up for that? SUBDUE - Graph Based Knowledge Discovery "SUBDUE is a graph-based knowledge discovery system that finds structural, relational patterns in data representing entities and relationships. 
SUBDUE represents data using a labeled, directed graph in which entities are represented by labeled vertices or subgraphs, and relationships are represented by labeled edges between the entities." Cantrill and Bonwick, "Real-world Concurrency" Tips on locking and concurrency, from the ACM Queue book, "The Concurrency Problem." Composing Contracts "This document is an unofficial example implementation of the system originally described in the paper Composing contracts: an adventure in financial engineering, by Simon Peyton Jones, Jean-Marc Eber, and Julian Seward. Familiarity with both versions of this paper is assumed." A stand-alone Haskell library that implements most of the SPJ paper. "Preemptive Blogging" (John Snavely’s Blog) John's post about (essentially) blogging across platforms, including delicious. Kottke shows up in the comments, which is also funny. Facette: Faceted Tagging for Delicious A tool, from one of the Haystack guys (Peter Lai, I think), that tries to automatically classify your delicious tags into "facets." You need greasemonkey installed to get it to work fully. Nice interface... the whole thing looks pretty sweet. One of my 6.830 teammates (another Haystacker) told me about this... Remembering on the web - 5 reasons why online bookmarking is the wrong tool Maybe 50% right, but probably 100% thought-provoking. I agree with Reason #1, disagree with #3, can't understand #4, doubt #5, and wonder what happened to #2. John, you know that my feeling is probably something along the lines of "delicious is awesome, but it's only about half the tool I need." Followed closely by, "delicious isn't really like a 'bookmarks' feature (despite its name), it's more like a 'blog'." I can use my blog to remember things too, of course. Finally: I find his comparison of the relative utility of searching delicious vs. Google totally backwards. TCA Journal Excerpts (7) I've been laughing over these all day... 
LaTeX Equation Editor - SITMO LaTeX images for blogging. indiemaps.com/blog » noncontiguous area cartograms I totally saw that yesterday, too. (Thanks for the link!) But what's the point of a cartogram, if you no longer have to maintain the contiguity of the original map? This kind of thing would be easy to program, if all you have to do is rescale the states (for instance) in place. Lieberson and Horwich, "Implication Analysis: A Pragmatic Proposal for Linking Theory and Data in the Social Sciences" (Sociological Methodology) Along with six responses. Via Harvard's Social Sciences Statistics Blog. Scalability Perspectives #4: Kevin Kelly – One Machine | High Scalability "There is only one time in the history of each planet when its inhabitants first wire up its innumerable parts to make one large Machine. Later that Machine may run faster, but there is only one time when it is born. You and I are alive at this moment." -- I honestly have no idea what this means, except that it lessens (was that even possible?) my opinion of Kevin Kelly. What are the implications of "one machine?" Does it have to include *every* computer, by definition? Obviously this "one" machine will be always running a billion (or whatever) concurrent processes, always emulating lots of smaller machines, and there will be no way to program it *as a single machine,* so what then? I conclude your use of the word "machine" is doing too much work, and I dismiss your writing as that of a crackpot (albeit, a crackpot who helped found and run an influential tech magazine, natch). "Ask The Mineshaft" (Matthew Yglesias) Cowen Index: 6. (Why do 90% of the Cowen Indices I assess seem to have a value of '6'?) "Natasha's question" (Marginal Revolution) Tyler Cowen's wife is obviously an accomplished lifeman (lifewoman?). Hu, Lee, and Lee "Distance indexing on road networks" (VLDB 2007) From the session "Indexing for spatial & sequence data." 
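To back up the "easy to program" claim in the cartogram note above, here is a rough Python sketch of rescaling a region in place: scale each polygon about its own centroid so that its area changes by a chosen factor. It assumes each state is a single simple polygon given as a list of (x, y) vertices, and the function names are mine. I have not checked how the indiemaps implementation actually does it.

```python
import math


def signed_area(points):
    """Signed area of a simple polygon (shoelace formula)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def centroid(points):
    """Area centroid of a simple polygon."""
    area = signed_area(points)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        cross = x0 * y1 - x1 * y0
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (6.0 * area), cy / (6.0 * area)


def rescale_in_place(points, area_factor):
    """Scale a polygon about its centroid so its area is multiplied by area_factor."""
    k = math.sqrt(area_factor)  # lengths scale with the square root of area
    cx, cy = centroid(points)
    return [(cx + k * (x - cx), cy + k * (y - cy)) for x, y in points]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(rescale_in_place(square, 4.0))
```

The only real decision left is the area factor per state (data value relative to current area); nothing here prevents neighbors from overlapping once some of them grow.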
"A comment about “Mathematical undecidability and quantum randomness” by Tomasz Paterek et al." (Mathematics and Computation) "Ok, did everyone get that? They can only handle finite theories expressed in the propositional calculus. They did not solve an undecidable problem, and they know it." Coffee - Niemann Opinion Art Blog (NYT) Excellent. (I love the graph napkin, the most.) Cal Henderson, "Building Scalable Web Sites" 1st ed. atomsmasher "atomsmasher works by facilitating the construction of a single, consolidated world model consisting of an entity database and state model from heterogeneous web data sources This representation is used to drive a rule-based behavior engine built in Javascript that manages efficient evaluation of rule triggers (antecedents) and rings (actions) given the sequential arrival of new information." Supposedly a beta version was supposed to be out last month? I should knock on some doors to find out what's what. Josh Benaloh, "Ballot Casting Assurance via Voter-Initiated Poll Station Auditing." Via Dan Wallach at Freedom to Tinker (http://www.freedom-to-tinker.com/blog/dwallach/future-voting-technologies-simplicity-vs-sophistication). PLUM: towards lifetime user modeling "In this project, we propose a strategy for enriching end-user applications with information about their users obtained using three simple strategies: mining information already available on a userùs own personal devices, logging user activity and contexts unobtrusively through these same devices, and by opening up channels by which users can easily and flexibly express knowledge to the system as part of his or her workflow. " Max van Kleek and Michael Bernstein (in Haystack). "How Transcription Affects Genomic Organization and Vice Versa" (The Daily Transcript) Alex Palazzo writes about data that I know something about! Quotes the Barski paper, as well as a paper with Rick's name on it. 
"You can look at these graphs and say, so histone modifications dictate gene expression!" Yeah, you could. You could look at them and say other things, too. Also, looking at the *averaged* mark profiles from experiments which are (themselves) averaged across millions of cells (as well as averaged over some period of time) is ... problematic. Google Insights for Search "With Google Insights for Search, you can compare search volume patterns across specific regions, categories, and time frames." Interesting! I wonder if it has an API? (Picked up from here: http://statestats.appspot.com/, which uses it for state-by-state comparisons.) "Unfortunate Obituaries: The Case of David Freedman" (Seth’s blog) "I wonder if someday we will look back on Freedman’s behavior and the way powerful institutions (such as UC Berkeley) supported it the same way we now look back on racism and its support by powerful institutions (”institutional racism”). Freedman’s dismissal of and failure to learn from people different from himself is just human nature, like racism, but of course history is all about eventually rising above that sort of thing." ---- Damn, man. Talk about speaking ill of the dead... Lander & Waterman, "Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis" A foundational paper (from 1988). Film personality test (Kottke) Great idea, but using films-as-features is only half the story -- we need to associate personalities with particular patterns of film preferences, if this is to have any use. Maybe some semi-supervised learning would be appropriate, here? We need to start collecting a dataset. Zhu, Kruglyak, Schadt, et al. "Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks" (Nature Genetics) Already in Connotea, but I need to remember this one... 
TISCH - Tangible Interactive Surfaces for Collaboration between Humans "We believe that the logical next step is the design of a software architecture, which allows developers to focus on writing applications instead of focusing on low-level stuff like hardware issues, gesture recognition and so on." A software library, for doing higher-level programming with multi-touch stuff. "Inspired by Jeff Han." Shental, Dolev et al. "Gaussian Belief Propagation Solver for Systems of Linear Equations" Bickson, Tock, Shental, and Dolev, "Polynomial Linear Programming with Gaussian Belief Propagation" Abe et al. "Self-Organizing Map (SOM) unveils and visualizes hidden sequence characteristics of a wide range of eukaryote genomes." David DeWit, "Partitioning Sparse Graphs using the Second Eigenvector of their Graph Laplacian" 3.4. Fiedler Ordering "A recently proposed heuristic for the symmetric envelope minimization problem involves sorting the rows/columns of the matrix by the values of associated entries in the Fiedler vector of the graph of nonzero entries. This approach was proposed at about the same time by several different researchers [BPS93,JM92,PMGM94a,PMGM94b], and seems to often produce better orderings than more traditional combinatorial methods, albeit at a somewhat increased cost." Implementing a Data Source - Google Visualization API - Google Code "This page describes how you can implement a data source to feed data to visualizations built on the Google Visualization API. Your data can be anything that fits into a two-dimensional table and can be sent in response to an HTTP GET request." "What is New Trade Theory?" (Alex Tabarrok at Marginal Revolution) "Here is a primer on one of Krugman's key contributions, New Trade Theory." "Philosophy of science: Theories of almost everything" (P.M. 
Binder in Nature) "Writing in Physica D, David Wolpert has made headway in this direction by demonstrating that the entire physical Universe cannot be fully understood by any single inference system that exists within it." I think Robin sent this to me? No idea what to make of it. "Fear and Trembling and the Incarnation" (The Valve) I don't know if he's right... but there are times when John Holbo's writing is just so *clear* that you think, "it must be right." Reality Mining | MetaFilter "Reciprocity in China" (Seth’s blog) "And the reciprocity norms of rich countries take the form they do because the countries are rich." "Scale: How Large Quantities of Information Change Everything" (Good Math, Bad Math) Implicit Cowen Index: 6. ("Scientists now casually work with massive quantities of information, and by doing that, they're finding completely new ways of doing science. Ten years ago, if they could have gotten state of the art equipment, the scientists looking at the Ebola genome could probably have sequenced it. But they would never have dreamt of comparing it to feline leukemia virus." -- bwahahaha....) "Understanding Synthetics" (Felix Salmon) Sometimes I wonder how much of the financial crisis we can blame on Simon Peyton-Jones. "Tarpipe begins to tackle personal content overload" (CNET) I believe this will sound intensely familiar to you... Liane Gabora, "Modeling Cultural Dynamics" (arXiv) "EVOC (for EVOlution of Culture) is a computer model of culture that enables us to investigate how various factors such as barriers to cultural diffusion..." Honestly, the actual implementation doesn't strike me as that interesting ("neural network based agents... not a Darwinian but a Lamarckian process..."), but the cultural evolutionary simulators seems like the right *idea*, at least. Melting into Air: A Critic at Large: The New Yorker Can we have a discussion of fixed points now? 
Not to say that this wasn't a Ponzi scheme, but not everything that "looks like" such a scheme actually *is* one. To blog, with regards to PageRank. (kthxbye.) jgpu: Home Alex and I were literally just discussing whether something like this (Java library for GPU programming) existed, this morning. "Le Citi Toujours Dormer..." (Brad DeLong) What starts as a smackdown of the NYT turns into an excellent point-by-point review of Citibank's crisis (specifically), and bank-failures-during-financial-crises (in general). "The Feds Double Down on Citigroup" (Andrew Samwick) "The technical term for this is a joke." Eeeeesh. Race and D&D - Ta-Nehisi Coates TNC links to a blog about "Race and Dungeons and Dragons." --- "I don't blame that on Jackson or Tolkien. If someone was doing a fantasy epic based on Xhosa creation myths, I wouldn't expect to see any white people." --- and obviously, there's both something here and not. Probably there should be a distinction between "D&D: the rules" and "D&D: the campaign settings." In comments, someone brings up the great datapoint of the Drow. Also, the blog he links to has an entire post dedicated to the Al Qadim campaign setting. Still! Showdown - Java HTML Parsing Comparison | Lumidant Comparison of several Java html parsing packages. This'll be important for network crawling... StatPlanet – Interactive Data Visualization through Maps and Graphs "Interest in computer science is volatile" (Social Science Statistics Blog) Gender-based trends in computer science are (a) volatile, but (b) volatile-conditioned-on-gender in different ways. Different now, but not *always* different. Which is weird. I'd like to see this graph overlaid on a graph of a general economic indicator (unemployment? growth?). Also, notice that this is "intent to major" and not "admissions," and it'd be important to think about how those two are related. 
"Regulatory Genomics, Systems Biology and Dream3" (Thirst for Science) Robin's rundown of the RECOMB satellite conference from a few weeks ago. Includes links to the "activity motifs" paper. "Building a polynomial from its roots" (Reasonable Deviations) We always think of taking a polynomial and finding its roots -- but what about the other way around? Say you're given a set of numbers and you want to determine the polynomial (that is, the coefficients of the polynomial) for which they are the roots. Not surprisingly, the answer is recursive.... although the double-indexing of the coefficients confused me a little, the first time through. (This is a constructive way of finding the generator of the principal ideal I(X) for some variety X in R, right?) Joaquin Quiñonero Candela's HomePage Q-C's publications -- I wish I could find a table of contents for his new book, though. "Statistics 36-835: Statistical Models and Methods for Networks" Class notes from a course taught by Stephen Fienberg. Includes an index of readings, as well as a link to a preliminary book (paper anthology, edited by, among others, Fienberg, Blei, and Xing) on the same. Lemonade Stand Promises Visual-Basic source code that is faithful to the original lemonade-stand game. But I can't seem to extract it properly (and I can't find a writeup of the original rules on the web). Help, please? I just want to look at the source code! "Why are music reviews so positive?" (Marginal Revolution) For some reason, I'm on a collaborative-filtering/rating kick recently -- "I might add that Washington Post restaurant reviews are far too positive. If WP readers were simply told "There are hardly any good restaurants in your crummy little city," this wouldn't do much for WP circulation or advertising revenue." -- the funniest line I've read all morning. "Lucene or a Database? Yes!" (LingPipe Blog) Sam Madden was just talking about this in class this week. 
Lucene is (essentially) a system conforming to the "classical" definition of a search engine. A similar, but more thorough, comparison of the two types of systems can be read in "Combining Systems and Databases: A Search Engine Retrospective" (http://www.cs.berkeley.edu/~brewer/papers/SearchDB.pdf), by Eric Brewer (of Inktomi fame). Plangprasopchok & Lerman, "Modeling Social Annotation: a Bayesian Approach" A model for social data and tagging, yes. But not a *social* model, and certainly not something that models the social aspect of the tagging. It seems like an obvious generalization, no? "Netflix Prize scoring function isn't Bayesian" (Aleks Jakulin) "Now, your model might choose not to make recommendations with controversial movies - but this won't help you on Netflix Prize - you're forced to make errors even when you know you're making them. (R)MSE is pre-probabilistic: it gives no advantage to a probabilistic model that's aware of its own uncertainty." Dabble DB - Create an Online Database A Zoho competitor? "Hard information management that should have been easy" (Haystack Blog) John -- the Haystack blog is now required reading. Amazon Web Services (AWS) Hosted Public Data Sets For "public datasets," Amazon will (I take it) allow you to download the data to your own EC2 instance free-of-charge. I assume this is a way to get more people doing data analysis into their specific cloud? But it still seems like a fascinating way to go, and this is probably the direction that scientific computing should take anyway... "Thoughts About Why GM Executives Are Clueless And Their Destructive “No We Can’t” Mindset" (Bob Sutton) Via Felix Salmon, an interesting blog post about institutional problems at GM in the wake of last week's Capitol Hill testimony. 
"I also believe [the auto-industry bailout] will be a waste because the leaders of these firms (at least GM, which I know best) are so backward and misguided that the thought of giving these bozos any of my tax money turns my stomach..." -- especially look for the anecdotes about talking-times in GM meetings. "Annals of Drinking: A Better Brew" (The New Yorker) Burkhard Bilger interviews the brewers of Dogfish Head brewery. Also talks about hard woods from South America, and Budweiser vs. craft breweries. Via DSquared. Interesting throughout. The Screens Issue - If You Liked This, Sure to Love That - Winning the Netflix Prize - NYTimes.com "The first major breakthrough came less than a month into the competition. A team named Simon Funk vaulted from nowhere into the No. 4 position, improving upon Cinematch by 3.88 percent in one fell swoop. Its secret was a mathematical technique called singular value decomposition. It isn’t new; mathematicians have used it for years to make sense of prodigious chunks of information. But Netflix never thought to try it on movies." --- OMFG. I refuse to believe that. allRGB Creating images that use every 24-bit RGB color (~16 million colors) once per pixel (and not more, or less). This would be useful, I think, when thinking about color? Also, it's useful to compare some of the ordered images against randomized versions of the same (which look completely gray). A reasonable color-scaling algorithm is (possibly) going to be context dependent, right? Clarkson, "Nearest-Neighbor Searching and Metric Space Dimensions" (PDF) A reasonable (and reasonably-recent) review of nearest-neighbor algorithms, including some that are useful in high dimensions. If I were looking for references on NN search, I would (in general) start by looking at the publications of Ken Clarkson or Piotr Indyk, or Indyk's former student Mihai Badiou, and start working backwards from there... "The Denominator, or, Is it an advantage to have a humble background?" 
(Andrew Gelman) Once a certain degree of insight has been reached... anyway, this post should be read in conjunction with Ed Felten's note about TSA hit rates (http://www.freedom-to-tinker.com/blog/felten/low-hit-rate-isnt-problem-tsa-screening). "Bike Hero-gate" (Infovore) Bike Hero was faked. Thoughts Arguments and Rants » Blog Archive » Decision Theory Notes "It is really hard to offer a motivation for exclusively playing equilibrium strategies in one-shot zero-sum games that doesn’t look like a motivation for taking one-box in Newcomb’s problem." Just spent a while browsing his decision theory notes. The "tickle defense" on page 85, in particular. Bill Tozier explains Behavior-Driven (and Test-Driven) Programming. Programming ... I'm doing it wrong. JBehave - Behaviour-driven development in Java "Behaviour-driven development in Java." CS 294-5, Spring 2006 A set of lectures from a class, taught by Richard Karp, on "Great Algorithms." Includes a nice compact description of Buchberger's algorithm, and several other gems as well. "No Price Discrimination Here" (The Sports Economist) "Being non-profit does not mean that you don't have profits as an objective. All it does is restrict what you can do with earned profits, meaning that they can't be dispersed to shareholders. ... At a meeting when I jokingly brought up the fact that my university is a non-profit, I was told by an older gentleman at my table 'Oh, we get plenty of profits. We just make sure we spend it all.'" Host - The Atlantic (April 2005) David Foster Wallace's profile of John Ziegler from 2004. Just ... amazing. In every detail. I love the fact that he got his start in Fuquay-Varina. I love the short aside on the 2004-era radio ads for home refinancing (topical!). I love the pop-up footnotes. Everything. The entire essay is genius. "Markets in everything?" (Marginal Revolution) "Didn't I read as recently as ten years ago that "Jurassic Park" scenarios were more or less impossible?" 
But which part of the scenario are you talking about, Tyler? And anyway, that was 10 years ago. Science has (cough) advanced a wee bit, in the last ten years. Cowen Index: as yet unassigned. Yeast consensus metabolic network "This page is a portal to the consensus network model, as published in the article Herrgård, Swainston et al. (2008) "A consensus yeast metabolic reconstruction obtained from a community approach to systems biology" Nature Biotechnol. 26, 1155-1160." "Is Nested Clade Analysis Worthwhile?" (evolgen) "That raises the question: why do people continue to use NCPA if it hasn't been shown to work? It can't be because they don't know of the limitations of NCPA -- they're citing papers that layout those limitations." --- And then later, in comments: "The Knowles paper is a great rant: they should allow comments at the end and call it a blog." That would make a great snowclone: "X is a great rant. They should add/allow comments at the end, and call it a blog." "Estimated votes by county among non-blacks" (Red State, Blue State, Rich State, Poor State) Really glorious map of Obama's vote-share by county, among non-black voters. The post addresses issues of county-level voting, including dividing by race... but I think it also touches on some of the issues of color and mapping that we've been talking about. There's the "hidden variable" of county-level population here, too. "Chuck Klosterman reviews Chinese Democracy" (The A.V. Club) "I've been thinking about this record for 15 years; during that span, I've thought about this record more than I've thought about China, and maybe as much as I've thought about the principles of democracy. This is a little like when that grizzly bear finally ate Timothy Treadwell: Intellectually, he always knew it was coming. He had to. His very existence was built around that conclusion. But you still can't psychologically prepare for the bear who eats you alive, particularly if the bear wears cornrows." 
Research: "Center for Financial Engineering" "The Test Passes, Colleges Fail" (Peter Salins, NYT Op-Ed) "In the 1990s, several SUNY campuses chose to raise their admissions standards by requiring higher SAT scores, while others opted to keep them unchanged. With respect to high school grades, all SUNY campuses consider applicants’ grade-point averages in decisions, but among the total pool of applicants across the state system, those averages have remained fairly consistent over time. Thus, by comparing graduation rates at SUNY campuses that raised the SAT admissions bar with those that didn’t, we have a controlled experiment of sorts that can fairly conclusively tell us whether SAT scores were accurate predictors of whether a student would get a degree. The short answer is: yes, they were." --- I totally don't understand (maybe I should read the paper, first). Does this make sense at all? He's comparing changes in the *averages* of the GPA and SATs of incoming freshmen, and using those to make *causal* connections to graduation rates by institution? Really? The Behavior of Algorithms Dan Spielman's 2002 course at MIT. Most of the lecture notes are included here, including notes on "Gaussian Elimination without Pivoting." Combinatorial Preconditioning Dan Spielman's page on "combinatorial preconditioning," or "support theory," including links to the paper, "Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems," which later was divided into three papers. Includes supporting materials, additional links, etc. The "NIBBLE" algorithm. Anyanwu and Sheth, "Ρ-Queries: enabling querying for semantic associations on the semantic web" "In the context of a graph model such as that for RDF, Semantic Associations amount to specific certain graph signatures. Specifically, they refer to sequences (i.e. directed paths) here called Property Sequences, between entities, networks of Property Sequences (i.e. 
undirected paths), or subgraphs of r-isomorphic Property Sequences. The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF." "Stephen Boyd’s lectures on Convex Optimization" (Reasonable Deviations) Links to the lectures/transcripts/videos of two classes of Stephen Boyd's "Convex Optimization." Biological color vision inspires artificial color processing : SPIE Newsroom: SPIE.org Heat maps aren't the only area in biology where "color" is an issue. There's been a lot more thought about color and intensity in the microscopy field than in bioinformatics, I think. Cook & Torrance, "A reflectance model for computer graphics" qwantz.com - dinosaur comics - May 26 2005 "Everyone! Program Harder!!" qwantz.com - dinosaur comics - September 04 2003 "I have started a new screenplay, based on the Structured Query Language for databases! It's called... 'UPDATE bodies SET status = 'DEAD'!'" "The future of photography" (Freedom to Tinker) Dan Wallach writes about High-Dynamic-Range photography (and the possibility of getting this "automatically" out of new high-resolution cameras that can shoot video). "Java Daemon" (Peter Williams) "Lately I have been writing a Java program that needs to run in the back ground (like a daemon). I found a couple of neat little tricks that can make this easier." Actually, this taught me a couple of things about Unix, more than anything else. JMG Lammens, "A computational model of color perception and color naming" A doctoral thesis from 1995. Can everyone see the potential here? "Influences on Tag Choices in del.icio.us" (Haystack Blog) "Your Favorite NP-Complete Cheat" (Coding Horror) So much wrong in a single blog post. 
"NP-complete problems are like hardcore pornography. Nobody can define what makes a problem NP-complete, exactly, but you'll know it when you see it." No, no no no no. The definition of NP is perfectly clear, and NP-completeness is built off of it: if a problem (language, theory, whatever) is in NP, and if any other problem in NP can be reduced to it in polynomial time, then it's NP-complete. A shortcut to showing that a language in NP is NP-complete is to show that *another* NP-complete language is reducible to it. And if you've proved any NP-complete language is in P, then you've proved P=NP. This isn't hard! It's undergraduate-level stuff. Why do I still read Jeff Atwood? (I have no idea.) Yacker User Guide "Yacker parses ABNF and generates parser executables in a variety of languages." Online! Nice. Bryan Ford, "Parsing Expression Grammars" (LtU) "Wi-fi structures and people shapes" (cityofsound) Clearly he should have created a cartogram of the floorplan (floorgram?). Gastner and Newman, "Diffusion-based method for producing density-equalizing maps" (PNAS) The cartogram paper, for future reference. ScapeToad - cartogram software by the Choros laboratory An alternate implementation of the Newman and Gastner diffusion cartogram method. Open source, and in Java. "The “predict flu using search” study you didn’t hear about" (Oddhead Blog, David Pennock) The rules of getting scooped change, when you're interacting with the New York Times. Didn't you know that? PhotoRec - CGSecurity "PhotoRec is file data recovery software designed to recover lost files including video, documents and archives from Hard Disks and CDRom and lost pictures (thus, its 'Photo Recovery' name) from digital camera memory." "Hack Day photos and blog posts" (The Freebase Blog) David Huynh shows up in the photos of the Freebase Hack Day. Maybe I should be paying more attention to Freebase... Patrik Hoyer's publications Hosted by Tommi, coming to talk here tomorrow... thinking of going to see it. 
How to Tell if Your Cat is Plotting to Kill You Watch out. The Morpheus Data Transformation Management System Michael Stonebraker was just talking about this project in class today. "Alternative to Cartograms Using Transparency" (FlowingData) Instead of using cartograms... *don't* use cartograms? I don't understand. "Applications of Algebraic Geometry" (Reasonable Deviations) "The IMA Thematic Year on Applications of Algebraic Geometry was a series of events that took place at the University of Minnesota’s Institute for Mathematics and its Applications (IMA) from September 2006 to June 2007." Yeast Literature Corpus - Machine Learning Group (UCD) "The grammar of graphics." (Vincent Zoonekynd's Blog) Wow, this book sure would be an awesome gift. Phil. Trans. B issue on cultural transmission and the evolution of human behaviour Apparently, MIT's got the hookup. Is there a particular article(s) you want? Xu, Caramanis, and Mannor, "Robust Regression and Lasso" "It is shown in this paper that the solution to Lasso, in addition to its sparsity, has robustness properties: it is the solution to a robust optimization problem. ... In particular, it is shown that robustness of the solution explains why the solution is sparse. ... based on this approach, a proof that Lasso is consistent is given using robustness directly. Finally, a theorem saying that sparsity and algorithmic stability contradict each other, and hence Lasso is not stable, is presented." "100 years of genetic research and science journalists are still confused" (The Daily Transcript) Alex Palazzo's take on the NYT Genes article. "The science of doping" (Nature) Bayes Rule and Lance Armstrong. The statistics of detecting doping (performance enhancing drugs). "Another 'Barrier' Has Been Broken" (Capital Gains and Games) "I agree with just about everything Robert Reich says [in a new article]... We always suspected that it might be possible, we may have hoped, but now we have the demonstration. 
Sometimes, I think the political spectrum from Left to Right is curved, and that I am so far to the Right that I find myself closer in viewpoint to those far to the Left rather than in the middle of the spectrum." Obama Victory Speech This is getting ridiculous. You can't just overlay the Shepard Fairey poster onto any old text -- either come up with some neat word-to-color mapping and then compose the image out of *those*, or get another gimmick, or just quit trying. Just taking some image and recoloring the pixels so that it is *also* an Obama poster is stupid. The Sounds of Music - Paper Cuts Blog - NYTimes.com "Over the centuries, musicians have accommodated this discrepancy, called a comma, through a huge number of tuning systems — well over a hundred — but the prevailing system today, the one we all know and use, is equal temperament (ET)." --- We should, at this late day and in this late age, have the ability to create automatically self-tuning (and re-tuning!) pianos, which produce the "correct" tuning for particular pieces of music. ---- "Bach, Mozart and Beethoven may be great, but they are not great in any absolute sense because they are servants to tuning systems of their particular time and place." -- But the first half of that sentence doesn't follow from the second half, at all! xpkg: Generic Package Managment Software "xpkg is just such a bunch of generic package management tools. I've extracted these tools from the xjs project and these tools will be able to work on the client and server for any project that could benefit from package distribution on any operating system." "Wikipedia as a Public Good" (Freedom to Tinker) "For Wikipedia, there isn't so much a "free-riding problem" as there is a "free-riding opportunity": the more free-riders there are, the easier it will be to recruit new contributors down the road." MAGE-ML There's gotta be a better repository for information about MAGE-ML than this... 
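On the tuning "comma" quoted above, a small worked calculation. This assumes the discrepancy the Paper Cuts post means is the Pythagorean comma (twelve pure 3:2 fifths overshooting seven octaves); the post as quoted doesn't say which comma, and the function names are mine.

```python
import math


def pythagorean_comma():
    """Ratio by which twelve pure fifths (3:2) overshoot seven octaves (2:1)."""
    return (3 / 2) ** 12 / 2 ** 7


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def equal_tempered_fifth():
    """The fifth in 12-tone equal temperament: seven equal semitones."""
    return 2 ** (7 / 12)
```

The comma comes out to a bit under a quarter of a semitone, and equal temperament absorbs it by narrowing every fifth slightly, so that twelve tempered fifths land exactly on seven octaves.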
Code your own election mashup with Google's JSON data Contains links to Google's JSON-formatted election data with (I believe) county-level information. "Phoenix Suns shooters" (Andrew Gelman) Gelman posts the "heat plot" pictures of the Phoenix Suns shooting locations, that someone named "Yair" sent him. These are (a) the reason why I wanted to scrape that .swf NBA shooting chart, (b) subject to the same criticisms of the use of color for representing value on maps, and (c) totally relevant to the general idea of extracting data from pictures. "Can a blind person whose vision is restored understand what she sees?" (Cognitive Daily) "Overall, the researchers found that S.R.D. could complete nearly all of the tasks just as accurately as the others, although anecdotally it took her about 5 to 10 seconds longer than the people who had never experienced blindness. There were two exceptions. She wasn't quite as good at recognizing faces (though she still was more than 75 percent accurate), and she had a peculiar difficulty in judging gaze direction." "coloring your opinion" (Manifest Density) Tom's post about color scales (linear or otherwise) and their uses in election maps. The problem is, if anything, worse in the arena of biological graphics. Didn't I write something about that, at some point? If so, I need to dig it up. If not, I need to write it down. cart: home page Newman's cartogram creation software, including Java code (apparently). Election maps Mark Newman's diffusion-based cartograms for 2008, with a version that includes a "nonlinear color scale" at the very bottom. "Now: The Rest of the Genome" (NYT) Carl Zimmer interviews Tom Gingeras about the ENCODE project. "It’s Not all in the Genes Anymore" (NYT) Natalie Angier interviews Evelyn Fox Keller and Eric Lander. Cartogram (Wikipedia) "A cartogram is a map in which some thematic mapping variable – such as travel time or Gross National Product – is substituted for land area." 
To do: look for links to publicly available cartogramming software. (I'm assuming this is related to things like graph layout algorithms, right?) "The Shape of Alpha" (Code: Flickr Developer Blog) Generating map contours based on geotagged photographs. "bud selig, organizational rules, and authority" (orgtheory.net) Competing conceptions of "authority" (rational-legal vs. cognitive) -- with reference to the near-fiasco during the World Series, and to Bud Selig. "Words and Credit Scores" (Social Science Statistics Blog) Word-frequency models for P(default), based on loan applications from P2P lending sites. Filing this away for use at a later date, when (a) I have more money, and (b) it wouldn't be insane to lend it to someone on "a P2P lending site." "Inside Obama" (John Derbyshire) Derbyshire's so smart, he can *intuit* factor loadings in his own brain! No need for your fancy linear models with hidden variables. Also, I note his line "heritability of IQ, a well-established fact..." without additional comment. (It's still funny, that Derbyshire is somehow the "science guy" at The Corner.) "Your Shrinking Sense of Humor" (PHD Comics) "That's not normal, that's a Levy skew alpha distribution!" This is a 100% correct observation. Haystack Blog The Haystack folks are keeping a blog. This may also be of interest to some of you... Edward Benson Ted is one of my teammates for my 6.830 final project, this semester. Infovore » Blog All Dog-eared Pages: This Gaming Life, by Jim Rossignol SSRN-The Economics of Structured Finance by Joshua Coval, Jakub Jurek, Erik Stafford "It's an ill Lind that blows no minds" (Julian Sanchez at Ars Technica) I don't know about the rest of his comment, but Sanchez's opener is the best sentence I've read all day (granted, it *is* only 10am): "... those who study the past are condemned to imagine it repeating itself..." System Biology: Ready, or Not?. 
In the Pipeline: "Systems biology – depending on your orientation, this may be a term that you haven’t heard yet, or one from the cutting edge of research, or something that’s already making you roll your eyes at its unfulfilled promise. There’s a good spread of possible reactions." --- Derek Lowe provides some general reaction to "systems biology" in the large, but doesn't seem to reach any particular conclusions about it... When Can You Trust Economics Papers? - Economics Blog - Zubin Jelveh Daniel Davies on Steven Levitt: "it looks to me as if Levitt is using quite strong instruments." But Jelveh's whole (long) post is deep and thorough and worth reading, on the topic of academic publishing, blogging, and media coverage of science. This is important stuff. Pew Research Center Reports on Election '08 "Me want dataaaaa...." AITopics / ArtificialIntelligenceAndMolecularBiology PDFs of the out-of-print book, "Artificial Intelligence and Molecular Biology." When I first saw this, I thought, "meh." But then I looked a little closer -- chapter 2 by David Searls, chapter 8 from Peter Karp, and few others that look quite interesting as well. The whole thing is a little old, but good ideas never go out of style. Definitely worth a closer look. Carlin and Louis, third edition (Andrew Gelman) Gelman gives his opinion of the Carlin & Louis book. I randomly bought an earlier edition of C&L when I was in college, at a used book store. So now I've got G&R sitting next to C&L on the bookshelft, and I can actually follow some of his comments -- how seredipitous. Statistics 36-350: Data Mining (Fall 2008) Cosma Shalizi's course in data mining (that he's currently teaching, I guess?). Lectures 10, 11, and 12 have probably the best basic description of PCA that I've seen in one place, from first principles. shotChart_02.swf (application/x-shockwave-flash Object) ESPN Flash SWF for NBA shot distributions. Aiming to see if this can be scraped in a reasonable way... 
Comic Book Resources > CBR News: Rubio & Boatwright Sing The "Cemetery Blues" Ryan Rubio's writing graphic novels?? That's pretty great. Cindy McCain Claims She’s ‘Just Like Any Other Female Human’ (The Onion) Why hadn't I seen this already *before* the election? GSOIS--The Graduate School of Operational and Information Sciences at NPS David Alderson's publication list. www.lshift.net/election: Obama v McCain - battleground graph "Entropy, Diversity and Cardinality (Part 1)" (The n-Category Café) "André Joyal explained to me that he likes to think of entropy as something like cardinality. More precisely, the exponential of entropy is like cardinality." "The evidence of lived experience" vs. the statistics (Andrew Gelman) 'razib' shows up in the comments, makes cracks about The Corner. Cognitive dissonance! "The Kind of Email I Don't Need" (The n-Category Café) "I hope these are jokes rather than the works of seriously deluded minds. " John Baez is just *asking* to be creatively joke-spammed. Polenta: Centuries-old Italian dish a proud link to a pleasant past We could probably get a little more adventurous here... "What is causal about a causal dag?" (Causal Analysis in Theory and Practice) Judea Pearl writes about "causal dags" --- "The phrase "what is causal about … " can be answered at three distinct levels of discussion: 1. Interpretation, 2. Construction and 3. Validation. ... There is, I admit, some finger crossing in such judgment, as there is in any judgment, but the amount of guesswork is much much less than in the highly respected "Let us assume strong ignorability" which no mortal understands, except through translation into "no correlated hidden factors."" "Computer Science AP Test" (My Biased Coin) "Several people said, "if you wonder why kids are turned off on CS in high schools, just look at the questions on the AP exam to see what image of the field is projected." 
Some proposed, half-seriously, that we ask the College Board to rename the existing course and exam "AP Computer Programming" (no dice, it has to be the name of an academic department in colleges and universities)." ---- It's an interesting question, to ask "what a proper high-school course in computer science would look like." I wonder if you could title a class something like "Problem Solving," starting off with, "what is a problem, what is a solution, and what is a recipe [algorithm] for finding solutions to a problem?", and based around solving a series of exercises over the course of a semester -- sometimes with a computer, and sometimes without one.
dpkt - Google Code
"fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols."
"Combining high thoughput screens with small biology to gain insight" (The Daily Transcript)
Meso-scale biology research.
"The Case for Crude Measures" (Yglesias)
"The best you can hope from a regulatory regime is that it will be a satisficing solution wherein some fairly crude rule will improve on the outcomes generated by the unfettered market. When that’s not the case, we may as well let the market go unfettered even though that, too, will be somewhat sub-optimal. ... We shouldn’t count on being to fine-tune our results to perfection, we should either lean in with a heavy hand or else stay away." -- Right, so what he's outlining here is the case of some kind of regulatory *overfitting*. I wonder if we couldn't be talking about the complexity of different classes of regulatory control. "The VC-dimension of a set of laws which control the overall size of banks and their capital reserves is ..."
"Obama is a Hater?" (John & Belle Have A Blog)
Belle FTW!!1! I was reading only the second paragraph of this post, when I immediately realized what the correct punchline would be. Indeed, I scroll down to the end and -- there it is! B.W. writes the jokes the rest of us only repeat silently in our hearts.
I only wish La was on del.icio.us (indeed, online anywhere?) so that I might send this to her.
"theoreticians and the polls" (scatterplot)
Cowen Index: 2
"Atypical Typing" (The Universe of Discourse)
MJD's talk at OOPSLA 2008, plus extra anecdotes. Dweeb languages are awesome.
Tactical Assassin Substratum | GameKrunch
UMass Amherst CHFA Visioning Project :: Data-Rich Humanities Research
"This massive influx of new data means that, for the first time in history, results obtained from close analysis of specific texts can readily be tested against the quantitative characteristics of entire literary genres, styles, and dialects." Via the Language Log.
“There never was such a fog” (Edge of the American West)
"On this date sixty years ago, a poisonous fog descended on the small town of Donora, Pennsylvania."
typeface.js -- Rendering text with Javascript, <canvas>, and VML
"typeface.js uses browsers' vector drawing capabilities to draw text in HTML documents. For a good while, browsers have had support for vector drawing -- Firefox, Safari, and Opera support the <canvas> element (as well as SVG), and IE supports VML."
Saturday Night Live - Road to the White House - Video - NBC.com
"How I write JavaScript Widgets" (Peter's Blog)
Basically, I need to look at more people's code.
"My Personal Work Incentives" (Greg Mankiw's Blog)
Greg Mankiw feels the soidenfreude -- "The bottom line: If you are one of those people out there trying to induce me to do some work for you, there is a good chance I will turn you down. And the likelihood will go up after President Obama puts his tax plan in place. I expect to spend more time playing with my kids. They will be poorer when they grow up, but perhaps they will have a few more happy memories." -- An Obama win, Mankiw working less and playing with his kids more? Are you kidding me? It's a win-win-win situation.
Prototype - If No One Sees It, Is It an Invention? - NYTimes.com
Johnny Lee, in the NYT.
"Against Obedience" (An und für sich) Kotsko and the consitutive exception -- "God is the constitutive exception — to get an order in which ownership or the demand for obedience are disallowed, God has to be able to do those very disallowed things. (In both cases, you can say, “But of course God’s different!” — but that doesn’t displace the logic of the constitutive exception so much as assert that it’s okay.)" Lepage, "Lattice QCD for Novices" (arXiv) Because we like thinking about numerical computing and simulation for scientific purposes... "Personal data integration (part 1)" (kiwitobes.com) "I’ve been toying with the idea of attempting “semantic integration” of a lot of personal data in my life. I’ll be sure to share more later, but so far I’ve managed to pull together my September phone records, my email history, my contacts, my calendar and my Facebook friends (via the API, not something sketchy!) into a single triple-store." THE TENORION - You want one! Just blew my mind. (http://www.youtube.com/watch?v=N6tLRCDqJ2c) Look at the graphic at the bottom, with layers, like photoshop but for sound. feedly: a more social and magazine-like start page Sold! (Great rec, John.) Enceladus up close - The Big Picture - Boston.com I'm assuming you've seen these, Dad? "I hate BIC blah blah blah" (Andrew Gelman) "Or is the point that it's problematic to use any approach that tries to approximate the marginal likelihood when you don't have proper priors? [Gelman's] reply: Yes, to that last question." "15.501 Introduction to Financial and Managerial Accounting, Spring 2004" (MIT OCW) "The distinction between recording and presenting - and what it means for an online lab notebook" (Science in the open) "This also plays into the discussion I had some months ago with Frank Gibson about the use of data models. There is a lot to be said for using a data model to present the description of an experiment. 
It provides all sorts of added value to have an agreed model of what these descriptions look like. However it is less clear to me that it provides a useful way of recording or capturing the research process as it happen, at least in a general case."
"How will we interact with all this data?" (Nodalities)
A question, noted without comment.
Books Available for Scan-On-Demand (Boston Public Library & Open Library)
"We are excited to announce a new program to allow users and patrons to "scan-on-demand". It's simple to use: just search for a public domain book on OpenLibrary.org and, if it's at the Boston Public Library and hasn't been scanned yet, there will be a "Scan This Book" button on the left-hand side. When you click the button and follow the steps to confirm, we'll have a librarian go and get the book from the stacks, bring it to our scanning center, and have our team of scanners digitize it page-by-page. Within 3-5 days, you should receive an email follow-up with a link to the newly-digitized copy, complete with PDF, online flip book, full text (using OCR technology) and more. All thanks to your request."
Meet Rick Reilly's fantasy football partner: Barack Obama - ESPN The Magazine
"Likes to bait Hall of Famers. Check."
Will Ferrell's Anchorman meets Max Headroom - there is a lot to say, of this we are sure
My Twitterboard's about to explode, here.
spread_1.jpg (JPEG Image, 500x292 pixels)
From Piketty and Saez. (Linked to by Yglesias here: http://yglesias.thinkprogress.org/archives/2008/10/spreading_the_wealth_around.php)
"Jane Jacobs on Experts" (Seth’s blog)
I (honestly) don't find this quote damning at all. This is exactly how being a graduate student generally works (at least, around here): someone sits you down and says, "you are now our expert in implementing dynamic programming methods for calling segments in microarray data." Or, "you are now our expert in motif discovery methods that take sequence conservation into account." Or, whatever.
And you totally aren't an expert, at first. But by the time you graduate, you know a lot more than you did when you started. This anecdote just makes me think that Jane Jacobs had a really poor understanding of Learning and its role in Culture -- which I don't think she did. So something else must be wrong.
Olde Boston footage - Brainiac
Film footage of a 1903 trolley ride (today's Green Line) through parts of Downtown Boston. The view of (I think) Copley at the end is ... dramatic.
"Keeping it real with the FriendFeed Real-time API" (FriendFeed Blog)
John (or Bryan), have either of you guys tried out FriendFeed before?
"Blacklight Power: What on Earth?" (In the Pipeline)
"No, Occam’s Razor doesn’t leave much stubble behind when you run it over Blacklight Power." --- In the course of discussing a totally banal piece of corporate psycho-ceramica, Derek Lowe turns an awesome phrase.
United, Citing Fuel Hedging, Lost $779 Million in Quarter - NYTimes.com
"The UAL Corporation, the parent of United Airlines, said Tuesday that it lost $779 millon in the third quarter, largely because of a noncash charge that reflected the declining value of its hedging contracts for jet fuel." --- Does this answer my longstanding question of why, pace Andrew Samwick, more airlines don't hedge these kinds of fuel costs? Reason #onebillion that I'm not a futures trader, right? What's Your Elevation? An interface to Bryan's (Yahoo Pipes-driven) application, that takes your Dopplr feed and displays a distribution of elevations of places you visited. Very impressive! "Impressions of China, Part 1: Don’t Yield to Pedestrians" (in theory) "I have grown up in Rome, and I assumed I knew all there is to know about crossing the street, but I felt out of my depth in Beijing. Waiting for the green signal did not seem to help, and following other people was dangerous too, because they would suddenly stop and I would stop one step futher, uncomfortably close to a bus zooming by. Then, this time, I got it. Cars can turn right on a red light, and have always precedence over pedestrians and bikes. This means that all there is to it in crossing the street is to ignore the pedestrian lights, cross when it’s red for the cars, always stop to let cars turning right go through, and be aware of bicycles." --- There's a study waiting to be done, of region specific "yielding rules." A brief digression on lost time: John Hodgman on TED.com "What, in the alien abductee community, they call a 'screen memory.'" -- I know that loving Hodgman is cliche and wrong, but damnit, I love Hodgman. Florida or Ohio? Forecasting Presidential State Outcomes Using Reverse Random Walks | Red State, Blue State, Rich State, Poor State To read. Genome Database Will Link Genes, Traits in Public View - washingtonpost.com The washington post article on Church's PGP. 
"The DNA Age - Taking a Peek at the Experts’ Genetic Secrets" (NYT) The NYT article on George Church's "PGP 10" -- his public genome-and-phenotype information database. Written by Amy Harmon, who (rightly) mentions the issue of privacy not just for oneself, but for one's relatives: parents, children, siblings, and even more-distantly-related relatives. Church requires every participant to have "the equivalent of a Master's in genetics," but why is this clear that this is enough? "conplot - a console plotter" (Brendan O’Connor’s Blog) ASCII-art data visualizatoin. Warfare 1917 | Armor Games 18.098 Street-Fighting Mathematics, January (IAP) 2008 (MIT OpenCourseWare) The course notes to Sanjay Mahajan's IAP course. "Who pays the income tax?" (Greg Mankiw's Blog) And the equivalent question, "who has the money?", has the same answer too. Cowen Index: NaN, because Mankiw (probably rightly) has no comments. "three easy transportation tips" (outta mind outta site) Why are we all so angry about biking/driving/walking? Can't we all just get along? (Clearly I've taken this personally in the past, from the pedestrian standpoint. I know how many glass houses I own.) "Perverse Incentives in Polling" (Matthew Yglesias) It'd be sort of funny to hear a take on Journalism as a sort of high-pass filter on the political time-series. "Design Concepts in Programming Languages is now available" (Lambda the Ultimate) Dave's book is out! The coursebook for the class I TA'ed last semester. Mastermind: World Conqueror "Density and Partisanship" (Matthew Yglesias) Cowen Index: 9. Let there be markets: The evangelical roots of economics—By Gordon Bigelow (Harper's Magazine) Reasonably open to parody. A Softer World: 364 "If you play the opening wrong, the game is already lost." "Personal Days: The Penultimate Post" (Andrew Gelman) "... 
which reminds me of a particularly asinine passage in the incredibly overrated Godel Escher, Bach, which for some horrible reason I remember after nearly thirty years..." -- Gelman also hates Hofstadter.
McCain and Obama Palling Around? Must Be the Al Smith Dinner - The Caucus Blog - NYTimes.com
"Those stupid bankers and their stupid stupidity" (Crooked Timber)
"This [a constant recession] might actually be the kind of economy that Taleb would prefer, as it would afford limitless opportunities to sit at home, stroke our beards and sagely note that nobody really knew the nature of the risks one takes by getting out of bed. In a very dignified, aesthetic way." -- DSquared is a jerk, but dammit, he's a *funny* jerk who writes really well.
YouTube - Obama Talks to Joe
A video of a part of the debate, with the dial-testing lines embedded in it. (I can't find an archived online version that also has them.)
Silverlight 2 Ships - MicroMiel
"Deep Zoom: Enables unparalleled interactivity and navigation of ultra-high resolution imagery"
"Learning from the Marines" (Reasonable Deviations)
"The Marine Corps teaches you how to be miserable. This is invaluable for an artist." Reading this, I was thinking, "this must be the function of graduate school, too." And then I got to the end and, yes, there is the requisite mention of graduate school.
historicalarchives | The Onion
Via Languagehat. An Onion front page from 1783. Including attacks upon "that Rogue Noah Webster." This is just pure genius.
Graphs that lie (Phil Gyford: Writing)
Kottke linked to this, writing that Gyford "puts on his Tufte trousers" and fixes a graph by "plac[ing] it on a scale" that "shows its full value." But this is a graph of the FTSE 100, which (as the *first* commenter points out), has a baseline of 1000 (not zero, as Gyford's revised graph makes it seem). When it's pointed out to him, Gyford responds, "I'm not sure why the graph should then start at 1000.
Unless it was technically impossible for its value to ever go below that number, which seems unlikely." A simple trip to the "charts" section of Yahoo Finance would have put this right. Blind leading the blind, man. I'm getting to the point where, when I see a reference to Tufte, I give it 50/50 odds of being something blazingly stupid.
"The Sun" (The Big Picture)
Richard Thaler, "Anomalies: Saving, Fungibility, and Mental Accounts." [PDF]
"People do not appear to treat pension wealth as a substitute for other wealth."
Mattie M-dog has a dog-blog.
NBT: "Next Generation Sequencing"
The table of contents for the Nature Biotech issue on next-gen sequencing.
"Marginal MCMC: The tool to solve our statistical computing problems?" (Andrew Gelman)
"This seems similar to the parameter expansion or redundant parameter idea developed by C. Liu, J. Liu, Meng, Rubin, van Dyk, and others, but perhaps a bit more generalizable and thus usable in routine problems."
"No more tail calls in Javascript?" (Lambda the Ultimate)
You can't properly talk about functional programming in Language X, unless Language X has (among other things) proper tail calls. "This seems misguided. The user can implement functional data structures but not tail calls (without whole program transformation), so the later are much more valuable than the former. Furthermore, as a functional programmer I'm quite happy to use mutable data structures but I would certainly miss tail calls. Finally, every JS implementation is already shifting to code generation because straightforward implementation techniques are too slow for the existing idioms used in JS code."
"Scientific Zeitgeist vs. “Independent” Discoveries" (LingPipe Blog)
I've had this blog-post open in Firefox for the longest time. It's about resolving annotation-errors in NLP, using variations of the item-response model (which, like the author, I first learned about from Gelman and Hill's book).
But the larger point I keep coming back to here is one of -- in a larger, modeling and "learning" setting, separating out the inferences we make as part of our models from the data itself. In some sense, the models/parameters/what-have-you that we learn should be analogous to annotations-on-text made by the parts of speech analyzers. Except that it's annotations on data. Conflicting annotations == conflicting models. Feyerabend and anarchy and inconsistency and machine learning. Here is where I trail off.
"Kevin Trudeau: A Bit of Good News." (In the Pipeline)
"Kevin Trudeau, the infomercial king who makes his living slandering drug research and feeding conspiracy theories about diet and health, has been fined $5 million dollars over the marketing of his weight-loss book. He's also been banned from the infomercial business for three years, and found in contempt of court." --- Although Trudeau is a malevolent insane person, it is true that his infomercials were useful for invoking a kind of surrealist/absurdist dizziness at 3 in the morning. I admit that I'll (sort of, kind of, not really) miss him. I guess now it's up to the Dual Action Cleanser guy to entertain me.
Course Outline (Course: OCRopus)
Programming for OCRopus, statistical layout techniques, some image processing stuff... a whole wealth of links that look interesting and (potentially) useful.
"Obama’s Plausibility plus Watching Watchmen" (Crooked Timber)
"Notional maverickiness, plus liberalism=maoism, is rhetorically unperformable. Obama isn’t practicing any kind of crouching Rezko hidden Ayers plausibility fu. It’s just impossible to slide smoothly from ‘reaching across the aisle to get the job done’ to hinting that those on the other side are a bunch of terrorist-huggers. Positive and negative cancel out."
Efficient Markets Hypothesis: Bibliography
"Counting Electronic Votes in Secret" (Freedom to Tinker)
"The Election Board said this election is “too important” to permit extra people in the polling place," and denied a (lawful) request for authorization to observe the vote-counting process. Ridiculous. Risible. Dangerous.
"Forthright civil servant to be PM's security adviser" (The Guardian)
The classic Mottram quote: "We're all fucked. I'm fucked. You're fucked. The whole department's fucked. It's been the biggest cock-up ever and we're all completely fucked."
CarTel [MIT Cartel]
"CarTel is a distributed, mobile sensor network and telematics system. Applications built on top of this system can collect, process, deliver, analyze, and visualize data from sensors located on mobile units such as automobiles. A small embedded computer on the car interfaces with a variety of sensors in the car, processes the collected data, and delivers it to an Internet server. Applications running on the server analyze this data and provide interesting features for users. "
The Borealis Project
Distributed stream-processing system.
Duffie and Hu, "Competing for a Share of Global Derivatives Markets: Trends and Policy Choices for the United States" (SSRN)
"This is a preliminary study on the status of the U.S. in the global market for derivatives-related services. We include some of the policy choices available to enhance this status. We begin with a review of the importance of active and efficient derivatives markets for the U.S. economy. We then analyze the status of U.S. derivatives-market service providers in both over-the-counter and exchange-based markets. We then discuss factors that play a role in determining where a provider of derivatives services is located."
Md. Police Put Activists' Names On Terror Lists - washingtonpost.com
Would be hilarious, if it weren't so predictable. "The Maryland State Police classified 53 nonviolent activists as terrorists ... The former state police superintendent who authorized the operation ... said the program was a bulwark against potential violence and called the activists 'fringe people.' ... the activists' names were entered into the state police database as terrorists partly because the software offered limited options for classifying entries ... The police also entered the activists' names into the federal Washington-Baltimore High Intensity Drug Trafficking Area database, which tracks suspected terrorists. One well-known antiwar activist from Baltimore, Max Obuszewski, was singled out in the intelligence logs released by the ACLU, which described a "primary crime" of "terrorism-anti-government" and a 'secondary crime' of 'terrorism-anti-war protesters.' ... Hutchins said some names might have been shared with the National Security Agency."
"Nobel Prize Blogging: Symmetry Breaking" (Good Math, Bad Math)
John Armstrong, in comments, is the best math expositor working on the web today. His blog (Unapologetic Mathematician) is really must-reading.
A political clash of Shakespearean proportions - Brainiac - The Boston Globe
Stephen Colbert interviews Stephen Greenblatt, and it is suitably awesome.
Shental, Bickson, Siegel, Wolf, & Dolev, "Gaussian Belief Propagation for Solving Systems of Linear Equations: Theory and Application" (arXiv)
Haven't read it yet. The thing I'll be looking for, I assume, will be a direct relationship to one of the iterative methods for QR factorization? Is that even right? I'm not sure.
"Generational Myth: Not all young people are tech-savvy" - ChronicleReview.com
"Every class has a handful of people with amazing skills and a large number who can't deal with computers at all. A few lack mobile phones. Many can't afford any gizmos and resent assignments that demand digital work. Many use Facebook and MySpace because they are easy and fun, not because they are powerful (which, of course, they are not). And almost none know how to program or even code text with Hypertext Markup Language (HTML). Only a handful come to college with a sense of how the Internet fundamentally differs from the other major media platforms in daily life."
"The great global cooling myth" (Climate Feedback)
Debunking the myth of a "consensus" on "global cooling" by looking at historical publications. "Thomas Peterson of NOAA teamed with William Connolley of the British Antarctic survey and science reporter John Fleck to create a survey of peer-reviewed climate literature from the 1970s. Looking at every paper that dealt with climate change projections or an aspect of climate forcing from 1965 to 1979, they were able to assess the ‘trends’ in the literature. They found that only 7 of the 71 total papers surveyed predicted global cooling. The vast majority (44) actually predicted that rising atmospheric carbon dioxide could lead to global warming."
How to embalm a dead body
"The closer to its living self a body looked, the happier a family would be. And keeping families happy, I'd learn as the night went on..." People are crazy. As my own recent family experienced proved, this is probably the exact *opposite* of what a lot of people want.
FLCDataCenter.com
"H-1B Program Data: A Labor Condition Application (LCA) is used by employers as supporting evidence for the petition for an H-1B visa. Only first issuance H-1B visas are subject to the legislated numeric limitation. DOL disclosure data does not indicate the employer's intended use for the LCA." -- Data on job applications and salaries. Awesome. (Via orgtheory.)
medical reviews of house
Why had I not already guessed that something like this must exist? And it contains recaps of most episodes from all five seasons. Like TwoP, but written by MDs.
"Slightly Deleterious in Trans" (evolgen)
More US Warcraft players than farmers - Boing Boing
Misses the point entirely. (Via BLDGBLOG).
"Semantic Schema Matching"
Another Giunchiglia paper, but I don't think he means (by "semantic matching") what other people (say, Moses or Sussman) mean.
Richard Fateman, "The user-level semantic matching capability in MACSYMA"
Fateman's 1971 paper.
Giunchiglia & Shvaiko, "Semantic matching" (The Knowledge Engineering Review)
A Softer World: 361
I think this is your new investment strategy... (As usual, the mouseover is key.)
"Earth From Above comes to NYC" (The Big Picture)
I didn't realize that the permanent multi-story wall photographs on the first floor of the Student Street here in the Stata Center were Arthus-Bertrand photographs (including at least two of those shown in this series). Some of these are really spectacular.
Calcott, Balcan, and Hohenlohe, "A Publish-Subscribe Model of Genetic Networks" (PLoS ONE)
"We present a simple model of genetic regulatory networks in which regulatory connections among genes are mediated by a limited number of signaling molecules. Each gene in our model produces (publishes) a single gene product, which regulates the expression of other genes by binding to regulatory regions that correspond (subscribe) to that product." --- supported by ... investigations of degree distributions and simulations of boolean networks (ack!), but still: cute idea.
Ripped from the headlines by John Sides at the Monkey Cage. Most striking (and heartening, I suppose, from one point of view) graph I've seen all day.
Databases, RDF, graph-based matching tools, and even a paper on "practical" suffix-tree construction.
Saturday Night Live - VP Debate Open: Palin / Biden - Video - NBC.com
Genius.
Xml and semi-structured data bibliography - Scratchpad Wiki Labs - Free wikis from Wikia
Dawn of Low-Price Mapping Could Broaden DNA Uses - NYTimes.com
$5000/genome, next year. So they say. The story is half news-article, half press-release.
Apache POI - Java API To Access Microsoft Format Files
"The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format using pure Java. In short, you can read and write MS Excel files using Java. Soon, you'll be able to read and write Word, PowerPoint and Visio files using Java. POI is your Java Excel solution as well as your Java Word solution. However, we have a complete API for porting other OLE 2 Compound Document formats, and welcome others to participate."
Introduction to Arrowlets
Arrows in JavaScript. I'm kinda amazed.
"WaPo "Fact Checkers" Blow it ... Again" (ArmsControlWonk)
Jonathan Weisman is a hack. (And if you read the comments, PBS News Hour gets caught up in the mistake too. That's too bad.)
"Another One for the Machine" (Kevin Kelly)
Be a Scientist, guy.
"What's Wrong with Economic Theory as Presented to the Public ?" (Angry Bear)
"Once a model has been put in textbooks, it becomes immortal invulnerable not only to the data (which can prove it is not a true statement about the world but no one ever thought it was) but also to further theoretical analysis." -- and because I see everything through the lens of what-I'm-reading-at-the-moment, I'd say that this entire diatribe could be fruitfully compared, mutatis mutandis, with Feyerabend's description of Galilean and Copernican cosmology.
NIPS*2008 Workshop - Probabilistic Programming
Sylvester - Vector and Matrix math for JavaScript
MIT-Licensed linear algebra routines for JavaScript. Can you see where this is going? :-)
JSamp: A sampling profiler for Java
Evan Jones's all-Java sampling profiler, using the Attach API.
Famous Awk One-Liners Explained, Part I
SED Tutorial with examples
A reasonable tutorial, Dad. I'll dig some up for Awk too.
"Simpsons actor reads 'stay calm'" (BBC NEWS) Harry Shearer-as-Walter Cronkite reads a British nuclear-warning from the '70s. Chain, Ilieva, & Evans, "Single-Species Microarrays and Comparative Transcriptomics" (PLoS ONE) Cybenko & Crespi, "Learning Hidden Markov Models using Non-Negative Matrix Factorization" (arXiv) I didn't realize that Cybenko was still kickin' around... "We present an HMM learning algorithm based on the non-negative matrix factorization (NMF) of higher order Markovian statistics that is structurally different from the Baum-Welsh and its associated approaches." (Shouldn't it be Baum-*Welch*?) Mona Charen apparently can't figure out how to use "The Google" I'm sure lots of people thought that the "Bosniak" thing was a mis-step on Biden's part. Only a few of us, though, had the good sense to blog about it on a high-traffic blog where we could be publicly corrected. "The Federation Attack Fighter" (Matthew Yglesias) "But as several readers have pointed out, there is such a thing as the Federation Attack Fighter that gets used in large numbers during the Dominion War once the Federation put itself on a war footing." -- Addresses the "no CGI was cheaper in the '60s" argument, too. "Maps for Advocacy: An Introduction to Geographical Mapping Techniques" (Tactical Technology Collective) "The booklet is an effective guide to using maps in advocacy. The mapping process for advocacy is explained vividly through case studies, descriptions of procedures and methods, a review of data sources as well as a glossary of mapping terminology. Scattered through the booklet are links to websites which afford a glance at a few prolific mapping efforts. " Bosniaks - Wikipedia, the free encyclopedia Apparently *not* a Biden disfluency in the debate last night. I stand corrected. Voting insecurities - Hack a Day "UCSB researchers demonstrated how disturbingly easy it is to hack into Sequoia’s e-voting systems and delete or add votes with little more than a USB key." 
And of course, they've suppressed the effin' report on these systems in NJ until after the election. Thanks a lot, jerks.
"Hierarchical Bayesian Models of Categorical Data Annotation" (LingPipe Blog)
A two-page writeup of his BUGS models.
Calvanese et al. "Cancer Genes Hypermethylated in Human Embryonic Stem Cells" (PLoS ONE)
"We report here that silencing of a significant proportion of these TSGs in human embryonic and adult stem cells is associated with promoter DNA hypermethylation." --- they had better not just be reporting a correlation.
"Imperfections, Ambiguities and Physics" (The n-Category Café)
"I mentioned Lautman’s association of, on the one hand, Descartes’ argument to the existence of a perfect being (God) from an awareness of his own imperfections with, on the other, a mathematical argument to the existence of an algebraically closed field from the inability to factor polynomials in a given field, or to the existence of a simply connected space from the inability to contract all loops in a given space. This perfection/imperfection ‘dialectic’ involves our realising from an imperfect state that there is a perfect state, and also from the nature of the imperfections what are the attributes of the perfect state."
"Twelve Virtues of Rationality" (kottke)
Beware of people who describe something (curiosity, in this case) as a "burning itch." I guess that's why all his friends call him Whiskers, right?
"Let’s build an MP3-decoder!" (blog.bjrn.se)
Step-by-step explanation of the MP3 format and encoding/decoding principles, with code written in Haskell.
Cascading
Dataflow language, in Java, on top of Hadoop. Via Simon Willison.
Microsoft Research DataDepot - Home
It was only a matter of time, until MSR got into the game. Right, John?
Matthew Yglesias » Nerds at War
"I’m not sure that we ever get a really clear sense of which kind of fighting the Imperial Fleet prefers, since the Fleet does nothing during the Battle of Endor." Mistake #1!
Why Oh Why Can't We Have a Better Corps of Nerds??? Despite this glaring error, however, this entire post and its subsequent threads have been one of my favorite things on the interwebs today. Especially comment #27, the implications of which are mindblowing.
TCP/IP Illustrated
The whole Stevens book, online.
PyX - Python graphics package
Python plotting and charting, with output to PDF and PS.
Freeman Dyson, "1951 Lectures on Advanced Quantum Mechanics Second Edition"
Alchemy - Open Source AI
Implementations of, among other things, the Markov Logic Network stuff. Linked to from Pedro Domingos's ongoing class (thank you, Prof. Shalizi!), for which there are now five lectures online.
bibapp - Google Code
"BibApp matches researchers on your campus with their publication data and allows you to mine the data to see collaborations and to find experts in research areas. BibApp makes it easy to see what publications can be archived for greater access and impact and makes it easy to push those publications directly into an institutional or other repository."
13.6 The InnoDB Storage Engine (MySQL Reference Manual)
6.830 is making a lot of this a lot less mysterious.
David Foster Wallace, "Democracy and Commerce at the US Open"
Tennis.com puts up DFW's piece on the Sampras/Philipoussis match from '95.
"A Brief Tour of Graphd" (The Freebase Blog)
graphd, Freebase, triple stores.
"On Dumping Palin" (FiveThirtyEight.com)
"It’s already the Obama v. Nobama election – you overhear it in all the volunteer-to-volunteer discussions. “Obama Scares Me” is not just the unofficial motto, it’s actually a button we’ve seen sported."
"Obama Runs Constructive Criticism Ad Against McCain" (The Onion)
"Those are some scathingly helpful suggestions." We're cool, right? Via Brad DeLong.
"More untimely stuff about disability" (Michael Berube at Crooked Timber)
I think the word 'troll' has too negative a connotation.
Here is (essentially) a troll's-eye view of a certain portion of a conference, Berube as the Righteous Troll, and it's satisfying in a way that crime dramas are satisfying when you watch the edgy cop beat up the serial killer in the interrogation room. Anyway, be a Dear Reader all the way to the end.
"War For The White House Blog" (The Onion)
Don DeLillo blogs for the Onion. (Panasonic!) I am literally unable to tell if this is for "real" or not.
Yahoo! Query Language - YDN
23 Personal Tools to Learn More About Yourself | FlowingData
John, this is the FlowingData post about tracking software and websites, that I mentioned yesterday and that dissuaded me from writing a post about Mycrocosm and Daytum on my own.
John Platt, "Strong Inference"
Assigned as reading in R's "research concepts" class. Weird.
"Earmarks and the Ridicule of Science" (Uncertain Principles)
"Explaining why a study of bear DNA isn't ridiculous takes long enough that the average uninformed voter will tune out long before the key point is reached." Really? Has someone actually tried this? At any rate, the general point that "science is too important not to be funded by the government at a large scale" might be right -- but asserting it just begs the question. Try telling this to (a) Robin Hanson, or (b) anyone from the Broad Institute.
"McCain's Beef with Bears?" (Pork: Scientific American)
"This is not pork barrel at all," says Richard Mace, a research biologist with Montana Fish, Wildlife & Parks (FWP). "We have a federal law called the Endangered Species Act and [under this law] the federal government is supposed to help identify and conserve threatened species." The grizzly has been listed as a threatened species since 1975 and scientists say that it is essential to get a handle on the population to preserve it.
But, according to Kendall, until the feds decided to invest in this grizzly bear DNA study, researchers lacked the funds to conduct research at the scale necessary to get a reliable measure."
Owens Library APA Citation Style Examples
Library -- Citing Legal Materials in APA Style
Nashoba Valley Winery, Orchard and J's Restaurant.
Heading here with R&SA, on the 26th.
"Lafayette, we are here. Part II." (The Edge of the American West)
"The British and French winced when they heard the Americans’ confidence; it reminded them of their own confidence in 1914. Haig and Foch worried that Pershing had planned another Somme or Passchendaele, one that would end in sanguinary failure. They were right, and wrong." --- Can't wait for Part III.
"Self-Awareness" (The Unapologetic Mathematician)
I would assume most teachers would kill to have a chance to write a note like this on an exam?
"STRAIGHT OVTTA LONDOUN" (Geoffrey Chaucer Hath An Extreme Blog: Go England! It ys Rad!)
Old, but good. "Thou too, churl, an thou swyvst with me! \\ The marshalsea shal nede to detainen me \\ Off of youre culorum, thatte ys how ich am goinge outte: \\ For the drastye lollarde traytors, that ys showinge outte."
"To the citizens of the United States of America from Her Sovereign Majesty Queen Elizabeth II" (The Monkey Cage)
"Her Sovereign Majesty Queen Elizabeth II will resume monarchical duties over all states, commonwealths, and territories (except Kansas, which she does not fancy). Your new Prime Minister, Gordon Brown, will appoint a Governor for America without the need for further elections. Congress and the Senate will be disbanded. A questionnaire may be circulated next year to determine whether any of you noticed."
"Electoral Projections Done Right: McCain Doubling Down on Debate?" (FiveThirtyEight.com)
My blood pressure is rising...
"The book's finished!" (Sustainable Energy - without the hot air)
David MacKay's book is finished, and on its way for final editing.
"A fair question." (The Edge of the American West)
Someday, somehow, I swear, I will have the intellectual gravitas or whatever-it-is so that when I ask a question about a book I'm reading, the author will just pop up and answer it for me on his or her blog.
"Lafayette, we are here. Part I." (The Edge of the American West)
"Superb stuff (for military history)."
Decidedly Casual « Quantum of Wantum
Nothing really special about this post *in particular*, but looking back on the archives, I realize you've been blogging about Costco/Walmart/Big Box stores for a while. This might be an interesting thread to include in the blook. Also, we should track down any posts you've written about EBM. Also, it occurred to me that it'd be neat to pull out and have a separate index (at the back of the blook, say) of any poems we've ever quoted on the blog. It'd be like a little poetry appendix, which I know Dad would like too.
Questions for Charles Murray - Head of the Class - Interview - NYTimes.com
"The last thing we need are more pointy-headed intellectuals running the government." The *last* thing? Really? He must have a really weird internal ranking system -- but I think Murray is, at this point, simply trying to troll the readers of the Times. If you read the whole interview as a kind of performance art piece, then it's really rather droll.
Vacuum - Edward Vielmetti in Ann Arbor, Michigan 48104: Financial panic, name the year
"... this particular phenomenon being prolonged through a period of months and sometimes years." -- Are you trying to say I should be worried about my looming job search next year?
"Meltdown at CERN" (Not Even Wrong)
Eirik for the win! "Obviously this is the effect of the anthropic principle! In all the universes that the LHC were functioning and crossed it’s beams, black holes were created which swallowed the earth. Since we are all alive and see the LHC malfunctioning, this is evidence that miniature black holes can be created at LHC.
This is really the first experimental data the LHC have produced. It’s amazing how much new good science this anthropic prinicple makes possible!"
"Jeffrey Ely's mortgage proposal" (Marginal Revolution)
"True just sending money is not incentive compatible. But there is no reason to bail out homeowners. Just intervene in any mortgage default. Seize the property and continue making the mortgage payments. In the short run rent the property back to the homeowner." Is Tyler Cowen talking about himself in the third person, now?
David Robinson, Harlan Yu, William Zeller, Edward Felten, "Government Data and the Invisible Hand"
Exactly right: "Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large."
"Bad Probability and Economic Disaster; or How Ignoring Bayes Theorem Caused the Mess" (Good Math, Bad Math)
Assuming that two events are independent when they really aren't *isn't* the same thing as "ignoring Bayes theorem."
Cthulhu Roaster
Via John Armstrong. "Roaster of dogs, Eater of souls."
What is the Singularity? | The Singularity Institute for Artificial Intelligence
"The Singularity" is pretty obviously (it seems to me) the basis of the new JJ Abrams tv show, "Fringe."
"Intrade Betting is Suspicious" (FiveThirtyEight.com)
Someone is manipulating Intrade -- trading in Obama, McCain, and Hillary contracts (but not Biden?). That's funny.
"Things I saw while waiting for the train" (Andrew Gelman)
"When the sign says the train will be 0:05 late, it won't be 0:05 late. If it were going to be 0:05 late, they wouldn't say anything at all. In reality it will be 0:30 late." On the T in Boston, when a train is full, the conductor *always* says, "there's another train right behind us." But there (invariably) never is -- they just say that so that some people, who don't care as much, will stop trying to push into the subway cars.
LRB · Donald MacKenzie: What’s in a Number?
"Broker's ear," and the determination of the LIBOR.
"The Two Classes of Airport Contraband" (Schneier on Security)
Via Unlikely Word. Exactly right: "To fix this, airport security has to make a choice. If something is dangerous, treat it as dangerous and treat anyone who tries to bring it on as potentially dangerous. If it's not dangerous, then stop trying to keep it off airplanes. Trying to have it both ways just distracts the screeners from actually making us safer." Read the whole thing.
"Benjamin Fry and Data Visualization" (Social Science Statistics Blog)
Ben Fry at Harvard tomorrow (same time and room as the Gelman talk last week). Anyone want to go?
Oscar Wilde's The Decay of Lying
For jokes about art and life and the two imitating each other.
Aristotle's Rhetoric > The topoi of the Rhetoric (Stanford Encyclopedia of Philosophy)
The closest I can come, to the phrase, "Aristotle topoi". Is this what you were listening to, Ms. C?
"SpinSpotter unspun" (Language Log)
Mark Liberman spots some spin in SpinSpotter's own documentation. "...when I downloaded and installed SpinSpotter, and tested it on several dozen news and opinion pieces, it did nothing at all. I expected to see silly complaints, like the ones that you often get from "grammar checkers".
But it flagged nothing, right or wrong, as spin. ... Mr. Herman's description — and the company's explanation on its web site of "what we do" — are prime examples of spin. In fact, I'd go a little further, and suggest that Mr. Herman's description is beyond "spin", edging into the territory of good old-fashioned "lies"."
"Aaron Sorkin Conjures a Meeting of Obama and Bartlet" (Maureen Dowd, NYT)
Obscene. Aggressively middlebrow. I refuse to even read it.
"amusing footnote in Symmetric Functions and Hall Polynomials" (An Ergodic Walk)
There should be a dedicated blog for amusing footnotes.
Adnan Darwiche, "A differential approach to inference in Bayesian networks"
Citeseer X. (2000) Surprised I hadn't saved a link to this already...
O'Connor & Spitters, "A computer verified, monadic, functional implementation of the integral" (arXiv)
"Step functions are a monad."
Bagchi & Wells, "Graph-based Logic and Sketches" (arXiv)
"Sketches as a method of specification of mathematical structures are an alternative to the string-based specification employed in mathematical logic... Forms are a proper generalization of sketches: a form can have a model category that cannot be the model category of a sketch. ... Sketch theory has been criticized as being lacunary when contrasted with logic because it apparently has nothing corresponding to proof theory. In this monograph, we outline a uniform proof theory for all types of sketches and forms. We show that, in the case of finite-product sketches, this results in a system with the same power as equational logic."
arXiv papers of Georg Gottlob
Monadic Datalog
"BOLEYN." (languagehat)
"And let's not forget Reams." I decided, the other day while waiting for the T, that the combination of a mullet haircut with a bald-on-top hair-on-the-sides style should be called a "bullet."
Halberda, Mazzocco, and Feigenson, "Individual differences in non-verbal number acuity correlate with maths achievement" (Nature)
The original paper behind the "gut instinct" and mathematics article in the NYT.
"Guess how good you are at math" (Language Log)
The Log mentions the "gut instinct & mathematics" article in the NYT, along with a link to the original paper.
Galileo, "Dialogues on two world systems"
Thumbnails and transcriptions of an older translation.
"League says no to Watford replay" (BBC SPORT)
"According to the Laws of the Game, the decision of the referee, regarding facts connected with play are final and that includes whether a goal is scored or not." There's an analogy lurking in here, somewhere....
"Championship: Reading's phantom goal at Watford will haunt referee Stuart Attwell" (The Guardian)
The youngest referee in the Premier League (25 years old!), screwed by his older flag-happy linesman. Just reading about it sends shivers up my spine.
Sanjoy Mahajan, "Order of Magnitude Physics: A Textbook, with Applications to the Retinal Rod and to the Density of Prime Numbers" CiteSeerX
Mahajan's 1998 thesis, from Citeseer.
Order of Magnitude Physics Material
Draft of Sanjoy Mahajan's book, "Order of Magnitude Physics: Understanding the World with Dimensional Analysis, Educated Guesswork, and White Lies." Adapted, it appears, from his 1998 thesis. Includes links to earlier coursework, from the mid-'90s, on the same material. Is this where you got the draft, C?
Lock & Gelman, "Bayesian Combination of State Polls and Election Forecasts"
Andrew Gelman's new paper on election prediction using state-level polling. Partial pooling from multilevel models.
"Paul Krugman Gets, I Think, too Close to His Inner Hayek" (Brad DeLong)
"This Hayekian argument was, of course, dead wrong. Its problem was that it mistook value for being a fact of nature rather than a social relationship among people. The value of something is what people are willing to pay for it.
If there is extra liquidity--extra real money balances--in the economy then the value of commodities in terms of nominal yardsticks will be higher and the value of liquidity will be lower--which means that the value of bonds will be higher. There is no "fall back in price to their true value.""
Having a Beer - CSDP Election 2008
Via Andrew Gelman. Larry Bartels quotes a study that asked the infamous "which politician would you rather have a beer with" question. But this isn't even the right question, just another example of snobby egg-headed ivory-tower divorced-from-reality elitism. The correct question is, "Who would you rather have, like, nine beers with? And also maybe a couple of shots of some old whiskey you found in your brother's closet? And can't you just keep it down, already? People have to work tomorrow. Also, you kids get off my damn lawn." I think the answer is clear, people.
"Operant conditioning at the NC Zoo" (Cognitive Daily)
"As we strolled from exhibit to exhibit and listened to Jayne's comments, we were struck by how frequently psychology enters into the daily routine of managing a zoo. Through operant conditioning, the animals are trained to assist the zookeepers in practically every zoo function, from feeding, to grooming, to medication and contraception."
"Ha Ha Ha Not Funny." (Acephalous)
"I'm thinking, there's no way the fates are gonna let this man get out of graduate school without sending him a fat man who screams, 'I'll show you the life of the mind!'"
"Practicing political science without a license, or, all the rants conveniently in a single place" (Andrew Gelman)
X moonlighting as a Y.
"Why You Should Hate the Treasury Bailout Proposal" (naked capitalism)
"We have said more than once that the the US in the same position as Thailand and Indonesia, circa 1996, except we have the reserve currency and nukes. It looks like we will have the opportunity to see how those two assets influence the end game."
The Decline of ‘Virtue’ — Crooked Timber
In parallel to the "Cowen Index," we might also define a "Holbo Index" -- the number of comments before a thread degenerates into an angry off-topic discussion about The Bell Curve. On the other hand, this post did inspire me to re-read Meno this morning. So that was probably okay for me, too.
"A common problem when optimizing COUNT()" (MySQL Performance Blog)
"If you know your SQL well, you know COUNT() has two meanings. 1) count the number of rows 2) count the number of values. Sometimes, but not always, these are the same thing. COUNT(*) always counts the number of rows in the result. If you write COUNT(col1) it counts the number of times col1 is not null. If it's never null, the result is the same as the number of rows." --- this distinction was actually mentioned in class (6.830) on Thursday.
Molecular Coupling of Xist Regulation and Pluripotency -- Navarro et al. 321 (5896): 1693 -- Science
"We conclude that the three main genetic factors underlying pluripotency cooperate to repress Xist and thus couple X inactivation reprogramming to the control of pluripotency during embryogenesis."
"Gut Instinct’s Surprising Role in Math" NYTimes.com
I guess this isn't that surprising, if you think of Math as largely driven by Geometric Intuition. I admit, I've never heard back-of-the-envelope calculations called "Fermi games."
"Understanding the Three Ways of Dealing with Financial Crises" (Brad DeLong)
Gotta ask Jeremy, resident economics PhD, about all this at the tailgate party tomorrow. ("The response to objection (2) is "tough." Yes, it is important to design the elements of the rescue package in such a way as to give as few windfalls as possible to the undeserving feckless, greedy, imprudent, thriftless, et cetera. We will do what we can within the law to make sure as few gains ill-gotten survive going forward.
But as Federal Reserve vice chair Don Kohn says, it is bad public policy to hold the jobs of tens of millions hostage in an attempt to teach a few feckless financiers (or even somewhat more thriftless borrowers) even a much-deserved lesson.")
"Do Genes Matter for Health?" (Seth’s blog)
"Some rare non-hype on this issue has recently come from Dr. David Goldstein..." -- d00d, "non-hype" on this is rare only if the NYT is your sole source of genomics-and-disease news. OTOH, the discussion of this article over at The Corner (why do I continue to browse that crap blog?) is hilarious.
"An Investigation of Graham’s Scan and Jarvis’ March" - Chris Harrison
Convex hulls and computational geometry. Undergraduate project.
Lowd & Domingos, "Learning Arithmetic Circuits"
The paper I was mentioning to you on the train today. Probably should read an earlier paper by Adnan Darwiche, "A Logical Approach to Factoring Belief Networks," too.
"The Benign Rule of Ben Bernanke and the Ideal of Democratic Equality" (Will Wilkinson)
File under: Department of One Thing That Does Not Follow From the Other.
About: Degrafa
I'm not sure what "declarative graphics" means, but I think I know what it *should* mean. Anyway, this appears to be a project for Flex. (John, when you come to town we should talk about my latest thinking on visualization frameworks in Java. I remember a discussion about this over pizza in the MIT student center about a year ago, but ... my thinking has changed since then.)
Hazel Mail, A Dead Simple Way To Design And Ship Custom Postcards Worldwide
Sounds like it'd be up your alley, Ms. C.
Welcome to the new Freedom to Tinker
Tim Lee joins Freedom to Tinker.
Roger Federer as Religious Experience - Tennis - New York Times
David Foster Wallace's description of Roger Federer's tennis game, from two years ago in the NYT Play magazine. Reading DFW-on-tennis is like hearing my old roommate Jax talk about it.
JBox2D Demos
A "close Java port of Box2D", a physics engine.
Being used in processing, but ... yeah.
Why I Hate Django - Hack a Day
"Sino-Russian Transcription and Transliteration" (Language Log)
Aren't each of these systems for character-rewriting characterized (heh) by rules? Couldn't these rules be encoded as algorithms? Couldn't the entire un-rewriting problem be framed as a statistical inference problem? This is transliteration, not translation ... so why is this not a solved problem?
Books to Read While the Algae Grow in Your Fur, August 2008
Israel's "The Dutch Republic"! I feel vindicated. Also, somewhat ashamed -- I kinda petered out after about 700 pages or so. I should restart it...
Guild of Blades Retail Group: Retail Division of the Guild of Blades Publishing Group
Print-on-demand playing cards.
"I'm Going Back To Tennis Camp" (Capital Gains and Games)
"You can't help but wonder why a Bear Sterns bailout was acceptable but a Lehman (and AIG?) bailout wasn't. Paulson didn't answer the question when he was asked (and asked, and asked) yesterday and what he did say doesn't make a great deal of sense. He indicated that the Bush administration was thinking about the moral hazard issue and decided that Bear, Fannie, and Freddie were enough government involvement in the economy. To me, that raised even more uncertainty about what can be expected from this White House at a time when the last thing we need is uncertainty. It's almost the worst of both worlds from the Bush administration and may eventually be seen as a Herbert Hoover-like blunder."
"David B. Goldstein Finds Fault in Effort to Decode Human Genome to Fight Disease" (NYT)
David Goldstein thinks your puny HapMap is no match for most diseases...
"Keep beholding!" (The Edge of the American West)
"What was two scarves has become ... three scarves."
"Open access: public good or publishers' evil?" (The Great Beyond)
Via Wolfson. Pat Schroeder and John Conyers on the side of ... well, not the good side.
The punchline comes at the end ("the bill doesn't have legs,") but that doesn't make its introduction any less disturbing. Via Wolfson, who points out (correctly) the idiocy and confusion of the "sooner rather than later" locution.
More lipstick!
Tina Fey as Sarah Palin
"MEDIEVAL NAMES ARCHIVE." (languagehat.com)
"Routed by Sarmatians... thwarted by the Thracians..."
"The ten most underrated science fiction movies" (Marginal Revolution)
These list posts are enough to drive *me* to drink. I love me some Primer, and I think an argument could be made for (or had about) Sunshine, but anyone who listed Event Horizon (and there was more than one!) needs to get their head examined.
NetFlix Origami - Recycle Paper
I know, I know -- it's probably child's play to you, Ms. C. But still. I do particularly like, "Starched Shirt." See it now, before Netflix takes 'em down.
Pubget
"Pubget is like PubMed, except you get the PDFs right away." Via the Daily Transcript, who describes it as "an order of magnitude better than PubMed."
"Tax Plans (that’s one for you, nineteen for me)." (chartjunk)
Chartjunk (which I didn't realize had been so beautifully re-branded) re-draws the Washington Post bar graphs depicting expected changes in taxes by tax-brackets. (Hint: evenly spacing the brackets, as the WaPo did, is somewhat deceptive.)
Diabetes Care -- Table of Contents (October 1994, 17 [10])
One of our postdocs points out that, in the October issue of "Diabetes Care" in that same year, there were at least four responses to "Tai's Model", including: "Tai's formula is the trapezoidal rule." I gotta go track these issues down in a library somewhere.
A mathematical model for the determination of total area under glucose tolerance and other metabolic curves -- Tai 17 (2): 152 -- Diabetes Care
Genius. We all had a big laugh over this in our office just a minute ago -- now we've printed it out, and we're going to go quiz some of our biologist collaborators and see if they pick up on it. "We think this method might be useful for interpreting some of our gene expression time series curves, can you scan it over and tell us what you think?" (Would doing this make me a jerk?)
Foster and Stine, "Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy" [PDF]
"We predict the onset of personal bankruptcy using least squares regression." Mentioned in a letter to Andrew Gelman on his blog.
IBM - IMS -Information Management System - database management system
Still kickin' around, after all these years. (Michael Stonebraker claimed in class, on Tuesday, that there's (still) more data in IMS than in any other type of system, worldwide. I took this to be large-scale business transaction data -- he gave FedEx as an example.)
"Robin Hanson and I discuss adjusting for variables you shouldn't adjust for (for example, adjusting grades given sex, race, or pre-test scores)" (Andrew Gelman)
I read this, and I think: double-counting. (Right?)
Zampieri et al. "Origin of Co-Expression Patterns in E.coli and S.cerevisiae Emerging from Reverse Engineering Algorithms" (PLoS ONE)
"The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from compendia of high throughput microarray data has been extensively used in the last few years to deduce/integrate/validate various types of “physical” networks of interactions among genes or gene products." --- an observation not without its own problems and contradictions. (For instance, are any of these networks that are "reverse engineered" (a poor term, anyway) actually consistent with one another? Do any of the patterns that are discovered to be over-represented in them actually derive from the method of discovery itself?)
"WeatherBill shows the way toward usable combinatorial prediction markets" (Oddhead Blog)
"WeatherBill can be thought of as expressive insurance...
WeatherBill can also be thought of as a combinatorial prediction market with an automated market maker..."
Twin Peaks: Watch Full Episodes - CBS.com
CBS has put a subset of the full episodes online.
Everymoment Now : Obama Vs. McCain : Context and Scope to the 2008 US General Election
Graphing news-articles on the election, Obama vs. McCain. The color scheme is awful, and some of the lines look really ... weird. There's some optical stuff going on here. But at the same time, I really like the unfilled space counting down until the election...
Requirements for Relational-to-RDF Mapping
Notes on a guy's weblog. Via Raw, I think. To read.
"The Mechanisms of Nixonland" (Crooked Timber)
"At the panel Paul Krugman suggested that we might want to use the concept of path dependence to understand why Nixonland has persisted until today. I think that’s right – but I also think that recent political science work (by Kathleen Thelen; by the other Paul on the panel; by Jacob Hacker) has a better grip on what path dependence actually involves than the original work by economists on Polya urn processes and the like. Political scientists argue that the basic idea of path dependence needs to be fleshed out by a more particular understanding of the specific mechanisms through which institutions and other phenomena reproduce themselves, and hence the mechanisms through which either stability or change can occur." -- I don't understand this paragraph. Isn't "path dependence" really a function of your model, not so much an inherent property of the system itself?
"The Modern Gentleman's Decision-Making Flowchart"
Achewood - September 10, 2008
"I don't know this guy well enough to rock that kind of chuckle!" (Soon: Perfect Friends.)
"David Frum, Columbia statistician spar over inequality" (Brainiac)
Brainiac tries to referee the Gelman-Frum scrum.
YouTube - Tendercrisp Bacon Cheddar Ranch LaChappelle Hootie BK
Ah, yes. My all time Favorite Commercial.
Love those swingin' buckets of ranch dressing. "It's the tendercrispbaconcheddarraaaanch."
WhaleNet Humpback Catalog Intro
"Data Search", although I can't get the web-form to work. There's also a CD-ROM available for purchase ($18). Images of flukes, names, locations, and lineages.
TinyDB: A Declarative Database for Sensor Networks
One of Sam Madden's older projects. SQL-like language for collecting data from networks of sensors?
"Caro Speaks to the Spirit of Jane Jacobs" (NYTimes City Room Blog)
Robert Caro talks about meeting Jane Jacobs. JJ wanted to know about what it was like to meet Robert Moses (and Caro had wanted to know what it was like to beat him).
"Down Syndrome and Decision Theory" (Radford Neal’s blog)
"The first question is whether to have amniocentesis done, which would provide an accurate diagnosis of whether the fetus has Down Syndrome, but which has a 1 in 200 chance of causing a miscarriage. It’s at this point that decision theory has something to say."
"Barack Obama Discusses Civil Liberties At Farmington Hills Town Hall" (YouTube)
"Go catch them, first." (via Edge of the American West, which needs to cut down on the number of guest-posters they've got over there...)
"Bayesian computation in Java?" (Andrew Gelman)
I need to move fast(er).
"Sarah Palin Sarah Palin Sarah Palin" (Fafblog)
You wanted to know what I think about Sarah Palin, Jolene? Here you go. -- "As a moose-hunting Jesus-fearing hockey-mom mother of five who hunts moose, Sarah Palin isn't some petty Washington bureaucrat. She's a petty Alaskan bureaucrat, and she's gonna shake things up in Washington!"
"Adam Yauch of the Beastie Boys Tries Out the Challenging Business of Independent Films" (NYTimes.com)
I read this and instantly thought of you. Cliche, right?
Demogines et al. "Identification and Dissection of a Complex DNA Repair Sensitivity Phenotype in Baker's Yeast" (PLoS Genetics)
Another Kruglyak paper.
PLoS Genetics: An Integrated Approach for the Analysis of Biological Pathways using Mixed Models
Quoting (and disagreeing with) Harvey Mansfield on grade inflation. "To know it would require a large year-by-year database of the actual work done by Harvard students (including, presumably, evidence of their classroom participation), and the matching grades. I’d be very surprised if the administrators have such a database, access to which they are jealously guarding. Mansfield ought to know: if one existed he would have been asked to contribute to it. But he does not even seem aware that that is what would be needed." --- But there's something else going on here, too. There are at least three or four different reasons why grade inflation could be bad, right?
Dare left something out (and it's important) (Scripting News)
"What's missing in REST, btw, is a standard method of serializing structs, lists and scalar types."
Django Documentation
The re-factored version 1.0 documentation.
Simple Top-Down Parsing in Python
Things are so much harder, when you don't have reasonable control structures or tail recursion (or, you know, both).
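For my own reference, a minimal sketch of the top-down (operator-precedence) idea, as I understand it. This is not the article's code: the toy arithmetic grammar, the `BINDING_POWER` table, and the names `tokenize` and `parse` are all assumptions of mine. The loop inside `expression` is the stand-in for the control structure I keep wishing for.

```python
import re

# Hypothetical binding powers for a toy arithmetic grammar (my assumption,
# not taken from the article): a higher number binds more tightly.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parse tokens into a nested tuple tree, e.g. ('+', 1, ('*', 2, 3))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def prefix():
        # A token at the start of an expression: a number or a parenthesis.
        token = advance()
        if token == "(":
            node = expression(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def expression(min_power):
        # Keep absorbing infix operators that bind tighter than min_power.
        left = prefix()
        while peek() in BINDING_POWER and BINDING_POWER[peek()] > min_power:
            op = advance()
            left = (op, left, expression(BINDING_POWER[op]))
        return left

    tree = expression(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Calling `parse(tokenize("1 + 2 * 3"))` should nest the multiplication under the addition. Passing the operator's own binding power to the recursive call is what makes equal-precedence operators group to the left.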
Burn-In
"Burn-in is only one method, and not a particularly good method, of finding a good starting point." I was given advice once, that was a combination of "one long run" and "no burn-in". It sounded weird at the time -- it still sounds weird today.
One Long Run
"If you can't get a good answer with one long run, then you can't get a good answer with many short runs either."
Ted Dziuba, "Chrome-fed Googasm bares tech pundit futility"
Hilarious. And pretty much right. "The blue internet that opens my web sites" is now my motto. (Actually, I've been running Chrome on my Vista laptop for the last week and -- it's pretty sweet. Only one or two hiccups so far. It might even replace Firefox on that machine, soon...)
"Prediction and the Axiom of Choice" (Inductio Ex Machina)
Theology! "So, apart for the bit where you have to invoke the well-ordering theorem to order the set of all functions over the reals, it’s a very simple strategy." I am not yet convinced that theology has its uses.
"The intergenerational transmission of IQ" (Marginal Revolution)
Tagged *almost* without comment. MR makes me feel a little creepy, sometimes. Like: the level of confidence they assess for their own opinions is (probably) not warranted, in some cases.
SIGGRAPH 2008 Papers
Hyde et al. "Gender Similarities Characterize Math Performance"
[Science. 321 (5888): 494] Really? I hadn't tagged this already? (This is the Science meta-study on gender, testing, and mathematics education.)
"Summers Vindicated (again)" (Marginal Revolution)
A round-up of summaries of the Science meta-study.
Remarks at NBER Conference on Diversifying the Science & Engineering Workforce
Larry Summers' infamous remarks.
"NYT vs WSJ on gender issues" (Andrew Gelman)
Gelman's take on two of the news articles that followed the Science meta-study on gender differences in testing. His take is that the WSJ gets it right, but it looks (to me) like the NYT is following Science and the authors' spin itself. Anyway. Saving to clear tabs, and for blogging in the future.
"Wow, girls suck at math" (xkcd)
The one about math and gender. For tab-clearing future-posting purposes.
Keyczar
"Keyczar is an open source cryptographic toolkit designed to make it easier and safer for developers to use cryptography in their applications. Keyczar supports authentication and encryption with both symmetric and asymmetric keys."
Sircah
"Sircah is an application written in python that dectects and visualises alternative splicing events using spliced alignments."
"STS.002 Toward the Scientific Revolution, Fall 2003 | Readings" (MIT OpenCourseWare)
Feyerabend reference.
Nature Issue: "Big Data: Science in the Petabyte Era"
Nature takes on the Chris Anderson article (basically).
"Burn-in Man" (Andrew Gelman)
Best post title of the day.
Genome Quilts by Beverly St. Clair
"Beverly St. Clair has originated a way of encoding genetic information in quilt designs." Well, I wouldn't say "originated," but this is a *particular* encoding that leads to particularly cool looking designs. Examples given mostly include genes, but we should consider that non-coding regions might be equally important (or beautiful) for design and aesthetic purposes. Really, *any* piece-by-piece encoding could be tiled in this way. It'd be neat to think about whether certain encodings make clear things like ... protein structure (functional domains), regulatory structure, etc. Also: automatic generation of these patterns at high densities...
"Everybody chill the f*ck out..."
Profiles: Why Me?: Reporting & Essays: The New Yorker
Alec Baldwin in the New Yorker. To be read solely for the little asides, the commentary, from his brother Billy that the writer inserts throughout the article.
Examples of the "cocked hat" in navigation.
"Unchecked Exceptions can be Strictly More Powerful than Call/CC" (Lambda the Ultimate)
"I have to say that on seeing the title I was surprised: I cut my functional teeth on Scheme and every baby Schemer sucks up the knowledge that call/cc lets you create all manner of flow control including exceptions. But, as the paper makes clear, that's not necessarily the case in a statically-typed context."
"Better Never to Have Been" (Crooked Timber)
The symmetricity of utility. Or something. How can a blog with a Murphy quote as its title not comment?
"New E-Newspaper Reader Echoes Look of the Paper" (NYTimes)
"we have positioned this for business documents..." This looks *much* close to the kind of thing that I would want, and is precisely the reason why buying a Sony eReader a month ago would have been a bad idea...
Couzin, "GENETIC PRIVACY: Whole-Genome Data Not Anonymous, Challenging Assumptions"
Whether or not an individual makes up part of a pool of SNPs can be figured out after the fact. I think we can all see where this is going... (i.e. -- GINA).
Java programming dynamics, Part 7: Bytecode engineering with BCEL
IBM Tutorial.
BCEL
Apache project -- Byte Code Engineering Library
Practical Efficient Memory Management | EntBlog
Doniger et al. "A Catalog of Neutral and Deleterious Polymorphism in Yeast" (PLoS Genetics)
realtimecollisiondetection.net - the blog » Posts and links you should have read
"Bobcat family settles in at Lake Elsinore foreclosed home" (Inland News | PE.com)
Encroaching nature. With a fantastic photograph. Via BLDGBLOG.
Announcing dmigrations
"dmigrations" and "django-evolution" both solve a particular problem with running a working, ongoing django system.
"Live Mics" (Matthew Yglesias)
Peggy Noonan, on a live mic, refers to the Palin pick as "political bullshit about narratives." Nice.
"A COCKED HAT." (languagehat.com)
The second time I've come across the phrase 'cocked hat' in the past week. (The first was in David MacKay's book on inference and information theory -- one of the many examples includes a simplified model of navigation by three noisy readings, which produce a diagram called a 'cocked hat', apparently.)
"Simulating Harmonographs" (Walking Randomly)
Simulating decayed pendulum motion to get those pen-on-paper diagrams you sometimes see at science centers. This guy implemented the simulation in Mathematica, but it probably wouldn't be too hard to put something similar into an open-source system.
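A rough sketch of what "something similar in an open-source system" might look like, in plain Python. It assumes the usual harmonograph model, where each axis is a sum of damped sinusoids; the linked post may use a different form, and every amplitude, frequency, phase, and damping value below is made up for illustration.

```python
import math

def harmonograph(t, pendulums):
    """Displacement at time t: a sum of damped sinusoids.

    Each pendulum is (amplitude, frequency, phase, damping).
    """
    return sum(a * math.sin(2 * math.pi * f * t + p) * math.exp(-d * t)
               for a, f, p, d in pendulums)

def trace(x_pendulums, y_pendulums, duration=60.0, steps=6000):
    """Sample the pen position (x, y) at evenly spaced times."""
    points = []
    for i in range(steps + 1):
        t = duration * i / steps
        points.append((harmonograph(t, x_pendulums),
                       harmonograph(t, y_pendulums)))
    return points

# Hypothetical parameters, chosen only to give a closed-looking figure.
X = [(1.0, 2.0, 0.0, 0.02), (1.0, 3.0, math.pi / 2, 0.03)]
Y = [(1.0, 3.0, math.pi / 4, 0.02), (1.0, 2.0, 0.0, 0.03)]
points = trace(X, Y)
```

The list of (x, y) pairs can be handed to any plotting tool; the damping terms are what make the curve spiral inward like the pen-on-paper versions.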
Amazon S3 tools - Command line S3 client
Trail: The Extension Mechanism (The Java™ Tutorials)
"The extension mechanism provides a standard, scalable way to make custom APIs available to all applications running on the Java platform. As of the Java 1.3 platform release, Java extensions are also referred to as optional packages."
"There are three key areas to V8's performance: * Fast Property Access * Dynamic Machine Code Generation * Efficient Garbage Collection"
"A Survey on Hash-Based Packet-Processing Algorithms" (My Biased Coin)
FINALTOON.COM
3ds max plug-in for cartoon-rendering.
Research Blogging
Re-launched, with a new interface. Some of the topic classifications don't seem to work correctly (yet?) -- the two papers currently listed under "mathematics" don't seem to be about "mathematics."
"Machine Learning, Neural and Statistical Classification" (Book)
Manin "On the nature of long-range letter correlations in texts" (arXiv)
Breusch, "Hypothesis Testing in Unidentified Models" (JSTOR)
[JSTOR: The Review of Economic Studies, Vol. 53, No. 4 (Aug., 1986 ), pp. 635-651]
Freedman, "Randomization Does Not Justify Logistic Regression" (arXiv)
"The logit model is often used to analyze experimental data. However, randomization does not justify the model, so the usual estimators can be inconsistent."
Watanabe, "Algebraic Analysis for Nonidentifiable Learning Machines "
[Neural Computation 13 (4): 899] The best that a few minutes of googling on "algebraic" and "non-identifiable" could do...
The Householder-QR/QL Algorithm
Novelties - Lines and Bubbles and Bars, Oh My! New Ways to Sift Data - NYTimes.com
NYT article about Many Eyes (the IBM data-visualization site). John, do you remember a couple of years ago, when we talked in Starbucks about a shared data repository web-site? Anyway. I'm still working up the courage to email Andrew Gelman about the beta-base thing, too.
"Google Results for X Girls, Y Cups" (FlowingData)
Shouldn't the number in the (0, 0) coordinate be the results for which *neither* the words "girls" nor "cups" appear? Shouldn't that be, like, nearly the size of the entire Google cache? No matter. Ten kinds of yes to the hand-drawn info-graphics, people.
"A Force More Powerful" (Matthew Yglesias)
"In retrospect, this seems obvious to everyone. The moral force of non-violent protest won friends and allies to the cause, exposed the crass immorality of Civil Rights’ opponents, and was forceful enough to bring about major change while also being low-key enough to take “yes” for an answer rather than turning into an endless cycle of recriminations. And yet these ideas about conflict and its resolution seem almost entirely absent from our present-day discourse about violence and its utility. This even though King’s non-violence stemmed not from some esoteric element of his life, but from Christianity — a faith that’s pervasively present in American politics, but whose practical political upshot these days is support for large-scale and casual deployment of violence."
A Softer World: 346
"You know what you do is wrong."
"NASA Observes Earth Blogs" (ben fry)
Notes the NASA Earth Observatory videos and data.
"Suppose You Write the Times to Fix an Error (part 1)" (Seth’s blog)
Seth Roberts trolls the NYT! (But in a *good* way, man.) And one of his commenters goes ahead and does a quick CI calculation (we're just flippin' coins with the binomial, right?) in the comments. Reading part II (immediately after Part I) is recommended, too. Eventually the reporter just says, "We're not getting anywhere," and ignores him. Where's Brad DeLong's Righteous Fury when you need it?
Consistency and availability in Amazon’s Dynamo at Paper Trail
"Consistency and Vector Clocks."
"David Gordon on Rawls" (Will Wilkinson)
I've read some of (not a little of, but not all of either) Rawls, and now I've read Gordon on Rawls, and Wilkinson on Gordon on Rawls, and I admit I'm still confused. Trying to hold that "principles of justice apply to the 'basic structure' of society" (and don't identify "patterns of property holdings") seems to contradict how we go about evaluating different possible structures, justice-wise. It *feels* like a philosophical shell game, even though I'm willing to admit that I might have deeply misunderstood something.
Thomas Gingeras, "Mapping the strand-specific transcriptome of fission yeast"
In Nature Genetics.
Papers from the International Joint Conference on AI. Including several papers in the "Learning" section that I never got around to reading...
Sweetcron - The Automated Lifestream Blog Software
"Automated Lifestream Blog Software." Front page only mentions flickr, blogs, links-saved... not quite what we want, right? But I still mention it to you, FYI, etc etc.
"Love = Boston Red Sox Hall of Fame Catcher Carlton Fisk" (You Will Not Believe The Blog)
Poems with the word "love" swapped for other words or phrases. In this case, "Boston Red Sox Hall of Fame Catcher Carlton Fisk."
WolfenFlickr 3D
Wolfenstein-style raycasting (with the original textures), re-written in Javascript and running in your browser. Combined with an interface, that grabs pictures from flickr and puts them on the walls in the virtual world. Awesome.
Max Weber, "Politics as a Vocation"
"Safeguards at Natanz" (ArmsControlWonk)
Remarkably informative, as always. -- "Anyone who has bothered to read this far probably already knows that ES [Environmental Sampling] has played the starring role in the Iran-IAEA drama. The accuracy of the technique, combined with the persistence of trace amounts of uranium, certainly came as a shock to the Iranians."
Noam Elkies, "On numbers and endgames: Combinatorial game theory in chess endgames"
"Open problems concerning card games" (Gowers’s Weblog)
"One thing [Persi Diaconis] said that particularly struck me was, “If anyone can solve this I can guarantee them a front-page spread in the New York Times,” presumably an allusion to the attention he received for showing that you need seven riffle shuffles to shuffle a pack of cards properly."
10-803: Markov Logic Networks
Only the introductory slides have been posted so far (as I guess we would expect). Looks hottt. ("Lazy and lifted inference?" Sweet.)
Achewood - August 25, 2008
"YOU KNOW, I MIGHT ACTUALLY BRUSH HERE IF YOUR STUPID TOM'S OF MAINE TOOTHPASTE DIDN'T TASTE LIKE RECLAIMED KENNEL CAULK!"
Stephen Spruiell "Going Out With a Bang" (The Corner)
"The 2008 Republican Platform calls for a ban on *all* embryonic stem-cell research, public or private." Insane.
Me on McCain on Technology (Lessig Blog)
Larry Lessig gives his typical-style (single-sentence-slides) talk about John McCain's technology policy, recently released. Major points include comment on broadband penetration and net neutrality.
DataPlace
"Homework Assignment for Lise Eliot: Tour Chicago schools" (Eduwonk)
"Conclusion: College admissions tests are worthless for judging gender. Both the ACT and College Board agree that far more poor and minority girls than boys take their tests; they just can’t measure it." -- More on sub-categorizing and predicting test performance within gender categories. For the upcoming gender-disparities post.
Taking Care of Their Own :: Inside Higher Ed
The whole "Saint John's is having trouble recruiting men" story is weirdly out of place.
The Postsecondary Picture for Minority Students (and Men) :: Inside Higher Ed
"The newest report from the National Center for Education Statistics is, as its title (”Status and Trends in the Education of Racial and Ethnic Minorities“) suggests, designed to provide a comprehensive look at how members of minority groups are faring in the American educational system, from top to bottom. But while the data it offers on that subject are decidedly mixed — showing significant progress over time for all groups, but wide gaps remaining in access to and success in college — the report’s most provocative (and potentially troubling) numbers may be about gender, not race."
Is There a Crisis in Education of Males? :: Inside Higher Ed
"A report issued Tuesday by the American Association of University Women refers to a “so called” crisis and argues that there is no such thing with regard to male students as a whole."
"A 'next-gen' tool to view genomic data" (Broad News)
Not sure what the implied "generations" are here, and their graphic is too fuzzy to make out completely. But the Broad is well-funded, and Ben Fry is still there, and they're smart people, and they've got the buzzwords ("Google-Maps-like") down, so it's probably pretty reasonable.
Jeffrey Mark Siskind's Papers
"Automatic differentiation of functional programs."
"New, Improved DNA?" (In the Pipeline)
'But when I saw the end of this paper, the first thing that popped into my head was "stable high-affinity antisense DNA backbone. Holy cow".' --- That's pretty cool.
"Biden as VP" (Ezra Klein)
Probably my favorite Biden clip evar.
LXR / The Linux Cross Reference
"... a software toolset for indexing and presenting source code repositories. LXR was initially targeted at the Linux source code, but has proved usable for a wide range of software projects."
Barak A. Pearlmutter Publications
Symbolic differentiation.
"Epidemiologists’ Bayesian Latent Class Models of Inter-Annotator Agreement" (LingPipe Blog)
Two paper links, and an outline of a WinBUGS model.
"Power Series" (The Unapologetic Mathematician)
Direct products vs. direct sums. Polynomials and power series.
Persistent Django on Amazon EC2 and EBS - The easy way
Blog post / tutorial on the new Amazon block storage devices -- and using them to host a persistent Django installation. Which is pretty much what you want, right?
"A NEW ORDER: HOT NEWES ON BLAZINGE FELLOW" (Geoffrey Chaucer Hath An Extreme Blog)
Starts slow, but ends with a bang. -- "[What is it like?] Imagine the best hanginge ye haue evir seen. Nowe, get out yower awesome deeth abacus and multiplye the entertaynment value of that hanginge by the power of X. That is just how good a real emblazinge kan be. Certes, sum folk call it cruel and saye that swich thinges sholde nevir happen in Engelonde. Ich saye to them: crye me a river, moonbattes, next yeere ye shall all be Fellowes!"
"my own private harold and kumar sequel" (orgtheory.net)
The next time I fly a plane, I am totally going to be sitting at the gate trying my hardest to move my eyes in "sudden, shifty movements." (But will it ... break paths?)
"Hnefatafl, an ancient Viking board game, revived." (Boylston Chess Club Weblog)
A computer version of this probably wouldn't be too hard...
Chemudugunta, Smyth, and Steyvers "Text Modeling using Unsupervised Topic Models and Concept Hierarchies" (arXiv)
"more on interaction effects" (scatterplot)
"Roughly, biological interaction refers to an actual substantive process out in the world, and statistical interaction is what you can observe with regression models on population data. You can statistical interaction in the absence of biological interaction and vice versa. I wish a distinction between “substantive” and “statistical” interaction would diffuse more broadly, and substantive interaction is what social scientists interested in causal inference ultimately need to be focused on." We don't want to use the word "causal," so we'll just invent another one to use in its place? Let's stipulate that there are "substantive process[es] out in the world," and let's *still* remember that behavioral geneticists can't "see" those happening, either! Which isn't to say that they don't exist.
Chipman, George, and McCulloch "BART: Bayesian Additive Regression Trees" (arXiv)
Linked to by Andrew Gelman, I think. "Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors."
"STS.002 Toward the Scientific Revolution, Fall 2003" (MIT OpenCourseWare)
Reading list on the history of science in the West/Europe.
Das "Generating Conditional Probabilities for Bayesian Networks: Easing the Knowledge Acquisition Problem"
To read. "We invoke the methods of information geometry to demonstrate how these weighted sums capture the expert's judgemental strategy."
"Life Curves" (bit-player)
The blog-post that linked to the Paleobiology Database. "In 2005 Richard A. Muller of the Lawrence Berkeley National Laboratory and Robert A. Rohde, a graduate student at UC Berkeley, published a report in Nature claiming to detect periodic cycles of rising and falling diversity in the Sepkoski data. ... I decided to take a do-it-yourself approach to understanding the issue. I went back to the original data, reimplemented the analytic methods and tried to assess the robustness of the conclusion. I told the story in an American Scientist column. The column pleased no one. It certainly didn’t please Muller and Rohde, who objected that I was out of my depth in my amateur attempt to replicate their work. It didn’t please the critics of the Muller-Rohde hypothesis, who thought my [technical] focus ... deflected attention from deeper conceptual flaws in the argument. ..." I wonder if M&R's objections were ever made publicly? They come off pretty much sounding like jerks.
Scribd
The Paleobiology Database
"The Paleobiology Database is an international scientific organization run by paleontological researchers from many institutions. We are bringing together taxonomic and distributional information about the entire fossil record of plants and animals." Upwards of 84,000 fossil collections.
"Digging Ourselves a Black Hole" (Gail Collins, Op-Ed, NYT)
Why Oh Why Can't We Have a Semi-Numerically-Literate Columnist Writing Op-Eds? "Almost everybody agrees that the Large Hadron Collider may be capable of producing little tiny, black holes. In a way, that’s the idea. Landsberg says he doesn’t think the probability of creating black holes is more than about 1 percent, but you could easily have gotten 100-to-1 odds a year or so ago on John McCain and Barack Obama being the presidential nominees." Seriously. I know probabilities get used and mis-used all the time, but this is ridiculous.
mycrocosm
Like Daytum? But not in "private Beta."
Forumwarz - A free browser-based RPG about Internet Culture
A simulator for being a troll on message boards. "You can start playing right away. What could you possibly lose? (A large percentage of Forumwarz players have lost their dignity. Play at your own risk.)"
"Rostenkowski Isn't Just On Andrew's Mind" (Capital Gains and Games)
Stan Collender compares the elderly Chicagoans' attack on Dan Rostenkowski to the Whiskey Rebellion. And so those anonymous constituents take their places among the most influential modern Americans... even if their influence was almost wholly pernicious.
mysqlgame
Written in Google's AppEngine framework. "Finally, there's a game that just *is* a database!"
DAYTUM
"Daytum is a home for collecting and communicating your daily data. Begin tracking anything you can count and display the results immediately... or just look around and see what other members are recording."
"In Defense of the Beta Blocker" (Carl Elliott, The Atlantic)
Ridiculous, but someone needs to send this to Duncan.
"When should you throw the ball at top speed" (waste)
Genius, start-to-finish. "(Part of me wants to pose this alternate science fiction scenario: suppose that, over a year or so, a process of combustion carried out with the aid of now-sinister symbiotes gradually replaces me with plant matter. At the end of the year, I don't have any thoughts.)" Didn't Searle write an *entire book* with precisely this thought-experiment as the premise? (With the conclusion supposedly: the physical characteristics of our existence are not sufficient conditions for the mental stuff.) IANAP, but that seems hilariously wrong... anyway, I find it funny how agitated I get when I think about Searle, and I'll just shut up now.
"McCain Spokesman's Retort: Obama Lives in 'a Frickin' Mansion'" (The Trail, WaPo)
'[McCain's spokesman] predicted that the story would not "stick" with the American people. "In terms of who's an elitist, I think people have made a judgment that John McCain is not an arugula-eating, pointy headed professor-type based on his life story."' Oh, *thank goodness* -- we wouldn't want anyone with a misshapen head and a misguided affinity for any kind of intellectualism leading our country now, would we?
Kemp and Tenenbaum, "The discovery of structural form" — PNAS
"Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains." To read. If it were someone other than Josh Tenenbaum in the author list, I'd be less interested...
i is undetached cat-partz? by imipolex_g-unit
Cat & Object. (Oh, a Quine joke. How mature.)
Package Management Sudoku
Provides, Conflicts, and Depends. The Debian package management system solves Sudoku.
"What Makes for a Good Blog?" (Merlin Mann)
s/Good Blog/Interesting Person/, yo.
"Freed on Chern-Simons" (The n-Category Café)
Quoted quote: "I once joked that every mathematician also has a category number, defined as the largest integer n such that (s)he can think hard about n-categories for a half-hour without contracting a migraine. When I first said that my own category number was one, and in the intervening years it has remained steadfastly constant whereas that of many around me has climbed precipitously, if not suspiciously."
Joe Curtatone breaks it up. (Somerville Journal)
Apparently the Somerville Mayor also doubles as the Somerville Bouncer. "Shea reportedly gave police two wrong birth dates as they tried to identify him, yelled at them when they asked him to sit on the curb and wait and then attacked the other man again in front of the police officers." This is right between my old apartment and my old favorite coffee shop. Note that this kind of thing happens all the time in Somerville -- it just doesn't usually get written up in the local newspaper.
ANTHRAX INVESTIGATION: Full-Genome Sequencing Paved the Way From Spores to a Suspect -- Enserink 321 (5891): 898 -- Science
Science News article about the FBI Bruce Ivins investigation.
"DEFCON 16: List of tools and stuff released" (ZDNet)
Open source software released at DEFCON 16.
"Predicting Structured Data"
Class from the University of Helsinki. Good set of links to papers, including a few I hadn't seen before.
Avec Eric
Eric Ripert (Le Bernardin) cooks with a toaster oven.
"Goldfarb Protests Too Much" (TAPPED)
It's funny to make fun of Goldfarb, but this feels like a bit of a reach. Goldfarb's quote reveals (http://bayes.tumblr.com/post/36321264/and-my-student-routinely-passed-a-saving-throw-vs) that he picked up his familiarity with D&D with AD&D 2nd edition (or maybe an even later version?), which suggests a Johnny-come-lately approach. I suggest that it's likely he only played the game once or twice, maybe borrowed a book or two, and then decided it was time to put away the percentile dice and join the ranks of ridiculous political hacks.
"Neither a soul to damn nor a behind to kick." (The Edge of the American West)
Oddly enough, the second time I've heard this phrase ("neither a soul to damn...") in the last 24 hours. The other time was last night on PBS, used by one of the historian interviewees when talking about Andrew Jackson. A quick Googling reveals an attribution to Edward, First Baron of Thurlow, who was apparently the Lord Chancellor of England at the time.
IM2GPS: estimating geographic information from a single image (YouTube)
CMU student presentation at Google. Locating an image based on similarity scores to geo-tagged images. Building location maps (probability distributions over the globe itself) based on similarity scores.
"New Geometric Keyboard" (Kevin Kelly)
Keyboard for music, not for typing. I feel like this is the playable version of the orbifold diagram that I first saw mentioned in Dmitri Tymoczko's "The Geometry of Musical Chords" (Science, July 2006), and have since seen in several other places. But I'm not sure.
"The drink-ice cube ratio" (Marginal Revolution)
"Another reason the ratio becomes too high is if the waiter comes by and pours excess drink into your glass, so that he may take away your can or bottle "too soon" for his own not quite legitimate purposes. This can be avoided by placing your bottle or can in an inconvenient, hard to reach place." Why do I get the feeling that Tyler Cowen also chooses which end of a subway to board based on what direction he's going to have to walk when he gets off? (Note: I do this too.)
"You Can't Predict Who Will Change The World" (N.N. Taleb, Forbes.com)
Pretty convenient, how the people who are going to "change the world" are (a) precisely the people he's written a book about, and (b) immune to prediction. When he's not pitching his book, he's being self-contradictory: neither "followers of Adam Smith nor those of Karl Marx" understand "randomness," yet free markets somehow work by "capturing randomness." Okay. Lists like this are pretty useless. OTOH, it's funny to see someone picking Eggan and Tenenbaum out of a hat.
Mom, Can My Voting Machine Spend the Night? - The Lede - Breaking News - New York Times Blog
Wait, what? Ohio election officials would *take election machines home with them*?? They even called them "sleepovers?" Insanity.
"Shameless Self-Promotion" (The Monkey Cage)
I'd buy it. But more importantly: where's the data?
pptPlex
From the Office Labs. I assume you've seen this already? (Watch the video at the top, if you haven't.) Available as a plugin for Office 2007, which I ... don't have. Blah. But -- 95% of the way there.
"Big Cities Helping Smaller Ones Pollute" (Zubin Jelveh - Odd Numbers)
"So the restrictions on development in cities like Chicago, New York, and Los Angeles, while keeping per-capita emissions low, may be pushing pollution-causing activities to more laissez faire cities." Really, I should just read the paper.
"I'm sure someone can explain this one to me" (Andrew Gelman)
I kinda heart AG. Is that wrong to say?
Publish a Paperback Book - Lulu.com
For the blook. (I know I've tagged this before, somewhere...)
Clay Sanskrit Library
Like the Loeb classical library, with facing text-and-translations. Via LanguageHat.
The Nature of Field Work in a Monolingual Setting
An excerpt of a description of Kenneth Pike eliciting information about a new language from a native speaker without a translator -- "monolingual elicitation." Benzon at the Valve links to this, asking "what did Pike know that Quine didn't?" But that seems like a complete misreading of Quine, who's describing how the narrowing process (I think he terms this, asking "ostensive" questions) narrows down some intermediate translation without ever permanently settling the question. Would someone like Pike ever really dispute that? Anyway, I think language-learning-games like this are probably a great lab for thinking about science too -- "nature" as the native speaker.
WILLIAM LAMSON / Work / Video Work / Video / 1
It's the splattering sound that makes this really creepy.
Lee and Seung, "Algorithms for Non-negative Matrix Factorization"
Two multiplicative update rules, for which an NNMF is a fixed-point and which alternately update either a Euclidean metric or a divergence. Looks easy enough...
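To check the "looks easy enough" claim, here is a minimal pure-Python sketch of the Euclidean-objective rule only: H <- H * (W^T V) / (W^T W H), then W <- W * (V H^T) / (W H H^T), with the products and quotients taken elementwise. The small `eps` in the denominator, the random initialization, and the toy matrix `V` are my assumptions, not anything from the paper, and the divergence rule is not shown.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def squared_error(V, W, H):
    """Squared Euclidean distance between V and the product WH."""
    WH = matmul(W, H)
    return sum((v - a) ** 2
               for v_row, a_row in zip(V, WH) for v, a in zip(v_row, a_row))

def scale(M, numer, denom, eps):
    """Elementwise M * numer / (denom + eps); eps guards against 0/0."""
    return [[m * n / (d + eps) for m, n, d in zip(m_row, n_row, d_row)]
            for m_row, n_row, d_row in zip(M, numer, denom)]

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    rng = random.Random(seed)
    # Strictly positive random start (an assumption; other inits work too).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    errors = [squared_error(V, W, H)]
    for _ in range(iterations):
        Wt = transpose(W)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
        errors.append(squared_error(V, W, H))
    return W, H, errors

# Toy nonnegative matrix of rank 2, made up for illustration.
V = [[1, 2, 3], [3, 1, 2], [4, 3, 5], [7, 5, 8]]
W, H, errors = nmf(V, rank=2)
```

Because every update multiplies by a nonnegative ratio, W and H never leave the nonnegative orthant, which is the appeal of the multiplicative form over a plain gradient step.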
"Beyond the Hoax: Science, Philosophy and Culture by Alan Sokal" (Review by Simon Blackburn)
"But the reality is that science is a human activity, not an abstract calculus, and this properly makes its great achievements a subject of pride and awe, not suspicion and skepticism. It should also make us aware of its desperate fragility, and the hostile cultural forces that it constantly has to overcome." -- An excellent essay, with a phenomenal paragraph (from which this quote is taken) right in the middle.
Finer, Jenkins, Pimm, Keane, & Ross "Oil and Gas Projects in the Western Amazon: Threats to Wilderness, Biodiversity, and Indigenous Peoples" (PLoS ONE)
"We synthesized information from government sources to quantify the status of oil development in the western Amazon. National governments delimit specific geographic areas or “blocks” that are zoned for hydrocarbon activities, which they may lease to state and multinational energy companies for exploration and production. About 180 oil and gas blocks now cover ~688,000 km2 of the western Amazon. These blocks overlap the most species-rich part of the Amazon. We also found that many of the blocks overlap indigenous territories, both titled lands and areas utilized by peoples in voluntary isolation. In Ecuador and Peru, oil and gas blocks now cover more than two-thirds of the Amazon."
Deus et al. "A Semantic Web Management Model for Integrative Biomedical Informatics" (PLoS Computational Biology)
"In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis." Open source version of the system and RDF schema at : www.s3db.org. At (very) first glance, looks promising/cool.
Devarajan, "Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology" (PLoS Computational Biology)
"Voting Machines" (xkcd)
Premier Election Solutions ("formerly Diebold" -- really?) is doing it wrong. Antivirus software on voting machines. Laff.
Decision Tree Learning
Pretty clear description of the ID3 decision-tree-learning algorithm. Actually finding a reasonable description of ID3, publicly accessible and on the web instead of in a book, has seemed surprisingly hard to me for a while. Maybe I just wasn't looking hard enough. Anyway, an implementation of this in our generalized classification framework here at work turns out to be as easy as you'd expect.
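For the record, a small sketch of ID3 as I understand it: pick the attribute with the highest information gain, split on its values, and recurse until the labels are pure or the attributes run out. This is not the implementation from the linked page or from our framework at work; the dict-based tree, the `default` fallback for unseen values, and the toy weather data are all assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a label (leaf) or a dict node {attr, children, default}."""
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "children": {}, "default": majority}
    for value in sorted(set(row[best] for row in rows)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["children"][value] = id3([r for r, _ in pairs],
                                      [l for _, l in pairs], remaining)
    return node

def classify(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Made-up training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "yes", "yes", "no", "no"]
tree = id3(rows, labels, ["outlook", "windy"])
```

Calling `classify(tree, {"outlook": "rainy", "windy": "yes"})` walks from the root split down to a leaf label. The sketch handles categorical attributes only and does no pruning.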
"Shared Streets: Feasible or Farfetched?" (Bostonist)
"Boston could benefit from Shared Space, a movement based on the idea that drivers and pedestrians can better (and more safely) share space by eliminating divisions between the two. Tenets include the idea that we need fewer, not more, signals to manage traffic, and that signals should be more pedestrian-oriented instead of car-oriented." Theoretically, I think I would welcome something like this. Practically, I wonder how much of the philosophy survives after first contact with two craziest classes of vehicles on the road: delivery trucks, and cabs.
"This is the threading algorithm that was used in Netscape Mail and News 2.0 and 3.0, and in Grendel. " Working myself back up to working on the Mineshaft Miner.
davis_square: Assaulted on Bike Path to Davis Sq. T-Stop
I'm absolutely sure I've seen this group of kids hanging around the bike path and the parking lot behind the Lofts. When I saw them, they were sitting in the parking lot and screaming threats at people on the path. And now they've attacked someone. Insane.
Starobin et al. "Action potential restitution and hysteresis in a reaction-diffusion system with pacing rate dependent excitation threshold"
"Clinton Campaign Didn’t Grasp Rudimentary Proportionality Math" (FiveThirtyEight.com)
Wow... just, wow. "Set aside the malpractice of discontinuing polling in caucus states where the blind-flying Clinton campaign allowed Obama’s team to run up the score, this revelation shows that the Clinton’s HQ apparently did not have simple calculators. ... At this point, it’s much more a cautionary tale for future campaigns to make sure they hire people who know how to work a calculator and look up some basic information. High school interns would probably do it for free. In short, one key aspect of the epitaph on Clinton’s 2008 campaign will be that simple numbers that any old math-minded person could figure out escaped her top people." (538 is fast becoming my favorite daily stop for political news. No one else writes as engagingly or entertainingly (or as precisely) about the day-to-day horserace.)
BYTE.com > Graphics Programming Black Book by Michael Abrash
Abrash's graphics book, available in chapter-by-chapter PDFs online. Some of this stuff is dated, but some of it (I would imagine) could be useful in a "custom graphics for visualization" context.
Via The Daily Transcript. Search engine for ... slides, online databases, software, "protocols", tools. Not sure how complete it is, or who runs it, but a couple of test searches actually return a reasonable number of fairly useful results. (FWIW: "factor binding" returned at least one database on the front page that (a) I hadn't known about, and (b) potentially could be interesting.) TDT calls it, "Google for Scientists" (but with a question mark). But Google already *is* "for scientists," right? (Joking!)
WaveMetrics - scientific graphing, data analysis, curve fitting & image processing software
$400 for the academic license, Dad? Are you sure that there aren't free (and more powerful) versions of this kind of software out there? I mean, free-as-in-speech, not just free-as-in-beer. I could help you look for something specific, if you like...
"huge and important news: free licenses upheld" (Lessig Blog)
Lessig is "very proud to report today that the Court of Appeals for the Federal Circuit (THE "IP" court in the US) has upheld a free (ok, they call them "open source") copyright license, explicitly pointing to the work of Creative Commons and others. (The specific license at issue was the Artistic License.)"
Richard Bourgon, "Chromatin immunoprecipitation and high-density tiling microarrays ..."
To read. Parts of this appear pretty similar to A.R.'s master's thesis? Also, Qi et al. is in his references, but is only cited on page 29, which seems odd. It's interesting to find how disconnected I am from the "state of the art" in practical analysis of ChIP-chip (or ChIP-*, for that matter).
McBride, Alford, Reid, Larson, Baxevanis, and Brody "Putting science over supposition in the arena of personalized genomics" (Nature Genetics)
To read.
Putz et al. "Improved Tropical Forest Management for Carbon Retention" (PLoS Biology)
"Using reduced-impact timber-harvesting practices in legally logged tropical forests would reduce global carbon emissions by 0.16 Gt/year at a modest cost and with little risk of "leakage" (increased carbon emissions elsewhere)."
Exquisite Corpse - Journal of Letters and Life - Man and Dog
Fiction from Susan.
"On using "information processing" as a metaphor for biological processes" (The Daily Transcript)
Part II of Alex Palazzo's commentary on Paul Nurse's "information networks" and systems biology essay.
"Intuitionistic mathematics for physics" (Mathematics and Computation)
Andrej Bauer walks us through intuitionistic mathematics -> synthetic differential geometry -> physics.
"The Lessons From the Kindle's Success" (The NYT "Bits" Blog)
Complete with the quote from Jobs ("nobody reads anymore") -- predictions of $1 billion in Kindle and Kindle-related sales by 2010. Rachel was going to buy me an e-reader of some kind, but I talked her out of it -- the Kindle doesn't support PDFs well, and the Iliad has serious problems. The Sony reader looked pretty good, but I found suggestions of poor PDF support online for it, too.
Data expo 09. ASA Statistics Computing and Graphics: Airline on-time performance
"The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, and takes up 1.6 gigabytes of space compressed and 12 gigabytes when uncompressed. To make sure that you're not overwhelmed by the size of the data, we've provide two brief introductions to some useful tools: linux command line tools and sqlite, a simple sql database. ...The aim of the data expo is to provide a graphical summary of important features of the data set."
"How should logarithms be taught?" (Gowers’s Weblog)
"Behind that suggestion is a more general claim, which is that mathematicians greatly underestimate the extent to which they think syntactically rather than semantically. When you work with the log function, it feels as though you are somehow in direct contact with the function, but this feeling is as much of an illusion as the feeling that you are actually seeing a cube when you decide to visualize it. "
"How to use Zorn’s lemma" (Gowers’s Weblog)
"Note the similarity between the position we are now in and the position we were in when we were trying to create a basis for \mathbb{R} over \mathbb{Q}. Once again, we can continue to create larger and larger objects, but there seems to be no easy way of saying that the process eventually ends. Or rather, there is an easy way: Zorn’s lemma tells us that it ends." I found myself kind of giggling, all the way through. Gowers writes really well for a famous mathematician, but as soon as he writes "there is basically only one candidate to try: the union itself," that was where I got off the train. Thank Not-God for Constructive (Non-Platonic) Mathematics, right?
"Shut up and calculate!" (bit-player)
Quoting Leibniz: "When controversies arise, there will be no more necessity of disputation between two philosophers than between two accountants. Nothing will be needed but that they should take pen in hand, sit down with their counting tables and (having summoned a friend, if they like) say to one another: Let us calculate." (Calculemus! I think he's wrong about there not being enough "inquisitive" computational environments, though. I would suggest R or Matlab, or other statistical environments with programming interfaces, or maybe even an appropriately-equipped installation of Python or MIT-Scheme. Something interpreted.)
"Bad Ideas - A Play In One Act" (Boston Biker)
Well-played, sir.
"What are the best games?" (Marginal Revolution)
Diplomacy, man. Diplomacy. (I was excited for a second, before I realized he wasn't going to talk about computer games.)
Shaun Mahony
Shaun's webpage.
Photography as a Weapon - Errol Morris - Zoom - New York Times Blog
Errol Morris interviews Hany! Awesome.
A Softer World 338
I think I'm assembling something that looks like a post about the recent Science-published meta-review of studies on test-scores in mathematics for women vs. men. (This comic is a reference to the xkcd classic: http://xkcd.com/385/).
A Lightweight SQL Database for Cloud and Web in Launchpad
I'm not sure why I'm collecting links to different databases...
"Prepare yourself, for tonight we board IN HELL!" (CinnamonPirate.com)
"This is where the game derails." Blogger plays through a ROM of the Famicom game, "Titanic 1912," which vies for the title of worst game of all time. Written with some kind of Final-Fantasy-like engine and based on the James Cameron movie (!), but with some notable plot-point updates (watch for the submarine). I was in tears by the time he describes walking through the window.
Thesis Draft: Computational Methods for Analyzing and Modeling Gene Regulation Dynamics
Jason Ernst is Ziv's first (or second?) student at CMU. His STEM software is pretty slick, and DREM too. It looks like they comprise a reasonably large part of his thesis work.
A django application for updating existing models (as syncdb, obviously, only handles completely new models). Totally experimental, and only works with the new (post 0.96) versions of django. Possibly still useful, though.
Visuals - The Gang of Four at the Gateway of Life - NYTimes.com
Stuart gets written up in the NYT. Funny to see Rick's "network" graphics get top billing.
Fuelly
Web 2.0 social networking / mileage-sharing car site.
Raphaël—JavaScript Library
A library for browser-independent graphics programming in Javascript.
nmdb - A multiprotocol network database
Database with python, scheme, and haskell bindings. "Multiprotocol" means multiple *network* protocols. "It consists of an in-memory cache that saves (key, value) pairs, and a persistent backend that stores the pairs on disk. Both work combined, but the use of the backend is optional, so you can use the server only for cache queries, pretty much like memcached."
David Pollard, "Empirical Processes: Theory and Application"
Ruth Rosenholtz
Researcher in BCS here at MIT. Pointed out to me by Bryan R.
Revisiting Robert Drew's groundbreaking John F. Kennedy documentaries. - By Elbert Ventura - Slate Magazine
The videos (in the accompanying slideshow) really are pretty amazing. Watching Drew discuss his own technique is like watching the beginning of the end of our political culture -- it'd be almost disgusting, if it weren't so fascinating.
Bradley Efron, "Microarrays, Empirical Bayes and the Two-Groups Model" (arXiv)
"...high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg's False Discovery Rate method."
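For reference, the Benjamini-Hochberg step-up procedure named in that abstract can be sketched in a few lines. This is the textbook procedure as I understand it, not anything specific to Efron's two-groups model; the function name `benjamini_hochberg` and the default level `q=0.05` are my own choices for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Textbook step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= k * q / m, and reject the k smallest p-values.
    (Function name and default q are illustrative assumptions.)
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes

    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.039, 0.001, 0.216, 0.008]))
```

Note the "step-up" part: a small p-value that misses its own threshold can still be rejected if a larger one further down the sorted list passes.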
Databases - A New Frontier
Look familiar? (Scroll down to the bottom.)
Lu, Mitra, Wang, and Giles "Automatic categorization of figures in scientific documents"
[Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, 2006] Pointed out to me by Bryan (who was back in town from Paris, visiting for a few days), when I was talking to him about my blot and bio-paper extraction ideas.
OpenBugs: Binaries and Installation
OpenBUGS doesn't run on Linux? Really?? Gaaaaah.
OneGeology - Making Geological Map Data for the Earth Accessible
"[OneGeology]'s aim is to create dynamic geological map data of the world available via the web." Unfortunately, their "portal" is "optimized" for Firefox 2, and doesn't "work" on Firefox 3. Lame!
Dan Kaminsky's Black Hat presentation slides on the DNS vulnerabilities
An interesting read, and a good insight into one little particular part of internet/networking security. Eminently interesting up until the slide where he talks about the "Forgot your password?" links on websites, and then it becomes absolutely unbelievable.
"Black Hat 2008: FasTrak toll system completely broken" (Hack a Day)
In which we are glad that we do not drive regularly in California. I wonder how many other toll systems suffer from similar stupid problems? -- "The transponders and readers perform no authentication. Someone could wander through a parking lot with an RFID reader and pick up the ID of every tag in the lot. They could then write their own transponder with the stolen IDs. Here's the really bad part: the transponders support unauthenticated over the air upgrading. You can force any transponder to take on a new ID. An attacker could overwrite every tag passing a certain intersection and cause havoc in the toll system. Some have suggested that there are IDs in the system that are unbilled, since they're assigned to administrators; these would be especially attractive to thieves."
"Airborne in an Emirates A380 at SFO" (Telstar Logistics: Flight Report)
Blogger takes a free demo-ride (two hours) on a new Airbus A380. First class "seats" are actually their own private suites. There is a bar and two private shower stations (for which the plane carries an extra half-ton of water). Videos are posted.
"Delicious.com redesigns, screws pooch" (Nathan Bowers)
Honestly, all these people who are complaining about the del.icio.us redesign are completely missing the point. Sure, the layout is weird. Sure the colors need improvement. Sure, there are a whole host of other tweaks that could be made. But! Three things: (a) 1000 character comments, (b) working (and good!) search, and (c) better handling of tag bundles. "Good" web design pales in the face of real, useful features. If you spent your life studying web design instead of web programming -- well, sorry you wasted your life, guy.
Tony Judt, "The 'Problem of Evil' in Postwar Europe" (The New York Review of Books)
R's aunt was talking about this essay in the car last night. Memory, history, and so forth. Starting with Hannah Arendt and Eichmann in Jerusalem, continuing with Israel and memories of the Holocaust. "Maybe all our museums and memorials and obligatory school trips today are not a sign that we are ready to remember but an indication that we feel we have done our penance and can now begin to let go and forget, leaving the stones to remember for us. I don't know: the last time I visited Berlin's Memorial to the Murdered Jews of Europe, bored schoolchildren on an obligatory outing were playing hide-and-seek among the stones. What I do know is that if history is to do its proper job, preserving forever the evidence of past crimes and everything else, it is best left alone. When we ransack the past for political profit—selecting the bits that can serve our purposes and recruiting history to teach opportunistic moral lessons—we get bad morality and bad history."
Firefox Mobile Concept Video on Vimeo
But zoom-and-pan is quite different from infinite-zoom. I still think it'd be cool to find the actual *content* of a linked-to webpage by zooming way in on (say) the edge of a serif on one of the letters in the text of that link.
Arthur Bentley, "The Process of Government: A Study of Social Pressures." (Google Books)
No sooner do I see it mentioned on Brainiac (http://www.boston.com/bostonglobe/ideas/brainiac/2008/08/nicholas_lemann.html) than I search-for-and-find it on Google Books, scanned and in the public domain. Fantastic! To be read in conjunction with the rest of The Power Broker, I think.
Technology Review: Swallowable Sensors
"As it journeys through the GI tract, a radio transmitter sends information about the time its journey takes, as well as acidity and pressure levels. The level of acidity helps the physician to determine when the capsule enters and leaves the stomach. A receiver that's slightly larger than a cell phone collects the data from the device. And the disposable capsule gets excreted from the body after a day or two. The patient then returns the receiver to the doctor, for downloading and analysis. Each capsule will cost $500, and the entire system, including a docking station and software, costs $20,000."
"Visiting Alexandre Grothendieck " (Roy Lisker)
Via the n-Category Cafe. '"Legend has it (everything is legend) that, out at the farm, he would save his shit in buckets, then go around - ever the good ecologist - to the local farmers trying to sell it. What would be your reaction to being offered the shit of a Fields medalist at bargain prices?"'
[ubuntu] HP Pavilion tx2000 Hardware solutions !!! - Ubuntu Forums
Installing Ubuntu on an HP Pavilion tx2000. Looks like it's going to be some extra work.
The Night of the Gun by David Carr
Videos! Tom Arnold is apparently an old friend of his? The video of "Tommy" describing throwing a beer at the head of a guy in a wheelchair at a Knights of Columbus hall is ... weird. (And what the hell is he doing with his tongue, in that video?)
"The Real Original Maverick" (The Monkey Cage)
"Maverick" was actually a guy's name. In the '30s. I had no idea. (And when I read this post in my RSS readers, for a second I accidentally thought I was reading Edge of the American West.)
GeoDjango Documentation — GeoDjango v1.0 documentation
OOOooooooooh.
Bugs in Ubuntu
Notice the bug with ID #1.
"Ebook Formats..." (DeLong)
"So once again I am reminded of the immortal words of Guy Fleegman..." Knowing that DeLong is a Galaxy Quest fan makes me a little happier inside.
The Verstrepen Lab | Publications
ISMB 2008 - FriendFeed
The live-blog of the recent ISMB conference.
"Euclid’s Algorithm for Polynomials" (The Unapologetic Mathematician)
I swear, if I ever find myself teaching high school mathematics, I'm basically going to print out John Armstrong's blath and use it as class notes.
"A NEW ORDER: A WORD FROM YOWER EDITOR" (Geoffrey Chaucer Hath An Extreme Blog: Go England! It ys Rad!)
"TEN KINDES OF NO TO THE GOWER RUMORS, people." Makes me think I'm back in college again, listening to Laura make jokes.
"Let's make a programming language!" (Lambda the Ultimate)
Frank Atanassow comment at LtU -- "five questions to answer before you design a new programming language," but really these could be applied to almost any significant research task. "If you have only ever programmed in C/C++/Java and Lisp and scripting languages, you have been sitting in a corner your whole life. Perl, Python, Ruby, PHP, Tcl and Lisp are all the same language." But read the whole thing, because all of it is amazing.
"Canoe wives and unnatural semantic relations" (Language Log )
"In my view, there is no sense in trying to develop a taxonomy of possible semantic relations that noun-noun compounds can express, given that one of them would apparently have to be a relation that permits N1 N2 to hold of a person x iff N2 is the name of the relation that x bears to some person y such that y was involved in an incident in which an object of the type N1 played a salient role. Define the notion "natural semantic relation" as you will, this surely isn't one."
"Polynomial Division" (The Unapologetic Mathematician)
John Armstrong's "expository blath" is really excellent. "The fact that we can find [divisors of polynomials] is called the “division algorithm”, even though it’s a theorem. There are various algorithms we can use to establish this fact, but one of the most straightforward is polynomial long division, which we were all taught in high school algebra."
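A minimal sketch of the long-division procedure mentioned in that quote, for my own notes. It assumes coefficients in a field (exact rationals via `fractions.Fraction`) and polynomials written as coefficient lists, highest degree first; the name `poly_divmod` and that representation are my choices, not anything from the post.

```python
from fractions import Fraction


def poly_divmod(a, b):
    """Divide polynomial a by polynomial b using long division.

    Polynomials are coefficient lists, highest degree first, and b must
    have a nonzero leading coefficient. Returns (quotient, remainder)
    such that a = quotient * b + remainder, with the remainder list
    shorter than b. (Name and representation are illustrative.)
    """
    if not b or b[0] == 0:
        raise ValueError("divisor needs a nonzero leading coefficient")

    remainder = [Fraction(c) for c in a]
    divisor = [Fraction(c) for c in b]
    quotient = []

    while len(remainder) >= len(divisor):
        # Choose the coefficient that cancels the current leading term.
        coeff = remainder[0] / divisor[0]
        quotient.append(coeff)
        for i, d in enumerate(divisor):
            remainder[i] -= coeff * d
        remainder.pop(0)  # the leading term is now exactly zero

    return quotient, remainder


if __name__ == "__main__":
    # (x^3 - 2x^2 - 4) divided by (x - 3)
    q, r = poly_divmod([1, -2, 0, -4], [1, -3])
    print(q, r)
```

The remainder can come back with leading zeros (for example `[0]` when the division is exact); I left them in rather than normalizing.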
PDFMiner
Python PDF parser/analyzer.
"ICML/UAI/COLT 2008 Retrospective" (natural language processing blog)
"I'll bet you don't understand error bars (updated with answers)" (Cognitive Daily)
Complete with an updated quiz. Standard error vs. confidence interval is a regular gotcha.
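To keep the two straight, here is a small sketch that computes both for the same sample. It assumes a normal approximation for the interval (so it is only reasonable for largish samples; a t-based interval would be wider for small ones), and the function names are made up for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def normal_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean.

    Uses the normal quantile (about 1.96 for 95%), which is an
    assumption that only holds up for reasonably large samples.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.mean(sample)
    half_width = z * standard_error(sample)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print("SE:", standard_error(data))
    print("95% CI:", normal_confidence_interval(data))
```

The point of the gotcha: error bars drawn at plus-or-minus one standard error are roughly half the width of 95% confidence-interval bars for the same data.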
Inside the Linux boot process
Since I'm finally running Linux on one of my personal machines again.
"New Understanding of Biology" (The Daily Transcript)
Robin recommended this review of Paul Nurse's opinion piece in Nature on systems biology and signaling networks.
The videos of gear-simulations using sketchup and this plugin, on youtube, are pretty impressive. (Via Shaun.)
qwantz.com - dinosaur comics - May 03 2006
"Hooray for Occam's Razor!"
Connotea: free online reference management for clinicians and scientists
You might find this useful, Giorgios.
"Big Money" (bit-player)
"Inflation at its worst, they find, proceeds at a doubly exponential rate. In other words, prices rise not just as an exponential function of time—exp(t)—but as an exponentiated exponential—exp(exp(t))... This growth law has a simple meaning in terms of everyday experience. With “ordinary,” single-exponential inflation, prices have a constant doubling time. If bus fare was 1 million last month and 2 million this month, it will be 4 million next month. Under double-exponential growth, the doubling time itself decreases exponentially. In the last months of the Hungarian inflation the doubling time fell from about 20 days to 15 hours."
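A quick numerical check of the doubling-time claim, as a sketch. I'm assuming prices of the form p(t) = exp(rate * t) for the single-exponential case and p(t) = exp(exp(rate * t)) for the double-exponential case; the `rate` parameter and the function names are mine. Solving p(t + d) = 2 p(t) in the second case gives d = ln(1 + ln(2) * exp(-rate * t)) / rate, which shrinks roughly like exp(-rate * t).

```python
import math


def doubling_time_single(rate=1.0):
    """Doubling time for p(t) = exp(rate * t); it does not depend on t."""
    return math.log(2) / rate


def doubling_time_double(t, rate=1.0):
    """Doubling time at time t for p(t) = exp(exp(rate * t)).

    Solving exp(exp(rate * (t + d))) = 2 * exp(exp(rate * t)) for d gives
    exp(rate * (t + d)) = exp(rate * t) + ln 2, hence the formula below.
    """
    return math.log(1 + math.log(2) * math.exp(-rate * t)) / rate


if __name__ == "__main__":
    for t in range(6):
        print(t, doubling_time_single(), doubling_time_double(t))
```

The single-exponential column stays constant while the double-exponential column keeps dropping, which is the "20 days to 15 hours" effect in miniature.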
"Classifying Olympic athletes as male or female, leading to a comment about the recognition of uncertainty in life" (Andrew Gelman)
"I think people are often uncomfortable with ambiguity. Boylan correctly notes that sex tests can have problems and that there is no perfect rule, but then she jumps to the recommendation that there be no rules at all."
"Wearables and Usability Pt I" (John Snavely's Blog)
"I have a whole book of “art projects”, many of which are wearables, done by architecture students. I am completely guilty of this bs too. There’s a point where sticking an RFID tag inside a sock (or whatever) and having it connect to a social sock network seems like a good idea that will distract your critics from realizing that nobody wants to be sock friends. But since “social networking”, “RFID” and “Wearables” are all cool words you’ll be ok."
"Veal Stock and Remouillage" (Elements of Cooking)
Sounds easy enough.
"Unsolvable Inhomogenous Systems" (The Unapologetic Mathematician)
"Even Worseness: The Even-Worsening" (Sadly, No!)
WSJ article quotes are taken from the WSJ writer trolling a couple of internet message boards. And this is surprising because...?
"THIS IS NOT MLM!!!" - An Appreciation (kung fu grippe)
Like the internet-version of infomercials.
"The Index of a Linear Map" (The Unapologetic Mathematician)
What I should do is find a basis for the column-space, Im(A), of A, and then ... find a basis for U. Which I could do using something like Gram-Schmidt and the canonical basis for R^n, right? And then, if the vector has an image in U, we know there's no solution, right?
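A rough sketch of that check, under my own reading of the setup: I take U to be the orthogonal complement of Im(A), so "b has a nonzero component in U" is the same as "b is not fully explained by its projection onto Im(A)". Rather than building a basis for U explicitly, the code orthonormalizes the columns of A with Gram-Schmidt and looks at what is left of b. All function names and tolerances are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis for the span of `vectors` (dependent ones dropped)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def component_outside_image(A, b):
    """Part of b orthogonal to Im(A); A is a list of rows."""
    columns = [list(col) for col in zip(*A)]
    residual = [float(x) for x in b]
    for q in gram_schmidt(columns):
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return residual


def is_solvable(A, b, tol=1e-9):
    """True when A x = b has a solution, up to the numerical tolerance."""
    residual = component_outside_image(A, b)
    return dot(residual, residual) ** 0.5 <= tol


if __name__ == "__main__":
    A = [[1, 0], [0, 1], [1, 1]]
    print(is_solvable(A, [1, 2, 3]))
    print(is_solvable(A, [1, 2, 4]))
```

This only answers the yes/no question; it says nothing about which x solves the system, and the floating-point tolerance is a judgment call.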
"beyond rest" (joshua's blog)
XMPP and REST and alternatives. Push vs. Poll. Just clearing out the tabs, here...
"Compositional Machine Learning Algorithm Design" (Machine Learning (Theory))
"People may be dissatisfied with the component assembly approach to learning algorithm design, because all of the pieces are not analyzed together." The post title itself was enough to pique my interest.
qwantz.com - dinosaur comics - May 14 2008
"let's never talk about death again!"
"Hello, Walls" (Grammar.police)
Hooray!
"GNU Radio is a free software development toolkit that provides the signal processing runtime and processing blocks to implement software radios using readily-available, low-cost external RF hardware and commodity processors." Via the CSAIL mailing list.
Kim et al. "Pluripotent stem cells induced from adult neural stem cells by reprogramming with two factors"
Oct4 + { Klf4 or c-Myc }
Freeth et al. "Calendars with Olympiad display and eclipse prediction on the Antikythera Mechanism"
[Nature 454, 614-617 (31 July 2008)] The new paper on the Antikythera mechanism.
Gelman's response to his own April Fools joke (in the journal of Bayesian Analysis no less), and its responses. "In a nutshell: Bayesian statistics is about making probability statements, frequentist statistics is about evaluating probability statements."
Automated Market Maker for Intrade contracts
BIG GIANT HELVETICA
Twitter does not need to get any bigger.
Justin Smith, "What's _Really_ Wrong With Bestiality" (3quarksdaily)
In which we attempt to locate why we *really* are meat-eaters who find animal cruelty to be ... well, cruel.
Bostonist: Boston Blotter: 15-Year-Old Shot Dead in Brockton
My little corner of MA is going to hell-in-a-handbasket. That Market Basket is right down the street from where you used to live, Ms. C!
Affymetrix in new patents row : Nature News
MIT sues Affy. "And there is no word yet about any effect the lawsuit might have on Affymetrix's relationship with the Broad Institute — a genomics institute run by Harvard University and MIT."
Mancera et al. "High-resolution mapping of meiotic crossovers and non-crossovers in yeast" (Nature)
"DOM DocumentFragments" (John Resig)
Performance improvements to standard DOM manipulations.
YSlow for Firebug
Tips for web performance improvements; integrated with the Firefox plugin.
"All the Answers" (Charles Van Doren, in the New Yorker)
Gerry sounds like a wonderful person.
"Crimes and Misdemeanors" (Slate)
Graphing Bush Administration scandals and participants -- a laudable goal, but ... augh, a "Venn" diagram? ('Cause it's not even that?) Someone get junkcharts to update this.
"Plenty of Blame to Go Around in Yahoo Music Shutdown" (Freedom to Tinker)
Buying DRM'ed music is *always* a bad idea, no matter who from.
Python interface to ffmpeg. For use with those movies of maps that Ben Fry linked to. (Baghdad Mapper v 2.0, basically).
"Atom Wins: The Unified Cloud Database API" (Anil Dash)
This has been sitting in my tabs for a week or two.
week267: John Baez talks about Wallpaper Groups
"Some people say that tilings with all 17 possible "wallpaper groups" as symmetries can be found in the Alhambra. This article rebuts that claim with all the vehemence such an academic issue deserves, saying that only 13 wallpaper groups are visible."
Conway and Huson, "The Orbifold Notation for Two-Dimensional Groups "
Behind a paywall, but still. Explains the cryptic notation that I could not, for the life of me, pick up from his book.
Jozy Altidore: Love Those Siestas - Goal - Soccer - New York Times Blog
Ronaldinho and Robert Pires can't believe Jozy Altidore is only 18. Get good quick, Jozy, we're going to need you in two years.
"Trixie Tracker, Or When Nerds Have Babies" (The Monkey Cage)
Apparently a well-known phenomenon at MIT Medical: new parents who keep lab notebooks on their babies. It was only a matter of time before they turned it into a website with crappy graphs.
Mark van Atten, "The Development of Intuitionistic Logic " (Lambda the Ultimate)
"This article gives an excellent account of the development of intuitionistic logic, from its roots in Brouwer's theological metaphysics, through to its formal presentation by Heyting in 1956."
"R is too strongly typed" (Andrew Gelman's Blog)
'a.hat <- fixef(M2)["(Intercept)"] + fixef(M2)["u.full"]*u + unlist (ranef(M2)$county)'? Saints preserve us. "Hierarchy and Emergence" (The n-Category Café) This is what it reads like, Larry, when you Fight a Stranger in the Alps. IYKWIM. "Biases of Elite Education" (Overcoming Bias) Probably about Even Wrong -- which is to say, 50% True, 50% Bullshit. "Elite" schools are like rich fish in a barrel, anyway. Ironic to see the sentence, "My experience confirms all of this," on a blog titled "Overcoming Bias." Anyway! labmeeting Hm. "Mapping Walkability – San Francisco" (Lee Byron) WalkScore combined with Google Maps. But it'd be nice to combine this with OpenStreetMap, too. "Graphs, Trees, Materialism, Fishing" (Cosma Shalizi at The Valve) Cosma Shalizi's post about Moretti's book, from the Valve book event on the same. Vicary, "Categorical properties of the complex numbers" (arXiv) Via the n-Category Cafe. Paul Yale, "Automorphisms of the Complex Numbers" [JSTOR: Mathematics Magazine, Vol. 39, No. 3 (May, 1966 ), pp. 135-141] Via the Dialogue on Infinity blog. "Wild" automorphisms of complex numbers. Zoubin Ghahramani, "Recent directions in nonparametric Bayesian machine learning" Video. Via Andrew Gelman's blog. Dynamic programming equations on the third slide -- nice. "How Radovan Karadzic Made Bosnia Suffer" (NYT Op-Ed) "Justice is good, but a peaceful life would have been much better." Sacrilege! — Crooked Timber D^2 kinda annoys me, but *this* is truly inspired. And of course, none of the commenters seem to really get it. Why do I even read the comments? I don't know. Also: when did CT get redesigned? "Go England! It ys Rad!: A NEW ORDER, by the Lords Appellant and Tho. Favent" (Geoffrey Chaucer Hath An Extreme Blog) "We haue replaced Chaucer wyth a top team of new media specialistes." Musing about lifestreams, subscribe-aggregation and publish-aggregation "Beer prices vs. 
wine prices" (Marginal Revolution) It takes seven comments before someone gives the right answer (at least, for *me*). "The Truly Strange Research Hall of Fame's First Inductee..." (The Monkey Cage) Every time a urinal is micturated upon in this fair city, they are required to time the unwitting participant. I wonder what the IRB application looked like for this one? http://www.local-motors.com/ My friend Ben is working for them this summer... "Fourteen Passive-Aggressive Appetizers" (The New Yorker) "Mock tofu" has entered my personal lexicon. The Freebase Blog The freebase blog, which I mentioned to you yesterday, and which I've been following... Goubault, Putot. "Perturbed affine arithmetic for invariant computation in numerical program analysis" (arXiv) How reliable is DNA in identifying suspects? - Los Angeles Times "Based on Troyer's results, she and her colleagues believed that a nine-locus match could point investigators to the wrong person." The NBC Show ‘Heroes’ Is Ready for Its Rebound - NYTimes.com On Sept 22. "Me and My Girls" (David Carr, NYT) "But you know, if I was wrong about the gun ... what else was I wrong about?" Memory, drugs, detox, and children. I'm sold -- I need to buy the book itself. "We Have Charts and Graphs..." (Sticker Giant) I'll take a box of 1,000, please. DeLong quotes Peter Orszag on Anesthesiology and Health Care Reform. Just popped up in my RSS, and I figured you might find it interesting. "Anesthesiology provides one example of a great success story in putting evidence- based standards into practice. " Athans, Ku, and Gershwin, "The uncertainty threshold principle: Some fundamental limitations of optimal decision making under dynamic uncertainty." (IEEE Trans. on Auto. Control) [Automatic Control, IEEE Transactions on , vol.22, no.3, pp. 491-495, Jun 1977] IEEE Xplore. Picked up from Bertsekas's book on Dynamic Programming. PHYS771 Lecture 16: Interactive Proofs and More Scott Aaronson's lecture. 
"Apparently, if you give this puzzle to people, the vast majority get it wrong." Dasgupta, Papadimitriou, and Vazirani "Algorithms" Draft of the book, online. "Fear & Loathing in America" (Hunter Thompson, ESPN.com) "We are going to punish somebody for this attack, but just who or what will be blown to smithereens for it is hard to say. Maybe Afghanistan, maybe Pakistan or Iraq, or possibly all three at once. Who knows?" "the rhetorics of numbers" (orgtheory.net) Sort of like the opposite of "overcoming bias," right? "Maybe Normalizing Isn't Normal" (Coding Horror) "As the old adage goes, normalize until it hurts, denormalize until it works." "ETech '07 Summary - Part 2 - MegaData" (Joe Gregorio, BitWorking) The comments about RDF are illuminating. "autoincrement considered harmful" (joshua's blog) Schachter is the del.icio.us architect. I need to think more carefully about the structure of some of our data in our databases. Fertigo Pro Free font. To be used for presentations, probably. ZooKeeper (SourceForge) "ZooKeeper is a high available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates key configuration information." Written in Java. R.A. Radford, "The Economic Organisation of a P.O.W. Camp" [Economica, vol. 12, 1945] To be read in conjunction with that Taleb piece on how Merton and Black and Scholes really just ripped off Thorp. Thsrs (Shorter Thesaurus) From the Ironic Sans guy. I need something like this, but for *ideas*. JAGS "JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS." Errol Morris, "Believing Is Seeing" (Op-Ed - NYTimes.com) Morris on the Shahab missile photoshopping incident. "...the missiles are now more than ever firmly embedded in the popular imagination." 
"Rising Inequality Has Not Been Offset by Mobility" (Consider the Evidence) Using the PSID to answer questions about *income* mobility. Links to a paper. For use in the any upcoming family "discussions" about the topic. "Build the Open Shelves Classification" (Thingology (LibraryThing's ideas blog)) Didn't I save a link to this already? "Throw Dewey under the train!" Building an "open-access" version of the Dewey Decimal System. Mimram, "Presentation of a Game Semantics for First-Order Propositional Logic" (arXiv) "Game semantics aim at describing the interactive behaviour of proofs by interpreting formulas as games on which proofs induce strategies." Via John Baez (Theorems into Coffee) "Murky Coffee, Arlington: Hold That Espresso Between Your Knees" (And I Am Not Lying) Honestly, I just like the phrase "patronizing millenial." Galenson, Kotin "From the New Wave to the New Hollywood: The Life Cycles of Important Movie Directors from Godard and Truffaut to Spielberg and Eastwood" (NBER) "Old Masters and Young Geniuses." Two life-cycles of genius. Via the Odd Numbers blog (Zubin Jelveh). When we were 25, my old roommate used to lament how L.W. had written the Tractatus at 25. "What have *I* done with my life?" Visualization: BioHeatMap - A Heatmap for Gene Expression and Other Data A Google Chart plugin. So they're ... varying the alpha linearly? Where's that Vaguery-link to color and visualization? "Correct" color usage never gets dealt with in these kinds of applications. "Special Volume: Statistical Modeling of Social Networks with "statnet"" Journal of Statistical Software radiohead - Google Code Data download for the House of Cards video. Jason Morton: Research Morton, Pachter, Shiu, Sturmfels, Wienand. "Geometry of rank tests" (arXiv) "Arithmetic natural proofs theory is sought" (Shtetl-Optimized) As always, an interesting and informative post made priceless by its comments -- Bram Cohen (BitTorrent guy) shows up. 
Also, the comments get trolled by one "Craig," who has a *directional* problem. SA's (warranted) banning could have been funnier.
"Google Spreadsheet, My Asset Allocation, Investing 101" (random($foo))
"Best economics paper of 1958" (Marginal Revolution)
It would be nice to see a list of items like this (along with runners-up) for *every* year. In more than one subject. But such a list would require the encyclopedic knowledge of the intarwebs...
"What Science Blogs Can’t Do" (Science After Sunclipse)
I've had this open in tabs for several weeks, and only just now got around to reading it. Wow -- way too many good points to summarize in ~250 characters. Instead, note the term "blook", but really: read the whole thing.
Farhad Mehran, "Bounds on the Gini Index Based on Observed Points of the Lorenz Curve" (JSTOR)
[JSTOR: Journal of the American Statistical Association, Vol. 70, No. 349 (Mar., 1975), pp. 64-66]
Intute - the best Web resources for education and research
Curated, subject-specific links. Via Ars Mathematica.
Max Payne Game Script
"Out in the night, snow fell like confetti over the Devil's parade." (The movie is going to be *awful*).
"I wish that son of a gun would take that other hand out of his pocket." (The Edge of the American West)
SEK talks about Heinlein. "Bob can write a better story, with one hand tied behind him, than most people in the field can do with both hands. But Jesus..."
"The Visible FISA" (Julian Sanchez)
Truly great! But really, this should be an interactive computer program (or, what's the same thing, an interactive website). It could highlight the cases where the two laws differ. John, want to do some remote Flash work this weekend?
"Revisiting Coroutines" (Lambda the Ultimate)
"Taking this to the extreme implies that yield and resume can be unified into a single 'invoke' operation which accepts a coroutine argument to be used in a subsequent invoke operation. Indeed, these are 'symmetric coroutines'."
"@ Future of Journalism: Adrian Holovaty's vision for data-friendly journalists" (PDA Blog @ the Guardian)
Two questions and a statement: Is "orientated" a word? Is EveryBlock going to disappear (without getting to Boston!) in a year or so when its funding runs out? The web's real "dark matter" is data that's *not* on the web (i.e. on a journalist's desk).
"Propped up by a culture of fear, TSA has become a bureaucracy..." (Patrick Smith, Ask the Pilot)
Wait, gel inserts are forbidden now?? As Bruce Schneier says: "everything else is security theatre." Of course, the difference between me or you and Patrick Smith is that *he* has a widely read air-travel column online.
J. Michael Steele Bibliography (Research Publications Links to Books and Articles)
100+ pdfs.
Charles Murray, "How to Accuse the Other Guy of Lying with Statistics" (Statistical Science)
[Statist. Sci. Volume 20, Number 3 (2005), 239-241.] A more passive-aggressive and bitter three pages have rarely been seen. Point #6 is particularly remarkable for its invocation of Goebbels (!). Awesome.
ragebunny - Scary
More on biking in Boston.
PyMC
MCMC in Python. BUGS-like. Good examples.
Snack Foods That Sound Like Sex Acts - a set on Flickr
Yes. Now, show me the list of sex acts that sound like snack foods.
"Google App Engine optimizations" (Niall Kennedy)
I'm sorta collecting links on Google's App Engine (not to be confused with Apple's new App Store) and optimizations thereof. My first attempt at an appspot application did not go so well -- every query ran well over the median request time.
Quest for the Perfect Chocolate Chip Cookie - NYTimes.com
OSRC: The Operating System Resource Center
Lots of tutorials and examples. Some out of date. Many useful.
"Should the driving rules favor cars or bikers?" (Marginal Revolution)
In point #2, "car" s/b "escalator," and what do you get? A dumb argument. Stick to point #1.
"A Girl Named Florida" (Marginal Revolution)
Radford Neal shows up in the comments and starts flaming people. Awesome...
Documentation - ftputil - Trac
A better python library for interacting with FTP.
"Fart Spray (And Disgust) Makes Moral Judgments More Severe" (Mixing Memory)
A comprehensive theory of Leon Kass! Maybe he once had an office-mate who didn't clean up after himself.
Labs - Neighborhood Boundaries | Zillow Real Estate
Zillow (an internet real-estate evaluation company) distributes neighborhood boundaries under the CC license. Seems like it should go in OSM, right?
"Basic Concepts in Science: A list" (Evolving Thoughts)
Semi-comprehensive blog-post index on basic science posts from big-name blogs. (But why is "proof by contradiction" listed under "statistics"??)
"Euro-update 2: Is science art?" (Cognitive Daily)
Blank Canvas vs. Scientific Manuscript. Inspires a much more rancorous comment-thread than you'd expect.
"Miles per Gallon vs. Gallons per Mile" (bunnie’s blog)
"Apparently the thinking that gas savings is linear with MPG is not uncommon." For some reason, this post has been on my mind for several days.
"An individual right to keep and bear arms: Ho, hum." (Mark Kleiman)
"There's simply no evidence that keeping guns out of the hands of those currently eligible to own them under Federal law ... reduces the level of criminal violence."
"Capital punishment and recidivism" (Andrew Gelman)
More attention than that inane Mankiw post is worth. (FWIW, there's "the problem of recidivism" for society as a whole, and for particular prisoners specifically, and the two are probably not the same thing.)
"Understanding the Difference Between Column-Stores and OLAP Data Cubes" (The Database Column)
The data-cube stuff reminds me of the Anderson "petabyte computing" idiocy (although data cubes: not idiotic). Except Anderson's point would seem to be, "we can use data-cubes without worrying about schemas." Right?
Google meets MUSHs meets Second Life meets ... weird.
"The More Things Change…" (Technology Liberation Front)
FBI, wiretapping, history, FISA.
"Jay-Z And The Notorious B.I.G. - Allure (Ratatat Remix)" (YouTube)
I can never remember which album these remixes come from -- you've heard this one already, right?
Videos! Need to look for some on Android.
"Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats."
Data Primitives - OpenStreetMap
"Closed Ways" have to be detected by comparing the starting and ending nodes, and are often used to represent (!) buildings, as well as "ways" in particular. Which seems an odd overloading, but okay.
SUMO - Simulation of Urban MObility - Home
Traffic simulator. Can be built on top of the XML output of the OpenStreetMap people, which means... cool things, probably.
"Blog all dog-eared pages: Regarding The Pain Of Others, Susan Sontag" (Infovore)
I've read excerpts before (and these are some good ones), but I should just go ahead and buy this book.
CKAN - Comprehensive Knowledge Archive Network - REST API - Index
The RESTful API to CKAN.
Orbited – Networking for the Web
"Orbited provides a pure JavaScript/HTML socket in the browser." Again, there was some discussion of technology like this over the weekend in WI.
Lime And Ginger Cream Cheese Bars -- Courant.com
"Occasional Poem - Margaret Atwood" (The Culturatti)
That Atwood poem that's so right for so many occasions.
"Playing with SymPy" (Reasonable Deviations)
SymPy can calculate Gröbner bases; I didn't know that.
Cornell University Law School -- States: Listing by jurisdiction
Cornell's listing of links to state statutes and legal resources, for all 50 states.
Debunking the "Obama got a sweetheart mortgage deal" story. Which reared its head, briefly, this weekend in WI.
"Reassessing the Reassessment of the Logic of Suicide Terrorism" (The Monkey Cage)
"... not only adds more data on the global patterns of suicide terrorism ... [but also] tests the main hypotheses against all of the other causal factors that are prominent in the literature across several domains..." Sounds like it'd be worth reading.
Lawrence Wright, "The Spymaster" (New Yorker)
Wright quizzes McConnell about being (personally) monitored on page 12.
Publications: Joel A. Tropp, California Institute of Technology
SimplyNoise.com
White noise generator.
Herbert Spencer - Patriotism
"To them no folly seems greater than that of practising on Monday the principles they profess on Sunday."
"The secrets of *Lost*, revealed" (Marginal Revolution)
Looks pretty convincing -- however! No discussion of the archaeology theme, the "temple," the ruins, or the four-toed statue. Which seems like a major missing point.
"Queue everything and delight everyone" (0xDECAFBAD)
"And, in the end, that's really the purpose of a web-based content creation interface—accepting something as quickly as possible to make the user happy enough to continue submitting more." Sort of like a rat pulling a lever.
Salikhmetov, "The Heap Lambda Machine" (arXiv)
"This paper introduces a new machine architecture for evaluating lambda expressions using the normal-order reduction, which guarantees that every lambda expression will be evaluated if the expression has its normal form and the system has enough memory."
Freund and Hsu, "A new Hedging algorithm and its application to inferring latent random variables" (arXiv)
"We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables. "
"Yoopick: A sports prediction contest on Facebook with a research twist" (Oddhead Blog)
The first Facebook App I've signed up for on my own.
Best visual joke of the day.
"Maximum Margin Matrix Factorization" (StyleFeeder Tech Blog)
Jason explains some of Nati's older research, with links. Fun to see these matrix factorization techniques in use in the wild. Jason's work (as he described it to me) on getting these things to run on a large scale has been pretty impressive.
"Reviewing Horror Stories" (Machine Learning (Theory))
JL reviews the papers he had inexplicably rejected. Some good ones in there.
"Listserv Etiquetter V: In which nothing is ever "merely" anything" (Acephalous)
Here in CSAIL, it's not adding the 'Q' to LGBT -- it's correcting people who send PDF attachments, or ask for help with MS products. And the one doing the correcting has the initials RMS. Otherwise, this is perfectly accurate.
"What determines fertility?" (Marginal Revolution)
For the next time that someone I know (ahem) whips out the old "Eurabia" canard.
"How to load large files safely into InnoDB with LOAD DATA INFILE" (MySQL Performance Blog)
Looks like it could be vaguely useful at some point.
"Running C and Python Code on The Web" (Toolness)
"Type Camp 2008" (i love typography, the typography blog)
I know you'll be vacationing still for part of this time, but I thought it'd be fun to forward it to you nonetheless.
Julian Sanchez at Ars Technica, "Judge: FISA trumps state secrets, binds executive branch"
EVEREST Testing Reports
Via Dan Wallach blogging at Freedom to Tinker. Reports on security evaluations of Ohio voting machines. EEEEeeeeeeek.
Lynch et al. "A genome-wide view of the spectrum of spontaneous mutations in yeast"
Junk Charts does Ben Fry! Which is funny, because in some circles his baseball graphs are some of his best-loved visuals.
My first gedit plugin - RussellBeattie.com
For when my new lab machine arrives, and I'm finally running GNOME at the office...
"No idea more obscure and uncertain" (Crooked Timber)
Swartz and Easwaran for the win...
eigenfactor.org - ranking and mapping scientific journals
At least they can generate beautiful graphics. Presupposing the existence of *modules* though, at least in biological networks, is completely question-begging. And can we stop using "eigen" as a prefix now? (kthxbye.)
"ContextFree.js & Algorithm Ink: Making Art with Javascript" (Aza’s Thoughts)
Completely and totally sweet. The initial focus on the syntax of the language is ... perfect.
"Sunstein-Wolfers on Capital Punishment" (Greg Mankiw's Blog)
Wow. Not really a subject deserving of such a flippant response. Presumably life-without-parole cuts down on the recidivism rate, too.
Needs an interface for sub-classification and search, but still... actually pretty sweet. Somewhat powerful in its simplicity ("simplify, simplify," should be our motto more often).
HopStop.com
I need to try this out and see how efficient their bus routing is, but this looks pretty useful.
AMD Stream Computing
Interfaces to the GPU from AMD (since I'm about to get a fancy new AMD-powered computer, here at work).
Fiddler
"Fiddler is a HTTP Debugging Proxy which logs all HTTP traffic between your computer and the Internet. Fiddler allows you to inspect all HTTP Traffic, set breakpoints, and "fiddle" with incoming or outgoing data." Via the YUI blog.
Lederman, "The Strangest (and Perhaps Most Revealing) Sentence in Justice Scalia's Boumediene Dissent" (Balkinization)
"Colbert: Right! – So we’d better just keep them all in there, for safety’s sake, ya know? The devil you know and the devil you don’t know – just lock them all up!"
Mark Denbeaux, "Justice Scalia, The Department of Defense, and the Perpetuation of an Urban Legend"
Via Marty Lederman. 12, not 30, of the released detainees could be said to have "returned to the fight." Of those, only 1 took up arms against the US. None of the detainees were released by federal courts or the CSRT.
"The End of Theory? Not Likely" (Freedom to Tinker)
Ed Felten on the Anderson essay. "But the reason is not that the correlation might have arisen by chance — that possibility can be eliminated given enough data. The problem is that we need to know what kind of causation is operating."
Van Roy, "The Challenges and Opportunities of Multiple Processors: Why Multi-Core Processors are Easy and Internet is Hard"
"The challenge of programming multi-core processors is real, but it is not a technical challenge. It is a purely sociological challenge." Via LtU. I gotta buy Nancy Lynch's book.
"Science versus Engineering" (Seth’s blog)
"Anderson gives no examples of this approach to science being replaced by something else." -- this could be a game, "find the best one sentence response to Anderson's article."
"The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" (Andrew Gelman)
"Faster computing gives the potential for more modeling along with more data processing." Personally, I thought the Anderson article sounded more like John Derbyshire at his most technocratic, than anything else.
Ravenel's "Complex Cobordism and Stable Homotopy Groups of Spheres"
Wordle - atgdel
Wordle's cloud of my own del.icio.us tags. Apparently all I do is read (and tag) blog posts. EEEsssh.
Zerbino and Birney, "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs"
(Genome Research 18 (5): 821) -- "short read assembly problem."
"A Path to Road Safety With No Signposts" (NYT Magazine)
How to drive in traffic that doesn't stop. I might have mentioned this to you before... I know I've talked about it, a lot.
Microsoft Popfly
You have to sign up for MSN ID and install Silverlight... a little graphics intensive, and I can't get it to work on a simple RSS processing example, but it looks interesting.
WNDB(5WN) manual page
The WordNet database files are (a) text files, but (b) are indexed by byte-offsets. I can't tell if this is hideous or hilarious (probably both).
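To see why byte-offset indexing is both clever and fragile, here is a toy version. This is NOT the actual WNDB file layout, just an invented three-line text file with an index of byte offsets built over it (all names and data below are hypothetical):

```python
import io
from typing import BinaryIO, Dict

def build_offset_index(raw: bytes) -> Dict[str, int]:
    """Map the first word of each line to the byte offset where the line starts."""
    index: Dict[str, int] = {}
    offset = 0
    for line in raw.splitlines(keepends=True):
        key = line.split(b" ", 1)[0].decode("ascii")
        index[key] = offset
        offset += len(line)
    return index

def read_record(f: BinaryIO, offset: int) -> str:
    """Jump straight to a record without scanning the file."""
    f.seek(offset)
    return f.readline().decode("ascii").rstrip("\n")

# Invented data; a real file would live on disk.
DATA = b"dog a domesticated canine\ncat a small feline\nemu a large flightless bird\n"
index = build_offset_index(DATA)
record = read_record(io.BytesIO(DATA), index["cat"])
print(record)
```

Lookups are a single seek, which is the nice part. Edit one character of the text file by hand, though, and every later offset is silently wrong.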
"LA-602 vs. RHIC Review" (Overcoming Bias)
A miscalculation involving lithium ended up costing the life of a Japanese fisherman.
Gordon, Miller, and Ostapenko. "Optimal hash functions for approximate closest pairs on the n-cube" (arXiv)
"One way to find closest pairs in large datasets is to use hash functions." To read.
Eclipse Memory Analyzer Open Source Project
Looks useful. Recommended by the Stylefeeder guys.
"Bissinger's Blog Bashing: Under the Bottom, and Off Target" (Dan Steinberg)
Best comment on the whole Buzz Bissinger/Deadspin thing, which functions perfectly (as do many things in Sports) as a proxy for old-media vs. new-media debates. Should send this to Strato.
django-rdf
"Django-RDF is an RDF engine implemented in a generic, reusable Django app, providing complete RDF support to Django projects without requiring any modifications to existing framework or app source code..." Cool.
Debategraph
Semantic web meets online (multi-site) debates. Awesome idea. Completely hideous browser/visualization. Can we get some real graphics and UI skillz up in here, please?
"We need a Wikipedia for data" (Bret Taylor)
"The trouble with institutional repositories" (Science in the Open)
Roberge, Blanchet, Dodson, Guderley, and Bernatchez. "Disturbance of Social Hierarchy by an Invasive Species: A Gene Transcription Study" (PLoS)
Invasive species meets gene expression.
Piwik - Web analytics
"piwik aims to be an open source alternative to Google Analytics." Nice graphs.
Devlin, Roeder, Wasserman. "Genomic control, a new approach to genetic-based association studies"
Yan, Isard, and Liberman. "Different Roles of Pitch and Duration in Distinguishing Word Stress in English"
Via the Language Log: http://languagelog.ldc.upenn.edu/nll/?p=252 Includes an analysis of American Supreme Court recordings, although "Clarence Thomas didn't speak often enough to be included in the analyzed data."
Ho, Imai, and King. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference"
Via the IQSS blog.
Kevin Quinn, "What Can Be Learned from a Simple Table? Bayesian Inference and Sensitivity Analysis for Causal Effects..."
Via the IQSS blog.
Halpern, "Defaults and Normality in Causal Structures" (arXiv)
ettercap
"Ettercap is a suite for man in the middle attacks on LAN. It features sniffing of live connections, content filtering on the fly and many other interesting tricks."
"Uniformed Letter Carrier with Child in Mailbag" (Smithsonian's Flickr Account)
The description says it's "a humorous photograph," but neither subject is smiling.
"Are you not entertained?" (soccernista.com)
"... yet the Portuguese talent always seems better placed on teams with more discipline and organization. The problem with Portugal? Too many Portuguese players."
"Java is finally Free and Open" (Rich Sharples’ Blog)
A good thing.
"Brouwer Fixed Point Theorem" (Ars Mathematica)
It took me a while to track down a downloadable version of the Scarf paper.
DecaffeinatID: A Very Simple IDS / Log Watching App / ARPWatch For Windows
"Designing Systems for the Grid: The Problem with "Retrofitting," Part 1" (The Database Column)
Optimizing queries with joins in RDBMSs, a very basic/light overview. The Catalan numbers make an appearance.
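My guess (an assumption on my part, not something quoted from the post) is that the Catalan numbers show up as the count of binary join-tree shapes: with n relations in a fixed left-to-right order, there are C(n-1) ways to parenthesize the joins. A quick sketch of how fast that grows:

```python
from math import comb

def catalan(n: int) -> int:
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def join_tree_shapes(num_relations: int) -> int:
    """Binary tree shapes over a fixed ordering of relations (assumed model)."""
    return catalan(num_relations - 1)

for n in range(2, 9):
    print(n, join_tree_shapes(n))
```

Allowing the relations to be reordered as well multiplies this by n!, which is why optimizers prune rather than enumerate.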
"What is the probability of the Large Hadron Collider destroying the universe?" (Overcoming Bias)
"If you think the answer is zero please don’t bother posting a comment since your knowledge of probability theory is insufficient for your comment to be informative." What an odd comment! I'm not sure what "probability theory" has to do with it.
Algebraic Data Types in JavaScript - w3future.com
"The strong law of large numbers" (What’s new)
"the strong law .. is a little more subtle, so much so that its proof usually only appears in advanced graduate texts. So I thought I would present a proof here of both laws, which proceeds by the standard techniques of the moment method and truncation."
Irongeek.com
Premature generalization of Tufte is the root of all evil.
Kellogg's Lego® Fruit Flavored Snacks
Entry into the Annals of Bad Ideas.
6 New Tainted Tomato Cases in City - City Room - Metro - New York Times Blog
"The sum-product phenomenon in arbitrary rings" (What’s new)
It's this lingering question of mine, how stuff like this is related to "sum-product algorithms" in things like graphical models. Tropicalization appears in the first (and as of now, only) comment. Is that part of the connection?
"Hoping to control stem cells at Cold Spring Harbor" (The Niche)
To blog.
Roadmap to Torture - The Washington Independent
Spackerman summarizes the recent Senate Armed Services Committee Hearing.
No, just *you*.
"An Improved Analysis for "Why Simple Hash Functions Work"" (My Biased Coin)
Because I've been thinking about hash functions recently. "...whenever anyone actually does an experiment involving hashing, you get the right results no matter what hash function you use."
"Who are the aggressive drivers?" (Marginal Revolution)
Don't forget: Massachusetts plates.
"File Recovery: How to Recover Deleted Files with Free Software" (Life Hacker)
Geohash for spatial index and search - GSWB
"Given any two pairs of latitude and longitude coordinates about a similar location but of different granularity, the strings produced by the algorithm will always share a common prefix string."
"User authentication in Django" (Django Documentation)
Users in Django. Need to add this to the cavitynesting stuff.
Manual Page: logresolve - Apache HTTP Server
logresolve resolves IPs -> hostnames in apache log-files, in a semi-intelligent way (hashing old results).
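The "hashing old results" part is just memoized reverse DNS. A sketch of the idea in Python, which is not logresolve's actual implementation; the function names and the fall-back-to-IP behavior are my assumptions:

```python
import socket
from typing import Callable, Dict, Optional

def make_cached_resolver(
    lookup: Optional[Callable[[str], str]] = None,
) -> Callable[[str], str]:
    """Return a resolver that performs each reverse lookup at most once."""
    if lookup is None:
        def lookup(ip: str) -> str:
            try:
                return socket.gethostbyaddr(ip)[0]
            except OSError:
                return ip  # assumed fallback: keep the raw IP on failure

    cache: Dict[str, str] = {}

    def resolve(ip: str) -> str:
        if ip not in cache:
            cache[ip] = lookup(ip)
        return cache[ip]

    return resolve

# Usage with a stand-in lookup, so nothing touches the network here.
calls = []
def fake_lookup(ip: str) -> str:
    calls.append(ip)
    return "host-" + ip

resolve = make_cached_resolver(fake_lookup)
names = [resolve(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]]
print(names, "lookups performed:", len(calls))
```

Since a log file tends to repeat the same few addresses many times, the cache does most of the work.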
Don't talk to the police - Hack a Day
"PRAWN!" (April Winchell)
"Aw yeah. Now that is some hot salmon." Funniest/most-disgusting thing I've read all day. Totally NSFW.
Long Bets - On the Record: Bets
Long Bets is a fun site, but the thing it makes me ultimately conclude is: Kevin Kelly is crazy. Bigfoot? Seriously?
"Detecting ISP throttling" (Hack a Day)
Google is building software to detect ISP throttling? That's awesome.
"Richard McKenzie's Popcorn" (Felix Salmon at Portfolio.com)
"if you're in a cinema which gives you a choice between buying a medium bag of popcorn and a large tub of popcorn, there's a greater-than-50% chance that the medium bag will actually contain more popcorn than the large tub..."
"Research-Changing Books" (Inductio Ex Machina)
It'd be worth thinking about which books, if any, have changed the way I do research...
ATMCS2
Program from the 2006 conference.
Mod_python Manual
Dare Obasanjo aka Carnage4Life - Two Cardinal Sins of REST API Design: Lessons you can Learn from the NewsGator REST API
"Evaluating topic models" (natural language processing blog)
What *would* "topic models for the sake of topic models" look like? "I must publish a paper on topic models, so that no one forgets the name 'Dirichlet'?"
"Libnet is a generic networking API that provides access to several protocols."
Caulkins, "Modeling the Domestic Distribution Network for Illicit Drugs" (JSTOR)
(Management Science, Vol. 43, No. 10 (Oct., 1997 ), pp. 1364-1371)
"It Feels Like Home" (freedarko.com)
"For all of you who wear grad school like a badge, consider yourself knighted if you pick the just-right theorist for that sentiment." Freedarko addresses the NBA reffing/corruption controversy.
"More mathematical confections" (Secret Blogging Seminar)
The Poincaré disk in shortbread form.
Nordstrom, "Models of Computation: Primitive Recursive Functions"
Informal, readable undergraduate-level review. Should be second-nature to any graduate student of computer science. No claims to completeness, of course, but useful nonetheless.
Stylized Depiction: Non-Photorealistic, Painterly and 'Toon Rendering
Lee, Nadler, and Wasserman "Treelets -- an Adaptive Multi-Scale Basis for Sparse Unordered Data"
Andrew Beyer - The Story Behind Big Brown's Bad Belmont May Never Be Known - washingtonpost.com
Someone's on their trail -- so the jockey's going to take the fall! I'm looking into hosting options for DirtyBrown.com.
Djangofriendly
Django-ready hosting services. WebFaction is recommended.
"Generic CSV Export" (Django snippets)
It's the HTTP response header stuff, that always trips me up.
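For my own future reference, the two headers that matter are the content type and the Content-Disposition. A framework-free sketch (not the linked snippet; in Django the same two values would be set on the response object, and the function name and default filename here are made up):

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple

def csv_download(
    rows: Iterable[Sequence[object]], filename: str = "export.csv"
) -> Tuple[Dict[str, str], str]:
    """Build the headers and body for a CSV file download."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    headers = {
        # Tells the browser what it is getting...
        "Content-Type": "text/csv",
        # ...and that it should be saved as a file rather than rendered.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    return headers, buf.getvalue()

headers, body = csv_download([["name", "count"], ["bluebird", 3]])
print(headers)
print(body)
```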
"What to teach in a Bayesian data analysis course" (Andrew Gelman)
"The key thing in the early chapters is to not obsess on the question of 'where do the priors come from.' They're just models, they come from the same place that likelihoods come from."
"Security Theater" (mrh at Unlikely Words)
Seriously? I just say, "I forgot my ID," and that's all it takes?
Charles Bukowski - We Ain't Got No Money, Honey, But We Got Rain
"and the jobless men stood // looking out the windows // at the old machines dying // like living things out there."
Andrews, D'Argenio, van Rossum, "Significant Diagnostic Counterexamples in Probabilistic Model Checking" (arXiv)
"(Finite) paths in counterexamples are grouped together in witnesses that are likely to provide similar debugging information to the user. " It'd be neat to apply this to other kinds of models, but I need to read the paper first.
Eucalyptus
"Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems - is an open-source software infrastructure for implementing "cloud computing" on clusters."
"NASA: 'Extreme programming' controls Mars Lander robot"
A glimpse at the programming team that controls the Phoenix lander, writing control code daily and then uploading it millions of miles away. (But they're writing day-to-day stuff in C? Really?)
"To Justify Something Is To Diminish It?" (John Holbo at Crooked Timber)
"Being an academic [for Stanley Fish] means never having to say you’re sorry for not having reasons."
American Public Media: Budget Hero
"A New Step In Evolution" (The Loom)
Carl Zimmer writes about the Lenski E. coli evolution experiments -- they've spontaneously developed the ability to metabolize citrate. Hottt.
"Carbon Tax: How Much, How Soon?" (Capital Gains and Games)
Yeeesh.
"Are Machine-Learned Models Prone to Catastrophic Errors?" (Datawocky)
This doesn't seem like it's a feature of "machine learning" in *general*. Contrast EP and BP methods of approximation in graphical models, D(P||Q) vs D(Q||P).
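The point of contrasting the two directions is that KL divergence is not symmetric, so which one an approximation minimizes changes what kinds of errors it makes. A tiny numeric sketch with two made-up distributions (the numbers are arbitrary):

```python
from math import log
from typing import Sequence

def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) in nats for discrete distributions on the same support."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]  # arbitrary example distributions
Q = [0.9, 0.1]

forward = kl(P, Q)  # D(P || Q)
reverse = kl(Q, P)  # D(Q || P)
print(f"D(P||Q) = {forward:.4f}, D(Q||P) = {reverse:.4f}")
```

D(P||Q) blows up when Q puts near-zero mass where P has mass, while D(Q||P) is penalized the other way around; the "catastrophic" failure mode depends on which one you chose.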
"Breaking Boundaries" (The Loom)
Carl Zimmer has scarred me, forever.
Ierusalimschy, de Figueiredo, Celes "The Implementation of Lua 5.0"
Discussing the implementation of a particular interpreted language. (Closures, one-pass compiling.)
Shi, Gregg, Beatty, Ertl "Virtual Machine Showdown: Stack Versus Registers"
The title has an "Ecks vs Sever" ring to it. Comparing performance of stack-based (e.g., the Java VM) vs register-based implementations for interpreted languages.
Ertl, Gregg "The Structure and Performance of Efficient Interpreters" (ResearchIndex)
Branch optimization in the design of an interpreted language.
"APS 2008: Doing algebra -- it's the little things that count" (Cognitive Daily)
Ordering and notation in high-school algebra. See, what I take away from this is that it must be *very hard* to design algebra tests correctly (to actually test what you want to test). Test-writers probably don't even recognize their own mistakes.
How to Create Translucent and Shaped Windows (SDN)
Tutorial for Java/Swing.
"Playing Chess With RZA" (Gambit)
From the NYT Chess Blog. Somewhat funny.
Let's all crawl Nature.com!
"Social Tagging for Science" (Nascent)
Weitzman, "Prices vs Quantities" (JSTOR)
(The Review of Economic Studies, Vol. 41, No. 4 (Oct., 1974), pp. 477-491)
"Friday PSTSS: "Birdhouse in Your Soul"" (In Medias Res)
Now that I read it, it makes complete sense... but I don't think I ever really understood what that song *meant*, before. Weird.
"Murakami, Baseball, and Inspiration" (Seth’s blog)
"'Something flew down from the sky at that instant,' he wrote, 'and, whatever it was, I accepted it.'" Murakami decided to become a writer after watching the leadoff batter for his favorite baseball team hit a double.
Kerman, "Umacs: A Universal Markov Chain Sampler"
Draft paper behind the R package for Umacs, which (along with Bugs) is used throughout Gelman's (black) book.
Sujay, Vincent, Willsky. "Learning Graphical Models for Hypothesis Testing"
(Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop, 26-29 Aug. 2007, pp. 69-73) To read. Via lidsblog.
Harriet McBryde Johnson, "Unspeakable Conversations" (NYT Magazine)
Johnson writes about visiting/debating Peter Singer at Princeton. I feel so strange when I read about her friends who urge her *not to go to Princeton at all*, or even shake his hand! As if "legitimating" his views was what was at issue!
"Dense Subsets of Pseudorandom Sets: The Paper(s)" (in theory)
"... it can be a short starting point for the computer scientist who is interested in trying to read Gowers’ paper, or for the combinatorialist who is interested in trying to read our paper."
"The Bush Administration's Teapot Dome" (Capital Gains and Games)
Probably will be (ultimately) swept under the rug. Senate debates will be mined, 150 years from now, by the holo-movie director of the future, Paul Thomas Anderson VI, for quotes. "*You* have a federal subsidy, and *I* have a corrupt contract..."
Jansche, "Maximum Expected F-Measure Training of Logistic Regression Models"
(Human Language Technology Conference / Conference on Empirical Methods in Natural Language Processing. October 2005.) Via LingPipe Blog. To read.
Pask, Behringer, and Renfree. "Resurrection of DNA Function In Vivo from an Extinct Genome" (PLoS ONE)
I heard David Haussler talk, a year or two ago, about paleogenomics (briefly). Since then, not only have I started noticing more papers on the subject, but I've also started to feel that research like this should play a larger role in "public debate."
"Post-World War II cooling a mirage" (Climate Feedback)
Someone should make an lolcat out of this -- "Mah bukkits can has misinterpreted sea surface temperature readings?"
"are religious congregations permanently failing organizations?" (orgtheory.net)
Orgtheory links to a paper -- religious organizations have "among the lowest annual mortality rates ever observed..." Is this because they are "minimalist," or "permanently failing" organizations?
"Map-reduce-merge: simplified relational data processing on large clusters" (Lambda the Ultimate)
Map-reduce, extended with a relational operator. To read.
"Applied Proof Theory: Proof Interpretations and their Use in Mathematics" (Lambda the Ultimate)
"Defunctionalization and Java" (The Universe of Discourse)
Straightforward. But well-written, nonetheless.
"Money talks and the social construction of reality, walks" (Crooked Timber)
"... but the infantry of the Science Wars seems to me to be very heavy indeed on science 'fans'." Apparently Davies is not much of a fan (anymore) of Dennett. The category of "science fan" applies in biology, too.
"Politically Induced Dementia" (Julian Sanchez)
"E pluribus unhinged." I still browse over to Taylor Marsh's site, every time I need a little weirdo pick-me-up.
GOS
"Your One Stop for Finding and Using Geographic Data"
Hadley Wickham, "A Grammar of Graphics"
Presentation for an R-graphics package. Not quite fleshed out, but .. thinking along the right lines.
Danny Hillis, "Richard Feynman and The Connection Machine" (Physics Today)
"I suspect his motivation was not so much to understand the world as it was to find new ideas to explain. The act of discovery was not complete for him until he had taught it to someone else. "
File input (or "upload") in HTML forms
"Casual Fridays: Mac users don't like people touching their technology" (Cognitive Daily)
No touchy.
"I've (probably) been using Google App Engine for a week longer than you have"
Simon Willison's slides from a presentation about Google App Engine. I need to go think about indices and models for a while. I'm not sure yet, about what "ancestor()" does.
Phillips.pdf (application/pdf Object)
Water-based economic simulations. It'd be fun to write a computer simulator for the Phillips machine, too.
Barron and Sheu, "Approximation of Density Functions by Sequences of Exponential Families" (JSTOR)
The Annals of Statistics, Vol. 19, No. 3 (Sep., 1991), pp. 1347-1369
Spanish Team in Deal for Altidore - NYTimes.com
Altidore to be signed (we hope) by Villarreal, and (maybe) loaned out to Recreativo Huelva. All to the good.
"Mapmaker, mapmaker, make me a map." (The Edge of the American West)
Next idea for a weekend programming project!
Starbucks offers new flavor: Free Wi-Fi - USATODAY.com
"...Starbucks will try to lure more customers by offering two hours of free AT&T Wi-Fi a day." Two consecutive hours a day if you sign up for a card and use it once a month.
"Power analysis for multilevel studies" (Andrew Gelman)
I spent part of the weekend re-reading Ch 20 of Gelman's (black) book.
REST - The short version
"REST" has always seemed, to me, to be a shorthand for "rediscovering the original intent behind HTTP."
de Lara and Guerra. "Pattern-based Model-to-Model Transformation" (arXiv)
"We present a new, high-level approach for the specification of model-to-model transformations based on declarative patterns."
Andrew, Mitnitski, and Rockwood. "Social Vulnerability, Frailty and Mortality in Elderly People" (PLoS ONE)
"The aims of the present study were to operationalize social vulnerability according to a deficit accumulation approach, to compare social vulnerability and frailty, and to study social vulnerability in relation to mortality."
Daniel Dennett, "Fun and Games in Fantasyland"
Daniel Dennett vs. Jerry Fodor on the subject of "design" and evolution. (Mind & Language, Volume 23 Issue 1 Page 25-31, February 2008). Via SEK.
ABC News: Karaoke Singer Allows Boston to Rock On
Installing Apache 2 and mod_python on OS X
Brian Greene, "Put a Little Science in Your Life"
NYT Op-ed.
ID3v2Easy - ID3.org
The organization that supports the ID3 tag format. Clearing out my opened tabs...
JID3 - A Java ID3 Class Library Implementation
Java library for reading ID3 tags in mp3s. For when I get around to rolling my own mp3 indexer.
New Coen Brothers movie.
tights are not pants
The subject of some discussion recently.
"DENIED ASPIRATES." (languagehat.com)
Languagehat reads Middlemarch. "I beg your pardon: correct English is the slang of prigs who write history and essays. And the strongest slang of all is the slang of poets."
"Decoration Day" (The Edge of the American West)
4000, 2000, 13000, 620000 (!), 2000, 115000, 405000, 36000, 58000, 400, 5000+.
4. More Control Flow Tools
"elif"?? Seriously? I didn't have internets over the weekend, and I was unable to intuit this on my own. What a dumb keyword.
Hale, "The Man without a Country." (Harvard Classics Shelf of Fiction)
The requisite short story.
"Is the U.S. "operating 'floating prisons'..." (BLDGBLOG)
Lacks the requisite reference to "The Man Without a Country." Also, not that I'd put it past them, but I totally would never trust anything reported in the Guardian without additional corroboration.
Via Alex Palazzo at The Daily Transcript. Pretty beautiful. I'm not sure what I think about the sound-effects, though. Little whirring machines? Probably should be tiny splatting and squishing sounds instead, right?
Tang, Boujemma, and Chen. "Modeling Loosely Annotated Images with Imagined Annotations" (arXiv)
It'd be cool to think about meshing a system like this with a framework for handling and storing biological blot image data.
Victor Shoup, "A Computational Introduction to Number Theory and Algebra"
‘The Onion Movie’ Trailer: Finally, a ‘Kentucky Fried Movie’ for Our Generation (New York Magazine)
"That looks awesome."
"Swiss Shred Tinner Docs; Khan Talks" (ArmsControlWonk)
"I think, at this point, with no one actually like going to jail for running the AQ Khan network, I am pretty convinced that the entire network by 2003 was on the US payroll." Jesus.
"Generating Primary Keys" (StyleFeeder Tech Blog)
Jason Rennie (who is thoughtful) thinks about generating primary keys in Java with MySQL. It's about *not* using auto_increment.
"The Other Night Sky" (BLDGBLOG)
BLDGBLOG talks about Iridium flares. (I'm not sure that "blent" is a word, though.)
Guido van Rossum's Django/appengine reimplementation of his Mondrian project, but for public consumption.
"Sweden: Image and Reality" (Consider the Evidence)
"Like all countries, though, Sweden is more complex than the stereotype suggests. Here are a few things that may surprise." I like the "Surprises for the Left" and "Surprises for the Right" division of points.
"Tokenization vs. Eager Regular Expressions" (LingPipe Blog)
"I was expecting the whole thing to match. Apparently, that’s what a POSIX regex would do. But Java follows the Perl model, in which eagerness overcomes greediness." As 6.945 just began to show me this semester, most regex implementations are *broken*.
Bergman, Rogowitz, Treinish, "A Rule-based Tool for Assisting Colormap Selection"
(Dennis -- same guys as the ones you just linked to, but this time a report on a color picking tool called "PRAVDA", which is *slightly* more formal.) An IBM technical report. Describes the PRAVDA color selection system. Something like this should be more available in the biological sciences -- the use and abuse of "heat maps," for instance, with gene expression, is insane.
"The ethanol program is even worse than you think" (Marginal Revolution)
"But more realistically, the movie is a loss leader to attract buyers of high-margin popcorn. " (Damn you, Big Corn.)
"Wednesday Sex: Schopenhauer - hurrah for lesbians and contraception" (The Weblog)
"All hail the non-reproductive species-being of a paradoxical Schopenhauerian feminism!" The Weblog is beginning to read like Free Darko, in places. (This is one of those places.)
Java Programming - How to find the verion number a class compiled with? (Sun Forum)
Tells us what we want to know -- basically, Chris, we can either look inside the class files manually, or we can use javap (the Java disassembler) to look at them for us. To do this, we'll need to unjar the JAR, but that's easy.
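The "look inside the class files manually" route can be sketched in a few lines. This is a minimal illustration, assuming the standard class-file layout (a 4-byte magic number `0xCAFEBABE`, then a 2-byte minor version and a 2-byte major version, all big-endian). The function name `class_file_version` and the file name in the usage comment are hypothetical.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every valid .class file


def class_file_version(data):
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Big-endian layout: u4 magic, u2 minor_version, u2 major_version
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a class file (bad magic number)")
    return major, minor


# Hypothetical usage, after unjarring the JAR:
# with open("Foo.class", "rb") as f:
#     print(class_file_version(f.read(8)))
```

A major version of 49 corresponds to Java 5 and 50 to Java 6. `javap -verbose` reports the same two numbers without any hand parsing.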
"A: 1,114" (Acephalous)
"It is in the nature of a piece of writing that it is able to stand free of its begetter, and can dispense with his or her physical presence. In this sense, writing is more like an adolescent than a toddler."
Hugh Laurie, "Wodehouse saved my life"
"My history teacher's report actually took the form of a postcard from Vancouver." Laurie writes about playing Wooster for television.
"New York Times API Coming" (ReadWriteWeb)
NYT planning a web-api to their structured data (their news, of course). I don't understand what the business model here is, but then again, maybe that's why I'm not in the newspaper business? Right?
"Ratatationage" (Sous les paves, la plage)
Another LP3 track: "Shempi."
Harris and Mattick, "Science Sublanguages and the Prospects for a Global Language of Science" (JSTOR)
Annals of the American Academy of Political and Social Science, Vol. 495, Telescience: Scientific Communication in the Information Age (Jan., 1988), pp. 73-83
Zellig Harris, "The structure of science information"
[http://dx.doi.org/10.1016/S1532-0464(03)00011-X] Journal of Biomedical Informatics, 2003.
Zellig S. Harris, "Language and Information: The Bampton Lectures"
"The links below enable you to hear Zellig Harris describe his theory of language and information in his own words." To listen, and (probably) to buy.
"APS 2008: What Chutes and Ladders has to do with learning Math" (Cognitive Daily)
"Whether kids played board games correlated with number line success, but playing video games did not."
"The Microsoft Case: The Government’s Theory, in Hindsight" (Freedom to Tinker)
"Life doesn’t always offer do-overs, but we may get a do-over on the browser war, and this time it looks like Microsoft will take the high road." Ed Felten is writing a series of posts on the 10th anniversary of the MS antitrust decision.
Bertsekas and Tsitsiklis, "Parallel and Distributed Computation: Numerical Methods"
Republished book, in 1997. Now available free online.
"The array size of the universe" (Shtetl-Optimized)
"I’ve been increasingly tempted to make this blog into a forum solely for responding to the posts at Overcoming Bias." Haven't we all?
Asian Dan: Ratatat - Mirando
It's time.
RATATAT & CAZALS at WAVES AT NIGHT
I think you know what to do.
"Through the Children's Gate: A Home in New York", by Adam Gopnik (Amazon)
Apparently includes the essay, "Last of the Metrozoids," one of my favorites.
Martin Jansche, "Treebank Transfer"
Annotating a large corpus of text from a 'related' treebank.
"aesthetics of scientific illustration" (orgtheory.net)
Some really beautiful examples.
Bartz, Kane. "Matching Portfolios" (SSRN)
"Under the framework of the Rubin Causal Model, the matched portfolio provides a view of the counterfactual, an alternative portfolio that the manager could have chosen but did not." But -- it's David Kane.
Carl Zimmer's Science Tattoo Emporium
A menagerie, of sorts.
6 Tribes of Bacteria, the Good Kind, Found to Be at Home in Inner Elbow - NYTimes.com
"No Harm Done?" Also: # of genes != complexity. Especially when you're *summing* across species.
Luna's Light -- Universal Grieving Symbol™
"The Universal Grieving Symbol™ Pin can be worn at the darkest, most difficult time - like on a day when support is needed from friends or co-workers." OMFG. Also -- trademarked? Really?
Webmonkey: the Web Developers Resource
arbingersysWrit: Google App Engine: [A Better] Many-to-many JOIN
I'm sure someone's doing work on the *theory* of data denormalization, right? The hows/whys/how-muchs of duplicating data? I wonder how hard it would be to do this kind of transformation automatically?
"There Goes My Hero: Golden Richards Won't Wake Up" (Deadspin)
"Whoa—say what? Painkillers? From someone who served jail time for trying to buy pills with checks he stole from his dad? This seems very wrong. “No, sorry,” I say. “I don’t have anything like that.”"
Cremer, Garicano, and Prat "Language and the Theory of the Firm"
"...the world's foremost genomics research center, the Broad Institute in Cambridge, MA."
"A NEW AENEID." (languagehat.com)
Frederick Ahl can snack on 'em, as the kids say.
“The vulgar Abolitionists in the Senate are getting above themselves…” (The Edge of the American West)
"The nation slipped headlong toward war, the skids greased with martyrs’ blood."
Cyberspace by William Gibson from Burning Chrome
"Lines of light ranged in the nonspace of the mind, clusters and constellations of data." Gibson's definition of "cyberspace" as a "consensual hallucination" of data. I need to reread Neuromancer.
"Linear Logical Algorithms" (Lambda the Ultimate)
Deletion as a "non-logical" operation. And how do we make it "logical?"
"A Genetic Gastric Bypass" (The Loom)
Carl Zimmer, writing about the Platypus Genome, says "...for some reason, the platypus stomach has disappeared, leaving just a featureless tube connecting the esophagus to the intestines."
Paul Murrell, "Introduction to Data Technologies"
Draft book, available online. Linked to by Andrew Gelman.
"Parents' influence on kids' behavior: Not much" (Cognitive Daily)
"Unfortunately a lot of the research suggests that parents don't actually have much influence on their kids' behavior -- peers, other environmental factors, and genetics seem to have a larger impact."
The Official YAML Web Site
Apparently, this ain't a markup language.
Music Computation and Cognition [ MuCoaCo ]: Publications: Theses
Must track down the Eric Cheng master's thesis. A USC database, somewhere?
MCM International Conference Mathematics and Computation
"...the first international Conference of the newly founded Society for Mathematics and Computation in Music (SMCM). "
Elaine Chew: Presentations
USC Researcher on computational music.
Full-Screen Exclusive Mode (The Java™ Tutorials)
Tutorial for using the entire screen in Java.
Creating and Drawing an Accelerated Image (Java Developers Almanac Example)
Sun tutorial for using VolatileImage.
"VolatileImage: Now you See it, Now you Don't" (Chet Haase's Blog)
Hardware-accelerated image buffers in Java.
Python: module tex_wrap
"Implements TeX's algorithm for breaking paragraphs into lines." I need to look into this -- I wonder how hard that actually is?
Graphserver - Fine Open Source Itineraries
"Graphserver is a webservice server providing shortest-path itineraries on large graphs. " Graph searching and paths in a client/server setup.
"waayt!" (I Can Has Cheezburger?)
I'm really ashamed that I find this cute.
"After your comments are verified, you will be awarded points through the McCain Online Action Center. " It's the John McCain Role Playing Game! But they don't tell me how many points I'd need to level up.
A History of Sigma (from Cora Styles, who used to be in the Fink Lab)
"A number of people in the lab claimed this "wasn't yeast" and Carlos should get rid of it. Carlos was not so easily convinced."
"obama is no longer toast, thanks to michigan and florida" (orgtheory.net)
Apparently, 90% of politics is just *not* showing up.
"walmart and economic inequality" (orgtheory.net)
IANAE, but I score this one for Kenworthy too.
"Inequality and Prices" (Consider the Evidence)
"But income is important in its own right because it confers capabilities to make choices." The obvious response to that somewhat-maddening Levitt post. So why didn't SDL think of it on his own?
"Retinal sex and sexual rhetoric" (Language Log)
Even better. In this follow-up, Liberman catches Sax making the obvious mistake in mixing up what are (essentially) the posterior and predictive distributions. Could easily be a case study in Stats 101: How Not to Do It.
Thinkbase
Freebase graph viz. I'm just going to go ahead and say: "unless you have a fresh idea about automatic layout, no more graph visualizations please." I think they're almost universally hard to understand...
3D-Mailbox (Level 2: LAX!)
"Makes your email experience feel like a cross between a videogame and a movie." I'd like to Send Money in Exchange for Your Awesome Software, kthxbye.
"Portable Executable" (Wikipedia)
"The Portable Executable (PE) format is a file format for executables, object code, and DLLs, used in 32-bit and 64-bit versions of Windows operating systems."
Application-Specific Attacks: Leveraging the ActionScript Virtual Machine (IBM)
A whitepaper from IBM, explaining how an exploit in the ActionScript VM works (based on a size-check that does an unsigned comparison, followed by a memory alloc whose return value isn't checked).
"My favorite things Japan, cinema edition" (Tyler Cowen at Marginal Revolution)
Tyler Cowen taught Battle Royale in a "Law and Literature" class? I confess, I couldn't even make it through 15 minutes of that movie, I thought it was so bad (sorry John!).
"Lolita and the Middle East" (Seth's Blog)
The next person I hear use the phrase "speaking truth to power" who *isn't* named Ms. Charcar, I'll break their nose.
Elaine Chew, "The Spiral Array: An Algorithm for Determining Key Boundaries" (CiteSeer)
Just came from a talk she gave. I think this paper is adapted from her thesis, in 2000?
"Liberman on Sax on Liberman on Sax on hearing" (Language Log)
Oh, snap.
Genome Reference Consortium
"The goal of this group is to correct the small number of regions in the reference that are currently misrepresented, to close as many remaining gaps as possible and to produce alternative assemblies of structurally variant loci when necessary."
[tt] NS: Do we need to change the definition of science?
The New Scientist is totally worthless. I need to filter my "Overcoming Bias" feed through Pipes, so that I get all the Hanson but none of the Yudkowsky. Seriously!
"Using Git as a versioned data store in Python" (New Artisans LLC)
"We have violated the prime directive" (Cranial Darwinism)
The LGF'ers freak out when they discover that someone's doing NLP research on their comment threads. But if you're posting on a (public) computer forum, you should *expect* that machines will read what you write!
Processing Matrix Library
It probably wouldn't be too hard to contribute my (brain-dead) GE and matrix inversion code to this project, right?
de.bezier.mysql
MySQL (and other databases?) library for Processing. I think you can probably guess where I'm going with this.
"Convenient Categories of Smooth Spaces" (The n-Category Cafe)
"Here’s one way the category of smooth manifolds is annoying: the space of all smooth maps between smooth manifolds is not a smooth manifold!" With two interesting links to papers.
"Logical Algorithms" (Lambda the Ultimate)
Paper Critters
Papervision + toys. "Cute or evil? Loyal or rebel?" I'm not sure what the point is, and it nearly crashed my browser. And I wish you could fold in different patterns. But it's still cool.
Ex Libris Mortis
Two of my favorite things, together at last. "The fearsome 'Hello Kitty Sisters of Battle,' notice how the rhinos sparkle..."
"The Spectral Lower Bound to Edge Expansion" (Luca Trevisan)
I've been meaning to read this for so long... but I really can't have it taking up a tab in firefox anymore. Into delicious it goes.
"Economic fundamentalism and the minimum wage" (Crooked Timber)
Kathy G. gives a personally biased (the best kind!) review of research on the minimum wage and its effect on employment. Truly informative. Probably should be more hyperlinked than it is.
"Axes that extend below 0 or above 1: actually a bigger issue involving how statistical variables are stored on the computer" (Andrew Gelman)
It's too facile to say that this is "Gelman Dreams of Type-Systems," but that's kind of what this is. The issue of automatic analysis and visualization tools that rely on a static analysis of their input is ... important.
django-db-log
Via Simon Willison. Database logging solution for Django. One of my projects for today is getting a pre-existing Django app up and working on Google App Engine.
Vertext
"...enables you to draw giant, detailed typography at high frame-rates." After seeing Ben Fry talk at CSAIL yesterday, I'm beginning to think about Processing again. John?
Beanstalkd
"beanstalkd is a fast, distributed, in-memory workqueue service." I wonder how hard it'd be to write a java interface for this.
a softer world #306.
"She doesn't take any shit from vegetables, for instance."
"Placebo effects and the probability of assignment to active treatment" (Social Science Statistics Blog)
"Also, if placebo drugs can improve health outcomes, maybe ineffective social programs would still work as long as participants don’t know whether the program works or doesn’t? Maybe this is the role of politics." Don't go there!
"Copper: A Gentleman's Disagreement" (In the Pipeline)
Hilarious. Of course, the punchline of the joke is the parenthetical aside ("(see title)") in the abstract of the paper itself.
Yahoo! Internet Location Platform - YDN
"The Yahoo! Internet Location Platform provides a resource for managing all geo-permanent named places on Earth."
SBML: Systems Biology Markup Language
They emphasize that it's meant to represent "quantitative models" not just "pathways." Still not sure that it's not too low-level for what I want, but ... probably a step in the right direction.
Colbert Report, "The Bill O'Reilly Inside Edition Video"
The whole "Papa Bear" schtick was *made* for situations like this.
"Floating in Platonic heaven" (Shtetl-Optimized)
"To summarize, Kronecker had it backwards. Let Man and Woman deal with the integers; all else is the province of God." Scott Aaronson's an intuitionist, I guess. (But then again, aren't most computer scientists?)
Wilf, "generatingfunctionology"
Free online book about generating functions, by the ubiquitous (to me) Herbert Wilf.
SBGN : Systems Biology Graphical Notation
Mentioned as something I should look at, by one of the other students (who works at the Broad) after my final presentation in 6.945 today.
"Procrastination Is Not Laziness" (Will Wilkinson)
"Doing something else is not laziness; it’s misdirected industriousness." But let me tell *you* -- watching seven hours straight of Law & Order isn't industrious, misdirected or otherwise.
"Charter Will Monitor Customers Web Surfing to Target Ads" (NYT Bits Blog)
"He said that opt-out has become the norm for all targeting on the Internet."
Steven Pinker, "The Stupidity of Dignity."
(David Gelernter is on the council for bioethics now??!?) This essay is correct in *every* detail.
"I Said, This Ain’t No Mecca, Man…" (Attackerman)
Spackerman IMs with a friend of his, who teaches in Irbil (Iraq). "But people fill in teh blanks… 'America can’t be taht stupid'"...
Novembre and Stephens, "Interpreting principal component analyses of spatial population genetic variation" (Nature Genetics)
"We find that gradients and waves observed in ... maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events."
Reaction to the DEC Spam of 1978
RMS has been soft on spam since the beginning! (Unless it's spam from a dating service.) Awesome.
FiveThirtyEight.com
Comprehensive political polling, nice interface, beautiful graphs (cartograms!). Why didn't I know about this before? (Via the Harvard Social Sciences blog.)
Datamob
"Datasets and interfaces to them." Closer to exactly right.
"Shiller" (YANP)
"The end in sight" (BLDGBLOG)
I assume everyone has seen these absolutely amazing volcano/lightning photographs by now? This is the volcano that kept Dave's wife grounded in Argentina last week -- apparently it shut down a lot of air travel.
"Cat Proximity" (xkcd)
An accurate description of how my girlfriend lives.
"Vicious Cycle" (PhD Comics)
An accurate description of how I live.
"Celebrity Blogship Has Its Champion" (soccernista.com)
"But wait - might there be some stirring in the Soccernista bushes? Stay tuned, my nine remaining readers…" I can't believe it's only been eight months -- it seems like a lifetime. Thank you, RSS reader.
Benjamin Pierce, "Lambda, the Ultimate TA"
Literally, "I lectured from an emacs buffer." To do: look up his class notes, after I finish my final project for this wednesday.
"Cheap Thrills" (Andrew Bird at Measure for Measure, the NYT Music Blog)
"Sometimes, the object itself gets assigned a mystical value and must be on a song, though I know most listeners could not care less whether we use a Telefunken mic or a 30-year-old calf skin drum head. " Like follow-through on a golf swing.
Real-time polymerase chain reaction (Wikipedia)
JMLR: Special Issue on Variable and Feature Selection
BioTeam
Run by one of the founders of BioPerl. They're in Boston, apparently. Cloud-computing + next-gen sequencing solutions.
Western blot (Wikipedia)
Southern blot (Wikipedia)
Northern blot (Wikipedia)
Johanson, "Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player"
A master's thesis.
REAPER
Audio sequencer from the guy who did Winamp.
Arrow, "Social Choice and Individual Values"
The whole book.
Fold It!
"Solve puzzles for science!" (I was musing, in the shower this morning, about whether you could encode distributed computations as newspaper sudoku-style puzzles...)
"Can a human skull be used as a bong?" (Slate)
"The most effective skull bong might include a removable 'slide,'..." I bet it would. (The Slate team seems to have done their research on this one.)
"How I Built a Working Poker Bot, Part 1"
Useful tidbits on windows APIs. Most of the poker stuff is ... not there.
JPC
A java-based x86 emulator. Does both real and protected modes. "Virtual peripherals." Hm.
"The Man Who Loved China" (Marginal Revolution)
Did you know that there's another Winchester book out?
"The Origin of Consciousness" (The Universe of Discourse)
Ah, the ole' "alien hand" trick.
jdb - The Java Debugger
Found this to be incredibly useful, yesterday.
"And Now We Worry" (Yglesias)
Ygglz is freaked out by a JOIN.
"Databases and R" (Andrew Gelman)
Gelman fields a question about databases and R -- he doesn't understand the question, but his commenters do. I've been toying, for a while, with the idea of adding an R interface for our own lab's data...
"RealClimate roll up to the climate casino" (Climate Feedback)
Wang et al. "Adaptive Affinity Propagation Clustering" (arXiv)
Haven't read it, can't vouch for it, but I know you'd probably be interested in seeing it...
"History of Logic Programming..." (Lambda the Ultimate)
Hewitt Knew-it.
"Beautiful differentiation" (Conal Elliott)
RANDOM.ORG - True Random Number Service
"RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs."
Nuclear Phynance
Avellaneda and Lipkin, "A Market-Induced Mechanism for Stock Pinning"
Cremers and Weinbaum, "Deviations from Put-Call Parity and Stock Return Predictability"
Goyal, Saretto, "Option Returns and Volatility Mispricing"
"Chess and Computers: Using Brute Force" (Boylston Chess Club Blog)
"Use of the brute force of computers to solve problems, instead of relying on elegant programs, has served me well professionally many times." You've been working on small problems then, d00d.
"Purple Toupee" (Yglesias)
If we're going to start making analogies using songs from "Lincoln," then obviously "Shoehorn with Teeth" is a much better metaphor for all of politics in general. "Pencil Rain" and "Kiss Me, Son of God" are also probably applicable.
“Logical abstract nonsense is a subfield of general abstract nonsense” (Language Log)
Mark Liberman picks up on a phrase ("abstract nonsense" as a joke term-of-art for categories) that I was just reading about the other day.
"prepping as procrastination, continued" (scatterplot)
To remember: the graphic in the lower right-hand corner.
"Congress Passes Bill to Bar Bias Based on Genes" (NYT)
New York Times article on the passage of the GINA bill.
"TYPESETTING ARABIC." (Languagehat)
"The technical problem is this: Arabic letters are generally not written separately but joined to each other in groups or entire words, like a script typeface in English." Of course, the jerk inside me wants to yell, "dynamic programming!"
"If You Build It" (Yglesias)
Ygglz posts an aerial picture of my aunt's house (almost). I'm not sure his description of the area is quite right, but ... maybe. And probably right In General.
"Through the looking glass" (Andrew Gelman)
"In contrast, a lawyer may be trained more to brush aside or not even notice details that contradict his main story. Perhaps this is even more true of a lawyer such as Yoo who is famous for writing opinions that are kept secret."
Running with Rashida (WGBH Labs)
"This short film would give a snapshot of Rashida as she makes the leap from passionate activist to ardent politico." Sign up at WGBH and vote for Jolene's short documentary.
"Directed Sentence Drawings" (Neoformix)
Pretty, but I'm not sure what the point is (or what illumination it gives). It'd be cool if this could be formalized in some way. Maybe abstract locations on the screen/page could correspond to topics?
"...it is a burning pity that our lives are not long enough and not sufficiently free of annoying obstacles, to study all things with the same care and depth as the one we now devote to some favorite subject or period." Nabokov on scholarship.
"The Great Ubuntu-Girlfriend Experiment" (Content Consumer blog)
Thinking about what makes a desktop system user-friendly, for more than one meaning of the word "user."
'The Cheese Man' speaks (Boston.com)
Boston is sort of like a cross between a bad mob movie and an instructional video for dairy products.
"Pernicious Symbolization" (n-Category Cafe)
Ryle jumps down Carnap's throat. But this is a universal vice -- see a significant amount of applied computational literature, as well as the proliferation of the suffix "-ome" in modern biology.
"Metaphysics with Computers" (Mixing Memory)
The phrase "high-hanging fruit" is severely under-used.
"With Tuppence for Paper and Strings" (defective yeti)
"It was the first time I'd done so since childhood, and had forgotten the intensity and purity of emotions a $5 kite can evoke."
Brawndo Ad (YouTube)
"It's like a monster truck you can pour into your face." (Via Julian Sanchez) This viral video addiction must stop.
"Look Who's Doping" (Process Algebra Diary)
Mathematicians and performance-enhancing drugs (a much better-known example, to be sure).
"Compression using Canvas and PNG-embedded data"
Encoding javascript in images. Sounds funny/cool at first, but then you realize it's just a (poor) custom binary representation that's stuffed into a PNG and then "visualized." I was expecting some OCR or something sorta mindblowing...
"Hoisted from Comments: Torture and Its Euphemisms" (DeLong)
More on torture under Elizabeth. I should track down the earlier post on this from him, too.
libevent
"An event notification library." Used in memcached. "libevent is meant to replace the event loop found in event driven network servers." Available for a variety of platforms.
Piet Programming Gallery
A graphical programming language (literally). The approximation of pi program is pretty great -- "Naturally, a more accurate value can be obtained by using a bigger program." Via Lambda-the-Ultimate.
"This Is Your Brain On Free Choice" (Mixing Memory)
The discussion between Chris and Neil in the comments is actually the best part.
Nagalakshmi et al. "The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing" (Nagalakshmi et al., 10.1126/science.1158441 -- Science)
Mark Gerstein and Michael Snyder are last-authors on a paper that sequences cDNA from expressed mRNA in yeast.
"Lost. And found." (Edge of the American West)
Ari talks about GPS. My advisor likes to suggest that he was the first person in the US to send a location addressed envelope through the USPS (using his very-early GPS system). I have no idea if this is true.
"Bombshells" (Edge of the American West)
Eric Rauchway talks about the Haymarket Riots. "Remember, it all started with the eight-hour day. By the end of it you couldn’t join a union without seeming a bit terrorist." I didn't realize that new scholarship was still being done on this.
"Overreacting" (Is There No Sin In It?)
AWB muses on kindness in the city. "...this coldness about very kind interpersonal interactions is key to keeping us all moving."
Fernandes and Crainic, "Lectures on Integrability of Lie Brackets" (arXiv)
Lecture notes. Via the n-Category cafe.
"Turning the table" (Junkcharts)
Junkcharts does the regressions on the NYT's "40-yd-times predict NFL success" story.
"Automatic Generation of Peephole Superoptimizers" (LtU)
Peephole vs. dataflow optimization.
"G-code" (Wikipedia)
Via LtU. Machine code for machine tools (CNC routers).
"What can the Internet do to improve public discourse?" (Corie Lok's blog)
"My Secret Shame" (Short Schrift)
"Great Quotes from Modern Composers" (The Monkey Cage)
Lenin said, “I can’t listen to music too often. It affects your nerves, makes you want to say stupid nice things, and stroke the heads of people who could create such beauty while living in this vile hell.”
"Cormac McCarthy & the semi-colon" (paperpools)
"The problem is not that I am speaking from a position of ignorance. I am speaking from a position of knowledge to people who don't know what knowledge would look like." and "One tries not to write about King Charles' head."
Nonparametric Time Series Analysis Using Gaussian Processes (Damouras)
Oh, I *will* read it.
"Concerning the Fine Line Between Literary Criticism and Rank Paranoia" (Acephalous)
It's weird to see the Venkatesan story (with references to Tom Cormen, too!) pop up on SEK's blog, of all places.
"Good books get to you" (dsiska.wordpress.com)
A great quote. Oddly enough, reading The Idiot in high school was what made me decide I didn't ever want to support capital punishment.
"Theorems Into Coffee" (The n-Category Café)
John Baez simultaneously channels Erdos and Robin Hanson. "I’ll send a $15 Starbucks gift certificate to anybody who gives me, in LaTeX, a well-written rigorous proof of the following theorem..."
"Take It To the Next Level" (YouTube)
Ogged is right -- this *is* a ridiculously great ad. (You have to be half-aware of European soccer to catch most of the visual player/coach references.)
Seeger, "Bayesian Inference and Optimal Design for the Sparse Linear Model"
JMLR 9(Apr):759--813, 2008.
Paul Hsieh's "Super Fast Hash" function
"Pursuing the Next Level of Artificial Intelligence" (NYT)
Daphne Koller gets written up in the NYT. Apparently because she just got an award from the ACM?
Full-Screen Exclusive Mode API (The Java™ Tutorials)
Cowles Foundation for Research in Economics: Monographs (1934-1941)
Papers and book chapters, freely available.
Clearinghouse for Ecology Software
MONITOR User Manual
"How to See This Mission Accomplished" (NYT)
Nate Fick gets equal billing to Kagan, Cordesman, Bremer, and Slaughter.
"Unforeseeables"
Probably my favorite Goldbarth poem of all time.
Kenny Easwaran, "Dominance-Based Decision Theory"
On my mind a lot in the past day...
Noncoding RNA annotation database. Via Alla Sigova.
Bergamaschi, Castano, Vincini, "Semantic integration of semistructured and structured data sources"
Worst abstract ever. ACM-SIGMOD 1999.
Joel Moses, "Symbolic integration: the stormy decade"
The survey of symbolic integration I was telling you about ...
"The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax." Since I've been roaming through Sleator and Lafferty's stuff anyway.
Burton Bloom, "Space/time trade-offs in hash coding with allowable errors"
The original Bloom Filter paper. (Comm. of the ACM, 1970)
Danny Sleator's implementations of Splay Trees
Java and C.
James Heckman, "Econometric Causality"
"Explaining the econometric tools for causality to statisticians..." (NBER)
Something that's bugged me for a while -- how should we turn prices from prediction markets into predictions (with probabilities)?
"Can having information public hurt consulting business?" (MySQL Performance Blog)
In the course of explaining how a business can co-exist with a blog about the same subject, a MySQL consultant says something important (I think) about accumulation and use of ideas in general.
Weston man, 63, suspected of trafficking cocaine, marijuana - Framingham, MA - The MetroWest Daily News
Saved for my own purposes...
"Firefox 3, del.icio.us, and you" (del.icio.us blog)
Thank God. My long personal nightmare is finally over.
Labeled Statements in Java
The 'break' and 'continue' commands can take identifier arguments. I ... did not know this. (Via Chaotic Java)
Reprints in Theory and Applications of Categories
Barr and Wells, "Toposes, Triples and Theories"
Available free online. Linked to by John Baez.
Sleator and Tarjan, "Self-adjusting Binary Search Trees"
The original Splay Tree paper.
van Wijk, "Views on Visualization" (IEEE Transactions on Visualization)
Figure 1 is ... funny.
"Is there a general skill of “management”?" (Daniel Davies at Crooked Timber)
Just when I think I've managed to worm my way out from under the dsquared cult... he pulls me back in. It's a great post.
SoyLatte: Java 6 Port for Mac OS X 10.4 and 10.5 (Intel)
No idea how stable this is.
Java 6 SE for Mac OS X
Only for OS X 10.5+, only for 64-bit Intel machines. Lame.
Ravikumar, Wainwright, Lafferty, "High-Dimensional Graphical Model Selection Using l1-Regularized Logistic Regression" (arXiv)
Slightly less interesting than I originally thought. Still, "sparsistency" is a great word.
"Bolivian president debuts with second-division soccer club"
It's as if Bush decided he wanted to be the starting pitcher for the Nationals (zing!).
Brauneis, "Copyright and the World's Most Popular Song"
Robert Brauneis writes that the song "Happy Birthday" is "almost certainly no longer under copyright," and does the research to track down how it was written, who had the rights, and why they (probably) no longer do.
Remote Javascript Facilities for Connotea
Josh Kalk, "To walk or not to walk"
Article at the Hardball Times, trying to predict whether (and when) intentional walks are a good idea. "Sometimes you push the wrong button and get the right result..." But there should be a more formal approach to this, right?
Alpha Assault
YAAFG: Yet Another Addictive Flash Game. This one spells words, with little castles.
Park et al. "Fine mapping of regulatory loci for mammalian gene expression using radiation hybrids" (Nature Genetics)
"We mapped regulatory loci for nearly all protein-coding genes in mammals using comparative genomic hybridization and expression array measurements from a panel of mouse–hamster radiation hybrid cell lines."
"GlassFish v3 just got embeddable" (Kohsuke Kawaguchi)
Embeddable servlet containers (either GlassFish or Jetty) ... hmmmm.
Seven minute video from this year's Coachella. Pretty much awesome.
"Space as a Symphony of Turning Off Sounds" (BLDGBLOG)
Reminds me of those graffiti that come from cleaning off *other* graffiti in a well-defined pattern ("negative" graffiti?).
"The Elements of Typographic Style Applied to the Web"
"Administration Says Particulars May Trump Geneva Protections" (Washington Post)
"...when judging whether a specific interrogation practice would violate the conventions' ban on degrading treatment, the government can weigh 'the identity and information possessed by a detainee,'..."
Effective Procrastination with Hiveminder
Hiveminder looks like an interesting (IM-enabled) todo-list system. The "programmer API" part is pretty funny, too.
MediaWiki API
JSON and XML (among others) formatted output of content in MediaWiki. Via I don't remember who.
"Confessions of an April Fool and the Dope on Brain Doping" (Jonathan Eisen)
In every joke, there's a grain of truth, right? Also, "April Fools Day" is actually becoming somewhat destructive, in the inter-tubes. It's not an event which meshes well with a medium where content is searchable and permanent.
"ICWSM Report" (Kevin Duh at Hal Daume's Blog)
NLP for blog and news-analysis. Related to my mineshaft analyzer.
Python wrapper (for use in Django) for Google Charts. Looks *very* cute.
"Senate passes bill barring genetic discrimination" (Ars Technica)
More on GINA. I still don't know if my previous objections hold water or are tiny and inconsequential. Time will tell, I guess?
"Why Your Code Sucks" (Dave Astels)
Do your tests work? (Are they complete?) How do you know? Can you prove it? Technically, we could play this game all day. (Which doesn't mitigate the practicality of some of the advice.)
Vaguery's LibraryThing?
Are you on LibraryThing, for realz?
Quinones et al. "Comparative Genomic Analysis of Clinical Strains of Campylobacter jejuni from South Africa"
At least the title looks interesting, right?
"Implementing Hanson’s Market Maker" (Oddhead Blog)
I can't believe I didn't already bookmark this. David Pennock gives a concrete/alternate description of Robin Hanson's market-maker-from-scoring-rules paper.
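For my own reference, a minimal sketch of the logarithmic market scoring rule (LMSR) market maker, assuming the usual cost-function form C(q) = b * log(sum_i exp(q_i / b)). The function names and the default liquidity parameter b are mine, chosen for illustration, and are not taken from Pennock's post or Hanson's paper.

```python
import math


def lmsr_cost(quantities, b=100.0):
    """Cost function C(q) = b * log(sum_i exp(q_i / b)).

    `quantities` holds the net shares sold so far for each outcome;
    `b` is the liquidity parameter (an assumed default, for illustration).
    """
    # Shift by the maximum exponent so exp() cannot overflow.
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b=100.0):
    """Instantaneous prices: exp(q_i / b) / sum_j exp(q_j / b)."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b=100.0):
    """What a trader pays to buy `shares` of `outcome`: C(q_after) - C(q_before)."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)
```

The prices always sum to one, which is what makes it tempting to read them as probabilities; a larger b means each trade moves the price less.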
Kaminski, Sloutsky, and Heckler, "The Advantage of Abstract Examples in Learning Math" (Science)
The original paper on abstract-vs-concrete learning in mathematics.
Chazelle et alia, "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables"
Odd harmonic convergence alert. I think these might be useful for a project I've been working on most recently.
"Eliminating the Birthday Paradox for Universal Features" (John Langford, Machine Learning (Theory))
I don't quite understand, but I only skimmed the post. At the same time, I think he's missing the point about Bloomier filters in the comments? Maybe?
User/IP Banning Middleware (djangosnippets.org)
The main thing here, I think, is the use of REMOTE_ADDR in the http header. (Is that set up by the server?)
"I Don’t Know Just Where I’m Going" (Attackerman)
Because I have nowhere else to post the political stuff...
A "research scientist" from the Whitehead is quoted within the first three paragraphs. Guesses as to his/her identity?
"Godwin this." (Eric Rauchway at Crooked Timber)
Heidegger and Karl Jaspers, in the court of public (and academic) opinion.
LingPipe
"LingPipe is a suite of Java libraries for the linguistic analysis of human language."
"Car of the Future" (NOVA)
Available online. R and I watched this two nights ago -- the best part is when Click (or Clack, I can't remember) actually swears (for realz!) at a VP of tech development from GM when she says something really rude.
Sterling on Life, the Universe, and Everything
"philosophy of science and org theory and strategy" (orgtheory)
Hooray for philosophy of science reading lists! (I didn't realize that "structuration" was a real word.)
"Vengeance is Ours" (Jared Diamond, in the New Yorker)
Jacques Carette and Gordon Uszkay, "Species: making analytic functors practical for functional programming"
Weber and Penn, "Creation and rendering of realistic trees"
Tree-rendering model, complete with pseudocode. Includes a version of the model for degraded-complexity "at range." This is the paper behind the "Arbaro" tool.
Generator Tricks for Systems Programmers
Re-implementing continuations/monads/dataflow language (take your pick) in Python. But he gives a lot of good example code too, so there's *that*. My "Echo" system is basically a graphical version of this for bio-programming in Java.
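A rough sketch of the generator-pipeline idea, in the spirit of the examples there: each stage is a generator that pulls lazily from the one before it. The stage names and the sample log lines are hypothetical, made up here for illustration.

```python
def read_lines(lines):
    """Source stage: strip trailing newlines, one line at a time."""
    for line in lines:
        yield line.rstrip("\n")


def grep(pattern, lines):
    """Filter stage: pass along only the lines containing `pattern`."""
    for line in lines:
        if pattern in line:
            yield line


def field(index, lines, sep=" "):
    """Projection stage: yield one column from each line that has it."""
    for line in lines:
        parts = line.split(sep)
        if index < len(parts):
            yield parts[index]


# Hypothetical log lines: method, path, status, bytes sent.
log = ["GET /a 200 512\n", "GET /b 404 0\n", "GET /c 200 1024\n"]

# Nothing runs until sum() starts pulling values through the pipeline.
ok_sizes = field(3, grep(" 200 ", read_lines(log)))
total = sum(int(size) for size in ok_sizes)
```

Swapping the list for an open file object (or a generator that tails a file) leaves the rest of the pipeline unchanged, which is most of the appeal.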
Dominik Janzing, Bernhard Schoelkopf, "Causal inference using the algorithmic Markov condition"
I wish they wouldn't call them "causal graphs," when they're really just what (most) people call Bayesian networks. Maybe we could come to a compromise? "Causal-ish Graphs?"
"Debra Charlesworth on Lynch's Origin of Genome Architecture" (evolgen)
Tag team! Note, in particular, "Debra Charlesworth takes Lynch to task for not devoting enough of his discussion to within species polymorphism."
Not sure I realized exactly what this was, or what it meant, when I first saw it...
drop.io : Simple Private Exchange
Not sure how these guys plan on making money, but ... they've got a nice logo.
"TREATISE OF PLANE GEOMETRY THROUGH GEOMETRIC ALGEBRA"
Windows Live Mesh
What Alex has been working on...
"NJ Voting Machine Tape Shows Phantom Obama Vote" (Ed Felten at Freedom to Tinker)
The hits just keep on coming...
PITCHf/x tool (Josh Kalk)
Web-based interface to dataset on pitch break and speed information, from MLB. (How did he get this information? Scraped from the web, or does he work for the pitchfx company?)
"New Threat to Farmers: The Market Hedge" (NYT)
"Some grain elevators are coping with the volatility and hedging problems by refusing to buy crops in advance, foreclosing the most common way farmers lock in prices."
Paola Antonelli + Benoit Mandelbrot (Seed Magazine)
"...a paper I wrote, and that was widely quoted, concerned fractals and architecture. It was in a certain sense a critique of the Bauhaus. A very short paper, but very influential." Link please! This is the web, damnit.
"Footprint" (Ygglz)
"...the last thing you need is people sitting around thinking 'I drive a Prius, I've done my part' and then not voting the right way or otherwise being disengaged from the political process."
An Introduction to BLOB Streaming for MySQL Project
I didn't realize that BLOB was a backronym. Hm.
"Losing Greenland" (Climate Feedback blog)
"...Sarah Das of the Woods Hole Oceanographic Institution, reports on a 4-kilometre-long melt lake that vanished into the ice sheet within the space of two hours."
Mandelbrot, "Scalebound or Scaling Shapes: A Useful Distinction in the Visual Arts and in the Natural Sciences" (JSTOR)
Trying to track down a quote from Mandelbrot about van der Rohe, but ... I'm not sure this is it.
Wheeler et al. "The complete genome of an individual by massively parallel DNA sequencing" (Nature)
454 is in yr base, sequenzing ur genomes.
Wojdyga, "Short proofs of strong normalization" (arXiv)
I feel like the title could be turned into a good joke. It's sort of Modest-Mouse-esque, "Short proofs for people who like strong normalization." Or something.
"Reading binary files using Ajax" (Nagoon97's Weblog)
YUI Loader as Django middleware (djangosnippets.org)
Building one of these opensource javascript ui libraries into a system like Django has been on my mind for a while.
Selinger, "Lecture notes on the lambda calculus" (arXiv)
NYT Magazine, "The Green Issue."
"Congress Near Deal on Genetic Test Bias Bill" (NYT)
Doeo!
"Bonus!"
"Watson's Genome" (evolgen)
"Personal genomics needs not only data, but also ways of assigning genomic variants to particular phenotypes."
"Science 2.0 -- Is Open Access Science the Future?" (Scientific American)
Discusses wikis and blogs, although I'm not sure that "Web 2.0" is the right term for this. It's just "the web." But still.
JSSh - a TCP/IP JavaScript Shell Server for Mozilla
"JSSh is a Mozilla C++ extension module that allows other programs (such as telnet) to establish JavaScript shell connections to a running Mozilla process via TCP/IP." Whoa, what now? Sweet.
"Coalition dynamics" (Andrew Gelman)
Taking into account the caveats ("somewhat/highly unconvincing"), this paper seems (superficially) like it could be relevant to some of your stuff.
"Sorry, I was just ... ordering some magazine subscriptions."
Garner, "On the strength of dependent products in the type theory of Martin-Löf" (arXiv)
Miriam Burstein gives a set of links to books and references on Milton illustration.
Wilson Miner, "Accessible Data Visualization with Web Standards" (A List Apart)
Tutorial on javascript-ing methods for simple graphs and visualizations. It'd be nice to see something abstracted over both this and Google Charts api.
Dukkipati, "Towards algebraic methods for maximum entropy estimation" (arXiv)
SOFT submission instructions (NCBI)
Information on the SOFT file format used for gene expression data submissions to the GEO database.
Li, "Causal models have no complete axiomatic characterization" (arXiv)
Java Quirks
Jason R., smart machine-learning guy that he is, is working in Java now that he has a "real" job. And he's blogging about it (the Java part). In it goes, to my RSS feed!
"Caffeine is a good proving ground for positions on the newer compounds. " Also on the academics-and-performance-enhancing-drugs tip.
Darwin Online
"About 90,000 pages of manuscripts, field notes, photographs and sketches connected with Charles Darwin are being placed online, where they can be viewed free."
Ty Alpers, "What Do Lawyers Know About Lethal Injection?"
Via Edge of the American West. Of course, a more transparent process for determining the process of lethal injection would probably erode public support for that same process. It's crazy that we even find ourselves in this situation.
StyleFeeder
KMZMiddleware (Djangosnippets)
"KMZ isn't just zipped KML. A good KMZ file is a zipped file that decompress to a file called 'doc.kml', which is a KML file." I need to think about neat ways of using the Django middleware layer stuff, which is tres cool.
"That's totally meta, dude." (AWWTB)
"Look Who's Doping" (Nature News)
Gotta track this meme to its source!
Amaral, "The Eukaryotic Genome as an RNA Machine"
Both Chris and Rick have referenced this in their talks, now.
"X - Files' movie title is out there: I Want to Believe' " (NYT)
"Due in theaters July 25, the movie will not deal with aliens or the intricate mythology about interaction between humans and extraterrestrials that the show built up over the years, Carter said." You have got to be kidding me.
"Quaternion Quotes of the Day" (Zero Divides)
Three quotes about an "unmixed evil," and a good comment from John Armstrong.
SQLiteJDBC
JDBC driver for SQLite3. Includes platform-specific and pure-Java versions. Depends on NestedVM.
"The Official Village Voice Election-Season Guide to the Right-Wing Blogosphere"
"Mostly covered other people’s reporting from a right-wing Republican perspective, like Fox News with a scroll bar." A well-written take-down. (It would be interesting to see one of the other side of the 'sphere, too.)
"Up and Then Down: The Lives of Elevators" (Nick Paumgarten in the New Yorker)
The structural arrangement of the story is weird at times -- it's got the typical magazine-style "let's break away from the story to tell the history behind X" thing going. But still, there are a lot of interesting elevator tidbits here.
Jonathan Rees's Homepage.
Talking with him was ... illuminating/exciting.
Science Commons
Jonathan Rees is working on this.
Chris Langmead is at CMU now -- cool!
"Best Practices for Speeding Up Your Web Site" (Yahoo Developer Network)
Base-line best-practices. Reasonable tips.
"You’re Not Punk and I’m Telling Everyone" (Attackerman)
CrimethInc! I haven't thought about them in so long. It's really funny that they kinda hate blogs.
"Announcing AppDrop.com (host Google App Engine projects on EC2)"
This guy wrote an adapter for Amazon EC2 to run python systems written to use the new Django-based framework available in Google App Engine.
"Kissing the Duke of Exeter's Daughter, or De Laudibus Legum Angliae..." (Brad DeLong)
"For a beginning thereof they erected a rack for torture, ... [which] still remains in the Tower of London; where it was occasionally used as an engine of state, not of law, more than once in the reign of Elizabeth."
F. P. Kelly, "Reversibility and Stochastic Networks"
Recommended on Michael Mitzenmacher's blog.
ImageQuant software
Some of our collaborators use the proprietary ImageQuant software to quantify their blots and rt-pcr experiments.
"The fallacy of hypothesis testing?" (Andrew Gelman)
"I've learned that one often needs simply to sit and observe and learn about one's subject before even attempting to devise a testable hypothesis." Gelman quotes Irene Pepperberg.
"Blame Berkeley" (Philip Carter at Slate)
Carter has always seemed like a thoughtful guy, but this: "we wouldn't accept that result in molecular biology or medicine or many other disciplines," seems exactly wrong to me. We *routinely* accept situations like this, throughout academia.
"Homicidal Thoughts While Driving, Reconsidered" (Ogged)
"It's no wonder, then, that drivers are so angry, when they're free to be so, and when each trip is a catalog of disappointments visited upon them by evil people." There's a reason he's a star blogger, people.
"Do People Care About Inequality?" (Consider the Evidence)
Pretty pictures, but the notion of translating people's answers into inferred opinions about the mean wealth level is ... tenuous?
Edward Thorp, "The Mathematics of Gambling"
Online book.
Olivier Bousquet's Machine Learning video collection
Mostly youtube stuff. A few I'd seen before, including Jordan's HDP lecture.
"Did Yoo and Bybee Violate Canons of Professional Ethics? " (Jack Balkin)
"My own conclusion is that Yoo and Bybee did violate their professional obligations to the President as constitutional actor, and to the country as a whole.... But I do not pretend that the question is at all an easy one."
Those who cannot be bothered to learn about the past, etc. etc.
"Going Back to 'Lucky' Lottery Stores" (Lee Sigelman at the Monkey Cage)
"Thus, these two results play off two 'well-documented but seemingly contradictory misperceptions of randomness,' the hot hand fallacy and the gambler’s fallacy, against one another."
FontStruct
Online font-building tool? Free, I think.
"to wii and to wed" (Scatterplot)
We're living in a brand new world, kids.
Eyal Weizman, "The Art of War" (Frieze Magazine)
Introducing Deleuze to the IDF. "Unwalling the wall." The discussion of "swarming" is pretty weak. This article came up in a conversation about video game strategy, "seeing the map without walls."
An excerpt from Jacques Monod's "Le Hasard et la necessite."
"The cornerstone of the scientific method is the postulate that nature is objective. In other words, the systematic denial that 'true' knowledge can be got at by interpreting phenomena in terms of final causes-that is to say, of 'purpose.'"
UNData
Reasonably slick searchable interface to a large amount of data from the UN.
"Procedural liberalism. You’re soaking in it." (Eric Rauchway at The Edge of the American West)
I'm much closer to Rauchway than Ogged on this one. Worth reading (plus the first 50 comments too, although the comment thread degenerates after that).
Christopher Edley Jr., "The Torture Memos and Academic Freedom"
The dean of Boalt Hall talks about John Yoo and academic freedom.
Built using the Slinkset software, by Nick Barrowman (the guy with the LogBase2 blog).
It's a great game -- it would be nice, though, if there was someplace to download a copy? Seeing as how it's not been sold commercially for several years.
Online tool (free?) for creating a link-sharing site, with users and voting and comments, a la Digg.
The iterated f-tag is also humorous.
"Now I understand what they mean by tabular data (or: building a relational database using jQuery and <TABLE> tags)"
Gives new meaning to the slogan, "the network is the computer."
"In Re John Yoo..." (DeLong)
DeLong quotes LB, and then concludes "I don't see an answer to this argument." I think it's funny how the very *first* comment then provides the obvious answer to this argument.
"Hidden Video Courses in Math, Science, and Engineering" (Data Wrangling)
I need to go through the mathematics and machine learning list, over a weekend sometime, and see how much is interesting there.
"The Perverse Appeal of LOST." (Defective Yeti)
"BUT WHAT'S THIS ABOUT A GIANT AMBULATORY SENTIENT COCONUT??!!!" But the analogy to leveling-up and grinding is spot-on.
Lazebnik, "Can a Biologist Fix a Radio -- or, What I Learned While Studying Apoptosis"
Damn near perfect, except that ... it should be 60 pages, instead of 6. You can't just say, "we need a formal language!"
Vapnik is Cats
Yes, that *is* remarkably strange. On the other hand, reading parts of Statistical Learning Theory does bear a striking resemblance to playing parts of Zerowing.
"Take Control of Your Maps" (Paul Smith, A List Apart)
Excellent run-down, from top to bottom, of rolling your own map solution for a dynamic website.
Quantum GIS
"Quantum GIS (QGIS) is a user friendly Open Source Geographic Information System (GIS) that runs on Linux, Unix, Mac OSX, and Windows."
"Let Them Eat Empty Slogans!" (Yglesias)
Surely, "the heroic conception of politics" should take its place alongside the "Green Lantern theory of foreign policy" as an accurate generalization. But how far in the other direction will the pendulum swing, post-2008?
Imai, King, and Stuart, "Misunderstandings between experimentalists and observationalists about causal inference"
From the Harvard Social Science blog.
"About Starving Beasts and Supply Side Tax Cuts" (Pete Davis)
"I vividly recall sitting in the OMB conference room in early 1981 looking at a summary of the budget President Reagan was about to unveil. His tax cuts were no surprise, but the jump in defense spending was so large, we thought it was a typo."
Running Django on Google App Engine
And it will support Django! (With modifications of the models only, for the most part.) Nice.
A preview of Google's cloud-computing offering, I assume.
Kriegspiel
Game on, if I can get it to run. Have you gotten it running on a Vista box?
Potluck
Exhibit 2.0
Every time I see dhuynh in the elevator, I kinda want to shake his hand.
Raptor Web Library 1.4.17
Time to re-self-familiarize.
"What Should Be Public About Public Education?" (Andrew Samwick)
"There should be some redistribution inherent in the funding of public education, even if market-based elements are introduced. "
Elliott, "Simply efficient functional reactivity"
"This paper presents a way to implement FRP that combines data- and demand-driven evaluation, in which values are recomputed only when necessary, and reactions are nearly instantaneous."
van de Geer, "High-dimensional generalized linear models and the lasso" (arXiv)
Lecture Notes from Daniel Kleitman
Coding, combinatorics, and some linear algebra. Via the Secret Blogging Seminar.
Evernote
A company whose tagline is "remember everything." I think we can see where this is going, right?
"Gale duality and linear programing." (Ben Webster at Secret Blogging Seminar)
"This is a fact linear programmers learn with their mother’s milk, but which doesn’t seem to be particularly well-known amongst mathematicians in general, even though it just uses basic concepts of linear algebra."
Rodriguez, "Grammar-based Random Walkers in Semantic Networks" (arXiv)
"This article presents a framework for calculating semantically meaningful primary eigenvector-based metrics such as eigenvector centrality and PageRank in semantic networks using a modified version of the random walker model of Markov chain analysis."
"Everyone Realizes the Challenges of Causal Inference" (The Monkey Cage)
“Oh I see!” says one Reverend Minister, “We need a control group! This is a good idea.” It turns out his holiness was once an agronomist.
"On Knives" (The Hungry Cupboard)
Ingrid writes about knives (and keeping them sharp, and keeping them clean).
"This is the Future of Eukaryotic Genome Sequencing" (The Daily Transcript)
Database of free speculative fiction online (Metafilter)
Looks like it might be up your alley, J.
Font Editor
"We are building a pure Java™ font-rendering technology - targetted at J2ME, Personal Java and Java 1.1 environments."
Videos of Simon Peyton-Jones
"i’m all about money these days" (scatterplot)
Discussing Craig Venter as "the iconic scientist of the early 21st century." That is *one* way to do it, I guess.
"Should researchers share data?" (Adventures in Ethics and Science)
Paranoia, cha-cha-cha.
"Company data from the SEC" (Freebase Blog)
I wonder how many other nice datasources could be pulled from the Freebase data-dumps?
"NJ Election Discrepancies Worse Than Previously Thought, Contradict Sequoia’s Explanation" (Ed Felten)
A little here, a little there...
Matthew Rabin, "Risk Aversion and Expected-Utility Theory: A Calibration Theorem"
Mentioned on Andrew Gelman's blog.
Infochimps.org
"The infochimps.org community is assembling and interconnecting the world's best repository for raw data -- a sort of giant free almanac, with tables on everything you can put in a table."
"The mathematics of preservation and the future of urban ruins" (BLDGBLOG)
"Knot diagrams. Could we treat these as infrastructural blueprints and redesign the U.S. highway system to form a catalog of complex knots? You could then study experiential mathematics from behind the wheel of your car.."
Spackerman's IM Convo on Jezebel
"MEGAN: So, I'm supposed to crush on Petraeus? SPENCER: yes all the male journalists do"
Narayan et al. "Structure and Interpretation of Computer Programs"
Why? Why would you name your paper that?
"My favorite things Utah" (Marginal Revolution)
Right, but as a bunch of his commenters point out, Card doesn't live in Utah!
Name the Elements of the Periodic Table
My officemates and I could get 93 in 15 minutes. Recalling Tom Lehrer helps.
Ptitsyn, "Stochastic Resonance Reveals “Pilot Light” Expression in Mammalian Genes" (PLoS One)
If you look long enough... (cue anecdotes about excessive "skepticism" in science).
toe.gif (GIF Image, 644x652 pixels)
Formal systems at the bottom, GR and QFT at the top ("triple fields" but no "modules"?), must be drawn by a physicist. (Yup: Max Tegmark).
Mapnik C++/Python GIS Toolkit | Welcome
"Free toolkit for developing mapping applications..."
Web Developer Toolbar
Tamper Data tutorial - Jimbojw.com
Omnisio
Video annotation, online. To check out, when I get a moment.
"N-best lists and duplication" (natural language processing blog)
"The issuee is that ... when we look at an N-best list, we get the top N out/hid pairs, rather than the top N outputs. This leads to "duplication" in the N-best list... One might ask: is this a big problem? I suspect so." This is not just an NLP problem!
Takemura, Vovk, Shafer, "The game-theoretic martingales behind the zero-one laws"
"Emboldening" (Yglesias)
"This sort of thing -- the impact of our policies on the real world -- seems much more important to me than the subjective emotional state of hard-core killers." It needs to be said over and over.
Hackers Assault Epilepsy Patients via Computer
Really horrible, of course. Also, one more entry for the William Gibson Files (as in, we've now taken one more step towards something like the world of Neuromancer).
"The Necessity of Contingency" (Yglesias)
Absolutely a gem.
The LaTeX Beamer Class Homepage
--
Bjork is relentlessly weird, is what's going on. The animation is killer though.
FWTools: Open Source GIS/RS Binary Kit
Open source GIS tools.
Shapefile - Wikipedia, the free encyclopedia
ESRI formatted GIS files. Wikipedia description of the file format.
MassGIS - Available GIS Datalayers
"Why I don't like Bayesian statistics" (andrew gelman)
Gelman is like the anti-Yudkowsky. It's humorous.
"Carbon dioxide - not the only culprit" (Climate Feedback)
"[They] showing that brown clouds in the atmosphere, which are largely comprised of black carbon and other aerosols, are significant contributors to regional warming over Asia, in some cases having as great an influence as carbon dioxide."
Coquand and Lombardi, "A logical approach to abstract algebra"
The Associated Press: Television Awards Honors Students' Work
Not only was I the first person in her del.icio.us network, but I already knew about this through my super-secret contacts. Thanks though :-). The family is in town this weekend -- my Dad and I talked about you a long time yesterday.
Internet Archive: American Libraries
Reinhold, "Reassessing the Link between Premarital Cohabitation and Marital Instability."
Obviously of personal interest. Via Tyler Cowen.
Genetic reconstruction of a functional transcriptional regulatory network : Abstract : Nature Genetics
Awesome! Why didn't I see this paper before??
"Salt Passage and Causal Inference" (Social Science Statistics Blog)
This is where I got that article, C.
How Dumb is Daniel Dennett? (Aaron Swartz's Raw Thought)
Funny. (Even funnier when the comments section devolves into a "You love Searle and hate Dennett too much" argument.) None of the positive arguments in this thread actually make any sense to me, only the criticisms.
Card, Dobkin, and Maestas, "Does Medicare Save Lives?"
"We estimate a nearly 1 percentage point drop in 7-day mortality for patients at age 65, implying that Medicare eligibility reduces the death rate of this severely ill patient group by 20 percent." Via one of the econ blogs.
"285G, Lecture 0: Riemannian manifolds and curvature" (Terence Tao)
It seems to me that the special nature of three dimensions stems from the fact that it is the unique number of dimensions in which 2-forms (which are naturally associated with curvature) are Hodge dual to vector fields...
Kansas Joe, Memphis Minnie "When The Levee Breaks" (YouTube)
Allen, "The Category-Theoretic Arithmetic of Information" (arXiv)
Kozak consensus sequence (Wikipedia)
Via Biocurious, indirectly.
"Showing That You Care" (Overcoming Bias)
"When I wrote this paper ten years ago I didn't understand that there is simply no academic market for such grand theory papers, at least written by non-stars." Sometimes I think that I live on a different planet from Hanson.
"Psychological Occurences" (The Valve)
But the *entire constructional system* is "unavoidably subjective," for Carnap! Also, I don't know how you can approach a subject like this (Carnap's take on the aesthetic) without noting his distinction between the auto- and hetero-psychological.
Lower Back Tattoos Now Available at Toys R Us
Via Waxy. Like I'm living in an alternate reality.
Pacanowsky_1978.pdf (application/pdf Object)
"JANIS AND FESHBACH (10) found that other utterances were just as effective as 'Please pass the salt' in achieving salt passage compliance." It sounds so hot, when you say it like that. (We used to pass the salt, in Cambridge.)
"The Architecture of Self-Measurement" (BLDGBLOG)
"It occurred to me, then, that everyone should pick a book – a novel, a work of theory, poetry, biography, whatever – and re-read it every few years, but they should do this for the rest of their lives." My choice is "Murphy."
A fantastic pair of maps, courtesy of Strange Maps: -... (kottke.org)
Moonwalk diagrams outlined on more recognizable shapes. The overlay on the Universal Studios soundstage map is what makes it kind of genius. (But ... do they have the scales right on that one?)
Measure for Measure - Opinion - New York Times Blog
Another NYT blog with a fantastic line-up (and now we get to see how often they post). But still: Andrew Bird, Rosanne Cash, and Suzanne Vega all blogging in the same place? HOttttt.
The Carbon Account
Requires manual reading of meters? (I wonder where ours are.) But still... interesting.
Recent directions in nonparametric Bayesian machine learning
How many cultures must we ascribe buffets to? Stop the madness!
B-Course
The Dataverse Network Project
Mentioned by Andrew Gelman.
[math/0506081] The Dantzig selector: Statistical estimation when $p$ is much larger than $n$
Tao and Candes on arXiv. Includes links to responses.
Ping the Semantic Web.com - Share your RDF data with the World!
"PingtheSemanticWeb.com is a repository for RDF documents. You can notify this service that you created/updated a RDF document on your web site."
"Too soon?" (The Edge of the American West)
"The army marched under a banner with a portrait of Christ and a motto reading, 'He is Risen, but Death to Interest on Bonds.'"
Yeah, okay. The switch to over-head perspective at the end is reaaaaaaally impressive.
Candes and Tao, "The Dantzig selector: Statistical estimation when p is much larger than n"
Project Euclid page.
Much more extensive than I had realized.
"Credibility and Truth" (Opiniatrety)
I started out thinking that this post was about one thing, but really it turned out to be about something else altogether.
"I can't imagine doing this" (Marginal Revolution)
I can imagine doing this! John? Also, btw, I think that MR does a lot of "futurism" stuff under the guise of "what would people do with different incentives." In some sense, this is what the future *is*: the present, but with different incentives.
"My short history of liberalism." (Eric Rauchway at The Edge of the American West)
"What comes next? Because the Original Observation is now understood to have been incorrect, we’re about to replace an insurance program with an investment policy. This is of course a simple category mistake that nobody should make. "
"What Will Life Be Like in the Year 2008?"
"Mechanix Illustrated" article from 1968. Flying cars and all that.
"What's the worst movie ever?" (Daniel Drezner)
Everyone's got their own pet-favorite -- mine is that Oliver Stone movie from 1997, "U-Turn."
Pew Research Center Datasets
So, look back at this in about six months for the "Trends in Political Values" dataset? Or does what's at the end of the PDF already qualify?
Breiman, "Statistical Modeling: The Two Cultures."
JSTOR: Statistical Science: Vol. 16, No. 3, pp. 199-215
"Entering Exotic Characters" (Language Log)
Bill Poser's guide to Unicode character entry on blogs.
Java library to reason over FOAF graphs. This is thinking about things in ... the right way.
Willard Van Orman Quine, Mathematical Logic
"When... the Bertrand Russell e-mail list attempted to list all those who had read all three volumes of the Principia Mathematica, they came up with less than two dozen names; two of those people died while the list was being compiled."
"My Iraq War Retrospective" (John Cole)
There's something weirdly visual about the way that all those "I was wrong"s line up on the screen.
"A Stupid Question" (An und für sich)
Binmore is referenced in the comments. More comment than this is probably unnecessary -- one can draw one's own conclusions about "monadic agents" and the "irreducibility of relationality."
YouTube - Drunk History vol. 2
"Good job, William. You're my kid. You do what you think is right."
Ashley Alexandra Dupré Makeover (NYT Blog, T Magazine)
Truly we, as a culture, have reached a new place. We should send postcards.
"Functional programming in C" (Richard Crowley’s blog)
"There are so many other ways..." I'm going to write something about this, but the upshot is: 1st-class functions are nice, but they're only sort-of useful without tail-recursion (and, probably, continuations). Don't talk to me about setjmp and longjmp.
Robyn Dawes, "The Robust Beauty of Improper Linear Models in Decision Making"
PsycARTICLES - American Psychologist - Vol 34 Iss 7 Page 571. Gah, why won't MIT give me access to this? Time to track it down in the library. (Via Andrew Gelman).
History, Politics, Documentary, “Hitler speaks” – Hitler’s private movies
Weird.
Starbucks pledges change — and lots of it - Food Inc.- msnbc.com
Some Starbucks will get a Clover machine? *That* would be pretty sweet.
"Ultrafilters, nonstandard analysis, and epsilon management" (What’s new)
"But if we attempt to formalise this by trying to create the set A := \{ x \in {\Bbb R}: x = O(1) \} of all bounded numbers, and asserting that this set is then closed under addition and multiplication, we are speaking nonsense..."
"It Can Be Told: Spitzer Dribbled Before He Shot" (Deadspin)
"...it pales next to the soul-crushing despair of being eclipsed by a club ... anointed as 'the most loathsome world eleven, surpassing Team Evil from Shaolin Soccer and the New England Patriots.'" Eliot Spitzer played prep-school soccer.
"Five Best Chess Books" (Boylston Chess Club Weblog)
Four out of five ain't bad. I've heard people talk about the Tal-Botvinnik before -- I need to track it down.
Dobbs Code Talk - Learn as Many Languages as You Can (or just learn Scala)
Monads are "cheating"?? Also, it's not enough just to learn what cwcc *does* -- it's important to think about how to use it, too.
The AT&T ads. For looking fifteen years out, these guys got about ... 75% of their predictions right? Which is pretty amazing, considering. Note the video phone, and the fax, though. (I still send my faxes manually, IYKWIM.)
"Biases in Processing Political Information" (Overcoming Bias)
See? Stuart Buck has the right idea. (Obviously, this is not a new bias.)
"A trip down memory lane..." (elearnspace)
Sure, everyone laughs at Clifford Stoll, but predicting the future is hard. I look at it another way: 1 out of 3 ain't bad, for a 13-year prediction.
Charles Murray, "My Last Word on Obama, I Promise" (The Corner)
It *is* really unbelievable how much "what we hear" is biased by "who is saying it," and how much that cuts across the entire political spectrum.
Cary Magazine Presents ::: Elegant Weddings Gala ::: March 6th, 2008
A wedding story?? Weird magazine articles notwithstanding, it was a wonderful wedding.
Eric Falkenstein's Beta Arbitrage
Eric Falkenstein's "Articles and Papers"
On-Line Construction of Suffix Trees - Ukkonen (ResearchIndex)
One can debate the meaning of "online", I guess. I've implemented Ukkonen's algorithm several times -- it's linear but it still requires you to 'remember' a recently-seen segment of the string. I don't think the length of that segment is bounded.
"Morality Is Overrated" (Overcoming Bias)
Seems so obviously wrong that I'm surprised it's written by Robin Hanson -- that we act in contradiction with the prima facie requirements of morality isn't necessarily evidence that we "want" other things. Even Plato talked about this.
Unix Manual Page for proc
man page for the /proc filesystem, including descriptions of the /proc/self/fd/* stuff.
Introduction To Unix Signals Programming
"z-commands" (The Universe of Discourse)
It's like he's hand-coding a Perl monad. I need to look up this /proc/self/fd thing.
"The Disciplined Disciple Compiler" (LtU)
Gelman, Hill, and Yajima "Why we (usually) don't have to worry about multiple comparisons."
This needs to be read in *direct* conjunction with that Mayo paper.
Bousquet and Elisseeff, "Algorithmic Stability and Generalization Performance"
Amazon Web Services: Fulfillment Web Services
Whoa, my mind is blown a little bit.
triplify.org
"making database content available as RDF, JSON or Linked Data."
Bousquet, Boucheron, and Lugosi. "Introduction to Statistical Learning Theory"
What else have I read from Olivier Bousquet?
Defying Classification: Queryset Implementation
Although it's beside the point: it's not an "online" algorithm if you have to pre-process the *entire dataset* first. Also, suffix-tree-building isn't online either, not in any reasonable sense of the word.
"I have a thorough understanding of biology and the workings of the human body." (waste)
"Honey is kept in model bears as a magical warding-off of real bears. The fleshly bears will see the untransmuted Bärstoff and treat it as if it were a bear itself, and avoid trespassing on its territory."
dsquared in the "Race" thread at Unfogged
D^2 is so put-upon. He should be given a medal (with a pony emblazoned upon its outward face) for putting up with so much abuse. It must be tough.
"If Wright is wrong, then wrong is right: the Victor Davis Hanson guide to moral absolutes." (Crooked Timber)
"...what it shows is that conservatives see they have a pressing situational need to move some goalposts. But they aren’t sure where. So they are running in all directions, carrying goalposts." I'll read CT when it's not written by oh-so-sensitive D^2.
Fuguitt and Lieberson, "Correlation of Ratios or Difference Scores Having Common Terms"
"JSTOR: Sociological Methodology: Vol. 5, pp. 128-144" Via the Social Science Statistics Blog.
Hubbard and Bayarri, "P Values are not Error Probabilities."
I think this *is* the paper I was thinking of, although (I think) I've seen a more formal version of it published somewhere. But now I can't find it.
"A taxi, Leon Swain, and me" (Greater Greater Washington)
Via Ygglz. A pretty great story -- chalk this one up, though, as yet another thing that would never happen in Boston. Try finding anyone to talk to about a taxi that drives off with your luggage at 4pm on a Friday: you'll be out of luck.
Stallings, "Mind Your p's and Alphas"
"JSTOR: Educational Researcher: Vol. 14, No. 9, pp. 19-20" (Not the paper I was thinking of, but it's a start.)
Mayo, "Experimental Practice and an Error Statistical Account of Evidence"
"JSTOR: Philosophy of Science: Vol. 67, No. 3, Supplement. Proceedings of the 1998 Biennial Meetings of the Philosophy of Science Association. Part II, pp. S193-S207." At some point, I'm just going to have to search all of Mayo's papers on JSTOR.
week261
I know you're busy... but scan at least the introduction, computation, and logic sections of the "Rosetta Stone" paper linked to at the beginning of week261. I've been following it in (public) drafts. I think it'll be up your alley.
Karger et al. "Consistent hashing and random trees"
Shorter paper on consistent hashing, in an ACM Symposium.
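Note to self, a minimal sketch of the consistent-hashing idea, assuming an MD5-based ring with virtual nodes. The class name, the node names, and the replica count are all my own placeholders, not the construction from the paper.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Illustrative consistent-hash ring; hypothetical, not the paper's scheme."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place several virtual points per node to even out the load.
        for i in range(self.replicas):
            p = ring_hash(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def remove(self, node):
        for i in range(self.replicas):
            p = ring_hash(f"{node}#{i}")
            if self._owner.get(p) == node:
                del self._owner[p]
                self._points.remove(p)

    def lookup(self, key):
        # A key belongs to the first node point clockwise from its hash.
        if not self._points:
            raise KeyError("ring is empty")
        i = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

The property worth remembering: when a node is removed, only the keys that lived on that node move; every other key keeps its owner.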
"the p-value and the cancer patient" (orgtheory.net)
We are all soft-Bayesians now. (See the comments.) Fabio Rojas gives the standard arguments for and against. But see the footnote to the original post, where he mixes alphas and p's!
"Evidence of New Jersey Election Discrepancies" (Freedom to Tinker)
Felten's got the goods!
"Ethical and data-integrity problems in Iraq mortality study?" (Andrew Gelman)
Weird. Some of his criticisms seem awfully minor for the space they take up on the page, and David Kane shows up on pg. 31 (I was waiting...). On the other hand, the figure on pg. 19, if true, is *strange*.
"The exchange lemma and Gaussian elimination" (Gowers’s Weblog)
A failure of memory! It was the "(MacLane-)Steinitz Exchange Lemma," not Buchberger's algorithm, for which the similarity to GE might only be skin-deep. I take it the similarity to Buchberger is easier and more straightforward to show. Sorry!
GLPK -- the GNU Linear Programming Kit
I did not know that this existed. Sweet. Thanks, RMS! You sure are crazy, but you sure do help me get things done!
Daniel Klein, "A Plea to Economists Who Favour Liberty: Assist the Everyman"
In comments, on pg. 84, Gordon Tullock quotes Ronald Coase, "If you torture the data long enough, it will confess." (Picked up via non-Yudkowsky at Overcoming Bias.)
"Gaps and Redundancies" (Vox Baby)
Amusing -- but, as usual, stay out of the comments.
Is it possible that a Hummer's better for the environment than a Prius is? - By Brendan I. Koerner - Slate Magazine
I know you already found debunkings of this particular claim, but I'm sending it to you anyway.
Eric Xing's "Advanced Topics in Graphical Models"
Syllabus for Spring '07. Good list of papers.
Chains Report Stolen Card Data - WSJ.com
300,000 cards in MA alone. Time to go back to shopping at Shaws. (d*mnit.)
Consistent hashing and random trees : algorithms for caching in distributed networks (DSpace)
Danny Lewin's (who died on 9/11) master's thesis, with Tom Leighton and David Karger, on consistent hashing.
Worries Over Being ‘Slimed’ (NYT)
More Fick, all the time! (Also, he's at the Center for a New American Security now? Interesting.)
"Fire Bill Kristol/The Death of Credibility" (Opiniatrety)
And they have *video* to prove Kristol wrong. But really: calling this failure the lack of "epistemic penalty" is pretty weak sauce. (Also, this isn't the 'death' of credibility, this is just kicking the corpse a bit.)
[0803.2212] Conditioning Probabilistic Databases
"The em" (Font Bureau Type 101)
"Bill James shares his method to determine when a college basketball game is out of reach." (Slate)
I've been mentally doing a calculation like this for years, although James's is more precise (and more conservative). Mine was roughly: points = minutes. This was mostly tuned for possible comebacks when the Heels were already down.
Interview with George Clooney (Esquire)
A.J. Jacobs shows George Clooney "2 Girls, 1 Cup," and ascends into a state of semi-celeb-immortality.
Tiling Array Transcriptome Analysis of Saccharomyces cerevisiae
Support page from EBI for the Steinmetz paper.
"Valuing 'Lives Saved' vs. 'Life-Years Saved'"
Via Andrew Gelman.
"Babble with Beckett" Marina Warner TLS
As Edwards comments, “a foreign language is already a kind of fiction”.
Applied Biosystems Surpasses Industry Milestone in Lowering the Cost of Sequencing Human Genome
Press release on the SOLiD system. Mentions Evan Eichler, and includes a link to data (in the NCBI Trace sequence FTP site).
"The theory of interstellar trade" (Marginal Revolution)
"I believe that in such worlds the real interest rate cannot exceed the costs at which more fuel can 'propel you into the future through time dilation.'" Plus, the requisite point about robots.
"Computer Scientists Needed Now" (The n-Category Café)
John Baez's (older) post asking for computational help on the Rosetta Stone paper. The comments are actually really good, in places.
Samson Abramsky - Publications by Theme
gmb://publications
Gavin Bierman's publication list.
"Seek: stop searching, start finding (your email)"
David Huynh writes code, makes progress. I need to move to Thunderbird for my email anyway.
"Weighted explanations in history"
Hunter and Lu, "DNA base-stacking interactions: a comparison of theoretical calculations with oligonucleotide X-ray crystal structures"
Journal of Molecular Biology (1996).
"GPU++" (Shift Happens)
"...a seamless integration into the C++ programming language to address graphics hardware via a familiar syntax, an abstraction layer..., and a novel approach to relax the vector processor paradigm of the GPU."
Pop and Salzberg, "Bioinformatics challenges of new sequencing technology"
A review, in "Trends in Genetics."
"Evaluating the Illumina/Solexa Genome Analyzer for whole genome re-sequencing" (Next Generation Sequencing)
"Does Solexa have problems with amplifying A/T rich regions? I just read a really interesting paper by Hillier et al, from Nature Methods which claims that this might be the case."
Iyer Data in SMD
Also in the Stanford Microarray Database.
Iyer Dataset in GEO
It's in GEO, which at least includes genomic coordinates. But against what version of the genome? I'm not sure. And I can't tell if it also includes the probe sequences themselves.
Resource Letter PSNAC-1: Physics and society: Nuclear arms control
Mentioned by ArmsControlWonk. Basically, a reference/bibliography.
PEP 249 -- Python Database API Specification v2.0
Question: is there an implementation of this for Oracle?
Writing MySQL Scripts with Python DB-API
Two-year-old quick guide to MySQLdb in Python.
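For my own reference, a minimal sketch of the connect / cursor / execute / fetch pattern that PEP 249 describes. It assumes the standard-library sqlite3 module as a stand-in driver (not MySQLdb or anything Oracle-specific); the table and rows are made up for illustration.

```python
import sqlite3  # standard-library driver that follows the DB-API 2.0 shape


def demo():
    # An in-memory database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE bookmarks (title TEXT, tag TEXT)")
        # Hypothetical rows, purely for illustration.
        cur.executemany(
            "INSERT INTO bookmarks VALUES (?, ?)",
            [
                ("PEP 249", "python"),
                ("Writing MySQL Scripts with Python DB-API", "python"),
                ("Scribus", "layout"),
            ],
        )
        conn.commit()
        # Parameters are passed separately from the SQL text.
        cur.execute(
            "SELECT title FROM bookmarks WHERE tag = ? ORDER BY title",
            ("python",),
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

One caveat: the placeholder style is driver-specific. sqlite3 uses `?`, while MySQLdb uses `%s`, so the SQL strings above would need adjusting for another driver.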
Bergstra, Trenite, van der Zwaag, "Towards a formalization of budgets"
I hate papers with titles that are "Towards" something. On the other hand: budget combinators! (Do they even call them that?) To be re-read in conjunction with some Simon Peyton-Jones papers.
Sliced bananas on opaque data (The expression lemma)
Basically reminding myself: I need to write some "Friday Catamorphism Blogging" posts, at some point.
Cetin, Bingol, "Use of Rapid Probabilistic Argumentation for Ranking on Large Complex Networks"
Once again, I find the title of the paper totally inspiring. On the other hand, I've *always* found these Ranking Methods in machine-learning-type papers to be a little under-motivated. Ordinality for Cardinality seems like you lose something.
The *other* early paper on discrete models for traffic flow. This one I can't get a copy of online.
Schreckenberg, Schadschneider, Nagel, and Ito, "Discrete stochastic models for traffic flow."
Phys. Rev. E 51 (1995). Another statistical-physics-and-traffic paper. This one reviews the recent (at the time) research on the subject.
Biham, Middleton, and Levine, "Self-organization and a dynamical transition in traffic-flow models"
Phys. Rev. A 46 (1992). One of the two earliest papers (as far as I can tell) on using models from statistical physics to study traffic "flow" (i.e., spontaneous congestion).
Totally, totally insane. "Next time you are told how a madman threatens the world remember the greatest threats have come from our own mad men."
Two Trees Management - Dumbo - Art
Manhattan Bridge and not the Williamsburg Bridge, I guess. But still.
Rodriguez and Shinavier, "The RDF Virtual Machine"
I haven't read it yet, but the title of this paper just intrigues me so much.
John Rawls on Baseball
A personal letter from Rawls, published in the Boston Review. Point three is so laughably wrong -- anyone who thinks that baseball uses more parts of the body than soccer ... well, show me a centerfielder who uses his knee or chest to catch fly balls.
"The Astronaut Farmer" (Unlikely Words)
"Gahahsdafhasdfhadshfsdhfasdhfjljhatesplosion" is clearly my Word of the Day.
bunnie’s blog » Pictures of Logic Gates in Silicon
"Ludacris’ Rap Map of US Area Codes" (Strange Maps)
"Ludacris has hoes in the Midway and Wake Islands. Only scientists are allowed to inhabit the Midway Islands, and only military personnel may inhabit the Wake Islands. Draw your own conclusion." Via MeFi. Oddly funny.
"Washington Doesn’t Sleep Here" (NYT)
The Flophouse gets written up in the NYT. Including a link to Unfogged. Game over, man, game over.
oldmathpapers.org
Via Secret Blogging Seminar. Not much in there now, except some Grothendieck.
"Income Inequality and Baseball" (Odd Numbers)
Calculating the Gini coefficients of baseball teams.
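A minimal sketch of the calculation, assuming the usual sorted-values formula for the Gini coefficient. The salary figures in the usage note are invented, and I have not checked this against the method the post actually uses.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i = 1..n
    # running over the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For a hypothetical four-man roster, `gini([5, 5, 5, 5])` comes out at zero, and `gini([0, 0, 0, 10])` comes out at 0.75, which is the maximum, (n - 1) / n, for four players.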
(Climate Progress) "Must Read Bali Climate Declaration by Scientists"
...global greenhouse gas emissions need to be reduced by at least 50% below their 1990 levels by the year 2050.
A good index of mathematics books at all levels. I think their comments about Eisenbud and Hartshorne are ... funny.
<SUCKA PANTS>, Depth in Chaos
Look down at the bottom -- another ratatat remix. It almost sounds a bit like gorillaz meets ratatat. Pretty nice.
Microsoft Excel: Revolutionary 3D Game Engine?
Funny, inspired, and kinda deep (in parts). Fundamentally amazing. It's always interesting to see something you know could be done *in principle* actually executed. Complete with videos, which are a must-see. Via Waxy.
"High Castle and Inner Truth" (The Valve)
I've got it on my shelf -- it's time to read it.
Peter Medawar, "On The Effecting of All Things Possible"
On Cosma Shalizi's site? But I got here through a link from Derek Lowe.
Propeller General Information
The Propeller chip makes it easy to rapidly develop embedded applications. Its eight processors (cogs) can operate simultaneously, either independently or cooperatively, sharing common resources through a central hub.
Arrows: A General Interface to Computation
From John B.
John Hughes, "Programming With Arrows"
Paterson, "Arrows and Computation"
John Hughes, "Generalising monads to arrows"
"The Good Cold War. (a.k.a. Return of the King of Nerdtasia.)" (The Edge of the American West)
Something new for you to watch on television.
"Traffic jam emerges for no reason at all" (Cognitive Daily)
"Real-life traffic jams move backwards at a rate of about 20 km/h."
Cleveland and Devlin, "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting"
JSTOR: Journal of the American Statistical Association: Vol. 83, No. 403, p. 596. The LOESS paper.
The Defense Never Rests - The New York Times
I thought you'd like this too, C.
Scott McLemee, "Bookshelf and Self"
"All of which makes perfect sense if and only if you are not a total nerd." Pretty much the last word on bookshelf display.
Jörg's useful and ugly FXT page
The book is rough, and pseudocode/book-implementations are in C++ (!). But still, possibly, remarkably useful.
"Education, Inequality, and Complementarities" (Will Wilkinson)
"No matter how many people we train to achieve this higher levels of functioning, there will always be some kind of normal-ish distribution in it." I don't see how that follows (or even how symmetricity follows) at all.
"Of Mice and Computers" (My Biased Coin)
"The talk left me with a feeling that I've had before... [Comp. Biology] seems interesting, but it looks like to have a real impact, you really need a lot of devotion to learning the underlying biology and making connections with working biologists."
"Nickel and Dimed at Dartmouth" (Vox Baby)
"She's a compelling story teller. Here's an example of something that I had not previously appreciated--paying rent." I like reading Samwick, but *why* had this not occurred to him before?
"Absolute Poverty" (Consider the Evidence)
"Paul Krugman suggests, using calculations by Tim Smeeding (see table 2), that the United States is second-worst among affluent countries on absolute poverty. I don’t think that’s quite right."
"Peep - an Open Twitter Server" (RussellBeattie.com)
"How to Build a Twitter Agent" (Dominiek.com)
Gives the example of the telescopes that talk to each other using Twitter.
"What's encoded in your genome" (The Daily Transcript)
A freakin' large part of the genome is involved in these buffering capabilities.
Terry Tao's Category for his own PCM Articles
Waldman, Nicholson, and Adilov, "Does Television Cause Autism?"
"This suggests that, if television is a trigger for autism, then autism should be more prevalent in communities that receive substantial precipitation."
"Conservatism and Its Absence of Contents" (Brad DeLong)
This isn't an intellectual argument about how to decide what institutions are good. It is a practical-political argument about how to create good institutions and then buttress and secure them by making them facts on the ground.
“Fondly do we hope, fervently do we pray, that this mighty scourge of war may speedily pass away.” (The Edge of the American West)
The "this day in history" series has been good, but this short little exegesis of Lincoln's 2nd Inaugural is *very* good.
"A Glimpse Into The Future" (Penny Arcade!)
So much to love. "Block Obama" and "learn to build noob." Also, "Red Brick Building Coefficient Reduced." It is all quite subtle.
Li, MacArthur, et al. "Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm" (PLoS Biology)
"Specific high-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment cannot explain the pattern of binding ... and varies in a context-dependent manner. ...Higher-order rules must govern targeting of TFs."
Frank Ramsey, "Truth and Probability"
Re-typeset in Word, and then converted into PDF; a public service nonetheless.
"Texting shiv" (Language Log)
Meet Shiv!
"Subtypes and polymorphism" (The Universe of Discourse)
Mark Dominus makes a mistake (which seems rare). By my eye, this should be titled "Subtypes and side-effects," but okay. Also notice that Java 1.5+ allows restricted polymorphism for both super and subtypes, for precisely this reason.
"Kenneth Goldsmith Reads Ludwig Wittgenstein's Culture and Value in German, a Language he Neither Speaks nor Understands"
Via A White Bear. I'm not sure I even want to try listening to this.
"Lying feminist ideologues wreck English, says Yale prof" (Language Log)
There was a time, post Unabomber, when Gelernter didn't write crazy things like this. For instance, I don't remember "Machine Beauty" as particularly angry. Wrong, maybe, but not *angry*.
"Geekium" (Haiku Monkey)
Free font, designed under the "SIL Open Font License" for "logic, set theory, and mathematics."
week240 (This Week in Mathematical Physics)
John Baez on categories, computation, and holodeck games.
Lucretius, De Rerum Natura (ed. William Ellery Leonard)
Via Cosma Shalizi.
"The Political Valence of DNA Testing" (The Monkey Cage)
Damnit. I need to print this out and tape it to my advisor's door.
Joel Moses, "Symbolic Integration"
Joel Moses's PhD thesis.
"and in other news" (scatterplot)
"I said, 'Oh! [Mean] is just another term for the average,' and the student replied, 'Oh. Okay. How do you calculate that?'" Seriously? But how do you go through a month without figuring this out?
Joel Moses, "Symbolic Integration: the stormy decade."
Communications of the ACM (1971). A survey of some topics touched on in my symbolics course, in the last couple of classes.
Stein, "Unbiased Estimates with Minimum Variance"
JSTOR: The Annals of Mathematical Statistics: Vol. 21, No. 3 (Sep., 1950), pp. 406-415 (via cshalizi)
"Feature Detection: state of the art browser scripting" (Peter's Blog)
"NIPS 2006" (The n-Category Café)
Includes a link to that Snoussi paper.
"Reality Blogging" (Secret Blogging Seminar)
Rubbernecking on the Information Superhighway. (Nothing to see here, folks. Move along...) I do love the "angry string theorists" trope, though.
Tuch et al. "The Evolution of Combinatorial Gene Regulation in Fungi" (PLoS Biology)
Look at that, three cites in one paper. Also, "it was determined empirically that the Joint Binding Deconvolution (JBD) algorithm [31] provides the best combination of consistency across species and accuracy on a test set..."
"UGLIER THAN A MONKEY'S ARMPIT." (languagehat.com)
"This lively expression ... illustrates the fact that Slovak cursing makes greater use of sexual terms than that of the Czechs." Ooooh, I want it.
"More Referee Bias" (Overcoming Bias)
Surprising to no one. I wonder how much of this bias comes from an effect on the central referee vs. the linesmen (assistant referees). Asst. refs spend the entire game (often) mere yards from the crowd. I know from experience, the abuse can be extreme.
"ApoE4: Test or Not?" (In the Pipeline)
"And here’s hoping that Smart Genetics, the company that has licensed the test and is bringing it to market, handles it responsibly and resists the temptation to sell fear and uncertainly for a profit." Yeah, not a chance.
236 - The Room - Message to Ralph Nader from Anonymous
But still, he totally makes me cry sometimes. "Here are your laurels. Why don't you lie down on them, and take a goddamn nap, for once?"
"WORDMALL." (languagehat.com)
All sket up over pronunciations of Italian bread products. FMI.
"Making bridges talk" (Infovore)
A good idea, which I want to not forget. Twitter as a message bus. (For physical objects.)
"Simply Grand" (The Edge of the American West)
"Instead, I’ll ask for your National Park stories." The nighttime entrance to the floor of Yosemite valley, in a full moon's light, with J&K, in my junior summer in college. I'll die with that as my favorite, most beautiful, memory.
"Poor, arid, and, in appearance, deformed" (Language Log)
This has been open in Firefox for too long. I need to save it.
Afghanistan - Korengal Valley - United States Military - Counterinsurgency - New York Times
So so so so so so so crazy. Seriously.
Vinod, Sengupta, Bhat, and Venkatesh, "Integration of Global Signaling Pathways, cAMP-PKA, MAPK and TOR in the Regulation of FLO11"
Did you see this, Robin? (PLoS One)
Jonathan Coulton performs "Still Alive" in Rock Band on Vimeo
Even I can tell that his backing band isn't very good, but still... that's pretty great.
A Well, With Two Buckets
I'm writing some of my shorter, more work- or technically-oriented thoughts over here, now. Stuff that's too long for del.icio.us, not flippant enough for twitter, not a joke explanation, and would probably bore my mother to death.
"Voting and Anti-Voting" (My Biased Coin)
"Given a graph G and a starting configuration of labels, is there a polynomial time algorithm for computing the probability of being absorbed into the all 0 state? ... [My guess is there's a hardness result in there somewhere.]"
"Sand Won't Save You This Time." (In the Pipeline)
"Let's put it this way: during World War II, the Germans were very interested in using it in self-igniting flamethrowers, but found it too nasty to work with. " Damn. Just ... damn.
Hardly accessible publications in theoretical computer science
"with a historical interest."
"Jackboots and Whole Foods" (Michael Tomasky, TNR)
"As always in this book, the canard survives the complexities." Tomasky reviews Goldberg.
"Bill James on Craig Biggio." (Slate)
"That's kind of Biggio's career; it was over, and then it went on for quite awhile." Wow. Bill James doesn't pull any punches, although it's obvious he really liked Biggio as a player during his heyday.
"IS FRENCH LOSING GENDER?" (languagehat.com)
Variation before speciation, right? This would have been a useful study to have in hand, when I was in high school. "But Mme White, I scored better than the average French teenager!"
"Victim Of Mall Shooting Determined Not To Die In Yankee Candle" (The Onion)
"I remember thinking 'This is it, I'm going to die,'" ... "Then I looked around at where I was and told myself there was no way in hell I was going to let them find me curled up behind a floor display of Midnight Jasmine Housewarmer jar candles."
SNL "I Drink Your Milkshake" (YANP)
The first half of this sketch was pretty inspired. "But enough talk; let's go drink a milkshake, shall we?"
The Princeton Mathematics Community in the 1930s
Transcripts of oral history interviews with Princeton faculty and members of the Institute for Advanced Study. The first interview with Duren, in particular, is pretty great. Via Scott Aaronson.
"The Crucial Flaws in Mearsheimer and Walt's 'The Israel Lobby'" (Brad DeLong)
This seems right, especially his (unnumbered) points 2 and 3. But can we drop the whole Tel Aviv/"abattoir" phrasing? That's a really flippant way of discussing a serious and awful prospect.
Nader Announces Third-Party Run for President - New York Times
Well-played, sir. The next time I see one of those damn MassPIRG kids on the street outside of the Kendall T-stop, I'm going to punch him in the nose.
Scribus
Open-source page layout software.
"Market Makers for Multi Outcome Markets" (PanCrit.org)
"PM intro: basic formats" (The Now Economy)
Chris Hibbert blogs about basic types of prediction markets.
"The right way to implement a multi-outcome prediction market: Linear programming" (Oddhead Blog)
"Note that Hanson’s market scoring rules market maker also solves the same problems as the LP formulation ... However, the market maker requires a patron to subsidize... while the LP auctioneer is budget balanced — that is, can never lose money."
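To make the "requires a patron" point concrete for myself, here is a minimal sketch of Hanson's logarithmic market scoring rule, assuming the standard cost function C(q) = b * ln(sum_i exp(q_i / b)). The function names and the liquidity value are my own choices, and this is the market-maker side only, not the LP auctioneer from the post.

```python
import math


def lmsr_cost(quantities, b):
    """Cost function C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous prices: a softmax over q_i / b, so they sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b):
    """What a trader pays to buy `shares` of `outcome` from the market maker."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)
```

With n outcomes and liquidity parameter b, the market maker's worst-case loss is bounded by b * ln(n). That bound is the subsidy the patron has to be willing to cover.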
Zocalo Prediction Markets
"Zocalo is a toolkit for building prediction markets, markets in securities that pay out depending on outcomes of future events." Via David Pennock's Oddhead blog.
Michael Mitzenmacher's "Bloom Filters Survey"
Powerpoint slides.
Cosmides and Tooby, "Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty."
Gigerenzer and Hoffrage, "How to Improve Bayesian Reasoning Without Instruction: Frequency Formats."
Need to re-read this, but it's a great result. The format you use to ask a question can influence the "cognitive algorithm" a person uses to answer it. They identified Bayesian, "Fisherian", and Neyman-Pearson-style algorithms.
DiMaggio and Powell, "The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields."
"JSTOR: American Sociological Review: Vol. 48, No. 2 (Apr., 1983), pp. 147-160" Via Kieran Healy on the orgtheory blog.
Emmanuel Derman's Lecture Notes from Master's class in Financial Engineering at Columbia
"Models are only models, toy-like descriptions of idealized worlds. ... For that reason, because models are unreliable guides ... and because you don't know which is the right one .. [and if you have to use one] it's always good to use more than one."
"The blue-eyed islanders puzzle" (What's New)
Terry Tao writes out a (well-known) puzzle about common knowledge -- the blue-eyed islanders, who are logical and religious and so on. And in the long comment thread, a bunch of people: "This isn't logic, it's genocide!" Hilarious.
Clements, Felleisen. "A Tail-Recursive Machine with Stack Inspection."
"Security folklore holds that a security mechanism based on stack inspection is incompatible with a global tail call optimization policy... In this article, we prove that widely held belief wrong."
Osipenko, "Lectures on Symbolic Analysis of Dynamic Systems."
Because I'm still looking for a project for my symbolic programming class this semester...
http://www.bungeeconnect.com/
A dev environment, browser-based, for cloud apps.
"Quotes of interest -- Ohno (1973) and discussion." (Genomicron)
T.R. Gregory traces back the origins of the phrase "junk DNA" to a conference proceedings in 1973. But you should also check out his entire "Quotes of interest" series, which has recently had several posts on repetitive elements.
Kevin Kelly, "Ockham's Razor, Truth, and Information"
Kevin Kelly, "Learning, Simplicity, Truth, and Misinformation." (Draft)
To read. (Valiant, but no Vapnik?)
Gale, Binmore, and Samuelson. "Learning to be imperfect: The ultimatum game"
Binmore + ultimatum game
Ken Binmore, "Making Decisions in Large Worlds"
What happens, when you can't compute your own (presumably consistent) prior distribution? I've read this twice (still need to read it a few more times), and I admit, it bugs the hell out of me. Sections 1-2, not happy. 3, alright. 4-8 I'm okay with. WTF?
How long is the longest you've ever been on the internet? I was on it twenty four seven.
John Longley, "When is a Functional Program Not a Functional Program?"
"Sometimes all functions are continuous" (Mathematics and Computation)
Via LtU. "(Note: if you are a physicist, your mind is of a different sort. I shall address your psychology on another day.)"
"Cutting out Static" (Room 101)
Gilad Bracha on static state. "The bottom line, though, should be clear. Static state will disappear from modern programming languages, and should be eliminated from modern programming practice."
Templates for the Solution of Linear Systems, 2nd Edition
Matrix Toolkits for Java (MTJ)
Java Numerics: Main (NIST)
JungleDisk - Reliable online storage powered by Amazon S3 ™ - Jungle Disk
Carbonite Online Backup: Easy. Completely Automatic. Secure
Mozy Online Backup: Simple, Automatic, Secure
File Destructor 2.0
Hilarious. Also: love the icon.
VLDB 2007 - 33rd Very Large Data Bases Conference
Program, which includes links to pages with PDFs.
"Study finds some thoughts really do require language" (Cognitive Daily)
Verbal vs. "Rhythmic" distractions, which seem to affect thinking about the minds of others in different ways. And some of the comments section is worth reading too. Reminds me of Lance Fortnow's talk on computational models of "attention."
Game-theoretic foundations of probability and finance
"The origins of this project lie in the algorithmic theory of probability, started by Andrei Kolmogorov and developed, among others, by Per Martin-Löf, Leonid Levin and Claus-Peter Schnorr." Vovk's page to support the book he wrote with Shafer.
Glenn Shafer's List of Publications
"Why Arrow’s theorem is a scam." (Secret Blogging Seminar)
Terry Tao, in comments: "It is true that no system is perfect, but selecting a system solely because it avoids one type of problem is a little dangerous, as it may unwittingly increase the chance of susceptibility to other sorts of problems."
"God (and Gadgets) of the Lonely?" (Mixing Memory)
Larkin would say, "that vast moth-eaten musical brocade / created to pretend we never die." As so often happens, poetry trumps p-values.
Critical Eating
A (new-ish) blog you might be interested in, C -- group blogging on sociology and food.
"Relative vs. Absolute Rationality" (Overcoming Bias)
But it feels much more correct to say that people rarely get the answer exactly right, but that they generally respond in the right direction when things change.
"Sock Puppets on Neoliberal Society" (Crooked Timber)
Only Kieran Healy could write a sentence like, "Like Sifl and Olly with less slacking..." If it has less slacking, it's not anything like Sifl and Olly, man.
Linux OCR: A review of free optical character recognition software | groundstate
The arXiv.org API
"FISA Confusion" (Cato-at-liberty)
Tim Lee's comments on the recent FISA bill.
"How to Be Wrong (continued)" (Seth’s blog)
To blog. Andrew Gelman with the one comment.
ShmooCon 2008: Intercepting GSM Traffic - Hack a Day
All your GSM are belong to us. Chances that someone in the US Government is already doing this kind of thing? Pretty high, probably.
"How to shoot a skyhook ... after 50" (The Kareem Abdul-Jabbar Blog)
His style is impetuous, his offense is indefensible...
Chimpanzees Are Rational Maximizers in an Ultimatum Game -- Jensen et al. 318 (5847): 107 -- Science
What got me thinking about the ultimatum game in the first place. R's criticism, obvious in retrospect, is that it's done with food (raisins).
"Don't burn it" (Marginal Revolution)
Tyler says, "the Dead tell no tales, don't burn it." I want to say that this is like an ultimatum game.
"Claims my Russian wife won't even deign to laugh at" (Marginal Revolution)
Obviously, sleep is only valuable when you're tired. (also: "social welfare function" does so much work in that sentence).
Burn it - Times Online
Tom Stoppard says, "We don't need no Laura, let the last work burn."
Lane Kenworthy, "Reconsidering the Effect of Public Opinion on Social Policy Generosity in Affluent Democracies"
PDF. Via Andrew Gelman.
"Believing Too Little" (Overcoming Bias)
Robin Hanson quotes Seth Roberts. I think, though, that plenty of scientists believe too much -- in some sense, every "Discussion" section at the end of a paper in Science or Nature or Cell is an exercise in this.
AnthroSource | Visual Anthropology Review - 22(2):84 - Citation
A review of 'Dead Birds' by an anthropologist, who describes showing the movie to anthropology students as an example of how *not* to do it.
NPR: 'Dead Birds': Documentary Profiles New Guinea Tribe
Robert Gardner on his recently re-released film.
Patterns - Migraine - Opinion - New York Times Blog
Oliver Sacks and Jeff Tweedy (!), co-bloggers. I need to look up the Dennett quote, and references, about patterns marking the foundation of vision (since I'm sure it's not an idea original to him, either).
"Boris Spassky always the sportsman." (Boylston Chess Club Weblog)
I see you, Boris / shakin' that ash.
"Cryonics: both sides of the story" (Marginal Revolution)
"Besides, if all the patient's cells are alive, why can't the patient recover and walk out of the hospital?" Are you serious?
"Invariants of Finite Groups I" (Rigorous Trivialities)
"Here are my lecture notes for a talk I gave yesterday on invariants of finite groups in the graduate student algebra seminar here...."
Image > Labors of Love (NYT)
Um.
"Russia's Subtle Shift on Iran" (ArmsControlWonk)
"If this is true, it means that [those] complaining that the NIE plays into Iran’s hands have it exactly backwards: by taking U.S. military action off the table for now, the NIE makes it easier for countries like Russia to send Iran a stronger signal...
IJ Preview on Yahoo
Oh, I'll watch it. I'll watch the you-know-what out of it.
Doodle
Reasonably useful group-scheduling site, poll-driven and doesn't (I think) require registration. Nice.
CMU's causal discovery project/software.
"Ontological Promiscuity v. Recursion" (Language Log)
Several links to the controversy over the Pirahã.
Edge: "When the world's great scientific thinkers change their minds"
How the hell does "The Edge" magazine get such smart people to say things like this? Do they ask them to contribute to their "kind of informal think-tank?" Gah. You might as well title this, "More In Sadness Than In Anger."
Christopher Walken's Three Little Pigs (YouTube)
Exit, Pig One. Pig Two, same story.
"The Race to Read Genomes on a Shoestring, Relatively Speaking" (NYT)
Because I can't help saving stories about "next-gen" sequencing technologies.
DJabberd: XMPP server where everything is a plugin.
Jabber/XMPP server framework, written in Perl.
"There Will be Testosterone" (K. Capps)
(careful: spoilers.)
"Move Over, Chase and Sanborn" (Brain Hammer)
It is not for no reason, I suspect, that Carnap divides the realm of psychological objects into the autopsychological (my own) and the heteropsychological (those of others).
Beyond3D - Origin of Quake3's Fast InvSqrt()
An amusing story. One iteration of Newton-Raphson (the second is commented out in the source) on top of a few lines of bit-twiddling magic.
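A rough Python rendering of the trick, for the curious. This is a sketch, not the original C: it assumes IEEE 754 single-precision inputs and uses the struct module to reinterpret bits, where the C version uses a pointer cast. The function name is mine. The widely circulated source applies a single Newton-Raphson step, with a second one present but commented out, so this does the same.

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive, finite 32-bit float."""
    # Reinterpret the float's bit pattern as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic constant minus half the bit pattern gives a rough first guess,
    # because the bits of a float approximate a scaled log2 of its value.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step on f(y) = 1/y^2 - x tightens the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

After the single refinement step the result is good to a fraction of a percent, which was plenty for normalizing lighting vectors.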
"My Favorite Prime Number with Four Divisors" (The Everything Seminar)
"A prank I recommend to readers is to use the number 91 when a group situation calls for a random prime number." Mathematicians tell funny jokes.
"Was It Only a Game?" (Dick Cavett)
"Here was no Nabakovian homunculus." Dick Cavett, on his NYT blog, recalls Bobby Fischer.
"portrait of a red veined darter, which is full of dew" by Photographer Martin Amm
on wasting one’s time « orgtheory.net
www.YeastNet.org
Edward Marcotte's lab's database of "probabilistic functional" links between yeast genes.
Tractatus Logico-Philosophicus Introduction
Bertrand Russell's introduction to the Tractatus.
"The Conservative Car on the Obama Express" (Vox Baby)
In which I slowly begin to revise my in-the-large opinions of Jeffrey Hart.
LilyPond, music notation for everyone
"Concrete Groups and Axiomatic Theories II" (The n-Category Cafe)
"But I found out recently that the logician Alfred Tarski was also interested in applying Klein’s Erlanger Programm to logic, in his “What are Logical Notions?”"
"Web-scale Environments for Deduction Systems"
Chris Hanson is co-teaching (along with Gerry Sussman) the course I'm taking this semester.
philosophy jokes
699. Now the feeling of dizziness vanishes. We feel we want to say: "Now it seems more like a dull throbbing behind the eyes."
Text + Image + CSS3 = Crazy Delicious
"On Certainty and Illegal Substitutions" (Crooked Timber)
"John, I’m merely citing Joe Buck. I didn’t say whether I agreed with his important and cogent analysis." This is what happens, when a philosophy guy who likes hockey more than football starts writing alternate dialogue for your sportscasters.
"Journal" (xkcd)
"It's like s