KangHao Lu (Kenny)

This will temporarily be my homepage, if this term ever makes sense. My URI is http://dig.csail.mit.edu/People/kennyluck#I . You should probably link to this URI if you are writing something about me. Here's my ChangeLog.


Web of Things — Stop Using Page URIs in HTML links

The story starts with...

hypertexted names linked to email URIs

one of the specifications published by W3C, like this. I wanted to know more about the authors, so I clicked a hypertexted name. My email client, which I never use, opened up. I got really pissed off because I had no intention of sending them email, and I think this is an improper use of hypertext.

Normally, the hypertexted name is supposed to link to the author's homepage, if there is one. There you can find information about the author, which is likely to include the email address. However, how you find the email address depends entirely on how the author arranged his/her page.

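With the introduction of the Semantic Web and the "Web of Things" notion, I would like to push the following practice: stop using page URIs in HTML links, and link to the URI of the thing you are actually talking about.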

Of course, for a normal Web agent that does not accept RDF, a 303 redirection is necessary so that the agent gets something useful. Here's my Apache .htaccess recipe:

RewriteEngine on
RewriteBase /People

#RewriteRule ^kennyluck$ http://people.csail.mit.edu/kennyluck/ [R=303]
#The above doesn't work because Apache does content negotiation 
#before RewriteRule

RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml
RewriteRule ^kennyluck\.n3$ http://people.csail.mit.edu/kennyluck/ [R=303]

which redirects the document part of my URI http://dig.csail.mit.edu/People/kennyluck#I to this page.
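The decision this recipe makes can be sketched in a few lines of Python. This is only an illustration of the logic, not how Apache evaluates it: the function names document_part and respond are made up for the example, and returning (200, None) to mean "serve the file as is" is an assumption of the sketch.

```python
import re
from urllib.parse import urldefrag

THING_URI = "http://dig.csail.mit.edu/People/kennyluck#I"
HOMEPAGE = "http://people.csail.mit.edu/kennyluck/"


def document_part(uri):
    """Return what a client actually requests: the URI without its '#fragment'."""
    return urldefrag(uri).url


def respond(filename, accept_header):
    """Mirror the RewriteCond/RewriteRule pair (hypothetical helper).

    filename is the name after content negotiation, e.g. "kennyluck.n3".
    Returns (303, HOMEPAGE) when the client does not accept RDF/XML,
    otherwise (200, None), meaning "serve the file as is".
    """
    wants_rdf = re.search(r"application/rdf\+xml", accept_header) is not None
    if re.fullmatch(r"kennyluck\.n3", filename) and not wants_rdf:
        return 303, HOMEPAGE
    return 200, None
```

A browser sending "Accept: text/html" would be redirected to this page, while an agent that lists application/rdf+xml in its Accept header would get the data file.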

Let me describe an imaginary scenario to illustrate the benefits of this practice. This might be near-future Tabulator work. Suppose I have a post in my blog as follows:

I always think of the Tabulator as a context manager which will ultimately become an extension of the user's memory. I want to discuss with Tim about developing this principle a little bit with the "Stop Using Page URIs in HTML links" practice.

The reader might have specific questions about Tim, such as "What is Tim's work?" or "What is the relationship between Tim and this author?". Entering "What is Tim's work" in the Google search bar makes no sense, because Google has no way of knowing which Tim you mean.

Telling Google your browsing history might be a way to tell Google who Tim is. But this is certainly going to raise privacy issues.

In contrast, a newer version of the Tabulator (Extension), as a client-side application, could do the following:

  1. Scrape all the hypertext into rdfs:label triples.
     rdfs:label "Tim".
     rdfs:label "Tabulator".
     rdfs:label '"Stop Using Page URIs in HTML links"'.
  2. Sort these triples by the order in which the hypertext appeared on the screen. The store can then work out which "Tim" you mean, even if there are different Tims in different tabs of your browser. The Labeler component of the Tabulator could possibly be extended to do that.
  3. Provide a UI that allows the user to type in questions. It could be as simple as:

    The search bar becomes a search+question bar

    which provides a question bar in the upper right corner.

  4. Translate the question into a formal SPARQL query, using information in the store, particularly those rdfs:label triples. Sparqlbot is one of the Semantic Web applications going in this direction. This feature might rely on natural language processing tools.

    For me, "Tim's work" might get translated into:

    SELECT ?work
    WHERE { 
      ?work doap:developer  .
    }
  5. Query internally or externally or both. Render the answer neatly.
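Steps 1, 2 and 4 can be sketched roughly as follows. This is a toy illustration, not Tabulator code: the class LinkLabelScraper and the helpers resolve and work_query are hypothetical, the example.org URIs are placeholders, and putting the resolved URI in the object position of the doap:developer pattern (blank in the query above) is an assumption. Prefix declarations are left out, as in the text, and the natural language step is reduced to a fixed template.

```python
from html.parser import HTMLParser


class LinkLabelScraper(HTMLParser):
    """Step 1: turn each <a href="...">text</a> into a (uri, rdfs:label, text) triple."""

    def __init__(self):
        super().__init__()
        self.triples = []   # kept in the order the links were seen (step 2)
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            if label:
                self.triples.append((self._href, "rdfs:label", label))
            self._href = None


def resolve(triples, label):
    """Step 2: return the URI most recently seen with this label, or None."""
    for uri, _predicate, seen in reversed(triples):
        if seen == label:
            return uri
    return None


def work_query(triples, label):
    """Step 4: fill a fixed SPARQL template for "<label>'s work"."""
    uri = resolve(triples, label)
    if uri is None:
        raise LookupError("no link labelled %r has been seen" % label)
    return "SELECT ?work\nWHERE {\n  ?work doap:developer <%s> .\n}" % uri


# A fragment of the blog post above, with hypothetical link targets.
page = (
    '<p>I always think of the '
    '<a href="http://example.org/tabulator#project">Tabulator</a> as a context '
    'manager. I want to discuss with '
    '<a href="http://www.w3.org/People/Berners-Lee/card#i">Tim</a>.</p>'
)
scraper = LinkLabelScraper()
scraper.feed(page)
query = work_query(scraper.triples, "Tim")
```

Because resolve walks the triples from newest to oldest, a "Tim" seen later in another tab would win over this one, which is the behaviour step 2 asks for.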

A Sloppy Argument

I've also got a casual argument against linking to pages. Imagine how you would explain the way browsers work to people from the past. Instead of explaining it this way:

Me: After you click the underlined "Tim", you go to Tim's homepage.

Einstein: Homepage? What is a homepage?

you might want to explain it this way:

Me: After you click the underlined "Tim", you get more information about this Tim, whom the author referred to.

Einstein: OK. I see.

What I mean to say is that terms such as "homepage" and "blog" were coined within the last 20 years, and their meaning is unclear in the Data Web. I really think links from things to pages, such as foaf:workplaceHomepage and foaf:schoolHomepage, are boring, and links from pages to pages are even more boring. Using global identifiers such as URIs to identify things is a smart idea whose value people have not yet realized. A good analogy for a URI is the "true name" in the Earthsea series: knowing a thing's true name lets you control it (which is probably illegal under current law). With a URI as the name of a thing, you can get more information about the thing. In any case, there are lots of things that are not pages, and you should not link to a page if what you are referring to in the human-readable sentence is not a page.

Another Argument

On the Semantic Web Interest Group IRC channel, Tim and I suggested that the Sindice people build a URI lookup service. Once that is done, if you type "Tim Berners-Lee" into Sindice, you will probably get http://www.w3.org/People/Berners-Lee/card#i , whereas if you type "Tim Berners-Lee" into Google you still get http://www.w3.org/People/Berners-Lee/ , as always.

Wait! Are we trying to split the URI space in two, one for RDF URIs and one for HTML URIs? Are we not doing a very bad bad bad thing?

URIs are the fundamental building block that makes all the Web magic possible. There should never be such a term as "RDF URI".

One might say that when people google something, they are looking for pages about those things. But I think that is a misunderstanding. When people google something, they are looking for more information about those things.

Let's make http://www.w3.org/People/Berners-Lee/card#i the top result of the Google search "Tim Berners-Lee"!!

Maintained by KangHao Lu (Kenny)
Updated: 2009-01-16
