
2  Lecture Server API

This section describes how a client, such as the Lecture Browser, can use the lecture server. The next section will describe the implementation of the lecture server.

The lecture server is a web application that can answer queries about lecture content, provide access to media, authenticate users, accept changes to transcriptions, and index new lectures. Every query is an HTTP GET or POST request, and every result is returned as values embedded in XML. Queries can request the list of lecture categories, or sets of lectures or parts of lectures. Because a GET request carries its parameters in the URL, queries can easily be performed from a web browser.

2.1  Category Queries

Category queries take no parameters and return the complete list of category names and their identifiers. The Lecture Browser uses this query to populate its category menu. Figure 1 shows example output for http://web.sls.csail.mit.edu/lectures/categories.jsp; if you enter that URL in your web browser's address bar, the browser will show you the current list of categories. Each item in the list has a name, which the client shows in a menu, and an identifier, which the client can include in lecture queries to restrict the query to a particular category.


<categories>
    <category name="Applied Mathematics" categoryid="201"/>
    <category name="Architecture" categoryid="511"/>
    <category name="Arts and Humanities" categoryid="503"/>
    <category name="Astronomy" categoryid="303"/>
    <category name="Biology" categoryid="505"/>
    <category name="Business and Economics" categoryid="501"/>
    <category name="Classical Mechanics" categoryid="302"/>
    <category name="Cognitive Science" categoryid="509"/>
    <category name="Education" categoryid="507"/>
    <category name="Electricity and Magnetism" categoryid="401"/>
    <category name="Engineering" categoryid="506"/>
    <category name="History and Political Science" categoryid="504"/>
    <category name="Linear Algebra" categoryid="102"/>
    <category name="MIT Culture and History" categoryid="508"/>
    <category name="Mathematics" categoryid="101"/>
    <category name="Media" categoryid="510"/>
    <category name="Physics" categoryid="301"/>
    <category name="Speech Processing" categoryid="1"/>
    <category name="Technology and Innovation" categoryid="502"/>
    <category name="Vibrations and Waves" categoryid="402"/>
</categories>
Figure 1: categories.jsp
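As an illustration of how a client might consume this response, the following Python sketch parses XML shaped like Figure 1 into a mapping from category name to identifier. It uses only the standard library; the helper name parse_categories is hypothetical, and it assumes every categoryid is an integer, as it is in Figure 1.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_categories(xml_text: str) -> Dict[str, int]:
    """Map each category name to its categoryid."""
    root = ET.fromstring(xml_text)
    return {cat.get("name"): int(cat.get("categoryid"))
            for cat in root.iter("category")}


sample = """<categories>
    <category name="Applied Mathematics" categoryid="201"/>
    <category name="Speech Processing" categoryid="1"/>
</categories>"""

categories = parse_categories(sample)
print(categories["Speech Processing"])  # prints 1
```

The resulting identifier is what a client would pass as the categoryid parameter of a lecture query.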

2.2  Lecture Queries

Lecture queries match text and structural attributes to return a list of “hits.” It is easiest to describe the query by working backwards from the general result, shown in Figure 2.


<results>
  <course/>
  <seminarseries/>
  <lecture>
    <segment>
      <fragment>
        <word/>
      </fragment>
    </segment>
  </lecture>
</results>
Figure 2: General Form of Lecture Results

The results are always enclosed in a results tag. Zero or more course tags may follow, then zero or more seminarseries tags; these are included only when the results contain lectures associated with a course or seminar series. After any course and seminarseries tags come zero or more lecture tags. Each lecture may contain zero or more segment tags, each segment zero or more fragment tags, and each fragment zero or more word tags.

The client often wants only a list of lectures or segments, so the depth parameter controls how much detail is returned. The default value, 1, returns only the lectures and omits their segments, fragments, and words. A depth of 2 also returns the segments, 3 the fragments, and 4 the words.
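To make the nesting concrete, the following sketch trims a full results tree to a given depth on the client side. It only illustrates what the depth values mean; the server applies depth itself, and prune_to_depth and LEVELS are hypothetical names. Course and seminarseries tags are left untouched, since depth governs only the lecture tree.

```python
import xml.etree.ElementTree as ET

# Levels of the lecture tree in Figure 2: depth 1 = lecture, ..., 4 = word.
LEVELS = ("lecture", "segment", "fragment", "word")


def prune_to_depth(results: ET.Element, depth: int) -> ET.Element:
    """Drop every lecture-tree element nested deeper than `depth` (in place)."""
    if not 1 <= depth <= len(LEVELS):
        raise ValueError("depth must be between 1 and 4")

    def prune(elem: ET.Element, level: int) -> None:
        for child in list(elem):  # copy, since children may be removed
            if child.tag not in LEVELS:
                continue
            if level >= depth:
                elem.remove(child)
            else:
                prune(child, level + 1)

    for lecture in results.findall("lecture"):
        prune(lecture, 1)
    return results


full = ET.fromstring(
    "<results><lecture><segment><fragment><word/></fragment>"
    "</segment></lecture></results>"
)
print(ET.tostring(prune_to_depth(full, 2), encoding="unicode"))
```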

2.2.1  A Simple Text Query

Another important parameter is query, which is a text query suitable for the Apache Lucene text index. Figure 3 shows the results of a query for the text “hacks”, requested as http://web.sls.csail.mit.edu/lectures/lectures.jsp?query=hacks.


<results query="hacks" >
  <lecture 
   rpmurl="http://web.sls.csail.mit.edu/lectures/lecturerpm.jsp?lectureid=192"
   count="6"
   lectureid="192"
   keywords="cow board bridge tetazoo hack p dome hate floor art"
   date="October 20, 2005"
   name="Where the Sun Shines, There Hack They"
   number=""
   lecturer="Samuel Jay Keyser"
   duration="3642530">
  </lecture>
</results>
Figure 3: lectures.jsp?query=hacks

This query returned a single lecture which was not part of a seminar series or course. Not all lecture attributes are listed; some, such as the course and seminar series identifiers, are elided if they are not applicable. Figure 4 describes the various lecture attributes.


count
The number of hits in the lecture. Only the lectures were retrieved here, but a query with a depth of 3 (fragments) would have returned six fragments for this lecture.
courseid
If present, the course identifier for the lecture.
date
The date of the lecture.
duration
The length of the lecture in milliseconds.
keywords
Words that the statistical segmenter identified as occurring more often than normal in this lecture.
lectureid
The identifier for the lecture; it can be used in queries to restrict the query to a specific lecture.
lecturer
The person giving the lecture.
name
The name or title of the lecture.
number
In a course, the lecture number indicates which lecture in the course this lecture corresponds to, e.g. 1, 2, and so on.
rpmurl
The URL of the media description required by RealPlayer. RealPlayer requires the actual media to be described by a short piece of text, which is returned by a GET request to this URL.
seriesid
If present, the series identifier for the lecture.
Figure 4: Lecture Attributes
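A client will typically convert these attributes into typed values. The sketch below does so for a single lecture element, leaving the optional courseid and seriesid as None when they are absent and keeping number as a string because it may be empty. The Lecture class and the parse_lecture and format_duration helpers are hypothetical; the sketch assumes that identifiers, count, and duration are decimal integers, as in Figure 3.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lecture:
    lectureid: int
    name: str
    lecturer: str
    date: str           # kept as text, e.g. "October 20, 2005"
    duration_ms: int
    count: int          # number of hits in the lecture
    keywords: List[str]
    rpmurl: str
    number: str         # may be empty outside a course
    courseid: Optional[int] = None  # absent unless the lecture is in a course
    seriesid: Optional[int] = None  # absent unless the lecture is in a series


def _opt_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value else None


def parse_lecture(elem: ET.Element) -> Lecture:
    return Lecture(
        lectureid=int(elem.get("lectureid")),
        name=elem.get("name", ""),
        lecturer=elem.get("lecturer", ""),
        date=elem.get("date", ""),
        duration_ms=int(elem.get("duration", "0")),
        count=int(elem.get("count", "0")),
        keywords=elem.get("keywords", "").split(),
        rpmurl=elem.get("rpmurl", ""),
        number=elem.get("number", ""),
        courseid=_opt_int(elem.get("courseid")),
        seriesid=_opt_int(elem.get("seriesid")),
    )


def format_duration(ms: int) -> str:
    """Render milliseconds as H:MM:SS, dropping fractions of a second."""
    s = ms // 1000
    return f"{s // 3600}:{s // 60 % 60:02d}:{s % 60:02d}"


sample = ('<lecture lectureid="192" count="6" number="" '
          'name="Where the Sun Shines, There Hack They" '
          'lecturer="Samuel Jay Keyser" date="October 20, 2005" '
          'duration="3642530" keywords="cow board bridge tetazoo hack"/>')
lec = parse_lecture(ET.fromstring(sample))
print(lec.name, format_duration(lec.duration_ms))
# prints: Where the Sun Shines, There Hack They 1:00:42
```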

2.2.2  Another Text Query


<results query="jupiter" >
  <course
   courseid="201"
   institution="MIT"
   department="Physics"
   number="8.01"
   name="Physics I: Classical Mechanics"
   year="1999"
   term=""/>
  <course
   courseid="401"
   institution="MIT"
   department="Physics"
   number="8.03"
   name="Physics III: Vibrations and Waves"
   year="2004"
   term=""/>
  <seminarseries
   seriesid="4"
   institution="MIT"
   name="Poetry@MIT"
   host="MIT Program in Writing and Humanistic Studies"/>
  <lecture
    rpmurl="http://web.sls.csail.mit.edu/...ctureid=168" 
    count="1" lectureid="168"
    keywords="shop train poet kevin sofa parish w skylight local frogs"
    seriesid="4" date="October 17, 2002" name="A Reading by Seamus Heaney"
    number="" lecturer="Seamus Heaney" duration="3402053">
  </lecture>
  <lecture
   rpmurl="http://web.sls.csail.mit.edu/...ctureid=189" 
   count="1" lectureid="189"
   keywords="rocks apollo moon object ... layers erupt"
   date="April 2, 2003"
   name="The Quest for Mars: Scientific and Human Destiny?"
   number="" lecturer="Jim Garvin" duration="5621425">
  </lecture>
  ...  
</results>
Figure 5: lectures.jsp?query=jupiter

Figure 5 shows the partially elided results of a query to http://web.sls.csail.mit.edu/lectures/lectures.jsp?query=jupiter that returns more lectures. With this query, there were two lectures from courses and one from a seminar series. Figure 6 describes the course attributes, and Figure 7 describes the seminar series attributes.


courseid
The identifier of the course.
department
The department that offered the course.
institution
Where the course was given.
name
The name of the course.
number
The course's institutional number.
term
The term of the course, e.g. Fall.
year
The year of the course.
Figure 6: Course Attributes


seriesid
The identifier for the seminar series.
host
The sponsor of the seminar series.
institution
Where the seminar series was held.
name
The name of the seminar series.
Figure 7: Seminar Series Attributes
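Because course and seminarseries tags appear once per result set rather than inside each lecture, a client has to join them to lectures by identifier. A minimal sketch, assuming Python's standard XML parser and the hypothetical helper names index_containers and describe_source; the sample values are copied from Figure 5.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Index = Dict[str, Dict[str, str]]


def index_containers(results: ET.Element) -> Tuple[Index, Index]:
    """Index <course> and <seminarseries> attributes by their identifiers."""
    courses = {c.get("courseid"): dict(c.attrib) for c in results.findall("course")}
    series = {s.get("seriesid"): dict(s.attrib)
              for s in results.findall("seminarseries")}
    return courses, series


def describe_source(lecture: ET.Element, courses: Index, series: Index) -> str:
    """Name the course or seminar series a lecture belongs to, if any."""
    course = courses.get(lecture.get("courseid", ""))
    if course:
        return f'{course.get("number", "")} {course.get("name", "")}'
    seminar = series.get(lecture.get("seriesid", ""))
    if seminar:
        return f'{seminar.get("name", "")} ({seminar.get("host", "")})'
    return "standalone lecture"


sample = """<results query="jupiter">
  <course courseid="401" institution="MIT" department="Physics" number="8.03"
          name="Physics III: Vibrations and Waves" year="2004" term=""/>
  <seminarseries seriesid="4" institution="MIT" name="Poetry@MIT"
                 host="MIT Program in Writing and Humanistic Studies"/>
  <lecture lectureid="168" seriesid="4" name="A Reading by Seamus Heaney"/>
  <lecture lectureid="189" name="The Quest for Mars: Scientific and Human Destiny?"/>
</results>"""

root = ET.fromstring(sample)
courses, series = index_containers(root)
for lec in root.findall("lecture"):
    print(lec.get("name"), "->", describe_source(lec, courses, series))
```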

2.2.3  Drilling Down

When browsing, a user typically starts with a text query and then clicks on a lecture to see it in more detail. The client can handle this by repeating the text query, restricted to the lecture the user clicked on. In this case, the depth parameter should be set to 2 so that the segments are returned, and the fillLecture parameter should be set to True so that all of the lecture's segments are fetched, not just the ones with hits. The client can then show every segment and highlight the ones with hits. Figure 8 shows the result, and Figure 9 describes the segment attributes.


<results query="jupiter" >
 ...  
 <lecture ...>
  <segment
   summary="frequency object omega pi ..."
   count="0" score="1.0"
   beginTime="1571" endTime="384448">
  </segment>
  <segment 
   summary="push table pool non ..."
   count="0" score="1.0"
   beginTime="385321" endTime="578013">
  </segment>
  <segment
   summary="planets sun model string orbits..."
   count="1" score="1.0"
   beginTime="578259" endTime="1337630">
  </segment>
  <segment summary="particles gravity direction..."
   count="0" score="1.0"
   beginTime="1340219" endTime="2234896">
  </segment>
  <segment summary="salt nitrate table..."
   count="0" score="1.0"
   beginTime="2235345" endTime="2420363">
  </segment>
  <segment
   summary="bucket string gravity sense..."
   count="0" score="1.0"
   beginTime="2420907" endTime="3043149">
  </segment>
 </lecture>
</results>
Figure 8: lectures.jsp?query=jupiter&lectureid=73&depth=2&fillLecture=True
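The drill-down request in the Figure 8 caption can be assembled programmatically. This sketch uses the standard urllib.parse.urlencode to build it; the function name drill_down_url is hypothetical, and the base URL is the lectures.jsp address used throughout this section.

```python
from urllib.parse import urlencode

LECTURES_URL = "http://web.sls.csail.mit.edu/lectures/lectures.jsp"


def drill_down_url(query: str, lectureid: int) -> str:
    """Request all segments of one lecture, with hit counts for `query`."""
    params = {
        "query": query,
        "lectureid": lectureid,  # restrict to the lecture the user clicked
        "depth": 2,              # return lectures and their segments
        "fillLecture": "True",   # include segments without hits
    }
    return f"{LECTURES_URL}?{urlencode(params)}"


print(drill_down_url("jupiter", 73))
# prints http://web.sls.csail.mit.edu/lectures/lectures.jsp?query=jupiter&lectureid=73&depth=2&fillLecture=True
```

urlencode also takes care of escaping, so a multi-word query such as "bright light" is sent as query=bright+light.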

The descriptions of the fragment and word attributes, which are returned in results of depth 3 and 4 respectively, are in Figure 10 and Figure 11.


beginTime
The time in milliseconds in the lecture when the segment begins.
count
How many hits are in the segment.
endTime
The time in milliseconds in the lecture when the segment ends.
score
Sometimes related to the hit score.
summary
The keywords that the statistical segmenter determined occur more frequently than normal in this segment.
Figure 9: Segment Attributes
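Given a drill-down response like Figure 8, a client can list every segment with its time span and flag the ones containing hits. The sketch below assumes beginTime and endTime are integer milliseconds, as Figure 9 states; segment_rows and mmss are hypothetical names, and the sample segments are copied from Figure 8.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def mmss(ms: int) -> str:
    """Format milliseconds as M:SS, dropping fractions of a second."""
    s = ms // 1000
    return f"{s // 60}:{s % 60:02d}"


def segment_rows(lecture: ET.Element) -> Iterator[Tuple[str, str, bool, str]]:
    """Yield (start, end, has_hits, summary) for each segment, in order."""
    for seg in lecture.findall("segment"):
        yield (
            mmss(int(seg.get("beginTime"))),
            mmss(int(seg.get("endTime"))),
            int(seg.get("count", "0")) > 0,
            seg.get("summary", ""),
        )


lecture = ET.fromstring(
    '<lecture>'
    '<segment summary="frequency object omega pi ..." count="0" score="1.0"'
    ' beginTime="1571" endTime="384448"/>'
    '<segment summary="planets sun model string orbits..." count="1" score="1.0"'
    ' beginTime="578259" endTime="1337630"/>'
    '</lecture>'
)
for start, end, has_hits, summary in segment_rows(lecture):
    marker = "*" if has_hits else " "
    print(f"{marker} {start}-{end}  {summary}")
```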


beginTime
The starting time of the fragment, in milliseconds.
count
The number of hits in the fragment.
endTime
The ending time of the fragment, in milliseconds.
Figure 10: Fragment Attributes


beginTime
The starting time of the word, in milliseconds.
endTime
The ending time of the word, in milliseconds.
term
Whether or not the word is a term to be highlighted.
text
The text of the word.
Figure 11: Word Attributes
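At depth 4, a fragment's word tags carry enough information to rebuild the transcript text with its highlighted terms. This sketch wraps terms in brackets. It assumes, without confirmation from this document, that the term attribute is serialized as "true" or "false"; render_fragment and is_term are hypothetical names, and the sample words and times are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_term(word: ET.Element) -> bool:
    # Assumption: the term flag is written as "true"/"false".
    return word.get("term", "false").strip().lower() == "true"


def render_fragment(fragment: ET.Element) -> str:
    """Join a fragment's words in order, wrapping highlighted terms in [brackets]."""
    parts = []
    for word in fragment.findall("word"):
        text = word.get("text", "")
        parts.append(f"[{text}]" if is_term(word) else text)
    return " ".join(parts)


# Made-up sample; only the attribute names come from Figures 10 and 11.
fragment = ET.fromstring(
    '<fragment beginTime="1000" endTime="2500" count="1">'
    '<word text="the" term="false" beginTime="1000" endTime="1200"/>'
    '<word text="moons" term="false" beginTime="1200" endTime="1700"/>'
    '<word text="of" term="false" beginTime="1700" endTime="1900"/>'
    '<word text="jupiter" term="true" beginTime="1900" endTime="2500"/>'
    '</fragment>'
)
print(render_fragment(fragment))  # prints: the moons of [jupiter]
```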

Figure 12 shows the lectures.jsp query parameters.


beginTime
Only return hits whose fragments start at or beyond this time.
categoryid
Only return hits whose lecture includes the specified categoryid.
courseid
Only return hits for lectures with the specified courseid.
depth
The depth of the lecture tree to return. 1 is lectures, 2 is segments, 3 is fragments, and 4 is words. The default is 1.
endTime
Only return hits whose fragments end at or before this time.
fillLecture
If True, retrieve all of the lecture's segments, including those the query did not match. This is useful when drilling down into a lecture, since it returns timing information for every segment of the lecture rather than just those with hits.
highlighter
A query whose words should be marked as terms in the result. Ideally, the highlighter would mark only the words that actually match the query, but currently it marks every word that appears in the query. For example, if the highlighter query were “bright light”, every instance of the word “bright” and every instance of the word “light” in the search results would be marked as a term, not just the places where the sequence “bright light” occurs. Similarly, “NOT gravity” would highlight the word “gravity” instead of everything else. With some work, it is possible to make the Lucene text retrieval do the right thing.
lectureid
Only return hits for lectures with the specified lectureid.
maxHits
The maximum number of hits (fragments) to be returned.
query
A text query suitable for the Apache Lucene text index.
seriesid
Only return hits for lectures with the specified seminar seriesid.
startHit
If only a subset of the hits is to be returned, the offset of the first fragment to return within the full set of hits. This parameter might not be implemented.
Figure 12: lectures.jsp Query Parameters
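The highlighter's current behavior, as the highlighter entry describes it, amounts to treating every word in the highlighter query as a term, ignoring phrases and negation. The following sketch models that behavior on the client side; it is not the server's code. It assumes case-insensitive matching and that Lucene's uppercase operators AND, OR, and NOT are not themselves highlighted, neither of which is specified here; highlight_terms and mark_terms are hypothetical names.

```python
import re
from typing import List, Set

# Assumption: Lucene's boolean operators are not highlighted themselves.
LUCENE_OPERATORS = {"AND", "OR", "NOT"}


def highlight_terms(highlighter: str) -> Set[str]:
    """Every word in the highlighter query becomes a term; phrase quotes and
    NOT are ignored, as in the current behavior described in the text."""
    words = re.findall(r"[A-Za-z0-9']+", highlighter)
    return {w.lower() for w in words if w not in LUCENE_OPERATORS}


def mark_terms(words: List[str], terms: Set[str]) -> List[bool]:
    """Flag, case-insensitively, each transcript word that is a term."""
    return [w.lower() in terms for w in words]


terms = highlight_terms('"bright light"')
print(mark_terms(["a", "bright", "day", "light", "bright"], terms))
```

Both "bright" and "light" are flagged wherever they occur, not only where the phrase appears, which is exactly the limitation noted for the highlighter parameter.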

2.3  Times Queries

Times queries retrieve the part of a time-aligned transcription that falls between a start time and an end time within a lecture. The lecture browser uses them to present the synchronized transcription. Figure 13 lists the times.jsp query parameters. The result is a results tag containing fragment tags, which in turn contain word tags, as described in Figure 10 and Figure 11.


beginTime
The starting time in milliseconds for words in the lecture.
endTime
The ending time in milliseconds for words in the lecture.
highlighter
The highlighter expression used to mark words as terms.
lectureid
The lecture identifier for the lecture.
Figure 13: times.jsp query parameters
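A times query is a plain GET request, so a client can build it the same way as a lecture query. This sketch assumes that times.jsp lives alongside categories.jsp and lectures.jsp under /lectures/, which this section does not state; times_url is a hypothetical helper, and the 30-second window in the example is arbitrary.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumption: times.jsp sits next to lectures.jsp on the same server.
TIMES_URL = "http://web.sls.csail.mit.edu/lectures/times.jsp"


def times_url(lectureid: int, begin_ms: int, end_ms: int,
              highlighter: Optional[str] = None) -> str:
    """Build a times.jsp request for the words between begin_ms and end_ms."""
    if begin_ms > end_ms:
        raise ValueError("beginTime must not be after endTime")
    params: Dict[str, object] = {
        "lectureid": lectureid,
        "beginTime": begin_ms,
        "endTime": end_ms,
    }
    if highlighter:
        params["highlighter"] = highlighter
    return f"{TIMES_URL}?{urlencode(params)}"


# A 30-second window starting where the segment with a hit in Figure 8 begins.
print(times_url(73, 578259, 608259, "jupiter"))
```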

2.4  Accessing Media

The RealPlayer browser plugin requires that the location of media be provided indirectly. The page lecturerpm.jsp is responsible for this indirection (see footnote 2). The page takes one parameter, lectureid, and returns a response of type audio/x-pn-realaudio-plugin whose content is the text of the actual media URL. We have found that we get the best results by letting an Apache server send the data, rather than using a streaming server or Akamai cached locations.
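A client that does not rely on the RealPlayer plugin could follow the indirection itself. This sketch assumes the response body is UTF-8 text with the media URL on its first non-blank line; fetch_media_url and media_url_from_rpm are hypothetical names, and the lecturerpm.jsp address comes from the rpmurl attribute in Figure 3.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address taken from the rpmurl attribute in Figure 3.
RPM_URL = "http://web.sls.csail.mit.edu/lectures/lecturerpm.jsp"


def media_url_from_rpm(body: str) -> str:
    """Return the first non-blank line of the response, assumed to be the media URL."""
    for line in body.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("response contains no media URL")


def fetch_media_url(lectureid: int) -> str:
    """Ask lecturerpm.jsp where the media for a lecture actually lives."""
    url = f"{RPM_URL}?{urlencode({'lectureid': lectureid})}"
    with urlopen(url) as response:
        return media_url_from_rpm(response.read().decode("utf-8"))
```

For example, fetch_media_url(192) requests the same URL as the rpmurl shown in Figure 3.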

2.5  User Login

The lecture server includes support for user login. The server allows authenticated users to perform additional tasks, such as submitting changes to transcriptions. At this time, the users and passwords are hard-coded, but the login and capabilities facilities in the server are relatively complete.

Currently there are two roles, login and edit. The login role allows a user to log in, and the edit role allows a user to edit a transcription.

In general, when a user account is created, the user enters a password. Because many users choose weak passwords, the system also generates a random string called the salt. The salt and the user's password are concatenated, and the MD5 sum of the result is stored together with the user identifier and the salt. Thus the password itself is never stored, and any dictionary of common passwords used in an attack would have to be expanded by the number of possible salt values, which makes the system a less attractive target.
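The account-creation step can be sketched as follows, mirroring the MD5 scheme described above. The salt format (8 random bytes from Python's secrets module, written as 16 hexadecimal characters) and the UTF-8 encoding of the concatenated string are assumptions, and make_salt, stored_digest, and create_account are hypothetical names.

```python
import hashlib
import secrets


def make_salt() -> str:
    # Assumption: 8 random bytes from a CSPRNG, written as 16 hex characters.
    return secrets.token_hex(8)


def stored_digest(salt: str, password: str) -> bytes:
    """MD5 of the salt concatenated with the password (UTF-8 assumed)."""
    return hashlib.md5((salt + password).encode("utf-8")).digest()


def create_account(userid: str, password: str) -> dict:
    """Return the record kept on the server; the password itself is discarded."""
    salt = make_salt()
    return {"userid": userid, "salt": salt, "digest": stored_digest(salt, password)}


record = create_account("alice", "correct horse")
print(record["salt"], record["digest"].hex())
```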

The first step in logging in is to get a randomly generated nonce and the user's salt from nonce.jsp, which takes userid as a parameter. The client concatenates the salt with the password the user types and computes an MD5 sum. The sum is converted to a base-64 string and concatenated with the nonce. The MD5 sum of that string is converted to a base-64 string and sent to the login.jsp page as the clientHash parameter.
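The client's computation, and the matching check a server can perform using only the stored MD5 of salt and password, can be sketched as follows. The sketch assumes UTF-8 text, standard padded base-64, and a server that keeps the raw 16-byte digest; b64_md5, client_hash, and server_check are hypothetical names, and the salt, nonce, and password values are made up.

```python
import base64
import hashlib
import hmac


def b64_md5(text: str) -> str:
    """Base-64 string of the MD5 sum of `text` (UTF-8 assumed)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def client_hash(password: str, salt: str, nonce: str) -> str:
    """The value the client sends to login.jsp as clientHash."""
    inner = b64_md5(salt + password)  # MD5(salt + password), base-64
    return b64_md5(inner + nonce)     # MD5(inner + nonce), base-64


def server_check(stored: bytes, nonce: str, received: str) -> bool:
    """Recompute clientHash from the stored MD5(salt + password) and compare."""
    inner = base64.b64encode(stored).decode("ascii")
    expected = b64_md5(inner + nonce)
    return hmac.compare_digest(expected, received)


stored = hashlib.md5(("a1b2c3" + "secret").encode("utf-8")).digest()
sent = client_hash("secret", "a1b2c3", "n-42")
print(server_check(stored, "n-42", sent))  # prints True
```

Provided the server issues a fresh nonce for each attempt, a captured clientHash only matches the nonce it was computed with.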

The login.jsp page returns an XML login tag whose loginOK and editOK attributes indicate whether the login succeeded and whether the user is permitted to edit. For the client, this information is advisory only; the server's session state determines what the user can actually do.
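Reading this result on the client takes only a few lines. This sketch assumes that loginOK and editOK are serialized as "true" or "false" (compared case-insensitively), which this section does not specify; parse_login is a hypothetical name, and its output would be used only to adjust the user interface.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def parse_login(xml_text: str) -> Tuple[bool, bool]:
    """Return (loginOK, editOK) from a login.jsp response."""
    root = ET.fromstring(xml_text)
    login = root if root.tag == "login" else root.find(".//login")
    if login is None:
        raise ValueError("response has no login tag")

    def flag(name: str) -> bool:
        # Assumption: flags are written as "true"/"false".
        return login.get(name, "false").strip().lower() == "true"

    return flag("loginOK"), flag("editOK")


login_ok, edit_ok = parse_login('<login loginOK="true" editOK="false"/>')
print(login_ok, edit_ok)  # prints: True False
```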

NOTE: Because we do not use SSL, someone monitoring network traffic would be able to impersonate the user within the same session. SSL could be added relatively easily if it were deemed necessary. The login procedure does, however, avoid the need to store passwords on the server and keeps the passwords themselves relatively secure.

