A Cwm-Based SPARQL Server

Yosi Scharf

UROP

Why?

Most Straightforward Use of SPARQL
Performance
Performance

Most Straightforward Use of SPARQL

I have been working all summer on adding SPARQL server support to cwm
It needs to be tested and used
A server lets people run SPARQL queries through cwm without knowing cwm at all

Performance

Cwm is slow to start up
All of the builtins have to be rebuilt on every run
Compiling the parser's regular expressions alone takes nearly a second
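In a long-running server that one-second compile is paid once rather than on every query. A minimal sketch of the idea, assuming a hypothetical, much-reduced SPARQL token table (cwm's real grammar is far larger): the combined regex is compiled on first use and the same compiled object is reused afterwards.

```python
import functools
import re

# Hypothetical, much-reduced token table for illustration only.
TOKEN_PATTERNS = [
    ("IRI", r"<[^<>\"{}|^`\\\s]*>"),
    ("VAR", r"[?$][A-Za-z_][A-Za-z0-9_]*"),
    ("PNAME", r"[A-Za-z][A-Za-z0-9_-]*:[A-Za-z0-9_-]*"),
    ("KEYWORD", r"(?i:SELECT|PREFIX|WHERE|FROM)\b"),
    ("PUNCT", r"[{}.*]"),
    ("WS", r"\s+"),
]


@functools.lru_cache(maxsize=None)
def tokenizer():
    """Compile the combined token regex once; later calls return the same object."""
    return re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_PATTERNS))


def tokenize(text):
    """Return (token_type, text) pairs, dropping whitespace.

    Characters that match no pattern are silently skipped in this sketch.
    """
    return [(m.lastgroup, m.group()) for m in tokenizer().finditer(text)
            if m.lastgroup != "WS"]
```

A command-line run pays the compile cost in every process; a server process pays it on the first request and every later `tokenize` call reuses the cached pattern.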

Performance

The real gain of a SPARQL server comes when the store is persistent
Triples are loaded once; after that, many queries can run against a large set of triples, each in less than O(n) time
But persistence brings issues of its own
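Why a persistent store allows sub-linear queries can be shown with a toy index. This is a minimal sketch, not cwm's store: the `IndexedStore` class and its (subject, predicate) index are hypothetical. Loading n triples costs O(n) once; afterwards each lookup is a single hash probe plus the size of the answer, instead of a scan over all n triples.

```python
from collections import defaultdict


class IndexedStore:
    """Hypothetical in-memory triple store indexed on (subject, predicate)."""

    def __init__(self):
        self._by_sp = defaultdict(list)   # (s, p) -> [o, ...]

    def add(self, s, p, o):
        self._by_sp[(s, p)].append(o)

    def objects(self, s, p):
        # Average O(1) hash lookup plus the size of the answer; the total
        # number of triples in the store does not matter.
        # .get() avoids creating empty entries for unknown keys.
        return list(self._by_sp.get((s, p), ()))
```

In a one-shot command-line run the O(n) load dominates every query; in a server the load is amortized across all queries that follow.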

Implementation

Used Python's built-in BaseHTTPServer module
Subclassed BaseHTTPServer.BaseHTTPRequestHandler to handle requests
Code is in http://www.w3.org/2000/10/swap/sparql/webserver.py
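The general shape of such a handler is sketched below. This is not the code in webserver.py: it uses Python 3, where the BaseHTTPServer classes now live in `http.server`; it assumes the query arrives in a `query` GET parameter (the SPARQL Protocol convention); and `run_query` is a hypothetical stand-in for cwm's query engine that simply echoes the query.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_query(query_text):
    """Hypothetical stand-in for cwm's SPARQL engine: echoes the query."""
    return "results for: " + query_text


class SparqlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = urllib.parse.parse_qs(urllib.parse.urlparse(self.path).query)
        if "query" not in params:
            self.send_error(400, "missing 'query' parameter")
            return
        body = run_query(params["query"][0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def serve_in_background(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), SparqlHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, query):
    """Send one query to the running server and return the response body."""
    host, port = server.server_address[:2]
    url = "http://%s:%d/?%s" % (host, port, urllib.parse.urlencode({"query": query}))
    # Bypass any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read().decode("utf-8")
```

Because the handler runs inside one long-lived process, anything loaded before `serve_forever` (builtins, compiled grammar, triples) is shared by every request.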

Issues

Cwm was not designed to run for any length of time

Memory leaks
Security
Bigger persistence issues

Memory leaks

Cwm interns everything, which prevents the garbage collector from ever freeing it
Switching the intern tables to WeakValueDictionary was an almost complete solution
Builtins still needed a fix so that they stay interned anyway
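A minimal sketch of weak interning with pinned builtins, assuming a hypothetical `Symbol` class (cwm's own term classes differ). The intern table holds only weak references, so an unused term disappears as soon as nothing else refers to it; builtins are additionally held in a strong set so they are never collected.

```python
import weakref


class Symbol:
    """Hypothetical interned term: at most one live object per URI."""

    _table = weakref.WeakValueDictionary()  # uri -> Symbol, weak values only
    _pinned = set()                         # strong refs that keep builtins alive

    def __init__(self, uri):
        self.uri = uri

    @classmethod
    def intern(cls, uri, builtin=False):
        sym = cls._table.get(uri)
        if sym is None:
            sym = cls(uri)
            cls._table[uri] = sym
        if builtin:
            # Builtins must survive even when no query currently uses them.
            cls._pinned.add(sym)
        return sym
```

With a plain dict as `_table`, every URI ever seen would stay in memory for the life of the server; the weak table lets ordinary terms go while the pinned set covers the builtins.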

Security Issues

Reading local files with log:content:
PREFIX log: <http://www.w3.org/2000/10/swap/log#> SELECT * { <file:/etc/passwd> log:content ?x }
Using FROM to make the server download arbitrary files
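One way to close both holes is to check every URI before the server dereferences it, whether it comes from a FROM clause or from log:content. This is a minimal sketch with a hypothetical `may_fetch` helper and a caller-supplied host allowlist; it is not how cwm handles this. Only http and https are allowed, which rules out `file:` reads.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def may_fetch(uri, allowed_hosts):
    """Return True only if the server should dereference `uri`."""
    parts = urlsplit(uri)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False  # blocks file:, ftp:, data:, and similar schemes
    # Only hosts the operator explicitly listed may be contacted.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

An allowlist is deliberately conservative: a denylist of "dangerous" schemes or hosts is easy to get around, while an allowlist fails closed.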

Bigger Persistence Issues

Cwm interns every file it downloads and never downloads it twice
- Better performance on later queries
- Never forgets anything; leaks memory
- No way to know whether a file has been updated
There is no way to update cwm's knowledge base once it is in server mode
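A minimal sketch of a document cache that keeps the speed benefit but can expire and forget entries. `DocumentCache`, its `max_age` policy and its `forget` method are hypothetical, not part of cwm. Time-based expiry is a simpler stand-in for HTTP conditional requests (If-Modified-Since or ETag), which would detect updates more precisely but are not shown here.

```python
import time


class DocumentCache:
    """Hypothetical cache of fetched documents that can expire and forget entries."""

    def __init__(self, fetch, max_age, clock=time.monotonic):
        self._fetch = fetch        # callable: uri -> parsed document
        self._max_age = max_age    # seconds before an entry is fetched again
        self._clock = clock
        self._entries = {}         # uri -> (fetched_at, document)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]        # still fresh: reuse without downloading
        doc = self._fetch(uri)     # missing or stale: download again
        self._entries[uri] = (now, doc)
        return doc

    def forget(self, uri):
        """Drop one document so the next get() downloads it again."""
        self._entries.pop(uri, None)
```

Expiry bounds how stale an answer can get, and `forget` gives an operator an explicit way to refresh the knowledge base without restarting the server.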

The (Not Very) Final Product

http://mr-burns.w3.org:8000

Running with
syosi@mr-burns:~/SWAP$ /usr/bin/python ./cwm.py log.n3 math.n3 os.n3 list.n3 --sparqlServer