IPTPS 2007 TRIP REPORT
2/26-2/27 2007, Bellevue, WA
Jonathan Ledlie

Note: These are my notes and my opinions. They are not meant to be a complete summary of all of the talks. If you strongly disagree with something here and want to do something about it, please send me an email and I will (most likely) add your response here.

Overview: Quite a bit of interesting work, some fairly derivative, solving niche- (or non-) problems. Lots of focus on BitTorrent. Because there are a lot more real systems out there compared to when the workshop first started, quite a few of the papers were substantial measurement studies. Many of them showed similar results, but just the sheer magnitude of real data was surprising (and refreshing). The VoD work is interesting and my money would be on it becoming hotter over the next couple of years. Because there was no wireless in the room, most talks had a good number of questions -- this aspect was really nice. Not too surprisingly, there were fewer and fewer people as the second day of talks rolled on, leading to fewer post-talk questions and less discussion. It's nice to see how this community has matured over time.

Best talks / work (IMO):
- Monitoring Peers in KAD
- Scaling Peer-to-Peer Games in Low-Bandwidth Environments
- Don't Give Up on Distributed File Systems
- Peer-Assisted VoD

Keynote Address: Q&A with Bram Cohen
Bram Cohen, Founder and CEO of BitTorrent, Inc. (Interviewer: John Douceur)

JD Q&A:
BT grew out of MojoNation: swarming seemed like the most promising area.
How does BT work (succeed) as a company? (1) low pricing on content and (2) content-distribution tool.
Will DRM work? No, except maybe for games (e.g. maybe "repeat use" software). Watermarking might become feasible in the future (i.e., catching cheaters, as opposed to preventing use).
Streaming? Yes, they are currently working on this. Just need to change the piece selection algorithm.
What are the remaining major technical problems? Nothing specific.
Just getting all the details right is hard.
Structured overlays have never really taken off; unstructured ones have. Do you think there is something fundamental there? Structure and elegance can often add more problems than they take away. Just trying to maintain connectivity in a random graph can be hard. CS academics can suffer from "math envy:" e.g. building trees, error-correcting codes. Structured systems tend to have more cascading effects.
What should academics work on? Something that does "a thing." Measure success by utility.
What are you most proud of (over the past five years)? Getting it to work. Raising money in scary legal climates.
What about the future? Bandwidth acceleration product. Become what Amazon is for books now.

Audience Q&A:
What would you change with the BT protocol if you could? Choked/unchoked is only a single bit and this can result in an unfortunate corner condition. Pipelining (repeatedly sending pieces between nodes) could be improved.
P2P search? Centralized indexing works pretty well.
Denial of service on BT? We just inherit from HTTP security. Says that content can't be poisoned due to hash checks.
How will streaming work? Preferential pieces work directly against rarest first. "Television model" (watching at a specific time) is an out-of-date model.
People don't have any incentive to seed: is this a problem? No.
How do you test changes to BT (testbeds)? Just scribbles on paper.
BT is not network-friendly. Is this a problem? This is what the network is for: to be used. We've added auto-backoff too.
Are local ISPs upset about b/w usage? Not a lot, especially b/c there's a lot of competition among ISPs, so people will switch.
If the RIAA hired your evil twin to bring down BT, could you? I'm not a security guru.
Could tit-for-tat be improved? Yes, it could be made tighter.
What are the problems with DRM? End users can always break it b/c they have complete control over the system.
How would watermarking help?
It would allow stolen content to be traced back to the original people who broke it and penalize them.

Session 1: Attacks and Defenses (Chair: Roger Wattenhofer)

Towards Scalable and Robust Overlay Networks
Baruch Awerbuch (Johns Hopkins University); Christian Scheideler (Technische Universitat Munchen)
- Problem: we want to become robust against insider and outsider attacks (members and non-members of the overlay) -- i.e., some attacker taking over the system.
- Focus: join and leave attacks where not all peers are honest and reliable. In particular, we want to make "abuse" of the join and leave protocols hard.
- Adversary has complete knowledge of the system: is it possible to make sure the "honest" peers are in the majority for all time?
- Their previous technique: force all peers to have truncated lifetimes.
- Here they resolve a case of their previous work where the adversary can ask honest nodes to leave the system.
- Perhaps useful in some future context. Interesting to think about this in terms of Bram Cohen's talk: certainly makes this seem like a pretty obscure problem (and, most likely, quite a bit of this work).

LIP: A Lifetime and Popularity Based Ranking Approach to Filter out Fake Files in P2P File Sharing Systems
Qinyuan Feng, Yafei Dai (Peking University)
They look at cheating in the Maze file sharing system. Maze looks like Napster; it uses a centralized server for indexing and identification. Unlike Napster, users get social status based on points (plus for uploads, minus for downloads). It has two million accounts. The problem is that users change file names and send bogus content to other users to game the system. They use logs on the server to detect fake files that have been injected into this popular network.

Verme: Worm Containment in Peer-to-Peer Overlays
Filipe Freitas, Rodrigo Rodrigues, Carlos Ribeiro, Paulo Ferreira, Luis Rodrigues (INESC-ID / Instituto Superior Técnico and FCUL)
Focus: p2p-assisted worms.
Hard to detect with traditional techniques b/c their communication paths are unusual. Problematic because we've made communication in p2p so efficient! Approach: worms will probably only affect specific versions of a p2p system. Thus, worms are unlikely to propagate to other versions (e.g. ones running Windows 98 vs Mac OS X) (distinct versions are called islands). Set (make non-random) a few of the lower order bits of node IDs to isolate spans of the ID space. Also do something to the fingers / neighbor tables to keep forwarding within the same span. This is probably a real problem, but their solution seems pretty complicated. In Q&A: Microsoft Vista will come out with many different versions (e.g. 256) that have been compiled in different ways (or, from what I understood, this would at least create a situation where exploits on one version would be unlikely to work on another).

Free-riding in BitTorrent Networks with the Large View Exploit
Michael Sirivianos, Jong Han Park, Rex Chen, Xiaowei Yang (University of California, Irvine)
They propose a method to download more content than is uploaded (free-ride). They performed experiments by modifying public code and running in a public network. Might be interesting to see how they ran their experiments. Not sure of the relation to the NSDI BitTyrant paper (which was under submission while this paper was under review).

Session 2: Measurement Studies (Chair: Rodrigo Rodrigues)

Actively Monitoring Peers in KAD
Moritz Steiner, Ernst W. Biersack, Taoufik Ennajjary (Institut Eurécom, Sophia-Antipolis)
Goal: learn about Kademlia. Kad clients have a permanent ID (independent of IP addresses). Kad does iterative routing. Challenges in doing a full crawl of the Kad network:
- unbalanced routing tree: many close nodes, few distant ones
- speed of crawling is crucial: need to crawl quickly b/c of high churn
- network instability: sometimes nodes are not visible from different viewpoints.
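At its core, a full crawl like this is a breadth-first traversal over routing-table responses: ask every newly learned peer who it knows, until no new peers turn up. A minimal sketch (the `find_node` callback is a stand-in for the real iterative FIND_NODE RPC, which the authors issue quickly and in parallel to cope with churn):

```python
from collections import deque

def crawl(bootstrap, find_node):
    """Breadth-first crawl of a Kad-like network.

    find_node(peer) returns the contacts that peer's routing table
    reveals, or None if the peer doesn't respond (churn, NAT, ...).
    Returns (all peers seen, peers that actually responded).
    """
    seen = {bootstrap}
    queue = deque([bootstrap])
    responsive = set()
    while queue:
        peer = queue.popleft()
        contacts = find_node(peer)
        if contacts is None:
            continue  # seen in someone's table, but unresponsive
        responsive.add(peer)
        for contact in contacts:
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return seen, responsive
```

This also shows where the "half the nodes responded" figure comes from: `seen` is the 3-4.3 million peers that appear in routing tables, `responsive` the 1.5-2 million that answered.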
In each 8 minute crawl, about half the nodes responded (3-4.3 million peers seen, 1.5-2 million peers responding). Diurnal peaks -- whose time do they correspond to: Western Europe and China? Lots of interesting data!
Q&A: Where is the small core of nodes that tend to stay up? Doesn't appear to be limited to a particular locale.

Geographical Statistics and Characteristics of P2P Query Strings
Adam Shaked Gish (Skyrider, Inc.); Yuval Shavitt, Tomer Tankel (Tel-Aviv University)
No notes :(

Understanding the Dynamic of Peer-to-Peer Systems
Jing Tian, Yafei Dai (Peking University)
Interesting side of this work: makes a comparison between learning about a network through crawling vs. what can be learned in a network like Maze, which contacts a central server. They show the results can be significantly different. He hasn't quite said this, but an assumption appears to be that the central server is more accurate.

Session 3: Facilities and Techniques for P2P Systems (Chair: M. Frans Kaashoek)

Scaling Peer-to-Peer Games in Low-Bandwidth Environments
Jeffrey Pang (Carnegie Mellon University); Frank Uyeda (University of California, San Diego); Jacob R. Lorch (Microsoft Research)
First person shooter: objects are distributed among players in the game. Each peer has his local view. Game changes quickly and in real-time. Problem: this state increases quadratically with the number of peers. Solution: Donnybrook (what is the meaning of the name?). Implemented within Quake3. Three principles:
- total attention of a user scales linearly (and we tend to focus on just a few items at once) - technique: focus set
- sometimes better to sacrifice accuracy to make games more smooth - technique: guideable AI
- interaction should be prioritized: if I shoot someone, he's more relevant than someone I didn't
Who goes into the Focus Set? They try to determine how much a player is focused on every object: using proximity, aim, recency.
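Mechanically, that scoring can be thought of as combining the three signals into one attention score per object and keeping the top-k as the focus set. A sketch of the idea -- the weights and decay functions below are my guesses for illustration, not Donnybrook's actual formulas:

```python
import heapq
import math

def attention_score(dist, aim_angle, secs_since_interaction,
                    w_prox=1.0, w_aim=1.0, w_recency=1.0):
    """Combine proximity, aim, and recency into one attention value.
    All weights/decays here are illustrative assumptions."""
    proximity = 1.0 / (1.0 + dist)                      # closer -> higher
    # Only objects roughly in front of the player count as "aimed at."
    aim = math.cos(aim_angle) if abs(aim_angle) < math.pi / 2 else 0.0
    recency = math.exp(-secs_since_interaction / 5.0)   # recent fight -> higher
    return w_prox * proximity + w_aim * aim + w_recency * recency

def focus_set(candidates, k=5):
    """candidates: iterable of (obj_id, dist, aim_angle, secs_since_interaction).
    Returns the k object ids with the highest attention scores."""
    scored = [(attention_score(d, a, s), oid) for oid, d, a, s in candidates]
    return [oid for _, oid in heapq.nlargest(k, scored)]
```

Peers in the focus set get frequent, precise updates; everyone else gets the cheap, infrequent updates that the guideable AI smooths over -- which is how per-peer bandwidth stops growing quadratically.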
Performed a user study that measured improvement in "fun" for each user :-) Nice set of work; they estimate scalability, but haven't tried it out at large scale (but really quite a bit of work for a workshop paper). Great use of videos in talk to get the point across; actual user study.

HPTP: Relieving the Tension between ISPs and P2P
Guobin Shen (Microsoft Research Asia); Ye Wang (Microsoft Research Asia and Tsinghua University); Yongqiang Xiong (Microsoft Research Asia); Ben Y. Zhao (University of California, Santa Barbara); Zhi-Li Zhang (University of Minnesota at Twin Cities)
Problem: P2P traffic is costing ISPs a lot of money, but they aren't really benefiting from the traffic themselves.
Approaches:
- reactionary: rate-limit, block traffic
- constructive: build caches (and previous studies have shown that p2p traffic is highly cacheable)
Challenges:
- lots of protocols + updates
- legal issues
- spend on bandwidth
Proposal: get the benefits of caching without the expense and legal issues. Errr, missed how they actually did this.

Don't Give Up on Distributed File Systems
Jeremy Stribling, Emil Sit, M. Frans Kaashoek (MIT); Jinyang Li (New York University); Robert Morris (MIT)
Lots of systems have reinvented the wheel for pushing data around distributed file systems. Wouldn't it be great if we could build this once and lots of apps could use it (like Coral CDN, scripts on PL)?
Needs for distributed apps:
- control over consistency and delays
- efficient data sharing between peers
Problem: existing systems focus too much on transparency (hiding faults). Much more of a wide-area, fault-exposing approach than e.g. NFS, AFS. E.g., I'd like to get the most recent version of the file; try for X seconds, and if you can't get it, just use the local copy.
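A "best-effort freshness" read like that could look roughly like the following from the application's point of view (a sketch with hypothetical callables, not a real API -- they had no implementation yet):

```python
import concurrent.futures

def read_with_cue(fetch_latest, local_copy, max_wait_secs=2.0):
    """Try to fetch the freshest replica for up to max_wait_secs;
    on timeout or network failure, fall back to the cached local copy.

    fetch_latest and local_copy are hypothetical callables standing in
    for whatever the file system would do under such a "semantic cue."
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = pool.submit(fetch_latest)
    try:
        return fut.result(timeout=max_wait_secs)
    except (concurrent.futures.TimeoutError, OSError):
        return local_copy()  # fault exposed, app chose the fallback
    finally:
        pool.shutdown(wait=False)
```

The point of the cue is exactly that this policy (how long to wait, what to fall back to) is chosen per-file by the application rather than hidden behind NFS/AFS-style transparency.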
Proposal: WheelFS
Contributions:
- give apps control with "semantic cues"
- read globally, write locally
They don't have any implementation, but they seem to have put a good deal of thought into a new, more loose model for distributed file systems. Seems like cool work!
[There is a lot of chatter in German around me during the talks :( ]

Finding Content in File-Sharing Networks When You Can't Even Spell
Matei A. Zaharia (University of Waterloo); Amit Chandel, Stefan Saroiu (University of Toronto); Srinivasan Keshav (University of Waterloo)
Applied 100-year-old work (SOUNDEX) plus some smartness (e.g., allowing off-by-one errors, as when DNA strings are compared) to permit similarly named content / queries to match up. This is interesting, but I'm getting sleepy :(

Tuesday, February 27

Panel Discussion: Persistent Identity in Peer-to-Peer Systems -- Anonymization vs. Accountability
Edward Bill, Forensic Examiner at FBI Seattle
Lance Cottrell, Founder and Chief Scientist of Anonymizer, Inc.
Roger Dingledine, Owner and Founder of Moria Research Labs
Daniel Simon, Senior Researcher at Microsoft Research
(Moderator: John Douceur)
Dan: Why are DoS attacks not a problem on the telephone network? a: accountability (not a technical answer!) Interesting point: stick a system without accountability onto a system with accountability, and you get an unaccountable system. E.g., VOIP is turning the phone into an unaccountable system (and, according to an example Dan gave, already has).
Interesting discussion. Ed focused on the need to find the IP address of e.g. child porn "users" (appropriate word?). Much of the discussion, however, focused on the positive aspects of providing anonymity, e.g. letting people in China read the NY Times. There wasn't really any attempt to bring these two sides together (and it's not clear at all if this is in any way possible).

Session 4: Routing and Peer Selection (Chair: Adriana Iamnitchi)
[Didn't take notes on the other two papers in my session.]
Wired Geometric Routing
Jonathan Ledlie, Michael Mitzenmacher, Margo Seltzer (Harvard University); Peter Pietzuch (Imperial College London)
Feedback:
- Mike Freedman: some DHTs already have Vivaldi as part of them, can't you build off of that? Yes, but here we're proposing using purely the network coordinate space as a basis for network-aware applications.

Session 5: Protocol Modeling and Analysis (Chair: Christian Scheideler)

Multilayer Gnutella P2P Resource Sharing with an Efficient Flexible Multi-Keyword Search Facility
Sebnem Oeztunali, Steffen Rusitschka, Alan Southall (Siemens AG)
Didn't quite get this: seems very related to CMU work from '02 on finding similar topics in Gnutella.

On the Role of Helpers in Peer-to-Peer File Download Systems: Design, Analysis and Simulation
Jiajun Wang, Chuohao Yeo, Vinod Prabhakaran, Kannan Ramchandran (University of California, Berkeley)
In a BT network, the total exchange capacity is limited by users' upload capacity. Is it possible to relieve this bottleneck by adding helpers (e.g. nodes that just upload for free)? They ignore the incentives for doing this, and they also ignore the fact that ISPs count on not everybody being turned on at any given time -- i.e. that there are other bottlenecks such as aggregate ISP b/w (which costs ISPs money, so they want to minimize it). They use a model of BT proposed in a SIGCOMM '04 paper. I have the feeling this model has been shown to have problems.... Seems like a poor substitute for a cache.

Evaluating DHT Implementations in Complex Environments by Network Emulator
Daishi Kato, Toshiyuki Kamiya (NEC)
Motivation: how do you evaluate the changes you make to DHTs? They developed a multi-machine emulator to evaluate DHTs and changes to them. They discuss the relation to Macedon in their paper.

Efficient Private Techniques for Verifying Social Proximity
Michael J. Freedman, Antonio Nicolosi (New York University and Stanford University)
How can social proximity be demonstrated?
(They focus on two-hop chains.) The work is inspired by discovering chains of friends in spam detection. Look at the intersection of the sender's friends with the set of friends of the receiver. (Funny talk which included the problem of avoiding the ex-girlfriend attack!) The problem is (I think) that people can pretend to have friends they don't really have. They use a hash/PKI solution to generate forward trust and backward authorization.

Session 6: VoIP and VoD (Chair: Antony Rowstron)

A Measurement-based Study of the Skype Peer-to-Peer VoIP Performance
Haiyong Xie, Yang Richard Yang (Yale University)
Skype: ordinary nodes and supernodes; nodes become supernodes under certain conditions (b/w, uptime). SNs form an overlay; regular nodes connect only to them. SNs provide NAT traversal and routing.
- They infer the Skype network via crawling.
- Why does the number of SNs in the system at a particular point follow a regular pattern? Weird. [Reading this over, I don't think it was a diurnal pattern -- there was regularity where there shouldn't have been.]
- They measure the quality of end-to-end Skype connections (using what appears to be an accepted model).
- Their use of communication (e.g. jitter measurement) between the nearest PlanetLab nodes as a proxy for that between Skype nodes seems really sketchy.
Observation: placing supernode-type services in stub ASes may be a bad idea (they have to pay for traffic they send out, and seem to be the ones that most actively shape traffic [not sure if he drew this conclusion]).

A Measurement Study of a Peer-to-Peer Video-on-Demand System
Bin Cheng (Huazhong University of Science and Technology); Xuezheng Liu, Zheng Zhang (Microsoft Research Asia); Hai Jin (Huazhong University of Science and Technology)
Problem: supplying infrastructure for today's centralized VoD is costly. They did a measurement study of GridCast. GC attempts to get as much content as possible from peers, but falls back to a centralized server if content is unavailable.
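That peer-first, server-as-fallback fetch policy is simple to state in code. A sketch (names and interfaces are illustrative, not GridCast's actual ones):

```python
def fetch_chunk(chunk_id, peer_stores, server_fetch):
    """Peer-assisted VoD fetch: try every known peer that might hold
    the chunk; only fall back to the central server when no peer can
    supply it. Returns (data, source) so we can account for how much
    load the peers actually absorbed."""
    for store in peer_stores:
        if chunk_id in store:
            return store[chunk_id], "peer"
    return server_fetch(chunk_id), "server"
```

Tallying the "server" hits across all fetches is exactly the kind of accounting behind the measurement results: server bandwidth is whatever the peer swarm failed to cover.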
Lots of interesting data from this system.

Peer-Assisted VoD: Making Internet Video Distribution Cheap
Cheng Huang, Jin Li (Microsoft Research); Keith W. Ross (Polytechnic University)
Questions they try to address here:
- insight into the performance limits of VoD
- given the current user base, if you deployed system X, how much (money) would you save?
Key performance metric: how much bandwidth will you still need (from the server/CDN/whatever) if we use various peer sharing strategies? They appear to have worked out the effect of different policies (mainly prefetching policies) via analysis. OK, they do have some real-world results. Nice work!

One observation that came out of the two VoD papers was that a good deal (maybe 40%) of server b/w was consumed by the long tail of unpopular videos -- they were hard to cache and peers that had this content had gone offline.

The End.