IPTPS 2007 TRIP REPORT
2/26-2/27 2007, Bellevue, WA
Jonathan Ledlie

Note: These are my notes and my opinions. They are not meant to be a complete summary of all of the talks. If you strongly disagree with something here and want to do something about it, please send me an email and I will (most likely) add your response here.

Overview: Quite a bit of interesting work, some fairly derivative, solving niche- (or non-) problems. Lots of focus on BitTorrent. Because there are a lot more real systems out there compared to when the workshop first started, quite a few of the papers were substantial measurement studies. Many of them showed similar results, but just the sheer magnitude of real data was surprising (and refreshing). The VoD work is interesting and my money would be on it becoming hotter over the next couple of years. Because there was no wireless in the room, most talks had a good number of questions -- this aspect was really nice. Not too surprisingly, there were fewer and fewer people as the second day of talks rolled on, leading to fewer post-talk questions and less discussion. It's nice to see how this community has matured over time.

Best talks / work (IMO):
- Monitoring Peers in KAD
- Scaling Peer-to-Peer Games in Low-Bandwidth Environments
- Don't Give Up on Distributed File Systems
- Peer-Assisted VoD

Keynote Address: Q&A with Bram Cohen
Bram Cohen, Founder and CEO of BitTorrent, Inc. (Interviewer: John Douceur)

JD Q&A:
BT grew out of MojoNation: swarming seemed like the most promising area.
How does BT work (succeed) as a company? (1) low pricing on content and (2) content-distribution tool.
Will DRM work? No, except maybe for games (e.g. maybe "repeat use" software). Watermarking might become feasible in the future (i.e., catching cheaters, as opposed to preventing use).
Streaming? Yes, they are currently working on this. Just need to change the piece selection algorithm.
What are the remaining major technical problems? Nothing specific.
Just getting all the details right is hard.
Structured overlays have never really taken off; unstructured ones have. Do you think there is something fundamental there? Structure and elegance can often add more problems than they take away. Just trying to maintain connectivity in a random graph can be hard. CS academics can suffer from "math envy:" e.g. building trees, error-correcting codes. Structured systems tend to have more cascading effects.
What should academics work on? Something that does "a thing." Measure success by utility.
What are you most proud of (over the past five years)? Getting it to work. Raising money in scary legal climates.
What about the future? Bandwidth acceleration product. Become what Amazon is for books now.

Audience Q&A:
What would you change with the BT protocol if you could? Choked/unchoked is only a single bit and this can result in an unfortunate corner condition. Pipelining (repeatedly sending pieces between nodes) could be improved.
P2P search? Centralized indexing works pretty well.
Denial of service on BT? We just inherit from HTTP security. Says that content can't be poisoned due to hash checks.
How will streaming work? Preferential pieces work directly against rarest first. "Television model" (watching at a specific time) is an out-of-date model.
People don't have any incentive to seed: is this a problem? No.
How do you test changes to BT (testbeds)? Just scribbles on paper.
BT is not network-friendly. Is this a problem? This is what the network is for: to be used. We've added auto-backoff too.
Are local ISPs upset about b/w usage? Not a lot, especially b/c there's a lot of competition among ISPs, so people will switch.
If the RIAA hired your evil twin to bring down BT, could you? I'm not a security guru.
Could tit-for-tat be improved? Yes, it could be made tighter.
What are the problems with DRM? End users can always break it b/c they have complete control over the system.
How would watermarking help?
It would allow stolen content to be traced back to the original people who broke it and penalize them.

Session 1: Attacks and Defenses (Chair: Roger Wattenhofer)

Towards Scalable and Robust Overlay Networks
Baruch Awerbuch (Johns Hopkins University); Christian Scheideler (Technische Universitat Munchen)
- Problem: we want to become robust against insider and outsider attacks (members and non-members of the overlay) -- i.e., some attacker taking over the system.
- Focus: join and leave attacks where not all peers are honest and reliable. In particular, we want to make "abuse" of the join and leave protocols hard.
- Adversary has complete knowledge of the system: is it possible to make sure the "honest" peers are in the majority for all time?
- Their previous technique: force all peers to have truncated lifetimes.
- Here they resolve a case of their previous work where the adversary can ask honest nodes to leave the system.
- Perhaps useful in some future context. Interesting to think about this in terms of Bram Cohen's talk: certainly makes this seem like a pretty obscure problem (and, most likely, quite a bit of this work).

LIP: A Lifetime and Popularity Based Ranking Approach to Filter out Fake Files in P2P File Sharing Systems
Qinyuan Feng, Yafei Dai (Peking University)
They look at cheating in the Maze file sharing system. Maze looks like Napster; it uses a centralized server for indexing and identification. Unlike Napster, users get social status based on points (plus for uploads, minus for downloads). It has two million accounts. The problem is that users change file names and send bogus content to other users to game the system. They use logs on the server to detect fake files that have been injected into this popular network.

Verme: Worm Containment in Peer-to-Peer Overlays
Filipe Freitas, Rodrigo Rodrigues, Carlos Ribeiro, Paulo Ferreira, Luis Rodrigues (INESC-ID / Instituto Superior Técnico and FCUL)
Focus: p2p-assisted worms.
Hard to detect with traditional techniques b/c their communication paths are unusual. Problematic because we've made communication in p2p so efficient! Approach: worms will probably only affect specific versions of a p2p system. Thus, worms are unlikely to propagate to other versions (e.g. ones running Windows 98 vs Mac OS X) (distinct versions are called islands). Set (make non-random) a few of the lower order bits of node IDs to isolate spans of the ID space. Also do something to the fingers / neighbor tables to keep forwarding within the same span. This is probably a real problem, but their solution seems pretty complicated. In Q&A: Microsoft Vista will come out with many different versions (e.g. 256) that have been compiled in different ways (or, from what I understood, this would at least create a situation where exploits on one version would be unlikely to work on another).

Free-riding in BitTorrent Networks with the Large View Exploit
Michael Sirivianos, Jong Han Park, Rex Chen, Xiaowei Yang (University of California, Irvine)
They propose a method to download more content than is uploaded (free-ride). They performed experiments by modifying public code and running in a public network. Might be interesting to see how they ran their experiments. Not sure of the relation to the NSDI BitTyrant paper (which was under submission while this paper was under review).

Session 2: Measurement Studies (Chair: Rodrigo Rodrigues)

Actively Monitoring Peers in KAD
Moritz Steiner, Ernst W. Biersack, Taoufik Ennajjary (Institut Eurécom, Sophia-Antipolis)
Goal: learn about Kademlia. Kad clients have a permanent ID (independent of IP addresses). Kad does iterative routing. Challenges in doing a full crawl of the Kad network:
- unbalanced routing tree: many close nodes, few distant ones
- speed of crawling is crucial: need to crawl quickly b/c of high churn
- network instability: sometimes nodes are not visible from different viewpoints.
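At its core, a full crawl like this is a breadth-first traversal over routing-table responses: ask every newly learned peer who it knows, until no new peers turn up. A minimal sketch (the `find_node` callback is a stand-in for the real iterative FIND_NODE RPC, which the authors issue quickly and in parallel to cope with churn):

```python
from collections import deque

def crawl(bootstrap, find_node):
    """Breadth-first crawl of a Kad-like network.

    find_node(peer) returns the contacts that peer's routing table
    reveals, or None if the peer doesn't respond (churn, NAT, ...).
    Returns (all peers seen, peers that actually responded).
    """
    seen = {bootstrap}
    queue = deque([bootstrap])
    responsive = set()
    while queue:
        peer = queue.popleft()
        contacts = find_node(peer)
        if contacts is None:
            continue  # seen in someone's table, but unresponsive
        responsive.add(peer)
        for contact in contacts:
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return seen, responsive
```

This also shows where the "half the nodes responded" figure comes from: `seen` is the 3-4.3 million peers that appear in routing tables, `responsive` the 1.5-2 million that answered.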
In each 8 minute crawl, about half the nodes responded (3-4.3 million peers seen, 1.5-2 million peers responding). Diurnal peaks -- whose time do they correspond to: Western Europe and China? Lots of interesting data!
Q&A: Where is the small core of nodes that tend to stay up? Doesn't appear to be limited to a particular locale.

Geographical Statistics and Characteristics of P2P Query Strings
Adam Shaked Gish (Skyrider, Inc.); Yuval Shavitt, Tomer Tankel (Tel-Aviv University)
No notes :(

Understanding the Dynamic of Peer-to-Peer Systems
Jing Tian, Yafei Dai (Peking University)
Interesting side of this work: makes a comparison between learning about a network through crawling vs. what can be learned in a network like Maze, which contacts a central server. They show the results can be significantly different. He hasn't quite said this, but an assumption appears to be that the central server is more accurate.

Session 3: Facilities and Techniques for P2P Systems (Chair: M. Frans Kaashoek)

Scaling Peer-to-Peer Games in Low-Bandwidth Environments
Jeffrey Pang (Carnegie Mellon University); Frank Uyeda (University of California, San Diego); Jacob R. Lorch (Microsoft Research)
First person shooter: objects are distributed among players in the game. Each peer has his local view. Game changes quickly and in real-time. Problem: this state increases quadratically with the number of peers. Solution: Donnybrook (what is the meaning of the name?). Implemented within Quake3. Three principles:
- total attention of a user scales linearly (and we tend to focus on just a few items at once) - technique: focus set
- sometimes better to sacrifice accuracy to make games more smooth - technique: guideable AI
- interaction should be prioritized: if I shoot someone, he's more relevant than someone I didn't
Who goes into the Focus Set? They try to determine how much a player is focused on every object: using proximity, aim, recency.
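Mechanically, that scoring can be thought of as combining the three signals into one attention score per object and keeping the top-k as the focus set. A sketch of the idea -- the weights and decay functions below are my guesses for illustration, not Donnybrook's actual formulas:

```python
import heapq
import math

def attention_score(dist, aim_angle, secs_since_interaction,
                    w_prox=1.0, w_aim=1.0, w_recency=1.0):
    """Combine proximity, aim, and recency into one attention value.
    All weights/decays here are illustrative assumptions."""
    proximity = 1.0 / (1.0 + dist)                      # closer -> higher
    # Only objects roughly in front of the player count as "aimed at."
    aim = math.cos(aim_angle) if abs(aim_angle) < math.pi / 2 else 0.0
    recency = math.exp(-secs_since_interaction / 5.0)   # recent fight -> higher
    return w_prox * proximity + w_aim * aim + w_recency * recency

def focus_set(candidates, k=5):
    """candidates: iterable of (obj_id, dist, aim_angle, secs_since_interaction).
    Returns the k object ids with the highest attention scores."""
    scored = [(attention_score(d, a, s), oid) for oid, d, a, s in candidates]
    return [oid for _, oid in heapq.nlargest(k, scored)]
```

Peers in the focus set get frequent, precise updates; everyone else gets the cheap, infrequent updates that the guideable AI smooths over -- which is how per-peer bandwidth stops growing quadratically.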
Performed a user study that measured improvement in "fun" for each user :-) Nice set of work; they estimate scalability, but haven't tried it out at large scale (but really quite a bit of work for a workshop paper). Great use of videos in talk to get the point across; actual user study.

HPTP: Relieving the Tension between ISPs and P2P
Guobin Shen (Microsoft Research Asia); Ye Wang (Microsoft Research Asia and Tsinghua University); Yongqiang Xiong (Microsoft Research Asia); Ben Y. Zhao (University of California, Santa Barbara); Zhi-Li Zhang (University of Minnesota at Twin Cities)
Problem: P2P traffic is costing ISPs a lot of money, but they aren't really benefiting from the traffic themselves.
Approaches:
- reactionary: rate-limit, block traffic
- constructive: build caches (and previous studies have shown that p2p traffic is highly cacheable)
Challenges:
- lots of protocols + updates
- legal issues
- spend on bandwidth
Proposal: get the benefits of caching without the expense and legal issues. Errr, missed how they actually did this.

Don't Give Up on Distributed File Systems
Jeremy Stribling, Emil Sit, M. Frans Kaashoek (MIT); Jinyang Li (New York University); Robert Morris (MIT)
Lots of systems have reinvented the wheel for pushing data around distributed file systems. Wouldn't it be great if we could build this once and lots of apps could use it (like Coral CDN, scripts on PL)?
Needs for distributed apps:
- control over consistency and delays
- efficient data sharing between peers
Problem: existing systems focus too much on transparency (hiding faults). Much more of a wide-area, fault-exposing approach than e.g. NFS, AFS. E.g., I'd like to get the most recent version of the file; try for X seconds, and if you can't get it, just use the local copy.
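A "best-effort freshness" read like that could look roughly like the following from the application's point of view (a sketch with hypothetical callables, not a real API -- they had no implementation yet):

```python
import concurrent.futures

def read_with_cue(fetch_latest, local_copy, max_wait_secs=2.0):
    """Try to fetch the freshest replica for up to max_wait_secs;
    on timeout or network failure, fall back to the cached local copy.

    fetch_latest and local_copy are hypothetical callables standing in
    for whatever the file system would do under such a "semantic cue."
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = pool.submit(fetch_latest)
    try:
        return fut.result(timeout=max_wait_secs)
    except (concurrent.futures.TimeoutError, OSError):
        return local_copy()  # fault exposed, app chose the fallback
    finally:
        pool.shutdown(wait=False)
```

The point of the cue is exactly that this policy (how long to wait, what to fall back to) is chosen per-file by the application rather than hidden behind NFS/AFS-style transparency.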
Proposal: WheelFS
Contributions:
- give apps control with "semantic cues"
- read globally, write locally
They don't have any implementation, but they seem to have put a good deal of thought into a new, more loose model for distributed file systems. Seems like cool work!
[There is a lot of chatter in German around me during the talks :( ]

Finding Content in File-Sharing Networks When You Can't Even Spell
Matei A. Zaharia (University of Waterloo); Amit Chandel, Stefan Saroiu (University of Toronto); Srinivasan Keshav (University of Waterloo)
Applied 100-year-old work (SOUNDEX) plus some smartness (e.g., allowing off-by-one errors, as when DNA strings are compared) to permit similarly named content / queries to match up. This is interesting, but I'm getting sleepy :(

Tuesday, February 27

Panel Discussion: Persistent Identity in Peer-to-Peer Systems -- Anonymization vs. Accountability
Edward Bill, Forensic Examiner at FBI Seattle
Lance Cottrell, Founder and Chief Scientist of Anonymizer, Inc.
Roger Dingledine, Owner and Founder of Moria Research Labs
Daniel Simon, Senior Researcher at Microsoft Research
(Moderator: John Douceur)
Dan: Why are DoS attacks not a problem on the telephone network? a: accountability (not a technical answer!) Interesting point: stick a system without accountability onto a system with accountability, and you get an unaccountable system. E.g., VOIP is turning the phone into an unaccountable system (and, according to an example Dan gave, already has).
Interesting discussion. Ed focused on the need to find the IP address of e.g. child porn "users" (appropriate word?). Much of the discussion, however, focused on the positive aspects of providing anonymity, e.g. letting people in China read the NY Times. There wasn't really any attempt to bring these two sides together (and it's not clear at all if this is in any way possible).

Session 4: Routing and Peer Selection (Chair: Adriana Iamnitchi)
[Didn't take notes on the other two papers in my session.]
Wired Geometric Routing
Jonathan Ledlie, Michael Mitzenmacher, Margo Seltzer (Harvard University); Peter Pietzuch (Imperial College London)
Feedback:
- Mike Freedman: some DHTs already have Vivaldi as part of them, can't you build off of that? Yes, but here we're proposing using purely the network coordinate space as a basis for network-aware applications.

Session 5: Protocol Modeling and Analysis (Chair: Christian Scheideler)

Multilayer Gnutella P2P Resource Sharing with an Efficient Flexible Multi-Keyword Search Facility
Sebnem Oeztunali, Steffen Rusitschka, Alan Southall (Siemens AG)
Didn't quite get this: seems very related to CMU work from '02 on finding similar topics in Gnutella.

On the Role of Helpers in Peer-to-Peer File Download Systems: Design, Analysis and Simulation
Jiajun Wang, Chuohao Yeo, Vinod Prabhakaran, Kannan Ramchandran (University of California, Berkeley)
In a BT network, the total exchange capacity is limited by users' upload capacity. Is it possible to relieve this bottleneck by adding helpers (e.g. nodes that just upload for free)? They ignore the incentives for doing this, and they also ignore the fact that ISPs count on not everybody being turned on at any given time -- i.e. that there are other bottlenecks such as aggregate ISP b/w (which costs ISPs money, so they want to minimize it). They use a model of BT proposed in a SIGCOMM '04 paper. I have the feeling this model has been shown to have problems.... Seems like a poor substitute for a cache.

Evaluating DHT Implementations in Complex Environments by Network Emulator
Daishi Kato, Toshiyuki Kamiya (NEC)
Motivation: how do you evaluate the changes you make to DHTs? They developed a multi-machine emulator to evaluate DHTs and changes to them. They discuss the relation to Macedon in their paper.

Efficient Private Techniques for Verifying Social Proximity
Michael J. Freedman, Antonio Nicolosi (New York University and Stanford University)
How can social proximity be demonstrated?
(They focus on two-hop chains.) The work is inspired by discovering chains of friends in spam detection. Look at the intersection of the sender's friends with the set of friends of the receiver. (Funny talk which included the problem of avoiding the ex-girlfriend attack!) The problem is (I think) that people can pretend to have friends they don't really have. They use a hash/PKI solution to generate forward trust and backward authorization.

Session 6: VoIP and VoD (Chair: Antony Rowstron)

A Measurement-based Study of the Skype Peer-to-Peer VoIP Performance
Haiyong Xie, Yang Richard Yang (Yale University)
Skype: ordinary nodes and supernodes; nodes become supernodes under certain conditions (b/w, uptime). SNs form an overlay; regular nodes connect only to them. SNs provide NAT traversal and routing.
- They infer the Skype network via crawling.
- Why does the number of SNs in the system at a particular point follow a regular pattern? Weird. [Reading this over, I don't think it was a diurnal pattern -- there was regularity where there shouldn't have been.]
- They measure the quality of end-to-end Skype connections (using what appears to be an accepted model).
- Their use of communication (e.g. jitter measurement) between the nearest PlanetLab nodes as a proxy for that between Skype nodes seems really sketchy.
Observation: placing supernode-type services in stub ASes may be a bad idea (they have to pay for traffic they send out, and seem to be the ones that most actively shape traffic [not sure if he drew this conclusion]).

A Measurement Study of a Peer-to-Peer Video-on-Demand System
Bin Cheng (Huazhong University of Science and Technology); Xuezheng Liu, Zheng Zhang (Microsoft Research Asia); Hai Jin (Huazhong University of Science and Technology)
Problem: supplying infrastructure for today's centralized VoD is costly. They did a measurement study of GridCast. GC attempts to get as much content as possible from peers, but falls back to a centralized server if content is unavailable.
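That peer-first, server-as-fallback fetch policy is simple to state in code. A sketch (names and interfaces are illustrative, not GridCast's actual ones):

```python
def fetch_chunk(chunk_id, peer_stores, server_fetch):
    """Peer-assisted VoD fetch: try every known peer that might hold
    the chunk; only fall back to the central server when no peer can
    supply it. Returns (data, source) so we can account for how much
    load the peers actually absorbed."""
    for store in peer_stores:
        if chunk_id in store:
            return store[chunk_id], "peer"
    return server_fetch(chunk_id), "server"
```

Tallying the "server" hits across all fetches is exactly the kind of accounting behind the measurement results: server bandwidth is whatever the peer swarm failed to cover.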
Lots of interesting data from this system.

Peer-Assisted VoD: Making Internet Video Distribution Cheap
Cheng Huang, Jin Li (Microsoft Research); Keith W. Ross (Polytechnic University)
Questions they try to address here:
- insight into the performance limits of VoD
- given the current user base, if you deployed system X, how much (money) would you save?
Key performance metric: how much bandwidth will you still need (from the server/CDN/whatever) if we use various peer sharing strategies? They appear to have worked out the effect of different policies (mainly prefetching policies) via analysis. OK, they do have some real-world results. Nice work!

One observation that came out of the two VoD papers was that a good deal (maybe 40%) of server b/w was consumed by the long tail of unpopular videos -- they were hard to cache and peers that had this content had gone offline.

The End.