ACM SIGACT News Distributed Computing Column

Archive of columns edited by Idit Keidar and Sergio Rajsbaum 2000-2013

The column is currently edited by Jennifer L. Welch

  New columns edited by Jennifer L. Welch  
  Archive 2007-2013  
  Archive 2000-2007  

This is a mirror of an archive of Distributed Computing Columns published since December 2000 in SIGACT News, ACM's quarterly publication (electronic and printed).

Acknowledgment

Sergio Rajsbaum edited this column for seven years and established it as a relevant and popular venue. This archive contains columns edited by Sergio (2000-2007) and by me (2007-2013). The current editor is Jennifer L. Welch. Most of the issues consist of contributions by guest authors. The success of this column is thanks to them.

COPYRIGHT

The documents available here are provided as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Permission to make digital or hard copies of part or all of these works for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that new copies bear this notice and the full citation on the first page.

I retain the copyright to my columns, and authors contributing to them retain the copyright of their papers; those can be submitted in more polished forms to refereed conferences and publications. All ACM wants is the right to publish the column in hardcopy and electronic formats (in SIGACT News Online and the ACM Digital Library). I would appreciate being notified, however, of any uses being made of the column.

 

Contents

 

Column Introductions and Tables of Content

Column 51, SIGACT News Volume 44, Number 3, September 2013
Large-Scale Transaction Replication
Pdf

Introduction

Distributed storage systems have now become mainstream, partly due to exponential growth in data volumes. It is common for applications nowadays to access data that is partitioned or sharded across many nodes, and in addition, partially replicated. Large Internet companies further deploy applications that span multiple data centers.

For a few years, it was common to settle for weak consistency in such settings, adhering to the so-called "NoSQL" approach. Indeed, an article that appeared in this column four years ago reported a "fear of synchronization" among chief architects who had built large-scale distributed systems in industry. But in the past couple of years, we have seen this trend reversing. Both academic and industrial projects have recognized the need for strong consistency and so-called ACID transactions in large distributed storage systems. Such strong consistency is the topic of the current column.

Our first contribution, by Dahlia Malkhi and Jean-Philippe Martin, explains the concurrency control mechanism of Google's Spanner system. Our second contribution, by Ittay Eyal, discusses more generally fault-tolerant architectures for distributed ACID transactions. Many thanks to Dahlia, Jean-Philippe, and Ittay for their contributions!

Farewell. On a personal note, this is my last issue as editor of the distributed computing column. I have truly enjoyed performing this role for the past six years. Above all, I am grateful to the many authors who have contributed interesting and enlightening articles on diverse topics. They have made my job both easy and gratifying. Yet after six years, I feel that it is time to step down and bring in a fresh perspective. I leave the column in the capable hands of Jennifer Welch, who will replace me as of the next issue. I wish Jennifer a tenure as editor at least as enjoyable as mine has been.

Column 51 Table of Contents


Column 50, SIGACT News Volume 44, Number 2, June 2013
Distributing Trusted Third Parties, Innovation Prize, and SIROCCO Review
Pdf

Introduction

Trusted third parties (TTPs) play many important roles in electronic commerce. It is often desirable to distribute the role of a TTP, in order to avoid placing too much power in the hands of a single entity. This column features an article by Alptekin Kupcu, surveying approaches to distributing TTPs. Alptekin first surveys standard techniques for doing so, using Byzantine Agreement and secure multi-party computation. He then proceeds to discuss a more efficient approach, where the trust is distributed among multiple autonomous parties that do not communicate among themselves.

This column also pays back an outstanding debt from last year's annual review column, with reports on the Prize for Innovation in Distributed Computing and on SIROCCO 2012, where it was awarded. I open with the announcement of the prize, which was awarded to Roger Wattenhofer in 2012; congratulations Roger! The award statement is followed by our main article, by Alptekin. The column concludes with a review by Heger Arfaoui of SIROCCO 2012.

Many thanks to Alptekin and Heger for their contributions!

Column 50 Table of Contents


Column 49, SIGACT News Volume 44, Number 1, March 2013
Coding for Distributed Storage
Pdf

Introduction

Storage systems nowadays are increasingly distributed. While disk arrays, which are a form of tightly coupled distributed system, have been in use for over two decades, the trend today is to move towards networked, widely dispersed distributed storage composed of loosely coupled nodes. Coding plays an important role in both contexts, as it can help provide fault-tolerance without an excessive storage overhead. Indeed, minimizing storage redundancy for a given level of fault-tolerance was the first requirement considered in the design of codes for distributed storage systems. Over the years, numerous other considerations have come to light, driving the coding community to develop solutions that cater to the various needs of distributed storage systems. The current shift towards networked storage has raised the need for yet more properties from codes, which are the subject of much ongoing research.

This column deals with advances in coding theory that are (or might be) applicable to distributed storage. It begins with a primer by Yuval Cassuto, describing the different considerations in distributed storage, as well as codes designed to address them. Yuval lists a range of requirements from codes for distributed storage. The list begins with properties required in disk arrays, such as low redundancy and low encode/decode complexity, and continues to considerations that arise in wide-area distributed storage such as degraded reads and efficient rebuild. The latter is the focus of our second contribution, by Anwitaman Datta and Frederique Oggier. Their article gives an overview of codes that aim to achieve better repairability in networked distributed storage systems. In particular, Anwitaman and Frederique consider the rebuild cost in the face of concurrent failures.
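To make the storage-overhead point concrete, here is a toy sketch (in Python, with made-up block contents) of the simplest possible code: k data blocks plus one XOR parity block tolerate the loss of any single block at a storage cost of (k+1)/k, versus 2x for keeping a full extra replica. It is only a primer-level illustration, not any of the code constructions discussed in the articles.

    from functools import reduce

    def xor_blocks(blocks):
        # Bitwise XOR of equal-length blocks, column by column.
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    data = [b"node0data", b"node1data", b"node2data"]   # k = 3 equal-size blocks
    parity = xor_blocks(data)                           # stored on a fourth node

    # Suppose the node holding data[1] fails: rebuild its block from the
    # two surviving data blocks plus the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]

Rebuilding here requires reading all surviving blocks, which is precisely the kind of repair cost that the codes surveyed by Anwitaman and Frederique aim to reduce.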

Many thanks to Yuval, Anwitaman, and Frederique for their contributions!

Column 49 Table of Contents


Column 48, SIGACT News Volume 43, Number 4, December 2012
Annual Review 2012
Pdf

Introduction

As usual, I conclude the year with an annual review of distributed computing awards and conferences. I begin by reporting on two prestigious awards - the Dijkstra Prize and the Principles of Distributed Computing Doctoral Dissertation Award. I then proceed with reviews of the two main distributed computing conferences, PODC - the ACM Symposium on Principles of Distributed Computing - and DISC - the International Symposium on DIStributed Computing. Finally, the column includes a review of WTTM - the Fourth Workshop on the Theory of Transactional Memory.

Column 48 Table of Contents


Column 47, SIGACT News Volume 43, Number 3, September 2012
Distributed Computability
Pdf

Introduction

Today's column deals with the theory of computability in a distributed system. It features a tutorial on this topic by Maurice Herlihy, Sergio Rajsbaum, and Michel Raynal. The tutorial focuses on a canonical asynchronous computation model, where processes communicate by writing to and reading from shared memory. It studies which distributed tasks can be solved in this model in the presence of process failures and communication delays, and which cannot. The tutorial highlights two powerful techniques for obtaining computability results: First, the abstraction of an iterated write-snapshot model is used in order to simplify algorithms and reduce the complexity of the solution space one needs to explore for impossibility proofs. Second, concepts from combinatorial topology provide an understanding of the mathematical structure induced by possible executions of a protocol in this model.

Column 47 Table of Contents


Column 46, SIGACT News Volume 43, Number 2, June 2012
Synthesizing Distributed and Concurrent Programs
Pdf

Introduction

Software synthesis is experiencing a renaissance. After years of being narrowly deployed in a few domains, automated program synthesis now appears to be ready for prime time. There are two main factors contributing to this trend. First, much related technology has matured to the point that synthesis is now feasible. In particular, program synthesis benefits from advances in verification, decision procedures, and machine learning, as well as the availability of more powerful computers. Second, with the increase in software complexity, automated program synthesis is now needed more than ever. The latter is particularly important in the domains covered by this column, namely distributed and concurrent systems. As distributed protocols are now increasingly deployed in clouds and large data centers, synthesizing correct ones becomes more important. Likewise, automated synthesis can be instrumental in alleviating the programming challenge raised by the multi-core revolution (which has been extensively discussed in past instances of this column).

Today's column includes two articles highlighting recent advances in automated synthesis in these two domains. Both consider the case where a given program was generated to work correctly in a certain model, but needs to be fixed to work in a different model, under less restrictive assumptions. In the context of distributed programs, Borzoo Bonakdarpour and Sandeep S. Kulkarni discuss automated model repair - a mechanism for automatically fixing bugs in a program or increasing its fault tolerance. They provide a broad survey of techniques developed in the last decade for model repair in distributed computing, fault-tolerance, self-stabilization, and real-time systems. Second, Michael Kuperstein, Martin Vechev, and Eran Yahav present an approach for automatic fence inference in concurrent programs, allowing them to run under relaxed memory models. Again, we are given a program that is correct in some restricted model - in this case, a sequentially consistent memory model - and the problem is to synthesize one that behaves correctly under a relaxed model, such as those implemented in today's computer architectures.

Many thanks to Borzoo, Sandeep, Michael, Martin, and Eran for their contributions!

Column 46 Table of Contents


Column 45, SIGACT News Volume 43, Number 1, March 2012
What Theory for Transactional Memory?
Pdf

Introduction

In this issue, we have a short column. It features a review of WTTM 2011, the third Workshop on the Theory of Transactional Memory, by Petr Kuznetsov and Srivatsan Ravi. This annual workshop focuses on developing theory for understanding transactional memory systems. It discusses recent achievements as well as remaining challenges. Petr and Srivatsan review the most recent instance of the workshop, which was co-located with DISC in September 2011 in Rome. Many thanks to Petr and Srivatsan for their contribution.

Column 45 Table of Contents


Column 44, SIGACT News Volume 42, Number 4, December 2011
2011 in Review
Pdf

Introduction

The last column of the year is dedicated, as always, to a review of distributed computing awards and conferences in 2011.

Column 44 Table of Contents


Column 43, SIGACT News Volume 42, Number 3, September 2011
Using Social Networks to Overcome Sybil Attacks
Pdf

Introduction

We open the new academic year with Haifeng Yu's article on overcoming sybil attacks using social networks. In a sybil attack, a malicious user assumes multiple identities, and uses them to pose as multiple users. Sybil attacks are a threat of the new millennium -- they arise in Internet-based distributed systems with a dynamic user population. Indeed, such attacks were not a concern in traditional distributed systems, where the set of participating processes was statically pre-defined. Sybil attacks are inherently difficult to deal with in systems where users do not wish to disclose binding private information, like credit card numbers. A recent popular approach for overcoming sybil attacks is using social networks. Intuitively, even if a malicious user can create many identities, he will have a hard time getting many honest users to befriend all of them in a social network. Thus, the graph structure of a social network can assist in revealing sybil nodes.

In this column, Haifeng Yu presents a tutorial on how social networks can be leveraged to defend against sybil attacks, and a survey of recent suggestions employing this approach. Though Haifeng tackles the problem from a theoretical standpoint (proving formal bounds, etc.), this direction has garnered more attention from the systems community, perhaps because sybil attacks are perceived as a real threat for which social networks can provide a viable solution. Yet it appears that much of the theory for sybil defense using social networks is yet to be developed. In particular, there is a need to better understand the graph properties of social networks, and how they can be leveraged to achieve provable bounds on false positives and false negatives in the detection of sybil nodes. Haifeng touches upon some of these directions in the last section. Many thanks to Haifeng for his article!
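The cut-based intuition can be illustrated with a toy sketch (Python; the graph and all parameters are made up, and this is not any specific protocol from Haifeng's survey): sybil identities attach to the honest region through few attack edges, so short random walks started from a trusted honest node rarely reach them.

    import random
    from collections import defaultdict

    def add_edge(g, u, v):
        g[u].append(v)
        g[v].append(u)

    g = defaultdict(list)
    honest = ["h%d" % i for i in range(20)]
    sybils = ["s%d" % i for i in range(20)]
    for i in range(20):                      # well-connected honest region
        add_edge(g, honest[i], honest[(i + 1) % 20])
        add_edge(g, honest[i], honest[(i + 5) % 20])
    for i in range(20):                      # sybil region, dense internally
        add_edge(g, sybils[i], sybils[(i + 1) % 20])
        add_edge(g, sybils[i], sybils[(i + 3) % 20])
    add_edge(g, "h0", "s0")                  # a single attack edge

    def landing_counts(graph, seed, walks=2000, length=8):
        # Run many short random walks from a trusted seed and count endpoints.
        counts = defaultdict(int)
        for _ in range(walks):
            node = seed
            for _ in range(length):
                node = random.choice(graph[node])
            counts[node] += 1
        return counts

    counts = landing_counts(g, seed="h3")
    suspects = sorted(v for v in g if counts[v] == 0)
    print(suspects)   # typically mostly s* identities; honest nodes are reached

Actual proposals replace this naive frequency test with carefully analyzed random-walk machinery; the article discusses what can in fact be proven about false positives and false negatives.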

Column 43 Table of Contents


Column 42, SIGACT News Volume 42, Number 2, June 2011
Game Theory and Fault Tolerance in Distributed Computing
Pdf

Introduction

Game theory and fault tolerance offer two different flavors of robustness to distributed systems -- the former is robust against participants attempting to maximize their own utilities, whereas the latter offers robustness against unexpected faults. This column takes a look at attempts to combine the two. It features a review of recent work that provides both flavors of robustness by Ittai Abraham, Lorenzo Alvisi, and Joe Halpern. Ittai, Lorenzo, and Joe discuss how game theory-style strategic behavior can be accounted for in fault-tolerant distributed protocols. They make a compelling case for bringing a game-theoretic perspective to distributed computing problems. Many thanks to Ittai, Lorenzo and Joe for their article!

Column 42 Table of Contents


Column 41, SIGACT News Volume 42, Number 1, March 2011
Computing Over Dynamic Networks
Pdf

Introduction

This column focuses on the rather recent topic of dynamic communication networks, which are modeled as evolving graphs. Admittedly, the idea that a network can be dynamic is hardly new--- networking research has been considering network topology changes and churn for decades. But by and large, past research considered such changes to be exceptions, and focused on adapting to them and re-stabilizing. Only in recent years have we begun to see work that treats dynamic changes as the norm. Such work on constantly-evolving networks may prove important in capturing many real-world scenarios, in particular ones that arise in wireless networks with mobile devices.

In today's column, Fabian Kuhn and Rotem Oshman survey recent work on dynamic networks. They consider adversarial as well as random models of graph evolution. In this context, they explain how the classical graph notions of diameter and cover time generalize to dynamic ones. These notions have important implications for information dissemination in the network. Fabian and Rotem then present an example algorithm for counting and information dissemination on adversarially evolving graphs. As research on dynamic networks is still in its infancy, many questions remain open; the column concludes with a discussion of some promising directions for future research. Many thanks to Fabian and Rotem for this survey!

Column 41 Table of Contents


Column 40, SIGACT News Volume 41, Number 4, December 2010
Annual Review 2010
Pdf

Introduction

As another year comes to a close, it is time for my traditional summary of this year's distributed computing awards and conferences.

Column 40 Table of Contents


Column 39, SIGACT News Volume 41, Number 3, September 2010
Byzantine Generals: The Next Generation
Pdf

Introduction

After almost 30 years of research on Byzantine Agreement (BA), the problem continues to be relevant and to re-invent itself in new ways. This column discusses two new research directions that further push the scale of BA. It suggests new domains where BA can, and perhaps should, be deployed.

First, our main contribution, by Valerie King and Jared Saia, argues for running BA in settings with a large number of nodes (or processors). Valerie and Jared survey new BA protocols whose communication complexity is scalable in the number of participating processors. This, they argue, enables their deployment in larger-scale domains for which BA was previously considered infeasible. The second contribution, by Marko Vukolic, considers another emerging domain for BA. It calls for wider-scale deployment of BA protocols, not among many processors, but rather over multiple cloud computing providers.

The column ends with a short announcement about Morgan Claypool's new monograph series on Distributed Computing Theory, edited by Nancy Lynch.

Many thanks to Valerie, Jared, and Marko for sharing their insights!

Column 39 Table of Contents


Column 38, SIGACT News Volume 41, Number 2, June 2010
Models for Algorithm Design in Wireless Networks
Pdf

Introduction

The flourishing of research on wireless communication has not bypassed the distributed computing community. Wireless networks provide unique and challenging platforms for distributed computation, inspiring researchers to develop many distributed algorithms for such environments. Any algorithmic work targeting wireless networks must begin by defining an appropriate model. The first research results in this vein employed simplified models, like the Unit Disk Graph (UDG), which facilitated algorithm design and were readily amenable to analysis. Recently, models that more accurately capture the physical nature of wireless networks have been developed; such models are now being gradually adopted.

This column's contribution, by Zvi (Zvika) Lotker and David Peleg, focuses on the promising signal-to-interference-and-noise ratio (SINR) model, and studies it from an algorithmic perspective. It surveys the model's structural properties as well as algorithms designed for this model. Along the way, some of the similarities and differences between the SINR model and the UDG model are highlighted. Many thanks to Zvika and David for their contribution!

Column 38 Table of Contents


Column 37, SIGACT News Volume 41, Number 1, March 2010
Reconfiguring State Machines ... and the History of Common Knowledge
Pdf

Introduction

As suggested in the title, this column deals with two distinct issues. First-- a debt from last year. In the previous column (SIGACT News 40(4), December 2009, pp. 64-97), I mentioned that the 2009 Dijkstra Prize was awarded to Joe Halpern and Yoram Moses for their paper ``Knowledge and Common Knowledge in a Distributed Environment''. Joe and Yoram received the prize during the DISC 2009 banquet, around 10:30pm, after a seven-course Spanish dinner. This timing did not readily allow for formal speeches. During a coffee break on the following day, some DISC attendees lamented the lack of such a speech, and opined that the Dijkstra Prize ought to be an occasion for the recipients to address the audience, although perhaps at a better-timed event. Yoram Moses followed up on this suggestion, but in lieu of delivering a formal speech, wrote down what he might have said in such an address. I am happy to include here his fascinating personal account of the events ``behind the scenes'' of this paper, and how it came to be written.

Second--- our main contribution for this column is a tutorial on reconfiguration in replicated state machines, by Leslie Lamport, Dahlia Malkhi, and Lidong Zhou of Microsoft Research. This is also a bit of a throwback to a column from last year (SIGACT News 40(3), September 2009, pp. 77-85), where Lidong Zhou discussed a number of issues that are often neglected in theory yet arise in practical deployments of large-scale distributed systems. Lidong had written that, in practice, reconfiguration of replicated systems has always been a key focus, though it has received little attention in theoretical work. In this column, you will find an instructive (semi-formal) treatment of this vital-but-not-well-understood issue, covering a range of practical approaches to reconfiguration.

Many thanks to Yoram, Leslie, Dahlia and Lidong for their contributions!

Column 37 Table of Contents


Column 36, SIGACT News Volume 40, Number 4, December 2009
Distributed Computing: 2009 Edition
Pdf

Abstract

It's now the season for colorful reviews of 2009. While you can read elsewhere about the year in sports, top-40 pop hit charts, people of the year, and so on, I throw my share into the mix with a (biased) review of this year's major distributed computing events.

Column 36 Table of Contents


Column 35, SIGACT News Volume 40, Number 3, September 2009
Theory and Practice in Large Distributed Systems
Pdf

Introduction

In a follow-up on the theme of the previous Distributed Computing Column (SIGACT News 40(2), June 2009, pp. 67-95), which dealt with ``Cloud Computing'', the current column deals with the tension between theoretical results and practical deployments of large-scale distributed systems. It consists of a single contribution by Lidong Zhou of Microsoft Research Asia, who has experience on both sides of this divide -- in designing distributed algorithms and in building large-scale distributed systems. Many thanks to Lidong for his contribution!

Column 35 Table of Contents


Column 34, SIGACT News Volume 40, Number 2, June 2009
Distributed Computing in the Clouds
Pdf

Introduction

It seems like ``computation clouds'' are cropping up everywhere nowadays... well, except perhaps, actually ``in the clouds'', as a recent April Fool's joke by Amazon suggested. While there is no commonly agreed-upon definition of what exactly constitutes a cloud, it is clear that there are some pretty interesting mega-scale distributed computing environments out there. Such environments require, and already deploy, many distributed services and applications. This column examines distributed computing research that seeks to develop new solutions for clouds, as well as to improve existing ones.

Our main contribution is by Ken Birman, Gregory (Grisha) Chockler, and Robbert van Renesse, who identify a research agenda for cloud computing, based on insights gained at the 2008 LADIS workshop. They question whether contemporary research in distributed computing, which sometimes targets cloud environments, is indeed relevant for cloud computing. Some researchers will be disappointed by (and might disagree with) the conclusions they reach. They then proceed to define a new agenda for cloud computing research. Their article, however, does not consider issues of security and trust. This perhaps stems from the fact that the paper is written from the perspective of cloud service providers, rather than users, whereas trust is a concern for the latter. In the next contribution, Christian Cachin, Yours Truly, and Alexander (Alex) Shraer examine the trust that users have (or can have) in cloud services where they store their data, surveying risks as well as solutions that are being proposed to address them.

The column then turns to a more applied perspective. The next contribution, by Edward (Eddie) Bortnikov from Yahoo! Research, surveys open source technologies that are used for web-scale computing, highlighting some technology transfer from the research community to actual implementations. The column concludes with an announcement, provided by Roger Barga and Jose Bernabeu-Auban from Microsoft, about a Cloud Computing tutorial that will be given at DISC'2009 in September, in Elche, Spain.

Many thanks to Ken, Grisha, Robbert, Christian, Alex, Eddie, Roger, and Jose for their contributions!

Column 34 Table of Contents


Column 33, SIGACT News Volume 40, Number 1, March 2009
Teaching Concurrency
Pdf

Introduction

As multi-core computer architectures are becoming mainstream, it is widely believed that the biggest challenge facing computer scientists today is learning how to exploit the parallelism that such architectures can offer. For example, Bill Dally, chair of Computer Science at Stanford, is quoted to this effect in the Stanford Report (April 30, 2008). In fact, the media buzz around multi-core programming is said to be rivaling the buzz on global warming.

This column looks at the multi-core programming problem from an educational perspective. How can those among us who teach help students become comfortable with concurrency? A workshop on this very topic, organized by Nir Shavit, will take place on March 8, co-located with ASPLOS'09. I include here an announcement about the workshop, as provided by Nir.

The column then proceeds with a review, by Danny Hendler, of the book ``Synchronization Algorithms and Concurrent Programming'' by Gadi Taubenfeld. Danny discusses how material from the book can be used in both undergraduate and graduate courses. The next contribution, by Alan Fekete, tackles concrete questions of curriculum design: What courses in the undergraduate computer science curriculum should deal with concurrency? What should be taught about concurrency, and how? As might be expected, there are no definite answers to these questions, yet Alan provides a thorough discussion of the pros and cons of various approaches and what works well in different contexts. Alan also surveys related results from the computer education literature. The column concludes with Leslie Lamport's high-level vision of how computer scientists should learn to think about computations, both serial and concurrent. This thought-provoking article takes a step back from the details of where, what, and how, and makes a case for the high-level goal of teaching students how to think clearly. Many thanks to Danny, Alan, and Leslie for their contributions!

Column 33 Table of Contents


Column 32, SIGACT News Volume 39, Number 4, December 2008
The Year in Review
Pdf

Introduction

This column overviews the main events related to distributed computing in 2008. I begin with a citation of this year's Dijkstra Prize in Distributed Computing, which was awarded (in PODC'08) to Baruch Awerbuch and David Peleg for their paper ``Sparse Partitions'' published in FOCS in 1990. I include a reprint of the award statement, provided by Gadi Taubenfeld, the head of this year's award committee.

I then proceed with reviews of PODC - the ACM Symposium on Principles of Distributed Computing - and its European counterpart, DISC - the International Symposium on DIStributed Computing. I decided to also include a review of SPAA - the ACM Symposium on Parallelism in Algorithms and Architectures - since the boundaries between distributed and parallel computing are quite blurred these days: SPAA increasingly deals with classical distributed computing topics such as network (and graph) algorithms, while PODC and DISC continue to deal extensively with concurrent and shared memory-based computing, which arises in parallel architectures.

To review these three events, I invited students who have won Best Paper and Best Student Paper Awards in the respective conferences (PODC, DISC, and SPAA). I also asked them to include descriptions of their winning papers in their reviews.

The review of PODC is by Armando Castaneda from Universidad Nacional Autonoma de Mexico, who won the Best Student Paper Award for his paper ``New Combinatorial Topology Upper and Lower Bounds for Renaming'', co-authored with his advisor Sergio Rajsbaum. This remarkable paper refutes a long-standing lower bound result on wait-free renaming, which was proven in a number of papers, including a Gödel Prize winner. More specifically, the paper shows that the previously proven lower bound fails to hold for some values of n, the number of processes, while for other values of n it does hold. (See more below). PODC was co-located with CONCUR this year, and featured a special symposium celebrating the contributions of Nancy Lynch on the occasion of her sixtieth birthday. See more on the celebration (and a picture) in Armando's review of PODC below.

DISC is reviewed by the winners of two awards: Robert Danek and Wojciech Golab from the University of Toronto, who won the Best Paper Award for their paper ``Closing the Complexity Gap Between FCFS Mutual Exclusion and Mutual Exclusion'', and Wojciech Wawrzyniak from Adam Mickiewicz University in Poland, who won the Best Student Paper Award for the paper ``Fast distributed approximations in planar graphs'' he co-authored with M. Hanckowiak and A. Czygrinow. The winning papers are discussed in the review below.

The review of SPAA is by Zvika Guz from the Technion, winner of SPAA's Best Paper Award, for the paper ``Utilizing Shared Data in Chip Multiprocessors with the Nahalal Architecture'', which he co-authored with Yours Truly, Avinoam Kolodny, and Uri Weiser, and discusses in his review of SPAA below.

Of course, these reviews do not cover all the interesting events where distributed computing is studied; distributed computing papers appear in numerous additional conferences. While it is clearly impossible to cover all the relevant venues in this column, I do try to provide a taste of this wide variety. For example, a recent column (Column 30, SIGACT News 39(2)) surveyed distributed computing research in systems conferences (SOSP and OSDI). In the current column, I chose to highlight another community that dabbles in distributed computing-- the dependability community. To this end, I include a review by Gabi (Gabriel) Kliot of distributed computing papers in DSN 2008-- the International Conference on Dependable Systems and Networks.

In all four reviews, you will find fun information about the venues, as well as technical content. Many thanks to Gadi, Armando, Robert, Wojciech, Wojciech, Zvika and Gabi for their colorful contributions!

Column 32 Table of Contents


Column 31, SIGACT News Volume 39, Number 3, September 2008
Quantum Computers Meet Distributed Computing
Ps, Pdf

Introduction

After two columns on practical problems arising in current day technologies (multicores in Column 29; systems research in Column 30), this column takes a sharp turn towards the futuristic realm of quantum computations. More specifically, the column features two surveys of distributed quantum computing, which, unbeknownst to many distributed computing folks, is an active area of research.

First, Anne Broadbent and Alain Tapp provide a broad overview of distributed computations and multi-party protocols that can benefit from quantum mechanics, most notably from entanglement. Some of these are unsolvable with classical computing, for example, pseudo-telepathy. In other cases, like appointment scheduling, the problem's communication complexity can be reduced by quantum means.

Next, Vasil Denchev and Gopal Pandurangan critically examine the joint future of quantum computers and distributed computing, asking whether this is a new frontier ... or science fiction. They give background to the lay reader on quantum mechanics concepts that provide added value over classical computing (again, entanglement figures prominently). They also elaborate on the practical difficulties of implementing them. They then illustrate how these concepts can be exploited for two goals: (1) to distribute centralized quantum algorithms over multiple small quantum computers; and (2) to solve leader election in various distributed computing models. They conclude that the jury is still out on the cost-effectiveness of quantum distributed computing.

Both surveys outline open questions and directions for future research. Many thanks to Anne, Alain, Vasil and Gopal for their contributions!

Column 31 Table of Contents


Column 30, SIGACT News Volume 39, Number 2, June 2008
On Distributed Computing Principles in Systems Research
Ps, Pdf

Introduction

Distributed systems are increasingly deployed in the real world nowadays. Concepts like high availability, disaster recovery, service-oriented architectures, grid computing, and peer-to-peer are now a standard part of a software engineer's vocabulary. Data is often stored on remote servers or disks, and redundancy is employed for fault-tolerance and high availability. Even computations on a single machine are becoming increasingly parallel due to the advent of multi-core architectures, as discussed in the previous Distributed Computing Column.

Not surprisingly, the "systems" research community follows a similar trend, and increasingly focuses on distributed systems, distributed storage, and parallel computing. Topics like replication and fault-tolerance (including Byzantine fault-tolerance), which have been studied in distributed computing conferences like PODC and DISC for a couple of decades, have now found their way to the mainstream of systems research. Even the SIGOPS Hall of Fame Award, which recognizes the most influential Operating Systems papers, was awarded in 2007 to five distributed computing papers (see below). At the same time, new research topics with a distributed flavor have emerged in response to real-world drivers such as peer-to-peer applications, data storage across multiple administrative trust domains, and multi-core architectures.

This column examines how distributed computing principles can (and do) come into play in systems research. I first list the laureates of the 2007 SIGOPS Hall of Fame Award. Next, Allen Clement surveys recent papers in systems conferences (SOSP and OSDI) that employ distributed algorithms. He first discusses how topics that have been studied in the distributed algorithms community are now used in systems research, and then overviews new topics that are treated in both communities, albeit differently. Allen also points out where future research on foundations of distributed computing can help advance the field.

The bulk of this column is by Roy Friedman, Anne-Marie Kermarrec, and Michel Raynal, who discuss the important principle of modularity in distributed systems. They illustrate how this principle has contributed to research in the areas of shared memory-based computing and agreement problems. They then advocate a similar approach for peer-to-peer systems.

Many thanks to Allen, Roy, Anne-Marie, and Michel for their contributions to this column. Upcoming columns will focus on "quantum computers meet distributed computing" and on "teaching concurrency".

Column 30 Table of Contents


Column 29, SIGACT News Volume 39, Number 1, March 2008
On the Role of Concurrent Computing Research in the Age of Multicores
Pdf

Introduction

Concurrent computing, once an esoteric pastime of Distributed Computing Theorists and High Performance Computing Extremists, is suddenly important in the computing world at large. This new interest in concurrency stems from a dramatic paradigm shift in computer architecture: No longer are hardware manufacturers making faster and faster (uni-)processors. Nowadays, chip companies are producing multi-processors with more and more cores. Only concurrent (multi-threaded) programs can effectively exploit the potential of such multi-core processors. And thus, as multi-processors become mainstream, so does concurrent computing.

This column features three contributions that reflect on the role of Distributed Computing research in the brave new world of multi-processors. What new challenges are raised by the newfound ubiquitous relevance of concurrency? What can we, the Distributed Computing community, do to address them?

The most notable challenge stemming from the shift to multi-core architectures is the difficulty of programming concurrent code. One concept that tackles this challenge is transactional memory, which allows multiple processes (or threads) to concurrently access memory objects with transaction-like consistency semantics. As this concept is already taken seriously by industry, and real-world implementations are beginning to materialize, one may ask what role the Distributed Computing community has in its continued development. This column provides three different takes on this question.
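As a crude illustration of the programming model only (not of any particular TM system discussed below), the sketch here marks a whole block atomic instead of acquiring per-object locks; the "TM runtime" is just one global lock, which is trivially correct but not scalable, whereas a real software or hardware TM would use optimistic or hardware-assisted concurrency control underneath the same interface.

    import threading

    _tm_lock = threading.Lock()          # stand-in for a real TM runtime

    class atomic:
        """Context manager standing in for an atomic transaction."""
        def __enter__(self):
            _tm_lock.acquire()
        def __exit__(self, exc_type, exc, tb):
            _tm_lock.release()
            return False

    accounts = {"a": 100, "b": 0}

    def transfer(src, dst, amount):
        # The block appears to execute indivisibly with respect to other
        # atomic blocks, whatever objects it happens to touch.
        with atomic():
            if accounts[src] >= amount:
                accounts[src] -= amount
                accounts[dst] += amount

    threads = [threading.Thread(target=transfer, args=("a", "b", 10)) for _ in range(10)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(accounts)                      # {'a': 0, 'b': 100}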

The first contribution, by Pascal Felber, Christof Fetzer, Rachid Guerraoui, and Tim Harris, questions whether transactions in memory are any different from the well-studied database transactions, and, if there are differences, what they entail. The authors point out some aspects in which conventional database wisdom does not provide adequate answers for transactional memory systems. The second contribution, by Hagit Attiya, is a call for more foundational research, or Theory, for transactional memory. Last but not least, Maurice Herlihy and Victor Luchangco provide their perspective on the role of Distributed Computing in the multi-core revolution. All three highlight some promising research directions.

Many thanks to Hagit, Pascal, Christof, Rachid, Tim, Maurice, and Victor for their contributions.

Column 29 Table of Contents


Column 28, SIGACT News Volume 38, Number 4, December 2007
Distributed Computing Research - Past, Present, Trends, and Centers
Pdf

Introduction

Sergio Rajsbaum, who edited this column for seven years and established it as a relevant and popular venue, is stepping down. This issue is my first step in the big shoes he vacated. I would like to take this opportunity to thank Sergio for providing us with seven years' worth of interesting columns. In producing these columns, Sergio has enjoyed the support of the community at-large and obtained material from many authors, who greatly contributed to the column's success. I hope to enjoy a similar level of support; I warmly welcome your feedback and suggestions for material to include in this column!

The two main conferences in the area of principles of distributed computing, PODC and DISC, took place this summer. This issue is centered around these conferences, and more broadly, distributed computing research as reflected therein, ranging from reviews of this year's editions, through influential papers from past years, to examining PODC's place within the realm of computer science.

I begin with a short review of PODC'07, and highlight some ``hot'' trends that have taken root in PODC, as reflected in this year's program. Some of the forthcoming columns will be dedicated to these up-and-coming research topics. This is followed by a review of this year's DISC, by Edward (Eddie) Bortnikov. For some perspective on long-running trends in the field, I next include the announcement of this year's Edsger W. Dijkstra Prize in Distributed Computing (presented at DISC'07), along with the list of past laureates. The prize is awarded annually to an outstanding paper on the principles of distributed computing, whose significance and impact on the theory and/or practice of distributed computing have withstood the test of time and been evident for at least a decade. This year's winner is ``Consensus in the Presence of Partial Synchrony'' by Dwork, Lynch, and Stockmeyer.

The main part of this issue, contributed by Michael Kuhn and Roger Wattenhofer, is a quest for the ``center'' of computer science, based on ``distances'' between conferences. It further studies the evolution of such distances over time, focusing on theory of computing and distributed computing conferences. Finally, it identifies ``central'' people in these communities. The article extends a more PODC-centered (and quite entertaining) presentation given by Roger at the PODC business meeting.

Many thanks to Eddie, Michael, and Roger for their contributions.

Column 28 Table of Contents

 

Archive - Columns Edited by Sergio Rajsbaum

  1. Volume 31, Number 4, (Whole Number 117), December 2000
  2. Volume 32, Number 1, (Whole Number 118), March 2001
  3. Volume 32, Number 2, (Whole Number 119), June 2001
  4. Volume 32, Number 3, (Whole Number 120), September 2001
  5. Volume 32, Number 4, (Whole Number 121), December 2001
  6. Volume 33, Number 1, (Whole Number 122), March 2002
  7. Volume 33, Number 2, (Whole Number 123), June 2002
  8. Volume 33, Number 3, (Whole Number 124), September 2002
  9. Volume 33, Number 4, (Whole Number 125), December 2002
  10. Volume 34, Number 1, (Whole Number 126), March 2003
  11. Volume 34, Number 2, (Whole Number 127), June 2003
  12. Volume 34, Number 3, (Whole Number 128), September 2003
  13. Volume 34, Number 4, (Whole Number 129), December 2003
  14. Volume 35, Number 2, (Whole Number 131), June 2004
  15. Volume 35, Number 3, (Whole Number 132), September 2004
  16. Volume 35, Number 4, (Whole Number 133), December 2004
  17. Volume 36, Number 1, (Whole Number 134), March 2005
  18. Volume 36, Number 2, (Whole Number 135), June 2005
  19. Volume 36, Number 3, (Whole Number 136), September 2005
  20. Volume 36, Number 4, (Whole Number 137), December 2005
  21. Volume 37, Number 1, (Whole Number 138), March 2006
  22. Volume 37, Number 2, (Whole Number 139), June 2006
  23. Volume 37, Number 3, (Whole Number 140), September 2006
  24. Volume 37, Number 4, (Whole Number 141), December 2006
  25. Volume 38, Number 1, (Whole Number 142), March 2007
  26. Volume 38, Number 2, (Whole Number 143), June 2007
  27. Volume 38, Number 3, (Whole Number 144), September 2007

The Columns

  1. Volume 31, Number 4, (Whole Number 117), December 2000
    Postscript

    This is the first Distributed Computing Column that I have edited. It consists of four sections: A few thoughts on the role of distributed computing theory, a reprint of the PODC 2000 Influential-Paper Award statement, a review of the PODC 2000 Conference, and a summary of a paper by Jennifer E. Walter, Jennifer L. Welch, and Nancy M. Amato titled "Distributed Reconfiguration of Metamorphic Robot Chains" presented at the conference. The column concludes with some news and acknowledgments. The winner of the Influential-Paper Award is Leslie Lamport, for his paper "Time, Clocks, and the Ordering of Events in a Distributed System." 
     
  2. Volume 32, Number 1, (Whole Number 118), March 2001
    Postscript
    This issue consists of two guest contributions: a review of the DISC 2000 Conference by Lisa Higham, and a routing survey by Cyril Gavoille. The survey focuses on routing messages in distributed networks with efficient data structures. After an overview of the various results in the literature, some interesting open problems are described.

     

     

  3. Volume 32, Number 2, (Whole Number 119), June 2001
    Postscript

    After some announcements, this issue includes the contribution by Idit Keidar and myself titled "On the Cost of Fault-Tolerant Consensus When There Are No Faults." This paper considers the consensus problem in an asynchronous model enriched with unreliable failure detectors and in the partial synchrony model. It considers algorithms that solve consensus and tolerate crash failures and/or message omissions. It proves tight lower bounds on the number of communication steps performed by such algorithms in failure-free executions. It presents in a unified framework a number of related lower bound results. It also illustrates matching upper bounds, by giving algorithms that achieve the lower bound.
     

  4. Volume 32, Number 3, (Whole Number 120), September 2001
    Postscript

    This issue consists of the contribution by Boaz Patt-Shamir, "Transmission of Real-Time Streams," which surveys some of the interesting ideas behind lossless off-line smoothing and lossy on-line smoothing. In smoothing, the basic idea is to trade bandwidth for space and latency. The bits of the input stream generated by the source are not transmitted directly over the communication link: first they are stored in a server's buffer; the server submits bits stored in its buffer to the link when it deems appropriate, subject to the link rate constraint. Bits arriving at the other side of the communication link are first stored in a client's buffer, which delivers them to the play-out device after a reconstruction action. The added flexibility provided by the buffers is used to create smoother traffic on the communication link, thus reducing the peak bandwidth requirement of the stream. The basic problem in smoothing is determining the link rate, the buffer sizes, the play-out delay, and, of course, the right algorithm to use.
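    The following toy sketch (Python, with made-up numbers, not an algorithm from the survey) shows the trade-off in its simplest form: a bursty stream is pushed through a link of fixed rate via the server's buffer, and we report the peak buffer occupancy and the extra delay incurred by the tail of the stream.

        def smooth(arrivals, link_rate):
            """arrivals[t] = bits generated by the source in slot t."""
            buffered, peak_buffer, sent, t = 0, 0, [], 0
            while t < len(arrivals) or buffered > 0:
                if t < len(arrivals):
                    buffered += arrivals[t]
                out = min(buffered, link_rate)      # link rate constraint
                buffered -= out
                sent.append(out)
                peak_buffer = max(peak_buffer, buffered)
                t += 1
            return sent, peak_buffer

        arrivals = [10, 0, 0, 12, 0, 0, 9, 0]       # bursty source, peak rate 12
        sent, peak_buffer = smooth(arrivals, link_rate=4)
        print(sent)          # [4, 4, 2, 4, 4, 4, 4, 4, 1]: peak link rate 4, not 12
        print(peak_buffer)   # 8 bits of server buffer are needed
        print(len(sent) - len(arrivals))   # 1 extra slot of delay for the last bits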

     
  5. Volume 32, Number 4, (Whole Number 121), December 2001
    Postscript

    This issue consists of four parts: a survey of SIROCCO'01 by Pierre Fraigniaud, a survey of POMC'01 by Rui Fan, a survey of PODC'01 by myself, and the paper "Paxos Made Simple" by Leslie Lamport.

    This year marked the 20th anniversary of PODC, and the conference included a special celebration honoring the work of Leslie Lamport on the occasion of his 60th birthday; it was also co-located with the Workshop on Principles of Mobile Computing (POMC). I briefly describe the PODC conference, and reproduce the toast by Fred Schneider in honor of Leslie. The Business Meeting report is by Gil Neiger. Finally, there is Lamport's paper on his own algorithm, which was discussed in a few of the PODC lectures. Here is what Lamport says on his web page about this paper:

    "At the PODC 2001 conference, I got tired of everyone saying how difficult it was to understand the Paxos algorithm, published in [115].  Although people got so hung up in the pseudo-Greek names that they found the paper hard to understand, the algorithm itself is very simple.  So, I cornered a couple of people at the conference and explained the algorithm to them orally, with no paper.  When I got home, I wrote down the explanation as a short note, which I later revised based on comments from Fred Schneider and Butler Lampson.  The current version is 13 pages long, and contains no formula more complicated than n1 > n2."

     

  6. Volume 33, Number 1, (Whole Number 122), March 2002
    Postscript

    This issue consists of two parts: a survey of DISC'01 by Panagiota Fatourou, and a survey of Self-Stabilization at WSS'01 and DISC'01 by Ted Herman. 

     

  7. Volume 33, Number 2, (Whole Number 123), June 2002
    Postscript

    This issue consists of three parts:

    The abstract of the Gilbert and Lynch paper is the following. When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, they prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

    The survey by Rajaraman consists of the following parts. First various aspects relevant to modeling ad hoc networks are described. Then topology control is discussed. Since the nodes of an ad hoc network are often associated with points in 2-dimensional space, topology control is closely tied to computational geometry; this relationship and extant work in the area are briefly reviewed.  Next,  routing protocols for ad hoc networks are discussed.  After a brief overview of the many protocols that have been proposed, alternative approaches based on the adversarial network model are discussed.  

  8. Volume 33, Number 3, (Whole Number 124), September 2002
    Postscript

    This issue consists of the paper:

    Abstract: Ensembles of distributed, heterogeneous resources, or Computational Grids, have emerged as popular platforms for deploying large-scale and resource-intensive applications. Large collaborative efforts are currently underway to provide the necessary software infrastructure. Grid computing raises challenging issues in many areas of computer science, and especially in the area of distributed computing, as Computational Grids cover increasingly large networks and span many organizations. In this paper we briefly motivate Grid computing and introduce its basic concepts. We then highlight a number of distributed computing research questions, and discuss both the relevance and the shortcomings of previous research results when applied to Grid computing. We choose to focus on issues concerning the dissemination and retrieval of information and data on Computational Grid platforms. We feel that these issues are particularly critical at this time, and we can point to preliminary ideas, work, and results in the Grid community and the distributed computing community. This paper is of interest to distributed computing researchers because Grid computing provides new challenges that need to be addressed, as well as actual platforms for experimentation and research.

  9. Volume 33, Number 4, (Whole Number 125), December 2002
    Postscript



    This issue consists of the paper:

Abstract: The emergence of the Internet as a standard platform for distributed computing has led to diversification of the research agenda in distributed algorithms, and that agenda now consists of much more than traditional, core PODC concerns. If distributed algorithms are to be designed, analyzed, implemented, and deployed for the full range of applications that are now plausible, the research community will need to develop new computational models, new failure models, new measures of computational complexity, and new analysis techniques. This column is intended as an introduction to one theme that has grown steadily in popularity and importance during the past few years: the recognition that participants in an Internet algorithm are economic actors as well as computational processes.

  10. Volume 34, Number 1, (Whole Number 126), March 2003
    Postscript
    This issue consists of the paper:

Abstract: The celebrated Paxos algorithm of Lamport implements a fault-tolerant deterministic service by replicating it over a distributed message-passing system. This paper presents a deconstruction of the algorithm by factoring out its fundamental algorithmic principles within two abstractions: an eventual leader election and an eventual register. In short, the leader election abstraction encapsulates the liveness property of Paxos, whereas the register abstraction encapsulates its safety property. Our deconstruction is faithful in that it preserves the resilience and efficiency of the original Paxos algorithm in terms of stable storage logs, message complexity, and communication steps. In a companion paper, we show how to use our abstractions to reconstruct powerful variants of Paxos.

  11. Volume 34, Number 2, (Whole Number 127), June 2003
    Postscript

This issue consists of the paper:

Abstract: The celebrated Paxos algorithm of Lamport implements a fault-tolerant deterministic service by replicating it over a distributed message-passing system. In the previous column, the authors presented a deconstruction of the algorithm by factoring out its fundamental algorithmic principles within two abstractions: an eventual leader election and an eventual register. Using those abstractions, they show in this paper how to reconstruct, in a modular manner, powerful variants of Paxos. In particular, they show how to (1) alleviate the need for stable-storage access if some processes remain up for sufficiently long, (2) augment the resilience of the algorithm against unstable processes, (3) enable single-process decision with shared commodity disks, and (4) reduce the number of communication steps during stable periods of the system.

  12. Volume 34, Number 3, (Whole Number 128), September 2003
    Postscript

 This issue consists of the paper:

Abstract: In this note, we discuss the applications of lattice theory to solving problems in distributed systems. The first problem we consider is that of detecting a predicate in a computation, i.e., determining whether there exists a consistent cut of the computation satisfying the given predicate. The second problem involves computing the slice of a computation with respect to a predicate. A slice is a concise representation of all those global states of the computation that satisfy the given predicate. The third problem we consider is that of analyzing a partial order trace of a distributed program to determine whether it satisfies the given temporal logic formula. Finally, we consider the problem of timestamping events and global states of a computation to capture the order relationship. We discuss how the results from lattice theory can be used in solving each of the above problems.
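As one concrete instance of the last problem, here is a standard vector-clock sketch (Python; not taken from the note itself) that timestamps events so that the happened-before order between any two of them can be recovered by comparing timestamps componentwise.

    def new_clock(n):
        return [0] * n

    def local_event(clock, i):          # process i performs an internal event
        clock[i] += 1
        return tuple(clock)

    def send_event(clock, i):           # timestamp piggybacked on the message
        clock[i] += 1
        return tuple(clock)

    def receive_event(clock, i, msg_ts):
        # Merge the sender's knowledge, then count the receive as a new event.
        clock[:] = [max(a, b) for a, b in zip(clock, msg_ts)]
        clock[i] += 1
        return tuple(clock)

    def happened_before(ts1, ts2):
        return all(a <= b for a, b in zip(ts1, ts2)) and ts1 != ts2

    # Two processes: p0 sends a message that p1 later receives.
    c0, c1 = new_clock(2), new_clock(2)
    e1 = local_event(c0, 0)             # (1, 0)
    m = send_event(c0, 0)               # (2, 0)
    e2 = local_event(c1, 1)             # (0, 1), concurrent with e1
    e3 = receive_event(c1, 1, m)        # (2, 2)
    print(happened_before(e1, e3))      # True: e1 precedes the receive
    print(happened_before(e1, e2), happened_before(e2, e1))   # False False: concurrent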

  13. Volume 34, Number 4, (Whole Number 129), December 2003
    Postscript

This issue describes the PODC 20th anniversary Special Issue published by the journal Distributed Computing.

Abstract: The 20th ACM Principles of Distributed Computing (PODC) conference was held in 2002. To celebrate this event, Hagit Attiya and I edited a special issue of the journal Distributed Computing, with the support and encouragement of its editor, Vassos Hadzilacos. We invited contributions from the community, including surveys, retrospectives, new results, and personal perspectives. We ended up with nine papers, described here.
  14. Volume 35, Number 2, (Whole Number 131), June 2004
    Postscript

 This issue consists of the paper:

Abstract: Many distributed algorithms are designed for a system with a fixed set of n processes. However, some systems may dynamically change and expand over time, so that the number of processes may grow to infinity as time tends to infinity. This paper considers such systems, and gives algorithms that are new and simple (but not necessarily efficient) for common problems. The reason for simplicity is to better expose some of the algorithmic techniques for dealing with infinitely many processes. A brief summary of existing work on the subject is also provided.
  15. Volume 35, Number 3, (Whole Number 132), September 2004
    Postscript

 This issue consists of:
  16. Volume 35, Number 4, (Whole Number 133), December 2004
    Postscript

 This issue consists of:

  17. Volume 36, Number 1, (Whole Number 134), March 2005
    Postscript

 This issue consists of:
  18. Volume 36, Number 2, (Whole Number 135), June 2005
    Postscript

    This issue consists of:
    • ``Algorithmic Foundations of the Internet,'' a survey by Alejandro López-Ortiz.
  19. Volume 36, Number 3, (Whole Number 136), September 2005
    Pdf

    This issue consists of:
    • A review of the SIROCCO conference held in Mont Saint Michel, France, May 23-26, 2005, by Corentin Travers.

  20. Volume 36, Number 4, (Whole Number 137), December 2005
    Pdf

    This issue consists of:
    • A review of the DISC conference held September 26-29, 2005, in Cracow, Poland, by Dariusz Kowalski.
  21. Volume 37, Number 1, (Whole Number 138), March 2006
    Pdf

    This issue consists of:
  22. Volume 37, Number 2, (Whole Number 139), June 2006
    Pdf

    This issue consists of:
  23. Volume 37, Number 3, (Whole Number 140), September 2006
    Pdf

    This issue consists of:
  24. Volume 37, Number 4, (Whole Number 141), December 2006
    Pdf

    This issue consists of:
  25. Volume 38, Number 1, (Whole Number 142), March 2007
    Pdf

    This issue consists of:
  26. Volume 38, Number 2, (Whole Number 143), June 2007
    Pdf

    This issue consists of   "A Collection of Kinesthetic Learning Activities for a Course on Distributed Computing", by Paolo Sivilotti and Scott Pike.

  27. Volume 38, Number 3, (Whole Number 144), September 2007
    Pdf

    This issue consists of   "Delayed Password Disclosure", by Markus Jakobsson and Steven Myers.





  Last modified: Wed Feb 27 17:10:22 IST 2008