
Choreography and Components (was: Re: Is Systems Software Research Irrelevant?)



Daniel Weinreb wrote:
> 
>...
> Real component architectures, in the sense of an architecture that 
> lets two programs interoperate even if the programs were written 
> by disparate people in disparate places under very different 
> administrative environments, require *more*, not *less*, declarative 
> semantics. CORBA and RMI are just a start; at least they give names 
> to operations and specify input and output types. The whole 
> WSDL/UDDI/SOAP path is a somewhat improved version of the
> same.  

I'm curious: improved how?

> But even that's really not good enough for a lot of things, because 
> the above technologies only deal in single request-response 
> interactions, whereas a lot of real interactions have a lot more steps
> than that.  You need to have what eXcelon's BPM product (what I used to
> work on) calls "collaboration definitions", also known as the BPSS 
> stuff in the ebXML standard. These are basically declarations of 
> choreography of messages: A sends a foo to B, and then B sends a bar 
> to A, and then A might send either a baz or a frob to C, and so 
> on (picture a little flow chart). 

Let me posit a theory I've been thinking about for a while. 

"Choreography" is not a notion that we usually use in the local case. If
it were, you would not have to explain it to the list. The reason
choreography is not usually necessary in the local case is that we
instinctively move the "steps to accomplish something" into the data. If
there are a hundred steps to solve a problem, then this can be
represented in an object oriented language as an object, which you call
a method on to get another object, which you ... a hundred times. You
don't have to tell the person the order in which to call because it is
*impossible to get the order wrong*. You don't have the object necessary
to call step 99 until after you've called step 98.

As a simple example, consider two different designs for file IO:

open("/somedirectory/somefile")
data = read("/somedirectory/somefile")

You could execute those steps in the wrong order. But consider what
programming languages really do:

file = open("/somedirectory/somefile")
data = file.read()

You *cannot* execute those steps in the wrong order so there is no need
to define a choreography for this API. In a language with sufficiently
deterministic garbage collection, you could even do away with the
"close" method (which reintroduces a sort of method ordering issue). 

But even if we do not do away with the "close" method,
we do not have to view this as a choreography problem (and generally do
not do so). Rather we say: "when the object is in the closed state, the
read method cannot be called." This is no different than saying: "When a
stack is empty, it may not be popped."
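
To make that concrete, here is a minimal Python sketch. The class name
Resource and its error message are hypothetical, invented only for
illustration. The point is that the rule is stated against the object's
current state, not against the history of calls made on it.

```python
class Resource:
    """Hypothetical file-like object whose rules depend only on its state."""

    def __init__(self, data):
        self._data = data
        self.closed = False

    def read(self):
        # Precondition on the current state, not on which calls came before.
        if self.closed:
            raise ValueError("read not allowed: resource is closed")
        return self._data

    def close(self):
        self.closed = True


r = Resource("hello")
first = r.read()      # allowed: the resource is open
r.close()
try:
    r.read()          # rejected: the resource is now closed
    rejected = False
except ValueError:
    rejected = True
```

Nothing here describes an ordering of calls; the object simply refuses
an operation that its present state does not permit.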

So why can't we apply this principle to business protocol interactions?
First presume as a base that the two partners have access to a shared
authenticated writing space (i.e. a web server). Then make all
assertions about legal protocol interactions in terms of the state of
the written resources, not in terms of the flow of messages that have
preceded. The writing space enforces the assertions. Then do away with
the choreography. 
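
As a rough sketch of what I mean, assume an in-memory stand-in for the
shared writing space. The names SharedSpace, add_rule, put and get, the
resource paths and the order/invoice rule are all hypothetical; a real
deployment would be a web server with authentication, which this sketch
leaves out.

```python
class SharedSpace:
    """Hypothetical shared writing space that enforces rules on writes."""

    def __init__(self):
        self.resources = {}   # path -> current representation
        self.rules = {}       # path -> predicate(current_resources, new_value)

    def add_rule(self, path, predicate):
        self.rules[path] = predicate

    def put(self, path, value):
        rule = self.rules.get(path)
        # The rule sees only the current resources, never a message log.
        if rule is not None and not rule(self.resources, value):
            raise PermissionError(f"write to {path} rejected by current state")
        self.resources[path] = value

    def get(self, path):
        return self.resources.get(path)


space = SharedSpace()
# Assumed business rule: an invoice may be written only once an accepted
# order exists in the space.
space.add_rule(
    "/invoice",
    lambda state, new: state.get("/order", {}).get("status") == "accepted",
)

try:
    space.put("/invoice", {"amount": 100})   # no order yet, so rejected
    early_write_ok = True
except PermissionError:
    early_write_ok = False

space.put("/order", {"item": "widget", "status": "accepted"})
space.put("/invoice", {"amount": 100})       # now permitted
```

The "order before invoice" sequencing falls out of the assertion on the
shared state; neither partner had to agree on a message flow up front.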

I claim that this model has the following advantages:

 * The two participants can be as stateful or stateless as they like (as
long as the shared state is available somewhere). For instance, the Web
uses this model and this allows browsers to be stateless. Not all
browsers are stateless, but you can accomplish meaningful, multi-step
tasks on the Web with browsers that are. Obviously less trusting
participants will want to be more stateful if only to log interactions.

 * Eliminating the need to *negotiate* the choreography simplifies the
integration process. Of course you still have to make the assertions
about proper actions given particular properties of the shared state but
I claim that you need to do this *anyway*. That's standard business
rules. You can only do X if the Y amount is greater than 0 etc. DAML
seems like an excellent starting point for a declarative language for
these assertions.  

 * Eliminating the concept of choreography simplifies the implementation
of systems. One less XXML to learn and implement. In today's proposed
architecture using choreography, we need *four* different MLs:

 1. Syntax description ML (e.g. XML Schema)
 2. Business rules ML (e.g. DAML)
 3. Operations ML (e.g. WSDL)
 4. Choreography ML (e.g. BPML)

As a straw man, I propose abolishing 1. and 4. Arguably, validating
syntax doesn't matter if you can validate semantics, and choreography
doesn't matter if you can use your shared information space.

 * Third parties (e.g. auditors) can be brought in at any time and they
do not need to see a "replay" of the messages since the beginning of
time to figure out the implicit state of the conversation. They can look
at the shared *explicit* store and instantly know as much about the
conversation as the real participants.

 * Participants can migrate between devices and bring the new device
into sync on the state of the conversation by looking at the shared
state.
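
The last two points can be illustrated with a small sketch. The message
kinds, paths and the replay function below are hypothetical. The
contrast is between deriving the state of a conversation by folding over
every message and simply reading it from the shared store.

```python
# Implicit state: a newcomer must process every message since the start.
messages = [
    ("A", "order", {"item": "widget"}),
    ("B", "accept", {}),
    ("A", "invoice", {"amount": 100}),
]


def replay(log):
    """Rebuild the conversation state from the full message history."""
    state = {}
    for sender, kind, body in log:
        if kind == "order":
            state["/order"] = dict(body, status="pending")
        elif kind == "accept":
            state["/order"]["status"] = "accepted"
        elif kind == "invoice":
            state["/invoice"] = dict(body)
    return state


# Explicit state: the shared store already holds the outcome.
shared_store = {
    "/order": {"item": "widget", "status": "accepted"},
    "/invoice": {"amount": 100},
}


def audit(store):
    """An auditor or a new device just takes a snapshot; no replay needed."""
    return {path: dict(value) for path, value in store.items()}


replayed = replay(messages)
snapshot = audit(shared_store)
```

Both routes arrive at the same picture, but only the first one requires
access to the whole history and knowledge of how to interpret it.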

So to answer my original question: I claim that the reason choreography
is needed in the remote case but not the local case is that Web
Services standards try to get away from the notion of shared state. It
also seems that the shared state can be virtual in the sense that you
maintain your view of it and I maintain my view of it and we use common
addressing mechanisms. But it feels like it would be a shame to leave
out the simple extra feature whereby one of the participants can choose
to merely use the other participant's shared space if it prefers to do
so.

Other thoughts on this issue for those who wish to follow up further:

 * http://www.prescod.net/rest/state_transition.html

> And the protocols have to allow for asynchronous communication, too.

The model above does not prevent asynchronous communication but it does
presume that both parties have a synchronous communication path to the
shared data store (i.e. TCP).

> Pipes don't have any declarative semantics at all. It's like 
> programming in Lisp using only cons cells everywhere, never using 
> structs or classes.  No compile-time typing and no run-time 
> typing: if things don't match, you just get garbage out with 
> no coherent error message.

If you combine pipes with XML schemas and transforms you could probably
solve this problem. The issue is not the pipes, it is the lack of
"schema" for the data. After all, a pipe is just a function with streams
as domain and range. The basic concept seems highly coherent to me.
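
A minimal sketch of the idea in Python, using generators as the streams.
Plain dictionaries and a set of required field names stand in for XML
documents and an XML schema here, which is an assumption made to keep
the example short; the function names checked and total_amounts are
hypothetical.

```python
def checked(stream, required_fields):
    """Pass records through, failing loudly on a schema mismatch."""
    for i, record in enumerate(stream):
        missing = required_fields - set(record)
        if missing:
            raise TypeError(f"record {i} is missing fields: {sorted(missing)}")
        yield record


def total_amounts(stream):
    """A pipe stage: a function from a stream of records to a stream of sums."""
    running = 0
    for record in stream:
        running += record["amount"]
        yield running


good = [{"amount": 5}, {"amount": 7}]
totals = list(total_amounts(checked(iter(good), {"amount"})))

bad = [{"amount": 5}, {"amout": 7}]   # note the misspelled field name
try:
    list(total_amounts(checked(iter(bad), {"amount"})))
    error = None
except TypeError as exc:
    error = str(exc)
```

With the check in front of the stage, a mismatch produces a coherent
error that names the offending record instead of garbage downstream.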
-- 
Come discuss XML and REST web services at:
  Extreme Markup: Aug 4-9, 2002,  www.extrememarkup.com/extreme/