
Re: Gwydion & threads



Scott Ribe wrote:
>>>Then frob could reach into random bits of memory and do unpredictable
>>>things -- the language would be unsafe in the painful C sense, rather
>>>than safe in the happy Lisp/Smalltalk sense.
>>>
>>>When values are words, then even if the value of a variable is not
>>>consistent across threads, they are always *values*, rather than
>>>totally random bit-patterns. Now, I've not done much concurrent
>>>programming, so I'd like to know if this is a realistic concern?
>>
>>I am not familiar with the internal affairs of d2c, but it just struck me
>>that this problem could be solved by using a slightly different
>>representation of Dylan objects.
> 
> 
> The problem is deeper and more difficult than has been discussed. Our
> discussions up to this point have assumed that an assignment, once made, is
> visible to all threads.

  I do not think this assumption was made. The current
d2c representation cannot use normal assignments and be thread-safe,
whether you have a strongly-ordered memory model or not.

  Now if we start proposing & discussing solutions, then
indeed we need to address weakly-ordered memory models.
Thanks for pointing out this issue.

> This is only true on a single-processor computer. On
> a multiprocessor computer, a simple assignment in one thread will only
> update the value in the cache of the current CPU, so a subsequent read from
> another thread can get the old value from the cache of a different CPU.
>
> Not just that, but cache is not necessarily written back to main memory in
> order of modification, so a thread can read an arbitrarily strange mix of
> up-to-date and stale values. Thus, ***ALL*** accesses to data that is shared
> between threads must utilize some sort of thread-boundary synchronization
> primitive--you can't get around the problems with ***ANY*** redesign of the
> internal data structures, no matter how clever.

  I do not think this is entirely correct.
  The issue here is the atomicity of a single assignment
of a single-word or a double-word quantity.

  In quite a few cases, it doesn't matter whether another
thread sees the new or the old value, it just matters that it
sees either the new or the old, and no "in between" bit
configuration.

  That is where d2c's double-word representation loses
out, while tagged single-word representations (like FD's)
are fine.

  While we can debate how far you can go without involving
real synchronization (spin locks, mutexes, rw-locks, semaphores,
etc.), there are places where they are not needed and would
introduce a lot of overhead.

  Indeed there are quite a few places where performance
hinges on the ability to avoid locks, despite concurrent
reads/writes to the same data structures. But you'll have
to decide whether to trust me on this one; I am not at
liberty to discuss any of the cases I have heard of.

  To bring the discussion back to Dylan, one question
is: assuming that only one thread can modify a given
list, can I append a value to this list in a thread-safe
manner, without locking? (Basically assigning a new value
to last_pair.tail.)

  If the answer is no, then thread-safe programming suddenly
becomes even more tedious.

> The solution to this problem
> must involve some special OS-level threading support. (Assuming of course
> that direct control over the flushing of CPU cache is handled in the kernel
> and not available to user-land processes!)

  The ISA I am most familiar with (IA64) does not
necessitate any OS support to specify memory ordering
constraints. I believe that's the norm, rather than the
exception.

  From memory:
  ldN  -> N bytes load, no memory ordering constraint, N=1,2,4,8
  stN  -> N bytes store, no memory ordering constraint, N=1,2,4,8

  ldN.acq  -> N bytes ordered load, acquire semantics (aka read fence)
  stN.rel  -> N bytes ordered store, release semantics (aka write fence)

  mf       -> memory fence (aka read/write fence)

Roughly:
  An acquire load (ld.acq) is performed before any subsequent
memory access from the same thread.
  A release store (st.rel) is made visible only after every
previous memory access from the same thread is.

  For reference:
   http://developer.intel.com/design/itanium/index.htm
  Memory Access instructions & memory model:
   http://developer.intel.com/design/itanium/downloads/24531703s.htm,
    of direct interest: sections 4.4.6.2 and 4.4.7, but the whole
    of 4.4 is also somewhat relevant.
  All Instructions:
   http://developer.intel.com/design/itanium/downloads/24531903s.htm

  All reads/writes are "atomic", i.e. the whole N bytes are
read/written in an atomic fashion (provided the access is
naturally aligned). You just aren't guaranteed that two
memory accesses are ordered, unless ordering hints are used.

  Obviously you also do not want to use ordering hints unless
you have to. I do not remember seeing a figure for how much
overhead always using ld.acq/st.rel would impose, but don't do this.

  Notes:
  - ld.acq/st.rel are generated by compilers (at least the HP ones)
    when accessing volatile variables.
    I often use *((volatile uint32_t*)&uint32_var) = new_value;
    when I want a st.rel to be executed without having to
    declare the variable as volatile to start with.

  - the HP compilers provide additional qualifiers to allow
    developers to specify exactly what they need when they
    use "volatile":
      http://developer.intel.com/design/itanium/devhpux/sld033.htm
    As the slide specifies, "strongly ordered access" is typically
    the most important/costly aspect of "volatile".

  - Given that ld8/st8 are atomic and that the ILP32 memory model
    of IA64 is only a software convention (the processor only runs
    in 64-bit mode), you could get thread-safe assignments assuming:
      - a 32bit GD
      - a descriptor_t aligned on 64bit boundaries
      - a compiler smart enough to use ld8/st8 only
     To ensure this, the best way would probably be to declare
     descriptor_t as a union containing a uint64_t field and use
     this field for all assignments.
     To make things clear, I certainly do _not_ recommend this approach.

  - IIRC x86/ia32 has a strongly-ordered memory model,
    while UltraSPARC and Alpha don't. I don't know about Power/PowerPC.
    This can represent a challenge for new architectures as a good
    deal of code has been written assuming a strongly ordered memory
    model.

> Now, if you remove the possibility of different threads running on different
> CPUs, you can do it without any special OS support.

  But then it's debatable whether you support threads :-)

  Such a threading model would be closer to co-routines
than to the Posix thread model. I won't argue whether
Posix threads are really the right programming model,
but that's certainly what many programmers expect when
you mention "threads".

  Regards,

  Eric