Unix Programming - multiprocess mmap'd space and condition-like variables (clarification please).


Author multiprocess mmap'd space and condition-like variables (clarification please).
tyler.retzlaff@gmail.com

2006-12-18, 1:37 am

Is the following safe on an SMP system?

p1 and p2 are processes (not threads) that share some mmap()'d space.

The space contains a header which holds a variable that is an offset
into the mmap()'d space.
The space also contains "records"; the offset points to the last one
written in the space.

p1 - wants to do the following
loop:
    write a record/entry
    update the offset value in the header
    goto loop

p2 - wants to do the following
    current_offset = 0
loop:
    while current_offset < offset in mmap:
        process record
        current_offset += record_size
    goto loop

My understanding is that this will probably work fine on a UP system,
but this kind of code (with no synchronization mechanisms) will most
certainly blow up on an SMP system, because the "write a record/entry"
and "update the offset" statements may be subject to out-of-order
execution. Is my assumption correct here?

If my assumption is correct how do I achieve what I'm trying to do? If
I was using pthreads I would use pthread_cond_t and then
wait()/signal() when a new record arrived.

If anyone can clarify this, it would be appreciated.

Thanks

Barry Margolin

2006-12-18, 1:37 am

In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
tyler.retzlaff@gmail.com wrote:

> Is the following safe on an SMP system?
>
> p1 and p2 are processes (not threads) that share some mmap()'d space.
>
> The space contains a header which holds variable that is an offset into
> the mmap()'d space.
> The space also contains "records" to which the offset points to the
> last written in the space.
>
> p1 - wants to do the following
> loop:
> write a record/entry
> updates the offset value in the header
> goto loop
>
> p2 - wants to do the following
> current_offset = 0
> while current_offset < offset in mmap:
> process record
> current_offset += record_size
> goto loop
>
> My understanding is that this will probably work fine on a UP system
> but if this kind of code (with no synchronization mechanisms) will most
> certainly blow up on a SMP system because the write a record entry,
> update the offset statements may be subject to out of order execution.
> Is my assumption correct here?


It might not even be safe on a UP system, unless you declare the offset
variable volatile. Otherwise, the update to offset might be cached in a
register in p1, rather than being written to the shared memory.

> If my assumption is correct how do I achieve what I'm trying to do? If
> I was using pthreads I would use pthread_cond_t and then
> wait()/signal() when a new record arrived.


Declare everything in the mmap'ed space volatile.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
tyler.retzlaff@gmail.com

2006-12-18, 1:37 am

Barry Margolin wrote:
> In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
> tyler.retzlaff@gmail.com wrote:
>
>
> It might not even be safe on a UP system, unless you declare the offset
> variable volatile. Otherwise, the update to offset might be cached in a
> register in p1, rather than being written to the shared memory.


You are quite correct: if optimisation is enabled at compile time, this
could blow things up even in the UP case. I should have mentioned that
they are declared volatile.

I'm aware that I could probably use architecture-dependent asm to
insert memory barriers but I don't want to go that route.

Just to fill in a few more details about what I'm doing (maybe someone
can suggest a pleasant solution).

The two processes in question are not related.

The records I'm shifting between the processes arrive at high
frequency but are relatively low in volume/size (which is why I'm using
a map instead of other forms of IPC such as AF_UNIX sockets).

It is expected that the consumer will fall behind and if it does it is
important that it can process the records, even in the case of producer
failure. So the records must persist which is why mmap()'d files were
chosen. The records themselves are just metadata about more lengthy
work to be done and files needed to do that work.
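
For illustration, a minimal sketch of how two unrelated processes might
attach to the same file-backed region. The helper name map_shared_file,
the 0600 mode, and the grow-only sizing policy are assumptions made for
this example and are not taken from the thread. With MAP_SHARED, stores
made by one process land in the file's pages, so they stay readable by
the other process even if the writer exits.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read/write and shared, creating the
 * file if needed and growing it to `size` bytes if it is shorter.
 * Returns NULL on failure. */
void *map_shared_file(const char *path, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Only grow the file; never shrink one that already holds records.
     * Bytes added by ftruncate() read back as zero. */
    struct stat st;
    if (fstat(fd, &st) != 0 ||
        (st.st_size < (off_t)size && ftruncate(fd, (off_t)size) != 0)) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Both the producer and the consumer would call this with the same path
and size. Surviving a machine crash (rather than just a process crash)
would additionally need msync(), which this sketch leaves out.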

Tyler



Maxim Yegorushkin

2006-12-20, 7:24 am


Barry Margolin wrote:

> In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
> tyler.retzlaff@gmail.com wrote:
>
>
> It might not even be safe on a UP system, unless you declare the offset
> variable volatile. Otherwise, the update to offset might be cached in a
> register in p1, rather than being written to the shared memory.
>
>
> Declare everything in the mmap'ed space volatile.


volatile only prohibits the compiler from caching the variable in a
register and from reordering/combining reads and writes to it. volatile
has no effect on the processor, which reorders and combines reads and
writes unless there is a memory barrier. Thus, volatile is not
sufficient.
http://en.wikipedia.org/wiki/Memory...g_optimizations

One solution may be to use atomic integer functions,
http://gcc.gnu.org/onlinedocs/gcc-4...c-Builtins.html or
those in <bits/atomicity.h>.
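
As a rough sketch of the atomic approach, here is the original poster's
"write record, then bump offset" scheme written with C11 <stdatomic.h>
rather than the GCC builtins linked above (C11 atomics are a later,
portable spelling of the same idea). The names log_header, log_append
and log_poll, the fixed RECORD_SIZE, and the layout (records directly
after the header) are all assumptions for illustration. It also assumes
size_t atomics are lock-free on the target, which is what makes them
usable across processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 64u /* assumed fixed record size */

/* Assumed header at the start of the mapping; records follow it. */
struct log_header {
    _Atomic size_t offset; /* bytes of fully written records */
};

static unsigned char *log_records(struct log_header *h)
{
    return (unsigned char *)(h + 1);
}

/* Run once by whoever creates the region. */
void log_init(struct log_header *h)
{
    atomic_init(&h->offset, 0);
}

/* Producer (p1): copy the record in, then publish it.
 * Returns 0 on success, -1 if the record area is full. */
int log_append(struct log_header *h, size_t capacity, const void *rec)
{
    assert(rec != NULL);
    size_t off = atomic_load_explicit(&h->offset, memory_order_relaxed);
    if (off + RECORD_SIZE > capacity)
        return -1;
    memcpy(log_records(h) + off, rec, RECORD_SIZE);
    /* Release: the record bytes are visible before the new offset is. */
    atomic_store_explicit(&h->offset, off + RECORD_SIZE,
                          memory_order_release);
    return 0;
}

/* Consumer (p2): if a record is ready at *cursor, copy it to `out`,
 * advance *cursor and return 1; otherwise return 0. */
int log_poll(struct log_header *h, size_t *cursor, void *out)
{
    /* Acquire: pairs with the release store in log_append(). */
    size_t end = atomic_load_explicit(&h->offset, memory_order_acquire);
    if (*cursor >= end)
        return 0;
    memcpy(out, log_records(h) + *cursor, RECORD_SIZE);
    *cursor += RECORD_SIZE;
    return 1;
}
```

The release/acquire pair is what guarantees that a consumer which sees
the new offset also sees the record it covers. The consumer here polls;
it does not block waiting for a record.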

Another one is to use memory barriers.
http://www.linuxjournal.com/article/8211
http://www.linuxjournal.com/article/8212

A third solution is to use pthread synchronization primitives. Mutex
locking/unlocking, creating a thread, and signaling a condition each
constitute a memory barrier (as may some other functions I can't recall).
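
Since the original question involves processes rather than threads, the
pthread route needs the primitives to live in the shared mapping and to
be created with the PTHREAD_PROCESS_SHARED attribute. A minimal sketch
follows; the struct shared_sync layout and the function names are
hypothetical, and it assumes the platform supports process-shared
mutexes and condition variables.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed header placed at the start of the shared mapping. */
struct shared_sync {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    size_t          offset; /* protected by lock */
};

/* Run once, by whichever process creates the mapping.
 * Returns 0 on success or a pthread error number. */
int shared_sync_init(struct shared_sync *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    int rc;

    if ((rc = pthread_mutexattr_init(&ma)) != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&s->lock, &ma);
    pthread_mutexattr_destroy(&ma);
    if (rc != 0)
        return rc;

    if ((rc = pthread_condattr_init(&ca)) != 0)
        return rc;
    rc = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_cond_init(&s->nonempty, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0)
        return rc;

    s->offset = 0;
    return 0;
}

/* Producer: call after the record bytes have been written. */
void shared_publish(struct shared_sync *s, size_t new_offset)
{
    pthread_mutex_lock(&s->lock);
    assert(new_offset >= s->offset);
    s->offset = new_offset;
    pthread_cond_signal(&s->nonempty);
    pthread_mutex_unlock(&s->lock);
}

/* Consumer: block until offset moves past `seen`; return the new value. */
size_t shared_wait(struct shared_sync *s, size_t seen)
{
    pthread_mutex_lock(&s->lock);
    while (s->offset <= seen)
        pthread_cond_wait(&s->nonempty, &s->lock);
    size_t now = s->offset;
    pthread_mutex_unlock(&s->lock);
    return now;
}
```

One caveat for this particular use case: if the producer dies while
holding the lock, the consumer blocks forever unless a robust mutex is
used, and pthread objects stored in a file that outlives both processes
would need re-initialising on restart. The sketch ignores both issues.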

In any of these cases volatile is not necessary.

moi

2006-12-20, 7:24 am

On Wed, 20 Dec 2006 01:11:00 -0800, Maxim Yegorushkin wrote:

>
> Barry Margolin wrote:


>
> volatile only prohibits the compiler from caching the variable in a
> register and reordering/combining reads and writes. volatile has no
> effect on the processor, which reorders and combines reads and writes,
> unless there is a memory barrier. Thus, volatile is not sufficient.
> http://en.wikipedia.org/wiki/Memory...g_optimizations
>
> One solution may be to use atomic integer functions,
> http://gcc.gnu.org/onlinedocs/gcc-4...c-Builtins.html or
> those in <bits/atomicity.h>.
>
> Another one is to use memory barriers.
> http://www.linuxjournal.com/article/8211
> http://www.linuxjournal.com/article/8212
>
> Third solution is to use pthread synchronization primitives. Mutex
> locking/unlocking, creating a thread, signaling a condition constitute
> a memory barrier (and may be some other function I can't recall).
>
> In either of these cases volatile is not necessary.


A fourth candidate, IMHO, could be an implementation of the Bakery
algorithm, which would of course also need volatile.
I don't remember if it works for N>2 processes, but that could be
built on top of N==2, anyway.

AvK



Maxim Yegorushkin

2006-12-20, 1:18 pm


moi wrote:

> On Wed, 20 Dec 2006 01:11:00 -0800, Maxim Yegorushkin wrote:
>
>
>
> A fourth candidate, IMHO, could be an implementation of the Bakery
> algorithm, which would of course also need volatile.
> I don't remember if it works for N>2 processes, but that could be
> built on top of N==2, anyway.


Most textbook parallel algorithms which only rely on volatile do not
work on modern multiprocessors.

moi

2006-12-20, 1:18 pm

On Wed, 20 Dec 2006 06:43:28 -0800, Maxim Yegorushkin wrote:

>
> moi wrote:
>


>
> Most textbook parallel algorithms which only rely on volatile do not
> work on modern multiprocessors.


Please define "Most" and "modern" ;-)

It just needs atomic reads/writes to (shared) memory.
[plus of course 'volatile', but that is just a way to avoid
compiler-generated artefacts]

Maybe I understood Lamport incorrectly ?

http://research.microsoft.com/users...ubs.html#bakery
http://research.microsoft.com/users...t-mutual-solved


HTH,
AvK
Maxim Yegorushkin

2006-12-22, 7:21 pm

moi wrote:
> On Wed, 20 Dec 2006 06:43:28 -0800, Maxim Yegorushkin wrote:
>
>
>
> Please define "Most" and "modern" ;-)


Maybe x86 and ARM
(http://www.arm.com/markets/mobile_solutions/app.html)?

> It just needs atomic reads/writes to (shared) memory.
> [plus of course 'volatile', but that is just a way to avoid
> compiler-generated artefacts]
>
> Maybe I understood Lamport incorrectly ?
>
> http://research.microsoft.com/users...ubs.html#bakery
> http://research.microsoft.com/users...t-mutual-solved


I believe you understand it just right.

The algorithm essentially does busy waiting and depends on strict
program ordering of memory reads and writes, which normally requires
memory barriers on processors with weaker memory ordering rules.
Volatile has no effect on the processor; atomic integer instructions or
memory barriers must be used.

AMD64 Architecture Programmer's Manual Volume 2

7.1 Memory-Access Ordering

Implementations of the AMD64 architecture retire instructions in
program order, but implementations can execute instructions in any
order. Implementations can also speculatively execute instructions -
executing instructions before knowing they are needed. Internally,
implementations manage data reads and writes so that instructions
complete in order. However, because implementations can execute
instructions out of order and speculatively, the sequence of memory
accesses can also be out of program order (weakly ordered). Processor
implementations adhere to the following rules governing memory
accesses, which can be further restricted depending on the memory type
being accessed:

....

7.1.3 Read/Write Barriers
When the order of memory accesses must be strictly enforced, software
can use read/write barrier instructions to force reads and writes to
proceed in program order. Read/write barrier instructions force all
prior reads or writes to complete before subsequent reads or writes are
executed. The LFENCE, SFENCE, and MFENCE instructions are provided as
dedicated read, write, and read/write barrier instructions
(respectively). Serializing instructions, I/O instructions, and locked
instructions can also be used as read/write barriers.
Table 7-1 on page 168 shows the memory-access ordering possible for
each memory type supported by the AMD64 architecture.
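
To make the barrier idea concrete without architecture-specific asm,
here is a small sketch using C11's atomic_thread_fence(), which leaves
it to the compiler to emit whatever fence instruction (if any) the
target needs. The struct mailbox type and the put/get functions are
hypothetical names invented for this example; the only point is where
the two fences sit relative to the data and flag accesses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical one-slot mailbox living in shared memory. */
struct mailbox {
    int        payload; /* plain, non-atomic data */
    atomic_int ready;   /* 0 = empty, 1 = payload valid */
};

/* Writer: store the data, fence, then raise the flag. */
void mailbox_put(struct mailbox *m, int value)
{
    assert(atomic_load_explicit(&m->ready, memory_order_relaxed) == 0);
    m->payload = value;
    /* Write barrier: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader: check the flag, fence, then read the data.
 * Returns 1 and fills *value if a payload is present, else 0. */
int mailbox_get(struct mailbox *m, int *value)
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
        return 0;
    /* Read barrier: the flag load is ordered before the payload load. */
    atomic_thread_fence(memory_order_acquire);
    *value = m->payload;
    return 1;
}
```

Without the two fences, nothing stops the compiler or a weakly ordered
processor from making the flag visible before the payload, which is
exactly the failure the original poster was worried about.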

moi

2006-12-23, 7:27 am

On Fri, 22 Dec 2006 14:01:25 -0800, Maxim Yegorushkin wrote:
> moi wrote:

[Excellent stuff snipped]

I misread the OP. In the typical one-producer/one-consumer (cyclical
buffer) problem, only two pointers are needed, so bakery would be
overkill. Lamport's concurrent reading and writing (each of the two
processes has write access to only one pointer) will only work if the
{write to buffer; bump pointer;} steps in the producer process are
executed in the correct order. So, once out-of-order access hits you,
memory barriers are a way to impose that order; you are probably correct.
A 'flush the cache' in between (similar to consistency points in
transaction processing) would also be a solution, but I don't know if
that exists in hardware.
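
The two-pointer cyclical buffer described above can be sketched as
follows, with each index written by only one side and release/acquire
ordering standing in for the "correct order" requirement. The struct
ring type, the RING_SLOTS capacity, and the int payload are assumptions
chosen to keep the example short; this is one possible rendering, not
Lamport's original formulation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SLOTS 8u /* assumed capacity */

/* Power of two, so the indices may wrap around SIZE_MAX safely. */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0u,
              "RING_SLOTS must be a power of two");

struct ring {
    _Atomic size_t head; /* written only by the producer */
    _Atomic size_t tail; /* written only by the consumer */
    int slots[RING_SLOTS];
};

/* Producer: returns 1 on success, 0 if the ring is full. */
int ring_push(struct ring *r, int v)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;
    r->slots[head % RING_SLOTS] = v; /* write to buffer ... */
    /* ... then bump the pointer; release keeps the two in order. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: returns 1 and fills *out on success, 0 if the ring is empty. */
int ring_pop(struct ring *r, int *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->slots[tail % RING_SLOTS];
    /* Release: the slot has been read before the producer may reuse it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

A zero-filled struct ring is a valid empty ring, which fits a freshly
created mmap()'d file. The design only holds for exactly one producer
and one consumer.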

AvK

