multiprocess mmap'd space and condition-like variables (clarification please).
|
|
| tyler.retzlaff@gmail.com 2006-12-18, 1:37 am |
| Is the following safe on an SMP system?
p1 and p2 are processes (not threads) that share some mmap()'d space.
The space contains a header which holds a variable that is an offset
into the mmap()'d space.
The space also contains "records"; the offset points to the last one
written in the space.
p1 - wants to do the following:
  loop:
    write a record/entry
    update the offset value in the header
    goto loop
p2 - wants to do the following:
  current_offset = 0
  loop:
    while current_offset < offset in mmap:
      process record
      current_offset += record_size
    goto loop
My understanding is that this will probably work fine on a UP system,
but that this kind of code (with no synchronization mechanisms) will
most certainly blow up on an SMP system, because the "write a record"
and "update the offset" statements may be subject to out-of-order
execution.
Is my assumption correct here?
If my assumption is correct, how do I achieve what I'm trying to do? If
I were using pthreads I would use a pthread_cond_t and then
wait()/signal() when a new record arrived.
If anyone can clarify this, it would be appreciated.
Thanks
| |
| Barry Margolin 2006-12-18, 1:37 am |
| In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
tyler.retzlaff@gmail.com wrote:
> Is the following safe on an SMP system?
>
> p1 and p2 are processes (not threads) that share some mmap()'d space.
>
> The space contains a header which holds a variable that is an offset
> into the mmap()'d space.
> The space also contains "records"; the offset points to the last one
> written in the space.
>
> p1 - wants to do the following:
>   loop:
>     write a record/entry
>     update the offset value in the header
>     goto loop
>
> p2 - wants to do the following:
>   current_offset = 0
>   loop:
>     while current_offset < offset in mmap:
>       process record
>       current_offset += record_size
>     goto loop
>
> My understanding is that this will probably work fine on a UP system,
> but that this kind of code (with no synchronization mechanisms) will
> most certainly blow up on an SMP system, because the "write a record"
> and "update the offset" statements may be subject to out-of-order
> execution.
> Is my assumption correct here?
It might not even be safe on a UP system, unless you declare the offset
variable volatile. Otherwise, the update to offset might be cached in a
register in p1, rather than being written to the shared memory.
> If my assumption is correct how do I achieve what I'm trying to do? If
> I was using pthreads I would use pthread_cond_t and then
> wait()/signal() when a new record arrived.
Declare everything in the mmap'ed space volatile.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| tyler.retzlaff@gmail.com 2006-12-18, 1:37 am |
| Barry Margolin wrote:
> In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
> tyler.retzlaff@gmail.com wrote:
>
>
> It might not even be safe on a UP system, unless you declare the offset
> variable volatile. Otherwise, the update to offset might be cached in a
> register in p1, rather than being written to the shared memory.
You are quite correct: if optimisation is enabled at compile time, this
could blow things up even in the UP case. I should have mentioned that
they are declared volatile.
I'm aware that I could probably use architecture-dependent asm to
insert memory barriers, but I don't want to go that route.
Just to fill in a few more details about what I'm doing (maybe someone
can suggest a pleasant solution).
The two processes in question are not related.
The records I'm shifting between the processes arrive at high frequency
but are relatively low in volume/size (which is why I'm using a map
instead of other forms of IPC such as AF_UNIX sockets).
It is expected that the consumer will fall behind, and if it does it is
important that it can still process the records, even in the case of
producer failure. So the records must persist, which is why mmap()'d
files were chosen. The records themselves are just metadata about more
lengthy work to be done and the files needed to do that work.
Tyler
| |
| Maxim Yegorushkin 2006-12-20, 7:24 am |
|
Barry Margolin wrote:
> In article <1166410587.595968.48050@n67g2000cwd.googlegroups.com>,
> tyler.retzlaff@gmail.com wrote:
>
>
> It might not even be safe on a UP system, unless you declare the offset
> variable volatile. Otherwise, the update to offset might be cached in a
> register in p1, rather than being written to the shared memory.
>
>
> Declare everything in the mmap'ed space volatile.
volatile only prohibits the compiler from caching the variable in a
register and from reordering or combining reads and writes. volatile
has no effect on the processor, which reorders and combines reads and
writes unless there is a memory barrier. Thus, volatile is not
sufficient.
http://en.wikipedia.org/wiki/Memory...g_optimizations
One solution may be to use atomic integer functions,
http://gcc.gnu.org/onlinedocs/gcc-4...c-Builtins.html or
those in <bits/atomicity.h>.
Another one is to use memory barriers.
http://www.linuxjournal.com/article/8211
http://www.linuxjournal.com/article/8212
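As a rough sketch of the memory-barrier approach applied to the original record/offset scheme: the code below uses C11 <stdatomic.h> fences rather than architecture-specific asm. The struct layout, the names shared_area, area_append and area_next, and the sizes are all hypothetical. It also assumes a fixed record size and that atomic_size_t is lock-free on the target, so that it behaves sensibly when the struct lives in memory shared between processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 64u   /* hypothetical fixed record size */
#define MAX_RECORDS 128u  /* hypothetical capacity */

/* Hypothetical layout of the shared region: header, then records. */
struct shared_area {
    atomic_size_t offset;  /* bytes of fully written records */
    unsigned char records[MAX_RECORDS * RECORD_SIZE];
};

/* Producer (p1): write the record, fence, then publish the new offset.
 * Returns 0 on success, -1 if the area is full. */
static int area_append(struct shared_area *a, const void *rec)
{
    size_t off = atomic_load_explicit(&a->offset, memory_order_relaxed);
    if (off + RECORD_SIZE > sizeof a->records)
        return -1;
    memcpy(a->records + off, rec, RECORD_SIZE);
    /* Release fence: the record bytes are ordered before the offset store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&a->offset, off + RECORD_SIZE, memory_order_relaxed);
    return 0;
}

/* Consumer (p2): read the offset, fence, then read a record below it.
 * Copies one record to out and advances *current_offset.
 * Returns 1 if a record was copied, 0 if nothing new is available. */
static int area_next(struct shared_area *a, size_t *current_offset, void *out)
{
    size_t limit = atomic_load_explicit(&a->offset, memory_order_relaxed);
    /* Acquire fence: the record reads are ordered after the offset load. */
    atomic_thread_fence(memory_order_acquire);
    if (*current_offset >= limit)
        return 0;
    memcpy(out, a->records + *current_offset, RECORD_SIZE);
    *current_offset += RECORD_SIZE;
    return 1;
}
```

In a real setup the struct would be placed at the start of the mmap()'d file and p2 would call area_next() in its loop, sleeping or polling when it returns 0. Note that with the fences in place, volatile is not needed on the record data.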
A third solution is to use pthread synchronization primitives. Mutex
locking/unlocking, creating a thread, and signaling a condition each
constitute a memory barrier (and maybe some other functions I can't
recall).
In any of these cases volatile is not necessary.
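For the pthread route between unrelated processes, the mutex and condition variable themselves have to live in the shared mapping and be created with the PTHREAD_PROCESS_SHARED attribute. A minimal sketch follows, assuming the platform supports process-shared primitives; the struct shared_header and the header_* function names are hypothetical. It does not handle a process dying while holding the lock, nor stale lock state left in a file-backed mapping after a restart.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical header placed at the start of the mmap()'d region. */
struct shared_header {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    size_t          offset;   /* bytes of fully written records */
};

/* Run once, by whichever process creates the mapping.
 * Returns 0 on success or a pthread error number. */
static int header_init(struct shared_header *h)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    int rc;

    if ((rc = pthread_mutexattr_init(&ma)) != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&h->lock, &ma);
    pthread_mutexattr_destroy(&ma);
    if (rc != 0)
        return rc;

    if ((rc = pthread_condattr_init(&ca)) != 0)
        return rc;
    rc = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_cond_init(&h->nonempty, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0)
        return rc;

    h->offset = 0;
    return 0;
}

/* Producer: call after the record bytes have been written. */
static void header_publish(struct shared_header *h, size_t new_offset)
{
    pthread_mutex_lock(&h->lock);
    h->offset = new_offset;
    pthread_cond_signal(&h->nonempty);
    pthread_mutex_unlock(&h->lock);
}

/* Consumer: block until the offset moves past current_offset,
 * then return the new offset. */
static size_t header_wait(struct shared_header *h, size_t current_offset)
{
    size_t off;

    pthread_mutex_lock(&h->lock);
    while (h->offset <= current_offset)
        pthread_cond_wait(&h->nonempty, &h->lock);
    off = h->offset;
    pthread_mutex_unlock(&h->lock);
    return off;
}
```

The consumer would call header_wait() with its current_offset, process every record up to the returned value, and repeat. Because the producer writes the record before taking the lock and the consumer reads it after releasing the lock, the lock/unlock pair provides the ordering.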
| |
|
| On Wed, 20 Dec 2006 01:11:00 -0800, Maxim Yegorushkin wrote:
>
> Barry Margolin wrote:
>
> volatile only prohibits the compiler from caching the variable in a
> register and from reordering or combining reads and writes. volatile
> has no effect on the processor, which reorders and combines reads and
> writes unless there is a memory barrier. Thus, volatile is not
> sufficient.
> http://en.wikipedia.org/wiki/Memory...g_optimizations
>
> One solution may be to use atomic integer functions,
> http://gcc.gnu.org/onlinedocs/gcc-4...c-Builtins.html or
> those in <bits/atomicity.h>.
>
> Another one is to use memory barriers.
> http://www.linuxjournal.com/article/8211
> http://www.linuxjournal.com/article/8212
>
> A third solution is to use pthread synchronization primitives. Mutex
> locking/unlocking, creating a thread, and signaling a condition each
> constitute a memory barrier (and maybe some other functions I can't
> recall).
>
> In any of these cases volatile is not necessary.
A fourth candidate, IMHO, could be an implementation of the Bakery
algorithm, which would of course also need volatile.
I don't remember if it works for N>2 processes, but that could be
built on top of N==2, anyway.
AvK
| |
| Maxim Yegorushkin 2006-12-20, 1:18 pm |
|
moi wrote:
> On Wed, 20 Dec 2006 01:11:00 -0800, Maxim Yegorushkin wrote:
>
>
>
> A fourth candidate, IMHO, could be an implementation of the Bakery
> algorithm, which would of course also need volatile.
> I don't remember if it works for N>2 processes, but that could be
> built on top of N==2, anyway.
Most textbook parallel algorithms which only rely on volatile do not
work on modern multiprocessors.
| |
|
|
| Maxim Yegorushkin 2006-12-22, 7:21 pm |
| moi wrote:
> On Wed, 20 Dec 2006 06:43:28 -0800, Maxim Yegorushkin wrote:
>
>
>
> Please define "Most" and "modern" ;-)
Maybe x86 and ARM
(http://www.arm.com/markets/mobile_solutions/app.html)?
> It just needs atomic reads/writes to (shared) memory.
> [plus of course 'volatile', but that is just a way to avoid
> compiler-generated artefacts]
>
> Maybe I understood Lamport incorrectly ?
>
> http://research.microsoft.com/users...ubs.html#bakery
> http://research.microsoft.com/users...t-mutual-solved
I believe you understand it just right.
The algorithm essentially does busy waiting and depends on strict
program ordering of memory reads and writes, which normally requires
memory barriers on processors with weaker memory ordering rules.
Volatile has no effect on the processor; atomic integer instructions or
memory barriers must be used.
AMD64 Architecture Programmer's Manual Volume 2
7.1 Memory-Access Ordering
Implementations of the AMD64 architecture retire instructions in program
order, but implementations can execute instructions in any order.
Implementations can also speculatively execute instructions - executing
instructions before knowing they are needed. Internally, implementations
manage data reads and writes so that instructions complete in order.
However, because implementations can execute instructions out of order
and speculatively, the sequence of memory accesses can also be out of
program order (weakly ordered). Processor implementations adhere to the
following rules governing memory accesses, which can be further
restricted depending on the memory type being accessed:
....
7.1.3 Read/Write Barriers
When the order of memory accesses must be strictly enforced, software
can use read/write barrier instructions to force reads and writes to
proceed in program order. Read/write barrier instructions force all
prior reads or writes to complete before subsequent reads or writes are
executed. The LFENCE, SFENCE, and MFENCE instructions are provided as
dedicated read, write, and read/write barrier instructions
(respectively). Serializing instructions, I/O instructions, and locked
instructions can also be used as read/write barriers.
Table 7-1 on page 168 shows the memory-access ordering possible for each
memory type supported by the AMD64 architecture.
| |
|
| On Fri, 22 Dec 2006 14:01:25 -0800, Maxim Yegorushkin wrote:
> moi wrote:
[Excellent stuff snipped]
I misread the OP. In the typical one-producer/one-consumer (cyclical
buffer) problem, only two pointers are needed, so bakery would be
overkill. Lamport's concurrent reading and writing (each of the two
processes has write access to only one pointer) will only work if the
{write to buffer; bump pointer;} steps in the producer process are
executed in the correct order. So, once out-of-order access hits you,
memory barriers are a way to impose that order; you are probably
correct.
A 'flush the cache' in between (similar to consistency points in
transaction processing) would also be a solution, but I don't know if
that exists in hardware.
AvK
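The one-producer/one-consumer cyclical buffer described above, where each side writes only its own index, could look roughly like the sketch below. It uses C11 release/acquire atomics to keep {write to buffer; bump pointer;} in order. The struct ring, the function names, the int payload and the capacity are all hypothetical; the capacity is assumed to be a power of two so the indices can wrap freely, and atomic_size_t is assumed to be lock-free if the struct is shared between processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SLOTS 8u  /* hypothetical capacity, must be a power of two */

struct ring {
    atomic_size_t head;  /* written only by the producer */
    atomic_size_t tail;  /* written only by the consumer */
    int slots[RING_SLOTS];
};

/* Producer: returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, int value)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;
    r->slots[head % RING_SLOTS] = value;   /* write to buffer */
    /* Release store: the slot write is ordered before the index bump. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: returns 1 and fills *out on success, 0 if the ring is empty. */
static int ring_get(struct ring *r, int *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire load: the slot read below is ordered after this load. */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->slots[tail % RING_SLOTS];
    /* Release store: the slot is handed back to the producer only after
     * it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

Neither side ever blocks here; a caller that gets 0 back has to retry, poll, or sleep, which is the busy-waiting trade-off mentioned earlier in the thread.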