Home > Archive > Unix Programming > June 2007 > IBM p690 Power4 Shared Memory
IBM p690 Power4 Shared Memory
| techinfo7@gmail.com 2007-06-18, 7:19 pm |
We are running SUSE Linux Enterprise Server (SLES) 10 on an IBM p690
system with 32 Power4 processors. We are having problems porting
software that is tightly coupled through shared memory (shmget, shmat,
etc.) to this system.
While researching this problem, we found an IBM paper discussing the
Power4 processor. The paper describes the Power4 memory model as
"weakly consistent". Under this model, applications that use shared
memory must issue explicit synchronization (memory barrier)
instructions to ensure that their data becomes visible, in the intended
order, to other programs accessing the shared memory. The paper
describes compiler-specific constructs for generating these
instructions under AIX. A link to the paper is provided below.
http://www-128.ibm.com/developerwor...power4_mem.html
Does anyone know how I can generate the sync, lwsync, eieio, etc.
instructions mentioned in the referenced paper on SLES using g++?
Does anyone have further information on this topic?
Thanks in advance,
Maria
| Chris Thomasson 2007-06-19, 7:22 am |
<techinfo7@gmail.com> wrote in message
news:1182205154.559290.258230@q19g2000prn.googlegroups.com...
[...]
> Does anyone know how can I generate the functions to perform sync,
> lsync, eieio, etc. function calls mentioned in the reference paper on
> SLES using g++? Does anyone have further information on this topic?
You can create your own using GCC inline assembly. See:
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg02374.html
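A minimal sketch of such hand-rolled wrappers, assuming GCC on Linux. The names ppc_sync(), ppc_lwsync(), ppc_isync(), ppc_eieio() and demo_barriers() are hypothetical, not part of any library. On PowerPC the real instructions are emitted; on other architectures the sketch substitutes GCC's __sync_synchronize() full barrier so it still builds. The "memory" clobber additionally stops the compiler from moving memory accesses across each barrier.

```c
#include <assert.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* Full barrier: orders all earlier loads/stores against all later ones. */
static inline void ppc_sync(void)   { __asm__ __volatile__("sync"   ::: "memory"); }
/* Lightweight barrier: orders load-load, load-store and store-store,
   but not a store followed by a later load (cacheable memory). */
static inline void ppc_lwsync(void) { __asm__ __volatile__("lwsync" ::: "memory"); }
/* Instruction-stream barrier, typically used after a branch on a loaded value. */
static inline void ppc_isync(void)  { __asm__ __volatile__("isync"  ::: "memory"); }
/* Intended mainly for ordering accesses to caching-inhibited (I/O) memory. */
static inline void ppc_eieio(void)  { __asm__ __volatile__("eieio"  ::: "memory"); }
#else
/* Non-PowerPC stand-ins: GCC's full memory barrier, so the sketch compiles. */
static inline void ppc_sync(void)   { __sync_synchronize(); }
static inline void ppc_lwsync(void) { __sync_synchronize(); }
static inline void ppc_isync(void)  { __sync_synchronize(); }
static inline void ppc_eieio(void)  { __sync_synchronize(); }
#endif

/* Hypothetical demo: executes each barrier once and returns how many ran. */
int demo_barriers(void) {
    int n = 0;
    ppc_sync();   n++;
    ppc_lwsync(); n++;
    ppc_isync();  n++;
    ppc_eieio();  n++;
    return n;
}
```

Checking the generated code with g++ -S on the target is a cheap way to confirm the instructions actually appear.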
| Jeremy Linton 2007-06-19, 7:22 pm |
Chris Thomasson wrote:
> <techinfo7@gmail.com> wrote in message
> news:1182205154.559290.258230@q19g2000prn.googlegroups.com...
> [...]
>
> You can create your own using the GCC Assembler:
>
> http://gcc.gnu.org/ml/gcc-patches/2005-05/msg02374.html
Those are just for individual memory locations. I assume the problem is
that the application isn't explicitly ordering its shared-memory writes,
and the locking mechanism isn't doing it either. I'm guessing the
original poster needs an lwsync before the lock drop. This guarantees
that the data written before the lock drop becomes visible to other
processors before the drop itself does, so the next lock owner reads
the correct data.
I thought the pthread_mutex_* functions did the right thing, so I'm
guessing the original poster isn't using pthread mutexes. In that case
a little piece of code like this is probably needed:
#define SyncMemory() __asm__("lwsync\n"::"memory")
| Jeremy Linton 2007-06-19, 7:22 pm |
Jeremy Linton wrote:
> #define SyncMemory() __asm__("lwsync\n"::"memory")
Whoops, that should be:
#define SyncMemory() __asm__("lwsync\n":::"memory")
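To illustrate where the corrected macro goes, here is a minimal spinlock sketch that issues SyncMemory() just before the store that drops the lock. The spin_t type, spin_lock(), spin_unlock() and run_counter_demo() are hypothetical; acquisition uses GCC's __sync_lock_test_and_set(), which GCC documents as an acquire barrier, and non-PowerPC builds substitute __sync_synchronize() for lwsync. In a shmget()/shmat() setup the lock word would have to live inside the shared segment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* The corrected macro: lwsync plus a "memory" clobber for the compiler. */
#define SyncMemory() __asm__("lwsync\n":::"memory")
#else
/* Stand-in so the sketch also builds on non-PowerPC machines. */
#define SyncMemory() __sync_synchronize()
#endif

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct { volatile int word; } spin_t;

static void spin_lock(spin_t *l) {
    /* Atomically set the word to 1; a return value of 0 means we got it. */
    while (__sync_lock_test_and_set(&l->word, 1))
        while (l->word)
            ; /* wait with plain reads until the lock looks free */
}

static void spin_unlock(spin_t *l) {
    SyncMemory();  /* order the critical section's writes before the drop */
    l->word = 0;   /* the lock drop itself */
}

static spin_t counter_lock = { 0 };
static long counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&counter_lock);
        counter++;                  /* ordinary, non-atomic update */
        spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final count (200000 if the lock works). */
long run_counter_demo(void) {
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Build with -pthread. Each thread increments the counter 100000 times under the lock, so a correct lock yields 200000.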
| Chris Thomasson 2007-06-20, 1:23 am |
"Jeremy Linton" <replytothelist@nospam.com> wrote in message
news:467839ac$0$24723$4c368faf@roadrunner.com...
>
>
> Chris Thomasson wrote:
> Those are just for individual memory locations. I assume the problem is
> that the application isn't explicitly flushing changes because the locking
> mechanism isn't flushing the changes itself. I'm guessing he needs a
> lwsync before the lock drop. This will guarantee the data written before
> the lock drop is actually synced to memory so the next lock owner reads
> the correct data.
Okay. Yes, I assume you would use a sync after the atomic operation that
acquires the lock, and an lwsync before the atomic operation that
releases it...
I am not sure whether lwsync is equivalent to the SPARC version:
membar #LoadStore | #StoreStore
However, I do believe that sync is equivalent to:
membar #StoreLoad | #StoreStore
[...]
Are you sure that lwsync is a release barrier strong enough to release the
mutex?
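For comparison with the SPARC membar notation: on ordinary cacheable memory, lwsync orders load-load, load-store and store-store pairs but not a store followed by a later load, so it covers #LoadStore | #StoreStore (plus #LoadLoad), which is what a release needs; sync orders all four combinations, including #StoreLoad. In later C11 terms, the placement discussed above can be sketched with fences. Current GCC on POWER is expected to lower acquire and release fences to lwsync and a seq_cst fence to sync, but that is worth confirming in the generated assembly. Note this sketch puts an acquire fence (lwsync) after the acquiring exchange rather than a full sync; sync would also work and is the stronger choice. lock_word, shared_value, lock_acquire(), lock_release(), full_fence() and demo_fenced_lock() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word and the data it protects. */
static atomic_int lock_word = 0;
static int shared_value = 0;

static void lock_acquire(void) {
    /* The atomic operation that acquires the lock... */
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_relaxed))
        ;
    /* ...then a barrier so the critical section cannot move above it. */
    atomic_thread_fence(memory_order_acquire);
}

static void lock_release(void) {
    /* Barrier before the store that releases the lock (expected: lwsync). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
}

/* Full two-way barrier, including store->load ordering (expected: sync). */
static void full_fence(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

int demo_fenced_lock(void) {
    lock_acquire();
    shared_value = 42;   /* protected write */
    lock_release();
    full_fence();
    return shared_value;
}
```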
| techinfo7@gmail.com 2007-06-20, 1:26 pm |
On Jun 19, 1:16 pm, Jeremy Linton <replytothel...@nospam.com> wrote:
> Chris Thomasson wrote:
> Those are just for individual memory locations. I assume the problem is
> that the application isn't explicitly flushing changes because the
> locking mechanism isn't flushing the changes itself. I'm guessing he
> needs a lwsync before the lock drop. This will guarantee the data
> written before the lock drop is actually synced to memory so the next
> lock owner reads the correct data.
>
> I thought pthread_mutex_xx did the right thing. I'm guessing that the
> original author isn't using pthread_mutex's. So he probably needs a
> little piece of code like:
>
> #define SyncMemory() __asm__("lwsync\n"::"memory")
Correct: for performance reasons, we don't use any locking on the
shared memory (shmget, shmat, etc.). In our software, it is very
important that the order of writes to shared memory is preserved.
From what I understand, eieio enforces the ordering of I/O to memory.
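Without locks, the ordering has to be applied by hand on both sides of the shared segment: the writer needs a barrier between its data stores and the store that publishes them, and on a weakly consistent machine the reader also needs a barrier between reading the publication flag and reading the data. The sketch below shows that pattern with shmget()/shmat() and fork(), assuming Linux and GCC. struct shared_block, STORE_ORDER(), LOAD_ORDER() and run_shm_demo() are hypothetical, and non-PowerPC builds fall back to __sync_synchronize().

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define STORE_ORDER() __asm__ __volatile__("lwsync" ::: "memory")
#define LOAD_ORDER()  __asm__ __volatile__("lwsync" ::: "memory")
#else
#define STORE_ORDER() __sync_synchronize()
#define LOAD_ORDER()  __sync_synchronize()
#endif

/* Hypothetical layout of the shared segment. */
struct shared_block {
    int payload[4];
    volatile int ready;   /* 0 until the payload is published */
};

/* Returns the sum the reader observed, or -1 on setup failure. */
int run_shm_demo(void) {
    int id = shmget(IPC_PRIVATE, sizeof(struct shared_block), IPC_CREAT | 0600);
    if (id < 0) return -1;
    struct shared_block *b = shmat(id, NULL, 0);
    if (b == (void *)-1) { shmctl(id, IPC_RMID, NULL); return -1; }
    b->ready = 0;

    pid_t pid = fork();
    if (pid < 0) { shmdt(b); shmctl(id, IPC_RMID, NULL); return -1; }
    if (pid == 0) {                      /* writer process */
        for (int i = 0; i < 4; i++)
            b->payload[i] = i + 1;
        STORE_ORDER();                   /* payload stores before the flag store */
        b->ready = 1;
        _exit(0);
    }

    while (!b->ready)                    /* reader: wait for the flag */
        ;
    LOAD_ORDER();                        /* flag load before the payload loads */
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += b->payload[i];

    waitpid(pid, NULL, 0);
    shmdt(b);
    shmctl(id, IPC_RMID, NULL);
    return sum;
}
```

run_shm_demo() returns 10 (1 + 2 + 3 + 4) when the reader sees the complete payload. lwsync is used rather than eieio because the segment is ordinary cacheable memory.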
I am going to try these suggestions today. I will let you know how
things turn out.
Thanks for the advice,
Maria
| Jeremy Linton 2007-06-20, 1:26 pm |
techinfo7@gmail.com wrote:
> On Jun 19, 1:16 pm, Jeremy Linton <replytothel...@nospam.com> wrote:
> Correct, we don't use any locking mechanisms on shared memory (shmget,
> shmat, etc.) for performance reasons. With our software, it is very
> important that the write order is preserved on data written to shared
> memory. From what I understand, using the eieio will enforce the
> ordering of i/o to memory.
My understanding is that eieio is more of a superscalar pipeline control
than a memory barrier. You might also look at this link
http://www.ibm.com/developerworks/e...es/powerpc.html which
explicitly states:
"However, eieio has no effect on the order in which two accesses are
performed if one access is to device memory and the other is to system
memory. Also, eieio has no effect in ordering loads to system memory,
and it's not recommended for ordering stores to system memory. As a
final note, eieio is not cumulative for device memory accesses. This is
in contrast to the sync instruction, as described below."
Your original link also says "Creates a memory barrier that provides the
same ordering function as the sync instruction except that ordering
applies only to accesses to I/O memory."
So unless your memory is marked caching-inhibited (uncached), the eieio
instruction probably doesn't do what you want.