
    [PATCH] Document Linux's memory barriers [try #3]  
David Howells




 
03-08-06 10:50 PM


The attached patch documents the Linux kernel's memory barriers.

I've updated it from the comments I've been given.

Note that the per-arch notes sections are gone because it's clear that there
are so many exceptions, that it's not worth having them.

I've added a list of references to other documents.

I've tried to get rid of the concept of memory accesses appearing on the bus;
what matters is apparent behaviour with respect to other observers in the
system.

I'm not sure that any mention of interrupts vs interrupt disablement should be
retained... it's unclear that there is actually anything that guarantees that
stuff won't leak out of an interrupt-disabled section and into an interrupt
handler. Paul Mackerras says this isn't valid on powerpc, and looking at the
code seems to confirm that, barring implicit enforcement by the CPU.

There's also some uncertainty with respect to spinlocks vs I/O accesses on
NUMA.

Signed-Off-By: David Howells <dhowells@redhat.com>
---
warthog>diffstat -p1 /tmp/mb.diff
Documentation/memory-barriers.txt |  781 ++++++++++++++++++++++++++++++++++++++
1 files changed, 781 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
new file mode 100644
index 0000000..6eeb7e4
--- /dev/null
+++ b/Documentation/memory-barriers.txt
@@ -0,0 +1,781 @@
+			 ============================
+			 LINUX KERNEL MEMORY BARRIERS
+			 ============================
+
+Contents:
+
+ (*) What are memory barriers?
+
+ (*) Where are memory barriers needed?
+
+     - Accessing devices.
+     - Multiprocessor interaction.
+     - Interrupts.
+
+ (*) Explicit kernel compiler barriers.
+
+ (*) Explicit kernel memory barriers.
+
+ (*) Implicit kernel memory barriers.
+
+     - Locking functions.
+     - Interrupt disabling functions.
+     - Miscellaneous functions.
+
+ (*) Inter-CPU locking barrier effects.
+
+     - Locks vs memory accesses.
+     - Locks vs I/O accesses.
+
+ (*) Kernel I/O barrier effects.
+
+ (*) References.
+
+
+=========================
+WHAT ARE MEMORY BARRIERS?
+=========================
+
+Memory barriers are instructions to both the compiler and the CPU to impose an
+apparent partial ordering between the memory access operations specified either
+side of the barrier.  They request that the sequence of memory events generated
+appears to other components of the system as if the barrier is effective on
+that CPU.
+
+Note that:
+
+ (*) there's no guarantee that the sequence of memory events is _actually_ so
+     ordered.  It's possible for the CPU to do out-of-order accesses _as long
+     as no-one is looking_, and then fix up the memory if someone else tries to
+     see what's going on (for instance a bus master device); what matters is
+     the _apparent_ order as far as other processors and devices are concerned;
+     and
+
+ (*) memory barriers are only guaranteed to act within the CPU processing them,
+     and are not, for the most part, guaranteed to percolate down to other CPUs
+     in the system or to any I/O hardware that that CPU may communicate with.
+
+
+For example, a programmer might take it for granted that the CPU will perform
+memory accesses in exactly the order specified, so that if a CPU is, for
+example, given the following piece of code:
+
+	a = *A;
+	*B = b;
+	c = *C;
+	d = *D;
+	*E = e;
+
+They would then expect that the CPU will complete the memory access for each
+instruction before moving on to the next one, leading to a definite sequence of
+operations as seen by external observers in the system:
+
+	read *A, write *B, read *C, read *D, write *E.
+
+
+Reality is, of course, much messier.  With many CPUs and compilers, this isn't
+always true because:
+
+ (*) reads are more likely to need to be completed immediately to permit
+     execution progress, whereas writes can often be deferred without a
+     problem;
+
+ (*) reads can be done speculatively, and then the result discarded should it
+     prove not to be required;
+
+ (*) the order of the memory accesses may be rearranged to promote better use
+     of the CPU buses and caches;
+
+ (*) reads and writes may be combined to improve performance when talking to
+     the memory or I/O hardware that can do batched accesses of adjacent
+     locations, thus cutting down on transaction setup costs (memory and PCI
+     devices may be able to do this); and
+
+ (*) the CPU's data cache may affect the ordering, though cache-coherency
+     mechanisms should alleviate this - once the write has actually hit the
+     cache.
+
+So what another CPU, say, might actually observe from the above piece of code
+is:
+
+	read *A, read {*C,*D}, write *E, write *B
+
+	(By "read {*C,*D}" I mean a combined single read).
+
+
+It is also guaranteed that a CPU will be self-consistent: it will see its _own_
+accesses appear to be correctly ordered, without the need for a memory
+barrier.  For instance with the following code:
+
+	X = *A;
+	*A = Y;
+	Z = *A;
+
+assuming no intervention by an external influence, it can be taken that:
+
+ (*) X will hold the old value of *A, and will never happen after the write and
+     thus end up being given the value that was assigned to *A from Y instead;
+     and
+
+ (*) Z will always be given the value in *A that was assigned there from Y, and
+     will never happen before the write, and thus end up with the same value
+     that was in *A initially.
+
+(This is ignoring the fact that the value initially in *A may appear to be the
+same as the value assigned to *A from Y).
+
+
+=================================
+WHERE ARE MEMORY BARRIERS NEEDED?
+=================================
+
+Under normal operation, access reordering is probably not going to be a problem
+as a linear program will still appear to operate correctly.  There are,
+however, three circumstances where reordering definitely _could_ be a problem:
+
+
+ACCESSING DEVICES
+-----------------
+
+Many devices can be memory mapped, and so appear to the CPU as if they're just
+memory locations.  However, to control the device, the driver has to make the
+right accesses in exactly the right order.
+
+Consider, for example, an ethernet chipset such as the AMD PCnet32.  It
+presents to the CPU an "address register" and a bunch of "data registers".  The
+way it's accessed is to write the index of the internal register to be accessed
+to the address register, and then read or write the appropriate data register
+to access the chip's internal register, which could - theoretically - be done
+by:
+
+	*ADR = ctl_reg_3;
+	reg = *DATA;
+
+The problem with a clever CPU or a clever compiler is that the write to the
+address register isn't guaranteed to happen before the access to the data
+register, if the CPU or the compiler thinks it is more efficient to defer the
+address write:
+
+	read *DATA, write *ADR
+
+then things will break.
+
+
+In the Linux kernel, however, I/O should be done through the appropriate
+accessor routines - such as inb() or writel() - which know how to make such
+accesses appropriately sequential.
+
+On some systems, I/O writes are not strongly ordered across all CPUs, and so
+locking should be used, and mmiowb() should be issued prior to unlocking the
+critical section.
+
+See Documentation/DocBook/deviceiobook.tmpl for more information.
+
+
+MULTIPROCESSOR INTERACTION
+--------------------------
+
+When there's a system with more than one processor, the CPUs in the system may
+be working on the same set of data at the same time.  This can cause
+synchronisation problems, and the usual way of dealing with them is to use
+locks - but locks are quite expensive, and so it may be preferable to operate
+without the use of a lock if at all possible.  In such a case accesses that
+affect both CPUs may have to be carefully ordered to prevent error.
+
+Consider the R/W semaphore slow path.  In that, a waiting process is queued on
+the semaphore, as noted by it having a record on its stack linked to the
+semaphore's list:
+
+	struct rw_semaphore {
+		...
+		struct list_head waiters;
+	};
+
+	struct rwsem_waiter {
+		struct list_head list;
+		struct task_struct *task;
+	};
+
+To wake up the waiter, the up_read() or up_write() functions have to read the
+pointer from this record to know as to where the next waiter record is, clear
+the task pointer, call wake_up_process() on the task, and release the reference
+held on the waiter's task struct:
+
+	READ waiter->list.next;
+	READ waiter->task;
+	WRITE waiter->task;
+	CALL wakeup
+	RELEASE task
+
+If any of these steps occur out of order, then the whole thing may fail.
+
+Note that the waiter does not get the semaphore lock again - it just waits for
+its task pointer to be cleared.  Since the record is on its stack, this means
+that if the task pointer is cleared _before_ the next pointer in the list is
+read, another CPU might start processing the waiter and it might clobber its
+stack before up*() functions have a chance to read the next pointer.
+
+	CPU 0				CPU 1
+	===============================	===============================
+					down_xxx()
+					Queue waiter
+					Sleep
+	up_yyy()
+	READ waiter->task;
+	WRITE waiter->task;
+	<preempt>
+					Resume processing
+					down_xxx() returns
+					call foo()
+					foo() clobbers *waiter
+	</preempt>
+	READ waiter->list.next;
+	--- OOPS ---
+
+This could be dealt with using a spinlock, but then the down_xxx() function has
+to get the spinlock again after it's been woken up, which is a waste of
+resources.
+
+The way to deal with this is to insert an SMP memory barrier:
+
+	READ waiter->list.next;
+	READ waiter->task;
+	smp_mb();
+	WRITE waiter->task;
+	CALL wakeup
+	RELEASE task
+
+In this case, the barrier makes a guarantee that all memory accesses before the
+barrier will appear to happen before all the memory accesses after the barrier
+with respect to the other CPUs on the system.  It does _not_ guarantee that all
+the memory accesses before the barrier will be complete by the time the barrier
+itself is complete.
+
+SMP memory barriers are normally nothing more than compiler barriers on a
+kernel compiled for a UP system because the CPU orders overlapping accesses
+with respect to itself, and so CPU barriers aren't needed.
+
+
+INTERRUPTS
+----------
+
+A driver may be interrupted by its own interrupt service routine, and thus they
+may interfere with each other's attempts to control or access the device.
+
+This may be alleviated - at least in part - by disabling interrupts (a form of
+locking), such that the critical operations are all contained within the
+interrupt-disabled section in the driver.  Whilst the driver's interrupt
+routine is executing, the driver's core may not run on the same CPU, and its
+interrupt is not permitted to happen again until the current interrupt has been
+handled, thus the interrupt handler does not need to lock against that.
+
+
+However, consider the following example:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	[A is 0 and B is 0]
+	DISABLE IRQ
+	*A = 1;
+	smp_wmb();
+	*B = 2;
+	ENABLE IRQ
+	<interrupt>
+	*A = 3
+					a = *A;
+					b = *B;
+	smp_wmb();
+	*B = 4;
+	</interrupt>
+
+CPU 2 might see *A == 3 and *B == 0, when what it probably ought to see is *B
+== 2 and *A == 1 or *A == 3, or *B == 4 and *A == 3.
+
+This might happen because the write "*B = 2" might occur after the write "*A =
+3" - in which case the former write has leaked from the interrupt-disabled
+section into the interrupt handler.  In this case a lock of some description
+should very probably be used.
+
+
+This sort of problem might also occur with relaxed I/O ordering rules, if it's
+permitted for I/O writes to cross.  For instance, if a driver was talking to an
+ethernet card that sports an address register and a data register:
+
+	DISABLE IRQ
+	writew(ADR, ctl_reg_3);
+	writew(DATA, y);
+	ENABLE IRQ
+	<interrupt>
+	writew(ADR, ctl_reg_4);
+	q = readw(DATA);
+	</interrupt>
+
+In such a case, an mmiowb() is needed, firstly to prevent the first write to
+the address register from occurring after the write to the data register, and
+secondly to prevent the write to the data register from happening after the
+second write to the address register.
+
+
+=================================
+EXPLICIT KERNEL COMPILER BARRIERS
+=================================
+
+The Linux kernel has an explicit compiler barrier function that prevents the
+compiler from moving the memory accesses either side of it to the other side:
+
+	barrier();
+
+This has no direct effect on the CPU, which may then reorder things however it
+wishes.
+
+
+In addition, accesses to "volatile" memory locations and volatile asm
+statements act as implicit compiler barriers.  Note, however, that the use of
+volatile has two negative consequences:
+
+ (1) it causes the generation of poorer code, and
+
+ (2) it can affect serialisation of events in code distant from the declaration
+     (consider a structure defined in a header file that has a volatile member
+     being accessed by the code in a source file).
+
+The Linux coding style therefore strongly favours the use of explicit barriers
+except in small and specific cases.  In general, volatile should be avoided.
+
+
+===============================
+EXPLICIT KERNEL MEMORY BARRIERS
+===============================
+
+The Linux kernel has six basic CPU memory barriers:
+
+		MANDATORY	SMP CONDITIONAL
+		===============	===============
+	GENERAL	mb()		smp_mb()
+	READ	rmb()		smp_rmb()
+	WRITE	wmb()		smp_wmb()
+
+General memory barriers give a guarantee that all memory accesses specified
+before the barrier will appear to happen before all memory accesses specified
+after the barrier with respect to the other components of the system.
+
+Read and write memory barriers give similar guarantees, but only for memory
+reads versus memory reads and memory writes versus memory writes respectively.
+
+All memory barriers imply compiler barriers.
+
+SMP memory barriers are only compiler barriers on uniprocessor compiled systems
+because it is assumed that a CPU will be apparently self-consistent, and will
+order overlapping accesses correctly with respect to itself.
+
+There is no guarantee that any of the memory accesses specified before a memory
+barrier will be complete by the completion of a memory barrier; the barrier can
+be considered to draw a line in that CPU's access queue that accesses of the
+appropriate type may not cross.
+
+There is no guarantee that issuing a memory barrier on one CPU will have any
+direct effect on another CPU or any other hardware in the system.  The indirect
+effect will be the order in which the second CPU sees the first CPU's accesses
+occur.
+
+There is no guarantee that some intervening piece of off-the-CPU hardware[*]
+will not reorder the memory accesses.  CPU cache coherency mechanisms should
+propagate the indirect effects of a memory barrier between CPUs.
+
+ [*] For information on bus mastering DMA and coherency please read:
+
+	Documentation/pci.txt
+	Documentation/DMA-mapping.txt
+	Documentation/DMA-API.txt
+
+Note that these are the _minimum_ guarantees.  Different architectures may give
+more substantial guarantees, but they may not be relied upon outside of arch
+specific code.
+
+
+There are some more advanced barrier functions:
+
+ (*) set_mb(var, value)
+ (*) set_wmb(var, value)
+
+     These assign the value to the variable and then insert at least a write
+     barrier after it, depending on the function.  They aren't guaranteed to
+     insert anything more than a compiler barrier in a UP compilation.
+
+
+===============================
+IMPLICIT KERNEL MEMORY BARRIERS
+===============================
+
+Some of the other functions in the Linux kernel imply memory barriers, amongst
+which are locking and scheduling functions and interrupt management functions.
+
+This specification is a _minimum_ guarantee; any particular architecture may
+provide more substantial guarantees, but these may not be relied upon outside
+of arch specific code.
+
+
+LOCKING FUNCTIONS
+-----------------
+
+All the following locking functions imply barriers:
+
+ (*) spin locks
+ (*) R/W spin locks
+ (*) mutexes
+ (*) semaphores
+ (*) R/W semaphores
+
+In all cases there are variants on a LOCK operation and an UNLOCK operation.
+
+ (*) LOCK operation implication:
+
+     Memory accesses issued after the LOCK will be completed after the LOCK
+     accesses have completed.
+
+     Memory accesses issued before the LOCK may be completed after the LOCK
+     accesses have completed.
+
+ (*) UNLOCK operation implication:
+
+     Memory accesses issued before the UNLOCK will be completed before the
+     UNLOCK accesses have completed.
+
+     Memory accesses issued after the UNLOCK may be completed before the UNLOCK
+     accesses have completed.
+
+ (*) LOCK vs UNLOCK implication:
+
+     The LOCK accesses will be completed before the UNLOCK accesses.
+
+And therefore an UNLOCK followed by a LOCK is equivalent to a full barrier, but
+a LOCK followed by an UNLOCK isn't.
+
+Locks and semaphores may not provide any guarantee of ordering on UP compiled
+systems, and so can't be counted on in such a situation to actually do anything
+at all, especially with respect to I/O accesses, unless combined with interrupt
+disabling operations.
+
+See also the section on "Inter-CPU locking barrier effects".
+
+
+As an example, consider the following:
+
+	*A = a;
+	*B = b;
+	LOCK
+	*C = c;
+	*D = d;
+	UNLOCK
+	*E = e;
+	*F = f;
+
+The following sequence of events is acceptable:
+
+	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
+
+But none of the following are:
+
+	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
+	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
+	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
+	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
+
+
+INTERRUPT DISABLING FUNCTIONS
+-----------------------------
+
+Functions that disable interrupts (LOCK equivalent) and enable interrupts
+(UNLOCK equivalent) will barrier memory and I/O accesses versus memory and I/O
+accesses done in the interrupt handler.  This prevents an interrupt routine
+interfering with accesses made in an interrupt-disabled section of code and
+vice versa.
+
+Note that whilst disabling or enabling interrupts acts as a compiler barrier
+under all circumstances, they only act as memory barriers with respect to
+interrupts, not with respect to nested sections.
+
+Consider the following:
+
+	<interrupt>
+	*X = x;
+	</interrupt>
+	*A = a;
+	SAVE IRQ AND DISABLE
+	*B = b;
+	SAVE IRQ AND DISABLE
+	*C = c;
+	RESTORE IRQ
+	*D = d;
+	RESTORE IRQ
+	*E = e;
+	<interrupt>
+	*Y = y;
+	</interrupt>
+
+It is acceptable to observe the following sequences of events:
+
+	{ INT, *X }, *A, SAVE, *B, SAVE, *C, REST, *D, REST, *E, { INT, *Y }
+	{ INT, *X }, *A, SAVE, *B, SAVE, *C, REST, *D, REST, { INT, *Y, *E }
+	{ INT, *X }, SAVE, SAVE, *A, *B, *C, *D, *E, REST, REST, { INT, *Y }
+	{ INT }, *X, SAVE, SAVE, *A, *B, *C, *D, *E, REST, REST, { INT, *Y }
+	{ INT }, *A, *X, SAVE, SAVE, *B, *C, *D, *E, REST, REST, { INT, *Y }
+
+But not the following:
+
+	{ INT }, SAVE, *A, *B, *X, SAVE, *C, REST, *D, REST, *E, { INT, *Y }
+	{ INT, *X }, *A, SAVE, *B, SAVE, *C, REST, REST, { INT, *Y, *D, *E }
+
+
+MISCELLANEOUS FUNCTIONS
+-----------------------
+
+Other functions that imply barriers:
+
+ (*) schedule() and similar imply full memory barriers.
+
+
+=================================
+INTER-CPU LOCKING BARRIER EFFECTS
+=================================
+
+On SMP systems locking primitives give a more substantial form of barrier: one
+that does affect memory access ordering on other CPUs, within the context of
+conflict on any particular lock.
+
+
+LOCKS VS MEMORY ACCESSES
+------------------------
+
+Consider the following: the system has a pair of spinlocks (M) and (Q), and
+three CPUs; then should the following sequence of events occur:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	*A = a;				*E = e;
+	LOCK M				LOCK Q
+	*B = b;				*F = f;
+	*C = c;				*G = g;
+	UNLOCK M			UNLOCK Q
+	*D = d;				*H = h;
+
+Then there is no guarantee as to what order CPU #3 will see the accesses to *A
+through *H occur in, other than the constraints imposed by the separate locks
+on the separate CPUs. It might, for example, see:
+
+	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
+
+But it won't see any of:
+
+	*B, *C or *D preceding LOCK M
+	*A, *B or *C following UNLOCK M
+	*F, *G or *H preceding LOCK Q
+	*E, *F or *G following UNLOCK Q
+
+
+However, if the following occurs:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	*A = a;
+	LOCK M		[1]
+	*B = b;
+	*C = c;
+	UNLOCK M	[1]
+	*D = d;				*E = e;
+					LOCK M		[2]
+					*F = f;
+					*G = g;
+					UNLOCK M	[2]
+					*H = h;
+
+CPU #3 might see:
+
+	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
+		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
+
+But assuming CPU #1 gets the lock first, it won't see any of:
+
+	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
+	*A, *B or *C following UNLOCK M [1]
+	*F, *G or *H preceding LOCK M [2]
+	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
+
+
+LOCKS VS I/O ACCESSES
+---------------------
+
+Under certain circumstances (such as NUMA), I/O accesses within two spinlocked
+sections on two different CPUs may be seen as interleaved by the PCI bridge.
+
+For example:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	spin_lock(Q);
+	writel(0, ADDR);
+	writel(1, DATA);
+	spin_unlock(Q);
+					spin_lock(Q);
+					writel(4, ADDR);
+					writel(5, DATA);
+					spin_unlock(Q);
+
+may be seen by the PCI bridge as follows:
+
+	WRITE *ADDR = 0, WRITE *ADDR = 4, WRITE *DATA = 1, WRITE *DATA = 5
+
+which would probably break.
+
+What is necessary here is to insert an mmiowb() before dropping the spinlock,
+for example:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	spin_lock(Q);
+	writel(0, ADDR);
+	writel(1, DATA);
+	mmiowb();
+	spin_unlock(Q);
+					spin_lock(Q);
+					writel(4, ADDR);
+					writel(5, DATA);
+					mmiowb();
+					spin_unlock(Q);
+
+this will ensure that the two writes issued on CPU #1 appear at the PCI bridge
+before either of the writes issued on CPU #2.
+
+
+Furthermore, following a write by a read to the same device is okay, because
+the read forces the write to complete before the read is performed:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	spin_lock(Q);
+	writel(0, ADDR);
+	a = readl(DATA);
+	spin_unlock(Q);
+					spin_lock(Q);
+					writel(4, ADDR);
+					b = readl(DATA);
+					spin_unlock(Q);
+
+
+See Documentation/DocBook/deviceiobook.tmpl for more information.
+
+
+==========================
+KERNEL I/O BARRIER EFFECTS
+==========================
+
+When accessing I/O memory, drivers should use the appropriate accessor
+functions:
+
+ (*) inX(), outX():
+
+     These are intended to talk to I/O space rather than memory space, but
+     that's primarily a CPU-specific concept. The i386 and x86_64 processors do
+     indeed have special I/O space access cycles and instructions, but many
+     CPUs don't have such a concept.
+
+     The PCI bus, amongst others, defines an I/O space concept - which on such
+     CPUs as i386 and x86_64 readily maps to the CPU's concept of I/O
+     space.  However, it may also be mapped as a virtual I/O space in the CPU's
+     memory map, particularly on those CPUs that don't support alternate
+     I/O spaces.
+
+     Accesses to this space may be fully synchronous (as on i386), but
+     intermediary bridges (such as the PCI host bridge) may not fully honour
+     that.
+
+     They are guaranteed to be fully ordered with respect to each other.
+
+     They are not guaranteed to be fully ordered with respect to other types of
+     memory and I/O operation.
+
+ (*) readX(), writeX():
+
+     Whether these are guaranteed to be fully ordered and uncombined with
+     respect to each other on the issuing CPU depends on the characteristics
+     defined for the memory window through which they're accessing. On later
+     i386 architecture machines, for example, this is controlled by way of the
+     MTRR registers.
+
+     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
+     provided they're not accessing a prefetchable device.
+
+     However, intermediary hardware (such as a PCI bridge) may indulge in
+     deferral if it so wishes; to flush a write, a read from the same location
+     is preferred[*], but a read from the same device or from configuration
+     space should suffice for PCI.
+
+     [*] NOTE! attempting to read from the same location as was written to may
+     	 cause a malfunction - consider the 16550 Rx/Tx serial registers for
+     	 example.
+
+     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
+     force writes to be ordered.
+
+     Please refer to the PCI specification for more information on interactions
+     between PCI transactions.
+
+ (*) readX_relaxed()
+
+     These are similar to readX(), but are not guaranteed to be ordered in any
+     way. Be aware that there is no I/O read barrier available.
+
+ (*) ioreadX(), iowriteX()
+
+     These will perform as appropriate for the type of access they're actually
+     doing, be it inX()/outX() or readX()/writeX().
+
+
+==========
+REFERENCES
+==========
+
+AMD64 Architecture Programmer's Manual Volume 2: System Programming
+	Chapter 7.1: Memory-Access Ordering
+	Chapter 7.4: Buffering and Combining Memory Writes
+
+IA-32 Intel Architecture Software Developer's Manual, Volume 3:
+System Programming Guide
+	Chapter 7.1: Locked Atomic Operations
+	Chapter 7.2: Memory Ordering
+	Chapter 7.4: Serializing Instructions
+
+The SPARC Architecture Manual, Version 9
+	Chapter 8: Memory Models
+	Appendix D: Formal Specification of the Memory Models
+	Appendix J: programming with the Memory Models
+
+UltraSPARC Programmer Reference Manual
+	Chapter 5: Memory Accesses and Cacheability
+	Chapter 15: Sparc-V9 Memory Models
+
+UltraSPARC III Cu User's Manual
+	Chapter 9: Memory Models
+
+UltraSPARC IIIi Processor User's Manual
+	Chapter 8: Memory Models
+
+UltraSPARC Architecture 2005
+	Chapter 9: Memory
+	Appendix D: Formal Specifications of the Memory Models
+
+UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
+	Chapter 8: Memory Models
+	Appendix F: Caches and Cache Coherency
+
+Solaris Internals, Core Kernel Architecture, p63-68:
+	Chapter 3.3: hardware Considerations for Locks and
+			Synchronization
+
+Unix Systems for Modern Architectures, Symmetric Multiprocessing and Caching
+for Kernel Programmers:
+	Chapter 13: Other Memory Models
+
+Intel Itanium Architecture Software Developer's Manual: Volume 1:
+	Section 2.6: Speculation
+	Section 4.4: Memory Access
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
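
Editorial aside: the write-barrier/read-barrier pairing that the patch describes can be tried out in user space. The sketch below is an analogue under stated assumptions, not kernel code: C11 release and acquire fences stand in for smp_wmb() and smp_rmb() (they are at least as strong), POSIX threads stand in for CPUs, and the names payload, ready and run_message_passing() are hypothetical.

```c
/* User-space analogue only: C11 fences stand in for the kernel's
 * smp_wmb()/smp_rmb(); they are not the kernel primitives themselves. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;		/* plain data, published via 'ready' */
static atomic_int ready;	/* flag the consumer polls */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;					/* write the data */
	atomic_thread_fence(memory_order_release);	/* roughly smp_wmb() */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
	return NULL;
}

static void *consumer(void *arg)
{
	int *seen = arg;

	/* Spin until the flag written by the producer becomes visible. */
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		;
	atomic_thread_fence(memory_order_acquire);	/* roughly smp_rmb() */
	*seen = payload;				/* read the data */
	return NULL;
}

/* Hypothetical helper: runs one producer/consumer pair and returns the
 * payload value that the consumer observed. */
int run_message_passing(void)
{
	pthread_t p, c;
	int seen = 0;
	int rc;

	payload = 0;
	atomic_store(&ready, 0);

	rc = pthread_create(&c, NULL, consumer, &seen);
	assert(rc == 0);
	rc = pthread_create(&p, NULL, producer, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

With both fences in place the consumer can only read payload after it has seen the flag, so run_message_passing() returns 42. Without them the read of payload would be a data race rather than a guaranteed result.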








    Re: [PATCH] Document Linux's memory barriers [try #3]  
David Howells




 
03-09-06 10:49 PM


I'm thinking of adding the attached to the document. Any comments or
objections?

David

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 6eeb7e4..f9a9192 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -4,6 +4,8 @@

 Contents:

+ (*) What do we consider memory?
+
  (*) What are memory barriers?

  (*) Where are memory barriers needed?
@@ -32,6 +34,82 @@ Contents:
  (*) References.


+===========================
+WHAT DO WE CONSIDER MEMORY?
+===========================
+
+For the purpose of this specification, "memory", at least as far as cached CPU
+vs CPU interactions go, has to include the CPU caches in the system.  Although
+any particular read or write may not actually appear outside of the CPU that
+issued it because the CPU was able to satisfy it from its own cache, it's still
+as if the memory access had taken place as far as the other CPUs are concerned
+since the cache coherency and ejection mechanisms will propagate the effects
+upon conflict.
+
+Consider the system logically as:
+
+	    <--- CPU --->         :       <----------- Memory ----------->
+	                          :
+	+--------+    +--------+  :   +--------+    +-----------+
+	|        |    |        |  :   |        |    |           |    +---------+
+	|  CPU   |    | Memory |  :   | CPU    |    |           |    |	       |
+	|  Core  |--->| Access |----->| Cache  |<-->|           |    |	       |
+	|        |    | Queue  |  :   |        |    |           |--->| Memory  |
+	|        |    |        |  :   |        |    |           |    |	       |
+	+--------+    +--------+  :   +--------+    |           |    | 	       |
+	                          :                 | Cache     |    +---------+
+	                          :                 | Coherency |
+	                          :                 | Mechanism |    +---------+
+	+--------+    +--------+  :   +--------+    |           |    |	       |
+	|        |    |        |  :   |        |    |           |    |         |
+	|  CPU   |    | Memory |  :   | CPU    |    |           |--->| Device  |
+	|  Core  |--->| Access |----->| Cache  |<-->|           |    | 	       |
+	|        |    | Queue  |  :   |        |    |           |    | 	       |
+	|        |    |        |  :   |        |    |           |    +---------+
+	+--------+    +--------+  :   +--------+    +-----------+
+	                          :
+	                          :
+
+The CPU core may execute instructions in any order it deems fit, provided the
+expected program causality appears to be maintained.  Some of the instructions
+generate load and store operations which then go into the memory access queue
+to be performed.  The core may place these in the queue in any order it wishes,
+and continue execution until it is forced to wait for an instruction to
+complete.
+
+What memory barriers are concerned with is controlling the order in which
+accesses cross from the CPU side of things to the memory side of things, and
+the order in which the effects are perceived to happen by the other observers
+in the system.
+
+
+Note that the above model does not show uncached memory or I/O accesses.  These
+proceed directly from the queue to the memory or the devices, bypassing any
+cache coherency:
+
+	    <--- CPU --->         :
+       	                          :		+-----+
+	+--------+    +--------+  :             |     |
+	|        |    |        |  :             |     |              +---------+
+	|  CPU   |    | Memory |  :             |     |              |	       |
+	|  Core  |--->| Access |--------------->|     |              |	       |
+	|        |    | Queue  |  :             |     |------------->| Memory  |
+	|        |    |        |  :             |     |              |	       |
+	+--------+    +--------+  :             |     |              | 	       |
+	                          :             |     |              +---------+
+	                          :             | Bus |
+	                          :             |     |              +---------+
+	+--------+    +--------+  :             |     |              |	       |
+	|        |    |        |  :             |     |              |         |
+	|  CPU   |    | Memory |  :             |     |<------------>| Device  |
+	|  Core  |--->| Access |--------------->|     |              | 	       |
+	|        |    | Queue  |  :             |     |              | 	       |
+	|        |    |        |  :             |     |              +---------+
+	+--------+    +--------+  :             |     |
+	                          :		+-----+
+	                          :
+
+
=========================
WHAT ARE MEMORY BARRIERS?
=========================
@@ -448,8 +526,8 @@ In all cases there are variants on a LOC

The LOCK accesses will be completed before the UNLOCK accesses.

-And therefore an UNLOCK followed by a LOCK is equivalent to a full barrier, but
-a LOCK followed by an UNLOCK isn't.
+     Therefore an UNLOCK followed by a LOCK is equivalent to a full barrier,
+     but a LOCK followed by an UNLOCK is not.

 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so can't be counted on in such a situation to actually do anything
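
Editorial aside: the one-way LOCK and UNLOCK semantics discussed in the hunk above can be sketched in user space. This is only an illustration under assumptions: a C11 atomic_flag with acquire ordering plays the role of LOCK, a release-ordered clear plays the role of UNLOCK, and the names demo_lock(), demo_unlock(), run_lock_demo() and ITERATIONS are hypothetical; none of this is how the kernel's spinlocks are implemented.

```c
/* User-space sketch of LOCK (acquire) and UNLOCK (release) semantics. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 100000

static atomic_flag lock_word = ATOMIC_FLAG_INIT;
static long counter;		/* protected by lock_word */

static void demo_lock(void)
{
	/* LOCK: acquire ordering, so accesses issued after the LOCK
	 * cannot appear to happen before it. */
	while (atomic_flag_test_and_set_explicit(&lock_word,
						 memory_order_acquire))
		;
}

static void demo_unlock(void)
{
	/* UNLOCK: release ordering, so accesses issued before the UNLOCK
	 * cannot appear to happen after it. */
	atomic_flag_clear_explicit(&lock_word, memory_order_release);
}

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		demo_lock();
		counter++;	/* critical section */
		demo_unlock();
	}
	return NULL;
}

/* Hypothetical helper: two threads increment the shared counter under
 * the lock; returns the final count. */
long run_lock_demo(void)
{
	pthread_t t[2];
	int rc;

	counter = 0;
	for (int i = 0; i < 2; i++) {
		rc = pthread_create(&t[i], NULL, worker, NULL);
		assert(rc == 0);
	}
	(void)rc;
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return counter;
}
```

Because every increment sits between an acquire and the matching release, no update is lost and run_lock_demo() returns 2 * ITERATIONS. Accesses outside the critical section are still free to move into it, which is why a LOCK followed by an UNLOCK is not a full barrier.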
