 |
|
 |
|
|
 |
branch prediction optimization |
 |
 |
|
|
03-23-07 06:27 AM
Hi
I guess some processors have special branch instructions where you can
inform the CPU which path is likely to be taken. I assume this must
be a useful optimization otherwise why have the special instructions.
If you had a loop like
head = NULL;
tail = NULL;
while (more_data) {
n = new_node();
munge data;
if (tail)
tail->next = n;
else
head = n;
tail = n;
}
and wanted to avoid the branch misprediction penalty, you could
"manually" create the head node first, then enter a loop to append all
following nodes.
Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
be false the first time through the loop and true the rest of the time?
Just curious. I can't really think of an application for this. If it
really mattered you could use likely() and unlikely() like in the
linux kernel, although the first time through the loop will suffer.
I guess some processors will automatically predict the same path as
the last time, thus auto-optimizing this for you. Is that right, or
did I just make that up.
-frank
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-23-07 12:25 PM
Frank Cusack wrote:
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special instructions.
>
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
>
> Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
> be false the first time through the loop and true the rest of the time?
In theory, the compiler can deduce this on its own if it can analyze
"more_data", "new_node()", and "munge data" and see that none of them
affect the value of "tail". Whether it does this is another question
entirely.
Another possible route to this is a compiler that allows input from a
profiler. You compile the code once with some kind of instrumentation
turned on, run it on "typical" data (whatever that means), then feed
profile data into the compiler and compile it again. Then it can go
back through the data and go, "Oh look, in actual practice, the branch
was taken the first time through." Then it can emit the appropriate
code.
> I guess some processors will automatically predict the same path as
> the last time, thus auto-optimizing this for you. Is that right, or
> did I just make that up.
Yes, some processors have dynamic branch prediction. They remember a
set of branch instructions (probably based on the address in memory
where the instruction resides) and which branch was taken last time,
then they assume next time will be the same. I believe some processors
even take this a step further and remember a slightly longer history
than one bit's worth, then only change if the general *trend* is
changing rather than basing everything on the last time. That might
be useful, for example, in this code:
for (int i = 0; i < 100; i++) {
if (i % 4 == 0) {
foo();
} else {
bar();
}
}
In this example, you will take the foo() direction 25% of the time
and bar() the other 75%. But the sequence will be like this:
foo()
bar()
bar()
bar()
foo()
bar()
bar()
bar()
If you always predict that it will branch the same way as last time,
you will be wrong 50% of the time, because you guess wrong when
switching from foo() to bar() and then *again* when switching from
bar() back to foo(). But if you use a longer history where, say,
it takes two consecutive calls to foo() to predict foo() instead of
bar(), you will always predict bar(), and you will only be wrong 25%
of the time. The improvement can be even more dramatic if you change
the expression to "i % 2". The naive approach (predicting this time
will be the same as last time) gets it wrong 100% of the time in
that case!
- Logan
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-23-07 12:25 PM
Frank Cusack <fcusack@fcusack.com> writes:
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special
> instructions.
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
Not related to branch prediction: I would really write this as
head = tail = NULL;
if (more_data) {
head = tail = new_node();
while (more_data) {
munge data;
n = new_node();
tail->next = n;
tail = n;
}
munge data;
/* tail->next = NULL? */
}
to get rid of the if in the middle of the loop, which isn't really a
'conditional', but separates the 'runtime life' of the loop in two
non-overlapping phases (condition has value X the first n iterations,
then value !X for all remaining ones).
[I hope the description can be understand -- I have never needed words
to describe this phenomenon so far]
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-24-07 06:17 PM
On Thu, 22 Mar 2007 22:12:51 -0700, Frank Cusack wrote:
> Hi
>
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special instructions.
>
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
>
> Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
> be false the first time through the loop and true the rest of the time?
>
> Just curious. I can't really think of an application for this. If it
> really mattered you could use likely() and unlikely() like in the
> linux kernel, although the first time through the loop will suffer.
>
> I guess some processors will automatically predict the same path as
> the last time, thus auto-optimizing this for you. Is that right, or
> did I just make that up.
>
> -frank
Just a thought:
#include <stdlib.h>
struct node {
struct node *next;
};
main()
{
int more_data;
struct node *head, **tail;
head = NULL;
tail = &head;
while(more_data ^= random()) {
*tail = malloc(sizeof **tail);
tail = &(*tail)->next;
}
}
Would contain only one condition (more_data), which could be massaged into
the right branch-prediction behaviour. GCC -S- -O8 compiles the main loop
into:
leal -16(%ebp), %esi
jmp .L2
.p2align 4,,7
.L3:
movl $4, (%esp)
call malloc
movl %eax, (%esi)
movl %eax, %esi
.L2:
call random
xorl %eax, %ebx
jne .L3
Which is (given the unpredictability of the loop condition)
about perfect, IMHO.
HTH,
AvK
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-24-07 06:17 PM
On Sat, 24 Mar 2007 13:29:25 +0100 moi <root@localhost.localdomain> wrote:
> Just a thought:
>
> #include <stdlib.h>
> struct node {
> struct node *next;
> };
> main()
> {
> int more_data;
> struct node *head, **tail;
>
> head = NULL;
> tail = &head;
...
> Would contain only one condition
Sure. I was just constructing an example. There are any number of reasons
a loop might have a conditional inside it.
-frank
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-25-07 12:19 AM
On Sat, 24 Mar 2007 11:04:20 -0700, Frank Cusack wrote:
> On Sat, 24 Mar 2007 13:29:25 +0100 moi <root@localhost.localdomain> wrote:
[vbcol=seagreen]
>
> Sure. I was just constructing an example. There are any number of reason
s
> a loop might have a conditional inside it.
>
> -frank
Oops. Sorry, I misread.
Inspecting the manpage for gcc-4.1.1, I learned that some of the
heuristics for optimisation e.g. branchpredicion can be tuned by means of
the --param -flag. This will of course be per-compiled-unit (unless some
ugly #pragma has be introduced).
From an information-theoretic view, the "best" jump/don't jump ratio would
of course be 50/50, but that could never lead to any gain in
branch-prediction ;-] My gut-feeling is that most conditions have a ratio
in the range of about 10/90 or 1/99. It would be nice to be able to inform
the compiler *at least* with execution-path is expected to be taken more
often.
In your example, it would come in handy to be able to hint the compiler
that one of the two paths is expected to happen only once.
Off-topic: Personally, I tend to use a lot of
for( ... ) {
if (condition1) continue;
if (condition2) continue;
..
}
- constructs. (which I find easier to read than:
for( ...) {
if (!condition1 && !condition2) {
..
}}
I expect them to generate exactly the same code.
(the code-path is dominated by the loop-logic, anyway)
AvK
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-25-07 06:21 PM
moi <root@localhost.localdomain> writes:
[...]
> Just a thought:
>
> #include <stdlib.h>
> struct node {
> struct node *next;
> };
> main()
> {
> int more_data;
> struct node *head, **tail;
>
> head = NULL;
> tail = &head;
>
> while(more_data ^= random()) {
> *tail = malloc(sizeof **tail);
> tail = &(*tail)->next;
> }
> }
>
> Would contain only one condition (more_data), which could be massaged into
> the right branch-prediction behaviour. GCC -S- -O8 compiles the main loop
> into:
> leal -16(%ebp), %esi
> jmp .L2
> .p2align 4,,7
> .L3:
> movl $4, (%esp)
> call malloc
> movl %eax, (%esi)
> movl %eax, %esi
> .L2:
> call random
> xorl %eax, %ebx
> jne .L3
JFTR: -O levels above 3 are treated like 3. Apart from that 'gcc
doesn't compile your code into the assembly code you gave', but 'some
particular version of gcc for x86 has done so at some point in time'.
Since the code is totally useless, it as as totally pointless to care
about what is translated into.
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-25-07 06:21 PM
Frank Cusack <fcusack@fcusack.com> writes:
[...]
> Sure. I was just constructing an example. There are any number of
> reasons a loop might have a conditional inside it.
If the 'conditional' is truly 'conditional', there is no way to
'branch-predict' it correctly, because if the necessary sequence of
jumps was predictable, the conditional could have been avoided.
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-25-07 06:21 PM
Rainer Weikusat <rainer.weikusat@sncag.com> writes:
> Frank Cusack <fcusack@fcusack.com> writes:
>
> [...]
>
>
> If the 'conditional' is truly 'conditional', there is no way to
> 'branch-predict' it correctly, because if the necessary sequence of
> jumps was predictable, the conditional could have been avoided.
If the input values to the condition are constant for some time before
the condition is evaluated, branch prediction logic might be able to
notice this and correctly predict the branch.
--
Måns Rullgård
mans@mansr.com
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
 |
Re: branch prediction optimization |
 |
 |
|
|
03-25-07 06:21 PM
Rainer Weikusat wrote:
> Frank Cusack <fcusack@fcusack.com> writes:
[vbcol=seagreen]
> If the 'conditional' is truly 'conditional', there is no way to
> 'branch-predict' it correctly, because if the necessary sequence of
> jumps was predictable, the conditional could have been avoided.
It's true that it could have been avoided, but this isn't necessarily
always the right thing to do. Consider, for example, this code:
int a = 0, b = 0, c = 1;
for (int i = 0; i < 1000; i++) {
if (i % 37 == 0) a += c;
if (i % 11 == 0) a += b;
if (i % 13 == 0) b += a;
if (i < 900) c *= 2;
if (i % 3 == 0) c %= 3;
}
This loop runs for a finite, constant number of iterations that is known in
advance. It can thus be entirely unrolled and all conditionals eliminated.
However, by applying some simple reasoning, it's also easy to know which
way the branches will go the majority of the time. (Yes, as it's written,
you can just run the entire computation at compile time too, but let's
assume a, b, and c are just example values and might vary at runtime.)
Of course, you can't predict the conditional correctly 100% of the time
in the above code. But the first one can be predicted correctly 97.3%
the time, the second one 90.9% of the time, the third one 92.3% of the
time, etc. So it seems like a compiler that emits conditionals might
still be generating pretty good code for this.
The point being, you are absolutely correct that you cannot predict it
correctly 100% of the time. But there are many times when you can. If
there were no regular patterns in the execution of software, branch
prediction would be essentially useless. But then so would caches.
- Logan
[ Post a follow-up to this message ]
|
|
|
 |
|
 |
|
 |
|
|
|
Sponsored Links |
 |
 |
|
|
 |
All times are GMT. The time now is 08:04 AM. |
 |
|
|
 |
|
 |
|
|
 |
|
Forum Rules:
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
|
HTML code is OFF
vB code is ON
Smilies are ON
[IMG] code is OFF
|
|
|
|
Medical and Health forum | Computer Games Reviews | Graphics design forum
|
 |
|
 |
|