|
Home > Archive > Unix Programming > March 2007 > branch prediction optimization
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
branch prediction optimization
|
|
| Frank Cusack 2007-03-23, 1:27 am |
| Hi
I guess some processors have special branch instructions where you can
inform the CPU which path is likely to be taken. I assume this must
be a useful optimization otherwise why have the special instructions.
If you had a loop like
head = NULL;
tail = NULL;
while (more_data) {
n = new_node();
munge data;
if (tail)
tail->next = n;
else
head = n;
tail = n;
}
and wanted to avoid the branch misprediction penalty, you could
"manually" create the head node first, then enter a loop to append all
following nodes.
Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
be false the first time through the loop and true the rest of the time?
Just curious. I can't really think of an application for this. If it
really mattered you could use likely() and unlikely() like in the
linux kernel, although the first time through the loop will suffer.
I guess some processors will automatically predict the same path as
the last time, thus auto-optimizing this for you. Is that right, or
did I just make that up.
-frank
| |
| Logan Shaw 2007-03-23, 7:25 am |
| Frank Cusack wrote:
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special instructions.
>
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
>
> Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
> be false the first time through the loop and true the rest of the time?
In theory, the compiler can deduce this on its own if it can analyze
"more_data", "new_node()", and "munge data" and see that none of them
affect the value of "tail". Whether it does this is another question
entirely.
Another possible route to this is a compiler that allows input from a
profiler. You compile the code once with some kind of instrumentation
turned on, run it on "typical" data (whatever that means), then feed
profile data into the compiler and compile it again. Then it can go
back through the data and go, "Oh look, in actual practice, the branch
was taken the first time through." Then it can emit the appropriate
code.
> I guess some processors will automatically predict the same path as
> the last time, thus auto-optimizing this for you. Is that right, or
> did I just make that up.
Yes, some processors have dynamic branch prediction. They remember a
set of branch instructions (probably based on the address in memory
where the instruction resides) and which branch was taken last time,
then they assume next time will be the same. I believe some processors
even take this a step further and remember a slightly longer history
than one bit's worth, then only change if the general *trend* is
changing rather than basing everything on the last time. That might
be useful, for example, in this code:
for (int i = 0; i < 100; i++) {
if (i % 4 == 0) {
foo();
} else {
bar();
}
}
In this example, you will take the foo() direction 25% of the time
and bar() the other 75%. But the sequence will be like this:
foo()
bar()
bar()
bar()
foo()
bar()
bar()
bar()
If you always predict that it will branch the same way as last time,
you will be wrong 50% of the time, because you guess wrong when
switching from foo() to bar() and then *again* when switching from
bar() back to foo(). But if you use a longer history where, say,
it takes two consecutive calls to foo() to predict foo() instead of
bar(), you will always predict bar(), and you will only be wrong 25%
of the time. The improvement can be even more dramatic if you change
the expression to "i % 2". The naive approach (predicting this time
will be the same as last time) gets it wrong 100% of the time in
that case!
- Logan
| |
| Rainer Weikusat 2007-03-23, 7:25 am |
| Frank Cusack <fcusack@fcusack.com> writes:
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special
> instructions.
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
Not related to branch prediction: I would really write this as
head = tail = NULL;
if (more_data) {
head = tail = new_node();
while (more_data) {
munge data;
n = new_node();
tail->next = n;
tail = n;
}
munge data;
/* tail->next = NULL? */
}
to get rid of the if in the middle of the loop, which isn't really a
'conditional', but separates the 'runtime life' of the loop in two
non-overlapping phases (condition has value X the first n iterations,
then value !X for all remaining ones).
[I hope the description can be understand -- I have never needed words
to describe this phenomenon so far]
| |
|
| On Thu, 22 Mar 2007 22:12:51 -0700, Frank Cusack wrote:
> Hi
>
> I guess some processors have special branch instructions where you can
> inform the CPU which path is likely to be taken. I assume this must
> be a useful optimization otherwise why have the special instructions.
>
> If you had a loop like
>
> head = NULL;
> tail = NULL;
> while (more_data) {
> n = new_node();
> munge data;
> if (tail)
> tail->next = n;
> else
> head = n;
> tail = n;
> }
>
> and wanted to avoid the branch misprediction penalty, you could
> "manually" create the head node first, then enter a loop to append all
> following nodes.
>
> Is there a way to tell the compiler (say gcc) that the 'if (tail)' will
> be false the first time through the loop and true the rest of the time?
>
> Just curious. I can't really think of an application for this. If it
> really mattered you could use likely() and unlikely() like in the
> linux kernel, although the first time through the loop will suffer.
>
> I guess some processors will automatically predict the same path as
> the last time, thus auto-optimizing this for you. Is that right, or
> did I just make that up.
>
> -frank
Just a thought:
#include <stdlib.h>
struct node {
struct node *next;
};
main()
{
int more_data;
struct node *head, **tail;
head = NULL;
tail = &head;
while(more_data ^= random()) {
*tail = malloc(sizeof **tail);
tail = &(*tail)->next;
}
}
Would contain only one condition (more_data), which could be massaged into
the right branch-prediction behaviour. GCC -S- -O8 compiles the main loop
into:
leal -16(%ebp), %esi
jmp .L2
.p2align 4,,7
..L3:
movl $4, (%esp)
call malloc
movl %eax, (%esi)
movl %eax, %esi
..L2:
call random
xorl %eax, %ebx
jne .L3
Which is (given the unpredictability of the loop condition)
about perfect, IMHO.
HTH,
AvK
| |
| Frank Cusack 2007-03-24, 1:17 pm |
| On Sat, 24 Mar 2007 13:29:25 +0100 moi <root@localhost.localdomain> wrote:
> Just a thought:
>
> #include <stdlib.h>
> struct node {
> struct node *next;
> };
> main()
> {
> int more_data;
> struct node *head, **tail;
>
> head = NULL;
> tail = &head;
....
> Would contain only one condition
Sure. I was just constructing an example. There are any number of reasons
a loop might have a conditional inside it.
-frank
| |
|
| On Sat, 24 Mar 2007 11:04:20 -0700, Frank Cusack wrote:
> On Sat, 24 Mar 2007 13:29:25 +0100 moi <root@localhost.localdomain> wrote:
[vbcol=seagreen]
>
> Sure. I was just constructing an example. There are any number of reasons
> a loop might have a conditional inside it.
>
> -frank
Oops. Sorry, I misread.
Inspecting the manpage for gcc-4.1.1, I learned that some of the
heuristics for optimisation e.g. branchpredicion can be tuned by means of
the --param -flag. This will of course be per-compiled-unit (unless some
ugly #pragma has be introduced).
From an information-theoretic view, the "best" jump/don't jump ratio would
of course be 50/50, but that could never lead to any gain in
branch-prediction ;-] My gut-feeling is that most conditions have a ratio
in the range of about 10/90 or 1/99. It would be nice to be able to inform
the compiler *at least* with execution-path is expected to be taken more
often.
In your example, it would come in handy to be able to hint the compiler
that one of the two paths is expected to happen only once.
Off-topic: Personally, I tend to use a lot of
for( ... ) {
if (condition1) continue;
if (condition2) continue;
...
}
- constructs. (which I find easier to read than:
for( ...) {
if (!condition1 && !condition2) {
...
}}
I expect them to generate exactly the same code.
(the code-path is dominated by the loop-logic, anyway)
AvK
| |
| Rainer Weikusat 2007-03-25, 1:21 pm |
| moi <root@localhost.localdomain> writes:
[...]
> Just a thought:
>
> #include <stdlib.h>
> struct node {
> struct node *next;
> };
> main()
> {
> int more_data;
> struct node *head, **tail;
>
> head = NULL;
> tail = &head;
>
> while(more_data ^= random()) {
> *tail = malloc(sizeof **tail);
> tail = &(*tail)->next;
> }
> }
>
> Would contain only one condition (more_data), which could be massaged into
> the right branch-prediction behaviour. GCC -S- -O8 compiles the main loop
> into:
> leal -16(%ebp), %esi
> jmp .L2
> .p2align 4,,7
> .L3:
> movl $4, (%esp)
> call malloc
> movl %eax, (%esi)
> movl %eax, %esi
> .L2:
> call random
> xorl %eax, %ebx
> jne .L3
JFTR: -O levels above 3 are treated like 3. Apart from that 'gcc
doesn't compile your code into the assembly code you gave', but 'some
particular version of gcc for x86 has done so at some point in time'.
Since the code is totally useless, it as as totally pointless to care
about what is translated into.
| |
| Rainer Weikusat 2007-03-25, 1:21 pm |
| Frank Cusack <fcusack@fcusack.com> writes:
[...]
> Sure. I was just constructing an example. There are any number of
> reasons a loop might have a conditional inside it.
If the 'conditional' is truly 'conditional', there is no way to
'branch-predict' it correctly, because if the necessary sequence of
jumps was predictable, the conditional could have been avoided.
| |
| Måns Rullgård 2007-03-25, 1:21 pm |
| Rainer Weikusat <rainer.weikusat@sncag.com> writes:
> Frank Cusack <fcusack@fcusack.com> writes:
>
> [...]
>
>
> If the 'conditional' is truly 'conditional', there is no way to
> 'branch-predict' it correctly, because if the necessary sequence of
> jumps was predictable, the conditional could have been avoided.
If the input values to the condition are constant for some time before
the condition is evaluated, branch prediction logic might be able to
notice this and correctly predict the branch.
--
Måns Rullgård
mans@mansr.com
| |
| Logan Shaw 2007-03-25, 1:21 pm |
| Rainer Weikusat wrote:
> Frank Cusack <fcusack@fcusack.com> writes:
[vbcol=seagreen]
> If the 'conditional' is truly 'conditional', there is no way to
> 'branch-predict' it correctly, because if the necessary sequence of
> jumps was predictable, the conditional could have been avoided.
It's true that it could have been avoided, but this isn't necessarily
always the right thing to do. Consider, for example, this code:
int a = 0, b = 0, c = 1;
for (int i = 0; i < 1000; i++) {
if (i % 37 == 0) a += c;
if (i % 11 == 0) a += b;
if (i % 13 == 0) b += a;
if (i < 900) c *= 2;
if (i % 3 == 0) c %= 3;
}
This loop runs for a finite, constant number of iterations that is known in
advance. It can thus be entirely unrolled and all conditionals eliminated.
However, by applying some simple reasoning, it's also easy to know which
way the branches will go the majority of the time. (Yes, as it's written,
you can just run the entire computation at compile time too, but let's
assume a, b, and c are just example values and might vary at runtime.)
Of course, you can't predict the conditional correctly 100% of the time
in the above code. But the first one can be predicted correctly 97.3%
the time, the second one 90.9% of the time, the third one 92.3% of the
time, etc. So it seems like a compiler that emits conditionals might
still be generating pretty good code for this.
The point being, you are absolutely correct that you cannot predict it
correctly 100% of the time. But there are many times when you can. If
there were no regular patterns in the execution of software, branch
prediction would be essentially useless. But then so would caches.
- Logan
| |
| Eric Sosman 2007-03-25, 1:21 pm |
| Rainer Weikusat wrote:
> Frank Cusack <fcusack@fcusack.com> writes:
>
> [...]
>
>
> If the 'conditional' is truly 'conditional', there is no way to
> 'branch-predict' it correctly, because if the necessary sequence of
> jumps was predictable, the conditional could have been avoided.
The idea isn't to make a correct prediction for every
execution of the branch, but for the majority of them.
for (i = 0; i < 1000000; ++i) {
loop body ...
}
There are at least two commonly-used ways to compile this:
i = 0;
label1:
if (! (i < 1000000)) goto label2;
loop body ...
++i;
goto label1;
label2:
and
i = 0;
goto label4;
label3:
loop body ...
++i;
label4:
if (i < 1000000) goto label3;
In both cases, the unconditional branches (to label1 and label4)
can of course be predicted with perfect accuracy. The conditional
branches (to label2 and label3) cannot be predicted with certainty,
but the predictions "not taken to label2" and "taken to label3"
will be right almost always. This is a bonus unless the penalty
for an incorrect prediction is more than a million times worse than
the penalty for no prediction at all.
--
Eric Sosman
esosman@acm-dot-org.invalid
| |
| Rainer Weikusat 2007-03-26, 1:36 am |
| Eric Sosman <esosman@acm-dot-org.invalid> writes:
> Rainer Weikusat wrote:
>
> The idea isn't to make a correct prediction for every
> execution of the branch, but for the majority of them.
Oh, really?
> for (i = 0; i < 1000000; ++i) {
> loop body ...
> }
There is no conditional inside this loop.
> There are at least two commonly-used ways to compile this:
>
> i = 0;
> label1:
> if (! (i < 1000000)) goto label2;
> loop body ...
> ++i;
> goto label1;
> label2:
>
> and
>
> i = 0;
> goto label4;
> label3:
> loop body ...
> ++i;
> label4:
> if (i < 1000000) goto label3;
>
> In both cases, the unconditional branches (to label1 and label4)
> can of course be predicted with perfect accuracy.
It is nonsensical to talk about 'predicting technical
certainties', because they are not unknown.
> The conditional branches (to label2 and label3) cannot be predicted
> with certainty,
That should be 'the conditional branches must be predicted, because
they are not certainly taken'.
Could you enlighten me what the point of this posting was?
| |
| Eric Sosman 2007-03-26, 1:36 am |
| Rainer Weikusat wrote:
> Eric Sosman <esosman@acm-dot-org.invalid> writes:
>
> Oh, really?
> [...]
> Could you enlighten me what the point of this posting was?
It's possible I misunderstood you. You wrote that there is
no way to predict a conditional branch correctly (see quote above),
and I read that as meaning that the attempt to predict was not
useful (because it couldn't always be right). My post was an
attempt to show that prediction can be a gain even if it is not
perfect.
I hope this enlightens you.
--
Eric Sosman
esosman@acm-dot-org.invalid
| |
| Rainer Weikusat 2007-03-26, 1:36 am |
| Eric Sosman <esosman@acm-dot-org.invalid> writes:
> Rainer Weikusat wrote:
>
> It's possible I misunderstood you. You wrote that there is
> no way to predict a conditional branch correctly (see quote above),
Exactly. Consequently, there is no way to avoid misprediction, except
if the conditional could have been avoided altogether by using a
different algorithm. And if that was possible, it would be more
sensible to change the algorithm at the source code level, if only to
avoid confusing the next person that is going to read it (even when the
next person is expected to be the original code author himself,
because if the code isn't illogical in the first place, nobody needs
to remember that and why it is so).
> and I read that as meaning that the attempt to predict was not
> useful (because it couldn't always be right).
In other words, you read this as 'Oh, this person has obviously no
clue what branch prediction is supposed to accomplish'. And took this
as an invitation for showing off by demonstrating that you certainly have.
I could follow you so far.
| |
| Logan Shaw 2007-03-26, 1:36 am |
| Rainer Weikusat wrote:
> Eric Sosman <esosman@acm-dot-org.invalid> writes:
[vbcol=seagreen]
> There is no conditional inside this loop.
There are no conditionals inside the body of the loop, but
the loop itself has a conditional. Moreover, it's one that
will branch the same direction 99.9999% of the time[1].
Maybe what we should be saying here is that while you can't
predict the direction a branch will take for any one loop
iteration, it is often the case that the probability of it
taking one direction is much higher than the probability of
it taking the other. And that is enough to make branch
prediction worthwhile.
- Logan
[1] Assuming no loop unrolling. But if you do have loop
unrolling, I assume you won't unroll the loop so much
that you make a million copies of the loop body, which
would be the only way to eliminate the conditional.
If you make 1000 copies of the loop body, you still
have a branch that takes the same direction 99.9% of
the time. This is not as good as 99.9999%, but it's
still enough that branch prediction helps.
| |
| Rainer Weikusat 2007-03-26, 8:20 am |
| Logan Shaw <lshaw-usenet@austin.rr.com> writes:
> Rainer Weikusat wrote:
>
>
>
> There are no conditionals inside the body of the loop, but
> the loop itself has a conditional.
I wasn't writing about 'loop conditionals' but specifically about
conditionals in loop bodies.
[...]
> Maybe what we should be saying here is that while you can't
> predict the direction a branch will take for any one loop
> iteration, it is often the case that the probability of it
> taking one direction is much higher than the probability of
> it taking the other. And that is enough to make branch
> prediction worthwhile.
What makes 'branch prediction' worthwhile (Or was used to make it
appear worthwhile. Somewhere on the 'net is a paper about Itanium that
claims that 'conventional superscalar processors' waste more than half
of their time recovering from mispredicted branches ...) is if the
branch is predicted correctly, the instruction pipeline of the CPU,
which will, on applicable CPUs, already contain future instructions in
various states of 'partially processed', doesn't need to be flushed.
If there isn't some form of branch prediction, no instructions can be
prefetched and fed into the 'entry' stages of the pipeline across a
branch. Otherwise, instructions can be prefetched (and 'started') from
either the branch target location or from sequentially behind the
branch. (Dynamic) 'Branch prediction' tries to 'guess' the correct
descision based on a certain amount of history information about which
branches have or have not been taken so far (this, of course, only
applies to branches that are 're-evaluated').
Without branch prediction, you're toast either way. Otherwise, there
is some chance of avoiding this. Which is clearly worthwhile even when
the percentage of correct 'guesses' was low.
|
|
|
|
|