gdb (linux) "print" command clears memory corruption - so how do I find my b
01-23-04 10:28 PM
I am looking for some advice on how to debug a program when the
debugger "print" command actually clears the corruption. This is not
the usual non-initialised memory problem, because the program aborts
with a SIGBUS inside the debugger as well. But when I use the print
command inside the debugger, the program completes normally.
I am using gdb on a linux system. The offending C code is:
memcpy(new_entry, &newloc, IRECPTRLEN);
I display these values just before the memcpy:
printf("Calling memcpy(%p, %p, %d)\n", new_entry, &newloc,
IRECPTRLEN);
... which works. When run straight from gdb (snipped a bit):
$ gdb xwif
(gdb) b src/c_library.c:598
Breakpoint 1 at 0x804bca3: file src/c_library.c, line 598.
(gdb) run
Starting program: /home/dev/bin/xwif -p
Calling memcpy(0x4001f000, 0xbffff04c, 4)
Breakpoint 1, c$keyed_write (p=0x80520a0, record=0x80658a0 "\002") at
src/c_library.c:598
598 memcpy(new_entry, &newloc, IRECPTRLEN);
(gdb) s
Program received signal SIGBUS, Bus error.
0x4207c46c in memcpy () from /lib/i686/libc.so.6
But when I use "print" before "step":
$ gdb xwif
(gdb) b src/c_library.c:598
Breakpoint 1 at 0x804bca3: file src/c_library.c, line 598.
(gdb) r
Starting program: /home/dev/bin/xwif -p
Calling memcpy(0x4001f000, 0xbffff04c, 4)
Breakpoint 1, c$keyed_write (p=0x80520a0, record=0x80658a0 "\002") at
src/c_library.c:598
598 memcpy(new_entry, &newloc, IRECPTRLEN);
(gdb) p new_entry
$1 = 0x4001f000 ""
(gdb) s
599 new_entry += IRECPTRLEN;
(gdb)
... and it completes successfully.
I *know* that I am corrupting memory somewhere (I am calling mmap). I
wrote a small program to test the way I am using mmap(), and it works.
But when I try to include it in a much larger application, it aborts.
I am not asking you to debug my program, nor for help on mmap()
(although, if you really want to spend hours stepping through my code,
I won't object :-) But I am requesting help with techniques to debug
programs exhibiting symptoms like the above.
(I originally posted this to comp.lang.c, but suspect that I might have
chosen the wrong newsgroup. Perhaps someone can also advise me how I
determine which group to post a query to; is there a FAQ on choosing
newsgroups?)
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:28 PM
kreuiter@netscape.net (Gavin Kreuiter) writes:
> I am looking for some advice on how to debug a program when the
> debugger "print" command actually clears the corruption. This is not
> the usual non-initialised memory problem, because the program aborts
> with a SIGBUS inside the debugger as well. But when I use the print
> command inside the debugger, the program completes normally.
Hmm ... This behaviour is quite rare.
What is the output from "cat /proc/<pid>/maps" just before memcpy()?
> I *know* that I am corrupting memory somewhere (I am calling mmap).
This doesn't look like memory corruption.
This looks like mmap() with strange/incorrect flags, and possibly
an interaction with or a bug in Linux ptrace (which linux is it BTW?).
> I wrote a small program to test the way I am using mmap(), and it works.
> But when I try to include it in a much larger application, it aborts.
Or perhaps the application mmap()s something "on top of" your
previous mapping? [But that should not be affected by "gdb print",
I think].
> I am not asking you to debug my program, nor for help on mmap()
> (although, if you really want to spend hours stepping through my code,
> I won't object :-) But I am requesting help with techniques to debug
> programs exhibiting symptoms like the above.
Well, if you give me access to the debug binary, I can see what I
can do for you (I am curious to find the root cause).
mru@kth.se (Måns Rullgård) writes:
> Since you are running Linux, I'd suggest you try out valgrind,
Always good advice.
Though even if valgrind tells what the bug is, I'd still be curious
why it disappears with "gdb print".
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:28 PM
On Thu, 04 Dec 2003 08:39:53 -0800, Paul Pluzhnikov wrote:
> Though even if valgrind tells what the bug is, I'd still be curious
> why it disappears with "gdb print".
It could be related to the cpu cache (e.g. the new_entry variable
may be cached after the print command) or the cpu pipeline.
The original poster does not mention how the code is compiled, but
it may be worth trying to compile with and without optimization.
--
mail1dotstofanetdotdk
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:28 PM
"Bjorn Reese" <breese@see.signature> writes:
> On Thu, 04 Dec 2003 08:39:53 -0800, Paul Pluzhnikov wrote:
>
>
> It could be related to the cpu cache (e.g. the new_entry variable
> may be cached after the print command) or the cpu pipeline.
Given that all the values were also printed from within the program
on the previous line, this is quite an unlikely explanation.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:28 PM
On Thu, 04 Dec 2003 11:47:55 -0800, Paul Pluzhnikov wrote:
> "Bjorn Reese" <breese@see.signature> writes:
>
> Given that all the values were also printed from within the program
> on the previous line, this is quite an unlikely explanation.
Without further information we can only continue to speculate,
but maybe only memcpy exhibits the cache/pipeline/whatever
problem and printf does not. After all, memcpy tends to be a
highly optimized inlined function/macro, whereas printf is too
complicated to be inlined. So the use of printf may "fix" the
crash in the same manner as the gdb print command does.
I still would be interested in knowing if the code behaves
differently depending on whether optimization has been turned
on or off.
--
mail1dotstofanetdotdk
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:32 PM
NNTP-Posting-Date: Thu, 18 Dec 2003 12:17:37 +0000 (UTC)
Thanks to all who replied. I was actually looking for advice on how
to debug a problem of this nature. Valgrind seems like a good bet for
the future, although it didn't help in this case. As Paul suggested, it
wasn't memory corruption as such; in essence, it was dereferencing an
out-of-bounds pointer (the mmap'd file's disk size is zero).
I managed to reproduce gdb's curious behavior in the small program
below, and include a gdb session for the sake of completeness
(although the reason why this occurs remains a mystery):
-------------------------- <snip> ----------------------------
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void) {
    int i, fd;
    char *ptr;

    /* Create an empty file; it is deliberately never extended. */
    fd = open("data", O_RDWR | O_CREAT | O_TRUNC, 0777);
    ptr = mmap(NULL, 32768, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    for (i = 4096; i <= 32768; i += 4096) {
        printf("setting file size to %d\n", i);
        printf("ptr[%d] = %d\n", i-1, ptr[i-1]);   /* SIGBUS: beyond end of file */
    }
    return 0;
}
-------------------------- <snip> ----------------------------
$ gdb demo
(gdb) r
setting file size to 4096
Program received signal SIGBUS, Bus error.
0x08048406 in main () at demo.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
(gdb) b 14
Breakpoint 1 at 0x80483fc: file demo.c, line 14.
(gdb) r
setting file size to 4096
Breakpoint 1, main () at demo.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
(gdb) p ptr[i-1]
$1 = 0 '\0'
(gdb) c
Continuing.
ptr[4095] = 0
setting file size to 8192
Breakpoint 1, main () at demo.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
(gdb) p ptr[i-1]
$2 = 0 '\0'
(gdb) c
Continuing.
ptr[8191] = 0
setting file size to 12288
Breakpoint 1, main () at demo.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
(gdb) c
Continuing.
Program received signal SIGBUS, Bus error.
0x08048406 in main () at demo.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
-------------------------- <snip> ----------------------------
As can be seen from the above, accessing the out-of-bounds pointer
signalled SIGBUS; first using gdb to dereference it (via "print")
resets it somehow, so that the SIGBUS is not produced.
- The demo is a modified version from Stevens, without calling
ftruncate() to increase the file size on disk.
- I am running Red Hat 8.0.
- valgrind terminates with bus error, without additional info
- optimisation had no visible effect
Re: gdb (linux) "print" command clears memory corruption - so how do I find
01-23-04 10:32 PM
kreuiter@netscape.net (Gavin Kreuiter) writes:
> I managed to reproduce gdb's curious behavior in the small program
> below, and include a gdb session for the sake of completeness
> (although the reason why this occurs remains a mystery):
This is a kernel bug: gdb performs "ptrace(PTRACE_PEEKTEXT, pid, ptr+4095, ...)",
and the kernel "automagically" extends the vma to "cover" the
[ptr, ptr+4095] range.
> - I am running Red Hat 8.0.
Also reproduces with kernels 2.4.6 and 2.4.20 (RH-9.0).
If anyone could reproduce this on 2.6.0 kernel, this should be
reported, as it makes debugging this particular problem unnecessarily
difficult.
For comparison, here is Solaris behaviour:
Breakpoint 1, main () at mmap4.c:14
14 printf("ptr[%d] = %d\n", i-1, ptr[i-1]);
(gdb) p ptr[i-1]
Cannot access memory at address 0xff390fff.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.