Unix Programming - localtime() coredumps







Author localtime() coredumps
aryzhov

2004-01-23, 5:14 pm

Hello All,

I am getting a coredump on ca. every 50th call
to localtime_r().

Program terminated with signal 11, Segmentation fault.
#0 0x00044d14 in _free_unlocked ()
(gdb)
(gdb)
(gdb) bt
#0 0x00044d14 in _free_unlocked ()
#1 0x00044cdc in free ()
#2 0x0003d42c in tzcpy ()
#3 0x0003d0e8 in _ltzset_u ()
#4 0x0003c0a4 in localtime_u ()
#5 0x0003c28c in localtime_r ()
#6 0x0001dac0 in MyDate (time=1067940071) at fnct.c:53
#7 0x00013e90 in ind_cat () at indcat.c:73
#8 0x0001d5a4 in dpj
(elements=51, cur=0x7a8a8, li=17, width=26, x0=3, y0=5)
at dpj.c:53
#9 0x000107b8 in main
(argc=1, argv=0xffbef3fc) at main.c:182
(gdb)

This only happens in MyDate, although I call localtime_r
in many other places of the program. According to "top",
the program starts with a size of 970K, slowly leaks
memory until it reaches almost 1000K, then goes "BOOM".
I've spent quite a while searching for a leak, with no luck.
There are only a couple of malloc()s and free()s, always giving memory
back completely. It wouldn't coredump that quickly even if
they were not returned, would it? And, at least, not in localtime(),
I guess.

I also have a couple of longjmp()s, and it's not so easy
to get rid of them. But these longjmp()s have worked perfectly
on other platforms for the past 5 years (I am porting the
thing to Solaris 6/7/8/9 now, compiling on Solaris 2.6
with gcc 3.3).

Here's a piece of the source. I have changed some vars to static
just for a test, but with dynamic ones it was absolutely the same.

static time_t extime;
static struct tm tms;
// dateline is also a static char[64];

char *MyDate(time) time_t time; {

extime=time;

localtime_r(&extime,&tms);
snprintf(dateline, sizeof dateline,
"%02d.%02d.%04d %02d:%02d'%02d\"",
tms.tm_mday, 1+tms.tm_mon, 1900+tms.tm_year,
tms.tm_hour, tms.tm_min, tms.tm_sec
);
return((char *)&dateline[0]);

}

I also tried to catch SIGSEGV:
with signal(), it coredumps on the very next call to localtime()
anyway; with sigset() it just hangs, I guess looping on the signal.

Any ideas, please?

Is there any GPL library for time manipulation that
I could try out instead of the Solaris bundled one?
Or do you think the lib is not a suspect, and I must keep
searching for a leak?

Thanks,
Andrei
Andrei Voropaev

2004-01-23, 5:14 pm

On 2003-11-04, aryzhov <aryzhov@my-deja.com> wrote:
quote:

> Hello All,
>
> I am getting a coredump on ca. every 50th call
> to localtime_r().
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x00044d14 in _free_unlocked ()
> (gdb)
> (gdb)
> (gdb) bt
> #0 0x00044d14 in _free_unlocked ()
> #1 0x00044cdc in free ()
> #2 0x0003d42c in tzcpy ()
> #3 0x0003d0e8 in _ltzset_u ()
> #4 0x0003c0a4 in localtime_u ()
> #5 0x0003c28c in localtime_r ()
> #6 0x0001dac0 in MyDate (time=1067940071) at fnct.c:53
> #7 0x00013e90 in ind_cat () at indcat.c:73
> #8 0x0001d5a4 in dpj
> (elements=51, cur=0x7a8a8, li=17, width=26, x0=3, y0=5)
> at dpj.c:53
> #9 0x000107b8 in main
> (argc=1, argv=0xffbef3fc) at main.c:182
> (gdb)
>
> This only happens in MyDate, although I call localtime_r
> in many other places of the program. According to "top",
> the program starts with a size of 970K, slowly leaks
> memory until it reaches almost 1000K, then goes "BOOM".
> I've spent quite a while searching for a leak, with no luck.
> There are only a couple of malloc()s and free()s, always giving memory
> back completely. It wouldn't coredump that quickly even if
> they were not returned, would it? And, at least, not in localtime(),
> I guess.
>
> I also have a couple of longjmp()s, and it's not so easy
> to get rid of them. But these longjmp()s have worked perfectly
> on other platforms for the past 5 years (I am porting the
> thing to Solaris 6/7/8/9 now, compiling on Solaris 2.6
> with gcc 3.3).
>
> Here's a piece of the source. I have changed some vars to static
> just for a test, but with dynamic ones it was absolutely the same.
>
> static time_t extime;
> static struct tm tms;
> // dateline is also a static char[64];
>
> char *MyDate(time) time_t time; {
>
> extime=time;
>
> localtime_r(&extime,&tms);
> snprintf(dateline, sizeof dateline,
> "%02d.%02d.%04d %02d:%02d'%02d\"",
> tms.tm_mday, 1+tms.tm_mon, 1900+tms.tm_year,
> tms.tm_hour, tms.tm_min, tms.tm_sec
> );
> return((char *)&dateline[0]);
>
> }
>
> I also tried to catch SIGSEGV:
> with signal(), it coredumps on the very next call to localtime()
> anyway; with sigset() it just hangs, I guess looping on the signal.
>
> Any ideas, please?



Check the rest of the code for buffer overflows. I always find one when
localtime crashes like this.

Andrei
Casper H.S. Dik

2004-01-23, 5:14 pm

aryzhov@my-deja.com (aryzhov) writes:

Your stack trace shows that the heap is getting corrupted.
That is bad; a new localtime() implementation will not help but may
mask the problem (so it appears to help).
quote:

>This only happens in MyDate, although I call localtime_r
>in many other places of the program. According to "top",
>the program starts with a size of 970K, slowly leaks
>memory until it reaches almost 1000K, then goes "BOOM".
>I've spent quite a while searching for a leak, with no luck.
>There are only a couple of malloc()s and free()s, always giving memory
>back completely. It wouldn't coredump that quickly even if
>they were not returned, would it? And, at least, not in localtime(),
>I guess.


quote:

>I also have a couple of longjmp()s, and it's not so easy
>to get rid of them. But these longjmp()s have worked perfectly
>on other platforms for the past 5 years (I am porting the
>thing to Solaris 6/7/8/9 now, compiling on Solaris 2.6
>with gcc 3.3).



Where are you longjmp()ing from? From signal handlers? If so, note
that after a longjmp() out of a signal handler you are only allowed to
call async-signal-safe functions; i.e., not much, and certainly not
localtime_r().
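A minimal sketch of the usual alternative to longjmp()ing out of a handler, assuming a POSIX system with sigaction(): the handler only sets a `volatile sig_atomic_t` flag, and the main loop checks it and does the real work (including any localtime_r() or malloc() calls) back in normal context. The names `got_signal`, `install_flag_handler()` and `consume_signal()` are hypothetical, not taken from the original program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by normal code. Writing a volatile
 * sig_atomic_t is one of the few things C guarantees is safe in a handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;   /* no longjmp(), no malloc(), no localtime_r() here */
}

/* Install on_signal() for sig with sigaction(); returns 0 on success. */
int install_flag_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Called from the main loop: returns 1 (and clears the flag) if the signal
 * arrived since the last check. Non-async-signal-safe work belongs here. */
int consume_signal(void)
{
    if (got_signal) {
        got_signal = 0;
        return 1;
    }
    return 0;
}
```

The idea is to call install_flag_handler() once at start-up and poll consume_signal() at the points where the program previously relied on jumping out of the handler.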
quote:

>Here's a piece of the source. I have changed some vars to static
>just for a test, but with dynamic ones it was absolutely the same.



You'd really need to give a complete program which reproduces the
problem.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Kurtis D. Rader

2004-01-23, 5:14 pm

On Tue, 04 Nov 2003 02:33:43 -0800, aryzhov wrote:
quote:

> I am getting a coredump on ca. every 50th call
> to localtime_r().
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x00044d14 in _free_unlocked ()
> (gdb)
> (gdb)
> (gdb) bt
> #0 0x00044d14 in _free_unlocked ()
> #1 0x00044cdc in free ()
> #2 0x0003d42c in tzcpy ()
> #3 0x0003d0e8 in _ltzset_u ()
> #4 0x0003c0a4 in localtime_u ()
> #5 0x0003c28c in localtime_r ()
> #6 0x0001dac0 in MyDate (time=1067940071) at fnct.c:53
> #7 0x00013e90 in ind_cat () at indcat.c:73
> #8 0x0001d5a4 in dpj
> (elements=51, cur=0x7a8a8, li=17, width=26, x0=3, y0=5)
> at dpj.c:53
> #9 0x000107b8 in main
> (argc=1, argv=0xffbef3fc) at main.c:182
> (gdb)



This has nothing to do with the localtime_r() function. Notice that the
failure is in the free() routine. Somewhere in your code you've done one
of three things:

1) Free()'d a block twice
2) Used a block after it was free()'d
3) Written past the end of a block you malloc()'ed
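For the first two cases, one common defensive idiom is to reset a pointer to NULL as soon as it is freed. The macro below is a hypothetical helper, not something from the original code; it does nothing for case 3, and it only protects the one pointer variable it is given, not other copies of the same address.

```c
#include <stdlib.h>

/* Free p and reset it to NULL. A second FREE_AND_NULL on the same variable
 * is then harmless, because free(NULL) does nothing (guards against #1),
 * and a stray later dereference faults on NULL right away instead of
 * silently using recycled heap memory (makes #2 easier to spot).
 * Note: the argument is evaluated twice, so pass a plain variable. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```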

I'm betting it's #3. Something I've seen happen many times is for a
program that's been running fine on platform X to start failing like this
when ported to platform Y. Investigation always shows that on platform X
the allocations are being rounded up to a larger size than on platform Y,
thus providing just enough "padding" to keep the heap from being corrupted
on platform X.
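The padding effect can be shown with simple arithmetic. This sketch assumes an allocator that rounds each request up to a power-of-two granularity; the granularities of 8 and 4 bytes used in the check below are hypothetical, and real allocators also add their own bookkeeping, so treat the numbers as illustrative only.

```c
#include <stddef.h>

/* Round a request of n bytes up to a multiple of align (a power of two),
 * the kind of rounding an allocator may apply internally. */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Bytes that could be written past the requested size without leaving
 * the rounded-up block, for a given granularity. */
size_t hidden_slack(size_t n, size_t align)
{
    return round_up(n, align) - n;
}
```

With 8-byte granularity, a 4-byte request that is then written with 5 bytes has 4 bytes of slack and appears to work; with 4-byte granularity there is no slack, and the fifth byte lands on whatever the allocator keeps next to the block.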
quote:

> Any ideas, please?
>
> Is there any GPL library for time manipulation that I could try out
> instead of the Solaris bundled one? Or do you think the lib is not a
> suspect, and I must keep searching for a leak?



I am certain the problem is not with the Solaris time routines. You might
try using a debug malloc package like Electric Fence. Alternatively,
perform a code review looking at every malloc() invocation (and calloc()
etc., of course). Consider whether you might be allocating a buffer that
is smaller than actually required. For example, a common mistake is to do
something like this:

char *str = malloc( strlen( "abcd" ));
strcpy( str, "abcd" );

That will result in a block that is one byte shorter than required.
Depending on the granularity used by your malloc() routine that
mistake could be masked by padding added to the block you requested.
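A corrected version of that allocation, as a minimal sketch: size the block as strlen() + 1 and use the same size for the copy. `copy_string()` is a hypothetical name; on systems that provide the POSIX strdup(), that function does the same job.

```c
#include <stdlib.h>
#include <string.h>

/* Copy s into a fresh heap block. strlen() does not count the terminating
 * '\0', so the block needs strlen(s) + 1 bytes; using that same size for
 * both malloc() and memcpy() keeps allocation and copy in agreement. */
char *copy_string(const char *s)
{
    size_t size = strlen(s) + 1;    /* +1 for the '\0' */
    char *p = malloc(size);

    if (p != NULL)
        memcpy(p, s, size);
    return p;
}
```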
Sean Burke

2004-01-23, 5:15 pm


"Kurtis D. Rader" <krader@skepticism.us> writes:
quote:

> On Tue, 04 Nov 2003 02:33:43 -0800, aryzhov wrote:
>
>
> This has nothing to do with the localtime_r() function. Notice that the
> failure is in the free() routine. Somewhere in your code you've done one
> of three things:
>
> 1) Free()'d a block twice
> 2) Used a block after it was free()'d
> 3) Written past the end of a block you malloc()'ed
>
> I'm betting it's #3. Something I've seen happen many times is for a
> program that's been running fine on platform X to start failing like this
> when ported to platform Y. Investigation always shows that on platform X
> the allocations are being rounded up to a larger size than on platform Y,
> thus providing just enough "padding" to keep the heap from being corrupted
> on platform X.
>
>
> I am certain the problem is not with the Solaris time routines. You might
> try using a debug malloc package like Electric Fence. Alternatively,
> perform a code review looking at every malloc() invocation (and calloc()
> etc., of course). Consider whether you might be allocating a buffer that
> is smaller than actually required. For example, a common mistake is to do
> something like this:
>
> char *str = malloc( strlen( "abcd" ));
> strcpy( str, "abcd" );
>
> That will result in a block that is one byte shorter than required.
> Depending on the granularity used by your malloc() routine that
> mistake could be masked by padding added to the block you requested.



Note that Solaris includes a malloc debugging library,
which can catch most elementary goofs like double-frees,
and is fairly easy to use. man 'watchmalloc' for details.
(I'm not sure it was included in Solaris 2.6 tho).

-SEan
aryzhov

2004-01-23, 5:15 pm

Of the 2 subroutines I just posted, in fact only one matters, since
only one of them frees memory. Here's another one that frees.
Regards,
Andrei

// prepend the new tuning file
//
sgettune(name) register char *name; {

register int fd, add, len, newlen;
register char *newdescr, *newendloc, *newenddescr;
struct stat st;
int nr, totlen, statlen;

// See how big the new tuning file is
fd=open(name,0);
if(
( fd<=0 and fd!=ENOENT)
or fstat(fd,&st) == -1
or (st.st_dev == sdev and st.st_ino == sino )
or (st.st_dev == udev and st.st_ino == uino )

) { fd = -1; len = 0; } else len = (int)st.st_size;

statlen=enddescr-endloc; // Piece that never change
totlen=len+statlen; // How much we need now
if( (newdescr=malloc(totlen+2)) == NULL ) { //+1024 tried, too
wrerr("Can not malloc for new tuning file");
return;
}
nr=0;
if(fd>0) nr=read(fd, newdescr, len); close(fd);
if( nr != len ) {
wrerr("Bad read from tuning file");
free(newdescr);
return;
}

// Move the static piece to the very end of new area.
memcpy(newdescr+len, endloc, statlen);
free(begdescr); // Release the old area
begdescr=newdescr;
endloc=newdescr+len; // Set the statis area pointer
enddescr=newdescr+len+statlen;
*enddescr='\0'; // We still have 2 spare bytes here
if( *begdescr == '/' ) gethead(begdescr);
}
Eric Sosman

2004-01-23, 5:15 pm

aryzhov wrote:
quote:

> [...]
> extern char *environ[];



Some of your troubles may begin with this incorrect
declaration of `environ', which is a `char**' and not a
`char*[]'. See Question 6.1 in the comp.lang.c Frequently
Asked Questions (FAQ) list

http://www.eskimo.com/~scs/C-faq/top.html

if you don't understand why `char**' and `char*[]' are
different. (Questions 6.2 and 6.3 may also be helpful.)
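To illustrate the point, here is a small sketch that declares `environ` the POSIX way, as a pointer to a NULL-terminated array of "NAME=value" strings, and walks it. `env_count()` is a hypothetical helper for illustration only.

```c
#include <stddef.h>

/* environ is a pointer (char **), not an array. Declaring it as
 * char *environ[] would tell the compiler the array itself lives at that
 * symbol, so the stored pointer value would be misread as environ[0]. */
extern char **environ;

/* Count the entries in the current environment. */
size_t env_count(void)
{
    size_t n = 0;

    if (environ == NULL)
        return 0;
    while (environ[n] != NULL)
        n++;
    return n;
}
```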

A larger question: Why not use getenv() and putenv()
instead of this code?
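A minimal sketch of the getenv()/putenv() route, assuming POSIX putenv() semantics: the string passed in becomes part of the environment rather than being copied, so it must not be freed or live on the stack. `set_env_var()` is a hypothetical wrapper, and the variable name used in the check is made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set NAME=value via putenv(). Because putenv() keeps the pointer it is
 * given, the "NAME=value" string is heap-allocated and intentionally not
 * freed after a successful call. Returns 0 on success, -1 on failure. */
int set_env_var(const char *name, const char *value)
{
    size_t size = strlen(name) + 1 + strlen(value) + 1;   /* "NAME=value\0" */
    char *entry = malloc(size);

    if (entry == NULL)
        return -1;
    snprintf(entry, size, "%s=%s", name, value);
    if (putenv(entry) != 0) {
        free(entry);              /* not in the environment, safe to free */
        return -1;
    }
    return 0;
}
```

Newer systems also offer setenv(), which copies its arguments; it may not be available on releases as old as Solaris 2.6, which is why this sketch sticks to putenv().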

--
Eric.Sosman@sun.com
Jens.Toerring@physik.fu-berlin.de

2004-01-23, 5:15 pm

aryzhov <aryzhov@my-deja.com> wrote:
quote:

> Of the 2 subroutines I just posted, in fact only one matters, since
> only one of them frees memory. Here's another one that frees.
> Regards,
> Andrei


quote:

> // prepend the new tuning file
> //
> sgettune(name) register char *name; {



Sorry, but why do you still use pre-ANSI C constructs nearly a decade and
a half after they became obsolete? And you're not declaring
the return type of the function, which seems to return void (but without
explicitly declaring it, a C89 compiler will default to int). And nowadays
'register' usually isn't useful anymore; compilers have become quite
good at figuring out what to put into registers and what not. So make
this

void sgettune( char *name ) {

or better, since you don't seem to change the contents of what 'name'
points to, make it

void sgettune( const char *name ) {
quote:

> register int fd, add, len, newlen;
> register char *newdescr, *newendloc, *newenddescr;
> struct stat st;
> int nr, totlen, statlen;


quote:

> // See how big the new tuning file is
> fd=open(name,0);



The last time I checked the man page for open(2) there were lots
of macros listed for setting the 'flags' argument, but 0 wasn't one
of them - assuming that O_RDONLY is 0 makes this inherently
unportable.
quote:

> if(
> ( fd<=0 and fd!=ENOENT)



Since when has 'and' become a C keyword? What about '&&' or isn't
this supposed to be C? And if fd is less than 0 (that's when you
can be sure there was a failure, 0 is a completely legal return
value of open(2)) the value of fd won't be ENOENT (which usually
is 2) but 'errno' might be set to ENOENT. And why don't you
check just for fd < 0? If it's less than 0 it doesn't make
sense to call fstat() on fd, whatever the reason for the failure
to open the file was.
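Putting those points together, a sketch of the open-and-check step might look like this, assuming POSIX open()/fstat(): test for fd < 0, look at errno (not fd) for ENOENT, and only call fstat() on a descriptor that was actually opened. The function names and return codes are hypothetical; `same_file()` stands in for the sdev/sino comparison in the posted code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open name read-only and fstat() it.
 * Returns an fd >= 0 on success (0 is a valid descriptor), -1 if the file
 * does not exist (errno was ENOENT), or -2 on any other error. */
int open_and_stat(const char *name, struct stat *st)
{
    int fd = open(name, O_RDONLY);

    if (fd < 0)
        return errno == ENOENT ? -1 : -2;
    if (fstat(fd, st) == -1) {
        close(fd);
        return -2;
    }
    return fd;
}

/* Has this file (identified by device and inode) been seen already? */
int same_file(const struct stat *st, dev_t dev, ino_t ino)
{
    return st->st_dev == dev && st->st_ino == ino;
}
```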
quote:

> or fstat(fd,&st) == -1



And 'or' is also not a C keyword, it's spelled '||'.
quote:

> or (st.st_dev == sdev and st.st_ino == sino )
> or (st.st_dev == udev and st.st_ino == uino )



What's 'sdev', 'sino', 'udev' or 'uino'?
quote:

> ) { fd = -1; len = 0; } else len = (int)st.st_size;


quote:

> statlen=enddescr-endloc; // Piece that never change



Where did you define 'enddescr' or 'endloc'?
quote:

> totlen=len+statlen; // How much we need now
> if( (newdescr=malloc(totlen+2)) == NULL ) { //+1024 tried, too


quote:

> wrerr("Can not malloc for new tuning file");
> return;
> }
> nr=0;
> if(fd>0) nr=read(fd, newdescr, len); close(fd);



The layout of this makes it look as if you think that both statements
would only be executed if fd is larger than 0, but only the execution
of the read() call depends on the value of fd.

Ok, I give up - the rest is complete gibberish and won't work unless
all the obviously global variables have been carefully initialized,
which I can't check. Chances aren't bad that something is going
wrong here. And did you really ever try to feed this to your
compiler? I can hardly imagine that any self-respecting C compiler
would do anything other than spit out an armload of error and
warning messages.
Regards, Jens
--
_ _____ _____
| ||_ _||_ _| Jens.Toerring@physik.fu-berlin.de
_ | | | | | |
| |_| | | | | | http://www.physik.fu-berlin.de/~toerring
\___/ens|_|homs|_|oerring
aryzhov

2004-01-23, 5:15 pm

Jens.Toerring@physik.fu-berlin.de wrote in message news:<bocd64$1ctocc$1@uni-berlin.de>...
quote:

> aryzhov <aryzhov@my-deja.com> wrote:
>
>
> Sorry, but why do you still use pre-ANSI C constructs nearly a decade and
> a half after they became obsolete?



I don't use them; I just don't notice them, since they still work.
I've had no reason yet to rewrite the whole code.
quote:

> And you're not declaring
> the return type of the function, which seems to return void (but without
> explicitly declaring it, a C89 compiler will default to int). And nowadays
> 'register' usually isn't useful anymore; compilers have become quite
> good at figuring out what to put into registers and what not. So make
> this
>
> void sgettune( char *name ) {
>



Thanks, accepted :-)
quote:

>
>
> The last time I checked the man page for open(2) there were lots
> of macros listed for setting the 'flags' argument, but 0 wasn't one
> of them - assuming that O_RDONLY is 0 makes this inherently
> unportable.



Good point, too

quote:

>
>
> Since when has 'and' become a C keyword?



I guess since statements like "#define and &&" have become
possible. Some people are quite nostalgic about other languages
and may define rather funny constructs that still get
compiled by cc without a single warning. I don't see
a problem with this if the code remains readable.
quote:

> And if fd is less than 0 (that's when you
> can be sure there was a failure, 0 is a completely legal return
> value of open(2)) the value of fd won't be ENOENT (which usually
> is 2) but 'errno' might be set to ENOENT.



True, that's a plain mistake.
quote:

> And why don't you
> check just for fd < 0? If it's less than 0 it doesn't make
> sense to call fstat() on fd, whatever the reason for the failure
> to open the file was.



fstat() will return a failure anyway if the descriptor is -1,
and I check that later; one less if() statement...
but one more system call... True, an extra if() is better.
quote:

>
> And 'or' is also not a C keyword, it's spelled '||'.
>



Sure :-) Why not write the whole thing in Assembler?
Forget about macros, defines, etc.. :-)
quote:

>
> What's 'sdev', 'sino', 'udev' or 'uino'?
>



Static variables that contain the device ID, inode number, etc.,
to make sure this file hasn't been read in yet.
quote:

>
>
> Where did you define 'enddescr' or 'endloc'?



Those are static char*s, defined at the very beginning of the
file, which is rather big. There's another subroutine that initialises
them. It also does some malloc()s, but never free()s, so I
didn't post it here, since it's above suspicion for memory leaks.
quote:

>
>
>
> The layout of this makes it look as if you think that both statements
> would only be executed if fd is larger than 0, but only the execution
> of the read() call depends on the value of fd.



True.. I'm just trying to keep the code more compact; sometimes
readability suffers.
quote:

>
> Ok, I give up - the rest is complete gibberish and won't work unless
> all the obviously global variables have been carefully initialized,
> which I can't check. Chances aren't bad that something is going
> wrong here. And did you really ever try to feed this to your
> compiler? I can hardly imagine that any self-respecting C compiler
> would do anything other than spit out an armload of error and
> warning messages.



As I mentioned before, compiler warnings are something I got rid
of many years ago. I can mail you the whole code if you don't believe me :-)
Your input is highly appreciated, is a great lesson and certainly will be
considered for further code cleanup, even though it only slightly relates
to the original question :-)

Thanks,
Andrei
aryzhov

2004-01-23, 5:15 pm

Eric Sosman <Eric.Sosman@sun.com> wrote in message news:<3FA945BD.18E505E6@sun.com>...
quote:

> aryzhov wrote:
>
> Some of your troubles may begin with this incorrect
> declaration of `environ', which is a `char**' and not a
> `char*[]'. See Question 6.1 in the comp.lang.c Frequently
> Asked Questions (FAQ) list
>
> http://www.eskimo.com/~scs/C-faq/top.html
>
> if you don't understand why `char**' and `char*[]' are
> different. (Questions 6.2 and 6.3 may also be helpful.)



I tried both; it doesn't seem to make much difference, but I'll
certainly check the FAQ.
quote:

>
> A larger question: Why not use getenv() and putenv()
> instead of this code?



No idea... Originally, the program was written by Mike Flerow;
I did a major rewrite back in 1989, and didn't touch it since,
except for small dirty patches here and there, until it started coredumping
on Solaris. This piece used to be even worse - a huge static array where
slots for new variables were allocated and never removed.

Since the program makes lots of exec()s, I just thought it made sense
to copy the environ at the very start and keep it on the heap..
Probably a wrong consideration... It all seemed so different 15 years ago.
Ralf Fassel

2004-01-23, 5:15 pm

* aryzhov@my-deja.com (aryzhov)
| free(begdescr); // Release the old area

Check whether on the very first time you come here `begdescr' contains
a valid pointer to pass to free() (must have been returned by malloc()
et al. before). If not, it might be one source of your coredumps.
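A sketch of what that check amounts to, with a hypothetical `buffer` standing in for `begdescr`: if the pointer starts out NULL (static storage is zero-initialized), the first free() is harmless, because free(NULL) does nothing. If it were initialized to point at a static array or a string literal, that first free() would corrupt the heap.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for begdescr: starts as NULL, later always holds a malloc()ed
 * block, so every free() below receives a valid argument. */
static char *buffer = NULL;

/* Replace buffer with a fresh copy of s; returns 0 on success. */
int replace_buffer(const char *s)
{
    size_t size = strlen(s) + 1;
    char *fresh = malloc(size);

    if (fresh == NULL)
        return -1;
    memcpy(fresh, s, size);
    free(buffer);          /* fine on the first call: buffer is NULL */
    buffer = fresh;
    return 0;
}
```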

R'
Jens.Toerring@physik.fu-berlin.de

2004-01-23, 5:16 pm

aryzhov <aryzhov@my-deja.com> wrote:
quote:

> Jens.Toerring@physik.fu-berlin.de wrote in message news:<bocd64$1ctocc$1@uni-berlin.de>...
> I guess since statements like "#define and &&" have become
> possible. Some people are quite nostalgic about other languages
> and may define rather funny constructs that still get
> compiled by cc without a single warning. I don't see
> a problem with this if the code remains readable.



Of course, you can set all kinds of self-defined macros to obfuscate
your code beyond all readability (just have a look at entries for
the Obfuscated C Contest from about 10 or 15 years ago ;-), but that
doesn't make it good practice. Every C programmer will immediately
recognize what '&&' and '||' are, but to find out what 'and' and
'or' mean you have to find out where they are defined and whether they
really are what one would guess them to be.
quote:

> Those are static char*s, defined at the very beginning of the
> file, which is rather big. There's another subroutine that initialises
> them. It also does some malloc()s, but never free()s, so I
> didn't post it here, since it's above suspicion for memory leaks.



free()s most probably won't be your problem. The moment you write
outside the boundaries of memory you own (either by allocation or
in the form of arrays), all bets are off. So you have to check all
places where you use allocated memory as well as places where you
use arrays (to make sure that you don't write past their end, for
example). Unfortunately, what you posted does not allow one to try
to find out if everything is OK. And some things in the code
that look a bit strange make one suspect that there were already
similar problems in the code that you (or someone else) tried to
"repair" without really understanding the reasons for the problem
in the first place. Look for example at this
quote:

> if( (newdescr=malloc(totlen+2)) == NULL ) { //+1024 tried, too

I can't see why you need to add 2 to 'totlen' (1 should be enough
if I understand correctly what's coming later in the function),
and the comment about "trying to add 1024 too" makes it look even
more suspicious. Usually you should know exactly how much memory
you're going to need. The moment you start trying to avoid
problems by just experimenting with the numbers you pass to
malloc(), you're in deep, dark, cold and very slimy water ;-) It
might look as if it works for a shorter or longer time, but
with any small change to the program (or when compiling it with a
different compiler or on a different architecture) it
may come back to haunt you.
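As an illustration of sizing a buffer exactly, this sketch builds the "new file contents followed by the fixed tail" buffer with precisely len + tail_len + 1 bytes. The names are hypothetical stand-ins for the file contents and the endloc..enddescr piece, and it assumes the tail does not overlap the destination (true here, since the destination is freshly allocated).

```c
#include <stdlib.h>
#include <string.h>

/* Return a new buffer holding head (len bytes), then tail (tail_len bytes),
 * then a terminating '\0'. The size is derived from exactly what is copied:
 * len + tail_len + 1, with no guessed extra slack. */
char *join_pieces(const char *head, size_t len, const char *tail, size_t tail_len)
{
    char *buf = malloc(len + tail_len + 1);   /* +1 for the '\0', nothing more */

    if (buf == NULL)
        return NULL;
    memcpy(buf, head, len);
    memcpy(buf + len, tail, tail_len);
    buf[len + tail_len] = '\0';
    return buf;
}
```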
quote:
> True.. I'm just trying to keep the code more compact; sometimes
> readability suffers.



Well, you should close 'fd' only if it is a valid file descriptor,
and since you seem to treat everything negative or zero as an
invalid descriptor, you shouldn't try to close it in that case. And when the
alternatives are "compactness" versus readability, you should always
go for readability - a few extra bytes for newlines and spaces
(or even braces) won't hurt your hard disk and won't make the
compiler any slower.
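A sketch of the read-and-close step along those lines, assuming POSIX read()/close(): treat any fd >= 0 as valid (0 is a legal descriptor, as noted earlier in the thread), close only descriptors that were actually opened, and loop on read(), which may return fewer bytes than requested. `read_full()` and `close_if_open()` are hypothetical helpers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf, retrying on short reads and on
 * EINTR. Returns 0 on success, -1 on error or premature end of file. */
int read_full(int fd, char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* file shorter than expected */
        done += (size_t)n;
    }
    return 0;
}

/* Close fd only if it is a descriptor that was actually opened. */
void close_if_open(int fd)
{
    if (fd >= 0)
        close(fd);
}
```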
quote:

> As I mentioned before, compiler warnings are something I got rid
> of many years ago. I can mail you the whole code if you don't believe me :-)



Even after raising the verbosity of the compiler to something useful?
At least it should be complaining e.g. about the unused variables you
are defining...
Regards, Jens
aryzhov

2004-01-23, 5:16 pm

aryzhov@my-deja.com (aryzhov) wrote in message news:<7ca75749.0311040233.6edcb44e@posting.google.com>...
quote:

> Hello All,
>
> I am getting a coredump on ca. every 50th call
> to localtime_r().
>



Hello again,

This is probably going to make everyone laugh,
but I wouldn't mind a materialistic explanation, either.
As Sean Burke suggested, I've relinked the thing with
watchmalloc:

rm PROG
LD_PRELOAD=watchmalloc.so.1 ; export LD_PRELOAD ; make
MALLOC_DEBUG=WATCH ; export MALLOC_DEBUG ; ./PROG

Maybe I misunderstand something, but it gets a SIGTRAP
on the very first call to malloc(), which works perfectly
when WATCHMALLOC is not set. Usage?

Now, the funny part.. Since watchmalloc library only exists
in DSO form, I relinked everything dynamically (previously
PROG was always linked statically, as I use it sometimes in
environments where no dynamic libs are available).
AND IT DIDN'T COREDUMP ANYMORE!
Interesting.. I've relinked it again, dynamically but without
LD_PRELOAD in the environment. Well... it did crash, but
it took 15 minutes of stress testing to get that SIGSEGV in localtime(),
whereas the static version compiled from the same source got it
after a few seconds of such a test. Another advantage of shared libs, heh?
And, if linked with watchmalloc, there is no way to crash it..
Which sort of proves that memory allocations may be taken off the
suspect list. The trouble remains that the process still grows,
and the DSO version grows a lot faster than the static one, so there is a leak
somewhere.

I've spent some time looking for hidden recursion, with no success
so far. What else should I look for?

Regards,
Andrei
Jens.Toerring@physik.fu-berlin.de

2004-01-23, 5:16 pm

aryzhov <aryzhov@my-deja.com> wrote:
quote:

> aryzhov@my-deja.com (aryzhov) wrote in message news:<7ca75749.0311040233.6edcb44e@posting.google.com>...
> Now, the funny part.. Since watchmalloc library only exists
> in DSO form, I relinked everything dynamically (previously
> PROG was always linked statically, as I use it sometimes in
> environments where no dynamic libs are available).
> AND IT DIDN'T COREDUMP ANYMORE!
> Interesting.. I've relinked it again, dynamically but without
> LD_PRELOAD in the environment. Well... it did crash, but
> it took 15 minutes of stress testing to get that SIGSEGV in localtime(),
> whereas the static version compiled from the same source got it
> after a few seconds of such a test. Another advantage of shared libs, heh?
> And, if linked with watchmalloc, there is no way to crash it..
> Which sort of proves that memory allocations may be taken off the
> suspect list.



No, absolutely not, quite to the contrary. You *have* some memory
corruption in your program. But depending on how you compile and
link your program the problem moves within the code. Under some
circumstances it might seem to go away completely, because the
memory you trash isn't important in that configuration (just might
alter your data in a way that makes the results still seem to be
reasonable but wrong anyway), in other cases it may happen with
different frequencies. That's one of the most typical signs of
memory corruption you can get. It may even happen that the bug
seems to go away when you run it under a debugger or with
watchmalloc, do a google search for what a "Heisenbug" is.

Regards, Jens
Casper H.S. Dik

2004-01-23, 5:16 pm

aryzhov@my-deja.com (aryzhov) writes:
quote:

>This is probably going to make everyone laugh,
>but I wouldn't mind a materialistic explanation, either.
>As Sean Burke suggested, I've relinked the thing with
>watchmalloc:


quote:

> rm PROG
> LD_PRELOAD=watchmalloc.so.1 ; export LD_PRELOAD ; make
> MALLOC_DEBUG=WATCH ; export MALLOC_DEBUG ; ./PROG


quote:

>Maybe I misunderstand something, but it gets a SIGTRAP
>on the very first call to malloc(), which works perfectly
>when WATCHMALLOC is not set. Usage?



That means that there's a problem with the allocated memory;
I wouldn't expect it to crash inside malloc, though.
quote:

>Now, the funny part.. Since watchmalloc library only exists
>in DSO form, I relinked everything dynamically (previously
>PROG was always linked statically, as I use it sometimes in
>environments where no dynamic libs are available).
>AND IT DIDN'T COREDUMP ANYMORE!



Dynamic system libraries are *always* available; statically linking
libc and other system libraries does not give portable binaries.
quote:

>Interesting.. I've relinked it again, dynamically but without
>LD_PRELOAD in the environment. Well... it did crash, but
>it took 15 minutes of stress testing to get that SIGSEGV in localtime(),
>whereas the static version compiled from the same source got it
>after a few seconds of such a test. Another advantage of shared libs, heh?
>And, if linked with watchmalloc, there is no way to crash it..
>Which sort of proves that memory allocations may be taken off the
>suspect list. The trouble remains that the process still grows,
>and the DSO version grows a lot faster than the static one, so there is a leak
>somewhere.



You should not involve watchmalloc until after the linking phase
and exactly like this:


env LD_PRELOAD=watchmalloc.so.1 MALLOC_DEBUG=WATCH,RW ./PROG

The "RW" will make it run a lot slower but catches more errors.

Casper
aryzhov

2004-01-23, 5:17 pm

Jens.Toerring@physik.fu-berlin.de wrote in message news:<bodh14$1di5dj$1@uni-berlin.de>...
quote:

> aryzhov <aryzhov@my-deja.com> wrote:
>
> No, absolutely not, quite to the contrary. You *have* some memory
> corruption in your program. But depending on how you compile and
> link your program the problem moves within the code.



....

I don't dispute that there's a problem with memory usage; I'm just trying to say
it's probably not related to malloc/free. There may be a thousand other reasons
for memory leaks, but none except recursion or wrong usage of local
variables comes to mind.

Regards,
andrei
aryzhov

2004-01-23, 5:17 pm

Casper H.S. Dik <Casper.Dik@Sun.COM> wrote in message news:<3faa4cde$0$58700$e4fe514c@news.xs4all.nl>...
quote:

> aryzhov@my-deja.com (aryzhov) writes:
>
>
> That means that there's a problem with the allocated memory;
> I wouldn't expect it to crash inside malloc, though.
>



There's nothing allocated yet at this stage; it's literally
the very first use of malloc().
quote:

>
> Dynamic system libraries are *always* available; statically linking
> libc and other system libraries does not give portable binaries.



I didn't mean portability, but rather recovery scenarios where,
for instance, the Solaris / slice is mounted read-only, and /usr is nowhere
to be found at all. No libc.so ... "echo *" instead of ls.. etc.
Statically linked binaries are usually a great help there.
quote:

> You should not involve watchmalloc until after the linking phase
> and exactly like this:
>
>
> env LD_PRELOAD=watchmalloc.so.1 MALLOC_DEBUG=WATCH,RW ./PROG



Same thing.. SIGTRAP on the very first call to malloc(),
while no writes to any allocated zones happened so far.


#0 0xff371010 in malloc_unlocked () from /usr/lib/watchmalloc.so.1
(gdb)
(gdb)
(gdb)
(gdb) bt
#0 0xff371010 in malloc_unlocked () from /usr/lib/watchmalloc.so.1
#1 0xff370e6c in malloc () from /usr/lib/watchmalloc.so.1
#2 0x0001f540 in inienv () at setenv.c:32
#3 0x0001269c in main (argc=1, argv=0xffbef34c) at main.c:119
(gdb)

-----------
20 #define ENVSIZE 4096 // Only so many allowed
21
22 extern char **environ;
23 static char *new_env[ENVSIZE+2];
24
25 int inienv() { // Called from main() once; just copies
26 // the whole inherited environ[]
27 // to the internal space, so we can
28 // easily manipulate it later
29 register i;
30 for(i=0; i<ENVSIZE && environ[i]!=NULL; i++) {
31 int ell=strlen(environ[i]); // Env line len with "xx="
32 new_env[i]=malloc(ell+2);
33 checkfatal(new_env[i]==0, "No Memory at env init");
34 strcpy(new_env[i], environ[i]);
35 }
36 for( ; i<ENVSIZE ; i++) new_env[i]=NULL;
37 environ=new_env;
38 }
39

Of course, I could get rid of this piece and use putenv() later instead
of my own setenv() function, as Eric Sosman suggested, but I doubt it will
help with the memory leaks.
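For reference, here is a minimal sketch of the putenv() route, assuming a POSIX putenv(); the helper name set_env_var() is hypothetical. The point worth keeping in mind is that putenv() installs the caller's pointer itself, not a copy, so passing a stack buffer or later free()ing the string corrupts the environment:

```c
#define _XOPEN_SOURCE 700   /* expose putenv() on strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set NAME=VALUE via putenv().
 * putenv() stores this very pointer in the environment, so the
 * string must stay valid while the variable is set: never pass a
 * stack buffer, and never free() it after a successful call. */
int set_env_var(const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1; /* "name=value" + '\0' */
    char *entry = malloc(len);
    if (entry == NULL)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);
    if (putenv(entry) != 0) {
        free(entry);        /* not installed, so still ours to release */
        return -1;
    }
    return 0;               /* entry now belongs to the environment */
}
```

Replacing a variable this way abandons the previous string; that is a small, bounded leak rather than heap corruption, and is the usual trade-off with putenv().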

Something is probably wrong with the way I am using watchmalloc.

Thanks,
Andrei
Ralf Fassel

2004-01-23, 5:17 pm

* aryzhov@my-deja.com (aryzhov)
| > env LD_PRELOAD=watchmalloc.so.1 MALLOC_DEBUG=WATCH,RW ./PROG
|
| Same thing.. SIGTRAP on the very first call to malloc(),
| while no writes to any allocated zones happened so far.

Remember that other library calls (stdio, strdup) might call malloc,
too (though I would not expect them to corrupt the malloc stack if
used properly).

FWIW, I just copied your inienv() and ran it in a small test app: no
problems, with or without watchmalloc. So I suspect your main() is
doing something else before calling inienv()?

R'

Kurtis D. Rader

2004-01-23, 5:17 pm

On Fri, 07 Nov 2003 00:12:25 -0800, aryzhov wrote:
quote:

> 20 #define ENVSIZE 4096 // Only so many allowed
> 21
> 22 extern char **environ;
> 23 static char *new_env[ENVSIZE+2];
> 24
> 25 int inienv() { // Called from main() once; just copies
> 26 // the whole inherited environ[]
> 27 // to the internal space, so we can
> 28 // easily manipulate it later
> 29 register i;
> 30 for(i=0; i<ENVSIZE && environ[i]!=NULL; i++) {
> 31 int ell=strlen(environ[i]); // Env line len with "xx="
> 32 new_env[i]=malloc(ell+2);
> 33 checkfatal(new_env[i]==0, "No Memory at env init");
> 34 strcpy(new_env[i], environ[i]);
> 35 }
> 36 for( ; i<ENVSIZE ; i++) new_env[i]=NULL;
> 37 environ=new_env;
> 38 }
> 39
>
> Of course I can get rid of this piece and use putenv() later, instead
> of own setenv() function, as Eric Sosman suggested, but I doubt it will
> help against memory leaks.
>
> Something is probably wrong with the way I am using watchmalloc..



I can't speak to your use of watchmalloc since I don't work on Sun
systems.

I do see one thing obviously wrong with the code. If the existing
environ array has exactly 4096 members your new_env array will not be
terminated with a NULL pointer by the inienv() function. In practice it
will still be NULL terminated because uninitialized static objects are
normally placed in pages of memory that are zeroed by the OS before being
given to the process.

However, I'm troubled by two things. The first is the "+2" in
the new_env array sizing expression. Why is it padded by two pointers?
Only one should be needed. And, of course, the final for() loop needs to
be modified to read "i<=ENVSIZE" so the last element of the array is
guaranteed to be initialized to the desired value. The second concern
mirrors the first. What is the purpose of adding two to the strlen()? The
length being malloc()'ed needs to be increased by only one to accommodate
the null byte at the end of the string. The extra padding suggests the
author of the code is uncertain of what they are doing. Which raises
serious concerns about the correctness of the rest of the code.

Lastly, a minor performance issue. The memcpy() function can be
significantly faster than strcpy() on some platforms. Since you've already
calculated the amount of data to be copied it would be more efficient to
code that inner loop as:

31 int ell=1+strlen(environ[i]); // Env line len with "xx="
32 new_env[i]=malloc(ell);
33 checkfatal(new_env[i]==NULL, "No Memory at env init");
34 memcpy(new_env[i], environ[i], ell);
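Putting the suggestions above together (array sized ENVSIZE+1, malloc() of strlen()+1, memcpy(), and a final loop running to i<=ENVSIZE), a possible revision of inienv() might look like the sketch below. It assumes checkfatal() terminates the program, so the sketch prints a message and calls exit() in its place; returning the entry count is an addition for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENVSIZE 4096                /* only so many entries kept */

extern char **environ;
static char *new_env[ENVSIZE + 1];  /* one extra slot for the NULL terminator */

/* Copy the inherited environ[] into new_env[] and make it current.
 * Returns the number of entries copied; entries beyond ENVSIZE are
 * dropped, as in the original. */
int inienv(void)
{
    int i;
    for (i = 0; i < ENVSIZE && environ[i] != NULL; i++) {
        size_t len = strlen(environ[i]) + 1;   /* "NAME=value" plus '\0' */
        new_env[i] = malloc(len);
        if (new_env[i] == NULL) {              /* stand-in for checkfatal() */
            fprintf(stderr, "No memory at env init\n");
            exit(EXIT_FAILURE);
        }
        memcpy(new_env[i], environ[i], len);   /* copies the '\0' as well */
    }
    int n = i;                      /* number of entries actually copied */
    for (; i <= ENVSIZE; i++)       /* <= so new_env[ENVSIZE] is NULL too */
        new_env[i] = NULL;
    environ = new_env;
    return n;
}
```

Even when the inherited environment has ENVSIZE or more entries, new_env[ENVSIZE] is written explicitly, so termination no longer depends on static storage happening to be zeroed.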

Kurtis D. Rader

2004-01-23, 5:18 pm

On Fri, 07 Nov 2003 18:45:50 -0800, Kurtis D. Rader wrote:
quote:

> I do see one thing obviously wrong with the code. If the existing
> environ array has exactly 4096 members your new_env array will not be



That should have read

...has 4096 or more members...

I really need to stop posting replies while my attention is split between
multiple tasks :-)

Copyright 2003 - 2008 webservertalk.com