Generating smaller coredumps?
| stephen.fedele@yale.edu 2006-06-06, 1:23 pm |
| I'm looking for a way to generate smaller coredumps. Essentially, I'd
like to capture things like the current registers, stack, thread and
process information, etc, similar to a Windows minidump. I'd like to
then be able to run this through GDB with the associated symbolic
information and obtain some basic information about the process. This
will be part of an automated crash reporting and analysis tool, hence
the interest in a smaller version of a coredump. Windows can do this
with Minidumps, and in the interest of cross-platform compatibility I'm
wondering if there's anything similar on Unix?
| |
| Bruce Barnett 2006-06-06, 7:23 pm |
| stephen.fedele@yale.edu writes:
> I'm looking for a way to generate smaller coredumps. Essentially, I'd
> like to capture things like the current registers, stack, thread and
> process information, etc, similar to a Windows minidump. I'd like to
> then be able to run this through GDB with the associated symbolic
> information and obtain some basic information about the process. This
> will be part of an automated crash reporting and analysis tool, hence
> the interest in a smaller version of a coredump. Windows can do this
> with Minidumps, and in the interest of cross-platform compatibility I'm
> wondering if there's anything similar on Unix?
Well, you can set the maximum coredump size using a shell command to
prevent HUGE core dumps. I'm not sure how usable they are, but
they are smaller.
(see csh's limit command)
You can also trap some of the errors and write out stuff yourself when
an error occurs. You can trap signals like SIGBUS, SIGSEGV and print
stuff out.
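
A process can also apply the same limit to itself, without relying on the shell. The following is a minimal sketch, assuming a POSIX system with getrlimit()/setrlimit(); the helper name cap_core_size is made up for illustration. As noted above, a core file truncated by the limit may or may not be usable in a debugger.

```c
#include <sys/resource.h>

/* Lower the soft core-file size limit of the calling process to max_bytes,
 * clamped to the existing hard limit. Returns 0 on success, -1 on error.
 * cap_core_size is a hypothetical helper name, not a library function. */
int cap_core_size(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* The soft limit may never exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling cap_core_size(0) early in main() disables core files entirely; a small non-zero value caps them instead.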
--
Sending unsolicited commercial e-mail to this account incurs a fee of
$500 per message, and acknowledges the legality of this contract.
| |
| Paul Pluzhnikov 2006-06-07, 1:24 am |
| stephen.fedele@yale.edu writes:
> I'm looking for a way to generate smaller coredumps. Essentially, I'd
> like to capture things like the current registers, stack, thread and
> process information, etc, similar to a Windows minidump. I'd like to
> then be able to run this through GDB with the associated symbolic
> information and obtain some basic information about the process.
One problem with this approach is that UNIX core dumps are generally
of very little use on any machine other than the one they were
generated on.
To analyze a core (e.g. to print crash stack trace), you need all
the libraries that match execution machine exactly, and you are
unlikely to have such a set of libraries for any given user machine.
> Windows can do this with Minidumps,
I have not looked into the format of Win32 minidumps, but believe
you can only effectively analyze them if you have access to matching
.PDB files. The fact that many DLLs come from a single vendor,
and that this vendor allows you to access .PDB for most of the
versions of DLLs they distribute, helps. But there is nothing even
remotely close on UNIX.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
| davids@webmaster.com 2006-06-07, 1:24 am |
|
stephen.fedele@yale.edu wrote:
> I'm looking for a way to generate smaller coredumps. Essentially, I'd
> like to capture things like the current registers, stack, thread and
> process information, etc, similar to a Windows minidump. I'd like to
> then be able to run this through GDB with the associated symbolic
> information and obtain some basic information about the process. This
> will be part of an automated crash reporting and analysis tool, hence
> the interest in a smaller version of a coredump. Windows can do this
> with Minidumps, and in the interest of cross-platform compatibility I'm
> wondering if there's anything similar on Unix?
You can catch SIGSEGV, SIGFAULT, and the like. In the signal handler,
you can dump whatever information you want. You can access the stack
with various gcc builtins (google around, getting the stack information
from a running program is a FAQ).
DS
| |
| Paul Pluzhnikov 2006-06-07, 1:24 am |
| davids@webmaster.com writes:
> You can catch SIGSEGV, SIGFAULT, and the like. In the signal handler,
> you can dump whatever information you want.
In a signal handler, you pretty much can't do *anything* reliably.
> You can access the stack with various gcc builtins
Which, if stack is corrupted, will themselves promptly crash,
causing the failure to become even harder to diagnose.
While this approach may work sometimes in single-threaded
environment, it will be a complete failure if used as a general
diagnostic tool.
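
To make the first point concrete: printf(), malloc() and most of the C library are not async-signal-safe, while write(2) is. A handler that wants to report something therefore has to format it by hand. The sketch below shows one way to do that; fmt_hex and safe_report are hypothetical names, and it does nothing about the corrupted-stack problem.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format value as "0x<hex>" into buf without calling any library function.
 * Returns the length written (excluding the NUL), or 0 if cap is too small. */
size_t fmt_hex(uintptr_t value, char *buf, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];   /* two hex digits per byte */
    size_t n = 0, len = 0;

    /* Collect digits least significant first. */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    if (cap < n + 3)              /* "0x" + digits + NUL */
        return 0;

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Emit "<label><hex address>\n" using only write(2). */
void safe_report(int fd, const char *label, size_t label_len, uintptr_t addr)
{
    char buf[2 * sizeof addr + 3];
    size_t len = fmt_hex(addr, buf, sizeof buf);

    if (write(fd, label, label_len) < 0 ||
        write(fd, buf, len) < 0 ||
        write(fd, "\n", 1) < 0) {
        /* a failed write cannot be reported from here */
    }
}
```

In a handler installed with SA_SIGINFO, the faulting address would be passed as (uintptr_t)info->si_addr.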
Cheers,
| |
| Casper H.S. Dik 2006-06-07, 1:24 am |
| Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> writes:
>One problem with this approach is that UNIX core dumps are generally
>of very little use on any machine other than the one they were
>generated on.
Which is why Solaris allows you to dump anything, including the
text segments.
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
| |
| Eric Sosman 2006-06-07, 7:23 pm |
|
Casper H.S. Dik wrote On 06/07/06 01:21,:
> Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> writes:
>
> Which is why Solaris allows you to dump anything, including the
> text segments.
... and if you're running Solaris 10, you've got
DTrace. One of my colleagues used DTrace to build a
crash-catching tool that seems a pretty good match for
what the O.P. seeks. Google for "appcrash".
--
Eric.Sosman@sun.com
| |
|
|
Bruce Barnett wrote:
> stephen.fedele@yale.edu writes:
> prevent HUGE core dumps.
They are not as big as an "ls" connotes. As a sparse file, the actual
disk space used is much less than this number.
SOME tar(1) versions and all dump(1) versions can consign these
files to digital purgatory.
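
The difference between the two numbers can be checked with stat(2). This is a small sketch; file_sizes is a made-up helper name, and it assumes st_blocks counts 512-byte units, which holds on Linux and most Unix systems but is not guaranteed by POSIX.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Report the apparent size of a file (what "ls -l" shows) and the space
 * actually allocated on disk (what "du" shows). Returns 0 on success. */
int file_sizes(const char *path, off_t *apparent, off_t *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    *apparent = st.st_size;
    /* Assumption: st_blocks is in 512-byte units (true on Linux). */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

For a sparse core file, allocated comes out well below apparent.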
> (see csh's limit command)
or ulimit:
ulimit -Hc 10 # whatever minidump-like limit is appropriate or common; units are shell-dependent (1024-byte blocks in bash)
=Brian
| |
| davids@webmaster.com 2006-06-08, 1:29 am |
|
Paul Pluzhnikov wrote:
> davids@webmaster.com writes:
>
>
> In a signal handler, you pretty much can't do *anything* reliably.
Reliability is not important here, the program has already failed.
> Which, if stack is corrupted, will themselves promptly crash,
> causing the failure to become even harder to diagnose.
No problem, just 'fork' before you do it. That way, if one diagnostic
cannot complete, you can still do others.
> While this approach may work sometimes in single-threaded
> environment, it will be a complete failure if used as a general
> diagnostic tool.
I don't agree at all. It does take some effort to get it to work, but
it can be refined to the point where it's an extremely useful
diagnostic tool.
DS
| |
| Paul Pluzhnikov 2006-06-08, 1:29 am |
| davids@webmaster.com writes:
> Reliability is not important here, the program has already failed.
Reliability *is* somewhat important here:
you may not see that same crash again for another year, so it might
be important to provide reliable crash trace.
>
>
> No problem, just 'fork' before you do it.
fork()ing MT apps is surprisingly complicated, and subject to its
own set of crashes and deadlocks. If you crash or deadlock while
attempting to fork(), you lose all chances to provide any meaningful
results, and on some systems you get into a state where the process
can't even be killed with SIGKILL.
>
> I don't agree at all. It does take some effort to get it to work, but
> it can be refined to the point where it's an extremely useful
> diagnostic tool.
For a *specific* application, yes.
As a *general* toolkit, to be linked into *any*
application? Unlikely.
Cheers,
| |
| davids@webmaster.com 2006-06-08, 7:23 pm |
|
Paul Pluzhnikov wrote:
> davids@webmaster.com writes:
>
>
> Reliability *is* somewhat important here:
> you may not see that same crash again for another year, so it might
> be important to provide reliable crash trace.
So call 'abort' if you can't get the information you need.
>
> fork()ing MT apps is surprisingly complicated, and subject to its
> own set of crashes and deadlocks. If you crash or deadlock while
> attempting to fork(), you lose all chances to provide any meaningful
> results, and on some systems you get into a state where the process
> can't even be killed with SIGKILL.
You are correct. This is easily worked around on Linux but a bigger
problem on other platforms. On Linux, you just use the raw 'fork' call
that doesn't do any fancy pthreads stuff before forking. Other
platforms might have something similar or might not.
>
> For a *specific* application, yes.
>
> As a *general* toolkit, to be linked into *any*
> application? Unlikely.
I guess you have a point. It pretty much has to be designed in from the
beginning.
I have a biased point of view because most of my work is developed on
top of the same core code and is primarily debugged on Linux and WIN32.
DS
| |
| Andrew Smallshaw 2006-06-11, 7:19 pm |
| On 2006-06-08, Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> wrote:
> davids@webmaster.com writes:
>
>
> Reliability *is* somewhat important here:
> you may not see that same crash again for another year, so it might
> be important to provide reliable crash trace.
If a single bug causes only one crash a year over an entire user base many
would argue it isn't worth fixing, no matter how simple the fix is. I'm
not sure I agree with that but I wouldn't put much effort into tracking it
down either.
At least, providing we're not talking true mission-critical software (i.e.,
failure potentially causes loss of life, widespread environmental
damage, etc.). If we are, you shouldn't be interested in dumps anyway...
--
Andrew Smallshaw
andrews@sdf.lonestar.org
| |
|
| Begin <slrne8ovpi.s09.andrews@sdf.lonestar.org>
On 2006-06-11, Andrew Smallshaw <andrews@sdf.lonestar.org> wrote:
> If a single bug causes only one crash a year over an entire user base many
> would argue it isn't worth fixing, no matter how simple the fix is. I'm
> not sure I agree with that but I wouldn't put much effort into tracking it
> down either.
The problem with that argument is of course that often enough you don't
know which bug caused what crash. If you're sure what the particular
problem is and can make a sensible tradeoff where not fixing it is
acceptable, a convincing argument to not fix it can be made. On the
other hand, if you're in the position to make that argument with any
degree of certainty, you know a lot about the specific problem already,
so fixing it might be less work than making the argument, not to mention
that not fixing it likely also means throwing away time+effort invested
in tracking down the information. And in that case the only sensible
course of action for the fixer is to shut up and fix it.
Or, put another way, if looked at only in local scope and assuming a
pretty strict set of prerequisites, then yes. If looked at with a global
perspective, then no. Put yet another way; not fixing has a cost, too.
--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.