AOL Webserver - More on lost of memory of aolserver process


Author More on lost of memory of aolserver process
Agustin Lopez

2005-12-24, 5:48 pm

Hello!

Below is a small extract of the memory behavior of the web server
(time / memory in KB) on our production server.

Hardware is:
-Opteron 64 dual
-4 GB RAM

The software setup is:
-Linux Debian AMD-64 Sarge
-aolserver-4.0.10 (http and https)
-tcl8.4.11
-OpenACS 5.2 with dotLrn
-Many concurrent connections.
-Postgresql-7.4.8 (on another server)

I think I have applied all the patches I have seen on this mailing list.
(The behaviour is the same as on 32-bit Debian.)
HTTPS is not the problem. Tcl is not the problem. It is a production server
and I think it would be very complicated to run it under a debugger.

Any pointers on how to detect where the memory is being lost?
At this rate of loss (1 to 10 MB per minute) the server only manages
2 days of uptime.

...
10:40:48 1150740
10:41:48 1205000
10:42:48 1213952
...
11:23:50 1415084
11:24:50 1415596
...
14:44:00 1575496
14:45:00 1575496
...
17:41:08 1634500
17:42:08 1634716
...
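
To put a number on the samples above, the average growth between two of them can be computed as follows. This is only a sketch: the proc name growth_kb_per_min is made up for illustration, and it assumes both samples were taken on the same day and that the second column is in KB, as stated in the post.

```tcl
# Hypothetical helper: average growth in KB per minute between two
# "HH:MM:SS KB" samples taken on the same day.
proc growth_kb_per_min {first last} {
    foreach {t1 kb1} $first {t2 kb2} $last break
    # scan with %d avoids Tcl treating "08" or "09" as invalid octal
    scan $t1 %d:%d:%d h1 m1 s1
    scan $t2 %d:%d:%d h2 m2 s2
    set secs [expr {($h2*3600 + $m2*60 + $s2) - ($h1*3600 + $m1*60 + $s1)}]
    return [expr {($kb2 - $kb1) * 60.0 / $secs}]
}

puts [growth_kb_per_min {10:40:48 1150740} {17:42:08 1634716}]
```

For the first and last samples shown, this comes to roughly 1150 KB per minute averaged over the seven hours.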

Thanks for your time,

Regards,
Agustin


--

======================================================
| Jose Agustin Lopez Bueno |
| E-Mail: Agustin.Lopez@uv.es |
| Home Page: http://www.uv.es/~lopezj/ |
| http://www.uv.es/postman/ |
| Tfnos: +34-96-3544310 +34-96-3543129 |
| Fax: +34-96-3544200 |
| Servicio de Informatica, Univ. de Valencia, Spain |
======================================================


Gustaf Neumann

2005-12-24, 5:48 pm

Agustin Lopez wrote:

>Any pointers on how to detect where the memory is being lost?
>At this rate of loss (1 to 10 MB per minute) the server only manages
>2 days of uptime.
>
>

Well, this means it runs perfectly when you reboot it manually every
night and the traffic is not increasing... (just joking)

I have written the following script to help find "application level
leaks", such as ever-growing variables, associative arrays, etc. It
should give you a rough idea of what is using how much memory in your
application. It does not measure the Tcl overhead (memory footprint of
namespaces, variables, commands) or e.g. nsvs. It might be useful for
detecting when an application creates namespaces and leaves growing
variables there. Note that the script must be run in each interpreter,
since these might have different states (e.g. source the script via
ns_eval).

Look for differences between interpreters and differences over time.

best regards
-gustaf neumann
----------------------

# script to determine application level leaks
# -gustaf neumann

proc report {} {
    # vsz is optional: exec forks the server process, so drop the next
    # two lines (and vsz in the output) if that is a concern
    if {[catch {exec ps xv | grep "^ *[pid] "} ps]} {set ps ""}
    set vsz [lindex $ps 6]
    set result ""
    array set total {vars 0 procs 0 cmds 0 array_elements 0 var_bytes 0}
    set details [lindex [__report__ns ::] 0]
    foreach l $details {
        # skipped namespaces carry only "ns", so reset the counters first
        array set tmp {vars 0 procs 0 cmds 0 array_elements 0 var_bytes 0}
        array set tmp $l
        foreach e [array names total] {incr total($e) $tmp($e)}
        append result $l\n
    }
    return "$result\nTOTAL: pid [pid] vsz $vsz namespaces [llength $details] [array get total]"
}

proc __report__ns {ns} {
    if {$ns eq "::dom::domNode" || $ns eq "::dom::domDoc"} {
        set result [list [list ns $ns]]
    } else {
        set pattern [expr {$ns eq "::" ? "::*" : "${ns}::*"}]
        set nrvars [llength [info vars $pattern]]
        set elements 0
        set bytes 0
        foreach var [info vars $pattern] {
            if {[array exists $var]} {
                incr elements [array size $var]
                # measure the element values, not the element names
                foreach e [array names $var] {
                    incr bytes [string length [set ${var}($e)]]
                }
            } elseif {[info exists $var]} {
                # declared but unset variables are skipped
                incr bytes [string length [set $var]]
            }
        }
        set nrprocs [llength [info procs $pattern]]
        set nrcmds [llength [info commands $pattern]]
        incr nrcmds -$nrprocs
        set result [list [list ns $ns vars $nrvars procs $nrprocs cmds $nrcmds \
            array_elements $elements var_bytes $bytes]]
    }
    foreach nc [lsort [namespace children $ns]] {
        if {$nc eq "::xotcl::classes"} continue
        foreach l [__report__ns $nc] {eval lappend result $l}
    }
    return [list $result]
}

ns_log notice [report]


patrick o'leary

2005-12-24, 5:48 pm

You may not want to run that within aolserver.

The exec creates a fork, which can double the
amount of memory consumed.

P




Gustaf Neumann

2005-12-24, 5:48 pm

patrick o'leary wrote:

> You may not want to run that within aolserver.
>
> The exec creates a fork, which can double the
> amount of memory consumed.


I have used this on large and small machines (including my notebook)
without problems, on significant blueprints (oacs + dotlrn). exec
creates a new address space (so it does not run into the 4GB limit).
Since modern OSes copy pages only on write, I really doubt that this
exec is doubling the memory consumption...

However, the line with the exec is not essential for the script; drop
it and remove the variable vsz from the output.
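
If the vsz figure is still wanted without forking, one possibility is to read it from /proc instead. The following is a minimal sketch that assumes a Linux-style /proc (as on the Debian system described at the start of the thread); the proc names vmsize_from_status and own_vmsize_kb are hypothetical and not part of AOLserver or of the original script.

```tcl
# Hypothetical helper: extract the VmSize value (in kB) from the text of
# /proc/<pid>/status. Returns "" if the field is not present.
proc vmsize_from_status {text} {
    if {[regexp -line {^VmSize:\s+(\d+)\s+kB} $text -> kb]} {
        return $kb
    }
    return ""
}

# Hypothetical helper: read this process's own virtual size without exec.
# Assumes a Linux-style /proc; returns "" where that file is not readable.
proc own_vmsize_kb {} {
    set path /proc/[pid]/status
    if {![file readable $path]} {return ""}
    set f [open $path r]
    set text [read $f]
    close $f
    return [vmsize_from_status $text]
}
```

In the report proc, "set vsz [own_vmsize_kb]" could then stand in for the two lines that call ps.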

-gustaf neumann


Agustin Lopez

2005-12-24, 5:48 pm

Hello!

Here is a URL with some results from your script:

http://pizarra.uv.es/report.txt

I did not understand the results very well.
Do you see anything strange?

Regards,
Agustin



Gustaf Neumann

2005-12-24, 5:48 pm

Agustin Lopez wrote:

>Hello!
>
>Here is a URL with some results from your script:
>
>http://pizarra.uv.es/report.txt
>
>I did not understand the results very well.
>Do you see anything strange?
>
>
>

No. Look at the TOTAL lines:

TOTAL: pid 18867 vsz 2533029 namespaces 376 vars 103 var_bytes 14501 array_elements 472 cmds 348 procs 8408

It says: 2.5GB virtual memory size; array elements, var_bytes etc.
look normal. Somebody suggested earlier a leak of domnodes, but it does
not look like that, since it would imply a larger number of cmds.

Let your server grow close to the crash and check the results again
(e.g. when it reaches 3GB or 3.5GB). If the numbers don't change
significantly, it is highly likely that a C-level memory leak is
involved and that it is not an application problem (oacs, dotlrn).

Hope this helps
-gustaf neumann
PS: There are four lines with TOTAL in the report file. Does this mean
you have only four connection threads running?


Agustin Lopez

2005-12-24, 5:48 pm

Hello again!

Thanks for your reply!

> PS: there are four lines with TOTAL in the report file. does this mean
> you have only four
> connection threads running?


No, I ran your script from the OpenACS shell utility, logging the
output. Four is the number of times I ran your script, appending the
results to a file each time.

http://myserver/ds/shell.tcl

Regards,
Agustin




Fenton, Brian

2007-10-05, 7:11 pm

On Sun, 16 Oct 2005 05:21:17 -0700
Gustaf Neumann wrote:
> i have written the following script to help to find "application level leaks"


and later
> If the numbers don't change significantly, it is highly likely that
> there is a c-level memleak involved and
> it is not an application problem (oacs, dotlrn).



Hello everyone,

I hope nobody minds me bumping this old thread. Gustaf, I used your script on a production server that appears to be leaking memory. Can you please help me interpret it?

This was the total at startup
TOTAL: pid 10686 vsz 592489 namespaces 543 vars 96 var_bytes 10118 array_elements 313 cmds 266 procs 4511

then just before the crash
TOTAL: pid 10686 vsz 1112509 namespaces 543 vars 86 var_bytes 8771 array_elements 256 cmds 266 procs 4469
unable to alloc 495960171 bytes

Would you consider the less than double "vsz" figure a significant change? Is it of any significance that the other values decreased (vars var_bytes etc)?
So would this suggest a C-level or application-level problem?

The strange thing about this particular case is that the developers tell me that "nothing changed" on this server in quite some time. And yet it mysteriously started crashing last week.

many thanks for any assistance

Brian Fenton


Gustaf Neumann

2007-10-05, 7:11 pm

Hi Brian,


Fenton, Brian wrote:
> I hope nobody minds me bumping this old thread. Gustaf, I used your script on a production server that appears to be leaking memory. Can you please help me interpret it?
>
> This was the total at startup
> TOTAL: pid 10686 vsz 592489 namespaces 543 vars 96 var_bytes 10118 array_elements 313 cmds 266 procs 4511
>
> then just before the crash
> TOTAL: pid 10686 vsz 1112509 namespaces 543 vars 86 var_bytes 8771 array_elements 256 cmds 266 procs 4469
> unable to alloc 495960171 bytes
>

Well, the script was written to detect whether someone adds variables
to namespaces, or keeps appending to variables, which are not reclaimed
during the cleanup of a connection. This is what I called "application
level leaks".

In your case, this does not seem to be happening (no additional
namespaces or vars; all figures are, if anything, lower than at start
time).
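
Comparing two TOTAL lines by eye gets tedious when there are many runs, so the comparison can be scripted. This is only a sketch: total_delta is a hypothetical helper, not part of the original script, and it assumes the TOTAL line format shown in this thread.

```tcl
# Hypothetical helper: compare two TOTAL lines from the report and return
# a flat list of "key delta" pairs (after minus before), skipping the pid.
proc total_delta {before after} {
    # Each line looks like "TOTAL: pid 10686 vsz 592489 ..."; drop the
    # leading "TOTAL:" word and treat the rest as key/value pairs.
    array set b [lrange $before 1 end]
    array set a [lrange $after 1 end]
    set out {}
    foreach key [lsort [array names b]] {
        if {$key eq "pid" || ![info exists a($key)]} continue
        lappend out $key [expr {$a($key) - $b($key)}]
    }
    return $out
}

set start "TOTAL: pid 10686 vsz 592489 namespaces 543 vars 96 var_bytes 10118 array_elements 313 cmds 266 procs 4511"
set crash "TOTAL: pid 10686 vsz 1112509 namespaces 543 vars 86 var_bytes 8771 array_elements 256 cmds 266 procs 4469"
puts [total_delta $start $crash]
```

For the two lines quoted above, only vsz grows (by 520020); every other counter stays flat or shrinks.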

The growth in vsz is not good (but not unusual), but the size
of the alloc is something to worry about.

Growth of vsz might be due to fragmentation of memory (zippy is optimized
for minimal locks, not minimal memory footprint) or due to the number
of threads you are using (maybe there were fewer threads at startup time
than at crash time?)

> Would you consider the less than double "vsz" figure a significant change? Is it of any significance that the other values decreased (vars var_bytes etc)?
> So would this suggest a C-level or application-level problem?
>

The main question is: why does it try to allocate 500MB?
What was the last request?
Do you use some image libraries?

> The strange thing about this particular case is that the developers tell me that "nothing changed" on this server in quite some time. And yet it mysteriously started crashing last week.
>

If an allocation of 500MB sounds unusual for your application,
I would suggest trying to figure out in detail what happens in the
last request.

Hope this helps

-gustaf neumann


Dossy Shiobara

2007-10-05, 7:11 pm

On 2007.10.05, Fenton, Brian <Brian.Fenton@QUEST.IE> wrote:
> unable to alloc 495960171 bytes


What does your application do that it tries to request 495MB of memory
in one shot?

I smell an [exec] ...

-- Dossy

--
Dossy Shiobara | dossy@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
"He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)


Fenton, Brian

2007-10-08, 7:11 am

Thanks very much for the replies. The developers reported back that the crashes were caused by a report getting repeatedly called that was returning massive amounts of data to the browser. Sounds a bit odd to me, but I don't have any more data at the moment. When I get more, I'll report back here.

Many thanks again for the assistance. It's much appreciated. It's comforting to know there's such capable support here on the mailing list.

Brian





Gustaf Neumann

2007-10-08, 1:11 pm

Hi Brian,

Fenton, Brian wrote:
> Thanks very much for the replies. The developers reported back that the crashes were caused by a report getting repeatedly called that was returning massive amounts of data to the browser. Sounds a bit odd to me, but I don't have any more data at the moment. When I get more, I'll report back here.
>

If this large amount of data is not a bug, you will always run into
such memory problems, since all aolserver threads run in the same
(limited) address space of a single process. I would suggest
considering the following options:
- spool data to a file and return the file
- use HTTP streaming via ns_write
- move to 64bit
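
As an illustration of the streaming option, output can be handed to the connection in bounded batches instead of being accumulated into one huge string. This is a minimal sketch: batch_write and batch_flush are hypothetical helper names, the 64 KB default is an arbitrary choice, and the sketch assumes the HTTP headers have already been sent by whatever means your AOLserver version provides.

```tcl
# Hypothetical helper: append a chunk to a buffer variable in the caller's
# scope and hand the buffer to "writer" once it reaches "limit" characters.
# "writer" is a command prefix; inside an AOLserver page it would be ns_write.
proc batch_write {bufVar chunk writer {limit 65536}} {
    upvar 1 $bufVar buf
    append buf $chunk
    if {[string length $buf] >= $limit} {
        eval $writer [list $buf]
        set buf ""
    }
}

# Hypothetical helper: send whatever is left in the buffer.
proc batch_flush {bufVar writer} {
    upvar 1 $bufVar buf
    if {[info exists buf] && $buf ne ""} {
        eval $writer [list $buf]
        set buf ""
    }
}
```

Inside the loop that produces the report rows, one would call "batch_write buf $line ns_write" for each line and "batch_flush buf ns_write" once after the loop, so that at most one batch of output is held in memory per request.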

-gustaf neumann


Fenton, Brian

2007-10-08, 1:11 pm

Hi Gustaf,

The developers have acknowledged that this is a bug, so they are changing their code. However thanks for the good suggestions! :-)

Brian



Tom Jackson

2007-10-08, 1:11 pm

So if the issue isn't that you have only 500 MB of memory, but that, for
instance, a user got tired of waiting and kept hitting 'reload', or even that
a bunch of users wanted the massive report at the same time, you can use a
threadpool with a very limited number of threads: as many as you want
downloading simultaneously. You could still get more requests at once, but
they will queue up and wait. Then if they hit reload, there is no cost.

If this is an autogenerated report, you could also save it as a file. That
might help by using fastpath as a cache, and memory mapping on Unix-like
machines. Even though AOLserver will stop sending if the client quits or
reloads, you may have a backend process which keeps on going.
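
The save-it-as-a-file idea could look roughly like this. It is a sketch under assumptions: cached_report is a hypothetical helper, the generator is assumed to return the whole report as a string, and nothing here stops two threads from regenerating the file at the same moment, so some form of locking would be needed in a real server.

```tcl
# Hypothetical helper: return the path of a cached report, regenerating it
# only when the file is missing or older than maxage seconds. "generator"
# is a command prefix that returns the report body as a string.
proc cached_report {path maxage generator} {
    if {![file exists $path] ||
        [clock seconds] - [file mtime $path] > $maxage} {
        # write to a temporary name first so readers never see a partial file
        set tmp $path.[pid].tmp
        set f [open $tmp w]
        puts -nonewline $f [eval $generator]
        close $f
        file rename -force $tmp $path
    }
    return $path
}
```

In an AOLserver page the result could then be passed to ns_returnfile, for example "ns_returnfile 200 text/plain [cached_report $path 300 build_report]", where build_report stands for the application's own report proc. Repeated reloads within the five minutes would then cost only a file send.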

tom jackson

On Monday 08 October 2007 06:22, Fenton, Brian wrote:
> Hi Gustaf,
>
> The developers have acknowledged that this is a bug, so they are changing
> their code. However thanks for the good suggestions! :-)



Fenton, Brian

2007-10-08, 1:11 pm

Great suggestion Tom! Will pass it on to the team.

Thanks
Brian


