Apache Mod-Python archive, August 2006
New module importer - why not make it the default in 3.3?
| Jim Gallacher 2006-08-12, 1:12 pm |
Graham,
In your JIRA cleanup session you made the comment a number of times that
the new importer is not likely to be the default in 3.3. I'm just
wondering why it can't be the default, with the old importer as an
option? Maybe:
PythonOption mod_python.legacy.importer *
If we were to do this we'd likely want a longer 3.3 beta test period,
but if it takes care of the ongoing module import issues it might be
worth it.
Jim
| Graham Dumpleton 2006-08-13, 1:12 am |
On 13/08/2006, at 12:35 AM, Jim Gallacher wrote:
> Graham,
>
> In your JIRA cleanup session you made the comment a number of times that
> the new importer is not likely to be the default in 3.3. I'm just
> wondering why it can't be the default, with the old importer as an
> option? Maybe:
>
> PythonOption mod_python.legacy.importer *
>
> If we were to do this we'd likely want a longer 3.3 beta test period,
> but if it takes care of the ongoing module import issues it might be
> worth it.
I wasn't looking at it as being the default, as I only know of one person
so far who has even been trying to use it, so feedback on it has been
very limited. It may be that the lack of negative feedback means that
there are no problems in the code, but it would help to get some positive
feedback from anyone who is using it on real-world code, even if only in
a development setup, confirming that it is working okay. That way I would
feel more comfortable about making it the default.
Making it the default, and therefore a first-class citizen rather than
the hanger-on it is now, also opens up the possibility of introducing
some new directives to control it, rather than using PythonOption for
some things.
For example, if one were to introduce a feature whereby a global search
path can be specified for modules, like PythonPath does for sys.path,
then I would prefer it to be a directive, perhaps called PythonModulePath.
The reason for making it a directive is that, with the new directive I
proposed in a recent JIRA issue, one could say:
PythonModulePath '["/some/path"]'
PythonAllowOverride -ModulePath
This makes how to configure it more visible and at the same time allows
the main Apache configuration to set where extra modules can be found
but then prohibit a user from changing it.
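As a rough illustration of how a module search path might behave, here is a minimal sketch. It assumes the PythonModulePath value is a Python list literal (as in the example above) and that lookup simply tries each directory in order, the way sys.path is searched. The names parse_module_path and find_module_file are hypothetical; this is not mod_python code.

```python
import ast
import os


def parse_module_path(value):
    """Parse a directive value such as '["/some/path"]' into a list of directories.

    Hypothetical: assumes the value is a Python list literal, like the example.
    """
    paths = ast.literal_eval(value)  # safe: only literals are accepted
    if not isinstance(paths, list):
        raise ValueError('module path must be a list of directories')
    return paths


def find_module_file(name, module_path):
    """Return the first '<dir>/<name>.py' that exists on module_path, else None."""
    for directory in module_path:
        candidate = os.path.join(directory, name + '.py')
        if os.path.isfile(candidate):
            return candidate
    return None
```

With PythonAllowOverride -ModulePath in effect, the idea described above is that only the value from the main Apache configuration would ever reach a lookup like this.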
One slight problem with prohibiting overrides is that, even for
directives, a user can get in and modify the table object returned by
req.get_config(), which could screw things up. To fix that, I think
perhaps the table object should be able to be created as read-only so
that handlers can't modify it.
As with introducing PythonAllowOverride in the first place, making the
config table object read-only is another step toward restricting what
can be done in handlers, thereby making mod_python a bit friendlier to
ISPs who want to make it available and know that one user can't screw
up another user's stuff.
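The read-only idea can be sketched in plain Python as a dict subclass whose mutating methods all raise. This is a hypothetical stand-in only: mod_python's real table object is a C type and is not modelled here, and ReadOnlyTable is an illustrative name.

```python
class ReadOnlyTable(dict):
    """A dict that can be read but not modified after construction.

    Hypothetical stand-in for a read-only config table.
    """

    def _read_only(self, *args, **kwargs):
        raise TypeError('config table is read-only')

    # Block every mutating dict method. In CPython, dict.__init__ fills the
    # table without going through these overrides, so construction still works.
    __setitem__ = _read_only
    __delitem__ = _read_only
    __ior__ = _read_only
    clear = _read_only
    pop = _read_only
    popitem = _read_only
    setdefault = _read_only
    update = _read_only
```

Note that determined code can still call dict.__setitem__(table, key, value) directly, so a pure-Python wrapper like this only guards against accidental modification; real protection would have to live in the C table type. On Python 3.3 and later, types.MappingProxyType offers a ready-made read-only view of a dict.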
Anyway, if we can make a decision that we will make the new importer the
default in 3.3, that would be great and would allow me to progress with
other ideas. When I have raised this question before, though, I never
really got any responses or agreement on whether to proceed with making
it the default.
Graham
| Jim Gallacher 2006-08-13, 1:12 pm |
Graham Dumpleton wrote:
>
> On 13/08/2006, at 12:35 AM, Jim Gallacher wrote:
>
>
> I wasn't looking at it as being the default as I only know of one person
> so far who has been even trying to use it, thus feedback on it has been
> very limited. It may be that the lack of negative feedback means that
> there are no problems in the code, but it would help to get some positive
> feedback from anyone who is using it on real world code examples even if
> only in a development setup, that it is working okay. That way I would
> feel more comfortable about making it the default.
I've had it turned on on my development machine since April, without
problems. However the stuff I'm doing is pretty plain vanilla and I have
generally designed things to avoid the problems seen in the old (3.1)
importer, so my experience may not be the best test.
> By making it the default and therefore a first class citizen and not just
> the hanger on it is now, then also opens up the possibility of introducing
> some new directives to control it, rather than using PythonOption for
> some things.
>
> For example, if one is to introduce a feature whereby a global search path
> can be specified for modules like PythonPath does for sys.path, then I
> would prefer it to be a directive, perhaps calling it PythonModulePath. The
> reason for making it a directive is that with the new directive I proposed
> in recent JIRA issue, one could say:
>
> PythonModulePath '["/some/path"]'
> PythonAllowOverride -ModulePath
>
> This makes how to configure it more visible and at the same time allows
> the main Apache configuration to set where extra modules can be found
> but then prohibit a user from changing it.
>
> One slight problem with prohibiting overrides, is that even for directives,
> a user can get in and modify the table object returned by req.get_config()
> which could screw things up. To fix that, I think perhaps that the table
> object should be able to be created as read only so that handlers can't
> modify it.
> As with introducing PythonAllowOverride in the first place, making it so
> the config table object is read only is another step in making it
> possible to restrict what can be done in handlers thereby making it
> perhaps a bit friendlier to ISPs who want to make mod_python available and
> know that one user can't screw up another users stuff.
I think this is a worthy goal. I'd also like to come up with a fix for
the shared mutexes. It is just not right that one user can deadlock the
server by grabbing all the locks and never releasing them.
> Anyway, if we can make a decision that we will make new importer the
> default in 3.3 that would be great and would allow me to progress with
> other ideas. When I have raised this question before though, never really
> got any responses or agreement on whether to proceed with making it the
> default.
>
> Graham
I think part of the problem is that you are the only person that has
taken the time to think through the import mechanism. It makes it
difficult for the rest of us when we haven't made the effort to
understand the deep magic. It's unfair to put that kind of pressure on
you, but to some extent that is a result of a fairly small developer
community.
Is the importer change big enough to warrant a jump to 4.0, in order to
indicate that this is a *major* change?
Either way (3.3 or 4.0 as the next release) I'm in favour of turning on
the new importer by default. As long as users have the option of falling
back to the old importer if things go pear-shaped I think we'll be OK.
Not turning it on by default now means it won't happen for another
year (if past history for releases is any indication). If it's just
optional in 3.3, my guess is most users *won't* turn it on, so we really
won't be that much further ahead.
If and when we do turn it on by default we should make a much more
vigorous attempt to solicit testing feedback from application
developers, beyond getting people to just run the unit tests.
Jim
| Graham Dumpleton 2006-08-14, 7:15 am |
On 14/08/2006, at 12:29 AM, Jim Gallacher wrote:
>
> I think part of the problem is that you are the only person that has
> taken the time to think through the import mechanism. It makes it
> difficult for the rest of us when we haven't made the effort to
> understand the deep magic. It's unfair to put that kind of pressure on
> you, but to some extent that is a result of a fairly small developer
> community.
>
> Is the importer change big enough to warrant a jump to 4.0, in order to
> indicate that this is a *major* change?
I'm not totally sure, but whether it is called 3.3 or 4.0, we will still
have to document the changes very well and educate people. Most people
will probably install it blindly anyway, even if it is called 4.0, and
not consider how it may be different.
> Either way (3.3 or 4.0 as the next release) I'm in favour of turning on
> the new importer by default. As long as users have the option of falling
> back to the old importer if things go pear-shaped I think we'll be OK.
In which case calling it 3.3 is probably not a big deal, as the choice
will be there to restore the old behaviour. If it were an all-or-nothing
change, it would definitely need to be called 4.0.
> Not turning it on by default now means that won't happen for another
> year (if past history for releases is any indication). If it's just
> optional in 3.3 my guess is most users *won't* turn it on, so we really
> won't be that much further ahead.
Too true. :-(
> If and when we do turn it on by default we should make a much more
> vigorous attempt to solicit testing feedback from application
> developers, beyond getting people to just run the unit tests.
We probably also want to work with the major packages that use
mod_python and have them explicitly check that everything still works.
Graham
| Jim Gallacher 2006-08-14, 7:15 am |
Graham Dumpleton wrote:
>
> On 14/08/2006, at 12:29 AM, Jim Gallacher wrote:
>
>
> I'm not totally sure, but whether it is called 3.3 or 4.0 we will still
> have to document the changes very well and will have to educate people.
> Most people will probably blindly install it anyway even if called 4.0
> and not consider how it may be different.
I'm pretty sure that is what I would do. 
>
> In which case calling it 3.3 is probably not a big deal, as the choice will
> be there to restore the old behaviour. If it was an all or nothing change,
> then would definitely need to be called 4.0.
Perhaps it makes sense to call it 4.0 when the old behaviour is
permanently removed, but introduce it as the default in 3.3.
>
> Too true. :-(
>
>
> Also want to probably interact with major packages which use mod_python
> and have them explicitly check to make sure everything still works.
Definitely.
Jim
| Dan Eloff 2006-08-16, 7:12 pm |
The new importer gets my vote.
I've been using it for a while now on my development servers and it
works great. I've not discovered any bugs. I've verified it with
PythonAutoReload and PythonDebug in every combination of On and Off. For
a complex hierarchy of Python files, with both set to Off, I've verified
it doesn't touch the filesystem. With both set to On it touches the
filesystem around 700 times (still < 1 second); I wouldn't recommend
that people do that on servers without lots of spare time. It's a real
time saver for development, though! I hardly ever have to restart Apache
anymore.
-Dan
On 8/12/06, Jim Gallacher <jim@jgassociates.ca> wrote:
> Graham,
>
> In your JIRA cleanup session you made the comment a number of times that
> the new importer is not likely to be the default in 3.3. I'm just
> wondering why it can't be the default, with the old importer as an
> option? Maybe:
>
> PythonOption mod_python.legacy.importer *
>
> If we were to do this we'd likely want a longer 3.3 beta test period,
> but if it takes care of the ongoing module import issues it might be
> worth it.
>
> Jim
>
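To make the filesystem-access numbers above a little more concrete, here is a minimal sketch of an mtime-based reload check. It assumes each cached module records its file path and the modification time seen when it was loaded: with reloading enabled, every module consulted costs one stat() call, and with it disabled no stat() calls happen at all. needs_reload and changed_modules are hypothetical names, not the mod_python importer's API, and the sketch does not model whatever extra checks PythonDebug may add.

```python
import os


def needs_reload(path, cached_mtime):
    """Return True if the file's mtime differs from the one recorded at load time.

    Each call costs one stat() on the file.
    """
    try:
        return os.path.getmtime(path) != cached_mtime
    except OSError:
        # Treat a missing or unreadable file as changed.
        return True


def changed_modules(cache, autoreload):
    """Return the names in cache ({name: (path, mtime)}) whose files have changed.

    When autoreload is off, the filesystem is never touched.
    """
    if not autoreload:
        return []
    return [name for name, (path, mtime) in sorted(cache.items())
            if needs_reload(path, mtime)]
```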
| Graham Dumpleton 2006-08-16, 7:12 pm |
Dan Eloff wrote ..
> The new importer gets my vote.
>
> I've been using it for a while now in my development servers and it
> works great. I've not discovered any bugs. I've verified it with
> PythonAutoReload and PythonDebug in any combination of On and Off. For
> a complex hierarchy of Python files in with both set to off, I've
> verified it doesn't touch the filesystem. For both set to on it
> touches the filesystem around 700 times (still < 1 second), I wouldn't
> recommend that people do that on servers without lots of spare time.
> It's a real time saver for development though! I hardly ever have to
> restart apache anymore.
Hmm, 700 seems a lot. How are you determining that?
Right at the end of your response handler, can you add:
from mod_python import apache, importer
apache.log_error("modules visited %d" % len(importer._get_modules_cache()))
What this is doing is accessing a special per-request module cache used
to avoid problems with a module being changed part way through a
request and two bits of code loading different versions. As a
consequence, seeing how many modules are held in it will tell you how
many modules come into play with the current request.
If you want to know what these modules actually are, you can do something
like:
for module in importer._get_modules_cache().values():
    apache.log_error("module %s" % module.__file__)
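The reason for the per-request cache can be shown with a small stand-alone sketch: the first import of a name during a request is remembered, and any later import of the same name within that request gets the same module object, even if the file on disk changes in between. RequestModuleCache and the loader callable are hypothetical illustrations, not the importer's internals.

```python
class RequestModuleCache(object):
    """Hypothetical per-request module cache.

    loader(name) stands in for whatever actually loads a module from disk.
    """

    def __init__(self, loader):
        self._loader = loader
        self._modules = {}  # name -> module object seen first in this request

    def import_module(self, name):
        # Load only on first use; later imports in the same request reuse it,
        # so two bits of code can never see different versions.
        if name not in self._modules:
            self._modules[name] = self._loader(name)
        return self._modules[name]

    def modules(self):
        """Return a copy of the modules that came into play for this request."""
        return dict(self._modules)
```

A new cache object would be created for each request, so a changed file is picked up by the next request rather than part way through the current one.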
One of the things I used to do when I was testing this was to use some of
the dependency information to produce 'dot' graph definition files and
view them in GraphViz on the Mac. Quite interesting. I should resurrect
the code and post it on the mailing list so people can play with it. I
should also go back and re-audit the code to see how many stats are
being done and whether I am doubling up.
Graham
| Graham Dumpleton 2006-08-17, 1:12 am |
Graham Dumpleton wrote ..
> One of things I used to do when I was testing this, was to use some of
> the dependency information to produce 'dot' graph definition files and
> view them in GraphViz on the Mac. Quite interesting. I should resurrect
> the code and post it on the mailing list so people can play with it.
Okay, I couldn't find it, but I have written a new version from scratch.
My original used to look at the whole cache for the process. This one
looks at just the modules used by a specific request. To use it, just
specify a PythonLogHandler which calls the handler below.
from mod_python import apache, importer
import time

def loghandler(req):
    req.log_error('loghandler')
    # Write the dependency graph for this request in GraphViz 'dot' format.
    output = file('/tmp/request.dot', 'w')
    # Per-request cache of the modules that came into play for this request.
    modules = importer._get_modules_cache()
    print >> output, 'digraph REQUEST {'
    print >> output, 'node [shape=box];'
    for module in modules.values():
        children = module.__mp_info__.children
        file1 = module.__file__
        mtime1 = module.__mp_info__.cache.mtime
        time1 = time.asctime(time.localtime(mtime1))
        direct1 = module.__mp_info__.cache.direct
        indirect1 = module.__mp_info__.cache.indirect
        if children:
            # One edge per child: "file, mtime, direct, indirect" labels.
            for child in children:
                file2 = modules[child].__file__
                mtime2 = modules[child].__mp_info__.cache.mtime
                time2 = time.asctime(time.localtime(mtime2))
                direct2 = modules[child].__mp_info__.cache.direct
                indirect2 = modules[child].__mp_info__.cache.indirect
                print >> output, '"%s\\n%s - %d - %d" -> "%s\\n%s - %d - %d";' % \
                    (file1, time1, direct1, indirect1, file2, time2, direct2, indirect2)
        else:
            # A module with no children is emitted as a standalone node.
            print >> output, '"%s\\n%s - %d - %d";' % \
                (file1, time1, direct1, indirect1)
    print >> output, '}'
    output.close()
    return apache.OK
You then need to use one of the graphing tools available for 'dot'
graphs to view it. For the Mac tool see:
http://www.graphviz.org/
There are viewers for other platforms linked on the resources page.
I have attached two images of graphs produced from two different quick
tests, one of them showing a loop within the module imports.
One thing that is great about being able to see this stuff visually is
that, so long as you force only one Apache child process to be running,
you can watch the modification times change as you modify things and
confirm the reloads based on the direct/indirect hit counts being
displayed. This is how I was confirming that the module importer was
doing the correct thing in the first place. It has been a while since I
have done this, so I hope I didn't stuff up the code in the interim and
that it isn't doing unnecessary reloads.
I would love to see an example of what the module dependencies for your
application are, unless the path names expose sensitive information. Can
you send the "dot" file produced? I used images here so people could see
it without needing a graphing tool, but your example might end up with
text too small to read in an image. If anyone else produces other quite
complicated examples, I would love to see them as well.
Have fun.
Graham
| Graham Dumpleton 2006-08-17, 1:12 am |
Graham Dumpleton wrote ..
> Graham Dumpleton wrote ..
>
> Okay, I couldn't find it, but I have written a new version from scratch.
> My original used to look at whole cache for process. This one looks at
> just the modules used by a specific request. To use it, just specify a
> PythonLogHandler which calls the handler below.
Here is a slightly more refined version.
from mod_python import apache, importer
import time

def loghandler(req):
    req.log_error('loghandler')
    output = file('/tmp/request.dot', 'w')
    modules = importer._get_modules_cache()
    print >> output, 'digraph REQUEST {'
    print >> output, 'node [shape=box];'
    for module in modules.values():
        # One node per module, labelled "file, generation - mtime - direct - indirect".
        name1 = module.__name__
        file1 = module.__file__
        mtime1 = module.__mp_info__.cache.mtime
        time1 = time.asctime(time.localtime(mtime1))
        direct1 = module.__mp_info__.cache.direct
        indirect1 = module.__mp_info__.cache.indirect
        generation1 = module.__mp_info__.cache.generation
        print >> output, '%s [label="%s\\n%d - %s - %d - %d"];' % \
            (name1, file1, generation1, time1, direct1, indirect1)
        # One edge per child, using module names as node IDs.
        children = module.__mp_info__.children
        for child in children:
            name2 = modules[child].__name__
            print >> output, '%s -> %s' % (name1, name2)
    print >> output, '}'
    output.close()
    return apache.OK
This one also shows the generation number, which is quite important, as
it is this, more than file modification times, that is used to determine
if children have changed and parents should be reloaded.
Do note, though, that anything displayed out of __mp_info__.cache may
not be totally accurate, as that information reflects the current state
of the cache and not what it was at the time that the module loading
occurred for the specific request. Thus, if there are lots of other
requests happening in parallel, the direct/indirect hit counts may
not be what you expect.
Except for where loops are occurring, a child of a module should always
have a lower generation count.
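The role of the generation number can be illustrated with a toy model. It assumes, consistently with the description above but without checking the importer source, that a single counter is bumped each time a module finishes loading. Children finish loading before their parent, so they normally end up with lower generations; if a child is later reloaded, it gets a higher generation than its parent, which flags the parent as needing a reload. ToyLoader and its methods are hypothetical.

```python
import itertools


class ToyLoader(object):
    """Toy model of generation-based reload checks (not mod_python's code)."""

    def __init__(self, deps):
        self.deps = deps                    # module name -> list of child names
        self.generation = {}                # module name -> generation when last loaded
        self._counter = itertools.count(1)
        self._loading = set()               # modules part way through loading

    def load(self, name):
        self._loading.add(name)
        for child in self.deps.get(name, []):
            # Skip children already loaded, and break import loops by not
            # re-entering a module that is still being loaded.
            if child not in self.generation and child not in self._loading:
                self.load(child)
        self._loading.discard(name)
        # The generation is assigned once loading finishes, so children
        # loaded during this call receive lower numbers.
        self.generation[name] = next(self._counter)

    def needs_reload(self, name):
        """A module is stale if any child was (re)loaded after it was."""
        mine = self.generation[name]
        return any(self.generation.get(child, 0) > mine
                   for child in self.deps.get(name, []))
```

In an import loop such as a -> b -> a, b finishes first and so ends up with a lower generation than its own child a, which is the exception mentioned above. In this toy model that would leave b permanently flagged as stale, so a real implementation has to treat loops specially; how mod_python handles that is not modelled here.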
BTW, the direct hit counter is incremented when the module is first
loaded and when that module is the root module being imported. The
indirect hit count is incremented where a module is consulted because
it is some descendant of the root.
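A simplified sketch of the two counters, modelling only the root-versus-descendant rule described above (the extra direct hit on a module's first load is left out): importing a root bumps its direct count, and every descendant consulted while checking that root bumps its indirect count once. record_hits and the dependency dictionary are hypothetical.

```python
def record_hits(root, deps, direct, indirect):
    """Update hit counters for one import of root.

    deps maps a module name to its child names; direct and indirect are
    dicts of counts, updated in place.
    """
    direct[root] = direct.get(root, 0) + 1
    seen = set([root])
    stack = list(deps.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # count each descendant once per import, even with loops
        seen.add(name)
        indirect[name] = indirect.get(name, 0) + 1
        stack.extend(deps.get(name, []))
```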
Play around and you might get the idea. The counts were principally
for debugging, but once we know that this all works okay, we could
delete that tracking code to make it run a minuscule amount faster.
Graham