|
Home > Archive > Apache Mod-Python > June 2005 > Solving the import problem
| Author |
Solving the import problem
|
|
| Nicolas Lehuen 2005-06-07, 5:48 pm |
| One last thing that we should prepare is a clear and definite answer
to the zillion users who need to import a custom utility module.
Today, we have 4 ways of importing code :
a) the standard "import" keyword. Today, it works unchanged
(mod_python doesn't install any import hook). The consequence is that
the only modules that can be imported this way are those found on the
PYTHONPATH. Importing custom code is easy if you can manipulate this
variable (either directly or through the PythonPath configuration
directive), but not everybody has this luxury (think shared hosting,
although not being able to change the PythonPath through an .htaccess
file seems pretty restrictive to me.).
b) the PythonImport directive, which ensures that a module is imported
(hence its initialization code is run), but doesn't really import it
into the handler's or published module's namespace.
c) the apache.import_module() function, which is frankly a strange
beast. It knows how to approximately reload some modules, but has many
tricks that make it dangerous, namely the one that causes
modules with the same name but residing in different directories to
collide. I really think that mixing dynamic (re)loading of code with
the usual import mechanisms is a recipe for disaster. Anyway, today,
it's the only way our users can import some shared code, using a
strange idiom like
apache.import_module('mymodule',[dirname(__file__)]).
d) the new publisher.get_page(req,path), which is not really an answer
since it is designed to allow a published object to call another
published object from another page (not to call some shared code).
This mess should be sorted out. As a baseline, I'd say that we have 4
kinds of code in mod_python :
1) the standard Python code that should be imported using the "import" keyword
2) handlers, which are dynamically loaded through apache.import_module
(so they are declared in sys.modules, with all the problems that can
cause when sharing a single setup with multiple handlers that have the
same name, "publisher" for example) - this should be fixed.
3) published modules, which are dynamically loaded by the
mod_python.publisher handler (so now they don't have any problems that
were previously caused by apache.import_module). An important thing to
notice is that published modules are usually stored in a directory
which is visible by Apache (handlers don't need to reside in a public
directory), amongst .html and image files. Hence, people can
legitimately be reluctant to put their core application code
(including DB passwords etc.) in published modules, for security and
code/presentation separation issues.
4) custom library code, AKA core application code. This code should
reside somewhere, preferably in a private directory (at least direct
access to this code from the web should be denied) and be easily
imported and reloaded into published modules, without having to tinker
too much with the PYTHONPATH variable or the PythonPath directive.
What would be nice is a clear and definite way to handle those 4 kinds
of code. To me, layers 2, 3 and 4 could be handled by the same dynamic
code cache, except that a careful directory structure or naming scheme
would prevent layer 4 from being visible from the web.
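
To make the "dynamic code cache" idea concrete, here is a minimal sketch in current Python 3. It is not mod_python's actual cache code, whose internals are not shown in this thread; the class name PathModuleCache and its load() method are hypothetical. The sketch keys modules by absolute file path rather than by name, never registers them in sys.modules, and reloads a file when its modification time changes.

```python
import importlib.util
import os
import threading


class PathModuleCache:
    """Hypothetical cache: modules keyed by file path, not by module name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # absolute path -> (mtime_ns, module)

    def load(self, path):
        """Return the module for `path`, (re)loading it if the file changed."""
        path = os.path.abspath(path)
        mtime = os.stat(path).st_mtime_ns
        with self._lock:
            entry = self._entries.get(path)
            if entry is not None and entry[0] == mtime:
                return entry[1]  # unchanged on disk: reuse the cached module
            # The name is only a label; it is never stored in sys.modules,
            # so two files called util.py cannot collide.
            name = os.path.splitext(os.path.basename(path))[0]
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            self._entries[path] = (mtime, module)
            return module
```

With such a cache, layers 2, 3 and 4 differ only in which directories they are loaded from; keeping layer 4 out of the document tree remains a matter of directory layout.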
I know Vampire solves a lot of these problems, so we have two alternatives:
A) We decide that we won't solve the whole problem in mod_python. We
take apache.import_module out and shoot it. Handlers are loaded in a
real dynamic code cache (maybe the same as the one now used by
mod_python.publisher), which solves a lot of problems.
Custom library code is not handled : if you want to import some code,
you put it wherever you like and make sure PYTHONPATH or the
PythonPath directive points to it, so you can import it like a standard
module. You'll never use apache.import_module anymore, it will
blissfully dissolve into oblivion (and be removed from the module,
anyway).
If you need to reload your core application code without restarting
Apache, then too bad, mod_python doesn't know how to do this. Check
out Vampire.
B) We decide to solve the whole problem in mod_python.
apache.import_module is not much luckier this time, it is still taken
out and shot in the head. We solve the handlers loading problem. But
now, with a little help from Graham, custom application code can be
dynamically loaded and reloaded from any place without having to
tinker with the PYTHONPATH variable and/or the PythonPath directive.
Everything can be done from the source code with a little help from an
.htaccess file.
So, sorry for this long mail, but I had to get this out. The current
situation is pretty bad, zillions of people need to do this simple
thing, and when they notice it's not that simple (or it's buggy), they
decide to build the nth application framework on mod_python. So,
either we reckon it's None of our business, that users should turn to
higher level frameworks like Vampire, and we remove
apache.import_module, or we decide to tackle the issue, and we remove
apache.import_module. Either way, it must go.
What do you think ?
Regards,
Nicolas
| |
|
| I'm all for fixing the importing of handler modules, because that can bite
you if they are being loaded and initialized under heavy load in the same
process in different threads.
As for user imports in general, I would prefer that everything work in a
standard Python way out of the box. mod_python should be what it is -- an
interface to apache. That's most useful for application and framework
developers I think. Otherwise you could end up breaking other people's
applications and frameworks who have resolved the problem in a different way.
I'm not opposed to providing some sort of module or function for psp or
publisher or whatever "example" handlers that come with mod_python, just so
long as it's not in the core of mod_python itself. To me this is an
application/framework issue, not a mod_python issue. That said, the
behaviour should probably be documented well so people can avoid the pitfalls.
Nick
| |
| Barry Pearce 2005-06-07, 5:48 pm |
| Indeed I'm for fixing it...it's on my list of things to do...right after
'do everything the company wants RSN'!!!!
I do believe it should be mod_python that is fixed. I have a VERY big
need for reload of modules *without* taking down my server - end users
are using it and credit card transactions are taking place....I cannot
afford to take it down...
As for vampire - why would I want vampire? mod_python is great except
this. I personally have no interest in adding yet more software to my
system just to solve the mod_python import issue - I'd rather it was
fixed in the right place...not everyone uses vampire...
Nick wrote:
> I'm all for fixing the importing of handler modules, because that can
> bite you if they are being loaded and initialized under heavy load in
> the same process in different threads.
? Surely not in different interpreters - and as mod_python uses a number
of interpreters surely this is not an issue...or are we saying python
itself is not thread safe in this regard?
> As for user imports in general, I would prefer that everything work in a
> standard Python way out of the box. mod_python should be what it is --
> an interface to apache. That's most useful for application and
> framework developers I think. Otherwise you could end up breaking other
> people's applications and frameworks who have resolved the problem in a
> different way.
agreed....but...optional functionality to stop us having to restart
apache would be good...and I consider that that is part of
'interfacing'. It solves a problem that is inherent in 'interfacing'.
> I'm not opposed to providing some sort of module or function for psp or
> publisher or whatever "example" handlers that come with mod_python, just
> so long as it's not in the core of mod_python itself. To me this is an
> application/framework issue, not a mod_python issue. That said, the
> behaviour should probably be documented well so people can avoid the
> pitfalls.
quite. From my perspective it's like this...either it can be solved as a
group here and in mod_python or I'll solve it my way outside of
mod_python....but whatever happens the business driver states I cannot
take down a web server - it's like antivirus software asking you to
reboot your machine every time it updates (no tales of woe please...I've
heard them already!)
Barry
| |
| Graham Dumpleton 2005-06-07, 8:45 pm |
|
On 08/06/2005, at 8:33 AM, Barry Pearce wrote:
> Indeed Im for fixing it...its on my list of things to do...right after
> 'do everything the company want RSN'!!!!
>
> I do believe it should be mod_python that is fixed. I have a VERY big
> need for reload of modules *without* taking down my server - end users
> are using it and credit card transactions are taking place....I cannot
> afford to take it down...
>
> As for vampire - why would I want vampire? mod_python is great except
> this. I personally have no interest in adding yet more software to my
> system just to solve the mod_python import issue - Id rather it was
> fixed in the right place...not everyone uses vampire...
From what I can see, hardly anyone actually uses Vampire and a big
reason is probably the same attitude you are expressing. :-(
To echo a comment I just made in a separate posting to the main mailing
list, I believe that mod_python and Apache in combination have huge
potential as a base for quite powerful and complex systems. I feel
though that most people don't really appreciate the fullness of what
mod_python has to offer and just scratch the surface. Things aren't
helped by mod_python having some rough edges and gaps in its basic
functionality; filling them would make it much easier for people new
to mod_python to make use of it. As a result of these gaps I keep
seeing people trying to harness what is provided in mod_python in ways
that they probably shouldn't, resulting in code which over time will
just become messy and hard to manage. This may be okay for simple
systems, but in a complicated system it's asking for trouble.
One can liken mod_python to providing a good foundation and some basic
bits and pieces for building a house. Some of these bits are currently
broken or don't function in an ideal way. The point of Vampire is to
provide fixed versions of some of these bits and to provide some better
bits to help you in building your house. What Vampire isn't is a
preconstructed house which you are forced to adopt. I get the
impression from various people that they think Vampire is a house and
as such it will be inflexible because it can only be used in a certain
way, consequently the often repeated thought I see expressed is "why
would I want to use it?".
In some respects, some of the bits and ideas embodied in Vampire are
things that should be in the core mod_python package. At the moment
though, I see enough bugs and other issues in mod_python that need
fixing that one is better concentrating on them first, rather than
trying to push more stuff in there. At least to my mind, Vampire is
serving at the moment as a test bed for stuff that could be later
incorporated into mod_python when a clear idea develops of where the
best way to take mod_python would be. Unfortunately, a lot of people
seem to feel that since it isn't in mod_python now, that there can't be
much point to it and it isn't worth investigating. :-(
Graham
| |
| Graham Dumpleton 2005-06-07, 8:45 pm |
| An update on a few things that I have managed to get working in
Vampire in respect of some of the issues below, plus a few other
comments.
On 08/06/2005, at 6:33 AM, Nicolas Lehuen wrote:
> One last thing that we should prepare is a clear and definite answer
> to the zillion users who need to import a custom utility module.
> Today, we have 4 ways of importing code :
>
> a) the standard "import" keyword. Today, it works unchanged
> (mod_python doesn't install any import hook). The consequence is that
> the only modules that can be imported this way are those found on the
> PYTHONPATH. Importing custom code is easy if you can manipulate this
> variable (either directly or through the PythonPath configuration
> directive), but not everybody has this luxury (think shared hosting,
> although not being able to change the PythonPath through an .htaccess
> file seems pretty restrictive to me.).
I finally worked out the proper way in Python that one is meant to
install import hooks so that you don't screw up other packages also
trying to use import hooks, although it relies on the other packages
doing it the correct way as well.
The result is that in Vampire, when the feature is enabled, you can
use the "import" keyword to import modules local to the document tree
where the handler is and it will use the Vampire module importing
system instead for those imports. Where the context is traceable back
to a top level import of a handler from Vampire, the automatic module
reloading mechanism, including changes in children causing parents
to be reloaded, is all working okay.
When this feature kicks in, it will only search in the same directory
as the handler file and optionally along a module search path
which is distinct from the normal sys.path. This search path has to
be separate and can't overlap with sys.path because you will end up
with duplicate modules loaded in different ways if one isn't careful.
The preferred approach is that sys.path should simply not include any
directory which is a part of the document tree.
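
As a rough illustration of an import hook that only comes into play for a private search path, here is a minimal sketch using the sys.meta_path mechanism in current Python 3. It is not Vampire's implementation; the class name AppPathFinder is hypothetical. Unlike the system described above, it relies on the standard import machinery, so the modules it finds do end up in sys.modules, and it deliberately ignores dotted (package) names.

```python
import importlib.abc
import importlib.util
import os
import sys


class AppPathFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: top-level modules from a private search path."""

    def __init__(self, search_path):
        # This list is kept separate from sys.path on purpose.
        self.search_path = list(search_path)

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # sub modules and packages are out of scope here
        for directory in self.search_path:
            candidate = os.path.join(directory, fullname + ".py")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate)
        return None  # let the remaining finders (if any) have a go


def install(search_path):
    """Append the finder so the standard finders are consulted first."""
    finder = AppPathFinder(search_path)
    sys.meta_path.append(finder)
    return finder
```

Appending rather than prepending is one way to cooperate with other hooks: the finder is only asked about names that nothing else could resolve.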
The only part of what "import" provides that isn't working completely
yet is importation of packages. The bits of this that do work are
importing the root of the package, importing a sub module/package of
the package which was already imported by the parent, and using the
from/import syntax to import only bits of any of these.
The one bit that I haven't been able to get working yet is where you
have "import package.module" and where "module" wasn't explicitly
imported by "package/__init__.py".
The reason it doesn't work is that the part of the Python import system
that deals with packages assumes that any module imports are always
stored in sys.modules. It relies on this and will search sys.modules
for the parent module to determine which directory it is in and thus
from where it should import the sub module/package.
At the moment this makes it look to me like any system that tries to
use import hooks in Python cannot support packages where the
modules/packages are not stored in sys.modules.
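
The dependence on sys.modules can be observed with a small experiment. The sketch below uses current Python 3 and throwaway names (demo_pkg, leaf) that are purely illustrative. It builds a package whose __init__.py does not import its sub module, imports "demo_pkg.leaf", and then shows that the parent was registered in sys.modules and that its __path__ points at the package directory.

```python
import importlib
import os
import sys
import tempfile


def make_demo_package(root, package="demo_pkg", submodule="leaf"):
    """Create root/<package>/__init__.py and root/<package>/<submodule>.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")  # note: does NOT import the sub module itself
    with open(os.path.join(pkg_dir, submodule + ".py"), "w") as f:
        f.write("VALUE = 'leaf loaded'\n")
    return pkg_dir


root = tempfile.mkdtemp()
pkg_dir = make_demo_package(root)
sys.path.insert(0, root)
try:
    importlib.invalidate_caches()
    leaf = importlib.import_module("demo_pkg.leaf")
finally:
    sys.path.remove(root)

# The parent package was stored in sys.modules as a side effect, and its
# __path__ is what told the import system where to look for "leaf".
parent = sys.modules["demo_pkg"]
print(os.path.samefile(list(parent.__path__)[0], pkg_dir))
print(leaf.VALUE)
```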
Because of this, even though packages partly work, at the moment I throw
an import error with a message saying that packages aren't supported in
the context of the Vampire module importing system if such an import is
attempted. This shouldn't be an issue for individual handler files
stored in the document tree as you wouldn't write them as packages
normally anyway.
It might be an issue if someone had a set of utility modules living
outside the document tree that they wanted automatic reloading to work
on. The only choice there at the moment is not to use a traditional
package in that context. You could get more flexibility by accessing
the module loading API in Vampire directly, but that means the utility
modules, that perhaps shouldn't strictly know about Vampire/mod_python,
will.
> b) the PythonImport directive, which ensure that a module is imported
> (hence its initialization code is ran), but doesn't really import it
> into the handler's or published module's namespace.
>
> c) the apache.import_module() function, which is frankly a strange
> beast. It knows how to approximately reload some modules, but has many
> tricks that makes it frankly dangerous, namely the one that causes
> modules with the same name but residing in different directories to
> collide. I really think that mixing dynamic (re)loading of code with
> the usual import mechanisms is a recipe for disaster. Anyway, today,
> it's the only way our users can import some shared code, using a
> strange idiom like
> apache.import_module('mymodule',[dirname(__file__)]).
I know you have marked:
http://issues.apache.org/jira/browse/MODPYTHON-9
as resolved by virtue of including a new module importing system in
publisher, but there is still the underlying problem in import_module()
function that once you access an "index.py" in a subdirectory, the one
in the parent is effectively lost. I realise that even if this is fixed,
each still gets reloaded on cyclic requests, but at least the parent
doesn't become completely useless.
> d) the new publisher.get_page(req,path), which is not really an answer
> since it is designed to allow a published object to call another
> published object from another page (not to call some shared code).
>
> This mess should be sorted out. As a baseline, I'd say that we have 4
> kinds of code in mod_python :
Brain slowing down at this point. I'll perhaps come back with some more
coherent thoughts on the rest of your points later when I have got some
other things out of the way. :-)
| |
| Gregory (Grisha) Trubetskoy 2005-06-08, 2:46 am |
|
On Tue, 7 Jun 2005, Barry Pearce wrote:
>
> quite. From my perspective its like this...either it can be solved as a group
> here and in mod_python or ill solve it in my way outside of mod-python....but
> whatever happens the business driver states I cannot take down a web server -
> its like antivirus software asking you to reboot your machine every time it
> updates (no tales of woah please...ive heard them already!)
Perhaps if we agree on the general direction but think that a change may
be a bit radical, doing something like what Python has done with
"__future__" is the way to go, i.e. new functionality has to be
explicitly requested, and if people like it, then it becomes default?
Grisha
| |
| Nicolas Lehuen 2005-06-08, 2:46 am |
| 2005/6/8, Graham Dumpleton <grahamd@dscpl.com.au>:
>
> I know you have marked:
>
> http://issues.apache.org/jira/browse/MODPYTHON-9
>
> as resolved by virtue of including a new module importing system in
> publisher, but there is still the underlying problem in import_module()
> function that once you access an "index.py" in a subdirectory, the one
> in the parent is effectively lost. I realise that even if this is fixed,
> each still gets reloaded on cyclic requests, but at least the parent
> doesn't become completely useless.
Well, the publisher problem is really solved, and you *can* publish
two index.py modules in different directories, even subdirectories,
without any problem or cyclic reload.
However, if you are still using apache.import_module, for example to
dynamically import a support module into a published module, then
you'll have some trouble, because *apache.import_module is hopelessly
broken*. I can't stress that enough.
I'm sorry to be harsh, but this function cannot be fixed so that it
works correctly. Well, it can, but the result would be a kludge.
Right now, if two applications or two users (in shared hosting) have a
util.py module that they want to import, the first one will do
something like :
util = apache.import_module('util',['/path/to/my/application/code'])
and the second one, being very clever and having read the mailing
list, will do :
util = apache.import_module('util',[os.path.dirname(__file__)])
Bang, you've got a module collision. The problem is, there is no way
to solve this while retaining the current semantics of
apache.import_module.
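
The collision can be reproduced without mod_python at all. The following sketch, in current Python 3, is a plain-Python analogue and not apache.import_module itself: the helper names import_by_name and write_module and the module name shared_util_demo are hypothetical. Two "applications" each ship a module with the same name, and the second one gets the first one's module back because the lookup in sys.modules is by name only.

```python
import importlib
import os
import sys
import tempfile


def import_by_name(name, directory):
    """Import `name` with `directory` temporarily at the front of sys.path."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(directory)


def write_module(directory, name, source):
    """Write `source` to directory/<name>.py and return the file path."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(source)
    return path


app_a = tempfile.mkdtemp()
app_b = tempfile.mkdtemp()
write_module(app_a, "shared_util_demo", "OWNER = 'application A'\n")
write_module(app_b, "shared_util_demo", "OWNER = 'application B'\n")

first = import_by_name("shared_util_demo", app_a)
second = import_by_name("shared_util_demo", app_b)

# Application B asked for its own file but received A's module, because
# the name was already present in sys.modules.
print(second is first)
print(second.OWNER)
```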
apache.import_module tries to emulate an environment in which modules
are imported in the standard way, with a kind of
on-the-fly-reconfigurable PYTHONPATH. This simply does not work.
For starters, I really think that providing a search path as an
argument is very, very bad practice. It forces you to hardcode a path
into some application code (or to go through complicated contortions
using os.path.dirname or req.get_options). I think the way you did it
in Vampire is pretty clever, Graham. You look into the directory of the
module which handles the current request (and you provide a special
__req__ attribute during import time to be able to refer to the current
request), then, if the module is not found there, in a special,
application-specific search path which can be redefined through a
configuration directive and is distinct from sys.path.
You put it better than I did: we should not mix sys.path, the
application-specific search path and the published document tree.
Another problem with apache.import_module is that it pollutes
sys.modules, hence creating a wealth of collision opportunities.
sys.modules should be for nice, standard Python modules and modules
imported from the sys.path. Not for dynamically loaded modules.
The nice thing is that I think I understand the usage patterns of
apache.import_module much better now. So I think I see a way to
reimplement and fix it with the following behaviour :
A) Finding the module
1) If the path parameter is passed, use it
2) If the path parameter is absent, look for the module in the "local"
directory (local being based on the __file__ attribute of the current
module)
3) If there is no matching "local" module, look for it in the
application-specific path which is defined in a configuration file. I
think this requires access to a request object, so this one could be
tricky.
4) Look for the module in sys.path
B) Loading or reloading the module
In cases 1 to 3, we use the cache.ModuleCache class as in the publisher
to load or reload the module. We DO NOT store the resulting module in
sys.modules. If two modules have the same name but reside in different
directories, there won't be any collision, thanks to
cache.ModuleCache. There also won't be any multithreading issues since
ModuleCache uses a two-level locking scheme.
In case 4, we take the import lock. We check for the existence of the
module in sys.modules.
- If it is absent, we load the module.
- If it is present, we check that its __file__ attribute is the same.
If not, something really weird is happening (the same module has moved
within the sys.path, or there is a possible collision) and we throw an
ImportError. We then check for a __timestamp__ attribute : if absent,
then too bad, this module is a core Python module which was not
imported using import_module, and we don't reload it (we could
forcibly reload it but I fear this would be a bit dangerous), so we
use it as is. If the __timestamp__ attribute is present, then we can
check and reload the module if necessary.
Finally, we set the __timestamp__ attribute on the dynamically
loaded module and store it in sys.modules, since we found the module
on sys.path in the first place (case 4). It seems that it's good
practice to retain the "it's on sys.path so it's going to end up in
sys.modules" semantics. We can then release the import lock.
Note that we could extend this behaviour by checking dependencies, so
that a dynamically loaded module could be reloaded if one of the
dynamically loaded modules it depends on is reloaded. We'll see about
that once we have implemented the above description.
What do you think about that ?
Regards,
Nicolas
| |
|
| Barry Pearce wrote:
>
> agreed....but...optional functionality to stop us having to restart
> apache would be good...and I consider that that is part of
> 'interfacing'. It solves a problem that is inherent in 'interfacing'.
It seems to me that this is expected behaviour when working with persistent
processes in Python, that you have to stop and restart if you've changed
something that got imported. There are ways the programmer can take care of
this, but there isn't an "auto-reimport" function in Python itself, and
therefore it should not be the default behaviour of "import" in mod_python.
Maybe I'm not disagreeing with you here, but I just want to be clear that I
don't think mod_python should install any import hooks. That would probably
be annoying to seasoned Python programmers, and really a pain in the butt
for me specifically. Having a module in the mod_python library that allows
you to do that is fine by me, but I don't know the feasibility of having
that ready for a 3.2.0 release. Patching mod_python manually is starting to
get tedious 
Nick
| |
| Graham Dumpleton 2005-06-08, 5:46 pm |
|
On 09/06/2005, at 12:29 AM, Nick wrote:
> but I just want to be clear that I don't think mod_python should
> install any import hooks. That would probably be annoying to seasoned
> Python programmers, and really a pain in the butt for me specifically.
Curious to know why you think this would be annoying to seasoned Python
programmers and a pain in the butt to yourself.
Import hooks aren't the horrid thing they used to be back in the Python 1.5
era. They are flexible enough now that a special import hook can be made to
only come into play when used in a quite specific context. I.e., you don't
have the problem of an import hook having to duplicate exactly how the
standard Python one worked, because it isn't taking on the complete
responsibility of managing everyone's imports.
The only real problem is where you use a third party package which installs
an old style import hook of its own, i.e., one that doesn't follow the newer
way of doing import hooks, and as a result stuffs everything up. If import
hooks follow the new way of doing things, multiple import hooks should be
able to happily coexist.
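To illustrate what a hook limited to a specific context can look like, here is a minimal sketch using sys.meta_path. It is written against the current importlib API (find_spec); the PEP 302 hooks available in 2005 used find_module/load_module instead. The class name DirectoryScopedFinder and the install() helper are invented for the example, and nothing here is mod_python or Vampire code.

```python
import importlib.abc
import importlib.util
import os
import sys


class DirectoryScopedFinder(importlib.abc.MetaPathFinder):
    """Claims only top-level modules that exist as .py files in one directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # submodules are left to the standard machinery
        candidate = os.path.join(self.directory, fullname + ".py")
        if not os.path.isfile(candidate):
            return None  # decline, so the next finder gets its turn
        return importlib.util.spec_from_file_location(fullname, candidate)


def install(directory):
    """Put a finder for `directory` in front of the standard finders."""
    finder = DirectoryScopedFinder(directory)
    sys.meta_path.insert(0, finder)
    return finder
```

The finder is consulted for every import, but it returns None for anything outside its directory, so standard library and site-packages imports go through the normal finders and several such hooks can sit side by side on sys.meta_path. Note that modules loaded this way still end up in sys.modules.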
Graham
| |
|
| Graham Dumpleton wrote:
>
> On 09/06/2005, at 12:29 AM, Nick wrote:
>
>
> Curious to know why you think this would be annoying to seasoned Python
> programmers and a pain in the butt to yourself.
As I mentioned before, as a programmer I would want mod_python to be an API
for apache in Python, with no special alteration of how Python works as
advertised in the Python documentation. Installing import hooks bypasses
the Python import functionality as it's documented, and in my opinion is not
necessary to use mod_python as an API for apache.
While it may be convenient for some people to have an import hook that will
automagically reload a module when it changes on disk, that's *not* how
Python works, and would be unexpected behaviour for a Python programmer.
That's like saying that since most people want to parse the input stream as
form data, let's automatically parse it and put it into req.form. But
mod_python doesn't do that. You have to use util.FieldStorage.
For my own applications I take advantage of Python's internal import routine
because it is in C and therefore much faster than any pure Python
implementation (try benchmarking an import hook in ihooks and you'll see how
slow it is). I want to inspect sys.modules, and that's where I expect
imports to go. I don't want to bypass an import mechanism that works
differently just to get back to Python's; that's counter-intuitive. Plus,
it'll make reuse a pain, because I'll have to write modules with special
checks to see if I'm in mod_python or do something else when not.
I just don't think default import hooks are necessary to make mod_python
generally useful to people. I can certainly understand people's needs for
something more flexible than Python's stock importer when writing
applications, and there's always an option to add a module to mod_python
that can let people do exactly that. I just don't want it out of the box.
Nick
| |
| Nicolas Lehuen 2005-06-08, 5:46 pm |
| Nick, what do you think about apache.import_module, as it is now
and/or as I've described before ?
Let's forget about the import hook, let's say import is for standard
modules only. What do you think about providing our users a way to
dynamically import some modules without polluting sys.modules
(remember that shared hosting exists) ?
Regards,
Nicolas
2005/6/9, Nick <nick@dd.revealed.net>:
> Graham Dumpleton wrote:
>
> As I mentioned before, as a programmer I would want mod_python to be an API
> for apache in Python, with no special alteration of how Python works as
> advertised in the Python documentation. Installing import hooks bypasses
> the Python import functionality as it's documented, and in my opinion is not
> necessary to use mod_python as an API for apache.
>
> While it may be convenient for some people to have an import hook that will
> automagically reload a module when it changes on disk, that's *not* how
> Python works, and would be unexpected behaviour for a Python programmer.
> That's like saying that since most people want to parse the input stream as
> form data, let's automatically parse it and put it into req.form. But
> mod_python doesn't do that. You have to use utils.FieldStorage.
>
> For my own applications I take advantage of Python's internal import routine
> because it is in C and therefore much faster than any pure Python
> implementation (try benchmarking an import hook in ihooks and you'll see how
> slow it is). I want to inspect sys.modules, and that's where I expect
> imports to go. I don't want to bypass an import mechanism that works
> differently just to get back to Python's; that's counter-intuitive. Plus,
> it'll make reuse a pain, because I'll have to write modules with special
> checks to see if I'm in mod_python or do something else when not.
>
> I just don't think default import hooks are necessary to make mod_python
> generally useful to people. I can certainly understand people's needs for
> something more flexible than Python's stock importer when writing
> applications, and there's always an option to add a module to mod_python
> that can let people do exactly that. I just don't want it out of the box.
>
> Nick
>
| |
|
| Yeah I totally agree that something needs to be done there. It's a good
utility to have in the mod_python toolbox, and there is a definite need.
I've looked at Graham's code for caching modules, and it does a good job.
If something like that could be included in mod_python, probably in util.py,
that would definitely be an asset.
But definitely EOL the apache.import_module, it just doesn't work well.
Nick
Nicolas Lehuen wrote:
> Nick, what do you think about apache.import_module, as it is now
> and/or as I've described before ?
>
> Let's forget about the import hook, let's say import is for standard
> modules only. What do you think about providing our users a way to
> dynamically import some modules without polluting sys.modules
> (remember that shared hosting exists) ?
>
> Regards,
> Nicolas
>
> 2005/6/9, Nick <nick@dd.revealed.net>:
>
| |
| Graham Dumpleton 2005-06-08, 5:46 pm |
|
On 09/06/2005, at 8:04 AM, Nick wrote:
> Graham Dumpleton wrote:
>
> As I mentioned before, as a programmer I would want mod_python to be
> an API for apache in Python, with no special alteration of how Python
> works as advertised in the Python documentation. Installing import
> hooks bypasses the Python import functionality as it's documented, and
> in my opinion is not necessary to use mod_python as an API for apache.
The apache.import_module() method is already currently used to perform the
top level import of a handler module and already behaves differently to a
standard "import", so even without an import hook you already have a
problem.
> While it may be convenient for some people to have an import hook that
> will automagically reload a module when it changes on disk, that's
> *not* how Python works, and would be unexpected behaviour for a Python
> programmer. That's like saying that since most people want to parse
> the input stream as form data, let's automatically parse it and put it
> into req.form. But mod_python doesn't do that. You have to use
> utils.FieldStorage.
FWIW, if you use mod_python.publisher form parameters are always decoded
into req.form regardless of whether you use them and actually can cause a
few problems.
In Vampire I have demonstrated though that lazy form evaluation can be
implemented, with form parameters only being decoded in the prototype if
the handler actually indicates by way of defining form arguments that it
wants them.
> For my own applications I take advantage of Python's internal import
> routine because it is in C and therefore much faster than any pure
> Python implementation (try benchmarking an import hook in ihooks and
> you'll see how slow it is).
The ihooks module is deprecated and is not used in any way in the new style
import hooks mechanism in Python. The new style import mechanism is still
almost all written in C from what I can tell, with call outs to Python code
only in those specific spots where the now minimal hooks are allowed for a
user defined module importer and loader.
It isn't as bad as you make out; the general runtime cost of your own code,
Apache and network I/O is still going to greatly exceed any minor cost of
the import hooks.
> I want to inspect sys.modules, and that's where I expect imports to
> go. I don't want to bypass an import mechanism that works differently
> just to get back to Python's; that's counter-intuitive. Plus, it'll
> make reuse a pain, because I'll have to write modules with special
> checks to see if I'm in mod_python or do something else when not.
As I mentioned before, the import hook would only come into play in a
specific context. It doesn't touch normal imports of stuff from the Python
installation or site packages directory, or in practice any modules of your
own which you have placed on sys.path somewhere. As such, no special checks
are required, as general utility modules wouldn't make use of it.
The intent is that the import hooks are a convenience in some very specific
cases related to the handler modules as they reside in the document tree.
The idea is that in general people wouldn't rely on them. You cannot get
away from using it though if you want to use a package like Cheetah
Templates and still want to use automatic module reloading, as it generates
code which uses "import" and it isn't possible to change that to use a
mod_python specific module loading function unless you are prepared to
rewrite bits of Cheetah itself.
> I just don't think default import hooks are necessary to make
> mod_python generally useful to people. I can certainly understand
> people's needs for something more flexible than Python's stock
> importer when writing applications, and there's always an option to
> add a module to mod_python that can let people do exactly that. I
> just don't want it out of the box.
Even in Vampire where I have the import hook stuff working, it isn't on by
default and I would not suggest that it should be on by default.
Graham
| |
|
| Graham Dumpleton wrote:
> The apache.import_module() method is already currently used to perform the
> top level import of a handler module and already behaves differently to a
> standard "import", so even without an import hook you already have a
> problem.
You're right; when I hit apache with ab, I get all sorts of import problems
when importing handlers. Well, even with ab the failure rate is less than
10%; it's just something I accept and start enough servers to mitigate the
problem.
> The ihooks module is deprecated and is not used in any way in the new
I was suggesting to use it solely as an example to benchmark how a pure
Python importing mechanism would perform. God, I hope nobody ever tries to
use it for any purpose other than educational!
[...other stuff here that I appreciate the problems in dealing with, but IMO
are handler specific..]
> Even in Vampire where I have the import hook stuff working, it isn't on
> by default
> and I would not suggest that it should be on by default.
Then why are we debating this? :-) We seem to be in agreement here.
Nick
| |
| Nicolas Lehuen 2005-06-08, 5:46 pm |
| 2005/6/9, Nick <nick@dd.revealed.net>:
> Graham Dumpleton wrote:
>
> You're right; when I hit apache with ab, I get all sorts of import problems
> when importing handlers. Well, even with ab the failure rate is less than
> 10%; it's just something I accept and start enough servers to mitigate the
> problem.
Most of the problems raised when importing handlers (including double
import of handlers) have been solved in the trunk for a while now, the
bug fixes are waiting for an official release... But as I wrote
before, the current implementation of import_module is still broken
and dangerous for our users.
It's time to change it before it's too late (before compatibility
issues force us to leave it as is). I'll try to spend some time on it
next Sunday.
Regards,
Nicolas
| |
|
| On Thu, 2005-06-09 at 00:50 +0200, Nicolas Lehuen wrote:
> Most of the problems raised when importing handlers (including double
> import of handlers) have been solved in the trunk for a while now, the
> bug fixes are waiting for an official release.
Looks good, seems to work well. However, I never use import_module
myself in any of my code, so I'm never going to interact with the dict
that holds the handler modules.
> But like I've wrote
> before, the current implementation of import_module is still broken
> and dangerous for our users.
I really don't see how you're going to get around using an import hook
for that, at least temporarily while import_module is called so all the
dependent imports get reloaded, unless it's a "standard" Python library
(i.e. /usr/lib/pythonX.X or wherever Python is installed). I also don't
think there's anything wrong with operating directly on the sys.modules
dict if you're properly using the global import lock.
> It's time to change it before it's too late (before compatibility
> issues force us to leave it as is). I'll try to spend some time on it
> on next sunday.
I think the syntax is OK, just the implementation that needs changing.
I don't think there should be any compatibility issues.
Nick
| |
| Graham Dumpleton 2005-06-08, 8:45 pm |
| I admit I haven't necessarily fully digested some of what has already been
proposed, but here is my take on the issue, put together on my train ride
to work this morning .....
I feel that there are a lot of issues which need to be covered to solve the
problems with module loading. The apache.import_module() method is
currently used in a number of different contexts and each has differing
requirements. We need to look at each of these in turn and make sure we
clearly record and understand what is required for each.
The first point at which apache.import_module() is used is to load the top
level handler. Ie., the module associated with a PythonHandler directive or
the directive associated with a phase other than the content handler. The
other type of top level import is that done by the PythonImport directive.
If apache.import_module() were to be replaced with a mechanism which avoids
use of the "imp" module and storage of modules in sys.modules, these
particular cases of top level imports wouldn't be able to use it
exclusively. This is because the top level handler for PythonHandler will
often be a module which is stored in the Python site-packages directory.
Ie., modules such as mod_python.publisher, mod_python.psp, mpservlets and
vampire.
There are already problems in situations where in one part of the
documentation tree someone defines PythonHandler to be mod_python.psp and
in a handler in a different part of the tree a handler does an explicit
import of mod_python.psp. From memory, if PythonHandler case is triggered
first, then when the explicit import of mod_python.psp occurs it will fail
as the apache.import_module() function doesn't quite set up the sys.modules
environment in a way that is compatible with the "import" statement.
As well as top level imports from site-packages, PythonHandler has to
deal with the case where the module to be imported is loaded from the
document tree itself, specifically where the Directory directive is
specified or where the .htaccess file resides.
In this case, it currently works by virtue of sys.path being amended by
mod_python to include that directory before doing the import using
apache.import_module(). The problem here is that you can't then easily use
the same named module in different directories as the PythonHandler.
What I think needs to happen for these top level imports is that mod_python
has to determine if the module to be loaded is to come from the document
tree or from somewhere else on sys.path. If the module is not from the
document tree then the standard Python import mechanisms would be used to
import it. Consequently, such modules would not be candidates for any form
of automatic module reloading. Ie., no module reloading is done on anything
in sys.modules as it is now.
This would ensure for example that mod_python.psp is imported in a standard
way and that an explicit import of mod_python.psp from a users handler code
is going to work, thus avoiding the hack at the moment that mod_python.psp
must be loaded in a users handler using apache.import_module().
If mod_python finds that the module is not a standard module but one which
is defined within the document tree, then it would use the new and improved
apache.import_module() which doesn't rely on sys.modules.
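The split described above for top level imports might look something like the following sketch. The function name load_handler_module and the dynamic_loader argument are hypothetical: dynamic_loader stands in for whatever the improved apache.import_module() ends up being, and only top-level module names are handled.

```python
import importlib
import os


def load_handler_module(name, directory, dynamic_loader):
    """Pick the loading mechanism for a top level handler module.

    `directory` is the document tree directory tied to the directive
    (Directory section or .htaccess location). `dynamic_loader` is a
    hypothetical callable taking (name, directory).
    """
    candidate = os.path.join(directory, name + ".py")
    if os.path.isfile(candidate):
        # Lives in the document tree: reloadable, kept out of sys.modules.
        return dynamic_loader(name, directory)
    # Anything else is a standard module: plain import, never reloaded.
    return importlib.import_module(name)
```

With this shape, a handler such as mod_python.psp would always go through the standard import machinery, so a later explicit "import" of the same module from user code sees the very same object in sys.modules.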
Note that the direction I am looking at here is that apache.import_module()
is made to function properly in the contexts it needs to and not perform
double duty in satisfying extra requirements of top level mod_python imports
where it has to import stuff from site-packages. The top level imports
should be treated specially and it should only defer to
apache.import_module() for imports from the document tree.
If this separation is done, I think that the distinction that has been
introduced with a separate module loader in mod_python.publisher can be
eliminated. The apache.import_module() can simply be replaced with that in
mod_python.publisher or a modification of it to satisfy other requirements
I will talk about later in future emails.
As far as imports from any of the above imported modules goes, the general
rule should be that if it is a standard module in sys.path, then "import"
is used. If it is within the document tree then apache.import_module().
As far as utility modules which exist outside of the document tree which
are specifically related to the web application but which aren't on sys.path
and for which you want module reloading to work, apache.import_module()
would still be used, but you have to specify the actual directory to the
function.
In some respects the ability not to specify a path to apache.import_module()
should be disallowed with a path always required. Further, sys.path should
no longer be automatically amended to include the directory the
PythonHandler is defined for. And apache.import_module() should never
search in sys.path.
As far as I can tell at the moment, the only real reason that sys.path is
searched at the moment is to satisfy the requirements of top level imports
as far as being able to find stuff in site-packages or elsewhere on sys.path.
As such, if mod_python does special checking and knows when standard Python
imports should be used, this ability can be discarded.
The implication of not extending sys.path automatically is that "import"
will not work to load a file in the same directory as the handler when in
the document tree. This was always dangerous anyway as that module could
also have been loaded by apache.import_module() and a problem could thus
arise. If "import" is used in this way it would need to be changed to
apache.import_module(), or a simple import hook introduced which when
used in a module imported using apache.import_module() will use
apache.import_module() underneath for an "import" of a file in the same
directory.
How does this seem to people? There is still more detail just in this bit
which will need clarification and there are other issues as well which
I haven't even mentioned.
Anyway, time to do some work.
Graham
| |
|
| Graham Dumpleton wrote:
> Note that the direction I am looking at here is that apache.import_module()
> is made to function properly in the contexts it needs to and not perform
> double duty in satisfying extra requirements of top level mod_python imports
> where it has to import stuff from site-packages. The top level imports
> should be treated specially and it should only defer to
> apache.import_module() for imports from the document tree.
Yes, I was thinking exactly along these lines myself. I have never
really looked that closely at the import_module code since I never used
it, but I felt I might have something useful to contribute if I did.
After looking closely at it, I think you're right. The handler module
loading should be handled separately and maintained separately.
"Autoreloading" a handler module should be as simple as just deleting
the module and everything that came with it, and then calling the loader
again.
> If this separation is done, I think that the distinction that has been
> introduced with a separate module loader in mod_python.publisher can be
> eliminated. The apache.import_module() can simply be replaced with that in
> mod_python.publisher or a modification of it to satisfy other requirements
> I will talk about later in future emails.
I agree; the code there is relatively simple and will probably do the
job for most people. Keep it simple, and it can serve as an example for
people who want to do something more sophisticated. Or they can upgrade
from publisher to vampire; honestly, if you like publisher I don't know
why you *wouldn't* want to upgrade to vampire 
> As far as imports from any of the above imported modules goes, the general
> rule should be that if it is a standard module in sys.path, then "import"
> is used. If it is within the document tree then apache.import_module().
Right, that's what I was trying to get at in my earlier email. Just
avoid the problems of messing with the standard libraries altogether.
> As far as utility modules which exist outside of the document tree which
> are specifically related to the web application but which aren't on sys.path
> and for which you want module reloading to work, apache.import_module()
> would still be used, but you have to specify the actual directory to the
> function.
>
> In some respects the ability not to specify a path to apache.import_module()
> should be disallowed with a path always required. Further, sys.path should
> no longer be automatically ammended to include the directory where the
> PythonHandler is defined for. And apache.import_module() should never
> search in sys.path.
Exactly... apache.import_module should be used only to load support
modules for your application, not in general for importing. What you're
doing with apache.import_module can have unforeseen side effects that
you don't expect, so keeping its use restricted to ONLY modules that you
KNOW need to be reloaded is the best policy. And requiring the path
argument will go a long way in enforcing this.
> The implication of not extending sys.path automatically is that "import"
> will not work to load a file in the same directory as the handler when in
> the document tree. This was always dangerous anyway as that module could
> also have been loaded by apache.import_module() and a problem could thus
> arise. If "import" is used in this way it would need to be changed to
> apache.import_module(), or a simple import hook introduced which when
> used in a module imported using apache.import_module() will use
> apache.import_module() underneath for an "import" of a file in the same
> directory.
I think that's a perfectly acceptable trade off, and it avoids potential
problems that exist with the current code. I still don't necessarily
think an import hook is necessary, as you've got to follow *some*
conventions when you're working within a framework. And, we're really
only talking about people who are going to use the handlers provided
with mod_python.
> How does this seem to people? There is stil more detail just in this bit
> which will need clarification and there are other issues as well which
> I haven't even mentioned.
This all looks good to me. I hate to just say "yes I agree" to
everything without really adding much to the discussion, but you've
clearly been thinking about it a lot longer than most people.
Nick
| |
| Graham Dumpleton 2005-06-08, 8:45 pm |
| Nick wrote ..
> I agree; the code there is relatively simple and will probably do the
> job for most people. Keep it simple, and it can serve as an example for
> people who want to do something more sophisticated. Or they can upgrade
> from publisher to vampire; honestly, if you like publisher I don't know
> why you *wouldn't* want to upgrade to vampire 
Huh!
I actually don't like aspects of publisher. The vampire::publisher
equivalent of mod_python.publisher, which addresses shortcomings in the
latter, was introduced purely to try and entice people to at least look at
Vampire and the other stuff it has to offer, which I think is of more use
than publisher alone.
If we fix apache.import_module() and the other issues associated with
mod_python.publisher, there will not necessarily be a reason for people
to use Vampire if all they are interested in is publisher support. :-(
Graham
| |
| Graham Dumpleton 2005-06-09, 7:46 am |
|
On 09/06/2005, at 4:38 PM, Nicolas Lehuen wrote:
> Erm, so, no, handlers could also be imported from the document tree.
> No problem, we can do that, but the security issues pop up once again.
We can't protect against everything though. ;-)
> I've understood you point, but there is a difficulty in judging from a
> PythonHandler directive whether the handler should be loaded as a
> standard Python module, from the sys.path, or as a dynamic Python
> module, from the document tree. Maybe the context of the directive
> could be used for that ; if the directive is defined at the server or
> virtual host level, then it's a top level handler, otherwise if it is
> defined in a Location or Directory (or .htaccess file), then it's a
> handler that should be loaded from the document tree (with a possible
> fallback to sys.path if it is not found ?).
If PythonImport is used, it can only come from sys.path as there is no
connection with a physical directory. Similar with PythonHandler which
is defined in a Location directive, there is no connection with a
directory and thus can only come from sys.path. Thus, only where the
Directory directive is used, or PythonHandler is specified in the
actual .htaccess file do we have a physical directory and can use
apache.import_module().
> Anyway, saying that "import" should be used to import from the
> sys.path and apache.import_module should be used to import from the
> document tree looks like a clean rule, easy to understand and to
> implement.
>
> The suggestion I've made in my former (way too long) mail was simply
> that when a module is not found from the document tree, we could fall
> back to a careful standard import from the sys.path, but this would
> smudge in appearance this clean separation between standard and
> dynamic modules.
At the moment I am a bit worried about falling back on sys.path in
apache.import_module() itself, partly because it confuses the two
concepts, but not sure there aren't some strange problems lurking
in there as well if that was done. What can be considered though
is an alternative fallback search path, one that is distinct from
sys.path but where mod_python style module loading is used if the
module is found in the alternate path. I have implemented this in
Vampire to see how it might work in practice. The main thing that
it allows is for utility code to use apache.import_module() without
the need to have to look up some special configuration mechanism to
determine a special directory from which it should otherwise load
from. Jury is out on this one at the moment though as to whether
it is a good idea or not. :-)
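A minimal sketch of such an alternate search path is given below. The function name find_module_file and the fallback_path argument are assumptions made for the example (this is not the Vampire implementation); the point is only that the lookup walks an application-specific list of directories and never consults sys.path.

```python
import os


def find_module_file(name, local_dir, fallback_path=()):
    """Return the source file for `name`, or None if it is not found.

    The local directory is tried first, then each directory of the
    application-specific fallback path. sys.path is deliberately ignored.
    """
    for directory in [local_dir, *fallback_path]:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

A None result would then mean "not a dynamically loaded module", leaving the caller free to raise an error rather than silently falling back to the standard import.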
Time now to start bringing up some of the other issues that have to be
dealt with. The first is that the current apache.import_module() is
able to support packages to a degree. It's not perfect though, as is
evidenced I think by problems importing mod_python.psp. This is more
to do with it not setting up the import exactly as the standard
Python module importer requires. If apache.import_module()
doesn't use sys.modules, it will not matter how it sets things
up as long as it is able to provide the same behaviour.
The question you might be asking is whether we need to support packages.
Well, it's a question I am not sure I have a good answer for. I know I
have seen people posting on the mailing list with examples of package
use, eg:
http://www.modpython.org/pipermail/...May/018182.html
In that case though they were using "from/import" and not actually
using apache.import_module(). They did have the package stored in the
document tree though. So, not sure if in practice people are using
packages with apache.import_module() or not.
If packages were supported there would still be a few things to do.
First is that the module loader when given a module name which equates
to the name of a directory would need to see if the directory contains
a __init__.py file and if it does, load that file as the module.
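The directory check described above could be sketched like this. The helper name resolve_source is hypothetical, and the sketch only locates the file to load; it does nothing about making "import" statements inside __init__.py resolve relative to the package.

```python
import os


def resolve_source(name, directory):
    """Return the file to load for `name` inside `directory`, or None.

    A subdirectory called `name` that contains __init__.py is treated as
    a package and its __init__.py is returned; otherwise <name>.py is
    used if it exists.
    """
    package_init = os.path.join(directory, name, "__init__.py")
    if os.path.isfile(package_init):
        return package_init
    module_file = os.path.join(directory, name + ".py")
    if os.path.isfile(module_file):
        return module_file
    return None
```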
The big problem now is that if the __init__.py file uses a standard
import statement with the expectation that it will grab the module
or package from within the same directory, it will not work. This is
because the Python import system will not know that it is to
be treated as a package and look in that local directory first.
I got past this problem in Vampire through the use of the import hook.
Vampire would stash a special global object in the module so the import
hook knew that it was a special Vampire managed module and would grab
the module from the local directory and import it using the Vampire
module importing system rather than standard Python module importer.
At the moment though this only works at global scope and not when
import is used in the handler code when executed, although I can
probably solve that.
Although from/import syntax also works, if it tried to import "a.b" from
a subpackage, it will not work if "b" wasn't explicitly imported by "a"
to begin with.
In summary, I haven't been able to get package imports to work correctly.
If it can't be made to work, then we would have to say that packages are
not supported by apache.import_module(). People who use it to import
packages now would have no choice but to stop using packages for
handlers in the document tree. A utility package in the document tree
would have to be moved outside of the document tree and sys.path set
to that location, as import isn't going to work for it if we don't
allow document tree directories into sys.path.
The question thus, if you understand what I am raving about, is
whether it is reasonable that packages will not be supported by
apache.import_module(). There is a slim chance someone's code may
break as a result but the majority would work fine.
Enough for now.
Graham
| |
| Nicolas Lehuen 2005-06-09, 7:46 am |
| 2005/6/9, Graham Dumpleton <grahamd@dscpl.com.au>:
> I admit I haven't necessarily fully digested some of what has already been
> proposed but, here is my take on the issue put together on main train ride
> this morning to work .....
>
> I feel that there are a lot of issues which need to be covered to solve the
> problems with module loading. The apache.import_module() method is
> currently used in a number of different contexts and each has differing
> requirements. We need to look at each of these in turn and make sure we
> clearly record and understand what is required for each.
>
> The first point at which apache.import_module() is used is to load the top
> level handler. Ie., the module associated with a PythonHandler directive or
> the directive associated with a phase other than the content handler. The
> other type of top level import is that done by the PythonImport directive.
>
> If apache.import_module() were to be replaced with a mechanism which avoids
> use of the "imp" module and storage of modules in sys.modules, these
> particular cases of top level imports wouldn't be able to use it
> exclusively. This is because the top level handler for PythonHandler will
> often be a module which is stored in the Python site-packages directory.
> Ie., modules such as mod_python.publisher, mod_python.psp, mpservlets and
> vampire.
Are you saying that the top-level handlers should always reside on
sys.path? I'm OK with that, but that may be a big restriction in a
shared hosting environment. Then again, it could also be a security
measure, as badly conceived top handlers could be a source of
security holes (we know this all too well, hence the 3.1.4 release ;).
So the official justification for this restriction would be "we only
allow top handlers to come from the sys.path because being able to use
any kind of top level handler would be dangerous in a shared hosting
environment".
> There are already problems in situations where in one part of the
> documentation tree someone defines PythonHandler to be mod_python.psp and
> in a handler in a different part of the tree a handler does an explicit
> import of mod_python.psp. From memory, if PythonHandler case is triggered
> first, then when the explicit import of mod_python.psp occurs it will fail
> as the apache.import_module() function doesn't quite set up the sys.modules
> environment in a way that is compatible with the "import" statement.
>
> As well as top level imports from site-packages, PythonHandler has to
> deal with the case where the module to be imported is loaded from the
> document tree itself, specifically where the Directory directive is
> specified or where the .htaccess file resides.
Erm, so, no, handlers could also be imported from the document tree.
No problem, we can do that, but the security issues pop up once again.
> In this case, it currently works by virtue of sys.path being amended by
> mod_python to include that directory before doing the import using
> apache.import_module(). The problem here is that you can't then easily use
> the same named module in different directories as the PythonHandler.
Agreed. This is an ugly hack that we should get rid of.
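The collision is easy to reproduce with plain Python, independent of mod_python. In this sketch (the directory and module names are invented for the demo), two directories each hold a module of the same name, each directory is put at the front of sys.path in turn, and the second import still returns the first module because sys.modules is keyed by name only.

```python
import importlib
import os
import sys
import tempfile


def demo():
    name = "collision_demo_utils"
    with tempfile.TemporaryDirectory() as root:
        # Two "handler directories", each with its own same-named module.
        for site in ("site_a", "site_b"):
            os.mkdir(os.path.join(root, site))
            with open(os.path.join(root, site, name + ".py"), "w") as handle:
                handle.write("OWNER = %r\n" % site)
        seen = []
        for site in ("site_a", "site_b"):
            directory = os.path.join(root, site)
            sys.path.insert(0, directory)    # mimic amending sys.path per handler
            try:
                seen.append(importlib.import_module(name).OWNER)
            finally:
                sys.path.remove(directory)
        sys.modules.pop(name, None)          # leave the interpreter clean
        return seen
```

demo() returns ['site_a', 'site_a']: the handler in site_b silently receives site_a's module.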
> What I think needs to happen for these top level imports is that mod_python
> has to determine if the module to be loaded is to come from the document
> tree or from somewhere else on sys.path. If the module is not from the
> document tree then the standard Python import mechanisms would be used to
> import it. Consequently, such modules would not be candidates for any form
> of automatic module reloading. Ie., no module reloading is done on anything
> in sys.modules as it is now.
>
> This would ensure for example that mod_python.psp is imported in a standard
> way and that an explicit import of mod_python.psp from a users handler code
> is going to work, thus avoiding the hack at the moment that mod_python.psp
> must be loaded in a users handler using apache.import_module().
>
> If mod_python finds that the module is not a standard module but one which
> is defined within the document tree, then it would use the new and improved
> apache.import_module() which doesn't rely on sys.modules.
>
> Note that the direction I am looking at here is that apache.import_module()
> is made to function properly in the contexts it needs to and not perform
> double duty in satisfying extra requirements of top level mod_python imports
> where it has to import stuff from site-packages. The top level imports
> should be treated specially and it should only defer to
> apache.import_module() for imports from the document tree.
>
> If this separation is done, I think that the distinction that has been
> introduced with a separate module loader in mod_python.publisher can be
> eliminated. The apache.import_module() can simply be replaced with that in
> mod_python.publisher or a modification of it to satisfy other requirements
> I will talk about later in future emails.
>
> As far as imports from any of the above imported modules goes, the general
> rule should be that if it is a standard module in sys.path, then "import"
> is used. If it is within the document tree then apache.import_module().
>
> As far as utility modules which exist outside of the document tree which
> are specifically related to the web application but which aren't on sys.path
> and for which you want module reloading to work, apache.import_module()
> would still be used, but you have to specify the actual directory to the
> function.
>
> In some respects the ability not to specify a path to apache.import_module()
> should be disallowed with a path always required. Further, sys.path should
> no longer be automatically amended to include the directory for which the
> PythonHandler is defined. And apache.import_module() should never
> search in sys.path.
>
> As far as I can tell at the moment, the only real reason that sys.path is
> searched at the moment is to satisfy the requirements of top level imports
> as far as being able to find stuff in site-packages or elsewhere on sys.path.
> As such, if mod_python does special checking and knows when standard Python
> imports should be used, this ability can be discarded.
>
> The implication of not extending sys.path automatically is that "import"
> will not work to load a file in the same directory as the handler when in
> the document tree. This was always dangerous anyway as that module could
> also have been loaded by apache.import_module() and a problem could thus
> arise. If "import" is used in this way it would need to be changed to
> apache.import_module(), or a simple import hook introduced which when
> used in a module imported using apache.import_module() will use
> apache.import_module() underneath for an "import" of a file in the same
> directory.
>
> How does this seem to people? There is still more detail just in this bit
> which will need clarification and there are other issues as well which
> I haven't even mentioned.
>
> Anyway, time to do some work.
>
> Graham
>
I've understood your point, but there is a difficulty in judging from a
PythonHandler directive whether the handler should be loaded as a
standard Python module, from the sys.path, or as a dynamic Python
module, from the document tree. Maybe the context of the directive
could be used for that: if the directive is defined at the server or
virtual host level, then it's a top level handler; otherwise, if it is
defined in a Location or Directory (or .htaccess file), then it's a
handler that should be loaded from the document tree (with a possible
fallback to sys.path if it is not found?).
Anyway, saying that "import" should be used to import from the
sys.path and apache.import_module should be used to import from the
document tree looks like a clean rule, easy to understand and to
implement.
The suggestion I made in my former (way too long) mail was simply
that when a module is not found in the document tree, we could fall
back to a careful standard import from the sys.path, but this would
blur the otherwise clean separation between standard and
dynamic modules.
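As a rough illustration of the rule and of the optional fallback, here is a sketch in present-day Python 3. import_dynamic() and its fallback_to_sys_path flag are hypothetical names, not the apache.import_module() signature. Dynamic modules are looked up by explicit directory and cached by path, never in sys.modules, and the standard import is used only if the fallback is asked for.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

_dynamic = {}  # absolute path -> module; deliberately separate from sys.modules


def import_dynamic(name, directory, fallback_to_sys_path=False):
    """Load <directory>/<name>.py as a dynamic module (hypothetical helper)."""
    path = os.path.abspath(os.path.join(directory, name + ".py"))
    if os.path.isfile(path):
        if path not in _dynamic:
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)   # does not register in sys.modules
            _dynamic[path] = module
        return _dynamic[path]
    if fallback_to_sys_path:
        return importlib.import_module(name)  # the "careful standard import"
    raise ImportError("%s not found in %s" % (name, directory))


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "dyn_demo_page.py"), "w") as handle:
            handle.write("TEXT = 'hi'\n")
        page = import_dynamic("dyn_demo_page", root)
        fallback = import_dynamic("json", root, fallback_to_sys_path=True)
        try:
            import_dynamic("json", root)
            strict_failed = False
        except ImportError:
            strict_failed = True
        return (page.TEXT, "dyn_demo_page" in sys.modules,
                fallback.__name__, strict_failed)
```

Without the flag the separation stays strict: a name that is not in the given directory is an error, even if a standard module of that name exists.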
Regards,
Nicolas
| |
| Graham Dumpleton 2005-06-09, 5:46 pm |
|
On 09/06/2005, at 11:15 PM, Nick wrote:
> Graham Dumpleton wrote:
>
> Aside from the potential security issues of storing your handler
> modules in the document tree,
Handler modules in your document tree is done all the time, you can't
avoid it with mod_python.publisher and most other systems.
> I just don't think it's a good idea to traverse the document tree for a
> Python module/package. Even in a shared hosting situation, there are
> still ways to store your modules outside the accessible document tree.
> I can just see so many confused people wondering why module B was
> imported instead of module A, which reside in different parts of the
> document tree. And module B isn't even a handler, it's a support
> module. Ugly. Not to mention name collision problems that are bound
> to happen when per directory interpreters aren't being used in that
> situation.
Huh?
We aren't talking about any form of arbitrary traversal of the document
tree.
Overall I am not sure where you are going with this ...
It's getting a bit late for me now to really address these issues
properly plus what else you said in your email, brain is starting to
clag up. I will say though that I think I have basically got packages
to work now despite what I said. Just need to finish a couple of things,
clean up code and check that certain other usage cases still work.
For me at least, if things can be made to work how people generally
expect them to work and for it to be transparent, that is better than
too many obscure rules.
Good night from me ...
Graham
| |
|
| Graham Dumpleton wrote:
> Handler modules in your document tree is done all the time, you can't avoid
> it with mod_python.publisher and most other systems.
I think you misunderstand what I'm saying, or maybe I misunderstood Nicolas.
By "Handler" modules I mean your modules that implement accesshandler,
authenhandler, handler, etc. mod_python.publisher is one such module.
vampire is another. At least that's what I understand you to be importing
in the context of this statement:
> I've understood your point, but there is a difficulty in judging from a
> PythonHandler directive whether the handler should be loaded as a
> standard Python module, from the sys.path, or as a dynamic Python
> module, from the document tree.
Nick
| |
|
| Graham Dumpleton wrote:
Aside from the potential security issues of storing your handler modules in
the document tree, I just don't think it's a good idea to traverse the document
tree for a Python module/package. Even in a shared hosting situation, there
are still ways to store your modules outside the accessible document tree.
I can just see so many confused people wondering why module B was imported
instead of module A, which reside in different parts of the document tree.
And module B isn't even a handler, it's a support module. Ugly. Not to
mention name collision problems that are bound to happen when per directory
interpreters aren't being used in that situation.
> The big problem now is that if the __init__.py file uses a standard
> import statement with the expectation that it will grab the module
> or package from within the same directory, it will not work. This is
> because the Python import system will not know that the file is to
> be treated as a package and so will not look in that local directory first.
Among other minor "gotchas" that crop up from time to time, but you've hit
the big one. mod_python itself isn't a framework as such; I'm for making
mod_python accessible and usable and all that, but isn't it fair to say that
mod_python doesn't have to solve *all* import problems? Handling imports
automagically from your imported modules requires an import hook...
> I got past this problem in Vampire through the use of the import hook.
> Vampire would stash a special global object in the module so the import
> hook knew that it was a special Vampire managed module and would grab
> the module from the local directory and import it using the Vampire
> module importing system rather than standard Python module importer.
> At the moment though this only works at global scope and not when
> import is used in the handler code when executed, although can
> probably solve that.
We can probably do some checks to see if we're importing a single module or
an entire package without resorting to import hooks. Deal with the modules
in __all__ (from __init__.py) like we'd handle single module imports, and
any explicit "imports" in the package beyond that are not reloadable.
Otherwise they can explicitly call apache.import_module. Document this
behaviour and keep it simple, otherwise people are not going to be able to
debug their problems easily. Not to mention that the code will grow into a
beast.
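One way to read this suggestion, sketched for present-day Python 3: find the names listed in __all__ without executing __init__.py, and treat only those that correspond to files in the package directory as candidates for the single-module (reloadable) loader. reloadable_modules() is a hypothetical helper, not an existing mod_python function.

```python
import ast
import os
import tempfile


def reloadable_modules(package_dir):
    """Map names in __init__.py's __all__ to sibling .py files that exist."""
    init_path = os.path.join(package_dir, "__init__.py")
    with open(init_path) as handle:
        tree = ast.parse(handle.read(), filename=init_path)
    listed = []
    for node in tree.body:
        # Only a top-level literal assignment such as __all__ = ["a", "b"].
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        ):
            listed = list(ast.literal_eval(node.value))
    result = {}
    for name in listed:
        path = os.path.join(package_dir, name + ".py")
        if os.path.isfile(path):
            result[name] = path
    return result


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "__init__.py"), "w") as handle:
            handle.write('__all__ = ["views", "models", "missing"]\n')
        for name in ("views", "models"):
            with open(os.path.join(root, name + ".py"), "w") as handle:
                handle.write("")
        return list(reloadable_modules(root))
```

Anything the package imports beyond these names would go through the standard import and so, under this reading, would not be reloadable.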
> Although from/import syntax also works, if it tries to import "a.b" from
> a subpackage, it will not work if "b" wasn't explicitly imported by "a"
> to begin with.
I haven't generally experienced that with import hooks, although os (and
therefore os.path) seems to peskily not import because of some sys.modules
manipulation weirdness.
> The question thus, if you understand what I am raving about, is
> whether it is reasonable that packages will not be supported by
> apache.import_module(). There is a slim chance someone's code may
> break as a result but the majority would work fine.
See above.
Nick
| |
| dharana 2005-06-09, 5:46 pm |
|
Graham Dumpleton wrote:
>
> On 08/06/2005, at 8:33 AM, Barry Pearce wrote:
>
I'm with Barry. mod_python is my holy grail now except for the import situation
(apachectl restart in production *weeeh*).
I really get annoyed by so many Vampire ads everywhere. I understand its
developers have spent considerable time on it but I think people who come to
Python for webdev from an easier framework do it because they want _more_
control, not less (at least that is my case). I am happy with my custom
framework in Python now, it didn't take me a lot of time and it's tuned for my
needs. I won't look into using Vampire for that reason.
(Just my newbie opinion on that matter)
| |
| Graham Dumpleton 2005-06-09, 5:46 pm |
|
On 10/06/2005, at 2:53 AM, dharana wrote:
>
> I really get annoyed by so many Vampire ads everywhere. I understand
> its developers have spent considerable time on it but I think people
> who come to Python for webdev from an easier framework do it because
> they want _more_ control, not less (at least that is my case). I am
> happy with my custom framework in Python now, it didn't take me a lot
> of time and it's tuned for my needs. I won't look into using Vampire
> for that reason.
Vampire is not about giving you less control, it is actually the
opposite. It gives you more glue components and hooks so as to give you
more control and more and better ways of doing things over what
mod_python by itself provides. It is not just some monolithic blob and
isn't intended to be a framework where you are restricted to working in
a certain way.
I'll quit with the advocacy if that is what people want, but it gets
pretty disheartening when you see people on the mailing list trying to
solve problems, and not really understanding properly how to do it, when
Vampire already provides an example of how to do it or a pre-canned
solution, yet you can't even get them to look at it.
I have continually found that it is like the saying "you can lead a
horse to water, but you can't make it drink". :-)
I'll shut up for a while now.
Graham
| |
| David Fraser 2005-06-10, 2:45 am |
| Graham Dumpleton wrote:
>
> On 10/06/2005, at 2:53 AM, dharana wrote:
>
>
>
> Vampire is not about giving you less control, it is actually the
> opposite. It gives you more glue components and hooks so as to give you
> more control and more and better ways of doing things over what
> mod_python by itself provides. It is not just some monolithic blob and
> isn't intended to be a framework where you are restricted to working in
> a certain way.
>
> I'll quit with the advocacy if that is what people want, but it gets
> pretty disheartening when you see people on the mailing list trying to
> solve problems, and not really understanding properly how to do it, when
> Vampire already provides an example of how to do it or a pre-canned
> solution, yet you can't even get them to look at it.
>
> I have continually found that it is like the saying "you can lead a
> horse to water, but you can't make it drink". :-)
>
> I'll shut up for a while now.
I think it's great you've solved some of these problems in Vampire...
Some of the solutions should definitely be brought back into mod_python,
using Vampire as a staging ground for mod_python improvements as you
recommended earlier.
David
|