|
| Author |
Commented: (MODPYTHON-54) Add a way to import a published page into another published
|
|
| Nicolas Lehuen (JIRA) 2005-05-18, 5:50 pm |
| [ http://issues.apache.org/jira/brows...ts#action_65702 ]
Nicolas Lehuen commented on MODPYTHON-54:
-----------------------------------------
I have added two functions to mod_python.publisher:
def import_page(relative_path, auto_reload=True):
    """
    This imports a published page. The relative_path argument is a path
    relative to the directory of the page where import_page() is called.
    Hence, if we have index.py and foobar.py in the same directory, index.py
    can simply do something like:

        import mod_python.publisher
        foobar = mod_python.publisher.import_page('foobar.py')

    If auto_reload is True (the default), the returned object is not really
    the module itself, but a placeholder object which allows the real module
    to be automatically reloaded whenever its source file changes.
    """
and
def get_page(req, relative_path):
    """
    This imports a published page. The relative_path argument is a path
    relative to the published page where the request is really handled (not
    relative to the path given in the URL).

    Warning: in order to maintain consistency in case of module reloading,
    do not store the resulting module in a place that outlives the request
    duration.
    """
Now we need a bit of documentation.
> Add a way to import a published page into another published page
> ----------------------------------------------------------------
>
> Key: MODPYTHON-54
> URL: http://issues.apache.org/jira/browse/MODPYTHON-54
> Project: mod_python
> Type: Improvement
> Versions: 3.2.0
> Reporter: Nicolas Lehuen
> Assignee: Nicolas Lehuen
> Fix For: 3.2.0
>
> Before mod_python 3.2, standard Python modules and published modules could be imported the same way, using apache.import_module. This had a number of disadvantages, leading to MODPYTHON-8, MODPYTHON-9, MODPYTHON-10, MODPYTHON-11 and MODPYTHON-12.
> All these bugs were fixed by separating the published modules from the standard Python modules. apache.import_module can still be used to import standard modules, but published modules are now fully managed by mod_python.publisher, and are not inserted into sys.modules.
> The problem is that there is a use case of importing a published module from another published module :
> /index.py----------------
> def index(req):
>     return "Hello, world !"
> def utility_function(foobar):
>     return foobar+1
> /other.py----------------
> import os
> from mod_python import apache
> directory = os.path.split(__file__)[0]
> other_index = apache.import_module("index",path=[directory])
> def index(req):
>     return "%s %i"%(other_index.index(req),other_index.utility_function(2004))
> This was already a bit of a hack in 3.1.4, but in 3.2 it does not really work the expected way, since the imported module (other_index in the example) is not the same module as the one the publisher would use to publish /index.py. This could be troublesome if the developer wanted to share some data between the modules, e.g. a cache or a connection pool, but not if he only wanted to share some code.
> Therefore, we need to provide a clean API in mod_python.publisher to allow developers to reference another published module.
| |
| Graham Dumpleton 2005-05-18, 8:45 pm |
| Nicolas Lehuen (JIRA) wrote ..
> [ http://issues.apache.org/jira/brows...ts#action_65702 ]
>
> Nicolas Lehuen commented on MODPYTHON-54:
> -----------------------------------------
>
> I have added two functions to mod_python.publisher:
>
> def import_page(relative_path, auto_reload=True):
>     """
>     This imports a published page. The relative_path argument is a path
>     relative to the directory of the page where import_page() is called.
>     Hence, if we have index.py and foobar.py in the same directory, index.py
>     can simply do something like:
>
>         import mod_python.publisher
>         foobar = mod_python.publisher.import_page('foobar.py')
>
>     If auto_reload is True (the default), the returned object is not really
>     the module itself, but a placeholder object which allows the real module
>     to be automatically reloaded whenever its source file changes.
>     """
>
> and
>
> def get_page(req, relative_path):
>     """
>     This imports a published page. The relative_path argument is a path
>     relative to the published page where the request is really handled (not
>     relative to the path given in the URL).
>
>     Warning: in order to maintain consistency in case of module reloading,
>     do not store the resulting module in a place that outlives the request
>     duration.
>     """
>
> Now we need a bit of documentation.
Don't understand why it has to be relative for import_page(). Should
be able to specify any path. The code can do an os.path.isabs() call
and if it is absolute use it as is and if relative only then consider
trying to automagically turn it into an absolute path somehow.
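The isabs() suggestion above can be sketched roughly as follows. This is only an illustration, not the code that went into mod_python.publisher: resolve_page_path and its importing_file argument are hypothetical names, and the directory of the importing page is assumed to be known already.

```python
import os


def resolve_page_path(path, importing_file):
    """Return an absolute path for `path` (hypothetical helper).

    Absolute paths are used as-is; relative paths are resolved against
    the directory of the page doing the import.
    """
    if os.path.isabs(path):
        return os.path.normpath(path)
    base_dir = os.path.dirname(os.path.abspath(importing_file))
    return os.path.normpath(os.path.join(base_dir, path))
```

With this shape, import_page('foobar.py') and import_page('/some/absolute/page.py') could both go through the same lookup once the path has been normalised.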
Overall I feel that these functions might need a bit more thought.
The handle around the module worries me a bit as it would appear to
work in read mode only. Ie., not sure it would work if someone tried to
actually set a global variable within the module from outside of it.
The lack of __setattr__() means it will probably set the variable in the
handle and not the cached module.
There are other things about it which means it will fail as appearing
to be a module. For example, dir() on the handle returns handle
stuff and not module stuff. Granted that for most this might not make
a difference but for a small number it might.
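To make the two concerns above concrete (attribute assignment and dir()), here is a minimal sketch of a handle that forwards both to the wrapped module. ModuleHandle and its loader callable are assumptions made for illustration; this is not the placeholder class actually used by import_page().

```python
import types


class ModuleHandle(object):
    """Hypothetical placeholder that forwards to the current module object."""

    def __init__(self, loader):
        # Bypass our own __setattr__ when storing the handle's private state.
        object.__setattr__(self, "_loader", loader)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for module attributes.
        return getattr(self._loader(), name)

    def __setattr__(self, name, value):
        # Without this method the assignment would land on the handle itself.
        setattr(self._loader(), name, value)

    def __dir__(self):
        # Make dir(handle) list the module's names rather than the handle's.
        return dir(self._loader())


page = types.ModuleType("foobar")
page.counter = 0
handle = ModuleHandle(lambda: page)
handle.counter = 5  # lands in the real module, not in the handle
```

Even with these methods the handle is still not a real module (isinstance checks against types.ModuleType fail, for instance), so the objection is only partly addressed.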
I'll think some more about the code and see if I can come up with any
suggestions.
BTW, do like the idea of looking back up the stack frame to find the
context. Am not sure that it will work all the time, although the cases
where it will not work probably wouldn't come up when using publisher.
I'll probably look at falling back to this stack frame peek for my
Vampire importing system, so if I come up with additional checks
required to handle those cases where it may not work, will let you
know about that as well.
Graham
| |
| Nicolas Lehuen 2005-05-19, 7:45 am |
| 2005/5/19, Graham Dumpleton <grahamd@dscpl.com.au>:
> Nicolas Lehuen (JIRA) wrote ..
>
> Don't understand why it has to be relative for import_page(). Should
> be able to specify any path. The code can do an os.path.isabs() call
> and if it is absolute use it as is and if relative only then consider
> trying to automagically turn it into an absolute path somehow.
I totally agree. I think the relative path feature is handy, since it
gives you a bit of independence wrt the place where the files are
physically stored. But we should not prevent people from giving an
absolute path, even if it should be discouraged. I'll add your
suggestion of using isabs() to the code.
> Overall I feel that these functions might need a bit more thought.
>
> The handle around the module worries me a bit as it would appear to
> work in read mode only. Ie., not sure it would work if someone tried to
> actually set a global variable within the module from outside of it.
> The lack of __setattr__() means it will probably set the variable in the
> handle and not the cached module.
Yeah, I've thought about that, but setting an attribute on a module
which can be reloaded at any time does not seem a good idea to me. I
think the get_page() function (which returns a module instance valid
for the duration of the request) is much more robust, since it recognizes
that a published module is a volatile object.
There is another way which would not rely on wrapper objects : it is
simply to add the imported file to the dependencies of the importing
file, so that whenever the imported file changes, its page is rebuilt
and the importing page is also rebuilt. This is not really difficult
to do and could be cleaner and faster.
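A rough sketch of that dependency idea, under the assumption that the publisher keeps one cache entry per published file: the entry remembers which files it imported and is considered stale when any of them is newer than the last build. PageEntry and its methods are hypothetical names, not part of mod_python.

```python
import os


class PageEntry(object):
    """Hypothetical cache entry for one published page."""

    def __init__(self, path):
        self.path = path
        self.dependencies = set()  # files pulled in through import_page()
        self.built_at = self._newest_mtime()

    def add_dependency(self, path):
        # Called when this page imports another published page.
        self.dependencies.add(path)
        self.built_at = self._newest_mtime()

    def _newest_mtime(self):
        files = [self.path] + sorted(self.dependencies)
        return max(os.path.getmtime(p) for p in files)

    def is_stale(self):
        # Rebuild when the page or any file it imported changed on disk.
        return self._newest_mtime() > self.built_at
```

This only covers direct dependencies; a page that imports a page that imports a third one would need the check to be applied recursively.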
> There are other things about it which means it will fail as appearing
> to be a module. For example, dir() on the handle returns handle
> stuff and not module stuff. Granted that for most this might not make
> a difference but for a small number it might.
>
> I'll think some more about the code and see if I can come up with any
> suggestions.
>
> BTW, do like the idea of looking back up the stack frame to find the
> context. Am not sure that it will work all the time, although cases
> that it will not probably work wouldn't come up when using publisher.
> I'll probably look at falling back to this stack frame peek for my
> Vampire importing system, so if I come up with additional checks
> required to handle those cases where it may not work, will let you
> know about that as well.
>
> Graham
>
I've seen this pattern in the Python Cookbook; you can also check
the code for traceback.extract_stack (which I was originally relying
upon, but dropped since I do not need all the linecache stuff).
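For readers who have not seen the stack frame peek, a minimal sketch looks like the following. It is not the code from mod_python.publisher; caller_directory is a hypothetical name, and sys._getframe is a CPython implementation detail, which is one reason the technique may not work in every context.

```python
import os
import sys


def caller_directory(depth=1):
    """Return the directory of the file whose code called this function."""
    frame = sys._getframe(depth)
    try:
        # Prefer the caller's __file__ global, fall back to the code object.
        filename = frame.f_globals.get("__file__") or frame.f_code.co_filename
        return os.path.dirname(os.path.abspath(filename))
    finally:
        del frame  # avoid keeping a reference cycle alive
```

A function such as import_page() could call caller_directory() to learn where the importing page lives, without the page having to pass __file__ explicitly.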
| |
| Graham Dumpleton 2005-05-19, 7:45 am |
|
On 19/05/2005, at 7:24 PM, Nicolas Lehuen wrote:
>
> Yeah, I've thought about that, but setting an attribute to a module
> which can be reloaded at any time does not seem a good idea to me. I
> think the get_page() function (which returns a module instance valid
> for the duration of the request) is much more robust, since it recognizes
> that a published module is a volatile object.
>
> There is another way which would not rely on wrapper objects : it is
> simply to add the imported file to the dependencies of the importing
> file, so that whenever the imported file changes, its page is rebuilt
> and the importing page is also rebuilt. This is not really difficult
> to do and could be cleaner and faster.
You should really look at the module caching mechanism in Vampire sometime,
it effectively does what you are now talking about. Vampire takes this to
the point of doing a depth search for dependencies and reloading a top
level module if any of the nested dependencies, even a few levels below,
have changed. Am sure that some will criticise this as being inefficient
but I design for usability over break neck speed. Anyway, based on number of
downloads Vampire isn't being used much by anyone but me, so still very
much my toy for trying out ideas.
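The depth search described above can be sketched as a recursive check over a dependency graph. This is not Vampire's cache code; the function name and the three dictionaries are assumptions chosen to keep the example self-contained.

```python
def needs_reload(name, dependencies, mtimes, loaded_at, _seen=None):
    """Depth-first check: True if `name` or anything it depends on changed.

    dependencies: dict mapping module name -> list of module names
    mtimes:       dict mapping module name -> current source mtime
    loaded_at:    dict mapping module name -> mtime seen when it was loaded
    """
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against circular dependencies
        return False
    seen.add(name)
    if mtimes[name] > loaded_at[name]:
        return True
    return any(
        needs_reload(child, dependencies, mtimes, loaded_at, seen)
        for child in dependencies.get(name, ())
    )
```

The cost is one timestamp comparison per reachable module on each check, which is the inefficiency being traded for consistency.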
The cache code can be seen at:
http://svn.dscpl.com.au/vampire/tru...ampire/cache.py
If you are interested in peering into some of the cache information for my
own web site, check out:
http://www.dscpl.com.au/projects/va...ples/templates/cached_modules.html
Graham
| |
| Nicolas Lehuen 2005-05-19, 7:45 am |
| 2005/5/19, Graham Dumpleton <grahamd@dscpl.com.au>:
>
> On 19/05/2005, at 7:24 PM, Nicolas Lehuen wrote:
>
> You should really look at the module caching mechanism in Vampire sometime,
> it effectively does what you are now talking about. Vampire takes this to
> the point of doing a depth search for dependencies and reloading a top
> level module if any of the nested dependencies, even a few levels below,
> have changed. Am sure that some will criticise this as being inefficient
> but I design for usability over break neck speed. Anyway, based on number
> of downloads Vampire isn't being used much by anyone but me, so still very
> much my toy for trying out ideas.
>
> The cache code can be seen at:
>
> http://svn.dscpl.com.au/vampire/tru...ampire/cache.py
>
> If you are interested in peering into some of the cache information for my
> own web site, check out:
>
> http://www.dscpl.com.au/projects/va...ples/templates/cached_modules.html
>
> Graham
I've thought about using some of the work you did on Vampire, but I'm
not really convinced we should go that far.
One thing I want to ensure is that normal Python modules and published
modules are not the same thing. If I were implementing this in Java
(I did something similar back in the heyday of Java), I would use
different ClassLoaders.
I'm OK with published modules being automagically reloaded,
but not with standard Python modules. Published modules are or
should be developed with some assumptions in mind, namely that the
module can be reloaded at any time, that it can be executed in a
multithreaded environment, and so on. Standard Python modules are not
always built this way, and not all modules can safely be reloaded (at
least, we cannot guarantee this).
That is where Vampire worries me, in that it seems to do its job too
well by being able to reload a huge bunch of modules if one of the
core dependencies changes. Or does it?
Frankly, I'm not convinced that being able to hot-swap any piece of
code is so important. What's the problem with doing an "apachectl
restart"? There is no interruption of service, as Apache gracefully
closes one set of worker processes while starting a new set of workers
and keeps incoming connections in a buffer until the new workers are
ready to go on. This way, you get a pristine environment
to work with, and not some kind of weirdo, half correctly reloaded,
half "not meant to be reloaded", PLUS half "cannot be properly
reloaded in a multi-threaded context" execution environment.
Reloadable published modules are handy during development, but a fully
reloadable module set (published + standard modules) is a recipe for
strange behaviour. Since we already have to dynamically load an arbitrary
piece of code for a given file and use it as a module, there was not
much to do to add the possibility of reloading a modified module. We
could introduce some dependency checks when a published module uses
code from another published module. But going further seems a bit
dangerous and not really useful to me.
That said, I don't want to undermine the feat you accomplished in
implementing the Vampire module cache. There are a bunch of neat ideas
in your code, and the possibility to keep some data during the reload
is very clever. It's just that what is totally justified for the
Vampire publisher may not be needed for the simpler
mod_python.publisher.
Regards,
Nicolas
| |
| Graham Dumpleton 2005-05-19, 7:45 am |
|
On 19/05/2005, at 8:32 PM, Nicolas Lehuen wrote:
> I'm OK with reloading published modules being automagically reloaded,
> but I'm not for standard Python modules. Published modules are or
> should be developed with some assumptions in mind, namely that the
> module can be reloaded at any time, that it can be executed in a
> multithreaded environment, and so on. Standard Python modules are not
> always built this way, and not all modules can safely be reloaded (at
> least, we cannot guarantee this).
The Vampire module loading mechanism doesn't go anywhere near standard
Python modules. Anything it does load does not get put in sys.modules, and
if the advice of setting PythonPath to 'sys.path' is taken, there is no
real chance of importing a module using the Vampire import mechanism
at the same time as it is loaded using "import" unless you specifically
go out to break things in that way. Thus, it is also only intended to
be used with published modules or modules which are specifically a
part of the web application and not some generic module usable outside
of the application.
> Frankly, I'm not convinced that being able to hot-swap any piece of
> code is so important. What's the problem with doing an "apachectl
> restart" ?
In one of the environments I have to work in, I have no ability to perform
restarts as the system is managed by others. Thus I don't have that
luxury. Many users of ISP environments may not have that luxury either.
> This way, you get a pristine environment
> to work with, and not some kind of weirdo, half correctly reloaded,
> half "not meant to be reloaded", PLUS half "cannot be properly
> reloaded in a multi-threaded context" execution environment .
In Vampire, unlike in mod_python's import_module() function, when it
does a reload it does not reload on top of an existing module because
that I agree is quite dangerous and can result in lots of problems.
There are special hooks that can be defined in a module which allow
you to migrate data from an existing module to the one being reloaded
in a thread safe manner.
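As an illustration of reloading into a fresh module object with a migration hook, here is a small sketch. It is not Vampire's implementation: the hook name __migrate__, the reload_fresh function and the single lock are all assumptions made for the example.

```python
import threading
import types

_reload_lock = threading.Lock()  # serialises data migration between modules


def reload_fresh(old_module, source, name="page"):
    """Build a brand-new module object instead of reloading in place.

    `__migrate__` is a hypothetical hook name used only in this sketch.
    """
    new_module = types.ModuleType(name)
    exec(compile(source, name, "exec"), new_module.__dict__)
    hook = getattr(new_module, "__migrate__", None)
    if old_module is not None and hook is not None:
        with _reload_lock:
            hook(old_module)  # let the new code copy over what it wants to keep
    return new_module


SOURCE = '''
cache = {}

def __migrate__(old):
    cache.update(old.cache)
'''

first = reload_fresh(None, SOURCE)
first.cache["answer"] = 42
second = reload_fresh(first, SOURCE)  # new object, old data carried over
```

Requests still holding a reference to the old module keep working against a consistent, untouched object, which is the main difference from reloading on top of the existing module.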
The fact that it can look down multiple levels ensures that everything
is consistent as much as possible. This to my mind is much better than
the import_module() method and how it is applied to top level handlers,
where whether auto reload applies depends on what part of the URL
namespace you are in because of the PythonAutoReload option. If different
parts of the URL namespace have different settings for this option they
will screw each other up. Gets worse when using import_module() directly,
as auto reload is on or off on a case by case basis because you have
to pass the option in to specify what happens. If an auto reload option
is provided, it needs to turn it off completely for the whole interpreter
for the remaining life of the interpreter. The ability to turn it on and
off at will and at different places at the same time is much worse.
> That said, I don't want to undermine the feat you accomplished in
> implementing the Vampire module cache. There are a bunch of neat ideas
> in your code, and the possibility to keep some data during the reload
> is very clever. It's just that what is totally justified for the
> Vampire publisher may not be needed for the simpler
> mod_python.publisher.
All I can really say is that I have obviously been too clever in as
much as it isn't obvious to what extent I have gone to avoid all these
problems. :-(
Graham
| |
| Nicolas Lehuen 2005-05-19, 5:46 pm |
| 2005/5/19, Graham Dumpleton <grahamd@dscpl.com.au>:
>
> On 19/05/2005, at 8:32 PM, Nicolas Lehuen wrote:
>
> The Vampire module loading mechanism doesn't go anywhere near standard
> Python modules. Anything it does load does not get put in sys.modules and
> if the advice is taken of setting PythonPath to 'sys.path' there is no
> real chance of importing a module using the Vampire import mechanism
> at the same time as it is loaded using "import" unless you specifically
> go out to break things in that way. Thus, it also is only intended to
> be used with published modules or modules which are specifically a
> part of the web application and not some generic module usable outside
> of the application.
My bad, see below.
>
> One of the environments I have to work in I have no ability to perform
> restarts as the system is managed by others. Thus I don't have that
> luxury. Many users of ISP environments may not have that luxury either.
Good point. I'm a lucky guy with root access on the server, so I tend
to forget that not everyone can perform an apachectl restart.
>
> In Vampire, unlike in mod_python's import_module() function, when it
> does a reload it does not reload on top of an existing module because
> that I agree is quite dangerous and can result in lots of problems.
> There are special hooks that can be defined in a module which allow
> you to migrate data from an existing module to the one being reloaded
> in a thread safe manner.
>
> The fact that it can look down multiple levels ensures that everything
> is consistent as much as possible. This to my mind is much better than
> the import_module() method and how it is applied to top level handlers,
> where whether auto reload applies depends on what part of the URL
> namespace you are in because of the PythonAutoReload option. If different
> parts of the URL namespace have different settings for this option they
> will screw each other up. Gets worse when using import_module() direct
> as auto reload is on or off on a case by case basis because you have
> to pass the option in to specify what happens. If an auto reload option
> is provided, it needs to turn it off completely for the whole interpreter
> for the remaining life of the interpreter. The ability to turn it on and
> off at will and at different places at the same time is much worse.
>
> All I can really say is that I have obviously been too clever in as
> much as it isn't obvious to what extent I have gone to avoid all these
> problems. :-(
>
> Graham
My bad, I haven't read your code well enough. You are effectively
doing something pretty close to what I wanted to do, except that I
wanted to handle the dependency declarations within import_page (by
adding the imported page to a special __dependencies__ attribute
stored into the module) instead of by looking at the content of the
module.
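A sketch of what recording dependencies at import time might look like, as opposed to inspecting the module's content afterwards. Only the __dependencies__ attribute name comes from the message above; import_page_sketch, the explicit importer argument, the loader callable and the cache dictionary are assumptions for illustration.

```python
import types

_page_cache = {}  # path -> module object (hypothetical cache)


def import_page_sketch(importer, path, loader):
    """Load `path` through `loader` and record it on the importing module."""
    module = _page_cache.get(path)
    if module is None:
        module = _page_cache[path] = loader(path)
    # Record the dependency explicitly instead of scanning the module body.
    deps = importer.__dict__.setdefault("__dependencies__", [])
    if path not in deps:
        deps.append(path)
    return module
```

In a real implementation the importing module would presumably be found through the stack frame peek rather than passed in as an argument.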
OTOH, you have implemented a clever import hook that allows users to
keep on using the familiar import statement, but I don't feel very
comfortable with this idea. I like the import statement semantics to
remain always the same, and the way import hooks are managed globally
makes me feel a bit nervous. For example, are you sure you don't get
some stale import hooks in the import hooks chain whenever the Vampire
handler is reloaded ?
Anyway, now, I don't know what to do. Should I add this dependency
management to the existing code or integrate some code from Vampire?
Regards,
Nicolas
| |
| Graham Dumpleton 2005-05-19, 5:46 pm |
|
On 19/05/2005, at 11:16 PM, Nicolas Lehuen wrote:
> OTOH, you have implemented a clever import hook that allows users to
> keep on using the familiar import statement, but I don't feel very
> comfortable with this idea. I like the import statement semantics to
> remain always the same, and the way import hooks are managed globally
> makes me feel a bit nervous. For example, are you sure you don't get
> some stale import hooks in the import hooks chain whenever the Vampire
> handler is reloaded ?
Unfortunately the import hook mechanism isn't ideal. The biggest problem
is that there can only really be one application wide import hook, at
least perhaps if you want to support older versions of Python, don't
know about newer versions of Python. This means that if some other
module got loaded before Vampire which overrode the import hook,
Vampire wouldn't install its own and the feature wouldn't work.
I still need to do more investigation work on the import hook to see if
it is at all possible to override it just for specific modules, ie.,
those loaded using the custom module loader. If I can find a way of
doing this, then great. If not, am always at the risk that if mod_python
were to do a similar thing that my way of doing it would stop working,
although, just had an idea how it could be done in mod_python and I
could still make use of it. :-)
The only reason the import hook exists at all is so that Cheetah templates
can be used and for the dependency checks of the module loader to work.
The issue here is that the cheetah-compile program, when you extend a
page definition in another file, generates code of the form:
    from base_class import base_class
There is no way in the code generator to change this behaviour, without
modifying the Cheetah source code, so that I could substitute an explicit
call to the module loader instead.
Thus it has a quite specific use at the moment and because of the danger
of it messing with normal imports will only work in quite constrained
circumstances. Specifically, you have to turn it on first with the
VampireImportHooks option. This will then only work for modules which
have been loaded by the module loader in the first place and it will only
look in the same directory as the file the import is being done in.
Ie., it will not arbitrarily search sys.path. If you do want it to search
other directories as well, you have to go to extra effort to define the
path in the Vampire configuration file. This path though shouldn't overlap
sys.path and PythonPath would want to be set to "sys.path" so no part of
the document tree ever gets added to "sys.path".
In other words, a conscious decision needs to be made for it to come
into play at all and it would only be recommended for where you can't
avoid using it. With your frame peek code that I have also now integrated,
one can do a module import from the same directory with one line anyway,
so the argument that using "import" is simpler, is less of one and thus
the import hook would really only be for special cases.
> Anyway, now, I don't know what to do. Should I add this dependency
> management to the existing code or integrate some code from Vampire?
I certainly at this stage wouldn't be rushing to do in mod_python what
I have been doing. Even if it were added at some later date, I would
probably require it to be enabled explicitly through an option, using
the current simple and more efficient mechanism as the default.
I think at the moment there are going to be enough issues with the loader
you have introduced with publisher for people to worry about as it is.
Am a bit worried that the way that certain aspects of the loader don't
behave the same as before might be an issue for some code. It may be
necessary to consider options that can be enabled to make it behave a
bit more like it used to, although this introduces the same sorts of
problems as auto reload being set to different values in different parts
of a document tree.
Anyway, when I get a chance I'll read through your code again and see
if my concerns are valid or not and post something about it. From that
you may see why I had to add certain features in my own module loader.
Graham
| |
| Graham Dumpleton 2005-05-19, 8:45 pm |
| Graham Dumpleton wrote ..
> I still need to do more investigation work on the import hook to see if
> it is at all possible to override it just for specific modules, ie.,
> those loaded using the custom module loader. If I can find a way of
> doing this, then great. If not, am always at the risk that if mod_python
> were to do a similar thing that my way of doing it would stop working,
> although, just had an idea how it could be done in mod_python and I
> could still make use of it. :-)
Hmmm, got it working on my box so am only overriding "import" for specific
modules that get imported by module loader now. Thus, isn't used outside
of scope of those modules and thus less chance for problems and no chance
of some other system's global import hook replacing my own.
Turned out my only problem from when I tried the first time some weeks
back was a lack of an "s" on "__builtins__" in the specific case where it was
required. Is confusing how there is "__builtin__" and "__builtins__". :-(
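The per-module override can be illustrated with the sketch below. It is written for current Python 3, where the module is called builtins (it was __builtin__ in the Python 2 versions discussed in this thread), while the per-module global is still named __builtins__. This is not Vampire's code; load_with_import_hook and recording_import are hypothetical names, and relying on the __builtins__ global is a CPython implementation detail.

```python
import builtins
import types


def load_with_import_hook(name, source, hook):
    """Execute `source` as a module whose `import` statements call `hook`."""
    module = types.ModuleType(name)
    # Copy the real builtins and swap in our own __import__ for this module only.
    private_builtins = dict(vars(builtins))
    private_builtins["__import__"] = hook
    module.__dict__["__builtins__"] = private_builtins
    exec(compile(source, name, "exec"), module.__dict__)
    return module


seen = []


def recording_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Note the import, then defer to the normal machinery.
    seen.append(name)
    return builtins.__import__(name, globals, locals, fromlist, level)


page = load_with_import_hook("page", "import json\n", recording_import)
```

Imports performed by any other module, including those made inside json itself, still go through the normal builtins and never see the hook.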
I'll push it into my subversion repository when I get home again if you want
to look at it.
Graham
| |
| Graham Dumpleton 2005-05-20, 2:45 am |
| Noel J. Bergman wrote ..
> Why are you people cc'ing jira@apache.org with this stuff?
Sorry, simply doing reply all on mailing list to a post that originally
was generated by way of JIRA web site update. Where the web site
updates get posted to a mailing list, the JIRA email address should
possibly somehow be dropped off if this is a problem and is what
is happening as people aren't going to know to manually remove it.
Anyone know how to do that if that is how the address got added
in the first place?
Graham
| |
| Nicolas Lehuen 2005-05-20, 7:46 am |
| The address is cc'ed in the first place because all comments to a JIRA
issue are posted to python-dev@httpd.apache.org with the name of the
commenter BUT jira@apache.org as the email address, instead of the
real email address of the commenter. This may be a configuration
problem or a bug in JIRA.
Regards,
Nicolas
2005/5/20, Graham Dumpleton <grahamd@dscpl.com.au>:
> Noel J. Bergman wrote ..
>
> Sorry, simply doing reply all on mailing list to a post that originally
> was generated by way of JIRA web site update. Where the web site
> updates get posted to a mailing list, the JIRA email address should
> possibly somehow be dropped off if this is a problem and is what
> is happening as people aren't going to know to manually remove it.
> Anyone know how to do that if that is how the address got added
> in the first place?
>
> Graham
>