Server side includes and Python.
| Graham Dumpleton 2006-01-22, 5:48 pm |
| A few weeks back I created a JIRA entry relating to integrating server
side includes with Python. The entry is:
http://issues.apache.org/jira/browse/MODPYTHON-104
I finally got around to having a go at implementing it and have some
initial code now working. The point of this email is to get feedback on
the approach used and whether anyone has any alternate suggestions of
how it should in practice work, or any other ideas for that matter.
For those who might not know what server side includes are all about,
see:
http://httpd.apache.org/docs/2.0/mod/mod_include.html
Effectively, it is an Apache output filter that looks for
SGML (HTML/XML) comments in content returned by a handler or
found in a static file. It will identify specially tagged comments and
process them to yield new content which will be inserted in place in the
output instead of the original comment. The filter even supports
conditionals and is able to trigger sub requests and CGI scripts.
To cater for other sources of input, the filter can be extended to
support additional tags. For example, mod_perl adds the tag "perl",
allowing one to use:
<!--#perl sub="MySSI::remote_host" -->
<!--#perl arg="Hello" arg="SSI" arg="World"
sub="sub {
my($r, @args) = @_;
print qq(@args);
}"
-->
Now, there is no reason that support for Python can't also be added and
it is actually quite straightforward to do. The only question that needs
to be answered is how it would be used. What one would get out of it is
a simple template mechanism which uses only mod_python and existing
features of Apache itself, and which can also be used in conjunction
with other modules like mod_perl as well.
Now for how the syntax would work. What I have got going at the moment
are the following two scenarios:
<html>
<body>
<pre>
<!--#python eval="filter.req.subprocess_env['SERVER_ADMIN']" -->
<!--#python
exec="filter.write(filter.req.subprocess_env['SERVER_ADMIN'])" -->
<!--#python exec="print >> filter,
filter.req.subprocess_env['SERVER_ADMIN']" -->
</pre>
</body>
</html>
The first will run "eval" with the result being converted into a string
and included directly in the output.
The second will run "exec" instead. In this case there is no result and
thus filter.write() has to be used to generate the output.
In both cases, the "filter" object is pushed into the local variable
set and would be accessible.
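As a rough sketch of the difference between the two forms (plain Python 3, not the actual mod_python code; FakeFilter, run_eval and run_exec are hypothetical names standing in for the real filter object and directive handlers):

```python
class FakeFilter:
    """Hypothetical stand-in for the filter object; it only collects writes."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def run_eval(code, flt):
    # "eval": the value of the expression is converted to a string and written.
    result = eval(code, {}, {"filter": flt})
    flt.write(str(result))


def run_exec(code, flt):
    # "exec": there is no result, so the code must call filter.write() itself.
    exec(code, {}, {"filter": flt})


flt = FakeFilter()
run_eval("1 + 2", flt)
run_exec("filter.write('done')", flt)
output = "".join(flt.chunks)
```

In both helpers the only name supplied to the code is "filter", placed in the local variable set.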
When using "exec", multiple lines of Python code can be provided. As
usual Python indentation is always fun, and as such, when providing
multiple lines of code, you can't have leading whitespace unless it is
truly required for nesting.
<html>
<body>
<pre>
<!--#python exec="
for key in filter.req.subprocess_env:
    print >> filter, key, filter.req.subprocess_env[key]
"-->
</pre>
</body>
</html>
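The leading-whitespace restriction follows from how Python compiles code passed to exec. The sketch below (plain Python 3) shows uniformly indented code being rejected. The textwrap.dedent call is only an assumption about one way an implementation could relax the rule; it is not something the code described here does.

```python
import textwrap

# Code as it might appear if indented to line up with surrounding HTML.
indented = """
    total = 1 + 2
    values.append(total)
"""

values = []
try:
    exec(indented, {"values": values})
    failed = False
except IndentationError:
    # Every line has leading whitespace that is not required for nesting.
    failed = True

# Stripping the common leading whitespace makes the same code acceptable.
exec(textwrap.dedent(indented), {"values": values})
```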
Having to include all the code in the HTML isn't fun and it is actually
better to separate it out anyway.
To do that, one simply has to perform an import of the required module
and execute some function contained in it or access data and write it
out.
<!--#python exec="
import sys
print >> filter, sys.version
"-->
If the target module is already the subject of automatic module
reloading as implemented by mod_python, you might instead use:
<!--#python exec="
from mod_python import apache
module = apache.import_module('example')
module.output_header(filter)
"-->
For "eval", one could provide a short cut mechanism whereby one could
specify the module in which the code should be evaluated.
<!--#python module="sys" eval="version" -->
Ie., this is essentially equivalent to having said:
from mod_python import apache
module = apache.import_module("sys")
filter.write(str(eval("version", module.__dict__, {"filter":filter})))
The "module" short cut wouldn't exist for "exec" as any module imports
then performed would pollute the global name space of the specified
module, which may not be desirable.
As you can see, various things are possible but what is the minimum that
should be provided?
In the interests of discouraging a lot of code inside of an HTML file and
thus protecting people from themselves, should the "exec" variant above
be discarded, leaving just the "eval" variant? This would have the
effect of forcing users to put complex code into separate modules and
simply triggering calls of the separate code.
<!--#python module="example" eval="output_header(filter)" -->
The only hard bit is how one should treat the response from the function
called. One could say that if the result of the eval is 'None' then no
output is written and, if it was a function which was called, it is
assumed that the called function did that by writing back to the filter
object. In other words, the result of the eval isn't simply converted to
a string. This would only be done if the result is not 'None', with
'None' truly meaning no output.
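A minimal sketch of that rule in plain Python 3, assuming a hypothetical FakeFilter in place of the real filter object and a hypothetical render_eval helper (neither is part of mod_python):

```python
class FakeFilter:
    """Hypothetical stand-in for the filter object; it only collects writes."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def output_header(flt):
    # Writes to the filter itself and implicitly returns None.
    flt.write("<h1>Header</h1>")


def render_eval(expression, namespace, flt):
    result = eval(expression, namespace, {"filter": flt})
    # Only a non-None result is converted to a string and written.
    if result is not None:
        flt.write(str(result))


flt = FakeFilter()
namespace = {"output_header": output_header, "title": "Hello"}
render_eval("output_header(filter)", namespace, flt)  # function wrote for itself
render_eval("title", namespace, flt)                  # value is written
rendered = "".join(flt.chunks)
```

With this rule the string "None" never leaks into the page when a called function does its own writing.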
Anyway, that is all there is to it.
Given that the intent here is to try and get this functionality rolled
into mod_python, any feedback would be most appreciated. No consensus
probably means it would get rejected.
Thanks.
Graham
| |
| Deron Meranda 2006-01-23, 2:47 am |
| I like the SSI feature. It would fill a nice gap between using
plain HTML files and having to go to a more featured template
or engine. Some things are simple enough that the SSI concept
should be enough, and having Python would be nice.
I do need to give your proposal some more thought before I can
properly comment, but it looks interesting so far.
One thing I think that *should* be very easy to do in an SSI setting
is HTML-escaping. I shouldn't have to do something like
'from cgi import escape'. Perhaps adding another parameter, like
<!--#python esc="h" eval="print '1<2'"-->
where esc is a built-in escaping filter: h=html, u=url, x=xml
(difference between h and x is how it escapes quote chars).
Another question. How are character sets handled? If the
output is a Unicode string, how does it get encoded? Should
it always assume say UTF-8, or can it determine the actual
character encoding for this response somehow?
Also, my vote is that None should result in no output.
--
Deron Meranda
| |
| Graham Dumpleton 2006-01-23, 7:48 am |
|
On 23/01/2006, at 4:59 PM, Deron Meranda wrote:
> I like the SSI feature. It would fill a nice gap between using
> plain HTML files and having to go to a more featured template
> or engine. Some things are simple enough that the SSI concept
> should be enough, and having Python would be nice.
>
> I do need to give your proposal some more thought before I can
> properly comment, but it looks interesting so far.
>
> One thing I think that *should* be very easy to do in an SSI setting
> is HTML-escaping. I shouldn't have to do something like
> 'from cgi import escape'. Perhaps adding another parameter, like
>
> <!--#python esc="h" eval="print '1<2'"-->
>
> where esc is a built-in escaping filter: h=html, u=url, x=xml
> (difference between h and x is how it escapes quote chars).
Using "print" like that can't be done in an "eval"; it would need to
use "exec". Even then, you have to explicitly direct the "print"
to the filter, as it is not possible with mod_python to change sys.stdout
so that "print" by itself could be used. Thus, making things
more verbose:
<!--#python escape="html" exec="print >> filter, '1<2'" -->
Unfortunately though, this would not work. This is because it is
writing direct to the filter and there is no opportunity to
escape the content as it is written. Such automatic escaping could
only be done for "eval", and only if the content is the result.
Something could still write direct to the filter object and bypass
it.
Thus, although it indeed sounds like it may be a useful thing to have,
it would not be practical to implement it such that it was able to
capture all generated content. Thus, requiring the user to explicitly
escape where needed might still be necessary.
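To illustrate why the escaping could only cover the "eval" result, here is a sketch in plain Python 3. FakeFilter, render_eval and its escape parameter are hypothetical names, and html.escape is used where Python 2 code of this era would use cgi.escape.

```python
import html


class FakeFilter:
    """Hypothetical stand-in for the filter object; it only collects writes."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def render_eval(expression, flt, escape=None):
    result = eval(expression, {}, {"filter": flt})
    if result is None:
        return
    text = str(result)
    if escape == "html":
        # Only the value returned by the expression passes through here.
        text = html.escape(text)
    flt.write(text)


flt = FakeFilter()
render_eval("'1<2'", flt, escape="html")                # result is escaped
render_eval("filter.write('1<2')", flt, escape="html")  # direct write bypasses it
```

The second call shows the caveat: content written straight to the filter never reaches the escaping step.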
In terms of avoiding having to do imports at each point of use,
have got it going now so that all code executing in the page uses
the same local variable space. Thus one could do a whole lot of
imports and variable assignments in one exec at the start and
then use that later on. Thus:
<!--#python exec="
from mod_python import apache
example = apache.import_module('example')
import cgi, sys
"-->
<html>
<body>
<p><!--#python eval="cgi.escape(sys.version)" --></p>
<p><!--#python exec="example.output_body(filter)" --></p>
</body>
</html>
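The shared local variable space can be pictured with a sketch like the following (plain Python 3; page_locals is a hypothetical name for the per-page dictionary, and html.escape stands in for cgi.escape):

```python
# One dictionary is created per page and handed to every directive in it.
page_locals = {}

# First directive: an "exec" that performs imports and assignments.
exec("import html\ngreeting = 'a<b'", {}, page_locals)

# Later directive: an "eval" that relies on names set up by the first one.
value = eval("html.escape(greeting)", {}, page_locals)
```

Because both calls receive the same dictionary as their locals, the import and the assignment made by the first directive are visible to the second.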
> Another question. How are character sets handled? If the
> output is a Unicode string, how does it get encoded? Should
> it always assume say UTF-8, or can it determine the actual
> character encoding for this response somehow?
Changes made in the mod_python.publisher layer in mod_python 3.2
might be used as a guide here. Ie., it tries to be smart about
encoding:
elif isinstance(object,UnicodeType):
    # We've got an Unicode string to publish, so we have to encode
    # it to bytes. We try to detect the character encoding
    # from the Content-Type header
    if req._content_type_set:
        charset = re_charset.search(req.content_type)
        if charset:
            charset = charset.group(1)
        else:
            # If no character encoding was set, we use UTF8
            charset = 'UTF8'
            req.content_type += '; charset=UTF8'
    else:
        # If no character encoding was set, we use UTF8
        charset = 'UTF8'
    result = object.encode(charset)
In mod_python.publisher you can see though that it only worries
about it if the result is a Unicode string. Ie., if anything else
is returned or the request object is written to direct, then it is
up to the user to implement what they want.
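The same decision logic can be tried outside of Apache with a standalone sketch in Python 3. RE_CHARSET and encode_result are hypothetical names, and the regular expression is an assumption about what a charset pattern could look like, not a copy of the one in mod_python.publisher.

```python
import re

# Assumed pattern: picks the value of "charset=" out of a Content-Type string.
RE_CHARSET = re.compile(r"charset\s*=\s*([^\s;]+)", re.IGNORECASE)


def encode_result(text, content_type=None):
    """Return (encoded bytes, possibly updated content type)."""
    if content_type:
        match = RE_CHARSET.search(content_type)
        if match:
            charset = match.group(1)
        else:
            # Content type set but without a charset: default to UTF-8
            # and advertise it in the header value.
            charset = "UTF-8"
            content_type += "; charset=UTF-8"
    else:
        # No content type at all: fall back to UTF-8.
        charset = "UTF-8"
    return text.encode(charset), content_type


latin, ct1 = encode_result("caf\u00e9", "text/html; charset=ISO-8859-1")
utf8, ct2 = encode_result("caf\u00e9", "text/html")
```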
If this approach in mod_python.publisher is seen as reasonable and
works, then for 'eval' one could take the same approach. If you are
going to allow that though, then allowing "escape" for "eval" might
also be seen as reasonable. One would just need to document the
caveats.
That said, can you explain more about the differences between HTML
and XML escaping with the quoting. I don't really understand the
differences. Are there particular Python routines that implement
each variant, or is it some option to 'cgi.escape'? Also, which is
the preferred routine for url encoding?
Thanks for your interest.
Graham
| |
| Deron Meranda 2006-01-23, 5:47 pm |
| On 1/23/06, Graham Dumpleton <grahamd@dscpl.com.au> wrote:
> Using "print" like that can't be done in an "eval", would need to
> use "exec".
Sorry, I probably didn't mean to use the print in my example.
Of course though you can always wrap sys.stdout if you wanted
to capture the output for post-escaping.
> If this approach [to charset handling] in mod_python.publisher is
> seen as reasonable and works, then for 'eval' one could take the
> same approach.
Yes, I think what publisher does looks quite reasonable to me.
They should do it the same way.
> That said, can you explain more about the differences between HTML
> and XML escaping
It comes down to what to do with quote marks. In HTML escaping you
usually use entity references, but in XML you must use numeric character
references for anything except <, >, and &.
        HTML      XML
---------------------
<       &lt;      &lt;
>       &gt;      &gt;
&       &amp;     &amp;
"       &quot;    &#34;
'       &#39;     &#39;
But really HTML can use character references too, so you could just use
XML escaping and not worry about an HTML special case.
> Are there particular Python routines that implement
> each variant, or is it some option to 'cgi.escape'?
There is a second optional-argument to cgi.escape, which is a Boolean
defaulting to False. If True, it will escape " as &quot;. It never
escapes the apostrophe.
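For anyone trying this on Python 3, where cgi.escape no longer exists (it was removed in Python 3.8), a rough equivalent can be sketched with html.escape and xml.sax.saxutils.escape. Note that html.escape, unlike cgi.escape, does escape the apostrophe when quote is true. The entity mapping passed to xml_escape is my own choice of numeric character references.

```python
import html
from xml.sax.saxutils import escape as xml_escape

sample = "<a href=\"x\">it's</a>"

# Only &, < and > are escaped, like cgi.escape with its default arguments.
no_quotes = html.escape(sample, quote=False)

# Quote characters are escaped too: " as &quot; and ' as &#x27;.
with_quotes = html.escape(sample, quote=True)

# XML-style: numeric character references for both quote characters.
numeric = xml_escape(sample, {'"': "&#34;", "'": "&#39;"})
```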
> which is the preferred routine for url encoding?
That's much less clear because there's no well-defined idea of
what exactly URL-escaping is...it depends upon the kind of URL.
I would tend to think it would be urllib.quote_plus()
Note that sometimes you may want to do multiple escaping.
URL escaping followed by HTML escaping. Perhaps in something
like,
<a href="<!--#python esc="uh" eval="random_page()"-->">surprise</a>
although that is admittedly an ugly use case.
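A sketch of the two layers in Python 3, where urllib.quote_plus lives at urllib.parse.quote_plus. The URL, parameter names and values are made up for illustration. Each parameter value is URL-escaped first, and the assembled URL is then HTML-escaped for use inside an attribute.

```python
import html
from urllib.parse import quote_plus

# Step 1: URL-escape each parameter value on its own.
name = quote_plus("Tom & Jerry")
ident = quote_plus("1<2")
url = "page?name=" + name + "&id=" + ident

# Step 2: HTML-escape the whole URL before placing it in an href attribute.
href = html.escape(url)
link = '<a href="' + href + '">surprise</a>'
```

The "&" that separates the parameters is the character that still needs HTML escaping after the URL escaping has been done.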
--
Deron Meranda
| |
| Graham Dumpleton 2006-01-23, 5:47 pm |
|
On 24/01/2006, at 3:07 AM, Deron Meranda wrote:
> On 1/23/06, Graham Dumpleton <grahamd@dscpl.com.au> wrote:
>
> Sorry, I probably didn't mean to use the print in my example.
>
> Of course though you can always wrap sys.stdout if you wanted
> to capture the output for post-escaping.
You can't wrap sys.stdout or replace it for the purposes of a single
request. This is because mod_python needs to be able to work in a
multithreaded MPM where multiple requests need to be handled at the
same time. The cgihandler in mod_python does fiddle with sys.stdout,
but to do that it has to lock on the cgihandler so only one at a
time can run, thus affecting performance.
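The cost of that approach can be seen in a small Python 3 sketch. This is not the cgihandler code; capture_print and _stdout_lock are hypothetical names. Because sys.stdout is shared by the whole process, the redirection has to sit inside a lock, so only one thread can run its code at a time.

```python
import contextlib
import io
import threading

# sys.stdout is process-wide, so swapping it must be serialised.
_stdout_lock = threading.Lock()


def capture_print(code):
    """Run code and return what it printed; one caller at a time."""
    buffer = io.StringIO()
    with _stdout_lock:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    return buffer.getvalue()


results = {}


def worker(name):
    results[name] = capture_print("print('hello from %s')" % name)


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, output from one thread could land in another thread's buffer, which is the problem being described.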
Graham