|
Home > Archive > Apache Mod-Python > December 2005 > Various musings about the request URL / URI / whatever
Various musings about the request URL / URI / whatever
|
|
| Nicolas Lehuen 2005-11-29, 7:47 am |
| Hi,
Is it me or is it quite tiresome to get the URL that called us, or get the
complete URL that would call another function ?
When performing an external redirect (using mod_python.util.redirect for
example), we MUST (as per RFC) provide a full URL, not a relative one.
Instead of util.redirect(req,'/foo/bar.py'), we should write util.redirect
(req,'https://whatever:8443/foo/bar.py').
The problem is, writing this is always tiresome, as it means building a
string like this :
def current_url(req):
    req.add_common_vars()
    current_url = []

    # protocol
    if req.subprocess_env.get('HTTPS') == 'on':
        current_url.append('https')
        default_port = 443
    else:
        current_url.append('http')
        default_port = 80
    current_url.append('://')

    # host
    current_url.append(req.hostname)

    # port
    port = req.connection.local_addr[1]
    if port != default_port:
        current_url.append(':')
        current_url.append(str(port))

    # URI
    current_url.append(req.uri)

    return ''.join(current_url)
So I have two questions :
First question, is there a simpler way to do this ? Ironically, when using
mod_rewrite, you get an environment variable named SCRIPT_URI which is
precisely what I need (SCRIPT_URL, also added by mod_rewrite, is equivalent
to req.uri... Don't ask me why). But relying on it isn't safe since
mod_rewrite isn't always used.
Second question, if there isn't any simpler way to do this, should we add it
to mod_python ? Either as a function like above in mod_python.util, or as a
member of the request object (named something like url to match the other
member named uri, but that's just teasing).
And third question (in pure Spanish inquisition style) : why is
req.parsed_uri returning me a tuple full of Nones except for the uri and
path_info part ?
Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
environment variables) using the terms "URI" and "URL" to distinguish
between a full, absolute resource path and a path relative to the server,
whereas the definition of URLs and URIs is very vague and nothing close to
this (http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we
save our souls and a lot of saliva by choosing better names ?
OK, OK, fifth question : we made req.filename and other members writable.
But when those attributes are changed, as Graham noted a while ago, the
other dependent ones aren't, leading to inconsistencies (for example, if you
change req.filename, req.canonical_filename isn't changed). Should we try to
solve this and provide clear definition of the various parts of a request
for mod_python 3.3 ?
Regards,
Nicolas
**
| |
| Jim Gallacher 2005-11-29, 5:49 pm |
| Nicolas Lehuen wrote:
> Hi,
>
> Is it me or is it quite tiresome to get the URL that called us, or get
> the complete URL that would call another function ?
>
> When performing an external redirect (using mod_python.util.redirect for
> example), we MUST (as per RFC) provide a full URL, not a relative one.
> Instead of util.redirect(req,'/foo/bar.py'), we should write
> util.redirect(req,'https://whatever:8443/foo/bar.py').
>
> The problem is, writing this is always tiresome, as it means building a
> string like this :
>
> def current_url(req):
>     req.add_common_vars()
>     current_url = []
>
>     # protocol
>     if req.subprocess_env.get('HTTPS') == 'on':
>         current_url.append('https')
>         default_port = 443
>     else:
>         current_url.append('http')
>         default_port = 80
>     current_url.append('://')
>
>     # host
>     current_url.append(req.hostname)
>
>     # port
>     port = req.connection.local_addr[1]
>     if port != default_port:
>         current_url.append(':')
>         current_url.append(str(port))
>
>     # URI
>     current_url.append(req.uri)
>
>     return ''.join(current_url)
>
> So I have two questions :
>
> First question, is there a simpler way to do this ? Ironically, when
> using mod_rewrite, you get an environment variable named SCRIPT_URI
> which is precisely what I need (SCRIPT_URL, also added by mod_rewrite,
> is equivalent to req.uri... Don't ask me why). But relying on it isn't
> safe since mod_rewrite isn't always used.
I guess you could just assemble the parts from the req.parsed_uri tuple,
except that apache doesn't actually fill in parsed_uri. 
> Second question, if there isn't any simpler way to do this, should we
> add it to mod_python ? Either as a function like above in
> mod_python.util, or as a member of the request object (named something
> like url to match the other member named uri, but that's just teasing).
I'm not against it, but for my purposes I think it would be more useful
for parsed_uri to just work properly.
> And third question (in pure Spanish inquisition style) : why is
> req.parsed_uri returning me a tuple full of Nones except for the uri and
> path_info part ?
It comes from apache that way. I sure don't know why though. Maybe we're
missing some magic apache call that would fill it in?
> Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
> environment variables) using the terms "URI" and "URL" to distinguish
> between a full, absolute resource path and a path relative to the
> server, whereas the definition of URLs and URIs is very vague and
> nothing close to this
> (http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we
> save our souls and a lot of saliva by choosing better names ?
Strangely I was reading the cited page just last week, for perhaps the
100th time. I keep hoping I'll find enlightenment but alas no. The danger
of choosing new names (ie absolute_thingy or relative_thingy) is that
we also add another layer of confusion. I'm not saying new names are a
bad idea, just that we need to be very careful.
> OK, OK, fifth question : we made req.filename and other members
> writable. But when those attributes are changed, as Graham noted a while
> ago, the other dependent ones aren't, leading to inconsistencies (for
> example, if you change req.filename, req.canonical_filename isn't
> changed). Should we try to solve this and provide clear definition of
> the various parts of a request for mod_python 3.3 ?
That would make sense. I'm wondering how often people make use of
req.canonical_filename (CFN*)? Also, just how would the CFN be
adjusted if the user code changes req.filename, since the user is free
to put any string in there they want? Maybe CFN just gets changed to the
same string. Hopefully Graham will shed some light on this, since it was
his use case.
Regards,
Jim
* Because I can't type canonical_filename the same way twice. Stupid
fingers.
| |
| Mike Looijmans 2005-11-29, 5:49 pm |
| Nicolas Lehuen wrote:
> When performing an external redirect (using mod_python.util.redirect for
> example), we MUST (as per RFC) provide a full URL, not a relative one.
> Instead of util.redirect(req,'/foo/bar.py'), we should write
> util.redirect(req,'https://whatever:8443/foo/bar.py').
The RFC and reality are not in agreement on this one.
I haven't met a browser that refused to follow a relative redirection,
even though the RFC states that all redirects must be absolute. Only
lynx would output a warning if you gave it enough switches on the
commandline.
But I do feel that the code snippet you provided should end up in a
library somewhere. A robust way of creating an absolute URL is a
nice-to-have function that we should not all be writing for ourselves.
Maybe a util.redirect_relative(req, link) that does a best effort to
make an absolute url and redirect to that. If it fails to create an
absolute path (e.g. because the HOST header is missing and the host name
wasn't set in apache.conf) it should just send a relative link (using
the machine's address is probably worse than that - there's no telling
how the client ever reached us if it doesn't tell us).
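
A minimal sketch of the best-effort idea described above, written as a plain function so it can run outside Apache. The name best_effort_location and its parameters are hypothetical; in a handler the values would presumably come from req.hostname, the connection's local port and req.uri, and the result would be passed to util.redirect.

```python
from urllib.parse import urljoin


def best_effort_location(link, hostname, port, https=False, current_uri="/"):
    """Return an absolute URL for `link` when the host is known, else `link` unchanged.

    All parameter names are illustrative assumptions, not part of mod_python.
    """
    if not hostname:
        # No Host header and no configured server name:
        # send the relative link rather than guess a host.
        return link
    scheme = "https" if https else "http"
    default_port = 443 if https else 80
    # Only mention the port when it differs from the scheme's default.
    netloc = hostname if port == default_port else "%s:%d" % (hostname, port)
    # Resolve `link` against the URL of the current request.
    return urljoin("%s://%s%s" % (scheme, netloc, current_uri), link)
```

With hostname set to None the function returns the link untouched, which matches the "just send a relative link" fallback.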
| |
| Daniel J. Popowich 2005-11-29, 5:49 pm |
|
Jim Gallacher writes:
> Nicolas Lehuen wrote:
>
> I'm not against it, but for my purposes I think it would be more useful
> for parsed_uri to just work properly.
Hear, hear!! I've wanted parsed_uri to work as expected for quite
some time...I'm actually in a position where I could devote some time
to tracking this down. If apache doesn't provide it, I think
mod_python should at least fill it in, right? Can someone nudge me
in the right direction to start? Haven't poked around apache source
and/or developer docs in years.
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Gregory (Grisha) Trubetskoy 2005-11-29, 5:49 pm |
|
On Tue, 29 Nov 2005, Nicolas Lehuen wrote:
> def current_url(req):
[snip]
>
> # host
> current_url.append(req.hostname)
[snip]
This part isn't going to work reliably if you are not using virtual hosts
and just bind to an IP number. Deciphering the URL is an impossible task -
I used to have similar code in my applications, but lately I realized that
it does not work reliably and it is much simpler to just treat it as a
configuration item...
> First question, is there a simpler way to do this ? Ironically, when using
> mod_rewrite, you get an environment variable named SCRIPT_URI which is
> precisely what I need (SCRIPT_URL, also added by mod_rewrite, is equivalent
> to req.uri... Don't ask me why). But relying on it isn't safe since
> mod_rewrite isn't always used.
well - here's how it does it.
    /*
     *  create the SCRIPT_URI variable for the env
     */

    /* add the canonical URI of this URL */
    thisserver = ap_get_server_name(r);
    port = ap_get_server_port(r);
    if (ap_is_default_port(port, r)) {
        thisport = "";
    }
    else {
        apr_snprintf(buf, sizeof(buf), ":%u", port);
        thisport = buf;
    }
    thisurl = apr_table_get(r->subprocess_env, ENVVAR_SCRIPT_URL);

    /* set the variable */
    var = apr_pstrcat(r->pool, ap_http_method(r), "://", thisserver, thisport,
                      thisurl, NULL);
    apr_table_setn(r->subprocess_env, ENVVAR_SCRIPT_URI, var);

    /* if filename was not initially set,
     * we start with the requested URI
     */
    if (r->filename == NULL) {
        r->filename = apr_pstrdup(r->pool, r->uri);
        rewritelog(r, 2, "init rewrite engine with requested uri %s",
                   r->filename);
    }
> Second question, if there isn't any simpler way to do this, should we add it
> to mod_python ? Either as a function like above in mod_python.util, or as a
> member of the request object (named something like url to match the other
> member named uri, but that's just teasing).
I don't know... Since the result is going to be half-baked... I think a
more interesting and mod_python-ish thing to do would be to expose all the
API's used in the above code (e.g. ap_get_server_name, ap_is_default_port,
ap_http_method) FIRST, then think about this.
> And third question (in pure Spanish inquisition style) : why is
> req.parsed_uri returning me a tuple full of Nones except for the uri and
> path_info part ?
This is an httpd question most likely...
> Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
> environment variables) using the terms "URI" and "URL" to distinguish
> between a full, absolute resource path and a path relative to the server,
> whereas the definition of URLs and URIs is very vague and nothing close to
> this (http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we
> save our souls and a lot of saliva by choosing better names ?
No, we (mod_python) should just use the exact same name that httpd uses.
If we come up with better names, then it's just going to make it even more
confusing.
> OK, OK, fifth question : we made req.filename and other members writable.
> But when those attributes are changed, as Graham noted a while ago, the
> other dependent ones aren't, leading to inconsistencies (for example, if you
> change req.filename, req.canonical_filename isn't changed). Should we try to
> solve this
The solution is to make req.canonical_filename writable too and document
that if you change req.filename, you may consider changing
canonical_filename as well and what will happen if you do not.
> and provide clear definition of the various parts of a request
> for mod_python 3.3 ?
Yes, that'd be good 
Grisha
| |
| Nicolas Lehuen 2005-11-29, 5:50 pm |
| 2005/11/29, Gregory (Grisha) Trubetskoy <grisha@apache.org>:
>
>
> On Tue, 29 Nov 2005, Nicolas Lehuen wrote:
>
>
> [snip]
>
>
> [snip]
>
> This part isn't going to work reliably if you are not using virtual hosts
> and just bind to an IP number. Deciphering the URL is an impossible task -
> I used to have similar code in my applications, but lately I realized that
> it does not work reliably and it is much simpler to just treat it as a
> configuration item...
That's awful. How come such a basic thing is so difficult ? I mean, isn't it
weird that server-side code has less information about its URL than the
client ? Note that it's not a mod_python specific problem, I've seen it also
in the Servlet API.
If I understand you correctly, req.hostname is not reliable in case where
virtual hosting is not used. What about server.server_hostname, which seems
to be used by the code from mod_rewrite you posted below ? Can it be used
reliably ?
> First question, is there a simpler way to do this ? Ironically, when using
> equivalent
>
> well - here's how it does it.
>
> /*
> * create the SCRIPT_URI variable for the env
> */
>
> /* add the canonical URI of this URL */
> thisserver = ap_get_server_name(r);
> port = ap_get_server_port(r);
> if (ap_is_default_port(port, r)) {
> thisport = "";
> }
> else {
> apr_snprintf(buf, sizeof(buf), ":%u", port);
> thisport = buf;
> }
> thisurl = apr_table_get(r->subprocess_env, ENVVAR_SCRIPT_URL);
>
> /* set the variable */
> var = apr_pstrcat(r->pool, ap_http_method(r), "://", thisserver,
> thisport,
> thisurl, NULL);
> apr_table_setn(r->subprocess_env, ENVVAR_SCRIPT_URI, var);
>
> /* if filename was not initially set,
> * we start with the requested URI
> */
> if (r->filename == NULL) {
> r->filename = apr_pstrdup(r->pool, r->uri);
> rewritelog(r, 2, "init rewrite engine with requested uri %s",
> r->filename);
> }
Shall we add this code to the native part of the request object, then ? Or
the server object (without the URL part), maybe ? But is it really reliable
(see question above) ?
> Second question, if there isn't any simpler way to do this, should we add
> it
> as a
> other
>
> I don't know... Since the result is going to be half-baked... I think a
> more interesting and mod_python-ish thing to do would be to expose all the
> API's used in the above code (e.g. ap_get_server_name, ap_is_default_port,
> ap_http_method) FIRST, then think about this.
>
>
> This is an httpd question most likely...
So it's a feature / bug in httpd. Maybe it's due to my use of
VirtualDocumentRoot.
> Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
> server,
> to
> we
>
> No, we (mod_python) should just use the exact same name that httpd uses.
> If we come up with better names, then it's just going to make it even more
> confusing.
Fair enough. The problem is that even httpd and mod_rewrite don't agree on
what an URL and an URI are...
> OK, OK, fifth question : we made req.filename and other members writable.
> you
> try to
>
> The solution is to make req.canonical_filename writable too and document
> that if you change req.filename, you may consider changing
> canonical_filename as well and what will happen if you do not.
>
>
> Yes, that'd be good 
>
> Grisha
>
| |
| Gregory (Grisha) Trubetskoy 2005-11-29, 5:50 pm |
|
On Tue, 29 Nov 2005, Nicolas Lehuen wrote:
> If I understand you correctly, req.hostname is not reliable in case where
> virtual hosting is not used. What about server.server_hostname, which seems
> to be used by the code from mod_rewrite you posted below ? Can it be used
> reliably ?
I don't think so.
if I do this:
telnet some.host.com 80
GET /index.html
How would apache know what the hostname is?
Grisha
| |
| Gregory (Grisha) Trubetskoy 2005-11-29, 5:50 pm |
|
On Tue, 29 Nov 2005, Nicolas Lehuen wrote:
> 2005/11/29, Gregory (Grisha) Trubetskoy <grisha@apache.org>:
>
> Shall we add this code to the native part of the request object, then ? Or
> the server object (without the URL part), maybe ?
No, I wasn't suggesting that by any means :-) The point was to demonstrate
that mod_rewrite does pretty much the same exercise your Python code was
doing, there is no magic there.
What I did suggest was:
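> I think a more interesting and mod_python-ish thing to do would be to
> expose all the API's used in the above code (e.g. ap_get_server_name,
> ap_is_default_port, ap_http_method) FIRST, then think about this.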
Grisha
| |
|
| Gregory (Grisha) Trubetskoy wrote:
> What I did suggest was:
>
+1. I think it would be much more Pythonic (and much more maintainable) to
have the _apache module only provide a nearly transparent access to the
apache API, then have the actual apache.py wrap everything. The way it is
now it's kind of half and half. Only really CPU intensive stuff should be
in the C code (is there really anything like that?).
Also, you give others the opportunity to implement their own wrappers
instead of using those in apache.py, if they so desire.
Nick
| |
| Jim Gallacher 2005-11-29, 5:50 pm |
| Daniel J. Popowich wrote:
> Jim Gallacher writes:
>
>
>
> Hear, hear!! I've wanted parsed_uri to work as expected for quite
> some time...I'm actually in a position where I could devote some time
> to tracking this down. If apache doesn't provide it, I think
> mod_python should at least fill it in, right?
+1
> Can someone nudge me
> in the right direction to start? Haven't poked around apache source
> and/or developer docs in years.
All I can say is grep is your friend. 
I've found http://docx.webperf.org to be useful. Unfortunately you can
only drill down into the header files, not c files (unless I'm missing
something). I might even be tempted to generate my own local copy of the
apache docs using doxygen so that the c-files get included. I've been
playing with doxygen + mod_python and it's pretty cool.
Searching docx for parse_uri turns up ap_parse_uri.
http://docx.webperf.org/group__APAC...PROTO.html#ga44
Grab the src and put grep to work. I'll dig in and help any way I can.
Jim
| |
| Gregory (Grisha) Trubetskoy 2005-11-29, 5:50 pm |
|
On Tue, 29 Nov 2005, Jim Gallacher wrote:
> Daniel J. Popowich wrote:
>
> +1
I don't know what the specific issue is with parsed_uri, if this is a
mod_python bug it should just be fixed BUT if this is an issue with httpd,
I don't think we should cover the problem up by having mod_python "fix"
it. Since we are part of the HTTP Server project, we should just fix it in
httpd.
Grisha
| |
| Jim Gallacher 2005-11-29, 5:50 pm |
| Gregory (Grisha) Trubetskoy wrote:
>
> On Tue, 29 Nov 2005, Jim Gallacher wrote:
>
>
>
> I don't know what the specific issue is with parsed_uri, if this is a
> mod_python bug it should just be fixed BUT if this is an issue with
> httpd, I don't think we should cover the problem up by having mod_python
> "fix" it. Since we are part of the HTTP Server project, we should just
> fix it in httpd.
Either way, it should be fixed.
In case anyone is not familiar with the issue, a request for
http://example.com/tests/mptest?view=form currently gives a tuple that
looks something like this:
(None, None, None, None, None, None, '/tests/mptest', 'view=form', None)
which is not what we expect. This is what the mod_python docs have to say:
parsed_uri
Tuple. The URI broken down into pieces. (scheme, hostinfo, user,
password, hostname, port, path, query, fragment). The apache module
defines a set of URI_* constants that should be used to access elements
of this tuple. Example:
fname = req.parsed_uri[apache.URI_PATH]
(Read-Only)
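
To make the documented tuple layout concrete, here is a small sketch that indexes the tuple shown above by name. The URI_* names mirror the constants the mod_python docs mention, but they are defined locally here as stand-ins so the snippet runs outside Apache; their numeric values are an assumption based on the documented tuple order. The helper missing_parts is hypothetical.

```python
# Local stand-ins for the apache.URI_* constants; values are assumed to
# follow the documented order of the parsed_uri tuple.
(URI_SCHEME, URI_HOSTINFO, URI_USER, URI_PASSWORD, URI_HOSTNAME,
 URI_PORT, URI_PATH, URI_QUERY, URI_FRAGMENT) = range(9)

FIELD_NAMES = ("scheme", "hostinfo", "user", "password", "hostname",
               "port", "path", "query", "fragment")

# The tuple reported for http://example.com/tests/mptest?view=form
parsed_uri = (None, None, None, None, None, None,
              '/tests/mptest', 'view=form', None)


def missing_parts(parsed):
    """List the names of the tuple elements that are None (hypothetical helper)."""
    return [name for name, value in zip(FIELD_NAMES, parsed) if value is None]


fname = parsed_uri[URI_PATH]
print(fname)
print(missing_parts(parsed_uri))
```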
Jim
| |
| Jim Gallacher 2005-11-29, 5:50 pm |
| Nicolas Lehuen wrote:
> Hi,
>
> Is it me or is it quite tiresome to get the URL that called us, or get
> the complete URL that would call another function ?
>
> When performing an external redirect (using mod_python.util.redirect for
> example), we MUST (as per RFC) provide a full URL, not a relative one.
> Instead of util.redirect(req,'/foo/bar.py'), we should write
> util.redirect(req,'https://whatever:8443/foo/bar.py').
>
> The problem is, writing this is always tiresome, as it means building a
> string like this :
>
> def current_url(req):
... snip ...
> So I have two questions :
>
> First question, is there a simpler way to do this ? Ironically, when
> using mod_rewrite, you get an environment variable named SCRIPT_URI
> which is precisely what I need (SCRIPT_URL, also added by mod_rewrite,
> is equivalent to req.uri... Don't ask me why). But relying on it isn't
> safe since mod_rewrite isn't always used.
I was digging into the parsed_uri issue and I came across this nugget:
char* apr_uri_unparse (apr_pool_t * p,
const apr_uri_t * uptr,
unsigned flags
)
Unparse a apr_uri_t structure to an URI string. Optionally suppress the
password for security reasons.
http://docx.webperf.org/group__APR__Util__URI.html#ga2
See srclib/apr-util/uri/apr_uri.c. in the apache source.
Might this do the job of your current_url function, assuming of course
that we can sort out the parsed_uri issue? BTW that magic seems to
happen in apr_uri_parse. Not sure yet why the whole tuple is not getting
filled in yet though.
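
A rough Python stand-in for the "unparse" step, to show why it cannot help while the tuple is mostly empty. This is not apr_uri_unparse; it uses urllib.parse.urlunsplit on a tuple in the parsed_uri layout, and the function name unparse is made up for the example.

```python
from urllib.parse import urlunsplit


def unparse(parsed):
    """Rebuild a URI string from a nine-element tuple in the parsed_uri layout.

    Illustration only: user, password, hostname and port are ignored because
    hostinfo already carries them.
    """
    scheme, hostinfo, _user, _password, _hostname, _port, path, query, fragment = parsed
    return urlunsplit((scheme or "", hostinfo or "", path or "",
                       query or "", fragment or ""))


# Mostly-None tuple, as seen for ordinary browser requests.
print(unparse((None, None, None, None, None, None,
               '/tests/mptest', 'view=form', None)))
```

With the mostly-None tuple the result is only the path and query, so unparsing gives back the absolute URL only when the client sent one.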
Jim
| |
| Daniel J. Popowich 2005-11-30, 2:46 am |
|
Gregory (Grisha) Trubetskoy writes:
>
> On Tue, 29 Nov 2005, Nicolas Lehuen wrote:
>
>
> I don't think so.
>
> if I do this:
>
> telnet some.host.com 80
>
> GET /index.html
>
> How would apache know what the hostname is?
By the Host header. I've been looking into this issue tonight and
think I have the answers (but it's really late, so I'll save the gory
details for tomorrow). In brief: typically, req.hostname is set from
the Host header and, in fact, when I telnet to apache and issue a GET
by hand, if I don't send the Host header, apache barfs with a 400, Bad
Request, response. (apache 2.0.54, debian testing)
As for the larger issue at hand: the reason req.parsed_uri is not
filled in is because browsers don't send the info in the GET, e.g.,
browsers send this:
GET /index.py?a=b&c=d HTTP/1.1
not
GET http://user:pass@somehost.org:80/index.py?a=b&c=d#here HTTP/1.1
if they did, parsed_uri would be filled in. req.unparsed_uri is
whatever the "word" after GET in the http protocol exchange;
req.parsed_uri is the parsing of that "word."
Given the full URI spec:
SCHEME://[USER[:PASS]@]HOST[:PORT]/PATH?QUERY#FRAGMENT
you can see where eight of the nine elements of the parsed_uri tuple
come from; the ninth, hostinfo, is the combination of
[USER[:PASS]@]HOST[:PORT] (everything between "//" and "/").
Unfortunately, browsers only send:
/PATH?QUERY
and that's why we only ever see it in unparsed_uri and parsed_uri.
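
The mapping from the "word" after GET to the nine-element tuple can be sketched with Python's standard library. This only imitates the layout with urllib.parse.urlsplit; it is not what Apache runs, and parse_request_target is a name invented for the example.

```python
from urllib.parse import urlsplit


def parse_request_target(target):
    """Split a request target into (scheme, hostinfo, user, password,
    hostname, port, path, query, fragment), using None for absent parts."""
    parts = urlsplit(target)
    return (
        parts.scheme or None,
        parts.netloc or None,   # hostinfo: [USER[:PASS]@]HOST[:PORT]
        parts.username,
        parts.password,
        parts.hostname,
        parts.port,
        parts.path or None,
        parts.query or None,
        parts.fragment or None,
    )


# What browsers send: only /PATH?QUERY
print(parse_request_target("/index.py?a=b&c=d"))
# A full URI in the request line
print(parse_request_target("http://user:pass@somehost.org:80/index.py?a=b&c=d#here"))
```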
Again, lots more to share...in the morrow...
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Mike Looijmans 2005-11-30, 2:46 am |
| Daniel J. Popowich wrote:
> By the Host header. I've been looking into this issue tonight and
> think I have the answers (but it's really late, so I'll save the gory
> details for tomorrow). In brief: typically, req.hostname is set from
> the Host header and, in fact, when I telnet to apache and issue a GET
> by hand, if I don't send the Host header, apache barfs with a 400, Bad
> Request, response. (apache 2.0.54, debian testing)
It will only do that if you claim to be a HTTP/1.1 client. If you send
GET / HTTP/1.0
it will not complain about the host header. Sending:
GET / HTTP/1.1
will get you a 400 response, because you MUST supply it (says RFC 2068,
and whatever superseded that one). There is more you must do to be able
to call yourself HTTP/1.1 by the way, such as keep-alive connections and
chunked encoding.
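
The rule described above can be written down as a tiny check. This is a deliberately simplified sketch, assuming a well-formed three-part request line and a dict of headers; host_header_status is a hypothetical name and real servers validate far more than this.

```python
def host_header_status(request_line, headers):
    """Return 400 when an HTTP/1.1 request lacks a Host header, else 200.

    Simplified illustration; assumes `request_line` looks like
    "GET / HTTP/1.1" and `headers` maps header names to values.
    """
    _method, _target, version = request_line.split()
    has_host = any(name.lower() == "host" for name in headers)
    if version == "HTTP/1.1" and not has_host:
        return 400
    return 200


print(host_header_status("GET / HTTP/1.0", {}))
print(host_header_status("GET / HTTP/1.1", {}))
print(host_header_status("GET / HTTP/1.1", {"Host": "localhost:8000"}))
```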
| |
| Gregory (Grisha) Trubetskoy 2005-11-30, 5:48 pm |
|
I guess the fundamental problem here now that I think about it is that
such a Host header based determination relies on trusting the client of
what the host should be, which, if you think about it isn't a good
programming practice.
For example, if Apache is configured such that it just answers requests
regardless of what the Host header says (which is the default
configuration usually), then if the client sends "gobbledygook.bleh" as
the host name, then that becomes the URL. While this may be harmless, it
can at least be a source of confusion and there may be even a security
issue lurking there somewhere.
I think a properly designed site should insist on its host name, i.e. "I
see you think I'm gobbledygook.bleh, but I'm going to redirect you to
http://www.modpython.org/ because that is my true name". This is very
common with sites that respond to both www.site.com and site.com, but
insist on only one of those by redirecting.
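
A minimal sketch of "insisting on the host name", assuming the canonical name is known from configuration. The function canonical_redirect is hypothetical, it handles only plain http, and it ignores IPv6 literals in the Host header.

```python
def canonical_redirect(host_header, uri, canonical_host="www.modpython.org"):
    """Return the URL to redirect to, or None if the client already used
    the canonical host name. `canonical_host` is a configuration assumption."""
    # Drop any ":port" suffix and compare case-insensitively.
    requested = (host_header or "").split(":")[0].lower()
    if requested == canonical_host:
        return None
    return "http://%s%s" % (canonical_host, uri)


print(canonical_redirect("gobbledygook.bleh", "/"))
print(canonical_redirect("www.modpython.org", "/index.html"))
```

A handler could pass a non-None result to util.redirect and otherwise carry on serving the request.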
Grisha
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
|
Gregory (Grisha) Trubetskoy writes:
> I think a properly designed site should insist on its host name, i.e. "I
> see you think I'm gobbledygook.bleh, but I'm going to redirect you to
> http://www.modpython.org/ because that is my true name". This is very
> common with sites that respond to both www.site.com and site.com, but
> insist on only one of those by redirecting.
As I said in my previous email to the list, I *think* if you use
virtual hosts and your "real" sites are NOT the first real host, then
you are forcing clients to speak HTTP/1.1, thus forcing the Host
header to be sent. If you then put in your first, default
virtualhost:
RedirectPermanent / http://realserver/
then you protect yourself from "gobbledygook.bleh" because that will
be sent to the default virtualhost which will redirect.
Right? If so, a bit convoluted and not accessible to the novice.
Perhaps a "Tips & Tricks" chapter to the manual?
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
|
> As for the larger issue at hand: the reason req.parsed_uri is not
> filled in is because browsers don't send the info in the GET...
Disclaimer: What follows is not an exhaustive, conclusive search by
tracing running code, but rather searching source code and watching
apache behaviour with tools like curl, telneting to the apache port
and using a browser.
Onward...
As mentioned already, req.parsed_uri is a tuple wrapping of a
request_rec.parsed_uri which is an apr_uri_t.
The contents of this struct are touched in many places, but the
primary functions setting this structure are in
srclib/apr-util/uri/apr_uri.c: apr_uri_parse() and
apr_uri_parse_hostinfo(). Doing a search within apache to see where
these functions are called I discovered a number of modules making use
of these functions, but probably not of concern to this issue. The
primary caller is ap_parse_uri() in server/protocol.c.
ap_parse_uri() is called numerous times in server/request.c to deal
with sub-requests; it is also called in modules/http/http_request.c
for internal redirects. The main calling stack which is of concern to
this issue is:
Function Called                        Function defined in
---------------------------------------------------------------
ap_process_http_connection()           [modules/http/http_core.c]
 => ap_read_request()                  [server/protocol.c]
   => read_request_line()              [server/protocol.c]
     => ap_parse_uri()                 [server/protocol.c]
       => apr_uri_parse()              [srclib/apr-util/uri/apr_uri.c]
ap_parse_uri is called with a request_rec and the uri (as a string);
the string is what read_request_line delivers; this is whatever is
specified with GET during the protocol exchange with the client. If
the uri is "full" then the whole struct is properly filled in (BTW,
the apr_uri_t is zero'd out with memset in apr_uri_parse).
Observations
============
I wrote a handler to return, as text/plain, the setting of various req
members of interest to this discussion. I set up apache to run on a
non-default port and required basic auth to access the page so the
full uri will be parsed (theoretically).
When I type the following into my browser (firefox):
http://foo:bar@localhost:8000/~dpop...ed?a=b&c=d#here
Here's the output:
req.hostname: localhost
req.unparsed_uri: /~dpopowich/py/parsed?a=b&c=d
req.parsed_uri: (None, None, None, None, None, 8000, '/~dpopowich/py/parsed', 'a=b&c=d', None)
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
It appears only "/PATH?QUERY" has been passed to the server and I
confirmed this by sniffing the packets. It's interesting that the
port is set and hostname is not...I think this has to do with some
code in the virtual host handling.
Here's the output with a verbose call with curl (same uri as above):
* About to connect() to localhost port 8000
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8000
* Server auth using Basic with user 'foo'
> GET /~dpopowich/py/parsed?a=b&c=d#here HTTP/1.1
> Authorization: Basic Zm9vOmJhcg==
> User-Agent: curl/7.15.0 (i486-pc-linux-gnu) libcurl/7.15.0 OpenSSL/0.9.8a zlib/1.2.3 libidn/0.5.18
> Host: localhost:8000
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 30 Nov 2005 15:43:19 GMT
< Server: Apache/2.0.54 (Debian GNU/Linux) mod_python/3.2.5b Python/2.3.5 mod_perl/2.0.1 Perl/v5.8.7
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/plain
req.hostname: localhost
req.unparsed_uri: /~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: (None, None, None, None, None, 8000, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
* Closing connection #0
Notice how "/PATH?QUERY#FRAGMENT" is passed with this client.
Now if I type the following into a telnet session (telnet localhost 8000):
GET http://foo:bar@localhost:8000/~dpop...ed?a=b&c=d#here HTTP/1.1
Authorization: Basic Zm9vOmJhcg==
Host: localhost:8000
Then the output is:
req.hostname: localhost
req.unparsed_uri: http://foo:bar@localhost:8000/~dpop...ed?a=b&c=d#here
req.parsed_uri: ('http', 'foo:bar@localhost:8000', 'foo', 'bar', 'localhost', 8000, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
Summary
=======
o req.hostname is set by the contents of the full URI, or in absence
of a full uri, the value of the Host header (this is what is
actually said in the mod_python docs). As mentioned before, in the
case when HTTP/1.1 AND the full URI are not specified, req.hostname
can be None.
o req.unparsed_uri is set to the uri specified with GET
o req.parsed_uri is the parsing of req.unparsed_uri (although the
port may appear even if it's not in req.unparsed_uri and if it's
not 80). There are definitely inconsistencies in how apache handles
this struct. A bug? Maybe not, but some cleanup of the code with
regards to this struct would be nice.
o req.uri is set to req.parsed_uri.path
o req.args is set to req.parsed_uri.query
o When a full URI is specified with GET, the values of hostname and
port can be bogus, i.e., the values in parsed_uri will be set to
whatever the uri specifies, but this may not be the host or port
the client actually connected to. While not explicitly a security
risk, poor programming based on these values could lead to one,
IMHO.
Therefore, I think we're stuck. There's no way we can guarantee
browsers will pass full URIs and none seem to do so. I agree with
Grisha:
o get interfaces to apache functions that return the actual
connection attributes.
Also:
o since you can't rely on any of the hostinfo specified with GET
being valid, apps should rely on hard-coded values in
configuration files to build full URIs. E.g., you know your app
is rooted at http://somehost:someport/, put it as a string in
configuration module that can be imported, then append to it with
your PATH&QUERY. Forcing redirects to the "proper" host in your
apache configurations is probably good practice as well.
o if you're using virtual hosts and your app is not running in the
default virtual host, then (I believe) you're forcing clients to
be speaking HTTP/1.1 in which case req.hostname is guaranteed to
be set, right? You might be able to build strings off of that,
but then your app is dependent on the vagaries of your apache
configuration and one miscalculated cut&paste, placing your
virtualhost first, may lead to weirdness.
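A minimal sketch of the hard-coded approach, assuming a hypothetical BASE_URL setting and a helper named full_url (neither is part of mod_python); the host and port shown are placeholders for whatever your app is really rooted at:

```python
from urllib.parse import urlencode

# Hypothetical value; in practice this would live in a configuration
# module that the application imports.
BASE_URL = "http://somehost:8000"


def full_url(path, query=None, base=BASE_URL):
    """Join the configured base with a path and an optional query mapping."""
    # Normalise the slash between base and path so either form works.
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if query:
        url += "?" + urlencode(query)
    return url
```

A redirect would then be given full_url('/foo/bar.py') rather than the bare path.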
OK...long enough...ttfn,
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> Now if I type the following into a telnet session (telnet localhost 8000):
>
> GET http://foo:bar@localhost:8000/~dpop...ed?a=b&c=d#here HTTP/1.1
> Authorization: Basic Zm9vOmJhcg==
> Host: localhost:8000
>
> Then the output is:
>
> req.hostname: localhost
> req.unparsed_uri: http://foo:bar@localhost:8000/~dpop...ed?a=b&c=d#here
> req.parsed_uri: ('http', 'foo:bar@localhost:8000', 'foo', 'bar', 'localhost', 8000, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
> req.uri: /~dpopowich/py/parsed
> req.args: a=b&c=d
Servers are required to respond to an absoluteURI, but when requesting a
resource from the origin server, most clients will use only the abs_path.
> o req.hostname is set by the contents of the full URI, or in absence
> of a full uri, the value of the Host header (this is what is
> actually said in the mod_python docs). As mentioned before, in the
> case when HTTP/1.1 AND the full URI are not specified, req.hostname
> can be None.
I don't see where you confirmed this by using an absoluteURI but a
different hostname in the Host: header.
> o When a full URI is specified with GET, the values of hostname and
> port can be bogus, i.e., the values in parsed_uri will be set to
> whatever the uri specifies, but this may not be the host or port
> the client actually connected to. While not explicitly a security
> risk, poor programming based on these values could lead to one,
> IMHO.
>
> Therefore, I think we're stuck. There's no way we can guarantee
> browsers will pass full URIs and none seem to do so.
They do if they are set up to use a proxy server. This is currently the
most common (if not only) use case for sending an absoluteURI. This
suggests that parsed_uri is behaving as expected, and developers should
recognize that it will contain mostly null values under common use,
unless the request comes from a proxy.
IOW, the unparsed URI is just another client-supplied string, like
Referer, and should be treated as an untrustworthy source that may
occasionally contain interesting information.
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> As I said in my previous email to the list, I *think* if you use
> virtual hosts and your "real" sites are NOT the first real host, then
> you are forcing clients to speak HTTP/1.1, thus forcing the Host
> header to be sent.
This is technically false. A better way to put it might be: When using
name-based virtual hosts, a Host: header must be sent. If it is not, the
default server will respond to the request, regardless of the hostname
used in the URI by the client, or, optionally, in the request string
itself (as an absoluteURI via a proxy server, for example).
Also, the Host: header is not limited to HTTP/1.1.
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Jim Gallacher wrote:
> Gregory (Grisha) Trubetskoy wrote:
>
> Either way, it should be fixed.
I think maybe it's not really broken.
> In case anyone is not familiar with the issue, a request for
> http://example.com/tests/mptest?view=form currently gives a tuple that
> looks something like this:
That's not true. That's what you might see in your client browser, but
(usually) it only asks for /tests/mptest?view=form, regardless of the
name it used to find the server. It may use the Host: header to
negotiate the right virtual host, but the Host: header is not part of
the string that parsed_uri is actually parsing.
> (None, None, None, None, None, None, '/tests/mptest', 'view=form', None)
>
> which is not what we expect. This is what the mod_python docs have to say:
>
> parsed_uri
> Tuple. The URI broken down into pieces. (scheme, hostinfo, user,
> password, hostname, port, path, query, fragment). The apache module
> defines a set of URI_* constants that should be used to access elements
> of this tuple. Example:
>
> fname = req.parsed_uri[apache.URI_PATH]
>
> (Read-Only)
This is all correct. I think the problem is that developers are hoping
to use parsed_uri in a use case for which it is inappropriate. Those
values are populated *if present* in the supplied request URI, but the
only *minimal* requirement would be a "/" for the path.
If you want to know what resource the client *really* requested (and
inquiring minds do), you *must not* attempt to rewrite or repopulate this.
| |
| Gregory (Grisha) Trubetskoy 2005-11-30, 5:48 pm |
|
This is cool stuff, thanks!
I'm guessing that perhaps req.parsed_uri makes a lot more sense when
Apache is used as a proxy, in which case what follows GET is the full URL.
Perhaps we can add something to the docs that says "this attribute gets
its data from the argument to the HTTP GET method, which is usually just
the path in the URL and does not include the protocol, hostname and port.
It is only filled in completely when the server is used as a proxy"..?
(the wording could use improvement)
Grisha
On Wed, 30 Nov 2005, Daniel J. Popowich wrote:
>
>
> Disclaimer: What follows is not an exhaustive, conclusive search by
> tracing running code, but rather searching source code and watching
> apache behaviour with tools like curl, telneting to the apache port
> and using a browser.
>
> Onward...
>
[snip]
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Gregory (Grisha) Trubetskoy wrote:
> Perhaps we can add something to the docs that says "this attribute gets
> its data from the argument to the HTTP GET method, which is usually just
> the path in the URL and does not include the protocol, hostname and
> port. It is only filled in completely when the server is used as a
> proxy"..?
How about : "This attribute gets its data from the client-supplied
Request-URI."
This isn't limited to GET. For more info, see:
http://www.w3.org/Protocols/rfc2616...5.html#sec5.1.2
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Jorey Bump wrote:
> Jim Gallacher wrote:
>
>
>
> I think maybe it's not really broken.
As I review Daniel's summary and the apache code I've come to this
conclusion as well.
>
>
> That's not true.
Well, it is true for some definitions of request. 
> That's what you might see in your client browser, but
> (usually) it only asks for /tests/mptest?view=form, regardless of the
> name it used to find the server. It may use the Host: header to
> negotiate the right virtual host, but the Host: header is not part of
> the string that parsed_uri is actually parsing.
>
>
>
> This is all correct. I think the problem is that developers are hoping
> to use parsed_uri in a use case for which it is inappropriate. Those
> values are populated *if present* in the supplied request URI, but the
> only *minimal* requirement would be a "/" for the path.
I guess the confusion comes in part from misunderstanding the difference
between a URI, absoluteURI, relativeURI, and a URL. I think when most
developers (and by most I mean me ;) ) see the description of
parsed_uri they get excited by all the goodies contained therein, only
to find it's mostly empty promises. Sorta like the great toy pictured on
the outside of the cereal box. You just *have* to dig to the bottom of
the box to get that wonderful treasure, only to discover it's some cheap
piece of plastic about the size of a pencil eraser. Oh, the
disappointment! Can any of us really trust the cereal industry ever
again? (No issues here... nope... none at all... mumble... mumble...
mumble)
Now where was I... oh, right, mod_python...
At this point I think we should leave parsed_uri alone, as it seems to
do the correct thing - if not the desired thing. At a minimum we should
expand the documentation to warn people of the limitations. I still
think it would be useful to have a tuple similar to parsed_uri, but
which is fully populated. Or maybe everyone just continues to roll their
own.
Regards,
Jim
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
| Jorey Bump writes:
>
> I don't see where you confirmed this by using an absoluteURI but a
> different hostname in the Host: header.
What, you won't take my word for it? :-)
In each of the following I connect to localhost on port 8000 using
telnet. In none of the requests do I actually specify localhost or
port 8000 in the full URI or Host header to demonstrate the
arbitrariness of the client-specified values. Also, I turned off the
basic auth to simplify the examples.
---------------------------------
Full URI != Host Header, HTTP/1.1
---------------------------------
Request:
GET http://crap:666/~dpopowich/py/parsed?a=b&c=d#here HTTP/1.1
Host: foo:999
Response:
req.hostname: crap
req.unparsed_uri: http://crap:666/~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: ('http', 'crap:666', None, None, 'crap', 666, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
----------------------------------
Partial URI, Host Header, HTTP/1.1
----------------------------------
Request:
GET /~dpopowich/py/parsed?a=b&c=d#here HTTP/1.1
Host: foo:999
Response:
req.hostname: foo
req.unparsed_uri: /~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: (None, None, None, None, None, 999, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
----------------------------------
Full URI != Host Header; HTTP/1.0
----------------------------------
Request:
GET http://crap:666/~dpopowich/py/parsed?a=b&c=d#here HTTP/1.0
Host: foo:999
Response:
req.hostname: crap
req.unparsed_uri: http://crap:666/~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: ('http', 'crap:666', None, None, 'crap', 666, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
----------------------------------
Partial URI, Host Header; HTTP/1.0
----------------------------------
Request:
GET /~dpopowich/py/parsed?a=b&c=d#here HTTP/1.0
Host: foo:999
Response:
req.hostname: foo
req.unparsed_uri: /~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: (None, None, None, None, None, 999, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
-------------------------------------
Partial URI, no Host Header; HTTP/1.0
-------------------------------------
Request:
GET /~dpopowich/py/parsed?a=b&c=d#here HTTP/1.0
Response:
req.hostname: None
req.unparsed_uri: /~dpopowich/py/parsed?a=b&c=d#here
req.parsed_uri: (None, None, None, None, None, None, '/~dpopowich/py/parsed', 'a=b&c=d', 'here')
req.uri: /~dpopowich/py/parsed
req.args: a=b&c=d
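The five results above suggest a simple precedence for req.hostname. A rough sketch that mimics the observed behaviour in plain Python (this is not Apache's code, and observed_hostname is a made-up name):

```python
from urllib.parse import urlsplit


def observed_hostname(request_uri, host_header=None):
    """Mimic the hostname precedence seen in the telnet tests above."""
    # 1. A full (absolute) Request-URI wins, whatever the Host header says.
    from_uri = urlsplit(request_uri).hostname
    if from_uri:
        return from_uri
    # 2. Otherwise fall back to the Host header, minus any port.
    if host_header:
        return urlsplit("//" + host_header).hostname
    # 3. Neither is present: there is nothing to report.
    return None
```

Either way the value comes from the client, which is the point of the examples.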
> IOW, the unparsed URI is just another client-supplied string, like
> Referer, and should be treated as an untrustworthy source that may
> occasionally contain interesting information.
Well put. And for me, and I think others, this is what was not known
and is now realized. I think a note should be added to the doc for
parsed_uri in section 4.5.3.2 of the manual to avoid this confusion
and let developers know that seeing so many Nones is not a bug.
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Jim Gallacher wrote:
> I still
> think it would be useful to have a tuple similar to parsed_uri, but
> which is fully populated.
Not sure if it's possible:
Developer: STOP! He who would use the Killer App must answer me these
questions three, 'ere my precious data he see: What is your request?
mod_python: /grail
Developer: What kind of grail do ye seek?
mod_python: type=holy
Developer: For what host do ye seek it?
mod_python: What do you mean, the VirtualHost in which the interpreter
is running or the host requested by the client?
Developer: AAAAAAAAAHHHHHHHHHHH!!!!! (developer tossed into the Code of
Eternal Perl)
> Or maybe everyone just continues to roll their
> own.
Probably best, since context will affect which values we want to
magically appear in such a tuple.
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
| Jorey Bump writes:
> Gregory (Grisha) Trubetskoy wrote:
>
>
> How about : "This attribute gets its data from the client-supplied
> Request-URI."
>
I'd prefer something more explicit (because I'm dense and need 2x4s
about the head). I humbly offer the following to the editorial board:
unparsed_uri
String. The URI without any parsing performed. This is the
argument passed to, e.g., the HTTP GET method, and so is
completely dependent on the value submitted by the client; you
have been warned. Clients typically send a partial uri containing
only the path and query with no hostinfo, e.g.:
"GET /path/to/handler?query=value HTTP/1.1". (Read-Only)
parsed_uri
Tuple. The value of unparsed_uri broken down into pieces. (scheme,
hostinfo, user, password, hostname, port, path, query,
fragment). The apache module defines a set of URI_* constants that
should be used to access elements of this tuple. Example:
fname = req.parsed_uri[apache.URI_PATH]
Please note: as stated for unparsed_uri, the value is completely
dependent on the uri submitted by the client. Since it is typical
for clients to only submit the path and query components the rest
of the elements in the tuple will often be None. This is not a
bug. (Read-Only)
args
String. Same as parsed_uri[apache.URI_QUERY] (and CGI
QUERY_STRING). (Read-Only)
uri
String. The path portion of the URI. Same as
parsed_uri[apache.URI_PATH]. (Read-Only)
hostname
String. Host, as set by a full URI from, e.g., the HTTP GET
method, or in absence of a full URI, the value of the Host header.
In either case, the value is provided by the client; you have been
warned. Note: when set by the Host header (which is typical) this
value will differ from parsed_uri[apache.URI_HOSTNAME] (which will
be None). See unparsed_uri and parsed_uri. Also, in rare cases
(no full URI, no Host header) this value can be None. (Read-Only)
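To see why the tuple is mostly None for typical requests, here is a rough stand-alone imitation using Python's urllib.parse. The URI_* names below are local stand-ins indexed in the documented field order, not the real apache constants, and split_request_uri is a hypothetical helper. It does not reproduce the quirk seen earlier where the port can be filled in from the Host header.

```python
from urllib.parse import urlsplit

# Local stand-ins for the apache.URI_* constants, in the order the docs list
# the fields. The real values come from the mod_python apache module.
(URI_SCHEME, URI_HOSTINFO, URI_USER, URI_PASSWORD, URI_HOSTNAME,
 URI_PORT, URI_PATH, URI_QUERY, URI_FRAGMENT) = range(9)


def split_request_uri(unparsed_uri):
    """Return a nine-field tuple shaped like parsed_uri; None for absent parts."""
    p = urlsplit(unparsed_uri)
    return (p.scheme or None, p.netloc or None, p.username, p.password,
            p.hostname, p.port, p.path or None, p.query or None,
            p.fragment or None)


# Typical browser request: only the path and query are present.
typical = split_request_uri("/path/to/handler?query=value")
fname = typical[URI_PATH]
```

Only an absolute Request-URI, such as one sent through a proxy, fills in the first six fields.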
| |
| Nicolas Lehuen 2005-11-30, 5:48 pm |
| 2005/11/30, Jim Gallacher <jpg@jgassociates.ca>:
>
>
> At this point I think we should leave parsed_uri alone, as it seems to
> do the correct thing - if not the desired thing. At a minimum we should
> expand the documentation to warn people of the limitations. I still
> think it would be useful to have a tuple similar to parsed_uri, but
> which is fully populated. Or maybe everyone just continues to roll their
> own.
>
> Regards,
> Jim
>
>
Well, I'm still interested in writing a function that would return a fully
populated parsed_uri-like structure, even in the absence of an absolute URL
in the request path. Call me stubborn, but I still feel that using a
configuration item to tell mod_python the supposed protocol, server name and
listening port is a blatant breach of the DRY principle. Plus, it would
force me to change the configuration file between my development, test and
production platforms... Yeah, I'm that lazy.
So, based on Daniel's excellent posts (thanks, Daniel), here is what we have so
far:
1) Protocol : http:// or https:// ?
For now the best way to get this is to call req.add_common_vars() and test
whether req.subprocess_env.get('HTTPS') == 'on'. Using req.is_https() which
was proposed in the other thread "Calling APR optional functions provided by
modules" may be more elegant, but right now we don't have this method.
2) Server name
Thanks to Daniel's excellent posts, I can see that req.hostname is the best
way to get the server name so far. Unfortunately, it depends on data sent by
the client, but hey, so does the rest of the request handling ;).
One thing that would be nice is to let Apache sort out this mess and tell us
which virtual host name it chose to serve the request. This is my
Holy Grail and I shall seek it from now on.
Ah, while I'm at it, knowing the DocumentRoot of the current VirtualHost
would be great, too. But that's another story.
3) Port number
port = req.connection.local_addr[1]
'nuf said.
4) URL or URI or whatever you choose to name the part of the resource once
the physical matters of protocol, server and port are sorted out
uri = req.uri
Note that this uri can in turn be split into something which is lost by the
publisher and the req.path_info field; that is, IIRC, we can
assert(req.uri.endswith(req.path_info)). I don't know what req.path_info is
before the publisher kicks in, though.
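Putting the four pieces above together, a sketch might look like this. deduce_url is a hypothetical name, the fake request object exists only so the snippet runs outside Apache, and the guard for a missing hostname is an assumption about what a caller would want:

```python
from types import SimpleNamespace


def deduce_url(req):
    """Assemble scheme://host[:port]/path from the four pieces listed above."""
    req.add_common_vars()  # populates subprocess_env, including HTTPS
    if req.subprocess_env.get("HTTPS") == "on":
        scheme, default_port = "https", 443
    else:
        scheme, default_port = "http", 80
    host = req.hostname  # client-supplied, and None in rare cases
    if host is None:
        raise ValueError("no hostname available for this request")
    port = req.connection.local_addr[1]
    # Leave the port out when it is the default for the scheme.
    netloc = host if port == default_port else "%s:%d" % (host, port)
    return "%s://%s%s" % (scheme, netloc, req.uri)


# A fake request object, purely for illustration outside Apache.
fake_req = SimpleNamespace(
    add_common_vars=lambda: None,
    subprocess_env={"HTTPS": "on"},
    hostname="whatever",
    connection=SimpleNamespace(local_addr=("127.0.0.1", 8443)),
    uri="/foo/bar.py",
)
url = deduce_url(fake_req)
```

The result is only as trustworthy as req.hostname, which is the caveat from point 2.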
Anyway, the length of this thread shows that a bit of clarification is
required. A page named something like "What's in an URL ?" and explaining
the client-side and server-side views of the various components of a URL
would be great. I'll try to write a draft this week-end.
Regards,
Nicolas
| |
| Nicolas Lehuen 2005-11-30, 5:48 pm |
| Oops, from your definition it looks like this holds:
req.unparsed_uri = req.uri + req.path_info
So we'd better use unparsed_uri to reconstitute the original absolute URL.
Before the publisher computes path_info it must be empty, so in this case
req.unparsed_uri == req.uri. I'll check this.
Regards,
Nicolas
2005/11/30, Daniel J. Popowich <dpopowich@comcast.net>:
>
> Jorey Bump writes:
>
> I'd prefer something more explicit (because I'm dense and need 2x4s
> about the head). I humbly offer the following to the editorial board:
>
> unparsed_uri
> String. The URI without any parsing performed. This is the
> argument passed to, e.g., the HTTP GET method, and so is
> completely dependent on the value submitted by the client; you
> have been warned. Clients typically send a partial uri containing
> only the path and query with no hostinfo, e.g.:
> "GET /path/to/handler?query=value HTTP/1.1". (Read-Only)
>
> parsed_uri
> Tuple. The value of unparsed_uri broken down into pieces. (scheme,
> hostinfo, user, password, hostname, port, path, query,
> fragment). The apache module defines a set of URI_* constants that
> should be used to access elements of this tuple. Example:
>
> fname = req.parsed_uri[apache.URI_PATH]
>
> Please note: as stated for unparsed_uri, the value is completely
> dependent on the uri submitted by the client. Since it is typical
> for clients to only submit the path and query components the rest
> of the elements in the tuple will often be None. This is not a
> bug. (Read-Only)
>
> args
> String. Same as parsed_uri[apache.URI_QUERY] (and CGI
> QUERY_STRING). (Read-Only)
>
> uri
> String. The path portion of the URI. Same as
> parsed_uri[apache.URI_PATH]. (Read-Only)
>
> hostname
> String. Host, as set by a full URI from, e.g., the HTTP GET
> method, or in absence of a full URI, the value of the Host header.
> In either case, the value is provided by the client; you have been
> warned. Note: when set by the Host header (which is typical) this
> value will differ from parsed_uri[apache.URI_HOSTNAME] (which will
> be None). See unparsed_uri and parsed_uri. Also, in rare cases
> (no full URI, no Host header) this value can be None. (Read-Only)
>
>
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> Jorey Bump writes:
>
>
>
> I'd prefer something more explicit (because I'm dense and need 2x4s
> about the head). I humbly offer the following to the editorial board:
>
> unparsed_uri
> String. The URI without any parsing performed. This is the
> argument passed to, e.g., the HTTP GET method, and so is
> completely dependent on the value submitted by the client; you
> have been warned. Clients typically send a partial uri containing
> only the path and query with no hostinfo, e.g.:
> "GET /path/to/handler?query=value HTTP/1.1". (Read-Only)
>
> parsed_uri
> Tuple. The value of unparsed_uri broken down into pieces. (scheme,
> hostinfo, user, password, hostname, port, path, query,
> fragment). The apache module defines a set of URI_* constants that
> should be used to access elements of this tuple. Example:
>
> fname = req.parsed_uri[apache.URI_PATH]
>
> Please note: as stated for unparsed_uri, the value is completely
> dependent on the uri submitted by the client. Since it is typical
> for clients to only submit the path and query components the rest
> of the elements in the tuple will often be None. This is not a
> bug. (Read-Only)
>
> args
> String. Same as parsed_uri[apache.URI_QUERY] (and CGI
> QUERY_STRING). (Read-Only)
>
> uri
> String. The path portion of the URI. Same as
> parsed_uri[apache.URI_PATH]. (Read-Only)
>
> hostname
> String. Host, as set by a full URI from, e.g., the HTTP GET
> method, or in absence of a full URI, the value of the Host header.
> In either case, the value is provided by the client; you have been
> warned. Note: when set by the Host header (which is typical) this
> value will differ from parsed_uri[apache.URI_HOSTNAME] (which will
> be None). See unparsed_uri and parsed_uri. Also, in rare cases
> (no full URI, no Host header) this value can be None. (Read-Only)
>
Everything you've stated above is true except when it's not. 
Using an internal_redirect messes with some of these attributes but not
others. Those that change get their new values from the new_uri used in
the redirect. Unchanged values are from the initial request.
req.internal_redirect(new_uri)
the_request unchanged
unparsed_uri new
parsed_uri new
args new
uri unchanged
hostname unchanged
Regards,
Jim
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Jim Gallacher wrote:
> Using an internal_redirect messes with some of these attributes but not
> others. Those that change get their new values from the new_uri used in
> the redirect. Unchanged values are from the initial request.
>
> req.internal_redirect(new_uri)
>
> the_request unchanged
> unparsed_uri new
> parsed_uri new
> args new
> uri unchanged
^^^^^^^^^
uri new (oops - wee typo there)
> hostname unchanged
>
> Regards,
> Jim
>
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
| Jim Gallacher writes:
> Jim Gallacher wrote:
>
> ^^^^^^^^^
> uri new (oops - wee typo there)
>
Jim,
Really, I don't mean to be obtuse, but I'm not grokking your point.
Are you clarifying that for internal redirects the *source* for these
attributes is different than what I was saying in my documentation or
is there something more subtle than that?
Are you suggesting a change or is this academic?
Slow, but sure,
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Nicolas Lehuen wrote:
>
>
> 2005/11/30, Jim Gallacher <jpg@jgassociates.ca
> <mailto:jpg@jgassociates.ca>>:
>
>
> At this point I think we should leave parsed_uri alone, as it seems to
> do the correct thing - if not the desired thing. At a minimum we should
> expand the documentation to warn people of the limitations. I still
> think it would be useful to have a tuple similar to parsed_uri, but
> which is fully populated. Or maybe everyone just continues to roll their
> own.
>
> Regards,
> Jim
>
>
> Well, I'm still interested in writing a function that would return a
> fully populated parsed_uri-like structure, even in the absence of an
> absolute URL in the request path. Call me stubborn, but I still feel
> that using a configuration item to tell mod_python the supposed
> protocol, server name and listening port is a blatant breach of the DRY
> principle. Plus, it would force me to change the configuration file
> between my development, test and production platform...Yeah, I'm that
> lazy .
That's not lazy, it's smart. Test and production should be as similar as
is possible.
> Ah, while I'm at it, knowing the DocumentRoot of the current VirtualHost
> would be great, too. But that's another story.
I don't know that story. Is there a problem with req.document_root()?
> 4) URL or URI or whatever you choose to name the part of the resource
> once the physical matters of protocol, server and port are sorted out
>
> uri = req.uri
>
> Note that this uri can in turn be split into something which is lost by
> the publisher and the req.path_info field, that is IIRC that we can
> assert(req.uri.endswith(req.path_info)). I don't know what req.path_info
> is before the publisher kicks in, though.
I'm not sure I understand what is being lost since publisher does not
modify req.uri. Something that I've found useful but which seems to be
missing is the idea of a base_uri, where
uri = base_uri + path_info
Or maybe the base_uri part is what you mean when you say something is lost?
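If uri = base_uri + path_info is taken as the definition, base_uri can be recovered by stripping the suffix. A small sketch (split_base_uri is a hypothetical helper, not an existing mod_python function):

```python
def split_base_uri(uri, path_info):
    """Return (base_uri, path_info) such that base_uri + path_info == uri."""
    path_info = path_info or ""  # treat a missing path_info as empty
    if path_info and not uri.endswith(path_info):
        raise ValueError("path_info is not a suffix of uri")
    # Drop exactly len(path_info) characters from the end of uri.
    base_uri = uri[:len(uri) - len(path_info)]
    return base_uri, path_info
```

With an empty path_info the whole uri comes back as base_uri.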
> Anyway, the length of this thread shows that a bit of clarification is
> required. A page named something like "What's in an URL ?" and
> explaining the client-side and server-side view of the various
> components of a URL would be great. I'll try to write a draft this
> week-end.
Excellent.
The corollary of this discussion is putting the parsed_uri back together
again. Is there any support for exposing apr_uri_unparse()?
Jim
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> Jorey Bump writes:
>
>
>
> I'd prefer something more explicit (because I'm dense and need 2x4s
> about the head). I humbly offer the following to the editorial board:
>
> unparsed_uri
> String. The URI without any parsing performed. This is the
> argument passed to, e.g., the HTTP GET method, and so is
> completely dependent on the value submitted by the client; you
> have been warned. Clients typically send a partial uri containing
> only the path and query with no hostinfo, e.g.:
> "GET /path/to/handler?query=value HTTP/1.1". (Read-Only)
>
> parsed_uri
> Tuple. The value of unparsed_uri broken down into pieces. (scheme,
> hostinfo, user, password, hostname, port, path, query,
> fragment). The apache module defines a set of URI_* constants that
> should be used to access elements of this tuple. Example:
>
> fname = req.parsed_uri[apache.URI_PATH]
>
> Please note: as stated for unparsed_uri, the value is completely
> dependent on the uri submitted by the client. Since it is typical
> for clients to only submit the path and query components the rest
> of the elements in the tuple will often be None. This is not a
> bug. (Read-Only)
>
> args
> String. Same as parsed_uri[apache.URI_QUERY] (and CGI
> QUERY_STRING). (Read-Only)
>
> uri
> String. The path portion of the URI. Same as
> parsed_uri[apache.URI_PATH]. (Read-Only)
>
> hostname
> String. Host, as set by a full URI from, e.g., the HTTP GET
> method, or in absence of a full URI, the value of the Host header.
> In either case, the value is provided by the client; you have been
> warned. Note: when set by the Host header (which is typical) this
> value will differ from parsed_uri[apache.URI_HOSTNAME] (which will
> be None). See unparsed_uri and parsed_uri. Also, in rare cases
> (no full URI, no Host header) this value can be None. (Read-Only)
+1 on your definitions, but I have another issue, related to this thread...
This discussion leads me to believe that req.hostname, in its current
implementation, is hopelessly ambiguous. It is already doing what we've
concluded in this thread to be a Bad Thing(TM) by automagically
interposing two completely unrelated values simply to avoid returning None.
Can anyone conceive of a use case where it would be alright to rely on
this value, even when it's been arbitrarily populated by a
client-supplied absoluteURI (via a proxy, for example)? What would a
developer expect to be contained in this value? For myself, I would
prefer it to be a high-level interface to req.headers_in['Host'], in
which case, None would be somewhat meaningful.
Even better, deprecate req.hostname in 3.2, where we can add req.host to
contain the value in req.headers_in['Host']. Then drop req.hostname in
3.3 completely. This will give developers some time to adapt.
Finally, I'm getting the impression that most developers are looking for
a portable way to get the ServerName, as defined in the Apache
configuration. This may currently be achieved in a variety of ways,
including:
servername = req.server.server_hostname
or:
req.add_common_vars()
servername = req.subprocess_env['SERVER_NAME']
So, getting back to Nicolas' original post, and reaffirming Grisha's
point that req.hostname isn't appropriate in his script, maybe
req.server.server_hostname will work, in that it allows one to construct
an URL that gets the user back to the site, even if it doesn't exactly
match the URL displayed in the browser during the original request.
Does the fact that this is a difficult discovery warrant the addition of
another high-level attribute, req.servername?
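For illustration, the choice between the configured name and the client-supplied one could be wrapped like this. site_host and its prefer_configured flag are hypothetical, and the fake request object only stands in for a real req:

```python
from types import SimpleNamespace


def site_host(req, prefer_configured=True):
    """Pick a host name for building a URL that leads back to the site."""
    configured = req.server.server_hostname  # ServerName from the Apache config
    requested = req.hostname  # from the Request-URI or the Host header
    if prefer_configured:
        return configured or requested
    return requested or configured


# Stand-in request: the client asked for "foo", the vhost is www.example.com.
fake_req = SimpleNamespace(
    server=SimpleNamespace(server_hostname="www.example.com"),
    hostname="foo",
)
host = site_host(fake_req)
```

Which default is right depends on whether the URL should match the server's configuration or what the browser displayed.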
| |
| Gregory (Grisha) Trubetskoy 2005-11-30, 5:48 pm |
|
On Wed, 30 Nov 2005, Jorey Bump wrote:
>
> Can anyone conceive of a use case where it would be alright to rely on this
> value, even when it's been arbitrarily populated by a client-supplied
> absoluteURI (via a proxy, for example)? What would a developer expect to be
> contained in this value? For myself, I would prefer it to be a high-level
> interface to req.headers_in['Host'], in which case, None would be somewhat
> meaningful.
req.hostname is the value of hostname in httpd's req_rec structure.
> Even better, deprecate req.hostname
well... this is the wrong list for this - req.hostname just reflects the
value of req_rec->hostname, you'd have to suggest the deprecation to
dev@httpd.apache.org 
> in 3.2, where we can add req.host to contain the value in
> req.headers_in['Host'].
The only "value" I see in this is saving 14 keystrokes.
> Finally, I'm getting the impression that most developers are looking for a
> portable way to get the ServerName
Keep in mind that ServerName doesn't always exist, but
req.server.server_hostname is the right place to get it.
> req.add_common_vars()
> servername = req.subprocess_env['SERVER_NAME']
That's a waste of CPU cycles, since add_common_vars() copies it from
req.server.server_hostname (most likely, haven't checked for sure)
> So, getting back to Nicolas' original post, and reaffirming Grisha's
> point that req.hostname isn't appropriate in his script, maybe
> req.server.server_hostname will work, in that it allows one to construct
> an URL that gets the user back to the site, even if it doesn't exactly
> match the URL displayed in the browser during the original request.
Good point... I won't comment on this since I believe that URL-deduction
is the wrong approach to begin with ;-)
> Does the fact that this is a difficult discovery warrant the addition of
> another high-level attribute, req.servername?
Why introduce redundancy if there already is req.server.server_hostname?
Grisha
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Nicolas Lehuen wrote:
> 2) Server name
>
> Thanks to Daniel's excellent posts, I can see that req.hostname is the
> best way to get the server name so far. Unfortunately, it depends on
> data sent by the client, but hey, so does the rest of the request
> handling ;).
req.server.server_hostname is unambiguous, and I can't think of a
real-world situation in which it would be None. It matches the
ServerName in your Apache configuration, so won't this be a reliable
value to use when reconstructing an URL?
> One thing that would be nice is to let Apache sort out this mess and
> tell us which virtual host name it chose to serve the request.
> This is my Holy Grail and I shall seek it from now on.
Ride to Camelot with req.server.server_hostname.
> Ah, while I'm at it, knowing the DocumentRoot of the current VirtualHost
> would be great, too. But that's another story.
req.add_common_vars()
document_root = req.subprocess_env['DOCUMENT_ROOT']
> 4) URL or URI or whatever you choose to name the part of the resource
> once the physical matters of protocol, server and port are sorted out
>
> uri = req.uri
>
> Note that this uri can in turn be split into something which is lost by
> the publisher and the req.path_info field, that is IIRC that we can
> assert(req.uri.endswith(req.path_info)). I don't know what req.path_info
> is before the publisher kicks in, though.
Perhaps req.uri is an unfortunate name, since it doesn't represent the
entire Request-URI.
> Anyway, the length of this thread shows that a bit of clarification is
> required. A page named something like "What's in an URL ?" and
> explaining the client-side and server-side view of the various
> components of a URL would be great. I'll try to write a draft this
> week-end.
The hard part is figuring out where the HTTP spec stops and apache
starts, then where apache stops and mod_python starts, in relation to
the terminology. A worthy goal would be to try to correct
inconsistencies between apache & mod_python, at least, so that both
projects use the same vocabulary (they do in many places, already).
| |
| Gregory (Grisha) Trubetskoy 2005-11-30, 5:48 pm |
|
On Wed, 30 Nov 2005, Jorey Bump wrote:
> Nicolas Lehuen wrote:
>
>
> req.server.server_hostname is unambiguous, and I can't think of a real-world
> situation in which it would be None. It matches the ServerName in your Apache
> configuration, so won't this be a reliable value to use when reconstructing
> an URL?
To add just a tad more confusion here - don't forget ServerAlias, which is
also a perfectly valid value for the "deduced" URL. ;-)
Grisha
| |
| Jorey Bump 2005-11-30, 5:48 pm |
| Gregory (Grisha) Trubetskoy wrote:
>
> On Wed, 30 Nov 2005, Jorey Bump wrote:
>
> That's a waste of CPU cycles, since add_common_vars() copies it from
> req.server.server_hostname (most likely, haven't checked for sure)
It may be wasteful for fetching a single environment value, but
add_common_vars() gathers all sorts of disparate information into one
interface, subprocess_env, for those with a CGI bent.
| |
| Nicolas Lehuen 2005-11-30, 5:48 pm |
| 2005/11/30, Jim Gallacher <jpg@jgassociates.ca>:
[snip]
> Nicolas Lehuen wrote:
>
> I don't know that story. Is there a problem with req.document_root()?
Well, I think I'm doing a bad thing, and I have to stop doing it. I'm using
mod_vhost_alias, which is a way to implement mass virtual hosting. It's kind
of neat, since you get one document root per virtual host, all document
roots are subdirs of a common parent directory, without the hassle of using
mod_rewrite. However, it seems a bit unfinished on the edges since
req.document_root() returns the common parent directory instead of the true,
per-virtual host document root.
Also, I don't know if using mod_rewrite to implement mass virtual hosting
can change the document root accordingly. So the only way to know my
document root is to compute it from the common parent directory and the
virtual host name, and bam, we're back on our track of "how do I get the
current virtual host name ?".
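A minimal sketch of the computation described above, assuming a hypothetical mass-hosting layout where each virtual host's document root is simply the common parent directory plus the host name. The function name, the validation rule and the directory layout are illustrative assumptions, not something mod_vhost_alias or mod_python provides:

```python
import posixpath
import re

# Assumption: a host name is made of letters, digits, dots and hyphens only.
_HOSTNAME_RE = re.compile(r'^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$')


def vhost_document_root(parent_dir, hostname):
    """Return <parent_dir>/<host>, where host is hostname without any port."""
    # Drop an optional ":port" suffix and normalise case.
    host = hostname.split(':', 1)[0].lower()
    # The host name comes from the client, so refuse anything that could
    # escape the parent directory.
    if not _HOSTNAME_RE.match(host) or '..' in host:
        raise ValueError('suspicious host name: %r' % hostname)
    return posixpath.join(parent_dir, host)
```

In a handler this would presumably be fed with req.document_root() and req.hostname, which is exactly why the question of a trustworthy virtual host name matters.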
> 4) URL or URI or whatever you choose to name the part of the resource
>
> I'm not sure I understand what is being lost since publisher does not
> modify req.uri. Something that I've found useful but which seems to be
> missing is the idea of a base_uri, where
>
> uri = base_uri + path_info
>
> Or maybe the base_uri part is what you mean when you say something is
> lost?
Using the enclosed file, which is both a test handler and a page that can be
published, I got those results :
1) Using test.py as a handler
URI
---
req.unparsed_uri: '/test.handler/subpath#toto'
req.parsed_uri: (None, None, None, None, None, None,
'/test.handler/subpath', None, 'toto')
req.uri: '/test.handler/subpath'
req.path_info: '/subpath'
req.subprocess_env.get("SCRIPT_NAME"): '/test.handler'
req.subprocess_env.get("PATH_INFO"): '/subpath'
req.subprocess_env.get("SCRIPT_URL"): None
req.subprocess_env.get("SCRIPT_URI"): None
2) Using the publisher handler to publish test.py
URI
---
req.unparsed_uri: '/test.py/subpath#toto'
req.parsed_uri: (None, None, None, None, None, None, '/test.py/subpath',
None, 'toto')
req.uri: '/test.py/subpath'
req.path_info: '/subpath'
req.subprocess_env.get("SCRIPT_NAME"): '/test.py'
req.subprocess_env.get("PATH_INFO"): '/subpath'
req.subprocess_env.get("SCRIPT_URL"): None
req.subprocess_env.get("SCRIPT_URI"): None
I must confess I'm completely at a loss here...
a) Handlers and published modules seem to behave the same way, so the
computation of path_info must come from above, i.e. either from mod_python
or from Apache.
b) We've got req.uri == req.subprocess_env.get("SCRIPT_NAME") +
req.subprocess_env.get("PATH_INFO"). Cool, but who does the split ? I'm
guessing that it's Apache who does it thanks to the AddHandler directives ;
it knows that the .py extension must be served by mod_python, hence it
deduces that /test.py must be the script name and /subpath some path info to
provide the script with.
c) We don't have a req.base_uri (to follow Jim's naming suggestion) or
req.script_name that would be equivalent to req.subprocess_env.get
("SCRIPT_NAME"), but we have a req.path_info... Why is this missing ?
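The relation in b) and the missing value in c) can be sketched in plain Python. This assumes req.uri really does end with req.path_info, as in the two test runs above; split_uri and base_uri are hypothetical names following Jim's suggestion, not existing mod_python attributes:

```python
def split_uri(uri, path_info):
    """Split uri into (base_uri, path_info) so that uri == base_uri + path_info."""
    if path_info and not uri.endswith(path_info):
        raise ValueError('%r does not end with %r' % (uri, path_info))
    # Everything before the trailing path_info is the "base_uri".
    base_uri = uri[:len(uri) - len(path_info)]
    return base_uri, path_info
```

With the values printed above, split_uri('/test.py/subpath', '/subpath') gives ('/test.py', '/subpath'), which matches the SCRIPT_NAME and PATH_INFO pair.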
I'm beginning to think that all this feels highly un-pythonic. There is
often more than one way to get the same data (the host name is a good example).
You get to use req.foobar or req.subprocess_env['FOOBAR'] or
req.server.foobar (and feel happy if there is only one FOOBAR which gives
you the data you need). subprocess_env is a very ugly name which doesn't
seem to be related to mod_python at all (I'm using the multi-threaded MPM
and I don't have subprocesses). For some data, there is no way to get it
(where is the current virtual host name, as determined by Apache ?).
One thing I'll try to do is to write a kind of Rosetta Stone with all the
data you can find in a URL, how to get it from the request/connection/server
object, how to get it from subprocess_env (i.e. how you would get it in a
CGI), and what is missing or duplicated.
This way we'll be able to decide if we should deprecate some paths to those
data and remove them in a later release (3.3 or 3.4). The end result would
be a series of statements like "If you want to get the virtual host name,
then use XYZ, don't use ABC which is deprecated. Be aware though that it's
not 100% efficient, blah blah".
Regards,
Nicolas
> Anyway, the length of this thread shows that a bit of clarification is
>
> Excellent.
>
> The corollary of this discussion is putting the parsed_uri back together
> again. Is there any support for exposing apr_uri_unparse()?
>
> Jim
>
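Putting a parsed_uri tuple back together can be sketched in pure Python. This is not apr_uri_unparse() and does not reproduce its flags; it only assumes the nine-field layout visible in the dumps in this thread (scheme, hostinfo, user, password, hostname, port, path, query, fragment), and unparse_uri is a hypothetical name:

```python
def unparse_uri(parsed_uri):
    """Rebuild a URI string from a 9-field tuple like req.parsed_uri."""
    (scheme, hostinfo, user, password,
     hostname, port, path, query, fragment) = parsed_uri
    parts = []
    if scheme:
        parts.append(scheme + ':')
    # Prefer hostinfo as given; otherwise rebuild it from hostname and port.
    # user and password are deliberately ignored in this fallback.
    if hostinfo is None and hostname:
        hostinfo = hostname
        if port:
            hostinfo += ':%d' % port
    if hostinfo:
        parts.append('//' + hostinfo)
    if path:
        parts.append(path)
    if query is not None:
        parts.append('?' + query)
    if fragment is not None:
        parts.append('#' + fragment)
    return ''.join(parts)
```

Applied to the tuple from the handler test above, (None, None, None, None, None, None, '/test.handler/subpath', None, 'toto'), it yields '/test.handler/subpath#toto', i.e. the same string as req.unparsed_uri.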
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> Jim Gallacher writes:
>
>
>
> Jim,
>
> Really, I don't mean to be obtuse, but I'm not grokking your point.
> Are you clarifying that for internal redirects the *source* for these
> attributes is different than what I was saying in my documentation
Exactly. The documentation needs to account for the different behaviour
after an internal_redirect. I'll let the code talk for me as I can't
come up with adequate text. Assume you are using publisher with the
following code.
Client request:
http://www.example.com/mod_python/p...py?blah=de-blah
mptest.py
---------
def index(req):
    req.content_type = 'text/plain'
    req.write('-----\nindex\n-----\n')
    stuff(req)
    # I'm sorry about the look of the next statement -
    # thunderbird line wrapping issues
    req.internal_redirect("http://ABCDE.example.org:666"
                          "/mod_python/parsed_uri/mptest/redirect"
                          "?inquisition=spanish#foo")

def redirect(req):
    req.write('\n\n--------\nredirect\n--------\n')
    stuff(req)

def stuff(req):
    req.write('req.the_request: %s\n' % req.the_request)
    req.write('req.uri: %s\n' % req.uri)
    req.write('req.unparsed_uri: %s\n' % str(req.unparsed_uri))
    req.write('req.parsed_uri: %s\n' % str(req.parsed_uri))
    req.write('req.hostname: %s\n' % req.hostname)
    req.write('req.args: %s\n' % req.args)
Output:
=======
-----
index
-----
req.the_request: GET /mod_python/parsed_uri/mptest?blah=de-blah HTTP/1.1
req.uri: /mod_python/parsed_uri/mptest.py
req.unparsed_uri: /mod_python/parsed_uri/mptest
req.parsed_uri: (None, None, None, None, None, None,
'/mod_python/parsed_uri/mptest', 'blah=de-blah', None)
req.hostname: www.example.org
req.args: blah=de-blah
--------
redirect
--------
req.the_request: GET /mod_python/parsed_uri/mptest HTTP/1.1
req.uri: /mod_python/parsed_uri/mptest.py/redirect
req.unparsed_uri:
http://ABCDE.example.org:666/mod_py...ion=spanish#foo
req.parsed_uri: ('http', 'ABCDE.example.org:666', None, None,
'ABCDE.example.org', 666, '/mod_python/parsed_uri/mptest/redirect',
'inquisition=spanish', 'foo')
req.hostname: www.example.org
req.args: inquisition=spanish
Jim
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
|
>
> I'm not sure I understand what is being lost since publisher does not
> modify req.uri. Something that I've found useful but which seems to be
> missing is the idea of a base_uri, where
>
> uri = base_uri + path_info
>
> Or maybe the base_uri part is what you mean when you say something is lost?
>
THIS is what has driven me batty...if someone could write the concise
description of the relationship between req.uri and req.path_info with
no ifs-ands-or-buts, exclusionary clauses and definitely no footnotes,
I can stop my drinking habit. :-) Having base_uri would be heaven.
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Nicolas Lehuen 2005-11-30, 5:48 pm |
| 2005/11/30, Gregory (Grisha) Trubetskoy <grisha@apache.org>:
>
>
> On Wed, 30 Nov 2005, Jorey Bump wrote:
>
> To add just a tad more confusion here - don't forget ServerAlias, which is
> also a perfectly valid value for the "deduced" URL. ;-)
>
> Grisha
>
Exactly ! I was beginning to feel cornered into abandoning my mass virtual
hosting setup, but it's clear that whenever anyone uses ServerAlias, he'll
have the same problem, namely knowing what the request was in the first
place. So I'll stick with req.hostname for now.
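For reference, the URL assembly that this choice feeds into can be sketched as a pure function, following the same logic as the current_url() helper from the start of the thread. build_url and its parameters are hypothetical names; in a handler the arguments would presumably be req.hostname, req.connection.local_addr[1], req.uri and req.subprocess_env.get('HTTPS') == 'on' (after req.add_common_vars()):

```python
def build_url(hostname, port, uri, https=False):
    """Assemble scheme://host[:port]/path, omitting the scheme's default port."""
    if https:
        scheme, default_port = 'https', 443
    else:
        scheme, default_port = 'http', 80
    netloc = hostname
    if port != default_port:
        # Only spell out the port when it is not the default for the scheme.
        netloc = '%s:%d' % (hostname, port)
    return '%s://%s%s' % (scheme, netloc, uri)
```

Keeping the host name as a parameter means the same function works whichever source of the name ends up being preferred.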
Regards,
Nicolas
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Jorey Bump wrote:
> Even better, deprecate req.hostname in 3.2, where we can add req.host to
> contain the value in req.headers_in['Host']. Then drop req.hostname in
> 3.3 completely. This will give developers some time to adapt.
It's too late to be deprecating anything in 3.2. I know it seems like
"3.2" is a codeword for some far-off future release, but we really are
close to 3.2 *final*. There has been no negative feedback on 3.2.5b, so
unless something bad happens we hope to have an official release in
mid-December. Of 2005. 
Jim
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
|
Nicolas Lehuen writes:
> a) Handlers and published modules seem to behave the same way, so the
> computation of path_info must come from above, i.e. either from mod_python
> or from Apache.
>
> b) We've got req.uri == req.subprocess_env.get("SCRIPT_NAME") +
> req.subprocess_env.get("PATH_INFO"). Cool, but who does the split ? I'm
> guessing that it's Apache who does it thanks to the AddHandler directives ;
> it knows that the .py extension must be served by mod_python, hence it
> deduces that /test.py must be the script name and /subpath some path info to
> provide the script with.
This is how I conceptualize what apache does: apache takes the PATH
component from the uri and appends it to DocumentRoot, it then
searches down this path, starting with the first path component after
DocumentRoot, testing to see if it exists on the filesystem. If it
exists it continues to the next path component. If it doesn't exist
and the previous component was a file, then PATH up to, but not
including the current component is the "script" and this component to
the end of PATH is the path_info. If the previous component was a
directory, however, then the script goes up to and including the
current component, the path_info starting with the next component
ending with the end of the PATH. Confused? I was and still am, but
that was what I discovered.
DocumentRoot/dir1/dir2/dir3/file/some/thing/else
With the above, "/dir1/dir2/dir3/file" would be the "base_uri" and
"/some/thing/else" would be the path_info...but...
DocumentRoot/dir1/dir2/dir3/some/thing/else
will have "/dir1/dir2/dir3/some" as the script (yes, even though no
such thing exists on the filesystem) and "/thing/else" becomes the
path_info.
Odd behaviour, no? It just about killed me discovering it, but it's
what allows some handlers to work with clean urls (no file
extensions), so I'm not complaining. :-)
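The walk described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as conceptualized in this post, not Apache's actual code; the filesystem is faked with a dict (fs), and the function name is hypothetical:

```python
def split_script_and_path_info(path, fs):
    """Model of the described walk.

    fs maps paths that exist under DocumentRoot to 'dir' or 'file'.
    Returns (script, path_info).
    """
    # Empty components (duplicate slashes) are dropped for simplicity.
    parts = [p for p in path.split('/') if p]
    prev_kind = 'dir'   # DocumentRoot itself is a directory
    current = ''
    cut = len(parts)    # default: every component exists, no path_info
    for i, part in enumerate(parts):
        current += '/' + part
        kind = fs.get(current)
        if kind is None:
            # First missing component: excluded from the script if the
            # previous one was a file, included if it was a directory.
            cut = i if prev_kind == 'file' else i + 1
            break
        prev_kind = kind
    script = '/' + '/'.join(parts[:cut])
    rest = parts[cut:]
    path_info = '/' + '/'.join(rest) if rest else ''
    return script, path_info


# The two layouts from the examples above.
FS = {
    '/dir1': 'dir',
    '/dir1/dir2': 'dir',
    '/dir1/dir2/dir3': 'dir',
    '/dir1/dir2/dir3/file': 'file',
}
```

Calling it with '/dir1/dir2/dir3/file/some/thing/else' and then with '/dir1/dir2/dir3/some/thing/else' against FS reproduces the two splits given above.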
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Daniel J. Popowich 2005-11-30, 5:48 pm |
|
Jim Gallacher writes:
> Daniel J. Popowich wrote:
>
> Exactly. The documentation needs to account for the different behaviour
> after an internal_redirect. I'll let the code talk for me as I can't
> come up with adequate text. Assume you are using publisher with the
> following code.
>
> Client request:
> http://www.example.com/mod_python/p...py?blah=de-blah
>
> mptest.py
> ---------
>
> def index(req):
>     req.content_type = 'text/plain'
>     req.write('-----\nindex\n-----\n')
>     stuff(req)
>     # I'm sorry about the look of the next statement -
>     # thunderbird line wrapping issues
>     req.internal_redirect("http://ABCDE.example.org:666"
>                           "/mod_python/parsed_uri/mptest/redirect"
>                           "?inquisition=spanish#foo")
>
> def redirect(req):
>     req.write('\n\n--------\nredirect\n--------\n')
>     stuff(req)
>
> def stuff(req):
>     req.write('req.the_request: %s\n' % req.the_request)
>     req.write('req.uri: %s\n' % req.uri)
>     req.write('req.unparsed_uri: %s\n' % str(req.unparsed_uri))
>     req.write('req.parsed_uri: %s\n' % str(req.parsed_uri))
>     req.write('req.hostname: %s\n' % req.hostname)
>     req.write('req.args: %s\n' % req.args)
>
>
> Output:
> =======
>
> -----
> index
> -----
> req.the_request: GET /mod_python/parsed_uri/mptest?blah=de-blah HTTP/1.1
> req.uri: /mod_python/parsed_uri/mptest.py
> req.unparsed_uri: /mod_python/parsed_uri/mptest
> req.parsed_uri: (None, None, None, None, None, None,
> '/mod_python/parsed_uri/mptest', 'blah=de-blah', None)
> req.hostname: www.example.org
>
> req.args: blah=de-blah
>
> --------
> redirect
> --------
> req.the_request: GET /mod_python/parsed_uri/mptest HTTP/1.1
> req.uri: /mod_python/parsed_uri/mptest.py/redirect
> req.unparsed_uri:
> http://ABCDE.example.org:666/mod_py...ion=spanish#foo
> req.parsed_uri: ('http', 'ABCDE.example.org:666', None, None,
> 'ABCDE.example.org', 666, '/mod_python/parsed_uri/mptest/redirect',
> 'inquisition=spanish', 'foo')
> req.hostname: www.example.org
> req.args: inquisition=spanish
>
HOLY COW! Is it me or does this seem completely arbitrary? The value
of the_request is wrong, too (doesn't include the query). I have no
idea how to write this up concisely without driving people to perl.
Can we ignore it and maybe it will go away? ;-)
Daniel Popowich
---------------
http://home.comcast.net/~d.popowich/mpservlets/
| |
| Jim Gallacher 2005-11-30, 5:48 pm |
| Daniel J. Popowich wrote:
> Jim Gallacher writes:
>
>
>
> HOLY COW! Is it me or does this seem completely arbitrary?
Ambiguous at the very least. At least we can always blame apache. ;)
> The value
> of the_request is wrong, too (doesn't include the query).
Bit of an operator error there. The ol' copy and paste bit me on the
bum. Probably a good thing that I'm not writing any nuclear reactor
control code today.
I ran the test a couple of times and the results posted above were *not*
from the given request. It should look more like this:
Client request:
http://www.example.com/mod_python/p...py?blah=de-blah
-----
index
-----
req.the_request: GET /mod_python/parsed_uri/mptest.py?blah=de-blah HTTP/1.1
--------
redirect
--------
req.the_request: GET /mod_python/parsed_uri/mptest.py?blah=de-blah HTTP/1.1
Jim
| |
| Graham Dumpleton 2005-12-01, 7:46 am |
| Hmmm, go away for two days and a mail storm erupts. :-(
I may never be able to catch up and digest this mail thread, but I'll
try and add a few comments of my own.
On 01/12/2005, at 8:41 AM, Nicolas Lehuen wrote:
> c) We don't have a req.base_uri (to follow Jim's naming suggestion)
> or req.script_name that would be equivalent to
> req.subprocess_env.get("SCRIPT_NAME"), but we have a
> req.path_info... Why is this missing ?
Note that SCRIPT_NAME as obtainable from Apache appears to be broken.
See:
http://issues.apache.org/jira/browse/MODPYTHON-68
Also note that with respect to req.uri and req.path_info, be aware
that req.path_info is normalised and req.uri is not. I.e., the latter
may have duplicated instances of '/' in it whereas that cannot occur
in req.path_info. This makes it error prone to take the len() of
req.path_info to work out how much to drop off req.uri to obtain the
base uri or script name. You need to normalise req.uri first, but
then the result will be missing the duplicate instances of '/'
unless you use some more elaborate algorithm to work it out.
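One possible shape for such an algorithm, as a sketch only: match path_info against the tail of the uri while letting each '/' in path_info stand for one or more '/' in the uri. It assumes that repeated slashes are the only difference between the two values, and both function names are hypothetical:

```python
import re


def naive_base_uri(uri, path_info):
    """The len()-based approach that breaks on duplicated slashes."""
    return uri[:len(uri) - len(path_info)]


def base_uri(uri, path_info):
    """Strip path_info from the end of uri, tolerating '//' runs in uri."""
    # Each '/' of the normalised path_info may be one or more '/' in uri.
    segments = [re.escape(seg) for seg in path_info.split('/')]
    pattern = '/+'.join(segments) + r'\Z'
    match = re.search(pattern, uri)
    if match is None:
        raise ValueError('%r does not end with %r' % (uri, path_info))
    # Duplicated slashes before the match are left untouched.
    return uri[:match.start()]
```

For uri '/test.py/sub//path' and path_info '/sub/path', naive_base_uri() returns '/test.py/' while base_uri() returns '/test.py'; duplicated slashes inside the base part itself, as in '//a//test.py//sub', are preserved.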
Graham
| |
| Jim Gallacher 2005-12-01, 8:47 pm |
| Graham Dumpleton wrote:
> Hmmm, go away for two days and a mail storm erupts. :-(
And when you come back we go completely silent again. It's a conspiracy
I tell ya. ;)
Jim