Web Servers General Talk, February 2005
Encoding of document-name (url) in GET-requests
Bodo Kaelberer, 2005-01-26, 7:55 am
Hi
A client that requests a document from an HTTP server has to apply certain
encodings to the path and name of the document it is requesting.
E.g. a client that requests the document
"/test/test 1.html"
will typically send something like:
GET /test/test%201.html HTTP/1.1
Can someone tell me exactly which characters have to be encoded?
And can I encode too many characters, or does the server decode every
occurrence of %XX?
Thanks & bye
Bodo
--
Bodo Kaelberer
http://www.webkind.de/ http://www.kaelberer-aio.de/
Blessed are those who are poor in spirit, for they will
call themselves the Christlich Soziale Union.
Bodo Kaelberer, 2005-01-26, 7:55 am
Hi
I forgot something: I'm interested not only in the encoding of a path
but in encoding in general, especially of paths and of parameters passed to a script.
Bye
Michael Wojcik, 2005-02-02, 6:01 pm
In article <h6h0v0peo5q508akac3eqfb7bru0v70a4n@4ax.com>, Bodo Kaelberer <BodoKaelberer_un@webkind.de> writes:
>
> A client that requests a document from an HTTP server has to apply certain
> encodings to the path and name of the document it is requesting.
> E.g. a client that requests the document
> "/test/test 1.html"
> will typically send something like:
> GET /test/test%201.html HTTP/1.1
>
> Can someone tell me exactly which characters have to be encoded?
RFC 2616 is the HTTP/1.1 specification; it governs the conversion
from a full HTTP-scheme URL to the Request-URI in the HTTP request
message. I don't believe there's any change in the character-
encoding requirements in that conversion: the Request-URI will either
be a URL abs_path (for requests directly to the content server) or a
full URL (for requests to a proxy).
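To make the two Request-URI forms concrete, here is a minimal Python sketch. The function name `request_line` and the host `www.example.com` are hypothetical, and the input URL is assumed to be already %-encoded; the code only rearranges the URL with the standard `urllib.parse.urlsplit` and does not encode or decode anything. The fragment (after "#") is interpreted by the client and is not sent in either form.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool, method: str = "GET") -> str:
    """Build an HTTP/1.1 request line from an already-encoded http URL.

    Hypothetical helper: direct requests use the abs_path (plus any query),
    proxy requests use the full URL. The fragment is dropped in both cases.
    """
    parts = urlsplit(url)  # splits only; %XX escapes are left untouched
    if via_proxy:
        # Full URL for a proxy, minus the client-side fragment.
        target = parts._replace(fragment="").geturl()
    else:
        # abs_path for the origin server; an empty path becomes "/".
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"
```

For the URL `http://www.example.com/test/test%201.html#top`, the direct form is `GET /test/test%201.html HTTP/1.1` and the proxy form is `GET http://www.example.com/test/test%201.html HTTP/1.1`; the encoded characters are identical in both.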
RFC 1630 covers URIs as used in the World Wide Web, which includes HTTP.
HTTP URIs will generally be URLs with hierarchical paths. RFC 1630 states that all
characters not in the safe set must be encoded when a URI is in canonical
form. Specifically:
- Whitespace must be %-encoded.
- The percent sign, the hash sign, the question mark, the colon, and
the equals sign must be %-encoded when they're not being used for
their special purposes.
- The asterisk and the exclamation point may have to be %-encoded for
URLs for certain "WWW protocols". This requirement is ambiguous in
RFC 1630 (it's not clear whether Tim Berners-Lee, its author, means these characters are
always reserved because they have special significance for some
protocols, or they're reserved only for protocols where they have
such significance; and it conflicts with the BNF grammar later in the
RFC). It never hurts to %-encode them, though.
- The following have special meaning in hierarchical URLs (which
nearly all HTTP URLs are) and must be %-encoded if they do not refer
to the object hierarchy: the slash, and any path component consisting
of just "." or "..". For example, if you have an HTTP server running
on a filesystem that allows the slash character in a filename, and you
want to serve a file with a filename containing a slash (not a good
idea), you would have to %-encode that slash.
- Control characters and all 8-bit characters with a value greater than
0x7f (unsigned) must be %-encoded.
- The full set of characters that specifically do NOT need to be
encoded are:
- The letters and digits included in the ASCII character set.
- Any of the "safe" punctuation marks: $ | - | _ | @ | . | &
- Any of the "extra" punctuation marks, except asterisk and
exclamation point (see note above): " | ' | ( | ) | ,
- The plus sign.
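As a rough illustration of the rules listed above, the following Python sketch encodes a path one segment at a time. The helper names `encode_segment` and `encode_path` are hypothetical, the unencoded set is copied from the list above, and converting non-ASCII text to UTF-8 bytes is an assumption (RFC 1630 only says bytes above 0x7f must be encoded, not which character set produced them). RFC 3986 is stricter about some of these characters (the double quote, for instance, must be %-encoded there), so a cautious client could shrink the set.

```python
import string

# Characters the list above says never need encoding: ASCII letters and
# digits, the "safe" marks, the "extra" marks except "*" and "!", and "+".
UNENCODED = frozenset(
    string.ascii_letters + string.digits + "$-_@.&" + "\"'()," + "+"
)


def encode_segment(segment: str) -> str:
    """%-encode one path segment (the text between two slashes)."""
    # A segment that is literally "." or ".." but names a real object must
    # not be read as "current/parent directory", so encode its dots.
    if segment in (".", ".."):
        return "%2E" * len(segment)
    out = []
    # Assumption: non-ASCII text is turned into bytes as UTF-8 first.
    for byte in segment.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x80 and ch in UNENCODED:
            out.append(ch)
        else:
            # Everything else: space, "%", "#", "?", ":", "=", "*", "!",
            # "/", control characters and every byte >= 0x80.
            out.append("%{:02X}".format(byte))
    return "".join(out)


def encode_path(segments: list[str]) -> str:
    """Encode each segment separately, then join them with real slashes."""
    return "/" + "/".join(encode_segment(s) for s in segments)
```

For Bodo's example, `encode_path(["test", "test 1.html"])` returns `/test/test%201.html`. Encoding segments separately, rather than the whole path string, is what lets a literal slash inside a filename come out as `%2F` while the real separators stay as `/`.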
> And can I encode too many characters, or does the server decode every
> occurrence of %XX?
RFC 1630 specifically allows %-encoding of characters that do not
need to be encoded; you can encode every character in the URL that
doesn't have special meaning if you like.
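On the receiving side, a server that decodes every %XX treats an over-encoded URL the same as a minimally encoded one. Below is a minimal Python sketch using a hypothetical helper, `decode_request_path`, built on the standard `urllib.parse.unquote`. It splits the path on "/" before decoding, so an encoded slash stays inside its segment, consistent with the slash rule above; the query part is assumed to be handled separately.

```python
from urllib.parse import unquote


def decode_request_path(target: str) -> list[str]:
    """Turn a Request-URI abs_path into decoded path segments.

    Hypothetical helper: split on "/" first, then decode each segment, so
    "%2F" becomes a slash inside a name rather than a hierarchy separator.
    """
    path = target.split("?", 1)[0]  # ignore the query string here
    return [unquote(seg) for seg in path.lstrip("/").split("/")]
```

Both `/test/test%201.html` and the needlessly encoded `/%74%65%73%74/test%201.html` decode to `["test", "test 1.html"]`, while `/a%2Fb/c` decodes to `["a/b", "c"]`, a different object from `/a/b/c`.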
See RFC 1630 for more information. Also see RFC 3986 (which is
also STD 66) for the current (just last month) version of the URI
generic syntax.
RFCs are available at http://www.rfc-editor.org/.
--
Michael Wojcik michael.wojcik@microfocus.com
Let's say the conservative is the quiet green grin of the crocodile ...
an' the liberal is the SNAP! -- Walt Kelly