Home > Archive > Web Servers on Unix and Linux > February 2007 > Problem with RewriteRule when url contains percent character
Problem with RewriteRule when url contains percent character
Jon Maz 2007-02-04, 7:19 pm
Hi,
I'm having problems with a RewriteRule that's applied to URLs with the %
character in them; I hope someone can help. The % character is the result of
URL-encoding non-ASCII words, as in the example below:
1. the word "sécurité" comes out of my db
2. I construct the following link, using the PHP urlencode function:
<a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
3. the URL created should be interpreted by a RewriteRule:
RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
However, the RewriteRule doesn't match my URL, and I see this in the
RewriteLog:
init rewrite engine with requested uri /search/sécurité
So it seems like some kind of decoding is going on so that the RewriteRule
never even sees the % character. I have set everything I can think of
(MySQL SET NAMES, Apache AddDefaultCharset) to UTF-8.
Any ideas?
TIA,
JON
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message
news:eq5kcj$9q9$1@aioe.org...
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>
So PHP has encoded the URL in some ISO 8859 variant and Apache is decoding
it to some UTF variant... The next thing to wonder about is the charset your OS
uses to store the file name...
In general, just forget diacritical, language-specific, fancy characters and
just use 'securite' for the filename.
It keeps you out of dozens of cross-platform and cross-language traps and
eases migration of a website tenfold.
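If you follow that advice and want to derive the ASCII form from what comes out of the database, a minimal Python sketch could look like this. `ascii_slug` is a hypothetical helper name; the technique is Unicode NFKD decomposition followed by dropping non-ASCII code points. It is lossy: letters with no decomposition, such as 'ß' or 'ø', are simply removed rather than transliterated.

```python
import unicodedata


def ascii_slug(word: str) -> str:
    """Strip diacritics so that 'sécurité' becomes 'securite'.

    NFKD splits 'é' into a plain 'e' followed by a combining acute accent;
    encoding to ASCII with errors='ignore' then discards the combining mark
    (and any other non-ASCII code point).
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_slug("sécurité"))  # securite
```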
http://czyborra.com/charsets/iso8859.html 'The ISO 8859 Alphabet Soup'
HansH
Jon Maz 2007-02-04, 7:19 pm
Hi Hans,
Thanks for your answer. I guess I'm best off just avoiding the whole thing.
What got me wondering was the fact that my PHP application copes fine
when this encoded word is passed in the query string:
/pages/search.php?word=s%C3%A9curit%C3%A9
But perhaps it's simply that different rules apply to the URL path and to a
query string parameter?
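One way to picture the difference being asked about: mod_rewrite matches a RewriteRule pattern against the %-decoded URL path only, while the query string is passed along still encoded, and PHP decodes it later when it builds `$_GET`. The Python sketch below models just that split; `split_like_mod_rewrite` is a hypothetical helper, not Apache code, and `parse_qs` stands in for PHP's query-string decoding.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def split_like_mod_rewrite(request_uri: str) -> tuple[str, str]:
    """Hypothetical model: return (decoded path, raw query string)."""
    parts = urlsplit(request_uri)
    decoded_path = unquote(parts.path)  # what a RewriteRule pattern is matched against
    raw_query = parts.query             # left untouched, %XX sequences intact
    return decoded_path, raw_query


print(split_like_mod_rewrite("/search/s%C3%A9curit%C3%A9"))
# ('/search/sécurité', '')
path, query = split_like_mod_rewrite("/pages/search.php?word=s%C3%A9curit%C3%A9")
print(path)             # /pages/search.php
print(query)            # word=s%C3%A9curit%C3%A9
print(parse_qs(query))  # {'word': ['sécurité']}
```

So in the path case the regex sees the decoded word, while in the query-string case nothing is matched by RewriteRule at all and the decoding happens in the script.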
Thanks,
JON
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message
news:eq5kcj$9q9$1@aioe.org...
> Hi,
>
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
How do you get s%C3%A9curit%C3%A9 from sécurité?
sécurité, URL-encoded, is s%E9curit%E9.
s%C3%A9curit%C3%A9 decoded is sécurité, as is correctly reported in your rewrite log.
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
A hyphen in a character class specifies a range unless it's the first or last character in
the class. What range are you looking for with 9-+?
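How a hyphen directly after a completed range such as `0-9` is read depends on the regex engine. Python's `re` treats it as a literal hyphen, so there `[a-zA-Z0-9-+%]` means letters, digits, `-`, `+` and `%`; other engines may behave differently, which is why putting the hyphen last is the unambiguous spelling. The sketch below checks this in Python only and says nothing definitive about the engine Apache uses.

```python
import re

# Character class from the original RewriteRule.
original = re.compile(r"^[a-zA-Z0-9-+%]+$")
# Same intended characters, with the hyphen moved to the end so no engine
# can read it as part of a range.
unambiguous = re.compile(r"^[a-zA-Z0-9+%-]+$")

for sample in ["a-b", "a+b", "50%", "a,b"]:
    print(sample, bool(original.match(sample)), bool(unambiguous.match(sample)))
# a-b True True
# a+b True True
# 50% True True
# a,b False False
```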
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité
The rewrite rule works correctly; the URI contains à and ©, and the regex doesn't allow for
these.
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>
The URI is decoded before the server tries to resolve it; why would it not be?
Why are you trying to do the heavy lifting with mod_rewrite? Just pass the search term to
the script and validate it there; you should validate all user input in your scripts.
RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]
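To see why the original rule fails and the simpler one works, the sketch below decodes the request path and tries both patterns from this thread against it. It assumes per-directory (.htaccess) context, where mod_rewrite strips the leading slash before matching (which would explain why the patterns begin with `search/`), and it uses Python's `re` on a `str` as a stand-in for Apache's regex engine working on bytes.

```python
import re
from urllib.parse import unquote

original_rule = re.compile(r"^search/([a-zA-Z0-9-+%]+)$")
suggested_rule = re.compile(r"^search/(.+)$")

raw_path = "search/s%C3%A9curit%C3%A9"  # as written in the link, leading slash stripped
decoded_path = unquote(raw_path)         # 'search/sécurité': what the pattern actually sees

print(bool(original_rule.match(raw_path)))      # True: only ASCII letters, digits and %
print(bool(original_rule.match(decoded_path)))  # False: 'é' is not in the class
print(suggested_rule.match(decoded_path).group(1))  # sécurité
```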
Rich
| |
Nick Kew 2007-02-05, 1:16 pm
On Sun, 4 Feb 2007 21:49:08 -0000
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:
> So it seems like some kind of decoding is going on so that the
> RewriteRule never even sees the % character. I have set everything I
> can think of (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
No you haven't. The expression in your RewriteRule is firmly in
ASCII, so it fails to match the non-ASCII characters in the URL.
> Any ideas?
Don't faff about with mod_rewrite like that. Or if you
really must, fix your regexp. Or as someone else said,
stick to ASCII.
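If the matching must stay in mod_rewrite, one way to "fix the regexp" is to stop listing allowed characters and accept anything except a slash. The sketch below works on raw bytes, on the assumption that the regex is handed the decoded UTF-8 bytes of the path rather than characters; `[^/]+` is an illustrative choice of pattern, not one proposed in this thread. Validating the search term in search.php is still advisable either way.

```python
import re

path_bytes = "search/sécurité".encode("utf-8")  # b'search/s\xc3\xa9curit\xc3\xa9'

ascii_only = re.compile(rb"^search/([a-zA-Z0-9+%-]+)$")
any_but_slash = re.compile(rb"^search/([^/]+)$")

print(ascii_only.match(path_bytes))  # None: bytes 0xC3 and 0xA9 are outside the class
word = any_but_slash.match(path_bytes).group(1)
print(word.decode("utf-8"))          # sécurité
```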
--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Jon Maz 2007-02-06, 7:20 pm
Thanks to everybody for their help on this one!
Tim Roberts 2007-02-07, 7:22 am
"rh" <disposable12345@cableone.net> wrote:
>
>"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:
>
>How do you get s%C3%A9curit%C3%A9 from sécurité
>
>sécurité, url encoded, is s%E9curit%E9
Only in iso-8859-1. In UTF-8, the OP's encoding is correct.
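Both encodings of the word are easy to check. The Python sketch below uses `urllib.parse.quote`, which is not identical to PHP's `urlencode` in general (spaces and a few punctuation characters differ) but gives the same result for this word. PHP's `urlencode` works on bytes, so a UTF-8 string from the database yields the first form. Reading the UTF-8 bytes back as ISO-8859-1 produces "Ã©" pairs, which is plausibly where the © mentioned earlier in the thread comes from.

```python
from urllib.parse import quote, unquote

word = "sécurité"

# Percent-encode the UTF-8 bytes (Python's default encoding for quote).
print(quote(word))                      # s%C3%A9curit%C3%A9
# Percent-encode the ISO-8859-1 bytes instead.
print(quote(word, encoding="latin-1"))  # s%E9curit%E9
# Decode the UTF-8 form while wrongly assuming ISO-8859-1.
print(unquote("s%C3%A9curit%C3%A9", encoding="latin-1"))  # sÃ©curitÃ©
```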
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.