# Is it ok to change $ENV{'QUERY_STRING'} before "use CGI;" is called..?

Discussion in 'Perl Misc' started by Raymundo, Mar 4, 2007.

1. ### Raymundo (Guest)

Dear,

When a web browser sends a GET request, I can get "keywords" or
"param"eters using the CGI module, as you know:

    $q = new CGI;
    $name = $q->param('name');

However, when a browser's request includes multi-byte characters, they
can be encoded using UTF-8 or EUC-KR (in Korea, for example), according
to an option in the browser. ("Send URL in UTF-8" in IE,
"network.standard-url.encode-utf8" in FF, etc.)

At first, I tried to check the value that I got from $q->param(), like
this:

    $name = $q->param("name");
    $name = check_and_convert($name);
    ....

    sub check_and_convert {
        # this subroutine guesses the encoding of the parameter using Encode::Guess
        # if it is not UTF-8, it converts the parameter to a UTF-8 encoded string and returns it
    }

But there are so many parameters, and so much code that uses them. I
found that it is almost impossible, or at least very inconvenient, to
check every time a parameter is fetched.

Second, I tried to "check and convert" the $ENV{QUERY_STRING} value before
a CGI object is created:

    # convert QUERY_STRING to UTF-8 here
    $ENV{QUERY_STRING} = check_and_convert($ENV{QUERY_STRING});
    # then create CGI object
    $q = new CGI;
    # I can get name=XXX and XXX is encoded in UTF-8
    $name = $q->param("name");

In this case, I think, I don't need to check each parameter in any of
the code that follows. All the values are now UTF-8 encoded.

As far as I have tested, it looks successful. But I'm not sure that
such an approach is good and safe. (I think it's somewhat tricky to
change an environment variable in a script.)

Is there any other environment variable or anything else that I should
check before "new CGI;" is called? Can I be sure that I'll not lose
any information when I change QUERY_STRING?

Any advice would be appreciated. I'm sorry I'm not good at English.
Raymundo at South Korea.

Raymundo, Mar 4, 2007

2. ### -berlin.de (Guest)

Raymundo <> wrote in comp.lang.perl.misc:
> Dear,
>
> When a web-browser sends a GET request, I can get "keywords" or
> "param"eters using CGI module, as you know:
> $q = new CGI;
> $name = $q->param('name');
>
> However, when browser's request includes multi-byte characters, they
> can be encoded using UTF-8 or EUC-KR(in Korea, for example) according
> to the option in the browser. ("Send URL in UTF-8" in IE,
> "network.standard-url.encode-utf8" in FF, etc.)
>
> At first, I tried to check the value which I got from $q->param() like
> this:
>
> $name = $q->param("name");
> $name = check_and_convert($name);
> ...
>
> sub check_and_convert {
> # this subroutine guesses the encoding of parameter using Encode::Guess
> # if not UTF-8, it converts the parameter to UTF-8 encoded string and return it
> }
>
> But there are so many parameters and also so many codes using them. I
> found that it's almost impossible, or so inconvenient to check
> whenever the parameters are fetched.
>
> Second, I tried to "check and convert" $ENV{QUERY_STRING} value before
> a CGI object is created:
>
> # convert QUERY_STRING to UTF-8 here
> $ENV{QUERY_STRING} = check_and_convert($ENV{QUERY_STRING});
> # then create CGI object
> $q = new CGI;
> # I can get name=XXX and XXX is encoded in UTF-8
> $name = $q->param("name")
>
> In this case, I think, I don't need to check each parameter in any
> other following codes... All the values are now UTF-8 encoded.
>
> As far as I had tested, it looked successful. But I'm not sure that
> such approach is good(?) and safe. (I think it's somewhat tricky to
> change the environment variable in script..)
>
> Is there any other environment variable or anything else that I should
> check before "new CGI;" is called? Can I be sure that I'll not lose
> any information when I change QUERY_STRING?
>
> Any advices would be appreciated. I'm sorry I'm not good at English.
> Raymundo at South Korea.

You should avoid changing the environment like that. Use the interface
that CGI provides. The ->Vars method gives you a hash that contains
the parameter values keyed by their names. Convert it as follows
(untested):

    my $param = $q->Vars;
    $_ = check_and_convert( $_) for values %$param;

This supposes that check_and_convert() leaves null bytes alone. If
that isn't certain, use

    $_ = join( "\0", map check_and_convert($_), split /\0/, $_) for values %$param;

See perldoc CGI for the significance of null bytes in the values.
Either way you will convert all values in one go. Use the converted
hash instead of the ->param method for parameter access.

Anno

-berlin.de, Mar 4, 2007
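
The thread never shows the body of check_and_convert(), so the following is only a minimal sketch of one way it might work together with Anno's null-byte handling. It assumes that anything that is not valid UTF-8 is EUC-KR (an assumption, not something the thread confirms), it uses a strict Encode decode instead of Encode::Guess, and it uses a plain hash as a stand-in for what ->Vars returns.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the thread's check_and_convert():
# returns UTF-8 bytes, treating the input as EUC-KR only when it is
# not already valid UTF-8.
sub check_and_convert {
    my ($bytes) = @_;

    # Pure ASCII (or undef): nothing to convert.
    return $bytes unless defined $bytes && $bytes =~ /[\x80-\xFF]/;

    # Strict decode on a copy, because FB_CROAK may modify its argument.
    my $copy = $bytes;
    return $bytes if eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    # Assumption: everything else is EUC-KR.
    return encode('UTF-8', decode('euc-kr', $bytes));
}

# Simulated ->Vars hash: a multi-valued parameter is joined with "\0".
my %param = (
    page => "\xB8\xAE\xB4\xAA\xBD\xBA",                       # EUC-KR bytes
    tag  => "Linux\0\xEB\xA6\xAC\xEB\x88\x85\xEC\x8A\xA4",    # ASCII + UTF-8 bytes
);

# Convert each value piecewise so the "\0" separators are never touched.
for my $value (values %param) {
    $value = join "\0", map { check_and_convert($_) } split /\0/, $value, -1;
}
```

Splitting on "\0" before converting means the converter only ever sees one value at a time, so it does not need to know about CGI.pm's multi-value convention.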

3. ### Ben Morrow (Guest)

Quoth "Raymundo" <>:
> Dear,
>
> When a web-browser sends a GET request, I can get "keywords" or
> "param"eters using CGI module, as you know:
> $q = new CGI;
> $name = $q->param('name');
>
> However, when browser's request includes multi-byte characters, they
> can be encoded using UTF-8 or EUC-KR(in Korea, for example) according
> to the option in the browser. ("Send URL in UTF-8" in IE,
> "network.standard-url.encode-utf8" in FF, etc.)
>
> At first, I tried to check the value which I got from $q->param() like
> this:
>
> $name = $q->param("name");
> $name = check_and_convert($name);
> ...
>
> sub check_and_convert {
> # this subroutine guesses the encoding of parameter using Encode::Guess
> # if not UTF-8, it converts the parameter to UTF-8 encoded string and return it
> }

I would not recommend using Encode::Guess. It isn't safe.

For a detailed explanation of I18N form submission, see
http://xrl.us/u68e. Executive summary: serve forms as 'text/html;
charset=utf-8' and assume the results are in UTF-8. You should decode
*after* getting the values from CGI->param.

Ben

--
'Deserve [death]? I daresay he did. Many live that deserve death. And some die
that deserve life. Can you give it to them? Then do not be too eager to deal
out death in judgement. For even the very wise cannot see all ends.'

Ben Morrow, Mar 5, 2007
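
As a small illustration of "decode after getting the values", the sketch below decodes a byte string that is assumed to be UTF-8 into a Perl character string. The variable $raw is a stand-in for what $q->param('name') would return; no CGI object is created here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a raw value as $q->param('name') would return it:
# the UTF-8 bytes of a three-character Korean word.
my $raw = "\xEB\xA6\xAC\xEB\x88\x85\xEC\x8A\xA4";

# Decode the bytes into a character string after fetching the value.
my $name = decode('UTF-8', $raw);

printf "bytes: %d, characters: %d\n", length $raw, length $name;
```

This prints "bytes: 9, characters: 3": the raw value is a sequence of bytes, and only the decoded value is a string of characters.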
4. ### Raymundo (Guest)

Thank you, Anno and Ben.

Anno's suggestion:

    my $param = $q->Vars;
    $_ = check_and_convert($_) for values %$param;

works well with GET requests. But it causes a problem with POST requests
such as file uploads. I don't know why. I just guess it's because
check_and_convert affects the contents of the POST request. (If I comment
out the check_and_convert line, the script works well.)

I'm interested only in GET requests, because a POST request includes a
"charset=" field in its header and I can convert, if needed, the
encoding of the contents. So I'm planning to add an if clause:

    if ($q->request_method() eq "GET") {
        my $param = $q->Vars;
        $_ = check_and_convert($_) for values %$param;
    }

Ben, would you please tell me why Encode::Guess isn't safe? Does it
have a security problem?

Anyway,

> For a (detailed) explanation of details of I18N form submission, see
> http://xrl.us/u68e. Executive summary: serve forms as 'text/html;
> charset=utf-8' and assume the results are in UTF-8.

The script does so when it prints forms and receives POST data from
the forms, which seems to work well.

The problem is related to GET requests, that is, when the URL includes
multi-byte characters. W3C recommends that multi-byte chars in a URL
should be %-encoded.
(http://www.w3.org/TR/REC-html40/interact/forms.html#form-content-type)
But I still want to support the case where visitors type the URL by
hand (they would not like to type "%EC%90..") and the case where
another web page links to my page without using a %-encoded string.

....

Returning to my first post in this thread... Is it such a bad idea to
change the environment variable QUERY_STRING? It solves every problem
about this. It requires only one additional line of code. I think that
change may affect only the script and its child processes, and the
script doesn't fork any child process.

Raymundo at South Korea.

Raymundo, Mar 6, 2007

5. ### Ben Morrow (Guest)

Quoth "Raymundo" <>:
>
> Ben, would you please tell me why Encode::Guess isn't safe? Does it
> have a security problem?

Not security, per se; it's just that it's impossible to reliably
distinguish between (say) UTF-8 and ISO8859-1 that just happens to look
like UTF-8.

> > For a (detailed) explanation of details of I18N form submission, see
> > http://xrl.us/u68e. Executive summary: serve forms as 'text/html;
> > charset=utf-8' and assume the results are in UTF-8.

Also, if you read the page linked, you will see that many browsers do...
rather stupid things when the user enters text into a form that is not
representable in the encoding of the page. Since UTF-8 can represent
everything, it doesn't have that problem.
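
To make the ambiguity concrete, here is a minimal sketch showing the same two bytes decoding without error under both encodings. The byte values are just an illustrative example, not data from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA9";   # two bytes received from a client

my $as_utf8   = decode('UTF-8',      $bytes);   # one character:  U+00E9
my $as_latin1 = decode('ISO-8859-1', $bytes);   # two characters: U+00C3 U+00A9

printf "UTF-8: %d character, ISO-8859-1: %d characters\n",
    length $as_utf8, length $as_latin1;
```

Both calls succeed, so nothing in the bytes themselves says which reading the sender meant. A guesser can only pick the more plausible one.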
> The script does so when it prints forms and receives POST data from
> the forms, which seemed to be doing well.
>
> The problem is related to GET request, that is, when URL includes
> multi-bytes characters. W3C recommends that multi-bytes chars in URL
> should be %-encoded. (http://www.w3.org/TR/REC-html40/interact/
> forms.html#form-content-type) But I still want to support when
> visitors type URL using their fingers (they would not like to type "%EC
> %90.." and when other webpage gives a link to my page not using %-
> encoded string.

Well... a not-url-encoded URL is invalid. At least Firefox appears to
automatically translate (say) a URL typed into the address bar into its
correct URL-escaped form before submitting it to the server; I don't
know what IE or Konq/Safari or Opera do.

> Returning to my first post in this thread... Is it so bad idea to
> change the environment variable QUERY_STRING? It solves every problem
> about this. It requires only one additional line in code. I think that
> change may affect only the script and its child processes, and the
> script doesn't fork any child process.

If you're using CGI.pm to process QUERY_STRING, then you should stick to
that. Messing about is just asking for trouble. What is the problem with
decoding the submitted values afterwards? (It can still be one line or
so of code, if you do it right. See Anno's example.)

Ben

--
I must not fear. Fear is the mind-killer. I will face my fear and
I will let it pass through me. When the fear is gone there will be
nothing. Only I will remain.
Frank Herbert, 'Dune'

Ben Morrow, Mar 6, 2007

6. ### Raymundo (Guest)

Oops.. I wrote a reply. It took about 3 hours. (It's too difficult for
me to write in English.) I posted it an hour ago but I can't see it even
now. I'm afraid it's lost :'(

I'll rewrite my last reply...

At first, thank you Ben for your kind advice.

In fact, the Perl script that I'm modifying is not my own code. It is
UseModWiki (http://www.usemod.com/cgi-bin/wiki.pl) and I've been
modifying it to use it for my personal homepage. (But I'm just a
novice in Perl so it's not easy.)

In a wiki site, the URL of each page consists of the script URL and "the
title of that page", like ".../wiki.pl?Perl". I'm Korean and my wiki
has many pages whose names are in Korean.

> Well... a not-url-encoded URL is invalid. At least Firefox appears to
> automatically translate (say) a URL typed into the address bar into its
> correct URL-escaped form before submitting it to the server; I don't
> know what IE or Konq/Safari or Opera do.

As you said, multi-byte characters in a URL are invalid. I know it :'( So
a url-encoded URL is the answer. However, see the following URLs:

1: .../wiki.pl?Linux <- Everyone can tell it is the page about "Linux"

2: .../wiki.pl?%EB%A6%AC%EB%88%85%EC%8A%A4 <- Can anyone guess what
the title of this page is?? :-/ It's "Linux" in Korean

3: .../wiki.pl?리눅스 <- (If you can't see the Korean chars, please see
http://gypark.pe.kr/upload/linux_in_korean.gif ) Everyone who is able
to read Korean can tell it is the page about Linux. (I'll type
"LINUX(ko)" for this word from now on)

URL 2 is valid, but its appearance is so.... :-/ And I must give up
the big advantage of a wiki, "the URL represents the content".

URL 3 is said to be invalid. But I still want to support it. That is,
when someone types that URL in the address bar of a browser, or
someone clicks a link to URL 3 on another site, I want my wiki.pl
script to show the proper page, "LINUX(ko)".

Fortunately, web browsers like FF, IE, and Safari convert the URL into
%-encoded form before they submit it, as you said. Therefore, I think,
the main issue is not that the URL contains multi-byte chars, because
the server will receive a %-encoded request. The problem is that, as I
said in my first article, the %-encoded form of "LINUX(ko)" is not
unique. It can be "%EB%A6%AC%EB%88%85%EC%8A%A4" (UTF-8 sequence) or
"%B8%AE%B4%AA%BD%BA" (EUC-KR, in Korea). The browsers choose which
encoding to use according to an option in them. (For FF,
"network.standard-url.encode-utf8" in "about:config".) The server can't
choose it and can't even know explicitly what was chosen, which is the
reason that wiki.pl should "guess".

> > Returning to my first post in this thread... Is it so bad idea to
> > change the environment variable QUERY_STRING? It solves every problem
> > about this. It requires only one additional line in code. I think that
> > change may affect only the script and its child processes, and the
> > script doesn't fork any child process.
>
> If you're using CGI.pm to process QUERY_STRING, then you should stick to
> that. Messing about is just asking for trouble. What is the problem with
> decoding the submitted values afterwards? (It can still be one line or
> so of code, if you do it right. See Anno's example.)

"The problem with decoding the submitted values afterward" is...
(the following comes from my testing results. It may be fixable but I'm
not so expert in Perl)

1) There are hundreds of lines that call "->param()". I don't think
it's a good idea to insert so many "guess_and_convert()" calls after
those lines.

1-1) In fact, those lines actually call the "GetParam()" subroutine, and
GetParam() calls ->param in it. So inserting guess_and_convert() in
GetParam() could be a solution. However, GetParam() fetches the
value of a parameter not only from a GET request but also from a POST
request and even from saved files. For now, I'm not sure it's ok to
modify GetParam(). In addition, it seems inefficient to call the
convert routine every time a single parameter is fetched.

2) Concerning Anno's example, it looks good because it calls the convert
routine only once. However, it shows some problems while processing
POST requests, like file uploading, receiving trackbacks, etc. I tried
to debug but failed to find out why. I think the second-best way is to
apply that code with an additional if-clause: if ($q->request_method() eq "GET")

3) In the original code, there are some lines that access
$ENV{QUERY_STRING} directly, without calling CGI functions. I need to
apply "guess_and_convert" to those lines.

So I cling to Q_S like this. As far as I know: (please correct me
if I am wrong)

1) Q_S is related only to GET requests. (All the forms in wiki.pl call
"wiki.pl" without appending any URL query when they submit.)

2) Q_S may be in the form of "keywords" or
"param1=value1&param2=value2...". guess_and_convert() will not change
the important characters like "&", "=", "+". It will not change any
other ASCII characters. It will just change the multi-byte chars.
Because those characters have already been encoded by the browser, this
change is just a change in the number and the sequence of the "%HH"
runs. There is, I think, no problem when the CGI object is created and
initialized using Q_S.

3) Changing Q_S affects only the running script and its child
processes.

4) Since I began to test my approach, no problem has shown up so far.
(Of course, this can't be proof that it will never cause a problem.
So I asked for your advice on Usenet.)

5) Most of all, I expect that I won't need to care about it when the
rest of the code is updated. (At least until the browsers' behavior or
the CGI module changes dramatically.)

If anyone gives me concrete examples of a problem that may appear when
I convert the encoding of Q_S, I'll give up my way immediately...

Raymundo

Raymundo, Mar 6, 2007

7. ### Ben Morrow (Guest)

Quoth "Raymundo" <>:
> In fact, the Perl script that I'm modifying is not my own code. It is
> UseModWiki (http://www.usemod.com/cgi-bin/wiki.pl) and I've been
> modifying it to use it for my personal homepage. (But I'm just a
> novice in Perl so it's not easy

It would have been helpful if you'd mentioned this at the start.

> In wiki site, the URL of each page consists of script URL and "the
> title of that page", like ".../wiki.pl?Perl". I'm a Korean and my wiki
> has many pages whose names are in Korean.
>
> > Well... a not-url-encoded URL is invalid. At least Firefox appears to
> > automatically translate (say) a URL typed into the address bar into its
> > correct URL-escaped form before submitting it to the server; I don't
> > know what IE or Konq/Safari or Opera do.
>
> As you said, multi-byte characters in URL is invalid. I know it :'( So
> url-encoded URL is the answer. However, see the following URLs:
> 1: .../wiki.pl?Linux <- Everyone can know it is the page about "Linux"
> 2: .../wiki.pl?%EB%A6%AC%EB%88%85%EC%8A%A4 <- Can anyone guess what
> the title of this page is?? :-/ It's "Linux" in Korean

[ I've stripped the top-bit-set characters: my newsreader appears to
have mangled them ]

> 3: .../wiki.pl? <- (If you can't see the Korean chars, plz see
> http://gypark.pe.kr/upload/linux_in_korean.gif ) Everyone who are able
> to read Korean can know it is the page about Linux. (I'll type
> "LINUX(ko)" for this word from now on)
>
> URL 2 is valid, but its appearance is so.... :-/ And I must give up
> the big advantage of wiki, "URL represent the content"
>
> URL 3 is said to be invalid. But I still want to support it. That is,
> when someone types that URL in the address bar of a browser, or
> someone clicks the link to URL 3 in other site,

Is it common practice for people to write links to URLs with multibyte
chars in them? Since the actual link itself is not user-visible (the
text of the link is, but that's quite different) there's no reason not
to encode it correctly, is there? Of course, if it *is* common practice,
you may well want to handle it (if you can), regardless of its
incorrectness.

> I want my wiki.pl script show the proper page, "LINUX(ko)".

Firstly, let me say that I entirely sympathise with this desire. It
is a major failing in the design of URLs that they are so unfriendly to
people whose native language is not English.

That said, I do not think you can win here. At least my copy of FF
will convert .../wiki.pl?KOREAN_CHARS into %-encodings *in the address
bar* before it submits the URL. IE6 appears to do the opposite: that is,
AFAICT it both displays the URL as typed in the address bar and actually
submits a multi-byte URL to the server. Your Q_S munging will need to be
quite subtle, to handle cases like .../wiki.pl?foo%3bbar, and correctly
distinguish them from .../wiki.pl?foo;bar, which presumably means
something quite different.

> Fortunately, web browsers like FF, IE, and Safari convert the URL into
> %-encoded form before they submit it, as you said. Therefore, I think,
> it's not main issue that URL contains multi-bytes chars, because the
> server will receive %-encoded request. The problem is that, as I'd
> said in my first article, the %-encoded form of "LINUX(ko)" is not
> unique. It can be "%EB%A6%AC%EB%88%85%EC%8A%A4" (UTF-8 sequence) or
> "%B8%AE%B4%AA%BD%BA" (EUC-KR, in Korea) The browsers choose which
> encoding to use according to the option in them. (for FF,
> "network.standard-url.encode-utf8" in "about:config") Server can't
> choose it and even can't know what is chosen explicitly, which is the
> reason that wiki.pl should "guess".

OK, so you're in an impossible situation and you're trying to do the
best you can. Encode::Guess may be your best option here.

> > > Returning to my first post in this thread... Is it so bad idea to
> > > change the environment variable QUERY_STRING? It solves every problem
> > > about this. It requires only one additional line in code. I think that
> > > change may affect only the script and its child processes, and the
> > > script doesn't fork any child process.
> >
> > If you're using CGI.pm to process QUERY_STRING, then you should stick to
> > that. Messing about is just asking for trouble. What is the problem with
> > decoding the submitted values afterwards? (It can still be one line or
> > so of code, if you do it right. See Anno's example.)
>
> "The problem with decoding the submitted values afterward" is...
> (following are come from my testing results. it may be fixed but I'm
> not so expert in Perl)
>
> 1) There are hundreds of lines that call "->param()". I don't think
> it's good idea to insert so many "guess_and_convert()" after those
> lines.
>
> 1-1) In fact, those lines actually call "GetParam()" subroutine and
> GetParam() calls ->param in it. So it can be a solution to insert
> guess_and_convert() in GetParam(). However, GetParam() fetches the
> value of a parameter not only from GET request but also from POST
> request and even from saved files. For now, I'm not sure it's ok to
> modify GetParam(). In addition, it seems to be inefficient to call
> convert routine every time a single parameter is fetched.

I would say the Right Answer in this case is to write your own GetParam
sub which calls the original GetParam, and then applies your
Encode::Guess logic. If the script isn't changing the values of the
parameters, only accessing them, you can avoid the multiple guessing by
using the Memoize module on your sub.

> 2) Concerning Anno's example, it looks good because it calls convert
> routine only once. However, it shows some problem while processing
> POST request, like file uploading, receiving trackback, etc. I tried
> to debug but failed to find why. I think it is the second best way to
> apply that code with additional if-clause: if ($q->request_method() eq
> "GET")

What sort of problems? If your guessing routine is guessing incorrectly
for some of your real data, this indicates it's not safe to use it
anyway.

> 3) In the original code, there are some lines that access
> $ENV{QUERY_STRING} directly, without calling CGI functions. I need to
> apply "guess_and_convert" to those lines.

Well, that's just evil. My standard recommendation at this point would
be to throw out whatever it is you're using and find something that's
decently written.

> So I cling to Q_S like this. As far as I know: (please correct me
> if I am wrong)
> 1) Q_S is related to only GET request. (All the forms in wiki.pl calls
> "wiki.pl" without any appending URL query when it submits)

You may be correct in this case that your wiki.pl only uses a query
string for GET requests. It is certainly possible to POST to a URL with
a query string.

> 2) Q_S may be in the form of "keywords" or
> "param1=value1&param2=value2...". guess_and_convert() will not change
> the important characters like "&", "=", "+". It will not change any
> other ASCII characters. It will just change the multi-byte chars.
> Because those characters have been already encoded by browser, this
> change is just the change of the number and the sequence of the "%HH"
> runs. There is, I think, no problem when CGI object is created and
> initialized using Q_S.

Err... OK. You must make sure you alter Q_S *before* any CGI.pm calls
are made, though.

> 3) Changing Q_S affects only the running script and it's child
> process.

I don't know what happens under mod_perl, if you ever move your script
to that environment. Under standard CGI, this is certainly true.

It seems to me that you are trying to take a piece of rather
badly-written code you don't really understand, and alter it to do
something that isn't really possible anyway. Given that you're in that
much of a mess, a simple edit of $ENV{QUERY_STRING} may well be the best
way out.

Ben

--
All persons, living or dead, are entirely coincidental.
Kurt Vonnegut

Ben Morrow, Mar 6, 2007
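
Ben's suggestion in the post above (wrap GetParam and memoize the wrapper) could be sketched roughly as follows. Everything here is an assumption for illustration: the GetParam stub merely imitates a ($name, $default) lookup, GetParamUtf8 is a made-up name, and check_and_convert uses a strict UTF-8 test with an EUC-KR fallback in place of Encode::Guess.

```perl
use strict;
use warnings;
use Memoize;
use Encode qw(decode encode);

# Hypothetical stand-in for the wiki's existing GetParam($name, $default).
my %raw_params = (id => "\xB8\xAE\xB4\xAA\xBD\xBA");   # EUC-KR bytes
sub GetParam {
    my ($name, $default) = @_;
    return exists $raw_params{$name} ? $raw_params{$name} : $default;
}

my $conversions = 0;   # counts how often the guessing code really runs

# Hypothetical converter: valid UTF-8 is kept, anything else is assumed EUC-KR.
sub check_and_convert {
    my ($bytes) = @_;
    $conversions++;
    my $copy = $bytes;   # FB_CROAK may modify its argument
    return $bytes if eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return encode('UTF-8', decode('euc-kr', $bytes));
}

# Wrapper that callers would use in place of GetParam().
sub GetParamUtf8 {
    my ($name, $default) = @_;
    return check_and_convert(GetParam($name, $default));
}
memoize('GetParamUtf8');   # cache results per argument list

my $first  = GetParamUtf8('id', '');
my $second = GetParamUtf8('id', '');   # served from the cache
print "conversions: $conversions\n";
```

This prints "conversions: 1". As Ben notes, caching like this is only safe if the script never changes a parameter's value after it has been read, because the cached result would then be stale.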
8. ### Raymundo (Guest)

> > 3: .../wiki.pl? <- (If you can't see the Korean chars, plz see
> > http://gypark.pe.kr/upload/linux_in_korean.gif) Everyone who are able
> > to read Korean can know it is the page about Linux. (I'll type
> > "LINUX(ko)" for this word from now on)
> >
> > URL 2 is valid, but its appearance is so.... :-/ And I must give up
> > the big advantage of wiki, "URL represent the content"
> >
> > URL 3 is said to be invalid. But I still want to support it. That is,
> > when someone types that URL in the address bar of a browser, or
> > someone clicks the link to URL 3 in other site,
>
> Is it common practice for people to write links to URLs with multibyte
> chars in them? Since the actual link itself is not user-visible (the
> text of the link is, but that's quite different) there's no reason not
> to encode it correctly, is there? Of course, if it *is* common practice,
> you may well want to handle it (if you can), regardless of its
> incorrectness.

Do you mean this case?
[a href="actual link itself"] text of the link [/a]
(I replaced "less than" and "greater than" signs with brackets, so
that any smart(?) news-reader doesn't process it as real link)

Yes, you're right. In that case the URL is hidden to user, so it
doesn't matter that URL is "...%EB%A6". And this is very typical in
plain html documents.

However many recent CGI tools, like blog(MovableType, TatterTools,
etc) and almost (as far as I know) wikis, provide the feature of "auto-
linking"(say). Someone post an article in plain text to his/her blog,
then the blog tool looks for URL pattern in the text, converts it to
"a href" links, and print it in its html output. In this case, "text
of the link" is equal to "actual link".

Another example is, wiki provides the concept of "interwiki" for a
convenient linking. That is, when I submit the text:
UseMod:UseModWiki
Google:UseModWiki (even though google is not a wiki..)
In html output, they are converted automatically to the following
links, respectively:
[a href="http://www.usemod.com/cgi-bin/wiki.pl?
UseModWiki"]UseMod:UseModWiki[/a]
[a href="http://www.google.com/search?q=UseModWiki"]Google:UseModWiki[/
a]
(The mapping table, between a interwiki name like "Google:" and the
real URL like "http://www.google.com/search?q=", is stored in a file
in the server)

In this case, someone may want to put a link to my page in his wiki.
Then "Raymundo:LINUX(ko)" is much (x 100) easier for him and more
understandable to other visitors than "Raymundo:%EB%A6%AC%EB%88%85%EC
%8A%A4".

I've already modified my wiki, so that it encodes the actual link when
it processes interwiki. But it's impossible to force every developers
of all wikis in the world.

Anyway this type of links can be common practice nowadays, in my
opinion.

> > I want my wiki.pl script show the proper page, "LINUX(ko)".

>
> Firstly, let me say that I entirely sympathise with this desire . It
> is a major failing in the design of URLs that they are so unfriendly to
> people whose native language is not English.
>
> That said, I do not think you can win here . At least my copy of FF
> will convert .../wiki.pl?KOREAN_CHARS into %-encodings *in the address
> bar* before it submits the URL. IE6 appears to do the opposite: that is,
> AFAICT it both displays the URL as typed in the address bar and actually
> submits a multi-byte URL to the server. Your Q_S munging will need to be
> quite subtle, to handle cases like .../wiki.pl?foo%3bbar, and correctly
> distinguish them from .../wiki.pl?foo;bar, which presumably means
> something quite different.

I agree IE6 acts differently (and strange). This is the access_log of
apache server when a request URL includes "wiki/LINUX(ko)":

"GET /wiki/\xb8\xae\xb4\xaa\xbd\xba" <- IE, EUC-KR
"GET /wiki/%B8%AE%B4%AA%BD%BA <- FF, EUC-KR
"GET /wiki/%EB%A6%AC%EB%88%85%EC%8A%A4" <- IE and FF, UTF-8

I don't know why IE's requests take different forms depending on the
encoding. It does url-encode if its option is set to use UTF-8 requests,
but it doesn't if the option is unchecked. But as far as I have
tested, my wiki.pl showed no difference between a request that came
from FF and one from IE.

I'll consider what you mention with the example ";" and "%3b" and test
more.
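
A rough sketch of the QUERY_STRING rewrite discussed here is shown below. It is only one possible approach under stated assumptions: convert_query_string and recode_run are hypothetical names, and check_and_convert again assumes "valid UTF-8, otherwise EUC-KR" in place of Encode::Guess. It rewrites only runs of %HH escapes and raw high bytes (the form IE sent in the log above), and it leaves a run alone when it contains no high byte, so "%3b" stays distinct from a literal ";".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical converter: valid UTF-8 is kept, anything else is assumed EUC-KR.
sub check_and_convert {
    my ($bytes) = @_;
    my $copy = $bytes;   # FB_CROAK may modify its argument
    return $bytes if eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return encode('UTF-8', decode('euc-kr', $bytes));
}

# Recode one run of %HH escapes and/or raw high bytes.
sub recode_run {
    my ($run) = @_;
    (my $bytes = $run) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # unescape
    return $run unless $bytes =~ /[\x80-\xFF]/;   # ASCII-only escapes such as %3b: keep as written
    my $utf8 = check_and_convert($bytes);
    $utf8 =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge; # re-escape everything not unreserved
    return $utf8;
}

# Rewrite the query string; "&", "=", "+" and ";" outside the runs are untouched.
sub convert_query_string {
    my ($qs) = @_;
    $qs =~ s/((?:%[0-9A-Fa-f]{2}|[\x80-\xFF])+)/recode_run($1)/ge;
    return $qs;
}

# Intended use, before any CGI.pm call:
# $ENV{QUERY_STRING} = convert_query_string($ENV{QUERY_STRING})
#     if defined $ENV{QUERY_STRING};
print convert_query_string('action=browse&id=%B8%AE%B4%AA%BD%BA'), "\n";
```

Because every byte of a converted run is re-escaped, an escaped delimiter that sits inside such a run comes back out escaped rather than as a bare ";" or "&".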

> > 2) Concerning Anno's example, it looks good because it calls convert
> > routine only once. However, it shows some problem while processing
> > POST request, like file uploading, receiving trackback, etc. I tried
> > to debug but failed to find why. I think it is the second best way to
> > apply that code with additional if-clause: if ($q->request_method() eq
> > "GET")
>
> What sort of problems? If your guessing routine is guessing incorrectly
> for some of you real data, this indicates it's not safe to use it
> anyway.

I agree, and I tried to find the exact problem and the reason for it.
I'll describe here what I have found so far:

At first, Anno's code was to change the values of the CGI->Vars hash:

    $q = new CGI;
    # convert
    my $param = $q->Vars;
    $_ = check_and_convert($_) for values %$param;

The file-uploading and trackback features are not part of the original
file. I added them myself about two years ago, taking code from examples
on the WWW.

For file uploading, wiki.pl prints a form including:

    $q->start_form('post',"$ScriptName", 'multipart/form-data') . "\n";
    "<input type='hidden' name='action' value='upload'>";
    "<input type='hidden' name='upload' value='1'>" . "\n";
    $q->filefield("upload_file","",60,80) . "\n"; # <-- file selection field
    "&nbsp;&nbsp;" . "\n";
    print $q->submit('Upload') . "\n";
    $q->endform

User is supposed to click "open" button, choose a file in a file
selection window, and click "Upload" button to submit.

To save the file on the server, the following code is used:

    $file = $q->upload('upload_file');
    open(FILE, ">file_in_local_disk_of_server");
    binmode FILE;
    while (<$file>) {
        print FILE $_; # read from client's file and write to server's disk
    }
    close(FILE);

I put in a "die;" to check:

    $file = $q->upload('upload_file');
    die "[$file]"; # here
    open(FILE, ">file_in_local_disk_of_server");

If I don't convert Vars, the script dies printing "[D:\download\text.txt]".
But when Vars is converted, the script dies printing "[]". That means
$file lost the information that it's a file handle.

How can I keep it as a valid file handle? Even without converting, I
found that any write access to $file causes the same problem.

    my $param = $q->Vars;
    $$param{'upload_file'} .= ""; # no other string appended, but it loses the file handle

or even

    $$param{'upload_file'} = $$param{'upload_file'}; # it also loses the file handle!!! :-O

So there is nothing that check_and_convert() can do. Modifying
"->Vars" itself causes the problem.

If I have to choose this approach anyway, I can do it like this:

    my $param = $q->Vars;
    foreach (keys %$param) {
        # don't try to assign $$param{'upload_file'}
        $$param{$_} = guess_and_convert($$param{$_}) if ($_ ne "upload_file");
    }

But there is no guarantee that all the other parameters are ordinary
strings.

> > So I cling to Q_S like this. As far as I know: (please correct me
> > if I am wrong)
> > 1) Q_S is related to only GET request. (All the forms in wiki.pl calls
> > "wiki.pl" without any appending URL query when it submits)

>
> You may be correct in this case that your wiki.pl only uses a query
> string for GET requests. It is certainly possible to POST to a URL with
> a query string.

Yes, I have to consider it in the future. And I still believe it
doesn't matter, because the "query string" in a URL is anyway just a
string which can't carry any invisible information (like $file above).

> > 2) Q_S may be in the form of "keywords" or
> > "param1=value1&param2=value2...". guess_and_convert() will not change
> > the important characters like "&", "=", "+". It will not change any
> > other ASCII characters. It will just change the multi-byte chars.
> > Because those characters have been already encoded by browser, this
> > change is just the change of the number and the sequence of the "%HH"
> > runs. There is, I think, no problem when CGI object is created and
> > initialized using Q_S.
>
> Err... OK. You must make sure you alter Q_S *before* any CGI.pm calls
> are made, though.

I agree.

> > 3) Changing Q_S affects only the running script and it's child
> > process.
>
> I don't know what happens under mod_perl, if you ever move your script
> to that environment. Under standard CGI, this is certainly true.

That's the type of answer I want! I've never thought of mod_perl or
anything like it. (Actually I have no idea what it is.)

> It seems to me that you are trying to take a piece of rather
> badly-written code you don't really understand, and alter it to do
> something that isn't really possible anyway. Given that you're in that
> much of a mess, a simple edit of $ENV{QUERY_STRING} may well be the best
> way out.
>
> Ben
>

I plan to check and test more things and choose what to do.

I thank you for your constant help. Have a nice day!

Raymundo at South Korea.

Raymundo, Mar 7, 2007
