The recent release of Apache Tomcat, v6.0.35, seems to break the handling of parameters encoded in UTF-8. For example, if I pass "%E6%97%A5%E6%9C%AC" (which is the string of URL-escaped UTF-8 bytes for "日本"), it gets incorrectly interpreted. Both URIEncoding="UTF-8" and useBodyEncodingForURI="true" are set for the necessary Connectors in server.xml, and it works as expected prior to v6.0.35.
Expected:
$ cat nippon && cat $_ | hexdump -C
日本
00000000 e6 97 a5 e6 9c ac 0a |.......|
00000007
Actual:
$ cat tomcat-bug && cat $_ | hexdump -C
æ¥æ¬
00000000 c3 a6 c2 97 c2 a5 c3 a6 c2 9c c2 ac 0a |.............|
0000000d
I cloned the GitHub mirror of tomcat60 and did a quick git-bisect. The offending commit is 1ef4156 (r1200601 in SVN), which corresponds to the last two items of the Catalina changelog for unreleased version 6.0.34.
So, in other words, Tomcat properly interprets parameters prior to (and fails starting from) 1ef4156.
It is hard to tell exactly what the problem is, though, because 1ef4156 is such a large commit. My best guess, without digging into the code, is that ISO-8859-1 is being used instead of UTF-8 in the decoding process; that is, it seems that the charset is not being correctly passed to the parameter processor.
The same "mistaken" decoding can be done with iconv, as follows:
$ cat nippon | iconv -f ISO-8859-1 -t UTF-8 | hexdump -C
00000000 c3 a6 c2 97 c2 a5 c3 a6 c2 9c c2 ac 0a |.............|
0000000d
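The same effect can also be reproduced in plain Java, without Tomcat. The following is only a minimal sketch of the suspected failure mode, not Tomcat's actual parameter-processing code; the class MojibakeDemo and its methods are hypothetical names made up for illustration. It percent-decodes the parameter once as UTF-8 and once as ISO-8859-1, then prints the UTF-8 bytes of each result (without the trailing newline that cat adds).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from Tomcat.
final class MojibakeDemo {

    // Percent-decode a parameter value using the named charset.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // List the UTF-8 bytes of a string as space-separated hex, like hexdump -C.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Undo the wrong decoding: recover the raw bytes, then read them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String escaped = "%E6%97%A5%E6%9C%AC";

        // Correct decoding: prints e6 97 a5 e6 9c ac
        System.out.println(utf8Hex(decode(escaped, "UTF-8")));

        // Wrong charset: prints c3 a6 c2 97 c2 a5 c3 a6 c2 9c c2 ac
        String garbled = decode(escaped, "ISO-8859-1");
        System.out.println(utf8Hex(garbled));

        // Round trip back to the intended string: prints e6 97 a5 e6 9c ac
        System.out.println(utf8Hex(repair(garbled)));
    }
}
```

The second line of output matches the bytes in the "Actual" dump, which is consistent with the ISO-8859-1 guess but does not prove it. The repair step also suggests a possible stopgap for affected applications (re-encoding the parameter as ISO-8859-1 and decoding it as UTF-8), though I have not tried that against v6.0.35 itself.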
Maybe I'll have a look later and try to fix the problem.