servlet request parameter character encoding

Question

I have a Java servlet that receives data from an upstream system via an HTTP GET request. This request includes a parameter named "text". If the upstream system sets this parameter to:

TEST3 please ignore:

It appears in the logs of the upstream system as:

00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c   //TEST3 pl
00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e   //ease ign
00 6f 00 72 00 65 00 3a                           //ore:

(The // comments do not actually appear in the logs)

In my servlet I read this parameter with:

String text = request.getParameter("text");

If I print the value of text to the console, it appears as:

T E S T 3  p l e a s e  i g n o r e :

If I inspect the value of text in the debugger, it appears as:

\u000T\u000E\u000S\u000T\u0003\u0000 \u000p\u000l\u000e\u000a\u000s\u000e\u0000 
\u000i\u000g\u000n\u000o\u000r\u000e\u000:

So it seems that there's a problem with the character encoding. The upstream system is supposed to use UTF-16. My guess is that the servlet is assuming UTF-8 and therefore is reading twice the number of characters it should be. For the message "TEST3 please ignore:" the first byte of each character is 00. This is being interpreted as a space when read by the servlet, which explains the space that appears before each character when the message is logged by the servlet.

Obviously my goal is simply to get the message "TEST3 please ignore:" when I read the text request param. My guess is that I could achieve this by specifying the character encoding of the request parameter, but I don't know how to do this.

GET parameters have to be ASCII or URL-encoded; you can't use a special charset in there.
– Maurício Linhares, Commented Jun 19, 2012 at 11:42
Yup - en.wikipedia.org/wiki/Percent-encoding#Current_standard
– Maurício Linhares, Commented Jun 19, 2012 at 12:43
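
To illustrate the comments above, here is a minimal sketch of what a percent-encoded form of the message looks like, assuming the sender encodes the parameter value as UTF-8 before putting it in the query string. The class and method names are hypothetical; only `URLEncoder` and `URLDecoder` come from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PercentEncodingDemo {

    // Encodes a parameter value the way an HTML form submission would (UTF-8 assumed).
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(), again assuming UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("TEST3 please ignore:");
        System.out.println(encoded);         // TEST3+please+ignore%3A
        System.out.println(decode(encoded)); // TEST3 please ignore:
    }
}
```

Sent this way, the query string contains only ASCII characters, and the receiving side only needs to know which charset the escaped bytes represent.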

letonai · Accepted Answer · 2014-01-24 12:02:50Z

9

Use it like this:

new String(req.getParameter("<my request value>").getBytes("ISO-8859-1"),"UTF-8")
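
Why this can work: a sketch, assuming the client sent UTF-8 bytes and the container decoded them as ISO-8859-1. ISO-8859-1 maps every byte to exactly one character, so `getBytes("ISO-8859-1")` recovers the original bytes, which can then be decoded as UTF-8. The class and method names below are hypothetical, and the container's mis-decoding is simulated rather than taken from a real request.

```java
import java.nio.charset.StandardCharsets;

public class ReDecodeDemo {

    // Simulates a container that reads UTF-8 bytes as if they were ISO-8859-1.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The repair from the answer: recover the raw bytes, then decode them as UTF-8.
    static String redecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "D\u00f3nal";           // contains one non-ASCII character
        String broken = misdecode(original);      // the accented letter becomes two characters
        String repaired = redecode(broken);

        System.out.println(broken.length());            // 6
        System.out.println(repaired.equals(original));  // true
    }
}
```

The round trip is only lossless when the wrong decoding really was ISO-8859-1; with any other charset mismatch the same trick may corrupt the text further.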

2 Comments

pataluc Over a year ago

This solved my problem, but I don't fully understand why... :(

pataluc Over a year ago

I dug a little deeper and found out that calling request.setCharacterEncoding("UTF-8"); was the only thing I needed (and it makes much more sense).

Petr Mensik · Accepted Answer · 2012-06-19 11:57:14Z

2

Try using a Filter for this:

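import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CustomCharacterEncodingFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Must run before any parameter is read; it sets the encoding of the request body
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
    }
}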

This should set the encoding correctly for the whole application.

epoch · Accepted Answer · 2012-06-19 12:45:54Z

1

It looks like it was encoded as UTF-16LE (little-endian). Here is a class that successfully prints your string:

import java.io.UnsupportedEncodingException;
import java.math.BigInteger;

public class Test {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String hex = "00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c" +
                     "00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e" +
                     "00 6f 00 72 00 65 00 3a"; // + " 00";
        // Note: toByteArray() returns the minimal two's-complement form,
        // so the leading 00 byte of the input is not preserved.
        byte[] bytes = new BigInteger(hex.replaceAll(" ", ""), 16).toByteArray();
        System.out.println(new String(bytes, "UTF-16LE"));
    }
}

Output:

TEST3 please ignore?

Output with two zeros appended to the input:

TEST3 please ignore:

UPDATE

To get this working with your Servlet you can try:

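  String value = request.getParameter("text");
  try {
      // Note: getBytes() with no argument uses the platform default charset
      value = new String(value.getBytes(), "UTF-16LE");
  } catch (java.io.UnsupportedEncodingException ex) {
      // Every JVM must support UTF-16LE, so this should not happen
      throw new IllegalStateException(ex);
  }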

UPDATE

see the following link, it verifies that the hex produced is in fact UTF-16LE

edited Jun 19, 2012 at 12:45

answered Jun 19, 2012 at 11:49

4 Comments

Dónal Over a year ago

The last character should be ':' rather than '?'.

epoch Over a year ago

@Don, that is because the last 00 is missing after 3a. If you add it, the string decodes correctly. Either the encoder of that string is messed up or maybe you forgot to copy the last two zeros.

Dónal Over a year ago

You're right, probably a copy-paste error on my part. BTW, are you sure this isn't big-endian? Thanks for your help.

epoch Over a year ago

No problem. I'm not an expert with character encoding, but I'm pretty sure it is little-endian, because big-endian does not decode the string at all :)
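
On the big-endian question raised in these comments, the following sketch decodes the bytes exactly as they appear in the upstream log, keeping the leading 00 byte. Under that assumption the data reads as big-endian UTF-16. The class, constant, and method names are hypothetical. The second method shows that `BigInteger.toByteArray()` returns the minimal two's-complement form and so drops the leading zero byte, which shifts every byte pair by one position. That may explain why the earlier test only appeared to work with UTF-16LE and why it lost the final character.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class Utf16EndianDemo {

    // The 40 bytes as shown in the upstream system's log.
    static final String LOGGED_HEX =
            "00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c "
          + "00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e "
          + "00 6f 00 72 00 65 00 3a";

    // Parses space-separated hex pairs one byte at a time, so leading zeros survive.
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Decodes the logged bytes as big-endian UTF-16.
    static String decodeBigEndian(String hex) {
        return new String(parseHex(hex), StandardCharsets.UTF_16BE);
    }

    // Converts via BigInteger, which does not preserve the leading 00 byte.
    static byte[] bigIntegerBytes(String hex) {
        return new BigInteger(hex.replaceAll(" ", ""), 16).toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(parseHex(LOGGED_HEX).length);        // 40
        System.out.println(decodeBigEndian(LOGGED_HEX));        // TEST3 please ignore:
        System.out.println(bigIntegerBytes(LOGGED_HEX).length); // 39
    }
}
```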
