Saturday, June 1. 2013
HTTP Headers and JavaEE
Today I am going to present a little entry about an issue I found some weeks ago. A Web Single Sign-On (SSO) solution, which still uses the old and nice opensso, was planning to move its agents from the application layer (JavaEE containers) to the web layer. Until that moment the opensso agents just intercepted the requests, performed all the sign-on work and passed some information to the application using headers (common user information like identifier, name, email,... and two multi-valued attributes which act as a profile, data that tells the application what the user can and cannot do). In addition, the applications use a JAR library to isolate the part of the code that interacts with the SSO system.
At first sight the story seemed very easy: just install a new agent in the desired web server and remove any agent from the application server. In theory the agent in the front layer can perform exactly the same actions (authentication and header injection) and the library in the applications should work properly without any change. But, as usual, theory and practice are never the same thing.
Two problems came up and needed to be resolved:
The multi-valued attributes are passed differently. In the JavaEE layer they were just several values and they are retrieved using the getHeaders method. The following example shows this idea:
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// request is the HttpServletRequest being processed
List<String> values = new ArrayList<>();
Enumeration<String> e = request.getHeaders(headerName);
while (e.hasMoreElements()) {
    values.add(e.nextElement());
}
That is because the agent in the JavaEE container just intercepts the request and adds the pre-configured headers in the same JVM (in the case of opensso, using a wrapper request object). In the web layer, however, all the values are added to a single header using a separator. The HTTP 1.1 standard says the following about multiple header values with the same name:
Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.
So there is a limitation: if several headers with the same name can always be combined with commas, a value that itself contains a comma cannot be told apart from two separate values. For that reason the opensso web agents just send a single header whose values are joined by a pre-configured separator (the pipe character, |, by default). This way the agents avoid the problem of a comma being part of a value. Obviously the application library had to be modified to split that value in order to get the multi-valued attributes properly.
The second (and main) issue was non-ASCII characters in the header values. As you know I am Spanish and, as in any non-English country, our names can contain non-ASCII characters (acute accents, the letter ñ,...). All those characters were shown garbled with the new configuration.
I spent some time trying to work out what the standard says about non-ASCII headers, and it seems that headers should be in ISO-8859-1 encoding but there is a way of sending other charsets:
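The split itself can be sketched in a few lines. This is only an illustration under my own assumptions: the class and method names are invented for this entry (they are not part of opensso or of the real library), the separator is passed in because it is configurable in the agent, and dropping empty entries is a choice of mine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of opensso or of the real library.
final class MultiValuedHeader {

    // Splits a single header value such as "admin|auditor" into its values.
    static List<String> split(String headerValue, String separator) {
        if (headerValue == null || headerValue.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> values = new ArrayList<>();
        // Pattern.quote is needed because "|" is a regex metacharacter.
        for (String value : headerValue.split(Pattern.quote(separator), -1)) {
            if (!value.isEmpty()) { // assumption: empty entries carry no meaning
                values.add(value);
            }
        }
        return values;
    }
}
```

With the default separator, a call like MultiValuedHeader.split(request.getHeader(headerName), "|") would replace the getHeaders loop, and a value containing a comma stays in one piece.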
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047.
So, according to the standard, headers can carry any encoding as long as it is marked the same way it is done in the email system (RFC 2047). Java mail, for example, has methods to encode and decode text in that format. Here are two header examples for my common name using the Q (similar to quoted-printable) and B (Base64) encodings respectively:
COMMON_NAME_Q: =?UTF-8?Q?Ricardo_Mart=C3=ADn?=
COMMON_NAME_B: =?UTF-8?B?UmljYXJkbyBNYXJ0w61u?=
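To make the format less mysterious, here is a rough sketch of how such an encoded word could be decoded with nothing but the JDK (Java 8 or later for java.util.Base64). The class name is invented for this entry and the decoder is deliberately incomplete: it does not remove the whitespace between adjacent encoded words, it ignores the optional language suffix, and it lets Charset.forName fail on unknown charsets. In a real application I would rather rely on the Java mail methods mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified RFC 2047 decoder for illustration only.
final class EncodedWordDecoder {

    // Matches =?charset?encoding?encoded-text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out); // text outside encoded words is copied as-is
        return out.toString();
    }

    // Q encoding: "_" stands for a space and "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }
}
```

Both example headers above decode to the same common name with this sketch, and a plain ASCII value passes through unchanged.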
From what I understand of the spec, an encoded header is intended to be parsed by the local application and not by the JavaEE server. If you check the code of tomcat or glassfish there is always a default charset, which is ISO-8859-1, and no special parsing is performed. I even recompiled a tomcat 7 changing the default from ISO-8859-1 to UTF-8 just to confirm my suspicion. And it indeed worked (the characters were not garbled any more), but obviously that test was just a proof and not a solution at all. In my opinion the problem here is that the opensso web agents cannot send headers in RFC 2047 format. I found that at least there is a property, called com.sun.am.policy.agents.config.convert_mbyte.enable, that lets the agent encode the headers in the system locale (opensso in UNIX uses iconv to perform the conversion). So if the locale had been set to any ISO-8859-1 language I suppose it would have worked. But obviously that property limits the header values to only the characters allowed by that encoding.
So I think that the problem is as follows: the agent in the web layer inserts the headers in UTF-8 (if the property mentioned above is not set) and the JavaEE container reads them as ISO-8859-1. Garbling is assured. It worked before because the agent was inside the JVM and a normal Java string was placed as a header (remember that in the previous situation the headers just lived inside the JVM and never traveled from one server to another). Finally I reached the conclusion that the best thing I could do was to convert the headers myself inside the library:
import java.nio.charset.Charset;

String value = request.getHeader(headerName);
if (value != null) {
    // Undo the container's ISO-8859-1 decoding and read the bytes as UTF-8.
    value = new String(value.getBytes(Charset.forName("ISO-8859-1")), Charset.forName("UTF-8"));
}
I know it is a hack, but at least I made it configurable through some parameters in the library (I am a bit sloppy but elegant). Everybody should be aware that there are some bugs involved here. I think that I chose the least bad solution.
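To convince myself that this conversion is lossless, the whole round trip can be reproduced outside any container. The following is just a sketch with invented names: garble simulates what I believe happens between the web agent and the container (UTF-8 bytes read as ISO-8859-1), and repair is the same conversion as above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are invented for this entry.
final class HeaderCharsetRepair {

    // Simulates the container: UTF-8 bytes on the wire, decoded as ISO-8859-1.
    static String garble(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes and decodes them as UTF-8.
    static String repair(String received) {
        if (received == null) {
            return null;
        }
        return new String(received.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Ricardo Mart\u00EDn"; // \u00ED is the accented i
        String received = garble(name);
        System.out.println(received.length() - name.length()); // one extra char
        System.out.println(repair(received).equals(name));
    }
}
```

The repair works because ISO-8859-1 maps every byte to exactly one character, so no information is lost on the way in. The other side of the coin is that applying repair to a value that was not garbled (for example if the agent already sends ISO-8859-1) breaks it, which is one more reason to keep the conversion behind a configuration switch.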
The conclusion of this entry is that chaos theory is a matter of fact, that any little change can break your whole solution, that shit happens. And as I am going to forget this situation very quickly I thought it would be nice to write a little summary here. You know that sometimes this blog is just my personal logbook.
Sometimes just fix it and run away!