Tuesday, June 7. 2016
WBXML stream: Version 0.2.0
Some days ago pwnslinger opened the first issue in the wbxml-stream project. If you remember, that project is my little effort to provide a WBXML parser/encoder for Java/StAX. The main problem was that the library did not correctly parse WBXML documents in version 1.1. The WBXML standard has four versions, from 1.0 to 1.3, and there are subtle differences between them. Version 0.1.0 of the wbxml-stream library only managed version 1.3: only 1.3 documents could be encoded, and documents in previous versions were parsed as if they were 1.3, without taking any difference into account. On top of that, the enumeration entry for version 1.1 was wrong (a copy/paste problem :-/ ), so that version was not recognized by the library at all.
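For reference, every WBXML document starts with a one-byte version number that packs the major version minus one into the high nibble and the minor version into the low nibble, so versions 1.0 to 1.3 are the bytes 0x00 to 0x03. The following minimal sketch shows how an enumeration of the four versions might map to and from that byte. `VersionSketch` is a hypothetical name, not the library's actual `WbXmlVersion`; a single mistyped constant in a table like this is enough to leave one version unrecognized.

```java
// Sketch only: VersionSketch is a hypothetical stand-in, not wbxml-stream's WbXmlVersion.
public class VersionByteDemo {

    enum VersionSketch {
        VERSION_1_0(1, 0),
        VERSION_1_1(1, 1),
        VERSION_1_2(1, 2),
        VERSION_1_3(1, 3);

        final int major;
        final int minor;

        VersionSketch(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        // WBXML version byte: (major - 1) in the high nibble, minor in the low nibble.
        byte toByte() {
            return (byte) (((major - 1) << 4) | minor);
        }

        // Returns the version whose byte matches, or null when it is not recognized.
        static VersionSketch fromByte(byte b) {
            for (VersionSketch v : values()) {
                if (v.toByte() == b) {
                    return v;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        for (VersionSketch v : VersionSketch.values()) {
            System.out.printf("%s -> 0x%02X%n", v, v.toByte());
        }
        // A version 1.1 document starts with the byte 0x01.
        System.out.println(VersionSketch.fromByte((byte) 0x01));
    }
}
```

Running `main` prints 0x00 through 0x03 for the four constants, followed by `VERSION_1_1` for the byte 0x01.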
pwnslinger's issue made me improve the implementation to properly manage the four versions of the specification, and version 0.2.0 has been released with better handling of the previous versions of the standard. Now it is possible to encode a WBXML document in any version, and both parsing and encoding take into account the specific characteristics of each version. Version 1.0 does not include the charset of the document (it only handles the unknown encoding) and it does not recognize opaques. Version 1.1 adds tag opaques and charsets/encodings, but attribute opaques arrive later, in version 1.2. Version 1.2 also adds page switches to increase the number of tags, attributes and values a definition can handle.

There is one difference between versions 1.0 and 1.1 that wbxml-stream does not manage. In version 1.1 the document body is defined as *pi element *pi (an element with optional processing instructions before and after it), whereas in version 1.0 the body was 1*content (one or more content items, each of which can be an element, a string, an extension, an entity or a processing instruction). This older definition is quite weird: it allows a WBXML document that is just a string or an entity, which is clearly not a valid XML document. For that reason I decided to ignore this difference (a version 1.0 WBXML document with such strange content will not be parsed correctly and will surely throw an exception). A page in the project wiki summarizes these differences if you are interested in the details.
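The version differences listed above can be captured as a small capability table. This is a minimal sketch, assuming the rules exactly as stated in this post (charset and tag opaques from 1.1, attribute opaques and page switches from 1.2); the `Feature` enum and the `featuresFor` method are hypothetical and not part of wbxml-stream.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch only: a hypothetical capability check built from the version
// differences described in this post; it is not wbxml-stream's internal code.
public class VersionFeatures {

    enum Feature { CHARSET, TAG_OPAQUE, ATTRIBUTE_OPAQUE, PAGE_SWITCH }

    // Minor version (of WBXML 1.x) in which each feature first appears.
    static int introducedIn(Feature f) {
        switch (f) {
            case CHARSET:
            case TAG_OPAQUE:
                return 1;
            case ATTRIBUTE_OPAQUE:
            case PAGE_SWITCH:
                return 2;
            default:
                throw new IllegalArgumentException("unknown feature " + f);
        }
    }

    // Features available when writing WBXML version 1.<minor>.
    static Set<Feature> featuresFor(int minor) {
        Set<Feature> result = EnumSet.noneOf(Feature.class);
        for (Feature f : Feature.values()) {
            if (minor >= introducedIn(f)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int minor = 0; minor <= 3; minor++) {
            System.out.println("1." + minor + " " + featuresFor(minor));
        }
    }
}
```

An encoder can consult a table like this before writing a charset or an opaque, and fall back with a warning when the target version lacks the feature.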
The new version of the library also improves the management of encodings (the default charset, used for the unknown encoding, is now UTF-8 instead of ASCII) and of numeric character references (things like &#241; or &#xF1; to reference the character ñ). I am not completely sure everything is right in the latter, and maybe a new version will be needed, but it is certainly in better shape than in the previous version. The WbXmlOutputFactory has been updated to receive the version we want to write the WBXML with (property es.rickyepoderi.wbxml.stream.version). Besides, the Xml2WbXml command now accepts two more options (-c or --charset and -v or --version) to convert the XML file to WBXML using the specified encoding and version.
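Numeric character references come in a decimal form (&#241;) and a hexadecimal form (&#xF1;), both naming a Unicode code point. Below is a minimal, hypothetical decoder (`NumericCharRefs.decode` is an illustrative name, not a wbxml-stream method) that replaces both forms with the character they reference. It assumes well-formed input and does not handle named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a minimal decoder for numeric character references, written
// for illustration; it is not the routine wbxml-stream 0.2.0 uses internally.
public class NumericCharRefs {

    // &#DDD; (decimal) or &#xHHH; / &#XHHH; (hexadecimal)
    private static final Pattern NCR = Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    static String decode(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // copy the text between the previous reference and this one
            out.append(text, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1))
                    : Integer.parseInt(m.group(2), 16);
            // appendCodePoint also handles characters outside the BMP;
            // out-of-range values throw instead of being reported nicely
            out.appendCodePoint(codePoint);
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Espa&#241;a / Espa&#xF1;a"));
    }
}
```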
Here is a little snippet that uses the new version 0.2.0 to convert an SL XML file into WBXML v1.1 using a specific encoding.
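```java
import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stax.StAXResult;
import org.w3c.dom.Document;
// also import WbXmlDefinition, WbXmlInitialization, WbXmlOutputFactory, WbXmlVersion from the jar
public class Sl2WbXml {
    public static void main(String[] args) throws Exception {
        // read the XML using DOM
        Document doc;
        try (InputStream in = new FileInputStream("sl-001.xml")) {
            DocumentBuilderFactory domFact = DocumentBuilderFactory.newInstance();
            domFact.setNamespaceAware(true);
            domFact.setIgnoringElementContentWhitespace(true);
            DocumentBuilder domBuilder = domFact.newDocumentBuilder();
            doc = domBuilder.parse(in);
        }
        // locate the definition of the WBXML using the name
        WbXmlDefinition definition = WbXmlInitialization.getDefinitionByName("SL 1.0");
        // create the StAX stream writer using the definition, version 1.1 and ISO-8859-1
        try (OutputStream out = new FileOutputStream("sl-001.wbxml")) {
            XMLOutputFactory fact = new WbXmlOutputFactory();
            fact.setProperty(WbXmlOutputFactory.DEFINITION_PROPERTY, definition);
            fact.setProperty(WbXmlOutputFactory.VERSION_PROPERTY, WbXmlVersion.VERSION_1_1);
            XMLStreamWriter xmlStreamWriter = fact.createXMLStreamWriter(out, "ISO-8859-1");
            // create a transformer to convert DOM into StAX
            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            Source domSource = new DOMSource(doc);
            StAXResult staxResult = new StAXResult(xmlStreamWriter);
            xformer.transform(domSource, staxResult);
            xmlStreamWriter.flush();
            xmlStreamWriter.close(); // does not close "out"; try-with-resources does that
        }
    }
}
```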
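To check what the snippet produced, it can help to peek at the WBXML header: a version byte, then the public identifier and (from version 1.1 on) the charset as an IANA MIBenum, both encoded as mb_u_int32 (7 bits per byte, continuation flag in the high bit). The class below is a hypothetical helper written for this illustration, not part of wbxml-stream, and it stops before the string table.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch only: WbXmlHeaderPeek is a hypothetical helper, not part of wbxml-stream.
// It decodes the first header fields of a WBXML document.
public class WbXmlHeaderPeek {

    // mb_u_int32: 7 bits per byte, most significant group first,
    // high bit set on every byte except the last one.
    static int readMbUInt32(ByteBuffer buf) {
        int value = 0;
        for (int i = 0; i < 5; i++) {
            int b = buf.get() & 0xFF; // BufferUnderflowException if truncated
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("mb_u_int32 longer than 5 bytes");
    }

    static String describe(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int version = buf.get() & 0xFF;
        int major = (version >> 4) + 1;
        int minor = version & 0x0F;
        if (major != 1) {
            throw new IllegalArgumentException("not a WBXML 1.x version byte: " + version);
        }
        int publicId = readMbUInt32(buf);
        String publicIdText = "publicid " + publicId;
        if (publicId == 0) {
            // literal public id: an index into the string table follows
            publicIdText = "publicid in string table at index " + readMbUInt32(buf);
        }
        // version 1.0 has no charset field; report 0 (unknown) in that case
        int charset = minor >= 1 ? readMbUInt32(buf) : 0;
        return "version " + major + "." + minor + ", " + publicIdText + ", charset MIBenum " + charset;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WbXmlHeaderPeek file.wbxml");
            return;
        }
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Run as `java WbXmlHeaderPeek sl-001.wbxml`; for the snippet above one would expect version 1.1 and charset MIBenum 4, the IANA number of ISO-8859-1.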
And here is an execution of the command that converts an SI XML file into WBXML version 1.0 with encoding ISO-8859-1 (remember that v1.0 does not carry the charset, therefore any character outside ASCII is compromised). As the SI document uses some opaques (which are not defined in version 1.0), the implementation avoids using them and displays some warning messages. In general, version issues either generate warnings or throw exceptions; in this case the encoder does not use the opaque and warns the user because the resulting document is probably invalid.
```
$ java -cp wbxml-stream-0.2.0.jar es.rickyepoderi.wbxml.tools.Xml2WbXml -d "SI 1.0" -v 1.0 -c ISO-8859-1 si.xml si.wbxml
Jun 06, 2016 7:23:59 PM es.rickyepoderi.wbxml.document.WbXmlEncoder encode
WARNING: Opaque not used for attribute "created" in element "indication" because version "1.0" does not accept attribute opaques.
Jun 06, 2016 7:23:59 PM es.rickyepoderi.wbxml.document.WbXmlEncoder encode
WARNING: Opaque not used for attribute "si-expires" in element "indication" because version "1.0" does not accept attribute opaques.
```
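The encoding caveat for version 1.0 is easy to reproduce with plain JDK classes. From version 1.1 on, the header carries the charset as an IANA MIBenum (4 for ISO-8859-1, 106 for UTF-8, 0 for unknown); version 1.0 has no such field, so a reader has to guess. This minimal sketch assumes the reader falls back to UTF-8, the 0.2.0 default for the unknown encoding, and shows that the ISO-8859-1 bytes of "España" do not survive. `CharsetFallback` and its MIBenum table are hypothetical, written only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: why a version 1.0 document cannot safely carry ISO-8859-1 text.
// The MIBenum table and the UTF-8 fallback are illustrative assumptions.
public class CharsetFallback {

    // A few IANA MIBenum values; 0 means "unknown" and falls back to UTF-8.
    static Charset charsetFor(int mibEnum) {
        switch (mibEnum) {
            case 3:
                return StandardCharsets.US_ASCII;
            case 4:
                return StandardCharsets.ISO_8859_1;
            case 0:   // unknown: assumed default
            case 106:
                return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException("unsupported MIBenum " + mibEnum);
        }
    }

    public static void main(String[] args) {
        String original = "Espa\u00F1a";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // v1.1 and later: the header says MIBenum 4, so the text decodes correctly
        String withCharset = new String(latin1, charsetFor(4));
        // v1.0: no charset in the header, the reader falls back to "unknown"
        String withoutCharset = new String(latin1, charsetFor(0));

        System.out.println(withCharset.equals(original));    // true
        System.out.println(withoutCharset.equals(original)); // false
    }
}
```

The single byte 0xF1 that encodes ñ in ISO-8859-1 is not valid UTF-8 on its own, so the fallback decoder replaces it with U+FFFD.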
And that is all. If you are using the wbxml-stream library for something, please try the new version, because it integrates a nice new feature (version management) and some minor improvements and bug fixes. Stay connected for more news about the project.
Cheerio!