Saturday, September 15. 2012
WBXML stream: First bits
Another of my new initiatives is a Java streaming implementation of the WBXML format. WAP Binary XML (WBXML) is a binary representation of XML designed to transmit documents more compactly over mobile networks. The format is maintained by the Open Mobile Alliance, and its complete specification can be downloaded from their website. Several mobile-phone languages use it, for example SyncML (Synchronization Markup Language), WML (Wireless Markup Language) or WV (Wireless Village), and, although it is not very common, I need it for another idea.
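To give an idea of what "binary" means here: every WBXML document starts with a small header (version, public identifier, charset and string table length), and the specification encodes those numbers as mb_u_int32, a variable-length integer that stores 7 bits per byte, most significant group first, with the high bit set on every byte except the last. The following is a minimal sketch of that encoding, not code taken from my library; WbXmlHeaderSketch and its methods are made-up names, and the sample header uses version 0x03 (WBXML 1.3), public id 0x01 (unknown) and charset 106 (the IANA MIBenum for UTF-8).

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch (hypothetical names) of the WBXML header fields.
class WbXmlHeaderSketch {

    // Encodes a WBXML mb_u_int32: 7 bits per byte, most significant group
    // first, continuation bit 0x80 on every byte except the last one.
    static byte[] encodeMbUInt32(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("not a u_int32: " + value);
        }
        byte[] groups = new byte[5];          // 32 bits need at most 5 groups of 7
        int pos = groups.length;
        do {
            groups[--pos] = (byte) (value & 0x7F);
            value >>>= 7;
        } while (value != 0);
        for (int i = pos; i < groups.length - 1; i++) {
            groups[i] |= (byte) 0x80;         // more bytes follow
        }
        byte[] out = new byte[groups.length - pos];
        System.arraycopy(groups, pos, out, 0, out.length);
        return out;
    }

    // Decodes an mb_u_int32 starting at the given offset.
    static long decodeMbUInt32(byte[] bytes, int offset) {
        long value = 0;
        for (int i = offset; i < bytes.length; i++) {
            value = (value << 7) | (bytes[i] & 0x7F);
            if ((bytes[i] & 0x80) == 0) {
                return value;                 // last byte of the integer
            }
        }
        throw new IllegalArgumentException("truncated mb_u_int32");
    }

    // Builds a header with an empty string table.
    static byte[] header(int version, long publicId, long charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(version);                       // u_int8, 0x03 for WBXML 1.3
        writeAll(out, encodeMbUInt32(publicId));  // well-known public id token
        writeAll(out, encodeMbUInt32(charset));   // IANA MIBenum, 106 = UTF-8
        writeAll(out, encodeMbUInt32(0));         // string table length: empty
        return out.toByteArray();
    }

    private static void writeAll(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }
}
```

For example, 160 is encoded as the two bytes 0x81 0x20, and header(0x03, 0x01, 106) produces 03 01 6A 00.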
Java SE (versions 6 and 7), on the other hand, defines several technologies for dealing with XML documents, such as SAX (Simple API for XML), DOM (Document Object Model) and StAX (Streaming API for XML). Each is a different way of reading, writing and managing XML documents. But, more interestingly, all of them can be used together with JAXB (Java Architecture for XML Binding) to build Java objects from XML documents and vice versa. I admit that all the XML stuff in Java is horribly complicated, but the JAXB idea is absolutely awesome. Remember how easy it was to integrate the parsing and encoding part when I developed some RESTful Web Services with JSON: all of that was possible because JAX-RS uses JAXB.
So the idea of this project is simple: implementing a StAX reader and writer that let us parse and encode WBXML documents. Why StAX? Just because it is simple and modern, and I have worked with it before. I have to say that I relied heavily on the existing libwbxml C implementation to understand the confusing specification (as you can see, it is also an ancient project).
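StAX comes in two halves: an XMLStreamWriter that receives calls like writeStartElement(), and an XMLStreamReader that the client pulls with next() while asking about the current event. Those interfaces are the contract my WbXmlStreamWriter and WbXmlStreamReader have to honour, so code written only against them should, in principle, work with text XML and WBXML alike. The sketch below is illustrative: StaxSketch and its methods are made-up names, the element names come from the SI DTD, and the JDK's XMLOutputFactory and XMLInputFactory stand in for my classes so it runs without the library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class: uses only the StAX interfaces.
class StaxSketch {

    // Writer side: only XMLStreamWriter interface methods are called.
    static void writeIndication(XMLStreamWriter w, String href, String text)
            throws XMLStreamException {
        w.writeStartDocument();
        w.writeStartElement("si");
        w.writeStartElement("indication");
        w.writeAttribute("href", href);
        w.writeCharacters(text);
        w.writeEndElement();                 // </indication>
        w.writeEndElement();                 // </si>
        w.writeEndDocument();
        w.flush();
    }

    // Reader side: pull events with the cursor API and record them.
    static List<String> events(XMLStreamReader r) throws XMLStreamException {
        List<String> seen = new ArrayList<>();
        while (r.hasNext()) {
            int event = r.next();            // move the cursor to the next event
            if (event == XMLStreamConstants.START_ELEMENT) {
                seen.add("start:" + r.getLocalName());
            } else if (event == XMLStreamConstants.CHARACTERS && !r.isWhiteSpace()) {
                seen.add("text:" + r.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                seen.add("end:" + r.getLocalName());
            }                                // other events are ignored
        }
        return seen;
    }

    // Writes with the JDK text writer as a stand-in implementation.
    static String writeDemo() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writeIndication(w, "http://example.com/", "Hello");
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Reads back with the JDK text reader as a stand-in implementation.
    static List<String> readDemo(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                return events(r);
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Reading back what writeDemo() produces should give start:si, start:indication, text:Hello, end:indication, end:si. Swapping the JDK writer for a WbXmlStreamWriter, as in the SyncML example later in this post, should only change the line that creates the writer.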
The library is currently divided into the following packages:
es.rickyepoderi.wbxml.definition: This package defines the known WBXML languages. The format specification (not very friendly, I have to say) assigns numerical identifiers to XML tags and attributes (among other things), so SyncML or any other language must be defined before documents of that type can be parsed or encoded. I decided to use a plain properties file for every language definition. The definition files are quite hard to understand but, please, blame the WBXML specification and not me. Finally, the current initialization class loads all the properties files at startup, but I admit that this class is overly elaborate and I will surely change the way it works.
es.rickyepoderi.wbxml.document: The classes in this package are an in-memory representation of a WBXML document. Following the libwbxml approach, all parsing and encoding first goes through this Java object representation, and then the StAX classes use it to write or read the XML.
es.rickyepoderi.wbxml.stream: Finally, this package implements an XMLStreamReader and an XMLStreamWriter for StAX. For the moment only those two classes are implemented (no event reader or writer, and no factories).
es.rickyepoderi.wbxml.tools: Mainly an application that provides a Java implementation of the wbxml2xml and xml2wbxml binaries. The Converter class transforms an XML file into WBXML and back. I use it to test my implementation against its C counterpart.
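To illustrate why the definition package is needed at all: in WBXML a start tag is written as a one-byte token whose low 6 bits identify the tag inside the current code page, bit 6 (0x40) says the element has content and bit 7 (0x80) says it has attributes, while the global token SWITCH_PAGE (0x00) followed by a page number changes the tag code page. The sketch below assumes a hypothetical properties layout (tag.<name>=<page>:<token>) with made-up tag names and token values; it is not the format of my real definition files, which are considerably more complicated, and TagTableSketch is an invented name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// HYPOTHETICAL definition layout "tag.<name>=<page>:<token>", not the real file format.
class TagTableSketch {

    static final int SWITCH_PAGE = 0x00;     // global token, followed by a page number
    static final int HAS_CONTENT = 0x40;     // bit 6 of a tag token
    static final int HAS_ATTRIBUTES = 0x80;  // bit 7 of a tag token

    private final Map<String, int[]> tags = new HashMap<>();  // name -> {page, token}
    private int currentPage = 0;             // tag code page 0 is active at start

    TagTableSketch(String definition) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(definition));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("tag.")) {
                String[] pageAndToken = props.getProperty(key).split(":");
                tags.put(key.substring("tag.".length()), new int[] {
                        Integer.decode(pageAndToken[0].trim()),
                        Integer.decode(pageAndToken[1].trim())});
            }
        }
    }

    // Writes a start tag, emitting SWITCH_PAGE first when the tag lives in another page.
    void writeStartTag(ByteArrayOutputStream out, String name,
                       boolean hasAttributes, boolean hasContent) {
        int[] entry = tags.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("tag not defined: " + name);
        }
        if (entry[0] != currentPage) {
            out.write(SWITCH_PAGE);
            out.write(entry[0]);
            currentPage = entry[0];
        }
        int token = entry[1];                // low 6 bits identify the tag
        if (hasAttributes) {
            token |= HAS_ATTRIBUTES;
        }
        if (hasContent) {
            token |= HAS_CONTENT;
        }
        out.write(token);
    }
}
```

With the definition "tag.Doc=0:0x05", "tag.Item=0:0x06" and "tag.Meta=1:0x05", writing Doc (content, no attributes), then Meta (attributes and content), then an empty Item gives the bytes 45 00 01 C5 00 00 06.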
I think the implementation is still quite weak, but I have successfully used it to convert some XML files to WBXML and vice versa, using libwbxml to check the results: the wbxml2xml binary can decode my WBXML files, my Java program can decode WBXML files encoded with the xml2wbxml binary, and likewise for XML files. So far I have tested it with the SyncML, WV and SI (Service Indication) languages.
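Two XML files produced by different tools rarely match byte by byte (indentation, attribute order), so this kind of round-trip check needs a more tolerant comparison. Here is one possible sketch, not necessarily how I compared my results: it parses both documents with DOM, strips whitespace-only text nodes and then uses Node.isEqualNode, which ignores attribute order but still compares names, namespaces, prefixes and text. XmlCompareSketch and sameXml are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical tolerant comparison for round-trip tests.
class XmlCompareSketch {

    // True when both documents have the same elements, attributes and text,
    // ignoring indentation-only text nodes and attribute order.
    static boolean sameXml(String a, String b) {
        try {
            return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true);         // turn CDATA sections into plain text
        factory.setIgnoringComments(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        doc.normalize();                     // merge adjacent text nodes
        stripWhitespace(doc.getDocumentElement());
        return doc;
    }

    // Recursively removes text nodes that contain only whitespace.
    private static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }
}
```

Note that this treats whitespace-only text as insignificant everywhere, which is usually fine for SyncML or SI but would be wrong for mixed-content documents.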
Here is a first example of how my library works. It writes the WBXML representation of a SyncML document parsed with DOM:
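```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stax.StAXResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import es.rickyepoderi.wbxml.definition.WbXmlDefinition;
import es.rickyepoderi.wbxml.definition.WbXmlInitialization;
import es.rickyepoderi.wbxml.stream.WbXmlStreamWriter;

// read the XML using DOM
InputStream in = new FileInputStream("syncml-001.xml");
DocumentBuilderFactory domFact = DocumentBuilderFactory.newInstance();
domFact.setNamespaceAware(true);
// note: this only takes effect when the parser validates against a DTD
domFact.setIgnoringElementContentWhitespace(true);
DocumentBuilder domBuilder = domFact.newDocumentBuilder();
Document doc = domBuilder.parse(in);
Element element = doc.getDocumentElement();
// locate the WBXML language definition using the root element
WbXmlDefinition definition = WbXmlInitialization.getDefinitionByRoot(
        element.getLocalName(), element.getNamespaceURI());
// create the StAX stream writer using the definition
OutputStream out = new FileOutputStream("syncml-001.wbxml");
XMLStreamWriter xmlStreamWriter = new WbXmlStreamWriter(out, definition);
// create an identity transformer to convert DOM into StAX calls
Transformer xformer = TransformerFactory.newInstance().newTransformer();
Source domSource = new DOMSource(doc);
StAXResult staxResult = new StAXResult(xmlStreamWriter);
xformer.transform(domSource, staxResult);
```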
I have also generated some JAXB classes (running the xjc command over the DTDs of some simple languages like SI), and they work without problems. For example, the following lines load an SI element from a WBXML file and write it into an XML file:
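```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLStreamReader;
import es.rickyepoderi.wbxml.stream.WbXmlStreamReader;
// Si is the JAXB class generated by xjc from the SI DTD (import it from its xjc package)

// read the wbxml input file from a StAX stream
InputStream in = new FileInputStream("si-001.wbxml");
XMLStreamReader xmlStreamReader = new WbXmlStreamReader(in);
// create the object from the StAX stream reader (unmarshal returns Object)
JAXBContext jc = JAXBContext.newInstance(Si.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
Si si = (Si) unmarshaller.unmarshal(xmlStreamReader);
// create a default marshaller for XML output
OutputStream out = new FileOutputStream("si-001.xml");
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.marshal(si, out);
```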
Please note that, to keep the examples short, no stream is closed in them. And that's all! As usual, I have created a new project in my GitHub space. I know there is plenty of room for improvement, but I spent so many hours understanding and coding this annoying format that I wanted to share it with all of you.
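If you do want to close everything, keep in mind that XMLStreamWriter (like XMLStreamReader) is not AutoCloseable and, according to its Javadoc, its close() method must not close the underlying stream, so both have to be closed. A minimal sketch, assuming try-with-resources for the file and a finally block for the writer; the JDK's XMLOutputFactory stands in for new WbXmlStreamWriter(out, definition), and ClosingSketch with its methods are hypothetical names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example of closing both the StAX writer and its stream.
class ClosingSketch {

    static void writeTo(File file) throws IOException, XMLStreamException {
        // try-with-resources closes the FileOutputStream even if writing fails
        try (OutputStream out = new FileOutputStream(file)) {
            // stand-in for: new WbXmlStreamWriter(out, definition)
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            try {
                writer.writeStartDocument("UTF-8", "1.0");
                writer.writeEmptyElement("goodbye");
                writer.writeEndDocument();
                writer.flush();
            } finally {
                // frees the writer's resources; does not close 'out'
                writer.close();
            }
        }
    }

    // Writes to a temporary file and returns its content as text.
    static String demo() {
        try {
            File file = File.createTempFile("goodbye", ".xml");
            file.deleteOnExit();
            writeTo(file);
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same pattern applies to the reading side: close the XMLStreamReader and, separately, the FileInputStream it was created from.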
<goodbye/>