Saturday, September 28. 2013
WBXML stream: Version 0.1.0
As you may remember, I was developing a WBXML parser and encoder for Java. This was motivated by another idea that I think is not going to be productive but, at least, I wanted to complete this first step. wbxml-stream is the result of all this work: a StAX (Streaming API for XML) implementation for Java SE that deals with WBXML (WAP Binary XML). This week the first version, 0.1.0, was released, and it is a complete StAX implementation.
WBXML is a standard defined by the Open Mobile Alliance in this document. The Java library is based on the C project libwbxml, which is the only other implementation of the WBXML format that I know of. Nevertheless, the Java library is quite different in its design. Its main features are the following:
In WBXML all the common XML constructs (elements, attributes, ...) are mapped to integer tokens (that is why it is called binary). This mapping, together with some other information, makes up what is called a language (SyncML and Wireless Village are examples of languages).
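To make the token idea concrete, here is a minimal sketch that hand-encodes `<root><child>text</child></root>`. The header layout (version, public id, charset, string table length), the global tokens END (0x01) and STR_I (0x03), and the 0x40 "has content" bit on tag tokens come from the WBXML specification; the tag tokens 0x05 and 0x06 belong to a made-up language, and the class name is hypothetical. It is not how wbxml-stream is written, it only shows the resulting bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TinyWbxmlEncoder {

    // Global tokens from the WBXML specification.
    static final int END = 0x01;
    static final int STR_I = 0x03;
    // Bit set on a tag token when the element has content.
    static final int HAS_CONTENT = 0x40;

    // Hypothetical language: tag tokens invented for this example only.
    static final int TAG_ROOT = 0x05;
    static final int TAG_CHILD = 0x06;

    // Encodes <root><child>TEXT</child></root> with the tokens above.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: WBXML 1.3, unknown public id, UTF-8 (MIBenum 106), empty string table.
        out.write(0x03);
        out.write(0x01);
        out.write(0x6A);
        out.write(0x00);
        out.write(TAG_ROOT | HAS_CONTENT);   // <root>
        out.write(TAG_CHILD | HAS_CONTENT);  // <child>
        out.write(STR_I);                    // inline string, NUL-terminated
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
        out.write(0x00);
        out.write(END);                      // </child>
        out.write(END);                      // </root>
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 03 01 6A 00 45 46 03 74 65 78 74 00 01 01
        System.out.println(hex(encode("text")));
    }
}
```

The 32 characters of textual XML become 14 bytes, mostly because every tag name is replaced by a single token byte and every end tag by END.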
The main feature of this implementation is that it is generic, i.e., valid for any WBXML language. The wbxml-stream project loads each language from a properties definition file, and the implementation does not deal with any language-specific tricks (something that happens all the time in libwbxml). The definition part is managed in the library by the es.rickyepoderi.wbxml.definition package.
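The generic approach can be sketched with java.util.Properties: the engine only consults lookup tables built from the file, so supporting another language means writing another file, not more code. The key format below (`name`, `tag.<hex token>`) and the class name are invented for this example; they are not the real wbxml-stream definition format, which is documented in the project wiki.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesDefinition {

    // Hypothetical definition file: "tag.<token in hex>=<element name>".
    // This is NOT the wbxml-stream file format.
    static final String SAMPLE = String.join("\n",
            "name=Dummy 1.0",
            "tag.05=root",
            "tag.06=child");

    final String name;
    final Map<String, Integer> tokenByTag = new HashMap<>();
    final Map<Integer, String> tagByToken = new HashMap<>();

    private PropertiesDefinition(Properties props) {
        name = props.getProperty("name");
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("tag.")) {
                int token = Integer.parseInt(key.substring("tag.".length()), 16);
                String tag = props.getProperty(key);
                tokenByTag.put(tag, token);   // used when encoding
                tagByToken.put(token, tag);   // used when parsing
            }
        }
    }

    // Parses a definition from text; a real loader would read a file or classpath resource.
    static PropertiesDefinition fromString(String source) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PropertiesDefinition(props);
    }

    public static void main(String[] args) {
        PropertiesDefinition def = fromString(SAMPLE);
        System.out.println(def.name + ": child -> 0x"
                + Integer.toHexString(def.tokenByTag.get("child")));
        System.out.println("0x05 -> " + def.tagByToken.get(0x05));
    }
}
```

A generic encoder or parser would receive such an object and never contain a language name in its own code.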
In version 0.1.0 some language definitions are located and loaded by default at initialization (ActiveSync, CONML, DevInf 1.1, DevInf 1.2, DMDDF 1.2, drmrel 1.0, EMN 1.0, OTA, PROV 1.0, SI 1.0, SL 1.0, SyncML 1.1, SyncML 1.2 and WV CSP 1.1). If you need to add a new definition, the project wiki explains how a definition file is organized and presents a working example of a new dummy one.
The main idea behind parsing and encoding is that an intermediate Java structure is used. When a WBXML document is read (parsed), those intermediate objects are created first (an in-memory representation of the WBXML document) and then the StAX reader iterates over that structure. In a writing (encoding) process the operations are reversed: the StAX writer builds the object representation and writes the binary stream at the end.
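The two-phase read can be sketched as a tiny decoder that first builds an in-memory tree from the bytes and then walks it. It only understands elements with content and inline strings (no attributes, string table, code-page switches or multi-byte integers), the tag tokens 0x05 and 0x06 belong to a made-up language, and the class and method names are hypothetical, not the es.rickyepoderi.wbxml.document API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TreeDecoder {

    // In-memory node: an element (name + children) or a text node.
    static final class Node {
        final String name;   // null for text nodes
        final String text;   // null for element nodes
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    private static final int END = 0x01;
    private static final int STR_I = 0x03;

    private final byte[] data;
    private final Map<Integer, String> tags;
    private int pos;

    TreeDecoder(byte[] data, Map<Integer, String> tags) {
        this.data = data;
        this.tags = tags;
    }

    // Phase 1: build the whole tree in memory.
    Node parse() {
        pos = 4; // assumption: 4 single-byte header fields and an empty string table
        return parseElement();
    }

    private Node parseElement() {
        int token = data[pos++] & 0xFF;
        if ((token & 0x80) != 0) {
            throw new IllegalStateException("attributes not supported in this sketch");
        }
        String name = tags.get(token & 0x3F);
        if (name == null) {
            throw new IllegalStateException("unknown tag token " + token);
        }
        Node node = new Node(name, null);
        if ((token & 0x40) != 0) { // content follows until END
            while ((data[pos] & 0xFF) != END) {
                if ((data[pos] & 0xFF) == STR_I) {
                    pos++;
                    node.children.add(new Node(null, readInlineString()));
                } else {
                    node.children.add(parseElement());
                }
            }
            pos++; // consume END
        }
        return node;
    }

    private String readInlineString() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (data[pos] != 0x00) {
            buf.write(data[pos++]);
        }
        pos++; // skip the NUL terminator
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Phase 2: iterate over the in-memory tree (here to print XML, without escaping).
    static void toXml(Node node, StringBuilder out) {
        if (node.name == null) {
            out.append(node.text);
            return;
        }
        out.append('<').append(node.name).append('>');
        for (Node child : node.children) {
            toXml(child, out);
        }
        out.append("</").append(node.name).append('>');
    }

    public static void main(String[] args) {
        byte[] wbxml = {0x03, 0x01, 0x6A, 0x00, 0x45, 0x46, 0x03,
                        't', 'e', 'x', 't', 0x00, 0x01, 0x01};
        Node root = new TreeDecoder(wbxml, Map.of(0x05, "root", 0x06, "child")).parse();
        StringBuilder out = new StringBuilder();
        toXml(root, out);
        System.out.println(out); // <root><child>text</child></root>
    }
}
```

In the library the second phase would feed StAX events instead of building a string, but the cost is the same: the full tree lives in memory before the first event is delivered.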
I know this approach uses more memory and is not a real streaming (StAX) way of working. But WBXML is quite a weird format (please check the PDF linked above) and several of its characteristics convinced me to keep it simple at first (removing this handicap could be a good improvement for future versions).
The objects of this intermediate representation live in the es.rickyepoderi.wbxml.document package. WbXmlEncoder and WbXmlParser are the main classes to encode and parse a WBXML document, respectively.
Finally, the package es.rickyepoderi.wbxml.stream contains the StAX implementation. I tried to cover the standard fully: there are implementations of XMLStreamReader, XMLStreamWriter, XMLEventReader, XMLEventWriter, XMLInputFactory and XMLOutputFactory. Using the factories is the recommended way to create readers and writers; see some examples in the wiki pages.
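Because everything sits behind the standard interfaces, code written against XMLStreamReader does not care where the events come from. The sketch below uses the JDK's default XMLInputFactory on textual XML; for WBXML input you would obtain the wbxml-stream factory instead. StAX lets you select an implementation (for example through the javax.xml.stream.XMLInputFactory system property), but the exact wbxml-stream factory class name is not given here, so check the wiki. The class name GenericStaxConsumer is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class GenericStaxConsumer {

    // Uses only the XMLStreamReader interface, so it works the same whether
    // the reader parses textual XML or decodes WBXML.
    static List<String> elementNames(XMLStreamReader reader) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // JDK default factory on textual XML, standing in for a WBXML-aware one.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(
                new StringReader("<root><child>text</child></root>"));
        try {
            System.out.println(elementNames(reader)); // [root, child]
        } finally {
            reader.close();
        }
    }
}
```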
For some time I planned to implement only the stream classes and use the internal Java implementation for the event reader and writer (an event reader can be built on top of a stream reader), but I finally decided to go all the way. The library depends only on Java SE; no other dependency is needed. Even logging is done with the java.util.logging classes.
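The shortcut mentioned here (later replaced by a full implementation) relies on a standard StAX feature: XMLInputFactory.createXMLEventReader(XMLStreamReader) wraps any stream reader in an event reader. A minimal sketch with the JDK's default factory and textual XML; the class name is illustrative, and a WBXML stream reader would be passed in the same way.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

public class EventOverStream {

    // Counts start-element events seen through an event reader that is
    // layered over an existing stream reader.
    static int countStartElements(XMLInputFactory factory, XMLStreamReader streamReader)
            throws XMLStreamException {
        XMLEventReader events = factory.createXMLEventReader(streamReader);
        int count = 0;
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    count++;
                }
            }
        } finally {
            events.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader streamReader = factory.createXMLStreamReader(
                new StringReader("<root><child>text</child></root>"));
        System.out.println(countStartElements(factory, streamReader)); // 2
    }
}
```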
I think this first version is more or less usable. It contains several languages (only a few of the languages that libwbxml supports are not defined in wbxml-stream) and it is quite well tested (all the XML and WBXML files that libwbxml uses for testing are also tested with wbxml-stream). Besides, I have been using it against a Microsoft Exchange Server (ActiveSync language) without any communication problems.
If you need it, please go ahead and use it, and share your feedback with me. Remember that WBXML is a very odd format and it is almost impossible to deal with it without understanding its specification. So, although wbxml-stream lets you manage WBXML documents exactly like XML ones, it is advisable to read the standard first. If you have to define a new language, it is compulsory. If you detect any bug or problem, please let me know through this blog or GitHub.
Enjoy wbxml-stream!
Saturday, September 15. 2012
WBXML stream: First bits
Another of my new initiatives is a Java streaming implementation for the WBXML format. WAP Binary XML (WBXML) is a binary representation of XML designed to transmit documents more compactly over mobile networks. The format is maintained by the Open Mobile Alliance and its complete specification can be downloaded from their website. Currently several mobile languages use it, for example SyncML (Synchronization Markup Language), WML (Wireless Markup Language) or WV (Wireless Village), and, although it is not very common, I need it for another idea.
On the other hand, Java SE (versions 6 and 7) defines several technologies to deal with XML documents, such as SAX (Simple API for XML), DOM (Document Object Model) and StAX (Streaming API for XML). They are different ways of reading, writing and managing XML documents. But, more interestingly, all of them can be used together with JAXB (Java Architecture for XML Binding) to construct Java objects from XML documents and vice versa. I admit that all the XML stuff in Java is horribly complicated, but the final JAXB idea is absolutely awesome. Remember when I developed some RESTful Web Services via JSON: integrating the parsing and encoding part was easy, and all of that was because JAX-RS uses JAXB.
So the idea of this project is simple: implementing a StAX reader and writer that let us encode and parse WBXML documents. Why StAX? Just because it is simple, modern and I have worked with it before. I have to say that I used the current libwbxml C implementation intensively to understand the confusing specification (as you can see, it is also an old project).
The library is currently divided into several packages:
es.rickyepoderi.wbxml.definition: This package is used to define known WBXML languages. The format specification (not very friendly, I have to say) assigns numerical identifiers to XML tags and attributes (among other things). So SyncML or any other language has to be defined in order to parse and encode documents of that type. I decided to use a plain properties file for every language definition. The definition file is quite complicated to understand but, please, blame the WBXML specification and not me. Finally, the current initialization class loads all the properties files at startup, but I admit that class is over-elaborated and I will surely change the way it works.
es.rickyepoderi.wbxml.document: All the classes in this package form an in-memory representation of a WBXML document. Following the libwbxml idea, all the parsing and encoding first goes through this Java (memory or object) representation, and then the StAX part uses it to read or write the XML.
es.rickyepoderi.wbxml.stream: Finally, this package implements an XMLStreamReader and an XMLStreamWriter for StAX. For the moment only those two classes are implemented (no event reader or writer, no factories).
es.rickyepoderi.wbxml.tools: Mainly an application that provides Java equivalents of the wbxml2xml and xml2wbxml binaries. The Converter class lets us transform an XML file into WBXML and back. It is used to test my implementation against its C counterpart.
I think the implementation is still very weak, but I have successfully used it to convert some XML files to WBXML and vice versa, using libwbxml to check the results (the wbxml2xml binary can decode my WBXML files, my Java program can decode WBXML files encoded with the xml2wbxml binary, and the same applies to XML files). So far I have tested the SyncML, WV and SI (Service Indication) languages.
Here is the first example of how my library works, writing the WBXML representation of a SyncML DOM document:
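```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stax.StAXResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import es.rickyepoderi.wbxml.definition.WbXmlDefinition;     // wbxml-stream (package assumed)
import es.rickyepoderi.wbxml.definition.WbXmlInitialization; // wbxml-stream (package assumed)
import es.rickyepoderi.wbxml.stream.WbXmlStreamWriter;       // wbxml-stream (package assumed)

// read the XML using DOM
InputStream in = new FileInputStream("syncml-001.xml");
DocumentBuilderFactory domFact = DocumentBuilderFactory.newInstance();
domFact.setNamespaceAware(true);
domFact.setIgnoringElementContentWhitespace(true);
DocumentBuilder domBuilder = domFact.newDocumentBuilder();
Document doc = domBuilder.parse(in);
Element element = doc.getDocumentElement();
// locate the definition of the WBXML using the root element
WbXmlDefinition definition = WbXmlInitialization.getDefinitionByRoot(
        element.getLocalName(), element.getNamespaceURI());
// create the StAX stream writer using the definition
OutputStream out = new FileOutputStream("syncml-001.wbxml");
XMLStreamWriter xmlStreamWriter = new WbXmlStreamWriter(out, definition);
// create a transformer to convert DOM into StAX
Transformer xformer = TransformerFactory.newInstance().newTransformer();
Source domSource = new DOMSource(doc);
StAXResult staxResult = new StAXResult(xmlStreamWriter);
xformer.transform(domSource, staxResult);
```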
Besides, I have created some JAXB classes (using the xjc command over the DTDs of some simple languages like SI) and they can be used without problems. For example, these sample lines load an SI element from a WBXML file and write it into an XML one:
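```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLStreamReader;
import es.rickyepoderi.wbxml.stream.WbXmlStreamReader; // wbxml-stream (package assumed)

// read the wbxml input file from a StAX stream
InputStream in = new FileInputStream("si-001.wbxml");
XMLStreamReader xmlStreamReader = new WbXmlStreamReader(in);
// create the object from the StAX stream reader (Si is the xjc-generated class)
JAXBContext jc = JAXBContext.newInstance(Si.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
Si si = (Si) unmarshaller.unmarshal(xmlStreamReader);
// create a default marshaller for XML output
OutputStream out = new FileOutputStream("si-001.xml");
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.marshal(si, out);
```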
Please note that no stream object is closed in the examples. And that's all! As usual, I created a new project in my GitHub space. I know there is plenty of room for improvement, but I spent so many hours understanding and coding this annoying format that I wanted to share the news with all of you.
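Since the examples leave their streams open, here is a minimal sketch of the usual closing pattern. Per the StAX javadoc, XMLStreamWriter.close() (like XMLStreamReader.close()) frees the writer's resources but does not close the underlying stream, so the byte stream needs its own try-with-resources. It uses the JDK's default XMLOutputFactory on textual XML; with wbxml-stream the writer would come from its own factory or constructor instead, and the class name here is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ClosingStreams {

    // Writes a small document and always closes the StAX writer.
    static void writeDocument(OutputStream out) throws XMLStreamException {
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        try {
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("root");
            writer.writeCharacters("text");
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
        } finally {
            // releases the writer but leaves 'out' open
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources closes the underlying stream itself
        try (OutputStream out = buffer) {
            writeDocument(out);
        }
        System.out.println(buffer.toString("UTF-8"));
    }
}
```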
<goodbye/>