Saturday, February 8. 2014
RTC Data Channels
Today's entry is again about Web Real-Time Communication (WebRTC), the new API that enables P2P video and audio between browsers. In two previous entries the WebRTC demo application was presented (a Python web application developed by Google and Mozilla to test their respective implementations and the interoperability between them), and then the sample application was modified to use Glassfish 4 instead of Python as the server (Glassfish 4 was chosen because it ships the WebSocket implementation standardized by JavaEE 7). This third entry of the series introduces a new concept: RTC data channels.
Obviously, in common WebRTC use (audio and video communication) there are specific channels to send and receive the streaming audio and video information, but the specification also provides a way to obtain a general-purpose, bi-directional P2P data channel to send arbitrary information. Since WebRTC aims to replace previous communication applications, this type of channel is absolutely necessary (sending files, chat applications,...). Here you have a very detailed link about data channels on bloggeek.me if further information is needed.
I started testing this feature with the following simple application. It is a single HTML file which creates two RTCPeerConnection objects and establishes a data channel between them for sending text and files. The same page represents both ends (the sender and the receiver). The code is written for Chrome, but with minimal changes it can also be tested in Firefox.
After testing with that simple example, I decided to enhance my Glassfish application to create a chat system. There is now a new chat.xhtml page which creates the RTC data channels (the RTCPeerConnection is established in the same way as before), and the channel is used to send the messages between the peers. The application works well inside my home network between two Firefox browsers (iceweasel 24.2 or current Firefox 26.0) and between two Chromium browsers (31.0.1650.63), but it does not work when the browsers are mixed: when the channel is opened on one side, the callback on the other side is never triggered, so the connection is never established between different browsers. The little application can also exchange files between the peers, but each file is transmitted as a single chunk, which produces errors when the file is big enough. If you recheck the previous link from bloggeek.me, there is a maximum transmission data size in Chrome, and if the file is too big, errors are displayed (Firefox seems to support bigger chunks, but problems are reported there too).
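The chunk-size problem above has a straightforward fix: split the file into pieces small enough for the data channel and reassemble them on the other side. Here is a minimal sketch of that chunking logic in plain JavaScript; the 16 KB chunk size and the function names are assumptions of mine, not part of the apprtc code.

```javascript
// Minimal sketch of chunked file transfer over a data channel.
// CHUNK_SIZE is an assumption (~16 KB was a commonly safe message
// size for Chrome data channels at the time); adjust as needed.
const CHUNK_SIZE = 16 * 1024;

// Split a Buffer (the file contents) into chunks small enough to send.
function toChunks(buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// Reassemble the received chunks back into the original file.
function fromChunks(chunks) {
  return Buffer.concat(chunks);
}
```

In the real page each chunk would be sent with channel.send(chunk) and collected in the receiver's onmessage callback before reassembly.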
Although the application is very simple and a bit sloppy, it is a good example of an RTC data channel. Here is a video where two browsers inside the same laptop connect using the Glassfish apprtc application and start chatting. The process is exactly the same as in the video example: one user connects to the page and waits there for the other partner to join; the second user connects to the same URL (same room); the negotiation starts using the Glassfish server as intermediary; when the peer connection is established, a data channel between them is created; then chatting and exchanging files can proceed.
Some time ago I presented some entries about integrating VoIP in a portal (Skype first and then an open-source Pidgin solution); if you recheck the current series about WebRTC, it is now possible to replace that off-browser idea with a browser-only solution. The current implementations and the specification itself are not mature: WebRTC should be fully implemented by the major browsers and interoperability should be assured. But the apprtc application, although simple and sloppy, is the perfect proof that all the required features are covered by the specification (video, audio, chat, exchanging files,...). The browser is the VoIP device, and the server (Glassfish in my example) is just a way of making the peers know each other. I suppose that sooner or later some JS projects will appear implementing a full WebRTC device or, at least, making WebRTC easier to integrate into a final project. Here you have the modified apprtc project for Glassfish with the chat page integrated.
Readapting an old slogan: the browser is the computer.
Friday, July 26. 2013
The WebRTC Demo in Glassfish 4
Today, after a long time without a typical entry, I am going to continue with the new WebRTC browser API while exploring the new possibilities of JavaEE 7. If you remember, in a previous entry the apprtc application (a sample application by Google and Mozilla to test WebRTC interoperability between their browsers) was installed and tested inside Google App Engine (Python). But some weeks ago the new Glassfish 4.0 was released with all the new JavaEE 7 features. This entry is going to use two of them, JSONP (JSON Processing) and the WebSocket API, to replicate the server functionality of that application. This way the apprtc application (with minimal changes) will work using Glassfish 4.0 instead of Google App Engine as the server-side component.
Only three classes were developed to replace the main python functionality:
es.rickyepoderi.apprtc.bean.AppRTCBean: A simple request-scoped JSF bean that calculates all the JavaScript variables needed to render the main HTML page. Things like the room key, the username,... are calculated and inserted in the page source just as it is done in the Python counterpart.
es.rickyepoderi.apprtc.server.RoomEndpoint: The WebSocket endpoint which receives the client messages. In WebRTC both clients use the server just to exchange negotiation information with the other peer; after the negotiation the communication is P2P (browser to browser, more specifically). In the original application this client/server communication was done using Google Channel, but obviously the Java implementation uses WebSockets (this standard was also introduced in another entry of the blog, but keep in mind that now the Java API is also standardized by JavaEE 7).
es.rickyepoderi.apprtc.ejb.RoomBean: Initially this class was going to be a Singleton EJB, but the bean was not injected into the WebSocket endpoint (I suppose it is a bug, because I think injection inside an endpoint is supported by JavaEE 7), and therefore it was transformed into a common singleton object. This class handles the room management. The apprtc is based on the idea of rooms: each room has an identifier (the room key); two users can join the same room; once both are in the room the audio/video call can start; when one of them hangs up, he is disconnected from the room.
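The room bookkeeping described above can be sketched in a few lines. The real RoomBean is Java; this JavaScript version (kept in the same language as the rest of the examples in this series) just illustrates the logic, and the class and method names are mine, not the real API.

```javascript
// Sketch of the room bookkeeping done by a class like RoomBean:
// rooms hold at most two users, keyed by room key; each user is
// identified by its websocket id.
class RoomRegistry {
  constructor() {
    this.rooms = new Map(); // roomKey -> Map(socketId -> username)
  }

  // Join a room; the second user to arrive is the initiator.
  join(roomKey, socketId, username) {
    let room = this.rooms.get(roomKey);
    if (!room) {
      room = new Map();
      this.rooms.set(roomKey, room);
    }
    if (room.size >= 2) {
      return { joined: false }; // room full
    }
    room.set(socketId, username);
    return { joined: true, initiator: room.size === 2 };
  }

  // Return the socket id of the other peer in the room, if any.
  otherPeer(roomKey, socketId) {
    const room = this.rooms.get(roomKey);
    if (!room) return null;
    for (const id of room.keys()) {
      if (id !== socketId) return id;
    }
    return null;
  }

  // Remove a user when its websocket closes; drop empty rooms.
  leave(roomKey, socketId) {
    const room = this.rooms.get(roomKey);
    if (!room) return;
    room.delete(socketId);
    if (room.size === 0) this.rooms.delete(roomKey);
  }
}
```

The otherPeer lookup is what lets the endpoint forward a message arriving on one websocket to the partner in the same room.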
The HTML pages (index.html and full.html) were transformed into Facelets XHTML with minimal changes, and the main.js (where all the JavaScript resides) was slightly modified. Google Channel and WebSockets are very similar APIs, so the changes are not very complex.
The Java version works in the following way (I am going to repeat the details explained in the previous entry, but now some details of the implementation are described):
When the first user enters, the JSF bean generates a random room key and a random username. The client browser opens a WebSocket and joins that room (this part is different: now a join message is sent from the client to inform the server about the room key and the username). The browser also requests access to mic and camera as before. Because this client is the first one in the room, he waits for the other partner to join (variable initiator).
The second user accesses the application but specifies the same room key (a parameter r is used for that purpose). The JSF bean keeps the room key and generates another random username for this second client. The browser opens the WebSocket and sends the join message too. This way the server endpoint receives the second join for the same room, so the room is full and ready to start the video/audio call. Local resources (mic and camera) are requested by the second browser as usual.
The last browser to join receives a variable initiator=1 (created by the JSF bean). This variable marks that there was another user in the room and, therefore, it can start to exchange the negotiation messages. So the second browser collects the peer connection data and sends it to the server (an offer message).
The server receives the offer via the WebSocket. It locates the room using the WebSocket ID and sends the info to the other partner (in the two previous joins both usernames and WebSocket IDs were linked to the room).
The first browser receives the info from the server (via WebSocket) and replies with an answer message. This message is again retransmitted by the server to the other browser.
At this point some messages are exchanged between both browsers following the same idea. In this conversation the details needed to establish the P2P call are agreed. Although the server and the WebSockets are a key part of this step, the server just retransmits what it receives from one browser's WebSocket to the other and vice versa.
At the end of this exchange the P2P communication between the browsers is established and the audio/video call starts.
When a user hangs up the call, the WebSocket is closed; the server detects that, removes the user from the room and sends a bye message to the other user. The other browser in turn receives the message and returns to the initial state: it stays in the room waiting for another user to join. As soon as it leaves the page, its WebSocket is closed and it abandons the room too.
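The relay role the server plays in the steps above can be sketched with mock sockets: the endpoint never interprets the offer/answer/bye payloads, it just forwards them to the other peer in the room. The function and property names below are illustrative, not the real endpoint API.

```javascript
// Sketch of the signaling relay described above. The sockets are
// plain objects with a send() method standing in for real websocket
// sessions; the server forwards every message to the other peer.
function makeRelay() {
  const rooms = new Map(); // roomKey -> array of sockets (max 2)

  return {
    // Register a socket in a room (called on the join message).
    join(roomKey, socket) {
      const peers = rooms.get(roomKey) || [];
      peers.push(socket);
      rooms.set(roomKey, peers);
    },
    // Forward a signaling message (offer, answer, candidate, bye)
    // from `sender` to the other peer in the same room.
    relay(roomKey, sender, message) {
      const peers = rooms.get(roomKey) || [];
      for (const peer of peers) {
        if (peer !== sender) peer.send(JSON.stringify(message));
      }
    },
  };
}
```

This is the whole server-side "intelligence" during negotiation: once the browsers agree on the connection details, the relay is no longer involved.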
Here I present a video with the commented behavior. The first browser (main screen) accesses the application specifying a fixed room (r=00000000). Local resources are requested and accepted, and the join message is sent to the server; this can be seen in the Chromium console and in the Glassfish logs. As soon as the second browser (little window in the bottom-right corner) joins the same room, the video call starts. Then the first browser hangs up the call, and the second one receives the bye message and returns to the initial state (waiting for another partner to join that room). The Chromium console receives all the messages perfectly, and the Glassfish server performs the retransmissions from one browser to the other correctly too.
With this entry I wanted to test two new features of JavaEE 7: JSONP and the WebSocket API. Both APIs are used in the WebRTC application for the client/server communication (a WebSocket is used to transmit the JSON messages). Besides, implementing my own WebRTC server made me understand even better how this technology works. I always hate the typical hello-world examples; do you want a real and modern example for WebSockets and JSONP? Here you have one. The complete NetBeans project for the Glassfish apprtc version can be downloaded from here. This time I only tested the application with Chromium (version 28) because the iceweasel package is incredibly old in Debian testing (come on guys, iceweasel in testing is the same version as in stable) and I did not want to install a new Firefox (so take care with that).
This was my first contact with JavaEE 7. See you!
Saturday, February 16. 2013
Testing WebRTC
For the last few months I have been following a new project closely related to HTML5: WebRTC (Web Real-Time Communication). WebRTC is an open project with the aim of converting the web browser into a Real-Time Communications (RTC) device via JavaScript APIs. You already know that I talked before in this blog about VoIP and the Web in an interesting (at least for me) two-entry series. WebRTC is being standardized by the W3C (World Wide Web Consortium). It is backed mainly by Google, although Firefox and Opera quickly joined, developing their own implementations. Microsoft was reluctant at first but has finally adopted WebRTC, although, as usual, doing it their own way. Microsoft's result is CU-WebRTC, which in general is WebRTC but with different codecs and other minor changes (do you see any similarity to the HTML5 video tag problems?). On this subject Apple, with Safari, is currently missing in action, but it is easy to guess what their position will be.
When I first heard about WebRTC there was a good introduction on YouTube and, although things are changing very quickly, I still recommend it to get a brief understanding of the idea. Now things are getting more and more specific but, if you check the main WebRTC page, all the JS functions are still prefixed in all browser implementations (webkit*, moz* and so on) because there is no definitive standard API yet. When I started to look at the project I did not understand the whole interaction among all the different elements; my doubts were mainly about the web server's role.
Last week I realized that the Firefox and Chrome developers are working together to achieve real interoperability; there is even a demo application to test both implementations. I have been studying how the application works and what the real roles of browsers and servers are in a WebRTC implementation.
First I am going to comment on the two different parts that the WebRTC API is divided into:
The JavaScript getUserMedia API, which allows the browser to access the camera and microphone hardware. The returned stream can simply be displayed locally or sent over the network.
The PeerConnection API, which allows browsers to establish direct peer-to-peer (P2P) socket connections between them. The connection joins one browser directly to another, and they exchange data directly. P2P is very useful for high-bandwidth protocols like video and audio, because the server is freed from dealing with large amounts of data. Obviously, concepts like STUN servers or media proxies have to be handled in the underlying implementation of the API.
Both parts are needed to get WebRTC in your browser, but they can be considered as different parts of the same API. After studying the demo application, it works as follows:
As soon as a user enters the application, a room is assigned (a numeric identifier which is managed as a web parameter). At this point the browser gets access to the local resources (camera and mic), an operation which asks the user for permission (the getUserMedia part). It also creates a peer connection offer which is sent to the server (the PeerConnection part). This offer, I suppose, contains things like IPs, codecs, and everything needed to establish a P2P connection.
The partner browser enters the application specifying the same room (same parameter r=<ROOM_ID>) and performs the same operations (it accesses the local resources and sends another offer to the server). But this time the server finds that there is another user in the same room, and both offers are exchanged.
At this point there is a PeerConnection negotiation to decide how the communication between both browsers is going to work, with the server as the man in the middle which lets both partners interact (both partners know the server and can talk through it). Each offer is exchanged between the browsers and all the decisions needed to establish a P2P connection are agreed (I do not know exactly what is decided here). The application uses Google Channel for the communication between browsers and server.
After the negotiation both browsers can interact directly with each other and the video call (sound and video) is performed without any server interaction.
Finally, when a user leaves the page or hangs up the call, the browser communicates again with the server to inform it that this user has left the room (the server deletes the user from the room).
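The room assignment in the first step above (reuse the key from the r parameter, or generate one for the first visitor) can be sketched as a tiny helper. The 8-digit format follows the r=00000000 style of keys seen in the demo; the function name is mine.

```javascript
// Sketch of the demo's room assignment: take the room key from the
// ?r= query parameter when present, otherwise generate a random
// 8-digit key for the first visitor (who then gets redirected to
// the URL carrying that key).
function roomKeyFromUrl(urlString) {
  const url = new URL(urlString);
  const key = url.searchParams.get('r');
  if (key) return key;
  // Random 8-digit key, zero-padded.
  return String(Math.floor(Math.random() * 1e8)).padStart(8, '0');
}
```

The second browser simply uses the redirected URL, so both peers end up calling this with the same key and land in the same room.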
With this application I have finally understood the role of the server in WebRTC: it is the one that connects two particular clients/browsers (so concepts like presence, which were commented on in the previous entries about VoIP, are now the responsibility of the server application). Now I am going to comment on the steps I performed to test the demo:
The application was checked out:
$ svn checkout http://webrtc-samples.googlecode.com/svn/trunk/apprtc/
The demo works with the google app engine (python version), so I downloaded the SDK for linux and unzipped it.
$ unzip -q ~/Desktop/google_appengine_1.7.4.zip
The application was started but listening in all interfaces of my box:
$ google_appengine/dev_appserver.py --address=0.0.0.0 apprtc/
To test the application, Firefox nightly (FF21) and Chrome beta (Chrome 25) are said to be required. In theory Chrome 23 (version 21 added the getUserMedia API and version 23 the PeerConnection one) and Firefox 18 (the version that added all the WebRTC code to the Mozilla browser) already have the full WebRTC API, but it is clear that both implementations are very immature and are evolving version by version (take into consideration that the standard itself is only a draft right now). The application has some tweaks to make the conversation possible.
So Chrome 25 (beta channel right now) and Firefox 21 (nightly at the moment) were installed on my old laptop (the new one does not have a camera) and on my desktop machine.
$ /opt/google/chrome/chrome --version
Google Chrome 25.0.1364.68
$ ./firefox --version
Mozilla Firefox 21.0a1
Launch Chrome 25 in one box and access the application (http://192.168.1.131:8080 in my case). As soon as you enter, the browser is redirected with a room identifier parameter (http://192.168.1.131:8080?r=<ROOM_ID>). The browser asks permission to access the camera and microphone and waits for the other partner to join.
Launch Firefox 21 in the other box and access the application using the same room id (put the URL with the parameter directly, http://192.168.1.131:8080?r=<ROOM_ID>). Firefox asks for permissions too, and as soon as it enters, the communication starts.
Firefox needs two options enabled in the about:config page: media.navigator.enabled and media.peerconnection.enabled. In the downloaded version 21 only the second one was not set by default.
Here is the video:
Today's entry is a little explanation of the new WebRTC project. As I commented before, I did not understand it completely at the beginning, and this entry dissects the new demo application just to review its basics. I think it is clearer for me now, and I hope I have helped you understand it too. Obviously WebRTC is emerging as a new HTML5 feature and (as with many of the previous features) dispute is assured among the major players.
Web Wars forever!