Saturday, February 16. 2013
Testing WebRTC
Over the last few months I have been following a new project closely related to HTML5: WebRTC (Web Real-Time Communication). WebRTC is an open project with the aim of turning the web browser into a Real-Time Communications (RTC) device via JavaScript APIs. You already know that I have talked before in this blog about VoIP and the Web in an interesting (at least for me) two-entry series. WebRTC is being standardized by the W3C (World Wide Web Consortium). It is backed mainly by Google, although Mozilla and Opera quickly joined, developing their own implementations. Microsoft was reluctant at first but has finally embraced the idea, although, as usual, in its own way. Microsoft's counter-proposal is CU-RTC-Web, which in general covers the same ground as WebRTC but changes the codecs and other details (does this remind you of the HTML5 video tag codec disputes?). On this subject Apple, with Safari, is currently missing in action, but it is easy to guess what its position will be.
When I first learned about WebRTC there was a good introduction on YouTube and, although things are changing very quickly, I still recommend it to get a brief understanding of the idea. Things are now getting more and more specific but, if you check the main WebRTC page, all the JS functions are still vendor-prefixed in every browser implementation (webkit*, moz* and so on) because there is no definitive standard API yet. When I started to look at the project I did not understand the whole interaction among all the different elements; my doubts were mainly about the role of the web server.
Last week I realized that the Firefox and Chrome developers are working together to achieve real interoperability; there is even a demo application to test both implementations against each other. I studied how the application works and what the real roles of the browsers and the server are in a WebRTC deployment.
First I am going to describe the two parts that the WebRTC API is divided into:
The JavaScript getUserMedia API, which allows the browser to access the camera and microphone hardware. The returned stream can simply be displayed locally or sent over the network.
The PeerConnection API, which allows browsers to establish direct peer-to-peer (P2P) connections between them. The connection joins one browser directly to another so they can exchange data without intermediaries. P2P is very useful for high-bandwidth media like video and audio, because the server is freed from relaying large amounts of data. Obviously, concepts like STUN servers or media proxies have to be handled by the underlying implementation of the API.
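Since the API is still vendor-prefixed, the first step in any demo right now is resolving the prefixed names. This is a minimal sketch with a hypothetical helper of my own (`resolvePrefixed` is not part of any API); the mock object at the end only stands in for `navigator` so the logic can be exercised outside a browser:

```javascript
// Hypothetical helper: return the first implementation a given object
// actually exposes among a list of (possibly vendor-prefixed) names.
function resolvePrefixed(scope, names) {
  for (var i = 0; i < names.length; i++) {
    if (scope[names[i]]) return scope[names[i]];
  }
  return null;
}

// In a real browser you would resolve both halves of the API like this:
// var getUserMedia = resolvePrefixed(navigator,
//     ['getUserMedia', 'webkitGetUserMedia', 'mozGetUserMedia']);
// var PeerConnection = resolvePrefixed(window,
//     ['RTCPeerConnection', 'webkitRTCPeerConnection', 'mozRTCPeerConnection']);

// Simulated check outside a browser (a mock stands in for navigator):
var fakeNavigator = { webkitGetUserMedia: function () { return 'stream'; } };
var gum = resolvePrefixed(fakeNavigator,
    ['getUserMedia', 'webkitGetUserMedia', 'mozGetUserMedia']);
console.log(gum === fakeNavigator.webkitGetUserMedia); // true
```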
Both parts are needed to get WebRTC in your browser, but they can be considered different parts of the same API. After studying the demo application, this is how it works:
As soon as a user enters the application, a room is assigned (a numeric identifier managed as a URL parameter). At this point the browser gets access to the local resources (camera and mic), an operation that asks the user for permission (the getUserMedia part). It also creates a peer connection offer which is sent to the server (the PeerConnection part). This offer, I suppose, contains things like IPs, codecs, and everything needed to establish a P2P connection.
The partner browser enters the application specifying the same room (the same parameter, r=<ROOM_ID>) and performs the same operations (it accesses the local resources and sends another offer to the server). But this time the server finds that there is already another user in the room, so both offers are exchanged.
At this point a PeerConnection negotiation takes place to decide how the communication between the two browsers is going to work, with the server acting as the man in the middle that lets both partners interact (both partners know the server and can talk through it). Each offer is relayed between the browsers and all the decisions needed to establish a P2P connection are agreed upon (I do not know exactly what is decided here). The application uses the Google App Engine Channel API for the communication between browsers and server.
After the negotiation, both browsers interact directly with each other and the video call (audio and video) takes place without any server involvement.
Finally, when a user leaves the page or hangs up the call, the browser contacts the server again to inform it that the user has left the room (and the server deletes the user from the room).
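The server-side bookkeeping in the steps above can be sketched in a few lines. This is a hypothetical simplification of my own (the real demo is Python on App Engine and pushes messages over its channel, none of which is modeled here); it only shows the room logic: the first user in a room waits, the second one triggers the exchange of offers:

```javascript
// Hypothetical sketch of the demo's room bookkeeping.
function RoomRegistry() {
  this.rooms = {};
}

// A user joins a room with its connection offer. Returns the partner's
// offer when one is already waiting, or null when the user must wait.
RoomRegistry.prototype.join = function (roomId, userId, offer) {
  var room = this.rooms[roomId] || (this.rooms[roomId] = {});
  room[userId] = offer;
  for (var other in room) {
    if (other !== userId) return room[other];
  }
  return null;
};

// When a user hangs up or leaves the page, the server forgets it.
RoomRegistry.prototype.leave = function (roomId, userId) {
  if (this.rooms[roomId]) delete this.rooms[roomId][userId];
};

var registry = new RoomRegistry();
console.log(registry.join('42', 'alice', 'offer-A')); // null: alice waits
console.log(registry.join('42', 'bob', 'offer-B'));   // 'offer-A': exchanged
```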
With this application I have finally understood the role of the server in WebRTC: it is the one that connects two particular clients/browsers (so concepts like presence, discussed in the previous entries about VoIP, are now the responsibility of the server application). Now I am going to describe the steps I performed to test the demo:
The application was checked out:
$ svn checkout http://webrtc-samples.googlecode.com/svn/trunk/apprtc/
The demo runs on Google App Engine (Python version), so I downloaded the SDK for Linux and unzipped it:
$ unzip -q ~/Desktop/google_appengine_1.7.4.zip
The application was started, listening on all interfaces of my box:
$ google_appengine/dev_appserver.py --address=0.0.0.0 apprtc/
To test the application, Firefox Nightly (FF21) and Chrome Beta (Chrome 25) are the recommended versions. In theory Chrome 23 (version 21 added the getUserMedia API and version 23 the PeerConnection one) and Firefox 18 (the version that added all the WebRTC code to the Mozilla browser) already have the full WebRTC API, but it is clear that both implementations are very immature and evolving version by version (take into account that the standard itself is only a draft right now). The application has some tweaks to make the conversation possible.
So Chrome 25 (beta channel right now) and Firefox 21 (Nightly at the moment) were installed on my old laptop (the new one does not have a camera) and on my desktop machine.
$ /opt/google/chrome/chrome --version
Google Chrome 25.0.1364.68
$ ./firefox --version
Mozilla Firefox 21.0a1
Launch Chrome 25 on one box and access the application (http://192.168.1.131:8080 in my case). As soon as you enter, the browser is redirected with a room identifier parameter (http://192.168.1.131:8080?r=<ROOM_ID>). The browser asks for permission to access the camera and microphone and waits for the other partner to join.
Launch Firefox 21 on the other box and access the application using the same room id (enter the URL with the parameter directly, http://192.168.1.131:8080?r=<ROOM_ID>). Firefox asks for permission too, and as soon as it enters, the communication starts.
Firefox needs two options enabled in the about:config page: media.navigator.enabled and media.peerconnection.enabled. In the downloaded version 21 only the second one was not set by default.
Here is the video:
Today's entry is a little explanation of the new WebRTC project. As I said before, I did not understand it completely at the beginning, and this entry dissects the new demo application just to review its basics. I think it is clearer for me now, and I hope I have helped you understand it too. Obviously WebRTC is emerging as a new HTML5 feature and (as with many of the previous features) a dispute among the major players is assured.
Web Wars forever!