Saturday, May 24. 2014
Couchbase Manager for Glassfish: Version 0.5
As you know I am developing a toy project, called couchbase-manager, which is basically a session manager for the glassfish application server (versions 3 and 4) whose main feature is storing the sessions in a couchbase server. A few days ago version 0.5 of the manager was released, and today's entry presents the new features of the last two versions. (Version 0.4 was released while the blog was being upgraded and, although I prepared the entry, it was never published. I simply forgot about it and, by the time I realized my oversight, I thought it was too late.)
In general the new version 0.5 is much less attractive than the previous one, because version 0.4 was a big change inside the manager. I will try to summarize all the changes below.
Version 0.4 introduced an important new feature: external attributes. Now there are situations in which couchbase-manager manages an attribute of the session as a separate object in the couchbase repository. Until this version the session was always managed as a whole (it was serialized, stored and de-serialized as a complete object). The main goal of this idea is to manage smaller sessions in general.
It is obvious that this feature is an improvement only in some circumstances and, during its development, I tried to identify those situations. It is important to remember that the manager handles two different configurations (sticky and non-sticky) and the new feature affects them in different ways. The sticky configuration never reads the session from the external repository (as the manager can be sure it is the only one which manages its sessions, it can trust the sessions it holds in memory) and the session is always saved in the background. Therefore, for this configuration, a lighter session is not very important. The non-sticky configuration, however, always reads (and blocks) the session when a new request comes in, so a smaller session is a direct benefit. But what happens if the application requests an externalized attribute? Clearly the manager has to read the attribute synchronously. That means that, in the worst case (all the externalized attributes are requested), this feature is a penalty: the same information is read but in several requests, each one adding its own network and couchbase processing time. In summary, the new feature is only an advantage if the attributes managed as external are big and rarely accessed. If all the attributes are frequently requested by the application this feature is useless or, even worse, a penalty.
The values of externalized attributes are never kept inside the session, whatever configuration is used. In both configurations the real value is always deleted from the session (saving memory) and only a reference is maintained (this reference is the key of the object in couchbase).
To achieve this feature the package es.rickyepoderi.couchbasemanager.couchbase, which manages the interaction with couchbase, was heavily modified to perform bulk operations (the class BulkClientRequest executes several operations at the same time: the operations needed to manage the external attributes and the session go in a single bulk request). The other big change was made in CouchbaseWrapperSession: the session now needs to track the attributes in order to know which of them are rarely accessed. In short, big attributes (attrMaxSize property, 10KB by default) are tracked, they are externalized if their usage goes below a specified percentage and they are re-integrated into the session if the usage goes above another limit (attrUsageCondition). The tracking is done through the UsageStats class.
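The externalization rule described above can be sketched as follows. This is only an illustration and not the code of the manager: the classes AttributeUsage and ExternalizationPolicy, the two percentage limits and the exact comparisons are assumptions (the real UsageStats class and the format of attrUsageCondition may be quite different).

```java
/** Hypothetical per-attribute usage counter (the real UsageStats may differ). */
class AttributeUsage {
    private int requests;  // requests processed while the attribute existed
    private int accesses;  // requests in which the application read the attribute

    void recordRequest(boolean accessed) {
        requests++;
        if (accessed) {
            accesses++;
        }
    }

    /** Percentage of requests that accessed the attribute (0 when nothing was recorded). */
    double usagePercent() {
        return requests == 0 ? 0.0 : 100.0 * accesses / requests;
    }
}

/** Hypothetical policy: two limits give hysteresis, so an attribute does not flip-flop. */
class ExternalizationPolicy {
    private final int attrMaxSize;          // bytes; smaller attributes are never externalized
    private final double externalizeBelow;  // assumed lower usage limit, in percent
    private final double reintegrateAbove;  // assumed upper usage limit, in percent

    ExternalizationPolicy(int attrMaxSize, double externalizeBelow, double reintegrateAbove) {
        this.attrMaxSize = attrMaxSize;
        this.externalizeBelow = externalizeBelow;
        this.reintegrateAbove = reintegrateAbove;
    }

    /** Decides whether the attribute should be stored outside the session after this request. */
    boolean shouldBeExternal(int serializedSize, AttributeUsage usage, boolean currentlyExternal) {
        if (serializedSize < attrMaxSize) {
            return false; // small attributes always stay inside the session
        }
        double percent = usage.usagePercent();
        // An external attribute comes back only when its usage rises above the upper limit;
        // an internal one leaves only when its usage falls below the lower limit.
        return currentlyExternal ? percent <= reintegrateAbove : percent < externalizeBelow;
    }
}

class ExternalizationDemo {
    public static void main(String[] args) {
        ExternalizationPolicy policy = new ExternalizationPolicy(10 * 1024, 20.0, 40.0);
        AttributeUsage usage = new AttributeUsage();
        for (int i = 0; i < 12; i++) {
            usage.recordRequest(i == 0); // read in only one of twelve requests
        }
        System.out.println("externalize 12000-byte attribute: "
                + policy.shouldBeExternal(12000, usage, false));
    }
}
```

Using two different limits means that an attribute whose usage sits between them keeps its current state. A real implementation would probably also wait for a minimum number of requests before trusting the percentage.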
Version 0.5 adds another good feature on top of the changes made in version 0.4. The externalization feature meant that each attribute of the session was now serialized and de-serialized independently (before, the session was serialized and de-serialized as a whole object). This change makes it possible to delay the de-serialization of an attribute until the moment it is requested by the application; if the attribute is never requested, that time is saved. Besides, there is a second benefit: if the attribute is not accessed by the application, the same serialized object (which has never been de-serialized) is still valid. The first part is only important in the non-sticky configuration (in sticky the attribute value is already stored in the session and the manager does not need to read or de-serialize it) but the second is valid for both (the session is saved in both configurations, and the attributes which were not accessed can be saved directly without serializing them again).
Nonetheless this feature has a memory penalty in the sticky setup. This configuration relies on the idea that the same manager is always going to manage a given session, and for that reason the attribute values remain in the session to avoid unnecessary reads. Now the values are kept twice (the serialized byte array and the real object value). When an attribute is accessed by the application the serialized byte array is removed, but it is stored again as soon as it is calculated for the saving. In the non-sticky configuration this penalty does not exist: the attribute values are always cleared from the session and they are re-read at the beginning of the request. When they are read, all of them are stored only as serialized byte arrays, and only the ones that are accessed by the application are de-serialized. So in this configuration the attribute value is either the serialized array or the real object, but never both at the same time. Here it is important to remember that externalized attributes are always removed from the session (which means that big, unused attributes are not duplicated in the sticky configuration).
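The lazy de-serialization idea can be illustrated with a small sketch. It is not the code of the manager: the class LazyAttribute, its methods and its counters are hypothetical, and plain Java serialization is assumed only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical attribute holder that de-serializes only when the value is requested. */
class LazyAttribute {
    private byte[] serialized;  // bytes as read from the repository, or null if stale
    private Object value;       // real object, valid only once loaded is true
    private boolean loaded;
    int deserializations;       // counters used only to show the saved work
    int serializations;

    LazyAttribute(byte[] serialized) {
        this.serialized = serialized;
    }

    /** Called when the application requests the attribute. */
    Object getValue() {
        if (!loaded) {
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(serialized))) {
                value = in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
            loaded = true;
            deserializations++;
        }
        serialized = null; // the application may modify the object, so the bytes are stale
        return value;
    }

    /** Called when the session is saved: untouched attributes reuse their original bytes. */
    byte[] bytesForSave() {
        if (serialized == null) {
            serialized = serialize(value);
            serializations++;
        }
        return serialized;
    }

    static byte[] serialize(Object object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}

class LazyAttributeDemo {
    public static void main(String[] args) {
        LazyAttribute untouched = new LazyAttribute(LazyAttribute.serialize("big value"));
        untouched.bytesForSave(); // saved without any (de-)serialization
        System.out.println("de-serializations: " + untouched.deserializations
                + ", serializations: " + untouched.serializations);
    }
}
```

After bytesForSave() an accessed attribute holds both the object and its bytes until the next access, which is the duplication that appears in the sticky configuration.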
The final new feature is something that was completely forgotten in the previous versions. JavaEE provides some listeners for monitoring the life-cycle of the session and its attributes (when a session is created, destroyed or renamed, or when an attribute is added, modified or deleted). Until version 0.5 those listeners were not taken into account, so the behavior with them was unknown.
Finally I remembered the existence of those listeners and I tested what happened with them when using the manager. There were problems in only one situation: the destruction of a session because of inactivity. If you remember, the manager decided that a session was invalidated by inactivity using the expiration time in couchbase. If the object still existed in the repository the session was valid; if it had expired, and therefore did not exist, the session was invalid. That was a very good idea (at least I think so) but the problem was that the session was no longer available once it had expired and, in turn, the listeners received an incomplete session (only in the non-sticky configuration, which does not keep the attribute values).
Therefore a new property was added, extraInactiveInterval, which adds an extra time in seconds to the expiration time applied to sessions in couchbase (180 seconds by default). During this extra time the loop that searches for expired sessions has time to detect that the session is expired and to invalidate it normally, calling the listeners properly. So, since version 0.5, a session is considered expired by checking times instead of checking its existence in couchbase. Obviously the session needs to be re-read (non-sticky) to be sure it is really expired. As in any other cluster manager there are special considerations when several instances are involved, please check this wiki page for more information.
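The timing described above can be sketched like this. It is a simplified illustration and not the code of the manager: the class SessionExpiry and its method names are hypothetical, and special cases (for example sessions that never expire) are ignored.

```java
/** Hypothetical helper showing how the extra interval separates the two expirations. */
class SessionExpiry {
    static final int DEFAULT_EXTRA_INACTIVE_INTERVAL = 180; // seconds, default from the post

    /** Expiration (in seconds) applied to the object stored in couchbase. */
    static int couchbaseTtlSeconds(int maxInactiveInterval, int extraInactiveInterval) {
        return maxInactiveInterval + extraInactiveInterval;
    }

    /** The manager now decides expiration by comparing times, not by object existence. */
    static boolean isExpired(long lastAccessedMillis, int maxInactiveInterval, long nowMillis) {
        return nowMillis - lastAccessedMillis >= maxInactiveInterval * 1000L;
    }

    /** True while couchbase still holds the object, so listeners can get a full session. */
    static boolean stillInRepository(long lastAccessedMillis, int maxInactiveInterval,
            int extraInactiveInterval, long nowMillis) {
        long ttlMillis = couchbaseTtlSeconds(maxInactiveInterval, extraInactiveInterval) * 1000L;
        return nowMillis - lastAccessedMillis < ttlMillis;
    }
}

class SessionExpiryDemo {
    public static void main(String[] args) {
        int maxInactive = 1800;                    // assumed 30 minute session timeout
        long lastAccess = 0L;
        long now = (maxInactive + 60) * 1000L;     // one minute after the timeout
        System.out.println("expired: " + SessionExpiry.isExpired(lastAccess, maxInactive, now));
        System.out.println("still stored: " + SessionExpiry.stillInRepository(lastAccess,
                maxInactive, SessionExpiry.DEFAULT_EXTRA_INACTIVE_INTERVAL, now));
    }
}
```

The window between the two conditions is what gives the loop that searches for expired sessions time to read the complete session and call the listeners.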
After all these changes it is time to present new performance tests but, this time, the tests themselves have changed. The previous performance tests executed requests for session attributes with the options 1x50, 4x50, 20x100 and 20x200 (number of attributes and size of each one, that is, sessions of 50, 200, 2000 and 4000 bytes respectively). In order to test external attributes some tests which manage bigger attributes are needed. Besides, a new command line option was added to the web services client application: the u option specifies the number of attributes that are accessed in each request. An execution with u=1 means that in each update operation only one randomly chosen attribute is requested by the application, whereas with u=a (the number of attributes to modify is the same as the number of attributes created) all the attributes are modified. This command line option lets us modify the usage ratio of the attributes to force their externalization or not. From now on the tests performed are the following: 4x50-u1, 20x200-u1, 12x12000-u1 and 12x12000-u12. The first two are the same tests that were performed before (tests 2 and 4 of the previous versions). The other two are new ones, which use twelve attributes of 12000 bytes (total session size around 140K); one has an attribute usage ratio of 8% (u=1, which means that all the attributes are going to be externalized in both configurations) and the other of 100% (u=12, the twelve attributes are always read and, therefore, none is externalized). Another difference is that my laptop is now configured with the performance governor. I saw that sometimes the numbers varied too much and I checked that the difference was caused by the frequency set by the governor (I suppose that in some tests the load is not big enough for the default ondemand governor to switch to the maximum frequency). The numbers for the four tests are presented below but, because of these differences, I am not going to compare them with previous versions.
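The session sizes and usage ratios of the four tests follow from simple arithmetic, shown in the sketch below. The class PerfTestCase is hypothetical (it is not part of the test client) and the sizes count only the attribute payload, without any serialization overhead.

```java
/** Hypothetical description of a test: attributes x size, with u attributes accessed. */
class PerfTestCase {
    final int attributes;          // number of attributes in the session
    final int attributeSize;       // size of each attribute in bytes
    final int accessedPerRequest;  // value of the u option

    PerfTestCase(int attributes, int attributeSize, int accessedPerRequest) {
        this.attributes = attributes;
        this.attributeSize = attributeSize;
        this.accessedPerRequest = accessedPerRequest;
    }

    /** Payload size of the whole session in bytes. */
    int sessionBytes() {
        return attributes * attributeSize;
    }

    /** Expected usage of each attribute when the accessed ones are chosen at random. */
    double usagePercent() {
        return 100.0 * accessedPerRequest / attributes;
    }
}

class PerfTestCaseDemo {
    public static void main(String[] args) {
        PerfTestCase[] tests = {
            new PerfTestCase(4, 50, 1),
            new PerfTestCase(20, 200, 1),
            new PerfTestCase(12, 12000, 1),
            new PerfTestCase(12, 12000, 12),
        };
        for (PerfTestCase test : tests) {
            System.out.println(test.attributes + "x" + test.attributeSize
                    + "-u" + test.accessedPerRequest
                    + ": " + test.sessionBytes() + " bytes, usage "
                    + test.usagePercent() + "%");
        }
    }
}
```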
Starting with the creation operation, the numbers are very similar in all the tests and configurations (except the sticky test in which all the attributes remain integrated in the session, which is slower in the three operations, and I really do not know why). Times should be similar because, more or less, all the situations need the same operations against couchbase.
In the update graphic we have some interesting effects. The sticky configuration is not so clearly better, because saving de-serializations benefits the non-sticky configuration. In both configurations times are better if only one attribute is accessed by the application (u=1); when all the attributes are requested (u=12) times are clearly worse. So the externalization feature is quite nice: managing smaller sessions is worthwhile despite the cost of reading one attribute synchronously. I have a strange feeling about the sticky case with u=12, which is the worst in all three operations, and I have no explanation for it (in theory it should be a bit better than the non-sticky test).
Finally the delete operation presents another interesting result. The numbers for all the tests except the one that performs externalization are more or less the same in both configurations, but the test with externalization is remarkably worse. The reason is that, when deleting the session, all the session attributes are accessed to execute possible listeners, so, when they are externalized, all of them have to be read synchronously one by one. Those extra reads make this case almost double the time of the other cases.
The tests show that externalization is a very good feature for the manager. And I feel that in a typical application externalization would be even more worthwhile (in the test with u=1 all the attributes have the same probability of being accessed by the application, which is not common in real life). As a final comment I want to say that the performance tests stressed the disk noticeably: there are a lot of sessions being created, deleted and modified, and couchbase persists all these changes to disk. Several times I have said that my couchbase environment does not need disk persistence, I think that replication is enough for common JavaEE applications, but the couchbase guys seem to be reluctant to provide such a configuration. I have read several times that the software is moving towards being a complete NoSQL database instead of a cache system. If that is finally true it is a real pity, because I chose couchbase because it was a cache and not because it was a database. I feel that, with disk persistence imposed, this manager is never going to be fully functional, because the best setup is unavailable.
Regards!