Saturday, May 24. 2014
Couchbase Manager for Glassfish: Version 0.5
As you know, I am developing a toy project called couchbase-manager, which is basically a session manager for the glassfish application server (versions 3 and 4) whose main feature is storing the sessions in a couchbase server. Some days ago version 0.5 of the manager was released, and today's entry presents the new features of the last two versions. (Version 0.4 was released while the blog was being upgraded and, although I prepared the entry, it was never published. I simply forgot about it and, when I realized my oversight, I thought it was too late.)
In general version 0.5 is much less attractive than the previous one, because version 0.4 was a big change inside the manager. I will try to summarize all the changes below.
Version 0.4 introduced an important new feature: external attributes. In some situations the couchbase-manager now manages an attribute of the session as a separate object in the couchbase repository. Until this version the session was always managed as a whole (it was serialized, stored and de-serialized as a complete object). The main goal of this idea is to manage smaller sessions in general.
It is obvious that this feature is an improvement only in some circumstances and, during its development, I tried to identify those situations. It is important to remember that the manager handles two different configurations (sticky and non-sticky) and the new feature affects them in different ways. The sticky configuration never reads the session from the external repository (as the manager can be sure it is the only one which manages its sessions, it can trust the sessions which are in memory) and the session saving is always done in the background. Therefore, for this configuration, a lighter session is not very important. The non-sticky configuration, however, always reads (and locks) the session when a new request comes, so a smaller session is a direct benefit. But what happens if the application requests an externalized attribute? Clearly the manager has to read the attribute synchronously. That means that, in the worst case (all the externalized attributes are requested), this feature is a penalty: the same information is read but in several requests, adding the network and couchbase processing times of each operation. In summary, the new feature is only an advantage if the attributes managed as external are big and rarely accessed. If all attributes are frequently requested by the application this feature is useless or, even worse, a penalty.
The values of externalized attributes are never kept inside the session, whichever configuration is used. In both configurations the real value is always deleted from the session (saving memory) and only a reference is kept (this reference is the key of the object in couchbase).
To achieve this feature the package es.rickyepoderi.couchbasemanager.couchbase, which manages the interaction with couchbase, was heavily modified to perform bulk operations (the class BulkClientRequest executes several operations at the same time: the operations needed to manage the external attributes and the session go in a single bulk request). The other big change was made in CouchbaseWrapperSession: the session now needs to track the attributes in order to know which of them are rarely accessed. In short, big attributes (attrMaxSize property, 10KB by default) are tracked; they are externalized if their usage goes below a specified percentage and they are re-integrated into the session if the usage goes above another limit (attrUsageCondition). The tracking is done through the UsageStats class.
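The tracking rule described above can be pictured with a small sketch. This is not the real UsageStats class: the class name, the two threshold parameters and the way usage is computed (requests in which the attribute was accessed divided by total requests) are assumptions made only for illustration.

```java
/** Hypothetical usage tracker for one session attribute (not the real UsageStats class). */
final class AttributeUsageSketch {
    private final int maxSize;             // bytes; plays the role of attrMaxSize
    private final double externalizeBelow; // assumed: usage ratio under which a big attribute is externalized
    private final double integrateAbove;   // assumed: usage ratio over which it is re-integrated
    private long requests;                 // requests seen for the session
    private long accesses;                 // requests in which this attribute was accessed
    private boolean external;

    AttributeUsageSketch(int maxSize, double externalizeBelow, double integrateAbove) {
        this.maxSize = maxSize;
        this.externalizeBelow = externalizeBelow;
        this.integrateAbove = integrateAbove;
    }

    /** Records one request and re-evaluates where the attribute should live. */
    void endRequest(boolean accessed, int serializedSize) {
        requests++;
        if (accessed) {
            accesses++;
        }
        double usage = (double) accesses / requests;
        if (!external && serializedSize > maxSize && usage < externalizeBelow) {
            external = true;   // big and rarely used: store it as a separate object
        } else if (external && usage > integrateAbove) {
            external = false;  // used often again: bring it back into the session
        }
    }

    boolean isExternal() {
        return external;
    }

    public static void main(String[] args) {
        // 10KB size limit, externalize under 20% usage, re-integrate over 50%
        AttributeUsageSketch stats = new AttributeUsageSketch(10 * 1024, 0.2, 0.5);
        stats.endRequest(true, 12000);
        for (int i = 0; i < 9; i++) {
            stats.endRequest(false, 12000);
        }
        System.out.println("external after 10 requests: " + stats.isExternal());
    }
}
```

Using two different limits gives some hysteresis, so an attribute whose usage hovers around a single threshold is not moved in and out of the session on every request.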
Version 0.5 adds another good feature on top of the changes done in version 0.4. As a consequence of externalization, each attribute of the session is now serialized and de-serialized independently (before, the session was serialized and de-serialized as a whole object). This change makes it possible to delay the de-serialization of an attribute until the moment it is requested by the application; if the attribute is never requested, that time is saved. There is a second benefit: if the attribute is not accessed by the application, the same serialized object (which has never been de-serialized) is still valid. The first benefit is only important in the non-sticky configuration (in sticky the attribute value is already stored in the session and the manager does not need to read or de-serialize it) but the second is valid for both (the session is saved in both configurations, and the attributes which were not accessed can be saved directly without serializing them again).
Nonetheless this feature has a memory penalty in the sticky setup. This configuration relies on the idea that the same manager is always going to manage a given session, and for that reason the attribute values remain in the session to avoid unnecessary reads. Now the values are kept twice (the serialized byte array and the real object value). When an attribute is accessed by the application the serialized byte array is removed, but it is stored again as soon as it is calculated for saving. In the non-sticky configuration this penalty does not exist: the attribute values are always cleared from the session and re-read at the beginning of the request. When they are read, all of them are stored only as serialized byte arrays, and only the ones accessed by the application are de-serialized. So in this configuration an attribute value is either the serialized array or the real object, but never both at the same time. It is important to remember here that externalized attributes are always removed from the session (which means that big, unused attributes are not duplicated in the sticky configuration).
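A minimal sketch of the lazy de-serialization idea, using plain Java serialization. The class and method names are hypothetical and the real manager certainly differs (for example it uses the special glassfish object streams); the sketch only shows the bookkeeping between the byte array and the real object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical holder for one attribute: serialized bytes, real object, or both. */
final class LazyAttributeSketch {
    private byte[] serialized; // form read from the repository, or null
    private Object value;      // real object, null until the application asks for it

    LazyAttributeSketch(byte[] serialized) {
        this.serialized = serialized;
    }

    /** Called when the application requests the attribute: de-serialize only now. */
    Object getValue() {
        if (value == null && serialized != null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serialized))) {
                value = in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("cannot de-serialize attribute", e);
            }
        }
        serialized = null; // the application may change the object, so the bytes are stale
        return value;
    }

    /** Called when the session is saved: untouched attributes reuse the same bytes. */
    byte[] getSerialized() {
        if (serialized == null) {
            // From here on both forms are held, which is the sticky memory penalty
            serialized = serialize(value);
        }
        return serialized;
    }

    boolean isDeserialized() {
        return value != null;
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

An attribute that is never requested goes from repository to repository as the same byte array, with no de-serialization and no re-serialization in between.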
The final new feature is something that was completely forgotten in the previous versions. JavaEE provides some listeners for monitoring the life-cycle of the session and its attributes (when a session is created, destroyed or renamed, or when an attribute is added, modified or deleted). Until version 0.5 those listeners were not taken into account, so the behavior with them was unknown.
Finally I remembered the existence of those listeners and I tested what happened with them when using the manager. There were problems in only one situation: the destruction of a session because of inactivity. If you remember, the manager decided whether a session had been invalidated by inactivity using the expiration time in couchbase. If the object still existed in the repository the session was valid; if it had expired, and therefore no longer existed, the session was invalid. That was a very good idea (at least I think so) but the problem was that the session was unavailable once it had expired and, in turn, the listeners received an incomplete session (only in the non-sticky configuration, which does not keep the attribute values).
Therefore a new property was added, extraInactiveInterval, which sets an extra time in seconds added to the expiration time applied to sessions in couchbase (180 seconds by default). During this extra time the loop that searches for expired sessions has time to detect the session as expired and to invalidate it normally, calling the listeners properly. So, since version 0.5, a session is considered expired by checking times instead of checking whether it exists in couchbase. Obviously the session needs to be re-read (non-sticky) to be sure it has really expired. As in any other cluster manager there are special considerations when several instances are involved; please check this wiki page for more information.
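The arithmetic behind extraInactiveInterval can be sketched as follows. The method names and the exact comparison are assumptions for illustration, not the manager's code; the only facts taken from the text are that the extra time is added to the couchbase expiration and that expiry is decided by comparing times.

```java
/** Hypothetical helpers showing how the extra interval separates the two timeouts. */
final class SessionExpirySketch {

    /** Expiration (seconds) applied to the couchbase object: session timeout plus the margin. */
    static long couchbaseExpirySeconds(int maxInactiveInterval, int extraInactiveInterval) {
        return (long) maxInactiveInterval + extraInactiveInterval;
    }

    /** Time-based check: expired once the session has been idle for the whole timeout. */
    static boolean isExpired(long lastAccessedMillis, int maxInactiveInterval, long nowMillis) {
        return nowMillis - lastAccessedMillis >= maxInactiveInterval * 1000L;
    }

    public static void main(String[] args) {
        int maxInactiveInterval = 1800;  // 30 minute session timeout (example value)
        int extraInactiveInterval = 180; // default stated for the property
        // prints 1980
        System.out.println(couchbaseExpirySeconds(maxInactiveInterval, extraInactiveInterval));
        // Idle for 31 minutes: the session is expired by time, but the couchbase
        // object still exists for two more minutes, so listeners can see it whole.
        System.out.println(isExpired(0L, maxInactiveInterval, 31 * 60 * 1000L)); // prints true
    }
}
```

The window between the two values is the time the background loop has to find the session, re-read it and invalidate it with all its attributes still available.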
After all these changes, new performance tests are presented but, this time, the tests themselves have changed. The previous performance tests executed requests for session attributes with the options 1x50, 4x50, 20x100 and 20x200 (number of attributes and size of each one; sessions of 50, 200, 2000 and 4000 bytes respectively). In order to test external attributes, some tests which manage bigger attributes are needed. Besides, a new command line option was added to the web services client application: the u option specifies the number of attributes that are accessed in each request. An execution with u=1 means that in each update operation only one attribute, chosen randomly, is requested by the application, while with u=a (the number of attributes to modify is the same as the number of attributes created) all the attributes are modified. This option lets us change the usage ratio of the attributes to force their externalization or not. From now on the tests performed are the following: 4x50-u1, 20x200-u1, 12x12000-u1 and 12x12000-u12. The first two are the same tests that were performed before (tests 2 and 4 of the previous versions). The other two are new ones, which use twelve attributes of 12000 bytes (total session size around 140K): one has an attribute usage ratio of 8% (u=1, which means that all attributes are going to be externalized in both configurations) and the other of 100% (u=12, the twelve attributes are always read and, therefore, none is externalized). Another difference is that my laptop is now configured with the performance governor. I saw that sometimes the numbers varied too much and I checked that the difference was caused by the frequency set by the governor (I suppose that the load is not big enough to reach the maximum frequency with the default ondemand governor in all the tests). The numbers for the four tests are presented below but, because of these differences, I am not going to compare them with previous versions.
Starting with the creation operation, the numbers are very similar in all the tests and configurations (except the sticky test where all the external attributes remain integrated in the session, which is slower in all three operations, and I really do not know why). Times should be similar because, more or less, all situations need the same operations against couchbase.
In the update graphic we have some interesting effects. The sticky configuration is not so clearly better: the benefit of saving de-serializations helps the non-sticky configuration. In both configurations times are better if only one attribute is accessed by the application (u=1); when all the attributes are requested (u=12) times are clearly worse. So the externalization feature is quite nice: managing smaller sessions is worthwhile despite the cost of reading one attribute synchronously. I have a strange feeling about the sticky case with u=12: it is the worst in all three operations and I cannot explain why (in theory it should be a bit better than the non-sticky test).
Finally the delete operation presents another interesting result. The numbers for all the tests except the one that performs externalization are more or less the same in both configurations, but the test with externalization is remarkably worse. The reason is that, when deleting the session, all the session attributes are accessed to execute possible listeners, so, when they are externalized, all of them have to be read synchronously one by one. Those extra reads make this case almost double the time of the other cases.
The tests show that externalization is a very good feature for the manager. And I feel that in a typical application externalization would be even more worthwhile (in the test with u=1 all the attributes have the same probability of being accessed by the application, which is not common in real life). As a final comment I want to say that the performance tests stressed the disk noticeably: a lot of sessions are created, deleted and modified, and couchbase persists all these changes to disk. I have said several times that my couchbase environment does not need disk persistence (I think that replication is enough for common JavaEE applications), but the couchbase guys seem reluctant to provide such a configuration. I have read several times that the software is moving towards being a complete NoSQL database instead of a cache system. If that is finally true it is a real pity, because I chose couchbase because it was a cache and not because it was a database. I feel that, with disk persistence imposed, this manager is never going to be fully functional, since the best setup is unavailable.
Regards!
Sunday, November 10. 2013
Couchbase Manager for Glassfish: Version 0.3
Yesterday version 0.3 of the couchbase-manager was released. As you know, it is a little project to manage glassfish sessions inside a couchbase cluster, on which I usually spend some time. This version is the first one I feel more or less comfortable with; I think it is starting to reach a mature state. There are mainly two new features in this version: glassfish 4.0 support and, finally, a reliable non-sticky configuration.
As soon as version 0.2 of my couchbase manager was released, the glassfish group at Oracle announced the new and shiny version 4.0 which, as we all know, is the reference implementation of JavaEE 7. The couchbase-manager 0.3 also works with glassfish 4.0. I have to admit that the integration was easier than I expected; only a few small changes were necessary:
Because the project is now mavenized, just setting the version of the web-glue and web-core packages to 4.0 was enough to start compiling. Although maven has its drawbacks, library dependencies are managed brilliantly.
Only one class did not compile. The class was CouchbaseManagerStrategyBuilder, which is in charge of creating the manager and is the one that the glassfish core uses to inject the manager into the application. It seems that the new glassfish has moved some classes from one package to another. In summary, after changing a few lines, the builder compiled again.
In order to manage both versions (glassfish v3 and v4) inside the same project I did some tricks in the maven file (I did not want to split the code). Now there are two builder classes and two maven profiles, V3 and V4, and one of the classes is excluded from compilation depending on the profile selected. It is better explained in the wiki instructions for compiling the project.
Finally in the new glassfish 4.0 the manager is defined in another file.
$ cat META-INF/hk2-locator/default
[es.rickyepoderi.couchbasemanager.web.CouchbaseManagerStrategyBuilderV4]
contract={com.sun.enterprise.web.PersistenceStrategyBuilder}
name=coherence-web
The pity here is that the manager must still be called coherence-web because the reported bug is still present in the new version (nothing has been said about it).
The second big change is that couchbase server version 2.2.0 fixes the problem with deleting a locked object. That issue forced me to first unlock and then delete the session in previous versions of the manager and, under heavy concurrency, the two-operation technique reported errors. If you check the following bug, I opened the issue some time ago and, because the couchbase guys did not say anything, I decided to try to fix it myself. After some hours compiling, checking code, locating the affected part and fixing it, I realized that the same piece of code had already been patched two months before. It was a complete waste of time. I suppose that someone else detected the problem without knowing anything about my bug. I usually work with the community version of couchbase, which is one version behind the commercial one, and that habit led me into this messy situation.
But the result is that the manager can finally delete a session which is locked (passing the cas value). I have tested the non-sticky configuration for a long time and no concurrency problems are detected any more. It can be said that the non-sticky solution is now reliable.
Nevertheless the new couchbase version has also fixed the touch method. Now, when a session is locked, the touch method results in an error (previously a locked object could be touched without problem). Because there is no unlockAndTouch or similar method in the API, I decided to always save the session (cas in non-sticky, set in sticky). It is not a problem because, once I realized that an attribute can be modified without being put in the session again, the cases in which a session was not dirty, and could be touched instead of saved, were really a very small percentage. As a result the previous manager property maxTimeNotSaving has been removed (that property controlled when to save a session which had only been touched for a long time).
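The point about attributes being modified without being put in the session again is easy to reproduce with a toy session. The class below is a hypothetical stand-in (not the servlet HttpSession nor the manager's session class) whose dirty flag is only raised by setAttribute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical session whose dirty flag only notices setAttribute calls. */
final class DirtyFlagSketch {
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean dirty;

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    boolean isDirty() {
        return dirty;
    }

    /** Marks the session as stored in the repository. */
    void saved() {
        dirty = false;
    }

    public static void main(String[] args) {
        DirtyFlagSketch session = new DirtyFlagSketch();
        session.setAttribute("cart", new ArrayList<String>());
        session.saved();

        // The application changes the object in place and never calls setAttribute
        @SuppressWarnings("unchecked")
        List<String> cart = (List<String>) session.getAttribute("cart");
        cart.add("book");

        // prints: dirty=false cart=[book]
        System.out.println("dirty=" + session.isDirty() + " cart=" + cart);
    }
}
```

Since any getAttribute call can hand out a mutable object, a manager that wants to be safe has to treat a session as possibly modified whenever an attribute has been read, which leaves very few requests where a touch alone would be enough.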
Finally I am going to show some times (in ms) of the manager for both configurations. As in the previous entries, the setup is two glassfish v3 instances in a KVM cluster against a single couchbase installed on my physical laptop. The manager is tested with different session sizes (50, 200, 2000 and 4000 bytes). More or less the numbers are stable and similar to the previous versions, except for the delete operation, which is slightly slower.
So the summary is that the new version 0.3 of the couchbase-manager is the first one without known problems (I am lying, there is still the LDAP realm problem because of a glassfish bug, but it is a very specific problem). I have more ideas to improve the solution (mainly dealing with attributes as separate objects in order not to save the whole session all the time) but I do not really know when I will have the time and the mood to start such a big change. Just one more comment: in my tests (performance tests mainly) I see that the couchbase server is highly intensive in disk writing (my laptop disk, which is also encrypted, was at 95-100% utilization during the tests). As I have said several times, in my specific solution persistence to disk is not really necessary; I think that replicas are enough security for a cache system. But disk persistence cannot be disabled in couchbase buckets (see this forum entry or this other one). I hope that someday this feature will be configurable exactly like the replica behavior.
Enjoy version 0.3!
Friday, June 14. 2013
Couchbase Manager for Glassfish: Version 0.2
Going back to my couchbase-manager, this week the new version 0.2 has been released. It has been a long time since the first version of the manager was made public, but I have been waiting to see how several issues evolved. I have to admit that there has not been a lot of movement but I think it is a good time for the second version.
Let's start with the new features in the version:
Two new properties can be used in the manager. The persistTo and replicateTo properties expose two new features of the latest versions of the couchbase Java client. In this new version you can configure the manager to ensure that the session data is stored on disk or replicated to a defined number of nodes.
Remember that the couchbase-manager is asynchronous by nature, so the properties are applied accordingly. For example, saving a session with persistTo=ONE ensures that the next operation will wait for the set/cas operation which, in turn, will wait until the data is saved to the disk of one node.
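The "next operation waits for the previous save" behavior can be pictured with plain Java futures. This sketch does not use the couchbase client at all: the class, its methods and the Runnable standing in for a set/cas with persistTo=ONE are all assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical asynchronous saver: the save runs in the background, the next operation waits for it. */
final class AsyncSaveSketch {
    private CompletableFuture<Boolean> pending = CompletableFuture.completedFuture(true);

    /** Starts the store operation in the background and returns immediately. */
    void save(Runnable storeOperation) {
        pending = CompletableFuture.supplyAsync(() -> {
            storeOperation.run(); // stands in for a set/cas that only completes once persisted
            return true;
        });
    }

    /** Called at the start of the next operation: blocks until the previous save has finished. */
    boolean waitForPreviousSave() {
        return pending.join();
    }

    public static void main(String[] args) {
        AsyncSaveSketch saver = new AsyncSaveSketch();
        saver.save(() -> System.out.println("session stored"));
        // The request that triggered the save is already answered at this point;
        // only the following operation pays for the persistence wait.
        System.out.println("previous save ok: " + saver.waitForPreviousSave());
    }
}
```

The stricter the persistence or replication requirement, the longer the background operation takes, and that time only becomes visible if the next operation on the same session arrives before it has finished.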
The special transcoder that the manager uses (it has to deal with the special ObjectStreams that glassfish uses) is no longer passed to each operation (as just another parameter) but is incorporated in a CouchbaseConnectionFactory. This was forced by the previous feature: the transcoder and the persist/replication options cannot both be passed as arguments at the same time in all the operations.
The previous version had several problems shutting down the manager. Mainly this was because glassfish is quite restrictive about which managers can be configured (a bug was opened for this issue but there has been no progress), so the coherence-web name was reused (that name belongs to the memory manager for the Oracle product of the same name).
In order to avoid the annoying ClassNotFoundException for the com.tangosol.net.CacheFactory class, I have created an empty class with the same name and a shutdown method. Besides, some other minor fixes were added to stop the manager cleanly. Now the manager can be stopped and started as many times as needed (any application that uses the couchbase-manager can be undeployed without problems).
The main feature of this version is that it is now mavenized. Previously it was a common Netbeans project but the new version is a maven project (it is easier to compile and the dependencies are clearer).
The main problem here is that the glassfish maven repository, although it exists, seems to be abandoned. For now (current version 3.1.2.2) it is still usable but I am not very confident about newer versions (the maven repository has not been updated since May 2011).
Besides, I have added some tests which use the manager-test web services application. Those tests just check normal operations (creation, update, refresh and deletion of a session) and run the four performance tests I have been using to measure performance. Obviously you need to install the couchbase-manager and the application in some glassfish to test the project.
Finally the manager has been updated and tested with the newer dependencies:
- GlassFish Server Open Source Edition 3.1.2.2
- Couchbase community edition 2.0.1
- spymemcached-2.8.12.jar
- couchbase-client-1.1.5.jar
- commons-codec-1.6.jar
- netty-3.5.5.Final.jar
- httpcore-4.1.1.jar
- httpcore-nio-4.1.1.jar
The new versions of the couchbase client library have added two new dependencies (httpcore and httpcore-nio). In my humble opinion these are too many dependencies for the library (maybe the new Java 7 nio implementation could be used instead of all this mess, but I really do not know).
In the period between the versions there was an attempt to use a different serialization technique. The manager currently uses standard Java serialization and it is known that other implementations like Kryo perform better. I did some small tests to add Kryo but I finally gave up because a lot of internal glassfish classes require special treatment (objects from cdi/weld, EJBs and so on). Objects of those types need to fall back to Java serialization and I simply thought that the change was not worth it.
The bug about the LDAP realm (the couchbase-manager cannot recover a principal which is stored inside an LDAP realm) was at least updated. Another guy experienced the same problem (but in a completely different way) and the bug was marked to be fixed in version 4.0.1. So I know this is a bug and I will not change the way the manager recovers the principal user from the store when logged in.
And finally, an annoying behavior still persists when using the non-sticky configuration of the manager. As I have commented several times, the session cannot be deleted if the session object has previously been locked (it returns an error). For that reason the only way to delete a session is performing an unlock first and then sending the deletion. Obviously, once the session is unlocked weird things can happen and, in my tests, they really do happen. I will keep insisting on this to the couchbase guys: there should be a way to delete a locked object.
So now you know: version 0.2 of the couchbase-manager is out. I encourage all of you to test this new version and send me comments or file a bug in github. I need your comments to check whether the manager is minimally usable.
Thanks!
EDIT: I have just discovered that the new glassfish maven repo can be found here. Nice! It is a pity that version 0.2 has just been released.