Saturday, May 19. 2012
I am continuing with my little couchbase-manager project. Following the ideas I commented on in the previous entry, I have set up two KVM Debian boxes with a two-instance GlassFish cluster; both instances access a single Couchbase server installed on my laptop. Besides, each GlassFish instance is configured with a JK-enabled listener that is balanced via jk workers and Apache. The first Debian box runs a non-sticky Apache, the second one a sticky configuration. So, in summary, the deployment is very similar to the one shown in the simple HA setup entry but using the 3.1.2 version. My first surprise was that the manager works smoothly against a balanced cluster; it is incredible when something works the first time.
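For reference, the balancing part on each box is plain mod_jk. A minimal workers.properties could look like the following sketch; the worker names and AJP ports are my assumption, and the only real difference between the two boxes is the sticky_session flag:

# mod_jk balancer over the two GlassFish JK listeners (names/ports assumed)
worker.list=balancer

worker.instance1.type=ajp13
worker.instance1.host=debian1
worker.instance1.port=8009

worker.instance2.type=ajp13
worker.instance2.host=debian2
worker.instance2.port=8009

worker.balancer.type=lb
worker.balancer.balance_workers=instance1,instance2
# true on the sticky box (the second one), false on the non-sticky one
worker.balancer.sticky_session=true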
I developed a JAX-WS web service application which manages the Java session (here is the NetBeans project for client and server). The main idea behind this application is testing the manager and measuring some timings. The application has four operations: create (a new session is added to the server), update (one attribute in the session is modified), refresh (the session is only read, not changed) and delete (the session is invalidated). The attributes added to the session are byte arrays of a specified size. The web services are called by a little multi-threaded client which can emulate a typical web user. The client starts several threads (different users) which create a new session, perform several updates and refreshes and, finally, delete it. The update and refresh part is executed by a second layer of threads (children threads); this second layer performs the operations over the same session (parallel modifications). Both levels of threads (parent and children) run for a specified number of iterations. At the end, the mean and standard deviation of the time for every operation are displayed. The command has several arguments that let me test different aspects of the manager.
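Just to give an idea of the server side, here is a minimal sketch of the endpoint (the real manager-test sources differ in details, this is only illustrative):

import javax.annotation.Resource;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;
import javax.xml.ws.WebServiceContext;
import javax.xml.ws.handler.MessageContext;

// Sketch of a JAX-WS endpoint whose four operations exercise the HTTP session.
@WebService(serviceName = "SessionTest")
public class SessionTest {

    @Resource
    private WebServiceContext context;

    // the HttpSession backing the current web service call
    private HttpSession session(boolean create) {
        HttpServletRequest req = (HttpServletRequest)
                context.getMessageContext().get(MessageContext.SERVLET_REQUEST);
        return req.getSession(create);
    }

    @WebMethod
    public String create(int attributes, int size) {
        HttpSession s = session(true);                   // new session
        for (int i = 0; i < attributes; i++) {
            s.setAttribute("attr-" + i, new byte[size]); // byte-array attributes
        }
        return s.getId();  // the real app also prefixes the instance name (res=debian1:...)
    }

    @WebMethod
    public String update(int index, int size) {
        HttpSession s = session(false);
        s.setAttribute("attr-" + index, new byte[size]); // modify one attribute
        return s.getId();
    }

    @WebMethod
    public String refresh() {
        HttpSession s = session(false);
        s.getAttribute("attr-0");                        // read only, no modification
        return s.getId();
    }

    @WebMethod
    public String delete() {
        HttpSession s = session(false);
        String id = s.getId();
        s.invalidate();                                  // session is gone
        return id;
    }
}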
$ java -cp . es.rickyepoderi.managertest.client.Test -h
Unknown option: -h
java es.rickyepoderi.managertest.client.Test [-option [value]] ...
Options:
  -b:  Base URL for the WSDL (default: http://localhost:8080/manager-test/SessionTest?wsdl)
  -n:  Namespace of the WSDL (default: http://server.managertest.rickyepoderi.es/)
  -l:  Local part of the WSDL (default: SessionTest)
  -t:  Number of threads (default: 1)
  -ct: Number of children threads per parent thread (default: 1)
  -a:  Number of attributes in session (default: 10)
  -s:  Size of each attribute in bytes (default: 100)
  -ur: Ratio (percentage) of updates 0-100 (default: 50)
  -os: Sleep time inside operation in ms (default: 0)
  -ts: Sleep time inside thread between operation in ms (default: 0)
  -i:  Number of iterations (default: 1)
  -ci: Number of iterations/operations for each parent iteration (default: 1)
  -d:  Show debug for threads (default: false)
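To make the parent/children structure more concrete, here is a hedged sketch of one parent task in the client (the real Test class is different; SessionTest is the generated JAX-WS proxy from the sketch above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.xml.ws.BindingProvider;

// A parent thread creates the session, -ct children hit that same session
// in parallel, and finally the parent deletes it.
public class ParentTask implements Runnable {

    private final SessionTest port;    // generated JAX-WS proxy, shared with children
    private final int children;        // the -ct option
    private final int childIterations; // the -ci option

    public ParentTask(SessionTest port, int children, int childIterations) {
        this.port = port;
        this.children = children;
        this.childIterations = childIterations;
        // keep the JSESSIONID cookie so every call reaches the same HTTP session
        ((BindingProvider) port).getRequestContext()
                .put(BindingProvider.SESSION_MAINTAIN_PROPERTY, Boolean.TRUE);
    }

    public void run() {
        try {
            port.create(10, 100);                     // CREATE: new session
            ExecutorService exec = Executors.newFixedThreadPool(children);
            for (int c = 0; c < children; c++) {
                exec.submit(new Runnable() {
                    public void run() {
                        for (int i = 0; i < childIterations; i++) {
                            if (Math.random() < 0.5) { // the -ur update ratio
                                port.update(0, 100);   // UPDATE: modify one attribute
                            } else {
                                port.refresh();        // REFRESH: read-only call
                            }
                        }
                    }
                });
            }
            exec.shutdown();
            exec.awaitTermination(1, TimeUnit.MINUTES);
            port.delete();                            // DELETE: invalidate the session
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}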
The first interesting thing I tested was the locking part. When executing the client with two inner threads (the -ct option creates the second level of threads described above, which handle the same session, so the session is accessed in parallel by all of them) against the sticky Apache, times were uniform:
$ java -cp . es.rickyepoderi.managertest.client.Test \
    -b "http://192.168.122.22/manager-test/SessionTest?wsdl" \
    -ct 2 -ci 5 -ts 100 -d
11:32:33.504 (ExecutorParent-8|null): CREATE res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=42
11:32:33.548 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=10
11:32:33.548 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=11
11:32:33.658 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=17
11:32:33.660 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=32
11:32:33.776 (ExecutorParent-8|ExecutorChild-11): UPDATE res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=12
11:32:33.792 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=13
11:32:33.888 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=12
11:32:33.905 (ExecutorParent-8|ExecutorChild-12): UPDATE res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=11
11:32:34.000 (ExecutorParent-8|ExecutorChild-11): UPDATE res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=11
11:32:34.017 (ExecutorParent-8|ExecutorChild-12): UPDATE res=debian1:8d85190ffdf0b9bbb88f61cc45e4 time=10
11:32:34.128 (ExecutorParent-8|null): DELETE res=debian1 time=30
ERRORS: 0
CREATE  count: 1.0 mean: 42.0 dev: 0.0
UPDATE  count: 4.0 mean: 11.0 dev: 0.7071067811865476
REFRESH count: 6.0 mean: 15.833333333333334 dev: 7.559026980299042
DELETE  count: 1.0 mean: 30.0 dev: 0.0
As you can see, all operations (update and refresh) take just around 10 ms (the standard deviation is low). But when executing the multi-threaded client against the non-sticky server, one execution time goes crazy:
$ java -cp . es.rickyepoderi.managertest.client.Test \
    -b "http://192.168.122.21/manager-test/SessionTest?wsdl" \
    -ct 2 -ci 5 -ts 100 -d
11:32:55.160 (ExecutorParent-8|null): CREATE res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=45
11:32:55.208 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=15
11:32:55.324 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian1:8d8a4e2efa6998e424a432e9d4e9 time=10
11:32:55.434 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=11
11:32:55.546 (ExecutorParent-8|ExecutorChild-11): REFRESH res=debian1:8d8a4e2efa6998e424a432e9d4e9 time=12
11:32:55.208 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d8a4e2efa6998e424a432e9d4e9 time=416
11:32:55.659 (ExecutorParent-8|ExecutorChild-11): UPDATE res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=12
11:32:55.726 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d8a4e2efa6998e424a432e9d4e9 time=11
11:32:55.838 (ExecutorParent-8|ExecutorChild-12): UPDATE res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=12
11:32:55.950 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian1:8d8a4e2efa6998e424a432e9d4e9 time=7
11:32:56.058 (ExecutorParent-8|ExecutorChild-12): REFRESH res=debian2:8d8a4e2efa6998e424a432e9d4e9 time=13
11:32:56.171 (ExecutorParent-8|null): DELETE res=debian1 time=25
ERRORS: 0
CREATE  count: 1.0 mean: 45.0 dev: 0.0
UPDATE  count: 2.0 mean: 12.0 dev: 0.0
REFRESH count: 8.0 mean: 61.875 dev: 133.86414521820248
DELETE  count: 1.0 mean: 25.0 dev: 0.0
One refresh operation lasts 416 ms, and that is because the session is locked by the other server/thread. There are two threads executing operations in parallel and, because it is non-sticky, both servers are processing calls. If one server tries to get a session but it is locked by the other, the internal GlassFish implementation tries again a bit later, and that is what is shown here. So it is working: sessions are locked correctly, and it is assured that a modification on one server does not collide with another. No error is displayed.
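GlassFish hides that retry, but just to illustrate the mechanism, here is a minimal sketch of a locked read with retry using the Couchbase Java client's asyncGetAndLock; the expiry and back-off values are invented and the real couchbase-manager code is more involved:

import java.util.concurrent.ExecutionException;
import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.CASValue;
import net.spy.memcached.internal.OperationFuture;

// Sketch: keep trying to lock the session while another instance holds it;
// the wait is what produces outliers like the 416 ms refresh above.
public class LockedSessionLoader {

    private static final int LOCK_SECONDS = 30;   // assumed lock expiry
    private static final long RETRY_MILLIS = 100; // assumed back-off

    private final CouchbaseClient client;

    public LockedSessionLoader(CouchbaseClient client) {
        this.client = client;
    }

    public CASValue<Object> loadAndLock(String id)
            throws InterruptedException, ExecutionException {
        for (;;) {
            OperationFuture<CASValue<Object>> f =
                    client.asyncGetAndLock(id, LOCK_SECONDS);
            CASValue<Object> cas = f.get();
            if (f.getStatus().isSuccess()) {
                return cas;             // lock acquired: cas.getValue() is the session
            }
            // simplified: a missing session would need separate handling here
            Thread.sleep(RETRY_MILLIS); // locked elsewhere: back off and retry
        }
    }
}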
Another interesting thing was that the replication configuration produces errors when non-sticky. It seems that replication needs some time to be processed from one server to the other (I am really not sure whether I have some error in the GlassFish setup, but I did not find anything). For example:
$ java -cp . es.rickyepoderi.managertest.client.Test \
    -b "http://192.168.122.21/manager-test/SessionTest?wsdl" \
    -ci 3 -ts 50 -d
10:35:20.192 (ExecutorParent-8|null): CREATE res=debian1:dca46f0af1e07ac1db60e7a8157e time=44
10:35:20.289 (ExecutorParent-8|ExecutorChild-11): REFRESH res=ERROR - debian2 time=12
10:35:20.352 (ExecutorParent-8|ExecutorChild-11): UPDATE res=ERROR - debian1 time=475
10:35:20.877 (ExecutorParent-8|ExecutorChild-11): UPDATE res=ERROR - debian2 time=11
10:35:20.939 (ExecutorParent-8|null): DELETE res=debian1 time=488
ERRORS: 3
CREATE  count: 1.0 mean: 44.0 dev: 0.0
UPDATE  count: 2.0 mean: 243.0 dev: 232.0
REFRESH count: 1.0 mean: 12.0 dev: 0.0
DELETE  count: 1.0 mean: 488.0 dev: 0.0
That means that after the session was created in debian1 it was not found in debian2 (I suppose the session had not been replicated yet), so another session was created (with no attributes, and that was the origin of the following errors). So obviously the use of children threads (threads that execute over the same session) also fails in the non-sticky replication environment.
Finally, I present some graphs about performance. I tested five different situations:
- No replication and sticky. The manager-test web service application was deployed without the distributable tag and was tested against the sticky Apache (non-sticky does not work because no replication or couchbase is used).
- Replication and sticky. The application was deployed with replication but accessed through the sticky Apache.
- Replication and non-sticky. Same as before but accessing the non-sticky Apache.
- Couchbase and sticky. The manager-test application is configured with the couchbase manager but accessed through the sticky Apache.
- Couchbase and non-sticky. Same deployment but accessing the non-sticky Apache.
For each situation four different types of session were tested: 50 bytes (1 attribute of 50 bytes), 200 bytes (4x50), 2000 bytes (20x100) and 4000 bytes (40x200). I present the mean time table for the 5 environments and 4 sizes. In the third situation (replication and non-sticky) errors were reported (the session is not replicated yet and produces errors). All the tests used 16 threads (-t 16), 20 iterations per thread (-i 20, so 20 sessions were created and destroyed by each thread), 50 child iterations (-ci 50, each session was accessed, updated or refreshed, 50 times before deletion) and half a second of sleep time (-ts 500, after each operation the thread slept for 500 ms). The number of attributes (-a) and the size of each attribute (-s) were changed to test the four different types of session mentioned before, as shown in the example run below. I attach here the spreadsheet with all the numbers (mean and deviation), but the mean graphs are presented below.
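As an illustration, the 2000-byte run of the couchbase sticky case would be launched more or less like this (the exact command line is my reconstruction from the options above):

$ java -cp . es.rickyepoderi.managertest.client.Test \
    -b "http://192.168.122.22/manager-test/SessionTest?wsdl" \
    -t 16 -i 20 -ci 50 -ts 500 -a 20 -s 100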
The couchbase manager is just a bit slower in refresh and update operations, but its create and delete operations are clearly worse. The size of the sessions does not matter much; in all the scenarios times are more or less independent of the size. Couchbase is very reliable in non-sticky environments (the other non-sticky environment, replication, behaves very strangely: creation time is enormous and errors are returned; as I said, I am not very confident about that setup). Another curiosity is that couchbase performs better in sticky than in non-sticky mode, although the same code is executed in both cases. As a conclusion, if I want my manager to be more competitive I need to implement a special sticky property that handles sticky configurations better (saving locks and gets), in the same way Martin does in memcached-session-manager. That is the next point to implement when I have some time. Nevertheless I am very happy, because the implementation works well with two servers and times are not excessive.
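I have not written that part yet, but the idea could look something like this hypothetical sketch (none of these names are the manager's real code): when the manager knows the balancer is sticky, it trusts the local in-memory copy and skips the remote lock/get that every request pays now.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the planned sticky optimization: with a sticky
// balancer, the owning instance can trust its in-memory copy and avoid
// the Couchbase lock/get on every request.
public class StickyAwareManager {

    private final boolean sticky;
    private final Map<String, Object> localSessions =
            new ConcurrentHashMap<String, Object>();

    public StickyAwareManager(boolean sticky) {
        this.sticky = sticky;
    }

    public Object findSession(String id) {
        Object cached = localSessions.get(id);
        if (sticky && cached != null) {
            return cached;                              // sticky: no remote lock or get
        }
        Object session = loadAndLockFromCouchbase(id);  // non-sticky: locked remote read
        localSessions.put(id, session);
        return session;
    }

    // Placeholder for the locked read sketched earlier in this entry.
    private Object loadAndLockFromCouchbase(String id) {
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}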
Stay tuned for more news!