Sunday, October 13. 2013
Coordination in EclipseLink
Today I am going to show an interesting feature of the EclipseLink JPA implementation. As you know, JPA (Java Persistence API) is the technology used in Java EE to work with relational databases in an object-oriented way: it maps rows, tables and all the underlying database machinery to plain Java objects. One of its main features is the Java Persistence Query Language (JPQL), a platform-independent, object-oriented query language. EclipseLink is a JPA implementation and is the default provider in several containers; as you can imagine, GlassFish is one of them.
EclipseLink (like all the major implementations) uses a second-level cache, called the session cache (check the documentation for a better understanding of the EclipseLink cache system). This cache is enabled by default and mainly stores entity objects (query results can also be cached, but that requires specific configuration or code). Out of the box the cache lives inside the JVM heap memory. Therefore you have to be careful when you deploy a multi-instance / cluster solution.
I have seen several projects that were not aware of this problem and only detected the data inconsistency at the final stages. To resolve this issue without disabling the whole cache system there are two options: use a distributed cache that spans JVMs and can be integrated into the specific JPA implementation (there are several open source products, like memcached or Ehcache, and proprietary ones such as Oracle Coherence); or configure the JPA implementation to notify the other JVMs about object changes. Obviously the former technique is the more robust integration, but for a lot of applications plain notification is more than enough; just keep in mind that the notification solution is asynchronous in nature. This entry is going to show how to enable EclipseLink cache coordination using a JMS (Java Message Service) topic with GlassFish.
Cache coordination is the mechanism EclipseLink uses to notify object changes. There are two out-of-the-box configurations, rmi and jms, and the jms solution is much simpler. With this kind of notification every JVM subscribes to a topic; when an object is modified, the modifying instance sends a message that is received by all the members of the cluster, and each instance evicts the object from its cache at that moment.
To demonstrate the solution, the following steps were performed:
My typical GlassFish environment was set up: two Debian KVM boxes with a two-instance cluster. I followed the setup from my previous entry but using the new version 4; for this example the Apache web layer is not necessary.
A GlassFish cluster provides a JMS cluster too (it is an embedded solution by default: the container itself and the message broker run in the same JVM). I did not change that; the default configuration was used.
Once the cluster is up and running, the JMS resources are created. A connection factory and a topic are necessary. First the factory was created:
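The factory can also be created from the command line instead of the admin console; a sketch with asadmin (the --restype value is an assumption, while the resource name matches the one referenced in persistence.xml below):

```shell
# Sketch: create the JMS connection factory used for cache coordination.
# useSharedSubscriptionInClusteredContainer=false avoids the
# "Subscription is shared but no ClientID was set on connection" error
# when EclipseLink subscribes to the topic from a clustered container.
./asadmin create-jms-resource \
  --restype javax.jms.TopicConnectionFactory \
  --property useSharedSubscriptionInClusteredContainer=false \
  jms/CacheCoordinationFactory
```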
Please notice the property useSharedSubscriptionInClusteredContainer=false. The topic did not work until that feature was disabled (a weird exception with the message "Subscription is shared but no ClientID was set on connection" was thrown when the EclipseLink code subscribed to the topic); I found the solution in the OpenMQ documentation.
Then the topic was added:
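The equivalent command-line form for the topic would look like this (a sketch; the resource name matches the one referenced in persistence.xml below):

```shell
# Sketch: create the JMS topic that carries the cache-change notifications.
./asadmin create-jms-resource \
  --restype javax.jms.Topic \
  jms/CacheCoordinationTopic
```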
And both resources were assigned to all the targets (the admin server instance and the cluster cluster1 itself).
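If the resources were created on the default server target only, the assignment to the cluster can be scripted with asadmin create-resource-ref; a sketch, assuming the resource names from persistence.xml:

```shell
# Sketch: reference the coordination resources from the cluster target
# so every instance can look them up via JNDI.
./asadmin create-resource-ref --target cluster1 jms/CacheCoordinationFactory
./asadmin create-resource-ref --target cluster1 jms/CacheCoordinationTopic
```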
Finally I developed a simple JPA application, a copy of the clusterjsp sample but backed by a database. There is only one entity, which stores properties; the object maps to two tables (a main table whose primary key is an auto-increment identifier, and a sub-table with the key / value pairs and a foreign key reference to the main one).
The integrated JavaDB was used as the database. It can be started directly from the command line like this:
$ cd ${GLASSFISH4_HOME}/bin
$ ./asadmin start-database
It was running on the first host (debian1), so the default DerbyPool was changed to point to this host (the pool is configured against localhost by default, a configuration that would not have worked on the second instance of the cluster).
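One way to repoint the pool without the admin console is the asadmin set command; a sketch, assuming the default pool name DerbyPool and this dotted attribute path:

```shell
# Sketch: make the default Derby pool connect to debian1 instead of localhost.
./asadmin set resources.jdbc-connection-pool.DerbyPool.property.serverName=debian1
# Check the value afterwards:
./asadmin get resources.jdbc-connection-pool.DerbyPool.property.serverName
```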
The default resource jdbc/__default, which uses the DerbyPool, should also be available for both targets (server and cluster1), exactly as was done with the JMS resources (see previous image).
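That assignment can also be done from the command line; a sketch with create-resource-ref, assuming the resource is only referenced by the server target initially:

```shell
# Sketch: make the default JDBC resource visible on the cluster target.
./asadmin create-resource-ref --target cluster1 jdbc/__default
```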
Finally the persistence.xml file was configured:
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1" xmlns="http://xmlns.jcp.org/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
  <persistence-unit name="clusterjpaPU" transaction-type="JTA">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <jta-data-source>jdbc/__default</jta-data-source>
    <properties>
      <property name="eclipselink.target-database" value="Derby"/>
      <property name="eclipselink.ddl-generation" value="drop-and-create-tables"/>
      <property name="eclipselink.cache.coordination.protocol" value="jms"/>
      <property name="eclipselink.cache.coordination.jms.topic" value="jms/CacheCoordinationTopic"/>
      <property name="eclipselink.cache.coordination.jms.factory" value="jms/CacheCoordinationFactory"/>
    </properties>
  </persistence-unit>
</persistence>
As you can see, the tables are dropped and created on every deployment, the Derby database platform is specified and the coordination protocol is set to jms. The remaining parameters simply reference the resources created in the previous steps (JDBC pool, JMS factory and topic).
And that is all. The following video shows the two scenarios: first without coordination (the three coordination properties in the persistence file were commented out) and then with the feature enabled. As you can see, without notifications each server is not aware of the changes performed on the other machine. It is interesting that queries still work but the data in the row is not updated (queries return the ids and the objects are then retrieved from the internal cache, which is out of date); the only way to fix the situation is to evict the whole cache. Then the application is redeployed with coordination enabled. In this second scenario everything works perfectly: objects are updated because notifications are sent between the instances.
Today's entry explained how to configure the EclipseLink cache to work in a coordinated way in a multi-instance environment. As usual the GlassFish application server was used (this time the new version 4.0). Except for the strange property in the factory resource, everything worked smoothly and the example clusterjpa application ran without problems. Please download the NetBeans project from here if you are interested.
Coordinated regards!