Wednesday, May 25, 2011

Data Grids And The Race To The Cloud

Two announcements from the past month indicate that the data grid market is heating up and that everybody needs one for their cloud offering:
  1. Software AG acquires Terracotta Inc., a leader in in-memory and cloud enabling technology (May 23, 2011)
  2. Red Hat launches Early Access Program for New Offering; New Data Grid Solution Brings Cloud Scale and Agility to the Data-Tier (May 03, 2011)
In the first announcement, Software AG, a provider of BPM solutions with revenues of €1.1 billion in 2010, is acquiring Terracotta, one of the leading providers of in-memory data grid technology (Ehcache, BigMemory).
Software AG clearly states that Terracotta was acquired to support its cloud offering:
The ability to scale the processing of massive loads of data across flexible, modular, geographically distributed architectures will also drive cloud adoption and transform Software AG into a full Platform-as-a-Service (PaaS) provider in the mid-term

The second announcement is by Red Hat, launching a new data grid solution based on Infinispan, a JBoss community project and, in my opinion, one of the promising solutions in this space (lock-free, MVCC). The term "cloud" appears 7 times in this announcement, plus one occurrence of the term "PaaS".
Red Hat has already launched two cloud solutions: CloudForms and the OpenShift Platform-as-a-Service (PaaS). Currently OpenShift supports Memcached (which is a distributed, but not elastic, cache solution), so the new elastic data grid is a necessary addition to the OpenShift offering.
Another interesting point: the Red Hat announcement also refers to a blog post by Forrester Research analyst Mike Gualtieri which, among other things, deals with the tight relation between "elastic caching" and "cloud computing". Mike Gualtieri covered the topic of elastic caching in depth in two blog posts, which got a lot of attention from players in this market, not all of them mentioned in his posts (Oracle Coherence, GigaSpaces, IBM and now Red Hat).
If you want to read more about the different classifications of cache solutions and the role of elastic caches in cloud-based solutions, I recommend Nati Shalom's post - WTF is elastic data grid

These two announcements add to the recent acquisition of GemStone, which develops distributed caching and database solutions, by VMware (SpringSource), resulting in the addition of GemFire to the vFabric cloud application platform.

So it seems that almost all the ISVs in the data grid market have been acquired by the big players to serve their cloud strategies.
If I'm not mistaken, GigaSpaces is the only serious independent player left in this domain (although its offering is much wider than just an elastic cache). I'm very interested to see whether it will stay that way.

      Monday, May 23, 2011

      OpenShift Flex part 2 - Create, deploy and run a SeamForge application in less than 10 minutes

      In my last post I started reviewing OpenShift - the new PaaS offering by Red Hat.
      After playing with the sample application provided with OpenShift, I wanted to check how to deploy my own application.
      I decided to go with a very basic application created using SeamForge, a rapid application generation tool which is part of the Seam framework. What's nice about SeamForge is that it also lets you take an existing Java EE project and work in new functionality.
      Following the SeamForge documentation (7 simple steps and 9 commands) I generated a basic Java EE CRUD web application based on JSF, Facelets, Seam and Metawidget.
      By the way, there is an open issue in the SeamForge Jira for directly deploying Forge applications to OpenShift.

      Next I went on to deploying and running the application on OpenShift:

      The first step was to create a new application (applications --> add application).


      Once the application is named and given a version, it is time to decide which infrastructure components are required for running the application. In order to do so, you need to choose the newly created application from the applications list and select the "Components" tab.
      In this case I only needed a JDK and a JBoss application server (version 6 or 7). There was no need for an external database since the application uses the JBoss default datasource.


      The next step is uploading the application files. Staying in my new application's context, I switched to the "Files" tab and chose to upload the application WAR file. Notice that choosing to import an existing application requires a file in VPM format (and not a Java archive, as I tried before RTFM).
      After uploading, OpenShift automatically extracted my WAR file. From the "Files" tab I can now view my application files and edit them (in a very simple editor). This is very useful if you want to make some configuration changes without sending files back and forth to the server.

      At this point OpenShift indicated that there were 42 changes to the application, which were actually all the files I had imported. You can see the list of changes in the console and diff them against the previous versions of the files.
      Since I had nothing more to configure, I went on to deploying the application. In a more realistic scenario this would be the time to perform some configuration tasks on the various infrastructure components. OpenShift provides a way to configure the components using a wizard, which is actually a form with all the major configuration details, or an expert mode which allows you to edit the configuration files directly.

      So, the last step was choosing to deploy all changes, which actually meant deploying the application to JBoss.




      After the last step I had a deployed application which could be tested.
      The "Overview" tab of the "Applications" section has all the information required for accessing your application.

      The final step was checking that the application actually works.

      So, using SeamForge and OpenShift I had a simple web application created, deployed and running in less than 10 minutes.

      Of course this is a very simple application and not a real life scenario, and yet it shows a little about the power of RAD + PaaS.

      Wednesday, May 18, 2011

      Test Driving Red Hat OpenShift Flex

      OpenShift is a new platform-as-a-service offering by Red Hat, announced on May 4th.
      OpenShift will be available in Express, Flex and Power modes. The Express and Flex offerings were made available as a developer preview, and the Power option is coming soon.
      The new platform is aimed at Java, Ruby, PHP and Python applications, and has support for the MySQL and MongoDB databases.
      OpenShift Express offers free hosting, while OpenShift Flex offers a greater degree of control and choice over the middleware components, with built-in versioning, monitoring and auto scaling.
      The OpenShift platform is based on code by Makara (a cloud start-up company acquired by Red Hat in 2010) plus some RHEL isolation features, the JBoss EE runtime and some other components.
      For now OpenShift is not open-sourced, but Red Hat has promised to make the code available.

      I decided to give the OpenShift Flex version a try.
      In order to try OpenShift you have to register as a new user. After you are registered and logged in, you can start using the Flex Console.
      At the beginning you will have to follow a few initial setup steps, including setting up a cloud provider (currently only Amazon EC2), creating a cluster of servers and deploying a sample application. Notice that you need an active Amazon EC2 account, and you will be required to provide your account access keys (which can be found in the AWS management console).
      The first 3 steps are part of an 8-step self-guided tour of OpenShift. You can come back to this guide whenever you want.
      The Flex console is, surprisingly, a Flex-based graphical user interface for provisioning cloud resources, governing the size and location of your cloud servers and managing the lifecycle of your cloud-deployed applications.
      The console is designed as a portal with a tab panel containing all of its major functionality. The major areas of the console are:

      The "INTRO" tab allows you to take the self-guided tour and provides links to some how-to guides (PDF format).


      The "CLOUDS" tab allows you to manage the cloud accounts which are used for hosting your applications.
      Once you have a cloud account set up, you can move on to the "CLUSTERS" tab to create a cluster of servers and define basic cluster characteristics, such as number of servers, number of cores, memory and disk space, as well as more advanced ones related to auto scaling. Additionally, this area contains functionality for importing/exporting databases and setting email preferences.
      Creating a new cluster results in creating cloud servers and installing the necessary software on them.


      The "SERVERS" tab shows information about the cloud servers in your cluster, including basic resource utilization statistics. It also allows you to join/unjoin servers to a cluster.
      Applications are added and managed in the "APPLICATIONS" tab. In this area you can control various aspects of the application lifecycle from build to deployment, view the application files, configure the application and allocate various software components (JDK, web server, app server, database, cache and more). It is also possible to view the changes made to the application files, diff them against previous versions and deploy them to production.



      The "PERFORMANCE" tab is used for performance monitoring and contains a lot of nicely displayed visual information about the performance of various components of the application from the underlying server to the application code.


      Additionally there are tabs for viewing various logs and events.

      My first experience with the console was OK. It is a little buggy and the UI still needs improvement: some actions take time without giving any visual indication, while other actions leave you stuck with a rotating clock indicator that never disappears. I also encountered some connectivity and stability issues, but after all, this is only a developer preview for now.
      The next steps will be to test the more interesting features provided by OpenShift, such as auto scaling, and to deploy my own applications on it (not the samples provided by Red Hat).

      In the next posts I will dive deeper into the various functionality provided by OpenShift and my personal experience with it.

      Friday, February 19, 2010

      Migrating a Spring/Hibernate application to MongoDB - Part 1

      Background

      For the past few years, ORM has been the de facto solution for bridging the gap between object-oriented programming languages and relational databases. Well, most developers using ORM care more about writing less persistence code and SQL than they care about the object-relational impedance mismatch. As time passed and more experience was gained, some people started to claim that maybe ORM is not the best solution available.
      Another option for storing your objects, which has been around for quite some time, is non-SQL databases. With the recent explosion of non-relational databases and the NoSQL ("Not Only SQL") movement, this option is becoming more and more viable.

      There are a lot of examples showing how to develop a new application based on a non-relational database. But what if you already have an application using a relational database + ORM and you want to migrate it to a non-relational database?

      In the next few posts I will try to suggest a migration path for a Spring/Hibernate (JPA) application to MongoDB.
      MongoDB is a scalable, high-performance, open source, schema-free, document-oriented database and is one of the interesting non-relational databases available today (together with Cassandra, HBase, Redis and others).

      The application

      The example application is a very simple blogging engine implemented using the Spring/Hibernate (JPA) stack.
      The two main entities are Blogger and BlogPost. There are data access objects (DAO) with matching interfaces for both entities.

      Setting up and connecting to MongoDB

      Setting up MongoDB is a pretty simple procedure. The MongoDB quickstart and getting started pages contain all the required details.
      In order to connect to MongoDB we will need to use the Mongo Java driver:

      <dependency>
          <groupId>org.mongodb</groupId>
          <artifactId>mongo-java-driver</artifactId>
          <version>1.2.1</version>
      </dependency>


      The next step is adding a new Spring service that will provide MongoDB connections. This is a basic implementation which uses the default configuration.

      import com.mongodb.DB;
      import com.mongodb.Mongo;
      import java.net.UnknownHostException;

      public class MongoService {
          private final Mongo mongo;
          private final DB db;

          public MongoService(final String dbName) throws UnknownHostException {
              mongo = new Mongo(); // connects to the MongoDB server (localhost:27017)
              db = mongo.getDB(dbName); // get a handle to the database
          }

          public Mongo getMongo() {
              return mongo;
          }

          public DB getDb() {
              return db;
          }
      }
      


      The Mongo class is responsible for the database connection and contains a connection pool. The default pool size is 10 connections per host. You can configure the pool size by using the MONGO.POOLSIZE system property or by passing a MongoOptions parameter to the Mongo constructor.
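      For example, here is a minimal sketch of configuring a larger pool through MongoOptions (the pool size, host, database and class name are illustrative; connectionsPerHost is the relevant field in later versions of the driver):

      import java.net.UnknownHostException;

      import com.mongodb.DB;
      import com.mongodb.Mongo;
      import com.mongodb.MongoOptions;
      import com.mongodb.ServerAddress;

      public class MongoPoolExample {
          public static void main(String[] args) throws UnknownHostException {
              // Enlarge the connection pool instead of relying on the default of 10 connections per host
              final MongoOptions options = new MongoOptions();
              options.connectionsPerHost = 20;

              final Mongo mongo = new Mongo(new ServerAddress("localhost", 27017), options);
              final DB db = mongo.getDB("blog");
              System.out.println(db.getName());
          }
      }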
      The DB class represents a logical database on the MongoDB server. We will use a database named "blog" for the blogging application.

      <bean id="mongo" class="my.demo.blog.services.MongoService">
          <constructor-arg index="0" value="blog"/>
      </bean>
      

      Entities and Documents

      MongoDB stores data in collections of BSON documents. Documents may contain any number of fields of any length and type. Usually you should store documents of the same structure within collections. MongoDB collections are essentially named groupings of documents.
      The Mongo Java driver provides a DBObject interface to save custom objects to the database.
      A DBObject is very similar to a Map with String keys. You can put/get document elements by their String keys and get a list of all available keys.
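      To get a feel for the interface, here is a small standalone example using the driver's own BasicDBObject implementation (the field values and collection name are just for illustration):

      import java.net.UnknownHostException;

      import com.mongodb.BasicDBObject;
      import com.mongodb.DBCollection;
      import com.mongodb.DBObject;
      import com.mongodb.Mongo;

      public class DbObjectExample {
          public static void main(String[] args) throws UnknownHostException {
              final DBCollection bloggers = new Mongo().getDB("blog").getCollection("Blogger");

              // DBObject behaves much like a Map with String keys
              final DBObject doc = new BasicDBObject();
              doc.put("displayName", "John Doe");
              doc.put("email", "john@example.com");

              bloggers.save(doc);               // the driver assigns an _id on save
              System.out.println(doc.keySet()); // now also contains the generated _id
          }
      }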
      In order to save our entities in MongoDB we will create an adapter which implements the DBObject interface.

      public class DbObjectAdapter implements DBObject {
          private final BeanUtilsBean beanUtils;
          private final Object entity;
          private final Set<String> keySet;
      
          public DbObjectAdapter(Object entity) {
              if (entity == null) {
                  throw new IllegalArgumentException("Entity must not be null");
              }
              if (!entity.getClass().isAnnotationPresent(Entity.class)) {
                  throw new IllegalArgumentException("Entity class must have annotation javax.persistence.Entity present");
              }
              this.entity = entity;
              this.beanUtils = new BeanUtilsBean();
              this.keySet = new HashSet<String>();
              initKeySet();
          }
      
          @Override
          public Object put(String name, Object value) {
              try {
                  beanUtils.setProperty(entity, name, value);
              } catch (Exception e) {
                  return null;
              }
              return value;
          }
          
          @Override
          public Object get(String name) {
              try {
                  return beanUtils.getProperty(entity, name);
              } catch (Exception e) {
                  return null;
              }
          }
      

      In order to decide which members of the entity we would like to store in MongoDB, we create a new annotation - @MongoElement - and annotate the selected getter methods.

          @MongoElement
          public String getDisplayName() {
              return displayName;
          }
      
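      The annotation itself is not shown in this post; a minimal definition that would support the usage above (a sketch, assuming runtime retention and method-level targeting, which the reflection-based key set below relies on) could look like this:

      import java.lang.annotation.ElementType;
      import java.lang.annotation.Retention;
      import java.lang.annotation.RetentionPolicy;
      import java.lang.annotation.Target;

      // Marks a getter whose value should be stored as a field of the MongoDB document
      @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
      @Target(ElementType.METHOD)         // applied to getter methods
      public @interface MongoElement {
      }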

      The DbObjectAdapter creates the document key set by looking for the annotated methods.

          @Override
          public Set<String> keySet() {
              return keySet;
          }
      
          private void initKeySet() {
              final PropertyDescriptor[] descriptors = beanUtils.getPropertyUtils().getPropertyDescriptors(entity);
              for (PropertyDescriptor desc : descriptors) {
                  final Method readMethod = desc.getReadMethod();
                  // only getters annotated with @MongoElement become document fields
                  if (readMethod != null && readMethod.isAnnotationPresent(MongoElement.class)) {
                      keySet.add(desc.getName());
                  }
              }
          }
      

      With the DbObjectAdapter in place, we can create a base DAO class for storing entities in MongoDB:

      public abstract class BaseMongoDao<S> implements BaseDao<S> {
          private MongoService mongo;
          private DBCollection collection;
      
          @Override
          public S find(Object id) {
              // Look the document up by its ObjectId and copy it into a new entity instance.
              // The class-based adapter constructor, putAll and getEntity are not shown in the excerpt above.
              final DBObject dbObject = collection.findOne(new ObjectId((String) id));
              final DbObjectAdapter adapter = new DbObjectAdapter(getEntityClass());
              adapter.putAll(dbObject);
              return (S) adapter.getEntity();
          }
      
          @Override
          public void save(S entity) {
              collection.save(new DbObjectAdapter(entity));
          }
      
          @Autowired
          public void setMongo(MongoService mongo) {
              this.mongo = mongo;
              // use the entity class name as the collection name
              this.collection = mongo.getDb().getCollection(getEntityClass().getSimpleName());
          }
      
          /**
           * 
           * @return the entity class this DAO handles
           */
          public abstract Class getEntityClass();
      }
      

      Notice that there is no need to create the collections; the database creates them automatically on the first insert.
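      As a usage example (hypothetical, not taken from the original code), a concrete DAO for the BlogPost entity only needs to supply the entity class; everything else comes from BaseMongoDao:

      // Hypothetical concrete DAO: documents end up in a collection named "BlogPost"
      public class BlogPostMongoDao extends BaseMongoDao<BlogPost> {

          @Override
          public Class getEntityClass() {
              return BlogPost.class;
          }
      }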

      Next part

      So far we've seen how to set up MongoDB and how to store our entities in it.
      In the next parts of this series we will discuss the following migration topics:
      • Identifiers
      • Relations
      • Queries
      • Data migration

      Wednesday, February 17, 2010

      Why you should look at the exceptions tab when profiling

      When profiling an application I always like to take a look at the exceptions tab (I use the YourKit Java profiler). Frequent exceptions may show that something is going wrong without you knowing about it, since someone preferred to swallow the exception and hope for the best.
      Today, while trying to figure out a performance issue related to classloader synchronization in WebLogic, I noticed that IndexOutOfBoundsException is frequently thrown by the business layer of the application.


      The code clearly speaks for itself:

         public Object getObject1() {
             try {
                 return (Object) getObjectList().get(0);
             } catch (IndexOutOfBoundsException e) {
                 return null;
             }
         }

         public Object getObject2() {
             try {
                 return (Object) getObjectList().get(1);
             } catch (IndexOutOfBoundsException e) {
                 return null;
             }
         }

      * Method and class names were altered
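      For reference, a cheaper variant of the same logic (a sketch, not the fix that was actually applied) avoids using the exception for flow control by checking the list size first:

         public Object getObject1() {
             final List<?> list = getObjectList();
             // No exception is thrown (or swallowed) when the list is empty
             return list.isEmpty() ? null : list.get(0);
         }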

      Sunday, February 7, 2010

      JBoss, Java6, InstanceNotFoundException and Yourkit profiler

      Today I spent a few hours trying to figure out why, all of a sudden, JBoss started up with annoying InstanceNotFoundException messages.
      There were two changes from the previous working configuration:
      • Switched to Java 6.0 (found out I was using 5.0 by mistake)
      • A new MBean was added
      The problem did not reoccur when switching back to Java 5.0, but that did not help me much.
      Cursing the entire world and blaming the guy who wrote the new MBean did not solve the issue, so I started checking other things. It turned out that removing the YourKit profiler agent from the JBoss start script prevented the issue.
      Digging into the YourKit startup options, I found out that the profiler's light-weight telemetry may clash with some Java EE application servers' MBean implementations. The YourKit J2EE integration wizard (which I did not use) adds a startup option which starts the telemetry with a delay - "delay=10000".
      Adding the delay option solved my problem!
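      For reference, the delay option is appended to the profiler agent's startup parameters in the JBoss start script, roughly like this (the agent path is illustrative and depends on your YourKit installation and platform):

      JAVA_OPTS="$JAVA_OPTS -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=delay=10000"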

      Friday, February 5, 2010

      Beware of Hibernate's read/write cache strategy with memcached

      Recently I've been working on improving the performance of an application which involves massive data processing. The application is a Java EE application using Hibernate for persistence and Memcached as its second-level cache. Almost all of the entities are cached to reduce the load on the database.
      My immediate goal was improving performance without radically changing the system architecture (I'm well aware of better technologies to use for such an application).

      While profiling the application I noticed that when a bunch of new threads start processing data, they enter a blocked state one after the other and remain like this for 30-60 seconds.
      Looking at their stacks I immediately saw that they are all blocked on Hibernate's ReadWriteCache put/get methods.



      Apparently most of the entities were cached with a read/write strategy.
      A read/write cache should prevent two threads from updating the same cache element concurrently, or from updating and reading it concurrently - so it makes sense to see locks. But it turns out Hibernate uses method-level synchronization, which also prevents two threads from reading the same cache element concurrently.
      Now, when using a local cache this issue is probably less noticeable, but when using a distributed caching solution such as Memcached, cache access times are longer, so more threads end up waiting for each other.
      The cache access time is even longer when you ask for an entity which is not in the cache: you have to wait for the cache to say the entity is not there, get it from the database and put it into the cache. During all this time the thread holds the monitor, preventing other threads from working with the cache.
      A better way to handle this would have been to use java.util.concurrent.locks.ReentrantReadWriteLock, which enables more fine-grained locking (a read lock for the get method and a write lock for the put method).
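      To illustrate the idea (a minimal sketch of such a wrapper, not Hibernate's actual code), a read/write lock lets concurrent readers proceed while still serializing writers:

      import java.util.concurrent.locks.ReentrantReadWriteLock;

      // Sketch only: a hypothetical cache wrapper guarded by a ReentrantReadWriteLock
      public class ReadWriteLockedCache {

          /** Minimal stand-in for the underlying cache provider (e.g. a memcached client). */
          public interface UnderlyingCache {
              Object get(Object key);
              void put(Object key, Object value);
          }

          private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
          private final UnderlyingCache cache;

          public ReadWriteLockedCache(UnderlyingCache cache) {
              this.cache = cache;
          }

          public Object get(Object key) {
              lock.readLock().lock();   // multiple readers can hold the read lock concurrently
              try {
                  return cache.get(key);
              } finally {
                  lock.readLock().unlock();
              }
          }

          public void put(Object key, Object value) {
              lock.writeLock().lock();  // writers get exclusive access
              try {
                  cache.put(key, value);
              } finally {
                  lock.writeLock().unlock();
              }
          }
      }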

      Another issue is cache regions. Hibernate creates a ReadWriteCache instance per region; if no regions are defined then only a single instance of ReadWriteCache is used, which makes the synchronization an even bigger problem.

      The solution for this issue was to switch to a nonstrict read/write strategy wherever possible and to create a cache region per entity. This reduced the locking effect dramatically.
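      For reference, with Hibernate annotations this kind of change is expressed per entity, along these lines (the entity and region names here are illustrative, not the project's actual code):

      import javax.persistence.Entity;
      import javax.persistence.Id;

      import org.hibernate.annotations.Cache;
      import org.hibernate.annotations.CacheConcurrencyStrategy;

      // Illustrative mapping: a dedicated cache region plus the nonstrict read/write strategy
      @Entity
      @Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE, region = "productCache")
      public class Product {

          @Id
          private Long id;

          // ... remaining fields and mappings ...
      }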