BigMemory 4.1 technical overview

BigMemory is a Java solution that lets you store terabytes of data in distributed memory across multiple servers without performance hits from the Java garbage collector. Because your data is accessed directly from memory, your application gets faster.

With the release of 4.1, a number of improvements and new features are making BigMemory an even more interesting product to consider when you’re working with a growing amount of data.

Let’s look at what’s new:
 
 
1) “Cross Language Client Support”

This allows BigMemory to be used from languages other than Java: .NET/C# and C++.

As BigMemory is a distributed system, you can have many Java applications sharing the same cached data.

Now applications written in other languages can use that data too. This makes it really easy to share information between, for instance, a Java and a C# application, without losing BigMemory’s goal of caching big data, even in those new languages.
 
 
2) “Multi-data center support”

Aka WAN replication. You probably already know the architecture of BigMemory, which is made of a number of Terracotta servers and Ehcache clients.
Your application is caching data in Ehcache, which is a client to the Terracotta servers. The Terracotta servers are responsible for replicating the data between applications.

This works very efficiently on a local network, and most of the time your application components (application server, database, etc.) will run in the same LAN. Since BigMemory is aimed at huge systems, however, your components may be spread across a wide area network.

In that case, you can have Terracotta servers running close to your application components, and enable WAN replication to let the Terracotta servers synchronize data between themselves in your WAN in order to keep the best performance.

You use the concept of ‘Regions’: each region has a Terracotta server array serving the application components that are part of that region. For instance, let’s say you have networks in different parts of the world, with data centers in Europe, Asia, and the USA.

In this configuration, you will have three groups of Terracotta server arrays, one in each region, and they all synchronize between themselves thanks to a component called the Orchestrator.

Besides, with the inclusion of Ehcache, you have an in-memory cache at the application level, which of course will improve the data access speed.
 
 
3) “Improved Search features”

BigMemory stores data following a structure, similar to a Map.

You have the possibility to search for data using queries, and some search improvements have been made in this new release.

Before discussing those, let’s recapitulate some of the search features that are supported.

Suppose you have a Person class with a field ‘age’, and instances of Person are stored in BigMemory. If you want to search for persons who are 35 years old, you would do this:

Query query = bigmemoryStore.createQuery()
                   .addCriteria(age.eq(35))
                   .includeKeys().end();

Results results = query.execute();

Here you include the keys so you can fetch the values later from BigMemory with a get(key), but you could also include the values directly if you prefer.
 
 
So what’s new?

When querying a huge data set, you might end up with Results that are too big to fit in heap memory. In this case you can turn pagination on, which hints BigMemory to fetch the Results in batches of a maximum size, instead of fetching everything at once:

int paginationSize = 100;

query.execute(
    new ExecutionHints().setResultBatchSize(paginationSize));
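
To make the batching idea concrete, here is a plain-Java sketch that does not use the BigMemory API at all. It simply walks a list of result keys in batches of at most paginationSize, which is roughly what the batch-size hint asks the store to do on your behalf. The names BatchSketch and inBatches are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: not part of the BigMemory/Ehcache API.
final class BatchSketch {

    // Splits the keys into consecutive batches of at most batchSize elements.
    static <K> List<List<K>> inBatches(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            keys.add(i);
        }
        // With a batch size of 100, 250 keys arrive as batches of 100, 100 and 50.
        for (List<Integer> batch : inBatches(keys, 100)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Only one batch needs to sit in heap at a time, which is the point of the hint.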

Let’s talk more about the search API:

You can use aggregators to perform operations on the results.

e.g., to find the sum of the age attribute:

query.includeAggregator(age.sum());

Also if you want to organize the results, you can use orders:

e.g., to order by increasing age:

query.addOrderBy(age, Direction.ASCENDING);

You can group results by attributes.

e.g., to group by firstName and lastName:

query.addGroupBy(
       bigmemoryStore.getSearchAttribute("firstName"),
       bigmemoryStore.getSearchAttribute("lastName"));
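
If it helps to see what those three operations compute, here is a plain-Java analogy using streams over an in-memory list. It is not the BigMemory search API; the class SearchSemanticsSketch, its nested Person class, and the sample data are all invented for the illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java analogy of the query operations above; not the BigMemory search API.
final class SearchSemanticsSketch {

    static final class Person {
        final String firstName;
        final String lastName;
        final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // Analogue of an age.sum() aggregator.
    static int sumOfAges(List<Person> people) {
        return people.stream().mapToInt(p -> p.age).sum();
    }

    // Analogue of ordering by age, ascending.
    static List<Person> byAscendingAge(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparingInt((Person p) -> p.age))
                .collect(Collectors.toList());
    }

    // Analogue of grouping by (firstName, lastName), counting each group.
    static Map<String, Long> countByFullName(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(
                        p -> p.firstName + " " + p.lastName,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ada", "Lovelace", 36),
                new Person("Alan", "Turing", 41),
                new Person("Ada", "Lovelace", 35));
        System.out.println(sumOfAges(people));                           // 112
        System.out.println(byAscendingAge(people).get(0).age);           // 35
        System.out.println(countByFullName(people).get("Ada Lovelace")); // 2
    }
}
```

The difference with BigMemory is that the store does this work on the server side over indexed attributes, instead of pulling every value into your heap first.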

The attributes of your class can be declared in your configuration, or be discovered automatically from the field name and field type.

e.g., if your class has:

private int age;

private String firstName;

private String lastName;

Then by making your cache searchable, you’ll be able to retrieve the corresponding attributes in order to use them in queries:

Attribute<Integer> age = bigmemoryCache.getSearchAttribute("age");

Attribute<String> firstName = bigmemoryCache.getSearchAttribute("firstName");

Attribute<String> lastName = bigmemoryCache.getSearchAttribute("lastName");

The most useful field types are supported (Boolean, Byte, Character, Double, Float, Integer, Long, Short, String, java.util.Date, java.sql.Date, Enum).
 
 
4) “BigMemory SQL”

Besides searching by writing queries in Java, you can now search using SQL-like queries.

For this, you’ll be using the class QueryManager.

In order to create an instance of the QueryManager, you can use the QueryManagerBuilder:

// Person here stands for the cache instance that holds the Person entries
QueryManager queryManager = 
         QueryManagerBuilder.newQueryManagerBuilder()
                            .addCache(Person).build();

Then you can call your SQL queries:

Query personQuery = 
           queryManager.createQuery("select * from Person where age = 35");
Results results = personQuery.end().execute();

You will find a good number of syntax examples in the official documentation:

http://www.terracotta.org/documentation/4.1/bigmemorymax/search/bigmemory-sql#bigmemory-sql-syntax-and-examples
 
 
5) “Support for extended hybrid storage”

For the same price as DRAM, flash-based SSDs provide more capacity per server and still support tens of thousands of operations per second.

BigMemory Hybrid lets you take advantage of this by optimizing BigMemory for SSD usage.

Following the BigMemory architecture, the Terracotta server is responsible for keeping and replicating the data amongst clients. It stores data in several tiers:

– Heap (more expensive, faster)

– Offheap

– Disk (Flash SSD Drive) (less expensive, slower)

If you want to use only DRAM (thus using offheap), and you want to assign 100GB of memory to offheap, your Terracotta server configuration (tc-config.xml) would look like this:

<servers>
....
  <server host="hostname" name="server1">
...
    <dataStorage size="100g">
      <offheap size="100g"/>
    </dataStorage>
  </server>
</servers>

If you want to leverage BigMemory Hybrid and enable your SSD drive, with 100GB of memory and an SSD drive of around 350GB, your config would look like this:

<servers>
....
  <server host="hostname" name="server1">
...
    <data>/disk/path/to/ssd/</data>
    <dataStorage size="450g">
      <offheap size="100g"/>
      <hybrid/>
    </dataStorage>
  </server>
</servers>
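
Going by the numbers in this example, the dataStorage size looks like the total across both tiers, i.e. the off-heap memory plus the SSD space. That reading is an assumption drawn from the figures in this post, so check it against the official documentation before sizing a real server. As a quick arithmetic sketch (the class and method names are hypothetical, not part of any Terracotta tool):

```java
// Arithmetic sketch only; names are hypothetical and not part of any Terracotta tool.
final class HybridSizingSketch {

    // Total dataStorage size in GB, assuming it spans both off-heap memory and the SSD.
    static long dataStorageSizeGb(long offheapGb, long ssdGb) {
        if (offheapGb < 0 || ssdGb < 0) {
            throw new IllegalArgumentException("sizes must not be negative");
        }
        return offheapGb + ssdGb;
    }

    public static void main(String[] args) {
        // 100 GB off-heap plus roughly 350 GB of SSD gives the 450g used in the config.
        System.out.println(dataStorageSizeGb(100, 350) + "g"); // 450g
    }
}
```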

Please note that using the disk does not mean that the data will persist. If you want the data that is in memory and on the SSD to persist, you will need to configure fast restartability by adding this tag:

<restartable enabled="true"/>

The data will be backed up and, in case of failure, will be automatically restored.
 
 
6) “Seamless Data Upgrade for Customers”

Data upgradability lets you carry your persisted BigMemory data forward when you upgrade to future BigMemory versions.
 
 
7) Eventual CAS

This enables applications running on different JVMs and using Ehcache in eventual consistency mode to use CAS (compare-and-swap) operations (putIfAbsent, replaceElement, removeElement).
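
If CAS operations are new to you, the standard java.util.concurrent.ConcurrentMap offers the same kind of conditional operations, so here is a small plain-Java analogy. It does not use Ehcache, and the class CasSketch and its increment method are invented for the example. The idea is that an update only goes through if the entry is still in the state you last saw.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Plain-Java analogy of compare-and-swap cache operations; not the Ehcache API.
final class CasSketch {

    // Increments the counter stored under key, retrying until no other writer interferes.
    static int increment(ConcurrentMap<String, Integer> cache, String key) {
        while (true) {
            Integer current = cache.putIfAbsent(key, 1);
            if (current == null) {
                return 1; // the key was absent, so our initial value was stored
            }
            int next = current + 1;
            // replace succeeds only if the stored value is still the one we read
            if (cache.replace(key, current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
        increment(cache, "visits");
        increment(cache, "visits");
        System.out.println(cache.get("visits"));       // 2
        // Conditional remove: the entry is removed only if the current value matches.
        System.out.println(cache.remove("visits", 5)); // false
        System.out.println(cache.remove("visits", 2)); // true
    }
}
```

The retry loop is the usual pattern: read, compute, attempt the conditional write, and start over if someone else got there first.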
 
 
8) Other (“High Availability”, “Management”)

I am not covering features that already existed in previous versions here. If you are interested in knowing more about the other new features, I’d suggest reading the official documentation first, and you can always ask me questions in the comments if needed!

“High Availability”

Full fault-tolerance:

http://terracotta.org/documentation/4.1/terracotta-server-array/high-availability#high-availability-features

Fast Restartable Store:

http://terracotta.org/documentation/4.1/bigmemorymax/configuration/fast-restart

“Management”

The Terracotta Management Console provides a customizable web dashboard for advanced monitoring and administration of Terracotta deployments:

http://terracotta.org/documentation/4.1/tms/tms
