Monday, December 15, 2008

EMC Atmos

Let's try to be more positive about new data sources for the cloud, after the harsh comment from a reader on my previous post. Last week, I had the opportunity to discover Atmos, a new Cloud Optimized Storage (COS) offering from EMC.

What I particularly liked is the fact that it is based on an object-oriented distributed file system. Microsoft wanted to build such a file system a few years ago and eventually gave up. EMC did it. I still need to dig into the details of the product, but I really think the possibilities of a true OOFS are almost impossible to imagine right now.
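
Since I have not dug into the actual Atmos interfaces yet, here is only a minimal sketch of the object-storage model itself: objects are addressed by opaque IDs and queried by metadata rather than by path. All names below are hypothetical illustrations, not the real Atmos API.

    import java.util.Map;

    // Hypothetical object-store client (illustration only, not the Atmos API).
    // The key idea of COS: content is addressed by an opaque ID and described
    // by user metadata, instead of living at a fixed path in a hierarchy.
    interface ObjectStore {
        String put(byte[] content, Map<String, String> metadata); // returns new object ID
        byte[] get(String objectId);
        Map<String, String> getMetadata(String objectId);
        Iterable<String> findByMetadata(String key, String value); // query by metadata, not path
    }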

Performance of new databases

An interesting performance comparison from OakLeaf: they compare an Amazon EC2 application deployed with different storage options (SQL Server, SimpleDB and Azure Table Service).

I reproduce their results table here, in case you have no time to read the article:

A reader's comment:

A major piece of work by OakLeaf … has confirmed my suspicions about how inappropriate all the cloud name/value entity store technologies are for serious SaaS developers.
The Google AppEngine Datastore, Amazon’s SimpleDB and Windows Azure have chronic performance problems relative to conventional database throughput. Ultimately, the inherent inefficiencies of these storage options will hit hourly cloud renters in the pocket.
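
If you want to get rough numbers of the same kind yourself, a tiny harness like the following is enough; the Backend interface is something I made up for the sketch, to be implemented once per store (JDBC for SQL Server, the respective REST clients for SimpleDB and Azure Table Service):

    // Minimal timing harness (sketch). Wall-clock time includes network
    // latency, so run it from the same data center as the store under test.
    interface Backend {
        void insert(String key, String value) throws Exception;
    }

    final class InsertBenchmark {
        // Returns the average milliseconds per insert over n operations.
        static double timeInserts(Backend backend, int n) throws Exception {
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                backend.insert("key-" + i, "value-" + i);
            }
            return (System.nanoTime() - start) / 1000000.0 / n;
        }
    }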

Semantic SOA

Seen in this post on InfoQ: OASIS has released a new version of their Reference Model for SOA.

It is widely accepted that if we want SOA to be more policy-driven, we must stop composing services together manually. And this can mostly be achieved through extended metadata (I mean more than WSDL). WSDL was a nice beginning, but:
  • It is limited to Web services, and not all services are Web services. For instance, existing mainframe applications, transactions, green screens, APIs of packaged applications and Stateless Session EJBs are also services.
  • Relationships between entities are missing.
  • There is nothing about the behavior of services (a sketch of richer metadata follows this list).
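
As an illustration only (these annotations are hypothetical, not part of any standard), extended metadata could capture exactly the two missing dimensions, relationships and behavior, and apply to non-Web services as well:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    // Hypothetical metadata describing what a service does to data.
    @Retention(RetentionPolicy.RUNTIME)
    @interface DataBehavior {
        String[] reads() default {};
        String[] writes() default {};
        boolean idempotent() default false;
    }

    // Hypothetical metadata describing how the exposed entities relate.
    @Retention(RetentionPolicy.RUNTIME)
    @interface EntityRelationship {
        String from();
        String to();
        String cardinality(); // e.g. "1..*"
    }

    // Example: a session EJB facade (not a Web service) described beyond WSDL.
    @DataBehavior(reads = {"Customer", "Order"}, idempotent = true)
    @EntityRelationship(from = "Customer", to = "Order", cardinality = "1..*")
    interface OrderLookupService {
        java.util.List<String> findOrderIds(String customerId);
    }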

The whole Semantic Web is an ambitious set of projects. However, when it comes to data access in SOA, we have to deal with the same kind of issues. Accessing a data services layer is more complicated than accessing a database, because it publishes a business interface instead of a technical interface. Therefore, if we want to dynamically compose data services together, as opposed to hard-coding the combinations, we need some kind of semantic metadata about their data manipulation behavior.
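
Here is a small sketch of what that dynamic composition could look like (the registry and the matching rule are my own illustration, not an existing product's API): a composer chains services by matching the entity one produces to the entity the next one consumes.

    import java.util.ArrayList;
    import java.util.List;

    // A data service described by the entity it consumes and the one it produces.
    final class ServiceDescriptor {
        final String name;
        final String consumes;
        final String produces;
        ServiceDescriptor(String name, String consumes, String produces) {
            this.name = name; this.consumes = consumes; this.produces = produces;
        }
    }

    // Plans a pipeline dynamically instead of hard-coding the combination.
    final class Composer {
        static List<ServiceDescriptor> plan(List<ServiceDescriptor> registry,
                                            String start, String goal) {
            List<ServiceDescriptor> pipeline = new ArrayList<ServiceDescriptor>();
            String current = start;
            while (!current.equals(goal)) {
                if (pipeline.size() > registry.size())   // guard against cycles
                    throw new IllegalStateException("no path from " + start + " to " + goal);
                ServiceDescriptor next = null;
                for (ServiceDescriptor s : registry) {
                    if (s.consumes.equals(current)) { next = s; break; }
                }
                if (next == null)
                    throw new IllegalStateException("no service consumes " + current);
                pipeline.add(next);
                current = next.produces;
            }
            return pipeline;
        }
    }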

That dynamic aspect is fundamental for modern Data Services Platforms and is quite often missing in first-generation technologies.

The first step of SOA was all about designing, deploying and consuming services. The second step is now all about dynamically designing, dynamically deploying and dynamically consuming services. We cannot spend all our time manually connecting thousands of services together.

The Information Perspective of SOA Design

Seen in this post on InfoQ. I like to see that most people agree on the need to reintroduce the data aspect into SOA. You should read the full article from IBM, but it basically insists on three points:
  • Define data semantics.
  • Canonical modeling.
  • Data quality.

The first two points are at the heart of any Data Services Platform (DSP).

Defining data semantics is very important and should be done with extended metadata. The richer the metadata, the further we can decouple data sources from data consumers. And this is a strong requirement if you want your data access strategies to be more "policy-driven" and less hard-coded.

Canonical modeling is sometimes seen as a burden by database purists. However, it is required as soon as you need to federate heterogeneous data sources, which is the common case in SOA. On top of that, canonical models offer a more business-friendly view of data. The question remains about the scope of canonical modeling. Very large business models, at the enterprise level, are known to be very difficult to design and maintain. A DSP must offer a way to select the best granularity for canonical models. One model for all applications is a myth (at least today), but conversely, one model per application does not deliver on the reuse promise of SOA. On this aspect it is also interesting to track what vertical standardization efforts (SID, HL7, ACORD, SWIFT, etc.) could bring to the table. We now see users starting to use the Telco SID model outside of the Telco market, for instance (from a 40,000 ft perspective, a customer is a customer).
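
As a toy illustration of the mechanics (all class and field names invented for the example), a canonical model is just a shared shape that each heterogeneous source is adapted into, so consumers code against one view of a customer:

    // Canonical, business-friendly view of a customer (invented for the example).
    final class CanonicalCustomer {
        final String id;
        final String displayName;
        CanonicalCustomer(String id, String displayName) {
            this.id = id; this.displayName = displayName;
        }
    }

    // One adapter per heterogeneous source, mapping into the canonical shape.
    final class CustomerAdapters {
        // A CRM source that exposes first and last names separately.
        static CanonicalCustomer fromCrm(String crmId, String first, String last) {
            return new CanonicalCustomer("crm:" + crmId, first + " " + last);
        }
        // A billing source that exposes a numeric account id and a single label.
        static CanonicalCustomer fromBilling(long accountId, String label) {
            return new CanonicalCustomer("billing:" + accountId, label);
        }
    }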

The third point is a market by itself (MDM), but it should also be accessible from within a Data Services Platform. The relationships between MDM and DSP technologies are multiple: a DSP can be used as a synchronization layer for MDM products; a DSP can support reference data as a new kind of data source, capturing its specific meaning (reference data can be a copy of the real data, or maintain a link with it); a DSP could use data cleansing services to improve data quality. These are just examples.

All in all, this is going in the right direction.

Monday, December 8, 2008

Criticism of Java persistence

The recent Criticism of Java Persistence thread on TheServerSide shows us once again that data access is still a very sensitive, emotional and almost religious topic.
I have said what I had to say in the thread itself, so no need to duplicate this here.

Many people have wrong ideas about persistence in general, and many people have their own views about how to manage it (while, conversely, very few people publicly defend an opinion about how an operating system should manage its memory pages, for instance). There is definitely something special about data access and persistence.

NB: I really think ODBMS vendors should stop communicating on the "the best mapping is no mapping" motto. First, it is not true, and second, it does not help them.
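
To see why "no mapping" is an overstatement, compare the two styles below. With JPA the object-to-relational mapping is explicit and visible; with an ODBMS (shown as db4o-style calls in comments, method names approximate and from memory) the mapping disappears from your code, but the engine still maps classes and fields to its storage format internally:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;

    @Entity
    class Customer {
        @Id private String id;   // explicit mapping: this field becomes a column
        private String name;

        protected Customer() {}  // no-arg constructor required by JPA
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    class PersistenceStyles {
        // ORM style: the mapping is declared, visible and tunable.
        static void jpaStyle(EntityManager em, Customer c) {
            em.persist(c);       // translated to SQL through the declared mapping
        }

        // ODBMS style (db4o-like, approximate):
        //   ObjectContainer db = Db4o.openFile("customers.db4o");
        //   db.store(c);  // no visible mapping, but classes/fields are still
        //                 // mapped to pages and indexes inside the engine
        //   db.close();
    }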

Friday, December 5, 2008

ODMG's not dead?

It seems the OMG will host an Object Database Standard Definition Scope meeting in Santa Clara next week.

Versant acquired db4o yesterday

See the news from the Versant site and from the db4o blog.

Well, that's the enterprise ODBMS vendor buying the embedded ODBMS vendor. I don't know what it means for the ODBMS community. Does it mean that the db4o business model eventually didn't work as expected, despite the good image of the product and the company on the market? I also don't clearly see what it means for the Poet FastObjects part of Versant...

I know both products quite well, and I know both teams share some genuine common values around quality and performance. Even if I now work for a company that owns ObjectStore, I wanted to send a sincere "good luck" to them.

Tuesday, December 2, 2008

LINQ-like initiatives for Java
