Tuesday, April 8, 2008

When to use an embedded ODBMS?

I recently saw this article on TheServerSide asking the question "When to use an embedded ODBMS?". See also the related thread on their forums.

I have the feeling that choosing not to use an RDBMS is becoming more and more acceptable to the developer community. As SOA grows, many RDBMS proponents now agree that an RDBMS is sometimes not the best technology to use (we're not saying that RDBMS should never be used). As the author, Rick Grehan, wrote, ODBMS are not the only alternative: one could consider XML files, C-ISAM files, or lightweight databases like Berkeley DB (originally from Sleepycat Software).

Embedded applications are perhaps a good case for non-relational datastores because:

  • There is no need for ad-hoc queries outside the application

  • There is no need for BI around the system

  • The cost of a full RDBMS engine would be too much in terms of disk space, CPU usage and memory footprint


The author then lists potential benefits of an ODBMS. I have to admit that most of his arguments are not really valid. Basically, he is saying that schema management and evolution are much easier with an ODBMS than with an RDBMS plus ORM.

That is just a question of how the ORM layer has been designed and implemented, and it is quite possible to imagine an advanced ORM solution able to perform automatic schema evolution (and some of them probably do). The fact is that in real cases automatic schema evolution is not often used (even when an ODBMS has been chosen), because schema evolution is generally part of a more global "version upgrade process" for the whole system (involving backups, data checking, data conversions, data re-initialization...). I agree that the situation is a little different with embedded systems, which might get some benefit from automatic schema evolution (whether based on an ODBMS or on an RDBMS plus ORM). What I want to say is that automatic schema evolution is not a feature linked to ODBMS per se; it could be available on any kind of datastore. It is just a cool feature that can be used in very specific situations.
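To illustrate why this feature is independent of the kind of datastore, here is a minimal sketch of my own (not taken from the article or from any product). It assumes records are stored as plain field-name/value maps; the class name `SchemaEvolver` and its `upgrade` method are hypothetical. On load, a stored record is reconciled with the current field list: missing fields get a default value and obsolete fields are dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reconciles a stored record with the current schema.
final class SchemaEvolver {
    // Current schema: field name -> default value used when the field is missing.
    private final Map<String, Object> currentFields;

    SchemaEvolver(Map<String, Object> currentFields) {
        this.currentFields = new LinkedHashMap<>(currentFields);
    }

    // Returns a new record that contains exactly the current fields.
    Map<String, Object> upgrade(Map<String, Object> stored) {
        Map<String, Object> upgraded = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : currentFields.entrySet()) {
            String name = field.getKey();
            // Keep the stored value if present, otherwise fall back to the default.
            upgraded.put(name, stored.containsKey(name) ? stored.get(name) : field.getValue());
        }
        // Stored fields that are no longer in the schema are simply not copied.
        return upgraded;
    }
}
```

Nothing in this logic cares whether the map came from an ODBMS page, a relational row or a flat file, which is the point: the same reconciliation could sit in an ORM layer just as well.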

The author then has a second argument: even if an ORM hides the complexity of managing an object model in an RDBMS, it does not remove the need for some code to be executed in order to manage the impedance mismatch. Here the author is making the wrong assumption (one that most people make) that ODBMS engines natively manage a full object-oriented model internally. In reality, ODBMS engines simulate inheritance and collections exactly as an ORM layer would, and most of this management is done in the client APIs, giving the impression of transparency.
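As a rough illustration of what "simulating inheritance in the client API" can mean, here is a small sketch of my own. It is not how db4o or any other engine is actually implemented; `TupleFlattener`, `Person` and `Employee` are hypothetical names. The client code walks the class hierarchy with reflection and flattens an object into a single tuple that a simple storage could persist.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-level object model used only for the demonstration.
class Person {
    String name;
    Person(String name) { this.name = name; }
}

class Employee extends Person {
    int salary;
    Employee(String name, int salary) { super(name); this.salary = salary; }
}

// Hypothetical client-side mapper: turns an object into a flat tuple.
final class TupleFlattener {
    static Map<String, Object> flatten(Object obj) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        // Remember the concrete class so the object could be rebuilt later.
        tuple.put("@class", obj.getClass().getName());
        // Walk from the concrete class up to (but excluding) Object.
        for (Class<?> c = obj.getClass(); c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue; // only instance state belongs in the tuple
                }
                try {
                    f.setAccessible(true);
                    // Prefix with the declaring class to avoid name clashes.
                    tuple.put(c.getSimpleName() + "." + f.getName(), f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return tuple;
    }
}
```

Calling `TupleFlattener.flatten(new Employee("Ann", 100))` yields one tuple holding the class name plus the `Person.name` and `Employee.salary` values. The storage underneath never needs to know what inheritance is.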

An ODBMS is just a storage engine with APIs and a QL around it that can digest object models. Internally, that storage has to be as simple, efficient and robust as possible. Basically you just need efficient page management, a space allocation algorithm, object IDs, storage of arbitrary tuples, indices, etc. Then, on top of that simple storage, you build all the necessary features of a typical database: APIs, QL, security, transaction management, crash recovery, logging, backups, replication, network protocol...
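To give an idea of how small that core can be, here is a toy in-memory sketch of my own covering only object IDs, tuple storage and a single index. It deliberately leaves out page management and space allocation, and it is not the Jalisto or db4o API; `TupleStore` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical minimal storage core: OIDs, tuples and one secondary index.
final class TupleStore {
    private final String indexedField;
    private final Map<Long, Map<String, Object>> tuples = new HashMap<>();
    private final Map<Object, Set<Long>> index = new HashMap<>();
    private long nextOid = 1;

    TupleStore(String indexedField) {
        this.indexedField = indexedField;
    }

    // Stores a copy of the tuple and returns its newly allocated object ID.
    long insert(Map<String, Object> tuple) {
        long oid = nextOid++;
        tuples.put(oid, new HashMap<>(tuple));
        Object key = tuple.get(indexedField);
        if (key != null) {
            index.computeIfAbsent(key, k -> new TreeSet<>()).add(oid);
        }
        return oid;
    }

    // Direct lookup by object ID; returns null if the ID is unknown.
    Map<String, Object> fetch(long oid) {
        return tuples.get(oid);
    }

    // Index lookup: all object IDs whose indexed field equals the given value.
    Set<Long> findByIndexedField(Object value) {
        return index.getOrDefault(value, Collections.emptySet());
    }
}
```

Everything else in the list (QL, transactions, recovery, replication, network protocol) would be layered on top of a core like this rather than built into it.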

That's exactly what we did at Xcalia with the Jalisto project, contributed to ObjectWeb. It is a basic storage engine on top of which you can enable or disable database features. The storage itself is quite configurable, but we didn't provide any API around it, so it's up to you to wrap Jalisto in an ORM layer of your choice. As many features can be disabled (including the network layer), it could be a good solution for embedded systems. The system is comparable to db4o or BerkeleyDB, but it does not impose its own set of APIs and QL: you have to wrap it with your favorite data access layer.

All in all, it is positive to see people starting to push ODBMS, even if I have the feeling this article is mostly a kind of disguised db4o advertisement (db4o is a good database and a sane technology anyway). This potential rebirth of ODBMS will make the Xcalia universal mapping technology even more competitive against pure ORM solutions limited to RDBMS. It is nice to see that TopLink and JPOX are now also working on extending their mapping technology to non-relational datastores.

My bet is that the future is in data access, not in the database. We'll see several database technologies co-existing and addressing different problems, with standardized Data Services in the middle to efficiently serve new business applications, reporting tools, BI, etc.
