Wednesday, November 26, 2008

Different problems, different databases

The big promise of relational databases was a single technology for all our data storage needs. The main idea was to separate data from the applications that manipulate it.
Data models too tightly coupled to application models were indeed recognized in the 1970s as one of the main obstacles to IT flexibility (for instance, producing a new report for business users required going back through a full development cycle).

But at the same time, software design made significant progress by recommending that data (state) be encapsulated behind methods (behavior), which clearly goes in the opposite direction. All this created a lot of stress and noise in the software industry, and eventually led to the emergence of persistence technologies.

But what does decoupling data from applications really mean? It mostly consists of removing explicit directional relationships from database schemas, so that data views can later be recombined in any way. When you think about it, it just means that relationships were poorly represented in programming languages, and this is still true of most modern languages, including Java and C#. But to be honest, relationships are also poorly represented in relational models, and the nasty foreign keys will not change anything about that.
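A minimal Python sketch of this contrast, using the standard sqlite3 module. The Customer and Order classes, the table names and the sample data are all illustrative assumptions, not taken from any real schema. The object version wires the relationship in one direction (customer to orders), while the relational version stores a plain key that can be joined from either side.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    item: str


@dataclass
class Customer:
    name: str
    # Directional relationship: a customer knows its orders, not the reverse.
    orders: list = field(default_factory=list)


pen = Order("pen")
alice = Customer("Alice", [Order("book"), Order("lamp")])
bob = Customer("Bob", [pen])

# Customer -> orders follows the reference directly.
items_for_alice = [o.item for o in alice.orders]


def owner_of(order, customers):
    """Order -> customer has no reference to follow, so scan every customer."""
    for customer in customers:
        if any(o is order for o in customer.orders):
            return customer
    return None


# Relational version: the relationship is only a key value, with no direction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchase ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), "
    "item TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(1, 1, "book"), (2, 1, "lamp"), (3, 2, "pen")],
)

# The same join answers the question from either side.
sql_items = [
    row[0]
    for row in conn.execute(
        "SELECT p.item FROM purchase p "
        "JOIN customer c ON c.id = p.customer_id "
        "WHERE c.name = ? ORDER BY p.id",
        ("Alice",),
    )
]
sql_buyers = [
    row[0]
    for row in conn.execute(
        "SELECT c.name FROM customer c "
        "JOIN purchase p ON p.customer_id = c.id "
        "WHERE p.item = ?",
        ("pen",),
    )
]
```

Note that the relational side still needs the join condition spelled out in every query: the foreign key records that a relationship exists, but it is not a first-class relationship either.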

The fact is that the next really big revolution in the IT world would be the first comprehensive support for the notion of relationships.

Object database vendors failed to impose the ODBMS, even though it was the most relevant choice for Java, at least from a technical point of view. There are many reasons for that:
  • Some early ODBMS implementations were very bad in terms of database administration, ad hoc queries and overall performance. They were not really databases but rather storage mechanisms for in-memory object pages.
  • The "ecology" never really developed around ODBMS (reporting tools...).
  • ODBMS started in the late 80s, exactly when RDBMS were just about to gain momentum on the market; it was not the right time to impose a new database technology.
  • Then the major ODBMS vendors raised money from IPOs in 1995, during a very quiet period with no opportunity for expansion and no real need for money, so most of that money was wasted.
  • In 1998, the ODBMS vendors missed the Internet wave, mostly because of the XML mania at that time.
  • Then they surrendered and tried to reposition themselves as caches (Versant, Gemstone) or XML storage (Objectstore).

Fortunately, the XML database market never really emerged, despite the huge XML hype, probably because everybody understands that XML is a good exchange format but a very bad (too verbose) storage format. The big problem with XML is still that it tends to impose hierarchical models, which are to some extent a regression for our industry.

You can easily have an object or XML layer on top of any kind of storage, including relational (see IBM pureXML, for instance). Probably the best approach would be a neutral and efficient storage layer, with multiple interfaces around it. It could be a kind of relational storage in which the notion of relationship is efficiently supported.

The Internet, SOA, WOA, Web 2.0, mashups, etc. all favor a style in which business functions become independent and thus have their own storage (being physically distributed, they can no longer share a common database).

It now seems that some vendors are pushing the idea that even this low-level storage layer should have different foundations, depending on the kind of problem addressed. That is why the vertical model (storage primarily organized by columns instead of rows, as in Vertica) and the key-value model are growing quickly these days.
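To make the difference between these foundations concrete, here is a small Python sketch of the same three records laid out by rows, by columns and as key-value pairs. The field names and values are made up for illustration, and plain lists and dicts stand in for real storage engines; this is not how Vertica or any particular product is implemented.

```python
# Row-oriented layout: each record is stored as a whole.
rows = [
    {"id": 1, "city": "Paris", "amount": 120},
    {"id": 2, "city": "Lyon", "amount": 80},
    {"id": 3, "city": "Paris", "amount": 50},
]

# Summing one attribute still walks through every complete record.
total_from_rows = sum(record["amount"] for record in rows)

# Column-oriented ("vertical") layout: one array per attribute.
columns = {name: [record[name] for record in rows] for name in rows[0]}

# The same aggregate now reads a single array and ignores the other columns.
total_from_columns = sum(columns["amount"])

# Key-value layout: the store only knows a key and an opaque value.
key_value = {record["id"]: record for record in rows}

# Lookup by key is direct; any other query means scanning all values.
record_2 = key_value[2]
paris_ids = [key for key, value in key_value.items() if value["city"] == "Paris"]
```

The row layout suits reading or writing whole records, the column layout suits aggregates over a few attributes, and the key-value layout suits direct lookups, which is the sense in which each foundation fits a different kind of problem.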

They will certainly not replace RDBMS systems any time soon, as some are already claiming, but they may impose themselves in some situations. It seems to me that the era of the omnipresent relational model is drawing to a close, even if it will remain present for decades.

Data services directly impact the database world because:

  • We have more and more data sources to access from even a simple business application.
  • The notion of transaction is changing.
  • We increasingly have to support asynchronous data access.
  • It becomes not only possible but also mandatory to access any kind of data source, not only relational ones.
  • Databases are progressively commoditized, and their advanced features will move to intermediate mediation layers.
  • It then becomes possible to choose the best database technology for a given need, at any time.
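
A rough Python sketch of such a mediation layer, assuming two hypothetical sources: a relational one (an in-memory SQLite database) and a key-value one (a plain dict). The class names, the `fetch` method and the sample data are invented for this illustration. The data service queries every source asynchronously and merges the results, so the application does not need to know which technology sits behind each piece of data.

```python
import asyncio
import sqlite3


class RelationalSource:
    """Hypothetical relational source backed by an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("INSERT INTO customer VALUES (1, 'Alice')")

    async def fetch(self, customer_id):
        row = self.conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return {"name": row[0]} if row else {}


class KeyValueSource:
    """Hypothetical key-value source; a dict stands in for a remote store."""

    def __init__(self):
        self.store = {1: {"loyalty_points": 42}}

    async def fetch(self, customer_id):
        await asyncio.sleep(0)  # placeholder for network latency
        return dict(self.store.get(customer_id, {}))


class CustomerDataService:
    """Mediation layer: hides which source holds which attribute."""

    def __init__(self, sources):
        self.sources = sources

    async def get_customer(self, customer_id):
        # Query all sources concurrently, then merge their partial answers.
        parts = await asyncio.gather(
            *(source.fetch(customer_id) for source in self.sources)
        )
        merged = {"id": customer_id}
        for part in parts:
            merged.update(part)
        return merged


service = CustomerDataService([RelationalSource(), KeyValueSource()])
customer = asyncio.run(service.get_customer(1))
```

Swapping one source for a different database technology only requires another class with the same `fetch` method. The sketch leaves out transactions across sources entirely, which is precisely where the notion of transaction has to change.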
