Tuesday, November 25, 2008

The Future of Databases

Last week in San Jose, during the Data Services World event, I took part in several discussions (private briefings with journalists, public panels) about the future of databases in cloud computing. Some people seriously believe today that RDBMSs will soon disappear because of the Cloud.

The same topic was being discussed elsewhere at the same time, including an interesting point of view from Martin Fowler.

New technologies for databases can be roughly divided into two groups:
  • New kinds of database technologies designed for the Cloud, such as Vertica, CouchDB, SimpleDB and similar products that I have already mentioned on this blog.
  • New deployment and access models for RDBMSs, known as Database-as-a-Service. Basically, the database is remotely hosted and administered, but you still access it through SQL over HTTP or SOAP/REST.

Having worked in the past for an ODBMS vendor, I know how difficult it is to convince CIOs, project managers, architects, developers and DBAs to move away from RDBMSs. Relational theory is treated almost as a religion. My take is that RDBMSs are here to stay, just as mainframes have (they never disappeared, despite the predictions of many "experts"). New technologies never replace good old ones; they just complement them.

Even so, SOA and the Cloud are having tangible impacts on the database market:

  • We will access more kinds of data sources in the future: not only RDBMSs and services, but also new kinds of databases. Heterogeneity will continue to grow.
  • We will access more data sources in the future. Most applications used a single database; they will now access multiple data sources. We are moving from data access to data integration (I tend to prefer the term adaptive mediation of information). Integration has to be done at the business level, not at the SQL or XML level.
  • Many advanced features of database engines (security, fault tolerance, stored procedures...) will progressively move to an intermediate integration layer. Databases, including relational databases, will go back to being a simple and efficient storage technology.
  • Data integration will become more important than the database itself, and databases will be commoditized. Each application development team will be able to select the best database technology for its needs.
  • Accessing non-database data sources will require extended metadata. The relational world is simple because SQL provides a convenient technical API to access data at the atomic level (a cell at the intersection of a row and a column). Everything is implicit in terms of metadata, access patterns, etc. Conversely, accessing a service-oriented data source requires an explicit description of its data model and its data manipulation semantics. Services can be either fine-grained or coarse-grained, and you need to capture that. Data access has its own contribution to make to the Semantic Web.
  • When thousands of data sources are available as data services (such as mainframe screens or the APIs of packaged applications), we will need tools to combine them automatically at runtime. Manual, hard-coded or even visual composition of data services is an option only when dealing with a few data services. Dynamic composition of data services (e.g. aggregation of fine-grained data services into larger coarse-grained data services as required by ever-changing business functionalities) is a requirement for truly agile IT. Otherwise "agile" will turn into "fragile"!
  • Ad-hoc data mashups will require the right data services to be available at the right time. This can only be achieved by platforms that can dynamically create and publish new data services as they are needed.
  • Access to unstructured data will grow. At the same time, unstructured data is starting to acquire structure, or at least to describe itself better; see the "Linked Data", "OpenCalais" and "Web of Data" efforts, for instance.
  • Accessing multiple data sources with different latencies will require reactive data integration patterns. We will have to support asynchronous data access, and we will need tools for that, because asynchronous and parallel programming do not come naturally to most developers and architects.
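
To make the last point more concrete, here is a minimal Python sketch of asynchronous access to several data sources with different latencies. Everything in it is an assumption made for illustration: the source names, the simulated latencies, the payloads and the `fetch` and `integrate` helpers are hypothetical, and `asyncio.sleep` stands in for real network calls. It shows one possible pattern only (query everything in parallel, keep what answers within a deadline, report what was late), not how any particular data services platform works.

```python
import asyncio


async def fetch(name, latency, payload):
    """Hypothetical data service call; `latency` simulates the response time."""
    await asyncio.sleep(latency)
    return name, payload


async def integrate(sources, timeout):
    """Query all sources in parallel and keep whatever answers within `timeout`."""
    # Map each running task back to the name of the source it queries.
    tasks = {asyncio.create_task(fetch(*source)): source[0] for source in sources}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)

    # Sources that did not answer in time are cancelled and reported as late.
    for task in pending:
        task.cancel()

    results = dict(task.result() for task in done)
    late = sorted(tasks[task] for task in pending)
    return results, late


# Illustrative sources: (name, simulated latency in seconds, payload).
SOURCES = [
    ("rdbms", 0.01, {"customer_id": 42}),
    ("crm_service", 0.05, {"segment": "gold"}),
    ("mainframe_screen", 2.0, {"balance": 100}),
]

results, late = asyncio.run(integrate(SOURCES, timeout=0.5))
print("answered in time:", results)
print("too slow:", late)
```

With these made-up latencies, the two fast sources answer within the half-second deadline and the slow one is reported as late. The caller then has to decide what to do with a partial result, which is exactly the kind of decision that synchronous access to a single database never forced on developers.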

As Martin Fowler concludes, data services platforms deliver on the promises of SOA by favoring small business functionalities that each have their own storage, instead of sharing data in huge centralized databases.
