http://www.kuro5hin.org/story/2006/3/11/1001/81803
By mirleid in Op-Ed Mon Mar 13, 2006
ORM stands for Object Relational Mapping. At its most basic level, it is a technique geared towards providing an application with an object-based view of the data that it manipulates.
I have been using ORM in the scope of my professional activities for the better part of three years. I can't say that it has been a smooth ride, and I think other people might benefit from an account of my experiences.
Hence this story.
The basic building block of any application written in an OO language such as Java is the object. As such, the application is basically a more or less large collection of interacting objects. This paradigm works relatively well up to a point. It is when such an application is required to deal with something with a completely different worldview, such as a database, that the brown matter definitely hits the revolving propeller-shaped implement. The term "Object-Relational impedance mismatch" was coined to describe this difference in worldviews.
The basic purpose of ORM is to allow an application written in an object oriented language to deal with the information it manipulates in terms of objects, rather than in terms of database-specific concepts such as rows, columns and tables. In the Java world, ORM's first appearance was in the form of entity beans.
There are some problems with entity beans:
- They depend on an EJB container, and cannot live outside one.
- The programming model is complex and involves a large number of moving parts.
The first problem does not kill you, but it also does not make you stronger. In fact, the dependency on a container implies that proper unit testing of entity beans is convoluted and difficult. The second problem is where the real pain lies: the programming model and the sheer number of moving parts will make sure that building a moderately complex, working domain model expressed as entity beans becomes a frustrating and tortuous exercise.
Enter transparent persistence: this is an approach to object persistence that asserts that designers and developers should never have to use anything other than POJOs (Plain Old Java Objects), freeing you from the obligation to implement life-cycle methods. The most common frameworks that claim to provide transparent persistence for Java objects today are JDO, Hibernate and TopLink. At this point, I'd like to clarify that I am not about to discuss the great JDO vs EJBernate 3.0 religious wars, so, don't even think about it.
Hibernate and TopLink are reflection-based frameworks, which basically means that they use reflection to create objects and to access their attributes. JDO on the other hand is a bytecode instrumentation-based framework. While this difference might not seem to be immediately relevant to you, please bear with me: its significance will become apparent in due course.
At a high level, you need to perform the following tasks when using an ORM framework:
- Design your domain model.
- Derive your database schema from the domain model.
- Create the metadata that maps the domain model onto the schema.
Assuming that you have sufficiently detailed requirements and use cases, the first step is a well-understood problem with widely accepted techniques available for its solution. As such, we'll consider the first step as a given and not dwell on it.
The second step is more controversial. The easy way to do it is to create a database schema that mimics the domain model: each class maps to its own table, each class attribute maps to a column in the given table, and relationships are represented as foreign keys. The problem with this is that the database's performance is highly dependent on how "good" the schema is, and this "straight" way of creating one generates, shall we say, sub-optimal solutions. If you add to that the fact that you will be constrained (by the very nature of the ORM framework that you are using) in terms of the database optimisation techniques that you can use, and that the one-class-one-table approach will tend to generate a disproportionately large number of tables, you realise pretty soon that the schema that you have is, by DBA standards, a nightmare.
The only way to solve this conundrum is to compromise, from both ends of the spectrum. Therefore, using an ORM tool does not really gel with waterfall development, for you'll need to continually revisit your domain model and your database schema. If you're doing it right, changes at the database schema level will only imply changes at the metadata level (more on this later). Obviously, and by the same token, changes at the domain model level should only imply changes in the metadata and application code, but not in the database (at least not significant changes).
Creating the metadata for mapping your domain model to the database is where it gets interesting. At a high level, the basic construct available to you is something called a mapping. Depending on which framework you use, you might have different types available to you doing all kinds of interesting stuff, but there is a set that is commonly available:
- direct to field mappings
- relationship mappings (one to one, one to many, many to many)
A direct to field mapping is the basic type of mapping that you use when you want to map a class attribute of some basic type such as string directly onto a VARCHAR column. A relationship mapping is the one that you use when you have an attribute of a class that holds a reference to an instance of some other class in your domain model. The most common types of relationship mappings are "one to one", "one to many" or "many to many".
At this juncture, we need an example to illustrate the use of these mappings. Let us consider the domain model for accounts in a bank; you'll need:
The relationships between them are as follows:
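A minimal sketch of such a model, assuming just Customer and Account classes with a many-to-many relationship between them (joint accounts can have several holders); all class and attribute names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bank domain model; names are assumptions, not from the article.
class Account {
    private final String title;
    private long balance; // in cents; a direct-to-field mapping candidate
    private final List<Customer> holders = new ArrayList<>(); // many-to-many side

    Account(String title, long balance) { this.title = title; this.balance = balance; }
    String getTitle() { return title; }
    long getBalance() { return balance; }
    List<Customer> getHolders() { return holders; }
}

class Customer {
    private final String name;
    private final List<Account> accounts = new ArrayList<>(); // many-to-many side

    Customer(String name) { this.name = name; }
    String getName() { return name; }
    List<Account> getAccounts() { return accounts; }

    // Keep both sides of the bidirectional relationship in step.
    void addAccount(Account a) {
        accounts.add(a);
        a.getHolders().add(this);
    }
}
```

An ORM framework would map each class onto one or more tables, and the `holders`/`accounts` collections onto a join table.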
Please note that the terminology that I am about to use is somewhat TopLink-biased, but you should be able to find out the appropriate Hibernate or JDO equivalent without too much trouble. Anyway, once you have figured out the relationships between classes in your domain model, you need to create the metadata to represent them. Prior to Tiger (JDK 5.0), this was typically done via a text file containing something vaguely XML-like describing the mappings (and a lot of other stuff). If you are lucky, you'll have access to a piece of software that facilitates the creation of the metadata file (TopLink provides you with something called the Mapping Workbench, and I understand that there are Eclipse plug-ins for Hibernate).
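To give a flavour of what such a metadata file contains, here is a minimal Hibernate-style mapping; the class, table and column names are invented, and a TopLink or JDO equivalent would express the same ideas in its own dialect:

```xml
<!-- Hypothetical mapping file; names invented for illustration -->
<hibernate-mapping>
  <class name="bank.Account" table="ACCOUNT">
    <id name="id" column="ACCOUNT_ID">
      <generator class="native"/>
    </id>
    <!-- direct to field mapping: a String attribute onto a VARCHAR column -->
    <property name="title" column="TITLE" type="string"/>
    <!-- relationship mapping: many to many to Customer via a join table -->
    <set name="holders" table="ACCOUNT_HOLDER">
      <key column="ACCOUNT_ID"/>
      <many-to-many class="bank.Customer" column="CUSTOMER_ID"/>
    </set>
  </class>
</hibernate-mapping>
```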
With TopLink and Hibernate, once you have the metadata file you are away. If you are using JDO, there is an extra step required, which is to instrument your class files (remember that JDO uses bytecode instrumentation), but this is relatively painless, since most JDO implementations provide you with Ant tasks that automate it for you.
What follows is an account of my experience (and exploits) with TopLink. Some of the issues encountered will be, as such, somewhat specific, but I think that most of them are generic enough to hold for most ORM frameworks.
The first problem that you face is documentation. It is not very good: it is ambiguous, and it only covers the basics of the framework and its use. Obviously, this problem can be solved by getting an expert from Oracle (they own TopLink): I guess that sort of explains why the documentation isn't very good.
The second problem that you face is that if you are doing something real (as in, not just playing around with the tool, but actually building a system with it), you typically have more than one person creating mappings. You would have thought that Oracle would have considered that when creating the Mapping Workbench. They did not. It is designed to be used by one person at a time, and there's no chance that you can use it in a collaborative development environment. Additionally, it represents the mapping project (the Mapping Workbench name for the internal representation of your mapping data) in such a huge collection of files that storing them in a VCS is an exercise in futility. So, mapping your domain model becomes a project bottleneck: only one person at a time can edit the project, after all. As such, the turnaround time for model changes and updates impinges quite a lot on the development teams, since they can play around with the domain model in memory, but they can't actually test their functionality by trying to store stuff to the database.
When you finally get a metadata file that holds what you need to move your development forward, and you run your code, you start receiving angry e-mails from the project DBA, reading something like "What in the name of all that is holy you think you are doing to my database server?" [… the rest half of TopLink specific problems omitted …]
by ttfkam on Mon Mar 13, 2006
New versions of Hibernate implement the EJB 3.0 EntityManager interface. So instead of separate XML schema definition and mapping files, you simply annotate the POJOs and go.
The downside is that persistence info is in your Java source. The upside is, well, that persistence info is in your Java source.
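A sketch of what that looks like, assuming EJB 3.0 / javax.persistence annotations; the entity, fields and column names are invented, and the class needs the persistence API on the classpath:

```java
// Hypothetical annotated POJO; mapping lives in the source, not in XML.
@Entity
@Table(name = "ACCOUNT")
public class Account {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "TITLE", nullable = false)
    private String title;

    @ManyToMany(mappedBy = "accounts")
    private Set<Customer> holders = new HashSet<Customer>();
}
```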
And coding to EJB 3.0 means that you can swap among Hibernate in standard J2SE apps, JBoss, Glassfish, and the others with little effort.
by Lars Rosenquist on Mon Mar 20, 2006
By using the TransactionProxyFactoryBean and the HibernateTransactionManager to manage your transactions. Use the OpenSessionInViewInterceptor to support lazy initializing outside of a Hibernate Session scope (e.g. web context).
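A sketch of that Spring wiring, with invented bean names (the `sessionFactory` bean and the service target are assumed to be defined elsewhere):

```xml
<!-- Hypothetical Spring configuration for the setup described above -->
<bean id="transactionManager"
      class="org.springframework.orm.hibernate3.HibernateTransactionManager">
  <property name="sessionFactory" ref="sessionFactory"/>
</bean>

<bean id="accountService"
      class="org.springframework.transaction.interceptor.TransactionProxyFactoryBean">
  <property name="transactionManager" ref="transactionManager"/>
  <property name="target" ref="accountServiceTarget"/>
  <property name="transactionAttributes">
    <props>
      <prop key="*">PROPAGATION_REQUIRED</prop>
    </props>
  </property>
</bean>

<!-- Keeps the Session open while the view renders, so lazy loads still work -->
<bean id="openSessionInViewInterceptor"
      class="org.springframework.orm.hibernate3.support.OpenSessionInViewInterceptor">
  <property name="sessionFactory" ref="sessionFactory"/>
</bean>
```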
by claes on Mon Mar 13, 2006
DAO classes seem to help a lot.
It seems as if every time I start a new project I go around and around with exactly the same issues — how "close to the database" or "how close to the object model" to put the interface.
Currently I think if you know something about databases you're better off starting with a schema that really, truly, reflects the actual "business objects" of your application. Then wrap this in a couple of DAO (buzzword, blech) classes (The Spring Framework classes help here), and deal with it.
Once you get the schema right, it tends to stay put. In our current major project I end up actually doing something at the SQL> prompt about once a month or so, mostly for debugging. That tells me that the model is good — matching both the underlying logic, as well as being easy to get at from the Java classes.
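The DAO shape being described can be sketched like this; the names are invented, and the in-memory implementation stands in for a real JDBC- or Spring-backed one behind the same interface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical business object; callers see objects, not SQL.
class Account {
    final String id;
    long balanceCents;
    Account(String id, long balanceCents) { this.id = id; this.balanceCents = balanceCents; }
}

// The DAO boundary: the rest of the application codes against this.
interface AccountDao {
    void save(Account a);
    Optional<Account> findById(String id);
}

// In-memory stand-in; a production implementation would issue SQL,
// e.g. via Spring's JdbcTemplate, without changing the interface.
class InMemoryAccountDao implements AccountDao {
    private final Map<String, Account> store = new HashMap<>();
    public void save(Account a) { store.put(a.id, a); }
    public Optional<Account> findById(String id) { return Optional.ofNullable(store.get(id)); }
}
```

Swapping the implementation (in-memory, JDBC, ORM-backed) is then invisible to the business layer.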
To go up a level, there are a couple of problem areas like ORM (HTML page generation is another) where there are constantly new frameworks, new buzzwords, and "better" ways to do things. My feeling is that when this goes on for a while, it means the problem is just plain hard, and if magic frameworks haven't fixed it in the past, they aren't going to fix it in the future.
Thanks for the write up.
by skyknight on Sun Mar 12, 2006
You open a session, which means getting back a session object from a factory, with which you will associate new objects and from which you will load existing objects. Such object manipulations are surrounded by the initiation and termination of transactions, for which you can specify the isolation level. I don't know about other frameworks, but Hibernate does take transactions seriously.
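In Hibernate's classic API that flow looks roughly like the following fragment (configuration and the mapped classes are omitted; `Account`, `AuditEntry` and the variables are invented):

```java
// Sketch of the session/transaction flow described above.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    // load an existing object through the session...
    Account acct = (Account) session.load(Account.class, accountId);
    acct.setBalance(acct.getBalance() + 100);
    // ...and associate a new object with it
    session.save(new AuditEntry("deposit", accountId));
    tx.commit(); // pending changes are flushed to the database here
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}
```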
ORM is definitely not a tool that people should use if they don't have a solid understanding of relational database technology and issues, or perhaps more generally of computer architecture. Rather, it should be used by people who have substantial experience writing database applications and who, after much hard-won experience, have grown tired of the grinding tedium of manually persisting and loading objects to and from relational databases. You need the understanding of relational databases so that you can get good performance from an ORM; without it you'll have the horrible performance that the original piece characterizes in its anecdotes.
I've been dealing with the ORM problem for 5+ years, with a brief escape for grad school. I've written raw SQL. I've used home grown ORM frameworks written by other people. I've written my own substantial ORM frameworks in each of Perl, Python and Java. I've actually done it twice in Perl, with my latest instantiation being pretty good, and yet still being dwarfed in capability by Java's Hibernate. As such, I've recently started learning Hibernate. Hibernate is extremely complicated, and most certainly not for the weak of heart or for a junior programmer with no relational database experience, but it is also extremely powerful. In learning Hibernate I've been very appreciative of many of the hard problems that it solves, problems with which I have struggled for years, in many cases unsuccessfully.
Mind you, even with Hibernate, ORM is still ugly. The fact that you need to persist your objects to a database is largely an artifact, an accidental component of your development process stemming from limitations in today's technology, not an intrinsic facet of the thing that you're trying to accomplish. Also, ORM is inherently duplicative, in that you end up defining your data model twice, as well as a mapping between the two instantiations of it. Such is life… It would be nice if we had "object servers", as well as cheap and performant non-volatile RAM, but we don't, and we aren't going to have such things for well over a decade at least, not in reliable versions anyway.
As someone who has slogged through implementing his own ORM on a few occasions, I can say that it is a great learning experience, but if your goal is a production quality system, then you should probably use something like Hibernate. The existence of Hibernate alone is probably a strong argument for using Java when writing an application that requires complex ORM. I don't know that C# has solved the problem, but I haven't looked, honestly.
by Scrymarch on Sat Mar 11, 2006
I've used Hibernate on a few projects now and been pretty happy with it. I've found it a definite productivity increase over raw JDBC - there's simply less boilerplate, and hence fewer stupid typo errors. The overwhelmingly most common class -> table relationship is 1:1, so you cut out a lot of code of the
account.setAccountTitle( rs.getString(DataDictionary.ACCOUNT_TITLE) );
account.setAccountBalance( rs.getInt(DataDictionary.ACCOUNT_BALANCE) );
collection.add(account);
variety.
It does irritate me that you end up with HQL strings everywhere, but you ended up with SQL strings everywhere before, so shrug. Really the syntax should be checked at compile time, instead of implicitly by unit tests. Such a tool shouldn't even be that hard to write, but I guess I'm lazy. I'd be uneasy letting devs near HQL without a decent knowledge of SQL. For mapping, we used xdoclet or hand editing the result of schema-generated xml files. Usually the same developer would be adding tables or fields, the relevant domain objects, and required mappings.
Now that I think about it, though, every time I've used ORM I've laid out the data model first, or had legacy tables I had to deal with. Inheritance in the domain model tended to be the interface-driven variety rather than involving a lot of implementation inheritance. Relational databases have a pretty good track record on persistence; maybe you could let them have a little more say.
We did still get burnt a bit by lazy loading. We were working with DAOs which had been written with each method opening and closing a session. So sometimes objects would make it out to a higher tier without having the right dependent detail-style attributes loaded, which throws a lazy loading exception. We got around this by moving the session control up into the business layer over time. This is really where it should have been in the first place; not being able to go:
session.open();   // or txn.start or whatever
// ... data ... data ... think ... data ... think ...
session.close();
is kind of crazy.
These projects were with small teams on the scale of half a dozen developers. Sounds like you were on a bigger project, had higher interpersonal communication overheads, etc. Just to put all my bias cards on the table, I gave a little yelp of pain when you said "waterfall".
by bugmaster on Mon Mar 13, 2006
I've had some very happy experiences with Hibernate. It lets you specify lazy-loading strategies, not-null constraints, caching strategies, subclass mapping (joined-subclass, union-subclass, table-per-subclass-hierarchy), associations, etc… All in the mapping file. And it has the Hibernate Query Language that looks, feels and acts just like SQL, but is about 100x shorter. Hibernate rules.
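For example, an HQL query is phrased against classes and attributes and traverses mapped associations, where the equivalent SQL needs tables, columns and an explicit join (all names here are invented):

```
// HQL: query the mapped classes
from Account a where a.holder.name = 'Alice'

-- Roughly equivalent SQL against the mapped schema
select a.*
  from ACCOUNT a
  join CUSTOMER c on c.CUSTOMER_ID = a.HOLDER_ID
 where c.NAME = 'Alice'
```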
by Wolf Keeper on Tue Jul 10, 2007
I'm familiar with Hibernate. I can't speak for the others. The learning curve is steep, but once you've got it you can build applications very fast.
'Derive your database schema from the domain model.'
Hibernate is flexible enough to work with an existing schema. You can write your Hibernate objects to work with your database, and not vice versa. You lose the ability to use polymorphism in some of your database objects, but ORM remains very handy.
'The first problem that you face is documentation.'
The Hibernate website is a font of information, and the Hibernate In Action book remains the only IT tome I have read cover to cover. It is outstanding.
'you typically have more than one person creating mappings.'
Hibernate handles that fine. The XML files governing Hibernate behavior are human-readable, and can be team developed just like any other Java code.
'you realise that that happens because the ORM framework will not, by default, lazy load relationships'
The importance of lazy loading relationships is thoroughly documented in Hibernate, and lazy loading has been the default behavior for a few years now.
'This time, though, in order to make a call on whether a relationship should be lazily loaded, you need to trawl through all the use cases, involve a bunch of people, and come up with the most likely usage and access scenarios for each of the classes in your domain model.'
You have to do this whether you use an ORM or just JDBC. Your JDBC code can just as easily hit the database too often. Either way, the research should be done before the application is built and not after.
' The problem is that reflection-based ORM frameworks figure out what needs to be flushed to the database (as in, what you created or updated, and what SQL needs to be generated) by comparing a reference copy that they keep against the instance that you modified. As such, and at the best of times, you are looking at having twice as many instances of a class in memory as you think you should have.'
I believe Hibernate simply tracks whether an object has been changed and does not keep a reference copy. Regardless, there are well-documented guidelines for evicting objects you don't need from memory to cap memory use. And RAM is cheap.
'At a high level, the only way that you can get around this is to actually figure out which classes are read-only from the application standpoint.'
Again, whether you use ORM or SQL and JDBC, identifying read-only classes is part of application development. Setting up read-only caches of objects that don't change is easy.
'Surrogate keys'
I have to disagree that surrogate keys are a drawback. Put a unique constraint on the columns you would have preferred as a primary key (i.e. what would have been the "intelligent keys"). Then you can do joins, updates, deletes, etc… by hand using the intuitive natural-key columns, while the application chugs right along with surrogate keys.
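A sketch of that arrangement in DDL, with invented table and column names:

```sql
-- Surrogate primary key for the ORM, unique constraint on the natural key
CREATE TABLE ACCOUNT (
  ACCOUNT_ID   INTEGER PRIMARY KEY,      -- surrogate key the application uses
  BRANCH_CODE  VARCHAR(10) NOT NULL,     -- what would have been the
  ACCOUNT_NO   VARCHAR(20) NOT NULL,     --   "intelligent" key
  BALANCE      NUMERIC(15,2),
  CONSTRAINT UQ_ACCOUNT_NATURAL UNIQUE (BRANCH_CODE, ACCOUNT_NO)
);
```

If the "natural" key later turns out not to be stable, you drop the unique constraint; the key structure itself never has to change.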
It's also worth mentioning that Hibernate has easy tools for using straight SQL, and for using prepared statements and languages like Transact-SQL, PL/SQL, or PL/pgSQL.
I can't say ORM is the best solution for mapping between object oriented languages and databases. But for big applications, it's much easier than rolling your own JDBC code for all of the database work. Someone skilled on Hibernate with input in your project planning could have made life *tremendously* easier.
by Ufx on Mon Mar 13, 2006
I've had the pleasure of using an ORM for multiple projects, and my experiences have compelled me to reply to your article. Our platform is .Net, which offers some more flexibility in the reflection space due to generic type information being preserved. This is particularly useful when figuring out what type your lists contain.
First, let me first state the requirements of one of my projects. There is a client application that must support the following:
As an architect and developer working this project, I added my own required features above those required by the application:
We currently use an ORM that meets all of the above requirements. Allow me to share my thoughts on some very big issues we had to solve regarding these requirements, and mix in some responses to your issues.
As far as configuration is concerned, the system uses a configurable conventions class that allows programmatic configuration of defaults, and it uses attribute decorators for all other configuration. I know that this ties our domain model to a specific data model, but in our situation that tradeoff wasn't so bad. Furthermore, my experience is that data schema and object schema are usually versioned together anyway. Contention for configuration is exactly the same as contention for the domain objects themselves, so there are rarely problems. I'm surprised that your system did not allow you to split the configuration files by some logical boundaries that would've reduced the contention issues you had.
The key factor to easy mapping is consistency. Most mapping issues arise out of essentially idiomatic models being twisted for no good reason. Put your foot down: Everything must follow the idioms unless there is a *very* good reason not to. Usually that reason is performance, and when the need arises you most likely have to introduce an extra step in your mapping in the form of a DTO. While this reduces the transparency of the system, in my experience the need for these is rare enough not to have to worry about it as the vast majority of the system should be plenty performant by default. If it isn't, you're using the wrong technology!
Developer productivity is the most important resource. The more time you save with an excellent model, the more time you have to work on optimizations when the need arises. Normally you should not be coding with consistency-breaking optimizations in mind. Typically when confronted with a performance problem, we need to either eagerly load something, or we need to cache something, or we need to create an index. Most performance issues can be resolved in a matter of minutes.
Your statement about reflection-based ORM frameworks needing to keep copies of the object in memory isn't entirely accurate. Our system does not do this, and instead relies on user code to tell the data layer what to send back to the database. I find this works rather well, because most of the time you definitely know what has changed.
Security-wise, the rich metadata that an ORM provides was a godsend when writing our security system. Because there is exactly one gateway in which all transactions flow, and this gateway had all of the information necessary to perform a verification, even our extremely granular security was easy to implement in a generic manner.
Our disconnected architecture was also aided by the metadata and design of the ORM. When we went into disconnected mode, queries were simply re-routed to a SQLite driver instead of the default web-services driver. Also, the single point of entry for all returned data allowed for easy caching of that data.
Most good ORM systems can use natural or surrogate key structures. My preference is for surrogate keys, because let's face it: Natural keys change and developers are not always experts in their application's domain. Not every developer can make the call as to what is a natural key 100% of the time and what is a natural key 95% of the time. It's far easier to drop a unique constraint than it is to change your key structure when the inevitable happens.
I understand that our requirements are quite different from those that exist in the web-based world. The world will be a happier place for us developers when we can dump the web as an application platform and replace it with an application delivery platform.
documented on: 2008-02-12