A Study of Impedance Mismatch
From the early 2000s to the mid teens, it seemed like every company hired me to remove an OR (object to relational) mapping because of performance, quality and extensibility issues. The gains the frameworks gave them became liabilities as their user bases grew.
OR tools have always confused me. The idea is that you create your OO design and the tool automatically generates the queries for your CRUD operations. This lets you get a simple app up and running quickly. Examples of these frameworks are Hibernate for Java, Active Record in Ruby on Rails, etc.
The Dilemma of Business Value
Let’s say you create a billion dollar product with simple CRUD operations using an OR. You release, and your product catches fire, but competitors immediately add those features to their products. Note: data and data access cannot be patented.
In a business, collected data has value and can be differentiating as long as it is correct. To leverage this data you need business logic. Business logic is the embodiment of the enterprise: what it does and how it differs from every other project powered by data.
The business tier must implement customer features and stay maintainable and correct no matter how many features are thrown at it. If it cannot be extended, the company will not thrive. On the other hand, if the data layer is not correct, the company will surely die.
There is the dilemma: collected data is the essence of your business, therefore it must be consistent, correct and reflect transactional boundaries (I will address transactions in another rant). Your business logic must be robust, extensible, maintainable, scalable and provably correct. These are two vastly different goals.
The Mismatch
Back to the original premise of an impedance mismatch. This will involve some conceptual mathematics. I am going to try to make an understandable model starting with the basics.
The goal of relational databases is to eliminate redundancy. Set theory defines what is redundant. Redundancy consists of not only attributes and data but also the dependencies that allow us to infer relationships between data and keep it consistent.
It gets tricky here. My X (does not deserve a pronoun) accurately said that relational databases are a compression method and we just have to make sure they are not lossy.
Relational databases must be normalized (Boyce-Codd Normal Form or Third Normal Form) to prevent insert, delete and update anomalies. These anomalies will destroy data integrity and ruin the value of your data. Basically, you take your fields and your functional dependencies and crank them through the synthesis algorithm (Bernstein, Philip A. “Synthesizing third normal form relations from functional dependencies”).
Object Oriented Design is all about loose coupling, high cohesion and well formed objects (Riel, Arthur J. “Object-Oriented Design Heuristics”).
Based on these incompatibilities, David Maier proved at a conference in the late 80s that mapping OO to relational is an NP-hard problem. Basically, you are trying to map two directed graphs onto each other (I wish I could find a reference, but the results are pretty obvious).
Basically, objects may have redundant data, say an address or a zip code, that is verboten in a relational design. If you allow the redundancy, you are subject to the anomalies and your data will be corrupted.
Developers get around this by either mapping their OO design to entities and relations (crap data) or mapping their entity relationship model (ER) to their OO design (crap, non-extensible design).
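To make the normalization step a little more concrete, here is a minimal Python sketch of attribute closure, the building block that both a synthesis-style algorithm and a BCNF check lean on. It is not Bernstein's algorithm itself. The customer relation, its attribute names and its functional dependencies are all made up for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs of frozensets, read as lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we can infer all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]


# Hypothetical relation and dependencies, assumed for this example only.
CUSTOMER = {"customer_id", "name", "street", "zip", "city"}
FDS = [
    (frozenset({"customer_id"}), frozenset({"name", "street", "zip"})),
    (frozenset({"zip"}), frozenset({"city"})),
]

VIOLATIONS = bcnf_violations(CUSTOMER, FDS)
```

In this made-up schema, zip -> city is flagged because zip alone is not a key of the customer relation. That is the hint that zip and city belong in a relation of their own.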
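A small Python sketch of what that corruption looks like, assuming a hypothetical Customer object that carries both a zip code and the city the zip code implies. The class names, fields and sample values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    zip_code: str
    city: str  # redundant: the city is already determined by zip_code


customers = [
    Customer("Ann", "97201", "Portland"),
    Customer("Bob", "97201", "Portland"),
]

# Update anomaly: only one copy of the redundant fact gets changed.
customers[0].city = "Portland, OR"
cities_for_zip = {c.city for c in customers if c.zip_code == "97201"}
# cities_for_zip now holds two conflicting answers for the same zip code.


# Normalized alternative: the zip -> city fact is stored exactly once.
zip_to_city = {"97201": "Portland"}


@dataclass
class NormalizedCustomer:
    name: str
    zip_code: str


def city_of(customer):
    """Look the city up through the single authoritative mapping."""
    return zip_to_city[customer.zip_code]


normalized = [
    NormalizedCustomer("Ann", "97201"),
    NormalizedCustomer("Bob", "97201"),
]
zip_to_city["97201"] = "Portland, OR"  # one update, seen by everyone
normalized_cities = {city_of(c) for c in normalized}
```

The first half ends up with two different cities for one zip code. The second half cannot, because there is only one place to update.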
OR the Idea
The idea is that OR libraries map between OO and ER. I mean, it is NP-hard. I guess if you have time to wait for the sun to explode, it might be OK.
Most libraries ignore these inconvenient mathematical truths and create some hybrid solution (say, caching redundant fields for objects and having triggers update the objects). The problem is that these structures invariably bleed into your business tier and create complexities that push you either to the exponential part of the cost of change curve or to not caring about the transactional and data integrity of your database.
The management at most companies that had me remove OR mapping layers did not understand the subtleties. They were more concerned with performance and the horrific queries generated by the mappings (non-tunable, random, horrid).
The Solution
In my experience, the most effective solution is to acknowledge the impedance mismatch. Use the synthesis algorithm to normalize your database (BCNF or Third Normal Form). Ensure proper transactional boundaries on your updates and inserts, and write effective queries. This defines your data layer.
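As a rough illustration of such a data layer, here is a sketch using Python's built-in sqlite3 module. The schema, table names and function names are assumptions made for the example, not a prescription. The point is only that the tables are normalized, each write sits inside one explicit transaction, and the queries are hand-written so they can be tuned.

```python
import sqlite3

# Hypothetical normalized schema: the zip -> city fact lives in one table.
SCHEMA = """
CREATE TABLE zip (
    zip_code TEXT PRIMARY KEY,
    city     TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT NOT NULL REFERENCES zip(zip_code)
);
"""


def open_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_customer(conn, name, zip_code, city):
    """Insert the zip row (if new) and the customer as one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR IGNORE INTO zip (zip_code, city) VALUES (?, ?)",
            (zip_code, city),
        )
        cur = conn.execute(
            "INSERT INTO customer (name, zip_code) VALUES (?, ?)",
            (name, zip_code),
        )
        return cur.lastrowid


def customers_in_city(conn, city):
    """Hand-written query: join through the single zip -> city table."""
    rows = conn.execute(
        "SELECT c.name FROM customer c "
        "JOIN zip z ON z.zip_code = c.zip_code "
        "WHERE z.city = ? ORDER BY c.name",
        (city,),
    )
    return [row[0] for row in rows]
```

If the customer insert fails, the zip insert in the same call is rolled back with it, so the database never holds half of a write. The business tier only sees add_customer and customers_in_city and does not need to know how the tables are laid out.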
Your OO design is an emergent property of your requirements. Objects must be cohesive and loosely coupled. Tune your objects to conform to best practices of OO design (Martin, Robert “Clean Code”) (Fowler, Martin “Refactoring”).