Developers and Their Tools

This is my tool. There are many others like it, but this one is mine. My tool is my best friend. It is my life.

With apologies to Stanley Kubrick, I can’t think of a better phrase to describe engineers and software companies. The focus is not on the what, the why or the how but on the tools. A colleague from Object Mentor once said, “Never make fun of a man’s tools”. It was hysterical, but revealing.

To explain, I use this analogy. A contractor shows up to remove a concrete-and-brick wall from your backyard. He is armed with a ball-peen hammer. Naturally, you ask, “Did you bring other equipment? That is a pretty sturdy and large wall.”

The contractor replies, “I love this hammer. Yesterday, I was finishing a piece of furniture and this hammer was perfect. It left the smallest indentation, which I took out with a piece of 1000-grit sandpaper. I love this hammer; it is all I will ever use.”

My response, as my mother would say: “Don’t let the door hit you in the ass as you leave.”

Let’s explore the causes, consequences and remedies of this developer malady. 

Before I continue, I will probably offend people by mentioning their particular type of hammer. It is not that I mean to offend, but the offense kind of proves my point.

I am not recommending any framework for any situation, but advocating that these frameworks should be in your toolbox. This includes libraries, languages, frameworks, and patterns. I am agnostic about which one is used; I view it as an optimization problem. Or to quote my father (in his Italian accent), “Oh shiiite Carmine, use de right tool for de right job.”

That raises the question: “What is the right tool?”

The Right Tool

The approach I have found most effective in my career is to be language-, technology-, and tool-agnostic. That is, I look at the tools available and pick the ones most effective for the job. The decision has to be near zero cost to reverse, which I will get to next.

I look for the simplest, most performant APIs and tools. I am very aware that every API, language, etc. I add increases complexity. A clear-eyed evaluation must be made to determine whether the complexity is worth it. Again, I am assuming that switching out technologies is near zero cost (next chapter).

I have a few other rules. If the API has security holes, it goes to the back of the line. If the API’s documentation has errors, it is a no-go. Finally, if the API requires complex dependencies (every imported library is overhead and may contain security holes) or special build steps, that is an extraordinarily high cost.

If you approach the decision without sticking to your hammer, the costs and benefits become fairly clear.

Zero Cost Switch

I can’t tell you how many hours I have spent in meetings where ‘architects’ argue about which framework we should use, without any provable evidence of how it would work in the current application. I would replace the word ‘spent’ with ‘wasted’ had I not learned early on to bring my laptop and actually do work.

The thing is, without an application and tests, it is impossible to determine the fit of any library. You have to integrate the API and see if it works. These meetings try to bind an unknown to an unknown, and unless you have a laptop to actually accomplish something, they are a waste of time.

This is where my term “Zero Cost of Switch” comes into play. No external API should bleed into your business layer. If one does, swapping out APIs becomes nearly impossible. OO design patterns make abstracting away APIs trivial.

The simplest and most useful is dependency injection. Note I am not talking about dependency injection frameworks, which come with their own host of complexity and problems, but simply about using the abstract factory pattern to wall off the external API and its structures from your business tier (a sketch follows the list below).

This wall has many advantages:

  • Abstract factories expose precisely which behaviors you need and use from the API. If you need to switch for any reason, you have the minimal set required.
  • Abstract factories define the minimal inputs the business tier needs, in the form of beans or constraint-protected objects. If you need to swap out the API, you know what data is needed.
  • Abstract factories allow you to inject test classes that mimic the necessary behavior, so unit tests are fully decoupled from externalities.
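
To make this concrete, here is a minimal Java sketch of the wall, assuming a hypothetical payment vendor. Every name in it (PaymentGateway, ChargeReceipt, BillingService, FakeGatewayFactory) is an illustration, not a real SDK.

    // A minimal sketch of walling off a vendor API behind an abstract factory.
    // Every name here is a hypothetical illustration, not a real vendor SDK.
    public class PaymentWall {

        // Narrow interface: exactly the behavior the business tier needs.
        interface PaymentGateway {
            ChargeReceipt charge(String accountId, long amountCents);
        }

        // Minimal data object: only the fields the business tier uses.
        record ChargeReceipt(String transactionId, boolean success) {}

        // Abstract factory: the one place that knows which vendor is wired in.
        interface PaymentGatewayFactory {
            PaymentGateway create();
        }

        // Business-tier class: depends only on the abstractions,
        // injected through the constructor.
        static class BillingService {
            private final PaymentGateway gateway;

            BillingService(PaymentGatewayFactory factory) {
                this.gateway = factory.create();
            }

            boolean bill(String accountId, long amountCents) {
                return gateway.charge(accountId, amountCents).success();
            }
        }

        // Test double: unit tests never touch the real vendor.
        static class FakeGatewayFactory implements PaymentGatewayFactory {
            public PaymentGateway create() {
                return (accountId, amountCents) ->
                        new ChargeReceipt("fake-txn-1", true);
            }
        }

        public static void main(String[] args) {
            BillingService billing = new BillingService(new FakeGatewayFactory());
            System.out.println(billing.bill("acct-42", 1999)); // prints true
        }
    }

Swapping vendors then means writing one new factory (and the adapter behind it); the business tier never changes.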

If we are to preserve the zero cost switch, we must adhere to the tenets of OOD.

Basically, you should minimize the interfaces and the data objects to the least information necessary. It is not helpful to make one huge bag of data and behavior. Breaking it up into pieces will make any forced migration easier.

Inject This!

Dependencies or Codependencies?

Before I start this discussion, I want to refer you to my previous rambling, This is My Tool, as it may keep your head from exploding.

There are many tools that, while founded with good intentions, create more messes than they solve. This article will look at dependency injection (DI). In my experience, I have never seen a project that used a DI framework that was not a complicated, horrific mess.

I will attempt to catalogue how these messes manifest, the reasons they seem inevitable, and the solutions and patterns that prevent their occurrence.

What is it?

To be honest, I never considered dependency injection to be a thing in and of itself. I use it frequently as part of the Inversion of Control (IoC) pattern, injecting concrete instances of an abstract class into an object through its constructor.

This is incredibly useful for testing code that accesses external resources and for abstracting dependencies so you can switch out implementations. For example, it is very useful to be able to swap between Google AI, OpenAI’s ChatGPT, etc. during development and testing.
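
Here is a hedged sketch of that swap in Java. The names (ChatModel, OpenAiModel, CannedModel, Summarizer) are hypothetical; a real vendor SDK or HTTP call would go where the comment indicates.

    // Sketch: swapping model providers behind one seam (hypothetical names).
    public class LlmSwap {

        // The only thing the business code ever sees.
        interface ChatModel {
            String complete(String prompt);
        }

        // Production adapter: the real vendor SDK or HTTP call goes here.
        static class OpenAiModel implements ChatModel {
            public String complete(String prompt) {
                throw new UnsupportedOperationException("wire in the SDK here");
            }
        }

        // Deterministic fake for development and tests: no network, no API key.
        static class CannedModel implements ChatModel {
            public String complete(String prompt) {
                return "canned answer for: " + prompt;
            }
        }

        // Business code takes the abstraction through its constructor;
        // it neither knows nor cares which provider is behind it.
        static class Summarizer {
            private final ChatModel model;
            Summarizer(ChatModel model) { this.model = model; }
            String summarize(String text) {
                return model.complete("Summarize: " + text);
            }
        }

        public static void main(String[] args) {
            Summarizer s = new Summarizer(new CannedModel());
            System.out.println(s.summarize("a long document"));
        }
    }

In production you construct the real adapter; in tests you construct the canned one. Nothing else changes.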

With the emergence of Spring Boot, Lombok, etc., it seems to have taken on a life of its own. Teams use these frameworks to ‘wire up’ all objects. The results are frequently horrifying.

The intent (no free lunch)

The intent of DI is pretty straightforward: to decouple the construction of a needed object from its use.

I find the misuse of DI is invariably linked to one problem: programmers never stop to ask the question, “Now why in tarnation would I want to do that?” They see it as a cool new tool (refer to the aforementioned article) and use it everywhere.

They forget that all tools have overhead. Do the savings from using this tool overcome its cost, or are you simply adding to the accidental complexity of your project?

The costs

I find two main costs of DI. First, you typically don’t know your concrete type until runtime, and second, you don’t know the life cycle of the object you are using. This makes maintaining state rather difficult. These costs are a direct result of the tool itself.

Arthur Riel, in his seminal work “Object-Oriented Design Heuristics”, described how objects get the objects they use. One of the five ways was “God gives it to you”. This is the problem: can you trust the god who gave it to you? I mean, which god, Thor or Loki?

On one project I worked on, no one remembered how a particular set of objects was being injected. It was problematic because it was unknown whether the calls to these objects were synchronous or asynchronous. It took me days to figure it out. After searching through config files, configuration classes, and under my desk (I mean, it had to come from somewhere), I launched the debugger to see what concrete class was being injected, then reverse engineered where it came from.

If we had simply instantiated the class and passed it through the constructor, this search would have taken seconds.
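
For illustration, a sketch of that explicit wiring in Java (hypothetical names): one glance at the composition root answers which concrete class is in play and whether it is synchronous.

    // Sketch: explicit wiring at the composition root (hypothetical names).
    public class CompositionRoot {

        interface OrderNotifier {
            void send(String orderId);
        }

        // The class name itself documents the behavior in question.
        static class SynchronousEmailNotifier implements OrderNotifier {
            private final String smtpHost;
            SynchronousEmailNotifier(String smtpHost) { this.smtpHost = smtpHost; }
            public void send(String orderId) {
                System.out.println("sync email via " + smtpHost + " for " + orderId);
            }
        }

        static class OrderProcessor {
            private final OrderNotifier notifier;
            OrderProcessor(OrderNotifier notifier) { this.notifier = notifier; }
            void process(String orderId) { notifier.send(orderId); }
        }

        public static void main(String[] args) {
            // One glance answers: which concrete class, and is it synchronous?
            OrderProcessor p = new OrderProcessor(
                    new SynchronousEmailNotifier("smtp.example.com"));
            p.process("order-7");
        }
    }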

Why

I had to go through several projects to figure out why anyone would use this. I mean, there is a compelling argument for using Spring Boot or similar to set up a web service or bind a DB implementation. And as long as you keep the bindings simple and minimal, there is no significant overhead.

Interestingly, in every project that used DI before I joined, the framework was used everywhere. These projects were ungainly, overly complex, buggy and unscalable.

Why would anyone do this to themselves? I can think of two reasons.

First, many developers like the declarative nature of the DI tools. They seem to think that using annotations to sew things together is simpler, hence it goes everywhere. The trouble is that declarative wiring does not scale as features accumulate.

It might be fine when you can remember all of the objects, but as the project grows it becomes impossible to track all of the relationships.

Second, many people don’t understand the principles of object-oriented design. I had one developer tell me that he did not like OOD because object structures are tall, skinny, deep things: when you instantiate an object, it creates objects that create objects. Using DI, he was able to expose all of the objects.

The fundamental simplifying principle of OOD is containment. Successfully calling a constructor means that the object and all of its contained objects are valid and ready to use. These DI tools blow away this assumption and this guarantee.
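
A small Java sketch of that guarantee, using hypothetical domain objects: each constructor validates what it contains, so a successful new means the whole graph is valid.

    import java.util.List;

    // Sketch: constructors that validate what they contain (hypothetical domain).
    public class ValidOnConstruction {

        static final class LineItem {
            final String sku;
            final int quantity;
            LineItem(String sku, int quantity) {
                if (sku == null || sku.isBlank())
                    throw new IllegalArgumentException("sku required");
                if (quantity <= 0)
                    throw new IllegalArgumentException("quantity must be positive");
                this.sku = sku;
                this.quantity = quantity;
            }
        }

        static final class Order {
            final List<LineItem> items;
            Order(List<LineItem> items) {
                if (items == null || items.isEmpty())
                    throw new IllegalArgumentException("an order needs items");
                // Contained objects were validated by their own constructors.
                this.items = List.copyOf(items);
            }
        }

        public static void main(String[] args) {
            // If this line succeeds, the whole object graph is valid and ready.
            Order order = new Order(List.of(new LineItem("SKU-1", 2)));
            System.out.println(order.items.size()); // prints 1
        }
    }

A framework that injects fields after construction offers no such guarantee.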

This leads to exponential costs of change. I don’t care, within reason, about the cost of implementing the first few features. The real question is how expensive the thousandth or the ten-thousandth will be. The only reason we design and architect is to keep the cost of change flat.

DI tools are at odds with this goal. If you use DI, it is essential that it be forced to the outside of your business logic. The layer it lives in will suffer from all of the above effects, but if that layer is small, it will be usable.

Feedback Loops and Pristine Myths

The Reaper Will Come

I was originally writing an article that looks at the time and costs of engineering decisions, addressing both the immediate and long-term effects. It will follow on the heels of this one. However, I noticed that I first needed to introduce the idea of feedback and its long-term effects. Hence, this article was born.

My mother used to say, “It is not a question of if the reaper will come but when he will come and how much will he reap.” Yes, she was amazing, but this one quote has always guided my life and career.

Guided by my mother’s tale of the reaper, whenever I approach a decision in software engineering I ask the question: “How much will this decision cost us, and how much will it cost to reverse it?”

People constantly tell me this is a cynical view. In reality, it is a pragmatic approach to couching each decision in its consequences.

Originally I used to pull effects from my bum, but then I found “interaction diagrams”. These allowed me to discuss cause and effect with teams. Not perfect, but a visualization of your decisions and their effects.

What follows is an unfortunate example of how local optimizations caused the demise of a promising startup.

I was VP of Engineering for a startup in a highly volatile space. I was hired to revive a team going into a first round of funding. I catalogued the problems, but only got the go-ahead when the live site went down for three days during a board meeting.

My team spent a month implementing CI/CD with fully automated regression tests. We got to four to eight releases a day with zero defects. Note that we used new features to drive these changes.

The end-to-end build to stage, with tests, took less than ten minutes. The solid releases the regression tests enabled gave the company a distinct advantage in a volatile market.

We had a solid, evolving platform for a few months. Then, at a staff meeting, two team leads clamored for long-lived Git branches for “reasons”. I am using “reasons” derogatorily because they really had none: it was basically an appeal to emotion. It was essential to isolate how much it would cost and whether we would get any value from it.

Apparently, the teams were worried about code that was not correct getting into the branch. I asked, 

“Why would you check in broken code?”

Of course I got some guffaws and wails claiming I was attacking them. I mean, honestly, should you ever check in broken code? How is it possibly a good thing to share knowingly damaged code? So I switched to asking how much this would cost us.

I drew the sequence diagram showing the result of the long-lived branches: because of merge and integration times, people would go longer between pulls from and pushes to master, which created a feedback loop making integrations more painful. There are actually several feedback loops, including that big merges make pulls from master take longer because they break things, and that the side branches spawned off the now-unstable long-lived branch dilate the full integration even more.

I have been in companies where the integration cycle from these long-lived branches takes months(!)… yeah, months.

Internally, my main fear was that the reason for long-lived branches was a lack of development hygiene on the part of the mid-level managers, tech leads and developers. Long-lived branches just hide and obfuscate the lack of discipline.

Fast forward four weeks: all progress had stopped (deployment and integration times were now weeks long). A developer asked why it was so painful to merge when we used to do it effortlessly.

My only consolation, since the company was going down hard, was the kick-ass “I told you so” moment when I brought up the original interaction diagram to answer the developer’s question, and his response was (bless him), “Why did we do that?”

I think the delusion that long-lived branches are healthy comes from Linus Torvalds and his rightful demand that the master branch of the Linux kernel be pristine.

I will lean on Linus and say all check-ins should be pristine. Indeed, all code should be as pristine as possible at all times (generally I work in 15- to 30-minute intervals). The real question is:

“How do you define pristine?”

In the Linux case, Linus had an amazing ability to keep the entire kernel in his head, so code that passed his scrutiny is pristine.

For us mere mortals, the only comparable scrutiny is a comprehensive automated regression suite.

We should demand ‘pristine’ branches, but they should be short-lived, integrated continuously, and proven pristine by the tests on every push.


OOR Mapping: Why?

A Study of Impedance Mismatch

From the early 2000s to the mid-2010s, it seemed like every company hired me to remove an OR (object-to-relational) mapping because of performance, quality and extensibility issues. The gains the frameworks gave them became liabilities as their user bases grew.

OOR tools have always confused me. The idea is that you create your OO design and automatically generate the queries for your CRUD operations. This lets you get a simple app up and running quickly. Examples of these frameworks are Hibernate in Java, ActiveRecord in Ruby on Rails, etc.

The Dilemma of Business Value

Let’s say you create a billion-dollar product with simple CRUD operations using an OR mapper. You release, and your product catches fire, but competitors immediately add those features to their products. Note: data and data access cannot be patented.

In a business, collected data has value and can be differentiating as long as it is correct. To leverage this data, you need business logic. Business logic is the embodiment of the enterprise: what they do and how it differs from every other project powered by data.

The business tier implements customer features; it must stay maintainable and correct no matter how many features are thrown at it. If it cannot be extended, the company will not thrive. On the other hand, if the data layer is not correct, the company will surely die.

There is the dilemma: collected data is the essence of your business, therefore it must be consistent, correct and reflect transactional boundaries (I will address transactions in another rant). Your business tier, meanwhile, must be robust, extensible, maintainable, scalable and provably correct. These are two vastly different goals.

The Mismatch

Back to the original premise of an impedance mismatch. This will involve some conceptual mathematics. I am going to try to make an understandable model starting with the basics.

The goal of relational databases is to eliminate redundancy. Set theory defines what is redundant. Redundancy consists not only of attributes and data but of the dependencies that allow us to infer relationships between data and keep it consistent.

It gets tricky here. My X (does not deserve a pronoun) accurately said that relational databases are a compression method, and we just have to make sure the compression is not lossy.

Relational databases must be normalized (Boyce-Codd Normal Form or Third Normal Form) to prevent insert, delete and update anomalies. These anomalies will destroy data integrity and ruin the value of your data. Basically, you take your fields and your functional dependencies and crank them through the synthesis algorithm (Bernstein, Philip A., “Synthesizing Third Normal Form Relations from Functional Dependencies”).
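
A tiny worked example, using a hypothetical schema where order rows carry customer data:

    R(order_id, cust_id, cust_addr)
        order_id -> cust_id
        cust_id  -> cust_addr      (so order_id -> cust_addr is transitive)

    Synthesis groups the attributes by the left-hand sides of the
    functional dependencies:

    R1(order_id, cust_id)          key: order_id
    R2(cust_id, cust_addr)         key: cust_id

In R, a customer’s address is repeated on every order row, so updating it means touching many rows, and missing one silently corrupts the data. In R1 and R2 the address lives in exactly one place and the update anomaly disappears.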

Object-oriented design, on the other hand, is all about loose coupling, high cohesion and well-formed objects (Riel, Arthur J., “Object-Oriented Design Heuristics”).

Based on these incompatibilities, David Maier, at a conference in the late 80s, proved that the mapping of OO to relational is an NP-hard problem. Basically, you are trying to map two directed graphs onto each other (I wish I could find a reference, but the result is pretty obvious).

Basically, objects may hold redundant data, say an address or a zip code, that is verboten in a relational design. If you allow the redundancy, you are subject to the anomalies, and your data will be corrupted.

Developers get around this by either mapping their OO design to entities and relations (crap data) or mapping their entity-relationship (ER) model to their OO design (crap, non-extensible design).

OR: The Idea

The idea is that OR libraries map between OO and ER. But, as noted, that mapping is NP-hard. I guess if you have time to wait for the sun to explode, it might be OK.

Most libraries ignore these inconvenient mathematical truths and create some hybrid solution (say, caching redundant fields for objects and having triggers update the objects). The problem is that these structures invariably bleed into your business tier and create complexities that push you onto the exponential part of the cost-of-change curve, or push you into not caring about the transactional and data integrity of your database.

The management at most companies that had me remove OR mapping layers did not understand these subtleties. They were more concerned with performance and the horrific queries generated by the mappings (non-tunable, random, horrid).

The Solution

In my experience, the most effective solution is to acknowledge the impedance mismatch. Use the synthesis algorithm to bring your database to BCNF. Ensure proper transactional boundaries on your updates and inserts, and write effective queries. This defines your data layer (a sketch follows below).
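
As a hedged illustration of such a data layer in Java (hypothetical account schema and names; plain JDBC, no generated queries), note the explicit SQL and the visible transaction boundary:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Sketch: a hand-written data layer method with explicit SQL and an
    // explicit transaction boundary (hypothetical account schema).
    public final class AccountDao {
        private final Connection conn;

        public AccountDao(Connection conn) { this.conn = conn; }

        public void transfer(long fromId, long toId, long amountCents)
                throws SQLException {
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amountCents);
                debit.setLong(2, fromId);
                debit.executeUpdate();
                credit.setLong(1, amountCents);
                credit.setLong(2, toId);
                credit.executeUpdate();
                conn.commit(); // both updates land together, or neither does
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }

Every query is visible and tunable, and the transaction boundary is one you control, which is exactly what the generated mappings take away.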

Your OO design is an emergent property of your requirements. Objects must be cohesive and loosely coupled. Tune your objects to conform to the best practices of OO design (Martin, Robert, “Clean Code”; Fowler, Martin, “Refactoring”).