I remember that my first job after graduating from college was building an internal .NET application that interfaced with a legacy system. The legacy system ran on an AS/400 and was written in RPG, using a DB2 database for storage. When I looked at the database schema, I was horrified. It was the opposite of any form of normalization; the real world apparently didn’t build applications the way my school had taught.
That wasn’t true, however. Throughout the remainder of that job, every job thereafter, and most of the articles I read on the interwebs, the same principles I had been taught kept coming up: normalize your data, maintain referential integrity, run write operations inside a transaction, and so on. This seemed to be the universally accepted way to build systems, so I continued on that learned trajectory, churning out quality software that met business requirements as specified. When the database had trouble handling my 17-join query, we cached the results at the application layer. When we could live with day-old data, we ran ETL processes at night to pre-calculate the results of the burdensome queries.
In retrospect, I wish these situations had triggered my memory of that first legacy system. The application cache and the extracted tables mirrored those early schemas that had kept me up at night. Worse still, these objects were not just used to read the data; they were also used to update it. While I preached the principles of SOLID, I ignorantly violated the first letter in the acronym: single responsibility.
So, what did those original developers know that I didn’t? The original system was built to run on a mainframe with distributed terminal clients. The mainframe did all the work, while the clients simply displayed screens and then issued queries to change the view or commands to update the data. That closely resembles the architecture of the web: a web server on a box, with any number of browsers connecting to it to view data or post forms. These days our web servers can handle a whole lot of load (especially when load balanced), much more than the original mainframes could. So the mainframe guys supporting distributed clients were like a website supporting gazillions of hits a day (an hour?). How did they manage this complexity?
CQRS stands for command-query responsibility segregation. It literally means separating your commands from your queries: your writes from your reads. They are responsible for different things. Reads don’t carry any business logic (aside, perhaps, from authorization). So why did I keep insisting on a single model to rule them all?
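To make the separation concrete, here is a minimal sketch of the idea in Python (all names here are hypothetical, invented for illustration): the write side takes commands and enforces business rules, while the read side just hands back pre-shaped data with no business logic at all.

```python
# Hypothetical CQRS sketch: a command handler (write side) enforces
# business rules, while a query object (read side) only reads data.

class CreateOrder:
    """A command: an instruction to change state."""
    def __init__(self, order_id: str, total: float):
        self.order_id = order_id
        self.total = total


class OrderCommandHandler:
    """Write side: validates commands and applies business rules."""
    def __init__(self, event_log: list, read_model: dict):
        self.event_log = event_log
        self.read_model = read_model

    def handle(self, cmd: CreateOrder) -> None:
        if cmd.total <= 0:
            raise ValueError("order total must be positive")
        self.event_log.append(("OrderCreated", cmd.order_id, cmd.total))
        # In a real system the read model would usually be updated
        # asynchronously; here we update it inline for simplicity.
        self.read_model[cmd.order_id] = {"order_id": cmd.order_id,
                                         "total": cmd.total}


class OrderQueries:
    """Read side: no business logic, just data shaped for a screen."""
    def __init__(self, read_model: dict):
        self.read_model = read_model

    def get_order(self, order_id: str):
        return self.read_model.get(order_id)


events, view = [], {}
OrderCommandHandler(events, view).handle(CreateOrder("A-1", 42.0))
order = OrderQueries(view).get_order("A-1")
```

The point is not the specific classes but the split: nothing stops you from backing `OrderQueries` with a denormalized table (or an application cache) tuned purely for reading, while the command handler keeps the normalized, rule-enforcing model to itself.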
This may sound complex, and it can be. In my next post, I want to delve into how CQRS can help us manage that complexity.