Friday, August 19, 2016

Start with Microservice (in mind) - I think Martin Fowler is wrong



Martin Fowler says: “Don't even consider Microservice unless you have a system that's too complex to manage as a monolith.” The reason for his caution is that Microservice is not a free lunch (nothing worthwhile in life is free): there is a great deal of overhead in implementing Microservice.

He shows the overhead in this diagram:

[diagram]

So where does the overhead lie? I like this diagram from http://microservices.io/patterns/index.html, which shows that the Microservice architecture involves much more than just application design:

[diagram]

Underneath the application layer sit the application infrastructure layer and the infrastructure layer. With today’s technology, the application infrastructure layer usually involves a highly robust message broker, log gathering and analysis tools, etc., while the infrastructure layer usually involves service registry and discovery, Docker orchestration tools, etc. On the application layer, you have to consider how all the services can behave nicely together: API compatibility, failure handling, eventual consistency, and so on.

So not only is Microservice not a free lunch, it is quite expensive!

Martin Fowler is right to caution you about Microservice, but I think he is wrong in saying that you should start with a monolith. There are great benefits in starting with Microservice in mind – even if you are building a monolithic application.

I had this awakening when I tried to dissect one of the systems I have worked on. I first listed its features and picked one to practice on. I thought it wouldn’t be too hard because it has a very clear functional boundary, but pretty soon I found out it has a lot of hidden dependencies on other functions of the system.

In an aging system, hidden dependencies and relationships are hard to find. We thought we were building a cleanly layered, beautiful system:

[diagram]

We thought we had walls along each layer and around each component. But in reality, because everything is in the same code base and the same DB, it is easy to break through the walls or even tunnel underneath them. Sometimes you do this without even a second thought, especially when you see there are already holes in the wall. You end up with this:

[diagram]


This is very bad: business logic is scattered across the code and the database models.

I wanted to break out moduleA as a Microservice, and discovered its hidden dependencies through discussions with my colleagues and by digging into the database models. In the process, discussions like the following often sprang up:

“Why did we design this model like this?”

“Probably to provide the maximum flexibility.”

“Do we have any customers using this function in this way?”

“Probably not many, but maybe a few…”

“So we went to so much trouble just for maybe a few…”

If we had started with Microservice in mind, we would have set up walls around the bounded contexts. We might still be building a monolith, but compared with a monolith built without Microservice in mind, the walls would be guarded more carefully, because we have the idea – or the hope – that one day we might break things apart into Microservices.
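One way to keep such walls inside a single codebase is to let other modules talk to a bounded context only through a narrow public interface, never through its tables or internal models. Here is a minimal sketch of that idea; all the class and method names are hypothetical, not from my system:

```python
# A sketch of "walls" around a bounded context inside a monolith:
# other modules may only use OrderModuleApi, never the private storage.

class _OrderRecord:
    """Internal persistence model of the module -- private."""
    def __init__(self, order_id, total):
        self.order_id = order_id
        self.total = total

class _OrderStore:
    """Private storage; nothing outside the module should import this."""
    def __init__(self):
        self._rows = {}
    def save(self, rec):
        self._rows[rec.order_id] = rec
    def load(self, order_id):
        return self._rows[order_id]

class OrderModuleApi:
    """The 'wall': the only surface other modules are allowed to touch."""
    def __init__(self):
        self._store = _OrderStore()
    def place_order(self, order_id, total):
        self._store.save(_OrderRecord(order_id, total))
    def order_total(self, order_id):
        # Expose a plain value, not the internal record or a table row.
        return self._store.load(order_id).total

# Another module asks across the wall instead of querying the tables:
api = OrderModuleApi()
api.place_order("o-1", 42.0)
print(api.order_total("o-1"))  # -> 42.0
```

Crossing this wall costs a deliberate API call, which is exactly what makes the cost of a new cross-context feature visible.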

  

The conversation, in this situation, might be like this:

“We need this feature.”

“This feature requires communication across walls, which is expensive. Do you really need it?”

“I didn’t know it was so expensive. We need to think about this carefully.” 

Having visible walls forces us to be economical; it forces us to weigh benefits against costs. It doesn’t mean that we can’t open holes or tunnel underneath to work around the walls – even in a Microservice architecture, we can still make a mess; at the end of the day, it is our skills that count. But having Microservice in mind forces us to think deliberately and to make decisions explicit.

ModuleA in my system has the following data models:
[diagram]

It has this structure because, in many of its tables, either “flexibility” was reserved for some unknown future usage, or other functions tried to piggyback on those tables (e.g., adding fields that belong to different domain models than the tables). Applying DDD, moduleA’s core can be restructured as:

[diagram]

Of course, I am benefiting from 100% hindsight. But I believe if there is one silver bullet, it is this: start simple, and evolve into complexity after it is proven.

Tuesday, August 9, 2016

The devil is in the details – eventual consistency



In https://www.nginx.com/blog/event-driven-data-management-microservices/, Chris Richardson uses an example to demonstrate the challenges and solutions in keeping data consistent across microservices.

The example is: when OrderService creates an order, it needs to check with CustomerService to see if the customer has enough credit for the order. OrderService and CustomerService have independent databases, and we do not want to use 2PC because it is neither scalable nor robust.

One solution is to use a local transaction plus event-driven messaging:
“The Order Service inserts a row into the ORDER table and inserts an Order Created event into the EVENT table. The Event Publisher thread or process queries the EVENT table for unpublished events, publishes the events, and then updates the EVENT table to mark the events as published.”
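The quoted pattern can be sketched in a few lines with SQLite; the table layout, column names, and the no-op `publish` stub are my own illustrative assumptions, not from the article:

```python
import sqlite3

# A minimal sketch of the pattern: the order row and the event row are
# written in ONE local transaction, so they commit or roll back together.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, type TEXT, state TEXT)")

with db:  # one local transaction
    db.execute("INSERT INTO orders VALUES ('order-1', 'NEW')")
    db.execute("INSERT INTO event (type, state) VALUES ('OrderCreated', 'new')")

def publish(event_type):
    pass  # hand the event to the message broker here (stubbed out)

# The Event Publisher loop: query unpublished events, publish them,
# then mark them as published.
rows = db.execute("SELECT id, type FROM event WHERE state = 'new'").fetchall()
for event_id, event_type in rows:
    publish(event_type)
    db.execute("UPDATE event SET state = 'published' WHERE id = ?", (event_id,))
db.commit()
```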

This, however, is only half of the picture: how CustomerService handles events is not included, and how EventPublisher makes sure events get published is also not mentioned.

EventPublisher gets all “new” events from the DB, publishes them, and then updates the “new” events to “published”. Before the update completes, the DB could go down, or EventPublisher could crash. In either case, when they come back up, EventPublisher will republish all the “new” events. This leads to the first design constraint: CustomerService must be able to handle duplicated events.
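The usual way to satisfy this constraint is to make the consumer idempotent, for example by remembering which event ids it has already handled. A sketch (the class shape and names are mine, for illustration):

```python
# First constraint: duplicated deliveries must be harmless.
# CustomerService tracks processed event ids and ignores repeats.
class CustomerService:
    def __init__(self):
        self.reserved = 0.0
        self._seen = set()  # ids of events already processed

    def handle_order_created(self, event_id, amount):
        if event_id in self._seen:
            return  # duplicate delivery after a republish: no-op
        self._seen.add(event_id)
        self.reserved += amount

svc = CustomerService()
svc.handle_order_created("evt-1", 100.0)
svc.handle_order_created("evt-1", 100.0)  # republished after a crash
print(svc.reserved)  # -> 100.0, not 200.0
```

In a real service the seen-ids set would live in the service’s own database, updated in the same local transaction as the credit table.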

CustomerService gets an “OrderCreated” event, handles it, and sends a “CreditReserved” or “CreditLimitExceeded” event. Due to the fallacies of distributed computing, this event might not reach OrderService in time. OrderService can’t keep showing customers “your order is pending, please be patient…” – customers are not that patient; they will pound on the refresh button again and again. After a certain time, OrderService has to give up and consider the order failed. Since it doesn’t know whether CustomerService has processed the order, it needs to send it an “OrderUnwinded” event. If CustomerService hasn’t processed the order, this is an empty action for CustomerService; otherwise, CustomerService needs to unreserve the customer’s credit. In either case, CustomerService then sends an “OrderUnwindedSuccess” event back to OrderService. So this leads to the second design constraint: there needs to be a compensating mechanism.
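CustomerService’s side of the compensation might look like this sketch; the event tuples and the in-memory outbox list are illustrative stand-ins for real tables and a broker:

```python
# Second constraint: a compensating handler. On "OrderUnwinded",
# CustomerService releases the credit if it ever reserved it, does
# nothing otherwise -- and acknowledges either way.
class CustomerService:
    def __init__(self):
        self.reserved = {}   # order_id -> reserved amount
        self.outbox = []     # events waiting to be published

    def handle_order_created(self, order_id, amount):
        self.reserved[order_id] = amount
        self.outbox.append(("CreditReserved", order_id))

    def handle_order_unwinded(self, order_id):
        # pop with a default: empty action if the order was never processed
        self.reserved.pop(order_id, None)
        self.outbox.append(("OrderUnwindedSuccess", order_id))

svc = CustomerService()
svc.handle_order_unwinded("o-1")      # never processed: empty action
svc.handle_order_created("o-2", 50.0)
svc.handle_order_unwinded("o-2")      # processed: credit released
print(svc.reserved)                   # -> {}
```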

But this “OrderUnwindedSuccess” event may again not arrive at OrderService in time. From a customer’s point of view, his order is one transaction: it either succeeds (the order is created and credit reserved) or fails (the order is not created and credit is not reserved), even though our system is not designed to be consistent at all times. If OrderService doesn’t get the “OrderUnwindedSuccess” event within a certain period of time, it can’t show the customer “Your order has failed”, because at that moment OrderService doesn’t know whether order and credit are consistent. OrderService might retry a couple more times, then give up and log this as an exception somewhere (an error queue, e.g.), and human intervention will be needed to sort it out. So this leads to the third design constraint: there needs to be a mechanism to handle exceptions.
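The retry-then-give-up logic can be sketched as below; `wait_for_ack` and the plain-list error queue are hypothetical stand-ins for a real timeout wait and a real dead-letter queue:

```python
# Third constraint: bounded retries, then park the order for a human.
error_queue = []  # stand-in for a dead-letter/error queue

def await_unwind_ack(order_id, wait_for_ack, max_retries=3):
    """wait_for_ack(order_id) returns True once OrderUnwindedSuccess arrives."""
    for _ in range(max_retries):
        if wait_for_ack(order_id):
            return "unwound"
    error_queue.append(order_id)  # give up: human intervention needed
    return "needs-intervention"

# The ack never arrives in this run:
result = await_unwind_ack("order-1", wait_for_ack=lambda oid: False)
print(result, error_queue)  # -> needs-intervention ['order-1']
```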

To make the human intervention easier, it is better to have a troubleshooting tool that can dig through all related services/tables/message brokers and piece together an event flow in time order, e.g.

  • at time #, OrderService creates order#;
  • at time #, EventPublisher publishes the event OrderCreated for order#;
  • at time #, CustomerService gets the event OrderCreated for order#;
  • …
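The core of such a tool is just a merge of per-service logs by timestamp. A sketch, with made-up log records standing in for queries against each service’s tables:

```python
# Merge per-service event records into one time-ordered flow.
from heapq import merge

# Each service yields (timestamp, service, description) records,
# already time-sorted within that service (illustrative data).
order_svc_log  = [(1, "OrderService", "creates order#42")]
publisher_log  = [(2, "EventPublisher", "publishes OrderCreated for order#42")]
customer_log   = [(3, "CustomerService", "gets OrderCreated for order#42")]

timeline = list(merge(order_svc_log, publisher_log, customer_log))
for t, service, what in timeline:
    print(f"at time {t}, {service} {what}")
```

`heapq.merge` keeps the overall flow sorted as long as each input log is sorted, which is exactly the per-service guarantee a log table with a timestamp index gives you.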

The complete picture looks like this:

A happy path will be like this:
1) OrderService creates a new order and new OrderEvent

2) EventService queries all "new" events on OrderService
3) EventService publishes all "new" events on OrderService
4) EventService updates "new" events to "published" on OrderService

5) CustomerService gets the event
6) CustomerService processes the event
7) CustomerService updates the credit table and creates a "CreditReserved" event

8) EventService queries all "new" events on CustomerService
9) EventService publishes all "new" events on CustomerService
10) EventService updates "new" events to "published" on CustomerService

11) OrderService gets the "CreditReserved"
12) OrderService updates Order status to "Confirmed"

A sad path will be like this:
1) OrderService times out in getting "CreditReserved"
2) OrderService creates an "OrderUnwinded" event

3) EventService publishes the "OrderUnwinded" event

4) CustomerService handles "OrderUnwinded" event
5) CustomerService sends an "OrderUnwindedSuccess" event

6) EventService publishes the "OrderUnwindedSuccess" event
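OrderService’s side of both paths boils down to a small state machine. A sketch, using the event names from the lists above (the states and the class shape are my own illustration):

```python
# OrderService as a state machine over the happy and sad paths above.
class Order:
    def __init__(self):
        self.status = "NEW"

    def on_event(self, event):
        if event == "CreditReserved" and self.status == "NEW":
            self.status = "CONFIRMED"   # happy path, step 12
        elif event == "Timeout" and self.status == "NEW":
            self.status = "UNWINDING"   # sad path: emit OrderUnwinded here
        elif event == "OrderUnwindedSuccess" and self.status == "UNWINDING":
            self.status = "FAILED"      # sad path completed

happy = Order()
happy.on_event("CreditReserved")

sad = Order()
sad.on_event("Timeout")
sad.on_event("OrderUnwindedSuccess")

print(happy.status, sad.status)  # -> CONFIRMED FAILED
```

A late "CreditReserved" arriving while the order is already "UNWINDING" simply falls through the conditions and is ignored, which is one more place the devil hides.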

When you draw an architecture diagram with boxes and links, it looks so clear and neat. But each box might be very complex. The devil is in the details.  

By the way, I do not think this is a good example of a microservice: in this example, CustomerService is really a PaymentService. In real life, PaymentService is almost always a different service or even a different system, and eventual consistency is to be expected. Perhaps replacing CustomerService with InventoryService and credit with stock would make more sense.