In https://www.nginx.com/blog/event-driven-data-management-microservices/,
Chris uses an example to demonstrate the challenges and solutions in keeping data
consistent across microservices.
The example is: when OrderService creates an order, it needs to check with
CustomerService to see if a customer has enough credit for the order. OrderService
and CustomerService have independent databases, and we do not want to use 2PC
because it is neither scalable nor robust.
One solution is to combine a local transaction with an event-driven approach:
“The Order Service
inserts a row into the ORDER table and inserts an Order Created event into the
EVENT table. The Event Publisher thread or process queries the EVENT table for
unpublished events, publishes the events, and then updates the EVENT table to
mark the events as published.”
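To make the quoted mechanism concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the column names, and the `publish` callback are my own assumptions for illustration; the article does not prescribe them.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical schema: an ORDER table and an EVENT table in the same database.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "total REAL, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "type TEXT, payload TEXT, state TEXT)"
    )


def create_order(conn, order_id, customer_id, total):
    # One local transaction: the order row and the event row commit together
    # or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, 'Pending')",
            (order_id, customer_id, total),
        )
        payload = json.dumps(
            {"order_id": order_id, "customer_id": customer_id, "total": total}
        )
        conn.execute(
            "INSERT INTO events (type, payload, state) "
            "VALUES ('OrderCreated', ?, 'new')",
            (payload,),
        )


def publish_new_events(conn, publish):
    # Query unpublished events, publish each one, then mark it as published.
    rows = conn.execute(
        "SELECT id, type, payload FROM events WHERE state = 'new' ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # A crash between publish() and this update leaves the event 'new',
        # so it will be published again on the next run.
        with conn:
            conn.execute(
                "UPDATE events SET state = 'published' WHERE id = ?", (event_id,)
            )
    return len(rows)
```

In a real deployment `publish` would hand the event to a message broker; here any callable taking `(event_type, payload)` will do.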
This is, however, only half of the picture: how CustomerService handles events is
not included, and how EventPublisher makes sure events get published is not
mentioned either.
EventPublisher gets all “new” events from the DB, publishes them, and updates them
from “new” to “published”. During the update, the DB could go down, or
EventPublisher could go down. In either case, once they are back up, EventPublisher
will have to republish all events still marked “new”. This leads to the first
design constraint: CustomerService must be able to handle duplicated events.
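One common way to tolerate duplicates is to remember which event IDs have already been handled. The sketch below is only an illustration: the `event_id` field, the in-memory dictionaries, and the `outbox` list are my assumptions, and a real service would keep the processed-ID record and the credit update in the same local transaction.

```python
class CustomerService:
    def __init__(self, credit_limits):
        self.available = dict(credit_limits)  # customer_id -> available credit
        self.processed = {}                   # event_id -> result already produced
        self.outbox = []                      # events waiting to be published

    def handle_order_created(self, event):
        event_id = event["event_id"]
        if event_id in self.processed:
            # Duplicate delivery: return the earlier result, no side effects.
            return self.processed[event_id]

        customer_id, total = event["customer_id"], event["total"]
        if self.available.get(customer_id, 0) >= total:
            self.available[customer_id] -= total
            result = "CreditReserved"
        else:
            result = "CreditLimitExceeded"

        self.processed[event_id] = result
        self.outbox.append({"type": result, "order_id": event["order_id"]})
        return result
```

Handling the same “OrderCreated” event twice reserves the credit only once and emits only one reply event.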
CustomerService gets an “OrderCreated” event, handles it, and sends a
“CreditReserved” or “CreditLimitExceeded” event. Due to the fallacies of
distributed computing, this event might not reach OrderService in time.
OrderService can’t keep showing customers “your order is pending, please be
patient…” forever: customers are not that patient, and they will pound on the
refresh button again and again. After a certain time, OrderService will have to
give up and consider the order failed. Since it doesn’t know whether
CustomerService has processed the order, it needs to send an “OrderUnwinded”
event to it. If CustomerService hasn’t processed the order, this is a no-op for
CustomerService; otherwise, CustomerService needs to unreserve the customer’s
credit. In either case, CustomerService needs to send an “OrderUnwindedSuccess”
event to OrderService. This leads to the second design constraint: there needs
to be a compensating mechanism.
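A rough sketch of the compensating side in CustomerService might look like this. The `CreditLedger` class and its fields are hypothetical. Remembering unwound orders, so that an “OrderCreated” event arriving after the unwind does not reserve credit, goes beyond what is described above and is my own assumption.

```python
class CreditLedger:
    def __init__(self, available):
        self.available = dict(available)  # customer_id -> available credit
        self.reservations = {}            # order_id -> (customer_id, amount)
        self.unwound = set()              # orders that have been unwound

    def reserve(self, order_id, customer_id, amount):
        if order_id in self.unwound or order_id in self.reservations:
            return None  # late or duplicate OrderCreated: do nothing
        if self.available.get(customer_id, 0) < amount:
            return "CreditLimitExceeded"
        self.available[customer_id] -= amount
        self.reservations[order_id] = (customer_id, amount)
        return "CreditReserved"

    def unwind(self, order_id):
        # If the order was processed, give the credit back; otherwise this is
        # an empty action. Either way the same reply is sent.
        reservation = self.reservations.pop(order_id, None)
        if reservation is not None:
            customer_id, amount = reservation
            self.available[customer_id] += amount
        self.unwound.add(order_id)
        return "OrderUnwindedSuccess"
```

Because `unwind` removes the reservation before returning, handling a duplicated “OrderUnwinded” event does not release the credit twice.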
But this “OrderUnwindedSuccess” event may again not arrive at OrderService in
time. From a customer's point of view, an order is one transaction: it either
succeeds (the order is created and the credit is reserved) or fails (the order is
not created and the credit is not reserved), even though our system is not
designed to be consistent at all times. If OrderService doesn’t get the
“OrderUnwindedSuccess” event within a certain period of time, it can’t show the
customer “Your order has failed”, because at that moment OrderService doesn’t
know whether the order and the credit are consistent. OrderService might retry a
couple more times, then give up and log this as an exception somewhere (an error
queue, for example), and human intervention will be needed to sort it out. This
leads to the third design constraint: there needs to be a mechanism to handle
exceptions.
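The retry-then-escalate behaviour could be sketched roughly as follows. The function name, the two callbacks, the status strings, and the shape of the error-queue entry are all assumptions made for illustration.

```python
def unwind_with_retries(send_unwind, wait_for_ack, error_queue, order_id,
                        max_attempts=3, timeout_s=5.0):
    """Send OrderUnwinded until OrderUnwindedSuccess arrives or attempts run out.

    send_unwind(order_id) publishes the OrderUnwinded event.
    wait_for_ack(order_id, timeout_s) returns True if OrderUnwindedSuccess
    arrived within the timeout, False otherwise.
    """
    for _ in range(max_attempts):
        send_unwind(order_id)
        if wait_for_ack(order_id, timeout_s):
            # Order and credit are known to be consistent again, so it is
            # safe to tell the customer that the order has failed.
            return "Failed"
    # Still unknown after all attempts: hand it over to a human.
    error_queue.append(
        {"order_id": order_id,
         "reason": "no OrderUnwindedSuccess",
         "attempts": max_attempts}
    )
    return "NeedsIntervention"
```

Resending “OrderUnwinded” is only safe because CustomerService is already required to tolerate duplicated events.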
To make human intervention easier, it helps to have a troubleshooting tool that
can dig through all related services, tables, and message brokers and piece
together an event flow in chronological order, e.g.
- at time #, OrderService creates an order#;
- at time #, EventPublisher publishes the event OrderCreated for order#;
- at time #, CustomerService gets the event OrderCreated for order#;
- …
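As a toy illustration of such a tool, the sketch below merges per-service log records into one timeline for a single order. The record fields (`time`, `service`, `action`, `order_id`) are assumptions, and the sketch takes for granted that timestamps from different services are comparable, which clock skew can break in practice.

```python
from itertools import chain


def build_timeline(order_id, *sources):
    # Collect the records for one order from every source, then sort by time.
    records = [r for r in chain(*sources) if r["order_id"] == order_id]
    records.sort(key=lambda r: r["time"])
    return [f'at time {r["time"]}, {r["service"]} {r["action"]}' for r in records]
```

Each source can be anything iterable, for example rows read from a service's EVENT table or messages pulled from a broker's log.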
The complete picture looks like this:
A happy path will be like this:
1) OrderService creates a new order and new OrderEvent
2) EventService queries all "new" events on OrderService
3) EventService publishes all "new" events on OrderService
4) EventService updates "new" events to "published" on OrderService
5) CustomerService gets the event
6) CustomerService processes the event
7) CustomerService updates credit table and creates a "CreditReserved" event
8) EventService queries all "new" events on CustomerService
9) EventService publishes all "new" events on CustomerService
10) EventService updates "new" events to "published" on CustomerService
11) OrderService gets the "CreditReserved" event
12) OrderService updates Order status to "Confirmed"
A sad path will be like this:
1) OrderService times out in getting "CreditReserved"
2) OrderService creates "OrderUnwinded" event
3) EventService publishes the "OrderUnwinded" event
4) CustomerService handles "OrderUnwinded" event
5) CustomerService sends "OrderUnwindedSuccess" event
6) EventService publishes the "OrderUnwindedSuccess" event
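Seen from OrderService, the two paths amount to a small state machine over the order status. The sketch below is one possible reading: apart from "Confirmed", the status names and the "Timeout" pseudo-event are my own assumptions.

```python
# (current status, incoming event) -> next status
TRANSITIONS = {
    ("Pending", "CreditReserved"): "Confirmed",
    ("Pending", "CreditLimitExceeded"): "Rejected",
    ("Pending", "Timeout"): "Unwinding",
    ("Unwinding", "OrderUnwindedSuccess"): "Failed",
    ("Unwinding", "Timeout"): "NeedsIntervention",
}


def next_status(status, event):
    # Events that do not apply to the current status are ignored, e.g. a
    # "CreditReserved" that arrives after the order has started unwinding.
    return TRANSITIONS.get((status, event), status)
```

Ignoring a late "CreditReserved" while unwinding is safe only because the unwind itself makes CustomerService release the credit.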
When you draw an architecture diagram with boxes and links,
it looks so clear and neat. But each box might be very complex. The devil is in
the details.
By the way, I do not think this is a good example of microservices. In this
example, CustomerService is really a PaymentService. In real life, PaymentService
is almost always a different service or even a different system, and eventual
consistency is to be expected. Perhaps replacing CustomerService with
InventoryService and credit with stock would make more sense.