The Aggregate is the idea that has made DDD popular in the Microservice community. How big
should a Microservice be? What should be included in one Microservice and not
another? The common recommendation is that one Aggregate should live in one Microservice.
Some terms defined in the book need to be familiar first.
Entity: an entity is an object that keeps its identity throughout its lifecycle, for example Customer or Employee.
Value: a value object is an object that has no identity. For example, the Address of a Customer is composed of Country,
City, and Street and has no meaningful identity of its own; in a mapping application, however,
an Address might be modeled as an Entity.
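The difference between the two shows up most clearly in how equality is defined. The following is a minimal sketch, not code from the book: the field names and the customerId identity are assumptions made for illustration.

```java
import java.util.Objects;

// Value object: no identity, so two addresses with equal fields are interchangeable.
final class Address {
    private final String country;
    private final String city;
    private final String street;

    Address(String country, String city, String street) {
        this.country = country;
        this.city = city;
        this.street = street;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address other = (Address) o;
        return country.equals(other.country)
                && city.equals(other.city)
                && street.equals(other.street);
    }

    @Override
    public int hashCode() {
        return Objects.hash(country, city, street);
    }
}

// Entity: equality follows the identity, not the current attribute values.
final class Customer {
    private final String customerId; // hypothetical identity field
    private Address address;

    Customer(String customerId, Address address) {
        this.customerId = customerId;
        this.address = address;
    }

    String getCustomerId() { return customerId; }
    Address getAddress() { return address; }

    // The customer remains the same customer after moving.
    void moveTo(Address newAddress) { this.address = newAddress; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Customer && customerId.equals(((Customer) o).customerId);
    }

    @Override
    public int hashCode() { return customerId.hashCode(); }
}
```

Two Address objects built from the same country, city, and street compare equal, while two Customer objects compare equal only when they share the same customerId, whatever their addresses are.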
Aggregate: An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each AGGREGATE has a root and a boundary. The boundary defines what is inside the AGGREGATE. The root is a single, specific ENTITY contained in the AGGREGATE. The root is the only member of the AGGREGATE that outside objects are allowed to hold references to, although objects within the boundary may hold references to each other. ENTITIES other than the root have local identity, but that identity needs to be distinguishable only within the AGGREGATE, because no outside object can ever see it out of the context of the root ENTITY.
An Aggregate needs to follow these rules:
- The root ENTITY has global identity and is ultimately responsible for checking invariants.
- ENTITIES inside the boundary have local identity, unique only within the AGGREGATE.
- Nothing outside the AGGREGATE boundary can hold a reference to anything inside, except to the root ENTITY.
- Only AGGREGATE roots can be obtained directly with database queries. All other objects must be found by traversal of associations.
- Objects within the AGGREGATE can hold references to other AGGREGATE roots.
- A delete operation must remove everything within the AGGREGATE boundary at once.
- When a change to any object within the AGGREGATE boundary is committed, all invariants of the whole AGGREGATE must be satisfied.
- When a change spans across Aggregate boundaries, invariants spanning across boundaries might not be enforced at all times.
If we implement an Aggregate as a Microservice, the last two
rules apply perfectly: transactions across Microservices are
not expected to be consistent at all times, and the usual recommendation is
“eventual consistency” (check The
devil is in the details – eventual consistency).
Take the following example:
A Car is obviously an Entity with a global identity. Outside of
the Car context, we probably don’t care about a Tire: no one would
want to query the database to find out which Car a particular Tire is
attached to. Our interest in a Tire comes through its Car, and Tire identification is
meaningful only within the Car context. You
might, however, want to track an Engine independently of a Car; if so, Engine is
outside of the Car context.
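As a rough sketch of how the Car example maps onto the Aggregate rules, the code below makes Car the root and keeps Tire inside the boundary. The WheelPosition enum, the vin field, and the mileage bookkeeping are assumptions added for illustration; they are not taken from the book.

```java
import java.util.EnumMap;
import java.util.Map;

enum WheelPosition { FRONT_LEFT, FRONT_RIGHT, REAR_LEFT, REAR_RIGHT }

// Internal entity: its identity (the position) only means something on one Car.
final class Tire {
    private final WheelPosition position;
    private int mileageKm;

    Tire(WheelPosition position) { this.position = position; }

    WheelPosition getPosition() { return position; }
    int getMileageKm() { return mileageKm; }
    void addMileage(int km) { mileageKm += km; }
}

// Aggregate root: the only object that outside code holds on to.
final class Car {
    private final String vin; // global identity (hypothetical field)
    private final Map<WheelPosition, Tire> tires = new EnumMap<>(WheelPosition.class);

    Car(String vin) {
        this.vin = vin;
        // Invariant: every wheel position carries exactly one tire.
        for (WheelPosition p : WheelPosition.values()) {
            tires.put(p, new Tire(p));
        }
    }

    String getVin() { return vin; }

    // Changes to tires go through the root, which keeps the invariant intact.
    void replaceTire(WheelPosition position) {
        tires.put(position, new Tire(position));
    }

    void drive(int km) {
        if (km < 0) throw new IllegalArgumentException("distance must not be negative");
        for (Tire tire : tires.values()) tire.addMileage(km);
    }

    // Exposes a value rather than the Tire itself, so no outside reference escapes.
    int tireMileageKm(WheelPosition position) {
        return tires.get(position).getMileageKm();
    }
}
```

Because callers can only reach a Tire by asking the Car about a position, the tire's identity never has to be unique beyond that one Car.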
Repository:
A repository functions like a DAO: it saves objects to and restores them from
storage (usually a database). In DDD, however, you should only provide a Repository for the Aggregate root. In the Car context, there
will only be a CarRepository, whose getCarById() method returns a Car with its proper attributes, for example a Car
with 4 wheels in the correct positions and 4 tires. There will be no TireRepository or WheelRepository.
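A repository for the root only might look like the sketch below. The getCarById() name comes from the text above; the Optional return type, the save() method, the in-memory implementation, and the stripped-down Car stand-in are assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal stand-in for the Car aggregate root; wheels and tires are reduced to a count.
final class Car {
    private final String id;
    private final int tireCount;

    Car(String id, int tireCount) {
        this.id = id;
        this.tireCount = tireCount;
    }

    String getId() { return id; }
    int getTireCount() { return tireCount; }
}

// The only repository in the Car context: it stores and returns whole aggregates.
interface CarRepository {
    void save(Car car);
    Optional<Car> getCarById(String id);
}

// In-memory implementation, standing in for a database-backed one.
final class InMemoryCarRepository implements CarRepository {
    private final Map<String, Car> store = new HashMap<>();

    @Override
    public void save(Car car) { store.put(car.getId(), car); }

    @Override
    public Optional<Car> getCarById(String id) {
        return Optional.ofNullable(store.get(id));
    }
}
```

There is deliberately no method that returns a tire on its own: a caller loads the Car and asks it about its tires.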
The book provides a more extended example in
Chapter 7 (Using the language: An Extended Example). The example is about a
delivery company delivering cargoes for customers:
The business logic is:
- Multiple Customers are involved in a Cargo with different roles, shipper, receiver, payer etc.
- A Delivery Specification defines the goal of shipping a Cargo.
- A Handling Event is an action taken with Cargo, such as loading it to a ship, or clearing it through customs.
- A Carrier Movement represents a trip by a Carrier (a ship or a truck) from one Location to another.
- A Delivery History represents what has happened to a Cargo.
With this model, how should we define
boundaries (and Aggregates)?
Cargo, Delivery Specification, and Delivery History obviously belong together: Delivery Specification and Delivery History have no interesting identity on
their own, and our interest in them is through Cargo.
Customers, Locations, and Carrier Movements
stand on their own with global identities: we want to look them up
directly, not through a Cargo. So each will be the root of its own Aggregate.
Handling Event is a tricky one. It can be included
in the Cargo context; if
so, we reach it through Cargo -> Delivery History ->
Handling Event. (Remember, we can only query the Aggregate root directly; other objects must be reached
by traversing associations.) On the other hand, a Carrier Movement is shared by many Cargoes, and we might want to know
which Handling Events must be carried out for a particular Carrier Movement. In this sense,
Handling Event is
meaningful outside of the Cargo context. Handling Events can also happen outside of a Carrier
Movement, for example
clearing a cargo through customs.
If it is hard for you to decide, think about
how this application will be used:
- There should be a “booking function” that allows customers to book a cargo delivery; they can specify a delivery specification and track the delivery history. The Cargo context will be used mostly here.
- There should be a “logging function” that allows operations staff to log handling events for all cargoes.
From these business functions, we can see that Handling
Event stands on
its own.
With the above analysis, we arrive at the
following Aggregates: Cargo, Delivery History, and Delivery Specification form one Aggregate with Cargo as the Aggregate root; each of the other entities is the root of its own Aggregate.
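A minimal sketch of the Cargo Aggregate under these boundaries follows. Delivery Specification lives inside Cargo, while Customers and Locations are separate Aggregates. Referring to them by identity (customerId, destinationCode) instead of by object reference is an assumption chosen here because it suits a Microservice split; the rules above also permit holding a reference to another root directly.

```java
import java.util.HashMap;
import java.util.Map;

// Inside the Cargo boundary: reachable only through its Cargo.
final class DeliverySpecification {
    private final String destinationCode; // identity of a Location root (hypothetical)

    DeliverySpecification(String destinationCode) {
        this.destinationCode = destinationCode;
    }

    String getDestinationCode() { return destinationCode; }
}

// Aggregate root for Cargo and its Delivery Specification.
final class Cargo {
    private final String trackingId; // global identity (hypothetical field)
    private DeliverySpecification deliverySpecification;
    // Role (shipper, receiver, payer, ...) mapped to the identity of a Customer root.
    private final Map<String, String> customerIdByRole = new HashMap<>();

    Cargo(String trackingId, String destinationCode) {
        this.trackingId = trackingId;
        this.deliverySpecification = new DeliverySpecification(destinationCode);
    }

    String getTrackingId() { return trackingId; }

    String getDestinationCode() {
        return deliverySpecification.getDestinationCode();
    }

    // The specification is replaced through the root, never edited from outside.
    void changeDestination(String destinationCode) {
        this.deliverySpecification = new DeliverySpecification(destinationCode);
    }

    void assignCustomer(String role, String customerId) {
        customerIdByRole.put(role, customerId);
    }

    // Returns null when no customer holds the given role.
    String customerIdFor(String role) {
        return customerIdByRole.get(role);
    }
}
```

Delivery History is left out of this sketch on purpose, since where it gets its Handling Events from is exactly the question the rest of the example works through.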
Notice that in this diagram there is no HandlingEventRepository, even though Handling Event is an Aggregate root. Let us play along to see
what happens. Without its own Repository, a Handling Event can only be saved and retrieved through Cargo:
public static HandlingEvent newLoading(
        Cargo cargo, CarrierMovement loadedOnto, Date timeStamp) {
    HandlingEvent event = new HandlingEvent(cargo, LOADING_EVENT, timeStamp);
    event.setCarrierMovement(loadedOnto);
    // The event becomes reachable only through the Cargo's Delivery History.
    cargo.getDeliveryHistory().addEvent(event);
    return event;
}
What is wrong with this
approach?
- It is cumbersome to maintain this relationship.
- If other users are modifying the Cargo while an event is being added to it, the transaction that adds the event will fail. The “logging function” is used by operations staff and needs to be efficient.
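The contention problem can be illustrated with a small optimistic-locking sketch. Everything here is an assumption made for illustration: the version counter, the CargoSnapshot and CargoStore classes, and the use of ConcurrentModificationException stand in for whatever locking scheme the real persistence layer uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Immutable snapshot of a Cargo as one user loaded it.
final class CargoSnapshot {
    final String trackingId;
    final long version; // hypothetical optimistic-lock counter
    final List<String> deliveryHistory;

    CargoSnapshot(String trackingId, long version, List<String> deliveryHistory) {
        this.trackingId = trackingId;
        this.version = version;
        this.deliveryHistory = Collections.unmodifiableList(new ArrayList<>(deliveryHistory));
    }

    // Returns a modified copy; nothing is stored until commit().
    CargoSnapshot withEvent(String event) {
        List<String> copy = new ArrayList<>(deliveryHistory);
        copy.add(event);
        return new CargoSnapshot(trackingId, version, copy);
    }
}

final class CargoStore {
    private final Map<String, CargoSnapshot> rows = new HashMap<>();

    void insert(String trackingId) {
        rows.put(trackingId, new CargoSnapshot(trackingId, 0, new ArrayList<>()));
    }

    CargoSnapshot load(String trackingId) { return rows.get(trackingId); }

    // Succeeds only if nobody else committed since the snapshot was loaded.
    void commit(CargoSnapshot changed) {
        CargoSnapshot current = rows.get(changed.trackingId);
        if (current.version != changed.version) {
            throw new ConcurrentModificationException(
                    "Cargo " + changed.trackingId + " was modified by another transaction");
        }
        rows.put(changed.trackingId,
                new CargoSnapshot(changed.trackingId, changed.version + 1, changed.deliveryHistory));
    }
}
```

If two users load the same Cargo and both try to commit, the second commit is rejected because its version is stale. When every handling event has to be written through the Cargo, the logging function runs into this on any busy Cargo.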
To address these issues,
we add a “Handling Event Repository”:
This approach has its own problem: now that Handling
Event is saved independently
of Cargo, there is
no guarantee that Cargo will get an up-to-date view of its history. This speaks to the last
rule of Aggregates: “When a change spans across Aggregate boundaries, invariants
spanning across boundaries might not be enforced at all times.”
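With its own repository, a Handling Event can be written without touching the Cargo at all, and the history becomes a query. The sketch below assumes an in-memory repository, a cargoTrackingId reference by identity, and a findByCargoTrackingId() query method; none of these names come from the book.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Root of its own Aggregate; refers to the Cargo by identity only.
final class HandlingEvent {
    private final String cargoTrackingId;
    private final String type;
    private final Instant completedAt;

    HandlingEvent(String cargoTrackingId, String type, Instant completedAt) {
        this.cargoTrackingId = cargoTrackingId;
        this.type = type;
        this.completedAt = completedAt;
    }

    String getCargoTrackingId() { return cargoTrackingId; }
    String getType() { return type; }
    Instant getCompletedAt() { return completedAt; }
}

// In-memory stand-in for a database-backed repository.
final class HandlingEventRepository {
    private final List<HandlingEvent> events = new ArrayList<>();

    // Saving an event neither loads nor locks the Cargo.
    void save(HandlingEvent event) { events.add(event); }

    // The delivery history of one Cargo, oldest event first, built by query.
    List<HandlingEvent> findByCargoTrackingId(String trackingId) {
        return events.stream()
                .filter(e -> e.getCargoTrackingId().equals(trackingId))
                .sorted(Comparator.comparing(HandlingEvent::getCompletedAt))
                .collect(Collectors.toList());
    }
}
```

The trade-off named above is visible here: a history that was queried a moment ago knows nothing about an event saved afterwards, so the Cargo's view is only eventually up to date.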
Now you can see why DDD sparks so much interest
in the Microservice community. Microservices are not easy (check Start
with Microservice (in mind) - I think Martin Fowler is wrong). A
Microservice architecture has 3 layers. The “infrastructure layer” and the “application
infrastructure layer” are largely technical and are well served by many open
source tools today, but a well-behaved “application layer” requires you to construct
your models carefully so that there are minimal interdependencies among them. If
the boundaries of Microservices are not designed carefully, there will be many
dependencies among them, and maintaining data consistency will be
hell. Aggregates
give you a way to reason about those boundaries.