The Watchmaker analogy goes back 200+ years. It asks: if you were walking in the woods and found a watch lying on the ground, what would you think? “Did it grow organically or was there a designer?” You might examine it externally and be fascinated by the seconds, minutes, hours, days, weeks, months, moon phases and tides being displayed. You then might test its accuracy and marvel at its continuous precision. Then you might open it up and be even more amazed that the displays are all orchestrated by a set of gears, each integrated with the others and performing its purpose of tracking and displaying a portion of the current time on the face. Almost everyone would conclude there had to be an overall designer.
Let’s continue the watchmaker analogy with data mesh, specifically considering data products in the context of gears. Each gear has a unique purpose. For example, there is no need for two gears to keep track of seconds. Just as each gear is responsible for tracking its element of time, each data product is responsible for tracking its own set of data elements. Each watch gear has a specific shape and size. Within the data mesh, each data product has a specific shape and size, called its bounded context. This enables each data product to have specific purposes and to reduce (ideally eliminate) the need for two data products to manage the same data.
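As a minimal sketch of the "no two gears track seconds" idea, the check below flags data elements that more than one data product claims to own. The product names, element names and the find_overlaps helper are all hypothetical, invented for illustration; a real mesh would more likely read ownership from a catalog or metadata service.

```python
from collections import defaultdict

def find_overlaps(products):
    """Return {element: [owning products]} for elements claimed by 2+ products."""
    owners = defaultdict(list)
    for product, elements in products.items():
        for element in elements:
            owners[element].append(product)
    # Keep only elements with more than one owner; sort names for stable output.
    return {e: sorted(p) for e, p in owners.items() if len(p) > 1}

# Hypothetical data products and the data elements each one claims to own.
PRODUCTS = {
    "orders": {"order_id", "order_total"},
    "customers": {"customer_id", "customer_name"},
    "billing": {"invoice_id", "customer_name"},  # overlaps with "customers"
}

overlaps = find_overlaps(PRODUCTS)
print(overlaps)
```

An empty result would mean every element has a single owning data product, which is the ideal the bounded context aims for.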
Each gear has cogs to enable it to interface with other gears and watch components, e.g., the mainspring. Similarly, data products have APIs (and abstracted views for co-located and connected data products) to interface with other data products and users.
A watch gear only needs to be concerned with the portion of time it is to keep track of and display. But what happens when we move to a different time zone? We need the ability to set the watch time to the new time zone. This can be easily done by spinning the crown of the watch forward or backward to the appropriate hour. All gears must work in concert to adjust all the dials and displays synchronously to reflect (aka cast) the appropriate time elements. Some watch users may care a great deal about moon phases and tides, while others may want to know the day of the week, but it all must be reflected accurately within the current time zone.
Again, there is a similar need within data mesh: the ability to recast data as of any given point in time. This need arises when hierarchies change over time. The original plan is built from the original hierarchy, likely at a summary level. However, actuals are tracked against the current hierarchy, which can change over time. Many data products must be able to recast data, as needed by the business, to compare plans to actuals. If business users can recast some data products to a point in time but not others, then the results are meaningless. Even simply casting data via current hierarchies can cause issues when data products refresh data at different intervals, e.g., near real-time, intermittently, or nightly batch. There need to be clear standards and best practices around change data capture (CDC), slowly changing dimensions (SCDs) and service level goals (SLGs). These need to be baked into the design of all data products, enabling time flexibility and synchronicity for all.
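To make recasting concrete, here is a minimal sketch that assumes the hierarchy is stored as a Type 2 slowly changing dimension, with one row per store per validity period. The HierarchyRow type, the store and region names, the dates and the amounts are all hypothetical. The same actuals are rolled up twice: once under the hierarchy in force when the plan was made, and once under the current hierarchy.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class HierarchyRow:
    """One SCD Type 2 row: which region a store rolled up to, and when."""
    store: str
    region: str
    valid_from: date          # inclusive
    valid_to: Optional[date]  # exclusive; None means still current

def region_as_of(rows, store, as_of):
    """Return the region a store belonged to on the given date."""
    for r in rows:
        still_valid = r.valid_to is None or as_of < r.valid_to
        if r.store == store and r.valid_from <= as_of and still_valid:
            return r.region
    raise LookupError(f"no hierarchy row for {store} on {as_of}")

def recast(actuals, rows, as_of):
    """Sum (store, amount) actuals by region, using the hierarchy as of a date."""
    totals = defaultdict(float)
    for store, amount in actuals:
        totals[region_as_of(rows, store, as_of)] += amount
    return dict(totals)

# Hypothetical history: store_1 moved from East to Central on 2024-01-01.
HIERARCHY = [
    HierarchyRow("store_1", "East", date(2023, 1, 1), date(2024, 1, 1)),
    HierarchyRow("store_1", "Central", date(2024, 1, 1), None),
    HierarchyRow("store_2", "East", date(2023, 1, 1), None),
]
ACTUALS = [("store_1", 100.0), ("store_2", 50.0)]

plan_view = recast(ACTUALS, HIERARCHY, date(2023, 6, 30))     # plan-time hierarchy
current_view = recast(ACTUALS, HIERARCHY, date(2024, 6, 30))  # current hierarchy
print(plan_view, current_view)
```

The comparison is only meaningful if every data product involved can answer the same as-of question, which is why the validity columns belong in a shared standard rather than being left to each domain team.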
The data mesh approach enables data products to continuously evolve over time to meet new business needs. Within the data mesh, data governance responsibilities are pushed down to the domain team to own and implement within their data products, e.g., data quality, data integration, security, schema design and metadata. These concepts are clearly understood within organizations that have been doing centralized analytic data processing for years, providing enterprise data management via data stewards. As ownership moves to business areas via domain teams, those teams will need to ensure they have the business and technical skillsets to manage the bounded context of their data products and securely expose the required data to other domains and users.
Decentralization of data products can enable agility. However, decentralization without a master design leads to chaos and point-solution sprawl. We can see the importance of the watchmaker role in providing global standards, policies and best practices. Domain teams must clearly know their roles and responsibilities, so the data products they build bring value to the data mesh community of users.