This is the third article in a series by Artemio Mendoza (Sullexis Technical Director) and Angela Holmes.
Most MDM solutions will be based on a packaged software product. However, there are cases when a custom solution is needed. For those pursuing a custom path, we have advice for you: don’t make these common mistakes.
Wanted: A Single Source of Truth
Many organizations embark on Master Data Management as a means to achieve a Single Source Of Truth (SSOT). Every data architect dreams of having every data element stored exactly once, and having that record be perfect and complete. Any other system using this data refers to this single instance. In this dream world, when the record is updated, it is updated once and then made available to all of the systems that need to refer to it.
Most organizations have multiple systems that need access to data that relates to the same entities (e.g. customer). Often these are off the shelf software packages with their own data schemas that necessitate they maintain their own copy of the data. It is not easy and not usually recommended to modify these packages to source data elsewhere. In most enterprise landscapes, the customer record is stored by an ERP system (such as SAP or Oracle e-Business Suite), a CRM system and a warehouse dispatch system. Each of these systems contains valid data that is part of the customer record.
It is a good dream, but the Repository model of MDM (as it is known), is not real life in the vast majority of organizations. It is just too difficult to modify every application to source critical data externally. For most organizations, the Registry model, in which critical data is maintained only in the systems of record and an MDM repository updates itself using complex and distributed queries, is also not feasible. This model is complex and presents many performance challenges. It is difficult to maintain an MDM model that is tightly coupled with and dependent on so many external systems.
The Hybrid Approach and The Golden Record
The most common approach for MDM is the Hybrid model. The systems of record keep their local copies of data. The MDM keeps a repository of the data keys and also replicates the most important attributes for each master entity. At first this approach seems the easiest. But this approach has challenges that the other approaches do not. This model includes replicated data and must have a way to manage those duplicates. And the real challenge is that the duplicates are not exact duplicates; each is a version that is partially accurate.
The golden record concept acknowledges that there will be many versions of the same record across multiple systems. This concept also acknowledges that no single instance of this record is perfect. Each instance may have attributes that are more correct or complete than others. As such, the golden record needs to be a compilation that is created from all of the duplicate records representing the same real-world entity.
It is critical that you design your custom MDM solution to properly accommodate the golden record concept. We will tell you how.
Example of a Golden Record
A golden record is a selection of attributes among a collection of records that represent the same object.
Let’s consider an example for a customer domain. In this case, Robert Maloney has been entered as a customer in both the Credit System of a financial services organization and in the Check System. A customer record was created in each system with variations in the data, as shown below.
System | Real Life | Check System | Credit System |
Name | Robert Maloney | Robert M. | Bob Maloney |
SSN | 123-45-6789 | 123-45-6789 | 123-45-6789 |
Client ID | N/A | 123 | 456 |
As every MDM practitioner knows, these two records should be grouped together as a single one. However, what not everyone knows is how to implement it efficiently in a custom solution, not only to identify the records as a single object, but also, to allow the data stewards to review the information, perform changes and re-group if needed.
Mistake #3: Fail to design for duplicate record groups
As our example has illustrated, there will be duplicates in an MDM solution. And these duplicates are not usually exact duplicates. The duplicate records need to be matched, grouped and the attribute values for the golden record need to be pulled from the record with the most accuracy.
In a database design, a golden key is needed to identify the golden record. However, we recommend a database design that will store and properly plan for those valuable duplicates. Our design maintains all of the rows of data that relate to the golden record in a separate table, grouped together by a cluster key, as shown below.
In this example, the record from the Check system and the Credit system that both relate to the same customer are grouped together using the cluster key (ClusterID = 7777). The golden record is linked to this cluster of duplicates using this cluster key. This grouping of the duplicates enables the records to be presented to the data stewards for review and analysis as a cohesive representation of the same real world object, Robert Maloney.
In this design, the golden record has its own key (MasterKey = 999999). The golden record has its own set of master attributes that can be selectively assigned based on algorithms using values from the records in the cluster and/or by the data stewards.
This table design enables a simplified view to present the candidates for a golden record and it allows for efficient maintenance of those records.
Conclusion
This is the 3rd and final article in the series. We can offer you this advice because we have learned the hard way: we have either made these mistakes ourselves or we have had to clean up someone else’s mistakes. We hope you can learn from our experience. If we at Sullexis can help you, just call us . Best wishes to you for success in your custom MDM solutions!