This is the second article in a series by Artemio Mendoza (Sullexis Technical Director) and Angela Holmes (Sullexis Senior Consultant).
Most MDM solutions will be based on a packaged software product. However, there are cases when a custom solution is needed. If you choose to build instead of buy, we have advice for you: don’t make these common mistakes.
DWH and BI Challenges Create the Need for MDM
Most organizations have years of experience with Data Warehouse (DWH) and Business Intelligence (BI) solutions before beginning an MDM solution. The challenges those DWH and BI solutions run into often drive the need for MDM. DWH solutions frequently contain inconsistent lists of customers, prospects, employees, suppliers and products. Prior to MDM, the DWH team had to reconcile the data and produce a master list to drive BI, but it was limited by the quality of the data it received. The golden rule of DWH is: do not modify data. In the DWH world, the data received from the systems of record must be trusted.
Enter MDM. In the MDM world, it is accepted that the data from the systems of record (ERP, CRM, etc.) is not perfect. The goal of MDM is to modify and correct that data. More robust and extensive processes are required to gather, cleanse, conform, reconcile, validate, authorize and publish the agreed-upon master data. These processes require more extensive tools than the traditional extract, transform, and load (ETL) used by a DWH, and they need to support ownership of the data by the business, with IT as custodian.
Mistake #2: Treating Your MDM Solution like a Data Warehouse
Most organizations embarking on an MDM solution will involve their DWH and BI teams. This is good. These teams are intimately familiar with the current data structures and challenges. But they must not repeat the design patterns of the existing solutions. New design approaches are needed. A common practice in a DWH is to delete and reload all domain data on a regular basis. Do not do this in an MDM solution. Here’s why.
- The business needs to own the master data. If the data is continually re-loaded (overwritten), the business will not be able to play the critical role of data review. Instead, IT may try to automate the business rules by hardcoding them within the ETL process so they are applied on every fresh load of data. This leads to a more time-consuming load process, an ETL process that is difficult to maintain, and, worst of all, a process owned by the ETL developers rather than the business.
- If all data is continually deleted and reloaded, each load must be treated as the entire domain universe. Hence, any master data that is not present in the latest load must be treated as deleted from the domain. How deleted master data is handled depends on the implementation. Ideally, it is not physically deleted but logically flagged as no longer in use (if you must physically delete, at least record the delete in an audit log). Complexity arises when this master data reappears in a later load and must be reinstated in the domain; a sketch of this handling follows this list. Managing deleted and reinserted master data within a domain is complex and time-consuming. Avoid this scenario.
- Finally, repeatedly deleting/truncating and reloading a big domain can introduce data fragmentation. Unless your organization has moved to solid-state storage, this can lead to performance degradation and wasted storage. It is likely your MDM solution will grow over time, both in the amount of data and in the number of data domains it maintains. Plan for growth and consider performance and scalability.
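To illustrate the bookkeeping described in the second bullet above, here is a minimal Python sketch of logical deletion and reactivation during a full reload. The record layout, field names, and audit structure are hypothetical; a real MDM product would provide its own mechanisms for this.

```python
from datetime import datetime, timezone

def reconcile_full_load(master, load_keys, audit_log):
    """Illustrative handling of a full reload: records absent from the load are
    logically flagged inactive (never physically deleted), and records that
    reappear in a later load are reactivated. Both actions are audited.

    `master` maps a business key to a record dict with an 'active' flag;
    `load_keys` is the set of business keys present in the latest full load.
    (Names and structures are hypothetical, for illustration only.)
    """
    now = datetime.now(timezone.utc).isoformat()

    for key, record in master.items():
        if record["active"] and key not in load_keys:
            record["active"] = False                      # logical delete, not physical
            audit_log.append({"key": key, "action": "deactivated", "at": now})
        elif not record["active"] and key in load_keys:
            record["active"] = True                       # reappeared in a later load
            audit_log.append({"key": key, "action": "reactivated", "at": now})

# Example: customer C2 disappears from the latest load, C3 comes back.
master = {
    "C1": {"active": True},
    "C2": {"active": True},
    "C3": {"active": False},
}
audit = []
reconcile_full_load(master, load_keys={"C1", "C3"}, audit_log=audit)
print(master)
print(audit)
```

Even in this toy form, the point should be clear: the history of every record survives the load, and every state change is attributable after the fact.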
Only Load Data that has Changed
Perform an initial load per source from your Systems of Entry (SOE) and your Systems of Record (SOR). If needed, re-load this data a few times to optimize the initial ETL rules. Once the data is cleanly loaded into the MDM solution, start the data quality process. Fix what can be fixed by applying known business rules, and present everything that cannot be resolved to the data stewards for resolution. Common scenarios for the data stewards include suspected bad data, missing information, and potential duplicate records. While the data stewards perform the lengthy task of resolving data concerns, avoid modifications to the data. After the data stewards have reached their decisions, create the golden records. From then on, extract only the data from the SOE and SOR systems that has changed.
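As a rough illustration of splitting automated fixes from steward review, here is a minimal Python sketch. The record layout, the sample rule, and the validation checks are hypothetical stand-ins for your organization's actual business rules.

```python
def triage_records(records, rules):
    """Apply known business rules where possible; queue everything else for the
    data stewards. (Hypothetical record layout and checks, shown only to
    illustrate the split between automated fixes and steward review.)
    """
    fixed, steward_queue = [], []
    for rec in records:
        rec = dict(rec)
        for rule in rules:
            rec = rule(rec)                        # automated fixes from known rules
        issues = []
        if not rec.get("name"):
            issues.append("missing name")
        if rec.get("email") and "@" not in rec["email"]:
            issues.append("suspected bad email")
        if issues:
            steward_queue.append((rec, issues))    # needs a human decision
        else:
            fixed.append(rec)
    return fixed, steward_queue

# Example rule: standardize country values before validation.
def normalize_country(rec):
    rec["country"] = {"USA": "US", "U.S.": "US"}.get(rec.get("country"), rec.get("country"))
    return rec

fixed, queue = triage_records(
    [{"name": "Acme", "email": "info@acme.com", "country": "USA"},
     {"name": "", "email": "bad-address", "country": "US"}],
    rules=[normalize_country],
)
print(len(fixed), "auto-fixed;", len(queue), "queued for steward review")
```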
The 5% Rule for Data Changes
The frequency with which you should evaluate the data from the SOE or SOR systems for changes depends on the volatility of your data. We recommend performing a comparison once roughly 5% of the source data has changed. For very volatile data, this may mean daily comparisons; for less volatile data, it may mean weekly or monthly comparisons.
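A minimal sketch of the 5% trigger, assuming you can count changed rows per source (the function name, threshold, and counts below are illustrative and should be tuned to your own domain's volatility):

```python
def should_run_comparison(changed_count, total_count, threshold=0.05):
    """Trigger a source-vs-last-extract comparison once roughly 5% of the
    source rows have changed."""
    if total_count == 0:
        return False
    return changed_count / total_count >= threshold

# Example: 6,000 changed rows out of 100,000 crosses the 5% threshold.
print(should_run_comparison(6_000, 100_000))   # True
print(should_run_comparison(2_000, 100_000))   # False
```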
How to Know When Data has Changed
A simple answer would be: “Compare every attribute of the golden record with the most recent version of the data from the SOE/SOR, and send the differences to the data stewards for review.” This is not the correct answer. It is not only possible but likely that the data stewards will modify the golden record data to purposely make it different from the data coming from the source systems. You do not want to show your data stewards discrepancies they have already approved every time you run a change comparison. To avoid this, maintain a version of the last extracted source data and compare the newly extracted source data against it. Only send the changes between these two versions of the source data to your data stewards. Remember that this list of changes should not automatically trigger changes to the golden records; the golden records are modified only on instruction/approval from the data stewards. This delta load approach ensures the data stewards remain in control of the master data.
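Here is a minimal Python sketch of that snapshot-to-snapshot comparison, assuming each extract is keyed by a business key and holds a dictionary of attributes per record; the shapes and field names are hypothetical.

```python
def diff_source_snapshots(previous, current):
    """Compare the newly extracted source data against the previously extracted
    snapshot (not against the golden records) and return only the differences
    for data-steward review. Snapshots map business keys to attribute dicts.
    (Hypothetical shapes, for illustration only.)
    """
    changes = []
    for key, new_rec in current.items():
        old_rec = previous.get(key)
        if old_rec is None:
            changes.append({"key": key, "type": "new", "after": new_rec})
        elif old_rec != new_rec:
            changed_attrs = {a: (old_rec.get(a), new_rec.get(a))
                             for a in set(old_rec) | set(new_rec)
                             if old_rec.get(a) != new_rec.get(a)}
            changes.append({"key": key, "type": "changed", "attributes": changed_attrs})
    for key in previous.keys() - current.keys():
        changes.append({"key": key, "type": "missing_from_source"})
    return changes   # goes to the stewards; golden records change only on their approval

previous = {"C1": {"name": "Acme", "city": "Houston"}}
current  = {"C1": {"name": "Acme Corp", "city": "Houston"},
            "C2": {"name": "Globex", "city": "Dallas"}}
print(diff_source_snapshots(previous, current))
```

Note that the golden records never appear in this comparison; steward-approved differences between the golden record and the sources are left alone by design.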
Don’t Forget the Audit History
In an MDM solution, it is important to maintain an audit history of the data changes and approvals made by the data stewards. The audit history should indicate when a comparison surfaced changed data and whether the golden record was modified as a result. If the golden record was modified, the changes should be documented within the system. To maintain the integrity of the MDM solution, there must be traceability of the various versions of, and actions performed on, the golden records.
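As an illustration only, an audit entry might tie together the comparison that surfaced the change, the steward's decision, and the before/after values of the golden record. The field names below are hypothetical; most MDM products supply their own audit store.

```python
from datetime import datetime, timezone

def record_audit_entry(audit_log, golden_key, comparison_id, steward,
                       decision, before=None, after=None):
    """Append an audit entry linking a detected source change to the steward's
    decision and, when applicable, the before/after state of the golden record.
    (Field names are illustrative.)
    """
    audit_log.append({
        "golden_key": golden_key,
        "comparison_id": comparison_id,      # which delta comparison surfaced the change
        "steward": steward,
        "decision": decision,                # e.g. "approved", "rejected"
        "golden_record_before": before,      # populated only when the record was modified
        "golden_record_after": after,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

audit = []
record_audit_entry(audit, "C1", "2024-05-01-delta", "jdoe", "approved",
                   before={"name": "Acme"}, after={"name": "Acme Corp"})
print(audit)
```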