At Sullexis, we are constantly faced with data cleansing and enrichment challenges. The ability to identify missing data, enrich partially complete data, and standardize and clean dirty data is critical to any data migration effort, business intelligence project, or tune-up of core ERP functionality.

We are often asked to share some of our favorite data identification techniques that save time and money. Two of them are interrogating drawing (*.DWG) files to identify equipment tags and crawling state websites for registration details.

For example, we were tasked with building a client's Asset Master, which included a large number of vehicles. Unfortunately, the existing data had poor, inconsistent descriptions and was missing key information such as Registration Number, License State, and Model Year. However, we did have the VINs.

Armed with the VINs, we set up a web crawler that searched select state websites for the missing information. For example, the web crawler was able to harvest the information below for VIN 1FTFW1ET1EKG09594.
| Field | Value | Field | Value |
| --- | --- | --- | --- |
| VIN | 1FTFW1ET1EKG09594 | BODY TYPE | CREW CAB PICKUP 4-DR |
| MAKE | Ford | ENGINE TYPE | 3.5L V6 TURBO |
| TRIM | FX4 SuperCrew 5.5-ft. Bed 4WD | | |
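The enrichment step around such a crawler can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production crawler. The `lookup` callable is a hypothetical stand-in for whatever fetches details from a state website, and the field names and the `enrichment_status` column are invented for the example. The check-digit test is the standard one for North American VINs (position 9), which lets us skip mistyped VINs before spending any requests on them.

```python
import re
from typing import Callable, Optional

# Letter values and position weights used by the North American
# VIN check digit (the 9th character of the VIN).
_TRANSLIT = dict(zip(
    "ABCDEFGHJKLMNPRSTUVWXYZ",
    [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 7, 9, 2, 3, 4, 5, 6, 7, 8, 9],
))
_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def is_valid_vin(vin: str) -> bool:
    """Return True if the VIN has 17 legal characters and a matching check digit."""
    vin = vin.strip().upper()
    # VINs never contain the letters I, O, or Q.
    if not re.fullmatch(r"[A-HJ-NPR-Z0-9]{17}", vin):
        return False
    total = sum(
        (int(ch) if ch.isdigit() else _TRANSLIT[ch]) * weight
        for ch, weight in zip(vin, _WEIGHTS)
    )
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def enrich_record(record: dict, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Return a copy of `record` with blank fields filled from `lookup(vin)`.

    `lookup` is a hypothetical placeholder for the crawler: it takes a VIN
    and returns a dict of harvested fields, or None if nothing was found.
    Values already present in the record are never overwritten.
    """
    vin = str(record.get("VIN", "")).strip().upper()
    if not is_valid_vin(vin):
        return dict(record, enrichment_status="invalid VIN")
    found = lookup(vin)
    if not found:
        return dict(record, enrichment_status="not found")
    enriched = dict(record)
    for field, value in found.items():
        if not enriched.get(field):  # fill only missing or empty fields
            enriched[field] = value
    enriched["enrichment_status"] = "enriched"
    return enriched
```

Filling only the blank fields keeps any values the client has already verified, and the status column makes it easy to route invalid or unmatched VINs to manual review.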
Using this technique, we were able to enrich thousands of vehicle records in a matter of days. If you have data cleansing and enrichment challenges, or cost-effective ways to solve them, we would love to hear from you.