Dirty data is a big problem—and too many companies don't know how to handle it. According to a 2013 Gartner study, 20 percent of the information inside a database is of poor quality, and it's costing organizations an average of more than $14 million every year.
Data cleaning is a challenging issue in the following situations:
- Migrating your old data from your legacy system to a new software solution (converting master data)
- Upgrading to a newer version of your current software
- Trying to get rid of your current software's old and polluted data
Regardless of which circumstance applies to you, making your data clean and workable may take weeks or months (or even years!). The exact length of the process will depend on the complexity of your business operations, the age of your database, and the extent to which your data is polluted.
However, the effort is well worth the cost. At the end of the data cleaning process, your information will be of higher quality, occupy less space, and have fewer issues with data integrity. Here's a look at 10 challenges that you need to plan for when cleaning and converting your enterprise data.
1. To Use Internal vs. External Teams When Converting Master Data
To begin with, you need to decide whether the people who clean your data will be internal to your organization or brought in temporarily from a third party. When the data requires a serious cleaning, companies usually choose to partner with external specialists. Depending on the particular survey, estimates show that using an internal team may cost as much as 50 percent more than contracting with a third party.
Why is this cost difference so great? Part of the reason is that your internal team members weren't hired for these activities, so they likely aren't experts in data cleaning. Not only do they need to clean your data, but they also need to fulfill their primary job responsibilities at the same time, which can lead to extra stress and delays. External teams, on the other hand, are data cleaning professionals with a wealth of experience and expertise.
2. Duplicate Data
The biggest problem with dirty data is having multiple data entities to represent the same object, whether it's a part, a vendor, a customer or even a fixed asset. In these cases, you need to compare the duplicate records and select the one with the most complete and accurate information. For example, if you have two records pointing to the same mechanical part, you should select the record with the best technical definitions. In some cases, you may have to merge the information in different records to create the final version.
3. Obsolete or Inactive Records
Whether you're cleaning your current or your legacy system, there's likely data on there that's out of date or no longer in use. It might be parts that are no longer available on the market, vendors that have gone out of business, or your previous bill of materials (BOM). These records should be marked as obsolete and should not move over to your new database.
4. Incomplete Data
One of a master data team's most crucial tasks is to add technical details to the various entries for different mechanical parts. If you're missing some of these details, however, then you need to gather the parts' technical specifications and add them to the appropriate data records. Having a well-classified part list will help you reduce your inventory stocking costs by saving you from buying unnecessary inventory.
5. Incorrect Data
Even if you uploaded the correct data during the initial phase of your software implementation, there are still ways to go wrong later on. For example, after going live, you may have entered operational data with spelling errors or with inconsistent values. Some mistakes can be corrected automatically if the problem is present in multiple records, while others will have to be manually fixed.
6. Changes to Data Structures
By adopting a new database management solution, you're also choosing to change your business processes, which means that your data structure needs to be adjusted. In turn, your production processes and operations need to change to be in line with your new mapping. Because of this interlocking relationship, your data cleaning team needs to be intimately familiar with your company's industry and your choice of solution.
7. Compound Data Fields
Depending on the layout of your new data solution, you may have to separate or combine records as appropriate. For example, your legacy system may have stored the "street name" and "street number" of an address together in a single field, but your new system has two different fields for them. The data cleaning team that you choose needs to know how to handle these records and split them carefully.
8. Data Format and Length
Another issue with migrating to a new database is that the data formats and character limits may be different. For example, legacy systems may allow you to store numbers and dates in a freeform text field, but your new solution has number and date fields for this purpose. You'll therefore have to convert these values when moving over to the new system. What's more, your new database may have text fields that are shorter than your legacy system, requiring you to shrink the length of some records.
9. Intelligent Data
Intelligent data is designed to help users extract meaningful information from part numbers, but it doesn't always function as planned. As a result of product lifecycle management, the useful information about a given part can change over time. When you move to a new solution, intelligent numbering isn't automatically applied. In this case, your team will need to perform data conversion.
10. Unit of Measure (UoM) Changes
Your new database solution may use different units of measure, such as the imperial system instead of the metric system. As a result, your data cleaning team will need to use a conversion table for each record that requires a new unit of measure.