Data Cleaning - Best Practice

SIM for all Clients

Introduction

The Odoo Data Cleaning app is a powerful module for keeping your data under control. Introduced in Odoo 15, the app was updated in Odoo 16 to add Data Recycling alongside the existing Deduplication and Field Cleaning features. "Data Cleaning" is the umbrella term for all 3 parts of the app. At Silverdale, we recommend using all 3: some on a manual basis, some automated.

Data Cleaning in 3 Parts

There are 3 main parts of the Data Cleaning app:

Deduplication - uses a set of rules to identify potential duplicates in your data. Most of our Clients use it for Contacts, Products, Opportunities, and various CRM-related campaign and other data.

Recycle Records - uses a set of rules to archive data that isn't being used. For example, if your business primarily operates in the US and you've never created a contact in Angola, you can set Odoo to archive the Angola country record, which shortens the list of options in your Country drop-down. This helps clear out old, stale, or unused data options.

Field Cleaning - uses a set of rules to format and correct data. For example, you can format phone numbers so they can be used with VoIP, set all email addresses to lower case, or format Product names in all upper case. This is a great way to get consistency in your data.
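To show what these three example actions do to a value, here is a minimal Python sketch. It is not Odoo's implementation: the function names and the `default_country_code` parameter are hypothetical, and the phone formatter is deliberately simplified. It assumes any number without a leading `+` is a national number with no trunk prefix, whereas real phone formatting needs country-specific rules.

```python
import re


def clean_email(value: str) -> str:
    """Trim surrounding whitespace and convert an email address to lower case."""
    return value.strip().lower()


def clean_product_name(value: str) -> str:
    """Collapse repeated whitespace and convert a product name to upper case."""
    return " ".join(value.split()).upper()


def clean_phone(value: str, default_country_code: str = "1") -> str:
    """Simplified E.164-style formatting so a number can be used with VoIP.

    Hypothetical assumption: a number without a leading '+' is a national
    number for default_country_code ("1" = US/Canada) with no trunk prefix.
    """
    digits = re.sub(r"\D", "", value)  # drop spaces, brackets, dashes, dots
    if value.strip().startswith("+"):
        return "+" + digits
    return "+" + default_country_code + digits


print(clean_email("  John.Smith@Example.COM "))  # → john.smith@example.com
print(clean_product_name("blue   widget"))       # → BLUE WIDGET
print(clean_phone("(555) 123-4567"))             # → +15551234567
print(clean_phone("+44 20 7946 0958"))           # → +442079460958
```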

Deduplication

  1. Split your Contact deduplication rules by Companies versus Individuals; the deduplication similarity % often differs greatly between the 2 types
  2. Keep the deduplication rules as simple as possible; for example, for Individual contacts, match only on Name and Email address.
  3. For your first 3 months, keep it Manual; don't use the automatic cleaning option until you're confident in the rules you've set
  4. Use automatic cleaning only above 80%; once you've used the tool for a few months and your rules are proving accurate, set your automatic cleaning threshold to 80% similarity. We find this provides the right balance between automation and unintended consequences
  5. Be careful with Products; deduplicating Products can cause issues, especially when you have inventory or open POs, SOs, Invoices, or other open transactions. When you deduplicate, one of the Products is archived, which can 'hide' inventory from the system. We recommend only using Manual deduplication for Products, so you can treat the results as a list to investigate before choosing to deduplicate
  6. Watch for followers; deduplicating records also merges their followers
  7. Use it for Tags; tags are a great use case for deduplication, as it's common to get different spellings, singular versus plural forms, and upper and lower case variants.
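To make the 80% guidance concrete, here is a minimal Python sketch of how a similarity threshold separates pairs that could be merged automatically from pairs that need a manual look. It is an illustration under stated assumptions, not Odoo's algorithm: the score comes from Python's `difflib.SequenceMatcher` as a stand-in for Odoo's own similarity calculation, and `FLAG_THRESHOLD`, `triage`, and the sample tags are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

FLAG_THRESHOLD = 0.50        # hypothetical: pairs below this are not reported at all
AUTO_MERGE_THRESHOLD = 0.80  # the 80% level recommended above for automatic cleaning


def normalise(text: str) -> str:
    """Lower-case a value and collapse whitespace so case-only variants match."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Stand-in score between 0 and 1 (difflib, not Odoo's own calculation)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def triage(values, automatic: bool):
    """Split candidate duplicate pairs into 'merge' and 'review' lists.

    With automatic=False (recommended for the first months), every flagged
    pair goes to manual review; nothing is merged.
    """
    merge, review = [], []
    for a, b in combinations(values, 2):
        score = similarity(a, b)
        if score < FLAG_THRESHOLD:
            continue
        if automatic and score >= AUTO_MERGE_THRESHOLD:
            merge.append((a, b, round(score, 2)))
        else:
            review.append((a, b, round(score, 2)))
    return merge, review


tags = ["Wholesale", "wholesale", "Partner", "Partners Program"]
print(triage(tags, automatic=False))
print(triage(tags, automatic=True))
```

In manual mode both flagged pairs land in the review list. With `automatic=True`, only the case-only difference between "Wholesale" and "wholesale" is merged, while "Partner" / "Partners Program" (about 61% similar) still waits for a person to decide.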

Recycle Records

  1. Start VERY simple; for example, create a recycle record for Industry (used in Contacts)
  2. Avoid the automation; keep control of this manually unless you are REALLY sure
  3. Always Archive; don't Delete, you never know when you might need the record again
  4. Make sure your time period is appropriate; don't set it too short (1 month, for example). 6 months is the minimum we recommend.
  5. Make sure you select the right Time Field; the options available in the Time Field depend on the Model you select, so make sure to choose the right one.
  6. Use the Filter to select the right data population; for example, if you never want to archive Customers, exclude them from the recycle rule using the Filter section
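The recycle guidelines above (a time field, a minimum 6-month period, a filter that excludes Customers, and archiving rather than deleting) can be sketched in plain Python. This is a hypothetical model for illustration only: `Record`, `last_activity`, `select_for_recycling`, and `archive` are invented names, not Odoo's API. Refusing periods under 6 months is our own design choice to enforce the recommendation.

```python
import calendar
from dataclasses import dataclass
from datetime import date

MIN_MONTHS = 6  # the minimum recycle period recommended above


@dataclass
class Record:
    """Hypothetical stand-in for a contact record."""
    name: str
    last_activity: date       # plays the role of the chosen Time Field
    is_customer: bool = False
    active: bool = True


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`, clamped to month end."""
    year, month_index = divmod(day.year * 12 + (day.month - 1) - months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def select_for_recycling(records, today, months=MIN_MONTHS,
                         exclude=lambda r: r.is_customer):
    """Pick active records older than the period, skipping anything the filter excludes."""
    if months < MIN_MONTHS:
        raise ValueError(f"{months} months is shorter than the {MIN_MONTHS}-month minimum")
    cutoff = months_before(today, months)
    return [r for r in records
            if r.active and r.last_activity < cutoff and not exclude(r)]


def archive(records):
    """Archive (deactivate) records; nothing is deleted, so they can be restored."""
    for r in records:
        r.active = False


contacts = [
    Record("Dormant Lead", date(2019, 1, 15)),
    Record("Recent Lead", date(2024, 8, 1)),
    Record("Old Customer", date(2020, 5, 1), is_customer=True),
]
archive(select_for_recycling(contacts, today=date(2024, 8, 31)))
print([(c.name, c.active) for c in contacts])
```

Only "Dormant Lead" is archived: "Recent Lead" falls inside the 6-month window, and "Old Customer" is excluded by the filter even though it is old enough.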

Field Cleaning

  1. Get consensus on data formatting; upper or lower case for Products? We often see one part of the company want one and another part want the other. Make sure everyone agrees on the formatting.
  2. Always use a rule to set email addresses to all lower case; this is an industry standard and prevents issues with logging in through the portal. This is the first rule you should set up
  3. Use it to get consistent data in tickets, tasks, and opportunities; you can create rules that remove superfluous spaces from all these records, which makes them easier to search
  4. Use automation, especially for email addresses, phone numbers, and removing extra spaces.
  5. Don't put too many rules in a single cleaning rule; split your cleaning rules so that you can choose to automate the cleaning of phone numbers and email addresses, but not names.
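The advice to keep cleaning rules small and separately automated can be sketched as follows. This is a hypothetical Python model, not Odoo's data model: `CleaningRule`, `RULES`, and `run_rules` are invented for illustration, and the phone rule is a simplified stand-in that only strips punctuation. Automated rules are applied immediately, while manual ones only propose a change for someone to review.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class CleaningRule:
    """Hypothetical model of one narrowly scoped cleaning rule."""
    name: str
    field: str
    action: Callable[[str], str]
    automated: bool  # True: apply straight away; False: propose for review


def collapse_spaces(value: str) -> str:
    return " ".join(value.split())


def lower_email(value: str) -> str:
    return value.strip().lower()


def strip_phone_punctuation(value: str) -> str:
    # Keep only digits and '+'; a simplified stand-in for real phone formatting
    return re.sub(r"[^\d+]", "", value)


RULES = [
    CleaningRule("Remove extra spaces", "name", collapse_spaces, automated=True),
    CleaningRule("Email to lower case", "email", lower_email, automated=True),
    CleaningRule("Phone punctuation", "phone", strip_phone_punctuation, automated=True),
    CleaningRule("Name to title case", "name", str.title, automated=False),
]


def run_rules(record: dict, rules):
    """Apply automated rules to a copy of the record; collect manual ones for review."""
    cleaned = dict(record)
    pending = []
    for rule in rules:
        old = cleaned.get(rule.field)
        if old is None:
            continue
        new = rule.action(old)
        if new == old:
            continue
        if rule.automated:
            cleaned[rule.field] = new
        else:
            pending.append((rule.name, old, new))
    return cleaned, pending


contact = {"name": "john  mcdonald ", "email": " John.McDonald@Example.com",
           "phone": "+1 (555) 123-4567"}
cleaned, pending = run_rules(contact, RULES)
print(cleaned)
print(pending)
```

The email, phone, and spacing fixes are applied automatically, but the title-case rule only proposes "John Mcdonald". That incorrect capitalisation of "McDonald" is exactly why name formatting is kept in its own rule and left for a person to approve.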