De-Duplication Outline Process

Record de-duplication rationalises two or more records, in the same or different files, that contain some or all of the same data. You nominate a master record, then browse the records that contain duplicate data (called merge records) and optionally use their data to update the master record. The merge records are then deleted, or flagged for later deletion.

Warning: De-duplication can lose data. Make copies of all affected files before you start the de-duplication process.
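
One way to take the copies is sketched below in Python. It is only an illustration: the function name back_up_files, the backup folder name, and the idea that the affected files are ordinary files on disk are all assumptions, not part of the product. Use your site's normal backup procedure if it has one.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_files(paths, backup_root):
    """Copy each file into a new time-stamped folder under backup_root.

    Hypothetical helper: assumes the affected files are plain files on
    disk and that no two of them share the same file name.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"dedup-backup-{stamp}"
    target.mkdir(parents=True, exist_ok=False)  # never reuse an old backup folder
    copies = []
    for path in paths:
        # copy2 copies the contents and also tries to preserve timestamps
        copies.append(Path(shutil.copy2(path, target / Path(path).name)))
    return copies
```

The function returns the paths of the copies, so you can check that every file was copied before you begin.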

The process consists of four principal stages: selecting records that contain duplicate data, confirming your selection, locking the records so that no one else can alter them during the process, and merging the records.

To de-duplicate records:

Select duplicate records

Note: In this stage you select the fields to include in record comparisons, select the company branch whose files are to be cleansed, and stipulate the search logic to use.

The files are read and a log of the process (a text file) is created.

Confirm record selection

Note: This stage includes merging data from your nominated merge records into your nominated master record.

The master record is brought at least as up to date as every merge record in your selection.

Lock the records

Note: This stage includes a facility to view the files that need merging because matching data has been found, together with the number of affected records in each file.

Records due to be merged are locked so that no other process or user can update them.

Merge the records

The merge process is run. Depending on your choice, the merge records are either set to status X for subsequent deletion or deleted immediately.
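
The four stages can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The Record class, the "A" (active) status, the in-memory lock flag, exact matching on normalised field values as the search logic, and the rule that merge data only fills empty master fields are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    branch: str
    fields: dict
    status: str = "A"     # hypothetical "active" status; "X" = flagged for deletion
    locked: bool = False


def find_duplicates(records, compare_fields, branch):
    """Stage 1: group the branch's records whose compare fields match."""
    groups = defaultdict(list)
    for rec in records:
        if rec.branch != branch:
            continue
        # Assumed search logic: exact match, ignoring case and outer spaces.
        signature = tuple(
            str(rec.fields.get(name, "")).strip().lower() for name in compare_fields
        )
        if not any(signature):
            continue  # records with no comparison data are not duplicates
        groups[signature].append(rec)
    return [group for group in groups.values() if len(group) > 1]


def merge_into_master(master, merge_records):
    """Stage 2: use merge-record data to fill empty fields in the master."""
    for rec in merge_records:
        for name, value in rec.fields.items():
            if value and not master.fields.get(name):
                master.fields[name] = value


def lock(records):
    """Stage 3: lock records; refuse if another process already holds one."""
    held = [rec.key for rec in records if rec.locked]
    if held:
        raise RuntimeError(f"records already locked: {held}")
    for rec in records:
        rec.locked = True


def unlock(records):
    for rec in records:
        rec.locked = False


def finish_merge(store, merge_records, hard_delete=False):
    """Stage 4: delete the merge records, or set them to status X."""
    if hard_delete:
        doomed = {id(rec) for rec in merge_records}
        store[:] = [rec for rec in store if id(rec) not in doomed]
    else:
        for rec in merge_records:
            rec.status = "X"


# Example run with made-up data.
records = [
    Record("C001", "LON", {"name": "Acme Ltd", "phone": "", "postcode": "N1 1AA"}),
    Record("C002", "LON", {"name": "ACME LTD ", "phone": "020 7946 0000", "postcode": "N1 1AA"}),
    Record("C003", "LON", {"name": "Birch & Co", "phone": "", "postcode": "E2 2BB"}),
]

groups = find_duplicates(records, ["name", "postcode"], branch="LON")
for group in groups:
    # Assumption: the first record is the master. In the real process you nominate it.
    master, merge_records = group[0], group[1:]
    merge_into_master(master, merge_records)
    lock(group)
    try:
        finish_merge(records, merge_records)  # default: set status X
    finally:
        unlock(group)
```

In this run C001 and C002 match on name and postcode, so C001 (the master) takes its missing phone number from C002, and C002 is set to status X. Passing hard_delete=True to finish_merge removes the merge records from the list instead.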