2021 – Cleaning Up Bad RDBMS Data

This post compares the coding approach to fixing bad data in an RDBMS and JSON system. First up is the RDBMS version…

After using our patient management system for a few weeks the users started noticing patients with multiple MRN’s in a single Ordering System. This is happening because we forgot to add a unique constraint on the OS and PATIENT_ID columns. We’ll do that in a later post. For now we need to get rid of the bad data – here is an example.

SELECT ( SELECT patient_name
           FROM patient
          WHERE patient_id = patient_mrn.patient_id ) pname,
        ordering_system,
        patient_mrn
   FROM patient_mrn;
PNAME     ORDERING_SYSTEM PATIENT_MRN
--------- --------------- -----------
Patient 2               1 ONE IS OK
Patient 2               1 TWO IS NOT

Patient 2 should not have more than one Medical Record Number (MRN) per Ordering System (OS). We need to make one of them go away fast. Here is the PL/SQL to make that happen.

BEGIN
  -- for every patient with more than 1 MRN for an OS...
  FOR x IN ( SELECT patient_id,
                    ordering_system,
                    COUNT(*) c
               FROM patient_mrn
             GROUP BY patient_id,
                      ordering_system
             HAVING COUNT(*) > 1 ) LOOP
    -- process all but the very last one...
    FOR counter IN 1..x.c - 1 LOOP
      DBMS_OUTPUT.PUT_LINE('Delete ' || x.patient_id || ' ' ||
                           x.ordering_system);
      -- delete the OS/Patient ID record
      DELETE patient_mrn
      WHERE patient_id = x.patient_id
        AND ordering_system = x.ordering_system
        AND rownum = 1;
    END LOOP;  -- all but last one
  END LOOP;  -- every MRN and OS > 1
END;
Delete 2 1

Now the patient has only 1 MRN again!

SELECT ( SELECT patient_name
           FROM patient
          WHERE patient_id = patient_mrn.patient_id ) pname,
        ordering_system,
        patient_mrn
   FROM patient_mrn;
PNAME     ORDERING_SYSTEM PATIENT_MRN
--------- --------------- -----------
Patient 2               1 TWO IS NOT

A short amount of code to get the job done in the RDBMS world.

Thanks for reading!

Leave a Comment