2021 – Cleaning Up Bad JSON Data

After a few weeks using our JSON patient management system the users have noticed patients with more than one MRN in a single Ordering System. This should not have been allowed to happen. We’ll put something in place later to prevent future occurrences. For this post we will look at the cleanup. Here’s an example of a record to clean up.

SELECT patient_name,
                os,
                mrn
  FROM json_patient,
       JSON_TABLE(patient_data, '$'
         COLUMNS ( patient_name VARCHAR2 PATH '$.pname',
           NESTED PATH '$.mrns[*]'
             COLUMNS ( os  VARCHAR2(30) PATH '$.OS',
                       mrn VARCHAR2(30) PATH '$.MRN' )))
PATIENT_NAME OS MRN
------------ -- ------
Patient 1     1 MRN123
Patient 1     1 MRN456

And now for the code to remove one of those entries.

DECLARE
  v_clob CLOB;
  v_jo   JSON_OBJECT_T;
  v_arr  JSON_ARRAY_T;
  v_lmnt JSON_ELEMENT_T;
  v_os   NUMBER;
  v_txt  VARCHAR2(1000);
  TYPE v_n_t IS TABLE OF NUMBER
  INDEX BY BINARY_INTEGER;
  v_os_kept v_n_t;
  v_del_lmnt v_n_t;
  v_counter NUMBER;
BEGIN
  -- for every patient with more than 1 MRN in an OS
  FOR x IN ( SELECT patient_name,
                    os,
                    COUNT(*)
               FROM json_patient,
                    JSON_TABLE(patient_data, '$'
                      COLUMNS ( patient_name VARCHAR2 PATH '$.pname',
                        NESTED PATH '$.mrns[*]'
                          COLUMNS ( os  VARCHAR2(30) PATH '$.OS',
                                    mrn VARCHAR2(30) PATH '$.MRN' )))
             GROUP BY patient_name,
                      os
             HAVING COUNT(*) > 1 ) LOOP
    DBMS_OUTPUT.PUT_LINE(x.patient_name || ' ' || x.os);

    -- Retrieve the patients full JSON record
    SELECT patient_data
      INTO v_clob
      FROM json_patient
     WHERE JSON_VALUE(patient_data,'$.pname') = x.patient_name;

    -- drill down to the OS/MRN array
    v_jo := JSON_OBJECT_T.PARSE(v_clob);
    v_arr := v_jo.get_array('mrns');

    -- initialize program arrays
    --   v_os_kept  = each OS that has a single element to keep
    --   v_del_lmnt = each array element to delete because one
    v_os_kept.DELETE;
    v_del_lmnt.DELETE;
 
    /*
       The algorithm for the code below is as follows;
       For each element in the array
         Is the OS value for the element already stored in the v_os_kept array?
           If YES then add the element number to v_del_lmnt array
           If NO  then add the OS value for the element to the v_os_kept array
       This ensures the first element for each OS will be retained and the reset
       flagged for deletion later
    */

    -- for every OS/MRN combination the patient has
    FOR counter IN 0..v_arr.GET_SIZE - 1 LOOP
      v_lmnt := v_arr.get(counter);
      v_jo := TREAT(v_lmnt AS JSON_OBJECT_T );
      v_os := v_jo.get('OS').stringify;
      IF v_os_kept.EXISTS(v_os) THEN
        DBMS_OUTPUT.PUT_LINE('Delete OS ' || v_os);
        v_del_lmnt(counter) := 1;
      ELSE
        DBMS_OUTPUT.PUT_LINE('Save OS ' || v_os);
        v_os_kept(v_os) := 1;
      END IF;
    END LOOP; -- every OS/MRN combination the patient has

    v_counter := v_del_lmnt.FIRST;
    DBMS_OUTPUT.PUT_LINE(v_arr.stringify);

    -- And now to remove the elements built up in the v_del_lmnt array!
    -- for very array element to remove...
    LOOP
      EXIT WHEN v_counter IS NULL;
      DBMS_OUTPUT.PUT_LINE('Delete Element ' || v_counter);
      v_arr.REMOVE(v_counter);
      v_counter := v_del_lmnt.NEXT(v_counter);
    END LOOP;   -- every array element to remove
    DBMS_OUTPUT.PUT_LINE(v_arr.stringify);

    -- Update the database with the new array of OS/MRN's for the patient
    v_txt := v_arr.stringify;
    UPDATE json_patient
    SET patient_data = JSON_MERGEPATCH(patient_data,'{"mrns":' || v_txt ||'}')
    WHERE JSON_VALUE(patient_data,'$.pname') = x.patient_name;

  END LOOP;  -- every patient with more than 1 MRN in an OS

END;

That does what we want but it took a lot of low level code. Especially compared to the much simpler RDBMS version of this. It would be nice to be able to quickly determine which patients had multiple entries in an Ordering System (OS) and even determine which OS was involved. Perhaps this could even be done with one query! That would be very helpful. And the next few posts will try to do that. I cant promise full success but we will definitely try – and learn along the way.

Thanks for reading!

Leave a Comment