2021 – Compare JSON and RDBMS Data 1

I really need to compare my data sets (one RDBMS and one JSON) to ensure they match. And the only programming languages I trust are SQL and PLSQL. If they say the data is the same then I trust it is the same.

The Data

-- RDBMS
SELECT patient_name,
       ordering_system,
       patient_mrn
  FROM patient p,
       patient_mrn pm
 WHERE p.patient_id = pm.patient_id;

PATIENT_NAME ORDERING_SYSTEM PATIENT_MRN
------------ --------------- -----------
Patient 1                  1 MRN123
Patient 1                  2 MRN456
Patient 2                  1 MRN99

-- JSON
SELECT JSON_VALUE(patient_data,'$.pname'       NULL ON ERROR ) pname,
       JSON_VALUE(patient_data,'$.mrns[0].OS'  NULL ON ERROR ) frstos,
       JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) frstmrn,
       JSON_VALUE(patient_data,'$.mrns[1].OS'  NULL ON ERROR ) scndos,
       JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) scndmrn
  FROM json_patient;

PNAME     FRSTOS FRSTMRN SCNDOS SCNDMRN
--------- ------ ------- ------ -------
Patient 1      1 MRN123       2 MRN456
Patient 2      1 MRN99

JSON Is Just Text Right?

At first glance the JSON format looks like relational data aggregated into one list. That assumption plus my inclination to rely on SQL to do everything (plus the bolding in the previous sentence) makes me want to try the LISTAGG function. Off I go…

SELECT p.patient_name,
       LISTAGG(ordering_system || ' ' || patient_mrn,' ')
           WITHIN GROUP ( ORDER BY ordering_system) la
  FROM patient p,
       patient_mrn pm
 WHERE p.patient_id = pm.patient_id
GROUP BY p.patient_id, p.patient_name;

PATIENT_NAME LA
------------ ---------------
Patient 1 1  MRN123 2 MRN456
Patient 2 1  MRN99

That result looks like JSON without the keys and brackets and such. It provides enough information for a straight text comparison so I carry on.

-- RDBMS LISTAGG
SELECT p.patient_name,
       LISTAGG(ordering_system || ' ' || patient_mrn,' ')
           WITHIN GROUP ( ORDER BY ordering_system) la
  FROM patient     p,
       patient_mrn pm
 WHERE p.patient_id = pm.patient_id
GROUP BY p.patient_id, p.patient_name
INTERSECT
-- JSON values as text
SELECT JSON_VALUE(patient_data,'$.pname'       NULL ON ERROR ) pname,
       JSON_VALUE(patient_data,'$.mrns[0].OS'  NULL ON ERROR ) || ' ' ||
       JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' ||
       JSON_VALUE(patient_data,'$.mrns[1].OS'  NULL ON ERROR ) || ' ' ||
       JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) listagg_wannabe
FROM json_patient;

PATIENT_NAME LA
------------ -----------------
Patient 1    1 MRN123 2 MRN456

Only 1 row returned? Where did Patient 2 go?

It was not included in the JSON because of the extra spaces I used to separate the text. A quick RTRIM takes care of that.

SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname,
       RTRIM(
         JSON_VALUE(patient_data,'$.mrns[0].OS'  NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[1].OS'  NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR )
            ) jzn
FROM json_patient;

PNAME     JZN
--------- -----------------
Patient 1 1 MRN123 2 MRN456
Patient 2 1 MRN99

Now the INTERSECT will work.

SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname,
       RTRIM(
         JSON_VALUE(patient_data,'$.mrns[0].OS'  NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[1].OS'  NULL ON ERROR ) || ' ' ||
         JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR )
            ) jzn
  FROM json_patient
INTERSECT
SELECT p.patient_name,
       LISTAGG(ordering_system || ' ' || patient_mrn,' ')
             WITHIN GROUP ( ORDER BY ordering_system)
  FROM patient p,
       patient_mrn pm
 WHERE p.patient_id = pm.patient_id
GROUP BY p.patient_id, p.patient_name;

PNAME     JZN
--------- -----------------
Patient 1 1 MRN123 2 MRN456
Patient 2 1 MRN99

JSON Is Still Just Text Right?

Not really. Its does have structure and logic. For example – what if the OS/MRN combinations were reversed?

For example this:

[{"OS":2,"MRN":"MRN456"},{"OS":1,"MRN":"MRN123"}]

Instead of this:

[{"OS":1,"MRN":"MRN123"},{"OS":2,"MRN":"MRN456"}]

Would the text comparison still work? I’ll update patient 1 to reverse the order of the array elements to see what happens.

UPDATE json_patient
SET patient_data = '{"pname":"Patient 1","mrns":[{"OS":2,"MRN":"MRN456"},{"OS":1,"MRN":"MRN123"}]}'
WHERE JSON_VALUE(patient_data,'$.pname') = 'Patient 1';

And re-run the INTERSECT query. Drat! Patient 1 is no longer returned in the result set.

PNAME     JZN
--------- -----------------
Patient 2 1 MRN99

Because the comparison wound up being “2 456 1 123” compared to “1 123 2 456” which are very much not equal.

Its Not Just Text Is It?

Looks like straight text comparisons are not the way to go. The next post explains SQL and PLSQL functions for JSON data that correctly treat it as more than text.

Thanks for reading.

Leave a Comment