I really need to compare my data sets (one RDBMS and one JSON) to ensure they match. And the only programming languages I trust are SQL and PLSQL. If they say the data is the same then I trust it is the same.
The Data
-- RDBMS SELECT patient_name, ordering_system, patient_mrn FROM patient p, patient_mrn pm WHERE p.patient_id = pm.patient_id; PATIENT_NAME ORDERING_SYSTEM PATIENT_MRN ------------ --------------- ----------- Patient 1 1 MRN123 Patient 1 2 MRN456 Patient 2 1 MRN99 -- JSON SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname, JSON_VALUE(patient_data,'$.mrns[0].OS' NULL ON ERROR ) frstos, JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) frstmrn, JSON_VALUE(patient_data,'$.mrns[1].OS' NULL ON ERROR ) scndos, JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) scndmrn FROM json_patient; PNAME FRSTOS FRSTMRN SCNDOS SCNDMRN --------- ------ ------- ------ ------- Patient 1 1 MRN123 2 MRN456 Patient 2 1 MRN99
JSON Is Just Text Right?
At first glance the JSON format looks like relational data aggregated into one list. That assumption plus my inclination to rely on SQL to do everything (plus the bolding in the previous sentence) makes me want to try the LISTAGG function. Off I go…
SELECT p.patient_name, LISTAGG(ordering_system || ' ' || patient_mrn,' ') WITHIN GROUP ( ORDER BY ordering_system) la FROM patient p, patient_mrn pm WHERE p.patient_id = pm.patient_id GROUP BY p.patient_id, p.patient_name; PATIENT_NAME LA ------------ --------------- Patient 1 1 MRN123 2 MRN456 Patient 2 1 MRN99
That result looks like JSON without the keys and brackets and such. It provides enough information for a straight text comparison so I carry on.
-- RDBMS LISTAGG SELECT p.patient_name, LISTAGG(ordering_system || ' ' || patient_mrn,' ') WITHIN GROUP ( ORDER BY ordering_system) la FROM patient p, patient_mrn pm WHERE p.patient_id = pm.patient_id GROUP BY p.patient_id, p.patient_name INTERSECT -- JSON values as text SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname, JSON_VALUE(patient_data,'$.mrns[0].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) listagg_wannabe FROM json_patient; PATIENT_NAME LA ------------ ----------------- Patient 1 1 MRN123 2 MRN456
Only 1 row returned? Where did Patient 2 go?
It was not included in the JSON because of the extra spaces I used to separate the text. A quick RTRIM takes care of that.
SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname, RTRIM( JSON_VALUE(patient_data,'$.mrns[0].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) ) jzn FROM json_patient; PNAME JZN --------- ----------------- Patient 1 1 MRN123 2 MRN456 Patient 2 1 MRN99
Now the INTERSECT will work.
SELECT JSON_VALUE(patient_data,'$.pname' NULL ON ERROR ) pname, RTRIM( JSON_VALUE(patient_data,'$.mrns[0].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[0].MRN' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].OS' NULL ON ERROR ) || ' ' || JSON_VALUE(patient_data,'$.mrns[1].MRN' NULL ON ERROR ) ) jzn FROM json_patient INTERSECT SELECT p.patient_name, LISTAGG(ordering_system || ' ' || patient_mrn,' ') WITHIN GROUP ( ORDER BY ordering_system) FROM patient p, patient_mrn pm WHERE p.patient_id = pm.patient_id GROUP BY p.patient_id, p.patient_name; PNAME JZN --------- ----------------- Patient 1 1 MRN123 2 MRN456 Patient 2 1 MRN99
JSON Is Still Just Text Right?
Not really. Its does have structure and logic. For example – what if the OS/MRN combinations were reversed?
For example this:
[{"OS":2,"MRN":"MRN456"},{"OS":1,"MRN":"MRN123"}]
Instead of this:
[{"OS":1,"MRN":"MRN123"},{"OS":2,"MRN":"MRN456"}]
Would the text comparison still work? I’ll update patient 1 to reverse the order of the array elements to see what happens.
UPDATE json_patient SET patient_data = '{"pname":"Patient 1","mrns":[{"OS":2,"MRN":"MRN456"},{"OS":1,"MRN":"MRN123"}]}' WHERE JSON_VALUE(patient_data,'$.pname') = 'Patient 1';
And re-run the INTERSECT query. Drat! Patient 1 is no longer returned in the result set.
PNAME JZN --------- ----------------- Patient 2 1 MRN99
Because the comparison wound up being “2 456 1 123” compared to “1 123 2 456” which are very much not equal.
Its Not Just Text Is It?
Looks like straight text comparisons are not the way to go. The next post explains SQL and PLSQL functions for JSON data that correctly treat it as more than text.
Thanks for reading.