2021 – Performance Comparison Part 1

Oracle offers like JSON_TABLE and JSON_VALUE to translate JSON keys and values into RDBMS rows and columns. But there is no fair way to compare the performance of those functions with home grown code because its not possible to see the inner workings and influences of the built in functions. Thus an apples-to-apples comparison is not truly valid.

Fortunately an apples-to-apples comparison has shown up in this series of posts. I forgot to include a constraint to limit each patient to one Medical Record Number (MRN) for each single Ordering System (OS). I missed this in the RDBMS version of the system and the JSON version. The first part of the fix is to identify the rows that need to be fixed. Let begin comparing the performance…

If you want to review the underlying JSON and Relational Data Models they are here.

RDBMS Query, Test Data and Query Plan

Here’s the query to find patients with bad data in the relation model.

SELECT patient_id,
       ordering_system
  FROM patient_mrn
 GROUP BY patient_id,
          ordering_system
HAVING COUNT(*) > 1;

The query does not even have to look at the PATIENT table. It can go straight to the PATIENT_MRN table that records the OS and MRN combination.

For this initial test I’ve set up a test bed of data with the following characteristics.

10,000 records in the PATIENT table
 1,666 patients multiple MRN's for one OS
41,666 total records in the PATIENT_MRN table

Here is the query plan and timings for the query.

PLAN_TABLE_OUTPUT
Plan hash value: 1711873054
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 43327 | 1100K| 38 (8)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 43327 | 1100K| 38 (8)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PATIENT_MRN | 43327 | 1100K| 35 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
3 - SEL$1 / PATIENT_MRN@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(COUNT(*)>1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (rowset=256) "PATIENT_ID"[NUMBER,22], "ORDERING_SYSTEM"[NUMBER,22]
2 - (#keys=2; rowset=256) "PATIENT_ID"[NUMBER,22],
"ORDERING_SYSTEM"[NUMBER,22], COUNT(*)[22]
3 - (rowset=256) "ORDERING_SYSTEM"[NUMBER,22], "PATIENT_ID"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

The estimated duration is less than a millisecond.

JSON Query, Data Set and Query Plan

And here is the JSON query to find the same information using a Virtual Column that calls a function.

SELECT JSON_VALUE(patient_data,'$.pname') pname,
        unique_os
  FROM json_patient
 WHERE unique_os = 0;

And the test data set has these characteristics.

10,000 patient (and OS and MRN) records
  1,666 with non unique OS's

And the query plan looks like this:

PLAN_TABLE_OUTPUT
Plan hash value: 1282538007
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 84 | 165K| 161 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| JSON_PATIENT | 84 | 165K| 161 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / JSON_PATIENT@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("UNIQUE_OS"=0)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "PATIENT_DATA" /*+ LOB_BY_VALUE */ [LOB,4000]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

The estimated duration is also less than a millisecond.

In the next post we’ll see if the estimates hold true.

Thanks for reading!

Leave a Comment