2020 – Part 1 – Dataguide Views

The next thing we need for Project 2020 is a simple and fast way to determine what keys are contained in our JSON data. My favourite way to get this is Oracle Search Indexes because they are:

Simple

The SQL to create them is very straightforward, and accessing the JSON keys is simple via views in the database.

Fast

The views are in the database, right next to the JSON data itself, so access is very fast.

Let’s see how they work…

First up – here is the syntax for creating a search index, with the DATAGUIDE ON parameter telling Oracle to maintain the dataguide views for it.

CREATE SEARCH INDEX search_stuff ON json_stuff ( json_data )
FOR JSON PARAMETERS ('DATAGUIDE ON');
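For reference, the index above and the output below assume a small table along these lines. This is only a minimal, hypothetical setup (the table and column names come from the statements in this post; the DDL and sample rows are an assumption). Note that a JSON search index needs an IS JSON check constraint on the column.

CREATE TABLE json_stuff (
  json_data VARCHAR2(4000)
            CONSTRAINT json_stuff_chk CHECK ( json_data IS JSON )
);

INSERT INTO json_stuff ( json_data ) VALUES ( '{"KEY1":"VALUE1"}' );
INSERT INTO json_stuff ( json_data ) VALUES ( '{"KEY2":"VALUE2","KEY3":"VALUE3"}' );
COMMIT;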

The first view of the JSON data is provided by USER_JSON_DATAGUIDES.

SELECT dataguide
  FROM user_json_dataguides
 WHERE table_name = 'JSON_STUFF'
   AND column_name = 'JSON_DATA';

Here is the output.

DATAGUIDE
-------------------------------------------------
[
  {
    "o:path" : "$.KEY1",
    "type" : "string",
    "o:length" : 8,
    "o:preferred_column_name" : "JSON_DATA$KEY1"
  },
  {
    "o:path" : "$.KEY2",
    "type" : "string",
    "o:length" : 8,
    "o:preferred_column_name" : "JSON_DATA$KEY2"
  },
  {
    "o:path" : "$.KEY3",
    "type" : "string",
    "o:length" : 8,
    "o:preferred_column_name" : "JSON_DATA$KEY3"
  }
]

That’s JSON about the JSON! Being a relational database person, I much prefer the second view – USER_JSON_DATAGUIDE_FIELDS – shown below.

SELECT path,
       type,
       length
  FROM user_json_dataguide_fields
 WHERE table_name = 'JSON_STUFF'
   AND column_name = 'JSON_DATA';

Here is the output.

PATH       TYPE       LENGTH
---------- ---------- ------
$.KEY1     string          8
$.KEY2     string          8
$.KEY3     string          8

This simplified output makes it fast and simple to access the values using the JSON_VALUE function like this.

SELECT JSON_VALUE(js.json_data,'$.KEY1') a,
       JSON_VALUE(js.json_data,'$.KEY2') a,
       JSON_VALUE(js.json_data,'$.KEY3') a
FROM json_stuff js;

Here’s the output.

A          A          A
---------- ---------- ----------
VALUE1
           VALUE2     VALUE3

Note that if JSON_VALUE does not find a KEY it does not fail – it just returns NULL.

Foreshadowing

Oracle does not require column aliases (all A’s in the example above) to be unique if they are only used for display. This is not exclusive to JSON processing. I only mention it now because it comes in very handy as we work through this project.

If the aliased columns are used in subsequent processing it does become a problem, as shown here.

SELECT *
  FROM ( SELECT JSON_VALUE(js.json_data,'$.KEY1') a,
                JSON_VALUE(js.json_data,'$.KEY2') a,
                JSON_VALUE(js.json_data,'$.KEY3') a
           FROM json_stuff js );

SELECT *
*
ERROR at line 1:
ORA-00918: column ambiguously defined

Now that we have JSON data and access to the keys it contains we can start assembling the Polymorphic Table Function (PTF) to process them.
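As a preview, a row-semantics PTF hangs off a package with a DESCRIBE function and a FETCH_ROWS procedure, plus a SQL-level function that points at that package. Here is a minimal, hypothetical skeleton – the names ptf_pkg and ptf_func are placeholders of my own, not the final project code; the signatures come from DBMS_TF.

CREATE OR REPLACE PACKAGE ptf_pkg AS

  -- called at parse time to describe the shape of the result
  FUNCTION describe ( p_tbl IN OUT DBMS_TF.TABLE_T )
    RETURN DBMS_TF.DESCRIBE_T;

  -- called at run time to read and produce rows
  PROCEDURE fetch_rows;

END ptf_pkg;
/

CREATE OR REPLACE FUNCTION ptf_func ( p_tbl TABLE )
  RETURN TABLE
  PIPELINED ROW POLYMORPHIC USING ptf_pkg;
/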

You can find the code for this on the Oracle Live SQL Site.

  • Introduction
  • Part 1 (This Post)

Thanks for reading!

KSCOPE Answer 2 of 2 – Part 3

This post concludes the series on imitating a CONNECT BY SQL query with a PTF. It calculates the CONNECT BY values (dates in this case) and outputs them as a TRACE column.

First I add the trace column in the DESCRIBE function.

  v_new_col DBMS_TF.COLUMN_METADATA_T;
  v_new_cols DBMS_TF.COLUMNS_NEW_T;
...
  v_new_col := DBMS_TF.COLUMN_METADATA_T( 
         type    => DBMS_TF.TYPE_VARCHAR2,
         name    => 'TRACE',
         max_len => 100 );
  v_new_cols := DBMS_TF.COLUMNS_NEW_T( 1 => v_new_col);
...
  RETURN DBMS_TF.DESCRIBE_T ( new_columns => v_new_cols );
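For context, here is a sketch of how those fragments could sit inside a complete DESCRIBE function. It also includes the FOR_READ loop from Part 2 (further down this page); everything else is as shown above.

FUNCTION describe ( p_tbl IN OUT DBMS_TF.TABLE_T )
  RETURN DBMS_TF.DESCRIBE_T
AS
  v_new_col  DBMS_TF.COLUMN_METADATA_T;
  v_new_cols DBMS_TF.COLUMNS_NEW_T;
BEGIN

  -- ask for every input column to be fetched (added in Part 2)
  FOR counter IN 1..p_tbl.column.count LOOP
    p_tbl.column(counter).for_read := TRUE;
  END LOOP;

  -- define the new VARCHAR2 column named TRACE
  v_new_col := DBMS_TF.COLUMN_METADATA_T(
         type    => DBMS_TF.TYPE_VARCHAR2,
         name    => 'TRACE',
         max_len => 100 );
  v_new_cols := DBMS_TF.COLUMNS_NEW_T( 1 => v_new_col );

  -- hand the new column definition back to the database
  RETURN DBMS_TF.DESCRIBE_T ( new_columns => v_new_cols );

END describe;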

Then in the FETCH_ROWS procedure I declare the variables for the TRACE column…

  v_dt DATE;
  v_trace VARCHAR2(500);
  v_trace_cols DBMS_TF.TAB_VARCHAR2_T;

…and populate it.

FOR row_counter IN 1..v_rowcount LOOP

  -- get the date value
  v_dt := v_rowset(col_counter).tab_date(row_counter);
  v_trace := v_trace || v_dt || '-';

  v_trace_cols(row_counter) := v_trace;

END LOOP; -- every row

DBMS_TF.PUT_COL( columnid => 1,
                 collection => v_trace_cols );
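Putting the pieces together, the FETCH_ROWS procedure ends up roughly like this. Treat it as a structural sketch only – note that to produce the cumulative trace shown in the output below, v_trace would have to persist between calls (for example as a package-level variable) rather than being a local as sketched here.

PROCEDURE fetch_rows AS
  v_rowset     DBMS_TF.ROW_SET_T;
  v_rowcount   PLS_INTEGER;
  v_colcount   PLS_INTEGER;
  v_dt         DATE;
  v_trace      VARCHAR2(500);
  v_trace_cols DBMS_TF.TAB_VARCHAR2_T;
BEGIN

  -- explicit fetch of the current row set (added in Part 2)
  DBMS_TF.GET_ROW_SET( rowset    => v_rowset,
                       row_count => v_rowcount,
                       col_count => v_colcount );

  -- for every fetched column...
  FOR col_counter IN 1..v_colcount LOOP

    -- only the date columns feed the trace
    IF v_rowset(col_counter).description.name IN ('"EDATE"','"SDATE"') THEN

      -- for every row in the row set...
      FOR row_counter IN 1..v_rowcount LOOP
        v_dt := v_rowset(col_counter).tab_date(row_counter);
        v_trace := v_trace || v_dt || '-';
        v_trace_cols(row_counter) := v_trace;
      END LOOP;

    END IF;

  END LOOP;

  -- write the accumulated values into the new TRACE column
  DBMS_TF.PUT_COL( columnid   => 1,
                   collection => v_trace_cols );

END fetch_rows;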

Here is the simplistic test data set I use.

SELECT partno,
       sdate,
       edate
  FROM prices;

PARTNO SDATE     EDATE
------ --------- ---------
     1 24-AUG-19 27-AUG-19
     1 27-AUG-19 30-AUG-19
     2 24-AUG-19 27-AUG-19
     2 27-AUG-19 30-AUG-19
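If you want to recreate it, the table can be built like this (the names come from the query above; the exact data types are an assumption):

CREATE TABLE prices (
  partno NUMBER,
  sdate  DATE,
  edate  DATE
);

INSERT INTO prices VALUES ( 1, DATE '2019-08-24', DATE '2019-08-27' );
INSERT INTO prices VALUES ( 1, DATE '2019-08-27', DATE '2019-08-30' );
INSERT INTO prices VALUES ( 2, DATE '2019-08-24', DATE '2019-08-27' );
INSERT INTO prices VALUES ( 2, DATE '2019-08-27', DATE '2019-08-30' );
COMMIT;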

And here is the result with the PTF.

SELECT partno,
       trace
  FROM ptf_func( prices
                 PARTITION BY partno
                 ORDER     BY sdate );

PARTNO TRACE
------ ----------------------------------------
     1 24-AUG-19-
     1 24-AUG-19-27-AUG-19-
     2 24-AUG-19-27-AUG-19-24-AUG-19-
     2 24-AUG-19-27-AUG-19-24-AUG-19-27-AUG-19-

That seems pretty straightforward because I make some assumptions:

  • The date ranges are already exclusive in the table
      • I don’t believe this code will handle overlaps
  • There are fewer than 1024 rows in the table
      • See other posts in this PTF series about the 1024 rows feature

Thanks for reading.

KSCOPE Answer 2 of 2 – Part 2

The next step in performing CONNECT BY queries in a Polymorphic Table Function is to add the explicit fetch. First add this to the DESCRIBE function to ensure all columns are fetched.

FOR counter IN 1..p_tbl.column.count LOOP
  p_tbl.column(counter).for_read := TRUE;
END LOOP;

Then add an explicit FETCH to the FETCH_ROWS procedure.

DBMS_TF.GET_ROW_SET( rowset => v_rowset,
                     row_count => v_rowcount,
                     col_count => v_colcount);

And add a code stub to interrogate the rows as they are fetched.

-- for every get column...
FOR col_counter IN 1..v_colcount LOOP

  -- if the column is SDATE or EDATE then...
  IF v_rowset(col_counter).description.name IN ('"EDATE"','"SDATE"') THEN

   -- for every row in the row set...
   FOR row_counter IN 1..v_rowcount LOOP

     -- get the date value
     v_dt := v_rowset(col_counter).tab_date(row_counter);

   END LOOP; -- every row

  END IF; -- column is SDATE or EDATE

END LOOP; -- every column
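For completeness, the fetch and the stub above assume declarations along these lines (the names match the snippets in this series; the exact declare section is my assumption):

  v_rowset   DBMS_TF.ROW_SET_T;  -- the row set returned by GET_ROW_SET
  v_rowcount PLS_INTEGER;        -- number of rows in the row set
  v_colcount PLS_INTEGER;        -- number of columns in the row set
  v_dt       DATE;               -- the date value read from the current row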

I’m wary of how well this whole effort will perform, so I’m going to run another performance test similar to the one in the previous post. Here are the results.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.07       0.07          0        201          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch    20004      0.77       2.92          0      99786          0      300000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20010      0.85       3.00          0      99987          0      300000

Here are some interesting points when comparing this run to the previous test without the explicit fetch:

  • CPU went down from 2.33 to 0.77
  • Elapsed went up from 2.81 to 3.00
  • Query went up slightly from 200-0-99785 to 201-0-99786 (parse-execute-fetch)

What caused the reduction in CPU?

I’ll dig into that in the next few articles in this series.