Parallel By Range

Previous posts in this category introduced a way to perform DML in a SELECT using the following technology:

  • Table Functions
  • Autonomous Transactions
  • A REF CURSOR as a parameter
  • Row by row (aka slow by slow) processing of the REF CURSOR
  • Parallel processing to try to overcome the slow by slow processing

This post expands the parallel processing to meet a simple requirement – for each value in one column, provide a total of the values processed in another column.

Here is the demo table:

SQL> desc demo
 Name         Null?    Type
 ------------ -------- ------
 COL1                  NUMBER 
 COL2                  NUMBER

We want to see the total of all COL2 values deleted for every COL1 value. For example these two rows would produce a total of 11 for the COL1 value of 1.

SQL> SELECT *
 2     FROM demo;
      COL1       COL2
---------- ----------
         1          7
         1          4

Let’s create some test data in the table.

SQL> BEGIN
  2    FOR counter IN 1..10 LOOP
  3      FOR counter2 IN 1..10 LOOP
  4        IF MOD(counter,2) = 0 THEN
  5          INSERT INTO demo
  6          VALUES(counter,counter2 * counter);
  7        ELSE
  8          INSERT INTO demo
  9          VALUES(counter,counter * counter);
 10       END IF;
 11     END LOOP;
 12   END LOOP;
 13 END;
 14 /
PL/SQL procedure successfully completed.

SQL> COMMIT;
Commit complete.

-- Query the data. We hope to see the same result from the function we'll create later.
SQL> SELECT col1,
  2         SUM(col2)
  3    FROM demo
  4  GROUP BY col1
  5  ORDER BY col1;

COL1       SUM(COL2)
---------- ----------
         1         10
         2        110
         3         90
         4        220
         5        250
         6        330
         7        490
         8        440
         9        810
        10        550
10 rows selected.
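
The function below logs its per-COL1 totals to a table called demo_log, which is not created anywhere in this post. Here is a minimal sketch of the DDL it assumes, with column names matching the query of demo_log further down:

CREATE TABLE demo_log
( col1  NUMBER,
  total NUMBER );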

Now here is the function – pay close attention to the PARALLEL_ENABLE and ORDER clauses in the header.

CREATE OR REPLACE FUNCTION delete_demo ( p_curs demo_curs.v_demo_curs )
                  RETURN v_n_t
                  --
                  -- Rows returned by the cursor will be partitioned across
                  -- parallel processes based on their COL1 value. For example
                  -- all COL1=1 rows will go to the same parallel process, all
                  -- COL1=2 rows will go to the same parallel process, and so on
                  --
                  PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(col1) )
                  --
                   -- Within a parallel process the rows will be ordered by their COL1
                   -- values. For example if COL1=1 and COL1=2 rows go to the same parallel
                   -- process they won't get mixed up
                  --
                  ORDER p_curs BY ( col1 ) AS

    PRAGMA AUTONOMOUS_TRANSACTION;
    v_ret_val v_n_t := v_n_t();
    v_number NUMBER;

    v_last_col1 NUMBER := NULL; -- last col1 value processed
    v_total NUMBER := 0;        -- running total of col2 for each col1
    v_col1 NUMBER;              -- col1 value queried
    v_col2 NUMBER;              -- col2 value queried

BEGIN

   /*
   ||
   || The processing in this function is pretty straightforward. As the COL1 values
    || arrive we track the SUM of COL2 values. When the COL1 value changes we output the
   || totals to a log table (demo_log), reset the counters and start again.
   ||
   */
   LOOP

    -- get a row from cursor
    FETCH p_curs INTO v_col1, v_col2;
    EXIT WHEN p_curs%NOTFOUND;

    -- initialize last col1 if this is first record
    -- from cursor
    IF v_last_col1 IS NULL THEN
      v_last_col1 := v_col1;
    END IF;

    -- if same col1 value then add to total
    IF v_last_col1 = v_col1 THEN
      v_total := v_total + v_col2;
    ELSE
      --
      -- not same col1 value so save log and
      -- re-initialize tracking values
      --
      INSERT INTO demo_log
      VALUES(v_last_col1,v_total);
      v_last_col1 := v_col1;
      v_total := v_col2;

    END IF; -- if same col1

    -- delete all rows for this col1 value and save the rowcount in the return collection
    v_ret_val.EXTEND;
    DELETE demo
    WHERE col1 = v_col1;
    v_ret_val(v_ret_val.LAST) := SQL%ROWCOUNT;

  END LOOP;

  -- don't forget to log the last total
  IF v_last_col1 IS NOT NULL THEN
    INSERT INTO demo_log
    VALUES(v_last_col1,v_total);
  END IF;

  COMMIT;

  RETURN(v_ret_val);

END;
/

Now let's execute the function.

SQL> ALTER SESSION FORCE PARALLEL QUERY;
Session altered.

SQL> SELECT COUNT(*)
 2     FROM TABLE(delete_demo(CURSOR(SELECT col1,
 3                                          col2
 4                                     FROM demo
 5                                   ORDER BY col2))) -- ordered by col2 to create randomness
 6 /
  COUNT(*)
----------
       100

SQL> ALTER SESSION DISABLE PARALLEL QUERY;
Session altered.
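
If you want to confirm the statement really did use parallel query servers, the v$pq_sysstat check from the earlier posts in this series works here too:

SELECT statistic,
       value
  FROM v$pq_sysstat
 WHERE value > 0
 ORDER BY value DESC;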

Now let's see what wound up in the log table.

SQL> SELECT *
  2    FROM demo_log
  3  ORDER BY col1
  4 /
      COL1      TOTAL
---------- ----------
         1         10
         2        110
         3         90
         4        220
         5        250
         6        330
         7        490
         8        440
         9        810
        10        550

That matches the GROUP BY query we ran earlier, so the partitioning and ordering of the cursor worked!

I’ll conclude with two key points:

  1. The partitioning and ordering work despite the ORDER BY clause supplied in the cursor's SQL
  2. The cursor must be strongly typed to be partitioned and ordered. I used this package for that:
      CREATE OR REPLACE PACKAGE demo_curs AS
        TYPE v_demo_curs IS REF CURSOR RETURN demo%ROWTYPE;
      END;

In the next post I’ll explain PIPELINING to make the processing even faster.
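
As a quick preview, here is a minimal, hypothetical pipe_demo function invented just for illustration. It only streams COL2 values back, with none of the delete or logging logic, and it reuses the demo_curs package and v_n_t type from above:

CREATE OR REPLACE FUNCTION pipe_demo ( p_curs demo_curs.v_demo_curs )
                  RETURN v_n_t PIPELINED
                  PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(col1) )
                  ORDER p_curs BY ( col1 ) AS
  v_col1 NUMBER;
  v_col2 NUMBER;
BEGIN
  LOOP
    FETCH p_curs INTO v_col1, v_col2;
    EXIT WHEN p_curs%NOTFOUND;
    -- each row is streamed back to the calling SELECT as soon as it is
    -- produced instead of being collected into a nested table first
    PIPE ROW(v_col2);
  END LOOP;
  RETURN;
END;
/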

Thanks for reading!

Parallel By ANY

Previous posts in this category introduced a way to perform DML in a SELECT using the following technology:

  • Table Functions
  • Autonomous Transactions

The following concepts were included:

  • A REF CURSOR as a parameter
  • Row by row (aka slow by slow) processing of the REF CURSOR
  • Raw parallel query processing to try to overcome the slow by slow processing

This post makes the parallel processing less raw by specifying that one instance of the function will run on each parallel process.

To keep this demo simple I’ll let Oracle decide how many parallel processes to use. This won't break anything because all I’m after is extra processing power.
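
For reference, if you did want to dictate the degree yourself rather than leave it to Oracle, FORCE PARALLEL QUERY accepts an explicit degree (the 8 below is just an illustrative value):

ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;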

SQL> -- Create the table
SQL> CREATE TABLE demo
 2   ( col1 NUMBER,
 3     col2 NUMBER );
Table created.

SQL> -- create 1000 demo records
SQL> BEGIN
 2     FOR counter IN 1..1000 LOOP
 3       INSERT INTO demo
 4       VALUES(counter,
 5              counter);
 6     END LOOP;
 7   END;
 8 /
PL/SQL procedure successfully completed.

SQL> COMMIT;
Commit complete.

SQL> -- create a return type for the function we will write later
SQL> CREATE OR REPLACE TYPE v_n_t AS TABLE OF NUMBER;
 2 /
Type created.

SQL> CREATE OR REPLACE FUNCTION delete_demo ( p_curs SYS_REFCURSOR )
 2                     RETURN v_n_t
 3                     --
 4                     -- Here is the parallel clause. The BY ANY clause tells Oracle to
 5                     -- route rows to whichever parallel process it sees fit
 6                     --
 7                    PARALLEL_ENABLE ( PARTITION p_curs BY ANY ) AS
 8                    --
 9                    PRAGMA AUTONOMOUS_TRANSACTION;
 10                   v_ret_val v_n_t := v_n_t();
 11                   v_number NUMBER;
 12 BEGIN
 13   LOOP
 14     FETCH p_curs INTO v_number;
 15     EXIT WHEN p_curs%NOTFOUND;
 16     v_ret_val.EXTEND;
 17     DELETE demo
 18     WHERE col1 = v_number;
 19     v_ret_val(v_ret_val.LAST) := SQL%ROWCOUNT;
 20   END LOOP;
 21   COMMIT;
 22   RETURN(v_ret_val);
 23 END;
 24 /
Function created.

SQL> ALTER SESSION FORCE PARALLEL QUERY;
Session altered.

SQL> SELECT COUNT(*)
 2     FROM TABLE(delete_demo(CURSOR(SELECT col1
 3                                     FROM demo)))
 4 /

COUNT(*)
----------
      1000

SQL> ALTER SESSION DISABLE PARALLEL QUERY;
Session altered.

SQL> SELECT COUNT(*)
 2     FROM demo
 3 /

COUNT(*)
----------
         0

SQL> SELECT statistic,
 2          value
 3     FROM v$pq_sysstat
 4    WHERE value > 0
 5   ORDER BY value DESC;

STATISTIC            VALUE
-------------------- -----
Local Msgs Recv'd       30
Local Msgs Sent         30
Servers Idle            12
Servers Started         12
Servers Highwater        4
Server Sessions          4
Queries Initiated        1
DFO Trees                1

Note the much more efficient use of PQ processing. Only 30 messages!

The next post will show how parallelism can be controlled to provide both raw power and functionality.

Thanks for reading!

REF Cursors, Row By Row and Parallel Query

In the first post in this category I introduced a way to perform DML in a SELECT using the following technology:

  • Table Functions
  • Autonomous Transactions

In this post I’ll add the following:

  • A REF CURSOR as a parameter
  • Row by row (aka slow by slow) processing of the REF CURSOR
  • Raw parallel query processing to try to overcome the slow by slow processing

Let's get to the code, shall we…

SQL> -- Create the table
SQL> CREATE TABLE demo
  2 ( col1 NUMBER,
  3   col2 NUMBER );
Table created.

SQL> -- create 1000 demo records
SQL> BEGIN
  2    FOR counter IN 1..1000 LOOP
  3      INSERT INTO demo
  4      VALUES(counter,
  5             counter);
  6    END LOOP;
  7  END;
  8 /
PL/SQL procedure successfully completed.

SQL> COMMIT;
Commit complete.

SQL> -- create a return type for the function we will write later
SQL> CREATE OR REPLACE TYPE v_n_t AS TABLE OF NUMBER;
  2 /
Type created.

SQL> -- now for the function
SQL> /*
SQL> Key points
SQL>    1) The parameter is a REF cursor - how cool is that?
SQL>    2) We loop through the records one by one aka SLOW BY SLOW...
SQL>    3) The function deletes the rows it queries
SQL>    4) This is madness
SQL>    5) Yes, it is madness but I'm leading up to something
SQL>    6) When will you make your point
SQL>    7) 3 or 4 more posts; I promise
SQL>    8) OK. This better be good...
SQL> */
SQL> CREATE OR REPLACE FUNCTION delete_demo ( p_curs SYS_REFCURSOR )
  2     RETURN v_n_t AS
  3     PRAGMA AUTONOMOUS_TRANSACTION;
  4     v_ret_val v_n_t := v_n_t();
  5     v_number NUMBER;
  6  BEGIN
  7    LOOP
  8      FETCH p_curs INTO v_number;
  9      EXIT WHEN p_curs%NOTFOUND;
 10      v_ret_val.EXTEND;
 11      DELETE demo
 12      WHERE col1 = v_number;
 13      v_ret_val(v_ret_val.LAST) := SQL%ROWCOUNT;
 14    END LOOP;
 15    COMMIT;
 16    RETURN(v_ret_val);
 17 END;
 18 /
Function created.

SQL> -- One way to overcome the slow by slow processing is with MORE power!
SQL> -- This can be provided by Parallel Query because we are using a SELECT here...
SQL> ALTER SESSION FORCE PARALLEL QUERY;
Session altered.

SQL> SELECT COUNT(*)
  2    FROM TABLE(delete_demo(CURSOR(SELECT col1
  3                                    FROM demo)))
  4 /

COUNT(*)
----------
      1000

SQL> ALTER SESSION DISABLE PARALLEL QUERY;
Session altered.

SQL> SELECT COUNT(*)
  2    FROM demo
  3 /

COUNT(*)
----------
         0

SQL> -- Check the parallel query server usage
SQL> -- Please note the number of message sent and received
SQL> -- because that becomes important in later posts
SQL> SELECT statistic,
  2         value
  3    FROM v$pq_sysstat
  4   WHERE value > 0
  5 ORDER BY value DESC;

STATISTIC           VALUE
------------------ ------
Local Msgs Sent     22000
Local Msgs Recv'd   22000
Server Sessions      4000
DFO Trees            1000
Queries Initiated    1000
Servers Idle           12
Servers Started        12
Servers Highwater       4

At this point you may/should be thinking this is a lot of work just to delete records from a table. And you would be right.

It would be far simpler to just set the parallel degree for the table and run the DELETE directly against it.

And if you want to process the rows deleted just add a RETURNING clause to the DELETE statement.

And looping through the rows one by one will take forever!

Those are all valid arguments. And this series of posts is not intended to debunk any of them.

It’s intended to show Table Functions as an option for processing large amounts of data, especially when extra processing such as running totals must be added to the workload.
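
As an aside, the DELETE ... RETURNING alternative mentioned a few paragraphs up would look something like this minimal sketch, which reuses the v_n_t type to collect the deleted COL1 values (the ROLLBACK is only there so the sketch leaves the data alone):

DECLARE
  v_deleted v_n_t;
BEGIN
  DELETE demo
  RETURNING col1 BULK COLLECT INTO v_deleted;
  DBMS_OUTPUT.PUT_LINE(v_deleted.COUNT || ' rows deleted');
  ROLLBACK; -- this sketch is only for comparison, so put the rows back
END;
/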

Over the next few articles I’ll expand on these topics. The next post introduces Parallelism for Table Functions.

Thanks for reading!