Reorganizing transaction tables for faster historical queries

by Granville Bonyata on September 30, 2012

Sometimes we have a transaction table that is frequently queried for historical data, and the query, while well-tuned, is still too slow simply because it has to range-scan an index and then retrieve the corresponding table row, one single-block read at a time, over and over. For a query that retrieves many rows, this runs far slower than it needs to and adds significant overhead.

It’s an OLTP table, which means the data accumulates over time. When we query one key’s history (let’s say an employee’s timesheets queried by employee id), we’re pulling back rows that were inserted at many different times, so the data is scattered throughout the table. The odds are that every row retrieved will require a separate disk I/O.
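(Oracle actually tracks how scattered a table’s rows are relative to an index: the index’s clustering_factor. As a rough gauge, once statistics are gathered, a clustering_factor close to the table’s block count means the rows are nearly in index order, while one close to the row count means almost every row fetch is a separate block read. The query below is just an illustration, using the ORDER_INDEX index created further down in this post.)

--Rough gauge of how scattered rows are relative to an index (assumes fresh statistics).
SELECT i.index_name, i.clustering_factor, t.blocks, t.num_rows
FROM user_indexes i
JOIN user_tables t ON t.table_name = i.table_name
WHERE i.index_name = 'ORDER_INDEX';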

The nice thing is that many (most?) of these queries filter by the same id. For timesheets it will be employee_id; for orders, customer number; for banking transactions, account number; and so on. So the answer is to periodically re-order the table so that it is sorted by the most queried field, putting each id’s rows into the same or adjacent disk blocks.

Depending on the size of the tables and how much downtime, if any, your system allows, you may:
1) Take the entire table off-line, and replace it with a copy that has been sorted.
2) Partition the table, and do a re-sort of older partitions once they move into history with little activity.
There are techniques for both of these which we’ll discuss in a later post, but a rough sketch of each follows below. This post is to show the performance gain from sorting the data by the most queried field.
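To give a flavor of what those look like before that post, here is a minimal sketch of each, with made-up table and partition names and none of the index, grant, or locking details handled:

--Option 1 sketch: build a sorted copy during a quiet window, then swap the names.
--(Indexes, grants and constraints on the new table must be handled separately.)
CREATE TABLE order_table_sorted AS
SELECT * FROM order_table ORDER BY customer_fk
/
ALTER TABLE order_table RENAME TO order_table_old
/
ALTER TABLE order_table_sorted RENAME TO order_table
/

--Option 2 sketch: rebuild one historical, low-activity partition in sorted order
--and swap it back in with a partition exchange.
CREATE TABLE orders_2011_sorted AS
SELECT * FROM order_table PARTITION (orders_2011) ORDER BY customer_fk
/
ALTER TABLE order_table EXCHANGE PARTITION orders_2011 WITH TABLE orders_2011_sorted WITHOUT VALIDATION
/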

First, let’s create a very large order table with customer orders scattered throughout it, plus a second copy of the same data sorted by customer.

CREATE TABLE unordered_order_table
(order_pk NUMBER NOT NULL,
customer_fk NUMBER NOT NULL,
text1 VARCHAR2(20),
text2 VARCHAR2(20),
text3 VARCHAR2(20),
text4 VARCHAR2(20)
)
/
CREATE INDEX order_index ON unordered_order_table (customer_fk)
/

CREATE TABLE ordered_order_table
(order_pk NUMBER NOT NULL,
customer_fk NUMBER NOT NULL,
text1 VARCHAR2(20),
text2 VARCHAR2(20),
text3 VARCHAR2(20),
text4 VARCHAR2(20)
)
/
CREATE INDEX order2_index ON ordered_order_table (customer_fk)
/

--Load it up with 3,000,000 orders, evenly divided across 10,000 customers.
--Note that the customer orders are not inserted together.
CREATE SEQUENCE order_seq cache 10000;

DECLARE
BEGIN
FOR i IN 1..300 LOOP
INSERT /*+ APPEND */ INTO unordered_order_table
SELECT order_seq.nextval,
level,
TO_CHAR(ROUND(dbms_random.value(100000,99999999999999999999),0)),
CASE WHEN MOD(ROWNUM,2) = 0 THEN 'ACTIVE' ELSE 'INACTIVE' END,
CASE WHEN MOD(ROWNUM,3) = 0 THEN 'QWERTYUIOP123456789'
WHEN MOD(ROWNUM,3) = 1 THEN TO_CHAR(ROUND(dbms_random.value(100000,99999999999999999999),0))
ELSE 'active' END,
TO_CHAR(ROUND(dbms_random.value(100000,99999999999999999999),0))
from dual
connect by level <= 10000 ORDER BY 2;
COMMIT;
END LOOP;
END;
/

--Now populate the second table with the same data, sorted by the customer number
INSERT /*+ APPEND */ INTO ordered_order_table
SELECT * FROM unordered_order_table ORDER BY customer_fk
/
COMMIT
/

--And a script to compare them (this is what c:\temp\test_run.sql contains)
declare
type array1 is table of NUMBER index by binary_integer;
l_data array1;
l_start TIMESTAMP;
l_end TIMESTAMP;
l_customer_id NUMBER;
l_text VARCHAR2(100);

FUNCTION milliseconds (pStartDate IN TIMESTAMP, pEndDate IN TIMESTAMP) RETURN NUMBER IS
nMilliseconds NUMBER;
BEGIN
nMilliseconds := EXTRACT(DAY FROM pEndDate - pStartDate) * 24 * 60 * 60 * 1000
  + EXTRACT(HOUR FROM pEndDate - pStartDate) * 60 * 60 * 1000
  + EXTRACT(MINUTE FROM pEndDate - pStartDate) * 60 * 1000
  + EXTRACT(SECOND FROM pEndDate - pStartDate) * 1000;
RETURN (nMilliseconds);
END;

begin
l_customer_id := ROUND(dbms_random.value(1,10000));

l_start := CURRENT_TIMESTAMP;
--Time collecting all rows for a customer from the unordered table. Since we're selecting
--a non-indexed field, we know it has to read the table rows, not just the index.
SELECT o.order_pk BULK COLLECT INTO l_data
FROM unordered_order_table o
WHERE o.customer_fk = l_customer_id;
l_end := CURRENT_TIMESTAMP;
dbms_output.put_line('Count: '||l_data.count||' Unordered: '||ROUND(milliseconds(l_start, l_end),0)||'ms');

l_start := CURRENT_TIMESTAMP;
--Same query against the ordered table.
SELECT o.order_pk BULK COLLECT INTO l_data
FROM ordered_order_table o
WHERE o.customer_fk = l_customer_id;
l_end := CURRENT_TIMESTAMP;
dbms_output.put_line('Count: '||l_data.count||' Ordered: '||ROUND(milliseconds(l_start, l_end),0)||'ms');
end;
/

--Now let's see what the results look like.
--First clear the buffer cache to make sure both tables start equally.
alter system flush buffer_cache;

--Now let's run some tests!
SQL> @c:\temp\test_run.sql
Count: 300 Unordered: 5790ms
Count: 300 Ordered: 1412ms

PL/SQL procedure successfully completed.

SQL> @c:\temp\test_run.sql
Count: 300 Unordered: 5545ms
Count: 300 Ordered: 144ms

PL/SQL procedure successfully completed.

SQL> @c:\temp\test_run.sql
Count: 300 Unordered: 5794ms
Count: 300 Ordered: 98ms

PL/SQL procedure successfully completed.

SQL> @c:\temp\test_run.sql
Count: 300 Unordered: 4754ms
Count: 300 Ordered: 168ms

As you can see, apart from the first run when the indexes hadn’t yet been cached, the performance difference is HUGE.
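(If you want to confirm where the time goes without the PL/SQL harness, SQL*Plus autotrace shows it plainly; the customer id below is just an arbitrary example value.)

SET AUTOTRACE TRACEONLY STATISTICS
SELECT order_pk FROM unordered_order_table WHERE customer_fk = 1234;
SELECT order_pk FROM ordered_order_table WHERE customer_fk = 1234;
SET AUTOTRACE OFF
--Compare the "physical reads" and "consistent gets" figures between the two statements.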

Of course, this only helps one type of query: the table’s data can only be sorted one way, so optimize for the field your heaviest queries filter on. But considering how low the cost is, if you are seeing high I/O waits when querying historical data out of an OLTP table, this technique is well worth a look.
