Friday, 18 April 2008

LAG, the Analytic Function

LAG is a nice analytic function in Oracle SQL which lets you access previous row of a row at the same time you access the current row.



The Oracle documentation says:

LAG is an analytic function. It provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. Found here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions070.htm#i1327527

Lets suppose you want to profile your customer by looking in the orders table and trying to figure out when was the last time the customer made an order. Even more you want to see the frequency of his/her ordering. How often he/she is making orders. That is, when was the last time you took an order and when was the last time before the last time ...etc.

Suppose your ORDERS table is this:


ORDER_ID ORDER_DAT FK_CUSTOMER_ID
---------- --------- --------------
1 25-FEB-08 1
2 25-MAY-06 1
3 25-JAN-08 1
5 29-JAN-07 3
6 25-JAN-08 3
88 20-JAN-04 2


Then if you wanted to look at FK_CUSTOMER_ID=1 and see the history, you can write something like this with the LAG analytic function:


select FK_CUSTOMER_ID, order_date, lag(order_date, 1) over (order by order_date) "PREVIOUS ORDER DATE",
order_date - (lag(order_date, 1) over (order by order_date)) "DAYS AGO"
from orders
where FK_CUSTOMER_ID=1
order by 2 desc


Which returns the history of the customer orders in a way which gives you an idea on what sort of the customer the customer is:


FK_CUSTOMER_ID ORDER_DAT PREVIOUS DAYS AGO
-------------- --------- --------- ----------
1 25-FEB-08 25-JAN-08 31
1 25-JAN-08 25-MAY-06 610
1 25-MAY-06


Analytics are cool!

Friday, 11 April 2008

scheduling jobs in the database with DBMS_JOB

Quickly submitting jobs with the Oracle DBMS_JOB package. The example below is an oracle schema user who wants to schedule the execution of a PL/SQL procedure at certain times during the day.


From SQL*Plus :


SQL> variable n number
SQL> exec dbms_job.submit( :n, 'YOURPLSQLPROCEDUREHERE;', sysdate, 'trunc(sysdate)+1+1/288' );
SQL> commit;


From PL/SQL :


declare
l_job number;
begin
dbms_job.submit( l_job,
'YOURPLSQLPROCEDUREHERE;',
trunc(sysdate)+20/24,
'trunc(sysdate)+1+20/24' );
commit;
end;



Some Scheduling time semantics
# 1/288 means every 5 minutes
# 1/24 means every hour
# trunc(sysdate)+1+11/24 means at 11am every day

Tuesday, 8 April 2008

SQL CASE statement and Aggregation

I find the CASE statement in SQL very good when it comes to classify data within ranges. Suppose you have a table called DEVICE which logs errors from some kind of machines and you wanted to classify the errors into categories according to the frequency of their occurance. Here is how you could use the CASE statement



Create the test table


DROP TABLE DEVICE;

CREATE TABLE DEVICE
(
DID NUMBER,
DNAME VARCHAR2(50),
DDATE DATE,
DMESSAGE VARCHAR2(50)
);


insert into device values (1,'Engine 5','09-MAR-08','leak');
insert into device values (1,'Engine 5','10-MAR-08','leak');
insert into device values (3,'Cam Belt','10-MAR-08','broken');
insert into device values (3,'Cam Belt','11-MAR-08','broken');
insert into device values (3,'Cam Belt','12-MAR-08','broken');
insert into device values (3,'Cam Belt','13-MAR-08','broken');
insert into device values (3,'Cam Belt','14-MAR-08','broken');
insert into device values (5,'Cockpit','24-MAR-08','lights out');
insert into device values (5,'Cockpit','25-MAR-08','lights out');
insert into device values (5,'Cockpit','23-MAR-08','lights out');
insert into device values (7,'Deck 34','29-MAR-08','starboard light green');
insert into device values (7,'Deck 34','28-MAR-08','starboard light green');
insert into device values (7,'Deck 34','28-MAR-08','starboard light green');
insert into device values (7,'Deck 34','31-MAR-08','starboard light green');
insert into device values (7,'Deck 34','30-MAR-08','starboard light green');
insert into device values (7,'Deck 34','05-APR-08','starboard light green');
insert into device values (7,'Deck 34','04-APR-08','starboard light green');

COMMIT;


And this is a table which would normally aggregate like this:


DNAME COUNT(*)
-------- --------
Engine 5 2
Cockpit 3
Deck 34 7
Cam Belt 5



Now let's use the CASE statement


select dname,
sum(case when cnt between 1 and 5 then cnt else 0 end) "NORMAL LEVEL",
sum(case when cnt between 6 and 11 then cnt else 0 end) "TOLERABLE LEVEL",
sum(case when cnt between 12 and (select count(*) from device) then cnt else 0 end) "DANGEROUS LEVEL"
from
(
select dname, count(*) cnt from device
group by dname
)
group by dname;


And here is the resultset with the CASE statement categorizing the aggregation by range.


DNAME NORMAL LEVEL TOLERABLE LEVEL DANGEROUS LEVEL
-------- ------------ --------------- ---------------
Engine 5 2 0 0
Cockpit 3 0 0
Deck 34 0 7 0
Cam Belt 5 0 0