Channel: SCN : Document List - SAP HANA and In-Memory Computing

CE function trick #1 - Efficient cross join


Hi Folks,

 

The following is a simplified example from a current project, showing how to achieve 100x or better performance by executing cross joins with CE functions rather than SQL.

 

In testing, the CE function approach has stayed at around 700 ms across various dataset sizes, so the performance improvement may be greater than 100x.

 

 

/*
 Certain use cases in HANA require a 'cartesian product', 
 also known as a cross join. A typical example is replacing [nested for loops
 over data sets with tuple calculations] with a [CROSS JOIN + calculated column].
 A real use case for a cross join is CRM IPM Availability Requests. Media companies
 can maintain information for products (e.g. movies) at different 'rights scopes':
 - Media (e.g. Free TV, Pay TV, Cable...)
 - Territory (e.g. regions, countries, states, cities, counties)
 - Languages
 A salesperson will want to find out what the availability is for certain products
 in certain rights scopes. In the example below, a salesperson can search for availability 
 for 1000+ products, 60 media, 240 territories and 4 languages, resulting in about
 57.6 million combinations (1000 x 60 x 240 x 4).
 The code below shows how to calculate the cross join with SQL and with CE functions;
 the CE functions perform about 100x better.
*/

--SET SCHEMA TEST;
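/*
 Generator tables used to create fake data.
 Note: the DROP statements in this script report an error on the first run,
 because the objects do not exist yet.
*/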
DROP TABLE GENERATOR1;
CREATE COLUMN TABLE GENERATOR1 (G1 NCHAR(1));

DROP TABLE GENERATOR2;
CREATE COLUMN TABLE GENERATOR2 (G2 NCHAR(1));

DROP TABLE GENERATOR3;
CREATE COLUMN TABLE GENERATOR3 (G3 INTEGER);

INSERT INTO GENERATOR1 VALUES ('A');
INSERT INTO GENERATOR1 VALUES ('B');
INSERT INTO GENERATOR1 VALUES ('C');
INSERT INTO GENERATOR1 VALUES ('D');
INSERT INTO GENERATOR1 VALUES ('E');
INSERT INTO GENERATOR1 VALUES ('F');
INSERT INTO GENERATOR1 VALUES ('G');
INSERT INTO GENERATOR1 VALUES ('H');
INSERT INTO GENERATOR1 VALUES ('I');
INSERT INTO GENERATOR1 VALUES ('J');

INSERT INTO GENERATOR2 VALUES ('!');
INSERT INTO GENERATOR2 VALUES ('@');
INSERT INTO GENERATOR2 VALUES ('#');
INSERT INTO GENERATOR2 VALUES ('$');
INSERT INTO GENERATOR2 VALUES ('&');
INSERT INTO GENERATOR2 VALUES ('*');

INSERT INTO GENERATOR3 VALUES (1);
INSERT INTO GENERATOR3 VALUES (2);
INSERT INTO GENERATOR3 VALUES (3);
INSERT INTO GENERATOR3 VALUES (4);
INSERT INTO GENERATOR3 VALUES (5);
INSERT INTO GENERATOR3 VALUES (6);
INSERT INTO GENERATOR3 VALUES (7);
INSERT INTO GENERATOR3 VALUES (8);
INSERT INTO GENERATOR3 VALUES (9);
INSERT INTO GENERATOR3 VALUES (10);

-- 1000 unique GUIDs for products
DROP TABLE PRODUCT_GUID;
CREATE COLUMN TABLE PRODUCT_GUID AS (SELECT T1.G1 || T2.G1 || T3.G1 AS P FROM GENERATOR1 T1 CROSS JOIN GENERATOR1 T2 CROSS JOIN GENERATOR1 T3); 

-- 60 medias
DROP TABLE MEDIA;
CREATE COLUMN TABLE MEDIA AS (SELECT T1.G1 || T2.G2 AS M FROM GENERATOR1 T1 CROSS JOIN GENERATOR2 T2);

-- 240 territories
DROP TABLE TERRITORY;
CREATE COLUMN TABLE TERRITORY AS (SELECT T1.G1 || T2.G2 || T3.G3 AS T FROM GENERATOR1 T1 CROSS JOIN GENERATOR2 T2 CROSS JOIN (SELECT TOP 4 G3 FROM GENERATOR3) T3);

-- 4 languages
DROP TABLE LANGUAGE;
CREATE COLUMN TABLE LANGUAGE AS (SELECT TOP 4 G3 AS L FROM GENERATOR3);

DROP TYPE TT_TAB;
CREATE TYPE TT_TAB AS TABLE (P NVARCHAR(32), M NVARCHAR(30), T NVARCHAR(30), L NVARCHAR(30));
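
-- optional sanity check (a sketch, not part of the original script): count the rows
-- of each input table and multiply the counts to get the size of the cross join.
-- The aliases CP, CM, CT, CL and the column name N are illustrative.
-- With the generator data above this should return 1000, 60, 240, 4 and 57600000.
SELECT CP.N AS PRODUCTS,
       CM.N AS MEDIAS,
       CT.N AS TERRITORIES,
       CL.N AS LANGUAGES,
       CP.N * CM.N * CT.N * CL.N AS COMBINATIONS
FROM (SELECT COUNT(*) AS N FROM PRODUCT_GUID) CP
CROSS JOIN (SELECT COUNT(*) AS N FROM MEDIA) CM
CROSS JOIN (SELECT COUNT(*) AS N FROM TERRITORY) CT
CROSS JOIN (SELECT COUNT(*) AS N FROM LANGUAGE) CL;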

-- Read-only cross join procedure in SQL: Products x Media x Territory x Language
DROP PROCEDURE CROSS_JOIN_SQL;
CREATE PROCEDURE CROSS_JOIN_SQL (OUT var_out TT_TAB)
READS SQL DATA WITH RESULT VIEW CJ_SQL_VIEW AS
BEGIN
 var_out = SELECT *
           FROM PRODUCT_GUID
           CROSS JOIN MEDIA
           CROSS JOIN TERRITORY
           CROSS JOIN LANGUAGE;
END;

-- Read-only cross join procedure with CE functions: Products x Media x Territory x Language
DROP PROCEDURE CROSS_JOIN_CE;
CREATE PROCEDURE CROSS_JOIN_CE (OUT var_out TT_TAB)
READS SQL DATA WITH RESULT VIEW CJ_CE_VIEW AS
BEGIN
 -- 'query' tables
 a = CE_COLUMN_TABLE(PRODUCT_GUID, [P]);
 b = CE_COLUMN_TABLE(MEDIA, [M]);
 c = CE_COLUMN_TABLE(TERRITORY, [T]);
 d = CE_COLUMN_TABLE(LANGUAGE, [L]);
 -- add dummy field F, used for 'fake' cross join
 a1 = CE_PROJECTION(:a, [P, CE_CALC('1', INTEGER) AS F]);
 b1 = CE_PROJECTION(:b, [M, CE_CALC('1', INTEGER) AS F]);
 c1 = CE_PROJECTION(:c, [T, CE_CALC('1', INTEGER) AS F]);
 d1 = CE_PROJECTION(:d, [L, CE_CALC('1', INTEGER) AS F]);
 -- 'fake' cross join: F is 1 in every row, so joining on F pairs every row with every row
 ab = CE_JOIN(:a1, :b1, [F], [F, P, M]);
 cd = CE_JOIN(:c1, :d1, [F], [F, T, L]);
 abcd = CE_JOIN(:ab, :cd, [F], [F, P, M, T, L]);
 var_out = CE_PROJECTION(:abcd, [P, M, T, L]);
END;
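
-- Illustration only (not part of the original script): the same dummy-key idea
-- written in plain SQL for two of the tables. Because F is the constant 1 on
-- both sides, the equi-join on F pairs every product with every media, which is
-- what a cross join returns (1000 x 60 = 60,000 rows with the data above).
-- This sketch only shows why the CE_JOIN on F behaves like a cross join; it makes
-- no claim about performing like the CE version. The aliases A and B are illustrative.
SELECT A.P, B.M
FROM (SELECT P, 1 AS F FROM PRODUCT_GUID) A
INNER JOIN (SELECT M, 1 AS F FROM MEDIA) B
ON A.F = B.F;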

-- server processing time is about 70 sec   
SELECT * FROM CJ_SQL_VIEW;
-- server processing time is about 700 ms 
SELECT * FROM CJ_CE_VIEW;

-- optional: verify same number of records in each 
-- SELECT COUNT(*) FROM CJ_SQL_VIEW;
-- SELECT COUNT(*) FROM CJ_CE_VIEW;
-- optional: verify that results match
-- SELECT * FROM CJ_SQL_VIEW ORDER BY P, M, T, L;
-- SELECT * FROM CJ_CE_VIEW ORDER BY P, M, T, L;
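-- optional: a set-based comparison, as a sketch (not part of the original script).
-- EXCEPT returns the distinct rows of the first query that are missing from the
-- second, so both counts should be 0 if the two views hold the same combinations.
-- This is expensive on 57.6 million rows; the alias DIFF and the column names
-- ONLY_IN_SQL and ONLY_IN_CE are illustrative.
SELECT COUNT(*) AS ONLY_IN_SQL
FROM (SELECT P, M, T, L FROM CJ_SQL_VIEW
      EXCEPT
      SELECT P, M, T, L FROM CJ_CE_VIEW) DIFF;

SELECT COUNT(*) AS ONLY_IN_CE
FROM (SELECT P, M, T, L FROM CJ_CE_VIEW
      EXCEPT
      SELECT P, M, T, L FROM CJ_SQL_VIEW) DIFF;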



