Quantcast
Channel: SCN : Document List - SAP HANA and In-Memory Computing
Viewing all articles
Browse latest Browse all 1183

Predictive Analysis using SQL Functions of HANA

$
0
0

Problem Statement:

 

An online retailer wants to improve it's shipping/delivery process. Currently, there are deviations noticed between the planned and the actual date of delivery. Causing customer dis-satisfaction.

 

Currently the order book has 360 pending orders. Retailer wants to know the following:

 

  1. What pecent of orders will get delivered within 3 days of the planned delivery date?
  2. Within how many days of planned date will 95% of orders be delivered?

 

Solution Approach:

 

Note: A basic knowledge of statistics and normal distribution is required.

 

Retailer stores the historic delivery data in the "DELIVERYDATA" table in HANA.

 

Table "DELIVERYDATA" has the following columns:

 

OrderNo, CustomerID, ProductID,PlanDelDate,ActDelDate

 

where PlanDelDate --> Planned Delivery Date

         ActDelDate   --> Actual Delivery Date

 

 

Step 1:  Assuming that actual delivery dates follow a normal distribution. And delay is the difference between the Planned and Actual delivery date.

 

Step 2:  Need to find the mean delay and standard deviation of delay for the historical delivery data. Here is the SQL written in HANA:

 

 

select  avg((days_between(plandeldate,actdeldate)))     as   mean,

           stddev((days_between(plandeldate,actdeldate))) as   standard_deviation

from deliverydata

 

 

 

 

Mean came as 0.08 and standard deviation came as 3.5466. This is directly from the SQL above on HANA.

 

 

 

Step 3: Now the first problem. Need to find the percentage of delivery where the lag will be + or - 3 days. Used the following calculation on paper ( Manual ):

 

For "+3" days, the corresponding deviation comes as : ( 3 - 0.08 ) / 3.5466 = 0.8233. Looked into the standard deviation table manually, and the probability percentage comes as 29.39%. Please scroll to the end of this document to get the link for the standard deviation table.

 

In a similar way, for "-3" day", the probability percentage comes as 30.5%.

 

Combined together it comes = 29.39 + 30.5 = 59.89%

 

Company has 360 pending orders. 59.89% of 360 is ~ 216

 

So, 216 of these 360 orders are expected to be delivered within 3 days of the planned delivery date.

 

NormalDistribution.gif

 

 

Step 4: Next problem. Need to know the delay range, by which 95% of the 360 orders will be delivered.

 

For normal distribution, twice the standard deviation (2 sigma) accounts for 95% of the distribution.

 

Mean delay was 0.08 days. As some orders reached late and some early. And standard deviation was 3.5467.

 

+ 2 sigma comes as :  2 x 3.5467 = 7.0934. Add it to the mean of 0.08. So the total comes as 7.1734. For practical purposes, lets take 7.

 

Similary for - 2 sigma :  2 x 3.5467 - 0.08 = 7.01. Roughly 7 days.

 

So 95% of the orders would reach within + or - 7 days of the planned delivery date.

 

 

 

Note: The attachment has the csv file to load data to the HANA "DELIVERYDATA" table.

 

Also one can find the normal standard distribution table in the below link:

http://www.mathsisfun.com/data/standard-normal-distribution-table.html

 

Thanks,

Kowshik

 


Viewing all articles
Browse latest Browse all 1183

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>