SCN Document List - SAP HANA and In-Memory Computing

SAP HANA REVISION UPDATE – SPS10


Reason for HANA DB patch level update

We are copying a HANA database from SLES 11.3 (revision 102.01) to RHEL 6.5 (revision 102.00) using the backup/restore method with SWPM (homogeneous system copy). When restoring a HANA database, the target environment must be on the same or a higher patch level than the source. For this reason we are updating the target HANA environment from revision 102.00 to the latest available patch level, 102.04.
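
Before restoring, you can verify the revision on the source and target systems with a simple query (a minimal check; the same information is also shown by "HDB version" on OS level):

SELECT VERSION FROM "SYS"."M_DATABASE";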

Download SAP HANA patches

Download the following updates (database, studio, and client) from the SAP Service Marketplace and transfer them to the HANA server.

Fig6.png Fig7.png

The currently available patch level is 102.04 (PL04), so we are updating to PL04. We download the studio, client, and database packages for the update.

SAP HANA Backup before update

Take a complete data backup before starting the revision update.
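
For example, a complete data backup to the default file-based backup location can be triggered via SQL (a minimal sketch; the backup prefix is only an example, and Backint-based backups work just as well):

BACKUP DATA USING FILE ('BEFORE_REV_UPDATE');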

Fig4.png

Extract HANA Patches

Move all SAR files to the HANA host and extract them with SAPCAR using the switch -manifest SIGNATURE.SMF.

Fig8.png

If you extract more than one component SAR into a single directory, you need to move the SIGNATURE.SMF file into the corresponding subfolder (SAP_HANA_DATABASE, SAP_HANA_CLIENT, SAP_HANA_STUDIO, etc.) before extracting the next SAR, in order to avoid overwriting the SIGNATURE.SMF file. For more information, see SAP Note 2178665.

Fig9.png

Fig10.png

Do the same for the client and studio packages as well.

Fig11.png


HANA Update via STUDIO

Run SAP HANA Platform Lifecycle Management from HANA STUDIO

Fig12.png

Fig13.png

Fig14.png

Select the location of the extracted packages on the HANA host.

Fig16.png

Fig17.png

Fig18.png

Fig19.png

Fig20.png

Fig21.png

Fig22.png

Fig23.png

This completes the HANA patch level update.


HANA System Rename (hostname) through hdblcmgui command


Prerequisites

  • You are logged in as root user.
  • The SAP HANA system has been installed with the SAP HANA database lifecycle manager (HDBLCM).
  • The SAP HANA database server is up and running. Otherwise, inconsistencies in the configuration might occur.

Go to the HDBLCM directory on the HANA host:

# cd /hana/shared/SEC/hdblcm

# ./hdblcmgui

Fig1.png

Choose 'Rename the SAP HANA System'.

Fig2.png

Enter the <sid>adm password of the HANA database and specify the new hostname that needs to be set.

Fig3.png

Check the information on the screen and proceed to the next step.

Fig4.png

Adjust any settings you still want to change; otherwise proceed to the next step.

Fig5.png

Click the Rename button.

Fig6.png

The HANA database hostname has now been changed to <<new hostname>>.


SAP HANA - interesting notes and other information


Dear all,

 

I have been working with SAP HANA for more than four years now. During this time we had to find solutions for many different kinds of problems, and along the way I gathered additional information that extended my knowledge of SAP HANA.

 

The following notes and web pages might be very useful for your daily work with SAP HANA. The SAP Notes listed below explain various parts of SAP HANA very well, and on the web you can find well-written examples and explanations. All of this information may help you administer and maintain your HANA landscape.

 

Note number | Description
2063657 | HANA system replication takeover decision guidelines
1925267 | Forgot SYSTEM password
1999997 | FAQ: SAP HANA memory
2044468 | FAQ: SAP HANA partitioning
1999998 | FAQ: SAP HANA lock analysis
2000002 | FAQ: SAP HANA SQL optimization
2100009 | FAQ: SAP HANA savepoints
2147247 | FAQ: SAP HANA statistics server
2000003 | FAQ: SAP HANA
2114710 | FAQ: SAP HANA threads and thread samples
1999993 | How-to: interpreting SAP HANA mini check results
1514967 | SAP HANA: central note
2186744 | FAQ: SAP HANA parameters
2036111 | Configuration parameters for the SAP HANA systems
1969700 | SQL statement collection for SAP HANA
1999880 | FAQ: HANA system replication

 

Interesting websites with useful information.

URL | Description
https://blogs.saphana.com | SAP HANA Blog
http://help.sap.com/hana | SAP HANA Platform (Core) Help
http://help.sap.com/hana_platform | SAP HANA Platform (Core)
http://hana.sap.com/abouthana.html | SAP HANA Information
http://scn.sap.com/community/business-suite/blog/2015/03/02/sap-s4hana-frequently-asked-questions--part-1 | HANA FAQ (links to parts 2 and 3 are given in the article, too)


Courses on SAP HANA are also published on https://open.sap.com. These courses will give you a deeper knowledge of SAP HANA, and you can register for them free of charge.

 

Enjoy the websites, and I hope you find some useful information for your daily work. Please add comments if you like.

 

Martin

HANA stopped unexpectedly due to the accidental deletion of shared memory lock


Symptom

One of my friends faced a strange problem: an unexpected stop of the HANA server after some files under the /tmp directory were cleaned up.

After manually starting the system, HANA ran normally again (version: SPS 09, revision 95).

 

In nameserver_hosta....trc, the following error is shown:

[79877]{-1}[-1/-1] 2016-02-24 01:15:07.912168 f NameServer

TREXNameServer.cpp(03342) : shared memory lock

'/tmp/.hdb_ABC_30_lock'was deleted -> stopping instance ...

[79877]{-1}[-1/-1] 2016-02-24 01:15:10.655484 i Service_Shutdown

transmgmt.cc(06027) : Preparing for TransactionManager shutdown

 

Analysis

File /tmp/.hdb_<SID>_<instance number>_lock is used by HANA as a shared memory lock. If the file is deleted by accident, the database can no longer manage access to the shared memory segment and therefore has to stop.

For more detailed information, please refer to SAP Note 1984700 (HANA stopped unexpectedly).

If you are using Red Hat Enterprise Linux, be aware of tmpwatch, which deletes files that are older than a configured age from /tmp.

 

For HANA SPS 09 and earlier, the shared memory lock file is /tmp/.hdb_<sid>_<inst_id>_lock

For HANA SPS 10 and later, the shared memory lock file is /var/lib/hdb/<sid>/.hdb_<sid>_<inst_id>_lock

(1999998 - FAQ: SAP HANA Lock Analysis)

 

Solution

Do NOT delete the shared memory lock file.

If you are running Red Hat, remove tmpwatch from the system's cron jobs.

 

I hope this blog helps you fix this kind of problem if you face it. Thanks to Chiwo Lee for sharing the experience.

 

Regards,

Ning

Myth of HANA


Hi experts,

 

since SAP HANA became generally available in 2011, I have come across a lot of untruths about the new in-memory platform. As a consultant I have been able to talk to many customers and other consultants at events like TechEd, DSAG, business partner days, etc. Every time I was surprised that, after all this time, so much dangerous half-knowledge is still out there. Some of it can easily be eliminated by reading SAP Note 2100010 (SAP HANA: Popular Misconceptions).

Most answers to the statements are pretty easy to find in the official notes, guides, and other documents (blogs, presentations, articles, etc.), but maybe it is simply an overload of information.

 

1) start time

2) cross SID backup

3) col / row store conversion

4) sizing *2

5) statistics

6) data fragmentation
7) persistency layer

8) high memory consumption HANA vs. Linux

9) Backup

10) Backup catalog

 

S stands for the statement and A for the answer.

 

SQL scripts

The SQL scripts used here are available in the attachment of SAP Note 1969700 (SQL statement collection for SAP HANA).

 

 

1) Start time

S: "The start time (availability of the SAP system) must be 30 to 60min to load all data into memory"

A: Yes, to load all data into memory it takes some time, but for any DB it also takes time to fill its data buffer. For any DB the data buffer will be filled on first access of the data and stay there until the the LRU (least recently used) algorithm takes place and push it out of the buffer.

HANA is loading the complete row store on every start into memory. After this the system is available!

Short description of start procedure:

1) open data files

 

2) read out information about last savepoint ( mapping of logical pages to physical pages in the data file / open transaction list)

 

3) load row store (depends on the size and the I/O subsystem; about 5 min for 100 GB; see the query after this list to check your own row store size)

 

4) replay redo logs

 

5) roll back uncommitted transactions

 

6) perform savepoint

 

7) load column tables defined for preload and lazy load of column tables (asynchronous load of the column tables that were loaded before the restart)

For more details have a look at the SAP HANA Administration guide (search for "Restart Sequence") or the SAP HANA Administration book => Thanks to Lars and Richard for this great summary!
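
To estimate how long step 3 will take on your own system, you can check the current row store size per host, reusing the M_RS_MEMORY columns that also appear in the reorganization check later in this document:

SELECT HOST, PORT,
       TO_DECIMAL(SUM(ALLOCATED_SIZE)/1024/1024/1024, 10, 2) "Allocated GB",
       TO_DECIMAL(SUM(ALLOCATED_SIZE - FREE_SIZE)/1024/1024/1024, 10, 2) "Used GB"
FROM M_RS_MEMORY
WHERE CATEGORY IN ('TABLE', 'CATALOG')
GROUP BY HOST, PORT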

 

Example:

Test DB: 40 GB NW 7.40 system with non-enterprise storage (= slow):

SQL HANA_IO_KeyFigures_Total:

read: 33mb/s
avg-read-size: 31kb
avg-read-time: 0,93ms
write: 83mb/s
avg-write-size: 243kb
avg-write-time: 2,85ms
row store size: 11GB
CPU: 8vcpu (vmware; CPU E5-2680 v2 @ 2.80GHz)

Start time without preload: AVG 1:48

Stop time without preload: AVG 2:15

 

Start time with a 5 GB column table (REPOSRC) preloaded

SQL for preload (more information in the guide "SAP HANA SQL and System views Reference"):

alter table REPOSRC preload all

 

verify with HANA_Tables_ColumnStore_PreloadActive script from note 1969700 - SQL statement collection for SAP HANA

 

Start time with preload: AVG 1:49

Stop time with preload: AVG 2:18

 

Why doesn't the start time increase although 5 GB more data has to be loaded?

Since SPS 07, the preloading and reloading of tables happens asynchronously, directly after the HDB restart has finished. That way, the system is available again for SQL access that does not require the columns that are still being loaded.

 

With enterprise hardware the start times are faster!

 

If you want to know how long it takes to load all data into memory you can execute a python script.

load all tables into memory with python script:

cdpy (/usr/sap/HDB/SYS/exe/hdb/python_support/)
python ./loadAllTables.py --user=System --password=<password> --address=<hostname> --port=3xx15 --namespace=<schema_name>

[140737353893632, 854.406] << ending loadAllTables, rc = 0 (RC_TEST_OK) (91 of 91 subtests passed), after 854.399 secs

 

In a similar enterprise system it takes about 140-200 seconds.

 

 

 

2) Cross SID backup

S: "It is not possible not refresh a system via Cross-SID-copy"

A: Cross SID copy (single container) from disk is already available since a long time. Since SPS09 it is also available via backint interface.

Multitenant Database Container (MDC) for a Cross-SID-copy are currently (SPS11) only able to restore via disk.

 

 

 

3) Col / row store conversion

S: "Column tables can't be converted to row store and vice versa. It is defined by sap which tables are stored in which type."

A: It is correct that during the migration the SWPM (used for syscopy) procedure creates files in which store the tables are created.

But you can technically change the type from row to column and vice versa on the fly. But there must be a reason for it, e.g. in advise of SAP Support. If you have no depencies to the application, e.g. custom tables or a standalone HANA installation for your own applications, you can choose freely.
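
For illustration only (the table name is a placeholder and assumes a custom table without application dependencies), the store type can be switched with a plain ALTER TABLE statement:

-- move a table to the column store
ALTER TABLE "ZMY_TABLE" COLUMN;
-- move it back to the row store
ALTER TABLE "ZMY_TABLE" ROW;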

 

In the past SAP delivered a rowstorelist.txt with SAP Note 1659383 (row store list for SAP NetWeaver 7.30/7.31 on SAP HANA Database). This approach is outdated. Nowadays you can use the latest version of SMIGR_CREATE_DDL with the option "RowStore List" (SAP Note 1815547 - row/column store check without rowstorelist.txt).

 

 

 

4) Sizing * 2

S: "You have to double the sizing the result of the sizing report."

A: Results of Sizing reports are final, you dont have to double them.

 

Example (BW on HANA):

|SIZING DETAILS                                                                |

|==============                                                                |

|                                                                              |

| (For 512 GB node)      data [GB]     total [GB]                              |

|                                      incl. dyn.                              |

| MASTER:                                                                      |

| -------                                                                      |

|                                                                              |

|  Row Store                    53            106                              |

|  Master Column Store          11             21                              |

|  Caches / Services            50             50                              |

|  TOTAL (MASTER)              114            178                              |

|                                                                              |

| SLAVES:                                                                      |

| -------                                                                      |

|                                                                              |

|  Slave  Column Store          67            135                              |

|  Caches / Services             0              0                              |

|  TOTAL (SLAVES)               67            135                              |

| ---------------------------------------------------------------              |

|  TOTAL (All Servers)         181            312                              |

 

This is a scale-up solution, so master and slave are functionally on one host. In a scale-out solution you have one host as master for the transaction load; this host holds all row store tables. SAP recommends a minimum of 3 hosts in a BW scale-out solution. The other 2 slaves are for the reporting load.

 

Static and dynamic RAM

SAP HANA main memory sizing is divided into a static and a dynamic RAM requirement. The static part relates to the amount of main memory that is used for holding the table data. The dynamic part has exactly the same size as the static one and is used for temporary data (grouping, sorting, query temp objects, etc.).

 

In this example you have:

Row store: 53 * 2 = 106 GB
Column store: master 11 * 2 = 21 GB (rounded) + slave 67 * 2 = 135 GB (rounded) => 156 GB
Caches / services: 50 GB is needed for every host
Total: 106 + 156 + 50 = 312 GB

 

 

 

5) Statistics

S: "Statistics are not needed any more. So no collect runs are needed"

A: For the Col store the Statement is correct in cause of the known data distribution through the dictionary. For the row store there is an automatically collection of statistics on the fly. So you don't have to schedule them. Currently it is not documented how you can trigger the collection or change sample size.

 

 

 

6) Data Fragmentation

S: "You don't have to take care of data fragmentation. All is saved in memory via col store and there is no fragmention of data"

A: Some tables are created in the row store. The row store still follows the old rules and conditions which results in fragmentation of data. How to analyze it?

Please see note 1813245 - SAP HANA DB: Row store reorganization

 

SELECT HOST, PORT, CASE WHEN (((SUM(FREE_SIZE) / SUM(ALLOCATED_SIZE)) > 0.30)
AND SUM(ALLOCATED_SIZE) > TO_DECIMAL(10)*1024*1024*1024)
THEN 'TRUE' ELSE 'FALSE' END "Row store Reorganization Recommended",
TO_DECIMAL( SUM(FREE_SIZE)*100 / SUM(ALLOCATED_SIZE), 10,2)"Free Space Ratio in %"
,TO_DECIMAL( SUM(ALLOCATED_SIZE)/1048576, 10, 2) "Allocated Size in MB"
,TO_DECIMAL( SUM(FREE_SIZE)/1048576, 10, 2) "Free Size in MB"
FROM M_RS_MEMORY WHERE ( CATEGORY = 'TABLE' OR CATEGORY = 'CATALOG' ) GROUP BY HOST, PORT

Reorg advice: a reorganization is recommended if the row store is bigger than 10 GB and has more than 30% free space.

 

!!! Please check all prerequisites in the notes before you start the reorg !!! (online / offline reorg)

Row Store offline Reorganization is triggered at restart time and thus service downtime is required. Since it's guaranteed that there are no update transactions during the restart time, it achieves the maximum compaction ratio.

 

Before

Row Store Size: 11GB

Freespace: ~3GB

in %: 27% (no reorg needed)

 

But for testing I configured the needed parameters in indexserver.ini (don't forget to remove them afterwards!):

4 min startup time => during the start the row store is reorganized in offline mode

 

After

Row Store Size: 7,5GB

Freespace: ~250MB

in %: 3,5%

 

Additionally, you should consider tables with multiple containers if the revision is 90+. Multiple containers are typically introduced when additional columns are added to an existing table. As a consequence of multiple containers the performance can suffer, e.g. because indexes only take effect for a subset of containers.

HANA_Tables_RowStore_TablesWithMultipleContainers

 

The compression methods of the column store (incl. indexes) should also be considered.

As of SPS 09 you can switch the largest unique indexes to INVERTED HASH indexes. On average you can save more than 30% of space. See SAP Note 2109355 (How-To: Configuring SAP HANA Inverted Hash Indexes) for more information. Compression optimization for those tables:

UPDATE "<table_name>" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE')

Details: SAP Note 2112604 (FAQ: SAP HANA Compression)

 

 

 

7) Persistency layer

S: "The persistency layer consists of exactly the same data which are loaded into memory"

A: As descibed in statement 3) the memory is parted into 2 areas. The temp data won't be stored on disk. The persistency layer on disk consists of the payload of data, before&after images / shadow pages concept + snapshot data + delta log (for delta merge). The real delta structure of the merge scenario only exists in memory, but it is written to the delta logs.

 

Check out this delta by yourself:

SQL: HANA_Memory_Overview

check memory usage vs. disk size
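
As a rough cross-check without the script, you can compare the memory currently used by all services with the payload stored in the data volumes (a sketch only; HANA_Memory_Overview uses more detailed sources):

SELECT TO_DECIMAL(SUM(TOTAL_MEMORY_USED_SIZE)/1024/1024/1024, 10, 2) "Used Memory GB"
FROM M_SERVICE_MEMORY;

SELECT TO_DECIMAL(SUM(USED_SIZE)/1024/1024/1024, 10, 2) "Used Data Volume GB"
FROM M_VOLUME_FILES WHERE FILE_TYPE = 'DATA';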

 

 

 

8) High Memory consumption HANA vs. Linux

S: "The used memory of the processes is the memory which is currently in use by HANA"

A: No, for the Linux OS it is not transparent what HANA currently real uses. The numbers in "top" are never maching the ones in the hana studio. HANA communicates free pages not instantly to the OS. There is a time offset for freed memory.

There is a pretty nice document which explains this behaviour in detail:

http://scn.sap.com/docs/DOC-60337

 

By default, garbage collection kicks in pretty late. If your system shows high memory consumption, the root cause is not necessarily bad sizing or high load; the reason could also be late garbage collection.

 

2169283 - FAQ: SAP HANA Garbage Collection

One kind of garbage collection we already discussed in 6) (row and column store fragmentation). Another one exists for hybrid LOBs, and there is one for the whole memory. Check your current heap memory usage with HANA_Memory_Overview.

 

In my little test system the value is 80 GB. In this example we have 14 GB for Pool/Statistics, 13 GB for Pool/PersistenceManager/PersistentSpace(0)/DefaultLPA/Page and 9 GB for Pool/RowEngine/TableRuntimeData.

Check also the value of the column EXCLUSIVE_ALLOCATED_SIZE in the monitoring view "M_HEAP_MEMORY". It contains the sum of all allocations in this heap allocator since the last startup.

 

select CATEGORY, EXCLUSIVE_ALLOCATED_SIZE,EXCLUSIVE_DEALLOCATED_SIZE,EXCLUSIVE_ALLOCATED_COUNT,
EXCLUSIVE_DEALLOCATED_COUNT from M_HEAP_MEMORY
where category = 'Pool/Statistics'
or category='Pool/PersistenceManager/PersistentSpace(0)/DefaultLPA/Page'
or category='Pool/RowEngine/TableRuntimeData';

Just look at the indexserver port 3xx03 (the xsengine may also be listed if it is active).

 

CATEGORY | EXCL_ALLOC_SIZE | EXCL_DEALLOC_SIZE | EXCL_ALLOC_COUNT | EXCL_DEALLOC_COUNT
Pool/PersistenceManager/PersistentSpace(0)/DefaultLPA/Page | 384.055.164.928 | 369.623.433.216 | 6.177.019 | 5.856.165
Pool/RowEngine/TableRuntimeData | 10.488.371.360 | 792.726.992 | 83.346.945 | 26
Pool/Statistics | 2.251.935.681.472 | 2.237.204.512.696 | 7.146.662.527 | 7.084.878.887

 

Because of the many deallocations there is a gap between EXCLUSIVE_ALLOCATED_SIZE and the currently allocated size. The difference is usually free for reuse and can be freed with a GC run.

 

By default the memory garbage collection is triggered in the following cases:

Parameter + default value | Details
async_free_target = 95 (%) | When proactive memory garbage collection is triggered, SAP HANA tries to reduce allocated memory below async_free_target percent of the global allocation limit.
async_free_threshold = 100 (%) | With the default of 100 % the garbage collection is quite "lazy" and only kicks in when there is a memory shortage. This is in general no problem and provides performance advantages, as the number of memory allocations and deallocations is minimized.
gc_unused_memory_threshold_abs = 0 (MB) | Memory garbage collection is triggered when the amount of allocated, but unused memory exceeds the configured value (in MB).
gc_unused_memory_threshold_rel = -1 (%) | Memory garbage collection is triggered when the amount of allocated memory exceeds the used memory by the configured percentage.

 

The % values are related to the configured global allocation limit.
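
If, after careful analysis, you decide to change one of these thresholds, the parameters can be set online (a hedged example; to my knowledge they belong to the memorymanager section of global.ini, and the value of 10240 MB is purely illustrative):

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('memorymanager', 'gc_unused_memory_threshold_abs') = '10240'
WITH RECONFIGURE;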

 

Unnecessarily triggered garbage collection should absolutely be avoided; how you configure these values depends on your system load and sizing.

The unused memory is normally reused by the HDB (free pool), so there is no need to trigger the GC manually. But in some cases it is possible that a pool uses more memory. This should be analyzed (SAP Note 1999997 - FAQ: SAP HANA Memory, question 14: How can I identify how a particular heap allocator is populated?).

If we now trigger a manual GC for the memory area:

hdbcons 'mm gc -f'

 

Before:

heap: 80GB

 

free -m

             total       used       free     shared    buffers     cached

Mem:        129073     126877       2195      15434        142      32393

-/+ buffers/cache:     94341      34731

 

 

Garbage collection. Starting with 96247664640 allocated bytes.

82188451840 bytes allocated after garbage collection.

 

After:

heap: 72GB

 

free -m

             total       used       free     shared    buffers     cached

Mem:        129073     113680      15393      15434        142      32393

-/+ buffers/cache:     81144      47929


 

So in this scenario there is not much difference inside the HDB at this point, but on the OS side the memory that is no longer allocated is freed.

You don't have to do this manually! HANA fully manages its own memory!

 

If you get an alert (ID 1 / 43) because of the memory usage of your services, you should analyze not only the row and column store. Also take care of the garbage collection of the heap memory. In the past there were some bugs in this area.

Alert defaults:

ID 1: Host physical memory usage:      low: 95% medium: 98% high:100%

ID43: memory usage of services:         low: 80% medium: 90% high:95%

As you can see, by default the GC is triggered lazily at a 100% fill ratio of the global allocation limit. That may be too late for your system: the memory shortage may hit before the GC takes place or before you can react to it.

 

In addition to the memory usage, check the mini check script and the advice in the notes. If you are not sure how to analyze or solve the issue, you can order a TPO service from SAP (SAP Note 2177604 - FAQ: SAP HANA Technical Performance Optimization Service).

 

 

 

9) Backup

S: "Restore requires logs for consistent restore"

A: wrong, a HANA backup based on snapshot technology. So the backup is consistent without any additional log file. This means it is a full online copy of one particular consistent state which is defined by the log position at the time executing the backup.

Sure if you want to roll forward you have to apply Log Files for point in time recovery or most recent state.

 

 

 

10) Backup Catalog

S: "Catalog information are stored in a file like oracle *.anf which is needed for recovery"

A: The backup catalog is saved on every data AND log backup. It is not saved as human readable file! you can check the catalog in hana database studio or with command "strings log_backup_0_0_0_0.<backupid>" in the backup location of your system if you make backup-to-disk.

 

The backup catalog includes all the information needed to determine which file belongs to which backup set. If you delete your backups on disk/VTL/tape level, the backup catalog still holds the invalid entries.

 

Housekeeping of the backup catalog

There is currently no automatism that cleans it up. Just check the size of your backup catalog: if it is bigger than about 20 MB you should take care of housekeeping the backup catalog (this depends on your backup retention and the size of the system), because, as already mentioned, it is saved with EVERY log AND data backup. This can mean more than 200 times a day! How big is the current backup catalog of your productive HANA system? Check the backup editor in HANA Studio and click on "Show Log Backups". Search for the backup catalog, select it, and check the size.
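
The housekeeping itself can be done with the BACKUP CATALOG DELETE statement (a minimal sketch; pick the backup ID of an older complete data backup that you no longer need, and add the COMPLETE option only if the physical backup files should be deleted as well):

-- find the backup ID of an older successful complete data backup
SELECT BACKUP_ID, SYS_START_TIME FROM "SYS"."M_BACKUP_CATALOG"
WHERE ENTRY_TYPE_NAME = 'complete data backup' AND STATE_NAME = 'successful'
ORDER BY SYS_START_TIME;

-- remove all catalog entries older than that backup
BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID <backup_id>;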

 

 

Summary

In the end you also have to take care of your data housekeeping and resource management. You can save a lot of resources if you consider all the hints in the notes.


I hope I could clarify some statements for you.



###########

# Edit V4

###########

2100010 - SAP HANA: Popular Misconceptions

(Thanks to Lars for the hint)

 


Best Regards,

Jens Gleichmann



###########

# History

###########

V4: Updated statistics (5); row/col statement adjusted, format adjusted

V5: adjusted format and added details for backup catalog

SAP HANA: The Row Store, Column Store and Data Compression


Here is an attempt to explain the row store data layout, column store data layout and the data compression technique.

 

Row store: here all data belonging to a row is placed next to each other. See the example below.

 

Table 1:

Name | Location | Gender
... | ... | ...
Sachin | Mumbai | M
Sania | Hyderabad | F
Dravid | Bangalore | M
... | ... | ...

 

Row store corresponding to above table is

 

row store.jpg

 

 

Column store: here the contents of a column are placed next to each other. See the illustration of table 1 below.

 

column store.png

 

 

Data compression: SAP HANA provides a series of data compression techniques that can be used for data in the column store. To store the contents of a column, the HANA database creates a minimum of two data structures: a dictionary vector and an attribute vector. See table 2 below and the corresponding column store.

 

Table 2:

Record | Name | Location | Gender
... | ... | ... | ...
3 | Blue | Mumbai | M
4 | Blue | Bangalore | M
5 | Green | Chennai | F
6 | Red | Mumbai | M
7 | Red | Bangalore | F
... | ... | ... | ...

 

column store2.png

 

 

In the above example the column 'Name' has repeating values 'Blue' and 'Red'; similarly for 'Location' and 'Gender'. The dictionary vector stores each value of a column only once, in sorted order, and a position is maintained for each value. With reference to the above example, the dictionary vectors of Name, Location and Gender could be as follows.

 

Dictionary vector: Name

Name | Position
... | ...
Blue | 10
Green | 11
Red | 12
... | ...

 

Dictionary vector: Location

Location | Position
... | ...
Bangalore | 3
Chennai | 4
Mumbai | 5
... | ...

 

 

Dictionary vector: Gender

Gender | Position
F | 1
M | 2

 

 

Now the attribute vector corresponding to the above table would be as follows. It stores integer values, which are the positions in the dictionary vector.

 

  dictionary enco.png
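
You can see the result of the dictionary encoding and the compression type actually chosen per column in the monitoring view M_CS_COLUMNS (a small illustrative query; schema and table name are placeholders):

SELECT COLUMN_NAME, COMPRESSION_TYPE, "COUNT", DISTINCT_COUNT,
       UNCOMPRESSED_SIZE, MEMORY_SIZE_IN_TOTAL
FROM M_CS_COLUMNS
WHERE SCHEMA_NAME = '<schema>' AND TABLE_NAME = '<table>';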

Parallelization options with the SAP HANA and R-Integration


Why is parallelization relevant?

 

The R-Integration with SAP HANA aims at leveraging R’s rich set of powerful statistical, data mining capabilities, as well as its fast, high-level and built-in convenience operations for data manipulation (eg. Matrix multiplication, data sub setting etc.) in the context of a SAP HANA-based application. To benefit from the power of R, the R-integration framework requires a setup with two separate hosts for SAP HANA and the R/Rserve environment. A brief summary of how R processing from a SAP HANA application works is described in the following:

 

  • SAP HANA triggers the creation of a dedicated R-process on the R-host machine, then
  • R-code plus data (accessible from SAP HANA) are transferred via TCP/IP to the spawned R-process.
  • Some computational tasks take place within the R-process, and
  • the results are sent back from R to SAP HANA for consumption and further processing.


For more details, see the SAP HANA R Integration Guide: http://help.sap.com/hana/SAP_HANA_R_Integration_Guide_en.pdf

 

There are certain performance-related bottlenecks within the default integration setup which should be considered. The main ones are the following:

  • Firstly, latency is incurred when transferring large datasets from SAP HANA to the R-process for computation on the foreign host machine.
  • Secondly, R inherently executes in a single threaded mode. This means that, irrespective of the number of CPU resources available on the R-host machine, an R-process will by default execute on a single CPU core. Besides full memory utilization on the R-host machine, the available CPU processing capabilities will remain underutilized.


A straightforward approach to gain performance improvements in the given setup is by leveraging parallelization. Thus I want to present an overview and highlight avenues for parallelization within the R-Integration with SAP HANA in this document.


Overview of parallelization options


The parallelization options to consider vary from hardware scaling (host box) to R-process scaling and are illustrated in the following diagram


0-overview.png

The three main paths to leverage parallelization, as illustrated above, are the following:

     (1) Trigger the execution of multiple R-calls in parallel from within SQLScript procedures in SAP HANA

     (2) Use parallel R libraries to spawn child (worker) R processes within parent (master) R-process execution

     (3) Scale the number of R-host machines connected to SAP HANA for parallel execution (scale memory and add computational power)


While each option can be implemented independently of the others, they can also be combined and mixed. For example, if you go for (3) - scaling the number of R-hosts - you need (1) - triggering the execution of multiple R-calls - for parallelism to take place. Without (1), you may end up "only" with a better high availability/fault tolerance scenario.


Based on the following use case, I would illustrate the different parallelization approaches using some code examples:

A health care unit wishes to predict cancer patients' survival probability over different time horizons after following various treatment options based on diagnosis. Let's assume the following information:

  • Survival periods for prediction are: half year, one year and two years
  • Accordingly, 3 predictive models have been trained (HALF, ONE, TWO) to predict a new patient's survival probability over these periods, given a set of predictor variables based on historical treatment data.


In a default approach without leveraging parallelization, you would have one R-CALL transferring a full set of new patient data to be evaluated, plus all three models from SAP HANA to the R-host. On the R-host, a single-threaded R process will be spawned. Survival predictions for all 3 periods would be executed sequentially. An example of the SAP HANA stored procedure of type RLANG is as shown below.


0-serial.png

In the code above 3 trained models (variable tr_models) are passed to the R-Process for predicting the survival of new patient data (variable eval). The survival prediction based on each model takes place in the body of the “for loop” statement highlighted above.

 

Performance measurement: For a dataset of 1.038.024 observations (~16.15 MB) and 3 trained BLOB model objects (each ~26.8 MB), an execution time of 8.900 seconds was recorded.


There are various sources of overhead involved in this scenario. The most notable ones are:

  • Network communication overhead, in copying one dataset + 3 models (BLOB) from SAP HANA to R.
  • Code complexity, sequentially executing each model in a single-threaded R-process. Furthermore, the “for” loop control construct, though in-built into base R, may not be efficient from a performance perspective in this case.

 

By employing parallelization techniques, I hope to achieve better results in terms of performance. Let the results of this scenario constitute our benchmark for parallelization.



Applying the 3 parallelization options to the example scenario


1. Parallelize by executing multiple R-calls from SAP HANA


We can exploit the inherent parallel nature of SAP HANA's database processing engines by triggering multiple R-calls to run in parallel, as illustrated above. For each R-call triggered by SAP HANA, the Rserve process spawns an independent R-runtime process on the R-host machine.

 

An example illustrating how an SAP HANA SQLScript-stored procedure with multiple parallel calls of stored procedure type RLANG is given below. In the example, one thought is to separate patient survival prediction across 3 separate R-Calls as follows:

1-1 Rlang.png

  • Create an RLANG stored procedure handling survival prediction for just one model ( see input variable tr_model).
  • Include the expression "READS SQL DATA" (as highlighted above) in the RLANG procedure definition for parallel execution of the R-operators to occur when embedded in a procedure of type SQLScript. Without this instruction, R-calls embedded in an SQLScript procedure will execute sequentially.
  • Then create an SQLSCRIPT procedure

1-2 SQLScript.png


  • Embed 3 RLANG procedure-calls within the SQLSCRIPT procedure as highlighted. Notice that I am calling the same RLANG procedure defined previously but I pass on different trained model objects (trModelHalf, trModelOne, trModelTwo) to separate survival predication across different R-calls.
  • In this SQLScript procedure you can include the READS SQL DATA expression (recommended for security reasons as documented in the SAP HANA SQLScript Reference guide) in the SQLSCRIPT procedure definition, but to trigger R-Calls in parallel it is not mandatory. If included however, you cannot use DDL/DML instructions (INSERT/UPDATE/DELETE etc) within the SQLSCRIPT procedure.
  • On the R host, 3 R processes will be triggered and run in parallel. Consequently, 3 CPU cores will be utilized on the R machine. (A minimal sketch of the two procedures is shown after this list.)
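
The following is a minimal sketch of this pattern. All object names (PATIENT_T, MODEL_T, RESULT_T, PATIENT_EVAL, TRAINED_MODELS) and the R scoring code are illustrative placeholders, not the exact code used in the video; the sketch only shows where READS SQL DATA goes and how the three independent calls are issued:

-- hypothetical result table type
CREATE TYPE "RESULT_T" AS TABLE ("ID" INTEGER, "PROB" DOUBLE);

-- RLANG procedure that scores ONE model; READS SQL DATA enables parallel execution
CREATE PROCEDURE "PREDICT_SURVIVAL_R" (IN eval "PATIENT_T", IN tr_model "MODEL_T", OUT result "RESULT_T")
LANGUAGE RLANG READS SQL DATA AS
BEGIN
  -- R code executed on the R host: deserialize the single model and score the patient set
  model  <- unserialize(tr_model$MODEL[[1]])
  prob   <- predict(model, newdata = eval, type = "response")
  result <- data.frame(ID = eval$ID, PROB = prob)
END;

-- SQLScript wrapper: three independent R-calls that the calc engine can run in parallel
CREATE PROCEDURE "PREDICT_ALL_PERIODS" (OUT res_half "RESULT_T", OUT res_one "RESULT_T", OUT res_two "RESULT_T")
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  eval       = SELECT * FROM "PATIENT_EVAL";
  model_half = SELECT * FROM "TRAINED_MODELS" WHERE "PERIOD" = 'HALF';
  model_one  = SELECT * FROM "TRAINED_MODELS" WHERE "PERIOD" = 'ONE';
  model_two  = SELECT * FROM "TRAINED_MODELS" WHERE "PERIOD" = 'TWO';
  CALL "PREDICT_SURVIVAL_R"(:eval, :model_half, res_half);
  CALL "PREDICT_SURVIVAL_R"(:eval, :model_one,  res_one);
  CALL "PREDICT_SURVIVAL_R"(:eval, :model_two,  res_two);
END;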


Performance measurement: In this parallel R-calls scenario example, an execution time of 6.278 seconds was experienced. This represents a performance gain of roughly 29.46%. Although this indicates an improvement in performance, we may theoretically have expected a performance improvement close to 75%, given that we trigger 3 R-calls. The answer for this gap is overhead. But which one?


In this example, I parallelized survival prediction across 3 R-calls, but still transmit the same patient dataset in each R-call. The improvement in performance can be explained, firstly, by the fact that HANA now transmits less data per R-call (only one model, as opposed to three in the default scenario), and consequently the data transfer may be faster. Secondly, each model's survival prediction is performed in 3 separate R-runtimes.

 

There are two other avenues we could explore for optimization in this use case scenario. One is to further parallelize R-runtime prediction itself (see section 2). The other is to further reduce the amount of data transmitted per R-call by splitting the patient dataset in HANA and parallelize the data transmitted across separate R-calls (see section 4).

 

Please note that without the READS SQL DATA instruction in the RLANG procedure definition an execution time of 13.868 seconds was experienced. This is because each R-CALL embedded in the SQLscript procedure is executed sequentially (3 R-call roundtrips).


2. Parallelize the R-runtime execution using parallel R libraries



By default, R execution is single threaded. No matter how much processing resource is available on the R-host machine (64, 32, 8 CPU cores etc.), a single R runtime process will only use one of them. In the following I will give examples of some techniques to improve the execution performance by running R code in parallel.

 

Several open source R packages exist which offer support for parallelism with R. The most popular packages for R-runtime parallelism on a single host are "parallel" and "foreach". The "parallel" package offers a myriad of parallel functions, each specific to the nature of the data (lists, arrays, etc.) subject to parallelism. Moreover, for historical reasons, one can classify these parallel functions roughly under two broad categories, prefixed by "par-" (parallel snow cluster) and "mc-" (multicore).

 

In the following example I use the multicore function mclapply() to invoke parallel R processes on the patient dataset. Within each of the 3 parallel R-runtimes triggered from HANA, I split the patient data into 3 subsets and then parallelize survival prediction on each subset. See the figure below.


2-1.png

The script example above highlights the following:

  • 3 CPU cores are used (variable n.cores) by the R-process
  • The patient data is split into 3 partitions, according to the number of chosen cores, using the "splitIndices" function.
  • The task to be performed (survival prediction) by each CPU core is defined in the function "scoreFun"
  • Then I call mclapply(), passing the data partitions (split.idx), the number of CPU cores to use, and the function to be executed by each core.


In this example, 3 (master) R-processes are initially triggered in parallel on the R-host by the 3 R-calls. Then, within each master R-runtime, 3 additional child (worker) R-processes are spawned by calling mclapply(). On the R-host, therefore, we will have 3 processing groups executing in parallel, each consisting of 4 R-runtimes (1 master and 3 workers). Each group is dedicated to predicting patient survival based on one model. For this setup 12 CPUs will be used in total.

 

Performance measurement: In this parallel R package scenario using mclapply(), an execution time of 4.603 seconds was observed. This represents roughly a 48.28% gain in performance over the default (benchmark) scenario and roughly a 20% improvement over the parallel R-call example presented in section 1.


3. Parallelize by scaling the number of R-Host machines connected to HANA for parallel execution


It is also possible to connect SAP HANA to multiple R-hosts, and exploit this setup for parallelization. The major motivation for choosing this option is to increase the number of processing units (as well as memory) available for computation, provided the resources of a single host are not sufficient. With this constellation, however, it would not be possible to control which R-host receives which R request. The choice will be determined randomly via an equally-weighted round-robin technique. From an SQLScript procedure perspective, nothing changes. You can reuse the same parallel R-call scripts as exemplified in section 1 above.


Setup Prerequisites


  • Include more than one IPv4 address in the calc engine parameter cer_rserve_addresses in the indexserver.ini or xsengine.ini file (see section 3.3 of the SAP HANA R Integration Guide); see the example after this list.
  • Set up parallel R-calls within an SQLScript procedure, as described in section 1.
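
For example, the calc engine parameter could be set via SQL as follows (a hedged sketch; host names and ports are placeholders and must match your Rserve configuration):

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('calcengine', 'cer_rserve_addresses') = '<rhost1>:<port>,<rhost2>:<port>'
WITH RECONFIGURE;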

3-1 config.png

I configure 2 R-host addresses in the calcengine rserve address option shown above. While still using the same SQLScript procedure as in the 3 R-Calls scenario example (I change nothing in the code), I trigger parallelization of 3 R-calls across two R-host machines.


3-2 Parallel R -call.png

Performance measurement: The scenario took 6.342 seconds to execute. This execution time is similar to the times experienced in the parallel R-calls example. This example only demonstrates that parallelism works in a multi R-host setup. Its real benefit for parallelization comes into play when the computational resources (CPUs, memory) available on one R-box are believed not to be enough.


4. Optimizing data transfer latency between SAP HANA and R


As discussed in section 1, one performance overhead is the transmission of the full patient dataset in each parallel R-call from HANA to R. We could further reduce the data transfer latency by splitting the dataset into 3 subsets in HANA and then, using 3 parallel R-calls, transferring each subset from HANA to R for prediction. In each R-call, however, we would also have to transfer all 3 models.


An example illustrating this concept is provided in the next figure.


4-1 split in hana.png


In the example above, the following is performed

  • The patient dataset (eval) is split into 3 subsets in HANA (eval1, eval2, eval3).
  • 3 R-calls are triggered, each transferring a data subset together with all 3 models.
  • On the R-host, 3 master R-processes will be triggered. Within each master R-process I parallelize survival prediction across 3 cores using the function pair mcparallel()/mccollect() from the "parallel" R package (task parallelism), as shown below.


4-2 parallelize in R.png

 

  • I create an R function (scoreFun) to specify a particular task. This function focuses on predicting survival based on one model input parameter.
  • For each call of the mcparallel() function an R process is started in parallel and will evaluate the expression in the R function definition scoreFun. I assign each model individually.
  • With a list of assigned tasks I then call mccollect() to retrieve the results of the parallel survival prediction.


In this manner, the overall data transfer latency is reduced to the size of the data in each subset. Furthermore, we still maintain completeness of the data via parallel R-calls. The consistency of the results of this approach is guaranteed if there is no dependency in the result computation between the observations in the dataset.

 

Performance measurement: With this scenario, an execution time of 2.444 seconds was observed. This represents a 72.54% performance gain over the default benchmark scenario. This represents roughly 43% improvement over the parallel R-call scenario example in section 1, and a 24.26% improvement over the parallel R-runtime execution (with parallel R-libraries) example in section 2. A fantastic result supporting the case for parallelization.


Concluding Remarks


The purpose of this document is to illustrate how techniques of parallelization can be implemented to address performance-related bottlenecks within the default integration setup between SAP HANA and R. The document presented 3 parallelization options one could consider:


  • Trigger parallel R-calls from HANA
  • Use parallel R libraries to parallelize the R-execution
  • Parallelize R-calls across multiple R-hosts.

 

With parallel R libraries you can improve the performance of a triggered R-process execution by spawning additional R-runtime instances executing on the R-host (see section 2). You can either parallelize by data (split the dataset computation across multiple R-runtimes) or by task (split the algorithmic computation across multiple R-runtimes). A good understanding of the nature of the data and the algorithm is, therefore, fundamental to choosing how to parallelize. When executing parallel R-runtimes using R libraries, we should remember that there is an additional setup overhead incurred by the system when spawning child (worker) R-processes and terminating them. The benefits of parallelism using this option should, therefore, be assessed after prior testing in an environment similar to the productive environment in which it will eventually run.


On the other hand, when using the parallel R-calls option, no additional overhead is incurred on the overall performance. This option provides a means to increase the number of data transmission lanes between HANA and the R-host, and it allows us to spawn multiple parent R-runtime processes on the R-host. Exploiting this option led to the following key finding: the data transfer latency between HANA and R can, in fact, be significantly reduced by splitting the dataset in HANA and then parallelizing the transfer of each subset from HANA to R using parallel R-calls (as illustrated in section 4).





Other Blog Links

Install R Language and Integrate it With SAP HANA

Custom time series analytics with HANA, R and UI5

New SQLScript Features in SAP HANA 1.0 SPS9

How to see which R packages are installed on an R server using SAP HANA Studio.

Quick SAP HANA and R usecase

Let R Embrace Data Visualization in HANA Studio

Connect ABAP with R via FastRWeb running on Rserve

HANA meets R    

Creating an OData Service using R

SAP HANA Application Example : Personal Spending Analysis - HANA Models

[SAP HANA Academy] Live3: Web Services - Using OData


[Update: April 5th, 2016 - The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trial edition of HCP.]


Continuing the Live3 on SAP HANA Cloud Platform course, the SAP HANA Academy's Philip Mugglestone provides a closer examination of the previously set up OData web services by running some example queries. Watch Philip's tutorial video below.

Screen Shot 2015-04-29 at 12.10.32 PM.png

(0:20 –  4:20) Viewing Meta Data and Entities in JSON Format

 

Running the services.xsodata file has generated a URL based on the trial account (p number), SAP HANA instance (dev), project (live3), and file (services.xsodata). Calling the file lists the existing entities (Tweets, Tweeters, TweetersClustered and Clusters).

 

With OData we can make requests via URL-based syntax. For example, appending /$metadata to the end of the URL displays the full metadata for all of the properties within each entity. The data you get from OData is self-describing, which is very important as SAPUI5 can read this metadata automatically to generate the screens.

Screen Shot 2015-04-29 at 12.16.49 PM.png

Be careful when looking at the individual entities in OData, as there may be hundreds of thousands of Tweets, for example, and you don't want to read them all. Appending /Tweets?$top=3 to the URL only displays the top 3 Tweets in XML format.

Screen Shot 2015-04-29 at 12.24.54 PM.png

The XML format appears a bit messy, so you can convert it to JSON format by adding &$format=json to the URL. By default the JSON format isn't as readable as desired, so you can download JSONView for free from the Chrome Web Store in order to display it in a nicely readable format.

Screen Shot 2015-04-29 at 12.34.42 PM.png

To see only certain parts of an entity's data, for instance the id and text columns, you can append &$select=id,text to the URL. This returns only the id and text values, as well as the metadata for the Tweets entity.

Screen Shot 2015-04-29 at 12.37.03 PM.png

(4:20 – 6:30) OData's Filter, Expand and Count Parameters

 

Philip next shows the data for his Clusters entity by adding /Clusters?$format=json to the URL. Similar to a where clause in SQL, Philip filters his results by adding &$filter=clusterNumber eq 1 to display only his first cluster.

Screen Shot 2015-04-29 at 12.38.13 PM.png

To see the Tweeters association from the Clusters entity Philip adds an expand parameter by entering &$expand=Tweeters at the end of the URL. This returns all of the information for each of the individual Tweeters in cluster 1.

Screen Shot 2015-04-29 at 12.39.49 PM.png

To see the number of rows for an entity add /$count after the entity’s name in the URL.


Follow along with the Live3 on HCP course here.


SAP HANA Academy - over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy


[SAP HANA Academy] Live3: Web Service - Setup XSJS


[Update: April 5th, 2016 - The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trial edition of HCP.]


Part of the SAP HANA Academy’s Live3 on HCP course, the below video tutorial from Philip Mugglestone shows how to add server-side scripting capabilities to the live3 web services project. With this you can configure actions to refresh the clustering and reset the database. Watch Philip’s video below.

Screen Shot 2015-05-05 at 3.27.02 PM.png

(0:35 – 3:00) Inserting the Proper Schema Name and P Number into the services.xsjs Code

 

With the live3 project selected in the SAP Web-based Development Workbench, open the services folder of the Live3 GitHub code repository and drag the services.xsjs file into the Multi-File Drop Zone. First you must do a global replace to insert your schema name. You must also insert your account p number where marked in the code, as the code checks whether the user has the execute privilege. After verification the user can perform the reset and/or cluster operation.

 

(3:00 – 6:00) Examining the Code’s Logic

 

The code is very straightforward. It first checks whether the user has the privilege to execute. If so, the URL command (cmd) parameter is read. The code waits for the command: if cmd=reset it calls the reset function, and if cmd=cluster it calls the cluster function. If neither reset nor cluster is entered, it displays an invalid command message. If the user isn't authorized, a not authorized message appears.

Screen Shot 2015-05-05 at 3.52.21 PM.png

The reset function’s code first sets the schema and then truncates (empties) the Tweets table that is loaded directly via node.js. Next it empties the PAL results and centers tables. Then the full text analysis index is first cleaned out and then recreated using the same code that was used earlier in the setup text analysis piece. The only difference from earlier is that the code is modified with a backslash in front of every single quotation mark in the SQL.

Screen Shot 2015-05-05 at 3.48.42 PM.png

The cluster function's code is similar to the setup Predictive SQL code. The schema is set and the PAL results and centers tables are truncated. Then the procedure is called. On the web, as opposed to seeing the results directly, the results table will first show question marks; the code then loops over a set of results and inserts those results into the table using JavaScript.

Screen Shot 2015-05-05 at 3.51.17 PM.png

(6:00 – 7:30) Testing services.xsjs

 

Executing the services.xsjs file will open a web page that displays invalid command: undefined. This should happen, as it didn't recognize the default command that was specified. So you must delete the default anti-caching parameter that appears after /services.xsjs? in the URL and then add a valid command, for instance cmd=cluster.

 

Entering the command for cluster won't display anything on the web page at this point. However, to show that the file has run with a valid command, open the developer tools (Ctrl+Shift+I in Chrome) and go to the network tab. In the network tab there will be information about the call.

Screen Shot 2015-05-05 at 3.53.00 PM.png

Follow along with the Live3 on HCP course here.


SAP HANA Academy - over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy

[SAP HANA Academy] Live3: Web Services - Debugging


[Update: April 5th, 2016 - The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trial edition of HCP.]


The SAP HANA Academy’s Philip Mugglestone continues the Live3 on HCP course by showing how the server-side scripting application can be easily debugged using the SAP HANA Web-based Development Workbench.  Check out Philip’s tutorial video below.

Screen Shot 2015-05-07 at 10.29.27 AM.png

(0:15 – 4:10) How to Debug the XSJS Application

 

First identify the user account. This is listed near the top right corner of the SAP HANA Web-based Development Workbench. Right click on the user name (in Philip’s case it begins with DEV_) and select inspect element. Then copy the user account name so it can be used later on in the debugging.

Screen Shot 2015-05-07 at 10.37.01 AM.png

Now a definition must be created that enables this user to perform debugging. When logged into the server, go to the URL displayed below, ending with /sap/hana/xs/debugger. On the Grant Access screen paste the copied account name into the Username text box. Set an expiration date and time for when the debugging access will cease and then click the Grant button. Now this user can debug the session.

Screen Shot 2015-05-07 at 10.38.24 AM.png

Back in the SAP HANA Web-based Development Workbench choose the services.xsjs file and hit the execute button to open it in a new browser tab. Append cmd=cluster1 to the end of the URL to return an invalid command. Now open the developer tools (Ctrl+Shift+I in Chrome) and navigate to the resources tab. Then expand the Cookies folder and open the session cookie file. Identify the value of the xxSessionId.

Screen Shot 2015-05-07 at 10.44.59 AM.png

Now back in the SAP HANA Web-based Development Workbench click the settings button. Then choose the value of the xxSessionId as the session to debug and click apply. A message will appear that the debugger has been attached to the session. Next set a break point where the command is being processed in the code.

Screen Shot 2015-05-07 at 10.46.16 AM.png

Now make a call in the URL. Philip enters cmd=cluster2. The screen won't change from earlier and will still say Invalid Command: cluster1, as it is waiting for hanaxs.trail.ondemand. This is because the debugger has been opened in the SAP HANA Web-based Development Workbench. You will see that the cluster2 command has been entered and the debugger has stopped at the break point that was set. You have the normal debugging options such as step in, step over, step through, etc. If you hit the resume button on the debugger, the file page will now say Invalid Command: cluster2.

Screen Shot 2015-05-07 at 10.55.56 AM.png

This is how you can access the debugger to perform real-time debugging when using XS in SAP HANA.

 

Follow along with the Live3 on HCP course here.


SAP HANA Academy - over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy

[SAP HANA Academy] Live3: Web Services - Authentication


[Update: April 5th, 2016 - The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trial edition of HCP.]


In the next part of the SAP HANA Academy’s Live3 on HCP course Philip Mugglestone explains why a “proxy” authentication server is needed to access your SAP HANA Cloud Platform web services from a SAP HANA Cloud HTML5 application. Watch Philip’s tutorial video below.

Screen Shot 2015-05-08 at 10.05.31 AM.png

(0:12 – 3:00) Issue with HTML5 Authentication for the HCP Developer Trial Edition

 

Prior to this tutorial the web services were set up using the SAP HANA instance. We now want to access our Live3 app, OData, and server side JavaScript from a front end application UI.

 

Back in the SAP HANA Cloud Platform Cockpit our SAP HANA instance now has one application. Clicking on the application shows the URL, which you can navigate to and then enter a command like we've done in the earlier videos in the Live3 course.

 

There is one slight complication to building an HTML5 front-end application. Our SAP HANA instances in the developer trial edition of HCP use SAML 2.0 authentication. Normally, to access a backend system when working with an HTML5 application, you use a destination in order to reference a folder or URL. The destination appears to be local to where the HTML5 application is hosted. However, it is pushed out to a backend system that can be hosted anywhere on the internet (even behind a firewall if you use the cloud connector). The destination is very important as it allows you to get around the cross-origin restrictions of most browsers.

 

The trial edition of the SAP HANA Cloud Platform uses only SAML 2.0 as the authentication method for the SAP HANA instance. SAML 2.0 is not an authentication method available in the destination configuration in the SAP HANA Cloud Platform Cockpit. Fortunately there is a workaround.

Screen Shot 2015-05-08 at 10.32.13 AM.png

(3:00 – 4:45) Explanation for Proxy’s Necessity via the Live3 Course Architecture

 

Normally the browser or mobile HTML5 app would access the SAP HANA Cloud Platform where the HTML5 app is hosted. It would then access a backend system, in this case the SAP HANA native web services, through a destination. However, we can't connect the destination directly to the SAP HANA XS instance because of the SAML 2.0 authentication. So a destination is defined that goes through the SAP HANA Cloud Connector installed locally on the desktop. A proxy is then inserted between the SAP HANA Cloud Connector and the native web services to take care of the SAML 2.0 authentication and connect back to the destination. This would not be run in production; it is used in this course purely as a workaround for a technical limitation of the free trial developer edition of the SAP HANA Cloud Platform.

Screen Shot 2015-05-08 at 10.35.08 AM.png

(4:45 – 5:45) Locating the Proxy

 

The necessary proxy was created by SAP Mentor Gregor Wolf. Search for “Gregor Wolf GitHub” and click on the link to his page. Under the popular repositories section open the hanatrail-auth-proxy repository. Written in Node.js, the proxy will allow us to access the SAP HANA web services via a destination. The next video details how to download and install the proxy.


Follow along with the SAP HANA Academy's Live3 on HCP course here.


SAP HANA Academy - Over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy

[SAP HANA Academy] Live3: Web Services - Authentication Setup Proxy


[Update: April 5th, 2016 - The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trial edition of HCP.]


Continuing from the previous tutorial video of the SAP HANA Academy’s Live3 on HCP course, Philip Mugglestone shows how to set up the “proxy” authentication server for the HCP trial developer edition. Watch Philip's tutorial video below.

Screen Shot 2015-05-11 at 10.43.55 AM.png

(0:20 – 3:30) Installing the Prerequisites for the hanatrail-auth-proxy Repository and Modifying its Code

 

On the hanatrail-auth-proxy page on SAP Mentor Gregor Wolf’s GitHub, click on the download ZIP button. Extract the downloaded ZIP and then open a command window in the hanatrail-auth-proxy folder.

 

First a few prerequisite Node.js modules (cheerio and querystring) must be installed. In the command window enter npm install cheerio. Wait a few seconds for the cheerio installation to complete before entering npm install querystring.

Screen Shot 2015-05-11 at 12.04.10 PM.png

*Note – The component has been updated since this video was recorded. Simply use “npm install” from the main hanatrail-auth-proxy folder. There is now no need to install cheerio and querystring explicitly.*

 

Next we need to make a few changes to the hanatrail-auth-proxy code. Right-click to edit the config.js file with Notepad++. First you must set a port to use. This will create a web server similar to the Node.js server we created earlier for loading the Twitter data.


You must also insert the correct host. The host is the beginning of the services.xsodata URL. For example, Philip’s host is s7hanaxs.hanatrail.ondemand.com. Leave the timeout and https settings as they are before saving the file.

Screen Shot 2015-05-11 at 12.09.28 PM.png

*Note – The config.js and server-basic-auth files have moved to the examples subfolder. You must still verify that the “host” option in examples/config.js matches your SAP HANA XS instance.*

 

(3:30 – 6:30) Running the Proxy

 

To start the proxy application, back in the command window enter node server-basic-auth.js. A message will appear saying the SAP HANA Cloud Platform trial proxy is running on the configured port.

Screen Shot 2015-05-11 at 12.17.15 PM.png

Open a new web browser tab and enter localhost:<port number>/<application URL path>. In Philip’s example he enters the URL displayed below.

Screen Shot 2015-05-11 at 11.55.39 AM.png

After logging in with your HCP p-number, the authentication for the SAP HANA instance using SAML 2.0 should be performed automatically. Effectively the proxy, acting as a local web server, now talks as if it were the SAP HANA Cloud Platform trial edition. You can make all of the calls that were demonstrated in previous videos (e.g. metadata, clusters) using the localhost URL.

 

Follow along with the SAP HANA Academy's Live3 on HCP course here.


SAP HANA Academy - Over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy

SAP HANA Data Sheet


SAP HANA is built on a next-generation, massively parallel, in-memory data processing design paradigm to enable faster information processing. This architecture enables converged OLTP and OLAP data processing within a single in-memory, column-based data store with ACID compliance, while eliminating data redundancy and latency. By providing advanced capabilities such as predictive and text analytics, search, spatial processing, graph, time series, and streaming, together with data integration, data quality, and application services on the same architecture, it further simplifies application development and processing across IoT and big data sources and structures. This makes SAP HANA the most suitable platform for building and deploying next-generation, real-time applications and analytics.

 

This data sheet explains the capabilities, features and benefits of the SAP HANA platform.

 

SAP HANA Platform Data Sheet

 

Last Update : April 2016

Unable to Open Alert History Information Due to Large Table _SYS_STATISTICS.STATISTICS_ALERTS_BASE


Recently, a customer reported huge numbers of alerts shown in SAP HANA Studio/DBACOCKPIT in one of their SAP HANA systems, which had not been monitored for a long time.
Alert Priority.jpg

The alert detail information page hangs and either does not return or returns the errors listed below after clicking, for example, the high priority alerts (in the worst case even the overview page hangs).

Return Error.jpg

 

From DBACOCKPIT -> System Information -> Large Tables, I can see that the table _SYS_STATISTICS.STATISTICS_ALERTS_BASE, which contains the alert history, has grown to more than 30 GB.
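
The size can also be checked directly with SQL; a minimal sketch using the M_CS_TABLES monitoring view (the figures will of course differ per system):

-- In-memory size and record count of the alert history table
SELECT TABLE_NAME,
       ROUND(MEMORY_SIZE_IN_TOTAL / 1024 / 1024 / 1024, 2) AS SIZE_GB,
       RECORD_COUNT
  FROM M_CS_TABLES
 WHERE SCHEMA_NAME = '_SYS_STATISTICS'
   AND TABLE_NAME  = 'STATISTICS_ALERTS_BASE';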

 

According to note 2170779 - SAP HANA DB: Big Statistics Server Table STATISTICS_ALERTS_BASE Leads to Performance Impact on the System, the following steps can be applied.

 

Firstly, since the customer uses the embedded statistics server in an MDC environment, I have to disable the embedded statistics server in the system DB to prevent an endless delete situation (the configuration takes effect immediately; no restart of the HANA DB is needed).

nameserver.ini [statisticsserver] active = false

 

Secondly, clean up the old alerts (in this example, everything older than 25 days), then check and fix the latest alerts, which took around 30 minutes for me.

DELETE FROM "_SYS_STATISTICS"."STATISTICS_ALERTS_BASE" WHERE "ALERT_TIMESTAMP" < add_days(CURRENT_TIMESTAMP, -25);

 

Then I review the latest alerts and their detail information and try to fix them one by one. For alerts that do not need to be kept for a long time, I set a shorter retention period.

UPDATE _SYS_STATISTICS.STATISTICS_SCHEDULE SET RETENTION_DAYS_CURRENT = 10 WHERE ID = 79;

 

Thirdly, re-enable the embedded statistics server.

nameserver.ini [statisticsserver] active = true
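
For reference, the same three steps can also be executed from the SQL console. A minimal sketch, assuming the retention value used above (in an MDC setup the configuration change is made from the SYSTEMDB):

-- 1. Disable the embedded statistics server (takes effect immediately, no restart needed)
ALTER SYSTEM ALTER CONFIGURATION ('nameserver.ini', 'SYSTEM')
    SET ('statisticsserver', 'active') = 'false' WITH RECONFIGURE;

-- 2. Delete old alerts (older than 25 days) and shorten the retention for alert ID 79
DELETE FROM "_SYS_STATISTICS"."STATISTICS_ALERTS_BASE"
    WHERE "ALERT_TIMESTAMP" < ADD_DAYS(CURRENT_TIMESTAMP, -25);
UPDATE "_SYS_STATISTICS"."STATISTICS_SCHEDULE"
    SET RETENTION_DAYS_CURRENT = 10 WHERE ID = 79;

-- 3. Re-enable the embedded statistics server
ALTER SYSTEM ALTER CONFIGURATION ('nameserver.ini', 'SYSTEM')
    SET ('statisticsserver', 'active') = 'true' WITH RECONFIGURE;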

 

Last but not least, I try to persuade the customer to monitor the system as part of their daily or weekly tasks.

HANA Rules Framework


Welcome to the SAP HANA Rules Framework (HRF) Community Site!


SAP HANA Rules Framework provides tools that enable application developers to build solutions with automated decisions and rules management services, implementers and administrators to set up a project/customer system, and business users to manage and automate business decisions and rules based on their organizations' data.

In daily business, strategic plans and mission-critical tasks are implemented through countless operational decisions, made either manually or automatically by business applications. These days, an organization's agility in decision-making has become critical to keeping up with dynamic changes in the market.


HRF Main Objectives are:

  • To seize the opportunity of Big Data by helping developers to easily build automated decisioning solutions and/or solutions that require business rules management capabilities
  • To unleash the power of SAP HANA by turning real time data into intelligent decisions and actions
  • To empower business users to control, influence and personalize decisions/rules in highly dynamic scenarios

HRF Main Benefits are:

Rapid Application Development | Simple tools to quickly develop auto-decisioning applications

  • Built-in editors in SAP HANA studio that allow easy modeling of the required resources for SAP HANA rules framework
  • An easy to implement and configurable SAPUI5 control that exposes the framework’s capabilities to the business users and implementers

Business User Empowerment | Give control to the business user

  • Simple, natural, and intuitive business condition language (Rule Expression Language)

Untitled.png

  • Simple and intuitive UI control that supports text rules and decision tables

NewTable.png

  • Simple and intuitive web application that enables business users to manage their own rules

Rules.png   

Scalability and Performance | HRF, as a native SAP HANA solution, leverages all the capabilities and advantages of the SAP HANA platform.


For more information on HRF please contact shuki.idan@sap.com  and/or noam.gilady@sap.com

Interesting links:

SAP solutions already utilizing HRF:

Here are some SAP solutions (a partial list) that utilize HRF in different domains:

Use cases of SAP solutions already utilizing HRF:

SAP Transportation Resource Planning

TRP_Use_Case.jpg

SAP Fraud Management

Fraud_Use_Case.JPG

SAP hybris Marketing (formerly SAP Customer Engagement Intelligence)

hybris_Use_Case.JPG

SAP Operational Process Intelligence

OPInt_Use_Case.JPG


Creating a copy of user SYSTEM in SAP HANA


In the SAP HANA Security Guide SAP recommends using user SYSTEM only at the beginning. After various users, e.g. for backup and monitoring purposes, have been created, user SYSTEM should be deactivated. Below you'll find some of the experiences I made when trying to copy user SYSTEM.

 

Trying to copy user SYSTEM to another user reveals the difficulties involved. The new user only receives the role PUBLIC; no package or privilege object that had already been assigned to user SYSTEM is copied to the new user. Only repository roles are copied. Unfortunately there is currently no way to automatically create an SQL script that contains all objects, packages and roles assigned to user SYSTEM. All objects assigned to user SYSTEM have to be assigned manually to the new user. The reason is that user SYSTEM in some cases isn't allowed to grant the respective object, package or role; therefore no object, package or role is copied from user SYSTEM to the new one.

 

The copy process is purely a UI function and thus cannot be automated. There's no SQL command "COPY USER"; only the SQL command "CREATE USER" is available.

 

User SYSTEM automatically receives the rights for new objects and packages created in the HANA system. The new user will not receive these automatically; they have to be assigned one by one manually.

 

The password lifetime of the new user can be disabled with the SQL statement "ALTER USER newuser DISABLE PASSWORD LIFETIME;". This way the given password does not have to be changed during the first logon.

 

User SYSTEM can be deactivated with the SQL statement "ALTER USER SYSTEM DEACTIVATE USER NOW".
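
Putting the statements together, a minimal sketch of the manual procedure (the user name, password and granted roles below are placeholders/examples only; the grant list has to be extended with every object, package and role identified for your system):

-- Create the replacement administration user (placeholder name and password)
CREATE USER ADMIN_COPY PASSWORD "Initial_Password_1";

-- Grant roles and privileges one by one (examples only)
GRANT MONITORING TO ADMIN_COPY;
CALL "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('sap.hana.admin.roles::Monitoring', 'ADMIN_COPY');

-- Keep the initial password valid so it does not have to be changed at first logon
ALTER USER ADMIN_COPY DISABLE PASSWORD LIFETIME;

-- Finally, deactivate user SYSTEM
ALTER USER SYSTEM DEACTIVATE USER NOW;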

 

If you need to reset the password of user SYSTEM please follow the description given in note 1925267.

 

If you would like to exclude user SYSTEM from the current password policy, please follow note 2251556. Please bear in mind that this procedure isn't recommended by SAP AGS.

SAP HANA Database Campus – Open House 2016 in Walldorf


The SAP HANA Database Campus invites students, professors, and faculty members interested in database research to join our third Open House at SAP's headquarters. Throughout your day, you will get an overview of database research at SAP, meet the architects of SAP HANA and learn more about academic collaborations. There are a couple of interesting presentations by developers and academic partners. Current students and PhD candidates present their work and research. For external students and faculty members it is a great chance to find interesting topics for internships and theses.


The event takes place on June 2nd, 2016, from 09:30 to 16:00 in Walldorf, Germany. Free lunch and snacks are provided for all attendees. The entire event is held in English.

 

Register here

 

 

Looking forward to seeing you in Walldorf,

The SAP HANA Database Campus

students-hana@sap.com

 

 

Location:

  • SAP Headquarters, WDF03, Robert-Bosch-Str. 30, 69190 Walldorf, Germany
  • Room E4.02, Check-In Desk in the lobby of WDF03

 

Agenda:

  • 09:00-09:30 Arriving
  • 09:30-10:00 Check-In
  • 10:00-10:15 Opening
  • 10:15-11:00 Keynote
    • Daniel Schneiss (Head of SAP HANA Development) – Topic will be announced
  • 11:00-12:00 Poster Session Part 1 & Career Booth
  • 12:00-12:45 Lunch
  • 12:45-13:00 Office Tour
  • 13:00-14:00 Session 1 – Academic
    • Prof. Anastasia Ailamaki (EPFL) – Scaling Analytical and OLTP Workloads on Multicores: Are we there yet? [30 min]
    • Ismail Oukid (SAP HANA PhD student, TU Dresden) – FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory [15 min]
    • SAP HANA PhD Student Speaker and Topic will be announced [15 min]
  • 14:00-15:00 Poster Session Part 2, Career Booth & Coffee Break
  • 15:00-15:45 Session 2 – SAP
    • Hinnerk Gildhoff (SAP) – SAP HANA Spatial & Graph [20 min]
    • Daniel Booss (SAP) – SAP HANA Basis [20 min]
  • 15:45-16:00 Best Student/PhD-Student Poster & Open House Closing

 

 

Archive of previous events


By participating you agree to appear in photos and videos taken during the event and published on SCN and CareerLoft.

SAP HANA TDI - Overview


SAP HANA tailored data center integration (TDI) was released in November 2013 to offer an additional approach to deploying SAP HANA. While the deployment of an appliance is easy and comfortable for customers, appliances impose limitations on the flexibility of selecting the hardware components for compute servers, storage, and network. Furthermore, operating appliances may require changes to established IT operation processes. For those who prefer leveraging their established processes and gaining more flexibility in hardware selection for SAP HANA, SAP introduced SAP HANA TDI. For more information please download this overview presentation.

View this Document

SAP Hana EIM Connection Scenario Setup - Part 1


In this documentation I'll explain how to set up and configure an SAP HANA SPS10 EIM (SDI/SDQ) connection with the Data Provisioning Agent, based on cloud and on-premise scenarios.

 

This documentation is built in 3 parts:

SAP Hana EIM  Connection Scenario Setup - Part 1 (current)

SAP Hana EIM  Connection Scenario Setup - Part 2

SAP Hana EIM  Connection Scenario Setup - Part 3

 

In my first document I explained how to replicate data using SAP HANA SDI capabilities with the SAP HANA adapter; in this document I'll explain how to configure and connect the DP Agent to several source systems to retrieve and replicate data for on-premise and cloud scenarios.

 

 

I will show in detail the steps and configuration points needed to achieve this setup with the following adapters:

Log Reader (Oracle, DB2, MSSQL) / SAP ASE / Teradata and Twitter

 

 

Note: The Data Provisioning Agent must be installed on the same operating system as your source database, but not necessarily on the same machine

 

 

Order of execution

 

  • Security (Roles and Privileges)
  • Configuration for On-Premise scenario
  • Configuration for Cloud scenario
  • Log Reader adapter setup (Oracle / DB2 / MS SQL)
  • SAP ASE adapter setup
  • Teradata adapter setup
  • Twitter adapter setup
  • Real time table replication

 

 

Guide used

 

SAP Hana EIM Administration Guide SP10

SAP Hana EIM Configuration guide SP10

 

 

Note used

 

2179583 - SAP HANA Enterprise Information Management SPS 10 Central Release Note

2091095 - SAP HANA Enterprise Information Management

 

 

Link used

http://help.sap.com/hana_eim

  

 

Overview Architecture

 

archi1.jpg

On-premise landscape

 

archi2.jpg

Cloud landscape

 

 

  

Security (Roles and Privileges)

 

Before starting to make any connection between the DP Agent server and HANA, it's important to provide the necessary credentials to the users involved in the configuration, based on your landscape scenario.

 

 

For the on-premise and cloud scenarios, as an administrator ensure you have:

  • Privileges: AGENT ADMIN and ADAPTER ADMIN
  • Application privilege: sap.hana.im.dp.admin::Administrator

 

 

For the cloud scenario an additional user is required; it is used as a technical user (aka "XS agent") when you need to register an agent (see the SQL sketch below):

  • Privileges: AGENT MESSAGING
  • Application privilege : sap.hana.im.dp.proxy::AgentMessaging
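
A hedged SQL sketch of these grants (DP_ADMIN and DP_XS_AGENT are placeholder user names; application privileges are granted via the _SYS_REPO procedure):

-- Administrator used for agent/adapter setup (on-premise and cloud)
GRANT AGENT ADMIN TO DP_ADMIN;
GRANT ADAPTER ADMIN TO DP_ADMIN;
CALL "_SYS_REPO"."GRANT_APPLICATION_PRIVILEGE"('"sap.hana.im.dp.admin::Administrator"', 'DP_ADMIN');

-- Additional technical "XS agent" user (cloud scenario only)
GRANT AGENT MESSAGING TO DP_XS_AGENT;
CALL "_SYS_REPO"."GRANT_APPLICATION_PRIVILEGE"('"sap.hana.im.dp.proxy::AgentMessaging"', 'DP_XS_AGENT');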

 

 

 

Configuration for On-Premise scenario

 

In the case of the on-premise scenario, the interaction between the DP Agent and the HANA server is done through a TCP/IP connection, similar to the connection used by the HANA studio.

1.jpg

 

2.jpg

 

 

Configuration for Cloud Landscape scenario

 

The cloud scenario requires a specific setup in case of an SSL connection: HANA can be accessed directly via the internal Web Dispatcher or through a proxy. For my configuration I use the direct connection over SSL.

 

 

One of the requirements to make HANA available over HTTPS is a valid "CommonCryptoLib" library (libsapcrypto.so); it is included by default when HANA is installed.

3.jpg

 

 

Now, from the Web Dispatcher page in the SSL and Trust Configuration tab, I create a CA request, send it to my certificate authority, and import the response.

4.jpg

 

Once signed, I import the CA response into the trusted list of the PSE.

5.png

 

6.jpg

 

And change the format view from TEXT to PEM in order to review the chain.

7.jpg

 

 

Once completed, I change the default ports used by the Web Dispatcher to the standard ports 80/443. To do this, in webdispatcher.ini change the default port to the one you want to use and add the parameter "EXTBIND=1".

8.jpg

 

Once saved, at the OS layer you need to bind the default SSL port. By default, when HANA is installed it creates an "icmbnd.new" file; rename it to "icmbnd" and change the permissions on it. You must be root to do this.

9.jpg

 

Now my HANA instance is available over HTTPS.

10.jpg

 

 

The HANA certificate needs to be imported on the DP Agent server, into the "ssl" directory of the DP Agent.

 

First change the default "cacerts" password with the following keytool command, executed in the directory where the cacerts file is located:

 

keytool -storepasswd -new [new password] -keystore cacerts

11.jpg

 

Then create a “SAPSSL.cer” file, open it with your favorite editor and paste the entire chain from the imported webdispatcher certificate

12.jpg

 

And import it into the “cacerts”

keytool -importcert -keystore cacerts -storepass <password> -file SAPSSL.cer -noprompt

13.jpg

 

I can now configure the DP Agent to use an SSL connection to HANA.

14.jpg

 

15.jpg

 

 

LogReader adapter setup

 

Log Reader adapters provide real-time changed-data capture capability to replicate changed data from Oracle, Microsoft SQL Server, and IBM DB2 databases to SAP HANA in real time; in certain cases, you can also write back to a virtual table.

 

 

Oracle 12c LogReader adapter

 

The first point to take into consideration before starting the configuration of any LogReader adapter is to download the necessary JDBC libraries specific to the source used and store them in the lib directory of the Data Provisioning Agent.

 

 

Download the libraries from:

Oracle

Microsoft SQL

IBM DB2

17.jpg

 

Note: for all the database setups I will not explain how to install the databases themselves, but focus on the steps which need to be performed in order to work with the SDI configuration.

 

In order to enable the real-time replication capability in Oracle, a specific script needs to be run on the Oracle database; this script is located in the "scripts" directory on the DP Agent server.

18.png

Note: the script assumes that the default user for the replication is LR_USER

 

Before running it, check whether the database is in archivelog mode; if not enabled, it needs to be changed.

1.jpg
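
A hedged SQL*Plus sketch of that check (run as SYSDBA; switching to archivelog mode requires a short outage):

-- Check the current log mode
SELECT log_mode FROM v$database;

-- If it returns NOARCHIVELOG, enable archive logging (brief downtime required)
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;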

 

Since the DP Agent and Oracle do not reside on the same server, we need to copy the timezone_11.dat file from the Oracle server to the DP Agent server.

2.jpg

 

And specify the location of the file in the Oracle adapter preferences.

3.jpg

 

Now I'll use Oracle SQL Developer to execute the script, which will also create the LR_USER.

Note: don't forget to change the script in order to set up the password for the user.

5.jpg

 

Now done, I can register my adapter and create my remote connection

6.jpg

 

From the studio, when you specify the OracleLogReader adapter it's important to specify the LogReader administration port and the user defined for the replication.

7.jpg

 

From a connection point of view we are done with Oracle; next is the MS SQL setup.

 

 

  

Microsoft SQL 2008 R2 LogReader adapter

 

 

Since EIM relies on the database log to perform data movement, which means that logs must be available until the data has been successfully read and replicated to HANA, MS SQL Server must be configured in full recovery mode.

19.png

 

For my SQL Server based scenario I will enable CDC (change data capture) on the database; this feature is supported by EIM, but the "truncate" operation on tables is not.

8.jpg

 

Once activated, verify the setting.

9.jpg
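
A hedged T-SQL sketch of both settings (STOREDB is a placeholder database name; run with sysadmin rights):

-- Switch the source database to full recovery so log records stay available for the adapter
ALTER DATABASE STOREDB SET RECOVERY FULL;

-- Enable change data capture at the database level
USE STOREDB;
EXEC sys.sp_cdc_enable_db;

-- Verify both settings
SELECT name, recovery_model_desc, is_cdc_enabled
FROM sys.databases
WHERE name = 'STOREDB';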

 

After the feature is enabled on the database, the cdc schema, cdc user and change data capture metadata tables are created automatically.

10.jpg

 

Since I haven't created any tables to replicate yet, I'll explain later how to enable this feature for each table I want to track.

  

Once done, enable the DAC (dedicated administrator connection) to allow remote connections, from the Facets dialog.

11.jpg

 

And to make the log files readable, copy sybfilter and sybfiltermgr from the LogReader folder on the DP Agent server to the MSSQL server.

11.1.jpg

 

11.1.1.jpg

 

Anywhere on the server, create a file named "LogPath.cfg" and set the environment variable "RACFGFilePath" to point to its location.

11.1.2.jpg

 

Open the LogPath.cfg file and provide the location of the .ldf file

11.1.3.jpg

SAP Hana EIM Connection Scenario Setup - part 3


Twitter adapter setup

 

In order to replicate and consume Twitter content in HANA, I need to create a "Twitter app" in the developer space (https://dev.twitter.com).

25.jpg

 

From the documentation link click on “Manage my Apps”

26.jpg

 

This leads you to the application management page; click on the "Create New App" button.

27.jpg

 

Provide the necessary information, accept the license terms, and click on "Create your Twitter application" at the bottom of the page.

28.jpg

 

With the application now created, four pieces of information will be required in order to create the remote connection with HANA:

  • Consumer Key (API Key)
  • Consumer Secret (API Secret)
  • Access Token
  • Access Token Secret

 

From the created application page click on "Keys and Access Tokens".

29.jpg

 

From the page note the two keys for the consumer

30.jpg

 

And from the bottom of the page create the access tokens.

31.jpg

 

32.jpg

 

Now completed, I need to register my adapter and create my new connection in HANA.

32.1.jpg

 

32.2.jpg

 

 

 

Real time table replication

 

All my remote source connections are now created, so I can proceed with table replication. For my test lab I have created the same table, named "Store", to replicate in all remote source databases.

32.3.jpg
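
The exact column layout of the "Store" table is shown in the screenshot above; as a stand-in, a hypothetical minimal version of such a table could look like this (all column names are illustrative only, not taken from the lab):

-- Hypothetical sample table used for the replication tests (columns are illustrative only)
CREATE TABLE "Store" (
    store_id   INTEGER PRIMARY KEY,
    store_name VARCHAR(100),
    city       VARCHAR(100)
);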

 

MS SQL


For MS SQL, when you set up the database to use "Change Data Capture" to track changes, you need to specify on which tables you want the tracking to occur.

39.jpg
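
A hedged T-SQL sketch of enabling tracking for the "Store" table (the database name, schema and role settings are assumptions to adapt):

-- Enable change data capture for the Store table in the dbo schema
USE STOREDB;   -- placeholder database name
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Store',
     @role_name     = NULL;   -- no gating role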

 

From the Workbench editor we need to create the replication task and uncheck "initial load".

21.png

 

 

ORACLE


Earlier I ran the "oracle_init.sql" script with the default user LR_USER for the replication; I created my "Store" table, which belongs to this user, for the replication.

40.jpg

 

From the workbench repeat the procedure to create a replication task and uncheck “initial load”

41.jpg

 

For DB2, Teradata and ASE, from the workbench repeat the procedure to create a replication task and uncheck “initial load”

 

Once the replication is working you can check the task from the "Data Provisioning Task Monitor".

42.jpg

 

TWITTER

 

For Twitter replication, when the remote connection is created two tables should appear; the one I will use for my tweet replication is the "status" table.

43.jpg

 

From the workbench I start to set up the live replication and check the replication task.

44.jpg

 

45.jpg

 

From the studio, I can see the content of the table, which contains all the tweets and news.

46.jpg

 

Replicating information into the status table brings in a lot of elements; for my test I create a tweet on my Twitter page and check whether it appears in the table.

47.jpg

 

And I can see my tweet in the table

48.jpg

 

The next step is to filter my content; I'll create additional tweets and filter the replication on them only.

49.jpg

 

In order to filter, from the workbench apply a filter on the "ScreenName" column; the screen name value should be your account name.

50.jpg

 

And refresh my status table

51.jpg

 

My HomeLab replication is now completed.

 

Link to :

SAP Hana EIM  Connection Scenario Setup - Part 1

SAP Hana EIM  Connection Scenario Setup - Part 2
