In this documentation I’ll explain how to install and configure SAP HANA Vora 1.2 with SAP HANA SP11 integration, and I will demonstrate in detail how to set up a Hortonworks ecosystem in order to realize this configuration.
For my setup I’ll use my own lab on VMware vSphere 6.0, running SAP HANA Vora 1.2, SAP HANA revision 112, and the Hadoop HDFS stack 2.7.2.
Disclaimer: my deployment is for test purposes only; I keep the security simple from a network perspective in order to realize this configuration, and I use open source software.
Order of execution
- Deploy Hortonworks ecosystem
- Install SAP Hana Vora for Ambari
- Install SAP Hana Spark Controller 1.5
- Install Spark assembly and dependent libraries
- Configure Hive Metastore
- Configure Spark queue
- Adjust MapReduce2 class path
- Connect SAP Hana to SAP Hana Vora
Guides used
SAP HANA Vora Installation and Developer Guide
SAP HANA Administration Guide
Notes used
2284507 - SAP HANA Vora 1.2 Release Note
2203837 - SAP HANA Vora: Central Release Note
2213226 - Prerequisites for installing SAP HANA Vora: Operating Systems and Hadoop Components
Links used
SAP Help page for SAP HANA Vora 1.2
Overview Architecture
The architecture is based on a fully virtual environment; running SAP HANA Vora 1.2 requires the following mandatory components as part of the Hadoop ecosystem:
• HDFS 2.6.x or 2.7.x
• ZooKeeper
• Spark 1.5.2
• YARN cluster manager
For my configuration, all my servers are registered in my DNS and synchronized with an NTP server.
Deploy Hortonworks Ecosystem
The Hortonworks ecosystem deployment consists of several steps:
1. Prepare the servers by sharing the SSH public key
2. Install MySQL connector
3. Install Ambari
4. Install Hive database
5. Install and configure HDP cluster
For the installation, in order to keep it simple, I decided to use the “Ambari Automated Installation” based on HDP version 2.3.4, which can be deployed with Spark version 1.5.2.
To realize this configuration my deployment will comprise 3 VMs:
Ambari: ambari.will.lab
Yarn: yarn.will.lab
Hana: vmhana02.will.lab
Prepare the servers by sharing the SSH public key
With my 3 servers up and running, we have to set up the SSH public key from the Ambari server in order to allow it to install the Ambari agent on the hosts that are part of the cluster.
I first create the RSA key pair
And copy the public key to the remote server “yarn”
And try to SSH to my remote server to confirm that I no longer need to use the password
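For reference, a minimal sketch of those three steps from the Ambari server, assuming the root user and my lab hostname:
ssh-keygen -t rsa                    # generate the RSA key pair, accepting the defaults
ssh-copy-id root@yarn.will.lab       # append the public key to the remote authorized_keys
ssh root@yarn.will.lab               # should now open a session without a password prompt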
Install MySQL connector
Hive requires a relational database to store the Hive Metastore. I install the MySQL connector and note its path; it will be required during the initialization setup of Ambari.
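On SLES this can be done with zypper; the package normally drops the .jar under /usr/share/java:
zypper install mysql-connector-java
ls /usr/share/java/mysql-connector-java.jar   # note this path for the Ambari setup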
Install Ambari
On the Ambari server we have to download the Ambari repository for SLES 11:
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo
And finally install Ambari.
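With the repository in place, the installation itself is a single command:
zypper install ambari-server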
Now installed, the Ambari server needs to be set up:
Note: I decided to use the Oracle JDK 1.8 and the embedded PostgreSQL database for Ambari
Once done, start the server and check the status.
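For reference, the setup and startup sequence is:
ambari-server setup    # interactive; this is where I chose Oracle JDK 1.8 and the embedded PostgreSQL
ambari-server start
ambari-server status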
Note: I did not specify the MySQL connector path at the beginning of the Ambari initialization; to include it, stop Ambari and load it by re-executing the following command
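The command looks like this, assuming the connector path noted earlier:
ambari-server stop
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server start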
Install Hive Database
By default on RHEL/CentOS/Oracle Linux 6, Ambari installs an instance of MySQL on the Hive Metastore host. Since I'm using SLES, I need to create an instance of MySQL for the Hive Metastore myself.
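A minimal sketch of the Hive Metastore database preparation, launched from the mysql client, with illustrative user name and password values:
mysql -u root -p
CREATE DATABASE hive;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;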
Install and configure HDP cluster
With the server up and running we can start the installation and configuration of the HDP cluster components. To proceed, open the Apache Ambari URL and run the wizard with the default user and password “admin/admin”.
Follow the steps provided by the wizard to create your cluster.
For this section, provide the private key generated earlier on the Ambari server.
Hosts added successfully, but check the warning messages.
Choose the services you want to deploy.
Assign the services you want to run on the selected master node; since I’m using one host only, it’s a no-brainer. Additional hosts can be assigned later based on your needs.
Assign slaves and clients.
Customize your services to your needs as well; in my case I use a MySQL database, so I need to provide the database information.
Review the configuration for all services and execute.
Once completed, access the Ambari web page and make some checks to see the running services
With the Hortonworks ecosystem now installed, we can proceed with the SAP HANA Vora for Ambari installation.
SAP HANA Vora for Ambari
SAP HANA Vora 1.2 is now available for download as a single installation package for the Ambari and Cloudera cluster provisioning tools. These packages also contain the SAP HANA Vora Spark extension library (spark-sap-datasources-<VERSION>-assembly.jar), which no longer needs to be downloaded separately.
The following components will be deployed from the provisioning tool
For the Vora DLog component a specific library, “libaio”, is required on the server; make sure it is installed.
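On SLES it can be checked and, if needed, installed with:
rpm -q libaio         # check whether the library is present
zypper install libaio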
Once downloaded, from the Ambari server copy the VORA_AM* file into the
/var/lib/ambari-server/resources/stacks/HDP/2.3/services folder
And decompress it; this will generate the several Vora application folders.
Then restart the Ambari server in order to load the new services.
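For reference, the sequence on the Ambari server looks like this (the exact package file name and archive format depend on the downloaded version; a gzipped tar is an assumption here):
cp VORA_AM*.tgz /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
cd /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
tar -xzf VORA_AM*.tgz
ambari-server restart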
Once completed, install the new Vora services from the Ambari dashboard.
Select the Vora applications to deploy and hit Next to install them.
The Vora Discovery and Thriftserver services will require some customization entries, such as hostname and Java location.
The new services appear now; yes, I have red services, but they will be fixed.
With the Vora engine installed, I need to install the Spark Controller.
Install SAP Hana Spark Controller 1.5
The Spark Controller needs to be downloaded from the marketplace; it is an .rpm package.
Once downloaded, execute the rpm command to install it.
When the installation is completed, the /usr/sap/spark/controller folder is normally generated.
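For reference (the .rpm file name below is an assumption and varies with the patch level):
rpm -ivh SAPHanaSparkController-1.5*.rpm
ls /usr/sap/spark/controller    # verify the installation folder was created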
The next phase is to install the Spark assembly file and the dependent libraries.
Install Spark assembly and dependent libraries
The Spark assembly file and dependent libraries need to be copied into the Spark Controller's external lib folder.
Note: as of now, version 1.5.2 of the assembly .jar is the only version supported to work with Vora 1.2; I download it from the Spark download page at https://spark.apache.org/downloads.html
Decompress the archive and copy the necessary libraries into the “/usr/sap/spark/controller/lib/external” folder.
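A sketch of those steps, assuming the spark-1.5.2-bin-hadoop2.6 package was downloaded:
tar -xzf spark-1.5.2-bin-hadoop2.6.tgz
cp spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar /usr/sap/spark/controller/lib/external/
cp spark-1.5.2-bin-hadoop2.6/lib/datanucleus-*.jar /usr/sap/spark/controller/lib/external/   # Hive Metastore dependencies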
And I will update the hanaes-site.xml file in the /usr/sap/spark/controller/conf folder to adjust its content.
Spark and YARN create staging directories in the /user/hanaes directory in HDFS; this directory needs to be created manually with the following command, run as the hdfs user:
hdfs dfs -mkdir /user/hanaes
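Depending on your setup you may also need to hand the directory over to the hanaes user (the group name here is an assumption):
hdfs dfs -chown hanaes:hdfs /user/hanaes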
Configure Hive Metastore
Since the SAP HANA Spark Controller connects to the Hive Metastore, the hive-site.xml file needs to be available in the controller's class path.
To do this I will create a symbolic link in the /usr/sap/spark/controller/conf folder.
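Assuming the default HDP location of hive-site.xml, the link looks like this:
ln -s /etc/hive/conf/hive-site.xml /usr/sap/spark/controller/conf/hive-site.xml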
And adjust the hive-site.xml file with the following parameters:
• hive.execution.engine = mr
• hive.metastore.client.connect.retry.delay = remove the trailing “s” from the value
• hive.metastore.client.socket.timeout = remove the trailing “s” from the value
• hive.security.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
Note: these changes are made only because we are using the Hortonworks distribution in our example; with Cloudera they are not required.
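For illustration, the first parameter ends up looking like this in hive-site.xml:
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>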
Configure Spark queue
In order to prevent Spark from taking all the available resources from the YARN resource manager, and thus leaving no resources for any other application running on it, I need to configure Spark dynamic allocation by setting up a queue in the “Queue Manager”.
Create it, then save and refresh from the action button.
Once done, add the spark.yarn.queue property to the hanaes-site.xml file.
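A sketch of the property, assuming the queue was named “sparkq” in the Queue Manager:
<property>
  <name>spark.yarn.queue</name>
  <value>sparkq</value>
</property>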
Adjust MapReduce2 class path
One important point to take into consideration about the Spark Controller is that the component library path called during startup doesn't support variables such as “${hdp.version}”.
This variable is declared in the MapReduce2 configuration
Expand the Advanced mapred-site property and locate the parameter “mapreduce.application.classpath”
Copy/paste the whole string into your favorite editor and change all ${hdp.version} entries to the current HDP version.
Before the change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure
After the change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.3.0.0-2557/hadoop/lib/hadoop-lzo-0.6.0.2.3.0.0-2557.jar:/etc/hadoop/conf/secure
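If you are unsure of the exact build string, it can be read on the cluster, for example with:
hdp-select versions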
Once done, as the “hanaes” user, start the Spark Controller from the /usr/sap/spark/controller/bin directory.
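A sketch of the start sequence, assuming the hanaes start script shipped in the controller's bin folder:
su - hanaes
cd /usr/sap/spark/controller/bin
./hanaes start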
Check the Spark log in /var/log/hanaes/hana_controller.log to see if it's running properly.
As we can see, I have an error in my config file.
Connect SAP Hana to SAP Hana Vora
With my Hortonworks ecosystem in place and SAP HANA Vora 1.2 deployed, I can connect my HANA instance to it over the Spark adapter.
Before trying to make any connection, one specific library needs to be copied into the “/usr/sap/spark/controller/lib” folder: from “/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/package/lib/vora-spark/lib”, copy the spark-sap-datasources-1.2.33-assembly.jar file.
Once done, restart the Spark Controller.
Now, to connect to my Hadoop from HANA, I need to create a new remote connection by using the following SQL statement
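A sketch of the statement, using my lab hostname, the default Spark Controller port, and illustrative credentials:
CREATE REMOTE SOURCE "VORA" ADAPTER "sparksql"
  CONFIGURATION 'port=7860;ssl_mode=disabled;server=yarn.will.lab'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes';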
Since I did not create any table in my Hadoop environment, nothing appears below “default”. To test it, I'll create a new schema, load a table (CSV) into it, and check the result in HANA.
Note: you can download some csv sample here
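A minimal sketch in Hive, with an illustrative schema name and a table layout matching a simple CSV:
CREATE SCHEMA testlab;
CREATE TABLE testlab.sales (id INT, product STRING, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE testlab.sales;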
Once done, check the result from the Hive view.
And make the check in HANA by creating and querying the virtual table.
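For reference, the HANA side looks roughly like this (remote source, schema, and table names follow my illustrative example above; the unused catalog level is skipped with "<NULL>"):
CREATE VIRTUAL TABLE "SYSTEM"."V_SALES" AT "VORA"."<NULL>"."testlab"."sales";
SELECT * FROM "SYSTEM"."V_SALES";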
It’s all good, I have my data.
My configuration is now complete, with SAP HANA Vora 1.2 set up and connected to SAP HANA SP11.