In this documentation I’ll explain how to install and configure SAP HANA Vora 1.2 with SAP HANA SP11 integration, and I will demonstrate in detail how to set up a Hortonworks ecosystem in order to realize this configuration.
For my setup I’ll use my own lab on VMware vSphere 6.0, running SAP HANA Vora 1.2, SAP HANA revision 112, and the Hadoop HDFS stack 2.7.2.
Disclaimer: my deployment is for test purposes only; I keep the security simple from a network perspective in order to realize this configuration, and I use open source software.
Order of execution
- Deploy Hortonworks ecosystem
- Install SAP Hana Vora for Ambari
- Install SAP Hana Spark Controller 1.5
- Install Spark assembly and dependent libraries
- Configure Hive Metastore
- Configure Spark queue
- Adjust MapReduce2 class path
- Connect SAP Hana to SAP Hana Vora
Guides used
SAP HANA Vora Installation and Developer Guide
SAP HANA Administration Guide
Notes used
2284507 - SAP HANA Vora 1.2 Release Note
2203837 - SAP HANA Vora: Central Release Note
2213226 - Prerequisites for installing SAP HANA Vora: Operating Systems and Hadoop Components
Links used
SAP Help page for SAP HANA Vora 1.2
Overview Architecture
The architecture is based on a fully virtual environment; running SAP HANA Vora 1.2 requires the following mandatory components as part of the Hadoop ecosystem:
• HDFS 2.6.x or 2.7.x
• ZooKeeper
• Spark 1.5.2
• YARN cluster manager
For my configuration, all my servers are registered in my DNS and synchronized with an NTP server.
Deploy Hortonworks Ecosystem
The Hortonworks ecosystem deployment consists of several steps:
1. Prepare the servers by sharing the SSH public key
2. Install MySQL connector
3. Install Ambari
4. Install Hive database
5. Install and configure HDP cluster
For the installation, in order to keep it simple, I decided to use the “Ambari Automated Installation” based on HDP version 2.3.4, which can be deployed with Spark version 1.5.2.
To realize this configuration my deployment will comprise 3 VMs:
Ambari: ambari.will.lab
Yarn: yarn.will.lab
Hana: vmhana02.will.lab
Prepare the servers by sharing the SSH public key
With my 3 servers up and running, we have to set up the SSH public key from the Ambari server in order to allow it to install the Ambari agent on the hosts that are part of the cluster.
I first create the RSA key pair
And copy the public key to the remote server “yarn”
And try to SSH to my remote server to confirm that I no longer need to use the password
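For reference, a minimal sketch of those three steps from the Ambari server, assuming the root user and my lab hostname:
ssh-keygen -t rsa                    # generate the RSA key pair, accepting the defaults
ssh-copy-id root@yarn.will.lab       # append the public key to the remote authorized_keys
ssh root@yarn.will.lab               # should now open a session without a password prompt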
Install MySQL connector
Hive requires a relational database to store the Hive Metastore. I install the MySQL connector and note its path; it will be required during the initialization setup of Ambari.
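On SLES this can be done with zypper; the package normally drops the .jar under /usr/share/java:
zypper install mysql-connector-java
ls /usr/share/java/mysql-connector-java.jar   # note this path for the Ambari setup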
Install Ambari
On the Ambari server we have to download the Ambari repository for SLES 11:
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo
And finally install Ambari.
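With the repository in place, the installation itself is a single command:
zypper install ambari-server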
Now installed, the Ambari server needs to be set up:
Note: I decided to use the Oracle JDK 1.8 and the embedded PostgreSQL database for Ambari
Once done, start the server and check the status.
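For reference, the setup and startup sequence is:
ambari-server setup    # interactive; this is where I chose Oracle JDK 1.8 and the embedded PostgreSQL
ambari-server start
ambari-server status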
Note: I did not specify the MySQL connector path at the beginning of the Ambari initialization; to include it, stop Ambari and load it by re-executing the following command
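The command looks like this, assuming the connector path noted earlier:
ambari-server stop
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server start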
Install Hive Database
By default on RHEL/CentOS/Oracle Linux 6, Ambari installs an instance of MySQL on the Hive Metastore host. Since I'm using SLES, I need to create an instance of MySQL for the Hive Metastore myself.
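A minimal sketch of the Hive Metastore database preparation, launched from the mysql client, with illustrative user name and password values:
mysql -u root -p
CREATE DATABASE hive;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;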
Install and configure HDP cluster
With the server up and running we can start the installation and configuration of the HDP cluster components. To proceed, open the Apache Ambari URL and run the wizard with the default user and password “admin/admin”.
Follow the steps provided by the wizard to create your cluster.
For this section, provide the private key generated earlier on the Ambari server.
Hosts added successfully, but check the warning messages.
Choose the services you want to deploy.
Assign the services you want to run on the selected master node; since I’m using one host only, it’s a no-brainer. Additional hosts can be assigned later based on your needs.
Assign slaves and clients.
Customize your services to your needs as well; in my case I use a MySQL database, so I need to provide the database information.
Review the configuration for all services and execute.
Once completed, access the Ambari web page and make some checks to see the running services
With the Hortonworks ecosystem now installed, we can proceed with the SAP HANA Vora for Ambari installation.
SAP HANA Vora for Ambari
SAP HANA Vora 1.2 is now available for download as a single installation package for the Ambari and Cloudera cluster provisioning tools. These packages also contain the SAP HANA Vora Spark extension library (spark-sap-datasources-<VERSION>-assembly.jar), which no longer needs to be downloaded separately.
The following components will be deployed from the provisioning tool
For the Vora DLog component a specific library, “libaio”, is required on the server; make sure it is installed.
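On SLES it can be checked and, if needed, installed with:
rpm -q libaio         # check whether the library is present
zypper install libaio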
Once downloaded, from the Ambari server copy the VORA_AM* file into the
/var/lib/ambari-server/resources/stacks/HDP/2.3/services folder
And decompress it; this will generate the several Vora application folders.
Then restart the Ambari server in order to load the new services.
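For reference, the sequence on the Ambari server looks like this (the exact package file name and archive format depend on the downloaded version; a gzipped tar is an assumption here):
cp VORA_AM*.tgz /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
cd /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
tar -xzf VORA_AM*.tgz
ambari-server restart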
Once completed, install the new Vora services from the Ambari dashboard.
Select the Vora applications to deploy and hit Next to install them.
The Vora Discovery and Thriftserver services will require some customization entries, such as hostname and Java location.
The new services appear now; yes, I have red services, but they will be fixed.
With the Vora engine installed, I need to install the Spark Controller.
Install SAP Hana Spark Controller 1.5
The Spark Controller needs to be downloaded from the marketplace; it is an .rpm package.
Once downloaded, execute the rpm command to install it.
When the installation is completed, the /usr/sap/spark/controller folder is normally generated.
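For reference (the .rpm file name below is an assumption and varies with the patch level):
rpm -ivh SAPHanaSparkController-1.5*.rpm
ls /usr/sap/spark/controller    # verify the installation folder was created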
The next phase is to install the Spark assembly file and the dependent libraries.
Install Spark assembly and dependent libraries
The Spark assembly file and dependent libraries need to be copied into the Spark Controller's external lib folder.
Note: as of now, version 1.5.2 of the assembly .jar is the only version supported to work with Vora 1.2; I download it from the Spark download page at https://spark.apache.org/downloads.html
Decompress the archive and copy the necessary libraries into the “/usr/sap/spark/controller/lib/external” folder.
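A sketch of those steps, assuming the spark-1.5.2-bin-hadoop2.6 package was downloaded:
tar -xzf spark-1.5.2-bin-hadoop2.6.tgz
cp spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar /usr/sap/spark/controller/lib/external/
cp spark-1.5.2-bin-hadoop2.6/lib/datanucleus-*.jar /usr/sap/spark/controller/lib/external/   # Hive Metastore dependencies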
And I will update the hanaes-site.xml file in the /usr/sap/spark/controller/conf folder to adjust its content.
Spark and YARN create staging directories in the /user/hanaes directory in HDFS; this directory needs to be created manually with the following command, run as the hdfs user:
hdfs dfs -mkdir /user/hanaes
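Depending on your setup you may also need to hand the directory over to the hanaes user (the group name here is an assumption):
hdfs dfs -chown hanaes:hdfs /user/hanaes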
Configure Hive Metastore
Since the SAP HANA Spark Controller connects to the Hive Metastore, the hive-site.xml file needs to be available in the controller's class path.
To do this I will create a symbolic link in the /usr/sap/spark/controller/conf folder.
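Assuming the default HDP location of hive-site.xml, the link looks like this:
ln -s /etc/hive/conf/hive-site.xml /usr/sap/spark/controller/conf/hive-site.xml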
And adjust the hive-site.xml file with the following parameters:
• hive.execution.engine = mr
• hive.metastore.client.connect.retry.delay = remove the trailing “s” from the value
• hive.metastore.client.socket.timeout = remove the trailing “s” from the value
• hive.security.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
Note: these changes are made only because we are using the Hortonworks distribution in our example; with Cloudera they are not required.
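For illustration, the first parameter ends up looking like this in hive-site.xml:
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>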
Configure Spark queue
In order to prevent Spark from taking all the available resources from the YARN resource manager, and thus leaving no resources for any other application running on it, I need to configure Spark dynamic allocation by setting up a queue in the “Queue Manager”.
Create it, then save and refresh from the action button.
Once done, add the spark.yarn.queue property to the hanaes-site.xml file.
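A sketch of the property, assuming the queue was named “sparkq” in the Queue Manager:
<property>
  <name>spark.yarn.queue</name>
  <value>sparkq</value>
</property>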
Adjust MapReduce2 class path
One important point to take into consideration about the Spark Controller is that the component library path called during startup doesn't support variables such as “${hdp.version}”.
This variable is declared in the MapReduce2 configuration
Expand the Advanced mapred-site property and locate the parameter “mapreduce.application.classpath”
Copy/paste the whole string into your favorite editor and change all ${hdp.version} entries to the current HDP version.
Before the change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure
After the change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.3.0.0-2557/hadoop/lib/hadoop-lzo-0.6.0.2.3.0.0-2557.jar:/etc/hadoop/conf/secure
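If you are unsure of the exact build string, it can be read on the cluster, for example with:
hdp-select versions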
Once done, as the “hanaes” user, start the Spark Controller from the /usr/sap/spark/controller/bin directory.
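A sketch of the start sequence, assuming the hanaes start script shipped in the controller's bin folder:
su - hanaes
cd /usr/sap/spark/controller/bin
./hanaes start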
Check the Spark log in /var/log/hanaes/hana_controller.log to see if it's running properly.
As we can see, I have an error in my config file.
Connect SAP Hana to SAP Hana Vora
With my Hortonworks ecosystem in place and SAP HANA Vora 1.2 deployed, I can connect my HANA instance to it over the Spark adapter.
Before trying to make any connection, one specific library needs to be copied into the “/usr/sap/spark/controller/lib” folder: from “/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/package/lib/vora-spark/lib”, copy the spark-sap-datasources-1.2.33-assembly.jar file.
Once done, restart the Spark Controller.
Now, to connect to my Hadoop from HANA, I need to create a new remote connection by using the following SQL statement
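A sketch of the statement, using my lab hostname, the default Spark Controller port, and illustrative credentials:
CREATE REMOTE SOURCE "VORA" ADAPTER "sparksql"
  CONFIGURATION 'port=7860;ssl_mode=disabled;server=yarn.will.lab'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes';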
Since I did not create any table in my Hadoop environment, nothing appears below “default”. To test it, I'll create a new schema, load a table (CSV) into it, and check the result in HANA.
Note: you can download some csv sample here
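A minimal sketch in Hive, with an illustrative schema name and a table layout matching a simple CSV:
CREATE SCHEMA testlab;
CREATE TABLE testlab.sales (id INT, product STRING, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE testlab.sales;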
Once done, check the result from the Hive view.
And make the check in HANA by creating and querying the virtual table.
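For reference, the HANA side looks roughly like this (remote source, schema, and table names follow my illustrative example above; the unused catalog level is skipped with "<NULL>"):
CREATE VIRTUAL TABLE "SYSTEM"."V_SALES" AT "VORA"."<NULL>"."testlab"."sales";
SELECT * FROM "SYSTEM"."V_SALES";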
It’s all good, I have my data.
My configuration is now complete, with SAP HANA Vora 1.2 set up and connected to SAP HANA SP11.