Due to the interest in this topic I converted this discussion into a document.
Everybody is invited to edit and contribute.
Maybe this can evolve into some kind of FAQ...
Cheers,
Lars
-----
I posted the following questions on saphana.com but never received a response. Anyone care to take a look? http://www.saphana.com/message/4304#4304
From what I know, there are three basic engines used to process queries in SAP HANA: the Join Engine, the OLAP Engine, and the Calculation Engine. What I am not clear on is under what scenarios these engines are invoked. My basic understanding is as follows, but more information would be appreciated:
- Join Engine
- Used when querying an Attribute View
- Questions:
- What if I execute a standard ANSI-92 SQL statement from a BI tool or in SAP HANA Studio? Does this also use the Join Engine?
- If my Analytic View foundation is joined to attribute views, are both the OLAP and Join Engines used?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
- OLAP Engine
- Analytic Views (without derived columns) use the OLAP Engine
- Questions:
- Can this only be invoked from the column views in the _SYS_BIC schema that have the term "OLAP" in their name?
- What syntax is used to create these column views?
- Can I create my own column views (outside of activating an information view)?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
- Calculation Engine
- Calculation Views or Analytic Views with derived columns use this engine
- Questions:
- Can this only be invoked from the column views in the _SYS_BIC schema?
- What syntax is used to create these column views?
- Can I create my own column views (outside of activating an information view)?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
Hi Jonathan,
not sure if I can answer all your questions, but let's see how far I get...
Jonathan Haun wrote:
- Join Engine
- Used when querying an Attribute View
- Questions:
- What if I execute a standard ANSI-92 SQL statement from a BI tool or in SAP HANA Studio? Does this also use the Join Engine?
- If my Analytic View foundation is joined to attribute views, are both the OLAP and Join Engines used?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
The join engine is also used when you run plain SQL. Actually, it's not too difficult to check which engine is used to process a query: simply run the Plan Visualization on your query.
The Plan Operators (POPs) you find in there - represented by the boxes in the graph - are named based on a naming scheme like this:
- JE<operator_name> => Join Engine (Attribute views, plain SQL)
- BW<operator_name> => OLAP Engine (Analytic views)
- CE<operator_name> => Calc Engine (Analytic views with calculated attributes, Calculation views)
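If you prefer a text-based check, EXPLAIN PLAN gives a similar hint. Here is a minimal sketch (the EXECUTION_ENGINE column of EXPLAIN_PLAN_TABLE is an assumption on my side and may differ between revisions; the view name is the one from the example further down):
EXPLAIN PLAN SET STATEMENT_NAME = 'engine_check' FOR
SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
GROUP BY ORDERED_CAL_YEAR;

-- the operator list shows which engine handles each step
SELECT OPERATOR_NAME, OPERATOR_DETAILS, EXECUTION_ENGINE
FROM EXPLAIN_PLAN_TABLE
WHERE STATEMENT_NAME = 'engine_check';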
Answers to the questions:
A1: Depends on what objects you reference in your query. If you're just querying plain base tables then the join engine will be used. As soon as you reference an analytic or calculation view, the other engines will be employed as well.
A2: Nope - during activation of the analytic views, the joins in the attribute views get 'flattened' and included in the analytic view run time object. Only the OLAP engine will be used then.
A3: All the query execution is happening in the indexserver - so, yes, join/olap/calc engine are all major parts of the indexserver.
Jonathan Haun wrote:
- OLAP Engine
- Analytic Views (without derived columns) use the OLAP Engine
- Questions:
- Can this only be invoked from the column views in the _SYS_BIC schema that have the term "OLAP" in their name?
- What syntax is used to create these column views?
- Can I create my own column views (outside of activating an information view)?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
A1: Technically, the run time objects of analytic views and attribute views are so-called column views. These can also be created without the modeler, in which case they can be placed in arbitrary schemas, not just the _SYS_BIC schema.
For modeled views, however, the _SYS_BIC schema is the only location for the run time objects.
A2: not officially documented, but you can check the "source code" of the run time objects by displaying their definition.
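For example, a sketch (assuming SELECT privileges on the catalog; on some revisions the definition of modeled column views may not be exposed here, in which case the "Open Definition" context menu in the Studio shows it):
SELECT DEFINITION
FROM "SYS"."VIEWS"
WHERE SCHEMA_NAME = '_SYS_BIC'
  AND VIEW_NAME = 'saphana.internetsales/ANLV_INTERNETSALES/olap';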
A3: sure you can - if you can ;-) As I wrote, this is not officially documented or supported, so use at your own risk.
Besides, you don't have any transport management with these views - so this really might not be something for production scenarios.
A4: see above. Yes!
A5: there are actually a couple of optimizers that work on different levels of query processing. So the answer here is: yes.
The key for analytic views is to keep in mind that they are specifically designed to execute star schema queries. So the query optimization process 'knows' the scenario and optimizes accordingly.
Jonathan Haun wrote:
Calculation Engine
- Calculation Views or Analytic Views with derived columns use this engine
- Questions:
- Can this only be invoked from the column views in the _SYS_BIC schema?
- What syntax is used to create these column views?
- Can I create my own column views (outside of activating an information view)?
- Is this engine embedded in the indexserver?
- How does the optimizer play a role with this engine?
A1: nope - you'd also see the Calc Engine being used when running SQLScript procedures (a sketch follows after the notes list below).
A2: even less documented outside SAP than the column views. Still, you could check the source code, but you'd be unpleasantly surprised to find a large chunk of JSON syntax.
A3: technically, yes, again - if you could ...
A4: Sure it is.
A5: The Calc Engine is pretty special and I'd recommend giving the following SAP Notes, including their attachments, a read:
- SAP Note 1764658: HANA Calculation Engine Instantiation Process - https://service.sap.com/sap/support/notes/1764658
- SAP Note 1825549: Performance degradation for query on CalcView - https://service.sap.com/sap/support/notes/1825549
- SAP Note 1791464: Select count(*) on calculated views - https://service.sap.com/sap/support/notes/1791464
- SAP Note 1783880: Duplicate dimension values for query on Calculation View - https://service.sap.com/sap/support/notes/1783880
- SAP Note 1764662: Select on a calculated attribute returns empty result - https://service.sap.com/sap/support/notes/1764662
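Regarding A1, here is a minimal SQLScript sketch of a procedure that would engage the calc engine; the CE functions are the ones from the SQLScript reference, while the procedure name and the view are just the examples used in this thread:
CREATE PROCEDURE sales_by_year_sketch()
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
    -- CE functions are executed by the calculation engine
    ov = CE_OLAP_VIEW("_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap",
                      ["ORDERED_CAL_YEAR", "SALESAMOUNT"]);
    res = CE_AGGREGATION(:ov, [SUM("SALESAMOUNT") AS "SALESAMOUNT"],
                         ["ORDERED_CAL_YEAR"]);
    SELECT * FROM :res;
END;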
Hope that these answers give you something to work with.
Cheers, Lars
Lars,
Based on your JOIN ENGINE answer:
"A2: Nope - during activation of the analytic views, the joins in the attribute views get 'flattened' and included in the analytic view run time object. Only the OLAP engine will be used then"
Using the example plan image at the bottom, I noticed that it starts with a "Join" operation and then moves into BWxxxxxx operations. Obviously the attribute views must be joined to the analytic foundation, but I was never sure which engine processed these joins until you provided the response above. Does the flattening operation occur at run time or only when the column view is activated? What does the plan mean by "Join" in the image below?
SQL:
SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
GROUP BY ORDERED_CAL_YEAR
BTW: Thanks for taking the time to answer these questions. This is very informative.
Hi Jonathan,
just open the Join box (via the little triangle in its upper right corner).
You'll find that there are several BWJoin... steps being performed.
AFAIK the "flattening" is done during activation.
That means after activating the analytic view, the whole join structure is a star schema query and it doesn't matter whether you join a table in the "data foundation" or in the "logical join" - it boils down to the same star query.
That in mind it becomes clear what attribute views are for: reusable definitions of dimensions.
- Lars
Thanks Lars Breddemann. Very informative.
Lars,
Thanks again for your guidance on this question. I received a response on saphana.com from another SAP employee, but I am not sure about some of the answers. Beyond the conversation about analytic views, he mentioned that sometimes the row engine was invoked. Do you know anything about row engine usage with analytic views?
http://www.saphana.com/message/4397
Hi again
In your example plans, the row engine was not involved, and I actually don't see a way to design an analytic view to use the row engine.
However, you could e.g. write a SQL query that reads data from an analytic view and uses functionality that is only supported in the row engine (e.g. the date conversion *with* character pattern or some of the window functions).
In this case the result set from the analytic view would likely be materialized and handed over to the row engine.
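For illustration, a sketch of such a query (assuming the analytic view from earlier in this thread; which window functions actually run in the row engine depends on the revision):
SELECT ORDERED_CAL_YEAR,
       SALESAMOUNT,
       RANK() OVER (ORDER BY SALESAMOUNT DESC) AS SALES_RANK
FROM (
    -- the aggregation can be handled by the OLAP engine ...
    SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
    FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
    GROUP BY ORDERED_CAL_YEAR
) AS agg;
-- ... while RANK() may then be evaluated on the materialized
-- intermediate result in the row engine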
A similar effect would occur if you mix plain SQL and CE functions in a calculation view.
Therefore, the HA300 is not as inaccurate as you may think, but it surely doesn't cover the inner workings of the engines.
- Lars
Lars,
Thanks again. I understand the concepts now. I have been visualizing plans for a variety of queries using multiple scenarios over the last few days. I can see how different design approaches invoke different engines. One interesting item I noticed is that when I activate an analytic view that contains a calculated column, two different column views are created in _SYS_BIC. One column view is coded to leverage the OLAP engine and its name ends in /olap. I also noticed that it lacks some of the attributes defined as calculated columns. The second column view is activated using the CREATE CALCULATION SCENARIO command. It appears to be built on the parent OLAP column view but using a distinct syntax. In addition, this view has all of the calculated columns that were missing from the OLAP column view.
From an SAP BusinessObjects Universe design standpoint, it is important that I leverage the best-performing column view. More importantly, this information helps me understand which engine I will invoke based on how I design the Universe and/or the Information View. Thanks again for filling in the gaps.
Regards,
Jonathan Haun
Okay...
that one is easy and I kinda mentioned it already.
Once you start having a calculated column in an analytic view, the runtime object <package_name>/<view_name> is created as a wrapper calculation view.
In this wrapper, the <package_name>/<view_name>/olap runtime object (which is the pure analytic view w/o the calculated column) is referenced.
The thing here is: even if you don't ask for the calculated column, the query will still need to go through the calc. engine in this case.
So, it really makes sense to spend some thought on where exactly a calculated column needs to be introduced (if it is actually required).
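You can spot both runtime objects in the catalog; a sketch using the view from the earlier examples (the VIEW_TYPE column is an assumption and may not exist on older revisions):
SELECT VIEW_NAME, VIEW_TYPE
FROM "SYS"."VIEWS"
WHERE SCHEMA_NAME = '_SYS_BIC'
  AND VIEW_NAME LIKE 'saphana.internetsales/ANLV_INTERNETSALES%';
-- once a calculated column exists, expect two rows:
-- the wrapper (calculation) view and the '/olap' view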
- Lars
Lars,
Again, thanks for all the information on this topic. I had a follow-up to one of your statements. You stated: "That means after activating the analytic view, the whole join structure is a star schema query and it doesn't matter whether you join a table in the "data foundation" or in the "logical join" - it boils down to the same star query. That in mind it becomes clear what attribute views are for: reusable definitions of dimensions".
My question is:
I don't remember the source, but I was reading a HANA guide that stated that joining in the AV analytic foundation would lead to poor performance. That was the extent of the comment. However, as you stated, there is technically no difference because joins are all treated the same once activated. I confirmed this when I visualized the plan of an example in my environment. I'm thinking the point of that comment was that joins between large tables or high-cardinality tables anywhere in the Analytic View can slow performance. I have experienced this with joins between (for example) a 10M row Order Header and a 150M row Order Line Item table. However, if I use (for example) Data Services to pre-join these tables and then load them into a single table in SAP HANA, I will eliminate the join cost and increase the overall performance. To that extent, is it better to merge large fact or transaction tables before loading them into HANA to avoid the cost of joining in the OLAP Engine? Keep in mind I am referencing examples like the Order Header and Order Line Item fact tables.
On a side note, SAP HANA is still faster than any database I have experienced. It still manages to join the large tables faster than traditional databases, but I like to squeeze the absolute highest performance out of my solutions.
hmmm hmmm hmmm...
somebody wrote that something is slow somewhere... No way to argue here :-)
And it's actually impossible to give a general answer to questions like yours.
Analytic views solve a very specific problem: star queries.
If you actually want to do star query processing, you know, one fact table, all dimensions only joined to this central table, all filters applied to the dimensions only, then the analytic view is the weapon of choice.
It's built for precisely this.
If this is not what you're after, then maybe there are better ways.
Prejoining during the data load might be an option, as the fastest piece of work still is the one you don't need to do.
Up to now, I've seen very different use cases for most of the mentioned options (and others as well) and usually the implementation variant that got chosen in the end wasn't totally clear from the start.
Alternative approaches are usually tried and tested and of course measured and based on this, a decision is made on what to use.
Which leads me to the all-time sneak-out answer: it depends.
Cheers,
Lars
Hi Lars,
Referring to your answer:
"The join engine is also used, when you run plain SQL. Actually it's not too difficult to check what engine is used to process a query: simply run the Plan Visualization on your query.
The Plan Operators (POPs) you find in there - represented by the boxes in the graph - are named based on a naming scheme like this:
- JE<operator_name> => Join Engine (Attribute views, plain SQL)
- BW<operator_name> => Olap Engine (Analytic views)
- CE<operator_name> => Calc Engine (Analytic views with calculated attributes, Calculation views)"
I tried searching blogs for this functionality, but all I could find was the 'Explain Plan' command from http://www.saphana.com/docs/DOC-2377
I am sorry, but I could not find the Plan Visualization option in SAP HANA Studio. Can you please help me locate this particular functionality? I am using SAP HANA Studio Version 52.
Regards,
Shreepad
In the SQL Editor, highlight a query, right click on it with the mouse and choose "Explain Plan".
Hi Shreepad,
in my post "Show me the timelines, baby!" I explained how to open the plan visualization.
It can be found close to the Explain Plan option in the context menu of the SQL editor window.
You can also navigate to it via the Shared SQL Cache list or the Expensive statements list.
Again, the option is to be found in the context menu.
- Lars
First of all - I want to say that this post has been incredibly helpful in understanding some of the visualization and execution plans, really informative.
I was also attempting to understand when the various engines kick in for processing, and I am observing a difference compared to what Jonathan had described.
- Created analytic view AN_TEST_REFERENTIAL
- One entry in _SYS_BIC.copa-poc/AN_TEST_REFERENTIAL was observed
- In the execution plan, only the BWPop (OLAP engine) was observed with the following statement
SELECT MATNR, SUM(SALES_VAL) FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL"
GROUP BY MATNR
All is good at this point and behavior is as expected
- Add a calculated measure, simply SALES_VAL_X2 which is SALES_VAL * 2
- Still only one entry in _SYS_BIC, no additional column view is created. Additionally, the definition of the '/olap' column view contains the definition of the newly calculated column.
- In the execution plan, even though I am requesting the calculated column, only the BWPop (OLAP engine) seems to be firing with this statement:
SELECT MATNR, SUM(SALES_VAL), SUM(SALES_VAL_X2) FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL" GROUP BY MATNR
This just seems different from what was described - the core difference being that I assumed that any time a calculated column was introduced, the Calculation Engine would have to take over.
Any thoughts?
Thanks,
Justin
Justin,
There is a difference between a calculated measure and a calculated column. Most calculated measures will stay in the OLAP engine. However, if you had a calculated column (let's say you concatenated two columns), then you would see two column views created in _SYS_BIC: one that covers the base star schema using the OLAP Engine, and another that builds on it to process the calculated column in the calculation engine. The column view ending in "/olap" is the base column view. The one without "/olap" is the calculation view. Also notice the different icons. Again, this only happens when there is something in your design that has to be managed by the calculation engine.
Ahhh, that is the key here - calculated/derived column vs. a calculated measure. I was under the impression that ANY calculated column would force the calc engine, which is incorrect. This is very helpful as it tells me that pushing any calculated measures into the model is relatively cheap since it stays in the OLAP engine AND also performs the calc after aggregation (unless otherwise specified).
Now, after adding a simple calculated column based on the value of MATNR, I can see the Calc Engine firing, and the additional column views are created exactly as you have shown.
Thanks a bunch for the super fast response!
Justin
Jonathan, just one further question. Since two objects are created in the _SYS_BIC schema that together form the model, is there only one of these exposed for use by the BOBJ tools? I.e., does IDT see both of these? If so, I assume the /olap one is the one to use? The same question applies to Analysis for Office and Explorer.
Thanks!
Justin
Assuming that you have been granted SELECT on the column view in the _SYS_BIC schema, you will see both (actually, everything) within IDT. When designing your Universe, you can choose between one or the other or both (using @Aggregate_Aware). If you choose the analytic view version, you lose the calculated column but stay in the OLAP Engine for processing. If you choose the calculation view version, you gain the calculated columns, but the calculation engine will be used (in addition to the OLAP Engine).
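As a hypothetical sketch of such a Universe measure definition (IDT object SELECT syntax; the view and column names are taken from Justin's example, and the aggregate-awareness incompatibilities still have to be configured separately):
@Aggregate_Aware(
    sum("_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL/olap"."SALES_VAL"),
    sum("_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL"."SALES_VAL")
)
The first compatible expression wins, so the faster /olap view is preferred and the calculation view serves as the fallback when calculated columns are required.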
Also,
I believe Explorer only sees one when designing the InfoSpace, and Analysis for Office only sees one as well. They both use the calculation view version, if memory serves me correctly, but they might use some intelligence in the backend to automatically switch (need SAP to clarify). That process is a component of the reporting tool's design. However, when designing with a Universe you have more direct control.
Thanks,
Thanks for the super fast response. In the absence of BOE at the moment, I was also able to test this through Lumira and Excel and found that you could only see one of these objects for use in analysis. So there does seem to be something in the backend that "tells" the tool that only one of the objects is able to be consumed directly.
Thanks for confirming the behavior through IDT. I assume the typical path would be to consume the calculation view version.
Thanks,
Justin
Hi Lars,
Indeed, it seems "it depends".
Above you mentioned, "...after activating the analytic view, the whole join structure is a star schema query and it doesn't matter whether you join a table in the "data foundation" or in the "logical join" - it boils down to the same star query."
It seems it does matter (depend). Following some other commentary and investigation, I and others (e.g. Ravi) have seen significantly faster performance by adding tables to the Data Foundation rather than modeling them as Attribute Views (i.e. contrary to "best practice"). Also, VizPlan results show different plans depending on how the star schema is modeled. Here's the discussion - your commentary would be valued! http://scn.sap.com/thread/3433018
Hi All,
Thanks for this wonderful post. I have seen a real performance issue when an Attribute View (AT) with calculated fields is used in an Analytic View (AV). When I took the same table from the Attribute View and joined it in the data foundation instead, there was a 1000% improvement. But not using Attribute Views goes against SAP's best practices. Digging deeper, I found that the root cause of the performance issue was the calculated fields in my Attribute View. In the visualized plan, where tables are joined at the table level, only the OLAP engine is used even with calculated fields, whereas with an Attribute View containing calculated fields the Calc Engine comes into play as well.
Anyway, if anyone can explain what to do and what not to do when building models, that would be helpful.
Regards
Purnaram.k
Hi Purnaram,
you might want to have a look at the HANA modeling workshop sessions presented at SAP TechEd.
Also, the updated HA300 modeling courses from SAP Education include at least some of the dos, don'ts, and how-to tips.
- Lars
Hello Lars,
I saw this option "SQL Engine" in a calculation view.
Does that mean I am telling the system to execute the query using the SQL Engine? What are the benefits of choosing this option?
Regards,
Krishna Tangudu
Krishna,
In some cases the SQL Engine is faster at processing calculation views built to the standards mentioned in the modeling guide (quoted below).
Copied from page 58 of the SAP HANA Modeling Guide:
"You can switch the execution of the calculation view to the SQL Engine rather that the Column Engine for better performance by selecting the SQL Engine value in the Execute In property. When this property is set to SQL Engine, the calculation engine generates SQL code for each node at runtime and combines them into one big SQL statement which is then passed to the SQL interface. This might be beneficial since the SQL interface also involves standard SQL optimizer which implements a lot more optimization rules than the Calculation Engine. For example, the most important rule is join ordering determination."
Link to Guide:
http://help.sap.com/hana/SAP_HANA_Modeling_Guide_en.pdf
Thanks, Jonathan, for the explanation and references; I understand it now.
Regards,
Krishna Tangudu
I had an interesting observation and thought to revisit this thread as it's relevant to the topic.
I am facilitating some testing of other applications feeding from HANA. In much of the previous testing using views to "extract" data from HANA, very wide reads on views caused extremely high memory usage, and only a single thread could be observed running for a long time.
However, when a very wide select (think SELECT *) is executed directly against the column table, it looks like there is another process going on. There appear to be MANY small reads/threads popping up for just a few seconds and then falling off. When I look at the details in performance/threads, I can see the fetched record count is constantly increasing, and the visualized plan looks like this.
Since no specific engine is called out here on the components (JE/BW/CE) and based on the thread activity mentioned above, I have to assume there are just many small reads occurring in the SQL engine itself.
What is interesting to me is this buffer concept. I previously observed that in the other engines, the entire result is built before really pushing out for consumption, whereas here it looks like small pieces are being constantly added to this "buffer". It's really slow, but really cheap when I am monitoring the performance.
The thread details
At some later point
Lars, any info you can provide about what's going on "under the covers" here? Is there some arbitrary cursor size that just keeps adding to some buffer/providing results to the consumer and then keeps moving on, unlike the OLAP/Calc engines that attempt to build the entire set? From the threads, I can see that each "packet" is right at 1,000 rows.
The expectation was that large reads/extracts out of HANA would be very expensive to materialize the results, but this seems to contradict that assumption - it's just much slower.
Regards,
Justin
After some investigation, it looks like the smaller resultset thread behavior is a direct result of a consuming application parameter. In this case, the Sqoop tool is passing data into Hadoop, and the parameter was defaulted to 1000 rows at a time.
What is not crystal clear to me at this point is how this type of parameter is actually passed to HANA over a JDBC connection. From what I can see, there is nothing in the SQL statement itself nor in the connection string, but the intermittent thread behavior and a query on "SYS"."M_CONNECTIONS" tell me that there is a max size for the connection.
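For reference, a sketch of the kind of catalog check I ran (the column selection is an assumption; the columns of M_CONNECTIONS vary between revisions):
SELECT CONNECTION_ID, CLIENT_HOST, CONNECTION_STATUS,
       FETCHED_RECORD_COUNT, CURRENT_STATEMENT_ID
FROM "SYS"."M_CONNECTIONS"
WHERE CONNECTION_STATUS = 'RUNNING';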
Thread view
M_CONNECTIONS View
This is of interest (at least to me) since it would clearly provide a way to control the amount of resources that consuming applications/processes use on HANA.
Always something new,
Justin
Hi Justin
To be honest, I have not seen or investigated this feature yet. My first shot at it would be this:
for simple statements like yours, which don't require actual computation, aggregation, or joining but look like a mass "dump of data", materializing the result set first would not be the optimal choice.
Too much memory would be allocated just for spilling out raw data, and the typical use case for such a query (e.g. "...let's dump it into a csv file...") usually is not performance critical.
Therefore it makes sense to materialize the result set chunk-wise.
I would guess that the size of these chunks is more or less automatically estimated by the SQLDBC wire protocol (which is also used by JDBC/ODBC) and I am not aware of any option to influence this.
All in all, I'd say this is an optimisation for a rather specific use case, in order to not overload the system memory with "data dumps" - similar to direct path exports in Oracle, which bypass the buffer cache.
- Lars
This document was generated from the following discussion: The SAP HANA Engines?