Channel: SCN : Document List - SAP HANA and In-Memory Computing

Partitions in HANA for Performance Tuning


Using the partitioning feature of the SAP HANA database, you can partition tables horizontally into disjoint sub-tables, also known as “partitions.” Partitioning supports the creation of very large tables by decomposing them into smaller and more manageable pieces. Partitioning is transparent for most SQL queries and Data Manipulation Language (DML) statements. This means that you need not modify these statements to support partitioning.

The following are benefits of partitioning:

  • Load balancing: Using partitioning, the individual partitions can be distributed over the landscape. This means that a query on a table is not processed by a single server but by all servers that host partitions of that table.
  • Parallelization: Operations are parallelized by using several execution threads per table.
  • Partition pruning: Queries are analyzed to see if they match the given partition specification of a table. If a match is found, the system can determine the actual partitions that hold the data in question and skip the rest. This reduces the overall load on the system, which typically improves response time.
  • Explicit partition handling: Applications can actively control partitions, for example by adding partitions that will hold the data for an upcoming month.
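
The pruning step described in the list above can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not HANA's implementation: the monthly partition layout, the date encoding as integers and the `prune` helper are all made up for this example.

```python
# Illustrative sketch of partition pruning (assumed layout, not HANA internals).
# Each partition covers a half-open range [low, high) of the partitioning column.
PARTITIONS = {
    1: (20240101, 20240201),  # January
    2: (20240201, 20240301),  # February
    3: (20240301, 20240401),  # March
}

def prune(partitions, query_low, query_high):
    """Return ids of partitions whose range overlaps [query_low, query_high)."""
    return [
        pid
        for pid, (low, high) in sorted(partitions.items())
        if low < query_high and query_low < high
    ]

# A query restricted to 10-20 February only has to touch partition 2.
print(prune(PARTITIONS, 20240210, 20240221))
```

A query whose filter does not involve the partitioning column gives the system nothing to prune on, so every partition has to be considered.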

 

Note: A non-partitioned table cannot store more than 2 billion rows. By using partitioning, you can overcome this limit by distributing the rows to several partitions.

The delta merge performance of the database is dependent on the size of the main index. If data is only being modified on some partitions, there will be fewer partitions that need to be delta-merged, and therefore performance will increase.

Note: Partitioning is typically used in distributed landscapes, but it may also be beneficial for single-host systems. Partitioning is available for column store tables only.

 

Single-Level Partitioning

You can distribute rows to partitions using different types of partitioning known as partition specifications. The HANA database offers hash, range and round-robin as single-level partition specifications.

 
 

 

1. Hash

Hash partitioning distributes rows evenly to partitions, both for load balancing and for overcoming the 2 billion row limit. Usually, implementation does not require in-depth knowledge of the actual content of a table. Each hash partition specification requires columns to be specified as partitioning columns. The values of these columns are used to determine the hash value. If the table has a primary key, the partitioning columns must be part of that key. This restriction has the advantage that the uniqueness check of the key can be performed on the local server. You can use as many partitioning columns as required to achieve a good variety of values for an equal distribution.

Hash Syntax

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>, PRIMARY KEY (<column_1>, <column_2>))
PARTITION BY HASH (<column_1>, <column_2>) PARTITIONS 4

 

  • Creates 4 partitions on columns <column_1> and <column_2>
  • At least one column has to be specified
  • If the table has a primary key, all columns specified must be part of it
  • PARTITION BY HASH (<column_1>, <column_2>) PARTITIONS GET_NUM_SERVERS() - The number of partitions is determined by the engine at runtime according to its configuration. It is recommended that you use this function in scripts, etc.

Example:

 

CREATE COLUMN TABLE partition_HASH(
    SALES_ORDER NVARCHAR(2) PRIMARY KEY,
    CUSTOMER NVARCHAR(4),
    MATERIAL NVARCHAR(4),
    CAL_DAY NVARCHAR(8),
    QUANTITY DOUBLE,
    PRICE DOUBLE,
    TAX DOUBLE,
    REVENUE DOUBLE)
PARTITION BY HASH(SALES_ORDER) PARTITIONS 4
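
To make the idea of hash distribution concrete, here is a minimal Python sketch. It is an assumption-laden illustration: `zlib.crc32` stands in for whatever hash function the database uses internally, and the `hash_partition` helper and the 10,000 numeric order keys are hypothetical.

```python
import zlib
from collections import Counter

def hash_partition(key_values, num_partitions):
    """Map the partitioning-column values of one row to a partition number (1-based)."""
    # crc32 is only a stand-in; the hash function used by HANA is not shown in this article.
    raw = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(raw) % num_partitions + 1

# Distribute 10,000 hypothetical sales orders over 4 partitions and count rows per partition.
counts = Counter(hash_partition((order,), 4) for order in range(10_000))
print(sorted(counts.items()))
```

The same key always lands in the same partition, which is why a uniqueness check on the key only needs to look at one partition.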

 

   

 

2. Round-robin (Default)

Round-robin is similar to hash partitioning because it is used for an equal distribution of rows to parts. When using this method, you do not need to specify partitioning columns.

Hash partitioning is usually more beneficial than round-robin partitioning for the following reasons:

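  • With hash partitioning, the partitioning columns can be evaluated in a pruning step, so only the matching partitions are searched. With round-robin, no such evaluation is possible; therefore, all partitions must be considered in searches and other database operations.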
  • Depending on the scenario, it is possible that the data within semantically related tables resides on the same server. Some internal operations may then operate locally instead of retrieving data from a remote system.
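
The difference between the two schemes when looking up a single key can be sketched as follows. This is an illustrative Python model, not database code: `crc32` is a stand-in hash, and the helper names and the 20 sample rows are assumptions.

```python
import zlib

NUM_PARTITIONS = 4

def hash_slot(key, n):
    """Partition index for a key (crc32 is a stand-in hash function)."""
    return zlib.crc32(key.encode("utf-8")) % n

def round_robin_insert(partitions, rows):
    """Assign rows to partitions in rotation; no column values are inspected."""
    for i, row in enumerate(rows):
        partitions[i % len(partitions)].append(row)

def hash_insert(partitions, rows):
    """Assign rows by the hash of the partitioning column."""
    for row in rows:
        partitions[hash_slot(row["SALES_ORDER"], len(partitions))].append(row)

def partitions_to_scan(scheme, key, n):
    """Hash can prune to one partition; round-robin must scan all of them."""
    return [hash_slot(key, n)] if scheme == "hash" else list(range(n))

rows = [{"SALES_ORDER": f"{i:02d}"} for i in range(20)]
rr_parts = [[] for _ in range(NUM_PARTITIONS)]
hash_parts = [[] for _ in range(NUM_PARTITIONS)]
round_robin_insert(rr_parts, rows)
hash_insert(hash_parts, rows)

print(partitions_to_scan("hash", "07", NUM_PARTITIONS))
print(partitions_to_scan("roundrobin", "07", NUM_PARTITIONS))
```

Round-robin gives a perfectly even row count, but the location of a given key cannot be derived from its value.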

Round-robin Syntax

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>)
PARTITION BY ROUNDROBIN PARTITIONS 4

 

Note: The table must not have a primary key.

Example:

 

CREATE COLUMN TABLE partition_rr(
    SALES_ORDER NVARCHAR(2),
    CUSTOMER NVARCHAR(4),
    MATERIAL NVARCHAR(4),
    CAL_DAY NVARCHAR(8),
    QUANTITY DOUBLE,
    PRICE DOUBLE,
    TAX DOUBLE,
    REVENUE DOUBLE)
PARTITION BY ROUNDROBIN PARTITIONS 4

 
 

 

3. Range

Range partitioning creates dedicated partitions for certain values or certain value ranges. Usually, this requires in-depth knowledge of the values that are used or are valid for the selected partitioning column. For example, you can choose a range partitioning scheme to create one partition per month of the year. Note: Range partitioning is not optimal for load distribution.

The range partition specification usually takes ranges of values to determine one partition (e.g., 1 to 10). It is also possible to define a partition for a single value. In this way, list partitioning, as known from other database systems, can be emulated and also mixed with range partitioning.

When inserting or modifying rows, the target partition is determined by the defined ranges. If a value does not fit into one of these ranges, an error is raised. If this is not intended, you can define a “rest partition” where all rows that do not match with any of the defined ranges will be inserted. You can create or drop rest partitions on-the-fly.

Range partitioning is similar to hash partitioning in that the partitioning column must be part of the primary key. Range partitioning also has restrictions on the data types that can be used. Only strings, integers and dates are allowed.

Range Syntax

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>, PRIMARY KEY (<column_1>, <column_2>))
PARTITION BY RANGE (<column_1>)
       (PARTITION 1 <= VALUES < 5,
        PARTITION 5 <= VALUES < 20,
        PARTITION VALUE = 50,
        PARTITION OTHERS)

 

  • Create partitions for ranges using <= VALUES < semantics
  • Create partitions for single values using VALUE = semantics
  • Create a rest partition for all values that do not match the other ranges using PARTITION OTHERS

Example:

CREATE COLUMN TABLE partition_range(
    SALES_ORDER NVARCHAR(2) PRIMARY KEY,
    CUSTOMER NVARCHAR(4),
    MATERIAL NVARCHAR(4),
    CAL_DAY NVARCHAR(8),
    QUANTITY DOUBLE,
    PRICE DOUBLE,
    TAX DOUBLE,
    REVENUE DOUBLE)
PARTITION BY RANGE (SALES_ORDER)
(PARTITION '1' <= VALUES < '5',
PARTITION OTHERS)
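
How a row finds its target partition under a range specification, including the rest partition and the error case, can be sketched in Python. This is a simplified model under assumptions: the `target_partition` helper and its label strings are hypothetical, and integer values are used for brevity.

```python
def target_partition(value, ranges, single_values=(), has_rest=False):
    """Return a label for the partition that receives `value`.

    ranges: list of (low, high) pairs meaning low <= value < high
    single_values: values that each get a dedicated partition
    has_rest: whether a rest partition (PARTITION OTHERS) exists
    """
    for low, high in ranges:
        if low <= value < high:
            return f"{low} <= VALUES < {high}"
    if value in single_values:
        return f"VALUE = {value}"
    if has_rest:
        return "OTHERS"
    # Without a rest partition, a non-matching value is rejected.
    raise ValueError(f"no partition accepts value {value!r}")

ranges = [(1, 5), (5, 20)]
print(target_partition(3, ranges, single_values=(50,), has_rest=True))
print(target_partition(50, ranges, single_values=(50,), has_rest=True))
print(target_partition(99, ranges, single_values=(50,), has_rest=True))
```

Calling the helper with `has_rest=False` and the value 99 raises an error, mirroring the behavior described for tables without a rest partition.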

 
 

Multi-Level Partitioning

For some tables, it is beneficial to partition by a column that is not part of the primary key. For example, if a date column is present, it is desirable to leverage it in order to build partitions per month or year.

Hash and range partitioning have the restriction of only being able to use key columns as partitioning columns. You can overcome this restriction by using multi-level partitioning.

Multi-level partitioning is the technical implementation of time-based partitioning, in which a date column is leveraged:

The performance of the delta merge depends on the size of the main index of a table. If data is inserted into a table over time and the table also contains temporal information in its structure (e.g., a date), the table may be an ideal candidate for multi-level partitioning. If the partitions contain old, infrequently modified data, there is no need for a delta merge on these partitions; the delta merge is only required on new partitions, where new data is inserted. Therefore, its run-time is constant over time as new partitions are created and used.

Note: When using SQL commands to move partitions, it is not possible to move individual parts of partition groups. You can move only partition groups as a whole.

Hash-Range - Hash-range multi-level partitioning is most typically used. It is implemented with hash on the first level for load balancing and range on the second level to determine the time criterion.

Example

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>, PRIMARY KEY (<column_1>, <column_2>))
PARTITION BY HASH (<column_1>, <column_2>) PARTITIONS 4, RANGE (<column_3>)
(PARTITION 1 <= VALUES < 5,
PARTITION 5 <= VALUES < 20)
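
The two levels of the hash-range statement above can be modeled with a short Python sketch. It rests on assumptions: `crc32` is a stand-in hash, the `locate` helper is hypothetical, and the part numbering is made up for illustration.

```python
import zlib

HASH_PARTITIONS = 4
RANGES = [(1, 5), (5, 20)]  # second-level ranges on column_3

def locate(column_1, column_2, column_3):
    """Return (first-level hash part, second-level range index) for a row."""
    key = f"{column_1}|{column_2}".encode("utf-8")
    hash_part = zlib.crc32(key) % HASH_PARTITIONS  # crc32 is a stand-in hash
    for idx, (low, high) in enumerate(RANGES):
        if low <= column_3 < high:
            return hash_part, idx
    raise ValueError(f"column_3={column_3} matches no range")

# Every combination of hash part and range forms its own part.
total_parts = HASH_PARTITIONS * len(RANGES)
print(total_parts, locate("A", "B", 7))
```

Rows with the same key columns share the first-level part, while the non-key column_3 decides which range they fall into.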

 

Round-robin Range - This is similar to Hash Range but with Round-robin on the first level.

Example

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>)
PARTITION BY ROUNDROBIN PARTITIONS 4, RANGE (<column_3>)
(PARTITION 1 <= VALUES < 5,
PARTITION 5 <= VALUES < 20)

 

Hash-Hash - This is two-level partitioning with Hash on both levels. The advantage is that the Hash on the second level may be defined on a non-key column.

Example

CREATE COLUMN TABLE <table name>
(<column_1> <DATA_TYPE>, <column_2> <DATA_TYPE>, <column_3> <DATA_TYPE>, PRIMARY KEY (<column_1>, <column_2>))
PARTITION BY HASH (<column_1>, <column_2>) PARTITIONS 4,
HASH (<column_3>) PARTITIONS 5

 
 

 

Explicit Partition Handling

For all partition specifications involving Range, it is possible to add additional ranges or to remove them at will. This causes partitions to be created or dropped as required by the ranges in use. In the case of a multi-level partitioning, the desired operation will be applied to all nodes.

If a partition is created and a rest partition exists, the rows of the rest partition that match the newly added range are moved into the new partition. If the rest partition is large, this operation may take a long time, because a split operation is executed internally. If no rest partition exists, the operation is fast, as only a new partition is added to the catalog.

Syntax:

ALTER TABLE <mytab> ADD PARTITION 100 <= VALUES < 200

ALTER TABLE <mytab> DROP PARTITION 100 <= VALUES < 200
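
The effect of adding a range while a rest partition exists can be sketched in Python. This is a simplified model, not the database's split operation: the dictionary layout, the label strings and the `add_range_partition` helper are assumptions made for illustration.

```python
def add_range_partition(partitions, low, high):
    """Add a partition for low <= value < high and move matching rest rows into it.

    `partitions` maps a label to a list of values; "OTHERS" is the rest partition.
    Returns the number of rows moved out of the rest partition.
    """
    label = f"{low} <= VALUES < {high}"
    rest = partitions.get("OTHERS")
    if rest is None:
        # No rest partition: nothing to move, only the new empty partition is registered.
        partitions[label] = []
        return 0
    moved = [v for v in rest if low <= v < high]
    partitions["OTHERS"] = [v for v in rest if not (low <= v < high)]
    partitions[label] = moved
    return len(moved)

parts = {"1 <= VALUES < 100": [5, 42], "OTHERS": [150, 250, 199]}
moved_count = add_range_partition(parts, 100, 200)
print(moved_count, parts)
```

The cost of the first branch is constant, while the second grows with the size of the rest partition, which matches the timing difference described above.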

 

It is also possible to create or drop a rest partition using the following syntax:

ALTER TABLE <mytab> ADD PARTITION OTHERS

ALTER TABLE <mytab> DROP PARTITION OTHERS

 

 

Moving Partitions

Partitions and partition groups can be moved to other servers. As mentioned before, when moving partition groups it is only possible to move an entire group. However, in the case of single-level partitioning, each partition forms its own group. To see how partitions and groups relate to each other, refer to the monitoring view M_CS_PARTITIONS. To see the current location of a partition, refer to M_TABLE_LOCATIONS.

Syntax:

ALTER TABLE <mytab> MOVE PARTITION 1 TO '<host:port>'

Here, port is the port of the target index server, not the SQL port.

 

Split/Merge Operations

You can determine how to partition a table either upon creation or at a later date. The split/merge operations can be used to transform a non-partitioned table into a partitioned table, and vice versa.


Change table partitioning

  • Change the partitioning specification e.g. from Hash to Round-robin
  • Change the partitioning column
  • Split partitions into more partitions
  • Merge partitions into fewer partitions

 

The split/merge operation can be costly for the following reasons:

  • Long run time (i.e., it may take up to several hours for large tables)
  • Relatively high memory consumption
  • Exclusive lock requirement (only selects are allowed)
  • Delta merge performed in advance
  • Everything written into the log, which is required for backup and recovery

It is recommended that you split tables before inserting mass data or while they are still small. If a table is not partitioned and reaches configurable absolute thresholds, or a table grows a certain percentage per day, an alert is raised by the statistics server to inform the administrator.

There are three types of re-partitioning:

  1. From n to m partitions where m is not a multiple/divider of n, for example from HASH 3 X to HASH 2 X.
  2. From n to n partitions using a different partition specification or different partitioning columns, for example HASH 3 X to HASH 3 Y.
  3. From n to m partitions where m is a multiple/divider of n, for example HASH 3 X to HASH 6 X.

In the first two cases, all source parts must be located on the same host. Up to one thread per column is used to split/merge the table.

For the third case, it is not required to move all parts to a common server. Instead, the split/merge request is broadcast to each host where the partitions reside. Up to one thread per column and source part is used. This type of split/merge operation is typically faster, so it is recommended to choose a multiple or divider of the number of source parts as the number of target parts. This type of re-partitioning is called “parallel split/merge”.
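
The three cases can be told apart with simple arithmetic, as the following Python sketch shows. The `is_parallel_split_merge` helper is hypothetical, and treating any change of specification or columns as non-parallel is an assumption, since the text only describes that case for equal part counts.

```python
def is_parallel_split_merge(n, m, spec_changed=False):
    """True if re-partitioning from n to m parts qualifies as a parallel split/merge.

    spec_changed: True if the partition specification or partitioning columns differ.
    Assumption: a changed specification is never treated as parallel here.
    """
    if spec_changed or n == m:
        # Case 2 (different specification), or nothing to split or merge.
        return False
    # Case 3 if m is a multiple or divider of n, otherwise case 1.
    return m % n == 0 or n % m == 0

print(is_parallel_split_merge(3, 6))                     # HASH 3 X to HASH 6 X
print(is_parallel_split_merge(3, 2))                     # HASH 3 X to HASH 2 X
print(is_parallel_split_merge(3, 3, spec_changed=True))  # HASH 3 X to HASH 3 Y
```

When the check returns False, the source parts have to be located on one host before the operation, as stated for the first two cases.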


Syntax:

ALTER TABLE mytab PARTITION BY...

This can be applied to non-partitioned tables and to partitioned tables.

Depending on the type of the split/merge operation (see above) it may be necessary to move partitions beforehand.

ALTER TABLE mytab MERGE PARTITIONS

Merge all parts of a partitioned table into a non-partitioned table.

All source partitions must reside on the same server.


 

 

Parallelism and Memory Consumption

Split/merge operations consume a high amount of memory. To reduce the memory consumption, configure the number of threads used.

The parameter split_threads in section [partitioning] of indexserver.ini can be set to change the default for split/merge operations. If it is not set, 16 threads are used. In the case of a parallel split/merge, the individual operations together use the configured number of threads per host. Each operation uses at least one thread.

If a table has a history index, it is possible to split the main and history index in parallel. Use parameter split_history_parallel in section [partitioning] in indexserver.ini. The default is "no".


Delta Merge

The delta merge may operate in parallel on all available partitions. Two parameters in section [indexing] of indexserver.ini control how the threading is handled, giving three modes:

  • One thread per server (default): set parallel_merge_location to "yes".
  • Configurable number of threads: set parallel_merge_location to "no" and parallel_merge_part_threads to the number of threads you wish to use. The default is 5.
  • One thread per part: set parallel_merge_location to "no" and parallel_merge_part_threads to "0".

Note: One thread per part may have a negative effect on the overall performance of the system if a table has a large number of partitions.
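
How the two parameter values combine into a threading mode can be summarized in a small Python sketch. The `merge_threading_mode` helper is hypothetical and only restates the rules listed above; it does not read or write any configuration file.

```python
def merge_threading_mode(parallel_merge_location="yes", parallel_merge_part_threads=5):
    """Describe the delta merge threading mode implied by the two parameter values."""
    if parallel_merge_location == "yes":
        # The thread count parameter is not relevant in this mode.
        return "one thread per server"
    if int(parallel_merge_part_threads) == 0:
        return "one thread per part"
    return f"{int(parallel_merge_part_threads)} threads"

print(merge_threading_mode())
print(merge_threading_mode("no", 8))
print(merge_threading_mode("no", "0"))
```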

A delta merge thread is not necessarily shown during merges on the master server, as no dedicated thread per partition is started.

