Database protection
As we know, SAP HANA is an in-memory database. How does SAP HANA ensure that data stays consistent and correct when the system crashes?
To answer this question, we should know that SAP HANA stores data not only in memory but also on disk. This relates to a concept called database protection: protecting the database from all kinds of interference and destruction, keeping the data safe and reliable, and recovering quickly from a crash. Recovery technologies are therefore an important part of database protection.
A transaction is a sequence of operations that cannot be split. Take a bank transfer as an example: account A transfers 100 dollars to account B. This involves two update operations:
- A=A-100
- B=B+100
These two operations cannot be separated: either both are executed or neither is. A transaction can leave three kinds of state records in the log:
- <Start T> means transaction T has started.
- <Commit T> means transaction T has completed successfully and all its modifications are made permanent.
- <Abort T> means transaction T has been stopped and all its modifications have been undone.
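To make the all-or-nothing behaviour concrete, here is a minimal Python sketch of the bank transfer. The function `transfer`, the in-memory `accounts` dictionary, the starting balances and the tuple-based log are all assumptions made for illustration; they are not how SAP HANA implements transactions.

```python
class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


def transfer(accounts, log, txn_id, src, dst, amount):
    """Move `amount` from `src` to `dst` as one indivisible unit."""
    log.append(("Start", txn_id))
    before = dict(accounts)  # snapshot used to undo a partial update
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise InsufficientFunds(src)
        accounts[dst] += amount
    except Exception:
        # Undo every modification made so far, then record the abort.
        accounts.clear()
        accounts.update(before)
        log.append(("Abort", txn_id))
        return False
    log.append(("Commit", txn_id))
    return True


accounts = {"A": 150, "B": 0}  # assumed starting balances
log = []
transfer(accounts, log, "T1", "A", "B", 100)  # both updates are applied
transfer(accounts, log, "T2", "A", "B", 100)  # A would go negative: undone
print(accounts)
print(log)
```

The second transfer fails after the first update, so the sketch restores the snapshot and logs an abort record instead of a commit record.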
Database failures fall into three types:
- Transaction failure is an internal failure of a single transaction; it does not affect other transactions.
- Media failure is a hardware failure such as a damaged disk or a disk with no space left.
- System failure is a soft failure such as a power outage or a machine crash. This kind of failure can lose the data held in memory and affects all running transactions.
The goal of recovery from a system failure is to restore the system to its state before the failure occurred.
Validating recovery from an SAP HANA system failure
The concepts above also apply to the SAP HANA database, so we can run a test to validate how SAP HANA recovers from a system failure.
First, change the savepoint interval. At each savepoint, SAP HANA persists modified memory pages to disk. The interval is 300 s by default; we change it to 3000 s so that no savepoint takes place during the test. Then open two SQL consoles and turn off the "auto commit" property in each.
Run this SQL command in console 1:
insert into "LOGTEST"."TEST" values(1,'谢谢大家关注HANAGeek,欢迎大家一起来学习SAP HANA知识,分享SAP HANA知识。');
Run these SQL commands in console 2:
insert into "LOGTEST"."TEST" values(2,'谢谢大家关注HANAGeek,欢迎大家一起来学习SAP HANA知识,分享SAP HANA知识。');
commit;
Power off the machine running SAP HANA. Then restart SAP HANA and check the content of the table.
We can regard console 1 and console 2 as transaction T1 and transaction T2. Because T1 executed one modification but did not commit it, SAP HANA rolled back to the state before T1 began. Because T2 had committed before the outage, SAP HANA restored its change even though no savepoint had taken place.
Recovery strategies for system failure
If the failure is a media failure, we must first restore the database from a backup copy of the data. The system then uses the logs to complete the recovery.
Transaction log
The transaction log is used to manage modifications in the database system. It records the details of every modification. We do not need to persist all data when a transaction commits; persisting the transaction log is enough. When the system crashes, its last consistent state can be restored by replaying the transaction log. Hence log records must be written in chronological order.
In general there are three types of transaction log: undo log, redo log, and undo/redo log. SAP HANA uses only two of them: the undo log and the redo log.
A log file contains three kinds of records:
- <Start T> means the transaction begins.
- <Commit T> / <Abort T> means the transaction ends.
- Update details:
  - Identification of the transaction.
  - The object operated on.
  - The value before the update (undo log), the value after the update (redo log), or both values (undo/redo log).
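The record kinds listed above can be modelled as small data classes. This is only an illustrative sketch: the class and field names (`Start`, `Commit`, `Abort`, `Update`, `before`, `after`) and the balances 500 and 200 are assumptions, not SAP HANA's on-disk log format.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Start:
    txn: str  # transaction begins


@dataclass(frozen=True)
class Commit:
    txn: str  # transaction ends successfully


@dataclass(frozen=True)
class Abort:
    txn: str  # transaction ends and is undone


@dataclass(frozen=True)
class Update:
    txn: str                      # identification of the transaction
    obj: str                      # object operated on
    before: Optional[int] = None  # value before the update (undo information)
    after: Optional[int] = None   # value after the update (redo information)


LogRecord = Union[Start, Commit, Abort, Update]

# T1 from the bank-transfer example as an undo/redo log,
# assuming A=500 and B=200 before the transfer.
undo_redo_log = [
    Start("T1"),
    Update("T1", "A", before=500, after=400),
    Update("T1", "B", before=200, after=300),
    Commit("T1"),
]

# A redo log keeps only the value after the update.
redo_log = [
    Update(r.txn, r.obj, after=r.after) if isinstance(r, Update) else r
    for r in undo_redo_log
]
print(redo_log[1])
```

An undo log would be derived the same way, keeping `before` and dropping `after`.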
Redo log
An important feature of the redo log is that log records must be written to disk before the updated data is written to the database. A redo log record has the format <T, x, v>, where T identifies the transaction, x identifies the updated object, and v is the value after the update.
As shown below, transaction T1 performs the operations A=A-100 and B=B+100. The left part of the picture shows the steps of T1, the middle part shows the content of the redo log, and the right part shows the initial values of A and B.
Recovery with the redo log proceeds as follows:
- Scan the redo log from the head and find all transactions that have a <Commit T> record. Put them in a transaction list L.
- Scan the records <T, x, v>. If T belongs to L, then:
  - Write(x, v) (assign the new value v to x)
  - Output(x) (write x to the database)
- For each T that does not belong to L, write <Abort T> to the log file.
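The steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not SAP HANA's recovery code: the log is a list of tuples, the database is a dictionary, and the function name `redo_recover` and the sample values are hypothetical.

```python
def redo_recover(log, database):
    """Replay a redo log onto `database`, a dict standing in for data on disk.

    Records are tuples: ("start", T), ("update", T, x, v), ("commit", T),
    ("abort", T), where v is the value AFTER the update.
    """
    # Step 1: build the list L of transactions that have a commit record.
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Step 2: reapply, in log order, every update made by a transaction in L.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, obj, new_value = rec
            database[obj] = new_value  # Write(x, v) followed by Output(x)

    # Step 3: write an abort record for every transaction not in L.
    # Skipping transactions already marked aborted is a choice of this sketch.
    already_aborted = {rec[1] for rec in log if rec[0] == "abort"}
    unfinished = [rec[1] for rec in log
                  if rec[0] == "start"
                  and rec[1] not in committed
                  and rec[1] not in already_aborted]
    log.extend(("abort", txn) for txn in unfinished)
    return database


redo_log = [
    ("start", "T1"),
    ("update", "T1", "A", 400),
    ("update", "T1", "B", 300),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "A", 0),  # T2 never committed, so this is ignored
]
disk = {"A": 500, "B": 200}    # assumed state on disk at restart
redo_recover(redo_log, disk)
print(disk)
print(redo_log[-1])
```

Replaying in log order matters: if two committed transactions update the same object, the later value must win.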
We do not need to be concerned about transactions without <Commit T>, because under redo logging they have definitely not written data to the database. We need to redo transactions that have <Commit T>, because their changes may not yet have been written to the database.
The redo log is written synchronously with transaction processing. When SAP HANA restarts after a crash, it processes the redo log to recover the system. To make log processing more efficient, SAP HANA performs savepoints (checkpoints). At a savepoint, the system persists the data that has not been persisted since the last savepoint. Hence only the redo log written since the last savepoint needs to be processed, and the redo log from before the last savepoint can be removed.
Undo log
SAP HANA persists not only the updated data of committed transactions but may also persist data of transactions that have not yet committed. So we need an undo log that is persisted on disk. An undo log record has the format <T, x, v>, where v is the value before the update.
As shown below, transaction T1 performs the operations A=A-100 and B=B+100. The left part of the picture shows the steps of T1 and the middle part shows the content of the undo log.
The recovery process is:
- Scan the undo log from the head and find all transactions that have neither a <Commit T> nor an <Abort T> record. Put them in a transaction list L.
- Scan the records <T, x, v> from the end of the log backward, so that if a transaction updated the same object more than once, the oldest value is restored last. If T belongs to L, then:
  - Write(x, v) (assign the old value v to x)
  - Output(x) (write x to the database)
- For each T that belongs to L, write <Abort T> to the log file.
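Undo recovery can be sketched in the same simplified model. The tuple-based log, the dictionary standing in for the database, the function name `undo_recover` and the sample values are assumptions for illustration. The sketch applies the old values while scanning the log backward, which is the usual textbook choice, so that the earliest value of an object is the one that remains.

```python
def undo_recover(log, database):
    """Roll back unfinished transactions using an undo log.

    Records are tuples: ("start", T), ("update", T, x, v), ("commit", T),
    ("abort", T), where v is the value BEFORE the update.
    """
    # Step 1: build the list L of transactions with no commit or abort record.
    ended = {rec[1] for rec in log if rec[0] in ("commit", "abort")}
    incomplete = [rec[1] for rec in log
                  if rec[0] == "start" and rec[1] not in ended]

    # Step 2: walk the log backward and restore the old values written by
    # transactions in L.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in incomplete:
            _, _, obj, old_value = rec
            database[obj] = old_value  # Write(x, v) followed by Output(x)

    # Step 3: record that the transactions in L have been aborted.
    log.extend(("abort", txn) for txn in incomplete)
    return database


undo_log = [
    ("start", "T1"),
    ("update", "T1", "A", 500),  # A was 500 before T1's first update
    ("start", "T2"),
    ("update", "T2", "C", 7),
    ("commit", "T2"),
    ("update", "T1", "A", 450),  # A was 450 before T1's second update
]
disk = {"A": 400, "C": 9}        # assumed state on disk at the crash
undo_recover(undo_log, disk)
print(disk)
print(undo_log[-1])
```

T1 updated A twice without committing. A forward scan would leave A at 450; the backward scan ends with the original 500, while the committed change to C is left alone.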
In SAP HANA, the undo log is persisted only at savepoints, unlike the redo log. Moreover, the undo log is written to the data area rather than to the log area. The reason is that, on restart after a crash, the system can be restored to the state of the last savepoint. If a transaction committed after the last savepoint, the system can restore it using the redo log. If it did not commit, its changes after the last savepoint are not replayed, so no undo log from after the last savepoint is needed to restore the system. Undo log written after the last savepoint is therefore useless. The advantages of this mechanism are:
- Fewer log records need to be persisted during transaction processing.
- Disk usage grows more slowly.
- The database can be restored to a consistent state from the data area.
Savepoint
Without savepoints, when the database crashes we would need to scan the entire undo and redo logs to restore it. This approach has problems:
- Scanning the log takes a long time.
- The redo list becomes very long, so the restore takes a long time.
So SAP HANA performs savepoints regularly. During a savepoint it does the following:
- Do not accept new transactions.
- Write undo records to the data area.
- Write modified memory pages to disk.
- Write the savepoint identifier into the redo log.
The savepoint process is shown below.
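The four savepoint steps can be illustrated with a toy in-memory store. Everything in this sketch (the class `MiniStore`, its attribute names and the sample values) is an assumption made for illustration; it models only the order of the steps and is not SAP HANA's persistence layer.

```python
class MiniStore:
    """Toy store with a data area, a log area and in-memory pages."""

    def __init__(self):
        self.memory = {}        # current in-memory values
        self.data_area = {}     # values persisted at the last savepoint
        self.undo_area = []     # undo records persisted in the data area
        self.redo_log = []      # log area, written during processing
        self.dirty = set()      # keys modified since the last savepoint
        self.pending_undo = []  # undo records not yet persisted
        self.accepting = True

    def write(self, txn, key, value):
        if not self.accepting:
            raise RuntimeError("savepoint in progress")
        self.redo_log.append(("update", txn, key, value))            # new value
        self.pending_undo.append((txn, key, self.memory.get(key)))   # old value
        self.memory[key] = value
        self.dirty.add(key)

    def savepoint(self):
        self.accepting = False                       # 1. block new work
        try:
            self.undo_area.extend(self.pending_undo)  # 2. persist undo records
            self.pending_undo.clear()
            for key in sorted(self.dirty):            # 3. persist modified pages
                self.data_area[key] = self.memory[key]
            self.dirty.clear()
            self.redo_log.append(("savepoint",))      # 4. mark the redo log
        finally:
            self.accepting = True

    def redo_records_to_replay(self):
        """Return only the redo records written after the last savepoint."""
        last = max((i for i, rec in enumerate(self.redo_log)
                    if rec[0] == "savepoint"), default=-1)
        return self.redo_log[last + 1:]


store = MiniStore()
store.write("T1", "A", 400)
store.savepoint()
store.write("T1", "B", 300)
print(store.data_area)
print(store.redo_records_to_replay())
```

After the savepoint, only the update to B remains to be replayed, which is why regular savepoints keep restart time short.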