Manual (cold) recovery

Manual (cold) recovery relies on the information that the Resource Managers return when they are queried with xa_recover(). This type of recovery is intended for unusual situations only.

If, for any reason, the LIXA state server (lixad) has forgotten the state of a transaction [54], or the transaction is in the recovery failed state because a previous automatic (warm) recovery failed, you have to recover the transaction manually. The procedure to recover a forgotten transaction is the same as the one used for a recovery failed transaction, but additional server-side clean-up is suggested for recovery failed transactions.

Recovering forgotten transactions

This example requires the same environment set-up described in the section called “Forcing automatic recovery”; run the example program only after you have set the LIXA_CRASH_POINT environment variable:

[Shell terminal session]
tiian@ubuntu:~/tmp$ export LIXA_CRASH_POINT=15
tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT
15
tiian@ubuntu:~/tmp$ ./example6_pql_ora delete
Deleting a row from the tables...
Oracle DELETE statement executed!
Aborted
	  

Then check that there is a recovery pending transaction inside both the PostgreSQL and Oracle Resource Managers:

[Oracle terminal session]
SQL> select * from dba_pending_transactions;

  FORMATID
----------
GLOBALID
--------------------------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------
1279875137
957747F7F37B439EBCEA4146076AD322
9BAC7BE1C129EA6EE31F2D71B318120C
	  

[PostgreSQL terminal session]
testdb=> select * from pg_prepared_xacts;
 transaction |                                    gid                                       |           prepared            | owner | database 
-------------+------------------------------------------------------------------------------+-------------------------------+-------+----------
         877 | 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c | 2011-12-14 22:55:14.973443+01 | tiian | testdb
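
The two listings describe the same transaction: Oracle reports the XID as three separate columns, while the PostgreSQL gid column holds a single serialized string. The following Python sketch shows the mapping between the two forms. It assumes the serialized form is always formatID.gtrid.bqual with lowercase hexadecimal digits, as in the listings above; the helper names are hypothetical and are not part of LIXA. It also shows that the format ID 1279875137 (0x4C495841) is the ASCII string "LIXA".

```python
def oracle_to_lixa_xid(format_id: int, global_id: str, branch_id: str) -> str:
    """Join the three dba_pending_transactions columns into one XID string."""
    return f"{format_id}.{global_id.lower()}.{branch_id.lower()}"


def format_id_as_text(format_id: int) -> str:
    """Decode a 32-bit format ID as four big-endian ASCII characters."""
    return format_id.to_bytes(4, "big").decode("ascii")


# Values copied from the Oracle listing above
xid = oracle_to_lixa_xid(
    1279875137,
    "957747F7F37B439EBCEA4146076AD322",
    "9BAC7BE1C129EA6EE31F2D71B318120C",
)
print(xid)
print(format_id_as_text(1279875137))
```

The resulting string can be compared with the PostgreSQL gid and passed to the recovery utility when a transaction has to be selected explicitly.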
	  

Then you should stop and cold start the lixad state server:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa     24437     1  0 21:19 ?        00:00:00 /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ pkill lixad
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa@ubuntu:~$ ls /opt/lixa/var/
lixad_status1_1  lixad_status2_1  lixad_status3_1  README
lixad_status1_2  lixad_status2_2  lixad_status3_2
lixa@ubuntu:~$ rm /opt/lixa/var/lixad_status*
lixa@ubuntu:~$ ls /opt/lixa/var/
README
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa     28594     1  0 23:00 ?        00:00:00 /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ ls /opt/lixa/var/
lixad_status1_1  lixad_status2_1  lixad_status3_1  README
lixad_status1_2  lixad_status2_2  lixad_status3_2  run.pid
lixa@ubuntu:~$ exit
logout
	  

These are the operations you just performed:

  • switched from your own user to the lixa user

  • checked that the lixad daemon was running

  • stopped the lixad daemon

  • checked that the lixad daemon was no longer running

  • listed the lixad state files

  • removed the lixad state files

  • started the lixad daemon

  • listed the new lixad state files
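
The file removal step of the list above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the lixad_status* pattern is taken from the directory listing above, and the demonstration runs on a scratch directory rather than on /opt/lixa/var. The sketch defaults to a dry run because the state files must be removed only after lixad has been stopped.

```python
import glob
import os
import tempfile


def remove_state_files(var_dir: str, dry_run: bool = True) -> list:
    """Return the lixad_status* files in var_dir; delete them unless dry_run."""
    paths = sorted(glob.glob(os.path.join(var_dir, "lixad_status*")))
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths


# Demonstration on a scratch directory instead of /opt/lixa/var
with tempfile.TemporaryDirectory() as scratch:
    for name in ("lixad_status1_1", "lixad_status1_2", "README"):
        open(os.path.join(scratch, name), "w").close()
    # First pass: dry run, only lists what would be removed
    listed = [os.path.basename(p) for p in remove_state_files(scratch)]
    # Second pass: really remove the state files
    remove_state_files(scratch, dry_run=False)
    remaining = sorted(os.listdir(scratch))

print(listed)
print(remaining)
```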

Running lixat does not automatically recover the transaction:

[Shell terminal session]
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00040000
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixat 
tx_open(): 0
tx_close(): 0
	  

Note

The program does not produce any trace output because the client recovery module is not called: the state server does not pass any information about recovery pending transactions to the LIXA client library.

[Oracle terminal session]
SQL> select * from dba_pending_transactions;

  FORMATID
----------
GLOBALID
--------------------------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------
1279875137
957747F7F37B439EBCEA4146076AD322
9BAC7BE1C129EA6EE31F2D71B318120C
	  

[PostgreSQL terminal session]
testdb=> select * from pg_prepared_xacts;
 transaction |                                    gid                                       |           prepared            | owner | database 
-------------+------------------------------------------------------------------------------+-------------------------------+-------+----------
         877 | 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c | 2011-12-14 22:55:14.973443+01 | tiian | testdb
	  

The lixar utility program, run with the -p option, lists the prepared transactions that cannot be recovered automatically:

[Shell terminal session]
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -p
Execution options:
	- print report = yes
	- transaction(s) will be committed = no
	- transaction(s) will be rolled back = no
	- bypass xid branch qualifier check = no
	- bypass xid format id check = no
	- use TMENDRSCAN flag for last xa_recover call = no

Recovery environment:
LIXA_CONFIG_FILE_ENV_VAR = '(null)'
LIXA_PROFILE_ENV_VAR = 'PQL_STA_ORA_DYN'
LIXA_JOB_ENV_VAR = '(null)'

Resource manager list:
rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]'
rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA'

Prepared and in-doubt transaction list:
xid='1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1 
	  

The report contains some useful information and ends with the list of in-doubt transactions. Here the transaction is prepared/in-doubt for both Resource Managers; in other cases it may be in doubt for only a subset of the Resource Managers defined by LIXA_PROFILE. You can manually recover the transaction by passing its XID with -x together with -r (rollback) or -c (commit):

[Shell terminal session]
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -p -x 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c -r
Execution options:
	- print report = yes
	- transaction to commit/rollback = 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c
	- transaction(s) will be committed = no
	- transaction(s) will be rolled back = yes
	- bypass xid branch qualifier check = no
	- bypass xid format id check = no
	- use TMENDRSCAN flag for last xa_recover call = no

Recovery environment:
LIXA_CONFIG_FILE_ENV_VAR = '(null)'
LIXA_PROFILE_ENV_VAR = 'PQL_STA_ORA_DYN'
LIXA_JOB_ENV_VAR = '(null)'

Resource manager list:
rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]'
rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA'

Prepared and in-doubt transaction list:
xid='1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1 

Analizing transaction '1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c':
xa_rollback --> rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]', rc=0
xa_rollback --> rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA', rc=0
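
When many transactions are listed, the "Prepared and in-doubt transaction list" section of the report can be extracted with a small script. The following Python sketch assumes that every entry has exactly the layout shown above (xid='...': followed by one or more rmid=N tokens); the function name is hypothetical and the sample text is copied from the report.

```python
import re

# Sample taken from the "Prepared and in-doubt transaction list" shown above
REPORT = """\
Prepared and in-doubt transaction list:
xid='1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1
"""

# One entry: quoted XID, a colon, then one or more "rmid=N" tokens
XID_LINE = re.compile(r"^xid='([^']+)':((?:\s+rmid=\d+)+)\s*$")


def parse_in_doubt(report: str) -> dict:
    """Map every listed XID to the rmid values that reported it."""
    result = {}
    for line in report.splitlines():
        match = XID_LINE.match(line)
        if match:
            rmids = re.findall(r"rmid=(\d+)", match.group(2))
            result[match.group(1)] = [int(n) for n in rmids]
    return result


in_doubt = parse_in_doubt(REPORT)
print(in_doubt)
```

A transaction that appears with fewer rmid values than the profile defines is in doubt only for a subset of the Resource Managers.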
	  

You can now verify there are no prepared/in-doubt transactions inside the Resource Managers:

[Oracle terminal session]
SQL> select * from dba_pending_transactions;

no rows selected
	  

[PostgreSQL terminal session]
testdb=> select * from pg_prepared_xacts;
 transaction | gid | prepared | owner | database 
-------------+-----+----------+-------+----------
(0 rows)
	  

We rolled back the deletion, so the rows must still be in place:

[Oracle terminal session]
SQL> select * from COUNTRIES where COUNTRY_ID = 'RS';

COUNTR
------
COUNTRY_NAME
--------------------------------------------------------------------------------
 REGION_ID
----------
RS
Repubblica San Marino
	 1
	  

[PostgreSQL terminal session]
testdb=> select * from AUTHORS;
 id | last_name | first_name 
----+-----------+------------
  1 | Foo       | Bar
	  

You can get some help from lixar with the -? option:

[Shell terminal session]
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -?
Usage:
  lixar [OPTION...] - LIXA recovery utility

Help Options:
  -?, --help                      Show help options

Application Options:
  -p, --print                     Print a report of all the prepared and in-doubt transactions compatible with current configuration and profile
  -x, --xid                       Select specified transaction for rollback/commit
  -X, --xid-file                  Select specified file as a list of transaction to rollback/commit
  -c, --commit                    Commit prepared & in-doubt transactions
  -r, --rollback                  Rollback prepared & in-doubt transactions
  -v, --version                   Print package info and exit
  -b, --bypass-bqual-check        Bypass xid branch qualifier check
  -B, --bypass-formatid-check     Bypass xid format id check
  -e, --use-tmendrscan-flag       Use TMENDRSCAN flag for last xa_recover call
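
If lixar is driven from a script, it can help to build the command line in one place and to reject anything that is not an explicit commit or rollback. The following Python sketch uses only the -p, -x, -c and -r options listed above; the function name and the basic XID check are assumptions of this example, and the command is only printed, not executed.

```python
LIXAR = "/opt/lixa/bin/lixar"  # installation path used in this manual


def lixar_command(xid: str, action: str, report: bool = True) -> list:
    """Build the lixar argument list for a single transaction.

    action must be "commit" or "rollback"; anything else is rejected so
    that a typo cannot silently select the wrong outcome.
    """
    flags = {"commit": "-c", "rollback": "-r"}
    if action not in flags:
        raise ValueError(f"unknown action: {action!r}")
    if len(xid.split(".")) != 3:
        raise ValueError(f"unexpected XID format: {xid!r}")
    argv = [LIXAR]
    if report:
        argv.append("-p")
    argv += ["-x", xid, flags[action]]
    return argv


cmd = lixar_command(
    "1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c",
    "rollback",
)
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, check=True), with LIXA_PROFILE set
```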
	  

Recovering a recovery failed transaction

This example is quite complex because a recovery failed transaction is unlikely. To create a recovery failed transaction we will use a special Resource Manager, the LIXA Monkey one: it is a Resource Manager that can be programmed to answer the Transaction Manager as we desire. We will emulate a heuristically completed transaction to force the LIXA Transaction Manager to mark it as recovery failed.

Before we can start, you must set up the same environment already explained in the section called “Recovering forgotten transactions”. Open three terminal sessions and prepare them as shown below:

[Shell terminal session]
tiian@ubuntu:~/tmp$ . /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin/oracle_env.sh
tiian@ubuntu:~/tmp$ echo $ORACLE_HOME
/usr/lib/oracle/xe/app/oracle/product/10.2.0/server
tiian@ubuntu:~/tmp$ echo $ORACLE_SID
XE
tiian@ubuntu:~/tmp$ export LIXA_PROFILE=MON_STA_PQL_STA_ORA_DYN
tiian@ubuntu:~/tmp$ echo $LIXA_PROFILE
MON_STA_PQL_STA_ORA_DYN
tiian@ubuntu:~/tmp$ echo $LD_LIBRARY_PATH
/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib:
	  

The specified LIXA_PROFILE points to a configuration with three Resource Managers: LIXA Monkey (a fake R.M.), PostgreSQL (static registration) and Oracle DBMS (dynamic registration).

[Oracle terminal session]
tiian@ubuntu:~$ . /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin/oracle_env.sh
tiian@ubuntu:~$ sqlplus "hr/hr"

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Dec 16 16:15:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> select * from COUNTRIES where COUNTRY_ID = 'RS';

no rows selected
	  

[PostgreSQL terminal session]
tiian@ubuntu:~$ psql testdb
Welcome to psql 8.3.16, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

testdb=> SELECT * FROM authors;
 id | last_name | first_name 
----+-----------+------------
(0 rows)
	  

Start the LIXA state server if it's not active:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa      7127     1  0 22:13 ?        00:00:00 /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ exit
logout
	  

Before we can start the program execution, we must create a file named monkeyrm.conf in the current directory and put the following content inside it:

[Content of file monkeyrm.conf]
xa_open/0
xa_start/0
xa_end/0
xa_prepare/0
xa_commit/0
xa_close/0
	  

Now you can execute the program to verify that it runs as expected; we trace the client XA switch module (see the section called “Tracing modules”) to verify that the LIXA Monkey Resource Manager is properly configured:

[Shell terminal session]
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00010000
tiian@ubuntu:~/tmp$ ./example6_pql_ora insert 2>&1 | grep monkey
2011-12-15 22:42:20.709697 [22490/3052672768] lixa_monkeyrm_open: xa_info='monkeyrm.conf', rmid=1, flags=0x0
2011-12-15 22:42:20.710101 [22490/3052672768] lixa_monkeyrm_open: creating new first level hash table...
2011-12-15 22:42:20.713087 [22490/3052672768] lixa_monkeyrm_open/g_hash_table_new_full/monkey_status: 0x804caa0
2011-12-15 22:42:20.713361 [22490/3052672768] lixa_monkeyrm_open: creating new second level hash table for tid=3052672768
2011-12-15 22:42:20.713719 [22490/3052672768] lixa_monkeyrm_open/g_hash_table_new_full/slht: 0x804cac8
2011-12-15 22:42:20.713994 [22490/3052672768] lixa_monkeyrm_open: creating new status block for tid=3052672768, rmid=1
2011-12-15 22:42:20.714248 [22490/3052672768] lixa_monkeyrm_open/g_malloc/mss: 0x804e998
2011-12-15 22:42:20.714521 [22490/3052672768] lixa_monkeyrm_open_init
2011-12-15 22:42:20.714783 [22490/3052672768] lixa_monkeyrm_open_init/g_array_new/mss->records: 0x804c550 
2011-12-15 22:42:20.715139 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_open'
2011-12-15 22:42:20.715430 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=1, rc=0
2011-12-15 22:42:20.715691 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_start'
2011-12-15 22:42:20.715946 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=3, rc=0
2011-12-15 22:42:20.716340 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_end'
2011-12-15 22:42:20.716602 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=4, rc=0
2011-12-15 22:42:20.716864 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_prepare'
2011-12-15 22:42:20.717123 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=5, rc=0
2011-12-15 22:42:20.717377 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_commit'
2011-12-15 22:42:20.717671 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=6, rc=0
2011-12-15 22:42:20.717937 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_close'
2011-12-15 22:42:20.718196 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=2, rc=0
2011-12-15 22:42:20.718519 [22490/3052672768] lixa_monkeyrm_open_init/excp=3/ret_cod=0/errno=0
2011-12-15 22:42:20.718816 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:20.726862 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 1, XA return code is 0
2011-12-15 22:42:20.727177 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:20.727429 [22490/3052672768] lixa_monkeyrm_open/excp=4/ret_cod=0/xa_rc=0/errno=0
2011-12-15 22:42:21.022655 [22490/3052672768] lixa_monkeyrm_start: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0
2011-12-15 22:42:21.022695 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:21.022705 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 3, XA return code is 0
2011-12-15 22:42:21.022712 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:21.022719 [22490/3052672768] lixa_monkeyrm_start/excp=4/ret_cod=0/xa_rc=0/errno=0
2011-12-15 22:42:21.059688 [22490/3052672768] lixa_monkeyrm_end: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x4000000
2011-12-15 22:42:21.059701 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:21.059709 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 4, XA return code is 0
2011-12-15 22:42:21.059716 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:21.059723 [22490/3052672768] lixa_monkeyrm_end/excp=4/ret_cod=0/xa_rc=0/errno=0
2011-12-15 22:42:21.076478 [22490/3052672768] lixa_monkeyrm_prepare: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0
2011-12-15 22:42:21.076498 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:21.076512 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 5, XA return code is 0
2011-12-15 22:42:21.076520 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:21.076527 [22490/3052672768] lixa_monkeyrm_prepare/excp=4/ret_cod=0/xa_rc=0/errno=0
2011-12-15 22:42:21.098722 [22490/3052672768] lixa_monkeyrm_commit: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0
2011-12-15 22:42:21.098747 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:21.098756 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 6, XA return code is 0
2011-12-15 22:42:21.098764 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:21.098770 [22490/3052672768] lixa_monkeyrm_commit/excp=4/ret_cod=0/xa_rc=0/errno=0
2011-12-15 22:42:21.102841 [22490/3052672768] lixa_monkeyrm_close: xa_info='', rmid=1, flags=0x0
2011-12-15 22:42:21.102853 [22490/3052672768] lixa_monkeyrm_get_rc
2011-12-15 22:42:21.102862 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 2, XA return code is 0
2011-12-15 22:42:21.102869 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0
2011-12-15 22:42:21.102877 [22490/3052672768] lixa_monkeyrm_close/excp=4/ret_cod=0/xa_rc=0/errno=0
tiian@ubuntu:~/tmp$ unset LIXA_TRACE_MASK
tiian@ubuntu:~/tmp$ ./example6_pql_ora delete
Deleting a row from the tables...
Oracle DELETE statement executed!
	  

Don't forget the clean-up step (last command) before performing the next steps.

To create a recovery pending transaction, we are going to force a crash after the xa_prepare() functions have completed successfully:

[Shell terminal session]
tiian@ubuntu:~/tmp$ export LIXA_CRASH_POINT=15
tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT
15
tiian@ubuntu:~/tmp$ ./example6_pql_ora insert
Inserting a row in the tables...
Oracle INSERT statement executed!
Aborted
tiian@ubuntu:~/tmp$ unset LIXA_CRASH_POINT
tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT

	  

Verify there is a prepared/in-doubt transaction inside Oracle:

[Oracle terminal session]
SQL> select * from dba_pending_transactions;

  FORMATID
----------
GLOBALID
--------------------------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------
1279875137
D2F5F0A5C37E485CB44D9FC16E72A33D
68D0E0CCBC6FC4B7FA616DF9AB122395
	  

Verify there is a prepared/in-doubt transaction inside PostgreSQL:

[PostgreSQL terminal session]
testdb=> select * from pg_prepared_xacts;
 transaction |                                    gid                                       |           prepared            | owner | database 
-------------+------------------------------------------------------------------------------+-------------------------------+-------+----------
         964 | 1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395 | 2011-12-16 16:38:47.921947+01 | tiian | testdb
	  

To move the transaction to the recovery failed state, we emulate a heuristic rollback during the automatic recovery step. Edit the file monkeyrm.conf to code the new behavior of the LIXA Monkey R.M.:

[Content of file monkeyrm.conf]
xa_open/0
xa_commit/6
xa_close/0
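
Both versions of monkeyrm.conf shown in this section follow the same pattern: one record per line, made of an XA verb and the return code the LIXA Monkey R.M. must give, separated by a slash. The following Python sketch parses that layout; it is inferred from the two listings only, the function name is hypothetical, and the real parser inside LIXA may accept or reject input differently.

```python
def parse_monkeyrm_conf(text: str) -> list:
    """Parse 'verb/rc' lines into (verb, return_code) tuples, in file order."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # this sketch simply skips blank lines
        verb, _, rc = line.partition("/")
        records.append((verb, int(rc)))
    return records


# First version: every verb succeeds (return code 0)
NORMAL = "xa_open/0\nxa_start/0\nxa_end/0\nxa_prepare/0\nxa_commit/0\nxa_close/0\n"
# Second version: xa_commit answers 6 during recovery
HEURISTIC = "xa_open/0\nxa_commit/6\nxa_close/0\n"

print(parse_monkeyrm_conf(NORMAL))
print(parse_monkeyrm_conf(HEURISTIC))
```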
	  

Run the lixat utility command to trigger an automatic (warm) recovery; trace the client recovery module (see the section called “Tracing modules”) to understand what happens:

[Shell terminal session]
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00040000
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixat
2011-12-16 16:44:09.970398 [8385/3073862320] client_recovery
2011-12-16 16:44:09.970575 [8385/3073862320] client_recovery: sending 197 bytes ('000191<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="8"><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1      " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"/></msg>') to the server for step 8
2011-12-16 16:44:10.007703 [8385/3073862320] client_recovery: receiving 632 bytes from the server |<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="16"><answer rc="0"/><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1      " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"><last_verb_step verb="5" step="16"/><state finished="0" txstate="3" will_commit="1" will_rollback="0" xid="1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395"/></client><rsrmgrs><rsrmgr rmid="0" next_verb="0" r_state="1" s_state="33" td_state="10"/><rsrmgr rmid="1" next_verb="0" r_state="1" s_state="33" td_state="10"/><rsrmgr rmid="2" next_verb="0" r_state="1" s_state="33" td_state="20"/></rsrmgrs></msg>|
2011-12-16 16:44:10.008178 [8385/3073862320] client_recovery_analyze
2011-12-16 16:44:10.008229 [8385/3073862320] client_recovery_analyze: the TX was committing
2011-12-16 16:44:10.008242 [8385/3073862320] client_recovery_analyze: rmid=0, r_state=1, s_state=33, td_state=10
2011-12-16 16:44:10.008261 [8385/3073862320] client_recovery_analyze: rmid=1, r_state=1, s_state=33, td_state=10
2011-12-16 16:44:10.008278 [8385/3073862320] client_recovery_analyze: rmid=2, r_state=1, s_state=33, td_state=20
2011-12-16 16:44:10.008303 [8385/3073862320] client_recovery_analyze/excp=1/ret_cod=0/errno=0
2011-12-16 16:44:10.008327 [8385/3073862320] client_recovery: transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395' must be committed
2011-12-16 16:44:10.008353 [8385/3073862320] client_recovery_commit
2011-12-16 16:44:10.008388 [8385/3073862320] client_recovery_commit: committing transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'
2011-12-16 16:44:10.008412 [8385/3073862320] client_recovery_commit: xa_commit for rmid=0, name='LIXAmonkey1staRM', xa_name='LIXA Monkey RM (static)'...
2011-12-16 16:44:10.008464 [8385/3073862320] client_recovery_commit: rc=6
2011-12-16 16:44:10.008613 [8385/3073862320] client_recovery_commit: xa_commit for rmid=1, name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]'...
2011-12-16 16:44:10.062892 [8385/3073862320] client_recovery_commit: rc=0
2011-12-16 16:44:10.062962 [8385/3073862320] client_recovery_commit: xa_commit for rmid=2, name='OracleXE_dynreg', xa_name='Oracle_XA'...
2011-12-16 16:44:10.275061 [8385/3073862320] client_recovery_commit: rc=0
2011-12-16 16:44:10.275128 [8385/3073862320] client_recovery_commit/excp=1/ret_cod=0/errno=0
2011-12-16 16:44:10.275286 [8385/3073862320] client_recovery: sending 212 bytes ('000206<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="24"><recovery failed="1" commit="1"/><rsrmgrs><rsrmgr rmid="0" rc="6"/><rsrmgr rmid="1" rc="0"/><rsrmgr rmid="2" rc="0"/></rsrmgrs></msg>') to the server for step 24
2011-12-16 16:44:10.275483 [8385/3073862320] client_recovery: sending 197 bytes ('000191<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="8"><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1      " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"/></msg>') to the server for step 8
2011-12-16 16:44:10.315057 [8385/3073862320] client_recovery: receiving 95 bytes from the server |<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="16"><answer rc="1"/></msg>|
2011-12-16 16:44:10.315261 [8385/3073862320] client_recovery: the server answered LIXA_RC_OBJ_NOT_FOUND; there are no more transactions to recover
2011-12-16 16:44:10.315315 [8385/3073862320] client_recovery/excp=12/ret_cod=0/errno=0
tx_open(): 0
tx_close(): 0
	  

The trace gives us a lot of information:

  • lixat fires an automatic (warm) recovery for transaction XID='1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'

  • the transaction must be committed because the crash followed a successful prepare phase

  • the first Resource Manager returns 6, that is XA_HEURRB (the transaction branch has been heuristically rolled back)

  • the second and third Resource Managers return 0, that is XA_OK

  • the client sends a recovery failed message to the state server
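
The recovery failed message can also be decoded programmatically. The following Python sketch parses the payload of the "step 24" trace line and translates the return codes into the symbolic names defined by the X/Open XA specification. Two points are assumptions of this example: the function name is hypothetical, and the six-digit prefix is interpreted as the length of the XML payload because it matches the byte counts reported in the trace.

```python
import xml.etree.ElementTree as ET

# Non-negative return codes defined by the X/Open XA specification (xa.h)
XA_RC_NAMES = {
    0: "XA_OK",
    3: "XA_RDONLY",
    4: "XA_RETRY",
    5: "XA_HEURMIX",
    6: "XA_HEURRB",
    7: "XA_HEURCOM",
    8: "XA_HEURHAZ",
}

# Payload copied from the "step 24" line of the trace above
RAW = (
    '000206<?xml version="1.0" encoding="UTF-8" ?>'
    '<msg level="0" verb="8" step="24"><recovery failed="1" commit="1"/>'
    '<rsrmgrs><rsrmgr rmid="0" rc="6"/><rsrmgr rmid="1" rc="0"/>'
    '<rsrmgr rmid="2" rc="0"/></rsrmgrs></msg>'
)


def decode_recovery_message(raw: str):
    """Return (failed, commit, {rmid: rc name}) for a step 24 message."""
    declared, payload = int(raw[:6]), raw[6:]
    if declared != len(payload):
        raise ValueError("length prefix does not match payload")
    msg = ET.fromstring(payload.encode("utf-8"))
    recovery = msg.find("recovery")
    outcome = {
        int(rm.get("rmid")): XA_RC_NAMES.get(int(rm.get("rc")), "unknown")
        for rm in msg.iter("rsrmgr")
    }
    return recovery.get("failed") == "1", recovery.get("commit") == "1", outcome


failed, commit, outcome = decode_recovery_message(RAW)
print(failed, commit, outcome)
```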

Note

You might think that the LIXA Transaction Manager should not have performed the second and third xa_commit after the failure of the first Resource Manager. It seems a good idea, but it would not add much value because it would introduce a behaviour that depends on the order of the operations. The LIXA Transaction Manager tries to apply a consistent rule: after a successful xa_prepare, all the Resource Managers must receive the same command (xa_commit or xa_rollback).
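
The rule can be illustrated with a toy model in Python. This is not LIXA code: the Resource Managers are replaced by plain functions, the names are hypothetical, and treating any return code other than XA_OK as a failed recovery is an assumption of this sketch. The point is that no call is skipped after an earlier failure, so the result does not depend on the order of the Resource Managers.

```python
XA_OK = 0
XA_HEURRB = 6


def recover(resource_managers: dict, verb: str):
    """Send the same verb to every Resource Manager and collect the results.

    resource_managers maps an rmid to a callable that takes the verb and
    returns an XA return code. Every callable is invoked, even after a
    previous one has reported an error.
    """
    results = {rmid: rm(verb) for rmid, rm in sorted(resource_managers.items())}
    failed = any(rc != XA_OK for rc in results.values())
    return results, failed


# Stand-ins for the three Resource Managers of this example
rms = {
    0: lambda verb: XA_HEURRB if verb == "xa_commit" else XA_OK,  # Monkey RM
    1: lambda verb: XA_OK,  # PostgreSQL
    2: lambda verb: XA_OK,  # Oracle
}

results, failed = recover(rms, "xa_commit")
print(results, failed)
```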

You can check the (real) Resource Managers status:

[Oracle terminal session]
SQL> select * from dba_pending_transactions;

no rows selected

SQL> select * from COUNTRIES where COUNTRY_ID = 'RS';

COUNTR
------
COUNTRY_NAME
--------------------------------------------------------------------------------
 REGION_ID
----------
RS
Repubblica San Marino
	 1
	  

[PostgreSQL terminal session]
testdb=> select * from pg_prepared_xacts;
 transaction | gid | prepared | owner | database 
-------------+-----+----------+-------+----------
(0 rows)

testdb=> SELECT * FROM authors;
 id | last_name | first_name 
----+-----------+------------
  1 | Foo       | Bar
	  

Inspecting the system log, you can notice that there was a problem:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo tail /var/log/daemon.log
Dec 16 16:44:09 ubuntu lixat[8385]: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 0.5.36)
Dec 16 16:44:10 ubuntu lixat[8385]: LXC003C resource manager 'LIXAmonkey1staRM' returned an error (6) while committing (xa_commit) during recovery phase for transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'
Dec 16 16:44:10 ubuntu lixat[8385]: LXC005W unable to recover transaction id '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'; this transaction must be manually recovered and the correlated record(s) must be manually fixed in lixad server status file
Dec 16 16:44:10 ubuntu lixad[8157]: LXD012W a client notified recovery failed condition for the transaction registered in status file 3 and block 1
	  

There is a critical message (LXC003C) and there are two warning messages (LXC005W, LXD012W). Note that the server reports the problem (LXD012W) as well as the client (LXC005W). To inspect the content of the recovery failed transaction you can dump the state server:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ pkill lixad
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --dump=u >/tmp/bar
lixa@ubuntu:~$ exit
logout
	  

and inspect the content of file /tmp/bar [55]:

[Content of file /tmp/bar]
========================================================================
Second file ('/opt/lixa/var/lixad_status1_2') will be dumped
Magic number is: 24848 (24848)
Level is: 1 (1)
Last sync timestamp: 2011-12-16T16:38:23.482854+0100
Size: 17 blocks
Used block chain starts at: 16 
Free block chain starts at: 0 (empty chain)
Dumping records following physical order: 0
Dumping records following free block chain: 0
Dumping records following used block chain: 1
------------------------------------------------------------------------

[...] 

------------------------------------------------------------------------
Block: 9, next block in chain: 8
Block type: transaction manager record (transaction header)
	Trnhdr/number of resource managers: 3
	Trnhdr/resource manager blocks are: 10 11 12 
	Trnhdr/arrival time: 2011-12-16T16:26:04.790526+0100
	Trnhdr/local socket address:port is 127.0.0.1:2345
	Trnhdr/peer socket address:port is 127.0.0.1:52251
	Trnhdr/config digest is '68d0e0ccbc6fc4b7fa616df9ab122395'
	Trnhdr/job is '68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1      '
	Trnhdr/last (verb, step) are: [ (9,8) (4,8) (4,16) (5,8) (5,16) ]
	Trnhdr/state/finished: 0
	Trnhdr/state/txstate: 3
	Trnhdr/state/will commit: 1
	Trnhdr/state/will rollback: 0
	Trnhdr/state/xid: '1279875137.592cec793be1433d8ffd18574211e0e2.68d0e0ccbc6fc4b7fa616df9ab122395'
	Trnhdr/recoverying block id: 0
	Trnhdr/recovery failed: 1
	Trnhdr/recovery failed time: 2011-12-16T16:29:15.947057+0100
	Trnhdr/recovery commit: 1
------------------------------------------------------------------------
Block: 8, next block in chain: 7
Block type: resource manager record
	Rsrmgr/rmid: 2
	Rsrmgr/state/next_verb: 0
	Rsrmgr/state/xa_r_state: 1
	Rsrmgr/state/dynamic: 0
	Rsrmgr/state/xa_td_state: 10
	Rsrmgr/state/xa_s_state: 33
	Rsrmgr/lixac_conf.xml name: 'LIXAmonkey1staRM'
	Rsrmgr/xa_name: 'LIXA Monkey RM (static)'
	Rsrmgr/xa_open_info: 'monkeyrm.conf'
	Rsrmgr/xa_open_flags: 0x0
	Rsrmgr/xa_open_rc: 0
	Rsrmgr/xa_start_flags: 0x0
	Rsrmgr/xa_start_rc: 0
	Rsrmgr/xa_end_flags: 0x4000000
	Rsrmgr/xa_end_rc: 0
	Rsrmgr/xa_prepare_flags: 0x0
	Rsrmgr/xa_prepare_rc: 0
	Rsrmgr/xa_commit_flags: 0x0
	Rsrmgr/xa_commit_rc: 0
	Rsrmgr/xa_rollback_flags: 0x0
	Rsrmgr/xa_rollback_rc: 0
	Rsrmgr/xa_forget_flags: 0x0
	Rsrmgr/xa_forget_rc: 0
	Rsrmgr/ax_reg_flags: 0x0
	Rsrmgr/ax_reg_rc: 0
	Rsrmgr/ax_unreg_flags: 0x0
	Rsrmgr/ax_unreg_rc: 0
	Rsrmgr/recovery_rc: 6
------------------------------------------------------------------------
Block: 7, next block in chain: 6
Block type: resource manager record
	Rsrmgr/rmid: 1
	Rsrmgr/state/next_verb: 0
	Rsrmgr/state/xa_r_state: 1
	Rsrmgr/state/dynamic: 1
	Rsrmgr/state/xa_td_state: 20
	Rsrmgr/state/xa_s_state: 33
	Rsrmgr/lixac_conf.xml name: 'OracleXE_dynreg'
	Rsrmgr/xa_name: 'Oracle_XA'
	Rsrmgr/xa_open_info: 'Oracle_XA+Acc=P/hr/hr+SesTm=30+LogDir=/tmp+threads=true+DbgFl=7+Loose_Coupling=true'
	Rsrmgr/xa_open_flags: 0x0
	Rsrmgr/xa_open_rc: 0
	Rsrmgr/xa_start_flags: 0x0
	Rsrmgr/xa_start_rc: 0
	Rsrmgr/xa_end_flags: 0x4000000
	Rsrmgr/xa_end_rc: 0
	Rsrmgr/xa_prepare_flags: 0x0
	Rsrmgr/xa_prepare_rc: 0
	Rsrmgr/xa_commit_flags: 0x0
	Rsrmgr/xa_commit_rc: 0
	Rsrmgr/xa_rollback_flags: 0x0
	Rsrmgr/xa_rollback_rc: 0
	Rsrmgr/xa_forget_flags: 0x0
	Rsrmgr/xa_forget_rc: 0
	Rsrmgr/ax_reg_flags: 0x0
	Rsrmgr/ax_reg_rc: 0
	Rsrmgr/ax_unreg_flags: 0x0
	Rsrmgr/ax_unreg_rc: 0
	Rsrmgr/recovery_rc: 0
------------------------------------------------------------------------
Block: 6, next block in chain: 5
Block type: resource manager record
	Rsrmgr/rmid: 0
	Rsrmgr/state/next_verb: 0
	Rsrmgr/state/xa_r_state: 1
	Rsrmgr/state/dynamic: 0
	Rsrmgr/state/xa_td_state: 10
	Rsrmgr/state/xa_s_state: 33
	Rsrmgr/lixac_conf.xml name: 'PostgreSQL_stareg'
	Rsrmgr/xa_name: 'PostgreSQL[LIXA]'
	Rsrmgr/xa_open_info: 'dbname=testdb'
	Rsrmgr/xa_open_flags: 0x0
	Rsrmgr/xa_open_rc: 0
	Rsrmgr/xa_start_flags: 0x0
	Rsrmgr/xa_start_rc: 0
	Rsrmgr/xa_end_flags: 0x4000000
	Rsrmgr/xa_end_rc: 0
	Rsrmgr/xa_prepare_flags: 0x0
	Rsrmgr/xa_prepare_rc: 0
	Rsrmgr/xa_commit_flags: 0x0
	Rsrmgr/xa_commit_rc: 0
	Rsrmgr/xa_rollback_flags: 0x0
	Rsrmgr/xa_rollback_rc: 0
	Rsrmgr/xa_forget_flags: 0x0
	Rsrmgr/xa_forget_rc: 0
	Rsrmgr/ax_reg_flags: 0x0
	Rsrmgr/ax_reg_rc: 0
	Rsrmgr/ax_unreg_flags: 0x0
	Rsrmgr/ax_unreg_rc: 0
	Rsrmgr/recovery_rc: 0
------------------------------------------------------------------------

[...]

========================================================================
First file ('/opt/lixa/var/lixad_status2_1') will be dumped
Magic number is: 24848 (24848)
Level is: 1 (1)
Last sync timestamp: 2011-12-16T16:23:40.512332+0100
Size: 10 blocks
Used block chain starts at: 0 (empty chain)
Free block chain starts at: 1 
Dumping records following physical order: 0
Dumping records following free block chain: 0
Dumping records following used block chain: 1
========================================================================
First file ('/opt/lixa/var/lixad_status3_1') will be dumped
Magic number is: 24848 (24848)
Level is: 1 (1)
Last sync timestamp: 2011-12-16T16:38:47.933099+0100
Size: 10 blocks
Used block chain starts at: 4 
Free block chain starts at: 5 
Dumping records following physical order: 0
Dumping records following free block chain: 0
Dumping records following used block chain: 1
------------------------------------------------------------------------

[...]
	  

The chain composed of blocks 9 (transaction manager record) and 8, 7, 6 (resource manager records) keeps the state of the recovery failed transaction, as shown by the line Trnhdr/recovery failed: 1
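
In a long dump, the recovery failed transactions can be located with a short script. The following Python sketch scans the text for transaction header blocks whose Trnhdr/recovery failed field is 1 and reports the block number and the XID. It relies only on the field layout visible in the dump above; the function name is hypothetical and the sample is an abridged excerpt of that dump.

```python
import re

# Abridged excerpt of the dump shown above
DUMP = """\
Block: 9, next block in chain: 8
Block type: transaction manager record (transaction header)
\tTrnhdr/number of resource managers: 3
\tTrnhdr/state/xid: '1279875137.592cec793be1433d8ffd18574211e0e2.68d0e0ccbc6fc4b7fa616df9ab122395'
\tTrnhdr/recovery failed: 1
\tTrnhdr/recovery commit: 1
------------------------------------------------------------------------
Block: 8, next block in chain: 7
Block type: resource manager record
\tRsrmgr/rmid: 2
\tRsrmgr/recovery_rc: 6
"""


def recovery_failed_transactions(dump: str) -> list:
    """Return (block number, xid) for every header with recovery failed = 1."""
    found = []
    block, xid = None, None
    for line in dump.splitlines():
        line = line.strip()
        header = re.match(r"Block: (\d+),", line)
        if header:
            # A new block starts: forget the XID of the previous one
            block, xid = int(header.group(1)), None
        elif line.startswith("Trnhdr/state/xid:"):
            xid = line.split("'")[1]
        elif line == "Trnhdr/recovery failed: 1":
            found.append((block, xid))
    return found


print(recovery_failed_transactions(DUMP))
```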

If the LIXA Monkey RM were a real Resource Manager, you could manually recover the transaction using the procedure shown in the section called “Recovering forgotten transactions”. Unfortunately the LIXA Monkey RM is a fake Resource Manager and it does not save its state anywhere: it is not able to answer xa_recover() correctly, so there is no way to show this last step.

As a final step, to clean-up the recovery failed state from LIXA state server, you have to recycle it using a special option:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ pkill lixad
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon --clean-failed
lixa@ubuntu:~$ pkill lixad[a]
lixa@ubuntu:~$ exit
logout
	  

[a] The LIXA state server is stopped after start-up to guarantee that the content of the state file(s) on disk is up to date before dumping them.

Dump again the content of the state server:

[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --dump=u >/tmp/bar
lixa@ubuntu:~$ exit
logout
	  

and check the content of the dump file again: the blocks 10, 9, 8, 7 should not be used or, if re-used, they should be related to a different transaction.

Important

Don't use --clean-failed by default when starting the LIXA state server (lixad): use this option only after you have inspected the content of the state server and resolved every in-doubt transaction.

Note

This type of recovery can be easier to perform if the LIXA state server is running in maintenance mode (see the section called “Maintenance mode execution”): only lixar can access the online content of the LIXA state server, and ordinary clients (Application Programs) cannot perform transactions. Running the state server in maintenance mode may be useful, but only you can decide whether your business rules allow it.

Important

The LIXA technology does not ask you to:

  • recycle the lixad state server to see the current status when dumping the content of the state server files (--dump option)

  • start the lixad state server in maintenance mode when performing manual (cold) recovery

The LIXA technology provides these functions to simplify administrative tasks, but deciding which option to use is your own responsibility.

Recovering a transaction associated with a different job

As explained in the section called “Automatic recovery concepts”, automatic (warm) recovery is performed by the LIXA Transaction Manager under the condition of Application Program equivalence (see the section called “Application Program equivalence”).

Sometimes you have to perform a manual (cold) recovery because Application Program equivalence is no longer available. This is a typical scenario:

  • an Application Program crashed and its transaction is in in-doubt/prepared (recovery pending) status

  • the Application Program didn't specify a custom value for the environment variable LIXA_JOB

  • you changed the content of file lixac_conf.xml, for example you added a new profile

  • the MD5 signature of file lixac_conf.xml changed, so the associated branch qualifier changed too and new transactions are associated with a different job

  • automatic (warm) recovery is not performed and the transaction becomes a forgotten transaction.

Inspecting the list of recovery pending transactions with lixar -p does not return the desired transaction... What's going on?

By default, the lixar utility filters the transactions retrieved from the Resource Managers: it keeps only the transactions with the same branch qualifier as the running lixar instance. If you look at the section called “Application Program equivalence”, you will realize that lixar retrieves only the transactions started with the same lixac_conf.xml, the same $(LIXA_PROFILE) and the same gethostid(). This behaviour helps the system engineer see only the relevant subset of the whole recovery pending set.

If you are looking for all the transactions in recovery pending status currently kept by the Resource Managers associated to the current LIXA_PROFILE, you must specify the --bypass-bqual-check (-b) option.

Side effect of lixac_conf.xml changes

This paragraph explains a side effect of the behavior explained above.

Suppose the following scenario happens:

  • your LIXA Transaction Manager saved the state of a prepared/recovery pending transaction inside the LIXA state server

  • you are not aware of the existence of that prepared/recovery pending transaction

  • you change the content of lixac_conf.xml file

  • later you discover there is a prepared/recovery pending transaction that cannot be automatically recovered by the LIXA Transaction Manager and you perform manual recovery

unfortunately, LIXA state server will keep some records related to the manually recovered transaction forever.

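Manual recovery does not query the LIXA state server to retrieve the list of prepared/recovery pending transactions, because it is designed for the case in which the state server has no information about some transactions. As a consequence, the records the state server still holds for the transaction are not updated when you recover it manually.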

Note

The lixad option --clean-failed explained in the section called “Recoverying a recovery failed transaction” does not help because the transaction was not in recovery failed state.

Important

At the time of this writing there is no specific tool to remove this type of ghost record from the LIXA state server; you can clean up those records using a cold start as explained in the section called “Recoverying forgotten transactions”. Take care not to purge in-flight transactions.

In a future release this behavior could be improved if any user asked for it.

Recovering a transaction managed by a different Transaction Manager

The LIXA project technology can also help you deal with a different Transaction Manager (see the section called “Transaction Manager and Transaction Monitor”): the lixar utility program can be used to inspect (and manually recover) a transaction managed by a different Transaction Manager by using two command options together:

  • --bypass-bqual-check (-b): to bypass branch qualifier based filtering

  • --bypass-formatid-check (-B): to bypass format ID based filtering

If you use the above command options together, the lixar utility program will inspect (and possibly commit/rollback) any XA transaction known to the Resource Managers associated to the current LIXA_PROFILE.

Note

It is your own responsibility to define a LIXA profile that's compatible with the configuration of the third party Transaction Manager used when managing the transaction: it must contain the same Resource Managers in the same order with the same options.

Warning

LIXA software is libre/free/open source software and you use it exclusively and consciously without any warranty at your own risk. Using LIXA software technology will probably put you in an unsupported state with respect to the third party Transaction Manager supplier.

Picking up the LIXA format id and branch qualifier

The previous sections dealt with format id and branch qualifier; you might ask: how can I discover the format id and the branch qualifier associated to my own Application Program?

The easiest way to pick them up is to run the lixat utility program with the same LIXA_PROFILE you use when running your Application Program:

tiian@ubuntu:~/src/lixa$ sudo su - lixa
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa      8122     1  0 23:02 ?        00:00:00 /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ exit
logout
tiian@ubuntu:~/src/lixa$ /opt/lixa/bin/lixat -c
tx_open(): 0
tx_begin(): 0
tx_info(): 1
	xid/formatID.gtrid.bqual = 1279875137.56af7a66398f4eca82b8826fe10165ad.9e4c11057107c73366c9fc421eaa85ca
tx_commit(): 0
tx_close(): 0
tiian@ubuntu:~/src/lixa$ /opt/lixa/bin/lixat -c
tx_open(): 0
tx_begin(): 0
tx_info(): 1
	xid/formatID.gtrid.bqual = 1279875137.218f05b733fb4bc1aa3d21eeaf01fbab.9e4c11057107c73366c9fc421eaa85ca
tx_commit(): 0
tx_close(): 0	  
	

From the above terminal output:

  • formatID is constant: the LIXA Transaction Manager uses the decimal value 1279875137, which is 0x4C495841 in hexadecimal (the ASCII codes of the string LIXA)

  • branch qualifier is computed as explained in the section called “Application Program equivalence” and the value in the above example is 9e4c11057107c73366c9fc421eaa85ca

  • global transaction id must be different for any transaction and you can see two different values in the above examples (it's computed using uuid_generate() function).

If you are interested in retrieving them programmatically (from your own C language program), you can use the standard tx_info() function, which fills in a TXINFO struct ([TXspec]). Note that the XID struct contains binary data (it does not contain ASCII data).



[54] This should never happen: it could be a bug in LIXA project software or it might be the consequence of a cold start (you removed the state files) of lixad

[55] The state server can be analyzed without stopping it (pkill lixad), but you may not see the current content because the state server has not yet synchronized the state file(s). If the LIXA state server is processing many transactions per second you will probably see an up-to-date state, but if it is idle you might not.