Manual (cold) recovery uses the information provided by the Resource Managers when they are queried with xa_recover(). This type of recovery is meant to resolve unusual situations.
If, for any reason, the LIXA state server (lixad) has forgotten the state of a transaction [54], or the transaction is in the “recovery failed” state because a previous automatic (warm) recovery failed, you must recover the transaction manually. The procedure to recover a “forgotten” transaction is the same as the one used for a “recovery failed” transaction, but additional server-side clean-up is recommended for “recovery failed” transactions.
This example requires the same environment set-up described in the section called “Forcing automatic recovery”; start the example program after you have set the LIXA_CRASH_POINT environment variable:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ export LIXA_CRASH_POINT=15 tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT 15 tiian@ubuntu:~/tmp$ ./example6_pql_ora delete Deleting a row from the tables... Oracle DELETE statement executed! Aborted |
and check that there is a recovery pending transaction inside the PostgreSQL and Oracle Resource Managers:
[Oracle terminal session] |
SQL> select * from dba_pending_transactions; FORMATID ---------- GLOBALID -------------------------------------------------------------------------------- BRANCHID -------------------------------------------------------------------------------- 1279875137 957747F7F37B439EBCEA4146076AD322 9BAC7BE1C129EA6EE31F2D71B318120C |
[PostgreSQL terminal session] |
testdb=> select * from pg_prepared_xacts; transaction | gid | prepared | owner | database -------------+------------------------------------------------------------------------------+-------------------------------+-------+---------- 877 | 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c | 2011-12-14 22:55:14.973443+01 | tiian | testdb |
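Oracle reports the XID in three columns (FORMATID, GLOBALID, BRANCHID) with upper-case hexadecimal digits, while PostgreSQL shows a single gid made of the same three parts in lower case, separated by dots; lixar uses the dotted lower-case form too. The Python sketch below, whose helper names are hypothetical and not part of LIXA, rebuilds the dotted form from the Oracle columns, assuming the formatID.gtrid.bqual layout seen in the outputs above always applies. It also shows that the format ID 1279875137 is simply the four ASCII characters “LIXA” read as a big-endian 32-bit integer.

```python
def oracle_columns_to_xid(format_id: int, global_id: str, branch_id: str) -> str:
    """Join Oracle's FORMATID, GLOBALID and BRANCHID columns into the
    dotted, lower-case XID string shown by pg_prepared_xacts and lixar."""
    return f"{format_id}.{global_id.lower()}.{branch_id.lower()}"


def format_id_as_text(format_id: int) -> str:
    """Decode a 32-bit format ID as four ASCII characters (big-endian)."""
    return format_id.to_bytes(4, "big").decode("ascii")


if __name__ == "__main__":
    # Values copied from the dba_pending_transactions output above.
    print(oracle_columns_to_xid(1279875137,
                                "957747F7F37B439EBCEA4146076AD322",
                                "9BAC7BE1C129EA6EE31F2D71B318120C"))
    print(format_id_as_text(1279875137))
```

Comparing the result with the gid column is a quick way to confirm that both Resource Managers hold a branch of the same global transaction.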
Then stop the lixad state server and cold start it with empty state files:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ sudo su - lixa lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep lixa 24437 1 0 21:19 ? 00:00:00 /opt/lixa/sbin/lixad --daemon lixa@ubuntu:~$ pkill lixad lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep lixa@ubuntu:~$ ls /opt/lixa/var/ lixad_status1_1 lixad_status2_1 lixad_status3_1 README lixad_status1_2 lixad_status2_2 lixad_status3_2 lixa@ubuntu:~$ rm /opt/lixa/var/lixad_status* lixa@ubuntu:~$ ls /opt/lixa/var/ README lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep lixa 28594 1 0 23:00 ? 00:00:00 /opt/lixa/sbin/lixad --daemon lixa@ubuntu:~$ ls /opt/lixa/var/ lixad_status1_1 lixad_status2_1 lixad_status3_1 README lixad_status1_2 lixad_status2_2 lixad_status3_2 run.pid lixa@ubuntu:~$ exit logout |
These are the operations you just performed:
changed the user from your own to the lixa user
checked that the lixad daemon was running
stopped the lixad daemon
checked that the lixad daemon was no longer running
checked lixad's state files
removed lixad's state files
started the lixad daemon
checked lixad's new state files
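The directory listings show three status files, each present as two copies with the suffixes _1 and _2 (the dump later in this section calls them “First file” and “Second file”). If you want to script the “check the state files” steps, the hypothetical Python helper below groups the names by status file number. The naming pattern is inferred from the listings above, and the helper only reads names: it never deletes anything.

```python
import re
from collections import defaultdict

# Naming pattern inferred from the listings: lixad_status<file>_<copy>
STATUS_FILE = re.compile(r"^lixad_status(\d+)_(\d+)$")


def group_status_files(names):
    """Map each status file number to the sorted list of its copies.
    Other entries (README, run.pid, ...) are ignored."""
    groups = defaultdict(list)
    for name in names:
        match = STATUS_FILE.match(name)
        if match:
            groups[int(match.group(1))].append(int(match.group(2)))
    return {number: sorted(copies) for number, copies in sorted(groups.items())}


if __name__ == "__main__":
    # Names copied from the "ls /opt/lixa/var/" output above.
    listing = ("lixad_status1_1 lixad_status2_1 lixad_status3_1 README "
               "lixad_status1_2 lixad_status2_2 lixad_status3_2 run.pid").split()
    print(group_status_files(listing))
```

On a live system you would pass os.listdir("/opt/lixa/var") instead of the sample list; an empty result matches the “README only” listing taken right after the rm command.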
Running lixat does not automatically recover the transaction:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00040000 tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixat tx_open(): 0 tx_close(): 0 |
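The program does not produce any trace because the “client recovery” module is not called: after the cold start, the state server has no record of the recovery pending transaction, so it does not pass any information about it to the LIXA client library. The transaction is still prepared inside both Resource Managers: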
[Oracle terminal session] |
SQL> select * from dba_pending_transactions; FORMATID ---------- GLOBALID -------------------------------------------------------------------------------- BRANCHID -------------------------------------------------------------------------------- 1279875137 957747F7F37B439EBCEA4146076AD322 9BAC7BE1C129EA6EE31F2D71B318120C |
[PostgreSQL terminal session] |
testdb=> select * from pg_prepared_xacts; transaction | gid | prepared | owner | database -------------+------------------------------------------------------------------------------+-------------------------------+-------+---------- 877 | 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c | 2011-12-14 22:55:14.973443+01 | tiian | testdb |
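This section sets LIXA_TRACE_MASK to 0x00040000 to trace the “client recovery” module and, later, to 0x00010000 to trace the “client XA switch” module. Assuming that each module is selected by its own bit and that the variable accepts their bitwise OR (check the section called “Tracing modules” for the authoritative list of values), a small Python helper can compute the value that traces both modules at once:

```python
# Module masks used in this section (values copied from the examples).
CLIENT_XA_SWITCH = 0x00010000
CLIENT_RECOVERY = 0x00040000


def trace_mask(*modules: int) -> str:
    """Combine module bits with bitwise OR and format the result the
    way LIXA_TRACE_MASK is written in the examples (0x + 8 hex digits).
    Assumption: modules are independent bits of one mask."""
    mask = 0
    for module in modules:
        mask |= module
    return f"0x{mask:08x}"


if __name__ == "__main__":
    print(trace_mask(CLIENT_RECOVERY))
    print(trace_mask(CLIENT_RECOVERY, CLIENT_XA_SWITCH))
```

The printed value can then be exported as usual, for example export LIXA_TRACE_MASK=0x00050000, if the assumption about independent bits holds for your LIXA version.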
The lixar utility program with the -p option lists the prepared transactions that cannot be automatically recovered:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -p Execution options: - print report = yes - transaction(s) will be committed = no - transaction(s) will be rolled back = no - bypass xid branch qualifier check = no - bypass xid format id check = no - use TMENDRSCAN flag for last xa_recover call = no Recovery environment: LIXA_CONFIG_FILE_ENV_VAR = '(null)' LIXA_PROFILE_ENV_VAR = 'PQL_STA_ORA_DYN' LIXA_JOB_ENV_VAR = '(null)' Resource manager list: rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]' rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA' Prepared and in-doubt transaction list: xid='1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1 |
It reports the execution options, the recovery environment and the list of configured Resource Managers; the in-doubt transactions are listed at the bottom.
The transaction is prepared/in-doubt for both Resource Managers; in other cases it may be prepared/in-doubt for only a subset of the Resource Managers defined by LIXA_PROFILE.
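When several transactions are in doubt, it helps to see at a glance which ones involve only part of the configured Resource Managers. The Python sketch below is a hypothetical helper, not a LIXA tool: it parses the text printed by lixar -p, assuming the layout shown above (rmid=N, lixa_name='...' entries, then xid='...': rmid=N ... entries after the “Prepared and in-doubt transaction list:” marker).

```python
import re

RM_ENTRY = re.compile(r"rmid=(\d+), lixa_name='([^']*)'")
XID_ENTRY = re.compile(r"xid='([^']+)':((?:\s+rmid=\d+)+)")
MARKER = "Prepared and in-doubt transaction list:"


def parse_lixar_report(report: str):
    """Split a 'lixar -p' report into the configured Resource Managers
    ({rmid: lixa_name}) and the in-doubt transactions ({xid: [rmid, ...]})."""
    head, _, tail = report.partition(MARKER)
    rms = {int(rmid): name for rmid, name in RM_ENTRY.findall(head)}
    in_doubt = {xid: [int(r) for r in re.findall(r"rmid=(\d+)", rmids)]
                for xid, rmids in XID_ENTRY.findall(tail)}
    return rms, in_doubt


def partially_prepared(rms, in_doubt):
    """XIDs that are in doubt for only a subset of the configured RMs."""
    return [xid for xid, rmids in in_doubt.items() if set(rmids) != set(rms)]


if __name__ == "__main__":
    # Shortened copy of the report shown above.
    report = ("Resource manager list: "
              "rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]' "
              "rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA' "
              + MARKER + " xid='1279875137.957747f7f37b439ebcea4146076ad322."
              "9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1")
    rms, in_doubt = parse_lixar_report(report)
    print(rms, in_doubt, partially_prepared(rms, in_doubt))
```

Because the exact report layout is inferred from a single example, a production script should check that the marker line is actually present before trusting an empty result.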
You can manually recover the transaction using the -x option together with -r (rollback) or -c (commit):
[Shell terminal session] |
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -p -x 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c -r Execution options: - print report = yes - transaction to commit/rollback = 1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c - transaction(s) will be committed = no - transaction(s) will be rolled back = yes - bypass xid branch qualifier check = no - bypass xid format id check = no - use TMENDRSCAN flag for last xa_recover call = no Recovery environment: LIXA_CONFIG_FILE_ENV_VAR = '(null)' LIXA_PROFILE_ENV_VAR = 'PQL_STA_ORA_DYN' LIXA_JOB_ENV_VAR = '(null)' Resource manager list: rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]' rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA' Prepared and in-doubt transaction list: xid='1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': rmid=0 rmid=1 Analizing transaction '1279875137.957747f7f37b439ebcea4146076ad322.9bac7be1c129ea6ee31f2d71b318120c': xa_rollback --> rmid=0, lixa_name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]', rc=0 xa_rollback --> rmid=1, lixa_name='OracleXE_dynreg', xa_name='Oracle_XA', rc=0 |
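If the manual recovery is scripted, building the command as an argument vector avoids shell quoting problems with the XID. The hypothetical Python helper below uses only the lixar options shown in this section (-p, -x, -c, -r) and the installation path /opt/lixa/bin/lixar from the examples; the XID check assumes the dotted lower-case format that lixar printed above. The choice between commit and rollback stays with the administrator.

```python
import re

LIXAR = "/opt/lixa/bin/lixar"  # path used in the examples
# XID shape observed in this section: formatID.gtrid.bqual, lower-case hex.
XID_FORMAT = re.compile(r"^\d+\.[0-9a-f]+\.[0-9a-f]+$")


def lixar_argv(xid: str, commit: bool, report: bool = True) -> list:
    """Return the argv for committing (-c) or rolling back (-r) one XID."""
    if not XID_FORMAT.match(xid):
        raise ValueError(f"unexpected XID format: {xid!r}")
    argv = [LIXAR]
    if report:
        argv.append("-p")  # also print the report, as in the example
    argv += ["-x", xid, "-c" if commit else "-r"]
    return argv


if __name__ == "__main__":
    xid = ("1279875137.957747f7f37b439ebcea4146076ad322."
           "9bac7be1c129ea6ee31f2d71b318120c")
    print(lixar_argv(xid, commit=False))
```

The resulting list can be passed to subprocess.run(argv, check=True) once you have decided the outcome of the transaction.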
You can now verify that no prepared/in-doubt transactions are left inside the Resource Managers:
[Oracle terminal session] |
SQL> select * from dba_pending_transactions; no rows selected |
[PostgreSQL terminal session] |
testdb=> select * from pg_prepared_xacts; transaction | gid | prepared | owner | database -------------+-----+----------+-------+---------- (0 rows) |
Because the deletion was rolled back, the rows must still be in place:
[Oracle terminal session] |
SQL> select * from COUNTRIES where COUNTRY_ID = 'RS'; COUNTR ------ COUNTRY_NAME -------------------------------------------------------------------------------- REGION_ID ---------- RS Repubblica San Marino 1 |
[PostgreSQL terminal session] |
testdb=> select * from AUTHORS; id | last_name | first_name ----+-----------+------------ 1 | Foo | Bar |
You can get help from lixar with the -? option:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixar -? Usage: lixar [OPTION...] - LIXA recovery utility Help Options: -?, --help Show help options Application Options: -p, --print Print a report of all the prepared and in-doubt transactions compatible with current configuration and profile -x, --xid Select specified transaction for rollback/commit -X, --xid-file Select specified file as a list of transaction to rollback/commit -c, --commit Commit prepared & in-doubt transactions -r, --rollback Rollback prepared & in-doubt transactions -v, --version Print package info and exit -b, --bypass-bqual-check Bypass xid branch qualifier check -B, --bypass-formatid-check Bypass xid format id check -e, --use-tmendrscan-flag Use TMENDRSCAN flag for last xa_recover call |
This example is quite complex because a “recovery failed” transaction is an unlikely event. To create one, we use a special Resource Manager, the “LIXA Monkey” R.M.: it can be “programmed” to answer the Transaction Manager with the return codes we choose. We emulate a heuristically completed transaction to force the LIXA Transaction Manager to mark it as “recovery failed”.
Before starting, set up the same environment already explained in the section called “Recoverying forgotten transactions”. Open three terminal sessions and prepare them as shown below:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ . /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin/oracle_env.sh tiian@ubuntu:~/tmp$ echo $ORACLE_HOME /usr/lib/oracle/xe/app/oracle/product/10.2.0/server tiian@ubuntu:~/tmp$ echo $ORACLE_SID XE tiian@ubuntu:~/tmp$ export LIXA_PROFILE=MON_STA_PQL_STA_ORA_DYN tiian@ubuntu:~/tmp$ echo $LIXA_PROFILE MON_STA_PQL_STA_ORA_DYN tiian@ubuntu:~/tmp$ echo $LD_LIBRARY_PATH /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib: |
The specified LIXA_PROFILE
points to a
configuration with three Resource Managers: LIXA Monkey (a fake
R.M.), PostgreSQL (static registration) and Oracle DBMS (dynamic
registration).
[Oracle terminal session] |
tiian@ubuntu:~$ . /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin/oracle_env.sh tiian@ubuntu:~$ sqlplus "hr/hr" SQL*Plus: Release 10.2.0.1.0 - Production on Ven Dic 16 16:15:23 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connesso a: Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production SQL> select * from COUNTRIES where COUNTRY_ID = 'RS'; no rows selected |
[PostgreSQL terminal session] |
tiian@ubuntu:~$ psql testdb Welcome to psql 8.3.16, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit testdb=> SELECT * FROM authors; id | last_name | first_name ----+-----------+------------ (0 rows) |
Start the LIXA state server if it's not active:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ sudo su - lixa lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep lixa 7127 1 0 22:13 ? 00:00:00 /opt/lixa/sbin/lixad --daemon lixa@ubuntu:~$ exit logout |
Before running the program, create a file named monkeyrm.conf in the current directory with the following content:
[Content of file monkeyrm.conf] |
xa_open/0
xa_start/0
xa_end/0
xa_prepare/0
xa_commit/0
xa_close/0 |
Now execute the program to verify that it runs as expected; we trace the “client XA switch” module (see the section called “Tracing modules”) to verify that the LIXA Monkey Resource Manager is properly configured:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00010000 tiian@ubuntu:~/tmp$ ./example6_pql_ora insert 2>&1 | grep monkey 2011-12-15 22:42:20.709697 [22490/3052672768] lixa_monkeyrm_open: xa_info='monkeyrm.conf', rmid=1, flags=0x0 2011-12-15 22:42:20.710101 [22490/3052672768] lixa_monkeyrm_open: creating new first level hash table... 2011-12-15 22:42:20.713087 [22490/3052672768] lixa_monkeyrm_open/g_hash_table_new_full/monkey_status: 0x804caa0 2011-12-15 22:42:20.713361 [22490/3052672768] lixa_monkeyrm_open: creating new second level hash table for tid=3052672768 2011-12-15 22:42:20.713719 [22490/3052672768] lixa_monkeyrm_open/g_hash_table_new_full/slht: 0x804cac8 2011-12-15 22:42:20.713994 [22490/3052672768] lixa_monkeyrm_open: creating new status block for tid=3052672768, rmid=1 2011-12-15 22:42:20.714248 [22490/3052672768] lixa_monkeyrm_open/g_malloc/mss: 0x804e998 2011-12-15 22:42:20.714521 [22490/3052672768] lixa_monkeyrm_open_init 2011-12-15 22:42:20.714783 [22490/3052672768] lixa_monkeyrm_open_init/g_array_new/mss->records: 0x804c550 2011-12-15 22:42:20.715139 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_open' 2011-12-15 22:42:20.715430 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=1, rc=0 2011-12-15 22:42:20.715691 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_start' 2011-12-15 22:42:20.715946 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=3, rc=0 2011-12-15 22:42:20.716340 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_end' 2011-12-15 22:42:20.716602 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=4, rc=0 2011-12-15 22:42:20.716864 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_prepare' 2011-12-15 22:42:20.717123 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=5, rc=0 2011-12-15 22:42:20.717377 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_commit' 2011-12-15 22:42:20.717671 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=6, rc=0 
2011-12-15 22:42:20.717937 [22490/3052672768] lixa_monkeyrm_open_init: verb='xa_close' 2011-12-15 22:42:20.718196 [22490/3052672768] lixa_monkeyrm_open_init: appending record verb=2, rc=0 2011-12-15 22:42:20.718519 [22490/3052672768] lixa_monkeyrm_open_init/excp=3/ret_cod=0/errno=0 2011-12-15 22:42:20.718816 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:20.726862 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 1, XA return code is 0 2011-12-15 22:42:20.727177 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:20.727429 [22490/3052672768] lixa_monkeyrm_open/excp=4/ret_cod=0/xa_rc=0/errno=0 2011-12-15 22:42:21.022655 [22490/3052672768] lixa_monkeyrm_start: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0 2011-12-15 22:42:21.022695 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:21.022705 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 3, XA return code is 0 2011-12-15 22:42:21.022712 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:21.022719 [22490/3052672768] lixa_monkeyrm_start/excp=4/ret_cod=0/xa_rc=0/errno=0 2011-12-15 22:42:21.059688 [22490/3052672768] lixa_monkeyrm_end: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x4000000 2011-12-15 22:42:21.059701 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:21.059709 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 4, XA return code is 0 2011-12-15 22:42:21.059716 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:21.059723 [22490/3052672768] lixa_monkeyrm_end/excp=4/ret_cod=0/xa_rc=0/errno=0 2011-12-15 22:42:21.076478 [22490/3052672768] lixa_monkeyrm_prepare: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0 2011-12-15 22:42:21.076498 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:21.076512 [22490/3052672768] lixa_monkeyrm_get_rc: 
verb is 5, XA return code is 0 2011-12-15 22:42:21.076520 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:21.076527 [22490/3052672768] lixa_monkeyrm_prepare/excp=4/ret_cod=0/xa_rc=0/errno=0 2011-12-15 22:42:21.098722 [22490/3052672768] lixa_monkeyrm_commit: xid='1279875137.7bdcf6c060e14bdba416302661187411.a100c8728292168b21ba7239bffc137d', rmid=1, flags=0x0 2011-12-15 22:42:21.098747 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:21.098756 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 6, XA return code is 0 2011-12-15 22:42:21.098764 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:21.098770 [22490/3052672768] lixa_monkeyrm_commit/excp=4/ret_cod=0/xa_rc=0/errno=0 2011-12-15 22:42:21.102841 [22490/3052672768] lixa_monkeyrm_close: xa_info='', rmid=1, flags=0x0 2011-12-15 22:42:21.102853 [22490/3052672768] lixa_monkeyrm_get_rc 2011-12-15 22:42:21.102862 [22490/3052672768] lixa_monkeyrm_get_rc: verb is 2, XA return code is 0 2011-12-15 22:42:21.102869 [22490/3052672768] lixa_monkeyrm_get_rc/excp=2/ret_cod=0/errno=0 2011-12-15 22:42:21.102877 [22490/3052672768] lixa_monkeyrm_close/excp=4/ret_cod=0/xa_rc=0/errno=0 tiian@ubuntu:~/tmp$ unset LIXA_TRACE_MASK tiian@ubuntu:~/tmp$ ./example6_pql_ora delete Deleting a row from the tables... Oracle DELETE statement executed! |
Don't forget the clean-up step (the last command, ./example6_pql_ora delete) before performing the next steps.
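Each monkeyrm.conf entry pairs an XA verb with the return code the LIXA Monkey RM must give for it, and the trace above shows every entry being stored as a record (for example verb='xa_start' followed by appending record verb=3, rc=0). A hypothetical Python validator for such a file, using the verb numbers read off that trace, could look like this; xa_rollback and the other verbs do not appear in the trace, so they are deliberately left out rather than guessed.

```python
# Verb numbers as printed by the "client XA switch" trace above
# ("appending record verb=N"); verbs missing from the trace are omitted.
VERBS = {"xa_open": 1, "xa_close": 2, "xa_start": 3,
         "xa_end": 4, "xa_prepare": 5, "xa_commit": 6}


def parse_monkeyrm_conf(text: str):
    """Turn 'verb/rc' entries into (verb number, XA return code) records,
    keeping their order; any whitespace separates entries."""
    records = []
    for token in text.split():
        verb, sep, rc = token.partition("/")
        if not sep or verb not in VERBS:
            raise ValueError(f"unsupported entry: {token!r}")
        records.append((VERBS[verb], int(rc)))
    return records


if __name__ == "__main__":
    print(parse_monkeyrm_conf("xa_open/0\nxa_start/0\nxa_end/0\n"
                              "xa_prepare/0\nxa_commit/0\nxa_close/0\n"))
```

The record list printed for the first configuration matches the sequence of “appending record” lines in the trace.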
To create a “recovery pending” transaction, we force a crash after the xa_prepare() functions have completed successfully:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ export LIXA_CRASH_POINT=15 tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT 15 tiian@ubuntu:~/tmp$ ./example6_pql_ora insert Inserting a row in the tables... Oracle INSERT statement executed! Aborted tiian@ubuntu:~/tmp$ unset LIXA_CRASH_POINT tiian@ubuntu:~/tmp$ echo $LIXA_CRASH_POINT |
Verify there is a prepared/in-doubt transaction inside Oracle:
[Oracle terminal session] |
SQL> select * from dba_pending_transactions; FORMATID ---------- GLOBALID -------------------------------------------------------------------------------- BRANCHID -------------------------------------------------------------------------------- 1279875137 D2F5F0A5C37E485CB44D9FC16E72A33D 68D0E0CCBC6FC4B7FA616DF9AB122395 |
Verify there is a prepared/in-doubt transaction inside PostgreSQL:
[PostgreSQL terminal session] |
testdb=> select * from pg_prepared_xacts; transaction | gid | prepared | owner | database -------------+------------------------------------------------------------------------------+-------------------------------+-------+---------- 964 | 1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395 | 2011-12-16 16:38:47.921947+01 | tiian | testdb |
To move the transaction to the “recovery failed” status, we emulate a “heuristically rolled back” outcome during the automatic recovery step. Edit the file monkeyrm.conf to program the new behavior of the LIXA Monkey R.M.:
[Content of file monkeyrm.conf] |
xa_open/0
xa_commit/6
xa_close/0 |
Run the lixat utility to trigger an “automatic (warm) recovery”; trace the “client recovery” module (see the section called “Tracing modules”) to understand what happens:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ export LIXA_TRACE_MASK=0x00040000 tiian@ubuntu:~/tmp$ /opt/lixa/bin/lixat 2011-12-16 16:44:09.970398 [8385/3073862320] client_recovery 2011-12-16 16:44:09.970575 [8385/3073862320] client_recovery: sending 197 bytes ('000191<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="8"><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1 " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"/></msg>') to the server for step 8 2011-12-16 16:44:10.007703 [8385/3073862320] client_recovery: receiving 632 bytes from the server |<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="16"><answer rc="0"/><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1 " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"><last_verb_step verb="5" step="16"/><state finished="0" txstate="3" will_commit="1" will_rollback="0" xid="1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395"/></client><rsrmgrs><rsrmgr rmid="0" next_verb="0" r_state="1" s_state="33" td_state="10"/><rsrmgr rmid="1" next_verb="0" r_state="1" s_state="33" td_state="10"/><rsrmgr rmid="2" next_verb="0" r_state="1" s_state="33" td_state="20"/></rsrmgrs></msg>| 2011-12-16 16:44:10.008178 [8385/3073862320] client_recovery_analyze 2011-12-16 16:44:10.008229 [8385/3073862320] client_recovery_analyze: the TX was committing 2011-12-16 16:44:10.008242 [8385/3073862320] client_recovery_analyze: rmid=0, r_state=1, s_state=33, td_state=10 2011-12-16 16:44:10.008261 [8385/3073862320] client_recovery_analyze: rmid=1, r_state=1, s_state=33, td_state=10 2011-12-16 16:44:10.008278 [8385/3073862320] client_recovery_analyze: rmid=2, r_state=1, s_state=33, td_state=20 2011-12-16 16:44:10.008303 [8385/3073862320] client_recovery_analyze/excp=1/ret_cod=0/errno=0 2011-12-16 16:44:10.008327 [8385/3073862320] client_recovery: transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395' must be committed 2011-12-16 16:44:10.008353 [8385/3073862320] 
client_recovery_commit 2011-12-16 16:44:10.008388 [8385/3073862320] client_recovery_commit: committing transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395' 2011-12-16 16:44:10.008412 [8385/3073862320] client_recovery_commit: xa_commit for rmid=0, name='LIXAmonkey1staRM', xa_name='LIXA Monkey RM (static)'... 2011-12-16 16:44:10.008464 [8385/3073862320] client_recovery_commit: rc=6 2011-12-16 16:44:10.008613 [8385/3073862320] client_recovery_commit: xa_commit for rmid=1, name='PostgreSQL_stareg', xa_name='PostgreSQL[LIXA]'... 2011-12-16 16:44:10.062892 [8385/3073862320] client_recovery_commit: rc=0 2011-12-16 16:44:10.062962 [8385/3073862320] client_recovery_commit: xa_commit for rmid=2, name='OracleXE_dynreg', xa_name='Oracle_XA'... 2011-12-16 16:44:10.275061 [8385/3073862320] client_recovery_commit: rc=0 2011-12-16 16:44:10.275128 [8385/3073862320] client_recovery_commit/excp=1/ret_cod=0/errno=0 2011-12-16 16:44:10.275286 [8385/3073862320] client_recovery: sending 212 bytes ('000206<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="24"><recovery failed="1" commit="1"/><rsrmgrs><rsrmgr rmid="0" rc="6"/><rsrmgr rmid="1" rc="0"/><rsrmgr rmid="2" rc="0"/></rsrmgrs></msg>') to the server for step 24 2011-12-16 16:44:10.275483 [8385/3073862320] client_recovery: sending 197 bytes ('000191<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="8"><client job="68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1 " config_digest="68d0e0ccbc6fc4b7fa616df9ab122395"/></msg>') to the server for step 8 2011-12-16 16:44:10.315057 [8385/3073862320] client_recovery: receiving 95 bytes from the server |<?xml version="1.0" encoding="UTF-8" ?><msg level="0" verb="8" step="16"><answer rc="1"/></msg>| 2011-12-16 16:44:10.315261 [8385/3073862320] client_recovery: the server answered LIXA_RC_OBJ_NOT_FOUND; there are no more transactions to recover 2011-12-16 16:44:10.315315 [8385/3073862320] 
client_recovery/excp=12/ret_cod=0/errno=0 tx_open(): 0 tx_close(): 0 |
The trace gives us a lot of information:
lixat fires an automatic (warm) recovery for transaction XID='1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'
the transaction must be committed because the crash followed a successful “prepare” phase
the first Resource Manager (LIXA Monkey, rmid=0) returns 6: XA_HEURRB (“the transaction branch has been heuristically rolled back”)
the second and third Resource Managers (PostgreSQL and Oracle) return 0: XA_OK
the client sends a “recovery failed” message to the state server
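The “recovery failed” message is visible in the trace (step 24). Its first six characters, 000206, appear to be a length prefix: the XML payload that follows is exactly 206 characters long, and 6 + 206 matches the 212 bytes reported. The Python sketch below, a reading aid rather than a LIXA API, decodes that message with the standard xml.etree.ElementTree module and translates the numeric return codes into the symbolic names defined by the X/Open XA specification (xa.h; partial list).

```python
import xml.etree.ElementTree as ET

# Return codes from the X/Open XA specification (xa.h), partial list.
XA_CODES = {
    8: "XA_HEURHAZ", 7: "XA_HEURCOM", 6: "XA_HEURRB", 5: "XA_HEURMIX",
    4: "XA_RETRY", 3: "XA_RDONLY", 0: "XA_OK",
    -2: "XAER_ASYNC", -3: "XAER_RMERR", -4: "XAER_NOTA", -5: "XAER_INVAL",
    -6: "XAER_PROTO", -7: "XAER_RMFAIL", -8: "XAER_DUPID", -9: "XAER_OUTSIDE",
}


def decode_recovery_message(message: str):
    """Return (recovery failed flag, {rmid: symbolic XA return code})
    for a step-24 message copied from the client recovery trace."""
    # Drop the length prefix and the XML declaration: parse from <msg>.
    root = ET.fromstring(message[message.index("<msg"):])
    failed = root.find("recovery").get("failed") == "1"
    codes = {}
    for rm in root.iter("rsrmgr"):
        rc = int(rm.get("rc"))
        codes[int(rm.get("rmid"))] = XA_CODES.get(rc, f"unknown ({rc})")
    return failed, codes


if __name__ == "__main__":
    message = ('000206<?xml version="1.0" encoding="UTF-8" ?>'
               '<msg level="0" verb="8" step="24"><recovery failed="1" commit="1"/>'
               '<rsrmgrs><rsrmgr rmid="0" rc="6"/><rsrmgr rmid="1" rc="0"/>'
               '<rsrmgr rmid="2" rc="0"/></rsrmgrs></msg>')
    print(decode_recovery_message(message))
```

For the message in the trace, the result flags the recovery as failed and shows XA_HEURRB for rmid 0 and XA_OK for the other two Resource Managers.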
You might think the LIXA Transaction Manager should not have performed the second and third xa_commit after the failure of the first Resource Manager. It seems a good idea, but it would not add much value: if applied, it would introduce a behaviour that depends on the order of the operations. The LIXA Transaction Manager tries to apply a consistent rule: after a successful xa_prepare, all the Resource Managers must receive the same command (xa_commit or xa_rollback).
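The rule can be illustrated with a small simulation; it is not LIXA's actual code. After a successful prepare, the same decision is sent to every Resource Manager and the outcome is evaluated only at the end. Each commit argument is a stand-in callable for the real xa_commit, and the return codes are the ones seen in the trace (6, XA_HEURRB, from the LIXA Monkey RM; 0, XA_OK, from the others).

```python
XA_OK = 0
XA_HEURRB = 6


def commit_all(resource_managers, xid):
    """Send the commit decision to every Resource Manager, even after a
    failure, then report whether the transaction is "recovery failed".

    resource_managers: list of (rmid, commit) pairs, where commit(xid)
    returns an XA return code (stand-in for the real xa_commit call).
    """
    results = {}
    for rmid, commit in resource_managers:
        results[rmid] = commit(xid)  # deliberately no early exit on error
    failed = any(rc != XA_OK for rc in results.values())
    return results, failed


if __name__ == "__main__":
    rms = [(0, lambda xid: XA_HEURRB),  # LIXA Monkey RM
           (1, lambda xid: XA_OK),      # PostgreSQL
           (2, lambda xid: XA_OK)]      # Oracle
    print(commit_all(rms, "1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d."
                          "68d0e0ccbc6fc4b7fa616df9ab122395"))
```

Because the loop never stops early, the set of Resource Managers that receive xa_commit does not depend on their order; with an early exit, moving the LIXA Monkey RM from the first to the last position would change which branches get committed.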
You can check the status of the (real) Resource Managers:
[Oracle terminal session] |
SQL> select * from dba_pending_transactions; no rows selected SQL> select * from COUNTRIES where COUNTRY_ID = 'RS'; COUNTR ------ COUNTRY_NAME -------------------------------------------------------------------------------- REGION_ID ---------- RS Repubblica San Marino 1 |
[PostgreSQL terminal session] |
testdb=> select * from pg_prepared_xacts; transaction | gid | prepared | owner | database -------------+-----+----------+-------+---------- (0 rows) testdb=> SELECT * FROM authors; id | last_name | first_name ----+-----------+------------ 1 | Foo | Bar |
Inspecting the system log, you can notice there was a problem:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ sudo tail /var/log/daemon.log Dec 16 16:44:09 ubuntu lixat[8385]: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 0.5.36) Dec 16 16:44:10 ubuntu lixat[8385]: LXC003C resource manager 'LIXAmonkey1staRM' returned an error (6) while committing (xa_commit) during recovery phase for transaction '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395' Dec 16 16:44:10 ubuntu lixat[8385]: LXC005W unable to recover transaction id '1279875137.d2f5f0a5c37e485cb44d9fc16e72a33d.68d0e0ccbc6fc4b7fa616df9ab122395'; this transaction must be manually recovered and the correlated record(s) must be manually fixed in lixad server status file Dec 16 16:44:10 ubuntu lixad[8157]: LXD012W a client notified recovery failed condition for the transaction registered in status file 3 and block 1 |
There is a critical message (LXC003C) and there are two warning messages (LXC005W, LXD012W). Note that the problem is reported by the server (LXD012W) as well as by the client (LXC005W). To inspect the content of the “recovery failed” transaction, stop the state server and dump its content:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ sudo su - lixa lixa@ubuntu:~$ pkill lixad lixa@ubuntu:~$ /opt/lixa/sbin/lixad --dump=u >/tmp/bar lixa@ubuntu:~$ exit logout |
and inspect the content of the file /tmp/bar [55]:
[Content of file /tmp/bar] |
======================================================================== Second file ('/opt/lixa/var/lixad_status1_2') will be dumped Magic number is: 24848 (24848) Level is: 1 (1) Last sync timestamp: 2011-12-16T16:38:23.482854+0100 Size: 17 blocks Used block chain starts at: 16 Free block chain starts at: 0 (empty chain) Dumping records following physical order: 0 Dumping records following free block chain: 0 Dumping records following used block chain: 1 ------------------------------------------------------------------------ [...] ------------------------------------------------------------------------ Block: 9, next block in chain: 8 Block type: transaction manager record (transaction header) Trnhdr/number of resource managers: 3 Trnhdr/resource manager blocks are: 10 11 12 Trnhdr/arrival time: 2011-12-16T16:26:04.790526+0100 Trnhdr/local socket address:port is 127.0.0.1:2345 Trnhdr/peer socket address:port is 127.0.0.1:52251 Trnhdr/config digest is '68d0e0ccbc6fc4b7fa616df9ab122395' Trnhdr/job is '68d0e0ccbc6fc4b7fa616df9ab122395/127.0.0.1 ' Trnhdr/last (verb, step) are: [ (9,8) (4,8) (4,16) (5,8) (5,16) ] Trnhdr/state/finished: 0 Trnhdr/state/txstate: 3 Trnhdr/state/will commit: 1 Trnhdr/state/will rollback: 0 Trnhdr/state/xid: '1279875137.592cec793be1433d8ffd18574211e0e2.68d0e0ccbc6fc4b7fa616df9ab122395' Trnhdr/recoverying block id: 0 Trnhdr/recovery failed: 1 Trnhdr/recovery failed time: 2011-12-16T16:29:15.947057+0100 Trnhdr/recovery commit: 1 ------------------------------------------------------------------------ Block: 8, next block in chain: 7 Block type: resource manager record Rsrmgr/rmid: 2 Rsrmgr/state/next_verb: 0 Rsrmgr/state/xa_r_state: 1 Rsrmgr/state/dynamic: 0 Rsrmgr/state/xa_td_state: 10 Rsrmgr/state/xa_s_state: 33 Rsrmgr/lixac_conf.xml name: 'LIXAmonkey1staRM' Rsrmgr/xa_name: 'LIXA Monkey RM (static)' Rsrmgr/xa_open_info: 'monkeyrm.conf' Rsrmgr/xa_open_flags: 0x0 Rsrmgr/xa_open_rc: 0 Rsrmgr/xa_start_flags: 0x0 Rsrmgr/xa_start_rc: 0 
Rsrmgr/xa_end_flags: 0x4000000 Rsrmgr/xa_end_rc: 0 Rsrmgr/xa_prepare_flags: 0x0 Rsrmgr/xa_prepare_rc: 0 Rsrmgr/xa_commit_flags: 0x0 Rsrmgr/xa_commit_rc: 0 Rsrmgr/xa_rollback_flags: 0x0 Rsrmgr/xa_rollback_rc: 0 Rsrmgr/xa_forget_flags: 0x0 Rsrmgr/xa_forget_rc: 0 Rsrmgr/ax_reg_flags: 0x0 Rsrmgr/ax_reg_rc: 0 Rsrmgr/ax_unreg_flags: 0x0 Rsrmgr/ax_unreg_rc: 0 Rsrmgr/recovery_rc: 6 ------------------------------------------------------------------------ Block: 7, next block in chain: 6 Block type: resource manager record Rsrmgr/rmid: 1 Rsrmgr/state/next_verb: 0 Rsrmgr/state/xa_r_state: 1 Rsrmgr/state/dynamic: 1 Rsrmgr/state/xa_td_state: 20 Rsrmgr/state/xa_s_state: 33 Rsrmgr/lixac_conf.xml name: 'OracleXE_dynreg' Rsrmgr/xa_name: 'Oracle_XA' Rsrmgr/xa_open_info: 'Oracle_XA+Acc=P/hr/hr+SesTm=30+LogDir=/tmp+threads=true+DbgFl=7+Loose_Coupling=true' Rsrmgr/xa_open_flags: 0x0 Rsrmgr/xa_open_rc: 0 Rsrmgr/xa_start_flags: 0x0 Rsrmgr/xa_start_rc: 0 Rsrmgr/xa_end_flags: 0x4000000 Rsrmgr/xa_end_rc: 0 Rsrmgr/xa_prepare_flags: 0x0 Rsrmgr/xa_prepare_rc: 0 Rsrmgr/xa_commit_flags: 0x0 Rsrmgr/xa_commit_rc: 0 Rsrmgr/xa_rollback_flags: 0x0 Rsrmgr/xa_rollback_rc: 0 Rsrmgr/xa_forget_flags: 0x0 Rsrmgr/xa_forget_rc: 0 Rsrmgr/ax_reg_flags: 0x0 Rsrmgr/ax_reg_rc: 0 Rsrmgr/ax_unreg_flags: 0x0 Rsrmgr/ax_unreg_rc: 0 Rsrmgr/recovery_rc: 0 ------------------------------------------------------------------------ Block: 6, next block in chain: 5 Block type: resource manager record Rsrmgr/rmid: 0 Rsrmgr/state/next_verb: 0 Rsrmgr/state/xa_r_state: 1 Rsrmgr/state/dynamic: 0 Rsrmgr/state/xa_td_state: 10 Rsrmgr/state/xa_s_state: 33 Rsrmgr/lixac_conf.xml name: 'PostgreSQL_stareg' Rsrmgr/xa_name: 'PostgreSQL[LIXA]' Rsrmgr/xa_open_info: 'dbname=testdb' Rsrmgr/xa_open_flags: 0x0 Rsrmgr/xa_open_rc: 0 Rsrmgr/xa_start_flags: 0x0 Rsrmgr/xa_start_rc: 0 Rsrmgr/xa_end_flags: 0x4000000 Rsrmgr/xa_end_rc: 0 Rsrmgr/xa_prepare_flags: 0x0 Rsrmgr/xa_prepare_rc: 0 Rsrmgr/xa_commit_flags: 0x0 Rsrmgr/xa_commit_rc: 0 
Rsrmgr/xa_rollback_flags: 0x0 Rsrmgr/xa_rollback_rc: 0 Rsrmgr/xa_forget_flags: 0x0 Rsrmgr/xa_forget_rc: 0 Rsrmgr/ax_reg_flags: 0x0 Rsrmgr/ax_reg_rc: 0 Rsrmgr/ax_unreg_flags: 0x0 Rsrmgr/ax_unreg_rc: 0 Rsrmgr/recovery_rc: 0 ------------------------------------------------------------------------ [...] ======================================================================== First file ('/opt/lixa/var/lixad_status2_1') will be dumped Magic number is: 24848 (24848) Level is: 1 (1) Last sync timestamp: 2011-12-16T16:23:40.512332+0100 Size: 10 blocks Used block chain starts at: 0 (empty chain) Free block chain starts at: 1 Dumping records following physical order: 0 Dumping records following free block chain: 0 Dumping records following used block chain: 1 ======================================================================== First file ('/opt/lixa/var/lixad_status3_1') will be dumped Magic number is: 24848 (24848) Level is: 1 (1) Last sync timestamp: 2011-12-16T16:38:47.933099+0100 Size: 10 blocks Used block chain starts at: 4 Free block chain starts at: 5 Dumping records following physical order: 0 Dumping records following free block chain: 0 Dumping records following used block chain: 1 ------------------------------------------------------------------------ [...] |
The chain composed of blocks 9 (transaction manager record) and
8, 7, 6 (resource manager records) keeps the state of the
“recovery failed” transaction:
Trnhdr/recovery failed: 1
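Finding the “recovery failed” records in a long dump by eye is tedious. A hypothetical Python helper, assuming the dump layout shown above (one “Block: N, next block in chain: M” header per record, transaction header fields prefixed by Trnhdr/), can list the transaction header blocks flagged as recovery failed together with their XIDs:

```python
import re

BLOCK = re.compile(r"Block: (\d+), next block in chain: (\d+)")
FAILED = re.compile(r"Trnhdr/recovery failed: (\d+)")  # not "failed time:"
XID = re.compile(r"Trnhdr/state/xid: '([^']+)'")


def failed_transactions(dump: str):
    """Return (block, xid) for every transaction header record whose
    'Trnhdr/recovery failed' field is 1."""
    found = []
    # Cut the dump into records, one per "Block: ..." header.
    starts = [m.start() for m in BLOCK.finditer(dump)] + [len(dump)]
    for start, end in zip(starts, starts[1:]):
        record = dump[start:end]
        block = int(BLOCK.match(record).group(1))
        failed, xid = FAILED.search(record), XID.search(record)
        if failed and failed.group(1) == "1" and xid:
            found.append((block, xid.group(1)))
    return found


if __name__ == "__main__":
    with open("/tmp/bar") as dump_file:  # file produced by lixad --dump=u
        print(failed_transactions(dump_file.read()))
```

Against the dump shown above, the helper would report block 9 only, because the resource manager records (blocks 8, 7 and 6) carry Rsrmgr/ fields instead of Trnhdr/ ones.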
If the “LIXA Monkey RM” Resource Manager were a real Resource Manager, you could manually recover the transaction using the procedure shown in the section called “Recoverying forgotten transactions”. Unfortunately, the LIXA Monkey RM is a fake Resource Manager that does not save its state anywhere: it cannot answer xa_recover() correctly, so there is no way to show this last step.
As a final step, to clean up the “recovery failed” state of the LIXA state server, recycle it using the special --clean-failed option:
[Shell terminal session] |
tiian@ubuntu:~/tmp$ sudo su - lixa lixa@ubuntu:~$ pkill lixad lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon --clean-failed lixa@ubuntu:~$ pkill lixad[a] lixa@ubuntu:~$ exit logout |
[a] The LIXA state server is stopped right after start-up so that the content of the state file(s) on disk is up-to-date before it is dumped.
Dump the content of the state server again:
[Shell terminal session]
tiian@ubuntu:~/tmp$ sudo su - lixa
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --dump=u >/tmp/bar
lixa@ubuntu:~$ exit
logout
and check the content of the dump file again: blocks 10, 9, 8, 7 should no longer be used or, if reused, they should belong to a different transaction.
Don't use --clean-failed by default when starting the LIXA state server (lixad):
use this option only after you have inspected the content
of the state server and resolved any in-doubt transaction.
Performing this type of recovery can be easier if the LIXA state server is running in “maintenance mode” (see the section called “Maintenance mode execution”): only lixar can access the online content of the LIXA state server and ordinary clients (Application Programs) cannot perform transactions. Running the state server in “maintenance mode” may be useful, but only you can decide whether your business rules allow it.
The LIXA technology does not require you to:
restart the lixad state server to see the current status when dumping the content of the state server files (--dump option)
start the lixad state server in “maintenance mode” when performing “manual (cold) recovery”
The LIXA technology provides these functions to simplify administrative tasks, but deciding which options to use is your own responsibility.
As explained in the section called “Automatic recovery concepts”, the “automatic (warm) recovery” is automatically performed by the LIXA Transaction Manager under the condition of “Application Program equivalence” (see the section called “Application Program equivalence”).
Sometimes you have to perform a “manual (cold) recovery” because “Application Program equivalence” no longer holds. This is a typical scenario:
an Application Program crashed and its transaction is in “in-doubt/prepared (recovery pending) status”
the Application Program didn't specify a custom value for the environment variable LIXA_JOB
you changed the content of the lixac_conf.xml file, for example by adding a new profile
the MD5 signature of the lixac_conf.xml file changed, so the associated branch qualifier changed and new transactions would be associated with a different job
automatic (warm) recovery would not be performed and the transaction would become a “forgotten” transaction.
Inspecting the list of recovery pending transactions using lixar -p does not return the desired transaction... What's going on?
By default, the lixar utility filters the transactions
retrieved from the Resource Managers: it keeps only the transactions
with the same branch qualifier as the currently running
lixar instance.
If you look at the section called “Application Program equivalence”,
you will realize that lixar
retrieves only the transactions started with
the same lixac_conf.xml,
the same $(LIXA_PROFILE) and
the same gethostid().
This behaviour helps the system engineer to see only the
“relevant” subset of the whole
“recovery pending” set.
If you are looking for all the transactions in
“recovery pending” status currently kept by the
Resource Managers associated with the current
LIXA_PROFILE, you must specify the
--bypass-bqual-check (-b) option.
This paragraph describes a side effect of the behavior explained above.
Suppose the following scenario happens:
your LIXA Transaction Manager saved the state of a “prepared/recovery pending” transaction inside the LIXA state server
you are not aware of the existence of that “prepared/recovery pending” transaction
you change the content of the lixac_conf.xml file
later you discover there is a “prepared/recovery pending” transaction that cannot be automatically recovered by the LIXA Transaction Manager, and you perform manual recovery
unfortunately, the LIXA state server will keep some records related to the manually recovered transaction “forever”.
Manual recovery does not query the LIXA state server to retrieve the list of “prepared/recovery pending” transactions, because it is designed to solve the issue “state server does not have information” about some transactions.
The lixad option --clean-failed,
explained in
the section called “Recoverying a “recovery failed” transaction”,
does not help because the transaction was not in the
“recovery failed” state.
At the time of writing, there is no specific tool to remove this type of “ghost” from the LIXA state server; you can clean up those records using a cold start as explained in the section called “Recoverying forgotten transactions”. Be careful to avoid an “in flight transaction purge”.
This behavior could be improved in a future release if users ask for it.
The LIXA project technology can also help you deal with a different Transaction Manager (see the section called “Transaction Manager and Transaction Monitor”): the lixar utility program can inspect (and manually recover) a transaction managed by a different Transaction Manager if you use two command options together:
--bypass-bqual-check (-b): to bypass branch qualifier based filtering
--bypass-formatid-check (-B): to bypass format ID based filtering
If you use the above command options together,
the lixar utility program will inspect (and possibly
commit or roll back) any XA transaction known by the Resource Managers
associated with the current LIXA_PROFILE.
It is your own responsibility to define a LIXA profile that is compatible with the configuration the third party Transaction Manager used when managing the transaction: it must contain the same Resource Managers in the same order with the same options.
LIXA software is libre/free/open source software and you use it knowingly, without any warranty and at your own risk. Using LIXA software technology this way will probably put you in an unsupported state with respect to the third party Transaction Manager supplier.
In the previous sections we have dealt with “format id” and “branch qualifier”; you might ask “How can I discover the ‘format id’ and the ‘branch qualifier’ associated with my own Application Program?”
The easiest way to retrieve them is to run the lixat
utility program with the same
LIXA_PROFILE
you use when running your
Application Program:
tiian@ubuntu:~/src/lixa$ sudo su - lixa
lixa@ubuntu:~$ /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ ps -ef|grep lixad|grep -v grep
lixa 8122 1 0 23:02 ? 00:00:00 /opt/lixa/sbin/lixad --daemon
lixa@ubuntu:~$ exit
logout
tiian@ubuntu:~/src/lixa$ /opt/lixa/bin/lixat -c
tx_open(): 0
tx_begin(): 0
tx_info(): 1
xid/formatID.gtrid.bqual = 1279875137.56af7a66398f4eca82b8826fe10165ad.9e4c11057107c73366c9fc421eaa85ca
tx_commit(): 0
tx_close(): 0
tiian@ubuntu:~/src/lixa$ /opt/lixa/bin/lixat -c
tx_open(): 0
tx_begin(): 0
tx_info(): 1
xid/formatID.gtrid.bqual = 1279875137.218f05b733fb4bc1aa3d21eeaf01fbab.9e4c11057107c73366c9fc421eaa85ca
tx_commit(): 0
tx_close(): 0
From the above terminal output:
“formatID” is constant: the LIXA Transaction Manager uses the decimal value 1279875137 (hexadecimal 0x4C495841, the ASCII codes of the string “LIXA”)
“branch qualifier” is computed as explained in the section called “Application Program equivalence” and the value in the above example is “9e4c11057107c73366c9fc421eaa85ca”
“global transaction id” is different for
every transaction and you can see two different values in the
above examples (it is generated using the
uuid_generate() function).
If you want to retrieve them programmatically
(from your own C program), you can use the standard
tx_info() function, which fills a
TXINFO struct ([TXspec]).
Note that the XID struct contains binary data,
not ASCII text.
[54] This should never happen: it could be a bug in LIXA project software or the consequence of a “cold start” of lixad (you removed the state files).
[55] The state server can be analyzed without stopping it (pkill lixad), but you might not see the current content because the state server may not have synchronized the state file(s) yet. If the LIXA state server is processing many transactions per second, you will probably see an up-to-date state; if it is “sleeping”, you probably will not.