In this post I would like to share my experience of setting up an Oracle Data Guard (DG) Fast-Start Failover High Availability (HA) environment with a Physical Standby Database and Data Guard Broker enabled, using 2 PCs with Ubuntu 7.10 Desktop and Oracle 10g R2.
Data Guard and High Availability is not much of DBA work really, there is no tuning, data modelling or core DBA skills like SQL involved.
This is all about the chit-chat between two servers, lots of networking magic-do, and one server informing the other when it is down. Is like SysAdmin disguised as a DBA. You can also call it a "poor man's RAC", I suppose.
It took me about 2-3 hours, depending on the size of your database, to set up and successfully implement the following.
Summary of steps
My instructions will be as brief and as neat as possible. Here we go:
1. Configure the primary database
1.1 Enable Forced Logging on the primary database
SQL> conn / as sysdba
SQL> alter database force logging
1,2 Create a password file with orapwd on then primary database
$ orapwd file=/usr/local/oracle/product/10.2.0.1/dbs/orapwtest01 password=kubi entries=2
1.3 Configure standby redo Logs on the primary database, these will be needed when the primary changes role and becomes standby
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 50M;
1.4 Get a better than the default standby_archive_dest location
SQL> alter system set standby_archive_dest='/u01/oradata/test01/standby_archive_dest';
1.5 Create pfile from the existing database spfile, if you have used dbca to create your db you probably have an spfile
SQL> create pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora' from spfile;
1.6 Then Edit the pfile and add those Data Guard specific parameters
On the primary database box edit the pfile for the primary database like this, add the following at the end of the file, change your host names and database SID accordingly.
# DG Config PRIMARY ROLE initialization parameters
*.db_unique_name=host_istanbul
*.db_domain='mediterranean'
*.log_archive_config='DG_CONFIG=(host_istanbul,host_london)'
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=host_istanbul'
*.log_archive_dest_2='SERVICE=TO_HOST_london LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=host_london'
*.log_archive_dest_state_1=ENABLE
*.log_archive_dest_state_2=ENABLE
*.log_archive_max_processes=4
*.db_flashback_retention_target=4320
*.undo_retention=3600
#
# DG Config STANDBY ROLE initialization parameters
*.fal_server=host_london
*.fal_client=host_istanbul
*.standby_file_management=auto
#
# Flashback
*.db_recovery_file_dest=/u01/oradata/test01/fra
*.db_recovery_file_dest_size=524288000
1.7 Then startup the primary database instance using the pfile with the new Data Guard parameters
SQL> startup force pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';
1.8 After the instance startup, recreate the spfile to include the new added Data Guard parameters
SQ> create spfile from pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';
1.9 Bounce the primary database so that it starts up using the spfile and the new Data Guard parameters
SQL> shutdown immediate;
SQL> startup mount;
1.10 Put the primary database in archivelog mode
SQL> alter database archivelog;
1.11 Enable flashback on the primary database, flashback will be necessary for fast-start failovers
SQL> alter database flashback on;
SQL> alter database open;
Now you have a database ready to be used as primary database with all Data Guard configuration parameters in place. It is now time to create the physical standby database, lets move on.
2. Create a physical standby database from your primary database using RMAN
2.1 Create a backup copy of the primary database data files with RMAN (Oracle 10g R2) ready to be used for standby duplication.
# !/bin/bash
# Unix controls
#
trap cleanup 1 2 3 15
cleanup()
{
echo "Caught CTRL-C Signal ... exiting script."
exit 1
}
#!/bin/bash
# Oracle Environemt Variables
#
export ORACLE_SID=test01
export ORACLE_BASE=/usr/local/oracle
export ORACLE_HOME=/usr/local/oracle/product/10.2.0.1
export PATH=$PATH:${ORACLE_HOME}/bin
#
rman target=/ <<
EOF
configure controlfile autobackup on;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/backup/test01/standby_backups/autobkpcontrolfile_%F';
run {
change archivelog all crosscheck;
allocate channel rman_back_ch01 type disk;
allocate channel rman_back_ch02 type disk;
backup as compressed backupset incremental level 0 database
format '/u01/backup/test01/standby_backups/sbybk_inc0_%s_%p' include current controlfile for standby;
sql "alter system archive log current";
backup as compressed backupset archivelog all format '/u01/backup/test01/standby_backups/archlog_%s_%p';
release channel rman_back_ch01;
release channel rman_back_ch02;
}
EOF
2.2 Prepare an Initialization Parameter File for the standby database
oracle@istanbul:~$ scp /usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora
oracle@london:/usr/local/oracle/product/10.2.0.1/dbs/
On the standby box after you copy a pfile from the primary database default location edit like this:
# DG Config PRIMARY ROLE initialization parameters
*.db_unique_name=host_london
*.db_domain='mediterranean'
*.log_archive_config='DG_CONFIG=(host_istanbul,host_london)'
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=host_london'
*.log_archive_dest_2='SERVICE=TO_HOST_istanbul LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=host_istanbul'
*.log_archive_dest_state_1=ENABLE
*.log_archive_dest_state_2=ENABLE
*.log_archive_max_processes=4
*.db_flashback_retention_target=4320
*.undo_retention=3600
#
# DG Config STANDBY ROLE initialization parameters
*.fal_server=host_istanbul
*.fal_client=host_london
*.standby_file_management=auto
# Flashback
*.db_recovery_file_dest=/u01/oradata/test01/fra
*.db_recovery_file_dest_size=524288000
#
2.3 Startup the standby database in NOMOUNT state and create it
SQL> startup pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora' nomount;
Before you duplicate the database for standby with RMAN you have to configure connectivity between the 2 boxes with tnsnames.ora.
On the Primary System your tnsnames.ora file should look like this:
TO_HOST_london =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)
TO_HOST_istanbul =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)
On the Standby System your tnsnames.ora file should look like this:
TO_HOST_istanbul =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)
TO_HOST_london =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)
This is so that the boxes can communicate with each other via those service names. Then you can duplicate the primary database on the standby box using RMAN like this
Make sure orapwd is run on london as well
orapwd file=/usr/local/oracle/product/10.2.0.1/dbs/orapwtest01 password=kubi entries=2
Then connect to RMAN on the primary database and start the creation of the standby database via RMAN.
oracle@istanbul$ rman target /
RMAN>
connect auxiliary sys/kubi@to_host_london
run
{
allocate auxiliary channel ch1 type disk;
duplicate target database for standby dorecover nofilenamecheck;
release channel ch1;
}
2.4 Add the standby log files on the standby database
If you watch the backup in the alert.log file RMAN gives the recommendation of adding the standby log files anyway.
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 52428800;
2.5 Create SPFILE for the standby database
SQL> create spfile from pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';
2.6 Bounce the standby database to pickup the changes and start with an spfile.
SQL> startup force mount;
2.7 Put the standby database in ARCHIVELOG mode.
SQL> alter database archivelog;
2.8 Put the standby database inf FLASHBACK mode
SQL> alter database flashback on;
2.9 Put the standby database in constant recovery mode receiving archived logs from the primary database
To start Redo Apply, issue the following statement:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
3. Oracle Data Guard and Data Broker (DGMGRL) configuration
3.1. On both Primary and Standby Database set the parameter DG_BROKER_START to TRUE
SQL> ALTER SYSTEM SET DG_BROKER_START=TRUE SCOPE=BOTH;
3.2. Add global_db_name parameters and other parameters to the listener.ora file on both primary and standby as follows, on Primary Database your listener.ora
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
(PROGRAM = extproc)
)
(SID_DESC =
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
(SID_DESC =
(SID_NAME=pdb)
(GLOBAL_DBNAME=host_istanbul_DGMGRL.mediterranean)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
)
LISTENER = (DESCRIPTION =
(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=istanbul)
(PORT=1521))))
SQLNET.EXPIRE_TIME=2
on Standby Database your listener.ora
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
(PROGRAM = extproc)
)
(SID_DESC =
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
(SID_DESC =
(SID_NAME=pdb)
(GLOBAL_DBNAME=host_london_DGMGRL.mediterranean)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
)
LISTENER = (DESCRIPTION =
(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=london)
(PORT=1521))))
SQLNET.EXPIRE_TIME=2
3.3 Create the DGMGRL broker Configuration
$dgmgrl
DGMGRL> connect sys/kubi@to_host_istanbul;
DGMGRL> create configuration mediterranean_dg as primary database is host_istanbul connect identifier is to_host_istanbul;
DGMGRL> add database host_london as connect identifier is to_host_london maintained as physical;
DGMGRL> enable configuration;
3.4. Enabling Fast-Start Failover and the Observer
DGMGRL> EDIT DATABASE 'host_istanbul' SET PROPERTY 'LogXptMode'='SYNC';
DGMGRL> EDIT DATABASE 'host_london' SET PROPERTY 'LogXptMode'='SYNC';
3.5 Specify the FastStartFailoverTarget property
DGMGRL> EDIT DATABASE 'host_istanbul' SET PROPERTY FastStartFailoverTarget='host_london';
DGMGRL> EDIT DATABASE 'host_london' SET PROPERTY FastStartFailoverTarget='host_istanbul';
3.6. Upgrade the protection mode to MAXAVAILABILITY, if necessary.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
3.7. Enable fast start failover
DGMGRL> ENABLE FAST_START FAILOVER;
3.8. Start the observer.
DGMGRL> CONNECT sys/kubi@to_host_istanbul;
DGMGRL> START OBSERVER;
Observer started
The above command will just hang, will not return you back to the prompt, this is how the observer is started, it is normal. Start anothter dgmgrl prompt for the rest of the operations.
3.8 Check the configuration so far
DGMGRL> connect sys/kubi@to_host_istanbul
Connected.
DGMGRL> show configuration;
Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target
Current status for "mediterranean_dg":
SUCCESS
Also see the verbose output where you can see how long it will take for a box to failover to the other.
DGMGRL> SHOW CONFIGURATION VERBOSE;
Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target
Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul
Current status for "mediterranean_dg":
SUCCESS
4. Do the failover
Test by killing PMON of database instance test01 on the primary database box istanbul, this automatically will trigger a failover within 30 seconds to the standby database as configured. You can watch this exciting event happening by looking at the alert_tst01.log on both boxes simultaneously.Once you crashed test01 on host_istanbul, then on host_london go to dgmgrl and check out what it says.
oracle@london$ dgmgrl
DGMGRL> connnect sys/kubi@to_host_london
DGMGRL> show configuration;
Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database
Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings
That is, host_istanbul database is down!
Be careful, the database is down and NOT the box. As I am running observer on host_istanbul as well, if you unplug the box, probably nothing nice will happen as the observer will be incapable of detecting anything. In a real life situation I suppose the observer runs on a 3rd piece of hardware. I haven't tested the 'unplugging' of the box, I don't know what it will do. I just 'kill -9' the PMON backgroud process for the primary database instance test01.
Anyway, your primary database now is host_london. You have failed over to it successfully in 30 seconds.
If you followed the configuration steps above you will be able to REINSTATE a disabled standby database as you have flashback enabled and put it back in the High Availability environment after you mount it. This is what we will do next.
5. Reinstate the failed primary database as a physical standby database after the failover
Fast-Start failover is a very good configuration for High Availability (HA) as it requires no intervention from the DBA, almost, you wish!
Say the primary database fails, say at 04:00 am, when you are sleeping, you get no phonecall from your boss, and the standby database becomes primary. Oleey! business continues. The next day when you realise what happened, you can with Data Guard Observer and using flash back logs reinstate the failed primary to be a standby to the new primary.
Basically Data Guard 'rolls forward' the SCN of the old Primary to match that of the now new primary (ex Standby). That is what Flashback Database does and it is all about. Moving the SCN to catch up with the primary.
In all situations, you will have to manually mount the failed ex-Primary database, probably the next day, if it has no media failure and can be restarted without any problems.
Oracle Data Guard Broker can then reinstate the failed primary database as the new standby database. Changing of roles. But, before you reinstate the failed primary database host_istanbul, first see the state of both databases by logging in to the DGMGRL from the new primary database host_london.
5.1 Login to DGMGML from the new primary database host and checkout your configuration
oracle@london:~$ dgmgrl
DGMGRL for Linux: Version 10.2.0.1.0 - Production
Copyright (c) 2000, 2005, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/kubi@to_host_london;
DGMGRL> show configuration verbose;
Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database
Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul
Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings
5.2 After this you are sure you have failed over and the new primary database is working properly go to the failed ex-primary database box and mount the instance test01.
oracle@istanbul:~$ sqlplus /nolog
SQL*Plus: Release 10.2.0.1.0 - Production on Thu May 24 23:35:58 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 608174080 bytes
Fixed Size 1220844 bytes
Variable Size 167776020 bytes
Database Buffers 436207616 bytes
Redo Buffers 2969600 bytes
Database mounted.
SQL>
5.3 Go back to DGMGRL on new primary database host_london and run the reinstate database command
DGMGRL> reinstate database 'host_istanbul';
You have to give it some time, the configuration will not immediately show the reinstated status, but after a few minutes you will get the following.
DGMGRL> show configuration;
Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database
- Fast-Start Failover target
host_london - Primary database
Current status for "mediterranean_dg":
SUCCESS
Congratulations! You just had a successfully failed over and now your primary database is host_london and the standby database is host_istanbul and the obsesrver is watching them.
NOTES
Be patient when you are working with this configuration use a test system, you can use VMWare to have multiple nodes in one box and make sure you have:
- password files with the same password on both machines.
- you have tnsnames.ora set up on both machines and you can communicate between both machines via service names
- you have flashback enabled on both databases
- you have configured a db_domain and is the same in both init.ora files
References:
- Oracle® Data Guard Concepts and Administration 10g Release 2 (10.2) Part Number B14239-01 link: http://youngcow.net/doc/oracle10g/server.102/b14239/rcmbackp.htm
- Using Data Guard to do fail-over and other cool features here:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14230/cli.htm#i1005573