Channel: OracleBuffer

Shareplex: Restore or Recreate missing/corrupted object cache on Target system


Introduction

This post explains and demonstrates how to restore or recreate an object cache when it is found to be missing or corrupted. An object cache is a file that Shareplex creates on the source system, and then on the target system, once a Shareplex configuration is activated. This file contains entries that are Shareplex’s representation of the Oracle table structures.

More details about Shareplex object cache can be found here.

This post focuses on restoring or recreating the object cache file when it is found to be missing or corrupted. If the object cache gets corrupted or goes missing for any reason, Shareplex replication breaks, because it can’t fetch the object structures from the respective cache.

When an object cache is missing or corrupted on the target site, we can observe errors similar to the following in the Shareplex event log.

Notice   2015-12-05 17:13:57.245056 11275 1980524368 Poster: Input/output: File not found: /shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5  (posting from orppdb13_1, queue mylab-01, to mypdb_01) [module osp]
Error    2015-12-05 17:13:57.245146 11275 1980524368 Poster cannot read object cache for actid 5  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Error    2015-12-05 17:13:57.245856 11275 1980524368 Poster stopped: Internal error encountered; cannot continue  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Info     2015-12-05 17:13:58.251920 11272 4031776592 Poster exited with code=1, pid = 11275  (posting from orppdb13_1, queue mylab-01, to mypdb_01)

or something like

Info     2015-12-05 19:11:27.546419 26993 1 Poster exited with code=1, pid = 27948  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Error    2015-12-05 19:11:27.515512 27948 1 Poster stopped: Internal error encountered; cannot continue  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Error    2015-12-05 19:11:27.498093 27948 1 Poster: 15010 - Error reading pre-sync'd objcache for datasource objectcache_name, actid 4 (posting from orppdb13_1, queue mylab-01, to mypdb_01) [module opo]
Error    2015-12-05 19:11:27.484628 27948 1 Poster: 17006 - Cannot open object cache: Input/output: /shareplex/mypdb01_2103/state/0x0a01009c+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.6: open() failed (posting from orppdb13_1, queue mylab-01, to mypdb_01) [module osp]
Info     2015-12-05 19:11:20.075846 27948 1 Poster launched, pid = 27948  (posting from orppdb13_1, queue mylab-01, to mypdb_01)

These errors cause the Post process to stop, because it can’t access the object cache to read the object structures.
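The event log already records the exact path the Poster was looking for, so you can extract it with a short shell function. The following is a minimal sketch. It assumes the log lines look like the ones above, with a ‘/’-separated path ending in ‘objcache_sp_opst.<actid>’. The function name ‘missing_objcache’ is hypothetical.

```shell
# Print each distinct target object cache path mentioned in a Shareplex event log.
# Hypothetical helper; assumes paths look like .../objcache_sp_opst.<actid>,
# as in the "File not found" and "Cannot open object cache" messages above.
missing_objcache() {
    # grep -o prints only the matching part of each line (GNU and BSD grep support it);
    # [^ :] stops the match at the space or colon that follows the path.
    grep -o '/[^ :]*objcache_sp_opst\.[0-9][0-9]*' "$1" | sort -u
}
```

For example, you could run ‘missing_objcache $SP_SYS_VARDIR/log/event_log’ on the target system. This assumes the event log sits in the log directory under $SP_SYS_VARDIR.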

The good news is that, depending on which copies are still available, this object cache can be restored or recreated. When a Shareplex configuration is activated, Shareplex creates the object cache for that activation ID on both the source and target systems.

Shareplex creates three copies of the object cache on the source system (for configuration, read process and capture process) and a single copy on the target system (for post process) for a given activation. These files have the following formats.

On Source System

---//
---// Object Cache format on source system //---
---//
o.[Source_ORACLE_SID]-objcache_sp_conf.[Activation_ID] --> For the Shareplex configuration
o.[Source_ORACLE_SID]-objcache_sp_ordr.[Activation_ID] --> For the Reader process
o.[Source_ORACLE_SID]-objcache_sp_ocap.[Activation_ID] --> For the Capture process

Example:

o.orppdb13_1-objcache_sp_conf.5
o.orppdb13_1-objcache_sp_ordr.5
o.orppdb13_1-objcache_sp_ocap.5
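The source-side naming pattern is regular enough to generate. The following small sketch prints the three expected file names for a given source ORACLE_SID and activation ID. The helper name is hypothetical.

```shell
# Print the three source-side object cache file names for one activation,
# following the pattern o.[Source_ORACLE_SID]-objcache_sp_<proc>.[Activation_ID].
source_objcache_names() {
    sid=$1      # source ORACLE_SID, e.g. orppdb13_1
    actid=$2    # activation ID, e.g. 5
    for proc in conf ordr ocap; do   # configuration, Reader, Capture
        printf 'o.%s-objcache_sp_%s.%s\n' "$sid" "$proc" "$actid"
    done
}
```

Running ‘source_objcache_names orppdb13_1 5’ prints the three example names above. You can then check them with ‘ls’ in $SP_SYS_VARDIR/state to see which copies exist.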

On Target System

 
---//
---// Object Cache format on target system //---
---//
[Hexa_ID]+PP+[source_host]+sp_opst+[o.Source_ORACLE_SID]-[o.Target_ORACLE_SID]-objcache_sp_opst.[Activation_ID]

Example:

0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5 
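The target-side name can be assembled the same way. This post does not explain the leading hex ID, so the sketch below treats it as an opaque string that you copy from the event log or an existing file name. The helper name is hypothetical.

```shell
# Build the target-side (Poster) object cache file name from its parts, following
# [Hexa_ID]+PP+[source_host]+sp_opst+[o.Source_ORACLE_SID]-[o.Target_ORACLE_SID]-objcache_sp_opst.[Activation_ID]
# The hex ID is passed through unchanged; it is not derived here.
target_objcache_name() {
    hexid=$1 src_host=$2 src_sid=$3 tgt_sid=$4 actid=$5
    printf '%s+PP+%s+sp_opst+o.%s-o.%s-objcache_sp_opst.%s\n' \
        "$hexid" "$src_host" "$src_sid" "$tgt_sid" "$actid"
}
```

For the example above, ‘target_objcache_name 0xc0a8e60d mylab-01 orppdb13_1 mypdb_01 5’ reproduces the target file name.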

Note: All these object cache files contain the same information, and Shareplex keeps them in sync whenever the object structures change.

If any one of these files gets corrupted or goes missing, we can restore it from any of the other available copies. But what if we lose all of them?

In that case, there is no way to restore these files, and the only option is to reactivate the configuration. Let’s quickly go through a demonstration to see how the restoration process works.

Demonstration

In my case, the object cache on the target site went missing, and Shareplex reported the following errors in the event log.

##---
##--- event_log reported following errors
##---
Notice   2015-12-05 17:13:57.245056 11275 1980524368 Poster: Input/output: File not found: /shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5  (posting from orppdb13_1, queue mylab-01, to mypdb_01) [module osp]
Error    2015-12-05 17:13:57.245146 11275 1980524368 Poster cannot read object cache for actid 5  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Error    2015-12-05 17:13:57.245856 11275 1980524368 Poster stopped: Internal error encountered; cannot continue  (posting from orppdb13_1, queue mylab-01, to mypdb_01)
Info     2015-12-05 17:13:58.251920 11272 4031776592 Poster exited with code=1, pid = 11275  (posting from orppdb13_1, queue mylab-01, to mypdb_01)

As the error shows, Shareplex cannot find the object cache file ‘/shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5’ for the Poster process, which caused the Poster process to stop.

The process status confirms that the Post process stopped because of these errors, since it can’t read the object structures from the cache.

##---
##--- Poster process in error state
##---
sp_ctrl (mylab-02:2103)> show

Process    Source                               Target                 State                   PID
---------- ------------------------------------ ---------------------- -------------------- ------
Capture    o.mypdb_01                                                  Running               11273
Read       o.mypdb_01                                                  Running               11276
Import     mylab-01                             mylab-02               Running               11342
Post       o.orppdb13_1-mylab-01                o.mypdb_01             Stopped - due to error
Export     mylab-02                             mylab-01               Running               11274
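To check process status from a script instead of by eye, you can read the State column from saved ‘show’ output (for example, pasted into a file). The following is a minimal awk-based sketch. It assumes the column layout shown above and that states begin with "Running" or "Stopped"; any other state prints ‘unknown’. The function name ‘sp_state’ is hypothetical.

```shell
# Print the State column of one process row from saved sp_ctrl "show" output.
# Assumes the layout shown above; states other than Running/Stopped print "unknown".
sp_state() {
    awk -v proc="$1" '
        $1 == proc {
            start = match($0, /(Running|Stopped)/)
            if (start == 0) { print "unknown"; next }
            state = substr($0, start)
            sub(/[ \t]+[0-9]+[ \t]*$/, "", state)   # drop the trailing PID, if any
            print state
        }'
}
```

Fed the table above, ‘sp_state Post’ prints ‘Stopped - due to error’ and ‘sp_state Capture’ prints ‘Running’.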

As mentioned earlier, Shareplex maintains multiple copies of the object cache for a given activation (configuration): three on the source system and one on the target system. We can therefore restore the missing object cache on the target system from one of the source system copies.

In order to restore the object cache file from source, we first need to find the current activation ID of the missing file. There are different ways to find the activation ID.

The simplest way is to read it from the name of the missing target object cache file. The file name ends with a number (objcache_sp_opst.5), and this number (5) is the activation ID of the source configuration.
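Reading the activation ID off a file name is a plain suffix extraction: everything after the last dot. The following sketch uses only POSIX parameter expansion. The helper name is hypothetical.

```shell
# Extract the activation ID from an object cache file name or path:
# the text after the last "." in the file name.
objcache_actid() {
    name=${1##*/}              # drop any leading directory
    printf '%s\n' "${name##*.}"
}
```

It works for both source and target names. For example, both ‘o.orppdb13_1-objcache_sp_ocap.5’ and the missing ‘...objcache_sp_opst.5’ yield 5.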

We can also confirm this by querying the source Shareplex activation table SHAREPLEX_ACTID, as shown below.

---//
---// Querying source SHAREPLEX_ACTID table to find ACTID //---
---//
SQL> select ACTID
  2  from splex.shareplex_actid;

     ACTID
----------
         5

We can also find the activation ID with the ‘show config’ command on the source site, as shown below.

##---
##--- Finding ACTID using 'show config' command 
##---
sp_ctrl (mylab-01:2103)> show config

Tables Replicating with Key:

  "MYAPP"."T_EMP_DATA"   KEY: ID

Tables Replicating with no key:

  "MYAPP"."T_APP_USERS"

File Name  :sp_orppdb131.conf
Datasource :orppdb13_1
Activated  :05-Dec-15 17:02:47
Actid      :5

Total Objects                 :2
Total Objects Replicating     :2
Total Objects Not Replicating :0

View config summary in /shareplex/pdb131_2103//log/orppdb13_1_config_log
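If the ‘show config’ output is saved to a file, a one-line awk filter can pick out the Actid value. This makes it easy to compare against the suffix of the missing file name. The following sketch assumes the ‘Actid      :5’ line layout shown above. The function name is hypothetical.

```shell
# Read "show config" output on stdin and print the Actid value.
# Assumes the "Actid      :<n>" layout shown above; "Activated" lines do not match.
config_actid() {
    awk -F: '/^Actid[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

If this value differs from the number at the end of the missing target file name, the file being restored belongs to a different activation.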

Once we have the activation ID, we can easily find the source object cache file to use for restoring the target object cache. The object cache files reside in the $SP_SYS_VARDIR/state directory on both the source and target systems.

Let’s find out the source object cache file that can be used to restore the missing object cache file on target site.

##---
##--- Finding source object cache using pattern 
##--- o.[Source_ORACLE_SID]-objcache_sp_ocap.[Activation_ID]
##---
[oracle@mylab-01 ~]$ cd $SP_SYS_VARDIR/state
[oracle@mylab-01 state]$ ls -lrt o.orppdb13_1-objcache_sp_ocap*5
-rw-r--r-- 1 oracle dba 1180 Dec  5 17:04 o.orppdb13_1-objcache_sp_ocap.5

In the above step, we located the source Capture process object cache for activation ID 5. We can use this file to recreate the object cache for the target Poster process.
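This demonstration uses the Capture process copy. Because all copies for an activation hold the same information, the Reader or configuration copy would serve equally well if the Capture copy were also gone. The following minimal sketch returns the first non-empty source copy for a given activation, checking ocap, then ordr, then conf (an arbitrary order chosen here). The function name is hypothetical. It only detects missing or empty files, not a copy that is present but corrupted.

```shell
# Print the first surviving source-side object cache copy for an activation.
# Returns 1 if none is found, in which case reactivating the configuration
# is the remaining option. Checks existence and non-zero size only.
surviving_objcache() {
    state_dir=$1 sid=$2 actid=$3
    for proc in ocap ordr conf; do
        f="$state_dir/o.$sid-objcache_sp_$proc.$actid"
        if [ -s "$f" ]; then      # exists and is non-empty
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

On the source system of this demonstration, ‘surviving_objcache $SP_SYS_VARDIR/state orppdb13_1 5’ would print the ‘o.orppdb13_1-objcache_sp_ocap.5’ path.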

The next step is to stop the source capture process to ensure that the object cache remains consistent during the restore process.

##---
##--- stopping source capture to ensure object cache consistency
##---
sp_ctrl (mylab-01:2103)> stop capture

sp_ctrl (mylab-01:2103)> show

Process    Source                               Target                 State                   PID
---------- ------------------------------------ ---------------------- -------------------- ------
Capture    o.orppdb13_1                                                Stopped by user
Read       o.orppdb13_1                                                Running               18193
Import     mylab-02                             mylab-01               Running               18232
Post       o.mypdb_01-mylab-02                  o.orppdb13_1           Running               18192
Export     mylab-01                             mylab-02               Running               18243

With the source Capture process stopped, let’s locate the directory on the target system where the object cache file needs to be restored.

The object cache file must be restored in the $SP_SYS_VARDIR/state directory on the target system. Let’s find out where this variable points to on the target system.

##---
##--- finding state directory location on target system
##---
[oracle@mylab-02 ~]$ cd $SP_SYS_VARDIR/state
[oracle@mylab-02 state]$ pwd
/shareplex/mypdb01_2103/state

Based on our findings, we need to copy the source object cache file ‘o.orppdb13_1-objcache_sp_ocap.5’ from the source state directory ‘/shareplex/pdb131_2103/state’ to the target state directory ‘/shareplex/mypdb01_2103/state’ and rename it to ‘0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5’.
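The new name can also be derived directly from the source file name. The source ORACLE_SID and activation ID come from the source file. You copy the hex ID and source host from the missing file name in the event log, since this post does not explain how the hex ID is generated. The following is a sketch only. The helper name is hypothetical, and it prints the destination path without copying anything.

```shell
# Derive the target-side object cache path from a source-side copy.
# Args: source file, hex ID, source host, target ORACLE_SID, target state directory.
restore_destination() {
    src=${1##*/}                      # e.g. o.orppdb13_1-objcache_sp_ocap.5
    hexid=$2 src_host=$3 tgt_sid=$4 tgt_state=$5
    src_sid=${src%%-objcache_*}       # -> o.orppdb13_1 (keeps the "o." prefix)
    actid=${src##*.}                  # -> 5
    printf '%s/%s+PP+%s+sp_opst+%s-o.%s-objcache_sp_opst.%s\n' \
        "$tgt_state" "$hexid" "$src_host" "$src_sid" "$tgt_sid" "$actid"
}
```

With the values from this demonstration, ‘restore_destination o.orppdb13_1-objcache_sp_ocap.5 0xc0a8e60d mylab-01 mypdb_01 /shareplex/mypdb01_2103/state’ prints the same destination path used in the scp command.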

I am using scp to copy and rename the file in a single step, as shown below.

##---
##--- Copying object cache file from source to target
##--- renaming the object cache file on target during the copy process
##---
[oracle@mylab-01 state]$ pwd
/shareplex/pdb131_2103/state
[oracle@mylab-01 state]$ scp o.orppdb13_1-objcache_sp_ocap.5 oracle@mylab-02:/shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5
oracle@mylab-02's password:
o.orppdb13_1-objcache_sp_ocap.5                                                                                                                            100% 1180     1.2KB/s   00:00

We have now restored the missing target object cache file for the Poster process. Let’s validate that the target object cache file is in place.

##---
##--- Validating object cache file is restored on target system
##---
[oracle@mylab-02 state]$ pwd
/shareplex/mypdb01_2103/state
[oracle@mylab-02 state]$ ls -lrt /shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5
-rw-r--r-- 1 oracle dba 1180 Dec  5 17:45 /shareplex/mypdb01_2103/state/0xc0a8e60d+PP+mylab-01+sp_opst+o.orppdb13_1-o.mypdb_01-objcache_sp_opst.5
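The ‘ls’ output confirms the name and the size (1180 bytes on both systems). To also confirm that the content matches, you can compare a POSIX ‘cksum’ fingerprint on each host. The following sketch uses a hypothetical helper name. cksum’s CRC is meant to catch copy errors, not tampering.

```shell
# Print "CRC size" for a file using POSIX cksum (reading stdin, so no file name).
# Run it against the source copy and the restored target copy and compare.
objcache_fingerprint() {
    cksum < "$1" | awk '{ print $1, $2 }'
}
```

Run it on mylab-01 against ‘o.orppdb13_1-objcache_sp_ocap.5’ and on mylab-02 against the restored file. Identical output lines suggest the copy is intact.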

We can now start the Capture process on the source system, as well as the Poster process on the target site (which failed because of the missing object cache).

##---
##--- Starting Capture process on source system
##---
sp_ctrl (mylab-01:2103)> show

Process    Source                               Target                 State                   PID
---------- ------------------------------------ ---------------------- -------------------- ------
Capture    o.orppdb13_1                                                Stopped by user
Read       o.orppdb13_1                                                Running               18193
Import     mylab-02                             mylab-01               Running               18232
Post       o.mypdb_01-mylab-02                  o.orppdb13_1           Running               18192
Export     mylab-01                             mylab-02               Running               18243

sp_ctrl (mylab-01:2103)> start capture
sp_ctrl (mylab-01:2103)> show

Process    Source                               Target                 State                   PID
---------- ------------------------------------ ---------------------- -------------------- ------
Capture    o.orppdb13_1                                                Running               20558
Read       o.orppdb13_1                                                Running               18193
Import     mylab-02                             mylab-01               Running               18232
Post       o.mypdb_01-mylab-02                  o.orppdb13_1           Running               18192
Export     mylab-01                             mylab-02               Running               18243

Let’s now start the Poster process on the target site. It should be able to resume operation now that the object cache file has been restored.

##---
##--- Starting poster process on target 
##---
sp_ctrl (mylab-02:2103)> start post
sp_ctrl (mylab-02:2103)> show

Process    Source                               Target                 State                   PID
---------- ------------------------------------ ---------------------- -------------------- ------
Capture    o.mypdb_01                                                  Running               11273
Read       o.mypdb_01                                                  Running               11276
Import     mylab-01                             mylab-02               Running               11342
Post       o.orppdb13_1-mylab-01                o.mypdb_01             Running               12270
Export     mylab-02                             mylab-01               Running               11274

sp_ctrl (mylab-02:2103)> show post

Host   : mylab-02.oraclebuffer.com
Source : o.orppdb13_1     Queue : mylab-01

                           Operations
Target     Status              Posted Since              Total      Backlog
---------- --------------- ---------- ------------------ ---------- ----------
o.mypdb_01 Running                  4 05-Dec-15 17:48:15          0          0

   Last operation posted:
        Redo log: 0              Log offset: 0
        INTERNAL OPERATION

   Last transaction posted:
        Redo log: 26             Log offset: 9176880
        SCN: 1915634             Source time: 12/05/15 17:15:44
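After the Poster resumes, the Backlog column of ‘show post’ shows whether it is keeping up. If the output is saved to a file, you can read that column per target with awk. The following sketch assumes the column layout shown above, where Backlog is the last field of the target row. The function name is hypothetical.

```shell
# Read "show post" output on stdin and print the Backlog value (last field)
# for the given target row, e.g. o.mypdb_01. Prints nothing if no row matches.
post_backlog() {
    awk -v tgt="$1" '$1 == tgt && NF >= 3 { print $NF }'
}
```

For the output above, ‘post_backlog o.mypdb_01’ prints 0.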

The Poster process has resumed its operation, and the object cache errors no longer appear.

