In today’s post, I am going to discuss an issue that I faced recently. One of my application teams sent me an e-mail stating that they were not able to connect to the database and were intermittently receiving TNS connection errors.
The application team informed me that they were using the following connection string.
##----
##---- DB connection string used by App team
##----
(DESCRIPTION =
  (ADDRESS = (PROTOCOL = TCP)(HOST = rac01.oraclebuffer.com)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = rac02.oraclebuffer.com)(PORT = 1521))
  (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = prod_app.oraclebuffer.com)
  )
)
As a first step, I immediately checked the status of the service and could see it was up and running fine on both cluster instances.
##----
##---- Service status from clusterware
##----
oracle@rac01:/home/oracle: srvctl status service -d orpproda
Service prod_app is running on instance(s) orpproda1, orpproda2
I could also see that the service was registered with the listener and was in READY status on both cluster nodes.
##----
##---- Service registration status from listener (node 1)
##----
orpproda1@rac01:/home/oracle: lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 10-DEC-2015 03:24:13

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                16-OCT-2015 22:10:53
Uptime                    54 days 6 hr. 13 min. 20 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /app/oracle/product/11.1.0/db/network/admin/listener.ora
Listener Log File         /oraadmin/diag/tnslsnr/rac01/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.56)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.55)(PORT=1521)))
Services Summary...
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
  Instance "orpproda2", status READY, has 1 handler(s) for this service...
The command completed successfully

##----
##---- Service registration status from listener (node 2)
##----
oracle@rac02:~> lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 10-DEC-2015 05:37:01

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                03-NOV-2015 12:53:45
Uptime                    36 days 16 hr. 43 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /app/oracle/product/11.1.0/db/network/admin/listener.ora
Listener Log File         /oraadmin/diag/tnslsnr/rac02/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.58)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.57)(PORT=1521)))
Services Summary...
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 1 handler(s) for this service...
  Instance "orpproda2", status READY, has 2 handler(s) for this service...
The command completed successfully
I did not see any issue with the service status. As a next step, I looked into the listener log file and could see (on the first node) that connections were indeed failing with TNS errors.
##----
##---- Connection errors from listener log
##----
09-DEC-2015 10:50:16 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=prod_app.oraclebuffer.com)(CID=(PROGRAM=C:\oracle\product\11.2.0\client_1\sqldeveloper\sqldeveloper\bin\sqldeveloperW.exe)(HOST=DEV01-PTL-D)(USER=DEV01))(INSTANCE_NAME=orpproda1)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.201.35.106)(PORT=54124)) * establish * prod_app.oraclebuffer.com * 12520
TNS-12520: TNS:listener could not find available handler for requested type of server

09-DEC-2015 11:27:03 * (CONNECT_DATA=(SID=orpproda1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.56)(PORT=31280)) * establish * orpproda1 * 12519
TNS-12519: TNS:no appropriate service handler found
At that point, I decided to check the service state from the listener, and the following is what I found.
##----
##---- Service state from listener (node 1)
##----
orpproda1@rac01:/home/oracle: lsnrctl services | grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:915 refused:0 state:blocked   ##---> preventing incoming connections
         REMOTE SERVER

##----
##---- Service state from listener (node 2)
##----
oracle@rac02:~> lsnrctl services | grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:59 refused:0 state:ready
         REMOTE SERVER
As we can see from the above output, the service handler is in a blocked state for the first instance (orpproda1). This means that even though the service is up, running, and registered with the listener, the listener will refuse any incoming connection for this service on that instance for as long as the handler remains blocked.
Now the question is: why is the service showing in a blocked state? The most likely reason is a database resource limitation, for example the processes limit being exhausted, leaving the database unable to accept new connections. When this type of resource limitation arises, the listener blocks the service to prevent any incoming connection from reaching the underlying database.
There are a few MOS articles that explain this behaviour as well; MOS Doc ID 240710.1 is one of them.
Considering this, I immediately checked the database resource utilization and found the following.
---//
---// resource limit per instance
---//
SQL> select inst_id,name,value from gv$parameter where name='processes';

   INST_ID NAME                 VALUE
---------- -------------------- ----------
         1 processes            1000
         2 processes            1000

---//
---// resource utilization per instance
---//
SQL> select inst_id,count(*) from gv$session group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
         1         77
         2         87
As we can see from the output, our database resource utilization was very low, and we should not have been receiving those TNS errors. At that moment I was unable to determine the root cause and therefore could not yet fix the issue.
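As an additional cross-check (not captured in the output above), the historical high-water mark of the processes resource can be queried from gv$resource_limit; a sketch:

---//
---// processes/sessions high-water mark per instance (sketch)
---//
SQL> select inst_id, resource_name, current_utilization, max_utilization, limit_value
  2    from gv$resource_limit
  3   where resource_name in ('processes','sessions')
  4   order by inst_id, resource_name;

A max_utilization close to the limit_value would confirm that the limit had indeed been hit at some earlier point, even if the current utilization looks healthy.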
At this point I was sure that the problem at hand was related neither to the service status nor to database resource utilization: the service was running and registered with the listener, and we had adequate resources available to accept new database connections.
I suspected that the listener might not be aware of the current state of the service (i.e. the current database resource utilization), and that this was why it was blocking connections for that service. I decided to check the listener log for the last time the service status had been updated by PMON.
When I looked for that information, I could see that the service status had not been updated for the first cluster instance for a long time.
##---
##--- service_update count from listener log (for instance 1)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda1 | grep 09-DEC-2015 | wc -l
0      ##-->> (PMON is not performing service update)

##---
##--- service_update count from listener log (for instance 2)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda2 | grep 09-DEC-2015 | wc -l
1696
At this point, I was almost sure that this was an issue related to dynamic registration of the service by PMON. It looked like PMON was not performing dynamic service registration (service_update) at all for the first instance (orpproda1).
It is possible that the database processes were exhausted at an earlier time (as indicated by the MOS note), which caused the listener to block incoming connections. However, once resource utilization returned to normal, the listener should have unblocked the service based on the current utilization reported by PMON, which did not happen in this case.
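Before forcing a re-registration, it is also worth a quick sanity check that the instance's registration-related parameters still point at the expected listeners (shown here only as a sketch):

---//
---// confirm listener registration parameters (sanity check)
---//
SQL> show parameter local_listener
SQL> show parameter remote_listener

If these parameters are wrong or empty on a RAC node, PMON may register with the wrong listener (or none at all), which produces very similar symptoms.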
I went ahead and performed the service registration on behalf of PMON (it looked like PMON was busy with some other work, or had forgotten about the existence of the listener).
---//
---// manual service registration
---//
SQL> alter system register;

System altered.

---//
---// service state from listener
---//
SQL> !lsnrctl services | grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:915 refused:0 state:ready   ---> changed from blocked to ready
         REMOTE SERVER
Our problem was resolved. The moment I registered the service manually, the service state changed from blocked to ready. I could also observe that PMON had recovered its memory (maybe it was in a coma) and was once again performing periodic dynamic service updates.
##---
##--- service_update count from listener log (for instance 1)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda1 | grep 10-DEC-2015 | wc -l
89     ##-->> (PMON is now performing periodic service update)

##---
##--- service_update count from listener log (for instance 2)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda2 | grep 10-DEC-2015 | wc -l
120
Although the immediate issue was resolved, the root cause was excessive connections being made to the database. This was identified by reviewing the listener logs and building an hourly report of the number of connections established to the database.
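Such a report can be derived from the listener log by counting the establish requests per hour; a rough sketch of the kind of command used (dates and paths as in this environment):

##----
##---- hourly count of connection (establish) requests from listener log (sketch)
##----
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep " establish " listener.log | grep 09-DEC-2015 | awk '{print $1, substr($2,1,2)":00"}' | sort | uniq -c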
Looking at this report, it was evident that the current setting of the processes parameter (1000) was not adequate to handle the incoming database traffic. On many occasions the number of connections reached the maximum allowed limit, which caused the listener to block the service. What happened here, however, is that although the connections were eventually released and the process count came down far below the allowed threshold, PMON did not perform a service_update, which left the service permanently blocked.
To address the root cause, we need to increase the value of the processes parameter as suggested by the MOS note, which would make room for new incoming connections and prevent the listener from blocking the service.
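As a sketch, the change itself is a one-liner; the value of 2000 below is purely illustrative and should be sized from the actual connection report. Since processes is a static parameter, the instances need to be restarted (typically in a rolling fashion) for it to take effect:

---//
---// increase processes limit (illustrative value)
---//
SQL> alter system set processes=2000 scope=spfile sid='*';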
I was interested in knowing why PMON had stopped performing the periodic service update. Unfortunately, I did not have adequate traces/logs handy to dig any further down that route.
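Should the problem reoccur, listener tracing can be switched on in advance so that the service_update traffic from PMON is actually captured; a sketch, assuming the default listener name LISTENER (support-level tracing produces large trace files, so it should be used with care):

##----
##---- enable support-level listener tracing (sketch)
##----
oracle@rac01:/home/oracle: lsnrctl
LSNRCTL> set current_listener LISTENER
LSNRCTL> set trc_level 16
LSNRCTL> save_config
LSNRCTL> exit

There is also a PMON-side registration trace event documented on MOS that can be set at the database level, but I would confirm the exact event and level with Oracle Support before using it.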
I hope you find this post useful in resolving similar connection issues!