I’m spoiled. While we build a fair number of environments each year, we also have basic starting standards. Because of this, I sometimes miss the basics when a problem shows up. Or at least it takes me longer to get there.
In this case, we had a couple of alerts over the high-volume weekend (Black Friday 2013). They were alerts from our connection monitor. We had done some tuning before the holiday, which included increasing MAXFILOP. This database uses largely SMS tablespaces and is on an older version of DB2 (and WCS 6). The alerts were transient: as soon as someone logged in to look at them, connections were working just fine. Looking in db2diag.log on Monday morning, I saw a number of entries like this:
2013-12-02-09.49.35.996713-300 I634382E367        LEVEL: Severe
PID     : 10811                TID  : 47251525193888 PROC : db2agent (ESB19Q02) 0
INSTANCE: db2inst1             NODE : 000           DB   : ESB19Q02
APPHDL  : 0-1206               APPID: *LOCAL.db2inst1.133012144937
FUNCTION: DB2 UDB, base sys utilities, sqleserl, probe:10
RETCODE : ZRC=0xFFFFEC73=-5005

2013-12-02-09.49.35.996180-300 I632343E481        LEVEL: Error
PID     : 10811                TID  : 47251525193888 PROC : db2agent (ESB19Q02) 0
INSTANCE: db2inst1             NODE : 000           DB   : ESB19Q02
APPHDL  : 0-1206               APPID: *LOCAL.db2inst1.133012144937
FUNCTION: DB2 UDB, config/install, sqlf_read_db_and_verify, probe:30
MESSAGE : SQL5005: sqlf_openfile rc =
DATA #1 : Hexdump, 4 bytes
0x00007FFFFEC5864C : 0600 0F85
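One quick way to see how often this was happening is a simple grep against the diagnostic log. This sketch assumes the default diagnostic path under the instance owner's home directory; adjust if DIAGPATH has been changed:

$ # count occurrences of the SQL5005 error in the diagnostic log
$ grep -c "SQL5005" ~/sqllib/db2dump/db2diag.log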
One time, I actually managed to catch the error at the command line – it looked like this:
$ db2 connect to esb19q02
SQL5005C System Error.
In researching this, I found this helpful technote: http://www-01.ibm.com/support/docview.wss?uid=swg21403936
And while I first thought I needed to increase MAXFILOP further, I figured out that the real problem was the ulimit on open files:
$ ulimit -a
...
open files                      (-n) 1024
...
This particular instance had three databases on it, all with SMS tablespaces, and one with over a thousand tables. The MAXFILOP settings for the three databases added up to 4096, well over the open-files limit of 1024.
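To see how those numbers compare, MAXFILOP can be pulled from each database configuration and summed by hand. A quick sketch, where DBNAME2 and DBNAME3 are placeholders for the other two databases on the instance:

$ # pull MAXFILOP from each database configuration on this instance
$ for db in ESB19Q02 DBNAME2 DBNAME3; do
>   db2 get db cfg for $db | grep -i maxfilop
> done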
In order to increase the limit, I added the following lines to /etc/security/limits.conf, as root:
db2inst1 soft nofile 16192
db2inst1 hard nofile 16192
… where db2inst1 is my instance owner.
Modifying the ulimit directly as the instance owner did not work:
$ ulimit -n 16192
-bash: ulimit: open files: cannot modify limit: Operation not permitted
Unfortunately, these settings do not take effect until the next time the database manager is started (db2stop/db2start), so I had to schedule that outage. I could also have done it with a failover to avoid an actual outage.
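The recycle itself is simple. A sketch of the steps, run as the instance owner from a fresh login session so the new limits.conf values are picked up:

$ db2stop
$ db2start
$ # confirm the new limit is in effect for the instance owner's session
$ ulimit -n
16192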
Alternatively, MAXFILOP could be lowered across the databases so that the totals stay under the ulimit, with the side effect of possibly decreasing database performance, but preventing an outright inability to connect.
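If lowering MAXFILOP were the chosen route, the change would look something like this. The value of 1024 is purely illustrative; it would need to be sized so the totals across all databases stay under the open-files ulimit:

$ db2 update db cfg for ESB19Q02 using MAXFILOP 1024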
With automatic storage tablespaces now so easy to use, and the default, I see fewer and fewer databases making extensive use of SMS tablespaces.