Channel: DB2 Settings – db2commerce.com

Two Worlds of DB2 LUW Performance Monitoring

I generally suggest both my readers and my clients turn on the monitor switches using the DFT_MON* parameters in the DBM CFG. However, I find myself using traditional snapshots less and less. The main time I still use them is when I’m panicked and my older training kicks in. But thinking back today, the only time in the last month that I used a “GET SNAPSHOT” was when working on an 8.2 database (which is still supported by IBM when it is in conjunction with WebSphere Commerce).

Two Worlds

You can think of the two monitoring methods as the “old” way and the “new” way of accessing performance monitoring data. Beginning in earnest with DB2 9.7, IBM introduced the MON_* table functions and views as a lighter-weight methodology for monitoring DB2. The IBM DB2 Info Center refers to them simply as “Monitoring Routines and Views”. IBM describes them as having less impact on the database being monitored, and also as IBM’s strategic direction for database monitoring.

The older methodology is referred to as “Snapshot Monitoring” – you used to have no choice but the GET SNAPSHOT command. SQL methods have been introduced over the years, so there are other ways to get to the data. I fully expect that in some version IBM will announce the deprecation and finally discontinue snapshot monitoring.

Today we are in transition between the two methods: more functionality is being added to the monitoring routines and views with Fix Packs of 9.7, and it takes those of us with significant experience with the older method some time to move over to fully using the new one. We try to make old stuff work with the new methodologies. If there were a direct equivalent of the snapshot monitor’s “RESET MONITOR” command, some would take it up quicker. I imagine there are technical reasons that there isn’t.

Also, I’m slow to the game. Maybe SAP certifies a new version of DB2 within 9 weeks, but IBM WebSphere Commerce? I’m lucky if they do within TWO YEARS, and even luckier if they don’t require my clients to buy expensive separate DB2 licenses to do it. I hear they’re working on that, but have yet to see DB2 10.1 be allowed with any version of WebSphere Commerce, and 10.1 has been out for well over a year now. If you are experienced and already know all this new monitoring methodology inside and out, just comment and help a fellow DBA out. Perhaps something more cutting edge should be on my wishlist the next time I change jobs.

Monitor Switches and Snapshot Monitoring

The monitor switches control what data DB2 collects for the older snapshot monitoring interface. Yes, they can be turned on at the command line for a particular session, but if you set the default settings for them in the DBM CFG, then you’ll have DB2 collecting the data you’ll need if you run into a performance problem. To check them:

$ db2 get dbm cfg |grep DFT_MON
Buffer pool                         (DFT_MON_BUFPOOL) = ON
Lock                                   (DFT_MON_LOCK) = ON
Sort                                   (DFT_MON_SORT) = ON
Statement                              (DFT_MON_STMT) = ON
Table                                 (DFT_MON_TABLE) = ON
Timestamp                         (DFT_MON_TIMESTAMP) = ON
Unit of work                            (DFT_MON_UOW) = ON

To set them, use:

db2 update dbm cfg using DFT_MON_BUFPOOL ON DFT_MON_LOCK ON DFT_MON_SORT ON DFT_MON_STMT ON DFT_MON_TABLE ON DFT_MON_UOW ON HEALTH_MON OFF IMMEDIATE

I included the setting for turning the Health Monitor off. I don’t use it. If you don’t use it, turn it off – it can cause performance problems.

Changes to these parameters take effect online, as long as you are attached to the instance and use the “immediate” keyword.
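A quick way to spot switches that are still off is to filter the DBM CFG output. The following is only a small sketch: the function name `dft_mon_off` is made up for illustration, and it assumes the output format shown above (name in parentheses, then `=`, then the value).

```shell
# Print the name of every DFT_MON_* switch whose value is OFF.
# Reads "db2 get dbm cfg" output on stdin, so it also works on saved output.
dft_mon_off() {
    awk '/\(DFT_MON_[A-Z]+\)/ && $NF == "OFF" {
        name = $(NF - 2)          # the "(DFT_MON_xxx)" field
        gsub(/[()]/, "", name)    # strip the parentheses
        print name
    }'
}

# Typical use (requires a DB2 environment):
#   db2 get dbm cfg | dft_mon_off
```

An empty result means every default monitor switch in the output is already ON.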

What does turning these on get you? It gets you data for all fields in the “GET SNAPSHOT” output, for all SYSIBMADM views, and for most monitoring table functions that do not start with “MON_”. That includes things like SYSIBMADM.SNAPDB. These can be useful as a transition.

With this older methodology, you can issue the “RESET MONITOR” command (for example, “RESET MONITOR ALL”) to reset the counters as seen by a particular session. The most useful aspect of this is to have a script that connects to a database, resets the monitor counters, sleeps for an hour (or some other amount of time), and then takes snapshots to files. This lets us get data that we know covers only a very specific period of time, though the dynamic SQL snapshot was always exempt from that methodology. I still capture data that way on most of my databases as an emergency backup, since some of my newer scripts still have bugs that I’m working out.
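A script along those lines might look roughly like the sketch below. It is not the author's script: the function names, the file naming scheme, and the one-hour default are assumptions for illustration. It relies on the CLP keeping the same back-end session across `db2` calls in one shell, so that the reset and the later snapshot belong to the same session.

```shell
# Build the output file name for one snapshot run (pure helper, no DB2 needed).
snapshot_file() {
    # $1 = output directory, $2 = database alias, $3 = timestamp string
    printf '%s/snap_%s_%s.out\n' "$1" "$2" "$3"
}

# Reset counters, wait, then write a full snapshot to a file.
timed_snapshot() {
    # $1 = database alias, $2 = output directory, $3 = seconds to wait (default 3600)
    db=$1; outdir=$2; wait_secs=${3:-3600}
    db2 connect to "$db" || return 1
    db2 reset monitor for database "$db"     # zero the counters as seen by this session
    sleep "$wait_secs"
    out=$(snapshot_file "$outdir" "$db" "$(date +%Y%m%d%H%M%S)")
    db2 get snapshot for all on "$db" > "$out"
    db2 connect reset
}

# Typical use (requires a DB2 environment):
#   timed_snapshot SAMPLE /db2/snapshots 3600
```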

Looking at Data Elements to Determine Which Switch Must be on for Data to be Collected

Sometimes you have a particular area or a particular metric that you are especially interested in. Using the IBM DB2 Info Center, it is easy to see which switch must be on to collect data for a particular metric. Simply search to find the page on that metric, and you’ll get something that looks like this:
(Screenshot: a monitor element page from the IBM DB2 Info Center.)
Notice in Table 2 that the right hand column lists the name of the monitor switch that must be turned on for this metric to be collected.

Remember that the monitor switch must be turned on before any issue happens for DB2 to collect data about that issue. If you plan on making extensive use, or even backup emergency use, of the snapshot monitors, it is a good idea to have all of the monitor switches on by default.

Newer MON_* Monitoring Routines and Views

I’ve written a number of posts on these:
My New Best Friend – mon_ Part 1: Table Functions
My New Best Friend – mon_ Part 2: Views
My love affair with them continues.

One of their chief disadvantages has been that they always report data accumulated since the last database restart, and there is no way to reset the counters or limit data to a specific time period. I’ve said it before, and I’ll say it again: my favorite developerWorks article in recent years is Monitoring in DB2 9.7, Part 1: Emulating data reset with the new DB2 9.7 monitoring table functions. This excellent article includes the scripts you need to emulate a monitor reset with the lightweight monitoring routines and views in DB2 9.7 and above. I have extended the methodology for my own use to include the package cache and some other tidbits, and also to keep history tables of data. My old scripts took hourly snapshots to files; the new ones write to tables instead, and it is so easy to look through that data with SQL for performance trends or to pinpoint issues.
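The idea behind emulating a reset is simple: the counters are cumulative, so you store a baseline and subtract it from a later reading. The sketch below shows only that subtraction step, outside the database. It is not the developerWorks article's script (which works with tables in SQL); the function name and the "metric value" file layout are assumptions for illustration.

```shell
# Subtract baseline counters from current counters.
# Both files hold "metric_name value" pairs, one per line.
# A metric missing from the baseline is treated as starting at 0.
counter_delta() {
    # $1 = baseline file, $2 = current file (baseline must not be empty)
    awk 'NR == FNR { base[$1] = $2; next }     # first file: remember the baseline
         { print $1, $2 - base[$1] }           # second file: print the difference
        ' "$1" "$2"
}

# Typical use, with files captured an hour apart:
#   counter_delta baseline.txt current.txt
```

The same baseline-and-subtract pattern applies whether the readings live in flat files, as here, or in history tables.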

There are actually some configuration parameters that control what they collect, too. Unlike the old snapshot monitoring interface, these parameters are in the DB cfg. They look like this:

$ db2 get db cfg for wc005p01 |grep MON_
 Request metrics                       (MON_REQ_METRICS) = BASE
 Activity metrics                      (MON_ACT_METRICS) = BASE
 Object metrics                        (MON_OBJ_METRICS) = BASE
 Unit of work events                      (MON_UOW_DATA) = NONE
 Lock timeout events                   (MON_LOCKTIMEOUT) = WITHOUT_HIST
 Deadlock events                          (MON_DEADLOCK) = WITHOUT_HIST
 Lock wait events                         (MON_LOCKWAIT) = NONE
 Lock wait event threshold               (MON_LW_THRESH) = 5000000
 Number of package list entries         (MON_PKGLIST_SZ) = 32
 Lock event notification level         (MON_LCK_MSG_LVL) = 2

Like the DFT monitor switches for the old snapshot monitoring, changes to these take place online. However, they take effect for new connections only – existing connections are not affected. This could be problematic for an application that retains the same connections for long periods of time. Also, there is no way to turn them on for only a particular session like monitor switches – they’re either on or they’re not.

The same information is available in the IBM DB2 Info Center for each monitor element. To look at the same one as in the previous section:
(Screenshot: the same monitor element page from the IBM DB2 Info Center.)
Notice Table 1 – the right hand column tells us what parameter and setting we need to collect data for this element. In the case of the example here, to get the data for this monitoring element, MON_OBJ_METRICS must be set to BASE or higher.

Most of these parameters allow a setting of “BASE”, “NONE”, or “EXTENDED” and default to “BASE” – which I am much happier about than the default settings for the snapshot monitoring interface that I have always disagreed with. Unlike the old snapshot monitoring, some of these settings can affect what event monitors collect too. See my post on Analyzing Deadlocks – the new way to see an example of how that works.

The info center on each of these parameters tells us what information they pertain to. The ones to focus on that roughly equate to the same kind of data as the old snapshot monitoring interface are:

  • MON_REQ_METRICS
  • MON_ACT_METRICS
  • MON_OBJ_METRICS

MON_REQ_METRICS

Monitoring Request Metrics
The default for databases migrated from previous DB2 versions is NONE. For newly created databases, the default is BASE.
The possible values are NONE, BASE, and EXTENDED.
Setting the parameter to BASE or EXTENDED will cause data for the following to be collected:

  • MON_GET_UNIT_OF_WORK
  • MON_GET_CONNECTION
  • MON_GET_SERVICE_SUBCLASS
  • MON_GET_WORKLOAD
  • Statistics event monitor (you can only access this data if event monitor exists)
  • Unit of work event monitor (you can only access this data if event monitor exists)

MON_GET_WORKLOAD is actually the one I use in place of a database snapshot, so this is an important one.

MON_ACT_METRICS

Monitoring Activity Metrics
The default for databases migrated from previous DB2 versions is NONE. For newly created databases, the default is BASE.
The possible values are NONE, BASE, and EXTENDED.
Setting the parameter to BASE or EXTENDED will cause data for the following to be collected:

  • MON_GET_ACTIVITY_DETAILS
  • MON_GET_PKG_CACHE_STMT
  • Activity event monitor (you can only access this data if event monitor exists)

MON_GET_PKG_CACHE_STMT is probably my favorite table function, so this one is critical for me.
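As an illustration of the kind of query this parameter feeds, here is a minimal sketch that lists the statements with the highest execution time in the package cache. The function name, the column selection, and the default of 10 rows are choices made for this example only; adjust to taste.

```shell
# Print a query for the most expensive statements in the package cache.
top_pkg_cache_sql() {
    # $1 = number of rows to return (default 10)
    cat <<EOF
SELECT NUM_EXECUTIONS,
       STMT_EXEC_TIME,
       SUBSTR(STMT_TEXT, 1, 100) AS STMT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
ORDER BY STMT_EXEC_TIME DESC
FETCH FIRST ${1:-10} ROWS ONLY
EOF
}

# Typical use (requires a database connection):
#   db2 connect to <dbname>
#   db2 "$(top_pkg_cache_sql 5 | tr '\n' ' ')"
```

With MON_ACT_METRICS set to NONE, the time-based columns in a query like this would not be populated, which is why the setting matters.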

MON_OBJ_METRICS

Monitoring Object Metrics
The default for databases migrated from previous DB2 versions is NONE. For newly created databases, the default is BASE.
The possible values are NONE, BASE, and EXTENDED.
Setting the parameter to BASE or EXTENDED will cause data for the following to be collected:

  • MON_GET_BUFFERPOOL
  • MON_GET_TABLESPACE
  • MON_GET_CONTAINER

This one would be critical if you were tuning I/O or memory – pretty critical areas, too.

The recommendation for all three of these parameters would be to set them at BASE.
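For a migrated database where some of these are still at NONE, a small helper can generate the matching update command from the DB CFG output. This is a sketch only: the function name is hypothetical, and it prints the command rather than running it so you can review it first.

```shell
# Read "db2 get db cfg" output on stdin and print an UPDATE DB CFG command
# that sets any of the three metrics parameters found at NONE to BASE.
# Prints nothing when none of them is set to NONE.
mon_metrics_fix_cmd() {
    # $1 = database alias
    awk -v db="$1" '
        /\(MON_(REQ|ACT|OBJ)_METRICS\)/ && $NF == "NONE" {
            name = $(NF - 2)
            gsub(/[()]/, "", name)
            args = args " " name " BASE"
        }
        END { if (args != "") print "db2 update db cfg for " db " using" args }
    '
}

# Typical use (requires a DB2 environment):
#   db2 get db cfg for wc005p01 | mon_metrics_fix_cmd wc005p01
```

Remember that the change applies to new connections only, so long-lived application connections keep the old behavior until they reconnect.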

Summary

In DB2 9.7 there are two worlds of performance monitoring, and a transition is occurring from the old snapshot monitoring to the new monitoring routines and views. Many of the same system monitor elements are available in each method. In the IBM DB2 Info Center page that we looked at above, in the left hand column of tables 1 and 2 you will see what table functions or snapshots you can use to see a particular element. Every row in a snapshot and every column in the table functions and views is represented with a similar page in the IBM DB2 Info Center.

New functionality is being added to the new monitoring routines and views, and there are things there that you can’t get otherwise. One example is static SQL from the package cache. With the old method you had to use an event monitor for that, which has far more database impact and also produces a lot of data to parse through. I’m not sure if IBM is intentionally removing some monitor elements that were in the old snapshots, or if there are some they just haven’t gotten around to, or if they have a different approach in mind, but there are also elements that are simply missing from the new monitoring routines and views. Ones that I’ve looked for lately and found missing include “connections_top” and “x_lock_escals”.

If you’ve been a DBA for a while or support particularly old versions of DB2, it’s best to know both of them. If you are a new DBA, focus on the newer methodology.


HADR_TIMEOUT vs. HADR_PEER_WINDOW

It has taken me a while to fully understand the difference between HADR_TIMEOUT and HADR_PEER_WINDOW. I think there is some confusion here, so I’d like to address what each means and some considerations when setting them. In general, you’ll only need HADR_TIMEOUT when using HADR and only need HADR_PEER_WINDOW when using TSA(db2haicu) or some other automated failover tool.

HADR_TIMEOUT

HADR_TIMEOUT defines, in seconds, how long after the other HADR server is first noticed to be unavailable the HADR state will change from connected to disconnected. If you are starting HADR on the primary server and the primary cannot connect to the standby within this number of seconds, the start will fail and HADR will not be running. Assuming no failover software and HADR_PEER_WINDOW set to 0, the primary server will continue processing transactions without sending them to the standby. It will periodically retry the connection to the standby, and if the standby becomes available, it will again process transactions with commits tied to the requirements of the SYNCMODE being used.

If attempting a takeover without force, DB2 will wait this amount of time to attempt to communicate with the other server before failing and returning an error message.

The real point of this time period is to allow minor network hiccups to occur without other action being taken, but yet to consider the connection failed so as not to impede transactions after a reasonable period of time.

Setting this value depends on your network. I have a client with frequent network issues where I keep this value at 300. I have other clients where I use simply 120, which seems to work well for most environments. I have seen it set as low as 10 seconds for a very highly available network where seconds of slowdown are not very acceptable, but would be very cautious setting it that low.

HADR_PEER_WINDOW

This parameter is not usually used when only HADR is in place with manual failover. But it is critical if using an automated failover for HADR such as TSA(db2haicu) or others. This tells DB2 how long AFTER the connection is considered failed to continue to behave as if the connection were not failed. Now that may sound a bit odd. But the real intention here is to allow the connection to be considered failed, and then give time for that failure to be detected by the failover automation software before any transactions are allowed to complete and compromise the data. This means you can easily have connections waiting for as much as HADR_TIMEOUT plus HADR_PEER_WINDOW before a failover is completed and your database is again available.

Most frequently I see HADR_PEER_WINDOW set to 300 out of an abundance of caution – actual takeovers do not generally take that long, though in a failure state there may be multiple factors slowing down the failover.
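To make the combined effect concrete, the worst-case wait described above is just the sum of the two settings. A tiny sketch (the function name is made up; the values 120 and 300 are the ones mentioned in this post):

```shell
# Worst-case seconds that connections may wait after the standby is lost:
# HADR_TIMEOUT to declare the connection failed, then HADR_PEER_WINDOW on top.
hadr_max_wait() {
    # $1 = HADR_TIMEOUT (seconds), $2 = HADR_PEER_WINDOW (seconds)
    echo $(( $1 + $2 ))
}

hadr_max_wait 120 300    # prints 420

# Setting both parameters (requires a DB2 environment):
#   db2 update db cfg for <dbname> using HADR_TIMEOUT 120 HADR_PEER_WINDOW 300
```

So with these values, connections could wait up to seven minutes before a failover completes.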

SQL5005C and Ulimit Issues

I’m spoiled. While we build a fair number of environments each year, we also have basic starting standards. Because of this, I sometimes miss the basics when a problem shows up. Or at least it takes me longer to get there.

In this case, we had a couple of alerts over the high-volume weekend (Black Friday 2013). They were alerts from our connection monitor. We had done some tuning before the holiday, which included increasing MAXFILOP. This database is largely SMS tablespaces and an older version of DB2 (and WCS 6). The alerts were transient – as soon as someone logged in to look at them, connections were working just fine. Looking in the db2 diag log on Monday morning, I saw a number of entries like this:

2013-12-02-09.49.35.996713-300 I634382E367        LEVEL: Severe
PID     : 10811                TID  : 47251525193888PROC : db2agent (ESB19Q02) 0
INSTANCE: db2inst1             NODE : 000         DB   : ESB19Q02
APPHDL  : 0-1206               APPID: *LOCAL.db2inst1.133012144937
FUNCTION: DB2 UDB, base sys utilities, sqleserl, probe:10
RETCODE : ZRC=0xFFFFEC73=-5005

2013-12-02-09.49.35.996180-300 I632343E481        LEVEL: Error
PID     : 10811                TID  : 47251525193888PROC : db2agent (ESB19Q02) 0
INSTANCE: db2inst1             NODE : 000         DB   : ESB19Q02
APPHDL  : 0-1206               APPID: *LOCAL.db2inst1.133012144937
FUNCTION: DB2 UDB, config/install, sqlf_read_db_and_verify, probe:30
MESSAGE : SQL5005: sqlf_openfile rc = 
DATA #1 : Hexdump, 4 bytes
0x00007FFFFEC5864C : 0600 0F85   

One time, I actually managed to catch the error at the command line – it looked like this:

$ db2 connect to esb19q02
SQL5005C  System Error.

In researching this, I found this helpful technote: http://www-01.ibm.com/support/docview.wss?uid=swg21403936

And while I first thought that I needed to increase MAXFILOP, I figured out that it was really the ulimit that was my problem:

$ ulimit -a
...
open files                      (-n) 1024
...

This particular instance had three databases on it, all with SMS tablespaces, and one with over a thousand tables. The settings for MAXFILOP for the three databases added up to 4096.
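The comparison being made here can be sketched as a small check. It follows this post's reasoning that the summed MAXFILOP values are the figure to compare against the open-files limit. The function name is hypothetical, and the split of 2048/1024/1024 is invented for the example; only the total of 4096 and the limit of 1024 come from this system.

```shell
# Succeed (exit status 0) if the open-files limit covers the summed MAXFILOP values.
nofile_covers_maxfilop() {
    # $1 = open-files limit as a number (e.g. from "ulimit -n"),
    # remaining arguments = MAXFILOP value of each database
    limit=$1; shift
    total=0
    for v in "$@"; do
        total=$(( total + v ))
    done
    echo "limit=$limit maxfilop_total=$total"
    [ "$limit" -ge "$total" ]
}

# The situation described above, with a hypothetical per-database split:
nofile_covers_maxfilop 1024 2048 1024 1024 || echo "open-files limit is too low"
```

Run as the instance owner, the first argument would normally be `$(ulimit -n)`; the check assumes that returns a number rather than "unlimited".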

In order to increase the limit, I added the following lines to /etc/security/limits.conf, as root:

db2inst1    soft    nofile    16192
db2inst1    hard    nofile    16192

… where db2inst1 is my instance owner.

Modifying the ulimit as the instance owner itself did not work:

$ ulimit -n 16192
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Unfortunately, these settings do not take effect until the next time the database manager is started (db2stop/db2start), so I had to schedule that outage. I could have also done it with a failover to avoid the actual outage.

To prevent the issue, MAXFILOP could actually be lowered across the databases, with the side effect of possibly decreasing database performance, but preventing an actual inability to connect.

With the modifications to make automatic storage tablespaces so easy to use, and the default, I see fewer and fewer databases making extensive use of SMS tablespaces.

Do you see a ‘Congested’ State for HADR while Performing Reorgs?

I monitor HADR closely, and want to be paged out if HADR goes down. Why? Because any subsequent failure would then be a serious problem. Once HADR has a problem, I suddenly have a single point of failure. That said, for some clients we have to tell our monitoring tools not to alert on HADR during online reorgs. The reason for that is that HADR tends to get into a ‘Congested’ state during online reorgs for certain databases.

Describing the Problem

Essentially what we’re seeing is network congestion. We can see it for one of two major reasons:

  1. A lot of log files are being generated, and the standby is having trouble keeping up
  2. An operation that takes a long time such as an offline reorg, but requires relatively few log entries

In the case of online reorgs, I have certainly seen a lot of logging occurring.
This is what the problem looks like when you get HADR status using the db2pd command:

HADR Information:
Role    State                SyncMode HeartBeatsMissed   LogGapRunAvg (bytes)
Primary Peer                 Nearsync 0                  991669

ConnectStatus ConnectTime                           Timeout
Congested     Wed Nov  25 20:31:26 2010 (1283970686) 120

On the primary, DB2 will write the active log file containing the start of the reorg off to disk, so transactions are not impacted there, but on the standby, DB2 cannot do that – it needs to apply whatever is in that log buffer immediately, and if it cannot, it has to wait until the operation in question is finished before moving on.

To prevent the Primary and Standby getting out of sync, DB2 uses this Congested state, and it may actually cause transactions to wait. If, like me, you run a number of reorgs in a window, you may see DB2 going in and out of this state as various tables are reorged. I’m also pretty sure that this doesn’t happen from the start of a reorg to the end of a reorg – there are internal phases or parts of the reorg that may cause this scenario – mostly because I’ve had 5-hour reorgs without hitting a congested state. The level of other activity on the database would probably also play a role.

What are the Effects of the Problem?

When in a ‘Congested’ state, DB2 can prevent transactions from completing on the Primary in SYNC, NEARSYNC, and even ASYNC mode. Only SUPERASYNC mode would be immune to this issue. I tend to run my “online” reorgs at my lowest point of volume anyway to prevent unexpected user impact, so I’ve only heard complaints from automated monitoring, and even those were short lived (less than a couple of minutes). This can all make it pretty frustrating to troubleshoot: the symptom is intermittent outages of varying duration during reorgs. Note that just because the status is congested does not mean transactions are being blocked or slowed, but they can be.

Resolving in DB2 10.1 and Later

DB2 10.1 and later have a database configuration parameter – HADR_SPOOL_LIMIT – which you can use to specify how much log you would like to spool to disk on the standby server. It is specified in 4K pages, but there are two special settings which may be useful:

  • -1 means unlimited – fill up all the disk space you have in your logging filesystem on your standby
  • -2 (AUTOMATIC) means fill up configured log space (may only be available on DB2 10.5)

In DB2 10.1, the default is no log spooling, while in DB2 10.5, the default is AUTOMATIC or -2.
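Because the parameter is expressed in 4K pages, a little arithmetic is needed when you think in gigabytes. A minimal sketch (the function name and the 10 GB figure are just examples):

```shell
# Convert a size in GB to the 4K pages that HADR_SPOOL_LIMIT is expressed in.
gb_to_4k_pages() {
    # $1 = size in GB (whole number)
    echo $(( $1 * 1024 * 1024 / 4 ))
}

gb_to_4k_pages 10    # prints 2621440

# Applying the result (requires a DB2 environment, 10.1 or later):
#   db2 update db cfg for <dbname> using HADR_SPOOL_LIMIT 2621440
```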

Inevitably a reader at this point is wondering “doesn’t that compromise my ability to failover?”. No, allowing logs to spool to disk does not affect your ability to failover – this just makes the standby behave more like the primary on logging. What it does do is potentially increase the time that a failover would take. You would still get all the data, but you would then have to roll forward through any log files on disk before the database would become available on the standby.

Resolving before DB2 10.1

Before 10.1, there isn’t a full resolution – we can either accept the slowness caused by the congestion, OR we can increase the size of the log buffer on the standby. This is handled by the DB2 registry variable DB2_HADR_BUF_SZ, which defaults to two times the size of the primary’s log buffer (LOGBUFSZ). The reason we don’t want to control this by actually changing LOGBUFSZ is that in a failover, we’d want a normal log buffer on the (new) primary. The maximum for DB2_HADR_BUF_SZ is supposedly 4 GB, but reading and searching around, I wouldn’t set it over 1 GB.

I learned about this work-around only a couple of weeks ago, and am already using it on at least one client. Another nice thing – this parameter is available all the way back to DB2 8.2, so for those of us with some back-level databases it still works.

Here’s the drawback of the approach though – consider a failure scenario in NEARSYNC. If both database servers were to fail at the same moment, AND I couldn’t get the primary database server back, I would lose anything in that larger memory area. Granted, that’s a rare failure scenario, but certainly within the realm of the possible. Unlike the log spooling resolution in DB2 10.1 and later, you are changing your recoverability, and you have to decide if the benefits are worth the risks.

A big thanks to Melanie Stopfer @mstopfer1 and Dale McInnis for covering this in their presentations at IBM IOD – this was my biggest immediate-impact technical take-away from that conference.

Informational Constraints – Benefits and Drawbacks

One of the most frustrating things a DBA can experience is troubleshooting due to bad data. The client is upset because rows are missing or incorrect data is returned. The client-facing web front end could be displaying gobbledygook because the data retrieved makes no sense. Resources and energy are burned because of an issue that is easily solved with the proper use of constraints.

So why would I ever want a check constraint to be created on a table but NOT ENFORCED?

Constraints are a double-edged sword. They protect the database from poor data quality at the cost of overhead associated with that constraint. If you load a large amount of data that must be verified with constraints you can add some significant overhead.

If the application is configured to verify the data being fed into the database, do we really need the additional overhead of database constraints? Data verification in both the application and the database is a duplication of effort and leads to wasted resources and longer processing time.

A constraint that is NOT ENFORCED informs the database manager what the data would normally look like and how it would behave under constraint, but data that violates the constraint is not prevented from entering the target table. This is called an informational constraint.

What is the benefit of an informational constraint? The optimizer can base its query plan on what it believes the data should look like because of the constraint’s definition. This can lead to improved performance.

The catch? What happens with bad data? Let’s take a look at an example.

In this specific scenario, assume we have a simple table that is the back end of a project management tool. On the web front end, the application takes in a project name and its current status. Responses can be “Green” for a project in good health, “Yellow” for a project facing challenges, and “Red” for a project at a standstill.

The table definition could look like this (Notice the NOT ENFORCED clause):

 

CREATE TABLE COMMAND with an informational constraint.
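As a rough sketch of what such a definition might look like: the table, column, and constraint names below are hypothetical and are not taken from the original example. Only the status values and the NOT ENFORCED and ENABLE QUERY OPTIMIZATION clauses come from the text. The shell function simply prints the DDL so it can be saved to a file.

```shell
# Print DDL for a table with an informational (NOT ENFORCED) check constraint.
# All object names are hypothetical.
informational_constraint_ddl() {
    cat <<'EOF'
CREATE TABLE project_status (
    project_name VARCHAR(50) NOT NULL,
    status       VARCHAR(10) NOT NULL,
    CONSTRAINT chk_status CHECK (status IN ('Green', 'Yellow', 'Red'))
        NOT ENFORCED
        ENABLE QUERY OPTIMIZATION
);
EOF
}

# Typical use (requires a database connection):
#   informational_constraint_ddl > create_table.sql
#   db2 -tvf create_table.sql
```

With a definition like this, an insert of a status such as 'Blue' would be accepted, because the check is never evaluated.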


Once the data is inserted, we have the following project data:

Project Management Data

 

This is great! If the application is vetting data before it comes in, the constraint is NOT ENFORCED so there is no overhead and the ENABLE QUERY OPTIMIZATION clause tells DB2 that it can lean on this constraint to help generate a great optimization plan.

But what happens if the data is not vetted by the application? For example:

What happens when bad data is inserted?

The table would be loaded with incorrect data, invalid for queries to the database. Our table now is in the following state:

 

Bad data bypasses a NOT ENFORCED constraint and is inserted


Here is the weakness of an informational constraint. The optimizer assumes rows that violate the constraint don’t exist in the table. As a result, a query may not include the invalid rows when they should be returned. Although they exist, DB2 is convinced otherwise because of the informational constraint.

So the following query could return the following output:

Returned rows are missing data that is actually there.


Notice that the optimizer believes the rows can’t be there, so the invalid rows are not displayed. IBM’s Knowledge Center (as well as other training material) states that the rows may not display. This leads me to believe that two separate queries could return two separate results. Bad data now leads to inconsistent queries.

This weakness of an informational constraint could be too large a point of failure for some administrators. However, the benefits of an informational constraint seem to outweigh the risks if your application is configured properly. Imagine the overhead saved and speed increase on a massive data warehouse with an overnight load cycle of hundreds of gigs.

 


Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick

Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Why won’t you just die?! (Cleaning DB2 Process in Memory)
Attack of the Blob: Blobs in a Transaction Processing Environment
Automatic Storage Tablespaces (AST): Compare and Contrast to DMS
DB2 v10.1 Column Masking
Automatic Storage (AST) and DMS
Reloacting the Instance Home Directory

STMM Analysis Tool

I mostly like and use DB2’s Self-Tuning Memory Manager (STMM) for my OLTP databases where I have only one DB2 Instance/Database on a database server. I do have some areas that I do not let it set for me. I’ve recently learned about an analysis tool – Adam Storm did a presentation that mentioned it at IDUG 2014 in Phoenix.

Parameters that STMM Tunes

To begin with, it is important to understand what STMM tunes and what it doesn’t. I recommend reading When is ‘AUTOMATIC’ Not STMM?. There are essentially only 5 areas that STMM can change:

  1. DATABASE_MEMORY if AUTOMATIC
  2. SORTHEAP, SHEAPTHRES_SHR if AUTOMATIC, and SHEAPTHRES is 0
  3. BUFFERPOOLS if number of pages on CREATE/ALTER BUFFERPOOL is AUTOMATIC
  4. PCKCACHESZ if AUTOMATIC
  5. LOCKLIST, MAXLOCKS if AUTOMATIC (both must be automatic)

Any other parameters, even if they are set to “AUTOMATIC” are not part of STMM.

Why I don’t use STMM for PCKCACHESZ

A number of the e-commerce database servers I support are very much oversized for daily traffic. This is common for retail sites because there are always peak periods, and servers tend to be sized to handle those. Many retail clients have extremely drastic peak periods like Black Friday, Cyber Monday, or other very critical selling times.

I noticed for one of my clients that was significantly oversized on memory, DB2 was making the package cache absolutely huge. I saw this:

Package cache size (4KB)                   (PCKCACHESZ) = AUTOMATIC(268480)

That’s a full GB allocated to the package cache. There were over 30,000 statements in the package cache, the vast majority with only a single execution. The thing is that for my OLTP databases, the statements for which performance is critical are often static SQL or use parameter markers. I don’t really care whether most of the ad-hoc statements that are executed only once are stored in the package cache. This was about a 50-100 GB database on a server with 64 GB of memory. The buffer pool hit ratios were awesome, so I guess DB2 didn’t really need the memory there, but still. In my mind, for well-run OLTP databases, that much package cache does not help performance. I am certain there may be databases that need that much or more in the package cache, but this database was simply not one of them. Because of this experience, I set the package cache manually and tune it properly.
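The "full GB" figure follows directly from the page count, since PCKCACHESZ is in 4 KB pages. A small sketch of the arithmetic in both directions (function names are made up, and the 256 MB target is only an example value, not a recommendation):

```shell
# Convert a count of 4 KB pages to whole megabytes (rounded down).
pages_4k_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}

# Convert megabytes to a count of 4 KB pages.
mb_to_4k_pages() {
    echo $(( $1 * 1024 / 4 ))
}

pages_4k_to_mb 268480    # prints 1048, i.e. just over 1 GB
mb_to_4k_pages 256       # prints 65536

# Setting a fixed size instead of AUTOMATIC (requires a DB2 environment):
#   db2 update db cfg for <dbname> using PCKCACHESZ 65536
```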

A few STMM Caveats

Just a few things to note – I have heard rumors of issues with STMM when there are multiple DB2 instances running on a server. I have not personally experienced this. Also, the settings that STMM is using are not transferred at all to the HADR standby, so when you fail over, you may have poor performance while STMM starts up. You could probably script a regular setting of the STMM parameters to deal with this. Also if you have a well-tuned, well performing non-STMM database there is probably little reason and not much reward in changing it to STMM. Most experts with database performance can likely tune the database better than STMM, but we can’t all be performance experts, or give as much time as we’d like to every database we support.

The STMM Log Parser

STMM logs the changes it makes in parameter sizes both to the db2diag.log and to some STMM log files. (Hint, IBM: maybe these could be used to periodically update the HADR standby too?) The log files are in the stmmlog subdirectory of the DIAGPATH. The log files aren’t exactly tough to read, but they don’t really present the information in an easy-to-view way. Entries look a bit like diagnostic log entries:

2014-07-02-23.44.40.788684+000 I10464203A600        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:2065
DATA #1 : String, 115 bytes
Going to sleep for 180000 milliseconds.
Interval = 5787, State = 0, intervalsBeforeStateChange = 0, lost4KPages = 0

2014-07-02-23.47.40.807231+000 I10464804A489        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:1909
MESSAGE : Activation stage ended

2014-07-02-23.47.40.807661+000 I10465294A488        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:1913
MESSAGE : Starting New Interval

2014-07-02-23.47.40.808193+000 I10465783A925        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmLogRecordBeforeResizes, probe:590
DATA #1 : String, 435 bytes

***  stmmCostBenefitRecord ***
Type: LOCKLIST
PageSize: 4096
Benefit:
  -> Simulation size: 75
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Cost:
  -> Simulation size: 75
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Current Size: 27968
Minimum Size: 27968
Potential Increase Amount: 13984
Potential Increase Amount From OS: 13984
Potential Decrease Amount: 0
Pages Available For OS: 0

2014-07-02-23.47.40.808580+000 I10466709A993        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmLogRecordBeforeResizes, probe:590
DATA #1 : String, 502 bytes

***  stmmCostBenefitRecord ***
Type: BUFFER POOL ( BUFF_REF16K )
PageSize: 16384
Saved Misses: 0
Benefit:
  -> Simulation size: 2560
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Cost:
  -> Simulation size: 2560
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Current Size: 25000
Minimum Size: 5000
Potential Increase Amount: 12500
Potential Increase Amount From OS: 12500
Potential Decrease Amount: 5000
Pages Available For OS: 5000
Interval Time: 180.029

Scrolling through each 10 MB file of this is not likely to give us a complete picture very easily. IBM offers a log parser tool for STMM through developerWorks. The full writeup on it is here: http://www.ibm.com/developerworks/data/library/techarticle/dm-0708naqvi/index.html

The tool is free, and is a Perl script that DBAs can modify if they like. AIX and Linux tend to include Perl, and it’s not hard to install on Windows using ActivePerl or a number of other options. I happen to rather like a Perl utility as I do the vast majority of my database maintenance scripting in Perl.
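To give a feel for what a parser has to do with these entries, here is a minimal Python sketch that pulls the heap type and current size out of each stmmCostBenefitRecord block. This is not IBM's script and does not reflect how it works internally; the regular expression and function name are my own assumptions based only on the sample entries shown above, and the sample text is a shortened copy of those entries.

```python
import re

# Matches one stmmCostBenefitRecord block and captures its Type and Current Size.
# Assumption: "Type:" directly follows the record banner and "Current Size:"
# starts a later line, as in the sample log entries.
RECORD_RE = re.compile(
    r"\*\*\*\s+stmmCostBenefitRecord\s+\*\*\*\s*\n"
    r"Type:\s*(?P<type>.+?)\s*\n"
    r"(?:.*\n)*?"
    r"Current Size:\s*(?P<size>\d+)"
)


def current_sizes(log_text):
    """Return a list of (heap type, current size in pages) tuples."""
    return [
        (m.group("type"), int(m.group("size")))
        for m in RECORD_RE.finditer(log_text)
    ]


# Shortened copy of the two records shown earlier.
sample = """\
***  stmmCostBenefitRecord ***
Type: LOCKLIST
PageSize: 4096
Current Size: 27968
Minimum Size: 27968

***  stmmCostBenefitRecord ***
Type: BUFFER POOL ( BUFF_REF16K )
PageSize: 16384
Saved Misses: 0
Current Size: 25000
Minimum Size: 5000
"""

for heap, size in current_sizes(sample):
    print(heap, size)
```

For anything beyond a quick look, the IBM parser is the better starting point, since it already understands intervals and the other record types.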

Download and Set Up

The developerWorks link above includes the Perl script. Scroll down to the “download” section and click on “parseStmmLogFile.pl”. If you accept the terms and conditions, click “I Accept the Terms and Conditions” and save the file. Then upload it to the database server you wish to use it on.

Syntax

There are several options here. Whenever you execute it, you will need to specify the name of one of your STMM logs, and the database name. The various options beyond that are covered below.

Examples

The default if you specify nothing beyond the file name and the database name is the s option. This gives you the new size at each interval of each heap that STMM manages. The output looks something like this:

 ./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE s
# Database: SAMPLE
[ MEMORY TUNER - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ newSz ]
[        ]      [                     ] [        ]      [        ]      [ LOCKLIST  BUFFERPOOL - BUFF16K:16K BUFFERPOOL - BUFF32K:32K BUFFERPOOL - BUFF4K BUFFERPOOL - BUFF8K:8K BUFFERPOOL - BUFF_CACHEIVL:8K BUFFERPOOL - BUFF_CAT16K:16K BUFFERPOOL - BUFF_CAT4K BUFFERPOOL - BUFF_CAT8K:8K BUFFERPOOL - BUFF_CTX BUFFERPOOL - BUFF_REF16K:16K BUFFERPOOL - BUFF_REF4K BUFFERPOOL - BUFF_REF8K:8K BUFFERPOOL - BUFF_SYSCAT BUFFERPOOL - BUFF_TEMP16K:16K BUFFERPOOL - BUFF_TEMP32K:32K BUFFERPOOL - BUFF_TEMP4K BUFFERPOOL - BUFF_TEMP8K:8K BUFFERPOOL - IBMDEFAULTBP ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]

If you have a number of bufferpools, this can be hard to read, even on a large screen. The width of the numeric values is not the same as the width of their names, so the output is not all that tabular. To fix that, you can try the d option, which delimits the output with semicolons, making it easier to get into your favorite spreadsheet tool. The raw output in that case looks like this:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE s d
# Database: SAMPLE
MEMORY TUNER - LOG ENTRIES
Interval;Date;Total Seconds;Difference in Seconds; LOCKLIST  ;  BUFFERPOOL - BUFF16K:16K ;  BUFFERPOOL - BUFF32K:32K ;  BUFFERPOOL - BUFF4K ;  BUFFERPOOL - BUFF8K:8K ;  BUFFERPOOL - BUFF_CACHEIVL:8K ;  BUFFERPOOL - BUFF_CAT16K:16K ;  BUFFERPOOL - BUFF_CAT4K ;  BUFFERPOOL - BUFF_CAT8K:8K ;  BUFFERPOOL - BUFF_CTX ;  BUFFERPOOL - BUFF_REF16K:16K ;  BUFFERPOOL - BUFF_REF4K ;  BUFFERPOOL - BUFF_REF8K:8K ;  BUFFERPOOL - BUFF_SYSCAT ;  BUFFERPOOL - BUFF_TEMP16K:16K ;  BUFFERPOOL - BUFF_TEMP32K:32K ;  BUFFERPOOL - BUFF_TEMP4K ;  BUFFERPOOL - BUFF_TEMP8K:8K ;  BUFFERPOOL - IBMDEFAULTBP ; ;
1;02/07/2014 00:17:27;180;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
2;02/07/2014 00:20:27;360;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
3;02/07/2014 00:23:27;540;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
4;02/07/2014 00:26:27;720;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
5;02/07/2014 00:29:27;900;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;

Save it off to a file, import it into a spreadsheet, and you get something like this:
[Image: STMM log parser output (s option) imported into a spreadsheet]

Ok, and finally, you can make a pretty graph to look at these in a more human way:
[Image: chart of the STMM log parser output]
Now that would be a lot more exciting if I ran it on a database where things were changing more often, but that’s the one I have to play with at the moment.

There are some other interesting options besides the s option. The b option shows the benefit analysis that STMM does, which looks pretty boring on my database, but still:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE b
# Database: SAMPLE
[ MEMORY TUNER - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ benefitNorm ]
[        ]      [                     ] [        ]      [        ]      [ LOCKLIST  BUFFERPOOL - BUFF16K:16K BUFFERPOOL - BUFF32K:32K BUFFERPOOL - BUFF4K BUFFERPOOL - BUFF8K:8K BUFFERPOOL - BUFF_CACHEIVL:8K BUFFERPOOL - BUFF_CAT16K:16K BUFFERPOOL - BUFF_CAT4K BUFFERPOOL - BUFF_CAT8K:8K BUFFERPOOL - BUFF_CTX BUFFERPOOL - BUFF_REF16K:16K BUFFERPOOL - BUFF_REF4K BUFFERPOOL - BUFF_REF8K:8K BUFFERPOOL - BUFF_SYSCAT BUFFERPOOL - BUFF_TEMP16K:16K BUFFERPOOL - BUFF_TEMP32K:32K BUFFERPOOL - BUFF_TEMP4K BUFFERPOOL - BUFF_TEMP8K:8K BUFFERPOOL - IBMDEFAULTBP ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

The o option shows only database memory and overflow buffer tuning:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE o
# Database: SAMPLE
[ MEMORY TUNER - DATABASE MEMORY AND OVERFLOW BUFFER TUNING - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ configMem ]   [ memAvail ]    [ setCfgSz ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]

There is also a 4 option that you can use to convert all values to 4k pages.

Summary

There are some useful things in the STMM log parser if you want to understand the changes DB2 is making. Many of us, coming from fully manual tuning, naturally distrust what STMM or other tuning tools are doing, so this level of transparency helps us understand what is happening and why it is or is not working in our environments. I would love to see more power in this. Being able to query this data with a table function or administrative view (we can with the db2diag.log!) would be even more useful, so the output could be further limited and tweaked. The script is well documented, and I imagine I could tweak it to limit the output if I wanted to. I’d love to have it call out actual changes – that would be harder to graph, but for the text output, it could be more useful for a fairly dormant system.
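The “call out actual changes” wish can be approximated without touching the Perl script, by post-processing the semicolon-delimited output of the d option. The Python below is only a sketch under assumptions: it expects the layout shown in the d example earlier (four fixed columns, then one column per heap), the function name is mine, and the third data row in the sample is invented so that there is a change to report.

```python
def size_changes(delimited_output):
    """Yield (interval, date, heap, old_size, new_size) for each resize."""
    header = None
    previous = None
    for line in delimited_output.splitlines():
        fields = [f.strip() for f in line.split(";")]
        if fields[0] == "Interval":
            # Heap names start after the four fixed columns.
            header = [name for name in fields[4:] if name]
            continue
        if header is None or not fields[0].isdigit():
            continue  # banner lines such as "# Database: SAMPLE"
        sizes = [int(v) for v in fields[4:] if v]
        if previous is not None:
            for heap, old, new in zip(header, previous, sizes):
                if old != new:
                    yield fields[0], fields[1], heap, old, new
        previous = sizes


# Trimmed to two heaps; the resize in interval 3 is hypothetical.
sample = """\
# Database: SAMPLE
MEMORY TUNER - LOG ENTRIES
Interval;Date;Total Seconds;Difference in Seconds; LOCKLIST  ;  BUFFERPOOL - BUFF4K ; ;
1;02/07/2014 00:17:27;180;180; 27968; 2000000;
2;02/07/2014 00:20:27;360;180; 27968; 2000000;
3;02/07/2014 00:23:27;540;180; 27968; 2100000;
"""

for change in size_changes(sample):
    print(change)
```

On a dormant system like the one shown, this prints nothing at all, which is exactly the point.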

Quick Hit Tips – CPUSPEED, RESTRICTIVE, and DB2_WORKLOAD


Today we are going to talk about some random DB2 features that can’t stand in a blog of their own, but are worth discussing nonetheless. These are tidbits I discovered during “DB2’s Got Talent” presentations, IDUG conferences, or “Hey, look what I discovered” moments.

CPUSPEED (Database Manager Configuration)

You blow past this setting every time you execute “db2 get dbm cfg”. It’s located at the very top of your output and is one of the more important settings that gets overlooked. The value is set after DB2 looks at the CPU and determines how fast instructions are processed (millisec/instruction).

The optimizer is greatly influenced by this setting. CPUSPEED is automatically set upon instance creation and is often never examined again. The setting will stay static unless you ask DB2 to re-examine the CPU.

So, why do we care? Well there could be a few reasons.

  • A CPU was added to or removed from an LPAR.
  • An image of your old server was taken and placed on a new, faster server, with a different type or number of processors.

If your CPU configuration was changed for some reason, the new processing speed is not taken into account until it is re-evaluated. So go ahead and add that additional CPU to handle your Black Friday workload. It won’t help much unless DB2 knows to take it into account.

To re-evaluate:
db2 "update dbm cfg using CPUSPEED -1"

Once done, you should see a new CPUSPEED displayed for your DBM configuration.

If you are comparing apples to apples, you want to see the number get smaller to show a speed increase. If you are changing architecture (P6 to P7, for example), the number could theoretically go up. Put that in context, though: the higher number could be the setting DB2 needs to account for multi-threading or some other hardware change. So it may look worse when it really isn’t.

DB2 will then use the new number to determine proper access paths. So make sure to issue a REBIND afterwards so your SQL can take advantage of the speed increase.
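If you save the “db2 get dbm cfg” output before and after the re-evaluation, comparing the two values is easy to script. The Python below is a rough sketch: the function name is my own, the two CPUSPEED values are made up for illustration, and the line layout is assumed to match what the DBM CFG output normally prints for this parameter.

```python
import re

# Assumption: the DBM CFG output contains a line ending in "(CPUSPEED) = <value>".
CPUSPEED_RE = re.compile(r"\(CPUSPEED\)\s*=\s*(\S+)")


def parse_cpuspeed(dbm_cfg_text):
    """Extract CPUSPEED (millisec/instruction) from 'db2 get dbm cfg' output."""
    match = CPUSPEED_RE.search(dbm_cfg_text)
    if match is None:
        raise ValueError("CPUSPEED not found in DBM CFG output")
    return float(match.group(1))


# Hypothetical before/after values, not taken from a real server.
before = parse_cpuspeed(
    " CPU speed (millisec/instruction)             (CPUSPEED) = 4.000000e-07"
)
after = parse_cpuspeed(
    " CPU speed (millisec/instruction)             (CPUSPEED) = 2.500000e-07"
)

# A negative change means fewer milliseconds per instruction, i.e. faster.
change_pct = (after - before) / before * 100
print(f"CPUSPEED changed by {change_pct:+.1f}%")
```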

(Special thanks to Robert Goethel who introduced this topic during the DB2 Night Show competition this year).

RESTRICTIVE (Create Database …. RESTRICTIVE)

I picked this up in Roger Sanders’ DB2 Crammer Course at IDUG this year. I had just spent the past two months auditing our database authorities and was frustrated with the amount of PUBLIC access. I even created a separate SQL script to run after new database creation to revoke some of the default PUBLIC authority.

Apparently I reinvented the wheel. If you use the RESTRICTIVE clause in the CREATE DATABASE command, no privileges or authorities will automatically be granted to PUBLIC.

For example:
db2 "create database warehouse on /data1 dbpath on /home/db2inst1 restrictive"

DB2_WORKLOAD (DB2 Registry Variable)

This db2set parameter has been available for a while, but a new option (ANALYTICS) became available with v10.5. Essentially, DB2_WORKLOAD presets a group of registry variables for your needs. Set it once and go – no need to look up various configurations or develop scripts. This is valuable for various application configurations such as BLU or Cognos.

To activate:
db2set DB2_WORKLOAD=<option>

Option          Description
1C              1C Applications
ANALYTICS       Analytics Workload
CM              IBM Content Manager
COGNOS_CS       Cognos Content Server
FILENET_CM      FileNet Content Manager
INFOR_ERP_LN    ERP Baan
MAXIMO          Maximo
MDM             Master Data Management
SAP             SAP Environment
TPM             Tivoli Provisioning Manager
WAS             WebSphere Application Server
WC              WebSphere Commerce
WP              WebSphere Portal

If you are a WebSphere Commerce nerd like Ember, make sure to read her blog on DB2_WORKLOAD and how it can be used for WebSphere Commerce.


Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick

Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Why won’t you just die?! (Cleaning DB2 Process in Memory)
Attack of the Blob: Blobs in a Transaction Processing Environment
Automatic Storage Tablespaces (AST): Compare and Contrast to DMS
DB2 v10.1 Column Masking
Automatic Storage (AST) and DMS
Relocating the Instance Home Directory
Informational Constraints: Benefits and Drawbacks

Giving Automatic Maintenance a Fair Try


I’m a control freak. I think that control freaks tend to make good DBAs as long as they don’t take it too far. My position for years has been that I would rather control my runstats, reorgs, and backups directly than trust DB2’s automatic facilities. But I also try to keep an open mind. That means that every so often I have to give the new stuff a chance. This blog entry is about me giving automatic maintenance a try. I am NOT recommending it yet, but here’s how I approached it and what I saw.

Environment

The environment I’m working with here is a brand new 10.5 (fixpack 5) database. It uses column organization for most tables, but has a small subset of tables that must be row-organized. I still refuse to trust my backups to automation – I want them to run at a standard time each day. But for runstats and reorgs, I’m using this article to look into both what to do with the row-organized and column-organized tables and how to set up controls around when things happen. The environment I’m working on happens to use HADR.

Database Configuration Parameter Settings

Here’s the configuration I’m starting with for the Automatic maintenance parameters:

 Automatic maintenance                      (AUTO_MAINT) = ON
   Automatic database backup            (AUTO_DB_BACKUP) = OFF
   Automatic table maintenance          (AUTO_TBL_MAINT) = ON
     Automatic runstats                  (AUTO_RUNSTATS) = ON
       Real-time statistics            (AUTO_STMT_STATS) = ON
       Statistical views              (AUTO_STATS_VIEWS) = OFF
       Automatic sampling                (AUTO_SAMPLING) = OFF
     Automatic reorganization               (AUTO_REORG) = ON

I think those are the defaults for a BLU database, though I may have already set AUTO_DB_BACKUP to OFF manually myself. I’m also going to turn auto_stats_views on, in case I create one of those. This is the syntax I use for that:

-bash-4.1$ db2 update db cfg for SAMPLE using AUTO_STATS_VIEWS ON
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.

And now these configuration parameters look like this:

 Automatic maintenance                      (AUTO_MAINT) = ON
   Automatic database backup            (AUTO_DB_BACKUP) = OFF
   Automatic table maintenance          (AUTO_TBL_MAINT) = ON
     Automatic runstats                  (AUTO_RUNSTATS) = ON
       Real-time statistics            (AUTO_STMT_STATS) = ON
       Statistical views              (AUTO_STATS_VIEWS) = ON
       Automatic sampling                (AUTO_SAMPLING) = OFF
     Automatic reorganization               (AUTO_REORG) = ON

Runstats on Row-Organized Tables

The first thing that I know I want to do is to set up profiles for runstats on my row-organized tables.

Setting Profiles

I want to use this syntax for all of my row organized tables in this database:

runstats on table <SCHEMA>.<TABLE> with distribution and detailed indexes all

If I had larger row-organized tables, I might consider using sampling, but my row-organized tables in this database happen to be on the smaller side.

There doesn’t seem to be a way to set a default syntax other than through creating profiles on individual tables. Would love to know in a comment below if I’m missing something there.

To set my profiles, I’m going to use a little scripting trick I like to use when I have to do the same thing for many objects. In this case, I have 106 tables where I want to set the profile identically, so first I create a list of those tables using:

db2 -x "select  substr(tabschema,1,18) as tabschema
        , substr(tabname,1,40) as tabname
from syscat.tables
where   tableorg='R'
        and tabschema not like ('SYS%')
with ur" > tab.list

This creates a file called tab.list that has only the schemas and names of my tables. The -x option ensures that the column headings and the summary line reporting how many rows were selected are not included in the output.

Next, I loop through that list with a one-line shell script:

cat tab.list |while read s t; do db2 connect to bcudb; db2 -v "runstats on table $s.$t with distribution and detailed indexes all set profile"; db2 connect reset; done |tee stats.profile.out

Note that I could have used “set profile only” if I didn’t also want to actually do runstats on these tables, but in my case, I wanted to both do the runstats and set the profile. I then checked stats.profile.out for any failures with this quick grep:

 cat stats.profile.out |grep SQL |grep -v DB20000I |grep -v "SQL authorization ID   = DB2INST1"

Everything was successful.
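If you would rather review the statements before running anything, the same loop can be used to generate the commands only. This Python sketch assumes tab.list holds one whitespace-separated schema and table name per line, as produced by the query above; the function name and the MYSCHEMA.ORDERS table are hypothetical, and names containing spaces or needing delimited identifiers would need extra handling.

```python
def runstats_commands(tab_list_text, set_profile_only=False):
    """Build one RUNSTATS command per 'SCHEMA TABLE' line of tab.list."""
    # "set profile" runs runstats and stores the profile;
    # "set profile only" stores the profile without collecting statistics.
    suffix = "set profile only" if set_profile_only else "set profile"
    commands = []
    for line in tab_list_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        schema, table = parts
        commands.append(
            f"runstats on table {schema}.{table} "
            f"with distribution and detailed indexes all {suffix}"
        )
    return commands


# Hypothetical tab.list content.
for command in runstats_commands("MYSCHEMA           ORDERS\n"):
    print(command)
```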

Setting up a schedule

I don’t want my runstats to kick off any old time they feel like it. I want to restrict them to run between 1 am and 6 am daily. To do this, I need to set up an automatic maintenance policy. There are samples that I can start with in $HOME/sqllib/samples/automaintcfg.

I first made a copy of DB2MaintenanceWindowPolicySample.xml, renaming it and moving it to a working directory. I ensured my new file contained these lines:

<DB2MaintenanceWindows
xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
 <OnlineWindow Occurrence="During" startTime="01:00:00" duration="5">
   <DaysOfWeek>All</DaysOfWeek>
   <DaysOfMonth>All</DaysOfMonth>
   <MonthsOfYear>All</MonthsOfYear>
 </OnlineWindow>
</DB2MaintenanceWindows>

I don’t want to set an offline window at this time, because I don’t have one. The sample file has great information on how to configure different scenarios. While there is no option in the XML file to specify a database or different databases, I’m setting the policy with a command against a database, so I named the file with the database name in it and keep it where I can easily find it later if I need to change it.

By default, the online window is 24/7.
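Before loading a policy file, it can be worth double-checking that the window it describes is the one you meant. The following Python sketch parses the XML shown above and works out the start and end of the online window. It assumes duration is in hours, which matches the 1 am to 6 am intent here; the function name is my own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Namespace taken from the policy file itself.
NS = {"am": "http://www.ibm.com/xmlns/prod/db2/autonomic/config"}

POLICY_XML = """\
<DB2MaintenanceWindows
xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
 <OnlineWindow Occurrence="During" startTime="01:00:00" duration="5">
   <DaysOfWeek>All</DaysOfWeek>
   <DaysOfMonth>All</DaysOfMonth>
   <MonthsOfYear>All</MonthsOfYear>
 </OnlineWindow>
</DB2MaintenanceWindows>
"""


def online_window(policy_xml):
    """Return (start, end) of the online window as HH:MM strings, or None."""
    root = ET.fromstring(policy_xml)
    window = root.find("am:OnlineWindow", NS)
    if window is None:
        return None
    start = datetime.strptime(window.get("startTime"), "%H:%M:%S")
    # Assumption: duration is expressed in whole hours.
    end = start + timedelta(hours=int(window.get("duration")))
    return start.strftime("%H:%M"), end.strftime("%H:%M")


print(online_window(POLICY_XML))  # ('01:00', '06:00')
```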

Now that I have an XML file that will do what I want, I can set that as the policy using the AUTOMAINT_SET_POLICYFILE procedure, like this:

-bash-4.1$ db2 "call sysproc.automaint_set_policyfile( 'MAINTENANCE_WINDOW', 'DB2MaintenanceWindowPolicyBCUDB.xml' )"
SQL1436N  Automated maintenance policy configuration file named

Well, oops, that didn’t work so well. I learned that the xml file you want to use must be in $HOME/sqllib/tmp. ALSO, it must be readable by the fenced user ID. With the way I have it set up (with the fenced user id in the primary group of my instance id), this is what I had to do to make that work:

-bash-4.1$ cp DB2MaintenanceWindowPolicyBCUDB.xml $HOME/sqllib/tmp
-bash-4.1$ chmod 740 $HOME/sqllib/tmp/DB2MaintenanceWindowPolicyBCUDB.xml

I was then able to successfully call the stored procedure:

-bash-4.1$ db2 "call sysproc.automaint_set_policyfile( 'MAINTENANCE_WINDOW', 'DB2MaintenanceWindowPolicyBCUDB.xml' )"

  Return Status = 0

When DB2 reads the file in, it is not depending on that file to exist forever. It is storing the information from the file in the database. You can use the AUTOMAINT_GET_POLICYFILE and AUTOMAINT_GET_POLICY stored procedures to pull that information back out. Remember that there is only one policy for each of the automatic maintenance categories, so it is best to first get the policy, change it, and then set it, so you do not accidentally overwrite what is already there.

Phew. Ok, that’s what I had to do to set things up for my row-organized tables. My column organized tables will also get runstats by default, and for the sake of trying it, I’m going to go with the defaults there and see what happens. And by see what happens, I mean I’ll be querying up a storm to see what DB2 is doing.

Reorgs

This database does not have an offline maintenance window, so I need to configure online reorgs. Much like with runstats, I’m going to let the BLU tables go for a while and see if the hype from IBM about just letting DB2 take care of it is really all that. The only reorgs there are for space reclamation anyway. But for my row-organized tables, I want to make sure they’re taken care of.

Man, was I disappointed to discover that inplace/notruncate reorgs are STILL not supported as a part of automatic maintenance. This means that I cannot do table reorgs through DB2’s automation facilities … off to re-write my reorg script for yet another employer, I guess.

I’m trying to see if DB2 can at least manage my index reorgs for me online, though, with this syntax in my file:

<DB2AutoReorgPolicy
xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
 <ReorgOptions dictionaryOption="Keep" indexReorgMode="Online"  useSystemTempTableSpace="false" />
 <ReorgTableScope maxOfflineReorgTableSize="52">
  <FilterClause />
 </ReorgTableScope>
</DB2AutoReorgPolicy>

And, of course, implementing that file with:

-bash-4.1$ cp DB2AutoReorgPolicyBCUDB.xml $HOME/sqllib/tmp
-bash-4.1$ chmod 740 $HOME/sqllib/tmp/DB2AutoReorgPolicyBCUDB.xml
-bash-4.1$ db2 "call sysproc.automaint_set_policyfile( 'AUTO_REORG', 'DB2AutoReorgPolicyBCUDB.xml' )"

  Return Status = 0

I think the combination of that and no offline reorg window defined will get me what I want on the index side anyway.


Analyzing Package Cache Size


Note: updated 7/21 to reflect location of the package cache high water mark in the MON_GET* table functions

I have long been a fan of a smaller package cache size, particularly for transaction processing databases. I have seen STMM choose a very large size for the package cache, and this presents several problems:

  • Memory used for the package cache might be better used elsewhere
  • A large package cache makes statement analysis difficult
  • A large package cache may be masking statement issues, such as improper use of parameter markers

Parameter Markers

Parameter markers involve telling DB2 that the same query may be executed many times with slightly different values, and that DB2 should use the same access plan, no matter what the values supplied are. This means that DB2 only has to compile the access plan once, rather than doing the same work repeatedly. However, it also means that DB2 cannot make use of distribution statistics to compute the optimal access plan. That means that parameter markers work best for queries that are executed frequently, and for which the value distribution is likely to be even or at least not drastically skewed.

The use of parameter markers is not a choice that the DBA usually gets to make. It is often a decision made by developers or even vendors. Since it is not an across-the-board best practice to use parameter markers, there are frequently cases where the wrong decisions are made. There are certainly queries and data sets where parameter markers will make things worse.

At the database level, we can use the STMT_CONC database configuration parameter (set to LITERALS) to force the use of common access plans for EVERYTHING. This is not optimal for the following reasons:

  • There are often some places where the value will always be the same, and in those places SQL would benefit more from a static value.
  • The SQL in the package cache will essentially never show static values used, which can make troubleshooting difficult.
  • With uneven distribution of data, performance of some SQL may suffer.
  • There have been APARs about incorrect data being returned.

If you have interaction with developers on a deep and meaningful level, proper use of parameter markers is the best choice.

Parameter markers show up as question marks in SQL in the package cache. This statement uses parameter markers:

Select booking_num from SAMPLE.TRAILER_BOOKING where trailer_id = ?

Statement substitutions done by the statement concentrator use :LN, where N is a number representing the position in the statement. This statement shows values affected by the statement concentrator:

select count(*) from event where event_id in ( select event_id from sample.other_table where comm_id=:L0 ) and who_entered != :L1
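To illustrate why this shrinks the package cache, here is a toy Python sketch that replaces literals with :L0, :L1, and so on, so that two statements differing only in their values end up with identical text. This is emphatically not how DB2 implements the statement concentrator, which has its own rules about which literals are eligible; the regular expression, function name, and sample statements are all my own simplifications.

```python
import re

# Toy rule: treat quoted strings and standalone numbers as literals.
LITERAL_RE = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def concentrate(sql):
    """Replace each literal with a positional :LN marker (toy version)."""
    counter = 0

    def replace(_match):
        nonlocal counter
        marker = f":L{counter}"
        counter += 1
        return marker

    return LITERAL_RE.sub(replace, sql)


# Two hypothetical ad-hoc statements that differ only in their literal values.
a = concentrate(
    "select name from sample.other_table where comm_id=42 and who_entered != 'BOB'"
)
b = concentrate(
    "select name from sample.other_table where comm_id=97 and who_entered != 'ALICE'"
)

print(a)
print(a == b)  # both collapse to one statement text, so one cache entry
```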

Sizing the Package Cache

I’ve said that I don’t trust STMM to make the best choices for the package cache. As a result, I recommend setting a static value. How do I come up with the right value?

I often start by setting the PCKCACHESZ database configuration parameter to 8192 or 16384, and tune it upwards until I stop seeing frequent package cache overflows. A package cache overflow will write messages like this to the DB2 diagnostic log:

xxxx-xx-xx-xx.xx.xx.xxxxxx+xxx xxxxxxxxxxxxxx     LEVEL: Event
PID     : xxxxxxx              TID  : xxxxx       PROC : db2sysc 0
INSTANCE: db2             NODE : 000         DB   : SAMPLE
APPHDL  : 0-xxxxx              APPID: xx.xxx.xxx.xx.xxxxx.xxxxxxxxxxx
AUTHID  : xxxxxxxx
EDUID   : xxxxx                EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, access plan manager, sqlra_cache_mem_please, probe:100
MESSAGE : ADM4500W  A package cache overflow condition has occurred. There is
          no error but this indicates that the package cache has exceeded the
          configured maximum size. If this condition persists, you should
          perform additional monitoring to determine if you need to change the
          PCKCACHESZ DB configuration parameter. You could also set it to
          AUTOMATIC.
REPORT  : APM : Package Cache : info
IMPACT  : Unlikely
DATA #1 : String, 274 bytes
Package Cache Overflow
memory needed             : 753
current used size (OSS)   : 15984666
maximum cache size (APM)  : 15892480
maximum logical size (OSS): 40164894
maximum used size (OSS)   : 48562176
owned size (OSS)          : 26017792
number of overflows       : xxxxx

I address these usually by increasing the package cache by 4096 until they are vastly less frequent. This could still be a considerable size if your application does not make appropriate use of parameter markers.
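To judge whether overflows are becoming “vastly less frequent”, it helps to count them per day. The Python below is a small sketch that scans diagnostic log text for records containing ADM4500W and tallies them by date. It assumes each record starts with a timestamp in the format shown in the entries above; the function name is mine and the sample records are condensed, made-up examples.

```python
import re
from collections import Counter

# Assumption: a record starts with a timestamp such as
# 2014-07-02-23.47.40.807231+000 at the beginning of a line.
RECORD_START_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})-\d{2}\.\d{2}\.\d{2}\.\d+", re.MULTILINE
)


def overflows_per_day(diag_text):
    """Count db2diag records mentioning ADM4500W, keyed by date."""
    counts = Counter()
    starts = list(RECORD_START_RE.finditer(diag_text))
    for i, match in enumerate(starts):
        # A record runs until the next record's timestamp (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(diag_text)
        record = diag_text[match.start():end]
        if "ADM4500W" in record:
            counts[match.group(1)] += 1
    return counts


# Condensed, hypothetical records for illustration only.
sample = """\
2014-07-02-23.44.40.788684+000 I10464203A600        LEVEL: Event
MESSAGE : ADM4500W  A package cache overflow condition has occurred.

2014-07-02-23.47.40.807231+000 I10464804A489        LEVEL: Event
MESSAGE : Activation stage ended

2014-07-03-01.02.03.000001+000 I10465294A488        LEVEL: Event
MESSAGE : ADM4500W  A package cache overflow condition has occurred.
"""

for day, count in sorted(overflows_per_day(sample).items()):
    print(day, count)
```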

To look at details of your package cache size, you can look at this section of a database snapshot:

Package cache lookups                      = 16001443673
Package cache inserts                      = 4180445
Package cache overflows                    = 0
Package cache high water mark (Bytes)      = 777720137

When I first wrote this, I was frustrated that the package cache high water mark didn’t seem to be in the MON_GET* functions, since I’m going to need it before they discontinue the snapshot monitor. It turns out it is there. To get the high water mark for the package cache, you can use this query on 9.7 and above (thanks to Paul Bird’s twitter comment for pointing me to this):

select memory_pool_used_hwm
from table (MON_GET_MEMORY_POOL(NULL, CURRENT_SERVER, -2)) as mgmp 
where memory_pool_type='PACKAGE_CACHE' 
with ur

MEMORY_POOL_USED_HWM
--------------------
                 832

You can use the high water mark to see how close to the configured maximum size (PCKCACHESZ) the package cache has actually come. Watch the units: the snapshot reports the high water mark in bytes, while MEMORY_POOL_USED_HWM is reported in KB. In the database that the snapshot above came from, the package cache size is 190000 (4K pages). In bytes that would be 778,240,000. Compared with the snapshot high water mark of 777,720,137 bytes, that means the package cache has nearly reached the maximum at some point. But you can tell from the value of package cache overflows that it has not attempted to overflow the configured size.
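The comparison above is easy to script. This is only a minimal sketch using the two numbers quoted in this post; the function names are mine, and awk does the arithmetic so that large byte counts do not depend on the shell's integer size.

```shell
# Values quoted in the text: PCKCACHESZ in 4 KB pages, and the snapshot
# "Package cache high water mark (Bytes)".
pckcachesz_pages=190000
hwm_bytes=777720137

pages_to_bytes() {
    # $1 = number of 4 KB pages
    awk -v p="$1" 'BEGIN { printf "%.0f\n", p * 4096 }'
}

pct_of_max() {
    # $1 = high water mark in bytes, $2 = configured maximum in bytes
    awk -v h="$1" -v m="$2" 'BEGIN { printf "%.2f\n", 100 * h / m }'
}

max_bytes=$(pages_to_bytes "$pckcachesz_pages")
echo "configured maximum: $max_bytes bytes"
echo "high water mark:    $(pct_of_max "$hwm_bytes" "$max_bytes")% of maximum"
```

If you feed it the value from MON_GET_MEMORY_POOL instead, multiply by 1024 first, since that function reports KB.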

The numbers above also allow me to calculate the package cache hit ratio. These numbers are also available in MON_GET_WORKLOAD on 9.7 and above or MON_GET_DATABASE on 10.5. The package cache hit ratio is calculated as:

100*(1-(package cache inserts/package cache lookups))

With the numbers above, that is:

100*(1-(4180445/16001443673))

or 99.97%

You do generally want to make sure your package cache hit ratio is over 90%.
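If you collect these counters regularly, the formula is simple to wrap in a small helper. A minimal sketch, with a function name of my own choosing; it takes the inserts and lookups counters as arguments and does not query DB2 itself.

```shell
pckcache_hit_ratio() {
    # $1 = package cache inserts, $2 = package cache lookups
    awk -v i="$1" -v l="$2" 'BEGIN {
        if (l == 0) { print "n/a"; exit }   # no lookups yet, ratio undefined
        printf "%.2f\n", 100 * (1 - i / l)
    }'
}

# Numbers from the snapshot above
pckcache_hit_ratio 4180445 16001443673
```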

In addition to these metrics, you can also look at what percentage of time your database spends on compiling SQL. This can be computed over a specific period of time using MONREPORT.DBSUMMARY. Look for this section:

  Component times
  --------------------------------------------------------------------------------
  -- Detailed breakdown of processing time --

                                      %                 Total
                                      ----------------  --------------------------
  Total processing                    100               10968

  Section execution
    TOTAL_SECTION_PROC_TIME           80                8857
      TOTAL_SECTION_SORT_PROC_TIME    17                1903
  Compile
    TOTAL_COMPILE_PROC_TIME           2                 307
    TOTAL_IMPLICIT_COMPILE_PROC_TIME  0                 0
  Transaction end processing
    TOTAL_COMMIT_PROC_TIME            0                 76
    TOTAL_ROLLBACK_PROC_TIME          0                 0
  Utilities
    TOTAL_RUNSTATS_PROC_TIME          0                 0
    TOTAL_REORGS_PROC_TIME            0                 0
    TOTAL_LOAD_PROC_TIME              0                 0

You generally want to aim for a compile time percentage of 5% or less. Remember that MONREPORT.DBSUMMARY only reports data over the interval that you give it, with a default of 10 seconds, so you want to run this over time and at many different times before making a decision based upon it.
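To compare against the 5% guideline with a bit more precision than the whole-number percentage column, you can compute it from the Total column. This is a sketch under one assumption of mine: that explicit and implicit compile time should be added together. The function name is hypothetical.

```shell
compile_pct() {
    # $1 = TOTAL_COMPILE_PROC_TIME
    # $2 = TOTAL_IMPLICIT_COMPILE_PROC_TIME
    # $3 = total processing time, all from the same report interval
    awk -v c="$1" -v i="$2" -v t="$3" 'BEGIN {
        if (t == 0) { print "n/a"; exit }   # idle interval, nothing to divide by
        printf "%.1f\n", 100 * (c + i) / t
    }'
}

# Numbers from the report section above
compile_pct 307 0 10968
```

For the sample report this gives 2.8, comfortably under 5.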

Summary

A properly sized package cache is important to database performance. The numbers and details presented here should help you find the appropriate size for your system.

Issues with STMM

I thought I’d share some issues with STMM that I’ve seen on Linux lately. I’ve mostly been a fan of STMM, and I still am for small environments that are largely transaction processing and have only one instance on a server.

Here are the details of this environment. The database is a small analytics environment. It used to be a BCU environment that was 4 data nodes and one coordinator node on 9.5. The database was less than a TB, uncompressed. There were also some single-partition databases for various purposes on the coordinator node. I’ve recently migrated it to BLU – 10.5 on Linux. The users are just starting to make heavier use of the environment, though I largely built it and moved some data about 6 months ago. The client does essentially a full re-load of all data once a month.

The new environment is two DB2 instances – one for the largely BLU database, and one for a transaction processing database that replaces most of the smaller databases from the coordinator node. Each instance has only one database. The server has 8 CPUs and about 64 GB of memory – the minimums for a BLU environment.

First Crash

The first crash we saw was both instances going down within 2 seconds of each other. The last message before the crash looked like this:

2015-08-06-17.58.02.253956+000 E548084503E579        LEVEL: Severe
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:20
MESSAGE : ADM0503C  An unexpected internal processing error has occurred. All
          DB2 processes associated with this instance have been shutdown.
          Diagnostic information has been recorded. Contact IBM Support for
          further assistance.

2015-08-06-17.58.02.574134+000 E548085083E455        LEVEL: Error
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:8959
DATA #1 : Process ID, 4 bytes
20775
DATA #2 : Hexdump, 8 bytes
0x00007FEF1BBFD1E8 : 0201 0000 0900 0000                        ........

2015-08-06-17.58.02.575748+000 I548085539E420        LEVEL: Info
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleCleanupResources, probe:5475
DATA #1 : String, 24 bytes
Process Termination Code
DATA #2 : Hex integer, 4 bytes
0x00000102

2015-08-06-17.58.02.580890+000 I548085960E848        LEVEL: Event
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, oper system services, sqlossig, probe:10
MESSAGE : Sending SIGKILL to the following process id
DATA #1 : signed integer, 4 bytes
...

The most frequent cause of this kind of error in my experience tends to be memory pressure at the OS level – the OS sees that too much memory is being used, and instead of crashing itself, it chooses the biggest consumer of memory to kill. On a DB2 database server, this is almost always db2sysc or another DB2 process. I still chose to open a ticket with support, to get confirmation on this and see if there was a known issue.

IBM support pointed me to this technote, confirming my suspicions: http://www-01.ibm.com/support/docview.wss?uid=swg21449871. They also recommended “have a Linux system administrator review the system memory usage and verify that there is available memory, including disk swap space. Most Linux kernels now allow for the tuning of the OOM-killer. It is recommended that a Linux system administrator perform a review and determine the appropriate settings.” I was a bit frustrated with this response as this box runs on a PureApp environment and runs only DB2. The solution is to tune the OOM-killer at the OS level?

While working on the issue I discovered that I had neglected to set INSTANCE_MEMORY/DATABASE_MEMORY to fixed values, as is best practice on a system with more than one DB2 instance when you’re trying to use STMM. So I set them for both instances and databases, allowing the BLU instance to have most of the memory. I went with the idea that this crash was basically my fault for not better limiting the two DB2 instances on a box. Though I wish STMM would play better for multiple instances.

Second Crash

Several weeks later, I had another crash, though this time only of the BLU instance, not of the other instance. It was clearly the same issue. I re-opened the PMR with support, and asked for help identifying what tuning I needed to do to keep these two instances from stepping on each other. IBM support again confirmed that it was a case of the OS killing DB2 due to memory pressure. This time, they recommended setting the Linux kernel parameter vm.swappiness to 0. While I worked on getting approvals for that, I tweeted about it. The DB2 Knowledge Center does recommend it be set to 0. I had it set to the default of 60.
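Checking the current value is a one-liner on Linux (`cat /proc/sys/vm/swappiness`). If you want it in a monitoring script, something like the sketch below works. The function name and the idea of passing a maximum acceptable value are my own; the optional second argument exists only so the check can be pointed at a different file.

```shell
check_swappiness() {
    # $1 = highest value you consider acceptable
    # $2 = file to read (defaults to the Linux procfs entry)
    max="$1"
    f="${2:-/proc/sys/vm/swappiness}"
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 2
    fi
    val=$(cat "$f")
    echo "vm.swappiness=$val"
    # Succeeds only when the current value is at or below the limit
    [ "$val" -le "$max" ]
}

# Example: check_swappiness 0 || echo "swappiness is higher than recommended"
```

Changing the value needs root (for example `sysctl -w vm.swappiness=0`), which is why I had to go get approvals.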

Resolution

Scott Hayes reached out to me on twitter because he had recently seen a similar issue. After a discussion with him about the details, I decided to implement a less drastic setting for vm.swappiness, and to instead abandon the use of STMM. I always set the package cache manually anyway. I had set catalog cache manually. Due to problems with loads, I had already set the utility heap manually. In BLU databases, STMM cannot tune sort memory areas. All of this meant that the only areas STMM was even able to tune in my BLU database were DBHEAP, LOCKLIST, and the buffer pools. I looked at what the current settings were and set these remaining areas to just below what STMM had them at.

I have already encountered one minor problem – apparently STMM had been increasing the DBHEAP each night during LOADs, so when they ran LOADs the first night, they failed due to insufficient DBHEAP. That was easy to fix, as the errors in the diagnostic log specified exactly how much DBHEAP was needed, so I manually increased the DBHEAP. I will have to keep a closer eye on performance tuning, but my monitoring already does things like send me an email when buffer pool hit ratios or other KPIs are off, so that’s not much of a stretch for me.

Registry Variables and DB2_WORKLOAD=WC

I was happy to see the new workload registry variable in Commerce 7/DB2 9. Mostly out of laziness – it requires fewer variables to be set manually, but it also ensures that nothing major is missed (I imagine they may come up with more that won’t be added to the workload registry variable in real time). I had a whole argument with whichever part of IBM a client brought in to do a load-test performance review because I had set one of the parameters to “ON” instead of “YES”. I recently ran across this statement in the Info Center:

Note: If a registry variable requires Boolean values as arguments,
the values YES, 1, and ON are all equivalent and the values NO, 0, and OFF
are also equivalent. For any variable, you can specify any of the appropriate
equivalent values.
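If you ever script a comparison of registry settings between environments, that note is worth encoding so that ON versus YES does not show up as a difference. A minimal sketch; the function names are mine, and it only accepts the exact spellings listed in the note.

```shell
registry_bool() {
    # Prints "true" or "false" for a Boolean registry value;
    # returns 1 for anything that is not one of the six documented spellings.
    case "$1" in
        YES|1|ON)  echo true ;;
        NO|0|OFF)  echo false ;;
        *)         return 1 ;;
    esac
}

same_setting() {
    # Succeeds when both values are Boolean and mean the same thing.
    a=$(registry_bool "$1") || return 1
    b=$(registry_bool "$2") || return 1
    [ "$a" = "$b" ]
}

# Example: same_setting ON YES && echo "equivalent"
```

Values such as YES_DEFERISCANFETCH are not plain Booleans, so this deliberately refuses to compare them.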

I will surely be quoting this and linking it in any future such disagreements.

If I look at what is set, it is mostly what we set by hand before, with a few additions:

[i] DB2_OPT_MAX_TEMP_SIZE=10240 [O]
[i] DB2_WORKLOAD=WC
[i] DB2_SKIPINSERTED=YES [O]
[i] DB2_OPTPROFILE=YES [O]
[i] DB2_USE_ALTERNATE_PAGE_CLEANING=ON
[i] DB2_INLIST_TO_NLJN=YES [O]
[i] DB2_MINIMIZE_LISTPREFETCH=YES [O]
[i] DB2_REDUCED_OPTIMIZATION=INDEX,JOIN,NO_SORT_MGJOIN,JULIE [O]
[i] DB2_EVALUNCOMMITTED=YES_DEFERISCANFETCH [O]
[i] DB2_ANTIJOIN=EXTEND [O]
[i] DB2_SKIPDELETED=YES [O]

I would like to hear the story behind DB2_REDUCED_OPTIMIZATION being set to “Julie” – what, was that someone’s girlfriend? Actually, that’s the parameter that has me most interested out of all of them (and most worried too).

I’m also interested in digging further into the use of optimization profiles and how Commerce 7 is using them. I do worry a bit that they may be locking in access methods that may not be appropriate for every database size/distribution.

I would like to find a complete reference on Commerce’s thoughts on each variable and why they work for Commerce. I’m just not a “trust me, it works” kind of person when it comes to these things.

HADR

What is HADR?

HADR is DB2’s implementation of log shipping, which means it’s a shared-nothing kind of product. But it is log shipping at the Log Buffer level instead of the Log File level, so it can be extremely up to date. It even has a Synchronous mode that would guarantee that committed transactions on one server would also be on another server (in years of experience on dozens of clients, I’ve only ever seen NEARSYNC used). It can only handle two servers (there’s no adding a third in), and is active/warm spare – only with 9.7 Fixpack 1 and later can you do reads on the standby, and you cannot do writes on the standby.

How much does it cost?

As always verify with IBM because licensing changes by region and other factors I’m not aware of. But generally HADR is included with DB2 licensing – the rub is usually in licensing DB2 on the standby server. Usually the standby server can be licensed at only 100 PVU, which is frequently much cheaper than full DB2 licensing. If you want to be able to do reads on the standby, though, you’ll have to go in for full licensing. Usually clients run HADR only in production, though I have seen a couple lately doing it in QA as well to have a testing ground.

What failures does it protect against?

HADR protects against hardware failures – CPU, disk, memory and the controllers and other hardware components. Tools like HACMP and Veritas use a shared-disk implementation, so cannot protect against disk failure. I have seen both SAN failures and RAID array (the whole array) failures, so it may seem like one in a million, but even the most redundant disks can fail. It can also be used to facilitate rolling hardware maintenance and rolling FixPacks. You are not guaranteed to be able to keep the database up during a full DB2 version upgrade. It must be combined with other (included) products to automatically sense failures and fail over.

What failures does it not protect against?

HADR does not protect against human error, data issues, and HADR failures. If someone deletes everything from a table and commits the delete, HADR is not going to be able to recover from that. It is not a replacement for a good backup and recovery strategy. You must also monitor HADR – I treat HADR down in production as a sev 1 issue where a DBA needs to be called out of bed to fix it. I have actually lost a production RAID array around 5 am when HADR had gone down around 1 am. Worst case scenarios do happen.

How to set it up

HADR is really not too difficult to set up on its own. Configuring automatic failover is a bit more difficult, though DB2 has made it significantly easier in 9.5 and above with the introduction of bundled TSA and the db2haicu tool. I’m not going to list every detail here because there are half a dozen white papers out there on how to set it up. The general idea is:

1. Set the HADR parameters on each server

HADR local host name                  (HADR_LOCAL_HOST) = your.primary.hostname 
HADR local service name                (HADR_LOCAL_SVC) = 18819
HADR remote host name                (HADR_REMOTE_HOST) = your.secondary.hostname
HADR remote service name              (HADR_REMOTE_SVC) = 18820
HADR instance name of remote server  (HADR_REMOTE_INST) = inst1
HADR timeout value                       (HADR_TIMEOUT) = 120
HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 120

2. Set the Alternate Servers on the Primary and the standby (for Automatic Client Reroute)

3. Set db configuration parameters INDEXREC to RESTART and LOGINDEXBUILD to ON

4. Take a backup (preferably Offline) of the database on the primary server

5. Restore the database on the standby server, leaving it in rollforward pending state

6. Start HADR on the standby

7. Start HADR on the primary

8. Wait 5 minutes and check HADR status

9. Run db2haicu to set up TSA for automated failover

10. Test multiple failure scenarios at the app and database level
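The database-level steps above can be sketched as a handful of shell functions wrapping the DB2 commands. Treat this as an outline under stated assumptions rather than a tested runbook: the database name SAMPLE, the backup path /db2/backup, and the function names are all hypothetical, and nothing is executed until you call a function on the appropriate server. Steps 9 and 10 (db2haicu and failover testing) are interactive and are not covered.

```shell
DB=SAMPLE                 # hypothetical database name
BACKUP_DIR=/db2/backup    # hypothetical path, must exist on both servers

hadr_cfg() {
    # Steps 1 and 3. Run on each server with local and remote values swapped.
    # $1 local host  $2 local port  $3 remote host  $4 remote port  $5 remote instance
    db2 update db cfg for "$DB" using \
        HADR_LOCAL_HOST "$1" HADR_LOCAL_SVC "$2" \
        HADR_REMOTE_HOST "$3" HADR_REMOTE_SVC "$4" \
        HADR_REMOTE_INST "$5" HADR_TIMEOUT 120 \
        HADR_SYNCMODE NEARSYNC HADR_PEER_WINDOW 120
    db2 update db cfg for "$DB" using INDEXREC RESTART LOGINDEXBUILD ON
}

hadr_alt_server() {
    # Step 2. $1 = the other server's hostname, $2 = its DB2 connection port
    db2 update alternate server for database "$DB" using hostname "$1" port "$2"
}

hadr_backup_primary() {
    # Step 4. Offline backup, so no connections may be active.
    db2 backup db "$DB" to "$BACKUP_DIR"
}

hadr_seed_standby() {
    # Steps 5 and 6. Restore without rolling forward, then start as standby.
    db2 restore db "$DB" from "$BACKUP_DIR"
    db2 start hadr on db "$DB" as standby
}

hadr_start_primary() {
    # Step 7. Only after the standby has been started.
    db2 start hadr on db "$DB" as primary
}

hadr_status() {
    # Step 8.
    db2pd -db "$DB" -hadr
}
```

On the primary you would call something like `hadr_cfg your.primary.hostname 18819 your.secondary.hostname 18820 inst1`, and on the standby the same with the hosts and ports reversed.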

For chunks of this, your database will be unavailable. There are also a number of inputs you need to have ready for running db2haicu, and you will need ongoing sudo authority to execute at least one TSA related command.

Remember that the primary and standby servers should be as identical as possible – filesystems, hardware, and software.

Some clients also neglect step #10 – testing of failovers. This is an important step to make sure you really can failover. It is possible to think you have everything set up right, do a failover and then not have it work properly from the application’s perspective.

Gotchas

This section represents hours spent troubleshooting different problems or recovering from them. I hope it can help someone find an issue faster.

HADR is extremely picky about its variables. They must be exactly right with no typos, or HADR will not work. I have, on several occasions, had numbers reversed or the instance name off, and spent a fair amount of time looking for the error before finding it. Because of this, it can help if you have another DBA look over the basics if things aren’t working on setup. HADR is also picky on hosts file and/or db2nodes.cfg set up, and in some cases you may end up using an IP address in the db cfg parameters instead of a hostname.

HADR also sometimes fails after it tells you it has successfully started, so you must check the status after you start it.

Occasionally HADR doesn’t like to work from an Online backup, so an Offline one will be required. I have one note about it not going well with a compressed backup, but that was years ago, and I frequently used compressed backups without trouble.

HADR does not copy things that aren’t logged – so it is not a good choice if you have non-logged LOBs or if you do non-recoverable loads. If you are using HADR and you do a non-recoverable load, any table with a non-recoverable load will not be copied over, nor will future changes to it, and if you fail over, you will not be able to access that table. The fix is to back up your primary database, restore it into your standby database, and start HADR again. For this reason, I wouldn’t use HADR in a scenario where you don’t have good control over data being loaded into the database.

HADR does go down sometimes without warning – so you must monitor it using whatever monitoring tools you have, and ensure that you respond very quickly when it goes down. I use db2pd to monitor (parsing output with scripts), partially because db2pd works when other monitoring tools hang. We look at ConnectStatus, State, and LogGapRunAvg.

On reboot, HADR comes up with database activation. Which means it usually comes up just fine on your primary database, but not on your standby database (no connections to prompt activation). So you’ll generally need to manually start hadr on your standby after a reboot. The primary database will not allow connections on activation until after it can communicate with the standby. This is to prevent a DBA’s worst nightmare – ‘Split Brain’. DB2’s protections against split-brain are pretty nifty. But this means that if you reboot both your primary and your standby at the same time and your primary comes up first, then your primary will not allow any connections until your standby is also up. This can be very confusing the first time or two that you see it. You can manually force the primary to start if you’re sure that the standby is not also up and taking transactions. Or if you’re rebooting both, just do the standby first and do the primary after the standby is back up and activated. If you need your standby down for a while, then stop HADR before you stop the servers. I would recommend NOT stopping HADR automatically on reboot, because the default behavior protects you from split-brain.

What is split-brain? It is simply both your primary and standby databases thinking they are the primary and taking transactions – getting you into a nearly impossible to resolve data conflict.

You must keep the same ids/groups on the primary and standby database servers. I’ve seen a situation on initial set up where the id that Commerce was using to connect to the database was only on the primary server, and not on the standby server, and thus on failover, the database looked fine, but Commerce could not connect.

You also want to be aware of any batch-jobs, data loads, or even scheduled maintenance like runstats or backups – when you fail over, you’ll need to run these on the other database server. Or you can also run them from a client that will get the ACR value and always point to the active database server. Frequently we don’t care which database server the database is running on, and may have it on what was initially the “standby” for months at a time.

Overall, I really like HADR and its ease of administration. The level of integration for TSA in 9.5/9.7 is great.

Managing db2 transaction log files

Logging method

There are two methods of logging that DB2 supports: Circular and Archive. I believe Oracle has similar modes.

Circular

To my extreme disgust, the default that Commerce uses if you don’t change anything is Circular logging. Circular logging is more often appropriate for databases where you don’t care about the data (seen it for databases supporting Tivoli and other vendors) or for Data Warehousing and Decision Support databases where you have extremely well defined data loading processes that can easily be re-done on demand. You must also be willing to take regular outages for database backups because Circular logging does not allow you to take online backups. Circular logging also does not allow you to rollforward or back through transaction logs to reach a point in time – any restores are ONLY to the time a backup was taken.

On every new build, I move away from circular logging. I just don’t find it appropriate for an OLTP database, where your requirements often include very high availability and the ability to recover from all kinds of disasters with no data loss.

Archive

So why, then, isn’t archive logging the default? Well, it requires proper management of transaction log files, which can really get you in trouble if you don’t know what you’re doing. If you compress or delete an active transaction log, you will crash your database and have to restore from a backup. I’ve seen it happen, and it’s not fun. And the highest frequency of OS level backups you’re willing to do should be applied to the directories holding transaction log files.

I ensure that my archive logs are always on a separate path from the active ones so I and whoever gets paged out when a filesystem is filling up can easily see which is which.

Personally, I use scripts to manage my transaction log files. I actually do most of it with my backup script. How long you keep them depends on your restore requirements and your overall backup/restore strategy. I also use a simple cron job to find files in the archive log path older than a certain time frame (1 day or 3 days is most common) and compress them. I hear that a nice safe way to delete logs is the prune logs command, but I never got used to it.
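The compress job can be as small as one `find` command. Here is a minimal sketch of that idea, not my actual script: the function name is made up, and it assumes the archived files follow DB2's usual S0000000.LOG naming. Point it ONLY at the archive log path, never at the active log path.

```shell
compress_old_logs() {
    # $1 = archive log directory (never the active log path)
    # $2 = age in days; find's "-mtime +N" matches files whose age,
    #      counted in whole days, is greater than N
    find "$1" -type f -name 'S[0-9]*.LOG' -mtime +"$2" -exec gzip {} \;
}

# Example: compress_old_logs /db2/archlogs 1
```

The name pattern skips anything that is not a log file, and files that are already compressed end in .gz, so they are not matched a second time.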

This is one of the areas where it is critical for DBAs to have an excruciatingly high level of attention to detail.

Logging settings

Ok, ready for the most complicated part?

All the settings discussed are in the db cfg.

LOGRETAIN

So the most important setting here is LOGRETAIN. If set to ‘NO’, then you have circular logging. If it is set to ‘Recovery’ then you have Archive logging. To enable archive logging, you simply update this parameter.

LOGARCHMETH1

Second in my mind is LOGARCHMETH1. This parameter specifies the separate path for your archive logs to be sent to. It can be a location on DISK or TSM. Do not leave it set to ‘LOGRETAIN’.

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0011448.html
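As a sketch of what setting this looks like, assuming a hypothetical database name and a hypothetical archive directory on disk, the update can be wrapped like this. Keep in mind that when a database moves from circular to archive logging, DB2 puts it in backup pending state, so plan for an offline backup right after the change.

```shell
enable_archive_logging() {
    # $1 = database name
    # $2 = archive log directory, kept separate from the active log path
    db2 update db cfg for "$1" using LOGARCHMETH1 "DISK:$2"
}

# Example (hypothetical names): enable_archive_logging SAMPLE /db2/archlogs
```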

WTH is this USEREXIT thing?

I undoubtedly have some newer DBAs wondering about this. The LOGARCHMETH1 parameter and the others that dictate the location of archive logs were only introduced in DB2 8 (or was it 7?). Before that, we had these nasty things called userexit programs. We had to locate C compilers to compile them, and keep track of where the uncompiled versions were in case changes were needed. And the compiled file had to be in the right place with the right permissions. Really, I hated working with them. But the functionality is still in DB2 to use them. I imagine they could do things you can’t do natively, but the parameters are so good that it’d be a rare situation that you need them.

LOGFILSIZ

This is the size of each transaction log file. Generally my default for Commerce is 10000 (which I think Commerce itself actually sets on instance creation), but I’ve gone higher – it’s not unusual to go up to 40,000 while data is being loaded or for stagingcopies.

LOGPRIMARY

This determines the number of log files of the size LOGFILSIZ that compose the database’s active log files. These are all created on database activation (which happens on first connection), so you don’t want to go too large. But you do want to generally have the space here to handle your active logs. Most Commerce databases do well at around 12.

LOGSECOND

This determines the number of additional active logs that can be allocated as needed. LOGPRIMARY + LOGSECOND cannot exceed 255. The nice thing about LOGSECOND is that these are not allocated on database activation, but only as needed. The other awesome thing here is that they can be increased online – one of the few logging parameters that can be. I usually start with 50, but increase if there’s a specific need for more. Remember, these should not be used on an ongoing basis – just to handle spikes.
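It helps to know how much disk these three parameters can add up to before you size the active log filesystem. A rough sketch using the starting values mentioned above (LOGFILSIZ 10000, LOGPRIMARY 12, LOGSECOND 50); the function name is mine, and the result ignores the small per-file header overhead, so treat it as an approximation.

```shell
active_log_bytes() {
    # $1 = LOGFILSIZ (4 KB pages per file)
    # $2 = LOGPRIMARY, $3 = LOGSECOND (pass 0 to see primary logs only)
    awk -v s="$1" -v p="$2" -v x="$3" 'BEGIN {
        printf "%.0f\n", (p + x) * s * 4096
    }'
}

echo "primary only:      $(active_log_bytes 10000 12 0) bytes"
echo "primary+secondary: $(active_log_bytes 10000 12 50) bytes"
```

That works out to a bit under half a GB allocated at activation, and roughly 2.5 GB if every secondary log gets used during a spike.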

All the others

There are all kinds of nifty things you can do with logging: infinite logging, mirrored logging, logging to a raw device, etc. I’m not going to cover all the logging parameters there are in this post.

Potential issues

Deleting or compressing an active log file

The best case if you delete or compress an active log file is that DB2 is able to recreate it. This may affect your ability to take online backups. The worst (and more likely) case is that your database ceases functioning and you have to restore from backup. Keep your active and archive logs in separate directories to help prevent this, and educate anyone who might try to alleviate a filesystem full. If you do get an error on an online backup referencing the inability to include a log file, take an offline backup just as soon as you can – you will be unable to take online backups until you do.

Filling up a filesystem due to not managing log files

If your archive log filesystem is separate and fills up, it doesn’t hurt anything. If the filesystem your active log path is on fills up, your database will be inaccessible until you clear up the filesystem full. The moment the filesystem is no longer full, the database will function, so there is no need to restore. I recommend monitoring for any filesystems involved in transaction logging.

Deleting too many log files and impacting recovery

If you’re on anything before DB2 9.5, make absolutely sure that you use the “include logs” keyword on the backup command. If you don’t, you may end up with a backup that is completely useless, because you MUST have at least one log file to restore from an online backup. When you delete log files, keep in mind your backup/recovery strategy. There’s very little worse than really needing to restore but being unable to do so because you’re missing a file. I recommend backing up your transaction logs to tape or through other OS level methods as frequently as you can.

Deleting recent files and impacting HADR

Sometimes HADR needs to access archive log files – especially if HADR is behind and needs to catch up. If the files it needs have already been deleted, you have to re-set-up HADR using a database restore. If you’re using HADR, it is important to monitor HADR so you can catch failures as soon as possible and reduce the need for archive logs.

Log files too small

Tuning the size of your log files may be a topic for another post, but I’ll cover the highlights. Large deletes are the most likely to chew through everything you’ve got. The best solution is to break up large units of work into smaller pieces, especially deletes. Where that’s not possible (ahem, stagingcopy), you’ll need to increase any of LOGFILSIZ, LOGPRIMARY, or LOGSECOND. Only LOGSECOND can be changed without recycling the database.

Log file saturation

This one confuses the heck out of new DBAs. You get what looks like a log file full, yet the disk is not full and a snapshot says there’s plenty of log space available. The problem here is that with archive logging, log files and each spot in those log files must be used sequentially – even if there are things that have already been committed. Normally the database is rolling through the logs, with the same number of files active at once, but constantly changing which files.

Sometimes an old connection is sitting out there hanging on to a page in the log file with an uncommitted unit of work. Then the connection becomes idle and stays that way, sometimes for days. Then DB2 gets to the point where it has to open another log file, and it can’t because that would be more than it is allowed to allocate. So it throws an error that looks pretty similar to log file full. In that case, you must force off the old idle connection. Details are written to the diag log, and you can also use a database snapshot to get the id of the connection holding the oldest log file.
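Pulling the offending connection out of a database snapshot can be scripted. This is a sketch that assumes the snapshot contains a line beginning with "Appl id holding the oldest transaction" followed by an equals sign and the application handle; check that against the output on your own version before relying on it. The function name is hypothetical.

```shell
oldest_txn_appl() {
    # Reads database snapshot text on stdin and prints the application
    # handle from the "oldest transaction" line, if there is one.
    awk -F'=' '/Appl id holding the oldest transaction/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the value
        print $2
        exit
    }'
}

# Example usage (database name is hypothetical):
#   handle=$(db2 get snapshot for database on SAMPLE | oldest_txn_appl)
#   db2 "force application ($handle)"
```

I would look at what the connection is before forcing it, rather than forcing automatically.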

This never happens to Commerce’s own connections, in my experience. It is usually a developer’s connection from what I’ve seen in Commerce databases. Commerce when functioning normally rarely has a connection with more than 5 minutes of idle time. So I like to have a db2 governor running that forces off connections that are IDLE for more than 4 hours.

Locking Parameters

So I thought I’d write a post covering locking parameters. This is by no means a comprehensive coverage of isolation levels and locking, but more of a practically oriented guide to the parameters available in DB2 that relate to locking.

LOCKTIMEOUT

This database configuration parameter specifies the time in seconds that a connection will wait for a needed lock before returning an error to the user.

Locktimeout is actually powerful functionality for OLTP/e-commerce databases. The idea is that an application should either do its work or fail and get out of the way. DB2 has a bit of a bad reputation for concurrency. I tend to think that this is because DB2 favors data integrity over concurrency, but I’m sure an Oracle dba would disagree with that characterization. For WebSphere Commerce or any OLTP/e-commerce databases, LOCKTIMEOUT should be set to a value between 30 and 90 seconds. Other types of databases may have other appropriate values.

Be exceedingly careful with the default of -1, though. -1 means “wait forever”, and this has a couple of implications. The first is that this “wait forever” may appear to the end user to be a hang – a query that is simply not returning results. The other one is that you can end up with some interesting lock chaining scenarios. The main problem is not always that one connection is waiting on one other connection – the problem tends to be that the waiting connection also has a dozen or a hundred other locks, and other connections may pile up behind those locks. db2top even has an option from the locks screen to list out lock chains. I’ve seen some ESB databases (where the ESB application holds a lock on the SIBOWNER table continuously) where runstats and/or automatic runstats evaluation have piled up behind the application locks over the course of weeks to the point where the database finally becomes unusable, and the various runstats have to be manually cancelled. This does not occur if LOCKTIMEOUT is set to a value.

To check your current value:

$ db2 get db cfg for <db_name> |grep LOCKTIMEOUT
 Lock timeout (sec)                        (LOCKTIMEOUT) = 60

The database must be recycled for changes to LOCKTIMEOUT to take effect.

LOCKTIMEOUT info center entry

LOCKLIST

This database configuration parameter is the size in 4k pages of the area of memory that DB2 uses to store locking information.

Contrary to some beliefs, changes to this parameter will not help locktimeout or deadlock issues unless lock escalations are also occurring. I have to explain this at least a couple of times a year to one client or another. Generally the only time you will tune this is if you do see lock escalation. Lock escalations are noted in the DB2 diagnostic log and also in database snapshots. This parameter can be designated as one of the ones that is automatically tuned by STMM. If you are not allowing STMM to automatically tune it, I do recommend going higher than the default to start – I usually start with 4800 when manually tuning.

Each lock takes either 128 or 256 bytes, depending on whether other locks are held on the same object.
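
To get a feel for what a given LOCKLIST value buys you, the arithmetic can be sketched as below. The function name locklist_capacity is hypothetical, and the 128 and 256 byte figures are simply the ones quoted above; they may differ on your version or platform.

```shell
# Rough number of locks that fit in the lock list.
# Assumes 4 KB pages and between 128 and 256 bytes per lock (from the text).
locklist_capacity() {
    pages=$1                      # LOCKLIST value, in 4 KB pages
    bytes=$((pages * 4096))
    echo "min_locks=$((bytes / 256)) max_locks=$((bytes / 128))"
}

locklist_capacity 4800    # prints min_locks=76800 max_locks=153600
```

So a LOCKLIST of 4800 pages is roughly 19 MB of memory and holds somewhere between about 77,000 and 154,000 locks across all connections.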

To check your current value:

$ db2 get db cfg for <db_name> |grep LOCKLIST
 Max storage for lock list (4KB)              (LOCKLIST) = 4800

One nice thing is that any changes to this parameter will take place immediately – no need to recycle the database or the instance for it to take effect.

LOCKLIST info center entry

MAXLOCKS

This database configuration parameter specifies the maximum percentage of the LOCKLIST that a single connection can use. This is designed to help prevent all of the LOCKLIST being consumed by a single connection. It is something that is more likely to be tuned on previous versions of DB2 where LOCKLIST could not be set to automatically increase to avoid locking issues, and may still need tuning, particularly on ODS or DW databases where memory is constrained. Like LOCKLIST, this parameter can be set to automatic.

To check your current value:

$ db2 get db cfg for <db_name> |grep MAXLOCKS
 Percent. of lock lists per application       (MAXLOCKS) = 10

Like LOCKLIST, this parameter can be changed online with changes taking place immediately with no recycle needed.
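
Putting LOCKLIST and MAXLOCKS together gives a rough idea of how many locks one connection can hold before it reaches its share of the lock list, which is the point where DB2 would be expected to escalate. A minimal sketch, with a made-up function name and assuming 128 to 256 bytes per lock:

```shell
# Approximate per-connection lock budget.
# $1 = LOCKLIST in 4 KB pages, $2 = MAXLOCKS as a percentage.
maxlocks_threshold() {
    pages=$1
    pct=$2
    bytes=$((pages * 4096 * pct / 100))
    echo "min_locks=$((bytes / 256)) max_locks=$((bytes / 128))"
}

maxlocks_threshold 4800 10    # prints min_locks=7680 max_locks=15360
```

With the sample values shown here (LOCKLIST 4800, MAXLOCKS 10), a single connection updating tens of thousands of rows in one unit of work could plausibly reach that budget.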

MAXLOCKS info center entry

DLCHKTIME

This database configuration parameter specifies how often (in milliseconds) db2 checks for deadlocks. The default is 10,000 ms (10 seconds). I don’t think I’ve ever seen this one changed.

To check your current value:

$ db2 get db cfg for <db_name> |grep DLCHKTIME
 Interval for checking deadlock (ms)         (DLCHKTIME) = 10000

If you do end up having to change it, you can change it immediately without a database or instance recycle.

DLCHKTIME info center entry

DB2_EVALUNCOMMITTED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows db2 to evaluate rows to see if they meet the conditions of the query BEFORE locking the row when using RS or CS isolation levels. The normal behavior for these isolation levels would be to lock the row before determining if it matched.

To check your current value:

$ db2set -all |grep DB2_EVALUNCOMMITTED
[i] DB2_EVALUNCOMMITTED=YES

If nothing is returned from this command, then the parameter is not set. The entire db2 instance must be recycled (db2stop/db2start) for changes to this parameter to take effect.

DB2_EVALUNCOMMITTED info center entry

DB2_SKIPDELETED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows DB2 to skip uncommitted deleted rows during index scans. If it is not set, DB2 continues to evaluate uncommitted deleted rows in indexes until the delete is actually committed.

To check your current value:

$ db2set -all |grep DB2_SKIPDELETED
[i] DB2_SKIPDELETED=ON

If nothing is returned from this command, then the parameter is not set. The entire db2 instance must be recycled (db2stop/db2start) for changes to this parameter to take effect.

DB2_SKIPDELETED info center entry

DB2_SKIPINSERTED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows DB2 to skip uncommitted newly inserted rows. If this parameter is not set, DB2 waits for the inserts to be committed or rolled back before continuing – you can see how this might not be ideal for a database that requires high concurrency. Like the others, this applies to CS and RS isolation levels.

To check your current value:

$ db2set -all |grep DB2_SKIPINSERTED
[i] DB2_SKIPINSERTED=ON
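
Since all three variables tend to be set (or not set) together, it can be handy to check them in one pass. The following is a sketch only; locking_registry_report is a hypothetical name, and it assumes db2set -all prints lines in the "[i] NAME=VALUE" form shown above.

```shell
# Hypothetical helper: reads "db2set -all" output on stdin and prints the
# value of each of the three locking-related registry variables.
locking_registry_report() {
    input=$(cat)
    for var in DB2_EVALUNCOMMITTED DB2_SKIPDELETED DB2_SKIPINSERTED; do
        # Drop the "[x] NAME=" prefix and keep whatever follows the "=".
        val=$(printf '%s\n' "$input" | sed -n "s/^\[[a-z]*] *$var=//p" | head -n 1)
        echo "$var=${val:-<not set>}"
    done
}

# Usage (needs a DB2 environment):
#   db2set -all | locking_registry_report
```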

DB2_SKIPINSERTED info center entry

Deadlock Event Monitor

While not a parameter per se, having your deadlock event monitor properly set up is important to deadlock analysis. DB2 starting with version 8 comes with a detailed deadlock event monitor enabled by default. This is actually awesome because it means that in many cases, we have the data we need to analyze deadlocks after they happen. But one of the problems is that this event monitor is set up with very little space. Because of this, I re-create it whenever I’m setting up a new database. My favorite syntax for that is:

db2 "create event monitor <dl_evmon_name> for deadlocks with details write to file 'ros_detaildeadlock' maxfiles 2000 maxfilesize 10000 blocked append autostart"

You have to have the disk space to support that, and you still may have to clear out old output files by dropping and recreating the event monitor from time to time.
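
To put a number on “the disk space to support that”, here is the worst-case arithmetic as a sketch. It assumes MAXFILESIZE is expressed in 4 KB pages, which is how the CREATE EVENT MONITOR documentation describes it; the function name evmon_max_kb is made up.

```shell
# Upper bound on disk used by a file event monitor.
# $1 = MAXFILES, $2 = MAXFILESIZE (assumed to be in 4 KB pages).
evmon_max_kb() {
    maxfiles=$1
    maxfilesize_pages=$2
    echo $((maxfiles * maxfilesize_pages * 4))
}

kb=$(evmon_max_kb 2000 10000)
echo "up to $kb KB (about $((kb / 1024 / 1024)) GB)"
# prints: up to 80000000 KB (about 76 GB)
```

That is the ceiling, not what a quiet database will use, but it is worth knowing before pointing the monitor at a small filesystem.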

I frequently get questions about having the deadlock event monitor write to tables. My inclination is not to write to tables – mostly because I’ve seen deadlocking issues with hundreds or thousands of deadlocks per hour that just creamed the database – and the additional load of writing to tables during such an issue might make things even worse.

So I hope something there helps someone who is looking at locking parameters.

Analyzing Deadlocks – the old way


In 9.7, DB2 started offering a new monitoring method for deadlocking. Though this post describes the “old” way, this method also works in db2 9.7. Detailed deadlock event monitors have been deprecated, but not yet removed. This means that even in 9.7, you can still create them and work with them.

If you’re at all confused about the difference between deadlocks and lock timeouts, please first read my post on Deadlocks VS. Lock timeouts.

Creating the deadlock event monitor

One of the most critical things here is that you must have the detailed deadlock event monitor in place and working before you run into an issue. By default (even in 9.7), db2 has one called simply ‘db2detaildeadlock’. The only problem with it is that it may run out of space rather quickly. As a result, I re-create it on build, using this syntax (you’ll need a database connection, of course):

db2 "create event monitor my_detaildeadlock for deadlocks with details write to file 'my_detaildeadlock' maxfiles 2000 maxfilesize 10000 blocked append autostart"
DB20000I  The SQL command completed successfully.

In addition, you must manually create the directory for the event monitor. It goes in the ‘db2event’ subdirectory of the database path, so in my latest example, I used something like this statement to create it:

mkdir /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock

And then there’s activating the new one and dropping the old one:

db2 "set event monitor my_detaildeadlock state=1"
DB20000I  The SQL command completed successfully.

db2 "set event monitor db2detaildeadlock state=0"
DB20000I  The SQL command completed successfully.

db2 "drop event monitor db2detaildeadlock"
DB20000I  The SQL command completed successfully.

Finally, you’ll want to verify the event monitor state:

> db2 "select substr(evmonname,1,30) as evmonname, EVENT_MON_STATE(evmonname) as state from syscat.eventmonitors with ur"

EVMONNAME                      STATE
------------------------------ -----------
MY_DETAILDEADLOCK                        1

  1 record(s) selected.

1 means active for the state, and 0 means not active, so this is the output we want.

Parsing and Analyzing output

Now that you’ve got the event monitor running, what do you do with it? Well, assuming you actually have some deadlocks (as you can tell through the snapshot event monitor, using db2top, db2pd, db2 admin views, or the get snapshot command), you’ll want to flush the event monitor and convert its output to human readable form:

> db2 flush event monitor MY_DETAILDEADLOCK
DB20000I  The SQL command completed successfully.
> db2evmon -path /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock >deadlocks.out

Reading /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock/00000000.evt ...

Your path may be different, of course. I prefer the path option on db2evmon because I’ve had fewer problems with it. There is an option to specify the dbname and event monitor name – I just find that it’s not as reliable.

So now you’ve done the easy part. Yep, that’s right, that’s the easy part. Depending on the number of deadlocks, you may now have a giant file. I seem to remember parsing a 15 GB one at one time. Here are some snippets of the output to give an idea of what you’re looking at:

379) Deadlock Event ...
  Deadlock ID:   20
  Number of applications deadlocked: 2
  Deadlock detection time: 01/03/2012 14:06:13.425034
  Rolled back Appl participant no: 2
  Rolled back Appl Id: 172.19.10.61.37259.120103200006
  Rolled back Appl seq number: : 0009
...
381) Deadlocked Connection ...
  Deadlock ID:   20
  Participant no.: 2
  Participant no. holding the lock: 1
  Appl Id: 172.19.10.61.37259.120103200006
  Appl Seq number: 00009
  Tpmon Client Workstation: spp27comm02x
  Appl Id of connection holding the lock: 172.19.10.61.62895.120103194755
  Seq. no. of connection holding the lock: 00001
  Lock wait start time: 01/03/2012 14:06:03.651592
  Lock Name       : 0x02000C1A1500BCE31800000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x00000001
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : none
  Deadlock detection time: 01/03/2012 14:06:13.425119
  Table of lock waited on      : USERS
  Schema of lock waited on     : WSCOMUSR
  Data partition id for table  : 0
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: X   - Exclusive
  Mode application requested on lock: NS  - Share (and Next Key Share)
  Node lock occured on: 0
  Lock object name: 106899963925
  Application Handle: 47264
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Fetch
    Section  : 2
    Creator  : NULLID
    Package  : SYSSH200
    Cursor   : SQL_CURSH200C2
    Cursor was blocking: FALSE
    Text     : SELECT T1.STATE, T1.MEMBER_ID, T1.OPTCOUNTER, T1.TYPE, T2.FIELD2, T2.REGISTRATIONUPDATE, T2.FIELD3, T2.LASTORDER, T2.LANGUAGE_ID, T2.PREVLASTSESSION, T2.SETCCURR, T2.DN, T2.REGISTRATIONCANCEL, T2.LASTSESSION, T2.REGISTRATION, T2.FIELD1, T2.REGISTERTYPE, T2.PROFILETYPE, T2.PERSONALIZATIONID FROM MEMBER  T1, USERS  T2 WHERE T1.TYPE = 'U' AND T1.MEMBER_ID = T2.USERS_ID AND T1.MEMBER_ID = ?
  List of Locks:
...
383) Deadlocked Connection ...
  Deadlock ID:   20
  Participant no.: 1
  Participant no. holding the lock: 2
  Appl Id: 172.19.10.61.62895.120103194755
  Appl Seq number: 00905
  Tpmon Client Workstation: spp27comm02x
  Appl Id of connection holding the lock: 172.19.10.61.37259.120103200006
  Seq. no. of connection holding the lock: 00001
  Lock wait start time: 01/03/2012 14:06:03.657097
  Lock Name       : 0x02000D0E2F00F8D61800000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x40000000
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : U   - Update
  Deadlock detection time: 01/03/2012 14:06:13.425274
  Table of lock waited on      : MEMBER
  Schema of lock waited on     : WSCOMUSR
  Data partition id for table  : 0
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: NS  - Share (and Next Key Share)
  Mode application requested on lock: X   - Exclusive
  Node lock occured on: 0
  Lock object name: 106685792303
  Application Handle: 47206
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Execute
    Section  : 25
    Creator  : NULLID
    Package  : SYSSH200
    Cursor   : SQL_CURSH200C25
    Cursor was blocking: FALSE
    Text     : UPDATE MEMBER  SET STATE = ?, OPTCOUNTER = ? WHERE MEMBER_ID = ? AND OPTCOUNTER = ?
  List of Locks:
...

I’ve removed the list of locks due to length, and also entries on the connection events, but have not altered the actual output here.

The “Deadlock ID” here lets us identify which deadlock this was a participant in. Deadlocks most frequently involve 2 connections, but they can involve 3, 4, 5, or even more.

Looking at “Participant no” both in the “Deadlock Event” section and the “Deadlocked Connection” sections and “Rolled back Appl participant no” in the “Deadlock Event” section, you can understand which statement was rolled back and which was allowed to continue.
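
When the file is large, pulling out just the “who got rolled back” information can be scripted. This is a rough sketch rather than a tested tool: the function name is made up, and it assumes the section headers and field labels look exactly like the sample output above.

```shell
# Hypothetical helper: for each "Deadlock Event" section in db2evmon output,
# print the deadlock ID followed by the application ID that was rolled back.
rolled_back_by_deadlock() {
    awk '
        # Section headers look like "379) Deadlock Event ..."
        /^[ \t]*[0-9]+\) / { in_event = ($0 ~ /\) Deadlock Event/) }
        in_event && /Deadlock ID:/         { id = $NF }
        in_event && /Rolled back Appl Id:/ { print id, $NF }
    ' "$1"
}

# Usage: rolled_back_by_deadlock deadlocks.out
```

The application ID it prints can then be matched against the “Appl Id” field of the “Deadlocked Connection” sections to find the statement that lost.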

There’s a lot more useful information there to parse through – most of it is pretty obvious in its meaning.

It is nice to go through and determine if the same statements were involved in deadlocks over and over again – which statements were most frequently involved in a deadlock. It’s also nice to analyze the timing of the deadlocks – I find summarizing by hour very useful in helping to determine if they were limited to a specific time period. It can also be interesting to summarize by table to see if a particular table is frequently involved.
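
The summaries by hour and by table can be roughed out with awk against the db2evmon output file. Treat this as a sketch under assumptions: both function names are invented for illustration, and the parsing relies on the section headers and field labels matching the sample output above.

```shell
# Hypothetical helpers for summarizing db2evmon deadlock output.

# Count deadlocks per hour. Only "Deadlock Event" sections are used, so each
# deadlock is counted once rather than once per participant.
deadlocks_by_hour() {
    awk '
        /^[ \t]*[0-9]+\) / { in_event = ($0 ~ /\) Deadlock Event/) }
        in_event && /Deadlock detection time:/ {
            # $4 is the date, $5 the time; keep only the hour
            count[$4 " " substr($5, 1, 2) ":00"]++
        }
        END { for (k in count) print count[k], k }
    ' "$1" | sort -k2
}

# Count lock waits per table across the "Deadlocked Connection" sections.
deadlocks_by_table() {
    awk -F': *' '
        /Table of lock waited on/  { tbl = $2; sub(/[ \t]+$/, "", tbl) }
        /Schema of lock waited on/ {
            sch = $2; sub(/[ \t]+$/, "", sch)
            count[sch "." tbl]++
        }
        END { for (k in count) print count[k], k }
    ' "$1" | sort -k1,1nr -k2
}

# Usage:
#   deadlocks_by_hour deadlocks.out
#   deadlocks_by_table deadlocks.out
```

Note that the sort in deadlocks_by_hour is a plain text sort on the MM/DD/YYYY date, so it will not be chronological across a year boundary.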

What to do with the analysis

The number one thing to do with what you find is to provide the SQL to your developers. They should be able to understand where that SQL is coming from in your application, and should be able to come up with ideas to reduce the deadlocking.

Remember that deadlocking is an application problem whose symptoms appear on the database. The sum total of everything you can do that might reduce deadlocking at the database level is:

  1. Keep runstats current
  2. Set the db2 registry variables, ONLY IF YOUR APPLICATION EXPLICITLY SUPPORTS THEM:
  • DB2_SKIPINSERTED
  • DB2_SKIPDELETED
  • DB2_EVALUNCOMMITTED

Increasing LOCKLIST will not help with deadlocking unless you’re also seeing lock escalations.

References:

Info Center entry on deprecation of detaildeadlock event monitor: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.wn.doc/doc/i0054715.html

Info Center entry on db2evmon: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0002046.html

Keep your eyes open for a new post on Analyzing Deadlocks the new way – using the event monitor for locks.


Analyzing Deadlocks – the new way


The section titled “To Format the Output to a Flat File” was updated on 2/13/2012.
Edit on 12/11/2014: This new method of analyzing locking issues became available in DB2 9.7.

So you can still use the old way, and if you want to avoid event monitors that write to tables, that’s still the only way. See Analyzing Deadlocks – the old way for more details on that method.

This new method is not yet enabled by default, but I would expect it would be in a future release. The pattern I’ve seen IBM follow is that in one version, they introduce a new way of doing something and deprecate the old way. In the next version, they generally make the new way the default, requiring you to take more drastic actions to keep using the old one. Then in the following version, they remove the ability to use the old method. Sometimes they allow more than one version to pass in each of those steps, but it sure seems likely that’s the general direction they’re going now.

I haven’t actually done all that much with this in production, though I’m in the process of doing so now, so I don’t have a strong opinion which is better while we have the choice.

Creating the Event Monitor

First, you’ll want to create the table for this in a 32k tablespace. Assuming you don’t already have one, you need to do this:

Create the 32k bufferpool (if you’re working with WebSphere Commerce, this exists out of the box, but you still need to run the last two commands):

db2 "CREATE BUFFERPOOL BUFF32K IMMEDIATE SIZE 2500 AUTOMATIC PAGESIZE 32 K"

Create a 32k tablespace (using AST) – only needed if you don’t already have one you want to use for this:

db2 "create large tablespace TAB32K pagesize 32 K bufferpool BUFF32K dropped table recovery on"

Create a 32k temp tablespace (using AST) – only needed if you don’t already have one:

db2 "CREATE SYSTEM TEMPORARY TABLESPACE TEMPSYS32K PAGESIZE 32 K BUFFERPOOL BUFF32K"

Create the Event Monitor for Locks:

>db2 "create event monitor my_locks for locking write to unformatted event table (table dba.my_locks in TAB32K) autostart"
DB20000I  The SQL command completed successfully.
> db2 "set event monitor MY_LOCKS state=1"
DB20000I  The SQL command completed successfully.

Then verify that the event monitor has the correct state:

> db2 "select substr(evmonname,1,30) as evmonname, EVENT_MON_STATE(evmonname) as state from syscat.eventmonitors where evmonname='MY_LOCKS' with ur"

EVMONNAME                      STATE
------------------------------ -----------
MY_LOCKS                                1

  1 record(s) selected.

1 means active, so that’s what we want. 0 means not active.

Setting the Collection Parameters

So this was new to me. In addition to creating the event monitor for locks, you also have to enable collection of information in the database configuration or at the workload level. I’m not used to working at the workload level, so I’ll go with what makes more sense to me and give you the parameters to set at the database level.

MON_LOCKTIMEOUT

Changes to this parameter should take effect without a database recycle. By default, this parameter is set to “NONE”. Possible values include:

  • NONE – no data is collected on lock timeouts (DEFAULT)
  • WITHOUT_HIST – data about lock timeout events is sent to any active event monitor tracking locking events
  • HISTORY – the last 250 activities performed in the same UOW are tracked by event monitors tracking locking events, in addition to the data about lock timeout events.
  • HIST_AND_VALUES – In addition to the last 250 activities performed in the same UOW and the data about lock timeout events, values that are not long or xml data are also sent to any active event monitor tracking locking events

Based on similar settings with the deadlock event monitors, I’m going to guess that WITHOUT_HIST would be the most useful for everyday use. To set that, use this syntax:

> db2 update db cfg for dbname using MON_LOCKTIMEOUT WITHOUT_HIST immediate
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.

MON_DEADLOCK

The values here are similar to those for MON_LOCKTIMEOUT, but the default is different: here it is WITHOUT_HIST. Possible values include:

  • NONE – no data is collected on deadlocks
  • WITHOUT_HIST – data about deadlock events is sent to any active event monitor tracking locking events (DEFAULT)
  • HISTORY – the last 250 activities performed in the same UOW are tracked by event monitors tracking locking events, in addition to the data about deadlock events.
  • HIST_AND_VALUES – In addition to the last 250 activities performed in the same UOW and the data about deadlock events, values that are not long or xml data are also sent to any active event monitor tracking locking events

My initial thought here would also be to go with WITHOUT_HIST.

MON_LOCKWAIT

This has essentially the same options as the last two. By default, this parameter is set to “NONE”. Possible values include:

  • NONE – no data is collected on lock waits (DEFAULT)
  • WITHOUT_HIST – data about lock wait events is sent to any active event monitor tracking locking events
  • HISTORY – the last 250 activities performed in the same UOW are tracked by event monitors tracking locking events, in addition to the data about lock wait events.
  • HIST_AND_VALUES – In addition to the last 250 activities performed in the same UOW and the data about lock wait events, values that are not long or xml data are also sent to any active event monitor tracking locking events

I would be very cautious on setting this one away from the default. I consider some level of lock-waiting absolutely normal. If you were to set the value of the next related parameter – MON_LW_THRESH – too low, this could generate a huge amount of data. On the other hand, if your transaction time is being increased due to lock wait time, this may be valuable to use. I would go with the default of NONE for normal use.

MON_LW_THRESH

The default here is 5 seconds (5,000,000). For whatever reason, this is specified in microseconds (one million to the second). I certainly wouldn’t recommend setting this too low. This parameter only means something if MON_LOCKWAIT is set to something other than NONE.
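
Because the unit is easy to get wrong, a tiny conversion helper may save a typo. This is just a sketch; the function name seconds_to_lw_thresh is made up.

```shell
# Convert whole seconds to the microsecond value MON_LW_THRESH expects.
seconds_to_lw_thresh() {
    echo $(( $1 * 1000000 ))
}

seconds_to_lw_thresh 5    # prints 5000000, the default

# Usage (needs a DB2 environment; dbname is a placeholder):
#   db2 update db cfg for dbname using MON_LW_THRESH "$(seconds_to_lw_thresh 10)"
```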

MON_LCK_MSG_LVL

This parameter indicates what events will be logged to the notify log. Not being a big fan of the notify log, I’m not sure how much I’d use this, but here are the possible values:

  • 0 – No notification of locking phenomena is done, including deadlocks, lock escalations, and lock timeouts
  • 1 – Notification of only lock escalations is done – notifications are not done for deadlocks or lock timeouts
  • 2 – Notification of lock escalations and deadlocks is done – no notification of lock timeouts is done
  • 3 – Notification is done for lock escalations, deadlocks, and lock timeouts

My inclination here would be to set this to 2 – if I’m looking back, I want to know about deadlocks and escalations, and I worry that lock timeouts might be a bit much. To update this, use:

> db2 update db cfg for dbname using MON_LCK_MSG_LVL 2 immediate
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
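
With five related parameters in play, it can help to see them side by side. The sketch below is hypothetical (the function name is invented) and assumes each parameter appears in the db cfg output in the usual "(PARAM) = value" form.

```shell
# Hypothetical helper: reads "db2 get db cfg" output on stdin and prints the
# current value of each lock-monitoring parameter discussed above.
lock_monitoring_report() {
    input=$(cat)
    for p in MON_LOCKTIMEOUT MON_DEADLOCK MON_LOCKWAIT MON_LW_THRESH MON_LCK_MSG_LVL; do
        # Keep the text after "(PARAM) =" and trim trailing spaces.
        val=$(printf '%s\n' "$input" | sed -n "s/.*($p) *= *//p" | sed 's/ *$//' | head -n 1)
        echo "$p=${val:-<not found>}"
    done
}

# Usage (needs a DB2 environment; dbname is a placeholder):
#   db2 get db cfg for dbname | lock_monitoring_report
```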

Parsing/Analyzing Output

So that last bit was the easy part. That gets you basic collection of the data you need, in an unformatted way. Now you have to format the data and make sense of it. You can format the data either to a flat file or to tables. If you have severe locking issues, go for flat file to eliminate additional database load. If your locking issues are more moderate and you still have available capacity on your database server, go with the tables – they’re easier to query and summarize.

Getting output in human-readable format

To format the output to a flat file:

This section updated 2/13/2012.

When I first wrote this post, I was unable to test this method, as the samples referenced everywhere were missing from every 9.7 installation I have.

I had a bear of a time getting this to work. I started with the instructions in this technical article: http://www.ibm.com/developerworks/data/library/techarticle/dm-1004lockeventmonitor/

But that post references files that were missing from every 9.7 installation I have (over a dozen different servers on at least two different flavors of Linux). I asked a colleague who works on a wider variety of systems, and they did not have the same issue. So I have no idea what the deal is here, but there must be someone else in the same place as me, looking at the directories and doing find commands and so on with no results. Here’s how I finally made it work.

Make the directory $HOME/bin if you don’t already have it:

mkdir -p $HOME/bin

Add to your path $HOME/bin:$HOME/sqllib/java/jdk64/bin (substituting your instance home directory for $HOME). Here’s the syntax I used for that:

export PATH=$PATH:/db2home/db2inst1/bin:/db2home/db2inst1/sqllib/java/jdk64/bin

Into a file in $HOME/bin called db2evmonfmt.java, copy the content from this link: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.apdv.sample.doc%2Fdoc%2Fjava_jdbc%2Fs-db2evmonfmt-java.html

Install db2 on your laptop or a Windows system and copy C:\Program Files\IBM\SQLLIB_01\samples\java\jdbc\DB2EvmonLocking.xsl to $HOME/bin. I could not find DB2EvmonLocking.xsl online anywhere, and I don’t want to get myself on the wrong side of IBM legally by posting it myself, so if anyone reads this at IBM, I urge you to make this file available through the info center too.

Compile db2evmonfmt.java as the DB2 instance owner from the $HOME/bin directory:

javac db2evmonfmt.java

It will create two class files in the same directory.

Finally, use this to actually generate the flat-file report:

java db2evmonfmt -d sample -ue DBA.MY_LOCKS -ftext

Where the database name replaces ‘sample’, and the unformatted event monitor table name is “DBA.MY_LOCKS”.

The output’s not bad, overall – I think more useful than the db2detaildeadlock formatted output that we got with the old method. The output you get looks like this:

> java db2evmonfmt -d sample -ue DBA.MY_LOCKS -ftext
SELECT evmon.xmlreport FROM TABLE ( EVMON_FORMAT_UE_TO_XML( 'LOG_TO_FILE',FOR EACH ROW OF ( SELECT * FROM DBA.MY_LOCKS  ORDER BY EVENT_ID, EVENT_TIMESTAMP, EVENT_TYPE, MEMBER ))) AS evmon

-------------------------------------------------------
Event ID               : 1
Event Type             : LOCKTIMEOUT
Event Timestamp        : 2012-02-07-12.52.54.529330
Partition of detection : 0
-------------------------------------------------------

Participant No 1 requesting lock
----------------------------------
Lock Name            : 0x03002900000000000000000054
Lock wait start time : 2012-02-07-12.52.09.070093
Lock wait end time   : 2012-02-07-12.52.54.529330
Lock Type            : TABLE
Lock Specifics       :
Lock Attributes      : 00000000
Lock mode requested  : Intent Exclusive
Lock mode held       : Exclusive
Lock Count           : 0
Lock Hold Count      : 0
Lock rrIID           : 0
Lock Status          : Converting
Lock release flags   : 40000000
Tablespace TID       : 3
Tablespace Name      : TAB8K
Table FID            : 41
Table Schema         : WSCOMUSR
Table Name           : STAGLOG

Attributes            Requester                       Owner
--------------------- ------------------------------  ------------------------------
Participant No        1                               2
Application Handle    030682                          030734
Application ID        REDACTED                        REDACTED
Application Name      db2bp                           db2jcc_application
Authentication ID     REDACTED                        REDACTED
Requesting AgentID    1728                            1385
Coordinating AgentID  1728                            1385
Agent Status          UOW Executing                   UOW Waiting
Application Action    No action                       No action
Lock timeout value    45                              0
Lock wait value       0                               0
Workload ID           1                               1
Workload Name         SYSDEFAULTUSERWORKLOAD          SYSDEFAULTUSERWORKLOAD
Service subclass ID   13                              13
Service subclass      SYSDEFAULTSUBCLASS              SYSDEFAULTSUBCLASS
Current Request       Execute Immediate               Execute
TEntry state          2                               2
TEntry flags1         00000000                        00000000
TEntry flags2         00000200                        00000200
Lock escalation       no                              no
Client userid
Client wrkstnname                                     REDACTED
Client applname
Client acctng

Current Activities of Participant No 1
----------------------------------------
Activity ID        : 1
Uow ID             : 14
Package Name       : SQLC2H21
Package Schema     : NULLID
Package Version    :
Package Token      : AAAAAPAa
Package Sectno     : 203
Reopt value        : none
Incremental Bind   : no
Eff isolation      : CS
Eff degree         : 0
Eff locktimeout    : 45
Stmt first use     : 2012-02-07-12.31.06.297485
Stmt last use      : 2012-02-07-12.31.06.297485
Stmt unicode       : no
Stmt query ID      : 0
Stmt nesting level : 0
Stmt invocation ID : 0
Stmt source ID     : 0
Stmt pkgcache ID   : 1069446856888
Stmt type          : Dynamic
Stmt operation     : DML, Insert/Update/Delete
Stmt text          : delete from attr where attr_id >= 7000000000000000001 and attr_id <= 7000000000000000020
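
If you redirect that report to a file, a quick count of events by type is easy to script. This is a sketch with an invented function name, assuming every event in the report has an "Event Type" line formatted like the one in the sample output above.

```shell
# Hypothetical helper: count events by type in db2evmonfmt -ftext output
# that has been saved to a file.
events_by_type() {
    awk -F': *' '
        /^Event Type[ \t]*:/ {
            t = $2; sub(/[ \t]+$/, "", t)
            count[t]++
        }
        END { for (t in count) print count[t], t }
    ' "$1" | sort -k1,1nr -k2
}

# Usage:
#   java db2evmonfmt -d sample -ue DBA.MY_LOCKS -ftext > locks.out
#   events_by_type locks.out
```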

To format the output to tables (to tables in the ‘DBA’ schema for our example):

> db2 "call EVMON_FORMAT_UE_TO_TABLES ('LOCKING', NULL, NULL, NULL, 'DBA', NULL, NULL, -1, 'SELECT * FROM DBA.MY_LOCKS ORDER BY event_timestamp')"

  Return Status = 0

If you receive this error:

SQL0171N  The data type, length or value of the argument for the parameter in
position "3 (query) " of routine "XDB_DECOMP_XML_FROM_QUERY" is incorrect.
Parameter name: "".  SQLSTATE=42815

check to make sure that you’ve specified the right values for the type of event monitor, the schema where the tables should be created, and the table name the event monitor is writing to. Also check to make sure you have the appropriately sized system temporary tablespace, as an order-by on the un-formatted event monitor table will fail without it.

Assuming this is the first time you’ve run EVMON_FORMAT_UE_TO_TABLES, you’ll see a number of new tables:

LOCK_ACTIVITY_VALUES
LOCK_EVENT
LOCK_PARTICIPANTS
LOCK_PARTICIPANT_ACTIVITIES

Some interesting SQL to help you navigate the output in tables

Please first note that this SQL is tested only for functionality and not for performance – I haven’t explained it or done any optimization on it, so expect it to run poorly for large amounts of data.

First to list all of the locking events:

> db2 "select event_id, substr(event_type,1,18) as event_type, event_timestamp, dl_conns, rolled_back_participant_no from DBA.LOCK_EVENT order by event_id, event_timestamp with ur"

EVENT_ID             EVENT_TYPE         EVENT_TIMESTAMP            DL_CONNS    ROLLED_BACK_PARTICIPANT_NO
-------------------- ------------------ -------------------------- ----------- --------------------------
                   1 DEADLOCK           2012-01-23-15.36.56.036831           2                          2
                   1 DEADLOCK           2012-01-23-15.36.56.036831           2                          2
                   2 LOCKTIMEOUT        2012-01-23-15.43.06.875032           -                          -
                   2 LOCKTIMEOUT        2012-01-23-15.43.06.875032           -                          -

  4 record(s) selected.

To summarize counts

> db2 "select substr(event_type,1,18) as event_type, count(*) as count, sum(dl_conns) sum_involved_connections from DBA.LOCK_EVENT group by event_type with ur"

EVENT_TYPE         COUNT       SUM_INVOLVED_CONNECTIONS
------------------ ----------- ------------------------
DEADLOCK                     2                        4
LOCKTIMEOUT                  2                        -

  2 record(s) selected.

To summarize counts by hour

> db2 "select substr(event_type,1,18) as event_type, year(event_timestamp) as year, month(event_timestamp) as month, day(event_timestamp) as day, hour(event_timestamp) as hour, count(*) as count from DBA.LOCK_EVENT group by year(event_timestamp), month(event_timestamp), day(event_timestamp), hour(event_timestamp), event_type order by year(event_timestamp), month(event_timestamp), day(event_timestamp), hour(event_timestamp), event_type with ur"

EVENT_TYPE         YEAR        MONTH       DAY         HOUR        COUNT
------------------ ----------- ----------- ----------- ----------- -----------
DEADLOCK                  2012           1          23          15           2
LOCKTIMEOUT               2012           1          23          15           2

  2 record(s) selected.

To summarize by table and lock event type:

> db2 "select substr(lp.table_schema,1,18) as table_schema, substr(lp.table_name,1,30) as table_name, substr(le.event_type,1,18) as lock_event, count(*)/2 as count from DBA.LOCK_PARTICIPANTS lp, DBA.LOCK_EVENT le where lp.xmlid=le.xmlid group by lp.table_schema, lp.table_name, le.event_type order by lp.table_schema, lp.table_name, le.event_type with ur"
TABLE_SCHEMA       TABLE_NAME                     LOCK_EVENT         COUNT
------------------ ------------------------------ ------------------ -----------
DB2INST1           INVENTORY                      DEADLOCK                     2
DB2INST1           PURCHASEORDER                  DEADLOCK                     2
DB2INST1           PURCHASEORDER                  LOCKTIMEOUT                  2
-                  -                              LOCKTIMEOUT                  2

  4 record(s) selected.

If you want to only look at deadlocks or lock timeouts, it is easy to add a where clause on event_type to these queries.
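As a rough sketch of that where clause, the helper below prints the "summarize counts" query restricted to a single event type. The function name lock_event_sql is made up for illustration; the table and column names are the ones used in the queries above, and the event type values are assumed to be stored exactly as they appear in the output (DEADLOCK, LOCKTIMEOUT).

```shell
# Hypothetical helper: print the count query limited to one event type.
# $1 is the event type as it appears in DBA.LOCK_EVENT, e.g. DEADLOCK.
lock_event_sql() {
    printf "select substr(event_type,1,18) as event_type, count(*) as count, sum(dl_conns) sum_involved_connections from DBA.LOCK_EVENT where event_type = '%s' group by event_type with ur\n" "$1"
}

# Print the statement for review. On a server with DB2 available it
# could be executed with:  db2 "$(lock_event_sql DEADLOCK)"
lock_event_sql DEADLOCK
```

The same where clause can be dropped into the by-hour and by-table queries in the same position, just before the group by.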

To summarize by statement:

> db2 "with t1 as (select STMT_PKGCACHE_ID as STMT_PKGCACHE_ID, count(*) as stmt_count from dba.lock_participant_activities group by STMT_PKGCACHE_ID) select t1.stmt_count, (select substr(STMT_TEXT,1,100) as stmt_text from dba.lock_participant_activities a1 where a1.STMT_PKGCACHE_ID=t1.STMT_PKGCACHE_ID fetch first 1 row only) from t1 order by t1.stmt_count desc with ur"

STMT_COUNT  STMT_TEXT
----------- ----------------------------------------------------------------------------------------------------
          4 update db2inst1.purchaseorder set ORDERDATE=current timestamp - 7 days where POID='5000'
          2 update db2inst1.INVENTORY set QUANTITY= QUANTITY+5 where PID='100-101-01'

  2 record(s) selected.

The substr on stmt_text in the above statement is included for readability only – I would recommend removing that substr when actually using this SQL.

If you want to do the same thing, counting only statements involved in deadlocks, try this:

> db2 "with t1 as (select STMT_PKGCACHE_ID as STMT_PKGCACHE_ID, count(*) as stmt_count from dba.lock_participant_activities where XMLID like '%DEADLOCK%' group by STMT_PKGCACHE_ID) select t1.stmt_count, (select substr(STMT_TEXT,1,100) as stmt_text from dba.lock_participant_activities a1 where a1.STMT_PKGCACHE_ID=t1.STMT_PKGCACHE_ID fetch first 1 row only) from t1 with ur"

STMT_COUNT  STMT_TEXT
----------- ----------------------------------------------------------------------------------------------------
          2 update db2inst1.purchaseorder set ORDERDATE=current timestamp - 7 days where POID='5000'
          2 update db2inst1.INVENTORY set QUANTITY= QUANTITY+5 where PID='100-101-01'

  2 record(s) selected.

 

References

Statement on deprecation of the event monitor for detailed deadlocks: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.wn.doc/doc/i0054715.html

Syntax for creating lock event monitor: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0054074.html

Syntax for EVMON_FORMAT_UE_TO_TABLES: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0054910.html

Reference on tables data is written to: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.mon.doc/doc/r0055559.html

Good article on this topic on developerWorks: http://www.ibm.com/developerworks/data/library/techarticle/dm-1004lockeventmonitor/ – this also has descriptions of how to artificially create a deadlock on the sample database for testing purposes, but doesn’t include SQL to parse the output if you write the formatted data out to tables.

How to use the DB2 Governor to force off long idle connections

Ok, so the DB2 Governor is deprecated with 9.7. But its only replacement is a pay-for-use tool – the workload manager. So I imagine I’ll be writing a script to do the very basic things that I do with the DB2 Governor when it’s gone.

The DB2 Governor has a lot of purposes. It can be used to change the priority or limit connections based on a variety of criteria, but the way I use it is very basic. The main way I use it is to prevent log file saturation caused by idle connections. For a definition of log file saturation if you’re not familiar with it, read the bottom of this post: Managing db2 transaction log files.

I use the DB2 Governor to force off connections that have been idle for more than 4 hours. I’ve seen a number of cases where a hung connection causes log file saturation by holding on to an older log file. In many cases, these connections are random one-off connections that are not intentionally idle – either a developer has left a connection open or massload or some other tool has failed without releasing all resources.

Most WebSphere Commerce connections have some activity every 10 minutes or more frequently. If you’re running some other app, you’d obviously have to consider what timing is appropriate for you. Some applications may actually require a connection to the database with a lot of idle time.

Creating the governor config file

The config file is pretty simple:

1  { Wake up once every three minutes, database name is sample }
2  interval 180; dbname sample;
3
4  desc "Force off java applications idle for more than 4 hours"
5  applname java,java.exe
6  setlimit idle 14400
7  action force;

The leading numbers are added for convenience here, and are not a part of the file.

Line 1 is a comment line. Anything in curly brackets is a comment, and at least one line of comment is nice.

Line 2 simply sets the wake-up interval and database name. These clauses can be specified only once per file, so only one database and interval may be specified.

Line 4 is simply a descriptive line.

Line 5 sets the application names that will be affected. In my case, I only want to affect java connections, not command line or other connections.

Line 6 sets a limit for the idle time of 14,400 seconds or 4 hours.

Line 7 indicates that when the limit above is encountered, the connection in question will be forced off of the database.

Starting the governor

With the above in a file named whatever you like and in whatever location you like, you can then start the governor using this syntax:

db2gov start sample dbpartitionnum 0 /fully/qualified/path/db2gov.sample.cfg db2gov.sample.log

The database name in this example is ‘sample’. The dbpartitionnum clause is required even on single partition databases to prevent some odd error messages. The log file is created in the instance home directory under sqllib/logs.

If you get this error:

GOV1007N Governor already flagged as running

Then you must stop the governor before the start will succeed.

Keeping the governor up

There is no autostart function for the governor that I am aware of. So that leaves us to solve the situations of starting the governor on db2start or reboot and also if it should happen to crash. My preferred solution here is to run a simple script every 15 minutes or so that checks and sees if the governor is up, and starts it if not. Such a script has to also be able to either handle the GOV1007N error, or to always stop the governor before starting it.
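A minimal sketch of such a check script follows. It rests on assumptions that should be verified locally: that the governor daemon appears in the process list as db2govd, and that the database name, config file path, and log name match the examples in this post. It takes the "stop before start" route so that a stale running flag does not produce GOV1007N.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog for the governor.
# Assumption: the governor daemon shows up in `ps -ef` as "db2govd".
DBNAME=sample
GOVCFG=/fully/qualified/path/db2gov.sample.cfg
GOVLOG=db2gov.sample.log

# Succeeds if the process list read from stdin contains a governor daemon.
governor_running() {
    grep -q 'db2govd'
}

# Start the governor unless it is already up. The stop is issued first so
# that a leftover "already running" flag does not make the start fail.
ensure_governor() {
    if ps -ef | grep -v grep | governor_running; then
        echo "governor already running"
    else
        db2gov stop "$DBNAME" dbpartitionnum 0
        db2gov start "$DBNAME" dbpartitionnum 0 "$GOVCFG" "$GOVLOG"
    fi
}

# Only act where the governor executable is actually available.
if command -v db2gov >/dev/null 2>&1; then
    ensure_governor
fi
```

Scheduled from the instance owner's crontab every 15 minutes (with the DB2 profile sourced first), this covers the db2start, reboot, and crash cases with one script. If more than one database on the instance runs a governor, the process check would need to match on the database name as well.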

Stopping the governor

Stopping the governor is quite simple:

db2gov stop sample dbpartitionnum 0

The only details we have to specify are the database name and the same dbpartitionnum clause.

Introducing Parameter Wednesday – DBM CFG: NUMDB

This is a new blog post format I’m introducing. I’m declaring Wednesday Parameter Day. That means each Wednesday, I’ll pick a parameter and cover it in excruciating detail. Some of the details will come straight out of the info center, but I’ll add my own experiences and insight geared towards e-commerce databases and throw in specifics for WebSphere Commerce from time to time. I’m selecting my own order – they’re not necessarily going to be in the order of where they are listed or alphabetical or even of impact or anything. That also means if you want more details on a parameter, comment or email me and I’ll generally be glad to slip it in. I’m starting with a relatively simple one to get the format worked out.

DB2 Version This Was Written For

9.7

Parameter Name

NUMDB

Where This Parameter Lives

Database Manager Configuration

Description

Defines the maximum number of databases that can be concurrently active

Impact

If this is set too low, you will get an error. May impact how memory is allocated, so shouldn’t be set too high.

Default

8 (UNIX/Linux)

8 (Windows server with local and remote clients)

3 (Windows server with local clients)

Range

1-256

Recycle Required To Take Effect?

DB2 instance recycle is required for this to take effect.

Can It Be Set To AUTOMATIC?

No, this cannot be set to AUTOMATIC

How To Change It

db2 update dbm cfg using NUMDB N

where N is the number you wish to set NUMDB to

Rule of Thumb

Leave this at the default unless you specifically know you will have more than that number of concurrently active databases.
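One way to see how close an instance is to the limit is to count the entries returned by "db2 list active databases". The sketch below assumes each active database contributes one line beginning with "Database name"; the function name and the sample text (database names and paths) are invented for illustration.

```shell
# Count databases in "db2 list active databases" output read from stdin.
# Assumes one "Database name" line per active database.
count_active_dbs() {
    grep -c '^ *Database name'
}

# Illustrative sample text only. On a real instance use:
#   db2 list active databases | count_active_dbs
count_active_dbs <<'EOF'
                           Active Databases

Database name                              = SAMPLE
Applications connected currently           = 1
Database path                              = /db2/SAMPLE/NODE0000/SQL00001/

Database name                              = WC005S01
Applications connected currently           = 12
Database path                              = /db2/WC005S01/NODE0000/SQL00002/
EOF
```

Comparing that count against the NUMDB value from "db2 get dbm cfg" shows how much headroom is left before an error would be returned.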

Tuning Considerations

Since an error message is returned when you need to change this parameter, there isn’t a lot of tuning to be done.

Related Error Messages

SQL1041N The maximum number of concurrent databases have already been
started. SQLSTATE=57032

This error indicates that NUMDB needs to be higher.

War Stories From The Real World

This is generally a pretty boring parameter. I’ve seen as many as 14 databases on a single instance. I would generally aim for having fewer than 8 databases on an instance anyway. I’m a big fan of the one database on one instance approach unless we’re talking about small databases like configuration databases or ESB databases. I’ve certainly had to increase the parameter, but there’s nothing complicated about doing that.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000278.html

Related Parameters

The info center lists all the parameters that control memory allocated on a per-database basis. See the info center link above to see those links.

What DBAs can do to Reduce Deadlocks

Deadlocking is an application problem. There are only a few things that DBAs can do to reduce deadlocking, and they all require buy-in from the application. Let me repeat that another way. Don’t set the parameters mentioned here without understanding the impact on your application.

Currently Committed

This is new behavior in DB2 9.7. It has a similar effect to Oracle’s locking methodology in that “Readers don’t block writers and writers don’t block readers”. If you’ve created your database on 9.7, it is on by default. If you upgrade to 9.7, you can turn it on like this:

db2 update db cfg for <dbname> using CUR_COMMIT ON

It is a database configuration parameter, so you’d have to look at it for each database if you have more than one on an instance.
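For an instance with several databases, a loop along these lines can show the setting for each one. This is a sketch under assumptions: local databases are taken to be the entries in "db2 list db directory" whose directory entry type is Indirect, and the helper name local_dbs is made up.

```shell
# Print the aliases of local databases from "db2 list db directory"
# output read on stdin. Entries with type Indirect are treated as local.
local_dbs() {
    awk -F'= *' '
        /Database alias/       { alias = $2 }
        /Directory entry type/ { if ($2 ~ /Indirect/) print alias }
    '
}

# Show CUR_COMMIT for every local database, if DB2 is available here.
if command -v db2 >/dev/null 2>&1; then
    for db in $(db2 list db directory | local_dbs); do
        echo "== $db"
        db2 get db cfg for "$db" | grep CUR_COMMIT
    done
fi
```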

One thing I like about this is that it can be set separately from DB2_COMPATIBILITY_VECTOR. For applications like WebSphere Commerce that don’t support DB2_COMPATIBILITY_VECTOR, this is nice.

Because this reduces locking overall, it also reduces deadlocking. It also changes how locks are acquired, so your application should explicitly support this setting. WebSphere Commerce 7 supports it. WebSphere Commerce 6 does not support it.

Three DB2 Registry Parameters

DB2_EVALUNCOMMITTED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows db2 to evaluate rows to see if they meet the conditions of the query BEFORE locking the row when using RS or CS isolation levels. The normal behavior for these isolation levels would be to lock the row before determining if it matched.

To check your current value:

$ db2set -all |grep DB2_EVALUNCOMMITTED
[i] DB2_EVALUNCOMMITTED=YES

If nothing is returned from this command, then the parameter is not set. The entire db2 instance must be recycled (db2stop/db2start) for changes to this parameter to take effect.

DB2_EVALUNCOMMITTED info center entry

DB2_SKIPDELETED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows DB2 to skip uncommitted deleted rows during index scans. If it is not set, db2 will still evaluate uncommitted deleted rows during index scans. The normal behavior is for DB2 to evaluate uncommitted deleted rows in indexes until they are actually committed.

To check your current value:

$ db2set -all |grep DB2_SKIPDELETED
[i] DB2_SKIPDELETED=ON

If nothing is returned from this command, then the parameter is not set. The entire db2 instance must be recycled (db2stop/db2start) for changes to this parameter to take effect.

DB2_SKIPDELETED info center entry

DB2_SKIPINSERTED

This DB2 registry parameter is one of three that changes how db2 locks rows. As such it is dangerous and should only be used if your application explicitly supports its use. WebSphere Commerce supports the use of all three. Be careful of DB2 instances where you have more than one database – this is set at the instance level, and you’ll want to make sure that all applications accessing any database in the instance support these parameters.

This DB2 registry parameter allows DB2 to skip uncommitted newly inserted rows. If this parameter is not set, DB2 waits for the inserts to be committed or rolled back before continuing – you can see how this might not be ideal for a database that requires high concurrency. Like the others, this applies to CS and RS isolation levels.

To check your current value:

$ db2set -all |grep DB2_SKIPINSERTED
[i] DB2_SKIPINSERTED=ON

DB2_SKIPINSERTED info center entry
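To check all three at once, something like the following can be fed the output of db2set -all. The function name is hypothetical and the sample input is illustrative; the variable names are the three covered above.

```shell
# Report the state of the three locking registry variables based on
# "db2set -all" output read from stdin.
check_lock_regvars() {
    input=$(cat)
    for var in DB2_EVALUNCOMMITTED DB2_SKIPDELETED DB2_SKIPINSERTED; do
        # "|| true" keeps a non-match from being treated as a failure
        line=$(printf '%s\n' "$input" | grep "$var=" || true)
        if [ -n "$line" ]; then
            echo "$line"
        else
            echo "$var is not set"
        fi
    done
}

# Illustrative input. On a DB2 server use:  db2set -all | check_lock_regvars
printf '[i] DB2_EVALUNCOMMITTED=YES\n[i] DB2_SKIPDELETED=ON\n' | check_lock_regvars
```

A missing variable is set with db2set, for example db2set DB2_SKIPINSERTED=ON, followed by the instance recycle described above.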

Increasing LOCKLIST only helps if you’re seeing lock escalations

I put that whole sentence in a heading because I very frequently run into someone who wants me to change LOCKLIST to deal with a deadlocking issue. Increasing LOCKLIST will only help if you’re actually seeing lock escalations. Lock escalations are noted both in the db2diag.log and in counters in the database snapshot.
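A quick way to answer the "are we actually seeing escalations?" question is to pull the counter out of a database snapshot. This sketch assumes the snapshot contains a line beginning with "Lock escalations" followed by "=" and the count; the function name is invented for illustration.

```shell
# Print the "Lock escalations" counter from a database snapshot on stdin.
# The anchored pattern skips the separate "Exclusive lock escalations" line.
lock_escalations() {
    awk -F'= *' '/^Lock escalations/ { print $2 + 0 }'
}

# Illustrative input line. On a DB2 server use:
#   db2 get snapshot for database on sample | lock_escalations
printf 'Lock escalations                           = 0\n' | lock_escalations
```

If the number printed is zero, LOCKLIST is not the place to look.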

What won’t help

Changing the LOCKTIMEOUT database configuration parameter will only help if you have it set unreasonably high (higher than 90 seconds for an e-commerce database), and then only in deadlocks that are side-effects of long lock waits.

Changing DLCHKTIME will not help reduce deadlocking.

Deadlocking is an application or database design problem

Deadlocking is an application problem that manifests in the database. Even if it isn’t a database problem, DBAs frequently help developers troubleshoot issues. See my blog entries on analyzing deadlocks to get an idea of how to do this.

Analyzing Deadlocks – the old way

Analyzing Deadlocks – the new way

Parameter Wednesday – DBM CFG: DIAGLEVEL

Continuing my trend of attacking some of the simpler parameters first, I’m going to cover DIAGLEVEL this week.

DB2 Version This Was Written For

9.7

Parameter Name

DIAGLEVEL

Where This Parameter Lives

Database Manager Configuration

How To Check Value

> db2 get dbm cfg |grep DIAGLEVEL
 Diagnostic error capture level              (DIAGLEVEL) = 3

OR

> db2 "select name, substr(value,1,16) value, value_flags, substr(deferred_value,1,16) deferred_value, deferred_value_flags, substr(datatype,1,16) datatype from SYSIBMADM.DBMCFG where name='diaglevel' with ur"

NAME                             VALUE            VALUE_FLAGS DEFERRED_VALUE   DEFERRED_VALUE_FLAGS DATATYPE
-------------------------------- ---------------- ----------- ---------------- -------------------- ----------------
diaglevel                        3                NONE        3                NONE                 INTEGER

  1 record(s) selected.

Description

Level of logging for the DB2 diagnostic log

Impact

Can have performance impact if set too high, especially on loads

Default

3

Range/Values

0 – No diagnostic data captured

1 – Severe errors only

2 – All errors

3 – All errors and warnings

4 – All errors, warnings, and informational messages

Recycle Required To Take Effect?

No – should take effect immediately

Can It Be Set To AUTOMATIC?

No, there is no automatic setting for this parameter.

How To Change It

 db2 update dbm cfg using DIAGLEVEL N

Where N is the desired logging level

Rule of Thumb

Leave it at the default. That almost always works for this parameter.

Tuning Considerations

Generally, you will have this at either 3 or 4. You should only use 4 when troubleshooting a specific issue for a limited period of time. You should never leave it at 4 while doing LOADs.
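Because a forgotten DIAGLEVEL of 4 is easy to miss, a small check like this could run ahead of a LOAD job. It is only a sketch: it assumes the "db2 get dbm cfg" line has the format shown under How To Check Value, and the function names are made up.

```shell
# Extract the DIAGLEVEL value from "db2 get dbm cfg" output on stdin.
diaglevel_value() {
    awk -F'= *' '/\(DIAGLEVEL\)/ { print $2 }'
}

# Print a warning when the level is 4, otherwise just report it.
check_diaglevel() {
    level=$(diaglevel_value)
    if [ "$level" = "4" ]; then
        echo "DIAGLEVEL is 4: return it to 3 before the next LOAD"
    else
        echo "DIAGLEVEL is $level"
    fi
}

# Illustrative input. On a DB2 server use:  db2 get dbm cfg | check_diaglevel
echo ' Diagnostic error capture level              (DIAGLEVEL) = 3' | check_diaglevel
```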

Related Error Messages


 

War Stories From The Real World

I’ve seen a setting of 4 for this increase the time that a nightly LOAD process took by a factor of 10 or more. So be cautious when using 4, even if it is at the recommendation of support. A setting of 4 can also generate a lot of output, so keep a close eye on filesystem utilization while you have it set at 4. Return the setting to 3 as quickly as possible. I’ve never seen a setting other than 3 or 4 used in a real-world situation.

Link To Info Center

 http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000298.html

Related Parameters

diagpath – Diagnostic data directory path configuration parameter (Info Center link)

Parameter Wednesday: DBM CFG – INTRA_PARALLEL

DB2 Version This Was Written For

9.7

Parameter Name

INTRA_PARALLEL

Where This Parameter Lives

Database Manager Configuration

How To Check Value

$ db2 get dbm cfg  |grep INTRA_PARALLEL
 Enable intra-partition parallelism     (INTRA_PARALLEL) = NO

OR

$ db2 "select name, substr(value,1,16) value, value_flags, substr(deferred_value,1,16) deferred_value, deferred_value_flags, substr(datatype,1,16) datatype from SYSIBMADM.DBMCFG where name='intra_parallel' with ur"

NAME                             VALUE            VALUE_FLAGS DEFERRED_VALUE   DEFERRED_VALUE_FLAGS DATATYPE
-------------------------------- ---------------- ----------- ---------------- -------------------- ----------------
intra_parallel                   NO               NONE        NO               NONE                 VARCHAR(3)

  1 record(s) selected.

Description

Specifies whether or not the database manager can use intra-partition parallelism. Intra-partition parallelism allows DB2 to apply more than one CPU to processing a query. It is most useful for large queries in DW or DSS systems where you have both multiple processors and multiple separate I/O paths.

Impact

Can cause performance degradation if you both enable and use intra-partition parallelism on an e-commerce or transaction processing database. It can significantly increase performance in a DW or DSS system if you use it properly.

Default

NO

Range/Values

NO (0), YES (1), SYSTEM(-1)

Recycle Required To Take Effect?

Yes, and packages should be re-bound using db2rbind as well.

Can It Be Set To AUTOMATIC?

SYSTEM is different than automatic – it sets the value based only on the hardware DB2 is running on.

How To Change It

 db2 update dbm cfg using INTRA_PARALLEL YES

Rule of Thumb

If you have a data warehouse or decision support system, consider setting this to YES and also tune the database configuration parameter DFT_DEGREE appropriately.

Tuning Considerations

The db2 configuration advisor may change this parameter.

It used to be that this parameter was required to be set to YES for parallelism on index creation. This is no longer the case – parallel index creation can occur even when this parameter is set to NO.

The actual level of parallelism used is specified by the db cfg parameter DFT_DEGREE, the CURRENT DEGREE special register, or by a clause on the SQL statement.

This can cause performance degradation for OLTP systems because there is overhead associated with using parallelism, and that overhead is extra time for the singleton-row queries that are the focus of performance for OLTP systems – and parallelism does not help with these small queries. In DW or DSS systems, the overhead is made up for by an increase in the performance of the rest of the query.

For mixed systems, you can consider setting INTRA_PARALLEL to YES, and set DFT_DEGREE to 1 so that queries not specifying a query degree will continue to not use parallelism. Then you can specify a higher degree of parallelism for queries or applications that can take advantage of it.
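The mixed-system setup can be sketched as a short command sequence. The helper below only prints the commands so they can be reviewed first; the function name is hypothetical, and the database name and the degree of 4 are placeholders rather than recommendations.

```shell
# Print (do not run) the commands for the mixed-workload setup.
# $1 is the database name.
mixed_workload_cmds() {
    dbname=$1
    cat <<EOF
db2 update dbm cfg using INTRA_PARALLEL YES
db2 update db cfg for $dbname using DFT_DEGREE 1
db2stop
db2start
db2 connect to $dbname
db2 "set current degree = '4'"
EOF
}

mixed_workload_cmds sample
```

The last statement sets the CURRENT DEGREE special register, which applies only to the session that issues it, so it belongs in the reporting application's own connection rather than in a one-time setup script. The instance recycle in the middle is needed for the INTRA_PARALLEL change to take effect.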

Related Error Messages

This parameter doesn’t have specific associated error messages, though it can change performance.

War Stories From The Real World

I’ve spent the last 4 or 5 years nearly exclusively on e-commerce systems, so I haven’t done much with this parameter lately. Before that, I did have a mixed use system (client/server transaction processing and reporting) that had INTRA_PARALLEL set to YES with DFT_DEGREE at 1, but the developers did not make much use of it.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.config.doc%2Fdoc%2Fr0000146.html

Related Parameters

DBM CFG:  MAX_QUERYDEGREE – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.config.doc%2Fdoc%2Fr0000140.html

DB CFG: DFT_DEGREE – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.config.doc%2Fdoc%2Fr0000346.html

Special Register: CURRENT DEGREE – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.sql.ref.doc%2Fdoc%2Fr0005873.html

Parameter Wednesday – DB CFG – LOCKLIST

DB2 Version This Was Written For

9.7

Parameter Name

LOCKLIST

Where This Parameter Lives

Database Configuration

How To Check Value

$ db2 get db cfg for wc005s01 |grep LOCKLIST
 Max storage for lock list (4KB)              (LOCKLIST) = AUTOMATIC(4096)

OR

$ db2 "select name, substr(value,1,16) value, value_flags, substr(deferred_value,1,16) deferred_value, deferred_value_flags, substr(datatype,1,16) datatype from SYSIBMADM.DBCFG where name='locklist' with ur"

NAME                             VALUE            VALUE_FLAGS DEFERRED_VALUE   DEFERRED_VALUE_FLAGS DATATYPE
-------------------------------- ---------------- ----------- ---------------- -------------------- ----------------
locklist                         4096             AUTOMATIC   4096             AUTOMATIC            BIGINT

  1 record(s) selected.

Description

Specifies the maximum amount of memory to use for a list of locks within the DB2 database. DB2 stores lock information in-memory in this one location. The locklist is not written to disk.

Impact

Can cause performance degradation if the locklist becomes full. Once the locklist becomes full, DB2 will escalate row-level locks to table level locks, which can significantly impact the concurrency of connections to the database. Especially for E-Commerce databases, it is important to have a large enough lock list.

Default

AUTOMATIC

Range/Values

4 – 134217728

Recycle Required To Take Effect?

No – but you should do a db2rbind all if you change this parameter.

Can It Be Set To AUTOMATIC?

Yes, and that is the recommended starting point, assuming you’re using STMM. If you set it to AUTOMATIC, MAXLOCKS should also be set to AUTOMATIC. If MAXAPPLS or MAX_COORDAGENTS are set to AUTOMATIC, LOCKLIST should also be set to AUTOMATIC.

How To Change It

 db2 update db cfg for dbname using LOCKLIST 4096

Rule of Thumb

Set to AUTOMATIC if you are using STMM. If you are not using STMM, somewhere around 5000 is a good starting point for e-commerce databases.

Tuning Considerations

The db2 configuration advisor may change this parameter.

There are some detailed formulas in the DB2 Info Center that you can use to determine upper and lower bounds of possible values, but they are based on knowing the average number of locks per application. I won’t cover the actual formulas here, but will go into detail on some of the components.

  • 256 is the number of bytes used in the locklist for the first lock on an object
  • 128 is the number of bytes used in the locklist for locks on objects that already have at least one other lock against them
  • The average number of locks per application can be determined in an existing database with load on it by looking at the locks_held_top monitor element. This is an event monitoring element, so there is some work involved with looking at this.
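To get a feel for what those byte counts mean, the arithmetic can be worked for a given LOCKLIST value. This is a back-of-the-envelope sketch that uses the 256 and 128 byte figures from the list above and the 4KB page unit of LOCKLIST; it is not one of the Info Center formulas, and the function name is made up.

```shell
# Rough bounds on how many locks fit in a lock list of $1 4KB pages.
# First number: every lock is the first on its object (256 bytes each).
# Second number: every lock is on an already-locked object (128 bytes each).
locklist_capacity() {
    pages=$1
    bytes=$((pages * 4096))
    echo "$((bytes / 256)) $((bytes / 128))"
}

# LOCKLIST of 4096 pages, as in the example output above
locklist_capacity 4096
```

For 4096 pages this prints 65536 131072, so the lock list holds somewhere between roughly 65 thousand and 131 thousand locks depending on how many objects the locks are spread across.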

The main time you’re going to increase LOCKLIST (assuming you’re not using STMM and AUTOMATIC) is when you see lock escalations. Every time you look at database performance or review your diag log, you should look for lock escalations. By default, lock escalations are written to the diag log and are also counted in the snapshot monitor. You can look at the number of lock escalations since the database was started using this syntax:

$ db2 "select varchar(workload_name,30) as workload_name, lock_escals FROM TABLE(MON_GET_WORKLOAD('',-2)) AS t"

WORKLOAD_NAME                  LOCK_ESCALS
------------------------------ --------------------
SYSDEFAULTUSERWORKLOAD                            0
SYSDEFAULTADMWORKLOAD                             0

  2 record(s) selected.

If you see ANY lock escalations, especially in an e-commerce database, you need to tune locklist or change your application’s behavior to avoid them. Lock escalations are a very bad thing, and even one lock escalation should be looked into and resolved.

Related Error Messages

You can see performance degradation if locklist is too small without seeing error messages.

If LOCKLIST is drastically under sized, you may see:

SQL0912N  The maximum number of lock requests has been reached for the
      database.

This really means you must increase LOCKLIST.

War Stories From The Real World

Frequently when I run into a locking problem, someone (not a DBA) on a conference call will suggest increasing LOCKLIST. The only locking problem that increasing LOCKLIST will solve is lock escalations. Unless lock escalations are occurring, increasing LOCKLIST will not help with deadlocks, lock timeouts, or excessive lock waits.

The only times I have increased this parameter are on build (the default in previous versions was far too low), in response to lock escalations, or in response to SQL0912N.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000267.html

Related Parameters

DB CFG:  MAXLOCKS – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000268.html

DB CFG: MAXAPPLS – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000279.html

DB CFG: MAX_COORDAGENTS – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000139.html

System Monitor Element: locks_held_top – http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.mon.doc/doc/r0001291.html

Blog Entries Related to this Parameter

Locking Parameters: http://db2commerce.com/2012/01/09/locking-parameters/


Parameter Wednesday – DB CFG – pckcachesz

DB2 Version This Was Written For

9.7

Parameter Name

PCKCACHESZ

Where This Parameter Lives

Database Configuration

How To Check Value

> db2 get db cfg for sample |grep PCKCACHESZ
 Package cache size (4KB)                   (PCKCACHESZ) = AUTOMATIC(250509)

OR

> db2 "select name, substr(value,1,16) value, value_flags, substr(deferred_value,1,16) deferred_value, deferred_value_flags, substr(datatype,1,16) datatype from SYSIBMADM.DBCFG where name='pckcachesz' with ur"

NAME                             VALUE            VALUE_FLAGS DEFERRED_VALUE   DEFERRED_VALUE_FLAGS DATATYPE
-------------------------------- ---------------- ----------- ---------------- -------------------- ----------------
pckcachesz                       250509           AUTOMATIC   250509           AUTOMATIC            BIGINT

Description

Specifies size (in 4KB pages) of the area of memory used for caching static and dynamic SQL statements and information about those statements that is summarized across executions.

Impact

This is a parameter with a high possibility for performance impact. Also, in setting it too high, you may get unusually large dynamic SQL snapshots or output from MON_GET_PKG_CACHE_STMT.

Default

AUTOMATIC

Range/Values

32 – 2,147,483,646

Recycle Required To Take Effect?

No – changes to this parameter take effect immediately if there is space in database shared memory.

Can It Be Set To AUTOMATIC?

Yes, this can be set to AUTOMATIC, however it is one that I’m most likely not to set at AUTOMATIC.

How To Change It

 db2 update db cfg for dbname using PCKCACHESZ 120

Rule of Thumb

If you really have no clue, AUTOMATIC will work as long as you’re not storing snapshot information on disk – if you are, then you may need to consider setting this to something else. If you want to use a hard value, 8192 isn’t a bad place to start.

Tuning Considerations

Assuming you’re not going with AUTOMATIC, this is a parameter that you really do need to observe over time and set based on the activity in your environment. This actually falls into my top 10 physical performance tuning areas. When manually tuning this, you want to look at the following:

> db2 get snapshot for database on sample |grep "Package cache"
Package cache lookups                      = 2326089
Package cache inserts                      = 32733
Package cache overflows                    = 0
Package cache high water mark (Bytes)      = 688635351

And calculate:
1 - (Package cache inserts / Package cache lookups)

So in this case, that would be:

1 - 32733/2326089 = 0.985928, or about 98.6%
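The same arithmetic can be scripted against the two snapshot counters. This is a small sketch with a made-up function name; it takes the inserts and lookups values as arguments and guards against a lookups count of zero.

```shell
# Package cache hit ratio: 1 - inserts / lookups.
# $1 = Package cache inserts, $2 = Package cache lookups
pkg_cache_hit_ratio() {
    awk -v ins="$1" -v look="$2" 'BEGIN {
        if (look == 0) { print "n/a"; exit }
        printf "%.4f\n", 1 - ins / look
    }'
}

# Counters from the snapshot output above
pkg_cache_hit_ratio 32733 2326089
```

With the snapshot values shown above this prints 0.9859.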

That’s also called your package cache hit ratio. To calculate it in one step, based on the mon_get functions, use:

> db2 "select decimal(PKG_CACHE_INSERTS,10,3) as PKG_CACHE_INSERTS, decimal(PKG_CACHE_LOOKUPS,10,3) as PKG_CACHE_LOOKUPS, decimal(1-decimal(PKG_CACHE_INSERTS,10,3)/decimal(PKG_CACHE_LOOKUPS,10,3),10,3) as pkg_cache_hit_ratio from table(SYSPROC.MON_GET_WORKLOAD('SYSDEFAULTUSERWORKLOAD', -2)) as t with ur"

PKG_CACHE_INSERTS PKG_CACHE_LOOKUPS PKG_CACHE_HIT_RATIO
----------------- ----------------- -------------------
        32750.000       2329102.000               0.985

Obviously if you’re using WLM, you might have to tweak the above, but if you’re not (and it’s a pay-for-use feature), then the above should encompass essentially all database activity.

As with many hit ratios, we’d love to see this above 95%, can usually be happy with it above 90%, and may accept it as low as 80%.

Related Error Messages

 

War Stories From The Real World

Man, am I the only one who can never spell this one right? Seriously, “pck”? I always go for “pkg”, plus can never remember whether there’s an underscore before the “sz”.

On a more serious note, this is one I’m likely to tune. On many of our new installations, I’ve been going with the default of AUTOMATIC, but have been questioning that lately and playing with it. The reason? Often DB2 makes this so big that I gather so much information in my hourly dynamic SQL snapshots that I fill up 5GB in less than two days. And the difference between this rather large size and a much more reasonable one seems to be 1% or less in the package cache hit ratio. I’m probably going to start setting this one to a hard value on new installations going forward – I just don’t see enough benefit from the gigundo size STMM seems to be fond of.

This is one of the parameters for which the setting is a soft limit, so DB2 can use more than the number you specify (instead of giving you an error). If you’re concerned this may be occurring, you can look at “Package cache high water mark” in a database snapshot.

I’ve always found the package cache to be just fine for following SQL – mostly because I work with e-commerce databases where most of the SQL is canned, and my problem SQL always involves multiple executions – so is likely to stay in the package cache. If this is not true for you, you may want to consider using the new event monitor for the package cache (http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.mon.doc/doc/c0056443.html). It apparently catches information as it leaves the package cache.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.config.doc%2Fdoc%2Fr0000266.html

Related Parameters

Blog Entries Related to this Parameter

Identifying Problem SQL

Parameter Wednesday: DB2 Registry DB2_SKIPINSERTED

DB2 Version This Was Written For

9.7

Parameter Name

DB2_SKIPINSERTED

Where This Parameter Lives

DB2 Registry (db2set)

How To Check Value

> db2set -all |grep DB2_SKIPINSERTED
[i] DB2_SKIPINSERTED=YES [DB2_WORKLOAD]

OR

> db2 "select substr(reg_var_name,1,32) name, substr(reg_var_value,1,16) value, level, is_aggregate, substr(aggregate_name,1,32) aggregate_name from SYSIBMADM.reg_variables where reg_var_name='DB2_SKIPINSERTED' with ur"

NAME                             VALUE            LEVEL IS_AGGREGATE AGGREGATE_NAME
-------------------------------- ---------------- ----- ------------ --------------------------------
DB2_SKIPINSERTED                 YES              I                0 DB2_WORKLOAD
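
If you want to check the value from a script rather than by eye, the db2set line shown above is easy to pick apart. This is a rough Python sketch based only on that one line of output: the regular expression and the function name are my own, and other lines of db2set -all output may not fit the pattern.

```python
import re

# Matches lines shaped like: [i] DB2_SKIPINSERTED=YES [DB2_WORKLOAD]
# The trailing [AGGREGATE] part is optional.
LINE_RE = re.compile(
    r"^\[(?P<level>\w)\]\s+(?P<name>[^=\s]+)=(?P<value>.*?)"
    r"(?:\s+\[(?P<aggregate>[^\]]+)\])?$"
)


def parse_db2set_line(line):
    """Return a dict with level, name, value and aggregate, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


parsed = parse_db2set_line("[i] DB2_SKIPINSERTED=YES [DB2_WORKLOAD]")
print(parsed["name"], parsed["value"], parsed["aggregate"])
```

A variable that was set directly rather than through an aggregate simply comes back with aggregate set to None.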

Description

Affects db2’s locking/scanning behavior when using CS or RS isolation levels. Causes DB2 to skip uncommitted inserts as if they did not exist. Variable is activated at database start time, and is engaged (or not) at statement compile or bind time.

This variable has no effect if Currently Committed behavior is enabled (CUR_COMMIT).

Impact

Can reduce locking/deadlocking and increase concurrency if applications can tolerate the data integrity changes. Potentially dangerous if the application does not explicitly support this behavior.

Default

NO/OFF

Range/Values

ON, OFF

Recycle Required To Take Effect?

Yes

Can It Be Set To AUTOMATIC?

No, there is no AUTOMATIC option for this parameter.

How To Change It

db2set DB2_SKIPINSERTED=YES

Rule of Thumb

If your application does not explicitly support it, do not set this parameter – leave it at the default of NO.

Tuning Considerations

The main consideration here is if your application supports this value. With the new cur_commit parameter, it is less likely you will use this. But if your application supports this behavior, you should set this parameter.

Along with DB2_SKIPDELETED and DB2_EVALUNCOMMITTED, you can drastically reduce deadlocking.

Related Error Messages

 

War Stories From The Real World

WebSphere Commerce has supported DB2_SKIPINSERTED, DB2_SKIPDELETED, and DB2_EVALUNCOMMITTED for years. They’re part of the aggregate DB2 Registry setting DB2_WORKLOAD=WC, but we were setting them independently long before that. I have personally seen them take a WebSphere Commerce site from dozens of deadlocks per day to just one or two. So for WebSphere Commerce databases, they absolutely must be set.

I wonder how having the three related parameters set is different from Currently Committed behavior available in DB2 9.7?

Think long and hard before enabling this, because you are changing the behavior of your isolation level by enabling this, and may allow concurrency phenomena that you did not intend.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.regvars.doc/doc/r0005665.html

Related Parameters

Note that the links for these are all the same. You have to search that page to find them. I wish IBM would have individual parameter pages for DB2 Registry variables like they do for DB CFG and DBM CFG, but they don’t currently.

Blog Entries Related to this Parameter

What DBAs can do to Reduce Deadlocks

Locking Parameters

Registry Variables and DB2_WORKLOAD=WC

Installing DB2 Using a Response File

Some friends on Twitter were commenting that db2_install is not just deprecated in DB2 10, but is fully discontinued and not available. Melanie Stopfer mentioned this on her webcast on June 22. I do dozens of installs a year, and most of them are using a response file. The same response file, actually, since a standard starting point makes sense at that level for WebSphere Commerce installs. Still, I created the file myself.

If you’re not familiar with db2_install, it was a command line script that seemed to work when other install methods failed, and just did the most basic tasks of getting DB2 code installed on a system.

What is a Response File?

A response file is simply a text (ASCII) file that you use to provide inputs to db2setup instead of providing inputs via a GUI. Most DBAs I know are command line geeks like me and tend to avoid GUIs. I have actually set up Xwindows and done a GUI install on a Linux system, but as we say where I work, my X-fu is not strong. I haven’t done it in years.

And honestly, why would I? I install enough very similar servers every year that response files make sense for me. I have used db2_install in the last 18 months, so I definitely see the use case for it, but I can survive without it.

A response file actually goes a bit beyond what you do with db2_install. It will actually let you create instances and databases and set most instance and database level settings at the same time. When I do an install by hand, I tend to keep things separate – I do the db2_install first, then go out and create instances, and then databases. You don’t have to do any of the extra work with the response file – you can use it strictly for the task of installing DB2. In our normal scripted installs, I use the response file to install DB2, create an instance, and set some of the most common instance level settings.

When to Use a Response File

The perfect use case for a response file is when you do multiple similar installs. It makes it very easy to script installs too. But a response file can really be used for any install. It’s perfect for use when you don’t want to or cannot use a GUI. If you’re scripting an install, the response file is the only way to go.

Where to get a Response File

To get a response file, there are two main methods:

  1. Do a GUI install using db2setup and click the box to create a response file
  2. Pull the sample file from the installation media and edit it manually

Surprise, surprise, I’m actually a fan of the second. Either one will work, but since I avoid GUIs, the last time I actually used the db2setup GUI was about 3 years ago for a Windows install.

Response files can change between versions of DB2 and between operating systems (though I find the same file seems to work across various Linux and UNIX distributions). You can find the sample file on your installation media under db2/<platform>/samples. That’s right, it’s not in your installed code, so you cannot just grab one from another server, you have to go to your downloaded installation media or your DVD.

Response files are finicky on syntax, and don’t always give the best error messages. That is the most difficult part about them. If you’re like me and can’t seem to write a 20-line script that runs right on the first execution, you may have to try several times to get the install to run with a response file. If a client is running an install, I prefer to only hand them a tested and proven response file for this reason.

What does a response file look like?

Here’s essentially the one that I use most frequently (some names may have been changed to protect the innocent – I don’t actually recommend some of the settings like /home for an instance home):

** Response File for ESE
**  Also creates and starts db2 instance and makes entry in /etc/services
**  Configures all required instance-level parameters except SYSMON_GROUP, which must still be set manually
**  Additional DB2 work is still required after Commerce instance creation
PROD                      = ENTERPRISE_SERVER_EDITION
FILE                      = /opt/IBM/db2/V9.7
LIC_AGREEMENT             = ACCEPT         ** ACCEPT or DECLINE
INSTALL_TYPE              = TYPICAL         ** TYPICAL, COMPACT, CUSTOM
INSTALL_TSAMP            = YES             ** YES or NO. Valid for root install only
INSTANCE                  = db2inst1
db2inst1.NAME             = db2inst1        ** real name of the instance
db2inst1.GROUP_NAME       = db2grp        ** char(30) no spaces
db2inst1.HOME_DIRECTORY   = /home/db2inst1                ** char(64) no spaces. Valid for root install only
db2inst1.AUTOSTART        = YES             ** YES or NO
db2inst1.START_DURING_INSTALL = NO         ** YES or NO
db2inst1.SVCENAME        = db2c_db2inst1   ** BLANK or char(14). Reserved for root install only
db2inst1.PORT_NUMBER     = 50000           ** 1024 - 65535, Reserved for root install only
db2inst1.TYPE            = ESE             ** ESE WSE STANDALONE CLIENT
db2inst1.AUTHENTICATION  = SERVER          ** CLIENT, SERVER, or SERVER_ENCRYPT
db2inst1.FENCED_USERNAME  = db2fenc        ** char(8)  no spaces, no upper case letters
db2inst1.FENCED_GROUP_NAME = db2fgrp       ** char(30)  no spaces
db2inst1.FENCED_HOME_DIRECTORY = /home/db2fenc          ** char(64) no spaces
db2inst1.DFTDBPATH       = /data                ** any valid path
db2inst1.DFT_MON_BUFPOOL = ON                 ** ON or OFF
db2inst1.DFT_MON_LOCK    = ON                 ** ON or OFF
db2inst1.DFT_MON_SORT    = ON                 ** ON or OFF
db2inst1.DFT_MON_STMT    = ON                 ** ON or OFF
db2inst1.DFT_MON_TABLE   = ON                 ** ON or OFF
db2inst1.DFT_MON_UOW     = ON                 ** ON or OFF
db2inst1.DFT_MON_TIMESTAMP = ON               ** ON or OFF
db2inst1.HEALTH_MON      = OFF
db2inst1.DIAGPATH        = /diag                ** BLANK or char(215)
db2inst1.INSTANCE_MEMORY = AUTOMATIC       ** AUTOMATIC or a number in range [0, 1000000] for 32-bit and [0, 68719476736] for 64-bit
db2inst1.HEALTH_MON      = OFF                ** default is ON; ON or OFF
db2inst1.SHEAPTHRES      = 50000                ** 250 - 2097152
db2inst1.SPM_NAME        = NULL                ** BLANK or char(8)
db2inst1.SYSADM_GROUP    = db2adm                 ** BLANK or char(30)
db2inst1.DB2BIDI         = ON                 ** BLANK, 0, 1, YES, NO, ON, OFF, Y, N, TRUE, FALSE, T or F
db2inst1.DB2_PARALLEL_IO = *                 ** BLANK, * or 0-4095,0-4095,...
db2inst1.DB2_INLIST_TO_NLJN = YES              ** BLANK or YES, NO
db2inst1.DB2_USE_ALTERNATE_PAGE_CLEANING = ON ** BLANK or ON, OFF
db2inst1.DB2_WORKLOAD    = WC         ** BLANK, SAP
db2inst1.DB2COMM                 = TCPIP
DAS_USERNAME             = dasusr                 ** char(8)  no spaces, no upper case letters
DAS_GROUP_NAME           = dasgrp                 ** char(30)  no spaces

I discovered the hard way that at least for 9.7 FixPack 3 and before, you cannot set the SYSMON group in the response file. It may have changed, I haven’t tried it on FixPacks 4 or 5 because I have additional scripting that makes it easy for me to set parameters manually while I’m doing some other database level set up work. There may be other random settings like that missing.

Note that most of the parameters are prefixed with the instance name – one thing that does is make it so I could create multiple instances using a single response file, and have differing settings for them.

Comments in this file are done by using a double asterisk (**), and many of the comments that I’ve left in here are ones that come with the sample file, reminding us of ranges and so forth.
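
Since response files are finicky, a quick sanity check before handing one to db2setup can save a failed run. The Python sketch below assumes only what is described above: one KEYWORD = value pair per line, with everything from a double asterisk onward treated as a comment. The function names are hypothetical, it is not the parser db2setup uses, and I don’t know how db2setup itself resolves a keyword that appears twice (as db2inst1.HEALTH_MON does in my listing), so the sketch just reports it.

```python
from collections import Counter

# A few lines lifted from the response file shown above.
SAMPLE = """\
** Response File for ESE
PROD                      = ENTERPRISE_SERVER_EDITION
LIC_AGREEMENT             = ACCEPT         ** ACCEPT or DECLINE
db2inst1.HEALTH_MON      = OFF
db2inst1.DB2_PARALLEL_IO = *                 ** BLANK, * or 0-4095,0-4095,...
db2inst1.HEALTH_MON      = OFF                ** default is ON; ON or OFF
"""


def parse_response_file(text):
    """Return (keyword, value) pairs, ignoring ** comments and blank lines."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("**", 1)[0].strip()  # drop the comment part, if any
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"not a KEYWORD = value line: {raw!r}")
        key, value = (part.strip() for part in line.split("=", 1))
        pairs.append((key, value))
    return pairs


def duplicate_keywords(pairs):
    """Return the keywords that are set more than once, sorted."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


pairs = parse_response_file(SAMPLE)
print(duplicate_keywords(pairs))
```

Keeping the pairs as a list rather than a dictionary is deliberate: a dictionary would silently swallow the repeated keyword.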

I think you could get by with just the following minimal settings:

  • PROD
  • FILE
  • LIC_AGREEMENT
  • INSTALL_TYPE
  • INSTALL_TSAMP

But you would have to test it before relying on that advice, and you would also have to do the basics like setting DB2COMM and SVCENAME manually.

How to use the response file

This is truly the easy part. To use a response file, all you have to do is execute db2setup as you normally would, but add the -r flag and specify the full path to the response file. Obviously you’re running it as root. It looks something like this:

# ./db2setup -r /downloads/scripts/response_files/my_db2ese_9_7.rsp

Anything I’m missing here? Anything else you’d like to know about installing DB2 using a response file?

References

Good developerWorks Article on response files: http://www.ibm.com/developerworks/data/library/techarticle/0302gao/0302gao.html
Info Center entry on response file considerations (links to other good info center entries on response files): http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.qb.server.doc/doc/c0007502.html
Info Center entry on response file keywords: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.qb.server.doc/doc/r0007505.html

Parameter Wednesday: DB2 Registry DB2_EVALUNCOMMITTED

DB2 Version This Was Written For

9.7

Parameter Name

DB2_EVALUNCOMMITTED

Where This Parameter Lives

DB2 Registry (db2set)

How To Check Value

> db2set -all |grep DB2_EVALUNCOMMITTED
[i] DB2_EVALUNCOMMITTED=YES [DB2_WORKLOAD]

OR

> db2 "select substr(reg_var_name,1,32) name, substr(reg_var_value,1,16) value, level, is_aggregate, substr(aggregate_name,1,32) aggregate_name from SYSIBMADM.reg_variables where reg_var_name='DB2_EVALUNCOMMITTED' with ur"

NAME                             VALUE            LEVEL IS_AGGREGATE AGGREGATE_NAME
-------------------------------- ---------------- ----- ------------ --------------------------------
DB2_EVALUNCOMMITTED              YES              I                0 DB2_WORKLOAD

Description

In some situations where there are uncommitted updates to a row, this parameter defers the acquisition of a lock on a row for CS or RS isolation levels until the row is known to satisfy the predicates of the query. This can improve concurrency.

The value of the parameter at bind time is used if the values are different at bind time and run time.

There are only very specific scenarios where this parameter applies, which are laid out in detail in the Info Center: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0011218.html
In general, this parameter only helps if CS or RS isolation levels are used and the predicates in question are sargable.

This parameter is ignored when currently committed semantics are used (CUR_COMMIT=ON).

Impact

Can reduce locking/deadlocking and increase concurrency if applications support the data integrity changes. Potentially dangerous if the application does not explicitly support this behavior.

Default

NO/OFF

Range/Values

ON, OFF

Recycle Required To Take Effect?

Yes

Can It Be Set To AUTOMATIC?

No, there is no AUTOMATIC option for this parameter.

How To Change It

db2set DB2_EVALUNCOMMITTED=YES

Rule of Thumb

If your application does not explicitly support it, do not set this parameter – leave it at the default of NO.

Tuning Considerations

The main consideration here is if your application supports this value. If your application supports this behavior, you should set this parameter.

Along with DB2_SKIPINSERTED and DB2_SKIPDELETED, you can drastically reduce deadlocking.

Related Error Messages

 

War Stories From The Real World

WebSphere Commerce has supported DB2_SKIPINSERTED, DB2_SKIPDELETED, and DB2_EVALUNCOMMITTED for years. They’re now part of the aggregate DB2 Registry setting DB2_WORKLOAD=WC, but we were setting them independently long before that. I have personally seen them take a WebSphere Commerce site from dozens of deadlocks per day to just one or two. So for WebSphere Commerce databases, they absolutely must be set.

I wonder how having the three related parameters set is different from Currently Committed behavior available in DB2 9.7?

Think long and hard before enabling this, because you are changing the behavior of your isolation level by enabling this, and may allow concurrency phenomena that you did not intend.

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.regvars.doc/doc/r0005665.html

Related Parameters

Note that the links for these are all the same. You have to search that page to find them. I wish IBM would have individual parameter pages for DB2 Registry variables like they do for DB CFG and DBM CFG, but they don’t currently.

Blog Entries Related to this Parameter

What DBAs can do to Reduce Deadlocks

Locking Parameters

Registry Variables and DB2_WORKLOAD=WC

Parameter Wednesday: DB CFG UTIL_HEAP_SZ

DB2 Version This Was Written For

9.7

Parameter Name

UTIL_HEAP_SZ

Where This Parameter Lives

DB CFG

How To Check Value

> db2 get db cfg for sample |grep UTIL_HEAP_SZ
 Utilities heap size (4KB)                (UTIL_HEAP_SZ) = 70982

OR

> db2 "select name, substr(value,1,12) value, substr(deferred_value,1,12) deferred_value from sysibmadm.dbcfg where name='util_heap_sz' with ur"

NAME                             VALUE        DEFERRED_VALUE
-------------------------------- ------------ --------------
util_heap_sz                     70982        70982

Description

The utility heap is used by – surprise, surprise – utilities. Load, Backup, Restore, Redistribute, Compression Dictionary Creation, and Online Index reorg operations each use memory from this heap.

Impact

Can drastically impact the performance of utilities.

Default

5000

Range/Values

16-524,288

Recycle Required To Take Effect?

No

Can It Be Set To AUTOMATIC?

No, there is no AUTOMATIC option for this parameter.

How To Change It

db2 update db cfg for sample using UTIL_HEAP_SZ 7000

Rule of Thumb

Start with the default, and increase if you have performance problems with utilities or receive errors.

Tuning Considerations

You can track the allocated/used space using db2mtrk. In the example below, the utility heap is the first one reported in the upper left.

> db2mtrk -d
Tracking Memory on: 2012/08/01 at 03:10:41

Memory for database: SAMPLE

   utilh       pckcacheh   other       catcacheh   bph (4)     bph (3)
   192.0K      36.1M       192.0K      18.2M       1.2G        54.1M

   bph (2)     bph (1)     bph (S32K)  bph (S16K)  bph (S8K)   bph (S4K)
   414.1M      1.4G        832.0K      576.0K      448.0K      384.0K

   lockh       dbh         apph (56234)apph (56209)apph (55221)apph (54716)
   62.8M       35.9M       256.0K      64.0K       64.0K       192.0K

   apph (54205)apph (53697)apph (53195)apph (52173)apph (51151)apph (5977)
   128.0K      128.0K      192.0K      128.0K      192.0K      128.0K

   apph (5599) apph (25678)apph (901)  apph (832)  apph (45)   apph (44)
   128.0K      128.0K      64.0K       128.0K      64.0K       64.0K

   apph (43)   apph (42)   apph (41)   apph (40)   apph (39)   apph (38)
   64.0K       64.0K       64.0K       512.0K      64.0K       64.0K

   apph (37)   appshrh
   64.0K       6.1M

You can also get this information through the SYSIBMADM views:

> db2 "select SNAPSHOT_TIMESTAMP, POOL_ID, POOL_CUR_SIZE, POOL_WATERMARK, POOL_CONFIG_SIZE, DBPARTITIONNUM from sysibmadm.snapdb_memory_pool where DB_NAME = 'WC005D01' and pool_id='UTILITY' with ur"

SNAPSHOT_TIMESTAMP         POOL_ID        POOL_CUR_SIZE        POOL_WATERMARK       POOL_CONFIG_SIZE     DBPARTITIONNUM
-------------------------- -------------- -------------------- -------------------- -------------------- --------------
2012-08-01-03.24.02.128583 UTILITY                      196608            101253120            290783232              0

Note that while UTIL_HEAP_SZ is specified in 4K pages, the numbers from the two commands above are not: db2mtrk reports KB/MB/GB, and the snapshot view reports bytes.
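
To compare these numbers with the configured value you have to convert between pages and bytes. A small Python sketch of that conversion follows, assuming 4096-byte pages; the function names are my own, and the inputs are the figures from the outputs above.

```python
PAGE_SIZE = 4096  # bytes in one 4K page


def pages_to_bytes(pages: int) -> int:
    """Convert a count of 4K pages (as in UTIL_HEAP_SZ) to bytes."""
    return pages * PAGE_SIZE


def bytes_to_pages(size_in_bytes: int) -> int:
    """Convert bytes (as in the snapshot view) to whole 4K pages."""
    return size_in_bytes // PAGE_SIZE


print(pages_to_bytes(70982))      # UTIL_HEAP_SZ from the db cfg example
print(bytes_to_pages(196608))     # POOL_CUR_SIZE from the view
print(bytes_to_pages(290783232))  # POOL_CONFIG_SIZE from the view
```

The current size of 196608 bytes is 192 KB, which lines up with the 192.0K that db2mtrk shows for utilh.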

Now here’s the interesting part. The space from this heap is allocated as a percentage of the remaining heap in the following ways:

Backup 50%
Restore 100%
Load 25%
Redistribute   50%

Compression Dictionary creation takes only about 10MB to build, and online index reorganization uses this area to track transactions.
 
The important part of that is not the percentages – it’s what the percentages are of. They’re of the REMAINING available heap space – so each additional operation gets less space. This would especially show up if you were running a number of loads at the same time – you would see them get slower and slower with each additional load added.
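
To see how quickly the shares shrink, here is a toy model in Python. It is a sketch under simplifying assumptions, not how DB2 actually manages the heap: it uses only the percentages listed above, assumes every operation starts before any earlier one releases memory, and the names and the 64000-page heap are invented for the example.

```python
# Share of the REMAINING utility heap each operation asks for (from the list above).
SHARE_OF_REMAINING = {
    "backup": 0.50,
    "restore": 1.00,
    "load": 0.25,
    "redistribute": 0.50,
}


def allocate(heap_pages, operations):
    """Grant each operation its share of whatever heap is left, in order."""
    remaining = heap_pages
    grants = []
    for op in operations:
        grant = int(remaining * SHARE_OF_REMAINING[op])
        grants.append((op, grant))
        remaining -= grant
    return grants, remaining


# Four loads started one after another against a hypothetical 64000-page heap.
grants, left = allocate(64000, ["load"] * 4)
for op, pages in grants:
    print(f"{op}: {pages} pages")
print(f"left over: {left} pages")
```

In this model the fourth load gets well under half of what the first one did, which matches the slowdown described above.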
 
There are some things you can do to get around these allocations. The data buffer option on the load or redistribute utilities would help you ensure consistent space. Similar ends could be achieved by the buffer size on backup and restore. But keep in mind that even if you use these methods to get around the allocations, all that memory still comes from the Utility heap, and if you exhaust it, you won’t have memory available for additional operations.

Related Error Messages

War Stories From The Real World

Link To Info Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000330.html

Related Parameters

Using TSA/db2haicu to Automate Failover Part 3: Testing, Ways Setup can go Wrong and What to do.

Part 3 in this series is a bit overdue. Parts 1 and 2 were back in April. This is a complicated topic. Please use any procedures here with extreme care, and keep in mind that if you have anything other than the standard two-server HADR-only TSA implementation, these procedures probably aren’t the best idea, as they could break other things. There will also be a Part 4 – dealing with problems after set-up.

I’m not saying I’m covering every possible failure scenario, but I’ve seen a number of different issues and wanted to share some strategies for dealing with them.

Testing automated failover

First of all, it is absolutely critical that you test your failover. As many tests as you can manage will help you out here. I try to set up HADR, set up failover using TSA/db2haicu, and test all in the same week to keep things from getting missed.

The absolute minimum tests you should do are:

  • Manual takeover, verifying the database
  • Manual takeover, verifying the Commerce (or other) application
  • Hard failure with inability to start (renaming executable)

If at all possible, also do the following tests:

  • Power-off tests on each node
  • db2_kill test on each node (with caution)
  • Manual takeover by force on each node
  • Network failure test on each node
  • Failover under load (during load test)

See section 6 of this document for some really excellent details on testing: http://download.boulder.ibm.com/ibmdl/pub/software/dw/data/dm-0908hadrdb2haicu/HADR_db2haicu.pdf

If you just assume it will work, it probably will not. On at least three occasions, I’ve caught issues while testing failover.

Failure causes

While I’ve caught 3 issues while testing failover, I’ve had at least twice that many during the setup process. The most common cause of failure that I’ve seen is missed steps during preparation. For nearly every problem or issue I’ve seen, I’ve gone back and added to that preparation post. The first few times I set up TSA with HADR, my preparation was mostly just gathering inputs. Then, one by one, as I saw failures, I added to the prep work. I’m still going to talk about what those missed prep work errors look like, because it’s easy to miss something. I always say that the best DBA is a detail-oriented control freak, and this is one area where that’s certainly true. If you’re having problems, literally go through the preparation post line by line on each server and see if you missed anything. Seriously, for any failure prior to testing, go through the preparation items with a fine-tooth comb on both servers.

preprpnode Failure

If you go to do the preprpnode preparation step, and you get a failure like this:

# preprpnode server1.domain.com server2.domain.com

-bash: preprpnode: command not found

This likely means that your SAM installation was not completed successfully. See http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/ – the section called “Software Installed” – for details on how to do that.

Failure on Creating the Domain

What this looks like

> db2haicu
...
Create the domain now? [1]
1. Yes
2. No
1
Creating domain prod_db2ha in the cluster ...
Creating domain failed. Refer to db2diag.log and the DB2 Information Center for details.

I don’t have excerpts from the db2diag log at this point – if anyone does, please share.

Resolution

This usually means you didn’t do the preprpnode or you didn’t do it properly. Remember that the preprpnode must be done as root on both servers in this format:

# preprpnode server1.domain.com server2.domain.com

db2haicu Fails Near the End of the Setup for the Standby Server

What This Looks Like

> db2haicu
...
Retrieving high availability configuration parameter for instance db2inst1 ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information,
see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the
high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
  1.TSA
  2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance db2inst1 to TSA.
Adding DB2 database partition 0 to the cluster ...
There was an error with one of the issued cluster manager commands. Refer to db2diag.log and the DB2 Information Center for details.

Resolution

In the case where I most recently saw this particular failure, I had set up HADR with IP addresses. TSA/db2haicu does not seem to like or allow the use of just IP addresses. So I had to go back and re-do the HADR setup using host names. I believe I’ve also seen failures here due to incorrect formatting of the hosts file or incorrect entries in db2nodes.cfg (yes, for single node implementations). Basically a failure at this point most frequently means that you missed some part of the preparation steps. See http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/.
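
Both of those mistakes (an IP address where a host name belongs, and a name that differs from one place to the next) are simple to test for once you have collected the values. The Python sketch below is a hypothetical helper, not part of TSA or db2haicu: you gather the name used in each place yourself (for example from the hostname command, db2nodes.cfg, and the HADR parameters in the db cfg) and pass them in as a dictionary.

```python
import ipaddress


def is_ip_address(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def check_host_names(names_by_source):
    """names_by_source maps a label such as 'db2nodes.cfg' to the name found there."""
    problems = []
    for source, name in names_by_source.items():
        if is_ip_address(name):
            problems.append(f"{source}: {name} is an IP address, not a host name")
    distinct = set(names_by_source.values())
    if len(distinct) > 1:
        problems.append(
            "host name is not identical everywhere: " + ", ".join(sorted(distinct))
        )
    return problems


# Example values only; replace them with what you find on your own server.
found = {
    "hostname": "server1.domain.com",
    "db2nodes.cfg": "server1.domain.com",
    "HADR_LOCAL_HOST": "10.0.0.5",
}
for problem in check_host_names(found):
    print(problem)
```

Run it once per server, since each server has its own name to be consistent about.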

Failure on failover test while testing the Application

This one seems a bit dumb in retrospect, but I was working with someone I don’t normally, and made some assumptions that I shouldn’t have. Essentially what happened was that when we tested the failover, we saw the database come up fine every time, but the application never seemed to re-establish connections. After a couple of hours of troubleshooting, we realized that the application’s ID did not exist on the standby server, and when it was created and the passwords synced, the problem immediately went away. This holds true for just standard HADR, even if you’re not using TSA: ensure that your user ids and passwords are identical between your primary and your standby database servers.

TSA Installation Issues

We normally install DB2 from base code, and then apply the latest FixPack (well, as long as it has been out for a month or so). On RedHat, we’ve seen issues where the version of RedHat we’re using doesn’t support the version of TSA that comes with the base code. So when we install DB2, it gives an error message that the TSA/SAM component could not be installed. Luckily the version of TSA that comes with FixPack 4 and later is supported with the version of RedHat. But the FixPack does not automatically install it, of course. So for servers where we want to use TSA, we have to install the DB2 base code, install the FixPack, and then install the TSA/SAM component from the FixPack code using this procedure: http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/ – the section called “Software Installed”.

Other Failures

Ultimately, I know that I don’t fully understand at least half of the failures I’ve seen. I need to see what information I can find on pure TSA so that I really understand what to do and all of the states. I would love it if there were some education offered for this at the conference or even just in a webcast. So what I really have are a series of things that I try when a failure occurs. Some I’ve already mentioned above.

  1. Go through the prep work with a fine tooth comb: http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/. This includes:
    • Double and triple check that you have picked either the server’s short name or the server’s long name and are using it consistently in each of:
      • /etc/hosts
      • HADR configuration parameters in db cfg
      • db2nodes.cfg (in $HOME/sqllib)
      • Results of the ‘hostname’ command
    • Double check that you successfully executed the preprpnode command on both hosts
    • Double check that you successfully executed the db2cptsa command on both hosts
  2. Start Over. Delete your TSA work using the -delete option on db2haicu and start over with db2haicu fresh
    [db2inst1@403238-Prod-db2 ~]$ db2haicu -delete
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the
    utility called db2pd to query the status of the cluster domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability
    Instance Configuration Utility (db2haicu)' in the DB2 Information Center.
    
    db2haicu determined the current DB2 database manager instance is db2inst1. The cluster configuration that follows will apply to
    this instance.
    
    When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic
    'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing
    active cluster domain ...
    db2haicu found a cluster domain called prod_db2ha on this machine. The cluster configuration that follows will apply to this
    domain.
    
    Deleting the domain prod_db2ha from the cluster ...
    Deleting the domain prod_db2ha from the cluster was successful.
    All cluster configurations have been completed successfully. db2haicu exiting ...
    
  3. Try uninstalling and re-installing the TSA/SAM component
    • Uninstalling looks like this:
      [root@server1]# cd /db2/linuxamd64/tsamp
      [root@server1]# ./uninstallSAM
      uninstallSAM: Uninstalling System Automation on platform: x86_64
      uninstallSAM: Package is not installed: sam.sappolicy
      uninstallSAM: Uninstalling
       sam.adapter-3.1.0.1-08261.i386
      uninstallSAM: Uninstalling
       sam.msg.de_DE-3.1.0.0-0.i386
       sam.msg.de_DE.ISO-8859-1-3.1.0.0-0.i386
       sam.msg.de_DE@euro-3.1.0.0-0.i386
       sam.msg.de_DE.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.es_ES-3.1.0.0-0.i386
       sam.msg.es_ES.ISO-8859-1-3.1.0.0-0.i386
       sam.msg.es_ES@euro-3.1.0.0-0.i386
       sam.msg.es_ES.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.fr_FR-3.1.0.0-0.i386
       sam.msg.fr_FR.ISO-8859-1-3.1.0.0-0.i386
       sam.msg.fr_FR@euro-3.1.0.0-0.i386
       sam.msg.fr_FR.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.it_IT-3.1.0.0-0.i386
       sam.msg.it_IT.ISO-8859-1-3.1.0.0-0.i386
       sam.msg.it_IT@euro-3.1.0.0-0.i386
       sam.msg.it_IT.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.ja_JP.eucJP-3.1.0.0-0.i386
       sam.msg.ja_JP.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.ko_KR.eucKR-3.1.0.0-0.i386
       sam.msg.ko_KR.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.pt_BR-3.1.0.0-0.i386
       sam.msg.pt_BR.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.zh_CN.GB2312-3.1.0.0-0.i386
       sam.msg.zh_CN.GB18030-3.1.0.0-0.i386
       sam.msg.zh_CN.GBK-3.1.0.0-0.i386
       sam.msg.zh_CN.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam.msg.zh_TW-3.1.0.0-0.i386
       sam.msg.zh_TW.Big5-3.1.0.0-0.i386
       sam.msg.zh_TW.eucTW-3.1.0.0-0.i386
       sam.msg.zh_TW.UTF-8-3.1.0.0-0.i386
      uninstallSAM: Uninstalling
       sam-3.1.0.1-08261.i386
      uninstallSAM: Uninstalling
       rsct.opt.storagerm-2.5.1.4-08249.i386
      uninstallSAM: Uninstalling
       rsct.64bit-2.5.1.4-08249.x86_64
      uninstallSAM: Uninstalling
       rsct.basic.msg.de_DE-2.5.1.2-0.i386
       rsct.basic.msg.de_DE.ISO-8859-1-2.5.1.2-0.i386
       rsct.basic.msg.de_DE@euro-2.5.1.2-0.i386
       rsct.basic.msg.de_DE.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.es_ES-2.5.1.2-0.i386
       rsct.basic.msg.es_ES.ISO-8859-1-2.5.1.2-0.i386
       rsct.basic.msg.es_ES@euro-2.5.1.2-0.i386
       rsct.basic.msg.es_ES.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.fr_FR-2.5.1.2-0.i386
       rsct.basic.msg.fr_FR.ISO-8859-1-2.5.1.2-0.i386
       rsct.basic.msg.fr_FR@euro-2.5.1.2-0.i386
       rsct.basic.msg.fr_FR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.it_IT-2.5.1.2-0.i386
       rsct.basic.msg.it_IT.ISO-8859-1-2.5.1.2-0.i386
       rsct.basic.msg.it_IT@euro-2.5.1.2-0.i386
       rsct.basic.msg.it_IT.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.ja_JP.eucJP-2.5.1.2-0.i386
       rsct.basic.msg.ja_JP.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.ko_KR.eucKR-2.5.1.2-0.i386
       rsct.basic.msg.ko_KR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.pt_BR-2.5.1.2-0.i386
       rsct.basic.msg.pt_BR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.zh_CN.GB2312-2.5.1.2-0.i386
       rsct.basic.msg.zh_CN.GB18030-2.5.1.2-0.i386
       rsct.basic.msg.zh_CN.GBK-2.5.1.2-0.i386
       rsct.basic.msg.zh_CN.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic.msg.zh_TW-2.5.1.2-0.i386
       rsct.basic.msg.zh_TW.Big5-2.5.1.2-0.i386
       rsct.basic.msg.zh_TW.eucTW-2.5.1.2-0.i386
       rsct.basic.msg.zh_TW.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.basic-2.5.1.4-08249.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.de_DE-2.5.1.2-0.i386
       rsct.core.msg.de_DE.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.msg.de_DE@euro-2.5.1.2-0.i386
       rsct.core.msg.de_DE.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.es_ES-2.5.1.2-0.i386
       rsct.core.msg.es_ES.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.msg.es_ES@euro-2.5.1.2-0.i386
       rsct.core.msg.es_ES.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.fr_FR-2.5.1.2-0.i386
       rsct.core.msg.fr_FR.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.msg.fr_FR@euro-2.5.1.2-0.i386
       rsct.core.msg.fr_FR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.it_IT-2.5.1.2-0.i386
       rsct.core.msg.it_IT.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.msg.it_IT@euro-2.5.1.2-0.i386
       rsct.core.msg.it_IT.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.ja_JP.eucJP-2.5.1.2-0.i386
       rsct.core.msg.ja_JP.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.ko_KR.eucKR-2.5.1.2-0.i386
       rsct.core.msg.ko_KR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.pt_BR-2.5.1.2-0.i386
       rsct.core.msg.pt_BR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.zh_CN.GB2312-2.5.1.2-0.i386
       rsct.core.msg.zh_CN.GB18030-2.5.1.2-0.i386
       rsct.core.msg.zh_CN.GBK-2.5.1.2-0.i386
       rsct.core.msg.zh_CN.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.msg.zh_TW-2.5.1.2-0.i386
       rsct.core.msg.zh_TW.Big5-2.5.1.2-0.i386
       rsct.core.msg.zh_TW.eucTW-2.5.1.2-0.i386
       rsct.core.msg.zh_TW.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core-2.5.1.4-08249.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.de_DE-2.5.1.2-0.i386
       rsct.core.utils.msg.de_DE.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.utils.msg.de_DE@euro-2.5.1.2-0.i386
       rsct.core.utils.msg.de_DE.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.es_ES-2.5.1.2-0.i386
       rsct.core.utils.msg.es_ES.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.utils.msg.es_ES@euro-2.5.1.2-0.i386
       rsct.core.utils.msg.es_ES.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.fr_FR-2.5.1.2-0.i386
       rsct.core.utils.msg.fr_FR.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.utils.msg.fr_FR@euro-2.5.1.2-0.i386
       rsct.core.utils.msg.fr_FR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.it_IT-2.5.1.2-0.i386
       rsct.core.utils.msg.it_IT.ISO-8859-1-2.5.1.2-0.i386
       rsct.core.utils.msg.it_IT@euro-2.5.1.2-0.i386
       rsct.core.utils.msg.it_IT.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.ja_JP.eucJP-2.5.1.2-0.i386
       rsct.core.utils.msg.ja_JP.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.ko_KR.eucKR-2.5.1.2-0.i386
       rsct.core.utils.msg.ko_KR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.pt_BR-2.5.1.2-0.i386
       rsct.core.utils.msg.pt_BR.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.zh_CN.GB2312-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_CN.GB18030-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_CN.GBK-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_CN.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils.msg.zh_TW-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_TW.Big5-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_TW.eucTW-2.5.1.2-0.i386
       rsct.core.utils.msg.zh_TW.UTF-8-2.5.1.2-0.i386
      uninstallSAM: Uninstalling
       rsct.core.utils-2.5.1.4-08249.i386
      uninstallSAM: Uninstalling
       src.msg.de_DE-1.3.0.3-0.i386
       src.msg.de_DE.ISO-8859-1-1.3.0.3-0.i386
       src.msg.de_DE@euro-1.3.0.3-0.i386
       src.msg.de_DE.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.es_ES-1.3.0.3-0.i386
       src.msg.es_ES.ISO-8859-1-1.3.0.3-0.i386
       src.msg.es_ES@euro-1.3.0.3-0.i386
       src.msg.es_ES.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.fr_FR-1.3.0.3-0.i386
       src.msg.fr_FR.ISO-8859-1-1.3.0.3-0.i386
       src.msg.fr_FR@euro-1.3.0.3-0.i386
       src.msg.fr_FR.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.it_IT-1.3.0.3-0.i386
       src.msg.it_IT.ISO-8859-1-1.3.0.3-0.i386
       src.msg.it_IT@euro-1.3.0.3-0.i386
       src.msg.it_IT.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.ja_JP.eucJP-1.3.0.3-0.i386
       src.msg.ja_JP.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.ko_KR.eucKR-1.3.0.3-0.i386
       src.msg.ko_KR.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.pt_BR-1.3.0.3-0.i386
       src.msg.pt_BR.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.zh_CN.GB2312-1.3.0.3-0.i386
       src.msg.zh_CN.GB18030-1.3.0.3-0.i386
       src.msg.zh_CN.GBK-1.3.0.3-0.i386
       src.msg.zh_CN.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src.msg.zh_TW-1.3.0.3-0.i386
       src.msg.zh_TW.Big5-1.3.0.3-0.i386
       src.msg.zh_TW.eucTW-1.3.0.3-0.i386
       src.msg.zh_TW.UTF-8-1.3.0.3-0.i386
      uninstallSAM: Uninstalling
       src-1.3.0.4-08249.i386
    • For re-installing see: http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/ – the section called “Software Installed”

Now that I have my prep work figured out, I can get a clean setup on the first try about 50-75% of the time. The rest of the time, I still have some sort of issue that I have to troubleshoot and deal with on setup or testing. So don’t be discouraged – just work through the issues. I hope this post can provide you with a good toolbox of things to try. Please comment or contact me if you have additional issues that you have seen and solved so others can benefit from your pain.

Other Posts In This Series

This series consists of four posts:
Using TSA/db2haicu to automate failover – Part 1: The Preparation
Using TSA/db2haicu to automate failover – Part 2: How it looks if it goes smoothly
Using TSA/db2haicu to Automate Failover Part 3: Testing, Ways Setup can go Wrong and What to do.
Using TSA/db2haicu to automate failover Part 4: Dealing with Problems After Setup

Search this blog on “TSA” for other posts on TSA issues and tips.

What to Change in DB2 when IP Addresses or Host Names Change

I have been through a number of IP address or host name changes over the years, and thought I’d share the lessons learned. I’m specifically talking about changes to the IP address or hostname of a DB2 server, and the related changes within DB2. This post focuses on steps for Linux/UNIX systems.

IP Address Changes

If you have cataloged databases using host names, there may actually be very little to change when a database server’s IP address changes. I prefer to use a host name when cataloging remote databases. I require WebSphere Commerce and any application servers with full DB2 clients on them to use host names. If you’re using host names, then an IP address change has less impact. In fact, if you are not using TSA and you set up HADR (if any) with host names, you may not have to make any changes at all. You will, however, have to verify that your /etc/hosts file on the database server is also appropriately changed. An SA (System Administrator) may do this work, but it never hurts to check. See the “Hosts File” section below.

Before the change even occurs, first check to see if any loopback database catalog entries and any entries on DB2 clients that access this server use the IP address. If they don’t, then no work is needed in re-cataloging. If they do, then choose between altering them to use the hostname or coordinating with the IP address change to make the catalog entry changes at the same time. In either case, only change the node directory entry and not the database directory entry. Use the UNCATALOG NODE and CATALOG TCPIP NODE commands to re-catalog the nodes.
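
As a minimal sketch of the re-cataloging step, the helper below only builds the command strings for one node entry, so they can be reviewed before anyone runs them. The function name, the example node name, host name, and port are all hypothetical; the 8-character limit on node names and the trailing `db2 terminate` (to refresh the directory cache) are assumptions to verify against your DB2 version.

```python
def recatalog_node_commands(node_name, host_name, port):
    """Build the CLP commands that re-point one TCP/IP node entry at host_name.

    Only the node directory entry is touched; the database directory
    entry that references the node is left alone.
    """
    # Assumption: DB2 node names are 1 to 8 characters long.
    if not node_name or len(node_name) > 8:
        raise ValueError("node name must be 1 to 8 characters long")
    return [
        f"db2 uncatalog node {node_name}",
        f"db2 catalog tcpip node {node_name} remote {host_name} server {port}",
        "db2 terminate",  # assumption: refreshes the directory cache
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for command in recatalog_node_commands("PRODNODE", "db01.example.com", 50000):
        print(command)
```

Generating the commands instead of executing them keeps the sketch safe to run anywhere, and the output can be pasted into a change script.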

For HADR, check the values for HADR_LOCAL_HOST and/or HADR_REMOTE_HOST on both the primary and the standby database servers for any IP addresses. Hostnames in these variables mean that changes are not required if only the IP Address of a server is changing. Remember that making changes to these parameters on an earlier version than DB2 10.1 requires deactivation and activation of the database before the changes will actually take effect, and that means a database outage.
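
One way to make that HADR check repeatable is to scan saved `db2 get db cfg` output for host parameters that hold a literal IP address. This is a sketch under assumptions: the function name is hypothetical, and the sample text imitates the usual `(PARAMETER) = value` layout of the configuration listing with made-up values.

```python
import ipaddress
import re

# Matches lines such as: " HADR local host name  (HADR_LOCAL_HOST) = db01"
HADR_HOST_LINE = re.compile(r"\((HADR_(?:LOCAL|REMOTE)_HOST)\)\s*=\s*(\S*)")


def hadr_hosts_using_ip(db_cfg_text):
    """Return {parameter: value} for HADR host parameters set to an IP address."""
    flagged = {}
    for line in db_cfg_text.splitlines():
        match = HADR_HOST_LINE.search(line)
        if not match:
            continue
        name, value = match.groups()
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # empty or a host name: unaffected by an IP-only change
        flagged[name] = value
    return flagged


if __name__ == "__main__":
    # Hypothetical configuration excerpt.
    sample = """
 HADR local host name                  (HADR_LOCAL_HOST) = db01.example.com
 HADR remote host name                (HADR_REMOTE_HOST) = 192.0.2.15
"""
    print(hadr_hosts_using_ip(sample))
```

Run it against the output from both the primary and the standby; any parameter it reports needs an update, and on versions before 10.1 that update comes with the outage described above.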

If you are using TSA/db2haicu to automate failover, it is important to understand whether the IP addresses for the quorum device or the Virtual IP are changing. If they are, update them through db2haicu. Be prepared to destroy your TSA domain and re-do the whole thing. Because of this, it is critical to have the inputs ready for doing that: http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/. Depending on what is changing, it may be possible to go in and make selective changes using db2haicu to add/remove a quorum device or to add/remove a virtual IP address. In the worst case scenario, the network changes may require re-doing the setup from scratch.

Host Name Changes

This is actually pretty well documented by IBM: http://www-01.ibm.com/support/docview.wss?uid=swg21258834. IBM’s document also covers Windows systems, which I’m not covering in this post.

If the host name changes, change db2nodes.cfg. db2nodes.cfg is located in the instance owner’s home directory, in the sqllib subdirectory. Use a text editor to edit it with the updated host name.
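
If you would rather script the edit than open a text editor, the following sketch rewrites the host name column of db2nodes.cfg text. It assumes the common layout of whitespace-separated fields with the partition number first and the host name second; the function name is hypothetical, and an optional netname field, if present, is left untouched and would need its own review.

```python
def rename_host_in_db2nodes(cfg_text, old_host, new_host):
    """Replace old_host with new_host in the host name column of db2nodes.cfg text."""
    out_lines = []
    for line in cfg_text.splitlines():
        fields = line.split()
        # fields[0] is the partition number, fields[1] the host name.
        if len(fields) >= 2 and fields[1] == old_host:
            fields[1] = new_host
            line = " ".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical single-partition file content.
    print(rename_host_in_db2nodes("0 svq00db01z 0\n", "svq00db01z", "svs00db01z"), end="")
```

The function works on text rather than on the file itself, so you can print the result and compare it with the original before writing anything back.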

The next thing to change is the value of the DB2SYSTEM DB2 registry variable. It is a variable that rarely requires changing, but if you do a db2set -all, one of the parameters that you will see set is called DB2SYSTEM. It looks like this:

$ db2set -all |grep g
[g] DB2FCMCOMM=TCPIP4
[g] DB2SYSTEM=svq00db01z.domain.com
[g] DB2INSTDEF=db2inst1
[g] DB2ADMINSERVER=dasusr1

Chances are that most DBAs have never even changed the [g] parameters. Root is required to change this. Either su to root or log in as root, and then source the db2profile and use the db2set command like this:

# . /db2home/db2inst1/sqllib/db2profile
# db2set -g DB2SYSTEM=svs00db01z.domain.com

Of course the path for the db2profile will vary. It is in the instance owner’s home directory under the sqllib subdirectory.

Check the output of db2set again – it should be correct. Restart the instance (db2stop/db2start) for the change to take effect.

When using HADR, changes are required to the parameters for the host names – HADR_LOCAL_HOST and/or HADR_REMOTE_HOST on both the primary and the standby database servers. Remember that, unless using DB2 10.1 or later, DB2 requires deactivating and activating the database before the changes will actually take effect. That means a database outage.

For TSA/db2haicu, it is important to understand if the IP addresses for the quorum device or the Virtual IP are changing. If they are, update them, and be prepared to destroy your TSA domain and re-do the whole thing. Have all the inputs handy from http://db2commerce.com/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/. If the database server names are changing, be prepared to re-do the setup from scratch. If you fully understand what TSA is doing under the covers, making the changes outside of db2haicu may be possible, but I don’t have instructions on that.

Hosts File

An incorrect hosts file can have some interesting effects, and of course it is most likely to be incorrect when something like the hostname or IP address changes and someone has to also change the hosts file. When I went to look at a colleague’s HADR standby after an IP address change recently, I got this error every time I tried to start DB2:

SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

This error was particularly perplexing given the fact that I was working with a single-node environment – it didn’t seem to me that any communication would be necessary. But I even got it when trying to list applications. In that case, I discovered that the /etc/hosts file had an entry for the local server, but it did not reference the full server name returned by the ‘hostname’ command. It did, however, have the short name listed in db2nodes.cfg. I must have spent 10 minutes doing ‘hostname’, and looking at the db2nodes.cfg file before I figured this one out. It’s not just for TSA that the hosts file and the results of the ‘hostname’ command must be in sync with db2nodes.cfg. In this case, I edited /etc/hosts with the proper fully qualified host name, and the error then went away.
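
The comparison that took those 10 minutes can be sketched as a small check: collect every name in the hosts file, then see whether the ‘hostname’ output and each host in db2nodes.cfg are among them. The function names and sample values are hypothetical, and the check only looks at names, not at whether the addresses are correct.

```python
def names_in_hosts(hosts_text):
    """Collect every host name and alias listed in /etc/hosts style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        fields = line.split()
        names.update(name.lower() for name in fields[1:])  # fields[0] is the address
    return names


def missing_from_hosts(hosts_text, hostname_output, db2nodes_text):
    """Return the names DB2 relies on that have no entry in the hosts text."""
    wanted = {hostname_output.strip().lower()}
    for line in db2nodes_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            wanted.add(fields[1].lower())  # host name column of db2nodes.cfg
    return sorted(wanted - names_in_hosts(hosts_text))


if __name__ == "__main__":
    # Hypothetical contents mirroring the situation described above.
    hosts = "127.0.0.1 localhost\n192.0.2.10 db01\n"
    print(missing_from_hosts(hosts, "db01.example.com\n", "0 db01 0\n"))
```

With the sample values, the short name is found but the fully qualified name is reported as missing, which is the same mismatch that produced the SQL6048N error in this case.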

The same kind of confusion can occur after cloning a server at the OS level, when changing the hostname, or when changing the IP address.


Looking at How Much Memory DB2 is Using

I used to think that if I could just get enough details into a spreadsheet, I could tell exactly how much memory DB2 would be using at any point in time. I gave up on the spreadsheet idea long ago, though when I was working with 32-bit systems and their limit of ~2GB for the most critical memory areas, I did use a simplified spreadsheet when adjusting to make sure I could keep it under the 2GB.

Thankfully, 64-bit databases mean that I am less likely to have to rob bufferpools to give memory to sorts or vice-versa. Also, I use STMM and automatic settings for many areas on my one-database->one-instance->one-server systems.

But even with these advances it is best to understand DB2’s memory model and have the real-world commands to figure out what is going on.

DB2’s Memory Model

There is a lot of good material on this, so I am not going to go into a full description here. I suggest this developerWorks article as a great read on this: http://www.ibm.com/developerworks/data/library/techarticle/dm-0406qi/

The way I think of it, there are basically three types of memory areas – ones that are allocated at the instance level, ones that are allocated at the database level, and ones that are allocated at the application level.

Instance level memory areas include:

  • AUDIT_BUF_SZ
  • MON_HEAP_SZ
  • FCM areas

Database level memory areas include:

  • Buffer pools
  • LOCKLIST
  • PCKCACHESZ
  • CATALOGCACHE_SZ
  • UTIL_HEAP_SZ
  • DBHEAP

Application level memory areas include:

  • APPLHEAPSZ
  • STMTHEAP
  • STAT_HEAP_SZ
  • Private Sort Heaps (SHEAPTHRES and SORTHEAP)
  • AGENT_STACK_SZ
  • ASLHEAPSZ
  • RQRIOBLK

Why is there no exact way to say “DB2 should be using X memory at any given time”?

It took me a while to understand this. There are two big reasons you cannot just add up parameters as configured in DBM and DB configurations and say exactly how much memory DB2 should be using at any one time.

  1. Some memory areas are allocated only as applications connect or agents are started up, and the number of connected applications can change from second to second.
  2. Different memory is allocated at different times. Some areas are allocated in full on instance start or database activation or application connection. Some areas start at a minimum size and are incremented by DB2 up to a configured maximum.

If you work really hard at it, you can come up with an accurate range of memory allocation that DB2 should fall into at any given time based on your configuration settings, but estimating an exact number accurately is very difficult, especially for a live system with a variable number of end users.
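
To make the idea of a range concrete, here is a small sketch of the arithmetic. Each area has a starting size and a configured maximum, and per-connection areas are multiplied by the lowest and highest connection counts you expect. The class, the function, and every number are hypothetical; remember too that areas with soft limits can grow past the configured maximum, so the upper bound is an estimate rather than a guarantee.

```python
from dataclasses import dataclass


@dataclass
class MemoryArea:
    name: str
    min_mb: float                 # size when first allocated
    max_mb: float                 # configured ceiling
    per_connection: bool = False  # allocated once per connected application


def memory_range_mb(areas, min_connections, max_connections):
    """Return (low, high) estimates in MB across all areas."""
    low = high = 0.0
    for area in areas:
        low_mult = min_connections if area.per_connection else 1
        high_mult = max_connections if area.per_connection else 1
        low += area.min_mb * low_mult
        high += area.max_mb * high_mult
    return low, high


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    areas = [
        MemoryArea("buffer pools", 2048, 2048),
        MemoryArea("package cache", 32, 200),
        MemoryArea("application heap", 0.0625, 1, per_connection=True),
    ]
    print(memory_range_mb(areas, min_connections=10, max_connections=100))
```

Even in this toy configuration the gap between the two bounds is a few hundred megabytes, driven almost entirely by the areas that grow and by the connection count.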

Understanding an Individual Memory Area

Sometimes you need to approach memory from this perspective – understanding how much space the buffer pools are taking, or how much space the Package Cache is using for example.

I really do love the Info Center. I still have a complete three-foot-long bookshelf of DB2 8.1 reference books that has been replaced by the Info Center. Everything that was there is somewhere in the Info Center. Most memory areas correspond to a specific configuration parameter. If you look that configuration parameter up in the Info Center, you can figure out when it is allocated, and whether it is allocated at an exact value or allocated at a minimum and then grown up to a maximum. I have blog entries on some areas too.

Take the package cache as an example. The associated parameter is PCKCACHESZ (the one I most frequently misspell). In the Info Center, PCKCACHESZ is detailed here: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000266.html

If you look at that page there is a section titled “When allocated” – that is how you can tell when this memory area is allocated.
If you read through the text, you will see details like this:
“The limit specified by the pckcachesz parameter is a soft limit. This limit can be exceeded, if required, if memory is still available in the database shared set. You can use the pkg_cache_size_top monitor element to determine the largest that the package cache has grown, and the pkg_cache_num_overflows monitor element to determine how many times the limit specified by the pckcachesz parameter has been exceeded.”

From reading the Info Center entry, you can learn that the area is allocated in full on database activation, but this particular memory area can exceed the value you set for your parameter.

To contrast, look at the Info Center entry for LOCKLIST: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000267.html

If you read the text carefully, you will see that the lock list is an area that is allocated in full on the first database connection. BUT your lock list will absolutely not get larger than the configured value for LOCKLIST – to the point of degrading performance and finally returning an error if it gets that far.

These are very different behaviors for memory areas that are both allocated at the database level.

How to Tell How Much Memory DB2 is Using

For the most part, I will leave looking at DB2 memory from the OS side to SAs and, when needed, googling. I have used top before on Linux/UNIX, but it just reports one (generally large) number for all of DB2. That is not terribly helpful. The main situation I have used it in was when someone else was using it, called me, and said “What is this db2sysc that is using over 50% of the memory?”, and I had to compare it back to what memory DB2 thought it was using.

top

top looks something like this:

$ top
top - 21:56:00 up 97 days,  8:28,  1 user,  load average: 0.11, 0.04, 0.01
Tasks: 206 total,   1 running, 203 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.3%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8060476k total,  7696276k used,   364200k free,     7152k buffers
Swap:  6291448k total,    75748k used,  6215700k free,  6725432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16756 db2inst1  20   0 17220 1344  968 R  0.3  0.0   0:00.16 top
24131 db2inst1  20   0 6294m 4.1g 3.7g S  0.3 53.6   2869:02 db2sysc

The nice thing about top is that it does show changes over time, but it can also consume system resources.

db2mtrk

When we want to dig further into memory than the one big number, I have found db2mtrk to be a reliable method. It is used like this:

$ db2mtrk -i
Tracking Memory on: 2012/11/15 at 22:33:30

Memory for instance

   other       fcmbp       monh        
   17.0M       832.0K      384.0K      

$ db2mtrk -d
Tracking Memory on: 2012/11/15 at 22:33:38

Memory for database: SAMPLE

   utilh       pckcacheh   other       catcacheh   bph (4)     bph (3)     
   128.0K      44.9M       192.0K      21.9M       106.4M      217.9M      

   bph (2)     bph (1)     bph (S32K)  bph (S16K)  bph (S8K)   bph (S4K)   
   135.8M      2.1G        832.0K      576.0K      448.0K      384.0K      

   lockh       dbh         apph (52487)apph (52484)apph (52482)apph (52427)
   567.2M      34.9M       128.0K      128.0K      256.0K      192.0K      

   apph (52426)apph (52424)apph (52423)apph (52422)apph (52421)apph (45849)
   192.0K      128.0K      64.0K       192.0K      192.0K      64.0K       

   apph (45848)apph (45847)apph (45846)apph (45845)apph (45844)appshrh     
   64.0K       576.0K      64.0K       64.0K       64.0K       6.7M        

Adding up all of these numbers will give you a pretty accurate figure for total memory usage. There are a few other useful options on db2mtrk – “db2mtrk -h” provides both a usage diagram and the meanings of the abbreviations used for the various heaps.
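
Adding the numbers up by hand gets tedious, so here is a rough sketch that totals the value rows of saved db2mtrk output. It assumes the layout shown above: heap names on one row, sizes on the next, with K, M, or G suffixes read as powers of 1024. Treating a bare number as bytes is also an assumption. The function name is hypothetical.

```python
import re

UNIT_BYTES = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
SIZE_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)([KMG]?)$")


def total_db2mtrk_bytes(output):
    """Sum every size on the value rows of db2mtrk output, in bytes."""
    total = 0.0
    for line in output.splitlines():
        tokens = line.split()
        matches = [SIZE_TOKEN.match(token) for token in tokens]
        # A value row is one where every token looks like a size;
        # heap-name rows and the "Tracking Memory on" line are skipped.
        if not tokens or not all(matches):
            continue
        for match in matches:
            total += float(match.group(1)) * UNIT_BYTES[match.group(2)]
    return int(total)


if __name__ == "__main__":
    sample = """Memory for instance

   other       fcmbp       monh
   17.0M       832.0K      384.0K
"""
    print(total_db2mtrk_bytes(sample))
```

Because db2mtrk rounds each heap to one decimal place, the total is only as precise as those rounded inputs, which is still plenty for comparing against what top or db2top reports.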

db2top

I am a big fan of db2top for watching a database in real time. The memory screen can be useful here, and shows similar memory areas:

[/]23:28:40,refresh=60secs(0.004)                                                                             Memory                                                                            Linux,part=[1/1],DB2INST1:SAMPLEDB
[d=Y,a=N,e=N,p=ALL]                                                                                                                                                                                                      [qp=off]

                                                                             ┌──────────────┬────────────┬────────────┬────────────┬───────────┐
                                          Self Tuning......:         On      │              │         25%│         50%│         75%│	   100%│   Sort Heap........:          0
                                          Sort HWM.........:     117.6M      │Memory hwm%   │--------------------------------------------------│   Private Mem......:     109.8M
                                          Lock List........:     198.0K      │Sort Heap%    │                                                  │   Private Sort.....:          0
                                          Shared Sort......:          0      │Mem Skew%     │                                                  │   Shared Sort HWM..:          0
                                          Private Work HWM.:          0      │Pool Skew%    │                                                  │   PkgCache HWM.....:     156.0M
                                          Catalog Cache HWM:      15.1M      └──────────────┴──────────────────────────────────────────────────┘   Shared Work HWM..:          0

                                                           Memory                 Memory               Percent      Current         High Percent      Maximum # of    
                                                           Type        Level      Pool                   Total         Size    WaterMark     Max         Size Pool(s) 
                                                           ----------- ---------- -------------------- ------- ------------ ------------ ------- ------------ ------- 
                                                           Instance    DB2INST1   Monitor                0.01%       384.0K         1.1M 100.00%       384.0K       1
                                                           Instance    DB2INST1   FCMBP                  0.02%       832.0K       832.0K 100.00%       832.0K       1
                                                           Instance    DB2INST1   Other                  0.51%        17.1M        20.3M  40.44%        42.5M       1
                                                           Database    WC036D01   Applications           0.07%         2.5M         9.6M  13.89%        18.0M      18
                                                           Database    WC036D01   Database               1.04%        34.9M        34.9M  84.83%        41.1M       1
                                                           Database    WC036D01   Lock Mgr              16.88%       567.2M       595.6M 100.24%       565.8M       1
                                                           Database    WC036D01   Utility                0.00%       128.0K        66.6M   0.09%       139.4M       1
                                                           Database    WC036D01   Package Cache          1.11%        37.2M       198.3M 116.41%        32.0M       1
                                                           Database    WC036D01   Catalog Cache          0.65%        21.8M        21.8M 136.72%        16.0M       1
                                                           Database    WC036D01   Other                  0.01%       192.0K       192.0K   0.94%        20.0M       1
                                                           Database    WC036D01   BufferPool            79.24%         2.6G         3.1G 100.00%         2.6G       8
                                                           Database    WC036D01   ApplShrHeap            0.20%         6.6M        28.0M   8.56%        78.1M       1
                                                           Application WC036D01   Applications           0.07%         2.5M         9.6M  13.89%        18.0M      18
                                                           Application WC036D01   Other                  0.19%         6.2M        12.1M   0.00%       138.3G      18


Quit: q, Help: h                                                                                        Total memory 3.2G                                                                                              db2top 2.0

Notice in the very middle at the bottom, it gives you a total number that matches up well with the other methods described here.

db2pd

db2pd also lets us look at these memory areas in detail:

$ db2pd -mempools

Database Partition 0 -- Active -- Up 43 days 10:01:38 -- Date 2012-11-15-22.37.17.338066

Memory Pools:
Address            MemSet   PoolName   Id    Overhead   LogSz       LogUpBnd    LogHWM      PhySz       PhyUpBnd    PhyHWM      Bnd BlkCnt CfgParm   
0x00000002000012B8 DBMS     fcm        74    0          0           706833      0           0           720896      0           Ovf 0      n/a       
0x0000000200001170 DBMS     fcmsess    77    65376      1401568     1687552     1401568     1572864     1703936     1572864     Ovf 3      n/a       
0x0000000200001028 DBMS     fcmchan    79    65376      259584      507904      259584      393216      524288      393216      Ovf 3      n/a       
0x0000000200000EE0 DBMS     fcmbp      13    65376      656896      925696      656896      851968      983040      851968      Ovf 3      n/a       
0x0000000200000D98 DBMS     fcmctl     73    126624     1513973     3597992     1513973     1703936     3604480     1703936     Ovf 11     n/a       
0x0000000200000C50 DBMS     monh       11    122496     187124      368640      720892      393216      393216      1179648     Ovf 29     MON_HEAP_SZ
0x0000000200000B08 DBMS     resynch    62    42816      153720      2752512     153720      262144      2752512     262144      Ovf 2      n/a       
0x00000002000009C0 DBMS     apmh       70    4512       2498164     8257536     2521972     3080192     8257536     3080192     Ovf 121    n/a       
0x0000000200000878 DBMS     kerh       52    32         1816944     4128768     1824648     2031616     4128768     2031616     Ovf 195    n/a       
0x0000000200000730 DBMS     bsuh       71    0          444750      15335424    2838531     1048576     15335424    4718592     Ovf 64     n/a       
0x00000002000005E8 DBMS     sqlch      50    0          2618755     2686976     2618755     2686976     2686976     2686976     Ovf 208    n/a       
0x00000002000004A0 DBMS     krcbh      69    0          147400      196608      147840      196608      196608      196608      Ovf 15     n/a       
0x0000000200000358 DBMS     eduah      72    44864      4608024     4608064     4608024     4653056     4653056     4653056     Ovf 1      n/a       
0x0000000210000358 FMP      undefh     59    56000      860300      22971520    860300      917504      23003136    917504      Phy 7      n/a       

$ db2pd -db sampledb -mempools

Database Partition 0 -- Database SAMPLEDB -- Active -- Up 4 days 02:22:50 -- Date 2012-11-15-22.39.41.223822

Memory Pools:
Address            MemSet   PoolName   Id    Overhead   LogSz       LogUpBnd    LogHWM      PhySz       PhyUpBnd    PhyHWM      Bnd BlkCnt CfgParm   
0x00007FFF4A0A0408 SAMPLEDB utilh      5     0          3840        146210816   69378188    131072      146210816   69861376    Ovf 16     UTIL_HEAP_SZ
0x00007FFF4A0A0178 SAMPLEDB pckcacheh  7     2772288    32405296    Unlimited   163654502   47120384    Unlimited   208011264   Ovf 4329   PCKCACHESZ
0x00007FFF4A0A0030 SAMPLEDB xmlcacheh  93    50880      145552      20971520    145552      196608      20971520    196608      Ovf 1      n/a       
0x00007FFE9ED11BB0 SAMPLEDB catcacheh  8     90464      16768378    Unlimited   16777210    22937600    Unlimited   22937600    Ovf 4400   CATALOGCACHE_SZ
0x00007FFE9ED117D8 SAMPLEDB bph        16    40800      111484416   Unlimited   111484416   111607808   Unlimited   111607808   Ovf 846    n/a       
0x00007FFE9ED11548 SAMPLEDB bph        16    81600      228203472   Unlimited   228203472   228458496   Unlimited   228458496   Ovf 1725   n/a       
0x00007FFE9ED112B8 SAMPLEDB bph        16    81600      142117728   Unlimited   146656864   142344192   Unlimited   146931712   Ovf 1065   n/a       
0x00007FFE9ED11028 SAMPLEDB bph        16    2121600    2302992752  Unlimited   2875827056  2308177920  Unlimited   2881814528  Ovf 16914  n/a       
0x00007FFE9ED10D98 SAMPLEDB bph        16    0          783104      Unlimited   783104      851968      Unlimited   851968      Ovf 5      n/a       
0x00007FFE9ED10B08 SAMPLEDB bph        16    0          520960      Unlimited   520960      589824      Unlimited   589824      Ovf 3      n/a       
0x00007FFE9ED10878 SAMPLEDB bph        16    0          389888      Unlimited   389888      458752      Unlimited   458752      Ovf 2      n/a       
0x00007FFE9ED105E8 SAMPLEDB bph        16    0          324352      Unlimited   324352      393216      Unlimited   393216      Ovf 2      n/a       
0x00007FFE9ED104A0 SAMPLEDB lockh      4     0          594782848   624623616   624536192   594804736   624623616   624558080   Ovf 1      LOCKLIST  
0x00007FFE9ED10358 SAMPLEDB dbh        2     1883264    32304071    43188224    33257585    36634624    43188224    36634624    Ovf 25517  DBHEAP    
0x00007FFFB63B1920 Appl     apph       1     0          20375       1048576     71576       131072      1048576     131072      Phy 50     APPLHEAPSZ
0x00007FFFB66F3C68 Appl     apph       1     0          176492      1048576     14066624    262144      1048576     14221312    Phy 219    APPLHEAPSZ
0x00007FFFB63B0D98 Appl     apph       1     0          10935       1048576     10935       65536       1048576     65536       Phy 18     APPLHEAPSZ
0x00007FFFB63B0C50 Appl     apph       1     0          10935       1048576     10935       65536       1048576     65536       Phy 18     APPLHEAPSZ
0x00007FFFB63B0B08 Appl     apph       1     0          358615      1048576     402159      589824      1048576     655360      Phy 4364   APPLHEAPSZ
0x00007FFFB63B09C0 Appl     apph       1     0          10935       1048576     10935       65536       1048576     65536       Phy 18     APPLHEAPSZ
0x00007FFFB63B0878 Appl     apph       1     0          10935       1048576     12919       65536       1048576     65536       Phy 18     APPLHEAPSZ
0x00007FFFB63B0730 Appl     apph       1     0          10935       1048576     10935       65536       1048576     65536       Phy 18     APPLHEAPSZ
0x00007FFFB63B0358 Appl     appshrh    20    133568     4077928     81920000    27070923    6881280     81920000    29425664    Phy 1657   application shared

There are interesting-looking columns here for things like high water marks and upper bounds. Notice the “Unlimited” values: they are not truly unlimited. They are limited by the amount of physical and virtual memory on the server, and possibly by parameters that set upper bounds on a composite basis, like DATABASE_MEMORY or INSTANCE_MEMORY.
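
To put those high water marks in context, here is a minimal sketch that reads db2pd -mempools rows and prints each pool's physical high water mark (PhyHWM) as a share of its physical upper bound (PhyUpBnd). The function name pool_headroom is made up for illustration, and the field positions assume the column layout shown in the output above. Rows whose bound is Unlimited are simply flagged as having no per-pool bound. The sample rows are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not a DB2 tool): summarize db2pd -mempools rows read from stdin.
# Assumes the column order shown above: field 3 = PoolName, 10 = PhyUpBnd, 11 = PhyHWM.
pool_headroom() {
  awk '
    $1 ~ /^0x/ {                       # pool rows start with a hex address
      name = $3; upbnd = $10; hwm = $11
      if (upbnd == "Unlimited")
        printf "%-10s %s no per-pool bound\n", name, hwm
      else
        printf "%-10s %s %.1f%% of %s\n", name, hwm, 100 * hwm / upbnd, upbnd
    }'
}

# Sample rows taken from the db2pd output above
pool_headroom <<'EOF'
0x00007FFF4A0A0408 SAMPLEDB utilh      5     0          3840        146210816   69378188    131072      146210816   69861376    Ovf 16     UTIL_HEAP_SZ
0x00007FFF4A0A0178 SAMPLEDB pckcacheh  7     2772288    32405296    Unlimited   163654502   47120384    Unlimited   208011264   Ovf 4329   PCKCACHESZ
0x00007FFE9ED104A0 SAMPLEDB lockh      4     0          594782848   624623616   624536192   594804736   624623616   624558080   Ovf 1      LOCKLIST
0x00007FFE9ED10358 SAMPLEDB dbh        2     1883264    32304071    43188224    33257585    36634624    43188224    36634624    Ovf 25517  DBHEAP
EOF
```

On a live system the same function could be fed directly, for example with db2pd -db sampledb -mempools | pool_headroom.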

For a slightly different view, you can also look at memory from a memory sets point of view with db2pd:

$ db2pd -db sampledb -memsets

Database Partition 0 -- Database SAMPLEDB -- Active -- Up 4 days 02:30:10 -- Date 2012-11-15-22.47.01.149499

Memory Sets:
Name         Address            Id          Size(Kb)   Key         DBP    Type   Unrsv(Kb)  Used(Kb)   HWM(Kb)    Cmt(Kb)    Uncmt(Kb) 
SAMPLEDB     0x00007FFE9ED10000 639729668   4428736    0x0         0      1      865920     3413120    4005824    3845824    160000    
  Seg0       0x00007FFE7ED10000 811827205   0          0x0         0      1      0          0          0          0          0         
  Seg1       0x00007FFE6DF5D000 811859975   0          0x0         0      1      0          0          0          0          0         
AppCtl       0x00007FFFB63B0000 639696899   16512      0x0         0      12     0          8640       165248     16192      320       
  Seg0       0x00007FFE8ED10000 770605065   0          0x0         0      12     0          0          0          0          0         
App52482     n/a                821788678   128        0x0         0      4      0          128        0          128        0         

$ db2pd -memsets

Database Partition 0 -- Active -- Up 43 days 10:11:39 -- Date 2012-11-15-22.47.18.517271

Memory Sets:
Name         Address            Id          Size(Kb)   Key         DBP    Type   Unrsv(Kb)  Used(Kb)   HWM(Kb)    Cmt(Kb)    Uncmt(Kb) 
DBMS         0x0000000200000000 1497006081  48192      0x7F9F7661  0      0      6208       18624      22144      22144      26048     
FMP          0x0000000210000000 1497038850  22592      0x0         0      0      2          0          960        22592      0         
Trace        0x0000000000000000 1496973312  39251      0x7F9F7674  0      -1     0          39251      0          39251      0         

Man, I love MON_GET…

The MON_GET table functions are among my favorite recent features. I hear they were inspired by Informix, and they are inspiring in their own right: they expose data we have never had SQL access to before, and they do it in a lightweight way. MON_GET_MEMORY_SET and MON_GET_MEMORY_POOL should correspond to the db2pd way of looking at things.

$ db2 "select substr(HOST_NAME,1,25) as HOST_NAME, substr(DB_NAME,1,10) as DBNAME, MEMORY_SET_TYPE, MEMORY_SET_ID, MEMORY_SET_SIZE, MEMORY_SET_COMMITTED, MEMORY_SET_USED, MEMORY_SET_USED_HWM from table(MON_GET_MEMORY_SET(null,null,-2)) as t with ur"

HOST_NAME                 DBNAME     MEMORY_SET_TYPE                  MEMORY_SET_ID        MEMORY_SET_SIZE      MEMORY_SET_COMMITTED MEMORY_SET_USED      MEMORY_SET_USED_HWM 
------------------------- ---------- -------------------------------- -------------------- -------------------- -------------------- -------------------- --------------------
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                                                0                49348                22675                19136                22675
xxxxxxxxxxxxxxxxxxxxx.xxx -          FMP                                                 2                23134                23134                  983                  983
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                                             9               115212               115212                61276               173015
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                                            1              4535025              3938123              3493789              4101963
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                                        12               170065                16580                 9371               169213

  5 record(s) selected.
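
The set sizes here are close to, but not the same as, the Size(Kb) values from db2pd -memsets. One possible reading, inferred only from the figures in this post and not checked against IBM's documentation, is that db2pd counts units of 1024 bytes while the table function reports units of 1000 bytes. The sketch below just does that arithmetic. The function name kib_to_kb is made up for illustration, and the unit interpretation is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: convert a count of 1024-byte units into 1000-byte units,
# truncating the fraction. The unit interpretation is an assumption drawn from
# the two outputs in this post, not from the DB2 documentation.
kib_to_kb() {
  awk -v kib="$1" 'BEGIN { printf "%d\n", int(kib * 1024 / 1000) }'
}

# Size(Kb) values copied from the db2pd -memsets output earlier in the post
for pair in DBMS:48192 FMP:22592 SAMPLEDB:4428736; do
  name=${pair%%:*}
  kib=${pair##*:}
  printf '%s: %s (db2pd) -> %s\n' "$name" "$kib" "$(kib_to_kb "$kib")"
done
```

The three converted values come out as 49348, 23134 and 4535025, which are the MEMORY_SET_SIZE figures for the DBMS, FMP and DATABASE rows in the query output above. That is at least consistent with the unit explanation, although the two outputs were captured a few minutes apart.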

$ db2 "select substr(HOST_NAME,1,25) as HOST_NAME, substr(DB_NAME,1,10) as DBNAME, MEMORY_SET_TYPE, MEMORY_POOL_TYPE, MEMORY_POOL_ID, APPLICATION_HANDLE, EDU_ID, MEMORY_POOL_USED, MEMORY_POOL_USED_HWM from table(MON_GET_MEMORY_POOL(null,null,-2)) as t with ur"

HOST_NAME                 DBNAME     MEMORY_SET_TYPE                  MEMORY_POOL_TYPE                 MEMORY_POOL_ID       APPLICATION_HANDLE   EDU_ID               MEMORY_POOL_USED     MEMORY_POOL_USED_HWM
------------------------- ---------- -------------------------------- -------------------------------- -------------------- -------------------- -------------------- -------------------- --------------------
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             FCM_LOCAL                                          74                    -                    -                    0                    0
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             FCM_SESSION                                        77                    -                    -              1572864              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             FCM_CHANNEL                                        79                    -                    -               393216               393216
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             FCMBP                                              13                    -                    -               851968               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             FCM_CONTROL                                        73                    -                    -              1703936              1703936
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             MONITOR                                            11                    -                    -               393216              1179648
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             RESYNC                                             62                    -                    -               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             APM                                                70                    -                    -              3080192              3080192
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             KERNEL                                             52                    -                    -              2031616              2031616
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             BSU                                                71                    -                    -              1376256              4718592
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             SQL_COMPILER                                       50                    -                    -              2686976              2686976
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             KERNEL_CONTROL                                     69                    -                    -               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          DBMS                             EDU                                                72                    -                    -              4653056              4653056
xxxxxxxxxxxxxxxxxxxxx.xxx -          FMP                              MISC                                               59                    -                    -               917504               917504
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1429                    0                    0
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1480               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                                                                             90                    -                 1379                    0                    0
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                                                                             90                    -                 1379                    0                    0
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1379               393216              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1451               327680               327680
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1524               524288               524288
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1403               327680               393216
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1444               327680               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1423               393216              1507328
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1521               262144               524288
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1502               524288               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1504               655360               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1503               393216               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1536               458752               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1418               524288               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1412               393216               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1519               458752              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1535               458752               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1408               720896              3211264
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1449               327680               524288
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1526               262144               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1410               458752               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1525               393216              1507328
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1401               524288              1507328
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1446               458752               917504
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1532               589824               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1409               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1400               393216               524288
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1437               589824               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1283               393216              1507328
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1529               589824               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1405               917504              3407872
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1447               655360              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1433               655360              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1428               720896              1179648
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1520               786432              1179648
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1443               589824               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1430               655360               917504
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1422               655360               917504
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1445               458752              1507328
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1498               655360               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1450               655360               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1534               393216              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1424               196608               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1538               786432              1572864
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1416               589824               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1442               655360               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1518               589824               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1537               458752               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1539               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1533               327680               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1523               589824               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1505               131072               131072
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1506               196608              4521984
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1528               655360               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1527               589824               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1522               589824               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1417               655360               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1420               524288               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1508                65536              2228224
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1517               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1516               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1515               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1514               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1513               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1512               327680               458752
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1511               262144               262144
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1510                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1507               393216               393216
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1435               327680               327680
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1421               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1501               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1399               196608               524288
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1439               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1499               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1397               786432              3276800
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1414               655360               720896
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1432               589824               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1415               589824               851968
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1427               458752               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1169               589824               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1452               655360               655360
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1500               524288               589824
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                 1426               720896              1048576
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                   16                    0                    0
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                   14                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                   13                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PRIVATE                                            88                    -                    0                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          USER_DATA                                          95                    -                    0              8650752             11337728
xxxxxxxxxxxxxxxxxxxxx.xxx -          PRIVATE                          PERSISTENT_PRIVATE                                 86                    -                    0             15400960             36503552
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         UTILITY                                             5                    -                    -               131072             69861376
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         PACKAGE_CACHE                                       7                    -                    -             39059456            208011264
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         XMLCACHE                                           93                    -                    -               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         CAT_CACHE                                           8                    -                    -             22937600             22937600
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -            111607808            111607808
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -            228458496            228458496
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -            142344192            146931712
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -           2308177920           2881814528
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -               851968               851968
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -               589824               589824
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -               458752               458752
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         BP                                                 16                    -                    -               393216               393216
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         LOCK_MGR                                            4                    -                    -            594804736            624558080
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   DATABASE                         DATABASE                                            2                    -                    -             36634624             36634624
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52543                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52542                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52540                    -               262144              7274496
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52537                    -               196608               262144
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52532                    -               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52531                    -               196608               327680
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52530                    -               131072               131072
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52527                    -               131072               131072
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52526                    -                65536               131072
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52525                    -               196608               196608
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                52487                    -               131072               131072
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45849                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45848                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45847                    -               589824               655360
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45846                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45845                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPLICATION                                         1                45844                    -                65536                65536
xxxxxxxxxxxxxxxxxxxxx.xxx SAMPLEDB   APPLICATION                      APPL_SHARED                                        20                    -                    -              7340032             29425664

  137 record(s) selected.

Now that is pretty long, but think of the potential for easy addition and averaging and such.

I think you are going to get the best possible answer to “how much memory is DB2 using” from DB2’s perspective by using:

$ db2 "select SUM(MEMORY_POOL_USED) as TOT_MEMORY_USED from table(MON_GET_MEMORY_POOL(null,null,-2)) as t with ur"
TOT_MEMORY_USED     
--------------------
          3579117568

  1 record(s) selected.
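
That total is easier to read once it is converted out of raw units. Here is a minimal sketch in Python; the helper name `format_bytes` is made up for illustration and is not part of DB2. It assumes the column is reported in bytes on this DB2 level, which the 65536-sized application pools in the listing above suggest. Check the documentation for your version, since the unit may differ.

```python
def format_bytes(num_bytes):
    """Render a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# TOT_MEMORY_USED from the query output above (assumed to be bytes)
print(format_bytes(3579117568))  # 3.33 GiB
```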

IDUG NA 2013 Brain Dump

Wait, don’t avoid reading this because you didn’t go to the conference. There is still valuable stuff here. In fact, it is even more valuable for those who did not go. Last year, I posted my brain dump by basically re-writing all of my paper notes. This year, I’m going to try to make it a bit more organized.

It was really an awesome week. I learned so much, and I think I did well on my first time presenting. We’ll see when the surveys come back. If you’re not familiar with how they do things, IDUG tries to choose proven speakers. This means that if you have bad reviews, you’re less likely to be approved to present next year. So if you were there, go do all your reviews ASAP to give speakers their props.

This is a brain dump. It’s the stuff that really stood out to me in each session. It does no justice whatsoever to the depth and quality of the speakers’ presentations, and does not cover a bunch of stuff I already know. Please comment if you see any inaccuracies or have any comments on the sessions or my details of them.

Monday – ED4: DB2 for LUW Top Gun Performance Workshop – Scott Hayes and Martin Hubel

There’s a lot to learn here. I’m pretty good with performance tuning, and took Scott’s pre-conference workshop in 2005. It’s a whole other experience to take it now as a more experienced DBA. Many things I will use. I highly recommend it if you get the chance.

Tuesday – Keynote

The one and only session where I took no notes.

Tuesday – S02: DB2 for Linux, Unix, and Windows: Recent Enhancements and Trends – Matt Huras

I learn from Matt every time I see him speak. I got some interesting details about BLU and some interesting clues on what is coming down the road. My notes include:

  • BLU is a DMS tablespace type
  • Compression in 10.1 gains additional efficiency through the use of “Approximate Huffman Encoding” – my understanding of that is that a smaller value is chosen for the most frequently compressed values
  • Encrypted data does not need to be uncompressed to be evaluated
  • SIMD – Single Instruction, Multiple Data – in 10.5 DB2 can, for example, compare 4 values at once to see if they match a literal value
  • In 10.5, DB2 can dedicate specific CPUs to only look at memory for what they’re working on
  • A BLU table is specified by the “ORGANIZE BY COLUMN” keywords in the create table statement
  • A “synopsis” table is maintained alongside the actual columns of data for BLU tables. This is then used in some way for data elimination. This table is always there, and is changed atomically.
  • For analytics workloads, there is a registry variable – DB2_WORKLOAD that should be set to ANALYTICS
  • You can convert tables from normal to BLU
  • The create table statement will allow you to specify “ORGANIZE BY ROW” to explicitly specify a regular table
  • LOADing into BLU tables performs comparably to LOADing into traditional tables with indexes
  • BLU is not appropriate for high inserts/deletes because they have to touch more pages to do an insert than with traditional tables
  • IBM may or may not be including the ability to index on an expression in some future (post 10.5) release of DB2

Tuesday – E01: Advanced Query Tuning with IBM Data Studio for Developers – Tony Andrews

  • Ways of reducing I/O:
    • Index
    • System Tuning
    • BP Tuning
    • Early Elimination
    • Program and SQL Tuning
  • Residual predicates are ones that are applied after the data is retrieved and are therefore generally expensive
  • IXSCAN is displayed in an explain plan whether or not it is truly a scan – it could be just index access.

Tuesday – C02: DB2 10 LUW – Securely Hiding Behind the Mask – Rebecca Bond

  • With RCAC, we can do permissions for rows and masks for columns
  • This is also called FGAC or Fine-Grained Access Control
  • Column masking is the topic of this presentation
  • There is no need to mask all data – we can create a template and do partial masking
  • This can be included in db2look with the right flags
  • UDFs and Triggers can be secured
  • XML and LOBS cannot be masked
  • Catalog tables cannot be masked
  • Nicknames cannot be masked
  • Changes to masking invalidate the Package Cache
  • Change control of masking is critical
  • Must test masking changes very thoroughly to ensure they are providing the desired result
  • System tables for this stuff: SYSCAT.CONTROLS, SYSCAT.CONTROLDEP
  • Masking isn’t foolproof – it only covers when the data in a row is VIEWED – for example, a count(*) query with a where clause on the masked column would still give information on the true values

Tuesday – D03: DB2 Business Continuity Features – Dale McInnis

I always learn from Dale, too, when he speaks. I wish I could have attended his other session, but it conflicted with something I couldn’t miss.

  • Top causes of business interruption:
    • Human error (70-75%, mitigate this with documentation)
    • Planned maintenance
    • Disaster
    • Component failure
  • Recent Paradigm shift from failover to active/active
  • Geographic dispersion
  • Regular maintenance is simply required – you cannot just ignore it in favor of higher uptime
  • Application maintenance hassles:
    • Renaming of columns
    • New versions of stored procedures
  • WebSphere Commerce does not support:
    • Range partitioned tables
    • Insert time clustering (ITC) tables
  • IBM is working harder to get the apps they write anyway to support more recent features quickly. It sounds like we may get Commerce certifying on DB2 10 this year
  • Dale published a great whitepaper on DB2 integration with dedup devices http://www.ibm.com/developerworks/data/library/techarticle/dm-1302db2deduplication/index.html
  • TSM dedup can run on the client, which saves network traffic
  • IBM is working on support for online schema changes
  • IBM’s not necessarily scientific polling finds that 75% of OLTP clients, whether on DB2 or Oracle, are doing nothing for DR, only HA
  • Xcoto has been renamed to Unity!
  • DR technologies are generally:
    • Physical replication
    • Logical replication
    • Storage replication
    • Single cluster across multiple locations
  • Not all data must be replicated across sites – such as guest carts
  • HADR modes for HA
    • Sync
    • Nearsync
  • HADR modes for DR:
    • Async
    • Super Async
  • IBM is considering an admin mode on the target for Q replication to turn off things like Triggers

Tuesday – H04: Best Practices: Upgrade to DB2 LUW 10 – Melanie Stopfer

Wow, Melanie really packs the information in. Worth downloading the slides for any presentation she gives.

  • Upgrade servers first, clients later
  • Use db2prereqcheck
  • After upgrade, remember to:
    • Uninstall Firefox browser
    • Re-apply licenses
    • db2fs (optional)
    • db2val -a
    • db2ls
  • db2ckupgrade
  • With 10.1, type 1 indexes are no longer allowed. If you are coming from 9.7, you already do not have them, so you can specify -not1 to skip this step and speed it up
  • Tempspace must be 2X SYSCAT space
  • SYSCAT space must be 50% empty
  • The upgrade is done as a single UOW, so be careful of log space
  • Query Patroller has been discontinued
  • HADR MUST be stopped
  • Run db2ckbkup and db2cklogs after backup and before upgrade
  • If you are really desperate, technically only the last delta backup must be offline – the full could be online
  • Always verify the upgrade with db2level
  • There is a url for getting any needed bind packages
  • Upgrade does runstats on SYSCAT tables only – may need a runstats for other tables
  • db2tdbmgr to upgrade tools (things in the SYSTOOLS tablespace)
  • db2exmig can be used to upgrade the explain tables – if you have data in them you want to keep
  • Event monitor tables must be upgraded too, or event monitors won’t work – both formatted and unformatted tables
  • To undo an upgrade, you have to drop and re-create the instance – that’s why it is critical to have all settings documented
  • Always double-check EVERY config after upgrade – Diff is your friend
  • Consider using your HADR standby for backout if your upgrade on the primary fails before you have also done the standby
  • Copy all log files and .mig files in the active log path before connection – they are deleted on first connect
  • JDBC type 2 drivers are discontinued in DB2 10.1
  • Use Data Studio 3.1.1 or higher to take advantage of all 10.1 features
  • Default for detailed statistics is now sampled, (default for non-detailed statistics remains full)
  • Schema specification is no longer required for Runstats
  • Log records are larger – you’ll need to increase your active log space by 10-15%, and also LOGBUFSZ
  • Consider using these new features:
    • Adaptive Compression
    • Log Archive Compression
    • Multi-temperature storage
    • Insert-Time-Clustered tables
    • Partitioned tables
    • db2move supports parallelism
    • Change history
  • The Redbook on Unleashing DB2 10 for LUW is good.

Wednesday – G05: PureData System for Transactions Overview – James Cho

This session was a bit heavy on the marketing for me, but I wanted to have some details in case one of my clients buys one of these.

  • You can create your own patterns
  • Includes pureScale with 2 CFs
  • Includes TSM
  • Includes some GUI facility for taking backups
  • Includes OPM
  • Full rack is bigger than Watson
  • Includes SSD
  • Cannot split tablespaces and control where their storage lands
  • Does not include Recovery Expert

Wednesday – C06: DB2 on the Cloud: Experiences from the Trenches – Leon Katsnelson

This session was a bit focused on Amazon costs and options.

  • http://bigdatauniversity.com/ – this session is about how it is hosted in the cloud
  • Cloud = delivery (on-demand)
  • Infrastructure as a service
  • Platform as a service
  • Software as a service

Wednesday – E07: Battle-Proven SQL Tuning Techniques – Phil Gunning

  • Range delimiting predicates are applied earlier than many other predicates – that’s why you want them
  • Jump scan shows up in explain as a jump predicate
  • Jump Scans are counted in MON_GET_INDEX
  • Interesting way to re-write a not exists presented in the session – might look it up on the slides
  • DB2_ANTIJOIN can improve performance of not exists
    • Default = NO
    • Yes = not exists
    • Extend = not exists and not in
  • ‘IN’ is not always evil – it can sometimes be useful to encourage a semi-join

Wednesday – TNO: Nuts and Bolts for LUW Performance – Beulke, Gross, Stopfer

  • If using TRACKMOD=ON, then any change to a LOB will cause the entire tablespace to be backed up in full, which may lead to larger backups
  • LOBs should be in separate tablespace
  • SLOB = Small LOB
  • STMM sleeps for 60 seconds, does work, then sleeps again
  • To improve reorg performance:
    1. Commit as frequently as possible, since reorg will wait forever for every single row – it does not respect LOCKTIMEOUT
    2. Clustering reorgs do a forward scan while non-clustering reorgs do a backward scan – so non-clustering reorgs do half the work, or less, of clustering reorgs
  • Consider a script to check active log space and pause reorgs if it is full

Thursday – C08: The 10 Commandments of Supporting E-Commerce Databases – ME!

Well, this was my session, so I’ll refrain from a review. I think it went well.

Thursday – C09: Index Jump Scans and Updated rules for optimal DB2 LUW Index Design – Scott Hayes

This one was interesting. Basically, he had trouble getting jump scans to even happen, and in nearly all cases saw more logical reads on 10.1 than on 9.7. He later stated that execution time was about the same or better on 10.1, though. Good tips for using db2batch in the presentation.

Thursday – C10: Wowza! Great Tips I’ve Learned about DB2 LUW 10.1 – Melanie Stopfer

Melanie is one of my favorite people on the planet.

  • MON_GET_TRANSACTION_LOG (love those MON_GET table functions!)
  • Make your log buffer on the standby larger to avoid back pressure
  • HADR_SPOOL_LIMIT (default=0) – still guarantees integrity, but allows logs on standby to spool out to disk
  • ecu_buf_pct
  • Disadvantage – this can extend takeover time
  • ADMIN_MOVE_TABLE can be used for an offline reorg, though you can’t use LONGLOBDATA in that reorg
  • If using automatic storage, set overhead at the stogroup level
  • (9.7) – db2pd -tablespaces has TRACKMOD stats (or MON_GET_TABLESPACE)
  • Paths changed (due to looping PureScale in):
    • Database Directory
    • Log
    • Archive Log
    • Backup file name

Thursday – C11: A db2 LUW Fitness Plan – Brian Fairchild

Brian is brilliant on reorgs, and has good stuff to say.

  • MON_GET_TABLESPACE will show you if there are pages empty below the High Water Mark
  • Overflow accesses is the biggest reorg indicator
  • Great slides on reasons to reorg by formula – what each formula means
  • Value compression – can be used without specific license

Thursday – D12: An Index’s Guide to the Universe – Brad Price

  • Brad’s goal for OLTP is to have 5 indexes or less per table
  • ADVISE_INDEX can store potential indexes
  • When setting the explain mode, you can use “recommend indexes” to populate ADVISE_INDEX
  • There’s a use_index column in ADVISE_INDEX that can be tweaked

Thursday – DB2 for LUW Panel

  • Index reorgs done with “cleanup only” are fully, 100% online
  • Full index reorgs take a Z lock during the switch phase
  • 10.1 improves index prefetching
  • Allowing truncation during reorg gets an S lock at the table level
  • Read-ahead prefetching may cause the increased Logical Reads that were presented in Scott’s presentation
  • One should look at the execution time, not just the number of logical reads
  • LASTUSED is updated for indexes during a LOAD
  • PureScale requires AST, so you may have to use db2move or something to move data in from other tablespace types, if converting

Friday – G13: Customer Success Story: Using DBI Software tools for IBM DB2 LUW

This one surprised me with quality of content and speaker – Kohli has some skills!

  • Statistical views may be particularly useful when you have ranges in your where clause
  • Kohli presented some interesting techniques to deal with poor performing likes on ‘%string’ and ‘%string%’. They’re complicated, and I cannot do them justice here.
  • Interesting techniques for generating test data
  • Great query on slide 47 for Foreign Key creation

Friday – C14: How to Maximize Performance Benefits from DB2 10.1 & 10.1 BLU Intro – Michael Kwok

Another surprisingly good one – interesting speaker and great content:

  • Showed 10.1 performance benchmarks
  • ESE OLTP was about the same as 9.7
  • ESE Scalability was better than 9.7
  • Better performance on 10.1 was when leveraging multiple cores
  • When using jump scans, bigger gaps = better performance
  • 10.1 is more proactive in prefetching – look at sync vs. async index reads
  • 10.1 is faster for poorly clustered data
  • Need for reorgs is lower, and IBM’s strategic direction seems to be reducing the need for reorg rather than improving reorg
  • Path length for utilities reduced in 10.1
  • 10.1 encourages anti-join
  • exfmt in 10.1 recommends indexes to support zig-zag join

Friday – C15: Data Masking: Protecting both Production and Test Environments – Walid Rjaibi

Man, was I tired by the time I got to this presentation.

  • Verizon produces a comprehensive data security report
  • A trusted context includes additional capabilities like switching the user
  • WAS 6.1 and Cognos support trusted context – need to find out if Commerce does
  • When using column masking, you can call a UDF, but not a Stored Procedure
  • Guardium’s value add is if you’re working with multiple RDBMSes
  • Consider the same mask on related columns
  • Can use Optim’s data masking UDFs with column masks

I have done justice to nothing I saw. It was a fabulous week of drinking from a firehose of knowledge and hanging out with my DB2 geek friends. And it rained all week, so I didn’t feel bad about being inside and occupied from basically 7 am to 11 pm every day. If you haven’t been to an IDUG conference, GO – it is totally worth it.

When is ‘AUTOMATIC’ Not STMM?

Somewhere along the line, I associated ‘AUTOMATIC’ settings for parameters with DB2’s Self-Tuning Memory Manager (STMM). But the two are not the same thing. Sure, if STMM is set to ON, then some parameters set to AUTOMATIC will be tuned by the STMM, but many parameters can be set to AUTOMATIC whether STMM is ON or not.

What Parameters STMM can Change

In reality, the parameters that STMM deals with are ONLY:

  1. DATABASE_MEMORY if AUTOMATIC
  2. SORTHEAP, SHEAPTHRES_SHR if AUTOMATIC, and SHEAPTHRES is 0
  3. BUFFERPOOLS if number of pages on CREATE/ALTER BUFFERPOOL is AUTOMATIC
  4. PCKCACHESZ if AUTOMATIC
  5. LOCKLIST, MAXLOCKS if AUTOMATIC (both must be automatic)
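
The five rules above can be written down as a small check. This is only a sketch in Python of my reading of the list, not anything DB2 ships. The function name `stmm_tuned`, the dict layout, and the `auto_bufferpools` argument are all made up for illustration. It treats SORTHEAP and SHEAPTHRES_SHR independently, which is an assumption about rule 2.

```python
def stmm_tuned(db_cfg, dbm_cfg, auto_bufferpools=()):
    """Return what STMM would tune, following the five rules listed above.

    db_cfg and dbm_cfg are plain dicts of parameter name -> value, using
    the string "AUTOMATIC" for automatic settings (a simplification).
    auto_bufferpools names the bufferpools whose size is AUTOMATIC.
    """
    if db_cfg.get("SELF_TUNING_MEM") != "ON":
        return []

    def is_auto(name):
        return db_cfg.get(name) == "AUTOMATIC"

    tuned = []
    if is_auto("DATABASE_MEMORY"):
        tuned.append("DATABASE_MEMORY")
    # Sort memory is only tunable when the instance-level SHEAPTHRES is 0
    if dbm_cfg.get("SHEAPTHRES") == 0:
        tuned += [p for p in ("SORTHEAP", "SHEAPTHRES_SHR") if is_auto(p)]
    tuned += [f"BUFFERPOOL {bp}" for bp in auto_bufferpools]
    if is_auto("PCKCACHESZ"):
        tuned.append("PCKCACHESZ")
    # LOCKLIST and MAXLOCKS are tuned together or not at all
    if is_auto("LOCKLIST") and is_auto("MAXLOCKS"):
        tuned += ["LOCKLIST", "MAXLOCKS"]
    return tuned


db_cfg = {
    "SELF_TUNING_MEM": "ON",
    "DATABASE_MEMORY": "AUTOMATIC",
    "SORTHEAP": "AUTOMATIC",
    "SHEAPTHRES_SHR": "AUTOMATIC",
    "PCKCACHESZ": 8192,   # fixed value, purely illustrative
    "LOCKLIST": "AUTOMATIC",
    "MAXLOCKS": 10,       # fixed, so the LOCKLIST/MAXLOCKS pair is skipped
}
print(stmm_tuned(db_cfg, {"SHEAPTHRES": 0}, auto_bufferpools=["IBMDEFAULTBP"]))
```

With these illustrative settings, the package cache and the lock parameters drop out of the result because they are not (both) AUTOMATIC.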

Now personally, I immediately set PCKCACHESZ to a fixed number on my e-commerce or OLTP systems. On an oversized system, I’ve seen the STMM allocate way too much space to my package cache by caching every query that’s only ever executed once, which in a system that should be running well-defined SQL only is not helpful for performance.

But I’ve had good luck with the others generally set at AUTOMATIC on single-instance, single-database OLTP systems.

Three Scenarios to Understand

Scenario 1: STMM is off, DATABASE_MEMORY = AUTOMATIC

SELF_TUNING_MEM: OFF
DATABASE_MEMORY: AUTOMATIC

DB2 will add up the parameters for UTIL_HEAP_SZ, PCKCACHESZ, DBHEAP, SHEAPTHRES_SHR, LOCKLIST and CATALOGCACHE_SZ. Let’s pretend that equals 10 GB. DB2 then adds 20% to that – 2 GB – as the overflow buffer. You can then manually change any of those parameters plus bufferpools. As long as that change is less than the 2 GB (assuming other changes haven’t already used your 2 GB), the change will be immediately successful. If your changes exceed 2 GB, then the change will be deferred.

Package cache overflows and Catalog cache overflows automatically take memory from the overflow buffer, but other memory parameters must be changed manually to use the overflow buffer.

The size of the overflow buffer cannot be increased because STMM is off.
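
The arithmetic for Scenario 1 can be sketched in a few lines of Python. The individual heap sizes below are invented so that they total the 10 GB from the example, and the function names are hypothetical. The 20% figure and the immediate-versus-deferred rule come from the description above.

```python
GB = 1024 ** 3


def overflow_buffer_automatic(configured_heaps_bytes):
    """Scenario 1: the overflow buffer is 20% of the summed heap sizes."""
    return sum(configured_heaps_bytes.values()) * 20 // 100


def change_is_immediate(requested_increase_bytes, overflow_left_bytes):
    """A manual increase takes effect immediately only if it fits in the
    remaining overflow buffer; otherwise it is deferred."""
    return requested_increase_bytes <= overflow_left_bytes


# Illustrative sizes chosen only so that they add up to 10 GB
heaps = {
    "UTIL_HEAP_SZ": 1 * GB, "PCKCACHESZ": 1 * GB, "DBHEAP": 1 * GB,
    "SHEAPTHRES_SHR": 4 * GB, "LOCKLIST": 2 * GB, "CATALOGCACHE_SZ": 1 * GB,
}
overflow = overflow_buffer_automatic(heaps)
print(overflow // GB)                         # 2
print(change_is_immediate(1 * GB, overflow))  # True
print(change_is_immediate(3 * GB, overflow))  # False
```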

Scenario 2: STMM is on, DATABASE_MEMORY = 11 GB

SELF_TUNING_MEM: ON
DATABASE_MEMORY: 2883584

DB2 will add up the parameters for UTIL_HEAP_SZ, PCKCACHESZ, DBHEAP, SHEAPTHRES_SHR, LOCKLIST and CATALOGCACHE_SZ. Let’s pretend that equals 10 GB. DB2 then subtracts that from your static value for DATABASE_MEMORY – in this case, 11-10=1 GB as the overflow buffer. You can then manually change any of those parameters plus bufferpools. As long as that change is less than the 1 GB (assuming other changes haven’t already used your 1 GB), the change will be immediately successful. If your changes exceed 1 GB, then the change will be deferred.

Package cache overflows and Catalog cache overflows automatically take memory from the overflow buffer, but other memory parameters must be changed manually to use the overflow buffer.

The size of the overflow buffer cannot be increased because it is limited by the static size of DATABASE_MEMORY.
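
DATABASE_MEMORY is specified in 4 KB pages, which is why the value 2883584 above corresponds to 11 GB. A short Python sketch of that conversion and of the Scenario 2 subtraction follows. The function names are made up for illustration, and the 10 GB heap total is the pretend figure from the text.

```python
PAGE_BYTES = 4096   # DATABASE_MEMORY is expressed in 4 KB pages
GB = 1024 ** 3


def pages_to_bytes(pages):
    """Convert a count of 4 KB pages to bytes."""
    return pages * PAGE_BYTES


def overflow_buffer_fixed(database_memory_pages, heaps_total_bytes):
    """Scenario 2: whatever is left of the fixed DATABASE_MEMORY."""
    return pages_to_bytes(database_memory_pages) - heaps_total_bytes


print(pages_to_bytes(2883584) / GB)                  # 11.0
print(overflow_buffer_fixed(2883584, 10 * GB) / GB)  # 1.0
```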

Scenario 3: STMM is on, DATABASE_MEMORY = AUTOMATIC

SELF_TUNING_MEM: ON
DATABASE_MEMORY: AUTOMATIC

DB2 will automatically tune SORTHEAP, SHEAPTHRES_SHR (must have SHEAPTHRES=0 in DBM CFG), LOCKLIST and MAXLOCKS, bufferpools, and package cache. If DB2 runs out of space in the overflow buffer, it will go looking for more memory in INSTANCE_MEMORY. If INSTANCE_MEMORY is set to AUTOMATIC, then DB2 will also go looking for more memory from the operating system. But realistically, DB2 is only tuning INSTANCE_MEMORY by adding space to DATABASE_MEMORY, because INSTANCE_MEMORY contains DATABASE_MEMORY.

Within INSTANCE_MEMORY, DB2 allocates space to DATABASE_MEMORY, MON_HEAP_SZ, AUDIT_BUF_SZ, FCM buffers, FCM anchors and APPL_MEMORY. DB2 does not increase INSTANCE_MEMORY for the instance parameters – INSTANCE_MEMORY is only increased to accommodate changes to DATABASE_MEMORY.

If you set a hard value for INSTANCE_MEMORY, but leave DATABASE_MEMORY at AUTOMATIC, DB2 may choose to give too much of INSTANCE_MEMORY to DATABASE_MEMORY, leaving too little memory for APPL_MEMORY, and impacting overall performance. For this reason, if hard limits are set, it is advisable to set them at the database level instead of the instance level.

Some examples of parameters that can be set to AUTOMATIC without STMM:

  • DBHEAP – AUTOMATIC means it can be increased as long as there is space in DATABASE_MEMORY, or if DATABASE_MEMORY is also automatic, INSTANCE_MEMORY
  • MON_HEAP_SZ – AUTOMATIC means it can be increased as long as there is space in INSTANCE_MEMORY
  • STMTHEAP – AUTOMATIC means it can be increased as long as there is space in APPL_MEMORY, or INSTANCE_MEMORY if APPL_MEMORY is set to AUTOMATIC.
  • APPLHEAPSZ – AUTOMATIC means it can be increased as long as there is space in APPL_MEMORY, or INSTANCE_MEMORY if APPL_MEMORY is set to AUTOMATIC.

DB2 LUW – What is a Page?

The logical view of a database consists of the standard objects in any RDBMS – Tables, Indexes, etc. There are a number of layers of abstraction between this and the physical hardware level, both in the OS and within DB2.

Setting the Page Size

The smallest unit of I/O that DB2 can handle is a page. By default, the page size is 4,096 bytes. The default can be set to one of the other possible values when the database is created, using the PAGESIZE clause of the CREATE DATABASE command.

The possible values for page size in a DB2 database, no matter where it is referenced, are:

  • 4K or 4,096 bytes
  • 8K or 8,192 bytes
  • 16K or 16,384 bytes
  • 32K or 32,768 bytes

Realistically, the page size is set for either a buffer pool or a tablespace. Setting it at the database level on database creation just changes the default from 4K to whatever you choose and changes the page size for all of the default tablespaces and the default bufferpool.

The page size for each bufferpool and tablespace is set at the time the buffer pool or tablespace is created, and cannot be changed after creation.

Tables can be moved to a tablespace of a different page size after creation using the ADMIN_MOVE_TABLE procedure, but that operation requires at the very least an exclusive lock, and may not support RI – I hear RI support is added in 10.1 Fixpack 2.

Choosing a Pagesize

In my experience, it is rare to have a database created with a different default page size. Every database I currently support has the default page size of 4K, and also has at least one tablespace and one bufferpool with each of the other page sizes.

The most common time you think about page sizes is when you’re creating a table. When DB2 stores data in a table, it CANNOT have a row size larger than the page size minus some overhead. So if you have a row greater than 4,005 bytes in width, you simply cannot keep it in a tablespace with a page size of 4K. The row size does not include large LOBs that you do not wish to in-line. But it does include the maximum value of every varchar field.

This is one area where DB2 differs from Oracle. To my knowledge, in Oracle, you can have a row that spans multiple pages. From what I hear, DB2 is planning to support that in a future release, but they’re planning to do it by plopping the rest of the row in a LOB – there was an audible groan when they announced that in a conference session I was in, due to the other problems in dealing with LOBs.

It is also important to think of the geometry of the table when choosing a page size. If you have a row size of 2010 bytes, that means that only one row will fit on every 4K page, and therefore nearly 50% of the space allocated to your table will be empty and wasted. A row size that large would do much better on an 8K or 16K or even 32K page. It is important to consider this for every table as you create it, and to revisit it with any column length alterations you make.

I have, on several different occasions, had a request from a developer to increase a column size, and have had to move the table to a different tablespace to accommodate the change because the increase pushed the row over the limit. Historically, moving a table between tablespaces has required the table to be offline – which can be problematic in a live production database. If you’re properly testing changes in at least one development environment, you will discover these kinds of issues before they get to production.

While page overhead is not exactly 100 bytes, it is usually not far from it, so to determine how many rows will fit on a page, you can usually use:

(pagesize-100)/maximum row length

Again, this does not count LOB data, but only LOB descriptors. LOB descriptors are stored on the data page. The remainder of the LOB is stored in another location, assuming you have not INLINED the LOBs. From the standpoint of a page, the main reason for using a LOB is to allow a large portion of unstructured data to be stored elsewhere – not directly on the data page. LOB descriptor sizes on the page depend on the size of the LOB and vary from 68-312 bytes.
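
To see what the rows-per-page estimate means for the 2010-byte row mentioned earlier, here is a rough Python sketch. It uses the approximate 100-byte overhead from the formula above, so the results are estimates, and the function names are my own.

```python
PAGE_OVERHEAD = 100  # approximate, per the rule of thumb above


def rows_per_page(page_size, max_row_length):
    """Estimate how many whole rows fit on one page."""
    return (page_size - PAGE_OVERHEAD) // max_row_length


def unused_fraction(page_size, max_row_length):
    """Fraction of the page not holding row data (overhead included)."""
    used = rows_per_page(page_size, max_row_length) * max_row_length
    return 1 - used / page_size


for page_size in (4096, 8192, 16384, 32768):
    rows = rows_per_page(page_size, 2010)
    unused = unused_fraction(page_size, 2010)
    print(f"{page_size:>5}: {rows:>2} rows/page, {unused:.1%} unused")
```

For that row size the estimate is one row on a 4K page with roughly half the page unused, while the larger page sizes leave only a couple of percent unused.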

Generally, smaller pages are preferable for OLTP and e-commerce databases because they allow you to handle less data at a time when you’re expecting smaller queries.

The total table size is another factor in choosing a page size. New tablespaces should generally be created as “LARGE” tablespaces. But “REGULAR” used to be our only option, and for REGULAR tablespaces with a 4K page size, the limit on table size in a DMS tablespace is just 64 GB (per partition). On more than one occasion I have dealt with hitting that limit, and it is not a pleasant experience. For LARGE tablespaces, the limit per partition for a 4K page size is 8 TB – much higher.

Since pagesize is set at the tablespace level, you can also consider the appropriate page size for your indexes, assuming you’re putting them in a different tablespace than the data. While you cannot select a tablespace for each index, but only an index tablespace for the table as a whole, you’ll want to consider possible indexing scenarios when choosing where your indexes go as well. The limit for index key (or row) size in DB2 9.7 is the pagesize divided by 4. So for a 4K page size, it is 1024. Back on DB2 8.2, the limit was 1024 for all page sizes.
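
The pagesize-divided-by-4 rule is easy to tabulate. The Python sketch below does that and adds a rough check of whether a set of key columns would fit. The `key_fits` helper is hypothetical and simply sums the declared column lengths, ignoring any per-column overhead DB2 adds, so treat it as a first approximation only.

```python
PAGE_SIZES = (4096, 8192, 16384, 32768)


def max_index_key_bytes(page_size):
    """DB2 9.7 rule from the text: page size divided by 4."""
    return page_size // 4


def key_fits(column_lengths, page_size):
    """Rough check: sum of declared key column lengths vs. the limit.
    Ignores per-column overhead, so borderline cases need a real test."""
    return sum(column_lengths) <= max_index_key_bytes(page_size)


for page_size in PAGE_SIZES:
    print(page_size, max_index_key_bytes(page_size))

# A hypothetical two-column key: VARCHAR(1000) plus VARCHAR(500)
print(key_fits([1000, 500], 4096))  # False
print(key_fits([1000, 500], 8192))  # True
```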

An interesting side note: if you’re not familiar with the “SQL and XML limits” page in any DB2 Info Center, I recommend looking it up and becoming familiar with it. That’s where I verified the index row limit and there are all kinds of interesting limits there.

Data Page Structure

Ok, this section is admittedly just a tad fuzzy. I had the darndest time getting information on this, even reaching out to some of my technical contacts at IBM. But I’m going to share what I do know in case it helps someone.

Each data page consists of several parts. First is the page header – a space of 91-100 bytes reserved for information about the page. The page header identifies the page number and some other mysterious information I cannot find much detail on.

Next comes the slot directory – which is variable in size, depending on how many rows are on the page. It lists the RIDs that are on the page and the offset on the data page where each record begins. If the offset is -1, then that indicates a deleted record. In the structure of the rows themselves on the page, it appears that the records “start” at the bottom of the page. There may be open space between records due to any of the following:

  • Deleted records
  • Reduced size of VARCHAR values
  • Relocation of records due to increased size of the VARCHAR that will no longer allow the row to fit on the page

This open space cannot be used by additional records until after a reorg.

Finally, there may be continuous open space on a page that is left over or simply not used due to either deletes followed by reorgs or due to the pages simply not being filled yet.
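Since the details are fuzzy, the following Python class is only a toy model of the idea described above: a header, a slot directory of offsets, records placed from the bottom of the page, and -1 marking a deleted record. It is not DB2's real on-disk format. The 100-byte header takes the upper end of the range above, and the 4-byte slot entry is purely my assumption.

```python
class ToyDataPage:
    """Toy slotted page. NOT DB2's actual page layout."""

    HEADER_BYTES = 100      # text says 91-100 bytes; upper bound assumed
    SLOT_ENTRY_BYTES = 4    # ASSUMPTION: size of one slot directory entry

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self.slots = []             # slot directory: record offset, -1 = deleted
        self.lengths = []           # record lengths (bookkeeping for the toy)
        self.next_end = page_size   # records are placed from the bottom up

    def insert(self, record_len: int) -> int:
        """Place a record below the previous one and return its slot number."""
        directory_bytes = self.SLOT_ENTRY_BYTES * (len(self.slots) + 1)
        start = self.next_end - record_len
        if start < self.HEADER_BYTES + directory_bytes:
            raise ValueError("page full")
        self.slots.append(start)
        self.lengths.append(record_len)
        self.next_end = start
        return len(self.slots) - 1

    def delete(self, slot: int) -> None:
        """Mark the slot deleted; the space is not reused until a reorg."""
        self.slots[slot] = -1

    def fragmented_bytes(self) -> int:
        """Bytes held by deleted records, i.e. open space between records."""
        return sum(n for off, n in zip(self.slots, self.lengths) if off == -1)


page = ToyDataPage()
first = page.insert(200)
second = page.insert(300)
page.delete(first)
print(page.slots, page.fragmented_bytes())
```

The toy makes the reorg point visible: after the delete, the 200 bytes sit unused between live data and the end of the page, and nothing in `insert` will ever hand them out again.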

[Figure: DataPage]

I found some references to each page also having a trailer, but they were not from sources I could consider credible, so there may or may not be a trailer on the page. Most of the information here comes from a page in the IBM DB2 Info Center. I would love to hear reader comments on this topic, or any references anyone may have with more detailed data.

Not every page in a table is a data page dedicated fully to user data. There is one extent of pages allocated for each table as the extent map for the table. Past a certain size, additional extent map extents may be required. There is also a Free Space Control Record every 500 pages that db2 uses when looking for space to insert a new row or move a re-located row.

Index Page Structure

Indexes are logically organized in a b-tree structure. Interestingly, the RIDs that index pages use are table-space relative in DMS tablespace containers, but object relative in SMS tablespace containers – perhaps this is one of the reasons there has never been an easy way to convert from SMS to DMS or vice-versa.

I have not been able to find a good representation of exactly what leaf and non-leaf pages look like for indexes. We do have the standard representation of pages in a b-tree index, that is:
[Figure: b-tree]

This shows us that we can expect to see index keys that delimit ranges on the root and intermediate non-leaf pages, and that on the leaf pages, we expect to see the index keys along with the RIDs that correspond to those index keys. There is still a page header that is between 91 and 100 bytes, but I don’t know if index leaf pages have the same slot directory that data pages do. Again, I welcome user comments and links on this topic.

Extent Size and Prefetch Size

The extent size and prefetch size are specified on a tablespace by tablespace basis (keywords EXTENTSIZE and PREFETCHSIZE on the CREATE TABLESPACE command) or as database-wide defaults (DFT_EXTENT_SZ and DFT_PREFETCH_SZ db cfg parameters). Extent sizes can never be changed after tablespace creation. Prefetch sizes can be set to AUTOMATIC and be changed automatically by DB2, or can be altered manually using the ALTER TABLESPACE command. Both are specified as a number of pages.

The extent size is the number of pages that are allocated to any object in each tablespace container at one time. The prefetch size is the number of pages that are read into the bufferpool by the prefetchers for one read request. I’m not going to speak specifically to the tuning of these in this post.
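Because both values are expressed in pages, the amount of data they represent depends on the page size of the tablespace. A small Python sketch of that conversion follows; the extent and prefetch values used are made-up examples, not recommendations.

```python
def pages_to_kib(num_pages: int, page_size_bytes: int) -> int:
    """Convert a page count into KiB for a given page size."""
    return num_pages * page_size_bytes // 1024


# Hypothetical settings: EXTENTSIZE 32 and PREFETCHSIZE 64
for page_size in (4096, 32768):
    extent_kib = pages_to_kib(32, page_size)
    prefetch_kib = pages_to_kib(64, page_size)
    print(f"{page_size // 1024}K pages: extent {extent_kib} KiB, prefetch {prefetch_kib} KiB")
```

The same EXTENTSIZE therefore means eight times as much data per allocation in a 32K tablespace as in a 4K one.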

Summary

By understanding pages, we come closer to understanding how DB2 handles some aspects of I/O. Minimally, a DBA needs to be able to pick the appropriate page size for a given table and its indexes.

Using Vendor Backup Solutions with DB2 for LUW

There are an astonishing number of vendor solutions available with specific interfaces for DB2. Working with a variety of clients, I see and help to evaluate and implement a variety of backup solutions. I thought I’d share some of the things I look for and work on as part of an implementation. Sometimes the DBA has input on the solution chosen, and other times a solution is dictated and a DBA must simply implement it.

Backup Solution Expertise

With any solution, engaging an expert in that solution is key. Whether that expert is someone who is hired to implement the solution and train one or more internal resources on it, or whether someone is hired with that expertise, this is critical. A backup solution is not a “set it and forget it” kind of thing. Implementation is time consuming, testing is time consuming, and over time, issues occur that require expertise. If you don’t have someone permanently on staff with the expertise, then hire a qualified consultant and require that they also produce detailed documentation that not only your regular staff can use for routine work, but that another expert would understand in case you have to bring in someone else in the future.

Functions a Backup Solution Should Provide

Each time I am asked to evaluate a solution, I look at the details of how to perform these tasks. If I am asked to participate in the implementation of a solution, I will insist on testing each of these areas.

Backups/Restores of Databases

It may be obvious, but sometimes the syntax for a backup and/or restore may be a bit different. It is crucial to test not only the backup, but also the restore. For both, compare the duration against a backup to disk or against the former backup solution. I’ve seen a backup solution used that increased backup time by a factor of 10 due to improper network speed with the new solution. While this may be an annoyance on backup, it will likely completely destroy your RTO on restore. I also like to test a restore, a restore with rollforward (preferably after some time has passed), a redirected restore, and a restore to a different server.

Archiving of Transaction Logs

Most solutions for database backups will provide the ability to archive transaction logs directly to the backup solution. After interrogating the expert about how many copies of each archive log are kept in case of corruption or other problems, I’ll look into the details of how they recommend setting things up to accomplish this. Within the last year, I have actually seen a vendor that required the use of a user exit. This is a red flag to me, as a user exit is an older way of doing things, and can be problematic to deal with. Archiving of transaction logs should be able to be handled with a vendor library and a few settings at the database level. No need to have the black box of a user exit getting in the way.

Listing Backup Images and Transaction Log Files Available

If using TSM, this is easy using the db2adutl command. However, each vendor should have a similar command (though likely not with “db2” in the title) that the DBA must be able to run to see what backup images are available to restore. This is critical to test. I usually have to work through three things to make this work:

  • The syntax of the vendor command
  • Ensuring the instance owner and SYSADM groups have the right permissions to run the command
  • Details of the input to the vendor’s command

Extracting Backup Images to Disk

Being able to backup and restore an image is one thing, but you should also be able to extract that image to disk. Why? Maybe you need to transfer the image to a server that will not have access to the vendor solution. Maybe you need to supply a backup image to IBM or an application vendor’s support personnel to work through a problem. It is not enough to just be able to run backups and restores.

Extracting Transaction Log Files to Disk

For the same reasons you need to be able to access backup images, you also need to be able to access transaction log files. Many restore scenarios require at least a few transaction logs.

Functions a Backup Solution May Provide

In addition to the minimum functionality, a backup solution may provide other functionality.

Accessing Backup Files and Transaction Logs from a Different Server

As a part of my restore testing, I always try to test a restore on a different server than where the backup was taken. If this requires extracting the backup image to disk and then transferring the image, that is generally fine with me. If there is a way to access the backup image directly on the vendor’s backup solution from another server, that’s fine as well. Direct access can save time, especially since the connection to the backup solution may be faster than the connection available to transfer a file.

Deduplication

Most vendor backup solutions provide some sort of deduplication. If you’re using such a solution, make sure that you do not compress or encrypt your backup images, as either will nullify the benefits of deduplication. This may mean that you need to have enough disk space on your database server to hold an uncompressed backup image, in case you need to extract one to disk.

Flash Copy at the Storage Level

Some vendor backup solutions may suspend writes on the database and take a flash copy at the storage level. This can work well with DB2, as long as the backup procedure includes suspending writes and your database filesystems are appropriately segregated by database to make independent restores possible. Did you know that you can even roll forward through transaction logs after bringing up a backup taken in this manner? Just ensure that you perform the same kinds of restore tests and are aware of the restrictions with this methodology.

Pruning Backup Images and Transaction Log Files

DB2 provides methods for deleting old backups and transaction log files. Make sure you communicate with the expert in the vendor backup solution to see who is responsible for pruning old images in this scenario. If the vendor solution performs this work, verify that the retention numbers they are specifying for backups and for log files match what you need to meet requirements at the database level.

DB2 Settings for Vendor Solutions

Often, using a vendor solution for DB2 backups, restores, and transaction log file archiving requires setting three DB2 parameters – VENDOROPT, LOGARCHMETH1, and LOGARCHOPT1.

Vendor Options

The VENDOROPT database configuration parameter will often specify a configuration file that DB2 will need to know about for backup, restore, or load copy operations. It can specify other details needed by the vendor solution.

Log Archive Method and Options

The LOGARCHMETH1 database configuration parameter will be set to VENDOR: followed by a string that will likely specify the vendor library. The LOGARCHOPT1 database configuration parameter will likely specify the name of the configuration file (which may be the same as specified in VENDOROPT), and may specify other details that the vendor solution needs.
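To show how the three settings fit together, here is a small Python sketch that only builds the `db2 update db cfg` command strings. The library path and configuration file are borrowed from the NetWorker backup example further down in this post, and the `@file` form for VENDOROPT and LOGARCHOPT1 is an assumption carried over from that example; your vendor may expect a different option string.

```python
def vendor_cfg_commands(db_name: str, library: str, config_file: str) -> list:
    """Build the db cfg update commands for a vendor log archive setup.

    Hypothetical helper: it only formats strings and does not call Db2.
    """
    settings = {
        "LOGARCHMETH1": f"VENDOR:{library}",
        "LOGARCHOPT1": f"@{config_file}",   # ASSUMPTION: vendor accepts @file
        "VENDOROPT": f"@{config_file}",     # ASSUMPTION: vendor accepts @file
    }
    return [
        f"db2 update db cfg for {db_name} using {param} {value}"
        for param, value in settings.items()
    ]


for command in vendor_cfg_commands(
    "SAMPLE", "/usr/lib/libnsrdb2.so", "/nsr/apps/config/db_name_db2.cfg"
):
    print(command)
```

Generating the commands from one library path and one config file keeps the three parameters consistent, which matters because a mismatch between them is an easy mistake to make by hand.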

TSM

TSM continues to have its own set of parameters that are specified directly in the database configuration. These parameters include:

  • TSM_MGMTCLASS
  • TSM_NODENAME
  • TSM_OWNER
  • TSM_PASSWORD

For other vendor solutions, these are things that are often specified in one of the vendor option locations or the vendor configuration file. IBM just makes it easier to use their products together. Like with any other vendor solution, you need a TSM expert to help get TSM set up in a stable and secure way. TSM also has a dedicated db2 command – db2adutl – to list and extract backup images and transaction logs.

Backup Command

The backup command often looks a bit different when using a vendor solution. Here’s one example from a client that uses a product called ‘NetWorker’:

db2 backup db db_name online load /usr/lib/libnsrdb2.so options @/nsr/apps/config/db_name_db2.cfg dedup_device include logs

This is the same library that we specify in LOGARCHMETH1 and the same config file that we specify in the VENDOROPT and LOGARCHOPT1 parameters. Notice also that we do not encrypt or compress the backup image, and that we specify the DEDUP_DEVICE option in the backup command, which optimizes the backup image for deduplication. For TSM, the location is specified as USE TSM instead of LOAD <library>.

Parallelism

With many vendor solutions, you can specify that multiple sessions be opened to the vendor product if bandwidth is available. This can speed up the backup, so you may want to try this. This is specified with the OPEN <num-sessions> SESSIONS syntax.
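As a rough illustration, this Python sketch assembles a backup command like the NetWorker example above and optionally adds the sessions clause. It only formats a string. The placement of OPEN ... SESSIONS directly after the library name reflects my understanding of the BACKUP DATABASE syntax, so confirm it against the command reference for your version; the function and its parameter names are hypothetical.

```python
def build_backup_command(db_name, library, options_file,
                         sessions=None, dedup=True, include_logs=True):
    """Format an online vendor backup command (string only, nothing is run)."""
    parts = ["db2", "backup", "db", db_name, "online", "load", library]
    if sessions is not None:
        # ASSUMPTION: the sessions clause follows the library name
        parts += ["open", str(sessions), "sessions"]
    parts += ["options", f"@{options_file}"]
    if dedup:
        parts.append("dedup_device")
    if include_logs:
        parts += ["include", "logs"]
    return " ".join(parts)


print(build_backup_command(
    "db_name", "/usr/lib/libnsrdb2.so", "/nsr/apps/config/db_name_db2.cfg",
    sessions=4,
))
```

When testing parallelism, it is worth timing the same backup with a few different session counts, since the gain depends on the bandwidth available to the vendor product.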

Warning Signs in Vendor Backup Solutions

I have seen bad vendor solutions or perhaps bad implementations of vendor solutions. Look out for vendor solutions that don’t have details around DB2 – they may not be aware of the intricacies of DB2 backups. Also keep an eye on CPU and I/O stats before and after implementation of a vendor solution. I have seen a vendor solution crash a running DB2 production database server (one of the rare cases I’ve seen an error condition require a full server reboot to recover from on AIX). So be sure that you or someone is thoroughly analyzing things before you get to production.

DB2 Memory Area In-Depth: The Package Cache

The package cache is just one memory area that DB2 offers to tune memory usage for a DB2 database. This article is a deep dive into this memory area.

What is the Package Cache

The package cache is an area of memory that DB2 uses to store access plans. Access plans are detailed strategies for how DB2 will get to all of the data it needs to satisfy a query.

SQL is a language where we tell DB2 what data we want. We don’t tell DB2 with computer precision where that data is on disk or the methodology that it should use to integrate data into a single result set. We tell it to join two tables, but we don’t give it the specific join methodology to line up rows from the two tables.

DB2 has a wickedly powerful cost-based optimizer that takes the details we feed it about the result set we want, and turns that into the incredibly detailed plan of how to retrieve all that data in the most efficient way and integrate it. There are multiple people at IBM whose whole jobs and careers are based on writing, testing, and improving this optimizer. The more powerful your cost-based optimizer, the more powerful your RDBMS. In my opinion, a good cost-based optimizer is a key differentiator of a relational database management system over many flavors of NO-SQL.

While the DB2 optimizer usually saves vastly more time than if we just accessed all data by table scan and nested loop join, there are still costs associated with that work. In many DB2 databases, the same or vastly similar SQL is re-executed over and over again. DB2 places the access plans generated in a memory area called the package cache. The package cache stores access plans for both static and dynamic SQL.

Sizing the Package Cache

The package cache, in recent versions, is set to AUTOMATIC by default and is tuned as part of STMM (the Self-Tuning Memory Manager). This means that DB2 can tune it for you.

This is, however, the one parameter I nearly always remove from self-tuning. Particularly in databases that make poor use of parameter markers, and thus have few reusable access plans, DB2 tends to over-size the package cache.

Too Small

If the package cache is too small, agents will have to constantly re-compile dynamic SQL while applications are waiting for that compilation to happen.

Package Cache Overflows

Package cache overflows happen when there is not enough memory in the package cache to hold the access plans for all active SQL. If this happens, DB2 will borrow some memory from the database overflow buffer. Borrowing this memory takes some time and resources and means applications will wait even longer on compiling SQL. You can look for package cache overflows in the db2 diagnostic log like this:

SELECT     TIMESTAMP
    , substr(APPL_ID,1,15) as APPL_ID_TRUNC
    , MSGSEVERITY as SEV
    , MSGNUM
FROM TABLE ( PD_GET_LOG_MSGS( CURRENT_TIMESTAMP - 7 DAYS)) AS T 
WHERE MSGNUM=4500
ORDER BY TIMESTAMP DESC;

TIMESTAMP                  APPL_ID_TRUNC   SEV MSGNUM     
-------------------------- --------------- --- -----------
2016-02-18-10.36.52.270951      192.0.2.0. W          4500
2016-02-18-10.36.44.129013      192.0.2.0. W          4500
2016-02-18-10.36.36.298252      192.0.2.0. W          4500
2016-02-17-10.38.23.363510      192.0.2.0. W          4500
2016-02-17-10.38.15.112473      192.0.2.0. W          4500

The above is slightly more accurate than using db2diag with -e 4500, as that can return false positives.
There is also a counter of package cache overflows since the last database restart:

select pkg_cache_num_overflows
from table(mon_get_database(-2)) as t with ur

PKG_CACHE_NUM_OVERFLOWS
-----------------------
                  10347

  1 record(s) selected.

Generally we want to tune the package cache to a large enough size that we’re not seeing frequent package cache overflows.

Package Cache Hit Ratio

Another performance indicator for the package cache is the package cache hit ratio. The package cache hit ratio tells us how often an agent went to see if there was already an access plan in the package cache and found it was already there.
The package cache hit ratio can be calculated like this:

select decimal((1-(float(pkg_cache_inserts)/float(pkg_cache_lookups))) * 100,5,2) 
from table(mon_get_database(-2)) as t with ur

1      
-------
  34.40

  1 record(s) selected.
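The formula in that query is simple enough to sketch in Python, which can be handy if you are collecting the two counters into a script. The function name is hypothetical, the guard against zero lookups is my addition, and the counter values in the example are invented to land near the output shown above.

```python
def pkg_cache_hit_ratio(inserts: int, lookups: int):
    """Package cache hit ratio in percent, or None if there were no lookups."""
    if lookups == 0:
        return None  # avoids division by zero on a freshly activated database
    return round((1 - inserts / lookups) * 100, 2)


# Invented counter values for illustration only
print(pkg_cache_hit_ratio(inserts=656, lookups=1000))
print(pkg_cache_hit_ratio(inserts=5, lookups=100))
```

Every insert represents a lookup that did not find a usable access plan, so the fraction of inserts to lookups is the miss rate and one minus that is the hit ratio.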

Before you tune the package cache to improve your package cache hit ratio, consider what the database workload looks like. There are some database workloads that will never get a high package cache hit ratio because the SQL executed is either truly unique SQL or the application is not making appropriate use of parameter markers. Some data warehousing environments may have truly unique SQL, and you will never achieve a good package cache hit ratio, no matter how big you make the package cache. In that case, the memory is better used in other places. If there is a bad package cache hit ratio due to lack of proper use of parameter markers, then consider educating your developers on the proper use of parameter markers or, as a last resort, using the statement concentrator.

In an e-commerce or OLTP database, this is one of the top 10 key performance indicators that I look at when looking at performance for a system. I want to see my package cache hit ratio above 95% in those environments. e-commerce and OLTP environments should be repetitively executing the same sets of queries over and over again with different values.

Too Large

If the package cache is too large, it is using memory that would be more beneficial if allocated to other important memory areas like sort or buffer pools. An overly large package cache can also make analysis of SQL based on the data in the package cache unreasonably difficult.

Size of the Package Cache

The package cache is defined by the database configuration parameter PCKCACHESZ. I have to look up the spelling of this parameter EVERY TIME, because it just doesn’t seem logically consistent with other database configuration parameters to me.
It is easy to determine the size of the package cache:

db2 get db cfg for SAMPLE |grep PCKCACHESZ
 Package cache size (4KB)                (PCKCACHESZ) = AUTOMATIC(18819)

or

select integer(value) as pckcachesz_4kPG
    , value_flags 
from sysibmadm.dbcfg 
where name='pckcachesz' with ur

PCKCACHESZ_4KPG VALUE_FLAGS
--------------- -----------
          18819 AUTOMATIC

  1 record(s) selected.

The size of the package cache can be updated using this syntax:

db2 update db cfg for SAMPLE using pckcachesz <NUM>
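PCKCACHESZ is expressed in 4KB pages, so it helps to convert the value into something easier to compare against the rest of your memory configuration. A minimal Python sketch, using the 18819 pages from the output above:

```python
def pckcachesz_to_mib(pages: int, page_size_bytes: int = 4096) -> float:
    """Convert a PCKCACHESZ value (in 4KB pages) to MiB."""
    return pages * page_size_bytes / (1024 * 1024)


print(f"{pckcachesz_to_mib(18819):.2f} MiB")
```

For the value shown above this works out to roughly 73.5 MiB.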

Data Available in the Package Cache

In addition to serving a very useful function, the package cache also holds details about SQL that has been executed against the database. This information includes:

  • Query statements
  • Number of executions
  • Execution time
  • Lock wait time
  • Sort time
  • CPU time
  • Rows read
  • Rows returned
  • And much more

This information is collected for each unique statement, but it is cumulative across all executions of the statement since the statement was placed into the package cache. DB2 keeps about as much as it can in the package cache, but especially if the package cache is on the small side, statements may be moving in and out of the package cache all the time. This means that it is possible for the data for one statement to represent executions over the last 10 months, while the very next statement might be over the last 10 seconds.

Having this data be cumulative is very useful. It can be a potent tool for identifying problem SQL that is running against a database.
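Because the totals cover different time windows for different statements, it often helps to look at per-execution averages alongside the raw totals. The Python sketch below uses invented numbers and hypothetical field names to show how the two rankings can disagree.

```python
def per_execution(total: float, num_executions: int) -> float:
    """Average of a cumulative metric per execution (0.0 if never executed)."""
    return total / num_executions if num_executions else 0.0


# Invented sample rows; the keys are hypothetical, not real column names
statements = [
    {"stmt": "A", "num_executions": 1_000_000, "total_exec_time_ms": 2_000_000},
    {"stmt": "B", "num_executions": 10, "total_exec_time_ms": 50_000},
]

by_total = sorted(statements, key=lambda s: s["total_exec_time_ms"], reverse=True)
by_average = sorted(
    statements,
    key=lambda s: per_execution(s["total_exec_time_ms"], s["num_executions"]),
    reverse=True,
)

print("worst by total:  ", by_total[0]["stmt"])
print("worst by average:", by_average[0]["stmt"])
```

Statement A dominates the total because it runs constantly, while statement B is the one that is slow each time it runs. Both views are useful, and which one matters depends on whether you are chasing overall load or a specific slow request.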

To query this data, see my favorite SQL statement for identifying problem SQL.

Data Not Available in the Package Cache

There are some pieces of data that we might sometimes want that are not available in the package cache. This includes:

  • When a particular query was executed
  • Some details on static SQL
    • If using snapshot monitor
    • Prior to DB2 9.7
  • Specific values used in execution of a statement when that statement uses parameter markers
  • Literal values used during execution, if the statement concentrator was used for that statement

Most of these details can be collected using a more invasive activity event monitor. Many of these pieces of information are simply not possible to gather in a cumulative environment like the package cache.

Parameter Markers

Parameter markers are placeholders for values in SQL statements. They allow DB2 to compile one access plan and then use that access plan for the same statement with different supplied values. In the same statement, some values may make sense as parameter markers, while others make more sense as static values. It is up to the developer or vendor to choose the proper places to use or not use parameter markers. This is not something that the DBA often has much input on, though educating developers on such details falls to many DBAs.

One of the disadvantages of parameter markers is that using them means that distribution statistics will not be used for that particular portion of the query. The advantage of a generic access plan is that it doesn’t have to be re-compiled over and over. The disadvantage is that in some cases of data skew, the access plan will be slower than one that took into account the specific values.
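For anyone who has not seen the difference in application code, here is a minimal Python sketch. It uses the sqlite3 module from the standard library purely as a stand-in so the example runs anywhere; SQLite has no package cache, and a DB2 application would use a DB2 driver instead. The table and values are invented. What carries over is the statement text: with literals every execution is a different statement, while with a `?` marker there is only one.

```python
import sqlite3

# SQLite is only a stand-in here so that the sketch is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, status text)")
conn.executemany("insert into orders values (?, ?)", [(1, "P"), (2, "C"), (3, "P")])

order_ids = [1, 2, 3]

# Literal values: each execution produces different statement text
literal_texts = {f"select status from orders where id = {oid}" for oid in order_ids}

# Parameter marker: one statement text, values supplied at execution time
marker_sql = "select status from orders where id = ?"
marker_texts = set()
results = []
for oid in order_ids:
    marker_texts.add(marker_sql)
    results.append(conn.execute(marker_sql, (oid,)).fetchone()[0])

print(len(literal_texts), "distinct statement texts with literals")
print(len(marker_texts), "distinct statement text with a parameter marker")
```

In DB2 terms, the first style would need a separate access plan for each statement text, while the second can reuse a single one.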

Statement Concentrator

What happens if you’re dealing with developers or a vendor who are unwilling or unable to make proper use of parameter markers? DB2 has a band-aid for that! The statement concentrator can be enabled for certain connections using the CLI configuration or for all connections using the database configuration parameter STMT_CONC. Setting STMT_CONC to LITERALS tells DB2 to treat all literal values as parameter markers. DB2 makes an exception for statements that already contain parameter markers. DB2 assumes that if you were smart enough to use a parameter marker at one place in a statement, you intended the other values to be static.
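To illustrate the idea only, the following Python sketch mimics what treating literals as parameter markers means for statement text. It is a toy regular expression, not how DB2 implements the statement concentrator, and it ignores many cases a real SQL parser would handle. It also mirrors the exception described above by leaving alone any statement that already contains a parameter marker.

```python
import re

# Toy pattern: quoted strings or standalone numbers (NOT a real SQL parser)
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def concentrate(stmt: str) -> str:
    """Replace literals with '?' unless the statement already has a marker."""
    if "?" in stmt:
        return stmt
    return _LITERAL.sub("?", stmt)


print(concentrate("select * from orders where id = 42 and status = 'P'"))
print(concentrate("select * from orders where id = ? and status = 'P'"))
```

After this kind of normalization, statements that differ only in their literal values collapse to the same text, which is what lets them share one access plan.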

Package Cache Statement Eviction Monitor

To help us deal with the transient nature of data in the package cache, IBM introduced the package cache statement eviction monitor. This event monitor will capture the data about a statement when it is evicted from the package cache so that data is not lost.

Creating the Event Monitor

Before creating a package cache statement eviction monitor, you must first have a buffer pool, a table space, and a system temporary table space with a 32k page size. The simplest syntax for that is:

create bufferpool buff32k pagesize 32K
create tablespace dba32k pagesize 32K bufferpool buff32k
create system temporary tablespace tempsys32k pagesize 32k bufferpool buff32k

You can create this event monitor to capture all evicted statements:

CREATE EVENT MONITOR MY_PKGCACHE_EVMON 
  FOR PACKAGE CACHE 
  WRITE TO UNFORMATTED EVENT TABLE (IN DBA32K)

Or, the event monitor can capture only statements that exceed defined thresholds:

CREATE EVENT MONITOR MY_PKGCACHE_EVMON 
  FOR PACKAGE CACHE 
   WHERE NUM_EXECUTIONS > 10
  WRITE TO UNFORMATTED EVENT TABLE (IN DBA32K)

The three thresholds available for this are:

  • NUM_EXECUTIONS
  • STMT_EXEC_TIME
  • UPDATED_SINCE_BOUNDARY_TIME

Using these thresholds can be useful to reduce the statement data collected, since we’re often talking tens of thousands of statements a day.

Querying the Data

One of the things I really like about this event monitor is that the output of this event monitor is formatted in the same way as the tables I query for real-time data like MON_GET_PKG_CACHE_STMT. This makes it easy to tweak an existing statement to reference the event monitor table instead.

Maintenance Needed

As with all event monitors, you need to make sure you are archiving and pruning output to prevent this from filling up a critical disk. I generally like to keep my monitoring data in a separate table space on a separate disk to keep a mistake in pruning data from having any possibility of causing a database outage.

Squirrel!

Side note: “The Whole Package Cache” refers to my second favorite DB2 podcast by Ian Bjorhovde (whose name I can finally spell without looking it up) and Fred Sobotka (whose name I still cannot spell without looking it up).

Db2 Basics: Levels of Configuration

There are a number of places where we can store and change configuration for a Db2 server. I wanted to walk through the main areas and a few details about them.

Db2 Registry

The Db2 registry actually has a number of levels within itself. It is accessed using the db2set command. This is not the Windows registry. By default a couple of parameters are set at the global level in the Db2 registry when an instance is created. If db2set is run as an instance owner without specifying the level at which a setting should apply, then it takes effect at the instance level. The levels of the Db2 registry are:

  • Operating system environment [e]
  • Node level registry [n]
  • Instance level registry [i]
  • Global level registry [g]
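As I understand it, Db2 resolves a variable by checking these levels in the order listed, with the first level that has a value winning. The Python sketch below models that lookup. The precedence order is my understanding rather than something stated here, and the sample values are invented.

```python
# ASSUMPTION: levels are checked in this order, first match wins
PRECEDENCE = ("e", "n", "i", "g")


def effective_value(name: str, registry: dict):
    """Return (level, value) for the first level that defines the variable."""
    for level in PRECEDENCE:
        value = registry.get(level, {}).get(name)
        if value is not None:
            return level, value
    return None


# Invented example: DB2COMM set globally and overridden at the instance level
registry = {
    "g": {"DB2COMM": "TCPIP", "DB2SYSTEM": "myhost"},
    "i": {"DB2COMM": "TCPIP,SSL"},
}

print(effective_value("DB2COMM", registry))
print(effective_value("DB2SYSTEM", registry))
```

This is why a value set at the global level can appear to be ignored: a setting for the same variable at a more specific level takes priority.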

Most of the parameters in the DB2 registry require a db2stop/db2start to take effect, but some may be able to take effect dynamically. To check whether a parameter can take effect without a recycle, use the -info <variable> option on the db2set command, like this:

> db2set -info DB2COMM
   Immediate change supported : NO
   Immediate by default :       NO

Some Db2 registry parameters, like DB2_WORKLOAD, are aggregate parameters that include setting a number of other registry parameters. If any Db2 registry parameters are set by an aggregate parameter, they will have the name of that parameter in square brackets, like this:

[i] DB2_EVALUNCOMMITTED=YES [DB2_WORKLOAD]

Most registry variables are documented in the Db2 Knowledge Center; however, IBM sometimes introduces new functionality with a hidden or undocumented registry variable. Unlike the other major configuration locations below, Db2 registry variables do not each have their own page in the Db2 Knowledge Center, and can be a bit frustrating to find at times.

Updating Values

The db2set command is used to update the values of Db2 registry parameters. The general syntax is:

db2set parameter=value

For example:

db2set DB2COMM=TCPIP

After changing a value, always verify that the change has taken effect.

Database Manager (DBM) Configuration

The database manager configuration consists of a large number of parameters that are set at the Db2 instance level.

Viewing Values

This configuration is often referred to as the “dbm cfg” because the shortest command for listing these variables and their values is:

db2 get dbm cfg

If you’re looking for just one parameter, it’s easiest to pipe the output to grep on Linux/UNIX or select-string at a PowerShell command line:

$ db2 get dbm cfg |grep INSTANCE_MEMORY
 Global instance memory (% or 4KB)     (INSTANCE_MEMORY) = AUTOMATIC(187170)
PS C:\Windows\system32> db2 get dbm cfg |select-string instance_memory

 Global instance memory (% or 4KB)     (INSTANCE_MEMORY) = AUTOMATIC(399591)

Some parameters, when changed, have the new value deferred until the next db2start. To see the deferred value, the show detail keywords are used on the get dbm cfg command. Using these keywords requires an instance attachment:

$ db2 attach to db2inst2

   Instance Attachment Information

 Instance server        = DB2/LINUXX8664 11.1.1.1
 Authorization ID       = DB2INST2
 Local instance alias   = DB2INST2

$ db2 get dbm cfg show detail |grep INSTANCE_MEMORY
 Global instance memory (% or 4KB)     (INSTANCE_MEMORY) = AUTOMATIC(187170)          AUTOMATIC(187170)

If the deferred value and the current value are the same, then no change will take place for the parameter on db2start.
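If you check this often, the comparison can be scripted. Below is a minimal Python sketch that parses one line of `show detail` output into the parameter name, current value, and deferred value. The regular expression is my own and assumes values contain no spaces, which will not hold for every parameter (paths, for example).

```python
import re

# Matches "(PARAM_NAME) = current [deferred]"; assumes values have no spaces
_CFG_LINE = re.compile(r"\((\w+)\)\s*=\s*(\S+)(?:\s+(\S+))?\s*$")


def parse_cfg_line(line: str):
    """Parse one 'show detail' line; returns None if it does not match."""
    match = _CFG_LINE.search(line)
    if match is None:
        return None
    name, current, deferred = match.groups()
    return {
        "name": name,
        "current": current,
        "deferred": deferred,
        "pending": deferred is not None and deferred != current,
    }


line = (" Global instance memory (% or 4KB)     (INSTANCE_MEMORY) = "
        "AUTOMATIC(187170)          AUTOMATIC(187170)")
print(parse_cfg_line(line))
```

A `pending` value of True would mean the deferred value differs from the current one, so the change is still waiting on a db2start.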

In addition to using the get dbm cfg command, the values for the database manager configuration parameters can be queried using the SYSIBMADM.DBMCFG administrative view like this:

select substr(value,1,18) as value
    , value_flags
    , substr(deferred_value,1,18) as deferred_value
    , deferred_value_flags 
from SYSIBMADM.DBMCFG 
where name='instance_memory'

VALUE              VALUE_FLAGS DEFERRED_VALUE     DEFERRED_VALUE_FLAGS
------------------ ----------- ------------------ --------------------
187170             AUTOMATIC   187170             AUTOMATIC           

  1 record(s) selected.

Querying the administrative view requires a database connection, while the get dbm cfg command doesn’t even require the Db2 instance be started.

Researching Parameters and Propagation Boundaries

Many of the parameters in the database manager configuration require a db2stop/db2start to take effect. Each parameter has an individual page in the IBM Db2 Knowledge Center. A list of all the parameters links to each individual page. While the summary list includes whether each parameter can be changed online, the individual page for the parameter also contains this information. If the parameter can be changed online, then the “Parameter type” section will say “Configurable Online”, and the “Propagation Class” section will specify when the change will take place. This is an example:
[Figure: Config_online_DBM]

There is a wealth of other information on the parameter page for each parameter.

Updating Values

The db2 update dbm cfg command is used to update the values of dbm cfg parameters. The general syntax is:

db2 update dbm cfg using parameter value

For example:

db2 update dbm cfg using SYSMON_GROUP mon_grp

After changing a value, always verify that the change has taken effect using the show detail syntax detailed above. Sometimes if a parameter is deferred, you can attach to the instance and re-issue the update command to make the setting immediate. You may also want to use the immediate keyword:

db2 update dbm cfg using SYSMON_GROUP mon_grp immediate

Multiple changes can be done in a single command with syntax like this:

db2 update dbm cfg using parameter1 value1 parameter2 value2

For example:

db2 update dbm cfg using SYSMON_GROUP mon_grp mon_dft_buffpool ON

Database (DB) Configuration

There is a separate database configuration for each database, even if the databases are in the same instance. There are a lot of similarities between the db config and the dbm config. In fact, over the years, some things that used to be set at the database manager level seem to have moved to the database level. For example, the old snapshot monitors used the DFT_MON parameters in the dbm configuration. The newer mon_get* interfaces use the MON_* variables in the db configuration.

Viewing Values

The command for viewing the database configuration is similar to the one for the dbm cfg, though if there is no database connection, the database name must be specified:

db2 get db cfg for sample

Again, if you’re looking for just one parameter, it’s easiest to pipe the output to grep on Linux/UNIX or select-string at a PowerShell command line:

$ db2 get db cfg for sample |grep DATABASE_MEMORY
 Size of database shared memory (4KB)  (DATABASE_MEMORY) = AUTOMATIC(63936)
PS C:\Windows\system32> db2 get db cfg for sample |select-string database_memory

 Size of database shared memory (4KB)  (DATABASE_MEMORY) = AUTOMATIC(62336)

Some parameters, when changed, have the new value deferred until the next database deactivation and activation. To see the deferred value, the show detail keywords are used on the get db cfg command. Using these keywords requires a database connection:

$ db2 connect to sample

   Database Connection Information

 Database server        = DB2/LINUXX8664 11.1.1.1
 SQL authorization ID   = DB2INST2
 Local database alias   = SAMPLE

$ db2 get db cfg for sample show detail |grep DATABASE_MEMORY
 Size of database shared memory (4KB)  (DATABASE_MEMORY) = AUTOMATIC(63936)           AUTOMATIC(63936) 

If the deferred value and the current value are the same, then no change will take place for the parameter on database deactivation/activation.

In addition to using the get db cfg command, the values for database configuration parameters can be queried using the SYSIBMADM.DBCFG administrative view like this:

select substr(value,1,18) as value
    , value_flags
    , substr(deferred_value,1,18) as deferred_value
    , deferred_value_flags 
from SYSIBMADM.DBCFG 
where name='database_memory'

VALUE              VALUE_FLAGS DEFERRED_VALUE     DEFERRED_VALUE_FLAGS
------------------ ----------- ------------------ --------------------
63936              AUTOMATIC   63936              AUTOMATIC           

  1 record(s) selected.

Querying the administrative view requires a database connection, while the get db cfg command doesn’t.
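
One thing the administrative view makes easy is listing every parameter with a change waiting for the next activation. A minimal sketch, assuming an existing connection; list_pending_dbcfg is a made-up helper name, and the coalesce calls are only there as a precaution in case a value comes back as NULL.

```shell
# List every db cfg parameter whose deferred value differs from the current value.
# list_pending_dbcfg is a hypothetical helper name; a connection must already exist.
list_pending_dbcfg() {
    db2 -x "select substr(name,1,24) as name
                 , substr(value,1,18) as value
                 , substr(deferred_value,1,18) as deferred_value
            from SYSIBMADM.DBCFG
            where coalesce(value,'') <> coalesce(deferred_value,'')
               or coalesce(value_flags,'') <> coalesce(deferred_value_flags,'')
            with ur"
}
```

If nothing is pending, the query should return no rows.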

Researching Parameters and Propagation Boundaries

Some of the parameters in the database configuration require the database be deactivated and then activated (while there are no connections) to take effect. Each parameter has an individual page in the IBM Db2 Knowledge Center. A list of all the parameters links to each individual page. While the summary list includes whether each parameter can be changed online, the individual page for the parameter also contains this information. If the parameter can be changed online, then the “Parameter type” section will say “Configurable Online”, and the “Propagation Class” section will specify when the change will take place. This is an example:
[Screenshot: Config_online_DBM]

There is a wealth of other information on the parameter page for each parameter.

Updating Values

The db2 update db cfg command is used to update the values of db cfg parameters. The general syntax is:

db2 update db cfg for dbname using parameter value

For example:

db2 update db cfg for sample using LOCKTIMEOUT 90

After changing a value, always verify that the change has taken effect using the show detail syntax detailed above. Sometimes if a parameter is deferred, you can connect to the database and re-issue the update command to make the setting immediate. You may also want to use the immediate keyword:

db2 update db cfg for sample using LOCKTIMEOUT 90 immediate

Multiple changes can be done in a single command with syntax like this:

db2 update db cfg for dbname using parameter1 value1 parameter2 value2

For example:

db2 update db cfg for sample using LOCKTIMEOUT 90 NEWLOGPATH /db2logs

Note that some parameters may put the database into a backup pending state, so be sure to research parameters before changing them.

Other Configurations

There is a wealth of other locations where things can be configured. Some parameters can be set in the userprofile (which is sourced by the db2profile). Others can be set in the CLI configuration file for certain kinds of applications. Some are set for the current session only, and the methodology for setting them depends heavily on the type of application being used. If the same parameter can be set in multiple places, the local or session values usually override the ones set at the server level. An example of this is LOCKTIMEOUT. There is always some value set for this at the database level in the db cfg. However, it can also be configured at the session/connection level using the SET CURRENT LOCK TIMEOUT statement. It can also be set using an ODBC keyword. Things like binding packages can also have a profound impact on how a database behaves.


Excluding a Table from Db2’s Automatic Runstats

The Problem

The application in this case – SPSS – maintains an exclusive lock at all times on one table. I think this is poor application design, but IBM was not willing to change it when we brought it to their attention in a PMR. This leads to dozens of lock timeouts a day when automatic statistics evaluation tries to get a lock on the table to evaluate whether the table needs runstats or not.

This problem didn’t show up quite the way I would expect it to. Using my favorite locking event monitor, the lock timeouts show up as being against SYSIBM.SYSTABLES and NULL:

select substr(lp.table_schema,1,18) as table_schema
    , substr(lp.table_name,1,30) as table_name
    , substr(le.event_type,1,18) as lock_event
    , count(*)/2 as count
from DBA.LOCK_PARTICIPANTS lp, DBA.LOCK_EVENT le
where lp.xmlid=le.xmlid
group by lp.table_schema, lp.table_name, le.event_type
order by lp.table_schema, lp.table_name, le.event_type
with ur;

TABLE_SCHEMA       TABLE_NAME                     LOCK_EVENT         COUNT
------------------ ------------------------------ ------------------ -----------
SYSIBM             SYSTABLES                      LOCKTIMEOUT              26765
-                  -                              LOCKTIMEOUT              26765

  2 record(s) selected.

This confused me at first, and with help from IBM Support, we identified that the lock timeouts are related to the SIBOWNER table. In the db2 diagnostic path, there is a subdirectory called events. This directory contains records related to statistics collection – both manual and automatic. Within the files in this directory, we see messages like this:

2017-08-24-23.11.08.695965-420 I886320E1035          LEVEL: Severe
PID     : 30986                TID : 140729082963712 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000            DB   : SPSSDB
APPHDL  : 0-20515              APPID: *LOCAL.db2inst1.170825060338
AUTHID  : DB2INST1             HOSTNAME: host1.example.com
EDUID   : 24993                EDUNAME: db2agent (SPSSDB) 0
FUNCTION: DB2 UDB, relation data serv, sqlrLocalRunstats, probe:12276
MESSAGE : ZRC=0x80100044=-2146435004=SQLP_LTIMEOUT
          "LockTimeOut - tran rollback Reason code 68"
DATA #1 : unsigned integer, 8 bytes
11528
DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
 sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -911   sqlerrml: 2
 sqlerrmc: 68
 sqlerrp : SQLRC02B
 sqlerrd : (1) 0x80100044      (2) 0x00000044      (3) 0x00000000
           (4) 0x00000000      (5) 0x00000000      (6) 0x00000000
 sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
           (7)      (8)      (9)      (10)        (11)
 sqlstate:

2017-08-24-23.11.08.696601-420 E887356E784           LEVEL: Event
PID     : 30986                TID : 140729082963712 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000            DB   : SPSSDB
APPHDL  : 0-20515              APPID: *LOCAL.db2inst1.170825060338
AUTHID  : DB2INST1             HOSTNAME: host1.example.com
EDUID   : 24993                EDUNAME: db2agent (SPSSDB) 0
FUNCTION: DB2 UDB, relation data serv, sqlrLocalRunstats, probe:220
COLLECT : TABLE AND INDEX STATS : Object name with schema : AT "2017-08-24-23.11.08.696578" : BY "User" : DUE TO "Error" : failure
OBJECT  : Object name with schema, 17 bytes
IBMWSSIB.SIBOWNER
IMPACT  : None
DATA #1 : String, 30 bytes
RUNSTATS command not available
DATA #2 : String, 26 bytes
ZRC=0x80100044=-2146435004

In this specific case, the table in question very rarely changes anyway, so we’re not terribly worried about actually catching runstats on it. We’re more worried about the needless overhead of the constant lock timeouts and associated logging. To deal with this, what we need to do is to remove the table from consideration by our automatic runstats.
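
To see which objects are failing and how often, the statistics event files described above can be summarized with a bit of awk. This is a sketch based only on the message layout shown in the excerpt (a COLLECT line ending in "failure", followed by an OBJECT line and then the object name). The helper name runstats_failures is made up, and the files to scan are passed in as arguments.

```shell
# Count failed statistics collections per object in the statistics event files.
# runstats_failures is a hypothetical helper name; pass the event files as arguments.
runstats_failures() {
    awk '
        /^COLLECT/ { failed = ($NF == "failure") }   # remember the outcome of this event
        /^OBJECT/ && failed {
            getline obj                               # the object name is on the next line
            count[obj]++
            failed = 0
        }
        END { for (o in count) print count[o], o }
    ' "$@" | sort -rn
}
```

It would be run against the files in the events subdirectory, for example runstats_failures /path/to/diagpath/events/* where the path is a placeholder for the actual diagnostic path.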

Generating the XML Policy File

The first step is to get the current XML policy file so that we can then update the policy. This is relatively easy to do with the AUTOMAINT_GET_POLICYFILE stored procedure:

$ db2 "call AUTOMAINT_GET_POLICYFILE ('AUTO_RUNSTATS', 'spssdb_auto_runstats1.xml')"

  Return Status = 0

When you use this stored procedure, the policy file is placed in the tmp sub-directory of sqllib. For a vanilla DB2 installation, it is likely to look something like this:


<?xml version="1.0" encoding="UTF-8"?>

<DB2AutoRunstatsPolicy xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
 <RunstatsTableScope>
  <FilterCondition></FilterCondition>
 </RunstatsTableScope>
</DB2AutoRunstatsPolicy>

Updating the XML

The place to update the file is within the FilterCondition element. Here we want to place what we would put in a where clause to eliminate this table. In this case, I’m updating it to look like this:


<?xml version="1.0" encoding="UTF-8"?>

<DB2AutoRunstatsPolicy xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
 <RunstatsTableScope>
  <FilterCondition>
        NOT (TABSCHEMA = 'IBMWSSIB' AND TABNAME = 'SIBOWNER')
  </FilterCondition>
 </RunstatsTableScope>
</DB2AutoRunstatsPolicy>
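
Since the filter condition reads like a where clause, one way to sanity-check it before setting the policy is to run it against SYSCAT.TABLES and look at what falls outside it. This is a sketch under the assumption that the condition only references SYSCAT.TABLES columns; excluded_by_filter is a made-up helper name and a connection must already exist.

```shell
# Show which tables a candidate filter condition would leave OUT of automatic runstats.
# excluded_by_filter is a hypothetical helper name; a connection must already exist.
excluded_by_filter() {
    condition=$1
    db2 -x "select rtrim(tabschema) || '.' || rtrim(tabname)
            from syscat.tables
            where type = 'T'
              and not ($condition)
            order by 1
            with ur"
}
```

For example, excluded_by_filter "NOT (TABSCHEMA = 'IBMWSSIB' AND TABNAME = 'SIBOWNER')" would be expected to list only the tables meant to be skipped. A longer list suggests the condition is broader than intended.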

Setting the XML Policy File

After I have the file defined the way I want it, I use the AUTOMAINT_SET_POLICYFILE procedure to set this file as the new policy, like so:

$ db2 "call AUTOMAINT_SET_POLICYFILE ('AUTO_RUNSTATS', 'spssdb_auto_runstats1.xml')"

  Return Status = 0

Resolution

After making this change, I went from about 12 lock timeouts per hour on this database to about 1 per day.

Comparing Two Db2 Systems

Sometimes configuration needs to be kept in sync between two or more Db2 systems. There are a variety of reasons – sometimes this is for keeping two HADR servers in sync, and other times it may be for keeping a dev, QA, or Staging system in sync with production. In any case, having an idea of what needs to be in sync and what doesn’t can be complicated. The focus here is at the system level. This post does not dig into comparison of logical objects in the database such as tables and indexes.

Issues Specific to HADR

HADR synchronizes any logged data. Explicit changes to all kinds of structures, including table spaces, are therefore replicated automatically. Changes made to the database configuration and any levels of configuration at the instance level are NOT automatically done on the standbys. Additionally, changes made to buffer pools by STMM are not replicated to the standby. This makes sense when you think about the nature of HADR and some of the non-standard configurations that are sometimes used for HADR. I often think of HADR as two identical servers, but HADR is occasionally used for replication on the same server, or sometimes one HADR standby serves as standby for a large number of Db2 databases on different servers. To accommodate configurations like this, it is necessary to limit what is copied.

Buffer Pools

It therefore rests on the DBA to ensure that the database configuration, database manager configuration, Db2 registry, file systems, and buffer pools remain in sync. The easiest way to do most of this is simply to incorporate changes to any of these areas across all servers in an HADR cluster in your everyday thinking and processes. The exception to that may be the buffer pools. Often with STMM, Db2 is constantly tuning the buffer pools for optimal performance. DBAs often don’t even know about these changes. It is usually not worth removing buffer pools from STMM just for the sake of consistency on failover. Instead, the easy way to handle this is to regularly issue ALTER BUFFERPOOL statements on the primary to push the buffer pool sizes on the standby to match the current sizes on the primary. The ALTER BUFFERPOOL statement is logged and is therefore replicated to all standbys. You can maintain the automatic setting, even while specifying a value with syntax like this:

db2 alter bufferpool BUFF32K size 10000 automatic

Specifying both a size and the AUTOMATIC keyword essentially tells Db2 to go ahead and tune the buffer pool, but to start with the value specified. Assuming that the memory isn’t needed for other memory areas and DB_MEM_THRESH is set to the default of 100, Db2 is unlikely to decrease the buffer pool size.

This explicit buffer pool sizing is easy to script.
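
A rough sketch of what such a script could look like follows. The helper names are made up, the size lookup assumes the BP_CUR_BUFFSZ column of MON_GET_BUFFERPOOL is available at your Db2 level, and it assumes every listed buffer pool is meant to stay AUTOMATIC. Buffer pools that are deliberately fixed in size would need to be filtered out first.

```shell
# Turn "name size" lines into ALTER BUFFERPOOL statements that keep AUTOMATIC.
# bp_alter_statements and current_bp_sizes are hypothetical helper names.
bp_alter_statements() {
    awk 'NF == 2 { printf "alter bufferpool %s size %d automatic;\n", $1, $2 }'
}

# Read the current sizes (in pages) on the primary; a connection must already exist.
# The hidden IBMSYSTEMBP* buffer pools are skipped because they cannot be altered.
current_bp_sizes() {
    db2 -x "select rtrim(bp_name), bp_cur_buffsz
            from table(mon_get_bufferpool(null, -1))
            where bp_name not like 'IBMSYSTEMBP%'
            with ur"
}
```

On the primary, current_bp_sizes | bp_alter_statements > sync_bp.sql followed by db2 -tvf sync_bp.sql would apply the statements, and HADR would then carry them to the standbys. Reviewing the generated file before running it is a good idea.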

Database and Database Manager Configuration

Changes to the database configuration and other configuration areas are not covered by HADR. This includes changes made by automatic tuning. If our standby system is identically sized, and we want optimal performance on failover, we need to manually keep changes in these areas in sync.

Comparing HADR and Non-HADR Environments

Another thing to keep in mind with HADR is that there are some parameters that are required for HADR databases and not for non-HADR databases. Often HADR is in production. It is occasionally in one non-production environment, but rarely in all non-production environments. The most obvious examples of this are:

  • BLOCKNONLOGGED – should always be set in HADR environments, rarely in non-HADR environments
  • LOGINDEXBUILD – should always be set in HADR environments, rarely in non-HADR environments
  • INDEXREC – should be set to RESTART, but can be set to other values in non-HADR environments
  • HADR_SPOOL_LIMIT – may be set on HADR servers, but isn’t usually in non-HADR environments.

There are a few other parameters in the Db2 registry, such as DB2_HADR_ROS, DB2_HADR_SOSNDBUF, and DB2_HADR_SORCVBUF that may be set for HADR environments and not for non-HADR environments.

When comparing HADR and non-HADR environments, we may want to note the differences, but there is no need to sync up the parameter settings.
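
To note those differences quickly, the db cfg parameters listed above can be pulled out for each database in one pass. A small sketch; hadr_params is a made-up helper name.

```shell
# Print the HADR-sensitive db cfg parameters named above for one database.
# hadr_params is a hypothetical helper name.
hadr_params() {
    db2 get db cfg for "$1" |
        grep -E '\((BLOCKNONLOGGED|LOGINDEXBUILD|INDEXREC|HADR_SPOOL_LIMIT)\)'
}
```

Running hadr_params sample in each environment gives a short list to compare. For the registry side, db2set -all piped to grep HADR does the same job.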

Issues Specific to Non-Production Comparisons to Production

Sizing is often a major difference between production and non-production environments. Often a production server is the beefiest server anyone can possibly justify, while non-production servers may just be whatever spare hardware is lying around. Non-production, therefore, often has less memory, fewer CPUs, and less disk. This obviously necessitates differences in areas and parameters that might normally be part of a system comparison. This affects the filesystem and tablespace details if the disk size is different. If memory and CPUs are different, then there are a number of parameters in the database manager and database configuration that SHOULD be different, along with buffer pool sizing. Part of these differences may be less noticeable if STMM is in use, as many of the parameters in question may be automatically tuned.

General Areas to Compare

Db2 Registry

The Db2 registry variables are set at the instance level. The db2set command is used to set or view them. Parameters in this area can have a profound or a minor impact on how Db2 functions. Ensuring they match can be critical not just to performance, but to base functionality. I’ve seen an HADR standby that did not have DB2COMM set, and therefore on failover, nothing would be able to connect to the database on the standby. Parameters here have a vast array of purposes, from controlling how locking works to tweaking optimization or even trying new features that are not considered ready for production.
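
One simple way to compare the registry is to save the db2set -all output on each server and diff the two listings. A minimal sketch; compare_listings is a made-up helper name and the file names are placeholders.

```shell
# Compare two saved "db2set -all" listings, ignoring line order.
# compare_listings is a hypothetical helper name.
compare_listings() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || return 2
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # Lines starting with "<" exist only in the first file, ">" only in the second.
    diff "$a_sorted" "$b_sorted"
    rc=$?
    rm -f "$a_sorted" "$b_sorted"
    return $rc
}
```

On each server, db2set -all > registry.$(hostname).txt captures the settings. After copying the files to one place, compare_listings registry.host1.txt registry.host2.txt shows what differs. Some differences, such as DB2SYSTEM, are expected.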

DBM CFG

The database manager configuration also holds a vast array of different kinds of parameters. Some may be required to be in line for things to function properly on a non-production system or on an HADR standby. A few may need to be sized differently for a smaller system as compared to a same-size system.

DB CFG

This is the configuration area where sizing differences may make for the most significant differences in parameter values. A careful side-by-side comparison is critical. It is possible to use the -printdbcfg option of db2look to mimic settings exactly.

$ db2look -d sample -printdbcfg -o sample.dbcfg
-- No userid was specified, db2look tries to use Environment variable USER
-- USER is: ECROOKS
-- Output is sent to file: sample.dbcfg

The output from this command gives all statements needed to mimic the db cfg, including settings that may be obvious and rarely changed.
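
For the side-by-side comparison itself, it can help to boil the get db cfg output down to one PARAMETER=value line per parameter so that a plain diff works. This is a sketch that relies on the "(NAME) = value" layout shown earlier; cfg_pairs is a made-up helper name.

```shell
# Reduce "get db cfg" output to sorted PARAMETER=value lines so two systems can be diffed.
# cfg_pairs is a hypothetical helper name; it reads the command output on stdin.
cfg_pairs() {
    sed -n -e 's/[[:space:]]*$//' \
           -e 's/.*(\([A-Z0-9_]*\)) =[[:space:]]*\(.*\)$/\1=\2/p' | sort
}
```

Running db2 get db cfg for sample | cfg_pairs > prod.cfg on each system and then diffing the files gives a compact list of differences. Parameters set to AUTOMATIC will often show different current values in parentheses, which is normal.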

Db2 Licensing/Edition

This is a bit of an overlooked area. Even development environments may need to be licensed depending on the licensing model used. More importantly, a few distributions of Db2 don’t have the same features as others. For example, Express-C doesn’t allow online reorgs. If you have a higher edition in non-production, but then are using Express-C in production, you may be surprised when you try to do something that worked perfectly well in lower environments. While I love the new Developer-C edition of Db2, it makes issues like this even more possible.

File systems

This is of utmost concern when comparing HADR standbys or when comparing a production environment to a non-production environment where a full backup will be restored. In those situations, something will fail if the file system sizes, ownership, and permissions are not kept exactly in sync. Even if you’re comparing a differently sized non-production environment to a corresponding production environment, verifying that file systems with the same name, ownership, and permissions exist is a good idea. It is possible and sometimes necessary to have a non-production environment with a simplified file system structure, but it is nice to keep them identical whenever possible.

Table Spaces

It should not be possible for the table space design to get out of sync in an HADR environment, since changes should be replicated for you. In other environments, it is possible for a tablespace to be added or even dropped in one environment, but not in the other, or for the storage paths or container paths to be added or changed in some way. This is especially true if a tablespace like SYSTOOLS is automatically added by Db2, or if a 32K temp tablespace does not exist, and one is added in response to a specific error.

Buffer Pools

Outside of the issues previously mentioned for HADR environments, buffer pools are one of the areas least likely to be a problem, particularly if they are all a part of STMM. Buffer pools are likely to be of different sizes in production and non-production environments, and as long as key performance indicators like the buffer pool hit ratio are in reasonable ranges, that should be just fine.

CLI Packages

This is one I’ve seen forgotten several times. Sometimes in response to an error, a DBA will bind additional CLI packages. Later, it may come to light that the same number of CLI packages are not available in another environment. By default, 3 are bound, but a number can be specified up to 30. The way to check this is to look at the system packages using a query like this:

select 
    substr(pkgname,1,18) as pkg_name
    , valid 
from syscat.packages 
where pkgname like 'SYS%' 
    and pkgschema='NULLID' 
order by pkg_name   

PKG_NAME           VALID
------------------ -----
SYSLH100           Y
SYSLH101           Y
SYSLH102           Y
SYSLH103           Y
SYSLH104           Y
SYSLH105           Y
SYSLH106           Y
SYSLH107           Y
SYSLH108           Y
SYSLH109           Y
SYSLH200           Y
SYSLH201           Y
SYSLH202           Y
SYSLH203           Y
SYSLH204           Y
SYSLH205           Y
SYSLH206           Y
SYSLH207           Y
SYSLH208           Y
SYSLH209           Y
...

In this example, 10 was specified when the CLI packages were bound. That number shows up in multiple ranges; displayed here are the SYSLH1 and SYSLH2 ranges.
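
Counting the packages in each range makes the comparison between environments quicker than eyeballing the list. A small sketch that groups on the first six characters of the package name; cli_pkg_counts is a made-up helper name.

```shell
# Count bound CLI packages per range, using the first six characters (e.g. SYSLH1).
# cli_pkg_counts is a hypothetical helper name; it reads one package name per line.
cli_pkg_counts() {
    awk 'NF { n[substr($1, 1, 6)]++ } END { for (r in n) print r, n[r] }' | sort
}
```

It can be fed from the catalog with db2 -x "select pkgname from syscat.packages where pkgname like 'SYS%' and pkgschema = 'NULLID'" | cli_pkg_counts, run in each environment, and the two results compared.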

Summary

Keeping two or more environments in sync as far as configuration is concerned is a task that pulls strongly on one of the DBA’s biggest strengths – attention to detail. There are hundreds of parameters and details to compare, even before we get down to the actual logical object level. Even if you are really on top of this task, periodic reviews of environments should be done.


