Friday, December 6, 2013

OS Statistics from AWR Reports

There are occassions when it is important to review the "Operating System Statistics" section of the AWR Report. Examples can be :
1. The Database under review is not the only database on the server and one or more other database(s) and/or other application(s) are contending for CPU (and I/O !) resources. If you are reviewing an AWR report for a set of jobs which are CPU intensive (eg high "consistent gets", high "latch waits" and "latch spins" etc), then it is also important to know if adequate CPU was available for the database. You may or may not have been informed of the other database(s) and/or application(s) running on the same server before you begin reviewing AWR reports.
2. The application tier for the database under consideration is on the same server. When you are investigating high "SQL*Net message from client" waits for batch jobs where you know that neither network latency nor user think time would be the cause of these waits, you need to know if CPU resources are being consumed by the application as well.

Pre-10g, there was no method to obtain OS performance statistics from within Oracle. StatsPack would show elapsed time and Oracle time but not any information about CPU usage outside of 'CPU used by this session' and 'CPU used when call started' -- ie CPU usage reported against the same database's sessions.

10g AWR Reports do provide Operating System statistics.

Here are some examples.

Example 1 :
For these 11.8 minutes :

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       211 24-Feb-08 22:45:15        21       3.1
  End Snap:       212 24-Feb-08 22:57:03        18       1.9
   Elapsed:               11.80 (mins)
   DB Time:                0.31 (mins)

we see :


Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
RMAN backup & recovery I/O            1,681          82     49  437.9 System I/O
control file sequential read          3,800          20      5  109.4 System I/O
control file parallel write             526           5      9   24.2 System I/O
CPU time                                              3          16.2
db file sequential read                 236           2      8    9.6   User I/O

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                                 15.5         82.7
inbound PL/SQL rpc elapsed time                           3.5         18.6
PL/SQL execution elapsed time                             3.3         17.5
DB CPU                                                    3.0         16.2
RMAN cpu time (backup/restore)                            2.0         10.8
parse time elapsed                                        0.5          2.9
hard parse elapsed time                                   0.5          2.6

So, Oracle CPU time was only about 3 seconds. Was the server also doing something else ?
These are the Operating System statistics from the AWR Report :


Statistic                                       Total
-------------------------------- --------------------
AVG_BUSY_TIME                                  43,445
AVG_IDLE_TIME                                  27,531
AVG_SYS_TIME                                    4,977
AVG_USER_TIME                                  38,468
BUSY_TIME                                      43,445
IDLE_TIME                                      27,531
SYS_TIME                                        4,977
USER_TIME                                      38,468
...
NUM_CPUS                                            1

These OS CPU statistics account for 709.76seconds(BUSY+IDLE time).
If Oracle accounted for 3seconds of CPU time, then it seems as if some other processes accounted for 381.68 seconds(384.68-3).

Also, the Operating System statistics show that the CPU utilisation [including %sys] was 61.21% (434.45 of 709.76 seconds).

Example 2:
For these 3 hours :


              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       213 25-Feb-08 00:01:00        18       1.9
  End Snap:       216 25-Feb-08 03:00:40        18       1.9
   Elapsed:              179.67 (mins)
   DB Time:                0.09 (mins)

we see :


Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
control file sequential read          6,096           7      1  124.0 System I/O
control file parallel write           3,593           4      1   77.7 System I/O
CPU time                                              3          50.4
db file parallel write                1,790           1      1   25.7 System I/O
db file sequential read                  49           1     13   12.0   User I/O

Oracle CPU time was 3seconds.
But was the server CPU really idle ?


Statistic                                       Total
-------------------------------- --------------------
AVG_BUSY_TIME                                 721,220
AVG_IDLE_TIME                                 356,777
AVG_SYS_TIME                                    5,531
AVG_USER_TIME                                 715,689
BUSY_TIME                                     721,220
IDLE_TIME                                     356,777
SYS_TIME                                        5,531
USER_TIME                                     715,689
NUM_CPUS                                            1

These statistics show us that CPU %utilisation [including %sys] 
on the server was 66.90% (7212.22seconds out of 10,779.97seconds).
The server was, actually, quite busy -- busy doing something else. Those CPU cycles were consumed by some other application, not being Oracle.

No comments:

Post a Comment