Friday, December 6, 2013

CPU in AWR Report tips

Question:  My AWR report is showing CPU as the top entry in the top-5 timed events.

I'm concerned because it is consuming 36% of my total DB Time.
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                            6,452          36.1
log file sync                       391,362       5,015     13   28.0 Commit
SQL*Net message from dblink       2,925,160       1,891      1   10.6 Network
db file scattered read            1,023,422         305      0    1.7 User I/O
read by other session               499,400         213      0    1.2 User I/O


How can I tell from the AWR report if I am experiencing a CPU bottleneck?


Answer:  Determining CPU in an AWR report is challenging because you have to look in several areas within the AWR report to get the CPU utilization information.  Just because CPU is the #1 timed foreground event, does not mean that the CPU is pegged at 100%, and it does not necessarily mean that you have a "real:" CPU bottleneck, a case where tasks wait in the runqueue for execution.



Note: Before we begin discussing CPU utilization metrics within an AWR report, it is important to understand that oracle statistics are incomplete because of incomplete instrumentation on the Oracle database.  However, you can get accurate CPU metrics viavmstat, as well as top and glance.

Using the example AWR report that you have provided, we can determine the total amount of available CPU and see how much total CPU processing time was available.  The first thing we need to note is the total elapsed time for the AWR report, and this is at the very beginning of the AWR REPORT":


WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
FINONE        1252548811 mydb              1 13-Dec-12 22:10 11.2.0.3.0  NO

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
MYPROD           AIX-Based Systems (64-bit)         24    12              80.00

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     25130 15-Dec-12 03:00:39       425      12.2
  End Snap:     25131 15-Dec-12 04:00:45       430      14.7
   Elapsed:               60.09 (mins)
   DB Time:              298.22 (mins)


Note above the relationship between the "Elapsed" and "DB Time" columns.  The report period was only one hour (60 minutes), yet we see 298 for DB Time.  This suggests that the (298/60) that there were only five session active at any given time.
Now that we know the total time for the report, if we look farther down into the "OPERATING SYSTEM STATISTICS" section on the AWR report, there are three metrics that are important for determining CPU utilization:
  • BUSY_TIME- The total amount of time that the CPU was busy for the elapsed time period.  This will exceed the wall-=clock time because we usually have multiple CPU's.

  • IDLE_TIME - The amount of time that the database was idle.

  • NUM_CPUS - The number of processors available to the Oracle database


    Here is an example AWR report section showing the total CPU usage:
Operating System Statistics        
-> *TIME statistic values are diffed.
   All others display actual values.  End Value is displayed if different
-> ordered by statistic type (CPU Use, Virtual Memory, Hardware Config), Name

Statistic                                  Value        End Value
------------------------- ---------------------- ----------------
. . .
BUSY_TIME                                811,048
IDLE_TIME                              7,854,100
IOWAIT_TIME                              613,785
SYS_TIME                                 153,228
USER_TIME                                657,820
LOAD                                           1                2
OS_CPU_WAIT_TIME                         871,600
RSRC_MGR_CPU_WAIT_TIME                         0
VM_IN_BYTES                       26,080,108,536
VM_OUT_BYTES                      14,015,483,904
PHYSICAL_MEMORY_BYTES             85,899,280,384
NUM_CPUS                                      24
 . . . 


Now that we have the salient figures, we can calculate the amount of CPU.  The equation for total available CPU is as follows:

      Available CPU = NUM_CPU's * elapsed_time_bet_snapshots * 60(secs) 

                                =  24 * 60 * 60 = 86,400 seconds

Now we can compare that to the amount of used CPU and see that this database was not CPU-bound.  The used DB CPU (6,452 seconds) was only a small fraction of the total available CPU (86,400 seconds).

No comments:

Post a Comment