
Lund Performance Solutions


SOS Global Summary

The Global Summary Screen

The SOS Global Summary screen provides a summary of system-wide activity:
  • Product version and collection interval information
  • Key indicators of performance data
  • Global statistics
  • CPU utilization statistics
  • CPU miscellaneous statistics
  • Memory and virtual memory statistics
  • Miscellaneous statistics
  • Disk statistics
  • Process statistics
  • Workload statistics
  • System performance advice
    The Global Summary screen is the first screen displayed when you start SOS and is the usual starting point for any review of system activity and performance. The screen can be displayed in either graphical or tabular format.
    To access the Global Summary screen from any SOS display screen:
  • Type s at the SOS Enter command: prompt to view the Screen Selection Menu screen.
  • From the Screen Selection Menu screen, enter g (Global Summary). The Global Summary screen will display.
  • Type t from the Global Summary screen to toggle between the graphical and tabular formats.
    Graphical Format

    Figure 9.1 shows an example of the Global Summary screen in graphical format.




    Figure 9.1 SOS Global Summary screen (graphical format)
    The graphical Global Summary screen can show the following information:
  • The SOS banner
  • The Key Indicators of Performance (KIP) line (optional)
  • GLOBAL statistics
  • PROCESS SUMMARY (optional)
  • WORKLOAD SUMMARY (optional)
  • SYSTEM PERFORMANCE ADVICE messages (optional)
    Each of these components is described in "Global Summary Screen Display Items".

    Tabular Format

    To toggle between the graphical and tabular format options, press the t key from the Global Summary screen. Figure 9.2 shows an example of the Global Summary screen in tabular format.




    Figure 9.2 SOS Global Summary screen (tabular format)
    The tabular Global Summary screen can show the following information:
  • The SOS banner
  • CPU UTILIZATION statistics (including cumulative statistics)
  • CPU MISC statistics
  • MEM/VM statistics (optional)
  • MISC global statistics (optional)
  • DISK statistics (optional)
  • PROCESS SUMMARY (optional)
  • SYSTEM PERFORMANCE ADVICE messages (optional)
    Each of these components is described in detail in "Global Summary Screen Display Items".

    Global Summary Screen Display Items

    SOS Banner

    The SOS banner is always displayed at the top of all SOS data display screens.




    Figure 9.3 SOS Global Summary screen: SOS banner
    The banner contains information about the SOS program, the host system, the elapsed interval, and the current interval.
    Product Version Number (SOS V.nnx)
    The first item displayed in the SOS banner (reading left to right) is the product version number (SOS V.nnx). The version number denotes the following about the product:
  • SOS is the name of the product.
  • V denotes the major version level.
  • nn denotes the minor version level.
  • x denotes the fix level.
    The SOS version number displayed in the example (refer to Figure 9.3) is B.01y. When contacting technical support, please provide the product version number of the software installed on your system.
    System Name
    The second item displayed in the SOS banner is the system name assigned when the operating system was installed. The system name in the example shown in Figure 9.3 is eagle.
    Current Date and Time (DDD, DD MMM YYYY, HH:MM)
    The third item in the SOS banner is the current date and time:
  • DDD denotes the day of the week.
  • DD denotes the day of the month.
  • MMM denotes the month.
  • YYYY denotes the year.
  • HH:MM denotes the hour and minutes.
    Elapsed Time (E: HH:MM:SS)
    The fourth item displayed in the SOS banner is the elapsed time (E: HH:MM:SS): the time, in hours, minutes, and seconds, that has passed since you started the current SOS session or last reset the elapsed time. This measurement is especially valuable when viewing cumulative statistics. For further information, refer to "Display cumulative stats".


    To reset the elapsed time to zero, type r from any SOS display screen.
    Current Interval (I: MM:SS)
    The last item displayed in the SOS banner is the current interval (I: MM:SS). The current interval is the amount of time in minutes and seconds accumulated since SOS last updated the screen. The measurements reported on any SOS display screen are valid for the current interval.


    By default, the interval refresh rate is 60 seconds. You can adjust this rate from the Main Options Menu screen. For further information, refer to "Screen refresh interval in seconds".


    Assuming the interval refresh rate is 60 seconds, the current interval displayed in the SOS banner should be I: 01:00. However, if at some point during the measurement interval the program has to wait for user input, the interval update will be delayed. For example, when the f key is pressed from an SOS display screen to "freeze" the current interval, the next update is delayed until the user enters the command to "unfreeze" the interval.


    If the current interval displayed is shorter than the interval refresh rate, the u key was pressed from an SOS display screen to update the performance data mid-interval.
    Current Interval Metrics vs. Cumulative Averages
    The statistical values expressed in the format "nnn.n" represent measurements for the current interval (I: MM:SS). The values in brackets, [nnn.n], represent cumulative averages for the elapsed interval (E: HH:MM:SS).
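    To make the distinction concrete, the sketch below derives a bracketed cumulative value from a series of interval values. It assumes, for illustration only, that the cumulative average is a mean of the interval values weighted by each interval's length; SOS's internal calculation is not documented here, and cumulative_average is a hypothetical name.

```python
def cumulative_average(intervals):
    """Time-weighted mean of per-interval values (hypothetical helper).

    intervals: list of (interval_seconds, value) pairs, one per screen update.
    Assumption: weighting by interval length, so a short interval (for
    example, one ended early with the u key) counts for less than a full one.
    """
    total_seconds = sum(seconds for seconds, _ in intervals)
    if total_seconds == 0:
        return 0.0
    weighted = sum(seconds * value for seconds, value in intervals)
    return weighted / total_seconds


# Two full 60-second intervals followed by a 30-second mid-interval update.
history = [(60, 40.0), (60, 50.0), (30, 70.0)]
current = history[-1][1]
print(f"{current:.1f} [{cumulative_average(history):.1f}]")  # prints "70.0 [50.0]"
```

    Under this assumption the current interval shows 70.0 while the bracketed value, averaged over all 150 elapsed seconds, shows 50.0.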

    Key Indicators of Performance (KIP) Line

    The Key Indicators of Performance (KIP) line can be displayed just below the SOS banner. It appears when the Display Key Indicators of Performance option is enabled in the SOS Main Options Menu screen.




    Figure 9.4 SOS Global Summary screen: Key Indicators of Performance (KIP) line
    The purpose of the KIP line is to display statistics associated with the primary indicators of system performance. The data displayed in the KIP line is configurable. By default, it shows Total Busy, High Pri, and Read Hit data for the current interval.
    Total Busy
    The Total Busy value displayed in the KIP line is the percentage of time the CPU spent executing the following activities instead of being in a pause or idle state:
  • Processing user and system process code
  • Processing interrupts
  • Processing context switches
  • Managing main memory
  • Managing traps
    High Pri
    The High Pri value displayed in the KIP line is the percentage of time the CPU spent executing high priority processes during the current interval.
    Read Hit
    The Read Hit value displayed in the KIP line is the read hit percentage for the current interval.


     
    NOTE By editing the soskip text file located in the /etc/opt/lps/cfg directory, you can redefine the variables to display in the KIP line. For information about editing the soskip file, see "SOS soskip File".


    GLOBAL

    The GLOBAL statistics portion of the Global Summary screen contains a simple bar graph that summarizes activity levels system-wide.

    GLOBAL (Left Column)

    CPU%
    The CPU% bar graph (the left portion of the GLOBAL statistics) shows the percentage of CPU time expended during the current measurement interval on various activities.




    Figure 9.5 SOS Global Summary screen: GLOBAL (left column)
    Each character position on the CPU% bar graph represents approximately 2 percent of the CPU's time for the current interval. The code letters correspond to the CPU activities described in Table 9.1. When a run of positions on the bar graph is bordered by two instances of the same code letter (for example, S...S), the corresponding activity (here, executing system calls and code) accounts for the CPU% range between the two letters.
    For example, the CPU% bar shown in Figure 9.5 indicates the following:
  • 14 percent of CPU time in the current interval was spent executing processes with a negative nice value in user mode.
  • 14 percent of CPU time was spent executing system calls and code (in kernel mode).
  • 2 percent of CPU time was spent executing system interrupt handling code.
  • 4 percent of CPU time was spent managing virtual memory.
    The code letters used in the CPU% bar graph are described in Table 9.1.
    Table 9.1 CPU%
    Code  Statistic       Description
    C     Context Switch  The percentage of time managing context switches between processes.
    I     Interrupt       The percentage of time executing system interrupt handling code.
    M     Memory          The percentage of time managing virtual memory.
    N     Nice            The percentage of time executing processes with a nice value in user mode.
    R     Real Time       The percentage of time executing real-time processes in user mode.
    S     System          The percentage of CPU time spent executing system calls and code (in kernel mode). This does not include time spent performing context switches or idle time.
    T     Trap            The percentage of time managing traps.
    U     User Mode       The percentage of CPU time spent executing user program code with a nice value of 20 and without any special priority.
    W     Wait            The percentage of time the CPU was idle while waiting for a disk I/O to complete.
    X     Negative Nice   The percentage of time executing processes with a negative nice value in user mode.
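    The bar-graph convention can be illustrated with a small rendering sketch. It assumes that each character position is worth 2 percent, that segment widths are rounded to whole positions, and that a segment wider than one position is drawn as its code letter at both ends with filler between (dots are used here for visibility). The function name cpu_bar and the rounding rule are hypothetical, not SOS internals.

```python
def cpu_bar(shares, pct_per_char=2.0, fill="."):
    """Render a CPU%-style bar (hypothetical layout).

    shares: list of (code_letter, percent) pairs in display order.
    Each character position stands for about pct_per_char percent.
    """
    parts = []
    for code, pct in shares:
        width = round(pct / pct_per_char)  # whole character positions
        if width <= 0:
            continue  # too small to show at this resolution
        if width == 1:
            parts.append(code)
        else:
            # Bordered segment: letter, filler, letter (e.g. S.....S).
            parts.append(code + fill * (width - 2) + code)
    return "".join(parts)


# The Figure 9.5 example: 14% X, 14% S, 2% I, 4% M.
print(cpu_bar([("X", 14), ("S", 14), ("I", 2), ("M", 4)]))  # prints "X.....XS.....SIMM"
```

    At 2 percent per position, the 14 percent spent on negative-nice processes occupies seven positions, and the single-position I segment appears as one letter.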
    RHit%
    The RHit% bar represents the buffer cache read hit percentage.
    WHit%
    The WHit% bar represents the buffer cache write hit percentage.
    IO/s
    The IO/s bar represents the disk I/O rate: the number of physical reads and writes per second for each type of physical I/O. As with the CPU% bar (see "CPU%"), code letters in the bar graph show how many physical I/Os of each type were accumulated in the current interval. The code letters are listed and described in Table 9.2.
    Table 9.2 Physical I/Os
    Code  Physical I/O      Description
    U     User File System  The number of user file system physical I/Os accumulated in the current interval.
    S     System            The number of system physical I/Os accumulated in the current interval.
    V     Virtual Memory    The number of virtual memory physical I/Os accumulated in the current interval.
    R     Raw               The number of raw physical I/Os accumulated in the current interval.
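    The IO/s value is a count divided by the interval length. The sketch below converts per-interval physical I/O counts, keyed by the Table 9.2 code letters, into per-second rates; the counts and the function name io_rates are hypothetical.

```python
def io_rates(counts, interval_seconds):
    """Convert per-interval physical I/O counts into rates per second.

    counts: mapping of Table 9.2 code letter (U, S, V, R) to the number of
    physical I/Os of that type accumulated in the interval (hypothetical data).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {code: n / interval_seconds for code, n in counts.items()}
    rates["total"] = sum(counts.values()) / interval_seconds
    return rates


rates = io_rates({"U": 900, "S": 300, "V": 120, "R": 0}, 60)
for code, rate in rates.items():
    print(f"{code:>5}: {rate:6.1f} IO/s")
```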

    GLOBAL (Right Column)

    The scale for the next four global statistics ranges from 2 to 20. A value greater than 20 is represented by a trailing greater than character (>).




    Figure 9.6 SOS Global Summary screen: GLOBAL (right column)
    Each data item in the right column of the GLOBAL statistics is described in Table 9.3.
    Table 9.3 SOS Global data items
    Data Items
    Description
    RunQ Len
    The average number of processes in the CPU run queue during the current interval.
    Pg Out/s
    The number of page outs per second.
    Deact b/s
    The number of deactivated bytes per second.
    I/O QLen
    The average number of disk I/O requests pending for all disks during the current interval.
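    The 2-to-20 scale can be illustrated with a rendering sketch. It assumes, hypothetically, that the bar has ten positions worth two units each and that any value above 20 fills the bar and adds the trailing >; the fill character and the function name scaled_bar are not taken from SOS.

```python
def scaled_bar(value, step=2.0, maximum=20.0, fill="="):
    """Render a GLOBAL right-column style bar (hypothetical layout).

    One fill character per `step` units, up to `maximum`; values above
    `maximum` are drawn as a full bar followed by a trailing '>'.
    """
    if value > maximum:
        return fill * int(maximum // step) + ">"
    return fill * int(value // step)


# Illustrative values only.
for label, value in [("RunQ Len", 3.4), ("Pg Out/s", 12.0), ("I/O QLen", 27.5)]:
    print(f"{label:<10}{scaled_bar(value)}")
```

    Values below the first scale mark (2) produce an empty bar in this sketch.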

    PROCESS SUMMARY

    After reviewing the general state of global resources, the next logical step in analyzing a system’s performance is to observe individual processes. It is important to find out which users are running which programs and what kinds of resources those programs are consuming. The primary purpose of the PROCESS SUMMARY portion of the Global Summary screen is to help you to identify key resources consumed by various processes on the system.
    To examine the CPU usage, disk I/O usage, and wait state information for a process, open the Process Detail screen. For further information, see "SOS Process Detail".

    PROCESS SUMMARY Display Options

    The PROCESS SUMMARY section is included in the Global Summary screen by default when the SOS program is started. However, this information can be suppressed. For instructions, refer to "Display process information".
    You can configure the PROCESS SUMMARY display in the following ways:
  • Display or suppress the extended process line.
  • Display either the total and I/O percentages or the read and write counts.
  • Display all processes or only the active processes.
  • Display or suppress attached processes.
  • Display or suppress detached processes.
  • Display or suppress system processes.
  • Display or suppress processes that have died.
  • Apply a process logon filter.
  • Apply a process sort option.
  • Display sorted processes in either ascending or descending order.
  • Set a maximum number of processes to display.
    For information about these options, please refer to "Process Display Options".

    PROCESS SUMMARY Data Items

    Figure 9.7 SOS Global Summary screen: PROCESS SUMMARY
    The contents of each PROCESS SUMMARY column (shown in Figure 9.7) are described in the next table.
    Table 9.4 SOS Process Summary data items
    Data Item
    Description
    PID
    The process identification number that uniquely identifies each process running on the system.
    Name
    The process name.
    User Name
    The name of the user who owns (or created) each process running on the system.
    TTY
    "TTY" is defined in SOS as the special device file of the terminal to which the process is attached. The TTY column will show three dashes (---) for processes that are not attached to a terminal (processes such as daemons and batch jobs).
    CPU%
    The CPU% column shows the percentage of system-wide CPU time used by each process. This value is normalized for multiple processors; in other words, the CPU% values of all processes added together should never exceed 100 percent.
    Nic
    The Nic (Nice) column displays the nice value associated with each process. This value, ranging from 0 to 39 (the default is 20), is a determining factor when a process’s priority is recalculated.
  • A process with a larger nice value receives a higher priority number (that is, a lower priority).
  • A process with a smaller nice value receives a lower priority number (that is, a higher priority).
    A process that slows system response time can be "niced" to lower its priority and allow other processes to execute more quickly.
    Pri
    The Pri column shows the most recent priority that each process was given.
    As explained earlier, high priority numbers indicate low-priority status, and vice versa. The priority numbers between 0 and 127 indicate high-priority status and are reserved for certain system daemons or real-time processes. The majority of processes are given numbers between 128 and 255, which indicate timeshare-priority status. A typical timeshare process will fluctuate within this priority range, based on the process’s CPU demands and the system’s load. Processes executing at nice priorities typically have larger numbers (lower priorities).
    The system scheduler sets the priority dynamically by considering several factors, such as CPU utilization. Because the scheduler tries to allocate CPU time fairly among processes, it lowers the scheduling priority of processes that require a lot of CPU time. This means that as a process’s CPU usage grows, its priority number in the Pri column increases.
    RSS/Size
    The RSS/Size column presents two data items for each process running on the system. The RSS value represents the resident set size: the amount of RAM used by the process. The Size value represents the size, in kilobytes, of the core image of the process, including text, data, and stack space; in other words, the amount of swap or virtual memory the process has reserved.
    Performance Tip
    Large values in the RSS/Size column indicate that the corresponding process uses a lot of memory. Such processes may need to be checked for memory usage problems.
    #Rd
    The #Rd column lists the number of physical reads performed by each process during the current interval.
    #Wr
    The #Wr column shows the number of physical writes performed by each process during the current interval.
    Performance Tip
    The #Rd and #Wr values are important because they can point to processes that are performing excessive disk I/Os. To confirm, check the SYSTEM PERFORMANCE ADVICE portion of the Global Summary screen for a message that reports the high I/O process for the current interval. When high #Rd and #Wr values are evident, determine whether the I/Os are necessary.
    Wait
    The Wait column in the PROCESS SUMMARY portion of the Global Summary screen shows which wait state the corresponding process was in at the end of the current interval. Each wait state is described in the appendix, "Wait State Codes".
    Performance Tip
    Wait state information is helpful when you want to determine why a process is "stuck." Keep in mind, however, that the wait state of a process can change radically in a matter of seconds. If you suspect a problem, check the information provided for that process in the Process Detail screen.
    Resp
    The average response time (seconds). A less-than character (<) represents a value less than 0.1 second.
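    One way to read "normalized for multiple processors" is to divide a process's CPU time by the total CPU capacity available in the interval (interval length times number of processors). The sketch below follows that assumption; SOS's exact formula is not given here, and normalized_cpu_pct is a hypothetical name.

```python
def normalized_cpu_pct(process_cpu_seconds, interval_seconds, num_cpus):
    """Share of total system CPU capacity used by one process, in percent.

    Assumption: capacity = interval length x number of CPUs, which keeps
    the sum over all processes at or below 100 on a multiprocessor.
    """
    capacity = interval_seconds * num_cpus
    return 100.0 * process_cpu_seconds / capacity


# A process that kept one CPU busy for a whole 60-second interval on a
# 4-processor system shows 25.0 percent, not 100 percent.
print(normalized_cpu_pct(60.0, 60.0, 4))  # prints 25.0
```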

    Extended Process Statistics Lines

    The PROCESS SUMMARY portion of the Global Summary screen can be expanded to show the percentage of time each process spent in one or more wait states during the current interval. This additional process information is displayed below each corresponding process statistics line in an extended process line.


     
     
    (Figure 9.8 callouts: Wait States, Process Line, Extended Process Line)




    Figure 9.8 SOS Global Summary screen: extended process column headings and lines
    The extended process lines together with the extended process headings line can be enabled from the Process Display Options submenu of the SOS Main Options Menu or by typing the y key from the Global Summary screen (toggles the extended process lines on and off).
    The statistics in the extended process lines correspond with the column headings in the extended process headings line. Each column heading is described in Table 9.5.


    Table 9.5 Extended process column headings
    Heading
    Description
    {RN,...,OT}
    The percentage of the process’ time spent in the corresponding wait state during the current interval. See "Wait State Codes" for a description of each state.
    CPU (ms)
    The total CPU time in milliseconds used by the process during the current interval.
    The percentage of time the process spent in a wait state is represented by one of the following:
  • A number between 0 and 100 (percent).
  • A less than character (<), which represents a value less than 0.1 percent.
  • An asterisk character (*), which represents a value greater than 100.0 percent.
    For example, the extended process line for PID 3 shown in Figure 9.8 provides the following information:
  • Process 3 spent 100 percent of the current interval in the SL wait state, waiting for a sleep or wait call to expire.
  • Process 3 consumed 48 ms of the CPU time during the current interval.
    Additional information about a process can be viewed in the Process Detail screen, which is discussed in "SOS Process Detail".
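    The display rules for wait-state percentages (a number, < for values below 0.1 percent, * for values above 100.0 percent) can be sketched as a formatting function. The one-decimal precision and the name format_wait_pct are assumptions made for illustration.

```python
def format_wait_pct(pct):
    """Format a wait-state percentage for an extended process line.

    '*' marks a value above 100.0 percent and '<' a nonzero value below
    0.1 percent; other values are shown as a number with one decimal
    place (the precision is an assumption).
    """
    if pct > 100.0:
        return "*"
    if 0.0 < pct < 0.1:
        return "<"
    return f"{pct:.1f}"


print([format_wait_pct(p) for p in (100.0, 0.04, 12.5, 130.0)])
# prints ['100.0', '<', '12.5', '*']
```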

    WORKLOAD SUMMARY

    The SOS program can track process statistics by application workload. Workloads are discussed in "Workload Groups". Workload statistics can be displayed in the WORKLOAD SUMMARY portion of the Global Summary screen.

    WORKLOAD SUMMARY Display Options

    To display the WORKLOAD SUMMARY statistics in the Global Summary screen, first enable the Display workload information option from the SOS Main Options Menu screen.




    Figure 9.9 SOS Global Summary screen: WORKLOAD SUMMARY
    By default, all workloads running on the system are included in the WORKLOAD SUMMARY. To show only the active workloads, enter Y (Yes) for the Display only active workloads option in the SOS Main Options Menu screen, then set the minimum CPU time required for workload display to a value between 0.1 and 99.9 percent.
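    The effect of the Display only active workloads option can be sketched as a filter. This assumes a workload is shown when its current-interval CPU% is at or above the configured minimum; the Workload class, the workload names, and workloads_to_display are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    num: int        # workload number from the workload definition file
    name: str       # hypothetical workload name
    cpu_pct: float  # CPU% for the current interval


def workloads_to_display(workloads, only_active=False, min_cpu_pct=0.1):
    """Select workloads for the WORKLOAD SUMMARY (hypothetical filter).

    With only_active False, every workload is shown. Otherwise a workload
    is shown only if its CPU% is at least min_cpu_pct (0.1 to 99.9).
    """
    if only_active:
        if not 0.1 <= min_cpu_pct <= 99.9:
            raise ValueError("minimum CPU% must be between 0.1 and 99.9")
        workloads = [w for w in workloads if w.cpu_pct >= min_cpu_pct]
    # Listed in ascending workload-number order, as in the Num column.
    return sorted(workloads, key=lambda w: w.num)


sample = [Workload(2, "BATCH", 0.0), Workload(1, "INTERACT", 12.5),
          Workload(3, "DEFAULT", 0.3)]
print([w.name for w in workloads_to_display(sample, only_active=True, min_cpu_pct=0.5)])
# prints ['INTERACT']
```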

    WORKLOAD SUMMARY Data Items

    The data items presented in the WORKLOAD SUMMARY portion of the Global Summary screen are described in the following table.
    Table 9.6 SOS Workload Summary data items
    Data Item
    Description
    Num
    The workload numbers in ascending order as they appear in the workload definition file.
    Name
    The name assigned to each workload as it appears in the workload definition file.
    CPU%
    The percentage of CPU time used by each workload during the current interval and the elapsed interval. (Elapsed interval data is enclosed in brackets, [ ].)
    User CPU%
    The percentage of each workload’s CPU percentage that was spent in user mode during the current and elapsed intervals.
    Disk I/O%
    The percentage of system-wide disk I/Os performed by each workload during the current and elapsed intervals.
    Resp Time
    The average response times (seconds) calculated for each workload during the current and elapsed intervals.
    Trans/min
    The total number of transactions per minute counted for each workload during the current and elapsed intervals.

    CPU UTILIZATION

    Information presented in the CPU UTILIZATION portion of the tabular Global Summary screen will help you to evaluate your system’s CPU performance by showing you how global activities are expending CPU time.




    Figure 9.10 SOS Global Summary screen: CPU UTILIZATION
    The statistical values expressed in the format "nnn.n" represent measurements for the current interval. The values in brackets, [nnn.n], represent cumulative averages for the elapsed interval.

    CPU UTILIZATION Data Items

    The data items presented in the CPU Utilization portion of the Global Summary screen are described in the next table.
    Table 9.7 SOS CPU Utilization data items
    Data Item
    Description
    TOTAL BUSY
    The percentage of time the CPU was busy (not idle) during the current (nn.n) and elapsed ([nn.n]) intervals. The TOTAL BUSY value is the sum of the User, Real, Nice, NNice, Sys, Intr, C SW, Trap, and Mem values reported in the same area of the Global Summary screen.
    Performance Tip
    When the TOTAL BUSY value is consistently greater than 75 or 80 percent and the majority of this resource is consumed by high-priority interactive user processing, the CPU may be a bottleneck on your system. Observe this data over time rather than basing your diagnosis on a brief spike in CPU activity.
    If the TOTAL BUSY value is excessive due to batch job activity, there is usually ample CPU capacity for interactive users. To confirm your diagnosis, investigate the average length of the CPU queue (see "RunQ Avg").
    HIGH PRI
    The percentage of time the CPU spent executing high priority processes.
    User
    The percentage of time the CPU spent executing user code with a nice value of 20 and without any special priority status.
    Performance Tip
    It is usually advantageous to allow the majority of CPU time to be spent processing user code (including real- and nice-level code). To get a feel for the relative impact of productive and nonproductive work, monitor the Capture Ratio value (see "CPU MISC").
    Real
    The percentage of time executing real-time processes in user mode.
    Nice
    The percentage of time executing processes with a nice value in user mode.
    NNice
    The percentage of time executing processes with a negative nice value in user mode.
    Sys
    The Sys value in the CPU UTILIZATION portion of the Global Summary screen represents the percentage of time the CPU spent in system (kernel) mode.
    Performance Tip
    All processes spend some time executing system code. A large Sys value may indicate that programs are making unnecessary or inefficient system calls. To find the source, identify all system processes and sort them by CPU usage to see which processes are causing the problem.
    Intr
    The percentage of time processing interrupts.
    C SW
    The percentage of time managing context switches.
    Trap
    The percentage of time processing traps.
    Mem
    The percentage of time the CPU spent managing virtual memory.
    Idle
    The Idle value represents the percentage of time the CPU was not in use.
    Performance Tip
    A consistently high Idle value means your CPU is "on vacation" most of the time. Although it is not desirable to swamp the processor, it should "earn its keep" by performing at or near capacity.
    If the Idle value is consistently low and the lack of idle time is primarily due to session activity, the system may be overloaded. Either reduce such processing or obtain more CPU horsepower through an upgrade. It is best to observe idle time values over entire days: you may see plenty of idle time at noon but none between 3:00 and 4:00 P.M. Shifting workloads (batch scheduling, users’ work hours, and so on) can help bring this type of peak-period utilization down.
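    The TOTAL BUSY definition and the "consistently greater than 75 or 80 percent" tip can be sketched together. The sample values are illustrative, not measurements, and consistently_busy interprets "consistently" as every sampled interval exceeding the threshold; that interpretation and both function names are assumptions.

```python
# Busy components summed into TOTAL BUSY, in the order Table 9.7 lists them.
BUSY_COMPONENTS = ("User", "Real", "Nice", "NNice", "Sys", "Intr", "C SW", "Trap", "Mem")


def total_busy(cpu):
    """Sum the busy components of one CPU UTILIZATION sample (percent)."""
    return sum(cpu.get(name, 0.0) for name in BUSY_COMPONENTS)


def consistently_busy(busy_history, threshold=75.0):
    """Hypothetical check: True only if every sampled interval exceeded
    the threshold, so a single brief spike is not enough to trigger it."""
    return bool(busy_history) and all(b > threshold for b in busy_history)


# Illustrative values only.
sample = {"User": 31.2, "Real": 0.0, "Nice": 4.1, "NNice": 0.0, "Sys": 12.6,
          "Intr": 1.8, "C SW": 2.3, "Trap": 0.4, "Mem": 0.9}
print(f"TOTAL BUSY {total_busy(sample):5.1f}")
print(consistently_busy([82.0, 79.5, 91.3]))  # True
print(consistently_busy([82.0, 41.0, 91.3]))  # False
```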

    CPU MISC

    The CPU MISC portion of the tabular Global Summary screen provides statistics to further analyze the condition of your system.
    Figure 9.11 SOS Global Summary screen: CPU MISC

    CPU MISC Data Items

    The data items presented in the CPU MISC portion of the Global Summary screen are described in Table 9.8.
    Table 9.8 SOS CPU Miscellaneous data items
    Data Item
    Description
    Capture Ratio
    The Capture Ratio value is calculated as:
    Capture Ratio = (User + Real + Nice + NNice) / (Sys + Intr + C SW + Trap + Vflt)
    Performance Tip
    A Capture Ratio of one or greater indicates that the system spends at least as much of its busy time on useful work (the user-level activities in the numerator) as on overhead. A value of less than one means the system spends more than half of its busy time on overhead.
    RunQ Avg
    The average number of processes present in the CPU run queue during the current interval. The value reported in brackets is the cumulative run queue average for the elapsed interval.
    The RunQ Avg values reported in the Global Summary screen are similar to the system load average values retrieved by typing the uptime command at the Unix command prompt.
    5/15 Min RunQ Avg
    The 5/15 Min RunQ Avg values show the load average in the last five minutes and the last 15 minutes, respectively.
    RunQ Busy%
    The RunQ Busy% value represents the percentage of time that at least one process was waiting for the CPU. A high percentage is not uncommon, but 100 percent is not desirable.
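    The Capture Ratio formula translates directly into code. The sketch below assumes that the Vflt term corresponds to the Mem value (time spent managing virtual memory) shown in CPU UTILIZATION, since Vflt is not reported there; the sample values are illustrative.

```python
def capture_ratio(cpu):
    """Capture Ratio = (User + Real + Nice + NNice) / (Sys + Intr + C SW + Trap + Vflt).

    Assumption: Vflt is taken to be the Mem value from CPU UTILIZATION.
    """
    useful = cpu["User"] + cpu["Real"] + cpu["Nice"] + cpu["NNice"]
    overhead = cpu["Sys"] + cpu["Intr"] + cpu["C SW"] + cpu["Trap"] + cpu["Mem"]
    if overhead == 0:
        return float("inf")  # no overhead recorded in the interval
    return useful / overhead


# Illustrative values only.
sample = {"User": 31.2, "Real": 0.0, "Nice": 4.1, "NNice": 0.0, "Sys": 12.6,
          "Intr": 1.8, "C SW": 2.3, "Trap": 0.4, "Mem": 0.9}
ratio = capture_ratio(sample)
print(f"Capture Ratio {ratio:.2f}")  # prints "Capture Ratio 1.96"
if ratio >= 1:
    print("At least half of busy time went to useful work")
else:
    print("More than half of busy time went to overhead")
```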

    MEM/VM

    The MEM/VM statistics reported in the Global Summary screen provide a general overview of memory and virtual memory activities. To view specific memory statistics, refer to the Memory Summary screen. For further information, see "SOS Memory Summary".



    Figure 9.12 Global Summary screen: MEM/VM

    MEM/VM Display Options

    To display or suppress the MEM/VM statistics in the Global Summary screen, enable/disable the Display memory information on global screen option from the SOS Main Options Menu screen.

    MEM/VM Data Items

    The data items presented in the MEM/VM portion of the Global Summary screen are described in Table 9.9.
    Table 9.9 SOS Memory/Virtual Memory data items
    Data Item
    Description
    Read Hit %
    The percentage of disk reads satisfied in the buffer cache.
    Write Hit %
    The percentage of disk writes satisfied in the buffer cache.
    Page Outs
    The number of page outs per second.
    Deact Bytes
    The number of bytes deactivated per second.
    Mem Used %
    The percentage of RAM currently used.
    VM Used %
    The percentage of swap space currently used.
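    Each MEM/VM item is either a ratio expressed as a percentage or a count divided by the interval length. The sketch below derives all six values from raw per-interval counters; every counter name and value is hypothetical and chosen only to show the arithmetic.

```python
def pct(part, whole):
    """Return part as a percentage of whole, or 0.0 if whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def mem_vm_summary(c):
    """Derive MEM/VM-style values from raw counters (hypothetical names).

    interval_s is the interval length in seconds.
    """
    return {
        "Read Hit %": pct(c["cache_reads"], c["logical_reads"]),
        "Write Hit %": pct(c["cache_writes"], c["logical_writes"]),
        "Page Outs": c["page_outs"] / c["interval_s"],
        "Deact Bytes": c["deact_bytes"] / c["interval_s"],
        "Mem Used %": pct(c["ram_used_kb"], c["ram_total_kb"]),
        "VM Used %": pct(c["swap_used_kb"], c["swap_total_kb"]),
    }


counters = {
    "interval_s": 60,
    "logical_reads": 5000, "cache_reads": 4650,
    "logical_writes": 1200, "cache_writes": 1020,
    "page_outs": 90, "deact_bytes": 122880,
    "ram_total_kb": 8388608, "ram_used_kb": 6291456,
    "swap_total_kb": 4194304, "swap_used_kb": 524288,
}
for name, value in mem_vm_summary(counters).items():
    print(f"{name:<12}{value:10.1f}")
```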

    MISC

    The MISC portion of the tabular Global Summary screen displays several miscellaneous data items such as the number of sessions, the number of processes, the number of I/Os in a wait state, and the average response time. These statistics provide a good overview of the system’s general workload.




    Figure 9.13 SOS Global Summary screen: MISC

    MISC Display Options

    To display or suppress the MISC statistics in the Global Summary screen, enable/disable the Display miscellaneous information on global screen option from the SOS Main Options Menu screen.

    MISC Data Items

    The data items presented in the MISC portion of the Global Summary screen are described in the next table.
    Table 9.10 SOS Miscellaneous data items
    Data Item
    Description
    #Sessions
    The current number of sessions logged on the system.
    #Active
    The #Active value (displayed below the #Sessions value) represents the current number of active sessions (sessions that used at least 0.0 percent of CPU time).
    #Procs
    The current number of processes present on the system.
    #Active
    The #Active value (displayed below the #Procs value) represents the current number of active processes (processes that used at least 0.0 percent CPU).
    #Wait I/O
    The current number of processes that waited on disk I/O.
    #Deact
    The current number of deactivated processes.
    Transactions
    The number of transactions per second that occurred during the current interval. A transaction is defined as a character read or write, or a process death.
    Avg Response Time
    The Avg Response Time value in the MISC statistics portion of the tabular Global Summary screen represents the average response time for all terminals during the current interval.
    Response time is a difficult number to obtain from the Unix operating system. It is defined (as calculated by SOS) as the average number of requests in the system (average number of processes) divided by throughput (the transaction rate).
    Response Time = Number of Requests / Throughput
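    The definition in Table 9.10 (average number of requests in the system divided by throughput) is the relationship known as Little's Law. A minimal sketch of the arithmetic follows; the function name and the sample figures are illustrative.

```python
def avg_response_time(avg_processes, transactions_per_sec):
    """Average response time in seconds: requests in system / throughput.

    Returns None when no transactions occurred, since the ratio is undefined.
    """
    if transactions_per_sec <= 0:
        return None
    return avg_processes / transactions_per_sec


# 12 processes on average with 40 transactions per second gives 0.3 seconds.
print(avg_response_time(12, 40))  # prints 0.3
```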

    DISK

    The DISK portion of the tabular Global Summary screen presents a few statistics for each configured disk drive on the system (see Figure 9.14). This information can help answer:
  • How balanced are the I/Os between disks?
  • Is one disk accessed more than others?
  • Is the number of disk I/Os exceeding acceptable limits?




    Figure 9.14 SOS Global Summary screen: DISK

    DISK Display Options

    To display or suppress the DISK statistics in the Global Summary screen, enable/disable the Display disk information on global screen option from the SOS Main Options Menu screen.

    DISK Data Items

    The data items presented in the DISK portion of the Global Summary screen are described in Table 9.11.
    Table 9.11 SOS Disk data items
    Data Item
    Description
    Disk
    The name of each disk drive in the system’s configuration.
    IO/s
    The number of physical disk reads and writes per second that occurred in the current interval.
    IO%
    The percentage of all disk I/Os on the system that were performed by this disk.
    QLen
    The QLen value represents the average length of the disk’s queue.
    Performance Tip
    An average queue length of 1.0 or greater is not a good sign. While a typical system may experience "rush hour" situations, it is the consistently long queues that are suspect. If the QLen value for a particular drive is consistently high, explore the following possible causes:
  • Excessive disk arm movement due to heavily hit files. You might achieve better I/O balance by placing complementary files on separate drives.
  • Database inefficiencies. Implement better database maintenance.
  • Hardware issues. Upgrade slow disk drives.
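    The IO% column and the QLen tip can be sketched together: IO% is each disk's share of the total I/O rate, and a queue length of 1.0 or more is flagged. The disk names and values are hypothetical, and a single flagged interval is only a hint, since the tip concerns queues that stay long over time.

```python
def disk_summary(disks, qlen_threshold=1.0):
    """Compute IO% per disk and flag average queues at or above the threshold.

    disks maps a hypothetical disk name to (io_per_sec, avg_queue_len)
    for one interval. Returns (name, io_per_sec, io_pct, qlen, flagged) rows.
    """
    total_io = sum(io for io, _ in disks.values())
    rows = []
    for name, (io, qlen) in disks.items():
        share = 100.0 * io / total_io if total_io else 0.0
        rows.append((name, io, share, qlen, qlen >= qlen_threshold))
    return rows


sample = {"disk0": (30.0, 0.2), "disk1": (60.0, 1.4), "disk2": (10.0, 0.1)}
for name, io, share, qlen, flagged in disk_summary(sample):
    note = "  <- long queue" if flagged else ""
    print(f"{name:<6}{io:7.1f}{share:7.1f}%{qlen:6.1f}{note}")
```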
    SYSTEM PERFORMANCE ADVICE

    The final portion of the Global Summary screen contains the SYSTEM PERFORMANCE ADVICE messages. These advice messages are designed to provide current performance information in plain-English "one-liners" to help system administrators zero in on potential performance problems.




    Figure 9.15 SOS Global Summary screen: SYSTEM PERFORMANCE ADVICE
    At the end of each advice message is a four-character message identification code (for example, <CI01> or <ME01>). Look up the identification code of any standard advice message in "SYSTEM PERFORMANCE ADVICE Message Interpretations" for a more detailed explanation of the condition it reports.
    Two types of advice messages can be generated: informational and excessive.
  • An informational message (denoted by an uppercase I in the message identification code) summarizes a particular aspect of the system’s performance during the current interval.
  • An excessive message (denoted by an uppercase E) alerts the user to an excessive condition—a situation or problem that could require immediate action.
    To get more information about a situation described in an advice message, refer to the GLOBAL or PROCESS SUMMARY portions of the Global Summary screen.
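    Because the message type is encoded in the identification code, advice output can be sorted mechanically. This Python sketch assumes the code is the last item on the line, in angle brackets, and follows the pattern of the standard codes in this chapter (a category letter, then I or E, then two digits). Custom messages in the advice file may not follow that pattern, and advice_type is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed pattern for a standard ID such as <CI01>: category letter,
# I (informational) or E (excessive), then two digits, at end of line.
_ADVICE_ID = re.compile(r"<([A-Z])([IE])(\d{2})>\s*$")


def advice_type(message: str) -> Optional[Tuple[str, str]]:
    """Return (code, type) for an advice line ending in an ID, else None."""
    match = _ADVICE_ID.search(message)
    if match is None:
        return None
    code = "".join(match.groups())
    kind = "informational" if match.group(2) == "I" else "excessive"
    return code, kind
```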

    SYSTEM PERFORMANCE ADVICE Display Options

    To enable SYSTEM PERFORMANCE ADVICE messages, enter Y for the Display advice messages option in the SOS Main Options Menu screen.


    By default, the SYSTEM PERFORMANCE ADVICE messages include both informational messages and excessive use messages. To suppress the informational messages, enter N for the Display informational advice messages option in the SOS Main Options Menu screen.

    SYSTEM PERFORMANCE ADVICE Message Configuration

    The SYSTEM PERFORMANCE ADVICE messages are stored in the SOS advice configuration file, which users can edit to add custom advice messages. For example, to add a message that alerts personnel when average system utilization exceeds 90 percent, follow the instructions in "SOS advice File".

    SYSTEM PERFORMANCE ADVICE Message Interpretations



     
    RECOMMENDATION The standard SYSTEM PERFORMANCE ADVICE messages that are contained in the SOS advice file (described below) are generic. Customize these messages for your system using the instructions in "SOS advice File".


    <BE01> Buffer cache read hit percent low, increase %s
    Advice message BE01 is generated to alert the user when the buffer cache read-hit percentage is equal to or less than 90 percent.
  • If the number of virtual memory page outs for the current interval is equal to or greater than 5, the message will advise the user to increase memory.
  • If the virtual memory page outs number is greater than 0 and less than 5, the message will advise the user to increase the buffer cache size.
    <BE02> Buffer cache read write hit percent low, increase %s
    Advice message BE02 is generated to alert the user when the buffer cache write-hit percentage is equal to or less than 65 percent.
  • When the number of virtual memory page outs counted in the current interval is equal to or greater than 5, the message will advise the user to increase memory.
  • When the virtual memory page outs number is greater than 0 and less than 5, the message will advise the user to increase the buffer cache size.
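    BE01 and BE02 share the same decision logic and differ only in their hit-percentage thresholds, so one function can model both. This is a minimal sketch of the rules as described above; the function name and parameters are hypothetical. Because the text does not say what is recommended when no page outs occurred, the sketch returns "unspecified" in that case.

```python
from typing import Optional


def buffer_cache_advice(hit_pct: float, page_outs: int,
                        threshold_pct: float) -> Optional[str]:
    """Model the BE01/BE02 recommendation for one interval.

    hit_pct:       buffer cache read-hit % (BE01) or write-hit % (BE02)
    page_outs:     virtual memory page outs counted in the current interval
    threshold_pct: 90 for BE01, 65 for BE02
    Returns None when no message would be generated.
    """
    if hit_pct > threshold_pct:
        return None                 # hit percentage acceptable: no message
    if page_outs >= 5:
        return "memory"             # advise increasing memory
    if page_outs > 0:
        return "buffer cache size"  # advise increasing the buffer cache
    return "unspecified"            # 0 page outs: not covered by the text
```

    For example, an 84 percent read-hit rate with 2 page outs would trigger BE01 with advice to increase the buffer cache size.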
    <CE01> CPU Queue length indicates %s %s CPU bottleneck
    Advice message CE01 is generated to alert the user when the CPU queue length for the current interval is equal to or greater than 5 processes.
  • A CPU queue length equal to or greater than 5 and less than 10 during the current interval is HEAVY.
  • A CPU queue length equal to or greater than 10 is EXCESSIVE.
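    The CE01 thresholds translate directly into a two-level classification. A minimal sketch, assuming the queue length for the current interval is available as a number (the function name is hypothetical):

```python
from typing import Optional


def cpu_queue_severity(queue_length: float) -> Optional[str]:
    """Classify the CPU queue length as described for advice message CE01."""
    if queue_length >= 10:
        return "EXCESSIVE"
    if queue_length >= 5:
        return "HEAVY"
    return None  # below 5 processes: CE01 is not generated
```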
    <CI01> The CPU was used a total of %s of its capacity during this interval
    Advice message CI01 is always generated to inform the user of the CPU busy percentage for the current interval.
    <DE01> Average disk service time indicates possible disk bottleneck
    Advice message DE01 is generated to alert the user when the average disk service time for the current interval is equal to or greater than 30 milliseconds, which can indicate a disk bottleneck.
    <GE01> Global average response time during this interval was %s
    Advice message GE01 is generated to alert the user when the global average response time for the current interval is equal to or greater than 10 milliseconds.
  • A global average response time in the range of 10-14 ms is moderate.
  • A global average response time in the range of 15-19 ms is HEAVY.
  • A global average response time equal to or greater than 20 ms is EXCESSIVE.
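    The GE01 bands can be represented as a sorted list of lower bounds, which keeps the thresholds in one place if they are later customized. In this sketch the bands are treated as half-open (for example, 14.5 ms counts as moderate), which is an assumption because the text gives integer ranges; the names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Lower bound (ms) of each GE01 band, with the matching label.
_BOUNDS_MS = [10, 15, 20]
_LABELS = ["moderate", "HEAVY", "EXCESSIVE"]


def response_time_severity(avg_response_ms: float) -> Optional[str]:
    """Return the GE01 band for a global average response time, or None."""
    # bisect_right counts how many lower bounds are <= the value.
    idx = bisect_right(_BOUNDS_MS, avg_response_ms)
    return None if idx == 0 else _LABELS[idx - 1]
```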
    <LE01> Collision percent indicates %s %s network bottleneck
    Advice message LE01 is generated to alert the user when the collision percentage for the current interval is equal to or greater than 5 percent, which indicates a possible network bottleneck.
  • A collision percentage in the range of 5-14 percent is moderate.
  • A collision percentage in the range of 15-29 percent is HEAVY.
  • A collision percentage equal to or greater than 30 percent is EXCESSIVE.
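    The sketch below applies the LE01 bands. It also assumes that collision percent is computed as collisions divided by outbound packets, times 100, a common convention for Ethernet interfaces that this manual does not state; the function and parameter names are hypothetical.

```python
from typing import Optional


def collision_severity(collisions: int, packets_out: int) -> Optional[str]:
    """Return the LE01 band, or None if no message would be generated.

    Assumes collision percent = collisions / outbound packets * 100.
    """
    if packets_out <= 0:
        return None  # no traffic: nothing to evaluate
    pct = 100.0 * collisions / packets_out
    if pct >= 30:
        return "EXCESSIVE"
    if pct >= 15:
        return "HEAVY"
    if pct >= 5:
        return "moderate"
    return None
```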
    <ME01> Page out rate reveals %s %s memory load
    Advice message ME01 is generated to alert the user when the virtual memory page out rate for the current interval is in the range of 10-50 page outs per second.
  • A virtual memory page out rate in the range of 10-14 page outs per second is moderate.
  • A virtual memory page out rate in the range of 15-19 page outs per second is HEAVY.
  • A virtual memory page out rate equal to or greater than 20 page outs per second is EXCESSIVE.
    <ME02> CPU consumption due to memory mgt overhead during this interval was %s
    Advice message ME02 is generated to alert the user when the page fault percentage for the current interval is equal to or greater than 10 percent.
  • A page fault percentage of 3-4 percent is moderate.
  • A page fault percentage of 5-6 percent is HEAVY.
  • A page fault percentage equal to or greater than 7 percent is EXCESSIVE.
    <PI01> This interval’s ’hog’ process is %s with %s%% of the CPU
    Advice message PI01 is always generated to inform the user of the current interval’s largest CPU consumer. The message provides the process ID (PID) and the process’s CPU busy percentage.
    <PI02> This interval’s highest disk I/O user was %s with %s I/Os
    Advice message PI02 is generated to inform the user of the current interval’s largest disk I/O user. The message provides the process ID (PID) of that process and its disk I/O percentage.
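    Identifying the PI01 "hog" process and the PI02 top disk I/O user amounts to taking a maximum over per-process samples for the interval. The ProcStat record and interval_leaders function below are hypothetical. Ties go to the first process in the list, because Python's max returns the first maximal element.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProcStat:
    """Hypothetical per-process sample for one collection interval."""
    pid: int
    cpu_pct: float   # share of CPU used during the interval
    disk_ios: int    # disk I/Os performed during the interval


def interval_leaders(procs: List[ProcStat]) -> Optional[Tuple[ProcStat, ProcStat]]:
    """Return (largest CPU consumer, largest disk I/O user), or None if empty."""
    if not procs:
        return None
    hog = max(procs, key=lambda p: p.cpu_pct)
    top_io = max(procs, key=lambda p: p.disk_ios)
    return hog, top_io
```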

    Lund Performance Solutions
    www.lund.com
    Voice: (541) 812-7600
    Fax: (541) 812-7611
    info@lund.com