
Lund Performance Solutions


SOS Global Summary
The SOS Global Summary screen is the first screen to display when you start SOS and the usual starting point for any review of system activity and performance. The screen can be displayed in either graphical or tabular format.
To access the Global Summary screen from any SOS display screen:
  • Type S from the SOS Enter command prompt to view the Screen Selection Menu.
  • From the Screen Selection Menu, enter G (Global Summary).
  • To toggle between the graphical and tabular display formats, press the G key again.

    Graphical Format

    Figure 9.1 shows an example of the Global Summary screen in graphical format.


    Figure 9.1 SOS Global Summary screen (graphical format)
    This example screen contains the following information:
  • The SOS banner
  • The Key Indicators of Performance
  • Global statistics
  • Process information (optional)
  • System Performance Advice (optional)
    Each of these features is described in "Global Screen Display Items".

    Tabular Format

    To toggle between the graphical and tabular format options, press the G key from the Global Summary screen. Figure 9.2 shows an example of the Global Summary screen in tabular format.


    Figure 9.2 SOS Global Summary screen (tabular format)
    The tabular Global Summary screen provides:
  • The SOS banner
  • The Key Indicators of Performance
  • Global CPU Statistics
  • Global Misc. Statistics
  • Global Memory Statistics (optional, and not displayed in Figure 9.2)
  • Global Disc Statistics (optional, and not displayed in Figure 9.2)
  • Process Information (optional)
  • Process Summary by Application Workloads (optional, and not displayed in Figure 9.2)
  • System Performance Advice messages (optional)
    Each of these components is described in detail in the next section, "Global Screen Display Items."

    Global Screen Display Items

    SOS Banner

    The SOS banner is always displayed at the top of all SOS data display screens.


    Figure 9.3 SOS Global Summary screen: SOS banner
    The SOS banner contains information about the SOS program, the host system, the elapsed interval, and the current interval.

    Product Version Number (SOS 3000 V.nnx)

    The first item displayed in the SOS banner (reading left to right) is the product version number (SOS 3000 V.nnx). The version number denotes the following about the product:
  • SOS 3000 is the name of the product.
  • V denotes the major version level.
  • nn denotes the minor version level.
  • x denotes the fix level.
    The SOS version number displayed in the example (refer to Figure 9.3) is G.03x. When contacting technical support, please provide the product version number of the software installed on your system.

    System Name

    The second item displayed in the SOS banner line is the name of the system given during the installation of the operating system. The name of the system used in the example shown in Figure 9.3 is LPS.

    Current Date and Time (DDD, MMM DD YYYY, HH:MM)

    The third item in the SOS banner line is the current date and time:
  • DDD denotes the day of the week.
  • MMM denotes the month.
  • DD denotes the day of the month.
  • YYYY denotes the year.
  • HH:MM denotes the hour and minutes.

    Elapsed Time (E: HH:MM:SS)

    The fourth item displayed in the SOS banner line is the elapsed time (E: HH:MM:SS), which is the time counted in hours, minutes, and seconds that has passed since the current session of SOS Performance Advisor was started.

    Current Interval (I: MM:SS)

    The last item displayed in the banner line is the current interval (I: MM:SS). The current interval is the amount of time in minutes and seconds accumulated since SOS last updated the screen. The measurements reported on any SOS display screen are valid for the current interval.

    By default, the interval refresh rate is 60 seconds. The rate can be adjusted from the Main Options menu screen. For further information, refer to "Screen refresh interval in seconds".
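
    If banner lines are captured to a log for later review, the elapsed time and current interval fields described above can be converted to seconds. The following Python sketch shows one way to do that. The function name parse_banner_times and the sample banner text are hypothetical, and the exact spacing of a real SOS banner may differ.

```python
import re


def parse_banner_times(banner: str) -> dict:
    """Extract the E: HH:MM:SS and I: MM:SS fields and return them in seconds."""
    elapsed = re.search(r"\bE:\s*(\d+):(\d{2}):(\d{2})", banner)
    interval = re.search(r"\bI:\s*(\d+):(\d{2})", banner)
    if elapsed is None or interval is None:
        raise ValueError("banner does not contain both E: and I: fields")
    hours, minutes, seconds = (int(part) for part in elapsed.groups())
    int_minutes, int_seconds = (int(part) for part in interval.groups())
    return {
        "elapsed_s": hours * 3600 + minutes * 60 + seconds,
        "interval_s": int_minutes * 60 + int_seconds,
    }


# Hypothetical banner text, laid out in the field order described above.
sample = "SOS 3000 G.03x  LPS  TUE, JAN 04 2000, 10:15  E: 00:12:34  I: 01:00"
times = parse_banner_times(sample)
```

    With the default 60-second refresh rate, the interval field would normally convert to a value close to 60.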

    Key Indicators of Performance (KIP) Line

    The Key Indicators of Performance (KIP) line can be displayed just below the SOS banner.


    Figure 9.4 SOS Global Summary screen: Key Indicators of Performance (KIP) line
    The purpose of the KIP line is to display statistics associated with the primary indicators of performance for the current interval. By default, these are Total (CPU) Busy, High Pri, MemMgr, and Read Hit (described below). To configure the KIP line, follow the instructions in "Display Key Indicators of Performance".
    Table 9.1 SOS Global Summary screen: Key Indicators of Performance (KIP) line data
    Data Item     Description
    Total Busy    The percentage of time the CPU spent executing all processes during the current interval.
    High Pri      The percentage of time the CPU spent executing high priority processes during the current interval.
    MemMgr        The percentage of time the CPU spent managing memory.
    Read Hit      The read hit percentage for the current interval.

    NOTE By editing the SOSKIP.PUB.LPS file you can redefine the variables to display in the KIP line. For information about editing the SOSKIP file, see "SOSKIP File".

    Global Statistics (graphical format)

    On the graphical version of the Global Summary screen, the Global Statistics block contains system-wide CPU, memory, and disc data. The graphical format uses a bar graph to display data either as percentages or as total numbers. Each block in a bar represents a value of 2 or 2%, except for the disc I/O queue length, where each block represents 0.2.


    Figure 9.5 SOS Global Summary screen (graphical format): Global Statistics
    The graphical format is easier to read than the tabular display (see "Tabular Format") but contains less detailed information.
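
    The bar scaling described above (one block per 2 or 2%, and one block per 0.2 for the disc I/O queue length) can be sketched in Python as follows. The function names, the "#" block character, and the choice to round to the nearest block are assumptions made for illustration; SOS itself may truncate or round differently.

```python
def bar_blocks(value: float, per_block: float = 2.0) -> int:
    """Return the number of bar blocks for a value at per_block units per block."""
    if per_block <= 0:
        raise ValueError("per_block must be positive")
    # Rounding to the nearest block is an assumption; SOS may truncate instead.
    return max(0, round(value / per_block))


def render_bar(value: float, per_block: float = 2.0, block: str = "#") -> str:
    """Render a text bar using one character per block."""
    return block * bar_blocks(value, per_block)


cpu_bar = render_bar(46.0)            # CPU busy of 46% at 2% per block
disc_qlen_bar = render_bar(1.0, 0.2)  # disc queue length of 1.0 at 0.2 per block
```

    Read in the other direction, a CPU bar that is 23 blocks long corresponds to roughly 46% busy.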



    Table 9.2 SOS Global Summary screen (graphical format): Global Statistics
    Data Item   Description
    CPU%        The percentage of CPU resource used in the major CPU states. The letter codes are described in Table 9.3. Space between the end of the video bar and the percent sign (%) denotes the percentage of time the CPU was idle.
    TRN/min     The estimated number of terminal reads per minute (roughly equivalent to user transactions) based on the number of actual terminal reads performed during the current interval.
    RHIT%       The percentage of time data requests are satisfied in main memory, without having to perform a disc I/O.
    IO/sec      The total number of disc I/Os performed per second on all disc devices, broken down into reads (R) and writes (W). For a detailed explanation of these statistics, see "Disc Data Items".
    TI CPU%     The percentage of utilized CPU resource that is dedicated to performing DBI calls (successful or not) by all processes. If option 16 in the SOS Main Option Screen is not set to Y (yes), this statistic is not displayed. See "SOS TurboIMAGE Database Main" for more information on TurboIMAGE statistics.
    QLEN        The average number of processes waiting to use the CPU during this interval. See "CPU QLen nn[nn]" for a detailed explanation of this statistic.
    RESP sec    The average global user prompt response time for terminal transactions. It is the time elapsed from when C/R or ENTER is pressed to the time the user receives a prompt.
    PFLT/s      The number of memory page faults that occur per second. This indicates whether or not there is enough memory. See "Memory Data Items".
    QLEN        The average number of disc I/O requests pending for all disc drives combined during the current interval. Each character position in the bar represents an average queue length of 0.2 requests. See "Disc Data Items".
    I/10/sec    The number of DBI calls (intrinsics) performed by all processes per second, divided by 10. See "SOS TurboIMAGE Database Main".
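
    Two of the rates in Table 9.2 are scaled values: TRN/min extrapolates the terminal reads counted in the current interval to a per-minute figure, and I/10/sec divides the per-second DBI call rate by 10. The Python sketch below shows that arithmetic. The function names are hypothetical, and the simple linear extrapolation is an assumption about how the estimate is formed.

```python
def trn_per_min(terminal_reads: int, interval_seconds: float) -> float:
    """Scale terminal reads counted in one interval to a per-minute estimate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return terminal_reads * 60.0 / interval_seconds


def dbi_calls_per_sec_div_10(dbi_calls: int, interval_seconds: float) -> float:
    """Return DBI calls per second divided by 10, as plotted in I/10/sec."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return dbi_calls / interval_seconds / 10.0


trn = trn_per_min(45, 90)                      # 45 reads in a 90-second interval
dbi_bar = dbi_calls_per_sec_div_10(3000, 60)   # 3000 DBI calls in 60 seconds
```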
    Table 9.3 SOS Global Summary: CPU states
    Letter      Description
    A,B,C,D,E   These letters indicate how much CPU time for the current interval is used to execute user and system code on behalf of those processes running in each respective scheduling queue. This is the time the CPU works constructively on our behalf as opposed to performing overhead tasks. MPE/iX system processes usually run in the A and B queues. The C queue is typically reserved for interactive user processes. Batch jobs usually run in the D and E queues. See "AQ, BQ, CQ, DQ, EQ nn.n[nn]".
    M           The percentage of CPU resource spent on managing main memory. See "Memory nn.n[nn]".
    O           The percentage of time the CPU spends scheduling and dispatching processes and dealing with external device activity. See "ICS/OH nn.n[nn]".
    P           The percentage of time the CPU was waiting for disc I/Os to complete. See "Pause nn.n[nn]".

    Global CPU Statistics (tabular format)

    On the Global Summary screen, the first main section of data is the Global CPU Statistics. This block of data contains system-wide CPU, memory, and disc statistics.
    The Global Statistics block can be toggled between graphical and tabular display formats by pressing the G key while on the Global Screen. The tabular format may also be obtained by setting option 5 (Display Option) in the Main Options menu to 2 - Tabular.


    Figure 9.6 SOS Global Summary screen (tabular format): Global CPU Statistics

    Total: nnn.n[nnn]

    The Total value displayed is the sum of CPU BUSY percentages for all queues, memory, dispatch, and ICS/OH.
    Performance Tip
    If this number consistently exceeds 85%, and the majority of this time is consumed by interactive user processing, it is possible that the CPU is creating a bottleneck on the system. It is important to gather this data over a period of time and not base a diagnosis on any single spike of activity. If the majority of this value is due to batch activity, this usually implies there is ample CPU capacity for interactive users.

    Hi Pri nnn.n[nnn]

    This is the percentage of CPU time spent on a combination of AQ, BQ, CQ, Memory, Dispatcher, and ICS/OH processes.
    Performance Tip
    It is generally understood that high priority busy time is a better indicator of CPU saturation than total busy time. If Hi Pri processes are consistently using 65% or more of the CPU's time, the CPU may be near saturation, because this leaves very little bandwidth for critical batch processes or for expected growth in the processes or users on the system.
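
    The Total and Hi Pri values are both sums of the individual CPU state percentages, so they can be recomputed from logged samples and checked against the 85% and 65% guidelines. The Python sketch below illustrates this. The CpuSample class and cpu_flags function are hypothetical names, and treating "consistently" as "in every sample of the window" is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CpuSample:
    """CPU state percentages for one interval (hypothetical container)."""
    aq: float
    bq: float
    cq: float
    dq: float
    eq: float
    memory: float
    dispatch: float
    ics_oh: float

    def hi_pri(self) -> float:
        # AQ, BQ, CQ, Memory, Dispatcher, and ICS/OH, as described for Hi Pri.
        return self.aq + self.bq + self.cq + self.memory + self.dispatch + self.ics_oh

    def total_busy(self) -> float:
        # All queues plus memory, dispatch, and ICS/OH.
        return self.hi_pri() + self.dq + self.eq


def cpu_flags(samples: Sequence[CpuSample]) -> dict:
    """Flag a guideline only when every sample in the window crosses it."""
    has_data = len(samples) > 0
    return {
        "total_over_85": has_data and all(s.total_busy() > 85.0 for s in samples),
        "hi_pri_at_or_over_65": has_data and all(s.hi_pri() >= 65.0 for s in samples),
    }


busy = CpuSample(aq=1.0, bq=4.0, cq=62.0, dq=15.0, eq=5.0,
                 memory=3.0, dispatch=2.0, ics_oh=4.0)
quiet = CpuSample(aq=1.0, bq=2.0, cq=20.0, dq=10.0, eq=0.0,
                  memory=2.0, dispatch=1.0, ics_oh=2.0)
flags_busy_window = cpu_flags([busy, busy])
flags_mixed_window = cpu_flags([busy, quiet])
```

    Requiring every sample in the window to cross the guideline keeps a single spike of activity from being reported as a bottleneck.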

    AQ, BQ, CQ, DQ, EQ nn.n[nn]

    These statistics indicate how much CPU time is spent executing user and system program codes within the respective scheduling queues. These statistics do not include time spent managing main memory, dispatching processes, or executing other overhead activity.
    Performance Tip
    If the sum of these percentages (particularly AQ, BQ, and CQ) is very large and little to no time is spent in any other active or paused states, one or more processes may be having difficulty completing, for example because of a looping condition. Identify the offending process(es) by finding the highest CPU user (use the HOG PROC ZOOM key for this). If the sum of these numbers is very low and other active or passive statistics are very high, an overhead task may be consuming the CPU and should be researched further. A low number in these process state counters (when other busy and paused counters are also low) means that there is plenty of CPU capacity available for more processing (batch or interactive). It is important to note how CPU time is spread across the queues. The AQ and BQ should have a very low percentage of the CPU, except for brief spikes. It is best for CQ, DQ, and EQ to receive the majority of the CPU, because other activities typically represent overhead and are therefore unproductive.

    Memory nn.n[nn]

    This figure represents how much CPU time is spent handling memory page activity. This counter includes time spent on memory allocations for user processes that cannot be launched (obtain the CPU) until necessary segments are present in memory.
    Performance Tip
    A slight memory load is indicated by memory manager percentages between 5% and 8%. Percentages between 8% and 12% are problematic, and readings of 12% and higher are unacceptable. These are rough guidelines and should be considered together with other memory and disc pulse point indicators. See "SOS Memory Detail" for more information on memory. The memory manager percentage tends to be more reliable on MPE/iX systems than on MPE V systems.
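
    The memory manager guidelines above can be expressed as a simple classification. In the Python sketch below, the function name and the rating labels are hypothetical, and the handling of the shared boundary values (8% counted as problematic, 12% as unacceptable) is an assumption, because the guideline ranges overlap at their edges.

```python
def memory_manager_rating(memmgr_pct: float) -> str:
    """Classify a memory manager CPU percentage against the rough guidelines."""
    if memmgr_pct >= 12.0:
        return "unacceptable"
    if memmgr_pct >= 8.0:
        return "problematic"
    if memmgr_pct >= 5.0:
        return "slight load"
    return "below guideline"


ratings = [memory_manager_rating(pct) for pct in (2.5, 6.0, 9.9, 12.0)]
```

    A rating from this kind of check is only a starting point; it should still be read alongside the other memory and disc indicators.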

    Dispatch nn.n[nn]

    This statistic represents the amount of time the CPU spends on scheduling and dispatching processes.
    Performance Tip
    If this value rises above 8%, it may mean that MPE/iX is spending an inordinate amount of time dealing with process launch and process stop activity. Correlate this figure with Launch/s, Individual Stop Detail, and Global Stops Detail to gain more insight as to why this is happening. If this figure becomes excessive, response times may increase.

    ICS/OH nn.n[nn]

    This statistic represents the time the CPU spends dealing with external device activity. Pressing RETURN or ENTER to get an MPE/iX prompt generates one such interrupt. Time spent handling disc I/O completions is also included here. Work on the Interrupt Control Stack (ICS) requires service time from the CPU.
    Performance Tip
    If this value rises above 8%, it may mean that MPE/iX is spending too much time on the DT subsystem, disc, or other datacomm interrupt activity. Locating processes guilty of excessive terminal reads (DTC activity) or processes with large numbers of disc I/Os will be helpful. A small ICS/OH value is desirable.

    Pause nn.n[nn]

    This statistic reveals the percentage of time the CPU spends waiting for disc I/Os to complete. This event is essentially a roadblock for further activity to take place, as no other functions can occur during this waiting period. This is time in which processes could have had work performed on their behalf, but could not because of the relative slowness of the disc drives in performing an I/O.
    Performance Tip
    A large pause percentage indicates that the CPU could have been busier, but because data was not found in main memory, the CPU had to wait on a disc I/O request and was not able to continue processing requests. If the pause percentage exceeds 10%, it may indicate either a disc I/O bottleneck or an inadequate memory configuration. It is best to correlate high pause readings with other memory and disc indicators to identify the true cause of the bottleneck.
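
    The Dispatch, ICS/OH, and Pause tips above each give a single guideline value (8%, 8%, and 10%). The Python sketch below checks one interval's readings against those values. The dictionary keys and the function name are hypothetical, and a result from one interval should be confirmed over a longer period before acting on it.

```python
# Guideline percentages taken from the Dispatch, ICS/OH, and Pause tips.
OVERHEAD_GUIDELINES = {"dispatch": 8.0, "ics_oh": 8.0, "pause": 10.0}


def overhead_warnings(sample: dict) -> list:
    """Return the names of the statistics that exceed their guideline."""
    return [
        name
        for name, limit in OVERHEAD_GUIDELINES.items()
        if sample.get(name, 0.0) > limit
    ]


# Hypothetical readings for one interval.
warnings = overhead_warnings({"dispatch": 3.1, "ics_oh": 9.4, "pause": 14.0})
```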

    Idle nn.n[nn]

    If the CPU is not actively working on processes and not waiting for any disc I/Os to complete, it is considered to be in an idle state. Simply stated, this is leftover CPU bandwidth.
    Performance Tip
    If a system has a consistently high amount of idle time, it is not being used to its full potential. While it is not desirable to overload the processor, having a system that is too powerful for the processing that is required of it is not cost-effective. If idle time is very low due to a large amount of batch activity, then the system has bandwidth available for batch or interactive growth. However, if the idle time is very low due to mostly interactive processing, then the system may be overloaded. Reducing processes, balancing processes to off-shift or low-use hours, or upgrading the processor will help reduce the load on the system during peak utilization hours.

    CPU QLen nn[nn]

    This statistic represents the average number of processes that are waiting for service from the CPU.
    Performance Tip
    A CPU queue is like the line at a bank. If a teller is immediately available when you walk in, there is no line (queue), and you are served immediately. If there is only one teller available, however, and three people walked in ahead of you, the teller would help the first person to walk in, and you'd be standing in a line of three people. If, however, there are four tellers available, all four customers could be served immediately. An efficient bank teller will process transactions quickly, so that even as customers filter into the bank, the line is always kept to a minimum.

    A consistently large CPU queue length indicates a CPU bottleneck. This could be caused by an inadequate model or fewer processors than necessitated by the amount of processing. It could also be caused by a very high job limit or too many jobs scheduled to start up concurrently. Ideally, this number will always be under five, but may reach as high as 10 during moderate to heavy processing. A consistent reading of 10 or higher should be investigated and addressed.
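
    Because the guideline depends on readings being consistently high, a check on CPU queue length is best applied to a series of intervals rather than one value. The Python sketch below is one way to do that. The function name, the labels, and the min_fraction parameter (the share of readings at 10 or higher that counts as "consistent") are assumptions, not SOS settings.

```python
from typing import Sequence


def cpu_qlen_assessment(
    readings: Sequence[float],
    high: float = 10.0,
    ideal: float = 5.0,
    min_fraction: float = 0.8,
) -> str:
    """Assess a series of CPU queue length readings against the guidelines."""
    if not readings:
        raise ValueError("at least one reading is required")
    high_share = sum(1 for r in readings if r >= high) / len(readings)
    if high_share >= min_fraction:
        return "investigate"  # consistently at or above the high mark
    if all(r < ideal for r in readings):
        return "ideal"        # always under the ideal mark
    return "watch"            # occasional higher readings


light = cpu_qlen_assessment([2, 3, 1, 4])
heavy = cpu_qlen_assessment([12, 11, 15, 10, 9])
mixed = cpu_qlen_assessment([3, 7, 12, 4])
```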

    Launch/s nn[nn]

    A process launch occurs when a process gains exclusive access to the CPU. When that process has to stop for an event (a disc I/O is most common), it relinquishes the CPU and another process is launched instead. The launch rate indicates the amount of CPU sharing that is taking place on the system. If a single process is launched many times, each launch is included in this number.
    Performance Tip
    A high launch rate should be evaluated to find out why processes are giving up the CPU so often. If processes are waiting on certain events (memory, disc I/O, etc.), it is possible that not enough resources are available to adequately service all requests. The Extended Process section (Wait Heading or Wait States) will further explain why processes are having to share the CPU so often. The Global Stops screen is also helpful, because the number of Process Launches will be roughly equivalent to the number of Process Stops. The ideal situation is that a process never has to give up the CPU and is processed to completion unhindered. Acceptable numbers of process launches depend on the size of the processor.

    CPU CM%

    This statistic represents the average amount of time the CPU spends in Compatibility Mode program code.
    Performance Tip
    This number can assist with optimizing performance from an MPE V migration. It is important to have as many programs as possible compiled in Native Mode to take full advantage of the Hewlett-Packard Precision Architecture (HP-PA - also known as RISC). The time the CPU spends in Compatibility Mode represents wasted time because code translations must take place. If the programs are compiled with a native language compiler, the translation is done once for all programs at compile time. There may not be a right or wrong value for this indicator on your system. The ability to go "native" is often dependent upon third-party software. If a third-party vendor has not made the switch from MPE V to MPE/iX, then you must remain in Compatibility Mode. This means a performance compromise. It is best to target a value of less than 20%.

    SAQ

    This is the System Average Quantum. It is similar to the ASTT (Average Short Transaction Time) on MPE V systems. It is an ongoing average of the amount of CPU time used by transactions that are considered to be short in nature, and it includes the last 100 or so terminal transactions the system has tracked. This number is a valuable indicator of the type of activity taking place in the CS scheduling queue. For example, an SAQ of 11 milliseconds (extremely small) means the amount of CPU time used by interactive processes to accomplish their transactions was very low.

    TI CPU% nn.n[nn.n]

    This is the percentage of utilized CPU resource that is being dedicated for all DBI calls performed on the system by all processes on any database.

    Global Misc Statistics (tabular format)

    The Global Misc Statistics portion of the tabular global screen provides statistics to further analyze the condition of your system. These statistics often help to indicate a bottleneck in one of the three main components of the system (CPU, memory, and disc), but they are not directly related to any of them and so fall into their own "miscellaneous" category. See Table 9.4 for details on these data items.


    Figure 9.7 Global Summary screen (tabular format): Global Misc Statistics

    Miscellaneous Display Options

    The Miscellaneous Statistics are, by default, displayed on the Global Tabular screen. To suppress the display of Miscellaneous Statistics:
  • Press O from the Global Summary screen to access the SOS Main Options menu.
  • Select option 5 - Display option.
  • Type 3 to suppress the display, then press Enter.
  • Press Enter again and type Y if you want to save these options permanently, or press Enter again to exit the options without saving.

    Miscellaneous Data Items


    Table 9.4 SOS Global Summary screen (tabular format): Global Misc Statistics
    Data Item
    Description
    #Ses
    The number of sessions logged on to the system during the current interval.
    #Job
    The number of batch jobs logged on to the system during the current interval.
    #Proc
    The number of processes present during the current interval. One job or session may spawn several processes. MPE/iX requires many processes for normal operation.
    CM to NM Switches/sec
    These values represent the number of Compatibility Mode to Native Mode switches performed per second for the current interval, as well as cumulatively.
    Performance Tip
    A CM to NM switch occurs when a piece of code that is executed reverts from Compatibility Mode to Native Mode. This operation is not as expensive to perform as is NM to CM switching. Depending on the system size, more than 200 per second can be sustained without being an excessive overhead drain on the CPU.
    NM to CM Switches/sec
    These values represent the number of Native Mode to Compatibility Mode switches performed per second during the current interval, as well as cumulatively.
    Performance Tip
    An NM to CM switch occurs when a piece of code that is executed reverts from Native Mode to Compatibility Mode. This operation is quite expensive to perform and should be minimized. Depending on the system size, more than 50 per second may indicate an overhead drain on the CPU. It is best to "go native" whenever possible. However, this can cause an increased