
Lund Performance Solutions


SOS Global Summary
The SOS Global Summary screen is the first screen to display when you start SOS and the usual starting point for any review of system activity and performance. The screen can be displayed in either graphical or tabular format.
To access the Global Summary screen from any SOS display screen:
  • Type S from the SOS Enter command prompt to view the Screen Selection Menu.
  • From the Screen Selection Menu, enter G (Global Summary).
  • To toggle between the graphical and tabular display formats, press the G key again.
    Graphical Format

    Figure 9.1 shows an example of the Global Summary screen in graphical format.


    Figure 9.1 SOS Global Summary screen (graphical format)
    This example screen contains the following information:
  • The SOS banner
  • The Key Indicators of Performance
  • Global statistics
  • Process information (optional)
  • System Performance Advice (optional)
    Each of these features is described in "Global Screen Display Items".

    Tabular Format

    To toggle between the graphical and tabular format options, press the G key from the Global Summary screen. Figure 9.2 shows an example of the Global Summary screen in tabular format.


    Figure 9.2 SOS Global Summary screen (tabular format)
    The tabular Global Summary screen provides:
  • The SOS banner
  • The Key Indicators of Performance
  • Global CPU Statistics
  • Global Misc. Statistics
  • Global Memory Statistics (optional, and not displayed in Figure 9.2)
  • Global Disc Statistics (optional, and not displayed in Figure 9.2)
  • Process Information (optional)
  • Process Summary by Application Workloads (optional, and not displayed in Figure 9.2)
  • System Performance Advice messages (optional)
    Each of these components is described in detail in the next section, "Global Screen Display Items."

    Global Screen Display Items

    SOS Banner

    The SOS banner is always displayed at the top of all SOS data display screens.


    Figure 9.3 SOS Global Summary screen: SOS banner
    The SOS banner contains information about the SOS program, the host system, the elapsed interval, and the current interval.

    Product Version Number (SOS 3000 V.nnx)

    The first item displayed in the SOS banner (reading left to right) is the product version number (SOS 3000 V.nnx). The version number denotes the following about the product:
  • SOS 3000 is the name of the product.
  • V denotes the major version level.
  • nn denotes the minor version level.
  • x denotes the fix level.
    The SOS version number displayed in the example (refer to Figure 9.3) is G.03x. When contacting technical support, please provide the product version number of the software installed on your system.

    System Name

    The second item displayed in the SOS banner line is the name of the system given during the installation of the operating system. The name of the system used in the example shown in Figure 9.3 is LPS.

    Current Date and Time (DDD, MMM DD YYYY, HH:MM)

    The third item in the SOS banner line is the current date and time:
  • DDD denotes the day of the week.
  • MMM denotes the month.
  • DD denotes the day of the month.
  • YYYY denotes the year.
  • HH:MM denotes the hour and minutes.
    Elapsed Time (E: HH:MM:SS)

    The fourth item displayed in the SOS banner line is the elapsed time (E: HH:MM:SS), which is the time counted in hours, minutes, and seconds that has passed since the current session of SOS Performance Advisor was started.

    Current Interval (I: MM:SS)

    The last item displayed in the banner line is the current interval (I: MM:SS). The current interval is the amount of time in minutes and seconds accumulated since SOS last updated the screen. The measurements reported on any SOS display screen are valid for the current interval.

    By default, the interval refresh rate is 60 seconds. The rate can be adjusted from the Main Options menu screen. For further information, refer to "Screen refresh interval in seconds".
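    The banner fields described above can be read programmatically from captured screen text. The following Python sketch extracts the elapsed time and the current interval from a banner line. The function name and the sample banner string are assumptions made for this illustration; the spacing on an actual screen may differ.

```python
import re

def parse_banner_times(banner):
    """Return the elapsed time and current interval, in seconds."""
    # E: HH:MM:SS is the time since this SOS session was started.
    elapsed = re.search(r"\bE:\s*(\d+):(\d{2}):(\d{2})", banner)
    # I: MM:SS is the time since SOS last updated the screen.
    interval = re.search(r"\bI:\s*(\d+):(\d{2})", banner)
    if not elapsed or not interval:
        raise ValueError("banner does not contain both E: and I: fields")
    hours, minutes, seconds = (int(x) for x in elapsed.groups())
    i_minutes, i_seconds = (int(x) for x in interval.groups())
    return {
        "elapsed_s": hours * 3600 + minutes * 60 + seconds,
        "interval_s": i_minutes * 60 + i_seconds,
    }

# Hypothetical banner text, laid out in the documented field order.
sample = "SOS 3000 G.03x  LPS  TUE, JAN 04 2000, 10:15  E: 01:02:03  I: 01:00"
print(parse_banner_times(sample))
```

    With the default refresh rate, the interval field would report 60 seconds, as in the sample.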

    Key Indicators of Performance (KIP) Line

    The Key Indicators of Performance (KIP) line can be displayed just below the SOS banner.


    Figure 9.4 SOS Global Summary screen: Key Indicators of Performance (KIP) line
    The purpose of the KIP line is to display statistics associated with the primary indicators of performance for the current interval—by default, Total (CPU) Busy, High Pri, MemMgr, and Read Hit data (described below). To configure the KIP line, follow the instructions in "Display Key Indicators of Performance".
    Table 9.1 SOS Global Summary screen: Key Indicators of Performance (KIP) line data
    Data Item
    Description
    Total Busy
    The percentage of time the CPU spent executing all processes during the current interval.
    High Pri
    The percentage of time the CPU spent executing high priority processes during the current interval.
    MemMgr
    The percentage of time the CPU spent managing memory.
    Read Hit
    The read hit percentage for the current interval.

    NOTE By editing the SOSKIP.PUB.LPS file you can redefine the variables to display in the KIP line. For information about editing the SOSKIP file, see "SOSKIP File".

    Global Statistics (graphical format)

    On the graphical version of the Global Summary screen, the Global Statistics block contains system-wide CPU, memory, and disc data. The graphical format uses a bar graph to display data either in percentages or total numbers. Each block displayed reflects a value of 2 or 2%, except for the disc I/O queue length, where each block reflects a value of 0.2.


    Figure 9.5 SOS Global Summary screen (graphical format): Global Statistics
    The graphical format is easier to read than the tabular display (see "Tabular Format") but contains less detailed information.
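    To make the bar scaling concrete, the following Python sketch converts a value into a number of bar positions, using 2 (or 2%) per block for most items and 0.2 per block for the disc I/O queue length. The function names, the "#" symbol, and the choice to round to the nearest block are assumptions for illustration; the manual does not state how SOS rounds partial blocks.

```python
def bar_blocks(value, per_block=2.0):
    """Return the number of bar positions that represent value."""
    if value < 0 or per_block <= 0:
        raise ValueError("value must be >= 0 and per_block must be > 0")
    # Rounding to the nearest block is an assumption, not documented behavior.
    return round(value / per_block)

def render_bar(value, per_block=2.0, symbol="#"):
    """Draw a simple text bar; "#" stands in for the on-screen characters."""
    return symbol * bar_blocks(value, per_block)

print(render_bar(46.0))       # a CPU% of 46 at 2% per block
print(render_bar(1.0, 0.2))   # a disc queue length of 1.0 at 0.2 per block
```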



    Table 9.2 SOS Global Summary screen (graphical format): Global Statistics
    Data Item
    Description
    CPU%
    The percentage of CPU resource used in the major CPU states. The letter codes are described in Table 9.3.
    Space between the end of the video bar and the percent sign (%) denotes the percentage of time the CPU was idle.
    TRN/min
    The estimated number of terminal reads per minute (roughly equivalent to user transactions) based on the number of actual terminal reads performed during the current interval.
    RHIT%
    The percentage of time data requests are satisfied in main memory, without having to perform a disc I/O.
    IO/sec
    The total number of disc I/Os performed on all disc devices, broken down into reads (R) and writes (W). For a detailed explanation of these statistics, see "Disc Data Items".
    TI CPU%
    The percentage of utilized CPU resource that is dedicated to performing DBI calls (successful or not) by all processes. If option 16 in the SOS Main Option Screen is not set to Y (yes), then this statistic will not be displayed. See "SOS TurboIMAGE Database Main" for more information on TurboIMAGE statistics.
    QLEN
    The average number of processes waiting to use the CPU during this interval. See "CPU QLen nn[nn]" for a detailed explanation of this statistic.
    RESP sec
    The average global user prompt response time for terminal transactions. It is the time elapsed from when C/R or ENTER is pressed to the time the user receives a prompt.
    PFLT/s
    The number of memory page faults that occur per second. This indicates whether or not there is enough memory. See "Memory Data Items".
    QLEN
    The average number of disc I/O requests pending for all disc drives combined during the current interval. Each character position in the bar represents an average queue length of 0.2 requests. See "Disc Data Items".
    I/10/sec
    The number of DBI calls (intrinsics) performed by all processes per second, divided by 10. See "SOS TurboIMAGE Database Main".
    Table 9.3 SOS Global Summary: CPU states
    Letter
    Description
    A,B,C,D,E
    These letters indicate how much CPU time for the current interval is used to execute user and system code on behalf of those processes running in each respective scheduling queue. This is the time the CPU works constructively on our behalf as opposed to performing overhead tasks. MPE/iX system processes usually run in the A and B queues. The C queue is typically reserved for interactive user processes. Batch jobs usually run in the D and E queues. See "AQ, BQ, CQ, DQ, EQ nn.n[nn]".
    M
    The percentage of CPU resource spent on managing main memory. See "Memory nn.n[nn]".
    O
    The percentage of time the CPU spends scheduling and dispatching processes and dealing with external device activity. See "ICS/OH nn.n[nn]".
    P
    The percentage of time the CPU was waiting for disc I/Os to complete. See "Pause nn.n[nn]".

    Global CPU Statistics (tabular format)

    On the Global Summary screen, the first main section of data is the Global CPU Statistics. This block of data contains system-wide CPU, memory, and disc statistics.
    The Global Statistics block can be toggled between graphical and tabular display formats by pressing the G key while on the Global Screen. The tabular format may also be obtained by setting option 5 (Display Option) in the Main Options menu to 2 - Tabular.


    Figure 9.6 SOS Global Summary screen (tabular format): Global CPU Statistics

    Total: nnn.n[nnn]

    The Total value displayed is the sum of CPU BUSY percentages for all queues, memory, dispatch, and ICS/OH.
    Performance Tip
    If this number consistently exceeds 85%, and the majority of this time is consumed by interactive user processing, it is possible that the CPU is creating a bottleneck on the system. It is important to gather this data over a period of time and not base a diagnosis on any single spike of activity. If the majority of this value is due to batch activity, this usually implies there is ample CPU capacity for interactive users.

    Hi Pri nnn.n[nnn]

    This is the percentage of CPU time spent on a combination of AQ, BQ, CQ, Memory, Dispatcher, and ICS/OH processes.
    Performance Tip
    It is generally understood that measuring the high priority busy time is a better indicator of CPU saturation than total busy. If Hi Pri processes are consistently using 65% or more of the CPU’s time, the CPU may be near saturation levels, as this would leave very little bandwidth for critical batch processes or expected growth to the processes or users on the system.
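    The Total and Hi Pri values can be derived from the individual CPU statistics as described above. The following Python sketch shows that arithmetic and one possible way to test whether a guideline is exceeded "consistently". The function names and the rule that at least 75% of sampled intervals must meet the threshold are assumptions for illustration; the manual does not define "consistently".

```python
def cpu_totals(aq, bq, cq, dq, eq, memory, dispatch, ics_oh):
    """Return (total_busy, hi_pri) percentages from the component values."""
    # Hi Pri combines AQ, BQ, CQ, Memory, Dispatcher, and ICS/OH time.
    hi_pri = aq + bq + cq + memory + dispatch + ics_oh
    # Total adds the remaining (DQ and EQ) queue time.
    total = hi_pri + dq + eq
    return total, hi_pri

def consistently_at_or_above(samples, threshold, fraction=0.75):
    """True if at least `fraction` of the samples reach the threshold."""
    if not samples:
        return False
    hits = sum(1 for s in samples if s >= threshold)
    return hits / len(samples) >= fraction

total, hi_pri = cpu_totals(aq=1.0, bq=2.0, cq=40.0, dq=20.0, eq=5.0,
                           memory=3.0, dispatch=2.0, ics_oh=4.0)
print(total, hi_pri)
print(consistently_at_or_above([70, 66, 68, 72, 60], 65.0))
```

    Collecting samples over many intervals, rather than reacting to one reading, follows the advice not to base a diagnosis on a single spike of activity.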

    AQ, BQ, CQ, DQ, EQ nn.n[nn]

    These statistics indicate how much CPU time is spent executing user and system program codes within the respective scheduling queues. These statistics do not include time spent managing main memory, dispatching processes, or executing other overhead activity.
    Performance Tip
    If the sum of these percentages (particularly AQ, BQ, and CQ) is very large and there is little to no time spent in any active or paused states, it is possible that one or more processes are experiencing difficulties completing, such as a looping condition. The offending process(es) should be identified by finding the highest CPU user (use the HOG PROC ZOOM key for this). If the sum of these numbers is very low and other active or passive statistics are very high, then an overhead task may be consuming the CPU and should be researched further. A low number in these process state counters (when other busy and paused counters are low) means that there is plenty of CPU capacity available for more processing (batch or interactive). It is important to note the spread of CPU in various queues. The AQ and BQ should have a very low percentage of the CPU, except for brief spikes. It is best to see that CQ, DQ, and EQ obtain the majority of the CPU because other activities typically represent overhead and are therefore unproductive tasks.

    Memory nn.n[nn]

    This figure represents how much CPU time is spent handling memory page activity. This counter includes time spent on memory allocations for user processes that cannot be launched (obtain the CPU) until necessary segments are present in memory.
    Performance Tip
    A slight memory load is indicated by memory manager percentages of between 5% and 8%. Problematic percentages are between 8% and 12%, and unacceptable readings are 12% and higher. These are simply rough guidelines, and should be taken into consideration with other memory and disc pulse point indicators. See "SOS Memory Detail" for more information on memory. The memory manager percentage tends to be more reliable on MPE/iX systems than on MPE V systems.
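    The memory manager guidelines above can be expressed as a small classification function. In this Python sketch the function name, the label for readings below 5%, and the treatment of boundary values (each lower bound is inclusive) are assumptions, since the stated ranges share their endpoints.

```python
def classify_memory_manager(pct):
    """Map a Memory (memory manager) CPU percentage to a rough guideline label."""
    if pct >= 12.0:
        return "unacceptable"
    if pct >= 8.0:
        return "problematic"
    if pct >= 5.0:
        return "slight load"
    # The manual gives no label below 5%; this wording is an assumption.
    return "below guideline ranges"

for reading in (2.5, 6.0, 9.5, 14.0):
    print(reading, classify_memory_manager(reading))
```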

    Dispatch nn.n[nn]

    This statistic represents the amount of time the CPU spends on scheduling and dispatching processes.
    Performance Tip
    If this value rises above 8%, it may mean that MPE/iX is spending an inordinate amount of time dealing with process launch and process stop activity. Correlate this figure with Launch/s, Individual Stop Detail, and Global Stops Detail to gain more insight as to why this is happening. If this figure becomes excessive, response times may increase.

    ICS/OH nn.n[nn]

    This statistic represents the time the CPU spends dealing with external device activity. Pressing RETURN or ENTER to get an MPE/iX prompt is one such interrupt. Time spent handling disc I/O completions is also included here. Interrupt Control Stack (ICS) requires service time by the CPU.
    Performance Tip
    If this value rises above 8%, it may mean that MPE/iX is spending too much time on the DT subsystem, disc, or other datacomm interrupt activity. Locating processes guilty of excessive terminal reads (DTC activity) or processes with large numbers of disc I/Os will be helpful. A small ICS/OH value is desirable.

    Pause nn.n[nn]

    This statistic reveals the percentage of time the CPU spends waiting for disc I/Os to complete. This event is essentially a roadblock for further activity to take place, as no other functions can occur during this waiting period. This is time in which processes could have had work performed on their behalf, but could not because of the relative slowness of the disc drives in performing an I/O.
    Performance Tip
    A large pause percentage indicates that the CPU could have been busier, but because data was not found in main memory, the CPU had to wait on a disc I/O request and was not able to continue processing requests. If the pause percentage exceeds 10%, it may indicate either a disc I/O bottleneck or an inadequate memory configuration. It is best to correlate high pause readings with other memory and disc indicators to identify the true cause of the bottleneck.
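    The Dispatch, ICS/OH, and Pause tips each give a single guideline value (8%, 8%, and 10%). The following Python sketch checks one interval's readings against those values. The dictionary keys and the function name are hypothetical, and a flagged item is only a prompt to correlate with other indicators, as the tips advise.

```python
# Guideline limits taken from the performance tips above (percent of CPU time).
GUIDELINES = {"dispatch": 8.0, "ics_oh": 8.0, "pause": 10.0}

def flag_overhead(sample):
    """Return the names of statistics in `sample` that exceed their guideline."""
    return sorted(
        name for name, limit in GUIDELINES.items()
        if sample.get(name, 0.0) > limit
    )

# Hypothetical readings for one interval.
interval = {"dispatch": 9.1, "ics_oh": 3.0, "pause": 12.5}
print(flag_overhead(interval))
```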

    Idle nn.n[nn]

    If the CPU is not actively working on processes and not waiting for any disc I/Os to complete, it is considered to be in an idle state. Simply stated, this is leftover CPU bandwidth.
    Performance Tip
    If a system has a consistently high amount of idle time, it is not being used to its full potential. While it is not desirable to overload the processor, having a system that is too powerful for the processing that is required of it is not cost-effective. If idle time is very low due to a large amount of batch activity, then the system has bandwidth available for batch or interactive growth. However, if the idle time is very low due to mostly interactive processing, then the system may be overloaded. Reducing processes, balancing processes to off-shift or low-use hours, or upgrading the processor will help reduce the load on the system during peak utilization hours.

    CPU QLen nn[nn]

    This statistic represents the average number of processes that are waiting for service from the CPU.
    Performance Tip
    A CPU queue length is like going to the bank. If, when you walk in, a teller is immediately available, there is no line (queue), and you are serviced immediately. If there is only one teller available, however, and three people walked in ahead of you, the teller would help the first person to walk in, and you’d be standing in a line of three people. If, however, there are four tellers available, all four customers could be immediately served. An efficient bank teller will process transactions quickly, so that even as customers filter in to the bank, the line is always kept to a minimum.

    A consistently large CPU queue length indicates a CPU bottleneck. This could be caused by an inadequate model or fewer processors than necessitated by the amount of processing. It could also be caused by a very high job limit or too many jobs scheduled to start up concurrently. Ideally, this number will always be under five, but may reach as high as 10 during moderate to heavy processing. A consistent reading of 10 or higher should be investigated and addressed.
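    The queue length guidance above can be sketched in Python as follows. The labels, the function names, and the rule that "consistent" means the last three readings are all 10 or higher are assumptions for illustration only.

```python
def classify_cpu_qlen(qlen):
    """Label a single CPU queue length reading using the guideline values."""
    if qlen < 5:
        return "ideal"
    if qlen < 10:
        return "acceptable under moderate to heavy load"
    return "investigate if consistent"

def sustained_high(readings, limit=10.0, run=3):
    """True if the most recent `run` readings are all at or above `limit`."""
    recent = readings[-run:]
    return len(recent) == run and all(r >= limit for r in recent)

print(classify_cpu_qlen(3))
print(sustained_high([4, 11, 12, 10]))
```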

    Launch/s nn[nn]

    A process launch occurs when a process gains exclusive access to the CPU. When that process has to stop for an event (a disc I/O is most common), it relinquishes the CPU and another process is launched instead. The launch rate indicates the amount of CPU sharing that is taking place on the system. If a single process is launched many times, each launch is included in this number.
    Performance Tip
    A high launch rate should be evaluated to find out why processes are giving up the CPU so often. If processes are waiting on certain events (memory, disc I/O, etc.), it is possible that not enough resources are available to adequately service all requests. The Extended Process section (Wait Heading or Wait States) will further explain why processes are having to share the CPU so often. The Global Stops screen is helpful because the Process Launches will be roughly equivalent to the number of Process Stops. The ideal situation is that a process never has to give up the CPU and is processed to completion unhindered. Acceptable numbers of process launches are dependent upon the size of the processor.

    CPU CM%

    This statistic represents the average amount of time the CPU spends in Compatibility Mode program code.
    Performance Tip
    This number can assist with optimizing performance from an MPE V migration. It is important to have as many programs as possible compiled in Native Mode to take full advantage of the Hewlett-Packard Precision Architecture (HP-PA - also known as RISC). The time the CPU spends in Compatibility Mode represents wasted time because code translations must take place. If the programs are compiled with a native language compiler, the translation is done once for all programs at compile time. There may not be a right or wrong value for this indicator on your system. The ability to go "native" is often dependent upon third-party software. If a third-party vendor has not made the switch from MPE V to MPE/iX, then you must remain in Compatibility Mode. This means a performance compromise. It is best to target a value of less than 20%.

    SAQ

    This is the System Average Quantum. This is similar to the ASTT (Average Short Transaction Time) on MPE V systems. It is an ongoing average of the amount of CPU time used by transactions that are considered to be short in nature. It includes the last 100 or so terminal transactions the system has tracked. This number is a valuable indicator of the type of activity taking place in the CS scheduling queue. For example, if the SAQ is 11 milliseconds (extremely small), this means the amount of CPU time used by interactive processes to accomplish their transactions was very low.

    TI CPU% nn.n[nn.n]

    This is the percentage of utilized CPU resource that is being dedicated for all DBI calls performed on the system by all processes on any database.

    Global Misc Statistics (tabular format)

    The Global Misc Statistics portion of the tabular global screen provides statistics to further analyze the condition of your system. These statistics, while often helping to indicate a bottleneck in any of the three main components of the system (CPU, memory, and disc), are not directly related to any of them, and so fall into their own "miscellaneous" category. See Table 9.4 for details on these data items.


    Figure 9.7 Global Summary screen (tabular format): Global Misc Statistics

    Miscellaneous Display Options

    The Miscellaneous Statistics are, by default, displayed on the Global Tabular screen. To suppress the display of Miscellaneous Statistics:
  • Press O from the Global Summary screen to access the SOS Main Options menu.
  • Select option 5 - Display option.
  • Type 3. This will suppress the Display option. Press Enter.
  • Press Enter again and type Y if you want to save these options permanently, or press Enter again to exit the options without saving.
    Miscellaneous Data Items


    Table 9.4 SOS Global Summary screen (tabular format): Global Misc Statistics
    Data Item
    Description
    #Ses
    The number of sessions logged on to the system during the current interval.
    #Job
    The number of batch jobs logged on to the system during the current interval.
    #Proc
    The number of processes present during the current interval. One job or session may spawn several processes. MPE/iX requires many processes for normal operation.
    CM to NM Switches/sec
    These values represent the number of Compatibility Mode to Native Mode switches performed per second for the current interval, as well as cumulatively.
    Performance Tip
    A CM to NM switch occurs when a piece of code that is executed reverts from Compatibility Mode to Native Mode. This operation is not as expensive to perform as is NM to CM switching. Depending on the system size, more than 200 per second can be sustained without being an excessive overhead drain on the CPU.
    NM to CM Switches/sec
    These values represent the number of Native Mode to Compatibility Mode switches performed per second during the current interval, as well as cumulatively.
    Performance Tip
    An NM to CM switch occurs when a piece of code that is executed reverts from Native Mode to Compatibility Mode. This operation is quite expensive to perform and should be minimized. Depending on the system size, more than 50 per second may indicate an overhead drain on the CPU. It is best to "go native" whenever possible. However, this can cause an increased dependency on the application design.
    Transactions
    This line contains three statistics regarding terminal reads. The first value is the number of terminal reads performed by all interactive terminal users for the current interval. The second value (within brackets) is the total number of terminal reads performed since SOS/3000 was initially started. The third value (within parentheses) is the estimated number of terminal reads per minute based on the current interval’s workload.
    Performance Tip
    It is essential to understand MPE/iX’s definition of a terminal read. A terminal read occurs any time a terminal receives input from a user (C/R or ENTER). The true number of transactions, as we define transactions, is likely to be less than what is reported on this line by SOS. If you are using Character Mode and your application defines a transaction as being delimited by a single carriage return, these numbers will represent interactive activity. VPLUS applications will have accurate transaction counts.
    TI Intrinsic/s
    The average number of DBI calls performed on the system by all processes. The value outside the brackets is the number of DBI calls processed per second during the current interval. The value inside the brackets is the number of DBI calls performed per second since SOS was started.
    Avg Prompt Resp
    The average system response time for all processes that execute terminal reads. It is the time it takes from when a user presses C/R or ENTER to when the user is supplied a new prompt. Current and cumulative values are displayed. This value includes both Command Interpreter and Application Process response times. Average response time will not be displayed if the option to collect process information is turned off (option 8 in the SOS Main Option menu).
    Performance Tip
    There are some important things to consider when evaluating average response times. First, applications that perform multiple Character Mode terminal reads to issue a single user transaction will have varied response times reported. For example, if a user enters data into 20 fields on a screen, and then presses a final RETURN, this is considered to be a single transaction by the user. However, MPE/iX counts 21 terminal reads, and SOS will report 21 transactions, 20 for the fields and 1 for the carriage return.

    Second, Command Interpreter times are included in these numbers. When a user or job logs on, a process called Command Interpreter (CI) is created by MPE/iX on behalf of that user. This process communicates with the user at the terminal by means of the MPE/iX prompt. When a program such as EDITOR.PUB.SYS is started, another process is created. The response time of the CI process envelops the second process, so that when you look at the CI process’s response time at the process level, it will probably be very large. It is not usually a helpful number when a significant number of CI processes are included. Basically, any process that creates a son process that performs terminal I/Os will have its response time elevated by the son’s total time.

    Finally, notice that this value is especially important if users perform a great deal of on-line terminal reporting. Charting this value, especially when contrasted with terminal reads, will provide insight into what kind of response times the users are actually experiencing.
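    The relationship between terminal reads, the per-minute estimate, and user-defined transactions can be worked through with a short Python sketch. The function names are hypothetical, and the per-minute formula is an assumption about how an interval count would be scaled to a rate; the second function reproduces the 20-field example above, in which one user transaction produces 21 terminal reads.

```python
def reads_per_minute(reads_in_interval, interval_seconds):
    """Scale the terminal reads counted in one interval to a per-minute rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return reads_in_interval * 60.0 / interval_seconds

def estimated_user_transactions(terminal_reads, reads_per_transaction):
    """Estimate user-defined transactions when each one takes several reads."""
    if reads_per_transaction <= 0:
        raise ValueError("reads_per_transaction must be positive")
    return terminal_reads / reads_per_transaction

print(reads_per_minute(45, 30))              # 45 reads in a 30-second interval
print(estimated_user_transactions(210, 21))  # 20 fields plus a final RETURN
```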

    Global Memory Statistics (tabular format)

    The Global Memory Statistics section of the Global Screen focuses on indicators that are primarily memory related. To view specific memory statistics, refer to the "SOS Memory Detail" screen.


    Figure 9.8 SOS Global Summary screen (tabular format): Global Memory Statistics

    Memory Display Options

    To display or suppress the Global Memory Statistics in the Global screen, enable or disable option 6 - "Display memory information on global screen"- in the SOS Main Options Menu screen.

    Memory Data Items

    The data items presented in the Global Memory Statistics portion of the Global screen are described in Table 9.5.

    Table 9.5 SOS Global Summary screen (tabular format): Global Memory Statistics
    Data Item
    Description
    Page Fault Rate
    The current and cumulative number of times per second that memory page faulting occurs. A page fault occurs when a process needs a memory object (code or data) that is absent from main memory. Acceptable ranges depend on system size. See Table 9.6.
    Lbry Fault %
    The percentage of all page faults that occurred because system libraries were not present in memory (XL.PUB.SYS, NL.PUB.SYS, SL.PUB.SYS). A system library page fault is counted when a process needs code from a library that is absent from main memory.
    Performance Tip
    A consistent value of more than 10% of page faults due to library faults can indicate memory shortage or an inappropriate demand on memory.
    Memory Cycles
    The number of times the memory manager cycles through main memory looking for adequate space to satisfy requests for memory, during the current interval and cumulatively. A large number indicates that the requests for memory are not being satisfied efficiently. A low number implies that there is adequate memory. If this value is blank or zero (0), the clock was not active during this interval. This is the best possible situation.
    Read Hit %
    The percentage of time that requests for data or code were satisfied in main memory without having to resort to a disc I/O. While this indicator reflects memory efficiency, read hit percentage also can reveal disc bottlenecks.
    Performance Tip
    A high percentage is desirable. See "SOS/3000 Pulse Points".
    Overlay Rate
    The number of memory overlay candidates occurring per second. An overlay candidate is a memory object that is flagged as temporarily non-essential and subject to being overwritten in order to allow higher priority processes to be attended to.
    Performance Tip
    A low rate is desirable. For instance, a poorly sized Image Master Set can lead to poor distribution of records in the Set. The records may be bunched together leaving large areas of unused space in the file. Consequently, a large overlay rate may reflect on this problem, since pages of data brought into memory may come from these areas of blank pages and are immediately marked as overlay candidates.
    Swap/Launch
    The ratio of swap-ins to the number of launches during the current interval.
    Performance Tip
    A large ratio means that for every time a process gained access to the CPU, necessary segments were not present in main memory, thus disabling the process. A consistent number that is higher than 0.5 could indicate a possible memory shortage. Correlate this value with other memory indicators to determine if this is the case.
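    As a worked example of the Swap/Launch ratio, the following Python sketch divides swap-ins by launches and compares the result with the 0.5 guideline. The function names and the handling of an interval with no launches (returning 0.0) are assumptions for illustration.

```python
def swap_launch_ratio(swap_ins, launches):
    """Return swap-ins per launch for one interval."""
    if launches == 0:
        # No launches in the interval; reporting 0.0 here is an assumption.
        return 0.0
    return swap_ins / launches

def possible_memory_shortage(ratio, guideline=0.5):
    """True if the ratio is higher than the guideline value."""
    return ratio > guideline

ratio = swap_launch_ratio(swap_ins=30, launches=120)
print(ratio, possible_memory_shortage(ratio))
```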
    Table 9.6 Page Fault Rates (performance ranges, in page faults per second)
    System size (HP 3000 series models)                               Normal          Problematic   Unacceptable
    Small, single processor (920, 922, 925, 932, 935)                 less than 4     4 to 8        greater than 8
    Medium, max. 2-way (917, 927, 937, 947, 918, 928, 929, 939, 949)  less than 8     8 to 12       greater than 12
    Moderate, max. 2-way (950, 955, 957, 967, 977, 987, 960, 968)     less than 13    13 to 19      greater than 19
    Large, max. 2-way (959, 978, 980, 988, 990)                       less than 20    20 to 40      greater than 40
    Larger, max. 4-way (959, 969, 979, 989, 992, 995, 996, 997)       less than 40    40 to 60      greater than 60
    Even larger, max. 6-way (969, 979, 989, 992, 995, 996, 997)       less than 100   100 to 150    greater than 150
    Very large, max. 8-way (969, 979, 989, 992, 995, 996, 997)        less than 150   150 to 200    greater than 200
    Note: Performance ranges for HP 3000 series models 996/900-996/1200 may vary depending upon the application.
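    The ranges in Table 9.6 lend themselves to a lookup. The following Python sketch classifies a page fault rate by system size class. The dictionary keys and function name are hypothetical; the lookup is keyed by size class rather than model number because several models appear in more than one class.

```python
# (upper limit of "normal", upper limit of "problematic"), from Table 9.6.
PAGE_FAULT_RANGES = {
    "small": (4, 8),
    "medium": (8, 12),
    "moderate": (13, 19),
    "large": (20, 40),
    "larger": (40, 60),
    "even larger": (100, 150),
    "very large": (150, 200),
}

def classify_page_faults(size_class, faults_per_second):
    """Return 'normal', 'problematic', or 'unacceptable' for a size class."""
    normal_below, problematic_up_to = PAGE_FAULT_RANGES[size_class]
    if faults_per_second < normal_below:
        return "normal"
    if faults_per_second <= problematic_up_to:
        return "problematic"
    return "unacceptable"

print(classify_page_faults("small", 3.5))
print(classify_page_faults("large", 40))
print(classify_page_faults("very large", 201))
```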

    Global Disc Statistics (tabular format)

    The Global Disc Statistics portion of the tabular Global Screen presents statistics for each configured disc drive on the system. This information addresses the following issues:
  • How balanced are the I/Os across the discs?
  • Is one disc accessed more frequently than others?
  • Are disc I/Os exceeding acceptable limits?
    The Global Disc Statistics screen (Figure 9.9) contains the first level of magnification of disc statistics. To access more detailed individual disc information:
  • Press the F7 function key to access the Screen Menu, or press J to receive a Screen Selection Prompt.
  • Press d (case insensitive) to access the Disc I/O Detail screen.

    NOTE In this section, disc and drive are used interchangeably.




    Figure 9.9 SOS Global Summary screen (tabular format): Global Disc Statistics

    Disc Display Options

    To display or suppress the Global Disc Statistics in the Global screen, enable or disable option 7 - "Display disc information on global screen"- in the SOS Main Options Menu screen.

    Disc Data Items

    The data items presented in the Global Disc Statistics portion of the Global screen are described in Table 9.7.

    Table 9.7 SOS Global Summary screen (tabular format): Global Disc Statistics
    Data Item
    Description
    LDev
    The Logical Device (disc) number.
    IO/s
    The total rate of both reads and writes per second to each disc and all discs combined.
    Performance Tip
    A typical single disc drive can sustain I/O rates upward of 25-30 per second. If rates consistently exceed this number, it is possible that a disc I/O bottleneck exists. The CPU pause for disc I/O values should be investigated and correlated with these readings. If one or more disc drives is consistently sustaining a majority of I/O hits, then balancing files to other less active drives will likely alleviate the bottleneck. Check the Disc I/O Detail screen for cumulative values to gain a long-term view of the situation.
    IO%
    The percentage of all disc I/Os performed by each drive.
    Performance Tip
    The IO% statistics are helpful in determining how balanced the I/O distribution is across the drives. A drive that performs a substantially higher percentage of total I/O may contain files that are more actively accessed than files on other discs.
    QLen
    The average length of the request queue for each disc drive when another request arrives at that disc. See "CPU QLen nn[nn]" for a detailed description of queue length.
    Performance Tip
    In terms of queue length, zero (0) is ideal, but is rarely the case on an active system. An average queue length of 1.0 or greater is unacceptable. Keep in mind that brief, substantial increases throughout the day are normal.
    If one drive has consistently high queue lengths, explore the following possibilities:
  • There is excessive disc arm movement due to frequently accessed files. These files may depend on each other and are on the same disc drive. Heavily accessed files should be distributed across different drives.
  • There are database file inefficiencies. Dynamic databases are constantly changing; files are added and deleted, forcing the drive arm to search over the platter to find all the data. Repacking the database may alleviate these issues.
  • The disc drive itself is too slow for the activity requested of it.
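
    The Performance Tips in Table 9.7 can be applied mechanically to a set of readings. The following Python sketch is not part of SOS. It assumes the IO/s and QLen values have been copied from the screen into hypothetical DiscSample records, recomputes IO% as each drive's share of the total, and flags drives that cross the rule-of-thumb thresholds quoted in the tips. The threshold constants and all names are illustrative assumptions.

```python
from dataclasses import dataclass

# Thresholds taken from the Performance Tips in Table 9.7. Treat them as
# rules of thumb, not as limits enforced by SOS.
MAX_SUSTAINED_IO_PER_SEC = 30.0  # upper end of the 25-30 I/Os per second guideline
MAX_AVG_QUEUE_LENGTH = 1.0       # an average of 1.0 or greater is called unacceptable


@dataclass
class DiscSample:
    ldev: int          # logical device number (LDev column)
    io_per_sec: float  # IO/s column
    qlen: float        # QLen column


def io_percentages(samples):
    """Return {ldev: share of total I/O in percent}, mirroring the IO% column."""
    total = sum(s.io_per_sec for s in samples)
    if total == 0:
        return {s.ldev: 0.0 for s in samples}
    return {s.ldev: 100.0 * s.io_per_sec / total for s in samples}


def flag_discs(samples):
    """Return {ldev: [reasons]} for drives that cross a rule-of-thumb threshold."""
    flags = {}
    for s in samples:
        reasons = []
        if s.io_per_sec > MAX_SUSTAINED_IO_PER_SEC:
            reasons.append("high IO/s")
        if s.qlen >= MAX_AVG_QUEUE_LENGTH:
            reasons.append("high QLen")
        if reasons:
            flags[s.ldev] = reasons
    return flags


# Hypothetical readings for three drives over one interval.
samples = [
    DiscSample(ldev=1, io_per_sec=36.0, qlen=1.5),
    DiscSample(ldev=2, io_per_sec=9.0, qlen=0.2),
    DiscSample(ldev=3, io_per_sec=15.0, qlen=0.4),
]
shares = io_percentages(samples)
flagged = flag_discs(samples)
print(shares)
print(flagged)
```

    A single interval says nothing about whether a condition is sustained. The tips refer to rates that are consistently high, so a check like this is only meaningful when repeated over many intervals or applied to the cumulative values on the Disc I/O Detail screen.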

    Process Information

    After reviewing the general state of the global resources, the next logical step in analyzing a system’s performance is to observe individual processes. It is important to find out which users are running which programs, and to determine the resources utilized by those processes. The primary purpose of the Process Information section of the global screen is to identify key resources consumed by various processes on the system. Figure 9.10 represents a sample Process Information section of the Global screen.


    Figure 9.10 SOS Global Summary screen: Process Information
    The Process Information section displays information about three types of processes:
  • System processes
  • Command Interpreter processes
  • User processes

    Process Information Display Options

    To display or suppress the Process Information in the Global screen, enable or disable option 9 - "Display process information" - in the SOS Main Options Menu screen. If other options (advice messages and/or workload information) are also set to Y in the SOS Main Options Menu, all of this information may not be immediately visible. Scrolling or paging up through the sections may be necessary.
    By default, only processes that have used CPU time will be displayed. All processes, whether they have utilized the CPU or not, can be viewed one time. To see all processes:
  • Press the OPTION KEYS function key (F5) from the Main Keys screen.
  • Press the DISP ALL PROCS - 1X function key (F7).
  • All processes will be listed, although most will not be visible unless the screen is scrolled up.
    There are many ways to configure the Process Display section. For further information, please see "Process Display Options". Try a variety of display options to see what best suits your performance monitoring needs.

    Process Information Data Items

    The data items presented in the Process Information portion of the Global screen are described in Table 9.8.



    Table 9.8 SOS Global Summary screen: Process Information
    Data Item
    Description
    PIN
    The process identification number that uniquely identifies each process running on the system. These processes can be executed by MPE/iX, a user, or batch job. These unique numbers allow processes to be identified and investigated. A single job or session can have many processes associated with it. In order to see all processes in the Process Tree:
  • Press UTILITY KEYS (F6).
  • Choose PROCESS TREE (F4) or JOB/SESS TREE (F5).
  • Enter the PIN number of the process and press Enter.
    J/S#
    The job or session number of a particular process. If the process is spawned by the system, <sys> will appear in this column.
    Session/User Name
    The logon sequence as initiated by the users or job, minus the logon group. Once again, if the process was spawned by MPE/iX, then <system process> will be shown here.
    Cmd/Program
    This is the program or last MPE/iX command executed by the user. Some system type program names will be uniquely identified, such as "Spooler." If the process is a Command Interpreter process, a ci:, followed by the last MPE/iX command issued by the user, will appear in this column. Notice that the actual command will only appear for root level CI processes, and not for subsequent CI processes in the process tree.
    CPU%
    The amount of CPU resource used by this process during the current interval. If a process uses more than 0%, but less than or equal to 0.1%, then this value will be reflected as ".>%."
    Performance Tip
    The highest CPU user (the "Hog") is displayed in the Performance Advice section (see "System Performance Advice"). If you want to zoom in on the hog process, press HOG PROC ZOOM (F4). If a process is using an inordinate amount of CPU for an extended period of time, it is possible that the process is looping. If a process should be getting CPU time, but isn’t, look at the Wait state (Process Detail screen) to find out why.
    QPri
    This column displays two items of importance.
    The first is the particular MPE/iX Dispatcher subqueue in which the process is executing. This is displayed with two letters. The first indicates the subqueue, while the second indicates whether the subqueue is linear ("L") or circular ("S"). If a process is in a circular subqueue, the priority can be changed. If it is in a linear subqueue, then the priority is fixed.
    The second is the absolute priority number that the Dispatcher uses to determine which process will receive CPU attention next. This will be a one, two, or three digit number.
    #Rd
    The absolute number of logical disc reads (usually not the same as physical) performed by this process during the current interval.
    #Wr
    The absolute number of logical disc writes (usually not the same as physical) performed by this process during the current interval.
    Performance Tip
    These values are important because they can help identify a process that is performing excessive disc I/O. This number will not usually be the same as the actual number of physical disc reads because data may be pre-fetched, thus eliminating some I/O. The System Performance Advice section will report the high I/O (reads and writes) process for the current interval. When these processes are identified, it must be determined whether these I/Os are necessary or not. Please refer to the Disc I/O and TurboIMAGE chapters of Taming the HP 3000 for a list of areas to explore.
    LDV
    The logical device number of the device at which the process was created. Batch jobs will have a "-" for the root Command Interpreter process and the rest of the processes in the tree will show the Stream Device number (usually 10). System processes will also display a "-." The LDV column is helpful in tracking down a user whose process exhibits unique traits. You may see an erroneous number here when jobs are in the process of terminating.
    #Tr
    The current number of terminal transactions (possibly equivalent to terminal reads) performed by the process to a particular terminal device. Under certain conditions, this number will represent the actual number of user transactions (posting a payment, inquiring on an account, etc.). An inaccurate number will be displayed if multiple carriage returns per screen are used for data entry. VPLUS status checks are not counted by the measurement interface (which SOS accesses). Therefore, transaction counts for VPLUS applications will be accurate. The best way to determine if terminal reads and transactions are equivalent is to test them. A user can enter a certain number of transactions as defined from the user’s standpoint and track that activity via SOS to see if there is a discrepancy.
    Performance Tip
    Any high number here (depending on the length of the display interval) should be investigated. Heavy terminal activity can drain the CPU’s attention with non-productive overhead tasks. Sometimes, an application design problem can be identified if a large number of terminal reads occur when very little useful activity is taking place.
    PRes
    This is the terminal read response time for interactive users. This can be displayed as either Prompt Response time (PRes) or First Response time (FRes) in the Process Display Options submenu. First response time is measured from the time the user pressed C/R or ENTER to the time the first character appears on screen. Prompt response times are measured from the moment a user presses C/R or ENTER to the time when the user is supplied a new prompt.
    Performance Tip
    Excessively high response times should be investigated. It is important to analyze the wait state percentages as shown in the Extended Process Display line, or at the Process Detail screen for the process experiencing excessive response times. If on-line reporting is typical on the system, then prompt response times may be excessive, thus skewing the true system response time. In this case, first response times will be more meaningful for tracking the rate at which the system is sending data back to the user’s terminal.
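
    The difference between the two response time measures described for PRes can be illustrated with a small calculation. The following Python sketch is not how SOS collects its data. It assumes three hypothetical timestamps per terminal read and applies the two definitions given above. The TerminalRead record and its field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TerminalRead:
    enter_pressed: float    # user presses C/R or ENTER (seconds)
    first_character: float  # first character of output appears on screen
    next_prompt: float      # user is supplied a new prompt


def first_response(read: TerminalRead) -> float:
    """First Response time (FRes): ENTER to first character on screen."""
    return read.first_character - read.enter_pressed


def prompt_response(read: TerminalRead) -> float:
    """Prompt Response time (PRes): ENTER to the next prompt."""
    return read.next_prompt - read.enter_pressed


# Hypothetical timings: a quick lookup and a long on-line report.
lookup = TerminalRead(enter_pressed=0.0, first_character=0.5, next_prompt=1.0)
report = TerminalRead(enter_pressed=10.0, first_character=11.0, next_prompt=55.0)

for name, read in (("lookup", lookup), ("report", report)):
    print(name, first_response(read), prompt_response(read))
```

    In this made-up case the report has a prompt response of 45 seconds but a first response of only 1 second. Most of the prompt response is time spent writing output, which is why first response times are the more meaningful measure when on-line reporting is common.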

    Extended Process Display

    There is an option to display a second line of detail for each process. This is called the Extended Process line and provides more in-depth information about each individual process. Figure 9.11 represents a sample Extended Process Information section of the Global screen.


    Figure 9.11 SOS Global Summary screen: Extended Process Information

    Extended Process Information Display Options

    There are two ways to display the Extended Process line.
  • Press the OPTION KEYS function key (F5).
  • Press the EXTENDED PROCESS function key (F5).
    OR
  • Press the OPTION KEYS function key (F5).
  • Press the MAIN OPTIONS function key (F1).
  • Type 15 (Detail display options) and press ENTER.
  • Type 1 (Process display options) and press ENTER.
  • Type 1 to select the option.
  • Type "Y" to enable the extended process line and press ENTER.
  • Press the EXIT OPTIONS function key (F8) to return to the Global screen.

    Extended Process Information Data Items

    The data items presented in the Extended Process Information portion of the Global screen are described in Table 9.9.
    Table 9.9 SOS Global Summary screen: Extended Process Information
    Data Item
    Description
    Wait:Cur
    The state of the process at the instant that SOS took a sample of the system. When processes are "stuck," this information can help pinpoint why. Keep in mind this wait state indicator is only the first line of defense. If a process is being impeded, the process’s wait state breakdown or the Process Detail screen will contain useful information to further analyze the problem. These wait states can also change every few seconds. See "Wait State Description".
    Performance Tip
    If a process is always in a particular wait condition, it can indicate a resource shortage or possibly a logical program problem (e.g., database locking strategy). For example, if the Mem flag is on for multiple processes, this could indicate a memory shortage.
    Wait: {CP,...OT}
    This banner represents the wait states in which a process can be spending time. If a process is experiencing eight-second response times, the percentages displayed in these wait state categories represent the various delay or servicing reasons. Ideally, processes would conclude unhindered. However, a process usually encounters several hindrances over the course of its life. These hindrances could include a missing memory segment, missing disc data, or blocked access to a TurboIMAGE database. If a particular user’s process is receiving poor response times, or a batch job is taking more time to complete than is reasonable, examine these wait states. These can be found on the Extended Process line or in the individual Process Detail screen. Cumulative wait state figures are also provided on the Process Detail screen. See "Wait State Description".
    CPU:ms
    The amount of CPU milliseconds consumed by the process for the current interval. These milliseconds are the time that the process spends at the CPU receiving service. Current means the interval specified by the I:nn:nn on the Banner Line (see "SOS Banner"). A cumulative total can be found on the Process Detail screen.
    Performance Tip
    If the current value is zero (0), then the process was not active during the last interval. This number also shows quantitatively which processes are consuming the most and least CPU resource.
    /Tr
    The number of CPU milliseconds used by the process per terminal transaction. This will always be blank for batch jobs because batch jobs do not perform terminal transactions. This number is calculated by dividing the total number of terminal transactions into the total amount of CPU used for the current interval.
    Performance Tip
    This statistic reveals which applications are costing the most CPU cycles for each transaction. Keep in mind that the concept of a terminal read versus a user’s perception of a transaction may be different.
    D/Tr
    The number of physical disc I/Os that were performed per user terminal read (Tr). If you have redefined a terminal read to mean a user transaction, then this value will reflect the average number of disc I/Os performed by the process per user transaction.
    Performance Tip
    This statistic reveals which applications are costing the most disc I/Os for each transaction. This value is helpful in capacity planning. By obtaining an average reading of the number of disc I/Os used per transaction over time, "what if" questions such as "How will overall performance be affected if general ledger transactions increase by 40%?" can be answered. Keep in mind that the concept of a terminal read versus a user’s perception of a transaction may be different.
    PF/s
    The number of page faults per second. A page fault occurs when a needed object (code or data) is not in main memory. A very low number is ideal.
    C/N
    The number of Compatibility Mode to Native Mode switches incurred by the process during the current interval. See Table 9.4 for more information.
    N/C
    The number of Native Mode to Compatibility Mode switches incurred by the process during the current interval. See Table 9.4 for more information.
    %CM
    The average amount of time the CPU spends in Compatibility Mode for this process. See "CPU CM%" for more information and performance tips.
    Additional information about a process can be viewed in the Process Detail screen, which is discussed in "SOS Process Detail".
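
    The /Tr and D/Tr items in Table 9.9 are simple ratios, and the capacity planning question in the D/Tr Performance Tip is a multiplication on top of them. The following Python sketch shows the arithmetic only. The function names and the sample figures are hypothetical, and the projection assumes that the average disc I/O cost per transaction stays constant as volume grows.

```python
from typing import Optional


def per_transaction(total: float, transactions: int) -> Optional[float]:
    """Divide an interval total by the transaction count.

    Returns None when there were no terminal transactions, which
    corresponds to the blank shown for batch jobs.
    """
    if transactions <= 0:
        return None
    return total / transactions


def projected_disc_io(io_per_transaction: float,
                      current_transactions: int,
                      growth_percent: float) -> float:
    """Estimate disc I/Os per interval after a change in transaction volume.

    Assumes the average I/O cost per transaction stays constant, which is
    a simplification: caching and contention can change it under load.
    """
    new_volume = current_transactions * (1 + growth_percent / 100.0)
    return io_per_transaction * new_volume


# Hypothetical interval figures for one interactive process.
cpu_ms, disc_ios, transactions = 1200.0, 90.0, 30

cpu_per_tr = per_transaction(cpu_ms, transactions)   # corresponds to /Tr
io_per_tr = per_transaction(disc_ios, transactions)  # corresponds to D/Tr
print(cpu_per_tr, io_per_tr)

# "What if transactions increase by 40%?"
print(projected_disc_io(io_per_tr, transactions, 40.0))
```

    The projected I/O figure can then be compared with the sustainable per-drive rates discussed under Global Disc Statistics. The result is only as good as the assumption that a terminal read corresponds to one user transaction.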

    Process Summary by Application Workloads

    SOS Performance Advisor is able to track process statistics by application workloads. Vital performance statistics will be gathered and displayed according to the defined workloads specified in the SOSWKDEF file. Figure 9.12 displays the information contained in the Process Summary by Application Workloads section of the Global screen.


    Figure 9.12 SOS Global Summary screen: Process Summary by Application Workloads

    Workload Display Options

    Workload statistics can be displayed by doing the following:
  • Press the OPTION KEYS function key (F5).
  • Press the MAIN MENU function key (F1).
  • Type 10 - "Display workload information." Press Enter.
  • Type Y (Yes) and press Enter.
  • By default, all workloads running on the system are included in this process summary. To show only the active workloads, type 11 - "Display only active workloads," and press Enter. You may enter a minimum CPU value (between 0.0 and 100%) by selecting option 12.
  • Type Y (Yes) and press Enter.
  • Press Enter again.
  • If you’d like to save these settings permanently, press Y again. Otherwise, type N.
  • Press Enter to return to the main screen.

    Workload Data Items

    The data items found in the Process Summary by Application Workloads portion of the Global screen are listed in Table 9.10.

    Table 9.10 SOS Global Summary screen: Process Summary by Application Workloads
    Data Item
    Description
    No
    The workload numbers in ascending order as they appear in the SOSWKDEF definition file.
    Group Name
    The name assigned to each workload as it appears in the SOSWKDEF file.
    %CPU
    The percentage of CPU time used by the workload for the current and cumulative intervals.
    %Disc I/O
    The percentage of the workload’s activity that was spent on disc I/O.
    Prompt Resp
    The average response time during the current interval and cumulatively.
    #Transacts
    The number of terminal reads or transactions performed by this workload during the current interval and cumulatively.
    CPU/Tr
    The CPU milliseconds per transaction.
    IO/Tr
    Disc I/Os per transaction.

    System Performance Advice

    These System Performance Advice messages are easy-to-understand "one-liners" designed to help system administrators focus in on potential performance issues.


    Figure 9.13 SOS Global Summary screen: System Performance Advice
    At the end of each advice message is a four-character message identification code (for example, <GI01> or <GE09>). The identification code of any standard advice message can be referenced in "Performance Advice Message Catalog" to obtain a more detailed explanation of the ascribed event.
    Two types of advice messages can be generated: informational and excessive.
  • An informational message (denoted by an uppercase I in the message identification code) summarizes a particular aspect of the system’s performance during the current interval.
  • An excessive message (denoted by an uppercase E) alerts the user to an excessive condition - a situation or problem that could require immediate action.
    To receive more information about a situation described in an advice message, refer to the Global Statistics block or Process Information portions of the Global screen.
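
    When advice messages are captured to a log or report, the identification code makes them easy to sort by type. The following Python sketch is an illustration, not an SOS feature. It assumes the code appears in angle brackets at the end of the line and that the second character is the I or E type letter, as in the two examples <GI01> and <GE09>. The message text in the sample lines is invented.

```python
import re
from typing import Tuple

# A four-character code in angle brackets at the end of the message.
# Assumption: one letter, then the I/E type letter, then two digits,
# inferred from the examples <GI01> and <GE09>.
_ADVICE_ID = re.compile(r"<([A-Z])([IE])(\d{2})>\s*$")

_KINDS = {"I": "informational", "E": "excessive"}


def classify_advice(message: str) -> Tuple[str, str]:
    """Return (identification code, message type) for one advice line."""
    match = _ADVICE_ID.search(message)
    if match is None:
        raise ValueError(f"no advice identification code found: {message!r}")
    code = "".join(match.groups())
    return code, _KINDS[match.group(2)]


# Hypothetical message text; only the trailing codes come from the manual.
lines = [
    "Example informational text <GI01>",
    "Example excessive-condition text <GE09>",
]
for line in lines:
    print(classify_advice(line))
```

    The extracted code can then be looked up in "Performance Advice Message Catalog" for the full explanation.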

    System Performance Advice Display Options

    To enable System Performance Advice messages, enter Y for the Display advice messages option (option 3) in the SOS Main Options Menu screen.
    By default, the System Performance Advice messages include both information messages and excessive use messages. To suppress the information messages, enter N for the Display information advice messages option (option 4) in the SOS Main Options Menu screen.

    System Performance Advice Message Configuration

    If there are particular events or information to which you want to be alerted, add to or alter the SOSADVIC file located in the PUB.LPS group/account (if you used the default installation). For example, to send a message when average CPU utilization exceeds 90%, alter the advice catalog so that the necessary personnel will be notified. Instructions are found in "SOSADVIC File".

    Lund Performance Solutions
    www.lund.com
    Voice: (541) 812-7600
    Fax: (541) 812-7611
    info@lund.com