
Lund Performance Solutions


SOS Workload Detail
Grouping your processes into workload groups gives you a way to report system performance in units that relate more closely to business issues than to technical ones.


Figure 17.1 SOS Workload Detail screen: JOBS
To access the Workload Detail screen from the global screen:
  • Type S from the SOS Enter command: prompt to view the Screen Selection Menu screen.
  • From the Screen Selection Menu screen, enter W (Workload Detail Screen). You will be prompted: Enter workload group name: (Enter @ for a list of workloads). The Workload Detail screen will appear.
Figure 17.1 shows an example of the screen.

    Workload Detail Screen Keys

    Each Workload Detail screen key is listed and explained in the following table.
    Table 17.1 Workload Detail Screen keys
    Key       Usage
    Enter     Refresh screen
    D         Display Workload Definitions
    E         Return to Global Summary screen
    F         Screen Freeze
    H         Help System
    J         Jump to new screen
    K         Toggle first response option
    L         Print Hardcopy
    O         Option Subsystem
    S         Jump to the SOS Screen Selection menu
    W         New Workload Detail screen
    X         Exit
    Y         Toggle Extended Process Line
    Z         Zero Cumulative Totals
    1         Display All Processes One Time
    !         Execute Shell Commands
    :         Execute Shell Commands
    ?         Help System
    *         Toggle Switch Function Key Sets
    CTRL T    Toggle Timer Status

    Workload Detail Screen Display Items

    Miscellaneous Statistics

    The data items presented in the Misc Statistics portion of the Workload Detail screen are described in the following table.
    Table 17.2 SOS Misc Statistics data items
    Data Item
    Description
    Processes
    This is the number of processes running in this workload and the cumulative average during this interval sample.
    Session
    This is the number of sessions running in this workload and the cumulative average during this interval sample.
    Job
    This is the number of jobs running in this workload and the cumulative average during this interval sample.
    CM to NM
    This is the number and rate per second (nnn/s) of compatibility mode to native mode switches performed by processes within the workload.
    Performance Tip
    A compatibility mode (CM) to native mode (NM) switch occurs when executing code switches from CM to NM. This operation is less expensive than an NM to CM switch, and the system can sustain many CM to NM switches without excessive degradation. Depending on the system size, about 200 (for a small system) to 1000 or more (for a large system) can be sustained without becoming an excessive overhead drain on the CPU. If a workload has 200 CM to NM switches occurring, it might be wise to check the system-wide CM to NM switches in "Global Misc Statistics (tabular format)".
    NM to CM
    This is the number and rate per second (nnn/s) of native mode to compatibility mode switches performed by processes within the workload.
    Performance Tip
    A native mode (NM) to compatibility mode (CM) switch occurs when a piece of code that is executed reverts from NM to a translated form (CM). This operation is quite expensive for the system to perform and should be minimized. Depending on the system size, about 50 per second may indicate an overhead drain on the CPU. A single workload with 50 NM to CM switches per second should be investigated.
    CPU CM%
    This is the percentage of time within the current interval that processes within this workload have been in compatibility mode program code.
    Performance Tip
    This number can be a big help if you are trying to optimize performance from a migration standpoint. It is important that as many programs as possible be compiled in native mode to take full advantage of the Hewlett-Packard Precision Architecture (HP-PA, also known as PA-RISC). The time the CPU spends in compatibility mode represents wasted time because code translations must take place. If the programs are compiled with a native language compiler, the translation is done once for all programs at compile time. There may be no right or wrong value for this number on your system. It depends on what you find acceptable and on your ability to go “native.” If you have third-party software and your vendor has not made the switch from MPE V to MPE/iX, then you are stuck with compatibility mode code, which means a performance compromise. It is best to target values of less than 20%.
    Launch/s
    This is the number of launches per second within this workload. This is the activity that refers to a process receiving exclusive use of the CPU. A launch occurs when the MPE dispatcher has determined which process is ready to run and has the highest priority if there are many such processes ready. Typically, this activity will occur many times in the life of a process. A launch implies that a process stop occurred. After a process is launched, it is considered to be executing.
    Performance Tip
    Excessive launch activity implies excessive process stops. Each launch incurs CPU overhead (especially due to dispatcher activity). A low launch rate is desirable.
    Page Fault/s
    This value represents the current and cumulative number of times per second that memory page faulting occurred for processes within this workload. A Page Fault is counted when a process needs a memory object (code or data) that is absent from main memory.
    Performance Tip
    A consistent system-wide value of more than 25 page faults per second should alert you to a possible memory shortage, and other memory indicators should be checked. Systems that have adequate memory typically show 0-5 page faults per second system wide. If a single workload has more than 5, the cause should be investigated. Be sure to check the Memory Detail screen for more insight into memory activity.
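The thresholds quoted in the performance tips above can be collected into a simple check. The following Python sketch is only an illustration: the `MiscStats` container and the function name are hypothetical, the readings are assumed to have been copied by hand from the Workload Detail screen, and the limits should be adjusted for the size of your system.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the performance tips above; tune them for your system size.
NM_TO_CM_PER_SEC_LIMIT = 50.0      # NM to CM switches per second for one workload
CPU_CM_PCT_TARGET = 20.0           # the tips suggest targeting less than 20%
WORKLOAD_PAGE_FAULT_LIMIT = 5.0    # page faults per second for one workload


@dataclass
class MiscStats:
    """Hypothetical container for one workload's Misc Statistics readings."""
    nm_to_cm_per_sec: float
    cpu_cm_pct: float
    page_faults_per_sec: float


def misc_stat_warnings(stats: MiscStats) -> List[str]:
    """Return the names of the data items that reach the limits above."""
    warnings = []
    if stats.nm_to_cm_per_sec >= NM_TO_CM_PER_SEC_LIMIT:
        warnings.append("NM to CM")
    if stats.cpu_cm_pct >= CPU_CM_PCT_TARGET:
        warnings.append("CPU CM%")
    if stats.page_faults_per_sec > WORKLOAD_PAGE_FAULT_LIMIT:
        warnings.append("Page Fault/s")
    return warnings


if __name__ == "__main__":
    print(misc_stat_warnings(MiscStats(60.0, 10.0, 2.0)))
```

A flagged item is only a prompt to look further, for example at the Memory Detail screen for page faults.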

    CPU Usage

    The data items presented in the CPU Usage portion of the Workload Detail screen are described in the following table.
    Table 17.3 SOS CPU Usage data items
    Data Item
    Description
    System%
    This percentage reflects the amount of the total CPU capacity consumed by workload processes during the current interval. If a process uses more than zero but less than or equal to 0.1 then .<% is displayed.
    Performance Tip
    The high CPU user (the "Hog") is displayed in the Advice Section in the Global Summary screen. It is very important to isolate the currently active, high CPU consumer because it is often the performance problem. It is possible to spot a program looping condition if it consumes a lot of the CPU’s attention and breaks little or not at all for other events. An even distribution of the CPU among processes over a period of time is desirable. If a process should be getting CPU time and is not, you should look at the Current Wait reason (discussed below) to see why not. This process may be waiting on resources to be released in order to continue. Looking at the Process Wait states will reveal even more.
    Ms Used
    These numbers represent the current and cumulative amount of CPU milliseconds consumed by the workload processes, respectively. These milliseconds represent the time processes spent at the CPU watering hole for service. “Current” means the interval specified by the I:nn:nn at the top banner line. The cumulative number is unique because it represents the total number of CPU milliseconds that were consumed since the process was created and not just since SOS/3000 started. So if the process under study was started hours ago you will see a large cumulative value for the “CPU Ms Used”.
    Performance Tip
    One of the first things you can tell about a process is whether or not it has received any CPU attention during the last interval. If the current value is zero then the process was not active during the last interval. These numbers will also quantitatively indicate which processes are consuming the most and the least CPU.
    Per Trans
    This value is the number of CPU milliseconds used by the workload per terminal transaction. This will always be blank for batch jobs because batch jobs do not perform terminal transactions. This number is calculated by dividing the total amount of CPU used by the workload for the current interval by the total number of terminal transactions.
    Performance Tip
    You can discover which applications are costing the most CPU cycles for each transaction. This number is helpful if you are trying to perform capacity planning. By obtaining an average reading of the amount of CPU used per transaction over time, you can use queuing network math or simple spreadsheet calculations to help you answer “what if” questions like: “How will my overall performance be affected if I increase my general ledger transaction volume by 40%?” Keep in mind that a terminal read and a user’s perception of a transaction may be different. Please refer to the Transaction and Response Time discussion in "Global Misc Statistics (tabular format)" for more insight.
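The Per Trans arithmetic and the simple spreadsheet style of “what if” calculation mentioned above can be sketched in Python as follows. The function names are hypothetical, and the projection assumes a single CPU and purely linear scaling. It ignores the queuing effects that queuing network math would capture, so treat the result as a rough first estimate.

```python
from typing import Optional


def cpu_ms_per_transaction(cpu_ms_used: float, transactions: int) -> Optional[float]:
    """CPU milliseconds per terminal transaction for one interval.

    Returns None when there are no terminal transactions (for example,
    batch jobs), which mirrors the blank field on the screen.
    """
    if transactions <= 0:
        return None
    return cpu_ms_used / transactions


def projected_cpu_pct(cpu_ms_per_trans: float,
                      trans_per_min: float,
                      growth_pct: float) -> float:
    """Estimate the percentage of one CPU consumed after volume grows.

    Assumes CPU cost per transaction stays constant as volume increases.
    """
    new_rate = trans_per_min * (1.0 + growth_pct / 100.0)
    cpu_ms_per_min = cpu_ms_per_trans * new_rate
    return 100.0 * cpu_ms_per_min / 60_000.0   # 60,000 ms in one minute


if __name__ == "__main__":
    per_trans = cpu_ms_per_transaction(12_000.0, 80)
    print(per_trans)
    print(projected_cpu_pct(per_trans, 100.0, 40.0))
```

Because a terminal read may not match what a user considers a transaction, the inputs are only as reliable as the transaction counts behind them.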

    Disc I/O Usage

    The Disc I/O Usage portion of the Workload Detail screen describes various aspects of a workload’s disc I/O resource usage. Within the framework of MPE/iX, disc I/O is usually not a bottleneck. However, it is important to pay close attention to applications exhibiting abnormally high disc I/O activity. Each data item is described in Table 17.4.
    Table 17.4 SOS Disc I/O Usage data items
    Data Item
    Description
    I/Os
    Total
    The first value of this pair is the total number of physical disc I/Os generated by the workload during the current interval. The second “[n]” is the cumulative number of I/Os for the workload processes since they began. If SOS/3000 was started after the workload began, this value will reflect disc I/Os that accumulated since the beginning of SOS/3000.
    Reads
    The first value of this pair is the number of physical read disc I/Os generated by the workload processes during the current interval. The second “[n]” is the cumulative number of read I/Os for the workload processes since the workload processes began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
    Writes
    The first value of this pair is the number of physical write disc I/Os generated by the workload processes during the current interval. The second “[n]” is the cumulative number of write I/Os for the workload processes since the workload processes began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
    Performance Tip
    These absolute physical I/O numbers will help you characterize workload in terms of trips to disc. In the case of MPE/iX pre-fetching, most I/Os will be eliminated. Only those I/Os unsatisfied in memory will be retrieved from disc and will be reflected in these numbers.
    Rate
    Total
    This value is the average number of total physical disc I/Os per second generated by the workload processes during the current interval.
    Read
    This value is the average number of physical disc I/O reads per second generated by the workload processes during the current interval.
    Write
    This value is the average number of physical disc I/O writes per second generated by the workload processes during the current interval.
    Performance Tip
    These I/O rates will help you characterize workload processes in terms of the rate of physical trips to disc. In the case of MPE/iX pre-fetching some I/Os will be eliminated. Only those I/Os unsatisfied in memory will be retrieved from disc and reflected in these numbers.
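As a minimal sketch, the per-second rates can be derived from the interval counts like this. The function name is hypothetical, the interval length is assumed to be known in seconds, and the total is assumed to be the sum of reads and writes, which the text above does not state explicitly.

```python
from typing import Dict


def io_rates(reads: int, writes: int, interval_seconds: float) -> Dict[str, float]:
    """Average physical disc I/Os per second over one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    total = reads + writes   # assumption: Total = Reads + Writes
    return {
        "total": total / interval_seconds,
        "read": reads / interval_seconds,
        "write": writes / interval_seconds,
    }


if __name__ == "__main__":
    print(io_rates(reads=90, writes=30, interval_seconds=60.0))
```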

    Response and Transaction Statistics

    Each data item from the Response/Transaction portion of the Workload Detail Screen is described in Table 17.5.
    Table 17.5 SOS Response/Transaction data items
    Data Item
    Description
    Trans Count
    Trans Rate/min
    These numbers represent the current number of terminal transactions (possibly equivalent to terminal reads) performed by the workload processes to a particular terminal device, a cumulative average, and an estimated rate per minute based on the current interval. Under certain conditions these numbers will represent the actual number of user transactions (e.g., posting a payment, inquiring on an account, etc.). An accurate reading will occur if multiple carriage returns per screen were used for data entry. VPLUS status checks are not counted by the measurement interface that SOS/3000 accesses, so transaction counts for VPLUS applications will be quite accurate. These numbers provide a consistent transaction count for VPLUS applications but a questionable count for character mode transactions. The best way to tell whether terminal reads and transactions are equivalent is to test them. Have a user enter a specific number of transactions, defined from the user’s standpoint, and track that activity via SOS/3000 to check for discrepancies.
    Prompt Resp First Resp
    These numbers represent the terminal read response times for interactive users within the workload. First Resp is the response time for the user from the time C/R or Enter is pressed to when the first character appears on the screen. Prompt Resp is the response time for the user from when C/R or Enter is pressed to when the first prompt appears at which the user can enter a new transaction. There are a number of things to keep in mind when discussing response times. Refer to the discussion of Transactions and Response Times, under "Global Misc Statistics (tabular format)" for a detailed explanation.
    Performance Tip
    Excessively high response times should be investigated. Heavy terminal activity can drain the CPU’s attention with nonproductive overhead tasks. Impedances can cause excessive response times. It is important to analyze the Wait State percentages. These are shown on the Extended Process Display line or at the Process Detail screen (Process Wait States). Be sure you understand the difference between First and Prompt response times. If you have a lot of on-line reporting, the Prompt response times will be substantially larger thus skewing the true system Response time. In this case the First response will be more meaningful in tracking the rate at which the system is sending data back to the user’s terminal.
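The estimated rate per minute and the suggested test of terminal reads against user-defined transactions can be sketched as follows. Both function names are hypothetical, and the interval length is assumed to be known in seconds.

```python
def trans_rate_per_min(trans_count: int, interval_seconds: float) -> float:
    """Scale the current interval's transaction count to a per-minute rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return trans_count * 60.0 / interval_seconds


def reads_per_user_transaction(sos_trans_count: int, user_trans_count: int) -> float:
    """Compare the count reported by SOS with transactions counted by the user.

    A result near 1.0 suggests terminal reads and user transactions are
    equivalent for this application; a larger value suggests several
    terminal reads per user transaction.
    """
    if user_trans_count <= 0:
        raise ValueError("user_trans_count must be positive")
    return sos_trans_count / user_trans_count


if __name__ == "__main__":
    print(trans_rate_per_min(15, 30.0))
    print(reads_per_user_transaction(30, 10))
```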

    Workload Wait State Statistics

    These counters represent the Wait States in which processes within a workload can spend time. In other words, if a process is experiencing eight second response times, the percentages displayed in these Wait State categories represent the delay or servicing reasons. It is ideal for a process to continue unhindered. However, a process usually hits many brick walls over the course of its life.

    A brick wall could mean a missing memory segment, disc data, or perhaps prevented access to a TurboIMAGE database. If you notice that a particular user’s process is receiving poor response times, or a batch job is taking more time to complete than is reasonable, examine these wait reasons. You can view them in the Extended Process line or on the individual Process Detail Screen. Cumulative Wait State figures are also provided on the Process Detail Screen.

    The best throughput for a process is achieved when it does not have to stop for any reason. In other words, it has full use of the CPU. The following discussion describes the other “brick walls” that can slow down a process’ progress (with the exception of CPU).
    Each data item is described in Table 17.6.
    Table 17.6 SOS Workload Wait State data items
    Data Item
    Description
    CPU
    This Wait State is the percentage of the workload process’ Response time due to being serviced by the CPU. It takes a certain amount of CPU time to perform the various commands of processes.
    Performance Tip
    For processes that are computation-intensive you will usually see a high number in this category. It is possible that a process exhibiting close to 100% here is in a looping state especially if the program is not completing as desired.
    Mem
    This Wait State is the percentage of the workload process’ Response time due to waiting for missing memory segments to be brought into main memory. When a process wants to continue to run but cannot because necessary memory segments are missing, that process is blocked. Memory fault stop time is counted in this category.
    Performance Tip
    For systems having an inadequate amount of main memory to support current demands, numbers may exceed 10% in this category. Systems exhibiting severe memory shortage will show most user processes, even those needing modest amounts of memory, as high memory wait percentages in this bucket. If only a few processes report values greater than or equal to 20-30% you should look at their individual memory requirements. A particular application may be gorging itself on memory space. If this is so, a redesign of that program is warranted. Remember that when dealing with process “brick walls” (in this case absent memory segments), small percentages are desirable. Less than 10% in this Wait State is preferable.
    Dsc
    This Wait State is the percentage of the workload process’ Response time due to waiting for missing data to be brought into main memory from disc. An I/O “brick wall” occurs when a process wants to continue running but cannot because necessary user-requested data is not in main memory and must be read from disc. Since a process is literally stopped and the CPU is taken away when a physical disc access is performed, it is important to minimize this percentage.
    Performance Tip
    If you notice that the CPU Pause for Disc I/O time (Global Screen) is rising above 10-15% most of the time, you will usually find that one or more processes are spending a moderate-to-high percentage of their processing time waiting for disc I/Os to complete. If a process is consistently waiting more than 20-30% of its time on disc I/O servicing, then you should find out why. There are a number of reasons why I/O bottlenecking can take place. Some common culprits are:
  • TurboIMAGE master and detail set inefficiencies.
  • Inefficient pre-fetching operation (lack of CPU, memory, poor I/O locality).
  • Too many I/O-demanding processes running concurrently, etc.
    Imp
    This Wait State is the percentage of the workload process’ Response time due to being impeded by various lock and latch control mechanisms. This category includes many stop reasons. An impede occurs when a process tries to gain access to a software table or control structure and cannot because other processes arrived first. TurboIMAGE access is one of the most common sources of impedes. When a process tries to gain entry to a particular dataset and another process has that set locked via the DBLOCK intrinsic, then the waiting process is counted as having been impeded. It must wait until the prior process is finished with its current operation before it can continue.
    Any file can have only one disc request outstanding. For a process to access even a simple MPE/iX flat file, it must first gain control of that file’s control block. This access is not by the FLOCK intrinsic, which is the case in the other wait state bucket. Rather, only one user at a time can gain access, regardless of programmatic locking. Other sources of impedes include unavailable system table entries, terminal buffers, etc.
    Performance Tip
    The interpretation of impedes can be difficult because there are potentially many causes and interrelationships between processes and resources. First, determine the overall global impede rate by looking at the Impede Value on the Global Process Stop Reasons screen. If the global impede percentage is consistently high, it is important to look at individual processes that have high impede percentages as part of their processing time. Processes that access the same database in applications with poor locking strategies tend to spend a very large percentage of their time being impeded. It is not uncommon to see values in excess of 60% for processes in the Impede Wait State. A large percentage may point to poor locking or can simply indicate that a great deal of competition exists for a particular file.
    Pre
    This Wait State is the percentage of the workload process’ Response time due to preemption by other processes. A preemption occurs when a process is forced to give up use of the CPU because a higher priority process is ready to execute.
    Performance Tip
    If both interactive and batch processes are running, batch processes in lower queues will receive a higher number of preemptions than those running in the interactive queue. If interactive users are spending too much Response time being preempted, it is possible that there is not enough CPU horsepower to go around. Backing off on demand or increasing the supply are the only recourses. It may help to sparingly distribute the CPU resource by means of the TUNE command or through a program. The basic strategy is to give less CPU attention to those who can stand it and provide more to those who need it most.
    RIN
    This Wait State is the percentage of time the processes within the workload are waiting for a RIN (Resource Identification Number).
    TWr
    This Wait State is the percentage of time the processes within the workload are waiting for terminal writes to complete. Since terminal output is usually buffered, this will only accumulate time if the system runs out of terminal buffers or if the program is blocking on terminal output.
    BIO
    The BIO (i.e., Block for I/O) Wait State is the percentage of time the processes within the workload are blocked waiting for an I/O operation to complete.
    Tim
    This Wait State is the percentage of time the process is waiting for a programmatic timer (such as the PAUSE intrinsic) to complete.
    FS
    This Wait State is the percentage of time the process is waiting on a father and/or son wait.
    Msg
    This Wait State is the percentage of time the process is waiting on a message file, port, or sendmail/receivemail wait.
    Oth
    This Wait State is the percentage of time the process is waiting on other events not covered by the above definitions.
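To relate the wait-state percentages to an observed response time, such as the eight-second example mentioned earlier, the percentages can be converted into seconds. The Python sketch below is an illustration only: the function names and the dictionary layout are hypothetical, the percentages are assumed to be shares of the same response time, and the Mem and Dsc limits are taken from the performance tips above.

```python
from typing import Dict, List

# Limits taken from the performance tips above.
MEM_WAIT_PCT_LIMIT = 10.0   # less than 10% in Mem is preferable
DSC_WAIT_PCT_LIMIT = 20.0   # consistently more than 20-30% in Dsc needs a look


def wait_state_seconds(response_seconds: float,
                       wait_pct: Dict[str, float]) -> Dict[str, float]:
    """Convert wait-state percentages into seconds of the response time."""
    return {state: response_seconds * pct / 100.0
            for state, pct in wait_pct.items()}


def states_to_review(wait_pct: Dict[str, float]) -> List[str]:
    """Return the wait states whose percentages reach the limits above."""
    flagged = []
    if wait_pct.get("Mem", 0.0) >= MEM_WAIT_PCT_LIMIT:
        flagged.append("Mem")
    if wait_pct.get("Dsc", 0.0) > DSC_WAIT_PCT_LIMIT:
        flagged.append("Dsc")
    return flagged


if __name__ == "__main__":
    sample = {"CPU": 25.0, "Dsc": 50.0, "Imp": 25.0}
    print(wait_state_seconds(8.0, sample))
    print(states_to_review(sample))
```

Only Mem and Dsc are checked here because those are the wait states for which the tips give numeric guidance; impede percentages need the broader interpretation described under Imp.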

    Process Information

    See "Process Information" for details.

    Lund Performance Solutions
    www.lund.com
    Voice: (541) 812-7600
    Fax: (541) 812-7611
    info@lund.com