Lund Performance Solutions

SOS Process Detail
The Process Detail screen allows you to see, in microscopic view, one particular process at a time. Although some of the statistics are the same as those on the Process or Extended Process Display lines, many new data items are provided. This screen also displays averaged data.
To access the Process Detail screen from the Global Summary screen:
  • Type S from the SOS Enter command: prompt to view the Screen Selection Menu screen or press P. You also may use the HOG PROC ZOOM key (F4).
  • From the Screen Selection Menu screen, enter P (Process Detail Screen). You will see this prompt: “Enter process identification number of process to display:”. When you enter a valid PIN, the Process Detail screen appears. HOG PROC ZOOM automatically supplies the PIN of the process using the most CPU. Figure 16.1 shows an example of the screen.
  • One feature provided on this screen is the family lineage for the current process. By pressing the UTILITY KEYS function key (F6), and then the PROCESS TREE function key (F4), you will see a graphic representation of the father and its related son processes. This is helpful when dealing with process-handling issues.

    Figure 16.1 SOS Process Detail screen

    Process Detail Screen Keys

    Each of the Process Detail Screen keys is listed and explained in the following table.
    Table 16.1 Process Detail Screen keys
    Refresh screen
    Return to Global Summary screen
    Screen Freeze
    Help System
    Jump to new screen
    Print Hardcopy
    Toggle Memory Lock
    Toggle Show Other File Opens
    New Process Detail Screen
    Queue Jump
    Toggle Stack Trace
    Jump to SOS Screen Selection menu
    Display Process Tree
    Display File Users
    Launch FILERPT
    Zero Cumulative Totals
    Execute Shell Commands
    Execute Shell Commands
    Display Job/Session Tree
    Help System
    CTRL T - Toggle Timer Status
    CTRL W - Toggle Wait Index Info

    Process Detail Screen Display Items

    The data items are described in Table 16.2.
    Table 16.2 SOS Process Detail data items
    Data Item
    This number stands for the Process Identification Number (PIN). Each process is uniquely identified by its own PIN. The easiest way to locate processes is by knowing this number. A single job or session can have many processes associated with it.
    The job or session number associated with the particular process. If the process is a system type (not originating from a user job or session), <sys> will appear in this column.
    The logical device number of the device where the process was created. Batch jobs will display the streams device number here (usually 10). System processes will have a “-.” This column is helpful to track down a particular user whose process is exhibiting unique traits. Jobs or processes that are in the process of terminating may display an erroneous number here. This does not indicate a problem.
    The program or last MPE/iX command executed by the user. Some system-type program names will be uniquely identified, such as “Spooler”. If the process is a Command Interpreter, then “ci:xxxx” will appear in this column, where “xxxx” is the last MPE/iX command the user or job issued.
    The logon sequence as initiated by the user or job, minus the logon group. If the process was spawned by MPE and not by a session or job, System Process will be displayed. The name of that process will be provided in the Program name field.
    These numbers are the Process Identification Numbers (PINs) of the current process’ relatives: the father process that created the current process; the brother, if any, that was also created by the father; and the first son process, if any, that was created by the current process. By traversing the process tree you can identify all relatives associated with the current process. You may also press the PROCESS TREE function key to see the lineage of the current process in graphic format.
    The first letter identifies the dispatch queue; the possible letters are described in Table 16.3. The second letter identifies the subqueue type (L for linear, which has a fixed priority, or S for circular). Only C, D, and E queues can have the S subqueue. The number that follows is the absolute priority that the MPE dispatcher uses to determine which process gets the CPU’s attention next.
    This label indicates whether the process began in compatibility mode (CM) or native mode (NM). This flag will not indicate the current mode of the program.
    The percentage of the CPU used for compatibility mode operations when this process was using the CPU.
    The number and rate per second (nnn/s) of compatibility mode to native mode switches performed by the process.
    The number and rate per second (nnn/s) of native mode to compatibility mode switches performed by the process.
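The father, brother, and son PINs described above are enough to walk the whole process tree by hand. The following is a minimal sketch of that traversal, not SOS/3000 code. The `ProcLinks` record, the `SAMPLE` table, and both function names are hypothetical, and the sketch assumes that the brother link points to the next process created by the same father.

```python
from typing import Dict, List, NamedTuple, Optional


class ProcLinks(NamedTuple):
    """Family links for one process (hypothetical record layout)."""
    father: Optional[int]   # PIN of the process that created this one
    brother: Optional[int]  # assumed: PIN of the next process created by the same father
    son: Optional[int]      # PIN of the first process created by this one


def sons_of(pin: int, table: Dict[int, ProcLinks]) -> List[int]:
    """List every son of `pin`: follow the son link once, then the brother links."""
    result = []
    child = table[pin].son
    while child is not None:
        result.append(child)
        child = table[child].brother
    return result


def ancestors_of(pin: int, table: Dict[int, ProcLinks]) -> List[int]:
    """List the father, grandfather, and so on, nearest first."""
    result = []
    parent = table[pin].father
    while parent is not None:
        result.append(parent)
        parent = table[parent].father
    return result


# Made-up sample data: PIN 40 created PINs 52 and 57; PIN 52 created PIN 61.
SAMPLE = {
    40: ProcLinks(father=None, brother=None, son=52),
    52: ProcLinks(father=40, brother=57, son=61),
    57: ProcLinks(father=40, brother=None, son=None),
    61: ProcLinks(father=52, brother=None, son=None),
}
```

With this sample table, `sons_of(40, SAMPLE)` lists PINs 52 and 57, and `ancestors_of(61, SAMPLE)` lists PINs 52 and 40.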
    Table 16.3 SOS Queue items
    Queue Item
    A
    A very high priority linear subqueue. This queue is usually reserved for the highest priority MPE system processes that need immediate and adequate CPU time. Linear means that the process priority does not usually change. It is fixed.
    B
    A high priority queue. This queue is used by some lower priority MPE system processes and by some very high priority user processes. For example, logging on to a system with a “PRI=BS” parameter will allow your terminal to receive more CPU attention than those in lower queues. You should be cautious when running processes in this queue. If a looping condition takes place, often the only remedy is to restart the system. This is because processes in the A and B queues generally will not give up control of the CPU until they are through with it. This queue is generally linear, but it is possible to assign a process to the circular queue with priorities falling in the B queue range.
    C
    This subqueue is the one in which normal interactive sessions run. When you log on at a terminal, your Command Interpreter Process (the process that allows you to dialogue with an MPE/iX prompt) is assigned a priority of 152 in this queue unless the default queue settings have been altered. As your process uses more CPU time than the average of the last 100 transactions, your priority is decremented (increased numerically, which is logically lower in priority). The net effect is that HOG interactive transactions are penalized: they have less chance of getting CPU time. Short transactions are rewarded by maintaining a higher priority. It is by this method that MPE/iX tries to allocate resources fairly among competing processes.
    D
    This subqueue is commonly used for high priority batch jobs. The rules for this queue and the E queue are similar to those of the C queue. Processes fall in priority as they exceed the filter values. In the CS queue the filter is the dynamically calculated SAQ (System Average Quantum) value. For the D and E queues the filters are the MINQUANTUM and MAXQUANTUM values.
    E
    This subqueue is typically used for lower priority batch jobs. Processes running at low priority get only table scraps of CPU time: the leftovers from processes running at higher priorities.
    Performance Tip
    If you see a process in the linear queue that consumes a lot of CPU time, it is possibly the culprit causing a bottleneck. If other processes are congregating at a low priority and are not getting enough CPU time you should use the TUNE command to help them derive more. You can manipulate the TUNE command to perform several actions. Do not be afraid to take advantage of its capabilities.
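When scanning saved screen output, it can help to split the queue and priority field into its parts. The sketch below assumes the field looks like `CS152` (queue letter, subqueue letter, priority number); that layout is inferred from the description in this chapter and should be checked against your own screens. The names `Priority`, `parse_priority`, and `outranks` are hypothetical.

```python
import re
from typing import NamedTuple

# L is linear (fixed priority); S is circular.
SUBQUEUE_TYPES = {"L": "linear", "S": "circular"}

# Assumed layout: one queue letter A-E, one subqueue letter, then the priority number.
_PATTERN = re.compile(r"^([A-E])([LS])\s*(\d+)$")


class Priority(NamedTuple):
    queue: str     # dispatch queue letter, A through E
    subqueue: str  # "linear" or "circular"
    value: int     # absolute priority used by the dispatcher


def parse_priority(field: str) -> Priority:
    """Split a field such as 'CS152' into queue, subqueue type, and priority."""
    match = _PATTERN.match(field.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized priority field: {field!r}")
    queue, sub, number = match.groups()
    return Priority(queue, SUBQUEUE_TYPES[sub], int(number))


def outranks(a: Priority, b: Priority) -> bool:
    """A numerically lower priority value is logically higher in priority."""
    return a.value < b.value
```

For example, `parse_priority("CS152")` yields queue `C`, subqueue type `circular`, and priority `152`, which `outranks` treats as higher in priority than a process at 180.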

    CPU Usage

    The CPU Usage portion of the Process Detail screen contains information and explanations of a process’ CPU resource usage. Each data item is described in the next table.
    Table 16.4 SOS CPU Usage data items
    Data Item
    This percentage reflects the amount of the total CPU capacity consumed by this process during the current interval. If a process uses more than zero but no more than 0.1 percent, then .<% is displayed. This lets you know that some time, although very little, was spent on the process.
    Performance Tip
    The high CPU user, HOG, is displayed in the Advice Section. It is very important to isolate the currently active, high CPU consumer because it is often the performance problem. It is possible to spot a program looping condition if it consumes a lot of the CPU’s attention and breaks little or not at all for other events. An even distribution of the CPU among processes over a period of time is desirable. If a process should be getting CPU time and is not, you should look at the Current Wait reason (discussed below) to see why not. This process may be waiting on resources to be released in order to continue. Looking at the Process Wait states will reveal even more.
    Ms Used
    These numbers represent the current and cumulative CPU milliseconds consumed by the process, respectively. These milliseconds represent the time the process spent at the CPU watering hole for service. “Current” means the interval specified by the I:nn:nn at the top banner line. The cumulative number is unique because it represents the total number of CPU milliseconds consumed since the process was created, not just since SOS/3000 started. So if the process under study was started hours ago, you will see a large cumulative value for “CPU Ms Used”.
    Performance Tip
    One of the first things you can tell about a process is whether or not it has received any CPU attention during the last interval. If the current value is zero then the process was not active during the last interval. These numbers will also quantitatively indicate which processes are consuming the most and the least CPU.
    Per Trans
    The average number of CPU milliseconds consumed by the process per transaction.
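The CPU percentage display rule and the Per Trans average can be restated as a short sketch. This is an illustration under assumptions, not the SOS/3000 implementation. The one-decimal format used for ordinary values, the zero display, and the handling of a zero transaction count are all guesses, and both function names are hypothetical.

```python
def format_cpu_pct(pct: float) -> str:
    """Render a CPU percentage, using the '.<%' marker for very small values."""
    if pct < 0:
        raise ValueError("percentage cannot be negative")
    if pct == 0:
        return "0.0%"        # assumption: how an idle process is rendered
    if pct <= 0.1:
        return ".<%"         # some CPU was used, but no more than 0.1 percent
    return f"{pct:.1f}%"     # assumption: one decimal place for ordinary values


def cpu_ms_per_trans(cpu_ms: float, transactions: int) -> float:
    """Average CPU milliseconds consumed per transaction."""
    if transactions <= 0:
        return 0.0           # assumption: report zero when there were no transactions
    return cpu_ms / transactions
```

A process that used 1500 CPU milliseconds over 30 transactions averages 50 milliseconds per transaction.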

    Disc I/O Usage

    The Disc I/O Usage portion of the Process Detail screen includes data describing the various aspects of a process’ disc I/O resource usage. Within the framework of MPE/iX, disc I/O is usually not a bottleneck. However, it is important to pay close attention to applications exhibiting abnormally high disc I/O activity. Each data item is described in Table 16.5.
    Table 16.5 SOS Disc I/O Usage data items
    Data Item
    I/Os Total
    The first value of this pair is the total number of physical disc I/Os generated by the process during the current interval. The second, “[n]”, is the cumulative number of I/Os for the process since it began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
    The first value of this pair is the number of logical read disc I/Os generated by the process during the current interval. The second “[n]” is the cumulative number of read I/Os for the process since the process began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
    The first value of this pair is the number of logical write disc I/Os generated by the process during the current interval. The second “[n]” is the cumulative number of write I/Os for the process since the process began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
    Performance Tip
    These absolute logical I/O numbers will help you characterize processes in terms of trips to disc. With MPE/iX pre-fetching, some I/Os will be eliminated. Only those I/Os that cannot be satisfied in memory will be retrieved from disc and reflected in these numbers.
    Rate Total
    This value is the average number of total logical disc I/Os per second generated by the process during the current interval.
    This value is the average number of logical disc I/O reads per second generated by the process during the current interval.
    This value is the average number of logical disc I/O writes per second generated by the process during the current interval.
    Performance Tip
    These I/O rates will help you characterize processes in terms of the rate of physical trips to disc. With MPE/iX pre-fetching, some I/Os will be eliminated. Only those I/Os that cannot be satisfied in memory will be retrieved from disc and reflected in these numbers.
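The per-second rates are simply the interval counts divided by the interval length. The sketch below shows that arithmetic. It assumes that the banner field `I:nn:nn` is minutes and seconds and that the total is the sum of reads and writes; neither detail is confirmed in this chapter. The function names are hypothetical.

```python
from typing import Dict


def interval_seconds(banner_field: str) -> int:
    """Convert a banner field such as 'I:01:00' to seconds (assumed minutes:seconds)."""
    prefix, minutes, seconds = banner_field.strip().split(":")
    if prefix.upper() != "I":
        raise ValueError(f"not an interval field: {banner_field!r}")
    return int(minutes) * 60 + int(seconds)


def io_rates(reads: int, writes: int, seconds: float) -> Dict[str, float]:
    """Average disc I/Os per second over one interval."""
    if seconds <= 0:
        raise ValueError("interval length must be positive")
    return {
        "read": reads / seconds,
        "write": writes / seconds,
        "total": (reads + writes) / seconds,  # assumption: total = reads + writes
    }
```

For a one-minute interval with 90 reads and 30 writes, the sketch gives 1.5 reads, 0.5 writes, and 2.0 total I/Os per second.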

    Response and Transaction Statistics

    Each data item from the Response/Transaction portion of the Process Detail screen is described in Table 16.6.
    Table 16.6 SOS Response/Transaction data items
    Data Item
    Prompt Resp First Resp
    These numbers represent the terminal read response times for interactive users. First Resp is the response time for the user from the time C/R or Enter is pressed to when the first character appears on the screen. Prompt Resp is the response time for the user from when C/R or Enter is pressed to when the first prompt appears at which the user can enter a new transaction. There are a number of things to keep in mind when discussing response times. Refer to the discussion of Transactions and Response Times, under "Global Misc Statistics (tabular format)" for a detailed explanation.
    Performance Tip
    Excessively high response times should be investigated. Heavy terminal activity can drain the CPU’s attention with nonproductive overhead tasks. Impedances can cause excessive response times. It is important to analyze the Wait State percentages. These are shown on the Extended Process Display line or on the Process Detail screen (Process Wait States). Be sure you understand the difference between First and Prompt response times. If you have a lot of on-line reporting, the Prompt response times will be substantially larger, thus skewing the true system response time. In this case, the First response will be more meaningful in tracking the rate at which the system is sending data back to the user’s terminal.
    Trans Count
    Trans Rate/min
    These numbers represent the current number of terminal transactions (possibly equivalent to terminal reads) performed by the process to a particular terminal device, a cumulative average, and an estimated rate per minute based on the current interval. Under certain conditions these numbers will represent the actual number of user transactions (e.g., posting a payment, inquiring on an account, etc.). An inaccurate reading will occur if multiple carriage returns per screen are used for data entry. VPLUS status checks are not counted by the measurement interface that SOS/3000 accesses, so transaction counts for VPLUS applications will be consistent and quite accurate; counts for character mode transactions are fairly accurate. The best way to tell whether terminal reads and transactions are equivalent is to test them: have a user enter a specific number of transactions, defined from the user’s standpoint, and track that activity via SOS/3000 to check for discrepancies.
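The per-minute estimate and the suggested equivalence test both come down to simple arithmetic, sketched below. The exact formula SOS/3000 uses is not given in this chapter, so scaling the interval count to 60 seconds is an assumption. The function names are hypothetical.

```python
def trans_rate_per_min(trans_count: int, interval_secs: float) -> float:
    """Estimate transactions per minute from the count seen in one interval."""
    if interval_secs <= 0:
        raise ValueError("interval length must be positive")
    return trans_count * 60.0 / interval_secs


def reads_per_transaction(reported_reads: int, user_transactions: int) -> float:
    """Compare the reported terminal reads with transactions counted by the user.

    A result near 1.0 suggests that terminal reads and user transactions are
    equivalent for this application; a larger value suggests that each
    transaction involves several terminal reads.
    """
    if user_transactions <= 0:
        raise ValueError("the user must enter at least one transaction")
    return reported_reads / user_transactions
```

For example, 5 transactions in a 30-second interval is an estimated rate of 10 per minute, and 20 reported reads for 10 user-counted transactions means about two terminal reads per transaction.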

    Process Wait State Statistics

    These counters represent the Wait States in which processes can spend time. In other words, if a process is experiencing eight second response times, the percentages displayed in these Wait state categories represent the delay or servicing reasons. It is ideal for a process to continue unhindered. However, a process is usually impeded over the course of its life.

    A hindrance could mean a missing memory segment, disc data, or perhaps prevented access to a TurboIMAGE database. If you notice that a particular user’s process is receiving poor response times, or a batch job is taking more time to complete than is reasonable, examine these wait reasons. You can view them in the Extended Process line or on the individual Process Detail Screen. Cumulative Wait State figures are also provided on the Process Detail Screen.

    The ideal throughput for a process is achieved when it does not have to stop for any reason. In other words, it has full use of the CPU. The following discussion describes the “brick walls”, other than the CPU, that can slow down a process’ progress.
    Each data item is described in Table 16.7.
    Table 16.7 SOS Process Wait State data items
    Data Item
    This Wait State is the percentage of the process’ response time due to being serviced by the CPU. It takes a certain amount of CPU time to perform the various commands of processes.
    Performance Tip
    For processes that are computation-intensive, you will usually see a high number in this category. It is possible that a process exhibiting close to 100% here is in a looping state, especially if the program is not completing as desired.
    This Wait State is the percentage of the process’ response time due to waiting for missing memory segments to be brought into main memory. When a process wants to continue to run but cannot because memory segments are missing, it is blocked. Memory fault stop time is counted in this category.
    Performance Tip
    On systems with too little main memory to support current demands, values in this category may exceed 10%. Systems with a severe memory shortage will show high memory wait percentages for most user processes, even those needing modest amounts of memory. If only a few processes report values of 20-30% or more, you should look at their individual memory requirements. A particular application may be gorging itself on memory space. If this is so, a redesign of that program is warranted. Remember that when dealing with process “brick walls” (in this case, absent memory segments), small percentages are desirable. Less than 10% in this Wait State is preferable.
    This Wait State is the percentage of the process’ response time due to waiting for missing data to be brought into main memory from disc. An I/O “brick wall” occurs when a process wants to continue running but cannot because user-requested data is not in memory and must be read from disc. Because a process is stopped and the CPU is taken away whenever a physical disc access is performed, it is important to minimize this percentage.
    Performance Tip
    If you notice that the CPU pause for disc I/O time (Global Section) is above 10-15% most of the time, you will usually find that one or more processes are spending a moderate-to-high percentage of their processing time waiting for disc I/Os to complete. If a process is consistently waiting more than 20-30% of its time on disc I/O servicing, you should find out why. There are a number of reasons why I/O bottlenecking can take place. Some common culprits are:
  • TurboIMAGE master and detail set inefficiencies.
  • Inefficient pre-fetching operation (lack of CPU, memory, poor I/O locality).
  • Too many I/O-demanding processes running concurrently, etc.
    Imp
    This Wait State is the percentage of the process’ response time due to being impeded by various lock and latch control mechanisms. This category includes many stop reasons. An impede occurs when a process tries to gain access to a software table or control structure and cannot because other processes arrived first. TurboIMAGE access is one of the most common sources of impedes. When a process tries to gain entry to a particular dataset and another process has that set locked via the DBLOCK intrinsic, the waiting process is counted as having been impeded. It must wait until the prior process is finished with its current operation before it can continue.
    Any file can have only one disc request outstanding. That is, in order for a process to access even a simple MPE/iX flat file it must first gain control of that file’s control block. This access is not by the FLOCK intrinsic, which is the case in the other wait state bucket. Rather, only one user—regardless of programmatic locking—can gain access at a time. Other sources of impedes include unavailable system table entries, terminal buffers, etc.
    Performance Tip
    The interpretation of impedes can be difficult because there are potentially many causes and interrelationships between processes and resources. First of all it is best to determine the overall global impede rate. Do this by looking at the Impede Value on the Global Process Stop Reasons screen. If the global impede percentage is consistently high it is important to look at individual processes that have high impede percentages as part of their processing time. Processes accessing the same database in applications where poor locking strategies are implemented tend to spend a very large percentage of their time being impeded. It is not uncommon to see values in excess of 60% for processes in the impeded Wait State. A large percentage may point to poor locking or can simply indicate that a great deal of competition exists for a particular file.
    This is the percentage of the process’ response time due to preemption by other processes. A preemption occurs when a process is forced to give up use of the CPU because a higher priority process is ready to execute.
    Performance Tip
    If both interactive and batch processes are running, batch processes in lower queues will receive a higher number of preemptions than those running in the interactive queue. If interactive users are spending too much response time being preempted it is possible that there is not enough CPU horsepower to go around. Backing off on demand or increasing the supply are the only recourses. Doling out the CPU resource by means of the TUNE command or a queue manager program may help. The basic strategy is to give less CPU attention to those who can stand it and provide more to those who really need it.
    The percentage of time the process is waiting for a Resource Identification Number.
    The percentage of time the process is waiting for terminal writes to complete. Since terminal output is usually buffered this will only accumulate time if the system runs out of terminal buffers or if the program is blocking on terminal output.
    The percentage of time the process is waiting for non-disc I/O to complete (e.g., tape drive activity). Datacomm overhead is accumulated in this bucket as well.
    The percentage of time the process is waiting for a programmatic timer (such as the PAUSE intrinsic) to complete.
    The percentage of time the process is waiting for activation by its father or a son process.
    The percentage of time the process is waiting on a message file, port, or sendmail/receivemail wait.
    The percentage of time the process is waiting on other events not covered by the above definitions.
    Current Wait
    This item represents the state of the process when SOS/3000 took a picture of the system for the current interval. It is helpful because, if a process is hindered, you can find out why. This single wait state indicator is only a first line of defense if you suspect an impedance problem. You should take an in-depth look at that process’ Wait State breakdown or go to the Process Detail screen for that process. Keep in mind that the Wait State of a process can change radically over even a few seconds. These states are defined in Table 16.8.
    Performance Tip
    If a process is always in a particular wait condition this could be a sign of resource shortage or a logical program problem (database locking strategy issue). For example, if the Mem flag is on for multiple processes it can point to a memory shortage condition for the entire system.
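The rules of thumb in the Wait State Performance Tips can be collected into a simple screening check. The sketch below uses the figures quoted in this section: 10% for memory waits, the lower end of the 20-30% range for disc waits, and 60% for impedes. Treating 60% as an alert level for impedes is an assumption, since the text only says such values are not uncommon under poor locking. The bucket labels `Mem`, `Disc`, and `Imp`, the `THRESHOLDS` table, and the function name are hypothetical.

```python
from typing import Dict, List

# Screening levels, in percent of response time (see the lead-in for their origin).
THRESHOLDS = {
    "Mem": 10.0,   # less than 10% in the memory Wait State is preferable
    "Disc": 20.0,  # consistently above 20-30% on disc I/O deserves a closer look
    "Imp": 60.0,   # assumption: treat impede values above 60% as worth reviewing
}


def flag_wait_states(wait_pct: Dict[str, float]) -> List[str]:
    """Return the names of the Wait States that exceed their screening level."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if wait_pct.get(name, 0.0) > limit
    )
```

A single interval proves little, because wait states change quickly. A check like this is more useful when the same state is flagged over many consecutive intervals.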
    Table 16.8 SOS Current Wait data items
    Data Item
    Waiting for non-disc I/O to complete.
    Currently active in the CPU resource.
    This process has terminated and will not show during the next interval.
    Waiting for a disc I/O to complete.
    Waiting for activation by its father or son process.
    Waiting due to unavailable resources. Examples include database locks and a lack of system table entries.
    Waiting for a segment(s) to be brought into memory.
    Waiting for message file I/O.
    This process has been preempted by a higher priority process.
    Waiting for a RIN to become available.
    Waiting for a timer to expire.
    Waiting for a terminal read to complete.
    Waiting on a terminal write to complete.
    Waiting for a miscellaneous condition to complete.

    File Usage

    In this section, information is displayed regarding any files that a process has open. The current record pointer and the number of times the file has been opened by processes (globally) are provided.
    Performance Tip
    It is often helpful to find out which files a job or session has opened. Suppose, for example, that a particular job is exhibiting an abnormally high pause for disc I/O. The next logical step is to find out what files it has accessed. If one of the main files is a database, you should be suspicious of its internal efficiency. The record pointer (Rec Ptr) may be helpful in determining progress through a serially read MPE file, but not through a TurboIMAGE database. If the file name is followed by a “T”, the file in question is a temporary file. The completed number (%) is helpful for serially read MPE files because it indicates how far a process has progressed through that file. This number will not be useful if the file is being randomly accessed.
    Each data item is described in Table 16.9.
    Table 16.9 SOS File Usage data items
    Data Item
    The name of the file that is opened and used by a process.
    The total number of opens by all processes that are outstanding against this file.
    The code that represents how the file is being accessed by the process. R means read. W means write. L means lock.
    File Size
    The size of the file.
    Rec Ptr
    The current record number being accessed.
    The number that will indicate how far a process has progressed through that file.
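For a serially read file, the completed percentage follows directly from the record pointer and the file size. The sketch below assumes that File Size is expressed in records, the same unit as Rec Ptr; this chapter does not state the unit. The function name is hypothetical.

```python
def percent_complete(rec_ptr: int, file_size_records: int) -> float:
    """Estimate how far a serial read has progressed through a file, in percent."""
    if file_size_records <= 0:
        return 0.0  # assumption: an empty file is reported as 0% complete
    # Clamp to 100 in case the pointer sits just past the last record.
    return min(100.0, 100.0 * rec_ptr / file_size_records)
```

A record pointer of 2500 in a 10000-record file is 25% complete. As noted above, the figure means little for a randomly accessed file.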

    Other Process Detail Information

    SOS/3000 provides additional aids for examining process-related situations: the OPTION KEYS and the UTILITY KEYS.
    Option Keys
    The option keys are described in Table 16.10.
    Table 16.10 Option Keys
    F1 - Memory Lock
    When enabled, Terminal Memory Lock is turned on below FILE USAGE. This causes the file display to scroll under FILE USAGE.
    F2 - Stack Trace/Open Files
    This function key toggles between Stack Marker Tracing and the Open Files display. When F2 is pressed while it is labeled Stack Trace, a list of Stack Trace markers is displayed for the process. Note that the first time this command is executed in an SOS/3000 session, the initial display can take 30 seconds or more. Also note that Memory Lock is disabled when you access Stack Trace. Pressing F2 again toggles back to open file usage (discussed above). Only users with PM capability can display Stack Trace markers, but all users can see Open Files.
    F3 - Other Opens
    When enabled, this key will show all other accessors of the files opened by this process.
    F8 - Main Keys
    When executed in this context the function key labels will revert to those within the Process Detail Display.
    The Utility Keys are shown in Table 16.11.
    Table 16.11 Utility Keys
    F1 - MPE/iX Command
    Enter the SOS/3000/MPE/iX command interface.
    F3 - Queue Jump
    Alters the priority or queue of an executing process.
    F4 - Process Tree
    Graphically represents the PROCESS TREE showing the father and sons of the selected process.
    F5 - Job/Sess Tree
    Entering a job or session number in the form of “Jnnn” or “Snnn” will display the process tree for the requested job or session.
    F6 - File Users
    Entering an adequately qualified file name will display all users who are currently accessing that file. This is the same function as the "File Users Screen" described in "SOS File Users", which is obtained by pressing F on the SOS/3000 Screen Selection Menu.
    F7 - Filerpt Analysis
    This command will cause the programmatic execution of the program FILERPT. SOS will attempt to run the program from UTIL.LPS then UTIL.SYS and finally PUB.SYS. The program is included as an unsupported utility program with your SOS/3000 software.
    F8 - Main Keys
    When executed in this context the function key labels will revert to those within the Process Detail Display.
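The FILERPT search order described for F7 is a first-match lookup across three group and account pairs. The sketch below illustrates that order only and does not reproduce how SOS/3000 performs the lookup. The `exists` callback stands in for whatever file check is available, and the function name is hypothetical.

```python
from typing import Callable, Optional

# Search order stated for FILERPT: UTIL.LPS, then UTIL.SYS, then PUB.SYS.
SEARCH_GROUPS = ("UTIL.LPS", "UTIL.SYS", "PUB.SYS")


def locate_program(name: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first fully qualified name that exists, or None if none does."""
    for group in SEARCH_GROUPS:
        candidate = f"{name}.{group}"  # MPE form: file.group.account
        if exists(candidate):
            return candidate
    return None
```

If copies exist in both UTIL.SYS and PUB.SYS, the sketch selects `FILERPT.UTIL.SYS`, because that group comes earlier in the order.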
