SOS Process Detail
The Process Detail screen lets you examine one particular process at a time in microscopic
detail. Although some of the statistics are the same as those on the Process or Extended Process
Display lines, many new data items are provided. This screen also displays averaged data.
To access the Process Detail screen from the Global Summary screen:
Type S from the SOS Enter command: prompt to view the Screen Selection Menu screen or press P. You also may use the HOG PROC ZOOM key (F4).
From the Screen Selection Menu screen, enter P (Process Detail Screen). You will see this prompt: “Enter process identification number of process to display:”. Enter a valid PIN to display the Process Detail screen. HOG PROC ZOOM automatically supplies the PIN of the process using the most CPU. Figure 16.1 shows an example of the screen.
One feature provided on this screen is the family lineage for the current process. By pressing the
UTILITY KEYS function key (F6), and then the PROCESS TREE function key (F4), you will see a
graphic format of the father and its related son processes. This will be helpful when dealing with
process-handling issues.
Figure 16.1 SOS Process Detail screen
Process Detail Screen Keys
Each of the Process Detail Screen keys is listed and explained in the following table.
Table 16.1 Process Detail Screen keys
| Key | Usage |
| Enter | Refresh screen |
| E | Return to Global Summary screen |
| F | Screen Freeze |
| H | Help System |
| J | Jump to new screen |
| L | Print Hardcopy |
| M | Toggle Memory Lock |
| O | Toggle Show Other File Opens |
| P | New Process Detail Screen |
| Q | Queue Jump |
| R | Toggle Stack Trace |
| S | Jump to SOS Screen Selection menu |
| T | Display Process Tree |
| U | Display File Users |
| V | Launch FILERPT |
| X | Exit |
| Z | Zero Cumulative Totals |
| ! | Execute Shell Commands |
| : | Execute Shell Commands |
| # | Display Job/Session Tree |
| ? | Help System |
| CTRL T | Toggle Timer Status |
| CTRL W | Toggle Wait Index Info |
Process Detail Screen Display Items
Table 16.2 SOS Process Detail data items
|
Data Item
|
Description
|
|
PIN
|
This number stands for the Process Identification Number (PIN). Each process is uniquely identified by its own PIN. The easiest way to locate processes is by knowing this number. A single job or session can have many processes associated with it.
|
|
Sess/Job
|
The job or session number associated with the particular process. If the process is a system type (not originating from a user job or session), <sys> will appear in this column.
|
|
LDev
|
The logical device number of the device where the process was created. Batch jobs will display the streams device number here (usually 10). System processes will have a “-.” This column is helpful to track down a particular user whose process is exhibiting unique traits. Jobs or processes that are in the process of terminating may display an erroneous number here. This does not indicate a problem.
|
|
Prog
|
The program or last MPE/iX command executed by the user. Some system-type program names will be uniquely identified, such as “Spooler”. If the process is a Command Interpreter, then “ci:xxxx” will appear in this column, where “xxxx” is the last MPE/iX command the user or job issued.
|
|
User
|
The logon sequence as initiated by the user or job, minus the logon group. If the process was spawned by MPE rather than by a session or job, System Process will be displayed. The name of that process appears in the Prog column.
|
|
Fath/Bro/Son
|
These numbers are the Process Identification Numbers (PINs) of three relatives of the current process, in order: the father process that created the current process, the brother process (if any) created by the same father, and the first son process (if any) created by the current process. By traversing the process tree you can identify all relatives associated with the current process. You may also press the PROCESS TREE function key to see the lineage of the current process in graphic format.
|
|
Pri
|
The first two letters identify the dispatch subqueue in which the process is executing. The first letter is the queue. The second indicates whether the subqueue is linear (L, fixed priority) or circular (S). Only the C, D, and E queues can have the S subqueue. The number that follows is the absolute priority that the MPE dispatcher uses to determine which process gets the CPU next. The possible letters for the queue are described in Table 16.3.
|
|
Type
|
This label indicates whether the process began in compatibility mode (CM) or native mode (NM). This flag will not indicate the current mode of the program.
|
|
CM%
|
The percentage of the CPU used for compatibility mode operations when this process was using the CPU.
|
|
CM->NM
|
The number and rate per second (nnn/s) of compatibility mode to native mode switches performed by the process.
|
|
NM->CM
|
The number and rate per second (nnn/s) of native mode to compatibility mode switches performed by the process.
|
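The Fath/Bro/Son entry in Table 16.2 describes a first-son/next-brother linkage, so the sons of a process can be listed by following its son PIN once and then the chain of brother PINs. The following Python sketch illustrates that traversal only. The `ProcLinks` record, the use of 0 for "no such relative", the assumption that each brother PIN names the next sibling, and the sample PINs are all hypothetical and are not taken from SOS/3000.

```python
from dataclasses import dataclass
from typing import Dict, List

NO_PIN = 0  # assumption: 0 means "no such relative"


@dataclass(frozen=True)
class ProcLinks:
    """Hypothetical record of the Fath/Bro/Son PINs shown for one process."""
    father: int
    brother: int
    son: int


def sons_of(pin: int, table: Dict[int, ProcLinks]) -> List[int]:
    """List the sons of `pin`: its first son, then that son's brothers."""
    sons: List[int] = []
    current = table[pin].son
    # Stop at the end of the chain, at an unknown PIN, or on a repeated PIN.
    while current != NO_PIN and current in table and current not in sons:
        sons.append(current)
        current = table[current].brother
    return sons


def descendants(pin: int, table: Dict[int, ProcLinks]) -> List[int]:
    """Depth-first list of every process below `pin` in the tree."""
    found: List[int] = []
    for son in sons_of(pin, table):
        found.append(son)
        found.extend(descendants(son, table))
    return found


# Made-up PINs for illustration only.
SAMPLE = {
    40: ProcLinks(father=1, brother=NO_PIN, son=52),
    52: ProcLinks(father=40, brother=57, son=NO_PIN),
    57: ProcLinks(father=40, brother=NO_PIN, son=61),
    61: ProcLinks(father=57, brother=NO_PIN, son=NO_PIN),
}

if __name__ == "__main__":
    print(sons_of(40, SAMPLE))
    print(descendants(40, SAMPLE))
```

In practice the PROCESS TREE function key shows the same lineage directly; the sketch is only meant to make the relationship between the three PINs concrete.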
Table 16.3 SOS Queue items
|
Queue Item
|
Description
|
|
AL
|
A very high priority linear subqueue. This queue is usually reserved for highest priority MPE system processes that need immediate and adequate CPU time. Linear means that the process priority does not usually change. It is fixed.
|
|
BL
|
A high priority queue. This queue is used by some lower priority MPE system processes and by some very high priority user processes. For example, logging on to a system with a “PRI=BS” parameter will allow your terminal to receive more CPU attention than those in lower queues. You should be cautious when running processes in this queue. If a looping condition takes place, often the only remedy is to restart the system! This is because processes in the A and B queues generally will not give up control of the CPU until they are through with it. This queue is generally linear, but it is possible to assign a process to the circular queue with priorities falling in the B queue range.
|
|
CS
|
This subqueue is the one in which normal interactive sessions run. When you log on at a terminal, your Command Interpreter Process (the process that allows you to dialogue with an MPE/iX prompt) is assigned a priority of 152 in this queue unless the default queue settings have been altered. As your process uses more CPU time than the average of the last 100 transactions, its priority is decremented (increased numerically, and therefore logically lower in priority). The net effect is that HOG interactive transactions are penalized: they have less chance of getting CPU time. Short transactions are rewarded by maintaining a higher priority. It is by this method that MPE/iX tries to allocate resources fairly among competing processes.
|
|
DS
|
This subqueue is commonly used for high priority batch jobs. The rules for this and the E queue are described below and are similar to those of the C queue. Processes fall in priority as they exceed the filter values. In the CS queue this is the dynamically calculated SAQ (System Average Quantum) value. For the D and E queues these values are the MINQUANTUM and MAXQUANTUM.
|
|
ES
|
This subqueue is typically used for lower priority batch jobs. Processes running at low priority will only get table scraps of CPU time. Processes running at higher priorities leave leftovers for these lower priority processes.
Performance Tip
If you see a process in the linear queue that consumes a lot of CPU time, it is possibly the culprit causing a bottleneck. If other processes are congregating at a low priority and are not getting enough CPU time you should use the TUNE command to help them derive more. You can manipulate the TUNE command to perform several actions. Do not be afraid to take advantage of its capabilities.
|
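As a rough illustration of how a Pri value breaks down into queue, subqueue type, and absolute priority, the following Python sketch parses a value such as `CS152`. The exact on-screen layout (two letters followed by digits) is an assumption, and `Pri`, `parse_pri`, and `outranks` are hypothetical names introduced here, not part of SOS/3000 or MPE/iX.

```python
import re
from typing import NamedTuple

# L and S meanings follow the Pri description in Table 16.2.
SUBQUEUE_KIND = {"L": "linear", "S": "circular"}


class Pri(NamedTuple):
    queue: str      # A through E
    kind: str       # "linear" (fixed priority) or "circular"
    priority: int   # absolute dispatcher priority


def parse_pri(text: str) -> Pri:
    """Split a Pri value such as 'CS152' into its parts.

    Assumed layout: two letters, then digits, optionally separated by spaces.
    """
    match = re.fullmatch(r"\s*([A-E])([LS])\s*(\d+)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized Pri value: {text!r}")
    queue, kind, number = match.groups()
    return Pri(queue, SUBQUEUE_KIND[kind], int(number))


def outranks(a: Pri, b: Pri) -> bool:
    """True when `a` is dispatched ahead of `b`.

    A numerically smaller priority is logically higher.
    """
    return a.priority < b.priority


if __name__ == "__main__":
    print(parse_pri("CS152"))
    print(outranks(parse_pri("BL100"), parse_pri("CS152")))
```

The comparison in `outranks` reflects the CS description in Table 16.3: a priority that increases numerically is logically lower.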
CPU Usage
The CPU Usage portion of the Process Detail screen contains information and explanations of a
process’ CPU resource usage. Each data item is described in the next table.
Table 16.4 SOS CPU Usage data items
|
Data Item
|
Description
|
|
System%
|
This percentage reflects the amount of the total CPU capacity consumed by this process during the current interval. If a process uses more than zero but less than or equal to 0.1% of the CPU, then .<% is displayed. This is to let you know that some time was spent on the process, although very little (between 0 and 0.1%).
Performance Tip
The high CPU user, HOG, is displayed in the Advice Section. It is very important to isolate the currently active, high CPU consumer because it is often the performance problem. It is possible to spot a program looping condition if it consumes a lot of the CPU’s attention and breaks little or not at all for other events. An even distribution of the CPU among processes over a period of time is desirable. If a process should be getting CPU time and is not, you should look at the Current Wait reason (discussed below) to see why not. This process may be waiting on resources to be released in order to continue. Looking at the Process Wait states will reveal even more.
|
|
Ms Used
|
These numbers represent the current and cumulative amount of CPU milliseconds consumed by the process respectively. These milliseconds represent the time processes spent at the CPU watering hole for service. “Current” means the interval specified by the I:nn:nn at the top banner line. The cumulative number is unique because it represents the total number of CPU milliseconds that were consumed since the process was created and not just since SOS/3000 started. So if the process under study was started hours ago you will see a large cumulative value for the “CPU Ms Used”.
Performance Tip
One of the first things you can tell about a process is whether or not it has received any CPU attention during the last interval. If the current value is zero then the process was not active during the last interval. These numbers will also quantitatively indicate which processes are consuming the most and the least CPU.
|
|
Per Trans
|
The average number of CPU milliseconds consumed by the process per transaction.
|
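The relationship between the CPU Usage items can be shown with a little arithmetic. The Python sketch below is not how SOS/3000 computes its figures; it assumes a single CPU (so capacity in an interval is the interval length in milliseconds), and the function names and the formatting of ordinary values are hypothetical. Only the `.<%` marker for very small values comes from the System% description.

```python
from typing import Optional


def system_pct(cpu_ms: float, interval_s: float) -> float:
    """Share of one CPU's capacity used during the interval, in percent.

    Assumes a single CPU, so capacity is interval_s * 1000 milliseconds.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * cpu_ms / (interval_s * 1000.0)


def show_system_pct(pct: float) -> str:
    """Apply the tiny-value rule; formatting of other values is assumed."""
    if 0.0 < pct <= 0.1:
        return ".<%"
    return f"{pct:.1f}"


def ms_per_trans(cpu_ms: float, transactions: int) -> Optional[float]:
    """Average CPU milliseconds per transaction, or None with no transactions."""
    if transactions <= 0:
        return None
    return cpu_ms / transactions


if __name__ == "__main__":
    # Hypothetical interval: 1500 CPU ms used in 30 seconds over 6 transactions.
    print(show_system_pct(system_pct(1500, 30)))
    print(ms_per_trans(1500, 6))
```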
Disc I/O Usage
The Disc I/O Usage portion of the Process Detail screen includes data describing the various
aspects of a process’ disc I/O resource usage. Within the framework of MPE/iX, disc I/O is
usually not a bottleneck. However, it is important to pay close attention to applications exhibiting
abnormally high disc I/O activity. Each data item is described in Table 16.5.
Table 16.5 SOS Disc I/O Usage data items
|
Data Item
|
Description
|
|
I/Os Total
|
The first value of this pair is the total number of physical disc I/Os generated by the process during the current interval. The second “[n]” is the cumulative number of I/Os for the process since it began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
|
|
Reads
|
The first value of this pair is the number of logical read disc I/Os generated by the process during the current interval. The second “[n]” is the cumulative number of read I/Os for the process since the process began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
|
|
Writes
|
The first value of this pair is the number of logical write disc I/Os generated by the process during the current interval. The second “[n]” is the cumulative number of write I/Os for the process since the process began. If SOS/3000 was started after the process began, this value will reflect the disc I/Os that accumulated since the beginning of SOS/3000.
|
Performance Tip
These absolute logical I/O numbers will help you characterize processes in terms of trips to disc. In the case of MPE/iX pre-fetching some I/Os will be eliminated. Only those I/Os unsatisfied in memory will be retrieved from disc and will be reflected in these numbers.
|
|
Rate Total
|
This value is the average number of total logical disc I/Os per second generated by the process during the current interval.
|
|
Read
|
This value is the average number of logical disc I/O reads per second generated by the process during the current interval.
|
|
Write
|
This value is the average number of logical disc I/O writes per second generated by the process during the current interval.
|
Performance Tip
These I/O rates will help you characterize processes in terms of the rate of physical trips to disc. In the case of MPE/iX pre-fetching, some I/Os will be eliminated. Only those I/Os unsatisfied in memory will be retrieved from disc and be reflected in these numbers.
|
Response and Transaction Statistics
Each data item from the Response/Transaction portion of the Process Detail screen is described
in Table 16.6.
Table 16.6 SOS Response/Transaction data items
|
Data Item
|
Description
|
|
Prompt Resp
First Resp
|
These numbers represent the terminal read response times for interactive users. First Resp is the response time for the user from the time C/R or Enter is pressed to when the first character appears on the screen. Prompt Resp is the response time for the user from when C/R or Enter is pressed to when the first prompt appears at which the user can enter a new transaction. There are a number of things to keep in mind when discussing response times. Refer to the discussion of Transactions and Response Times, under "Global Misc Statistics (tabular format)" for a detailed explanation.
Performance Tip
Excessively high response times should be investigated. Heavy terminal activity can drain the CPU’s attention with nonproductive overhead tasks. Impedances can cause excessive response times. It is important to analyze the Wait State percentages. These are shown on the Extended Process Display line or at the Process Detail screen (Process Wait States). Be sure you understand the difference between First and Prompt response times. If you have a lot of on-line reporting the Prompt response times will be substantially larger thus skewing the true system response time. In this case the First response will be more meaningful in tracking the rate at which the system is sending data back to the user’s terminal.
|
|
Trans Count
Trans Rate/min
|
These numbers represent the current number of terminal transactions (possibly equivalent to terminal reads) performed by the process to a particular terminal device, a cumulative average, and an estimated rate per minute based on the current interval. Under certain conditions these numbers will represent the actual number of user transactions (e.g., posting a payment, inquiring on an account, etc.). An inaccurate reading will occur if multiple carriage returns per screen are used for data entry. VPLUS status checks are not counted by the measurement interface that SOS/3000 accesses, so transaction counts for VPLUS applications will be quite accurate. Counts for character mode transactions are fairly accurate. The best way to tell if terminal reads and transactions are equivalent is to test them. You can have a user enter a specific number of transactions, defined from the user's standpoint, and track that activity via SOS/3000 to check for discrepancies.
|
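The per-minute estimate and the discrepancy test described for Trans Count can be expressed as simple arithmetic. This Python sketch is only a worked example under stated assumptions: the formula scales the current interval's count to 60 seconds, and the function names and sample figures are hypothetical, not taken from SOS/3000.

```python
def trans_rate_per_min(trans_count: int, interval_s: float) -> float:
    """Scale the transactions seen in one interval to a per-minute rate."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return trans_count * 60.0 / interval_s


def reads_per_transaction(counted_reads: int, user_transactions: int) -> float:
    """Terminal reads counted for each transaction the user actually entered.

    A result near 1.0 suggests terminal reads and user transactions match.
    """
    if user_transactions <= 0:
        raise ValueError("user_transactions must be positive")
    return counted_reads / user_transactions


if __name__ == "__main__":
    # Hypothetical figures: 12 transactions in a 30-second interval.
    print(trans_rate_per_min(12, 30))
    # Hypothetical test: the user entered 10 transactions, 30 reads were counted.
    print(reads_per_transaction(30, 10))
```

A ratio well above 1.0 in the second calculation would point to multiple carriage returns per screen, which is the inaccuracy the Trans Count description warns about.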
Process Wait State Statistics
These counters represent the Wait States in which processes can spend time. In other words, if a
process is experiencing eight-second response times, the percentages displayed in these Wait
State categories represent the delay or servicing reasons. It is ideal for a process to continue
unhindered. However, a process is usually impeded over the course of its life.
A hindrance could mean a missing memory segment, disc data, or perhaps prevented access to
a TurboIMAGE database. If you notice that a particular user’s process is receiving poor response
times, or a batch job is taking more time to complete than is reasonable, examine these wait
reasons. You can view them in the Extended Process line or on the individual Process Detail
Screen. Cumulative Wait State figures are also provided on the Process Detail Screen.
A process achieves ideal throughput when it does not have to stop for any reason.
In other words, it derives full use of the CPU. The following discussion describes the other “brick
walls” that can slow down a process’ progress (with the exception of CPU).
Table 16.7 SOS Process Wait State data items
|
Data Item
|
Description
|
|
CPU
|
This Wait State is the percentage of the process’ response time due to being serviced by the CPU. It takes a certain amount of CPU time to perform the various commands of processes.
Performance Tip
For processes that are computation-intensive, you will usually see a high number in this category. It is possible that a process exhibiting close to 100% here is in a looping state, especially if the program is not completing as desired.
|
|
Mem
|
This Wait State is the percentage of the process’ response time due to waiting for missing memory segments to be brought into main memory. When a process wants to continue to run but cannot because memory segments are missing, it is blocked. Memory fault stop time is counted in this category.
Performance Tip
For systems having an inadequate amount of main memory to support current demands, numbers may be greater than 10% in this category. Systems exhibiting severe memory shortage will show high memory wait percentages for most user processes, even those needing modest amounts of memory. If only a few processes report values greater than or equal to 20-30%, you should look at their individual memory requirements. A particular application may be gorging itself on memory space. If this is so, a redesign of that program is warranted. Remember that when dealing with process “brick walls” (in this case, absent memory segments), small percentages are desirable. Less than 10% in this Wait State is preferable.
|
|
Dsc
|
This Wait State is the percentage of the process’ response time due to waiting for missing data to be brought into main memory from disc. An I/O “brick wall” occurs when a process wants to continue running but cannot because user-requested data is not yet in memory and must be read from disc. Since a process is literally stopped and the CPU is taken away when a physical disc access is performed, it is absolutely necessary to minimize this percentage.
Performance Tip
If you notice that the CPU pause for disc I/O time (Global Section) is rising above 10-15% most of the time, you will usually find that one or more processes are spending a moderate-to-high percentage of their processing time waiting for disc I/Os to complete. If a process is consistently waiting more than 20-30% of its time on disc I/O servicing, then you should find out why. There are a number of reasons why I/O bottlenecking can take place. Some common culprits are:
TurboIMAGE master and detail set inefficiencies.
Inefficient pre-fetching operation (lack of CPU, memory, poor I/O locality).
Too many I/O-demanding processes running concurrently, etc.
|
|
Imp
|
This Wait State is the percentage of the process’ response time due to being impeded by various lock and latch control mechanisms. This category includes many stop reasons. An impede occurs when a process tries to gain access to a software table or control structure and cannot because other processes arrived first. TurboIMAGE access is one of the most common sources of impedes. When a process tries to gain entry to a particular dataset and another process has that set locked via the DBLOCK intrinsic, the waiting process is counted as having been impeded. It must wait until the prior process is finished with its current operation before it can continue.
Any file can have only one disc request outstanding. That is, in order for a process to access even a simple MPE/iX flat file it must first gain control of that file’s control block. This access is not by the FLOCK intrinsic, which is the case in the other wait state bucket. Rather, only one user—regardless of programmatic locking—can gain access at a time. Other sources of impedes include unavailable system table entries, terminal buffers, etc.
Performance Tip
The interpretation of impedes can be difficult because there are potentially many causes and interrelationships between processes and resources. First of all it is best to determine the overall global impede rate. Do this by looking at the Impede Value on the Global Process Stop Reasons screen. If the global impede percentage is consistently high it is important to look at individual processes that have high impede percentages as part of their processing time. Processes accessing the same database in applications where poor locking strategies are implemented tend to spend a very large percentage of their time being impeded. It is not uncommon to see values in excess of 60% for processes in the impeded Wait State. A large percentage may point to poor locking or can simply indicate that a great deal of competition exists for a particular file.
|
|
Pre
|
This is the percentage of the process’ response time due to preemption by other processes. A preemption occurs when a process is forced to give up use of the CPU because a higher priority process is ready to execute.
Performance Tip
If both interactive and batch processes are running, batch processes in lower queues will receive a higher number of preemptions than those running in the interactive queue. If interactive users are spending too much response time being preempted it is possible that there is not enough CPU horsepower to go around. Backing off on demand or increasing the supply are the only recourses. Doling out the CPU resource by means of the TUNE command or a queue manager program may help. The basic strategy is to give less CPU attention to those who can stand it and provide more to those who really need it.
|
|
RIN
|
The percentage of time the process is waiting for a Resource Identification Number.
|
|
|
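To make the Wait State percentages concrete, the Python sketch below converts them into seconds of a given response time and flags the states that the Performance Tips in Table 16.7 suggest reviewing. It is an illustration only: the Mem (10%) and Dsc (20%) limits are taken from the lower bounds quoted in those tips, the CPU limit of 95% is an assumed stand-in for "close to 100%", and the function names and sample percentages are hypothetical.

```python
from typing import Dict, List

# Limits in percent. Mem and Dsc come from the Performance Tips;
# the CPU value is an assumed stand-in for "close to 100%".
DEFAULT_LIMITS: Dict[str, float] = {"Mem": 10.0, "Dsc": 20.0, "CPU": 95.0}


def seconds_by_state(response_s: float, pct: Dict[str, float]) -> Dict[str, float]:
    """Convert Wait State percentages into seconds of the response time."""
    return {state: response_s * share / 100.0 for state, share in pct.items()}


def states_to_review(pct: Dict[str, float],
                     limits: Dict[str, float] = DEFAULT_LIMITS) -> List[str]:
    """Names of Wait States whose percentage meets or exceeds its limit."""
    return [state for state, limit in limits.items()
            if pct.get(state, 0.0) >= limit]


# Made-up percentages for a process with an eight-second response time.
WAIT_SAMPLE = {"CPU": 25.0, "Mem": 5.0, "Dsc": 40.0,
               "Imp": 20.0, "Pre": 10.0, "RIN": 0.0}

if __name__ == "__main__":
    for state, seconds in seconds_by_state(8.0, WAIT_SAMPLE).items():
        print(f"{state}: {seconds:.1f} s")
    print(states_to_review(WAIT_SAMPLE))
```

No limit is set for Imp, Pre, or RIN because the tips give no firm threshold for them; pass a different `limits` dictionary to experiment with site-specific values.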