The REDWOOD Tool
REDWOOD is a logfile analyzer that specializes in looking at file-close records from MPE XL, MPE V, and MPE IV logfiles.
Operation
Optimizing disk I/O performance can be a costly and time-consuming job. REDWOOD can make this process easier by identifying the most frequently accessed files on the system. Thus, when you do choose to optimize your system, you can be sure that your time is being spent productively. System optimization can yield a significant decrease in execution time. Determining which files to optimize to achieve these kinds of results is a matter of analyzing the frequency of logical access, physical access, and the number of times a file is opened. This is the type of information that REDWOOD provides.
REDWOOD makes a compressed copy of the data in the system log file(s) and places it into a user-defined summary file. REDWOOD uses this summary file to create the reports you design.
As you use REDWOOD, you will notice that it frequently displays a "(CR = <value>)" at user prompts. This is REDWOOD’s way of showing default choices. Press "CR" (Enter) to select the default.
REDWOOD analyzes both MPE V log files and MPE/iX log files. Before using REDWOOD, you will need to make sure that FCLOSE logging is enabled so that REDWOOD has something to report on. Use the SYSGEN utility to check and, if necessary, modify your system’s configuration to include FCLOSE logging.
Refer to the System Startup and Shutdown manual for details on modifying system log files. Or, use the instruction sequence that follows.
Getting Started
The following instructions explain how to enable FCLOSE logging on an MPE/iX machine. Classic system log files are defined as TYPE 160. Native Mode log files are defined as TYPE 105.
Type :sysgen to invoke the SYSGEN program. You should see something similar to the following display:
Figure 14.1 SYSGEN Program Screen
Figure 14.2 LOG Configuration Commands
Type sl on=160,105 at the log> prompt. Next, type hold at the log> prompt. Then, type exit at the log> prompt. This returns you to the :SYSGEN prompt. Type keep at the :SYSGEN prompt. Answer yes to the Purge old configuration (yes/no): prompt. Finally, type exit to terminate the program.
Overview
Disc I/O on the PA-RISC HP3000 is affected mostly by factors outside the direct control of the programmer. These factors include: the amount of memory on the machine, the number of users competing for memory, and the transaction manager. Factors under the control of the programmer include: file locality (random or sequential access), FCONTROLs to post dirty pages, and the sequential write queue.
Disc I/O on the Classic HP3000 is affected by many things: blocking factors, the number of buffers available, the type of file (variable or fixed), the spread of the disc files across the available devices, and the type, sequence, and frequency of the calls made against the file.
For an individual file, all of these considerations must be taken into account when we attempt to optimize our system. We must also consider the use of each file as it pertains to the application and, if it is a structured file, analyze our selection of keys and file relationships.
The benefits of handling these tradeoffs correctly are many, as are the costs of incorrect file choices.
Consider an example on the Classic HP3000: a file which will be accessed sequentially. The blocking factor determines how many physical I/O's are necessary to process a given number of logical records. If a file contains 10,000 logical records which must be sequentially read by a program, then the formula for the number of physical reads necessary is as follows:
physical'reads = [ logical'records / blocking'factor ]
where the expression [ ] is rounded up to the nearest integer.
Thus, in our ten thousand record file, a blocking factor of two results in 10000/2, or 5,000 physical reads against the file. A blocking factor of twenty results in 10000/20, or 500 reads, an improvement by a factor of ten in the number of physical transfers necessary.
Logical record accesses cost in terms of CPU as well as the amount of memory required during the transfer.
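The blocking-factor arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch of the formula as stated in the text; the function name physical_reads is hypothetical and is not part of REDWOOD or MPE.

```python
def physical_reads(logical_records: int, blocking_factor: int) -> int:
    """Physical reads needed to read a file sequentially, rounded up.

    Implements physical'reads = [ logical'records / blocking'factor ]
    using integer arithmetic so that no floating-point rounding occurs.
    """
    if logical_records < 0:
        raise ValueError("logical_records must not be negative")
    if blocking_factor < 1:
        raise ValueError("blocking_factor must be at least 1")
    # -(-a // b) is the ceiling of a / b for positive b.
    return -(-logical_records // blocking_factor)


if __name__ == "__main__":
    for factor in (2, 20):
        print(factor, physical_reads(10_000, factor))
```

With the figures from the text, a blocking factor of 2 gives 5,000 reads and a blocking factor of 20 gives 500. A record count that is not an exact multiple of the blocking factor still needs one extra read for the final partial block, which is why the result is rounded up.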
Physical accesses cost in terms of several additional resources: the disc movement required, the controller time, channel time, etc. Assuming our application does require the logical accesses, we cannot reduce them (although NOBUF may reduce their impact).
However, by varying the blocking factor, we can reduce the number of physical accesses required (at the cost of increasing the amount of memory our buffers are using). The end result of this change can be significant. One test showed that altering a blocking factor from two to twenty resulted in a thirty percent decrease in the execution time of a sequential-read application. Similar optimizations are possible by varying some of the other file characteristics or by rewriting portions of the file handling applications.
The above is an excellent example of the difference between MPE XL on the PA-RISC HP3000 and MPE V on the Classic HP3000. Changing the blocking factor on MPE XL usually results in no performance difference at all. (For a few relatively rare kinds of files, such as RIO files, it can still improve performance.)
Such optimizations can provide significant improvements in the performance of a system. However, each such optimization has an implicit cost associated with it. It may be simply the time the programmer must take to redo the file equations or :BUILD commands of a batch job. On the other hand, the optimization may involve extensive recoding effort as well as testing. Most optimizations involve exploiting relatively harmless tradeoffs. In the case of the blocking factor change mentioned above, the decrease in the number of physical I/O's more than compensated for the additional burden the larger buffers placed on memory.
This is not always the situation, though, and each modification must be tested to determine whether it improves or degrades performance.
Since these file optimizations have costs associated with them, we would like to pay the cost only when we can expect a reasonable return on our investment. There is a theory of programming which says that eighty percent of the time a program is executing, it is in twenty percent of the code. This 80/20 rule can also be applied to files: "eighty percent of all file activity is against twenty percent of the files". To receive a maximum return on our optimization investment, we should focus our attention on this "top twenty" percent. Identifying that twenty percent is the task REDWOOD addresses.
Method
This top 20 might also be referred to as the "busy" or "heavily used" files on the system. Before we can identify the top 20, we must establish our criteria for deciding which files are busier than others. Three criteria come immediately to mind when we speak of heavy file usage: the number of logical accesses to the file, the number of physical accesses, and the number of times the file is opened.
This information (with a few exceptions noted later) is available to us through the type 5 record recorded by MPE V in its log files and in the type 105 records recorded by MPE XL.
The format of the type 5 (FCLOSE) record is given on pages 6-123 and 6-124 of the old SYSTEM MANAGER/SYSTEM SUPERVISOR Reference Manual. The information which will be of interest to us includes the file name, the logical device, the number of records, and the number of blocks, where:
file name
The fully qualified formal designator associated with the file, fname.group.account; some program temporary files may have a blank name.
logical device
The logical device number of the file label. This ldev may not contain the entire file, since only a single extent need reside entirely on one device.
number of records
The number of logical records which have been read or written since the file was opened. This value gives us a measure of the application's activity.
number of blocks
The number of blocks which have been transferred to/from the file. This value is a measure of the physical I/O against the file in all instances except the following two cases:
for files which are accessed with MULTI-REC (bit 11 of AOPTIONS or MR in the file equation) and where the block size is equal to an integer multiple of 128 words. In this case the value is the number of blocks processed (which will probably be greater than the number of physical I/O's).
MPE XL's type 105 record has similar information within it. The layout, however, does not appear to be documented in any publicly accessible manuals. In the process of writing REDWOOD, we located and were able to determine its contents.
The information contained in these log records is synthesized into one of several reports which help users determine what their busy files are. The specific algorithm used will be covered later, but briefly our method is to gather all of the type 5 (& 105) records for each unique file on the system, and total the number of records processed, the number of blocks processed, and the count of type 5 (& 105) records encountered.
This gives us three measures for each file: logical I/O (records processed), physical I/O (blocks processed), and the number of times the file was closed (the FCLOSE record count).
Each of these gives us a slightly different view of the relative use of the files, so our method allows us to choose the top twenty percent of the files judged to be busiest on any of these values. For that matter, we may choose any percentage from one to one hundred. Once these busy files have been identified, we can begin to optimize them knowing that any improvements we make will have the maximum effect on overall system performance.
Advantages / Disadvantages
There are several other methods which could have been used in an attempt to determine which files are our busiest. These include direct monitoring of the I/O on the system, embedded measuring tools and alternative reporting schemes using log records. The method used here has the following advantages:
The MPE logging facility is universal across the 3000 product line and does not require a specific machine or MPE level (true, the logfile format changed from MPE IV to MPE V to MPE XL, but REDWOOD understands all three formats).
No special programming or capabilities are required (REDWOOD does not need PM capability).
Since the method analyzes log files which have been closed, it may be run during "off hours" or on a separate machine, thus there is no effect on the system caused by the tool itself (other than the negligible overhead of enabling logging itself). REDWOOD can be used on an MPE XL machine to analyze logfiles :RESTOREd from a Classic machine.
We have several distinct measures of usage: logical & "physical" I/O counts, file size (XL only), and file-open counts.
This method is not restricted to a particular file type or structure, although its usefulness may not be as great with some.
The only necessary modification to the system configuration is that logging be enabled for FCLOSE's (type 5 or 105).
The disadvantages to this approach are as follows:
On MPE XL, we cannot determine the number of physical I/Os that occurred. MPE XL will "post" (write to disc) pages of files at its leisure. Thus, a dozen FWRITEs to the same record may result in anywhere from 0 to 12 disc writes.
On MPE XL, any "mapped access" to a file is not reflected in the file close record. Thus, access by TurboImage is not seen (except for the file close record itself).
On MPE V, although it does give you physical I/O, these figures are not related to any figure such as specific IMAGE calls, etc.
This tool works well for summary reporting of disc activity. In some cases, however, the real concern may center on "burst" I/O activity; i.e., the total number of I/O's spread across the day is small, but within 2 seconds after ENTER is hit, the activity is concentrated. There is no mechanism for determining the I/O rate for a period of time.
These records cannot be used to summarize disc activity per device, since the LDEV is only for the first extent; not all extents of a file are required to reside on a single device.
The number of blocks processed does not equal the number of physical I/O's if one of two cases is true: files opened with multi-rec which also have blocks which end on a sector boundary may access multiple blocks in a single I/O, so the number is higher than the actual number of physical I/O's.
On MPE XL, type 105 records do not contain a "blocks transferred" count, so REDWOOD synthesizes a value by assuming a blocking factor of 1.
On MPE XL, type 105 records contain the number of logical reads and the number of logical writes. On MPE V, type 5 records contain only the sum of these two values. REDWOOD does not yet have the ability to separate the read count and write count. Instead, it always adds them together.
Capabilities
Program capabilities required include IA, BA, DS and PH. No special user capabilities are required.
Usage
Invoke REDWOOD using the supplied UDC or with the RUN command detailed below.
:REDWOOD [parm=#]
:RUN REDWOOD.PUB.LPSTOOLS;PARM=#
parm=# is used to change the default maximum number of records that REDWOOD can process in a single summary log file. The number entered for parm is multiplied by 1000 to obtain the new default maximum (DEFAULT: 40000). For example:
:REDWOOD 90
Or:
:RUN REDWOOD.PUB.LPSTOOLS;PARM=90
(Sets default to 90,000 records)
Command Summary
REDWOOD has several commands which when executed in a particular sequence will produce a summary file which can then be re-sorted and listed for several different reports. The commands are listed in the next table.
Table 14.1 REDWOOD Commands
Command Definitions
Depending on which commands you use, REDWOOD produces a summary file which can be re-sorted and listed for several different reports. REDWOOD commands include primary functions, options, exit procedures and Help. These are discussed in detail next.
CReate
CREATE produces a summary file of the FCLOSE records from one or more log files. The log file(s) used are read sequentially and all type 5 and 105 (FCLOSE) records for disc files (subtype 0) are extracted. These records are then sorted by the file formal designator (file.group.account) to group all records for the same file. EDITOR work files of the form Knnnnnnn, where nnnnnnn is a valid seven-digit number, are transformed to have file names of the form K#######, to group all K-files for each group/account into one record. (This is controlled by the [RE]SET EDITOR command.) Similarly, FSEDIT work files (whose names are of the form F#######) are gathered together under the control of the [RE]SET FSEDIT command.
This temporary sort file is then read sequentially and a summary file is built containing one record for each unique formal designator. This record contains information including: the device number (or pseudo-LDEV for MPE XL), total number of records processed, total number of blocks processed, FCLOSE count, and an indicator of whether the device number differed between FCLOSE's. If this indicator is set to TRUE, then there was at least one record which contained a logical device different from the other records for that file. This indicates that the file has moved, possibly due to a purge and re-create.
Since the log files may have been stored from the test system and restored to an account or group other than PUB.SYS, the CREATE command allows the user to override the default group and account (PUB.SYS) for the log file(s) to be analyzed. Once the group and account have been established, the four-digit number of the first log file is entered, followed by the four-digit number of the ending log file if it is different from the first.
Once these numbers are in, REDWOOD requests the name of a summary file which it will attempt to create to hold the summary records for each FCLOSE'd file.
After the log file range has been specified, REDWOOD reads through each log file whose number is in the desired range and extracts all the file close records, storing them in the summary file.
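The summarizing step described for CREATE can be illustrated with a short Python sketch. It is not REDWOOD's implementation: the names CloseRecord, Summary, normalize and summarize are hypothetical, the input is assumed to be already-decoded FCLOSE records, and grouping is done with a dictionary rather than the sort-then-read pass the text describes. Only the EDITOR K-file folding is shown.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class CloseRecord(NamedTuple):
    """One decoded FCLOSE record (hypothetical layout for this sketch)."""
    name: str     # fully qualified file.group.account
    ldev: int     # logical device of the file label
    records: int  # logical records read or written
    blocks: int   # blocks transferred


@dataclass
class Summary:
    ldev: int
    records: int = 0
    blocks: int = 0
    fcloses: int = 0
    ldev_changed: bool = False  # True if any FCLOSE reported a different LDEV


# EDITOR work files: K followed by seven digits, then the group/account part.
_KFILE = re.compile(r"^K\d{7}(?=\.)")


def normalize(name: str) -> str:
    """Fold all EDITOR K-files of a group/account into one K####### entry."""
    return _KFILE.sub("K#######", name.upper())


def summarize(records: Iterable[CloseRecord]) -> Dict[str, Summary]:
    """Build one summary entry per unique (normalized) formal designator."""
    out: Dict[str, Summary] = {}
    for rec in records:
        key = normalize(rec.name)
        entry = out.get(key)
        if entry is None:
            entry = out[key] = Summary(ldev=rec.ldev)
        elif rec.ldev != entry.ldev:
            entry.ldev_changed = True
        entry.records += rec.records
        entry.blocks += rec.blocks
        entry.fcloses += 1
    return out
```

A dictionary keeps the sketch short; a sort followed by a sequential pass, as the text describes, produces the same totals and is better suited to log volumes that do not fit in memory.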
EXCLude
This command has the following syntax:
EXCLUDE [ZERO] [NONE] [PERM] [DEFault] [LDEV #] [SMALL #blocks] [NONZERO] [BIGsectors #]
REDWOOD has the ability to "exclude" records from being considered based on a variety of criteria. The exclusion is checked as each record is read from a log file.
Exit
The EXIT command closes all open files and ends the program execution.
A :EOD entered at any prompt will also terminate REDWOOD.
List
The LIST command sorts and reports on the records found in a summary file. The summary file name can be entered directly, or a return can be used to indicate that the same summary file will be used again. The file is sorted in one of eight or nine different manners. When this sorted file is then listed and totalled, the files are in an order such that the "busiest" files are listed first. The user can choose to list just the busiest ten percent of the system's files. If the sort key chosen stays the same between two LISTings of the same summary file, the sort is skipped to save time.
The possible sort options available are:
BLOCKS processed (only for MPE V & IV logfiles)
FCLOSE count
REC/BLK ratio
REC/BLK ratio (exclude probable NOBUF)
File Name (A.G.F)
File Name (F.G.A)
Average Size
Maximum Size
The LIST command produces a report with a header like the following:
Figure 14.3 LIST Command Report Header
Each of these fields is described below.
Table 14.2 Header Fields
Some columns in the LIST output have special characters to indicate various things. This section documents each of these characters.
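The ranking that LIST performs, sorting the summary on a chosen key and reporting only the busiest percentage of files, can be sketched in Python as follows. This is an illustration under assumptions, not REDWOOD's code: the function name busiest, the dictionary layout of the summary and the key names are hypothetical, and rounding the cut-off up to a whole file is a choice made here that the text does not specify.

```python
from typing import Dict, List, Tuple

FileTotals = Dict[str, int]  # e.g. {"records": ..., "blocks": ..., "fcloses": ...}


def busiest(summary: Dict[str, FileTotals],
            key: str = "fcloses",
            percent: int = 20) -> List[Tuple[str, FileTotals]]:
    """Return the busiest `percent` of files, ordered by `key`, busiest first."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ranked = sorted(summary.items(),
                    key=lambda item: item[1][key],
                    reverse=True)
    # Assumption: round the cut-off up so at least one file is always listed.
    keep = -(-len(ranked) * percent // 100)
    return ranked[:keep]


if __name__ == "__main__":
    demo = {
        "ORDERS.DATA.PROD": {"records": 90_000, "blocks": 4_500, "fcloses": 12},
        "CUST.DATA.PROD": {"records": 20_000, "blocks": 9_000, "fcloses": 300},
        "LOG.PUB.PROD": {"records": 500, "blocks": 500, "fcloses": 2},
    }
    for name, totals in busiest(demo, key="blocks", percent=50):
        print(name, totals["blocks"])
```

Changing the key argument corresponds to choosing a different sort option, which is why the same summary can yield a different set of "busy" files depending on whether logical I/O, physical I/O or FCLOSE count is used.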
LP
The LP command opens a file with the formal name LPSLP, which defaults to device=LP. All reports are sent to this file until the TERM or EXIT commands are used. A file equation may be used to redirect this file.
If a hard copy is desired at the same time as an on-line report, the Toolbox standard command SET COPYLP may be used instead of the LP command.
SCAN
The SCAN command acts like the CREATE command, except that no summary file is created. It is useful for scanning over log files for I/O errors.
SET | REset
These commands have the following syntax: