(gawk.info)Nextfile Function
Implementing `nextfile' as a Function
=====================================
The `nextfile' statement, presented in the section "The `nextfile'
Statement", is a `gawk'-specific extension. It is
not available in other implementations of `awk'. This section shows
two versions of a `nextfile' function that you can use to simulate
`gawk''s `nextfile' statement if you cannot use `gawk'.
Here is a first attempt at writing a `nextfile' function.

     # nextfile --- skip remaining records in current file

     # this should be read in before the "main" awk program

     function nextfile()    { _abandon_ = FILENAME; next }

     _abandon_ == FILENAME  { next }

This file should be included before the main program, because it
supplies a rule that must be executed first. This rule compares the
current data file's name (which is always in the `FILENAME' variable)
to a private variable named `_abandon_'. If the file name matches,
then the action part of the rule executes a `next' statement, to go on
to the next record. (The use of `_' in the variable name is a
convention. It is discussed more fully in the section "Naming Library
Function Global Variables".)
The use of the `next' statement effectively creates a loop that reads
all the records from the current data file. Eventually, the end of the
file is reached, and a new data file is opened, changing the value of
`FILENAME'. Once this happens, the comparison of `_abandon_' to
`FILENAME' fails, and execution continues with the first rule of the
"real" program.
The `nextfile' function itself simply sets the value of `_abandon_'
and then executes a `next' statement to start the loop going.(1)
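The loading order can be tried out with a small shell sketch. Everything
in it is made up for illustration: the file names `skipfile.awk',
`main.awk', `one.txt', and `two.txt', and the function name `skipfile',
chosen because an `awk' that has `nextfile' built in is likely to reject
a user-defined function of that name. So that the sketch also runs on the
implementations mentioned in the footnote, `skipfile' only records the
file name, and the calling rule executes `next' itself.

```shell
#!/bin/sh
# Illustrative sketch only: all file names and data are made up.
tmp=$(mktemp -d) || exit 1
printf 'x1\nx2\nx3\n' > "$tmp/one.txt"
printf 'y1\ny2\n' > "$tmp/two.txt"

# Library file: the first-attempt rule, with the function renamed.
cat > "$tmp/skipfile.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }   # caller must execute next
_abandon_ == FILENAME  { next }
EOF

# "Main" program: print one record, then abandon the rest of the file.
cat > "$tmp/main.awk" <<'EOF'
{ print; skipfile(); next }
EOF

# The library is named first so that its rule runs before the main rules.
out=$(awk -f "$tmp/skipfile.awk" -f "$tmp/main.awk" "$tmp/one.txt" "$tmp/two.txt")
printf '%s\n' "$out"
rm -rf "$tmp"
```

With these two data files, only the first record of each file (`x1' and
`y1') should be printed; every later record is consumed by the library
rule before the main rule sees it.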
This initial version has a subtle problem. What happens if the same
data file is listed *twice* on the command line, one right after the
other, or even with just a variable assignment between the two
occurrences of the file name?
In such a case, this code skips right through the file a second
time, even though it should stop when it reaches the end of the first
occurrence. Here is a second version of `nextfile' that remedies this
problem.

     # nextfile --- skip remaining records in current file
     # correctly handle successive occurrences of the same file
     # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
     # May, 1993

     # this should be read in before the "main" awk program

     function nextfile()   { _abandon_ = FILENAME; next }

     _abandon_ == FILENAME {
           if (FNR == 1)
               _abandon_ = ""
           else
               next
     }

The `nextfile' function has not changed. It sets `_abandon_' equal
to the current file name and then executes a `next' statement. The
`next' statement reads the next record and increments `FNR', so `FNR'
is guaranteed to have a value of at least two. However, if `nextfile'
is called for the last record in the file, then `awk' will close the
current data file and move on to the next one. Upon doing so,
`FILENAME' will be set to the name of the new file, and `FNR' will be
reset to one. If this next file is the same as the previous one,
`_abandon_' will still be equal to `FILENAME'. However, `FNR' will be
equal to one, telling us that this is a new occurrence of the file, and
not the one we were reading when the `nextfile' function was executed.
In that case, `_abandon_' is reset to the empty string, so that further
executions of this rule will fail (until the next time that `nextfile'
is called).
If `FNR' is not one, then we are still in the original data file,
and the program executes a `next' statement to skip through it.
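The difference between the two versions can be observed with a sketch
like the following, which names the same data file twice in a row. The
file names (`v1.awk', `v2.awk', `main.awk', `data.txt') and the function
name `skipfile' are hypothetical; the function is renamed because an
`awk' with a built-in `nextfile' is likely to reject a user-defined
function of that name. To stay portable to implementations that do not
allow `next' inside a function, `skipfile' only records the file name
and the calling rule executes `next'.

```shell
#!/bin/sh
# Illustrative sketch only: file names and data are made up.
tmp=$(mktemp -d) || exit 1
printf 'a1\na2\na3\n' > "$tmp/data.txt"

# First attempt: the rule keeps skipping while the file name matches.
cat > "$tmp/v1.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }   # caller must execute next
_abandon_ == FILENAME  { next }
EOF

# Second version: FNR == 1 marks a new occurrence of the file.
cat > "$tmp/v2.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }   # caller must execute next
_abandon_ == FILENAME  {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}
EOF

# Main program: print each record, but abandon the file at record 2.
cat > "$tmp/main.awk" <<'EOF'
FNR == 2 { skipfile(); next }
{ print }
EOF

# The same data file is named twice, one right after the other.
out1=$(awk -f "$tmp/v1.awk" -f "$tmp/main.awk" "$tmp/data.txt" "$tmp/data.txt")
out2=$(awk -f "$tmp/v2.awk" -f "$tmp/main.awk" "$tmp/data.txt" "$tmp/data.txt")
echo "first attempt:";  printf '%s\n' "$out1"
echo "second version:"; printf '%s\n' "$out2"
rm -rf "$tmp"
```

The first attempt should print `a1' only once, because the second
occurrence of the file is skipped entirely. The second version should
print `a1' twice, once for each occurrence.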
An important question to ask at this point is: "Given that the
functionality of `nextfile' can be provided with a library file, why is
it built into `gawk'?" The question matters because adding features
for little reason leads to larger, slower programs that are harder to
maintain.
The answer is that building `nextfile' into `gawk' provides
significant gains in efficiency. If the `nextfile' function is executed
at the beginning of a large data file, `awk' still has to scan the
entire file, splitting it up into records, just to skip over it. The
built-in `nextfile' can simply close the file immediately and proceed
to the next one, saving a lot of time. This is particularly important
in `awk', since `awk' programs are generally I/O bound (i.e., they
spend most of their time doing input and output, instead of performing
computations).
---------- Footnotes ----------
(1) Some implementations of `awk' do not allow you to execute
`next' from within a function body. Some other work-around will be
necessary if you use such a version.