http://www.usf.uos.de/infoservice/doc/

(gawk.info)Nextfile Function


Next: Assert Function Prev: Portability Notes Up: Library Functions
Enter node , (file) or (file)node

Implementing `nextfile' as a Function
=====================================

   The `nextfile' statement presented in Note: The `nextfile'
Statement, is a `gawk'-specific extension.  It is
not available in other implementations of `awk'.  This section shows
two versions of a `nextfile' function that you can use to simulate
`gawk''s `nextfile' statement if you cannot use `gawk'.

   Here is a first attempt at writing a `nextfile' function.

     # nextfile --- skip remaining records in current file
     
     # this should be read in before the "main" awk program
     
     function nextfile()    { _abandon_ = FILENAME; next }
     
     _abandon_ == FILENAME  { next }

   This file should be included before the main program, because it
supplies a rule that must be executed first.  This rule compares the
current data file's name (which is always in the `FILENAME' variable)
to a private variable named `_abandon_'.  If the file name matches,
then the action part of the rule executes a `next' statement, to go on
to the next record.  (The use of `_' in the variable name is a
convention.  It is discussed more fully in Note: Naming Library
Function Global Variables.)

   The use of the `next' statement effectively creates a loop that reads
all the records from the current data file.  Eventually, the end of the
file is reached, and a new data file is opened, changing the value of
`FILENAME'.  Once this happens, the comparison of `_abandon_' to
`FILENAME' fails, and execution continues with the first rule of the
"real" program.

   The `nextfile' function itself simply sets the value of `_abandon_'
and then executes a `next' statement to start the loop going.(1)

   This initial version has a subtle problem.  What happens if the same
data file is listed *twice* on the command line, one right after the
other, or even with just a variable assignment between the two
occurrences of the file name?

   In such a case, this code will skip right through the file, a second
time, even though it should stop when it gets to the end of the first
occurrence.  Here is a second version of `nextfile' that remedies this
problem.

     # nextfile --- skip remaining records in current file
     # correctly handle successive occurrences of the same file
     # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
     # May, 1993
     
     # this should be read in before the "main" awk program
     
     function nextfile()   { _abandon_ = FILENAME; next }
     
     _abandon_ == FILENAME {
           if (FNR == 1)
               _abandon_ = ""
           else
               next
     }

   The `nextfile' function has not changed.  It sets `_abandon_' equal
to the current file name and then executes a `next' satement.  The
`next' statement reads the next record and increments `FNR', so `FNR'
is guaranteed to have a value of at least two.  However, if `nextfile'
is called for the last record in the file, then `awk' will close the
current data file and move on to the next one.  Upon doing so,
`FILENAME' will be set to the name of the new file, and `FNR' will be
reset to one.  If this next file is the same as the previous one,
`_abandon_' will still be equal to `FILENAME'.  However, `FNR' will be
equal to one, telling us that this is a new occurrence of the file, and
not the one we were reading when the `nextfile' function was executed.
In that case, `_abandon_' is reset to the empty string, so that further
executions of this rule will fail (until the next time that `nextfile'
is called).

   If `FNR' is not one, then we are still in the original data file,
and the program executes a `next' statement to skip through it.

   An important question to ask at this point is: "Given that the
functionality of `nextfile' can be provided with a library file, why is
it built into `gawk'?"  This is an important question.  Adding features
for little reason leads to larger, slower programs that are harder to
maintain.

   The answer is that building `nextfile' into `gawk' provides
significant gains in efficiency.  If the `nextfile' function is executed
at the beginning of a large data file, `awk' still has to scan the
entire file, splitting it up into records, just to skip over it.  The
built-in `nextfile' can simply close the file immediately and proceed
to the next one, saving a lot of time.  This is particularly important
in `awk', since `awk' programs are generally I/O bound (i.e.  they
spend most of their time doing input and output, instead of performing
computations).

   ---------- Footnotes ----------

   (1)  Some implementations of `awk' do not allow you to execute
`next' from within a function body. Some other work-around will be
necessary if you use such a version.


Next: Assert Function Prev: Portability Notes Up: Library Functions
[ Dokumentation lokal installierter Software ]