SHELLdorado Newsletter 1/2005 - April 30th, 2005 

http://groups.google.com/group/comp.unix.shell/browse_frm/thread/92fbe47bab2bf93e/f58a4fc700f5e3aa

Date:          Sat, Apr 30 2005 10:43 am
Groups:        comp.unix.shell

The "SHELLdorado Newsletter" covers UNIX shell script related topics. To subscribe to this newsletter, leave your e-mail address at the SHELLdorado home page:

http://www.shelldorado.com/

View previous issues at the following location:

http://www.shelldorado.com/newsletter/

"Heiner's SHELLdorado" is a place for UNIX shell script programmers providing

Many shell script examples, shell scripting tips & tricks,
a large collection of shell-related links & more...

Contents

o  Shell Tip: How to read a file line-by-line
o  Shell Tip: Print a line from a file given its line number
o  Shell Tip: How to convert upper-case file names to lower-case
o  Shell Tip: Speeding up scripts using "xargs"
o  Shell Tip: How to avoid "Argument list too long" errors

Shell Tip: Speeding up scripts using "xargs" 

The essential part of writing fast scripts is avoiding unnecessary external processes. The loop

for file in *.txt
do
    gzip "$file"
done

is much slower than just

gzip *.txt

because the former code starts one "gzip" process per file, while the latter command accomplishes the whole task with only one external process. But how can we build a command line like the latter when the input file names come from a file, or even from standard input? A naive approach could be

gzip `cat textfiles.list archivefiles.list`

but this command can easily run into an "Argument list too long" error, and it does not work with file names containing embedded whitespace characters. A better solution is to use "xargs":

cat textfiles.list archivefiles.list | xargs gzip

The "xargs" command reads its input line by line, and build a command line by appending each line to its arguments (here: "gzip"). Therefore the input

a.txt
b.txt
c.txt

would result in "xargs" executing the command

gzip a.txt b.txt c.txt

"xargs" also takes care that the resulting command line does not get too long, and therefore avoids "Argument list too long" errors.

Shell Tip: How to avoid "Argument list too long" errors 

Oh no, there it is again: the system's spool directory is almost full (4018 files); old files need to be removed, and all useful commands only print the dreaded "Argument list too long":

$ cd /var/spool/data
$ ls *
ls: Argument list too long
$ rm *
rm: Argument list too long

So what exactly about the single character '*' is too long? The current shell does the useful work of expanding '*' into a (large) list of file names matching that pattern; this is not the problem. Afterwards, the shell tries to execute the command (e.g. "/bin/ls") with that file list, using the system call execve(2) (or a similar one). This system call limits the maximum number of bytes that can be used for arguments and environment variables(*), and fails when the list exceeds that limit.

It's important to note that the limitation is on the side of the system call, not of the shell's internal lists.
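
On most systems, the standard "getconf" utility shows the actual limit:

$ getconf ARG_MAX        # maximum bytes for arguments plus environment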

To work around this problem, we can either use shell-internal functionality or limit the number of files directly specified as arguments to a command.

Examples:

  • Don't specify arguments, to get the (hopefully) useful default:

    $ ls
  • Use shell-internal functionality ("echo" and "for" are shell-internal commands):

    $ echo *
    file1 file2 [...]
    $ for file in *; do rm "$file"; done    # be careful!
  • Use "xargs"

    $ ls | xargs rm         # careful!
    $ find . -type f -size +100000 -print | xargs ...
  • Limit the number of arguments for a command:

    $ ls [a-l]*
    $ ls [m-z]*
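
For a cleanup task like the full spool directory above, these ideas can be combined. Here is a sketch (the 30-day age cutoff is only an assumption, and find's '-exec ... {} +' form, which batches arguments much like "xargs" does, may be missing from very old systems):

$ find /var/spool/data -type f -mtime +30 -exec rm {} +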

Using these techniques should help to get around the problem.

(*) The parameter ARG_MAX, often 128 KB (Linux) or 1-2 MB (Solaris).

Shell Tip: How to avoid "Argument list too long" errors 

  1. Avoid 'ls /long/path/etc/files'; cd into the directory first and run "ls" there.
  2. If that doesn't help, use "xargs" to solve the "Argument list too long" problem:

    $ ls | xargs cmd...
    • Or, use "split" to solve the "Argument list too long" problem: if the path information is necessary, or there are just too many files in a single directory for 'ls <criteria>', you can simply use "split" to break up the resulting file list before processing it.
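
    A sketch of the "split" approach (the chunk size of 500 names and the temporary file names are arbitrary illustrations; like the backquote construct in the first tip, it assumes file names without embedded whitespace):

    ls > /tmp/filelist                      # one file name per line
    split -l 500 /tmp/filelist /tmp/chunk.  # creates /tmp/chunk.aa, /tmp/chunk.ab, ...
    for list in /tmp/chunk.*
    do
        gzip `cat "$list"`                  # each chunk stays well below ARG_MAX
    done
    rm /tmp/filelist /tmp/chunk.*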

xpt

documented on: 2007-09-09