http://groups.google.com/group/comp.unix.shell/browse_frm/thread/92fbe47bab2bf93e/f58a4fc700f5e3aa
Date: Sat, Apr 30 2005 10:43 am Groups: comp.unix.shell
SHELLdorado Newsletter 1/2005 - April 30th, 2005
The "SHELLdorado Newsletter" covers UNIX shell script related topics. To subscribe to this newsletter, leave your e-mail address at the SHELLdorado home page:
http://www.shelldorado.com/
View previous issues at the following location:
http://www.shelldorado.com/newsletter/
"Heiner's SHELLdorado" is a place for UNIX shell script programmers providing
Many shell script examples, shell scripting tips & tricks, a large collection of shell-related links & more...
Contents
o Shell Tip: How to read a file line-by-line
o Shell Tip: Print a line from a file given its line number
o Shell Tip: How to convert upper-case file names to lower-case
o Shell Tip: Speeding up scripts using "xargs"
o Shell Tip: How to avoid "Argument list too long" errors
The essential part of writing fast scripts is avoiding external processes.
for file in *.txt
do
    gzip "$file"
done
is much slower than just
gzip *.txt
because the former code may need many "gzip" processes for a task the latter command accomplishes with only one external process. But how could we build a command line like the one above when the input files come from a file, or even standard input? A naive approach could be
gzip `cat textfiles.list archivefiles.list`
but this command can easily run into an "Argument list too long" error, and doesn't work with file names containing embedded whitespace characters. A better solution is to use "xargs":
cat textfiles.list archivefiles.list | xargs gzip
The "xargs" command reads its input line by line, and build a command line by appending each line to its arguments (here: "gzip"). Therefore the input
a.txt b.txt c.txt
would result in "xargs" executing the command
gzip a.txt b.txt c.txt
"xargs" also takes care that the resulting command line does not get too long, and therefore avoids "Argument list too long" errors.
Oh no, there it is again: the system's spool directory is almost full (4018 files); old files need to be removed, and all useful commands only print the dreaded "Argument list too long":
$ cd /var/spool/data
$ ls *
ls: Argument list too long
$ rm *
rm: Argument list too long
So what exactly about the character '*' is too long? Well, the current shell does the useful work of converting '*' to a (large) list of files matching that pattern. This is not the problem. Afterwards, it tries to execute the command (e.g. "/bin/ls") with the file list using the system call execve(2) (or a similar one). This system call has a limit on the maximum number of bytes that can be used for arguments and environment variables(*), and fails.
It's important to note that the limitation is on the side of the system call, not the shell's internal lists.
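On most systems the limit can be inspected with the POSIX "getconf" utility; the value below is a typical Linux figure (128 KB), and it varies from system to system:

$ getconf ARG_MAX
131072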
To work around this problem, we'll use shell-internal functions, or ways to limit the number of files directly specified as arguments to a command.
Examples:
Don't specify arguments, to get the (hopefully) useful default:
$ ls
Use shell-internal functionality ("echo" and "for" are shell-internal commands):
$ echo *
file1 file2 [...]
$ for file in *; do rm "$file"; done # be careful!
Use "xargs"
$ ls | xargs rm # careful!
$ find . -type f -size +100000 -print | xargs ...
Limit the number of arguments for a command:
$ ls [a-l]*
$ ls [m-z]*
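A related sketch (the 30-day age threshold is only a placeholder here): the POSIX form "find ... -exec command {} +" bundles as many file names as possible into each command invocation, much like "xargs", and also copes with whitespace in file names:
$ find /var/spool/data -type f -mtime +30 -exec rm {} +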
Using these techniques should help to get around the problem.
(*) Parameter ARG_MAX, often 128K (Linux) or 1 or 2 MB (Solaris).
If that doesn't help, use "xargs" to work around the "Argument list too long" problem:
$ ls | xargs cmd...
xpt
documented on: 2007-09-09