find(1): emphasizing portability and the very details of a few basic issues
2010-07-10
Table of contents
Basic issues:
-maxdepth)
xargs(1)
(-exec ... {} + )
-name '*')
-print
-ls
-path
-printf
-H/-L
..{}..
Pointers to several system independent implementations.
An often recommended way to limit find(1) to one
level (i.e., not descending into directories) is using the
expression -maxdepth 1.
This expression was introduced by GNU findutils.
FreeBSD 4.1, NetBSD 2.0, OpenBSD 2.0 and the AST toolchest adopted this.
Mac OS X has implemented it since 10.2
(Darwin 6), switching from NetBSD find to FreeBSD 4.5 find.
But it's also possible to do this with the traditional find [1].
[1] You need a find(1) that offers the expression "-prune".
Nowadays this is generally available, because it had already been introduced with 4.3BSD-Reno and SVR4 (i.e., about 1989). An exception to this rule is the busybox multi-call binary, which emphasizes minimalistic replacements. It has implemented -prune and -maxdepth since June 2007 as a compile-time option.
The portable way to avoid descending is
$ find . ! -name . -prune <remaining expressions>
$ find /etc/. ! -name . -prune <remaining expressions>
Usually, the second works as well (but I know Cray Unicos as exception).
(See below about "find /etc [...]"
and "find /etc/ [...]".)
The order of arguments is crucial here, because they are not options but expressions.
The explanation sounds obvious: find never
lists the '..' entry. If you also exclude the
'.' entry and then apply "-prune" to all
the remaining entries, find certainly won't descend anymore.
If you ever need the '.' entry included (pointed out by Stephane Chazelas in comp.unix.shell),
then use "find . \( -name . -o -prune \)".
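The two forms above can be tried out in a scratch directory. The following is only a minimal sketch: the file names are made up for illustration, and mktemp -d is assumed to be available (it is common, but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical scratch layout: one file and one directory at the top level,
# plus one nested file that must not show up.
dir=$(mktemp -d) || exit 1
mkdir "$dir/sub"
: > "$dir/top.txt"
: > "$dir/sub/nested.txt"

# "." fails "! -name .", so it is neither pruned nor printed;
# every other entry is pruned (not descended into) and printed.
top_level=$(cd "$dir" && find . ! -name . -prune -print | LC_ALL=C sort)

# Variant that also lists "." itself.
with_dot=$(cd "$dir" && find . \( -name . -o -prune \) -print | LC_ALL=C sort)

printf '%s\n' "$top_level"
printf '%s\n' "$with_dot"
rm -rf "$dir"
```

Both calls should stop at the first level; only the second one also lists ".".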
About the portability:
$ find /etc/. ! -name /etc/. -prune <remaining expressions>
$ find /etc ! -name /etc -prune <remaining expressions>
and so on; because here the string is compared literally (and not with
an internal "basename" mechanism).
The results of this call will look like
"/etc/./hosts", which resolves without any problems on Unix.
Here, be careful when additionally applying a pattern match on the result of the "-prune". At least GNU find versions from 3.8 through the stable 4.1 (and up to the alpha release 4.1.5) suffer from a bug that prevents this from working:
$ find . ! -name . -prune -name '<pattern>' <remaining expressions>
"-name" is not only applied to the result of
"-prune" but to all existing entries,
because it is "optimized" to the left of "-prune".
This violates the fundamental left-to-right order of evaluation.
By the way: if you negate the second "-name", the bug is apparently
not triggered anymore.
Thus, undoing the negation by just
doubling it looks like a proper workaround:
$ find . ! -name . -prune ! \( ! -name '<pattern>' \) <remaining expressions>
(See <3D7355C4.23L11TSIK@bigfoot.de> and
<3DA23EDA.LHE11XWCK@bigfoot.de>.)
"find /etc ! -name etc -prune <remaining expressions>"
This won't omit files like "/etc/etc/file".
Thanks to Stéphane Chazelas for reminding me of this.
And interestingly, on SCO OpenServer 5.0.6, this requires
"! -name /etc" instead of
"! -name etc", for top level directories (only).
"find /etc/ ! -name etc -prune <remaining options>"
This won't omit files like "/etc/etc/file" either; and
the actual output even varies, because on many systems
you mustn't append a slash to the path argument. This depends on the
find-internal "basename" implementation, which is often simpler than
the libc implementation:
The above will fail with a trailing slash for example on AIX 3.2, FreeBSD 4.3, 4.5, GNU-findutils-4.1, Irix 6.5, 5.3, NetBSD 1.5.2, OpenBSD 2.9, SunOS 4.1.4, OpenServer 5.0.6, Solaris 2.1-2.9 and UnixWare 1 (SVR4.2).
It will work properly on AIX 4.3, HP-UX 10.x, NetBSD 1.5 and OSF1/V4, /V5.
On some of the former, affected versions (all Solaris, FreeBSD-4.3, OpenBSD-2.9, Irix 5/6, UnixWare 1), you can even illustrate the "basename bug" with an interesting workaround. Both of the following examples yield the same correct result, although an empty argument is provided in the second case:
$ find /etc ! -name etc -prune -print
$ find /etc/ ! -name '' -prune -print
This interesting side effect is another reason to mention these two unportable variants.
Why use find at all? It's helpful for
handing the output of find to other commands (like
chown, chmod, etc.). Otherwise,
you would have to fiddle around with more complicated shell pattern
matching [2] or the like.
[2] See also <8ned6b$k8$1@nnrp1.deja.com>
and ff., "Command to find out if a directory is empty", in comp.unix.shell, for a discussion of patterns like ".??* .[!.]* ? *".
find
Another frequently mentioned feature of GNU findutils [3]
is a special combination
with xargs, "[...] -print0 | xargs -0".
The purpose is to increase performance by avoiding a fork/exec
for each single argument;
and the use of null-terminated
filenames avoids problems with unexpected filenames.
[3] NetBSD 1.0, FreeBSD 2.0.5, (and thus) Mac OS X from the start, OpenBSD 2.1, the AST toolchest and HP-UX 11.23 incorporated this, too.
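To illustrate the point about unexpected filenames, here is a small sketch. It assumes a find with -print0 and an xargs with -0 (for instance GNU or one of the BSDs listed above), plus mktemp -d; the file names and the "find-sh" label are made up for the example.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
: > "$dir/two words"
: > "$dir/plain"

# Newline/blank-delimited input: xargs typically splits "two words"
# into two arguments, so the inner shell sees more arguments than files.
naive_count=$(find "$dir" -type f | xargs sh -c 'echo "$#"' find-sh)

# NUL-delimited input: each path arrives as exactly one argument.
nul_count=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' find-sh)

echo "without -print0: $naive_count arguments"
echo "with -print0:    $nul_count arguments"
rm -rf "$dir"
```

The second count should equal the number of files, whatever characters their names contain.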
However, various find implementations know about the expression
+ (instead of ;) in connection with -exec.
This increases performance in the same way, but obsoletes xargs.
It's much simpler then:
$ find . -name xxx -exec command {} +
This has become a standard: it's specified in the
SUSv3,
aka IEEE 1003.1-2001/2004.
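The difference between ";" and "+" can be made visible by counting how often the command is started. This is a minimal sketch, assuming a find that supports -exec ... {} + and a mktemp -d; "find-sh" is just a label that becomes $0 of the inner shell.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
: > "$dir/a"; : > "$dir/b"; : > "$dir/c"

# ";" form: the inner shell is started once per file.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' find-sh {} \; | wc -l | tr -d ' ')

# "+" form: the pathnames are collected and handed over in one batch,
# so for a handful of short names the inner shell is started once.
batched=$(find "$dir" -type f -exec sh -c 'echo run' find-sh {} + | wc -l | tr -d ' ')

echo "invocations with ';': $per_file"
echo "invocations with '+': $batched"
rm -rf "$dir"
```

With many or very long pathnames, find may still split the batch into several invocations to stay within ARG_MAX.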
"find . -exec echo x {} +" works correctly.
Implementations without this feature:
Having -exec ... + in mind, the usefulness and elegance of -print0 is debatable,
because most utilities (including the whole traditional toolchest) don't handle null-terminated output.
Stéphane Chazelas pointed out that at least
GNU grep (-Z), GNU sort (-z), perl (-0) and zsh (via IFS)
understand this output (at the time of this writing).
Apart from these, some utilities from the new AT&T ast toolchest
also understand
it and the bash "read" built-in knows an
appropriate option (-d delimiter) since release 2.04.
Another issue with xargs: possible problems with the character set encoding of arguments for xargs.
For example, some xargs implementations (e.g. SunOS 5) yield an error,
if you run them in a multi-byte locale
with path names encoded in a single-byte locale (spotted by S.Chazelas and G.Clare in c.u.s).
"find . -name '*'"
The argument of -name is a pattern. POSIX requires (plain) pattern matching,
as in "case $var in pattern)", not filename expansion (file globbing).
So, according to POSIX, "find . -name '*'" shall match leading dots.
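The distinction between pattern matching and filename expansion can be checked directly in the shell. The sketch below assumes a POSIX-conforming find and sh, and mktemp -d; the file names are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
: > "$dir/.hidden"
: > "$dir/visible"

# Plain pattern matching (case): "*" may match a leading dot.
case .hidden in
  *den) case_result=match ;;
  *)    case_result=nomatch ;;
esac

# Filename expansion (globbing): "*" skips names with a leading dot.
glob_result=$(cd "$dir" && echo *)

# find -name '*' with plain pattern matching lists both names.
find_result=$(cd "$dir" && find . ! -name . -name '*' -print | LC_ALL=C sort)

echo "case: $case_result"
echo "glob: $glob_result"
printf '%s\n' "$find_result"
rm -rf "$dir"
```

On an implementation where the dot is special, the last call would list only "./visible".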
On Version 7, find implements matching with
glob(3) and thus treats the dot specially.
The same applies to System III and V and to the BSDs until 4.3BSD-Tahoe.
In the BSD line, find on 4.3BSD-Reno switched to
fnmatch(3), so the dot is not special anymore.
What about later implementations?
The dot is special on
* before the dot,
that is, '*.' matches '.'.
/bin/find
The dot is not special anymore on:
u95-find [sco6]
/usr/xpg4/bin/find
"-print"?
Originally, find(1) required "-print".
However, it was changed to be the default action on 4.3BSD-Reno
and in POSIX.
It's not required on:
It was required on:
Omitting -print:
Be careful with omitting -print when using logical operators (-a or -o):
An implicit -print binds to the whole expression, while
an explicit -print binds as if it was added as -a -print.
Example: the following two commands are equivalent,
find . -name omit-directory -prune -o -type f
find . \( -name omit-directory -prune -o -type f \) -print
and so are these two:
find . -name omit-directory -prune -o -type f -print
find . \( -name omit-directory -prune \) -o \( -type f -print \)
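The effect of the two bindings shows up in what happens to the pruned directory itself. A minimal sketch, assuming a find with implicit -print and a mktemp -d; the layout is made up apart from the name omit-directory used above.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
mkdir "$dir/omit-directory"
: > "$dir/omit-directory/inside"
: > "$dir/file"

# Implicit -print covers the whole expression: the left branch
# (-name ... -prune) is true for the directory, so it is printed as well.
implicit=$(cd "$dir" && find . -name omit-directory -prune -o -type f | LC_ALL=C sort)

# Explicit -print is ANDed to "-type f" only: the pruned directory
# is skipped silently.
explicit=$(cd "$dir" && find . -name omit-directory -prune -o -type f -print | LC_ALL=C sort)

printf 'implicit:\n%s\n' "$implicit"
printf 'explicit:\n%s\n' "$explicit"
rm -rf "$dir"
```

In both cases the file inside omit-directory is not listed; only the directory entry itself differs.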
BSD fast-find
Another issue where omitting -print can be confusing: the BSD fast-find feature.
It was first implemented in 4.3 BSD: if given only one single argument,
"find filename", then a database was searched for pathnames containing this filename as component.
But if -print is the default action, this can be confused with the widely known syntax,
where one single argument is interpreted as a directory to descend into.
Thus the feature was removed with 4.3 BSD-Reno and implemented with another utility, locate.
Some variants which implemented the fast-find feature:
"-ls"?
Originally, find didn't know "-ls".
It is for instance available on
It is not available on
By the way, the resulting output imitates "ls -ldis".
It includes inode number and size in blocks
(with system specific blocksize), sometimes it's documented as kilobytes.
"-path"?
Originally, find didn't know "-path".
It is for example available on
It is not available on
-printf"?find -print") with
-H/-L
"-H" was introduced with 4.4BSD-alpha; "-L" with 4.4BSD-Lite.
Some implementations that support the options -H/-L:
-H)
-H)
-H)
XPG find)
Several modern implementations substitute {} even if it is embedded in a string like this
find . -exec echo xx{}xx \;
However, the traditional find requires {} to be a separate argument.
The new behaviour was introduced around 386BSD,
according to the FreeBSD manpage archive.
POSIX/SUS only specifies {} standing alone, but allows them to be embedded
as implementation defined behaviour.
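Where an embedded {} cannot be relied on, one way around it is to keep {} as a separate argument and let a small shell do the embedding. This is only a sketch of that idea, not something taken from a particular implementation; "find-sh" is an arbitrary label for $0 and mktemp -d is assumed.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
: > "$dir/name"

# {} stands alone, as POSIX specifies; the inner shell receives the
# pathname as $1 and builds the surrounding string itself.
wrapped=$(cd "$dir" && find . -type f -exec sh -c 'echo "xx${1}xx"' find-sh {} \;)

echo "$wrapped"
rm -rf "$dir"
```

This costs an extra shell per file, so it trades speed for portability.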
These variants for example accept an embedded {}:
but these don't accept an embedded {}:
"$SHELL -c 'cmd' x y z ...": how
do shells convert the arguments to $0, $1, etc.?
Strictly speaking this is a shell issue, but it's explained here, because the combination
of find and sh is very versatile,
like a Swiss army knife.
Consider the call
$SHELL -c 'command' arg0 arg1 ...
Most shells, especially modern ones, set $0 to arg0 and $1 to arg1.
Thus you will set arg0 to something which makes sense to become $0, e.g. "sh" or "find-sh".
However, early variants [4]
of both the Almquist shell and the Korn shell (before ksh88f)
implement this differently: $0 is set to $SHELL
and $1 is set to arg0. Here you need:
$SHELL -c 'command' arg1 arg2 ...
What does this mean, if you need a portable call?
Back to find: if you only want to make use of one argument
per call (that is, $1 but not $2, $3, etc.),
you can work around by merging both variants:
find ... -exec sh -c '...' {} {}
But if you want to use more parameters (or just "$@") in the shell:
$SHELL -c 'shift $1; command' 2 1 arg1 arg2 ...
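Why "2 1" works under both conventions can be traced step by step. The sketch below runs the call on whatever sh is installed (assumed to follow the modern convention) and merely simulates the old convention with "set --", since an early ash or ksh is unlikely to be at hand.

```shell
#!/bin/sh
# Modern convention: $0=2, $1=1, $2=arg1, $3=arg2.
# "shift $1" is "shift 1" and drops the helper value "1".
modern=$(sh -c 'shift $1; echo "$@"' 2 1 arg1 arg2)

# Old convention (simulated): $1=2, $2=1, $3=arg1, $4=arg2.
# "shift $1" is "shift 2" and drops both helper values.
old=$(sh -c 'set -- 2 1 arg1 arg2; shift $1; echo "$@"')

echo "modern:        $modern"
echo "old (emulated): $old"
```

In both cases "$@" ends up holding only the real arguments.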
Example: You want to move all files in one directory to another directory,
but get an "arg list too long" due to ARG_MAX (too many files) with "mv * /dir".
For reasons of speed you might want to avoid a shell for-loop if you have
a really high number of files.
And if you cannot exclude filenames with blanks or newlines,
you will use "find" with its + notation.
You will need the shell to hand over the last argument,
the target directory, to the "mv" command separately [5].
Here are two solutions: the first assumes a modern shell implementation,
the second is completely robust:
find . ! -name . -prune -exec sh -c 'mv "$@" targetdirectory/' find-sh {} +
find . ! -name . -prune -exec sh -c 'shift $1; mv "$@" targetdirectory/' 2 1 {} +
You do need the shell, because the following is not a valid syntax (it was not allowed to minimize possible confusion with a valid argument "+"):
find ... -exec mv {} target_directory +
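The robust variant can be exercised on a scratch directory. This sketch differs from the commands above in one detail: the target is passed through an exported variable with the hypothetical name TARGET, so that the temporary path does not have to be written into the command string. mktemp -d is assumed.

```shell
#!/bin/sh
src=$(mktemp -d) || exit 1
dst=$(mktemp -d) || exit 1
: > "$src/plain"
: > "$src/with blank"

# TARGET is a made-up name; the inner shell inherits it from the environment.
TARGET=$dst
export TARGET

# find batches the first-level entries; the shell appends the target last.
(cd "$src" && find . ! -name . -prune -exec sh -c 'shift $1; mv "$@" "$TARGET"/' 2 1 {} +)

moved=$(ls "$dst" | LC_ALL=C sort)
remaining=$(ls -A "$src")

printf 'moved:\n%s\n' "$moved"
rm -rf "$src" "$dst"
```

The file name containing a blank arrives intact, because the names never pass through word splitting.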
[4] Shell implementations with the old behaviour about arg0:
Older variants than ksh88f exist, for example, on
· HP-UX 8-11 ("ksh" but not "sh"): ksh88c
· AIX 3.2: ksh88d
· Ultrix 4.5: ksh88a-c?
· Unicos 9: ksh88e
Older variants of ash exist on
· all traditional BSDs that come with an ash (4.3BSD-Net/2 ... 4.4BSD-Lite2)
· FreeBSD before 2.1.0 (10/95)
· NetBSD before 1.2 (10/96)
· Minix before 3.1.3 (5/06)
[5] GNU mv knows an option to prepend the target, which allows
find [...] -exec mv --target-directory=dir {} +
Gunnar Ritter's
Heirloom Toolchest
implements several traditional variants and POSIX/SUS.
The AT&T AST
toolchest is a POSIX/SUS implementation with numerous extensions.
The busybox toolchest is aiming at tiny implementations.
Jörg Schilling's
sfind
is a POSIX/SUS implementation.
[sco6] SCO OpenServer 6.0.0 provides three find variants
(thanks to Rodolfo Martín for access):
- /usr/bin/find
- /usr/bin/posix/find
- /u95/bin/find - interestingly more POSIX compliant than the posix variant, for example concerning pattern matching