About `find(1)`

with an emphasis on portability and the fine details of a few basic issues

2016-11-01 (see recent changes)

Table of contents

Basic issues:

Limit the search to the current directory portably (versus -maxdepth)
You might be able to avoid xargs(1) (-exec ... {} + )
pattern matching vs. filename expansion (-name '*')

Availability and origin of some features:

Where do you need -print
Availability of -ls
...and -path
...and -printf
Omission of path-list
Following symbolic links, -H/-L
Embedded ..{}..

A useful combination of find and shell, and a frequent question about it:

What is the meaning of x in find -exec sh -c '...' x {}?

Pointers to several system independent implementations.

(What else can you do with find?)

Limit the search to the current directory portably

An often recommended way to limit find(1) to one level (i.e., not descending into directories) is using the expression -maxdepth 1.
This expression was introduced by GNU findutils. FreeBSD 4.1, NetBSD 2.0, OpenBSD 2.0 and the AST toolchest adopted this.
Mac OS X implements it since 10.2 (Darwin 6), switching from NetBSD find to FreeBSD 4.5 find.

But it's also possible to do this with the traditional find ¹.

^[1] You need a find utility that offers the expression "-prune".
And nowadays this is generally available, because it had already been introduced with 4.3BSD-Reno and SVR4 (i.e., about 1989).
An exception to this rule is the busybox multi-call binary, which emphasizes minimalistic replacements. It implements -prune and -maxdepth since June 2007 as a compile time option.

The portable way to avoid descending is

    $ find .      ! -name . -prune <remaining expressions>
    $ find /etc/. ! -name . -prune <remaining expressions>

The first variant is flawless and the remaining text in this section is about the 2nd variant.

Usually, the second works as well (but I know SVR4.0 v2.1 and Cray Unicos as exception).
(But be very careful with other variants like "find /etc [...]" and "find /etc/ [...]".)

The order of arguments is crucial here, because they are not options but expressions.

The explanation sounds obvious: find never lists the '..' entry. If you also exclude the '.' entry and then apply "-prune" to all the remaining entries, find certainly won't descend anymore.
If you ever need the '.' included (pointed out by Stéphane Chazelas in comp.unix.shell), then you can use "find . \( -name . -o -prune \)".
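
The behaviour described above can be tried in a scratch directory. This is only a minimal sketch: the directory layout and the variable names are made up for illustration, and `mktemp -d` is assumed to be available (it is widespread, but not specified by POSIX).

```shell
# Scratch tree; all names here are hypothetical.
demo=$(mktemp -d) || exit 1
mkdir "$demo/sub"
: > "$demo/top.txt"
: > "$demo/sub/deep.txt"

# '.' fails "! -name .", so -prune is never applied to it; every other
# entry is pruned and printed, so find cannot descend any further.
one_level=$(cd "$demo" && find . ! -name . -prune -print | LC_ALL=C sort)

# Variant that also reports '.' itself.
with_dot=$(cd "$demo" && find . \( -name . -o -prune \) -print | LC_ALL=C sort)

printf '%s\n' "$one_level" "$with_dot"
rm -rf "$demo"
```

In both cases `sub/deep.txt` should not show up, because `sub` is pruned before find can enter it.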

About the portability:

The second call works everywhere I tried, except on SVR4.0 v2.1 and Unicos 9.0.2.2. There you need
```
    $ find /etc/. ! -name /etc/. -prune <remaining expressions>
    $ find /etc   ! -name /etc   -prune <remaining expressions>
```
and so on; because here the string is compared literally (and not with an internal "basename" mechanism). The results of this call will look like "/etc/./hosts", which resolves without any problems on Unix.
The first call doesn't always work with earlier versions of GNU find:
Here, be careful when additionally applying a pattern match on the result of the "-prune". At least the versions since 3.8 until the stable 4.1 (and until the alpha release 4.1.5) suffer from a bug, which prevents them from working:
```
    $ find . ! -name . -prune -name '<pattern>' <remaining expressions>
```
"-name" is not only applied to the result of "-prune" but all existing entries, because it is "optimized" to the left of "-prune". This is violating the fundamental left-to-right order of evaluation.
By the way: If you negate the second "-name" then the bug is apparently not triggered anymore.
Thus, unmaking the negation by just doubling it looks like a proper workaround:
```
    $ find . ! -name . -prune ! \( ! -name '<pattern>' \) <remaining expressions>
```
(See <3D7355C4.23L11TSIK@bigfoot.de> and <3DA23EDA.LHE11XWCK@bigfoot.de>.)
What about the two remaining problematic ways mentioned above?
- "find /etc ! -name etc -prune <remaining expressions>"
  This won't omit files like "/etc/etc/file". Thanks to Stéphane Chazelas for reminding me of this.
  And interestingly, on SCO OpenServer 5.0.6, this requires "! -name /etc" instead of "! -name etc", for top level directories (only).
- "find /etc/ ! -name etc -prune <remaining expressions>"
  won't omit files like "/etc/etc/file" either; and the actual output is even varying, because on many systems, you mustn't append a slash to the path argument. This is depending on the find-internal "basename" implementation, which is often simpler than the libc implementation:
  The above will fail with a trailing slash for example on AIX 3.2, FreeBSD 4.3, 4.5, GNU-findutils-4.1, Irix 6.5, 5.3, NetBSD 1.5.2, OpenBSD 2.9, SunOS 4.1.4, OpenServer 5.0.6, Solaris 2.1-2.9 and UnixWare 1 (aka SVR4.2).
  It will work properly on AIX 4.3, HP-UX 10.x, NetBSD 1.5 and OSF1/V4, /V5.
  On some of the former, affected versions (all Solaris, FreeBSD-4.3, OpenBSD-2.9, Irix 5/6, UnixWare 1), you even can illustrate the "basename bug" with an interesting workaround. Both following examples yield the same correct result, although the empty argument is provided in the second case:
```
    $ find /etc  ! -name etc -prune -print
    $ find /etc/ ! -name ''  -prune -print 
```
  This interesting side effect is another reason to mention these two unportable variants.

Why use find at all? It's helpful for

automatically avoiding the entries '.' and '..' (particularly, if you feed the result of find to other commands, like chown, chmod, etc). Otherwise, you would have to fiddle around with more complicated shell pattern matching ² or alike.

^[2]	To match all possible files without `.` and `..` you could use `* ..?* .[!.]*`. See also 8ned6b$k8$1@nnrp1.deja.com and ff., "Command to find out if a directory is empty", in comp.unix.shell for a discussion of patterns like `".?? .[!.]* ? *"`.

using the sophisticated filtering options of find
also processing the arguments with find and thus not having trouble with special filenames which tend to break simple scripts (that is, filenames containing embedded spaces or even newlines). And see below for a powerful way to even combine find and shell.

You might be able to avoid `xargs(1)` (with `-exec ... {} +` )

Another frequently mentioned feature of GNU findutils ³ is a special combination with xargs, "[...] -print0 | xargs -0".
The purpose is increasing performance by avoiding to fork/exec for each single argument;
and the usage of null-terminated filenames avoids problems with unexpected filenames.

^[3]	GNU find introduced it with release 2.0 in nov '90. NetBSD 1.0, FreeBSD 2.0.5, (and thus) Mac OS X came with it from start, and OpenBSD 2.1, the AST toolchest and HP-UX 11.23 incorporated this feature.

However, various find implementations know about the expression -exec + (instead of -exec \;).
This increases performance in the same way, but obsoletes xargs. It's much simpler then:

    $ find . -name xxx -exec command {} +
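
The difference between the two terminators can be made visible by letting the executed command report how many pathnames it received. A small sketch, assuming a find that supports `-exec ... {} +` and an available `mktemp -d`; the file names are hypothetical.

```shell
demo=$(mktemp -d) || exit 1
: > "$demo/a"; : > "$demo/b"; : > "$demo/c"

# "\;" starts one process per file: three lines, each reporting one argument.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)

# "+" collects the pathnames: a single process that receives all three.
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"
rm -rf "$demo"
```

With only three short names the batch fits easily into one argument list; with very many files, find would start several batches.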

This has become a standard: it's specified in the SUSv3, aka IEEE 1003.1-2001/2004.
Actually it originates from SVR4 ('88), where it was not yet documented (the feature was implemented by D. Korn; see two messages from the austin-group-list, local copies),

implemented in all SunOS 5.x versions, but not documented until SunOS 5.9 (~'01).
implemented but not documented on other earlier SVR4-systems like Unicos 9.0.2.2 (Cray) and EP/IX 2.2.1AA (Control Data).
implemented and documented on UnixWare (aka "SVR4.2", ~'92).

More implementations came along then:

implemented and documented on HP-UX since 11.x.
(The first argument is wrongly ignored on earlier 11.x; that is, "find . -exec echo x {} +" works correctly.
It was fixed with a patch for 11.11, implemented between find revision PHCO_25905 and PHCO_29692 (~'05), as pointed out
by Florian Anwander and Peter Holzer in <3iakecFkjudbU1@individual.net> ff.)
implemented and documented on FreeBSD since 5.0 (Jun/02)
implemented and rather undocumented since AIX 5.3 ('04).
It's easy to miss in both the manual pages on 5.3 and 6.1: Without further explanation, the only appearance is
"The end of the specified command must be punctuated by a semicolon in quotation marks, an escaped semicolon, or a plus sign."
sfind implements it at least since release 0.92 (07/2004)
GNU findutils introduced it with 4.2.12 (Jan 2005);
but bugs still were fixed in 4.2.19 (Mar 2005),
4.2.26 (Nov 2006) ("Applied bugfix [...], where many short arguments would cause [...] find -exec .. {} + to fail")
and 4.2.28 (Aug 2006) ("find does not subtract environment size in find .. -exec {} +")
[GNU was kind of late, because it seems not to have been actively maintained between 2000 and 2004.
On 2002-9-13 and 2004-01-16, I had asked the former maintainers whether they also plan to implement -exec +, but I never got an answer.]
implemented and documented on NetBSD since 4.0 (Oct/06)
implemented in the AST toolchest since 99-02-14, but not documented.
Up to and including AST 2006-01-24, the braces must be omitted to get the expected behaviour.
With AST 2007-03-28 (s+, released 2007-07-01) this has been fixed.
Mac OS X 10.5 (Darwin 9, Leopard, Oct/07) was certified as UNIX03 compliant.
It incorporated FreeBSD 7 find (formerly FreeBSD 4.5 find).
OpenBSD implemented and documented it with release 5.1 (Jan 2012)
Busybox implemented and documented it (as optional compile time feature) with release 1.23.0 (Dec 2014)

Implementations without this feature:

IRIX 6.5.x, because its find is derived from SVR3
Reliant UNIX 5.43, because its find is derived from SVR3

Having -exec ... + in mind, the usefulness and elegance of -print0 is debatable:

even if nothing is found, xargs executes once, which might be unexpected.
Only some implementations know an option to avoid running with empty input.
most utilities (including the whole traditional toolchest) can't handle the null-terminated output from "find ... -print0"
Stéphane Chazelas pointed out that, however, at least GNU grep (-Z), GNU sort (-z), perl (-0) and zsh (via IFS)
understand this output (at the time of this writing). Apart from these, some utilities from the new AT&T ast toolchest
also understand it and the bash "read" built-in knows an appropriate option (-d delimiter) since release 2.04.
Another issue with xargs: possible problems with the character set encoding of arguments for xargs.
For example, some xargs implementations (e.g. SunOS 5) yield an error, if you run them in a multi-byte locale
with path names encoded in a single-byte locale (spotted by S.Chazelas and G.Clare in c.u.s).
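
The first point, about empty input, can be checked quickly. This sketch only asserts the find side; what xargs does on empty input is implementation dependent, so its result is merely printed. `mktemp -d` and the variable names are assumptions for illustration.

```shell
empty=$(mktemp -d) || exit 1

# No match, so find never starts the command: zero lines of output.
exec_runs=$(find "$empty" -type f -exec echo ran {} + | wc -l | tr -d ' ')

# xargs may still start "echo ran" once on empty input; whether it does
# depends on the implementation, so the result is only printed here.
xargs_runs=$(find "$empty" -type f -print | xargs echo ran | wc -l | tr -d ' ')

echo "-exec +: $exec_runs run(s), xargs: $xargs_runs run(s)"
rmdir "$empty"
```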

"Pattern matching" vs "filename expansion" (`find . -name '*'`)

The argument to -name is a pattern. POSIX requires (plain) pattern matching, not filename expansion (file globbing).
Filename expansion is a more specialized variant of pattern matching: it requires that a leading dot and the slash as pathname separator be given explicitly.
The shell knows the more general pattern matching in "case $var in pattern)"

So, according to POSIX, "find . -name '*'" shall match leading dots.
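
The distinction can be illustrated with the shell itself. A sketch, assuming a shell with default globbing settings, a find that follows POSIX here (see the lists below for those that don't), and `mktemp -d`; all file names are hypothetical.

```shell
demo=$(mktemp -d) || exit 1
: > "$demo/.hidden"
: > "$demo/visible"

# Filename expansion: a leading dot must be matched explicitly,
# so "*" does not expand to ".hidden".
globbed=$(cd "$demo" && echo *)

# Plain pattern matching, as in "case": "*" matches a leading dot.
case_match=no
case .hidden in
    *) case_match=yes ;;
esac

# A find that uses plain pattern matching reports both files.
found=$(cd "$demo" && find . ! -name . -name '*' -print | LC_ALL=C sort)

printf '%s\n' "glob: $globbed" "case: $case_match" "find:" "$found"
rm -rf "$demo"
```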

On Version 7, find implements matching with glob(3) and thus handles the dot special.
The same applies to System III and V and to the BSDs until 4.3BSD-Tahoe.
In the BSD line, find on 4.3BSD-Reno switched to fnmatch(3), so the dot was not special anymore.

What about later implementations?

The dot is special on

7th edition (aka V7), System III
BSD until 4.3 BSD-Tahoe
SunOS 5.x /bin/find. This implementation is a special case, because it allows a * before the dot, that is, '*.' matches '.'.
SCO OpenServer 5.0.6 both /bin/find and /bin/posix/find
SCO OpenServer 6.0.0 /bin/find and /bin/posix/find ^[sco6]
Cray Unicos 9.0.2.2
Control Data EP/IX 2.2.1AA
busybox-1.01

The dot is not special anymore on:

4.3 BSD-Reno ff.
GNU findutils since 4.2.2 (Oct 2004)
BSDi/OS at least since 4.1
FreeBSD at least since 4.7
HP-UX at least since 8.07
OSF1 at least since V4.0G
SCO OpenServer 6.0.0 /u95/bin/find ^[sco6]
IRIX at least since 4.0.5
SunOS 5.9 /usr/xpg4/bin/find
sfind at least since 0.9
AST find
Minix 3.1.1.

Where do you need "`-print`"?

The original find(1) required "-print". However, it was changed to be the default action on 4.3BSD-Reno and in POSIX.
It was still required in vanilla System V until (including) SVR4.2, that is, UnixWare 1. I had a look at some implementations:

It's not required on:

4.3 BSD-Reno and its descendants: 386BSD, Free-, Net-, OpenBSD, BSDi/OS at least on 4.1
GNU find
AIX at least 4.3
OSF1/V4 ff.
HPUX at least since 8 ff.
OpenServer at least on 5.0.6 / 6.0.0
IRIX 6.5
SunOS since 5.5 ff.
Unicos at least on 9.0.2.2
UnixWare since 2
busybox-1.01
Minix at least on 3.1.1,
AST find
sfind at least v1.1

It was required on:

7th edition (aka V7)
System III
BSD until and including 4.3 BSD-Tahoe
Ultrix at least on 4.5
AIX at least on 3.2
SunOS 4 at least on 4.1.4
SunOS 5.1-5.4
UnixWare 1 (aka SVR4.2)
IRIX at least on 5.3
MUNIX at least 3.2, EP/IX at least 2.2.1AA

Omitting -print:

Be careful with omitting -print when using logical operators (-a or -o):
An implicit -print binds to the whole expression, while an explicit -print binds as if it was added as -a -print.
Example:

    find .    -name omit-directory -prune -o -type f 
    find . \( -name omit-directory -prune -o -type f \) -print

    find .    -name omit-directory -prune -o -type f  -print
    find . \( -name omit-directory -prune \) -o \( -type f -print \)
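
The two groupings give different output as soon as the pruned directory exists. A sketch, assuming a find where -print is the default action and an available `mktemp -d`; the tree is made up for illustration.

```shell
demo=$(mktemp -d) || exit 1
mkdir "$demo/omit-directory" "$demo/keep"
: > "$demo/omit-directory/x"
: > "$demo/keep/f"

# Implicit -print covers the whole expression: the pruned directory
# itself is listed, because "-name ... -prune" is true for it.
implicit=$(cd "$demo" && find . -name omit-directory -prune -o -type f | LC_ALL=C sort)

# Explicit -print binds only to "-type f": the pruned directory is silent.
explicit=$(cd "$demo" && find . -name omit-directory -prune -o -type f -print | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$demo"
```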

BSD fast-find

Another issue where omitting -print can be confusing: the BSD fast-find feature.
It was first implemented in 4.3 BSD: If given only one single argument,

   find filename

then a database was searched for pathnames containing this filename as component.

But if -print is the default action, this can be confused with the widely known syntax,
where one single argument is interpreted as a directory to descend into.
Thus the feature was removed with 4.3 BSD-Reno and implemented with another utility, locate.

Some variants which implemented the fast-find feature:

4.3BSD (first BSD) until 4.3BSD-Tahoe (last BSD)
SunOS 4.1.2

Availability of the expression "`-ls`"?

Version 7 find didn't know "-ls".
It was introduced with 4.3BSD (early '86).
However, it has never been implemented in the vanilla SysV line (up to SVR4.2).
So it was added individually on non-BSDs.

It is for instance available on

4.3 BSD and its descendants: 386BSD, Free-, Net-, OpenBSD, BSDi/OS (at least on 4.1), SunOS 4.1.2
GNU (from its beginning)
OSF1/V4
AIX 3.2
SunOS 5.1 ff.
EP-IX 2.2.1AA (/bsd43/bin/find)
AST find (since 98-11-11)
sfind

It is not available on

Ultrix 4.5
HP-UX 11.31
SCO OpenServer 5.0.6 / 6.0.0
SVR4.0 v2.1, UnixWare 7.1.4 (aka SVR4.2), Unicos 9.0.2.2, EP/IX 2.2.1AA (/usr/bin/find), Reliant Unix, Irix 6.5.22
Minix 3.1.1.

By the way, the resulting output imitates "ls -ldis".
It includes inode number and size in blocks (with system specific blocksize), sometimes it's documented as kilobytes.

Availability of the expression "`-path`"?

Version 7 find didn't know "-path".
It was introduced with 4.4BSD-alpha (mid '91).
It has never been implemented in the SysV line.
Meanwhile it was picked up by SUSv4 (aka POSIX).

It is for example available on

GNU (since 3.6, mid-1992),
NetBSD (1.2, 12/93, sync with 4.4BSD).
FreeBSD and
OpenBSD (initial revisions),
HP-UX at least since 10.x,
sfind (since 0.93, 7/'04)
AST find.

It is not available on

SunOS 5.10, 5.11 (OpenSolaris 2009.06) (both traditional and xpg4 version of find on each)
IRIX 6.5
AIX 6.1
OpenServer 5.0.6, 6.0.0 (bin and u95)
OSF1 V5.1
Ultrix 4.2
Minix 3.1.8
UnixWare 7.1.4, OpenUnix 8 (aka SVR4.2)

Availability of the expression "`-printf`"?

Only GNU find seems to implement it as of this writing. The Changelog for GNU find doesn't mention its implementation; so it might have been available from start (02/'87)

Omission of path-list

You may omit the path-list (example: "find -print") with

GNU find (since version 2.1)
busybox (v1.01)
sfind (v1.1)
AST find
OpenServer 6.0.0 u95-find

But most implementations will require it, that is, at least

FreeBSD 6
NetBSD 3
BSDi/OS 4.1
HP-UX until 11.31
OSF1/V5.1
AIX 3.2
OpenServer 5 and 6 (bin and posix)
SunOS 4
IRIX 6.5.22
Unicos 9.0.2.2
SunOS 5.10 (svr4 and xpg4/posix)
Minix 3.1.1.
UnixWare 1 (aka SVR4.2)

Following symbolic links, "`-H/-L`"

"-H" was introduced with 4.4BSD-alpha; "-L" with 4.4BSD-Lite.
Meanwhile these flags have been specified with SUSv3.
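
For orientation, here is what the two flags do on an implementation that supports them: -H follows symbolic links given on the command line only, -L follows all of them. A sketch with a hypothetical tree, assuming `mktemp -d` and `ln -s` are available.

```shell
demo=$(mktemp -d) || exit 1
mkdir "$demo/real"
: > "$demo/real/f"
ln -s real "$demo/link"

# Default: the symbolic link named on the command line is not followed.
plain=$(cd "$demo" && find link -print | LC_ALL=C sort)

# -H: a link given as path argument is followed.
cmdline=$(cd "$demo" && find -H link -print | LC_ALL=C sort)

# -L: links met during the traversal are followed as well.
follow=$(cd "$demo" && find -L . -name f -print | LC_ALL=C sort)

printf '%s\n' "default:" "$plain" "-H:" "$cmdline" "-L:" "$follow"
rm -rf "$demo"
```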

Some implementations that support the options -H/-L:

NetBSD since 1.4 (formerly only -H)
FreeBSD since 2.0 (formerly only -H)
OpenBSD since 3.2 (formerly only -H)
AST find at least since 95-03-11
sfind at least since 0.92
GNU findutils since 4.2.5 (2004-11-19)
SunOS since 5.10 (XPG find)
AIX since 5.2

Not implemented in

SCO OSR 5.0.6
Unixware7
EP-IX 2.2.1
OSF 5.1
IRIX 4.0.5

Embedded ..{}..

Several modern implementations substitute {} even if it is embedded in a string like this

    find . -exec echo xx{}xx \;

However, the traditional find requires {} to be a separate argument.
The new behaviour was introduced around 386BSD, according to the FreeBSD manpage archive.
POSIX/SUS only specifies {} standing alone, but allows them to be embedded as implementation defined behaviour.
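
If the embedded form is not available, one possible way around it (a suggestion, not something the implementations listed here prescribe) is to pass {} as a separate argument to a small shell command and to do the embedding there. A sketch, assuming a shell that sets $0 to the first argument after the command string, and `mktemp -d`; the file name is hypothetical.

```shell
demo=$(mktemp -d) || exit 1
: > "$demo/f"

# {} stands alone, as every find accepts it; the shell builds "xx...xx".
wrapped=$(cd "$demo" && find . -type f -exec sh -c 'echo "xx${1}xx"' sh {} \;)

echo "$wrapped"
rm -rf "$demo"
```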

These variants for example accept an embedded {}:

all traditional and free BSDs, BSDi/OS, Mac OS X
GNU (probably from the beginning)
AST (probably from the beginning)
busybox since 1.1.0-pre1, 20051003 (where -exec was implemented itself)
OpenServer 5.0.6
OpenServer 6.0.0 (bin and posix)

but these don't accept an embedded {}:

SunOS 5.10
IRIX 6.5.22
HP-UX 11.11 / 11.31
sfind-1.1
OpenServer 6.0.0 (u95-find)
SVR4.0 v2.1, EP/IX 2.2.1
UnixWare 1, 7.1.4 (aka SVR4.2)

What does the x in "`find ... -exec sh -c '...' x {}`" mean?

This is not concerning find, but differences in shell implementations:
When executing "$SHELL -c 'cmd' x y z ...", how do shells convert the arguments to $0, $1, etc.?
Nevertheless, it's explained here, because the above combination of find and sh is very versatile, like a swiss army knife.

Possible situations:
Consider the call
```
    $SHELL -c 'command' arg0 arg1 ... 
```
Almost all shells set $0 to arg0 and $1 to arg1.
This means:
- you can set $0 yourself
- you can make use of "$@" as usual, if you supply your arguments starting from arg1.
Thus you will set arg0 to something which makes sense to become $0, e.g. "sh" or "find-sh".
If you set it to something different, keep in mind that an embedded r might trigger a restricted mode,
or that zsh switches to different modes depending on $0. (Update: I can't reproduce any of this,
not even with early bourne shells. Thanks to Stéphane Chazelas for the hint.)
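
The common behaviour is easy to check on the shell at hand. A sketch; "find-sh", "first" and "second" are arbitrary placeholder arguments, and `sh` is assumed to be one of the shells that set $0 to arg0.

```shell
# Print $0, $1 and the number of positional parameters.
out=$(sh -c 'printf "%s|%s|%s\n" "$0" "$1" "$#"' find-sh first second)
echo "$out"
```

On such a shell, "find-sh" ends up in $0 and is not counted in "$#".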
However: early variants ⁴ of both the Almquist shell and the Korn shell (before ksh88f)
implement this differently: $0 is set to $SHELL and $1 is set to arg0. Here you need:
```
    $SHELL -c 'command' arg1 arg2 ...  
```
What does this mean, if you need a portable call?
Solutions for maximum portability:
Back to find: if you only want to make use of one argument per call (that is, $1 but not $2, $3, etc.),
you can work around by merging both variants:
```
    find ... -exec sh -c '...' {} {}
```
But if you want to use more parameters (or just "$@") in the shell:
Then you either have to decide on one of the two variants,
or use a completely robust workaround (suggested by Stéphane Chazelas in comp.unix.shell):
```
    $SHELL -c 'shift $1; command' 2 1 arg1 arg2 ...
```
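
Why this works: on a shell with the usual behaviour, $0 is "2" and $1 is "1", so "shift 1" removes just the "1". On a shell with the old behaviour, $1 is "2", so "shift 2" removes both helper arguments. Either way "$@" starts at arg1. A sketch that can only exercise the branch of the shell it runs on; "a" and "b" are placeholder arguments.

```shell
# The two leading arguments "2 1" are consumed by the shift itself.
robust=$(sh -c 'shift $1; echo "$@"' 2 1 a b)
echo "$robust"
```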
But, why would you want to use find like this at all?
Example: You want to move all files in one directory to another directory,
but get an "arg list too long" due to ARG_MAX (too many files) with "mv * /dir".
For reasons of speed you might want to avoid a shell for-loop if you have a really high number of files.
And if you cannot exclude filenames with blanks or newlines, you will use "find" with its + notation.
You will need the shell to hand over the last argument, the target directory, to the "mv" command separately. ⁵
Here are two solutions: the first assumes a modern shell implementation, the second is completely robust.
```
    find . ! -name . -prune -exec sh -c           'mv "$@" targetdirectory/' sh  {} +
    find . ! -name . -prune -exec sh -c 'shift $1; mv "$@" targetdirectory/' 2 1      {} +
```
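
A variation of the first solution, tried on scratch directories. Unlike the lines above, the target is handed to the shell as an extra argument instead of being written into the command string; that detail, the file names and the use of `mktemp -d` are assumptions made for this sketch, which also relies on a modern shell ($0 is arg0).

```shell
src=$(mktemp -d) || exit 1
dst=$(mktemp -d) || exit 1
: > "$src/a b"          # a name with an embedded blank
: > "$src/c"

# $1 is the target; after the shift, "$@" holds only the found files.
(cd "$src" && find . ! -name . -prune \
    -exec sh -c 'target=$1; shift; mv "$@" "$target"/' sh "$dst" {} +)

moved=$(ls "$dst" | LC_ALL=C sort)
left=$(ls -A "$src")
printf 'moved:\n%s\n' "$moved"
rm -rf "$src" "$dst"
```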
You do need the shell, because the following is not valid syntax (it was disallowed to minimize possible confusion with a valid argument "+"):
```
    find ... -exec mv {} target_directory + 
```

^[4] Shell implementations with the old behaviour about arg0:

Older variants than ksh88f for example exist on
· HP-UX 8-11 ("ksh" but not "sh"): ksh88c
· AIX 3.2: ksh88d
· Ultrix 4.5: ksh88
· Unicos 9: ksh88e
· SVR4.0 v2.1: ksh88d

Older variants of ash exist on
· all traditional BSDs that come with an ash (4.3BSD-Net/2 ... 4.4BSD-Lite2)
· FreeBSD before 2.1.0 (10/95)
· NetBSD before 1.2 (10/96)
· Minix before 3.1.3 (5/06)

^[5]	GNU mv knows an option to prepend the target and allows to `find [...] -exec mv --target-directory=dir {} +`

^[sco6]

SCO OpenServer 6.0.0 provides three find variants (Thanks to Rodolfo Martín for access):
- /usr/bin/find
- /usr/bin/posix/find
- /u95/bin/find - interestingly more posix compliant than the posix variant, for example concerning pattern matching

System independent find implementations

GNU findutils.

Gunnar Ritter's Heirloom Toolchest implements several traditional variants and POSIX/SUS.

The AT&T AST toolchest is a POSIX/SUS implementation with numerous extensions.

The busybox toolchest is aiming at tiny implementations.

Jörg Schilling's sfind is a POSIX/SUS implementation.

See a link (cat-v.org) to an interesting answer from Dennis M. Ritchie about the history of the syntax of find.

What else can you do with find? Find prime numbers!

<http://www.in-ulm.de/~mascheck/various/find/>
Comments please to mascheck@in-ulm.de, I'd like to hear from you.
