Unwanted Buffering

2017-03-29

Avoiding "extraneous" buffering in pipelines

If you build a chain of filters, the output may arrive unexpectedly late:

    while sleep 0.1; do date; done|grep .         # immediate output each tenth of a second
    while sleep 0.1; do date; done|grep .|grep .  # delayed output in big chunks, about every 10 seconds

A real world example is searching growing logfiles with tail -f feeding several invocations of grep.

The explanation can be found in the C library, which provides three output buffering methods: unbuffered, line buffered and block (or fully) buffered. See setbuf(3).
There is no such buffering if a program calls write(2) directly instead of library functions like printf(3).
The C library offers buffering purely for performance reasons. Block buffering achieves the highest throughput because it has the lowest overhead.
The slower line buffering is usually chosen only if the output is connected directly to a tty, where immediate output is expected.
In a pipeline, the full buffering usually happens in blocks of PIPE_BUF bytes. Common values are 4096 bytes (4K) or 5120 bytes (10 blocks of 512 bytes).

    $ cpp<<EOF|egrep -v '^#|^$'
    #include <limits.h>
    PIPE_BUF
    EOF

    4096
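
Where no C preprocessor is installed, the same limit can usually be queried with getconf(1). This is only a sketch: PIPE_BUF is a per-path limit, so getconf expects a path argument, and "/" is used here merely as an example.

```shell
# ask the system for PIPE_BUF; the path "/" is only an example
getconf PIPE_BUF /
```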

Keep in mind that the cause is not the pipeline itself, but the buffering method a utility chooses when its output is not a TTY.
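
The following sketch makes that distinction visible from the shell. The helper name stdout_kind is hypothetical; it reports on stderr whether its standard output is a terminal, which is the kind of check a stdio-based program can use to pick its buffering method.

```shell
# stdout_kind: hypothetical helper; reports on stderr what kind of
# stdout a program started in this position would inherit
stdout_kind() {
    if [ -t 1 ]; then
        echo "stdout is a tty" >&2
    else
        echo "stdout is not a tty" >&2
    fi
}

stdout_kind | cat             # reports "stdout is not a tty"
stdout_kind > /dev/null       # a redirection to a file or device counts as "not a tty", too
```

Called without a pipe or redirection from an interactive terminal, the helper reports a tty instead.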

There's no universal way to avoid the buffering of a utility. However, here are some possible options:

1.) some utilities allow you to switch off full buffering
- Some grep implementations know --line-buffered: GNU since 2.5 (03/'02), FreeBSD since 5.3, NetBSD since 2.0, OpenBSD since 3.6
- GNU sed (-u/--unbuffered), since 3.02.80 (08/'99)
- GNU awk (-W interactive or via fflush())
- tcpdump (-l)
- GNU coreutils 7.5 (08/'09) provides the wrapper tool stdbuf, which uses the preloading method described under 3.) below.
- Ever since 7th edition Unix, the traditional cat has known the flag -u. Some variants do not buffer in a pipe anyway, e.g. Solaris 2.9 and GNU.
By the way, tee even produces line buffered output by default. That's why tail -f behaves intuitively on the resulting file.
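
As an illustration of these flags, here is a minimal sketch for the log file case mentioned at the beginning. It assumes GNU grep and GNU sed; the helper name filter_errors, the search pattern and the log file name are hypothetical.

```shell
# filter_errors: hypothetical helper; keeps lines mentioning "error"
# and marks them, flushing after every line (GNU grep and GNU sed assumed)
filter_errors() {
    grep --line-buffered -i 'error' | sed -u 's/^/[match] /'
}

# intended use (the log file name is only an example):
#   tail -f /var/log/messages | filter_errors | grep 'timeout'

# short, terminating demonstration
printf 'all fine\ndisk error: timeout\n' | filter_errors
```

Only the last command of a pipeline may be left without such a flag if it writes to the terminal, because it will then not use full buffering anyway.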

2.) if you have GNU coreutils available, use the wrapper utility "stdbuf"
It is provided by GNU coreutils since 7.5 (08/2009).

    while sleep 0.1; do date; done|           grep .|grep .   # delayed output
    while sleep 0.1; do date; done|stdbuf -o0 grep .|grep .   # immediate output
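
Besides -o0 (unbuffered), stdbuf also accepts -oL to request line buffering. The sketch below tries to make the effect measurable without watching the terminal. The helper stamp_lines is hypothetical, and a date(1) that supports +%s is assumed.

```shell
# stamp_lines: hypothetical helper; prefixes each line read from stdin
# with the whole seconds elapsed since the helper started
stamp_lines() {
    start=$(date +%s)
    while IFS= read -r line; do
        printf '%ss: %s\n' "$(( $(date +%s) - start ))" "$line"
    done
}

# the producer writes "first" at once and "second" about a second later
{ echo first; sleep 1; echo second; } |            grep . | stamp_lines   # both lines tend to arrive together
{ echo first; sleep 1; echo second; } | stdbuf -oL grep . | stamp_lines   # "first" should arrive at once
```

The printed values are rounded to whole seconds, so they only indicate the order of magnitude of the delay.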

3.) if you don't have a wrapper available:
you can work around the problem yourself if you have a compiler, if your system implements LD_PRELOAD,
and if your tool is linked dynamically against libc. Then you can preload code which modifies the buffering before the utility runs.
Compile this code to a shared library:
```
	/* linux$ gcc -fpic -c unbuffer.c; ld -shared -o libunbuffer.so unbuffer.o
	 * solaris$ cc -Kpic -c unbuffer.c; ld -G -o libunbuffer.so unbuffer.o
	 */
	#include <stdio.h>
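	/* runs when the library is loaded, before main() of the utility */
	void _init(void) {
	  setbuf(stdout, NULL);	/* NULL buffer: make stdout unbuffered */
	}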
```
and create a wrapper named "unbuffer":
```
	#!/bin/sh
	LD_PRELOAD=$HOME/lib/libunbuffer.so export LD_PRELOAD
	exec "$@" 
```
Coming back to the initial example:
```
    while sleep 0.1; do date; done|         grep .|grep .   # delayed output
    while sleep 0.1; do date; done|unbuffer grep .|grep .   # immediate output
```
Using _init() is only a hack (possible clash with real usage of this internal function?).
This item was inspired by a usenet posting from Stephane Chazelas.
Meanwhile GNU coreutils since 7.5 (08/'09) provides this functionality with the tool "stdbuf".
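
Whether a tool meets the "linked dynamically against libc" condition can be checked roughly as follows. This is only a sketch: the helper name is_dynamic is hypothetical, and it assumes an ldd(1) that lists the shared libraries of a binary, as found on Linux and Solaris.

```shell
# is_dynamic: hypothetical helper; succeeds if ldd lists a libc
# dependency for the command named in $1
is_dynamic() {
    ldd "$(command -v "$1")" 2>/dev/null | grep -q 'libc\.'
}

if is_dynamic grep; then
    echo "grep: dynamically linked against libc, preloading should be possible"
else
    echo "grep: no dynamic libc dependency found"
fi
```

A negative result can also mean that ldd itself is missing, so it should not be read as proof of static linking.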

4.) substitute the command with one that buffers differently

a while-shell-loop

    while sleep 0.1; do date; done| grep .                                                           |grep .  # delayed output
    while sleep 0.1; do date; done| while IFS= read -r line; do
					printf '%s\n' "$line"|grep -q . && printf '%s\n' "$line";
				    done                                                             |grep .  # immediate output
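
The loop above starts one grep per input line. If a plain substring match is enough, the shell can do the matching itself. The following sketch assumes exactly that; the helper name match_lines is hypothetical, and it matches a fixed string rather than a regular expression.

```shell
# match_lines: hypothetical helper; prints the lines from stdin that
# contain the fixed string given as $1, one write per line
match_lines() {
    while IFS= read -r line; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
        esac
    done
}

# intended use (the output of date contains a colon):
#   while sleep 0.1; do date; done | match_lines : | grep .

# short, terminating demonstration
printf 'foo\nbar\n' | match_lines ba
```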

perl ($| controls buffering), courtesy Stefan Reuther

    while sleep 0.1; do date; done| grep .                        | grep .      # delayed output
    while sleep 0.1; do date; done| perl -ne '$|=1; print if /./' | grep .      # immediate output
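
awk with an explicit fflush(), as mentioned under 1.), can be substituted in the same way. This is a sketch under the assumption that the awk in use supports fflush() and reads its input line by line, as GNU awk does; the helper name nonempty_lines is hypothetical.

```shell
# nonempty_lines: hypothetical helper; behaves like "grep ." but
# flushes stdout after every matching line
nonempty_lines() {
    awk '/./ { print; fflush() }'
}

# intended use:
#   while sleep 0.1; do date; done | nonempty_lines | grep .

# short, terminating demonstration
printf 'one\n\ntwo\n' | nonempty_lines
```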

5.) the package "expect" provides the expect script "unbuffer"
It simulates an interactive TTY, which makes the command choose line buffering. The usage is the same as in items 2.) and 3.), except that unbuffer needs the flag -p when the command reads its input from a pipe.

<http://www.in-ulm.de/~mascheck/various/buffering/>

Sven Mascheck