Memory management in the Bourne shell

About memory management in the Bourne shell

Update: See Stephen Bourne's talk at BSDCan 2015, "Early days of Unix and design of sh"
(on YouTube), for some enlightening and entertaining background on the memory management.

In comp.arch, 05/97, <5m2mu4$guf$1@murrow.corp.sgi.com>, John Mashey writes:

For speed, Steve B had used a clever trick of using a memory arena without
checking for the end, but placing it so that running off the end would cause
a memory fault, which the shell then trapped, allocated more memory,
then returned to the instruction that caused the trap and continued.
The MC68000 (in order to go fast) had an exception model that
broke this (among other things) and caused some grief to a whole generation
of people porting UNIX to 68Ks in the early 1980s.
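
The fault-and-retry trick described above can be illustrated with a small C sketch. It is not the Bourne shell's code, and it swaps in a different mechanism: instead of growing the data segment with sbrk, it reserves address space with mmap(PROT_NONE) and "grows" by calling mprotect from the signal handler. All names (arena_init, arena_store, on_fault, ARENA_RESERVE) are hypothetical. It assumes a POSIX system that reports the faulting address in si_addr and restarts the faulting instruction when the handler returns; calling mprotect from a handler is common practice but is not on the POSIX async-signal-safe list.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ARENA_RESERVE (1u << 20)          /* address space reserved up front */

static char *arena_base;                  /* start of the reserved region */
static volatile size_t arena_limit;       /* bytes made usable so far */
static volatile sig_atomic_t fault_count; /* how often the handler grew the arena */
static size_t page_size;

/* Fault handler: if the fault lies in the unusable part of the arena,
 * make memory usable up to and including the faulting page, then return
 * so that the faulting instruction is retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)arena_base;

    if (addr < base || addr - base >= ARENA_RESERVE || addr - base < arena_limit) {
        /* Not ours: restore the default action; the retried
         * instruction then faults again and ends the process. */
        signal(sig, SIG_DFL);
        return;
    }
    size_t new_limit = ((addr - base) / page_size + 1) * page_size;
    if (mprotect(arena_base + arena_limit, new_limit - arena_limit,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    arena_limit = new_limit;
    fault_count++;
}

/* Reserve the arena with no access rights and install the handler. */
static int arena_init(void)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, ARENA_RESERVE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    arena_base = p;
    arena_limit = 0;
    fault_count = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report this kind of fault as SIGBUS. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Store without checking the current limit: running past it faults,
 * and the handler grows the arena behind the scenes. */
static void arena_store(size_t off, char c)
{
    assert(off < ARENA_RESERVE);
    ((volatile char *)arena_base)[off] = c;
}

static char arena_load(size_t off)
{
    return ((volatile char *)arena_base)[off];
}
```

After arena_init(), a call such as arena_store(0, 'a') faults once, the handler makes the first page usable, and the store is retried. The whole scheme depends on that retry being possible, which is exactly what the 68000's exception model did not provide.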

He adds the following in <6p5d51$oeg$1@murrow.corp.sgi.com>:

[...] I've always felt bad about this, since I'm the one who accidentally
goaded Steve Bourne into doing this.  [Bell Labs Piscataway used the PWB shell
quite heavily for its scripting needs in the mid-1970s, on PDP-11s of course,
and performance was actually important.  I complained to Steve that
we just couldn't hack a 2:1 performance hit, and he went all-out to tune
the shell up, with this being one of the tricks.

And, not to forget, the following in <5m8bup$b88$1@murrow.corp.sgi.com>:

[...]
I thought from the context that the term clever contained some
irony [especially given that I had to worry about the problem myself later] ...
I probably should have put it in quotes.

In comp.arch, 11/94, <39ltmt$isg@nova.netapp.com> Guy Harris writes:

Some of us remember it with disgust from working on early 68020's, i.e.
the first Sun-3's.  SunOS 3.x and 4.0[.x] didn't save/restore the "stack
puke" that the '010's and '020's and '030's dump if the bus error is
handled by a user-mode signal handler, so the Bourne shell's SIGSEGV
handler couldn't restart the faulting instruction.

For some reason, that didn't show up as a problem with the vanilla
Bourne shell until SunOS 3.1 or so; as I remember, somebody changed
something in the kernel between 3.0 and 3.1.

It *had* shown up with a modified version of the Bourne shell from the
US Army's Ballistics Research Lab (no, it wasn't modified to give a
different interpretation of the word "shell" :-)), so I'd gone through
that version of the shell and changed it to check whether a given
reference *would* go past the end of the data segment and, if so, grow
the data segment.

I ported those changes to the vanilla Bourne shell for 3.1 or 3.2 or so,
and the problem cleared up.

SunOS 4.1[.x], I think, would save the "stack puke" if the bus error was
to be handed to a signal handler, and restore it when the signal
trampoline code did the "restore signal context" trap, so, as long as
the signal handler doesn't *itself* get a bus error that's handled by a
signal handler, it'd work OK.

(Dunno if John Mashey's reading this thread, but he has, I think,
claimed to have been responsible for provoking Bourne into handling the
data segment in that fashion, by complaining about the Bourne shell's
performance relative to the shell he did for PWB/UNIX 1.0, or something
like that.

However, I once compared the performance of an unmodified SVR3 Bourne
shell with a Bourne shell modified only to have my changes on a 3B2 - a
machine using a WE32100 processor which, as I remember, *doesn't* have
stack puke - with a test that I think should have fairly vigorously
exercised at least some of the data-segment-growing code, namely
something like

 cd <source directory for shell>
 echo `cat *.c`

and didn't see any performance difference.  Maybe that wasn't a good
test, or maybe things were different on the PDP-11.)

This is all one reason why I prefer it when processors don't have stack
puke....
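
The alternative Guy Harris describes, checking whether a reference would go past the end and growing first, needs no signal handler at all. The following C sketch illustrates the idea under stated assumptions; it is not his patch. As a stand-in for growing the data segment with sbrk, it reserves address space with mmap(PROT_NONE) and extends the usable part with mprotect. All names (arena_init, arena_ensure, arena_push, ARENA_RESERVE) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ARENA_RESERVE (1u << 20)   /* address space reserved up front */

static char    *arena_base;        /* start of the reserved region */
static size_t   arena_limit;       /* bytes currently usable (the "break") */
static size_t   arena_top;         /* bytes currently in use */
static unsigned grow_count;        /* how often the arena had to grow */

/* Reserve the arena; nothing is usable until arena_ensure() grows it. */
static int arena_init(void)
{
    void *p = mmap(NULL, ARENA_RESERVE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    arena_base = p;
    arena_limit = 0;
    arena_top = 0;
    grow_count = 0;
    return 0;
}

/* Check BEFORE touching memory: would the next `need` bytes run past
 * the usable end? If so, grow in whole pages first. */
static int arena_ensure(size_t need)
{
    assert(arena_base != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (need > ARENA_RESERVE - arena_top)
        return -1;                          /* reservation exhausted */
    size_t want = arena_top + need;
    if (want <= arena_limit)
        return 0;                           /* already usable */

    size_t new_limit = (want + page - 1) / page * page;
    if (new_limit > ARENA_RESERVE)
        new_limit = ARENA_RESERVE;
    if (mprotect(arena_base + arena_limit, new_limit - arena_limit,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    arena_limit = new_limit;
    grow_count++;
    return 0;
}

/* Append n bytes to the arena; returns where they were stored,
 * or NULL if the arena could not be grown. */
static char *arena_push(const char *s, size_t n)
{
    if (arena_ensure(n) != 0)
        return NULL;
    char *dst = arena_base + arena_top;
    memcpy(dst, s, n);
    arena_top += n;
    return dst;
}
```

The price is one comparison per append, which is consistent with the report above that no performance difference was measurable on the 3B2. In exchange, nothing depends on the processor being able to restart a faulted instruction.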

Amos Shapir, in <amos.784724267@hal>, on the change in the SunOS kernel between 3.0 and 3.1 mentioned above:

Just some more old geezers' war stories:  The bug in the Bourne shell had
made it release memory pages which still contained valid pointers; when
these pointers were used later, the SIGSEGV handler requested more memory,
and, in all the VAX systems and early Suns, sbrk gave it back the same pages
it released previously, unscrubbed.  What changed in SunOS 3.1 was that newly
allocated pages were zeroed out, and so broke this wonderful scheme.
I don't think it had anything to do with restarting the faulting instruction.

In comp.unix.wizards, 08/92, <1992Aug6.151154.25788@maths.nott.ac.uk>,
Andy N. Walker tries to explain why such a method might have been used at all:

In article <1992Aug5.232821.24813@athena.mit.edu> scs@adam.mit.edu
(Steve Summit) writes:
> Well, if you're the Bourne shell (/bin/sh), you can use this
> stupefyingly baroque "hit or miss" memory allocation technique in
> which you eschew malloc entirely [note 1] [...]
> Note 1.  To be fair, /bin/sh predates malloc.

        I can no longer find my V6 manual;  but "malloc" is not mentioned
as something new in my V7 manual, and I'm pretty sure that I sometimes
avoided "malloc" in V6.  If so, then although "Note 1" may be correct,
it's not fair, as the V6 "/bin/sh" wasn't the Bourne shell.

        On the other hand, to be fair, there were good reasons for avoiding
"malloc" in those days.  Unix on a PDP 11/34 [128K memory, 2.4M disc, 64K
per-process limit (inc the kernel), no split I/D] was a tight squeeze, and
utilities like "sh", "ed", "cc" had to be very economical.  If you have
only one dynamic data structure, then using "brk" and friends to grow the
necessary space probably saved hundreds or even thousands of bytes, which
could be used instead for the data.

        Programs of yesteryear used all sorts of exotic tricks to save
bytes.  We most certainly shouldn't be doing that today [except in bizarre
circumstances];  but nor should we be too dismissive of old practices.
[This is not a dig at SCS!]

In comp.arch, 10/00, <39DE8FB8.A034B1CF@bell-labs.com>, Dennis Ritchie recounts:

[...]

Early (through 6th edition, and PWB/Mashey) Unix shells used
the automatic system-provided mechanisms for
stack extension, namely that the stack grew from the top of
memory, the heap from below (by explicit request).  The upper
(stack) memory allocation was extended automatically when a fault
occurred and the hardware SP was within the unallocated region
between lower and upper--/bin/sh was exactly like all other Unix programs.

The Bourne shell was aggressively different.  For its heapish
storage, instead of using malloc or doing its own pre-planned
Unix sbrk() management, simply stored things at--and sometimes
beyond--the end of the lower memory allocation, and caught
SIGSEGV.  Enough information was available on the PDP-11 to do
a new sbrk and restart the "beyond"-referencing instructions.

When we came to the first port, Johnson and I discovered that
the recovery (even for the stack, which was supposed to be
automatic in the OS) was tough.  Eventually we found enough
Interdata 8/32 magic to make it possible.

But the Bourne shell was one of the last programs to be moved
to the Interdata machine--and that's why "osh" (the 6th edition
shell) was distributed with the 7th edition--we knew that future
ports would have difficulties with it.

As it happened, VAX, which was the next port, had no unsolvable problems
in dealing with instruction restart after protection exceptions.
So the Bourne code remained.

The Cray folks, somewhat later, had hardware that really couldn't help.
As others have mentioned, I think, their stacks are managed with
linked records in space that's allocated by requests that are explicit
in user-space, run-time code generated by the compilers, even if implicit
in the source program.

They were among those who rewrote the Bourne shell to use advertised
resources like malloc instead of fixup-and-recovery from memory faults.

Guy Harris also posted a patch to Usenet against the SVR2 shell that avoids the original allocation scheme.