From: karmadon@my-dejanews.com (karmadon@my-dejanews.com)
Subject: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16
Hi,
On 2.0.x kernel with glibc.2.x try the following:
$ sleep 100
^Z
$ fg
sleep will return immediately, instead of sleeping for the remaining interval.
This did not happen with older libc.5.x
The bug lies in the implementation of sleep(2) in glibc, which does it through
nanosleep(2) call. (older libc used alarm(2) and a signal handler).
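For comparison, here is a rough sketch of the alarm(2)-plus-handler approach the older libc took. It is not the libc5 source. alarm_sleep() and alarm_wakeup() are hypothetical names, sigsuspend() stands in for however libc5 actually waited, and any alarm the caller had already scheduled is simply overwritten.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: it only has to exist so that SIGALRM ends
   sigsuspend() instead of terminating the process. */
static void alarm_wakeup(int sig) { (void)sig; }

/* Sketch of an alarm()-based sleep (not the real libc5 code).
   Returns the seconds left if another caught signal woke us early.
   Note: overwrites any alarm the caller had already set. */
unsigned int alarm_sleep(unsigned int seconds)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    if (seconds == 0)
        return 0;

    /* Block SIGALRM so it cannot fire between alarm() and sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = alarm_wakeup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    alarm(seconds);

    /* Wait with SIGALRM unblocked. A SIGCONT with no handler runs no
       handler, so sigsuspend() keeps waiting after "fg". */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    sigsuspend(&wait_mask);

    unsigned int left = alarm(0);   /* cancel; returns seconds remaining */

    sigaction(SIGALRM, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return left;
}
```

Because the alarm timer keeps counting while the process is stopped, a sleep built this way ends once the original wall-clock interval has passed, whether or not it was suspended with ^Z in between.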
nanosleep (or sys_nanosleep) in kernel/sched.c returns -EINTR when
interrupted:

    expire = timespectojiffies(&t) + (t.tv_sec || t.tv_nsec) + jiffies;
    current->timeout = expire;
    current->state = TASK_INTERRUPTIBLE;
    schedule();

    if (expire > jiffies) {
        if (rmtp) {
            jiffiestotimespec(expire - jiffies -
                              (expire > jiffies + 1), &t);
            memcpy_tofs(rmtp, &t, sizeof(struct timespec));
        }
        return -EINTR;
    }
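The -EINTR path can be observed from user space with a small, hedged sketch: install a handler for SIGALRM (a *caught* signal, unlike a bare SIGCONT), ask nanosleep() for a few seconds, and let alarm() cut it short. interrupted_sleep() and on_alarm() are illustrative names only. The call should fail with EINTR and leave roughly the unslept time in rem, which corresponds to the rmtp copy in sys_nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }

/* Hypothetical demo: sleep `secs` seconds, let SIGALRM interrupt after
   `alarm_after` seconds, and report nanosleep()'s result and errno. */
int interrupted_sleep(time_t secs, unsigned int alarm_after,
                      struct timespec *rem, int *err)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;       /* caught, so a handler really runs */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct timespec req = { secs, 0 };
    alarm(alarm_after);
    int rc = nanosleep(&req, rem);  /* expected: -1 with errno == EINTR */
    *err = errno;
    return rc;
}
```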
However, from the logic in arch/i386/kernel/signal.c, I can see that a
system call that wants to be restarted after a benign signal such as
SIGCONT, SIGWINCH or SIGCHLD should return -ERESTARTNOHAND, as the
sys_pause() implementation earlier in the same file, kernel/sched.c, does:
    asmlinkage int sys_pause(void)
    {
        current->state = TASK_INTERRUPTIBLE;
        schedule();
        return -ERESTARTNOHAND;
    }
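The restart rule described above can be modelled roughly as follows. This is a simplified, hypothetical C model of the i386 signal-return logic, not kernel code: restart_after_signal() and the K_-prefixed constants are made up for illustration (the numeric values mirror the kernel's internal error codes), and details such as checking that the trap really came from a system call are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal return codes; K_ prefix avoids clashing with errno.h. */
#define K_EINTR           4
#define K_ERESTARTSYS     512
#define K_ERESTARTNOINTR  513
#define K_ERESTARTNOHAND  514

/* Returns true if the interrupted system call would be re-executed.
   Otherwise stores the (negative) value the caller sees in *user_ret. */
bool restart_after_signal(int ret, bool handler_runs, bool sa_restart,
                          int *user_ret)
{
    bool restartable = ret == -K_ERESTARTSYS ||
                       ret == -K_ERESTARTNOINTR ||
                       ret == -K_ERESTARTNOHAND;

    if (!restartable) {            /* e.g. nanosleep's plain -EINTR */
        *user_ret = ret;
        return false;
    }
    if (!handler_runs)             /* ignored or default action, e.g. SIGCONT */
        return true;
    if (ret == -K_ERESTARTNOINTR || (ret == -K_ERESTARTSYS && sa_restart))
        return true;
    *user_ret = -K_EINTR;          /* a handler ran: report the interruption */
    return false;
}
```

In this model a plain -EINTR always reaches the caller, even when no handler ran (the SIGCONT case), while -ERESTARTNOHAND is silently restarted in that case and only becomes -EINTR when a handler actually runs.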
I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
lo and behold, everything works properly now.
Could some of you, system gurus, take a look at it?
Thanks,
Igor
--
Igor Shpigelman "Homo sapiens are always capable of thinking,
Yet Another UNIX Hacker but not always able to think" A.B.Strugatski
From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16
In article <72q4hp$59d$1@nnrp1.dejanews.com>,
<karmadon@my-dejanews.com> wrote:
>
>I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
>lo and behold, everything works properly now.
>
>Could some of you, system gurus, take a look at it?
Your change makes the thing restart. Which is a good thing in general.
HOWEVER, it doesn't address the fact that the "nanosleep()" system call
by design simply is not restartable. With your change, it will now
restart with the same timeout it had originally, unless you have made
sure to alias the original and the result timers.
As such, suddenly "nanosleep()" isn't nanosleep() any more, but "sleep
for arbitrarily long if certain signals happen". Which may be the right
behaviour for your application, but not in general.
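The objection can be checked with a little arithmetic. The sketch below is a hypothetical model, not kernel code: time is counted in abstract ticks, signal times must be sorted in ascending order, each signal restarts the sleep immediately, and time spent stopped is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a sleep of `request` ticks starts at t = 0 and is
   hit by signals at the ascending times in signals[]. Returns the tick
   at which the sleep finally ends. */
long finish_time(long request, const long *signals, int n,
                 bool restart_with_original)
{
    long deadline = request;

    for (int i = 0; i < n; i++) {
        if (signals[i] >= deadline)
            break;                            /* sleep already over */
        if (restart_with_original)
            deadline = signals[i] + request;  /* full timeout again */
        /* Restarting with the remaining time leaves the deadline alone. */
    }
    return deadline;
}
```

With signals at ticks 5, 12 and 19, a 10-tick request ends at tick 29 when restarted with the original timeout, but at tick 10 when restarted with the remaining time; a steady stream of signals would push the first variant out indefinitely.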
The right thing to do is to _not_ use "nanosleep()" for sleeping, and
that requires a glibc change.
Linus
From: Erik Westlin (westlin@msi.se)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/18
karmadon@my-dejanews.com wrote:
>
> Hi,
>
> On 2.0.x kernel with glibc.2.x try the following:
>
> $ sleep 100
> ^Z
> $ fg
>
> sleep will return immediately, instead of sleeping for the remaining interval.
> This did not happen with older libc.5.x
>
> The bug lies in the implementation of sleep(2) in glibc, which does it through
> nanosleep(2) call. (older libc used alarm(2) and a signal handler).
>
I suspect this is causing trouble in the dce-rpc 1.1 package.
Maybe it would be possible to replace the sleep(2) in glibc2 with the
one in libc5? If so maybe you could post it if you have the code
at hand.
------------------------------------------------------------------------
Erik Westlin Manne Siegbahn Laboratory
email: westlin@msi.se
From: Matthew Hannigan (mlh@zipper.zip.com.au)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19
In article <365286AF.4306@msi.se>, Erik Westlin <westlin@msi.se> wrote:
>I suspect this is causing trouble in the dce-rpc 1.1 package.
>Maybe it would be possible to replace the sleep(2) in glibc2 with the
>one in libc5? If so maybe you could post it if you have the code
>at hand.
>
H.J.Lu just sent some nanosleep code for this on the linux-kernel
mailing list.
--
-Matt Hannigan
From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19
In article <730611$rj4$1@the-fly.zip.com.au>,
Matthew Hannigan <mlh@zipper.zip.com.au> wrote:
>
>H.J.Lu just sent some nanosleep code for this on the linux-kernel
>mailing list.
I don't think that is enough. What hjl did was a special case for
SIGCHLD only. It doesn't fix the fact that nanosleep() simply _cannot_
be restarted. You can hide some of the problems (SIGCHLD) by letting
zombies stay around, but there is no way you can hide the basic
restartability issue.
I think the only fix is to go back to the libc-5 implementation.
Alternatively, the interface to "nanosleep()" inside the kernel can be
completely revamped so that there is only one buffer that holds both the
incoming and the outgoing values, so that nanosleep can be restarted
with the proper timeout.
For example, doing something like this is likely to be ok:
    nanosleep(&timespec, &timespec);
where the _modified_ timespec is required to be the _same_ as the
incoming timespec, and then you change "sys_nanosleep()" to do what the
original email in this thread suggested, ie make it return
-ERESTARTNOHAND. Then it correctly handles the case of being restarted.
But then it's not the POSIX nanosleep any more.
Hjl, please go back to using the old "sleep()", because the current
glibc one is buggy (even with your SIGCHLD changes). Users who use
"nanosleep()" directly are supposed to restart by hand - but "sleep()"
cannot do this correctly with nanosleep().
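Restarting by hand, for a program that calls nanosleep() directly, might look like the sketch below. sleep_full() is a hypothetical name. It passes the same struct as both arguments, echoing the single-buffer call shape suggested above, so each restart resumes with whatever time was reported as remaining.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval in *ts, restarting after every EINTR.
   req and rem share one buffer, so *ts always holds the time left. */
int sleep_full(struct timespec *ts)
{
    while (nanosleep(ts, ts) == -1) {
        if (errno != EINTR)
            return -1;      /* e.g. EINVAL for a bad tv_nsec */
    }
    return 0;
}
```

Note that this loop restarts even when a signal handler ran, whereas POSIX sleep() is expected to return early in that case. Since EINTR alone does not say whether a handler ran, this is one reason such a loop is not a drop-in replacement for sleep().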
Linus
---
[ hjl, in case you didn't follow it, the problem is a program that does
sleep(10)
and is suspended with ^Z only to be re-woken with "fg" - at which
point it returns immediately. Which is wrong. ]