From: karmadon@my-dejanews.com (karmadon@my-dejanews.com)
Subject: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16


Hi,

On 2.0.x kernel with glibc.2.x try the following:

$ sleep 100
^Z
$ fg

sleep will return immediately, instead of sleeping for the remaining interval.
This did not happen with older libc.5.x

The bug lies in the implementation of sleep(2) in glibc, which does it through
nanosleep(2) call. (older libc used alarm(2) and a signal handler).
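Here is a minimal sketch of the failure mode: a sleep() built on a single nanosleep() call, which is the structure the post attributes to glibc 2.x. The name `oneshot_sleep` is hypothetical, and this is not glibc's actual source. Rounding the leftover up to whole seconds is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical one-shot sleep(): a single nanosleep() call, with no
 * retry.  Returns the number of seconds left unslept. */
unsigned int oneshot_sleep(unsigned int seconds)
{
    struct timespec req = { (time_t)seconds, 0 };
    struct timespec rem = { 0, 0 };

    if (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        /* Interrupted: give up and report what was left (rounded up).
         * On the 2.0 kernel a SIGCONT after ^Z / fg lands here. */
        return (unsigned int)rem.tv_sec + (rem.tv_nsec > 0);
    }
    return 0;
}
```

On the kernel described here, stopping the process and continuing it makes nanosleep() fail with EINTR. The function then returns the leftover time instead of sleeping it, so "sleep 100" exits as soon as it is resumed.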

nanosleep (or sys_nanosleep) in kernel/sched.c returns -EINTR when
interrupted:

    expire = timespectojiffies(&t) + (t.tv_sec || t.tv_nsec) + jiffies;
    current->timeout = expire;
    current->state = TASK_INTERRUPTIBLE;
    schedule();

    if (expire > jiffies) {
        if (rmtp) {
            jiffiestotimespec(expire - jiffies -
                              (expire > jiffies + 1), &t);
            memcpy_tofs(rmtp, &t, sizeof(struct timespec));
        }
        return -EINTR;
    }

However, from the logic in arch/i386/kernel/signal.c, I can see that a
system call that wants to be restarted after a benign signal such as
SIGCONT, SIGWINCH or SIGCHLD should return -ERESTARTNOHAND, as the
sys_pause() implementation a few hundred lines earlier in kernel/sched.c
does:



    asmlinkage int sys_pause(void)
    {
            current->state = TASK_INTERRUPTIBLE;
            schedule();
            return -ERESTARTNOHAND;
    }
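The restart logic mentioned above works like a small decision table. The function below is a simplified model of how the i386 signal code treats a system call's negative return value when a signal arrives. It is not the kernel source, and `syscall_restarts` is a hypothetical name. The three codes are kernel-internal and never reach user space. The numeric values match the kernel's include/linux/errno.h.

```c
#include <errno.h>   /* EINTR */

/* Kernel-internal restart codes (never returned to user space). */
#define ERESTARTSYS     512
#define ERESTARTNOINTR  513
#define ERESTARTNOHAND  514

/* Simplified model: given a system call's return value, whether a user
 * handler runs for the pending signal, and whether that handler was
 * installed with SA_RESTART, return 1 if the call is re-issued and 0 if
 * the value (mapped to EINTR for restart codes) goes back to the caller. */
int syscall_restarts(int ret, int has_handler, int sa_restart)
{
    switch (-ret) {
    case ERESTARTNOHAND:
        /* Restart only when no handler runs, e.g. SIGCONT at default. */
        return !has_handler;
    case ERESTARTSYS:
        /* Restart unless a handler runs without SA_RESTART. */
        return !has_handler || sa_restart;
    case ERESTARTNOINTR:
        /* Always restart. */
        return 1;
    default:
        /* -EINTR and ordinary errors pass straight through. */
        return 0;
    }
}
```

In this model, sys_pause()'s -ERESTARTNOHAND is restarted after a default-action SIGCONT. The plain -EINTR from sys_nanosleep() always reaches the caller.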

I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
lo and behold, everything works properly now.

Could some of you, system gurus, take a look at it?

Thanks,
Igor



--
Igor Shpigelman         "Homo sapiens are always capable of thinking,
Yet Another UNIX Hacker  but not always able to think" A.B.Strugatski




From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16

In article <72q4hp$59d$1@nnrp1.dejanews.com>,
 <karmadon@my-dejanews.com> wrote:
>
>I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
>lo and behold, everything works properly now.
>
>Could some of you, system gurus, take a look at it?

Your change makes the thing restart. Which is a good thing in general.

HOWEVER, it doesn't address the fact that the "nanosleep()" system call
by design simply is not restartable. With your change, it will now
restart with the same timeout it had originally, unless you have made
sure to alias the original and the result timers.

As such, suddenly "nanosleep()" isn't nanosleep() any more, but "sleep
for arbitrarily long if certain signals happen". Which may be the right
behaviour for your application, but not in general.

The right thing to do is to _not_ use "nanosleep()" for sleeping, and
that requires a glibc change.

		Linus
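Linus's objection can be checked with simple arithmetic. The sketch below is a hypothetical model, not kernel code. A sleep of `request` ticks is interrupted after each of the `runs[]` intervals. It is then restarted either with the original timeout, which is what Linus says the patched kernel does, or with the time actually left. Each interval is assumed to be shorter than the time outstanding when it began.

```c
/* Hypothetical model (not kernel code): a sleep of `request` ticks is
 * interrupted n times, after runs[0], runs[1], ... ticks of actual
 * sleeping, and then completes.  Returns total ticks spent asleep. */
long total_sleep(long request, const long *runs, int n,
                 int restart_with_original)
{
    long total = 0;       /* ticks actually slept so far */
    long left = request;  /* timeout the current attempt is using */

    for (int i = 0; i < n; i++) {
        total += runs[i];
        /* On restart, either reuse the original timeout or resume
         * with whatever was left of the interrupted attempt. */
        left = restart_with_original ? request : left - runs[i];
    }
    return total + left;  /* the final attempt runs to completion */
}
```

Take a 10-tick request interrupted after 3 ticks and then after 4. Restarting with the original value sleeps 3 + 4 + 10 = 17 ticks, while restarting with the remainder sleeps the requested 10. Every further signal adds to the first figure, which is the "sleep for arbitrarily long" behaviour described above.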


From: Erik Westlin (westlin@msi.se)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/18

karmadon@my-dejanews.com wrote:
>
> Hi,
>
> On 2.0.x kernel with glibc.2.x try the following:
>
> $ sleep 100
> ^Z
> $ fg
>
> sleep will return immediately, instead of sleeping for the remaining interval.
> This did not happen with older libc.5.x
>
> The bug lies in the implementation of sleep(2) in glibc, which does it through
> nanosleep(2) call. (older libc used alarm(2) and a signal handler).
>

I suspect this is causing trouble in the dce-rpc 1.1 package.
Maybe it would be possible to replace the sleep(2) in glibc2 with the
one in libc5? If so maybe you could post it if you have the code
at hand.

------------------------------------------------------------------------
Erik Westlin
Manne Siegbahn Laboratory
email: westlin@msi.se
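The libc5 code is not reproduced in this thread. As a rough illustration of the alarm(2)-plus-handler approach the original post attributes to libc5, here is a hypothetical `alarm_sleep()`. It is not the libc5 source. It blocks SIGALRM before arming the timer and then waits in sigsuspend(), so the alarm cannot fire before the wait begins. On Linux, a stop/continue cycle with no SIGCONT handler does not end sigsuspend(). The timer also keeps running in real time, so after `fg` the call sleeps only what remains. For brevity it ignores any alarm the caller had already armed, along with a few rare corner cases. One example is an alarm that expires just as another handled signal wakes the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Empty handler: its only job is to make sigsuspend() return. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical alarm(2)-based sleep in the libc5 style (not libc5's
 * source).  Returns the number of seconds left unslept. */
unsigned int alarm_sleep(unsigned int seconds)
{
    struct sigaction sa, old_sa;
    sigset_t alrm, old_mask, wait_mask;
    unsigned int left;

    if (seconds == 0)
        return 0;

    /* Block SIGALRM first so it cannot fire before we start waiting. */
    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    sigprocmask(SIG_BLOCK, &alrm, &old_mask);

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    alarm(seconds);

    /* Wait with SIGALRM unblocked.  Only a signal that runs a handler
     * (or terminates the process) ends the wait. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    sigsuspend(&wait_mask);

    left = alarm(0);   /* cancel the timer and read what is left */
    sigaction(SIGALRM, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return left;
}
```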


From: Matthew Hannigan (mlh@zipper.zip.com.au)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19

In article <365286AF.4306@msi.se>, Erik Westlin <westlin@msi.se> wrote:
>karmadon@my-dejanews.com wrote:
>>
>> Hi,
>>
>> On 2.0.x kernel with glibc.2.x try the following:
>>
>> $ sleep 100
>> ^Z
>> $ fg
>>
>> sleep will return immediately, instead of sleeping for the remaining interval.
>> This did not happen with older libc.5.x
>>
>> The bug lies in the implementation of sleep(2) in glibc, which does it through
>> nanosleep(2) call. (older libc used alarm(2) and a signal handler).
>>
>
>I suspect this is causing trouble in the dce-rpc 1.1 package.
>Maybe it would be possible to replace the sleep(2) in glibc2 with the
>one in libc5? If so maybe you could post it if you have the code
>at hand.
>

H.J.Lu just sent some nanosleep code for this on the linux-kernel
mailing list.

--
-Matt Hannigan


From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19

In article <730611$rj4$1@the-fly.zip.com.au>,
Matthew Hannigan <mlh@zipper.zip.com.au> wrote:
>
>H.J.Lu just sent some nanosleep code for this on the linux-kernel
>mailing list.

I don't think that is enough.

What hjl did was a special case for SIGCHLD only. It doesn't fix the
fact that nanosleep() simply _cannot_ be restarted. You can hide some
of the problems (SIGCHLD) by letting zombies stay around, but there is
no way you can hide the basic restartability issue.

I think the only fix is to go back to the libc-5 implementation.

Alternatively, the interface to "nanosleep()" inside the kernel can be
completely revamped so that there is only one buffer that holds both
the incoming and the outgoing values, so that nanosleep can be
restarted with the proper timeout. For example, doing something like
this is likely to be ok:

    nanosleep(&timespec, &timespec);

where the _modified_ timespec is required to be the _same_ as the
incoming timespec, and then you change "sys_nanosleep()" to do what
the original email in this thread suggested, ie make it return
-ERESTARTNOHAND. Then it correctly handles the case of being
restarted.

But then it's not the POSIX nanosleep any more.

Hjl, please go back to using the old "sleep()", because the current
glibc one is buggy (even with your SIGCHLD changes). Users who use
"nanosleep()" directly are supposed to restart by hand - but "sleep()"
cannot do this correctly with nanosleep().

		Linus

---

[ hjl, in case you didn't follow it, the problem is a program that does
  sleep(10) and is suspended with ^Z only to be re-woken with "fg" - at
  which point it returns immediately. Which is wrong. ]
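This is what "restart by hand" looks like from user space when combined with the single-buffer call shape Linus describes. It is a minimal sketch with a hypothetical name, `sleep_full`. It passes the same timespec as both request and remainder, so each retry resumes with the unslept time rather than the original value. The loop also retries after signals that a handler caught. That makes it suitable for a caller that wants the full interval, but not a correct sleep(), which must return early in that case. This is one way to read the point that sleep() cannot do this correctly with nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval, restarting by hand after signals.
 * Returns 0 on success, -1 (errno set) on a non-EINTR failure. */
int sleep_full(time_t sec, long nsec)
{
    struct timespec ts = { sec, nsec };

    /* Same buffer for request and remainder: after EINTR, ts holds
     * the unslept time, so the next call continues from there. */
    while (nanosleep(&ts, &ts) == -1) {
        if (errno != EINTR)
            return -1;   /* e.g. EINVAL for tv_nsec >= 1000000000 */
    }
    return 0;
}
```

For example, `sleep_full(0, 1000000L)` sleeps about one millisecond and returns 0, while an out-of-range `tv_nsec` is rejected with EINVAL and returns -1.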