Then I noticed that kernel V2.0.31 included some reworked tty code by Bill
Hawes, so I installed the latest kernel (V2.0.32) to make sure I was running
up-to-date code.
This partly fixed the problem. Now, if I send more than 256 characters to a
login process on a telnetd session, it no longer locks up. However, I still
see a locked-up login process every now and then.
Here's some information I collected:
KERNEL: Linux Kernel V2.0.32
RPMS:   /bin/login from package util-linux-2.5-11fix
        /usr/sbin/in.telnetd from package NetKit-B-0.06-12
        libc from package libc-5.3.12-17
        libtermcap from package libtermcap-2.0.8-4
14383 pb S < 0:00 login -h srv004.seagha.com -p
('ps l' indicates it's waiting in write_chan)
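For anyone who wants to check a stuck process without going through 'ps l', here is a minimal sketch in Python. It assumes a current Linux kernel with /proc mounted, where /proc/<pid>/stat carries the process state and /proc/<pid>/wchan carries the wait channel name. The function names parse_stat_line and describe are my own, made up for illustration, and this is not how ps itself is implemented.

```python
def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def describe(pid):
    """Return (pid, comm, state, wchan) for a live process (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        pid, comm, state = parse_stat_line(f.read())
    try:
        with open(f"/proc/{pid}/wchan") as f:
            # Assumption: the file may be empty or "0" when the process is
            # not sleeping or when we lack permission to see the symbol.
            wchan = f.read().strip() or "-"
    except OSError:
        wchan = "?"
    return pid, comm, state, wchan
```

A state of S together with a tty-related wait channel would correspond to what 'ps l' reported for the login process above.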
----------------------------------
(gdb) bt
#0 0x40020d78 in __write ()
#1 0x40030f5b in _IO_file_write ()
Cannot access memory at address 0x21.
(gdb)
----------------------------------
(gdb) disas
Dump of assembler code for function __write:
0x40020d64 <__write>: pushl %ebp
0x40020d65 <__write+1>: movl %esp,%ebp
0x40020d67 <__write+3>: pushl %ebx
0x40020d68 <__write+4>: movl $0x4,%eax
0x40020d6d <__write+9>: movl 0x8(%ebp),%ebx
0x40020d70 <__write+12>: movl 0xc(%ebp),%ecx
0x40020d73 <__write+15>: movl 0x10(%ebp),%edx
0x40020d76 <__write+18>: int $0x80
0x40020d78 <__write+20>: movl %eax,%edx
0x40020d7a <__write+22>: testl %edx,%edx
0x40020d7c <__write+24>: jnl 0x40020d9a <__write+54>
0x40020d7e <__write+26>: negl %edx
0x40020d80 <__write+28>: pushl %edx
0x40020d81 <__write+29>: call 0x40020d86 <__write+34>
0x40020d86 <__write+34>: popl %ebx
0x40020d87 <__write+35>: addl $0x7ec12,%ebx
0x40020d8d <__write+41>: call 0x4001e3e8 <_init+10344>
0x40020d92 <__write+46>: popl %edx
0x40020d93 <__write+47>: movl %edx,(%eax)
0x40020d95 <__write+49>: movl $0xffffffff,%eax
0x40020d9a <__write+54>: popl %ebx
0x40020d9b <__write+55>: movl %ebp,%esp
0x40020d9d <__write+57>: popl %ebp
0x40020d9e <__write+58>: ret
0x40020d9f <__write+59>: nop
End of assembler dump.
------------
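A note on reading the dump: frame #0 is at 0x40020d78, the instruction right after 'int $0x80', so the process has entered the write system call (number 4 in %eax) and the kernel has not returned yet. The rest of the function only handles the return value. The following Python model of that tail end is a sketch, not libc source. The name finish_syscall is made up, and I am assuming the call at __write+41 returns the address of errno.

```python
import errno


def finish_syscall(raw):
    """Model what __write does with the value the kernel leaves in %eax.

    Returns (return_value, errno_value); errno_value is None on success.
    """
    if raw >= 0:
        # testl %edx,%edx / jnl: a non-negative result is returned unchanged.
        return raw, None
    # negl %edx, then movl %edx,(%eax): the negated result is stored at the
    # address returned by the call at __write+41 (assumed to be errno).
    # movl $0xffffffff,%eax: the caller sees -1.
    return -1, -raw


if __name__ == "__main__":
    for raw in (12, -errno.EINTR):
        print(raw, "->", finish_syscall(raw))
```

Since the stuck login never gets past __write+20, none of this error handling runs; the process is simply asleep inside the kernel's write path.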
On a kernel < V2.0.31 you can easily simulate this by telnetting into your box,
entering more than 256 characters at the login prompt, and then aborting the
connection using ^]. If you now run 'ps' you will see that the login process is
still there, waiting. Use 'cat </dev/ttypXX' to grab the remaining data;
this lets the login process continue and exit.
On a kernel >= V2.0.31 this procedure no longer triggers the hang, and I
haven't been able to reproduce the problem manually, which makes it more
difficult to track down.
(Note: all of these connections run over telnetd sessions, even the modem
logins; the Linux box has no modems directly connected to it.)
Karl.
// Electronic Mail - SMTP: kvo@seagha.com
\X/ - X400: c=BE; a=RTT; p=SEAGHA; s=VOGEL; g=KARL