Re: Crash in vm86() on SMP boxes with vesa driver?

Kendall Bennett (KendallB@scitechsoft.com)
Tue, 29 Apr 2003 12:41:12 -0700


"Petr Vandrovec" <VANDROVE@vc.cvut.cz> wrote:

> > 8.0 box with the latest 2.4.20 kernel on it (but the problem happened
> > with the stock kernel and kernels lower then .20 as well). Unfortunately
> > I don't have access to the box (it is in Australia), but I have access to
> > the bug report information (and will try to configure a box soon to
> > reproduce it here). Anyway the folowing is the error log produced by
> > XFree86 when the crash occurs:
>
> We told you before that you cannot trust VESA BIOS.

No, I do not agree with that statement at all. At present I would say
that there is a problem with the vm86() services, or perhaps something
wrong with the way we (and XFree86) are setting up the vm86 state for the
BIOS. The reason I say that is because we use the BIOS all the time using
vm86() services on OS/2 and we have not had any of these problems.

Essentially what I am saying is that this problem is fixable somewhere
(either in the kernel or in our/XFree86's vm86() code).

> > (II) VESA(0): initializing int10
> > (WW) VESA(0): Bad V_BIOS checksum
> > (II) VESA(0): Primary V_BIOS segment is: 0xc000
>
> Bad checksum? Sorry, your BIOS is not usable. Either XFree gets
> checksum wrong, or there is something I would not want in my
> computer there...

I wish we stll had access to the machine so I could debug this. It is
plausible that the BIOS has a bad checksum, but if it did, the system
BIOS would have failed to POST the card. Hence I think there is something
else going on here.

> > Also from debugging our own code we have a bit more information about
> > where the problem occurs, and it occurs on the return from the vm86()
> > system call when the code tries to pop the EBX register from the stack.
> > Which kind of indicates that the kernel screwed up the return stack of
> > the program for some reason:
>
> No. Crash happened inside VM, and it was shown as happening on
> return from int $0x80. But real problem is that in the VM you are
> executing code at 0xC000:0x800F. But there is no code there, it is
> garbage (bound bx,[bx+si]; xchg cx,ax; pusha; or dx,di ???) which
> generated bounds check interrupt.

Ok, that makes sense. From experience with OS/2 and virtual machines,
this generally happens when the video BIOS is confused by the state of
the hardware, especially of I/O port access to certain registers has been
incorrectly virtualised. ATI cards have been notorious for us on OS/2 for
these types of problems, but the problem is solveable (most of the OS/2
related problems are all specific to running in a window, where access to
the hardware registers has to be restricted and correctly emulated).

> > Any ideas? I am not sure how to start debuging this (assuming I can get
> > my SMP machine up and running and reproduce it) in the kernel. Also the
> > machine that the problem occurs on goes to the customer tomorrow, so we
> > won't be able to debug this much ourselves until I can get a new machine
> > to reproduce it. But, it would seem to me that others may well have seen
> > this problem already?
>
> Make sure that videocard properly reports that it uses more than
> 32kB BIOS. Maybe card reports only 32kB, while it uses 48kB. System
> is free to do anything it wants with 32-48kB range including
> mapping another BIOS there, or writting zeroes, or garbage there...
> Also make sure that you have properly setup VM, that 0xC8000 is
> mapped to physical address 0xC8000...

I will check into this. I am running into some strange problems on an
NVIDIA GeForce4 integrated system right now, yet that same BIOS works
perfectly in DOS and OS/2 so there is something up with the way the
vm86() services are being handled. I will try to solve the problem I am
seeing on this NVIDIA machine, and perhaps that will lead to a solution
for both problems (assuming they are actually related of course ;-).

Regards,

---
Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/