Re: Bug with shared memory.

Andrew Morton (akpm@zip.com.au)
Tue, 14 May 2002 12:33:23 -0700


Martin Schwidefsky wrote:
>
> Hi,
> we managed to hang the kernel with a db/2 stress test on s/390. The test
> was done on 2.4.7, but the problem is present on all recent 2.4.x and 2.5.x
> kernels (all architectures). In short, schedule() is called while holding
> the shm_lock of a shared memory segment. The system call that caused
> this was sys_ipc with IPC_RMID, and from there the call chain is
> as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
> truncate_list_pages, schedule. The scheduler picked a process that called
> sys_shmat. That process tries to get the lock and hangs.

There's no way the kernel can successfully hold a spinlock
across that call chain.

> One way to fix this is to remove the schedule call from truncate_list_pages:
>
> --- linux-2.5/mm/filemap.c~ Tue May 14 17:04:14 2002
> +++ linux-2.5/mm/filemap.c Tue May 14 17:04:33 2002
> @@ -237,11 +237,6 @@
>
>  		page_cache_release(page);
>
> -		if (need_resched()) {
> -			__set_current_state(TASK_RUNNING);
> -			schedule();
> -		}
> -
>  		write_lock(&mapping->page_lock);
>  		goto restart;
>  	}
>
> Another way is to release the lock before calling fput in shm_destroy, but
> the comment says that this function has to be called with shp and
> shm_ids.sem locked. Comments?

Maybe ipc_ids.ary should become a semaphore?
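
For illustration only, here is a rough userspace sketch of the second option
from the quoted message: unpublish the object while holding the spinlock, drop
the lock, and only then do the work that may sleep. Everything in it is
hypothetical (struct segment, table, table_spin_lock(), release_backing(),
segment_create() and segment_destroy() are made-up names, and a C11
atomic_flag stands in for the kernel spinlock). It is not the ipc/shm.c code
and says nothing about whether shm_destroy can safely be restructured this
way, given the locking requirement stated in its comment.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 8

/* Hypothetical stand-in for a shared memory segment descriptor. */
struct segment {
	void *backing;		/* plays the role of the segment's file */
};

/* Hypothetical busy-waiting lock protecting the table. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static struct segment *table[TABLE_SIZE];

static void table_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin: a holder that sleeps stalls every waiter */
}

static void table_spin_unlock(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Stands in for work that may block; call it without table_lock held. */
static void release_backing(void *backing)
{
	free(backing);
}

/* Returns 0 on success, -1 on a bad id, an occupied slot or no memory. */
int segment_create(int id, size_t size)
{
	struct segment *seg;
	int ret = -1;

	if (id < 0 || id >= TABLE_SIZE)
		return -1;

	/* Allocate before taking the lock; allocation may block too. */
	seg = malloc(sizeof(*seg));
	if (seg == NULL)
		return -1;
	seg->backing = malloc(size);
	if (seg->backing == NULL) {
		free(seg);
		return -1;
	}

	table_spin_lock();
	if (table[id] == NULL) {
		table[id] = seg;
		ret = 0;
	}
	table_spin_unlock();

	if (ret != 0) {
		release_backing(seg->backing);
		free(seg);
	}
	return ret;
}

/* Returns 0 on success, -1 if the id is invalid or not in use. */
int segment_destroy(int id)
{
	struct segment *seg;

	if (id < 0 || id >= TABLE_SIZE)
		return -1;

	table_spin_lock();
	seg = table[id];
	table[id] = NULL;	/* unpublish: later lookups cannot find it */
	table_spin_unlock();

	if (seg == NULL)
		return -1;

	release_backing(seg->backing);	/* lock is already dropped here */
	free(seg);
	return 0;
}
```

Calling segment_create(3, 64) and then segment_destroy(3) walks through the
pattern; a second segment_destroy(3) returns -1 because the slot was cleared
under the lock. The sketch ignores reference counts and any outer semaphore,
which is exactly where the real code gets harder.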
