Re: Syscall numbers for BProc

hendriks@lanl.gov
Fri, 4 Apr 2003 17:44:28 -0700


On Fri, Apr 04, 2003 at 08:35:31PM +0100, Christoph Hellwig wrote:
> On Fri, Apr 04, 2003 at 12:32:18PM -0700, Erik Hendriks wrote:
> > Is it possible to get a Linux system call number allocated for BProc?
> > (http://sourceforge.net/projects/bproc) I've been using arbitrary
> > system call numbers for a while but there have been collisions with
> > new kernel features. I'd like to avoid that in the future. BProc
> > currently works on 2.4.x kernels on x86, alpha and ppc (32bit).
>
> Please explain why you need syscalls and the exact APIs.

Here's the answer to the second half... The kernel API is reasonably
large so let me know if anybody wants more detail anywhere. The C
library is basically a 1:1 mapping for these calls.

Also, if anybody wants to know more about BProc in general, they can
check out: http://public.lanl.gov/cluster/papers/papers/hendriks-ics02.pdf

API for the syscall:

arg0 is a function number and meaning of the rest of the arguments
depends on that.

For arg0:

0x0001 - BPROC_SYS_VERSION - get BProc version
arg1 is a pointer to a bproc_version_t which gets filled in.

return value: 0 on success, -errno on error

0x0002 - BPROC_SYS_DEBUG - undocumented debugging call a magic
debugging hook who's argument meanings are fluid, currently it does:
arg1 = 0 - return number of children by checking pptr and opptr.
arg1 = 1 - return true/false indicating whether wait() will be local.
arg1 = 2 - return value of nlchild (internal BProc process book keeping val)
arg1 = 3 - perform process ID sanity check and return information about
pptr and opptr in linux and BProc.

0x0003 - BPROC_SYS_MASTER - get master daemon file descriptor. The master
daemon reads/writes messages to/from kernel
space with this file descriptor.
no arguments

return value: a new file descriptor or -errno on failure.

0x0004 - BPROC_SYS_SLAVE - get slave daemon file descriptor. The slave
daemon reads/writes messages to/from kernel
space with this file descriptor.
no arguments

return value: a new file descriptor or -errno on failure.

0x0005 - BPROC_SYS_IOD - get I/O daemon file descriptor. The I/O daemon
reads/writes messages to/from kernel space with
this file descriptor.
no arguments

return value is a new file descriptor or -errno on failure.

0x0006 - BPROC_SYS_NOTIFY - get notifier file descriptor. An app can get a
file descriptor which becomes ready for reading
whenever a BProc machine state change happens.
no arguments

return value: a new file descriptor or -errno on failure.

0x0201 - BPROC_SYS_INFO - get information on node status
arg1 pointer to an array of bproc_node_info_t's (first element contains
index of last node seen, nodes returned will start with next node)
arg2 number of elements in array

return value: number of elements returned on success, -errno on error

0x0202 - BPROC_SYS_STATUS - set node status
arg1 node number
arg2 new node state

return value: 0 on success, -errno on error

0x0203 - BPROC_SYS_CHOWN - change node permission bits (perms are file-like)
arg1 node number
arg2 new node perm

return value: 0 on success, -errno on error

0x0207 - BPROC_SYS_CHROOT - ask slave node to chroot()
arg1 node number
arg2 pointer to patch to chroot() to.

return value: 0 on success, -errno on error

0x0208 - BPROC_SYS_REBOOT - ask slave node to reboot
arg1 node number

return value: 0 on success, -errno on error

0x0209 - BPROC_SYS_HALT - ask slave node to halt
arg1 node number

return value: 0 on success, -errno on error

0x020A - BPROC_SYS_PWROFF - ask slave node to power off
arg1 node number

return value: 0 on success, -errno on error

0x020B - BPROC_SYS_PINFO - get information about location of remote processes
arg1 pointer to an array of bproc_proc_info_t's (first element contains
index of last proc seen, procs returned will start with next node)
arg2 number of elements in array

return value: number of elements returned on success, -errno on error

0x020E - BPROC_SYS_RECONNECT - ask slave daemon to reconnect
arg1 node number
arg2 pointer to bproc_connect_t which contains 2 sockaddrs - a local
and remote address for the slave to use when re-connecting to
the master.

return value: 0 on success, -errno on error

0x0301 - BPROC_SYS_REXEC - remote exec (replace current process with exec
performed on remote node)
arg1 node number
arg2 pointer to bproc_move_t (contains exec args, io setup info, etc)

return value: no return (it's an exec) on success, -errno on error

0x0302 - BPROC_SYS_MOVE - move caller to remote node
arg1 node number
arg2 pointer to bproc_move_t (contains flags, io setup info, etc)

arg2 move flags (how much of the memory space gets sent)

return value: 0 on success, -errno on error

0x0303 - BPROC_SYS_RFORK - fork a child onto another node. This is a
combination of the fork and move calls with
semantics such that no child process is
ever created (from the parent's point of
view) if the move step fails.
arg1 node number
arg2 pointer to bproc_move_t (contains flags, io setup info, etc)

return value: parent: child pid on success, -errno on error
child: 0

0x0304 - BPROC_SYS_EXECMOVE - exec and then move. This is a
combination of the xec and move
syscalls. This call performs an exec
and then moves the resulting process
image to a remote node before it is
allowed to run. This is used to place
images of programs which are not BProc
aware on remote nodes.
arg1 node number
arg2 pointer to bproc_move_t (contains exec args, io setup info, etc.)

return value: no return on success, -errno on error if error happens
in exec(). If error happens during the move step the process will
exit with errno as its status.

0x0306 - BPROC_SYS_VRFORK - vector rfork - create many child processes
on many remote nodes efficiently.

arg1 pointer to bproc_move_t. This contains the number of children
to create, a list of nodes to move to, an array to store the
resulting child process IDs, and possibly IO setup information.

return value: parent: number of nodes or -errno on error.
pid array contains pids or -errno for each child.
child: rank in list of nodes (0 .. n-1)

0x0307 - BPROC_SYS_EXEC - use master node to perform exec. A process
running on a slave node can ask it's "ghost"
on the front end to perform an exec for it.
The results of that exec will replace the
process on the slave node.
arg1 pointer to bproc_move_t (contains execve args)

return value: no return on success, -errno on failure.

0x0309 - BPROC_SYS_VEXECMOVE - vector execmove - create many child
processes on many remote nodes
efficiently. The child process image
is the result of the supplied execve.

arg1 pointer to bproc_move_t. This contains the number of children
to create, a list of nodes to move to, an array to store the
resulting child process IDs, execve args and possibly IO setup
information.

return value: parent: number of nodes or -errno on failure. The array
children: no return. If BPROC_RANK=XXXXXXX exists in the
environment, vexecmove will replace the Xs with
the child's rank in vexecmove.

0x1000 - at this offset, bproc provides an interface to virtual memory
area dumper (vmadump). VMADump is the process save/restore
mechanism that BProc uses internally.

0x1000 - VMAD_DO_DUMP - dump the calling process's image to a file descriptor
arg1 file descriptor
arg2 dump flags (controls which regions are dumped and which regions
are stored as references to files.)

return value: during dump: number of bytes written to the file descriptor.
during undump: 0 (when the process image is restored, it will
start by returning from this system call)

0x1001 - VMAD_DO_UNDUMP - restore a process image from a file
descriptor. The new image will replace the
calling process just like exec.
arg1 file descriptor

return value: no return on success, -errno on failure.

side note: where possible, vmadump adds a binary format for dump files
which allows a dump stored in a file to be executed directly.

0x1002 VMAD_DO_EXECDUMP - this is a combination of the exec and dump
system calls. An exec is performed and the
resulting process image is dumped to a file
descriptor. The process will exit after the
dump is complete.

arg1 pointer to vmadump_execdump_args (contains execve arguments, a
file descriptor to dump to and dump flags)

return value: no return on success, -errno on failure.

VMADump has a "library list" in kernel space. This is the list of
files it presumes are available at undump time if you ask it not to
dump "libaries".

0x1030 - VMAD_LIB_CLEAR - clear the library list
no arguments

return value: 0 on success, -errno no failure.

0x1031 - VMAD_LIB_ADD - add a new library to the list
arg1 pointer to filename
arg2 length of filename

return value: 0 on success, -errno no failure.

0x1032 - VMAD_LIB_DEL - delete a library from the list
arg1 pointer to filename
arg2 length of filename

return value: 0 on success, -errno no failure.

0x1033 - VMAD_LIB_LIST - get the contents of the library list
arg1 pointer to region to store library list
arg2 size of region

The list of libraries is store as a null delimited list of strings.

return value: number of bytes stored on success, -errno no failure.

0x1034 - VMAD_LIB_SIZE - get the size in bytes of the library list
no arguments

return value: size in bytes of the library list
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/