Re: [DRIVER][RFC] SC1200 Watchdog driver

Christer Weinigel (wingel@acolyte.hack.org)
Tue, 26 Feb 2002 02:37:35 +0100 (CET)


Hi,

this is a patch trying to document the Watchdog API in the kernel.
I'd suggest that it gets into the mainline kernel so that people read
it and (hopefully) also update it.

/Christer

diff -urN linux.orig/Documentation/watchdog-api.txt linux/Documentation/watchdog-api.txt
--- /dev/null Wed Dec 31 16:00:00 1969
+++ linux/Documentation/watchdog-api.txt Tue Feb 26 01:16:58 2002
@@ -0,0 +1,365 @@
+The Linux Watchdog driver API.
+
+Copyright 2002 Christer Weingel <wingel@nano-system.com>
+
+Some parts of this document are copied verbatim from the sbc60xxwdt
+driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>
+
+This document describes the state of the Linux 2.4.18 kernel.
+
+Introduction:
+
+A Watchdog Timer (WDT) is a hardware circuit that can reset the
+computer system in case of a software fault. You probably knew that
+already.
+
+Usually a userspace daemon will notify the kernel watchdog driver via the
+/dev/watchdog special device file that userspace is still alive, at
+regular intervals. When such a notification occurs, the driver will
+usually tell the hardware watchdog that everything is in order, and
+that the watchdog should wait for yet another little while to reset
+the system. If userspace fails (RAM error, kernel bug, whatever), the
+notifications cease to occur, and the hardware watchdog will reset the
+system (causing a reboot) after the timeout occurs.
+
+The Linux watchdog API is a rather AD hoc construction and different
+drivers implement different, and sometimes incompatible, parts of it.
+This file is an attempt to document the existing usage and allow
+future driver writers to use it as a reference.
+
+The simplest API:
+
+All drivers support the basic mode of operation, where the watchdog
+activates as soon as /dev/watchdog is opened and will reboot unless
+the watchdog is pinged within a certain time, this time is called the
+timeout or margin. The simplest way to ping the watchdog is to write
+some data to the device. So a very simple watchdog daemon would look
+like this:
+
+int main(int argc, const char *argv[]) {
+ int fd=open("/dev/watchdog",O_WRONLY);
+ if (fd==-1) {
+ perror("watchdog");
+ exit(1);
+ }
+ while(1) {
+ write(fd, "\0", 1);
+ sleep(10);
+ }
+}
+
+A more advanced driver could for example check that a HTTP server is
+still responding before doing the write call to ping the watchdog.
+
+When the device is closed, the watchdog is disabled. This is not
+always such a good idea, since if there is a bug in the watchdog
+daemon and it crashes the system will not reboot. Because of this,
+some of the drivers support the configuration option "Disable watchdog
+shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
+compiling the kernel, there is no way of disabling the watchdog once
+it has been started. So, if the watchdog dameon crashes, the system
+will reboot after the timeout has passed.
+
+Some other drivers will not disable the watchdog, unless a specific
+magic character 'V' has been sent /dev/watchdog just before closing
+the file. If the userspace daemon closes the file without sending
+this special character, the driver will assume that the daemon (and
+userspace in general) died, and will stop pinging the watchdog without
+disabling it first. This will then cause a reboot.
+
+The ioctl API:
+
+Some drivers also support an ioctl API. I belive this API was first
+used in the Berkshire PC Watchdog driver and has been adopted by the
+other drivers.
+
+Pinging the watchdog using an ioctl:
+
+All drivers that have an ioctl interface support at least one ioctl,
+KEEPALIVE. This ioctl does exactly the same thing as a write to the
+watchdog device, so the main loop in the above program could be
+replaced with:
+
+ while (1) {
+ ioctl(fd, WDIOC_KEEPALIVE, 0);
+ sleep(10);
+ }
+
+the argument to the ioctl is ignored.
+
+Setting and getting the timeout:
+
+For some drivers it is possible to modify the watchdog timeout on the
+fly with the SETTIMEOUT ioctl, those drivers have the WDIOF_SETTIMEOUT
+flag set in their option field. The argument is an integer
+representing the timeout in seconds. The driver returns the real
+timeout used in the same variable, and this timeout might differ from
+the requested one due to limitation of the hardware.
+
+ int timeout = 45;
+ ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
+ printf("The timeout was set to %d seconds\n", timeout);
+
+This example might actually print "The timeout was set to 60 seconds"
+if the device has a granularity of minutes for its timeout.
+
+Starting with the Linux 2.4.18 kernel, it is possible to query the
+current timeout using the GETTIMEOUT ioctl.
+
+ ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
+ printf("The timeout was is %d seconds\n", timeout);
+
+Envinronmental monitoring:
+
+Some watchdog drivers can return more information about the system,
+some do temperature, fan and power level monitoring, some can tell you
+the reason for the last reebot of the system. The GETSUPPORT ioctl is
+available to ask what the device can do:
+
+ struct watchdog_info ident;
+ ioctl(fd, WDIOC_GETSUPPORT, &ident);
+
+the fields returned in the ident struct are:
+
+ identity a string identifying the watchdog driver
+ firmware_version the firmware version of the card if available
+ options a flags describing what the device supports
+
+the options field can have the following bits set, and describes what
+kind of information that the GET_STATUS and GET_BOOT_STATUS ioctls can
+return. [FIXME -- Is this correct?]
+
+ WDIOF_OVERHEAT Reset due to CPU overheat
+ WDIOF_FANFAULT Fan failed
+ WDIOF_EXTERN1 External relay 1
+ WDIOF_EXTERN2 External relay 2
+ WDIOF_POWERUNDER Power bad/power fault
+ WDIOF_CARDRESET Card previously reset the CPU
+ WDIOF_POWEROVER Power over voltage
+ WDIOF_KEEPALIVEPING Keep alive ping reply
+ WDIOF_SETTIMEOUT Can set/get the timeout
+
+[FIXME -- Can somebody explain what the flags mean in more detail,
+especially KEEPALIVEPING?]
+
+For those drivers that return any bits set in the option field, the
+GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
+status, and the status at the last reboot, respectively.
+
+ int flags;
+ ioctl(fd, WDIOC_GETSTATUS, &flags);
+
+ or
+
+ ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);
+
+Note that not all devices support these two calls, and some only
+support the GETBOOTSTATUS call.
+
+Some drivers can measure the temperature using the GETTEMP ioctl. The
+returned value is the temperature in degrees farenheit.
+
+ int temperature;
+ ioctl(fd, WDIOC_GETTEMP, &temperature);
+
+Finally the SETOPTIONS ioctl can be used to control some aspects of
+the cards operation; right now the pcwd driver is the only one
+supporting thiss ioctl.
+
+ int options = 0;
+ ioctl(fd, WDIOC_SETOPTIONS, options);
+
+The following options are available:
+
+ WDIOS_DISABLECARD Turn off the watchdog timer
+ WDIOS_ENABLECARD Turn on the watchdog timer
+ WDIOS_TEMPPANIC Kernel panic on temperature trip
+
+[FIXME -- better explanations]
+
+Implementations in the current drivers in the kernel tree:
+
+Here I have tried to summarize what the different drivers support and
+where they do strange things compared to the other drivers.
+
+acquirewdt.c -- Acquire Single Board Computer
+
+ This driver has a hardcoded timeout of 1 minute
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
+ the device is open, 0 if not. [FIXME -- isn't this rather
+ silly? To be able to use the ioctl, the device must be open
+ and so GETSTATUS will always return 1].
+
+advantechwdt.c -- Advantech Single Board Computer
+
+ Timeout that defaults to 60 seconds, supports SETTIMEOUT.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
+ The GETSTATUS call returns if the device is open or not.
+ [FIXME -- silliness again?]
+
+eurotechwdt.c -- Eurotech CPU-1220/1410
+
+ The timeout can be set using the SETTIMEOUT ioctl and defaults
+ to 60 seconds.
+
+ Also has a module parameter "ev", event type which controls
+ what should happen on a timeout, the string "int" or anything
+ else that causes a reboot. [FIXME -- better description]
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
+ GETSTATUS is not supported and GETBOOTSTATUS just returns 0.
+
+i810-tco.c -- Intel 810 chipset
+
+ Also has support for a lot of other i810 stuff, but the
+ watchdog is one of the things.
+
+ The timeout is set using the module parameter "i810_margin",
+ which is in steps of 0.6 seconds where 2<i810_margin<64. The
+ driver supports the SETTIMEOUT ioctl.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT.
+
+ GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
+ returns some kind of timer value which ist not compatible with
+ the other drivers. GETBOOT status returns some kind of
+ hardware specific boot status. [FIXME -- describe this]
+
+ib700wdt.c -- IB700 Single Board Computer
+
+ Default timeout of 30 seconds and the timeout is settable
+ using the SETTIMEOUT ioctl. Note that only a few timeout
+ values are supported.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
+ The GETSTATUS call returns if the device is open or not.
+ [FIXME -- silliness again?]
+
+machzwd.c -- MachZ ZF-Logic
+
+ Hardcoded timeout of 10 seconds
+
+ Has a module parameter "action" that controls what happens
+ when the timeout runs out which can be 0 = RESET (default),
+ 1 = SMI, 2 = NMI, 3 = SCI.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT and the the magic character
+ 'V' close handling.
+
+ GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
+ returns if the device is open or not. [FIXME -- silliness
+ again?]
+
+mixcomwd.c -- MixCom Watchdog
+
+ [FIXME -- I'm unable to tell what the timeout is]
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if
+ the device is opened or not [FIXME -- I'm not really sure how
+ this works, there seems to be some magic connected to
+ CONFIG_WATCHDOG_NOWAYOUT]
+
+pcwd.c -- Berkshire PC Watchdog
+
+ Hardcoded timeout of 1.5 seconds
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
+ GETSTATUS and GETBOOTSTATUS return something useful.
+
+ The SETOPTIONS call can be used to enable and disable the card
+ and to ask the driver to call panic if the system overheats.
+
+sbc60xxwdt.c -- 60xx Single Board Computer
+
+ Hardcoded timeout of 10 seconds
+
+ Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
+ character 'V' close handling.
+
+ No bits set in GETSUPPORT
+
+shwdt.c -- SuperH 3/4 processors
+
+ [FIXME -- I'm unable to tell what the timeout is]
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
+ returns if the device is open or not. [FIXME -- silliness
+ again?]
+
+softdog.c -- Software watchdog
+
+ The timeout is set with the module parameter "soft_margin"
+ which defaults to 60 seconds, the timeout is also settable
+ using the SETTIMEOUT ioctl.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ WDIOF_SETTIMEOUT bit set in GETSUPPORT
+
+w83877f_wdt.c -- W83877F Computer
+
+ Hardcoded timeout of 30 seconds
+
+ Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
+ character 'V' close handling.
+
+ No bits set in GETSUPPORT
+
+wdt.c -- ICS WDT500/501 ISA and
+wdt_pci.c -- ICS WDT500/501 PCI
+
+ Default timeout of 60 seconds. The timeout is also settable
+ using the SETTIMEOUT ioctl.
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ GETSUPPORT returns with a lot of bits set, it seems that the
+ driver can check for overheating, power failure, fan failure
+ and a two external relay inputs. GETSTATUS can be used to
+ read these flags.
+
+wdt285.c -- Footbridge watchdog
+
+ The timeout is set with the module parameter "soft_margin"
+ which defaults to 60 seconds. The timeout is also settable
+ using the SETTIMEOUT ioctl.
+
+ Does not support CONFIG_WATCHDOG_NOWAYOUT
+
+ WDIOF_SETTIMEOUT bit set in GETSUPPORT
+
+wdt977.c -- Netwinder W83977AF chip
+
+ Hardcoded timeout of 3 minutes
+
+ Supports CONFIG_WATCHDOG_NOWAYOUT
+
+ Does not support any ioctls at all.
+
+scx200.c -- National SCx200 CPUs
+
+ Not in the kernel yet.
+
+ The timeout is set using a module parameter "margin" which
+ defaults to 60 seconds. The timeout can also be set using
+ SETTIMEOUT and read using GETTIMEOUT.
+
+ Supports a module parameter "nowayout" that is initialized
+ with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
+ magic character 'V' handling.

-- 
"Just how much can I get away with and still go to heaven?"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/