This page is the incident log for the halibut systems. It exists to keep our user community informed of incidents that might affect them.
Date: 2003.12.21
Posted by: Josh
HalNet was down for about 2 hours due to a power outage (which was in turn due to an earthquake centered about 50 miles north of Mark's house). The generator was pulled out, but not connected before the power came back up.
Date: 2003.12.15
Posted by: Josh
The power outage killed the SDSL router (it locked up every half hour
or so), so HalNet had only intermittent connectivity (it worked for a
short while each time someone was around to kick the router) for the
latter portion of the 14th and most of the 15th. We swapped out the
router on the evening of the 15th, which seems to have cleared things up.
We used the downtime as an opportunity to install more memory in chiba
and do some other hardware work.
Date: 2003.12.14
Posted by: Josh
Power outage for 2 hours, from roughly 7:55am PST to 10:00am PST. The UPS only held up for 15 minutes.
Date: 2003.11.22
Posted by: Josh
We installed a new RAID system (3ware based), with 4 drives total (3 used in a RAID 5 array, 1 hot spare). We've gone from 98-100% usage on /home to 6% usage. Happy day! Now to see if the new hardware works out well...
Date: 2003.07.02
Posted by: Josh
After 620 days of uptime, chiba was rebooted for some scheduled maintenance on July 2, 2003. Chiba is now running a newer kernel, and supports IPsec. Chiba has been set up to support our next line of hardware upgrades (to a 3ware RAID controller and new larger IDE disks...), and now has the necessary drivers to support Mark's software repeater project.
This clears the way for chiba's new incarnation, Mecha-Chiba, which will likely be based on OpenWall's Secure Owl Linux distribution.
Date: 2001.04.18
Posted by: Josh
We brought down the system to add another 256MB of memory, now with 30% more
silicon. Sweet Jesus!
Date: 2000.08.17
Posted by: Josh
Yesterday, around 1040, drive 0's SCSI interface stopped responding to the outside world. About 3 minutes after that, drive 1 was disabled due to a faulty RAID header. The machine's drives were at this point dead to the world. We then started playing the 'how does the Linux VM pagecache work' game. Programs that were fully loaded in memory continued to work. Any program that had to access the disk to load its binary or data died. Programs that don't check return values from system calls started acting odd.
I determined that the drives and/or RAID controller had failed. I could not access the firmware RAID configuration software. Attempting to do so locked the machine.
I set up a new IDE disk as a replacement. All files were transferred from the old drive to the new drive and verified.
What was apparently a RAID controller failure in early August could have been a transitory hard drive failure.
The RAID card's BIOS and the system BIOS appear to not work-and-play well with each other.
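As a footnote to the "programs that don't check return values" remark above, here is a minimal sketch of what checking looks like. It is an illustration only, not code that ran on chiba: the names write_all and save_report are made up for this example, and Python is used just because it is short. In Python most failed system calls raise OSError, while os.write still reports through its return value how many bytes actually went out, so both need attention.

```python
import errno
import os
import tempfile


def write_all(fd, data):
    """Write every byte of data to fd, or raise OSError (hypothetical helper)."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write may write fewer bytes than requested; the return value
        # is the number of bytes that actually went out.
        n = os.write(fd, view[written:])
        if n == 0:
            raise OSError(errno.EIO, "write returned 0 bytes")
        written += n
    return written


def save_report(path, payload):
    """Return True if payload reached the disk, False otherwise (hypothetical)."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        print(f"cannot open {path}: {exc.strerror}")
        return False
    try:
        write_all(fd, payload)
        os.fsync(fd)  # ask the kernel to push the data to the device
        return True
    except OSError as exc:
        # A dead disk typically surfaces here as EIO.
        print(f"write to {path} failed: {exc.strerror}")
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save_report(os.path.join(tmp, "report.txt"), b"all is well\n"))
```

A program written this way stops and reports the failure when the disk goes away, instead of carrying on as if the write had succeeded.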