[rescue] Anal-Retentiveness and Source Code
Jonathan C. Patschke
jp at celestrion.net
Thu Feb 7 15:16:50 CST 2002
There was a discussion on here a week or so ago about certain folks being
a bit fussy about starting clean-slate on their boxes, and then building
everything possible from source (including the compiler). I thought I'd
share with you my experience at work for the last 26 hours, as it sheds
some light on -why- I do this on every box I run, whether it's IRIX,
Linux, NetBSD, FreeBSD, Solaris, whatever.
12noon: During a scheduled security audit, I notice that $company's www
server has a few interestingly-named accounts in /etc/passwd with
uid 0, and /bin/login has a modification date that doesn't
correspond with anything on my patch calendar.
1pm: Pull the plug on the server to fully assess the damage and its
causes, and to verify containment. Customers are redirected to
an "Oops" page on my Indigo2.
2pm: Begin building the new server to be an exact replacement by
robbing parts from my stash at home (45 minutes away) and various
boxen around the office.
Sidenote: $server is a SPARCstation 10, 128MB memory, 4GB disc, single
CPU. Truly a classic. From 1992, as well, a good vintage.
4pm: Some $expletive scratched my Solaris media and Software Companion
media. Yay for download & burn time and for a new lock on the
media cabinet.
9pm: Solaris installed.
11pm: Patches installed. Off to data center (30 minutes away).
11:10pm: It's freezing, and there's ice on US-290. Great.
12:00am: At the data center, with my VT420, an SS10, and the only tape
drive I can find[1]: a slower-than-molasses-uphill-against-the-
wind-in-Aspen-during-a-record-blizzard Archive Python. With a
pissy SCSI connector that likes to bounce into la-la-la-I-can't-
hear-you mode if the SCSI host even -thinks- about moving data
at greater than 200kB/s.
3am: Damn, but Solaris 8 takes a -long- time to patch when you've only
managed to scrounge up 32MB of RAM.
5am: Patches installed, now I'll build a host of software while
waiting for old-www to finish backing up to tape. I've promised
people that this'll be done by noon, so I don't have time to
build my own gcc+binutils.
7am: Backup's done (avg. 168kB/s. Whoo-hoo!), and all I've got left to
build are Berkeley DB, Perl, SSH, and Apache.
12pm: Backup is restored onto the new server. Now let's test this
thing. Oops! Forgot to download wget.
12:05pm: wget's configure script informs me of an obscure linker bug in
gcc < 3.0 with non-GNU ld or GNU ld < 2.9 under Solaris.
12:07pm: Bug verified. SSH breaks with rld errors, libperl segfaults.
Whee!
12:21pm: At least two dozen login entries show up at ftp.gnu.org with
username "anonymous", and various passwords containing "rms", the
names of assorted farm animals, and things you probably wouldn't
want to describe in polite company.
12:30pm: GCC starts building, and I head off to lunch. The thing's up
(FSVO) and in place; I can fix the rest from home.
Now I'm back to where I was about 6 hours ago, as the whole host of
software I've built is dynamically linked, and I -don't- want it randomly
failing left and right.
See? I trusted Sun to be able to download and build GCC with a sensible
set of dependencies (like, oh, the -linker- it -has- to work with to build
shared libs and executables) and in a sensible configuration, and it
burned me. The whole time I thought I was just being anal-retentive
about this on Solaris, but now I'm finally justified. Old man Murphy was
sitting right there in SFWgcc grinning at me.
Not that Sun's the only culprit here. I've had this experience with SGI's
precompiled freeware and even with DeadRat/SPARC 5.2. I just really don't
trust vendors to build open-source software competently anymore.
--Jonathan
[1] That isn't doing nightly backups on something mission-critical.