[geeks] script language advice
Nadine Miller
velociraptor at gmail.com
Tue Feb 5 17:27:12 CST 2008
Jonathan C. Patschke wrote:
> On Sat, 2 Feb 2008, Nadine Miller wrote:
>
>> num of dupes * filesize /path/to/file/filename /path/to/file/filename2
>> /path/to/filename3 [...]
>
> Ah, the perl code you'd want is something like:
>
> while (<>) {                          # Snag a line from stdin
>     chomp;                            # Trim the line-ending
>     my @components = split(/\s/);     # Split into a list at spaces
>     my $dupcount = shift @components; # Remove the first field.
>     my $trash1 = shift @components;   # Remove the second field.
>     my $trash2 = shift @components;   # Remove the third field.
>     my $fileToKeep = shift @components; # Remove the fourth field.
>
>     unlink @components;               # Delete everything else.
> }
>
> If you wanted to test this, you could replace the last two lines with:
>
> print "Keeping $fileToKeep, nuking: ";
> foreach my $filename (@components) { print "$filename," }
> print "\n";
Thanks for the educational code, Jonathan. I really need to force
myself to sit down and do some serious work in Perl.
In this case, though, these file paths are not sane. Since the files
are on FAT32, there are a lot of spaces in the paths.
All of the partitions are mounted under the same sub-directory on the
Xubuntu box, so had I continued with the scripting, I would have split
the lines at that sub-directory "prefix" instead, since it would be
easy to add the prefix back (a rough sketch of that approach is below).
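For the record, the splitting I had in mind was something along these
lines (untested, and only a sketch: "/subdir/" stands in for the real
mount point, and the input is assumed to be the "num of dupes * filesize
path path2 ..." lines quoted above):

my $prefix = '/subdir/';              # stand-in for the real mount point

while (<>) {
    chomp;                                        # trim the line ending
    # Everything after the first occurrence of the prefix is path data;
    # the dupe count, "*", and filesize fields come before it.
    my ($header, $rest) = split /\Q$prefix\E/, $_, 2;
    next unless defined $rest;

    # Split the rest wherever " /subdir/" starts the next path, then glue
    # the prefix back on, so paths containing spaces stay intact.
    my @paths = map { $prefix . $_ } split / \Q$prefix\E/, $rest;

    my $fileToKeep = shift @paths;                # keep the first copy
    print "Keeping $fileToKeep, would nuke: @paths\n";
    # unlink @paths;                              # only once the output looks right
}

The one thing that would trip it up is a path containing a literal
" /subdir/" in the middle, but with everything mounted under its own
computerN/partitionN directory that shouldn't come up.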
I copied the files from each of the WinXP computers' individual
partitions into a sub-dir on one external drive. For some reason, my
dad felt the need to partition his HDs into a bunch of smaller
partitions, so I had C:, D:, E:, etc. all the way up to K: on one of the
computers.
So now it looks like:
/subdir/computer1/partition1
/subdir/computer1/partition2
etc.
/subdir/computer2/partition1
/subdir/computer2/partition2
etc.
All mounted on a box where I shuffled data to make room for a Xubuntu
install.
I decided not to build the script after testing fslint on a couple of
smaller filesets with no recursion; it seemed to DTRT and performed
reasonably well. Being conservative, I am running it with recursion on
each sub-directory I created for the individual FAT32 file
systems. When those are complete, I'll re-run it over the top
sub-directory with recursion to get down to a single copy across all
file systems. I feel a little sketchy trusting it, but to be honest,
I'd feel even sketchier trusting my own code. :-/
I haven't re-installed any of the original computers, so I can fall back
to the original data if this blows up. I am fairly certain that most of
the data is also backed up on removable media, but that would be a
bear to deal with unless absolutely necessary. I can tell my dad was
starting to doubt his own reliability, as I've found many, many
duplicates of things like digital photos.
Bottom line, my dad was a pack rat, and unfortunately didn't segregate
"important" data (e.g. pictures, financial info, personal IP, contact
info, login info) from "interesting" but non-essential info (music
collection, info downloaded for later reading, etc.). Hopefully this
will prod some other pack rats I know here to think about what might
happen at home with their computers if they were seriously injured or
passed away and a relative had to deal with the data.
I have already started a "home wiki" so that my husband and I can manage
all the info related to our website business. I think I'll be extending that
to cover some other things that we both should have access to, as well
as thinking about how to re-organize the rest of my data to make it
clear what is important and what is not. This experience also
reinforces my support of open formats.
Food for thought--
=Nadine=