[geeks] awk q: sorting on two different fields
William Kirkland
bill.kirkland at gmail.com
Sun Jul 30 07:05:53 CDT 2006
> Subject: [geeks] awk q: sorting on two different fields
. . .
> I am doing a directory for a non-profit organization. I am able to
> pre-process the CSV-like file and output to Postscript.
>
> However, what is really desired is for the data to be sorted on two
> different fields, alpha by $city and then alpha by $orgname in that
> city.
>
> e.g. given a hypothetical data set like
>
> Founder's Harvest, Philadelphia
> Lancaster Market, Lancaster
> John's Market, Lancaster
> Vento's Organic Cheese Steaks, Philadelphia
>
> I would end up with :
> (Lancaster sorts before Philadelphia)
> (Founder's appears alpha sorted before Vento's)
>
> John's Market, Lancaster
> Lancaster Market, Lancaster
> Founder's Harvest, Philadelphia
> Vento's Organic Cheese Steaks, Philadelphia
>
> Any idea how to do this in awk?
You really do not want to do this in awk, as der Mouse has already
indicated. Though I would like to add to his series of questions,
because I feel he missed the most critical one ...
- What is the source of your data?
a) sql database? ... sort the data as part of your query, prior
to formatting it.
b) flat file, one record per line? ... use a wrapper script to
strip the data to those records you are interested in, sort
that, then pipe it to your awk script for formatting.
c) flat file, multi line records? ... der Mouse provided this one
... convert everything to single strings of the form
"State/City/Name" (where / can be any separator that
doesn't appear in any of the strings, maybe a ^A if
necessary). Then sort those, then split them apart
again.
d) If you must do it *in* awk, you could use arrays to hold your
data, then sort the arrays. This is EXTREMELY inefficient and
will be limited by awk's capacity to handle arrays (this is
not awk's best feature), as indicated by der Mouse's
questions ...
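For completeness, here is a minimal sketch of what option d might look like, assuming every record is a single "Name, City" line as in the sample data and that no name contains ", ". The function name sort_in_awk is made up for illustration. It uses a plain insertion sort on a "City + Name" key, so it is quadratic and only reasonable for small files; gawk users could reach for its built-in asort() instead.

```shell
# Hypothetical helper: sort "Name, City" lines by city, then by name,
# entirely inside awk. Assumes ", " never appears inside a name.
sort_in_awk() {
    awk -F', ' '
    {
        # Composite key: city first, then name, joined by SUBSEP.
        key[NR]  = $2 SUBSEP $1
        line[NR] = $0
    }
    END {
        # Insertion sort on the keys; the original lines move along with them.
        for (i = 2; i <= NR; i++) {
            k = key[i]; l = line[i]
            for (j = i - 1; j >= 1 && key[j] > k; j--) {
                key[j + 1]  = key[j]
                line[j + 1] = line[j]
            }
            key[j + 1]  = k
            line[j + 1] = l
        }
        for (i = 1; i <= NR; i++) print line[i]
    }' "$@"
}
```

Called as sort_in_awk ${input_file}, or at the end of a pipe, it prints the records in city-then-name order. Everything is held in memory, which is exactly the limitation noted above.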
> I already have their directory broken out by state, just need to sort
> within the state by city and organization name.
>
> --Patrick
using option b ... assuming your data set is exactly as provided ...
Founder's Harvest, Philadelphia
Lancaster Market, Lancaster
John's Market, Lancaster
Vento's Organic Cheese Steaks, Philadelphia
$ sort -t "," -k 2,2 -k 1,1 <${input_file}
John's Market, Lancaster
Lancaster Market, Lancaster
Founder's Harvest, Philadelphia
Vento's Organic Cheese Steaks, Philadelphia
... so one would then simply pipe the output into their awk script ...
$ sort -t "," -k 2,2 -k 1,1 <${input_file} | awk ...
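As a rough sketch of the whole pipeline, assuming the same "Name, City" layout: the awk part here is only a stand-in for the real formatting script, and the function name directory_by_city is invented for the example. The keys are spelled -k 2,2 -k 1,1 so that each key stops at the end of its own field (city first, then name), and LC_ALL=C is set only to make the ordering predictable across systems.

```shell
# Hypothetical helper: sort by city, then name, and print a heading per city.
# Assumes one "Name, City" record per line, with ", " only as the separator.
directory_by_city() {
    LC_ALL=C sort -t "," -k 2,2 -k 1,1 "$@" |
    awk -F', ' '
        # When the city changes, remember it and emit a heading.
        $2 != city { city = $2; print city ":" }
        # Every record: print the organization name, indented.
        { print "  " $1 }
    '
}
```

Run as directory_by_city ${input_file}. Because the input arrives already sorted, the awk side only has to notice when the city changes, so it never needs to hold more than one record at a time.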