[geeks] PDF: How to tell if a PDF file is text or image only
Charles Shannon Hendrix
shannon at widomaker.com
Tue Dec 12 15:12:45 CST 2006
I need to be able to tell in a shell script if a PDF file is text based
or image based.
What I mean is that some documents are actually PDF text documents, while
others are image-only scans with no text.
I have software which processes text files, but it will fail on an
image-only PDF file, so I want to detect those early and set them aside
for processing by hand.
Any ideas?
I've thought of scanning the file for "/Subtype /Image", but I'm not
sure that's only used to describe an image-only document: it might be
used for any image section or embedded image in a PDF file.
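
One alternative to matching "/Subtype /Image" is to test for extractable
text instead. That key marks every image object in a PDF, so a text
document with a single logo would match too. What follows is only a rough
sketch, not a tested solution: it assumes the pdftotext tool from
xpdf/poppler is installed, and the names count_text_chars, pdf_kind and
MIN_CHARS are made up for illustration. The threshold of 20 characters is
an arbitrary guess. A scan that carries an OCR text layer will be
reported as "text".

```shell
#!/bin/sh
# Sketch: classify a PDF as "text" or "image" by how much text
# pdftotext can extract from it. MIN_CHARS is an arbitrary threshold.
MIN_CHARS=${MIN_CHARS:-20}

# Read text on stdin, print the number of non-whitespace bytes.
# The form feeds pdftotext emits between pages count as whitespace.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# pdf_kind FILE: print "text" if enough text was extracted, else "image".
# Caveat: if pdftotext is missing or fails, the count is 0 and the
# file is reported as "image", so it still gets set aside for a human.
pdf_kind() {
    n=$(pdftotext "$1" - 2>/dev/null | count_text_chars)
    if [ "${n:-0}" -ge "$MIN_CHARS" ]; then
        echo text
    else
        echo image
    fi
}
```

It could be used as: for f in *.pdf; do echo "$f: $(pdf_kind "$f")"; done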
Anyway... ideas and even useful rants are welcome.
--
shannon "AT" widomaker.com -- ["Meddle not in the affairs of Wizards, for
thou art crunchy, and taste good with ketchup." -- unknown]