carabiner 10 months ago | on: PDF to Text, a challenging problem

I bet 90% of the problem space is legacy PDFs. My company has thousands of these. Some are crappy scans. Some have Adobe's OCR embedded, but most have none at all.