Then I match keywords in the strings to categorize the expenses. These days, this would be called AI :)
Edit: I use pdftotext, which has a mode that keeps the spatial structure of tables. Works for my bank.
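A rough Python sketch of that workflow, for illustration only. It assumes the poppler `pdftotext` binary is on the PATH and that the table-preserving mode meant here is `-layout`. The keyword table, category names, and function names are all hypothetical and would need to be adapted to your own bank's statements.

```python
import subprocess
import sys

# Hypothetical keyword table: category -> lowercase substrings to look for.
KEYWORDS = {
    "groceries": ["supermarket", "grocery"],
    "transport": ["railway", "fuel"],
    "rent": ["landlord"],
}


def pdf_to_layout_text(pdf_path):
    """Run pdftotext in -layout mode and return its output as a string.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def categorize(line):
    """Return the first category whose keyword occurs in the line, else 'other'."""
    lowered = line.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "other"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python categorize.py statement.pdf
    for line in pdf_to_layout_text(sys.argv[1]).splitlines():
        if line.strip():
            print(f"{categorize(line):<12} {line}")
```

Plain substring matching is crude, and the first matching category wins, so the order of the table matters when keywords overlap.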
pdftk input.pdf output output.pdf uncompress
Then try grepping or using whatever tools you like (the file will still contain binary parts, such as embedded fonts and bitmaps).
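If grep isn't handy, the same kind of search can be sketched in Python by working on raw bytes, so the binary parts don't cause decoding errors. The function name and the search term are made up for illustration. Note that text in PDF content streams is often split into fragments or stored in a font-specific encoding, so a plain search like this may miss strings that are visible on the page.

```python
def find_in_pdf_bytes(data, needle):
    """Return the lines of raw PDF bytes that contain needle.

    Lines are decoded as Latin-1, which maps every byte to a character,
    so binary junk never raises a decoding error.
    """
    needle_bytes = needle.encode("latin-1")
    hits = []
    for raw_line in data.splitlines():
        if needle_bytes in raw_line:
            hits.append(raw_line.decode("latin-1"))
    return hits


# Example use on the uncompressed file produced by pdftk:
#     with open("output.pdf", "rb") as f:
#         for line in find_in_pdf_bytes(f.read(), "Total"):
#             print(line)
```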