Thanks, I always forget I can do that! I've given it a go and it's really impressive – the default chunker is very smart and manages to keep most of the chunk context together
The table parser in particular is really good. Is the trick that you draw some guide lines and rectangles around tables? I'm trying to understand the GraphicsStreamProcessor class as I'm not familiar with Tika, how does it know where to draw in the first place?