Discussion is here: https://news.ycombinator.com/item?id=45652952
The "guts" are here: https://github.com/majcheradam/ocrbase/blob/7706ef79493c47e8...
Equally important is how easily you can build a human-in-the-loop review layer on top of the tool. This is needed not only to improve accuracy but also for compliance, especially in regulated industries like insurance.
Other tools in this space:
LLMWhisperer/Unstract (AGPL)
Reducto
Extend AI
LlamaParse
Docling