FTK is fast. On all my machines it will use 32 cores if I let it. One step that isn't fast, however, is OCR. It sits there using a single core for OCR, which can leave OCR lagging weeks behind the rest of processing.
I've tried FTK's Leadtools module, which uses an alternative OCR library, with no success. EnCase doesn't offer OCR unless you get EnCase eDiscovery. X-Ways is the same, although the files needing OCR could be exported and processed separately. The problem there is that the export makes a mess of the folder hierarchy, and long-path issues abound. X-Ways also seems less suited to cases of my size (5M+ items).
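If the files can be exported, the OCR itself can be spread across all cores outside the forensic tool. Below is a minimal sketch, assuming the open-source Tesseract CLI is installed and on PATH (Tesseract is my substitution here, not something FTK, EnCase or X-Ways provides). It mirrors the export's folder hierarchy into a separate text tree, so each `.txt` sits at the same relative path as its source image. `ocr_tree`, `tesseract_ocr` and the extension list are hypothetical names chosen for illustration. Threads are enough because each job waits on a separate `tesseract` process.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical set of image types to OCR; adjust for your exports.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def output_path(src: Path, export_root: Path, text_root: Path) -> Path:
    """Mirror src's location under export_root into text_root, appending .txt."""
    rel = src.relative_to(export_root)
    return text_root / rel.with_name(rel.name + ".txt")


def tesseract_ocr(src: Path, dst: Path) -> None:
    """OCR one image with the tesseract CLI (assumed to be on PATH)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    # `tesseract IMAGE OUTBASE` writes OUTBASE.txt, so strip our .txt suffix.
    base = dst.with_suffix("")
    # Limit each tesseract process to one OpenMP thread, since we run many at once.
    env = {**os.environ, "OMP_THREAD_LIMIT": "1"}
    subprocess.run(["tesseract", str(src), str(base)],
                   check=True, capture_output=True, env=env)


def ocr_tree(export_root, text_root, ocr_fn=tesseract_ocr, workers=None):
    """OCR every image under export_root in parallel; return the output paths."""
    export_root, text_root = Path(export_root), Path(text_root)
    jobs = [
        (p, output_path(p, export_root, text_root))
        for p in sorted(export_root.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    ]

    def run(job):
        src, dst = job
        ocr_fn(src, dst)
        return str(dst)

    # One worker per core by default; each worker drives one tesseract process.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(run, jobs))
```

Usage would be something like `ocr_tree(r"D:\export", r"D:\ocr_text")`. The sketch does not address Windows' 260-character path limit; on Windows that may still require enabling long-path support or using `\\?\`-prefixed paths.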
Does anyone have a solution for OCR processing?
One workaround I'm considering is inserting the full text directly into the database. That doesn't appear to be possible with FTK, since dtSearch is a roadblock and there isn't an API, but maybe it could be done with an EnScript in EnCase? Rather than reinvent the wheel, I thought I'd ask.