Why is DocReviewPad crashing during a search of OCR data?

When you type into the search field, the app filters the document list to show only files whose names contain the characters you typed. This is a good way to find, for example, all the interrogatories by just typing "interr". Once you finish typing a search word and press Return, the app starts searching the OCR data in every file. This is a very computationally heavy task.

In the new versions of our apps we switched from using our own custom search to Apple’s PDF search. Apple’s search is very good at finding results in scanned documents (even better than Adobe’s search). When a document is scanned, the OCR process identifies text where it visually appears on the page, so sentences, phrases, and even words can be broken apart if the text is skewed or misaligned. Apple’s search is able to find a phrase in a document even if the OCR process recognized that phrase as separate blocks of text.
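
To illustrate why split text blocks matter, here is a minimal Python sketch. It is not Apple’s implementation or ours; the function names and the sample blocks are hypothetical, and it only shows the general idea that a search limited to one block at a time misses a phrase that a search across joined blocks can find.

```python
def find_in_blocks(blocks, phrase):
    """Naive search: look for the phrase inside each OCR block separately."""
    needle = phrase.lower()
    return any(needle in block.lower() for block in blocks)


def find_across_blocks(blocks, phrase):
    """Join the blocks in reading order, normalize whitespace, then search."""
    text = " ".join(" ".join(blocks).split()).lower()
    needle = " ".join(phrase.split()).lower()
    return needle in text


# Hypothetical OCR output: one line of text recognized as three blocks.
blocks = ["Plaintiff's first set of", "interrogatories to", "defendant"]

per_block = find_in_blocks(blocks, "set of interrogatories")      # False
across = find_across_blocks(blocks, "set of interrogatories")     # True
```

Searching across blocks means more text has to be held and compared at once, which is one reason a more thorough search can cost more time and memory.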

Unfortunately, because Apple’s search is more powerful, it is also slower and consumes more memory (RAM). During testing we found that some documents, due to the way they were scanned, consume a very large amount of memory during the search. If a case contains many of these documents, the search can run out of memory when it searches multiple documents at a time.

We tried to strike a balance between performance and reliability, and we are working on optimizing the search so that our users can keep the accuracy of the new search along with speed and reliability. In the meantime, there are a few things you can do to help prevent the search from crashing the app:

  1. Organize your case into logical folders and search the folders to avoid searching more documents than you need to.
  2. Avoid scanning multiple documents as a single PDF. Scanned documents contain images of each page, and the more pages a document contains, the more memory it will consume while it is being searched.
  3. Scanning at a higher DPI will cause higher memory usage when working with the document. This itself isn’t necessarily an issue, but when combined with scanning a lot of documents, it can be.
  4. For scanned documents, file size doesn’t matter. With everything else equal, a highly compressed 300DPI scan, once decompressed, will use just as much memory as the same image from a larger file. When scanning documents, a setting over 150DPI offers little additional clarity but can consume far more resources.
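
The effect of DPI on memory can be estimated with simple arithmetic. The sketch below is only an approximation under stated assumptions: a US Letter page (8.5 x 11 inches) and 4 bytes per pixel for the decompressed image. The function name is hypothetical, and the memory the app actually uses during a search may differ.

```python
def page_bitmap_bytes(width_in, height_in, dpi, bytes_per_pixel=4):
    """Estimate the size of one decompressed page image in bytes.

    Assumes 4 bytes per pixel (for example RGBA); this is an assumption,
    not a measurement of the app.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# One US Letter page at two common scan settings.
at_150 = page_bitmap_bytes(8.5, 11, 150)   # 8,415,000 bytes, about 8.4 MB
at_300 = page_bitmap_bytes(8.5, 11, 300)   # 33,660,000 bytes, about 33.7 MB

# Doubling the DPI quadruples the pixel count, and so the memory per page.
ratio = at_300 / at_150                    # 4.0
```

Note that the compressed file size does not appear anywhere in the calculation: only the page dimensions and the DPI determine the decompressed size, and the total grows with every page in the document.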