Enterprise Strength Optical Character Recognition

Symphony OCR is an enterprise-grade, back-end optical character recognition (OCR) system for Worldox®. With Symphony OCR, OCRing a document is as easy as filing it to Worldox.

Why OCR?

  • Firms using Worldox rely on full text searching to locate all documents that contain a given word or phrase
  • A scanned document is stored as an image of each page, so it is not text searchable when it is first scanned
  • OCR is the process of analyzing the scanned page to determine which letters and words appear on it
  • After OCR, images become text searchable

Without OCR, between 25% and 50% of the PDF files in a typical Worldox document repository may not be full text searchable. OCR lets the firm be confident that its text searches cover all of its documents, including scanned images.

Capabilities

  • Set it and forget it: No employee effort is required. The very act of saving an image to Worldox means it will automatically become text searchable
  • Image+Text PDF files: Symphony OCR generates image+text PDF files. This special type of PDF preserves the exact image as it was scanned, overlaying it with an invisible layer of text containing the OCR results. This text is full text searchable, and users can even select and copy text directly from the image
  • Back-end processing: Symphony OCR runs as a back-end process, analyzing and OCRing every document in your Worldox repository. This means that all images will be OCRed and available for full text searching, regardless of whether they were scanned, received via email, delivered on physical media as part of discovery, etc.
  • Advanced Analysis: Not all PDF files (or pages in a given PDF file) should be OCRed. Symphony OCR’s Analysis module determines which pages of a given PDF are appropriate to OCR
  • Legacy image handling: Many firms have existing filing structures with hundreds of thousands of pages that need to be OCRed. Symphony OCR ensures that the legacy backlog gets OCRed, while still processing newer documents in a timely manner
  • Content preservation: Symphony OCR preserves all of the content of the original file, including any PDF annotations, Bates numbers, bookmarks, etc.
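
For readers curious what a per-page decision like the one described under Advanced Analysis might look like, here is a minimal sketch in Python. It is an illustration only and does not describe how Symphony OCR's Analysis module actually works. The `PageInfo` class, the `pages_needing_ocr` function, and the `min_text_chars` threshold are all hypothetical names and values chosen for this example; the assumed rule is simply "OCR a page if it contains an image and has little or no extractable text."

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary; not a Symphony OCR data structure."""
    number: int        # 1-based page number within the PDF
    has_image: bool    # page contains a raster image (for example, a scan)
    text_chars: int    # characters already extractable from the page


def pages_needing_ocr(pages, min_text_chars=20):
    """Return the page numbers that look like scans with no usable text layer.

    Assumed rule for this sketch: a page is a candidate for OCR when it
    contains an image and has fewer than `min_text_chars` extractable characters.
    """
    return [
        p.number
        for p in pages
        if p.has_image and p.text_chars < min_text_chars
    ]


if __name__ == "__main__":
    sample = [
        PageInfo(1, has_image=False, text_chars=1800),  # born-digital text page
        PageInfo(2, has_image=True, text_chars=0),      # scanned page, no text
        PageInfo(3, has_image=True, text_chars=950),    # already has a text layer
    ]
    print(pages_needing_ocr(sample))
```

In this sketch only the scanned page without text is selected, which is why a mixed PDF can be OCRed page by page rather than as a whole. A real product would likely weigh additional signals that are not covered here.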