What is OCR and when do you actually need it?
OCR turns images of text into real text — great for scanned books, contracts and document photos. When it works perfectly, when it struggles, and how to use it in Freekonvert.
OCR (Optical Character Recognition) is the technology that looks at an image of a page and extracts the actual text from it. The difference is fundamental: without OCR, the image of the word "Contract" is just a bunch of pixels that look like that word. With OCR, the computer knows that the image says "Contract", and you can copy it, search it, and edit it.
When you need OCR
- You scanned a contract and want to copy one sentence into an email — without OCR, you cannot select it.
- You have a 200-page PDF with scans of a book and you need a specific phrase — OCR makes Ctrl+F possible.
- You need a Word version of a document someone sent you as a scanned PDF — OCR reads the text so LibreOffice can put it into a .docx.
- You have a phone photo of a printed document — OCR can extract the text (if it is legible).
When you do NOT need it
If your PDF already has real text, OCR just wastes time. A quick test: try Ctrl+F in the file, and if it finds words, the file has text. Word documents, PDFs saved from Word, and store receipts issued as PDF already contain native text. OCR is for **scanned** documents.
How it works (simple version)
Modern OCR uses neural networks trained on millions of text examples in many languages. The short version:
1. The algorithm splits the page image into smaller regions (paragraphs, lines, words).
2. Each word is recognised by comparing pixel shapes with learned letter patterns.
3. A language dictionary checks whether the result makes sense: "cotnract" gets corrected to "contract" because the first is not a real word.
4. Output: text with positions inside the document, ready to be embedded into a PDF or extracted as .txt or .docx.
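The dictionary step can be illustrated with a few lines of Python. This is only a minimal sketch: it uses the standard library's `difflib` for fuzzy matching, while a real OCR engine relies on a full language model. The tiny `VOCABULARY` list and the `correct_word` helper are made up for the example.

```python
from difflib import get_close_matches

# Tiny illustrative vocabulary (an assumption for this sketch);
# a real engine uses a full dictionary or language model.
VOCABULARY = ["contract", "contact", "contrast", "email", "sentence"]


def correct_word(word: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Return the word unchanged if it is known, else the closest vocabulary entry.

    If nothing in the vocabulary is similar enough, the original word is kept
    rather than replaced with a bad guess.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return word
    matches = get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    print(correct_word("cotnract", VOCABULARY))
```

Run as a script, this prints `contract`. A word with no similar entry in the list (a name, for example) is left untouched, which is usually safer than forcing a correction.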
What affects OCR quality
Scan resolution
Ideal: 300 DPI. Lower resolutions (around 150 DPI, typical of a quick phone photo) can still work for printed text but stumble on small details. Going above 300 DPI rarely helps with normal-size print and mostly just slows things down.
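A photo has no DPI setting of its own, so it helps to estimate the effective resolution from the pixel size of the image and the physical size of the page. The sketch below assumes the page fills the whole frame and is an A4 sheet (210 x 297 mm) in the same orientation as the image. The `effective_dpi` helper is a made-up name for this example.

```python
MM_PER_INCH = 25.4


def effective_dpi(
    width_px: int,
    height_px: int,
    page_width_mm: float = 210.0,   # A4 width, an assumption
    page_height_mm: float = 297.0,  # A4 height, an assumption
) -> float:
    """Estimate the scan resolution of a page that fills the whole image.

    Returns the lower of the horizontal and vertical values, since the
    weaker axis limits how well small details are captured.
    """
    dpi_x = width_px / (page_width_mm / MM_PER_INCH)
    dpi_y = height_px / (page_height_mm / MM_PER_INCH)
    return min(dpi_x, dpi_y)


if __name__ == "__main__":
    # A 2480 x 3508 pixel image of a full A4 page is roughly 300 DPI.
    print(round(effective_dpi(2480, 3508)))
```

If the page covers only part of the photo, the real resolution is lower than this estimate, so cropping to the page before measuring gives a more honest number.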
Original quality
- Printed text with good contrast (black on white): about 99% accuracy.
- Handwriting: 60-85% in the best case.
- Pages photographed at an angle: "deskew" preprocessing helps.
- Stains, glare, folded pages: all of these lower accuracy.
Language
OCR engines are trained per language. We use srp+eng models in Freekonvert, so Serbian and English have good accuracy. Montenegrin Cyrillic/Latin is part of the Serbian model. For German, French and so on it still works, but with somewhat higher error rates because the model is not primarily trained for them.
How it works in Freekonvert
Our PDF to Word tool has OCR built in — it automatically detects whether the PDF already has text. If yes, it uses that text directly (faster, perfect accuracy). If not (scanned PDF), it runs OCR with srp+eng models before handing the file over to LibreOffice to produce a Word file. All in a single click, no manual choice required.
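To make the idea concrete, here is a rough sketch of how such an automatic check could work. It is not Freekonvert's actual code: the `needs_ocr` and `choose_pipeline` helpers and the 20-character threshold are assumptions for illustration. The input is the text that some PDF library has already extracted from each page.

```python
def needs_ocr(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is a scan, based on text extracted per page.

    A scanned PDF yields little or no extractable text, so a low average
    character count per page suggests OCR is required. The threshold is
    an arbitrary value chosen for this sketch.
    """
    if not page_texts:
        return True
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


def choose_pipeline(page_texts: list[str]) -> str:
    """Pick the processing path: OCR first for scans, direct conversion otherwise."""
    return "ocr_then_convert" if needs_ocr(page_texts) else "convert_directly"


if __name__ == "__main__":
    scanned = ["", "  ", ""]
    native = ["This agreement is made between the two parties named below.", "Page two text."]
    print(choose_pipeline(scanned))
    print(choose_pipeline(native))
```

Averaging over pages rather than checking only the first one keeps a blank cover page from sending a normal text PDF through OCR.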