Bug: doesn't seem to perform OCR on image pdfs

This is one of the very few PDF-to-Markdown solutions I've found that preserves italics and bold from text PDFs, which is great. However, when I try to convert an image-only PDF, the only output is a Markdown file containing a link to the extracted image, with no OCR text. All dependencies are installed, including opencv and pytesseract.

This is the terminal output:

2025-10-25 20:36:04,152 - __main__ - INFO - Image captioning model set up successfully.
2025-10-25 20:36:04,185 - __main__ - INFO - Extracted 0 tables from the PDF.
2025-10-25 20:36:04,185 - __main__ - INFO - Processing page 1
2025-10-25 20:36:04,497 - __main__ - INFO - Extracted 0 links from the page.
2025-10-25 20:36:05,672 - __main__ - INFO - Markdown content saved successfully.
2025-10-25 20:36:05,672 - __main__ - INFO - Markdown content has been saved to pdf_to_MD_output/Test-italics.md
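
None of the log lines above mention an OCR step. To help narrow this down, here is a minimal sketch for checking OCR outside the tool. It is not the project's code: `looks_like_image_only_page` and `ocr_extracted_image` are hypothetical helper names, the `min_chars` threshold is an arbitrary assumption, and the image path is whatever file the converter extracted. The only library calls used are `PIL.Image.open` and `pytesseract.image_to_string`.

```python
import sys
from pathlib import Path


def looks_like_image_only_page(extracted_text: str, image_count: int, min_chars: int = 20) -> bool:
    """Heuristic (assumption): images present but almost no text layer means OCR is needed."""
    return image_count > 0 and len(extracted_text.strip()) < min_chars


def ocr_extracted_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one extracted image and return the recognized text."""
    # Imported lazily so the heuristic above can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(Path(image_path)) as img:
        return pytesseract.image_to_string(img, lang=lang)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the image the converter extracted from the PDF.
    print(ocr_extracted_image(sys.argv[1]))
```

If this prints the expected text for the extracted image, pytesseract and Tesseract are working, and the problem is more likely that the converter never reaches its OCR path for image-only pages.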

Bug: doesn't seem to perform OCR on image pdfs #6
