
I am working on a project where I need to detect bold text in an image that contains multiple font sizes (so mathematical morphology is not an option). The detection will run alongside an OCR system (Tesseract) to identify which pieces of information in a document are important, i.e. the ones printed in bold.

I have already tested Tesseract's wordFontAttribute() function, but it is inconsistent: its bold detection results are poor, and it degrades the performance of my OCR system, because using this function requires an old version of Tesseract (v3).

I found a couple of research papers on font style detection, and therefore on bold detection ("Automatic Detection of Italic, Bold and All-Capital Words in Document Images" and "Script Independent Detection of Bold Words in Multi Font-size Documents", both found on Google Scholar).

I was wondering whether a code implementation of either paper is available online.

Any other ideas for bold detection are also welcome.
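
To make the multi font-size constraint concrete, here is a minimal sketch of one size-independent heuristic: the median horizontal stroke run length divided by the text height. This is only an illustration and is not taken from either paper above. It assumes each word has already been cropped and binarized (text pixels are True), and the function names `horizontal_run_lengths` and `relative_stroke_width` are hypothetical. Any decision threshold on the ratio would have to be calibrated per font.

```python
import numpy as np


def horizontal_run_lengths(binary):
    """Return the lengths of all horizontal runs of foreground pixels."""
    runs = []
    for row in binary:
        # Pad with background so runs touching the border are closed.
        padded = np.concatenate(([0], row.astype(np.int8), [0]))
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)   # background -> foreground
        ends = np.flatnonzero(diff == -1)    # foreground -> background
        runs.extend(ends - starts)
    return np.array(runs, dtype=int)


def relative_stroke_width(binary):
    """Median horizontal run length divided by the height of the text.

    Dividing by the height is what makes the measure comparable
    across font sizes (assumption: the crop contains a single word).
    """
    binary = np.asarray(binary, dtype=bool)
    rows = np.flatnonzero(binary.any(axis=1))
    if rows.size == 0:
        return 0.0  # empty crop, nothing to measure
    height = rows[-1] - rows[0] + 1
    runs = horizontal_run_lengths(binary)
    return float(np.median(runs)) / height


# Synthetic example: two vertical strokes per "word".
regular = np.zeros((20, 40), dtype=bool)
regular[:, 5:7] = True
regular[:, 15:17] = True

bold = np.zeros((20, 40), dtype=bool)
bold[:, 5:9] = True
bold[:, 15:19] = True

print(relative_stroke_width(regular), relative_stroke_width(bold))
```

On real scans the ratio also depends on the typeface and on the binarization, so it would more likely serve to compare words within the same document than as an absolute test.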
