Tesseract OCR Returns Null String
I am building an OCR app for Android and I use the Tesseract OCR engine. Every time I run the engine on a photo, it returns an empty string. This is my code: public String detectT
Solution 1:
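In your setTessData() method you are passing an OCR Engine Mode (OEM) constant to setPageSegMode(), which expects a page segmentation mode (PSM). Both are plain int constants, so the call compiles, but OEM_TESSERACT_ONLY has the value 0, which setPageSegMode() reads as PSM_OSD_ONLY (also 0): orientation and script detection only, with no OCR. That is why the recognized text is empty: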
setTessData() {
...
// Bug: OEM_TESSERACT_ONLY is an OCR Engine Mode constant, not a page segmentation mode
tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_ONLY);
}
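The mix-up can be reproduced without Android or tess-two. In this sketch, TessConstants is a hypothetical stand-in that copies the relevant values (OEM_TESSERACT_ONLY = 0 as defined in tess-two, plus PSM_OSD_ONLY = 0 and PSM_AUTO = 3 from the PageSegMode listing in this answer); check them against the tess-two version you build with. The describePageSegMode() helper is also hypothetical and only shows which mode a given int is interpreted as.

```java
// Hypothetical stand-in for the relevant tess-two constants.
// Values are assumed to match TessBaseAPI; verify them for your tess-two version.
final class TessConstants {
    // OCR Engine Mode: selects which recognizer runs (tess-two takes it in init()).
    static final int OEM_TESSERACT_ONLY = 0;

    // Page segmentation modes: describe the page layout (taken by setPageSegMode()).
    static final int PSM_OSD_ONLY = 0;
    static final int PSM_AUTO = 3;

    private TessConstants() {}
}

class ConstantMixUpDemo {
    // Both kinds of constants are plain ints, so the compiler accepts either one
    // wherever an int page segmentation mode is expected.
    static String describePageSegMode(int psm) {
        switch (psm) {
            case TessConstants.PSM_OSD_ONLY:
                return "PSM_OSD_ONLY (orientation and script detection only, no OCR)";
            case TessConstants.PSM_AUTO:
                return "PSM_AUTO (fully automatic page segmentation, but no OSD)";
            default:
                return "other mode " + psm;
        }
    }

    public static void main(String[] args) {
        // What the original setTessData() effectively requests:
        System.out.println(describePageSegMode(TessConstants.OEM_TESSERACT_ONLY));
        // What the answer suggests instead:
        System.out.println(describePageSegMode(TessConstants.PSM_AUTO));
    }
}
```

Running main() prints the PSM_OSD_ONLY description for the first call, which is the mode the original code ends up selecting.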
Setting a page segmentation mode that matches the kind of image you are recognizing will help Tesseract detect the characters.
For example:
tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
The other page segmentation modes are defined in TessBaseAPI.java:
/** Page segmentation mode. */
public static final class PageSegMode {
/** Orientation and script detection only. */
public static final int PSM_OSD_ONLY = 0;
/** Automatic page segmentation with orientation and script detection. (OSD) */
public static final int PSM_AUTO_OSD = 1;
/** Fully automatic page segmentation, but no OSD, or OCR. */
public static final int PSM_AUTO_ONLY = 2;
/** Fully automatic page segmentation, but no OSD. */
public static final int PSM_AUTO = 3;
/** Assume a single column of text of variable sizes. */
public static final int PSM_SINGLE_COLUMN = 4;
/** Assume a single uniform block of vertically aligned text. */
public static final int PSM_SINGLE_BLOCK_VERT_TEXT = 5;
/** Assume a single uniform block of text. (Default.) */
public static final int PSM_SINGLE_BLOCK = 6;
/** Treat the image as a single text line. */
public static final int PSM_SINGLE_LINE = 7;
/** Treat the image as a single word. */
public static final int PSM_SINGLE_WORD = 8;
/** Treat the image as a single word in a circle. */
public static final int PSM_CIRCLE_WORD = 9;
/** Treat the image as a single character. */
public static final int PSM_SINGLE_CHAR = 10;
/** Find as much text as possible in no particular order. */
public static final int PSM_SPARSE_TEXT = 11;
/** Sparse text with orientation and script detection. */
public static final int PSM_SPARSE_TEXT_OSD = 12;
/** Number of enum entries. */
public static final int PSM_COUNT = 13;
}
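Because setPageSegMode() takes a plain int, nothing stops a wrong constant from slipping in. A guard such as the hypothetical PageSegModeCheck below can catch it before the call. It is a sketch that assumes the values in the listing above (in particular PSM_COUNT = 13; newer tess-two releases may define additional modes), and it rejects out-of-range values as well as the two modes whose comments say they perform no OCR.

```java
// Hypothetical guard for page segmentation modes.
// Values copied from the PageSegMode listing above; adjust if your tess-two differs.
final class PageSegModeCheck {
    static final int PSM_OSD_ONLY = 0;   // orientation and script detection only
    static final int PSM_AUTO_ONLY = 2;  // page segmentation only, no OSD or OCR
    static final int PSM_COUNT = 13;     // number of modes in the listing

    private PageSegModeCheck() {}

    /** True if the value is one of the listed page segmentation modes. */
    static boolean isValid(int psm) {
        return psm >= 0 && psm < PSM_COUNT;
    }

    /** True if the listing documents the mode as producing recognized text. */
    static boolean performsRecognition(int psm) {
        return isValid(psm) && psm != PSM_OSD_ONLY && psm != PSM_AUTO_ONLY;
    }

    /** Call before tessBaseAPI.setPageSegMode(psm); throws if the mode runs no OCR. */
    static int requireRecognizingMode(int psm) {
        if (!performsRecognition(psm)) {
            throw new IllegalArgumentException(
                "Page segmentation mode " + psm + " does not run OCR");
        }
        return psm;
    }

    public static void main(String[] args) {
        for (int psm = 0; psm < PSM_COUNT; psm++) {
            System.out.println(psm + " -> runs OCR: " + performsRecognition(psm));
        }
    }
}
```

With this in place, the original call would fail fast with an IllegalArgumentException instead of silently returning empty text.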
You can experiment with different page segmentation modes and see which gives the best result for your images.
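One way to run that experiment is to try each candidate mode on the same image and keep the one that yields the most text. The sketch below is an assumption, not part of the tess-two API: PageSegModeExperiment, pickBestMode() and the "count non-whitespace characters" score are hypothetical, and the recognizer is passed in as an IntFunction<String> so the code runs without Android.

```java
import java.util.function.IntFunction;

// Hypothetical helper for comparing page segmentation modes on one image.
final class PageSegModeExperiment {
    private PageSegModeExperiment() {}

    /** Counts non-whitespace characters: a crude, assumed proxy for "most text found". */
    static int score(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /**
     * Runs the recognizer once per candidate mode and returns the mode with the
     * highest score; on ties, the earliest candidate wins.
     */
    static int pickBestMode(IntFunction<String> recognizeWithMode, int... candidateModes) {
        if (candidateModes.length == 0) {
            throw new IllegalArgumentException("no candidate modes given");
        }
        int bestMode = candidateModes[0];
        int bestScore = -1;
        for (int mode : candidateModes) {
            int s = score(recognizeWithMode.apply(mode));
            if (s > bestScore) {
                bestScore = s;
                bestMode = mode;
            }
        }
        return bestMode;
    }

    public static void main(String[] args) {
        // Fake recognizer standing in for Tesseract, so the sketch runs anywhere.
        IntFunction<String> fake = mode -> mode == 7 ? "INVOICE 2041" : (mode == 3 ? "INV" : "");
        System.out.println(pickBestMode(fake, 3, 6, 7, 11)); // prints 7
    }
}
```

In an Android app, the recognizer lambda would call setPageSegMode(mode), then setImage(bitmap) and getUTF8Text() on your TessBaseAPI instance. Character count is only a rough signal, since noisy images can produce garbage characters; checking each mode against the known text of a few sample images may rank them more reliably.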