Attribute type | Value | Function | Default |
LIBVERSION | string | Returns a string containing the library version. This is a read-only attribute. | none |
VERBOSE | an integer from 0 to 3 | Sets the level of output: 0: nothing; 1: error messages; 2: warnings and errors; 3: everything. Used mostly for debugging. | 1 |
BLOCK_OVERLAP | boolean | If true, allows two blocks to overlap | FALSE |
NO_BLOCK | boolean | If true, and no block was found, creates a block covering whole image. | TRUE |
CHAR_OVERLAP | boolean | If true, allows characters to overlap | TRUE |
CHAR_RECTANGLES | boolean | If true, all characters are selected as rectangles | TRUE |
FIND_ALL | boolean | If true, first find all characters, saving in memory, and then process. | FALSE |
ERROR_FILE | (FILE *) variable | Sets the error messages output file. | stderr |
an integer from 0 to 6 | What is printed: 0: only data bit (. = white, * = black); 1: marked bits (mark1 + 2*mark2 + 4*mark3); 2: data and marked bits (if white, a...h; if black, A...H); 3: only isblock bit (. = is not block, * = is block); 4: only ischar bit (. = is not char, * = is char); 5: complete byte, in hexadecimal; 6: complete byte, in ASCII. | 0 |
PRINT_IMAGE | boolean | If true, gocr_print* functions will print the image associated with the structure. | 1 |
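The print modes 0 to 4 listed above can be illustrated with a minimal sketch. The `sk_pixel` structure and the `sk_pixel_glyph` function are hypothetical names made up for this example and are not part of the GOCR API; the field layout is an assumption, as is the choice to print the mode 1 value as a single digit. Modes 5 and 6 are left out.

```c
/* Hypothetical per-pixel flags; the real GOCR pixel layout may differ. */
typedef struct {
    unsigned value   : 1;  /* data bit: 1 = black, 0 = white */
    unsigned mark1   : 1;
    unsigned mark2   : 1;
    unsigned mark3   : 1;
    unsigned isblock : 1;
    unsigned ischar  : 1;
} sk_pixel;

/* Returns the character that stands for one pixel in print modes 0..4,
 * or '?' for a mode this sketch does not cover. */
char sk_pixel_glyph(sk_pixel p, int mode)
{
    /* mark1 + 2*mark2 + 4*mark3, a value from 0 to 7 */
    unsigned marks = p.mark1 + 2u * p.mark2 + 4u * p.mark3;

    switch (mode) {
    case 0: return p.value ? '*' : '.';                   /* data bit only */
    case 1: return (char)('0' + marks);                   /* marked bits (digit is an assumption) */
    case 2: return (char)((p.value ? 'A' : 'a') + marks); /* a..h if white, A..H if black */
    case 3: return p.isblock ? '*' : '.';                 /* isblock bit only */
    case 4: return p.ischar ? '*' : '.';                  /* ischar bit only */
    default: return '?';
    }
}
```

Under these assumptions, a black pixel with mark1 and mark3 set has a marked-bits value of 5, so mode 2 shows it as the sixth capital letter.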
Module type | Function | Examples |
imageLoader | Loads an image. | Load images. There can be only one image loader. |
imageFilter | Filter the image. | Dust removal, etc. |
blockFinder | Find blocks, i.e., groups of similar data, and add information about their content. | Find pictures, find columns of text, find mathematical expressions. |
charFinder | Frame characters, and add information about their content. | Frame characters, font recognition. |
charRecognizer | Recognize the framed characters. | Italic, bold, Greek specialized OCR. |
contextCorrection | Try to recognize the still unrecognized characters. | Spell checker, ligature checker. |
outputFormatter | Output data to some format and file. | HTML output, LaTeX output. |
Type | Arguments |
gocrBeginWindow | ( wchar_t *title, wchar_t **buttons ) |
gocrEndWindow | |
gocrDisplayCheckButton | |
gocrDisplayImage | |
gocrDisplayRadioButtons | |
gocrDisplaySpinButton | |
gocrDisplayText | |
gocrDisplayTextField |
Type | Symbol | Pixel size |
Black & white | GOCR_BW | 1 |
Grayscale | GOCR_GRAY | 2 |
Color | GOCR_COLOR | 4 |
User-defined | GOCR_OTHER | - |
Block type |
TEXT |
PICTURE |
MATH_EXPRESSION |
Field | Description |
m0 | Top boundary |
m1 | Middle |
m2 | Baseline |
m3 | Bottom |
What is UNICODE?
In GOCR, we adopted the Unicode Standard version 3.0. For the programmer using GOCR, this is a very simple way to deal with characters that are not in the ASCII or the ISO-8859-1 table, and it makes it possible to support any language.
Historically, there have been two independent attempts to create a single unified character set. One was the ISO 10646 project of the International Organization for Standardization (ISO); the other was the Unicode Project, organized by a consortium of (initially mostly US) manufacturers of multi-lingual software. Fortunately, the participants of both projects realized around 1991 that two different unified character sets are not what the world needs. They joined their efforts and worked together on creating a single code table. Both projects still exist and publish their respective standards independently; however, the Unicode Consortium and ISO/IEC JTC1/SC2 have agreed to keep the code tables of the Unicode and ISO 10646 standards compatible, and they closely coordinate any further extensions. Unicode 1.1 corresponds to ISO 10646-1:1993 and Unicode 3.0 corresponds to ISO 10646-1:2000.
Modifier | Characters |
ACUTE_ACCENT | aeiouy AEIOUY |
CEDILLA | c C |
TILDE | ano ANO |
GRAVE_ACCENT | aeiou AEIOU |
DIAERESIS | aeiouy AEIOUY |
CIRCUMFLEX_ACCENT | aeiou AEIOU |
RING_ABOVE | a A |
e or E (æ, œ) | ao AO |
If main is a capital letter, the returned character will also be a capital letter. Support for Greek accents (tonos, dialytika, etc.) is under way.
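The modifier table can be read as a lookup from (main, modifier) to a single Unicode code point. The following is a minimal sketch of that lookup, not the library's gocr_compose: the `sk_` names are hypothetical, the e/E ligature row is left out, and returning main unchanged when no combination is listed is an assumption about the desired fallback.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical modifier identifiers, named after the table rows. */
typedef enum {
    SK_ACUTE_ACCENT, SK_CEDILLA, SK_TILDE, SK_GRAVE_ACCENT,
    SK_DIAERESIS, SK_CIRCUMFLEX_ACCENT, SK_RING_ABOVE
} sk_modifier;

typedef struct {
    sk_modifier mod;
    const char *bases;     /* accepted main characters, lowercase then capital */
    unsigned composed[12]; /* Unicode code points, in the same order as bases */
} sk_rule;

static const sk_rule sk_rules[] = {
    { SK_ACUTE_ACCENT, "aeiouyAEIOUY",
      { 0xE1, 0xE9, 0xED, 0xF3, 0xFA, 0xFD, 0xC1, 0xC9, 0xCD, 0xD3, 0xDA, 0xDD } },
    { SK_CEDILLA, "cC", { 0xE7, 0xC7 } },
    { SK_TILDE, "anoANO", { 0xE3, 0xF1, 0xF5, 0xC3, 0xD1, 0xD5 } },
    { SK_GRAVE_ACCENT, "aeiouAEIOU",
      { 0xE0, 0xE8, 0xEC, 0xF2, 0xF9, 0xC0, 0xC8, 0xCC, 0xD2, 0xD9 } },
    { SK_DIAERESIS, "aeiouyAEIOUY",
      { 0xE4, 0xEB, 0xEF, 0xF6, 0xFC, 0xFF, 0xC4, 0xCB, 0xCF, 0xD6, 0xDC, 0x178 } },
    { SK_CIRCUMFLEX_ACCENT, "aeiouAEIOU",
      { 0xE2, 0xEA, 0xEE, 0xF4, 0xFB, 0xC2, 0xCA, 0xCE, 0xD4, 0xDB } },
    { SK_RING_ABOVE, "aA", { 0xE5, 0xC5 } }
};

/* Returns the composed character, or main_ch unchanged when the table
 * lists no combination (fallback behaviour is an assumption). */
wchar_t sk_compose(wchar_t main_ch, sk_modifier mod)
{
    long c = (long)main_ch;
    size_t i;

    if (c < 1 || c > 0x7F)   /* only ASCII main characters are listed */
        return main_ch;
    for (i = 0; i < sizeof sk_rules / sizeof sk_rules[0]; i++) {
        const char *hit;
        if (sk_rules[i].mod != mod)
            continue;
        hit = strchr(sk_rules[i].bases, (int)c);
        if (hit != NULL)
            return (wchar_t)sk_rules[i].composed[hit - sk_rules[i].bases];
    }
    return main_ch;
}
```

The capital forms are stored explicitly rather than derived by arithmetic, because the capital of ÿ (U+00FF) is U+0178, which does not follow the fixed offset that holds for the other Latin-1 letters.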
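Latin | Greek |
a | α |
b | β |
g | γ |
d | δ |
e | ε |
z | ζ |
h | η |
q | θ |
i | ι |
k | κ |
l | λ |
m | μ |
n | ν |
x | ξ |
o | ο |
p | π |
r | ρ |
& | ς |
s | σ |
t | τ |
y | υ |
f | φ |
c | χ |
v | ψ |
w | ω |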
Table 3.1: Latin → Greek reference for gocr_compose.
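The Greek column of Table 3.1 follows alphabetical order, and those lowercase letters occupy the consecutive Unicode code points U+03B1 (α) to U+03C9 (ω), so the mapping can be sketched as an index into the Latin column. The function name `sk_latin_to_greek` is hypothetical and is not part of GOCR. Mapping capital Latin letters to capital Greek letters, and returning 0 for characters that are not in the table, are assumptions made for this sketch.

```c
#include <string.h>
#include <wchar.h>

/* Maps a Latin character to its Greek counterpart from Table 3.1.
 * Returns 0 when the character is not in the table (an assumption). */
wchar_t sk_latin_to_greek(wchar_t latin)
{
    /* Latin column of Table 3.1, in the order of the Greek alphabet. */
    static const char order[] = "abgdezhqiklmnxopr&styfcvw";
    long c = (long)latin;
    int upper;
    const char *hit;

    if (c < 1 || c > 0x7F)
        return 0;
    upper = (c >= 'A' && c <= 'Z');
    if (upper)
        c += 'a' - 'A';              /* look up the lowercase form */
    hit = strchr(order, (int)c);
    if (hit == NULL)
        return 0;
    /* Greek capitals sit 0x20 below the lowercase letters. '&' is never
     * treated as a capital, so final sigma has no capital form here. */
    return (wchar_t)(0x03B1 + (hit - order) - (upper ? 0x20 : 0));
}
```

For instance, 'q' is the eighth entry of the Latin column, so it maps to the eighth lowercase Greek letter, θ (U+03B8).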
This document was translated from LaTeX by HEVEA.