Number 28    June 15, 1992

The Huntington Technical Brief
By David Brubaker Ph.D.

Fuzzy Evidence in Character Recognition
---------------------------------------

INTRODUCTION

This month we discuss a fuzzy application in character recognition that The Huntington Group developed under contract to Canon Research Center America. In a departure from the previously presented applications, this system is not built on a rule base.

[Not present in this online version of the newsletter is a drawing of a capital "A" with each of the three segments that make an "A" labeled - seg. 1, seg. 2, seg. 3.]

ARCHITECTURE

The goal in character recognition is to determine how well an unknown character matches one of a set of known characters - in our case the fifty-two uppercase and lowercase letters, ten numeric digits, and seven punctuation marks. A fuzzy measure, an indication of how well an unknown entity fits into a crisp set, is ideal for this purpose. The overall structure therefore involves creating templates of the possible characters and determining which (if any) of the templates an unknown character best matches.

To introduce terminology, we start with a "premise", for example: THE SAMPLE CHARACTER IS AN UPPERCASE "A". A "template" of an uppercase A is created, made up of necessary "criteria". These criteria are individually matched to the "characteristics" of the unknown "sample". The matching of a characteristic to a criterion results in an individual piece of "evidence".

Evidence may be either "supportive" or "opposive" (a coined term, intended to indicate the opposite of "supportive"). Supportive evidence supports and opposive evidence opposes the associated premise.

Evidence may also be either "cumulative" or "conclusive". A piece of cumulative evidence does not, in and of itself, prove the premise; the accumulation of cumulative evidence provides increasingly strong support or opposition.
Conclusive evidence either proves or disproves the premise without need for further evidence. All accumulated evidence is combined to form a final verdict.

CHARACTER RECOGNITION

The shapes of printed alphanumeric characters and punctuation can be thought of as collections of interconnected segments, each segment being either a line or a curve - a line is straight, a curve is not. Preprocessing reduces pixel-represented sample characters to the necessary segment representations.

The criteria that make up templates are expressed using these segments, with characteristics expressed in fuzzy terms. For example, a line may be defined to be LONG and its slope to be nearly 90 degrees. A curve may be defined to be SHORT, with its orientation (the equivalent of a line's slope) HORIZONTAL and its arc approximately 180 degrees.

Template criteria based on the characteristics of individual segments alone are not enough. Criteria based on both the connections between segments and the positional relationships among the segments must also be expressed. This can also be done using fuzzy terms.

EXAMPLE - The following three-step sequence determines the degree of validity of the premise: The sample is an uppercase A.

The first step consists of a single criterion that conclusively opposes the premise. Specifically, if any of the unknown sample's segments is a curve, the verdict is set to V = 1, opposive (the sample character is not an A), and further attempts to match the sample to the "A" template are aborted.

The second step is a preliminary pass to see if the character might be an A. Three criteria are specified:

1) That there is a line (seg_1) whose length is LONG, slanting slightly to the right of vertical;
2) That there is another line (seg_2) with length very nearly equal to the length of seg_1, and slanting slightly to the left of vertical; and
3) That there is a third line (seg_3) with length MEDIUM whose slope is HORIZONTAL.
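
As a rough illustration of the first two steps, here is a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the system described here: the Segment record, lengths normalized to character height, angles measured in degrees from horizontal (less than 90 meaning a lean to the right), the trapezoidal membership shapes and their breakpoints, and the use of min to combine evidence.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Segment:
    kind: str      # "line" or "curve"
    length: float  # assumption: normalized to character height
    angle: float   # assumption: degrees from horizontal, in [0, 180)


def trapezoid(x, a, b, c, d):
    """Membership that is 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Fuzzy terms; every breakpoint below is an illustrative assumption.
def is_long(length):
    return trapezoid(length, 0.6, 0.85, 1.3, 1.5)


def is_medium(length):
    return trapezoid(length, 0.2, 0.35, 0.55, 0.7)


def leans_right(angle):
    return trapezoid(angle, 55, 65, 80, 90)


def leans_left(angle):
    return trapezoid(angle, 90, 100, 115, 125)


def is_horizontal(angle):
    # Distance from horizontal, treating 0 and 180 degrees as the same slope.
    return trapezoid(min(angle, 180 - angle), -1, 0, 5, 20)


def nearly_equal(a, b):
    longest = max(a, b)
    if longest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / (0.25 * longest))


def has_curve(segments):
    """Step 1: a single curve is conclusive opposive evidence."""
    return any(s.kind == "curve" for s in segments)


def preliminary_support(segments):
    """Step 2: best degree to which three segments meet the three criteria."""
    best = 0.0
    for s1, s2, s3 in permutations(segments, 3):
        e1 = min(is_long(s1.length), leans_right(s1.angle))
        e2 = min(nearly_equal(s1.length, s2.length), leans_left(s2.angle))
        e3 = min(is_medium(s3.length), is_horizontal(s3.angle))
        # Assumption: evidence is combined with min (fuzzy AND).
        best = max(best, min(e1, e2, e3))
    return best


sample_a = [
    Segment("line", 1.0, 70),   # left stroke
    Segment("line", 1.0, 110),  # right stroke
    Segment("line", 0.45, 0),   # crossbar
]
if not has_curve(sample_a):
    print(preliminary_support(sample_a))
```

Trying every ordered assignment of three segments keeps the sketch short; with only a handful of segments per character the cost is small, though a real matcher would likely prune candidates first.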
Successful matching of sample characteristics to the three criteria forms three pieces of evidence which, when combined, form the preliminary verdict that the unknown character may be an A.

If the verdict is strongly supportive, the third and final set of criteria is considered. These are:

1) That seg_3 be positioned between seg_1 and seg_2;
2) That the leftmost endpoint of seg_3 connect to roughly the midpoint of seg_1;
3) That the rightmost endpoint of seg_3 connect to roughly the midpoint of seg_2; and
4) That the upper endpoints of seg_1 and seg_2 connect.

Once additional evidence has been formed by attempting matches on these criteria, the final verdict as to whether or not the unknown character is an A is available. If the verdict is strongly supportive, the conclusion that the character is an A is considered final, and templates of other characters (B, C, etc.) are not considered. If, however, the verdict is only weakly supportive, matches to the templates of other letters are attempted, with the search order based on which characters might be weakly mistaken for an A (for example, an R or H). Finally, if the verdict is opposive, the standard search sequence is continued.

RESULTS - Preliminary results on non-noisy and partially noisy characters show a high success rate in character recognition. The search process, however, is slow in the software-based solution.

----------------------------------------------------------------

The Huntington Technical Brief is published monthly as part of the marketing effort of Dr. David Brubaker of The Huntington Group. The unedited version, complete with all figures, is available at a subscription price of $24.00 per year. Past issues are available for $1.00 and samples of the Huntington Report are available at no charge. Please call Dr. David Brubaker at the number below for complete details. The 42-page report "Introduction to Fuzzy Logic Systems" is available for $35.00.

For the past sixteen years Dr. Brubaker has provided technical consulting services in the design of complex systems, real-time, embedded processor systems, and, for the past five years, fuzzy logic systems. If you need out-of-house expertise in any of these, please call 415-325-7554.

----------------------------------------------------------------

Copyright 1992 by The Huntington Group
883 Santa Cruz Avenue, Suite 31
Menlo Park, CA 94025-4608

This information is provided by Aptronix FuzzyNet
408-428-1883 Data USR V.32bis