Looking up the word ‘dis’ in the online OED I came upon the following rather amusing quotation from The Independent.
Independent 11 May 2000: Seething at seeing his life’s work in pesticide research being dissed by the organic lobby, he called in the Advertising Standards Authority.
Tuesday, 7 August 2012
Friday, 3 August 2012
Character recognition
16/05/12 15:25 [Wednesday]
I have been thinking about character recognition and
visual field analysis again. The latest I was doing involved trying to settle
on a best resolution, given a visual field, as a prerequisite before getting
into the business of recognising objects at all. What I thought was that
finding a measure of ‘busyness’ and observing how the measure altered as
resolution increased might be the way to go. I see now that if, instead of
busyness, I think in terms of information content, then what is required is
an optimal trade-off between that measure and the resolution, since processing
cost increases as resolution increases. In other words, would-be pattern
recognisers in the animal kingdom need to gain maximum information (ultimately
through recognising objects in the environment) for the least possible
expenditure of time and effort on processing.
What I have further thought is that the measure of ‘clustering’ I
developed should be used as the measure of information content.
Having toyed with simply counting black fragments (on the basis that many
fragments means many objects being observed), it strikes me that, whatever the
number of fragments, if they are better clustered they are better
defined and thereby more likely to give up useful information through being
recognised. Now I can measure clustering for a field of greyscale and this
obviates the need to distinguish black from white. If a pixel at $x_i$
has blackness (inverse greyscale, 0 .. 255) $b_i$ then the
measure of clustering is
$$\sum_{i,j} b_i b_j \exp\left(-d\,(x_i - x_j)^2\right)$$
In effect we are counting each unit of blackness as
a separate black pixel.
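A minimal sketch of the clustering sum in Python, under stated assumptions: positions are integer pixel coordinates, the squared distance is the ordinary two-dimensional one, the decay constant d is set to 1.0 (the text does not fix a value), and the i = j terms are included. The function name `clustering` is purely illustrative.

```python
import math

def clustering(field, d=1.0):
    """Sum of b_i * b_j * exp(-d * |x_i - x_j|^2) over all ordered pixel pairs.

    field: list of rows of blackness values (0 = white, 255 = black).
    d: decay constant (assumed value; not given in the text).
    """
    # Pixels with zero blackness contribute nothing, so skip them up front.
    pixels = [(x, y, b) for y, row in enumerate(field)
              for x, b in enumerate(row) if b]
    total = 0.0
    for xi, yi, bi in pixels:
        for xj, yj, bj in pixels:
            dist2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += bi * bj * math.exp(-d * dist2)
    return total

# Two fields with the same total blackness: one clustered, one scattered.
clustered = [[255, 255, 0, 0],
             [255, 255, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
scattered = [[255, 0, 0, 255],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [255, 0, 0, 255]]
print(clustering(clustered) > clustering(scattered))
```

The comparison at the end illustrates the point of the measure: the same amount of blackness scores higher when it is gathered together than when it is spread to the corners.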
I am wondering whether to use $\sum_{i,j} b_i b_j$ as the function of
resolution that gives an estimate of processing cost.
The processing the computer does is adding up a lot of exponentials, and
processing cost is only saved in cases of $b_i = 0$, but
animal processing systems, I feel, must model each unit of blackness
separately, which leads to very dark fields being puzzling and headachey.
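One thing worth noting about this candidate cost is that the double sum factorises, so it is simply the square of the total blackness. A small sketch checking that, with illustrative function names and made-up sample fields:

```python
def processing_cost(field):
    """Candidate cost estimate: sum over all ordered pairs of b_i * b_j.

    The double sum factorises, so it equals (sum of b_i) squared.
    """
    total_blackness = sum(b for row in field for b in row)
    return total_blackness ** 2

def processing_cost_pairwise(field):
    """Same quantity computed the long way, pair by pair."""
    values = [b for row in field for b in row]
    return sum(bi * bj for bi in values for bj in values)

light = [[0, 10], [20, 0]]          # mostly white field (hypothetical values)
dark = [[200, 250], [255, 240]]     # very dark field (hypothetical values)

print(processing_cost(light) == processing_cost_pairwise(light))
print(processing_cost(dark) > processing_cost(light))
```

On this estimate a very dark field is costly regardless of how its blackness is arranged, which fits the remark about dark fields above.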
30/07/12 13:17 [Monday]
About two weeks ago I wrote a program based on the
ideas above, but found I needed to alter the measure to be maximised to
$$\frac{\sum_{i,j} b_i b_j \exp\left(-d\,(x_i - x_j)^2\right)}{\frac{1}{n}\sum_{i,j} b_i b_j}$$
where $n$ is the number of pixels (width
x height of the rectangular field). The reason is that the numerator has a
number of non-negligible terms proportional to $n$ rather than $n^2$,
because for each pixel $i$ the multiplication is in effect not by the $b_j$
values over the entire field but only by those for which
$\exp\left(-d\,(x_i - x_j)^2\right)$ is non-negligible, and the number of such
pixels is independent of the width or height of the field (as long as width and
height are not too small). The sum in the denominator runs over all $n^2$
pairs, so dividing it by $n$ puts the two on the same footing.
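The effect of the 1/n factor can be checked with a short sketch. It assumes integer pixel coordinates, d = 1.0, and a convention (not from the text) that an all-white field scores zero. The names `ratio_measure` and `uniform` are illustrative only.

```python
import math

def ratio_measure(field, d=1.0):
    """Clustering sum divided by (1/n) * sum over pairs of b_i * b_j."""
    values = [(x, y, b) for y, row in enumerate(field) for x, b in enumerate(row)]
    n = len(values)
    numerator = sum(bi * bj * math.exp(-d * ((xi - xj) ** 2 + (yi - yj) ** 2))
                    for xi, yi, bi in values for xj, yj, bj in values)
    # The pairwise sum of b_i * b_j equals (sum of b_i) squared.
    denominator = sum(b for _, _, b in values) ** 2 / n
    # Assumed convention: an all-white field has no information content.
    return numerator / denominator if denominator else 0.0

def uniform(side, b=1):
    """A side x side field with the same blackness everywhere."""
    return [[b] * side for _ in range(side)]

small = ratio_measure(uniform(4))
large = ratio_measure(uniform(8))
print(small, large)
```

For these uniform fields the two values stay close even though the larger field has four times as many pixels. The remaining difference comes from pixels near the edges, which have fewer neighbours.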
Using this measure to find the best resolution for
the field over my sample of cases (i.e. finding the resolution which maximises
the measure of information content in ratio to the processing cost, as above)
gives results like the following:
It must be admitted these divisions do correspond
well with the natural scale of structures within each image. For the picture of
the garden, each quarter can be seen to be basically light (especially the
quarter showing the sky) or dark. For the portion of a printed letter, the
reason the resolution arrived at is so high (corresponding in fact to the scale
of the width of the lines making up printed characters) is that the black print
shows up so clearly against a very white background.
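The search over resolutions might be sketched as follows. This is not the program described above, whose details are not given here. It assumes that a "resolution" k means averaging the field down to a k x k grid of cells, that k divides both sides of the field, that positions are measured in cell units, and that d = 1.0. All function names and the candidate list are hypothetical.

```python
import math

def downsample(field, k):
    """Average blackness over a k x k grid of cells (assumes k divides both sides)."""
    h, w = len(field), len(field[0])
    ch, cw = h // k, w // k
    return [[sum(field[y][x]
                 for y in range(r * ch, (r + 1) * ch)
                 for x in range(c * cw, (c + 1) * cw)) / (ch * cw)
             for c in range(k)]
            for r in range(k)]

def ratio_measure(cells, d=1.0):
    """Clustering sum divided by (1/n) * sum of b_i * b_j, in cell coordinates."""
    pts = [(x, y, b) for y, row in enumerate(cells) for x, b in enumerate(row)]
    n = len(pts)
    cost = sum(b for _, _, b in pts) ** 2 / n
    if cost == 0:
        return 0.0  # assumed convention for an all-white field
    clustering = sum(bi * bj * math.exp(-d * ((xi - xj) ** 2 + (yi - yj) ** 2))
                     for xi, yi, bi in pts for xj, yj, bj in pts)
    return clustering / cost

def best_resolution(field, candidates=(1, 2, 4, 8), d=1.0):
    """Return the candidate k whose k x k downsampling maximises the ratio."""
    return max(candidates, key=lambda k: ratio_measure(downsample(field, k), d))

# A small made-up field: dark top-right and bottom-left quarters.
field = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [255, 255, 0, 0],
         [255, 255, 0, 0]]
print(best_resolution(field, candidates=(1, 2, 4)))
```

Which resolution wins depends heavily on the choice of d and on how positions are scaled between resolutions, so this sketch should not be expected to reproduce the divisions discussed above.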
The question is where do I take this next? The next
thing is to analyse each subdivision of the image arrived at, using the same
technique of distinguishing light from dark at a natural grainsize. Repeated
subdivision will end when cells are found which are not suitable candidates for
further subdivision because they vary so little in greyscale across their
entire size: this stage will be marked by very low values for the ratio measure
defined above, because there will really be no information content to speak of
within each cell.
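The shape of such a recursion can be sketched with two simplifications, both of which are stand-ins rather than the method proposed here. Each cell is always split into 2 x 2 quarters instead of at a resolution found by maximising the ratio measure. The stopping test is a direct check on the spread of greyscale within the cell (with an arbitrary `tolerance`) instead of a low value of the ratio measure.

```python
def subdivide(field, x0, y0, w, h, tolerance=16, min_size=1):
    """Return leaf cells (x0, y0, w, h) of a recursive 2 x 2 subdivision.

    A cell becomes a leaf when its blackness values vary by no more than
    `tolerance` (assumed threshold) or it is too small to split further.
    """
    values = [field[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    if max(values) - min(values) <= tolerance or w <= min_size or h <= min_size:
        return [(x0, y0, w, h)]
    hw, hh = w // 2, h // 2
    quarters = [(x0, y0, hw, hh), (x0 + hw, y0, w - hw, hh),
                (x0, y0 + hh, hw, h - hh), (x0 + hw, y0 + hh, w - hw, h - hh)]
    leaves = []
    for cx, cy, cw, ch in quarters:
        leaves.extend(subdivide(field, cx, cy, cw, ch, tolerance, min_size))
    return leaves

# A made-up 4 x 4 field that is white apart from one black pixel.
field = [[255, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
for cell in subdivide(field, 0, 0, 4, 4):
    print(cell)
```

Only the quarter containing the black pixel is split again. The three uniform quarters are left whole, which is the behaviour the stopping rule is meant to capture.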