THBImage - image and vector data analysis algorithms

THBImage can read, write and display vector data. The Vectoranalysation package adds tools that run algorithms over the vector data to interpret its content.
Advanced operations such as deskewing scanned documents, despeckling or punch hole removal depend on these vector data analysis functions.

For these operations we convert your raster image documents to vector data that can be analysed.

Preprocessing raster image data
The first step is to convert your raster image data to vector data.
At this stage you have to choose between analysis performance and quality.
Scaling down to a lower resolution improves performance but loses detail. As always, this is a tradeoff that depends on your input data.
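
The resolution tradeoff can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows of integers. The function name downscale and the block-averaging method are illustrative assumptions, not the THBImage API.

```python
def downscale(image, factor):
    """Shrink a grayscale image (list of rows) by averaging factor x factor blocks.

    Hypothetical helper for illustration only; rows and columns that do not
    fill a complete block are dropped.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for by in range(height):
        row = []
        for bx in range(width):
            # Collect all pixels of the current block and average them.
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
small = downscale(image, 2)  # 4x4 input becomes 2x2
```

A factor of 2 leaves a quarter of the pixels for the later steps to process, at the cost of fine detail such as thin strokes.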

The next step is to linearize the raster data using an edge detection algorithm.
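
The text does not say which edge detection algorithm is used. As one possible illustration, the sketch below applies a Sobel operator to a grayscale image stored as a list of rows. The function name sobel_edges and the threshold value are assumptions made for this example.

```python
def sobel_edges(image, threshold=128):
    """Return a binary edge map (1 = edge) for a grayscale image.

    Illustrative Sobel sketch, not the THBImage implementation.
    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


# A vertical step from black to white between columns 1 and 2.
step = [[0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(step)
```

Only the pixels next to the brightness step are marked, which is the kind of outline the contour extraction can then follow.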

Now we can extract vector data contours from the raster image data.
The resulting contours can be viewed together with the original image data on a separate layer using THBView.
With the vector data available, we can calculate the minimum bounding rectangle (MBR), the convex hull and the deskew angle.

Minimum Bounding Rectangle
The Minimum Bounding Rectangle (MBR) for a set of points P is, as the name suggests, the smallest rectangular region with sides parallel to the axes of the data space that encloses all of the points in P.
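
For an axis-parallel rectangle this reduces to taking the minimum and maximum of each coordinate. A minimal sketch, assuming points are given as (x, y) tuples; the function name is hypothetical:

```python
def minimum_bounding_rectangle(points):
    """Return (min_x, min_y, max_x, max_y) for a non-empty list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


mbr = minimum_bounding_rectangle([(1, 5), (4, 2), (3, 7)])  # (1, 2, 4, 7)
```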

Convex hull
The convex hull is the boundary of the minimal convex set containing a given non-empty finite set of points in the plane. Unless the points are collinear, the convex hull in this sense is a simple closed polygonal chain.
For planar objects, i.e. objects lying in the plane, the convex hull can be visualized by imagining an elastic band stretched open to encompass the given object. When released, the band assumes the shape of the convex hull.
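
One common way to compute such a hull is Andrew's monotone chain algorithm. The sketch below is a generic illustration under the assumption that points are (x, y) tuples; it is not necessarily the method used inside THBImage.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b.

    Positive for a counter-clockwise turn, negative for clockwise, 0 if collinear.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a counter-clockwise turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
# The interior point (1, 1) is not part of the hull.
```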


Deskew
Scanned documents can be misaligned, usually because the paper was not placed perfectly straight when scanned.
Deskew is the process of removing this skew from images.
In automated scanning processes skew causes trouble because there is no user who could manually deskew hundreds of scanned pages.
And even if there is a user, why should they do work that the computer can do faster?
Our deskewing algorithm works with 1-bit, 8-bit and RGB data. We preprocess it to get the raster input data we need.
Deskewing speed can be increased by lowering the input data quality.

Deskewing involves multiple steps.
First we detect the deskew angle, then we rotate the image to remove the skew. An optional final step is to crop the page to remove the unnecessary border around the scanned document.
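
The detection method itself is not described here. As a rough illustration of the detect-then-rotate idea, the sketch below estimates the dominant orientation of a set of contour points from their second-order central moments and then rotates the points back. The function names and the moment-based approach are assumptions for this example, and it works on point coordinates rather than on pixel data.

```python
import math


def estimate_skew_angle(points):
    """Estimate the dominant orientation of (x, y) points in degrees.

    Uses second-order central moments (principal axis). Illustrative only;
    the sign convention assumes a mathematical y-up coordinate system.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mu20 = sum((x - mean_x) ** 2 for x, _ in points)
    mu02 = sum((y - mean_y) ** 2 for _, y in points)
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def rotate_points(points, angle_degrees):
    """Rotate (x, y) points counter-clockwise around the origin."""
    a = math.radians(angle_degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in points]


# A text-line-like set of points tilted by 5 degrees.
tilt = math.radians(5)
line = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(100)]

angle = estimate_skew_angle(line)          # close to 5 degrees
deskewed = rotate_points(line, -angle)     # rotate back by the detected angle
```

For the optional crop, the axis-parallel bounding rectangle of the rotated points would give the page region to keep.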