SAFS Image-Based Recognition

Originated: Sept 09, 2008
Updated: Oct 27, 2010 -- Update on OCR and PerImageModifiers.
Updated: Dec 13, 2013 -- Update on new Fuzzy Matching.
Updated: Feb 19, 2016 -- Update on Whole Screen ImageRect and SearchRect.

Contents: Walk-Thru, Syntax, PerImageModifiers, Images, FuzzyMatching, Commands, more...

One of the big issues facing today's test automator is the fast pace at which tools and application technologies are changing. Applications might be developed in Java Swing, RCP\SWT, .NET 2.0 or 3.0, AJAX, Adobe Flex, Google's GWT\Chrome, Delphi, or PowerBuilder; and they might be running in IE6, IE7, Firefox 2.0, Firefox 3.0, or who-knows-what else.

Tool manufacturers can hardly keep pace with the needs of cutting edge application testers. If you are tasked with testing newer or novel technologies then it is usually difficult to find a tool that can work in that environment early in the development and testing lifecycle. When the tools you have can't support the technologies you need to test, you need to turn to something else. This is a good time to consider Image-Based testing.

Image-Based testing allows us to test virtually anything that can be displayed on the screen. It doesn't matter what the underlying technology is; if it is visible on the screen then we can interact with it.

The SAFS Framework to date has largely been based on "Object-Based" testing and SAFS Component Recognition. With Image-Based testing we must now expand our recognition syntax to allow for testing based on finding and interacting with graphics on the screen.


A Walk-Thru:

Test records for image-based testing are the same as they are for object-based testing:

Test Records:
T, IExplorer, Maximize, Click
T, IExplorer, Restore, Click
T, IExplorer, Close, Click

The friendly names for the "window" and the "component" are mapped in the application map the same as for object-based tests, but the syntax facilitates image-based testing:

IExplorer App Map Entry:

[IExplorer]
IExplorer="Image=<imagepath>"

<imagepath> can be the full path to a single image or to a directory containing multiple images. Multiple images are necessary if the target image is different in different environments, for example on different platforms, or on different versions of the application or operating system. The framework will search the screen for each of the images in the directory until it finds a match.

<imagepath> can also be an incomplete 'relative' path to a single image or to a directory of images. The specified path can be relative to the SAFS project directory, or to the SAFS project\Datapool directory.
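The lookup order described above can be sketched as follows. This is only an illustration, not the framework's code: the function names resolve_image_path and list_images are hypothetical, and the order (full path, then project directory, then project\Datapool) is taken from the description in this document.

```python
from pathlib import Path
from typing import List, Optional


def resolve_image_path(image_path: str, project_dir: Path) -> Optional[Path]:
    """Return the first existing location for image_path, or None.

    Hypothetical helper: a full path is used as given; a relative path is
    tried against the project directory, then against project/Datapool.
    """
    # Recognition strings use Windows separators; normalize for pathlib.
    relative = Path(image_path.replace("\\", "/"))
    if relative.is_absolute():
        return relative if relative.exists() else None
    for base in (project_dir, project_dir / "Datapool"):
        candidate = base / relative
        if candidate.exists():
            return candidate
    return None


def list_images(resolved: Path) -> List[Path]:
    """A single file yields one image; a directory yields every file in it."""
    if resolved.is_dir():
        return sorted(p for p in resolved.iterdir() if p.is_file())
    return [resolved]
```

A caller would then try each image returned by list_images in turn until one of them is found on the screen.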

Sample <imagepath>:

IExplorer

Right away we can tell that finding the IExplorer icon in the top-left of the titlebar does not give us enough information to locate significant areas within the IExplorer window. We can find any individual image this way and act on areas relative to it. But in many cases it will be beneficial to identify more specific bounds for the area we want to search or act upon.

We can enhance our IExplorer recognition to include an image to the right of the anchor that identifies the width or right-side bounds of the area of interest. We can add an ImageRight or ImageR modifier to find the Close icon in the titlebar. The framework knows to search only within the narrow band to the right of the anchor icon for this ImageR image.

Enhanced <imagepath>:

IExplorer

IExplorer

By capturing these two small images, we are now able to complete our App Map and execute all three of the test records listed above. This works because we used two images to define the area of interest of our "window" and the framework knows to only seek "components" within this area of interest.

IExplorer Titlebar

IExplorer App Map Entries:

[IExplorer]
IExplorer="Image=IExplorer\;ImageR=Titlebar\Close\"
Maximize="Image=Titlebar\Close\;Hotspot=-17"
Restore="Image=Titlebar\Close\;Hotspot=-17"
Minimize="Image=Titlebar\Close\;Hotspot=-35"
Close="Image=Titlebar\Close\"

As you can see, we can also define Hotspots that tell the framework to act on a point offset from the location where an image was actually found. We specified Hotspot=-17 (17 pixels to the left of the Close icon) to act on the Maximize and Restore controls, and Hotspot=-35 to act on the Minimize control, without having to capture their images.


Image-Based Recognition Syntax

Image=

The primary anchor image to seek, identifying the item or area of interest on the screen. When seeking a "window" mapping, the entire screen is searched for this image. When seeking a "component" mapping, the search area is limited to the area of interest found for the "window" mapping. The bounds of the area of interest can be expanded by using the optional ImageR and ImageB items described below.

Sample:

IExplorer="Image=<imagepath>"

ImageW|ImageWidth|ImageR|ImageRight=

Optional. A second (and/or third) image to seek to expand the area of interest to the right of the anchor image. This expands the width of the area of interest. This can be used with 'ImageH=' to fully identify the area of interest rectangle by width and height.

Sample:

IExplorer="Image=<imagepath>;ImageR=<imagepath>"

ImageH|ImageHeight|ImageB|ImageBottom=

Optional. A second (and/or third) image to seek to expand the area of interest down from the anchor image. This expands the height of the area of interest. This can be used with 'ImageR=' to fully identify the area of interest rectangle by width and height.

Samples:

IExplorer="Image=<imagepath>;ImageH=<imagepath>"
IExplorer="Image=<imagepath>;ImageW=<imagepath>;ImageH=<imagepath>"
IExplorer="Image=<imagepath>;ImageB=<imagepath>;ImageR=<imagepath>"

ImageText=

An IBT search mode used to locate a component (non-top window) via OCR text. No recognition image needs to be stored for the component. Only the Tesseract OCR engine supports it at this time. This mode can help SAFS IBT locate a child component without resorting to stored images.

This mode is not well supported for NLS testing at this time, and the accuracy of the Tesseract OCR could be better. However, if the text you seek is reliably found by this method it saves on having to capture and maintain child recognition images.

This mode supports the use of the following modifiers to fine tune the search and the ultimate point of interest:

SearchRect	Limit the area of the search in the "window" area.
Index	Find the Nth match.
PointRelative	Default Hotspot relative to the text location.
Hotspot	Coordinate offset relative to the Default Hotspot.

Samples:

Component="ImageText=Log Off"

Index|Ind=

Optional. Specifies to find the Nth instance of the anchor image. The first instance is N=1 which will be sought by default if Index is not specified.

Samples:

IExplorer="Image=<imagepath>;Index=3"
IExplorer="Image=<imagepath>;Index=3;ImageR=<imagepath>"

BitTolerance|BT=

Optional. Specifies the integer percentage (1-100) of image bits or pixels that must match for an image to be considered a successful match. The default is 100, which means ALL pixels must match unless some other BitTolerance is specified. NOTE: BitTolerance must be less than 100 for fuzzy logic to be invoked by the search algorithm. See Fuzzy Matching for more information.

Samples:

IExplorer="Image=<imagepath>;BitTolerance=70"
IExplorer="Image=<imagepath>;ImageR=<imagepath>;BT=75"
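As a rough illustration of what a BitTolerance percentage means, the sketch below compares a stored image against a same-sized screen region and accepts the match when enough pixels are identical. The function name and the representation of pixels as integers in nested lists are assumptions made for the example; the actual SAFS comparison code may differ.

```python
from typing import Sequence

Pixel = int  # assumed: a packed RGB value; any comparable pixel type works


def matches_with_tolerance(
    stored: Sequence[Sequence[Pixel]],
    screen: Sequence[Sequence[Pixel]],
    bit_tolerance: int = 100,
) -> bool:
    """True if at least bit_tolerance percent of the pixels are identical.

    Assumes both grids have the same dimensions.
    """
    if not 1 <= bit_tolerance <= 100:
        raise ValueError("BitTolerance must be an integer from 1 to 100")
    total = sum(len(row) for row in stored)
    matched = sum(
        1
        for stored_row, screen_row in zip(stored, screen)
        for a, b in zip(stored_row, screen_row)
        if a == b
    )
    # Integer arithmetic avoids float rounding right at the threshold.
    return matched * 100 >= total * bit_tolerance
```

With 7 of 10 pixels matching, BitTolerance=70 accepts the region while BT=75 and the default of 100 reject it.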

Hotspot|HS=

Optional. The X,Y offset for specific actions like Click relative to the overall area of interest. The default Hotspot is the center of the area of interest and X,Y offsets specified in the recognition string are generally relative to this center point. Negative values for X and Y are allowed and will be clipped to screen edge as needed. The X,Y offset values can be separated by a comma or a space char.

Samples:

Close="Image=<imagepath>;Hotspot=offsetX,offsetY"
Close="Image=<imagepath>;Hotspot=-12"
Close="Image=<imagepath>;Hotspot=-12,0"
Close="Image=<imagepath>;Hotspot=-12 3"
Close="Image=<imagepath>;Hotspot= ,3"
Close="Image=<imagepath>;Hotspot=0,3"

PointRelative|PR=

Optional. The PR Value is used to change the default location of the Hotspot for the area of interest. For example, we can change the default hotspot to be "TopLeft" or "BottomRight". If a Hotspot= value is also specified then the Hotspot= offsets are relative to this PR value.

Valid PR= values are:

  TopLeft       TL
  TopCenter     TC
  TopRight      TR
  LeftCenter    LC
  Center        C
  RightCenter   RC
  BottomLeft    BL
  BottomCenter  BC
  BottomRight   BR

Samples:

Maximize="Image=<imagepath>;PointRelative=LeftCenter"
Maximize="Image=<imagepath>;PR=LC;Hotspot=-5" (5 pixels to the left of the left edge, vertically centered)
Maximize="Image=<imagepath>;HS=0,5;PointRelative=BottomRight" (5 pixels below the bottom right corner)
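The interaction between PointRelative and Hotspot can be sketched as below. The names parse_hotspot and action_point are hypothetical. The sketch also assumes the right and bottom edges sit at x+width and y+height, and it clips only negative results to 0 because the screen size is not known here.

```python
from typing import Tuple

# (name, abbreviation, fraction of width, fraction of height)
_POINTS = [
    ("TopLeft", "TL", 0.0, 0.0), ("TopCenter", "TC", 0.5, 0.0),
    ("TopRight", "TR", 1.0, 0.0), ("LeftCenter", "LC", 0.0, 0.5),
    ("Center", "C", 0.5, 0.5), ("RightCenter", "RC", 1.0, 0.5),
    ("BottomLeft", "BL", 0.0, 1.0), ("BottomCenter", "BC", 0.5, 1.0),
    ("BottomRight", "BR", 1.0, 1.0),
]
_ANCHORS = {}
for _name, _abbr, _fx, _fy in _POINTS:
    _ANCHORS[_name.lower()] = (_fx, _fy)
    _ANCHORS[_abbr.lower()] = (_fx, _fy)


def parse_hotspot(text: str) -> Tuple[int, int]:
    """Parse 'x,y', 'x y', 'x', or ' ,y'; a blank offset counts as 0."""
    if "," in text:
        x_text, y_text = text.split(",", 1)
    else:
        parts = text.split()
        x_text = parts[0] if parts else ""
        y_text = parts[1] if len(parts) > 1 else ""
    return (int(x_text) if x_text.strip() else 0,
            int(y_text) if y_text.strip() else 0)


def action_point(
    rect: Tuple[int, int, int, int],
    point_relative: str = "Center",
    hotspot: Tuple[int, int] = (0, 0),
) -> Tuple[int, int]:
    """Start at the PointRelative anchor of rect, then add the Hotspot offsets."""
    x, y, w, h = rect
    fx, fy = _ANCHORS[point_relative.lower()]
    px = x + int(w * fx) + hotspot[0]
    py = y + int(h * fy) + hotspot[1]
    # Assumption: clip only at the top-left screen edge.
    return (max(0, px), max(0, py))
```

For an area of interest at x=400, y=200, w=100, h=300, the default action point is (450, 350), and PR=LC with Hotspot=-5 gives (395, 350).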

SearchRect|SR=

Optional. SearchRect alters the area (X, Y, Width, and Height) searched for a target image. For a "window" mapping, SearchRect can alter the area of the screen searched. For a "component" mapping, SearchRect alters the search rectangle relative to the found "window" area.

For component searches, SearchRect can now be used to expand or modify the area to be searched--not just reduce the scope of a search. This enables very creative component searches that can actually be almost anywhere relative to the window rectangle previously found.

X, Y, W, and H are individually optional and their presence or absence will define or modify the search rectangle accordingly. Missing X and\or Y values will default to 0. Missing W and\or H values will default to the maximum width and height of the screen when searching for a window image, and to the width and height of the window rectangle when searching for a component relative to the window.

Coordinates can be comma OR space delimited, but only use one or the other.

All values can be absolute or can be specified as a percentage. For component searches relative to a window the percentage is not limited to 100%. For example, it is reasonable to expand the width of a search rectangle by more than 100%. All percentages for component searches are considered to be percentage of window width and height NOT window position(X,Y) or screen width and height(W,H).

Window Search Samples:

TopEdgeItem="Image=<imagepath>;  SearchRect=0,0, ,75" (Search only the top 75 pixels)
RightEdgeItem="Image=<imagepath>;SR=750,0"        (Start search 750 pixels from left)
LeftEdgeItem="Image=<imagepath>; sr=0,0,25%"     (Search only the left 25% of screen)
TaskBarItem="Image=<imagepath>;  SR=0,90%"     (Search only the bottom 10% of screen)

Component Search Samples:

a) TitleBarItem="Image=<imagepath>;  SearchRect= ,  , , 10%"
b) OffsetTitlebarItem="Image=<imagepath>;SR=0,-15, 10, 120%"
c) LeftSideItem="Image=<imagepath>; sr= -150%, -5, 160%, 30"

a) The TitleBarItem SearchRect above indicates that only the top 10% of the found window rectangle should be searched for the required component image. This limits the search to what is often considered to be the Titlebar of a window.
(Of course, we aren't always looking for a window. Sometimes we are just looking for a reference image anywhere on the screen.)

EXAMPLE TitleBarItem SearchRect= 0, 0, 0, 10%

Modified rectangle for the component search is: x=400, y=200, w=100, h=30

b) The OffsetTitlebarItem SearchRect shows we want to start our search 15 pixels higher (y-15) than we otherwise would search. This is useful if the component image we are seeking is not exactly inline or inside the window rectangle we are working with. This SearchRect is also expanding the width of our search by 10 pixels (w+10) and setting the height of our search to 120% of the window height (h*120%).
(We often want to alter the width and\or height of the search to accommodate changes we might have made to the x and y coordinates for the start of the search.)

EXAMPLE OffsetTitlebarItem SearchRect= 0, -15, 10, 120%

Modified rectangle for the component search is: x=400, y=185, w=110, h=360

c) The LeftSideItem SearchRect shows an example where we aren't looking for something inside our "window". Here we are trying to find an image that is to the left of our initial window image--outside the bounds of the "window". In this case, we are moving the component search rectangle 'x' coordinate left of the window by 150% of the window width (w*150%). If the window rectangle is 50 pixels wide then we are moving the start of our component search (x) 75 pixels to the left of the window rectangle. This SearchRect is also starting our search 5 pixels higher (y-5), setting the search width to 160% of the window width (w*160%), and adding 30 pixels to the height of the search (h+30).

EXAMPLE LeftSideItem SearchRect= -150%, -5, 160%, 30

Modified rectangle for the component search is: x=250, y=195, w=160, h=330

Notice how the LeftSideItem will be sought to the left of the original "window" and not inside it.
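The three worked examples above are consistent with a found window rectangle of x=400, y=200, w=100, h=300, and can be reproduced with the sketch below. The function name component_search_rect is hypothetical, and the rules it applies are inferred from those examples rather than taken from the framework source: absolute X and Y shift the start of the search, absolute W and H are added to the window size, and percentages are always taken of the window width or height.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def _fields(search_rect: str) -> List[str]:
    """Split on commas if any are present, otherwise on whitespace."""
    if "," in search_rect:
        parts = [p.strip() for p in search_rect.split(",")]
    else:
        parts = search_rect.split()
    parts += [""] * (4 - len(parts))  # missing trailing fields stay blank
    return parts[:4]


def _percent(field: str, base: int) -> int:
    return round(base * float(field.rstrip("%")) / 100)


def component_search_rect(window: Rect, search_rect: str) -> Rect:
    """Apply a component SearchRect to the found window rectangle."""
    wx, wy, ww, wh = window
    fx, fy, fw, fh = _fields(search_rect)

    def offset(field: str, base: int) -> int:
        # X and Y: blank means no shift; a percentage is of the window size.
        if not field:
            return 0
        return _percent(field, base) if field.endswith("%") else int(field)

    def extent(field: str, base: int) -> int:
        # W and H: blank keeps the window size, a percentage scales it,
        # and an absolute value is added to it (inferred from the examples).
        if not field:
            return base
        return _percent(field, base) if field.endswith("%") else base + int(field)

    return (wx + offset(fx, ww), wy + offset(fy, wh),
            extent(fw, ww), extent(fh, wh))
```

This sketch covers component searches only; for a "window" search the values describe an area of the screen directly.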

Whole Screen Child Searches

It is now possible to search for "child" component images without specifying a parent "window" anchor. Instead, the Window item should be specified as a user-defined search rectangle--which can be the whole screen or any subarea of the screen.

The Window item mapping specifying a search rectangle can use any of the following equivalents:

SearchRect=
SearchRectangle=
ImageRect=
ImageRectangle=

Sample:

[IExplorer]
IExplorer="SearchRect=0,0,1024,768"       or
IExplorer="SearchRectangle=0,0,1024,768"  or
IExplorer="ImageRect=0,0,1024,768"

The X, Y, W, and H entries can optionally be specified as a percentage of the screen:

Sample:

[IExplorer]
IExplorer="SearchRect=0,0,50%,25%"       (top-left of screen)
IExplorer="ImageRect=50%,50%,100%,100%"  (bottom-right of screen)
IExplorer="ImageRect=0,0,100%,100%"      (the whole screen)

A simple example to find target images anywhere on the screen:

[AWindow]
AWindow="SearchRect=0,0,100%,100%"
AnImage="Image=<imagePath>"
AnotherImage="Image=<imagePath>"
AComponent="Image=<imagePath>"
AnotherComponent="Image=<imagePath>"

'UsePerImageModifiers'

We have added the capability for a different SearchRect, Index, and BitTolerance to be specified for every image in a multi-image definition--like the kind generally used to identify the corners of a Window. An example using SearchRect is shown below:

Win="Image=image\topleft.tif;SR=,,,20%;ImageW=image\topright.tif;SR=0,-10,100%,30"

To enable this feature, you must set the following in your test INI file:

[SAFS_IBT]

UsePerImageModifiers=TRUE

Without UsePerImageModifiers set to TRUE, the first modified SearchRect found will be applied to the initial anchor image (Image=) only, and no modified SearchRect will be applied to ImageW or ImageH images. The same per-image usage is made available for Index and BitTolerance with this setting.
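The difference between the two modes can be illustrated with a small parser. This is a sketch under assumptions, not the framework's parser: the function name group_modifiers and the dictionary layout are hypothetical, only SearchRect, Index, and BitTolerance are handled, and in legacy mode the first occurrence of each modifier is attached to the anchor image.

```python
from typing import Dict, List

_IMAGE_KEYS = {
    "image": "Image",
    "imagew": "ImageW", "imagewidth": "ImageW",
    "imager": "ImageW", "imageright": "ImageW",
    "imageh": "ImageH", "imageheight": "ImageH",
    "imageb": "ImageH", "imagebottom": "ImageH",
}
_PER_IMAGE_KEYS = {
    "sr": "SearchRect", "searchrect": "SearchRect",
    "ind": "Index", "index": "Index",
    "bt": "BitTolerance", "bittolerance": "BitTolerance",
}


def group_modifiers(recognition: str, per_image: bool) -> List[Dict[str, str]]:
    """Group SearchRect/Index/BitTolerance with the image they apply to."""
    images: List[Dict[str, str]] = []
    for item in recognition.split(";"):
        key, _, value = item.partition("=")
        key = key.strip().lower()
        if key in _IMAGE_KEYS:
            images.append({"role": _IMAGE_KEYS[key], "path": value.strip()})
        elif key in _PER_IMAGE_KEYS and images:
            # Per-image mode: attach to the most recently named image.
            # Legacy mode: only the first occurrence counts, on the anchor.
            target = images[-1] if per_image else images[0]
            target.setdefault(_PER_IMAGE_KEYS[key], value.strip())
    return images
```

Applied to the Win example above, per-image mode gives topleft.tif the SearchRect ",,,20%" and topright.tif the SearchRect "0,-10,100%,30", while legacy mode keeps only ",,,20%" on the anchor image.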

UsePerImageModifiers is actually correcting a defect in the original implementation. It was always intended to work this way. However, we have retained the old "broken" functionality as default to retain backward compatibility with existing tests.


Image Formats for Image-Based Recognition

Icons or images stored for screen matching must be in one of the formats supported by the JAI API:

BMP
FPX
GIF
JPEG
PNG
PNM
TIFF

It is important to note that images must be saved in a format that loses no pixel information. Stored images must be able to match with 100% picture quality the image snapshots that will be retrieved from the screen. While "BitTolerance" discussed above allows for some degree of comparison fuzziness, it will usually not be able to compensate for stored images that cannot reproduce 100% picture quality due to excessive compression or intentional loss of pixel information.

It is also important to note there are differences in display configurations that will likely require separate images to support them. For example, performing the same test in the following display configurations--even on the same machine--might require a different set of images for each configuration:

Normal Workstation Display
Remote Desktop Display
Remote Web (ex: Juniper) Display

This is not an issue of screen resolution. Images stored for a particular Display typically work for most or all screen resolutions on that Display.

The issue is that each Display is configured for a different level of data compression. Bitmaps stored for the Normal Display have no data compression and no loss of image information. The displayed image for the Remote displays is usually compressed--intentionally removing image information. Because of this, Normal Display images usually will not match Remote Display images.

To compensate for this, it is highly recommended that recognition images always be captured in the display mode that will be used for runtime testing. For example, if you know all testing will be done via Remote Desktop sessions, then it is best to have all recognition images captured and prepared during Remote Desktop sessions.

Note:*** Remote testing over VNC does NOT have this display problem! *** The target machine is actually using its Normal Display (uncompressed) so the images that work for the Normal Display continue to work locally even when manipulated or viewed remotely.


Fuzzy Matching:

There are occasions where strict image comparisons will not find an exact 1-to-1 match of the target image. BitTolerance (BT) allows us to get past some of these problems, but not all. There are cases where dynamic onscreen image dithering, color blending, and transparency make it almost impossible to accurately match a stored image to a dynamically generated onscreen location. For these cases, you might want to try Fuzzy Matching.

When fuzzy matching is turned on, and the original algorithm fails to find the image, then it will make a fuzzy logic attempt to match the image. This algorithm will expand the match attempt for each pixel tested to not just a single pixel at a specified location, but also the 8 pixels adjacent to the pixel being tested. In this way, inconsequential image dithering and color blending normally causing comparison failures have a better chance of allowing a successful match.

It is important to note, however, that the improved chance of a successful match also heightens the chance of a false match--a "false positive"--indicating a particular location on the screen is an image match when really it is not. This concern makes it critical that the target images stored for matching are truly uniquely identifiable on the screen and not easily matched incorrectly to the wrong location on the screen. It also means that BitTolerance must be carefully used to help guide the algorithm to correct matches, and not false positives.

It is also important to note that enabling a 9-to-1 fuzzy matching comparison across the entire screen, or any subarea of the screen, is by nature a much larger performance hit than the normal 1-to-1 comparison algorithm. Consequently...

You do NOT want to leave fuzzy matching on unnecessarily. The algorithm can be a huge performance hit when you are conceivably performing nine pixel tests for every 1 pixel test normally attempted.

To help reduce unnecessary performance hits, the algorithm does NOT attempt fuzzy matching if BitTolerance (BT) is not specified, or is set to 100--an exact match. Thus, to enable fuzzy matching requires a two-step process. The user must enable Fuzzy Matching with the SetImageFuzzyMatching Driver Command, and the recognition strings for the image(s) to be fuzzy matched must have a BT less than 100.
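The 9-to-1 comparison and the two-step enabling rule can be sketched as follows. Both function names are hypothetical, and the plain equality test on integer pixels is an assumption; the sketch only illustrates why each fuzzy pixel test can cost up to nine comparisons and when the fuzzy pass is attempted at all.

```python
from typing import Sequence


def fuzzy_pixel_match(
    screen: Sequence[Sequence[int]], sx: int, sy: int, expected: int
) -> bool:
    """True if expected is at (sx, sy) or at any of its 8 adjacent pixels."""
    height = len(screen)
    width = len(screen[0]) if height else 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            x, y = sx + dx, sy + dy
            # Neighbors that fall off the screen are simply skipped.
            if 0 <= x < width and 0 <= y < height and screen[y][x] == expected:
                return True
    return False


def should_try_fuzzy(fuzzy_enabled: bool, bit_tolerance: int = 100) -> bool:
    """Fuzzy matching needs SetImageFuzzyMatching ON and a BT below 100."""
    return fuzzy_enabled and bit_tolerance < 100
```

A recognition string with no BT, or with BT=100, never reaches the fuzzy pass even while SetImageFuzzyMatching is ON.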

Example App Map:

[Window]
;define an area on the screen to be considered our "Window"
Window=Image=Images\Product\WindowAnchorDir;ImageR=Images\Product\WindowRDir

;define a child within that "Window" space that *might* require fuzzy matching
Child1=Image=Images\Product\Child1Dir;BT=70

Example TestRecord Usage:

;testing has shown that fuzzy matching not needed here
T, Window, Window, Click

;testing has shown that fuzzy matching IS needed for Child1
C, SetImageFuzzyMatching, ON
T, Window, Child1, Click
C, SetImageFuzzyMatching, OFF

Note: In the above test records, the search for Window does NOT use fuzzy matching--even when fuzzy matching is turned ON. This is because the App Map recognition string for Window does not specify a BitTolerance (BT). Thus, only the search for Child1 will take the performance penalty for using fuzzy matching.


Supported Commands:

The following commands are supported for Image-Based Testing:

Component Functions:

Click	a.k.a. ClickScreenImage
ClickScreenPoint	Absolute location. No Window or Component needed.
ClickScreenLocation	Absolute location + relative offsets. No Window or Component needed.

RightClick	a.k.a. RightClickScreenImage
RightClickScreenPoint	Absolute location. No Window or Component needed.
RightClickScreenLocation	Absolute location + relative offsets. No Window or Component needed.

DoubleClick	a.k.a. DoubleClickScreenImage
DoubleClickScreenPoint	Absolute location. No Window or Component needed.
DoubleClickScreenLocation	Absolute location + relative offsets. No Window or Component needed.

MultiClick	a.k.a. MultiClickScreenImage

CtrlClick	a.k.a. CtrlClickScreenImage
CtrlRightClick	a.k.a. CtrlRightClickScreenImage
ShiftClick	a.k.a. ShiftClickScreenImage

LeftDrag
RightDrag

HoverMouse
HoverScreenLocation	Absolute location + relative offsets. No Window or Component needed.

InputKeys
InputCharacters
TypeKeys	No Window or Component necessary.
TypeChars	No Window or Component necessary.

GuiDoesExist
GuiDoesNotExist

GetGUIImage
VerifyGUIImageToFile
LocateScreenImage	Get location and dimensions of Window or Component.

GetTextFromGUI	OCR Text from Component and save to a variable.
SaveTextFromGUI	OCR Text from Component and save to a file.

    T, WindowID, WindowID , Click
    T, WindowID, CompID   , Click, "Coords=20;45"
    T, Notepad , CloseIco , RightClick
    T, Notepad , Titlebar , DoubleClick
    T, WindowID, CompID   , LeftDrag, Left2Right
    T, WindowID, CompID   , LeftDrag, "10;10;200;20"

    T, Notepad , Notepad  , InputCharacters, "Any Text Here"

    T, Notepad , Notepad  , InputKeys, "{TAB}{DOWN 3}{ENTER}%{F4}"
    T, Notepad , Notepad  , InputKeys, "%{F4}"

The Click commands do not yet support the AppMapSubKey parameter as documented in the SAFS Keyword Reference.

Consult the InputKeys Map for the format of keystrokes for InputKeys.

Driver Commands:

GetTextFromImage	Text OCR support.
SaveTextFromImage	Text OCR support.

FilterImage	Remove specific content from image.

CaptureMousePositionOnScreen	Retrieve the X,Y coordinates of the mouse cursor.

SetImageDebug	Enable verbose IBT debug info when Debug Log is enabled.
SetImageFuzzyMatching	Enable enhanced fuzzy logic IBT image comparison when enabled.

WaitForGui
WaitForGuiGone
OnGuiExistsGotoBlockID
OnGuiNotExistGotoBlockID

    C, WaitForGui     , WindowID , WindowID, 10
    C, WaitForGui     , WindowID , CompID  , 15

    C, WaitForGuiGone , WindowID , WindowID, 30
    C, WaitForGuiGone , WindowID , CompID  , 5

The WaitFor commands do support the default 15 second timeout when TIMEOUT is not specified.
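The waiting behavior can be pictured as a polling loop with a deadline. The sketch below is an assumption about the general shape only: the function name wait_for_gui, the is_present callback, and the half-second poll interval are all hypothetical, and only the 15 second default comes from this document.

```python
import time
from typing import Callable

DEFAULT_TIMEOUT_SECONDS = 15  # default when TIMEOUT is not specified


def wait_for_gui(
    is_present: Callable[[], bool],
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
    poll_interval: float = 0.5,  # assumed value, not from the SAFS docs
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_present until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if is_present():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A WaitForGuiGone equivalent would pass a callback that returns True once the image can no longer be found. The clock and sleep parameters exist only so the loop can be exercised without real delays.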


Sample Window and Components "recognition strings"

WindowID="Image=[pathTo]\image.ext[;hotspot=x[,y]][;pointrelative=constant]"
=================================================================================

If image fullpath is not provided a path relative to the Project is assumed. If project-relative path is not found then path relative to project\Datapool is assumed.

For a WindowID, the bounds of the single image specify the total bounds of the deduced rectangle for that "Window" object. A subsequent CompID search will first be attempted within the bounds of that "Window". If the image is not found there, then the top-left corner of the "Window" rectangle defines the top-left coordinate of the remaining area to search on the screen for the CompID image.

Example:

1. WindowID="Image=\image.ext;HotSpot=2,2;PointRelative=TopLeft"

   (hotspot is 2,2 pixels in from the top-left corner of the
   deduced rectangle)

2. WindowID="Image=\image.ext;hs=2,2;pr=tl"

   (hotspot is 2,2 pixels in from the top-left corner of the
   deduced rectangle)

3. WindowID="Image=\image.ext;HS=-10;pr=LeftCenter"

   (hotspot is 10 pixels to the left and vertically centered on
   the left edge of the deduced rectangle)

4. WindowID="Image=\image.ext;hotspot= ,-10;pr=TopCenter"

   (hotspot is 10 pixels above and horizontally centered on the
   top edge of the deduced rectangle)

5. WindowID="Image=\image.ext;pr=BottomRight"

   (hotspot is the bottom-right corner of the deduced rectangle)


WindowID="Image=[pathTo]\image1.ext;ImageRight=[pathTo]\image2.ext;....."
===============================================================================

If image fullpath is not provided a path relative to the Project is assumed. If project-relative path is not found then path relative to project\Datapool is assumed.

The top-left image1 will be sought first. Once found, the top-right image2 will be sought within the vertically limited bounds defined by top-left image1 along with any Top and Bottom OutSets, if provided.

For a WindowID, the coordinates for top-left image1 and top-right image2 determine the top-left corner and width of the deduced rectangle for that "Window" object. The height of the window will be from that deduced top edge down to the bottom of the screen. A subsequent CompID search will be limited to the bounds of that "Window".

Example:

1. WindowID="Image=\topleft.ext;ImageR=\topright.ext;HotSpot=0,2;PointRelative=TopCenter"

   Hotspot is 2 pixels down from the top-center edge of the deduced rectangle.
   The width of the deduced rectangle is limited by the outer coordinates at which
   topright.ext was found.

2. WindowID="Image=\topleft.ext;ImageR=topright.ext;hs=-10;pr=LeftCenter"

   Hotspot is 10 pixels to the left and vertically centered on the left edge of the deduced
   rectangle. The width of the deduced rectangle is limited by the outer coordinates at which
   topright.ext was found.

Recall that ImageBottom (ImageB) can be used to similarly limit the deduced height of the rectangle. ImageB can be used with the anchor Image alone, or in conjunction with ImageRight to fully define the width and height of the target window or component.
