1. Image Identification
The task of image identification, or copy detection, consists in taking a query image and finding the original from which it possibly derives, together with any relevant metadata, such as titles, authors, copyright information, etc. The query is typically a transformed copy of the original, and the transformations may include:
- Geometric transforms: mainly translations, rotations and scale changes;
- Photometric and colorimetric transforms: changes in brightness, contrast, saturation or colour balance;
- Selection and occlusions: the image may have been cropped to select just a detail; it may also present labels, stamps, annotations, censor bands, etc.;
- Noise: of several kinds, including compression artifacts (from lossy compression like JPEG), electronic noise from cameras and scanners, etc.;
- Analogue/digital conversion: the initial acquisition of the digital image, as well as reprinting and rescanning, may cause considerable distortion.
This large variety of transformations makes the task very challenging.
Institutions possessing large sets of images, like museums, archives and press agencies, are often asked to perform the identification of images in newspaper clippings, books, theses and even postcards, where the references are missing, too brief, outdated or incorrect. Sometimes the only source indication is a copyright notice.
In those cases, the visual information is the only reliable evidence for identifying the document. But because such collections consist of hundreds of thousands of items, manual search is infeasible. An operator with good knowledge of the collection may use conventional search tools to narrow the search and retrieve the desired image, but even so, hundreds of possibilities may remain and the whole process may take hours. The simple question, “Which image is this?” becomes difficult to answer, frustrating the users.
Armitage and Enser (1997) identify the need to check the identification, attribution and provenance of items as one of the major types of inquiries users present to libraries and archives. This is not surprising, since the meaning of a document depends on its context, and the lack of metadata reduces its usefulness.
Image identification is also needed within the boundaries of an institution, when images get separated from their metadata at one of the steps of complex workflows. In this case, quality control will identify images with missing or incorrect metadata, and proceed to rectify the situation. A concrete example is the image database of a conservation / restoration laboratory. The study of a single artwork will generate many images, including general views, close-ups of details, pictures taken under different illumination conditions, etc. Those images are only useful in the context of a dossier of analysis. If one of them gets lost and is later found in isolation, it is important to be able to retrieve its dossier (Lewis et al., 2004, § III.A).
One of the most prominent applications of image identification is the tracking of the reproduction of visual documents, either to reveal the history of exploitation of a given document, or to enforce copyright.
Traceability makes it possible to uncover the exploitation of an image, which may be interesting both for understanding usage and for establishing links between different collections.
Unauthorized reproduction, on the other hand, is a serious issue for authors and rights holders, especially with the availability of digital tools and the Internet, which make it very easy to make and disseminate high-quality copies of protected material.
Photography agencies, for example, make their profit from licensing the use of their images. If they suspect someone is using one of their images without paying the due fees, they can hire an expert to determine if the suspect image is indeed derived from one of their images. This process can be greatly streamlined if the software automatically detects, in a set of images, those which probably come from the database, associating to them the most likely originals.
2. Traditional Methods for Image Identification
Traditional image retrieval systems are based on textual annotations, either added manually by a human operator or automatically harvested from the context of the image (e.g., captions and text around the image in the page). In those systems, users can try different searches, using different combinations of keywords, until the desired original is found.
Textual search, however, has serious shortcomings:
- Manual annotation is a very labour intensive task, and text harvesting depends on the availability of contextual text;
- The annotations may be unavailable, incorrect, or too brief;
- The users may not know enough about the image to provide textual descriptors. For example, they may want to find the original of a portrait precisely in order to learn whose portrait it is;
- Synonyms, homonyms, polysemes, specialisation and generalisation also make textual search difficult.
An alternative is the use of watermarks. This technique embeds a certain amount of information into the visual content of the image, normally almost imperceptibly to the human eye. This may be used to include a unique identifier in each image.
Though theoretically interesting, this technique faces many practical challenges:
- The watermarks are not robust to strong image transformations;
- Someone aware of the presence of the watermark may remove it or change it;
- The watermarks always distort the image a little, and for a given method, there is a trade-off between the robustness and the degree of distortion;
- The watermarks only work for images reproduced after their adoption; all the copies already circulating will remain unidentifiable;
- Most analogue sources cannot use the technique, for example, a picture in a museum which patrons can photograph.
Content-based image retrieval (CBIR), even if not specifically conceived for image identification, may be a powerful ally, since it relies only on the visual aspect of the documents, needing no keywords, annotations or watermarks, which can be unavailable, lost or removed. The image on hand may be used as a query to search for images with a similar distribution of colours, textures and shapes.
The major disadvantage of those systems, in comparison to systems designed specifically for image identification, is their similarity criterion, which is not always adequate for retrieving the original images. For example, a system based on colour distributions will not be able to find the original if the reproduction was converted to greyscale.
Generally speaking, semantic-oriented CBIR systems are not as well suited for image identification as dedicated systems, because their matching criteria do not take into account the kinds of distortions most commonly expected in this specific task.
3. Content-Based Identification with Local Descriptors
Image identification systems are a specialization of CBIR systems, and share with them many characteristics. Both use the concept of descriptors in order to establish the similarity between the images, instead of trying to compare directly the raw visual contents. Those descriptors can be either global, meaning they are based on the entire image, or local, meaning they are based on local features of the images, like regions or points of interest (PoI).
Systems based on PoI have shown excellent performance in the task of image identification (Valle et al., 2006, 2007). They work by identifying points in the image which are in some way more distinctive than the others, and which remain stable in spite of transformations. A local descriptor is computed around each of those points.
In order to perform the identification, first, in an offline phase, which is done only once, the image set is prepared: each image has its PoI detected and described, and the descriptors are indexed in a large database in order to facilitate the matching. Then, in the online phase, the PoI of the query image are detected and described, and each descriptor is matched with the descriptors in the database. Each matched descriptor votes for the image to which it belongs (Figure 1).
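The online voting phase can be sketched as follows. This is an illustrative sketch only: a brute-force nearest-neighbour scan stands in for the Hilbert index, and the distance threshold is an assumed parameter, not a value from the paper.

```python
import numpy as np
from collections import Counter

def identify(query_descs, db_descs, db_image_ids, max_dist=250.0):
    """Each query descriptor votes for the image owning its nearest
    database descriptor; images are then ranked by their vote counts.
    The linear scan below stands in for the indexed similarity search."""
    votes = Counter()
    for q in query_descs:
        dists = np.linalg.norm(db_descs - q, axis=1)  # distance to every DB descriptor
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:                   # reject matches that are too far
            votes[db_image_ids[best]] += 1
    return votes.most_common()                        # images ranked by votes received
```

Even with some descriptors matched to the wrong image, only the correct image accumulates a significant share of the votes, which is what makes the ranking robust.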
This method is robust, because the descriptors are many. If some descriptors are lost due to occlusions or cropping, enough will remain to guarantee good results. Even if some descriptors are matched incorrectly, giving votes for the wrong images, only a correctly identified image will receive a significant amount of votes.
The identification is made even more robust in a later step, where the geometric consistency of the matches is enforced. This filters out random matches, by requiring that the pairings follow a consistent spatial transformation. Almost all false positives can be eliminated this way.
Unfortunately, the multiplicity of descriptors brings a performance penalty, since hundreds, even thousands, of matches must be found in order to identify a single image. The matter is made worse if the descriptors are high-dimensional, which is often the case. Because of the infamous curse of dimensionality, each match operation becomes very expensive.
4. Indexing Local Descriptors with Space-Filling Curves
Response time is an important usability consideration for the search system. A fast system provides a more comfortable interaction with the users, inviting the exploration of alternative query formulations, which ultimately improves their ability to find the desired answer.
The emphasis on quick response times depends on the intended application. In some contexts, speed, while still a desirable property, is of secondary importance. A user searching for an image in a state archive, for example, will be willing to wait several minutes in order to get a correct answer, since the back-up strategy – manual search – may take entire weeks. In this case, efficacy, the capability of obtaining a correct answer, is more important than efficiency, the capability of doing it fast.
In other applications, however, efficiency is essential. One example is copyright enforcement, in which we look for the intersection between two image databases. If the system is to have practical use, each individual search operation must take at most a few seconds, because we will typically be performing thousands of queries at a time.
In our system architecture, most of the time is spent on the matching of the local descriptors, since there are hundreds of descriptors in a single image. Matching the descriptors is done by performing a similarity search, i.e., looking for the descriptors in the database which are the most similar to those in the query.
Unfortunately, due to a phenomenon known as the curse of dimensionality, the time needed to perform the similarity search grows exponentially with the dimensionality of the descriptor. Because we use SIFT descriptors (Lowe, 2004), which have 128 dimensions, we had to use specially adapted indexes to avoid incurring unacceptable penalties.
Our indexes are based on Hilbert’s space-filling curves. Those special fractal curves have the surprising property of filling the entire hyper-dimensional space, and are therefore able to map the multidimensional descriptors onto a single dimension (Hilbert, 1891). Points that are near each other on the curve always correspond to points that are near each other in the hyper-dimensional space (e.g., the two stars in Figure 2), so it is a good heuristic to take the nearest points on the curve as the nearest points in the space.
Unfortunately, the converse is not always true: points that are near in the space are not always near on the curve (e.g., the two hexagons in Figure 2). This is caused by boundary effects, which tend to place points in certain regions of the space very far apart on the curve. The matter is seriously aggravated as the dimensionality increases. In order to minimize this problem, our method uses multiple curves, each responsible for a subset of the dimensions.
This way, two advantages are gained:
- Each curve is of moderate dimensionality, which alleviates the boundary effects;
- Points that are incorrectly separated by one curve will probably stay together on another, thus compensating for the boundary effects of any single curve.
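To make the mapping concrete, the classic two-dimensional Hilbert encoding can be written as a short bit-manipulation routine. This is purely illustrative: the actual indexes apply the same idea to higher-dimensional sub-spaces of the 128-dimensional descriptors.

```python
def xy_to_hilbert(n, x, y):
    """Map a point (x, y) on an n-by-n grid (n a power of two) to its
    position along the Hilbert curve.  Nearby curve positions always
    correspond to nearby grid points, but not necessarily vice versa."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip the quadrant as the curve requires
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting descriptors by this single integer key is what allows a one-dimensional data structure to serve multidimensional similarity queries.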
Besides its performance, our index has the advantage of being dynamic, i.e., it handles well the insertion and deletion of data. Each curve is stored in a sorted list implemented as a B-tree (Bayer and McCreight, 1972).
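The behaviour of one such curve can be sketched with a sorted list, using Python’s bisect module as a stand-in for the B-tree; the class and method names here are illustrative, not the paper’s actual implementation.

```python
import bisect

class CurveIndex:
    """Toy one-curve index: keys are positions along the space-filling
    curve, kept sorted (as a B-tree would); values identify descriptors.
    A real system keeps one such structure per curve."""
    def __init__(self):
        self.keys, self.values = [], []

    def insert(self, key, value):                  # dynamic insertion
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def delete(self, key):                         # dynamic deletion
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]
            del self.values[i]

    def nearest(self, key, k=2):
        """Return the values of the k entries closest to `key` on the curve."""
        i = bisect.bisect_left(self.keys, key)
        lo, hi = max(0, i - k), min(len(self.keys), i + k)
        cand = sorted(range(lo, hi), key=lambda j: abs(self.keys[j] - key))
        return [self.values[j] for j in cand[:k]]
```

Candidate lists from the several curves would then be merged before the final distance computation, so that points separated by one curve’s boundary effects are recovered from another.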
5. System Operation
Usage of the system is very simple. There are separate modules for management of the index and for the search operations.
New images, as they are added to the system, have their local descriptors computed and inserted in the Hilbert indexes. The image files and associated metadata are also stored.
In order to search the database, the user opens an image file or acquires a new image from a scanner or camera. The PoI and descriptors are computed for the query image (Figure 3, left).
The system then matches the descriptors of the query to the descriptors in the database, using the Hilbert indexes. Geometric consistency is verified, using a robust model estimator, to eliminate spurious matches. The surviving matches are counted, and the result images are presented to the user in descending order of the number of votes they received. The matched region is highlighted in each result image (Figure 3, right).
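The geometric-verification step can be sketched as a RANSAC-style estimator. For brevity this sketch assumes a pure-translation model; the actual estimator would also have to account for rotation and scale, and the function name and tolerance are illustrative.

```python
import random
import numpy as np

def filter_matches(src_pts, dst_pts, iters=200, tol=3.0, seed=0):
    """RANSAC-style consistency filter over matched point pairs,
    assuming a pure-translation model: hypothesize a translation from
    one pairing, keep the hypothesis explaining the most pairings.
    Returns the indices of the consistent (inlier) matches."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        i = rng.randrange(len(src_pts))              # hypothesize from one pairing
        t = dst_pts[i] - src_pts[i]                  # candidate translation
        resid = np.linalg.norm((src_pts + t) - dst_pts, axis=1)
        inliers = np.where(resid <= tol)[0]          # pairings this hypothesis explains
        if len(inliers) > len(best):
            best = inliers
    return best
```

Only the inliers of the winning model are counted as votes, which is why nearly all false positives are eliminated at this stage.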
Finally, the user can obtain the details on the desired result, viewing the image and its metadata. The matched points are highlighted (Figure 4).
Using our system, a difficult query, which would otherwise require a frustrating trial-and-error with textual keywords – or worse, a painstaking browsing of images – can be answered in a few seconds, with minimal user intervention.
Mr. Eduardo Valle is sponsored by a CAPES scholarship through the CAPES/COFECUB program.
Armitage, L.H. and P.G.B. Enser. Analysis of user need in image archives. Journal of Information Science, vol. 23, no. 4. 1997. pp. 287–289. DOI: 10.1177/016555159702300403
Bayer, R. and E.M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, vol. 1, no. 3. Berlin / Heidelberg: Springer. September, 1972. pp. 173–189. DOI: 10.1007/BF00288683.
Hilbert, D. Über die stetige Abbildung einer Linie auf ein Flächenstück. Mathematische Annalen, vol. 38, no. 3. Berlin / Heidelberg: Springer. September, 1891. pp. 459–460. DOI: 10.1007/BF01199431.
Lewis, P.H., K. Martinez, F.S. Abas et al. An integrated content and metadata based retrieval system for art. IEEE Transactions on Image Processing, vol. 13, no. 3. IEEE Signal Processing Society. March, 2004. pp. 302–313. DOI: 10.1109/TIP.2003.821346.
Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, vol. 60, no. 2. Netherlands: Springer. November, 2004. pp. 91–110. DOI: 10.1023/B:VISI.0000029664.99615.94.
Valle, E, M. Cord and S. Philipp-Foliguet. Content-based Retrieval of Images for Cultural Institutions using Local Descriptors. In Geometric Modelling and Imaging – New Trends – GMAI 2006. London, England: IEEE Computer Society. 2006. pp. 177–182. DOI: 10.1109/GMAI.2006.16
Valle, E., S. Philipp-Foliguet and M. Cord. Image Identification using Local Descriptors. In ImagEVAL 2006 Workshop. Amsterdam, Netherlands. July 12, 2007. 5 p. URL: http://www.imageval.org/Workshop/ENSEA_ETIS_T01_ImagEVAL06_p.pdf
Valle, E., et al. Content Based Image Identification on Cultural Databases. In International Cultural Heritage Informatics Meeting (ICHIM07): Proceedings, J. Trant and D. Bearman (eds). Toronto: Archives & Museum Informatics. Published October 24, 2007. URL: http://www.archimuse.com/ichim07/papers/valle/valle.html