Abstract
The amount of information on the World Wide Web (WWW) has been increasing tremendously. It has been observed that the more information is available about a given subject, the more difficult it is to locate accurate and relevant information. However, the WWW is not just about textual information. The prevalence of digital cameras has brought about an exponential growth in the number of images found on the World Wide Web. These images can be a good source of information and may prove useful for various applications. Indexing web images and building image search engines could help users quickly find the images they seek, which can then be used for many purposes. The large volume of web images has necessitated the development of efficient methods for indexing, searching, and retrieving web images. The research community has responded to this challenge, and we have witnessed the development of a plethora of techniques and working systems in the last two decades. Furthermore, some commercial engines have appeared, including Google Image Search, Lycos, and the AltaVista Photo Finder. The effectiveness of these engines is still limited.
In this thesis, we address the following two issues. The first is predicting the terms in an image's alternate text using other textual descriptors. Providing search services for web images has been difficult. Traditional image retrieval systems assign annotations to each image manually. Although retrieving images through text retrieval technologies is a sound methodology, it is gradually becoming impossible to annotate images manually one by one due to the huge and rapidly growing number of web images. Automatic image annotation has therefore become an active research area. A common view is that the semantics of web images are well correlated with their associated texts. Because of this, several popular search engines offer web image search based only on the associated texts. Alternate text (the ALT attribute) is considered the most important of all associated texts. The ALT attribute is designed to provide an alternative textual description of an image's contents. It represents the semantics of an image and provides useful information to anyone using a browser that cannot display images or that has image display disabled. However, a recent study has shown that around half of the images on the web have no ALT text at all. We propose an approach to predict the terms in the ALT text of an image based on term co-occurrences. We explore different approaches to this prediction problem: term co-occurrences alone, and term co-occurrences combined with natural language processing techniques in which the terms in noun phrases and verb phrases are given more weight. We conclude that both approaches work well. However, for the prediction task, we prefer the term co-occurrence approach without natural language processing techniques, as it is language independent. We build two image annotation systems, one on top of the proposed approach and one on top of the proposed approach combined with natural language processing techniques, and find that the latter achieves better performance than the former.
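As an illustrative sketch of the term co-occurrence approach (the function names and toy data below are our own assumptions for exposition, not the exact formulation evaluated in the thesis), co-occurrence counts between descriptor terms and ALT terms can be collected from training pages and then used to rank candidate ALT terms for a new image:

from collections import Counter, defaultdict

def train_cooccurrence(pages):
    # pages: iterable of (descriptor_terms, alt_terms) pairs, where the
    # descriptor terms come from other textual descriptors of the image
    # (e.g., page title, surrounding text) and the ALT terms from its ALT text.
    cooc = defaultdict(Counter)
    for descriptor_terms, alt_terms in pages:
        for d in set(descriptor_terms):
            for a in set(alt_terms):
                cooc[d][a] += 1
    return cooc

def predict_alt_terms(cooc, descriptor_terms, k=5):
    # Rank candidate ALT terms by their summed co-occurrence with the
    # descriptor terms observed for the new image.
    scores = Counter()
    for d in set(descriptor_terms):
        scores.update(cooc.get(d, Counter()))
    return [term for term, _ in scores.most_common(k)]

# Toy training data: (descriptor terms, ALT terms) per page.
model = train_cooccurrence([
    (["sunset", "beach", "holiday"], ["sunset", "sea"]),
    (["beach", "sand", "sea"], ["beach", "sea"]),
])
print(predict_alt_terms(model, ["beach", "holiday"]))  # e.g., ['sea', 'sunset', 'beach']

The variant with natural language processing techniques would additionally weight descriptor terms occurring inside noun phrases or verb phrases more heavily before scoring.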
We also explore a slightly different but related problem in the context of web image retrieval: how users behave in a multilingual information access task. Search logs are a good source of information about how user needs are expressed, and this information can be used to improve the quality of the search interface and the search experience. We analyze the search logs generated by an online game based on known-item image retrieval from Flickr, a popular image store on the web. We present our experiments to mine these search logs and draw conclusions about the behavior of users facing a strictly multilingual information access task. We group the users based on their score, the number of hints taken, and their precision, in order to study the behavior of the most successful users, the least successful users, and those in between. We analyze two types of questionnaires, the Found Image Questionnaire (completed after a user finds an image) and the Give Up Questionnaire (completed when a user gives up), to learn about the issues users faced while searching for an image. We also study how language skills affect user behavior. The majority of web image search is text-based, and the success of such approaches often depends on reliably identifying relevant text associated with a particular image. Our findings
confirm research results reported in previous studies, such as the fact that finding images is still quite challenging for users, even when the images have been tagged by a community such as Flickr. Our results show that most users start with the monolingual interface but soon realize that the cross-lingual interface is more useful, and that users are more comfortable searching in their mother tongue or a language they are proficient in.
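As a hypothetical sketch of the log analysis described above (the record fields and thresholds are assumptions for illustration; the actual game logs have their own schema), users can be grouped by score, hints taken, and precision:

import statistics

# Hypothetical per-user log summaries.
users = [
    {"user": "u1", "score": 120, "hints": 2, "found": 8, "attempts": 10},
    {"user": "u2", "score": 45, "hints": 9, "found": 3, "attempts": 12},
    {"user": "u3", "score": 80, "hints": 4, "found": 5, "attempts": 9},
]

def precision(u):
    # Fraction of search attempts in which the target image was found.
    return u["found"] / u["attempts"]

median_score = statistics.median(u["score"] for u in users)

# Split users into the most successful, the least successful, and those
# in between, using score and precision as a simple success criterion.
top = [u for u in users if u["score"] >= median_score and precision(u) >= 0.5]
bottom = [u for u in users if u["score"] < median_score and precision(u) < 0.5]
middle = [u for u in users if u not in top and u not in bottom]

for name, group in [("most successful", top), ("in between", middle), ("least successful", bottom)]:
    if group:
        avg_hints = statistics.mean(u["hints"] for u in group)
        print(f"{name}: {len(group)} users, average hints taken = {avg_hints:.1f}")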