Abstract
Sponsored search is one of the dominant modes of online advertising on the web. In sponsored search, advertisers bid on relevant keywords to advertise their products. For an incoming search query, advertisements from the ad campaigns containing the query keywords are shown along with the search results. If multiple advertisers compete to be shown on the same query’s results page, they are ranked for the allocation of ad space. The ranking is determined by multiple factors, including the advertiser's bid amount on the query keywords, the relevance of the ad content to the search query, the Click-Through-Rate (CTR), and the advertiser's budget. The sponsored search ecosystem has three main stakeholders: the search engine, advertisers, and users. Search engines aim to maximize the revenue earned by showing ads. Advertisers want to maximize the reach of their service or increase sales. Users see advertisements on the search results page. Some of the key research challenges in sponsored search are query-ad relevance, click-through-rate prediction, optimal auction design, optimal utilization of the ad space of rare (tail) queries, and click-fraud detection.
In this thesis, we investigate approaches to exploit the ad space of tail queries in sponsored search. It is well established that search queries tend to follow a heavy-tailed Zipf distribution, wherein a large fraction of queries occur very infrequently. This set of infrequent queries is called the long tail. Advertising on long tail queries is challenging because they are encountered rarely, which makes them hard to interpret for sponsored search. It has also been observed that during keyword auctions, advertisers tend to bid on the head keywords to reach more users. This creates high demand for the head query keywords and little or no demand for the tail query keywords. The long tail phenomenon also makes it difficult for an advertiser to capture the relevant keywords from the long tail. Together, these factors result in under-utilization of a significant amount of the ad space provided by tail queries in sponsored search, which is the research issue addressed in this thesis.
We propose two approaches for utilizing the ad space of tail queries, both based on the idea that advertisers should bid on high-level concepts instead of keywords. In the first approach, concepts are organized into a two-level taxonomy. The second approach generalizes the first by organizing concepts into a multi-level taxonomy.
In the first approach, we propose an improved framework to cover more advertisers by exploiting the notions of coverage and concept taxonomy based on the log data of search queries. The proposed framework forms distinct groups of keywords such that each group can be allocated to meet the demands of an advertiser. We model each search session as a transaction, and the queries occurring in a session form the items of the transaction. Coverage patterns are then mined from these session-based transactions. The coverage patterns fuse tail keywords together into multiple groups to maximize the number of unique visitors, and the two-level concept taxonomy ensures that the groups are meaningful. A comprehensive framework is proposed to map the extracted coverage patterns to the demands of the advertisers in order to allocate incoming queries to advertisers. We conducted experiments on a real-world dataset from the AOL web search engine and found that the proposed approach makes it possible to meet the demands of more advertisers. We also found that the proposed approach provides diverse but meaningful groups of keywords, which allow an advertiser to display advertisements to appropriate users based on its requirements.
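As an illustration of the session-as-transaction model, the following minimal sketch computes the coverage of a keyword group over a set of sessions. It assumes the usual notion of coverage as the fraction of transactions that contain at least one keyword of the group; the function name `coverage` and the sample sessions are hypothetical and are not taken from the thesis or the AOL dataset, and the sketch omits the mining step and the taxonomy constraint.

```python
from typing import Iterable, List, Set


def coverage(pattern: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain at least one keyword of the pattern.

    Assumption: this is the plain coverage measure; the thesis may apply
    additional constraints (for example on overlap) that are not modelled here.
    """
    keywords = set(pattern)
    if not transactions:
        return 0.0
    # A session is covered when it shares at least one query with the pattern.
    covered = sum(1 for session in transactions if keywords & session)
    return covered / len(transactions)


# Hypothetical search sessions; each session is one transaction of queries.
sessions = [
    {"cheap flights", "hotel deals"},
    {"hotel deals"},
    {"car rental"},
    {"weather"},
]

print(coverage({"cheap flights", "car rental"}, sessions))
```

Grouping two individually rare queries raises the share of sessions reached, which is the intuition behind fusing tail keywords into groups.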
In the second approach, we propose that during ad space auctions advertisers bid on high-level concepts represented by a multi-level taxonomy instead of on search keywords. Advertisers are free to bid on any node of the taxonomy, which gives them more flexibility in targeting potential consumers. However, the flat model of coverage patterns cannot be used to allocate the child nodes of the nodes that receive bids. To address the interdependency among concepts, we exploit search query logs and a taxonomy to extract level-wise coverage patterns. We further propose an end-to-end architecture that takes search query logs, a taxonomy, and advertising demands as inputs and allocates incoming search queries to advertisers for sponsored search. Experiments on a real-world dataset of AOL search query logs show improved performance with respect to ad space utilization and the reach of the advertisements.
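The idea that a bid on a taxonomy node also applies to the concepts beneath it can be sketched as follows. This is an illustrative assumption about how bids might propagate down a multi-level taxonomy, not the level-wise coverage pattern extraction or the allocation architecture of the thesis; the taxonomy, the advertiser names, and the helpers `ancestors` and `eligible_advertisers` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical multi-level taxonomy stored as child -> parent links.
PARENT: Dict[str, Optional[str]] = {
    "Travel": None,
    "Flights": "Travel",
    "Hotels": "Travel",
    "Budget Hotels": "Hotels",
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the node followed by its ancestors, from most to least specific."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def eligible_advertisers(
    concept: str,
    bids: Dict[str, List[str]],
    parent: Dict[str, Optional[str]],
) -> List[str]:
    """Advertisers whose bid node is the concept itself or one of its ancestors."""
    return [adv for node in ancestors(concept, parent) for adv in bids.get(node, [])]


# Hypothetical bids: advertiser A bids on a broad node, B on a narrower one.
bids = {"Travel": ["A"], "Hotels": ["B"]}
print(eligible_advertisers("Budget Hotels", bids, PARENT))
```

Under this assumption, a query mapped to a deep concept is a candidate for every advertiser that bid on a node along its path to the root, which is why a flat model that treats concepts independently is not sufficient.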
Overall, we have proposed two approaches for improving the utilization of ad