Martin Klein and Michael L. Nelson
Abstract. Missing web pages (pages that return the 404 “Page Not Found” error) are part of the browsing experience. The manual use of search engines to rediscover missing pages can be frustrating and unsuccessful. We compare four automated methods for rediscovering web pages. We extract the page’s title, generate the page’s lexical signature (LS), query the bookmarking website delicious.com for the page’s tags, and generate an LS from the page’s link neighborhood. We use all methods to query Internet search engines and analyze their retrieval performance. Our results show that both LSs and titles perform fairly well, with over 60% of URIs returned top ranked from Yahoo. However, the combination of methods improves the retrieval performance. Considering the complexity of the LS generation, querying the title first and, in case of insufficient results, querying the LSs second is the preferable setup. This combination accounts for more than 75% of URIs being returned top ranked.
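The title-first, LS-second setup described in this abstract can be illustrated with a minimal sketch. It assumes that a lexical signature is the k terms of a page with the highest TF-IDF score, which is a common formulation but not confirmed by the abstract. The function names, the smoothed IDF formula, and the `search` callable (a stand-in for a search engine client that returns a ranked list of URIs) are hypothetical.

```python
import math
from collections import Counter


def lexical_signature(page_terms, corpus, k=5):
    """Return the k page terms with the highest TF-IDF score.

    page_terms: list of (already tokenized) terms from the page.
    corpus: list of sets of terms, one set per background document.
    The smoothed IDF below is an assumption, not the authors' formula.
    """
    tf = Counter(page_terms)
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((n_docs + 1) / (df + 1))

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]


def rediscover(title, signature, search, target_uri, top_n=10):
    """Query with the title first, then fall back to the lexical signature.

    search: hypothetical callable mapping a query string to a ranked
    list of URIs. Returns (query, rank) for the first query that finds
    target_uri within the top_n results, or None if neither does.
    """
    for query in (title, " ".join(signature)):
        results = search(query)[:top_n]
        if target_uri in results:
            return query, results.index(target_uri) + 1
    return None
```

Trying the title first keeps the cheap query on the common path; the signature, which needs a background corpus to compute, is only used when the title query does not return the page.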
Search Behaviors in Different Task Types (Full Paper)
Jingjing Liu, Michael Cole, Chang Liu, Ralf Bierig, Jacek Gwizdka, Nick Belkin, Jun Zhang and Xiangmin Zhang
Abstract. Personalization of information retrieval tailors search towards individual users to meet their particular information needs by taking into account information about users and their contexts, often through implicit sources of evidence such as user behaviors. Task types have been shown to influence search behaviors including usefulness judgments. This paper reports on an investigation of user behaviors associated with different task types. Twenty-two undergraduate journalism students participated in a controlled lab experiment, each searching on four tasks which varied on four dimensions: complexity, task product, task goal and task level. Results indicate regular differences associated with different task characteristics in several search behaviors, including decision time (the time taken to decide whether a document is useful or not) and eye fixations. We suggest these behaviors can be used as implicit indicators of the user's task type.
Exploiting Time-based Synonyms in Searching Document Archives (Full Paper)
Nattiya Kanhabua and Kjetil Norvag
Abstract. Recently a large number of easily accessible information resources have become available. To improve search quality, document creation time can be taken into account to increase precision, and query expansion of named entities can be employed to increase recall. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance. In this paper, we present an approach to extract synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relation changes over time. Further, we describe how to make use of both types of synonyms to increase the retrieval effectiveness (precision and recall), i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
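The distinction between the two synonym types can be sketched as a small query expansion routine. This is an illustration under stated assumptions, not the authors' method: the `Synonym` record, the year-granularity validity interval, and the choice to also keep time-independent synonyms in a temporal search are all hypothetical, and the entity names in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Synonym:
    """A synonym of a named entity (hypothetical record layout).

    start and end are inclusive years; leaving both as None marks the
    synonym as time-independent.
    """
    term: str
    start: Optional[int] = None
    end: Optional[int] = None

    def is_time_independent(self) -> bool:
        return self.start is None and self.end is None

    def valid_in(self, year: int) -> bool:
        after_start = self.start is None or self.start <= year
        before_end = self.end is None or year <= self.end
        return after_start and before_end


def expand_query(entity: str,
                 synonyms: Dict[str, List[Synonym]],
                 year: Optional[int] = None) -> List[str]:
    """Expand a named-entity query with its synonyms.

    Without a year (ordinary search), only time-independent synonyms
    are added. With a year (temporal search), every synonym valid in
    that year is added; time-independent ones are valid in any year.
    """
    expanded = [entity]
    for syn in synonyms.get(entity, []):
        if year is None:
            if syn.is_time_independent():
                expanded.append(syn.term)
        elif syn.valid_in(year):
            expanded.append(syn.term)
    return expanded
```

For example, with the placeholder table `{"acme corp": [Synonym("acme"), Synonym("acme holdings", 1990, 1999)]}`, an ordinary search for "acme corp" is expanded with "acme" only, while a search restricted to 1995 also picks up "acme holdings".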