Scholarly Paper Recommendation via User’s Recent Research Interests (Full Paper)
Kazunari Sugiyama and Min-Yen Kan
Abstract. We examine the effect of modeling a researcher’s past works in recommending scholarly papers to the researcher. Our hypothesis is that an author’s published works constitute a clean signal of the latent interests of a researcher. A key part of our model is to enhance the profile derived directly from past works with information coming from the past works’ referenced papers as well as papers that cite the work. In our experiments, we differentiate between junior researchers who have published only one paper and senior researchers who have multiple publications. We show that filtering these sources of information is advantageous: when we additionally prune noisy citations, referenced papers and publication history, we achieve statistically significantly higher levels of recommendation accuracy.
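The profile enhancement described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words term counts, the weights `w_ref` and `w_cit`, the cosine ranking, and all function names are assumptions made here for illustration only.

```python
from collections import Counter
import math

def term_vector(text):
    """Bag-of-words term counts (assumed representation, lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def build_profile(own, references, citing, w_ref=0.5, w_cit=0.5):
    """Combine a researcher's own papers with referenced and citing papers.

    own, references, citing: lists of paper texts.
    w_ref, w_cit: hypothetical down-weights for the two neighbouring sources.
    """
    profile = Counter()
    for weight, texts in ((1.0, own), (w_ref, references), (w_cit, citing)):
        for text in texts:
            for term, count in term_vector(text).items():
                profile[term] += weight * count
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(profile, candidates):
    """Return candidate titles ordered by descending similarity to the profile."""
    return sorted(candidates,
                  key=lambda title: -cosine(profile, term_vector(candidates[title])))
```

Pruning noisy citations or references, as the abstract mentions, would amount to filtering the `references` and `citing` lists before they are passed to `build_profile`; the abstract does not state the pruning criterion, so none is shown here.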
Effective Self-Training Author Name Disambiguation in Scholarly Digital Libraries (Full Paper)
Anderson Ferreira, Adriano Veloso, Marcos Goncalves and Alberto Laender
Abstract. Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions for the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, such systems urgently need (i) automatic acquisition of examples, and (ii) highly effective disambiguation even when only a few examples are available. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrate that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and has effectiveness close, and in some cases superior, to supervised ones, without manually labeling any training example.
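The first step of the method, grouping citation records by coauthor similarity, can be illustrated with a small sketch. The abstract does not specify the similarity criterion, so the rule used here (two records belong together if they share at least one coauthor name, applied transitively) is an assumption, as are the function name and the input format. This is not the SAND implementation.

```python
def cluster_by_coauthors(records):
    """Group citation records of one ambiguous name by shared coauthors.

    records: list of sets of coauthor names (the ambiguous author excluded).
    Returns a sorted list of clusters, each a sorted list of record indices.
    Assumed rule: records sharing any coauthor name end up in the same cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # coauthor name -> index of the first record containing it
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

In a pipeline of the kind the abstract describes, clusters like these would serve as automatically acquired training examples for the supervised second step; records without coauthors stay as singletons in this sketch.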
Citing for High Impact (Full Paper)
Xiaolin Shi, Jure Leskovec and Daniel McFarland
Abstract. The question of citation behavior has always intrigued scientists from various disciplines. While general citation patterns have been widely studied, we develop the notion of citation projection graphs by investigating the references between the publications that a given paper cites. We investigate how patterns of citations vary between scientific disciplines and how such patterns reflect the impact of the paper. We find that idiosyncratic citation patterns are used by low-impact papers, while narrow, discipline-focused citation patterns are used by medium-impact papers. Cross-community, or bridging, citation patterns are high risk and high reward, since they result in either low- or high-impact papers. Last, we observe a trend in paper citation networks over time toward more bridging and interdisciplinary forms.
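A citation projection graph, as described in this abstract, consists of the publications a given paper cites together with the references among them. A minimal sketch follows, assuming the citation data is available as a mapping from each paper to the papers it cites; the function name and data layout are hypothetical.

```python
def citation_projection(paper, cites):
    """Build the citation projection graph of `paper`.

    cites: dict mapping a paper id to an iterable of paper ids it cites
           (assumed input format).
    Returns (nodes, edges): nodes are the papers cited by `paper`; edges are
    directed citation links (a, b) where both a and b are among those nodes.
    """
    nodes = set(cites.get(paper, ()))
    edges = {
        (a, b)
        for a in nodes
        for b in cites.get(a, ())
        if b in nodes and b != a
    }
    return nodes, edges
```

For example, if paper `P` cites `A`, `B` and `C`, and `A` cites `B` while `B` cites `C`, the projection graph of `P` has three nodes and the two edges `(A, B)` and `(B, C)`. Citations to papers outside the reference list of `P` are ignored.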