Incubators of knowledge: Predicting protege productivity and impact in the social sciences (August 1, 2012-July 31, 2014, Funded by NSF)

Doctoral students comprise a large portion of the academic workforce, yet scholars have very little knowledge of their place in scholarly networks, the degree to which they contribute to scholarly output, and the impact of this output. Very little quantitative analysis shows the relationship between advisors' scholarly practices and the future success of their advisees. This study investigates these issues from two main angles. The first is the contribution of doctoral students to social-science research (its extent and character) and the impact of this research (visibility through citations). The second is the advisor's knowledge base and knowledge-diffusion practices: whether these factors are involved in expanding knowledge frontiers, and how they relate to the career trajectories and future success of doctoral students.

Cascades, islands, or streams? Time, topic, and scholarly activities in humanities and social science research (February 2012-January 2014, Funded by NSF, JISC and SSHRC)

This project will examine topic lifecycles across heterogeneous corpora, including not only scholarly and scientific literature, but also social networks, blogs and other materials. While the growth of large-scale datasets has enabled examination within scientific datasets, there is little research that looks across datasets. The team will analyze the importance of various scholarly activities for creating, sustaining and propelling new knowledge; compare and triangulate the results of topic analysis methods; and develop transparent and accessible tools. This work should identify which scholarly activities are indicative of emerging areas and identify datasets that should no longer be marginalized, but built into understandings and measurements of scholarship.

VIVO (Oct 2009-Oct 2011, Funded by NIH)

The VIVO project aims to facilitate research networking and collaboration among basic, clinical, and translational researchers, including investigators, students, technical staff and others. The Semantic Web/Linked Data approach will be used to implement locally controlled researcher network installations that interoperate to create a flexible and scalable multi-institutional network.

Main work

NSF Workshop: Semantic Web and Map of Science (sponsored by NSF)

This workshop brings together semantic web researchers and science map makers to discuss the impact of, and potential collaboration on, using semantic web technologies and fast-growing linked semantic data to improve access to and understanding of science and technology data. Demonstrations of existing approaches, tools, and techniques, as well as discussion of synergies, will provide a point of departure for developing improved interfaces to massive amounts of semantic web data for researchers, educators and the general public alike.

NSF Workshop: Scholarly Evaluation Metrics: Opportunities and Challenges (funded by NSF; news coverage in Chronicle)

The quantitative evaluation of scholarly impact and value has historically been conducted on the basis of metrics derived from citation data. Although well-established and productive, this approach is not always best suited to fit the fast-paced, open, and interdisciplinary nature of today's digital scholarship. Significant advances have been made in the past years to address this problem. First, we have seen a rapid expansion of the set of metrics at our disposal to evaluate scientific impact. This expansion has been driven by interdisciplinary work in web, network and social network science, e.g. citation PageRank and various other social network metrics. Second, data sets have been investigated that are more compatible with the online nature of modern science such as large-scale usage and query data. Projects such as COUNTER and MESUR have determined the feasibility of deriving various indicators and impact metrics from usage log data.

Online Vocabulary Research (OVR)

One study currently being conducted by the Online Vocabulary Research (OVR) project involves a longitudinal case study and analysis of tagging patterns in the social bookmarking site Delicious. Previous research has indicated that a tagging vocabulary stabilizes over time, suggesting that convergence of individual folksonomies occurs as a result of participation by multiple users in a social tagging application. This study investigates the existence of stability and convergence in a subset of the tagging vocabulary used in Delicious between 2004 and 2007 and the presence and/or nature of community as well as patterns and practices of tagging behaviors in Delicious.
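One simple way to look for vocabulary stability of the kind described above is to compare the most frequent tags in consecutive time periods. The sketch below is only an illustration under stated assumptions: the source does not say which stability measure the study uses, so the Jaccard overlap of top-k tags, the function names, and the toy data are all hypothetical.

```python
from collections import Counter


def top_tags(tag_events, k):
    """Return the set of the k most frequently used tags in one period."""
    return {tag for tag, _ in Counter(tag_events).most_common(k)}


def jaccard(a, b):
    """Jaccard overlap of two sets (1.0 means identical, 0.0 means disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability_by_period(events_by_period, k=2):
    """Overlap of the top-k tag sets for each pair of consecutive periods."""
    periods = sorted(events_by_period)
    return {
        (prev, curr): jaccard(
            top_tags(events_by_period[prev], k),
            top_tags(events_by_period[curr], k),
        )
        for prev, curr in zip(periods, periods[1:])
    }


# Hypothetical toy data: one list of tag assignments per year.
toy_events = {
    2004: ["web"] * 3 + ["blog"] * 2 + ["css"],
    2005: ["web"] * 4 + ["design"] * 3 + ["blog"],
    2006: ["web"] * 5 + ["design"] * 3 + ["blog"] * 2,
}
stability = stability_by_period(toy_events, k=2)
```

An overlap that rises toward 1.0 across successive periods would be consistent with a stabilizing vocabulary, although a real analysis would need a much larger k and a treatment of ties in tag frequency.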

A second OVR study addresses the nature of web-based tagging systems. These systems have characteristics of structure, design and purpose that have implications for the behaviors of users when annotating online resources. Unfortunately, findings from studies of popular applications such as Delicious and Flickr have frequently been extended to all tagging systems without consideration of the heterogeneity that exists across systems. This research explores the characteristics of a set of randomly selected tagging systems to identify differences and similarities that should be taken into consideration when studying the tagging behaviors of users.

Upper Tag Ontology (UTO)

Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies capitalize on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites as well as the alignment of tagging ontologies. UTO is described and its utility in modeling, harvesting, integrating, searching and analyzing data is demonstrated with metadata harvested from three major social tagging systems (Delicious, Flickr and YouTube).

Social Tagging Network Analysis

This project aims to use social network analysis to portray the social vocabulary evolving in three major social tagging websites: Delicious, Flickr and YouTube. A large set of crawled tagging data will be analyzed using the major social network analysis methods.
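As a rough illustration of how tagging data can be turned into a network, the sketch below builds a tag co-occurrence graph in which two tags are linked whenever they annotate the same resource. It is an assumption about one plausible network construction, not the project's documented method; the function names and toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts):
    """Count how often each unordered pair of tags appears on the same post."""
    edges = Counter()
    for tags in posts:
        # Sort and deduplicate so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights incident to each tag."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical toy data: each inner list is the tag set of one bookmarked item.
toy_posts = [
    ["python", "web", "tutorial"],
    ["python", "web"],
    ["photo", "travel"],
    ["python", "tutorial"],
]
edges = cooccurrence_edges(toy_posts)
degree = weighted_degree(edges)
```

Weighted degree is only the simplest centrality measure; the same edge list could feed other social network analysis methods such as clustering or betweenness.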

PageRank for Scholarly Ranking and Social Influence Analysis

This project uses scholarly data (citations and publications) to form scholarly networks and identify influential authors, journals, or papers. Individuals are ranked on combined factors such as citation time and multi-mode graphs.
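For readers unfamiliar with the technique, the following is a minimal sketch of standard PageRank computed by power iteration over a citation graph. It does not include the project's additional factors (citation time, multi-mode graphs); the damping value of 0.85, the function name, and the toy graph are assumptions for illustration.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `citations` maps each paper to the list of papers it cites.
    """
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            # Each citing paper splits its rank equally among its references.
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b cite c, and c cites d.
toy_citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}
scores = pagerank(toy_citations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The same function could be applied to author or journal networks by changing what the nodes represent; weighting edges by citation time would require extending the per-citation share calculation.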

Semantic Web for Biochemical Knowledge Discovery

Recent advances in biomedical research have led to an influx of large datasets about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Most of these data begin to make sense only when they are integrated with other data and domain knowledge. This has triggered emerging studies in Network Biology, which models biological objects and reactions as complex networks and uses network analysis methods to discover hidden associations. Meanwhile, Semantic Web technologies enable interoperable data integration and knowledge representation, which significantly facilitate publishing, sharing and interlinking biomedical data on the Web. The Semantic Web thus enables integrative analysis of highly heterogeneous networks, and integrating multiple, possibly highly heterogeneous graphs is necessary for advanced network biology and network medicine research.

Research Profiling

Research profiling is a widely adopted method to monitor research development and rank research performance. This project describes a novel infrastructure to generate semantic-powered research profiles for research fields, organizations and individuals. It uses the proposed Research Ontology to crawl related websites and news feeds and model them in RDF triple stores, which facilitates semantic queries and semantic mining for novelty detection, hot topic detection, dynamics of research, and topic clustering. The RDF triple store contains 1,273,037 crawled unique webpages and 6,467,448 news articles, and is a terabyte-scale database built upon MySQL.

Linked Open Data (LOD) and its application

- Linking LOD with Patent data

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows users to ask expressive queries against Wikipedia and to interlink other datasets on the Web with DBpedia data. This paper presents a simple approach to link any dataset to DBpedia, thus making the dataset available for anyone's use.

- Mashing up semantic music data

The work discusses the MuzkMesh music portal, which mashes up existing semantic music data from the Linked Open Data (LOD) bubbles and other common APIs. It aims to demonstrate the power of semantic integration and to present useful scenarios for music retrieval and entertainment.