The web of
scientific knowledge

Current trends
and future perspectives
in the big data era


Information

Date: June 23, 2014

Duration: full day

Organizing committee:

  • Filippo Radicchi, Indiana University, USA
  • Staša Milojević, Indiana University, USA
  • Ying Ding, Indiana University, USA
  • Cassidy Sugimoto, Indiana University, USA
  • Vincent Larivière, University of Montreal, Canada
  • Min Song, Yonsei University, Korea

Description

The idea of studying science using scientific methods is by no means new. It has long been the focus of scientometrics, the field that uses quantitative and systematic analysis of bibliographic data to understand and evaluate science. Scientometrics has been deeply rooted in the social sciences. However, it has recently attracted the attention of many other scientific communities: physics, computer science, and biomedicine, to name a few. The field is undergoing very rapid growth and transformation, not only because of the influx of fresh perspectives, but also because of its expansion to novel data sources.

This global interest is motivated by two main reasons:

(i) the possibility to understand the way in which scientific knowledge is created, articulated and disseminated thanks to the increasing availability of bibliographic data in digital format;

(ii) the growing importance of citation numbers or other indicators derived from bibliographic datasets in the quantitative evaluation of scientific activities.

Purpose

The main purpose of this workshop is to bring together scientists from different disciplines to engage in a multidisciplinary discussion on recent developments in the analysis of bibliographic data, and especially on the future perspectives of this field of research. We hope that the workshop will engage participants in a collaborative discussion around a number of important topics, such as:

  1. Measures of scientific impact
  2. Structure and evolution of citation and collaboration networks
  3. Knowledge discovery from scientific literature using entity-based analysis
  4. Organization and dynamics of scientific disciplines
  5. Gender differences in publication rates, scientific impact and collaboration patterns

We have envisioned the workshop as a combination of panels and invited talks. All speakers will take part by invitation only. The panels will be assembled around the topics listed above. To promote discussion and the exchange of ideas, special attention in the selection of speakers is being paid to ensuring that they come from different disciplinary backgrounds.

Program

Monday, June 23

10:00 - 10:05  Introduction - Filippo Radicchi
10:05 - 10:40  Talk 1 - Daniel Romero
10:40 - 11:05  Talk 2 - Jasleen Kaur
11:05 - 11:15  Break
11:15 - 11:50  Talk 3 - Staša Milojević
11:50 - 12:25  Talk 4 - Johan Bollen
12:25 - 1:00   Talk 5 - Caroline Wagner
1:00 - 2:30    Break
2:30 - 3:05    Talk 6 - Xiaozhong Liu
3:05 - 3:40    Talk 7 - Filippo Radicchi

Speakers

Johan Bollen

Associate Professor of Informatics and Computing
Indiana University, Bloomington

Time: 11:50 - 12:25
Topic: TBA

Jasleen Kaur

PhD student
Indiana University, Bloomington

Time: 10:40 - 11:05
Topic: Emergence of Innovation in Science

Abstract: The birth and decline of disciplines are critical to science and society. How do scientific disciplines emerge? We study the role of social interactions in the birth and evolution of disciplines through several empirical analyses and an agent-based model in which the evolution of disciplines is guided mainly by social interactions among agents representing scientists. We map the evolution of co-authorship networks of scientific papers over time. As a field develops, it undergoes a topological transition in its collaboration structure from a small disconnected graph to a network where a giant component of collaboration appears. We analyze this qualitative change in network topology in terms of several quantitative graph-theoretical measures, such as density, diameter, and the relative size of the network's largest component. Given the changes in the network topology of a scientific field, we model the emergence of disciplines from the splitting and merging of social communities in a collaboration network. We find that this social model can account for a number of stylized facts about the relationships between disciplines, scholars, and publications. These results provide strong quantitative support for the key role of social interactions in shaping the dynamics of science.
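As a rough illustration of two of the measures named in this abstract (density and the relative size of the largest component), the sketch below builds a co-authorship graph from author lists. It is not the speaker's code: the function name coauthorship_stats, the input format (one list of author names per paper), and the returned dictionary keys are assumptions made for this example, and the diameter is left out to keep it short.

```python
from collections import defaultdict, deque


def coauthorship_stats(papers):
    """Summarize a co-authorship graph.

    papers: iterable of author-name lists, one list per paper
    (hypothetical input format chosen for this sketch).
    """
    neighbors = defaultdict(set)
    for authors in papers:
        for a in authors:
            neighbors[a]  # register every author, including solo authors
            for b in authors:
                if a != b:
                    neighbors[a].add(b)

    n = len(neighbors)
    m = sum(len(v) for v in neighbors.values()) // 2  # each edge counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Breadth-first search over each unvisited node to size its component.
    seen, largest = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        largest = max(largest, size)

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "largest_component_fraction": largest / n if n else 0.0,
    }
```

Computing these numbers for yearly snapshots of a field would show the transition described above as a jump in the largest-component fraction.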

Xiaozhong Liu

Assistant Professor of Information Science
Indiana University, Bloomington

Time: 2:30 - 3:05
Topic: Scientific literature recommendation via full-text citation analysis

Abstract: While rapid access to digital publications revolutionizes research and education, algorithms and systems are needed to recommend and locate relevant and useful scientific publications and topics for scholars. In this talk, I will discuss two related projects that address this challenge. First, by employing novel full-text citation data, i.e., citation context, as prior knowledge, a new method is implemented to enhance classical citation network and recommendation algorithms. Second, I investigate a new method to integrate full-text citation data into an innovative "context-rich heterogeneous graph", where citation motivation is characterized as vertices and edges on the graph.

Staša Milojević

Assistant Professor of Information Science
Indiana University, Bloomington

Time: 11:15 - 11:50
Topic: Evolution of research team sizes and the modes of knowledge production

Abstract: In the majority of disciplines, papers used to feature a single author and only rarely several, but nowadays it is quite common for papers to have tens or even hundreds of authors. How did this happen? The model I developed shows that today, unlike in the past, there are two types of research teams. The first are "core" teams, which form by a Poisson process and consist of a small number of authors; the second are "extended" teams. Unlike core teams, extended teams become exponentially larger, and the modeling I performed suggests that this happens because of a process of cumulative advantage related to team productivity. These two types of teams co-exist today. Revealing that scientific knowledge is produced in two very different modes is important for understanding science as a social undertaking, with implications for interpreting different measures of research evaluation.

Filippo Radicchi

Assistant Professor of Informatics
Indiana University, Bloomington

Time: 3:05 - 3:40
Topic: Sharp transition in scientific impact

Abstract: Since the rise of the publish-or-perish age, citation-based impact metrics have found wide application in the quantitative evaluation of research. Although they are good proxies for scientific quality, such metrics may introduce bias, for example due to discipline- and age-dependent factors. For single publications, citation counts can be easily normalized to generate universal indicators. The proper evaluation of the performance of scholars is, however, a more complicated task because direct comparisons among individuals are not possible due to the intrinsic heterogeneity in publication records and career trajectories. Here, we introduce a novel statistical methodology that relies on the generation of a personalized term of comparison specifically tailored to each researcher. Our approach allows us to simultaneously suppress all possible factors that can bias a citation-based metric. By applying our methodology to the h-index of millions of scholars, we show that researchers can be classified into two distinct categories: those with an h value that cannot be achieved by random chance, and those with an h-index much lower than expected. The two categories are separated by a well-defined critical line on the career phase diagram, approximately given by an h-index proportional to the square root of the total number of publications.
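For readers unfamiliar with the h-index, the sketch below computes it from a list of per-paper citation counts and reports the ratio of h to the square root of the number of publications. It uses only the standard definition of the h-index. It does not reproduce the personalized statistical methodology described in the abstract, and the proportionality constant of the critical line is not stated there, so none is assumed; the function names are illustrative.

```python
from math import sqrt


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_over_sqrt_n(citations):
    """Ratio of the h-index to sqrt(number of papers).

    A critical line of the form h proportional to sqrt(N) corresponds to a
    constant value of this ratio; the constant itself is not given in the
    abstract, so this helper only reports the ratio.
    """
    n = len(citations)
    return h_index(citations) / sqrt(n) if n else 0.0
```

For example, a scholar whose papers have 10, 8, 5, 4, and 3 citations has an h-index of 4, because four papers have at least four citations each.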

Daniel Romero

Research Fellow at the School of Information
University of Michigan

Time: 10:05 - 10:40
Topic: Analyzing the Frontiers of Science

Abstract:

We have witnessed an upsurge in the production of scientific papers. In 1950 there were approximately 26,000 papers indexed in Web of Science. In 2010 the number rose to 1.4 million. This suggests that the amount of scientific knowledge available to researchers is increasing at a very fast rate. Since every scientist has a limited capacity to discover and process existing knowledge that is relevant to her research area, every scientific paper builds upon a relatively small subset of previous knowledge. An important question that remains largely unanswered is how to search for relevant knowledge when writing a scientific paper. In particular, given that there is much more recent knowledge than early knowledge, how far back in time should we look for relevant knowledge?

In this work, we analyze approximately 18 million papers in the Web of Science database. We define two measures that characterize the distribution of years between the time a paper was published and the time its references were published. We find that papers that reference recent papers, but also reference papers from a large variety of years, tend to have higher scientific impact than other papers. Furthermore, papers that follow this pattern of references also tend to reference more interdisciplinary and high-impact papers.

To characterize how a paper draws together knowledge of prior work, we take two summary statistics from each paper. First, to measure how far back in time a paper looks for knowledge, we define the “reach” of a paper as the median time distance between the publication year of the paper and the publication years of its references. Second, to measure the variety of years among a paper’s references, we define the “breadth” of a paper as the coefficient of variation of the time distance among all pairs of references. The coefficient of variation measures the dispersion in time distance among pairs of references, while controlling for the average distance.
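The two definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the input format (publication years as integers) are invented for the example, and the population standard deviation is used in the coefficient of variation because the abstract does not say which estimator was applied.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def reach(paper_year, reference_years):
    """Median gap, in years, between a paper and the works it references."""
    return median(paper_year - year for year in reference_years)


def breadth(reference_years):
    """Coefficient of variation of the year gap over all pairs of references.

    Assumption: population standard deviation divided by the mean gap.
    """
    gaps = [abs(a - b) for a, b in combinations(reference_years, 2)]
    if not gaps:
        raise ValueError("breadth needs at least two references")
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0  # every reference shares one year, so there is no dispersion
    return pstdev(gaps) / average_gap
```

A paper from 2010 citing works from 2009, 2008, and 2000 has a reach of 2 years. Low reach combined with high breadth is the pattern the abstract associates with high impact.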

We study the relationship between a paper’s impact and its reach and breadth. Hit papers are defined as the papers in the top 5 percent for a given year, as measured by the cumulative number of citations 8 years after publication. Our findings suggest a universal pattern: the highest scientific impact is achieved when recent work is combined with a large variety of work from the past. Finally, we test the robustness of our findings by checking whether they hold in other domains. We find that low reach and high breadth also lead to high-impact work in the context of patents and US Supreme Court rulings. This provides evidence for the universality of our results.

Caroline Wagner

Associate Professor at the John Glenn School of Public Affairs
Ambassador Milton A. and Roslyn Z. Wolf Chair in International Affairs
The Ohio State University

Time: 12:25 - 1:00
Topic: TBA