STAR METRICS: Analysis and Next Steps
December 27, 2011
Posted on behalf of Mya Strauss (Environmental Protection Agency), Umesh Thakkar (Government Accountability Office), and Julia Lane (National Science Foundation)
The agencies participating in the STAR METRICS program, namely the National Institutes of Health (NIH), National Science Foundation (NSF), Department of Energy (DOE), Environmental Protection Agency (EPA), and Department of Agriculture (USDA), have continued to move forward in consulting with both the research and agency communities.
One focus has been to expand the current jobs reports that are provided back to research institutions each quarter. At a workshop held immediately before the Federal Demonstration Partnership (FDP) meeting, the research institutions asked us to provide information about the jobs supported over a rolling 12-month period in addition to the current quarter. They also agreed that it would be useful to use electronic text mining techniques (described below) to describe both the research being done by scientists and students and the connections between researchers within their institutions.
A sociologist, Jason Owen-Smith of the University of Michigan, analyzed STAR METRICS data for two institutions. He found a high level of connection in a private university (Figure 1), with clear “bridging” positions across scientific teams, and low levels of connection in a public university (Figure 2). Using standard analytical tools, he found that almost 54% of individuals in the private university were reachable, compared with only 11% in the public university, and that the most powerful bridges were research staff (possibly through shared instrumentation) and faculty.
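The share of "reachable" individuals can be defined in more than one way. One common reading, assumed here, is the fraction of people who belong to the largest connected component of the network. The sketch below is only an illustration of that reading: the people, the ties (an edge is assumed to mean two people are paid from the same award), and the function names are all invented and are not drawn from STAR METRICS data or from the analysis described above.

```python
from collections import defaultdict, deque

def connected_components(edges, nodes):
    """Return a list of sets, one per connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from each person not yet visited.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def reachable_share(edges, nodes):
    """Fraction of people in the largest component (an assumed definition)."""
    components = connected_components(edges, nodes)
    return max(len(c) for c in components) / len(nodes)

# Hypothetical people; an edge means two people share an award (assumption).
people = ["faculty_a", "staff_b", "student_c", "faculty_d", "student_e", "postdoc_f"]
shared_awards = [
    ("faculty_a", "staff_b"),
    ("staff_b", "faculty_d"),
    ("faculty_d", "student_c"),
]

share = reachable_share(shared_awards, people)
print(f"{share:.0%} of individuals are in the largest component")
```

In this toy network, staff_b sits in a bridging position: without that person, faculty_a would be cut off from the rest of the group.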
The FDP asked research institutions how much time it had taken them to participate in the program. The median response for the participants who were not part of the initial pilot was 45 hours for the initial setup (with a range of 30-100 hours). The median time for subsequent transmissions was 2.5 hours (with a range of 0-10 hours).
The STAR METRICS outreach has also involved getting input from the community about three projects that are currently in the proof-of-concept/pilot stage. The first is a tool to describe and visualize research portfolios within and across agencies, which we have called Portfolio Explorer and which includes an expertise locator tool. The second is an expansion of the R&D Dashboard, which was initially developed in response to Section 207 of the eGov Act of 2002; that section mandates the development of a repository and website to describe R&D investments to the public. Finally, STAR METRICS has been partnering with Research Business Models (RBM) and the FDP to develop a common researcher profile (now called Science Experts Network) and CV (SciENCV) platform, as described in Sally Rockey’s blog.
Some of these tools were described to research institutions at an October 25 and 26 workshop held by the Association of American Universities (AAU). AAU will be setting up an issues group that will meet regularly to determine research institution needs. STAR METRICS will also hold a workshop on December 12 for Vice Presidents for Research to finalize plans for implementation. The workshop will feature some work that has already been done in Brazil (http://www.slideshare.net/rpacheco/sti-national-information-system-platform-the-brazilian-case-of-lattes; http://www.slideshare.net/rpacheco/sti-information-systems-brazilian-initiatives-frequently-asked-questions) and by NSF-funded researchers (http://www.kauffman.org/comets/).
In addition, the NSTC’s Science of Science Policy interagency working group (which reports to the Social, Behavioral and Economic subcommittee) is planning a STAR METRICS workshop. This workshop will allow federal employees to learn how agencies are using STAR METRICS data and tools. NIH, NSF, DOE, EPA, and USDA will demonstrate how they are using tools, such as the Portfolio Explorer, R&D Dashboard, and Jobs Report. The March 29-30, 2012 workshop will be held at the American Association for the Advancement of Science (AAAS) headquarters and is open to federal employees. For more information, contact Mya Strauss (firstname.lastname@example.org), Umesh Thakkar (email@example.com), Julia Lane (firstname.lastname@example.org), or Bill Valdez (Bill.Valdez@hq.doe.gov).
More about Topic Modeling (David Newman, University of California, Irvine)
Machine learning is a scientific discipline that develops algorithms that allow computers to learn from patterns in data and to capture the relevance of certain events. Computational linguistics and natural language processing use these machine learning algorithms to learn automatically from a general set of established rules (e.g., that when the words "human" and "genome" co-occur, they should be identified as one topic related to genetics). With these powerful tools in hand, it is then possible to feed large bodies of text to machines and extract the topics embedded in these documents (topic modeling). The machine identifies the topics, while the user sets the granularity for the buckets. The process is iterative and is refined over several rounds of fine-tuning.
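The co-occurrence idea can be illustrated with a toy count of how often pairs of words appear in the same document. This is only a sketch of the co-occurrence step, not a topic model (which would typically be fit with a statistical method such as latent Dirichlet allocation). The sample texts and the function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """For each unordered word pair, count the documents containing both words."""
    counts = Counter()
    for text in documents:
        # Use the set of distinct words so a pair counts once per document.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

# Invented award titles, for illustration only.
documents = [
    "human genome sequencing methods",
    "variation in the human genome",
    "soil carbon and climate",
    "climate models and soil moisture",
]

counts = cooccurrence_counts(documents)
print(counts[("genome", "human")])  # pairs are stored in alphabetical order
print(counts[("climate", "soil")])
print(counts[("genome", "soil")])
```

Pairs that co-occur often, such as "genome" and "human" here, are the kind of signal that leads a model to group words into one topic, while pairs that never co-occur end up in different topics.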
In the era of the data deluge and ever-increasing fragmentation of data sources, this technology provides great opportunities for the STAR METRICS endeavor, whose goal is to link information from different existing sources to understand the effects of science funding. In this context, the cornerstone is to apply topic modeling to the awards pulled from the STAR METRICS Level I participating institutions. Very practically, topic modeling can help STAR METRICS institutions and agencies to:
- Assess what science is being funded
- Identify trends in funding
- Run quick quantitative analyses
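One way to picture such an analysis: once each award has topic weights from a fitted model, award dollars can be split across topics and summed by year. The records, field names, and weights below are invented; they are assumptions for illustration, not STAR METRICS data or its schema.

```python
from collections import defaultdict

def funding_by_topic_and_year(awards):
    """Split each award's amount across topics in proportion to its topic weights."""
    totals = defaultdict(float)
    for award in awards:
        weight_sum = sum(award["topics"].values())
        for topic, weight in award["topics"].items():
            totals[(award["year"], topic)] += award["amount"] * weight / weight_sum
    return dict(totals)

# Hypothetical awards with hypothetical topic weights.
awards = [
    {"year": 2010, "amount": 100_000, "topics": {"genetics": 0.75, "climate": 0.25}},
    {"year": 2011, "amount": 200_000, "topics": {"genetics": 0.5, "climate": 0.5}},
    {"year": 2011, "amount": 100_000, "topics": {"climate": 1.0}},
]

totals = funding_by_topic_and_year(awards)
for (year, topic), dollars in sorted(totals.items()):
    print(year, topic, round(dollars))
```

Comparing the totals for the same topic across years gives a simple view of funding trends.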
In addition, the same models can be applied to texts from other available sources, for example full-text articles in PubMed Central or abstracts pulled from PubMed and Web of Science. This will make it possible to relate what a given organization funds to the wider universe of available knowledge. In practice, this analysis will allow:
- Gap identification
- Overlap identification
- Strategic planning
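A rough sketch of gap and overlap identification, assuming the same topic model has been applied both to the funded awards and to a reference body of literature: compare the share of each topic in the two collections. The topic shares, the threshold, and the function names are invented for illustration, and other comparison measures would be equally reasonable.

```python
def topic_gaps(funded_share, literature_share, ratio=0.5):
    """Topics whose funded share is below `ratio` times their literature share."""
    return sorted(
        topic
        for topic, lit in literature_share.items()
        if funded_share.get(topic, 0.0) < ratio * lit
    )

def portfolio_overlap(share_a, share_b):
    """Overlap of two portfolios: sum over topics of the smaller share (0 to 1)."""
    topics = set(share_a) | set(share_b)
    return sum(min(share_a.get(t, 0.0), share_b.get(t, 0.0)) for t in topics)

# Hypothetical topic shares; each dictionary sums to 1.
funded = {"genetics": 0.5, "climate": 0.4, "materials": 0.1}
literature = {"genetics": 0.4, "climate": 0.3, "materials": 0.3}
other_agency = {"genetics": 0.2, "climate": 0.8}

print(topic_gaps(funded, literature))
print(round(portfolio_overlap(funded, other_agency), 2))
```

Here "materials" is flagged as a possible gap because its share of funding is well below its share of the literature, and the overlap score summarizes how similar two funding portfolios are.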
By combining this information with individual-level and network analyses, such as those already developed in prototypes for STAR METRICS based on the patent database, it will be possible to link the topics to the map of innovation that results from these analyses.