A Strategic Plan for Biomedical Research, Without Statistics?

The future comes fast in the draft of the National Institutes of Health’s (NIH) Strategic Plan for Data Science. Exascale calculation, the ability of the next generation of supercomputers to execute a quintillion (10^18) calculations each second, “will be able to more realistically mimic the speed of life operating inside the human body,” according to the NIH plan, “enabling promising new avenues of pursuit for biomedical research that involves clinical data.” Such extraordinary advances, the draft plan says, will build on “interdisciplinary research” in a “highly integrated biomedical research landscape.” Scientists will use a vast array of biomedical data and deploy “machine learning, deep learning, artificial intelligence, and virtual-reality technologies” to “yield transformative changes for biomedical research over the coming decade.”

And statistics’ role in this new era? Hardly any. Statistics gets two mentions, while storage (as in data storage) gets 19. The American Statistical Association’s (ASA) comments, prepared under the guidance of members of the Committee on Funded Research, include concern about the “overall lack of acknowledgement of the important role of statistics—the science of learning from data and measuring, controlling, and communicating uncertainty—and statisticians and biostatisticians in data science.”

The ASA comments also note that statistics is “essential in fundamental experimental design, whether this is based on ‘small’ data or massive data” and that statistical knowledge is fundamental to the modeling, machine learning, and decision support layers of the data science knowledge stack.

The oversight is puzzling, given the successful integration of statistics into the NIH’s Big Data to Knowledge program—and the increasing recognition in academic publishing that greater statistical expertise is needed to tackle problems in biomedical reproducibility, with both Science and Nature striving to implement increased statistical review. As the ASA comments conclude, “stronger and more robust scientific results” can only be achieved if statistics is fully integrated into, and statisticians and biostatisticians are part of, the NIH’s big plans for big data.