Preview note: As the scientific community slides into the era of towering collaborative and multidisciplinary projects, it is now impossible to ignore the importance of open source and free data sharing. However, the community is gravely divided on this ground. As the discussion progresses, it becomes clear that both the pros and the cons are justified and deserve open-minded consideration. Inspired by a heated online debate on the official Facebook page of Career and Support Group on 29th March 2017, Rohit decided to curate the views under one roof. This is a very enticing article, especially for early stage researchers who often find themselves in the dilemma of ‘To be shared or not to be’ – Rituparna Chakrabarti
Data sharing is an integral part of collaborative scientific research and is often encouraged within the scientific community. The National Science Foundation has emphasized the importance of sustainable data sharing and management in the progress of science and engineering, and has proposed policies in its favor. The New England Journal of Medicine has published a number of articles and editorials highlighting the importance of, and new developments in, data sharing, especially in the clinical sciences. Generally speaking, then, while individual arguments carry their own nuances and finer details, it is accepted that data sharing has a positive impact on scientific research and should be encouraged¹. There is, however, a part of the data sharing conversation that is often experienced firsthand by fresh PhDs, postdoctoral fellows and other young scientists and researchers.
As young researchers embark upon new career opportunities, they must rely upon the limited research experience they have accumulated thus far. It is natural that they want to use this experience to sell their skills and knowledge to the prospective employer during a job interview. It is also to be expected that the prospective employer will sometimes want to learn more about the candidate’s past research and evaluate them based on their work. This may involve asking about the relevant data, research methodologies and innovations involved in that research. If the research has already been published (or has been accepted for publication), the process is fairly straightforward and the candidate can triumphantly share it. If, however, the research is still unpublished or under peer review, data sharing can be tricky. The Principal Investigator (PI) heading the research might not be keen on sharing it outside the lab until it is published. Scientific research is a competitive domain, and this is a valid concern for a PI who risks getting scooped if the data ends up in the hands of a competitor. The pressure to publish novel research, and its impact on a scientist’s career, are well known. But what, then, must young researchers do? They are not allowed to share the research they have worked hard on, even if they want to. A prospective employer will want to evaluate the candidate’s research skills, and such a discussion might require discussing unpublished work. Everyone involved appears to be justified in their stance.
There are some suggestions², ranging from better interpersonal communication to changing mindsets in the field, that may allow young researchers to get around this issue. At the outset it is essential to establish that all data and other research content generated in an academic setting belongs to the hosting university or institute, and employees may therefore not be at complete liberty to disseminate this content without prior permission from the university. But this doesn’t mean that the research may not be discussed at all with a third party (unless some form of non-disclosure agreement is involved). In such a situation, researchers seeking employment outside the university may choose to have a prior discussion with the prospective employer and ask to share only the published research details (preferably pre-approved by the PI/university), which might demonstrate their competence and research acumen. But this may not always be easy. Talking about the dilemma of how much research one should share when interviewing at a company while working at another, Derek Lowe has noted, “No published work worth talking about, no patent applications, no nothing. I actually did go out and give an interview seminar under those conditions once, and it was an unpleasant experience. I had to talk about ancient stuff from my post-doc, and it was a real challenge convincing people that I knew what was going on in a drug company. I don’t recommend trying it.” If the unpublished data must be discussed, it can be done as part of a more general problem-to-solution-to-impact discussion, i.e. without communicating any specific details or presenting any slides, and only mentioning the data verbally. The interviewee must be clear about what can and cannot be discussed, because once the idea is out there anyone may be able to claim rights to it if it is not already published or patented. Data sharing under these circumstances does present a unique challenge.
As an innovative solution, the PI and the prospective employer may discuss a publication strategy beforehand that benefits all parties involved. This approach is more likely to succeed if the candidate takes the initiative to establish such a communication channel and is open about the prospects. It is a high-risk, high-reward approach.
Needless to say, this problem may not arise at all if the research in question is already published on a preprint platform such as arXiv or bioRxiv. While the jury is still out on the pros and cons of the preprint strategy, it is undeniable that it has been gaining in popularity due to its open-access nature and ease of submission. In spite of the benefits, this option may be more favorable to researchers publishing in physics, mathematics and informatics related fields (primarily on arXiv) than to those publishing in life sciences (primarily on bioRxiv) and chemistry related fields. This argument is supported by the statistics as they currently stand. Launched in November 2013, bioRxiv had received ~3100 submissions by January 2016 (~114 submissions/month). On the other hand, arXiv was launched in August 1991 and has received ~1.2 million submissions to date* (~4050 submissions/month). That it took some 22 years for a life science-centric preprint server to be launched might have had to do with both the culture and the content of life sciences research. As may be clear by now, publishing one’s research on a preprint server is not the end of the road. Most researchers eventually want to publish the same research (or an improved version of it) in a high-impact peer-reviewed academic journal. Policy on publishing manuscripts that have appeared elsewhere, including on preprint servers, differs by journal. Even though researchers would like to publish on a preprint server (for all the reasons discussed above), they are wary that they may not ultimately be able to publish the same research in high-impact journals such as NEJM or those of the AACR. The problem is more acute in life science journals, while most physics and mathematics journals accept research previously published on preprint servers.
But things seem to be looking up in the preprint world in general. bioRxiv is far younger than arXiv, but its rate of submissions has been increasing steadily since its inception, which means that more and more researchers are opting for this route. On that note, it is now possible to directly submit bioRxiv preprints to leading academic journals. Encouraged by the success of the preprint approach and its potential to accelerate scientific discovery, the Chan Zuckerberg Initiative, which has pledged $3 billion over 10 years to biomedical research, has decided to fund bioRxiv. What’s more, the American Chemical Society has decided to launch a preprint server for chemists. These developments point to preprints becoming the leading way to share research data before it is published in a peer-reviewed journal. Therefore, in the absence of extraordinary circumstances, if the PI can be persuaded to publish the research on a preprint server, the candidate may avoid the difficulties around data sharing. It is, therefore, important to foster a productive, amicable and strong professional relationship with the PI.
Even with the increase in the sheer amount of data, data sharing today is easier than ever. The question is how much we are willing to share and how much we are allowed to share. For young researchers and graduates transitioning into new research positions, the answers can be the difference between success and failure. The suggestions above aim to provide a template and to facilitate decision making for these very researchers. Ultimately, a more collaborative effort and better understanding among all the stakeholders is required.
¹ Further reading on the importance of data sharing:
² Certain suggestions are adapted from a recent discussion on the official Career and Support Group Facebook page, which inspired this post. The contribution of the members to this discussion is acknowledged and appreciated.
*At the time of publishing of this article
Acknowledgements: Somdatta Karak (Editing), Rituparna Chakrabarti (Featured image).
About the author: Rohit Arora obtained his PhD from ENS in France. Post-PhD, he worked as a postdoc in France in collaboration with a major pharmaceutical company. He is currently a postdoc at Beth Israel Deaconess Medical Center. His research focus includes understanding biological structure-function relationships and developing novel tools to make sense of “big data” in biology. He enjoys reading about his newfound interests in the history of mathematics, geometry, and philosophy. He can be reached on Twitter @RealRohitArora (sure, you try and come up with a better handle for a name this common).
This work by ClubSciWri is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.