Scientists Simplifying Science

CSG’s Data Science Venture: A new beginning

Data generation and analysis are not new concepts. While high-throughput scientific data has been generated by the likes of genome sequencers and the Large Hadron Collider, large swathes of commercial data have been generated by Amazon, Netflix, Google and social media platforms. Viewed more broadly, every aspect of life, be it mechanics, biology, social media or weather, generates data. Analyzing this data reveals meaningful trends and patterns, which can be used to frame questions whose answers shed light on many aspects of life. For example, can we predict the next big disease outbreak using patient data from hospitals, so that healthcare organizations can be better prepared to handle it? Can we combine domain-specific knowledge from modern medicine with data to build models that suggest which treatments will work for a given disease? In other words, by analyzing data, we can make better decisions and allocate resources efficiently to tackle existing problems.

Many trillions of gigabytes of data are generated every day. It is estimated that by 2018 we will be generating about 50,000 GB per second. GIS images, shopping preferences, movies and shows on television networks, GPS-based trends, social media behavior, and photos and videos captured on smartphones all contribute simple forms of data that companies generate and store. Another example is healthcare, where patient metrics have been collected for over a century in the form of qualitative observations, images and numbers. Meaningful trends can be identified from this data using predictive models and visualization tools. To identify trends and build these models, appropriate questions need to be asked of a data set.

With the growing trend in data analysis, there is an increasing demand for people who can perform such analyses. Scientists, by virtue of their training, are required to frame questions as the starting point of any project. They work to find answers to their questions by designing experiments, building numerical or analytical models, and sometimes combining all three methods. Since they know how to extract useful information from messy, unclean data, they are fast becoming the lifeline of data science. A number of scientists are teaching themselves at least one relevant coding language and are experimenting with open-source biological and other data sets.

It is important that scientists who wish to train themselves for this relatively new profession benefit not only from the plethora of resources available online but also from appropriate interactions with the data science community. The purpose is not to feel lost in the huge online community that already exists, but to receive relevant pointers to resources and networking. The PhDCSG group has been instrumental in providing professional development guidance and resources to scientists, especially those from the STEM fields. Their recent initiative, the Data Science Club, has been created with this same objective: to train PhDs in relevant data science skills and to guide them towards career opportunities in this area. The club follows a mentor-mentee format, with one mentor assigned to a group of 3-5 mentees. The mentees have been acquiring coding skills since the inception of the club and have an active network in which to discuss projects, job openings, problem solving and more. The club will soon take on challenges based on freely available data sets as well as smaller projects that are being outsourced to the club members.

The goal of the club is twofold. First, by introducing people to the wonderful world of data science and machine learning, the club aims to help those who want to familiarize themselves with data science tools and apply them to problems in areas such as science, industry and education. Second, the club aims to keep in touch with the rapidly advancing field of data science, its academic leaps and their subsequent applications to various domains. The aim is to present articles at various levels, from building data-based stories using visualizations to reviews and implementations of some of the newest techniques in machine learning.

[Infographic: “Big Data, Big Returns”, from Visually]

There are vast opportunities waiting to be harnessed in this field, and scientists have a unique opportunity on their hands: to take up the challenge of this newer profession and start making sense of the vast amounts of data that have already been generated in the world. After all, who understands data better than a scientist?

If you want to learn about data science or discuss interesting ideas and projects, please write to us. We would love to collaborate with data science practitioners and startups looking to develop machine learning and data science solutions. Please feel free to connect with us at csgdatascience@gmail.com

About the Author:

Pawan Nandakishore is a postdoctoral researcher at the International Centre for Theoretical Sciences (ICTS). He holds a PhD from the Max Planck Institute for Dynamics and Self-Organization. His background is in experimental physics, with a specialization in soft matter. Do feel free to write to him at pawan.nandakishore@gmail.com

Edited by:

Anshu Malhotra is an assistant scientist at Emory University and helps coordinate the activities of CSG’s flagship mentor-mentee program (Gurukool). She is actively involved in bench-based research in pediatric oncology and is strongly interested in developing skills in data science. CSG’s current venture, the Data Science Club, is Anshu’s latest passion, and she hopes that this platform will bring more life scientists together to train themselves and network in this budding new profession. In her spare time, she dabbles in 3D mural artwork.

Cover Image: Vinita Bharat

 

The contents of Club SciWri are the copyright of PhD Career Support Group for STEM PhDs (a US non-profit 501(c)(3) organization). PhDCSG is an initiative of the alumni of the Indian Institute of Science, Bangalore. The primary aim of this group is to build a NETWORK among scientists, engineers and entrepreneurs.

This work by Club SciWri is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Facing adversities with alacrity – the odyssey of an aspiring Data Scientist

There are always stories about people who flourish, or aspire to flourish, while tackling challenges and setbacks during their training or profession. This time I bring you the adventure of Urszula Czerwinska. Urszula, or Ula as her friends call her, is a Ph.D. student at the Institut Curie, Paris. Throughout her higher education, she has donned the hats of an entrepreneur, a blogger, and an aspiring Data Scientist. She has encountered her fair share of challenges during her education but, as we’ll learn, it is perseverance that drives a person to fulfil their passion. Ula’s tale highlights the determination and resilience required to achieve what at one point may seem inconceivable.

“I’ll never doubt that my parents have always had the best intentions for me. But they believed in the idea of ‘predisposition’. Simply put, one must perfect the skills for the talent one possesses, rather than learning something completely new. That’s why I never got involved with sports, and I didn’t go to art school (my drawings were good, but that wasn’t enough). I was shy as a kid and that’s why my parents advised me to choose a career that doesn’t involve a lot of social interaction. I don’t agree with the dogma of predisposition any more. Of course, it’s easy for some to be good at maths and for others at sports. But it doesn’t mean that one cannot learn. People change; I changed a lot through my experiences. I don’t aim for the Olympics, but I feel content going to the gym or dance classes. I previously considered them a waste of time as I couldn’t be the best at them, and that was so because I wasn’t ‘predisposed’ to sports. In my opinion, our future lies in our own hands. We can convert our weaknesses into strengths, but only if we want to, and if we are ready to invest our time and effort in doing that. I also think that we have the capability of changing our thinking, to forge the path of our education and our career. It’s actually proof that we can always get better and improve.” – Urszula ‘Ula’ Czerwinska.

The journey begins – there’s plenty to learn

Ula is Polish. She left for France at the age of 18 to pursue a joint degree in Biology and Mathematics in Roscoff. During her Bachelor’s, she studied an entirely new subject: programming. “And here’s the funny part – I sucked at it in the beginning”, she says. “I had trouble typing on a French keyboard (which is an AZERTY one)! While most of the students were finishing their exercises, I was still looking for the “?” button on the keyboard.” At one point, her annoyance with herself took its toll, and she spent a lot of time studying with online resources. “Once I understood the logic of Python, the rest went smoothly. I absolutely nailed the final project, and subsequently, I applied for a short internship in Bioinformatics at the end of my second year.” Ula also had the chance to study in Singapore as an exchange student. There, she shared classes with students whose backgrounds were completely different from hers, such as business. It was very enriching for her, as she was exposed to the tools they used, like Prezi, and applied them to her own life science projects. She mentions a thought by Walt Disney that drove her: “‘All our dreams can come true, if we have the courage to pursue them.’ This quote motivated me to take the decision in my early years to go to France and fight for good grades.”

While she was finishing her Bachelor’s studies, Ula’s heart remained close to biology, since it seemed like a mine of complex problems that she could solve with mathematics and programming. After applying to several Systems Biology Master’s programs across Europe, she finally chose the most flexible one, in Paris at the Center for Interdisciplinary Research, supported by the Bettencourt Foundation. The uniqueness of this program was that a big part of the curriculum was designed by the students themselves and involved several internships. The coordinators encouraged the students to take part in initiatives, create thematic clubs, and of course, have fun with what they did. She decided to spend some time in a lab at Institut Curie, which would later become the home of her Ph.D. “I had to program in Java, and I had no clue about it. I spent half the time teaching myself, and that too in the specific context of the software I had to work on. I felt demoralized as I was not progressing anywhere, and to make things worse, my supervisor left for 3 months. I was completely lost! But I started asking for help from postdocs in my lab and finally succeeded in coding a part of the software – it even got published!”

The following experience, although discouraging (as Ula would put it), was life-changing for her: the iGEM competition. It’s an international competition in Synthetic Biology, in which teams modify organisms to solve real-world problems or, at the very least, to have fun. Her team worked all summer as an interdisciplinary unit to develop beauty products that would help people smell better through reprogramming their skin microbiome. The very idea of creating a product, something that people could use in their everyday life, was in itself highly motivating for her. Their team also included designers, who helped them a lot with product design and attractive visuals. “This made me realize that science is not necessarily research, it’s very diverse.” Subsequently, during the final internship of her Master’s, she partnered with her friend and colleague Cristina Garcia Timermans to launch a startup called Eco-Smart Solutions. It was aimed at designing probiotic cleaners.

Eco-Smart Solutions – a beautiful failure

The startup was co-founded by Ula and her colleague Cristina, driven by their entrepreneurial enthusiasm after the iGEM competition. Initially, their idea was to design a probiotic cleaner containing bacteria that would eat dirt. This product would clean more deeply and independently of the surface texture. Most importantly, it would not result in the creation of chemically resistant bacteria. The natural microbiome of the treated surface would be regulated by their cleaning microbiome, preventing the formation of the biofilm to which dirt sticks.

They reasoned that the Paris metro system would be a great place to start, as it’s very hard to clean. Furthermore, it is cleaned with water at high pressure, which has a detrimental effect on the walls. “We even met the R&D team of Paris metro, but they said that the metro was clean, and basically that was it.” The team did not give up. Guided by their teachers, they continued with the project, but in the form of a study of the microbiome of the Paris metro. This would 1) unveil the metro’s microbial diversity, and 2) help them design a customized product.

Probiotic cleaners are widespread. They’re used in hospitals across England, and on a regular basis in the USA, especially for cleaning animal farms (probiotic cleaners have a positive impact on an animal’s health). Therefore, they also decided to test the existing probiotic cleaners and natural cleaners like soap. “We had a lot of fun in the lab that was not high-tech, and working with a tight budget within a short time.” They spent their days in the metro collecting samples from stations according to a protocol of their own design. “And in the evenings, we would attend startup events and pitch competitions.” The samples they collected were sent for sequencing, but they encountered issues analysing them. “We asked a bioinformatics research team at the university for assistance and it turned out that the DNA we had collected was not of good quality. Hence, we couldn’t draw any conclusions from the analysis.” As things turned out, at the end of their internship Ula and Cristina decided not to carry on as full-time entrepreneurs: they didn’t have enough capital at the time, and both had secured Ph.D. opportunities.

“We failed, but it was a beautiful failure. We created and executed a project from A to Z, learnt about visual aids, making a business plan, and studying the market. Although our skills and resources were not sufficient, I am incredibly fulfilled with this experience.” Right at this moment, a saying crosses her mind which, as Ula puts it, matches one of the negative aspects of her character. “I’d rather die on my feet, than live on my knees” – Emiliano Zapata. She explains, “We need to be flexible nowadays, and sometimes, we need to get down on our knees to stand up later.”

Crafting the path of a Ph.D. – the challenges ahead

Ula started her Ph.D. in the same lab where she had interned during her Master’s, working on unveiling the complexity of transcriptomic data with unsupervised learning. “This was perfect for me! I had to search for factors that drive biological processes in the ocean of noise.” The lab had also secured funds specifically for her, in case she didn’t obtain a scholarship. “What I encountered next was one of my biggest failures, and it hurt my ego a lot!” Ula had applied for a Ph.D. scholarship with a career-defining project in mind. She had also applied for an MBA program for Ph.D.s. “During that time, I was convinced that I didn’t want to stay in academia and so, this project was the perfect opportunity for me. I could accomplish as a researcher while gaining access to management jobs right after my Ph.D.” Unfortunately, she wasn’t selected for the final round of interviews, and it disheartened her. “I even thought of giving up on my Ph.D., but I decided against it as I liked my topic of work.”

Severely demotivated and lacking a vision for herself, Ula attended a Ph.D. talent fair in Paris. She realized that companies look for analysts with her skill set: machine learning, R, Python. Conversations with the representatives of one company left her with the same impression. “This moment opened up a whole new universe for me – Data Science.” Following this defining moment, she decided to build additional skills using free online resources and courses, with the ultimate aim of landing a job as a data scientist after her Ph.D.

Ula describes herself as an aspiring data scientist or a budding data scientist. There’s no definitive explanation for Data Science. “To me, Data Science is analytics, data visualization, machine learning, database management and big data.” Or to be abstract, it’s more like detective work: looking for patterns in data, building predictive models from data, and shaping the world based on accessible information.

For a layman, let’s say there’s a playground where a lot of kids are playing. Every kid is different, but they share some similar characteristics: hair colour, dress type, behaviour, etc. Now if we look at, say, five more playgrounds and search for the same characteristics, we’ll end up with some properties that are common among the kids and others that are distinct. Using these properties (data), we can try to predict a prevalent picture (model/pattern) of the characteristics and behaviour of most children. What we end up with is a meaningful description of the existing information. This is what Data Science looks like. But of course, it’s not this simple.
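The playground analogy can be made concrete with a few lines of Python. This is only a minimal sketch under stated assumptions: the observations, the trait names and the helper `prevalent_picture` are invented for illustration, and "prevalent picture" is taken to mean simply the most frequent value of each characteristic across all children.

```python
from collections import Counter

# Hypothetical observations: each dict is one child, each inner list one playground.
playgrounds = [
    [
        {"hair": "brown", "dress": "t-shirt", "behaviour": "running"},
        {"hair": "black", "dress": "t-shirt", "behaviour": "running"},
        {"hair": "brown", "dress": "jacket", "behaviour": "climbing"},
    ],
    [
        {"hair": "brown", "dress": "t-shirt", "behaviour": "climbing"},
        {"hair": "blond", "dress": "t-shirt", "behaviour": "running"},
    ],
]


def prevalent_picture(playgrounds):
    """Return the most frequent value of each characteristic across all children."""
    counts = {}
    for playground in playgrounds:
        for child in playground:
            for trait, value in child.items():
                # One Counter per trait, tallying how often each value is seen.
                counts.setdefault(trait, Counter())[value] += 1
    return {trait: counter.most_common(1)[0][0] for trait, counter in counts.items()}


picture = prevalent_picture(playgrounds)
print(picture)
# {'hair': 'brown', 'dress': 't-shirt', 'behaviour': 'running'}
```

Real data science work replaces the simple tally with statistical or machine learning models, but the shape of the task is the same: collect observations, summarize them, and use the summary to describe or predict.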

“Indeed, the Harvard Business Review has called Data Science the ‘sexiest’ job of the 21st century”, but why is it so appealing? “It’s appealing due to the power it gives to companies in all sectors – finance, medicine, education etc. Given the vast availability of resources, it’s also not the hardest profession to move into or learn.” What’s sexy about Data Science is that it’s a relatively new field, is not bound by geography, and is spreading like wildfire across all industries and disciplines.

Blogging – a tool for personal branding

Ula is also a Senior Blogger for PLOS Computational Biology. “PLOS Computational Biology is very generous with its titles. I am a regular contributor for them.” PLOS contacted her when she was about to attend ISMB, an international conference on computational biology, in Orlando, USA. They were looking for live-bloggers for this conference. “I was already thinking of setting up a personal blog at that time, and the communication from PLOS turned out to be the right trigger for me.” PLOS appreciated her initial work, and she continues to write for them on matters pertaining to computational biology, in addition to Data Science and associated Ph.D. careers.

Her personal website highlights the versatility of her writing skills, from career transition to live blogging. As she humbly mentions, “Honestly, I don’t think I’m a good writer. My English is far from perfect, but I keep working on improving it by reading a lot. The Economist has turned out to be a great resource for me. I also think that apart from me writing the articles that I publish with PLOS, the hands of the editors also weave magic and make my scripts smooth. And as far as content is concerned, I try to be honest and share my experiences and thoughts. Funny as it may seem, I don’t take my writing to be versatile as I don’t write about travels, cooking etc. I only cater to what concerns me the most – Ph.D. and Data Science.”

Writing takes her a lot of time, but once an article is published, it gives Ula a lot of satisfaction, as her audience can read and review her point of view. Plus, it’s still faster than writing and publishing in peer-reviewed journals.

The Pivigo Ambassador – another feather in the cap

Once Ula had defined Data Science as the domain of interest for her Ph.D. studies, she started researching it in depth, particularly the skills needed and how to acquire them. There was, and still is, a range of online courses and materials. “I also subscribed to many mailing lists of Data Science websites”, she says, disclosing her secret to me.

Pivigo, “The Data Science Hub” as its website puts it, is a data science marketplace and training provider based in London. Ula’s determination to make use of available resources led her to this platform, where she found the S2DS (Science to Data Science) program. S2DS helps Ph.D. students and postdocs in STEM transition to Data Science. The program takes place both in London and online. Students work on real problems from companies and are offered job opportunities following the program. “I would like to consider this as an option at the culmination of my Ph.D.” Interestingly, Ula found an advertisement about their ambassador program in their newsletter. “I contacted their community manager and I agreed to be the Pivigo ambassador in Paris.” Ula was already settling in.

“My role is to mainly spread the spirit of Data Science and information about the S2DS program”, describes Ula. They’ve also proposed that if she organizes any events in Paris that revolve around Data Science, Pivigo would support her. Ula chips in, “Most importantly, although this role is not a formal engagement, it has inspired me to instigate the community and create a Data Science Club at the Center for Interdisciplinary Research (CRI).”

Lessons from a journey well taken – an inspiration for everyone

For Ula, the journey as an entrepreneur, blogger, and aspiring data scientist has not been easy. She deems herself fortunate to have met her colleague-turned-friend, with whom she co-founded the startup, and to have convinced their teachers to invest in it. “Although we didn’t play high risk, we also didn’t lose a lot of money, and most importantly we gained a lot of experience”, she confides in me. “I don’t treat blogging seriously as it’s a new role for me. I don’t even force myself to write regularly – I just follow my inspiration. I guess, the hardest part is Data Science. I realize that I need to prove myself in this field and it’s not easy for me with the workload of being a Ph.D. student”.

The time is ripe for Ph.D. students to explore resources outside their lab, in addition to polishing and nurturing both new and existing skills. Curiosity and determination play an important role in achieving success. But some may feel diffident about doing so. Ula adds, “I reckon if someone is shy, the best way would be to find a buddy from their lab or institute who can accompany them for some outdoor ventures. It’s more motivating to give a joint effort as we feel less insecure. It’s also crucial to realize that courses and networking are not side activities – they are as important as or even more important than your experiments, if you want to continue your career outside of academia.”

Alice Roosevelt Longworth once said, “Fill what’s empty. Empty what’s full”. It reflects the idea of not only enjoying life and taking the best from it, but also sharing our own knowledge, competency, philosophy, and ideas with others.

Ula is now following her own plan of gaining skills, reading, and interviewing with companies. She also feels that being a part of the Ph.D. Career Support Group keeps her motivated to achieve her goals. She’s optimistic and hopes that future employers will recognize her passion for Data Science. And when Ula tastes success on her own terms, we will be there to applaud her.


About Urszula:

She’s a dynamic young scientist with an entrepreneurial spirit and high interest in Big Data, design, fintech & business analytics. She’s a self‐directed innovator working towards creating an opportunity to transition from academia to Data Science companies.

She also runs her own blogging website: https://urszulaczerwinska.github.io/

Follow her on Twitter @UlaLaParis

About Sayantan:

I’m an IRTA postdoctoral visiting fellow at the National Institute on Aging – National Institutes of Health, Baltimore, USA. Apart from science, I invest my time in networking, writing, organizing events, and consolidating efforts to build a platform that brings together scientists and industry professionals to help spread the perception of alternate careers for life science graduates.

Follow me on Twitter @ch_sayantan

 

This work by ClubSciWri is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
