Thank you for coming to this talk, and welcome. As was said, it's a talk about moving into data science, specifically from non-data-science academia. Approximately three years ago I myself moved out of academia, after my PhD in molecular biology, and because I've done this, I sometimes get contacted by people who are interested in knowing how they could do something similar. So, a few months ago I decided to conduct a short survey to collect ideas (I'm just trying to change the slide), asking people what kind of questions they would like answered about this topic. Three main themes came up: what skills are required to make the change, what should I learn, and how do I get hired.
So, to introduce first the topic of which skills are required: on the right-hand side you see a Venn diagram which covers business knowledge, coding skills, and also a standardized academic approach. In my opinion this is what you need to do data science, but I know that if you're coming directly from academia, you're not going to have all these skills. It is okay not to have all the skills. It's important, though, to know where your strengths are and then to identify which of the areas you can improve on. One thing I'd like to mention is that it's good not to assume you know everything. If you're working in academia you are often an expert, but you are an expert in a very small area, and just because you're an expert in that field does not mean you're an expert in everything. On the other hand, don't assume that you cannot learn new things. If you are in academia you are constantly learning, so you can definitely learn new things as well. So now I'm just going to briefly go through these three points: business knowledge, an academic approach, and coding skills.
So, on the left of the slide here I have a few examples of the many terms whose meaning I did not know when I first left academia, and in my first job at A1 Austria I really learnt a lot of these words. I think business knowledge is definitely something that you learn on the job, and therefore, when thinking about how to improve business knowledge, the focus should be on how to get your first job. Now, I've been in a few recruiting circles for various data science positions, and one of the fears that HR often has is that a person can be too academic. This is why I encourage people to emphasize their work with non-academic collaborations, or maybe non-academic coding projects, which is something that I will talk about a bit later.
In terms of coding skills, there are mainly two languages which are used when I think of data science: Python and R. They both have their advantages, which are listed here, and they each have some disadvantages too. Alongside R and Python there is also SQL, which is an extremely powerful language for manipulating data as it is extracted from relational databases, and it's often used alongside other tools. So if you have a solid knowledge of Python or R, along with some initial foundations in SQL, then I think that's a really good way to go.
Again on the topic of coding skills, one of the questions which I was asked in the survey was, “Is it possible to do a soft transition from academia to data science?” I decided to ask a friend of mine, Brian Reichholf, to get his opinion on this too, and we both agreed that you shouldn't avoid coding. As data scientists we see coding as an integral part of the job, and we recommend jumping in and trying it. If you don't like it, that's fine, but you've tried it and you know.
You could go for a job which does not require coding, and maybe you learn a lot of business-relevant skills there, and that's great, but if you go for a job which doesn't involve coding, we don't think you should expect to learn how to code on that job.
In terms of which language to choose between R and Python, my recommendation is that if you know either of them even a small bit, definitely start with what you know. Also think about where you want to be in the short term: in six months, for example, you can become quite proficient in one of these two languages, at least to an intermediate level. I would also say stick to one initially, and don't assume that you can't learn more languages in the future. Just because you start with Python doesn't mean that in a couple of years' time you can't start using R, and once you've learned one programming language it's a lot easier to move into another one as well. At the bottom here I've given a link to a list of free resources: books which are freely available, and also step-by-step tutorials on how to get started doing data science using R or Python.
Here on this slide I mention a few of the key data science tasks, which are things I do pretty much every single day as a data scientist. In the first part you have data import, cleaning and manipulation, pivoting, joining, and summarizing. These tasks are usually done in Excel, and you probably know how to do them in Excel already, but I've also listed packages in R and in Python which are used for doing the same thing. I think that if you are able to show that you can do these tasks using either R or Python, then this is a really good foundation.
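As a rough illustration of the tasks listed above (import, cleaning, joining, summarizing, pivoting), here is a minimal Python sketch. It is not from the talk: it assumes the pandas package is installed (the transcript does not name which Python packages were on the slide), and the two small tables and their column names are invented purely for the example.

```python
import io

import pandas as pd  # assumption: pandas is the package used; not named in the talk

# Hypothetical example data: repeated measurements and per-sample metadata.
MEASUREMENTS_CSV = """sample_id,week,value
s1,1,10.0
s1,2,12.0
s2,1,8.0
s2,2,
s3,1,15.0
s3,2,17.0
"""

SAMPLES_CSV = """sample_id,group
s1,control
s2,treated
s3,treated
"""


def run_pipeline():
    # Import: read both tables (from in-memory text here, normally from files).
    measurements = pd.read_csv(io.StringIO(MEASUREMENTS_CSV))
    samples = pd.read_csv(io.StringIO(SAMPLES_CSV))

    # Clean: drop rows where the measured value is missing.
    clean = measurements.dropna(subset=["value"])

    # Join: attach the group label to every measurement.
    joined = clean.merge(samples, on="sample_id", how="left")

    # Summarize: mean value per group.
    summary = joined.groupby("group")["value"].mean()

    # Pivot: one row per sample, one column per week.
    wide = joined.pivot_table(index="sample_id", columns="week", values="value")

    return clean, summary, wide


clean, summary, wide = run_pipeline()
print(summary)
print(wide)
```

Each step here corresponds to something that could also be done by hand in a spreadsheet (filtering blanks, a lookup, a pivot table); the point of the sketch is only that the same steps become a short, repeatable script.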
Secondly, there is data visualization, which is also a really important part of data science. Here I've shown the commonly known package ggplot2 in R, and there are also matplotlib and seaborn in Python. I personally am a big fan of plotly, because it is a tool for interactive visualizations, and I think these have really high value, so if you are interested in interactive visualizations then I recommend checking that out as well. And finally, there is machine learning. If you are interested only in data analytics you do not necessarily have to go into this, but if you are interested in the more data science aspect of the field, then some classic packages would be caret in R and scikit-learn in Python. So these are pretty much the set of tools which I use on a regular basis as a data scientist.
Now I move on to the academic approach, because this talk is about moving from academia. There are many ways that we approach problems in academia which have very strong equivalents in data science. On the left I have a set of questions that we often ask ourselves in academia, such as “What went wrong?”, “Which is the best method to use?”, and “Can I trust my result?”. And here on the right side I show different examples of how these are commonly tackled in STEM academia and in data science. Now, I saw a comment in the chat in one of the earlier breakout sessions, the one about careers, about STEM versus non-STEM, and I just wanted to say that I come from a STEM background, but I think these kinds of methods apply in academic research generally, so they definitely apply when you think about social sciences such as sociology and psychology as well. So, to give you a brief example, let's look at the question of “What is the best method?”.
In academia and in data science, the best method is often a balance of accuracy, time, cost, and also the interpretability of results: how well do I understand the results, and how well can I explain those results to others. Another thing that is done in STEM academia, so in my field of molecular biology, is optimizing different methods. Normally there are a bunch of different options which can take different values, for example how much of a certain chemical we should use in an experiment, and to optimize this we try to work out which is the best combination of all these different chemicals. We do this by changing one option at a time to see what is the best overall combination. This is pretty much what is also done in machine learning in data science, when you are tuning hyperparameters to find the best hyperparameters for a model. One other aspect that I would like to highlight here is “How do I show the value of my work?” Overall, in academia and in data science, it's about writing clearly or being able to communicate verbally in understandable language. I think if you're able to do this then it really makes things a lot easier, and if you're in academia you've already had a lot of practice at this skill, so this is something which is really beneficial for data science too.
Now I will move on to learning. When thinking about “How can I prove what I know?”, one overall thing I definitely recommend is doing personal projects, for example on GitHub. If, using the tools that I showed before for R or Python, you can show how you clean data, visualize data and, if you're interested, how you do modelling based on this data, I think you have a pretty good case which shows that you can do the work and that you're ready for a data science job.
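To make the earlier comparison between lab optimization and hyperparameter tuning concrete, here is a minimal sketch of the "change one option at a time" idea in plain Python. It is not from the talk: the option names `depth` and `rate`, their candidate values, and the `score` function (a stand-in for running an experiment or training and validating a model) are all hypothetical.

```python
def score(settings):
    """Hypothetical stand-in for 'run the experiment' or 'train and validate'.

    Higher is better. By construction the best settings are depth=4, rate=0.1.
    """
    return -((settings["depth"] - 4) ** 2) - 100 * (settings["rate"] - 0.1) ** 2


def tune_one_at_a_time(start, options, score_fn):
    """Vary one option at a time, keeping the best value found so far."""
    best = dict(start)
    best_score = score_fn(best)
    for name, candidates in options.items():
        for value in candidates:
            # Copy the current best settings and change only one option.
            trial = dict(best)
            trial[name] = value
            trial_score = score_fn(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


start = {"depth": 1, "rate": 0.5}
options = {"depth": [1, 2, 4, 8], "rate": [0.01, 0.1, 0.5]}

best, best_score = tune_one_at_a_time(start, options, score)
print(best, best_score)
```

Changing one option at a time is cheap, but it can miss combinations where two options interact. Tools such as scikit-learn's `GridSearchCV` instead try every combination in a grid, at a higher cost in time.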
There were a couple of other questions which came from the survey, such as “Should I do online courses?” If you do an online course, that's great, but I recommend applying what you've learned to a personal project as well. Then there was another question: “Should I get certifications in cloud computing?” Personally I think it's more important to get the basic aspects of cleaning, visualizing, and modelling in a coding language before moving on to topics such as cloud computing.
And finally, I would like to talk briefly about getting hired. One thing I think is important to bear in mind, if you want to move outside of academia, is that the aim is to get a job outside of academia. The aim is not to get a high-paying job that challenges you mentally in all the ways that a PhD might do. This means that when you look at a job description, make sure you identify the skills listed in it and write your CV to match those skills. Unfortunately, this means that, for example, lists of conferences attended or papers published might not be so relevant and, in many cases, might lead the person hiring to think that you are too academic. So it can actually be a detriment to have them on there. I would also recommend highlighting the skills that you used in a project rather than the project itself, so for example saying something like “I collected standardized data over a period of six months, merged the data together, developed conclusions and presented them to stakeholders” rather than saying “I looked at gene X in species Y”. And finally, I would say be humble. As I mentioned with the Venn diagram, there are many skills that we do not have when we start, but that doesn't mean we cannot learn them. So really, it's good to see a job opportunity as an opportunity to gain experience. And finally, the first position is definitely the hardest, so don't be discouraged. After that it will get a lot easier.
And with that I just want to thank you for your attention, and I'm happy to take a few questions.