Welcome to my blog! A brief introduction to me is on the About page - head over there now before reading on!
So I’ve been a Data Scientist for about 6 months now and already I feel like I’ve been on quite a journey. Below is a summary of some of the things I’ve been doing, some of the choices I’ve made, and where I want to go next.
Where to focus my initial effort?
So one of the first questions I felt I had to answer before progressing any further was what to learn first. After doing a lot of reading online and watching a lot of YouTube videos, I boiled it down to a decision between two languages: Python or R. I know there’s a whole world of other languages and tools out there, but if I was going to make a significant investment of my time in learning something, I felt I really had to make it count. This meant something which was already established with a strong user base, something which was free (obviously), and something which I could use as a vehicle to learn Data Science concepts: things like data cleaning, vectorised functions, reproducibility, etc.
After much deliberation I decided to go with R. This was for several reasons. Firstly, I already had a good working knowledge of procedural and object-oriented programming through past experience with Visual Basic, and so I felt my general-purpose programming knowledge was sufficient for now. Since I already had some significant programming experience behind me, the learning curve of R wasn’t as daunting, and I liked its native focus on dataframes. Most importantly, R is a vectorised language and so I wanted to use it as a tool to conceptually get my head around doing things without looping.
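To make the idea of doing things without looping a bit more concrete, here is a minimal sketch in base R. The temperature values and variable names are made up purely for illustration; the point is only that the same conversion can be written element by element in a loop, or as a single expression applied to the whole vector.

```r
# Hypothetical data: a few temperatures in Celsius (made up for illustration)
celsius <- c(12.5, 18.0, 21.3, 9.8)

# Loop version: convert one element at a time
fahrenheit_loop <- numeric(length(celsius))
for (i in seq_along(celsius)) {
  fahrenheit_loop[i] <- celsius[i] * 9 / 5 + 32
}

# Vectorised version: the arithmetic is applied to every element at once
fahrenheit_vec <- celsius * 9 / 5 + 32

# Check that both approaches agree
all.equal(fahrenheit_loop, fahrenheit_vec)
```

The vectorised line is shorter and reads closer to the formula itself, which is the habit of thinking this sketch is meant to show.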
Great, now time to get up to speed on Machine Learning, right?
No. I’d read over and over that 80% of a Data Scientist’s time is spent cleaning data. I had no inclination to start learning more advanced concepts until I had some competence in the fundamentals. There is nothing more frustrating than muddling your way through a new language, especially one as hard as R, with no real grasp of the basics. So I’ve spent several months diligently working my way through courses on DataCamp as part of the 90-odd hour Data Scientist with R career track and am only now starting the machine learning modules.
I’ve also held off learning Python for now, as I feel that for the time being I can do everything I need in R. This may change in time, and I may pick it up at some point. But right now, learning two languages would be counter-productive to my conceptual learning.
In addition, I’ve tried to find some small personal projects to allow me to practice, which I’ll be writing about soon. I’ve also enrolled on the Data Academy at work - a four-week crash course on everything Data Science, which, if anything, has taught me how much I still need to learn.
My first priority is to see this DataCamp career track through to the end; I’m currently over 80% of the way through it. I’m also really keen to understand Git and web technology a bit more, and so I’ve started this blog. The main thing is that I want to take things slowly and methodically, because I am definitely in it for the long haul. I have big ambitions to become the best Data Scientist I can be, and I can see myself working in this field until I eventually retire. The whole field is so interesting to me and (relatively) new, so I do feel like I’m gradually integrating into a much larger tech community. The collaborative culture is great, and I’ll now be spending much of my work time helping embed these emerging technologies in Government work, whilst trying to satisfy my insatiable appetite for learning!