My name is Donovan. I graduated from the Georgia Institute of Technology in 2011 with a Bachelor’s Degree in Computer Science. Since then I’ve been doing web development. My experience in the workforce has been a great one, and I’ve learned so much about software engineering and the business world. However, lately I’ve been wanting to sink my teeth into something a little meatier.
In school I always had a love for math, and at one point I even considered majoring in it. I chose computer science for two reasons: it had a math-heavy curriculum, and the job market for CS majors looked great. After a few years of writing websites, though, I realized that those two facets were not necessarily well correlated. That is, a large portion of computing jobs are not full of fun math problems, but are more rooted in the engineering and business domains. After writing what I can only imagine was my millionth HTML tag, I saw the error in my naive view that a Computer Science major’s future would be interesting problem after interesting problem, with a nice paycheck to boot. So one day when I found myself watching hours of PyData videos, it hit me that I had an amazing option to pivot into an exciting, growing field that taps into more of my interests.
Enter Data Science
There is so much excitement right now around data, and I think for good reason. Data is becoming more and more prevalent as more and more devices become connected. By Cisco’s predictions, we will reach 50 billion connected devices by the year 2020. And with a constantly growing number of devices, the number of connections will grow even faster. Soon, these connected devices and the data they generate could influence nearly every aspect of our day-to-day lives. In one example from WIRED, someone in the future is recovering from a hip replacement, and several systems work together to aid in his recovery, including his doctor, pharmacy, insurance, utilities, and car. But as the WIRED article states, none of this will be possible without an emphasis on data to drive the seamless connection between the components of the system.
With this ever-increasing supply of data, the potential to extract meaningful knowledge out of the average human’s interactions and routines is immense! But by many accounts, including this one from the Wall Street Journal, the number of professionals who can deal with this data isn’t growing at the same rate as the data itself. Pat Gelsinger, COO of EMC Corp., is quoted in that WSJ article as saying,
Thirty years ago we didn’t have computer-science departments; now every quality school on the planet has a CS department. Now nobody has a data-science department; in 30 years every school on the planet will have one.
Maybe some of this excitement around Big Data is hype, but the trend is there, and it seems like a lot will change in the near future to push that trend even further.
What is a Data Journeyman?
Data Scientist. Data Analyst. Data Engineer. Data Specialist. Data Wrangler. All these terms and others are used with varying accuracy to describe people who deal with data. Plus, given the projected growth of the field, it’s believable that these terms will shift or possibly be redefined completely. Not to mention the new terms that will probably surface. So, talking about roles in this field can be confusing and frustrating to say the least.
I, however, don’t fit any of those roles. Yet.
I am just starting my venture into the world of data. I do have a statistics and AI background from college, so I’m not starting from scratch, but I have a long way to go before I’m worthy of any of the aforementioned roles. So for now, I’m just a journeyman.
Wikipedia defines a journeyman as follows:
A journeyman is an individual who has completed an apprenticeship and is fully educated in a trade or craft, but not yet a master. To become a master, a journeyman has to submit a master work piece to a guild for evaluation and be admitted to the guild as a master. Sometimes, a journeyman is required to accomplish a three-year working trip, which may be called the journeyman years.
So, in lieu of a guild, I will submit myself to the feedback of the internet and hopefully take the step from journeyman to master.
My vision for this blog
At the Galileo Symposium in Italy, held on the 400th anniversary of Galileo’s birth, Richard Feynman gave a talk entitled “What Is and What Should Be the Role of Scientific Culture in Modern Society,” in which he described two methods of verifying scientific knowledge. The first is gathering evidence to test what you think you know, a la the Scientific Method. He goes on to say,
But another way and a very important one that should not be neglected and that is very vital is to put together ideas to try to enforce a logical consistency among the various things that you know. It is a very valuable thing to try to connect this, what you know, with that, that you know, and try to find out if they are consistent. And the more activity in the direction of trying to put together the ideas of different directions, the better it is.
So that, I think, is the main goal of any Data [insert role]. Given a mountain of data (the stuff that you know), how can you filter it, massage it, love it, and combine it to reveal a truth about the world? I hope that through this blog, as a Data Journeyman, I’m able to connect the dots between the many areas of Data Science in a logically consistent way, from statistics to machine learning, from data analysis tools to visualization tools, from the unintuitive to the downright beautiful, for myself and other journeymen (and women) alike.
I plan to explore the books, blogs, websites, MOOCs, communities, and other resources I can find that focus on data. Perhaps my journey won’t be representative of every aspiring data scientist – after all, I am but one data point – but I hope that through my journey I can shed some light on the path for others to follow. So let’s begin.