In this installment of Data Points, I include several books that have been instrumental in shaping my career and life in general.
In his 1974 Caltech Commencement Address, Richard Feynman speaks on scientific integrity, which equally applies to practitioners of data science.
I'm starting a new monthly series called Data Points, where instead of taking a deep dive into a topic, I touch on several external sources that speak to me, some of which will hopefully have an impact on you as well.
This post includes a few details about a new job I've accepted, and what that means for this blog.
This post is Part 2 of my survey of the 2017 TensorFlow Dev Summit. In this round of videos, there are some impressive applications of TensorFlow.
After the TensorFlow introduction from my last post, I thought it would be interesting to take stock of where the community currently stands, based on its 2017 developer summit.
This week we dip our toes into Google's open source machine learning library and get our hands dirty with an example program. Wait, so are we washing our feet or dirtying our hands? Oh, never mind.
Two weeks ago I posted an experiment, and in this post I'm going to analyze the results.
For the first time on the blog, I'm actively collecting experimental data which I'll analyze in a future post.
I'm excited for 2017 for a number of reasons. At the top of that list are a few new opportunities I'll update you on here.
In this seasonal take on the classic Monty Hall problem, we'll look at yet another paradoxical result of probability in action.
As I enter the last week of my first semester of the OMSCS program, I have a little perspective to offer.
I ran across a visualization the other day that I thought could use some improving.
In this post, I continue with the Netflix interviews from a previous post to get a deeper look at how data analysis and visualization come into play at the company.
Edward Tufte gave a talk at Art Center College of Design on some of his work that hasn't been published yet. Tufte is an incredible visionary, and this talk doesn't disappoint.
Data Science and the traditional sciences are similar in so many ways, and communication between the two is vital to both communities.
Taking grad courses online isn't what I was expecting, but a few weeks in, I think I already have some advice for anyone considering it.
A look into how data science is used at Netflix, a company that relies heavily on data-driven functionality.
My first semester of grad school has officially started. In this update, I'll talk about my experience so far, and how it might affect this blog.
Data and machine learning have the potential to save human lives in a variety of contexts, but in every such instance, ethical concerns are raised as well.
The mantra of big data is 'more is more', but this sentiment must be tempered with a respect for privacy, in my opinion. In this post, I'll look at some cases where identities were exposed not by malice but by lack of rigor.
The Central Limit Theorem is extremely fundamental to statistics, but it's so fundamental that it pops up in other places, like physics, too.
Algorithmic problem-solving skills are crucial to data science, and as such, they deserve constant sharpening.
People say 'You can make statistics say anything', but that's only true if you don't know how to spot the warning signs of bad statistics.
Lately, I've been attending local meetups for civically minded data science projects. The one I attended last week had amazing projects and presenters.
I've been studying the incomparable works on visualization by Edward Tufte, and I'm sharing my notes here along with some general self-study tips.
The best way to enter into the world of Data Science is to practice Data Science. A great way to get involved (and to make a real difference in the world) is to join the civic tech community in your area.
Human progress can often be grouped into phases, from the Stone Age to the Iron Age to the Information Age and beyond. We may be on the brink of another technological revolution powered by AI.
This post is a continuation of the concepts from previous data privacy posts, focusing on the perspective of Eleanor Saitta, Etsy's new Security Architect.
The kind of statistics covered in previous posts has mostly been Frequentist statistics. This post goes into the basics of Bayesian statistics with a look at experimental design.
Algorithmic bias can pop up in unexpected places if you don't safeguard against it.
A Data Scientist is often concerned with optimization problems, so when I find a great workflow for getting a task done efficiently, I immediately want to incorporate it into my process.
As I wrap up my Udacity Data Analyst Nanodegree, I look forward to my next steps as a Data Journeyman.
A visualization project I did for my Udacity Nanodegree program.
The senior graphics editor at Scientific American magazine, Jen Christiansen, has four rules of thumb for when a data visualization is appropriate.
I like to stay up to date on data science news, and a great way to do so is through newsletters. I have a couple of tips for managing them.
My visualization posts so far have covered a lot on the theory side of things. Armed with that background of theory, we can appreciate a very inspiring case where data visualization actually saved lives.
Tying together concepts from the past three posts, the principles of preattentive processing give your visualization that extra punch that can make communication with your audience more effective.
Color is one of the most misused visual encodings, so I'm dedicating an entire post to its dos and don'ts.
Visual encodings are the building blocks of data visualizations, so before we go any further with visualization posts, we need to go over them.
This introductory post to data visualization will be the first of a several-week series on the subject.
As proponents of the data revolution will often say, more data is always better. But is this actually the case?
What is the role of uncertainty in data science? It definitely needs to be part of the equation. Um...probably.
I take an introspective look at what the scientist part of data scientist really means in terms of one's personal worldview.
'Correlation does not imply causation' is a data science mantra, but in this post I take a look at another problem with reports of correlations.
After many weeks of not posting and much consideration, I'm taking this blog in a new direction.
I'm making a personal shift in my data science studies, and we're making a curriculum shift as we move on to explore Bayesian statistics.
Taking a quick break from our statistics curriculum, let's dive into the world of theoretical set theory. Don't worry, our previously scheduled programming will continue next week when we start tackling Bayesian Statistics.
We previously looked at a paper that showed a concept called the illusion of causality. Now that we have the tools to check the results of that paper, we're going to do just that.
This week we wrap up our Inferential Statistics course with a look at Analysis of Variance (ANOVA), a very common technique to test the relationship of outcomes among multiple groups.
We've done some hypothesis tests for normal distributions (and t distributions when appropriate). Now we'll look at how we can use the chi-squared distribution to perform hypothesis tests on other distributions.
We've looked at confidence intervals and hypothesis tests by comparing a sample to an entire population (with well defined parameters), but what changes if you replace that population with another sample?
A simple April Fool's Day stunt turned into a fascinating social experiment, but was derailed weeks into it by technical issues. I'll offer an analysis that will hopefully bring closure to those who were invested in the experiment.
We will look at hypothesis testing by way of an example problem.
A look at the concept of confidence intervals (as opposed to point estimators) by way of several examples, including one that introduces a new distribution, the t distribution.
A quick look at some common discrete distributions.
Incorrectly recognizing a relationship as causal is so hard-wired into the human psyche that experts have given it a name: causal illusion.
Moving on from Think Stats, we'll apply many of its probability concepts by looking into the Inferential Statistics curriculum on Khan Academy.
This week we wrap up our studies on the Think Stats book with the subject of its final chapter, correlation.
With the concept of estimators in hand, we'll take on an actual wartime usage of the concept.
The idea of estimating a distribution's parameters is often glossed over, but it's important to know the difference between an estimated parameter and a true parameter.
Wikipedia is an unrivaled source of information, but who says you can't have brains and looks?
Not all misrepresentations of data are of malicious intent. Sometimes a misrepresentation arises from a lack of due diligence.
If you've been reading along and are convinced of the value of modeling your data with a well-defined distribution, then knowing which distribution is a good fit for your data is the important next step.
Examining well-known probability distributions will give us a lot to chew on in our path to understanding data. We'll start off with four common continuous distributions.
Udacity's Co-Founder and CEO, Sebastian Thrun, and VP Engineering and Data Science, Nitin Sharma, address some aspects of this question.
Starting out with Think Stats requires us to go over the book's preliminary definitions.
I'm starting a Curriculum tag to track a linear and (hopefully) fairly complete progression of data science skills and knowledge, starting with a statistics refresher book.
When I found myself in the weeds of the comment section of a politically charged opinion piece, I couldn't stand by and let some misleading data representations slide.
Dr. Messerli takes some liberties in his paper comparing chocolate consumption to cognitive abilities and reaches a dubious conclusion.
Finding good (and preferably free) data sets online can be challenging, but hopefully this list will help you jump-start your next data project.
Trying to build a case for the claim "Ice Cream Sales and Violent Crime Rates are positively correlated" proves to be harder than I expected.
I'm kicking off a new series of posts to point out incorrectly applied data analysis techniques.
In this post I'll take a look at Max Shron's book Thinking With Data: How to Turn Information into Insights.
In honor of my birthday weekend, here's a look at one of the first unintuitive statistics results I ever encountered, which just so happens to deal with birthdays.
An overview of the tools that are used throughout this blog.
In this prefatory post, I will answer several starting questions.
Who am I? What will this blog cover? What is a Data Journeyman, anyway?