The exponential rise of machine learning is as much a result of technological advancement as it is of the active community growing around it. This includes researchers working on core algorithms, as well as practitioners who are pushing the boundaries of how machine learning can be applied. It also includes an increasing number of machine learning enthusiasts with atypical backgrounds who are joining the conversation, bringing in diverse experiences and points of view.
Discovering and Attracting Machine Learning Talent
The increasingly symbiotic relationship between companies that need machine learning expertise and data science competition platforms like Kaggle has greatly impacted how rapid advancement is being achieved. This relationship has also changed the hiring landscape. Companies today face ever-increasing pressure to innovate in order to remain competitive, and they are pursuing comparatively unconventional means for discovering and attracting new talent in order to maintain their edge. The need for machine learning talent is so great that companies are looking much further afield than they once might have.
Significant contributions from individuals outside the traditional boundaries of specialized fields like machine learning used to be few and far between. The path into the field was historically a long and hard one, usually requiring an advanced graduate degree in math, statistics, computer science or a related field, and years of working on academic research projects. But this is changing, with platforms like Kaggle allowing anyone to get their hands on real-world data and giving them the opportunity to solve hard machine learning challenges.
Kaggle’s model is based on presenting machine learning competitions that function as opportunities for machine learning enthusiasts to test out, and further develop, their skills. From a learning perspective, this makes a great deal of sense, and the elements of play and competition add layers of motivation and excitement. The real key to Kaggle’s success, however, is that these competitions rely on real-world data provided by real-world companies.
For instance, Kaggle is currently running a competition where the task is to identify nerve structures in ultrasound images. Another problem, posted by State Farm, asks: Can computer vision spot distracted drivers? Facebook wants you to identify the correct place for check-ins, given noisy GPS data. As a participant, getting to play with these real-world datasets and structured problems is extremely valuable, and some competitions carry significant prize money as well. But why would a company provide its data to Kaggle? In fact, companies do more than that: more often than not, they sponsor the competitions!
Interviewing Kaggle Co-founder and CTO Ben Hamner
This warrants some explanation, and there is no one better positioned to help with that than Kaggle Co-founder and CTO Ben Hamner. So we asked him about what’s in it for the sponsoring companies. Ben says that some companies “have a really challenging use case and they want to put some of the best minds of the world on the problem, and to attack the use case in a very competitive setting.”
Crowdsourcing Machine Learning
In short, they’re crowdsourcing machine learning! Sounds like a crazy idea? Surely it’s impossible for someone without the appropriate training and background knowledge to perform well on these tasks? One of the interesting things Ben’s team has noticed is that, “There’s basically no correlation whatsoever between expertise and who wins competitions. We have the occasional competition that was targeting astronomy and was won by an astronomer, but for every one like that we have ten where an English major won a bioinformatics competition, or a physicist won a brain signal processing competition.”
Fascinating, isn’t it? Perhaps this indicates that machine learning expertise really consists of a core set of skills that are highly transferable. Working on different kinds of problems can help identify and develop these skills further. It could also be the case that participants without prior exposure to a problem domain may have a unique perspective on it that allows them to think outside the box, and try techniques that others may not have considered. As Ben puts it, “The diversity of the people who win competitions is the single most surprising thing to me.”
How Kaggle Competitors Stack Up Against The Experts
This all still raises the question: is Kaggle ultimately just a place for machine learning hobbyists to nerd out and win bragging rights, or is there any real value to it? How do the winning solutions compare with the state-of-the-art? In one of Ben’s favorite competitions, they “took a dataset of about 28,000 student-written essays from middle-school and high-school students…and challenged the community to develop automatic methods to grade them.” When the top solutions were compared with grades assigned by two human teachers, “the machine’s scores agreed as often with one of the teachers as the other teacher agreed with the first teacher.”
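To make that comparison concrete, one simple way to measure this kind of agreement is the fraction of essays on which two graders assign the same score. The sketch below is only an illustration: the scores are made up, the `agreement_rate` helper is hypothetical, and the interview does not say which metric the competition actually used.

```python
def agreement_rate(scores_a, scores_b):
    """Return the fraction of items on which two graders gave the same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical scores for eight essays (illustrative only, not competition data).
teacher_1 = [3, 4, 2, 5, 3, 4, 1, 2]
teacher_2 = [3, 4, 3, 5, 2, 4, 1, 2]
model = [3, 3, 2, 5, 3, 4, 2, 2]

# How often the two teachers agree with each other.
human_vs_human = agreement_rate(teacher_1, teacher_2)
# How often the model agrees with the first teacher.
model_vs_human = agreement_rate(model, teacher_1)

print(f"teacher vs. teacher: {human_vs_human:.2f}")
print(f"model vs. teacher:   {model_vs_human:.2f}")
```

In this toy data the two rates come out equal, which is the situation Ben describes: the model disagrees with a teacher about as often as a second teacher would.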
In another competition around a cardiac diagnostics problem, Ben says, “we had a dataset of about 500 patient MRI images that were videos of their beating hearts, and we challenged participants to do the same work that a cardiologist would do in estimating certain properties of the heart from the MRI images. This is a task that takes a trained cardiologist about 20 minutes and it’s pretty much wasted time for the cardiologist when they could be doing more valuable things in a clinical setting.” Amazingly enough, “the performance of the top 20 or so Kaggle teams approximated that of a human cardiologist.”
Collaboration and Community
I’m starting to sense that there is something incredible going on here! Beyond individual competitions, Kaggle seems to be having a significant impact on the entire field of machine learning. “Kaggle is a really great distribution platform for innovative techniques, because whenever someone sees the first person on the leaderboard in a competition that they participated and competed in, they spent many hours and they really really want to know what beat them and how did that winning person win. So that means if there’s a technique that tends to do well in a single competition we have a thousand or ten thousand participants poring over the details of that technique and how it works. And that makes Kaggle a really effective grounds both for proving that this technique on a problem really works as well as growing a large-scale adoption of that.”
Wait a second, aren’t these meant to be competitions? Why would participants reveal the solutions they put in so much effort to develop? Ben explains, “Yes, usually during a competition itself the top people in our community are incredibly competitive. But after the competition is over, it swaps to a very collaborative environment where in most cases there’s a winning interview published with the top teams, where they describe the solution in depth. In many cases we actually release code for the winning solution as well. All the competitions we host for research require the top three participants to open source their solutions.”
Doubling down on the community, Kaggle is working to lower the barrier for entry and foster better collaboration using tools such as Kaggle Scripts. “It immediately drops you into an R or Python or Julia computational environment that’s both linked to the competition dataset, as well as all the painful-to-install analytics packages that you need to use.” These scripts and their results are automatically published for anyone to fork and improve upon. They are also a great resource for beginners to study and learn from.
Building Your Machine Learning Career With Kaggle
The benefits of being active on Kaggle are not limited to the rich set of problems available and the support from the community. Being active on Kaggle is also an important part of building a career in machine learning. Some competitions are explicitly meant for recruiting, for instance Facebook’s check-in identification problem, where a high rank can get you an interview for a Machine Learning Software Engineer position. Kaggle also helps connect recruiters and candidates through its Jobs Board. But even otherwise, “Employers pay a lot of attention to your Kaggle profile…Strong competition performances are a really effective way to stand out from the crowd and demonstrate how well you can do on a problem.”
In a short period of time, Kaggle has evolved from being just another competition platform to a hub for applied machine learning research. It serves as a bridge between those with hard data-centric problems and passionate participants with an appetite for a challenge. Perhaps this sort of collaboration is exactly what we need to sustain progress. Ben agrees that machine learning is still in its infancy. “Machine learning is going to become just as easy to create and use as an iPhone app is or a web app is. So it’s not going to be this magical thing that requires a small number of really specialized experts to create and use. Access to machine learning will become much more commoditized and democratized. And then that’s really going to enable a huge wave of new applications.”
You can watch our full interview with Ben Hamner (@benhamner) here. If you’re interested in joining the machine learning community (or deepening your involvement), experiencing Kaggle firsthand with the support of experienced competitors, and connecting directly to career opportunities in the field, consider enrolling in Udacity’s Machine Learning Engineer Nanodegree program. It’s the best way to master critical technical skills while enjoying mentored immersion in the exciting world of machine learning and laying the foundations for your future career!