Source: cio.com
Data science may never be easy, but it’s getting easier to dive in. Buzzwords like “machine learning,” “regression,” and “dimensionality reduction” are just as challenging to understand as ever, but the widespread desire to reap the benefits of these techniques has resulted in several good tools that build assembly lines for data, ready to pump out the answers we seek.
Data scientists used to wring their hands because 80 percent of the work was preparing data for analysis: crafting custom routines in Python, Java or their favorite language, all so the sophisticated statistical tools in R or SAS could do their job. The marketplace is now filling with tools that bundle several hundred well-engineered routines into a package that does much of the repetitive and unpleasant data cleanup and standardization for you.
These new tools open the field to anyone who’s comfortable working with a spreadsheet. They won’t make all prep work disappear, but they’ll make it easier. There’s less need to fuss over data formats because the tools are smart enough to do the right thing. You can often just open the file and start learning.
The secret is similar to what revolutionized manufacturing. Just as standardized parts helped launch the industrial revolution, data scientists at various tool vendors have produced a collection of very powerful and very adaptive analytical routines. They’ve standardized the interfaces, making it much simpler to build your custom pipeline out of these interchangeable data science tools.
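To make the interchangeable-parts idea concrete, here is a minimal sketch in plain Python. It is not taken from any particular vendor's product: the step names (`strip_whitespace`, `drop_incomplete`, `lowercase_field`), the `build_pipeline` helper and the sample records are all hypothetical, chosen only to illustrate the point. Every step accepts a list of rows and returns a list of rows, so steps can be reordered or swapped without touching the rest of the line.

```python
from typing import Callable, Dict, List, Optional

# The shared "interface": every step takes a list of rows and returns a list of rows.
# (Row, Step and all function names below are illustrative assumptions.)
Row = Dict[str, Optional[str]]
Step = Callable[[List[Row]], List[Row]]


def strip_whitespace(rows: List[Row]) -> List[Row]:
    """Trim stray spaces from every text value, leaving missing values alone."""
    return [
        {key: value.strip() if value is not None else None for key, value in row.items()}
        for row in rows
    ]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Discard any row that has a missing or empty value."""
    return [row for row in rows if all(value not in (None, "") for value in row.values())]


def lowercase_field(field: str) -> Step:
    """Build a step that lowercases one named column.

    Assumes the column has no missing values, so run it after drop_incomplete.
    """
    def step(rows: List[Row]) -> List[Row]:
        return [{**row, field: row[field].lower()} for row in rows]
    return step


def build_pipeline(*steps: Step) -> Step:
    """Chain steps so the output of one feeds the next."""
    def run(rows: List[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run


# Made-up sample records with the usual mess: padding, a gap, inconsistent case.
raw = [
    {"name": "  Ada ", "city": "London"},
    {"name": "Grace", "city": None},
    {"name": "Linus ", "city": " HELSINKI"},
]

pipeline = build_pipeline(strip_whitespace, drop_incomplete, lowercase_field("city"))
cleaned = pipeline(raw)
print(cleaned)
```

Because each step honors the same rows-in, rows-out contract, adding a new cleanup routine means writing one more function and listing it in `build_pipeline`, which is roughly what the commercial tools offer through drag-and-drop interfaces.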