Source – https://siliconangle.com/
More than a year after announcing plans to automate the feature engineering phase of artificial intelligence projects, Seattle-based startup Kaskada Inc. is bringing its first product to market.
Kaskada says it aims to democratize feature engineering, an often laborious process that requires data scientists to select, clean and validate the data to be fed into machine learning training models prior to moving them into production.
A model intended to predict housing prices, for example, would be feature engineered with predictor data such as the square footage of properties, number of bedrooms and location. The larger and more complete the training data set, the better the results.
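As a rough illustration of what that step involves, the Python sketch below turns one raw property record into a numeric feature vector. The `Listing` class, the `LOCATIONS` vocabulary and the `to_feature_vector` function are hypothetical names invented for this example; they are not part of Kaskada's product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Listing:
    """One raw property record (hypothetical schema for illustration)."""
    square_feet: float
    bedrooms: int
    location: str


# Assumed, fixed vocabulary of locations used for one-hot encoding.
LOCATIONS = ["downtown", "suburb", "rural"]


def to_feature_vector(listing: Listing) -> List[float]:
    """Encode a listing as numbers: size, bedroom count, one-hot location."""
    one_hot = [1.0 if listing.location == loc else 0.0 for loc in LOCATIONS]
    return [listing.square_feet, float(listing.bedrooms)] + one_hot


print(to_feature_vector(Listing(1500.0, 3, "suburb")))
# prints [1500.0, 3.0, 0.0, 1.0, 0.0]
```

In practice the selecting, cleaning and validating steps would sit in front of an encoder like this one; the sketch covers only the final encoding.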
The resources required to collect data and move machine learning models into production can be so significant that the capabilities are out of reach of all but the largest companies. Kaskada says its platform features a collaborative interface for team engineering and a proprietary data infrastructure for computing across event-based data and serving features in production.
“We are focused on building the bridge between training and production,” said Davor Bonaci, Kaskada’s chief executive and a former software engineer at Google LLC and Microsoft Corp. “We are launching a self-service platform to help data scientists get work into production by automating infrastructure. You can onboard and don’t have a big adoption curve or need to get everybody in your organization to agree to try it.”
The company’s self-service platform is a self-contained data science studio with pre-built machine learning models. The feature vectors needed to support those models are provided via an application programming interface. “You get up-to-the-moment feature vectors for functions like real-time fraud detection,” Bonaci said. “You don’t have to write data pipelines or process streaming data. We run the data processing needed for the model.”
Event-driven focus
Kaskada’s platform has undergone some changes since it was announced, the most significant of which is a greater focus on event-driven data collection. That’s a type of processing that makes decisions in response to real-time events such as mouse clicks and transactions.
Event-driven processing is especially useful in scenarios like predicting the probability that a customer will buy a product or that a credit card transaction will be fraudulent. Real-time data handling requires an efficient data infrastructure to calculate features at arbitrary points in time and to deliver them to both training and production environments. “We have built a lot of functionality to think in terms of time,” Bonaci said.
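To make "features at arbitrary points in time" concrete, here is a minimal Python sketch of one such feature: the number of transactions on a card in the window leading up to a given moment. It assumes sorted event timestamps in seconds, and `count_in_window` is a hypothetical helper written for this example, not Kaskada's API. Only events strictly before the chosen moment are counted, so the same function could serve both historical training rows and live requests without leaking future data.

```python
from bisect import bisect_left
from typing import List


def count_in_window(event_times: List[float], as_of: float, window: float) -> int:
    """Count events t with as_of - window <= t < as_of.

    event_times must be sorted in ascending order.
    """
    lo = bisect_left(event_times, as_of - window)  # first event inside the window
    hi = bisect_left(event_times, as_of)           # first event at or after as_of
    return hi - lo


# Hypothetical transaction timestamps (seconds) for a single card.
card_events = [10.0, 20.0, 95.0, 100.0, 130.0]

# Feature value as it would have looked at t=100, over the prior 60 seconds.
print(count_in_window(card_events, as_of=100.0, window=60.0))
# prints 1
```

Excluding the event at `as_of` itself is a design choice: when scoring a transaction for fraud, the feature should describe what was known just before that transaction arrived.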
The company has also focused more of its attention on automating the data science process rather than data engineering. Those two functions are supposed to work in tandem but frequently fail to communicate effectively because data scientists are focused on data and engineers on getting models into production.
“There can be friction getting into production because science and engineering teams have different values,” Bonaci said. “We reduce the friction needed to get work into production.”
Kaskada is a cloud-native service that customers can deploy in their own cloud instances, run as a managed service or install on local infrastructure. The company offers a distinctive pricing model that includes a free tier with limited data capacity, curated public datasets, sample projects and individual commit and version histories. Paid plans support team development, batch data uploads, direct data connection and real-time features. Pricing details weren’t provided.