Source – https://www.datanami.com/
Databricks today unveiled a new cloud-based machine learning offering that’s designed to give engineers everything they need to build, train, deploy, and manage ML models.
The new offering is designed to bridge a gap in existing machine learning products, which Databricks says arises from focusing too narrowly on data engineering, ML model creation, or the deployment aspects of the machine learning lifecycle.
“Many ML platforms fall short because they ignore a key challenge in machine learning: they assume that data are available at high quality and ready for training,” Databricks says in its announcement. “That requires data teams to stitch together solutions that are good at data but not AI, with others that are good at AI but not data.”
To address this gap, Databricks lets users switch between user “experiences” that it exposes, including data science/engineering, SQL analytics, and machine learning experiences, to access tools and features relevant to their everyday workflow.
The new Databricks offering includes two major components: AutoML and Feature Store.
Databricks AutoML, like other AutoML solutions, automates many of the steps that data scientists must manually go through in terms of experimenting and testing different machine learning models. But instead of working like a “black box,” like other AutoML offerings, the new Databricks offering works like a “glass box,” according to Databricks vice president of marketing Joel Minnick.
AutoML will give data scientists the ability to peer into the inner workings of the model to see how things are working, either through the Databricks UI, through an API call, or through the notebook interface, he says.
“We will do everything that you expect an AutoML tool to do, like analyzing the data, figuring out the features, training and tuning the model,” he says. “But what we give you at the end of that process is all the experiments we ran, all of the notebooks we auto-generated as a result of those [runs], and let you compare those models and decide, perhaps I’d be willing to give up a point of accuracy if I can get inference 200 milliseconds faster with this model versus perhaps the most accurate model.”
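The trade-off Minnick describes, accepting a small accuracy loss in exchange for much faster inference, can be sketched in a few lines of plain Python. The candidate models, metrics, and selection rule below are illustrative assumptions, not Databricks' actual API; in the real product these metrics would come from the auto-generated experiment notebooks:

```python
# Hypothetical AutoML candidates: (model name, validation accuracy,
# inference latency in milliseconds). Numbers are made up for illustration.
candidates = [
    ("xgboost_best", 0.941, 420),
    ("lightgbm_tuned", 0.933, 210),
    ("logistic_baseline", 0.902, 15),
]

def pick_model(candidates, max_accuracy_drop=0.01):
    """Return the fastest candidate within max_accuracy_drop of the
    most accurate model's validation accuracy."""
    best_acc = max(acc for _, acc, _ in candidates)
    eligible = [c for c in candidates if best_acc - c[1] <= max_accuracy_drop]
    return min(eligible, key=lambda c: c[2])

chosen = pick_model(candidates)
print(chosen)  # trades under a point of accuracy for ~200 ms faster inference
```

Here the selection rule surfaces `lightgbm_tuned`: it gives up less than a point of accuracy versus the best model but halves the latency, exactly the kind of comparison Minnick says the "glass box" approach enables.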
If edge cases arise and the model is acting up, Databricks AutoML lets the data scientist “dive into the notebook code and have full control over that notebook code, so that if I do need to spend some time accounting for edge cases, doing some additional tuning, I can do that,” Minnick says. “I can also understand how this model works, so if I have to explain to regulatory or to compliance authorities exactly how I’m making these decisions, I’m able to do that and be sure that that transparency is there.”
Feature Store, meanwhile, automatically tracks the machine learning features that are core to the functioning of the machine learning models. This helps to ensure the performance of the model doesn’t drift over time. The Feature Store is integrated with Databricks’ Delta Lake platform and uses Delta Lake APIs. It’s also integrated with MLflow, which simplifies ongoing management of the model.
The features are embedded directly into the models when packaged with MLflow, which simplifies the process of changing the features at a later time, Minnick says.
“So I never have to engage the application engineering team to make changes to the client application just because I’m evolving the features,” he tells Datanami. “This is a way for us to help customers get models to production and iterate those models faster and easier than before.”
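The idea behind that claim can be sketched in a toy example: because the packaged model carries the list of features it needs, the client application sends only an entity key, and feature values are looked up at inference time. The `FeatureStore` and `PackagedModel` classes below are hypothetical stand-ins, not the Databricks Feature Store API:

```python
# Toy sketch of feature-store-backed inference. The client passes only an
# entity id; the model's embedded feature list is resolved against the store,
# so evolving the features never changes the client-side call.

class FeatureStore:
    """Tiny in-memory stand-in for a feature store keyed by entity id."""
    def __init__(self):
        self.tables = {}  # feature name -> {entity_id: value}

    def write(self, feature, entity_id, value):
        self.tables.setdefault(feature, {})[entity_id] = value

    def lookup(self, features, entity_id):
        return [self.tables[f][entity_id] for f in features]

class PackagedModel:
    """Model bundled with the names of its input features, mimicking the
    feature metadata the article says MLflow packaging embeds."""
    def __init__(self, feature_names, weights, store):
        self.feature_names = feature_names
        self.weights = weights
        self.store = store

    def predict(self, entity_id):
        # Feature resolution happens here, inside the packaged model,
        # not in the client application.
        x = self.store.lookup(self.feature_names, entity_id)
        return sum(w * v for w, v in zip(self.weights, x))

store = FeatureStore()
store.write("avg_spend", "user_42", 120.0)
store.write("num_visits", "user_42", 8.0)

model = PackagedModel(["avg_spend", "num_visits"], [0.01, 0.5], store)
print(model.predict("user_42"))  # client supplied only the key "user_42"
```

Adding or renaming a feature in this sketch means updating the store and repackaging the model; the `predict("user_42")` call the client makes is unchanged, which is the decoupling Minnick describes.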