Source: cio.com
Advanced analytics techniques, such as artificial intelligence and machine learning, provide organizations with new insights not possible with traditional analytics. To take advantage of these technologies and drive competitive advantage, organizations need to design and build solutions that allow them to exponentially grow their capacity to create value from data.
The challenge is how to do that without also exponentially growing infrastructure costs and the number of data scientists needed to meet business demand. The answer lies in industrializing the process with a data factory model.
- How do you drive innovation with AI/ML technologies?
As AI/ML technologies, packaging, frameworks and tooling emerge rapidly, there is a real need to evaluate these new capabilities and understand the impact they might have on your business. The right place to do that is an R&D lab. Alongside this technology-led approach, your Lean Innovation team will also be scanning the technology horizon to fill any engineering gaps. Close cooperation between these teams creates a melting pot of innovation, which is exactly what's needed to survive and thrive over the long term in a disruptive business climate.
- How do you prioritize horizon 1 activities?
As strategic developments progress, they mature and move into planning horizon 1, provided they are still viewed as adding value to the business. Given effectively unlimited demand and the finite resources available in most organizations, you need to decide which ones to focus on. Prioritization should be based on a combination of factors, including overall strategy, likely value and current business priorities, as well as the availability of the required data. The data doesn't need to be available in its final form at this stage, but you will likely need some of it accessible to start the discovery process.
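One way to make that combination of factors explicit is a weighted score. The sketch below is a hypothetical illustration, not a method from the article: the 0-1 ratings, the weights in `WEIGHTS`, the `min_data` cut-off (standing in for "some data accessible to start discovery") and the `capacity` limit are all assumptions each organization would set for itself.

```python
from dataclasses import dataclass

# Hypothetical weights for the factors named above; tune to your own strategy.
WEIGHTS = {"strategy": 0.3, "value": 0.3, "priority": 0.2, "data": 0.2}


@dataclass
class Candidate:
    name: str
    strategy: float  # alignment with overall strategy, rated 0-1
    value: float     # estimated business value, rated 0-1
    priority: float  # fit with current business priorities, rated 0-1
    data: float      # availability of the required data, rated 0-1


def score(c: Candidate) -> float:
    """Weighted sum of the four factor ratings."""
    return (WEIGHTS["strategy"] * c.strategy
            + WEIGHTS["value"] * c.value
            + WEIGHTS["priority"] * c.priority
            + WEIGHTS["data"] * c.data)


def prioritize(candidates, capacity, min_data=0.2):
    """Return the top `capacity` candidates by score.

    Candidates whose data availability is below `min_data` are dropped,
    because discovery cannot start without at least some accessible data.
    """
    eligible = [c for c in candidates if c.data >= min_data]
    return sorted(eligible, key=score, reverse=True)[:capacity]
```

For example, a candidate with strong value but almost no accessible data is filtered out before ranking, however well it scores on the other factors. Keeping the hard data filter separate from the weighted score reflects the article's point that some data must exist before discovery can begin.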
- How do you maximize data scientist productivity?
If you crowd your data scientists around a single production line with one set of tools and shared resources, they can't help but get in each other's way. Data scientists will be much more productive if each has an isolated environment, tailored to the challenge at hand and equipped with tools they know. That way, they can independently determine the speed of their production line, which tools they use and how those tools are laid out.
- How do you address data supply chain and quality issues?
To avoid interruptions in production, the supply chain needs to deliver the data just in time for it to be assembled into the end product, and the data needs to be of acceptable quality. Validation shouldn't happen right next to the production line. Push it as far upstream as possible, so that it doesn't interfere with production and so that any quality problems can be addressed before the data reaches the line.
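As a rough illustration of an upstream quality gate, the sketch below checks each record at ingestion, quarantines failures, and rejects the whole batch if too many records fail, so the problem goes back to the supplier rather than surfacing on the production line. The `SCHEMA` fields and the `max_reject_rate` threshold are hypothetical examples, not part of the article.

```python
from typing import Any

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "customer_id": (str, True),
    "amount": (float, True),
    "region": (str, False),
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one record (an empty list means it passes)."""
    problems = []
    for field, (ftype, required) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"missing {field}")
        elif not isinstance(value, ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    return problems


def validate_batch(records, max_reject_rate=0.05):
    """Split a batch into clean and quarantined records at ingestion time.

    Raises ValueError when the share of failing records exceeds
    `max_reject_rate`, so the batch is fixed upstream instead of being
    passed on to the data scientists.
    """
    clean, quarantined = [], []
    for rec in records:
        problems = check_record(rec)
        if problems:
            quarantined.append((rec, problems))
        else:
            clean.append(rec)
    if records and len(quarantined) / len(records) > max_reject_rate:
        raise ValueError(
            f"{len(quarantined)} of {len(records)} records failed validation")
    return clean, quarantined
```

Quarantining rather than silently dropping records keeps the evidence the upstream team needs to fix the source.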
Data scientists also need to be able to save data iteratively as they integrate and wrangle source datasets and generate additional facets. In legacy environments, this can mean significant delays and costs because the data is replicated multiple times. With modern storage technologies (for example, snapshots and copy-on-write clones), replicas take near-zero additional capacity and time to create.
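The reason such replicas are nearly free is copy-on-write: a snapshot records references to existing data blocks, and new space is consumed only for blocks that change afterwards. The toy class below is a hypothetical Python illustration of that principle, not the behavior or API of any particular storage product.

```python
class CowDataset:
    """Toy copy-on-write store: snapshots share unchanged blocks by reference."""

    def __init__(self, blocks):
        self._blocks = list(blocks)  # the live working copy
        self._snapshots = {}

    def snapshot(self, name):
        # Copies only the list of references, not the block data itself.
        self._snapshots[name] = tuple(self._blocks)

    def write(self, index, new_block):
        # Only the changed block is replaced; all other blocks stay shared.
        self._blocks[index] = new_block

    def read(self, name=None):
        """Read the live copy, or a named snapshot if `name` is given."""
        source = self._blocks if name is None else self._snapshots[name]
        return list(source)

    def unique_blocks(self):
        """Count distinct block objects held by the live copy and all snapshots."""
        seen = {id(b) for b in self._blocks}
        for snap in self._snapshots.values():
            seen.update(id(b) for b in snap)
        return len(seen)
```

Snapshotting a four-block dataset and then rewriting one block leaves five distinct blocks in storage rather than eight, while both the snapshot and the live copy still read back complete.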
- How do you accelerate the time to production?
When a data scientist finds something of value that should go into production, the work package can be added to the Agile Development team's backlog.
The same supply chain rules apply to the Agile Development team as to the data scientists: they also need everything ready and at their disposal to accelerate the process, including the right environment and their chosen toolchain.
In a traditional approach, unpredictable provisioning times make it hard to know when data science work can start. Addressing this one issue improves planning not only for data science but also for the downstream development and testing steps, making the entire process more predictable and efficient.
- How do you know if the model is performing as designed?
Once the new data product has changed the business process in some way, you need a way to monitor its performance: to confirm that real-world results are as expected, and to trigger either management attention or a simple model rebuild when performance falls below acceptable limits. In practice, this often means adding a new measure or report to an existing business intelligence solution or real-time monitoring dashboard, something that's best "designed in" from the start.
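A minimal sketch of that monitoring logic, assuming the model's real-world performance can be reduced to a single score per period (for example, accuracy against observed outcomes). The rolling `window`, the two thresholds and the choice to request a rebuild for a mild decline but escalate a severe one are hypothetical design choices, not prescribed by the article.

```python
from collections import deque
from statistics import mean


class ModelMonitor:
    """Track a rolling performance metric and decide which action to trigger.

    Hypothetical thresholds: a rolling average below `rebuild_below` requests
    a routine rebuild; below `escalate_below` it needs management attention.
    """

    def __init__(self, window=5, rebuild_below=0.80, escalate_below=0.70):
        self.scores = deque(maxlen=window)
        self.rebuild_below = rebuild_below
        self.escalate_below = escalate_below

    def record(self, score):
        """Add the latest observed score and return the resulting action."""
        self.scores.append(score)
        return self.action()

    def action(self):
        if len(self.scores) < self.scores.maxlen:
            return "ok"  # not enough observations yet to judge a trend
        avg = mean(self.scores)
        if avg < self.escalate_below:
            return "escalate"
        if avg < self.rebuild_below:
            return "rebuild"
        return "ok"
```

Averaging over a window keeps a single bad period from triggering a rebuild, at the cost of reacting a little more slowly to a genuine decline. The returned action could feed an existing BI report or dashboard alert.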