Machine Learning Workflow
The machine learning process involves providing a computer with data and known answers so that it can derive a program relating the two.
We can separate the machine learning workflow into three main steps:
- Data Collection: gathering the necessary data and answers from the solution workflow.
- Model Development and Testing: developing and testing the machine learning program within the machine learning workflow.
- Model Deployment: deploying the program in the solution workflow, exposing it to new data so it can find answers.
Each step takes place within your solution as the Machine Learning Workbench integrates seamlessly with the rest of the platform.
Data Collection
Data collection begins in the solution workflow, where all your streaming data exists. You can gather data for machine learning from the solution workflow using a Repository Node.
Once configured, the repository node will begin collecting data from the workflow. You can configure the node to store this data for as long as required. Data stored in the repository node will be available from the machine learning workbench.
When configuring your repository node, consider what data you require to explore your hypothesis. Remember that machine learning requires broad and deep data. Ensure you gather data for several variables across wide and relevant ranges for representative time frames. If in doubt, include the data, as the modeling process will remove irrelevant and weak variables.
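The breadth checks described above can be sketched in pandas, since the workbench exposes repository data as a DataFrame. The column names and values below are purely hypothetical, standing in for whatever variables your repository node collects:

```python
import pandas as pd

# Hypothetical sensor data, as it might arrive from a repository node.
df = pd.DataFrame({
    "temperature": [18.2, 21.5, 25.1, 30.4, 19.8, 22.3],
    "humidity":    [40.0, 55.0, 61.0, 70.0, 45.0, 52.0],
    "vibration":   [0.01, 0.02, 0.02, 0.05, 0.01, 0.03],
})

# Quick breadth check: does each variable cover a wide, representative range?
coverage = df.agg(["min", "max", "count"])
print(coverage)

# Flag near-constant (weak) variables that modeling would likely discard anyway.
weak = [c for c in df.columns if df[c].std() < 1e-6]
print("near-constant variables:", weak)
```

A quick summary like this helps confirm you have gathered several variables across wide ranges before you start modeling.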
For a detailed explanation of the node and its configuration, see Repository Node.
Model Development and Testing
Machine learning is an exploratory and iterative process. In recognition of this, we created the Machine Learning Workbench, an integral part of the platform that you can access from any solution by selecting 'Machine Learning' from the left-hand panel.
Using the Machine Learning Workbench, you can create a new workflow or return to any you have previously saved. Selecting the 'Models' tab will display any existing machine learning models.
Machine learning workflows are interactive development canvases that allow you to analyze data, explore and test multiple ideas, save your work, and return to it as needed. The flowchart format makes the flow of data, logic, and analysis visible, keeping your workings coherent and transparent for other contributors. The flowchart also acts as an interactive interface to Python: each node comes with prepackaged functionality, from simple building blocks such as a Pandas DataFrame object to complex neural networks, so you can focus less on coding and more on analysis.
The Rayven library of machine learning nodes is regularly reviewed and expanded: if you don't see what you need, please speak to your Rayven Support Representative.
To begin a new machine learning workflow, select a repository node from the node menu and drag it onto the canvas. Repository nodes in the machine learning workflow are effectively windows into the repository node created in your solution workflow; instead of seeing variable names and types, you will see the data itself as a DataFrame. From here, you can visualize, analyze, or model your data by adding new machine learning nodes and connecting them to the node you want to extend. You can also create multiple branches with different analyses from the same data pre-processing steps.
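In DataFrame terms, branching means feeding one shared pre-processing result into several independent analyses. The data and analyses below are hypothetical, chosen only to show the shape of the pattern:

```python
import pandas as pd

# Shared pre-processing step feeding two analysis branches.
df = pd.DataFrame({"flow": [1.0, 2.0, 2.5, None, 3.5]})
prepared = df.dropna()  # one pre-processing node, used by both branches

# Branch 1: summary statistics, e.g. for a visualization node.
summary = prepared.describe()
print(summary)

# Branch 2: a rolling mean, e.g. for a trend-analysis node.
trend = prepared["flow"].rolling(window=2).mean()
print(trend)
```

Because both branches start from `prepared`, a change to the pre-processing step flows through to every analysis built on it.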
Remember to save your workflow: unlike the solution workflow, you must save your machine learning workflow before closing the canvas, or your work will be lost.
Model Deployment
Saving a canvas will save the state of your work. However, it won't save the model or make it available to your workflow.
You must add a 'save model' node to your machine learning workflow to save a model. Before doing so, you might want to run a survival analysis to ensure the model is functional.
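Conceptually, saving a model means serializing the trained object so it can be loaded again later. The sketch below illustrates the idea with Python's pickle module and a deliberately trivial stand-in "model"; the platform's 'save model' node handles the actual persistence for you:

```python
import io
import pickle

# Hypothetical trained "model": here, just learned min/max bounds for a sensor.
model = {"min": 18.0, "max": 30.0}

# Serialize the trained object to bytes (as a 'save model' node might,
# conceptually), then load it back to confirm the round trip.
buffer = io.BytesIO()
pickle.dump(model, buffer)

buffer.seek(0)
restored = pickle.load(buffer)
print(restored)
```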
To deploy your machine learning model, you will need to expose it to your solution workflow. This process is essentially the reverse of our first step with the repository node: there, we extracted data from the solution workflow for use in the machine learning workbench; now that the data has been analyzed and modeled, we want to send the results back to the solution workflow.
You can deploy your model using a Machine Learning Model Node.
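What deployment amounts to, conceptually, is applying the saved model to each new record as it arrives in the solution workflow. The sketch below uses a hypothetical threshold model and made-up records purely for illustration; the Machine Learning Model Node handles the actual plumbing:

```python
# Hypothetical saved model: flag readings outside a learned range.
model = {"min": 18.0, "max": 30.0}

def predict(model, record):
    """Apply the model to one incoming record (a 'normal reading?' answer)."""
    return model["min"] <= record["temperature"] <= model["max"]

# New data arriving in the solution workflow, one answer per record.
incoming = [{"temperature": 22.5}, {"temperature": 35.0}]
answers = [predict(model, r) for r in incoming]
print(answers)
```

Each answer can then flow onward through the solution workflow like any other data.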