Repository Node

        Data enters the machine learning workbench through the Repository node. The node collects data from the workflow and stores it in a MySQL database. When the data is needed, it is loaded into a Python pandas DataFrame, where it is available for profiling, transformation, and modeling.
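
        The hand-off from database to DataFrame can be pictured with a short sketch. It is only an illustration, not the platform's actual code: Python's built-in sqlite3 stands in for the MySQL store so the example runs without a server, and the table name `repository_data` and the columns `device_id`, `ts`, and `temperature` are hypothetical.

```python
import sqlite3

import pandas as pd

# Stand-in for the repository's MySQL store (SQLite is an assumption made
# only so that this sketch runs locally without a database server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repository_data (device_id TEXT, ts TEXT, temperature REAL)"
)
conn.executemany(
    "INSERT INTO repository_data VALUES (?, ?, ?)",
    [
        ("pump-1", "2024-01-01 00:00:00", 21.5),
        ("pump-1", "2024-01-01 00:05:00", 21.9),
        ("pump-2", "2024-01-01 00:00:00", 19.2),
    ],
)
conn.commit()

# Load the retained records into a DataFrame, parsing the timestamp column.
df = pd.read_sql_query(
    "SELECT device_id, ts, temperature FROM repository_data ORDER BY ts, device_id",
    conn,
    parse_dates=["ts"],
)
conn.close()

# The DataFrame is now ready for profiling, for example summary statistics.
print(df.describe())
```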

        You must specify at least one data field name to store; click '⊕ Add Value' to add more fields. The Repository node saves every listed field and retains the data for a moving window equal to the retention period.
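
        One way to picture the moving window is as a filter on record timestamps. The sketch below is an assumption about the behavior, not the node's implementation: the function `apply_retention`, the `timestamp` column, and the choice to drop records exactly at the cutoff are all hypothetical.

```python
import pandas as pd


def apply_retention(df, now, period, period_type):
    """Keep only rows whose timestamp falls inside the moving window.

    period_type is assumed to be one of "days", "months", or "years".
    """
    if period_type not in ("days", "months", "years"):
        raise ValueError(f"unsupported period type: {period_type}")
    # The window ends at `now` and starts `period` units earlier.
    cutoff = now - pd.DateOffset(**{period_type: period})
    return df[df["timestamp"] > cutoff].reset_index(drop=True)


records = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2023-01-01", "2023-06-01", "2024-01-15"]),
        "temperature": [20.1, 22.4, 19.8],
    }
)

# With a one-year window evaluated on 2024-02-01, the 2023-01-01 record
# is older than the cutoff (2023-02-01) and drops off.
kept = apply_retention(records, pd.Timestamp("2024-02-01"), 1, "years")
print(kept)
```

        As time advances, the cutoff advances with it, so the oldest records fall out of the window while new ones are added.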

        Adding a Repository node to your workflow

        1. First, select your desired solution and navigate to Rayven Workflow.
        2. Select ‘Machine Learning’ from the left-hand panel.
        3. Find the Repository node and drag it onto the canvas.
        4. Provide input to the Repository node by connecting it to your node of interest.
        5. Double-click the Repository node to open its configuration window.

        Configuring your Repository node

        1. First, give your node a Name. Choose something simple that clearly explains its purpose.
        2. Set the Retention Period for the data. The node begins saving data as soon as you save it, but it does not pull in any historical data. The retention period is a moving window: if you select one year, the node takes 365 days to fill, after which it holds the most recent year of data and drops older records.
        3. Enter the Retention Period Type. This value can be days, months, or years.
        4. The workflow uses the combination of device and timestamp as a unique key for each record. By default, a record whose device and timestamp match an existing record is assumed to be fresher and overwrites it. Selecting Device Unique Index and/or Timestamp Unique Index modifies this behavior and determines whether records are overwritten. You can also use fields other than Device and Timestamp as unique keys.
        5. Enter the Data Field Name where you want to display the Machine Learning data.
        6. Select the SQL Data Type to save.
        7. Once you have connected and saved your Repository node, it is ready to collect data from the workflow.
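
        The default overwrite behavior described in step 4 can be sketched in pandas. This is an illustration under assumptions rather than the node's actual logic: the function `upsert` and the column names `device`, `timestamp`, and `value` are hypothetical, and the sketch covers only the default case in which the device and timestamp together form the unique key. How the Device Unique Index and Timestamp Unique Index options change that key is not modeled here.

```python
import pandas as pd


def upsert(store, incoming, key=("device", "timestamp")):
    """Merge incoming records into the store; newer rows win on a key clash."""
    combined = pd.concat([store, incoming], ignore_index=True)
    # Incoming rows come last, so keep="last" lets them overwrite older rows
    # that share the same key.
    return combined.drop_duplicates(subset=list(key), keep="last").reset_index(
        drop=True
    )


t1 = pd.Timestamp("2024-01-01 00:00:00")
t2 = pd.Timestamp("2024-01-01 00:05:00")

store = pd.DataFrame(
    {"device": ["A", "B"], "timestamp": [t1, t1], "value": [1.0, 2.0]}
)
incoming = pd.DataFrame(
    {"device": ["A", "A"], "timestamp": [t1, t2], "value": [9.0, 3.0]}
)

# The incoming (A, t1) record replaces the stored one; (A, t2) is appended.
result = upsert(store, incoming)
print(result)
```

        Passing a different `key` tuple, such as another field name, mirrors the idea of using fields other than Device and Timestamp as unique keys.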