3. SustainML Framework Data Model

Following are the Data Structures definitions using the Interface Definition Language specification.

Every node in the SustainML Framework needs to provide a continuous feedback status to the Orchestrator Node. This is modeled with the NodeStatus data structure.

3.1. Node Status Type

#NodeStatus.idl

enum Status {
        INACTIVE,
        INITIALIZING,
        IDLE,
        RUNNING,
        ERROR,
        TERMINATING};

enum TaskStatus {
        WAITING,
        RUNNING,
        ERROR,
        SUCCEEDED};

enum ErrorCode {
        NO_ERROR,
        INTERNAL_ERROR};

struct NodeStatus
{

    Status node_status;
    TaskStatus task_status;
    ErrorCode error_code;
    @key long task_id;
    string error_description;
    @key string node_name;
};

3.2. Node Control Type

In order to correctly manage the lifecycle of nodes and tasks, a NodeControl data structure is defined. This data structure is internally used in the communication library.

#NodeControl.idl

enum CmdNode {
    NO_CMD,
    CMD_START_NODE,
    CMD_STOP_NODE,
    CMD_RESET_NODE,
    CMD_TERMINATE_NODE};

enum CmdTask {
    NO_CMD,
    CMD_STOP_TASK,
    CMD_RESET_TASK,
    CMD_PREEMPT_TASK,
    CMD_TERMINATE_TASK};

struct NodeControl
{

    CmdNode cmd_node;
    CmdTask cmd_task;
    string target_node;
    long task_id;
    @key string source_node;
};

3.3. User Input Type

The UserInput data structure carries information about the input from the user, when describing a new task (Machine Learning problem). It is comprised of the following fields:

  • task_name: The name of the task.

  • modality: The modality of the input data e.g image/video, text, audio, sensor,…

  • problem_definition: The type of problem to solve e.g classification, regression, clustering,…

  • inputs: A sequence of serialized batches of input data.

  • outputs: A sequence of serialized batches of output data.

  • minimum_samples: The minimum number of samples required.

  • maximum_samples: The maximum number of samples required.

  • optimize_carbon_footprint_manual: A boolean indicating if the user wants to manually optimize the carbon footprint.

  • previous_iteration: A previous iteration from which to perform the optimization (-1 for taking the last one).

  • optimize_carbon_footprint_auto: A boolean indicating if the user wants to automatically optimize the carbon footprint until a desired value.

  • desired_carbon_footprint: The desired carbon footprint.

  • geo_location_continent: The geo-location continent in which the ML problem is going to take place.

  • geo_location_region: The geo-location region in which the ML problem is going to take place.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#UserInput.idl

struct GeoLocation
{
    string continent;
    string region;
};

struct UserInput
{
    string modality;
    string problem_short_description;
    string problem_definition;
    sequence<string> inputs;
    sequence<string> outputs;
    unsigned long minimum_samples;
    unsigned long maximum_samples;
    boolean optimize_carbon_footprint_manual;
    long previous_iteration;
    boolean optimize_carbon_footprint_auto;
    double desired_carbon_footprint;
    string geo_location_continent;
    string geo_location_region;
    sequence<octet> extra_data;
    @key TaskIdImpl task_id;
};

3.4. ML Model Metadata Type

The MLModelMetadata data structure represents the output from the Task Encoder. It is composed by:

  • keywords: A sequence of strings identifying the key workd from the user input problem description.

  • ml_model_metadata: The machine learning model metadata. At the current stage of development, the metadata is generically defined as a sequence of strings.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#MLModelMetadata.idl

struct MLModelMetadata
{
    sequence<string> keywords;
    sequence<string> ml_model_metadata;
    sequence<octet> extra_data;
    @key long task_id;
};

3.5. Application Requirements Type

The AppRequirements data structure depicts the output from the Application Requirements Node and consists on:

  • app_requirements: A sequence of application-level requirements, modeled as a sequence of strings, to be considered in the selection of the machine learning model.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#AppRequirements.idl

struct AppRequirements
{
    sequence<string> app_requirements;
    sequence<octet> extra_data;
    @key long task_id;
};

3.6. Hardware Constraints Type

The HWConstraints represents the group of constraints defined (or not) by the user when describing the problem. It is the output from the Hardware Constraints Node.

  • max_memory_footprint: The maximum memory footprint allowed for the ML model.

  • hardware_required: A sequence of hardware selected by the user to be taken into account by the framework.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#HWConstraints.idl

struct HWConstraints
{
    unsigned long max_memory_footprint;
    sequence<string> hardware_required;
    sequence<octet> extra_data;
    @key long task_id;
};

3.7. ML Model Type

The MLModel data structure represents the output from the Machine Learning Model Provider. It is divided in the following fields:

  • model_path: A string containing the path to the ML model.

  • model: A string with the model name, in case of remote approach.

  • raw_model: A sequence of bytes with the raw model, in case of remote approach.

  • model_properties_path: A string containing the path to the properties of the model.

  • model_properties_path: A string containing the path to the properties of the model.

  • model_properties: A string with the model properties, in case of remote approach.

  • input_batch: A sequence of serialized numpy arrays with a dimension: Batch x Channels x Height x Width (each one representing a batch) conforming the input batch.

  • target_latency: The target latency or fps for computer vision tasks or target processing latency in seconds for other tasks, like time-series analysis.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

The model and model_properties can be optionally filled. The reasoning for include them is to overcome situations in which the model is generated into a remote machine.

# MLModel.idl

struct MLModel
{
    string model_path;
    string model;
    sequence<octet> raw_model;
    string model_properties_path;
    string model_properties;
    sequence<string> input_batch;
    double target_latency;
    sequence<octet> extra_data;
    @key long task_id;
};

3.8. Hardware Resource Type

The Hardware Resources Provider selects a best-suited energy-optimized hardware according to the ML model. To represent that information, the HWResource data structure is defined containing the following fields:

  • hw_description: A string with the detailed hardware description.

  • power_consumption: The power consumption in W.

  • latency: The estimation of latency of the given ONNX model for the given input batch.

  • memory_footprint_of_ml_model: The maximum memory footprint that can be implemented on the target FPGA.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#HWResource.idl

struct HWResource
{
    string hw_description;
    double power_consumption;
    double latency;
    double memory_footprint_of_ml_model;
    double max_hw_memory_footprint;
    sequence<octet> extra_data;
    @key long task_id;
};

3.9. Carbon Footprint Type

Finally, in order to model the output from the CO2 Footprint Provider, the CO2Footprint data structure consisting in the following fields:

  • carbon_footprint: The CO2 footprint in kgCO2e.

  • energy_consumption: The energy consumption in Wh.

  • carbon_intensity: The carbon intensity.

  • extra_data: A sequence of raw extra data for out-of-scope use cases.

  • task_id: The identifier of the ML problem to solve.

#CO2Footprint.idl

struct CO2Footprint
{
    double carbon_footprint;
    double energy_consumption;
    double carbon_intensity;
    sequence<octet> extra_data;
    @key long task_id;
};

3.10. Service Types

RequestType and ResponseType are the data structures used to communicate with the Node Configuration Service.

The RequestType structure includes the following fields:

  • node_id: An integer key based on the enum NodeID.

  • transaction_id: An integer key representing a unique counter internal to the orchestrator.

  • configuration: A string or sequence of octets representing the configuration to apply.

The ResponseType structure includes the following fields:

  • node_id: An integer key based on the enum NodeID.

  • transaction_id: An integer key representing a unique counter internal to the orchestrator.

  • success: A boolean indicating whether the request was successful.

  • err_code: An integer representing the error code if the request was not successful.

  • configuration: A string or sequence of octets representing the configuration after the request.

#Servicetypes.idl

enum ErrorCode {
        NO_ERROR,
        INTERNAL_ERROR};

struct RequestType
{
    @key long node_id;
    @key int transaction_id;
    string configuration;
};

struct ResponseType
{
    @key long node_id;
    @key int transaction_id;
    bool success;
    ErrorCode err_code;
    string configuration;
};