Optimizing AI Deployment on Edge Nodes with Barbara

This post delves into how Barbara's MLOps capabilities make deploying AI models like ResNet18 to edge devices effortless and fully remote, all managed through a single, intuitive console.

Written by: Enrique Ramírez
Unlocking the full potential of AI in industry requires deploying machine learning models at the edge, where data is generated. Barbara makes this seamless, offering efficient deployment, scalability, and reliable edge data processing.

Leveraging MLOps for Efficient AI Deployment

Barbara's MLOps management capabilities empower users to:

1. Load Trained Models: Seamlessly integrate trained models into the platform, supporting a variety of formats including TensorFlow SavedModel, PyTorch TorchScript, and ONNX.

2. Deploy Models to Edge Nodes: With a single click, you can deploy models to one or multiple edge nodes.

3. Choose the Right Serving Engine: Select between TensorFlow Serving and NVIDIA's Triton Inference Server to serve the deployed models on the node.

4. Harness GPU Power: Utilize the GPU capabilities of edge devices to accelerate model inference and enhance real-time performance.

Empowering AI at the Edge

Barbara’s MLOps capabilities eliminate the challenges of deploying and managing AI at the edge, enabling organizations to unlock the full potential of their models. By simplifying the deployment process and offering flexible serving options, Barbara helps industrial operations stay agile, efficient, and ahead of the curve.

Understanding the Use Case: Serving ResNet18 on an Edge Node

The ResNet18 model, a popular convolutional neural network (CNN), is specifically designed for image classification tasks. It excels at recognizing objects such as animals, equipment, or components in images, making it highly valuable in industries like manufacturing, healthcare, and logistics. Deploying ResNet18 on an edge device enables faster inference and minimizes dependence on cloud connectivity.

Using Barbara Edge AI Platform, the deployment process is broken into 3 key steps:

  1. Upload the Model to Panel’s Library: Save the ResNet18 model in TorchScript format and upload it to the Panel’s library.
  2. Deploy on the Edge Node: Upload and configure the model for real-time inference using NVIDIA’s Triton Inference Server.
  3. Run Inference Requests: Send image data to the edge node and retrieve classification predictions over the VPN connection available in Barbara Panel.

Step 1: Uploading the PyTorch Model to Panel’s Library

Before deploying the model, it must be uploaded to the Panel’s library in a compatible model format. The supported options are:

  • SavedModel (TensorFlow/Keras)
  • TorchScript (PyTorch)
  • ONNX

Figure 1: You can upload TensorFlow, PyTorch or ONNX formats to Panel’s library.

In this case we will use the PyTorch framework to download the pretrained ResNet18 model and save it locally in TorchScript format. The following script demonstrates how to download the ResNet18 model, convert it into TorchScript format, and save it as resnet18_traced.pt.


Once we have the resnet18_traced.pt file, we just need to compress it into a zip file and upload it to our Panel’s library.
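One quick way to produce the zip from Python (file names follow the tracing step above; the placeholder line only exists so the sketch runs on its own):

```python
import zipfile
from pathlib import Path

model_file = Path("resnet18_traced.pt")  # produced by the tracing script above
model_file.touch()                       # placeholder so this sketch runs standalone

# Compress the traced model so it can be uploaded to the Panel's library
with zipfile.ZipFile("resnet18_traced.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(model_file)
```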

Figure 2: Uploading the Resnet18 model to Panel’s library.

TorchScript ensures compatibility with NVIDIA Triton, Barbara's model-serving engine, so we will use that inference server in our Edge Node.

After uploading our model, it will be available in our library, ready to be deployed to any Edge Node.

Figure 3: Resnet18 model in Panel’s library.

Step 2: Deploying our model in an Edge Node

  1. Select the target edge node and enter its Node Details view.
Figure 4: The Node Details view in Barbara Panel. Adding a new Card.

  2. Add a Model Card and choose the uploaded ResNet18 model. Select whether your model must run on the Edge Node’s GPU.
Figure 5: Deploying a model: Choose the desired model, version, and GPU serving option.

  3. Your model will then be sent to your Edge Node and start being served by the NVIDIA Triton Inference Server. A new “model” card will appear in the “Node Details” view of your target node. Check that the server’s inference endpoints are listed in the “Inference REST URL” section of the card.
Figure 6: The ResNet18 model’s card listing the Inference REST URLs.

Step 3: Running a remote Inference on the Edge Node

Inference involves sending an image to the model via REST API and receiving classification results. But how can we remotely access the services deployed on our Edge Nodes? Thanks to the VPN functionality available in Barbara Panel, it is really easy. Just expand the “Networking” card available in the Node Details view and activate the VPN connection of the Edge Node.

Figure 7: Activate the node’s VPN before sending the inference in the Networking card of the Edge Node. 

Once the VPN connection is enabled, we will use a Jupyter Notebook to perform the Inference request to our node. This Jupyter Notebook will do several things:

  1. Preprocess the image to adapt it to the ResNet18 model’s input format.
  2. Send the image data to the Edge Node’s Inference REST URL.
  3. Receive the response and extract the prediction’s label.
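The notebook’s core logic can be sketched as follows. The URL is a placeholder (substitute the Inference REST URL shown on your model’s card), and the tensor names `input__0`/`output__0` assume Triton’s auto-generated configuration for TorchScript models; check your model card if they differ:

```python
import json
import urllib.request
import numpy as np
from PIL import Image

# Hypothetical endpoint: replace with the Inference REST URL from the model card
TRITON_URL = "http://<edge-node-vpn-ip>:8000/v2/models/resnet18/infer"

def preprocess(path):
    # Resize to 224x224 and normalize with ImageNet statistics, as ResNet18 expects
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW with batch dimension

def infer(path):
    # Build a KServe v2 inference request and send it to the Triton endpoint
    x = preprocess(path)
    payload = {
        "inputs": [{
            "name": "input__0",  # tensor name depends on the model's Triton config
            "shape": list(x.shape),
            "datatype": "FP32",
            "data": x.flatten().tolist(),
        }]
    }
    req = urllib.request.Request(
        TRITON_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # The highest-scoring index maps to a label in the ImageNet class list
    scores = np.array(result["outputs"][0]["data"])
    return int(scores.argmax())
```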

Figure 8: “resnet_inference.ipynb” Jupyter Notebook used to send inferences to the Node.


Finally, the results returned by the model are interpreted. Here they are:

Figure 9: Three of the model’s predictions, for a cat, a dog and an elephant. Well done, ResNet18!

Advantages of deploying your AI Models using Barbara

1. Simplified Deployment and Management:

  • Focus on Model Training and Optimization: Concentrate your efforts on training and optimizing models, not on the mechanics of edge deployment.
  • Centralized Control: Manage and deploy AI models across a large number of edge devices from a single, centralized platform.  
  • Automated Deployment: Automate the deployment process, reducing manual effort and potential errors.
  • Remote Monitoring and Management: Monitor the health and performance of deployed models remotely, allowing for proactive maintenance and troubleshooting.  

2. Enhanced Performance and Scalability:

  • Reduced Latency: Process data locally on edge devices, minimizing latency and enabling real-time decision-making.  
  • Improved Scalability: Easily scale AI deployments by adding or removing edge devices as needed.
  • Optimized Resource Utilization: Efficiently allocate compute resources to AI workloads, maximizing hardware utilization.  

3. Improved Security and Privacy:

  • Data Privacy: Process sensitive data locally, reducing the risk of data breaches and privacy violations.  
  • Secure Communication: Implement secure communication protocols to protect data transmission between edge devices and the central platform.  
  • Enhanced Security: Benefit from Barbara’s built-in security features to safeguard the deployment environment.  

4. Increased Flexibility and Agility:

  • Rapid Model Deployment: Quickly deploy and update AI models across a large number of edge devices.  
  • Adaptability to Changing Conditions: Dynamically adjust model configurations and parameters to respond to changing conditions.
  • Support for Diverse Hardware and Software: Accommodate a variety of edge devices and AI frameworks.

Conclusion

Deploying AI models like ResNet18 to edge devices is made simple and efficient with Barbara's Edge Orchestration Tool. By combining the power of PyTorch, NVIDIA Triton, and Barbara’s platform, organizations can unlock real-time AI capabilities at the edge. 

Ready to take your AI models to the edge? Start exploring Barbara and book a free trial today!