From DATA to Deploy — Nvidia / Docker / PyTorch / ONNX / TensorRT on Jetson

Evhenii Rvachov
5 min read · Jul 14, 2021

In this article, I want to share my experience with TensorRT and RetinaNet, based on the official NVIDIA repository, which will let you start using optimized models in production as soon as possible.

We will label the dataset, train a RetinaNet / Unet network on it with PyTorch 1.3+, convert the resulting weights to ONNX, convert those to a TensorRT engine, and run the whole thing in Docker on the ARM (Jetson) architecture, minimizing manual environment setup along the way. Our steps:

  1. Our environment
  2. Assembling Docker containers
  3. Running and debugging a docker container
  4. We mark our data and train NN
  5. Export and inference of Unet models with Resnet encoder
  6. Deployment on NVIDIA Jetson

1. Environment

I have completely stopped installing and deploying libraries directly on the desktop machine and the devbox. The only things to build and install on the host are a Python virtual environment and CUDA 11.x from the deb packages (you can even limit yourself to just the NVIDIA driver).

I assume you have Ubuntu 18.04+ and CUDA. I will not dwell on the installation process; the official documentation is quite enough, for example this detailed instruction.
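A quick sanity check before moving on (the exact versions in the output will, of course, differ on your machine):

# The driver should list your GPU
nvidia-smi
# The CUDA toolkit version (only relevant if you installed the full toolkit, not just the driver)
nvcc --version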

Next, we install Docker; the Docker installation guide is easy to find, here is an example. Of course, set up the latest Docker version.
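For reference, one common way to get the latest Docker Engine on Ubuntu is the official convenience script; this is just one option, and the apt repository route from the Docker docs works equally well:

# Install the latest Docker Engine
curl -fsSL https://get.docker.com | sh
# Optional: run docker without sudo (log out and back in for this to take effect)
sudo usermod -aG docker $USER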

Ok… We have Ubuntu, CUDA, and Docker. Next, run this:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

You can learn more in the official nvidia-docker repository. In this guide, I will be using the official NVIDIA Object Detection Toolkit (ODTK), not forks or duplicates.
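Before building anything, it is worth checking that Docker actually sees the GPU. A minimal test, assuming you pick a CUDA image tag that matches your driver (11.0-base here is just an example):

# The container should print the same GPU table as nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi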

git clone https://github.com/nvidia/retinanet-examples

We need to register with NGC Cloud and log in. Go to ngc.nvidia.com, register, and once you are inside NGC Cloud, click SETUP in the upper left corner of the screen and press “Generate API Key.” I recommend saving the key; otherwise, you will have to generate it again the next time you need it, and repeat this operation whenever you deploy on a new machine.

Ok, next… Let’s execute this in the terminal:

docker login nvcr.io
Username: $oauthtoken
Password: <Your Key> #What you have saved from NGC Cloud SETUP
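If the login succeeded, you should be able to pull images from nvcr.io. For example, pulling an NGC PyTorch base image is a quick way to confirm the credentials work (the tag here is only an example; check which one the ODTK Dockerfile actually references):

docker pull nvcr.io/nvidia/pytorch:20.11-py3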

2. Assembling Docker containers

We have to go to the folder where we cloned the retinanet-examples project and execute:

docker build -t odtk:latest retinanet-examples/

3. Running and debugging a Docker container

Let’s move on to the main use case: running the container as a development environment. First, let’s run the NVIDIA Docker container:

docker run --gpus all --rm --ipc=host -it odtk:latest

or… if you want SSH access to the container and want to mount a volume:

docker run --gpus all --net=host -v /home/<your_user_name>:/workspace/mounted_vol -d -P --rm --ipc=host -it odtk:latest
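The second variant starts the container detached (-d), so to get a shell inside it, find its ID and exec into it:

# List running containers and note the ID of odtk:latest
docker ps
# Open an interactive shell inside it (replace <container_id> with the real ID)
docker exec -it <container_id> bash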

4. Data labeling and training

For labeling (annotating the dataset), I use LabelImg and Roboflow. This is not an advertisement; I get no support or discounts from them, they are just good services that I like to use. The second one (Roboflow) is an especially nice and convenient service: recently, many problems have been fixed, and conversion to many formats has been added.

Roboflow EXPORT Dataset format

All this is necessary because you will not be able to push your data into RetinaNet as-is: your annotations are in their own format, and we need to convert them to COCO format. Roboflow will help us with this and give us the required format.
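A quick way to sanity-check the exported annotations before training is to look at the top-level COCO keys and counts; this is just a one-liner I find handy, assuming the file is named train.json:

python3 -c "import json; d = json.load(open('train.json')); print(list(d), len(d['images']), len(d['annotations']))"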

There are two main ways to train a model with odtk:

  • Fine-tuning the detection model using a model already trained on a large dataset (like MS-COCO); a sketch of this is shown right after this list.
  • Fully training the detection model from random initialization, using a pre-trained backbone (usually ImageNet).
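If you go the fine-tuning route, the ODTK README describes a --fine-tune option that starts from an existing checkpoint; something along these lines, where the paths and class count are placeholders from my setup (check the README for the exact flags):

odtk train model_mydataset.pth --fine-tune retinanet_rn50fpn.pth \
--classes 2 \
--images /workspace/mounted_vol/dataset/train/images \
--annotations /workspace/mounted_vol/dataset/train.json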

This is the train configuration from the original repository:

odtk train retinanet_rn50fpn.pth --backbone ResNet50FPN \
--images /coco/images/train2017/ \
--annotations /coco/annotations/instances_train2017.json \
--val-images /coco/images/val2017/ \
--val-annotations /coco/annotations/instances_val2017.json

And this is my config:

odtk train retinanet_rn34fpn.pth \
--backbone ResNet34FPN \
--classes 2 \
--val-iters 10 \
--images /workspace/mounted_vol/dataset/train/images \
--annotations /workspace/mounted_vol/dataset/train.json \
--val-images /workspace/mounted_vol/dataset/test/images \
--val-annotations /workspace/mounted_vol/dataset/val.json \
--jitter 256 512 \
--max-size 512 \
--batch 32

You can use any config for your experiments and adjust the training config with your own parameters; see the documentation for details.

Ok. In the console, you will see something like this:

Initializing model...
model: RetinaNet
backbone: ResNet34FPN
classes: 2, anchors: 9
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 128.0
Preparing dataset...
loader: pytorch
resize: [1024, 1280], max: 1280
device: 1 gpus
batch: 1, precision: mixed
Training model for 20000 iterations...
[ 1/20000] focal loss: 0.95619, box loss: 0.51584, 4.042s/4-batch (fw: 0.698s, bw: 0.459s), 1.0 im/s, lr: 0.0001
....
......

5. Export and inference of our model

Now… we need to run inference on our data. This can be done with this command:

odtk infer retinanet_rn34fpn.pth --images /dataset/val --output detections.json

And so… we have reached the most interesting stage: export to TensorRT. For faster inference, export the detection model to an optimized FP16 TensorRT engine:

odtk export model.pth engine.plan
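Export also accepts a few useful options. For example, the ODTK README shows INT8 export calibrated on a folder of representative images (double-check the flags against your version of the repo; the path below is just my mounted test set):

# INT8 engine, calibrated on sample images from the dataset
odtk export model.pth engine.plan --int8 --calibration-images /workspace/mounted_vol/dataset/test/images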

And of course, we have to evaluate our model with TensorRT:

odtk infer engine.plan \
--images /workspace/mounted_vol/dataset/test/images \
--annotations /workspace/mounted_vol/dataset/val.json

6. Deployment to Nvidia Jetson nano (4GB)

I assume you have a Jetson Nano with the latest JetPack and have already done some basic setup and optimization. If not, here are some sources:

https://www.pyimagesearch.com/2020/03/25/how-to-configure-your-nvidia-jetson-nano-for-computer-vision-and-deep-learning/

https://www.jetsonhacks.com/2019/03/25/nvidia-jetson-nano-developer-kit/

Ok. We need to connect our Jetson Nano, then set up and run DeepStream on it. For more power-hungry applications, you can supply 4A @ 5V through the barrel jack after placing a jumper on the power selection pins; the jumper determines which power input is used. I know JetPack already ships with DeepStream, but I prefer to run Docker on the Jetson and use the DeepStream apps from a container. Let’s go to the official NVIDIA documentation and configure our DeepStream app.
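Roughly, the Docker route on the Jetson looks like this; the deepstream-l4t tag below is only an example, pick the one matching your JetPack version:

# On the Jetson: run NVIDIA's DeepStream container for L4T
sudo docker run -it --rm --runtime nvidia --network host \
-v /home/<your_user_name>:/workspace/mounted_vol \
nvcr.io/nvidia/deepstream-l4t:5.0-20.07-samples
# Inside the container: run a pipeline from a config file
deepstream-app -c <path_to_your_config>.txt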

I do not want to retell the official NVIDIA documentation; it is excellent for the whole Jetson family. What we need is to plug in our custom model and try to deploy it to the Jetson.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_using_custom_model.html

I used DetectNet for my detector in DeepStream. The official NVIDIA Docker container has many examples with pre-trained models. You can test them before you start deploying your own model; it is a great way to understand how this SDK works.

Testing — DeepStream

I hope this article was helpful and perhaps answered some of your questions. I apologize if it was not detailed enough and did not cover every step in full.


Evhenii Rvachov

I’m just a human who likes hardware/software engineering.