Does high-speed inference have to be complex? A comparison of TensorRT and ONNXRuntime
ONNXRuntime
ONNXRuntime is a framework built around the ONNX model format that lets you run neural network inference in just a few lines of code. Recently, it has been developing very dynamically towards supporting different inference engines. The table below shows sample providers and their target platforms (you can find the full, up-to-date list here).
Target | Native library | Provider
---|---|---
NVIDIA Jetson | TensorRT | TensorrtExecutionProvider
INTEL NCS2 | OpenVINO | OpenVINOExecutionProvider
XILINX FPGAs (preview) | Vitis-AI | VitisAIExecutionProvider
Android | NNAPI | OrtSessionOptionsAppendExecutionProvider_Nnapi
Apple (preview) | CoreML | OrtSessionOptionsAppendExecutionProvider_CoreML
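Which providers are actually available depends on how your ONNXRuntime package was built and installed; a quick way to check is shown below (the listed output is only an example):

```python
import onnxruntime as ort

# Lists the execution providers compiled into the installed ONNXRuntime package
print(ort.get_available_providers())
# e.g. ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
```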
Native libraries
Native libraries such as TensorRT for the NVIDIA Jetson series are specifically optimized for their target platform. However, writing code against them directly is difficult, and an inexperienced developer can easily make mistakes. In the TensorRT case, to run inference you need to (see the sketch after this list):
- create a session and the parser, and then load the engine into the program properly,
- create memory allocations in the host and device,
- bind inputs and outputs,
- manually copy data to the GPU before inference and to the host after the operation.
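For comparison, here is a rough sketch of those steps using the TensorRT Python API (the older bindings-based API, TensorRT 7/8) together with pycuda. It assumes the engine has already been built and serialized to `model.engine` (the parser/build step is omitted), and that the model has a single input at binding 0 and a single output at binding 1; these are assumptions for illustration, not the article's code.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)

# Load the serialized engine into the program
with open('model.engine', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Create memory allocations on the host and the device for every binding
host_mem, device_mem, bindings = [], [], []
for idx in range(engine.num_bindings):
    shape = engine.get_binding_shape(idx)
    dtype = trt.nptype(engine.get_binding_dtype(idx))
    h_buf = cuda.pagelocked_empty(trt.volume(shape), dtype)
    d_buf = cuda.mem_alloc(h_buf.nbytes)
    host_mem.append(h_buf)
    device_mem.append(d_buf)
    bindings.append(int(d_buf))  # bind inputs and outputs by device address

stream = cuda.Stream()

def infer(some_data_input):
    # Manually copy data to the GPU before inference ...
    np.copyto(host_mem[0], some_data_input.ravel())
    cuda.memcpy_htod_async(device_mem[0], host_mem[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # ... and back to the host after the operation
    cuda.memcpy_dtoh_async(host_mem[1], device_mem[1], stream)
    stream.synchronize()
    return host_mem[1]
```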
But since these alternatives have appeared, why not trust the experienced developers behind them and just use a few lines like these:
```python
import onnxruntime as ort

# Create a session backed by the TensorRT execution provider
ort_sess = ort.InferenceSession('model.onnx', providers=['TensorrtExecutionProvider'])
# Query the model's input and output names
in_names = [inp.name for inp in ort_sess.get_inputs()]
out_names = [output.name for output in ort_sess.get_outputs()]
# Run inference: pass the outputs to fetch and a dict of named inputs
some_data_output = ort_sess.run(out_names, {in_names[0]: some_data_input})
```
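Providers can also be passed in priority order, in which case ONNXRuntime falls back to the next one on the list if the preferred provider is unavailable. This is a common pattern on Jetson devices; the snippet below is only an illustrative sketch.

```python
# TensorRT first, then CUDA, then CPU as the last resort
ort_sess = ort.InferenceSession(
    'model.onnx',
    providers=['TensorrtExecutionProvider',
               'CUDAExecutionProvider',
               'CPUExecutionProvider'])
```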
EDGE AI benchmark
The one-API framework sounds excellent, but how does it work in practice? Pretty well! See the following comparison of ONNXRuntime's performance against TensorRT on popular classification algorithms on EDGE AI devices: the Jetson NX and the Jetson NANO.
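For readers who want to reproduce such numbers on their own hardware, a minimal latency-measurement sketch is shown below. It is not the benchmark code behind the figures; the model file, input shape, and warm-up/iteration counts are assumptions.

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical classification model and input shape
sess = ort.InferenceSession('resnet50.onnx', providers=['TensorrtExecutionProvider'])
inp_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm-up runs (TensorRT engine build and caching happen here)
for _ in range(10):
    sess.run(None, {inp_name: x})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, {inp_name: x})
print(f'mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms')
```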
Comments