AI Performance Benchmark for UP products
Hi makers and up-developers,
this discussion is about the performance of UP products for AI applications, especially running neural networks for inference. The goal is a benchmark comparing some well-known maker boards with the UP products.
Why do I open this discussion?
Currently I am developing an application whose core component is a neural network that detects objects in images (a pretty common application). At the moment I am using the Jetson TX2 and it works well; using the internal GPU with TensorFlow is very intuitive. Now I want to try some other (maybe cheaper) boards for neural network inference.
The biggest problem: converting networks with OpenVINO is a lot of work... Often my networks do not run or cannot be converted at all. One example of a problem I ran into is THIS.
I don't want to invest a lot of hard work into getting my models running on UP products, only to find out in the end that the UP products are very slow...
I discussed this topic with an AAEON representative at the VISION trade fair in Stuttgart, and he agreed with me that a benchmark of this kind would be very useful.
What to do?
To get meaningful results, one needs to run the same model on UP products with OpenVINO and on some other boards. It would be very interesting to see a direct comparison between several boards running inference on the same neural network.
Interesting categories would be:
- Up Board GPU
- Up Squared GPU
- Up AI core (MYRIAD)
- Up AI core X (after release!)
A widespread board is the Jetson TX2, so I think it is an interesting competitor for the comparison (just my opinion!).
Has anybody already gathered some experience with this? It would be nice if you shared it!
My first experiences!
Ok, let me open this discussion with my experiences:
I converted a Mask R-CNN with ResNet-101 as backbone. You can find it here. I ran inference on a Full HD video (1920x1080):
- On the Jetson TX2, inference was done within 1-2 seconds
- On the Up Squared GPU, it was done in ~40 seconds
- On the Up AI Core (MYRIAD) I was not able to test it, because the IR model could not be loaded...
Thanks for your interest!
Greetings,
Timo
Comments
-
New results:
For every setup I loaded an image and inferred it 1000 times with batch size 1. Precision was set to FP32 (except where noted). Every model was downloaded from here: https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow
SSD MobileNet V2, resolution 300x300
- UP AI Core (FP16!): 8:02
- Jetson (clocked!): 2:21
- Up Squared GPU: 1:14
Faster R-CNN ResNet-50, resolution 600x600
- Jetson (clocked!): 9:39
- Up Squared GPU: 21:29
Faster R-CNN Inception V2, resolution 600x600
- Jetson (clocked!): 5:08
- Up Squared GPU: 14:28
-
Hi TimoK93,
Thanks for publishing these interesting numbers and opening a topic about benchmarking inference models on different platforms.
One thing we should clarify is that the price, the power consumption and other factors differ from platform to platform, and in order to make a fair comparison we should list that information, maybe with the help of a table.
You can contact me privately, and we can discuss this further and also publish the results in our wiki.
Thanks for your work so far!
-
Hey guys,
new results! Today I worked on an object-detector benchmark on the Up Squared for high-definition images!
Test setting
- Up Squared (Pentium N4200)
- Load an image and infer it
- While testing, the screen was turned on (maybe this slows down the GPU?)
- For evaluation I logged the average inference time per frame
- All models downloaded here: https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow
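The evaluation step in the list above (logging the average inference time per frame) can be done with a small running-average helper. This is a hypothetical sketch; `FrameTimer` and the wrapped call are my own names, and the real network call would go where the placeholder function is invoked:

```python
import time

class FrameTimer:
    """Accumulates per-frame inference times and reports the average."""

    def __init__(self):
        self.total = 0.0
        self.frames = 0

    def timed(self, fn, *args):
        """Run fn(*args), add its wall-clock time to the running totals."""
        start = time.perf_counter()
        result = fn(*args)
        self.total += time.perf_counter() - start
        self.frames += 1
        return result

    @property
    def average_s_per_frame(self):
        return self.total / self.frames if self.frames else 0.0

# Usage: wrap each inference call, then read the average at the end.
timer = FrameTimer()
for _ in range(5):
    timer.timed(lambda: sum(range(10000)))  # placeholder for the network call
print(f"{timer.average_s_per_frame:.6f} s/frame over {timer.frames} frames")
```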
Results
- All networks ran only on the GPU, not on the CPU.
Faster R-CNN ResNet-101 (KITTI)
- 1920x1080 FP32: 9.7 s/frame
- 1920x1080 FP16: 5.4 s/frame
- 1280x720 FP32: 5.15 s/frame
- 1280x720 FP16: 3 s/frame
Faster R-CNN Inception V2 COCO
- 1920x1080 FP32: 2.2 s/frame
- 1920x1080 FP16: 1.5 s/frame
- 1280x720 FP32: 1.2 s/frame
- 1280x720 FP16: 0.83 s/frame
Hope it will help someone!
Greetings,
Timo