Using OpenVINO with the GPU on UP Squared reboots the system

I'm using OpenVINO with the GPU to run inference on my deep learning model. Very often, the UP Squared board reboots by itself when I use the GPU. It might be caused by overheating: after I used a cooling fan to bring the temperature down from 65°C to 45°C, the problem disappeared.

Does anyone know about this problem? In my opinion, 65°C is not very hot, and the GPU shouldn't make the system reboot, right?


Comments

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    No, it shouldn't reboot when the temperature is 65 degrees.

    Can you provide the sample or application that causes the system to reboot? Can you make the system reboot with another OpenVINO sample while using the GPU?

  • dhoa
    dhoa New Member Posts: 13

    Thanks for your response. I suspected the temperature because I ran the same program and sometimes it works, while other times it makes the computer reboot. I tested with two different models, ResNet and SSD, and the same thing happened. When I ran with the cooling fan, everything was OK.

    I used the `sensors` command in Linux to measure the temperature, and I think it reports the CPU temperature. I don't know how to measure the Intel GPU temperature precisely.
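On Apollo Lake parts such as the Pentium N4200 the GPU is integrated on the same SoC die as the CPU, so there is usually no separate GPU sensor and the package temperature is the closest proxy. A minimal sketch for listing every temperature the kernel exposes, assuming a Linux kernel that publishes thermal zones under `/sys/class/thermal` (zone names such as `x86_pkg_temp` vary by platform, and the function name `read_thermal_zones` is hypothetical):

```python
import glob
import os


def read_thermal_zones(base="/sys/class/thermal"):
    """Return (zone, type, degrees Celsius) for each thermal zone under `base`.

    The kernel reports each zone's temperature in millidegrees Celsius.
    """
    readings = []
    for zone_dir in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone_dir, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            # Some zones can be unreadable or report an error; skip them.
            continue
        readings.append((os.path.basename(zone_dir), zone_type, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for zone, zone_type, celsius in read_thermal_zones():
        print(f"{zone:16} {zone_type:20} {celsius:5.1f} C")
```

Running it in a second terminal while the inference loop runs shows whether the package temperature climbs toward the point where the reboots start.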

  • dhoa
    dhoa New Member Posts: 13

    Can someone give me some hints about why the system can reboot when using the GPU? The problem is still there and is unpredictable for me. Thanks

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    The system should not reboot even under stress; at worst it would throttle the frequency down.
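To check whether the board actually throttles before a reboot, the CPU and GPU clocks can be sampled while the workload runs. A minimal sketch, assuming the Linux cpufreq interface (which reports kHz) and the i915 graphics driver's `gt_cur_freq_mhz` file (which reports MHz); the `card0` index may differ on some systems, and the helper names are hypothetical:

```python
import time


def read_int(path):
    """Read an integer from a sysfs file, or return None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def current_freqs(cpu="/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
                  gpu="/sys/class/drm/card0/gt_cur_freq_mhz"):
    """Return (CPU MHz, GPU MHz); either value is None if it can't be read."""
    cpu_khz = read_int(cpu)
    gpu_mhz = read_int(gpu)
    cpu_mhz = cpu_khz // 1000 if cpu_khz is not None else None
    return cpu_mhz, gpu_mhz


if __name__ == "__main__":
    # Sample once per second for ten seconds.
    for _ in range(10):
        print(current_freqs())
        time.sleep(1)
```

Frequencies that drop under sustained load suggest thermal throttling; a reboot with no preceding drop points away from temperature.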

    Can you tell me how the UP Squared is positioned? Is the heatsink on top, or is the board upside down?

    Do you have an example from the OpenVINO GPU samples that we can use to reproduce the issue?

    Also, what is your complete hardware and software setup?

    BIOS version, operating system version, kernel version, OpenVINO version, etc.
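Most of these details can be collected in one go. A minimal sketch, assuming a Linux system that exposes the standard `/sys/class/dmi/id/bios_version`, `/etc/os-release`, `/proc/cpuinfo` and `/proc/meminfo` files; it does not report the OpenVINO version, which still has to be checked separately, and the helper names are hypothetical:

```python
import platform


def parse_key_values(text, sep):
    """Parse 'key<sep>value' lines into a dict; the first occurrence of a key wins."""
    info = {}
    for line in text.splitlines():
        if sep not in line:
            continue
        key, value = line.split(sep, 1)
        info.setdefault(key.strip(), value.strip().strip('"'))
    return info


def read_text(path):
    """Return the file's contents, or an empty string if it can't be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def collect_setup():
    """Gather BIOS, OS, kernel, CPU and RAM details from standard Linux files."""
    os_release = parse_key_values(read_text("/etc/os-release"), "=")
    cpuinfo = parse_key_values(read_text("/proc/cpuinfo"), ":")
    meminfo = parse_key_values(read_text("/proc/meminfo"), ":")
    return {
        "bios_version": read_text("/sys/class/dmi/id/bios_version").strip() or None,
        "os": os_release.get("PRETTY_NAME"),
        "kernel": platform.release(),        # same as `uname -r`
        "kernel_build": platform.version(),  # same as `uname -v`
        "cpu": cpuinfo.get("model name"),
        "mem_total": meminfo.get("MemTotal"),
    }


if __name__ == "__main__":
    for key, value in collect_setup().items():
        print(f"{key:13} {value}")
```

The `kernel_build` field shows the build string (for example a `~upboard` suffix), which answers whether the kernel came from the UP PPA.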

  • dhoa
    dhoa New Member Posts: 13

    The heatsink is facing down. One OpenVINO model you can test with is face-detection-retail-0004 (you can find it in the OpenVINO examples).

    I used the UP Squared board with a VPU attached (but it is not used during the test).

    Bios Version: UPA1AM42
    OS: Ubuntu 18.04
    Kernel version: 4.15.0-37-generic
    OpenVINO version: 2019 R3.1

    You need to retest it several times, because most of the time it runs OK.

    Thank you for your help

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    The heatsink should be facing up for better heat dissipation, especially if you are stressing both the CPU and the GPU.
    Try using some spacers if you have any.

    About the BIOS, I don't think it is the cause, but it is good to update to the latest version available: https://downloads.up-community.org/download/up-squared-uefi-bios-v4-6/

    About the kernel version, is that the UP Ubuntu kernel installed from our PPA?

    One last thing I forgot to ask: what power supply are you using?

  • dhoa
    dhoa New Member Posts: 13

    I got the kernel from your PPA:
    Kernel = 4.15.0-37-generic
    Patch = #40~upboard04-Ubuntu SMP Thu Feb 14 13:49:37 UTC 2019

    Power comes from a standard European mains outlet.
    Adapter model FJ-SW0506000
    Input 1.5A Max
    Output 5V 6000mA

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    OK, thank you.

    What is the CPU/RAM configuration of the board?
    Celeron, Pentium or Atom? 2, 4 or 8GB?

  • dhoa
    dhoa New Member Posts: 13

    CPU information:
    vendor_id : GenuineIntel
    cpu family : 6
    model : 92
    model name : Intel(R) Pentium(R) CPU N4200 @ 1.10GHz

    RAM: 8GB

  • dhoa
    dhoa New Member Posts: 13

    More specifically, the Ubuntu version is Ubuntu 18.04.3 LTS.

    I mention it because someone told me there may be problems with 18.04.3.

    Thanks

  • dhoa
    dhoa New Member Posts: 13
    edited November 2019

    I minimized the code that reproduces the problem so you can see it below:

    import time

    import cv2
    import numpy as np
    from openvino.inference_engine import IENetwork, IEPlugin  # OpenVINO 2019 R3 API

    net = IENetwork(model='/home/k/face-detection-retail-0004.xml',
                    weights='/home/k/face-detection-retail-0004.bin')
    plugin = IEPlugin(device="GPU")
    exec_net = plugin.load(network=net)
    input_name = next(iter(net.inputs))  # 'data' for face-detection-retail-0004


    def preprocess_face(img, shape=(300, 300)):
        """Resize a BGR image and convert it from HWC to NCHW with batch size 1."""
        resized = cv2.resize(img, shape)     # cv2.resize takes (width, height)
        chw = resized.transpose((2, 0, 1))   # HWC -> CHW
        return chw[np.newaxis, ...]          # CHW -> NCHW


    image = cv2.imread('/home/k/_.jpg')      # OpenCV loads images in BGR order

    for i in range(100):
        img = preprocess_face(image)         # kept inside the loop, as in the original run
        start = time.time()
        res = exec_net.infer({input_name: img})
        print(time.time() - start)
    
    

    The output I got is:
    $ python gpu.py
    0.017605304718017578
    0.014819860458374023

    It ran two iterations and then the system rebooted. I have tried everything I can think of: different models, asynchronous and synchronous inference, but the problem is always there. Sometimes it reboots on the first try, sometimes after several tries.

    I would really appreciate it if someone could help me with this.

    I downloaded the model from the OpenVINO pretrained models for face detection.
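When the board resets abruptly, anything still buffered in memory (including `print` output redirected to a file) is lost, so it is hard to tell how far a run got. A minimal sketch of an append-only logger that forces each line to disk before continuing; the function name `log_line` and the log path used below are hypothetical:

```python
import os
import time


def log_line(path, message):
    """Append one timestamped line and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(f"{time.time():.3f} {message}\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write it to the storage device
```

Inside the repro loop, something like `log_line('/home/k/gpu_run.log', f"iter {i} {time.time() - start:.4f}")` after each `infer` call leaves a record of the last completed iteration. Calling `fsync` on every line slows the loop slightly, which is acceptable for a diagnostic run.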

  • dhoa
    dhoa New Member Posts: 13

    Sorry for bothering you so much with this unclear problem. I know this kind of problem is hard to figure out.

    I'm thinking it may be caused by the power source. I also had problems with SSH to the UP, and changing the power source fixed that. Maybe the GPU problem is related. Can the GPU draw too much power and restart the machine?

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    Hi @dhoa

    Thanks for the detailed instructions.

    The reboot seems to be related to power, but 5V 6A is the standard PSU we use and recommend for this platform. If you have another PSU, you can try it in case the one you have does not provide a stable voltage.

    We will try to reproduce the issue and get back to you.

  • stykujason
    stykujason New Member Posts: 6
    edited December 2019

    Hello,
    I'm encountering a similar issue when stressing the GPU while running Windows 10. The UP Squared board reboots within 5-10 minutes of full GPU usage. I'm using the PSU provided in the dev kit sold by AAEON. Can you please advise? Here is my original post:
    Up^2 with fan, Pentium N4200 crashing/rebooting Win10 with heavy graphics use

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    Hi @stykujason

    I just replied to your topic; we can continue there, but the issue is different.

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    Hello @dhoa

    We tested the same code and setup (hardware and software), but with the latest available BIOS, R4.6, and the sample ran fine for 3 hours.

    Can you please update the BIOS and try again?

    If it still doesn't work, please apply for an RMA.

  • dhoa
    dhoa New Member Posts: 13
    edited April 2020

    Hi @DCleri,

    Sorry for missing your response, and thank you for your help. I think the problem is related to the PSU. With the standard PSU, reboots during GPU use are quite rare, but they happen very often with our own battery. I have also tested this with several UP boards.

    Can you help me clarify how much power the UP Squared board needs when using the GPU? (I think the standard PSU is 65W.)

    At the moment I'm still using BIOS 4.0 and cannot update to 4.6 immediately because the board is remote (can I update it remotely?). If the system is rebooted by the BIOS, are there logs somewhere that could confirm it? I checked the system log with journalctl and found no information.
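journalctl can only show earlier boots if the journal is persistent (for example when `/var/log/journal` exists or `Storage=persistent` is set in `journald.conf`). Even then, an abrupt power-related reset usually leaves no shutdown messages, so a log that simply stops mid-run is itself a hint. A minimal sketch that lists stored boots and prints the tail of the previous one; the helper names are hypothetical, and it uses only standard journalctl options:

```python
import subprocess


def journal_cmd(boot_offset=-1, lines=50):
    """Build a journalctl command for the last `lines` entries of a given boot.

    boot_offset=-1 selects the boot before the current one.
    """
    return ["journalctl", f"--boot={boot_offset}", f"--lines={lines}", "--no-pager"]


def run(cmd):
    """Run a command and return its combined output as text (Python 3.6 compatible)."""
    # check=False: journalctl exits non-zero if the requested boot isn't stored.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(run(["journalctl", "--list-boots", "--no-pager"]))
    print(run(journal_cmd()))
```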

    Thanks

  • DCleri
    DCleri Administrator, AAEON Posts: 1,213 admin

    The latest BIOS available is 5.0; you can try it when you get a chance.

    About the PSU: it is 30W. The SoC itself can draw 15W or more at its peaks, and then you have to account for all the other components.
    The spikes may be what causes the problems: a battery that is not powerful enough may drop its voltage when it suddenly has to supply more current. A proper circuit is required to keep the voltage stable in every situation.
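The numbers above come down to two simple relations: a supply's rated power is P = V × I (5 V × 6 A = 30 W for the adapter in this thread), and a battery's output sags under load roughly as V_out = V_oc − I × R_internal. A minimal sketch of both; the battery's open-circuit voltage and internal resistance below are hypothetical values chosen only for illustration, and estimating current as watts / V_oc is a first-order approximation:

```python
def rated_power_w(volts, amps):
    """Maximum continuous power a supply is rated for: P = V * I."""
    return volts * amps


def terminal_voltage(open_circuit_v, internal_resistance_ohm, load_current_a):
    """Simple battery model: the output sags by I * R_internal under load."""
    return open_circuit_v - load_current_a * internal_resistance_ohm


if __name__ == "__main__":
    # The stock adapter from this thread: 5 V, 6 A
    print(rated_power_w(5.0, 6.0))  # 30.0

    V_OC = 5.0   # volts, open-circuit (hypothetical battery)
    R_INT = 0.1  # ohms, internal plus cable resistance (hypothetical)
    for watts in (5.0, 15.0, 25.0):
        amps = watts / V_OC  # first-order estimate of the current drawn
        print(watts, round(terminal_voltage(V_OC, R_INT, amps), 2))
```

With these illustrative values, a sudden jump from 5 W to 25 W pulls the output down by about 0.4 V, which shows why a supply that looks adequate on average can still dip during GPU load spikes.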

  • dhoa
    dhoa New Member Posts: 13

    My friend just helped me update the BIOS, and it very likely fixed the problem. I need to test more, but so far everything seems fine. I never thought this could be the solution until I looked at the changelog for v4.6:

    **Change: Set "OS reset select" as "Warm Reset" to fix issue that OS restart becomes cold reset. Patch issue that TPM self-test CMD is timeout sometimes. Enable iTCO as default. Intel uCode update for Intel-SA00233**

    Then I thought it was worth a try :D. I hope this fixes the problem for good, because I've been dealing with it for a really long time.

    Thanks so much for your support