UP Board Ubilinux Freezing

David
edited May 2019 in UP Board Linux

OS: ubilinux 4.9.45-ubilinux+

I have hundreds of UP Boards deployed remotely, and I do not have physical access to them.
1. They do not have screens attached.
2. They run 24/7.

Some of the devices have been freezing.
The only way to recover is a hard reset (unplugging the device and plugging it back in). This is a major problem for me, as I cannot get physical access to the boards, and they should not be freezing like this.

On the affected devices I have pulled the syslog and kernel logs, and nothing at all is logged during the period in which the device was down.

The last line logged in syslog before the freeze was:
Aug 9 00:17:01 ubilinux CRON[14157]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
and after that the system crashed completely.
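Because the freezes leave no trace in the logs, one way to spot them across many boards is to look for silences in syslog. The sketch below assumes the classic syslog timestamp format shown above ("Aug 9 00:17:01"), which carries no year, so the caller passes one in; `parse_stamp` and `find_gaps` are hypothetical names. The default `min_gap` of 90 minutes is an assumption chosen to sit above the hourly `CRON` entry, so normal hourly activity does not trigger it.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix such as "Aug 9 00:17:01" or "Aug  9 00:17:01".
_STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has no syslog prefix.

    Classic syslog lines carry no year, so the caller supplies one (assumption).
    """
    m = _STAMP.match(line)
    if m is None:
        return None
    month, day, clock = m.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, min_gap=timedelta(minutes=90)):
    """Yield (last_line_before, first_line_after, gap) for every silence >= min_gap."""
    prev_line, prev_ts = None, None
    for line in lines:
        ts = parse_stamp(line, year)
        if ts is None:
            continue  # skip lines without a recognisable timestamp
        if prev_ts is not None and ts - prev_ts >= min_gap:
            yield prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts
        prev_line, prev_ts = line, ts
```

Iterating over `find_gaps(f, 2018)` for an open `/var/log/syslog` file yields the last line before each silence (such as the `CRON` line above) together with the length of the gap. Logs that cross a year boundary are not handled by this sketch.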

Rough Hardware Info:
1. I am powering the device through the 5V GPIO pin with a UPS designed specifically for the UP Board.
2. I am also using 2 USB ports (one for a custom BLE dongle, one for a Wi-Fi dongle).
3. A custom PCB is attached and is also powered by the UPS.
4. The custom PCB uses the C7 10-pin connector to connect to a Telit 3G mini-PCIe cellular module.

That is a rough description of the hardware. Most of the time everything seems fine; however, we do seem to "randomly" get complete OS freezes, and the only fix is to power-cycle the board.

Are there any logs or other things you can think of that I could look at to get a better understanding of the problem? Or do you have a possible solution that would fix the issue at its root?

This is urgent: as I said, I have hundreds, nearly thousands, of UP Boards, and they are "randomly" experiencing this issue.

Regards,
David Hutchinson

Comments

  • Filippo Bellini (Emutex, moderator)
    edited August 2018

    Dear David,

    Thank you for your precise description of the issue. It looks similar to the situation experienced by another user. We're investigating the problem. I hope we will be able to get back to you soon.

    Kind regards

  • David

    @Filippo Bellini

    Have there been any further developments on this issue? I would really like to know how to reproduce it (I have tried multiple things) or, in the best case, find a solution. I don't think the UP Board is running out of memory, as that would be logged in syslog. However, I can reproduce a similar issue: if I reboot the device multiple times consecutively, it freezes on boot. To reproduce it, I just add a crontab entry that reboots the board 25 seconds after boot.

    I am not fully convinced it is the same issue, as this one happens during boot, while the other "freeze" I am experiencing happens at run time, after a "random" period of time.
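The reboot-loop reproduction above can be made measurable by counting how many boots a board survives before it hangs. This is a minimal sketch, assuming it runs once per boot (for example from a hypothetical `@reboot` crontab entry) and that the counter file lives on persistent storage; `record_boot`, `soak_step` and the counter path are illustrative names, not part of Ubilinux. The 25-second default delay matches the crontab test described above.

```python
import os
import subprocess
import time


def record_boot(counter_path):
    """Increment the boot counter stored in counter_path and return the new value."""
    try:
        with open(counter_path) as f:
            count = int(f.read().strip() or 0)
    except FileNotFoundError:
        count = 0
    count += 1
    tmp_path = counter_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(f"{count}\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the count reaches the disk before rebooting
    os.replace(tmp_path, counter_path)  # atomic swap: never a half-written counter
    return count


def soak_step(counter_path, max_boots, delay_s=25, reboot_cmd=("/sbin/reboot",)):
    """Record this boot, then reboot after delay_s seconds unless max_boots is reached."""
    count = record_boot(counter_path)
    if count >= max_boots:
        return count  # stop looping so the board stays up for inspection
    time.sleep(delay_s)
    subprocess.run(list(reboot_cmd), check=True)
    return count
```

After a hang and power cycle, the counter file shows how many reboots completed before the freeze; `max_boots` ends the loop so a board that survives can be inspected normally.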

    Regards,
    David Hutchinson

  • David

    I managed to capture a kernel dump; please see the attached image.

    This happened after rebooting the device multiple times in a row.
    It reports an error in drivers/gpu/drm/i915/intel_runtime_pm.c at line 1059, in vlv_power_well_enabled.

    https://github.com/emutex/ubilinux-kernel/blob/upboard-4.9/drivers/gpu/drm/i915/intel_runtime_pm.c

    I had a look at that line of code, but I have limited knowledge of Linux kernels and drivers, so it didn't make much sense to me.
    Any help would be appreciated.

    Thanks,
    David Hutchinson

  • Jesse Kaukonen

    There's an open i915 bug they've reported to Intel: https://bugs.freedesktop.org/show_bug.cgi?id=106721

    You'll have to compile the kernel without i915 to avoid this particular issue, I'm afraid, at least until there's a patched driver. To do this, comment out CONFIG_DRM_I915=m when building the kernel (use these instructions).

    https://github.com/emutex/ubilinux-kernel/blob/upboard-4.9/arch/x86/configs/upboard_defconfig#L2097
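Commenting the line out works, but Kconfig's own convention for a disabled symbol is a `# CONFIG_DRM_I915 is not set` line. The helper below is a hedged sketch of that edit on the text of a `.config` or defconfig file; `disable_config_option` is a hypothetical name. It only touches the exact symbol, leaving the `CONFIG_DRM_I915_*` sub-options for the kernel's config tooling (for example `make olddefconfig`) to resolve. The kernel tree's `scripts/config --disable DRM_I915` should perform the same edit.

```python
import re


def disable_config_option(config_text, option):
    """Return config_text with CONFIG_<option> switched off, Kconfig style.

    Kconfig records a disabled symbol as '# CONFIG_FOO is not set'; any
    existing 'CONFIG_FOO=...' assignment (y, m or a value) is replaced by it.
    """
    name = option if option.startswith("CONFIG_") else f"CONFIG_{option}"
    # '=' must follow the name directly, so CONFIG_FOO_BAR is not matched.
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    disabled = f"# {name} is not set"
    new_text, n = pattern.subn(disabled, config_text)
    if n == 0 and disabled not in new_text:
        # Symbol absent entirely: record it as disabled at the end.
        new_text = new_text.rstrip("\n") + "\n" + disabled + "\n"
    return new_text
```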

  • Dan O'Donovan (Administrator, Moderator, Emutex)

    Hi David

    I think there may be 2 or 3 distinct issues described in this thread:

    1) The original post describes a problem with UP boards freezing at random, with apparently no clues to the cause in the kernel or system log files. Unfortunately, I think there is not enough information available to reproduce or further diagnose this issue. I am not aware of this specific issue on other UP boards, and we do not have the same configuration of hardware peripherals and power supply, which may be a factor in triggering it. My suggestion would be to try to eliminate the power supply and external peripherals as factors, if possible, by running 2 UP boards side by side with the same application, where one board has a standard power supply and no external peripherals while the other has the full hardware configuration described in the original post. It may also be worth enabling persistent log storage for journald (if not already enabled) and running 'journalctl -b -1' after rebooting a frozen system, to see whether any additional log information can be retrieved when the issue occurs.

    2) The i915 driver issue highlighted by Jesse Kaukonen has been seen to occur occasionally after multiple reboots during the OS start-up sequence. The workaround proposed by Jesse appears to be effective in eliminating this specific issue. To avoid rebuilding the kernel, an alternative approach to implementing this workaround may be to remove the kernel module from the filesystem (and run 'update-initramfs') to ensure it doesn't get loaded.

    3) As mentioned above in an earlier post, the UP board may freeze during reboots but this would not appear to be related to the issue reported in the original post, as it happens only when a reboot is triggered. One possible cause of that can be the i915 driver issue (2). However, we've identified a separate issue where the UP board can freeze at random while executing a reboot sequence, occurring at a very early stage during initial boot before the OS is loaded (and before the BIOS splash screen appears or the num-lock key is lit on the keyboard). This can be reproduced using only a BIOS EFI script to repeatedly trigger a reboot before the OS is loaded. We have reported this issue to AAEON for further investigation at BIOS level.
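For item (1), journald only keeps logs across a reboot if it writes them to disk: `Storage=persistent` in the `[Journal]` section of `/etc/systemd/journald.conf` does that (with the default `Storage=auto`, creating `/var/log/journal` has the same effect). The function below is a sketch that rewrites the config text; `set_journald_storage` is a hypothetical name, and reading and writing the file, then restarting `systemd-journald`, are left to the caller.

```python
import re

_STORAGE_MODES = {"volatile", "persistent", "auto", "none"}


def set_journald_storage(conf_text, mode="persistent"):
    """Return journald.conf text with Storage=<mode> set in its [Journal] section.

    An existing Storage= line (commented out or not) is replaced; otherwise
    the setting is added at the end of [Journal], creating the section if needed.
    """
    if mode not in _STORAGE_MODES:
        raise ValueError(f"unsupported Storage= mode: {mode!r}")
    setting = f"Storage={mode}"
    out, in_journal, seen_journal, done = [], False, False, False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_journal and not done:
                out.append(setting)  # [Journal] ended without a Storage= line
                done = True
            in_journal = stripped == "[Journal]"
            seen_journal = seen_journal or in_journal
        elif in_journal and re.match(r"#?\s*Storage\s*=", stripped):
            if not done:
                out.append(setting)
                done = True
            continue  # drop the old (possibly commented-out) Storage= line
        out.append(line)
    if not done:
        if not seen_journal:
            out.append("[Journal]")
        out.append(setting)
    return "\n".join(out) + "\n"
```

After the next freeze and power cycle, `journalctl -b -1` shows the boot that hung. Note that journald syncs to disk only every `SyncIntervalSec=` (5 minutes by default, immediately after critical messages), so the last few minutes before a hard freeze may still be lost.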

    Please let us know if you manage to narrow down issue (1) further. In the meantime, we will let you know if we hear of any further developments on issues (2) and (3) above.

    Best regards,
    Dan

  • Dan O'Donovan (Administrator, Moderator, Emutex)

    Hi all

    In the context of increasing the reliability of reboots on the UP board, we also recommend the following measure:

    Please edit the file /etc/default/grub and ensure the following options are included in "GRUB_CMDLINE_LINUX_DEFAULT"

    GRUB_CMDLINE_LINUX_DEFAULT="reboot=efi,cold fsck.mode=force fsck.repair=yes"
    

    Then run update-grub from the command line to apply the changes.

    This does not address the issue raised in the original post, but we believe that this does help in reducing the probability of issues that may occur on the UP board during reboots.
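When many boards need the same change, editing `/etc/default/grub` by hand is error-prone. This is a minimal sketch, assuming the variable sits on a single line and holds only space-separated options; `add_grub_options` is a hypothetical name. An option that sets an existing key (for example `fsck.mode=force` over `fsck.mode=auto`) replaces the old value, and applying it twice gives the same result.

```python
import re
import shlex

_LINE = re.compile(r"^GRUB_CMDLINE_LINUX_DEFAULT=(.*)$", re.MULTILINE)


def _key(option):
    """'fsck.mode=force' -> 'fsck.mode'; flags like 'quiet' are their own key."""
    return option.split("=", 1)[0]


def add_grub_options(grub_text, options):
    """Return /etc/default/grub text with options merged into GRUB_CMDLINE_LINUX_DEFAULT."""
    m = _LINE.search(grub_text)
    # shlex strips the surrounding quotes and any trailing '# comment'.
    current = " ".join(shlex.split(m.group(1), comments=True)).split() if m else []
    new_keys = {_key(o) for o in options}
    # Keep existing options unless one of the new options sets the same key.
    merged = [o for o in current if _key(o) not in new_keys] + list(options)
    new_line = 'GRUB_CMDLINE_LINUX_DEFAULT="' + " ".join(merged) + '"'
    if m:
        return grub_text[:m.start()] + new_line + grub_text[m.end():]
    return grub_text.rstrip("\n") + "\n" + new_line + "\n"
```

Write the result back to the file and run `update-grub`, as described above, for the change to take effect.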

    AAEON are actively investigating the issue (3) mentioned in my previous post, where the board may freeze immediately following a reboot.

    Best regards,
    -Dan

  • Ransu
    edited September 2018

    Hi,

    I'm a new owner of a UP Board, the one that looks like a Raspberry Pi. I've noticed issues with rebooting, so I came to this forum searching for similar reports. I have Arch Linux installed on the eMMC, and it seems to run all right, but more often than not it fails to reboot. When the board enters this state, I have to fully remove power to get it to boot again.

    @Dan O'Donovan I'll try the above kernel command line settings to see if it helps any with reboots.

  • Ransu

    @Dan O'Donovan I only need to add "reboot=efi,cold" right? I don't think I want to run fsck every single time.

  • Dan O'Donovan (Administrator, Moderator, Emutex)

    We used only the 'force' mode option in our tests, but using the following fsck options may be sufficient for your needs:
    fsck.mode=auto fsck.repair=yes
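Whichever `fsck.mode` is chosen, it is worth confirming after a reboot that the running kernel actually received the options, since an edit that was never followed by `update-grub` has no effect. The kernel exposes its command line in `/proc/cmdline`; the sketch below compares exact tokens, an assumption that holds for simple `key=value` options like these. Both function names are hypothetical.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line the running system was booted with."""
    with open(path) as f:
        return f.read()


def missing_cmdline_options(cmdline, required):
    """Return the required options that do not appear verbatim in cmdline."""
    present = set(cmdline.split())
    return [opt for opt in required if opt not in present]
```

An empty list from `missing_cmdline_options(read_cmdline(), ["reboot=efi,cold", "fsck.mode=auto", "fsck.repair=yes"])` means all three options are active.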

  • Ransu

    @Dan O'Donovan I'd like to report back that I've been using my UP Board as a server for some time now and have experienced no issues rebooting with the above options in my GRUB configuration.

  • ccalde

    Hi @Ransu ,

    Thanks for reporting your results with the UP board rebooting.

  • Ransu
    edited February 2019

    I also added a note about this to the Arch Linux Wiki page for the UP Board: https://wiki.archlinux.org/index.php/Up-board#Rebooting

  • Peter

    Hello,

    We also have around 100 boards running off-site and experience exactly the same issue: a unit can randomly freeze after running for months without problems. The only fix is a power cycle.

    The only thing connected to the device is a serial connection on the 40-pin connector.
    For power we use the standard power adapter (FJ-SW0504000N, 5V 4A).
    We experience the same problems with Ubuntu 16 and 18.

    Am I understanding correctly that this issue can be solved by changing the /etc/default/grub file?

    Thanks in advance.

  • Jesse Kaukonen
    edited May 2019

    AAEON released a new BIOS last November with a fix for a particular BIOS reset issue. That helped us solve most of the problems. The GRUB parameters above mostly help with a problem where the filesystem gets corrupted, which then prevents the board from booting.

    The fixed BIOS is UPC1DM15. From the changelist:

    Change:
        1. Fixed : Sometimes, the time to reset system by watchdog is longer
        than the time we set in BIOS setup "Timer in Second".
    

    The latest version of the Ubuntu kernel seems to have reduced the i915 issues, but I've still seen them on at least two boards running the latest kernel.

  • PierreLouisK

    Hello everyone,

    I am in a similar position: I have several UP Boards deployed remotely that can freeze randomly (even after several months). We thought it was caused by heat or supply-voltage fluctuations, but it doesn't seem to be linked to either (@peter is running on a UPS, we had the issue in winter, and some of our devices have fans).
    Nor is it related to uptime, as we reboot them every day (which, apparently, might not be such a good idea with the previous BIOS version).

    Any luck solving the issue, @peter? Is the UP team investigating this issue, @Filippo Bellini?

  • Marius123

    Hello,

    I ran into a similar problem. I only run a Chromium browser in kiosk mode, and after a random amount of time the devices freeze. I managed to capture the screenshot I attached. Does anybody have an idea, or has anyone found a solution to this issue?
    The newest BIOS version is installed and the hardware watchdog is configured; nevertheless, sometimes the whole system gets stuck and has to be power cycled.

  • roar
    edited May 2021

    Solved this problem. The 2021 BIOS for UP Boards (UPC1DM23_EFI) has an iTCO watchdog that is configurable from the BIOS. If Ubilinux doesn't feed the watchdog, it reboots the device, so if the board gets stuck while booting up or shutting down, the watchdog kicks in and restarts it. An easy way to test this is to leave the board in the GRUB menu and wait for the watchdog timer to expire.

    You also need to blacklist the i915 kernel module, as this module seems to leave the device stuck on reboot rather than fully frozen. In fact, with i915 enabled, the device gets 'stuck' but keeps feeding the watchdog, so it is never reset. I tested this by making my device reboot continuously with the BIOS watchdog enabled and i915 enabled: it got stuck after 60 reboots. You must disable the i915 kernel module, as that is the main issue.
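Blacklisting can be done with a file in `/etc/modprobe.d/`. The sketch below writes such a file and then checks `/proc/modules`, which lists loadable modules only; that should be enough here because Ubilinux builds i915 as a module (`CONFIG_DRM_I915=m`, as noted earlier in the thread). The file name `blacklist-i915.conf` and the function names are assumptions. The `install i915 /bin/false` line is included because a plain `blacklist` entry only stops automatic loading by alias.

```python
BLACKLIST_CONF = """\
# Keep the i915 graphics driver from loading (UP board reboot-hang workaround).
blacklist i915
# 'blacklist' only stops alias-based autoloading; this also blocks explicit loads.
install i915 /bin/false
"""


def write_blacklist(path="/etc/modprobe.d/blacklist-i915.conf"):
    """Write the modprobe configuration that blocks i915 (needs root on a real system)."""
    with open(path, "w") as f:
        f.write(BLACKLIST_CONF)


def loaded_modules(proc_modules_text):
    """Return the module names listed in /proc/modules-formatted text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def i915_loaded(proc_modules_path="/proc/modules"):
    """True if the i915 module is currently loaded."""
    with open(proc_modules_path) as f:
        return "i915" in loaded_modules(f.read())
```

Since i915 can also be loaded from the initramfs, run `update-initramfs -u` after writing the file and reboot; `i915_loaded()` should then return `False`. Booting with `modprobe.blacklist=i915` on the kernel command line is an alternative that needs no file.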

    My BIOS Watchdog settings:
    watchdog function [Enabled]
    timer in second 120
    halt timer [Disabled] (Disabled means the watchdog starts counting down as soon as the device powers on; it is fed in the BIOS setup menu and by any OS that uses /dev/watchdog, e.g. Ubilinux.)
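On the OS side, the watchdog is fed by writing to `/dev/watchdog`. Normally systemd (`RuntimeWatchdogSec=` in `/etc/systemd/system.conf`) or a watchdog daemon does this, and only one process can hold the device open, so the sketch below is for illustration or for a board where nothing else is feeding it. It uses the generic Linux watchdog device interface, where every write is a keepalive and a final `V` requests a "magic close"; `feed_watchdog` is a hypothetical name and `interval_s` is an assumption that must stay well below the timeout actually in effect.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", interval_s=30.0, count=None, magic_close=False):
    """Keep a Linux watchdog device alive by writing to it every interval_s seconds.

    Each write counts as a keepalive. Writing 'V' just before closing asks the
    driver to disarm the timer ("magic close"); drivers without magic-close
    support, or with nowayout set, ignore it. count=None feeds forever.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        fed = 0
        while count is None or fed < count:
            os.write(fd, b"k")  # any byte is a keepalive
            fed += 1
            if count is None or fed < count:
                time.sleep(interval_s)
        if magic_close:
            os.write(fd, b"V")
    finally:
        os.close(fd)
    return fed
```

Closing the device without the magic `V` leaves the timer running, so if this process dies the board is reset, which is the point. Given the i915 observation above, a feeder that only writes after checking that the application is healthy would be more useful than one that feeds unconditionally.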

  • jpboucher

    Hello,

    We also have a couple of UP Boards running 24/7 that seem to freeze randomly. It doesn't seem to be related to how long they run, since some of them have frozen within their first hour while others run for a month without issues. For now we use the internal watchdog to reset the boards when it happens, but that is not a good long-term solution.

    Does anyone have an update on this?

  • FredyHsu (Administrator, Moderator, AAEON)

    Hi jpboucher,

    Can you check which BIOS version you are using? You can check whether it is the latest version on the downloads page: https://downloads.up-community.org/

    Let me know if you can share more information about your setup.

    Thanks.
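A quick way to read the BIOS version on a running board is the DMI information the kernel exposes in `/sys/class/dmi/id/bios_version` (`dmidecode -s bios-version` reports the same field). The comparison below is a hedged sketch: it assumes the DMI string ends in the release number, as in `UPC1DM15` mentioned earlier in the thread; anything else returns `None` and should be checked by hand against the downloads page. All function names are hypothetical.

```python
import re


def read_bios_version(path="/sys/class/dmi/id/bios_version"):
    """Return the firmware version string the kernel exposes via DMI."""
    with open(path) as f:
        return f.read().strip()


def bios_revision(version):
    """Split a version like 'UPC1DM15' into ('UPC1DM', 15), or return None.

    Assumes the trailing digits are the release number (an assumption).
    """
    m = re.fullmatch(r"(.*?)(\d+)", version.strip())
    return (m.group(1), int(m.group(2))) if m else None


def bios_at_least(version, minimum="UPC1DM15"):
    """True/False if version is newer/older than minimum; None if it cannot tell."""
    cur, ref = bios_revision(version), bios_revision(minimum)
    if cur is None or ref is None or cur[0] != ref[0]:
        return None  # unknown format or different board family: check manually
    return cur[1] >= ref[1]
```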

  • UpBored

    I am also experiencing this, with the latest BIOS.