[SOLVED] It freezes when stressed

Andrea Giammarchi New Member Posts: 9
edited June 2016 in UP Board Linux
Dear UP,
I've successfully installed ArchLinux on my 2GB RAM / 32GB UP Board and it works just fine.

However, as soon as I stress the board with heavy GPU-oriented tasks (or, let's say, just this WebGL page), the board freezes: every. single. time.

It happens with either Web (Epiphany) or Chromium: after a couple of minutes, it dies.

The current BIOS doesn't have any temperature option, but if I touch the board when the freeze happens, it is *really* hot.

However, I'm starting to wonder whether I'm not giving the board enough amps... could that be the cause instead?

I am using a 5V 4A power supply. Please let me know whether my board is a faulty one, whether the power supply isn't adequate, or anything else that could help.

Thank you!

P.S. I tried to attach an image, but your forum keeps telling me "Attachment image size exceed limit allowed by configuration" without saying what the limit is. Even an image under 500 KB triggered this warning ¯\_(ツ)_/¯

Comments

  • Javier Arteaga
    Javier Arteaga Emutex Posts: 163 mod
    Hi WebReflection,

    Most of the freezes we saw during testing turned out to be related to a known SoC bug (ref: Intel errata CHT45; see also Linux #109051). The only known OS workaround is to disallow the C6 and C7 processor sleep states. Could you please add [tt]intel_idle.max_cstate=1[/tt] to your kernel parameters, reboot, and rerun your test?
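    After rebooting, it is worth confirming that the parameter actually reached the kernel, since /proc/cmdline shows the command line the running kernel was booted with. The following is a minimal POSIX shell sketch; the helper name has_kernel_param and its optional file argument are hypothetical, the latter only so the function can be tried against a sample file. How the parameter gets added in the first place depends on the bootloader (with GRUB, for example, it usually goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by regenerating grub.cfg).

```shell
#!/bin/sh
# Succeed (exit status 0) if a kernel parameter is on the boot command line.
# Usage: has_kernel_param NAME[=VALUE] [CMDLINE_FILE]
# Hypothetical helper; CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_param() {
  want="$1"
  file="${2:-/proc/cmdline}"
  for arg in $(cat "$file"); do
    case "$want" in
      *=*) # NAME=VALUE given: require an exact match
           [ "$arg" = "$want" ] && return 0 ;;
      *)   # only NAME given: accept NAME or NAME=anything
           case "$arg" in "$want"|"$want"=*) return 0 ;; esac ;;
    esac
  done
  return 1
}
```

    For example, `has_kernel_param intel_idle.max_cstate=1 && echo active` prints `active` only if that exact setting is on the running kernel's command line.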

    A 4A power supply should be enough, even for full CPU + GPU stress testing.

    PS: I've bumped the attachment size limit to 2MB. Thanks for pointing it out.
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    Nope: even after adding that kernel parameter and rebooting, it freezes as soon as I switch to 500 fish and one core reaches 100%.

    Any other possible workaround? I'm also on kernel 4.6.2, but I could try an LTS version if you think that would fix it.

    Thank you
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    OK, I've tried disabling most of the turbo/boost-related BIOS options, both with and without that kernel boot parameter, without success.

    This board, my board, looks extremely unstable with Linux. I wonder if I should try Windows 10 on it and see if it goes any better...

    If you have any other hint/info about Linux though, please let me know 'cause I don't fancy Windows too much.

    Best Regards
  • Björn Reichert
    Björn Reichert New Member Posts: 48
    edited June 2016
    Another option is to set the kernel parameter maxcpus=3. That was my solution for video freezes.
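    To confirm that maxcpus=3 took effect, /sys/devices/system/cpu/online lists the CPUs the kernel brought up, as ranges such as 0-2 or 0-1,3. A minimal sketch that counts them, using a hypothetical helper name count_online_cpus; on a four-core board booted with maxcpus=3 it should print 3 rather than 4.

```shell
#!/bin/sh
# Count the CPUs in a Linux CPU-list string such as "0-3" or "0-1,3".
# Hypothetical helper: reads /sys/devices/system/cpu/online unless another
# file is given as the first argument.
count_online_cpus() {
  list=$(cat "${1:-/sys/devices/system/cpu/online}") || return 1
  total=0
  old_ifs=$IFS
  IFS=','                                   # split the list on commas
  for part in $list; do
    case "$part" in
      *-*) lo=${part%-*}; hi=${part#*-}     # a range such as "0-2"
           total=$((total + hi - lo + 1)) ;;
      *)   total=$((total + 1)) ;;          # a single CPU such as "3"
    esac
  done
  IFS=$old_ifs
  echo "$total"
}
```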
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    Nope, that didn't solve it either. It also doesn't look much like a solution, but just to double-check: are you also setting max_cstate?

    Thanks
  • Javier Arteaga
    Javier Arteaga Emutex Posts: 163 mod
    Hello WebReflection,

    If the C-state parameter does not help, then it doesn't seem like the bug I mentioned is related to the freezes you're seeing (whereas that sounds more likely in Björn's case). You're also describing a system hang that can be reliably reproduced, which isn't the case for CHT45. So I'd say we can rule that one out.

    Just so we can look deeper into it this coming week, would you mind providing some extra information?
    [ul]
    [li]What is the output of these commands for your board?
    [code]
    cat /sys/devices/virtual/dmi/id/bios_version
    cat /sys/devices/virtual/dmi/id/board_version
    [/code]
    [/li]
    [li]Can you see an entry named DPTF in the BIOS menus? Is it enabled? If it is, does disabling it keep the board running under stress?[/li]
    [li]Do you also experience hangs when running a slightly lighter CPU+GPU workload over a longer (>10 min) period of time?[/li]
    [/ul]

    In the meantime, my only other suggestion would be to try a different power supply (rated at least 5V/3A).

    Thanks for your feedback!
  • Fabrizio
    Fabrizio AAEON Posts: 123 mod
    Hi WebReflection

    Have you tried putting a small fan on top of the heatsink to see whether it still freezes?
    If you stress the GPU and CPU, the temperature may rise, the CPU may start to throttle, and the system may freeze. This is why many Cherry Trail tablets have Turbo disabled.
    During our stress tests we developed a heatsink with a small fan, but we didn't use it because in most cases it's not necessary.
    It would be great if you could run the test and let us know.
    Thanks!

    Fabrizio
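    One rough way to see whether the CPU is throttling under the WebGL load is to compare each core's current frequency with its maximum via the cpufreq sysfs files (both in kHz). A minimal sketch, assuming a hypothetical helper freq_report and a cpufreq driver that exposes scaling_cur_freq; how faithfully that file reflects hardware throttling depends on the driver, so treat the numbers as a hint rather than proof.

```shell
#!/bin/sh
# Print each CPU's current frequency as a percentage of its maximum.
# A value well below 100% under sustained load may hint at throttling.
# Hypothetical helper; BASE defaults to the real sysfs CPU directory.
freq_report() {
  base="${1:-/sys/devices/system/cpu}"
  for dir in "$base"/cpu[0-9]*; do
    f="$dir/cpufreq"
    # skip CPUs (or non-matching globs) without readable cpufreq files
    [ -r "$f/scaling_cur_freq" ] && [ -r "$f/cpuinfo_max_freq" ] || continue
    cur=$(cat "$f/scaling_cur_freq")    # kHz
    max=$(cat "$f/cpuinfo_max_freq")    # kHz
    echo "${dir##*/}: $cur/$max kHz ($((cur * 100 / max))%)"
  done
}
```

    Calling freq_report repeatedly (for example once per second in a loop) while the aquarium runs shows whether the percentages drop as the board heats up.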
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    Will try, but as I said, disabling Turbo didn't help much :-(
  • Fabrizio
    Fabrizio AAEON Posts: 123 mod
    My mistake, I misread it.
    With Turbo disabled we didn't experience any freezes.
    Let us know.
    In the meantime, could you please provide the info to jarteaga?
    Thanks again
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    bios_version: UPC1BM0R (05/24/2016)
    board_version: V0.4

    DPTF is visible and disabled

    Will try with another power supply
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    OK, so ... this is what I've tried without success:
    [ol]
    [li]I've tried two other power supplies: a 5V 3A and a slightly different 5V 4A (only the model name differs). I've also tried a 5V 2A, but with that one the board didn't reach the login screen[/li]
    [li]I've tried plugging a fan borrowed from an ODROID-XU4 into the only free connector on the board... nothing. It turns out that's the power pin header, so no fan would work attached there[/li]
    [li]I've tried enabling and then disabling DPTF again... nothing[/li]
    [/ol]

    Accordingly, I can summarize my thoughts as follows:
    [ol]
    [li]the power supply is not the problem[/li]
    [li]if I don't run heavy tasks and keep the CPUs "down", the board seems to work fine, so I still suspect a heating issue. A BIOS option to set the temperature at which the system should halt would help diagnose this, but there's no such option[/li]
    [li]can anyone else reproduce the problem? Log into a desktop, open Chromium, visit that WebGL aquarium page, select 500 fish, open the system monitor, wait a minute or a bit longer, and watch the environment freeze[/li]
    [li]if the previous point can be reproduced elsewhere, we've got an overheating problem that should be documented and solved; otherwise I guess my UP Board has a hardware defect and should be replaced?[/li]
    [/ol]
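    Without a BIOS temperature option, the kernel's thermal zones can serve as a substitute: /sys/class/thermal/thermal_zone*/temp reports each zone's temperature in millidegrees Celsius. A minimal sketch with hypothetical helper names read_temps and log_temps; the logging loop calls sync after each line so that, with luck, the last readings before a freeze are still on disk after the reboot.

```shell
#!/bin/sh
# Print one line per thermal zone, e.g. "thermal_zone0 (TYPE): 56.5 C".
# sysfs reports millidegrees Celsius. Hypothetical helper; BASE defaults
# to /sys/class/thermal and is a parameter only so it can be tested.
read_temps() {
  base="${1:-/sys/class/thermal}"
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue
    kind=$(cat "$zone/type" 2>/dev/null || echo unknown)
    mc=$(cat "$zone/temp")
    echo "${zone##*/} ($kind): $((mc / 1000)).$(( (mc % 1000) / 100 )) C"
  done
}

# Write a timestamped reading every INTERVAL seconds (default 5) and flush
# to disk, so the last lines written before a hang are likely kept.
# Usage: log_temps [BASE] [INTERVAL] >> temps.log &
log_temps() {
  while :; do
    echo "$(date '+%H:%M:%S') $(read_temps "$1" | tr '\n' ' ')"
    sync
    sleep "${2:-5}"
  done
}
```

    For example, start `log_temps >> ~/temps.log &`, run the aquarium test, and read the end of the log after rebooting. Which zones exist, and what their type names are, varies by board and kernel.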

    Best regards
  • Dan O'Donovan
    Dan O'Donovan Administrator, Moderator, Emutex Posts: 241 admin
    I've tried this test this evening with our ubilinux distro, using both the distro 4.4 kernel and also the latest mainline kernel (4.7-rc3), and I was unable to reproduce the issue. I left it running for over 30 minutes and the test ran fine.

    Would you consider checking with another distro, in case it is anything specific to the graphics libraries or configuration of your current distro? Perhaps try a Live USB version of Lubuntu 16.04, or something like that, if you don't want to overwrite your current Arch Linux install.
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    edited June 2016
    It takes no time to install it (see archibold.io), so thanks for trying this out. I hope it works with your ubilinux, since that would mean my board is fine ;-)

    Will come back ASAP, thank you.
  • Andrea Giammarchi
    Andrea Giammarchi New Member Posts: 9
    edited June 2016
    OK... what a journey. I don't even know where to start filing bugs, but here's the story:

    Before installing ubilinux on my UP Board, I wanted to try the exact same installation on similar hardware.
    Please bear in mind that the archibold.io installer is nothing but a simplified way to install ArchLinux on any i686 or x86_64 target.

    It accepts a few parameters and, unless told otherwise, creates a 2GiB SWAP partition by default.

    The Intel Compute Stick was actually showing problems similar to the UP Board's, but it uses a different platform: Intel® Atom™ CPU Z3735F @ 1.33GHz x 4

    Not wanting to mix up similar problems on different hardware, I installed a fresh copy on the latest Compute Stick: Intel® Atom™ x5-Z8350 @ 1.44GHz x 4

    At this point I installed ubilinux on the UP Board and saw for myself that everything was fine.

    However, in the other cases I could reproduce the freeze on every single Intel "embedded-oriented" board, and then I realized one thing: they all froze as soon as they ran out of available RAM and started using the SWAP partition!
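    To check whether a freeze lines up with the system starting to swap, the SwapTotal and SwapFree fields of /proc/meminfo (in kB) give the amount of swap in use. A minimal sketch with a hypothetical helper name swap_used_kb; polling it while the test runs shows whether swap usage climbs just before the hang.

```shell
#!/bin/sh
# Print the swap currently in use, in kB, computed from /proc/meminfo as
# SwapTotal - SwapFree. Hypothetical helper; the optional file argument
# exists only so it can be tried on a sample.
swap_used_kb() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END           { print total - free }' "${1:-/proc/meminfo}"
}
```

    On a system installed without swap, or after `swapoff -a` (which disables all swap for the current boot and needs root), it should print 0.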

    The ubilinux RC3 image I tried didn't install a SWAP partition: ah-ah!
    On top of that, it uses kernel 4.4... which is an LTS release: ah-AH!

    So, TL;DR: I installed archibold.io with SWAP=0, used `archy use linux-lts only` to set up only the linux-lts kernel, and guess what: everything went fine!

    If I use kernel 4.6, the board freezes even without SWAP; if I use the 4.4 kernel and no SWAP at all, the board runs like a charm.
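    Since the outcome here depends on the kernel series (4.4 LTS vs 4.6), a quick check of which series is actually booted can avoid confusion when several kernels are installed. A minimal sketch; kernel_series is a hypothetical helper that trims `uname -r` output down to major.minor.

```shell
#!/bin/sh
# Print the major.minor series of a kernel release string, defaulting to
# the running kernel. Hypothetical helper; example inputs and outputs:
#   4.6.2-1-ARCH -> 4.6     4.7-rc3 -> 4.7
kernel_series() {
  rel="${1:-$(uname -r)}"
  rel="${rel%%-*}"                 # drop any "-suffix" (local version, -rcN)
  echo "$rel" | cut -d. -f1,2      # keep the first two dot-separated fields
}
```

    For example, `[ "$(kernel_series)" = 4.4 ] && echo "running the 4.4 LTS series"`.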

    It must be said that the board still warms up like "the ground is lava", but it ran reliably and stably, without glitches.

    Furthermore, LTS 4.4 with SWAP does freeze, but after a while it starts working again. LTS proved itself stable AF.
    With the same config on the latest mainstream kernel: bye-bye board, it never recovers.

    Please find attached a screenshot of archibold running the LTS kernel on a UP Board, with GNOME at full steam and not freezing for a millisecond.

    Thank You!

    P.S. The forum degraded the screenshot; here's the original:
    https://webreflection.github.io/examples/img/archibold.io-up-board.png