Up Core running Windows 10 IoT Core randomly freezing
I have a customer who has deployed about 30 Up Core boards in the field, with plans to place another 30 to 50 in the coming months. Over the past two years we have seen the boards randomly freeze. Rebooting the OS allows the systems to recover until they randomly freeze again, sometimes many months later. Recently we have seen a spate of random freezing. All of the boards are running the R1.8 BIOS. I read this post which details a similar issue with Up Boards. Is there a new version of the Up Core BIOS available? According to the up-community downloads page for the Up Core, version R1.8 seems to be the last version released. Since the systems are located out in the field in remote locations, I do not have access to the crash dump files at the moment, but I can try to get them if this would be helpful to diagnose the problem.
Comments
-
Hi @gerfen ,
It is difficult to diagnose the issue without the log file. I would suggest updating the BIOS to the latest version, R1.9, ideally on one of the affected units or otherwise on a spare unit. If possible, ssh into the system and run a stress-ng test for about 1-2 weeks to try to reproduce the random freezing.
stress-ng --cpu 3 --io 4 --vm 3 --vm-bytes 3160M --fork 4 --timeout 1209600
(for 2 weeks)
Also, if you could get the dump files from the devices that are already affected, that would help narrow down what the issue could be.
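As a quick check on the timeout value in the command above: stress-ng reads a bare --timeout number as seconds, and 14 days works out to 1209600 seconds. Here is a minimal shell sketch that derives the value from a day count. The variable names are only illustrative; the stress-ng flags are copied from the command above.

```shell
# Illustrative variable names; only the stress-ng flags come from the post above.
days=14
timeout_s=$((days * 24 * 60 * 60))   # days -> hours -> minutes -> seconds

# Print the command instead of running it, so it can be reviewed first.
echo "stress-ng --cpu 3 --io 4 --vm 3 --vm-bytes 3160M --fork 4 --timeout ${timeout_s}"
```

stress-ng should also accept a unit suffix on the timeout (for example 14d), which avoids the arithmetic altogether.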
-
@camillus Thanks for the reply. I'll work on getting a couple of systems set up with the new R1.9 version of the BIOS. Unfortunately, the failing systems are installed in remote, off-grid locations, so it's going to take a bit of time to have one of the systems shipped back so I can run tests on it. When I get access to the failing systems, I will see if there are any dump files available. Given that the systems appear to be completely frozen, I am concerned there may not be any dump files. I will let you know just as soon as I can.
Also, I'm a bit confused by your suggestion to run stress-ng. Isn't this a Linux utility? We're dealing with Windows 10 IoT Core as the OS. I've connected to a system via PowerShell and tried to execute:
stress-ng ?
which returns this:
[10.10.10.10]: PS C:\Data\Users\Administrator\Documents> stress-ng ?
The term 'stress-ng' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
+ CategoryInfo : ObjectNotFound: (stress-ng:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Is there a similar utility for Windows 10 IoT Core?
-
Hi @gerfen ,
Yes, you are right, stress-ng is a Linux utility. However, booting the device from a bootable USB drive with Ubuntu, for example, and running the stress-ng test there will help us narrow down the cause. If no crash happens, we can rule out the hardware as the cause of the random freezing. I also see that there is a Windows tool that could help with getting the logs from Windows 10 IoT Core. Kindly check this resource, or share it with your customer to test.
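Since a hard freeze may leave no logs behind, one option during the Ubuntu stress test is to write a periodic heartbeat timestamp to disk, so that the last entry shows roughly when the board stopped responding. The following is only a sketch under stated assumptions: the function names, log path, and interval are hypothetical and not part of stress-ng or any UP tooling.

```shell
# Hypothetical helper: append a UTC timestamp to a log file and flush it to disk.
heartbeat() {
    log_file="$1"
    date -u +"%Y-%m-%dT%H:%M:%SZ" >> "$log_file"
    sync   # flush buffers so the entry is more likely to survive a hard freeze
}

# Hypothetical helper: write `count` heartbeats, `interval` seconds apart.
run_heartbeat() {
    log_file="$1"
    interval="$2"
    count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        heartbeat "$log_file"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Short demonstration: two heartbeats, one second apart, into a temporary file.
demo_log="$(mktemp)"
run_heartbeat "$demo_log" 1 2
echo "last heartbeat: $(tail -n 1 "$demo_log")"
```

For a two-week run, something like run_heartbeat /var/log/heartbeat.log 60 20160 & (one entry per minute for 14 days, path assumed) could be started in the background just before the stress-ng command. After a freeze and forced reboot, the last line of the log gives the approximate time the system stopped.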
If concerns remain, you may also proceed with an RMA: directly with us if the boards were purchased from the UP Shop, otherwise via the reseller. Please also provide us with a sample of the affected units so that we can check it on our side.
-
@camillus We've been able to recreate the freezing behavior on an UP Core which is in my customer's possession. The UP Core is completely frozen. We cannot log into the Device Portal built into Windows 10 IoT Core, nor can we log in with a secure PowerShell session. Also, the device is not seen by the Windows 10 IoT Core dashboard, nor does it show up as connected on the network router.
I've asked my customer to keep the device in this frozen state until I've had a chance to get some guidance from you. Given that the system is locked up, I'm not sure what the next steps should be. I think we'll need to force the system to reboot but I do not think we'll get a crash dump. Do you have any other advice before we restart the system?
-
@camillus Here are three dump files from the frozen system. Interestingly, two appear to be empty, but I included the files anyway.
-
@gerfen
The dump file only shows "Probably caused by : ntoskrnl.exe ( nt+1b73b0 )", so we still cannot tell what caused the crash. Did you build the IoT Core image yourself, or are you using our pre-built version?
You can try connecting a display to see whether the system is frozen in the OS or stuck during a reboot.
If it is frozen in the OS, could you list in detail which apps run when IoT Core starts up, or try excluding the possible drivers or apps one at a time? You can email me directly to discuss this further. Thanks.
-
@garyw The image is the one you helped me build about three years ago. It's been running just fine on dozens of devices in the field until recently, and the problem affects only a small number of UP Core devices. Yesterday, I was able to be onsite with one of the frozen devices. I could not get the device to respond when I connected a display and keyboard; the display remained blank and the keyboard was unresponsive. I was forced to manually reboot the device. At the moment it is still running.
I will send you an email to continue our discussion.