drm/i915: Resetting chip after gpu hang

MICHAEL A POPE
edited October 2017 in UP Squared Linux
I'm running ubilinux 4.0 with kernel 4.12.0-0.bpo.2-amd64. This morning, whilst I was playing a video from the Twitch website in Google Chrome, the whole computer froze. The computer still had plenty of free disk space and memory and was not overheating. I was still able to ssh in, though, and captured this from the dmesg log:
[32344.796818] drm/i915: Resetting chip after gpu hang
[32344.800711] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[32344.809593] IP: reset_common_ring+0x8d/0x110 [i915]
[32344.815084] PGD 0 
[32344.815086] P4D 0 

[32344.821281] Oops: 0000 [#1] SMP
[32344.824805] Modules linked in: bnep 8021q garp mrp stp llc snd_hda_codec_hdmi xt_connmark cpufreq_conservative iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 cpufreq_userspace nf_nat_ipv4 nf_nat nf_conntrack libcrc32c cpufreq_powersave iptable_mangle iptable_filter bluetooth joydev ecdh_generic rfkill hid_logitech_hidpp usb_f_acm usb_f_fs usb_f_serial u_serial libcomposite udc_core configfs hid_generic hid_logitech_dj binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal coretemp i2c_designware_platform kvm_intel i2c_designware_core kvm evdev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_soc_skl intel_rapl_perf efi_pstore snd_soc_skl_ipc snd_soc_sst_ipc efivars snd_soc_sst_dsp snd_hda_ext_core pcspkr snd_soc_sst_match lpc_ich snd_soc_core snd_compress snd_hda_intel
[32344.904437]  snd_hda_codec i915 snd_hda_core snd_hwdep idma64 snd_pcm ftdi_sio snd_timer usbserial drm_kms_helper snd intel_lpss_pci intel_lpss soundcore drm mei_me shpchp mfd_core mei i2c_algo_bit video button usbhid hid i2c_dev parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache mmc_block crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci libata xhci_pci sdhci_pci i2c_i801 xhci_hcd sdhci mmc_core usbcore usb_common r8169 mii scsi_mod
[32344.956903] CPU: 0 PID: 13569 Comm: kworker/0:0 Not tainted 4.12.0-0.bpo.2-amd64 #1 Debian 4.12.13-1~bpo9+1
[32344.967847] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM18 06/23/2017
[32344.975807] Workqueue: events_long i915_hangcheck_elapsed [i915]
[32344.982575] task: ffff96e549126180 task.stack: ffffa6a58c320000
[32344.989268] RIP: 0010:reset_common_ring+0x8d/0x110 [i915]
[32344.995334] RSP: 0018:ffffa6a58c323b98 EFLAGS: 00010206
[32345.001201] RAX: 0000000000003f50 RBX: ffff96e590e87900 RCX: ffff96e5b3ebdc38
[32345.009214] RDX: 0000000000003f90 RSI: ffff96e5b3e0c000 RDI: ffff96e5b3ebdc00
[32345.017229] RBP: ffffa6a58c323bb8 R08: 0000000000000e44 R09: ffffa6a5b0014540
[32345.025242] R10: 00000000ffffffff R11: ffff96e5b1186040 R12: ffff96e5b1850000
[32345.033252] R13: 0000000000000000 R14: ffff96e5b51ba900 R15: ffff96e5b51b8000
[32345.041283] FS:  0000000000000000(0000) GS:ffff96e5bfc00000(0000) knlGS:0000000000000000
[32345.050368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32345.056817] CR2: 0000000000000070 CR3: 000000027194a000 CR4: 00000000003406f0
[32345.064829] Call Trace:
[32345.067577]  ? bit_wait_io_timeout+0x90/0x90
[32345.072406]  ? i915_gem_reset+0xbe/0x370 [i915]
[32345.077537]  ? intel_uncore_forcewake_put+0x36/0x50 [i915]
[32345.083702]  ? bit_wait_io_timeout+0x90/0x90
[32345.088526]  ? i915_reset+0xd9/0x160 [i915]
[32345.093245]  ? i915_reset_and_wakeup+0x17d/0x190 [i915]
[32345.099152]  ? i915_handle_error+0x1df/0x220 [i915]
[32345.104629]  ? scnprintf+0x49/0x80
[32345.108488]  ? hangcheck_declare_hang+0xce/0xf0 [i915]
[32345.114306]  ? fwtable_read32+0x83/0x1b0 [i915]
[32345.119441]  ? i915_hangcheck_elapsed+0x2b1/0x2e0 [i915]
[32345.125404]  ? process_one_work+0x181/0x370
[32345.130105]  ? worker_thread+0x4d/0x3a0
[32345.134411]  ? kthread+0xfc/0x130
[32345.138129]  ? process_one_work+0x370/0x370
[32345.142822]  ? kthread_create_on_node+0x70/0x70
[32345.147907]  ? do_group_exit+0x3a/0xa0
[32345.152114]  ? ret_from_fork+0x25/0x30
[32345.156317] Code: c8 01 00 00 89 50 14 48 8b 83 80 00 00 00 8b 93 c8 01 00 00 89 50 28 48 8b bb 80 00 00 00 e8 2b 29 00 00 4d 8b ac 24 60 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 4c 89 ef e8 cc c1 
[32345.177577] RIP: reset_common_ring+0x8d/0x110 [i915] RSP: ffffa6a58c323b98
[32345.185298] CR2: 0000000000000070
[32345.204182] ---[ end trace e28431d538f56e18 ]---
[32345.466388] BUG: unable to handle kernel paging request at fffffffeecabe52a
[32345.474229] IP: 0xfffffffeecabe52a
[32345.478046] PGD 1e6c0c067 
[32345.478048] P4D 1e6c0c067 
[32345.481081] PUD 0 

[32345.488029] Oops: 0010 [#2] SMP
[32345.491541] Modules linked in: bnep 8021q garp mrp stp llc snd_hda_codec_hdmi xt_connmark cpufreq_conservative iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 cpufreq_userspace nf_nat_ipv4 nf_nat nf_conntrack libcrc32c cpufreq_powersave iptable_mangle iptable_filter bluetooth joydev ecdh_generic rfkill hid_logitech_hidpp usb_f_acm usb_f_fs usb_f_serial u_serial libcomposite udc_core configfs hid_generic hid_logitech_dj binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal coretemp i2c_designware_platform kvm_intel i2c_designware_core kvm evdev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_soc_skl intel_rapl_perf efi_pstore snd_soc_skl_ipc snd_soc_sst_ipc efivars snd_soc_sst_dsp snd_hda_ext_core pcspkr snd_soc_sst_match lpc_ich snd_soc_core snd_compress snd_hda_intel
[32345.571112]  snd_hda_codec i915 snd_hda_core snd_hwdep idma64 snd_pcm ftdi_sio snd_timer usbserial drm_kms_helper snd intel_lpss_pci intel_lpss soundcore drm mei_me shpchp mfd_core mei i2c_algo_bit video button usbhid hid i2c_dev parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache mmc_block crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci libata xhci_pci sdhci_pci i2c_i801 xhci_hcd sdhci mmc_core usbcore usb_common r8169 mii scsi_mod
[32345.623555] CPU: 0 PID: 31039 Comm: kworker/0:1 Tainted: G      D         4.12.0-0.bpo.2-amd64 #1 Debian 4.12.13-1~bpo9+1
[32345.635855] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM18 06/23/2017
[32345.643768] Workqueue: events efivar_update_sysfs_entries [efivars]
[32345.650804] task: ffff96e56d541180 task.stack: ffffa6a58c280000
[32345.660156] RIP: 0010:0xfffffffeecabe52a
[32345.667245] RSP: 0018:ffffa6a58c283bd8 EFLAGS: 00010206
[32345.675766] RAX: 00000000000000ff RBX: ffffa6a58c283c58 RCX: 0000000000000440
[32345.686489] RDX: 00000000000000b2 RSI: ffff96e589e91000 RDI: 0000000000000400
[32345.697194] RBP: ffffa6a58c283dc8 R08: 00000000000000ff R09: 0000000000000000
[32345.707927] R10: 000000007a066018 R11: 0000000000000021 R12: ffffa6a58c283dd0
[32345.718585] R13: 8000000000000015 R14: 0000000000000286 R15: ffffffffb5cd1e68
[32345.729169] FS:  0000000000000000(0000) GS:ffff96e5bfc00000(0000) knlGS:0000000000000000
[32345.740765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32345.749639] CR2: fffffffeecabe52a CR3: 00000001e6c09000 CR4: 00000000003406f0
[32345.759999] Call Trace:
[32345.765036]  ? set_next_entity+0xcd/0x200
[32345.771736]  ? efi_call+0x58/0x90
[32345.777624]  ? virt_efi_get_next_variable+0x76/0x120
[32345.785361]  ? efivar_init+0x13e/0x3d0
[32345.791746]  ? efivar_release+0x20/0x20 [efivars]
[32345.799273]  ? up+0x12/0x60
[32345.804570]  ? efivar_entry_add+0x53/0x80
[32345.811211]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[32345.819924]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[32345.828620]  ? process_one_work+0x181/0x370
[32345.835442]  ? worker_thread+0x4d/0x3a0
[32345.841945]  ? kthread+0xfc/0x130
[32345.847803]  ? process_one_work+0x370/0x370
[32345.854643]  ? kthread_create_on_node+0x70/0x70
[32345.861826]  ? do_group_exit+0x3a/0xa0
[32345.868101]  ? ret_from_fork+0x25/0x30
[32345.874332] Code:  Bad RIP value.
[32345.880152] RIP: 0xfffffffeecabe52a RSP: ffffa6a58c283bd8
[32345.888252] CR2: fffffffeecabe52a
[32345.894016] ---[ end trace e28431d538f56e19 ]---
[32347.511656] ------------[ cut here ]------------
[32347.516838] WARNING: CPU: 0 PID: 11608 at /build/linux-RdeW6Z/linux-4.12.13/arch/x86/kernel/fpu/core.c:46 __kernel_fpu_begin+0x9c/0xb0
[32347.530411] Modules linked in: bnep 8021q garp mrp stp llc snd_hda_codec_hdmi xt_connmark cpufreq_conservative iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 cpufreq_userspace nf_nat_ipv4 nf_nat nf_conntrack libcrc32c cpufreq_powersave iptable_mangle iptable_filter bluetooth joydev ecdh_generic rfkill hid_logitech_hidpp usb_f_acm usb_f_fs usb_f_serial u_serial libcomposite udc_core configfs hid_generic hid_logitech_dj binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal coretemp i2c_designware_platform kvm_intel i2c_designware_core kvm evdev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_soc_skl intel_rapl_perf efi_pstore snd_soc_skl_ipc snd_soc_sst_ipc efivars snd_soc_sst_dsp snd_hda_ext_core pcspkr snd_soc_sst_match lpc_ich snd_soc_core snd_compress snd_hda_intel
[32347.609679]  snd_hda_codec i915 snd_hda_core snd_hwdep idma64 snd_pcm ftdi_sio snd_timer usbserial drm_kms_helper snd intel_lpss_pci intel_lpss soundcore drm mei_me shpchp mfd_core mei i2c_algo_bit video button usbhid hid i2c_dev parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache mmc_block crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci libata xhci_pci sdhci_pci i2c_i801 xhci_hcd sdhci mmc_core usbcore usb_common r8169 mii scsi_mod
[32347.661922] CPU: 0 PID: 11608 Comm: kworker/u8:0 Tainted: G      D         4.12.0-0.bpo.2-amd64 #1 Debian 4.12.13-1~bpo9+1
[32347.674288] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM18 06/23/2017
[32347.682190] Workqueue: writeback wb_workfn (flush-179:0)
[32347.688146] task: ffff96e5b2912080 task.stack: ffffa6a58bda0000
[32347.694782] RIP: 0010:__kernel_fpu_begin+0x9c/0xb0
[32347.700155] RSP: 0018:ffffa6a58bda3760 EFLAGS: 00010202
[32347.706011] RAX: 0000000080000001 RBX: 0000000000001000 RCX: ffff96e5b2912080
[32347.714017] RDX: 0000000000000000 RSI: ffff96e59ad61000 RDI: ffffa6a58bda37e8
[32347.722006] RBP: ffffa6a58bda37e8 R08: 0000000000000000 R09: 0000000000001000
[32347.730016] R10: ffff96e5b5ebe800 R11: fffff0b0496b5840 R12: ffff96e59ad61000
[32347.738007] R13: 0000000000000000 R14: ffff96e59ad61000 R15: ffffffffc03dc050
[32347.746014] FS:  0000000000000000(0000) GS:ffff96e5bfc00000(0000) knlGS:0000000000000000
[32347.755081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32347.761512] CR2: 0000559a9b118588 CR3: 00000001e6c09000 CR4: 00000000003406f0
[32347.769503] Call Trace:
[32347.772253]  ? crc32c_pcl_intel_update+0x79/0xa0 [crc32c_intel]
[32347.778899]  ? crypto_shash_update+0x3f/0x110
[32347.783804]  ? ext4_block_bitmap_csum_set+0x6a/0xb0 [ext4]
[32347.789968]  ? ext4_mb_mark_diskspace_used+0x1f4/0x490 [ext4]
[32347.796438]  ? ext4_mb_new_blocks+0x30a/0xac0 [ext4]
[32347.802004]  ? blk_attempt_req_merge+0x3c/0x60
[32347.807023]  ? ext4_find_extent+0x28e/0x2d0 [ext4]
[32347.812409]  ? ext4_ext_map_blocks+0xb1b/0x1290 [ext4]
[32347.818191]  ? __alloc_pages_slowpath+0x9a1/0xd10
[32347.823501]  ? ext4_map_blocks+0x164/0x590 [ext4]
[32347.828789]  ? ext4_writepages+0x81b/0xe50 [ext4]
[32347.834065]  ? update_group_capacity+0x23/0x1e0
[32347.839154]  ? cpumask_next_and+0x26/0x40
[32347.843645]  ? strlcpy+0x31/0x40
[32347.847257]  ? do_writepages+0x17/0x60
[32347.851466]  ? do_writepages+0x17/0x60
[32347.855671]  ? __writeback_single_inode+0x3d/0x300
[32347.861035]  ? writeback_sb_inodes+0x221/0x4f0
[32347.866030]  ? __writeback_inodes_wb+0x87/0xb0
[32347.871021]  ? wb_writeback+0x282/0x310
[32347.875320]  ? set_next_entity+0xcd/0x200
[32347.879820]  ? wb_workfn+0x2ce/0x3a0
[32347.883823]  ? wb_workfn+0x2ce/0x3a0
[32347.887838]  ? process_one_work+0x181/0x370
[32347.892521]  ? worker_thread+0x4d/0x3a0
[32347.896831]  ? kthread+0xfc/0x130
[32347.900547]  ? process_one_work+0x370/0x370
[32347.905236]  ? kthread_create_on_node+0x70/0x70
[32347.910319]  ? do_group_exit+0x3a/0xa0
[32347.914517]  ? ret_from_fork+0x25/0x30
[32347.918715] Code: 48 8d b9 40 0b 00 00 85 c0 74 24 b8 ff ff ff ff 89 c2 48 0f c7 2f 31 c0 85 c0 75 17 f3 c3 0f ff eb aa 48 0f ae 81 40 0b 00 00 c3 <0f> ff eb a8 0f ff eb d8 0f ff c3 66 0f 1f 84 00 00 00 00 00 0f 
[32347.939850] ---[ end trace e28431d538f56e1a ]---
[32349.815621] asynchronous wait on fence i915:Xorg[585]/0:e45 timed out
[32349.875629] pipe A vblank wait timed out
[32349.880108] ------------[ cut here ]------------
[32349.885366] WARNING: CPU: 1 PID: 11608 at /build/linux-RdeW6Z/linux-4.12.13/drivers/gpu/drm/i915/intel_display.c:12636 intel_atomic_commit_tail+0xf21/0xf50 [i915]
[32349.901694] Modules linked in: bnep 8021q garp mrp stp llc snd_hda_codec_hdmi xt_connmark cpufreq_conservative iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 cpufreq_userspace nf_nat_ipv4 nf_nat nf_conntrack libcrc32c cpufreq_powersave iptable_mangle iptable_filter bluetooth joydev ecdh_generic rfkill hid_logitech_hidpp usb_f_acm usb_f_fs usb_f_serial u_serial libcomposite udc_core configfs hid_generic hid_logitech_dj binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal coretemp i2c_designware_platform kvm_intel i2c_designware_core kvm evdev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_soc_skl intel_rapl_perf efi_pstore snd_soc_skl_ipc snd_soc_sst_ipc efivars snd_soc_sst_dsp snd_hda_ext_core pcspkr snd_soc_sst_match lpc_ich snd_soc_core snd_compress snd_hda_intel
[32349.980956]  snd_hda_codec i915 snd_hda_core snd_hwdep idma64 snd_pcm ftdi_sio snd_timer usbserial drm_kms_helper snd intel_lpss_pci intel_lpss soundcore drm mei_me shpchp mfd_core mei i2c_algo_bit video button usbhid hid i2c_dev parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache mmc_block crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci libata xhci_pci sdhci_pci i2c_i801 xhci_hcd sdhci mmc_core usbcore usb_common r8169 mii scsi_mod
[32350.033243] CPU: 1 PID: 11608 Comm: kworker/u8:0 Tainted: G      D W       4.12.0-0.bpo.2-amd64 #1 Debian 4.12.13-1~bpo9+1
[32350.045593] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM18 06/23/2017
[32350.053533] Workqueue: events_unbound intel_atomic_commit_work [i915]
[32350.060742] task: ffff96e5b2912080 task.stack: ffffa6a58bda0000
[32350.067422] RIP: 0010:intel_atomic_commit_tail+0xf21/0xf50 [i915]
[32350.074247] RSP: 0018:ffffa6a58bda3da8 EFLAGS: 00010286
[32350.080113] RAX: 000000000000001c RBX: ffff96e5b51b8000 RCX: 0000000000000000
[32350.088096] RDX: 0000000000000000 RSI: ffff96e5bfc8dee8 RDI: ffff96e5bfc8dee8
[32350.096094] RBP: 0000000000000000 R08: ffff96e5b099ac18 R09: 00000000000003d5
[32350.104076] R10: ffffa6a58bda3da8 R11: ffffffffb5ecddcd R12: 0000000000000000
[32350.112074] R13: 0000000000000000 R14: ffff96e5b20bb000 R15: 0000000000000001
[32350.120057] FS:  0000000000000000(0000) GS:ffff96e5bfc80000(0000) knlGS:0000000000000000
[32350.129140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32350.135568] CR2: 000003dd16f14800 CR3: 000000022d622000 CR4: 00000000003406e0
[32350.143566] Call Trace:
[32350.146319]  ? remove_wait_queue+0x60/0x60
[32350.150901]  ? process_one_work+0x181/0x370
[32350.155583]  ? worker_thread+0x4d/0x3a0
[32350.159900]  ? kthread+0xfc/0x130
[32350.163600]  ? process_one_work+0x370/0x370
[32350.168279]  ? kthread_create_on_node+0x70/0x70
[32350.173350]  ? do_group_exit+0x3a/0xa0
[32350.177563]  ? ret_from_fork+0x25/0x30
[32350.181760] Code: 4c 89 44 24 08 48 83 c7 08 e8 5c 4b 7a f4 4c 8b 44 24 08 4d 85 c0 0f 85 36 fe ff ff 8d 75 41 48 c7 c7 b0 9e 98 c0 e8 05 2b 87 f4 <0f> ff e9 20 fe ff ff 8d 70 41 48 c7 c7 80 9e 98 c0 e8 ef 2a 87 
[32350.202948] ---[ end trace e28431d538f56e1b ]---
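Since the log is long, here is a rough sketch of how the headline events could be pulled out of it to see the order in which things went wrong. It is only an illustration: the function name `summarize` and the list of prefixes are my own choices, based on the line formats visible in the log above, and it assumes every line carries the usual `[seconds.microseconds]` timestamp.

```python
import re

# Matches the kernel timestamp prefix, e.g. "[32344.800711] ".
# Assumption: timestamps look like the ones in the log above.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

# Prefixes of the headline lines seen in this particular log (hypothetical
# selection; other logs may need other prefixes).
HEADLINES = ("BUG:", "WARNING:", "Oops:", "drm/i915:")


def summarize(dmesg_text):
    """Return (timestamp, message) pairs for the headline lines only."""
    events = []
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue  # blank lines or lines without a timestamp
        stamp, message = match.groups()
        if message.startswith(HEADLINES):
            events.append((float(stamp), message))
    return events
```

Passing the text of the log above to `summarize()` should return the GPU reset message first, followed by the two BUG/Oops pairs and the two WARNING lines, which makes it easier to see that everything after the first Oops happened on an already damaged kernel.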

To work around video tearing, I have added this to my /etc/X11/xorg.conf file:
Section "Device"
	Identifier "Intel Graphics"
	Driver "intel"
	Option "TearFree" "true"
EndSection

What can I do so this problem doesn't happen again?