9260 crashes

Intel wireless cards are staples. They’re cheap, they’re always the first to feature the latest technology, and they’re everywhere. You’d be hard-pressed to find a mid-range or high-end machine with anything but an Intel card. I use Linux, and Intel has supported Linux for a long time in all manner of capacities, wireless cards included, so these cards were the natural choice.

I bought a pair of 9260s for my laptop and desktop, expecting decent speeds and reliable connections. While the speeds exceeded my expectations, reliability is a different story. Under high load (e.g. downloading big games from Steam) or simply prolonged intensive use (e.g. big rsync transfers), the cards tend to crash with microcode errors similar to the following:

Queue 10 is active on fifo 3 and stuck for 10000 ms. SW [1, 2] HW [1, 2] FH TRB=0x0c030a000
Queue 11 is active on fifo 1 and stuck for 10000 ms. SW [1, 10] HW [1, 10] FH TRB=0x0c010b000
Microcode SW error detected. Restarting 0x0.
Start IWL Error Log Dump:
Status: 0x00000040, count: 6
Loaded firmware version: 46.8902351f.0 9260-th-b0-jf-b0-46.ucode
0x00000084 | NMI_INTERRUPT_UNKNOWN       
0x000022F0 | trm_hw_status0
0x00000000 | trm_hw_status1
0x004853D2 | branchlink2
0x004758E2 | interruptlink1
0x00487024 | interruptlink2
0x00011422 | data1
0xFF000000 | data2
0xF0000008 | data3
0x198155A1 | beacon time
0xEADE5A68 | tsf low
0x00000007 | tsf hi
0x00000000 | time gp1
0x3AAC177D | time gp2
0x00000001 | uCode revision type
0x0000002E | uCode version major
0x8902351F | uCode version minor
0x00000321 | hw version
0x00C89004 | board version
0x805DF500 | hcmd
0x00022000 | isr0
0x00000000 | isr1
0x08001802 | isr2
0x004154C0 | isr3
0x00000000 | isr4
0x00DB019C | last cmd Id
0x00011422 | wait_event
0x00000080 | l2p_control
0x00000020 | l2p_duration
0x0000003F | l2p_mhvalid
0x000000CE | l2p_addr_match
0x0000000D | lmpm_pmg_sel
0x02052032 | timestamp
0x00005810 | flow_handler
Start IWL Error Log Dump:
Status: 0x00000040, count: 7
0x20000066 | NMI_INTERRUPT_HOST
0x00000000 | umac branchlink1
0xC0088BDA | umac branchlink2
0xC0084474 | umac interruptlink1
0xC0084474 | umac interruptlink2
0x01000000 | umac data1
0xC0084474 | umac data2
0xDEADBEEF | umac data3
0x0000002E | umac major
0x8902351F | umac minor
0x3AAC1769 | frame pointer
0xC088627C | stack pointer
0x00DB019C | last host cmd
0x00000000 | isr status reg
Fseq Registers:
0xDA22A75D | FSEQ_ERROR_CODE
0x00000000 | FSEQ_TOP_INIT_VERSION
0xF177654E | FSEQ_CNVIO_INIT_VERSION
0x0000A371 | FSEQ_OTP_VERSION
0x4CA11346 | FSEQ_TOP_CONTENT_VERSION
0x772BE26C | FSEQ_ALIVE_TOKEN
0x59B5792F | FSEQ_CNVI_ID
0x192AFB70 | FSEQ_CNVR_ID
0x01000200 | CNVI_AUX_MISC_CHIP
0x01300202 | CNVR_AUX_MISC_CHIP
0x0000485B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
0x0BADCAFE | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
WRT: Collecting data: ini trigger 4 fired.

which results in a broken wireless connection that requires reloading the wireless modules or, at best, just restarting NetworkManager or the wireless back-end. I’ve experimented with iwlwifi/iwlmvm module settings and tested various configurations under load, eventually ending up with the following in /etc/modprobe.d/iwlwifi.conf:

# Enable "active" power management scheme
options iwlmvm power_scheme=1

# Use software encryption instead of hardware
#options iwlwifi swcrypto=1

# Disable 802.11n
#options iwlwifi 11n_disable=1

# Disable RX aggregation
#options iwlwifi 11n_disable=4

# Enable TX aggregation
#options iwlwifi 11n_disable=8

# Receive A-MSDUs up to 12K long
#options iwlwifi amsdu_size=3

# Disable U-APSD for BSS
#options iwlwifi uapsd_disable=1

# Disable U-APSD for P2P client
#options iwlwifi uapsd_disable=2

# Disable Wi-Fi/Bluetooth coexistence
#options iwlwifi bt_coex_active=0

# Enable Wi-Fi power management
#options iwlwifi power_save=1
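
One way to confirm that the options actually took effect after reloading the modules is to read the live values back from sysfs, where loaded modules expose their parameters under /sys/module/<module>/parameters/. The following is only a minimal sketch. show_params and its optional second argument are names made up for illustration:

```shell
#!/bin/bash
# Print the live value of every parameter of a loaded kernel module.
# show_params is a hypothetical helper; the optional second argument lets the
# function be pointed at a directory other than /sys/module (e.g. for testing).
show_params() {
  local module=$1 base=${2:-/sys/module}
  local dir="$base/$module/parameters" f
  [[ -d $dir ]] || { echo "no parameters directory for $module" >&2; return 1; }
  for f in "$dir"/*; do
    # Skip parameters that are not readable by the current user
    [[ -r $f ]] || continue
    printf '%s %s=%s\n' "$module" "${f##*/}" "$(cat "$f" 2>/dev/null)"
  done
}
```

Running `show_params iwlmvm` and `show_params iwlwifi` after a module reload should list, among others, the values of power_scheme and 11n_disable.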

I wrote a script to help with testing. It depends on the above module settings format for iwlwifi (to grep active options), and /etc/NetworkManager/NetworkManager.conf having at least the following:

[device]
#wifi.backend=iwd

since it does NetworkManager back-end switching between wpa_supplicant and iwd on command:

#!/bin/bash

nm_config=/etc/NetworkManager/NetworkManager.conf
module_config=/etc/modprobe.d/iwlwifi.conf

# Set when NetworkManager is currently configured to use iwd,
# i.e. the wifi.backend line is not commented out
grep -q '^wifi\.backend' "$nm_config" && iwd_backend=1

stop_backend() {
  if [[ $(systemctl is-active "$1") = active ]]; then
    echo Stopping "$1"...
    sudo systemctl stop "$1"
  fi
}

# Comment out the wifi.backend line for wpa_supplicant, uncomment it for iwd
switch_backend() {
  echo Switching back-end to "$1"...
  if [[ $1 != iwd ]]; then sudo sed -r 's/^(wifi\.backend)/#\1/' -i "$nm_config"
  else sudo sed -r 's/^#(wifi\.backend)/\1/' -i "$nm_config"
  fi
}

main() {
  if [[ $1 != iwd ]]; then
    stop_backend iwd
    if [[ -n $iwd_backend ]]; then switch_backend "$1"
    else stop_backend "$1"
    fi
  else
    stop_backend wpa_supplicant
    if [[ -z $iwd_backend ]]; then switch_backend iwd
    else stop_backend iwd
    fi
  fi

  echo Stopping NetworkManager...
  sudo systemctl stop NetworkManager
  
  echo Removing modules...
  sudo rmmod iwlmvm iwlwifi
  
  echo Loading modules...
  sudo modprobe iwlwifi
  
  if [[ $1 = iwd ]]; then
    echo Starting iwd...
    sudo systemctl start iwd
  fi

  echo Starting NetworkManager...
  sudo systemctl start NetworkManager
}

case $1 in
  -d) grep '^options' "$module_config" ;;
  -e) sudo vim "$module_config" ;;
  -w) main wpa_supplicant ;;
  -i) main iwd ;;
  -n) sudo systemctl restart NetworkManager ;;
  *)
    echo "Usage: $(basename "$0") [option]"
    echo "  -d, Display current module options"
    echo "  -e, Edit module options"
    echo "  -w, Enable/start wpa_supplicant"
    echo "  -i, Enable/start iwd"
    echo "  -n, Only restart NetworkManager"
    exit 1
    ;;
esac

While the script does help with quickly getting back to working wireless most of the time, I’ve since switched to connecting my machines via Ethernet to a DD-WRT device running in repeater-bridge mode, which has given me a rock-solid connection to my wireless router. I’ll stick with that until I figure out the underlying reason for the microcode crashes.
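
To compare configurations, it helps to know how often the firmware actually fell over during a test run. A rough sketch, assuming the kernel log contains the same "Microcode SW error detected" and "Loaded firmware version" lines as the dump earlier. count_fw_crashes and fw_versions are hypothetical helper names:

```shell
#!/bin/bash
# Count firmware crashes in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_fw_crashes() {
  grep -c 'Microcode SW error detected' || true
}

# List the distinct firmware versions reported in the error dumps on stdin.
fw_versions() {
  sed -n 's/.*Loaded firmware version: //p' | sort -u
}
```

For example, `journalctl -k -b | count_fw_crashes` reports the number of crashes since boot, and `journalctl -k -b | fw_versions` shows which firmware was loaded when they happened.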

Daniel Robbins of the Gentoo/Funtoo projects suggests disabling RX aggregation (11n_disable=4), which does improve reliability but results in much slower transfer speeds. While reliability is important, especially during COVID-19, we can’t simply sweep performance under the rug, so the search for a proper solution continues.
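
For reference, 11n_disable is a bitmap rather than a plain switch. A minimal sketch of decoding it, assuming the bit meanings given in the iwlwifi parameter description shown by `modinfo iwlwifi` (1: full, 2: disable agg TX, 4: disable agg RX, 8: enable agg TX). decode_11n_disable is a made-up helper name:

```shell
#!/bin/bash
# Decode an 11n_disable value into the flags it sets.
# Bit meanings are assumed from the iwlwifi module parameter description.
decode_11n_disable() {
  local value=$1
  (( value == 0 )) && { echo "no flags set"; return; }
  (( value & 1 )) && echo "disable 11n entirely"
  (( value & 2 )) && echo "disable TX aggregation"
  (( value & 4 )) && echo "disable RX aggregation"
  (( value & 8 )) && echo "enable TX aggregation"
  return 0
}
```

With this, `decode_11n_disable 4` prints "disable RX aggregation", which is the setting suggested above.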