In light of the CrowdStrike-Microsoft outage/disaster that has been wreaking havoc on corporate Windows systems around the world since Friday, systemd lead developer Lennart Poettering pointed out how such a situation on Linux systems could be averted by leveraging systemd’s Automatic Boot Assessment functionality.

systemd’s Automatic Boot Assessment feature can automatically revert to a previous version of the OS or kernel when a system consistently fails to boot. Combined with the systemd-boot bootloader, related systemd tooling, and the Boot Loader Specification, it would make recovery much easier in an incident like the one that hit Microsoft Windows systems running CrowdStrike software last week.
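
To make the mechanism concrete, here is a minimal Python sketch of the boot-counting idea behind Automatic Boot Assessment. It assumes the counter convention from the Boot Loader Specification, where an entry file name carries `+<tries-left>[-<tries-done>]` before its extension. The `Entry` class, the helper names, and the `new-kernel`/`old-kernel` file names are hypothetical, and the entry selection is deliberately simplified compared with what systemd-boot really does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches names such as "new-kernel+3.conf" or "new-kernel+1-2.conf".
_COUNTER = re.compile(
    r"^(?P<stem>.+?)\+(?P<left>\d+)(?:-(?P<done>\d+))?(?P<ext>\.[^.]+)$"
)


@dataclass(frozen=True)
class Entry:
    """Hypothetical model of one boot entry and its boot counter."""
    stem: str
    ext: str
    tries_left: Optional[int]  # None means no counter: entry is known good
    tries_done: int = 0

    @property
    def filename(self) -> str:
        if self.tries_left is None:
            return f"{self.stem}{self.ext}"
        done = f"-{self.tries_done}" if self.tries_done else ""
        return f"{self.stem}+{self.tries_left}{done}{self.ext}"

    @property
    def is_bad(self) -> bool:
        # An entry whose tries are used up is treated as bad.
        return self.tries_left == 0


def parse(filename: str) -> Entry:
    m = _COUNTER.match(filename)
    if m:
        return Entry(m["stem"], m["ext"], int(m["left"]), int(m["done"] or 0))
    stem, dot, ext = filename.rpartition(".")
    return Entry(stem, dot + ext, None)


def attempt_boot(entry: Entry) -> Entry:
    """Counter update made before control is handed to the OS."""
    if entry.tries_left is None or entry.tries_left == 0:
        return entry
    return Entry(entry.stem, entry.ext, entry.tries_left - 1, entry.tries_done + 1)


def bless(entry: Entry) -> Entry:
    """A boot that was assessed as good drops the counter entirely."""
    return Entry(entry.stem, entry.ext, None)


def pick(entries: List[Entry]) -> Entry:
    """Simplified selection: the first entry in the list that is not bad."""
    for entry in entries:
        if not entry.is_bad:
            return entry
    return entries[0]


# A freshly installed kernel gets three tries; the old one is known good.
new, old = parse("new-kernel+3.conf"), parse("old-kernel.conf")
names = []
for _ in range(3):  # three boots in a row that never reach "good"
    new = attempt_boot(new)
    names.append(new.filename)
fallback = pick([new, old])
```

After three failed attempts the new entry is named `new-kernel+0-3.conf`, counts as bad, and `pick` falls back to `old-kernel.conf`. Had any of those boots succeeded, `bless` would have removed the counter and the new entry would have stayed the default.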

  • Telorand@reddthat.com
    4 months ago

    As someone else pointed out, even on an immutable system where you can swap out the system layer, the update would have likely been somewhere in the mutable /var directory in the userspace, since it was some kind of definition update.

    I believe SteamOS uses ABA to ensure continuous operation in the case of a bad update, but an image rollback would only work if you could include the offending file/directory for anything that’s not in the system layer.

    • Morphit@feddit.uk
      4 months ago

      I think having an A partition and a B partition (I’m assuming that’s how SteamOS works) wouldn’t help in this case. If the A partition downloaded the definition file, crashed and failed to reboot, the bootloader could fail over to the B partition, which would then download the same definition file, crash and fail to reboot. It would have to keep rolling back to a last known good snapshot until the update got withdrawn.

      You could have an ephemeral setup that wipes /var and /etc and recreates them every boot. I don’t think these EDR tools would like that very much though.

      • Telorand@reddthat.com
        4 months ago

        You could potentially block your network by disabling your router or something, so it couldn’t download the bad update, but you’d have to know that was a step to prevent it (which most people didn’t until it was too late).

        OSTree-based systems are handy for replacing the system layer, but configs live (mostly) in userspace, and they persist.

        • Morphit@feddit.uk
          4 months ago

          Well at that point, just don’t install any kernel mode EDR software at all.

          NixOS can be set up for impermanence, where all config is recreated every boot and nothing persists besides the nix store. There are helpers for an ephemeral home too, so you can have something like Tails. I’m sure you could do that with other distros, but you’d need absolute discipline to have everything the machine needs provisioned at boot.