[TriLUG] Start job for unit session-nnnnn.scope failed with 'failed'

Brian Henning via TriLUG trilug at trilug.org
Fri Nov 17 10:08:42 EST 2017


Folks,

tl;dr: Something weird happened on a CentOS box and I'd like some help understanding it.

Yesterday I couldn't get shell access to a CentOS 7.4 server because some sort of resource had been exhausted.  Systemd responded to every request with a cryptic "transaction is destructive" message (which apparently means it's been asked to do something that conflicts with a job already in progress?), so that ruled out all of the normal shutdown avenues.  Since I wasn't physically near the machine, I ended up having to learn a new trick: I used Webmin to execute "echo b > /proc/sysrq-trigger", which makes the kernel reboot the machine immediately.  Anyway...
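
For anyone who lands in the same spot: a bare "b" reboots without syncing or unmounting anything. A slightly gentler variant, as a minimal sketch only, sends "s" (sync) and "u" (remount read-only) first. The function name sysrq_reboot and its two arguments are made up for illustration; the only real interface used is the kernel's /proc/sysrq-trigger file, which as far as I know needs root to write to.

```shell
#!/bin/sh
# Sketch: sync, remount read-only, then reboot via magic SysRq.
# sysrq_reboot and its arguments are hypothetical names for illustration.
# The final 'b' reboots immediately, with no clean shutdown of services.
sysrq_reboot() {
    trigger=${1:-/proc/sysrq-trigger}   # kernel SysRq trigger file (default)
    delay=${2:-2}                       # seconds to pause between keys
    for key in s u b; do                # s = sync, u = remount ro, b = reboot
        printf '%s\n' "$key" > "$trigger" || return 1
        sleep "$delay"
    done
}

# Usage (as root, and only when you really mean it):
#   sysrq_reboot
```

The trigger path is a parameter only so the function can be tried harmlessly against a scratch file before pointing it at the real thing.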

Scores of errors started piling up in logs.

In /var/log/messages:
Nov 15 14:30:01 undecidedgames systemd: Failed to create cgroup /user.slice/user-495.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Created slice User Slice of amavis.
Nov 15 14:30:01 undecidedgames systemd: Starting User Slice of amavis.
Nov 15 14:30:01 undecidedgames systemd: Failed to create cgroup /user.slice/user-495.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Failed to start Session 33336 of user amavis.
Nov 15 14:30:01 undecidedgames systemd: Failed to create cgroup /user.slice/user-41.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Created slice User Slice of mailman.
Nov 15 14:30:01 undecidedgames systemd: Starting User Slice of mailman.
Nov 15 14:30:01 undecidedgames systemd: Failed to create cgroup /user.slice/user-495.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Failed to realize cgroups for queued unit user-495.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Failed to create cgroup /user.slice/user-41.slice: No such file or directory
Nov 15 14:30:01 undecidedgames systemd: Failed to start Session 33335 of user mailman.
...etc.

In /var/log/secure:
Nov 15 14:30:01 undecidedgames crond[10474]: pam_systemd(crond:session): Failed to create session: Start job for unit session-33336.scope failed with 'failed'
Nov 15 14:30:01 undecidedgames crond[10473]: pam_systemd(crond:session): Failed to create session: Start job for unit session-33335.scope failed with 'failed'
Nov 15 14:30:01 undecidedgames crond[10475]: pam_systemd(crond:session): Failed to create session: Start job for unit session-33334.scope failed with 'failed'
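
To get a feel for how widespread these failures were, something like the following could tally them per PAM service. This is a rough sketch that assumes the message format shown in the excerpts here; count_failed_sessions is a hypothetical helper name, and /var/log/secure is just the default path on this box.

```shell
#!/bin/sh
# Sketch: count "Failed to create session" messages per PAM service.
# count_failed_sessions is a hypothetical helper; it assumes log lines of the
# form "... pam_systemd(SERVICE:session): Failed to create session: ...".
count_failed_sessions() {
    log_file=${1:-/var/log/secure}
    awk '/pam_systemd\(.*\): Failed to create session/ {
        # Locate "pam_systemd(SERVICE:" and cut out SERVICE.
        match($0, /pam_systemd\([^:]*:/)
        svc = substr($0, RSTART + 12, RLENGTH - 13)
        n[svc]++
    }
    END { for (s in n) print s, n[s] }' "$log_file" | sort
}

# Usage:
#   count_failed_sessions /var/log/secure
```
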

...and when I tried to log in:

Nov 16 09:23:39 undecidedgames sshd[31804]: Accepted publickey for brian from <redacted> port 54281 ssh2: RSA SHA256:<redacted>
Nov 16 09:23:39 undecidedgames sshd[31804]: pam_systemd(sshd:session): Failed to create session: Start job for unit session-33975.scope failed with 'failed'
Nov 16 09:23:39 undecidedgames sshd[31804]: pam_unix(sshd:session): session opened for user brian by (uid=0)
Nov 16 09:23:39 undecidedgames sshd[31804]: error: openpty: No such file or directory
Nov 16 09:23:39 undecidedgames sshd[31806]: error: session_pty_req: session 0 alloc failed
Nov 16 09:23:42 undecidedgames sshd[31804]: pam_unix(sshd:session): session closed for user brian

On the client side, I'd get a message along the lines of "failed to allocate pty" or some such, and wind up with an SSH connection with no shell running.
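
Both the "Failed to create cgroup ... No such file or directory" and the "openpty: No such file or directory" errors involve paths that normally live on virtual filesystems, so one thing that might be worth checking if this recurs (a guess on my part, not a diagnosis) is whether those filesystems are still mounted. A minimal sketch follows; has_mount is a hypothetical helper, and the mount points in the usage comment are what I'd assume for a stock CentOS 7 install.

```shell
#!/bin/sh
# Sketch: test whether a filesystem of a given type is mounted at a given path.
# has_mount is a hypothetical helper; it reads /proc/mounts by default, where
# field 2 is the mount point and field 3 is the filesystem type.
has_mount() {
    fstype=$1
    mountpoint=$2
    mounts_file=${3:-/proc/mounts}
    awk -v t="$fstype" -v m="$mountpoint" '
        $3 == t && $2 == m { found = 1 }
        END { exit !found }             # exit 0 if found, 1 otherwise
    ' "$mounts_file"
}

# Usage (mount points assumed, adjust for your system):
#   has_mount devpts /dev/pts               || echo "devpts is not mounted"
#   has_mount cgroup /sys/fs/cgroup/systemd || echo "systemd cgroup is not mounted"
```
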

The only other evidence I have is that I spun up an instance of ZoneMinder in a chroot jail shortly before the problem manifested, and ZoneMinder causes Apache to do a flurry of setuid(0) operations at startup.  The first examples of the "Failed to create session:" messages in /var/log/secure were immediately preceded by a flurry of 15 su:sessions for Apache, which is why I suspect ZM as the culprit:

Nov 15 14:24:21 undecidedgames sshd[5061]: pam_unix(sshd:session): session opened for user brian by (uid=0)
Nov 15 14:25:06 undecidedgames sudo:   brian : TTY=pts/1 ; PWD=/home/brian ; USER=root ; COMMAND=/vault/zone_jail/startup.sh
Nov 15 14:25:10 undecidedgames su: pam_unix(su:session): session opened for user apache by (uid=0)
Nov 15 14:25:10 undecidedgames su: pam_unix(su:session): session closed for user apache
                [(su:session) messages repeat 13 times] ...
Nov 15 14:25:14 undecidedgames su: pam_unix(su:session): session opened for user apache by (uid=0)
Nov 15 14:25:14 undecidedgames su: pam_unix(su:session): session closed for user apache
Nov 15 14:30:01 undecidedgames crond[10474]: pam_systemd(crond:session): Failed to create session: Start job for unit session-33336.scope failed with 'failed'
Nov 15 14:30:01 undecidedgames crond[10473]: pam_systemd(crond:session): Failed to create session: Start job for unit session-33335.scope failed with 'failed'

This could, of course, be a red herring; after rebooting the machine, I was able to start up ZoneMinder and the problem didn't come back (although I've not left ZM running out of fear...).

So at this point I basically have no earthly idea what actually happened.  Has anyone else seen this before, and does anyone know a bit about it?

Cheers,
-Brian

