
CRI-O: installing default AppArmor profile "crio-default" failed #10783

Closed
ledroide opened this issue Jan 9, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ledroide
Contributor

ledroide commented Jan 9, 2024

Summary: the default AppArmor profile "crio-default" does not exist or cannot be loaded, so crio fails to start

Environment

  • OS: Ubuntu Cloud 23.10 Minimal
  • Ansible: 2.16.1
  • Python: 3.10.12
  • Kubespray version (commit): 8c09c3f
  • Network plugin: cilium
  • Container runtime and engine: cri-o + crun
  • Playbook: cluster.yml

playbook error output

TASK [kubernetes_sigs.kubespray.container-engine/cri-o : Cri-o | trigger service restart only when needed] *********************************************************************************
fatal: [k8ststworker-1]: FAILED! => {"changed": false, "msg": "Unable to restart service crio: Job for crio.service failed because the control process exited with error code.\nSee \"systemctl status crio.service\" and \"journalctl -xeu crio.service\" for details.\n"}

journalctl error output

time="2024-01-09 10:46:34.810337246Z" level=info msg="Installing default AppArmor profile: crio-default"
time="2024-01-09 10:46:34.816522257Z" level=fatal msg="validating runtime config: unable to load AppArmor profile: installing default AppArmor profile \"crio-default\" failed"
crio.service: Main process exited, code=exited, status=1/FAILURE
crio.service: Failed with result 'exit-code'.
Failed to start crio.service - Container Runtime Interface for OCI (CRI-O).

what we have tried

unsuccessfully

  • load manually
  • restart apparmor
  • reboot servers

successfully

  • adding the option apparmor_profile = "unconfined" in roles/container-engine/cri-o/templates/crio.conf.j2 -> see my fork and commit for this ugly workaround
  • however, disabling AppArmor confinement for crio only makes it run; it is a workaround, not a solution
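The crio.conf change behind the workaround amounts to a one-line edit. The sketch below demonstrates it on a scratch copy of the file; on a real node the file is /etc/crio/crio.conf (or the Kubespray template), and crio must be restarted afterwards:

```shell
# Sketch of the workaround: switch apparmor_profile to "unconfined".
# Done on a scratch copy here; on a node, edit /etc/crio/crio.conf
# and then run: systemctl restart crio
conf=$(mktemp)
printf '%s\n' '# apparmor_profile = "crio-default"' > "$conf"
# Uncomment the option (if commented) and set it to "unconfined".
sed -i 's|^#\{0,1\} *apparmor_profile *=.*|apparmor_profile = "unconfined"|' "$conf"
cat "$conf"   # -> apparmor_profile = "unconfined"
```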

full journalctl log

root@k8ststworker-1:~# journalctl -xeu crio -o cat
crio.service: Scheduled restart job, restart counter is at 72.
Stopped crio.service - Container Runtime Interface for OCI (CRI-O).
Starting crio.service - Container Runtime Interface for OCI (CRI-O)...
time="2024-01-09 10:46:34.754722487Z" level=info msg="Starting CRI-O, version: 1.28.1, git: eda470f7f503d9f40a9aa2a02e45f0878ed6fc61(dirty)"
time="2024-01-09 10:46:34.762728035Z" level=info msg="Node configuration value for hugetlb cgroup is true"
time="2024-01-09 10:46:34.762783723Z" level=info msg="Node configuration value for pid cgroup is true"
time="2024-01-09 10:46:34.762873379Z" level=info msg="Node configuration value for memoryswap cgroup is true"
time="2024-01-09 10:46:34.762886293Z" level=info msg="Node configuration value for cgroup v2 is true"
time="2024-01-09 10:46:34.775572788Z" level=info msg="Node configuration value for systemd CollectMode is true"
time="2024-01-09 10:46:34.790946523Z" level=info msg="Node configuration value for systemd AllowedCPUs is true"
time="2024-01-09 10:46:34.791523156Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR>
time="2024-01-09 10:46:34.794661692Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_NET_RAW, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CA>
time="2024-01-09 10:46:34.809227417Z" level=error msg="Getting OCI runtime features failed: exit status 1"
time="2024-01-09 10:46:34.810225311Z" level=info msg="Checkpoint/restore support disabled"
time="2024-01-09 10:46:34.810286302Z" level=info msg="Using seccomp default profile when unspecified: true"
time="2024-01-09 10:46:34.810312896Z" level=info msg="Using the internal default seccomp profile"
time="2024-01-09 10:46:34.810337246Z" level=info msg="Installing default AppArmor profile: crio-default"
time="2024-01-09 10:46:34.816522257Z" level=fatal msg="validating runtime config: unable to load AppArmor profile: installing default AppArmor profile \"crio-default\" failed"
crio.service: Main process exited, code=exited, status=1/FAILURE
crio.service: Failed with result 'exit-code'.
Failed to start crio.service - Container Runtime Interface for OCI (CRI-O).

apparmor status

root@k8ststworker-1:~# apparmor_status
apparmor module is loaded.
67 profiles are loaded.
10 profiles are in enforce mode.
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/NetworkManager/nm-dhcp-helper
   /usr/lib/connman/scripts/dhclient-script
   /usr/lib/snapd/snap-confine
   /usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   /{,usr/}sbin/dhclient
   lsb_release
   nvidia_modprobe
   nvidia_modprobe//kmod
   rsyslogd
0 profiles are in complain mode.
0 profiles are in prompt mode.
0 profiles are in kill mode.
57 profiles are in unconfined mode.
   /bin/toybox
   /opt/brave.com/brave/brave
   /opt/google/chrome/chrome
   /opt/microsoft/msedge/msedge
   /opt/vivaldi/vivaldi-bin
   /usr/bin/buildah
   /usr/bin/busybox
   /usr/bin/cam
   /usr/bin/ch-checkns
   /usr/bin/ch-run
   /usr/bin/crun
   /usr/bin/flatpak
   /usr/bin/ipa_verify
   /usr/bin/lc-compliance
   /usr/bin/libcamerify
   /usr/bin/lxc-attach
   /usr/bin/lxc-create
   /usr/bin/lxc-destroy
   /usr/bin/lxc-execute
   /usr/bin/lxc-stop
   /usr/bin/lxc-unshare
   /usr/bin/lxc-usernsexec
   /usr/bin/mmdebstrap
   /usr/bin/podman
   /usr/bin/qcam
   /usr/bin/rootlesskit
   /usr/bin/rpm
   /usr/bin/sbuild
   /usr/bin/sbuild-abort
   /usr/bin/sbuild-apt
   /usr/bin/sbuild-checkpackages
   /usr/bin/sbuild-clean
   /usr/bin/sbuild-createchroot
   /usr/bin/sbuild-distupgrade
   /usr/bin/sbuild-hold
   /usr/bin/sbuild-shell
   /usr/bin/sbuild-unhold
   /usr/bin/sbuild-update
   /usr/bin/sbuild-upgrade
   /usr/bin/slirp4netns
   /usr/bin/stress-ng
   /usr/bin/thunderbird
   /usr/bin/trinity
   /usr/bin/tup
   /usr/bin/userbindmount
   /usr/bin/uwsgi-core
   /usr/bin/vdens
   /usr/bin/vpnns
   /usr/lib/*-linux-gnu*/opera/opera
   /usr/lib/*-linux-gnu*/qt5/libexec/QtWebEngineProcess
   /usr/lib/qt6/libexec/QtWebEngineProcess
   /usr/libexec/*-linux-gnu*/bazel/linux-sandbox
   /usr/libexec/virtiofsd
   /usr/sbin/runc
   /usr/sbin/sbuild-adduser
   /usr/sbin/sbuild-destroychroot
   /usr/share/code/bin/code
1 processes have profiles defined.
1 processes are in enforce mode.
   /usr/sbin/rsyslogd (398) rsyslogd
0 processes are in complain mode.
0 processes are in prompt mode.
0 processes are in kill mode.
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.
root@k8ststworker-1:~# apparmor_status | grep cri
   /usr/lib/connman/scripts/dhclient-script
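The grep above shows no crio profile among the loaded ones. Another way to confirm the profile never reached the kernel is to check /sys/kernel/security/apparmor/profiles, which lists every loaded profile. The sketch below runs that check against a scratch copy of such a listing, so it is illustrative rather than node-specific:

```shell
# Check whether a crio profile is actually loaded in the kernel.
# On a real node: grep '^crio' /sys/kernel/security/apparmor/profiles
# Here we run the same check against a scratch copy of that listing.
profiles=$(mktemp)
printf '%s\n' 'rsyslogd (enforce)' 'crun (unconfined)' > "$profiles"
grep -q '^crio' "$profiles" && echo "crio profile loaded" || echo "no crio profile loaded"
```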

crio.conf ugly patch

Since there is no variable that sets this option, we had to patch the template.

diff --git a/roles/container-engine/cri-o/templates/crio.conf.j2 b/roles/container-engine/cri-o/templates/crio.conf.j2
index 81d5a421e..5e1a44634 100644
--- a/roles/container-engine/cri-o/templates/crio.conf.j2
+++ b/roles/container-engine/cri-o/templates/crio.conf.j2
@@ -146,7 +146,7 @@ seccomp_profile = "{{ crio_seccomp_profile }}"
 # does not specify a profile via the Kubernetes Pod's metadata annotation. If
 # the profile is set to "unconfined", then this equals to disabling AppArmor.
 # This option supports live configuration reload.
-# apparmor_profile = "crio-default"
+apparmor_profile = "unconfined"

 # Cgroup management implementation used for the runtime.
 cgroup_manager = "{{ crio_cgroup_manager }}"
@ledroide ledroide added the kind/bug Categorizes issue or PR as related to a bug. label Jan 9, 2024
@tvorogme

I have the same error.

hswong3i added a commit to alvistack/ansible-role-containers_common that referenced this issue Apr 20, 2024
Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>
hswong3i added a commit to alvistack/ansible-role-cri_o that referenced this issue Apr 20, 2024
Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2024
@ledroide
Contributor Author

ledroide commented Jul 2, 2024

/remove-lifecycle

@tvorogme

tvorogme commented Jul 2, 2024

BTW, here is a quick solution for those who have this error:

  1. Create a custom AppArmor profile for CRI-O with
#include <tunables/global>

profile crio-custom flags=(attach_disconnected, mediate_deleted) {
    #include <abstractions/base>
    capability,
    network,
    file,
    umount,
}

in /etc/apparmor.d/crio-custom

  2. Load the profile into AppArmor: apparmor_parser -r /etc/apparmor.d/crio-custom

  3. Then use this profile in CRI-O (/etc/crio/crio.conf): apparmor_profile = "crio-custom"
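The steps above can be collected into one script. This is a sketch: the profile is written to a scratch directory here, and the privileged load/restart commands are left as comments to run as root on a real node:

```shell
# Write the custom AppArmor profile (scratch dir here; /etc/apparmor.d on a node).
dir=$(mktemp -d)
cat > "$dir/crio-custom" <<'EOF'
#include <tunables/global>

profile crio-custom flags=(attach_disconnected, mediate_deleted) {
    #include <abstractions/base>
    capability,
    network,
    file,
    umount,
}
EOF
# As root on the node:
#   apparmor_parser -r /etc/apparmor.d/crio-custom   # load/replace the profile
#   (set apparmor_profile = "crio-custom" in /etc/crio/crio.conf)
#   systemctl restart crio
grep -c 'profile crio-custom' "$dir/crio-custom"   # -> 1
```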

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 1, 2024
@ledroide
Contributor Author

ledroide commented Aug 9, 2024

After upgrading our servers from Ubuntu 23.10 to 24.04, the apparmor profile for cri-o now works fine.

I have reverted crio.conf to the original configuration, and crio.service now starts with it:

apparmor_profile = "crio-default"

I can no longer reproduce the issue, and it appears that few users were affected by it. I am closing the issue now. Feel free to re-open if you are able to reproduce it.

@ledroide ledroide closed this as completed Aug 9, 2024
Development

No branches or pull requests

4 participants