


- Migration Notice
- nvidia-container-runtime
- Usage example
- Installation
  - Ubuntu distributions
  - CentOS distributions
- Docker Engine setup
  - Systemd drop-in file
  - Daemon configuration file
  - Command line
- Environment variables (OCI spec)
  - NVIDIA_VISIBLE_DEVICES
  - NVIDIA_MIG_CONFIG_DEVICES
  - NVIDIA_MIG_MONITOR_DEVICES
  - NVIDIA_DRIVER_CAPABILITIES
  - NVIDIA_REQUIRE_*
  - NVIDIA_DISABLE_REQUIRE
  - NVIDIA_REQUIRE_CUDA
  - CUDA_VERSION
- Issues and Contributing

README.md


Migration Notice

NOTE: The source code for the nvidia-container-runtime binary has been moved to the nvidia-container-toolkit repository. It is now included in the nvidia-container-toolkit package, and the nvidia-container-runtime package defined in this repository is a meta-package that allows workflows that referred to this package directly to continue to function without modification.

nvidia-container-runtime


A modified version of runc adding a custom pre-start hook to all containers. If the environment variable NVIDIA_VISIBLE_DEVICES is set in the OCI spec, the hook will configure GPU access for the container by leveraging nvidia-container-cli from the libnvidia-container project.

Usage example


```shell
# Setup a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz

# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json

# Run the container
sudo nvidia-container-runtime run nvidia_smi
```

Installation

Ubuntu distributions

Install the nvidia-container-runtime package:

```shell
sudo apt-get install nvidia-container-runtime
```

CentOS distributions

Install the nvidia-container-runtime package:

```shell
sudo yum install nvidia-container-runtime
```

Docker Engine setup

Do not follow this section if you installed the nvidia-docker2 package; it already registers the runtime.

To register the nvidia runtime, use the method below that is best suited to your environment. You might need to merge the new argument with your existing configuration.

Systemd drop-in file
```shell
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
Daemon configuration file
```shell
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
```
You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:


```json
"default-runtime": "nvidia"
```
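If dockerd already has a daemon.json, the runtime entry has to be merged with the existing keys rather than overwriting the whole file. A minimal sketch of that merge in Python; the contents of `existing` here are hypothetical, and a real setup would read and rewrite /etc/docker/daemon.json:

```python
import json

# Hypothetical pre-existing daemon configuration (illustration only).
existing = {"log-driver": "json-file", "default-runtime": "runc"}

# Keys needed to register the nvidia runtime, mirroring the
# daemon.json fragment documented in this section.
nvidia = {
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": [],
        }
    },
    "default-runtime": "nvidia",  # optional, see the note above
}

# Shallow merge: nvidia keys win on conflict (e.g. default-runtime).
merged = {**existing, **nvidia}
print(json.dumps(merged, indent=4))
```

A shallow merge is enough here because "runtimes" is a single top-level key; if your existing configuration already defines other runtimes, merge the inner "runtimes" mapping as well.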
Command line
```shell
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
```

Environment variables (OCI spec)

Each environment variable maps to a command-line argument for nvidia-container-cli from libnvidia-container. These variables are already set in our official CUDA images.

NVIDIA_VISIBLE_DEVICES

This variable controls which GPUs will be made accessible inside the container.

Possible values

- 0,1,2, GPU-fef8089b …: a comma-separated list of GPU UUID(s) or index(es).
- all: all GPUs will be accessible; this is the default value in our container images.
- none: no GPU will be accessible, but driver capabilities will be enabled.
- void or empty or unset: nvidia-container-runtime will have the same behavior as runc.

Note: When running on a MIG capable device, the following values will also be available:

- 0:0,0:1,1:0, MIG-GPU-fef8089b/0/1 …: a comma-separated list of MIG Device UUID(s) or index(es).

Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:


```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```
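The device-selection rules above can be sketched as a small parser. `parse_visible_devices` is a hypothetical helper for illustration, not part of the runtime; the real interpretation happens inside nvidia-container-cli:

```python
def parse_visible_devices(value):
    """Interpret NVIDIA_VISIBLE_DEVICES per the rules above.

    Returns None when the runtime should behave exactly like runc,
    otherwise a list of device selectors: GPU indices or UUIDs, or
    MIG forms such as "0:1" or "MIG-GPU-<uuid>/<gi>/<ci>".
    """
    if value is None or value in ("", "void"):
        return None      # unset/empty/void: same behavior as runc
    if value == "all":
        return ["all"]   # every GPU is exposed (default in CUDA images)
    if value == "none":
        return []        # no GPUs, but driver capabilities still enabled
    # Comma-separated list of indices and/or UUIDs, MIG forms included.
    return [d.strip() for d in value.split(",") if d.strip()]

# Examples:
parse_visible_devices("0,1")                    # two GPUs by index
parse_visible_devices("0:0, 0:1")               # two MIG devices by index
parse_visible_devices("MIG-GPU-fef8089b/0/1")   # a MIG device by UUID path
```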

NVIDIA_MIG_CONFIG_DEVICES

This variable controls which of the visible GPUs can have their MIG configuration managed from within the container. This includes enabling and disabling MIG mode, creating and destroying GPU Instances and Compute Instances, etc.

Possible values

- all: Allow all MIG-capable GPUs in the visible device list to have their MIG configurations managed.

Note:

- This feature is only available on MIG capable devices (e.g. the A100).
- To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
- When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/config file on the host.

NVIDIA_MIG_MONITOR_DEVICES

This variable controls which of the visible GPUs can have aggregate information about all of their MIG devices monitored from within the container. This includes inspecting the aggregate memory usage, listing the aggregate running processes, etc.

Possible values

- all: Allow all MIG-capable GPUs in the visible device list to have their MIG devices monitored.

Note:

- This feature is only available on MIG capable devices (e.g. the A100).
- To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
- When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/monitor file on the host.

NVIDIA_DRIVER_CAPABILITIES

This option controls which driver libraries/binaries will be mounted inside the container.

Possible values

- compute,video,graphics,utility …: a comma-separated list of driver features the container needs.
- all: enable all available driver capabilities.
- empty or unset: use the default driver capabilities: utility,compute.

Supported driver capabilities

- compute: required for CUDA and OpenCL applications.
- compat32: required for running 32-bit applications.
- graphics: required for running OpenGL and Vulkan applications.
- utility: required for using nvidia-smi and NVML.
- video: required for using the Video Codec SDK.
- display: required for leveraging X11 display.
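The defaulting and expansion rules above can be sketched as follows. `resolve_capabilities` is a hypothetical helper; the set of known capabilities is taken from the list in this section:

```python
KNOWN_CAPABILITIES = {"compute", "compat32", "graphics", "utility", "video", "display"}

def resolve_capabilities(value):
    """Expand NVIDIA_DRIVER_CAPABILITIES per the rules above."""
    if value is None or value == "":
        return {"utility", "compute"}       # documented default
    if value == "all":
        return set(KNOWN_CAPABILITIES)      # every available capability
    caps = {c.strip() for c in value.split(",") if c.strip()}
    unknown = caps - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown driver capabilities: {sorted(unknown)}")
    return caps
```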

NVIDIA_REQUIRE_*

A logical expression to define constraints on the configurations supported by the container.

Supported constraints

- cuda: constraint on the CUDA driver version.
- driver: constraint on the driver version.
- arch: constraint on the compute architectures of the selected GPUs.
- brand: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).

Expressions

Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed. Multiple environment variables of the form NVIDIA_REQUIRE_* are ANDed together.
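The OR/AND semantics can be illustrated with a toy evaluator. `requirements_satisfied` and its `satisfies` predicate are hypothetical; the real constraint checks are performed by nvidia-container-cli:

```python
def requirements_satisfied(env, satisfies):
    """Evaluate NVIDIA_REQUIRE_* variables per the rules above.

    env       -- mapping of environment variable names to values
    satisfies -- predicate deciding whether a single constraint
                 (e.g. "cuda>=9.0" or "brand=tesla") holds
    """
    for name, expression in env.items():
        if not name.startswith("NVIDIA_REQUIRE_"):
            continue
        # Comma-separated terms are ANDed; within each term,
        # space-separated constraints are ORed.
        for term in expression.split(","):
            if not any(satisfies(c) for c in term.split() if c):
                return False  # the variables themselves are also ANDed
    return True

# Example: a driver that supports CUDA 9.0 on a Tesla-brand GPU.
supported = {"cuda>=8.0", "cuda>=9.0", "brand=tesla"}
env = {
    "NVIDIA_REQUIRE_CUDA": "cuda>=9.0",
    "NVIDIA_REQUIRE_BRAND": "brand=tesla brand=geforce",  # ORed pair
}
requirements_satisfied(env, supported.__contains__)
```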

NVIDIA_DISABLE_REQUIRE

Single switch to disable all the constraints of the form NVIDIA_REQUIRE_*.

NVIDIA_REQUIRE_CUDA

The version of the CUDA toolkit used by the container. It is an instance of the generic NVIDIA_REQUIRE_* case and it is set by official CUDA images. If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.

Possible values

- cuda>=7.5, cuda>=8.0, cuda>=9.0 …: any valid CUDA version in the form major.minor.


CUDA_VERSION

Similar to NVIDIA_REQUIRE_CUDA, for legacy CUDA images. In addition, if NVIDIA_REQUIRE_CUDA is not set, NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES will default to all.