kubejobs package

Submodules

kubejobs.jobs module

class kubejobs.jobs.GPU_PRODUCT[source]

Bases: object

NVIDIA_A100_SXM4_40GB = 'NVIDIA-A100-SXM4-40GB'
NVIDIA_A100_SXM4_40GB_MIG_1G_5GB = 'NVIDIA-A100-SXM4-40GB-MIG-1g.5gb'
NVIDIA_A100_SXM4_40GB_MIG_3G_20GB = 'NVIDIA-A100-SXM4-40GB-MIG-3g.20gb'
NVIDIA_A100_SXM4_80GB = 'NVIDIA-A100-SXM4-80GB'
NVIDIA_H100_80GB = 'NVIDIA-H100-80GB-HBM3'
class kubejobs.jobs.KubernetesJob(name: str, image: str, kueue_queue_name: str, command: List[str] | None = None, args: List[str] | None = None, cpu_request: str | None = None, ram_request: str | None = None, storage_request: str | None = None, gpu_type: str | None = None, gpu_product: str | None = None, gpu_limit: int | None = None, backoff_limit: int = 0, restart_policy: str = 'Never', shm_size: str | None = None, secret_env_vars: dict | None = None, env_vars: dict | None = None, volume_mounts: dict | None = None, job_deadlineseconds: int | None = None, privileged_security_context: bool = False, user_name: str | None = None, user_email: str | None = None, labels: dict | None = None, annotations: dict | None = None, namespace: str | None = None, image_pull_secret: str | None = None)[source]

Bases: object

A class for generating Kubernetes Job YAML configurations.

name

Name of the job and associated resources.

Type:

str

image

Container image to use for the job.

Type:

str

command

Command to execute in the container. Defaults to None.

Type:

List[str], optional

args

Arguments for the command. Defaults to None.

Type:

List[str], optional

cpu_request

Amount of CPU to request. For example, “500m” for half a CPU. Defaults to None. The maximum is 192 CPUs.

Type:

str, optional

ram_request

Amount of RAM to request. For example, “1Gi” for 1 gibibyte. Defaults to None. The maximum is 890 GB.

Type:

str, optional

storage_request

Amount of storage to request. For example, “10Gi” for 10 gibibytes. Defaults to None.

Type:

str, optional

gpu_type

Type of GPU resource, e.g. “nvidia.com/gpu”. Defaults to None.

Type:

str, optional

gpu_product

GPU product, e.g. “NVIDIA-A100-SXM4-80GB”. Defaults to None. Possible choices:

  • NVIDIA-A100-SXM4-80GB – a full non-MIG 80GB GPU, total available 32

  • NVIDIA-A100-SXM4-40GB – a full non-MIG 40GB GPU, total available 88

  • NVIDIA-A100-SXM4-40GB-MIG-3g.20gb – just under half of a GPU

  • NVIDIA-A100-SXM4-40GB-MIG-1g.5gb – a seventh of a GPU

Type:

str, optional

gpu_limit

Number of GPU resources to allocate. Defaults to None.

Type:

int, optional

backoff_limit

Maximum number of retries before marking the job as failed. Defaults to 0.

Type:

int, optional

restart_policy

Restart policy for the job, default is “Never”.

Type:

str, optional

shm_size

Size of shared memory, e.g. “2Gi”. Defaults to None.

Type:

str, optional

secret_env_vars

Dictionary of secret environment variables. Defaults to None.

Type:

dict, optional

env_vars

Dictionary of normal (non-secret) environment variables. Defaults to None.

Type:

dict, optional

volume_mounts

Dictionary of volume mounts. Defaults to None.

Type:

dict, optional

namespace

Namespace of the job. Defaults to None.

Type:

str, optional

generate_yaml() → dict[source]

Generate the Kubernetes Job YAML configuration.

classmethod from_command_line()[source]

Create a KubernetesJob instance from command-line arguments and run the job. Example: python kubejobs/jobs.py --image=nvcr.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 --gpu_type=nvidia.com/gpu --gpu_limit=1 --backoff_limit=4 --gpu_product=NVIDIA-A100-SXM4-40GB

run()[source]
class kubejobs.jobs.KueueQueue[source]

Bases: object

INFORMATICS = 'informatics-user-queue'
kubejobs.jobs.create_jobs_for_experiments(commands: List[str], *args, **kwargs)[source]

Creates and runs a Kubernetes Job for each command in the given list of commands.

Parameters:
  • commands – A list of strings, where each string represents a command to be executed.

  • args – Positional arguments to be passed to the KubernetesJob constructor.

  • kwargs – Keyword arguments to be passed to the KubernetesJob constructor.

Example:

from kubejobs.jobs import create_jobs_for_experiments

commands = [
    "python experiment.py --param1 value1",
    "python experiment.py --param1 value2",
    "python experiment.py --param1 value3"
]

create_jobs_for_experiments(
    commands,
    image="nvcr.io/nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04",
    gpu_type="nvidia.com/gpu",
    gpu_limit=1,
    backoff_limit=4
)
kubejobs.jobs.create_pv(pv_name: str, storage: str, storage_class_name: str, access_modes: list, pv_type: str, namespace: str = 'default', claim_name: str | None = None, local_path: str | None = None, fs_type: str = 'ext4')[source]

Create a PersistentVolume in the specified namespace with the specified type.

Parameters:
  • pv_name – The name of the PersistentVolume.

  • storage – The amount of storage for the PersistentVolume (e.g., “1500Gi”).

  • storage_class_name – The storage class name for the PersistentVolume.

  • access_modes – A list of access modes for the PersistentVolume.

  • pv_type – The type of PersistentVolume, either ‘local’ or ‘gcePersistentDisk’.

  • namespace – The namespace in which to create the PersistentVolume. Defaults to “default”.

  • claim_name – The name of the PersistentVolumeClaim to bind to the PersistentVolume.

  • local_path – The path on the host for a local PersistentVolume. Required if pv_type is ‘local’.

  • fs_type – The filesystem type for the PersistentVolume. Defaults to “ext4”.

Example usage:

create_pv("pv-instafluencer-data", "1500Gi", "sc-instafluencer-data", ["ReadOnlyMany"], "local",
          claim_name="pvc-instafluencer-data", local_path="/mnt/data")
# This will create a local PersistentVolume named "pv-instafluencer-data" with 1500Gi of storage,
# "sc-instafluencer-data" storage class, ReadOnlyMany access mode, and a local path "/mnt/data".
kubejobs.jobs.create_pvc(pvc_name: str, storage: str, access_modes: list | None = None)[source]
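No docstring is published for create_pvc, but by analogy with create_pv it creates a PersistentVolumeClaim from the given name, storage size, and access modes. A hedged sketch; the claim name and access mode are assumptions:

```python
from kubejobs.jobs import create_pvc

# Request a 10Gi claim. "pvc-example-data" and ReadWriteOnce are illustrative;
# access_modes defaults to None, in which case the library picks its own default.
create_pvc("pvc-example-data", "10Gi", access_modes=["ReadWriteOnce"])
```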
kubejobs.jobs.fetch_user_info()[source]

Module contents