kubejobs package
Submodules
kubejobs.jobs module
- class kubejobs.jobs.GPU_PRODUCT
Bases: object
- NVIDIA_A100_SXM4_40GB = 'NVIDIA-A100-SXM4-40GB'
- NVIDIA_A100_SXM4_40GB_MIG_1G_5GB = 'NVIDIA-A100-SXM4-40GB-MIG-1g.5gb'
- NVIDIA_A100_SXM4_40GB_MIG_3G_20GB = 'NVIDIA-A100-SXM4-40GB-MIG-3g.20gb'
- NVIDIA_A100_SXM4_80GB = 'NVIDIA-A100-SXM4-80GB'
- NVIDIA_H100_80GB = 'NVIDIA-H100-80GB-HBM3'
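These values match the GPU product names that NVIDIA GPU Feature Discovery publishes as node labels. The sketch below assumes the cluster labels its nodes with the nvidia.com/gpu.product key and shows how one of these constants could become a pod nodeSelector. gpu_node_selector and known_gpu_products are hypothetical helpers, not part of kubejobs. The class is a local copy of the constants listed above, so the snippet runs without kubejobs installed.

```python
# Hypothetical helpers, not part of kubejobs. The label key is the one used by
# NVIDIA GPU Feature Discovery; check your cluster's node labels before relying on it.
GPU_PRODUCT_LABEL = "nvidia.com/gpu.product"


class GPU_PRODUCT:
    """Local copy of the documented kubejobs.jobs.GPU_PRODUCT constants."""

    NVIDIA_A100_SXM4_40GB = "NVIDIA-A100-SXM4-40GB"
    NVIDIA_A100_SXM4_40GB_MIG_1G_5GB = "NVIDIA-A100-SXM4-40GB-MIG-1g.5gb"
    NVIDIA_A100_SXM4_40GB_MIG_3G_20GB = "NVIDIA-A100-SXM4-40GB-MIG-3g.20gb"
    NVIDIA_A100_SXM4_80GB = "NVIDIA-A100-SXM4-80GB"
    NVIDIA_H100_80GB = "NVIDIA-H100-80GB-HBM3"


def known_gpu_products() -> set:
    """Collect every upper-case string constant defined on GPU_PRODUCT."""
    return {
        value
        for name, value in vars(GPU_PRODUCT).items()
        if name.isupper() and isinstance(value, str)
    }


def gpu_node_selector(product: str) -> dict:
    """Build a nodeSelector that pins a pod to nodes carrying the given GPU product."""
    if product not in known_gpu_products():
        raise ValueError(f"unknown GPU product: {product!r}")
    return {GPU_PRODUCT_LABEL: product}


print(gpu_node_selector(GPU_PRODUCT.NVIDIA_A100_SXM4_80GB))
```

Validating against the known constants catches typos such as a wrong MIG profile suffix before a job is submitted and left unschedulable.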
- class kubejobs.jobs.KubernetesJob(name: str, image: str, kueue_queue_name: str, command: List[str] | None = None, args: List[str] | None = None, cpu_request: str | None = None, ram_request: str | None = None, storage_request: str | None = None, gpu_type: str | None = None, gpu_product: str | None = None, gpu_limit: int | None = None, backoff_limit: int = 0, restart_policy: str = 'Never', shm_size: str | None = None, secret_env_vars: dict | None = None, env_vars: dict | None = None, volume_mounts: dict | None = None, job_deadlineseconds: int | None = None, privileged_security_context: bool = False, user_name: str | None = None, user_email: str | None = None, labels: dict | None = None, annotations: dict | None = None, namespace: str | None = None, image_pull_secret: str | None = None)
Bases: object
A class for generating Kubernetes Job YAML configurations.
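The following minimal sketch gives a rough picture of what such a configuration contains: a batch/v1 Job built as a Python dict. It is not the kubejobs implementation. job_manifest is a hypothetical function, and it assumes that the queue is selected through Kueue's kueue.x-k8s.io/queue-name label and that shm_size is realised as a memory-backed emptyDir mounted at /dev/shm, which is a common pattern. Printing with json.dumps avoids a PyYAML dependency, and JSON output is also valid YAML.

```python
import json
from typing import List, Optional

# Hypothetical sketch, not the kubejobs implementation.
# Kueue reads this label to decide which LocalQueue a Job belongs to.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def job_manifest(
    name: str,
    image: str,
    kueue_queue_name: str,
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
    backoff_limit: int = 0,
    restart_policy: str = "Never",
    shm_size: Optional[str] = None,
) -> dict:
    container = {"name": name, "image": image}
    if command is not None:
        container["command"] = command
    if args is not None:
        container["args"] = args

    pod_spec = {"containers": [container], "restartPolicy": restart_policy}
    if shm_size is not None:
        # Memory-backed emptyDir mounted over /dev/shm, capped at shm_size.
        pod_spec["volumes"] = [
            {"name": "dshm", "emptyDir": {"medium": "Memory", "sizeLimit": shm_size}}
        ]
        container["volumeMounts"] = [{"name": "dshm", "mountPath": "/dev/shm"}]

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {KUEUE_QUEUE_LABEL: kueue_queue_name}},
        "spec": {
            # Created suspended; Kueue unsuspends the Job once it is admitted.
            "suspend": True,
            "backoffLimit": backoff_limit,
            "template": {"spec": pod_spec},
        },
    }


manifest = job_manifest(
    "demo-job",
    "python:3.11",
    "my-queue",  # hypothetical queue name
    command=["python", "-c", "print('hi')"],
    shm_size="2Gi",
)
print(json.dumps(manifest, indent=2))
```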
- name
Name of the job and associated resources.
- Type:
str
- image
Container image to use for the job.
- Type:
str
- command
Command to execute in the container. Defaults to None.
- Type:
List[str], optional
- args
Arguments for the command. Defaults to None.
- Type:
List[str], optional
- cpu_request
Amount of CPU to request. For example, “500m” for half a CPU. Defaults to None. The maximum is 192 CPUs.
- Type:
str, optional
- ram_request
Amount of RAM to request. For example, “1Gi” for 1 gibibyte. Defaults to None. The maximum is 890 GB.
- Type:
str, optional
- storage_request
Amount of storage to request. For example, “10Gi” for 10 gibibytes. Defaults to None.
- Type:
str, optional
- gpu_type
Type of GPU resource, e.g. “nvidia.com/gpu”. Defaults to None.
- Type:
str, optional
- gpu_product
GPU product, e.g. “NVIDIA-A100-SXM4-80GB”. Defaults to None. Possible choices:
NVIDIA-A100-SXM4-80GB – a full non-MIG 80GB GPU; 32 available in total
NVIDIA-A100-SXM4-40GB – a full non-MIG 40GB GPU; 88 available in total
NVIDIA-A100-SXM4-40GB-MIG-3g.20gb – a MIG slice of just under half a GPU
NVIDIA-A100-SXM4-40GB-MIG-1g.5gb – a MIG slice of one seventh of a GPU
- Type:
str, optional
- gpu_limit
Number of GPU resources to allocate. Defaults to None.
- Type:
int, optional
- backoff_limit
Maximum number of retries before marking the job as failed. Defaults to 0.
- Type:
int, optional
- restart_policy
Restart policy for the job. Defaults to “Never”.
- Type:
str, optional
- shm_size
Size of shared memory, e.g. “2Gi”. Defaults to None.
- Type:
str, optional
- secret_env_vars
Dictionary of secret environment variables. Defaults to None.
- Type:
dict, optional
- env_vars
Dictionary of normal (non-secret) environment variables. Defaults to None.
- Type:
dict, optional
- volume_mounts
Dictionary of volume mounts. Defaults to None.
- Type:
dict, optional
- namespace
Namespace of the job. Defaults to None.
- Type:
str, optional
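Resource amounts such as cpu_request and ram_request use Kubernetes quantity notation. The sketch below uses hypothetical helpers parse_quantity, check_requests and resources_block, none of which are part of kubejobs. It converts a subset of that notation to numbers and checks requests against the 192-CPU and 890 GB maxima quoted above. Because the unit of the RAM limit is not specified, it assumes decimal gigabytes. It also places GPUs under limits, where Kubernetes expects extended resources such as nvidia.com/gpu.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical helpers, not part of kubejobs. Only a subset of Kubernetes
# quantity suffixes is handled: binary (Ki..Ti), decimal (k..T) and milli (m).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}

MAX_CPUS = Decimal(192)               # maximum quoted for cpu_request
MAX_RAM_BYTES = Decimal(890) * 10**9  # maximum quoted for ram_request, assumed decimal GB


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '500m', '16' or '1Gi' to a plain number."""
    # Try two-character suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def check_requests(cpu_request: str, ram_request: str) -> None:
    """Raise ValueError if a request exceeds the documented maxima."""
    if parse_quantity(cpu_request) > MAX_CPUS:
        raise ValueError(f"cpu_request {cpu_request} exceeds {MAX_CPUS} CPUs")
    if parse_quantity(ram_request) > MAX_RAM_BYTES:
        raise ValueError(f"ram_request {ram_request} exceeds 890 GB")


def resources_block(
    cpu_request: str,
    ram_request: str,
    gpu_type: Optional[str] = None,
    gpu_limit: Optional[int] = None,
) -> dict:
    """Assemble a container 'resources' section from the job parameters."""
    block = {"requests": {"cpu": cpu_request, "memory": ram_request}}
    if gpu_type is not None and gpu_limit:
        # Extended resources such as nvidia.com/gpu must be set under limits.
        block["limits"] = {gpu_type: gpu_limit}
    return block


check_requests("500m", "1Gi")
print(resources_block("500m", "1Gi", gpu_type="nvidia.com/gpu", gpu_limit=1))
```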
- kubejobs.jobs.create_jobs_for_experiments(commands: List[str], *args, **kwargs)
Creates and runs a Kubernetes Job for each command in the given list of commands.
- Parameters:
commands – A list of strings, where each string represents a command to be executed.
args – Positional arguments to be passed to the KubernetesJob constructor.
kwargs – Keyword arguments to be passed to the KubernetesJob constructor.
- Example:
from kubejobs.jobs import create_jobs_for_experiments

commands = [
    "python experiment.py --param1 value1",
    "python experiment.py --param1 value2",
    "python experiment.py --param1 value3",
]

create_jobs_for_experiments(
    commands,
    image="nvcr.io/nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04",
    kueue_queue_name="my-queue",  # hypothetical queue name; KubernetesJob requires this argument
    gpu_type="nvidia.com/gpu",
    gpu_limit=1,
    backoff_limit=4,
)
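Conceptually, the function fans a list of command strings out into one job per command. The following sketch shows that pattern with a hypothetical FakeJob stand-in for KubernetesJob and a hypothetical jobs_for_commands helper. The documentation above does not describe how kubejobs names the jobs or splits the command strings, so the index-suffixed names and the shlex-based splitting are assumptions.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeJob:
    """Hypothetical stand-in for KubernetesJob so the sketch runs without a cluster."""

    name: str
    image: str
    kueue_queue_name: str
    command: List[str] = field(default_factory=list)
    args: List[str] = field(default_factory=list)


def jobs_for_commands(commands: List[str], base_name: str, **kwargs) -> List[FakeJob]:
    """Build one job per command string, giving each job a unique name."""
    jobs = []
    for index, command in enumerate(commands):
        argv = shlex.split(command)  # respects quoting, e.g. --tag "a b"
        jobs.append(
            FakeJob(
                name=f"{base_name}-{index}",
                command=argv[:1],  # the executable
                args=argv[1:],     # everything after it
                **kwargs,
            )
        )
    return jobs


jobs = jobs_for_commands(
    ["python experiment.py --param1 value1", "python experiment.py --param1 value2"],
    base_name="sweep",
    image="nvcr.io/nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04",
    kueue_queue_name="my-queue",  # hypothetical queue name
)
for job in jobs:
    print(job.name, job.command, job.args)
```

Suffixing the index keeps job names unique, which Kubernetes requires within a namespace.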
- kubejobs.jobs.create_pv(pv_name: str, storage: str, storage_class_name: str, access_modes: list, pv_type: str, namespace: str = 'default', claim_name: str | None = None, local_path: str | None = None, fs_type: str = 'ext4')
Create a PersistentVolume of the specified type.
- Parameters:
pv_name – The name of the PersistentVolume.
storage – The amount of storage for the PersistentVolume (e.g., “1500Gi”).
storage_class_name – The storage class name for the PersistentVolume.
access_modes – A list of access modes for the PersistentVolume.
pv_type – The type of PersistentVolume, either ‘local’ or ‘gcePersistentDisk’.
namespace – The namespace argument for the PersistentVolume. Defaults to “default”. Kubernetes PersistentVolumes are cluster-scoped, so the PV itself does not belong to a namespace; only the PersistentVolumeClaim that binds to it does.
claim_name – The name of the PersistentVolumeClaim to bind to the PersistentVolume.
local_path – The path on the host for a local PersistentVolume. Required if pv_type is ‘local’.
fs_type – The filesystem type for the PersistentVolume. Defaults to “ext4”.
Example usage:
from kubejobs.jobs import create_pv

create_pv(
    "pv-instafluencer-data",
    "1500Gi",
    "sc-instafluencer-data",
    ["ReadOnlyMany"],
    "local",
    claim_name="pvc-instafluencer-data",
    local_path="/mnt/data",
)
# This creates a local PersistentVolume named "pv-instafluencer-data" with 1500Gi of storage,
# the "sc-instafluencer-data" storage class, the ReadOnlyMany access mode, and the local path "/mnt/data".
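The sketch below shows one way create_pv's parameters could map onto a v1 PersistentVolume manifest. pv_manifest is hypothetical and is not the kubejobs code. It assumes the GCE disk is named after the PV. Because PersistentVolumes are cluster-scoped, it uses namespace only in claimRef, which pre-binds the PV to a claim in that namespace.

```python
from typing import List, Optional

# Hypothetical sketch of a PersistentVolume manifest built from create_pv's
# documented parameters; not the kubejobs implementation.


def pv_manifest(
    pv_name: str,
    storage: str,
    storage_class_name: str,
    access_modes: List[str],
    pv_type: str,
    namespace: str = "default",
    claim_name: Optional[str] = None,
    local_path: Optional[str] = None,
    fs_type: str = "ext4",
) -> dict:
    spec = {
        "capacity": {"storage": storage},
        "accessModes": list(access_modes),
        "storageClassName": storage_class_name,
    }
    if pv_type == "local":
        if local_path is None:
            raise ValueError("local_path is required when pv_type is 'local'")
        # A usable local PV also needs spec.nodeAffinity naming the node that
        # holds local_path; omitted because the documented signature has no node parameter.
        spec["local"] = {"path": local_path, "fsType": fs_type}
    elif pv_type == "gcePersistentDisk":
        # Assumption: the GCE persistent disk has the same name as the PV.
        spec["gcePersistentDisk"] = {"pdName": pv_name, "fsType": fs_type}
    else:
        raise ValueError(f"unsupported pv_type: {pv_type!r}")
    if claim_name is not None:
        # Pre-bind the PV to a specific claim; this is where a namespace applies.
        spec["claimRef"] = {"name": claim_name, "namespace": namespace}
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},  # cluster-scoped: no namespace here
        "spec": spec,
    }


print(pv_manifest("pv-instafluencer-data", "1500Gi", "sc-instafluencer-data",
                  ["ReadOnlyMany"], "local",
                  claim_name="pvc-instafluencer-data", local_path="/mnt/data"))
```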