What is KubernetesPodOperator in Airflow

KubernetesPodOperator is an Apache Airflow operator that launches a Kubernetes pod as a task in an Airflow workflow. It is useful when you want to run a containerized workload as part of your pipeline, or when you want Kubernetes to handle resource allocation and pod scheduling for your tasks.

Here is an example of how you might use a KubernetesPodOperator in an Airflow DAG:

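from datetime import timedelta

from airflow import DAG
# Airflow 2.x with a recent cncf.kubernetes provider package. Older provider
# releases expose the operator at
# airflow.providers.cncf.kubernetes.operators.kubernetes_pod, and Airflow 1.10
# at airflow.contrib.operators.kubernetes_pod_operator.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'me',
    'start_date': days_ago(2),
}

# Run the DAG every 10 minutes
dag = DAG(
    'kubernetes_sample',
    default_args=default_args,
    schedule_interval=timedelta(minutes=10),
)

# Define a task using a KubernetesPodOperator
task = KubernetesPodOperator(
    namespace='default',                   # namespace the pod is created in
    image="python:3.6-slim",               # container image to run
    cmds=["python", "-c"],                 # container entrypoint
    arguments=["print('hello world')"],    # arguments passed to the entrypoint
    labels={"foo": "bar"},                 # labels applied to the pod
    name="test-pod",                       # pod name
    task_id="test-pod",                    # Airflow task ID
    # Delete the pod once the task finishes. Newer provider releases
    # deprecate this flag in favor of on_finish_action.
    is_delete_operator_pod=True,
    dag=dag,
)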

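In this example, we define a task that launches a Kubernetes pod in the default namespace using the python:3.6-slim Docker image. The pod runs a single command, python -c "print('hello world')", which prints hello world to the container log. The pod is given the label foo: bar and the name test-pod, and the Airflow task ID is also test-pod. Because is_delete_operator_pod is set to True, the pod is deleted once the task finishes.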
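
To make the relationship between cmds and arguments concrete, here is a small local sketch that needs neither Airflow nor a cluster. Kubernetes runs the container's command followed by its args, and the operator passes cmds and arguments through as those two fields. The sketch joins the two lists and runs them as a local process, with sys.executable standing in for the python binary inside the image (an assumption made so the snippet runs on any machine).

```python
import subprocess
import sys

# Same values as in the DAG above, except that sys.executable stands in
# for the "python" binary inside the python:3.6-slim image.
cmds = [sys.executable, "-c"]
arguments = ["print('hello world')"]

# The container runs its command followed by its args, so the process
# that starts is effectively cmds + arguments.
result = subprocess.run(cmds + arguments, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # hello world
```

This only approximates what happens inside the pod, but it shows why the Python snippet has to be passed as a single string in arguments: it becomes the one argument that follows -c.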

There are many other parameters that you can use to customize the behavior of the KubernetesPodOperator, such as setting resource limits and requests, specifying environment variables, and mounting volumes. You can find a full list of available parameters in the Airflow documentation.
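
As a rough sketch of those options, the snippet below sets environment variables, resource requests and limits, and a scratch volume on a pod. It assumes Airflow 2.x with a recent cncf.kubernetes provider (where the resources parameter is called container_resources) and the kubernetes Python client. The variable GREETING, the volume name scratch, the mount path, the resource values, and the helper build_custom_pod_task are all made up for illustration.

```python
# Plain-Python settings, kept at module level so they are easy to review.
ENV_VARS = {"GREETING": "hello world"}  # hypothetical environment variable
RESOURCES = {
    "requests": {"cpu": "100m", "memory": "128Mi"},  # illustrative values
    "limits": {"cpu": "500m", "memory": "256Mi"},
}
SCRATCH_VOLUME = "scratch"    # hypothetical volume name
SCRATCH_PATH = "/tmp/scratch"  # hypothetical mount path


def build_custom_pod_task(dag):
    """Return a KubernetesPodOperator configured with the settings above."""
    # Imported inside the function so the settings can be loaded and
    # checked on a machine without Airflow installed.
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
    from kubernetes.client import models as k8s

    return KubernetesPodOperator(
        task_id="custom-pod",
        name="custom-pod",
        namespace="default",
        image="python:3.6-slim",
        cmds=["python", "-c"],
        arguments=["import os; print(os.environ['GREETING'])"],
        # Environment variables set in the container
        env_vars=ENV_VARS,
        # CPU and memory requests and limits for the container
        container_resources=k8s.V1ResourceRequirements(**RESOURCES),
        # An emptyDir volume, mounted into the container at SCRATCH_PATH
        volumes=[
            k8s.V1Volume(name=SCRATCH_VOLUME, empty_dir=k8s.V1EmptyDirVolumeSource())
        ],
        volume_mounts=[
            k8s.V1VolumeMount(name=SCRATCH_VOLUME, mount_path=SCRATCH_PATH)
        ],
        dag=dag,
    )
```

Calling build_custom_pod_task(dag) with the dag object from the earlier example would add the task to that DAG. Parameter names have changed between provider releases, so check them against the version you have installed.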
