cloudera.cloud.df_service module – Enable or Disable CDP DataFlow Services

Note

This module is part of the cloudera.cloud collection (version 3.1.0).

It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install cloudera.cloud.

To use it in a playbook, specify: cloudera.cloud.df_service.

New in cloudera.cloud 1.2.0

Synopsis

  • Enable or Disable CDP DataFlow Services

Parameters

Parameter

Comments

access_key

string

If provided, the Cloudera on cloud API will use this value as its access key.

If not provided, the API will attempt to use the value from the environment variable CDP_ACCESS_KEY_ID.

Required if private_key is provided.

Mutually exclusive with credentials_path.

cluster_subnets

list / elements=string

Subnet IDs that will be assigned to the Kubernetes cluster

Mutually exclusive with the cluster_subnets_filter option

cluster_subnets_filter

string

Filter expression to select subnets for the Kubernetes cluster

Multiple formats supported:

  1. JMESPath syntax (legacy): "[?contains(subnetName, 'pvt-0')]"

  2. Simple pattern: "pvt-0" (shorthand for subnetName contains)

  3. Filter expressions: "contains(subnetName, 'value')", "field == 'value'", "field != 'value'", "startswith(field, 'value')"

The filter operates on subnet objects with attributes: subnetId, subnetName, availabilityZone, cidr

Mutually exclusive with the cluster_subnets option.

credentials_path

string

If provided, the Cloudera on cloud API will use this value as its credentials path.

If not provided, the API will attempt to use the value from the environment variable CDP_CREDENTIALS_PATH.

Default: "~/.cdp/credentials"

debug

aliases: debug_endpoints

boolean

If true, the module will capture the Cloudera on cloud HTTP log and return it in the sdk_out and sdk_out_lines fields.

Choices:

  • false ← (default)

  • true

delay

aliases: polling_delay

integer

The internal polling interval (in seconds) while the module waits for the DataFlow Service to achieve the declared state.

Default: 15

df_crn

string

The CRN of the DataFlow Service, if available

Required when state=absent

endpoint

aliases: endpoint_url, url

string

The Cloudera on cloud API endpoint to use.

Mutually exclusive with endpoint_region.

endpoint_region

aliases: cdp_endpoint_region, cdp_region, region

string

Specify the Cloudera on cloud API endpoint region.

See Cloudera Control Plane regions for more information.

If not provided, the API will attempt to use the value from the environment variable CDP_REGION.

default is an alias for the us-west-1 region.

Mutually exclusive with endpoint.

Choices:

  • "default"

  • "us-west-1" ← (default)

  • "eu-1"

  • "ap-1"

endpoint_tls

aliases: verify_endpoint_tls, verify_tls, verify_api_tls

boolean

Verify the TLS certificates for the Cloudera on cloud API endpoint.

Choices:

  • false

  • true ← (default)

env_crn

aliases: name

string

The CRN of the CDP Environment to host the DataFlow Service

The environment name can also be provided, instead of the CRN

Required when state=present

force

aliases: force_delete

boolean

Flag to indicate whether deletion of the DataFlow Service should be forced.

Choices:

  • false ← (default)

  • true

http_agent

aliases: agent_header

string

The HTTP user agent to use for Cloudera on cloud API requests.

Default: "cloudera.cloud"

instance_type

string

The custom instance type to use for Kubernetes nodes

Cloud provider specific (e.g., “m5.2xlarge” for AWS, “Standard_D8s_v3” for Azure)

k8s_ip_ranges

list / elements=string

The IP ranges authorized to connect to the Kubernetes API server

loadbalancer_ip_ranges

list / elements=string

The IP ranges authorized to connect to the load balancer

loadbalancer_subnets

list / elements=string

Subnet IDs that will be assigned to the load balancer

Mutually exclusive with the loadbalancer_subnets_filter option

loadbalancer_subnets_filter

string

Filter expression to select subnets for the load balancer

Multiple formats supported:

  1. JMESPath syntax (legacy): "[?contains(subnetName, 'pub')]"

  2. Simple pattern: "pub" (shorthand for subnetName contains)

  3. Filter expressions: "contains(subnetName, 'value')", "field == 'value'", "field != 'value'", "startswith(field, 'value')"

The filter operates on subnet objects with attributes: subnetId, subnetName, availabilityZone, cidr

Mutually exclusive with the loadbalancer_subnets option.

nodes_max

aliases: max_k8s_node_count

integer

The maximum number of Kubernetes nodes that the environment may scale up to under high-demand situations.

Default: 3

nodes_min

aliases: min_k8s_node_count

integer

The minimum number of Kubernetes nodes needed for the environment. The lowest allowed minimum is 3 nodes.

Default: 3

persist

boolean

Whether or not to retain the database records of related entities during removal.

Choices:

  • false ← (default)

  • true

pod_cidr

string

CIDR range from which to assign IPs to pods in the Kubernetes cluster

Must be a valid CIDR block (e.g., “10.200.0.0/16”)

private_cluster

aliases: enable_private_cluster

boolean

Flag to specify if a private K8s cluster should be created.

Choices:

  • false ← (default)

  • true

private_key

string

If provided, the Cloudera on cloud API will use this value as its private key.

If not provided, the API will attempt to use the value from the environment variable CDP_PRIVATE_KEY.

Required if access_key is provided.

profile

string

If provided, the Cloudera on cloud API will use this value as its profile.

If not provided, the API will attempt to use the value from the environment variable CDP_PROFILE.

Default: "default"

public_loadbalancer

aliases: use_public_load_balancer

boolean

Indicates whether to use a public load balancer when deploying the dependencies stack.

Choices:

  • false

  • true

service_cidr

string

CIDR range from which to assign IPs to internal services in the Kubernetes cluster

Must be a valid CIDR block (e.g., “10.201.0.0/16”)

skip_preflight_checks

boolean

Indicates whether to skip pre-flight checks during service enablement

Use with caution - skipping checks may result in deployment failures

Choices:

  • false ← (default)

  • true

state

string

The declarative state of the DataFlow Service

Choices:

  • "present" ← (default)

  • "absent"

strict

aliases: strict_errors

boolean

Legacy CDPy SDK error handling.

Choices:

  • false ← (default)

  • true

tags

dictionary

Tags to apply to the DataFlow Service

terminate

boolean

Whether or not to terminate all deployments associated with this DataFlow service

Choices:

  • false ← (default)

  • true

timeout

aliases: polling_timeout

integer

The internal polling timeout (in seconds) while the module waits for the DataFlow Service to achieve the declared state.

Default: 3600

user_defined_routing

boolean

Indicates whether User Defined Routing (UDR) mode is enabled for AKS clusters

Azure-specific option for controlling network routing behavior

Choices:

  • false ← (default)

  • true

wait

boolean

Flag to enable internal polling to wait for the DataFlow Service to achieve the declared state.

If set to false, the module will return immediately.

Choices:

  • false

  • true ← (default)

Notes

Note

  • The feature that this module manages is in Technical Preview

  • When updating an existing service, only the following parameters can be changed: nodes_min/nodes_max (both required together), k8s_ip_ranges, loadbalancer_ip_ranges, skip_preflight_checks

  • Network configuration (subnets, pod_cidr, service_cidr, cluster type) cannot be updated after creation

  • To change immutable parameters, you must disable and recreate the service

  • When state=absent and force=true, the resetService API is used if the service is in the NOT_ENABLED state

  • resetService only works on NOT_ENABLED services and does not clean up cloud resources

  • Use force=true with caution as manual resource cleanup may be required

Examples

# Note: These examples do not set authentication details.

# Enable a DataFlow Service, selecting subnets with JMESPath filters (legacy syntax)
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 10
    public_loadbalancer: true
    cluster_subnets_filter: "[?contains(subnetName, 'pvt-0')]"
    loadbalancer_subnets_filter: "[?contains(subnetName, 'pub')]"
    state: present
    wait: true
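
# A sketch of the same request with explicit subnet IDs instead of filters.
# The subnet IDs below are hypothetical placeholders. cluster_subnets and
# loadbalancer_subnets are mutually exclusive with their *_filter options.
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 10
    public_loadbalancer: true
    cluster_subnets:
      - subnet-0aaa1111   # hypothetical private subnet ID
      - subnet-0bbb2222   # hypothetical private subnet ID
    loadbalancer_subnets:
      - subnet-0ccc3333   # hypothetical public subnet ID
    state: present
    wait: true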

# Enable a DataFlow Service, selecting subnets with simple patterns (subnetName contains)
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 10
    public_loadbalancer: true
    cluster_subnets_filter: "pvt-0"
    loadbalancer_subnets_filter: "pub"
    state: present
    wait: true
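
# A sketch of the remaining filter-expression forms listed for
# cluster_subnets_filter and loadbalancer_subnets_filter. The 'pvt-' prefix
# and the 'us-east-1c' zone are assumed values for illustration only.
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 10
    public_loadbalancer: true
    cluster_subnets_filter: "startswith(subnetName, 'pvt-')"
    loadbalancer_subnets_filter: "availabilityZone != 'us-east-1c'"
    state: present
    wait: true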

# Enable a DataFlow Service, selecting subnets with filter expressions on availabilityZone
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 10
    public_loadbalancer: true
    cluster_subnets_filter: "availabilityZone == 'us-east-1a'"
    loadbalancer_subnets_filter: "availabilityZone == 'us-east-1b'"
    state: present
    wait: true
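
# A sketch of custom Kubernetes networking, reusing the sample CIDR blocks and
# AWS instance type from the parameter descriptions. The authorized IP ranges
# and the tag are hypothetical values.
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 5
    instance_type: m5.2xlarge
    pod_cidr: "10.200.0.0/16"
    service_cidr: "10.201.0.0/16"
    k8s_ip_ranges:
      - "203.0.113.0/24"    # hypothetical range (documentation address block)
    loadbalancer_ip_ranges:
      - "198.51.100.0/24"   # hypothetical range (documentation address block)
    tags:
      owner: data-platform  # hypothetical tag
    state: present
    wait: true

# A sketch of an in-place update: per the Notes, nodes_min and nodes_max are
# supplied together. The registered variable name is an assumption.
- cloudera.cloud.df_service:
    name: my-service
    nodes_min: 3
    nodes_max: 12
    state: present
  register: __my_service

# Print the reported state of the first returned service, assuming the
# services list contains at least one entry.
- ansible.builtin.debug:
    msg: "{{ __my_service.services[0].status.state }}"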

# Remove a DataFlow Service as an asynchronous task (async with poll: 0)
- cloudera.cloud.df_service:
    name: my-service
    persist: false
    state: absent
    wait: true
  async: 3600
  poll: 0
  register: __my_teardown_request
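
# A sketch of checking on the asynchronous teardown registered above, using
# ansible.builtin.async_status. The retry count and delay are assumptions
# chosen to match the 3600-second async window.
- ansible.builtin.async_status:
    jid: "{{ __my_teardown_request.ansible_job_id }}"
  register: __my_teardown_result
  until: __my_teardown_result.finished
  retries: 120
  delay: 30

# A sketch of a forced removal by CRN. The df_crn value is a hypothetical
# placeholder. Per the Notes, force=true may leave cloud resources that
# require manual cleanup.
- cloudera.cloud.df_service:
    df_crn: "<dataflow-service-crn>"   # hypothetical placeholder
    force: true
    state: absent
    wait: true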

Return Values

Common return values are documented here; the following fields are unique to this module:

Key

Description

sdk_out

string

Returns the captured CDP SDK log.

Returned: when supported

sdk_out_lines

list / elements=string

Returns a list of each line of the captured CDP SDK log.

Returned: when supported

services

list / elements=complex

The information about the named DataFlow Service or DataFlow Services

Returned: always

activeErrorAlertCount

integer

Current count of active alerts classified as an error.

Returned: always

activeWarningAlertCount

integer

Current count of active alerts classified as a warning.

Returned: always

authorizedIpRanges

list / elements=string

The authorized IP Ranges.

Returned: always

cloudPlatform

string

The cloud platform of the environment.

Returned: always

clusterId

string

Cluster id of the environment.

Returned: if enabled

crn

string

The DataFlow Service’s parent environment CRN.

Returned: always

deploymentCount

string

The deployment count.

Returned: always

dfLocalUrl

string

The URL of the environment local DataFlow application.

Returned: always

instanceType

string

The instance type of the Kubernetes nodes currently in use by DataFlow for this environment.

Returned: always

k8sNodeCount

integer

The number of Kubernetes nodes currently in use by DataFlow for this environment.

Returned: always

maxK8sNodeCount

string

The maximum number of Kubernetes nodes that the environment may scale up to under high-demand situations.

Returned: always

minK8sNodeCount

integer

The minimum number of Kubernetes nodes that need to be provisioned in the environment.

Returned: always

name

string

The DataFlow Service’s parent environment name.

Returned: always

region

string

The region of the environment.

Returned: always

status

dictionary

The status of a DataFlow enabled environment.

Returned: always

message

string

A status message for the environment.

Returned: always

state

string

The state of the environment.

Returned: always

Authors

  • Dan Chaffelson (@chaffelson)

  • Ronald Suplina (@rsuplina)