cloudera.cloud.datahub_cluster module – Manage CDP Datahubs

Note

This module is part of the cloudera.cloud collection (version 2.5.1).

It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install cloudera.cloud. This module has further requirements; see Requirements for details.

To use it in a playbook, specify: cloudera.cloud.datahub_cluster.
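To pin the collection version named in this note, a collection requirements file is one option. The sketch below assumes a file called requirements.yml (the filename is an assumption) that would be installed with ansible-galaxy collection install -r requirements.yml.

```yaml
# requirements.yml (assumed filename)
collections:
  - name: cloudera.cloud
    version: "2.5.1"   # the version this page documents
```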

Synopsis

  • Create, start, stop, and delete CDP Datahubs.

Requirements

The below requirements are needed on the host that executes this module.

  • cdpy

Parameters

Parameter

Comments

catalog

string

Name of the image catalog to use for cluster instances.

cdp_region

aliases: cdp_endpoint_region, endpoint_region

string

Specify the Cloudera Data Platform endpoint region.

Default: "default"

debug

aliases: debug_endpoints

boolean

Capture the CDP SDK debug log.

Choices:

  • false ← (default)

  • true

definition

string

The name or CRN of the cluster definition to use for cluster creation.

delay

aliases: polling_delay

integer

The internal polling interval (in seconds) while the module waits for the datahub to achieve the declared state.

Default: 15

environment

aliases: env

string

The CDP environment name or CRN to which the datahub will be attached.

extension

string

Cluster extensions for the Data Hub cluster.

force

boolean

Flag indicating if the datahub should be force deleted.

This option can be used when cluster deletion fails.

This removes the entry from Cloudera Datahub service.

Any lingering resources have to be deleted from the cloud provider manually.

Choices:

  • false ← (default)

  • true
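As an illustration of the force option, the task below is a minimal sketch of a forced deletion after a normal deletion has failed. The datahub name is a placeholder, and authentication details are not set.

```yaml
- name: Force delete a datahub whose normal deletion failed
  cloudera.cloud.datahub_cluster:
    name: example-datahub   # placeholder name
    state: absent
    force: true             # removes the entry from the Datahub service only
```

Per the option description, any cloud provider resources left behind by a forced deletion still have to be cleaned up manually.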

groups

list / elements=dictionary

Instance group details.

attachedVolumeConfiguration

list / elements=dictionary

The attached volume configuration. This does not include root volume.

volumeCount

integer / required

The attached volume count.

volumeSize

integer / required

The attached volume size.

volumeType

string / required

The attached volume type.

instanceGroupName

string / required

The instance group name.

instanceGroupType

string / required

The instance group type.

instanceType

string / required

The cloud provider specific instance type to be used.

nodeCount

integer / required

Number of instances in the instance group.

recipeNames

list / elements=string

The names or CRNs of the recipes that would be applied to the instance group.

recoveryMode

string

Recovery mode for the instance group.

rootVolumeSize

integer

The root volume size.

subnetIds

list / elements=string

The list of subnet IDs in case of multi-availability zone setup.

Specifying this field overrides the datahub level subnet ID setup for the multi-availability zone configuration.

volumeEncryption

dictionary

The volume encryption settings.

This setting does not apply to Azure, which always encrypts volumes.

enableEncryption

boolean

Enable encryption for all volumes in the instance group. Default is false.

Choices:

  • false

  • true

encryptionKey

string

The ARN of the encryption key to use. If nothing is specified, the default key will be used.
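The groups suboptions can be combined as in the sketch below, which adds volume encryption to one instance group. All values are placeholders, the group name and type are illustrative choices, and a real cluster may need more options (such as image or subnet).

```yaml
- name: Create a datahub with an encrypted worker group
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    env: name-or-crn
    template: template-name
    groups:
      - instanceGroupName: worker          # illustrative group name
        instanceGroupType: CORE
        instanceType: instance-type-for-cloud-provider
        nodeCount: 3
        attachedVolumeConfiguration:
          - volumeCount: 1
            volumeSize: 100
            volumeType: volume-type-for-cloud-provider
        volumeEncryption:
          enableEncryption: true
          encryptionKey: arn-of-encryption-key   # omit to use the default key
```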

image

string

ID of the image used for cluster instances.

multi_az

boolean

(AWS) Flag indicating whether to defer to the CDP Environment for availability zone/subnet placement.

Useful for when you are not sure which subnet is available to the datahub cluster.

Choices:

  • false

  • true ← (default)

name

aliases: datahub, cluster_name

string / required

The name of the datahub.

This name must be unique, must have between 5 and 20 characters, and must contain only lowercase letters, numbers, and hyphens.

Names are case-sensitive.
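The length and character rules for name can be checked before the module is called. The task below is a sketch using ansible.builtin.assert; the variable datahub_name is hypothetical, and the check cannot verify uniqueness.

```yaml
- name: Check the datahub name against the documented naming rules
  ansible.builtin.assert:
    that:
      # 5 to 20 characters: lowercase letters, numbers, and hyphens only
      - "datahub_name is match('^[a-z0-9-]{5,20}$')"
    fail_msg: "Datahub names need 5 to 20 lowercase letters, numbers, or hyphens."
  vars:
    datahub_name: example-datahub   # hypothetical variable
```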

profile

string

If provided, the CDP SDK will use this value as its profile.
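A task that selects a specific profile might look like the sketch below. The profile name my-cdp-profile is hypothetical and is assumed to be configured already for the CDP SDK on the host that runs the module.

```yaml
- name: Create a datahub using a named CDP profile
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    env: name-or-crn
    definition: definition-name
    profile: my-cdp-profile   # hypothetical profile name
```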

state

string

The declarative state of the datahub.

If creating a datahub, the associated Environment and Datalake must be started as well.

Choices:

  • "present" ← (default)

  • "started"

  • "stopped"

  • "absent"

subnet

string

The subnet ID in AWS, or the subnet name on Azure or GCP.

Mutually exclusive with the subnets and subnets_filter options.

subnets

list / elements=string

List of subnet IDs in case of a multi-availability zone setup.

Mutually exclusive with the subnet and subnets_filter options.

subnets_filter

list / elements=string

JMESPath expression to filter the subnets to be used for the load balancer.

The expression will be applied to the full list of subnets for the specified environment.

Each subnet in the list is an object with the following attributes: subnetId, subnetName, availabilityZone, cidr.

The filter expression must only filter the list and must not apply any attribute projection.

Mutually exclusive with the subnet and subnets options.

tags

aliases: datahub_tags

dictionary

Tags associated with the datahub and its resources.

template

string

Name or CRN of the cluster template to use for cluster creation.

timeout

aliases: polling_timeout

integer

The internal polling timeout (in seconds) while the module waits for the datahub to achieve the declared state.

Default: 3600

verify_endpoint_tls

aliases: endpoint_tls

boolean

Verify the TLS certificates for the CDP endpoint.

Choices:

  • false

  • true ← (default)

wait

boolean

Flag to enable internal polling to wait for the datahub to achieve the declared state.

If set to false, the module will return immediately.

Choices:

  • false

  • true ← (default)
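The wait, delay, and timeout options work together. As a sketch, the task below polls every 30 seconds and gives up after 30 minutes; both values are arbitrary and only show how the defaults (15 and 3600 seconds) are overridden.

```yaml
- name: Stop the datahub, polling every 30 seconds for up to 30 minutes
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    state: stopped
    wait: true
    delay: 30       # polling interval in seconds
    timeout: 1800   # polling timeout in seconds
```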

Examples

# Note: These examples do not set authentication details.

- name: Create a datahub specifying instance group details (and do not wait for status change)
  cloudera.cloud.datahub_cluster:
    name: datahub-name
    env: name-or-crn
    state: present
    subnet: subnet-id-for-cloud-provider
    image: image-uuid-from-catalog
    catalog: name-of-catalog-for-image
    template: template-name
    groups:
      - nodeCount: 1
        instanceGroupName: master
        instanceGroupType: GATEWAY
        instanceType: instance-type-for-cloud-provider
        rootVolumeSize: 100
        recoveryMode: MANUAL
        recipeNames: []
        attachedVolumeConfiguration:
          - volumeSize: 100
            volumeCount: 1
            volumeType: volume-type-for-cloud-provider
    tags:
      project: Arbitrary content
    wait: false

- name: Create a datahub specifying only a definition name
  cloudera.cloud.datahub_cluster:
    name: datahub-name
    env: name-or-crn
    definition: definition-name
    tags:
      project: Arbitrary content
    wait: false

- name: Stop the datahub (and wait for status change)
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    state: stopped

- name: Start the datahub (and wait for status change)
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    state: started

- name: Delete the datahub (and wait for status change)
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    state: absent

Return Values

Common return values are documented in the Ansible documentation. The following fields are unique to this module:

Key

Description

datahub

dictionary

The information about the Datahub

Returned: always

clouderaManager

dictionary

The Cloudera Manager details.

Returned: success

platformVersion

string

CDP Platform version.

Returned: when supported

version

string

Cloudera Manager version.

Returned: always

cloudPlatform

string

The cloud platform.

Returned: when supported

clusterName

string

The name of the cluster.

Returned: always

clusterStatus

string

The status of the cluster.

Returned: when supported

clusterTemplateCrn

string

The CRN of the cluster template used for the cluster creation.

Returned: when supported

creationDate

string

The date when the cluster was created.

Return value is a date timestamp.

Returned: when supported

credentialCrn

string

The CRN of the credential.

Returned: when supported

crn

string

The CRN of the cluster.

Returned: always

datalakeCrn

string

The CRN of the attached datalake.

Returned: when supported

endpoints

list / elements=dictionary

The exposed service API endpoints.

Returned: when supported

endpoint

list / elements=dictionary

The endpoints.

Returned: always

displayName

string

The more consumable name of the exposed service.

Returned: always

knoxService

string

The related Knox entry.

Returned: always

mode

string

The SSO mode of the given service.

Returned: always

open

boolean

Flag of the access status of the given endpoint.

Returned: always

serviceName

string

The name of the exposed service.

Returned: always

serviceUrl

string

The server URL for the given exposed service’s API.

Returned: always

environmentCrn

string

The CRN of the environment.

Returned: when supported

environmentName

string

The name of the environment.

Returned: when supported

imageDetails

dictionary

The image details.

Returned: when supported

catalogName

string

The image catalog name.

Returned: when supported

catalogUrl

string

The image catalog URL.

Returned: when supported

id

string

The ID of the image used for cluster instances.

This is internally generated by the cloud provider to uniquely identify the image.

Returned: when supported

name

string

The name of the image used for cluster instances.

Returned: when supported

instanceGroups

list / elements=dictionary

The instance details.

Returned: when supported

availabilityZones

list / elements=string

List of availability zones associated with the instance group.

Returned: when supported

instances

list / elements=dictionary

List of instances in this instance group.

Returned: always

attachedVolumes

list / elements=dictionary

List of volumes attached to this instance.

Returned: when supported

count

integer

The number of volumes.

Returned: when supported

size

integer

The size of each volume in GB.

Returned: when supported

volumeType

string

The type of volumes.

Returned: when supported

availabilityZone

string

The availability zone of the instance.

Returned: when supported

clouderaManagerServer

boolean

Flag indicating if Cloudera Manager has been deployed or not.

Returned: when supported

fqdn

string

The fully-qualified domain name (FQDN) of the instance.

Returned: when supported

id

string

The ID of the given instance.

Returned: always

instanceGroup

string

The name of the instance group associated with the instance.

Returned: when supported

instanceType

string

The type of the given instance.

Values are GATEWAY, GATEWAY_PRIMARY, or CORE.

Returned: always

instanceVmType

string

The VM type of the instance.

Supported values depend on the cloud platform.

Returned: when supported

privateIp

string

The private IP of the given instance.

Returned: when supported

publicIp

string

The public IP of the given instance.

Returned: when supported

rackId

string

The rack ID of the instance in Cloudera Manager.

Returned: when supported

sshPort

integer

The SSH port for the instance.

Returned: when supported

state

string

The health state of the instance.

UNHEALTHY represents instances with unhealthy services, lost instances, or failed operations.

Returned: always

status

string

The status of the instance.

This includes information such as whether the instance is being provisioned or is stopped, and any decommissioning failures.

Returned: when supported

statusReason

string

The reason for the current status of this instance.

Returned: when supported

subnetId

string

The subnet ID of the instance.

Returned: when supported

name

string

The name of the instance group where the given instance is located.

Returned: always

subnetIds

list / elements=string

The list of subnet IDs in case of multi-availability zone setup.

Returned: when supported

nodeCount

integer

The cluster node count.

Returned: when supported

status

string

The status of the stack.

Returned: when supported

statusReason

string

The status reason.

Returned: when supported

workloadType

string

The workload type for the cluster.

Returned: when supported

sdk_out

string

Returns the captured CDP SDK log.

Returned: when supported

sdk_out_lines

list / elements=string

Returns a list of each line of the captured CDP SDK log.

Returned: when supported
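The returned fields can be read by registering the task result. The sketch below assumes a registered variable named datahub_result (a hypothetical name). It reads clusterName and crn, which are listed as always returned, and falls back to a default for status, which is returned only when supported.

```yaml
- name: Start the datahub and capture the module result
  cloudera.cloud.datahub_cluster:
    name: example-datahub
    state: started
  register: datahub_result   # hypothetical variable name

- name: Show the cluster name and CRN
  ansible.builtin.debug:
    msg: "{{ datahub_result.datahub.clusterName }} has CRN {{ datahub_result.datahub.crn }}"

- name: Show the cluster status when it is reported
  ansible.builtin.debug:
    msg: "{{ datahub_result.datahub.status | default('status not reported') }}"
```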

Authors

  • Webster Mudge (@wmudge)

  • Daniel Chaffelson (@chaffelson)

  • Chris Perro (@cmperro)