cloudera.cloud.datalake module – Manage CDP Datalakes

Note

This module is part of the cloudera.cloud collection (version 2.5.1).

It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install cloudera.cloud. This module has further requirements; see Requirements for details.

To use it in a playbook, specify: cloudera.cloud.datalake.

Synopsis

  • Create and delete CDP Datalakes.

  • To start and stop a datalake, use the cloudera.cloud.env module to change the associated CDP Environment’s state.
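As a minimal sketch of the second point: stopping the datalake means stopping its environment. The task below assumes that cloudera.cloud.env accepts name and a state of stopped (with started to reverse it); confirm this against that module's own documentation. The environment name is a placeholder.

```yaml
# Stop the CDP Environment; the attached datalake stops with it.
# Assumption: cloudera.cloud.env supports state "stopped" and "started".
- cloudera.cloud.env:
    name: example-environment   # placeholder environment name
    state: stopped
```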

Requirements

The below requirements are needed on the host that executes this module.

  • cdpy

Parameters

Parameter

Comments

cdp_region

aliases: cdp_endpoint_region, endpoint_region

string

Specify the Cloudera Data Platform endpoint region.

Default: "default"

debug

aliases: debug_endpoints

boolean

Capture the CDP SDK debug log.

Choices:

  • false ← (default)

  • true

delay

aliases: polling_delay

integer

The internal polling interval (in seconds) while the module waits for the datalake to reach the declared state.

Default: 15

environment

aliases: env

string

The CDP environment name or CRN to which the datalake will be attached.

If the environment is AWS-based, instance_profile and storage must be present.

force

boolean

Flag indicating if the datalake should be force deleted.

This option can be used when cluster deletion fails.

This removes the entry from the Cloudera Datalake service.

Any lingering resources have to be deleted from the cloud provider manually.

Choices:

  • false ← (default)

  • true
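A forced deletion might look like the following sketch, which reuses the placeholder datalake name from the Examples section. It only removes the entry from the service, so any remaining cloud resources still need manual cleanup.

```yaml
# Force delete a datalake whose normal deletion has failed.
# Lingering cloud provider resources are NOT removed by this task.
- cloudera.cloud.datalake:
    name: example-datalake
    state: absent
    force: true
```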

instance_profile

string

(AWS) The IAM instance profile of the ID Broker role, which can assume the Datalake Admin S3 role.

(Azure) The URI of the Identity of the ID Broker Role, which can assume the Datalake Admin ADLS role.

(GCP) The Service Account email of the ID Broker Role, which can assume the Datalake Admin GCS role.

multi_az

boolean

(AWS) Flag indicating if the datalake is deployed across multiple availability zones.

Choices:

  • false ← (default)

  • true

name

aliases: datalake

string / required

The name of the datalake.

This name must be unique, must have between 5 and 100 characters, and must contain only lowercase letters, numbers, and hyphens.

Names are case-sensitive.

profile

string

If provided, the CDP SDK will use this value as its profile.
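The connection-related parameters (cdp_region, profile, debug) can be combined on any task. The sketch below is illustrative only: the profile name is hypothetical, and eu-1 is assumed to be a valid endpoint region for your account.

```yaml
# Delete a datalake using a named CDP SDK profile and a non-default region.
- cloudera.cloud.datalake:
    name: example-datalake
    state: absent
    profile: example-profile   # hypothetical profile name
    cdp_region: eu-1           # assumed region value; the default is "default"
    debug: true                # capture the CDP SDK debug log in sdk_out
```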

raz

boolean

Flag indicating if Ranger RAZ fine-grained access should be enabled for the datalake.

Choices:

  • false ← (default)

  • true

recipes

list / elements=dictionary

Recipes to attach to the datalake instance groups.

instanceGroupName

string

Datalake instance/host group name, e.g. `master` or `idbroker`.

recipeNames

list / elements=string

Names of the recipes
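The recipes structure is a list of dictionaries, one per instance group. In the sketch below the recipe names are hypothetical and are assumed to already exist in CDP; the other values are the placeholders used in the Examples section.

```yaml
# Create a datalake and attach recipes to two instance groups.
- cloudera.cloud.datalake:
    name: example-datalake
    state: present
    environment: an-aws-environment-name-or-crn
    instance_profile: arn:aws:iam::1111104421142:instance-profile/example-role
    storage: s3a://example-bucket/datalake/data
    recipes:
      - instanceGroupName: master
        recipeNames:
          - example-recipe-one   # hypothetical recipe names
          - example-recipe-two
      - instanceGroupName: idbroker
        recipeNames:
          - example-recipe-one
```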

runtime

string

The Cloudera Runtime version for the datalake, when supported.

scale

string

The scale of the datalake.

Note that the choice of MEDIUM_DUTY_HA is unsupported since datalake version 7.2.18.

Choices:

  • "LIGHT_DUTY" ← (default)

  • "ENTERPRISE"

  • "MEDIUM_DUTY_HA"
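The shape-related options (runtime, scale, multi_az, raz) can be set together at creation time. The following is a sketch only; the runtime version shown is an assumed value, and whether a given combination is accepted depends on your CDP environment.

```yaml
# Create an AWS datalake with an explicit runtime and scale.
- cloudera.cloud.datalake:
    name: example-datalake
    state: present
    environment: an-aws-environment-name-or-crn
    instance_profile: arn:aws:iam::1111104421142:instance-profile/example-role
    storage: s3a://example-bucket/datalake/data
    runtime: "7.2.18"    # assumed version; quoted so it is passed as a string
    scale: ENTERPRISE    # MEDIUM_DUTY_HA is unsupported since 7.2.18
    multi_az: true       # AWS only
    raz: true
```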

state

string

The declarative state of the datalake.

If creating a datalake, the associated environment must be started as well.

Choices:

  • "present" ← (default)

  • "absent"

storage

aliases: storage_location, storage_location_base

string

(AWS) The S3 bucket (and optional path) for the Storage Location Base for the datalake, starting with s3a://

(Azure) The ADLS bucket URI (and optional path) for the Datalake storage

(GCP) The bucket name and optional path for the GCS Storage Location Base for the Datalake, starting with gs://
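For Azure and GCP, instance_profile and storage take provider-specific values. The sketch below uses placeholder identifiers throughout; the abfs:// URI form and the managed identity resource ID layout are assumptions about typical Azure values, not formats stated in this document.

```yaml
# Create a datalake in Azure (all identifiers are placeholders)
- cloudera.cloud.datalake:
    name: example-azure-datalake
    state: present
    environment: an-azure-environment-name-or-crn
    # Assumed format: resource ID of a user-assigned managed identity
    instance_profile: /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-idbroker
    # Assumed format: ADLS Gen2 URI
    storage: abfs://data@examplestorageaccount.dfs.core.windows.net

# Create a datalake in GCP (all identifiers are placeholders)
- cloudera.cloud.datalake:
    name: example-gcp-datalake
    state: present
    environment: a-gcp-environment-name-or-crn
    instance_profile: example-idbroker@example-project.iam.gserviceaccount.com
    storage: gs://example-bucket/datalake/data
```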

tags

aliases: datalake_tags

dictionary

Tags associated with the datalake and its resources.

timeout

aliases: polling_timeout

integer

The internal polling timeout (in seconds) while the module waits for the datalake to achieve the declared state.

Default: 3600

verify_endpoint_tls

aliases: endpoint_tls

boolean

Verify the TLS certificates for the CDP endpoint.

Choices:

  • false

  • true ← (default)

wait

boolean

Flag to enable internal polling to wait for the datalake to achieve the declared state.

If set to FALSE, the module will return immediately.

Choices:

  • false

  • true ← (default)
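The wait, delay, and timeout parameters work together to control polling. As a sketch, the task below waits for deletion, polling every 30 seconds for up to two hours; the specific numbers are arbitrary illustrations.

```yaml
# Delete a datalake and poll until it is gone.
- cloudera.cloud.datalake:
    name: example-datalake
    state: absent
    wait: true      # the default; shown for clarity
    delay: 30       # seconds between polls (default 15)
    timeout: 7200   # give up after 2 hours (default 3600)
```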

Examples

# Note: These examples do not set authentication details.

# Create a datalake in AWS
- cloudera.cloud.datalake:
    name: example-datalake
    state: present
    environment: an-aws-environment-name-or-crn
    instance_profile: arn:aws:iam::1111104421142:instance-profile/example-role
    storage: s3a://example-bucket/datalake/data
    tags:
      project: Arbitrary content

# Create a datalake in AWS, but don't wait for completion (see datalake_info for datalake status)
- cloudera.cloud.datalake:
    name: example-datalake
    state: present
    wait: false
    environment: an-aws-environment-name-or-crn
    instance_profile: arn:aws:iam::1111104421142:instance-profile/example-role
    storage: s3a://example-bucket/datalake/data
    tags:
      project: Arbitrary content

# Delete the datalake (and wait for status change)
- cloudera.cloud.datalake:
    name: example-datalake
    state: absent

Return Values

Common return values are documented in the Ansible documentation; the following fields are unique to this module:

Key

Description

datalake

dictionary

The information about the Datalake

Returned: on success

awsConfiguration

dictionary

AWS-specific configuration details.

Returned: when supported

instanceProfile

string

The instance profile used for the ID Broker instance.

Returned: always

azureConfiguration

dictionary

Azure-specific environment configuration information.

Returned: when supported

managedIdentity

string

The managed identity used for the ID Broker instance.

Returned: always

clouderaManager

dictionary

The Cloudera Manager details.

Returned: when supported

clouderaManagerRepositoryURL

string

Cloudera Manager repository URL.

Returned: always

clouderaManagerServerURL

string

Cloudera Manager server URL.

Returned: when supported

version

string

Cloudera Manager version.

Returned: always

Sample: "7.2.1"

cloudPlatform

string

Cloud provider of the Datalake.

Returned: when supported

Sample: "['AWS', 'AZURE']"

creationDate

string

The timestamp when the Datalake was created.

Returned: when supported

Sample: "2020-09-23T11:33:50.847000+00:00"

credentialCrn

string

CRN of the CDP Credential.

Returned: when supported

crn

string

CRN value for the Datalake.

Returned: always

datalakeName

string

Name of the Datalake.

Returned: always

enableRangerRaz

boolean

Whether or not RAZ is enabled.

Returned: always

endpoints

dictionary

Details for the exposed service API endpoints of the Datalake.

Returned: when supported

endpoints

list / elements=dictionary

The exposed API endpoints.

Returned: always

displayName

string

User-friendly name of the exposed service.

Returned: always

Sample: "Atlas"

knoxService

string

The related Knox entry for the service.

Returned: always

Sample: "ATLAS_API"

mode

string

The Single Sign-On (SSO) mode for the service.

Returned: always

Sample: "PAM"

open

boolean

Flag for the access status of the service.

Returned: always

serviceName

string

The name of the exposed service.

Returned: always

Sample: "ATLAS_SERVER"

serviceUrl

string

The server URL for the exposed service’s API.

Returned: always

Sample: "https://some.domain/a-datalake/endpoint"

environmentCrn

string

CRN of the associated Environment.

Returned: when supported

gcpConfiguration

dictionary

GCP-specific environment configuration information.

Returned: when supported

serviceAccountEmail

string

The email ID of the service account used for the ID Broker instance.

Returned: always

instanceGroups

list / elements=complex

The instance details of the Datalake.

Returned: when supported

instances

list / elements=dictionary

Details about the instances.

Returned: always

id

string

The identifier of the instance.

Returned: always

Sample: "i-00b58f27be4e7ab9f"

state

string

The state of the instance.

Returned: always

Sample: "HEALTHY"

name

string

Name of the instance group associated with the instances.

Returned: always

Sample: "idbroker"

productVersions

list / elements=dictionary

The product versions.

Returned: when supported

name

string

The name of the product.

Returned: always

Sample: "FLINK"

version

string

The version of the product.

Returned: always

Sample: "1.10.0-csa1.2.1.0-cdh7.2.1.0-240-4844562"

region

string

The region of the Datalake.

Returned: when supported

status

string

The status of the Datalake.

Returned: when supported

Sample: "['EXTERNAL_DATABASE_START_IN_PROGRESS', 'START_IN_PROGRESS', 'RUNNING', 'EXTERNAL_DATABASE_START_IN_PROGRESS', 'START_IN_PROGRESS', 'EXTERNAL_DATABASE_STOP_IN_PROGRESS', 'STOP_IN_PROGRESS', 'STOPPED', 'REQUESTED', 'EXTERNAL_DATABASE_CREATION_IN_PROGRESS', 'STACK_CREATION_IN_PROGRESS', 'EXTERNAL_DATABASE_DELETION_IN_PROGRESS', 'STACK_DELETION_IN_PROGRESS', 'PROVISIONING_FAILED']"

statusReason

string

An explanation of the status.

Returned: when supported

Sample: "Datalake is running"

sdk_out

string

Returns the captured CDP SDK log.

Returned: when supported

sdk_out_lines

list / elements=string

Returns a list of each line of the captured CDP SDK log.

Returned: when supported
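To use these return values in a playbook, register the task result and read fields from the datalake dictionary. The sketch below is one way to do this; datalake_result is a hypothetical variable name, and fields marked "when supported" are guarded with default because they may be absent.

```yaml
- name: Create the datalake and capture the result
  cloudera.cloud.datalake:
    name: example-datalake
    state: present
    environment: an-aws-environment-name-or-crn
    instance_profile: arn:aws:iam::1111104421142:instance-profile/example-role
    storage: s3a://example-bucket/datalake/data
  register: datalake_result   # hypothetical variable name

- name: Show the CRN and status of the datalake
  ansible.builtin.debug:
    msg: "{{ datalake_result.datalake.crn }} is {{ datalake_result.datalake.status | default('unknown') }}"

- name: List the exposed endpoint URLs, if any were returned
  ansible.builtin.debug:
    msg: "{{ item.displayName }}: {{ item.serviceUrl }}"
  loop: "{{ (datalake_result.datalake.endpoints | default({})).endpoints | default([]) }}"
```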

Authors

  • Webster Mudge (@wmudge)

  • Dan Chaffelson (@chaffelson)