google_dataproc_cluster resource

Syntax

A google_dataproc_cluster is used to test a Google Cloud Dataproc cluster resource.

Beta Resource

This resource has beta fields available. To retrieve these fields, include beta: true in the constructor for the resource.
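
As a minimal sketch of passing beta: true, the plain Ruby below builds the constructor arguments as a hash so they can be shared across several describe blocks. The helper name cluster_params is hypothetical and not part of InSpec. The commented usage assumes the resource accepts a single hash of parameters, which is equivalent to the keyword-style call used in the examples on this page.

```ruby
# Hypothetical helper (not part of InSpec): builds the argument hash for
# google_dataproc_cluster, adding beta: true only when it is requested.
def cluster_params(project:, region:, cluster_name:, beta: false)
  params = { project: project, region: region, cluster_name: cluster_name }
  params[:beta] = true if beta
  params
end

params = cluster_params(
  project: 'chef-gcp-inspec',
  region: 'europe-west2',
  cluster_name: 'inspec-dataproc-cluster',
  beta: true
)

# Inside an InSpec profile, the hash would then be passed to the resource
# (assumption: the constructor takes one hash of parameters):
#
#   describe google_dataproc_cluster(params) do
#     it { should exist }
#   end
```

Leaving beta at its default of false omits the key entirely, so the same helper serves controls that only need the generally available fields.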

Examples

describe google_dataproc_cluster(project: 'chef-gcp-inspec', region: 'europe-west2', cluster_name: 'inspec-dataproc-cluster') do
  it { should exist }
  its('labels') { should include('label' => 'value') }
  its('config.master_config.num_instances') { should cmp '1' }
  its('config.worker_config.num_instances') { should cmp '2' }
  its('config.master_config.machine_type_uri') { should match 'n1-standard-1' }
  its('config.worker_config.machine_type_uri') { should match 'n1-standard-1' }
  its('config.software_config.properties') { should include('dataproc:dataproc.allow.zero.workers' => 'true') }
end

describe google_dataproc_cluster(project: 'chef-gcp-inspec', region: 'europe-west2', cluster_name: 'nonexistent') do
  it { should_not exist }
end

Properties

Properties that can be accessed from the google_dataproc_cluster resource:

cluster_name
The name of the cluster, unique within the project and region.
labels
Labels to apply to this cluster. A list of key->value pairs.
config
Configuration for the cluster
config_bucket
The Cloud Storage staging bucket used to stage files, such as Hadoop jars, between client machines and the cluster.
gce_cluster_config
Common config settings for resources of Google Compute Engine cluster instances, applicable to all instances in the cluster.
zone_uri
The zone where the Compute Engine cluster will be located
network_uri
The Compute Engine network to be used for machine communications
subnetwork_uri
The Compute Engine subnetwork to be used for machine communications
internal_ip_only
If true, all instances in the cluster will only have internal IP addresses
service_account_scopes
The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included:

  • https://www.googleapis.com/auth/cloud.useraccounts.readonly
  • https://www.googleapis.com/auth/devstorage.read_write
  • https://www.googleapis.com/auth/logging.write
tags
The Compute Engine tags to add to all instances
metadata
The map of metadata entries to add to all instances
master_config
The config settings for Compute Engine resources in an instance group, such as a master or worker group.
num_instances
The number of VM instances in the instance group. For master instance groups, must be set to 1.
instance_names
The list of instance names.
image_uri
The Compute Engine image resource used for cluster instances.
machine_type_uri
The Compute Engine machine type used for cluster instances
disk_config
Disk option config settings
boot_disk_type
Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”
boot_disk_size_gb
Size in GB of the boot disk.
num_local_ssds
Number of attached SSDs, from 0 to 4.
is_preemptible
Specifies if this instance group contains preemptible instances.
managed_group_config
The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
instance_template_name
The name of the Instance Template used for the Managed Instance Group.
instance_group_manager_name
The name of the Instance Group Manager for this group
worker_config
The config settings for Compute Engine resources in an instance group, such as a master or worker group.
num_instances
The number of VM instances in the instance group. For master instance groups, must be set to 1.
instance_names
The list of instance names.
image_uri
The Compute Engine image resource used for cluster instances.
machine_type_uri
The Compute Engine machine type used for cluster instances
disk_config
Disk option config settings
boot_disk_type
Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”
boot_disk_size_gb
Size in GB of the boot disk.
num_local_ssds
Number of attached SSDs, from 0 to 4.
is_preemptible
Specifies if this instance group contains preemptible instances.
managed_group_config
The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
instance_template_name
The name of the Instance Template used for the Managed Instance Group.
instance_group_manager_name
The name of the Instance Group Manager for this group
secondary_worker_config
The config settings for Compute Engine resources in an instance group, such as a master or worker group.
num_instances
The number of VM instances in the instance group. For master instance groups, must be set to 1.
instance_names
The list of instance names.
image_uri
The Compute Engine image resource used for cluster instances.
machine_type_uri
The Compute Engine machine type used for cluster instances
disk_config
Disk option config settings
boot_disk_type
Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”
boot_disk_size_gb
Size in GB of the boot disk.
num_local_ssds
Number of attached SSDs, from 0 to 4.
is_preemptible
Specifies if this instance group contains preemptible instances.
managed_group_config
The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
instance_template_name
The name of the Instance Template used for the Managed Instance Group.
instance_group_manager_name
The name of the Instance Group Manager for this group
software_config
Specifies the selection and config of software inside the cluster
image_version
The version of software inside the cluster. It must be one of the supported Cloud Dataproc Versions, such as “1.2” (including a subminor version, such as “1.2.29”), or the “preview” version.
properties
The properties to set on daemon config files. Property keys are specified in the prefix:property format, for example core:hadoop.tmp.dir
optional_components
The set of optional components to activate on the cluster.

Possible values:

  • COMPONENT_UNSPECIFIED
  • ANACONDA
  • HBASE
  • RANGER
  • SOLR
  • HIVE_WEBHCAT
  • JUPYTER
  • ZEPPELIN
initialization_actions
Specifies an executable to run on a fully configured node and a timeout period for executable completion.
executable_file
Cloud Storage URI of the executable file
execution_timeout
Amount of time executable has to complete
encryption_config
Encryption settings for the cluster.
gce_pd_kms_key_name
The Cloud KMS key name to use for PD disk encryption for all instances in the cluster.
security_config
Kerberos config holder.
kerberos_config
Kerberos related configuration.
enable_kerberos
Flag to indicate whether to Kerberize the cluster.
rootprincipal_password_uri
The Cloud Storage URI of a KMS encrypted file containing the root principal password.
kms_key_uri
The URI of the KMS key used to encrypt various sensitive files.
keystore_uri
The Cloud Storage URI of the keystore file used for SSL encryption.
truststore_uri
The Cloud Storage URI of the truststore file used for SSL encryption.
key_password_uri
The Cloud Storage URI of a KMS encrypted file containing the password to the user provided key.
truststore_password_uri
The Cloud Storage URI of a KMS encrypted file containing the password to the user provided truststore.
cross_realm_trust_realm
The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross realm trust.
cross_realm_trust_admin_server
The admin server (IP or hostname) for the remote trusted realm in a cross realm trust relationship.
cross_realm_trust_shared_password_uri
The Cloud Storage URI of a KMS encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross realm trust relationship.
kdc_db_key_uri
The Cloud Storage URI of a KMS encrypted file containing the master key of the KDC database.
tgt_lifetime_hours
The lifetime of the ticket granting ticket, in hours.
realm
The name of the on-cluster Kerberos realm.
region
The region in which the cluster and associated nodes will be created.
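
The prefix:property key format described above for config.software_config.properties can be illustrated with a short sketch. The helper split_property_key is hypothetical and not part of InSpec or the Dataproc API. It assumes only the first colon separates the prefix from the property name.

```ruby
# Hypothetical helper: splits a Dataproc property key of the form
# "prefix:property" into its two parts. Only the first colon is treated
# as the separator, so any later colons stay in the property name.
def split_property_key(key)
  prefix, property = key.split(':', 2)
  if property.nil? || prefix.empty? || property.empty?
    raise ArgumentError, "expected prefix:property, got #{key.inspect}"
  end
  [prefix, property]
end

split_property_key('core:hadoop.tmp.dir')
# => ["core", "hadoop.tmp.dir"]

split_property_key('dataproc:dataproc.allow.zero.workers')
# => ["dataproc", "dataproc.allow.zero.workers"]
```

In a control, the full key (prefix included) is what appears in the properties hash, as in the 'dataproc:dataproc.allow.zero.workers' example earlier on this page.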

GCP Permissions

Ensure the Cloud Dataproc API is enabled for the current project.