Migrating to google-cloud-dataproc 1.0
The 1.0 release of the google-cloud-dataproc client is a significant upgrade based on a next-gen code generator, and includes substantial interface changes. Existing code written for earlier versions of this library will likely require updates to use this version. This document describes the changes that have been made, and what you need to do to update your usage.
To summarize:
- The library has been broken out into multiple libraries. The new gems google-cloud-dataproc-v1 and google-cloud-dataproc-v1beta2 contain the actual client classes for versions V1 and V1beta2 of the Dataproc service, and the gem google-cloud-dataproc now simply provides a convenience wrapper. See Library Structure for more info.
- The library uses a new configuration mechanism giving you closer control over endpoint address, network timeouts, and retry. See Client Configuration for more info. Furthermore, when creating a client object, you can customize its configuration in a block rather than passing arguments to the constructor. See Creating Clients for more info.
- Previously, positional arguments were used to indicate required arguments. Now, all method arguments are keyword arguments, with documentation that specifies whether they are required or optional. Additionally, you can pass a proto request object instead of separate arguments. See Passing Arguments for more info.
- Previously, some client classes included class methods for constructing resource paths. These helpers are now instance methods on the client objects, and are also available in a separate paths module. See Resource Path Helpers for more info.
- Previously, clients reported RPC errors by raising instances of Google::Gax::GaxError and its subclasses. Now, RPC exceptions are of type Google::Cloud::Error and its subclasses. See Handling Errors for more info.
- Some classes have moved into different namespaces. See Class Namespaces for more info.
Library Structure
Older 0.x releases of the google-cloud-dataproc gem were all-in-one gems that included potentially multiple clients for multiple versions of the Dataproc service. Factory methods such as Google::Cloud::Dataproc::ClusterController.new would return instances of client classes such as Google::Cloud::Dataproc::V1::ClusterControllerClient or Google::Cloud::Dataproc::V1beta2::ClusterControllerClient, depending on which version of the API you requested. These classes were all defined in the same gem.
With the 1.0 release, the google-cloud-dataproc gem still provides factory methods for obtaining clients. (The method signatures have changed. See Creating Clients for details.) However, the actual client classes have been moved into separate gems, one per service version. The Google::Cloud::Dataproc::V1::ClusterController::Client class, along with its helpers and data types, is now part of the google-cloud-dataproc-v1 gem. Similarly, the Google::Cloud::Dataproc::V1beta2::ClusterController::Client class is part of the google-cloud-dataproc-v1beta2 gem.
For normal usage, you can continue to install the google-cloud-dataproc gem (which will bring in the versioned client gems as dependencies) and continue to use factory methods to create clients. However, you may alternatively choose to install only one of the versioned gems. For example, if you know you will only use V1 of the service, you can install google-cloud-dataproc-v1 by itself, and construct instances of the Google::Cloud::Dataproc::V1::ClusterController::Client client class directly.
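As a sketch of that last option, the snippet below assumes that only google-cloud-dataproc-v1 is installed and that credentials are available from the environment. The method name build_v1_cluster_controller is hypothetical, and the require path is the conventional one for versioned gems, so check it against the gem's own README.

```ruby
# Gemfile entry (assumed): gem "google-cloud-dataproc-v1"

# Hypothetical helper: builds a V1 client straight from the versioned gem,
# without going through the google-cloud-dataproc wrapper.
def build_v1_cluster_controller
  # Required lazily so this file still loads where the gem is not installed.
  require "google/cloud/dataproc/v1"
  Google::Cloud::Dataproc::V1::ClusterController::Client.new
end
```

Calling the helper returns the same kind of object that the factory method in the wrapper gem would give you for V1.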
Client Configuration
In older releases, if you wanted to customize performance parameters or low-level behavior of the client (such as credentials, timeouts, or instrumentation), you would pass a variety of keyword arguments to the client constructor. It was also extremely difficult to customize the default settings.
With the 1.0 release, a configuration interface provides control over these parameters, including defaults for all instances of a client, and settings for each specific client instance. For example, to set default credentials and timeout for all Dataproc V1 ClusterController clients:
Google::Cloud::Dataproc::V1::ClusterController::Client.configure do |config|
  config.credentials = "/path/to/credentials.json"
  config.timeout = 10.0
end
Individual RPCs can also be configured independently. For example, to set the timeout for the create_cluster call:
Google::Cloud::Dataproc::V1::ClusterController::Client.configure do |config|
  config.rpcs.create_cluster.timeout = 20.0
end
Defaults for certain configurations can be set for all Dataproc versions and services globally:
Google::Cloud::Dataproc.configure do |config|
  config.credentials = "/path/to/credentials.json"
  config.timeout = 10.0
end
Finally, you can override the configuration for each client instance. See the next section on Creating Clients for details.
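The layering described in this section (global defaults, per-client-class defaults, per-instance overrides) can be pictured with a small toy model. This is not the library's implementation. The LayeredConfig class and its set and get methods are invented for illustration, and the sketch assumes that the most specific layer wins.

```ruby
# Toy model of layered settings: each layer falls back to its parent when a
# value has not been set. This is NOT the library's implementation.
class LayeredConfig
  def initialize(parent = nil)
    @parent = parent
    @values = {}
  end

  def set(key, value)
    @values[key] = value
    self
  end

  # Returns the nearest explicitly set value, walking up the parent chain.
  def get(key)
    return @values[key] if @values.key?(key)
    @parent&.get(key)
  end
end

# Plays the role of Google::Cloud::Dataproc.configure
global_defaults = LayeredConfig.new.set(:timeout, 10.0)
# Plays the role of ...::ClusterController::Client.configure
class_defaults = LayeredConfig.new(global_defaults)
# Plays the role of the block passed when creating one client
instance_config = LayeredConfig.new(class_defaults)

class_defaults.set(:credentials, "/path/to/credentials.json")
instance_config.set(:timeout, 30.0)

puts instance_config.get(:timeout)     # set on the instance layer
puts instance_config.get(:credentials) # inherited from the class layer
puts class_defaults.get(:timeout)      # inherited from the global layer
```

Under this model, a value set on one client never leaks into other clients, while a change to a shared layer is seen by every layer below it that has not set its own value.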
Creating Clients
In older releases, to create a client object, you would use the new method of modules under Google::Cloud::Dataproc. For example, you might call Google::Cloud::Dataproc::ClusterController.new. Keyword arguments were available to select a service version and to configure parameters such as credentials and timeouts.
With the 1.0 release, use named class methods of Google::Cloud::Dataproc to create a client object. For example, call Google::Cloud::Dataproc.cluster_controller. You may select a service version using the :version keyword argument. However, other configuration parameters should be set in a configuration block when you create the client.
Old:
client = Google::Cloud::Dataproc::ClusterController.new credentials: "/path/to/credentials.json"
New:
client = Google::Cloud::Dataproc.cluster_controller do |config|
  config.credentials = "/path/to/credentials.json"
end
The configuration block is optional. If you do not provide it, or you do not set some configuration parameters, then the default configuration is used. See Client Configuration.
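If existing code keeps its constructor options in a hash, one possible bridge is a small helper that copies each entry onto the configuration object inside the block. The helper name apply_settings and the FakeConfig stand-in are hypothetical. The sketch assumes the real configuration object exposes a setter for each option, as the credentials and timeout examples above suggest.

```ruby
# Copies each entry of a settings hash onto a configuration object by calling
# the matching setter (for example, :timeout becomes config.timeout = value).
def apply_settings(config, settings)
  settings.each do |name, value|
    setter = "#{name}="
    raise ArgumentError, "unknown setting: #{name}" unless config.respond_to?(setter)
    config.public_send(setter, value)
  end
  config
end

# Stand-in for the real configuration object, used only so that this sketch
# runs without the gem installed.
FakeConfig = Struct.new(:credentials, :timeout)

legacy_options = { credentials: "/path/to/credentials.json", timeout: 10.0 }
config = apply_settings(FakeConfig.new, legacy_options)
puts config.timeout

# With the real library, the helper would be called inside the block:
#   client = Google::Cloud::Dataproc.cluster_controller do |config|
#     apply_settings config, legacy_options
#   end
```

Raising on unknown keys is a deliberate choice here: an old option that has no counterpart in the new configuration is reported instead of being silently dropped.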
Passing Arguments
In older releases, required arguments would be passed as positional method arguments, while most optional arguments would be passed as keyword arguments.
With the 1.0 release, all RPC arguments are passed as keyword arguments, regardless of whether they are required or optional. For example:
Old:
client = Google::Cloud::Dataproc::ClusterController.new
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
# Arguments are positional
response = client.get_cluster project_id, region, cluster_name
New:
client = Google::Cloud::Dataproc.cluster_controller
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
# All arguments are keyword arguments
response = client.get_cluster project_id: project_id, region: region,
                              cluster_name: cluster_name
In the 1.0 release, it is also possible to pass a request object, either as a hash or as a protocol buffer.
New:
client = Google::Cloud::Dataproc.cluster_controller
request = Google::Cloud::Dataproc::V1::GetClusterRequest.new(
  project_id: "my-project",
  region: "us-central1",
  cluster_name: "my_cluster"
)
# Pass a request object as a positional argument:
response = client.get_cluster request
Finally, in older releases, to provide call options, you would pass a Google::Gax::CallOptions object with the :options keyword argument. In the 1.0 release, pass call options using a second set of keyword arguments.
Old:
client = Google::Cloud::Dataproc::ClusterController.new
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
options = Google::Gax::CallOptions.new timeout: 10.0
response = client.get_cluster project_id, region, cluster_name, options: options
New:
client = Google::Cloud::Dataproc.cluster_controller
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
# Use a hash to wrap the normal call arguments (or pass a request object), and
# then add further keyword arguments for the call options.
response = client.get_cluster(
  { project_id: project_id, region: region, cluster_name: cluster_name },
  timeout: 10.0)
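The reason the request has to be wrapped in braces comes from how Ruby binds arguments. The stand-in below is not the generated client. It only assumes that an RPC method takes a request followed by an optional second argument for call options, and it returns what it received so the binding is visible.

```ruby
# Stand-in with the assumed shape of a generated RPC method: one request
# argument plus an optional second argument for call options.
def get_cluster(request, options = nil)
  { request: request, options: options }
end

# Bare keywords are collected into the request hash; no call options are set.
plain = get_cluster(project_id: "my-project", region: "us-central1")

# Braces end the request hash, so the trailing keywords become call options.
with_options = get_cluster({ project_id: "my-project", region: "us-central1" },
                           timeout: 10.0)

# Without braces, timeout is folded into the request instead.
folded = get_cluster(project_id: "my-project", region: "us-central1",
                     timeout: 10.0)

p plain[:options]
p with_options[:options]
p folded[:request].key?(:timeout)
```

In the third call the timeout never reaches the call options, which is why the hash wrapper (or a request object) is needed whenever call options are passed.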
Resource Path Helpers
The client library includes helper methods for generating the resource path strings passed to many calls. These helpers have changed in two ways:
- In older releases, they were class methods on the client class. In the 1.0 release, they are instance methods on the client. They are also available on a separate paths module that you can include elsewhere for convenience.
- In older releases, arguments to a resource path helper are passed as positional arguments. In the 1.0 release, they are passed as named keyword arguments. Some helpers also support different sets of arguments, each set corresponding to a different type of path.
The following example uses a resource path helper.
Old:
client = Google::Cloud::Dataproc::WorkflowTemplateService.new
# Call the helper on the client class
name = Google::Cloud::Dataproc::V1::WorkflowTemplateServiceClient.
  workflow_template_path("my-project", "us-central1", "my-template")
response = client.get_workflow_template name
New:
client = Google::Cloud::Dataproc.workflow_template_service
# Call the helper on the client instance, and use keyword arguments
name = client.workflow_template_path project: "my-project",
                                     region: "us-central1",
                                     workflow_template: "my-template"
response = client.get_workflow_template name: name
Because helpers take keyword arguments, some can now generate several different variations on the path that were not available under earlier versions of the library. For example, workflow_template_path can generate paths with either a region or location as the parent resource.
New:
client = Google::Cloud::Dataproc.workflow_template_service
# Create paths with different parent resource types
name1 = client.workflow_template_path project: "my-project",
                                      region: "us-central1",
                                      workflow_template: "my-template"
# => "projects/my-project/regions/us-central1/workflowTemplates/my-template"
name2 = client.workflow_template_path project: "my-project",
                                      location: "my-location",
                                      workflow_template: "my-template"
# => "projects/my-project/locations/my-location/workflowTemplates/my-template"
Finally, in the 1.0 client, you can also use the paths module as a convenience module.
New:
# Bring the path helper methods into the current class
include Google::Cloud::Dataproc::V1::WorkflowTemplateService::Paths
def foo
  client = Google::Cloud::Dataproc.workflow_template_service
  # Call the included helper method
  name = workflow_template_path project: "my-project",
                                location: "my-location",
                                workflow_template: "my-template"
  response = client.get_workflow_template name: name
  # Do something with response...
end
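To make the idea of keyword-selected path variants concrete, here is a self-contained sketch of a helper that chooses the path form from whichever parent keyword is supplied. The method name template_path_sketch is hypothetical and the body is an illustration, not the code shipped in the gem. The two path formats are the ones shown in the examples in this section.

```ruby
# Illustrative stand-in for a keyword-based path helper. It picks the path
# variant from whichever parent keyword (region or location) was supplied.
def template_path_sketch(project:, workflow_template:, region: nil, location: nil)
  if region && location
    raise ArgumentError, "pass either region: or location:, not both"
  elsif region
    "projects/#{project}/regions/#{region}/workflowTemplates/#{workflow_template}"
  elsif location
    "projects/#{project}/locations/#{location}/workflowTemplates/#{workflow_template}"
  else
    raise ArgumentError, "one of region: or location: is required"
  end
end

puts template_path_sketch(project: "my-project", region: "us-central1",
                          workflow_template: "my-template")
puts template_path_sketch(project: "my-project", location: "my-location",
                          workflow_template: "my-template")
```

Positional arguments could not express this choice cleanly, since the second argument would be ambiguous between a region and a location.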
Handling Errors
The client reports standard gRPC error codes by raising exceptions. In older releases, these exceptions were located in the Google::Gax namespace and were subclasses of the Google::Gax::GaxError base exception class, defined in the google-gax gem. However, these classes were different from the standard exceptions (subclasses of Google::Cloud::Error) raised by other client libraries such as google-cloud-storage.
The 1.0 client library now uses the Google::Cloud::Error exception hierarchy, for consistency across all the Google Cloud client libraries. In general, these exceptions have the same name as their counterparts from older releases, but are located in the Google::Cloud namespace rather than the Google::Gax namespace.
Old:
client = Google::Cloud::Dataproc::ClusterController.new
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
begin
  response = client.get_cluster project_id, region, cluster_name
rescue Google::Gax::GaxError => e
  # Handle exceptions that subclass Google::Gax::GaxError
end
New:
client = Google::Cloud::Dataproc.cluster_controller
project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"
begin
  response = client.get_cluster project_id: project_id, region: region,
                                cluster_name: cluster_name
rescue Google::Cloud::Error => e
  # Handle exceptions that subclass Google::Cloud::Error
end
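Because the new exceptions form a hierarchy, a rescue clause for a specific subclass can be combined with a catch-all for the base class. The sketch below uses stand-in classes in a hypothetical ErrorSketch module so that it runs without the gem. In real code the corresponding classes would be Google::Cloud::Error and subclasses such as Google::Cloud::NotFoundError.

```ruby
# Stand-ins mirroring the shape of the hierarchy: a base error class with
# more specific subclasses.
module ErrorSketch
  class Error < StandardError; end
  class NotFoundError < Error; end
  class PermissionDeniedError < Error; end
end

# Runs the block and maps failures to symbols. The specific rescue comes
# before the base-class rescue because Ruby uses the first matching clause.
def classify_failure
  yield
  :ok
rescue ErrorSketch::NotFoundError
  :missing
rescue ErrorSketch::Error
  :other_rpc_error
end

puts classify_failure { raise ErrorSketch::NotFoundError, "no such cluster" }
puts classify_failure { raise ErrorSketch::PermissionDeniedError, "denied" }
puts classify_failure { :fine }
```

If the two rescue clauses were swapped, the base-class clause would match first and the not-found case would never be handled separately.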
Class Namespaces
In older releases, client objects were instances of classes with names like Google::Cloud::Dataproc::V1::ClusterControllerClient. In the 1.0 release, the client object is of a different class: Google::Cloud::Dataproc::V1::ClusterController::Client. Note that most users will use factory methods such as Google::Cloud::Dataproc.cluster_controller to create instances of the client object, so you may not need to reference the actual class directly. See Creating Clients.
In older releases, the credentials object was of class Google::Cloud::Dataproc::V1::Credentials. In the 1.0 release, each service has its own credentials class, e.g. Google::Cloud::Dataproc::V1::ClusterController::Credentials. Again, most users will not need to reference this class directly. See Client Configuration.