Class: Google::Cloud::Bigquery::Model

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/model.rb,
lib/google/cloud/bigquery/model/list.rb

Overview

Model

A model in BigQuery ML represents what an ML system has learned from the training data.

The following types of models are supported by BigQuery ML:

  • Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
  • Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
  • Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross entropy loss function.
  • K-means clustering for data segmentation (beta); for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor split data for training or evaluation.

In BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and for prediction.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model"

Defined Under Namespace

Classes: List

Instance Method Details

#created_at ⇒ Time?

The time when this model was created.

Returns:

  • (Time, nil)

    The creation time, or nil if the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 241

def created_at
  return nil if reference?
  Convert.millis_to_time @gapi_json[:creationTime]
end
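
The source above converts the stored creation timestamp with an internal helper. A minimal sketch of such a conversion, assuming the value is milliseconds since the Unix epoch and may arrive as an Integer or a numeric String; `millis_to_time` here is a hypothetical stand-in, and the library's own helper may differ in detail:

```ruby
# Hypothetical stand-in for the library's internal conversion helper.
# Accepts an Integer or numeric String of epoch milliseconds; returns nil for nil.
def millis_to_time millis
  return nil if millis.nil?
  # Rational keeps the millisecond fraction exact instead of going through a Float.
  Time.at Rational(Integer(millis), 1000)
end

created = millis_to_time "1500000000123"
puts created.utc.strftime("%Y-%m-%d %H:%M:%S.%L")
```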

#dataset_id ⇒ String

The ID of the Dataset containing this model.

Returns:

  • (String)

    The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.



# File 'lib/google/cloud/bigquery/model.rb', line 108

def dataset_id
  return @reference.dataset_id if reference?
  @gapi_json[:modelReference][:datasetId]
end
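
The ID rules stated above (letters, numbers, and underscores, up to 1,024 characters) can be checked locally before calling the service. A minimal sketch; `valid_resource_id?` is a hypothetical helper, and the minimum length of one character is an assumption:

```ruby
# Hypothetical check for the documented ID rules: letters, numbers, and
# underscores only, with a maximum length of 1,024 characters.
# Rejecting the empty string is an assumption, not stated in the docs above.
def valid_resource_id? id
  /\A[A-Za-z0-9_]{1,1024}\z/.match? id.to_s
end

puts valid_resource_id?("my_dataset")
puts valid_resource_id?("my-dataset")
```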

#delete ⇒ Boolean

Permanently deletes the model.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.delete

Returns:

  • (Boolean)

    Returns true if the model was deleted.



# File 'lib/google/cloud/bigquery/model.rb', line 646

def delete
  ensure_service!
  service.delete_model dataset_id, model_id
  # Set flag for #exists?
  @exists = false
  true
end

#description ⇒ String?

A user-friendly description of the model.

Returns:

  • (String, nil)

    The description, or nil if the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 211

def description
  return nil if reference?
  ensure_full_data!
  @gapi_json[:description]
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the model.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_description (String)

    The new user-friendly description.



# File 'lib/google/cloud/bigquery/model.rb', line 228

def description= new_description
  ensure_full_data!
  patch_gapi! description: new_description
end
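
The update above relies on ETag-based optimistic concurrency control: the full representation is read first so that the write can be rejected if the resource changed in the meantime. A minimal in-memory sketch of that idea; `FakeService`, its methods, and the `"v1"` style ETags are all hypothetical and are not the BigQuery client API:

```ruby
# In-memory stand-in for a service that stores one resource with an ETag.
# All names here are hypothetical; this is not the BigQuery client API.
class FakeService
  StaleError = Class.new StandardError

  def initialize
    @resource = { etag: "v1", description: nil }
  end

  def get
    @resource.dup
  end

  # Applies the patch only if the caller saw the latest version.
  def patch(fields, if_match:)
    raise StaleError, "etag mismatch" unless if_match == @resource[:etag]
    version = @resource[:etag].delete("v").to_i + 1
    @resource = @resource.merge(fields).merge(etag: "v#{version}")
    @resource.dup
  end
end

service = FakeService.new
full = service.get # fetch the full representation first
updated = service.patch({ description: "Sales forecast" }, if_match: full[:etag])
puts updated[:etag]

begin
  # Reusing the old ETag fails because the resource has changed since it was read.
  service.patch({ description: "Stale write" }, if_match: full[:etag])
rescue FakeService::StaleError => e
  puts "rejected: #{e.message}"
end
```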

#encryption ⇒ EncryptionConfiguration?

The EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.

Present only if this model is using custom encryption.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

encrypt_config = model.encryption

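Returns:

  • (EncryptionConfiguration, nil)

    The custom encryption configuration, or nil if none is set or the object is a reference (see #reference?).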



# File 'lib/google/cloud/bigquery/model.rb', line 399

def encryption
  return nil if reference?
  return nil if @gapi_json[:encryptionConfiguration].nil?
  # We have to create a gapic object from the hash because that is what
  # EncryptionConfiguration is expecting.
  json_cmek = @gapi_json[:encryptionConfiguration].to_json
  gapi_cmek = Google::Apis::BigqueryV2::EncryptionConfiguration.from_json json_cmek
  EncryptionConfiguration.from_gapi(gapi_cmek).freeze
end

#encryption=(value) ⇒ Object

Set the EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.

Present only if this model is using custom encryption.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

model.encryption = encrypt_config

Parameters:

  • value (EncryptionConfiguration)

    The custom encryption configuration to apply to this model.



# File 'lib/google/cloud/bigquery/model.rb', line 439

def encryption= value
  ensure_full_data!
  # We have to create a hash from the gapic object's JSON because that
  # is what Model is expecting.
  json_cmek = JSON.parse value.to_gapi.to_json, symbolize_names: true
  patch_gapi! encryptionConfiguration: json_cmek
end
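
The setter above turns the configuration object into a symbol-keyed hash by round-tripping it through JSON. A minimal sketch of that round trip using only the standard library; `symbolize_via_json` is a hypothetical helper, and the plain Hash with a `kmsKeyName` key is an illustrative stand-in for the serialized configuration object:

```ruby
require "json"

# Hypothetical helper: serializes an object to JSON and parses it back
# with symbol keys, mirroring the round trip in the source above.
def symbolize_via_json object
  JSON.parse object.to_json, symbolize_names: true
end

# A plain Hash stands in for the serialized configuration object.
string_keyed = { "kmsKeyName" => "projects/a/locations/b/keyRings/c/cryptoKeys/d" }
puts symbolize_via_json(string_keyed).keys.inspect
```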

#etag ⇒ String?

The ETag hash of the model.

Returns:

  • (String, nil)

    The ETag hash, or nil if the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 197

def etag
  return nil if reference?
  ensure_full_data!
  @gapi_json[:etag]
end

#exists?(force: false) ⇒ Boolean

Determines whether the model exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true
model.exists? #=> true

Parameters:

  • force (Boolean) (defaults to: false)

    Force the latest resource representation to be retrieved from the BigQuery service when true. Otherwise the return value of this method will be memoized to reduce the number of API calls made to the BigQuery service. The default is false.

Returns:

  • (Boolean)

    true when the model exists in the BigQuery service, false otherwise.



# File 'lib/google/cloud/bigquery/model.rb', line 704

def exists? force: false
  return resource_exists? if force
  # If we have a value, return it
  return @exists unless @exists.nil?
  # Always true if we have a gapi_json object
  return true if resource?
  resource_exists?
end
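
The caching behavior described above can be illustrated without the service. A minimal sketch, assuming a hypothetical `CachedExistence` class whose lookup block stands in for the API call; it counts lookups to show when the cache is used and when force bypasses it:

```ruby
# Stand-in object that mimics the caching decisions in the source above.
# The lookup block and the call counter are hypothetical.
class CachedExistence
  attr_reader :lookups

  def initialize &lookup
    @lookup = lookup
    @exists = nil
    @lookups = 0
  end

  def exists? force: false
    return fetch if force # force: true always asks the service
    return @exists unless @exists.nil?
    fetch
  end

  private

  # Performs the (simulated) service call and remembers the answer.
  def fetch
    @lookups += 1
    @exists = @lookup.call
  end
end

model = CachedExistence.new { true }
model.exists?             # first call performs a lookup
model.exists?             # answered from the cache
model.exists? force: true # bypasses the cache
puts model.lookups
```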

#expires_at ⇒ Time?

The time when this model expires. If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.

Returns:

  • (Time, nil)

    The expiration time, or nil if not present or the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 269

def expires_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi_json[:expirationTime]
end

#expires_at=(new_expires_at) ⇒ Object

Updates the time when this model expires.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_expires_at (Integer)

    The new time when this model expires.



# File 'lib/google/cloud/bigquery/model.rb', line 286

def expires_at= new_expires_at
  ensure_full_data!
  new_expires_millis = Convert.time_to_millis new_expires_at
  patch_gapi! expirationTime: new_expires_millis
end
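
The setter above converts the new expiration to milliseconds before patching the model. A minimal sketch of that conversion, assuming the value is given as a Time; `time_to_millis` is a hypothetical stand-in for the library's internal helper, and the one-week span is an arbitrary example:

```ruby
# Hypothetical stand-in for the library's internal conversion helper.
# Returns whole milliseconds since the Unix epoch, or nil for nil.
def time_to_millis time
  return nil if time.nil?
  (time.to_i * 1000) + (time.nsec / 1_000_000)
end

# Expire the model one week from now (an arbitrary example span).
one_week = 7 * 24 * 60 * 60
new_expires_at = Time.now + one_week
puts time_to_millis(new_expires_at)
```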

#extract(extract_url, format: nil) {|job| ... } ⇒ Boolean

Exports the model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.extract "gs://my-bucket/#{model.model_id}"

Parameters:

  • extract_url (String)

    The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.

  • format (String) (defaults to: nil)

    The exported file format. The default value is ml_tf_saved_model.

    The following values are supported:

    • ml_tf_saved_model - TensorFlow SavedModel
    • ml_xgboost_booster - XGBoost Booster

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (ExtractJob::Updater)

    The job configuration object passed to the block.

Returns:

  • (Boolean)

    Returns true if the extract operation succeeded.

# File 'lib/google/cloud/bigquery/model.rb', line 623

def extract extract_url, format: nil, &block
  job = extract_job extract_url, format: format, &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end
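
The synchronous method above is a blocking wrapper around an asynchronous job. A minimal sketch of the underlying polling pattern, using a hypothetical `StubJob` that only imitates the reload!/done? contract; the polling interval and the poll limit are assumptions, and the library's own wait logic may differ:

```ruby
# Stub job that reports completion after a fixed number of reloads.
# It only imitates a reload!/done? polling contract; it is not a real job.
class StubJob
  attr_reader :reloads

  def initialize polls_needed
    @polls_needed = polls_needed
    @reloads = 0
  end

  def reload!
    @reloads += 1
    self
  end

  def done?
    @reloads >= @polls_needed
  end
end

# Blocks until the job is done or the poll limit is reached.
# Returns true if the job finished, false otherwise.
def wait_until_done job, interval: 0.01, max_polls: 100
  max_polls.times do
    return true if job.done?
    sleep interval
    job.reload!
  end
  job.done?
end

job = StubJob.new 3
puts wait_until_done(job)
puts job.reloads
```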

#extract_job(extract_url, format: nil, job_id: nil, prefix: nil, labels: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::ExtractJob

Exports the model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"

extract_job.wait_until_done!
extract_job.done? #=> true

Parameters:

  • extract_url (String)

    The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.

  • format (String) (defaults to: nil)

    The exported file format. The default value is ml_tf_saved_model.

    The following values are supported:

    • ml_tf_saved_model - TensorFlow SavedModel
    • ml_xgboost_booster - XGBoost Booster
  • job_id (String) (defaults to: nil)

    A user-defined ID for the extract job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String) (defaults to: nil)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash) (defaults to: nil)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

    The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.

Yields:

  • (job)

    a job configuration object

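Yield Parameters:

  • job (ExtractJob::Updater)

    The job configuration object passed to the block.

Returns:

  • (Google::Cloud::Bigquery::ExtractJob)

    The job that tracks the export.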



# File 'lib/google/cloud/bigquery/model.rb', line 569

def extract_job extract_url, format: nil, job_id: nil, prefix: nil, labels: nil
  ensure_service!
  options = { format: format, job_id: job_id, prefix: prefix, labels: labels }
  updater = ExtractJob::Updater.from_options service, model_ref, extract_url, options
  updater.location = location if location # may be model reference

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.extract_table job_gapi
  Job.from_gapi gapi, service
end
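
The job_id and prefix parameters above interact: an explicit job_id is used as given, otherwise the prefix is prepended to a generated value. A minimal sketch of that rule together with the documented character and length limits; `resolve_job_id` is a hypothetical helper, and the use of SecureRandom for the generated part is an assumption:

```ruby
require "securerandom"

# Hypothetical helper: an explicit job_id wins; otherwise a random suffix is
# appended to the prefix. The library's own generator may differ.
def resolve_job_id job_id: nil, prefix: nil
  # Letters, numbers, underscores, or dashes; at most 1,024 characters.
  pattern = /\A[A-Za-z0-9_-]{1,1024}\z/
  id = job_id || "#{prefix}#{SecureRandom.urlsafe_base64(21)}"
  raise ArgumentError, "invalid job ID: #{id}" unless pattern.match? id
  id
end

puts resolve_job_id(prefix: "daily_import_job_")
puts resolve_job_id(job_id: "my_job_1", prefix: "ignored_")
```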

#feature_columns ⇒ Array<StandardSql::Field>

The input feature columns that were used to train this model.

Returns:

  • (Array<StandardSql::Field>)



# File 'lib/google/cloud/bigquery/model.rb', line 454

def feature_columns
  ensure_full_data!
  Array(@gapi_json[:featureColumns]).map do |field_gapi_json|
    field_gapi = Google::Apis::BigqueryV2::StandardSqlField.from_json field_gapi_json.to_json
    StandardSql::Field.from_gapi field_gapi
  end
end

#label_columns ⇒ Array<StandardSql::Field>

The label columns that were used to train this model. The output of the model will have a "predicted_" prefix to these columns.

Returns:

  • (Array<StandardSql::Field>)



# File 'lib/google/cloud/bigquery/model.rb', line 470

def label_columns
  ensure_full_data!
  Array(@gapi_json[:labelColumns]).map do |field_gapi_json|
    field_gapi = Google::Apis::BigqueryV2::StandardSqlField.from_json field_gapi_json.to_json
    StandardSql::Field.from_gapi field_gapi
  end
end

#labels ⇒ Hash<String, String>?

A hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

labels = model.labels

Returns:

  • (Hash<String, String>, nil)

    A hash containing key/value pairs.



# File 'lib/google/cloud/bigquery/model.rb', line 327

def labels
  return nil if reference?
  m = @gapi_json[:labels]
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end
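
Because the returned hash is frozen, labels are changed by building a new hash and assigning it as a whole. A minimal sketch with a plain frozen Hash standing in for the return value; `with_label` is a hypothetical helper, not part of the library:

```ruby
# Returns a new hash with one label added or replaced; the frozen input is untouched.
# Hypothetical helper, not part of the library.
def with_label labels, key, value
  labels.merge key => value
end

current = { "env" => "production" }.freeze # stands in for model.labels

begin
  current["team"] = "ml" # in-place changes are rejected
rescue FrozenError => e
  puts "cannot modify: #{e.class}"
end

updated = with_label current, "team", "ml" # the value to pass to #labels=
puts updated.length
```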

#labels=(new_labels) ⇒ Object

Updates the hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.labels = { "env" => "production" }

Parameters:

  • new_labels (Hash<String, String>)

    A hash containing key/value pairs. The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.


# File 'lib/google/cloud/bigquery/model.rb', line 369

def labels= new_labels
  ensure_full_data!
  patch_gapi! labels: new_labels
end
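
The label requirements listed above can be checked locally before the hash is assigned. A minimal sketch; `label_problems` is a hypothetical validator, and reading "lowercase letters, numeric characters, and international characters" as the Unicode classes Ll, Lo, and N is an assumption:

```ruby
# Hypothetical validator for the label requirements listed above.
# Returns an array of problems; an empty array means the hash is acceptable.
def label_problems labels
  # Keys: 1 to 63 characters, starting with a lowercase or international letter.
  key_pattern = /\A[\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{N}_-]{0,62}\z/
  # Values: 0 to 63 characters from the same character set.
  value_pattern = /\A[\p{Ll}\p{Lo}\p{N}_-]{0,63}\z/
  problems = []
  problems << "more than 64 labels" if labels.size > 64
  labels.each do |key, value|
    problems << "invalid key: #{key.inspect}" unless key_pattern.match? key.to_s
    problems << "invalid value for #{key.inspect}" unless value_pattern.match? value.to_s
  end
  problems
end

puts label_problems({ "env" => "production" }).empty?
puts label_problems({ "Env" => "production", "team" => "" }).length
```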

#location ⇒ String?

The geographic location where the model should reside. Possible values include EU and US. The default value is US.

Returns:

  • (String, nil)

    The location code.



# File 'lib/google/cloud/bigquery/model.rb', line 300

def location
  return nil if reference?
  ensure_full_data!
  @gapi_json[:location]
end

#model_id ⇒ String

A unique ID for this model.

Returns:

  • (String)

    The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.



# File 'lib/google/cloud/bigquery/model.rb', line 95

def model_id
  return @reference.model_id if reference?
  @gapi_json[:modelReference][:modelId]
end

#model_type ⇒ String?

Type of the model resource. Expected to be one of the following:

  • LINEAR_REGRESSION - Linear regression model.
  • LOGISTIC_REGRESSION - Logistic regression based classification model.
  • KMEANS - K-means clustering model (beta).
  • TENSORFLOW - An imported TensorFlow model (beta).

Returns:

  • (String, nil)

    The model type, or nil if the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 154

def model_type
  return nil if reference?
  @gapi_json[:modelType]
end

#modified_at ⇒ Time?

The time when this model was last modified.

Returns:

  • (Time, nil)

    The last modified time, or nil if not present or the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 254

def modified_at
  return nil if reference?
  Convert.millis_to_time @gapi_json[:lastModifiedTime]
end

#name ⇒ String?

The name of the model.

Returns:

  • (String, nil)

    The friendly name, or nil if the object is a reference (see #reference?).



# File 'lib/google/cloud/bigquery/model.rb', line 167

def name
  return nil if reference?
  ensure_full_data!
  @gapi_json[:friendlyName]
end

#name=(new_name) ⇒ Object

Updates the name of the model.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_name (String)

    The new friendly name.



# File 'lib/google/cloud/bigquery/model.rb', line 184

def name= new_name
  ensure_full_data!
  patch_gapi! friendlyName: new_name
end

#project_id ⇒ String

The ID of the Project containing this model.

Returns:

  • (String)

    The project ID.



# File 'lib/google/cloud/bigquery/model.rb', line 120

def project_id
  return @reference.project_id if reference?
  @gapi_json[:modelReference][:projectId]
end

#reference? ⇒ Boolean

Whether the model was created without retrieving the resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.reference? #=> false

Returns:

  • (Boolean)

    true when the model is just a local reference object, false otherwise.



# File 'lib/google/cloud/bigquery/model.rb', line 732

def reference?
  @gapi_json.nil?
end

#reload! ⇒ Google::Cloud::Bigquery::Model

Also known as: refresh!

Reloads the model with current data from the BigQuery service.

Examples:

Skip retrieving the model from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true

Returns:

  • (Google::Cloud::Bigquery::Model)

    Returns the reloaded model.



# File 'lib/google/cloud/bigquery/model.rb', line 674

def reload!
  ensure_service!
  @gapi_json = service.get_model dataset_id, model_id
  @reference = nil
  @exists = nil
  self
end

#resource? ⇒ Boolean

Whether the model was created with a resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.resource? #=> false
model.reload!
model.resource? #=> true

Returns:

  • (Boolean)

    true when the model was created with a resource representation, false otherwise.



# File 'lib/google/cloud/bigquery/model.rb', line 755

def resource?
  !@gapi_json.nil?
end

#resource_full? ⇒ Boolean

Whether the model was created with a full resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.resource_full? #=> true

Returns:

  • (Boolean)

    true when the model was created with a full resource representation, false otherwise.



# File 'lib/google/cloud/bigquery/model.rb', line 804

def resource_full?
  resource? && @gapi_json.key?(:friendlyName)
end

#resource_partial? ⇒ Boolean

Whether the model was created with a partial resource representation from the BigQuery service by retrieval through Dataset#models. See Models: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.models.first

model.resource_partial? #=> true
model.description # Loads the full resource.
model.resource_partial? #=> false

Returns:

  • (Boolean)

    true when the model was created with a partial resource representation, false otherwise.



# File 'lib/google/cloud/bigquery/model.rb', line 783

def resource_partial?
  resource? && !resource_full?
end
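
The four state predicates (#reference?, #resource?, #resource_full?, #resource_partial?) follow from a single stored payload. A minimal sketch that mirrors the predicate sources shown on this page; `ModelState` is a hypothetical stand-in class, and the sample payloads are illustrative only:

```ruby
# Stand-in that mirrors the predicate logic shown in the method sources.
# A nil payload is a reference; a payload lacking :friendlyName is treated
# as partial, as in the resource_full? source.
class ModelState
  def initialize gapi_json = nil
    @gapi_json = gapi_json
  end

  def reference?
    @gapi_json.nil?
  end

  def resource?
    !@gapi_json.nil?
  end

  def resource_full?
    resource? && @gapi_json.key?(:friendlyName)
  end

  def resource_partial?
    resource? && !resource_full?
  end
end

states = {
  "reference" => ModelState.new,
  "partial"   => ModelState.new({ modelType: "KMEANS" }),
  "full"      => ModelState.new({ modelType: "KMEANS", friendlyName: "Segments" })
}
states.each do |label, state|
  puts "#{label}: reference=#{state.reference?} partial=#{state.resource_partial?} full=#{state.resource_full?}"
end
```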

#training_runs ⇒ Array<Google::Cloud::Bigquery::Model::TrainingRun>

Information for all training runs in increasing order of startTime.

Returns:

  • (Array<Google::Cloud::Bigquery::Model::TrainingRun>)


# File 'lib/google/cloud/bigquery/model.rb', line 485

def training_runs
  ensure_full_data!
  Array @gapi_json[:trainingRuns]
end