Class: Google::Cloud::Bigquery::ExtractJob

Inherits:

Job

Object
Job
Google::Cloud::Bigquery::ExtractJob

Defined in: lib/google/cloud/bigquery/extract_job.rb

Overview

ExtractJob

A Job subclass representing an export operation that may be performed on a Table or Model. An ExtractJob instance is returned when you call Project#extract_job, Table#extract_job, or Model#extract_job.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

extract_job = table.extract_job "gs://my-bucket/file-name.json",
                                format: "json"
extract_job.wait_until_done!
extract_job.done? #=> true

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"

extract_job.wait_until_done!
extract_job.done? #=> true

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Updater

Instance Method Summary

#avro? ⇒ Boolean
Checks if the destination format for the table data is Avro.
#compression? ⇒ Boolean
Checks if the export operation compresses the data using gzip.
#csv? ⇒ Boolean
Checks if the destination format for the table data is CSV.
#delimiter ⇒ String?
The character or symbol the operation uses to delimit fields in the exported data.
#destinations ⇒ Object
The URI or URIs representing the Google Cloud Storage files to which the data is exported.
#destinations_counts ⇒ Hash<String, Integer>
A hash containing the URI or URI pattern specified in #destinations mapped to the counts of files per destination.
#destinations_file_counts ⇒ Array<Integer>
The number of files per destination URI or URI pattern specified in #destinations.
#json? ⇒ Boolean
Checks if the destination format for the table data is newline-delimited JSON.
#ml_tf_saved_model? ⇒ Boolean
Checks if the destination format for the model is TensorFlow SavedModel.
#ml_xgboost_booster? ⇒ Boolean
Checks if the destination format for the model is XGBoost.
#model? ⇒ Boolean
Whether the source of the export job is a model.
#print_header? ⇒ Boolean
Checks if the exported data contains a header row.
#source(view: nil) ⇒ Table, ...
The table or model which is exported.
#table? ⇒ Boolean
Whether the source of the export job is a table.
#use_avro_logical_types? ⇒ Boolean
When #avro? is true (the destination format is "AVRO"), this flag indicates whether applicable column types (such as TIMESTAMP) are extracted to their corresponding AVRO logical types (timestamp-micros) instead of only their raw types (avro-long).

Methods inherited from Job

#cancel, #configuration, #created_at, #delete, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #num_child_jobs, #parent_job_id, #pending?, #project_id, #reload!, #rerun!, #reservation_usage, #running?, #script_statistics, #session_id, #started_at, #state, #statistics, #status, #transaction_id, #user_email, #wait_until_done!

Instance Method Details

#avro? ⇒ `Boolean`

Checks if the destination format for the table data is Avro. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when AVRO, false if not AVRO or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 151

def avro?
  return false unless table?
  @gapi.configuration.extract.destination_format == "AVRO"
end

#compression? ⇒ `Boolean`

Checks if the export operation compresses the data using gzip. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when GZIP, false if not GZIP or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 110

def compression?
  return false unless table?
  @gapi.configuration.extract.compression == "GZIP"
end

#csv? ⇒ `Boolean`

Checks if the destination format for the table data is CSV. Tables with nested or repeated fields cannot be exported as CSV. The default is true for tables. Not applicable when extracting models.

Returns:

(Boolean) —
true when CSV, or false if not CSV or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 136

def csv?
  return false unless table?
  val = @gapi.configuration.extract.destination_format
  return true if val.nil?
  val == "CSV"
end
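
The nil check in the listing above means that a table extraction without an explicit destination format is reported as CSV. A minimal sketch of that rule in plain Ruby follows; the hash keys and the `csv_format?` helper are hypothetical stand-ins for illustration, not part of the library.

```ruby
# Hypothetical helper that mirrors the rule shown in #csv?.
# `extract` is a plain Hash standing in for the real configuration object.
def csv_format?(extract)
  # Only table extractions can be CSV.
  return false if extract[:source_table].nil?

  format = extract[:destination_format]
  # A missing destination format counts as CSV.
  format.nil? || format == "CSV"
end

unset_format = { source_table: "my_table", destination_format: nil }
json_format  = { source_table: "my_table", destination_format: "NEWLINE_DELIMITED_JSON" }
model_export = { source_table: nil, destination_format: nil }

puts csv_format?(unset_format)
puts csv_format?(json_format)
puts csv_format?(model_export)
```

The same pattern (treat nil as the default) appears in #ml_tf_saved_model?, #delimiter, and #print_header?.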

#delimiter ⇒ `String`?

The character or symbol the operation uses to delimit fields in the exported data. The default is a comma (,) for tables. Not applicable when extracting models.

Returns:

(String, nil) —
A string containing the character, such as ",", or nil if not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 190

def delimiter
  return unless table?
  val = @gapi.configuration.extract.field_delimiter
  val = "," if val.nil?
  val
end

#destinations ⇒ `Object`

The URI or URIs representing the Google Cloud Storage files to which the data is exported.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 61

def destinations
  Array @gapi.configuration.extract.destination_uris
end
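
Because the listing above wraps the configured value in Kernel#Array, the result can be iterated without a nil check. A small sketch of that coercion on sample values; the `normalize_uris` helper and the URIs are made up for illustration.

```ruby
# Hypothetical helper showing the Kernel#Array coercion used by #destinations.
# nil becomes an empty array, a single value is wrapped, and an array is
# returned unchanged.
def normalize_uris(raw_uris)
  Array(raw_uris)
end

p normalize_uris(nil)
p normalize_uris("gs://my-bucket/file-name.json")
p normalize_uris(["gs://my-bucket/a.json", "gs://my-bucket/b.json"])
```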

#destinations_counts ⇒ `Hash<String, Integer>`

A hash containing the URI or URI pattern specified in #destinations mapped to the counts of files per destination.

Returns:

(Hash<String, Integer>) —
A Hash with the URI patterns as keys and the counts as values.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 229

def destinations_counts
  destinations.zip(destinations_file_counts).to_h
end
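
The pairing in the listing above relies on Array#zip followed by to_h. A sketch of the same pairing on sample data; the `pair_counts` helper, the URIs, and the counts are illustrative assumptions rather than real job output.

```ruby
# Hypothetical helper that pairs each destination URI with its file count,
# in the same way as the zip/to_h call shown for #destinations_counts.
def pair_counts(uris, file_counts)
  uris.zip(file_counts).to_h
end

uris = ["gs://my-bucket/tables-*.json", "gs://my-bucket/summary.json"]
counts = pair_counts(uris, [3, 1])
counts.each { |uri, n| puts "#{uri}: #{n} file(s)" }

# Array#zip pads with nil when the counts array is shorter than the URI
# array, so those URIs map to nil instead of an Integer.
p pair_counts(uris, [])
```

The pairing is purely positional, so it depends on the counts being in the same order as the URI patterns.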

#destinations_file_counts ⇒ `Array<Integer>`

The number of files per destination URI or URI pattern specified in #destinations.

Returns:

(Array<Integer>) —
An array of values in the same order as the URI patterns.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 218

def destinations_file_counts
  Array @gapi.statistics.extract.destination_uri_file_counts
end

#json? ⇒ `Boolean`

Checks if the destination format for the table data is newline-delimited JSON. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when NEWLINE_DELIMITED_JSON, false if not NEWLINE_DELIMITED_JSON or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 123

def json?
  return false unless table?
  @gapi.configuration.extract.destination_format == "NEWLINE_DELIMITED_JSON"
end

#ml_tf_saved_model? ⇒ `Boolean`

Checks if the destination format for the model is TensorFlow SavedModel. The default is true for models. Not applicable when extracting tables.

Returns:

(Boolean) —
true when ML_TF_SAVED_MODEL, false if not ML_TF_SAVED_MODEL or not a model extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 163

def ml_tf_saved_model?
  return false unless model?
  val = @gapi.configuration.extract.destination_format
  return true if val.nil?
  val == "ML_TF_SAVED_MODEL"
end

#ml_xgboost_booster? ⇒ `Boolean`

Checks if the destination format for the model is XGBoost. The default is false. Not applicable when extracting tables.

Returns:

(Boolean) —
true when ML_XGBOOST_BOOSTER, false if not ML_XGBOOST_BOOSTER or not a model extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 177

def ml_xgboost_booster?
  return false unless model?
  @gapi.configuration.extract.destination_format == "ML_XGBOOST_BOOSTER"
end

#model? ⇒ `Boolean`

Whether the source of the export job is a model. See #source.

Returns:

(Boolean) —
true when the source is a model, false otherwise.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 100

def model?
  !@gapi.configuration.extract.source_model.nil?
end

#print_header? ⇒ `Boolean`

Checks if the exported data contains a header row. The default is true for tables. Not applicable when extracting models.

Returns:

(Boolean) —
true when the print header configuration is present or nil, false if disabled or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 204

def print_header?
  return false unless table?
  val = @gapi.configuration.extract.print_header
  val = true if val.nil?
  val
end

#source(view: nil) ⇒ `Table`, ...

The table or model which is exported.

Parameters:

view (String) (defaults to: nil) —
Specifies the view that determines which table information is returned. By default, basic table information and storage statistics (STORAGE_STATS) are returned. Accepted values include :unspecified, :basic, :storage, and :full. For more information, see BigQuery Classes. The default value is the :unspecified view type. The value is passed on only when the source is a table; it has no effect when the source is a model.

Returns:

(Table, Model, nil) —
A table or model instance, or nil.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 76

def source view: nil
  if (table = @gapi.configuration.extract.source_table)
    retrieve_table table.project_id, table.dataset_id, table.table_id, metadata_view: view
  elsif (model = @gapi.configuration.extract.source_model)
    retrieve_model model.project_id, model.dataset_id, model.model_id
  end
end
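
Since the listing above retrieves the table or model when #source is called, code that only needs to know the kind of source can check #table? and #model? instead. A minimal sketch of that branching follows; `fake_job` builds a hypothetical stand-in object and `source_kind` is an illustrative helper, neither of which is part of the library.

```ruby
# Hypothetical stand-in that responds to #table? and #model? like an
# ExtractJob would; it performs no API calls.
def fake_job(kind)
  job = Object.new
  job.define_singleton_method(:table?) { kind == :table }
  job.define_singleton_method(:model?) { kind == :model }
  job
end

# Illustrative helper: classify the job by its source without retrieving it.
def source_kind(job)
  if job.table?
    :table
  elsif job.model?
    :model
  else
    :unknown
  end
end

puts source_kind(fake_job(:table))
puts source_kind(fake_job(:model))
```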

#table? ⇒ `Boolean`

Whether the source of the export job is a table. See #source.

Returns:

(Boolean) —
true when the source is a table, false otherwise.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 90

def table?
  !@gapi.configuration.extract.source_table.nil?
end

#use_avro_logical_types? ⇒ `Boolean`

When #avro? is true (the destination format is "AVRO"), this flag indicates whether applicable column types (such as TIMESTAMP) are extracted to their corresponding AVRO logical types (timestamp-micros) instead of only their raw types (avro-long). Not applicable when extracting models.

Returns:

(Boolean) —
true when applicable column types will use their corresponding AVRO logical types, false if not enabled or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 244

def use_avro_logical_types?
  return false unless table?
  @gapi.configuration.extract.use_avro_logical_types
end
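
The listing above returns the raw configuration value for table extractions, so the result may be nil rather than false when the flag was never set. This is an inference from the source shown here, not documented behavior. Both values are falsy in Ruby, so conditionals are unaffected; a sketch of normalizing the value where a strict Boolean is needed, using a hypothetical `strict_flag` helper:

```ruby
# Hypothetical helper: collapse nil and false into false so that callers
# comparing against `false` or serializing the value get a strict Boolean.
def strict_flag(raw_value)
  raw_value == true
end

p strict_flag(nil)
p strict_flag(false)
p strict_flag(true)
```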
