Class: Google::Cloud::Bigquery::Project

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/project.rb,
lib/google/cloud/bigquery/project/list.rb

Overview

Project

Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.

Google::Cloud::Bigquery::Project is the main object for interacting with Google BigQuery. Dataset objects are created, accessed, and deleted by Google::Cloud::Bigquery::Project.

See Google::Cloud#bigquery.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

Defined Under Namespace

Classes: List

Instance Attribute Summary collapse

Data collapse

Instance Method Summary collapse

Constructor Details

#initialize(service) ⇒ Project

Creates a new Service instance.

See Google::Cloud.bigquery



65
66
67
# File 'lib/google/cloud/bigquery/project.rb', line 65

def initialize service
  @service = service
end

Instance Attribute Details

#nameString? (readonly)

The descriptive name of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (String, nil)

    the current value of name



54
55
56
# File 'lib/google/cloud/bigquery/project.rb', line 54

def name
  @name
end

#numeric_idInteger? (readonly)

The numeric ID of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (Integer, nil)

    the current value of numeric_id



54
55
56
# File 'lib/google/cloud/bigquery/project.rb', line 54

def numeric_id
  @numeric_id
end

Instance Method Details

#copy(source_table, destination_table, create: nil, write: nil) {|job| ... } ⇒ Boolean

Copies the data from the source table to the destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #copy_job for the asynchronous version. Use this method instead of Table#copy to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
destination_table = dataset.table "my_destination_table"

bigquery.copy "bigquery-public-data.samples.shakespeare",
              destination_table

Parameters:

  • source_table (String, Table)

    The source table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • destination_table (String, Table)

    The destination table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the copy operation succeeded.



264
265
266
267
268
269
270
271
272
273
274
# File 'lib/google/cloud/bigquery/project.rb', line 264

def copy source_table, destination_table, create: nil, write: nil,
         &block
  job = copy_job source_table,
                 destination_table,
                 create: create,
                 write:  write,
                 &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#copy_job(source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::CopyJob

Copies the data from the source table to the destination table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling #Job#wait_until_done!. See #copy for the synchronous version. Use this method instead of Table#copy_job to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
source_table_id = "bigquery-public-data.samples.shakespeare"
destination_table = dataset.table "my_destination_table"

copy_job = bigquery.copy_job source_table_id, destination_table

copy_job.wait_until_done!
copy_job.done? #=> true

Parameters:

  • source_table (String, Table)

    The source table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • destination_table (String, Table)

    The destination table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.
  • job_id (String)

    A user-defined ID for the copy job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key. See Requirements for labels.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:



185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
# File 'lib/google/cloud/bigquery/project.rb', line 185

def copy_job source_table, destination_table, create: nil, write: nil,
             job_id: nil, prefix: nil, labels: nil
  ensure_service!
  options = { create: create, write: write, labels: labels,
              job_id: job_id, prefix: prefix }

  updater = CopyJob::Updater.from_options(
    service,
    Service.get_table_ref(source_table, default_ref: project_ref),
    Service.get_table_ref(destination_table, default_ref: project_ref),
    options
  )

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.copy_table job_gapi
  Job.from_gapi gapi, service
end

#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, location: nil) {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset

Creates a new dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Or, configure access with a block: (See Dataset::Access)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset" do |dataset|
  dataset.access.add_writer_user "writers@example.com"
end

Parameters:

  • dataset_id (String)

    A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the dataset.

  • description (String)

    A user-friendly description of the dataset.

  • expiration (Integer)

    The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour).

  • location (String)

    The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:



919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
# File 'lib/google/cloud/bigquery/project.rb', line 919

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, location: nil
  ensure_service!

  new_ds = Google::Apis::BigqueryV2::Dataset.new(
    dataset_reference: Google::Apis::BigqueryV2::DatasetReference.new(
      project_id: project, dataset_id: dataset_id
    )
  )

  # Can set location only on creation, no Dataset#location method
  new_ds.update! location: location unless location.nil?

  updater = Dataset::Updater.new(new_ds).tap do |b|
    b.name = name unless name.nil?
    b.description = description unless description.nil?
    b.default_expiration = expiration unless expiration.nil?
  end

  if block_given?
    yield updater
    updater.check_for_mutated_access!
  end

  gapi = service.insert_dataset new_ds
  Dataset.from_gapi gapi, service
end

#dataset(dataset_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Dataset?

Retrieves an existing dataset by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
puts dataset.name

Avoid retrieving the dataset resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

Parameters:

  • dataset_id (String)

    The ID of a dataset.

  • skip_lookup (Boolean)

    Optionally create just a local reference object without verifying that the resource exists on the BigQuery service. Calls made on this object will raise errors if the resource does not exist. Default is false. Optional.

Returns:



862
863
864
865
866
867
868
869
870
871
# File 'lib/google/cloud/bigquery/project.rb', line 862

def dataset dataset_id, skip_lookup: nil
  ensure_service!
  if skip_lookup
    return Dataset.new_reference project, dataset_id, service
  end
  gapi = service.get_dataset dataset_id
  Dataset.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#datasets(all: nil, filter: nil, token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Dataset>

Retrieves the list of datasets belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.each do |dataset|
  puts dataset.name
end

Retrieve hidden datasets with the all optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

all_datasets = bigquery.datasets all: true

Retrieve all datasets: (See Dataset::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.all do |dataset|
  puts dataset.name
end

Parameters:

  • all (Boolean)

    Whether to list all datasets, including hidden ones. The default is false.

  • filter (String)

    An expression for filtering the results of the request by label. The syntax is labels.<name>[:<value>]. Multiple filters can be ANDed together by connecting with a space. Example: labels.department:receiving labels.active. See Filtering datasets using labels.

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of datasets to return.

Returns:



991
992
993
994
995
996
# File 'lib/google/cloud/bigquery/project.rb', line 991

def datasets all: nil, filter: nil, token: nil, max: nil
  ensure_service!
  options = { all: all, filter: filter, token: token, max: max }
  gapi = service.list_datasets options
  Dataset::List.from_gapi gapi, service, all, filter, max
end

#encryption(kms_key: nil) ⇒ Google::Cloud::Bigquery::EncryptionConfiguration

Creates a new Bigquery::EncryptionConfiguration instance.

This method does not execute an API call. Use the encryption configuration to encrypt a table when creating one via Bigquery::Dataset#create_table, Bigquery::Dataset#load, Bigquery::Table#copy, or Bigquery::Project#query.

Examples:

Encrypt a new table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

table = dataset.create_table "my_table" do |updater|
  updater.encryption = encrypt_config
end

Encrypt a load destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = dataset.load_job "my_table", "gs://abc/file" do |job|
  job.encryption = encrypt_config
end

Encrypt a copy destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = table.copy_job "my_dataset.new_table" do |job|
  job.encryption = encrypt_config
end

Encrypt a query destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = bigquery.query_job "SELECT 1;" do |query|
  query.table = dataset.table "my_table", skip_lookup: true
  query.encryption = encrypt_config
end

Parameters:

  • kms_key (String)

    Name of the Cloud KMS encryption key that will be used to protect the destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.

Returns:



1300
1301
1302
1303
1304
# File 'lib/google/cloud/bigquery/project.rb', line 1300

def encryption kms_key: nil
  encrypt_config = Bigquery::EncryptionConfiguration.new
  encrypt_config.kms_key = kms_key unless kms_key.nil?
  encrypt_config
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

data.each do |row|
  puts row[:name]
end

Parameters:

  • url (String, Array<String>)

    The fully-qualified URL(s) that point to your data in Google Cloud. An attempt will be made to derive the format from the URLs provided.

  • format (String|Symbol)

    The data format. This value will be used even if the provided URLs are recognized as a different format. Optional.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • sheets - Google Sheets
    • datastore_backup - Cloud Datastore backup
    • bigtable - Bigtable

Yields:

  • (ext)

Returns:

See Also:



829
830
831
832
833
# File 'lib/google/cloud/bigquery/project.rb', line 829

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#extract(table, extract_url, format: nil, compression: nil, delimiter: nil, header: nil) {|job| ... } ⇒ Boolean

Extracts the data from the provided table to a Google Cloud Storage file using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #extract_job for the asynchronous version. Use this method instead of Table#extract to extract data from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigquery.extract "bigquery-public-data.samples.shakespeare",
                 "gs://my-bucket/shakespeare.csv"

Parameters:

  • table (String, Table)

    The source table from which to extract data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

  • compression (String)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.

  • delimiter (String)

    Delimiter to use between fields in the exported data. Default is ,.

  • header (Boolean)

    Whether to print out a header row in the results. Default is true.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the extract operation succeeded.

See Also:



1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
# File 'lib/google/cloud/bigquery/project.rb', line 1463

def extract table, extract_url, format: nil, compression: nil,
            delimiter: nil, header: nil, &block
  job = extract_job table,
                    extract_url,
                    format:      format,
                    compression: compression,
                    delimiter:   delimiter,
                    header:      header,
                    &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#extract_job(table, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil, prefix: nil, labels: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::ExtractJob

Extracts the data from the provided table to a Google Cloud Storage file using an asynchronous method. In this method, an ExtractJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling

Job#wait_until_done!. See #extract for the synchronous version.

Use this method instead of Table#extract_job to extract data from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

table_id = "bigquery-public-data.samples.shakespeare"
extract_job = bigquery.extract_job table_id,
                                   "gs://my-bucket/shakespeare.csv"
extract_job.wait_until_done!
extract_job.done? #=> true

Parameters:

  • table (String, Table)

    The source table from which to extract data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

  • compression (String)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.

  • delimiter (String)

    Delimiter to use between fields in the exported data. Default is ,.

  • header (Boolean)

    Whether to print out a header row in the results. Default is true.

  • job_id (String)

    A user-defined ID for the extract job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key. See Requirements for labels.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

See Also:



1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
# File 'lib/google/cloud/bigquery/project.rb', line 1389

def extract_job table, extract_url, format: nil, compression: nil,
                delimiter: nil, header: nil, job_id: nil, prefix: nil,
                labels: nil
  ensure_service!
  options = { format: format, compression: compression,
              delimiter: delimiter, header: header, job_id: job_id,
              prefix: prefix, labels: labels }

  table_ref = Service.get_table_ref table, default_ref: project_ref
  updater = ExtractJob::Updater.from_options service, table_ref,
                                             extract_url, options

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.extract_table job_gapi
  Job.from_gapi gapi, service
end

#job(job_id, location: nil) ⇒ Google::Cloud::Bigquery::Job?

Retrieves an existing job by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.job "my_job"

Parameters:

  • job_id (String)

    The ID of a job.

  • location (String)

    The geographic location where the job was created. Required except for US and EU.

Returns:



1015
1016
1017
1018
1019
1020
1021
# File 'lib/google/cloud/bigquery/project.rb', line 1015

def job job_id, location: nil
  ensure_service!
  gapi = service.get_job job_id, location: location
  Job.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#jobs(all: nil, token: nil, max: nil, filter: nil, min_created_at: nil, max_created_at: nil) ⇒ Array<Google::Cloud::Bigquery::Job>

Retrieves the list of jobs belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.each do |job|
  # process job
end

Retrieve only running jobs using the filter optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

running_jobs = bigquery.jobs filter: "running"
running_jobs.each do |job|
  # process job
end

Retrieve only jobs created within provided times:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

two_days_ago = Time.now - 60*60*24*2
three_days_ago = Time.now - 60*60*24*3

jobs = bigquery.jobs min_created_at: three_days_ago,
                     max_created_at: two_days_ago
jobs.each do |job|
  # process job
end

Retrieve all jobs: (See Job::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.all do |job|
  # process job
end

Parameters:

  • all (Boolean)

    Whether to display jobs owned by all users in the project. The default is false. Optional.

  • token (String)

    A previously-returned page token representing part of the larger set of results to view. Optional.

  • max (Integer)

    Maximum number of jobs to return. Optional.

  • filter (String)

    A filter for job state. Optional.

  • min_created_at (Time)

    Min value for Job#created_at. When provided, only jobs created after or at this time are returned. Optional.

  • max_created_at (Time)

    Max value for Job#created_at. When provided, only jobs created before or at this time are returned. Optional.

    Acceptable values are:

    • done - Finished jobs
    • pending - Pending jobs
    • running - Running jobs

Returns:



1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
# File 'lib/google/cloud/bigquery/project.rb', line 1092

def jobs all: nil, token: nil, max: nil, filter: nil,
         min_created_at: nil, max_created_at: nil
  ensure_service!
  options = {
    all: all, token: token, max: max, filter: filter,
    min_created_at: min_created_at, max_created_at: max_created_at
  }
  gapi = service.list_jobs options
  Job::List.from_gapi gapi, service, options
end

#project_idObject Also known as: project

The BigQuery project connected to.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project_id: "my-project",
  credentials: "/path/to/keyfile.json"
)

bigquery.project_id #=> "my-project"


82
83
84
# File 'lib/google/cloud/bigquery/project.rb', line 82

def project_id
  service.project
end

#projects(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Project>

Retrieves the list of all projects for which the currently authorized account has been granted any project role. The returned project instances share the same credentials as the project used to retrieve them, but lazily create a new API connection for interactions with the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects
projects.each do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Retrieve all projects: (See Google::Cloud::Bigquery::Project::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects

projects.all do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of projects to return.

Returns:



1144
1145
1146
1147
1148
1149
# File 'lib/google/cloud/bigquery/project.rb', line 1144

def projects token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_projects options
  Project::List.from_gapi gapi, service, max
end

#query(query, params: nil, external: nil, max: nil, cache: true, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
NUMERIC BigDecimal Will be rounded to 9 decimal places
STRING String
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM `my_project.my_dataset.my_table`"
data = bigquery.query sql

data.each do |row|
  puts row[:name]
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM [my_project:my_dataset.my_table]"
data = bigquery.query sql, legacy_sql: true

data.each do |row|
  puts row[:name]
end

Retrieve all rows: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table`"

data.all do |row|
  puts row[:name]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE id = ?",
                      params: [1]

data.each do |row|
  puts row[:name]
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE id = @id",
                      params: { id: 1 }

data.each do |row|
  puts row[:name]
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "CREATE TABLE `my_dataset.my_table` (x INT64)"

table_ref = data.ddl_target_table

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "UPDATE `my_dataset.my_table` " \
                      "SET x = x + 1 " \
                      "WHERE x IS NOT NULL"

puts data.num_dml_affected_rows

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

data.each do |row|
  puts row[:name]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • external (Hash<String|Symbol, External::DataSource>)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • dataset (String)

    Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'.

  • project (String)

    Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. When set to true, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL When set to false, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is false.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

See Also:



762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
# File 'lib/google/cloud/bigquery/project.rb', line 762

def query query, params: nil, external: nil, max: nil, cache: true,
          dataset: nil, project: nil, standard_sql: nil,
          legacy_sql: nil, &block
  job = query_job query, params: params, external: external,
                         cache: cache, dataset: dataset,
                         project: project, standard_sql: standard_sql,
                         legacy_sql: legacy_sql, &block
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      raise job.gapi_error
    rescue StandardError => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  job.data max: max
end

#query_job(query, params: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
NUMERIC BigDecimal Will be rounded to 9 decimal places
STRING String
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_project.my_dataset.my_table`"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         " [my_project:my_dataset.my_table]",
                         legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_dataset.my_table`" \
                         " WHERE id = ?",
                         params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_dataset.my_table`" \
                         " WHERE id = @id",
                         params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "CREATE TABLE " \
                         "`my_dataset.my_table` " \
                         "(x INT64)"

job.wait_until_done!
if !job.failed?
  table_ref = job.ddl_target_table
end

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "UPDATE " \
                         "`my_dataset.my_table` " \
                         "SET x = x + 1 " \
                         "WHERE x IS NOT NULL"

job.wait_until_done!
if !job.failed?
  puts job.num_dml_affected_rows
end

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = bigquery.query_job "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • external (Hash<String|Symbol, External::DataSource>)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • dryrun (Boolean)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • dataset (Dataset, String)

    The default dataset to use for unqualified table names in the query. Optional.

  • project (String)

    Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.

  • large_results (Boolean)

    This option is specific to Legacy SQL. If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    This option is specific to Legacy SQL. Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

  • maximum_bytes_billed (Integer)

    Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.

  • job_id (String)

    A user-defined ID for the query job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key. See Requirements for labels.

  • udfs (Array<String>, String)

    User-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

  • maximum_billing_tier (Integer)

    Deprecated: Change the billing tier to allow high-compute queries.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:



541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
# File 'lib/google/cloud/bigquery/project.rb', line 541

def query_job query, params: nil, external: nil,
              priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, dryrun: nil, dataset: nil,
              project: nil, standard_sql: nil, legacy_sql: nil,
              large_results: nil, flatten: nil,
              maximum_billing_tier: nil, maximum_bytes_billed: nil,
              job_id: nil, prefix: nil, labels: nil, udfs: nil
  ensure_service!
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write, dryrun: dryrun,
              large_results: large_results, flatten: flatten,
              dataset: dataset, project: (project || self.project),
              legacy_sql: legacy_sql, standard_sql: standard_sql,
              maximum_billing_tier: maximum_billing_tier,
              maximum_bytes_billed: maximum_bytes_billed,
              external: external, job_id: job_id, prefix: prefix,
              labels: labels, udfs: udfs, params: params }

  updater = QueryJob::Updater.from_options service, query, options

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end

#schema {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Creates a new schema instance. An optional block may be given to configure the schema, otherwise the schema is returned empty and may be configured directly.

The returned schema can be passed to Dataset#load using the schema option. However, for most use cases, the block yielded by Dataset#load is a more convenient way to configure the schema for the destination table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

schema = bigquery.schema do |s|
  s.string "first_name", mode: :required
  s.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url, schema: schema

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema

Returns:



1229
1230
1231
1232
1233
# File 'lib/google/cloud/bigquery/project.rb', line 1229

def schema
  s = Schema.from_gapi
  yield s if block_given?
  s
end

#service_account_emailString

The email address of the service account for the project used to connect to BigQuery. (See also #project_id.)

Returns:

  • (String)

    The service account email address.



93
94
95
96
# File 'lib/google/cloud/bigquery/project.rb', line 93

def 
  @service_account_email ||= \
    service..email
end

#time(hour, minute, second) ⇒ Bigquery::Time

Creates a Bigquery::Time object to represent a time, independent of a specific date.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

fourpm = bigquery.time 16, 0, 0
data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE time_of_date = @time",
                      params: { time: fourpm }

data.each do |row|
  puts row[:name]
end

Create Time with fractional seconds:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

precise_time = bigquery.time 16, 35, 15.376541
data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE time_of_date >= @time",
                      params: { time: precise_time }

data.each do |row|
  puts row[:name]
end

Parameters:

  • hour (Integer)

    Hour, valid values from 0 to 23.

  • minute (Integer)

    Minute, valid values from 0 to 59.

  • second (Integer, Float)

    Second, valid values from 0 to 59. Can contain microsecond precision.

Returns:



1192
1193
1194
# File 'lib/google/cloud/bigquery/project.rb', line 1192

def time hour, minute, second
  Bigquery::Time.new "#{hour}:#{minute}:#{second}"
end