Class: Google::Cloud::Bigquery::External::DataSource

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/external.rb

Overview

DataSource

External::DataSource and its subclasses represent an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

The AVRO and Datastore Backup formats use DataSource. See CsvSource, JsonSource, SheetsSource, and BigtableSource for the other formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

avro_url = "gs://bucket/path/to/data.avro"
avro_table = bigquery.external avro_url do |avro|
  avro.autodetect = true
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: avro_table }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Direct Known Subclasses

BigtableSource, CsvSource, JsonSource, SheetsSource
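The subclass returned by `bigquery.external` depends on the source format, which the library can infer from the URL when no format is given explicitly. The sketch below is illustrative only: the extension-to-format mapping and the `guess_format` helper are assumptions for exposition, not the gem's internals.

```ruby
# Illustrative sketch of format inference from a source URL.
# The mapping and helper are hypothetical, not the gem's actual logic.
FORMAT_FOR_EXTENSION = {
  ".csv"         => "CSV",
  ".json"        => "NEWLINE_DELIMITED_JSON",
  ".avro"        => "AVRO",
  ".backup_info" => "DATASTORE_BACKUP"
}.freeze

def guess_format url
  return "GOOGLE_SHEETS" if url.start_with?("https://docs.google.com/spreadsheets/")
  return "BIGTABLE" if url.include?("/bigtable/")
  FORMAT_FOR_EXTENSION.find { |ext, _| url.end_with?(ext) }&.last
end

guess_format "gs://bucket/path/to/data.avro" #=> "AVRO"
```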

Instance Method Summary

Instance Method Details

#autodetect ⇒ Boolean

Indicates if the schema and format options are detected automatically.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
end

csv_table.autodetect #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 350

def autodetect
  @gapi.autodetect
end

#autodetect=(new_autodetect) ⇒ Object

Set whether to detect schema and format options automatically. Any option specified explicitly will be honored.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
end

csv_table.autodetect #=> true

Parameters:

  • new_autodetect (Boolean)

    New autodetect value



# File 'lib/google/cloud/bigquery/external.rb', line 372

def autodetect= new_autodetect
  frozen_check!
  @gapi.autodetect = new_autodetect
end

#avro? ⇒ Boolean

Whether the data format is "AVRO".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

avro_url = "gs://bucket/path/to/data.avro"
avro_table = bigquery.external avro_url

avro_table.format #=> "AVRO"
avro_table.avro? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 261

def avro?
  @gapi.source_format == "AVRO"
end

#backup? ⇒ Boolean

Whether the data format is "DATASTORE_BACKUP".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

backup_url = "gs://bucket/path/to/data.backup_info"
backup_table = bigquery.external backup_url

backup_table.format #=> "DATASTORE_BACKUP"
backup_table.backup? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 281

def backup?
  @gapi.source_format == "DATASTORE_BACKUP"
end

#bigtable? ⇒ Boolean

Whether the data format is "BIGTABLE".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigtable_url = "https://googleapis.com/bigtable/projects/..."
bigtable_table = bigquery.external bigtable_url

bigtable_table.format #=> "BIGTABLE"
bigtable_table.bigtable? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 301

def bigtable?
  @gapi.source_format == "BIGTABLE"
end

#compression ⇒ String

The compression type of the data source. Possible values include "GZIP" and nil. The default value is nil. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.compression = "GZIP"
end

csv_table.compression #=> "GZIP"

Returns:

  • (String)


# File 'lib/google/cloud/bigquery/external.rb', line 396

def compression
  @gapi.compression
end

#compression=(new_compression) ⇒ Object

Set the compression type of the data source. Possible values include "GZIP" and nil. The default value is nil. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.compression = "GZIP"
end

csv_table.compression #=> "GZIP"

Parameters:

  • new_compression (String)

    New compression value



# File 'lib/google/cloud/bigquery/external.rb', line 420

def compression= new_compression
  frozen_check!
  @gapi.compression = new_compression
end

#csv? ⇒ Boolean

Whether the data format is "CSV".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.format #=> "CSV"
csv_table.csv? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 201

def csv?
  @gapi.source_format == "CSV"
end

#format ⇒ String

The data format. For CSV files, specify "CSV". For Google sheets, specify "GOOGLE_SHEETS". For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON". For Avro files, specify "AVRO". For Google Cloud Datastore backups, specify "DATASTORE_BACKUP". [Beta] For Google Cloud Bigtable, specify "BIGTABLE".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.format #=> "CSV"

Returns:

  • (String)


# File 'lib/google/cloud/bigquery/external.rb', line 181

def format
  @gapi.source_format
end
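Since each of the format predicates below simply compares `format` against one of the strings above, you can branch on the source type in plain Ruby. The `describe` helper here is hypothetical, added only to illustrate the mapping:

```ruby
# Hypothetical helper that labels an external table by its format string.
# Accepts any object responding to #format, e.g. an External::DataSource.
def describe ext_table
  case ext_table.format
  when "CSV"                    then "comma-separated values"
  when "NEWLINE_DELIMITED_JSON" then "newline-delimited JSON"
  when "AVRO"                   then "Avro"
  when "DATASTORE_BACKUP"       then "Datastore backup"
  when "GOOGLE_SHEETS"          then "Google Sheets"
  when "BIGTABLE"               then "Bigtable"
  end
end
```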

#ignore_unknown ⇒ Boolean

Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

BigQuery treats trailing columns in CSV, and named values that don't match any column names in JSON, as extra values. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.ignore_unknown = true
end

csv_table.ignore_unknown #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 451

def ignore_unknown
  @gapi.ignore_unknown_values
end
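The effect of `ignore_unknown` can be pictured with a minimal plain-Ruby sketch, assuming a CSV row wider than the schema; this simulates the documented behavior and is not BigQuery's implementation:

```ruby
# Illustrative only: simulates how extra CSV columns are handled
# depending on the ignore_unknown setting.
def read_row row, schema_width, ignore_unknown:
  return row.first(schema_width) if row.size <= schema_width || ignore_unknown
  :bad_record # counted against max_bad_records when ignore_unknown is false
end

read_row ["a", "b", "extra"], 2, ignore_unknown: true  #=> ["a", "b"]
read_row ["a", "b", "extra"], 2, ignore_unknown: false #=> :bad_record
```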

#ignore_unknown=(new_ignore_unknown) ⇒ Object

Set whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

BigQuery treats trailing columns in CSV, and named values that don't match any column names in JSON, as extra values. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.ignore_unknown = true
end

csv_table.ignore_unknown #=> true

Parameters:

  • new_ignore_unknown (Boolean)

    New ignore_unknown value



# File 'lib/google/cloud/bigquery/external.rb', line 481

def ignore_unknown= new_ignore_unknown
  frozen_check!
  @gapi.ignore_unknown_values = new_ignore_unknown
end

#json? ⇒ Boolean

Whether the data format is "NEWLINE_DELIMITED_JSON".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

json_url = "gs://bucket/path/to/data.json"
json_table = bigquery.external json_url

json_table.format #=> "NEWLINE_DELIMITED_JSON"
json_table.json? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 221

def json?
  @gapi.source_format == "NEWLINE_DELIMITED_JSON"
end

#max_bad_records ⇒ Integer

The maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.max_bad_records = 10
end

csv_table.max_bad_records #=> 10

Returns:

  • (Integer)


# File 'lib/google/cloud/bigquery/external.rb', line 508

def max_bad_records
  @gapi.max_bad_records
end
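The threshold semantics described above can be sketched in a couple of lines; this is an illustration of the documented rule, not BigQuery's code:

```ruby
# Illustrative: a job fails only once the number of bad records
# exceeds max_bad_records. The default of 0 requires all records valid.
def job_result bad_record_count, max_bad_records
  bad_record_count > max_bad_records ? :invalid_error : :success
end

job_result 0, 0   #=> :success
job_result 10, 10 #=> :success
job_result 11, 10 #=> :invalid_error
```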

#max_bad_records=(new_max_bad_records) ⇒ Object

Set the maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.max_bad_records = 10
end

csv_table.max_bad_records #=> 10

Parameters:

  • new_max_bad_records (Integer)

    New max_bad_records value



# File 'lib/google/cloud/bigquery/external.rb', line 534

def max_bad_records= new_max_bad_records
  frozen_check!
  @gapi.max_bad_records = new_max_bad_records
end

#sheets? ⇒ Boolean

Whether the data format is "GOOGLE_SHEETS".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sheets_url = "https://docs.google.com/spreadsheets/d/1234567980"
sheets_table = bigquery.external sheets_url

sheets_table.format #=> "GOOGLE_SHEETS"
sheets_table.sheets? #=> true

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/external.rb', line 241

def sheets?
  @gapi.source_format == "GOOGLE_SHEETS"
end

#urls ⇒ Array<String>

The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: Each URI can contain one '*' wildcard character and it must come after the 'bucket' name. Size limits related to load jobs apply to external data sources. For Google Cloud Bigtable URIs: Exactly one URI can be specified and it must be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups, exactly one URI can be specified, and it must end with '.backup_info'. Also, the '*' wildcard character is not allowed.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.urls #=> ["gs://bucket/path/to/data.csv"]

Returns:

  • (Array<String>)


# File 'lib/google/cloud/bigquery/external.rb', line 328

def urls
  @gapi.source_uris
end
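The per-format URI rules above can be checked mechanically. The validator below is a hypothetical helper, not part of the gem, and it deliberately skips the harder check that a Cloud Storage wildcard comes after the bucket name:

```ruby
# Illustrative validation of the source URI rules documented for #urls.
def valid_source_uris? format, uris
  case format
  when "BIGTABLE"
    # Exactly one fully specified HTTPS URL.
    uris.size == 1 && uris.first.start_with?("https://")
  when "DATASTORE_BACKUP"
    # Exactly one URI ending in '.backup_info'; no wildcard allowed.
    uris.size == 1 && uris.first.end_with?(".backup_info") &&
      !uris.first.include?("*")
  else
    # Cloud Storage: each URI may contain at most one '*' wildcard
    # (its position after the bucket name is not checked here).
    uris.all? { |u| u.count("*") <= 1 }
  end
end

valid_source_uris? "CSV", ["gs://bucket/path/*.csv"] #=> true
```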