Class: Google::Cloud::Bigquery::LoadJob
Defined in: lib/google/cloud/bigquery/load_job.rb
Overview
LoadJob
A Job subclass representing a load operation that may be performed on a Table. A LoadJob instance is created when you call Table#load_job.
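As a quick orientation, the following minimal sketch shows how a LoadJob is typically created and awaited. The dataset, table, and Cloud Storage URI are hypothetical placeholders.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"
table    = dataset.table "my_table"

# Table#load_job returns immediately; the load itself runs asynchronously.
load_job = table.load_job "gs://my-bucket/file-name.csv"

load_job.wait_until_done!
load_job.failed? #=> false when the load succeeded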
Defined Under Namespace
Classes: Updater
Attributes

- #clustering? ⇒ Boolean
  Checks if the destination table will be clustered.
- #clustering_fields ⇒ Array<String>?
  One or more fields on which the destination table should be clustered.
- #encryption ⇒ Google::Cloud::Bigquery::EncryptionConfiguration
  The encryption configuration of the destination table.
- #hive_partitioning? ⇒ Boolean
  Checks if hive partitioning options are set.
- #hive_partitioning_mode ⇒ String?
  The mode of hive partitioning to use when reading data.
- #hive_partitioning_source_uri_prefix ⇒ String?
  The common prefix for all source URIs when hive partition detection is requested.
- #output_bytes ⇒ Integer
  The number of bytes that have been loaded into the table.
- #parquet_enable_list_inference? ⇒ Boolean?
  Indicates whether to use schema inference specifically for the Parquet LIST logical type.
- #parquet_enum_as_string? ⇒ Boolean?
  Indicates whether to infer the Parquet ENUM logical type as STRING instead of BYTES by default.
- #parquet_options? ⇒ Boolean
  Checks if Parquet options are set.
- #range_partitioning? ⇒ Boolean
  Checks if the destination table will be range partitioned.
- #range_partitioning_end ⇒ Integer?
  The end of range partitioning, exclusive.
- #range_partitioning_field ⇒ String?
  The field on which the destination table will be range partitioned, if any.
- #range_partitioning_interval ⇒ Integer?
  The width of each interval.
- #range_partitioning_start ⇒ Integer?
  The start of range partitioning, inclusive.
- #time_partitioning? ⇒ Boolean
  Checks if the destination table will be time partitioned.
- #time_partitioning_expiration ⇒ Integer?
  The expiration for the destination table time partitions, if any, in seconds.
- #time_partitioning_field ⇒ String?
  The field on which the destination table will be time partitioned, if any.
- #time_partitioning_require_filter? ⇒ Boolean
  If set to true, queries over the destination table will be required to specify a time partition filter that can be used for partition elimination.
- #time_partitioning_type ⇒ String?
  The period for which the destination table will be time partitioned, if any.
Instance Method Summary

- #allow_jagged_rows? ⇒ Boolean
  Checks if the load operation accepts rows that are missing trailing optional columns.
- #autodetect? ⇒ Boolean
  Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources.
- #backup? ⇒ Boolean
  Checks if the source data is a Google Cloud Datastore backup.
- #csv? ⇒ Boolean
  Checks if the format of the source data is CSV.
- #delimiter ⇒ String
  The delimiter used between fields in the source data.
- #destination(view: nil) ⇒ Table
  The table into which the operation loads data.
- #ignore_unknown_values? ⇒ Boolean
  Checks if the load operation allows extra values that are not represented in the table schema.
- #input_file_bytes ⇒ Integer
  The number of bytes of source data in the load job.
- #input_files ⇒ Integer
  The number of source data files in the load job.
- #iso8859_1? ⇒ Boolean
  Checks if the character encoding of the data is ISO-8859-1.
- #json? ⇒ Boolean
  Checks if the format of the source data is newline-delimited JSON.
- #max_bad_records ⇒ Integer
  The maximum number of bad records that the load operation can ignore.
- #null_marker ⇒ String
  Specifies a string that represents a null value in a CSV file.
- #orc? ⇒ Boolean
  Checks if the source format is ORC.
- #output_rows ⇒ Integer
  The number of rows that have been loaded into the table.
- #parquet? ⇒ Boolean
  Checks if the source format is Parquet.
- #quote ⇒ String
  The value that is used to quote data sections in a CSV file.
- #quoted_newlines? ⇒ Boolean
  Checks if quoted data sections may contain newline characters in a CSV file.
- #schema ⇒ Schema?
  The schema for the destination table.
- #schema_update_options ⇒ Array<String>
  Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration.
- #skip_leading_rows ⇒ Integer
  The number of rows at the top of a CSV file that BigQuery will skip when loading the data.
- #sources ⇒ Object
  The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
- #utf8? ⇒ Boolean
  Checks if the character encoding of the data is UTF-8.
Methods inherited from Job
#cancel, #configuration, #created_at, #delete, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #num_child_jobs, #parent_job_id, #pending?, #project_id, #reload!, #rerun!, #reservation_usage, #running?, #script_statistics, #session_id, #started_at, #state, #statistics, #status, #transaction_id, #user_email, #wait_until_done!
Instance Method Details
#allow_jagged_rows? ⇒ Boolean
Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is false. Only applicable to CSV; ignored for other formats.
# File 'lib/google/cloud/bigquery/load_job.rb', line 258

def allow_jagged_rows?
  val = @gapi.configuration.load.allow_jagged_rows
  val = false if val.nil?
  val
end
#autodetect? ⇒ Boolean
Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is false.
# File 'lib/google/cloud/bigquery/load_job.rb', line 189

def autodetect?
  val = @gapi.configuration.load.autodetect
  val = false if val.nil?
  val
end
#backup? ⇒ Boolean
Checks if the source data is a Google Cloud Datastore backup.
# File 'lib/google/cloud/bigquery/load_job.rb', line 224

def backup?
  @gapi.configuration.load.source_format == "DATASTORE_BACKUP"
end
#clustering? ⇒ Boolean
Checks if the destination table will be clustered.
See Google::Cloud::Bigquery::LoadJob::Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
# File 'lib/google/cloud/bigquery/load_job.rb', line 625

def clustering?
  !@gapi.configuration.load.clustering.nil?
end
#clustering_fields ⇒ Array<String>?
One or more fields on which the destination table should be clustered. If specified together with time-based partitioning, data in the table is first partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.
BigQuery supports clustering for both partitioned and non-partitioned tables.
See Google::Cloud::Bigquery::LoadJob::Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
# File 'lib/google/cloud/bigquery/load_job.rb', line 651

def clustering_fields
  @gapi.configuration.load.clustering.fields if clustering?
end
#csv? ⇒ Boolean
Checks if the format of the source data is CSV. The default is true.
# File 'lib/google/cloud/bigquery/load_job.rb', line 212

def csv?
  val = @gapi.configuration.load.source_format
  return true if val.nil?
  val == "CSV"
end
#delimiter ⇒ String
The delimiter used between fields in the source data. The default is a comma (,).
# File 'lib/google/cloud/bigquery/load_job.rb', line 85

def delimiter
  @gapi.configuration.load.field_delimiter || ","
end
#destination(view: nil) ⇒ Table
The table into which the operation loads data. This is the table on which Table#load_job was invoked.
# File 'lib/google/cloud/bigquery/load_job.rb', line 73

def destination view: nil
  table = @gapi.configuration.load.destination_table
  return nil unless table
  retrieve_table table.project_id, table.dataset_id, table.table_id, metadata_view: view
end
#encryption ⇒ Google::Cloud::Bigquery::EncryptionConfiguration
The encryption configuration of the destination table.
# File 'lib/google/cloud/bigquery/load_job.rb', line 355

def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.load.destination_encryption_configuration
  )
end
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
# File 'lib/google/cloud/bigquery/load_job.rb', line 382

def hive_partitioning?
  !@gapi.configuration.load.hive_partitioning_options.nil?
end
#hive_partitioning_mode ⇒ String?
The mode of hive partitioning to use when reading data. The following modes are supported:
- AUTO: automatically infer partition key name(s) and type(s).
- STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
- CUSTOM: partition key schema is encoded in the source URI prefix.
# File 'lib/google/cloud/bigquery/load_job.rb', line 399

def hive_partitioning_mode
  @gapi.configuration.load.hive_partitioning_options.mode if hive_partitioning?
end
#hive_partitioning_source_uri_prefix ⇒ String?
The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either gs://bucket/path_to_table or gs://bucket/path_to_table/ (the trailing slash does not matter).
# File 'lib/google/cloud/bigquery/load_job.rb', line 421

def hive_partitioning_source_uri_prefix
  @gapi.configuration.load.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
#ignore_unknown_values? ⇒ Boolean
Checks if the load operation allows extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned. The default is false.
# File 'lib/google/cloud/bigquery/load_job.rb', line 274

def ignore_unknown_values?
  val = @gapi.configuration.load.ignore_unknown_values
  val = false if val.nil?
  val
end
#input_file_bytes ⇒ Integer
The number of bytes of source data in the load job.
# File 'lib/google/cloud/bigquery/load_job.rb', line 330

def input_file_bytes
  Integer @gapi.statistics.load.input_file_bytes
rescue StandardError
  nil
end
#input_files ⇒ Integer
The number of source data files in the load job.
# File 'lib/google/cloud/bigquery/load_job.rb', line 319

def input_files
  Integer @gapi.statistics.load.input_files
rescue StandardError
  nil
end
#iso8859_1? ⇒ Boolean
Checks if the character encoding of the data is ISO-8859-1.
# File 'lib/google/cloud/bigquery/load_job.rb', line 120

def iso8859_1?
  @gapi.configuration.load.encoding == "ISO-8859-1"
end
#json? ⇒ Boolean
Checks if the format of the source data is newline-delimited JSON. The default is false.
# File 'lib/google/cloud/bigquery/load_job.rb', line 202

def json?
  @gapi.configuration.load.source_format == "NEWLINE_DELIMITED_JSON"
end
#max_bad_records ⇒ Integer
The maximum number of bad records that the load operation can ignore. If the number of bad records exceeds this value, an error is returned. The default value is 0, which requires that all records be valid.
# File 'lib/google/cloud/bigquery/load_job.rb', line 146

def max_bad_records
  val = @gapi.configuration.load.max_bad_records
  val = 0 if val.nil?
  val
end
#null_marker ⇒ String
Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
# File 'lib/google/cloud/bigquery/load_job.rb', line 163

def null_marker
  val = @gapi.configuration.load.null_marker
  val = "" if val.nil?
  val
end
#orc? ⇒ Boolean
Checks if the source format is ORC.
# File 'lib/google/cloud/bigquery/load_job.rb', line 234

def orc?
  @gapi.configuration.load.source_format == "ORC"
end
#output_bytes ⇒ Integer
The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.
# File 'lib/google/cloud/bigquery/load_job.rb', line 367

def output_bytes
  Integer @gapi.statistics.load.output_bytes
rescue StandardError
  nil
end
#output_rows ⇒ Integer
The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.
# File 'lib/google/cloud/bigquery/load_job.rb', line 342

def output_rows
  Integer @gapi.statistics.load.output_rows
rescue StandardError
  nil
end
#parquet? ⇒ Boolean
Checks if the source format is Parquet.
# File 'lib/google/cloud/bigquery/load_job.rb', line 244

def parquet?
  @gapi.configuration.load.source_format == "PARQUET"
end
#parquet_enable_list_inference? ⇒ Boolean?
Indicates whether to use schema inference specifically for Parquet LIST
logical type.
# File 'lib/google/cloud/bigquery/load_job.rb', line 450

def parquet_enable_list_inference?
  @gapi.configuration.load.parquet_options.enable_list_inference if parquet_options?
end
#parquet_enum_as_string? ⇒ Boolean?
Indicates whether to infer the Parquet ENUM logical type as STRING instead of BYTES by default.
# File 'lib/google/cloud/bigquery/load_job.rb', line 464

def parquet_enum_as_string?
  @gapi.configuration.load.parquet_options.enum_as_string if parquet_options?
end
#parquet_options? ⇒ Boolean
Checks if Parquet options are set.
# File 'lib/google/cloud/bigquery/load_job.rb', line 435

def parquet_options?
  !@gapi.configuration.load.parquet_options.nil?
end
#quote ⇒ String
The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, #quoted_newlines? should return true.
# File 'lib/google/cloud/bigquery/load_job.rb', line 133

def quote
  val = @gapi.configuration.load.quote
  val = "\"" if val.nil?
  val
end
#quoted_newlines? ⇒ Boolean
Checks if quoted data sections may contain newline characters in a CSV file. The default is false.
# File 'lib/google/cloud/bigquery/load_job.rb', line 176

def quoted_newlines?
  val = @gapi.configuration.load.allow_quoted_newlines
  val = false if val.nil?
  val
end
#range_partitioning? ⇒ Boolean
Checks if the destination table will be range partitioned. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 476

def range_partitioning?
  !@gapi.configuration.load.range_partitioning.nil?
end
#range_partitioning_end ⇒ Integer?
The end of range partitioning, exclusive. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 528

def range_partitioning_end
  @gapi.configuration.load.range_partitioning.range.end if range_partitioning?
end
#range_partitioning_field ⇒ String?
The field on which the destination table will be range partitioned, if any. The field must be a top-level NULLABLE/REQUIRED field. The only supported type is INTEGER/INT64. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 490

def range_partitioning_field
  @gapi.configuration.load.range_partitioning.field if range_partitioning?
end
#range_partitioning_interval ⇒ Integer?
The width of each interval. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 515

def range_partitioning_interval
  return nil unless range_partitioning?
  @gapi.configuration.load.range_partitioning.range.interval
end
#range_partitioning_start ⇒ Integer?
The start of range partitioning, inclusive. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 502

def range_partitioning_start
  @gapi.configuration.load.range_partitioning.range.start if range_partitioning?
end
#schema ⇒ Schema?
The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
The returned object is frozen and changes are not allowed. Use Table#schema to update the schema.
# File 'lib/google/cloud/bigquery/load_job.rb', line 290

def schema
  Schema.from_gapi(@gapi.configuration.load.schema).freeze
end
#schema_update_options ⇒ Array<String>
Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when write disposition is WRITE_APPEND; and when write disposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema. One or more of the following values are specified:
- ALLOW_FIELD_ADDITION: allow adding a nullable field to the schema.
- ALLOW_FIELD_RELAXATION: allow relaxing a required field in the original schema to nullable.
# File 'lib/google/cloud/bigquery/load_job.rb', line 310

def schema_update_options
  Array @gapi.configuration.load.schema_update_options
end
#skip_leading_rows ⇒ Integer
The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.
# File 'lib/google/cloud/bigquery/load_job.rb', line 97

def skip_leading_rows
  @gapi.configuration.load.skip_leading_rows || 0
end
#sources ⇒ Object
The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
# File 'lib/google/cloud/bigquery/load_job.rb', line 57

def sources
  Array @gapi.configuration.load.source_uris
end
#time_partitioning? ⇒ Boolean
Checks if the destination table will be time partitioned. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 541

def time_partitioning?
  !@gapi.configuration.load.time_partitioning.nil?
end
#time_partitioning_expiration ⇒ Integer?
The expiration for the destination table time partitions, if any, in seconds. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 585

def time_partitioning_expiration
  return nil unless time_partitioning?
  return nil if @gapi.configuration.load.time_partitioning.expiration_ms.nil?
  @gapi.configuration.load.time_partitioning.expiration_ms / 1_000
end
#time_partitioning_field ⇒ String?
The field on which the destination table will be time partitioned, if any. If not set, the destination table will be time partitioned by the pseudo column _PARTITIONTIME; if set, the table will be time partitioned by this field. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 571

def time_partitioning_field
  @gapi.configuration.load.time_partitioning.field if time_partitioning?
end
#time_partitioning_require_filter? ⇒ Boolean
If set to true, queries over the destination table will be required to specify a time partition filter that can be used for partition elimination. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 603

def time_partitioning_require_filter?
  tp = @gapi.configuration.load.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end
#time_partitioning_type ⇒ String?
The period for which the destination table will be time partitioned, if any. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 555

def time_partitioning_type
  @gapi.configuration.load.time_partitioning.type if time_partitioning?
end
#utf8? ⇒ Boolean
Checks if the character encoding of the data is UTF-8. This is the default.
# File 'lib/google/cloud/bigquery/load_job.rb', line 108

def utf8?
  val = @gapi.configuration.load.encoding
  return true if val.nil?
  val == "UTF-8"
end