Class: Google::Cloud::Bigquery::LoadJob
- Defined in:
- lib/google/cloud/bigquery/load_job.rb
Overview
LoadJob
A Job subclass representing a load operation that may be performed on a Table. A LoadJob instance is created when you call Table#load_job.
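For example, a load job can be started from a table and polled to completion along these lines (a minimal sketch; the dataset, table, and bucket names are placeholders):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"  # hypothetical dataset
table    = dataset.table "my_table"       # hypothetical table

# Table#load_job returns immediately; the load itself runs asynchronously.
load_job = table.load_job "gs://my-bucket/file.csv"

load_job.wait_until_done! # block until the job finishes
puts load_job.error["message"] if load_job.failed?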
Direct Known Subclasses
Updater
Defined Under Namespace
Classes: Updater
Attributes
-
#clustering? ⇒ Boolean?
Checks if the destination table will be clustered.
-
#clustering_fields ⇒ Array<String>?
One or more fields on which the destination table should be clustered.
-
#encryption ⇒ Google::Cloud::BigQuery::EncryptionConfiguration
The encryption configuration of the destination table.
-
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
-
#hive_partitioning_mode ⇒ String?
The mode of hive partitioning to use when reading data.
-
#hive_partitioning_source_uri_prefix ⇒ String?
The common prefix for all source URIs when hive partition detection is requested.
-
#output_bytes ⇒ Integer
The number of bytes that have been loaded into the table.
-
#range_partitioning? ⇒ Boolean
Checks if the destination table will be range partitioned.
-
#range_partitioning_end ⇒ Integer?
The end of range partitioning, exclusive.
-
#range_partitioning_field ⇒ String?
The field on which the destination table will be range partitioned, if any.
-
#range_partitioning_interval ⇒ Integer?
The width of each interval.
-
#range_partitioning_start ⇒ Integer?
The start of range partitioning, inclusive.
-
#time_partitioning? ⇒ Boolean?
Checks if the destination table will be time partitioned.
-
#time_partitioning_expiration ⇒ Integer?
The expiration for the destination table time partitions, if any, in seconds.
-
#time_partitioning_field ⇒ String?
The field on which the destination table will be time partitioned, if any.
-
#time_partitioning_require_filter? ⇒ Boolean
Checks if queries over the destination table must specify a time partition filter that can be used for partition elimination.
-
#time_partitioning_type ⇒ String?
The period for which the destination table will be time partitioned, if any.
Instance Method Summary
-
#allow_jagged_rows? ⇒ Boolean
Checks if the load operation accepts rows that are missing trailing optional columns.
-
#autodetect? ⇒ Boolean
Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources.
-
#backup? ⇒ Boolean
Checks if the source data is a Google Cloud Datastore backup.
-
#csv? ⇒ Boolean
Checks if the format of the source data is CSV.
-
#delimiter ⇒ String
The delimiter used between fields in the source data.
-
#destination ⇒ Table
The table into which the operation loads data.
-
#ignore_unknown_values? ⇒ Boolean
Checks if the load operation allows extra values that are not represented in the table schema.
-
#input_file_bytes ⇒ Integer
The number of bytes of source data in the load job.
-
#input_files ⇒ Integer
The number of source data files in the load job.
-
#iso8859_1? ⇒ Boolean
Checks if the character encoding of the data is ISO-8859-1.
-
#json? ⇒ Boolean
Checks if the format of the source data is newline-delimited JSON.
-
#max_bad_records ⇒ Integer
The maximum number of bad records that the load operation can ignore.
-
#null_marker ⇒ String
Specifies a string that represents a null value in a CSV file.
-
#orc? ⇒ Boolean
Checks if the source format is ORC.
-
#output_rows ⇒ Integer
The number of rows that have been loaded into the table.
-
#parquet? ⇒ Boolean
Checks if the source format is Parquet.
-
#quote ⇒ String
The value that is used to quote data sections in a CSV file.
-
#quoted_newlines? ⇒ Boolean
Checks if quoted data sections may contain newline characters in a CSV file.
-
#schema ⇒ Schema?
The schema for the destination table.
-
#schema_update_options ⇒ Array<String>
Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration.
-
#skip_leading_rows ⇒ Integer
The number of rows at the top of a CSV file that BigQuery will skip when loading the data.
-
#sources ⇒ Object
The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
-
#utf8? ⇒ Boolean
Checks if the character encoding of the data is UTF-8.
Methods inherited from Job
#cancel, #configuration, #created_at, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #num_child_jobs, #parent_job_id, #pending?, #project_id, #reload!, #rerun!, #running?, #script_statistics, #started_at, #state, #statistics, #status, #user_email, #wait_until_done!
Instance Method Details
#allow_jagged_rows? ⇒ Boolean
Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is false. Only applicable to CSV; ignored for other formats.

# File 'lib/google/cloud/bigquery/load_job.rb', line 252

def allow_jagged_rows?
  val = @gapi.configuration.load.allow_jagged_rows
  val = false if val.nil?
  val
end
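As a sketch, this behavior can be requested when starting the job; jagged_rows below is the keyword argument accepted by Table#load_job, and the file name is a placeholder:

load_job = table.load_job "gs://my-bucket/ragged.csv", jagged_rows: true
load_job.allow_jagged_rows? #=> true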
#autodetect? ⇒ Boolean
Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is false.

# File 'lib/google/cloud/bigquery/load_job.rb', line 183

def autodetect?
  val = @gapi.configuration.load.autodetect
  val = false if val.nil?
  val
end
#backup? ⇒ Boolean
Checks if the source data is a Google Cloud Datastore backup.
# File 'lib/google/cloud/bigquery/load_job.rb', line 218

def backup?
  @gapi.configuration.load.source_format == "DATASTORE_BACKUP"
end
#clustering? ⇒ Boolean?
Checks if the destination table will be clustered.
# File 'lib/google/cloud/bigquery/load_job.rb', line 571

def clustering?
  !@gapi.configuration.load.clustering.nil?
end
#clustering_fields ⇒ Array<String>?
One or more fields on which the destination table should be clustered. Must be specified with time-based partitioning; data in the table will first be partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.
See Google::Cloud::Bigquery::LoadJob::Updater#clustering_fields=.
# File 'lib/google/cloud/bigquery/load_job.rb', line 595

def clustering_fields
  @gapi.configuration.load.clustering.fields if clustering?
end
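A short sketch of inspecting clustering on an existing job (which fields appear depends on how the job was configured):

if load_job.clustering?
  load_job.clustering_fields.each { |field| puts field }
end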
#csv? ⇒ Boolean
Checks if the format of the source data is CSV. The default is true.

# File 'lib/google/cloud/bigquery/load_job.rb', line 206

def csv?
  val = @gapi.configuration.load.source_format
  return true if val.nil?
  val == "CSV"
end
#delimiter ⇒ String
The delimiter used between fields in the source data. The default is a comma (,).
# File 'lib/google/cloud/bigquery/load_job.rb', line 79

def delimiter
  @gapi.configuration.load.field_delimiter || ","
end
#destination ⇒ Table
The table into which the operation loads data. This is the table on which Table#load_job was invoked.
# File 'lib/google/cloud/bigquery/load_job.rb', line 67

def destination
  table = @gapi.configuration.load.destination_table
  return nil unless table
  retrieve_table table.project_id, table.dataset_id, table.table_id
end
#encryption ⇒ Google::Cloud::BigQuery::EncryptionConfiguration
The encryption configuration of the destination table.
# File 'lib/google/cloud/bigquery/load_job.rb', line 349

def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.load.destination_encryption_configuration
  )
end
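For instance, assuming the destination table was configured with a customer-managed key, that key can be read back from the returned configuration (kms_key is the attribute on EncryptionConfiguration):

puts load_job.encryption.kms_key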
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
# File 'lib/google/cloud/bigquery/load_job.rb', line 376

def hive_partitioning?
  !@gapi.configuration.load.hive_partitioning_options.nil?
end
#hive_partitioning_mode ⇒ String?
The mode of hive partitioning to use when reading data. The following modes are supported:

- AUTO: automatically infer partition key name(s) and type(s).
- STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
- CUSTOM: partition key schema is encoded in the source URI prefix.

# File 'lib/google/cloud/bigquery/load_job.rb', line 393

def hive_partitioning_mode
  @gapi.configuration.load.hive_partitioning_options.mode if hive_partitioning?
end
#hive_partitioning_source_uri_prefix ⇒ String?
The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either of gs://bucket/path_to_table or gs://bucket/path_to_table/ (the trailing slash does not matter).

# File 'lib/google/cloud/bigquery/load_job.rb', line 415

def hive_partitioning_source_uri_prefix
  @gapi.configuration.load.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
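A sketch of reading these options back from a job that loaded externally partitioned data (the values shown are illustrative):

if load_job.hive_partitioning?
  puts load_job.hive_partitioning_mode              #=> e.g. "AUTO"
  puts load_job.hive_partitioning_source_uri_prefix #=> e.g. "gs://bucket/path_to_table"
end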
#ignore_unknown_values? ⇒ Boolean
Checks if the load operation allows extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned. The default is false.

# File 'lib/google/cloud/bigquery/load_job.rb', line 268

def ignore_unknown_values?
  val = @gapi.configuration.load.ignore_unknown_values
  val = false if val.nil?
  val
end
#input_file_bytes ⇒ Integer
The number of bytes of source data in the load job.
# File 'lib/google/cloud/bigquery/load_job.rb', line 324

def input_file_bytes
  Integer @gapi.statistics.load.input_file_bytes
rescue StandardError
  nil
end
#input_files ⇒ Integer
The number of source data files in the load job.
# File 'lib/google/cloud/bigquery/load_job.rb', line 313

def input_files
  Integer @gapi.statistics.load.input_files
rescue StandardError
  nil
end
#iso8859_1? ⇒ Boolean
Checks if the character encoding of the data is ISO-8859-1.
# File 'lib/google/cloud/bigquery/load_job.rb', line 114

def iso8859_1?
  @gapi.configuration.load.encoding == "ISO-8859-1"
end
#json? ⇒ Boolean
Checks if the format of the source data is newline-delimited JSON. The default is false.

# File 'lib/google/cloud/bigquery/load_job.rb', line 196

def json?
  @gapi.configuration.load.source_format == "NEWLINE_DELIMITED_JSON"
end
#max_bad_records ⇒ Integer
The maximum number of bad records that the load operation can ignore. If the number of bad records exceeds this value, an error is returned. The default value is 0, which requires that all records be valid.

# File 'lib/google/cloud/bigquery/load_job.rb', line 140

def max_bad_records
  val = @gapi.configuration.load.max_bad_records
  val = 0 if val.nil?
  val
end
#null_marker ⇒ String
Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

# File 'lib/google/cloud/bigquery/load_job.rb', line 157

def null_marker
  val = @gapi.configuration.load.null_marker
  val = "" if val.nil?
  val
end
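Taken together, the CSV accessors describe how the source file is parsed. A minimal sketch of reading them back from a job, with the documented defaults shown as comments:

if load_job.csv?
  load_job.delimiter         #=> ","
  load_job.quote             #=> "\""
  load_job.skip_leading_rows #=> 0
  load_job.max_bad_records   #=> 0
  load_job.null_marker       #=> ""
end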
#orc? ⇒ Boolean
Checks if the source format is ORC.
# File 'lib/google/cloud/bigquery/load_job.rb', line 228

def orc?
  @gapi.configuration.load.source_format == "ORC"
end
#output_bytes ⇒ Integer
The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.
# File 'lib/google/cloud/bigquery/load_job.rb', line 361

def output_bytes
  Integer @gapi.statistics.load.output_bytes
rescue StandardError
  nil
end
#output_rows ⇒ Integer
The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.
# File 'lib/google/cloud/bigquery/load_job.rb', line 336

def output_rows
  Integer @gapi.statistics.load.output_rows
rescue StandardError
  nil
end
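The statistics accessors are typically read after the job completes; while the job is running they may be nil or still changing. A sketch:

load_job.wait_until_done!

load_job.input_files      # number of source files
load_job.input_file_bytes # bytes read from Cloud Storage
load_job.output_rows      # rows written to the destination table
load_job.output_bytes     # bytes written to the destination table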
#parquet? ⇒ Boolean
Checks if the source format is Parquet.
# File 'lib/google/cloud/bigquery/load_job.rb', line 238

def parquet?
  @gapi.configuration.load.source_format == "PARQUET"
end
#quote ⇒ String
The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, #quoted_newlines? should return true.

# File 'lib/google/cloud/bigquery/load_job.rb', line 127

def quote
  val = @gapi.configuration.load.quote
  val = "\"" if val.nil?
  val
end
#quoted_newlines? ⇒ Boolean
Checks if quoted data sections may contain newline characters in a CSV file. The default is false.

# File 'lib/google/cloud/bigquery/load_job.rb', line 170

def quoted_newlines?
  val = @gapi.configuration.load.allow_quoted_newlines
  val = false if val.nil?
  val
end
#range_partitioning? ⇒ Boolean
Checks if the destination table will be range partitioned. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 427

def range_partitioning?
  !@gapi.configuration.load.range_partitioning.nil?
end
#range_partitioning_end ⇒ Integer?
The end of range partitioning, exclusive. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 479

def range_partitioning_end
  @gapi.configuration.load.range_partitioning.range.end if range_partitioning?
end
#range_partitioning_field ⇒ String?
The field on which the destination table will be range partitioned, if any. The field must be a top-level NULLABLE/REQUIRED field. The only supported type is INTEGER/INT64. See Creating and using integer range partitioned tables.

# File 'lib/google/cloud/bigquery/load_job.rb', line 441

def range_partitioning_field
  @gapi.configuration.load.range_partitioning.field if range_partitioning?
end
#range_partitioning_interval ⇒ Integer?
The width of each interval. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 466

def range_partitioning_interval
  return nil unless range_partitioning?
  @gapi.configuration.load.range_partitioning.range.interval
end
#range_partitioning_start ⇒ Integer?
The start of range partitioning, inclusive. See Creating and using integer range partitioned tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 453

def range_partitioning_start
  @gapi.configuration.load.range_partitioning.range.start if range_partitioning?
end
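A sketch of reading the full range partitioning specification from a job (the partitioning column is whatever the job was configured with; "customer_id" is a placeholder):

if load_job.range_partitioning?
  load_job.range_partitioning_field    # partitioning column, e.g. "customer_id"
  load_job.range_partitioning_start    # inclusive lower bound
  load_job.range_partitioning_interval # width of each interval
  load_job.range_partitioning_end      # exclusive upper bound
end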
#schema ⇒ Schema?
The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
The returned object is frozen and changes are not allowed. Use Table#schema to update the schema.
# File 'lib/google/cloud/bigquery/load_job.rb', line 284

def schema
  Schema.from_gapi(@gapi.configuration.load.schema).freeze
end
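Although the returned schema is frozen, it can still be inspected field by field (Schema#fields, Field#name, and Field#type are part of this gem's schema API):

load_job.schema.fields.each do |field|
  puts "#{field.name} (#{field.type})"
end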
#schema_update_options ⇒ Array<String>
Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when the write disposition is WRITE_APPEND; and when the write disposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema. One or more of the following values are specified:

- ALLOW_FIELD_ADDITION: allow adding a nullable field to the schema.
- ALLOW_FIELD_RELAXATION: allow relaxing a required field in the original schema to nullable.

# File 'lib/google/cloud/bigquery/load_job.rb', line 304

def schema_update_options
  Array @gapi.configuration.load.schema_update_options
end
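A sketch of checking whether a given append job was allowed to widen the destination schema:

if load_job.schema_update_options.include? "ALLOW_FIELD_ADDITION"
  puts "this load may add nullable columns to the destination schema"
end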
#skip_leading_rows ⇒ Integer
The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.
# File 'lib/google/cloud/bigquery/load_job.rb', line 91

def skip_leading_rows
  @gapi.configuration.load.skip_leading_rows || 0
end
#sources ⇒ Object
The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
# File 'lib/google/cloud/bigquery/load_job.rb', line 57

def sources
  Array @gapi.configuration.load.source_uris
end
#time_partitioning? ⇒ Boolean?
Checks if the destination table will be time partitioned. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 492

def time_partitioning?
  !@gapi.configuration.load.time_partitioning.nil?
end
#time_partitioning_expiration ⇒ Integer?
The expiration for the destination table time partitions, if any, in seconds. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 536

def time_partitioning_expiration
  return nil unless time_partitioning?
  return nil if @gapi.configuration.load.time_partitioning.expiration_ms.nil?
  @gapi.configuration.load.time_partitioning.expiration_ms / 1_000
end
#time_partitioning_field ⇒ String?
The field on which the destination table will be time partitioned, if any.
If not set, the destination table will be time partitioned by pseudo column _PARTITIONTIME; if set, the table will be time partitioned by this field. See Partitioned Tables.

# File 'lib/google/cloud/bigquery/load_job.rb', line 522

def time_partitioning_field
  @gapi.configuration.load.time_partitioning.field if time_partitioning?
end
#time_partitioning_require_filter? ⇒ Boolean
Checks if queries over the destination table must specify a time partition filter that can be used for partition elimination. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 554

def time_partitioning_require_filter?
  tp = @gapi.configuration.load.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end
#time_partitioning_type ⇒ String?
The period for which the destination table will be time partitioned, if any. See Partitioned Tables.
# File 'lib/google/cloud/bigquery/load_job.rb', line 506

def time_partitioning_type
  @gapi.configuration.load.time_partitioning.type if time_partitioning?
end
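A sketch of reading the time partitioning configuration as a whole (the values in the comments are illustrative):

if load_job.time_partitioning?
  load_job.time_partitioning_type            #=> e.g. "DAY"
  load_job.time_partitioning_field           # nil means the _PARTITIONTIME pseudo column
  load_job.time_partitioning_expiration      # seconds, or nil for no expiration
  load_job.time_partitioning_require_filter? # true if queries must filter on the partition
end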
#utf8? ⇒ Boolean
Checks if the character encoding of the data is UTF-8. This is the default.
# File 'lib/google/cloud/bigquery/load_job.rb', line 102

def utf8?
  val = @gapi.configuration.load.encoding
  return true if val.nil?
  val == "UTF-8"
end