As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

google.cloud.bigquery.job.ExtractJob

class google.cloud.bigquery.job.ExtractJob(job_id, source, destination_uris, client, job_config=None)

Asynchronous job: extract data from a table into Cloud Storage.

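Parameters
  • job_id (str) – ID of the job.

  • source (Union[google.cloud.bigquery.table.TableReference, google.cloud.bigquery.model.ModelReference]) – Table or Model from which data is to be extracted.

  • destination_uris (List[str]) – URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.

  • client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration.

  • job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – The configuration for this extract job.

__init__(job_id, source, destination_uris, client, job_config=None)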

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(job_id, source, destination_uris, …)

Initialize self.

add_done_callback(fn)

Add a callback to be executed when the operation is complete.

cancel([client, retry, timeout])

API call: cancel job via a POST request

cancelled()

Check if the job has been cancelled.

done([retry, timeout, reload])

Checks if the job is complete.

exception([timeout])

Get the exception from the operation, blocking if necessary.

exists([client, retry, timeout])

API call: test for the existence of the job via a GET request

from_api_repr(resource, client)

Factory: construct a job given its API representation

reload([client, retry, timeout])

API call: refresh job properties via a GET request.

result([retry, timeout])

Start the job and wait for it to complete and get the result.

running()

True if the operation is currently running.

set_exception(exception)

Set the Future’s exception.

set_result(result)

Set the Future’s result.

to_api_repr()

Generate a resource for _begin().

Attributes

compression

See google.cloud.bigquery.job.ExtractJobConfig.compression.

configuration

The configuration for this extract job.

created

Datetime at which the job was created.

destination_format

See google.cloud.bigquery.job.ExtractJobConfig.destination_format.

destination_uri_file_counts

Return file counts from job statistics, if present.

destination_uris

URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.

ended

Datetime at which the job finished.

error_result

Error information about the job as a whole.

errors

Information about individual errors generated by the job.

etag

ETag for the job resource.

field_delimiter

See google.cloud.bigquery.job.ExtractJobConfig.field_delimiter.

job_id

ID of the job.

job_type

Type of job.

labels

Labels for the job.

location

Location where the job runs.

num_child_jobs

The number of child jobs executed.

parent_job_id

Return the ID of the parent job.

path

URL path for the job’s APIs.

print_header

See google.cloud.bigquery.job.ExtractJobConfig.print_header.

project

Project bound to the job.

reservation_usage

Job resource usage breakdown by reservation.

script_statistics

Statistics for a child job of a script.

self_link

URL for the job resource.

session_info

[Preview] Information of the session if this job is part of one.

source

Table or Model from which data is to be loaded or extracted.

started

Datetime at which the job was started.

state

Status of the job.

transaction_info

Information of the multi-statement transaction if this job is part of one.

user_email

E-mail address of user who submitted the job.

add_done_callback(fn)

Add a callback to be executed when the operation is complete.

If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.

Parameters

fn (Callable[Future]) – The callback to execute when the operation is complete.
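
The callback pattern can be sketched as follows. This is only an illustration: `_StubJob` is a hypothetical stand-in for an `ExtractJob`, so the snippet runs without credentials, and it calls the callback immediately instead of polling from a helper thread.

```python
from typing import Callable, List


class _StubJob:
    """Hypothetical stand-in for ExtractJob; not part of the library."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.error_result = None  # a finished job that reported no error

    def add_done_callback(self, fn: Callable[["_StubJob"], None]) -> None:
        # The real job polls in a background thread; the stub is already
        # "complete", so it invokes the callback right away.
        fn(self)


finished: List[str] = []


def on_done(job) -> None:
    # The callback receives the completed future, i.e. the job itself.
    status = "failed" if job.error_result else "succeeded"
    finished.append(f"{job.job_id} {status}")


_StubJob("extract-123").add_done_callback(on_done)
```

Because the callback may run on a helper thread with a real job, it is safest to keep it short and thread-safe.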

cancel(client=None, retry: Optional[google.api_core.retry.Retry] = <google.api_core.retry.Retry object>, timeout: Optional[float] = None) → bool

API call: cancel job via a POST request

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel

Parameters
  • client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Boolean indicating that the cancel request was sent.

Return type

bool

cancelled()

Check if the job has been cancelled.

This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.

Returns

False

Return type

bool

property compression

See google.cloud.bigquery.job.ExtractJobConfig.compression.

property configuration: google.cloud.bigquery.job.extract.ExtractJobConfig

The configuration for this extract job.

property created

Datetime at which the job was created.

Returns

the creation time (None until set from the server).

Return type

Optional[datetime.datetime]

property destination_format

See google.cloud.bigquery.job.ExtractJobConfig.destination_format.

property destination_uri_file_counts

Return file counts from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics4.FIELDS.destination_uri_file_counts

Returns

A list of integer counts, each representing the number of files per destination URI or URI pattern specified in the extract configuration. These values will be in the same order as the URIs specified in the ‘destinationUris’ field. Returns None if job is not yet complete.

Return type

Optional[List[int]]
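
Since the counts come back in the same order as the destination URIs, the two lists can be paired up. A minimal sketch, where `files_per_uri` is a hypothetical helper and the URIs and counts are made-up sample values:

```python
from typing import Dict, List, Optional


def files_per_uri(uris: List[str], counts: Optional[List[int]]) -> Dict[str, int]:
    """Map each destination URI to its file count.

    Returns an empty dict when the counts are None, which is the case
    while the job has not completed yet.
    """
    if counts is None:
        return {}
    # Counts are ordered like the URIs, so zip pairs them correctly.
    return dict(zip(uris, counts))


uris = ["gs://my-bucket/a-*.csv", "gs://my-bucket/b.csv"]
pending = files_per_uri(uris, None)
complete = files_per_uri(uris, [3, 1])
```

With a real job, the arguments would be `job.destination_uris` and `job.destination_uri_file_counts`.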

property destination_uris

URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.

Type

List[str]
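
To illustrate the URI format, the sketch below splits a destination URI into its bucket and object name (or glob). `split_gcs_uri` is a hypothetical helper, not part of the library, and the bucket name is a placeholder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://<bucket_name>/<object_name_or_glob>' into its two parts."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object.
    bucket, sep, name = uri[len(prefix):].partition("/")
    if not bucket or not sep or not name:
        raise ValueError(f"missing bucket or object name: {uri!r}")
    return bucket, name


example = split_gcs_uri("gs://my-bucket/exports/part-*.csv")
```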

done(retry: google.api_core.retry.Retry = <google.api_core.retry.Retry object>, timeout: Optional[float] = None, reload: bool = True) → bool

Checks if the job is complete.

Parameters
  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

  • reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.

Returns

True if the job is complete, False otherwise.

Return type

bool

property ended

Datetime at which the job finished.

Returns

the end time (None until set from the server).

Return type

Optional[datetime.datetime]

property error_result

Error information about the job as a whole.

Returns

the error information (None until set from the server).

Return type

Optional[Mapping]

property errors

Information about individual errors generated by the job.

Returns

the error information (None until set from the server).

Return type

Optional[List[Mapping]]
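
The two error properties can be combined into a short report. The sketch assumes each mapping carries the `reason` and `message` keys used by the REST API's error objects; `summarize_errors` is a hypothetical helper and the sample error is invented for illustration.

```python
from typing import List, Mapping, Optional


def summarize_errors(
    error_result: Optional[Mapping[str, str]],
    errors: Optional[List[Mapping[str, str]]],
) -> List[str]:
    """Return human-readable lines for a job's overall and individual errors."""
    lines = []
    if error_result is not None:
        # error_result describes why the job as a whole failed.
        lines.append(
            f"job failed: {error_result.get('reason')}: {error_result.get('message')}"
        )
    # errors may be None until the server has populated it.
    for err in errors or []:
        lines.append(f"- {err.get('reason')}: {err.get('message')}")
    return lines


no_errors = summarize_errors(None, None)
sample = {"reason": "notFound", "message": "Not found: Table my-project:ds.tbl"}
report = summarize_errors(sample, [sample])
```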

property etag

ETag for the job resource.

Returns

the ETag (None until set from the server).

Return type

Optional[str]

exception(timeout=<object object>)

Get the exception from the operation, blocking if necessary.

See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.

Parameters
  • timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.

Returns

The operation’s error.

Return type

Optional[google.api_core.GoogleAPICallError]

exists(client=None, retry: google.api_core.retry.Retry = <google.api_core.retry.Retry object>, timeout: Optional[float] = None) → bool

API call: test for the existence of the job via a GET request

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get

Parameters
  • client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Boolean indicating existence of the job.

Return type

bool

property field_delimiter

See google.cloud.bigquery.job.ExtractJobConfig.field_delimiter.

classmethod from_api_repr(resource: dict, client) → google.cloud.bigquery.job.extract.ExtractJob

Factory: construct a job given its API representation

Note

This method assumes that the project found in the resource matches the client’s project.

Parameters
  • resource (Dict) – job representation returned from the API.

  • client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the job.

Returns

Job parsed from resource.

Return type

google.cloud.bigquery.job.ExtractJob
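
For orientation, the sketch below builds a dict shaped like the REST API's Job resource for an extract job. The field names follow the REST reference as far as we know; `build_extract_resource` and all identifiers passed to it are hypothetical. The call to `from_api_repr` is left as a comment because it needs the library and a configured client.

```python
from typing import Any, Dict, List


def build_extract_resource(
    project: str, job_id: str, dataset: str, table: str, uris: List[str]
) -> Dict[str, Any]:
    """Build a dict resembling the API representation of an extract job."""
    return {
        "jobReference": {"projectId": project, "jobId": job_id},
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "destinationUris": list(uris),
            }
        },
    }


resource = build_extract_resource(
    "my-project", "extract-123", "my_dataset", "my_table",
    ["gs://my-bucket/export-*.csv"],
)
# With the library installed and a client for "my-project", the job could
# then be rebuilt with: ExtractJob.from_api_repr(resource, client)
```

The project in `jobReference` is kept identical to the client's project, in line with the note above.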

property job_id

ID of the job.

Type

str

property job_type

Type of job.

Returns

one of ‘load’, ‘copy’, ‘extract’, ‘query’.

Return type

str

property labels

Labels for the job.

Type

Dict[str, str]

property location

Location where the job runs.

Type

str

property num_child_jobs

The number of child jobs executed.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs

Returns

the number of child jobs executed.

Return type

int

property parent_job_id

Return the ID of the parent job.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id

Returns

parent job id.

Return type

Optional[str]

property path

URL path for the job’s APIs.

Returns

the path based on project and job ID.

Return type

str

property print_header

See google.cloud.bigquery.job.ExtractJobConfig.print_header.

property project

Project bound to the job.

Returns

the project (derived from the client).

Return type

str

reload(client=None, retry: google.api_core.retry.Retry = <google.api_core.retry.Retry object>, timeout: Optional[float] = None)

API call: refresh job properties via a GET request.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get

Parameters
  • client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

property reservation_usage

Job resource usage breakdown by reservation.

Returns

Reservation usage stats. Can be empty if not set from the server.

Return type

List[google.cloud.bigquery.job.ReservationUsage]

result(retry: Optional[google.api_core.retry.Retry] = <google.api_core.retry.Retry object>, timeout: Optional[float] = None) → google.cloud.bigquery.job.base._AsyncJob

Start the job and wait for it to complete and get the result.

Parameters
  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.

Returns

This instance.

Return type

_AsyncJob

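Raises
  • google.cloud.exceptions.GoogleAPICallError – if the job failed.

  • concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
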
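
A hedged sketch of waiting on a job with `result()`. `_StubJob` is a hypothetical stand-in for an `ExtractJob`, and the snippet assumes a timeout surfaces as `concurrent.futures.TimeoutError`; `wait_for_extract` is an illustrative helper, not library API.

```python
import concurrent.futures


class _StubJob:
    """Hypothetical stand-in for ExtractJob; not part of the library."""

    def __init__(self, finishes: bool) -> None:
        self._finishes = finishes
        self.state = "RUNNING"

    def result(self, timeout=None):
        if not self._finishes:
            # Assumed behaviour when the job does not complete in time.
            raise concurrent.futures.TimeoutError()
        self.state = "DONE"
        return self  # result() returns the job instance itself


def wait_for_extract(job, timeout: float) -> str:
    """Wait for the job and report its final state, or 'timed out'."""
    try:
        finished = job.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return "timed out"
    return finished.state


ok = wait_for_extract(_StubJob(finishes=True), timeout=30.0)
late = wait_for_extract(_StubJob(finishes=False), timeout=30.0)
```
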
running()

True if the operation is currently running.

property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]

Statistics for a child job of a script.

property self_link

URL for the job resource.

Returns

the URL (None until set from the server).

Return type

Optional[str]

property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]

[Preview] Information of the session if this job is part of one.

New in version 2.29.0.

set_exception(exception)

Set the Future’s exception.

set_result(result)

Set the Future’s result.

property source

Table or Model from which data is to be loaded or extracted.

Type

Union[ google.cloud.bigquery.table.TableReference, google.cloud.bigquery.model.ModelReference ]

property started

Datetime at which the job was started.

Returns

the start time (None until set from the server).

Return type

Optional[datetime.datetime]
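
Because `started` and `ended` are both None until the server sets them, any run-time calculation has to allow for missing values. A minimal sketch, where `run_time` is a hypothetical helper and the timestamps are sample values:

```python
import datetime
from typing import Optional


def run_time(
    started: Optional[datetime.datetime],
    ended: Optional[datetime.datetime],
) -> Optional[datetime.timedelta]:
    """Return how long the job ran, or None if either timestamp is missing."""
    if started is None or ended is None:
        return None
    return ended - started


utc = datetime.timezone.utc
started = datetime.datetime(2021, 1, 1, 12, 0, 0, tzinfo=utc)
ended = datetime.datetime(2021, 1, 1, 12, 0, 42, tzinfo=utc)
elapsed = run_time(started, ended)
still_running = run_time(started, None)
```

With a real job the call would be `run_time(job.started, job.ended)`.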

property state

Status of the job.

Returns

the state (None until set from the server).

Return type

Optional[str]

to_api_repr()

Generate a resource for _begin().

property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]

Information of the multi-statement transaction if this job is part of one.

Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.

New in version 2.24.0.

property user_email

E-mail address of user who submitted the job.

Returns

the e-mail address (None until set from the server).

Return type

Optional[str]