As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

google.cloud.bigquery.job.QueryJob¶

class google.cloud.bigquery.job.QueryJob(job_id, query, client, job_config=None)[source]¶

Asynchronous job: query tables.

Parameters

job_id (str) – the job’s ID, within the project belonging to client.
query (str) – SQL query string.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the query job.

__init__(job_id, query, client, job_config=None)[source]¶: Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`(job_id, query, client[, job_config])	Initialize self.
`add_done_callback`(fn)	Add a callback to be executed when the operation is complete.
`cancel`([client, retry, timeout])	API call: cancel job via a POST request
`cancelled`()	Check if the job has been cancelled.
`done`([retry, timeout])	Refresh the job and check whether it is complete.
`exception`([timeout])	Get the exception from the operation, blocking if necessary.
`exists`([client, retry, timeout])	API call: test for the existence of the job via a GET request
`from_api_repr`(resource, client)	Factory: construct a job given its API representation
`reload`([client, retry, timeout])	API call: refresh job properties via a GET request.
`result`([page_size, max_results, retry, …])	Start the job and wait for it to complete and get the result.
`running`()	True if the operation is currently running.
`set_exception`(exception)	Set the Future’s exception.
`set_result`(result)	Set the Future’s result.
`to_api_repr`()	Generate a resource for `_begin()`.
`to_arrow`([progress_bar_type, …])	[Beta] Create a pyarrow.Table by loading all pages of a table or query.
`to_dataframe`([bqstorage_client, dtypes, …])	Return a pandas DataFrame from a QueryJob

Attributes

`allow_large_results`	See `google.cloud.bigquery.job.QueryJobConfig.allow_large_results`.
`billing_tier`	Return billing tier from job statistics, if present.
`cache_hit`	Return whether or not query results were served from cache.
`clustering_fields`	See `google.cloud.bigquery.job.QueryJobConfig.clustering_fields`.
`create_disposition`	See `google.cloud.bigquery.job.QueryJobConfig.create_disposition`.
`created`	Datetime at which the job was created.
`ddl_operation_performed`	Return the DDL operation performed.
`ddl_target_routine`	Return the DDL target routine, present for CREATE/DROP FUNCTION/PROCEDURE queries.
`ddl_target_table`	Return the DDL target table, present for CREATE/DROP TABLE/VIEW queries.
`default_dataset`	See `google.cloud.bigquery.job.QueryJobConfig.default_dataset`.
`destination`	See `google.cloud.bigquery.job.QueryJobConfig.destination`.
`destination_encryption_configuration`	Custom encryption configuration for the destination table.
`dry_run`	See `google.cloud.bigquery.job.QueryJobConfig.dry_run`.
`ended`	Datetime at which the job finished.
`error_result`	Error information about the job as a whole.
`errors`	Information about individual errors generated by the job.
`estimated_bytes_processed`	Return the estimated number of bytes processed by the query.
`etag`	ETag for the job resource.
`flatten_results`	See `google.cloud.bigquery.job.QueryJobConfig.flatten_results`.
`job_id`	ID of the job.
`job_type`	Type of job.
`labels`	Labels for the job.
`location`	Location where the job runs.
`maximum_billing_tier`	See `google.cloud.bigquery.job.QueryJobConfig.maximum_billing_tier`.
`maximum_bytes_billed`	See `google.cloud.bigquery.job.QueryJobConfig.maximum_bytes_billed`.
`num_child_jobs`	The number of child jobs executed.
`num_dml_affected_rows`	Return the number of DML rows affected by the job.
`parent_job_id`	Return the ID of the parent job.
`path`	URL path for the job’s APIs.
`priority`	See `google.cloud.bigquery.job.QueryJobConfig.priority`.
`project`	Project bound to the job.
`query`	The query text used in this query job.
`query_parameters`	See `google.cloud.bigquery.job.QueryJobConfig.query_parameters`.
`query_plan`	Return query plan from job statistics, if present.
`range_partitioning`	See `google.cloud.bigquery.job.QueryJobConfig.range_partitioning`.
`referenced_tables`	Return referenced tables from job statistics, if present.
`schema_update_options`	See `google.cloud.bigquery.job.QueryJobConfig.schema_update_options`.
`script_statistics`
`self_link`	URL for the job resource.
`slot_millis`	Slot-milliseconds used by this query job.
`started`	Datetime at which the job was started.
`state`	Status of the job.
`statement_type`	Return statement type from job statistics, if present.
`table_definitions`	See `google.cloud.bigquery.job.QueryJobConfig.table_definitions`.
`time_partitioning`	See `google.cloud.bigquery.job.QueryJobConfig.time_partitioning`.
`timeline`	Return the query execution timeline from job statistics.
`total_bytes_billed`	Return total bytes billed from job statistics, if present.
`total_bytes_processed`	Return total bytes processed from job statistics, if present.
`udf_resources`	See `google.cloud.bigquery.job.QueryJobConfig.udf_resources`.
`undeclared_query_parameters`	Return undeclared query parameters from job statistics, if present.
`use_legacy_sql`	See `google.cloud.bigquery.job.QueryJobConfig.use_legacy_sql`.
`use_query_cache`	See `google.cloud.bigquery.job.QueryJobConfig.use_query_cache`.
`user_email`	E-mail address of user who submitted the job.
`write_disposition`	See `google.cloud.bigquery.job.QueryJobConfig.write_disposition`.

add_done_callback(fn)¶

Add a callback to be executed when the operation is complete.

If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.

Parameters: fn (Callable[Future]) – The callback to execute when the operation is complete.
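A minimal sketch of registering a completion callback. The helper name `track_completion` is hypothetical, and a standard-library `concurrent.futures.Future` stands in for the job so the snippet runs without credentials; it is assumed here that a QueryJob passes itself to the callback in the same way.

```python
from concurrent.futures import Future


def track_completion(job, completed):
    """Record `job` in the `completed` list once it finishes."""

    def _on_done(finished):
        # The callback receives the finished future as its only argument.
        completed.append(finished)

    job.add_done_callback(_on_done)


# Stand-in for a QueryJob (assumption: same add_done_callback interface).
stand_in = Future()
completed = []
track_completion(stand_in, completed)
stand_in.set_result(None)  # completing the future triggers the callback
```

With a real client, the same helper could be applied to the object returned by `client.query(sql)`.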

property allow_large_results¶: See google.cloud.bigquery.job.QueryJobConfig.allow_large_results.

property billing_tier¶

Return billing tier from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.billing_tier

Returns: Billing tier used by the job, or None if job is not yet complete.
Return type: Optional[int]

property cache_hit¶

Return whether or not query results were served from cache.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.cache_hit

Returns: whether the query results were returned from cache, or None if job is not yet complete.
Return type: Optional[bool]

cancel(client=None, retry=<google.api_core.retry.Retry object>, timeout=None)¶

API call: cancel job via a POST request

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel

Parameters

client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Boolean indicating that the cancel request was sent.

Return type

bool

cancelled()¶

Check if the job has been cancelled.

This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.

Returns: False
Return type: bool

property clustering_fields¶: See google.cloud.bigquery.job.QueryJobConfig.clustering_fields.

property create_disposition¶: See google.cloud.bigquery.job.QueryJobConfig.create_disposition.

property created¶

Datetime at which the job was created.

Returns: the creation time (None until set from the server).
Return type: Optional[datetime.datetime]

property ddl_operation_performed¶

Return the DDL operation performed.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_operation_performed

Type: Optional[str]

property ddl_target_routine¶

Return the DDL target routine, present for CREATE/DROP FUNCTION/PROCEDURE queries.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_target_routine

Type: Optional[google.cloud.bigquery.routine.RoutineReference]

property ddl_target_table¶

Return the DDL target table, present for CREATE/DROP TABLE/VIEW queries.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_target_table

Type: Optional[google.cloud.bigquery.table.TableReference]

property default_dataset¶: See google.cloud.bigquery.job.QueryJobConfig.default_dataset.

property destination¶: See google.cloud.bigquery.job.QueryJobConfig.destination.

property destination_encryption_configuration¶

Custom encryption configuration for the destination table.

Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.

See google.cloud.bigquery.job.QueryJobConfig.destination_encryption_configuration.

Type: google.cloud.bigquery.encryption_configuration.EncryptionConfiguration

done(retry=<google.api_core.retry.Retry object>, timeout=None)[source]¶

Refresh the job and check whether it is complete.

Parameters

retry (Optional[google.api_core.retry.Retry]) – How to retry the call that retrieves query results.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

True if the job is complete, False otherwise.

Return type

bool
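As an illustration, done() can drive a simple polling loop. This is a sketch only: `wait_until_done` and `_FakeJob` are hypothetical names, and the fake job merely imitates the boolean return value described above.

```python
import time


def wait_until_done(job, poll_seconds=1.0, max_polls=60):
    """Poll job.done() until it returns True or the poll budget runs out."""
    for _ in range(max_polls):
        if job.done():
            return True
        time.sleep(poll_seconds)
    return False


class _FakeJob:
    """Hypothetical stand-in that completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self._remaining = polls_needed

    def done(self):
        self._remaining -= 1
        return self._remaining <= 0


finished = wait_until_done(_FakeJob(3), poll_seconds=0.0)
timed_out = wait_until_done(_FakeJob(5), poll_seconds=0.0, max_polls=2)
```

When only the final rows are needed, calling result() is usually simpler, since it blocks until the job completes.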

property dry_run¶: See google.cloud.bigquery.job.QueryJobConfig.dry_run.

property ended¶

Datetime at which the job finished.

Returns: the end time (None until set from the server).
Return type: Optional[datetime.datetime]

property error_result¶

Error information about the job as a whole.

Returns: the error information (None until set from the server).
Return type: Optional[Mapping]

property errors¶

Information about individual errors generated by the job.

Returns: the error information (None until set from the server).
Return type: Optional[List[Mapping]]
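A hedged sketch of turning error_result and errors into readable messages. The helper `failure_messages` is hypothetical, and the `reason` and `message` keys are an assumption based on the error structure of the BigQuery REST API; a `SimpleNamespace` stands in for the job.

```python
from types import SimpleNamespace


def failure_messages(job):
    """Return one 'reason: message' string per error, or [] if none is set."""
    if job.error_result is None:
        return []
    # Prefer the detailed list; fall back to the job-level error.
    details = job.errors or [job.error_result]
    return [
        "%s: %s" % (err.get("reason", "unknown"), err.get("message", ""))
        for err in details
    ]


# Stand-ins exposing only the two attributes used above.
failed = SimpleNamespace(
    error_result={"reason": "invalidQuery", "message": "Syntax error"},
    errors=[{"reason": "invalidQuery", "message": "Syntax error"}],
)
pending = SimpleNamespace(error_result=None, errors=None)
```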

property estimated_bytes_processed¶

Return the estimated number of bytes processed by the query.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.estimated_bytes_processed

Returns: estimated number of bytes processed by the query, or None if job is not yet complete.
Return type: Optional[int]

property etag¶

ETag for the job resource.

Returns: the ETag (None until set from the server).
Return type: Optional[str]

exception(timeout=None)¶

Get the exception from the operation, blocking if necessary.

Parameters

timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.

Returns

The operation’s error.

Return type

Optional[google.api_core.GoogleAPICallError]
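For example, exception() allows a job's outcome to be inspected without a try/except block. The sketch below uses the hypothetical helper `outcome` and standard-library futures as stand-ins, on the assumption that a QueryJob follows the same Future interface.

```python
from concurrent.futures import Future


def outcome(job, timeout=None):
    """Return 'ok' or a 'failed: ...' string describing how the job ended."""
    error = job.exception(timeout=timeout)  # blocks until the job finishes
    if error is not None:
        return "failed: %s" % error
    return "ok"


# Stand-ins for a successful and a failed job.
ok_job = Future()
ok_job.set_result([])
bad_job = Future()
bad_job.set_exception(RuntimeError("quota exceeded"))
```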

exists(client=None, retry=<google.api_core.retry.Retry object>, timeout=None)¶

API call: test for the existence of the job via a GET request

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get

Parameters

client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Boolean indicating existence of the job.

Return type

bool

property flatten_results¶: See google.cloud.bigquery.job.QueryJobConfig.flatten_results.

classmethod from_api_repr(resource, client)[source]¶

Factory: construct a job given its API representation

Parameters

resource (Dict) – job representation returned from the API.
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.

Returns

Job parsed from resource.

Return type

google.cloud.bigquery.job.QueryJob

property job_id¶

ID of the job.

Type: str

property job_type¶

Type of job.

Returns: one of ‘load’, ‘copy’, ‘extract’, ‘query’.
Return type: str

property labels¶

Labels for the job.

Type: Dict[str, str]

property location¶

Location where the job runs.

Type: str

property maximum_billing_tier¶: See google.cloud.bigquery.job.QueryJobConfig.maximum_billing_tier.

property maximum_bytes_billed¶: See google.cloud.bigquery.job.QueryJobConfig.maximum_bytes_billed.

property num_child_jobs¶

The number of child jobs executed.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs

Returns: int

property num_dml_affected_rows¶

Return the number of DML rows affected by the job.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.num_dml_affected_rows

Returns: number of DML rows affected by the job, or None if job is not yet complete.
Return type: Optional[int]

property parent_job_id¶

Return the ID of the parent job.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id

Returns: parent job id.
Return type: Optional[str]

property path¶

URL path for the job’s APIs.

Returns: the path based on project and job ID.
Return type: str

property priority¶: See google.cloud.bigquery.job.QueryJobConfig.priority.

property project¶

Project bound to the job.

Returns: the project (derived from the client).
Return type: str

property query¶

The query text used in this query job.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.query

Type: str

property query_parameters¶: See google.cloud.bigquery.job.QueryJobConfig.query_parameters.

property query_plan¶

Return query plan from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.query_plan

Returns: mappings describing the query plan, or an empty list if the query has not yet completed.
Return type: List[QueryPlanEntry]
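A small sketch of summarizing the query plan. It assumes each QueryPlanEntry exposes `name` and `records_read` attributes (not stated on this page); `busiest_stages` is a hypothetical helper and the plan entries are simple stand-ins.

```python
from types import SimpleNamespace


def busiest_stages(job, top=3):
    """Return (name, records_read) pairs for the stages reading the most records."""
    stages = [(stage.name, stage.records_read or 0) for stage in job.query_plan]
    return sorted(stages, key=lambda pair: pair[1], reverse=True)[:top]


# Stand-in job with a three-stage plan; records_read may be None.
fake_job = SimpleNamespace(
    query_plan=[
        SimpleNamespace(name="S00: Input", records_read=1000),
        SimpleNamespace(name="S01: Join", records_read=5000),
        SimpleNamespace(name="S02: Output", records_read=None),
    ]
)
top_two = busiest_stages(fake_job, top=2)
```

Because query_plan is an empty list until the query completes, the helper returns an empty list for an unfinished job.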

property range_partitioning¶: See google.cloud.bigquery.job.QueryJobConfig.range_partitioning.

property referenced_tables¶

Return referenced tables from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.referenced_tables

Returns: mappings describing the referenced tables, or an empty list if the query has not yet completed.
Return type: List[Dict]

reload(client=None, retry=<google.api_core.retry.Retry object>, timeout=None)¶

API call: refresh job properties via a GET request.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get

Parameters

client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

result(page_size=None, max_results=None, retry=<google.api_core.retry.Retry object>, timeout=None, start_index=None)[source]¶

Start the job and wait for it to complete and get the result.

Parameters

page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored.
max_results (Optional[int]) – The maximum total number of rows from this request.
retry (Optional[google.api_core.retry.Retry]) – How to retry the call that retrieves rows.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
start_index (Optional[int]) – The zero-based index of the starting row to read.

Returns

Iterator of row data (Row instances). During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the result set (this is distinct from the total number of rows in the current page: iterator.page.num_items).

If the query is a special query that produces no results, e.g. a DDL query, an _EmptyRowIterator instance is returned.

Return type

google.cloud.bigquery.table.RowIterator

Raises

google.cloud.exceptions.GoogleCloudError – If the job failed.
concurrent.futures.TimeoutError – If the job did not complete in the given timeout.

running()¶: True if the operation is currently running.

property schema_update_options¶: See google.cloud.bigquery.job.QueryJobConfig.schema_update_options.

property self_link¶

URL for the job resource.

Returns: the URL (None until set from the server).
Return type: Optional[str]

set_exception(exception)¶: Set the Future’s exception.

set_result(result)¶: Set the Future’s result.

property slot_millis¶

Slot-milliseconds used by this query job.

Type: Union[int, None]

property started¶

Datetime at which the job was started.

Returns: the start time (None until set from the server).
Return type: Optional[datetime.datetime]
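Since started and ended are both None until set from the server, code that derives a run time has to guard against missing values. A sketch with a hypothetical helper `run_time` and a stand-in job object:

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def run_time(job):
    """Return ended - started as a timedelta, or None if either is unset."""
    if job.started is None or job.ended is None:
        return None
    return job.ended - job.started


start = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
finished_job = SimpleNamespace(started=start, ended=start + timedelta(seconds=90))
running_job = SimpleNamespace(started=start, ended=None)
```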

property state¶

Status of the job.

Returns: the state (None until set from the server).
Return type: Optional[str]

property statement_type¶

Return statement type from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.statement_type

Returns: type of statement used by the job, or None if job is not yet complete.
Return type: Optional[str]

property table_definitions¶: See google.cloud.bigquery.job.QueryJobConfig.table_definitions.

property time_partitioning¶: See google.cloud.bigquery.job.QueryJobConfig.time_partitioning.

property timeline¶

Return the query execution timeline from job statistics.

Type: List[TimelineEntry]

to_api_repr()[source]¶: Generate a resource for _begin().

to_arrow(progress_bar_type=None, bqstorage_client=None, create_bqstorage_client=True)[source]¶

[Beta] Create a pyarrow.Table by loading all pages of a table or query.

Parameters

progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

Possible values of progress_bar_type include:

None
No progress bar.

'tqdm'
Use the tqdm.tqdm() function to print a progress bar to sys.stderr.

'tqdm_notebook'
Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.

'tqdm_gui'
Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.

This method requires the pyarrow and google-cloud-bigquery-storage libraries.

Reading from a specific partition or snapshot is not currently supported by this method.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.

Returns

pyarrow.Table: A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Raises

ValueError – If the pyarrow library cannot be imported.

New in version 1.17.0.

to_dataframe(bqstorage_client=None, dtypes=None, progress_bar_type=None, create_bqstorage_client=True, date_as_object=True)[source]¶

Return a pandas DataFrame from a QueryJob

Parameters

bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.

This method requires the fastavro and google-cloud-bigquery-storage libraries.

Reading from a specific partition or snapshot is not currently supported by this method.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

See google.cloud.bigquery.table.RowIterator.to_dataframe() for details.

New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.
date_as_object (Optional[bool]) –
If True (default), cast dates to objects. If False, convert to datetime64[ns] dtype.

New in version 1.26.0.

Returns

A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Raises

ValueError – If the pandas library cannot be imported.
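The following sketch shows how the dtypes and create_bqstorage_client arguments might be combined. `load_frame` is a hypothetical wrapper, and `_RecordingJob` is a stand-in that only echoes the keyword arguments it receives, so the snippet runs without pandas or credentials.

```python
def load_frame(job, float_columns=()):
    """Call to_dataframe() with float32 columns and no Storage API client."""
    dtypes = {name: "float32" for name in float_columns}
    return job.to_dataframe(dtypes=dtypes, create_bqstorage_client=False)


class _RecordingJob:
    """Hypothetical stand-in that returns the keyword arguments it was given."""

    def to_dataframe(self, **kwargs):
        return kwargs


passed = load_frame(_RecordingJob(), float_columns=["price"])
```

Passing create_bqstorage_client=False keeps the download on the regular API, which avoids the billable BigQuery Storage API at the cost of slower transfers.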

property total_bytes_billed¶

Return total bytes billed from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.total_bytes_billed

Returns: Total bytes billed for the job, or None if job is not yet complete.
Return type: Optional[int]

property total_bytes_processed¶

Return total bytes processed from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.total_bytes_processed

Returns: Total bytes processed by the job, or None if job is not yet complete.
Return type: Optional[int]
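Because these statistics are None until the job completes, reporting code should tolerate missing values. A sketch with the hypothetical helper `usage_summary` and a stand-in job object:

```python
from types import SimpleNamespace


def usage_summary(job):
    """Summarize cache use and byte counts (in GiB), keeping None for unset values."""

    def _gib(num_bytes):
        return None if num_bytes is None else num_bytes / 2**30

    return {
        "cache_hit": job.cache_hit,
        "processed_gib": _gib(job.total_bytes_processed),
        "billed_gib": _gib(job.total_bytes_billed),
    }


done_job = SimpleNamespace(
    cache_hit=False, total_bytes_processed=2**31, total_bytes_billed=2**30
)
new_job = SimpleNamespace(
    cache_hit=None, total_bytes_processed=None, total_bytes_billed=None
)
```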

property udf_resources¶: See google.cloud.bigquery.job.QueryJobConfig.udf_resources.

property undeclared_query_parameters¶

Return undeclared query parameters from job statistics, if present.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.undeclared_query_parameters

Returns: Undeclared parameters, or an empty list if the query has not yet completed.
Return type: List[Union[ google.cloud.bigquery.query.ArrayQueryParameter, google.cloud.bigquery.query.ScalarQueryParameter, google.cloud.bigquery.query.StructQueryParameter ]]

property use_legacy_sql¶: See google.cloud.bigquery.job.QueryJobConfig.use_legacy_sql.

property use_query_cache¶: See google.cloud.bigquery.job.QueryJobConfig.use_query_cache.

property user_email¶

E-mail address of user who submitted the job.

Returns: the e-mail address (None until set from the server).
Return type: Optional[str]

property write_disposition¶: See google.cloud.bigquery.job.QueryJobConfig.write_disposition.
