API Reference¶
The main concepts with this API are:
- Client manages connections to the BigQuery API. Use the client methods to run jobs (such as a QueryJob via query()) and manage resources.
- Dataset represents a collection of tables.
- Table represents a single “relation”.
Client¶
Client for interacting with the Google BigQuery API.
- class google.cloud.bigquery.client.Client(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, default_load_job_config=None, client_info=None, client_options=None)[source]¶
Client to bundle configuration needed for API requests.
- Parameters
project (Optional[str]) – Project ID for the project which the client acts on behalf of. Will be passed when creating a dataset / job. If not passed, falls back to the default inferred from the environment.
credentials (Optional[google.auth.credentials.Credentials]) – The OAuth2 credentials to use for this client. If not passed (and if no _http object is passed), falls back to the default inferred from the environment.
_http (Optional[requests.Session]) – HTTP object to make requests. Can be any object that defines request() with the same interface as requests.Session.request(). If not passed, an _http object is created that is bound to the credentials for the current object. This parameter should be considered private, and could change in the future.
location (Optional[str]) – Default location for jobs / datasets / tables.
default_query_job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Default QueryJobConfig. Will be merged into job configs passed into the query method.
default_load_job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Default LoadJobConfig. Will be merged into job configs passed into the load_table_* methods.
client_info (Optional[google.api_core.client_info.ClientInfo]) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own library or partner tool.
client_options (Optional[Union[google.api_core.client_options.ClientOptions, Dict]]) – Client options used to set user options on the client. The API endpoint should be set through client_options.
- Raises
google.auth.exceptions.DefaultCredentialsError – Raised if credentials is not specified and the library fails to acquire default credentials.
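Example (a minimal sketch; the project ID is hypothetical and Application Default Credentials are assumed to be configured in the environment):
from google.cloud import bigquery

# Credentials are inferred from the environment when not passed explicitly.
client = bigquery.Client(
    project="my-project",  # hypothetical project ID
    location="US",         # default location for new jobs / datasets / tables
)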
- SCOPE: Optional[Tuple[str, ...]] = ('https://www.googleapis.com/auth/cloud-platform',)¶
The scopes required for authenticating as a BigQuery consumer.
- cancel_job(job_id: str, project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob] [source]¶
Attempt to cancel a job from a job ID.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
job_id (Union[ str, google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]) – Job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob, ]
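Example (a minimal sketch; the job ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
# Cancellation is best effort: the job may already have completed by the
# time the cancel request is processed.
job = client.cancel_job("bq_job_123", location="US")
print(job.job_id, job.state)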
- close()[source]¶
Close the underlying transport objects, releasing system resources.
Note
The client instance can be used for making additional requests even after closing, in which case the underlying connections are automatically re-created.
- copy_table(sources: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, typing.Sequence[typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]]], destination: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.copy_.CopyJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.copy_.CopyJob [source]¶
Copy one or more tables to another table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationtablecopy
- Parameters
sources (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, Sequence[ Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ] ], ]) – Table or tables to be copied.
destination (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – Table into which data is to be copied.
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of any source table as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new copy job instance.
- Return type
google.cloud.bigquery.job.CopyJob
- Raises
TypeError – If job_config is not an instance of CopyJobConfig class.
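Example (a minimal sketch; table IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
copy_job = client.copy_table(
    "my-project.my_dataset.source_table",
    "my-project.my_dataset.destination_table",
)
copy_job.result()  # wait for the copy to complete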
- create_dataset(dataset: typing.Union[str, google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem], exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
API call: create the dataset via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert
Example:
from google.cloud import bigquery
client = bigquery.Client()
dataset = bigquery.Dataset('my_project.my_dataset')
dataset = client.create_dataset(dataset)
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A Dataset to create. If dataset is a reference, an empty dataset is created with the specified ID and client’s default location.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Dataset returned from the API.
- Return type
google.cloud.bigquery.dataset.Dataset
- Raises
google.cloud.exceptions.Conflict – If the dataset already exists.
- create_job(job_config: dict, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob] [source]¶
Create a new job.
- Parameters
job_config (dict) – Job configuration in the dict representation returned from the API.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new job instance.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]
- create_routine(routine: google.cloud.bigquery.routine.routine.Routine, exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Create a routine via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/insert
- Parameters
routine (google.cloud.bigquery.routine.Routine) – A Routine to create. The dataset that the routine belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the routine.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Routine returned from the service.
- Return type
google.cloud.bigquery.routine.Routine
- Raises
google.cloud.exceptions.Conflict – If the routine already exists.
- create_table(table: typing.Union[str, google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem], exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
API call: create a table via a PUT request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A Table to create. If table is a reference, an empty table is created with the specified ID. The dataset that the table belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the table.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Table returned from the service.
- Return type
google.cloud.bigquery.table.Table
- Raises
google.cloud.exceptions.Conflict – If the table already exists.
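Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER"),
]
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
table = client.create_table(table)  # raises Conflict if the table exists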
- dataset(dataset_id: str, project: Optional[str] = None) google.cloud.bigquery.dataset.DatasetReference [source]¶
Deprecated: Construct a reference to a dataset.
Deprecated since version 1.24.0: Construct a DatasetReference using its constructor or use a string where previously a reference object was used.
As of google-cloud-bigquery version 1.7.0, all client methods that take a DatasetReference or TableReference also take a string in standard SQL format, e.g. project.dataset_id or project.dataset_id.table_id.
- Parameters
dataset_id (str) – ID of the dataset.
project (Optional[str]) – Project ID for the dataset (defaults to the client’s project).
- Returns
A new DatasetReference instance.
- Return type
google.cloud.bigquery.dataset.DatasetReference
- property default_load_job_config¶
Default LoadJobConfig. Will be merged into job configs passed into the load_table_* methods.
- property default_query_job_config: Optional[google.cloud.bigquery.job.query.QueryJobConfig]¶
Default QueryJobConfig or None.
Will be merged into job configs passed into the query or query_and_wait methods.
- delete_dataset(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], delete_contents: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
Delete a dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/delete
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset to delete. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
delete_contents (Optional[bool]) – If True, delete all the tables in the dataset. If False and the dataset contains tables, the request will fail. Default is False.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the dataset.
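Example (a minimal sketch; the dataset ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
# Remove the dataset and any tables it contains; do not fail if it is
# already gone.
client.delete_dataset("my-project.my_dataset", delete_contents=True, not_found_ok=True)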
- delete_job_metadata(job_id: typing.Union[str, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob], project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False)[source]¶
[Beta] Delete job metadata from job history.
Note: This does not stop a running job. Use cancel_job() instead.
- Parameters
job_id (Union[ str, LoadJob, CopyJob, ExtractJob, QueryJob ]) – Job or job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the job.
- delete_model(model: typing.Union[google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
[Beta] Delete a model
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/delete
- Parameters
model (Union[ google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to delete. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the model.
- delete_routine(routine: typing.Union[google.cloud.bigquery.routine.routine.Routine, google.cloud.bigquery.routine.routine.RoutineReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
[Beta] Delete a routine.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/delete
- Parameters
routine (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to delete. If a string is passed in, this method attempts to create a routine reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the routine.
- delete_table(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
Delete a table
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/delete
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A reference to the table to delete. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the table.
- extract_table(source: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str], destination_uris: typing.Union[str, typing.Sequence[str]], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.extract.ExtractJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, source_type: str = 'Table') google.cloud.bigquery.job.extract.ExtractJob [source]¶
Start a job to extract a table into Cloud Storage files.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract
- Parameters
source (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – Table or Model to be extracted.
destination_uris (Union[str, Sequence[str]]) – URIs of Cloud Storage file(s) into which table data is to be extracted; in format gs://<bucket_name>/<object_name_or_glob>.
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the source table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
source_type (Optional[str]) – Type of source to be extracted. Table or Model. Defaults to Table.
- Returns
A new extract job instance.
- Return type
google.cloud.bigquery.job.ExtractJob
- Raises
TypeError – If job_config is not an instance of ExtractJobConfig class.
ValueError – If source_type is not among Table, Model.
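Example (a minimal sketch; the table ID and bucket are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-bucket/exports/my_table-*.csv",  # wildcard allows sharded output
    location="US",
)
extract_job.result()  # wait for the export to finish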
- classmethod from_service_account_info(info, *args, **kwargs)¶
Factory to retrieve JSON credentials while creating client.
- Parameters
info (dict) – The JSON object with the credentials information (the parsed contents of a service account key file).
args (tuple) – Remaining positional arguments to pass to constructor.
kwargs – Remaining keyword arguments to pass to constructor.
- Return type
_ClientFactoryMixin
- Returns
The client created with the retrieved JSON credentials.
- Raises
TypeError – if there is a conflict with the kwargs and the credentials created by the factory.
- classmethod from_service_account_json(json_credentials_path, *args, **kwargs)¶
Factory to retrieve JSON credentials while creating client.
- Parameters
json_credentials_path (str) – The path to a private key file (this file was given to you when you created the service account). This file must contain a JSON object with a private key and other credentials information (downloaded from the Google APIs console).
args (tuple) – Remaining positional arguments to pass to constructor.
kwargs – Remaining keyword arguments to pass to constructor.
- Return type
_ClientFactoryMixin
- Returns
The client created with the retrieved JSON credentials.
- Raises
TypeError – if there is a conflict with the kwargs and the credentials created by the factory.
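Example (a minimal sketch; the key file path is hypothetical):
from google.cloud import bigquery

# Construct a client from a downloaded service account key file.
client = bigquery.Client.from_service_account_json("/path/to/service-account-key.json")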
- get_dataset(dataset_ref: typing.Union[google.cloud.bigquery.dataset.DatasetReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
Fetch the dataset referenced by dataset_ref.
- Parameters
dataset_ref (Union[ google.cloud.bigquery.dataset.DatasetReference, str, ]) – A reference to the dataset to fetch from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Dataset instance.
- Return type
google.cloud.bigquery.dataset.Dataset
- get_iam_policy(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], requested_policy_version: int = 1, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.api_core.iam.Policy [source]¶
Return the access control policy for a table resource.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – The table to get the access control policy for. If a string is passed in, this method attempts to create a table reference from a string using from_string().
requested_policy_version (int) – Optional. The maximum policy version that will be used to format the policy. Only version 1 is currently supported. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/GetPolicyOptions
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The access control policy.
- Return type
google.api_core.iam.Policy
- get_job(job_id: typing.Union[str, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob], project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob, google.cloud.bigquery.job.base.UnknownJob] [source]¶
Fetch a job for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
job_id (Union[ str, job.LoadJob, job.CopyJob, job.ExtractJob, job.QueryJob ]) – Job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[job.LoadJob, job.CopyJob, job.ExtractJob, job.QueryJob, job.UnknownJob]
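Example (a minimal sketch; the job ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
job = client.get_job("bq_job_123", location="US")
print(job.job_type, job.state)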
- get_model(model_ref: typing.Union[google.cloud.bigquery.model.ModelReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.model.Model [source]¶
[Beta] Fetch the model referenced by model_ref.
- Parameters
model_ref (Union[ google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to fetch from the BigQuery API. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Model instance.
- Return type
google.cloud.bigquery.model.Model
- get_routine(routine_ref: typing.Union[google.cloud.bigquery.routine.routine.Routine, google.cloud.bigquery.routine.routine.RoutineReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Get the routine referenced by routine_ref.
- Parameters
routine_ref (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to fetch from the BigQuery API. If a string is passed in, this method attempts to create a reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Routine instance.
- Return type
google.cloud.bigquery.routine.Routine
- get_service_account_email(project: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) str [source]¶
Get the email address of the project’s BigQuery service account
Example:
from google.cloud import bigquery
client = bigquery.Client()
client.get_service_account_email()
# returns an email similar to: my_service_account@my-project.iam.gserviceaccount.com
Note
This is the service account that BigQuery uses to manage tables encrypted by a key in KMS.
- Parameters
project (Optional[str]) – Project ID to use for retrieving the service account email. Defaults to the client’s project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
service account email address
- Return type
str
- get_table(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
Fetch the table referenced by table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A reference to the table to fetch from the BigQuery API. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Table instance.
- Return type
google.cloud.bigquery.table.Table
- insert_rows(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], rows: Union[Iterable[Tuple], Iterable[Mapping[str, Any]]], selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, **kwargs) Sequence[Dict[str, Any]] [source]¶
Insert rows into a table via the streaming API.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
rows (Union[Sequence[Tuple], Sequence[Dict]]) – Row data to be inserted. If a list of tuples is given, each tuple should contain data for each schema field on the current table and in the same order as the schema fields. If a list of dictionaries is given, the keys must include all required fields in the schema. Keys which do not correspond to a field in the schema are ignored.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
kwargs (dict) – Keyword arguments to insert_rows_json().
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
ValueError – if table’s schema is not set or rows is not a Sequence.
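Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")  # fetches the schema
rows_to_insert = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]
errors = client.insert_rows(table, rows_to_insert)
if errors:
    print("Encountered errors:", errors)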
- insert_rows_from_dataframe(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], dataframe, selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict) Sequence[Sequence[dict]] [source]¶
Insert rows into a table from a dataframe via the streaming API.
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
dataframe (pandas.DataFrame) – A DataFrame containing the data to load. Any NaN values present in the dataframe are omitted from the streaming API request(s).
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
chunk_size (int) – The number of rows to stream in a single chunk. Must be positive.
kwargs (Dict) – Keyword arguments to insert_rows_json().
- Returns
A list with insert errors for each insert chunk. Each element is a list containing one mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Sequence[Mappings]]
- Raises
ValueError – if table’s schema is not set
- insert_rows_json(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], json_rows: typing.Sequence[typing.Mapping[str, typing.Any]], row_ids: typing.Optional[typing.Union[typing.Iterable[typing.Optional[str]], google.cloud.bigquery.enums.AutoRowIDs]] = AutoRowIDs.GENERATE_UUID, skip_invalid_rows: typing.Optional[bool] = None, ignore_unknown_values: typing.Optional[bool] = None, template_suffix: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Sequence[dict] [source]¶
Insert rows into a table without applying local type conversions.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str ]) – The destination table for the row data, or a reference to it.
json_rows (Sequence[Dict]) – Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
row_ids (Union[Iterable[str], AutoRowIDs, None]) – Unique IDs, one per row being inserted. An ID can also be None, indicating that an explicit insert ID should not be used for that row. If the argument is omitted altogether, unique IDs are created automatically.
Changed in version 2.21.0: Can also be an iterable, not just a sequence, or an AutoRowIDs enum member.
Deprecated since version 2.21.0: Passing None to explicitly request autogenerating insert IDs is deprecated, use AutoRowIDs.GENERATE_UUID instead.
skip_invalid_rows (Optional[bool]) – Insert all valid rows of a request, even if invalid rows exist. The default value is False, which causes the entire request to fail if any invalid rows exist.
ignore_unknown_values (Optional[bool]) – Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is False, which treats unknown values as errors.
template_suffix (Optional[str]) – Treat name as a template table and provide a suffix. BigQuery will create the table <name> + <template_suffix> based on the schema of the template table. See https://cloud.google.com/bigquery/streaming-data-into-bigquery#template-tables
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
TypeError – if json_rows is not a Sequence.
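Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"full_name": "Phred Phlyntstone", "age": 32}]  # keys must match the schema
errors = client.insert_rows_json("my-project.my_dataset.my_table", rows)
assert errors == [], f"insert failed: {errors}"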
- job_from_resource(resource: dict) Union[google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.query.QueryJob, google.cloud.bigquery.job.base.UnknownJob] [source]¶
Detect correct job type from resource and instantiate.
- Parameters
resource (Dict) – one job resource from API response
- Returns
The job instance, constructed via the resource.
- Return type
Union[job.CopyJob, job.ExtractJob, job.LoadJob, job.QueryJob, job.UnknownJob]
- list_datasets(project: typing.Optional[str] = None, include_all: bool = False, filter: typing.Optional[str] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List datasets for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving datasets. Defaults to the client’s project.
include_all (Optional[bool]) – True if results include hidden datasets. Defaults to False.
filter (Optional[str]) – An expression for filtering the results by label. For syntax, see https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#body.QUERY_PARAMETERS.filter
max_results (Optional[int]) – Maximum number of datasets to return.
page_token (Optional[str]) – Token representing a cursor into the datasets. If not passed, the API will return the first page of datasets. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of datasets to return per page.
- Returns
Iterator of DatasetListItem associated with the project.
- Return type
google.api_core.page_iterator.Iterator
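Example (a minimal sketch):
from google.cloud import bigquery

client = bigquery.Client()
for dataset in client.list_datasets():  # yields DatasetListItem objects
    print(dataset.dataset_id)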
- list_jobs(project: typing.Optional[str] = None, parent_job: typing.Optional[typing.Union[google.cloud.bigquery.job.query.QueryJob, str]] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, all_users: typing.Optional[bool] = None, state_filter: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, min_creation_time: typing.Optional[datetime.datetime] = None, max_creation_time: typing.Optional[datetime.datetime] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List jobs for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving jobs. Defaults to the client’s project.
parent_job (Optional[Union[ google.cloud.bigquery.job._AsyncJob, str, ]]) – If set, retrieve only child jobs of the specified parent.
max_results (Optional[int]) – Maximum number of jobs to return.
page_token (Optional[str]) – Opaque marker for the next “page” of jobs. If not passed, the API will return the first page of jobs. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of HTTPIterator.
all_users (Optional[bool]) – If true, include jobs owned by all users in the project. Defaults to False.
state_filter (Optional[str]) – If set, include only jobs matching the given state. One of: "done", "pending", "running".
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
min_creation_time (Optional[datetime.datetime]) – Min value for job creation time. If set, only jobs created after or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
max_creation_time (Optional[datetime.datetime]) – Max value for job creation time. If set, only jobs created before or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
page_size (Optional[int]) – Maximum number of jobs to return per page.
- Returns
Iterable of job instances.
- Return type
google.api_core.page_iterator.Iterator
- list_models(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
[Beta] List models in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose models to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of models to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the models. If not passed, the API will return the first page of models. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size – Maximum number of models to return per page. Defaults to a value set by the API.
- list_partitions(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Sequence[str] [source]¶
List the partitions in a table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – The table or reference from which to get partition info.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
A list of the partition ids present in the partitioned table
- Return type
List[str]
- list_projects(max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List projects for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/list
- Parameters
max_results (Optional[int]) – Maximum number of projects to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the projects. If not passed, the API will return the first page of projects. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of projects to return in each page. Defaults to a value set by the API.
- Returns
Iterator of Project accessible to the current client.
- Return type
google.api_core.page_iterator.Iterator
- list_routines(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
[Beta] List routines in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose routines to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of routines to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the routines. If not passed, the API will return the first page of routines. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size – Maximum number of routines to return per page. Defaults to a value set by the API.
- list_rows(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str], selected_fields: typing.Optional[typing.Sequence[google.cloud.bigquery.schema.SchemaField]] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, start_index: typing.Optional[int] = None, page_size: typing.Optional[int] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.RowIterator [source]¶
List the rows of the table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list
Note
This method assumes that the provided schema is up-to-date with the schema as defined on the back-end: if the two schemas are not identical, the values returned may be incomplete. To ensure that the local copy of the schema is up-to-date, call client.get_table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str, ]) – The table to list, or a reference to it. When the table object does not contain a schema and selected_fields is not supplied, this method calls get_table to fetch the table schema.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. If not supplied, data for all columns are downloaded.
max_results (Optional[int]) – Maximum number of rows to return.
page_token (Optional[str]) – Token representing a cursor into the table’s rows. If not passed, the API will return the first page of the rows. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the RowIterator.
start_index (Optional[int]) – The zero-based index of the starting row to read.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
Iterator of row data Row-s. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the table (this is distinct from the total number of rows in the current page: iterator.page.num_items).
- Return type
google.cloud.bigquery.table.RowIterator
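Example (a minimal sketch; the table ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows("my-project.my_dataset.my_table", max_results=10)
for row in rows:
    print(dict(row))  # each Row supports mapping-style access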
- list_tables(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List tables in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose tables to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of tables to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the tables. If not passed, the API will return the first page of tables. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of tables to return per page. Defaults to a value set by the API.
- Returns
Iterator of TableListItem contained within the requested dataset.
- Return type
google.api_core.page_iterator.Iterator
- load_table_from_dataframe(dataframe: pandas.core.frame.DataFrame, destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, parquet_compression: str = 'snappy', timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of a table from a pandas DataFrame.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
Note
REPEATED fields are NOT supported when using the CSV source format. They are supported when using the PARQUET source format, but due to the way they are encoded in the parquet file, a mismatch with the existing table schema can occur, so REPEATED fields are not properly supported when using pyarrow<4.0.0 with the parquet format.
- Parameters
dataframe (pandas.DataFrame) – A DataFrame containing the data to load.
destination (Union[ Table, TableReference, str ]) – The destination table to use for loading the data. If it is an existing table, the schema of the DataFrame must match the schema of the destination table. If the table does not yet exist, the schema is inferred from the DataFrame.
If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job.
To override the default pandas data type conversions, supply a value for schema with column names matching those of the dataframe. The BigQuery schema is used to determine the correct data type conversion. Indexes are not loaded.
By default, this method uses the parquet source format. To override this, supply a value for source_format with the format name. Currently only CSV and PARQUET are supported.
parquet_compression (Optional[str]) – [Beta] The compression method to use if intermittently serializing dataframe to a parquet file. Defaults to “snappy”.
The argument is directly passed as the compression argument to the underlying pyarrow.parquet.write_table() method (the default value “snappy” gets converted to uppercase). https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table
If the job config schema is missing, the argument is directly passed as the compression argument to the underlying DataFrame.to_parquet() method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
google.cloud.bigquery.job.LoadJob
- Raises
ValueError – If a usable parquet engine cannot be found. This method requires pyarrow to be installed.
TypeError – If job_config is not an instance of LoadJobConfig class.
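Example (a minimal sketch; the table ID is hypothetical and pyarrow is assumed to be installed):
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"full_name": ["Phred Phlyntstone"], "age": [32]})
job = client.load_table_from_dataframe(df, "my-project.my_dataset.my_table")
job.result()  # wait for the load to complete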
- load_table_from_file(file_obj: IO[bytes], destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], rewind: bool = False, size: Optional[int] = None, num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of this table from a file-like object.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
- Parameters
.- Parameters
file_obj (IO[bytes]) – A file handle opened in binary mode for reading.
destination (Union[Table, TableReference, TableListItem, str ]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
rewind (Optional[bool]) – If True, seek to the beginning of the file handle before reading the file. Defaults to False.
size (Optional[int]) – The number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
google.cloud.bigquery.job.LoadJob
- Raises
ValueError – If size is not passed in and can not be determined, or if the file_obj can be detected to be a file opened in text mode.
TypeError – If job_config is not an instance of LoadJobConfig class.
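Example (a minimal sketch; the file path and table ID are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)
with open("data.csv", "rb") as source_file:  # must be opened in binary mode
    job = client.load_table_from_file(
        source_file, "my-project.my_dataset.my_table", job_config=job_config
    )
job.result()  # wait for the load to complete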
- load_table_from_json(json_rows: Iterable[Dict[str, Any]], destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of a table from a JSON string or dict.
- Parameters
json_rows (Iterable[Dict[str, Any]]) –
Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
Note
If your data is already a newline-delimited JSON string, it is best to wrap it into a file-like object and pass it to
load_table_from_file()
:import io from google.cloud import bigquery data = u'{"foo": "bar"}' data_as_file = io.StringIO(data) client = bigquery.Client() client.load_table_from_file(data_as_file, ...)
destination (Union[ Table, TableReference, TableListItem, str ]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using
google.cloud.bigquery.table.TableReference.from_string()
.num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job. The source_format setting is always set to NEWLINE_DELIMITED_JSON.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
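As a minimal sketch (the table ID below is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
rows = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]
load_job = client.load_table_from_json(rows, "my_project.my_dataset.my_table")
load_job.result()  # Wait for the load to complete.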
- load_table_from_uri(source_uris: typing.Union[str, typing.Sequence[str]], destination: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Starts a job for loading data into a table from Cloud Storage.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload
- Parameters
source_uris (Union[str, Sequence[str]]) – URIs of data files to be loaded; in format gs://<bucket_name>/<object_name_or_glob>.
destination (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
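For instance, a minimal sketch of loading a CSV file from Cloud Storage (bucket and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my_bucket/my_data.csv",
    "my_project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # Wait for the load to complete.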
- property location¶
Default location for jobs / datasets / tables.
- query(query: str, job_config: typing.Optional[google.cloud.bigquery.job.query.QueryJobConfig] = None, job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, job_retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, api_method: typing.Union[str, google.cloud.bigquery.enums.QueryApiMethod] = QueryApiMethod.INSERT) google.cloud.bigquery.job.query.QueryJob [source]¶
Run a SQL query.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
- Parameters
query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or whatever value is preferred.
job_id (Optional[str]) – ID to use for the query job.
job_id_prefix (Optional[str]) – The prefix to use for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the table used in the query as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. This only applies to making RPC calls. It isn’t used to retry failed jobs. This has a reasonable default that should only be overridden with care.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
job_retry (Optional[google.api_core.retry.Retry]) –
How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry.
Not all jobs can be retried. If job_id is provided, then the job returned by the query will not be retryable, and an exception will be raised if a non-None (and non-default) value for job_retry is also provided.
Note that errors aren't detected until result() is called on the returned job. The job_retry specified here becomes the default job_retry for result(), where it can also be specified.
api_method (Union[str, enums.QueryApiMethod]) –
Method with which to start the query job.
See google.cloud.bigquery.enums.QueryApiMethod for details on the difference between the query start methods.
- Returns
A new query job instance.
- Return type
- Raises
TypeError – If job_config is not an instance of the QueryJobConfig class, or if both job_id and a non-None, non-default job_retry are provided.
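A minimal sketch, using a public dataset; note that query errors surface when result() is called, not when the job starts:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10"
)
for row in query_job.result():
    print(row["name"], row["total"])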
- query_and_wait(query, *, job_config: typing.Optional[google.cloud.bigquery.job.query.QueryJobConfig] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, api_timeout: typing.Optional[float] = None, wait_timeout: typing.Union[float, None, object] = <object object>, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, job_retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, page_size: typing.Optional[int] = None, max_results: typing.Optional[int] = None) google.cloud.bigquery.table.RowIterator [source]¶
Run the query, wait for it to finish, and return the results.
While jobCreationMode=JOB_CREATION_OPTIONAL is in preview in the jobs.query REST API, this method uses the default jobCreationMode unless the environment variable QUERY_PREVIEW_ENABLED=true. After jobCreationMode is GA, this method will always use jobCreationMode=JOB_CREATION_OPTIONAL. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
- Parameters
query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or whatever value is preferred.
location (Optional[str]) – Location where to run the job. Must match the location of the table used in the query as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
api_timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
wait_timeout (Optional[Union[float, object]]) – The number of seconds to wait for the query to finish. If the query doesn't finish before this timeout, the client attempts to cancel the query. If unset, the underlying REST API calls have timeouts, but we still wait indefinitely for the job to finish.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. This only applies to making RPC calls. It isn’t used to retry failed jobs. This has a reasonable default that should only be overridden with care.
job_retry (Optional[google.api_core.retry.Retry]) – How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry. Not all jobs can be retried.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored.
max_results (Optional[int]) – The maximum total number of rows from this request.
- Returns
Iterator of row data Rows. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the result set (this is distinct from the total number of rows in the current page: iterator.page.num_items).
If the query is a special query that produces no results, e.g. a DDL query, an _EmptyRowIterator instance is returned.
- Return type
- Raises
TypeError – If job_config is not an instance of the QueryJobConfig class.
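A minimal sketch:

from google.cloud import bigquery

client = bigquery.Client()
rows = client.query_and_wait(
    "SELECT 1 AS x",
    wait_timeout=60.0,  # Try to cancel the query if it runs longer than this.
)
print(rows.total_rows)
for row in rows:
    print(row["x"])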
- schema_from_json(file_or_path: PathType) List[google.cloud.bigquery.schema.SchemaField] [source]¶
Takes a file object or file path that contains JSON describing a table schema.
- Returns
List of SchemaField objects.
- Return type
List[SchemaField]
- schema_to_json(schema_list: Sequence[google.cloud.bigquery.schema.SchemaField], destination: PathType)[source]¶
Takes a list of schema field objects and serializes them as JSON to a file.
Destination is a file path or a file object.
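A round-trip sketch of both helpers (the file path is a placeholder; the schema should read back equal to what was written):

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="NULLABLE"),
]
client.schema_to_json(schema, "schema.json")
assert client.schema_from_json("schema.json") == schema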
- set_iam_policy(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], policy: google.api_core.iam.Policy, updateMask: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, *, fields: typing.Sequence[str] = ()) google.api_core.iam.Policy [source]¶
Set the access control policy for a table resource.
- Parameters
table (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]) – The table to set the access control policy for. If a string is passed in, this method attempts to create a table reference from a string using from_string().
policy (google.api_core.iam.Policy) – The access control policy to set.
updateMask (Optional[str]) –
Mask as defined by https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/setIamPolicy#body.request_body.FIELDS.update_mask
Incompatible with fields.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
fields (Sequence[str]) –
Which properties to set on the policy. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/setIamPolicy#body.request_body.FIELDS.update_mask
Incompatible with updateMask.
- Returns
The updated access control policy.
- Return type
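A sketch following the common read-modify-write pattern (table ID and member are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.my_table"
policy = client.get_iam_policy(table_id)
policy.bindings.append(
    {
        "role": "roles/bigquery.dataViewer",
        "members": {"user:example-user@example.com"},
    }
)
updated_policy = client.set_iam_policy(table_id, policy)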
- update_dataset(dataset: google.cloud.bigquery.dataset.Dataset, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
Change some fields of a dataset.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in dataset, it will be deleted.
For example, to update the default expiration times, specify both properties in the fields argument:

bigquery_client.update_dataset(
    dataset,
    [
        "default_partition_expiration_ms",
        "default_table_expiration_ms",
    ],
)
If dataset.etag is not None, the update will only succeed if the dataset on the server has the same ETag. Thus reading a dataset with get_dataset, changing its fields, and then passing it to update_dataset will ensure that the changes will only be saved if no modifications to the dataset occurred since the read.
- Parameters
dataset (google.cloud.bigquery.dataset.Dataset) – The dataset to update.
fields (Sequence[str]) – The properties of dataset to change. These are strings corresponding to the properties of Dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The modified Dataset instance.
- Return type
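A sketch of the ETag-protected read-modify-write flow described above (the dataset ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my_project.my_dataset")
dataset.description = "Nightly batch imports"
# Because get_dataset populated dataset.etag, this update only succeeds
# if the dataset was not modified on the server in the meantime.
dataset = client.update_dataset(dataset, ["description"])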
- update_model(model: google.cloud.bigquery.model.Model, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.model.Model [source]¶
[Beta] Change some fields of a model.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in model, the field value will be deleted.
For example, to update the descriptive properties of the model, specify them in the fields argument:

bigquery_client.update_model(
    model, ["description", "friendly_name"]
)
If model.etag is not None, the update will only succeed if the model on the server has the same ETag. Thus reading a model with get_model, changing its fields, and then passing it to update_model will ensure that the changes will only be saved if no modifications to the model occurred since the read.
- Parameters
model (google.cloud.bigquery.model.Model) – The model to update.
fields (Sequence[str]) – The properties of model to change. These are strings corresponding to the properties of Model.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The model resource returned from the API call.
- Return type
- update_routine(routine: google.cloud.bigquery.routine.routine.Routine, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Change some fields of a routine.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in routine, the field value will be deleted.
For example, to update the description property of the routine, specify it in the fields argument:

bigquery_client.update_routine(
    routine, ["description"]
)
Warning
During beta, partial updates are not supported. You must provide all fields in the resource.
If etag is not None, the update will only succeed if the resource on the server has the same ETag. Thus reading a routine with get_routine(), changing its fields, and then passing it to this method will ensure that the changes will only be saved if no modifications to the resource occurred since the read.
- Parameters
routine (google.cloud.bigquery.routine.Routine) – The routine to update.
fields (Sequence[str]) – The fields of routine to change, spelled as the Routine properties.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The routine resource returned from the API call.
- Return type
- update_table(table: google.cloud.bigquery.table.Table, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
Change some fields of a table.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in table, the field value will be deleted.
For example, to update the descriptive properties of the table, specify them in the fields argument:

bigquery_client.update_table(
    table, ["description", "friendly_name"]
)
If table.etag is not None, the update will only succeed if the table on the server has the same ETag. Thus reading a table with get_table, changing its fields, and then passing it to update_table will ensure that the changes will only be saved if no modifications to the table occurred since the read.
- Parameters
table (google.cloud.bigquery.table.Table) – The table to update.
fields (Sequence[str]) – The fields of table to change, spelled as the Table properties.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The table resource returned from the API call.
- Return type
Job¶
Define API Jobs.
- class google.cloud.bigquery.job.Compression(value)[source]¶
The compression type to use for exported files. The default value is NONE.
DEFLATE and SNAPPY are only supported for Avro.
- DEFLATE = 'DEFLATE'¶
Specifies DEFLATE format.
- GZIP = 'GZIP'¶
Specifies GZIP format.
- NONE = 'NONE'¶
Specifies no compression.
- SNAPPY = 'SNAPPY'¶
Specifies SNAPPY format.
- ZSTD = 'ZSTD'¶
Specifies ZSTD format.
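For instance, a sketch of choosing a compression type on an extract configuration (SNAPPY is only valid together with Avro):

from google.cloud import bigquery

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.AVRO,
    compression=bigquery.Compression.SNAPPY,
)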
- class google.cloud.bigquery.job.CopyJob(job_id, sources, destination, client, job_config=None)[source]¶
Asynchronous job: copy data into a table from other tables.
- Parameters
job_id (str) – the job's ID, within the project belonging to client.
sources (List[google.cloud.bigquery.table.TableReference]) – Table(s) from which data is to be loaded.
destination (google.cloud.bigquery.table.TableReference) – Table into which data is to be loaded.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the copy job.
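Copy jobs are normally created via Client.copy_table() rather than constructed directly; a minimal sketch (table IDs are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
copy_job = client.copy_table(
    "my_project.my_dataset.source_table",
    "my_project.my_dataset.destination_table",
)
copy_job.result()  # Wait for the copy to complete.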
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
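A sketch, reusing the copy_job from the example above:

def on_done(future):
    # Runs on a helper thread once the job reaches a terminal state.
    print("job finished:", future.job_id)

copy_job.add_done_callback(on_done)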
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property configuration: google.cloud.bigquery.job.copy_.CopyJobConfig¶
The configuration for this copy job.
- property create_disposition¶
See
google.cloud.bigquery.job.CopyJobConfig.create_disposition
.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination¶
Table into which data is to be loaded.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
See google.cloud.bigquery.job.CopyJobConfig.destination_encryption_configuration.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- classmethod from_api_repr(resource, client)[source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
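A sketch of handling both failure modes, reusing the copy_job from the earlier example:

import concurrent.futures
from google.api_core import exceptions

try:
    copy_job.result(timeout=300)
except concurrent.futures.TimeoutError:
    print("job did not complete within the timeout")
except exceptions.GoogleAPICallError as exc:
    print("job failed:", exc)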
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property sources¶
Table(s) from which data is to be loaded.
- Type
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.CopyJobConfig.write_disposition
.
- class google.cloud.bigquery.job.CopyJobConfig(**kwargs)[source]¶
Configuration options for copy jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property create_disposition¶
Specifies behavior for creating tables.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
- property destination_expiration_time: str¶
The time when the destination table expires. Expired tables will be deleted and their storage reclaimed.
- Type
google.cloud.bigquery.job.DestinationExpirationTime
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example:

job_config = bigquery.QueryJobConfig(job_timeout_ms=5000)
# or
job_config.job_timeout_ms = 5000

- Raises
ValueError – If the value type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If the value type is invalid.
- Type
- to_api_repr() dict ¶
Build an API representation of the job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property write_disposition¶
Action that occurs if the destination table already exists.
- class google.cloud.bigquery.job.CreateDisposition[source]¶
Specifies whether the job is allowed to create new tables. The default value is CREATE_IF_NEEDED.
Creation, truncation and append actions occur as one atomic update upon job completion.
- CREATE_IF_NEEDED = 'CREATE_IF_NEEDED'¶
If the table does not exist, BigQuery creates the table.
- CREATE_NEVER = 'CREATE_NEVER'¶
The table must already exist. If it does not, a ‘notFound’ error is returned in the job result.
- class google.cloud.bigquery.job.DestinationFormat[source]¶
The exported file format. The default value is CSV.
Tables with nested or repeated fields cannot be exported as CSV.
- AVRO = 'AVRO'¶
Specifies Avro format.
- CSV = 'CSV'¶
Specifies CSV format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
- class google.cloud.bigquery.job.DmlStats(inserted_row_count: int = 0, deleted_row_count: int = 0, updated_row_count: int = 0)[source]¶
Detailed statistics for DML statements.
https://cloud.google.com/bigquery/docs/reference/rest/v2/DmlStats
Create new instance of DmlStats(inserted_row_count, deleted_row_count, updated_row_count)
- count(value, /)¶
Return number of occurrences of value.
- deleted_row_count: int¶
Number of deleted rows, populated by DML DELETE, MERGE and TRUNCATE statements.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- class google.cloud.bigquery.job.Encoding[source]¶
The character encoding of the data. The default is UTF_8.
BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.
- ISO_8859_1 = 'ISO-8859-1'¶
Specifies ISO-8859-1 encoding.
- UTF_8 = 'UTF-8'¶
Specifies UTF-8 encoding.
- class google.cloud.bigquery.job.ExtractJob(job_id, source, destination_uris, client, job_config=None)[source]¶
Asynchronous job: extract data from a table into Cloud Storage.
- Parameters
job_id (str) – the job’s ID.
source (Union[ google.cloud.bigquery.table.TableReference, google.cloud.bigquery.model.ModelReference ]) – Table or Model from which data is to be loaded or extracted.
destination_uris (List[str]) – URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration.
job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the extract job.
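Extract jobs are normally created via Client.extract_table() rather than constructed directly; a minimal sketch (the bucket name is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
extract_job = client.extract_table(
    "bigquery-public-data.samples.shakespeare",
    "gs://my_bucket/shakespeare-*.csv",
)
extract_job.result()  # Wait for the export to complete.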
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property compression¶
- property configuration: google.cloud.bigquery.job.extract.ExtractJobConfig¶
The configuration for this extract job.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination_format¶
See
google.cloud.bigquery.job.ExtractJobConfig.destination_format
.
- property destination_uri_file_counts¶
Return file counts from job statistics, if present.
- Returns
A list of integer counts, each representing the number of files per destination URI or URI pattern specified in the extract configuration. These values will be in the same order as the URIs specified in the ‘destinationUris’ field. Returns None if job is not yet complete.
- Return type
List[int]
- property destination_uris¶
URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.
- Type
List[str]
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- property field_delimiter¶
See
google.cloud.bigquery.job.ExtractJobConfig.field_delimiter
.
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.extract.ExtractJob [source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property print_header¶
See
google.cloud.bigquery.job.ExtractJobConfig.print_header
.
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property source¶
Table or Model from which data is to be loaded or extracted.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- class google.cloud.bigquery.job.ExtractJobConfig(**kwargs)[source]¶
Configuration options for extract jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property compression¶
Compression type to use for exported files.
- property destination_format¶
Exported file format.
- property field_delimiter¶
Delimiter to use between fields in the exported data.
- Type
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example:

job_config = bigquery.QueryJobConfig(job_timeout_ms=5000)
# or
job_config.job_timeout_ms = 5000

- Raises
ValueError – If the value type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If the value type is invalid.
- Type
- property print_header¶
Print a header row in the exported data.
- Type
- class google.cloud.bigquery.job.LoadJob(job_id, source_uris, destination, client, job_config=None)[source]¶
Asynchronous job for loading data into a table.
Can load from Google Cloud Storage URIs or from a file.
- Parameters
job_id (str) – the job’s ID
source_uris (Optional[Sequence[str]]) – URIs of one or more data files to be loaded. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.source_uris for supported URI formats. Pass None for jobs that load from a file.
destination (google.cloud.bigquery.table.TableReference) – reference to table into which data is to be loaded.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- property allow_jagged_rows¶
See
google.cloud.bigquery.job.LoadJobConfig.allow_jagged_rows
.
- property allow_quoted_newlines¶
See
google.cloud.bigquery.job.LoadJobConfig.allow_quoted_newlines
.
- property autodetect¶
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property clustering_fields¶
See
google.cloud.bigquery.job.LoadJobConfig.clustering_fields
.
- property configuration: google.cloud.bigquery.job.load.LoadJobConfig¶
The configuration for this load job.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
See google.cloud.bigquery.job.LoadJobConfig.connection_properties.
New in version 3.7.0.
- property create_disposition¶
See
google.cloud.bigquery.job.LoadJobConfig.create_disposition
.
- property create_session: Optional[bool]¶
See google.cloud.bigquery.job.LoadJobConfig.create_session.
New in version 3.7.0.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination¶
Table where loaded rows are written.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
See google.cloud.bigquery.job.LoadJobConfig.destination_encryption_configuration.
- property destination_table_description¶
Optional[str] description given to the destination table.
- property destination_table_friendly_name¶
Optional[str] name given to destination table.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property encoding¶
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- property field_delimiter¶
See
google.cloud.bigquery.job.LoadJobConfig.field_delimiter
.
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.load.LoadJob [source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property ignore_unknown_values¶
See
google.cloud.bigquery.job.LoadJobConfig.ignore_unknown_values
.
- property input_file_bytes¶
Count of bytes loaded from source files.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- Raises
ValueError – for invalid value types.
- property input_files¶
Count of source files.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property max_bad_records¶
See
google.cloud.bigquery.job.LoadJobConfig.max_bad_records
.
- property null_marker¶
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property output_bytes¶
Count of bytes saved to destination table.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property output_rows¶
Count of rows saved to destination table.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- property quote_character¶
See
google.cloud.bigquery.job.LoadJobConfig.quote_character
.
- property range_partitioning¶
See
google.cloud.bigquery.job.LoadJobConfig.range_partitioning
.
- property reference_file_schema_uri¶
See google.cloud.bigquery.job.LoadJobConfig.reference_file_schema_uri.
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property schema¶
- property schema_update_options¶
See
google.cloud.bigquery.job.LoadJobConfig.schema_update_options
.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property skip_leading_rows¶
See
google.cloud.bigquery.job.LoadJobConfig.skip_leading_rows
.
- property source_format¶
- property source_uris¶
URIs of data files to be loaded. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.source_uris for supported URI formats. None for jobs that load from a file.
- Type
Optional[Sequence[str]]
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property time_partitioning¶
See
google.cloud.bigquery.job.LoadJobConfig.time_partitioning
.
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property use_avro_logical_types¶
See
google.cloud.bigquery.job.LoadJobConfig.use_avro_logical_types
.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.LoadJobConfig.write_disposition
.
- class google.cloud.bigquery.job.LoadJobConfig(**kwargs)[source]¶
Configuration options for load jobs.
Set properties on the constructed configuration by using the property name as the name of a keyword argument. Values which are unset or None use the BigQuery REST API default values. See the BigQuery REST API reference documentation for a list of default values.
Required options differ based on the source_format value. For example, the BigQuery API's default value for source_format is "CSV". When loading a CSV file, either schema must be set or autodetect must be set to True, as in the sketch below.
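A sketch of both CSV options (the field names are illustrative):

from google.cloud import bigquery

config_with_schema = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[
        bigquery.SchemaField("full_name", "STRING"),
        bigquery.SchemaField("age", "INTEGER"),
    ],
)

config_autodetect = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
)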
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property allow_quoted_newlines¶
Allow quoted data containing newline characters (CSV only).
- Type
Optional[bool]
- property autodetect¶
Automatically infer the schema from a sample of the data.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.autodetect
- Type
Optional[bool]
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to None).
Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Optional[List[str]]
- property column_name_character_map: str¶
Optional[google.cloud.bigquery.job.ColumnNameCharacterMap]: Character map supported for column names in CSV/Parquet loads. Defaults to STRICT and can be overridden by Project Config Service. Using this option with unsupported load formats will result in an error.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
Connection properties.
New in version 3.7.0.
- property create_disposition¶
Specifies behavior for creating tables.
- Type
- property create_session: Optional[bool]¶
[Preview] If True, creates a new session, where session_info will contain a random server generated session id.
If False, runs the load job with an existing session_id passed in connection_properties; otherwise, runs the load job in non-session mode.
New in version 3.7.0.
- property decimal_target_types: Optional[FrozenSet[str]]¶
Possible SQL data types to which the source decimal values are converted.
New in version 2.21.0.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.
- property encoding¶
The character encoding of the data.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.encoding
- Type
Optional[google.cloud.bigquery.job.Encoding]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from
resource
.- Return type
google.cloud.bigquery.job._JobConfig
- property hive_partitioning¶
[Beta] When set, it configures hive partitioning support.
Note
Experimental. This feature is experimental and might change or have limited support.
- Type
Optional[
HivePartitioningOptions
]
- property ignore_unknown_values¶
Ignore extra values not represented in the table schema.
- Type
Optional[bool]
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example, job_config = bigquery.QueryJobConfig(job_timeout_ms=5000), or job_config.job_timeout_ms = 5000.
- Raises
ValueError – If
value
type is invalid.
- property json_extension¶
The extension to use for writing JSON data to BigQuery. Only supports GeoJSON currently.
- Type
Optional[str]
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If
value
type is invalid.- Type
- property null_marker¶
Represents a null value (CSV only).
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.null_marker
- Type
Optional[str]
- property parquet_options¶
Additional properties to set if sourceFormat is set to PARQUET.
- Type
Optional[google.cloud.bigquery.format_options.ParquetOptions]
- property preserve_ascii_control_characters¶
Preserves the embedded ASCII control characters when sourceFormat is set to CSV.
- Type
Optional[bool]
- property projection_fields: Optional[List[str]]¶
If
google.cloud.bigquery.job.LoadJobConfig.source_format
is set to "DATASTORE_BACKUP", indicates which entity properties to load into BigQuery from a Cloud Datastore backup.
Property names are case sensitive and must be top-level properties. If no properties are specified, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned in the job result.
- Type
Optional[List[str]]
- property quote_character¶
Character used to quote data sections (CSV only).
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.quote
- Type
Optional[str]
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for destination table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not RangePartitioning or None.
- property reference_file_schema_uri¶
Optional[str]: When creating an external table, the user can provide a reference file with the table schema. This is enabled for the following formats:
AVRO, PARQUET, ORC
- property schema¶
Schema of the destination table.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.schema
- Type
Optional[Sequence[Union[
SchemaField
, Mapping[str, Any] ]]]
- property schema_update_options¶
Specifies updates to the destination table schema to allow as a side effect of the load job.
- Type
Optional[List[google.cloud.bigquery.job.SchemaUpdateOption]]
- property source_format¶
File format of the data.
- Type
Optional[google.cloud.bigquery.job.SourceFormat]
- property time_partitioning¶
Specifies time-based partitioning for the destination table.
Only specify at most one of time_partitioning or range_partitioning.
- Type
- to_api_repr() dict ¶
Build an API representation of the job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property use_avro_logical_types¶
For loads of Avro data, governs whether Avro logical types are converted to their corresponding BigQuery types (e.g. TIMESTAMP) rather than raw types (e.g. INTEGER).
- Type
Optional[bool]
- property write_disposition¶
Action that occurs if the destination table already exists.
- Type
- class google.cloud.bigquery.job.OperationType[source]¶
Different operation types supported in table copy job.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#operationtype
- CLONE = 'CLONE'¶
The source table type is TABLE and the destination table type is CLONE.
- COPY = 'COPY'¶
The source and destination table have the same table type.
- OPERATION_TYPE_UNSPECIFIED = 'OPERATION_TYPE_UNSPECIFIED'¶
Unspecified operation type.
- RESTORE = 'RESTORE'¶
The source table type is SNAPSHOT and the destination table type is TABLE.
- SNAPSHOT = 'SNAPSHOT'¶
The source table type is TABLE and the destination table type is SNAPSHOT.
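For illustration, a sketch of requesting a table snapshot through CopyJobConfig, which exposes an operation_type property; the table IDs are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

# SNAPSHOT: the source is a TABLE, the destination becomes a SNAPSHOT.
job_config = bigquery.CopyJobConfig(
    operation_type=bigquery.job.OperationType.SNAPSHOT,
)
copy_job = client.copy_table(
    "your-project.your_dataset.source_table",    # hypothetical
    "your-project.your_dataset.table_snapshot",  # hypothetical
    job_config=job_config,
)
copy_job.result()  # wait for the copy job to finish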
- class google.cloud.bigquery.job.QueryJob(job_id, query, client, job_config=None)[source]¶
Asynchronous job: query tables.
- Parameters
job_id (str) – the job's ID, within the project belonging to client.
query (str) – SQL query string.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the query job.
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- property allow_large_results¶
See
google.cloud.bigquery.job.QueryJobConfig.allow_large_results
.
- property billing_tier¶
Return billing tier from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.billing_tier
- Returns
Billing tier used by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property cache_hit¶
Return whether or not query results were served from cache.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.cache_hit
- Returns
whether the query results were returned from cache, or None if job is not yet complete.
- Return type
Optional[bool]
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for
google.api_core.future.Future
.- Returns
False
- Return type
- property clustering_fields¶
See
google.cloud.bigquery.job.QueryJobConfig.clustering_fields
.
- property configuration: google.cloud.bigquery.job.query.QueryJobConfig¶
The configuration for this query job.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
See
google.cloud.bigquery.job.QueryJobConfig.connection_properties
.New in version 2.29.0.
- property create_disposition¶
See
google.cloud.bigquery.job.QueryJobConfig.create_disposition
.
- property create_session: Optional[bool]¶
See
google.cloud.bigquery.job.QueryJobConfig.create_session
.New in version 2.29.0.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property ddl_target_routine¶
Return the DDL target routine, present for CREATE/DROP FUNCTION/PROCEDURE queries.
- Type
- property ddl_target_table¶
Return the DDL target table, present for CREATE/DROP TABLE/VIEW queries.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_target_table
- Type
- property default_dataset¶
See
google.cloud.bigquery.job.QueryJobConfig.default_dataset
.
- property destination¶
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See
google.cloud.bigquery.job.QueryJobConfig.destination_encryption_configuration
.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property dry_run¶
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property estimated_bytes_processed¶
Return the estimated number of bytes processed by the query.
- Returns
the estimated number of bytes processed by the query, or None if the job is not yet complete.
- Return type
Optional[int]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the
result()
method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- Returns
Boolean indicating existence of the job.
- Return type
- property flatten_results¶
See
google.cloud.bigquery.job.QueryJobConfig.flatten_results
.
- classmethod from_api_repr(resource: dict, client: Client) QueryJob [source]¶
Factory: construct a job given its API representation
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from
resource
.- Return type
- property maximum_billing_tier¶
See
google.cloud.bigquery.job.QueryJobConfig.maximum_billing_tier
.
- property maximum_bytes_billed¶
See
google.cloud.bigquery.job.QueryJobConfig.maximum_bytes_billed
.
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property num_dml_affected_rows: Optional[int]¶
Return the number of DML rows affected by the job.
- Returns
number of DML rows affected by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property priority¶
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- property query¶
The query text used in this query job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.query
- Type
- property query_id: Optional[str]¶
[Preview] ID of a completed query.
This ID is auto-generated and not guaranteed to be populated.
- property query_parameters¶
See
google.cloud.bigquery.job.QueryJobConfig.query_parameters
.
- property query_plan¶
Return query plan from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.query_plan
- Returns
mappings describing the query plan, or an empty list if the query has not yet completed.
- Return type
- property range_partitioning¶
See
google.cloud.bigquery.job.QueryJobConfig.range_partitioning
.
- property referenced_tables¶
Return referenced tables from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.referenced_tables
- Returns
mappings describing the tables referenced by the query, or an empty list if the query has not yet completed.
- Return type
List[Dict]
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(page_size: typing.Optional[int] = None, max_results: typing.Optional[int] = None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[typing.Union[float, object]] = <object object>, start_index: typing.Optional[int] = None, job_retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>) Union[RowIterator, google.cloud.bigquery.table._EmptyRowIterator] [source]¶
Start the job and wait for it to complete and get the result.
- Parameters
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored.
max_results (Optional[int]) – The maximum total number of rows from this request.
retry (Optional[google.api_core.retry.Retry]) – How to retry the call that retrieves rows. This only applies to making RPC calls. It isn't used to retry failed jobs. This has a reasonable default that should only be overridden with care. If the job state is DONE, retrying is aborted early even if the results are not available, as this will not change anymore.
timeout (Optional[Union[float, google.api_core.future.polling.PollingFuture._DEFAULT_VALUE]]) – The number of seconds to wait for the underlying HTTP transport before using retry. If None, wait indefinitely unless an error is returned. If unset, only the underlying API calls have their default timeouts, but we still wait indefinitely for the job to finish.
start_index (Optional[int]) – The zero-based index of the starting row to read.
job_retry (Optional[google.api_core.retry.Retry]) – How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry. Not all jobs can be retried. If job_id was provided to the query that created this job, then the job returned by the query will not be retryable, and an exception will be raised if a non-None, non-default job_retry is also provided.
- Returns
Iterator of row data Row-s. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the result set (this is distinct from the total number of rows in the current page: iterator.page.num_items).
If the query is a special query that produces no results, e.g. a DDL query, an _EmptyRowIterator instance is returned.
- Return type
- Raises
google.cloud.exceptions.GoogleAPICallError – If the job failed and retries aren’t successful.
concurrent.futures.TimeoutError – If the job did not complete in the given timeout.
TypeError – If a non-None and non-default job_retry is provided and the job is not retryable.
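A typical usage sketch; the public dataset queried here is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10"
)

rows = query_job.result(page_size=100)  # blocks until the job is done
for row in rows:
    print(row["name"], row["total"])
print("total rows:", rows.total_rows)  # populated from the server response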
- running()¶
True if the operation is currently running.
- property schema: Optional[List[google.cloud.bigquery.schema.SchemaField]]¶
The schema of the results.
Present only for successful dry run of non-legacy SQL queries.
- property schema_update_options¶
See
google.cloud.bigquery.job.QueryJobConfig.schema_update_options
.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property search_stats: Optional[google.cloud.bigquery.job.query.SearchStats]¶
Returns a SearchStats object.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property statement_type¶
Return statement type from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.statement_type
- Returns
type of statement used by the job, or None if job is not yet complete.
- Return type
Optional[str]
- property table_definitions¶
See
google.cloud.bigquery.job.QueryJobConfig.table_definitions
.
- property time_partitioning¶
See
google.cloud.bigquery.job.QueryJobConfig.time_partitioning
.
- property timeline¶
Return the query execution timeline from job statistics.
- Type
List(TimelineEntry)
- to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None) pyarrow.Table [source]¶
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
- Parameters
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
Possible values of progress_bar_type include:
None
No progress bar.
'tqdm'
Use the tqdm.tqdm() function to print a progress bar to sys.stdout.
'tqdm_notebook'
Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui'
Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the google-cloud-bigquery-storage library.
Reading from a specific partition or snapshot is not currently supported by this method.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
- Returns
A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Raises
ValueError – If the
pyarrow
library cannot be imported.
New in version 1.17.0.
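A short sketch; requires the pyarrow package, and the trivial query is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query("SELECT 1 AS x, 'a' AS y")

# Fetch over the REST API rather than creating a BigQuery Storage client.
arrow_table = query_job.to_arrow(create_bqstorage_client=False)
print(arrow_table.schema)
print(arrow_table.num_rows)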
- to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None, geography_as_object: bool = False, bool_dtype: Optional[Any] = DefaultPandasDTypes.BOOL_DTYPE, int_dtype: Optional[Any] = DefaultPandasDTypes.INT_DTYPE, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = DefaultPandasDTypes.DATE_DTYPE, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = DefaultPandasDTypes.TIME_DTYPE, timestamp_dtype: Optional[Any] = None, range_date_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATE_DTYPE, range_datetime_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE, range_timestamp_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE) pandas.DataFrame [source]¶
Return a pandas DataFrame from a QueryJob
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the fastavro and google-cloud-bigquery-storage libraries.
Reading from a specific partition or snapshot is not currently supported by this method.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names and pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
See to_dataframe() for details.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
geography_as_object (Optional[bool]) –
If True, convert GEOGRAPHY data to shapely geometry objects. If False (default), don't cast geography data to shapely geometry objects.
New in version 2.24.0.
bool_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.BooleanDtype()) to convert BigQuery Boolean type, instead of relying on the default pandas.BooleanDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("bool"). BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
New in version 3.8.0.
int_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Int64Dtype()) to convert BigQuery Integer types, instead of relying on the default pandas.Int64Dtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("int64"). A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
New in version 3.8.0.
float_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Float32Dtype()) to convert BigQuery Float type, instead of relying on the default numpy.dtype("float64"). If you explicitly set the value to None, then the data type will be numpy.dtype("float64"). BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
New in version 3.8.0.
string_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.StringDtype()) to convert BigQuery String type, instead of relying on the default numpy.dtype("object"). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
New in version 3.8.0.
date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.date32())) to convert BigQuery Date type, instead of relying on the default db_dtypes.DateDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type
New in version 3.10.0.
datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us"))) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
New in version 3.10.0.
time_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.time64("us"))) to convert BigQuery Time type, instead of relying on the default db_dtypes.TimeDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
New in version 3.10.0.
timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))) to convert BigQuery Timestamp type, instead of relying on the default numpy.dtype("datetime64[ns, UTC]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns, UTC]") or object if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
New in version 3.10.0.
range_date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [("start", pyarrow.date32()), ("end", pyarrow.date32())] ))
to convert BigQuery RANGE<DATE> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us")), ] ))
to convert BigQuery RANGE<DATETIME> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC")), ] ))
to convert BigQuery RANGE<TIMESTAMP> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
- Returns
A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Return type
- Raises
ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported. Also if geography_as_object is True, but the shapely library cannot be imported.
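A sketch of overriding the default dtypes; requires pandas (and db-dtypes for the default date/time conversions), and the query is illustrative:

import pandas
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT TRUE AS flag, 42 AS answer, CURRENT_TIMESTAMP() AS ts"
)

df = query_job.to_dataframe(
    create_bqstorage_client=False,     # fall back to the REST API
    bool_dtype=pandas.BooleanDtype(),  # nullable pandas booleans
    int_dtype=pandas.Int64Dtype(),     # nullable pandas integers
)
print(df.dtypes)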
- to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None, geography_column: Optional[str] = None) geopandas.GeoDataFrame [source]¶
Return a GeoPandas GeoDataFrame from a QueryJob
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the fastavro and google-cloud-bigquery-storage libraries.
Reading from a specific partition or snapshot is not currently supported by this method.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names and pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
See to_dataframe() for details.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a GeoPandas GeoDataFrame. This option can be omitted if there's only one GEOGRAPHY column.
- Returns
A geopandas.GeoDataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Return type
- Raises
ValueError – If the geopandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported.
New in version 2.24.0.
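A sketch; requires the geopandas package, and the point-producing query is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT ST_GEOGPOINT(-122.4194, 37.7749) AS location, 'SF' AS label"
)

# geography_column could be omitted here, since there is only one
# GEOGRAPHY column in the result.
gdf = query_job.to_geodataframe(geography_column="location")
print(gdf)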
- property total_bytes_billed¶
Return total bytes billed from job statistics, if present.
- Returns
Total bytes billed for the job, or None if job is not yet complete.
- Return type
Optional[int]
- property total_bytes_processed¶
Return total bytes processed from job statistics, if present.
- Returns
Total bytes processed by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the
google.cloud.bigquery.client.Client.list_jobs()
method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property udf_resources¶
- property undeclared_query_parameters¶
Return undeclared query parameters from job statistics, if present.
- Returns
Undeclared parameters, or an empty list if the query has not yet completed.
- Return type
List[Union[ google.cloud.bigquery.query.ArrayQueryParameter, google.cloud.bigquery.query.ScalarQueryParameter, google.cloud.bigquery.query.StructQueryParameter ]]
- property use_legacy_sql¶
See
google.cloud.bigquery.job.QueryJobConfig.use_legacy_sql
.
- property use_query_cache¶
See
google.cloud.bigquery.job.QueryJobConfig.use_query_cache
.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.QueryJobConfig.write_disposition
.
- class google.cloud.bigquery.job.QueryJobConfig(**kwargs)[source]¶
Configuration options for query jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Overridden to raise an error if an unknown property is set.
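For example, a minimal construction sketch; the dataset and table IDs are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    default_dataset="your-project.your_dataset",  # hypothetical dataset ID
    use_query_cache=False,
    labels={"team": "analytics"},                 # hypothetical label
)
# The unqualified table name resolves against default_dataset.
query_job = client.query("SELECT COUNT(*) AS n FROM your_table",
                         job_config=job_config)
print(list(query_job.result()))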
- property allow_large_results¶
Allow large query results tables (legacy SQL, only)
- Type
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to None.)
Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Optional[List[str]]
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
Connection properties.
New in version 2.29.0.
- property create_disposition¶
Specifies behavior for creating tables.
- property create_session: Optional[bool]¶
[Preview] If True, creates a new session, where session_info will contain a random server-generated session ID.
If False, runs the query with an existing session_id passed in connection_properties; otherwise, runs the query in non-session mode.
New in version 2.29.0.
- property default_dataset¶
The default dataset to use for unqualified table names in the query, or None if not set.
The default_dataset setter accepts:
a Dataset, or
a DatasetReference, or
a str of the fully-qualified dataset ID in standard SQL format. The value must include a project ID and dataset ID separated by ".". For example: your-project.your_dataset.
- property destination¶
The table where results are written, or None if not set.
The destination setter accepts:
a Table, or
a TableReference, or
a str of the fully-qualified table ID in standard SQL format. The value must include a project ID, dataset ID, and table ID, each separated by ".". For example: your-project.your_dataset.your_table.
Note
Only the table ID is passed to the backend, so any configuration in google.cloud.bigquery.table.Table is discarded.
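A sketch of the string forms both setters accept; the IDs are hypothetical:

from google.cloud import bigquery

job_config = bigquery.QueryJobConfig()

# Unqualified table names in the query resolve against this dataset.
job_config.default_dataset = "your-project.your_dataset"

# Results are written to this table instead of an anonymous one.
job_config.destination = "your-project.your_dataset.results_table"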
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.
- property dry_run¶
True if this query should be a dry run to estimate costs.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.dry_run
- Type
- property flatten_results¶
Flatten nested/repeated fields in results. (Legacy SQL only)
- Type
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from
resource
.- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example, job_config = bigquery.QueryJobConfig(job_timeout_ms=5000), or job_config.job_timeout_ms = 5000.
- Raises
ValueError – If
value
type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If
value
type is invalid.- Type
- property maximum_billing_tier¶
Deprecated. Changes the billing tier to allow high-compute queries.
- Type
- property priority¶
Priority of the query.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.priority
- property query_parameters¶
list of parameters for parameterized query (empty by default)
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for destination table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not RangePartitioning or None.
- property schema_update_options¶
Specifies updates to the destination table schema to allow as a side effect of the query job.
- Type
- property script_options: google.cloud.bigquery.job.query.ScriptOptions¶
Options controlling the execution of scripts.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#scriptoptions
- property table_definitions¶
Dict[str, google.cloud.bigquery.external_config.ExternalConfig]: Definitions for external tables or
None
if not set.
- property time_partitioning¶
Specifies time-based partitioning for the destination table.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not TimePartitioning or None.
- Type
- to_api_repr() dict [source]¶
Build an API representation of the query job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property udf_resources¶
user defined function resources (empty by default)
- Type
- property use_legacy_sql¶
Use legacy SQL syntax.
- Type
- property use_query_cache¶
Look for the query result in the cache.
- Type
- property write_disposition¶
Action that occurs if the destination table already exists.
- class google.cloud.bigquery.job.QueryPlanEntry[source]¶
QueryPlanEntry represents a single stage of a query execution plan.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#ExplainQueryStage for the underlying API representation within query statistics.
- property compute_ms_avg¶
Milliseconds the average worker spent on CPU-bound processing.
- Type
Optional[int]
- property compute_ms_max¶
Milliseconds the slowest worker spent on CPU-bound processing.
- Type
Optional[int]
- property compute_ratio_avg¶
Ratio of time the average worker spent on CPU-bound processing, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property compute_ratio_max¶
Ratio of time the slowest worker spent on CPU-bound processing, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property end¶
Datetime when the stage ended.
- Type
Optional[Datetime]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.query.QueryPlanEntry [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, object]) – ExplainQueryStage representation returned from the API.
- Returns
Query plan entry parsed from
resource
.- Return type
- property read_ratio_avg¶
Ratio of time the average worker spent reading input, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property read_ratio_max¶
Ratio of time the slowest worker spent reading input, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property shuffle_output_bytes¶
Number of bytes written by this stage to intermediate shuffle.
- Type
Optional[int]
- property shuffle_output_bytes_spilled¶
Number of bytes written by this stage to intermediate shuffle and spilled to disk.
- Type
Optional[int]
- property start¶
Datetime when the stage started.
- Type
Optional[Datetime]
- property steps¶
List of step operations performed by each worker in the stage.
- Type
List(QueryPlanEntryStep)
- property wait_ms_avg¶
Milliseconds the average worker spent waiting to be scheduled.
- Type
Optional[int]
- property wait_ms_max¶
Milliseconds the slowest worker spent waiting to be scheduled.
- Type
Optional[int]
- property wait_ratio_avg¶
Ratio of time the average worker spent waiting to be scheduled, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property wait_ratio_max¶
Ratio of time the slowest worker spent waiting to be scheduled, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property write_ms_avg¶
Milliseconds the average worker spent writing output data.
- Type
Optional[int]
- property write_ms_max¶
Milliseconds the slowest worker spent writing output data.
- Type
Optional[int]
- class google.cloud.bigquery.job.QueryPlanEntryStep(kind, substeps)[source]¶
Map a single step in a query plan entry.
- Parameters
kind (str) – step type.
substeps (List) – names of substeps.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.query.QueryPlanEntryStep [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict) – JSON representation of the entry.
- Returns
New instance built from the resource.
- Return type
- class google.cloud.bigquery.job.QueryPriority[source]¶
Specifies a priority for the query. The default value is INTERACTIVE.
- BATCH = 'BATCH'¶
Specifies batch priority.
- INTERACTIVE = 'INTERACTIVE'¶
Specifies interactive priority.
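A sketch of submitting a batch-priority query; the statement is illustrative:

from google.cloud import bigquery

client = bigquery.Client()

# Batch queries are queued and start when idle resources are available.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
query_job = client.query("SELECT 1", job_config=job_config)
query_job.result()  # waits until the batch query has run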
- class google.cloud.bigquery.job.ReservationUsage(name, slot_ms)¶
Job resource usage for a reservation.
Create new instance of ReservationUsage(name, slot_ms)
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- name¶
Reservation name or “unreserved” for on-demand resources usage.
- slot_ms¶
Total slot milliseconds used by the reservation for a particular job.
- class google.cloud.bigquery.job.SchemaUpdateOption[source]¶
Specifies an update to the destination table schema as a side effect of a load job.
- ALLOW_FIELD_ADDITION = 'ALLOW_FIELD_ADDITION'¶
Allow adding a nullable field to the schema.
- ALLOW_FIELD_RELAXATION = 'ALLOW_FIELD_RELAXATION'¶
Allow relaxing a required field in the original schema to nullable.
- class google.cloud.bigquery.job.ScriptOptions(statement_timeout_ms: Optional[int] = None, statement_byte_budget: Optional[int] = None, key_result_statement: Optional[google.cloud.bigquery.enums.KeyResultStatementKind] = None)[source]¶
Options controlling the execution of scripts.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#ScriptOptions
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.job.query.ScriptOptions [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, Any]) – ScriptOptions representation returned from the API.
- Returns
ScriptOptions sample parsed from
resource
.- Return type
google.cloud.bigquery.ScriptOptions
- property key_result_statement: Optional[google.cloud.bigquery.enums.KeyResultStatementKind]¶
Determines which statement in the script represents the “key result”.
This is used to populate the schema and query results of the script job. Default is
KeyResultStatementKind.LAST
.
- class google.cloud.bigquery.job.ScriptStackFrame(resource)[source]¶
Stack frame showing the line/column/procedure name where the current evaluation happened.
- Parameters
resource (Map[str, Any]) – JSON representation of object.
- class google.cloud.bigquery.job.ScriptStatistics(resource)[source]¶
Statistics for a child job of a script.
- Parameters
resource (Map[str, Any]) – JSON representation of object.
- property evaluation_kind: Optional[str]¶
Indicates the type of child job.
Possible values include STATEMENT and EXPRESSION.
- Type
- property stack_frames: Sequence[google.cloud.bigquery.job.base.ScriptStackFrame]¶
Stack trace where the current evaluation happened.
Shows line/column/procedure name of each frame on the stack at the point where the current evaluation happened.
The leaf frame is first, the primary script is last.
- class google.cloud.bigquery.job.SourceFormat[source]¶
The format of the data files. The default value is CSV.
Note that the set of allowed values for loading data is different than the set used for external data sources (see ExternalSourceFormat).
Specifies Avro format.
- CSV = 'CSV'¶
Specifies CSV format.
- DATASTORE_BACKUP = 'DATASTORE_BACKUP'¶
Specifies Datastore backup format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- ORC = 'ORC'¶
Specifies Orc format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
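For instance, a sketch of loading Parquet files, which carry their own schema; the URI and table ID are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
)
client.load_table_from_uri(
    "gs://example-bucket/data.parquet",       # hypothetical URI
    "your-project.your_dataset.your_table",   # hypothetical table ID
    job_config=job_config,
).result()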
- class google.cloud.bigquery.job.TimelineEntry[source]¶
TimelineEntry represents progress of a query job at a particular point in time.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#querytimelinesample for the underlying API representation within query statistics.
- property active_units¶
Current number of input units being processed by workers, reported as largest value since the last sample.
- Type
Optional[int]
- classmethod from_api_repr(resource)[source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, object]) – QueryTimelineSample representation returned from the API.
- Returns
Timeline sample parsed from
resource
.- Return type
google.cloud.bigquery.TimelineEntry
- class google.cloud.bigquery.job.TransactionInfo(transaction_id: str)[source]¶
[Alpha] Information of a multi-statement transaction.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#TransactionInfo
New in version 2.24.0.
Create new instance of TransactionInfo(transaction_id).
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- class google.cloud.bigquery.job.UnknownJob(job_id, client)[source]¶
A job whose type cannot be determined.
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for
google.api_core.future.Future
.- Returns
False
- Return type
- property configuration: google.cloud.bigquery.job.base._JobConfig¶
Job-type specific configuration.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the
result()
method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- Returns
Boolean indicating existence of the job.
- Return type
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.base.UnknownJob [source]¶
Construct an UnknownJob from the JSON representation.
- Parameters
resource (Dict) – JSON representation of a job.
client (google.cloud.bigquery.client.Client) – Client connected to BigQuery API.
- Returns
Job corresponding to the resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- to_api_repr()¶
Generate a resource for the job.
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the
google.cloud.bigquery.client.Client.list_jobs()
method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- class google.cloud.bigquery.job.WriteDisposition[source]¶
Specifies the action that occurs if destination table already exists.
The default value is WRITE_APPEND.
Each action is atomic and only occurs if BigQuery is able to complete the job successfully. Creation, truncation and append actions occur as one atomic update upon job completion.
- WRITE_APPEND = 'WRITE_APPEND'¶
If the table already exists, BigQuery appends the data to the table.
- WRITE_EMPTY = 'WRITE_EMPTY'¶
If the table already exists and contains data, a ‘duplicate’ error is returned in the job result.
- WRITE_TRUNCATE = 'WRITE_TRUNCATE'¶
If the table already exists, BigQuery overwrites the table data.
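A sketch of atomically overwriting a table with query results; the table ID is hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination="your-project.your_dataset.results_table",  # hypothetical
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query("SELECT 1 AS x", job_config=job_config).result()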
Dataset¶
Define API Datasets.
- class google.cloud.bigquery.dataset.AccessEntry(role: Optional[str] = None, entity_type: Optional[str] = None, entity_id: Optional[Union[Dict[str, Any], str]] = None)[source]¶
Represents grant of an access role to an entity.
An entry must have exactly one of the allowed google.cloud.bigquery.enums.EntityTypes. If anything other than view, routine, or dataset is set, a role is also required. role is omitted for view, routine, and dataset, because they are always read-only.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets.
- Parameters
role – Role granted to the entity. The following string values are supported: ‘READER’, ‘WRITER’, ‘OWNER’. It may also be None if the entity_type is view, routine, or dataset.
entity_type – Type of entity being granted the role. See google.cloud.bigquery.enums.EntityTypes for supported types.
entity_id – If the entity_type is not ‘view’, ‘routine’, or ‘dataset’, the entity_id is the str ID of the entity being granted the role. If the entity_type is ‘view’ or ‘routine’, the entity_id is a dict representing the view or routine from a different dataset to grant access to, in the following format for views:
{'projectId': string, 'datasetId': string, 'tableId': string}
For routines:
{'projectId': string, 'datasetId': string, 'routineId': string}
If the entity_type is ‘dataset’, the entity_id is a dict that includes a ‘dataset’ field with a dict representing the dataset, and a ‘target_types’ field with a str value of the dataset’s resource type:
{'dataset': {'projectId': string, 'datasetId': string}, 'target_types': 'VIEWS'}
- Raises
ValueError – If a view, routine, or dataset has a role set, or if an entity that is not a view, routine, or dataset does not have a role set.
Examples
>>> entry = AccessEntry('OWNER', 'userByEmail', 'user@example.com')
>>> view = {
...     'projectId': 'my-project',
...     'datasetId': 'my_dataset',
...     'tableId': 'my_table'
... }
>>> entry = AccessEntry(None, 'view', view)
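Access entries take effect when the dataset is updated. A minimal sketch (the project, dataset, and email are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry("READER", "userByEmail", "user@example.com")
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])  # API request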
- property dataset: Optional[google.cloud.bigquery.dataset.DatasetReference]¶
API resource representation of a dataset reference.
- property dataset_target_types: Optional[List[str]]¶
Which resources that the dataset in this entry applies to.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.AccessEntry [source]¶
Factory: construct an access entry given its API representation
- Parameters
resource (Dict[str, object]) – Access entry resource representation returned from the API
- Returns
Access entry parsed from
resource
.- Return type
- Raises
ValueError – If the resource contains keys other than role and a single entity-type key.
- property routine: Optional[google.cloud.bigquery.routine.routine.RoutineReference]¶
API resource representation of a routine reference.
- property view: Optional[google.cloud.bigquery.table.TableReference]¶
API resource representation of a view reference.
- class google.cloud.bigquery.dataset.Dataset(dataset_ref)[source]¶
Datasets are containers for tables.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource-dataset
- Parameters
dataset_ref (Union[google.cloud.bigquery.dataset.DatasetReference, str]) – A pointer to a dataset. If
dataset_ref
is a string, it must include both the project ID and the dataset ID, separated by.
.
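A minimal creation sketch (the project and dataset IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "US"
dataset = client.create_dataset(dataset, timeout=30)  # API request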
- property access_entries¶
Dataset’s access entries.
role
augments the entity type and must be present unless the entity type isview
orroutine
.- Raises
TypeError – If ‘value’ is not a sequence
ValueError – If any item in the sequence is not an
AccessEntry
.
- Type
- property created¶
Datetime at which the dataset was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property default_encryption_configuration¶
Custom encryption configuration for all tables in the dataset.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property default_partition_expiration_ms¶
The default partition expiration for all partitioned tables in the dataset, in milliseconds.
Once this property is set, all newly-created partitioned tables in the dataset will have a
time_partitioning.expiration_ms
property set to this value, and changing the value will only affect new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value.
Setting this property overrides the use of
default_table_expiration_ms
for partitioned tables: only one ofdefault_table_expiration_ms
anddefault_partition_expiration_ms
will be used for any new partitioned table. If you provide an explicittime_partitioning.expiration_ms
when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property.- Type
Optional[int]
- property default_rounding_mode¶
defaultRoundingMode of the dataset as set by the user (defaults to
None
).Set the value to one of
'ROUND_HALF_AWAY_FROM_ZERO'
,'ROUND_HALF_EVEN'
, or'ROUNDING_MODE_UNSPECIFIED'
See default rounding mode in the REST API docs and the guide to updating the default rounding mode.
- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- property default_table_expiration_ms¶
Default expiration time for tables in the dataset (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[int, None]
- property description¶
Description of the dataset as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Optional[str]
- property etag¶
ETag for the dataset resource (
None
until set from the server).- Type
Union[str, None]
- property friendly_name¶
Title of the dataset as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.Dataset [source]¶
Factory: construct a dataset given its API representation
- Parameters
resource (Dict[str, object]) – Dataset resource representation returned from the API
- Returns
Dataset parsed from
resource
.- Return type
- classmethod from_string(full_dataset_id: str) google.cloud.bigquery.dataset.Dataset [source]¶
Construct a dataset from fully-qualified dataset ID.
- Parameters
full_dataset_id (str) – A fully-qualified dataset ID in standard SQL format. Must include both the project ID and the dataset ID, separated by
.
.- Returns
Dataset parsed from
full_dataset_id
.- Return type
Examples
>>> Dataset.from_string('my-project-id.some_dataset')
Dataset(DatasetReference('my-project-id', 'some_dataset'))
- Raises
ValueError – If
full_dataset_id
is not a fully-qualified dataset ID in standard SQL format.
- property full_dataset_id¶
ID for the dataset resource (
None
until set from the server)In the format
project_id:dataset_id
.- Type
Union[str, None]
- property is_case_insensitive¶
True if the dataset and its table names are case-insensitive, otherwise False. By default, this is False, which means the dataset and its table names are case-sensitive. This field does not affect routine references.
- Raises
ValueError – for invalid value types.
- Type
Optional[bool]
- property labels¶
Labels for the dataset.
This method always returns a dict. To change a dataset’s labels, modify the dict, then call
google.cloud.bigquery.client.Client.update_dataset()
. To delete a label, set its value to None before updating.
- Raises
ValueError – for invalid value types.
- Type
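A brief sketch of the label-update pattern described above (all names hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")
labels = dataset.labels
labels["environment"] = "production"  # add or change a label
labels["obsolete"] = None             # mark a label for deletion
dataset.labels = labels
dataset = client.update_dataset(dataset, ["labels"])  # API request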
- property location¶
Location in which the dataset is hosted as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- property max_time_travel_hours¶
Defines the time travel window in hours. The value can be from 48 to 168 hours (2 to 7 days), and in multiples of 24 hours (48, 72, 96, 120, 144, 168). The default value is 168 hours if this is not set.
- Type
Optional[int]
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- property modified¶
Datetime at which the dataset was last modified (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property reference¶
A reference to this dataset.
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- property self_link¶
URL for the dataset resource (
None
until set from the server).- Type
Union[str, None]
- property storage_billing_model¶
StorageBillingModel of the dataset as set by the user (defaults to
None
).Set the value to one of
'LOGICAL'
,'PHYSICAL'
, or'STORAGE_BILLING_MODEL_UNSPECIFIED'
. This change takes 24 hours to take effect and you must wait 14 days before you can change the storage billing model again.See storage billing model in REST API docs and updating the storage billing model guide.
- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
- class google.cloud.bigquery.dataset.DatasetListItem(resource)[source]¶
A read-only dataset resource from a list operation.
For performance reasons, the BigQuery API only includes some of the dataset properties when listing datasets. Notably,
access_entries
is missing.
For a full list of the properties that the BigQuery API returns, see the REST documentation for datasets.list.
- Parameters
resource (Dict[str, str]) – A dataset-like resource object from a dataset list response. A
datasetReference
property is required.- Raises
ValueError – If
datasetReference
or one of its required members is missing fromresource
.
- property friendly_name¶
Title of the dataset as set by the user (defaults to
None
).- Type
Union[str, None]
- property full_dataset_id¶
ID for the dataset resource (
None
until set from the server)In the format
project_id:dataset_id
.- Type
Union[str, None]
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- property reference¶
A reference to this dataset.
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
- class google.cloud.bigquery.dataset.DatasetReference(project: str, dataset_id: str)[source]¶
DatasetReferences are pointers to datasets.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#datasetreference
- Parameters
project (str) – The ID of the project.
dataset_id (str) – The ID of the dataset.
- Raises
ValueError – If either argument is not of type
str
.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.DatasetReference [source]¶
Factory: construct a dataset reference given its API representation
- classmethod from_string(dataset_id: str, default_project: Optional[str] = None) google.cloud.bigquery.dataset.DatasetReference [source]¶
Construct a dataset reference from dataset ID string.
- Parameters
dataset_id (str) – A dataset ID in standard SQL format. If default_project is not specified, this must include both the project ID and the dataset ID, separated by ..
default_project (Optional[str]) – The project ID to use when dataset_id does not include a project ID.
- Returns
Dataset reference parsed from
dataset_id
.- Return type
Examples
>>> DatasetReference.from_string('my-project-id.some_dataset') DatasetReference('my-project-id', 'some_dataset')
- Raises
ValueError – If
dataset_id
is not a fully-qualified dataset ID in standard SQL format.
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
Table¶
Define API Tables.
- class google.cloud.bigquery.table.CloneDefinition(resource: Dict[str, Any])[source]¶
Information about base table and clone time of the clone.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#clonedefinition
- Parameters
resource – Clone definition representation returned from the API.
- class google.cloud.bigquery.table.ColumnReference(referencing_column: str, referenced_column: str)[source]¶
The pair of the foreign key column and primary key column.
- Parameters
referencing_column – The column that composes the foreign key.
referenced_column – The column in the primary key that is referenced by the referencingColumn.
- class google.cloud.bigquery.table.ForeignKey(name: str, referenced_table: google.cloud.bigquery.table.TableReference, column_references: List[google.cloud.bigquery.table.ColumnReference])[source]¶
Represents a foreign key constraint on a table’s columns.
- Parameters
name – Set only if the foreign key constraint is named.
referenced_table – The table that holds the primary key and is referenced by this foreign key.
column_references – The columns that compose the foreign key.
- class google.cloud.bigquery.table.PartitionRange(start=None, end=None, interval=None, _properties=None)[source]¶
Definition of the ranges for range partitioning.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
- Parameters
start (Optional[int]) – Sets the start property.
end (Optional[int]) – Sets the end property.
interval (Optional[int]) – Sets the interval property.
- class google.cloud.bigquery.table.PrimaryKey(columns: List[str])[source]¶
Represents the primary key constraint on a table’s columns.
- Parameters
columns – The columns that compose the primary key constraint.
- class google.cloud.bigquery.table.RangePartitioning(range_=None, field=None, _properties=None)[source]¶
Range-based partitioning configuration for a table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
- Parameters
range_ (Optional[google.cloud.bigquery.table.PartitionRange]) – Sets the google.cloud.bigquery.table.RangePartitioning.range_ property.
field (Optional[str]) – Sets the google.cloud.bigquery.table.RangePartitioning.field property.
_properties (Optional[dict]) – Private. Used to construct object from API resource.
- property field¶
The table is partitioned by this field.
The field must be a top-level
NULLABLE
/REQUIRED
field. The only supported type isINTEGER
/INT64
.- Type
- property range_¶
Defines the ranges for range partitioning.
- Raises
ValueError – If the value is not a
PartitionRange
.- Type
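A minimal sketch of configuring range partitioning on a new table (the table and column names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.my_table",
    schema=[bigquery.SchemaField("zip_code", "INTEGER")],
)
table.range_partitioning = bigquery.RangePartitioning(
    field="zip_code",
    range_=bigquery.PartitionRange(start=0, end=100000, interval=10),
)
table = client.create_table(table)  # API request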
- class google.cloud.bigquery.table.Row(values, field_to_index)[source]¶
A BigQuery row.
Values can be accessed by position (index), by key like a dict, or as properties.
- Parameters
values (Sequence[object]) – The row values.
field_to_index (Dict[str, int]) – A mapping from schema field names to indexes.
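For illustration, the three access modes:
>>> row = Row(('a', 'b'), {'x': 0, 'y': 1})
>>> row[0]       # by position
'a'
>>> row['y']     # by key, like a dict
'b'
>>> row.x        # as a property
'a'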
- get(key: str, default: Optional[Any] = None) Any [source]¶
Return a value for key, with a default value if it does not exist.
- Parameters
key (str) – The key of the column to access.
default (Optional[Any]) – The default value to use if the key does not exist (defaults to None).
- Returns
The value associated with the provided key, or a default value.
- Return type
Examples
When the key exists, the value associated with it is returned.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('x')
'a'
The default value is
None
when the key does not exist.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z')
None
The default value can be overridden with the
default
parameter.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z', '')
''
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z', default='')
''
- items() Iterable[Tuple[str, Any]] [source]¶
Return items as
(key, value)
pairs.
Examples
>>> list(Row(('a', 'b'), {'x': 0, 'y': 1}).items())
[('x', 'a'), ('y', 'b')]
- class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]¶
A class for iterating through HTTP/JSON API row list responses.
- Parameters
client (Optional[google.cloud.bigquery.Client]) – The API client instance. This should always be non-None, except for subclasses that do not use it, namely the
_EmptyRowIterator
.
api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.
path (str) – The method path to query for the list of items.
schema (Sequence[Union[
SchemaField
, Mapping[str, Any] ]]) – The table’s schema. If any item is a mapping, its content must be compatible withfrom_api_repr()
.
page_token (str) – A token identifying a page in a result set to start fetching results from.
max_results (Optional[int]) – The maximum number of results to fetch.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
extra_params (Optional[Dict[str, object]]) – Extra query string parameters for the API call.
table (Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]) – The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.
selected_fields (Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]) – A subset of columns to select from this table.
total_rows (Optional[int]) – Total number of rows in the table.
first_page_response (Optional[dict]) – API response for the first page of results. These are returned when the first page is requested.
- __iter__()¶
Iterator for each item returned.
- Returns
A generator of items from the API.
- Return type
types.GeneratorType[Any]
- Raises
ValueError – If the iterator has already been started.
- client¶
The client that created this iterator.
- Type
Optional[Any]
- item_to_value¶
Callable to convert an item from the type in the raw API response into the native object. Will be called with the iterator and a single item.
- Type
Callable[Iterator, Any]
- property job_id: Optional[str]¶
ID of the query job (if applicable).
To get the job metadata, call
job = client.get_job(rows.job_id, location=rows.location)
.
- next_page_token¶
The token for the next page of results. If this is set before the iterator starts, it effectively offsets the iterator to a specific starting point.
- Type
- property num_dml_affected_rows: Optional[int]¶
If this RowIterator is the result of a DML query, the number of rows that were affected.
- property pages¶
Iterator of pages in the response.
- Returns
A generator of page instances.
- Return type
types.GeneratorType[google.api_core.page_iterator.Page]
- Raises
ValueError – If the iterator has already been started.
- property query_id: Optional[str]¶
[Preview] ID of a completed query.
This ID is auto-generated and not guaranteed to be populated.
- property schema¶
The subset of columns to be read from the table.
- Type
- to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True) pyarrow.Table [source]¶
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
- Parameters
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.New in version 1.24.0.
- Returns
A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.
- Raises
ValueError – If the
pyarrow
library cannot be imported.
New in version 1.17.0.
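An illustrative sketch (the query and table names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, age FROM `my-project.my_dataset.people`"
).result()
arrow_table = rows.to_arrow(create_bqstorage_client=False)  # plain REST download
print(arrow_table.schema)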
- to_arrow_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, max_queue_size: int = <object object>) Iterator[pyarrow.RecordBatch] [source]¶
[Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
pyarrow
andgoogle-cloud-bigquery-storage
libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If
max_queue_size
isNone
, the queue size is infinite.
- Returns
A generator of
RecordBatch
.- Return type
pyarrow.RecordBatch
New in version 2.31.0.
- to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_as_object: bool = False, bool_dtype: Optional[Any] = DefaultPandasDTypes.BOOL_DTYPE, int_dtype: Optional[Any] = DefaultPandasDTypes.INT_DTYPE, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = DefaultPandasDTypes.DATE_DTYPE, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = DefaultPandasDTypes.TIME_DTYPE, timestamp_dtype: Optional[Any] = None, range_date_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATE_DTYPE, range_datetime_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE, range_timestamp_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE) pandas.DataFrame [source]¶
Create a pandas DataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.New in version 1.24.0.
geography_as_object (Optional[bool]) –
If
True
, convert GEOGRAPHY data toshapely
geometry objects. IfFalse
(default), don’t cast geography data toshapely
geometry objects.New in version 2.24.0.
bool_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.BooleanDtype()
) to convert BigQuery Boolean type, instead of relying on the defaultpandas.BooleanDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("bool")
. BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
New in version 3.8.0.
int_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.Int64Dtype()
) to convert BigQuery Integer types, instead of relying on the defaultpandas.Int64Dtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("int64")
. A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
New in version 3.8.0.
float_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.Float32Dtype()
) to convert BigQuery Float type, instead of relying on the defaultnumpy.dtype("float64")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("float64")
. BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
New in version 3.8.0.
string_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.StringDtype()
) to convert BigQuery String type, instead of relying on the defaultnumpy.dtype("object")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("object")
. BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
New in version 3.8.0.
date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.date32())
) to convert BigQuery Date type, instead of relying on the defaultdb_dtypes.DateDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns]")
orobject
if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type
New in version 3.10.0.
datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.timestamp("us"))
) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns]")
orobject
if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
New in version 3.10.0.
time_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.time64("us"))
) to convert BigQuery Time type, instead of relying on the defaultdb_dtypes.TimeDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("object")
. BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
New in version 3.10.0.
timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))
) to convert BigQuery Timestamp type, instead of relying on the defaultnumpy.dtype("datetime64[ns, UTC]")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns, UTC]")
orobject
if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
New in version 3.10.0.
range_date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [("start", pyarrow.date32()), ("end", pyarrow.date32())] ))
to convert BigQuery RANGE<DATE> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us")), ] ))
to convert BigQuery RANGE<DATETIME> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC")), ] ))
to convert BigQuery RANGE<TIMESTAMP> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
- Returns
A
DataFrame
populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.- Return type
- Raises
ValueError – If the
pandas
library cannot be imported, or thegoogle.cloud.bigquery_storage_v1
module is required but cannot be imported. Also if geography_as_object is True, but theshapely
library cannot be imported. Also if bool_dtype, int_dtype, or other dtype parameters are not supported dtypes.
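An illustrative sketch (the query is hypothetical; tqdm is optional):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, age FROM `my-project.my_dataset.people`"
).result()
df = rows.to_dataframe(
    progress_bar_type="tqdm",       # requires the tqdm package
    create_bqstorage_client=False,  # force the REST download path
)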
- to_dataframe_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None, max_queue_size: int = <object object>) pandas.DataFrame [source]¶
Create an iterable of pandas DataFrames, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If
max_queue_size
isNone
, the queue size is infinite.New in version 2.14.0.
- Returns
A generator of
DataFrame
.- Return type
- Raises
ValueError – If the
pandas
library cannot be imported.
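An illustrative sketch of streaming a large table chunk by chunk (the table name and per-chunk handler are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows("my-project.my_dataset.big_table")
for df in rows.to_dataframe_iterable():
    handle_chunk(df)  # hypothetical per-chunk handler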
- to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_column: Optional[str] = None) geopandas.GeoDataFrame [source]¶
Create a GeoPandas GeoDataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
pyarrow
andgoogle-cloud-bigquery-storage
libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.
geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a geopandas GeoDataFrame. This option can be omitted if there is only one GEOGRAPHY column.
- Returns
A
geopandas.GeoDataFrame
populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.- Return type
- Raises
ValueError – If the
geopandas
library cannot be imported, or thegoogle.cloud.bigquery_storage_v1
module is required but cannot be imported.
New in version 2.24.0.
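An illustrative sketch (the query and column name are hypothetical; geopandas must be installed):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, geom FROM `my-project.my_dataset.places`"
).result()
gdf = rows.to_geodataframe(geography_column="geom")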
- class google.cloud.bigquery.table.SnapshotDefinition(resource: Dict[str, Any])[source]¶
Information about base table and snapshot time of the snapshot.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#snapshotdefinition
- Parameters
resource – Snapshot definition representation returned from the API.
- class google.cloud.bigquery.table.StreamingBuffer(resource)[source]¶
Information about a table’s streaming buffer.
See https://cloud.google.com/bigquery/streaming-data-into-bigquery.
- class google.cloud.bigquery.table.Table(table_ref, schema=None)[source]¶
Tables represent a set of rows whose values correspond to a schema.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource-table
- Parameters
table_ref (Union[google.cloud.bigquery.table.TableReference, str]) – A pointer to a table. If
table_ref
is a string, it must include a project ID, dataset ID, and table ID, each separated by
.
schema (Optional[Sequence[Union[
SchemaField
, Mapping[str, Any] ]]]) – The table’s schema. If any item is a mapping, its content must be compatible withfrom_api_repr()
.
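A minimal sketch of defining and creating a table (the IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER"),
]
table = bigquery.Table("my-project.my_dataset.people", schema=schema)
table = client.create_table(table)  # API request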
- property clone_definition: Optional[google.cloud.bigquery.table.CloneDefinition]¶
Information about the clone. This value is set via clone creation.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#Table.FIELDS.clone_definition
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to
None
).Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Union[List[str], None]
- property created¶
Datetime at which the table was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property description¶
Description of the table (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- property encryption_configuration¶
Custom encryption configuration for the table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property expires¶
Datetime at which the table will be deleted.
- Raises
ValueError – For invalid value types.
- Type
Union[datetime.datetime, None]
- property external_data_configuration¶
Configuration for an external data source (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[google.cloud.bigquery.ExternalConfig, None]
- property friendly_name¶
Title of the table (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.table.Table [source]¶
Factory: construct a table given its API representation
- Parameters
resource (Dict[str, object]) – Table resource representation from the API
- Returns
Table parsed from
resource
.- Return type
- Raises
KeyError – If the
resource
lacks the key'tableReference'
, or if thedict
stored within the key'tableReference'
lacks the keys'tableId'
,'projectId'
, or'datasetId'
.
- classmethod from_string(full_table_id: str) google.cloud.bigquery.table.Table [source]¶
Construct a table from fully-qualified table ID.
- Parameters
full_table_id (str) – A fully-qualified table ID in standard SQL format. Must include a project ID, dataset ID, and table ID, each separated by
.
.- Returns
Table parsed from
full_table_id
.- Return type
Examples
>>> Table.from_string('my-project.mydataset.mytable')
Table(TableRef...(D...('my-project', 'mydataset'), 'mytable'))
- Raises
ValueError – If
full_table_id
is not a fully-qualified table ID in standard SQL format.
- property full_table_id¶
ID for the table (
None
until set from the server).In the format
project-id:dataset_id.table_id
.- Type
Union[str, None]
- property labels¶
Labels for the table.
This method always returns a dict. To change a table’s labels, modify the dict, then call
Client.update_table
. To delete a label, set its value to None before updating.
- Raises
ValueError – If
value
type is invalid.- Type
- property modified¶
Datetime at which the table was last modified (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property mview_enable_refresh¶
Enable automatic refresh of the materialized view when the base table is updated. The default value is
True
.- Type
Optional[bool]
- property mview_last_refresh_time¶
Datetime at which the materialized view was last refreshed (
None
until set from the server).- Type
Optional[datetime.datetime]
- property mview_query¶
SQL query defining the table as a materialized view (defaults to
None
).- Type
Optional[str]
- property mview_refresh_interval¶
The maximum frequency at which this materialized view will be refreshed. The default value is 1800000 milliseconds (30 minutes).
- Type
Optional[datetime.timedelta]
- property num_bytes¶
The size of the table in bytes (
None
until set from the server).- Type
Union[int, None]
- property num_rows¶
The number of rows in the table (
None
until set from the server).- Type
Union[int, None]
- property partition_expiration¶
Expiration time in milliseconds for a partition.
If
partition_expiration
is set andtype_
is not set,type_
will default toDAY
.- Type
Union[int, None]
- property partitioning_type¶
Time partitioning of the table if it is partitioned (Defaults to
None
).- Type
Union[str, None]
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for a table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of
time_partitioning
orrange_partitioning
.- Raises
ValueError – If the value is not
RangePartitioning
orNone
.
- property reference¶
A
TableReference
pointing to this table.- Returns
pointer to this table.
- Return type
- property require_partition_filter¶
If set to true, queries over the partitioned table must specify a partition filter that can be used for partition elimination.
- Type
- property schema¶
Sequence[Union[SchemaField, Mapping[str, Any]]]: Table’s schema.
- Raises
Exception – If schema is not a sequence, or if any item in the sequence is not a SchemaField instance or a compatible mapping representation of the field.
- property self_link¶
URL for the table resource (
None
until set from the server).- Type
Union[str, None]
- property snapshot_definition: Optional[google.cloud.bigquery.table.SnapshotDefinition]¶
Information about the snapshot. This value is set via snapshot creation.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#Table.FIELDS.snapshot_definition
- property streaming_buffer¶
Information about a table’s streaming buffer.
- Type
google.cloud.bigquery.StreamingBuffer
- property table_constraints: Optional[google.cloud.bigquery.table.TableConstraints]¶
Tables Primary Key and Foreign Key information.
- property table_type¶
The type of the table (
None
until set from the server).Possible values are
'TABLE'
,'VIEW'
,'MATERIALIZED_VIEW'
or'EXTERNAL'
.- Type
Union[str, None]
- property time_partitioning¶
Configures time-based partitioning for a table.
Only specify at most one of
time_partitioning
orrange_partitioning
.- Raises
ValueError – If the value is not
TimePartitioning
orNone
.- Type
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- property view_query¶
SQL query defining the table as a view (defaults to
None
).By default, the query is treated as Standard SQL. To use Legacy SQL, set
view_use_legacy_sql
toTrue
.- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- property view_use_legacy_sql¶
Specifies whether to execute the view with Legacy or Standard SQL.
This boolean specifies whether to execute the view with Legacy SQL (
True
) or Standard SQL (False
). The client side default isFalse
. The server-side default isTrue
. If this table is not a view,None
is returned.- Raises
ValueError – For invalid value types.
- Type
- class google.cloud.bigquery.table.TableConstraints(primary_key: Optional[google.cloud.bigquery.table.PrimaryKey], foreign_keys: Optional[List[google.cloud.bigquery.table.ForeignKey]])[source]¶
The TableConstraints defines the primary key and foreign key.
- Parameters
primary_key – Represents a primary key constraint on a table’s columns. Present only if the table has a primary key. The primary key is not enforced.
foreign_keys – Present only if the table has a foreign key. The foreign key is not enforced.
- class google.cloud.bigquery.table.TableListItem(resource)[source]¶
A read-only table resource from a list operation.
For performance reasons, the BigQuery API only includes some of the table properties when listing tables. Notably,
schema
andnum_rows
are missing.
For a full list of the properties that the BigQuery API returns, see the REST documentation for tables.list.
- Parameters
resource (Dict[str, object]) – A table-like resource object from a table list response. A
tableReference
property is required.- Raises
ValueError – If
tableReference
or one of its required members is missing fromresource
.
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to
None
).Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Union[List[str], None]
- property created¶
Datetime at which the table was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property expires¶
Datetime at which the table will be deleted.
- Type
Union[datetime.datetime, None]
- classmethod from_string(full_table_id: str) google.cloud.bigquery.table.TableListItem [source]¶
Construct a table from fully-qualified table ID.
- Parameters
full_table_id (str) – A fully-qualified table ID in standard SQL format. Must include a project ID, dataset ID, and table ID, each separated by
.
.- Returns
Table parsed from
full_table_id
.- Return type
Examples
>>> Table.from_string('my-project.mydataset.mytable')
Table(TableRef...(D...('my-project', 'mydataset'), 'mytable'))
- Raises
ValueError – If
full_table_id
is not a fully-qualified table ID in standard SQL format.
- property full_table_id¶
ID for the table (
None
until set from the server).In the format
project_id:dataset_id.table_id
.- Type
Union[str, None]
- property labels¶
Labels for the table.
This method always returns a dict. To change a table’s labels, modify the dict, then call
Client.update_table
. To delete a label, set its value toNone
before updating.
- property partition_expiration¶
Expiration time in milliseconds for a partition.
If this property is set and
type_
is not set,type_
will default toTimePartitioningType.DAY
.- Type
Union[int, None]
- property partitioning_type¶
Time partitioning of the table if it is partitioned (Defaults to
None
).- Type
Union[str, None]
- property reference¶
A
TableReference
pointing to this table.- Returns
pointer to this table.
- Return type
- property table_type¶
The type of the table (
None
until set from the server).Possible values are
'TABLE'
,'VIEW'
, or'EXTERNAL'
.- Type
Union[str, None]
- property time_partitioning¶
Configures time-based partitioning for a table.
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- property view_use_legacy_sql¶
Specifies whether to execute the view with Legacy or Standard SQL.
This boolean specifies whether to execute the view with Legacy SQL (
True
) or Standard SQL (False
). The client side default isFalse
. The server-side default isTrue
. If this table is not a view,None
is returned.- Raises
ValueError – For invalid value types.
- Type
- class google.cloud.bigquery.table.TableReference(dataset_ref: DatasetReference, table_id: str)[source]¶
TableReferences are pointers to tables.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablereference
- Parameters
dataset_ref – A pointer to the dataset
table_id – The ID of the table
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.table.TableReference [source]¶
Factory: construct a table reference given its API representation
- classmethod from_string(table_id: str, default_project: Optional[str] = None) google.cloud.bigquery.table.TableReference [source]¶
Construct a table reference from table ID string.
- Parameters
table_id (str) – A table ID in standard SQL format. If default_project is not specified, this must include a project ID, dataset ID, and table ID, each separated by ..
default_project (Optional[str]) – The project ID to use when table_id does not include a project ID.
- Returns
Table reference parsed from
table_id
.- Return type
Examples
>>> TableReference.from_string('my-project.mydataset.mytable')
TableRef...(DatasetRef...('my-project', 'mydataset'), 'mytable')
- Raises
ValueError – If
table_id
is not a fully-qualified table ID in standard SQL format.
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
Install the
google-cloud-bigquery-storage
package to use this feature.If the
table_id
contains a partition identifier (e.g.my_table$201812
) or a snapshot identifier (e.g.mytable@1234567890
), it is ignored. Usegoogle.cloud.bigquery_storage.types.ReadSession.TableReadOptions
to filter rows by partition. Usegoogle.cloud.bigquery_storage.types.ReadSession.TableModifiers
to select a specific snapshot to read from.- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- class google.cloud.bigquery.table.TimePartitioning(type_=None, field=None, expiration_ms=None, require_partition_filter=None)[source]¶
Configures time-based partitioning for a table.
- Parameters
type (Optional[google.cloud.bigquery.table.TimePartitioningType]) –
Specifies the type of time partitioning to perform. Defaults to
DAY
.
Supported values are DAY, HOUR, MONTH, and YEAR.
field (Optional[str]) –
If set, the table is partitioned by this field. If not set, the table is partitioned by pseudo column
_PARTITIONTIME
. The field must be a top-levelTIMESTAMP
,DATETIME
, orDATE
field. Its mode must beNULLABLE
orREQUIRED
.See the time-unit column-partitioned tables guide in the BigQuery documentation.
expiration_ms (Optional[int]) – Number of milliseconds for which to keep the storage for a partition.
require_partition_filter (Optional[bool]) – DEPRECATED: Use
Table.require_partition_filter
, instead.
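A minimal sketch of creating a time-partitioned table (the table and column names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",                      # partition by this column
    expiration_ms=90 * 24 * 60 * 60 * 1000,  # keep partitions roughly 90 days
)
table = client.create_table(table)  # API request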
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.table.TimePartitioning [source]¶
Return a
TimePartitioning
object deserialized from a dict.This method creates a new
TimePartitioning
instance that points to theapi_repr
parameter as its internal properties dict. This means that when aTimePartitioning
instance is stored as a property of another object, any changes made at the higher level will also appear here:>>> time_partitioning = TimePartitioning() >>> table.time_partitioning = time_partitioning >>> table.time_partitioning.field = 'timecolumn' >>> time_partitioning.field 'timecolumn'
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the TimePartitioning, such as what is output by
to_api_repr()
.- Returns
The
TimePartitioning
object.- Return type
- property require_partition_filter¶
Specifies whether partition filters are required for queries
DEPRECATED: Use
Table.require_partition_filter
, instead.
- Type
- to_api_repr() dict [source]¶
Return a dictionary representing this object.
This method returns the properties dict of the
TimePartitioning
instance rather than making a copy. This means that when aTimePartitioning
instance is stored as a property of another object, any changes made at the higher level will also appear here.- Returns
A dictionary representing the TimePartitioning object in serialized form.
- Return type
- property type_¶
The type of time partitioning to use.
Model¶
Define resources for the BigQuery ML Models API.
- class google.cloud.bigquery.model.Model(model_ref: Optional[Union[google.cloud.bigquery.model.ModelReference, str]])[source]¶
Model represents a machine learning model resource.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models
- Parameters
model_ref – A pointer to a model. If
model_ref
is a string, it must include a project ID, dataset ID, and model ID, each separated by
.
- property best_trial_id: Optional[int]¶
The best trial_id across all training runs.
Deprecated: This property is deprecated.
Read-only.
- property created: Optional[datetime.datetime]¶
Datetime at which the model was created (
None
until set from the server).Read-only.
- property encryption_configuration: Optional[google.cloud.bigquery.encryption_configuration.EncryptionConfiguration]¶
Custom encryption configuration for the model.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property etag: Optional[str]¶
ETag for the model resource (
None
until set from the server).Read-only.
- property expires: Optional[datetime.datetime]¶
The datetime when this model expires.
If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.
- property feature_columns: Sequence[google.cloud.bigquery.standard_sql.StandardSqlField]¶
Input feature columns that were used to train this model.
Read-only.
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.Model [source]¶
Factory: construct a model resource given its API representation
- Parameters
resource – Model resource representation from the API
- Returns
Model parsed from
resource
.
- property label_columns: Sequence[google.cloud.bigquery.standard_sql.StandardSqlField]¶
Label columns that were used to train this model.
The output of the model will have a
predicted_
prefix added to these columns.
Read-only.
- property labels: Dict[str, str]¶
Labels for the model.
This method always returns a dict. To change a model’s labels, modify the dict, then call
Client.update_model
. To delete a label, set its value toNone
before updating.
- property location: Optional[str]¶
The geographic location where the model resides.
This value is inherited from the dataset.
Read-only.
- property modified: Optional[datetime.datetime]¶
Datetime at which the model was last modified (
None
until set from the server).Read-only.
- property reference: Optional[google.cloud.bigquery.model.ModelReference]¶
A model reference pointing to this model.
Read-only.
- to_api_repr() Dict[str, Any] [source]¶
Construct the API resource representation of this model.
- Returns
Model reference represented as an API resource
- property training_runs: Sequence[Dict[str, Any]]¶
Information for all training runs in increasing order of start time.
Dictionaries are in REST API format. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/models#trainingrun
Read-only.
- property transform_columns: Sequence[google.cloud.bigquery.model.TransformColumn]¶
The transform columns that were applied to the input feature columns used to train this model.
See REST API: https://cloud.google.com/bigquery/docs/reference/rest/v2/models#transformcolumn
Read-only.
- class google.cloud.bigquery.model.ModelReference[source]¶
ModelReferences are pointers to models.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models#modelreference
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.ModelReference [source]¶
Factory: construct a model reference given its API representation.
- Parameters
resource – Model reference representation returned from the API
- Returns
Model reference parsed from
resource
.
- classmethod from_string(model_id: str, default_project: Optional[str] = None) google.cloud.bigquery.model.ModelReference [source]¶
Construct a model reference from model ID string.
- Parameters
model_id – A model ID in standard SQL format. If
default_project
is not specified, this must include a project ID, dataset ID, and model ID, each separated by
.
default_project – The project ID to use when
model_id
does not include a project ID.
- Returns
Model reference parsed from
model_id
.- Raises
ValueError – If
model_id
is not a fully-qualified model ID in standard SQL format.
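An illustrative sketch (the IDs are hypothetical):
from google.cloud.bigquery.model import ModelReference

ref = ModelReference.from_string("my-project.my_dataset.my_model")
ref = ModelReference.from_string("my_dataset.my_model", default_project="my-project")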
- class google.cloud.bigquery.model.TransformColumn(resource: Dict[str, Any])[source]¶
TransformColumn represents a transform column feature.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models#transformcolumn
- Parameters
resource – A dictionary representing a transform column feature.
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.TransformColumn [source]¶
Constructs a transform column feature given its API representation
- Parameters
resource – Transform column feature representation from the API
- Returns
Transform column feature parsed from
resource
.
- property type_: Optional[google.cloud.bigquery.standard_sql.StandardSqlDataType]¶
Data type of the column after the transform.
- Returns
Data type of the column.
- Return type
Optional[google.cloud.bigquery.standard_sql.StandardSqlDataType]
Routine¶
User-Defined Routines.
- class google.cloud.bigquery.routine.DeterminismLevel[source]¶
Specifies determinism level for JavaScript user-defined functions (UDFs).
https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#DeterminismLevel
- DETERMINISM_LEVEL_UNSPECIFIED = 'DETERMINISM_LEVEL_UNSPECIFIED'¶
The determinism of the UDF is unspecified.
- DETERMINISTIC = 'DETERMINISTIC'¶
The UDF is deterministic, meaning that 2 function calls with the same inputs always produce the same result, even across 2 query runs.
- NOT_DETERMINISTIC = 'NOT_DETERMINISTIC'¶
The UDF is not deterministic.
- class google.cloud.bigquery.routine.RemoteFunctionOptions(endpoint=None, connection=None, max_batching_rows=None, user_defined_context=None, _properties=None)[source]¶
Configuration options for controlling remote BigQuery functions.
- property connection¶
Fully qualified name of the user-provided connection object which holds the authentication information to send requests to the remote service.
Format is “projects/{projectId}/locations/{locationId}/connections/{connectionId}”
- Type
string
- property endpoint¶
Endpoint of the user-provided remote service
Example: “https://us-east1-my_gcf_project.cloudfunctions.net/remote_add”
- Type
string
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RemoteFunctionOptions [source]¶
Factory: construct remote function options given its API representation.
- property max_batching_rows¶
Max number of rows in each batch sent to the remote service.
If absent or if 0, BigQuery dynamically decides the number of rows in a batch.
- Type
int64
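Example (a minimal sketch; the connection and endpoint values are hypothetical and only illustrate the expected formats):
from google.cloud.bigquery.routine import RemoteFunctionOptions

options = RemoteFunctionOptions(
    endpoint="https://us-east1-my_gcf_project.cloudfunctions.net/remote_add",
    connection="projects/my-project/locations/us-east1/connections/my-connection",
    max_batching_rows=50,  # omit or pass 0 to let BigQuery choose the batch size
)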
- class google.cloud.bigquery.routine.Routine(routine_ref, **kwargs)[source]¶
Resource representing a user-defined routine.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines
- Parameters
routine_ref (Union[str, google.cloud.bigquery.routine.RoutineReference]) – A pointer to a routine. If routine_ref is a string, it must include a project ID, dataset ID, and routine ID, each separated by a dot (.).
**kwargs (Dict) – Initial property values.
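Example (a minimal sketch assuming a hypothetical project and dataset; it creates a SQL scalar function via the Client described earlier):
from google.cloud import bigquery

client = bigquery.Client()
routine = bigquery.Routine(
    "my-project.my_dataset.add_four",
    type_="SCALAR_FUNCTION",
    language="SQL",
    body="x + 4",
    arguments=[
        bigquery.RoutineArgument(
            name="x",
            data_type=bigquery.StandardSqlDataType(
                type_kind=bigquery.StandardSqlTypeNames.INT64
            ),
        )
    ],
)
routine = client.create_routine(routine)  # send the definition to the API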
- property arguments¶
Input/output argument of a function or a stored procedure.
In-place modification is not supported. To set, replace the entire property value with the modified list of RoutineArgument objects.
- property created¶
Datetime at which the routine was created (None until set from the server).
Read-only.
- Type
Optional[datetime.datetime]
- property data_governance_type¶
If set to DATA_MASKING, the function is validated and made available as a masking function.
- Raises
ValueError – If the value is not a string or None.
- Type
Optional[str]
- property determinism_level¶
(experimental) The determinism level of the JavaScript UDF if defined.
- Type
Optional[str]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.Routine [source]¶
Factory: construct a routine given its API representation.
- property imported_libraries¶
The paths of the imported JavaScript libraries.
The language must equal JAVASCRIPT.
Examples
Set the imported_libraries to a list of Google Cloud Storage URIs.
routine = bigquery.Routine("proj.dataset.routine_id")
routine.imported_libraries = [
    "gs://cloud-samples-data/bigquery/udfs/max-value.js",
]
- Type
List[str]
- property modified¶
Datetime at which the routine was last modified (None until set from the server).
Read-only.
- Type
Optional[datetime.datetime]
- property reference¶
Reference describing the ID of this routine.
- property remote_function_options¶
Optional[google.cloud.bigquery.routine.RemoteFunctionOptions]: Configures remote function options for a routine.
- Raises
ValueError – If the value is not RemoteFunctionOptions or None.
- property return_table_type: Optional[google.cloud.bigquery.standard_sql.StandardSqlTableType]¶
The return type of a Table Valued Function (TVF) routine.
New in version 2.22.0.
- property return_type¶
Return type of the routine.
If absent, the return type is inferred from body at query time in each query that references this routine. If present, the evaluated result will be cast to the specified return type at query time.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Routine.FIELDS.return_type
- Type
google.cloud.bigquery.StandardSqlDataType
- property type_¶
The fine-grained type of the routine.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#RoutineType
- Type
str
- class google.cloud.bigquery.routine.RoutineArgument(**kwargs)[source]¶
Input/output argument of a function or a stored procedure.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#argument
- Parameters
**kwargs (Dict) – Initial property values.
- property data_type¶
Type of a variable, e.g., a function argument.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Argument.FIELDS.data_type
- Type
Optional[google.cloud.bigquery.StandardSqlDataType]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RoutineArgument [source]¶
Factory: construct a routine argument given its API representation.
- property kind¶
The kind of argument, for example FIXED_TYPE or ANY_TYPE.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Argument.FIELDS.argument_kind
- Type
Optional[str]
- class google.cloud.bigquery.routine.RoutineReference[source]¶
A pointer to a routine.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#routinereference
- __str__()[source]¶
String representation of the reference.
This is a fully-qualified ID, including the project ID and dataset ID.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RoutineReference [source]¶
Factory: construct a routine reference given its API representation.
- classmethod from_string(routine_id: str, default_project: Optional[str] = None) google.cloud.bigquery.routine.routine.RoutineReference [source]¶
Factory: construct a routine reference from routine ID string.
- Parameters
routine_id – A routine ID in standard SQL format. If default_project is not specified, this must include a project ID, dataset ID, and routine ID, each separated by a dot (.).
default_project – The project ID to use when routine_id does not include a project ID.
- Returns
Routine reference parsed from routine_id.
- Return type
google.cloud.bigquery.routine.RoutineReference
- Raises
ValueError – If routine_id is not a fully-qualified routine ID in standard SQL format.
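Example (a minimal sketch with hypothetical IDs):
from google.cloud import bigquery

ref = bigquery.RoutineReference.from_string(
    "my_dataset.my_routine", default_project="my-project"
)
print(str(ref))  # "my-project.my_dataset.my_routine"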
- class google.cloud.bigquery.routine.RoutineType[source]¶
The fine-grained type of the routine.
https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#routinetype
New in version 2.22.0.
Schema¶
Schemas for BigQuery tables / queries.
- class google.cloud.bigquery.schema.FieldElementType(element_type: str)[source]¶
Represents the type of a field element.
- Parameters
element_type (str) – The type of a field element.
- google.cloud.bigquery.schema.LEGACY_TO_STANDARD_TYPES = {'BIGNUMERIC': StandardSqlTypeNames.BIGNUMERIC, 'BOOL': StandardSqlTypeNames.BOOL, 'BOOLEAN': StandardSqlTypeNames.BOOL, 'BYTES': StandardSqlTypeNames.BYTES, 'DATE': StandardSqlTypeNames.DATE, 'DATETIME': StandardSqlTypeNames.DATETIME, 'FLOAT': StandardSqlTypeNames.FLOAT64, 'FLOAT64': StandardSqlTypeNames.FLOAT64, 'GEOGRAPHY': StandardSqlTypeNames.GEOGRAPHY, 'INT64': StandardSqlTypeNames.INT64, 'INTEGER': StandardSqlTypeNames.INT64, 'NUMERIC': StandardSqlTypeNames.NUMERIC, 'RECORD': StandardSqlTypeNames.STRUCT, 'STRING': StandardSqlTypeNames.STRING, 'STRUCT': StandardSqlTypeNames.STRUCT, 'TIME': StandardSqlTypeNames.TIME, 'TIMESTAMP': StandardSqlTypeNames.TIMESTAMP}¶
Mapping from the string names of legacy SQL types to the corresponding Standard SQL type names.
- class google.cloud.bigquery.schema.PolicyTagList(names: Iterable[str] = ())[source]¶
Define Policy Tags for a column.
- Parameters
names (Optional[Tuple[str]]) – List of policy tags to associate with the column. Policy tag identifiers are of the form projects/*/locations/*/taxonomies/*/policyTags/*.
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.schema.PolicyTagList [source]¶
Return a PolicyTagList object deserialized from a dict.
This method creates a new PolicyTagList instance that points to the api_repr parameter as its internal properties dict. This means that when a PolicyTagList instance is stored as a property of another object, any changes made at the higher level will also appear here.
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the PolicyTagList, such as what is output by to_api_repr().
- Returns
The PolicyTagList object or None.
- Return type
Optional[PolicyTagList]
- to_api_repr() dict [source]¶
Return a dictionary representing this object.
This method returns the properties dict of the PolicyTagList instance rather than making a copy. This means that when a PolicyTagList instance is stored as a property of another object, any changes made at the higher level will also appear here.
- Returns
A dictionary representing the PolicyTagList object in serialized form.
- Return type
Dict
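Example (a minimal round trip; the taxonomy and policy-tag resource name is hypothetical):
from google.cloud import bigquery

policy = bigquery.PolicyTagList(
    names=("projects/my-project/locations/us/taxonomies/123/policyTags/456",)
)
api_repr = policy.to_api_repr()  # the internal properties dict, e.g. {"names": [...]}
restored = bigquery.PolicyTagList.from_api_repr(api_repr)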
- class google.cloud.bigquery.schema.SchemaField(name: str, field_type: str, mode: str = 'NULLABLE', default_value_expression: Optional[str] = None, description: Union[str, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, fields: Iterable[google.cloud.bigquery.schema.SchemaField] = (), policy_tags: Union[google.cloud.bigquery.schema.PolicyTagList, None, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, precision: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, scale: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, max_length: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, range_element_type: Optional[Union[google.cloud.bigquery.schema.FieldElementType, str]] = None)[source]¶
Describe a single field within a table schema.
- Parameters
name – The name of the field.
field_type – The type of the field. See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.type
mode – Defaults to 'NULLABLE'. The mode of the field. See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.mode
description – Description for the field.
fields – Subfields (requires field_type of 'RECORD').
policy_tags – The policy tag list for the field.
precision – Precision (number of digits) of fields with NUMERIC or BIGNUMERIC type.
scale – Scale (digits after decimal) of fields with NUMERIC or BIGNUMERIC type.
max_length – Maximum length of fields with STRING or BYTES type.
default_value_expression – str, Optional. Used to specify the default value of a field using a SQL expression. It can only be set for top-level fields (columns).
You can use a struct or array expression to specify the default value for the entire struct or array. The valid SQL expressions are:
Literals for all data types, including STRUCT and ARRAY.
The following functions: CURRENT_TIMESTAMP, CURRENT_TIME, CURRENT_DATE, CURRENT_DATETIME, GENERATE_UUID, RAND, SESSION_USER, ST_GEOGPOINT.
Struct or array composed with the above allowed functions, for example: [CURRENT_DATE(), DATE '2020-01-01']
range_element_type – FieldElementType, str, Optional. The subtype of the RANGE, if the type of this field is RANGE. If the type is RANGE, this field is required. Possible values for the field element type of a RANGE include DATE, DATETIME and TIMESTAMP.
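Example (a minimal sketch; the table and field names are hypothetical):
from google.cloud import bigquery

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("city", "STRING"),
            bigquery.SchemaField("postal_code", "STRING"),
        ],
    ),
]
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)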
- property default_value_expression¶
Optional[str]: The default value of a field, defined by a SQL expression.
- property field_type¶
The type of the field.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.type
- Type
str
- property fields¶
Subfields contained in this field.
Must be empty or unset if field_type is not 'RECORD'.
- Type
Optional[tuple]
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.schema.SchemaField [source]¶
Return a SchemaField object deserialized from a dictionary.
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the SchemaField, such as what is output by to_api_repr().
- Returns
The SchemaField object.
- Return type
SchemaField
- property mode¶
The mode of the field.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.mode
- Type
Optional[str]
- property policy_tags¶
Policy tag list definition for this field.
- Type
Optional[PolicyTagList]
- property range_element_type¶
The subtype of the RANGE, if the type of this field is RANGE.
Must be set when type is “RANGE”. Must be one of “DATE”, “DATETIME” or “TIMESTAMP”.
- Type
Optional[FieldElementType]
- to_api_repr() dict [source]¶
Return a dictionary representing this schema field.
- Returns
A dictionary representing the SchemaField in a serialized form.
- Return type
Dict
- to_standard_sql() google.cloud.bigquery.standard_sql.StandardSqlField [source]¶
Return the field as the standard SQL field representation object.
Query¶
Retries¶
- google.cloud.bigquery.retry.DEFAULT_GET_JOB_TIMEOUT = 128¶
Default timeout for Client.get_job().
- google.cloud.bigquery.retry.DEFAULT_JOB_RETRY = <google.api_core.retry.retry_unary.Retry object>¶
The default job retry object.
- google.cloud.bigquery.retry.DEFAULT_RETRY = <google.api_core.retry.retry_unary.Retry object>¶
The default retry object.
Any method with a retry parameter will be retried automatically, with reasonable defaults. To disable retry, pass retry=None. To modify the default retry behavior, call a with_XXX method on DEFAULT_RETRY. For example, to change the deadline to 30 seconds, pass retry=bigquery.DEFAULT_RETRY.with_deadline(30).
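Example (a minimal sketch):
from google.cloud import bigquery

client = bigquery.Client()
# Shorten the overall retry deadline to 30 seconds for a single call.
job = client.query("SELECT 1", retry=bigquery.DEFAULT_RETRY.with_deadline(30))
# Disable retries entirely for a call.
job = client.query("SELECT 1", retry=None)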
- google.cloud.bigquery.retry.DEFAULT_TIMEOUT = None¶
The default API timeout.
This is the time to wait per request. To adjust the total wait time, set a deadline on the retry object.
- google.cloud.bigquery.retry.POLLING_DEFAULT_VALUE = <object object>¶
Default value defined in google.api_core.future.polling.PollingFuture.
External Configuration¶
Define classes that describe external data sources.
These are used for both Table.externalDataConfiguration and Job.configuration.query.tableDefinitions.
- class google.cloud.bigquery.external_config.BigtableColumn[source]¶
Options for a Bigtable column.
- property encoding¶
The encoding of the values when the type is not STRING.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.encoding
- Type
str
- property field_name¶
An identifier to use if the qualifier is not a valid BigQuery field identifier.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.field_name
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableColumn [source]¶
Factory: construct a BigtableColumn instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableColumn instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableColumn
- property only_read_latest¶
If this is set, only the latest version of the value in this column is exposed.
- Type
bool
- property qualifier_encoded¶
The qualifier encoded in binary.
The type is str (Python 2.x) or bytes (Python 3.x). The module will handle base64 encoding for you.
- property qualifier_string¶
A valid UTF-8 string qualifier.
- Type
str
- to_api_repr() dict [source]¶
Build an API representation of this object.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict[str, Any]
- property type_¶
The type to convert the value in cells of this column.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.type
- Type
str
- class google.cloud.bigquery.external_config.BigtableColumnFamily[source]¶
Options for a Bigtable column family.
- property columns¶
Lists of columns that should be exposed as individual fields.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.columns
- Type
List[BigtableColumn]
- property encoding¶
The encoding of the values when the type is not STRING.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.encoding
- Type
str
- property family_id¶
Identifier of the column family.
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableColumnFamily [source]¶
Factory: construct a BigtableColumnFamily instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableColumnFamily instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableColumnFamily
- property only_read_latest¶
If this is set, only the latest version of the value is exposed for all columns in this column family.
- Type
bool
- to_api_repr() dict [source]¶
Build an API representation of this object.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict[str, Any]
- property type_¶
The type to convert the value in cells of this column family.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.type
- Type
str
- class google.cloud.bigquery.external_config.BigtableOptions[source]¶
Options that describe how to treat Bigtable tables as BigQuery tables.
- property column_families¶
List of column families to expose in the table schema along with their types.
- Type
List[BigtableColumnFamily]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableOptions [source]¶
Factory: construct a BigtableOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableOptions
- property ignore_unspecified_column_families¶
If True, ignore columns not specified in the column_families list. Defaults to False.
- Type
bool
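Example (a minimal sketch; the column family, qualifier, and field names are hypothetical):
from google.cloud import bigquery

column = bigquery.BigtableColumn()
column.qualifier_string = "temp"
column.field_name = "temperature"
column.type_ = "FLOAT"

family = bigquery.BigtableColumnFamily()
family.family_id = "measurements"
family.columns = [column]

options = bigquery.BigtableOptions()
options.column_families = [family]
options.ignore_unspecified_column_families = True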
- class google.cloud.bigquery.external_config.CSVOptions[source]¶
Options that describe how to treat CSV files as BigQuery tables.
- property allow_jagged_rows¶
If True, BigQuery treats missing trailing columns as null values. Defaults to False.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.allow_jagged_rows
- Type
bool
- property allow_quoted_newlines¶
If True, quoted data sections that contain newline characters in a CSV file are allowed. Defaults to False.
- Type
bool
- property encoding¶
The character encoding of the data.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.encoding
- Type
str
- property field_delimiter¶
The separator for fields in a CSV file. Defaults to comma (‘,’).
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.field_delimiter
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.CSVOptions [source]¶
Factory: construct a CSVOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a CSVOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
CSVOptions
- property preserve_ascii_control_characters¶
Indicates whether the embedded ASCII control characters (the first 32 characters in the ASCII table, from '\x00' to '\x1F') are preserved.
- Type
bool
- property quote_character¶
The value that is used to quote data sections in a CSV file.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.quote
- Type
str
- property skip_leading_rows¶
The number of rows at the top of a CSV file that BigQuery will skip when reading the data.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.skip_leading_rows
- Type
int
- class google.cloud.bigquery.external_config.ExternalConfig(source_format)[source]¶
Description of an external data source.
- Parameters
source_format (ExternalSourceFormat) – See source_format.
- property avro_options: Optional[google.cloud.bigquery.format_options.AvroOptions]¶
Additional properties to set if sourceFormat is set to AVRO.
- property bigtable_options: Optional[google.cloud.bigquery.external_config.BigtableOptions]¶
Additional properties to set if sourceFormat is set to BIGTABLE.
- property compression¶
The compression type of the data source.
- Type
str
- property connection_id¶
[Experimental] ID of a BigQuery Connection API resource.
Warning
This feature is experimental. Pre-GA features may have limited support, and changes to pre-GA features may not be compatible with other pre-GA versions.
- Type
Optional[str]
- property csv_options: Optional[google.cloud.bigquery.external_config.CSVOptions]¶
Additional properties to set if sourceFormat is set to CSV.
- property decimal_target_types: Optional[FrozenSet[str]]¶
Possible SQL data types to which the source decimal values are converted.
New in version 2.21.0.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.ExternalConfig [source]¶
Factory: construct an ExternalConfig instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of an ExternalConfig instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
ExternalConfig
- property google_sheets_options: Optional[google.cloud.bigquery.external_config.GoogleSheetsOptions]¶
Additional properties to set if sourceFormat is set to GOOGLE_SHEETS.
- property hive_partitioning¶
[Beta] When set, it configures hive partitioning support.
Note
Experimental. This feature is experimental and might change or have limited support.
- Type
Optional[HivePartitioningOptions]
- property ignore_unknown_values¶
If True, extra values that are not represented in the table schema are ignored. Defaults to False.
- Type
bool
- property max_bad_records¶
The maximum number of bad records that BigQuery can ignore when reading data.
- Type
int
- property options: Optional[Union[google.cloud.bigquery.format_options.AvroOptions, google.cloud.bigquery.external_config.BigtableOptions, google.cloud.bigquery.external_config.CSVOptions, google.cloud.bigquery.external_config.GoogleSheetsOptions, google.cloud.bigquery.format_options.ParquetOptions]]¶
Source-specific options.
- property parquet_options: Optional[google.cloud.bigquery.format_options.ParquetOptions]¶
Additional properties to set if sourceFormat is set to PARQUET.
- property reference_file_schema_uri¶
Optional[str]: When creating an external table, the user can provide a reference file with the table schema. This is enabled for the following formats: AVRO, PARQUET, ORC.
- property schema¶
The schema for the data.
- Type
List[SchemaField]
- property source_format¶
ExternalSourceFormat: Format of the external source.
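Example (a minimal sketch; the bucket, table ID, and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.CSV)
external_config.source_uris = ["gs://my-bucket/data/*.csv"]
external_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INT64"),
]
external_config.options.skip_leading_rows = 1  # skip the header row

table = bigquery.Table("my-project.my_dataset.my_external_table")
table.external_data_configuration = external_config
table = client.create_table(table)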
- class google.cloud.bigquery.external_config.ExternalSourceFormat[source]¶
The format for external data files.
Note that the set of allowed values for external data sources is different from the set used for loading data (see SourceFormat).
Specifies Avro format.
- BIGTABLE = 'BIGTABLE'¶
Specifies Bigtable format.
- CSV = 'CSV'¶
Specifies CSV format.
- DATASTORE_BACKUP = 'DATASTORE_BACKUP'¶
Specifies Datastore backup format.
- GOOGLE_SHEETS = 'GOOGLE_SHEETS'¶
Specifies Google Sheets format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- ORC = 'ORC'¶
Specifies ORC format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
- class google.cloud.bigquery.external_config.GoogleSheetsOptions[source]¶
Options that describe how to treat Google Sheets as BigQuery tables.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.GoogleSheetsOptions [source]¶
Factory: construct a GoogleSheetsOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a GoogleSheetsOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
GoogleSheetsOptions
- property range¶
The range of a sheet that BigQuery will query from.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#GoogleSheetsOptions.FIELDS.range
- Type
str
- property skip_leading_rows¶
The number of rows at the top of a sheet that BigQuery will skip when reading the data.
- Type
int
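Example (a minimal sketch; the spreadsheet URL and range are hypothetical, and the client's credentials must include a Google Drive scope to read Sheets):
from google.cloud import bigquery

external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.GOOGLE_SHEETS)
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/MY_SHEET_ID"]
external_config.options.range = "Sheet1!A1:B20"
external_config.options.skip_leading_rows = 1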
- class google.cloud.bigquery.external_config.HivePartitioningOptions[source]¶
[Beta] Options that configure hive partitioning.
Note
Experimental. This feature is experimental and might change or have limited support.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.HivePartitioningOptions [source]¶
Factory: construct a HivePartitioningOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a HivePartitioningOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
HivePartitioningOptions
- property mode¶
When set, what mode of hive partitioning to use when reading data.
Two modes are supported: “AUTO” and “STRINGS”.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions.FIELDS.mode
- Type
Optional[str]
- property require_partition_filter¶
If set to true, queries over the partitioned table must specify a partition filter that can be used for partition elimination.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions.FIELDS.require_partition_filter
- Type
Optional[bool]
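Example (a minimal sketch; the bucket layout is hypothetical, and source_uri_prefix is a property of this class not shown in the excerpt above):
from google.cloud import bigquery

hive_config = bigquery.HivePartitioningOptions()
hive_config.mode = "AUTO"
hive_config.source_uri_prefix = "gs://my-bucket/parquet/"
hive_config.require_partition_filter = True

external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.PARQUET)
external_config.source_uris = ["gs://my-bucket/parquet/*"]
external_config.hive_partitioning = hive_config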
Magics¶
Enums¶
Encryption Configuration¶
Define class for the custom encryption configuration.
- class google.cloud.bigquery.encryption_configuration.EncryptionConfiguration(kms_key_name=None)[source]¶
Custom encryption configuration (e.g., Cloud KMS keys).
- Parameters
kms_key_name (str) – Resource ID of the Cloud KMS key used for encryption.
- classmethod from_api_repr(resource)[source]¶
Construct an encryption configuration from its API representation.
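Example (a minimal sketch; the KMS key resource ID and table ID are hypothetical, and the key must be accessible to the BigQuery service account):
from google.cloud import bigquery

kms_key_name = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
table = bigquery.Table("my-project.my_dataset.my_table")
table.encryption_configuration = bigquery.EncryptionConfiguration(
    kms_key_name=kms_key_name
)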
Additional Types¶
Helper SQL type classes.
Legacy proto-based Types (deprecated)¶
The legacy type classes based on protocol buffers.
Deprecated since version 3.0.0: These types are provided for backward compatibility only and are no longer maintained.