API Reference¶
The main concepts with this API are:
- Client manages connections to the BigQuery API. Use the client methods to run jobs (such as a QueryJob via query()) and manage resources.
- Dataset represents a collection of tables.
- Table represents a single “relation”.
Client¶
Client for interacting with the Google BigQuery API.
- class google.cloud.bigquery.client.Client(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, default_load_job_config=None, client_info=None, client_options=None)[source]¶
Client to bundle configuration needed for API requests.
- Parameters
project (Optional[str]) – Project ID for the project which the client acts on behalf of. Will be passed when creating a dataset / job. If not passed, falls back to the default inferred from the environment.
credentials (Optional[google.auth.credentials.Credentials]) – The OAuth2 credentials to use for this client. If not passed (and if no _http object is passed), falls back to the default inferred from the environment.
_http (Optional[requests.Session]) – HTTP object to make requests. Can be any object that defines request() with the same interface as requests.Session.request(). If not passed, an _http object is created that is bound to the credentials for the current object. This parameter should be considered private, and could change in the future.
location (Optional[str]) – Default location for jobs / datasets / tables.
default_query_job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Default QueryJobConfig. Will be merged into job configs passed into the query method.
default_load_job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Default LoadJobConfig. Will be merged into job configs passed into the load_table_* methods.
client_info (Optional[google.api_core.client_info.ClientInfo]) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own library or partner tool.
client_options (Optional[Union[google.api_core.client_options.ClientOptions, Dict]]) – Client options used to set user options on the client. The API endpoint should be set through client_options.
- Raises
google.auth.exceptions.DefaultCredentialsError – Raised if credentials is not specified and the library fails to acquire default credentials.
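Example (a minimal sketch; the project ID is hypothetical and Application Default Credentials are assumed to be configured in the environment):
from google.cloud import bigquery

# Credentials are inferred from the environment when not passed explicitly.
client = bigquery.Client(
    project="my-project",  # hypothetical project ID
    location="US",         # default location for new jobs / datasets / tables
)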
- SCOPE: Optional[Tuple[str, ...]] = ('https://www.googleapis.com/auth/cloud-platform',)¶
The scopes required for authenticating as a BigQuery consumer.
- cancel_job(job_id: str, project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob] [source]¶
Attempt to cancel a job from a job ID.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
job_id (Union[ str, google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]) – Job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob, ]
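Example (a minimal sketch; the job ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
# Cancellation is best effort: the job may already have completed by the
# time the cancel request is processed.
job = client.cancel_job("bq_job_123", location="US")
print(job.job_id, job.state)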
- close()[source]¶
Close the underlying transport objects, releasing system resources.
Note
The client instance can be used for making additional requests even after closing, in which case the underlying connections are automatically re-created.
- copy_table(sources: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, typing.Sequence[typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]]], destination: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.copy_.CopyJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.copy_.CopyJob [source]¶
Copy one or more tables to another table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationtablecopy
- Parameters
sources (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, Sequence[ Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ] ], ]) – Table or tables to be copied.
destination (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – Table into which data is to be copied.
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of any source table as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new copy job instance.
- Return type
google.cloud.bigquery.job.CopyJob
- Raises
TypeError – If job_config is not an instance of CopyJobConfig class.
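Example (a minimal sketch; table IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
copy_job = client.copy_table(
    "my-project.my_dataset.source_table",
    "my-project.my_dataset.destination_table",
)
copy_job.result()  # wait for the copy to complete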
- create_dataset(dataset: typing.Union[str, google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem], exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
API call: create the dataset via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert
Example:
from google.cloud import bigquery
client = bigquery.Client()
dataset = bigquery.Dataset('my_project.my_dataset')
dataset = client.create_dataset(dataset)
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A Dataset to create. If dataset is a reference, an empty dataset is created with the specified ID and client’s default location.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Dataset returned from the API.
- Return type
google.cloud.bigquery.dataset.Dataset
- Raises
google.cloud.exceptions.Conflict – If the dataset already exists.
- create_job(job_config: dict, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob] [source]¶
Create a new job.
- Parameters
job_config (dict) – Job configuration in the dict representation returned from the API.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new job instance.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]
- create_routine(routine: google.cloud.bigquery.routine.routine.Routine, exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Create a routine via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/insert
- Parameters
routine (google.cloud.bigquery.routine.Routine) – A Routine to create. The dataset that the routine belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the routine.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Routine returned from the service.
- Return type
google.cloud.bigquery.routine.Routine
- Raises
google.cloud.exceptions.Conflict – If the routine already exists.
- create_table(table: typing.Union[str, google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem], exists_ok: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
API call: create a table via a PUT request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A Table to create. If table is a reference, an empty table is created with the specified ID. The dataset that the table belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the table.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Table returned from the service.
- Return type
google.cloud.bigquery.table.Table
- Raises
google.cloud.exceptions.Conflict – If the table already exists.
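Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER"),
]
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
table = client.create_table(table)  # raises Conflict if the table exists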
- dataset(dataset_id: str, project: Optional[str] = None) google.cloud.bigquery.dataset.DatasetReference [source]¶
Deprecated: Construct a reference to a dataset.
Deprecated since version 1.24.0: Construct a DatasetReference using its constructor or use a string where previously a reference object was used.
As of google-cloud-bigquery version 1.7.0, all client methods that take a DatasetReference or TableReference also take a string in standard SQL format, e.g. project.dataset_id or project.dataset_id.table_id.
- Parameters
dataset_id (str) – ID of the dataset.
project (Optional[str]) – Project ID for the dataset (defaults to the client’s project).
- Returns
A new DatasetReference instance.
- Return type
google.cloud.bigquery.dataset.DatasetReference
- property default_load_job_config¶
Default LoadJobConfig. Will be merged into job configs passed into the load_table_* methods.
- property default_query_job_config: Optional[google.cloud.bigquery.job.query.QueryJobConfig]¶
Default QueryJobConfig or None.
Will be merged into job configs passed into the query or query_and_wait methods.
- delete_dataset(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], delete_contents: bool = False, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
Delete a dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/delete
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset to delete. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
delete_contents (Optional[bool]) – If True, delete all the tables in the dataset. If False and the dataset contains tables, the request will fail. Default is False.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the dataset.
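Example (a minimal sketch; the dataset ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
# Remove the dataset and any tables it contains; do not fail if it is
# already gone.
client.delete_dataset("my-project.my_dataset", delete_contents=True, not_found_ok=True)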
- delete_job_metadata(job_id: typing.Union[str, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob], project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False)[source]¶
[Beta] Delete job metadata from job history.
Note: This does not stop a running job. Use cancel_job() instead.
- Parameters
job_id (Union[ str, LoadJob, CopyJob, ExtractJob, QueryJob ]) – Job or job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the job.
- delete_model(model: typing.Union[google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
[Beta] Delete a model
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/delete
- Parameters
model (Union[ google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to delete. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the model.
- delete_routine(routine: typing.Union[google.cloud.bigquery.routine.routine.Routine, google.cloud.bigquery.routine.routine.RoutineReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
[Beta] Delete a routine.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/delete
- Parameters
routine (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to delete. If a string is passed in, this method attempts to create a routine reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the routine.
- delete_table(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, not_found_ok: bool = False) None [source]¶
Delete a table
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/delete
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A reference to the table to delete. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the table.
- extract_table(source: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str], destination_uris: typing.Union[str, typing.Sequence[str]], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.extract.ExtractJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, source_type: str = 'Table') google.cloud.bigquery.job.extract.ExtractJob [source]¶
Start a job to extract a table into Cloud Storage files.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract
- Parameters
source (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – Table or Model to be extracted.
destination_uris (Union[str, Sequence[str]]) – URIs of Cloud Storage file(s) into which table data is to be extracted; in format gs://<bucket_name>/<object_name_or_glob>.
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the source table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
source_type (Optional[str]) – Type of source to be extracted. Table or Model. Defaults to Table.
- Returns
A new extract job instance.
- Return type
google.cloud.bigquery.job.ExtractJob
- Raises
TypeError – If job_config is not an instance of ExtractJobConfig class.
ValueError – If source_type is not among Table, Model.
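Example (a minimal sketch; the table ID and bucket are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-bucket/exports/my_table-*.csv",  # wildcard allows sharded output
    location="US",
)
extract_job.result()  # wait for the export to finish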
- classmethod from_service_account_info(info, *args, **kwargs)¶
Factory to retrieve JSON credentials while creating client.
- Parameters
info (dict) – The JSON object with the credentials information (the parsed contents of a service account key file).
args (tuple) – Remaining positional arguments to pass to constructor.
kwargs – Remaining keyword arguments to pass to constructor.
- Return type
_ClientFactoryMixin
- Returns
The client created with the retrieved JSON credentials.
- Raises
TypeError – if there is a conflict with the kwargs and the credentials created by the factory.
- classmethod from_service_account_json(json_credentials_path, *args, **kwargs)¶
Factory to retrieve JSON credentials while creating client.
- Parameters
json_credentials_path (str) – The path to a private key file (this file was given to you when you created the service account). This file must contain a JSON object with a private key and other credentials information (downloaded from the Google APIs console).
args (tuple) – Remaining positional arguments to pass to constructor.
kwargs – Remaining keyword arguments to pass to constructor.
- Return type
_ClientFactoryMixin
- Returns
The client created with the retrieved JSON credentials.
- Raises
TypeError – if there is a conflict with the kwargs and the credentials created by the factory.
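Example (a minimal sketch; the key file path is hypothetical):
from google.cloud import bigquery

# Construct a client from a downloaded service account key file.
client = bigquery.Client.from_service_account_json("/path/to/service-account-key.json")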
- get_dataset(dataset_ref: typing.Union[google.cloud.bigquery.dataset.DatasetReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
Fetch the dataset referenced by dataset_ref.
- Parameters
dataset_ref (Union[ google.cloud.bigquery.dataset.DatasetReference, str, ]) – A reference to the dataset to fetch from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Dataset instance.
- Return type
google.cloud.bigquery.dataset.Dataset
- get_iam_policy(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], requested_policy_version: int = 1, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.api_core.iam.Policy [source]¶
Return the access control policy for a table resource.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – The table to get the access control policy for. If a string is passed in, this method attempts to create a table reference from a string using from_string().
requested_policy_version (int) – Optional. The maximum policy version that will be used to format the policy. Only version 1 is currently supported. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/GetPolicyOptions
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The access control policy.
- Return type
google.api_core.iam.Policy
- get_job(job_id: typing.Union[str, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob], project: typing.Optional[str] = None, location: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128) Union[google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.query.QueryJob, google.cloud.bigquery.job.base.UnknownJob] [source]¶
Fetch a job for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
job_id (Union[ str, job.LoadJob, job.CopyJob, job.ExtractJob, job.QueryJob ]) – Job identifier.
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run. Ignored if job_id is a job object.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[job.LoadJob, job.CopyJob, job.ExtractJob, job.QueryJob, job.UnknownJob]
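Example (a minimal sketch; the job ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
job = client.get_job("bq_job_123", location="US")
print(job.job_type, job.state)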
- get_model(model_ref: typing.Union[google.cloud.bigquery.model.ModelReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.model.Model [source]¶
[Beta] Fetch the model referenced by model_ref.
- Parameters
model_ref (Union[ google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to fetch from the BigQuery API. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Model instance.
- Return type
google.cloud.bigquery.model.Model
- get_routine(routine_ref: typing.Union[google.cloud.bigquery.routine.routine.Routine, google.cloud.bigquery.routine.routine.RoutineReference, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Get the routine referenced by routine_ref.
- Parameters
routine_ref (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to fetch from the BigQuery API. If a string is passed in, this method attempts to create a reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Routine instance.
- Return type
google.cloud.bigquery.routine.Routine
- get_service_account_email(project: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) str [source]¶
Get the email address of the project’s BigQuery service account
Example:
from google.cloud import bigquery
client = bigquery.Client()
client.get_service_account_email()
# returns an email similar to: my_service_account@my-project.iam.gserviceaccount.com
Note
This is the service account that BigQuery uses to manage tables encrypted by a key in KMS.
- Parameters
project (Optional[str]) – Project ID to use for retrieving the service account email. Defaults to the client’s project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
service account email address
- Return type
str
- get_table(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
Fetch the table referenced by table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – A reference to the table to fetch from the BigQuery API. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Table instance.
- Return type
google.cloud.bigquery.table.Table
- insert_rows(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], rows: Union[Iterable[Tuple], Iterable[Mapping[str, Any]]], selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, **kwargs) Sequence[Dict[str, Any]] [source]¶
Insert rows into a table via the streaming API.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
rows (Union[Sequence[Tuple], Sequence[Dict]]) – Row data to be inserted. If a list of tuples is given, each tuple should contain data for each schema field on the current table and in the same order as the schema fields. If a list of dictionaries is given, the keys must include all required fields in the schema. Keys which do not correspond to a field in the schema are ignored.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
kwargs (dict) – Keyword arguments to insert_rows_json().
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
ValueError – if table’s schema is not set or rows is not a Sequence.
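Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")  # fetches the schema
rows_to_insert = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]
errors = client.insert_rows(table, rows_to_insert)
if errors:
    print("Encountered errors:", errors)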
- insert_rows_from_dataframe(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], dataframe, selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict) Sequence[Sequence[dict]] [source]¶
Insert rows into a table from a dataframe via the streaming API.
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
dataframe (pandas.DataFrame) – A DataFrame containing the data to load. Any NaN values present in the dataframe are omitted from the streaming API request(s).
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
chunk_size (int) – The number of rows to stream in a single chunk. Must be positive.
kwargs (Dict) – Keyword arguments to insert_rows_json().
- Returns
A list with insert errors for each insert chunk. Each element is a list containing one mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Sequence[Mappings]]
- Raises
ValueError – if table’s schema is not set
- insert_rows_json(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], json_rows: typing.Sequence[typing.Mapping[str, typing.Any]], row_ids: typing.Optional[typing.Union[typing.Iterable[typing.Optional[str]], google.cloud.bigquery.enums.AutoRowIDs]] = AutoRowIDs.GENERATE_UUID, skip_invalid_rows: typing.Optional[bool] = None, ignore_unknown_values: typing.Optional[bool] = None, template_suffix: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Sequence[dict] [source]¶
Insert rows into a table without applying local type conversions.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
BigQuery will reject insertAll payloads that exceed a defined limit (10MB). Additionally, if a payload vastly exceeds this limit, the request is rejected by the intermediate architecture, which returns a 413 (Payload Too Large) status code.
See https://cloud.google.com/bigquery/quotas#streaming_inserts
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str ]) – The destination table for the row data, or a reference to it.
json_rows (Sequence[Dict]) – Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
row_ids (Union[Iterable[str], AutoRowIDs, None]) – Unique IDs, one per row being inserted. An ID can also be None, indicating that an explicit insert ID should not be used for that row. If the argument is omitted altogether, unique IDs are created automatically.
Changed in version 2.21.0: Can also be an iterable, not just a sequence, or an AutoRowIDs enum member.
Deprecated since version 2.21.0: Passing None to explicitly request autogenerating insert IDs is deprecated, use AutoRowIDs.GENERATE_UUID instead.
skip_invalid_rows (Optional[bool]) – Insert all valid rows of a request, even if invalid rows exist. The default value is False, which causes the entire request to fail if any invalid rows exist.
ignore_unknown_values (Optional[bool]) – Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is False, which treats unknown values as errors.
template_suffix (Optional[str]) – Treat name as a template table and provide a suffix. BigQuery will create the table <name> + <template_suffix> based on the schema of the template table. See https://cloud.google.com/bigquery/streaming-data-into-bigquery#template-tables
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
TypeError – if json_rows is not a Sequence.
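Example (a minimal sketch; the table ID and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"full_name": "Phred Phlyntstone", "age": 32}]  # keys must match the schema
errors = client.insert_rows_json("my-project.my_dataset.my_table", rows)
assert errors == [], f"insert failed: {errors}"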
- job_from_resource(resource: dict) Union[google.cloud.bigquery.job.copy_.CopyJob, google.cloud.bigquery.job.extract.ExtractJob, google.cloud.bigquery.job.load.LoadJob, google.cloud.bigquery.job.query.QueryJob, google.cloud.bigquery.job.base.UnknownJob] [source]¶
Detect correct job type from resource and instantiate.
- Parameters
resource (Dict) – one job resource from API response
- Returns
The job instance, constructed via the resource.
- Return type
Union[job.CopyJob, job.ExtractJob, job.LoadJob, job.QueryJob, job.UnknownJob]
- list_datasets(project: typing.Optional[str] = None, include_all: bool = False, filter: typing.Optional[str] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List datasets for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving datasets. Defaults to the client’s project.
include_all (Optional[bool]) – True if results include hidden datasets. Defaults to False.
filter (Optional[str]) – An expression for filtering the results by label. For syntax, see https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#body.QUERY_PARAMETERS.filter
max_results (Optional[int]) – Maximum number of datasets to return.
page_token (Optional[str]) – Token representing a cursor into the datasets. If not passed, the API will return the first page of datasets. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of datasets to return per page.
- Returns
Iterator of DatasetListItem associated with the project.
- Return type
google.api_core.page_iterator.Iterator
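Example (a minimal sketch):
from google.cloud import bigquery

client = bigquery.Client()
for dataset in client.list_datasets():  # yields DatasetListItem objects
    print(dataset.dataset_id)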
- list_jobs(project: typing.Optional[str] = None, parent_job: typing.Optional[typing.Union[google.cloud.bigquery.job.query.QueryJob, str]] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, all_users: typing.Optional[bool] = None, state_filter: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, min_creation_time: typing.Optional[datetime.datetime] = None, max_creation_time: typing.Optional[datetime.datetime] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List jobs for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving jobs. Defaults to the client’s project.
parent_job (Optional[Union[ google.cloud.bigquery.job._AsyncJob, str, ]]) – If set, retrieve only child jobs of the specified parent.
max_results (Optional[int]) – Maximum number of jobs to return.
page_token (Optional[str]) – Opaque marker for the next “page” of jobs. If not passed, the API will return the first page of jobs. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of HTTPIterator.
all_users (Optional[bool]) – If true, include jobs owned by all users in the project. Defaults to False.
state_filter (Optional[str]) – If set, include only jobs matching the given state. One of: "done", "pending", "running".
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
min_creation_time (Optional[datetime.datetime]) – Min value for job creation time. If set, only jobs created after or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
max_creation_time (Optional[datetime.datetime]) – Max value for job creation time. If set, only jobs created before or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
page_size (Optional[int]) – Maximum number of jobs to return per page.
- Returns
Iterable of job instances.
- Return type
google.api_core.page_iterator.Iterator
- list_models(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
[Beta] List models in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose models to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of models to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the models. If not passed, the API will return the first page of models. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size – Maximum number of models to return per page. Defaults to a value set by the API.
- list_partitions(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) Sequence[str] [source]¶
List the partitions in a table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str, ]) – The table or reference from which to get partition info.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
A list of the partition ids present in the partitioned table
- Return type
List[str]
- list_projects(max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List projects for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/list
- Parameters
max_results (Optional[int]) – Maximum number of projects to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the projects. If not passed, the API will return the first page of projects. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of projects to return in each page. Defaults to a value set by the API.
- Returns
Iterator of Project accessible to the current client.
- Return type
google.api_core.page_iterator.Iterator
- list_routines(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
[Beta] List routines in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose routines to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of routines to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the routines. If not passed, the API will return the first page of routines. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size – Maximum number of routines to return per page. Defaults to a value set by the API.
- list_rows(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str], selected_fields: typing.Optional[typing.Sequence[google.cloud.bigquery.schema.SchemaField]] = None, max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, start_index: typing.Optional[int] = None, page_size: typing.Optional[int] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.RowIterator [source]¶
List the rows of the table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list
Note
This method assumes that the provided schema is up-to-date with the schema as defined on the back-end: if the two schemas are not identical, the values returned may be incomplete. To ensure that the local copy of the schema is up-to-date, call client.get_table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str, ]) – The table to list, or a reference to it. When the table object does not contain a schema and selected_fields is not supplied, this method calls get_table to fetch the table schema.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. If not supplied, data for all columns are downloaded.
max_results (Optional[int]) – Maximum number of rows to return.
page_token (Optional[str]) – Token representing a cursor into the table’s rows. If not passed, the API will return the first page of the rows. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the RowIterator.
start_index (Optional[int]) – The zero-based index of the starting row to read.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
Iterator of row data Row-s. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the table (this is distinct from the total number of rows in the current page: iterator.page.num_items).
- Return type
google.cloud.bigquery.table.RowIterator
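Example (a minimal sketch; the table ID is hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows("my-project.my_dataset.my_table", max_results=10)
for row in rows:
    print(dict(row))  # each Row supports mapping-style access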
- list_tables(dataset: typing.Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str], max_results: typing.Optional[int] = None, page_token: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, page_size: typing.Optional[int] = None) google.api_core.page_iterator.Iterator [source]¶
List tables in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, google.cloud.bigquery.dataset.DatasetListItem, str, ]) – A reference to the dataset whose tables to list from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of tables to return. Defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the tables. If not passed, the API will return the first page of tables. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
page_size (Optional[int]) – Maximum number of tables to return per page. Defaults to a value set by the API.
- Returns
Iterator of TableListItem contained within the requested dataset.
- Return type
google.api_core.page_iterator.Iterator
- load_table_from_dataframe(dataframe: pandas.core.frame.DataFrame, destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, parquet_compression: str = 'snappy', timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of a table from a pandas DataFrame.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
Note
REPEATED fields are NOT supported when using the CSV source format. They are supported when using the PARQUET source format, but due to the way they are encoded in the parquet file, a mismatch with the existing table schema can occur, so REPEATED fields are not properly supported when using pyarrow<4.0.0 with the parquet format.
- Parameters
dataframe (pandas.DataFrame) – A DataFrame containing the data to load.
destination (Union[ Table, TableReference, str ]) – The destination table to use for loading the data. If it is an existing table, the schema of the DataFrame must match the schema of the destination table. If the table does not yet exist, the schema is inferred from the DataFrame.
If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job.
To override the default pandas data type conversions, supply a value for schema with column names matching those of the dataframe. The BigQuery schema is used to determine the correct data type conversion. Indexes are not loaded.
By default, this method uses the parquet source format. To override this, supply a value for source_format with the format name. Currently only CSV and PARQUET are supported.
parquet_compression (Optional[str]) – [Beta] The compression method to use if intermittently serializing dataframe to a parquet file. Defaults to “snappy”.
The argument is directly passed as the compression argument to the underlying pyarrow.parquet.write_table() method (the default value “snappy” gets converted to uppercase). https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table
If the job config schema is missing, the argument is directly passed as the compression argument to the underlying DataFrame.to_parquet() method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
google.cloud.bigquery.job.LoadJob
- Raises
ValueError – If a usable parquet engine cannot be found. This method requires pyarrow to be installed.
TypeError – If job_config is not an instance of LoadJobConfig class.
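Example (a minimal sketch; the table ID is hypothetical and pyarrow is assumed to be installed):
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"full_name": ["Phred Phlyntstone"], "age": [32]})
job = client.load_table_from_dataframe(df, "my-project.my_dataset.my_table")
job.result()  # wait for the load to complete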
- load_table_from_file(file_obj: IO[bytes], destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], rewind: bool = False, size: Optional[int] = None, num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of this table from a file-like object.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
- Parameters
.- Parameters
file_obj (IO[bytes]) – A file handle opened in binary mode for reading.
destination (Union[Table, TableReference, TableListItem, str ]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
rewind (Optional[bool]) – If True, seek to the beginning of the file handle before reading the file. Defaults to False.
size (Optional[int]) – The number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client’s project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
google.cloud.bigquery.job.LoadJob
- Raises
ValueError – If size is not passed in and can not be determined, or if the file_obj can be detected to be a file opened in text mode.
TypeError – If job_config is not an instance of LoadJobConfig class.
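Example (a minimal sketch; the file path and table ID are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)
with open("data.csv", "rb") as source_file:  # must be opened in binary mode
    job = client.load_table_from_file(
        source_file, "my-project.my_dataset.my_table", job_config=job_config
    )
job.result()  # wait for the load to complete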
- load_table_from_json(json_rows: Iterable[Dict[str, Any]], destination: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], num_retries: int = 6, job_id: Optional[str] = None, job_id_prefix: Optional[str] = None, location: Optional[str] = None, project: Optional[str] = None, job_config: Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, timeout: Union[None, float, Tuple[float, float]] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Upload the contents of a table from a JSON string or dict.
- Parameters
json_rows (Iterable[Dict[str, Any]]) –
Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
Note
If your data is already a newline-delimited JSON string, it is best to wrap it into a file-like object and pass it to
load_table_from_file()
:import io from google.cloud import bigquery data = u'{"foo": "bar"}' data_as_file = io.StringIO(data) client = bigquery.Client() client.load_table_from_file(data_as_file, ...)
destination (Union[ Table, TableReference, TableListItem, str ]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using
google.cloud.bigquery.table.TableReference.from_string()
.num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
job_config (Optional[LoadJobConfig]) – Extra configuration options for the job. The source_format setting is always set to NEWLINE_DELIMITED_JSON.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. Depending on the retry strategy, a request may be repeated several times using the same timeout each time. Defaults to None.
Can also be passed as a tuple (connect_timeout, read_timeout). See requests.Session.request() documentation for details.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
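As a minimal sketch (the table ID below is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
rows = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]
load_job = client.load_table_from_json(rows, "my_project.my_dataset.my_table")
load_job.result()  # Wait for the load to complete.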
- load_table_from_uri(source_uris: typing.Union[str, typing.Sequence[str]], destination: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, job_config: typing.Optional[google.cloud.bigquery.job.load.LoadJobConfig] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.load.LoadJob [source]¶
Starts a job for loading data into a table from Cloud Storage.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload
- Parameters
source_uris (Union[str, Sequence[str]]) – URIs of data files to be loaded; in format gs://<bucket_name>/<object_name_or_glob>.
destination (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
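For instance, a minimal sketch of loading a CSV file from Cloud Storage (bucket and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my_bucket/my_data.csv",
    "my_project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # Wait for the load to complete.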
- property location¶
Default location for jobs / datasets / tables.
- query(query: str, job_config: typing.Optional[google.cloud.bigquery.job.query.QueryJobConfig] = None, job_id: typing.Optional[str] = None, job_id_prefix: typing.Optional[str] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, job_retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, api_method: typing.Union[str, google.cloud.bigquery.enums.QueryApiMethod] = QueryApiMethod.INSERT) google.cloud.bigquery.job.query.QueryJob [source]¶
Run a SQL query.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
- Parameters
query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or whatever value is preferred.
job_id (Optional[str]) – ID to use for the query job.
job_id_prefix (Optional[str]) – The prefix to use for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the table used in the query as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. This only applies to making RPC calls. It isn’t used to retry failed jobs. This has a reasonable default that should only be overridden with care.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
job_retry (Optional[google.api_core.retry.Retry]) –
How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry.
Not all jobs can be retried. If job_id is provided, then the job returned by the query will not be retryable, and an exception will be raised if a non-None (and non-default) value for job_retry is also provided.
Note that errors aren't detected until result() is called on the returned job. The job_retry specified here becomes the default job_retry for result(), where it can also be specified.
api_method (Union[str, enums.QueryApiMethod]) –
Method with which to start the query job.
See google.cloud.bigquery.enums.QueryApiMethod for details on the difference between the query start methods.
- Returns
A new query job instance.
- Return type
- Raises
TypeError – If job_config is not an instance of the QueryJobConfig class, or if both job_id and a non-None, non-default job_retry are provided.
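A minimal sketch, using a public dataset; note that query errors surface when result() is called, not when the job starts:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10"
)
for row in query_job.result():
    print(row["name"], row["total"])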
- query_and_wait(query, *, job_config: typing.Optional[google.cloud.bigquery.job.query.QueryJobConfig] = None, location: typing.Optional[str] = None, project: typing.Optional[str] = None, api_timeout: typing.Optional[float] = None, wait_timeout: typing.Union[float, None, object] = <object object>, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, job_retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, page_size: typing.Optional[int] = None, max_results: typing.Optional[int] = None) google.cloud.bigquery.table.RowIterator [source]¶
Run the query, wait for it to finish, and return the results.
While jobCreationMode=JOB_CREATION_OPTIONAL is in preview in the jobs.query REST API, this method uses the default jobCreationMode unless the environment variable QUERY_PREVIEW_ENABLED=true. After jobCreationMode is GA, this method will always use jobCreationMode=JOB_CREATION_OPTIONAL. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
- Parameters
query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or whatever value is preferred.
location (Optional[str]) – Location where to run the job. Must match the location of the table used in the query as well as the destination table.
project (Optional[str]) – Project ID of the project where the job runs. Defaults to the client's project.
api_timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
wait_timeout (Optional[Union[float, object]]) – The number of seconds to wait for the query to finish. If the query doesn't finish before this timeout, the client attempts to cancel the query. If unset, the underlying REST API calls have timeouts, but we still wait indefinitely for the job to finish.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. This only applies to making RPC calls. It isn’t used to retry failed jobs. This has a reasonable default that should only be overridden with care.
job_retry (Optional[google.api_core.retry.Retry]) – How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry. Not all jobs can be retried.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored.
max_results (Optional[int]) – The maximum total number of rows from this request.
- Returns
Iterator of row data Rows. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the result set (this is distinct from the total number of rows in the current page: iterator.page.num_items).
If the query is a special query that produces no results, e.g. a DDL query, an _EmptyRowIterator instance is returned.
- Return type
- Raises
TypeError – If job_config is not an instance of the QueryJobConfig class.
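A minimal sketch:

from google.cloud import bigquery

client = bigquery.Client()
rows = client.query_and_wait(
    "SELECT 1 AS x",
    wait_timeout=60.0,  # Try to cancel the query if it runs longer than this.
)
print(rows.total_rows)
for row in rows:
    print(row["x"])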
- schema_from_json(file_or_path: PathType) List[google.cloud.bigquery.schema.SchemaField] [source]¶
Takes a file object or file path that contains JSON describing a table schema.
- Returns
List of SchemaField objects.
- Return type
List[SchemaField]
- schema_to_json(schema_list: Sequence[google.cloud.bigquery.schema.SchemaField], destination: PathType)[source]¶
Takes a list of schema field objects and serializes them as JSON to a file.
Destination is a file path or a file object.
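A round-trip sketch of both helpers (the file path is a placeholder; the schema should read back equal to what was written):

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="NULLABLE"),
]
client.schema_to_json(schema, "schema.json")
assert client.schema_from_json("schema.json") == schema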
- set_iam_policy(table: typing.Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str], policy: google.api_core.iam.Policy, updateMask: typing.Optional[str] = None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None, *, fields: typing.Sequence[str] = ()) google.api_core.iam.Policy [source]¶
Set the access control policy for a table resource.
- Parameters
table (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.table.TableListItem, str]) – The table to set the access control policy for. If a string is passed in, this method attempts to create a table reference from a string using from_string().
policy (google.api_core.iam.Policy) – The access control policy to set.
updateMask (Optional[str]) –
Mask as defined by https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/setIamPolicy#body.request_body.FIELDS.update_mask
Incompatible with fields.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
fields (Sequence[str]) –
Which properties to set on the policy. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/setIamPolicy#body.request_body.FIELDS.update_mask
Incompatible with updateMask.
- Returns
The updated access control policy.
- Return type
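A sketch following the common read-modify-write pattern (table ID and member are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.my_table"
policy = client.get_iam_policy(table_id)
policy.bindings.append(
    {
        "role": "roles/bigquery.dataViewer",
        "members": {"user:example-user@example.com"},
    }
)
updated_policy = client.set_iam_policy(table_id, policy)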
- update_dataset(dataset: google.cloud.bigquery.dataset.Dataset, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.dataset.Dataset [source]¶
Change some fields of a dataset.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in dataset, it will be deleted.
For example, to update the default expiration times, specify both properties in the fields argument:

bigquery_client.update_dataset(
    dataset,
    [
        "default_partition_expiration_ms",
        "default_table_expiration_ms",
    ],
)
If dataset.etag is not None, the update will only succeed if the dataset on the server has the same ETag. Thus reading a dataset with get_dataset, changing its fields, and then passing it to update_dataset will ensure that the changes will only be saved if no modifications to the dataset occurred since the read.
- Parameters
dataset (google.cloud.bigquery.dataset.Dataset) – The dataset to update.
fields (Sequence[str]) – The properties of dataset to change. These are strings corresponding to the properties of Dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The modified Dataset instance.
- Return type
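A sketch of the ETag-protected read-modify-write flow described above (the dataset ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my_project.my_dataset")
dataset.description = "Nightly batch imports"
# Because get_dataset populated dataset.etag, this update only succeeds
# if the dataset was not modified on the server in the meantime.
dataset = client.update_dataset(dataset, ["description"])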
- update_model(model: google.cloud.bigquery.model.Model, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.model.Model [source]¶
[Beta] Change some fields of a model.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in model, the field value will be deleted.
For example, to update the descriptive properties of the model, specify them in the fields argument:

bigquery_client.update_model(
    model, ["description", "friendly_name"]
)
If model.etag is not None, the update will only succeed if the model on the server has the same ETag. Thus reading a model with get_model, changing its fields, and then passing it to update_model will ensure that the changes will only be saved if no modifications to the model occurred since the read.
- Parameters
model (google.cloud.bigquery.model.Model) – The model to update.
fields (Sequence[str]) – The properties of model to change. These are strings corresponding to the properties of Model.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The model resource returned from the API call.
- Return type
- update_routine(routine: google.cloud.bigquery.routine.routine.Routine, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.routine.routine.Routine [source]¶
[Beta] Change some fields of a routine.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in routine, the field value will be deleted.
For example, to update the description property of the routine, specify it in the fields argument:

bigquery_client.update_routine(
    routine, ["description"]
)
Warning
During beta, partial updates are not supported. You must provide all fields in the resource.
If etag is not None, the update will only succeed if the resource on the server has the same ETag. Thus reading a routine with get_routine(), changing its fields, and then passing it to this method will ensure that the changes will only be saved if no modifications to the resource occurred since the read.
- Parameters
routine (google.cloud.bigquery.routine.Routine) – The routine to update.
fields (Sequence[str]) – The fields of routine to change, spelled as the Routine properties.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The routine resource returned from the API call.
- Return type
- update_table(table: google.cloud.bigquery.table.Table, fields: typing.Sequence[str], retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.table.Table [source]¶
Change some fields of a table.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in table, the field value will be deleted.
For example, to update the descriptive properties of the table, specify them in the fields argument:

bigquery_client.update_table(
    table, ["description", "friendly_name"]
)
If table.etag is not None, the update will only succeed if the table on the server has the same ETag. Thus reading a table with get_table, changing its fields, and then passing it to update_table will ensure that the changes will only be saved if no modifications to the table occurred since the read.
- Parameters
table (google.cloud.bigquery.table.Table) – The table to update.
fields (Sequence[str]) – The fields of table to change, spelled as the Table properties.
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
The table resource returned from the API call.
- Return type
Job¶
Define API Jobs.
- class google.cloud.bigquery.job.Compression(value)[source]¶
The compression type to use for exported files. The default value is NONE.
DEFLATE and SNAPPY are only supported for Avro.
- DEFLATE = 'DEFLATE'¶
Specifies DEFLATE format.
- GZIP = 'GZIP'¶
Specifies GZIP format.
- NONE = 'NONE'¶
Specifies no compression.
- SNAPPY = 'SNAPPY'¶
Specifies SNAPPY format.
- ZSTD = 'ZSTD'¶
Specifies ZSTD format.
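For instance, a sketch of choosing a compression type on an extract configuration (SNAPPY is only valid together with Avro):

from google.cloud import bigquery

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.AVRO,
    compression=bigquery.Compression.SNAPPY,
)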
- class google.cloud.bigquery.job.CopyJob(job_id, sources, destination, client, job_config=None)[source]¶
Asynchronous job: copy data into a table from other tables.
- Parameters
job_id (str) – the job's ID, within the project belonging to client.
sources (List[google.cloud.bigquery.table.TableReference]) – Table(s) from which data is to be loaded.
destination (google.cloud.bigquery.table.TableReference) – Table into which data is to be loaded.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the copy job.
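Copy jobs are normally created via Client.copy_table() rather than constructed directly; a minimal sketch (table IDs are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
copy_job = client.copy_table(
    "my_project.my_dataset.source_table",
    "my_project.my_dataset.destination_table",
)
copy_job.result()  # Wait for the copy to complete.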
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
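A sketch, reusing the copy_job from the example above:

def on_done(future):
    # Runs on a helper thread once the job reaches a terminal state.
    print("job finished:", future.job_id)

copy_job.add_done_callback(on_done)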
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property configuration: google.cloud.bigquery.job.copy_.CopyJobConfig¶
The configuration for this copy job.
- property create_disposition¶
See
google.cloud.bigquery.job.CopyJobConfig.create_disposition
.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination¶
Table into which data is to be loaded.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
See google.cloud.bigquery.job.CopyJobConfig.destination_encryption_configuration.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- classmethod from_api_repr(resource, client)[source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
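A sketch of handling both failure modes, reusing the copy_job from the earlier example:

import concurrent.futures
from google.api_core import exceptions

try:
    copy_job.result(timeout=300)
except concurrent.futures.TimeoutError:
    print("job did not complete within the timeout")
except exceptions.GoogleAPICallError as exc:
    print("job failed:", exc)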
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property sources¶
Table(s) from which data is to be loaded.
- Type
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.CopyJobConfig.write_disposition
.
- class google.cloud.bigquery.job.CopyJobConfig(**kwargs)[source]¶
Configuration options for copy jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property create_disposition¶
Specifies behavior for creating tables.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
- property destination_expiration_time: str¶
The time when the destination table expires. Expired tables will be deleted and their storage reclaimed.
- Type
google.cloud.bigquery.job.DestinationExpirationTime
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example:

job_config = bigquery.QueryJobConfig(job_timeout_ms=5000)
# or
job_config.job_timeout_ms = 5000

- Raises
ValueError – If the value type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If the value type is invalid.
- Type
- to_api_repr() dict ¶
Build an API representation of the job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property write_disposition¶
Action that occurs if the destination table already exists.
- class google.cloud.bigquery.job.CreateDisposition[source]¶
Specifies whether the job is allowed to create new tables. The default value is CREATE_IF_NEEDED.
Creation, truncation and append actions occur as one atomic update upon job completion.
- CREATE_IF_NEEDED = 'CREATE_IF_NEEDED'¶
If the table does not exist, BigQuery creates the table.
- CREATE_NEVER = 'CREATE_NEVER'¶
The table must already exist. If it does not, a ‘notFound’ error is returned in the job result.
- class google.cloud.bigquery.job.DestinationFormat[source]¶
The exported file format. The default value is CSV.
Tables with nested or repeated fields cannot be exported as CSV.
- AVRO = 'AVRO'¶
Specifies Avro format.
- CSV = 'CSV'¶
Specifies CSV format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
- class google.cloud.bigquery.job.DmlStats(inserted_row_count: int = 0, deleted_row_count: int = 0, updated_row_count: int = 0)[source]¶
Detailed statistics for DML statements.
https://cloud.google.com/bigquery/docs/reference/rest/v2/DmlStats
Create new instance of DmlStats(inserted_row_count, deleted_row_count, updated_row_count)
- count(value, /)¶
Return number of occurrences of value.
- deleted_row_count: int¶
Number of deleted rows, populated by DML DELETE, MERGE and TRUNCATE statements.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- class google.cloud.bigquery.job.Encoding[source]¶
The character encoding of the data. The default is UTF_8.
BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.
- ISO_8859_1 = 'ISO-8859-1'¶
Specifies ISO-8859-1 encoding.
- UTF_8 = 'UTF-8'¶
Specifies UTF-8 encoding.
- class google.cloud.bigquery.job.ExtractJob(job_id, source, destination_uris, client, job_config=None)[source]¶
Asynchronous job: extract data from a table into Cloud Storage.
- Parameters
job_id (str) – the job’s ID.
source (Union[ google.cloud.bigquery.table.TableReference, google.cloud.bigquery.model.ModelReference ]) – Table or Model from which data is to be loaded or extracted.
destination_uris (List[str]) – URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration.
job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the extract job.
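Extract jobs are normally created via Client.extract_table() rather than constructed directly; a minimal sketch (the bucket name is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
extract_job = client.extract_table(
    "bigquery-public-data.samples.shakespeare",
    "gs://my_bucket/shakespeare-*.csv",
)
extract_job.result()  # Wait for the export to complete.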
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property compression¶
- property configuration: google.cloud.bigquery.job.extract.ExtractJobConfig¶
The configuration for this extract job.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination_format¶
See
google.cloud.bigquery.job.ExtractJobConfig.destination_format
.
- property destination_uri_file_counts¶
Return file counts from job statistics, if present.
- Returns
A list of integer counts, each representing the number of files per destination URI or URI pattern specified in the extract configuration. These values will be in the same order as the URIs specified in the ‘destinationUris’ field. Returns None if job is not yet complete.
- Return type
List[int]
- property destination_uris¶
URIs describing where the extracted data will be written in Cloud Storage, using the format gs://<bucket_name>/<object_name_or_glob>.
- Type
List[str]
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- property field_delimiter¶
See
google.cloud.bigquery.job.ExtractJobConfig.field_delimiter
.
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.extract.ExtractJob [source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property print_header¶
See
google.cloud.bigquery.job.ExtractJobConfig.print_header
.
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property source¶
Table or Model from which data is to be loaded or extracted.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- class google.cloud.bigquery.job.ExtractJobConfig(**kwargs)[source]¶
Configuration options for extract jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property compression¶
Compression type to use for exported files.
- property destination_format¶
Exported file format.
- property field_delimiter¶
Delimiter to use between fields in the exported data.
- Type
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example:

job_config = bigquery.QueryJobConfig(job_timeout_ms=5000)
# or
job_config.job_timeout_ms = 5000

- Raises
ValueError – If the value type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If the value type is invalid.
- Type
- property print_header¶
Print a header row in the exported data.
- Type
- class google.cloud.bigquery.job.LoadJob(job_id, source_uris, destination, client, job_config=None)[source]¶
Asynchronous job for loading data into a table.
Can load from Google Cloud Storage URIs or from a file.
- Parameters
job_id (str) – the job’s ID
source_uris (Optional[Sequence[str]]) – URIs of one or more data files to be loaded. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.source_uris for supported URI formats. Pass None for jobs that load from a file.
destination (google.cloud.bigquery.table.TableReference) – reference to table into which data is to be loaded.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- property allow_jagged_rows¶
See
google.cloud.bigquery.job.LoadJobConfig.allow_jagged_rows
.
- property allow_quoted_newlines¶
See
google.cloud.bigquery.job.LoadJobConfig.allow_quoted_newlines
.
- property autodetect¶
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
- Returns
False
- Return type
- property clustering_fields¶
See
google.cloud.bigquery.job.LoadJobConfig.clustering_fields
.
- property configuration: google.cloud.bigquery.job.load.LoadJobConfig¶
The configuration for this load job.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
See google.cloud.bigquery.job.LoadJobConfig.connection_properties.
New in version 3.7.0.
- property create_disposition¶
See
google.cloud.bigquery.job.LoadJobConfig.create_disposition
.
- property create_session: Optional[bool]¶
See google.cloud.bigquery.job.LoadJobConfig.create_session.
New in version 3.7.0.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property destination¶
Table where loaded rows are written.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
See google.cloud.bigquery.job.LoadJobConfig.destination_encryption_configuration.
- property destination_table_description¶
Optional[str] description given to the destination table.
- property destination_table_friendly_name¶
Optional[str] name given to destination table.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property encoding¶
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the result() method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Boolean indicating existence of the job.
- Return type
- property field_delimiter¶
See
google.cloud.bigquery.job.LoadJobConfig.field_delimiter
.
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.load.LoadJob [source]¶
Factory: construct a job given its API representation
Note
This method assumes that the project found in the resource matches the client’s project.
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from resource.
- Return type
- property ignore_unknown_values¶
See
google.cloud.bigquery.job.LoadJobConfig.ignore_unknown_values
.
- property input_file_bytes¶
Count of bytes loaded from source files.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- Raises
ValueError – for invalid value types.
- property input_files¶
Count of source files.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property max_bad_records¶
See
google.cloud.bigquery.job.LoadJobConfig.max_bad_records
.
- property null_marker¶
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property output_bytes¶
Count of bytes saved to destination table.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property output_rows¶
Count of rows saved to destination table.
- Returns
the count (None until set from the server).
- Return type
Optional[int]
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- property quote_character¶
See
google.cloud.bigquery.job.LoadJobConfig.quote_character
.
- property range_partitioning¶
See
google.cloud.bigquery.job.LoadJobConfig.range_partitioning
.
- property reference_file_schema_uri¶
See google.cloud.bigquery.job.LoadJobConfig.reference_file_schema_uri.
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property schema¶
- property schema_update_options¶
See
google.cloud.bigquery.job.LoadJobConfig.schema_update_options
.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property skip_leading_rows¶
See
google.cloud.bigquery.job.LoadJobConfig.skip_leading_rows
.
- property source_format¶
- property source_uris¶
URIs of data files to be loaded. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.source_uris for supported URI formats. None for jobs that load from a file.
- Type
Optional[Sequence[str]]
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property time_partitioning¶
See
google.cloud.bigquery.job.LoadJobConfig.time_partitioning
.
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the google.cloud.bigquery.client.Client.list_jobs() method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property use_avro_logical_types¶
See
google.cloud.bigquery.job.LoadJobConfig.use_avro_logical_types
.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.LoadJobConfig.write_disposition
.
- class google.cloud.bigquery.job.LoadJobConfig(**kwargs)[source]¶
Configuration options for load jobs.
Set properties on the constructed configuration by using the property name as the name of a keyword argument. Values which are unset or None use the BigQuery REST API default values. See the BigQuery REST API reference documentation for a list of default values.
Required options differ based on the source_format value. For example, the BigQuery API's default value for source_format is "CSV". When loading a CSV file, either schema must be set or autodetect must be set to True, as in the sketch below.
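A sketch of both CSV options (the field names are illustrative):

from google.cloud import bigquery

config_with_schema = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[
        bigquery.SchemaField("full_name", "STRING"),
        bigquery.SchemaField("age", "INTEGER"),
    ],
)

config_autodetect = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
)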
- __setattr__(name, value)¶
Override to be able to raise error if an unknown property is being set
- property allow_quoted_newlines¶
Allow quoted data containing newline characters (CSV only).
- Type
Optional[bool]
- property autodetect¶
Automatically infer the schema from a sample of the data.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.autodetect
- Type
Optional[bool]
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to None).
Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Optional[List[str]]
- property column_name_character_map: str¶
Optional[google.cloud.bigquery.job.ColumnNameCharacterMap]: Character map supported for column names in CSV/Parquet loads. Defaults to STRICT and can be overridden by Project Config Service. Using this option with unsupported load formats will result in an error.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
Connection properties.
New in version 3.7.0.
- property create_disposition¶
Specifies behavior for creating tables.
- Type
- property create_session: Optional[bool]¶
[Preview] If True, creates a new session, where session_info will contain a random server generated session id.
If False, runs the load job with an existing session_id passed in connection_properties; otherwise, runs the load job in non-session mode.
New in version 3.7.0.
- property decimal_target_types: Optional[FrozenSet[str]]¶
Possible SQL data types to which the source decimal values are converted.
New in version 2.21.0.
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.
- property encoding¶
The character encoding of the data.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.encoding
- Type
Optional[google.cloud.bigquery.job.Encoding]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from
resource
.- Return type
google.cloud.bigquery.job._JobConfig
- property hive_partitioning¶
[Beta] When set, it configures hive partitioning support.
Note
Experimental. This feature is experimental and might change or have limited support.
- Type
Optional[
HivePartitioningOptions
]
- property ignore_unknown_values¶
Ignore extra values not represented in the table schema.
- Type
Optional[bool]
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example, job_config = bigquery.QueryJobConfig(job_timeout_ms=5000), or job_config.job_timeout_ms = 5000.
- Raises
ValueError – If
value
type is invalid.
- property json_extension¶
The extension to use for writing JSON data to BigQuery. Only supports GeoJSON currently.
- Type
Optional[str]
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If
value
type is invalid.- Type
- property null_marker¶
Represents a null value (CSV only).
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.null_marker
- Type
Optional[str]
- property parquet_options¶
Additional properties to set if sourceFormat is set to PARQUET.
- Type
Optional[google.cloud.bigquery.format_options.ParquetOptions]
- property preserve_ascii_control_characters¶
Preserves the embedded ASCII control characters when sourceFormat is set to CSV.
- Type
Optional[bool]
- property projection_fields: Optional[List[str]]¶
If
google.cloud.bigquery.job.LoadJobConfig.source_format
is set to "DATASTORE_BACKUP", indicates which entity properties to load into BigQuery from a Cloud Datastore backup.
Property names are case sensitive and must be top-level properties. If no properties are specified, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned in the job result.
- Type
Optional[List[str]]
- property quote_character¶
Character used to quote data sections (CSV only).
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.quote
- Type
Optional[str]
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for destination table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not RangePartitioning or None.
- property reference_file_schema_uri¶
Optional[str]: When creating an external table, the user can provide a reference file with the table schema. This is enabled for the following formats:
AVRO, PARQUET, ORC
- property schema¶
Schema of the destination table.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.schema
- Type
Optional[Sequence[Union[
SchemaField
, Mapping[str, Any] ]]]
- property schema_update_options¶
Specifies updates to the destination table schema to allow as a side effect of the load job.
- Type
Optional[List[google.cloud.bigquery.job.SchemaUpdateOption]]
- property source_format¶
File format of the data.
- Type
Optional[google.cloud.bigquery.job.SourceFormat]
- property time_partitioning¶
Specifies time-based partitioning for the destination table.
Only specify at most one of time_partitioning or range_partitioning.
- Type
- to_api_repr() dict ¶
Build an API representation of the job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property use_avro_logical_types¶
For loads of Avro data, governs whether Avro logical types are converted to their corresponding BigQuery types (e.g. TIMESTAMP) rather than raw types (e.g. INTEGER).
- Type
Optional[bool]
- property write_disposition¶
Action that occurs if the destination table already exists.
- Type
- class google.cloud.bigquery.job.OperationType[source]¶
Different operation types supported in table copy job.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#operationtype
- CLONE = 'CLONE'¶
The source table type is TABLE and the destination table type is CLONE.
- COPY = 'COPY'¶
The source and destination table have the same table type.
- OPERATION_TYPE_UNSPECIFIED = 'OPERATION_TYPE_UNSPECIFIED'¶
Unspecified operation type.
- RESTORE = 'RESTORE'¶
The source table type is SNAPSHOT and the destination table type is TABLE.
- SNAPSHOT = 'SNAPSHOT'¶
The source table type is TABLE and the destination table type is SNAPSHOT.
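For illustration, a sketch of requesting a table snapshot through CopyJobConfig, which exposes an operation_type property; the table IDs are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

# SNAPSHOT: the source is a TABLE, the destination becomes a SNAPSHOT.
job_config = bigquery.CopyJobConfig(
    operation_type=bigquery.job.OperationType.SNAPSHOT,
)
copy_job = client.copy_table(
    "your-project.your_dataset.source_table",    # hypothetical
    "your-project.your_dataset.table_snapshot",  # hypothetical
    job_config=job_config,
)
copy_job.result()  # wait for the copy job to finish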
- class google.cloud.bigquery.job.QueryJob(job_id, query, client, job_config=None)[source]¶
Asynchronous job: query tables.
- Parameters
job_id (str) – the job's ID, within the project belonging to client.
query (str) – SQL query string.
client (google.cloud.bigquery.client.Client) – A client which holds credentials and project configuration for the dataset (which requires a project).
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the query job.
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- property allow_large_results¶
See
google.cloud.bigquery.job.QueryJobConfig.allow_large_results
.
- property billing_tier¶
Return billing tier from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.billing_tier
- Returns
Billing tier used by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property cache_hit¶
Return whether or not query results were served from cache.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.cache_hit
- Returns
whether the query results were returned from cache, or None if job is not yet complete.
- Return type
Optional[bool]
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for
google.api_core.future.Future
.- Returns
False
- Return type
- property clustering_fields¶
See
google.cloud.bigquery.job.QueryJobConfig.clustering_fields
.
- property configuration: google.cloud.bigquery.job.query.QueryJobConfig¶
The configuration for this query job.
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
See
google.cloud.bigquery.job.QueryJobConfig.connection_properties
.New in version 2.29.0.
- property create_disposition¶
See
google.cloud.bigquery.job.QueryJobConfig.create_disposition
.
- property create_session: Optional[bool]¶
See
google.cloud.bigquery.job.QueryJobConfig.create_session
.New in version 2.29.0.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property ddl_target_routine¶
Return the DDL target routine, present for CREATE/DROP FUNCTION/PROCEDURE queries.
- Type
- property ddl_target_table¶
Return the DDL target table, present for CREATE/DROP TABLE/VIEW queries.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_target_table
- Type
- property default_dataset¶
See
google.cloud.bigquery.job.QueryJobConfig.default_dataset
.
- property destination¶
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See
google.cloud.bigquery.job.QueryJobConfig.destination_encryption_configuration
.
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property dry_run¶
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property estimated_bytes_processed¶
Return the estimated number of bytes processed by the query.
- Returns
the estimated number of bytes processed by the query, or None if the job is not yet complete.
- Return type
Optional[int]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the
result()
method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- Returns
Boolean indicating existence of the job.
- Return type
- property flatten_results¶
See
google.cloud.bigquery.job.QueryJobConfig.flatten_results
.
- classmethod from_api_repr(resource: dict, client: Client) QueryJob [source]¶
Factory: construct a job given its API representation
- Parameters
resource (Dict) – dataset job representation returned from the API
client (google.cloud.bigquery.client.Client) – Client which holds credentials and project configuration for the dataset.
- Returns
Job parsed from
resource
.- Return type
- property maximum_billing_tier¶
See
google.cloud.bigquery.job.QueryJobConfig.maximum_billing_tier
.
- property maximum_bytes_billed¶
See
google.cloud.bigquery.job.QueryJobConfig.maximum_bytes_billed
.
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property num_dml_affected_rows: Optional[int]¶
Return the number of DML rows affected by the job.
- Returns
number of DML rows affected by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property priority¶
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- property query¶
The query text used in this query job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.query
- Type
- property query_id: Optional[str]¶
[Preview] ID of a completed query.
This ID is auto-generated and not guaranteed to be populated.
- property query_parameters¶
See
google.cloud.bigquery.job.QueryJobConfig.query_parameters
.
- property query_plan¶
Return query plan from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.query_plan
- Returns
mappings describing the query plan, or an empty list if the query has not yet completed.
- Return type
- property range_partitioning¶
See
google.cloud.bigquery.job.QueryJobConfig.range_partitioning
.
- property referenced_tables¶
Return referenced tables from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.referenced_tables
- Returns
mappings describing the tables referenced by the query, or an empty list if the query has not yet completed.
- Return type
List[Dict]
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(page_size: typing.Optional[int] = None, max_results: typing.Optional[int] = None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[typing.Union[float, object]] = <object object>, start_index: typing.Optional[int] = None, job_retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>) Union[RowIterator, google.cloud.bigquery.table._EmptyRowIterator] [source]¶
Start the job and wait for it to complete and get the result.
- Parameters
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored.
max_results (Optional[int]) – The maximum total number of rows from this request.
retry (Optional[google.api_core.retry.Retry]) – How to retry the call that retrieves rows. This only applies to making RPC calls. It isn't used to retry failed jobs. This has a reasonable default that should only be overridden with care. If the job state is DONE, retrying is aborted early even if the results are not available, as this will not change anymore.
timeout (Optional[Union[float, google.api_core.future.polling.PollingFuture._DEFAULT_VALUE]]) – The number of seconds to wait for the underlying HTTP transport before using retry. If None, wait indefinitely unless an error is returned. If unset, only the underlying API calls have their default timeouts, but we still wait indefinitely for the job to finish.
start_index (Optional[int]) – The zero-based index of the starting row to read.
job_retry (Optional[google.api_core.retry.Retry]) – How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing None disables job retry. Not all jobs can be retried. If job_id was provided to the query that created this job, then the job returned by the query will not be retryable, and an exception will be raised if a non-None, non-default job_retry is also provided.
- Returns
Iterator of row data Row-s. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the result set (this is distinct from the total number of rows in the current page: iterator.page.num_items).
If the query is a special query that produces no results, e.g. a DDL query, an _EmptyRowIterator instance is returned.
- Return type
- Raises
google.cloud.exceptions.GoogleAPICallError – If the job failed and retries aren’t successful.
concurrent.futures.TimeoutError – If the job did not complete in the given timeout.
TypeError – If a non-None and non-default job_retry is provided and the job is not retryable.
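A typical usage sketch; the public dataset queried here is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10"
)

rows = query_job.result(page_size=100)  # blocks until the job is done
for row in rows:
    print(row["name"], row["total"])
print("total rows:", rows.total_rows)  # populated from the server response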
- running()¶
True if the operation is currently running.
- property schema: Optional[List[google.cloud.bigquery.schema.SchemaField]]¶
The schema of the results.
Present only for successful dry run of non-legacy SQL queries.
- property schema_update_options¶
See
google.cloud.bigquery.job.QueryJobConfig.schema_update_options
.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property search_stats: Optional[google.cloud.bigquery.job.query.SearchStats]¶
Returns a SearchStats object.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- property statement_type¶
Return statement type from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.statement_type
- Returns
type of statement used by the job, or None if job is not yet complete.
- Return type
Optional[str]
- property table_definitions¶
See
google.cloud.bigquery.job.QueryJobConfig.table_definitions
.
- property time_partitioning¶
See
google.cloud.bigquery.job.QueryJobConfig.time_partitioning
.
- property timeline¶
Return the query execution timeline from job statistics.
- Type
List(TimelineEntry)
- to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None) pyarrow.Table [source]¶
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
- Parameters
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
Possible values of progress_bar_type include:
None
No progress bar.
'tqdm'
Use the tqdm.tqdm() function to print a progress bar to sys.stdout.
'tqdm_notebook'
Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui'
Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the google-cloud-bigquery-storage library.
Reading from a specific partition or snapshot is not currently supported by this method.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
- Returns
A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Raises
ValueError – If the
pyarrow
library cannot be imported.
New in version 1.17.0.
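A short sketch; requires the pyarrow package, and the trivial query is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query("SELECT 1 AS x, 'a' AS y")

# Fetch over the REST API rather than creating a BigQuery Storage client.
arrow_table = query_job.to_arrow(create_bqstorage_client=False)
print(arrow_table.schema)
print(arrow_table.num_rows)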
- to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None, geography_as_object: bool = False, bool_dtype: Optional[Any] = DefaultPandasDTypes.BOOL_DTYPE, int_dtype: Optional[Any] = DefaultPandasDTypes.INT_DTYPE, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = DefaultPandasDTypes.DATE_DTYPE, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = DefaultPandasDTypes.TIME_DTYPE, timestamp_dtype: Optional[Any] = None, range_date_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATE_DTYPE, range_datetime_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE, range_timestamp_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE) pandas.DataFrame [source]¶
Return a pandas DataFrame from a QueryJob
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the fastavro and google-cloud-bigquery-storage libraries.
Reading from a specific partition or snapshot is not currently supported by this method.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names and pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
See to_dataframe() for details.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
geography_as_object (Optional[bool]) –
If True, convert GEOGRAPHY data to shapely geometry objects. If False (default), don't cast geography data to shapely geometry objects.
New in version 2.24.0.
bool_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.BooleanDtype()) to convert BigQuery Boolean type, instead of relying on the default pandas.BooleanDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("bool"). BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
New in version 3.8.0.
int_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Int64Dtype()) to convert BigQuery Integer types, instead of relying on the default pandas.Int64Dtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("int64"). A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
New in version 3.8.0.
float_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Float32Dtype()) to convert BigQuery Float type, instead of relying on the default numpy.dtype("float64"). If you explicitly set the value to None, then the data type will be numpy.dtype("float64"). BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
New in version 3.8.0.
string_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.StringDtype()) to convert BigQuery String type, instead of relying on the default numpy.dtype("object"). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
New in version 3.8.0.
date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.date32())) to convert BigQuery Date type, instead of relying on the default db_dtypes.DateDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type
New in version 3.10.0.
datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us"))) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
New in version 3.10.0.
time_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.time64("us"))) to convert BigQuery Time type, instead of relying on the default db_dtypes.TimeDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
New in version 3.10.0.
timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))) to convert BigQuery Timestamp type, instead of relying on the default numpy.dtype("datetime64[ns, UTC]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns, UTC]") or object if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
New in version 3.10.0.
range_date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [("start", pyarrow.date32()), ("end", pyarrow.date32())] ))
to convert BigQuery RANGE<DATE> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us")), ] ))
to convert BigQuery RANGE<DATETIME> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC")), ] ))
to convert BigQuery RANGE<TIMESTAMP> type, instead of relying on the default object. If you explicitly set the value to None, the data type will be object. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
- Returns
A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Return type
- Raises
ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported. Also if geography_as_object is True, but the shapely library cannot be imported.
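A sketch of overriding the default dtypes; requires pandas (and db-dtypes for the default date/time conversions), and the query is illustrative:

import pandas
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT TRUE AS flag, 42 AS answer, CURRENT_TIMESTAMP() AS ts"
)

df = query_job.to_dataframe(
    create_bqstorage_client=False,     # fall back to the REST API
    bool_dtype=pandas.BooleanDtype(),  # nullable pandas booleans
    int_dtype=pandas.Int64Dtype(),     # nullable pandas integers
)
print(df.dtypes)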
- to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, max_results: Optional[int] = None, geography_column: Optional[str] = None) geopandas.GeoDataFrame [source]¶
Return a GeoPandas GeoDataFrame from a QueryJob
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the fastavro and google-cloud-bigquery-storage libraries.
Reading from a specific partition or snapshot is not currently supported by this method.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names and pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.
See to_dataframe() for details.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
max_results (Optional[int]) –
Maximum number of rows to include in the result. No limit by default.
New in version 2.21.0.
geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a GeoPandas GeoDataFrame. This option can be omitted if there's only one GEOGRAPHY column.
- Returns
A geopandas.GeoDataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
- Return type
- Raises
ValueError – If the geopandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported.
New in version 2.24.0.
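A sketch; requires the geopandas package, and the point-producing query is illustrative:

from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT ST_GEOGPOINT(-122.4194, 37.7749) AS location, 'SF' AS label"
)

# geography_column could be omitted here, since there is only one
# GEOGRAPHY column in the result.
gdf = query_job.to_geodataframe(geography_column="location")
print(gdf)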
- property total_bytes_billed¶
Return total bytes billed from job statistics, if present.
- Returns
Total bytes billed for the job, or None if job is not yet complete.
- Return type
Optional[int]
- property total_bytes_processed¶
Return total bytes processed from job statistics, if present.
- Returns
Total bytes processed by the job, or None if job is not yet complete.
- Return type
Optional[int]
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the
google.cloud.bigquery.client.Client.list_jobs()
method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- property udf_resources¶
- property undeclared_query_parameters¶
Return undeclared query parameters from job statistics, if present.
- Returns
Undeclared parameters, or an empty list if the query has not yet completed.
- Return type
List[Union[ google.cloud.bigquery.query.ArrayQueryParameter, google.cloud.bigquery.query.ScalarQueryParameter, google.cloud.bigquery.query.StructQueryParameter ]]
- property use_legacy_sql¶
See
google.cloud.bigquery.job.QueryJobConfig.use_legacy_sql
.
- property use_query_cache¶
See
google.cloud.bigquery.job.QueryJobConfig.use_query_cache
.
- property user_email¶
E-mail address of user who submitted the job.
- Returns
the e-mail address (None until set from the server).
- Return type
Optional[str]
- property write_disposition¶
See
google.cloud.bigquery.job.QueryJobConfig.write_disposition
.
- class google.cloud.bigquery.job.QueryJobConfig(**kwargs)[source]¶
Configuration options for query jobs.
All properties in this class are optional. Values which are None use the server defaults. Set properties on the constructed configuration by using the property name as the name of a keyword argument.
- __setattr__(name, value)¶
Overridden to raise an error if an unknown property is set.
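For example, a minimal construction sketch; the dataset and table IDs are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    default_dataset="your-project.your_dataset",  # hypothetical dataset ID
    use_query_cache=False,
    labels={"team": "analytics"},                 # hypothetical label
)
# The unqualified table name resolves against default_dataset.
query_job = client.query("SELECT COUNT(*) AS n FROM your_table",
                         job_config=job_config)
print(list(query_job.result()))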
- property allow_large_results¶
Allow large query results tables (legacy SQL, only)
- Type
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to None.)
Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Optional[List[str]]
- property connection_properties: List[google.cloud.bigquery.query.ConnectionProperty]¶
Connection properties.
New in version 2.29.0.
- property create_disposition¶
Specifies behavior for creating tables.
- property create_session: Optional[bool]¶
[Preview] If True, creates a new session, where session_info will contain a random server-generated session ID.
If False, runs the query with an existing session_id passed in connection_properties; otherwise, runs the query in non-session mode.
New in version 2.29.0.
- property default_dataset¶
The default dataset to use for unqualified table names in the query, or None if not set.
The default_dataset setter accepts:
a Dataset, or
a DatasetReference, or
a str of the fully-qualified dataset ID in standard SQL format. The value must include a project ID and dataset ID separated by ".". For example: your-project.your_dataset.
- property destination¶
The table where results are written, or None if not set.
The destination setter accepts:
a Table, or
a TableReference, or
a str of the fully-qualified table ID in standard SQL format. The value must include a project ID, dataset ID, and table ID, each separated by ".". For example: your-project.your_dataset.your_table.
Note
Only the table ID is passed to the backend, so any configuration in google.cloud.bigquery.table.Table is discarded.
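A sketch of the string forms both setters accept; the IDs are hypothetical:

from google.cloud import bigquery

job_config = bigquery.QueryJobConfig()

# Unqualified table names in the query resolve against this dataset.
job_config.default_dataset = "your-project.your_dataset"

# Results are written to this table instead of an anonymous one.
job_config.destination = "your-project.your_dataset.results_table"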
- property destination_encryption_configuration¶
Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.
- property dry_run¶
True if this query should be a dry run to estimate costs.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.dry_run
- Type
- property flatten_results¶
Flatten nested/repeated fields in results. (Legacy SQL only)
- Type
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.base._JobConfig ¶
Factory: construct a job configuration given its API representation
- Parameters
resource (Dict) – A job configuration in the same representation as is returned from the API.
- Returns
Configuration parsed from
resource
.- Return type
google.cloud.bigquery.job._JobConfig
- property job_timeout_ms¶
Optional parameter. Job timeout in milliseconds. If this time limit is exceeded, BigQuery might attempt to stop the job. See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
For example, job_config = bigquery.QueryJobConfig(job_timeout_ms=5000), or job_config.job_timeout_ms = 5000.
- Raises
ValueError – If
value
type is invalid.
- property labels¶
Labels for the job.
This method always returns a dict. Once a job has been created on the server, its labels cannot be modified anymore.
- Raises
ValueError – If
value
type is invalid.- Type
- property maximum_billing_tier¶
Deprecated. Changes the billing tier to allow high-compute queries.
- Type
- property priority¶
Priority of the query.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.priority
- property query_parameters¶
list of parameters for parameterized query (empty by default)
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for destination table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not RangePartitioning or None.
- property schema_update_options¶
Specifies updates to the destination table schema to allow as a side effect of the query job.
- Type
- property script_options: google.cloud.bigquery.job.query.ScriptOptions¶
Options controlling the execution of scripts.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#scriptoptions
- property table_definitions¶
Dict[str, google.cloud.bigquery.external_config.ExternalConfig]: Definitions for external tables or
None
if not set.
- property time_partitioning¶
Specifies time-based partitioning for the destination table.
Only specify at most one of time_partitioning or range_partitioning.
- Raises
ValueError – If the value is not TimePartitioning or None.
- Type
- to_api_repr() dict [source]¶
Build an API representation of the query job config.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict
- property udf_resources¶
user defined function resources (empty by default)
- Type
- property use_legacy_sql¶
Use legacy SQL syntax.
- Type
- property use_query_cache¶
Look for the query result in the cache.
- Type
- property write_disposition¶
Action that occurs if the destination table already exists.
- class google.cloud.bigquery.job.QueryPlanEntry[source]¶
QueryPlanEntry represents a single stage of a query execution plan.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#ExplainQueryStage for the underlying API representation within query statistics.
- property compute_ms_avg¶
Milliseconds the average worker spent on CPU-bound processing.
- Type
Optional[int]
- property compute_ms_max¶
Milliseconds the slowest worker spent on CPU-bound processing.
- Type
Optional[int]
- property compute_ratio_avg¶
Ratio of time the average worker spent on CPU-bound processing, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property compute_ratio_max¶
Ratio of time the slowest worker spent on CPU-bound processing, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property end¶
Datetime when the stage ended.
- Type
Optional[Datetime]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.query.QueryPlanEntry [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, object]) – ExplainQueryStage representation returned from the API.
- Returns
Query plan entry parsed from
resource
.- Return type
- property read_ratio_avg¶
Ratio of time the average worker spent reading input, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property read_ratio_max¶
Ratio of time the slowest worker spent reading input, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property shuffle_output_bytes¶
Number of bytes written by this stage to intermediate shuffle.
- Type
Optional[int]
- property shuffle_output_bytes_spilled¶
Number of bytes written by this stage to intermediate shuffle and spilled to disk.
- Type
Optional[int]
- property start¶
Datetime when the stage started.
- Type
Optional[Datetime]
- property steps¶
List of step operations performed by each worker in the stage.
- Type
List(QueryPlanEntryStep)
- property wait_ms_avg¶
Milliseconds the average worker spent waiting to be scheduled.
- Type
Optional[int]
- property wait_ms_max¶
Milliseconds the slowest worker spent waiting to be scheduled.
- Type
Optional[int]
- property wait_ratio_avg¶
Ratio of time the average worker spent waiting to be scheduled, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property wait_ratio_max¶
Ratio of time the slowest worker spent waiting to be scheduled, relative to the longest time spent by any worker in any stage of the overall plan.
- Type
Optional[float]
- property write_ms_avg¶
Milliseconds the average worker spent writing output data.
- Type
Optional[int]
- property write_ms_max¶
Milliseconds the slowest worker spent writing output data.
- Type
Optional[int]
- class google.cloud.bigquery.job.QueryPlanEntryStep(kind, substeps)[source]¶
Map a single step in a query plan entry.
- Parameters
kind (str) – step type.
substeps (List) – names of substeps.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.job.query.QueryPlanEntryStep [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict) – JSON representation of the entry.
- Returns
New instance built from the resource.
- Return type
- class google.cloud.bigquery.job.QueryPriority[source]¶
Specifies a priority for the query. The default value is INTERACTIVE.
- BATCH = 'BATCH'¶
Specifies batch priority.
- INTERACTIVE = 'INTERACTIVE'¶
Specifies interactive priority.
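A sketch of submitting a batch-priority query; the statement is illustrative:

from google.cloud import bigquery

client = bigquery.Client()

# Batch queries are queued and start when idle resources are available.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
query_job = client.query("SELECT 1", job_config=job_config)
query_job.result()  # waits until the batch query has run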
- class google.cloud.bigquery.job.ReservationUsage(name, slot_ms)¶
Job resource usage for a reservation.
Create new instance of ReservationUsage(name, slot_ms)
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- name¶
Reservation name or “unreserved” for on-demand resources usage.
- slot_ms¶
Total slot milliseconds used by the reservation for a particular job.
- class google.cloud.bigquery.job.SchemaUpdateOption[source]¶
Specifies an update to the destination table schema as a side effect of a load job.
- ALLOW_FIELD_ADDITION = 'ALLOW_FIELD_ADDITION'¶
Allow adding a nullable field to the schema.
- ALLOW_FIELD_RELAXATION = 'ALLOW_FIELD_RELAXATION'¶
Allow relaxing a required field in the original schema to nullable.
- class google.cloud.bigquery.job.ScriptOptions(statement_timeout_ms: Optional[int] = None, statement_byte_budget: Optional[int] = None, key_result_statement: Optional[google.cloud.bigquery.enums.KeyResultStatementKind] = None)[source]¶
Options controlling the execution of scripts.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#ScriptOptions
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.job.query.ScriptOptions [source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, Any]) – ScriptOptions representation returned from the API.
- Returns
ScriptOptions sample parsed from
resource
.- Return type
google.cloud.bigquery.ScriptOptions
- property key_result_statement: Optional[google.cloud.bigquery.enums.KeyResultStatementKind]¶
Determines which statement in the script represents the “key result”.
This is used to populate the schema and query results of the script job. Default is
KeyResultStatementKind.LAST
.
- class google.cloud.bigquery.job.ScriptStackFrame(resource)[source]¶
Stack frame showing the line/column/procedure name where the current evaluation happened.
- Parameters
resource (Map[str, Any]) – JSON representation of object.
- class google.cloud.bigquery.job.ScriptStatistics(resource)[source]¶
Statistics for a child job of a script.
- Parameters
resource (Map[str, Any]) – JSON representation of object.
- property evaluation_kind: Optional[str]¶
Indicates the type of child job.
Possible values include STATEMENT and EXPRESSION.
- Type
- property stack_frames: Sequence[google.cloud.bigquery.job.base.ScriptStackFrame]¶
Stack trace where the current evaluation happened.
Shows line/column/procedure name of each frame on the stack at the point where the current evaluation happened.
The leaf frame is first, the primary script is last.
- class google.cloud.bigquery.job.SourceFormat[source]¶
The format of the data files. The default value is CSV.
Note that the set of allowed values for loading data is different than the set used for external data sources (see ExternalSourceFormat).
Specifies Avro format.
- CSV = 'CSV'¶
Specifies CSV format.
- DATASTORE_BACKUP = 'DATASTORE_BACKUP'¶
Specifies Datastore backup format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- ORC = 'ORC'¶
Specifies Orc format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
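For instance, a sketch of loading Parquet files, which carry their own schema; the URI and table ID are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
)
client.load_table_from_uri(
    "gs://example-bucket/data.parquet",       # hypothetical URI
    "your-project.your_dataset.your_table",   # hypothetical table ID
    job_config=job_config,
).result()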
- class google.cloud.bigquery.job.TimelineEntry[source]¶
TimelineEntry represents progress of a query job at a particular point in time.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#querytimelinesample for the underlying API representation within query statistics.
- property active_units¶
Current number of input units being processed by workers, reported as largest value since the last sample.
- Type
Optional[int]
- classmethod from_api_repr(resource)[source]¶
Factory: construct instance from the JSON repr.
- Parameters
resource (Dict[str, object]) – QueryTimelineSample representation returned from the API.
- Returns
Timeline sample parsed from
resource
.- Return type
google.cloud.bigquery.TimelineEntry
- class google.cloud.bigquery.job.TransactionInfo(transaction_id: str)[source]¶
[Alpha] Information of a multi-statement transaction.
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#TransactionInfo
New in version 2.24.0.
Create new instance of TransactionInfo(transaction_id).
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- class google.cloud.bigquery.job.UnknownJob(job_id, client)[source]¶
A job whose type cannot be determined.
- add_done_callback(fn)¶
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
- Parameters
fn (Callable[Future]) – The callback to execute when the operation is complete.
- cancel(client=None, retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
- Returns
Boolean indicating that the cancel request was sent.
- Return type
- cancelled()¶
Check if the job has been cancelled.
This always returns False. It’s not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for
google.api_core.future.Future
.- Returns
False
- Return type
- property configuration: google.cloud.bigquery.job.base._JobConfig¶
Job-type specific configuration.
- property created¶
Datetime at which the job was created.
- Returns
the creation time (None until set from the server).
- Return type
Optional[datetime.datetime]
- done(retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128, reload: bool = True) bool ¶
Checks if the job is complete.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
reload (Optional[bool]) – If True, make an API call to refresh the job state of unfinished jobs before checking. Default True.
- Returns
True if the job is complete, False otherwise.
- Return type
- property ended¶
Datetime at which the job finished.
- Returns
the end time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property error_result¶
Error information about the job as a whole.
- Returns
the error information (None until set from the server).
- Return type
Optional[Mapping]
- property errors¶
Information about individual errors generated by the job.
- Returns
the error information (None until set from the server).
- Return type
Optional[List[Mapping]]
- property etag¶
ETag for the job resource.
- Returns
the ETag (None until set from the server).
- Return type
Optional[str]
- exception(timeout=<object object>)¶
Get the exception from the operation, blocking if necessary.
See the documentation for the
result()
method for details on how this method operates, as both result and this method rely on the exact same polling logic. The only difference is that this method does not accept retry and polling arguments but relies on the default ones instead.
- Parameters
timeout (int) – How long to wait for the operation to complete. If None, wait indefinitely.
- Returns
The operation's error.
- Return type
Optional[google.api_core.GoogleAPICallError]
- exists(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) bool ¶
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- Returns
Boolean indicating existence of the job.
- Return type
- classmethod from_api_repr(resource: dict, client) google.cloud.bigquery.job.base.UnknownJob [source]¶
Construct an UnknownJob from the JSON representation.
- Parameters
resource (Dict) – JSON representation of a job.
client (google.cloud.bigquery.client.Client) – Client connected to BigQuery API.
- Returns
Job corresponding to the resource.
- Return type
- property num_child_jobs¶
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
- Returns
int
- property parent_job_id¶
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
- Returns
parent job id.
- Return type
Optional[str]
- property path¶
URL path for the job’s APIs.
- Returns
the path based on project and job ID.
- Return type
- property project¶
Project bound to the job.
- Returns
the project (derived from the client).
- Return type
- reload(client=None, retry: google.api_core.retry.retry_unary.Retry = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = 128)¶
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
client (Optional[google.cloud.bigquery.client.Client]) – the client to use. If not passed, falls back to the client stored on the current job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using
retry
.
- property reservation_usage¶
Job resource usage breakdown by reservation.
- Returns
Reservation usage stats. Can be empty if not set from the server.
- Return type
- result(retry: typing.Optional[google.api_core.retry.retry_unary.Retry] = <google.api_core.retry.retry_unary.Retry object>, timeout: typing.Optional[float] = None) google.cloud.bigquery.job.base._AsyncJob ¶
Start the job and wait for it to complete and get the result.
- Parameters
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC. If the job state is DONE, retrying is aborted early, as the job will not change anymore.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
This instance.
- Return type
_AsyncJob
- Raises
google.cloud.exceptions.GoogleAPICallError – if the job failed.
concurrent.futures.TimeoutError – if the job did not complete in the given timeout.
- running()¶
True if the operation is currently running.
- property script_statistics: Optional[google.cloud.bigquery.job.base.ScriptStatistics]¶
Statistics for a child job of a script.
- property self_link¶
URL for the job resource.
- Returns
the URL (None until set from the server).
- Return type
Optional[str]
- property session_info: Optional[google.cloud.bigquery.job.base.SessionInfo]¶
[Preview] Information of the session if this job is part of one.
New in version 2.29.0.
- set_exception(exception)¶
Set the Future’s exception.
- set_result(result)¶
Set the Future’s result.
- property started¶
Datetime at which the job was started.
- Returns
the start time (None until set from the server).
- Return type
Optional[datetime.datetime]
- property state¶
Status of the job.
- Returns
the state (None until set from the server).
- Return type
Optional[str]
- to_api_repr()¶
Generate a resource for the job.
- property transaction_info: Optional[google.cloud.bigquery.job.base.TransactionInfo]¶
Information of the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the
google.cloud.bigquery.client.Client.list_jobs()
method with the parent_job parameter to iterate over child jobs.
New in version 2.24.0.
- class google.cloud.bigquery.job.WriteDisposition[source]¶
Specifies the action that occurs if destination table already exists.
The default value is WRITE_APPEND.
Each action is atomic and only occurs if BigQuery is able to complete the job successfully. Creation, truncation and append actions occur as one atomic update upon job completion.
- WRITE_APPEND = 'WRITE_APPEND'¶
If the table already exists, BigQuery appends the data to the table.
- WRITE_EMPTY = 'WRITE_EMPTY'¶
If the table already exists and contains data, a ‘duplicate’ error is returned in the job result.
- WRITE_TRUNCATE = 'WRITE_TRUNCATE'¶
If the table already exists, BigQuery overwrites the table data.
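A sketch of atomically overwriting a table with query results; the table ID is hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination="your-project.your_dataset.results_table",  # hypothetical
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query("SELECT 1 AS x", job_config=job_config).result()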
Dataset¶
Define API Datasets.
- class google.cloud.bigquery.dataset.AccessEntry(role: Optional[str] = None, entity_type: Optional[str] = None, entity_id: Optional[Union[Dict[str, Any], str]] = None)[source]¶
Represents grant of an access role to an entity.
An entry must have exactly one of the allowed google.cloud.bigquery.enums.EntityTypes. If anything other than view, routine, or dataset is set, a role is also required. role is omitted for view, routine, and dataset, because they are always read-only.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets.
- Parameters
role – Role granted to the entity. The following string values are supported: ‘READER’, ‘WRITER’, ‘OWNER’. It may also be None if the entity_type is view, routine, or dataset.
entity_type – Type of entity being granted the role. See google.cloud.bigquery.enums.EntityTypes for supported types.
entity_id – If the entity_type is not ‘view’, ‘routine’, or ‘dataset’, the entity_id is the str ID of the entity being granted the role. If the entity_type is ‘view’ or ‘routine’, the entity_id is a dict representing the view or routine from a different dataset to grant access to, in the following format for views:
{'projectId': string, 'datasetId': string, 'tableId': string}
For routines:
{'projectId': string, 'datasetId': string, 'routineId': string}
If the entity_type is ‘dataset’, the entity_id is a dict that includes a ‘dataset’ field with a dict representing the dataset, and a ‘target_types’ field with a str value of the dataset’s resource type:
{'dataset': {'projectId': string, 'datasetId': string}, 'target_types': 'VIEWS'}
- Raises
ValueError – If a view, routine, or dataset has a role set, or if an entity that is not a view, routine, or dataset does not have a role set.
Examples
>>> entry = AccessEntry('OWNER', 'userByEmail', 'user@example.com')
>>> view = {
...     'projectId': 'my-project',
...     'datasetId': 'my_dataset',
...     'tableId': 'my_table'
... }
>>> entry = AccessEntry(None, 'view', view)
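Access entries take effect when the dataset is updated. A minimal sketch (the project, dataset, and email are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry("READER", "userByEmail", "user@example.com")
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])  # API request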
- property dataset: Optional[google.cloud.bigquery.dataset.DatasetReference]¶
API resource representation of a dataset reference.
- property dataset_target_types: Optional[List[str]]¶
Which resources that the dataset in this entry applies to.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.AccessEntry [source]¶
Factory: construct an access entry given its API representation
- Parameters
resource (Dict[str, object]) – Access entry resource representation returned from the API
- Returns
Access entry parsed from
resource
.- Return type
- Raises
ValueError – If the resource contains keys other than role and a single entity-type key.
- property routine: Optional[google.cloud.bigquery.routine.routine.RoutineReference]¶
API resource representation of a routine reference.
- property view: Optional[google.cloud.bigquery.table.TableReference]¶
API resource representation of a view reference.
- class google.cloud.bigquery.dataset.Dataset(dataset_ref)[source]¶
Datasets are containers for tables.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource-dataset
- Parameters
dataset_ref (Union[google.cloud.bigquery.dataset.DatasetReference, str]) – A pointer to a dataset. If
dataset_ref
is a string, it must include both the project ID and the dataset ID, separated by.
.
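A minimal creation sketch (the project and dataset IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "US"
dataset = client.create_dataset(dataset, timeout=30)  # API request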
- property access_entries¶
Dataset’s access entries.
role
augments the entity type and must be present unless the entity type isview
orroutine
.- Raises
TypeError – If ‘value’ is not a sequence
ValueError – If any item in the sequence is not an
AccessEntry
.
- Type
- property created¶
Datetime at which the dataset was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property default_encryption_configuration¶
Custom encryption configuration for all tables in the dataset.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property default_partition_expiration_ms¶
The default partition expiration for all partitioned tables in the dataset, in milliseconds.
Once this property is set, all newly-created partitioned tables in the dataset will have a
time_partitioning.expiration_ms
property set to this value, and changing the value will only affect new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value.
Setting this property overrides the use of
default_table_expiration_ms
for partitioned tables: only one ofdefault_table_expiration_ms
anddefault_partition_expiration_ms
will be used for any new partitioned table. If you provide an explicittime_partitioning.expiration_ms
when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property.- Type
Optional[int]
- property default_rounding_mode¶
defaultRoundingMode of the dataset as set by the user (defaults to
None
).Set the value to one of
'ROUND_HALF_AWAY_FROM_ZERO'
,'ROUND_HALF_EVEN'
, or'ROUNDING_MODE_UNSPECIFIED'
See default rounding mode in the REST API docs and the guide to updating the default rounding mode.
- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- property default_table_expiration_ms¶
Default expiration time for tables in the dataset (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[int, None]
- property description¶
Description of the dataset as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Optional[str]
- property etag¶
ETag for the dataset resource (
None
until set from the server).- Type
Union[str, None]
- property friendly_name¶
Title of the dataset as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.Dataset [source]¶
Factory: construct a dataset given its API representation
- Parameters
resource (Dict[str, object]) – Dataset resource representation returned from the API
- Returns
Dataset parsed from
resource
.- Return type
- classmethod from_string(full_dataset_id: str) google.cloud.bigquery.dataset.Dataset [source]¶
Construct a dataset from fully-qualified dataset ID.
- Parameters
full_dataset_id (str) – A fully-qualified dataset ID in standard SQL format. Must include both the project ID and the dataset ID, separated by
.
.- Returns
Dataset parsed from
full_dataset_id
.- Return type
Examples
>>> Dataset.from_string('my-project-id.some_dataset')
Dataset(DatasetReference('my-project-id', 'some_dataset'))
- Raises
ValueError – If
full_dataset_id
is not a fully-qualified dataset ID in standard SQL format.
- property full_dataset_id¶
ID for the dataset resource (
None
until set from the server)In the format
project_id:dataset_id
.- Type
Union[str, None]
- property is_case_insensitive¶
True if the dataset and its table names are case-insensitive, otherwise False. By default, this is False, which means the dataset and its table names are case-sensitive. This field does not affect routine references.
- Raises
ValueError – for invalid value types.
- Type
Optional[bool]
- property labels¶
Labels for the dataset.
This method always returns a dict. To change a dataset’s labels, modify the dict, then call
google.cloud.bigquery.client.Client.update_dataset()
. To delete a label, set its value to None before updating.
- Raises
ValueError – for invalid value types.
- Type
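A brief sketch of the label-update pattern described above (all names hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")
labels = dataset.labels
labels["environment"] = "production"  # add or change a label
labels["obsolete"] = None             # mark a label for deletion
dataset.labels = labels
dataset = client.update_dataset(dataset, ["labels"])  # API request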
- property location¶
Location in which the dataset is hosted as set by the user (defaults to
None
).- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- property max_time_travel_hours¶
Defines the time travel window in hours. The value can be from 48 to 168 hours (2 to 7 days), and in multiples of 24 hours (48, 72, 96, 120, 144, 168). The default value is 168 hours if this is not set.
- Type
Optional[int]
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- property modified¶
Datetime at which the dataset was last modified (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property reference¶
A reference to this dataset.
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- property self_link¶
URL for the dataset resource (
None
until set from the server).- Type
Union[str, None]
- property storage_billing_model¶
StorageBillingModel of the dataset as set by the user (defaults to
None
).Set the value to one of
'LOGICAL'
,'PHYSICAL'
, or'STORAGE_BILLING_MODEL_UNSPECIFIED'
. This change takes 24 hours to take effect and you must wait 14 days before you can change the storage billing model again.See storage billing model in REST API docs and updating the storage billing model guide.
- Raises
ValueError – for invalid value types.
- Type
Union[str, None]
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
- class google.cloud.bigquery.dataset.DatasetListItem(resource)[source]¶
A read-only dataset resource from a list operation.
For performance reasons, the BigQuery API only includes some of the dataset properties when listing datasets. Notably,
access_entries
is missing.
For a full list of the properties that the BigQuery API returns, see the REST documentation for datasets.list.
- Parameters
resource (Dict[str, str]) – A dataset-like resource object from a dataset list response. A
datasetReference
property is required.- Raises
ValueError – If
datasetReference
or one of its required members is missing fromresource
.
- property friendly_name¶
Title of the dataset as set by the user (defaults to
None
).- Type
Union[str, None]
- property full_dataset_id¶
ID for the dataset resource (
None
until set from the server)In the format
project_id:dataset_id
.- Type
Union[str, None]
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- property reference¶
A reference to this dataset.
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
- class google.cloud.bigquery.dataset.DatasetReference(project: str, dataset_id: str)[source]¶
DatasetReferences are pointers to datasets.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#datasetreference
- Parameters
project (str) – The ID of the project.
dataset_id (str) – The ID of the dataset.
- Raises
ValueError – If either argument is not of type
str
.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.dataset.DatasetReference [source]¶
Factory: construct a dataset reference given its API representation
- classmethod from_string(dataset_id: str, default_project: Optional[str] = None) google.cloud.bigquery.dataset.DatasetReference [source]¶
Construct a dataset reference from dataset ID string.
- Parameters
dataset_id (str) – A dataset ID in standard SQL format. If default_project is not specified, this must include both the project ID and the dataset ID, separated by ..
default_project (Optional[str]) – The project ID to use when dataset_id does not include a project ID.
- Returns
Dataset reference parsed from
dataset_id
.- Return type
Examples
>>> DatasetReference.from_string('my-project-id.some_dataset') DatasetReference('my-project-id', 'some_dataset')
- Raises
ValueError – If
dataset_id
is not a fully-qualified dataset ID in standard SQL format.
- model(model_id)¶
Constructs a ModelReference.
- Parameters
model_id (str) – the ID of the model.
- Returns
A ModelReference for a model in this dataset.
- Return type
- routine(routine_id)¶
Constructs a RoutineReference.
- Parameters
routine_id (str) – the ID of the routine.
- Returns
A RoutineReference for a routine in this dataset.
- Return type
- table(table_id: str) google.cloud.bigquery.table.TableReference ¶
Constructs a TableReference.
- Parameters
table_id (str) – The ID of the table.
- Returns
A table reference for a table in this dataset.
- Return type
Table¶
Define API Tables.
- class google.cloud.bigquery.table.CloneDefinition(resource: Dict[str, Any])[source]¶
Information about base table and clone time of the clone.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#clonedefinition
- Parameters
resource – Clone definition representation returned from the API.
- class google.cloud.bigquery.table.ColumnReference(referencing_column: str, referenced_column: str)[source]¶
The pair of the foreign key column and primary key column.
- Parameters
referencing_column – The column that composes the foreign key.
referenced_column – The column in the primary key that is referenced by the referencingColumn.
- class google.cloud.bigquery.table.ForeignKey(name: str, referenced_table: google.cloud.bigquery.table.TableReference, column_references: List[google.cloud.bigquery.table.ColumnReference])[source]¶
Represents a foreign key constraint on a table’s columns.
- Parameters
name – Set only if the foreign key constraint is named.
referenced_table – The table that holds the primary key and is referenced by this foreign key.
column_references – The columns that compose the foreign key.
- class google.cloud.bigquery.table.PartitionRange(start=None, end=None, interval=None, _properties=None)[source]¶
Definition of the ranges for range partitioning.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
- Parameters
start (Optional[int]) – Sets the start property.
end (Optional[int]) – Sets the end property.
interval (Optional[int]) – Sets the interval property.
- class google.cloud.bigquery.table.PrimaryKey(columns: List[str])[source]¶
Represents the primary key constraint on a table’s columns.
- Parameters
columns – The columns that compose the primary key constraint.
- class google.cloud.bigquery.table.RangePartitioning(range_=None, field=None, _properties=None)[source]¶
Range-based partitioning configuration for a table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
- Parameters
range_ (Optional[google.cloud.bigquery.table.PartitionRange]) – Sets the google.cloud.bigquery.table.RangePartitioning.range_ property.
field (Optional[str]) – Sets the google.cloud.bigquery.table.RangePartitioning.field property.
_properties (Optional[dict]) – Private. Used to construct object from API resource.
- property field¶
The table is partitioned by this field.
The field must be a top-level
NULLABLE
/REQUIRED
field. The only supported type isINTEGER
/INT64
.- Type
- property range_¶
Defines the ranges for range partitioning.
- Raises
ValueError – If the value is not a
PartitionRange
.- Type
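A minimal sketch of configuring range partitioning on a new table (the table and column names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.my_table",
    schema=[bigquery.SchemaField("zip_code", "INTEGER")],
)
table.range_partitioning = bigquery.RangePartitioning(
    field="zip_code",
    range_=bigquery.PartitionRange(start=0, end=100000, interval=10),
)
table = client.create_table(table)  # API request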
- class google.cloud.bigquery.table.Row(values, field_to_index)[source]¶
A BigQuery row.
Values can be accessed by position (index), by key like a dict, or as properties.
- Parameters
values (Sequence[object]) – The row values.
field_to_index (Dict[str, int]) – A mapping from schema field names to indexes.
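For illustration, the three access modes:
>>> row = Row(('a', 'b'), {'x': 0, 'y': 1})
>>> row[0]       # by position
'a'
>>> row['y']     # by key, like a dict
'b'
>>> row.x        # as a property
'a'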
- get(key: str, default: Optional[Any] = None) Any [source]¶
Return a value for key, with a default value if it does not exist.
- Parameters
key (str) – The key of the column to access.
default (Optional[Any]) – The default value to use if the key does not exist (defaults to None).
- Returns
The value associated with the provided key, or a default value.
- Return type
Examples
When the key exists, the value associated with it is returned.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('x')
'a'
The default value is
None
when the key does not exist.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z')
None
The default value can be overridden with the
default
parameter.
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z', '')
''
>>> Row(('a', 'b'), {'x': 0, 'y': 1}).get('z', default='')
''
- items() Iterable[Tuple[str, Any]] [source]¶
Return items as
(key, value)
pairs.
Examples
>>> list(Row(('a', 'b'), {'x': 0, 'y': 1}).items())
[('x', 'a'), ('y', 'b')]
- class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]¶
A class for iterating through HTTP/JSON API row list responses.
- Parameters
client (Optional[google.cloud.bigquery.Client]) – The API client instance. This should always be non-None, except for subclasses that do not use it, namely the
_EmptyRowIterator
.
api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.
path (str) – The method path to query for the list of items.
schema (Sequence[Union[
SchemaField
, Mapping[str, Any] ]]) – The table’s schema. If any item is a mapping, its content must be compatible withfrom_api_repr()
.
page_token (str) – A token identifying a page in a result set to start fetching results from.
max_results (Optional[int]) – The maximum number of results to fetch.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
extra_params (Optional[Dict[str, object]]) – Extra query string parameters for the API call.
table (Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]) – The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.
selected_fields (Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]) – A subset of columns to select from this table.
total_rows (Optional[int]) – Total number of rows in the table.
first_page_response (Optional[dict]) – API response for the first page of results. These are returned when the first page is requested.
- __iter__()¶
Iterator for each item returned.
- Returns
A generator of items from the API.
- Return type
types.GeneratorType[Any]
- Raises
ValueError – If the iterator has already been started.
- client¶
The client that created this iterator.
- Type
Optional[Any]
- item_to_value¶
Callable to convert an item from the type in the raw API response into the native object. Will be called with the iterator and a single item.
- Type
Callable[Iterator, Any]
- property job_id: Optional[str]¶
ID of the query job (if applicable).
To get the job metadata, call
job = client.get_job(rows.job_id, location=rows.location)
.
- next_page_token¶
The token for the next page of results. If this is set before the iterator starts, it effectively offsets the iterator to a specific starting point.
- Type
- property num_dml_affected_rows: Optional[int]¶
If this RowIterator is the result of a DML query, the number of rows that were affected.
- property pages¶
Iterator of pages in the response.
- Returns
A generator of page instances.
- Return type
types.GeneratorType[google.api_core.page_iterator.Page]
- Raises
ValueError – If the iterator has already been started.
- property query_id: Optional[str]¶
[Preview] ID of a completed query.
This ID is auto-generated and not guaranteed to be populated.
- property schema¶
The subset of columns to be read from the table.
- Type
- to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True) pyarrow.Table [source]¶
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
- Parameters
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.New in version 1.24.0.
- Returns
A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.
- Raises
ValueError – If the
pyarrow
library cannot be imported.
New in version 1.17.0.
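An illustrative sketch (the query and table names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, age FROM `my-project.my_dataset.people`"
).result()
arrow_table = rows.to_arrow(create_bqstorage_client=False)  # plain REST download
print(arrow_table.schema)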
- to_arrow_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, max_queue_size: int = <object object>) Iterator[pyarrow.RecordBatch] [source]¶
[Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
pyarrow
andgoogle-cloud-bigquery-storage
libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If
max_queue_size
isNone
, the queue size is infinite.
- Returns
A generator of
RecordBatch
.- Return type
pyarrow.RecordBatch
New in version 2.31.0.
- to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_as_object: bool = False, bool_dtype: Optional[Any] = DefaultPandasDTypes.BOOL_DTYPE, int_dtype: Optional[Any] = DefaultPandasDTypes.INT_DTYPE, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = DefaultPandasDTypes.DATE_DTYPE, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = DefaultPandasDTypes.TIME_DTYPE, timestamp_dtype: Optional[Any] = None, range_date_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATE_DTYPE, range_datetime_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE, range_timestamp_dtype: Optional[Any] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE) pandas.DataFrame [source]¶
Create a pandas DataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.New in version 1.24.0.
geography_as_object (Optional[bool]) –
If
True
, convert GEOGRAPHY data toshapely
geometry objects. IfFalse
(default), don’t cast geography data toshapely
geometry objects.New in version 2.24.0.
bool_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.BooleanDtype()
) to convert BigQuery Boolean type, instead of relying on the defaultpandas.BooleanDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("bool")
. BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
New in version 3.8.0.
int_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.Int64Dtype()
) to convert BigQuery Integer types, instead of relying on the defaultpandas.Int64Dtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("int64")
. A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
New in version 3.8.0.
float_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.Float32Dtype()
) to convert BigQuery Float type, instead of relying on the defaultnumpy.dtype("float64")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("float64")
. BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
New in version 3.8.0.
string_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.StringDtype()
) to convert BigQuery String type, instead of relying on the defaultnumpy.dtype("object")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("object")
. BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
New in version 3.8.0.
date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.date32())
) to convert BigQuery Date type, instead of relying on the defaultdb_dtypes.DateDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns]")
orobject
if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type
New in version 3.10.0.
datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.timestamp("us"))
) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns]")
orobject
if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
New in version 3.10.0.
time_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.time64("us"))
) to convert BigQuery Time type, instead of relying on the defaultdb_dtypes.TimeDtype()
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("object")
. BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
New in version 3.10.0.
timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g.
pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))
) to convert BigQuery Timestamp type, instead of relying on the defaultnumpy.dtype("datetime64[ns, UTC]")
. If you explicitly set the value toNone
, then the data type will benumpy.dtype("datetime64[ns, UTC]")
orobject
if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
New in version 3.10.0.
range_date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [("start", pyarrow.date32()), ("end", pyarrow.date32())] ))
to convert BigQuery RANGE<DATE> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us")), ] ))
to convert BigQuery RANGE<DATETIME> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
range_timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype, such as:
pandas.ArrowDtype(pyarrow.struct( [ ("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC")), ] ))
to convert BigQuery RANGE<TIMESTAMP> type, instead of relying on the default
object
. If you explicitly set the value toNone
, the data type will beobject
. BigQuery Range type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#range_type
New in version 3.21.0.
- Returns
A
DataFrame
populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.- Return type
- Raises
ValueError – If the
pandas
library cannot be imported, or thegoogle.cloud.bigquery_storage_v1
module is required but cannot be imported. Also if geography_as_object is True, but theshapely
library cannot be imported. Also if bool_dtype, int_dtype, or other dtype parameters are not supported dtypes.
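An illustrative sketch (the query is hypothetical; tqdm is optional):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, age FROM `my-project.my_dataset.people`"
).result()
df = rows.to_dataframe(
    progress_bar_type="tqdm",       # requires the tqdm package
    create_bqstorage_client=False,  # force the REST download path
)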
- to_dataframe_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None, max_queue_size: int = <object object>) pandas.DataFrame [source]¶
Create an iterable of pandas DataFrames, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
google-cloud-bigquery-storage
library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If
max_queue_size
isNone
, the queue size is infinite.New in version 2.14.0.
- Returns
A generator of
DataFrame
.- Return type
- Raises
ValueError – If the
pandas
library cannot be imported.
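An illustrative sketch of streaming a large table chunk by chunk (the table name and per-chunk handler are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows("my-project.my_dataset.big_table")
for df in rows.to_dataframe_iterable():
    handle_chunk(df)  # hypothetical per-chunk handler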
- to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_column: Optional[str] = None) geopandas.GeoDataFrame [source]¶
Create a GeoPandas GeoDataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the
pyarrow
andgoogle-cloud-bigquery-storage
libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the
tqdm
package to use this feature.Possible values of
progress_bar_type
include:
None
No progress bar.
'tqdm'
Use the
tqdm.tqdm()
function to print a progress bar tosys.stdout
.'tqdm_notebook'
Use the
tqdm.notebook.tqdm()
function to display a progress bar as a Jupyter notebook widget.'tqdm_gui'
Use the
tqdm.tqdm_gui()
function to display a progress bar as a graphical dialog box.
create_bqstorage_client (Optional[bool]) –
If
True
(default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See thebqstorage_client
parameter for more information.This argument does nothing if
bqstorage_client
is supplied.
geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a geopandas GeoDataFrame. This option can be omitted if there is only one GEOGRAPHY column.
- Returns
A
geopandas.GeoDataFrame
populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.- Return type
- Raises
ValueError – If the
geopandas
library cannot be imported, or thegoogle.cloud.bigquery_storage_v1
module is required but cannot be imported.
New in version 2.24.0.
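An illustrative sketch (the query and column name are hypothetical; geopandas must be installed):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, geom FROM `my-project.my_dataset.places`"
).result()
gdf = rows.to_geodataframe(geography_column="geom")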
- class google.cloud.bigquery.table.SnapshotDefinition(resource: Dict[str, Any])[source]¶
Information about base table and snapshot time of the snapshot.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#snapshotdefinition
- Parameters
resource – Snapshot definition representation returned from the API.
- class google.cloud.bigquery.table.StreamingBuffer(resource)[source]¶
Information about a table’s streaming buffer.
See https://cloud.google.com/bigquery/streaming-data-into-bigquery.
- class google.cloud.bigquery.table.Table(table_ref, schema=None)[source]¶
Tables represent a set of rows whose values correspond to a schema.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource-table
- Parameters
table_ref (Union[google.cloud.bigquery.table.TableReference, str]) – A pointer to a table. If
table_ref
is a string, it must include a project ID, dataset ID, and table ID, each separated by
.
schema (Optional[Sequence[Union[
SchemaField
, Mapping[str, Any] ]]]) – The table’s schema. If any item is a mapping, its content must be compatible withfrom_api_repr()
.
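A minimal sketch of defining and creating a table (the IDs are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER"),
]
table = bigquery.Table("my-project.my_dataset.people", schema=schema)
table = client.create_table(table)  # API request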
- property clone_definition: Optional[google.cloud.bigquery.table.CloneDefinition]¶
Information about the clone. This value is set via clone creation.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#Table.FIELDS.clone_definition
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to
None
).Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Union[List[str], None]
- property created¶
Datetime at which the table was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property description¶
Description of the table (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- property encryption_configuration¶
Custom encryption configuration for the table.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property expires¶
Datetime at which the table will be deleted.
- Raises
ValueError – For invalid value types.
- Type
Union[datetime.datetime, None]
- property external_data_configuration¶
Configuration for an external data source (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[google.cloud.bigquery.ExternalConfig, None]
- property friendly_name¶
Title of the table (defaults to
None
).- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.table.Table [source]¶
Factory: construct a table given its API representation
- Parameters
resource (Dict[str, object]) – Table resource representation from the API
- Returns
Table parsed from
resource
.- Return type
- Raises
KeyError – If the
resource
lacks the key'tableReference'
, or if thedict
stored within the key'tableReference'
lacks the keys'tableId'
,'projectId'
, or'datasetId'
.
- classmethod from_string(full_table_id: str) google.cloud.bigquery.table.Table [source]¶
Construct a table from fully-qualified table ID.
- Parameters
full_table_id (str) – A fully-qualified table ID in standard SQL format. Must include a project ID, dataset ID, and table ID, each separated by
.
.- Returns
Table parsed from
full_table_id
.- Return type
Examples
>>> Table.from_string('my-project.mydataset.mytable')
Table(TableRef...(D...('my-project', 'mydataset'), 'mytable'))
- Raises
ValueError – If
full_table_id
is not a fully-qualified table ID in standard SQL format.
- property full_table_id¶
ID for the table (
None
until set from the server).In the format
project-id:dataset_id.table_id
.- Type
Union[str, None]
- property labels¶
Labels for the table.
This method always returns a dict. To change a table’s labels, modify the dict, then call
Client.update_table
. To delete a label, set its value to None before updating.
- Raises
ValueError – If
value
type is invalid.- Type
- property modified¶
Datetime at which the table was last modified (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property mview_enable_refresh¶
Enable automatic refresh of the materialized view when the base table is updated. The default value is
True
.- Type
Optional[bool]
- property mview_last_refresh_time¶
Datetime at which the materialized view was last refreshed (
None
until set from the server).- Type
Optional[datetime.datetime]
- property mview_query¶
SQL query defining the table as a materialized view (defaults to
None
).- Type
Optional[str]
- property mview_refresh_interval¶
The maximum frequency at which this materialized view will be refreshed. The default value is 1800000 milliseconds (30 minutes).
- Type
Optional[datetime.timedelta]
- property num_bytes¶
The size of the table in bytes (
None
until set from the server).- Type
Union[int, None]
- property num_rows¶
The number of rows in the table (
None
until set from the server).- Type
Union[int, None]
- property partition_expiration¶
Expiration time in milliseconds for a partition.
If
partition_expiration
is set andtype_
is not set,type_
will default toDAY
.- Type
Union[int, None]
- property partitioning_type¶
Time partitioning of the table if it is partitioned (Defaults to
None
).- Type
Union[str, None]
- property range_partitioning¶
Optional[google.cloud.bigquery.table.RangePartitioning]: Configures range-based partitioning for a table.
Note
Beta. The integer range partitioning feature is in a pre-release state and might change or have limited support.
Only specify at most one of
time_partitioning
orrange_partitioning
.- Raises
ValueError – If the value is not
RangePartitioning
orNone
.
- property reference¶
A
TableReference
pointing to this table.- Returns
pointer to this table.
- Return type
- property require_partition_filter¶
If set to true, queries over the partitioned table must specify a partition filter that can be used for partition elimination.
- Type
- property schema¶
Sequence[Union[SchemaField, Mapping[str, Any]]]: Table’s schema.
- Raises
Exception – If schema is not a sequence, or if any item in the sequence is not a SchemaField instance or a compatible mapping representation of the field.
- property self_link¶
URL for the table resource (
None
until set from the server).- Type
Union[str, None]
- property snapshot_definition: Optional[google.cloud.bigquery.table.SnapshotDefinition]¶
Information about the snapshot. This value is set via snapshot creation.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#Table.FIELDS.snapshot_definition
- property streaming_buffer¶
Information about a table’s streaming buffer.
- Type
google.cloud.bigquery.StreamingBuffer
- property table_constraints: Optional[google.cloud.bigquery.table.TableConstraints]¶
Tables Primary Key and Foreign Key information.
- property table_type¶
The type of the table (
None
until set from the server).Possible values are
'TABLE'
,'VIEW'
,'MATERIALIZED_VIEW'
or'EXTERNAL'
.- Type
Union[str, None]
- property time_partitioning¶
Configures time-based partitioning for a table.
Only specify at most one of
time_partitioning
orrange_partitioning
.- Raises
ValueError – If the value is not
TimePartitioning
orNone
.- Type
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- property view_query¶
SQL query defining the table as a view (defaults to
None
).By default, the query is treated as Standard SQL. To use Legacy SQL, set
view_use_legacy_sql
toTrue
.- Raises
ValueError – For invalid value types.
- Type
Union[str, None]
- property view_use_legacy_sql¶
Specifies whether to execute the view with Legacy or Standard SQL.
This boolean specifies whether to execute the view with Legacy SQL (
True
) or Standard SQL (False
). The client side default isFalse
. The server-side default isTrue
. If this table is not a view,None
is returned.- Raises
ValueError – For invalid value types.
- Type
- class google.cloud.bigquery.table.TableConstraints(primary_key: Optional[google.cloud.bigquery.table.PrimaryKey], foreign_keys: Optional[List[google.cloud.bigquery.table.ForeignKey]])[source]¶
The TableConstraints defines the primary key and foreign key.
- Parameters
primary_key – Represents a primary key constraint on a table’s columns. Present only if the table has a primary key. The primary key is not enforced.
foreign_keys – Present only if the table has a foreign key. The foreign key is not enforced.
- class google.cloud.bigquery.table.TableListItem(resource)[source]¶
A read-only table resource from a list operation.
For performance reasons, the BigQuery API only includes some of the table properties when listing tables. Notably,
schema
andnum_rows
are missing.
For a full list of the properties that the BigQuery API returns, see the REST documentation for tables.list.
- Parameters
resource (Dict[str, object]) – A table-like resource object from a table list response. A
tableReference
property is required.- Raises
ValueError – If
tableReference
or one of its required members is missing fromresource
.
- property clustering_fields¶
Fields defining clustering for the table
(Defaults to
None
).Clustering fields are immutable after table creation.
Note
BigQuery supports clustering for both partitioned and non-partitioned tables.
- Type
Union[List[str], None]
- property created¶
Datetime at which the table was created (
None
until set from the server).- Type
Union[datetime.datetime, None]
- property expires¶
Datetime at which the table will be deleted.
- Type
Union[datetime.datetime, None]
- classmethod from_string(full_table_id: str) google.cloud.bigquery.table.TableListItem [source]¶
Construct a table from fully-qualified table ID.
- Parameters
full_table_id (str) – A fully-qualified table ID in standard SQL format. Must include a project ID, dataset ID, and table ID, each separated by
.
.- Returns
Table parsed from
full_table_id
.- Return type
Examples
>>> Table.from_string('my-project.mydataset.mytable')
Table(TableRef...(D...('my-project', 'mydataset'), 'mytable'))
- Raises
ValueError – If
full_table_id
is not a fully-qualified table ID in standard SQL format.
- property full_table_id¶
ID for the table (
None
until set from the server).In the format
project_id:dataset_id.table_id
.- Type
Union[str, None]
- property labels¶
Labels for the table.
This method always returns a dict. To change a table’s labels, modify the dict, then call
Client.update_table
. To delete a label, set its value toNone
before updating.
- property partition_expiration¶
Expiration time in milliseconds for a partition.
If this property is set and
type_
is not set,type_
will default toTimePartitioningType.DAY
.- Type
Union[int, None]
- property partitioning_type¶
Time partitioning of the table if it is partitioned (Defaults to
None
).- Type
Union[str, None]
- property reference¶
A
TableReference
pointing to this table.- Returns
pointer to this table.
- Return type
- property table_type¶
The type of the table (
None
until set from the server).Possible values are
'TABLE'
,'VIEW'
, or'EXTERNAL'
.- Type
Union[str, None]
- property time_partitioning¶
Configures time-based partitioning for a table.
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- property view_use_legacy_sql¶
Specifies whether to execute the view with Legacy or Standard SQL.
This boolean specifies whether to execute the view with Legacy SQL (
True
) or Standard SQL (False
). The client side default isFalse
. The server-side default isTrue
. If this table is not a view,None
is returned.- Raises
ValueError – For invalid value types.
- Type
- class google.cloud.bigquery.table.TableReference(dataset_ref: DatasetReference, table_id: str)[source]¶
TableReferences are pointers to tables.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablereference
- Parameters
dataset_ref – A pointer to the dataset
table_id – The ID of the table
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.table.TableReference [source]¶
Factory: construct a table reference given its API representation
- classmethod from_string(table_id: str, default_project: Optional[str] = None) google.cloud.bigquery.table.TableReference [source]¶
Construct a table reference from table ID string.
- Parameters
table_id (str) – A table ID in standard SQL format. If default_project is not specified, this must include a project ID, dataset ID, and table ID, each separated by ..
default_project (Optional[str]) – The project ID to use when table_id does not include a project ID.
- Returns
Table reference parsed from
table_id
.- Return type
Examples
>>> TableReference.from_string('my-project.mydataset.mytable')
TableRef...(DatasetRef...('my-project', 'mydataset'), 'mytable')
- Raises
ValueError – If
table_id
is not a fully-qualified table ID in standard SQL format.
- to_bqstorage() str [source]¶
Construct a BigQuery Storage API representation of this table.
Install the
google-cloud-bigquery-storage
package to use this feature.If the
table_id
contains a partition identifier (e.g.my_table$201812
) or a snapshot identifier (e.g.mytable@1234567890
), it is ignored. Usegoogle.cloud.bigquery_storage.types.ReadSession.TableReadOptions
to filter rows by partition. Usegoogle.cloud.bigquery_storage.types.ReadSession.TableModifiers
to select a specific snapshot to read from.- Returns
A reference to this table in the BigQuery Storage API.
- Return type
- class google.cloud.bigquery.table.TimePartitioning(type_=None, field=None, expiration_ms=None, require_partition_filter=None)[source]¶
Configures time-based partitioning for a table.
- Parameters
type (Optional[google.cloud.bigquery.table.TimePartitioningType]) –
Specifies the type of time partitioning to perform. Defaults to
DAY
.
Supported values are DAY, HOUR, MONTH, and YEAR.
field (Optional[str]) –
If set, the table is partitioned by this field. If not set, the table is partitioned by pseudo column
_PARTITIONTIME
. The field must be a top-levelTIMESTAMP
,DATETIME
, orDATE
field. Its mode must beNULLABLE
orREQUIRED
.See the time-unit column-partitioned tables guide in the BigQuery documentation.
expiration_ms (Optional[int]) – Number of milliseconds for which to keep the storage for a partition.
require_partition_filter (Optional[bool]) – DEPRECATED: Use
Table.require_partition_filter
, instead.
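A minimal sketch of creating a time-partitioned table (the table and column names are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",                      # partition by this column
    expiration_ms=90 * 24 * 60 * 60 * 1000,  # keep partitions roughly 90 days
)
table = client.create_table(table)  # API request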
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.table.TimePartitioning [source]¶
Return a
TimePartitioning
object deserialized from a dict.This method creates a new
TimePartitioning
instance that points to theapi_repr
parameter as its internal properties dict. This means that when aTimePartitioning
instance is stored as a property of another object, any changes made at the higher level will also appear here:>>> time_partitioning = TimePartitioning() >>> table.time_partitioning = time_partitioning >>> table.time_partitioning.field = 'timecolumn' >>> time_partitioning.field 'timecolumn'
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the TimePartitioning, such as what is output by
to_api_repr()
.- Returns
The
TimePartitioning
object.- Return type
- property require_partition_filter¶
Specifies whether partition filters are required for queries
DEPRECATED: Use
Table.require_partition_filter
, instead.
- Type
- to_api_repr() dict [source]¶
Return a dictionary representing this object.
This method returns the properties dict of the
TimePartitioning
instance rather than making a copy. This means that when aTimePartitioning
instance is stored as a property of another object, any changes made at the higher level will also appear here.- Returns
A dictionary representing the TimePartitioning object in serialized form.
- Return type
- property type_¶
The type of time partitioning to use.
Model¶
Define resources for the BigQuery ML Models API.
- class google.cloud.bigquery.model.Model(model_ref: Optional[Union[google.cloud.bigquery.model.ModelReference, str]])[source]¶
Model represents a machine learning model resource.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models
- Parameters
model_ref – A pointer to a model. If
model_ref
is a string, it must include a project ID, dataset ID, and model ID, each separated by
.
- property best_trial_id: Optional[int]¶
The best trial_id across all training runs.
Deprecated: This property is deprecated.
Read-only.
- property created: Optional[datetime.datetime]¶
Datetime at which the model was created (
None
until set from the server).Read-only.
- property encryption_configuration: Optional[google.cloud.bigquery.encryption_configuration.EncryptionConfiguration]¶
Custom encryption configuration for the model.
Custom encryption configuration (e.g., Cloud KMS keys) or
None
if using default encryption.See protecting data with Cloud KMS keys in the BigQuery documentation.
- property etag: Optional[str]¶
ETag for the model resource (
None
until set from the server).Read-only.
- property expires: Optional[datetime.datetime]¶
The datetime when this model expires.
If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.
- property feature_columns: Sequence[google.cloud.bigquery.standard_sql.StandardSqlField]¶
Input feature columns that were used to train this model.
Read-only.
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.Model [source]¶
Factory: construct a model resource given its API representation
- Parameters
resource – Model resource representation from the API
- Returns
Model parsed from
resource
.
- property label_columns: Sequence[google.cloud.bigquery.standard_sql.StandardSqlField]¶
Label columns that were used to train this model.
The output of the model will have a
predicted_
prefix added to these columns.
Read-only.
- property labels: Dict[str, str]¶
Labels for the model.
This method always returns a dict. To change a model’s labels, modify the dict, then call
Client.update_model
. To delete a label, set its value toNone
before updating.
- property location: Optional[str]¶
The geographic location where the model resides.
This value is inherited from the dataset.
Read-only.
- property modified: Optional[datetime.datetime]¶
Datetime at which the model was last modified (
None
until set from the server).Read-only.
- property reference: Optional[google.cloud.bigquery.model.ModelReference]¶
A model reference pointing to this model.
Read-only.
- to_api_repr() Dict[str, Any] [source]¶
Construct the API resource representation of this model.
- Returns
Model reference represented as an API resource
- property training_runs: Sequence[Dict[str, Any]]¶
Information for all training runs in increasing order of start time.
Dictionaries are in REST API format. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/models#trainingrun
Read-only.
- property transform_columns: Sequence[google.cloud.bigquery.model.TransformColumn]¶
The transform columns that were applied to the input feature columns used to train this model.
See REST API: https://cloud.google.com/bigquery/docs/reference/rest/v2/models#transformcolumn
Read-only.
- class google.cloud.bigquery.model.ModelReference[source]¶
ModelReferences are pointers to models.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models#modelreference
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.ModelReference [source]¶
Factory: construct a model reference given its API representation.
- Parameters
resource – Model reference representation returned from the API
- Returns
Model reference parsed from
resource
.
- classmethod from_string(model_id: str, default_project: Optional[str] = None) google.cloud.bigquery.model.ModelReference [source]¶
Construct a model reference from model ID string.
- Parameters
model_id – A model ID in standard SQL format. If
default_project
is not specified, this must include a project ID, dataset ID, and model ID, each separated by
.
default_project – The project ID to use when
model_id
does not include a project ID.
- Returns
Model reference parsed from
model_id
.- Raises
ValueError – If
model_id
is not a fully-qualified model ID in standard SQL format.
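An illustrative sketch (the IDs are hypothetical):
from google.cloud.bigquery.model import ModelReference

ref = ModelReference.from_string("my-project.my_dataset.my_model")
ref = ModelReference.from_string("my_dataset.my_model", default_project="my-project")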
- class google.cloud.bigquery.model.TransformColumn(resource: Dict[str, Any])[source]¶
TransformColumn represents a transform column feature.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models#transformcolumn
- Parameters
resource – A dictionary representing a transform column feature.
- classmethod from_api_repr(resource: Dict[str, Any]) google.cloud.bigquery.model.TransformColumn [source]¶
Constructs a transform column feature given its API representation
- Parameters
resource – Transform column feature representation from the API
- Returns
Transform column feature parsed from
resource
.
- property type_: Optional[google.cloud.bigquery.standard_sql.StandardSqlDataType]¶
Data type of the column after the transform.
- Returns
Data type of the column.
- Return type
Optional[google.cloud.bigquery.standard_sql.StandardSqlDataType]
Routine¶
User-Defined Routines.
- class google.cloud.bigquery.routine.DeterminismLevel[source]¶
Specifies determinism level for JavaScript user-defined functions (UDFs).
https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#DeterminismLevel
- DETERMINISM_LEVEL_UNSPECIFIED = 'DETERMINISM_LEVEL_UNSPECIFIED'¶
The determinism of the UDF is unspecified.
- DETERMINISTIC = 'DETERMINISTIC'¶
The UDF is deterministic, meaning that 2 function calls with the same inputs always produce the same result, even across 2 query runs.
- NOT_DETERMINISTIC = 'NOT_DETERMINISTIC'¶
The UDF is not deterministic.
- class google.cloud.bigquery.routine.RemoteFunctionOptions(endpoint=None, connection=None, max_batching_rows=None, user_defined_context=None, _properties=None)[source]¶
Configuration options for controlling remote BigQuery functions.
- property connection¶
Fully qualified name of the user-provided connection object which holds the authentication information to send requests to the remote service.
Format is “projects/{projectId}/locations/{locationId}/connections/{connectionId}”
- Type
string
- property endpoint¶
Endpoint of the user-provided remote service
Example: “https://us-east1-my_gcf_project.cloudfunctions.net/remote_add”
- Type
string
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RemoteFunctionOptions [source]¶
Factory: construct remote function options given its API representation.
- property max_batching_rows¶
Max number of rows in each batch sent to the remote service.
If absent or if 0, BigQuery dynamically decides the number of rows in a batch.
- Type
int64
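Example (a minimal sketch; the connection and endpoint values are hypothetical and only illustrate the expected formats):
from google.cloud.bigquery.routine import RemoteFunctionOptions

options = RemoteFunctionOptions(
    endpoint="https://us-east1-my_gcf_project.cloudfunctions.net/remote_add",
    connection="projects/my-project/locations/us-east1/connections/my-connection",
    max_batching_rows=50,  # omit or pass 0 to let BigQuery choose the batch size
)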
- class google.cloud.bigquery.routine.Routine(routine_ref, **kwargs)[source]¶
Resource representing a user-defined routine.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines
- Parameters
routine_ref (Union[str, google.cloud.bigquery.routine.RoutineReference]) – A pointer to a routine. If routine_ref is a string, it must include a project ID, dataset ID, and routine ID, each separated by a dot (.).
**kwargs (Dict) – Initial property values.
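Example (a minimal sketch assuming a hypothetical project and dataset; it creates a SQL scalar function via the Client described earlier):
from google.cloud import bigquery

client = bigquery.Client()
routine = bigquery.Routine(
    "my-project.my_dataset.add_four",
    type_="SCALAR_FUNCTION",
    language="SQL",
    body="x + 4",
    arguments=[
        bigquery.RoutineArgument(
            name="x",
            data_type=bigquery.StandardSqlDataType(
                type_kind=bigquery.StandardSqlTypeNames.INT64
            ),
        )
    ],
)
routine = client.create_routine(routine)  # send the definition to the API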
- property arguments¶
Input/output argument of a function or a stored procedure.
In-place modification is not supported. To set, replace the entire property value with the modified list of RoutineArgument objects.
- property created¶
Datetime at which the routine was created (None until set from the server).
Read-only.
- Type
Optional[datetime.datetime]
- property data_governance_type¶
If set to DATA_MASKING, the function is validated and made available as a masking function.
- Raises
ValueError – If the value is not a string or None.
- Type
Optional[str]
- property determinism_level¶
(experimental) The determinism level of the JavaScript UDF if defined.
- Type
Optional[str]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.Routine [source]¶
Factory: construct a routine given its API representation.
- property imported_libraries¶
The paths of the imported JavaScript libraries.
The language must equal JAVASCRIPT.
Examples
Set the imported_libraries to a list of Google Cloud Storage URIs.
routine = bigquery.Routine("proj.dataset.routine_id")
routine.imported_libraries = [
    "gs://cloud-samples-data/bigquery/udfs/max-value.js",
]
- Type
List[str]
- property modified¶
Datetime at which the routine was last modified (None until set from the server).
Read-only.
- Type
Optional[datetime.datetime]
- property reference¶
Reference describing the ID of this routine.
- property remote_function_options¶
Optional[google.cloud.bigquery.routine.RemoteFunctionOptions]: Configures remote function options for a routine.
- Raises
ValueError – If the value is not RemoteFunctionOptions or None.
- property return_table_type: Optional[google.cloud.bigquery.standard_sql.StandardSqlTableType]¶
The return type of a Table Valued Function (TVF) routine.
New in version 2.22.0.
- property return_type¶
Return type of the routine.
If absent, the return type is inferred from body at query time in each query that references this routine. If present, the evaluated result will be cast to the specified return type at query time.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Routine.FIELDS.return_type
- Type
google.cloud.bigquery.StandardSqlDataType
- property type_¶
The fine-grained type of the routine.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#RoutineType
- Type
str
- class google.cloud.bigquery.routine.RoutineArgument(**kwargs)[source]¶
Input/output argument of a function or a stored procedure.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#argument
- Parameters
**kwargs (Dict) – Initial property values.
- property data_type¶
Type of a variable, e.g., a function argument.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Argument.FIELDS.data_type
- Type
Optional[google.cloud.bigquery.StandardSqlDataType]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RoutineArgument [source]¶
Factory: construct a routine argument given its API representation.
- property kind¶
The kind of argument, for example FIXED_TYPE or ANY_TYPE.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#Argument.FIELDS.argument_kind
- Type
Optional[str]
- class google.cloud.bigquery.routine.RoutineReference[source]¶
A pointer to a routine.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#routinereference
- __str__()[source]¶
String representation of the reference.
This is a fully-qualified ID, including the project ID and dataset ID.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.routine.routine.RoutineReference [source]¶
Factory: construct a routine reference given its API representation.
- classmethod from_string(routine_id: str, default_project: Optional[str] = None) google.cloud.bigquery.routine.routine.RoutineReference [source]¶
Factory: construct a routine reference from routine ID string.
- Parameters
routine_id – A routine ID in standard SQL format. If default_project is not specified, this must include a project ID, dataset ID, and routine ID, each separated by a dot (.).
default_project – The project ID to use when routine_id does not include a project ID.
- Returns
Routine reference parsed from routine_id.
- Return type
google.cloud.bigquery.routine.RoutineReference
- Raises
ValueError – If routine_id is not a fully-qualified routine ID in standard SQL format.
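Example (a minimal sketch with hypothetical IDs):
from google.cloud import bigquery

ref = bigquery.RoutineReference.from_string(
    "my_dataset.my_routine", default_project="my-project"
)
print(str(ref))  # "my-project.my_dataset.my_routine"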
- class google.cloud.bigquery.routine.RoutineType[source]¶
The fine-grained type of the routine.
https://cloud.google.com/bigquery/docs/reference/rest/v2/routines#routinetype
New in version 2.22.0.
Schema¶
Schemas for BigQuery tables / queries.
- class google.cloud.bigquery.schema.FieldElementType(element_type: str)[source]¶
Represents the type of a field element.
- Parameters
element_type (str) – The type of a field element.
- google.cloud.bigquery.schema.LEGACY_TO_STANDARD_TYPES = {'BIGNUMERIC': StandardSqlTypeNames.BIGNUMERIC, 'BOOL': StandardSqlTypeNames.BOOL, 'BOOLEAN': StandardSqlTypeNames.BOOL, 'BYTES': StandardSqlTypeNames.BYTES, 'DATE': StandardSqlTypeNames.DATE, 'DATETIME': StandardSqlTypeNames.DATETIME, 'FLOAT': StandardSqlTypeNames.FLOAT64, 'FLOAT64': StandardSqlTypeNames.FLOAT64, 'GEOGRAPHY': StandardSqlTypeNames.GEOGRAPHY, 'INT64': StandardSqlTypeNames.INT64, 'INTEGER': StandardSqlTypeNames.INT64, 'NUMERIC': StandardSqlTypeNames.NUMERIC, 'RECORD': StandardSqlTypeNames.STRUCT, 'STRING': StandardSqlTypeNames.STRING, 'STRUCT': StandardSqlTypeNames.STRUCT, 'TIME': StandardSqlTypeNames.TIME, 'TIMESTAMP': StandardSqlTypeNames.TIMESTAMP}¶
Mapping from the string names of legacy SQL types to the corresponding Standard SQL type names.
- class google.cloud.bigquery.schema.PolicyTagList(names: Iterable[str] = ())[source]¶
Define Policy Tags for a column.
- Parameters
names (Optional[Tuple[str]]) – List of policy tags to associate with the column. Policy tag identifiers are of the form projects/*/locations/*/taxonomies/*/policyTags/*.
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.schema.PolicyTagList [source]¶
Return a PolicyTagList object deserialized from a dict.
This method creates a new PolicyTagList instance that points to the api_repr parameter as its internal properties dict. This means that when a PolicyTagList instance is stored as a property of another object, any changes made at the higher level will also appear here.
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the PolicyTagList, such as what is output by to_api_repr().
- Returns
The PolicyTagList object or None.
- Return type
Optional[PolicyTagList]
- to_api_repr() dict [source]¶
Return a dictionary representing this object.
This method returns the properties dict of the PolicyTagList instance rather than making a copy. This means that when a PolicyTagList instance is stored as a property of another object, any changes made at the higher level will also appear here.
- Returns
A dictionary representing the PolicyTagList object in serialized form.
- Return type
Dict
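Example (a minimal round trip; the taxonomy and policy-tag resource name is hypothetical):
from google.cloud import bigquery

policy = bigquery.PolicyTagList(
    names=("projects/my-project/locations/us/taxonomies/123/policyTags/456",)
)
api_repr = policy.to_api_repr()  # the internal properties dict, e.g. {"names": [...]}
restored = bigquery.PolicyTagList.from_api_repr(api_repr)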
- class google.cloud.bigquery.schema.SchemaField(name: str, field_type: str, mode: str = 'NULLABLE', default_value_expression: Optional[str] = None, description: Union[str, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, fields: Iterable[google.cloud.bigquery.schema.SchemaField] = (), policy_tags: Union[google.cloud.bigquery.schema.PolicyTagList, None, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, precision: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, scale: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, max_length: Union[int, google.cloud.bigquery.schema._DefaultSentinel] = _DefaultSentinel.DEFAULT_VALUE, range_element_type: Optional[Union[google.cloud.bigquery.schema.FieldElementType, str]] = None)[source]¶
Describe a single field within a table schema.
- Parameters
name – The name of the field.
field_type – The type of the field. See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.type
mode – Defaults to 'NULLABLE'. The mode of the field. See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.mode
description – Description for the field.
fields – Subfields (requires field_type of 'RECORD').
policy_tags – The policy tag list for the field.
precision – Precision (number of digits) of fields with NUMERIC or BIGNUMERIC type.
scale – Scale (digits after decimal) of fields with NUMERIC or BIGNUMERIC type.
max_length – Maximum length of fields with STRING or BYTES type.
default_value_expression – str, Optional. Used to specify the default value of a field using a SQL expression. It can only be set for top-level fields (columns).
You can use a struct or array expression to specify the default value for the entire struct or array. The valid SQL expressions are:
Literals for all data types, including STRUCT and ARRAY.
The following functions: CURRENT_TIMESTAMP, CURRENT_TIME, CURRENT_DATE, CURRENT_DATETIME, GENERATE_UUID, RAND, SESSION_USER, ST_GEOGPOINT.
Struct or array composed with the above allowed functions, for example: [CURRENT_DATE(), DATE '2020-01-01']
range_element_type – FieldElementType, str, Optional. The subtype of the RANGE, if the type of this field is RANGE. If the type is RANGE, this field is required. Possible values for the field element type of a RANGE include DATE, DATETIME and TIMESTAMP.
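Example (a minimal sketch; the table and field names are hypothetical):
from google.cloud import bigquery

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("city", "STRING"),
            bigquery.SchemaField("postal_code", "STRING"),
        ],
    ),
]
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)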
- property default_value_expression¶
Optional[str]: The default value of a field, defined by a SQL expression.
- property field_type¶
The type of the field.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.type
- Type
str
- property fields¶
Subfields contained in this field.
Must be empty or unset if field_type is not 'RECORD'.
- Type
Optional[tuple]
- classmethod from_api_repr(api_repr: dict) google.cloud.bigquery.schema.SchemaField [source]¶
Return a SchemaField object deserialized from a dictionary.
- Parameters
api_repr (Mapping[str, str]) – The serialized representation of the SchemaField, such as what is output by to_api_repr().
- Returns
The SchemaField object.
- Return type
SchemaField
- property mode¶
The mode of the field.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.mode
- Type
Optional[str]
- property policy_tags¶
Policy tag list definition for this field.
- Type
Optional[PolicyTagList]
- property range_element_type¶
The subtype of the RANGE, if the type of this field is RANGE.
Must be set when type is “RANGE”. Must be one of “DATE”, “DATETIME” or “TIMESTAMP”.
- Type
Optional[FieldElementType]
- to_api_repr() dict [source]¶
Return a dictionary representing this schema field.
- Returns
A dictionary representing the SchemaField in a serialized form.
- Return type
Dict
- to_standard_sql() google.cloud.bigquery.standard_sql.StandardSqlField [source]¶
Return the field as the standard SQL field representation object.
Query¶
Retries¶
- google.cloud.bigquery.retry.DEFAULT_GET_JOB_TIMEOUT = 128¶
Default timeout for Client.get_job().
- google.cloud.bigquery.retry.DEFAULT_JOB_RETRY = <google.api_core.retry.retry_unary.Retry object>¶
The default job retry object.
- google.cloud.bigquery.retry.DEFAULT_RETRY = <google.api_core.retry.retry_unary.Retry object>¶
The default retry object.
Any method with a retry parameter will be retried automatically, with reasonable defaults. To disable retry, pass retry=None. To modify the default retry behavior, call a with_XXX method on DEFAULT_RETRY. For example, to change the deadline to 30 seconds, pass retry=bigquery.DEFAULT_RETRY.with_deadline(30).
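Example (a minimal sketch):
from google.cloud import bigquery

client = bigquery.Client()
# Shorten the overall retry deadline to 30 seconds for a single call.
job = client.query("SELECT 1", retry=bigquery.DEFAULT_RETRY.with_deadline(30))
# Disable retries entirely for a call.
job = client.query("SELECT 1", retry=None)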
- google.cloud.bigquery.retry.DEFAULT_TIMEOUT = None¶
The default API timeout.
This is the time to wait per request. To adjust the total wait time, set a deadline on the retry object.
- google.cloud.bigquery.retry.POLLING_DEFAULT_VALUE = <object object>¶
Default value defined in google.api_core.future.polling.PollingFuture.
External Configuration¶
Define classes that describe external data sources.
These are used for both Table.externalDataConfiguration and Job.configuration.query.tableDefinitions.
- class google.cloud.bigquery.external_config.BigtableColumn[source]¶
Options for a Bigtable column.
- property encoding¶
The encoding of the values when the type is not STRING.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.encoding
- Type
str
- property field_name¶
An identifier to use if the qualifier is not a valid BigQuery field identifier.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.field_name
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableColumn [source]¶
Factory: construct a BigtableColumn instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableColumn instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableColumn
- property only_read_latest¶
If this is set, only the latest version of the value in this column is exposed.
- Type
bool
- property qualifier_encoded¶
The qualifier encoded in binary.
The type is str (Python 2.x) or bytes (Python 3.x). The module will handle base64 encoding for you.
- property qualifier_string¶
A valid UTF-8 string qualifier.
- Type
str
- to_api_repr() dict [source]¶
Build an API representation of this object.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict[str, Any]
- property type_¶
The type to convert the value in cells of this column.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumn.FIELDS.type
- Type
str
- class google.cloud.bigquery.external_config.BigtableColumnFamily[source]¶
Options for a Bigtable column family.
- property columns¶
Lists of columns that should be exposed as individual fields.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.columns
- Type
List[BigtableColumn]
- property encoding¶
The encoding of the values when the type is not STRING.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.encoding
- Type
str
- property family_id¶
Identifier of the column family.
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableColumnFamily [source]¶
Factory: construct a BigtableColumnFamily instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableColumnFamily instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableColumnFamily
- property only_read_latest¶
If this is set, only the latest version of the value is exposed for all columns in this column family.
- Type
bool
- to_api_repr() dict [source]¶
Build an API representation of this object.
- Returns
A dictionary in the format used by the BigQuery API.
- Return type
Dict[str, Any]
- property type_¶
The type to convert the value in cells of this column family.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#BigtableColumnFamily.FIELDS.type
- Type
str
- class google.cloud.bigquery.external_config.BigtableOptions[source]¶
Options that describe how to treat Bigtable tables as BigQuery tables.
- property column_families¶
List of column families to expose in the table schema along with their types.
- Type
List[BigtableColumnFamily]
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.BigtableOptions [source]¶
Factory: construct a BigtableOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a BigtableOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
BigtableOptions
- property ignore_unspecified_column_families¶
If True, ignore columns not specified in the column_families list. Defaults to False.
- Type
bool
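Example (a minimal sketch; the column family, qualifier, and field names are hypothetical):
from google.cloud import bigquery

column = bigquery.BigtableColumn()
column.qualifier_string = "temp"
column.field_name = "temperature"
column.type_ = "FLOAT"

family = bigquery.BigtableColumnFamily()
family.family_id = "measurements"
family.columns = [column]

options = bigquery.BigtableOptions()
options.column_families = [family]
options.ignore_unspecified_column_families = True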
- class google.cloud.bigquery.external_config.CSVOptions[source]¶
Options that describe how to treat CSV files as BigQuery tables.
- property allow_jagged_rows¶
If True, BigQuery treats missing trailing columns as null values. Defaults to False.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.allow_jagged_rows
- Type
bool
- property allow_quoted_newlines¶
If True, quoted data sections that contain newline characters in a CSV file are allowed. Defaults to False.
- Type
bool
- property encoding¶
The character encoding of the data.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.encoding
- Type
str
- property field_delimiter¶
The separator for fields in a CSV file. Defaults to comma (‘,’).
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.field_delimiter
- Type
str
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.CSVOptions [source]¶
Factory: construct a CSVOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a CSVOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
CSVOptions
- property preserve_ascii_control_characters¶
Indicates whether the embedded ASCII control characters (the first 32 characters in the ASCII table, from '\x00' to '\x1F') are preserved.
- Type
bool
- property quote_character¶
The value that is used to quote data sections in a CSV file.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.quote
- Type
str
- property skip_leading_rows¶
The number of rows at the top of a CSV file that BigQuery will skip when reading the data.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#CsvOptions.FIELDS.skip_leading_rows
- Type
int
- class google.cloud.bigquery.external_config.ExternalConfig(source_format)[source]¶
Description of an external data source.
- Parameters
source_format (ExternalSourceFormat) – See source_format.
- property avro_options: Optional[google.cloud.bigquery.format_options.AvroOptions]¶
Additional properties to set if sourceFormat is set to AVRO.
- property bigtable_options: Optional[google.cloud.bigquery.external_config.BigtableOptions]¶
Additional properties to set if sourceFormat is set to BIGTABLE.
- property compression¶
The compression type of the data source.
- Type
str
- property connection_id¶
[Experimental] ID of a BigQuery Connection API resource.
Warning
This feature is experimental. Pre-GA features may have limited support, and changes to pre-GA features may not be compatible with other pre-GA versions.
- Type
Optional[str]
- property csv_options: Optional[google.cloud.bigquery.external_config.CSVOptions]¶
Additional properties to set if sourceFormat is set to CSV.
- property decimal_target_types: Optional[FrozenSet[str]]¶
Possible SQL data types to which the source decimal values are converted.
New in version 2.21.0.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.ExternalConfig [source]¶
Factory: construct an ExternalConfig instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of an ExternalConfig instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
ExternalConfig
- property google_sheets_options: Optional[google.cloud.bigquery.external_config.GoogleSheetsOptions]¶
Additional properties to set if sourceFormat is set to GOOGLE_SHEETS.
- property hive_partitioning¶
[Beta] When set, it configures hive partitioning support.
Note
Experimental. This feature is experimental and might change or have limited support.
- Type
Optional[HivePartitioningOptions]
- property ignore_unknown_values¶
If True, extra values that are not represented in the table schema are ignored. Defaults to False.
- Type
bool
- property max_bad_records¶
The maximum number of bad records that BigQuery can ignore when reading data.
- Type
int
- property options: Optional[Union[google.cloud.bigquery.format_options.AvroOptions, google.cloud.bigquery.external_config.BigtableOptions, google.cloud.bigquery.external_config.CSVOptions, google.cloud.bigquery.external_config.GoogleSheetsOptions, google.cloud.bigquery.format_options.ParquetOptions]]¶
Source-specific options.
- property parquet_options: Optional[google.cloud.bigquery.format_options.ParquetOptions]¶
Additional properties to set if sourceFormat is set to PARQUET.
- property reference_file_schema_uri¶
Optional[str]: When creating an external table, the user can provide a reference file with the table schema. This is enabled for the following formats: AVRO, PARQUET, ORC.
- property schema¶
The schema for the data.
- Type
List[SchemaField]
- property source_format¶
ExternalSourceFormat: Format of the external source.
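Example (a minimal sketch; the bucket, table ID, and schema are hypothetical):
from google.cloud import bigquery

client = bigquery.Client()
external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.CSV)
external_config.source_uris = ["gs://my-bucket/data/*.csv"]
external_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INT64"),
]
external_config.options.skip_leading_rows = 1  # skip the header row

table = bigquery.Table("my-project.my_dataset.my_external_table")
table.external_data_configuration = external_config
table = client.create_table(table)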
- class google.cloud.bigquery.external_config.ExternalSourceFormat[source]¶
The format for external data files.
Note that the set of allowed values for external data sources is different from the set used for loading data (see SourceFormat).
Specifies Avro format.
- BIGTABLE = 'BIGTABLE'¶
Specifies Bigtable format.
- CSV = 'CSV'¶
Specifies CSV format.
- DATASTORE_BACKUP = 'DATASTORE_BACKUP'¶
Specifies Datastore backup format.
- GOOGLE_SHEETS = 'GOOGLE_SHEETS'¶
Specifies Google Sheets format.
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
Specifies newline delimited JSON format.
- ORC = 'ORC'¶
Specifies ORC format.
- PARQUET = 'PARQUET'¶
Specifies Parquet format.
- class google.cloud.bigquery.external_config.GoogleSheetsOptions[source]¶
Options that describe how to treat Google Sheets as BigQuery tables.
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.GoogleSheetsOptions [source]¶
Factory: construct a GoogleSheetsOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a GoogleSheetsOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
GoogleSheetsOptions
- property range¶
The range of a sheet that BigQuery will query from.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#GoogleSheetsOptions.FIELDS.range
- Type
str
- property skip_leading_rows¶
The number of rows at the top of a sheet that BigQuery will skip when reading the data.
- Type
int
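Example (a minimal sketch; the spreadsheet URL and range are hypothetical, and the client's credentials must include a Google Drive scope to read Sheets):
from google.cloud import bigquery

external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.GOOGLE_SHEETS)
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/MY_SHEET_ID"]
external_config.options.range = "Sheet1!A1:B20"
external_config.options.skip_leading_rows = 1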
- class google.cloud.bigquery.external_config.HivePartitioningOptions[source]¶
[Beta] Options that configure hive partitioning.
Note
Experimental. This feature is experimental and might change or have limited support.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions
- classmethod from_api_repr(resource: dict) google.cloud.bigquery.external_config.HivePartitioningOptions [source]¶
Factory: construct a HivePartitioningOptions instance given its API representation.
- Parameters
resource (Dict[str, Any]) – Definition of a HivePartitioningOptions instance in the same representation as is returned from the API.
- Returns
Configuration parsed from resource.
- Return type
HivePartitioningOptions
- property mode¶
When set, what mode of hive partitioning to use when reading data.
Two modes are supported: “AUTO” and “STRINGS”.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions.FIELDS.mode
- Type
Optional[str]
- property require_partition_filter¶
If set to true, queries over the partitioned table must specify a partition filter that can be used for partition elimination.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#HivePartitioningOptions.FIELDS.require_partition_filter
- Type
Optional[bool]
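Example (a minimal sketch; the bucket layout is hypothetical, and source_uri_prefix is a property of this class not shown in the excerpt above):
from google.cloud import bigquery

hive_config = bigquery.HivePartitioningOptions()
hive_config.mode = "AUTO"
hive_config.source_uri_prefix = "gs://my-bucket/parquet/"
hive_config.require_partition_filter = True

external_config = bigquery.ExternalConfig(bigquery.ExternalSourceFormat.PARQUET)
external_config.source_uris = ["gs://my-bucket/parquet/*"]
external_config.hive_partitioning = hive_config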
Magics¶
Enums¶
Encryption Configuration¶
Define class for the custom encryption configuration.
- class google.cloud.bigquery.encryption_configuration.EncryptionConfiguration(kms_key_name=None)[source]¶
Custom encryption configuration (e.g., Cloud KMS keys).
- Parameters
kms_key_name (str) – Resource ID of the Cloud KMS key used for encryption.
- classmethod from_api_repr(resource)[source]¶
Construct an encryption configuration from its API representation.
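Example (a minimal sketch; the KMS key resource ID and table ID are hypothetical, and the key must be accessible to the BigQuery service account):
from google.cloud import bigquery

kms_key_name = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
table = bigquery.Table("my-project.my_dataset.my_table")
table.encryption_configuration = bigquery.EncryptionConfiguration(
    kms_key_name=kms_key_name
)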
Additional Types¶
Helper SQL type classes.
Legacy proto-based Types (deprecated)¶
The legacy type classes based on protocol buffers.
Deprecated since version 3.0.0: These types are provided for backward compatibility only and are no longer maintained.