google.cloud.bigquery.client.Client
-
class google.cloud.bigquery.client.Client(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, client_info=None, client_options=None)
Client to bundle configuration needed for API requests.
- Parameters
project (Optional[str]) – Project ID for the project which the client acts on behalf of. Will be passed when creating a dataset / job. If not passed, falls back to the default inferred from the environment.
credentials (Optional[google.auth.credentials.Credentials]) – The OAuth2 Credentials to use for this client. If not passed (and if no _http object is passed), falls back to the default inferred from the environment.
_http (Optional[requests.Session]) – HTTP object to make requests. Can be any object that defines request() with the same interface as requests.Session.request(). If not passed, an _http object is created that is bound to the credentials for the current object. This parameter should be considered private, and could change in the future.
location (Optional[str]) – Default location for jobs / datasets / tables.
default_query_job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Default QueryJobConfig. Will be merged into job configs passed into the query method.
client_info (Optional[google.api_core.client_info.ClientInfo]) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own library or partner tool.
client_options (Optional[Union[google.api_core.client_options.ClientOptions, Dict]]) – Client options used to set user options on the client. API Endpoint should be set through client_options.
- Raises
google.auth.exceptions.DefaultCredentialsError – Raised if credentials is not specified and the library fails to acquire default credentials.
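To picture how default_query_job_config is merged into the config passed to query, here is a minimal sketch with plain dictionaries standing in for QueryJobConfig objects (the keys mirror real QueryJobConfig property names). merge_query_config is a hypothetical helper, and the sketch assumes that a setting given on the per-call config takes precedence over the client-wide default.

```python
from typing import Any, Dict, Optional


def merge_query_config(
    default: Optional[Dict[str, Any]], per_call: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: effective settings for one query() call.

    Assumption: explicit per-call settings override the client-wide default.
    """
    merged: Dict[str, Any] = dict(default or {})  # copy so the default is not mutated
    merged.update(per_call or {})  # per-call values win on conflicting keys
    return merged


client_default = {"use_legacy_sql": False, "maximum_bytes_billed": 10**9}
call_config = {"maximum_bytes_billed": 10**6, "dry_run": True}
effective = merge_query_config(client_default, call_config)
print(effective)
# {'use_legacy_sql': False, 'maximum_bytes_billed': 1000000, 'dry_run': True}
```

With the real client the default is supplied once, for example bigquery.Client(default_query_job_config=bigquery.QueryJobConfig(use_legacy_sql=False)), and, under the same assumption, applies to every query call unless that call sets the property itself.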
-
__init__(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, client_info=None, client_options=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([project, credentials, _http, …]) – Initialize self.
cancel_job(job_id[, project, location, …]) – Attempt to cancel a job from a job ID.
close() – Close the underlying transport objects, releasing system resources.
copy_table(sources, destination[, job_id, …]) – Copy one or more tables to another table.
create_dataset(dataset[, exists_ok, retry, …]) – API call: create the dataset via a POST request.
create_job(job_config[, retry]) – Create a new job.
create_routine(routine[, exists_ok, retry, …]) – [Beta] Create a routine via a POST request.
create_table(table[, exists_ok, retry, timeout]) – API call: create a table via a POST request.
dataset(dataset_id[, project]) – Deprecated: Construct a reference to a dataset.
delete_dataset(dataset[, delete_contents, …]) – Delete a dataset.
delete_model(model[, retry, timeout, …]) – [Beta] Delete a model.
delete_routine(routine[, retry, timeout, …]) – [Beta] Delete a routine.
delete_table(table[, retry, timeout, …]) – Delete a table.
extract_table(source, destination_uris[, …]) – Start a job to extract a table into Cloud Storage files.
from_service_account_json(json_credentials_path, *args, **kwargs) – Factory to retrieve JSON credentials while creating client.
get_dataset(dataset_ref[, retry, timeout]) – Fetch the dataset referenced by dataset_ref.
get_iam_policy(table[, …])
get_job(job_id[, project, location, retry, …]) – Fetch a job for the project associated with this client.
get_model(model_ref[, retry, timeout]) – [Beta] Fetch the model referenced by model_ref.
get_routine(routine_ref[, retry, timeout]) – [Beta] Get the routine referenced by routine_ref.
get_service_account_email([project, retry, …]) – Get the email address of the project’s BigQuery service account.
get_table(table[, retry, timeout]) – Fetch the table referenced by table.
insert_rows(table, rows[, selected_fields]) – Insert rows into a table via the streaming API.
insert_rows_from_dataframe(table, dataframe) – Insert rows into a table from a dataframe via the streaming API.
insert_rows_json(table, json_rows[, …]) – Insert rows into a table without applying local type conversions.
job_from_resource(resource) – Detect the correct job type from a resource and instantiate it.
list_datasets([project, include_all, …]) – List datasets for the project associated with this client.
list_jobs([project, parent_job, …]) – List jobs for the project associated with this client.
list_models(dataset[, max_results, …]) – [Beta] List models in the dataset.
list_partitions(table[, retry, timeout]) – List the partitions in a table.
list_projects([max_results, page_token, …]) – List projects accessible to the current client.
list_routines(dataset[, max_results, …]) – [Beta] List routines in the dataset.
list_rows(table[, selected_fields, …]) – List the rows of the table.
list_tables(dataset[, max_results, …]) – List tables in the dataset.
load_table_from_dataframe(dataframe, destination) – Upload the contents of a table from a pandas DataFrame.
load_table_from_file(file_obj, destination) – Upload the contents of a table from a file-like object.
load_table_from_json(json_rows, destination) – Upload the contents of a table from a JSON string or dict.
load_table_from_uri(source_uris, destination) – Start a job for loading data into a table from Cloud Storage.
query(query[, job_config, job_id, …]) – Run a SQL query.
schema_from_json(file_or_path) – Take a file object or file path that contains JSON describing a table schema.
schema_to_json(schema_list, destination) – Take a list of schema field objects and serialize them as JSON to destination.
set_iam_policy(table, policy[, updateMask, …])
test_iam_permissions(table, permissions[, …])
update_dataset(dataset, fields[, retry, timeout]) – Change some fields of a dataset.
update_model(model, fields[, retry, timeout]) – [Beta] Change some fields of a model.
update_routine(routine, fields[, retry, timeout]) – [Beta] Change some fields of a routine.
update_table(table, fields[, retry, timeout]) – Change some fields of a table.
Attributes
SCOPE – The scopes required for authenticating as a BigQuery consumer.
location – Default location for jobs / datasets / tables.
-
SCOPE = ('https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/cloud-platform')
The scopes required for authenticating as a BigQuery consumer.
-
cancel_job(job_id, project=None, location=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Attempt to cancel a job from a job ID.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
- Parameters
job_id (str) – Unique job identifier.
- Keyword Arguments
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob, ]
-
close()
Close the underlying transport objects, releasing system resources.
Note
The client instance can be used for making additional requests even after closing, in which case the underlying connections are automatically re-created.
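A with-statement is a convenient way to make sure close() runs. contextlib.closing calls close() on any object that has one, which should include bigquery.Client; to keep the sketch runnable without credentials it uses a hypothetical FakeTransportClient whose lazily re-created "connection" imitates the behavior described in the note.

```python
import contextlib
from typing import Optional


class FakeTransportClient:
    """Hypothetical stand-in: opens its "connection" lazily and can reopen it after close()."""

    def __init__(self) -> None:
        self._session: Optional[object] = None  # nothing opened until the first request
        self.sessions_opened = 0

    def _ensure_session(self) -> object:
        if self._session is None:
            self._session = object()  # placeholder for something like a requests.Session
            self.sessions_opened += 1
        return self._session

    def request(self, path: str) -> str:
        self._ensure_session()
        return f"GET {path}"

    def close(self) -> None:
        self._session = None  # release the transport; the next request re-creates it


with contextlib.closing(FakeTransportClient()) as client:
    client.request("/projects")

# close() already ran when the with-block exited, yet the client still works.
print(client.request("/projects"), client.sessions_opened)  # GET /projects 2
```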
-
copy_table(sources, destination, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Copy one or more tables to another table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationtablecopy
- Parameters
sources (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, Sequence[ Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ] ], ]) – Table or tables to be copied.
destination (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – Table into which data is to be copied.
- Keyword Arguments
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of any source table as well as the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new copy job instance.
- Return type
google.cloud.bigquery.job.CopyJob
- Raises
TypeError – If job_config is not an instance of CopyJobConfig class.
-
create_dataset(dataset, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)
API call: create the dataset via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, str, ]) – A Dataset to create. If dataset is a reference, an empty dataset is created with the specified ID and the client’s default location.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the dataset.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Dataset returned from the API.
- Return type
google.cloud.bigquery.dataset.Dataset
- Raises
google.cloud.exceptions.Conflict – If the dataset already exists.
Example
>>> from google.cloud import bigquery
>>> client = bigquery.Client()
>>> dataset = bigquery.Dataset('my_project.my_dataset')
>>> dataset = client.create_dataset(dataset)
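The exists_ok flag turns a create call into a create-if-missing call. The sketch below is hypothetical and in-memory (Conflict and FakeDatasetStore only stand in for google.cloud.exceptions.Conflict and the client); it shows the documented contract: without exists_ok a second create raises, with exists_ok=True the conflict is ignored. Returning the already-existing dataset in that case is an assumption about the real client's behavior.

```python
from typing import Dict


class Conflict(Exception):
    """Stand-in for google.cloud.exceptions.Conflict (HTTP 409)."""


class FakeDatasetStore:
    """Hypothetical in-memory store that mimics the documented exists_ok contract."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dict[str, str]] = {}

    def create_dataset(self, dataset_id: str, exists_ok: bool = False) -> Dict[str, str]:
        if dataset_id in self._datasets:
            if not exists_ok:
                raise Conflict(f"Already Exists: Dataset {dataset_id}")
            # exists_ok=True: swallow the conflict and hand back what is already there.
            return self._datasets[dataset_id]
        self._datasets[dataset_id] = {"dataset_id": dataset_id}
        return self._datasets[dataset_id]


store = FakeDatasetStore()
created = store.create_dataset("my_project.my_dataset")
again = store.create_dataset("my_project.my_dataset", exists_ok=True)
try:
    store.create_dataset("my_project.my_dataset")
except Conflict as exc:
    print("second create without exists_ok failed:", exc)
```

Against the real API the equivalent call is client.create_dataset(dataset, exists_ok=True); create_table and create_routine accept the same flag.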
-
create_job(job_config, retry=<google.api_core.retry.Retry object>)
Create a new job.
- Parameters
job_config (dict) – Job configuration, in the representation returned from the API.
- Keyword Arguments
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
- Returns
A new job instance.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]
-
create_routine(routine, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] Create a routine via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/insert
- Parameters
routine (google.cloud.bigquery.routine.Routine) – A Routine to create. The dataset that the routine belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the routine.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Routine returned from the service.
- Return type
google.cloud.bigquery.routine.Routine
- Raises
google.cloud.exceptions.Conflict – If the routine already exists.
-
create_table(table, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)
API call: create a table via a POST request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – A Table to create. If table is a reference, an empty table is created with the specified ID. The dataset that the table belongs to must already exist.
exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the table.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new Table returned from the service.
- Return type
google.cloud.bigquery.table.Table
- Raises
google.cloud.exceptions.Conflict – If the table already exists.
-
dataset(dataset_id, project=None)
Deprecated: Construct a reference to a dataset.
Deprecated since version 1.24.0: Construct a DatasetReference using its constructor or use a string where previously a reference object was used.
As of google-cloud-bigquery version 1.7.0, all client methods that take a DatasetReference or TableReference also take a string in standard SQL format, e.g. project.dataset_id or project.dataset_id.table_id.
- Parameters
dataset_id (str) – ID of the dataset.
project (Optional[str]) – Project ID for the dataset (defaults to the client’s project).
- Returns
A new DatasetReference instance.
- Return type
google.cloud.bigquery.dataset.DatasetReference
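Because client methods accept strings such as project.dataset_id.table_id, it can help to see how such a string breaks apart. parse_table_id below is a hypothetical, deliberately naive helper (the library's own parsers are DatasetReference.from_string() and TableReference.from_string()); it assumes a missing project should fall back to a default, mirroring the client-project defaults described in this reference.

```python
from typing import NamedTuple, Optional


class TableId(NamedTuple):
    project: str
    dataset_id: str
    table_id: str


def parse_table_id(full_id: str, default_project: Optional[str] = None) -> TableId:
    """Hypothetical parser for 'project.dataset_id.table_id' strings.

    A naive split on '.', so project IDs that themselves contain a dot
    (domain-scoped projects) are not handled.
    """
    parts = full_id.split(".")
    if len(parts) == 3 and all(parts):
        return TableId(parts[0], parts[1], parts[2])
    if len(parts) == 2 and all(parts) and default_project:
        # The string omitted the project, so fall back to the default project.
        return TableId(default_project, parts[0], parts[1])
    raise ValueError(f"expected 'project.dataset_id.table_id', got {full_id!r}")


print(parse_table_id("my_project.my_dataset.my_table"))
print(parse_table_id("my_dataset.my_table", default_project="my_project"))
```

With the client itself no parsing step is needed: client.get_table("my_project.my_dataset.my_table") accepts the string directly.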
-
delete_dataset(dataset, delete_contents=False, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)
Delete a dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/delete
- Parameters
dataset (Union[ google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, str, ]) – A reference to the dataset to delete. If a string is passed in, this method attempts to create a dataset reference from a string using google.cloud.bigquery.dataset.DatasetReference.from_string().
delete_contents (Optional[bool]) – If True, delete all the tables in the dataset. If False and the dataset contains tables, the request will fail. Default is False.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the dataset.
-
delete_model(model, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)
[Beta] Delete a model.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/delete
- Parameters
model (Union[ google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to delete. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the model.
-
delete_routine(routine, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)
[Beta] Delete a routine.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/delete
- Parameters
routine (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to delete. If a string is passed in, this method attempts to create a routine reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the routine.
-
delete_table(table, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)
Delete a table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/delete
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – A reference to the table to delete. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
not_found_ok (Optional[bool]) – Defaults to False. If True, ignore “not found” errors when deleting the table.
-
extract_table(source, destination_uris, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None, source_type='Table')
Start a job to extract a table into Cloud Storage files.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract
- Parameters
source (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, google.cloud.bigquery.model.Model, google.cloud.bigquery.model.ModelReference, str, ]) – Table or Model to be extracted.
destination_uris (Union[str, Sequence[str]]) – URIs of Cloud Storage file(s) into which table data is to be extracted, in the format gs://<bucket_name>/<object_name_or_glob>.
- Keyword Arguments
job_id (Optional[str]) – The ID of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the source table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
source_type (Optional[str]) – Type of source to be extracted: Table or Model. Defaults to Table.
- Returns
A new extract job instance.
- Return type
google.cloud.bigquery.job.ExtractJob
- Raises
TypeError – If job_config is not an instance of ExtractJobConfig class.
ValueError – If source_type is not among Table, Model.
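extract_table accepts either one URI or a sequence of URIs, and only the Table or Model source types. check_extract_args is a hypothetical pre-flight helper that enforces those documented rules locally; the gs://<bucket_name>/<object_name_or_glob> shape check is an extra assumption of this sketch and may be stricter or looser than what the API itself accepts.

```python
from typing import List, Sequence, Union

ALLOWED_SOURCE_TYPES = ("Table", "Model")


def check_extract_args(
    destination_uris: Union[str, Sequence[str]], source_type: str = "Table"
) -> List[str]:
    """Hypothetical pre-flight check mirroring the documented extract_table contract."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"source_type must be one of {ALLOWED_SOURCE_TYPES}, got {source_type!r}")
    # A single URI string is accepted as well as a sequence of URIs.
    uris = [destination_uris] if isinstance(destination_uris, str) else list(destination_uris)
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
        bucket, _, object_name = uri[len("gs://"):].partition("/")
        if not bucket or not object_name:
            raise ValueError(f"expected gs://<bucket_name>/<object_name_or_glob>, got {uri!r}")
    return uris


print(check_extract_args("gs://my-bucket/exports/shard-*.csv"))
print(check_extract_args(["gs://my-bucket/a.json", "gs://my-bucket/b.json"], source_type="Table"))
```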
-
classmethod from_service_account_json(json_credentials_path, *args, **kwargs)
Factory to retrieve JSON credentials while creating client.
- Parameters
json_credentials_path (str) – The path to a private key file (this file was given to you when you created the service account). This file must contain a JSON object with a private key and other credentials information (downloaded from the Google APIs console).
args (tuple) – Remaining positional arguments to pass to constructor.
kwargs – Remaining keyword arguments to pass to constructor.
- Returns
The client created with the retrieved JSON credentials.
- Return type
_ClientFactoryMixin
- Raises
TypeError – if there is a conflict with the kwargs and the credentials created by the factory.
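from_service_account_json reads a key file and builds the credentials itself, which is why passing credentials as well raises TypeError. The sketch below reproduces only those checks with the standard library; load_service_account_info is hypothetical, it does not build real credentials, and checking just client_email and private_key is a simplifying assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict


def load_service_account_info(json_credentials_path: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical sketch of the checks behind from_service_account_json."""
    if "credentials" in kwargs:
        # The factory builds the credentials itself, so a second source conflicts.
        raise TypeError("credentials must not be in keyword arguments")
    with open(json_credentials_path, encoding="utf-8") as fh:
        info = json.load(fh)
    # Assumption: only two of the fields a key file normally carries are checked.
    missing = [key for key in ("client_email", "private_key") if key not in info]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    return info


# Demo with a throwaway file holding dummy values (not a usable key).
with tempfile.TemporaryDirectory() as tmp_dir:
    key_path = os.path.join(tmp_dir, "key.json")
    with open(key_path, "w", encoding="utf-8") as fh:
        json.dump({"client_email": "sa@my-project.iam.gserviceaccount.com", "private_key": "dummy"}, fh)
    info = load_service_account_info(key_path)
    print(info["client_email"])
```

With the library installed, the real call is client = bigquery.Client.from_service_account_json("path/to/key.json").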
-
get_dataset(dataset_ref, retry=<google.api_core.retry.Retry object>, timeout=None)
Fetch the dataset referenced by dataset_ref.
- Parameters
dataset_ref (Union[ google.cloud.bigquery.dataset.DatasetReference, str, ]) – A reference to the dataset to fetch from the BigQuery API. If a string is passed in, this method attempts to create a dataset reference from a string using from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Dataset instance.
- Return type
google.cloud.bigquery.dataset.Dataset
-
get_job(job_id, project=None, location=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Fetch a job for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
- Parameters
job_id (str) – Unique job identifier.
- Keyword Arguments
project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).
location (Optional[str]) – Location where the job was run.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Job instance, based on the resource returned by the API.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]
-
get_model(model_ref, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] Fetch the model referenced by model_ref.
- Parameters
model_ref (Union[ google.cloud.bigquery.model.ModelReference, str, ]) – A reference to the model to fetch from the BigQuery API. If a string is passed in, this method attempts to create a model reference from a string using google.cloud.bigquery.model.ModelReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Model instance.
- Return type
google.cloud.bigquery.model.Model
-
get_routine(routine_ref, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] Get the routine referenced by routine_ref.
- Parameters
routine_ref (Union[ google.cloud.bigquery.routine.Routine, google.cloud.bigquery.routine.RoutineReference, str, ]) – A reference to the routine to fetch from the BigQuery API. If a string is passed in, this method attempts to create a reference from a string using google.cloud.bigquery.routine.RoutineReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Routine instance.
- Return type
google.cloud.bigquery.routine.Routine
-
get_service_account_email(project=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Get the email address of the project’s BigQuery service account.
Note
This is the service account that BigQuery uses to manage tables encrypted by a key in KMS.
- Parameters
project (Optional[str]) – Project ID to use for retrieving service account email. Defaults to the client’s project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Service account email address.
- Return type
str
Example
>>> from google.cloud import bigquery
>>> client = bigquery.Client()
>>> client.get_service_account_email()
my_service_account@my-project.iam.gserviceaccount.com
-
get_table(table, retry=<google.api_core.retry.Retry object>, timeout=None)
Fetch the table referenced by table.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – A reference to the table to fetch from the BigQuery API. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A Table instance.
- Return type
google.cloud.bigquery.table.Table
-
insert_rows(table, rows, selected_fields=None, **kwargs)
Insert rows into a table via the streaming API.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
rows (Union[Sequence[Tuple], Sequence[Dict]]) – Row data to be inserted. If a list of tuples is given, each tuple should contain data for each schema field on the current table and in the same order as the schema fields. If a list of dictionaries is given, the keys must include all required fields in the schema. Keys which do not correspond to a field in the schema are ignored.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
kwargs (Dict) – Keyword arguments to insert_rows_json().
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
ValueError – if table’s schema is not set or rows is not a Sequence.
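insert_rows takes either tuples (ordered like the schema) or dictionaries (keyed by field name), and returns one error mapping per failed row. rows_to_json and failed_row_indexes are hypothetical helpers that sketch both sides in plain Python; the real method also converts values such as datetime objects into JSON-compatible forms and reads field names from the table schema, which this sketch replaces with a plain list of names.

```python
from collections.abc import Mapping as MappingABC
from typing import Any, Dict, List, Mapping, Sequence, Tuple, Union

Row = Union[Tuple[Any, ...], Mapping[str, Any]]


def rows_to_json(rows: Sequence[Row], field_names: Sequence[str]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: turn tuple or dict rows into JSON-style dicts."""
    json_rows: List[Dict[str, Any]] = []
    for row in rows:
        if isinstance(row, MappingABC):
            # Keys that do not match a schema field are ignored.
            json_rows.append({k: v for k, v in row.items() if k in field_names})
        else:
            # Tuple values must follow the order of the schema fields.
            if len(row) != len(field_names):
                raise ValueError(f"expected {len(field_names)} values, got {len(row)}")
            json_rows.append(dict(zip(field_names, row)))
    return json_rows


def failed_row_indexes(errors: Sequence[Mapping[str, Any]]) -> List[int]:
    """Return the "index" of every row listed in an insert_rows error result."""
    return [entry["index"] for entry in errors]


fields = ["full_name", "age"]
rows = [("Ada", 36), {"full_name": "Grace", "age": 85, "nickname": "Amazing"}]
print(rows_to_json(rows, fields))

# Made-up error payload, shaped like the documented return value.
errors = [{"index": 1, "errors": [{"reason": "invalid", "message": "example"}]}]
print(failed_row_indexes(errors))
```

An empty error list means every row was accepted.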
-
insert_rows_from_dataframe(table, dataframe, selected_fields=None, chunk_size=500, **kwargs)
Insert rows into a table from a dataframe via the streaming API.
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str, ]) – The destination table for the row data, or a reference to it.
dataframe (pandas.DataFrame) – A DataFrame containing the data to load. Any NaN values present in the dataframe are omitted from the streaming API request(s).
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. Required if table is a TableReference.
chunk_size (int) – The number of rows to stream in a single chunk. Must be positive.
kwargs (Dict) – Keyword arguments to insert_rows_json().
- Returns
A list with insert errors for each insert chunk. Each element is a list containing one mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Sequence[Mappings]]
- Raises
ValueError – if table’s schema is not set.
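insert_rows_from_dataframe streams the frame in chunks of chunk_size rows and drops NaN values, and its return value holds one error list per chunk. chunk_records is a hypothetical, pandas-free sketch of that batching: it works on a list of row dictionaries, as pandas' DataFrame.to_dict(orient="records") would produce, and assumes each yielded chunk would become one streaming request.

```python
import math
from typing import Any, Dict, Iterator, List, Sequence


def chunk_records(
    records: Sequence[Dict[str, Any]], chunk_size: int = 500
) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical sketch: yield chunks of at most chunk_size rows with NaN values dropped."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        # NaN cells are omitted from the row, so the request never carries them.
        yield [
            {k: v for k, v in row.items() if not (isinstance(v, float) and math.isnan(v))}
            for row in chunk
        ]


# Row dictionaries of the kind DataFrame.to_dict(orient="records") returns.
records = [{"x": i, "y": float("nan") if i % 2 else 1.0} for i in range(1200)]
chunks = list(chunk_records(records))
print([len(c) for c in chunks])  # [500, 500, 200]
print(chunks[0][:2])  # [{'x': 0, 'y': 1.0}, {'x': 1}]
```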
-
insert_rows_json(table, json_rows, row_ids=None, skip_invalid_rows=None, ignore_unknown_values=None, template_suffix=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Insert rows into a table without applying local type conversions.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
- Parameters
table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str ]) – The destination table for the row data, or a reference to it.
json_rows (Sequence[Dict]) – Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
row_ids (Optional[Sequence[Optional[str]]]) – Unique IDs, one per row being inserted. An ID can also be None, indicating that an explicit insert ID should not be used for that row. If the argument is omitted altogether, unique IDs are created automatically.
skip_invalid_rows (Optional[bool]) – Insert all valid rows of a request, even if invalid rows exist. The default value is False, which causes the entire request to fail if any invalid rows exist.
ignore_unknown_values (Optional[bool]) – Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is False, which treats unknown values as errors.
template_suffix (Optional[str]) – Treat name (the destination table) as a template table and provide a suffix. BigQuery will create the table <name> + <template_suffix> based on the schema of the template table. See https://cloud.google.com/bigquery/streaming-data-into-bigquery#template-tables
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.
- Return type
Sequence[Mappings]
- Raises
TypeError – if json_rows is not a Sequence.
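The parameters of insert_rows_json map onto the tabledata.insertAll request body linked above. build_insert_all_body is a hypothetical sketch of that mapping; the camelCase field names follow the REST reference, while generating IDs with uuid.uuid4() when row_ids is omitted is an assumption about how the client creates them automatically.

```python
import uuid
from typing import Any, Dict, List, Optional, Sequence


def build_insert_all_body(
    json_rows: Sequence[Dict[str, Any]],
    row_ids: Optional[Sequence[Optional[str]]] = None,
    skip_invalid_rows: Optional[bool] = None,
    ignore_unknown_values: Optional[bool] = None,
    template_suffix: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of a tabledata.insertAll request body."""
    if row_ids is None:
        # Argument omitted: generate one unique insert ID per row.
        row_ids = [str(uuid.uuid4()) for _ in json_rows]
    if len(row_ids) != len(json_rows):
        raise ValueError("row_ids must have one entry per row")
    rows: List[Dict[str, Any]] = []
    for row, row_id in zip(json_rows, row_ids):
        entry: Dict[str, Any] = {"json": row}
        if row_id is not None:  # None: send this row without an explicit insert ID
            entry["insertId"] = row_id
        rows.append(entry)
    body: Dict[str, Any] = {"rows": rows}
    # Optional flags are only sent when the caller set them.
    if skip_invalid_rows is not None:
        body["skipInvalidRows"] = skip_invalid_rows
    if ignore_unknown_values is not None:
        body["ignoreUnknownValues"] = ignore_unknown_values
    if template_suffix is not None:
        body["templateSuffix"] = template_suffix
    return body


body = build_insert_all_body(
    [{"name": "a"}, {"name": "b"}], row_ids=["row-1", None], skip_invalid_rows=True
)
print(body)
```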
-
job_from_resource(resource)
Detect the correct job type from a resource and instantiate it.
- Parameters
resource (Dict) – One job resource from an API response.
- Returns
The job instance, constructed via the resource.
- Return type
Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]
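A job resource is expected to carry one of the load, copy, extract or query keys under configuration, and that key decides the job class. detect_job_type is a hypothetical sketch that returns the class name rather than instantiating anything; how the real method treats an unrecognized configuration is not covered by this reference, so the sketch simply returns None.

```python
from typing import Any, Dict, Optional

# Keys of the REST JobConfiguration object and the job class each one implies.
JOB_TYPE_BY_CONFIG_KEY = {
    "load": "LoadJob",
    "copy": "CopyJob",
    "extract": "ExtractJob",
    "query": "QueryJob",
}


def detect_job_type(resource: Dict[str, Any]) -> Optional[str]:
    """Hypothetical sketch: name the job class that matches a job resource."""
    config = resource.get("configuration", {})
    for key, class_name in JOB_TYPE_BY_CONFIG_KEY.items():
        if key in config:
            return class_name
    return None  # this sketch does not guess for unrecognized configurations


resource = {
    "jobReference": {"projectId": "my_project", "jobId": "job_123"},
    "configuration": {"query": {"query": "SELECT 1", "useLegacySql": False}},
}
print(detect_job_type(resource))  # QueryJob
```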
-
list_datasets(project=None, include_all=False, filter=None, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)
List datasets for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving datasets. Defaults to the client’s project.
include_all (Optional[bool]) – True if results include hidden datasets. Defaults to False.
filter (Optional[str]) – An expression for filtering the results by label. For syntax, see https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#body.QUERY_PARAMETERS.filter
max_results (Optional[int]) – Maximum number of datasets to return.
page_token (Optional[str]) – Token representing a cursor into the datasets. If not passed, the API will return the first page of datasets. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
Iterator of DatasetListItem associated with the project.
- Return type
google.api_core.page_iterator.Iterator
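page_token and next_page_token describe a cursor over result pages. iterate_pages is a hypothetical sketch of the loop behind such an iterator, assuming a fetch_page function that returns one page of items plus the next token (or None on the last page). The HTTPIterator returned by list_datasets runs this loop for you, so for dataset in client.list_datasets(): print(dataset.dataset_id) visits every page.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[Any], Optional[str]]  # (items, next_page_token)


def iterate_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Hypothetical sketch of page-token iteration over a list API."""
    page_token: Optional[str] = None  # no token: ask for the first page
    while True:
        items, next_page_token = fetch_page(page_token)
        yield from items
        if not next_page_token:  # no next token: that was the last page
            return
        page_token = next_page_token


# Fake two-page API response, keyed by the page token that requests each page.
FAKE_PAGES = {None: (["dataset_a", "dataset_b"], "token-2"), "token-2": (["dataset_c"], None)}


def fake_fetch(page_token: Optional[str]) -> Page:
    return FAKE_PAGES[page_token]


print(list(iterate_pages(fake_fetch)))  # ['dataset_a', 'dataset_b', 'dataset_c']
```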
-
list_jobs(project=None, parent_job=None, max_results=None, page_token=None, all_users=None, state_filter=None, retry=<google.api_core.retry.Retry object>, timeout=None, min_creation_time=None, max_creation_time=None)
List jobs for the project associated with this client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list
- Parameters
project (Optional[str]) – Project ID to use for retrieving jobs. Defaults to the client’s project.
parent_job (Optional[Union[ google.cloud.bigquery.job._AsyncJob, str, ]]) – If set, retrieve only child jobs of the specified parent.
max_results (Optional[int]) – Maximum number of jobs to return.
page_token (Optional[str]) – Opaque marker for the next “page” of jobs. If not passed, the API will return the first page of jobs. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of HTTPIterator.
all_users (Optional[bool]) – If true, include jobs owned by all users in the project. Defaults to False.
state_filter (Optional[str]) – If set, include only jobs matching the given state. One of "done", "pending", or "running".
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
min_creation_time (Optional[datetime.datetime]) – Min value for job creation time. If set, only jobs created after or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
max_creation_time (Optional[datetime.datetime]) – Max value for job creation time. If set, only jobs created before or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.
- Returns
Iterable of job instances.
- Return type
google.api_core.page_iterator.Iterator
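The jobs.list REST method linked above expresses creation-time bounds as milliseconds since the Unix epoch. to_epoch_millis and list_jobs_params are hypothetical helpers showing that conversion, including the documented rule that a datetime without a time zone is treated as UTC; the query-parameter names follow the REST reference, and list_jobs is expected to build these for you from the same arguments.

```python
import datetime
from typing import Dict, Optional

VALID_STATES = ("done", "pending", "running")
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_epoch_millis(value: datetime.datetime) -> int:
    """Milliseconds since the Unix epoch; naive datetimes are treated as UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=datetime.timezone.utc)
    # Integer division by a timedelta avoids floating-point rounding.
    return (value - _EPOCH) // datetime.timedelta(milliseconds=1)


def list_jobs_params(
    state_filter: Optional[str] = None,
    min_creation_time: Optional[datetime.datetime] = None,
    max_creation_time: Optional[datetime.datetime] = None,
) -> Dict[str, str]:
    """Hypothetical sketch of the jobs.list query parameters for these filters."""
    params: Dict[str, str] = {}
    if state_filter is not None:
        if state_filter not in VALID_STATES:
            raise ValueError(f"state_filter must be one of {VALID_STATES}")
        params["stateFilter"] = state_filter
    if min_creation_time is not None:
        params["minCreationTime"] = str(to_epoch_millis(min_creation_time))
    if max_creation_time is not None:
        params["maxCreationTime"] = str(to_epoch_millis(max_creation_time))
    return params


print(list_jobs_params("done", min_creation_time=datetime.datetime(2020, 1, 1)))
# {'stateFilter': 'done', 'minCreationTime': '1577836800000'}
```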
-
list_models
(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]¶ [Beta] List models in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/list
- Parameters
dataset (Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, str]) – A reference to the dataset whose models should be listed. If a string is passed in, this method attempts to create a dataset reference from it using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of models to return. If not passed, defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the models. If not passed, the API will return the first page of models. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
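The page_token / next_page_token pair lets a caller fetch results one page at a time, for example across separate web requests. A minimal sketch, assuming a hypothetical helper name and page size: it advances the HTTPIterator's pages attribute exactly once and returns the token to pass back on the next call (None when no models remain).

```python
def list_model_ids_page(client, dataset, page_token=None, page_size=50):
    """Return (model_ids, next_token) for a single page of models in `dataset`.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    iterator = client.list_models(dataset, max_results=page_size, page_token=page_token)
    # Advancing `pages` once issues a single API request for this page.
    page = next(iterator.pages)
    model_ids = [model.model_id for model in page]
    # next_page_token is filled in from the API response once the page is fetched.
    return model_ids, iterator.next_page_token
```

A caller stores the returned token and passes it as page_token on the following request to resume where it left off.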
-
list_partitions(table, retry=<google.api_core.retry.Retry object>, timeout=None)
List the partitions in a table.
- Parameters
table (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str]) – The table, or a reference to it, from which to get partition info.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
- Returns
A list of the partition IDs present in the partitioned table.
- Return type
List[str]
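Because list_partitions returns plain partition ID strings, picking the most recent partition is ordinary list processing. A sketch under stated assumptions: the table is partitioned by day, so IDs look like "20200131" and sort chronologically as strings, and special partitions whose IDs start with "__" (such as "__NULL__") should be skipped. The helper name is hypothetical.

```python
def latest_partition_id(client, table):
    """Return the newest daily partition ID of `table`, or None if there is none.

    Hypothetical helper; assumes YYYYMMDD partition IDs and skips special
    IDs that start with "__" (for example "__NULL__").
    """
    partition_ids = [
        partition_id
        for partition_id in client.list_partitions(table)
        if not partition_id.startswith("__")
    ]
    # For fixed-width YYYYMMDD strings, lexicographic order equals date order.
    return max(partition_ids) if partition_ids else None
```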
-
list_projects(max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)
List the projects accessible to the current client.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/list
- Parameters
max_results (Optional[int]) – Maximum number of projects to return. If not passed, defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the projects. If not passed, the API will return the first page of projects. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
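A short sketch of consuming the returned iterator: map each accessible project's ID to its friendly name. The helper is hypothetical; it assumes the yielded Project objects expose project_id and friendly_name attributes.

```python
def project_names(client, max_results=None):
    """Map project ID -> friendly name for every project accessible to `client`.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    return {
        project.project_id: project.friendly_name
        for project in client.list_projects(max_results=max_results)
    }
```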
- Returns
Iterator of Project objects accessible to the current client.
- Return type
-
list_routines(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] List routines in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/list
- Parameters
dataset (Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, str]) – A reference to the dataset whose routines should be listed. If a string is passed in, this method attempts to create a dataset reference from it using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of routines to return. If not passed, defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the routines. If not passed, the API will return the first page of routines. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
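As an illustration, the hypothetical helper below groups the routines of a dataset by kind. It assumes each listed Routine exposes routine_id and type_ (values such as "SCALAR_FUNCTION" or "PROCEDURE").

```python
from collections import defaultdict


def routines_by_type(client, dataset):
    """Group routine IDs in `dataset` by routine type.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    groups = defaultdict(list)
    for routine in client.list_routines(dataset):
        groups[routine.type_].append(routine.routine_id)
    return dict(groups)
```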
-
list_rows(table, selected_fields=None, max_results=None, page_token=None, start_index=None, page_size=None, retry=<google.api_core.retry.Retry object>, timeout=None)
List the rows of the table.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list
Note
This method assumes that the provided schema is up-to-date with the schema as defined on the back-end: if the two schemas are not identical, the values returned may be incomplete. To ensure that the local copy of the schema is up-to-date, call client.get_table.
- Parameters
table (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str]) – The table to list, or a reference to it. When the table object does not contain a schema and selected_fields is not supplied, this method calls get_table to fetch the table schema.
selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. If not supplied, data for all columns is downloaded.
max_results (Optional[int]) – Maximum number of rows to return.
page_token (Optional[str]) – Token representing a cursor into the table’s rows. If not passed, the API will return the first page of the rows. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the RowIterator.
start_index (Optional[int]) – The zero-based index of the starting row to read.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
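A sketch of a column preview that follows the note above: refresh the schema with get_table, request only a subset of columns through selected_fields, cap the row count with max_results, and read total_rows afterwards. The helper name is hypothetical, and it assumes each returned Row supports items().

```python
def preview_rows(client, table, column_names, n=10):
    """Return up to `n` rows (as dicts) for the named columns, plus the table's total row count.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    # Refresh the schema first so the selected fields match the server.
    table = client.get_table(table)
    # Keeps schema order; names not present in the schema are ignored.
    fields = [field for field in table.schema if field.name in column_names]
    rows = client.list_rows(table, selected_fields=fields, max_results=n)
    values = [dict(row.items()) for row in rows]
    # total_rows counts the whole table, not just the rows fetched here.
    return values, rows.total_rows
```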
- Returns
Iterator of row data Row objects. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the table (this is distinct from the total number of rows in the current page: iterator.page.num_items).
- Return type
-
list_tables(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)
List tables in the dataset.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list
- Parameters
dataset (Union[google.cloud.bigquery.dataset.Dataset, google.cloud.bigquery.dataset.DatasetReference, str]) – A reference to the dataset whose tables should be listed. If a string is passed in, this method attempts to create a dataset reference from it using google.cloud.bigquery.dataset.DatasetReference.from_string().
max_results (Optional[int]) – Maximum number of tables to return. If not passed, defaults to a value set by the API.
page_token (Optional[str]) – Token representing a cursor into the tables. If not passed, the API will return the first page of tables. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
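The listed TableListItem objects carry enough metadata to filter without fetching each table. A sketch with a hypothetical helper, assuming each item exposes table_id and table_type (for example "TABLE" or "VIEW"):

```python
def table_ids_of_type(client, dataset, table_type="VIEW"):
    """Return the IDs of entries in `dataset` whose table_type matches.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    return [
        item.table_id
        for item in client.list_tables(dataset)
        if item.table_type == table_type
    ]
```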
- Returns
Iterator of TableListItem objects contained within the requested dataset.
- Return type
-
load_table_from_dataframe(dataframe, destination, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, parquet_compression='snappy')
Upload the contents of a table from a pandas DataFrame.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
Note
Due to the way REPEATED fields are encoded in the parquet file format, a mismatch with the existing table schema can occur, and 100% compatibility cannot be guaranteed for REPEATED fields.
- Parameters
dataframe (pandas.DataFrame) – A DataFrame containing the data to load.
destination (google.cloud.bigquery.table.TableReference) –
The destination table to use for loading the data. If it is an existing table, the schema of the DataFrame must match the schema of the destination table. If the table does not yet exist, the schema is inferred from the DataFrame.
If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
- Keyword Arguments
num_retries (Optional[int]) – Number of upload retries.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) –
Extra configuration options for the job.
To override the default pandas data type conversions, supply a value for schema with column names matching those of the dataframe. The BigQuery schema is used to determine the correct data type conversion. Indexes are not loaded. Requires the pyarrow library.
parquet_compression (Optional[str]) –
[Beta] The compression method to use when serializing the dataframe to an intermediate parquet file.
If pyarrow and a job config schema are used, the argument is passed directly as the compression argument to the underlying pyarrow.parquet.write_table() method (the default value “snappy” gets converted to uppercase). https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table
If either pyarrow or a job config schema is missing, the argument is passed directly as the compression argument to the underlying DataFrame.to_parquet() method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet
- Returns
A new load job.
- Return type
- Raises
ImportError – If a usable parquet engine cannot be found. This method requires pyarrow or fastparquet to be installed.
TypeError – If job_config is not an instance of the LoadJobConfig class.
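Because the method only starts the job, callers usually wait on it before using the table. A minimal sketch with a hypothetical helper that blocks on job.result() (which raises if the job failed) and then reports the LoadJob's output_rows; it assumes client is a google.cloud.bigquery.Client.

```python
def load_dataframe(client, dataframe, table_id, job_config=None):
    """Load `dataframe` into `table_id`, wait for completion, and return the row count.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    job = client.load_table_from_dataframe(dataframe, table_id, job_config=job_config)
    job.result()  # Blocks until the load job finishes; raises if the job failed.
    return job.output_rows
```

To control the column types instead of relying on pandas dtype conversion, pass something like job_config=bigquery.LoadJobConfig(schema=[bigquery.SchemaField("name", "STRING"), bigquery.SchemaField("created_at", "TIMESTAMP")]) for a DataFrame with the (hypothetical) columns name and created_at.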
-
load_table_from_file(file_obj, destination, rewind=False, size=None, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None)
Upload the contents of a table from a file-like object.
Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.
- Parameters
file_obj (file) – A file handle opened in binary mode for reading.
destination (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
- Keyword Arguments
rewind (Optional[bool]) – If True, seek to the beginning of the file handle before reading the file.
size (Optional[int]) – The number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
num_retries (Optional[int]) – Number of upload retries. Defaults to 6.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.
- Returns
A new load job.
- Return type
- Raises
ValueError – If size is not passed in and cannot be determined, or if the file_obj can be detected to be a file opened in text mode.
TypeError – If job_config is not an instance of the LoadJobConfig class.
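A sketch of loading a local file, assuming client is a google.cloud.bigquery.Client and the helper name is hypothetical. The file is opened in binary mode as required above; the upload takes place inside the load_table_from_file call, so the handle can be closed before waiting on the job.

```python
def load_local_file(client, path, table_id, job_config=None):
    """Upload a local file into `table_id` and wait for the load job to finish.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    # Binary mode is required; detectable text-mode handles raise ValueError.
    with open(path, "rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=job_config)
    job.result()  # Raises if the load job failed.
    return job
```

For a CSV file, the job_config would typically set the source format and header handling on a LoadJobConfig.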
-
load_table_from_json(json_rows, destination, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None)
Upload the contents of a table from an iterable of JSON-compatible row dicts.
- Parameters
json_rows (Iterable[Dict[str, Any]]) –
Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.
Note
If your data is already a newline-delimited JSON string, it is best to wrap it in a file-like object and pass it to load_table_from_file(). Because that method expects a file opened in binary mode, encode the string to bytes first:
    import io
    from google.cloud import bigquery

    data = '{"foo": "bar"}'
    data_as_file = io.BytesIO(data.encode("utf-8"))

    client = bigquery.Client()
    client.load_table_from_file(data_as_file, ...)
destination (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
- Keyword Arguments
num_retries (Optional[int]) – Number of upload retries.
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job. The source_format setting is always set to NEWLINE_DELIMITED_JSON.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
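Since json_rows must hold JSON-compatible values, Python objects such as datetime or Decimal have to be converted before loading. A sketch under the assumption that ISO 8601 strings and decimal strings are acceptable for the target column types; both helper names are hypothetical.

```python
import datetime
import decimal


def to_json_compatible(row):
    """Return a copy of `row` with date/time and Decimal values converted to strings."""
    converted = {}
    for key, value in row.items():
        if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
            converted[key] = value.isoformat()  # ISO 8601, e.g. "2020-01-02"
        elif isinstance(value, decimal.Decimal):
            converted[key] = str(value)  # keep full precision as a string
        else:
            converted[key] = value
    return converted


def load_rows(client, rows, table_id, job_config=None):
    """Load an iterable of row dicts into `table_id` and wait for the job.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    json_rows = [to_json_compatible(row) for row in rows]
    job = client.load_table_from_json(json_rows, table_id, job_config=job_config)
    return job.result()  # Raises if the load job failed.
```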
-
load_table_from_uri(source_uris, destination, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Starts a job for loading data into a table from Cloud Storage.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload
- Parameters
source_uris (Union[str, Sequence[str]]) – URIs of data files to be loaded, in the format gs://<bucket_name>/<object_name_or_glob>.
destination (Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str]) – Table into which data is to be loaded. If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().
- Keyword Arguments
job_id (Optional[str]) – Name of the job.
job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new load job.
- Return type
- Raises
TypeError – If job_config is not an instance of the LoadJobConfig class.
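A sketch of a load from Cloud Storage that accepts either a single URI or a sequence, as source_uris does, and rejects anything not in gs:// form before a job is started. The helper name and the local check are assumptions, not library behavior; it waits with job.result() and returns the LoadJob's output_rows.

```python
def load_from_gcs(client, source_uris, table_id, job_config=None):
    """Load gs:// URIs into `table_id`, wait for the job, and return the rows loaded.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    uris = [source_uris] if isinstance(source_uris, str) else list(source_uris)
    # Catch obviously wrong URIs locally before starting a job.
    bad = [uri for uri in uris if not uri.startswith("gs://")]
    if bad:
        raise ValueError("expected gs:// URIs, got: {}".format(bad))
    job = client.load_table_from_uri(uris, table_id, job_config=job_config)
    job.result()  # Raises if the load job failed.
    return job.output_rows
```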
-
property location
Default location for jobs / datasets / tables.
-
query(query, job_config=None, job_id=None, job_id_prefix=None, location=None, project=None, retry=<google.api_core.retry.Retry object>, timeout=None)
Run a SQL query.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
- Parameters
query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.
- Keyword Arguments
job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or to whatever value is preferred.
job_id (Optional[str]) – ID to use for the query job.
job_id_prefix (Optional[str]) – The prefix to use for a randomly generated job ID. This parameter will be ignored if a job_id is also given.
location (Optional[str]) – Location where to run the job. Must match the location of any table used in the query as well as the destination table.
project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
- Returns
A new query job instance.
- Return type
- Raises
TypeError – If job_config is not an instance of the QueryJobConfig class.
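A minimal sketch of running a query and collecting its results: query_job.result() waits for the query to finish and returns a row iterator. The helper name is hypothetical, and it assumes each Row supports items().

```python
def query_to_dicts(client, sql, job_config=None, location=None):
    """Run `sql` and return all result rows as a list of dicts.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    query_job = client.query(sql, job_config=job_config, location=location)
    # result() blocks until the query finishes, then iterates over every page.
    return [dict(row.items()) for row in query_job.result()]
```

For a parameterized query, the SQL can reference @min_count and the call can pass job_config=bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("min_count", "INT64", 10)]); the parameter name and value are illustrative.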
-
schema_from_json(file_or_path)
Takes a file object or file path that contains JSON that describes a table schema.
- Returns
List of schema field objects.
-
schema_to_json(schema_list, destination)
Takes a list of schema field objects and serializes it as JSON to a file.
The destination is a file path or a file object.
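Both methods work with a JSON list of field objects. The sketch below writes such a file with the standard library; the field names are hypothetical, and the keys (name, type, mode) are assumed to follow the REST API's TableFieldSchema layout that schema_from_json reads.

```python
import json

# Hypothetical two-column schema in the TableFieldSchema JSON layout.
EXAMPLE_SCHEMA = [
    {"name": "full_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
]


def write_schema_file(path, fields=EXAMPLE_SCHEMA):
    """Write `fields` as a JSON schema file for schema_from_json to read."""
    with open(path, "w") as schema_file:
        json.dump(fields, schema_file, indent=2)
```

Reading it back with schema = client.schema_from_json(path) should yield a list of SchemaField objects, and client.schema_to_json(schema, other_path) writes them out again.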
-
update_dataset(dataset, fields, retry=<google.api_core.retry.Retry object>, timeout=None)
Change some fields of a dataset.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in dataset, it will be deleted.
If dataset.etag is not None, the update will only succeed if the dataset on the server has the same ETag. Thus reading a dataset with get_dataset, changing its fields, and then passing it to update_dataset will ensure that the changes will only be saved if no modifications to the dataset occurred since the read.
- Parameters
dataset (google.cloud.bigquery.dataset.Dataset) – The dataset to update.
fields (Sequence[str]) – The properties of dataset to change (e.g. “friendly_name”).
retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
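The read-modify-write pattern described above, as a minimal sketch with a hypothetical helper: get_dataset captures the current etag, only description is changed, and only "description" is listed in fields, so a concurrent modification makes the update fail instead of being overwritten.

```python
def set_dataset_description(client, dataset_id, description):
    """Update only the description of `dataset_id` and return the updated Dataset.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    dataset = client.get_dataset(dataset_id)  # also captures the current etag
    dataset.description = description
    return client.update_dataset(dataset, ["description"])
```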
- Returns
The modified Dataset instance.
- Return type
-
update_model(model, fields, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] Change some fields of a model.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in model, the field value will be deleted.
If model.etag is not None, the update will only succeed if the model on the server has the same ETag. Thus reading a model with get_model, changing its fields, and then passing it to update_model will ensure that the changes will only be saved if no modifications to the model occurred since the read.
- Parameters
model (google.cloud.bigquery.model.Model) – The model to update.
fields (Sequence[str]) – The fields of model to change, spelled as the Model properties (e.g. “friendly_name”).
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
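A sketch of setting or clearing a model's friendly name with a hypothetical helper. Passing None relies on the rule above: a field listed in fields whose value is None is deleted.

```python
def set_model_friendly_name(client, model_id, friendly_name=None):
    """Set (or, with None, delete) the friendly name of `model_id`.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    model = client.get_model(model_id)  # also captures the current etag
    model.friendly_name = friendly_name
    return client.update_model(model, ["friendly_name"])
```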
- Returns
The model resource returned from the API call.
- Return type
-
update_routine(routine, fields, retry=<google.api_core.retry.Retry object>, timeout=None)
[Beta] Change some fields of a routine.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in routine, the field value will be deleted.
Warning
During beta, partial updates are not supported. You must provide all fields in the resource.
If routine.etag is not None, the update will only succeed if the resource on the server has the same ETag. Thus reading a routine with get_routine(), changing its fields, and then passing it to this method will ensure that the changes will only be saved if no modifications to the resource occurred since the read.
- Parameters
routine (google.cloud.bigquery.routine.Routine) – The routine to update.
fields (Sequence[str]) – The fields of routine to change, spelled as the Routine properties (e.g. type_).
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
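Because partial updates are not supported during beta, the sketch below lists every field of the resource even though only the body changes. The helper name is hypothetical, and the field list is an assumption covering a typical SQL function; routines using other properties (for example imported_libraries) would need those listed too.

```python
# Assumed complete field list for a typical SQL function; during beta the
# API requires all fields, not only the changed ones.
ROUTINE_FIELDS = ["arguments", "body", "language", "return_type", "type_"]


def set_routine_body(client, routine_id, body):
    """Replace the body of `routine_id` and return the updated Routine.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    routine = client.get_routine(routine_id)  # also captures the current etag
    routine.body = body
    return client.update_routine(routine, ROUTINE_FIELDS)
```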
- Returns
The routine resource returned from the API call.
- Return type
-
update_table(table, fields, retry=<google.api_core.retry.Retry object>, timeout=None)
Change some fields of a table.
Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in table, the field value will be deleted.
If table.etag is not None, the update will only succeed if the table on the server has the same ETag. Thus reading a table with get_table, changing its fields, and then passing it to update_table will ensure that the changes will only be saved if no modifications to the table occurred since the read.
- Parameters
table (google.cloud.bigquery.table.Table) – The table to update.
fields (Sequence[str]) – The fields of table to change, spelled as the Table properties (e.g. “friendly_name”).
retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.
timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.
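A sketch of a read-modify-write label update with a hypothetical helper. It copies the current labels, applies the requested changes, and lists only "labels" in fields so other properties are left alone. Because it starts from get_table, the etag check described above guards against overwriting a concurrent change.

```python
def merge_table_labels(client, table_id, changes):
    """Add or overwrite labels on `table_id` and return the updated Table.

    Hypothetical helper; `client` is assumed to be a google.cloud.bigquery.Client.
    """
    table = client.get_table(table_id)  # also captures the current etag
    labels = dict(table.labels)  # copy the current labels
    labels.update(changes)
    table.labels = labels
    return client.update_table(table, ["labels"])
```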
- Returns
The table resource returned from the API call.
- Return type