As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

google.cloud.bigquery.client.Client

class google.cloud.bigquery.client.Client(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, client_info=None, client_options=None)[source]

Client to bundle configuration needed for API requests.

Parameters
  • project (Optional[str]) – Project ID for the project which the client acts on behalf of. Will be passed when creating a dataset / job. If not passed, falls back to the default inferred from the environment.

  • credentials (Optional[google.auth.credentials.Credentials]) – The OAuth2 Credentials to use for this client. If not passed (and if no _http object is passed), falls back to the default inferred from the environment.

  • _http (Optional[requests.Session]) – HTTP object to make requests. Can be any object that defines request() with the same interface as requests.Session.request(). If not passed, an _http object is created that is bound to the credentials for the current object. This parameter should be considered private, and could change in the future.

  • location (Optional[str]) – Default location for jobs / datasets / tables.

  • default_query_job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Default QueryJobConfig. Will be merged into job configs passed into the query method.

  • client_info (Optional[google.api_core.client_info.ClientInfo]) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own library or partner tool.

  • client_options (Optional[Union[google.api_core.client_options.ClientOptions, Dict]]) – Client options used to set user options on the client. API Endpoint should be set through client_options.

Raises

google.auth.exceptions.DefaultCredentialsError – Raised if credentials is not specified and the library fails to acquire default credentials.

__init__(project=None, credentials=None, _http=None, location=None, default_query_job_config=None, client_info=None, client_options=None)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([project, credentials, _http, …])

Initialize self.

cancel_job(job_id[, project, location, …])

Attempt to cancel a job from a job ID.

close()

Close the underlying transport objects, releasing system resources.

copy_table(sources, destination[, job_id, …])

Copy one or more tables to another table.

create_dataset(dataset[, exists_ok, retry, …])

API call: create the dataset via a POST request.

create_job(job_config[, retry, timeout])

Create a new job.

create_routine(routine[, exists_ok, retry, …])

[Beta] Create a routine via a POST request.

create_table(table[, exists_ok, retry, timeout])

API call: create a table via a POST request.

dataset(dataset_id[, project])

Deprecated: Construct a reference to a dataset.

delete_dataset(dataset[, delete_contents, …])

Delete a dataset.

delete_model(model[, retry, timeout, …])

[Beta] Delete a model

delete_routine(routine[, retry, timeout, …])

[Beta] Delete a routine.

delete_table(table[, retry, timeout, …])

Delete a table

extract_table(source, destination_uris[, …])

Start a job to extract a table into Cloud Storage files.

from_service_account_json(…)

Factory to retrieve JSON credentials while creating client.

get_dataset(dataset_ref[, retry, timeout])

Fetch the dataset referenced by dataset_ref

get_iam_policy(table[, …])

get_job(job_id[, project, location, retry, …])

Fetch a job for the project associated with this client.

get_model(model_ref[, retry, timeout])

[Beta] Fetch the model referenced by model_ref.

get_routine(routine_ref[, retry, timeout])

[Beta] Get the routine referenced by routine_ref.

get_service_account_email([project, retry, …])

Get the email address of the project’s BigQuery service account

get_table(table[, retry, timeout])

Fetch the table referenced by table.

insert_rows(table, rows[, selected_fields])

Insert rows into a table via the streaming API.

insert_rows_from_dataframe(table, dataframe)

Insert rows into a table from a dataframe via the streaming API.

insert_rows_json(table, json_rows[, …])

Insert rows into a table without applying local type conversions.

job_from_resource(resource)

Detect correct job type from resource and instantiate.

list_datasets([project, include_all, …])

List datasets for the project associated with this client.

list_jobs([project, parent_job, …])

List jobs for the project associated with this client.

list_models(dataset[, max_results, …])

[Beta] List models in the dataset.

list_partitions(table[, retry, timeout])

List the partitions in a table.

list_projects([max_results, page_token, …])

List projects accessible to the current client.

list_routines(dataset[, max_results, …])

[Beta] List routines in the dataset.

list_rows(table[, selected_fields, …])

List the rows of the table.

list_tables(dataset[, max_results, …])

List tables in the dataset.

load_table_from_dataframe(dataframe, destination)

Upload the contents of a table from a pandas DataFrame.

load_table_from_file(file_obj, destination)

Upload the contents of this table from a file-like object.

load_table_from_json(json_rows, destination)

Upload the contents of a table from a JSON string or dict.

load_table_from_uri(source_uris, destination)

Starts a job for loading data into a table from Cloud Storage.

query(query[, job_config, job_id, …])

Run a SQL query.

schema_from_json(file_or_path)

Takes a file object or file path that contains json that describes a table schema.

schema_to_json(schema_list, destination)

Takes a list of schema field objects.

set_iam_policy(table, policy[, updateMask, …])

test_iam_permissions(table, permissions[, …])

update_dataset(dataset, fields[, retry, timeout])

Change some fields of a dataset.

update_model(model, fields[, retry, timeout])

[Beta] Change some fields of a model.

update_routine(routine, fields[, retry, timeout])

[Beta] Change some fields of a routine.

update_table(table, fields[, retry, timeout])

Change some fields of a table.

Attributes

SCOPE

The scopes required for authenticating as a BigQuery consumer.

location

Default location for jobs / datasets / tables.

SCOPE = ('https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/cloud-platform')

The scopes required for authenticating as a BigQuery consumer.

__getstate__()[source]

Explicitly state that clients are not pickleable.

cancel_job(job_id, project=None, location=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Attempt to cancel a job from a job ID.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel

Parameters

job_id (str) – Unique job identifier.

Keyword Arguments
  • project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).

  • location (Optional[str]) – Location where the job was run.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Job instance, based on the resource returned by the API.

Return type

Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob, ]
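A minimal sketch of how cancel_job might be paired with get_job so that a finished job is left alone. The helper name cancel_if_unfinished is hypothetical, the client argument is assumed to be an already constructed Client, and the "DONE" state string is an assumption about the job's state property rather than something stated on this page.

```python
def cancel_if_unfinished(client, job_id, location=None):
    """Cancel the job unless it has already finished; return a job object."""
    # Look the job up first so that a finished job is not cancelled needlessly.
    job = client.get_job(job_id, location=location)
    if job.state == "DONE":  # assumed state value, see lead-in
        return job
    # cancel_job only attempts cancellation; the job it returns is built
    # from the resource sent back by the API.
    return client.cancel_job(job_id, location=location)
```

A call such as cancel_if_unfinished(client, "my_job_id", location="US") would then return either the finished job or the job as reported after the cancel request.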

close()[source]

Close the underlying transport objects, releasing system resources.

Note

The client instance can be used for making additional requests even after closing, in which case the underlying connections are automatically re-created.
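One way to make sure close() is always called is to wrap the client in the standard library's contextlib.closing. The sketch below assumes an already constructed Client; the helper name run_and_close and the work callback are hypothetical.

```python
from contextlib import closing


def run_and_close(client, work):
    """Run work(client), then release the client's transport resources."""
    # closing() calls client.close() when the block exits, even if work() raises.
    with closing(client) as active_client:
        return work(active_client)
```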

copy_table(sources, destination, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Copy one or more tables to another table.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationtablecopy

Parameters
Keyword Arguments
  • job_id (Optional[str]) – The ID of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of any source table as well as the destination table.

  • project (Optional[str]) – Project ID of the project of where to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.CopyJobConfig]) – Extra configuration options for the job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new copy job instance.

Return type

google.cloud.bigquery.job.CopyJob

Raises

TypeError – If job_config is not an instance of CopyJobConfig class.
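A short sketch of starting a copy job and waiting for it, assuming an already constructed Client and table IDs given as "project.dataset_id.table_id" strings. The helper name copy_and_wait and the "copy_" prefix are illustrative, and result() is assumed to be the job method that blocks until completion.

```python
def copy_and_wait(client, source_id, destination_id, prefix="copy_"):
    """Copy one table to another and block until the copy job finishes."""
    # With job_id_prefix (and no job_id), a random job ID with that prefix is used.
    job = client.copy_table(source_id, destination_id, job_id_prefix=prefix)
    # Assumed to wait for completion and raise if the job failed.
    job.result()
    return job
```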

create_dataset(dataset, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

API call: create the dataset via a POST request.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert

Parameters
Returns

A new Dataset returned from the API.

Return type

google.cloud.bigquery.dataset.Dataset

Raises

google.cloud.exceptions.Conflict – If the dataset already exists.

Example

>>> from google.cloud import bigquery
>>> client = bigquery.Client()
>>> dataset = bigquery.Dataset('my_project.my_dataset')
>>> dataset = client.create_dataset(dataset)

create_job(job_config, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Create a new job.

Parameters

job_config (dict) – Configuration job representation returned from the API.

Keyword Arguments
  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new job instance.

Return type

Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]

create_routine(routine, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] Create a routine via a POST request.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/insert

Parameters
  • routine (google.cloud.bigquery.routine.Routine) – A Routine to create. The dataset that the routine belongs to must already exist.

  • exists_ok (Optional[bool]) – Defaults to False. If True, ignore “already exists” errors when creating the routine.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new Routine returned from the service.

Return type

google.cloud.bigquery.routine.Routine

Raises

google.cloud.exceptions.Conflict – If the routine already exists.

create_table(table, exists_ok=False, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

API call: create a table via a POST request.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert

Parameters
Returns

A new Table returned from the service.

Return type

google.cloud.bigquery.table.Table

Raises

google.cloud.exceptions.Conflict – If the table already exists.

dataset(dataset_id, project=None)[source]

Deprecated: Construct a reference to a dataset.

Deprecated since version 1.24.0: Construct a DatasetReference using its constructor or use a string where previously a reference object was used.

As of google-cloud-bigquery version 1.7.0, all client methods that take a DatasetReference or TableReference also take a string in standard SQL format, e.g. project.dataset_id or project.dataset_id.table_id.

Parameters
  • dataset_id (str) – ID of the dataset.

  • project (Optional[str]) – Project ID for the dataset (defaults to the project of the client).

Returns

A new DatasetReference instance.

Return type

google.cloud.bigquery.dataset.DatasetReference
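Since client methods also accept a string in project.dataset_id form, code that used the deprecated dataset() helper can often pass such a string instead. The sketch below assumes an already constructed Client with a project attribute; the helper names dataset_path and fetch_dataset are hypothetical.

```python
def dataset_path(project, dataset_id):
    """Build the 'project.dataset_id' string accepted in place of a reference."""
    return "{}.{}".format(project, dataset_id)


def fetch_dataset(client, dataset_id, project=None):
    """Fetch a dataset without going through the deprecated client.dataset()."""
    # Fall back to the client's own project, mirroring dataset()'s default.
    project = project or client.project
    return client.get_dataset(dataset_path(project, dataset_id))
```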

delete_dataset(dataset, delete_contents=False, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)[source]

Delete a dataset.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/delete

Parameters

delete_model(model, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)[source]

[Beta] Delete a model

See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/delete

Parameters

delete_routine(routine, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)[source]

[Beta] Delete a routine.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/delete

Parameters

delete_table(table, retry=<google.api_core.retry.Retry object>, timeout=None, not_found_ok=False)[source]

Delete a table

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/delete

Parameters

extract_table(source, destination_uris, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None, source_type='Table')[source]

Start a job to extract a table into Cloud Storage files.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract

Parameters
Keyword Arguments
  • job_id (Optional[str]) – The ID of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of the source table.

  • project (Optional[str]) – Project ID of the project of where to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.ExtractJobConfig]) – Extra configuration options for the job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

  • source_type (Optional[str]) – Type of source to be extracted: Table or Model. Defaults to Table.

Returns

A new extract job instance.

Return type

google.cloud.bigquery.job.ExtractJob

Raises

classmethod from_service_account_json(json_credentials_path, *args, **kwargs)

Factory to retrieve JSON credentials while creating client.

Parameters
  • json_credentials_path (str) – The path to a private key file (this file was given to you when you created the service account). This file must contain a JSON object with a private key and other credentials information (downloaded from the Google APIs console).

  • args (tuple) – Remaining positional arguments to pass to constructor.

  • kwargs – Remaining keyword arguments to pass to constructor.

Return type

_ClientFactoryMixin

Returns

The client created with the retrieved JSON credentials.

Raises

TypeError – if there is a conflict with the kwargs and the credentials created by the factory.

get_dataset(dataset_ref, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Fetch the dataset referenced by dataset_ref

Parameters
Returns

A Dataset instance.

Return type

google.cloud.bigquery.dataset.Dataset

get_job(job_id, project=None, location=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Fetch a job for the project associated with this client.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get

Parameters

job_id (str) – Unique job identifier.

Keyword Arguments
  • project (Optional[str]) – ID of the project which owns the job (defaults to the client’s project).

  • location (Optional[str]) – Location where the job was run.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Job instance, based on the resource returned by the API.

Return type

Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]

get_model(model_ref, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] Fetch the model referenced by model_ref.

Parameters
Returns

A Model instance.

Return type

google.cloud.bigquery.model.Model

get_routine(routine_ref, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] Get the routine referenced by routine_ref.

Parameters
Returns

A Routine instance.

Return type

google.cloud.bigquery.routine.Routine

get_service_account_email(project=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Get the email address of the project’s BigQuery service account

Note

This is the service account that BigQuery uses to manage tables encrypted by a key in KMS.

Parameters
  • project (Optional[str]) – Project ID to use for retrieving service account email. Defaults to the client’s project.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

service account email address

Return type

str

Example

>>> from google.cloud import bigquery
>>> client = bigquery.Client()
>>> client.get_service_account_email()
my_service_account@my-project.iam.gserviceaccount.com

get_table(table, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Fetch the table referenced by table.

Parameters
Returns

A Table instance.

Return type

google.cloud.bigquery.table.Table

insert_rows(table, rows, selected_fields=None, **kwargs)[source]

Insert rows into a table via the streaming API.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll

Parameters
Returns

One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.

Return type

Sequence[Mappings]

Raises

ValueError – if table’s schema is not set or rows is not a Sequence.

insert_rows_from_dataframe(table, dataframe, selected_fields=None, chunk_size=500, **kwargs)[source]

Insert rows into a table from a dataframe via the streaming API.

Parameters
Returns

A list with insert errors for each insert chunk. Each element is a list containing one mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.

Return type

Sequence[Sequence[Mappings]]

Raises

ValueError – if table’s schema is not set

insert_rows_json(table, json_rows, row_ids=None, skip_invalid_rows=None, ignore_unknown_values=None, template_suffix=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Insert rows into a table without applying local type conversions.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll

Parameters
  • table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str ]) – The destination table for the row data, or a reference to it.

  • json_rows (Sequence[Dict]) – Row data to be inserted. Keys must match the table schema fields and values must be JSON-compatible representations.

  • row_ids (Optional[Sequence[Optional[str]]]) – Unique IDs, one per row being inserted. An ID can also be None, indicating that an explicit insert ID should not be used for that row. If the argument is omitted altogether, unique IDs are created automatically.

  • skip_invalid_rows (Optional[bool]) – Insert all valid rows of a request, even if invalid rows exist. The default value is False, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown_values (Optional[bool]) – Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is False, which treats unknown values as errors.

  • template_suffix (Optional[str]) – Treat name as a template table and provide a suffix. BigQuery will create the table <name> + <template_suffix> based on the schema of the template table. See https://cloud.google.com/bigquery/streaming-data-into-bigquery#template-tables

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

One mapping per row with insert errors: the “index” key identifies the row, and the “errors” key contains a list of the mappings describing one or more problems with the row.

Return type

Sequence[Mappings]

Raises

TypeError – if json_rows is not a Sequence.
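The return value described above can be matched back to the submitted rows through each mapping's "index" key. The following sketch assumes an already constructed Client; the helper name insert_and_collect_failures is hypothetical.

```python
def insert_and_collect_failures(client, table, json_rows):
    """Stream rows and return (row, errors) pairs for the rows that failed."""
    # An empty result means every row was accepted.
    insert_errors = client.insert_rows_json(table, json_rows)
    failures = []
    for entry in insert_errors:
        # "index" identifies the failing row; "errors" lists its problems.
        failures.append((json_rows[entry["index"]], entry["errors"]))
    return failures
```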

job_from_resource(resource)[source]

Detect correct job type from resource and instantiate.

Parameters

resource (Dict) – one job resource from API response

Returns

The job instance, constructed via the resource.

Return type

Union[ google.cloud.bigquery.job.LoadJob, google.cloud.bigquery.job.CopyJob, google.cloud.bigquery.job.ExtractJob, google.cloud.bigquery.job.QueryJob ]

list_datasets(project=None, include_all=False, filter=None, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

List datasets for the project associated with this client.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list

Parameters
  • project (Optional[str]) – Project ID to use for retrieving datasets. Defaults to the client’s project.

  • include_all (Optional[bool]) – True if results include hidden datasets. Defaults to False.

  • filter (Optional[str]) – An expression for filtering the results by label. For syntax, see https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#body.QUERY_PARAMETERS.filter

  • max_results (Optional[int]) – Maximum number of datasets to return.

  • page_token (Optional[str]) – Token representing a cursor into the datasets. If not passed, the API will return the first page of datasets. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Iterator of DatasetListItem associated with the project.

Return type

google.api_core.page_iterator.Iterator
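A sketch of listing datasets by label, assuming an already constructed Client. The helper name dataset_ids_with_label is hypothetical; the "labels.<name>:<value>" filter form and the dataset_id attribute of each listed item are assumptions based on the REST reference linked above, not on this page.

```python
def dataset_ids_with_label(client, key, value, project=None):
    """Return the IDs of datasets carrying the given label."""
    # Assumed filter syntax; see the datasets.list reference for details.
    label_filter = "labels.{}:{}".format(key, value)
    datasets = client.list_datasets(project=project, filter=label_filter)
    # Iterating the returned iterator fetches further pages as needed.
    return [item.dataset_id for item in datasets]
```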

list_jobs(project=None, parent_job=None, max_results=None, page_token=None, all_users=None, state_filter=None, retry=<google.api_core.retry.Retry object>, timeout=None, min_creation_time=None, max_creation_time=None)[source]

List jobs for the project associated with this client.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list

Parameters
  • project (Optional[str]) – Project ID to use for retrieving jobs. Defaults to the client’s project.

  • parent_job (Optional[Union[ google.cloud.bigquery.job._AsyncJob, str, ]]) – If set, retrieve only child jobs of the specified parent.

  • max_results (Optional[int]) – Maximum number of jobs to return.

  • page_token (Optional[str]) – Opaque marker for the next “page” of jobs. If not passed, the API will return the first page of jobs. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of HTTPIterator.

  • all_users (Optional[bool]) – If true, include jobs owned by all users in the project. Defaults to False.

  • state_filter (Optional[str]) –

    If set, include only jobs matching the given state. One of:
    • "done"

    • "pending"

    • "running"

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

  • min_creation_time (Optional[datetime.datetime]) – Min value for job creation time. If set, only jobs created after or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.

  • max_creation_time (Optional[datetime.datetime]) – Max value for job creation time. If set, only jobs created before or at this timestamp are returned. If the datetime has no time zone, UTC is assumed.

Returns

Iterable of job instances.

Return type

google.api_core.page_iterator.Iterator
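The filters above can be combined, for example to find jobs that started recently and are still running. This sketch assumes an already constructed Client and that each returned job exposes a job_id attribute; the helper name recent_running_job_ids and its defaults are hypothetical.

```python
import datetime


def recent_running_job_ids(client, hours=1, limit=50):
    """Return IDs of running jobs created within the last `hours` hours."""
    # Use a timezone-aware timestamp so no UTC assumption is needed.
    now = datetime.datetime.now(datetime.timezone.utc)
    since = now - datetime.timedelta(hours=hours)
    jobs = client.list_jobs(
        state_filter="running",
        min_creation_time=since,
        max_results=limit,
    )
    return [job.job_id for job in jobs]
```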

list_models(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] List models in the dataset.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/models/list

Parameters

list_partitions(table, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

List the partitions in a table.

Parameters
Returns

A list of the partition ids present in the partitioned table

Return type

List[str]
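As an illustration of working with the returned IDs, the sketch below picks the most recent partition. It assumes an already constructed Client and that partition IDs are fixed-width, date-style digit strings (for example "20200131"), so that string order matches date order; any non-numeric IDs are skipped. The helper name latest_partition is hypothetical.

```python
def latest_partition(client, table):
    """Return the greatest all-digit partition ID, or None if there is none."""
    partition_ids = client.list_partitions(table)
    # Keep only digit-only IDs; this format is an assumption, see lead-in.
    dated = [pid for pid in partition_ids if pid.isdigit()]
    return max(dated) if dated else None
```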

list_projects(max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

List projects accessible to the current client.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/list

Parameters
  • max_results (Optional[int]) – Maximum number of projects to return. If not passed, defaults to a value set by the API.

  • page_token (Optional[str]) – Token representing a cursor into the projects. If not passed, the API will return the first page of projects. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the HTTPIterator.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

Iterator of Project accessible to the current client.

Return type

google.api_core.page_iterator.Iterator

list_routines(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] List routines in the dataset.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/routines/list

Parameters

list_rows(table, selected_fields=None, max_results=None, page_token=None, start_index=None, page_size=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

List the rows of the table.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list

Note

This method assumes that the provided schema is up-to-date with the schema as defined on the back-end: if the two schemas are not identical, the values returned may be incomplete. To ensure that the local copy of the schema is up-to-date, call client.get_table.

Parameters
  • table (Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableListItem, google.cloud.bigquery.table.TableReference, str, ]) – The table to list, or a reference to it. When the table object does not contain a schema and selected_fields is not supplied, this method calls get_table to fetch the table schema.

  • selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – The fields to return. If not supplied, data for all columns are downloaded.

  • max_results (Optional[int]) – Maximum number of rows to return.

  • page_token (Optional[str]) – Token representing a cursor into the table’s rows. If not passed, the API will return the first page of the rows. The token marks the beginning of the iterator to be returned and the value of the page_token can be accessed at next_page_token of the RowIterator.

  • start_index (Optional[int]) – The zero-based index of the starting row to read.

  • page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.

Returns

Iterator of Row objects holding the row data. During each page, the iterator will have the total_rows attribute set, which counts the total number of rows in the table (this is distinct from the total number of rows in the current page: iterator.page.num_items).

Return type

google.cloud.bigquery.table.RowIterator
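A small sketch of previewing a table with list_rows, assuming an already constructed Client and that each returned row offers an items() method yielding (column, value) pairs. The helper name preview_rows is hypothetical.

```python
def preview_rows(client, table, limit=10):
    """Return up to `limit` rows of the table as plain dictionaries."""
    # max_results caps the total number of rows returned by the iterator.
    rows = client.list_rows(table, max_results=limit)
    # items() on each row is an assumption, see lead-in.
    return [dict(row.items()) for row in rows]
```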

list_tables(dataset, max_results=None, page_token=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

List tables in the dataset.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list

Parameters
Returns

Iterator of TableListItem contained within the requested dataset.

Return type

google.api_core.page_iterator.Iterator

load_table_from_dataframe(dataframe, destination, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, parquet_compression='snappy', timeout=None)[source]

Upload the contents of a table from a pandas DataFrame.

Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.

Note

REPEATED fields are NOT supported when using the CSV source format. They are supported when using the PARQUET source format, but due to the way they are encoded in the parquet file, a mismatch with the existing table schema can occur, so 100% compatibility cannot be guaranteed for REPEATED fields when using the parquet format.

https://github.com/googleapis/python-bigquery/issues/17

Parameters
  • dataframe (pandas.DataFrame) – A DataFrame containing the data to load.

  • destination (google.cloud.bigquery.table.TableReference) –

    The destination table to use for loading the data. If it is an existing table, the schema of the DataFrame must match the schema of the destination table. If the table does not yet exist, the schema is inferred from the DataFrame.

    If a string is passed in, this method attempts to create a table reference from a string using google.cloud.bigquery.table.TableReference.from_string().

Keyword Arguments
  • num_retries (Optional[int]) – Number of upload retries.

  • job_id (Optional[str]) – Name of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of the destination table.

  • project (Optional[str]) – Project ID of the project of where to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) –

    Extra configuration options for the job.

    To override the default pandas data type conversions, supply a value for schema with column names matching those of the dataframe. The BigQuery schema is used to determine the correct data type conversion. Indexes are not loaded. Requires the pyarrow library.

    By default, this method uses the parquet source format. To override this, supply a value for source_format with the format name. Currently only CSV and PARQUET are supported.

  • parquet_compression (Optional[str]) –

    [Beta] The compression method to use if the dataframe is serialized to an intermediate parquet file.

    The argument is directly passed as the compression argument to the underlying pyarrow.parquet.write_table() method (the default value “snappy” gets converted to uppercase). https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table

    If the job config schema is missing, the argument is directly passed as the compression argument to the underlying DataFrame.to_parquet() method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new load job.

Return type

google.cloud.bigquery.job.LoadJob

Raises
  • ValueError – If a usable parquet engine cannot be found. This method requires pyarrow to be installed.

  • TypeError – If job_config is not an instance of LoadJobConfig class.
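
The job_id / job_id_prefix rule described in the keyword arguments can be sketched as follows. This is only an illustration of the documented precedence: the helper name resolve_job_id is hypothetical and not part of the library, and the random UUID suffix is an assumption about how the generated part is produced.

```python
import uuid


def resolve_job_id(job_id=None, job_id_prefix=None):
    """Pick a job ID following the documented precedence.

    Hypothetical helper for illustration only. The UUID suffix is an
    assumption about how the random part of the ID is generated.
    """
    if job_id is not None:
        # An explicit job_id wins; any prefix is ignored.
        return job_id
    random_part = str(uuid.uuid4())
    if job_id_prefix is not None:
        # The prefix is prepended to the randomly generated part.
        return str(job_id_prefix) + random_part
    return random_part
```

Calling resolve_job_id("my-job", "nightly-") returns "my-job" unchanged, while resolve_job_id(job_id_prefix="nightly-") returns an ID that starts with "nightly-".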

load_table_from_file(file_obj, destination, rewind=False, size=None, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, timeout=None)[source]

Upload the contents of this table from a file-like object.

Similar to load_table_from_uri(), this method creates, starts and returns a LoadJob.

Parameters
Keyword Arguments
  • rewind (Optional[bool]) – If True, seek to the beginning of the file handle before reading the file.

  • size (Optional[int]) – The number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.

  • num_retries (Optional[int]) – Number of upload retries. Defaults to 6.

  • job_id (Optional[str]) – Name of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of the destination table.

  • project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new load job.

Return type

google.cloud.bigquery.job.LoadJob

Raises
  • ValueError – If size is not passed in and cannot be determined, or if file_obj is detected to be a file opened in text mode.

  • TypeError – If job_config is not an instance of LoadJobConfig class.
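
A minimal sketch of calling load_table_from_file with an in-memory buffer, showing why rewind and binary mode matter. The helper name load_rows_as_csv is hypothetical, client is assumed to be a google.cloud.bigquery.Client instance, and the caller is assumed to pass a job_config suited to CSV data when one is needed.

```python
import csv
import io


def load_rows_as_csv(client, rows, destination, job_config=None):
    """Serialize rows to CSV in memory and start a load job.

    Hypothetical helper; `client` is assumed to expose
    load_table_from_file as documented above.
    """
    # Build the CSV as text first so the csv module handles quoting.
    text = io.StringIO()
    csv.writer(text).writerows(rows)

    # The file object must be binary: a text-mode file raises ValueError.
    file_obj = io.BytesIO()
    file_obj.write(text.getvalue().encode("utf-8"))

    # After write() the cursor sits at the end of the buffer, so
    # rewind=True is needed for the upload to start from byte 0.
    return client.load_table_from_file(
        file_obj,
        destination,
        rewind=True,
        job_config=job_config,
    )
```

Without rewind=True the upload would begin at the current cursor position, which in this sketch is the end of the buffer.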

load_table_from_json(json_rows, destination, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, timeout=None)[source]

Upload the contents of a table from a JSON string or dict.

Parameters
Keyword Arguments
  • num_retries (Optional[int]) – Number of upload retries.

  • job_id (Optional[str]) – Name of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of the destination table.

  • project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job. The source_format setting is always set to NEWLINE_DELIMITED_JSON.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new load job.

Return type

google.cloud.bigquery.job.LoadJob

Raises

TypeError – If job_config is not an instance of LoadJobConfig class.
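
Because source_format is always NEWLINE_DELIMITED_JSON for this method, the rows end up as one JSON object per line. The sketch below shows that encoding under the assumption that the input is an iterable of JSON-serializable dicts; the helper name to_newline_delimited_json is hypothetical and is not a library function.

```python
import json


def to_newline_delimited_json(json_rows):
    """Encode an iterable of dict rows as newline-delimited JSON bytes.

    Hypothetical helper illustrating the NEWLINE_DELIMITED_JSON format:
    each row becomes one JSON object on its own line.
    """
    text = "\n".join(json.dumps(row) for row in json_rows)
    return text.encode("utf-8")
```

Each line of the result can be parsed independently with json.loads, which is what makes the format suitable for row-by-row loading.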

load_table_from_uri(source_uris, destination, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Starts a job for loading data into a table from Cloud Storage.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload

Parameters
Keyword Arguments
  • job_id (Optional[str]) – Name of the job.

  • job_id_prefix (Optional[str]) – The user-provided prefix for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of the destination table.

  • project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.

  • job_config (Optional[google.cloud.bigquery.job.LoadJobConfig]) – Extra configuration options for the job.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new load job.

Return type

google.cloud.bigquery.job.LoadJob

Raises

TypeError – If job_config is not an instance of LoadJobConfig class.

property location

Default location for jobs / datasets / tables.

query(query, job_config=None, job_id=None, job_id_prefix=None, location=None, project=None, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Run a SQL query.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery

Parameters

query (str) – SQL query to be executed. Defaults to the standard SQL dialect. Use the job_config parameter to change dialects.

Keyword Arguments
  • job_config (Optional[google.cloud.bigquery.job.QueryJobConfig]) – Extra configuration options for the job. To override any options that were previously set in the default_query_job_config given to the Client constructor, manually set those options to None, or whatever value is preferred.

  • job_id (Optional[str]) – ID to use for the query job.

  • job_id_prefix (Optional[str]) – The prefix to use for a randomly generated job ID. This parameter will be ignored if a job_id is also given.

  • location (Optional[str]) – Location where to run the job. Must match the location of any table used in the query as well as the destination table.

  • project (Optional[str]) – Project ID of the project in which to run the job. Defaults to the client’s project.

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

A new query job instance.

Return type

google.cloud.bigquery.job.QueryJob

Raises

TypeError – If job_config is not an instance of QueryJobConfig class.
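
A short sketch of running a query and collecting its rows. The helper name fetch_rows is hypothetical, and client is assumed to be a google.cloud.bigquery.Client instance; result() is the QueryJob method that waits for the job to finish and returns an iterator over the rows.

```python
def fetch_rows(client, sql, job_id_prefix=None, location=None):
    """Run a SQL query and return all result rows as a list.

    Hypothetical helper; `client` is assumed to expose query() as
    documented above.
    """
    query_job = client.query(
        sql,
        job_id_prefix=job_id_prefix,
        location=location,
    )
    # result() blocks until the job completes, then iterates the rows.
    return list(query_job.result())
```

Collecting everything into a list is convenient for small results; for large results it may be preferable to iterate over query_job.result() directly.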

schema_from_json(file_or_path)[source]

Takes a file object or file path containing JSON that describes a table schema.

Returns

List of schema field objects.

schema_to_json(schema_list, destination)[source]

Takes a list of schema field objects.

Serializes the list of schema field objects as JSON to a file.

Destination is a file path or a file object.
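
As an illustration of the JSON involved, the sketch below writes and reads a schema file using only the standard library. It assumes the file holds a JSON array of field objects shaped like the BigQuery REST API's table field schema (name, type, mode); the field names and helper names are made up for the example.

```python
import json

# Example field definitions; the names and values are illustrative only.
schema_fields = [
    {"name": "full_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
]


def write_schema_file(fields, path):
    """Write a list of field definitions to `path` as a JSON array."""
    with open(path, "w") as handle:
        json.dump(fields, handle, indent=2)


def read_schema_file(path):
    """Read the JSON array of field definitions back from `path`."""
    with open(path) as handle:
        return json.load(handle)
```

A file written this way could then be passed to schema_from_json as a path, which returns schema field objects rather than plain dicts.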

update_dataset(dataset, fields, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Change some fields of a dataset.

Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in dataset, it will be deleted.

If dataset.etag is not None, the update will only succeed if the dataset on the server has the same ETag. Thus reading a dataset with get_dataset, changing its fields, and then passing it to update_dataset will ensure that the changes will only be saved if no modifications to the dataset occurred since the read.

Parameters
  • dataset (google.cloud.bigquery.dataset.Dataset) – The dataset to update.

  • fields (Sequence[str]) –

    The properties of dataset to change. These are strings corresponding to the properties of Dataset.

    For example, to update the default expiration times, specify both properties in the fields argument:

    bigquery_client.update_dataset(
        dataset,
        [
            "default_partition_expiration_ms",
            "default_table_expiration_ms",
        ]
    )
    

  • retry (Optional[google.api_core.retry.Retry]) – How to retry the RPC.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

The modified Dataset instance.

Return type

google.cloud.bigquery.dataset.Dataset
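
The read, modify, update cycle described above can be sketched as follows. The helper name set_dataset_description is hypothetical, and client is assumed to be a google.cloud.bigquery.Client instance. Because the dataset returned by get_dataset carries its ETag, the update should only succeed if nothing changed on the server in between.

```python
def set_dataset_description(client, dataset_id, description):
    """Read a dataset, change its description, and save the change.

    Hypothetical helper; `client` is assumed to expose get_dataset()
    and update_dataset() as documented above.
    """
    # The fetched dataset includes its current etag.
    dataset = client.get_dataset(dataset_id)
    dataset.description = description
    # Only the listed fields are sent; other properties are left alone.
    return client.update_dataset(dataset, ["description"])
```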

update_model(model, fields, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] Change some fields of a model.

Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in model, the field value will be deleted.

If model.etag is not None, the update will only succeed if the model on the server has the same ETag. Thus reading a model with get_model, changing its fields, and then passing it to update_model will ensure that the changes will only be saved if no modifications to the model occurred since the read.

Parameters
  • model (google.cloud.bigquery.model.Model) – The model to update.

  • fields (Sequence[str]) –

    The properties of model to change. These are strings corresponding to the properties of Model.

    For example, to update the descriptive properties of the model, specify them in the fields argument:

    bigquery_client.update_model(
        model, ["description", "friendly_name"]
    )
    

  • retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

The model resource returned from the API call.

Return type

google.cloud.bigquery.model.Model

update_routine(routine, fields, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

[Beta] Change some fields of a routine.

Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in routine, the field value will be deleted.

Warning

During beta, partial updates are not supported. You must provide all fields in the resource.

If etag is not None, the update will only succeed if the resource on the server has the same ETag. Thus reading a routine with get_routine(), changing its fields, and then passing it to this method will ensure that the changes will only be saved if no modifications to the resource occurred since the read.

Parameters
  • routine (google.cloud.bigquery.routine.Routine) – The routine to update.

  • fields (Sequence[str]) –

    The fields of routine to change, spelled as the Routine properties.

    For example, to update the description property of the routine, specify it in the fields argument:

    bigquery_client.update_routine(
        routine, ["description"]
    )
    

  • retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

The routine resource returned from the API call.

Return type

google.cloud.bigquery.routine.Routine

update_table(table, fields, retry=<google.api_core.retry.Retry object>, timeout=None)[source]

Change some fields of a table.

Use fields to specify which fields to update. At least one field must be provided. If a field is listed in fields and is None in table, the field value will be deleted.

If table.etag is not None, the update will only succeed if the table on the server has the same ETag. Thus reading a table with get_table, changing its fields, and then passing it to update_table will ensure that the changes will only be saved if no modifications to the table occurred since the read.

Parameters
  • table (google.cloud.bigquery.table.Table) – The table to update.

  • fields (Sequence[str]) –

    The fields of table to change, spelled as the Table properties.

    For example, to update the descriptive properties of the table, specify them in the fields argument:

    bigquery_client.update_table(
        table,
        ["description", "friendly_name"]
    )
    

  • retry (Optional[google.api_core.retry.Retry]) – A description of how to retry the API call.

  • timeout (Optional[float]) – The number of seconds to wait for the underlying HTTP transport before using retry.

Returns

The table resource returned from the API call.

Return type

google.cloud.bigquery.table.Table
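
The deletion rule above (a field listed in fields whose value is None gets deleted) can be sketched like this. The helper name clear_table_fields is hypothetical, and client is assumed to be a google.cloud.bigquery.Client instance.

```python
def clear_table_fields(client, table_id, field_names):
    """Delete the given properties of a table by setting them to None.

    Hypothetical helper; `client` is assumed to expose get_table() and
    update_table() as documented above.
    """
    table = client.get_table(table_id)
    for name in field_names:
        # A None value for a listed field asks the API to delete it.
        setattr(table, name, None)
    return client.update_table(table, list(field_names))
```

For example, clear_table_fields(client, table_id, ["description", "friendly_name"]) would remove both descriptive properties in one update.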