As of January 1, 2020, this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information, please visit Python 2 support on Google Cloud.

API Reference

Note

Only functions and classes which are members of the pandas_gbq module are considered public. Submodules and their members are considered private.

read_gbq(query_or_table[, project_id, ...])

Load data from Google BigQuery using google-cloud-python.

to_gbq(dataframe, destination_table[, ...])

Write a DataFrame to a Google BigQuery table.

context

Storage for objects to be used throughout a session.

Context()

Storage for objects to be used throughout a session.

pandas_gbq.read_gbq(query_or_table, project_id=None, index_col=None, columns=None, reauth=False, auth_local_webserver=True, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=False, max_results=None, verbose=None, private_key=None, progress_bar_type='tqdm', dtypes=None, auth_redirect_uri=None, client_id=None, client_secret=None, *, col_order=None)[source]

Load data from Google BigQuery using google-cloud-python.

The main method a user calls to execute a query in Google BigQuery and read the results into a pandas DataFrame.

This method uses the Google Cloud client library (google-cloud-bigquery) to make requests to Google BigQuery.

See the How to authenticate with Google BigQuery guide for authentication instructions.

Parameters
  • query_or_table (str) – SQL query to return data values. If the string is a table ID, fetch the rows directly from the table without running a query.

  • project_id (str, optional) – Google Cloud Platform project ID. Optional when available from the environment.

  • index_col (str, optional) – Name of result column to use for index in results DataFrame.

  • columns (list(str), optional) – List of BigQuery column names in the desired order for results DataFrame.

  • reauth (boolean, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.

  • auth_local_webserver (bool, default True) –

    Use the local webserver flow instead of the console flow when getting user credentials. Your code must run on the same machine as your web browser, and your web browser must be able to access your application via localhost:808X.

    New in version 0.2.0.

  • dialect (str, default 'standard') –

    Note: The default value changed to 'standard' in version 0.10.0.

    SQL syntax dialect to use. Value can be one of:

    'legacy'

    Use BigQuery’s legacy SQL dialect. For more information see BigQuery Legacy SQL Reference.

    'standard'

    Use BigQuery’s standard SQL, which is compliant with the SQL 2011 standard. For more information see BigQuery Standard SQL Reference.

  • location (str, optional) –

    Location where the query job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of any datasets used in the query.

    New in version 0.5.0.

  • configuration (dict, optional) –

    Query config parameters for job processing. For example:

    configuration = {'query': {'useQueryCache': False}}

    For more information see BigQuery REST API Reference.

  • credentials (google.auth.credentials.Credentials, optional) –

    Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

    New in version 0.8.0.

  • use_bqstorage_api (bool, default False) –

    Use the BigQuery Storage API to download query results quickly, but at an increased cost. To use this API, first enable it in the Cloud Console. You must also have the bigquery.readsessions.create permission on the project you are billing queries to.

    This feature requires the google-cloud-bigquery-storage and pyarrow packages.

    This value is ignored if max_results is set.

    New in version 0.10.0.

  • max_results (int, optional) –

    If set, limit the maximum number of rows to fetch from the query results.

    New in version 0.12.0.

  • progress_bar_type (Optional[str], default 'tqdm') –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:

    None

    No progress bar.

    'tqdm'

    Use the tqdm.tqdm() function to print a progress bar to sys.stderr.

    'tqdm_notebook'

    Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui'

    Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

  • dtypes (dict, optional) – A dictionary of column names to pandas dtype. The provided dtype is used when constructing the series for the column specified. Otherwise, a default dtype is used.

  • verbose (None, deprecated) – Deprecated in pandas-gbq 0.4.0. Use the logging module to adjust verbosity instead.

  • private_key (str, deprecated) – Deprecated in pandas-gbq version 0.8.0. Use the credentials parameter and google.oauth2.service_account.Credentials.from_service_account_info() or google.oauth2.service_account.Credentials.from_service_account_file() instead.

  • auth_redirect_uri (str) – Path to the authentication page for organization-specific authentication workflows. Used when auth_local_webserver=False.

  • client_id (str) – The Client ID for the Google Cloud Project the user is attempting to connect to.

  • client_secret (str) – The Client Secret associated with the Client ID for the Google Cloud Project the user is attempting to connect to.

  • col_order (list(str), optional) – Alias for columns, retained for backwards compatibility.

Returns

df – DataFrame representing the results of the query.

Return type

DataFrame
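
Example

As an illustration (not taken from the library documentation), a minimal call might look like the following. The project ID 'my-project' is a placeholder; the query reads from the public bigquery-public-data.usa_names.usa_1910_2013 table, and any parameter not shown keeps its documented default:

>>> import pandas_gbq
>>> sql = """
...     SELECT name, SUM(number) AS total
...     FROM `bigquery-public-data.usa_names.usa_1910_2013`
...     GROUP BY name
...     ORDER BY total DESC
...     LIMIT 10
... """
>>> df = pandas_gbq.read_gbq(
...     sql,
...     project_id='my-project',  # placeholder: the project billed for the query
...     dialect='standard',
...     configuration={'query': {'useQueryCache': False}},  # optional job config
... )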

pandas_gbq.to_gbq(dataframe, destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=True, table_schema=None, location=None, progress_bar=True, credentials=None, api_method: str = 'default', verbose=None, private_key=None, auth_redirect_uri=None, client_id=None, client_secret=None)[source]

Write a DataFrame to a Google BigQuery table.

The main method a user calls to export pandas DataFrame contents to a Google BigQuery table.

This method uses the Google Cloud client library (google-cloud-bigquery) to make requests to Google BigQuery.

See the How to authenticate with Google BigQuery guide for authentication instructions.

Parameters
  • dataframe (pandas.DataFrame) – DataFrame to be written to a Google BigQuery table.

  • destination_table (str) – Name of table to be written, in the form dataset.tablename or project.dataset.tablename.

  • project_id (str, optional) – Google Cloud Platform project ID. Optional when available from the environment.

  • chunksize (int, optional) – Number of rows to be inserted in each chunk from the dataframe. Set to None to load the whole dataframe at once.

  • reauth (bool, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.

  • if_exists (str, default 'fail') –

    Behavior when the destination table exists. Value can be one of:

    'fail'

    If table exists, raise pandas_gbq.gbq.TableCreationError.

    'replace'

    If table exists, drop it, recreate it, and insert data.

    'append'

    If table exists, insert data. Create if does not exist.

  • auth_local_webserver (bool, default True) –

    Use the local webserver flow instead of the console flow when getting user credentials. Your code must run on the same machine as your web browser, and your web browser must be able to access your application via localhost:808X.

    New in version 0.2.0.

  • table_schema (list of dicts, optional) –

    List of BigQuery table fields to which the corresponding DataFrame columns conform, e.g. [{'name': 'col1', 'type': 'STRING'},...]. The type values must be BigQuery type names.

    • If table_schema is provided, it may contain all or a subset of DataFrame columns. If a subset is provided, the rest will be inferred from the DataFrame dtypes. If table_schema contains columns not in the DataFrame, they’ll be ignored.

    • If table_schema is not provided, it will be generated according to the dtypes of the DataFrame columns. See Inferring the Table Schema for a description of the schema inference.

    See the BigQuery API documentation on valid column names: https://cloud.google.com/bigquery/docs/schemas#column_names

    New in version 0.3.1.

  • location (str, optional) –

    Location where the load job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of the target dataset.

    New in version 0.5.0.

  • progress_bar (bool, default True) –

    Use the tqdm library to show a progress bar for the upload, chunk by chunk.

    New in version 0.5.0.

  • credentials (google.auth.credentials.Credentials, optional) –

    Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

    New in version 0.8.0.

  • api_method (str, optional) –

    API method used to upload the DataFrame to BigQuery. One of 'load_parquet' or 'load_csv'. Defaults to 'load_parquet' if pandas is version 1.1.0+, otherwise 'load_csv'.

    New in version 0.16.0.

  • verbose (bool, deprecated) – Deprecated in pandas-gbq 0.4.0. Use the logging module to adjust verbosity instead.

  • private_key (str, deprecated) – Deprecated in pandas-gbq version 0.8.0. Use the credentials parameter and google.oauth2.service_account.Credentials.from_service_account_info() or google.oauth2.service_account.Credentials.from_service_account_file() instead.

  • auth_redirect_uri (str) – Path to the authentication page for organization-specific authentication workflows. Used when auth_local_webserver=False.

  • client_id (str) – The Client ID for the Google Cloud Project the user is attempting to connect to.

  • client_secret (str) – The Client Secret associated with the Client ID for the Google Cloud Project the user is attempting to connect to.
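
Example

As an illustration (not taken from the library documentation), a minimal upload might look like the following. The destination 'my_dataset.my_table' and project 'my-project' are placeholders, and the table_schema entry overrides the inferred type for a single column (the rest are inferred from the DataFrame dtypes):

>>> import pandas as pd
>>> import pandas_gbq
>>> df = pd.DataFrame({'name': ['alpha', 'beta'], 'value': [1, 2]})
>>> pandas_gbq.to_gbq(
...     df,
...     destination_table='my_dataset.my_table',  # placeholder dataset.tablename
...     project_id='my-project',                  # placeholder project ID
...     if_exists='append',
...     table_schema=[{'name': 'value', 'type': 'INTEGER'}],
... )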

pandas_gbq.context = <pandas_gbq.gbq.Context object>

Storage for objects to be used throughout a session.

A Context object is initialized when the pandas_gbq module is imported, and can be found at pandas_gbq.context.

class pandas_gbq.Context[source]

Storage for objects to be used throughout a session.

A Context object is initialized when the pandas_gbq module is imported, and can be found at pandas_gbq.context.

property credentials

Credentials to use for Google APIs.

These credentials are automatically cached in memory by calls to pandas_gbq.read_gbq() and pandas_gbq.to_gbq(). To manually set the credentials, construct a google.auth.credentials.Credentials object and set it as the context credentials as demonstrated in the example below. See the auth docs for more information on obtaining credentials.

Return type

google.auth.credentials.Credentials

Examples

Manually setting the context credentials:

>>> import pandas_gbq
>>> from google.oauth2 import service_account
>>> credentials = service_account.Credentials.from_service_account_file(
...     '/path/to/key.json',
... )
>>> pandas_gbq.context.credentials = credentials

property dialect

Default dialect to use in pandas_gbq.read_gbq().

Allowed values for the BigQuery SQL syntax dialect:

'legacy'

Use BigQuery’s legacy SQL dialect. For more information see BigQuery Legacy SQL Reference.

'standard'

Use BigQuery’s standard SQL, which is compliant with the SQL 2011 standard. For more information see BigQuery Standard SQL Reference.

Return type

str

Examples

Setting the default syntax to standard:

>>> import pandas_gbq
>>> pandas_gbq.context.dialect = 'standard'

property project

Default project to use for calls to Google APIs.

Return type

str

Examples

Manually setting the context project:

>>> import pandas_gbq
>>> pandas_gbq.context.project = 'my-project'