API Reference¶
Note
Only functions and classes which are members of the pandas_gbq module are considered public. Submodules and their members are considered private.
pandas_gbq.read_gbq – Read data from Google BigQuery to a pandas DataFrame.
pandas_gbq.to_gbq – Write a DataFrame to a Google BigQuery table.
pandas_gbq.context – Storage for objects to be used throughout a session.
pandas_gbq.Context – Storage for objects to be used throughout a session.
- pandas_gbq.read_gbq(query_or_table, project_id=None, index_col=None, columns=None, reauth=False, auth_local_webserver=True, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=False, max_results=None, verbose=None, private_key=None, progress_bar_type='tqdm', dtypes=None, auth_redirect_uri=None, client_id=None, client_secret=None, *, col_order=None)[source]¶
Read data from Google BigQuery to a pandas DataFrame.
Run a SQL query in BigQuery, or read rows directly from a table. Uses the Python client libraries for BigQuery and BigQuery Storage to make API requests.
See the How to authenticate with Google BigQuery guide for authentication instructions.
Note
Consider using BigQuery DataFrames to process large results with pandas compatible APIs that run in the BigQuery SQL query engine. This provides an opportunity to save on costs and improve performance.
- Parameters
query_or_table (str) – SQL query to return data values. If the string is a table ID, fetch the rows directly from the table without running a query.
project_id (str, optional) – Google Cloud Platform project ID. Optional when available from the environment.
index_col (str, optional) – Name of result column to use for index in results DataFrame.
columns (list(str), optional) – List of BigQuery column names in the desired order for results DataFrame.
reauth (boolean, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.
auth_local_webserver (bool, default True) – Use the local webserver flow instead of the console flow when getting user credentials. Your code must run on the same machine as your web browser, and your web browser must be able to access your application via localhost:808X.
New in version 0.2.0.
dialect (str, default 'standard') –
Note: The default value changed to ‘standard’ in version 0.10.0.
SQL syntax dialect to use. Value can be one of:
'legacy'
Use BigQuery’s legacy SQL dialect. For more information see BigQuery Legacy SQL Reference.
'standard'
Use BigQuery’s standard SQL, which is compliant with the SQL 2011 standard. For more information see BigQuery Standard SQL Reference.
location (str, optional) –
Location where the query job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of any datasets used in the query.
New in version 0.5.0.
configuration (dict, optional) –
Query config parameters for job processing. For example:
configuration = {'query': {'useQueryCache': False}}
For more information see BigQuery REST API Reference.
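A sketch of a slightly larger configuration dict, assuming the standard BigQuery REST job configuration fields; the values shown are illustrative, not defaults.

```python
# Mirrors the BigQuery REST API job configuration resource. `useQueryCache`
# and `maximumBytesBilled` are standard fields of the `query` sub-resource.
configuration = {
    "query": {
        "useQueryCache": False,              # always recompute, never serve cached results
        "maximumBytesBilled": "1000000000",  # abort queries that would bill more than ~1 GB
    }
}

# It would then be passed as:
# df = pandas_gbq.read_gbq(sql, configuration=configuration)
```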
credentials (google.auth.credentials.Credentials, optional) – Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.
New in version 0.8.0.
use_bqstorage_api (bool, default False) – Use the BigQuery Storage API to download query results quickly, but at an increased cost. To use this API, first enable it in the Cloud Console. You must also have the bigquery.readsessions.create permission on the project you are billing queries to.
This feature requires the google-cloud-bigquery-storage and pyarrow packages.
This value is ignored if max_results is set.
New in version 0.10.0.
max_results (int, optional) –
If set, limit the maximum number of rows to fetch from the query results.
New in version 0.12.0.
progress_bar_type (Optional[str]) – If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:
None
No progress bar.
'tqdm'
Use the tqdm.tqdm() function to print a progress bar to sys.stderr.
'tqdm_notebook'
Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui'
Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
dtypes (dict, optional) – A dictionary of column names to pandas dtype. The provided dtype is used when constructing the series for the column specified. Otherwise, a default dtype is used.
verbose (None, deprecated) – Deprecated in pandas-gbq 0.4.0. Use the logging module to adjust verbosity instead.
private_key (str, deprecated) – Deprecated in pandas-gbq version 0.8.0. Use the credentials parameter and google.oauth2.service_account.Credentials.from_service_account_info() or google.oauth2.service_account.Credentials.from_service_account_file() instead.
auth_redirect_uri (str) – Path to the authentication page for organization-specific authentication workflows. Used when auth_local_webserver=False.
client_id (str) – The Client ID for the Google Cloud Project the user is attempting to connect to.
client_secret (str) – The Client Secret associated with the Client ID for the Google Cloud Project the user is attempting to connect to.
col_order (list(str), optional) – Alias for columns, retained for backwards compatibility.
- Returns
df – DataFrame representing results of query.
- Return type
DataFrame
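A minimal usage sketch, assuming a placeholder project ID ("my-project") and a public BigQuery dataset; the commented-out call requires the pandas-gbq package and valid Google Cloud credentials, so only the query construction runs locally.

```python
import textwrap

# Aggregate a public dataset; `bigquery-public-data.usa_names.usa_1910_2013`
# is one of Google's public sample tables.
sql = textwrap.dedent("""\
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10""")

# import pandas_gbq
# df = pandas_gbq.read_gbq(
#     sql,
#     project_id="my-project",
#     index_col="name",   # use the `name` column as the DataFrame index
#     max_results=10,     # redundant with LIMIT above, shown for illustration
# )
```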
- pandas_gbq.to_gbq(dataframe, destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=True, table_schema=None, location=None, progress_bar=True, credentials=None, api_method: str = 'default', verbose=None, private_key=None, auth_redirect_uri=None, client_id=None, client_secret=None, user_agent=None, rfc9110_delimiter=False)[source]¶
Write a DataFrame to a Google BigQuery table.
The main method a user calls to export pandas DataFrame contents to a Google BigQuery table.
This method uses the Google Cloud BigQuery client library to make requests to Google BigQuery.
See the How to authenticate with Google BigQuery guide for authentication instructions.
- Parameters
dataframe (pandas.DataFrame) – DataFrame to be written to a Google BigQuery table.
destination_table (str) – Name of table to be written, in the form dataset.tablename or project.dataset.tablename.
project_id (str, optional) – Google Cloud Platform project ID. Optional when available from the environment.
chunksize (int, optional) – Number of rows to be inserted in each chunk from the dataframe. Set to None to load the whole dataframe at once.
reauth (bool, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.
if_exists (str, default 'fail') –
Behavior when the destination table exists. Value can be one of:
'fail'
If table exists, raise an error and do not write.
'replace'
If table exists, drop it, recreate it, and insert data.
'append'
If table exists, insert data. Create if does not exist.
auth_local_webserver (bool, default True) – Use the local webserver flow instead of the console flow when getting user credentials. Your code must run on the same machine as your web browser, and your web browser must be able to access your application via localhost:808X.
New in version 0.2.0.
table_schema (list of dicts, optional) – List of BigQuery table fields to which the DataFrame columns conform, e.g. [{'name': 'col1', 'type': 'STRING'},...]. The type values must be BigQuery type names.
If table_schema is provided, it may contain all or a subset of DataFrame columns. If a subset is provided, the rest will be inferred from the DataFrame dtypes. If table_schema contains columns not in the DataFrame, they'll be ignored.
If table_schema is not provided, it will be generated according to dtypes of DataFrame columns. See Inferring the Table Schema for a description of the schema inference.
See the BigQuery API documentation on valid column names: https://cloud.google.com/bigquery/docs/schemas#column_names.
New in version 0.3.1.
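A hand-written schema sketch; the column names and BigQuery types here are illustrative, not taken from the library. Any DataFrame columns left out of the list are inferred from their dtypes, as described above.

```python
# Each entry is a dict with BigQuery field `name` and `type`.
table_schema = [
    {"name": "name", "type": "STRING"},
    {"name": "signup_date", "type": "DATE"},
]

# It would then be passed as:
# pandas_gbq.to_gbq(df, "my_dataset.my_table", table_schema=table_schema)
```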
location (str, optional) –
Location where the load job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of the target dataset.
New in version 0.5.0.
progress_bar (bool, default True) –
Use the library tqdm to show the progress bar for the upload, chunk by chunk.
New in version 0.5.0.
credentials (google.auth.credentials.Credentials, optional) – Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.
New in version 0.8.0.
api_method (str, optional) –
API method used to upload DataFrame to BigQuery. One of “load_parquet”, “load_csv”. Default “load_parquet” if pandas is version 1.1.0+, otherwise “load_csv”.
New in version 0.16.0.
verbose (bool, deprecated) – Deprecated in Pandas-GBQ 0.4.0. Use the logging module to adjust verbosity instead.
private_key (str, deprecated) – Deprecated in pandas-gbq version 0.8.0. Use the credentials parameter and google.oauth2.service_account.Credentials.from_service_account_info() or google.oauth2.service_account.Credentials.from_service_account_file() instead.
auth_redirect_uri (str) – Path to the authentication page for organization-specific authentication workflows. Used when auth_local_webserver=False.
client_id (str) – The Client ID for the Google Cloud Project the user is attempting to connect to.
client_secret (str) – The Client Secret associated with the Client ID for the Google Cloud Project the user is attempting to connect to.
user_agent (str) – Custom user agent string used as a prefix to the pandas version.
rfc9110_delimiter (bool) –
Sets user agent delimiter to a hyphen or a slash. Default is False, meaning a hyphen will be used.
New in version 0.23.3.
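A minimal upload sketch tying the parameters above together; the table name and project ID are placeholders, and the commented-out call requires pandas, pandas-gbq, and valid credentials, so only the table-name handling runs locally.

```python
# `destination_table` uses the dataset.tablename form documented above.
destination_table = "my_dataset.my_table"   # or "my-project.my_dataset.my_table"
dataset_id, table_id = destination_table.split(".")

# import pandas
# import pandas_gbq
# df = pandas.DataFrame({"name": ["alice", "bob"], "score": [10, 20]})
# pandas_gbq.to_gbq(
#     df,
#     destination_table,
#     project_id="my-project",
#     if_exists="append",   # insert rows, creating the table if it is missing
#     chunksize=10_000,     # upload in chunks of 10,000 rows
# )
```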
- pandas_gbq.context = <pandas_gbq.gbq.Context object>¶
Storage for objects to be used throughout a session.
A Context object is initialized when the pandas_gbq module is imported, and can be found at pandas_gbq.context.
- class pandas_gbq.Context[source]¶
Storage for objects to be used throughout a session.
A Context object is initialized when the pandas_gbq module is imported, and can be found at pandas_gbq.context.
- property credentials¶
Credentials to use for Google APIs.
These credentials are automatically cached in memory by calls to pandas_gbq.read_gbq() and pandas_gbq.to_gbq(). To manually set the credentials, construct a google.auth.credentials.Credentials object and set it as the context credentials as demonstrated in the example below. See the authentication docs for more information on obtaining credentials.
- Return type
google.auth.credentials.Credentials
Examples
Manually setting the context credentials:
>>> import pandas_gbq
>>> from google.oauth2 import service_account
>>> credentials = service_account.Credentials.from_service_account_file(
...     '/path/to/key.json',
... )
>>> pandas_gbq.context.credentials = credentials
- property dialect¶
Default dialect to use in pandas_gbq.read_gbq().
Allowed values for the BigQuery SQL syntax dialect:
'legacy'
Use BigQuery’s legacy SQL dialect. For more information see BigQuery Legacy SQL Reference.
'standard'
Use BigQuery’s standard SQL, which is compliant with the SQL 2011 standard. For more information see BigQuery Standard SQL Reference.
- Return type
str
Examples
Setting the default syntax to standard:
>>> import pandas_gbq
>>> pandas_gbq.context.dialect = 'standard'