Reading Tables¶
Use the pandas_gbq.read_gbq() function to run a BigQuery query and download the results as a pandas.DataFrame object.
import pandas_gbq
# TODO: Set project_id to your Google Cloud Platform project ID.
# project_id = "my-project"
sql = """
SELECT country_name, alpha_2_code
FROM `bigquery-public-data.utility_us.country_code_iso`
WHERE alpha_2_code LIKE 'A%'
"""
df = pandas_gbq.read_gbq(sql, project_id=project_id)
Note: A project ID is optional if it can be inferred during authentication, but it is required when authenticating with user credentials. You can find your project ID in the Google Cloud console.
You can define which column from BigQuery to use as an index in the destination DataFrame, as well as a preferred column order, as follows:
data_frame = pandas_gbq.read_gbq(
    'SELECT * FROM `test_dataset.test_table`',
    project_id=project_id,
    index_col='index_column_name',
    columns=['col1', 'col2'],
)
Querying with legacy SQL syntax¶
The dialect argument can be used to indicate whether to use BigQuery's 'legacy' SQL or BigQuery's 'standard' SQL. The default value is 'standard'.
sql = """
SELECT country_name, alpha_2_code
FROM [bigquery-public-data:utility_us.country_code_iso]
WHERE alpha_2_code LIKE 'Z%'
"""
df = pandas_gbq.read_gbq(
sql,
project_id=project_id,
# Set the dialect to "legacy" to use legacy SQL syntax. As of
# pandas-gbq version 0.10.0, the default dialect is "standard".
dialect="legacy",
)
Inferring the DataFrame’s dtypes¶
The read_gbq() method infers the pandas dtype for each column, based on the BigQuery table schema.
BigQuery Data Type | dtype
---|---
BOOL | boolean
INT64 | Int64
FLOAT64 | float64
TIME | dbtime
DATE | dbdate or object
DATETIME | datetime64[ns] or object
TIMESTAMP | datetime64[ns, UTC] or object
If any DATE/DATETIME/TIMESTAMP value is outside the range of pandas.Timestamp.min (1677-09-22) to pandas.Timestamp.max (2262-04-11), the column maps to the pandas object dtype.
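The cutoff comes from pandas' nanosecond-resolution Timestamp type; the exact bounds can be inspected locally, with no BigQuery access needed:

```python
import pandas as pd

# datetime64[ns] can only represent timestamps inside this window; a
# DATE/DATETIME/TIMESTAMP value outside it is returned with dtype object.
print(pd.Timestamp.min)  # lower bound, in late 1677
print(pd.Timestamp.max)  # upper bound, in 2262
```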
Improving download performance¶
Use the BigQuery Storage API to download large (>125 MB) query results more quickly (but at an increased cost) by setting use_bqstorage_api to True.
- Enable the BigQuery Storage API on the project you are using to run queries.
- Ensure you have the bigquery.readsessions.create permission to create BigQuery Storage API read sessions. This permission is provided by the bigquery.user role.
- Install the google-cloud-bigquery-storage and pyarrow packages. With pip:
  pip install --upgrade google-cloud-bigquery-storage pyarrow
- Set use_bqstorage_api to True when calling the read_gbq() function. As of google-cloud-bigquery version 1.11.1 or later, the function will fall back to the BigQuery API if the BigQuery Storage API cannot be used, such as with small query results.
Advanced configuration¶
You can specify the query config as a parameter to use additional options for your job. Refer to the JobConfiguration REST resource reference for details.
configuration = {
    'query': {
        "useQueryCache": False
    }
}
data_frame = pandas_gbq.read_gbq(
    'SELECT * FROM `test_dataset.test_table`',
    project_id=project_id,
    configuration=configuration,
)