Python Client for Google BigQuery Storage API¶
Quick Start¶
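In order to use this library, you first need to go through the following steps:
1. Select or create a Cloud Platform project.
2. Enable billing for your project.
3. Enable the Google BigQuery Storage API.
4. Set up authentication.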
Installation¶
Install this library in a virtual environment using venv. venv is a tool that creates isolated Python environments. These isolated environments can have separate versions of Python packages, which allows you to isolate one project’s dependencies from the dependencies of other projects.
With venv, it’s possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.
Code samples and snippets¶
Code samples and snippets live in the samples/ folder.
Supported Python Versions¶
Our client libraries are compatible with all current active and maintenance versions of Python.
Python >= 3.7
Unsupported Python Versions¶
Python <= 3.6
If you are using an end-of-life version of Python, we recommend that you update as soon as possible to an actively supported version.
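As a quick illustration of the version bounds above, the following sketch checks the running interpreter against the stated minimum of Python 3.7. The helper name `is_supported` and the constant `MINIMUM_PYTHON` are hypothetical and are not part of this library.

```python
import sys

# Lower bound taken from the "Supported Python Versions" section above.
MINIMUM_PYTHON = (3, 7)


def is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if is_supported():
    print("This interpreter meets the stated minimum version.")
else:
    print("This interpreter is older than Python 3.7; consider upgrading.")
```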
Mac/Linux¶
python3 -m venv <your-env>
source <your-env>/bin/activate
pip install google-cloud-bigquery-storage
Windows¶
py -m venv <your-env>
.\<your-env>\Scripts\activate
pip install google-cloud-bigquery-storage
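After installing, one way to confirm that the package is visible inside the active virtual environment is to look up its distribution metadata. This is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("google-cloud-bigquery-storage")
if version is None:
    print("google-cloud-bigquery-storage is not installed in this environment")
else:
    print("google-cloud-bigquery-storage", version)
```

If the lookup reports that the package is missing, the virtual environment is probably not activated in the current shell.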
Next Steps¶
- Read the Client Library Documentation for Google BigQuery Storage API to see other available methods on the client.
- Read the Google BigQuery Storage API Product documentation to learn more about the product and see How-to Guides.
- View this README to see the full list of Cloud APIs that we cover.
Note
Because this client uses the grpc library, it is safe to share instances across threads. In multiprocessing scenarios, the best practice is to create client instances after the invocation of os.fork() by multiprocessing.pool.Pool or multiprocessing.Process.
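One way to follow the multiprocessing advice in the note is to build the client inside a pool initializer, so that each worker process creates its own instance after it has started. The sketch below is an illustration under that assumption; `init_worker`, `describe_worker`, `make_read_client`, and `run_in_pool` are hypothetical helper names, not part of this library.

```python
import multiprocessing
import os

# Filled in by init_worker(); each worker process holds its own instance.
_worker_client = None


def init_worker(client_factory):
    """Pool initializer: runs once inside each worker process."""
    global _worker_client
    _worker_client = client_factory()


def describe_worker(_item):
    """Example task: report the worker's PID and whether its client exists."""
    return os.getpid(), _worker_client is not None


def make_read_client():
    """Build the real client; the import is deferred until a worker needs it."""
    from google.cloud.bigquery_storage import BigQueryReadClient

    return BigQueryReadClient()


def run_in_pool(items, client_factory=make_read_client, processes=2):
    """Map describe_worker over items with one client per worker process."""
    with multiprocessing.Pool(
        processes=processes, initializer=init_worker, initargs=(client_factory,)
    ) as pool:
        return pool.map(describe_worker, items)
```

A script would typically call `run_in_pool(range(4))` under an `if __name__ == "__main__":` guard. Within a single process, by contrast, one client can simply be shared across threads.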
Example Usage¶
from google.cloud.bigquery_storage import BigQueryReadClient, types
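# TODO(developer): Set the project_id variable.
# project_id = 'your-project-id'
#
# The read session is created in this project. This project can be
# different from the one that contains the table.
#
# Optionally set snapshot_millis (milliseconds since the Unix epoch) to read
# the table as of that time. With 0, no snapshot time is set.
snapshot_millis = 0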
client = BigQueryReadClient()
# This example reads baby name data from the public datasets.
table = "projects/{}/datasets/{}/tables/{}".format(
    "bigquery-public-data", "usa_names", "usa_1910_current"
)
requested_session = types.ReadSession()
requested_session.table = table
# This API can also deliver data serialized in Apache Arrow format.
# This example leverages Apache Avro.
requested_session.data_format = types.DataFormat.AVRO
# We limit the output columns to a subset of those allowed in the table,
# and set a simple filter to only report names from the state of
# Washington (WA).
requested_session.read_options.selected_fields = ["name", "number", "state"]
requested_session.read_options.row_restriction = 'state = "WA"'
# Set a snapshot time if it's been specified.
if snapshot_millis > 0:
    snapshot_time = types.Timestamp()
    snapshot_time.FromMilliseconds(snapshot_millis)
    requested_session.table_modifiers.snapshot_time = snapshot_time
parent = "projects/{}".format(project_id)
session = client.create_read_session(
    parent=parent,
    read_session=requested_session,
    # We'll use only a single stream for reading data from the table. However,
    # if you wanted to fan out multiple readers you could do so by having a
    # reader process each individual stream.
    max_stream_count=1,
)
reader = client.read_rows(session.streams[0].name)
# The read stream contains blocks of Avro-encoded bytes. The rows() method
# uses the fastavro library to parse these blocks as an iterable of Python
# dictionaries. Install fastavro with the following command:
#
# pip install google-cloud-bigquery-storage[fastavro]
rows = reader.rows(session)
# Do any local processing by iterating over the rows. The
# google-cloud-bigquery-storage client reconnects to the API after any
# transient network errors or timeouts.
names = set()
states = set()
# fastavro raises EOFError instead of StopIteration starting with v1.8.4.
# See https://github.com/googleapis/python-bigquery-storage/pull/687
try:
    for row in rows:
        names.add(row["name"])
        states.add(row["state"])
except EOFError:
    pass
print("Got {} unique names in states: {}".format(len(names), ", ".join(states)))
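The comments in the example mention fanning out one reader per stream. A minimal sketch of that idea, assuming the session was created with `max_stream_count` greater than 1, is shown below. It uses only the `read_rows()` and `rows()` calls from the example above and shares one client across threads, as the note on thread safety permits. The helper names `count_rows_in_stream` and `count_rows` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rows_in_stream(client, session, stream_name):
    """Read one stream to the end and return the number of rows seen."""
    reader = client.read_rows(stream_name)
    count = 0
    try:
        for _row in reader.rows(session):
            count += 1
    except EOFError:
        # Mirrors the EOFError handling in the example above.
        pass
    return count


def count_rows(client, session, max_workers=4):
    """Count rows across all streams of a session, one thread per stream."""
    stream_names = [stream.name for stream in session.streams]
    if not stream_names:
        # A session can have no streams, for example for an empty table.
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        counts = executor.map(
            lambda name: count_rows_in_stream(client, session, name),
            stream_names,
        )
        return sum(counts)
```

With the `client` and `session` objects from the example, `count_rows(client, session)` would return the total number of rows that match the row restriction. Threads are used here rather than processes because the work is mostly waiting on the network; CPU-heavy row processing may benefit from separate processes instead.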
API Reference¶
- Bigquery Storage v1 API Library
- Services for Google Cloud Bigquery Storage v1 API
- Types for Google Cloud Bigquery Storage v1 API
- Bigquery Storage v1beta2 API Library
- Services for Google Cloud Bigquery Storage v1beta2 API
- Types for Google Cloud Bigquery Storage v1beta2 API
- Services for Google Cloud Bigquery Storage v1alpha API
- Types for Google Cloud Bigquery Storage v1alpha API
Migration Guide¶
See the guide below for instructions on migrating to the 2.x release of this library.