As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

BigQuery Storage v1 API Library

Parent client for calling the Cloud BigQuery Storage API.

This is the base from which all interactions with the API occur.

class google.cloud.bigquery_storage_v1.client.BigQueryReadClient(*, credentials: typing.Optional[google.auth.credentials.Credentials] = None, transport: typing.Optional[typing.Union[str, google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport, typing.Callable[[...], google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport]]] = None, client_options: typing.Optional[typing.Union[google.api_core.client_options.ClientOptions, dict]] = None, client_info: google.api_core.gapic_v1.client_info.ClientInfo = <google.api_core.gapic_v1.client_info.ClientInfo object>)[source]

Client for interacting with BigQuery Storage API.

The BigQuery storage API can be used to read data stored in BigQuery.

Instantiates the big query read client.

Parameters
  • credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.

  • transport (Optional[Union[str,BigQueryReadTransport,Callable[..., BigQueryReadTransport]]]) – The transport to use, or a Callable that constructs and returns a new transport. If a Callable is given, it will be called with the same set of initialization arguments as used in the BigQueryReadTransport constructor. If set to None, a transport is chosen automatically.

  • client_options (Optional[Union[google.api_core.client_options.ClientOptions, dict]]) –

    Custom options for the client.

    1. The api_endpoint property can be used to override the default endpoint provided by the client when transport is not explicitly provided. Only if this property is not set and transport was not explicitly provided, the endpoint is determined by the GOOGLE_API_USE_MTLS_ENDPOINT environment variable, which can have one of the following values: “always” (always use the default mTLS endpoint), “never” (always use the default regular endpoint) and “auto” (auto-switch to the default mTLS endpoint if a client certificate is present; this is the default value).

    2. If the GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is “true”, then the client_cert_source property can be used to provide a client certificate for mTLS transport. If not provided, the default SSL client certificate will be used if present. If GOOGLE_API_USE_CLIENT_CERTIFICATE is “false” or not set, no client certificate will be used.

    3. The universe_domain property can be used to override the default “googleapis.com” universe. Note that the api_endpoint property still takes precedence; and universe_domain is currently not supported for mTLS.

  • client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.

Raises

google.auth.exceptions.MutualTLSChannelError – If mutual TLS transport creation failed for any reason.

__exit__(type, value, traceback)

Releases underlying transport’s resources.

Warning

ONLY use as a context manager if the transport is NOT shared with other clients! Exiting the with block will CLOSE the transport and may cause errors in other clients!
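The warning above can be illustrated with a minimal sketch. It uses hypothetical stand-in classes (FakeTransport and FakeClient are not part of this library) and only models the documented behavior that leaving the with block closes the transport.

```python
class FakeTransport:
    """Hypothetical stand-in for a transport; it only tracks open/closed state."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class FakeClient:
    """Hypothetical stand-in for a client that closes its transport on exit."""

    def __init__(self, transport: FakeTransport) -> None:
        self.transport = transport

    def __enter__(self) -> "FakeClient":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Models the documented behavior: exiting the with block closes the transport.
        self.transport.close()


shared = FakeTransport()
first = FakeClient(shared)
second = FakeClient(shared)

with first:
    pass  # work with `first` here

# `second` holds the same transport object, which is now closed.
second_transport_closed = second.transport.closed
```

If several clients need the same transport, one option under this model is to skip the with statement and close the transport once, after the last client is done with it.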

property api_endpoint

Return the API endpoint used by the client instance.

Returns

The API endpoint used by the client instance.

Return type

str

static common_billing_account_path(billing_account: str) → str

Returns a fully-qualified billing_account string.

static common_folder_path(folder: str) → str

Returns a fully-qualified folder string.

static common_location_path(project: str, location: str) → str

Returns a fully-qualified location string.

static common_organization_path(organization: str) → str

Returns a fully-qualified organization string.

static common_project_path(project: str) → str

Returns a fully-qualified project string.

create_read_session(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.CreateReadSessionRequest, dict]] = None, *, parent: Optional[str] = None, read_session: Optional[google.cloud.bigquery_storage_v1.types.stream.ReadSession] = None, max_stream_count: Optional[int] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → google.cloud.bigquery_storage_v1.types.stream.ReadSession

Creates a new read session. A read session divides the contents of a BigQuery table into one or more streams, which can then be used to read data from the table. The read session also specifies properties of the data to be read, such as a list of columns or a push-down filter describing the rows to be returned.

A particular row can be read by at most one stream. When the caller has reached the end of each stream in the session, then all the data in the table has been read.

Data is assigned to each stream such that roughly the same number of rows can be read from each stream. Because the server-side unit for assigning data is collections of rows, the API does not guarantee that each stream will return the same number of rows. Additionally, the limits are enforced based on the number of pre-filtered rows, so some filters can lead to lopsided assignments.

Read sessions automatically expire 6 hours after they are created and do not require manual clean-up by the caller.
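The effect of balancing on pre-filtered rows can be seen in a toy model. This is only a sketch under stated assumptions: the greedy assignment below is not the server's algorithm, and assign_blocks and filtered_counts are hypothetical helpers. It shows how streams that are balanced before filtering can end up lopsided once a filter is applied.

```python
from typing import Callable, List


def assign_blocks(blocks: List[List[int]], stream_count: int) -> List[List[int]]:
    """Give each block of rows to the stream that currently holds the fewest rows."""
    streams: List[List[int]] = [[] for _ in range(stream_count)]
    for block in sorted(blocks, key=len, reverse=True):
        smallest = min(streams, key=len)
        smallest.extend(block)
    return streams


def filtered_counts(streams: List[List[int]], keep: Callable[[int], bool]) -> List[int]:
    """Count the rows in each stream that survive the filter."""
    return [sum(1 for row in stream if keep(row)) for stream in streams]


# Four collections of rows; rows are modeled as plain integers.
blocks = [
    list(range(0, 40)),
    list(range(40, 70)),
    list(range(70, 100)),
    list(range(100, 120)),
]

streams = assign_blocks(blocks, stream_count=2)
pre_filter = [len(stream) for stream in streams]                      # [60, 60]
post_filter = filtered_counts(streams, keep=lambda row: row < 40)     # [40, 0]
```

Each row lands in exactly one stream, and reading both streams to the end covers all 120 rows, yet every row that passes the filter sits in the first stream.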

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_create_read_session():
    # Create a client
    client = bigquery_storage_v1.BigQueryReadClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.CreateReadSessionRequest(
        parent="parent_value",
    )

    # Make the request
    response = client.create_read_session(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.CreateReadSessionRequest, dict]) – The request object. Request message for CreateReadSession.

  • parent (str) –

    Required. The request project that owns the session, in the form of projects/{project_id}.

    This corresponds to the parent field on the request instance; if request is provided, this should not be set.

  • read_session (google.cloud.bigquery_storage_v1.types.ReadSession) – Required. Session to be created. This corresponds to the read_session field on the request instance; if request is provided, this should not be set.

  • max_stream_count (int) –

    Max initial number of streams. If unset or zero, the server will provide a value of streams so as to produce reasonable throughput. Must be non-negative. The number of streams may be lower than the requested number, depending on the amount of parallelism that is reasonable for the table. There is a default system max limit of 1,000.

    This must be greater than or equal to preferred_min_stream_count. Typically, clients should either leave this unset to let the system determine an upper bound OR set this to the maximum number of “units of work” they can gracefully handle.

    This corresponds to the max_stream_count field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Information about the ReadSession.

Return type

google.cloud.bigquery_storage_v1.types.ReadSession

classmethod from_service_account_file(filename: str, *args, **kwargs)

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryReadClient

classmethod from_service_account_info(info: dict, *args, **kwargs)

Creates an instance of this client using the provided credentials info.

Parameters
  • info (dict) – The service account private key info.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryReadClient

classmethod from_service_account_json(filename: str, *args, **kwargs)

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryReadClient

classmethod get_mtls_endpoint_and_cert_source(client_options: Optional[google.api_core.client_options.ClientOptions] = None)

Deprecated. Return the API endpoint and client cert source for mutual TLS.

The client cert source is determined in the following order: (1) if GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is not “true”, the client cert source is None. (2) if client_options.client_cert_source is provided, use the provided one; if the default client cert source exists, use the default one; otherwise the client cert source is None.

The API endpoint is determined in the following order: (1) if client_options.api_endpoint is provided, use the provided one. (2) if the GOOGLE_API_USE_MTLS_ENDPOINT environment variable is “always”, use the default mTLS endpoint; if the environment variable is “never”, use the default API endpoint; otherwise, if a client cert source exists, use the default mTLS endpoint, and if not, use the default API endpoint.

More details can be found at https://google.aip.dev/auth/4114.
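The two resolution orders can be sketched as plain functions. This is an illustration rather than the library's implementation: resolve_cert_source and resolve_endpoint are hypothetical names, the two endpoint constants are assumed defaults, and the environment variable values are passed in as strings (use_client_cert for GOOGLE_API_USE_CLIENT_CERTIFICATE, use_mtls_endpoint for the “always”/“never”/“auto” setting that the constructor documents under GOOGLE_API_USE_MTLS_ENDPOINT).

```python
from typing import Callable, Optional, Tuple

# Assumed default endpoints, used here only for illustration.
DEFAULT_ENDPOINT = "bigquerystorage.googleapis.com"
DEFAULT_MTLS_ENDPOINT = "bigquerystorage.mtls.googleapis.com"

CertSource = Callable[[], Tuple[bytes, bytes]]


def resolve_cert_source(
    use_client_cert: str,
    provided: Optional[CertSource],
    default: Optional[CertSource],
) -> Optional[CertSource]:
    """Pick the client cert source: none unless enabled, then provided, then default."""
    if use_client_cert != "true":
        return None
    return provided if provided is not None else default


def resolve_endpoint(
    api_endpoint: Optional[str],
    use_mtls_endpoint: str,
    cert_source: Optional[CertSource],
) -> str:
    """Pick the endpoint: an explicit override wins, then the always/never/auto setting."""
    if api_endpoint is not None:
        return api_endpoint
    if use_mtls_endpoint == "always":
        return DEFAULT_MTLS_ENDPOINT
    if use_mtls_endpoint == "never":
        return DEFAULT_ENDPOINT
    # "auto": switch to mTLS only when a client cert source is available.
    return DEFAULT_MTLS_ENDPOINT if cert_source is not None else DEFAULT_ENDPOINT


def example_cert() -> Tuple[bytes, bytes]:
    """Hypothetical cert source returning placeholder certificate and key bytes."""
    return (b"cert", b"key")


cert = resolve_cert_source("true", provided=example_cert, default=None)
endpoint = resolve_endpoint(None, "auto", cert)
```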

Parameters

client_options (google.api_core.client_options.ClientOptions) – Custom options for the client. Only the api_endpoint and client_cert_source properties may be used in this method.

Returns

The API endpoint and the client cert source to use.

Return type

Tuple[str, Callable[[], Tuple[bytes, bytes]]]

Raises

google.auth.exceptions.MutualTLSChannelError – If any errors happen.

static parse_common_billing_account_path(path: str) → Dict[str, str]

Parse a billing_account path into its component segments.

static parse_common_folder_path(path: str) → Dict[str, str]

Parse a folder path into its component segments.

static parse_common_location_path(path: str) → Dict[str, str]

Parse a location path into its component segments.

static parse_common_organization_path(path: str) → Dict[str, str]

Parse an organization path into its component segments.

static parse_common_project_path(path: str) → Dict[str, str]

Parse a project path into its component segments.

static parse_read_session_path(path: str) → Dict[str, str]

Parses a read_session path into its component segments.

static parse_read_stream_path(path: str) → Dict[str, str]

Parses a read_stream path into its component segments.

static parse_table_path(path: str) → Dict[str, str]

Parses a table path into its component segments.
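As a rough illustration of what the path helpers and their parse counterparts do, the sketch below builds and parses a table path of the form projects/{project}/datasets/{dataset}/tables/{table}. The function names build_table_path and split_table_path are hypothetical, and returning an empty dict for a non-matching path is an assumption of this sketch.

```python
import re
from typing import Dict

TABLE_PATTERN = re.compile(
    r"^projects/(?P<project>.+?)/datasets/(?P<dataset>.+?)/tables/(?P<table>.+?)$"
)


def build_table_path(project: str, dataset: str, table: str) -> str:
    """Return a fully-qualified table string."""
    return "projects/{project}/datasets/{dataset}/tables/{table}".format(
        project=project, dataset=dataset, table=table
    )


def split_table_path(path: str) -> Dict[str, str]:
    """Split a table path into its component segments (empty dict if it does not match)."""
    match = TABLE_PATTERN.match(path)
    return match.groupdict() if match else {}


path = build_table_path("my-project", "my_dataset", "my_table")
segments = split_table_path(path)
```

Building a path and parsing it again should round-trip to the original segments, which makes the pair convenient for validating user-supplied resource names.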

read_rows(name, offset=0, retry=_MethodDefault._DEFAULT_VALUE, timeout=_MethodDefault._DEFAULT_VALUE, metadata=(), retry_delay_callback=None)

Reads rows from the table in the format prescribed by the read session. Each response contains one or more table rows, up to a maximum of 10 MiB per response; read requests which attempt to read individual rows larger than this will fail.

Each request also returns a set of stream statistics reflecting the estimated total number of rows in the read stream. This number is computed based on the total table size and the number of active streams in the read session, and may change as other streams continue to read data.

Example

>>> from google.cloud import bigquery_storage
>>>
>>> client = bigquery_storage.BigQueryReadClient()
>>>
>>> # TODO: Initialize `table`:
>>> table = "projects/{}/datasets/{}/tables/{}".format(
...     "your-data-project-id",
...     "your_dataset_id",
...     "your_table_id",
... )
>>>
>>> # TODO: Initialize `parent`:
>>> parent = "projects/your-billing-project-id"
>>>
>>> requested_session = bigquery_storage.types.ReadSession(
...     table=table,
...     data_format=bigquery_storage.types.DataFormat.AVRO,
... )
>>> session = client.create_read_session(
...     parent=parent, read_session=requested_session
... )
>>>
>>> stream = session.streams[0]  # TODO: Also read any other streams.
>>> read_rows_stream = client.read_rows(stream.name)
>>>
>>> for element in read_rows_stream.rows(session):
...     # process element
...     pass
Parameters
  • name (str) – Required. Name of the stream to start reading from, of the form projects/{project_id}/locations/{location}/sessions/{session_id}/streams/{stream_id}

  • offset (Optional[int]) – The starting offset from which to begin reading rows in the stream. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

  • retry_delay_callback (Optional[Callable[[float], None]]) – If the client receives a retryable error that asks the client to delay its next attempt and retry_delay_callback is not None, BigQueryReadClient will call retry_delay_callback with the delay duration (in seconds) before it starts sleeping until the next attempt.

Returns

An iterable of ReadRowsResponse.

Return type

ReadRowsStream
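The role of offset can be sketched with a fake stream that fails once partway through. This is a simplified model, not the client's reconnect logic: fake_read_rows, TransientError, and read_all are hypothetical, and the only idea carried over from the text is that a read can be restarted at the number of rows already received.

```python
from typing import Iterator, List, Optional


class TransientError(Exception):
    """Hypothetical retryable error raised by the fake stream."""


def fake_read_rows(rows: List[str], offset: int, fail_after: Optional[int]) -> Iterator[str]:
    """Yield rows starting at `offset`; fail after `fail_after` rows if it is set."""
    for count, row in enumerate(rows[offset:]):
        if fail_after is not None and count == fail_after:
            raise TransientError("connection dropped")
        yield row


def read_all(rows: List[str], first_failure: int = 2) -> List[str]:
    """Read every row, resuming at the offset of the first row not yet received."""
    received: List[str] = []
    fail_after: Optional[int] = first_failure
    while True:
        try:
            for row in fake_read_rows(rows, offset=len(received), fail_after=fail_after):
                received.append(row)
            return received
        except TransientError:
            # Resume from len(received); the fake stream fails only once.
            fail_after = None


recovered = read_all(["a", "b", "c", "d", "e"])
```

Because the resume offset equals the count of rows already received, no row is delivered twice and none is skipped in this model.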

static read_session_path(project: str, location: str, session: str) → str

Returns a fully-qualified read_session string.

static read_stream_path(project: str, location: str, session: str, stream: str) → str

Returns a fully-qualified read_stream string.


split_read_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.SplitReadStreamRequest, dict]] = None, *, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → google.cloud.bigquery_storage_v1.types.storage.SplitReadStreamResponse

Splits a given ReadStream into two ReadStream objects. These ReadStream objects are referred to as the primary and the residual streams of the split. The original ReadStream can still be read from in the same manner as before. Both of the returned ReadStream objects can also be read from, and the rows returned by both child streams will be the same as the rows read from the original stream.

Moreover, the two child streams will be allocated back-to-back in the original ReadStream. Concretely, it is guaranteed that for streams original, primary, and residual, that original[0-j] = primary[0-j] and original[j-n] = residual[0-m] once the streams have been read to completion.
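The back-to-back guarantee can be checked on a toy example in which a stream is modeled as a Python list. The split_stream helper and the split point j are hypothetical; in the real API the server decides where the split falls.

```python
from typing import List, Tuple


def split_stream(original: List[int], j: int) -> Tuple[List[int], List[int]]:
    """Split a fully-read stream into primary (first j rows) and residual (the rest)."""
    primary = original[:j]
    residual = original[j:]
    return primary, residual


original = list(range(10))  # rows of the original stream, read to completion
j = 4                       # split point, chosen by hand for this sketch
primary, residual = split_stream(original, j)

n = len(original)
m = len(residual)
```

With these definitions, original[0:j] equals primary[0:j], original[j:n] equals residual[0:m], and reading primary followed by residual yields the same rows as reading the original stream.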

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_split_read_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryReadClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.SplitReadStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.split_read_stream(request=request)

    # Handle the response
    print(response)
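Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.SplitReadStreamRequest, dict]) – The request object. Request message for SplitReadStream.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
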
Returns

Response message for SplitReadStream.

Return type

google.cloud.bigquery_storage_v1.types.SplitReadStreamResponse

static table_path(project: str, dataset: str, table: str) → str

Returns a fully-qualified table string.

property transport: google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport

Returns the transport used by the client instance.

Returns

The transport used by the client instance.

Return type

BigQueryReadTransport

property universe_domain: str

Return the universe domain used by the client instance.

Returns

The universe domain used by the client instance.

Return type

str

class google.cloud.bigquery_storage_v1.client.BigQueryWriteClient(*, credentials: typing.Optional[google.auth.credentials.Credentials] = None, transport: typing.Optional[typing.Union[str, google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport, typing.Callable[[...], google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport]]] = None, client_options: typing.Optional[typing.Union[google.api_core.client_options.ClientOptions, dict]] = None, client_info: google.api_core.gapic_v1.client_info.ClientInfo = <google.api_core.gapic_v1.client_info.ClientInfo object>)[source]

BigQuery Write API.

The Write API can be used to write data to BigQuery.

For supplementary information about the Write API, see:

https://cloud.google.com/bigquery/docs/write-api

Instantiates the big query write client.

Parameters
  • credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.

  • transport (Optional[Union[str,BigQueryWriteTransport,Callable[..., BigQueryWriteTransport]]]) – The transport to use, or a Callable that constructs and returns a new transport. If a Callable is given, it will be called with the same set of initialization arguments as used in the BigQueryWriteTransport constructor. If set to None, a transport is chosen automatically.

  • client_options (Optional[Union[google.api_core.client_options.ClientOptions, dict]]) –

    Custom options for the client.

    1. The api_endpoint property can be used to override the default endpoint provided by the client when transport is not explicitly provided. Only if this property is not set and transport was not explicitly provided, the endpoint is determined by the GOOGLE_API_USE_MTLS_ENDPOINT environment variable, which can have one of the following values: “always” (always use the default mTLS endpoint), “never” (always use the default regular endpoint) and “auto” (auto-switch to the default mTLS endpoint if a client certificate is present; this is the default value).

    2. If the GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is “true”, then the client_cert_source property can be used to provide a client certificate for mTLS transport. If not provided, the default SSL client certificate will be used if present. If GOOGLE_API_USE_CLIENT_CERTIFICATE is “false” or not set, no client certificate will be used.

    3. The universe_domain property can be used to override the default “googleapis.com” universe. Note that the api_endpoint property still takes precedence; and universe_domain is currently not supported for mTLS.

  • client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.

Raises

google.auth.exceptions.MutualTLSChannelError – If mutual TLS transport creation failed for any reason.

__exit__(type, value, traceback)

Releases underlying transport’s resources.

Warning

ONLY use as a context manager if the transport is NOT shared with other clients! Exiting the with block will CLOSE the transport and may cause errors in other clients!

property api_endpoint

Return the API endpoint used by the client instance.

Returns

The API endpoint used by the client instance.

Return type

str

append_rows(requests: Optional[Iterator[google.cloud.bigquery_storage_v1.types.storage.AppendRowsRequest]] = None, *, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → Iterable[google.cloud.bigquery_storage_v1.types.storage.AppendRowsResponse]

Appends data to the given stream.

If offset is specified, the offset is checked against the end of stream. The server returns OUT_OF_RANGE in AppendRowsResponse if an attempt is made to append to an offset beyond the current end of the stream, or ALREADY_EXISTS if the user provides an offset that has already been written to. The user can retry with an adjusted offset within the same RPC connection. If offset is not specified, the append happens at the end of the stream.

The response contains an optional offset at which the append happened. No offset information will be returned for appends to a default stream.

Responses are received in the same order in which requests are sent. There will be one response for each successfully inserted request. Responses may optionally embed error information if the originating AppendRequest was not successfully processed.

The specifics of when successfully appended data is made visible to the table are governed by the type of stream:

  • For COMMITTED streams (which includes the default stream), data is visible immediately upon successful append.

  • For BUFFERED streams, data is made visible via a subsequent FlushRows rpc which advances a cursor to a newer offset in the stream.

  • For PENDING streams, data is not made visible until the stream itself is finalized (via the FinalizeWriteStream rpc), and the stream is explicitly committed via the BatchCommitWriteStreams rpc.
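The offset check described above can be modeled in a few lines. This is a sketch with a hypothetical in-memory FakeWriteStream, not the service's implementation; status codes are represented as plain strings, and the returned offset is where the append landed.

```python
from typing import List, Optional, Tuple


class FakeWriteStream:
    """Hypothetical in-memory model of offset checking on a write stream."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def append(self, rows: List[str], offset: Optional[int] = None) -> Tuple[str, Optional[int]]:
        """Append rows and return (status, offset at which the append happened)."""
        end = len(self.rows)
        if offset is not None:
            if offset > end:
                return ("OUT_OF_RANGE", None)    # beyond the current end of the stream
            if offset < end:
                return ("ALREADY_EXISTS", None)  # offset has already been written to
        # No offset given, or offset equals the end of the stream: append at the end.
        self.rows.extend(rows)
        return ("OK", end)


stream = FakeWriteStream()
results = [
    stream.append(["a", "b"], offset=0),
    stream.append(["c"], offset=5),
    stream.append(["c"], offset=1),
    stream.append(["c"]),
]
```

Supplying the offset lets a writer detect a duplicate retry (ALREADY_EXISTS) instead of appending the same rows twice, at the cost of tracking the stream position on the client side.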

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_append_rows():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.AppendRowsRequest(
        write_stream="write_stream_value",
    )

    # This method expects an iterator which contains
    # 'bigquery_storage_v1.AppendRowsRequest' objects
    # Here we create a generator that yields a single `request` for
    # demonstrative purposes.
    requests = [request]

    def request_generator():
        for request in requests:
            yield request

    # Make the request
    stream = client.append_rows(requests=request_generator())

    # Handle the response
    for response in stream:
        print(response)
Parameters
  • requests (Iterator[google.cloud.bigquery_storage_v1.types.AppendRowsRequest]) –

    The request object iterator. Request message for AppendRows.

    Because AppendRows is a bidirectional streaming RPC, certain parts of the AppendRowsRequest need only be specified for the first request before switching table destinations. You can also switch table destinations within the same connection for the default stream.

    A single AppendRowsRequest must be smaller than 10 MB. Requests larger than this return an error, typically INVALID_ARGUMENT.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Response message for AppendRows.

Return type

Iterable[google.cloud.bigquery_storage_v1.types.AppendRowsResponse]
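One way to stay under the request size limit is to group serialized rows into batches before building each request. The sketch below is a generic greedy batcher, not part of this library: chunk_rows is a hypothetical helper, it counts only the row payload bytes and ignores any overhead from the request message itself, and it reads “10 MB” as 10,000,000 bytes, which is an assumption. Leaving some headroom below the limit is advisable.

```python
from typing import Iterable, Iterator, List

# Assumed interpretation of the documented 10 MB limit.
MAX_REQUEST_BYTES = 10 * 1000 * 1000


def chunk_rows(serialized_rows: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Greedily group rows so that each batch stays strictly below max_bytes."""
    batch: List[bytes] = []
    size = 0
    for row in serialized_rows:
        if len(row) >= max_bytes:
            raise ValueError("a single row does not fit below the size limit")
        if batch and size + len(row) >= max_bytes:
            # Adding this row would reach the limit, so emit the current batch first.
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += len(row)
    if batch:
        yield batch


# A tiny limit keeps the example readable; use MAX_REQUEST_BYTES minus headroom in practice.
batches = list(chunk_rows([b"aaaa", b"bbbb", b"cc"], max_bytes=9))
```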

batch_commit_write_streams(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.BatchCommitWriteStreamsRequest, dict]] = None, *, parent: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → google.cloud.bigquery_storage_v1.types.storage.BatchCommitWriteStreamsResponse

Atomically commits a group of PENDING streams that belong to the same parent table.

Streams must be finalized before commit and cannot be committed multiple times. Once a stream is committed, data in the stream becomes available for read operations.
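The lifecycle rules for PENDING streams (append, finalize, then commit exactly once, with all streams committed together) can be sketched as a small state model. PendingStream and batch_commit are hypothetical stand-ins that operate on in-memory lists; they are meant to show the ordering constraints, not the service's behavior in detail.

```python
from typing import List


class PendingStream:
    """Hypothetical in-memory stand-in for a PENDING write stream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[str] = []
        self.finalized = False
        self.committed = False

    def append(self, row: str) -> None:
        if self.finalized:
            raise ValueError(f"{self.name} is finalized; no new data can be appended")
        self.rows.append(row)

    def finalize(self) -> int:
        """Mark the stream as finalized and return its row count."""
        self.finalized = True
        return len(self.rows)


def batch_commit(table: List[str], streams: List[PendingStream]) -> None:
    """Commit all streams to the table, or none of them."""
    # Validate every stream first so that a failure leaves the table untouched.
    for stream in streams:
        if not stream.finalized:
            raise ValueError(f"{stream.name} must be finalized before commit")
        if stream.committed:
            raise ValueError(f"{stream.name} has already been committed")
    for stream in streams:
        table.extend(stream.rows)
        stream.committed = True


table: List[str] = []
first, second = PendingStream("s1"), PendingStream("s2")
first.append("row-a")
second.append("row-b")
first.finalize()
second.finalize()
batch_commit(table, [first, second])
```

Validating all streams before touching the table is what makes the commit all-or-nothing in this model: a single unfinalized stream prevents any rows from becoming visible.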

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_batch_commit_write_streams():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.BatchCommitWriteStreamsRequest(
        parent="parent_value",
        write_streams=['write_streams_value1', 'write_streams_value2'],
    )

    # Make the request
    response = client.batch_commit_write_streams(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.BatchCommitWriteStreamsRequest, dict]) – The request object. Request message for BatchCommitWriteStreams.

  • parent (str) –

    Required. Parent table that all the streams should belong to, in the form of projects/{project}/datasets/{dataset}/tables/{table}.

    This corresponds to the parent field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Response message for BatchCommitWriteStreams.

Return type

google.cloud.bigquery_storage_v1.types.BatchCommitWriteStreamsResponse

static common_billing_account_path(billing_account: str) → str

Returns a fully-qualified billing_account string.

static common_folder_path(folder: str) → str

Returns a fully-qualified folder string.

static common_location_path(project: str, location: str) → str

Returns a fully-qualified location string.

static common_organization_path(organization: str) → str

Returns a fully-qualified organization string.

static common_project_path(project: str) → str

Returns a fully-qualified project string.

create_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.CreateWriteStreamRequest, dict]] = None, *, parent: Optional[str] = None, write_stream: Optional[google.cloud.bigquery_storage_v1.types.stream.WriteStream] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → google.cloud.bigquery_storage_v1.types.stream.WriteStream

Creates a write stream to the given table. Additionally, every table has a special stream named ‘_default’ to which data can be written. This stream doesn’t need to be created using CreateWriteStream. It is a stream that can be used simultaneously by any number of clients. Data written to this stream is considered committed as soon as an acknowledgement is received.
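Since the ‘_default’ stream exists for every table, its name can be derived from the table alone. The helper below is a hypothetical convenience, assuming the stream name follows the projects/{project}/datasets/{dataset}/tables/{table}/streams/{stream} form with ‘_default’ as the stream ID.

```python
def default_stream_name(project: str, dataset: str, table: str) -> str:
    """Build the name of a table's '_default' write stream (assumed name format)."""
    return f"projects/{project}/datasets/{dataset}/tables/{table}/streams/_default"


name = default_stream_name("my-project", "my_dataset", "my_table")
```

A name built this way can be used as the write_stream of an append request without first calling CreateWriteStream.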

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_create_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.CreateWriteStreamRequest(
        parent="parent_value",
    )

    # Make the request
    response = client.create_write_stream(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.CreateWriteStreamRequest, dict]) – The request object. Request message for CreateWriteStream.

  • parent (str) –

    Required. Reference to the table to which the stream belongs, in the format of projects/{project}/datasets/{dataset}/tables/{table}.

    This corresponds to the parent field on the request instance; if request is provided, this should not be set.

  • write_stream (google.cloud.bigquery_storage_v1.types.WriteStream) – Required. Stream to be created. This corresponds to the write_stream field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Information about a single stream that gets data inside the storage system.

Return type

google.cloud.bigquery_storage_v1.types.WriteStream

finalize_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.FinalizeWriteStreamRequest, dict]] = None, *, name: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) → google.cloud.bigquery_storage_v1.types.storage.FinalizeWriteStreamResponse

Finalize a write stream so that no new data can be appended to the stream. Finalize is not supported on the ‘_default’ stream.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_finalize_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.FinalizeWriteStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.finalize_write_stream(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.FinalizeWriteStreamRequest, dict]) – The request object. Request message for invoking FinalizeWriteStream.

  • name (str) –

    Required. Name of the stream to finalize, in the form of projects/{project}/datasets/{dataset}/tables/{table}/streams/{stream}.

    This corresponds to the name field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Response message for FinalizeWriteStream.

Return type

google.cloud.bigquery_storage_v1.types.FinalizeWriteStreamResponse

flush_rows(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.FlushRowsRequest, dict]] = None, *, write_stream: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.storage.FlushRowsResponse[source]

Flushes rows to a BUFFERED stream.

If users are appending rows to a BUFFERED stream, a flush operation is required for the rows to become available for reading. A flush operation advances a BUFFERED stream from any previously flushed offset up to the offset specified in the request.

Flush is not supported on the _default stream, since it is not BUFFERED.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_flush_rows():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.FlushRowsRequest(
        write_stream="write_stream_value",
    )

    # Make the request
    response = client.flush_rows(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.FlushRowsRequest, dict]) – The request object. Request message for FlushRows.

  • write_stream (str) –

    Required. The stream that is the target of the flush operation.

    This corresponds to the write_stream field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Response message for FlushRows.

Return type

google.cloud.bigquery_storage_v1.types.FlushRowsResponse
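The visibility rule described for flush_rows can be pictured with a small in-memory model. This is only a sketch and not the service's implementation: the class BufferedStreamModel and its methods are hypothetical, and the model assumes that the flush offset is inclusive and that flushing to an already flushed offset changes nothing.

```python
class BufferedStreamModel:
    """Toy model of a BUFFERED write stream (hypothetical, for illustration)."""

    def __init__(self):
        self._rows = []          # every appended row, in append order
        self._flushed_count = 0  # number of rows visible to readers

    def append(self, row):
        """Append a row and return its zero-based offset."""
        self._rows.append(row)
        return len(self._rows) - 1

    def flush(self, offset):
        """Make rows up to and including `offset` readable (assumed inclusive)."""
        if not 0 <= offset < len(self._rows):
            raise ValueError("offset does not refer to an appended row")
        # In this model flushing never moves backwards.
        self._flushed_count = max(self._flushed_count, offset + 1)
        return self._flushed_count - 1

    def readable_rows(self):
        """Return the rows a reader could see at this point."""
        return list(self._rows[: self._flushed_count])


stream = BufferedStreamModel()
for value in ("a", "b", "c"):
    stream.append(value)

before_flush = stream.readable_rows()  # nothing is visible before a flush
stream.flush(1)
after_flush = stream.readable_rows()   # rows at offsets 0 and 1 are visible
print(before_flush, after_flush)
```

With the real client, the same step would be a flush_rows call on the stream name; the model only illustrates why appended rows stay invisible until then.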

classmethod from_service_account_file(filename: str, *args, **kwargs)[source]
Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryWriteClient

classmethod from_service_account_info(info: dict, *args, **kwargs)[source]
Creates an instance of this client using the provided credentials info.

Parameters
  • info (dict) – The service account private key info.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryWriteClient

classmethod from_service_account_json(filename: str, *args, **kwargs)
Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

BigQueryWriteClient

classmethod get_mtls_endpoint_and_cert_source(client_options: Optional[google.api_core.client_options.ClientOptions] = None)[source]

Deprecated. Return the API endpoint and client cert source for mutual TLS.

The client cert source is determined in the following order: (1) if GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is not “true”, the client cert source is None. (2) if client_options.client_cert_source is provided, use the provided one; if the default client cert source exists, use the default one; otherwise the client cert source is None.

The API endpoint is determined in the following order: (1) if client_options.api_endpoint is provided, use the provided one. (2) if the GOOGLE_API_USE_MTLS_ENDPOINT environment variable is “always”, use the default mTLS endpoint; if the environment variable is “never”, use the default API endpoint; otherwise if client cert source exists, use the default mTLS endpoint, otherwise use the default API endpoint.

More details can be found at https://google.aip.dev/auth/4114.

Parameters

client_options (google.api_core.client_options.ClientOptions) – Custom options for the client. Only the api_endpoint and client_cert_source properties may be used in this method.

Returns

The API endpoint and the client cert source to use.

Return type

Tuple[str, Callable[[], Tuple[bytes, bytes]]]

Raises

google.auth.exceptions.MutualTLSChannelError – If any errors happen.
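The two orderings described for this method can be sketched as a pure function. This is an illustration, not the library's code: the function name, its parameters, and the endpoint constants are assumptions, the environment is passed in as a plain dict, and the endpoint switch is assumed to be read from GOOGLE_API_USE_MTLS_ENDPOINT with a default of "auto".

```python
DEFAULT_ENDPOINT = "bigquerystorage.googleapis.com"            # illustrative value
DEFAULT_MTLS_ENDPOINT = "bigquerystorage.mtls.googleapis.com"  # illustrative value


def resolve_endpoint_and_cert_source(env, api_endpoint=None,
                                     client_cert_source=None,
                                     default_cert_source=None):
    """Return (endpoint, cert_source) following the documented ordering."""
    use_client_cert = env.get("GOOGLE_API_USE_CLIENT_CERTIFICATE", "false")
    use_mtls_endpoint = env.get("GOOGLE_API_USE_MTLS_ENDPOINT", "auto")

    # Client cert source: None unless the variable is exactly "true";
    # otherwise prefer the caller's source, then the default one.
    cert_source = None
    if use_client_cert == "true":
        cert_source = client_cert_source or default_cert_source

    # Endpoint: an explicit override wins, then "always"/"never",
    # then the automatic choice based on whether a cert source exists.
    if api_endpoint is not None:
        endpoint = api_endpoint
    elif use_mtls_endpoint == "always":
        endpoint = DEFAULT_MTLS_ENDPOINT
    elif use_mtls_endpoint == "never":
        endpoint = DEFAULT_ENDPOINT
    else:
        endpoint = DEFAULT_MTLS_ENDPOINT if cert_source else DEFAULT_ENDPOINT
    return endpoint, cert_source


def example_cert_source():
    """Hypothetical cert source returning (certificate, key) bytes."""
    return b"cert", b"key"


print(resolve_endpoint_and_cert_source({}))
print(resolve_endpoint_and_cert_source(
    {"GOOGLE_API_USE_CLIENT_CERTIFICATE": "true"},
    client_cert_source=example_cert_source,
)[0])
```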

get_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.GetWriteStreamRequest, dict]] = None, *, name: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.stream.WriteStream[source]

Gets information about a write stream.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_get_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.GetWriteStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.get_write_stream(request=request)

    # Handle the response
    print(response)
Parameters
  • request (Union[google.cloud.bigquery_storage_v1.types.GetWriteStreamRequest, dict]) – The request object. Request message for GetWriteStream.

  • name (str) –

    Required. Name of the stream to get, in the form of projects/{project}/datasets/{dataset}/tables/{table}/streams/{stream}.

    This corresponds to the name field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

  • timeout (float) – The timeout for this request.

  • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Information about a single stream that gets data inside the storage system.

Return type

google.cloud.bigquery_storage_v1.types.WriteStream

static parse_common_billing_account_path(path: str) Dict[str, str][source]

Parse a billing_account path into its component segments.

static parse_common_folder_path(path: str) Dict[str, str][source]

Parse a folder path into its component segments.

static parse_common_location_path(path: str) Dict[str, str][source]

Parse a location path into its component segments.

static parse_common_organization_path(path: str) Dict[str, str][source]

Parse an organization path into its component segments.

static parse_common_project_path(path: str) Dict[str, str][source]

Parse a project path into its component segments.

static parse_table_path(path: str) Dict[str, str][source]

Parses a table path into its component segments.

static parse_write_stream_path(path: str) Dict[str, str][source]

Parses a write_stream path into its component segments.

static table_path(project: str, dataset: str, table: str) str[source]

Returns a fully-qualified table string.

property transport: google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport

Returns the transport used by the client instance.

Returns

The transport used by the client instance.

Return type

BigQueryWriteTransport

property universe_domain: str

Return the universe domain used by the client instance.

Returns

The universe domain used by the client instance.

Return type

str

static write_stream_path(project: str, dataset: str, table: str, stream: str) str[source]

Returns a fully-qualified write_stream string.
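The path helpers build and parse resource names in the formats given for the parent and name parameters above. The sketch below reproduces that round trip with the standard library only. The names build_path and parse_path are hypothetical, and returning an empty dict for a non-matching path is a choice made for this example.

```python
import re

# Resource name formats as documented for `parent` and `name`.
TABLE_TEMPLATE = "projects/{project}/datasets/{dataset}/tables/{table}"
WRITE_STREAM_TEMPLATE = TABLE_TEMPLATE + "/streams/{stream}"


def build_path(template, **segments):
    """Fill the template's placeholders with the given segments."""
    return template.format(**segments)


def parse_path(template, path):
    """Split `path` into named segments, or return {} if it does not match."""
    # Turn each {name} placeholder into a named, non-greedy capture group.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template)
    match = re.fullmatch(pattern, path)
    return match.groupdict() if match else {}


stream_name = build_path(
    WRITE_STREAM_TEMPLATE,
    project="my-project", dataset="my_dataset", table="my_table", stream="s1",
)
print(stream_name)
print(parse_path(WRITE_STREAM_TEMPLATE, stream_name))
```

With the client itself, BigQueryWriteClient.write_stream_path and parse_write_stream_path play the same two roles.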

class google.cloud.bigquery_storage_v1.reader.ReadRowsIterable(reader, read_session=None)[source]

An iterable of rows from a read session.

Parameters
__iter__()[source]

Iterator for each row in all pages.

property pages

A generator of all pages in the stream.

Returns

A generator of pages.

Return type

types.GeneratorType[google.cloud.bigquery_storage_v1.ReadRowsPage]

to_arrow()[source]

Create a pyarrow.Table of all rows in the stream.

This method requires the pyarrow library and a stream using the Arrow format.

Returns

A table of all rows in the stream.

Return type

pyarrow.Table

to_dataframe(dtypes=None)[source]

Create a pandas.DataFrame of all rows in the stream.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.

Parameters

dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the stream.

Return type

pandas.DataFrame

class google.cloud.bigquery_storage_v1.reader.ReadRowsPage(stream_parser, message)[source]

An iterator of rows from a read session message.

Parameters
__iter__()[source]

A ReadRowsPage is an iterator.

__next__()

Get the next row in the page.

next()[source]

Get the next row in the page.

property num_items

Total items in the page.

Type

int

property remaining

Remaining items in the page.

Type

int
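The relationship between num_items, remaining, and iteration can be illustrated with a toy iterator. PageModel is a hypothetical stand-in written for this example and is not the library's ReadRowsPage. It assumes that num_items stays fixed while remaining drops by one for each row consumed.

```python
class PageModel:
    """Toy stand-in for a page of rows (hypothetical, for illustration)."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self._num_items = len(rows)
        self._remaining = len(rows)

    @property
    def num_items(self):
        """Total items in the page; does not change during iteration."""
        return self._num_items

    @property
    def remaining(self):
        """Items not yet returned by next()."""
        return self._remaining

    def __iter__(self):
        # The page is its own iterator.
        return self

    def __next__(self):
        row = next(self._rows)  # raises StopIteration when exhausted
        self._remaining -= 1
        return row


page = PageModel([{"id": 1}, {"id": 2}, {"id": 3}])
first = next(page)
counts_after_one = (page.num_items, page.remaining)
rest = list(page)
print(first, counts_after_one, rest, page.remaining)
```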

to_arrow()[source]

Create a pyarrow.RecordBatch of rows in the page.

Returns

Rows from the message, as an Arrow record batch.

Return type

pyarrow.RecordBatch

to_dataframe(dtypes=None)[source]

Create a pandas.DataFrame of rows in the page.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.

Parameters

dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the page.

Return type

pandas.DataFrame

class google.cloud.bigquery_storage_v1.reader.ReadRowsStream(client, name, offset, read_rows_kwargs, retry_delay_callback=None)[source]

A stream of results from a read rows request.

This stream is an iterable of ReadRowsResponse. Iterate over it to fetch all row messages.

If the fastavro library is installed, use the rows() method to parse all messages into a stream of row dictionaries.

If the pandas and fastavro libraries are installed, use the to_dataframe() method to parse all messages into a pandas.DataFrame.

This object should not be created directly, but is returned by other methods in this library.

Construct a ReadRowsStream.

Parameters
  • client (BigQueryReadClient) – A GAPIC client used to reconnect to a ReadRows stream. This must be the GAPIC client to avoid a circular dependency on this class.

  • name (str) – Required. Stream ID from which rows are being read.

  • offset (int) – Required. Position in the stream to start reading from. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined.

  • read_rows_kwargs (dict) – Keyword arguments to use when reconnecting to a ReadRows stream.

  • retry_delay_callback (Optional[Callable[[float], None]]) – If the client receives a retryable error that asks the client to delay its next attempt and retry_delay_callback is not None, ReadRowsStream will call retry_delay_callback with the delay duration (in seconds) before it starts sleeping until the next attempt.

Returns

A sequence of row messages.

Return type

Iterable[ReadRowsResponse]
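The roles of offset and retry_delay_callback can be sketched with a small resumable generator. This is a simplified model, not the ReadRowsStream implementation: RetryableError, resumable_rows, and fake_connect are hypothetical names, one yielded item stands for one row, and the sleep function is injectable so the example runs without waiting.

```python
import time


class RetryableError(Exception):
    """Hypothetical error carrying a server-suggested delay in seconds."""

    def __init__(self, delay):
        super().__init__(f"retry after {delay}s")
        self.delay = delay


def resumable_rows(connect, offset=0, retry_delay_callback=None, sleep=time.sleep):
    """Yield rows from connect(offset), reconnecting after RetryableError."""
    while True:
        try:
            for row in connect(offset):
                offset += 1  # record progress so a reconnect skips this row
                yield row
            return  # the underlying stream ended normally
        except RetryableError as exc:
            # Notify the caller first, then wait, as the parameter describes.
            if retry_delay_callback is not None:
                retry_delay_callback(exc.delay)
            sleep(exc.delay)
            # Loop again: reconnect at the first row not yet delivered.


DATA = ["r0", "r1", "r2", "r3"]
pending_failures = [RetryableError(0.5)]  # fail once, just before "r2"


def fake_connect(offset):
    """Serve DATA from `offset`, raising one retryable error along the way."""
    for index in range(offset, len(DATA)):
        if index == 2 and pending_failures:
            raise pending_failures.pop()
        yield DATA[index]


delays = []
rows = list(resumable_rows(
    fake_connect,
    retry_delay_callback=delays.append,
    sleep=lambda seconds: None,  # skip real sleeping in the example
))
print(rows, delays)
```

Because the offset advances with every delivered row, the reconnect resumes at "r2" and no row is repeated or lost.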

__iter__()[source]

An iterable of messages.

Returns

A sequence of row messages.

Return type

Iterable[ReadRowsResponse]

rows(read_session=None)[source]

Iterate over all rows in the stream.

This method requires the fastavro library in order to parse row messages in avro format. For arrow format messages, the pyarrow library is required.

Warning

DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.

Parameters

read_session (Optional[ReadSession]) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733

Returns

A sequence of rows, represented as dictionaries.

Return type

Iterable[Mapping]

to_arrow(read_session=None)[source]

Create a pyarrow.Table of all rows in the stream.

This method requires the pyarrow library and a stream using the Arrow format.

Parameters

read_session (ReadSession) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733

Returns

A table of all rows in the stream.

Return type

pyarrow.Table

to_dataframe(read_session=None, dtypes=None)[source]

Create a pandas.DataFrame of all rows in the stream.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings.

Parameters
  • read_session (ReadSession) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733

  • dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the stream.

Return type

pandas.DataFrame