BigQuery Storage v1 API Library¶
Parent client for calling the Cloud BigQuery Storage API.
This is the base from which all interactions with the API occur.
- class google.cloud.bigquery_storage_v1.client.BigQueryReadClient(*, credentials: typing.Optional[google.auth.credentials.Credentials] = None, transport: typing.Optional[typing.Union[str, google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport, typing.Callable[[...], google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport]]] = None, client_options: typing.Optional[typing.Union[google.api_core.client_options.ClientOptions, dict]] = None, client_info: google.api_core.gapic_v1.client_info.ClientInfo = <google.api_core.gapic_v1.client_info.ClientInfo object>)[source]¶
Client for interacting with BigQuery Storage API.
The BigQuery storage API can be used to read data stored in BigQuery.
Instantiates the big query read client.
- Parameters
credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.
transport (Optional[Union[str,BigQueryReadTransport,Callable[..., BigQueryReadTransport]]]) – The transport to use, or a Callable that constructs and returns a new transport. If a Callable is given, it will be called with the same set of initialization arguments as used in the BigQueryReadTransport constructor. If set to None, a transport is chosen automatically.
client_options (Optional[Union[google.api_core.client_options.ClientOptions, dict]]) –
Custom options for the client.
1. The api_endpoint property can be used to override the default endpoint provided by the client when transport is not explicitly provided. Only if this property is not set and transport was not explicitly provided, the endpoint is determined by the GOOGLE_API_USE_MTLS_ENDPOINT environment variable, which can have one of the following values: “always” (always use the default mTLS endpoint), “never” (always use the default regular endpoint) and “auto” (auto-switch to the default mTLS endpoint if a client certificate is present; this is the default value).
2. If the GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is “true”, then the client_cert_source property can be used to provide a client certificate for mTLS transport. If not provided, the default SSL client certificate will be used if present. If GOOGLE_API_USE_CLIENT_CERTIFICATE is “false” or not set, no client certificate will be used.
3. The universe_domain property can be used to override the default “googleapis.com” universe. Note that the api_endpoint property still takes precedence; universe_domain is currently not supported for mTLS.
client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.
- Raises
google.auth.exceptions.MutualTLSChannelError – If mutual TLS transport creation failed for any reason.
- __exit__(type, value, traceback)[source]¶
Releases underlying transport’s resources.
Warning
ONLY use as a context manager if the transport is NOT shared with other clients! Exiting the with block will CLOSE the transport and may cause errors in other clients!
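For example (a minimal sketch, assuming Application Default Credentials are available in the environment):

from google.cloud import bigquery_storage_v1

# The transport is closed when the block exits, so this client must not
# share its transport with any other client instance.
with bigquery_storage_v1.BigQueryReadClient() as client:
    print(client.api_endpoint)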
- property api_endpoint¶
Return the API endpoint used by the client instance.
- Returns
The API endpoint used by the client instance.
- Return type
str
- static common_billing_account_path(billing_account: str) str [source]¶
Returns a fully-qualified billing_account string.
- static common_location_path(project: str, location: str) str [source]¶
Returns a fully-qualified location string.
- static common_organization_path(organization: str) str [source]¶
Returns a fully-qualified organization string.
- create_read_session(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.CreateReadSessionRequest, dict]] = None, *, parent: Optional[str] = None, read_session: Optional[google.cloud.bigquery_storage_v1.types.stream.ReadSession] = None, max_stream_count: Optional[int] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.stream.ReadSession [source]¶
Creates a new read session. A read session divides the contents of a BigQuery table into one or more streams, which can then be used to read data from the table. The read session also specifies properties of the data to be read, such as a list of columns or a push-down filter describing the rows to be returned.
A particular row can be read by at most one stream. When the caller has reached the end of each stream in the session, then all the data in the table has been read.
Data is assigned to each stream such that roughly the same number of rows can be read from each stream. Because the server-side unit for assigning data is collections of rows, the API does not guarantee that each stream will return the same number of rows. Additionally, the limits are enforced based on the number of pre-filtered rows, so some filters can lead to lopsided assignments.
Read sessions automatically expire 6 hours after they are created and do not require manual clean-up by the caller.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_create_read_session():
    # Create a client
    client = bigquery_storage_v1.BigQueryReadClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.CreateReadSessionRequest(
        parent="parent_value",
    )

    # Make the request
    response = client.create_read_session(request=request)

    # Handle the response
    print(response)
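In addition to the generated template above, a session can carry the column selection and push-down filter mentioned earlier via read_options. A minimal sketch, with hypothetical project, dataset, and table ids:

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.ARROW,
    read_options=types.ReadSession.TableReadOptions(
        selected_fields=["name", "state"],  # read only these columns
        row_restriction='state = "CA"',     # server-side row filter
    ),
)
session = client.create_read_session(
    parent="projects/my-project",
    read_session=requested_session,
    max_stream_count=1,  # the server may still return fewer streams
)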
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.CreateReadSessionRequest, dict]) – The request object. Request message for CreateReadSession.
parent (str) – Required. The request project that owns the session, in the form of projects/{project_id}. This corresponds to the parent field on the request instance; if request is provided, this should not be set.
read_session (google.cloud.bigquery_storage_v1.types.ReadSession) – Required. Session to be created. This corresponds to the read_session field on the request instance; if request is provided, this should not be set.
max_stream_count (int) – Max initial number of streams. If unset or zero, the server will choose a number of streams so as to produce reasonable throughput. Must be non-negative. The number of streams may be lower than the requested number, depending on the amount of parallelism that is reasonable for the table. There is a default system max limit of 1,000. This must be greater than or equal to preferred_min_stream_count. Typically, clients should either leave this unset to let the system determine an upper bound, or set it to the maximum number of “units of work” the client can gracefully handle. This corresponds to the max_stream_count field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Information about the ReadSession.
- Return type
google.cloud.bigquery_storage_v1.types.ReadSession
- classmethod from_service_account_file(filename: str, *args, **kwargs)[source]¶
Creates an instance of this client using the provided credentials file.
- Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryReadClient
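For example (a minimal sketch; “service-account.json” is a hypothetical path to a downloaded service account key file):

from google.cloud import bigquery_storage_v1

client = bigquery_storage_v1.BigQueryReadClient.from_service_account_file(
    "service-account.json"
)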
- classmethod from_service_account_info(info: dict, *args, **kwargs)[source]¶
Creates an instance of this client using the provided credentials info.
- Parameters
info (dict) – The service account private key info.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryReadClient
- classmethod from_service_account_json(filename: str, *args, **kwargs)¶
Creates an instance of this client using the provided credentials file.
- Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryReadClient
- classmethod get_mtls_endpoint_and_cert_source(client_options: Optional[google.api_core.client_options.ClientOptions] = None)[source]¶
Deprecated. Return the API endpoint and client cert source for mutual TLS.
The client cert source is determined in the following order: (1) if GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is not “true”, the client cert source is None. (2) if client_options.client_cert_source is provided, use the provided one; if the default client cert source exists, use the default one; otherwise the client cert source is None.
The API endpoint is determined in the following order: (1) if client_options.api_endpoint is provided, use the provided one. (2) if the GOOGLE_API_USE_MTLS_ENDPOINT environment variable is “always”, use the default mTLS endpoint; if the environment variable is “never”, use the default API endpoint; otherwise, if a client cert source exists, use the default mTLS endpoint, otherwise use the default API endpoint.
More details can be found at https://google.aip.dev/auth/4114.
- Parameters
client_options (google.api_core.client_options.ClientOptions) – Custom options for the client. Only the api_endpoint and client_cert_source properties may be used in this method.
- Returns
The API endpoint and the client cert source to use.
- Return type
Tuple[str, Callable[[], Tuple[bytes, bytes]]]
- Raises
google.auth.exceptions.MutualTLSChannelError – If any errors happen.
- static parse_common_billing_account_path(path: str) Dict[str, str] [source]¶
Parse a billing_account path into its component segments.
- static parse_common_folder_path(path: str) Dict[str, str] [source]¶
Parse a folder path into its component segments.
- static parse_common_location_path(path: str) Dict[str, str] [source]¶
Parse a location path into its component segments.
- static parse_common_organization_path(path: str) Dict[str, str] [source]¶
Parse an organization path into its component segments.
- static parse_common_project_path(path: str) Dict[str, str] [source]¶
Parse a project path into its component segments.
- static parse_read_session_path(path: str) Dict[str, str] [source]¶
Parses a read_session path into its component segments.
- static parse_read_stream_path(path: str) Dict[str, str] [source]¶
Parses a read_stream path into its component segments.
- static parse_table_path(path: str) Dict[str, str] [source]¶
Parses a table path into its component segments.
- read_rows(name, offset=0, retry=_MethodDefault._DEFAULT_VALUE, timeout=_MethodDefault._DEFAULT_VALUE, metadata=(), retry_delay_callback=None)[source]¶
Reads rows from the table in the format prescribed by the read session. Each response contains one or more table rows, up to a maximum of 10 MiB per response; read requests which attempt to read individual rows larger than this will fail.
Each request also returns a set of stream statistics reflecting the estimated total number of rows in the read stream. This number is computed based on the total table size and the number of active streams in the read session, and may change as other streams continue to read data.
Example
>>> from google.cloud import bigquery_storage
>>>
>>> client = bigquery_storage.BigQueryReadClient()
>>>
>>> # TODO: Initialize ``table``:
>>> table = "projects/{}/datasets/{}/tables/{}".format(
...     'your-data-project-id',
...     'your_dataset_id',
...     'your_table_id',
... )
>>>
>>> # TODO: Initialize ``parent``:
>>> parent = 'projects/your-billing-project-id'
>>>
>>> requested_session = bigquery_storage.types.ReadSession(
...     table=table,
...     data_format=bigquery_storage.types.DataFormat.AVRO,
... )
>>> session = client.create_read_session(
...     parent=parent, read_session=requested_session
... )
>>>
>>> stream = session.streams[0]  # TODO: Also read any other streams.
>>> read_rows_stream = client.read_rows(stream.name)
>>>
>>> for element in read_rows_stream.rows(session):
...     # process element
...     pass
- Parameters
name (str) – Required. Name of the stream to start reading from, of the form projects/{project_id}/locations/{location}/sessions/{session_id}/streams/{stream_id}
offset (Optional[int]) – The starting offset at which to begin reading rows in the stream. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined.
retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.
retry_delay_callback (Optional[Callable[[float], None]]) – If the client receives a retryable error that asks the client to delay its next attempt and retry_delay_callback is not None, BigQueryReadClient will call retry_delay_callback with the delay duration (in seconds) before it starts sleeping until the next attempt.
- Returns
An iterable of ReadRowsResponse.
- Return type
ReadRowsStream
- Raises
google.api_core.exceptions.GoogleAPICallError – If the request failed for any reason.
google.api_core.exceptions.RetryError – If the request failed due to a retryable error and retry attempts failed.
ValueError – If the parameters are invalid.
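A sketch of the retry_delay_callback parameter described above (the stream name is hypothetical):

from google.cloud import bigquery_storage_v1

client = bigquery_storage_v1.BigQueryReadClient()

def on_retry_delay(delay_seconds):
    # Called with the server-requested delay before the client sleeps
    # and reconnects.
    print(f"retrying in {delay_seconds:.1f}s")

reader = client.read_rows(
    "projects/my-project/locations/us/sessions/my-session/streams/my-stream",
    retry_delay_callback=on_retry_delay,
)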
- static read_session_path(project: str, location: str, session: str) str [source]¶
Returns a fully-qualified read_session string.
- static read_stream_path(project: str, location: str, session: str, stream: str) str [source]¶
Returns a fully-qualified read_stream string.
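For example (hypothetical identifiers):

from google.cloud import bigquery_storage_v1

path = bigquery_storage_v1.BigQueryReadClient.read_stream_path(
    "my-project", "us", "session123", "stream456"
)
# -> "projects/my-project/locations/us/sessions/session123/streams/stream456"
segments = bigquery_storage_v1.BigQueryReadClient.parse_read_stream_path(path)
# -> {"project": "my-project", "location": "us",
#     "session": "session123", "stream": "stream456"}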
- split_read_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.SplitReadStreamRequest, dict]] = None, *, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.storage.SplitReadStreamResponse [source]¶
Splits a given ReadStream into two ReadStream objects. These ReadStream objects are referred to as the primary and the residual streams of the split. The original ReadStream can still be read from in the same manner as before. Both of the returned ReadStream objects can also be read from, and the rows returned by both child streams will be the same as the rows read from the original stream.
Moreover, the two child streams will be allocated back-to-back in the original ReadStream. Concretely, it is guaranteed that for streams original, primary, and residual, that original[0-j] = primary[0-j] and original[j-n] = residual[0-m] once the streams have been read to completion.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_split_read_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryReadClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.SplitReadStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.split_read_stream(request=request)

    # Handle the response
    print(response)
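The v1 request proto also defines a fraction field that indicates where in the remaining rows the split should occur; a hedged sketch (hypothetical stream name):

from google.cloud import bigquery_storage_v1

client = bigquery_storage_v1.BigQueryReadClient()

request = bigquery_storage_v1.SplitReadStreamRequest(
    name="projects/my-project/locations/us/sessions/my-session/streams/my-stream",
    fraction=0.5,  # split roughly halfway through the remaining rows
)
response = client.split_read_stream(request=request)
primary = response.primary_stream      # replaces the original stream
residual = response.remainder_stream   # covers rows after the split point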
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.SplitReadStreamRequest, dict]) – The request object. Request message for SplitReadStream.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Response message for SplitReadStream.
- Return type
google.cloud.bigquery_storage_v1.types.SplitReadStreamResponse
- static table_path(project: str, dataset: str, table: str) str [source]¶
Returns a fully-qualified table string.
- property transport: google.cloud.bigquery_storage_v1.services.big_query_read.transports.base.BigQueryReadTransport¶
Returns the transport used by the client instance.
- Returns
The transport used by the client instance.
- Return type
BigQueryReadTransport
- class google.cloud.bigquery_storage_v1.client.BigQueryWriteClient(*, credentials: typing.Optional[google.auth.credentials.Credentials] = None, transport: typing.Optional[typing.Union[str, google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport, typing.Callable[[...], google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport]]] = None, client_options: typing.Optional[typing.Union[google.api_core.client_options.ClientOptions, dict]] = None, client_info: google.api_core.gapic_v1.client_info.ClientInfo = <google.api_core.gapic_v1.client_info.ClientInfo object>)[source]¶
BigQuery Write API.
The Write API can be used to write data to BigQuery.
For supplementary information about the Write API, see:
https://cloud.google.com/bigquery/docs/write-api
Instantiates the big query write client.
- Parameters
credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.
transport (Optional[Union[str,BigQueryWriteTransport,Callable[..., BigQueryWriteTransport]]]) – The transport to use, or a Callable that constructs and returns a new transport. If a Callable is given, it will be called with the same set of initialization arguments as used in the BigQueryWriteTransport constructor. If set to None, a transport is chosen automatically.
client_options (Optional[Union[google.api_core.client_options.ClientOptions, dict]]) –
Custom options for the client.
1. The api_endpoint property can be used to override the default endpoint provided by the client when transport is not explicitly provided. Only if this property is not set and transport was not explicitly provided, the endpoint is determined by the GOOGLE_API_USE_MTLS_ENDPOINT environment variable, which can have one of the following values: “always” (always use the default mTLS endpoint), “never” (always use the default regular endpoint) and “auto” (auto-switch to the default mTLS endpoint if a client certificate is present; this is the default value).
2. If the GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is “true”, then the client_cert_source property can be used to provide a client certificate for mTLS transport. If not provided, the default SSL client certificate will be used if present. If GOOGLE_API_USE_CLIENT_CERTIFICATE is “false” or not set, no client certificate will be used.
3. The universe_domain property can be used to override the default “googleapis.com” universe. Note that the api_endpoint property still takes precedence; universe_domain is currently not supported for mTLS.
client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.
- Raises
google.auth.exceptions.MutualTLSChannelError – If mutual TLS transport creation failed for any reason.
- __exit__(type, value, traceback)[source]¶
Releases underlying transport’s resources.
Warning
ONLY use as a context manager if the transport is NOT shared with other clients! Exiting the with block will CLOSE the transport and may cause errors in other clients!
- property api_endpoint¶
Return the API endpoint used by the client instance.
- Returns
The API endpoint used by the client instance.
- Return type
str
- append_rows(requests: Optional[Iterator[google.cloud.bigquery_storage_v1.types.storage.AppendRowsRequest]] = None, *, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) Iterable[google.cloud.bigquery_storage_v1.types.storage.AppendRowsResponse] [source]¶
Appends data to the given stream.
If offset is specified, the offset is checked against the end of the stream. The server returns OUT_OF_RANGE in AppendRowsResponse if an attempt is made to append to an offset beyond the current end of the stream, or ALREADY_EXISTS if the user provides an offset that has already been written to. The user can retry with an adjusted offset within the same RPC connection. If offset is not specified, the append happens at the end of the stream.
The response contains an optional offset at which the append happened. No offset information will be returned for appends to a default stream.
Responses are received in the same order in which requests are sent. There will be one response for each successfully inserted request. Responses may optionally embed error information if the originating AppendRequest was not successfully processed.
The specifics of when successfully appended data is made visible to the table are governed by the type of stream:
For COMMITTED streams (which includes the default stream), data is visible immediately upon successful append.
For BUFFERED streams, data is made visible via a subsequent FlushRows rpc which advances a cursor to a newer offset in the stream.
For PENDING streams, data is not made visible until the stream itself is finalized (via the FinalizeWriteStream rpc), and the stream is explicitly committed via the BatchCommitWriteStreams rpc.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_append_rows():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.AppendRowsRequest(
        write_stream="write_stream_value",
    )

    # This method expects an iterator which contains
    # 'bigquery_storage_v1.AppendRowsRequest' objects
    # Here we create a generator that yields a single `request` for
    # demonstrative purposes.
    requests = [request]

    def request_generator():
        for request in requests:
            yield request

    # Make the request
    stream = client.append_rows(requests=request_generator())

    # Handle the response
    for response in stream:
        print(response)
- Parameters
requests (Iterator[google.cloud.bigquery_storage_v1.types.AppendRowsRequest]) – The request object iterator. Request message for AppendRows.
Because AppendRows is a bidirectional streaming RPC, certain parts of the AppendRowsRequest need only be specified for the first request before switching table destinations. You can also switch table destinations within the same connection for the default stream.
The size of a single AppendRowsRequest must be less than 10 MB. Requests larger than this return an error, typically INVALID_ARGUMENT.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Response message for AppendRows.
- Return type
Iterable[google.cloud.bigquery_storage_v1.types.AppendRowsResponse]
- batch_commit_write_streams(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.BatchCommitWriteStreamsRequest, dict]] = None, *, parent: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.storage.BatchCommitWriteStreamsResponse [source]¶
Atomically commits a group of PENDING streams that belong to the same parent table.
Streams must be finalized before commit and cannot be committed multiple times. Once a stream is committed, data in the stream becomes available for read operations.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_batch_commit_write_streams():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.BatchCommitWriteStreamsRequest(
        parent="parent_value",
        write_streams=['write_streams_value1', 'write_streams_value2'],
    )

    # Make the request
    response = client.batch_commit_write_streams(request=request)

    # Handle the response
    print(response)
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.BatchCommitWriteStreamsRequest, dict]) – The request object. Request message for BatchCommitWriteStreams.
parent (str) – Required. Parent table that all the streams should belong to, in the form of projects/{project}/datasets/{dataset}/tables/{table}. This corresponds to the parent field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Response message for BatchCommitWriteStreams.
- Return type
google.cloud.bigquery_storage_v1.types.BatchCommitWriteStreamsResponse
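Putting FinalizeWriteStream and BatchCommitWriteStreams together, a sketch of the PENDING-stream lifecycle (hypothetical project, dataset, and table ids; the append step is elided):

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryWriteClient()
parent = client.table_path("my-project", "my_dataset", "my_table")

# 1. Create a PENDING stream; its rows stay invisible until commit.
stream = client.create_write_stream(
    parent=parent,
    write_stream=types.WriteStream(type_=types.WriteStream.Type.PENDING),
)

# 2. ... append rows to stream.name with append_rows() ...

# 3. Finalize so no further appends are accepted.
client.finalize_write_stream(name=stream.name)

# 4. Atomically commit; the data becomes visible to readers.
response = client.batch_commit_write_streams(
    request=types.BatchCommitWriteStreamsRequest(
        parent=parent,
        write_streams=[stream.name],
    )
)
print(response.commit_time)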
- static common_billing_account_path(billing_account: str) str [source]¶
Returns a fully-qualified billing_account string.
- static common_location_path(project: str, location: str) str [source]¶
Returns a fully-qualified location string.
- static common_organization_path(organization: str) str [source]¶
Returns a fully-qualified organization string.
- create_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.CreateWriteStreamRequest, dict]] = None, *, parent: Optional[str] = None, write_stream: Optional[google.cloud.bigquery_storage_v1.types.stream.WriteStream] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.stream.WriteStream [source]¶
Creates a write stream to the given table. Additionally, every table has a special stream named ‘_default’ to which data can be written. This stream doesn’t need to be created using CreateWriteStream. It is a stream that can be used simultaneously by any number of clients. Data written to this stream is considered committed as soon as an acknowledgement is received.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_create_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.CreateWriteStreamRequest(
        parent="parent_value",
    )

    # Make the request
    response = client.create_write_stream(request=request)

    # Handle the response
    print(response)
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.CreateWriteStreamRequest, dict]) – The request object. Request message for CreateWriteStream.
parent (str) – Required. Reference to the table to which the stream belongs, in the format of projects/{project}/datasets/{dataset}/tables/{table}. This corresponds to the parent field on the request instance; if request is provided, this should not be set.
write_stream (google.cloud.bigquery_storage_v1.types.WriteStream) – Required. Stream to be created. This corresponds to the write_stream field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Information about a single stream that gets data inside the storage system.
- Return type
google.cloud.bigquery_storage_v1.types.WriteStream
- finalize_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.FinalizeWriteStreamRequest, dict]] = None, *, name: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.storage.FinalizeWriteStreamResponse [source]¶
Finalize a write stream so that no new data can be appended to the stream. Finalize is not supported on the ‘_default’ stream.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_finalize_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.FinalizeWriteStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.finalize_write_stream(request=request)

    # Handle the response
    print(response)
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.FinalizeWriteStreamRequest, dict]) – The request object. Request message for invoking FinalizeWriteStream.
name (str) – Required. Name of the stream to finalize, in the form of projects/{project}/datasets/{dataset}/tables/{table}/streams/{stream}. This corresponds to the name field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Response message for FinalizeWriteStream.
- Return type
google.cloud.bigquery_storage_v1.types.FinalizeWriteStreamResponse
- flush_rows(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.FlushRowsRequest, dict]] = None, *, write_stream: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.storage.FlushRowsResponse [source]¶
Flushes rows to a BUFFERED stream.
If users are appending rows to a BUFFERED stream, a flush operation is required in order for the rows to become available for reading. A flush operation flushes up to any previously flushed offset in a BUFFERED stream, to the offset specified in the request.
Flush is not supported on the _default stream, since it is not BUFFERED.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_flush_rows():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.FlushRowsRequest(
        write_stream="write_stream_value",
    )

    # Make the request
    response = client.flush_rows(request=request)

    # Handle the response
    print(response)
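A flush to a specific offset can be expressed as follows (hypothetical BUFFERED stream name; offset is a proto wrapper field, and proto-plus should accept a plain int):

from google.cloud import bigquery_storage_v1

client = bigquery_storage_v1.BigQueryWriteClient()

request = bigquery_storage_v1.FlushRowsRequest(
    write_stream=(
        "projects/my-project/datasets/my_dataset"
        "/tables/my_table/streams/my-stream"
    ),
    offset=42,  # rows up to this offset become readable
)
response = client.flush_rows(request=request)
print(response.offset)  # the offset at which the flush was applied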
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.FlushRowsRequest, dict]) – The request object. Request message for FlushRows.
write_stream (str) – Required. The stream that is the target of the flush operation. This corresponds to the write_stream field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Response message for FlushRows.
- Return type
google.cloud.bigquery_storage_v1.types.FlushRowsResponse
- classmethod from_service_account_file(filename: str, *args, **kwargs)[source]¶
Creates an instance of this client using the provided credentials file.
- Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryWriteClient
- classmethod from_service_account_info(info: dict, *args, **kwargs)[source]¶
Creates an instance of this client using the provided credentials info.
- Parameters
info (dict) – The service account private key info.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryWriteClient
- classmethod from_service_account_json(filename: str, *args, **kwargs)¶
Creates an instance of this client using the provided credentials file.
- Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
- Returns
The constructed client.
- Return type
BigQueryWriteClient
- classmethod get_mtls_endpoint_and_cert_source(client_options: Optional[google.api_core.client_options.ClientOptions] = None)[source]¶
Deprecated. Return the API endpoint and client cert source for mutual TLS.
The client cert source is determined in the following order: (1) if GOOGLE_API_USE_CLIENT_CERTIFICATE environment variable is not “true”, the client cert source is None. (2) if client_options.client_cert_source is provided, use the provided one; if the default client cert source exists, use the default one; otherwise the client cert source is None.
The API endpoint is determined in the following order: (1) if client_options.api_endpoint is provided, use the provided one. (2) if the GOOGLE_API_USE_MTLS_ENDPOINT environment variable is “always”, use the default mTLS endpoint; if the environment variable is “never”, use the default API endpoint; otherwise, if a client cert source exists, use the default mTLS endpoint, otherwise use the default API endpoint.
More details can be found at https://google.aip.dev/auth/4114.
- Parameters
client_options (google.api_core.client_options.ClientOptions) – Custom options for the client. Only the api_endpoint and client_cert_source properties may be used in this method.
- Returns
The API endpoint and the client cert source to use.
- Return type
Tuple[str, Callable[[], Tuple[bytes, bytes]]]
- Raises
google.auth.exceptions.MutualTLSChannelError – If any errors happen.
- get_write_stream(request: Optional[Union[google.cloud.bigquery_storage_v1.types.storage.GetWriteStreamRequest, dict]] = None, *, name: Optional[str] = None, retry: Optional[Union[google.api_core.retry.retry_unary.Retry, google.api_core.gapic_v1.method._MethodDefault]] = _MethodDefault._DEFAULT_VALUE, timeout: Union[float, object] = _MethodDefault._DEFAULT_VALUE, metadata: Sequence[Tuple[str, str]] = ()) google.cloud.bigquery_storage_v1.types.stream.WriteStream [source]¶
Gets information about a write stream.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_storage_v1

def sample_get_write_stream():
    # Create a client
    client = bigquery_storage_v1.BigQueryWriteClient()

    # Initialize request argument(s)
    request = bigquery_storage_v1.GetWriteStreamRequest(
        name="name_value",
    )

    # Make the request
    response = client.get_write_stream(request=request)

    # Handle the response
    print(response)
- Parameters
request (Union[google.cloud.bigquery_storage_v1.types.GetWriteStreamRequest, dict]) – The request object. Request message for GetWriteStreamRequest.
name (str) – Required. Name of the stream to get, in the form of projects/{project}/datasets/{dataset}/tables/{table}/streams/{stream}. This corresponds to the name field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- Returns
Information about a single stream that gets data inside the storage system.
- Return type
google.cloud.bigquery_storage_v1.types.WriteStream
- static parse_common_billing_account_path(path: str) Dict[str, str] [source]¶
Parse a billing_account path into its component segments.
- static parse_common_folder_path(path: str) Dict[str, str] [source]¶
Parse a folder path into its component segments.
- static parse_common_location_path(path: str) Dict[str, str] [source]¶
Parse a location path into its component segments.
- static parse_common_organization_path(path: str) Dict[str, str] [source]¶
Parse an organization path into its component segments.
- static parse_common_project_path(path: str) Dict[str, str] [source]¶
Parse a project path into its component segments.
- static parse_table_path(path: str) Dict[str, str] [source]¶
Parses a table path into its component segments.
- static parse_write_stream_path(path: str) Dict[str, str] [source]¶
Parses a write_stream path into its component segments.
- static table_path(project: str, dataset: str, table: str) str [source]¶
Returns a fully-qualified table string.
- property transport: google.cloud.bigquery_storage_v1.services.big_query_write.transports.base.BigQueryWriteTransport¶
Returns the transport used by the client instance.
- Returns
The transport used by the client instance.
- Return type
BigQueryWriteTransport
- class google.cloud.bigquery_storage_v1.reader.ReadRowsIterable(reader, read_session=None)[source]¶
An iterable of rows from a read session.
- Parameters
reader (google.cloud.bigquery_storage_v1.reader.ReadRowsStream) – A read rows stream.
read_session (Optional[ReadSession]) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733
- property pages¶
A generator of all pages in the stream.
- Returns
A generator of pages.
- Return type
types.GeneratorType[google.cloud.bigquery_storage_v1.ReadRowsPage]
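For example, to process the stream one page at a time (client, stream name, and session are assumed to come from an earlier create_read_session/read_rows call; handle_page is a hypothetical per-page handler):

reader = client.read_rows(stream_name)
for page in reader.rows(session).pages:
    frame = page.to_dataframe()  # one pandas.DataFrame per server message
    handle_page(frame)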
- to_arrow()[source]¶
Create a pyarrow.Table of all rows in the stream.
This method requires the pyarrow library and a stream using the Arrow format.
- Returns
A table of all rows in the stream.
- Return type
pyarrow.Table
- to_dataframe(dtypes=None)[source]¶
Create a pandas.DataFrame of all rows in the stream.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the stream.
- Return type
pandas.DataFrame
- class google.cloud.bigquery_storage_v1.reader.ReadRowsPage(stream_parser, message)[source]¶
An iterator of rows from a read session message.
- Parameters
stream_parser (google.cloud.bigquery_storage_v1.reader._StreamParser) – A helper for parsing messages into rows.
message (google.cloud.bigquery_storage_v1.types.ReadRowsResponse) – A message of data from a read rows stream.
- __next__()¶
Get the next row in the page.
- to_arrow()[source]¶
Create a pyarrow.RecordBatch of rows in the page.
- Returns
Rows from the message, as an Arrow record batch.
- Return type
pyarrow.RecordBatch
- to_dataframe(dtypes=None)[source]¶
Create a pandas.DataFrame of rows in the page.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the stream.
- Return type
pandas.DataFrame
- class google.cloud.bigquery_storage_v1.reader.ReadRowsStream(client, name, offset, read_rows_kwargs, retry_delay_callback=None)[source]¶
A stream of results from a read rows request.
This stream is an iterable of ReadRowsResponse. Iterate over it to fetch all row messages.
If the fastavro library is installed, use the rows() method to parse all messages into a stream of row dictionaries.
If the pandas and fastavro libraries are installed, use the to_dataframe() method to parse all messages into a pandas.DataFrame.
This object should not be created directly, but is returned by other methods in this library.
Construct a ReadRowsStream.
- Parameters
client (BigQueryReadClient) – A GAPIC client used to reconnect to a ReadRows stream. This must be the GAPIC client to avoid a circular dependency on this class.
name (str) – Required. Stream ID from which rows are being read.
offset (int) – Required. Position in the stream to start reading from. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined.
read_rows_kwargs (dict) – Keyword arguments to use when reconnecting to a ReadRows stream.
retry_delay_callback (Optional[Callable[[float], None]]) – If the client receives a retryable error that asks the client to delay its next attempt and retry_delay_callback is not None, ReadRowsStream will call retry_delay_callback with the delay duration (in seconds) before it starts sleeping until the next attempt.
- Returns
A sequence of row messages.
- Return type
Iterable[ReadRowsResponse]
- __iter__()[source]¶
An iterable of messages.
- Returns
A sequence of row messages.
- Return type
Iterable[ReadRowsResponse]
- rows(read_session=None)[source]¶
Iterate over all rows in the stream.
This method requires the fastavro library in order to parse row messages in avro format. For arrow format messages, the pyarrow library is required.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
read_session (Optional[ReadSession]) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733
- Returns
A sequence of rows, represented as dictionaries.
- Return type
Iterable[Mapping]
- to_arrow(read_session=None)[source]¶
Create a pyarrow.Table of all rows in the stream.
This method requires the pyarrow library and a stream using the Arrow format.
- Parameters
read_session (ReadSession) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733
- Returns
A table of all rows in the stream.
- Return type
pyarrow.Table
- to_dataframe(read_session=None, dtypes=None)[source]¶
Create a pandas.DataFrame of all rows in the stream.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings.
- Parameters
read_session (ReadSession) – This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. When row_restriction is applied, some streams may be empty without read_session info. Provide this argument to avoid an error. For more information, see https://github.com/googleapis/python-bigquery-storage/issues/733
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the stream.
- Return type
pandas.DataFrame
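A sketch of the dtypes override (the column name is hypothetical; client, stream_name, and session as in the earlier read examples):

frame = client.read_rows(stream_name).to_dataframe(
    session,
    dtypes={"price": "float32"},  # force this column's dtype; others use defaults
)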