google.cloud.bigquery.table.RowIterator¶
- class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]¶
A class for iterating through HTTP/JSON API row list responses.
- Parameters
client (Optional[google.cloud.bigquery.Client]) – The API client instance. This should always be non-None, except for subclasses that do not use it, namely the _EmptyRowIterator.
api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.
path (str) – The method path to query for the list of items.
schema (Sequence[Union[SchemaField, Mapping[str, Any]]]) – The table’s schema. If any item is a mapping, its content must be compatible with from_api_repr().
page_token (str) – A token identifying a page in a result set to start fetching results from.
max_results (Optional[int]) – The maximum number of results to fetch.
page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
extra_params (Optional[Dict[str, object]]) – Extra query string parameters for the API call.
table (Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]) – The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.
selected_fields (Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]) – A subset of columns to select from this table.
total_rows (Optional[int]) – Total number of rows in the table.
first_page_response (Optional[dict]) – API response for the first page of results. These are returned when the first page is requested.
- __init__(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]¶
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(client, api_request, path, schema) – Initialize self.
to_arrow([progress_bar_type, …]) – [Beta] Create a pyarrow.Table by loading all pages of a table or query.
to_arrow_iterable([bqstorage_client, …]) – [Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.
to_dataframe([bqstorage_client, dtypes, …]) – Create a pandas DataFrame by loading all pages of a query.
to_dataframe_iterable([bqstorage_client, …]) – Create an iterable of pandas DataFrames, to process the table as a stream.
to_geodataframe([bqstorage_client, dtypes, …]) – Create a GeoPandas GeoDataFrame by loading all pages of a query.
Attributes
job_id – ID of the query job (if applicable).
location – Location where the query executed (if applicable).
num_dml_affected_rows – If this RowIterator is the result of a DML query, the number of rows that were affected.
pages – Iterator of pages in the response.
project – GCP Project ID where these rows are read from.
query_id – [Preview] ID of a completed query.
schema – The subset of columns to be read from the table.
total_rows – The total number of rows in the table or query results.
- __iter__()¶
Iterator for each item returned.
- Returns
A generator of items from the API.
- Return type
types.GeneratorType[Any]
- Raises
ValueError – If the iterator has already been started.
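Example – a minimal sketch of iterating a RowIterator obtained from a query; the public dataset and query below are illustrative, and application default credentials are assumed:
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials
rows = client.query(
    "SELECT name, SUM(number) AS total"
    " FROM `bigquery-public-data.usa_names.usa_1910_2013`"
    " GROUP BY name ORDER BY total DESC LIMIT 10"
).result()  # QueryJob.result() returns a RowIterator

for row in rows:  # each item is a google.cloud.bigquery.table.Row
    print(row["name"], row["total"])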
- client¶
The client that created this iterator.
- Type
Optional[Any]
- item_to_value¶
Callable to convert an item from the type in the raw API response into the native object. Will be called with the iterator and a single item.
- Type
Callable[Iterator, Any]
- property job_id: Optional[str]¶
ID of the query job (if applicable).
To get the job metadata, call job = client.get_job(rows.job_id, location=rows.location).
- property location: Optional[str]¶
Location where the query executed (if applicable).
- next_page_token¶
The token for the next page of results. If this is set before the iterator starts, it effectively offsets the iterator to a specific starting point.
- Type
str
- property num_dml_affected_rows: Optional[int]¶
If this RowIterator is the result of a DML query, the number of rows that were affected.
- property pages¶
Iterator of pages in the response.
- Returns
A generator of page instances.
- Return type
types.GeneratorType[google.api_core.page_iterator.Page]
- Raises
ValueError – If the iterator has already been started.
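Example – a minimal sketch of consuming results page by page; the public table ID and page size are illustrative:
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows(
    "bigquery-public-data.usa_names.usa_1910_2013", page_size=500
)  # returns a RowIterator

for page in rows.pages:  # google.api_core.page_iterator.Page instances
    print("rows in this page:", page.num_items)
    for row in page:
        print(row)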
- property project: Optional[str]¶
GCP Project ID where these rows are read from.
- property query_id: Optional[str]¶
[Preview] ID of a completed query.
This ID is auto-generated and not guaranteed to be populated.
- property schema¶
The subset of columns to be read from the table.
- Type
List[google.cloud.bigquery.schema.SchemaField]
- to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True) → pyarrow.Table[source]¶
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
- Parameters
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:
None – No progress bar.
'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.
'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.
This method requires the google-cloud-bigquery-storage library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
- Returns
pyarrow.Table – A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.
- Raises
ValueError – If the pyarrow library cannot be imported.
New in version 1.17.0.
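Example – a minimal sketch of materializing query results as an Arrow table; the query is illustrative, and the progress bar assumes the tqdm package is installed:
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, SUM(number) AS total"
    " FROM `bigquery-public-data.usa_names.usa_1910_2013`"
    " GROUP BY name"
).result()

arrow_table = rows.to_arrow(progress_bar_type="tqdm")  # pyarrow.Table
print(arrow_table.num_rows, arrow_table.schema)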
- to_arrow_iterable(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, max_queue_size: int = <object object>) → Iterator[pyarrow.RecordBatch][source]¶
[Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the pyarrow and google-cloud-bigquery-storage libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.
- Returns
A generator of RecordBatch.
- Return type
pyarrow.RecordBatch
New in version 2.31.0.
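Example – a minimal sketch of streaming results as Arrow record batches instead of materializing them all at once; the query is illustrative:
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 100000"
).result()

total_rows = 0
for record_batch in rows.to_arrow_iterable():  # one pyarrow.RecordBatch per chunk
    total_rows += record_batch.num_rows
print("rows streamed:", total_rows)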
- to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_as_object: bool = False, bool_dtype: Optional[Any] = <DefaultPandasDTypes.BOOL_DTYPE: <object object>>, int_dtype: Optional[Any] = <DefaultPandasDTypes.INT_DTYPE: <object object>>, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = <DefaultPandasDTypes.DATE_DTYPE: <object object>>, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = <DefaultPandasDTypes.TIME_DTYPE: <object object>>, timestamp_dtype: Optional[Any] = None) → pandas.DataFrame[source]¶
Create a pandas DataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the google-cloud-bigquery-storage library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:
None – No progress bar.
'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.
'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
New in version 1.11.0.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
New in version 1.24.0.
geography_as_object (Optional[bool]) –
If True, convert GEOGRAPHY data to shapely geometry objects. If False (default), don’t cast geography data to shapely geometry objects.
New in version 2.24.0.
bool_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.BooleanDtype()) to convert BigQuery Boolean type, instead of relying on the default pandas.BooleanDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("bool"). BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
New in version 3.8.0.
int_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Int64Dtype()) to convert BigQuery Integer types, instead of relying on the default pandas.Int64Dtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("int64"). A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
New in version 3.8.0.
float_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.Float32Dtype()) to convert BigQuery Float type, instead of relying on the default numpy.dtype("float64"). If you explicitly set the value to None, then the data type will be numpy.dtype("float64"). BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
New in version 3.8.0.
string_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.StringDtype()) to convert BigQuery String type, instead of relying on the default numpy.dtype("object"). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
New in version 3.8.0.
date_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.date32())) to convert BigQuery Date type, instead of relying on the default db_dtypes.DateDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type
New in version 3.10.0.
datetime_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us"))) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
New in version 3.10.0.
time_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.time64("us"))) to convert BigQuery Time type, instead of relying on the default db_dtypes.TimeDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
New in version 3.10.0.
timestamp_dtype (Optional[pandas.Series.dtype, None]) –
If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))) to convert BigQuery Timestamp type, instead of relying on the default numpy.dtype("datetime64[ns, UTC]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns, UTC]") or object if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
New in version 3.10.0.
- Returns
A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.
- Return type
pandas.DataFrame
- Raises
ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported. Also if geography_as_object is True but the shapely library cannot be imported, or if bool_dtype, int_dtype, or any other dtype parameter is not a supported dtype.
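Example – a minimal sketch of loading results into a DataFrame with explicit nullable dtypes; the query is illustrative and the dtype arguments shown are optional:
import pandas
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, SUM(number) AS total"
    " FROM `bigquery-public-data.usa_names.usa_1910_2013`"
    " GROUP BY name"
).result()

df = rows.to_dataframe(
    int_dtype=pandas.Int64Dtype(),      # nullable integer columns
    string_dtype=pandas.StringDtype(),  # pandas string columns instead of object
    progress_bar_type="tqdm",           # requires the tqdm package
)
print(df.dtypes)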
- to_dataframe_iterable(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, max_queue_size: int = <object object>) → pandas.DataFrame[source]¶
Create an iterable of pandas DataFrames, to process the table as a stream.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the google-cloud-bigquery-storage library.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
max_queue_size (Optional[int]) –
The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.
By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.
New in version 2.14.0.
- Returns
A generator of DataFrame.
- Return type
pandas.DataFrame
- Raises
ValueError – If the pandas library cannot be imported.
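Example – a minimal sketch of processing results chunk by chunk as DataFrames rather than loading everything into memory; the query and the running-total aggregation are illustrative:
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT name, number FROM `bigquery-public-data.usa_names.usa_1910_2013`"
).result()

running_total = 0
for df in rows.to_dataframe_iterable():  # one pandas.DataFrame per chunk
    running_total += df["number"].sum()
print("total:", running_total)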
- to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_column: Optional[str] = None) → geopandas.GeoDataFrame[source]¶
Create a GeoPandas GeoDataFrame by loading all pages of a query.
- Parameters
bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –
A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.
This method requires the pyarrow and google-cloud-bigquery-storage libraries.
This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary of column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
progress_bar_type (Optional[str]) –
If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:
None – No progress bar.
'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.
'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.
create_bqstorage_client (Optional[bool]) –
If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.
This argument does nothing if bqstorage_client is supplied.
geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a geopandas GeoDataFrame. This option can be omitted if there’s only one GEOGRAPHY column.
- Returns
A geopandas.GeoDataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.
- Return type
geopandas.GeoDataFrame
- Raises
ValueError – If the geopandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported.
New in version 2.24.0.
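Example – a minimal sketch of reading a GEOGRAPHY column into a GeoDataFrame; the query simply builds a point literal, and the column name location is an illustrative placeholder (geography_column is only needed when more than one GEOGRAPHY column is present):
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query(
    "SELECT 'hq' AS name, ST_GEOGPOINT(-122.084, 37.422) AS location"
).result()

gdf = rows.to_geodataframe(geography_column="location")  # geopandas.GeoDataFrame
print(gdf.geometry.name, gdf.crs)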