
google.cloud.bigquery.table.RowIterator

class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]

A class for iterating through HTTP/JSON API row list responses.

Parameters
  • client (Optional[google.cloud.bigquery.Client]) – The API client instance. This should always be non-None, except for subclasses that do not use it, namely the _EmptyRowIterator.

  • api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.

  • path (str) – The method path to query for the list of items.

  • schema (Sequence[Union[ SchemaField, Mapping[str, Any] ]]) – The table’s schema. If any item is a mapping, its content must be compatible with from_api_repr().

  • page_token (str) – A token identifying a page in a result set to start fetching results from.

  • max_results (Optional[int]) – The maximum number of results to fetch.

  • page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.

  • extra_params (Optional[Dict[str, object]]) – Extra query string parameters for the API call.

  • table (Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]) – The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.

  • selected_fields (Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]) – A subset of columns to select from this table.

  • total_rows (Optional[int]) – Total number of rows in the table.

  • first_page_response (Optional[dict]) – API response for the first page of results. These are returned when the first page is requested.

__init__(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None, location: Optional[str] = None, job_id: Optional[str] = None, query_id: Optional[str] = None, project: Optional[str] = None, num_dml_affected_rows: Optional[int] = None)[source]

Initialize self. See help(type(self)) for accurate signature.
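A minimal usage sketch (not part of the original reference): RowIterator instances are normally obtained from query results or table listings rather than constructed directly. The project, dataset, and table names below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # QueryJob.result() returns a RowIterator over the query results.
    rows = client.query(
        "SELECT name, age FROM `your-project.your_dataset.people`"
    ).result()

    # Client.list_rows() also returns a RowIterator.
    # rows = client.list_rows("your-project.your_dataset.people", max_results=100)

    for row in rows:
        # Row values can be accessed by field name or index.
        print(row["name"], row["age"])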

Methods

__init__(client, api_request, path, schema) – Initialize self.

to_arrow([progress_bar_type, …]) – [Beta] Create a pyarrow.Table by loading all pages of a table or query.

to_arrow_iterable([bqstorage_client, …]) – [Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.

to_dataframe([bqstorage_client, dtypes, …]) – Create a pandas DataFrame by loading all pages of a query.

to_dataframe_iterable([bqstorage_client, …]) – Create an iterable of pandas DataFrames, to process the table as a stream.

to_geodataframe([bqstorage_client, dtypes, …]) – Create a GeoPandas GeoDataFrame by loading all pages of a query.

Attributes

job_id – ID of the query job (if applicable).

location – Location where the query executed (if applicable).

num_dml_affected_rows – If this RowIterator is the result of a DML query, the number of rows that were affected.

pages – Iterator of pages in the response.

project – GCP Project ID where these rows are read from.

query_id – [Preview] ID of a completed query.

schema – The subset of columns to be read from the table.

total_rows – The total number of rows in the table or query results.

__iter__()

Iterator for each item returned.

Returns

A generator of items from the API.

Return type

types.GeneratorType[Any]

Raises

ValueError – If the iterator has already been started.

client

The client that created this iterator.

Type

Optional[Any]

item_to_value

Callable to convert an item from the type in the raw API response into the native object. Will be called with the iterator and a single item.

Type

Callable[Iterator, Any]

property job_id: Optional[str]

ID of the query job (if applicable).

To get the job metadata, call job = client.get_job(rows.job_id, location=rows.location).

property location: Optional[str]

Location where the query executed (if applicable).

See: https://cloud.google.com/bigquery/docs/locations
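A short sketch of the get_job pattern mentioned above, assuming client is the bigquery.Client that produced rows:

    if rows.job_id is not None:
        # Fetch the metadata of the query job that produced these rows.
        job = client.get_job(rows.job_id, location=rows.location)
        print(job.state, job.total_bytes_processed)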

max_results

The maximum number of results to fetch.

Type

int

next_page_token

The token for the next page of results. If this is set before the iterator starts, it effectively offsets the iterator to a specific starting point.

Type

str

property num_dml_affected_rows: Optional[int]

If this RowIterator is the result of a DML query, the number of rows that were affected.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#body.QueryResponse.FIELDS.num_dml_affected_rows
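A hedged sketch, assuming client is a bigquery.Client; the table name is a placeholder:

    rows = client.query(
        "UPDATE `your-project.your_dataset.people` SET age = age + 1 WHERE TRUE"
    ).result()

    # None for non-DML statements; otherwise the number of modified rows.
    print(rows.num_dml_affected_rows)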

num_results

The total number of results fetched so far.

Type

int

page_number

The current page of results.

Type

int

property pages

Iterator of pages in the response.

Returns

A generator of page instances.

Return type

types.GeneratorType[google.api_core.page_iterator.Page]

Raises

ValueError – If the iterator has already been started.
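A minimal sketch of page-wise consumption; accessing pages after iteration has started raises ValueError, so use a fresh iterator:

    for page in rows.pages:
        # Each page is a google.api_core.page_iterator.Page.
        print("page with", page.num_items, "rows")
        for row in page:
            ...  # handle each Row in this page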

property project: Optional[str]

GCP Project ID where these rows are read from.

property query_id: Optional[str]

[Preview] ID of a completed query.

This ID is auto-generated and not guaranteed to be populated.

property schema

The subset of columns to be read from the table.

Type

List[google.cloud.bigquery.schema.SchemaField]
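A small sketch of inspecting the columns the iterator will read:

    for field in rows.schema:
        # Each entry is a google.cloud.bigquery.schema.SchemaField.
        print(field.name, field.field_type, field.mode)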

to_arrow(progress_bar_type: Optional[str] = None, bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, create_bqstorage_client: bool = True) → pyarrow.Table[source]

[Beta] Create a pyarrow.Table by loading all pages of a table or query.

Parameters
  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None – No progress bar.

    'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.

    'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.

    This method requires the google-cloud-bigquery-storage library.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • create_bqstorage_client (Optional[bool]) –

    If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

    This argument does nothing if bqstorage_client is supplied.

    New in version 1.24.0.

Returns

pyarrow.Table

A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Raises

ValueError – If the pyarrow library cannot be imported.

New in version 1.17.0.
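A minimal sketch, assuming pyarrow is installed and rows is a RowIterator as above; the progress bar additionally requires the tqdm package:

    arrow_table = rows.to_arrow(progress_bar_type="tqdm")
    print(arrow_table.num_rows)
    print(arrow_table.schema)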

to_arrow_iterable(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, max_queue_size: int = <object object>) → Iterator[pyarrow.RecordBatch][source]

[Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the pyarrow and google-cloud-bigquery-storage libraries.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • max_queue_size (Optional[int]) –

    The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.

    By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.

Returns

A generator of RecordBatch.

Return type

pyarrow.RecordBatch

New in version 2.31.0.
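A minimal sketch of stream processing with bounded memory; each yielded item is a pyarrow.RecordBatch:

    seen = 0
    for record_batch in rows.to_arrow_iterable():
        seen += record_batch.num_rows
    print(seen)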

to_dataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_as_object: bool = False, bool_dtype: Optional[Any] = <DefaultPandasDTypes.BOOL_DTYPE: <object object>>, int_dtype: Optional[Any] = <DefaultPandasDTypes.INT_DTYPE: <object object>>, float_dtype: Optional[Any] = None, string_dtype: Optional[Any] = None, date_dtype: Optional[Any] = <DefaultPandasDTypes.DATE_DTYPE: <object object>>, datetime_dtype: Optional[Any] = None, time_dtype: Optional[Any] = <DefaultPandasDTypes.TIME_DTYPE: <object object>>, timestamp_dtype: Optional[Any] = None) → pandas.DataFrame[source]

Create a pandas DataFrame by loading all pages of a query.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the google-cloud-bigquery-storage library.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None – No progress bar.

    'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.

    'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

    New in version 1.11.0.

  • create_bqstorage_client (Optional[bool]) –

    If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

    This argument does nothing if bqstorage_client is supplied.

    New in version 1.24.0.

  • geography_as_object (Optional[bool]) –

    If True, convert GEOGRAPHY data to shapely geometry objects. If False (default), don’t cast geography data to shapely geometry objects.

    New in version 2.24.0.

  • bool_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.BooleanDtype()) to convert BigQuery Boolean type, instead of relying on the default pandas.BooleanDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("bool"). BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type

    New in version 3.8.0.

  • int_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.Int64Dtype()) to convert BigQuery Integer types, instead of relying on the default pandas.Int64Dtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("int64"). A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types

    New in version 3.8.0.

  • float_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.Float32Dtype()) to convert BigQuery Float type, instead of relying on the default numpy.dtype("float64"). If you explicitly set the value to None, then the data type will be numpy.dtype("float64"). BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types

    New in version 3.8.0.

  • string_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.StringDtype()) to convert BigQuery String type, instead of relying on the default numpy.dtype("object"). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type

    New in version 3.8.0.

  • date_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.date32())) to convert BigQuery Date type, instead of relying on the default db_dtypes.DateDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type

    New in version 3.10.0.

  • datetime_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us"))) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type

    New in version 3.10.0.

  • time_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.time64("us"))) to convert BigQuery Time type, instead of relying on the default db_dtypes.TimeDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type

    New in version 3.10.0.

  • timestamp_dtype (Optional[pandas.Series.dtype, None]) –

    If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))) to convert BigQuery Timestamp type, instead of relying on the default numpy.dtype("datetime64[ns, UTC]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns, UTC]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type

    New in version 3.10.0.

Returns

A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Return type

pandas.DataFrame

Raises

ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported. Also raised if geography_as_object is True but the shapely library cannot be imported, or if bool_dtype, int_dtype, or another dtype parameter is set to an unsupported dtype.
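A minimal sketch, assuming pandas is installed; the column name "age" and the dtype override are placeholders:

    df = rows.to_dataframe(
        dtypes={"age": "float64"},     # per-column dtype override
        progress_bar_type="tqdm",      # optional; requires the tqdm package
        create_bqstorage_client=True,  # default; uses the Storage API when available
    )
    print(df.dtypes)
    print(df.head())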

to_dataframe_iterable(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, max_queue_size: int = <object object>) → pandas.DataFrame[source]

Create an iterable of pandas DataFrames, to process the table as a stream.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the google-cloud-bigquery-storage library.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

  • max_queue_size (Optional[int]) –

    The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if Storage API is not used.

    By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.

    New in version 2.14.0.

Returns

A generator of DataFrame.

Return type

pandas.DataFrame

Raises

ValueError – If the pandas library cannot be imported.
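A minimal sketch of processing the results as a stream of DataFrames, so the full result set never has to fit in memory at once:

    row_count = 0
    for frame in rows.to_dataframe_iterable(max_queue_size=2):
        # Each frame is a pandas.DataFrame holding one chunk of the results.
        row_count += len(frame)
    print(row_count)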

to_geodataframe(bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: Optional[Dict[str, Any]] = None, progress_bar_type: Optional[str] = None, create_bqstorage_client: bool = True, geography_column: Optional[str] = None) → geopandas.GeoDataFrame[source]

Create a GeoPandas GeoDataFrame by loading all pages of a query.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the pyarrow and google-cloud-bigquery-storage libraries.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None – No progress bar.

    'tqdm' – Use the tqdm.tqdm() function to print a progress bar to sys.stdout.

    'tqdm_notebook' – Use the tqdm.notebook.tqdm() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui' – Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

  • create_bqstorage_client (Optional[bool]) –

    If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

    This argument does nothing if bqstorage_client is supplied.

  • geography_column (Optional[str]) – If there is more than one GEOGRAPHY column, identifies which one to use to construct a geopandas GeoDataFrame. This option can be omitted if there is only one GEOGRAPHY column.

Returns

A geopandas.GeoDataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Return type

geopandas.GeoDataFrame

Raises

ValueError – If the geopandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported.

New in version 2.24.0.
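A minimal sketch, assuming geopandas is installed; the column name "boundary" is a placeholder, and geography_column is only needed when the results contain more than one GEOGRAPHY column:

    gdf = rows.to_geodataframe(geography_column="boundary")
    print(gdf.crs)
    print(gdf.head())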

property total_rows

The total number of rows in the table or query results.

Type

int