As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

google.cloud.bigquery.table.RowIterator

class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None)[source]

A class for iterating through HTTP/JSON API row list responses.

Parameters
  • client (google.cloud.bigquery.Client) – The API client.

  • api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.

  • path (str) – The method path to query for the list of items.

  • schema (Sequence[Union[ SchemaField, Mapping[str, Any] ]]) – The table’s schema. If any item is a mapping, its content must be compatible with from_api_repr().

  • page_token (str) – A token identifying a page in a result set to start fetching results from.

  • max_results (Optional[int]) – The maximum number of results to fetch.

  • page_size (Optional[int]) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.

  • extra_params (Optional[Dict[str, object]]) – Extra query string parameters for the API call.

  • table (Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]) – The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.

  • selected_fields (Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]) – A subset of columns to select from this table.

  • total_rows (Optional[int]) – Total number of rows in the table.

  • first_page_response (Optional[dict]) – API response for the first page of results. These are returned when the first page is requested.
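A RowIterator is not normally constructed directly with these parameters; it is typically returned by calls such as Client.list_rows() or QueryJob.result(). A minimal usage sketch (the table ID is hypothetical; actually running it requires the google-cloud-bigquery package and valid credentials):

```python
def print_first_rows(table_id: str, n: int = 10) -> None:
    # Sketch only: needs `pip install google-cloud-bigquery` and
    # application default credentials to run against a real table.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.list_rows(table_id, max_results=n)  # returns a RowIterator
    for row in rows:  # each item is a google.cloud.bigquery.table.Row
        print(dict(row))
```

For example, print_first_rows("my-project.my_dataset.my_table") would print the first ten rows as dictionaries.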

__init__(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None, total_rows=None, first_page_response=None)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(client, api_request, path, schema)

Initialize self.

to_arrow([progress_bar_type, …])

[Beta] Create a pyarrow.Table by loading all pages of a table or query.

to_dataframe([bqstorage_client, dtypes, …])

Create a pandas DataFrame by loading all pages of a query.

to_dataframe_iterable([bqstorage_client, dtypes])

Create an iterable of pandas DataFrames, to process the table as a stream.

Attributes

pages

Iterator of pages in the response.

schema

The subset of columns to be read from the table.

total_rows

The total number of rows in the table.

__iter__()

Iterator for each item returned.

Returns

A generator of items from the API.

Return type

types.GeneratorType[Any]

Raises

ValueError – If the iterator has already been started.
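The start-once contract above can be illustrated with a plain-Python stand-in that mimics the documented behavior (this is not the real class, just a sketch of the semantics):

```python
class OneShotIterator:
    """Stand-in mimicking RowIterator's documented start-once behavior."""

    def __init__(self, items):
        self._items = list(items)
        self._started = False
        self.num_results = 0  # results fetched so far, as in RowIterator

    def __iter__(self):
        if self._started:
            # RowIterator raises ValueError if iteration is restarted.
            raise ValueError("Iterator has already started")
        self._started = True
        return self._generate()

    def _generate(self):
        for item in self._items:
            self.num_results += 1
            yield item
```

If you need to traverse the results more than once, materialize them first (for example with list(rows) or to_dataframe()) rather than iterating the RowIterator again.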

client

The client that created this iterator.

Type

Optional[Any]

item_to_value

Callable to convert an item from the type in the raw API response into the native object. Will be called with the iterator and a single item.

Type

Callable[Iterator, Any]

max_results

The maximum number of results to fetch.

Type

int

next_page_token

The token for the next page of results. If this is set before the iterator starts, it effectively offsets the iterator to a specific starting point.

Type

str

num_results

The total number of results fetched so far.

Type

int

page_number

The current page of results.

Type

int

property pages

Iterator of pages in the response.

Returns

A generator of page instances.

Return type

types.GeneratorType[google.api_core.page_iterator.Page]

Raises

ValueError – If the iterator has already been started.
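Page-by-page processing can be useful for tracking download progress. A small sketch, assuming each yielded google.api_core.page_iterator.Page exposes its item count via the num_items attribute:

```python
def count_rows_per_page(row_iterator):
    """Tally how many rows each response page contained.

    Works with any iterator exposing a ``pages`` property that yields
    google.api_core.page_iterator.Page objects, such as RowIterator.
    """
    return [page.num_items for page in row_iterator.pages]
```

Like __iter__(), the pages property consumes the iterator, so it can only be traversed once.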

property schema

The subset of columns to be read from the table.

Type

List[google.cloud.bigquery.schema.SchemaField]

to_arrow(progress_bar_type=None, bqstorage_client=None, create_bqstorage_client=True)[source]

[Beta] Create a pyarrow.Table by loading all pages of a table or query.

Parameters
  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None

    No progress bar.

    'tqdm'

    Use the tqdm.tqdm() function to print a progress bar to sys.stderr.

    'tqdm_notebook'

    Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui'

    Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API.

    This method requires the pyarrow and google-cloud-bigquery-storage libraries.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • create_bqstorage_client (Optional[bool]) –

    If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

    This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.

Returns

pyarrow.Table

A pyarrow.Table populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Raises

ValueError – If the pyarrow library cannot be imported.

New in version 1.17.0.
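A hedged sketch of calling to_arrow(); the keyword choices shown are one possible configuration, not the only one:

```python
def rows_to_arrow(row_iterator):
    """Download every page of results as a pyarrow.Table.

    create_bqstorage_client=False keeps the download on the REST API,
    avoiding the optional google-cloud-bigquery-storage dependency;
    drop it (or pass bqstorage_client) to speed up large downloads.
    """
    return row_iterator.to_arrow(
        progress_bar_type="tqdm",  # requires the tqdm package
        create_bqstorage_client=False,
    )
```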

to_dataframe(bqstorage_client=None, dtypes=None, progress_bar_type=None, create_bqstorage_client=True, date_as_object=True)[source]

Create a pandas DataFrame by loading all pages of a query.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the pyarrow and google-cloud-bigquery-storage libraries.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None

    No progress bar.

    'tqdm'

    Use the tqdm.tqdm() function to print a progress bar to sys.stderr.

    'tqdm_notebook'

    Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.

    'tqdm_gui'

    Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

New in version 1.11.0.

  • create_bqstorage_client (Optional[bool]) –

    If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information.

    This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.

  • date_as_object (Optional[bool]) –

    If True (default), cast dates to objects. If False, convert to datetime64[ns] dtype.

New in version 1.26.0.

Returns

A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Return type

pandas.DataFrame

Raises

ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1 module is required but cannot be imported.
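A sketch combining the dtypes and date_as_object parameters; the column name "score" is a hypothetical example, not part of the API:

```python
def rows_to_dataframe(row_iterator):
    """Load all result pages into a pandas DataFrame.

    The dtypes mapping forces a (hypothetical) column named "score"
    to float32 instead of the pandas default; date_as_object=False
    converts DATE columns to datetime64[ns] rather than objects.
    """
    return row_iterator.to_dataframe(
        dtypes={"score": "float32"},
        date_as_object=False,
    )
```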

to_dataframe_iterable(bqstorage_client=None, dtypes=None)[source]

Create an iterable of pandas DataFrames, to process the table as a stream.

Parameters
  • bqstorage_client (Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]) –

    A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery.

    This method requires the pyarrow and google-cloud-bigquery-storage libraries.

    This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

  • dtypes (Optional[Map[str, Union[str, pandas.Series.dtype]]]) – A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A generator of pandas.DataFrame objects, one per page of results.

Return type

types.GeneratorType[pandas.DataFrame]

Raises

ValueError – If the pandas library cannot be imported.
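Streaming page by page keeps memory bounded when the full result set would not fit in a single DataFrame. A sketch of that pattern (the column name is an assumption supplied by the caller):

```python
def sum_column_streaming(row_iterator, column):
    """Compute a column total one DataFrame page at a time.

    Only the current page and a running total are held in memory,
    unlike to_dataframe(), which loads every page at once.
    """
    total = 0.0
    for frame in row_iterator.to_dataframe_iterable():
        total += frame[column].sum()
    return total
```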

property total_rows

The total number of rows in the table.

Type

int