As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

Python Client for Cloud Dataflow

beta pypi versions

Cloud Dataflow: Unified stream and batch data processing that’s serverless, fast, and cost-effective.

Quick Start

In order to use this library, you first need to go through the following steps:

  1. Select or create a Cloud Platform project.

  2. Enable billing for your project.

  3. Enable the APIs.

  4. Setup Authentication.

Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it’s possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.

Mac/Linux

pip install virtualenv
virtualenv <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-cloud-dataflow-client

Windows

pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-cloud-dataflow-client

Next Steps

Note

Because this client uses grpc library, it is safe to share instances across threads. In multiprocessing scenarios, the best practice is to create client instances after the invocation of os.fork() by multiprocessing.pool.Pool or multiprocessing.Process.

API Reference

Changelog

For a list of all google-cloud-dataflow-client releases: