As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

Types for Google Cloud Speech v2 API

class google.cloud.speech_v2.types.AutoDetectDecodingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Automatically detected decoding parameters. Supported for the following encodings:

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the “native FLAC” container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

class google.cloud.speech_v2.types.BatchRecognizeFileMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about a single file in a batch for BatchRecognize.

uri

Cloud Storage URI for the audio file.

This field is a member of oneof audio_source.

Type

str

config

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the [config_mask][google.cloud.speech.v2.BatchRecognizeFileMetadata.config_mask] field can be used to override parts of the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the Recognizer resource as well as the [config][google.cloud.speech.v2.BatchRecognizeRequest.config] at the request level.

Type

google.cloud.speech_v2.types.RecognitionConfig

config_mask

The list of fields in [config][google.cloud.speech.v2.BatchRecognizeFileMetadata.config] that override the values in the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in [config][google.cloud.speech.v2.BatchRecognizeFileMetadata.config] override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, [config][google.cloud.speech.v2.BatchRecognizeFileMetadata.config] completely overrides and replaces the config in the recognizer for this recognition request.

Type

google.protobuf.field_mask_pb2.FieldMask
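The three mask cases described for config_mask can be illustrated with a small, library-free sketch. It is not the service's implementation: apply_config_mask is a hypothetical helper, plain dicts stand in for RecognitionConfig, only top-level field names are handled, and "non-default" is approximated by Python truthiness.

```python
from typing import Any, Dict, Optional, Sequence


def apply_config_mask(
    recognizer_config: Dict[str, Any],
    config: Dict[str, Any],
    mask_paths: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Return the effective config for the three documented mask cases.

    Hypothetical helper: dicts stand in for RecognitionConfig and only
    top-level field names are supported.
    """
    if mask_paths and "*" in mask_paths:
        # Wildcard: config completely replaces the recognizer's config.
        return dict(config)

    effective = dict(recognizer_config)
    if not mask_paths:
        # No mask: every non-default (here: truthy) field in config overrides.
        for field, value in config.items():
            if value:
                effective[field] = value
        return effective

    # Mask given: only the listed fields override, even with default values.
    for field in mask_paths:
        if field in config:
            effective[field] = config[field]
        else:
            effective.pop(field, None)
    return effective
```

For example, with a recognizer config of {"model": "long", "language_codes": ["en-US"]} and a file config of {"model": "short", "language_codes": []}, omitting the mask changes only model, while a mask of ["language_codes"] changes only language_codes.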

class google.cloud.speech_v2.types.BatchRecognizeFileResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Final results for a single file.

uri

The Cloud Storage URI to which recognition results were written.

Type

str

error

Error if one was encountered.

Type

google.rpc.status_pb2.Status

metadata

Type

google.cloud.speech_v2.types.RecognitionResponseMetadata

transcript

The transcript for the audio file. This is populated only when [InlineOutputConfig][google.cloud.speech.v2.InlineOutputConfig] is set in the [RecognitionOutputConfig][google.cloud.speech.v2.RecognitionOutputConfig].

Type

google.cloud.speech_v2.types.BatchRecognizeResults

class google.cloud.speech_v2.types.BatchRecognizeMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Operation metadata for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize].

transcription_metadata

Map from provided filename to the transcription metadata for that file.

Type

MutableMapping[str, google.cloud.speech_v2.types.BatchRecognizeTranscriptionMetadata]

class TranscriptionMetadataEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class google.cloud.speech_v2.types.BatchRecognizeRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] method.

recognizer

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

Type

str
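A minimal sketch of assembling the recognizer name in the expected format. The helper recognizer_name is hypothetical and simply formats a string; its default of _ selects the empty implicit Recognizer mentioned above.

```python
def recognizer_name(project: str, location: str, recognizer: str = "_") -> str:
    """Format a Recognizer resource name (hypothetical helper).

    Passing no recognizer yields the "_" segment, which the field
    description says selects an empty implicit Recognizer.
    """
    return f"projects/{project}/locations/{location}/recognizers/{recognizer}"
```

Calling recognizer_name("my-project", "global") would produce a name ending in recognizers/_; the project and location values here are placeholders.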

config

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the [config_mask][google.cloud.speech.v2.BatchRecognizeRequest.config_mask] field can be used to override parts of the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the Recognizer resource.

Type

google.cloud.speech_v2.types.RecognitionConfig

config_mask

The list of fields in [config][google.cloud.speech.v2.BatchRecognizeRequest.config] that override the values in the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the recognizer during this recognition request. If no mask is provided, all given fields in [config][google.cloud.speech.v2.BatchRecognizeRequest.config] override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, [config][google.cloud.speech.v2.BatchRecognizeRequest.config] completely overrides and replaces the config in the recognizer for this recognition request.

Type

google.protobuf.field_mask_pb2.FieldMask

files

Audio files with file metadata for ASR. At most 5 files can be specified.

Type

MutableSequence[google.cloud.speech_v2.types.BatchRecognizeFileMetadata]
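Because a single request accepts at most 5 files, a caller with more audio files would need to split them across several requests. The sketch below shows one way to group URIs; chunk_uris and MAX_FILES_PER_REQUEST are names invented for this illustration, and the URIs are placeholders.

```python
from typing import Iterator, List, Sequence

# Limit taken from the description of the files field.
MAX_FILES_PER_REQUEST = 5


def chunk_uris(
    uris: Sequence[str], size: int = MAX_FILES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URIs (hypothetical helper)."""
    for start in range(0, len(uris), size):
        yield list(uris[start:start + size])
```

Each yielded group could then populate the files field of one BatchRecognizeRequest.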

recognition_output_config

Configuration options for where to output the transcripts of each file.

Type

google.cloud.speech_v2.types.RecognitionOutputConfig

processing_strategy

Processing strategy to use for this request.

Type

google.cloud.speech_v2.types.BatchRecognizeRequest.ProcessingStrategy

class ProcessingStrategy(value)[source]

Bases: proto.enums.Enum

Possible processing strategies for batch requests.

Values:
PROCESSING_STRATEGY_UNSPECIFIED (0):

Default value for the processing strategy. The request is processed as soon as it is received.

DYNAMIC_BATCHING (1):

If selected, processes the request during lower utilization periods for a price discount. The request is fulfilled within 24 hours.

class google.cloud.speech_v2.types.BatchRecognizeResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] that is packaged into a longrunning [Operation][google.longrunning.Operation].

results

Map from filename to the final result for that file.

Type

MutableMapping[str, google.cloud.speech_v2.types.BatchRecognizeFileResult]

total_billed_duration

When available, billed audio seconds for the corresponding request.

Type

google.protobuf.duration_pb2.Duration

class ResultsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class google.cloud.speech_v2.types.BatchRecognizeResults(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Output type for BatchRecognize transcripts written to Cloud Storage. Although no method in this API returns this proto directly, each Cloud Storage transcript is a serialized instance of it and should be parsed as such.

results

Sequential list of transcription results corresponding to sequential portions of audio.

Type

MutableSequence[google.cloud.speech_v2.types.SpeechRecognitionResult]

metadata

Metadata about the recognition.

Type

google.cloud.speech_v2.types.RecognitionResponseMetadata

class google.cloud.speech_v2.types.BatchRecognizeTranscriptionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about transcription for a single file (for example, progress percent).

progress_percent

How much of the file has been transcribed so far.

Type

int

error

Error if one was encountered.

Type

google.rpc.status_pb2.Status

uri

The Cloud Storage URI to which recognition results will be written.

Type

str

class google.cloud.speech_v2.types.Config(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Message representing the config for the Speech-to-Text API. This includes an optional KMS key with which incoming data will be encrypted.

name

Output only. The name of the config resource. There is exactly one config resource per project per location. The expected format is projects/{project}/locations/{location}/config.

Type

str

kms_key_name

Optional. A KMS key name that, if present, is used to encrypt Speech-to-Text resources at rest. Updating this key does not re-encrypt existing resources; only new resources are encrypted with the new key. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

Type

str
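As a rough client-side check of the expected key name format, the sketch below splits a name into its components. parse_kms_key_name is a hypothetical helper; it only checks the path shape given above and says nothing about whether the key exists or is usable.

```python
import re
from typing import Dict, Optional

# Shape taken from the documented format; each {segment} is assumed to be
# any non-empty run of characters other than "/".
_KMS_KEY_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<crypto_key>[^/]+)"
)


def parse_kms_key_name(name: str) -> Optional[Dict[str, str]]:
    """Return the name's components, or None if the shape does not match."""
    match = _KMS_KEY_PATTERN.fullmatch(name)
    return match.groupdict() if match else None
```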

update_time

Output only. The most recent time this resource was modified.

Type

google.protobuf.timestamp_pb2.Timestamp

class google.cloud.speech_v2.types.CreateCustomClassRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [CreateCustomClass][google.cloud.speech.v2.Speech.CreateCustomClass] method.

custom_class

Required. The CustomClass to create.

Type

google.cloud.speech_v2.types.CustomClass

validate_only

If set, validate the request and preview the CustomClass, but do not actually create it.

Type

bool

custom_class_id

The ID to use for the CustomClass, which will become the final component of the CustomClass’s resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

Type

str
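The ID constraint can be checked before sending a request. The sketch below reads /[a-z][0-9]-/ as "lowercase letters, digits, and hyphens", which is an interpretation rather than a documented regular expression; the server may apply further rules. is_valid_resource_id is a hypothetical helper.

```python
import re

# Assumed reading of the documented constraint: 4-63 characters drawn from
# lowercase letters, digits, and hyphens.
_ID_PATTERN = re.compile(r"[a-z0-9-]{4,63}")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if the ID fits the assumed length and character rules."""
    return _ID_PATTERN.fullmatch(resource_id) is not None
```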

parent

Required. The project and location where this CustomClass will be created. The expected format is projects/{project}/locations/{location}.

Type

str

class google.cloud.speech_v2.types.CreatePhraseSetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [CreatePhraseSet][google.cloud.speech.v2.Speech.CreatePhraseSet] method.

phrase_set

Required. The PhraseSet to create.

Type

google.cloud.speech_v2.types.PhraseSet

validate_only

If set, validate the request and preview the PhraseSet, but do not actually create it.

Type

bool

phrase_set_id

The ID to use for the PhraseSet, which will become the final component of the PhraseSet’s resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

Type

str

parent

Required. The project and location where this PhraseSet will be created. The expected format is projects/{project}/locations/{location}.

Type

str

class google.cloud.speech_v2.types.CreateRecognizerRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [CreateRecognizer][google.cloud.speech.v2.Speech.CreateRecognizer] method.

recognizer

Required. The Recognizer to create.

Type

google.cloud.speech_v2.types.Recognizer

validate_only

If set, validate the request and preview the Recognizer, but do not actually create it.

Type

bool

recognizer_id

The ID to use for the Recognizer, which will become the final component of the Recognizer’s resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

Type

str

parent

Required. The project and location where this Recognizer will be created. The expected format is projects/{project}/locations/{location}.

Type

str

class google.cloud.speech_v2.types.CustomClass(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.

name

Output only. The resource name of the CustomClass. Format: projects/{project}/locations/{location}/customClasses/{custom_class}.

Type

str

uid

Output only. System-assigned unique identifier for the CustomClass.

Type

str

display_name

User-settable, human-readable name for the CustomClass. Must be 63 characters or less.

Type

str

items

A collection of class items.

Type

MutableSequence[google.cloud.speech_v2.types.CustomClass.ClassItem]

state

Output only. The CustomClass lifecycle state.

Type

google.cloud.speech_v2.types.CustomClass.State

create_time

Output only. Creation time.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

Output only. The most recent time this resource was modified.

Type

google.protobuf.timestamp_pb2.Timestamp

delete_time

Output only. The time at which this resource was requested for deletion.

Type

google.protobuf.timestamp_pb2.Timestamp

expire_time

Output only. The time at which this resource will be purged.

Type

google.protobuf.timestamp_pb2.Timestamp

annotations

Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

Type

MutableMapping[str, str]
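The annotation limits lend themselves to a simple pre-flight check. The following is a sketch only: annotation_problems is a hypothetical helper, and it measures length with Python's len(), which counts code points and may differ from how the service counts characters.

```python
from typing import List, Mapping

# Limits taken from the description of the annotations field.
MAX_ANNOTATIONS = 100
MAX_LENGTH = 63


def annotation_problems(annotations: Mapping[str, str]) -> List[str]:
    """Return human-readable descriptions of any limit violations."""
    problems = []
    if len(annotations) > MAX_ANNOTATIONS:
        problems.append(
            f"{len(annotations)} annotations exceeds the limit of {MAX_ANNOTATIONS}"
        )
    for key, value in annotations.items():
        if len(key) > MAX_LENGTH:
            problems.append(f"key {key!r} is longer than {MAX_LENGTH} characters")
        if len(value) > MAX_LENGTH:
            problems.append(f"value for {key!r} is longer than {MAX_LENGTH} characters")
    return problems
```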

etag

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

reconciling

Output only. Whether or not this CustomClass is in the process of being updated.

Type

bool

kms_key_name

Output only. The KMS key name with which the CustomClass is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

Type

str

kms_key_version_name

Output only. The KMS key version name with which the CustomClass is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

Type

str

class AnnotationsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class ClassItem(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

An item of the class.

value

The class item’s value.

Type

str

class State(value)[source]

Bases: proto.enums.Enum

Set of states that define the lifecycle of a CustomClass.

Values:
STATE_UNSPECIFIED (0):

Unspecified state. This is used only to distinguish unset values.

ACTIVE (2):

The normal and active state.

DELETED (4):

This CustomClass has been deleted.

class google.cloud.speech_v2.types.DeleteCustomClassRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeleteCustomClass][google.cloud.speech.v2.Speech.DeleteCustomClass] method.

name

Required. The name of the CustomClass to delete. Format: projects/{project}/locations/{location}/customClasses/{custom_class}

Type

str

validate_only

If set, validate the request and preview the deleted CustomClass, but do not actually delete it.

Type

bool

allow_missing

If set to true, and the CustomClass is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

class google.cloud.speech_v2.types.DeletePhraseSetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeletePhraseSet][google.cloud.speech.v2.Speech.DeletePhraseSet] method.

name

Required. The name of the PhraseSet to delete. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}

Type

str

validate_only

If set, validate the request and preview the deleted PhraseSet, but do not actually delete it.

Type

bool

allow_missing

If set to true, and the PhraseSet is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

class google.cloud.speech_v2.types.DeleteRecognizerRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeleteRecognizer][google.cloud.speech.v2.Speech.DeleteRecognizer] method.

name

Required. The name of the Recognizer to delete. Format: projects/{project}/locations/{location}/recognizers/{recognizer}

Type

str

validate_only

If set, validate the request and preview the deleted Recognizer, but do not actually delete it.

Type

bool

allow_missing

If set to true, and the Recognizer is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

class google.cloud.speech_v2.types.ExplicitDecodingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Explicitly specified decoding parameters.

encoding

Required. Encoding of the audio data sent for recognition.

Type

google.cloud.speech_v2.types.ExplicitDecodingConfig.AudioEncoding

sample_rate_hertz

Sample rate in Hertz of the audio data sent for recognition. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that’s not possible, use the native sample rate of the audio source (instead of re-sampling). Supported for the following encodings:

  • LINEAR16: Headerless 16-bit signed little-endian PCM samples.

  • MULAW: Headerless 8-bit companded mulaw samples.

  • ALAW: Headerless 8-bit companded alaw samples.

Type

int

audio_channel_count

Number of channels present in the audio data sent for recognition. Supported for the following encodings:

  • LINEAR16: Headerless 16-bit signed little-endian PCM samples.

  • MULAW: Headerless 8-bit companded mulaw samples.

  • ALAW: Headerless 8-bit companded alaw samples.

The maximum allowed value is 8.

Type

int
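The documented ranges for sample_rate_hertz and audio_channel_count can be checked locally before building a request. This is a sketch under stated assumptions: explicit_decoding_problems is a hypothetical helper that checks only the two limits given above and does not model any other server-side validation.

```python
from typing import List

# Ranges taken from the field descriptions above.
MIN_SAMPLE_RATE_HERTZ = 8000
MAX_SAMPLE_RATE_HERTZ = 48000
MAX_AUDIO_CHANNEL_COUNT = 8


def explicit_decoding_problems(
    sample_rate_hertz: int, audio_channel_count: int
) -> List[str]:
    """Return descriptions of values that fall outside the documented limits."""
    problems = []
    if not MIN_SAMPLE_RATE_HERTZ <= sample_rate_hertz <= MAX_SAMPLE_RATE_HERTZ:
        problems.append(
            f"sample_rate_hertz {sample_rate_hertz} is outside "
            f"{MIN_SAMPLE_RATE_HERTZ}-{MAX_SAMPLE_RATE_HERTZ}"
        )
    if audio_channel_count > MAX_AUDIO_CHANNEL_COUNT:
        problems.append(
            f"audio_channel_count {audio_channel_count} exceeds "
            f"{MAX_AUDIO_CHANNEL_COUNT}"
        )
    return problems
```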

class AudioEncoding(value)[source]

Bases: proto.enums.Enum

Supported audio data encodings.

Values:
AUDIO_ENCODING_UNSPECIFIED (0):

Default value. This value is unused.

LINEAR16 (1):

Headerless 16-bit signed little-endian PCM samples.

MULAW (2):

Headerless 8-bit companded mulaw samples.

ALAW (3):

Headerless 8-bit companded alaw samples.

class google.cloud.speech_v2.types.GcsOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Output configurations for Cloud Storage.

uri

The Cloud Storage URI prefix with which recognition results will be written.

Type

str

class google.cloud.speech_v2.types.GetConfigRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetConfig][google.cloud.speech.v2.Speech.GetConfig] method.

name

Required. The name of the config to retrieve. There is exactly one config resource per project per location. The expected format is projects/{project}/locations/{location}/config.

Type

str

class google.cloud.speech_v2.types.GetCustomClassRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetCustomClass][google.cloud.speech.v2.Speech.GetCustomClass] method.

name

Required. The name of the CustomClass to retrieve. The expected format is projects/{project}/locations/{location}/customClasses/{custom_class}.

Type

str

class google.cloud.speech_v2.types.GetPhraseSetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetPhraseSet][google.cloud.speech.v2.Speech.GetPhraseSet] method.

name

Required. The name of the PhraseSet to retrieve. The expected format is projects/{project}/locations/{location}/phraseSets/{phrase_set}.

Type

str

class google.cloud.speech_v2.types.GetRecognizerRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetRecognizer][google.cloud.speech.v2.Speech.GetRecognizer] method.

name

Required. The name of the Recognizer to retrieve. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}.

Type

str

class google.cloud.speech_v2.types.InlineOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Output configurations for inline response.

class google.cloud.speech_v2.types.ListCustomClassesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.

parent

Required. The project and location of CustomClass resources to list. The expected format is projects/{project}/locations/{location}.

Type

str

page_size

Number of results per request. A valid page_size ranges from 0 to 100 inclusive. If the page_size is zero or unspecified, a page size of 5 will be chosen. If the page size exceeds 100, it will be coerced down to 100. Note that a call might return fewer results than the requested page size.

Type

int
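The page_size rules can be summarized in a few lines. This sketch mirrors the description only: effective_page_size is a hypothetical helper, and raising on negative values is a choice made here, since the description gives the valid range but not the service's response to values below it.

```python
# Values taken from the description of page_size.
DEFAULT_PAGE_SIZE = 5
MAX_PAGE_SIZE = 100


def effective_page_size(page_size: int = 0) -> int:
    """Return the page size that the documented rules would select."""
    if page_size < 0:
        # Assumption of this sketch: treat values below the valid range as errors.
        raise ValueError("page_size must be between 0 and 100")
    if page_size == 0:
        # Zero or unspecified falls back to the default of 5.
        return DEFAULT_PAGE_SIZE
    # Values above 100 are coerced down to 100.
    return min(page_size, MAX_PAGE_SIZE)
```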

page_token

A page token, received from a previous [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] must match the call that provided the page token.

Type

str

show_deleted

Whether to show resources that have been deleted.

Type

bool

class google.cloud.speech_v2.types.ListCustomClassesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.

custom_classes

The list of requested CustomClasses.

Type

MutableSequence[google.cloud.speech_v2.types.CustomClass]

next_page_token

A token, which can be sent as [page_token][google.cloud.speech.v2.ListCustomClassesRequest.page_token] to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

Type

str
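The page_token and next_page_token fields combine into a simple loop: pass the token from the previous response until the response carries no token. The sketch below shows that loop without the client library. iterate_all and the fetch_page callable are hypothetical; fetch_page stands in for whatever issues the list call and returns a page of items together with its next_page_token.

```python
from typing import Callable, Iterator, List, Tuple

# One page of results plus the token for the next page ("" when done).
Page = Tuple[List[str], str]


def iterate_all(fetch_page: Callable[[str], Page]) -> Iterator[str]:
    """Yield every item across pages, following next_page_token."""
    page_token = ""
    while True:
        items, next_page_token = fetch_page(page_token)
        yield from items
        if not next_page_token:
            # An omitted (empty) token means there are no subsequent pages.
            return
        page_token = next_page_token
```

In a real caller, all other request parameters would stay unchanged between iterations, as the page_token description requires.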

class google.cloud.speech_v2.types.ListPhraseSetsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.

parent

Required. The project and location of PhraseSet resources to list. The expected format is projects/{project}/locations/{location}.

Type

str

page_size

The maximum number of PhraseSets to return. The service may return fewer than this value. If unspecified, at most 5 PhraseSets will be returned. The maximum value is 100; values above 100 will be coerced to 100.

Type

int

page_token

A page token, received from a previous [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] must match the call that provided the page token.

Type

str

show_deleted

Whether to show resources that have been deleted.

Type

bool

class google.cloud.speech_v2.types.ListPhraseSetsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.

phrase_sets

The list of requested PhraseSets.

Type

MutableSequence[google.cloud.speech_v2.types.PhraseSet]

next_page_token

A token, which can be sent as [page_token][google.cloud.speech.v2.ListPhraseSetsRequest.page_token] to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

Type

str

class google.cloud.speech_v2.types.ListRecognizersRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.

parent

Required. The project and location of Recognizers to list. The expected format is projects/{project}/locations/{location}.

Type

str

page_size

The maximum number of Recognizers to return. The service may return fewer than this value. If unspecified, at most 5 Recognizers will be returned. The maximum value is 100; values above 100 will be coerced to 100.

Type

int

page_token

A page token, received from a previous [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] must match the call that provided the page token.

Type

str

show_deleted

Whether to show resources that have been deleted.

Type

bool

class google.cloud.speech_v2.types.ListRecognizersResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.

recognizers

The list of requested Recognizers.

Type

MutableSequence[google.cloud.speech_v2.types.Recognizer]

next_page_token

A token, which can be sent as [page_token][google.cloud.speech.v2.ListRecognizersRequest.page_token] to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

Type

str

class google.cloud.speech_v2.types.OperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents the metadata of a long-running operation.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
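The oneof behavior described above can be modeled in plain Python to make the "setting one member clears the others" rule concrete. The Oneof class below is a hypothetical stand-in written for this illustration; it is not how proto.message.Message implements oneofs.

```python
from typing import Any, Optional, Tuple


class Oneof:
    """Toy model of a oneof group: at most one member is set at a time."""

    def __init__(self, *members: str) -> None:
        self._members = frozenset(members)
        self._current: Optional[Tuple[str, Any]] = None

    def set(self, member: str, value: Any) -> None:
        if member not in self._members:
            raise KeyError(member)
        # Storing a single (name, value) pair drops any previously set member.
        self._current = (member, value)

    def which(self) -> Optional[str]:
        """Return the name of the member currently set, if any."""
        return self._current[0] if self._current else None

    def get(self, member: str) -> Any:
        """Return the member's value, or None if that member is not set."""
        if self._current and self._current[0] == member:
            return self._current[1]
        return None
```

For instance, after setting create_recognizer_request and then delete_recognizer_request on the same group, only delete_recognizer_request remains set.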

create_time

The time the operation was created.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

The time the operation was last updated.

Type

google.protobuf.timestamp_pb2.Timestamp

resource

The resource path for the target of the operation.

Type

str

method

The method that triggered the operation.

Type

str

kms_key_name

The KMS key name with which the content of the Operation is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

Type

str

kms_key_version_name

The KMS key version name with which content of the Operation is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

Type

str

batch_recognize_request

The BatchRecognizeRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.BatchRecognizeRequest

create_recognizer_request

The CreateRecognizerRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.CreateRecognizerRequest

update_recognizer_request

The UpdateRecognizerRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UpdateRecognizerRequest

delete_recognizer_request

The DeleteRecognizerRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.DeleteRecognizerRequest

undelete_recognizer_request

The UndeleteRecognizerRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UndeleteRecognizerRequest

create_custom_class_request

The CreateCustomClassRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.CreateCustomClassRequest

update_custom_class_request

The UpdateCustomClassRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UpdateCustomClassRequest

delete_custom_class_request

The DeleteCustomClassRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.DeleteCustomClassRequest

undelete_custom_class_request

The UndeleteCustomClassRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UndeleteCustomClassRequest

create_phrase_set_request

The CreatePhraseSetRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.CreatePhraseSetRequest

update_phrase_set_request

The UpdatePhraseSetRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UpdatePhraseSetRequest

delete_phrase_set_request

The DeletePhraseSetRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.DeletePhraseSetRequest

undelete_phrase_set_request

The UndeletePhraseSetRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UndeletePhraseSetRequest

update_config_request

The UpdateConfigRequest that spawned the Operation.

This field is a member of oneof request.

Type

google.cloud.speech_v2.types.UpdateConfigRequest

progress_percent

The percent progress of the Operation. Values can range from 0-100. If the value is 100, then the operation is finished.

Type

int

batch_recognize_metadata

Metadata specific to the BatchRecognize method.

This field is a member of oneof metadata.

Type

google.cloud.speech_v2.types.BatchRecognizeMetadata

class google.cloud.speech_v2.types.PhraseSet(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide “hints” to the speech recognizer to favor specific words and phrases in the results.

name

Output only. The resource name of the PhraseSet. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}.

Type

str

uid

Output only. System-assigned unique identifier for the PhraseSet.

Type

str

phrases

A list of words and phrases.

Type

MutableSequence[google.cloud.speech_v2.types.PhraseSet.Phrase]

boost

Hint boost. A positive value increases the probability that a specific phrase is recognized over other similar-sounding phrases. The higher the boost, the higher the chance of false-positive recognition as well. Valid boost values are between 0 (exclusive) and 20. We recommend using a binary search approach to find the optimal value for your use case, and adding phrases both with and without boost to your requests.

Type

float
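One possible reading of the binary search recommendation is sketched below: search the range (0, 20] for the smallest boost at which a target phrase is recognized. The function smallest_effective_boost and the is_recognized callable are hypothetical; in practice is_recognized would run a recognition request on test audio with the given boost. The sketch also assumes recognition is monotonic in the boost, which real audio may not satisfy.

```python
from typing import Callable, Optional


def smallest_effective_boost(
    is_recognized: Callable[[float], bool],
    low: float = 0.0,
    high: float = 20.0,
    tolerance: float = 0.5,
) -> Optional[float]:
    """Bisect for the lowest boost (within tolerance) that is recognized.

    Returns None when even the maximum boost does not help.
    """
    if not is_recognized(high):
        return None
    # Invariant: `high` is recognized; `low` is not (0 itself is excluded
    # from the valid range, so it is treated as "no boost").
    while high - low > tolerance:
        mid = (low + high) / 2
        if is_recognized(mid):
            high = mid
        else:
            low = mid
    return high
```

Preferring the smallest effective value follows from the note above that higher boosts raise the chance of false-positive recognition.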

display_name

User-settable, human-readable name for the PhraseSet. Must be 63 characters or less.

Type

str

state

Output only. The PhraseSet lifecycle state.

Type

google.cloud.speech_v2.types.PhraseSet.State

create_time

Output only. Creation time.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

Output only. The most recent time this resource was modified.

Type

google.protobuf.timestamp_pb2.Timestamp

delete_time

Output only. The time at which this resource was requested for deletion.

Type

google.protobuf.timestamp_pb2.Timestamp

expire_time

Output only. The time at which this resource will be purged.

Type

google.protobuf.timestamp_pb2.Timestamp

annotations

Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

Type

MutableMapping[str, str]
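
The annotation limits above can be checked on the client before a request is sent. The following is a minimal sketch in plain Python, not part of this library: the function name and constants are hypothetical, and it assumes the 63-character limit counts Python string characters, which the text above does not specify.

```python
from typing import Mapping

# Limits taken from the field description; the names are illustrative.
MAX_ANNOTATIONS = 100
MAX_ANNOTATION_LENGTH = 63


def validate_annotations(annotations: Mapping[str, str]) -> None:
    """Raise ValueError if the mapping breaks the documented limits."""
    if len(annotations) > MAX_ANNOTATIONS:
        raise ValueError(f"at most {MAX_ANNOTATIONS} annotations are allowed")
    for key, value in annotations.items():
        # Assumption: length is measured in characters, not encoded bytes.
        if len(key) > MAX_ANNOTATION_LENGTH or len(value) > MAX_ANNOTATION_LENGTH:
            raise ValueError(
                f"annotation {key!r} exceeds {MAX_ANNOTATION_LENGTH} characters"
            )
```

The server remains the authority on what it accepts; a check like this only catches obvious violations early.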

etag

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

reconciling

Output only. Whether or not this PhraseSet is in the process of being updated.

Type

bool

kms_key_name

Output only. The KMS key name with which the PhraseSet is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

Type

str

kms_key_version_name

Output only. The KMS key version name with which the PhraseSet is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

Type

str

class AnnotationsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class Phrase(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A Phrase contains words and phrase “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.

value

The phrase itself.

Type

str

boost

Hint Boost. Overrides the boost set at the phrase set level. Positive value will increase the probability that a specific phrase will be recognized over other similar sounding phrases. The higher the boost, the higher the chance of false positive recognition as well. Negative boost values would correspond to anti-biasing. Anti-biasing is not enabled, so negative boost values will return an error. Boost values must be between 0 and 20. Any values outside that range will return an error. We recommend using a binary search approach to finding the optimal value for your use case as well as adding phrases both with and without boost to your requests.

Type

float
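
The override rule for boost can be sketched as follows. This is a plain-Python illustration, not library code: the function name is hypothetical, and it models an unset phrase-level boost as None, which is an assumption about representation rather than documented behavior.

```python
from typing import Optional

MAX_BOOST = 20.0  # upper bound stated in the field description


def effective_boost(set_boost: float, phrase_boost: Optional[float] = None) -> float:
    """Return the boost that applies to one phrase.

    A phrase-level boost, when given, overrides the boost set at the
    phrase set level. None stands for "not set" in this sketch.
    """
    boost = set_boost if phrase_boost is None else phrase_boost
    # Negative values (anti-biasing) and values above 20 are described as errors.
    if not 0.0 <= boost <= MAX_BOOST:
        raise ValueError(f"boost must be between 0 and {MAX_BOOST:g}, got {boost}")
    return boost
```

For example, effective_boost(5.0, 12.0) yields the phrase-level value, while effective_boost(5.0) falls back to the set-level value.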

class State(value)[source]

Bases: proto.enums.Enum

Set of states that define the lifecycle of a PhraseSet.

Values:
STATE_UNSPECIFIED (0):

Unspecified state. This is only used/useful for distinguishing unset values.

ACTIVE (2):

The normal and active state.

DELETED (4):

This PhraseSet has been deleted.

class google.cloud.speech_v2.types.RecognitionConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Provides information to the Recognizer that specifies how to process the recognition request.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

auto_decoding_config

Automatically detect decoding parameters. Preferred for supported formats.

This field is a member of oneof decoding_config.

Type

google.cloud.speech_v2.types.AutoDetectDecodingConfig

explicit_decoding_config

Explicitly specified decoding parameters. Required if using headerless PCM audio (linear16, mulaw, alaw).

This field is a member of oneof decoding_config.

Type

google.cloud.speech_v2.types.ExplicitDecodingConfig
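
The oneof behavior of decoding_config can be pictured with a toy model. The class below is a hypothetical plain-Python sketch that only mimics the "setting one member clears the others" rule; it is not how proto.message.Message is implemented.

```python
class DecodingConfigOneof:
    """Toy model of a oneof: at most one member holds a value at a time."""

    _members = ("auto_decoding_config", "explicit_decoding_config")

    def __init__(self) -> None:
        self._active = None  # name of the member currently set, if any
        self._value = None

    def set(self, member: str, value: object) -> None:
        if member not in self._members:
            raise AttributeError(member)
        # Storing a single (name, value) pair implicitly clears the other member.
        self._active, self._value = member, value

    def get(self, member: str):
        return self._value if member == self._active else None

    def which(self):
        return self._active
```

After set("auto_decoding_config", ...) followed by set("explicit_decoding_config", ...), only the explicit member is still populated.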

model

Optional. Which model to use for recognition requests. Select the model best suited to your domain to get best results.

Guidance for choosing which model to use can be found in the Transcription Models Documentation and the models supported in each region can be found in the Table Of Supported Models.

Type

str

language_codes

Optional. The language of the supplied audio as a BCP-47 language tag. Language tags are normalized to BCP-47 before they are used; for example, “en-us” becomes “en-US”.

Supported languages for each model are listed in the Table of Supported Models.

If additional languages are provided, the recognition result will contain recognition in the most likely language detected. The recognition result will include the language tag of the language detected in the audio.

Type

MutableSequence[str]
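
To illustrate the kind of normalization described for language tags, here is a simplified sketch of BCP-47 casing conventions. The function is hypothetical and handles only simple language-script-region tags; the service's actual normalization is not specified here beyond the “en-us” example and may differ.

```python
def normalize_language_tag(tag: str) -> str:
    """Apply BCP-47 casing conventions to a simple language tag."""
    parts = tag.split("-")
    normalized = [parts[0].lower()]  # primary language subtag is lowercase
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            normalized.append(part.upper())  # region, e.g. "US"
        elif len(part) == 4 and part.isalpha():
            normalized.append(part.title())  # script, e.g. "Hans"
        else:
            normalized.append(part.lower())  # anything else is left lowercase
    return "-".join(normalized)
```

With this sketch, normalize_language_tag("en-us") gives "en-US".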

features

Speech recognition features to enable.

Type

google.cloud.speech_v2.types.RecognitionFeatures

adaptation

Speech adaptation context that weights recognizer predictions for specific words and phrases.

Type

google.cloud.speech_v2.types.SpeechAdaptation

class google.cloud.speech_v2.types.RecognitionFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Available recognition features.

profanity_filter

If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, for instance, “f***”. If set to false or omitted, profanities won’t be filtered out.

Type

bool

enable_word_time_offsets

If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words. If false, no word-level time offset information is returned. The default is false.

Type

bool

enable_word_confidence

If true, the top result includes a list of words and the confidence for those words. If false, no word-level confidence information is returned. The default is false.

Type

bool

enable_automatic_punctuation

If true, adds punctuation to recognition result hypotheses. This feature is only available in select languages. The default false value does not add punctuation to result hypotheses.

Type

bool

enable_spoken_punctuation

The spoken punctuation behavior for the call. If true, replaces spoken punctuation with the corresponding symbols in the request. For example, “how are you question mark” becomes “how are you?”. See https://cloud.google.com/speech-to-text/docs/spoken-punctuation for support. If false, spoken punctuation is not replaced.

Type

bool

enable_spoken_emojis

The spoken emoji behavior for the call. If true, adds spoken emoji formatting for the request. This will replace spoken emojis with the corresponding Unicode symbols in the final transcript. If false, spoken emojis are not replaced.

Type

bool

multi_channel_mode

Mode for recognizing multi-channel audio.

Type

google.cloud.speech_v2.types.RecognitionFeatures.MultiChannelMode

diarization_config

Configuration to enable speaker diarization and set additional parameters to make diarization better suited for your application. When this is enabled, we send all the words from the beginning of the audio for the top alternative in each consecutive STREAMING response. This is done in order to improve our speaker tags as our models learn to identify the speakers in the conversation over time. For non-streaming requests, the diarization results will be provided only in the top alternative of the FINAL SpeechRecognitionResult.

Type

google.cloud.speech_v2.types.SpeakerDiarizationConfig

max_alternatives

Maximum number of recognition hypotheses to be returned. The server may return fewer than max_alternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.

Type

int
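
The rule for max_alternatives can be written out as a small helper. This is a hypothetical sketch that restates the description above; whether the server rejects or clamps out-of-range values is not stated, so raising an error here is an assumption.

```python
def effective_max_alternatives(max_alternatives: int = 0) -> int:
    """Return the upper bound on hypotheses implied by the field value."""
    # Assumption: values outside the documented 0-30 range are treated as errors.
    if not 0 <= max_alternatives <= 30:
        raise ValueError("max_alternatives must be in the range 0-30")
    # Both 0 (including the omitted case) and 1 mean "at most one".
    return max(1, max_alternatives)
```

The server may still return fewer hypotheses than this bound.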

class MultiChannelMode(value)[source]

Bases: proto.enums.Enum

Options for how to recognize multi-channel audio.

Values:
MULTI_CHANNEL_MODE_UNSPECIFIED (0):

Default value for the multi-channel mode. If the audio contains multiple channels, only the first channel will be transcribed; other channels will be ignored.

SEPARATE_RECOGNITION_PER_CHANNEL (1):

If selected, each channel in the provided audio is transcribed independently. This cannot be selected if the selected [model][google.cloud.speech.v2.Recognizer.model] is latest_short.

class google.cloud.speech_v2.types.RecognitionOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration options for the output(s) of recognition.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

gcs_output_config

If this message is populated, recognition results are written to the provided Google Cloud Storage URI.

This field is a member of oneof output.

Type

google.cloud.speech_v2.types.GcsOutputConfig

inline_response_config

If this message is populated, recognition results are provided in the [BatchRecognizeResponse][google.cloud.speech.v2.BatchRecognizeResponse] message of the Operation when completed. This is only supported when calling [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] with just one audio file.

This field is a member of oneof output.

Type

google.cloud.speech_v2.types.InlineOutputConfig

class google.cloud.speech_v2.types.RecognitionResponseMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about the recognition request and response.

total_billed_duration

When available, billed audio seconds for the corresponding request.

Type

google.protobuf.duration_pb2.Duration

class google.cloud.speech_v2.types.RecognizeRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method. Either content or uri must be supplied. Supplying both or neither returns [INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. See content limits.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

recognizer

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

Type

str
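
A recognizer name in the expected format can be assembled with ordinary string formatting. The helper below is a hypothetical sketch, and the project and location values in the usage note are placeholders.

```python
def recognizer_path(project: str, location: str, recognizer: str = "_") -> str:
    """Build a Recognizer resource name.

    The default "_" selects the empty implicit Recognizer described above.
    """
    return f"projects/{project}/locations/{location}/recognizers/{recognizer}"
```

For instance, recognizer_path("my-project", "global") produces a name whose final segment is _, which refers to the implicit Recognizer.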

config

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the [config_mask][google.cloud.speech.v2.RecognizeRequest.config_mask] field can be used to override parts of the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the Recognizer resource.

Type

google.cloud.speech_v2.types.RecognitionConfig

config_mask

The list of fields in [config][google.cloud.speech.v2.RecognizeRequest.config] that override the values in the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in [config][google.cloud.speech.v2.RecognizeRequest.config] override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, [config][google.cloud.speech.v2.RecognizeRequest.config] completely overrides and replaces the config in the recognizer for this recognition request.

Type

google.protobuf.field_mask_pb2.FieldMask
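
The three config_mask cases can be modeled with dictionaries. This sketch is illustrative only: the function name is hypothetical, configs are flat dicts rather than RecognitionConfig messages, only top-level field names are handled, and a falsy value stands in for a proto default value.

```python
from typing import Any, Dict, Iterable, Optional


def resolve_config(
    recognizer_default: Dict[str, Any],
    request_config: Dict[str, Any],
    mask_paths: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Combine a recognizer's default config with a per-request config."""
    paths = list(mask_paths or [])
    if "*" in paths:
        # Wildcard: the request config completely replaces the default.
        return dict(request_config)
    resolved = dict(recognizer_default)
    if not paths:
        # No mask: every non-default (here: truthy) request field overrides.
        for field, value in request_config.items():
            if value:
                resolved[field] = value
        return resolved
    for field in paths:
        # Mask given: only listed fields override, even with default values.
        if field in request_config:
            resolved[field] = request_config[field]
        else:
            resolved.pop(field, None)  # absent in the request means default
    return resolved
```

With a default of {"model": "long", "language_codes": ["en-US"]} and a request of {"model": "short", "language_codes": []}, omitting the mask changes only model, a mask of ["language_codes"] changes only language_codes, and ["*"] yields the request config unchanged.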

content

The audio data bytes encoded as specified in [RecognitionConfig][google.cloud.speech.v2.RecognitionConfig]. As with all bytes fields, proto buffers use a pure binary representation, whereas JSON representations use base64.

This field is a member of oneof audio_source.

Type

bytes

uri

URI that points to a file that contains audio data bytes as specified in [RecognitionConfig][google.cloud.speech.v2.RecognitionConfig]. The file must not be compressed (for example, gzip). Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: gs://bucket_name/object_name (other URI formats return [INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]). For more information, see Request URIs.

This field is a member of oneof audio_source.

Type

str
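
The audio_source rules (exactly one of content or uri, and a gs:// URI format) can be checked locally before sending a request. The helper and its regular expression are hypothetical; the pattern is a loose approximation of gs://bucket_name/object_name, not the service's own validation.

```python
import re
from typing import Optional

# Loose pattern: a bucket segment, a slash, then a non-empty object name.
_GCS_URI = re.compile(r"^gs://[^/]+/.+$")


def check_audio_source(
    content: Optional[bytes] = None, uri: Optional[str] = None
) -> str:
    """Return which audio_source member is set, or raise ValueError."""
    # Supplying both or neither is described as INVALID_ARGUMENT.
    if (content is None) == (uri is None):
        raise ValueError("exactly one of content or uri must be supplied")
    if uri is not None:
        if not _GCS_URI.match(uri):
            raise ValueError("uri must look like gs://bucket_name/object_name")
        return "uri"
    return "content"
```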

class google.cloud.speech_v2.types.RecognizeResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method.

results

Sequential list of transcription results corresponding to sequential portions of audio.

Type

MutableSequence[google.cloud.speech_v2.types.SpeechRecognitionResult]

metadata

Metadata about the recognition.

Type

google.cloud.speech_v2.types.RecognitionResponseMetadata

class google.cloud.speech_v2.types.Recognizer(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A Recognizer message. Stores recognition configuration and metadata.

name

Output only. The resource name of the Recognizer. Format: projects/{project}/locations/{location}/recognizers/{recognizer}.

Type

str

uid

Output only. System-assigned unique identifier for the Recognizer.

Type

str

display_name

User-settable, human-readable name for the Recognizer. Must be 63 characters or less.

Type

str

model

Optional. Which model to use for recognition requests. Select the model best suited to your domain to get best results.

Guidance for choosing which model to use can be found in the Transcription Models Documentation and the models supported in each region can be found in the Table Of Supported Models.

Type

str

language_codes

Optional. The language of the supplied audio as a BCP-47 language tag.

Supported languages for each model are listed in the Table of Supported Models.

If additional languages are provided, the recognition result will contain recognition in the most likely language detected. The recognition result will include the language tag of the language detected in the audio. When you create or update a Recognizer, these values are stored in normalized BCP-47 form. For example, “en-us” is stored as “en-US”.

Type

MutableSequence[str]

default_recognition_config

Default configuration to use for requests with this Recognizer. This can be overwritten by inline configuration in the [RecognizeRequest.config][google.cloud.speech.v2.RecognizeRequest.config] field.

Type

google.cloud.speech_v2.types.RecognitionConfig

annotations

Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

Type

MutableMapping[str, str]

state

Output only. The Recognizer lifecycle state.

Type

google.cloud.speech_v2.types.Recognizer.State

create_time

Output only. Creation time.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

Output only. The most recent time this Recognizer was modified.

Type

google.protobuf.timestamp_pb2.Timestamp

delete_time

Output only. The time at which this Recognizer was requested for deletion.

Type

google.protobuf.timestamp_pb2.Timestamp

expire_time

Output only. The time at which this Recognizer will be purged.

Type

google.protobuf.timestamp_pb2.Timestamp

etag

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

reconciling

Output only. Whether or not this Recognizer is in the process of being updated.

Type

bool

kms_key_name

Output only. The KMS key name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

Type

str

kms_key_version_name

Output only. The KMS key version name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

Type

str

class AnnotationsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class State(value)[source]

Bases: proto.enums.Enum

Set of states that define the lifecycle of a Recognizer.

Values:
STATE_UNSPECIFIED (0):

The default value. This value is used if the state is omitted.

ACTIVE (2):

The Recognizer is active and ready for use.

DELETED (4):

This Recognizer has been deleted.

class google.cloud.speech_v2.types.SpeakerDiarizationConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration to enable speaker diarization.

min_speaker_count

Required. Minimum number of speakers in the conversation. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers.

To fix the number of speakers detected in the audio, set min_speaker_count = max_speaker_count.

Type

int

max_speaker_count

Required. Maximum number of speakers in the conversation. Valid values are: 1-6. Must be >= min_speaker_count. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers.

Type

int
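
The constraints on the two speaker counts can be expressed as a short check. The function is a hypothetical sketch that enforces only the rules stated above (maximum in 1-6, maximum not below minimum) and reports whether the speaker count is fixed.

```python
def check_speaker_counts(min_speaker_count: int, max_speaker_count: int) -> bool:
    """Validate the documented constraints; return True if the count is fixed."""
    if not 1 <= max_speaker_count <= 6:
        raise ValueError("max_speaker_count must be in the range 1-6")
    if max_speaker_count < min_speaker_count:
        raise ValueError("max_speaker_count must be >= min_speaker_count")
    # Equal bounds pin the number of speakers instead of giving a range.
    return min_speaker_count == max_speaker_count
```

check_speaker_counts(2, 2) describes exactly two speakers, while check_speaker_counts(1, 4) leaves the system free to choose between one and four.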

class google.cloud.speech_v2.types.SpeechAdaptation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Provides “hints” to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

phrase_sets

A list of inline or referenced PhraseSets.

Type

MutableSequence[google.cloud.speech_v2.types.SpeechAdaptation.AdaptationPhraseSet]

custom_classes

A list of inline CustomClasses. Existing CustomClass resources can be referenced directly in a PhraseSet.

Type

MutableSequence[google.cloud.speech_v2.types.CustomClass]

class AdaptationPhraseSet(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

phrase_set

The name of an existing PhraseSet resource. The user must have read access to the resource and it must not be deleted.

This field is a member of oneof value.

Type

str

inline_phrase_set

An inline defined PhraseSet.

This field is a member of oneof value.

Type

google.cloud.speech_v2.types.PhraseSet

class google.cloud.speech_v2.types.SpeechRecognitionAlternative(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Alternative hypotheses (a.k.a. n-best list).

transcript

Transcript text representing the words that the user spoke.

Type

str

confidence

The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where [is_final][google.cloud.speech.v2.StreamingRecognitionResult.is_final] is set to true. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

Type

float
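
Because 0.0 is a sentinel rather than a real score, calling code may want to convert it into an explicit "missing" value. A minimal sketch, with a hypothetical helper name:

```python
from typing import Optional


def confidence_or_none(confidence: float) -> Optional[float]:
    """Map the 0.0 sentinel to None so it is not mistaken for a real score."""
    return None if confidence == 0.0 else confidence
```

Downstream code can then separate alternatives without a confidence from those with a genuinely low one.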

words

A list of word-specific information for each recognized word. When the [SpeakerDiarizationConfig][google.cloud.speech.v2.SpeakerDiarizationConfig] is set, you will see all the words from the beginning of the audio.

Type

MutableSequence[google.cloud.speech_v2.types.WordInfo]

class google.cloud.speech_v2.types.SpeechRecognitionResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A speech recognition result corresponding to a portion of the audio.

alternatives

May contain one or more recognition hypotheses. These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

Type

MutableSequence[google.cloud.speech_v2.types.SpeechRecognitionAlternative]

channel_tag

For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from 1 to N.

Type

int
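
When each channel is transcribed separately, results can be grouped by channel_tag to rebuild one transcript per channel. The sketch below is hypothetical and works on plain (channel_tag, transcript) pairs rather than SpeechRecognitionResult messages; joining pieces with a single space is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def transcripts_by_channel(results: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Join transcripts per channel, keeping the order in which they arrive."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for channel_tag, transcript in results:
        grouped[channel_tag].append(transcript.strip())
    # Sort by channel number so the output order is predictable.
    return {tag: " ".join(parts) for tag, parts in sorted(grouped.items())}
```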

result_end_offset

Time offset of the end of this result relative to the beginning of the audio.

Type

google.protobuf.duration_pb2.Duration

language_code

Output only. The BCP-47 language tag of the language in this result. This is the language detected as most likely to be spoken in the audio.

Type

str

class google.cloud.speech_v2.types.StreamingRecognitionConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Provides configuration information for the StreamingRecognize request.

config

Required. Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the [config_mask][google.cloud.speech.v2.StreamingRecognitionConfig.config_mask] field can be used to override parts of the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the Recognizer resource.

Type

google.cloud.speech_v2.types.RecognitionConfig

config_mask

The list of fields in [config][google.cloud.speech.v2.StreamingRecognitionConfig.config] that override the values in the [default_recognition_config][google.cloud.speech.v2.Recognizer.default_recognition_config] of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in [config][google.cloud.speech.v2.StreamingRecognitionConfig.config] override the values in the Recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the Recognizer for this recognition request. If a wildcard (*) is provided, [config][google.cloud.speech.v2.StreamingRecognitionConfig.config] completely overrides and replaces the config in the recognizer for this recognition request.

Type

google.protobuf.field_mask_pb2.FieldMask

streaming_features

Speech recognition features to enable specific to streaming audio recognition requests.

Type

google.cloud.speech_v2.types.StreamingRecognitionFeatures

class google.cloud.speech_v2.types.StreamingRecognitionFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Available recognition features specific to streaming recognition requests.

enable_voice_activity_events

If true, responses with voice activity speech events will be returned as they are detected.

Type

bool

interim_results

Whether or not to stream interim results to the client. If set to true, interim results will be streamed to the client. Otherwise, only the final response will be streamed back.

Type

bool

voice_activity_timeout

If set, the server will automatically close the stream after the specified duration has elapsed after the last VOICE_ACTIVITY speech event has been sent. The field enable_voice_activity_events must also be set to true.

Type

google.cloud.speech_v2.types.StreamingRecognitionFeatures.VoiceActivityTimeout

class VoiceActivityTimeout(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Events that a timeout can be set on for voice activity.

speech_start_timeout

Duration after which the stream times out if no speech begins. If this is set and no speech is detected in this duration at the start of the stream, the server will close the stream.

Type

google.protobuf.duration_pb2.Duration

speech_end_timeout

Duration after which the stream times out once speech ends. If this is set and no speech is detected in this duration after speech was detected, the server will close the stream.

Type

google.protobuf.duration_pb2.Duration

class google.cloud.speech_v2.types.StreamingRecognitionResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

alternatives

May contain one or more recognition hypotheses. These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

Type

MutableSequence[google.cloud.speech_v2.types.SpeechRecognitionAlternative]

is_final

If false, this [StreamingRecognitionResult][google.cloud.speech.v2.StreamingRecognitionResult] represents an interim result that may change. If true, this is the final time the speech service will return this particular [StreamingRecognitionResult][google.cloud.speech.v2.StreamingRecognitionResult]; the recognizer will not return any further hypotheses for this portion of the transcript and corresponding audio.

Type

bool

stability

An estimate of the likelihood that the recognizer will not change its guess about this interim result. Values range from 0.0 (completely unstable) to 1.0 (completely stable). This field is only provided for interim results ([is_final][google.cloud.speech.v2.StreamingRecognitionResult.is_final] = false). The default of 0.0 is a sentinel value indicating stability was not set.

Type

float

result_end_offset

Time offset of the end of this result relative to the beginning of the audio.

Type

google.protobuf.duration_pb2.Duration

channel_tag

For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from 1 to N.

Type

int

language_code

Output only. The BCP-47 language tag of the language in this result. This is the language detected as most likely to be spoken in the audio.

Type

str

class google.cloud.speech_v2.types.StreamingRecognizeRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [StreamingRecognize][google.cloud.speech.v2.Speech.StreamingRecognize] method. Multiple [StreamingRecognizeRequest][google.cloud.speech.v2.StreamingRecognizeRequest] messages are sent in one call.

If the [Recognizer][google.cloud.speech.v2.Recognizer] referenced by [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] contains a fully specified request configuration then the stream may only contain messages with only [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.

Otherwise the first message must contain a [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] and a [streaming_config][google.cloud.speech.v2.StreamingRecognizeRequest.streaming_config] message that together fully specify the request configuration and must not contain [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio]. All subsequent messages must only have [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

recognizer

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

Type

str

streaming_config

StreamingRecognitionConfig to be used in this recognition attempt. If provided, it will override the default RecognitionConfig stored in the Recognizer.

This field is a member of oneof streaming_request.

Type

google.cloud.speech_v2.types.StreamingRecognitionConfig

audio

Inline audio bytes to be recognized. Maximum size for this field is 15 KB per request.

This field is a member of oneof streaming_request.

Type

bytes
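
The message ordering and the per-request audio limit can be sketched with plain dictionaries. Everything here is illustrative: the function names are hypothetical, messages are dicts rather than StreamingRecognizeRequest objects, and 15 KB is read conservatively as 15,000 bytes, which is an assumption.

```python
from typing import Any, Dict, Iterator

# Assumption: "15 KB" is taken as 15,000 bytes to stay under either reading.
MAX_AUDIO_BYTES = 15_000


def audio_chunks(audio: bytes, chunk_size: int = MAX_AUDIO_BYTES) -> Iterator[bytes]:
    """Split audio into pieces no larger than chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def request_messages(
    recognizer: str, streaming_config: Dict[str, Any], audio: bytes
) -> Iterator[Dict[str, Any]]:
    """Yield a config-only first message, then audio-only messages."""
    # The first message carries configuration and must not contain audio.
    yield {"recognizer": recognizer, "streaming_config": streaming_config}
    for chunk in audio_chunks(audio):
        yield {"audio": chunk}
```

A 40,000-byte buffer therefore becomes one configuration message followed by three audio messages of 15,000, 15,000, and 10,000 bytes.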

class google.cloud.speech_v2.types.StreamingRecognizeResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: “tube” } stability: 0.01 }

  2. results { alternatives { transcript: “to be a” } stability: 0.01 }

  3. results { alternatives { transcript: “to be” } stability: 0.9 } results { alternatives { transcript: “ or not to be” } stability: 0.01 }

  4. results { alternatives { transcript: “to be or not to be” confidence: 0.92 } alternatives { transcript: “to bee or not to bee” } is_final: true }

  5. results { alternatives { transcript: “ that’s” } stability: 0.01 }

  6. results { alternatives { transcript: “ that is” } stability: 0.9 } results { alternatives { transcript: “ the question” } stability: 0.01 }

  7. results { alternatives { transcript: “ that is the question” confidence: 0.98 } alternatives { transcript: “ that was the question” } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: “to be or not to be that is the question”.

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
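
The concatenation described in the notes can be reproduced with a plain-Python model of the seven example responses. The dictionary layout and the function name are hypothetical stand-ins for the real message types; each response is a list of results, and each result lists its alternative transcripts in order.

```python
from typing import Any, Dict, List

Response = List[Dict[str, Any]]

# The seven example responses, reduced to transcripts and the is_final flag.
EXAMPLE_RESPONSES: List[Response] = [
    [{"alternatives": ["tube"]}],
    [{"alternatives": ["to be a"]}],
    [{"alternatives": ["to be"]}, {"alternatives": [" or not to be"]}],
    [{"alternatives": ["to be or not to be", "to bee or not to bee"], "is_final": True}],
    [{"alternatives": [" that's"]}],
    [{"alternatives": [" that is"]}, {"alternatives": [" the question"]}],
    [{"alternatives": [" that is the question", " that was the question"], "is_final": True}],
]


def final_transcript(responses: List[Response]) -> str:
    """Concatenate the top alternative of every final result."""
    pieces = []
    for results in responses:
        for result in results:
            # Interim results (is_final missing or false) are skipped.
            if result.get("is_final") and result["alternatives"]:
                pieces.append(result["alternatives"][0])
    return "".join(pieces)
```

Applied to EXAMPLE_RESPONSES, final_transcript keeps only the top alternatives of #4 and #7 and joins them without adding separators, since the second piece already begins with a space.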

results

This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one [is_final][google.cloud.speech.v2.StreamingRecognitionResult.is_final] = true result (the newly settled portion), followed by zero or more [is_final][google.cloud.speech.v2.StreamingRecognitionResult.is_final] = false results (the interim results).

Type

MutableSequence[google.cloud.speech_v2.types.StreamingRecognitionResult]

speech_event_type

Indicates the type of speech event.

Type

google.cloud.speech_v2.types.StreamingRecognizeResponse.SpeechEventType

speech_event_offset

Time offset between the beginning of the audio and event emission.

Type

google.protobuf.duration_pb2.Duration

metadata

Metadata about the recognition.

Type

google.cloud.speech_v2.types.RecognitionResponseMetadata

class SpeechEventType(value)[source]

Bases: proto.enums.Enum

Indicates the type of speech event.

Values:
SPEECH_EVENT_TYPE_UNSPECIFIED (0):

No speech event specified.

END_OF_SINGLE_UTTERANCE (1):

This event indicates that the server has detected the end of the user’s speech utterance and expects no additional speech. Therefore, the server will not process additional audio and will close the gRPC bidirectional stream. This event is only sent if there was a force cutoff due to silence being detected early. This event is only available through the latest_short [model][google.cloud.speech.v2.Recognizer.model].

SPEECH_ACTIVITY_BEGIN (2):

This event indicates that the server has detected the beginning of human voice activity in the stream. This event can be returned multiple times if speech starts and stops repeatedly throughout the stream. This event is only sent if enable_voice_activity_events is set to true.

SPEECH_ACTIVITY_END (3):

This event indicates that the server has detected the end of human voice activity in the stream. This event can be returned multiple times if speech starts and stops repeatedly throughout the stream. This event is only sent if voice_activity_events is set to true.
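Because SPEECH_ACTIVITY_BEGIN and SPEECH_ACTIVITY_END can each arrive several times in one stream, a caller may want to pair them into voice-activity spans. The sketch below shows one way to do that, assuming each event arrives together with its `speech_event_offset`. The `SpeechEventType` class here is a local mirror of the values listed above, not the library class, and `voice_activity_spans` is a hypothetical helper.

```python
from datetime import timedelta
from enum import IntEnum


class SpeechEventType(IntEnum):
    """Local mirror of the enum values listed above (not the library class)."""

    SPEECH_EVENT_TYPE_UNSPECIFIED = 0
    END_OF_SINGLE_UTTERANCE = 1
    SPEECH_ACTIVITY_BEGIN = 2
    SPEECH_ACTIVITY_END = 3


def voice_activity_spans(events):
    """Pair BEGIN/END events into (start, end) offset tuples.

    events is an iterable of (event_type, offset) tuples in arrival order.
    A span that is still open when the input ends is reported with end=None.
    """
    spans = []
    start = None
    for event_type, offset in events:
        if event_type == SpeechEventType.SPEECH_ACTIVITY_BEGIN:
            start = offset
        elif event_type == SpeechEventType.SPEECH_ACTIVITY_END and start is not None:
            spans.append((start, offset))
            start = None
    if start is not None:
        spans.append((start, None))
    return spans


# Illustrative events; the offsets are made up for the demo.
demo_events = [
    (SpeechEventType.SPEECH_ACTIVITY_BEGIN, timedelta(seconds=1)),
    (SpeechEventType.SPEECH_ACTIVITY_END, timedelta(seconds=3)),
    (SpeechEventType.SPEECH_ACTIVITY_BEGIN, timedelta(seconds=5)),
]
spans = voice_activity_spans(demo_events)
print(spans)
```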

class google.cloud.speech_v2.types.UndeleteCustomClassRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UndeleteCustomClass][google.cloud.speech.v2.Speech.UndeleteCustomClass] method.

name

Required. The name of the CustomClass to undelete. Format: projects/{project}/locations/{location}/customClasses/{custom_class}

Type

str

validate_only

If set, validate the request and preview the undeleted CustomClass, but do not actually undelete it.

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

class google.cloud.speech_v2.types.UndeletePhraseSetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UndeletePhraseSet][google.cloud.speech.v2.Speech.UndeletePhraseSet] method.

name

Required. The name of the PhraseSet to undelete. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}

Type

str

validate_only

If set, validate the request and preview the undeleted PhraseSet, but do not actually undelete it.

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str

class google.cloud.speech_v2.types.UndeleteRecognizerRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UndeleteRecognizer][google.cloud.speech.v2.Speech.UndeleteRecognizer] method.

name

Required. The name of the Recognizer to undelete. Format: projects/{project}/locations/{location}/recognizers/{recognizer}

Type

str

validate_only

If set, validate the request and preview the undeleted Recognizer, but do not actually undelete it.

Type

bool

etag

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

Type

str
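The three undelete requests above share the same fields (`name`, `validate_only`, `etag`) and differ only in the resource name format. The sketch below builds a plain dict with those keys for any of the three; such a dict is the kind of value the `mapping` parameter in the signatures above is meant to accept. The helper `undelete_request`, its `kind` keys, and the sample IDs are assumptions for illustration.

```python
# Name formats copied from the field descriptions above; the trailing
# placeholder is renamed to resource_id so one helper can serve all three.
NAME_FORMATS = {
    "custom_class": "projects/{project}/locations/{location}/customClasses/{resource_id}",
    "phrase_set": "projects/{project}/locations/{location}/phraseSets/{resource_id}",
    "recognizer": "projects/{project}/locations/{location}/recognizers/{resource_id}",
}


def undelete_request(kind, project, location, resource_id, *, etag="", validate_only=False):
    """Build a dict with the fields of an Undelete*Request (hypothetical helper)."""
    name = NAME_FORMATS[kind].format(
        project=project, location=location, resource_id=resource_id
    )
    request = {"name": name, "validate_only": validate_only}
    if etag:
        # Only include the etag when the caller has one to check against.
        request["etag"] = etag
    return request


# Preview first (validate_only=True); the project and IDs are placeholders.
preview = undelete_request(
    "recognizer", "my-project", "global", "my-recognizer", validate_only=True
)
print(preview)
```

Sending the request first with `validate_only` set and then again without it is one way to preview the restored resource before committing.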

class google.cloud.speech_v2.types.UpdateConfigRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UpdateConfig][google.cloud.speech.v2.Speech.UpdateConfig] method.

config

Required. The config to update.

The config’s name field is used to identify the config to be updated. The expected format is projects/{project}/locations/{location}/config.

Type

google.cloud.speech_v2.types.Config

update_mask

The list of fields to be updated.

Type

google.protobuf.field_mask_pb2.FieldMask

class google.cloud.speech_v2.types.UpdateCustomClassRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UpdateCustomClass][google.cloud.speech.v2.Speech.UpdateCustomClass] method.

custom_class

Required. The CustomClass to update.

The CustomClass’s name field is used to identify the CustomClass to update. Format: projects/{project}/locations/{location}/customClasses/{custom_class}.

Type

google.cloud.speech_v2.types.CustomClass

update_mask

The list of fields to be updated. If empty, all fields are considered for update.

Type

google.protobuf.field_mask_pb2.FieldMask

validate_only

If set, validate the request and preview the updated CustomClass, but do not actually update it.

Type

bool

class google.cloud.speech_v2.types.UpdatePhraseSetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UpdatePhraseSet][google.cloud.speech.v2.Speech.UpdatePhraseSet] method.

phrase_set

Required. The PhraseSet to update.

The PhraseSet’s name field is used to identify the PhraseSet to update. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}.

Type

google.cloud.speech_v2.types.PhraseSet

update_mask

The list of fields to update. If empty, all non-default valued fields are considered for update. Use * to update the entire PhraseSet resource.

Type

google.protobuf.field_mask_pb2.FieldMask

validate_only

If set, validate the request and preview the updated PhraseSet, but do not actually update it.

Type

bool
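The three `update_mask` cases described above (empty, explicit paths, and `*`) can be easier to follow with a small client-side model. The sketch below is only an illustration of the documented rules using plain dicts; it is not the server's logic, and the function name, the sample field names, and the default values are assumptions.

```python
def fields_considered_for_update(resource, mask_paths, defaults):
    """Return the sorted field names an update would consider.

    resource   -- dict of field name to value for the resource being sent
    mask_paths -- list of field paths, possibly empty or containing "*"
    defaults   -- dict of field name to that field's default value
    """
    if "*" in mask_paths:
        # Wildcard: the entire resource is updated.
        return sorted(resource)
    if not mask_paths:
        # Empty mask: only non-default valued fields are considered.
        return sorted(k for k, v in resource.items() if v != defaults.get(k))
    # Explicit mask: only the listed fields are considered.
    return sorted(mask_paths)


# Sample values for illustration only.
phrase_set = {"display_name": "support-terms", "boost": 0.0, "phrases": ["refund"]}
defaults = {"display_name": "", "boost": 0.0, "phrases": []}

print(fields_considered_for_update(phrase_set, [], defaults))
print(fields_considered_for_update(phrase_set, ["boost"], defaults))
print(fields_considered_for_update(phrase_set, ["*"], defaults))
```

Under this model, resetting a field to its default value requires naming it in the mask or using `*`, since an empty mask skips default-valued fields.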

class google.cloud.speech_v2.types.UpdateRecognizerRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UpdateRecognizer][google.cloud.speech.v2.Speech.UpdateRecognizer] method.

recognizer

Required. The Recognizer to update.

The Recognizer’s name field is used to identify the Recognizer to update. Format: projects/{project}/locations/{location}/recognizers/{recognizer}.

Type

google.cloud.speech_v2.types.Recognizer

update_mask

The list of fields to update. If empty, all non-default valued fields are considered for update. Use * to update the entire Recognizer resource.

Type

google.protobuf.field_mask_pb2.FieldMask

validate_only

If set, validate the request and preview the updated Recognizer, but do not actually update it.

Type

bool

class google.cloud.speech_v2.types.WordInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Word-specific information for recognized words.

start_offset

Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if [enable_word_time_offsets][google.cloud.speech.v2.RecognitionFeatures.enable_word_time_offsets] is true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

Type

google.protobuf.duration_pb2.Duration

end_offset

Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if [enable_word_time_offsets][google.cloud.speech.v2.RecognitionFeatures.enable_word_time_offsets] is true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

Type

google.protobuf.duration_pb2.Duration

word

The word corresponding to this set of information.

Type

str

confidence

The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where [is_final][google.cloud.speech.v2.StreamingRecognitionResult.is_final] is set to true. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

Type

float

speaker_label

A distinct label is assigned for every speaker within the audio. This field specifies which one of those speakers was detected to have spoken this word. speaker_label is set if [SpeakerDiarizationConfig][google.cloud.speech.v2.SpeakerDiarizationConfig] is given and only in the top alternative.

Type

str
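A minimal sketch of consuming the WordInfo fields above: it converts the offsets to seconds, treats a confidence of 0.0 as "not set", and groups words by `speaker_label`. The words here are `types.SimpleNamespace` stand-ins, not real WordInfo messages. Whether an offset arrives as a `datetime.timedelta` or as a Duration-like object with `seconds` and `nanos` is treated as an assumption, so the helper accepts both. The names `to_seconds` and `summarize_words` are hypothetical.

```python
from collections import defaultdict
from datetime import timedelta
from types import SimpleNamespace


def to_seconds(offset):
    """Convert a timedelta or a Duration-like object (seconds, nanos) to float seconds."""
    if isinstance(offset, timedelta):
        return offset.total_seconds()
    return offset.seconds + offset.nanos / 1e9


def summarize_words(words):
    """Group (word, start_s, end_s, confidence) tuples by speaker_label.

    A confidence of 0.0 is the documented sentinel for "not set" and is
    reported as None.
    """
    by_speaker = defaultdict(list)
    for w in words:
        confidence = w.confidence if w.confidence != 0.0 else None
        by_speaker[w.speaker_label].append(
            (w.word, to_seconds(w.start_offset), to_seconds(w.end_offset), confidence)
        )
    return dict(by_speaker)


# Stand-in words; all values are made up for the demo.
demo_words = [
    SimpleNamespace(
        word="hello",
        start_offset=timedelta(milliseconds=500),
        end_offset=timedelta(seconds=1),
        confidence=0.75,
        speaker_label="1",
    ),
    SimpleNamespace(
        word="hi",
        start_offset=SimpleNamespace(seconds=1, nanos=500_000_000),
        end_offset=SimpleNamespace(seconds=2, nanos=0),
        confidence=0.0,
        speaker_label="2",
    ),
]
summary = summarize_words(demo_words)
print(summary)
```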