As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

Types for Google Cloud Documentai v1beta3 API

class google.cloud.documentai_v1beta3.types.Barcode(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Encodes the detailed information of a barcode.

format_

Format of a barcode. The supported formats are:

  • CODE_128: Code 128 type.

  • CODE_39: Code 39 type.

  • CODE_93: Code 93 type.

  • CODABAR: Codabar type.

  • DATA_MATRIX: 2D Data Matrix type.

  • ITF: ITF type.

  • EAN_13: EAN-13 type.

  • EAN_8: EAN-8 type.

  • QR_CODE: 2D QR code type.

  • UPC_A: UPC-A type.

  • UPC_E: UPC-E type.

  • PDF417: PDF417 type.

  • AZTEC: 2D Aztec code type.

  • DATABAR: GS1 DataBar code type.

Type

str

value_format

Value format describes the format of the value that a barcode encodes. The supported formats are:

  • CONTACT_INFO: Contact information.

  • EMAIL: Email address.

  • ISBN: ISBN identifier.

  • PHONE: Phone number.

  • PRODUCT: Product.

  • SMS: SMS message.

  • TEXT: Text string.

  • URL: URL address.

  • WIFI: Wifi information.

  • GEO: Geo-localization.

  • CALENDAR_EVENT: Calendar event.

  • DRIVER_LICENSE: Driver’s license.

Type

str

raw_value

Raw value encoded in the barcode. For example: 'MEBKM:TITLE:Google;URL:https://www.google.com;;'.

Type

str
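
The raw_value example above follows the MEBKM bookmark layout. As a minimal sketch, assuming fields are separated by ";" and each field has the form KEY:VALUE with no escaped separators, such a payload could be split as shown below. The helper name parse_mebkm is hypothetical and is not part of this library.

```python
from typing import Dict


def parse_mebkm(raw_value: str) -> Dict[str, str]:
    """Split a MEBKM payload into its fields (hypothetical helper)."""
    prefix = "MEBKM:"
    if not raw_value.startswith(prefix):
        raise ValueError("not a MEBKM payload")
    fields: Dict[str, str] = {}
    for part in raw_value[len(prefix):].split(";"):
        # Split at the first colon only, so URL values keep their own colons.
        key, sep, value = part.partition(":")
        if sep:
            fields[key] = value
    return fields


bookmark = parse_mebkm("MEBKM:TITLE:Google;URL:https://www.google.com;;")
```

Other value_format kinds would need their own parsing rules; this sketch covers only the bookmark example quoted in the field description.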

class google.cloud.documentai_v1beta3.types.BatchDatasetDocuments(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Dataset documents that the batch operation will be applied to.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

individual_document_ids

Document identifiers.

This field is a member of oneof criteria.

Type

google.cloud.documentai_v1beta3.types.BatchDatasetDocuments.IndividualDocumentIds

filter

A filter matching the documents. Follows the same format and restrictions as [google.cloud.documentai.master.ListDocumentsRequest.filter].

This field is a member of oneof criteria.

Type

str

class IndividualDocumentIds(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

List of individual DocumentIds.

document_ids

Required. List of Document IDs indicating where the actual documents are stored.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.DocumentId]
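
The oneof behaviour described for criteria (setting one member clears the other) can be modelled in plain Python. The class below is only an illustrative stand-in, assuming two members named as in the message above; it is not the library's message class, and the filter string is a placeholder.

```python
from typing import List, Optional


class CriteriaStandIn:
    """Plain-Python model of the `criteria` oneof; not the library's class."""

    def __init__(self) -> None:
        self._ids: Optional[List[str]] = None
        self._filter: Optional[str] = None

    @property
    def individual_document_ids(self) -> Optional[List[str]]:
        return self._ids

    @individual_document_ids.setter
    def individual_document_ids(self, value: List[str]) -> None:
        self._ids = list(value)
        self._filter = None  # setting one member clears the other

    @property
    def filter(self) -> Optional[str]:
        return self._filter

    @filter.setter
    def filter(self, value: str) -> None:
        self._filter = value
        self._ids = None  # setting one member clears the other

    def which(self) -> Optional[str]:
        """Return the name of the member that is currently set, if any."""
        if self._ids is not None:
            return "individual_document_ids"
        if self._filter is not None:
            return "filter"
        return None


criteria = CriteriaStandIn()
criteria.filter = "hypothetical-filter-expression"
criteria.individual_document_ids = ["doc-1", "doc-2"]  # clears `filter`
```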

class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

individual_batch_delete_statuses

The list of response details of each document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsMetadata.IndividualBatchDeleteStatus]

total_document_count

Total number of documents being deleted from the dataset.

Type

int

error_document_count

Total number of documents that failed to be deleted in storage.

Type

int

class IndividualBatchDeleteStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The status of each individual document in the batch delete process.

document_id

The document id of the document.

Type

google.cloud.documentai_v1beta3.types.DocumentId

status

The status of deleting the document in storage.

Type

google.rpc.status_pb2.Status

class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

dataset

Required. The dataset resource name. Format:

projects/{project}/locations/{location}/processors/{processor}/dataset

Type

str

dataset_documents

Required. Dataset documents input. If a filter is given, all documents satisfying the filter will be deleted. If individual document IDs are given, a maximum of 50 documents can be deleted in a batch; the request will be rejected if more than 50 document_ids are provided.

Type

google.cloud.documentai_v1beta3.types.BatchDatasetDocuments
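
Because a request carrying more than 50 document_ids is rejected, a caller with a longer list has to split it first. The following is a minimal sketch of that splitting step only; the helper name batched_ids and the sample IDs are hypothetical, and sending each batch is left out.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_REQUEST = 50  # limit stated in the field description above


def batched_ids(
    document_ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(document_ids), size):
        yield list(document_ids[start:start + size])


# 120 placeholder IDs are split into batches that each respect the limit.
batches = list(batched_ids([f"doc-{i}" for i in range(120)]))
```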

class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response of the delete documents operation.

class google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The common config to specify a set of documents used as input.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

gcs_prefix

The set of documents that match the specified Cloud Storage gcs_prefix.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.GcsPrefix

gcs_documents

The set of documents individually specified on Cloud Storage.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.GcsDocuments

class google.cloud.documentai_v1beta3.types.BatchProcessMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].

state

The state of the current batch processing.

Type

google.cloud.documentai_v1beta3.types.BatchProcessMetadata.State

state_message

A message providing more details about the current state of processing. For example, the error message if the operation failed.

Type

str

create_time

The creation time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

The last update time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

individual_process_statuses

The list of response details of each document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.BatchProcessMetadata.IndividualProcessStatus]

class IndividualProcessStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The status of each individual document in the batch process.

input_gcs_source

The source of the document, same as the [input_gcs_source][google.cloud.documentai.v1beta3.BatchProcessMetadata.IndividualProcessStatus.input_gcs_source] field in the request when the batch process started.

Type

str

status

The status of processing the document.

Type

google.rpc.status_pb2.Status

output_gcs_destination

The Cloud Storage output destination (in the request as [DocumentOutputConfig.GcsOutputConfig.gcs_uri][google.cloud.documentai.v1beta3.DocumentOutputConfig.GcsOutputConfig.gcs_uri]) of the processed document if it was successful, otherwise empty.

Type

str

human_review_operation

The name of the operation triggered by the processed document. If the human review process isn’t triggered, this field will be empty. It has the same response type and metadata as the long-running operation returned by the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.

Type

str

human_review_status

The status of human review on the processed document.

Type

google.cloud.documentai_v1beta3.types.HumanReviewStatus
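
One way to use the per-document statuses is to separate documents that were processed from those that were not. The sketch below uses small dataclass stand-ins rather than the library's types, and assumes only that a google.rpc status code of 0 means OK. All names and sample URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StatusStandIn:
    code: int = 0  # 0 is OK in google.rpc codes
    message: str = ""


@dataclass
class ProcessStatusStandIn:
    input_gcs_source: str
    status: StatusStandIn = field(default_factory=StatusStandIn)
    output_gcs_destination: str = ""


def split_by_outcome(
    statuses: List[ProcessStatusStandIn],
) -> Tuple[List[ProcessStatusStandIn], List[ProcessStatusStandIn]]:
    """Return (succeeded, failed) lists based on the status code."""
    succeeded = [s for s in statuses if s.status.code == 0]
    failed = [s for s in statuses if s.status.code != 0]
    return succeeded, failed


statuses = [
    ProcessStatusStandIn("gs://in/a.pdf", StatusStandIn(), "gs://out/a"),
    ProcessStatusStandIn("gs://in/b.pdf", StatusStandIn(3, "invalid input")),
]
succeeded, failed = split_by_outcome(statuses)
```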

class State(value)[source]

Bases: proto.enums.Enum

Possible states of the batch processing operation.

Values:
STATE_UNSPECIFIED (0):

The default value. This value is used if the state is omitted.

WAITING (1):

Request operation is waiting for scheduling.

RUNNING (2):

Request is being processed.

SUCCEEDED (3):

The batch processing completed successfully.

CANCELLING (4):

The batch processing is being cancelled.

CANCELLED (5):

The batch processing was cancelled.

FAILED (6):

The batch processing has failed.
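
When polling a batch operation it can help to know whether a state is final. The sketch below mirrors the numeric values listed above in a local enum and treats SUCCEEDED, CANCELLED, and FAILED as terminal. That grouping is an assumption drawn from the descriptions, and BatchState and is_terminal are hypothetical names, not library symbols.

```python
import enum


class BatchState(enum.IntEnum):
    """Local mirror of the values listed above (not the library enum)."""

    STATE_UNSPECIFIED = 0
    WAITING = 1
    RUNNING = 2
    SUCCEEDED = 3
    CANCELLING = 4
    CANCELLED = 5
    FAILED = 6


# Assumption: these three states are never left once reached.
TERMINAL_STATES = frozenset(
    {BatchState.SUCCEEDED, BatchState.CANCELLED, BatchState.FAILED}
)


def is_terminal(state: int) -> bool:
    """Return True if the given state value is treated as final."""
    return BatchState(state) in TERMINAL_STATES
```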

class google.cloud.documentai_v1beta3.types.BatchProcessRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].

name

Required. The resource name of [Processor][google.cloud.documentai.v1beta3.Processor] or [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion]. Format: projects/{project}/locations/{location}/processors/{processor}, or projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

Type

str
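
The two name formats above can be assembled with ordinary string formatting. This is a small sketch; the function names and the sample project, location, and ID values are placeholders, not values defined by the library.

```python
def processor_name(project: str, location: str, processor: str) -> str:
    """Build a processor resource name in the format given above."""
    return f"projects/{project}/locations/{location}/processors/{processor}"


def processor_version_name(
    project: str, location: str, processor: str, processor_version: str
) -> str:
    """Build a processor version resource name in the format given above."""
    base = processor_name(project, location, processor)
    return f"{base}/processorVersions/{processor_version}"


# Placeholder values for illustration only.
name = processor_name("my-project", "us", "abc123")
version_name = processor_version_name("my-project", "us", "abc123", "v1")
```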

input_configs

The input config for each single document in the batch process.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.BatchProcessRequest.BatchInputConfig]

output_config

The overall output config for batch process.

Type

google.cloud.documentai_v1beta3.types.BatchProcessRequest.BatchOutputConfig

input_documents

The input documents for the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.

Type

google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig

document_output_config

The output configuration for the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.

Type

google.cloud.documentai_v1beta3.types.DocumentOutputConfig

skip_human_review

Whether human review should be skipped for this request. Defaults to false.

Type

bool

process_options

Inference-time options for the process API.

Type

google.cloud.documentai_v1beta3.types.ProcessOptions

labels

Optional. The labels with user-defined metadata for the request. Label keys and values can be no longer than 63 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter.

Type

MutableMapping[str, str]
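
The label rules above can be checked on the client before a request is sent. The following is a rough sketch under one reading of those rules: a character is accepted if it is a digit, an underscore, a dash, or any letter that is not uppercase, which admits international letters. The function name label_problems is hypothetical, and the service remains the authority on what it accepts.

```python
from typing import Dict, List

MAX_LABEL_LENGTH = 63  # counted in Unicode code points, as len() does


def _allowed_char(ch: str) -> bool:
    # Digits, underscore, dash, or a letter that is not uppercase.
    return ch in "_-" or ch.isdigit() or (ch.isalpha() and not ch.isupper())


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    for key, value in labels.items():
        if not key or not key[0].isalpha():
            problems.append(f"{key!r}: key must start with a letter")
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{key!r}: {kind} exceeds 63 characters")
            if not all(_allowed_char(ch) for ch in text):
                problems.append(f"{key!r}: {kind} has a disallowed character")
    return problems
```

Empty values pass the check because label values are optional.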

class BatchInputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The message for input config in batch process.

gcs_source

The Cloud Storage location as the source of the document.

Type

str

mime_type

An IANA published media type (MIME type) of the input. If the input is a raw document, refer to supported file types for the list of media types. If the input is a [Document][google.cloud.documentai.v1beta3.Document], the type should be application/json.

Type

str

class BatchOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The output configuration in the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.

gcs_destination

The output Cloud Storage directory in which to put the processed documents.

Type

str

class LabelsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class google.cloud.documentai_v1beta3.types.BatchProcessResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].

class google.cloud.documentai_v1beta3.types.BoundingPoly(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A bounding polygon for the detected image annotation.

vertices

The bounding polygon vertices.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Vertex]

normalized_vertices

The bounding polygon normalized vertices.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.NormalizedVertex]
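
The relation between the two vertex lists can be illustrated with a short conversion. This sketch assumes that normalized coordinates lie in the range 0 to 1 relative to the page dimensions, which is not stated in this section, and it uses plain (x, y) tuples instead of the library's vertex types. The function name to_pixels is hypothetical.

```python
from typing import List, Sequence, Tuple


def to_pixels(
    normalized: Sequence[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (x, y) pairs to pixel coordinates (assumed 0..1 range)."""
    return [(round(x * width), round(y * height)) for x, y in normalized]


corners = to_pixels([(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)], width=1000, height=800)
```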

class google.cloud.documentai_v1beta3.types.CommonOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The common metadata for long-running operations.

state

The state of the operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata.State

state_message

A message providing more details about the current state of processing.

Type

str

resource

A related resource to this operation.

Type

str

create_time

The creation time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

The last update time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

class State(value)[source]

Bases: proto.enums.Enum

State of the long-running operation.

Values:
STATE_UNSPECIFIED (0):

Unspecified state.

RUNNING (1):

Operation is still running.

CANCELLING (2):

Operation is being cancelled.

SUCCEEDED (3):

Operation succeeded.

FAILED (4):

Operation failed.

CANCELLED (5):

Operation is cancelled.

class google.cloud.documentai_v1beta3.types.CreateProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [CreateProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.CreateProcessor] method. Notice this request is sent to a regionalized backend service. If the [ProcessorType][google.cloud.documentai.v1beta3.ProcessorType] isn’t available in that region, the creation fails.

parent

Required. The parent (project and location) under which to create the processor. Format: projects/{project}/locations/{location}

Type

str

processor

Required. The processor to be created, requires [Processor.type][google.cloud.documentai.v1beta3.Processor.type] and [Processor.display_name][google.cloud.documentai.v1beta3.Processor.display_name] to be set. Also, the [Processor.kms_key_name][google.cloud.documentai.v1beta3.Processor.kms_key_name] field must be set if the processor is under CMEK.

Type

google.cloud.documentai_v1beta3.types.Processor

class google.cloud.documentai_v1beta3.types.Dataset(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A singleton resource under a [Processor][google.cloud.documentai.v1beta3.Processor] which configures a collection of documents.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

gcs_managed_config

Optional. User-managed Cloud Storage dataset configuration. Use this configuration if the dataset documents are stored under a user-managed Cloud Storage location.

This field is a member of oneof storage_source.

Type

google.cloud.documentai_v1beta3.types.Dataset.GCSManagedConfig

document_warehouse_config

Optional. Deprecated. Warehouse-based dataset configuration is not supported.

This field is a member of oneof storage_source.

Type

google.cloud.documentai_v1beta3.types.Dataset.DocumentWarehouseConfig

unmanaged_dataset_config

Optional. Unmanaged dataset configuration. Use this configuration if the dataset documents are managed by the document service internally (not user-managed).

This field is a member of oneof storage_source.

Type

google.cloud.documentai_v1beta3.types.Dataset.UnmanagedDatasetConfig

spanner_indexing_config

Optional. A lightweight indexing source with low latency and high reliability, but lacking advanced features like CMEK and content-based search.

This field is a member of oneof indexing_source.

Type

google.cloud.documentai_v1beta3.types.Dataset.SpannerIndexingConfig

name

Dataset resource name. Format: projects/{project}/locations/{location}/processors/{processor}/dataset

Type

str

state

Required. State of the dataset. Ignored when updating dataset.

Type

google.cloud.documentai_v1beta3.types.Dataset.State

satisfies_pzs

Output only. Reserved for future use.

Type

bool

satisfies_pzi

Output only. Reserved for future use.

Type

bool

class DocumentWarehouseConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration specific to the Document AI Warehouse-based implementation.

collection

Output only. The collection in Document AI Warehouse associated with the dataset.

Type

str

schema

Output only. The schema in Document AI Warehouse associated with the dataset.

Type

str

class GCSManagedConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration specific to the Cloud Storage-based implementation.

gcs_prefix

Required. The Cloud Storage URI (a directory) where the documents belonging to the dataset must be stored.

Type

google.cloud.documentai_v1beta3.types.GcsPrefix

class SpannerIndexingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration specific to spanner-based indexing.

class State(value)[source]

Bases: proto.enums.Enum

Different states of a dataset.

Values:
STATE_UNSPECIFIED (0):

Default unspecified enum, should not be used.

UNINITIALIZED (1):

Dataset has not been initialized.

INITIALIZING (2):

Dataset is being initialized.

INITIALIZED (3):

Dataset has been initialized.

class UnmanagedDatasetConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configuration specific to an unmanaged dataset.

class google.cloud.documentai_v1beta3.types.DatasetSchema(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Dataset Schema.

name

Dataset schema resource name. Format: projects/{project}/locations/{location}/processors/{processor}/dataset/datasetSchema

Type

str

document_schema

Optional. Schema of the dataset.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema

satisfies_pzs

Output only. Reserved for future use.

Type

bool

satisfies_pzi

Output only. Reserved for future use.

Type

bool

class google.cloud.documentai_v1beta3.types.DatasetSplitType(value)[source]

Bases: proto.enums.Enum

Documents belonging to a dataset will be split into different groups referred to as splits: train, test.

Values:
DATASET_SPLIT_TYPE_UNSPECIFIED (0):

Default value if the enum is not set.

DATASET_SPLIT_TRAIN (1):

Identifies the train documents.

DATASET_SPLIT_TEST (2):

Identifies the test documents.

DATASET_SPLIT_UNASSIGNED (3):

Identifies the unassigned documents.

class google.cloud.documentai_v1beta3.types.DeleteProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [DeleteProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessor] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.DeleteProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeleteProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessor] method.

name

Required. The processor resource name to be deleted.

Type

str

class google.cloud.documentai_v1beta3.types.DeleteProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [DeleteProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessorVersion] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.DeleteProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeleteProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessorVersion] method.

name

Required. The processor version resource name to be deleted.

Type

str

class google.cloud.documentai_v1beta3.types.DeployProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.DeployProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.

name

Required. The processor version resource name to be deployed.

Type

str

class google.cloud.documentai_v1beta3.types.DeployProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.

class google.cloud.documentai_v1beta3.types.DisableProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.DisableProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method.

name

Required. The processor resource name to be disabled.

Type

str

class google.cloud.documentai_v1beta3.types.DisableProcessorResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method. Intentionally empty proto for adding fields in future.

class google.cloud.documentai_v1beta3.types.Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

uri

Optional. Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. For more information, refer to Google Cloud Storage Request URIs.

This field is a member of oneof source.

Type

str
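
A URI of the form gs://bucket_name/object_name can be separated into its bucket and object parts with plain string handling. This is a minimal sketch; split_gcs_uri is a hypothetical helper, and the sample URI is a placeholder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket_name/object_name' into (bucket, object)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("expected a gs:// URI")
    # Only the first slash separates the bucket from the object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError("URI must contain both a bucket and an object name")
    return bucket, object_name


bucket, object_name = split_gcs_uri("gs://bucket_name/path/to/object.pdf")
```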

content

Optional. Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64.

This field is a member of oneof source.

Type

bytes
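
The note on binary versus JSON representation can be made concrete with the standard library. The sketch below builds a JSON object with a base64-encoded content value and decodes it again; the sample bytes are placeholders, and the dictionary is a hand-built illustration rather than output produced by the library.

```python
import base64
import json

content = b"%PDF-1.7 sample bytes"  # placeholder document bytes

# In JSON, a bytes field is carried as base64 text.
json_doc = json.dumps({"content": base64.b64encode(content).decode("ascii")})

# Decoding the text restores the original bytes.
restored = base64.b64decode(json.loads(json_doc)["content"])
```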

mime_type

An IANA published media type (MIME type).

Type

str

text

Optional. UTF-8 encoded text in reading order from the document.

Type

str

text_styles

Styles for the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Style]

pages

Visual page layout for the [Document][google.cloud.documentai.v1beta3.Document].

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page]

entities

A list of entities detected on [Document.text][google.cloud.documentai.v1beta3.Document.text]. For document shards, entities in this list may cross shard boundaries.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Entity]

entity_relations

Placeholder. Relationship among [Document.entities][google.cloud.documentai.v1beta3.Document.entities].

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.EntityRelation]

text_changes

Placeholder. A list of text corrections made to [Document.text][google.cloud.documentai.v1beta3.Document.text]. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.TextChange]

shard_info

Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.

Type

google.cloud.documentai_v1beta3.types.Document.ShardInfo

error

Any error that occurred while processing this document.

Type

google.rpc.status_pb2.Status

revisions

Placeholder. Revision history of this document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Revision]

document_layout

Parsed layout of the document.

Type

google.cloud.documentai_v1beta3.types.Document.DocumentLayout

chunked_document

Document chunked based on chunking config.

Type

google.cloud.documentai_v1beta3.types.Document.ChunkedDocument

class ChunkedDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents the chunks that the document is divided into.

chunks

List of chunks.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk]

class Chunk(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a chunk.

chunk_id

ID of the chunk.

Type

str

source_block_ids

Unused.

Type

MutableSequence[str]

content

Text content of the chunk.

Type

str

page_span

Page span of the chunk.

Type

google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageSpan

page_headers

Page headers associated with the chunk.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageHeader]

page_footers

Page footers associated with the chunk.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageFooter]

class ChunkPageFooter(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents the page footer associated with the chunk.

text

Footer in text format.

Type

str

page_span

Page span of the footer.

Type

google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageSpan

class ChunkPageHeader(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents the page header associated with the chunk.

text

Header in text format.

Type

str

page_span

Page span of the header.

Type

google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageSpan

class ChunkPageSpan(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents where the chunk starts and ends in the document.

page_start

Page where chunk starts in the document.

Type

int

page_end

Page where chunk ends in the document.

Type

int

class DocumentLayout(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents the parsed layout of a document as a collection of blocks that the document is divided into.

blocks

List of blocks in the document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]

class DocumentLayoutBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a block. A block could be one of the various types (text, table, list) supported.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

text_block

Block consisting of text content.

This field is a member of oneof block.

Type

google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutTextBlock

table_block

Block consisting of table content/structure.

This field is a member of oneof block.

Type

google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutTableBlock

list_block

Block consisting of list content/structure.

This field is a member of oneof block.

Type

google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutListBlock

block_id

ID of the block.

Type

str

page_span

Page span of the block.

Type

google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutPageSpan

class LayoutListBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a list type block.

list_entries

List entries that constitute a list block.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutListEntry]

type_

Type of the list_entries (if any). Available options are ordered and unordered.

Type

str

class LayoutListEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents an entry in the list.

blocks

A list entry is a list of blocks. Repeated blocks support further hierarchies and nested blocks.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]

class LayoutPageSpan(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents where the block starts and ends in the document.

page_start

Page where block starts in the document.

Type

int

page_end

Page where block ends in the document.

Type

int

class LayoutTableBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a table type block.

header_rows

Header rows at the top of the table.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutTableRow]

body_rows

Body rows containing main table content.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutTableRow]

caption

Table caption/title.

Type

str

class LayoutTableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a cell in a table row.

blocks

A table cell is a list of blocks. Repeated blocks support further hierarchies and nested blocks.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]

row_span

How many rows this cell spans.

Type

int

col_span

How many columns this cell spans.

Type

int

class LayoutTableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a row in a table.

cells

A table row is a list of table cells.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock.LayoutTableCell]

class LayoutTextBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a text type block.

text

Text content stored in the block.

Type

str

type_

Type of the text in the block. Available options are: paragraph, subtitle, heading-1, heading-2, heading-3, heading-4, heading-5, header, footer.

Type

str

blocks

A text block could further have child blocks. Repeated blocks support further hierarchies and nested blocks.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]
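Because blocks can nest, a recursive walk is one way to read a text block hierarchy. The following is a minimal sketch using a simplified stand-in class (an assumption: here the children are text blocks directly, whereas the real field holds DocumentLayoutBlock messages).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TextBlock:  # hypothetical stand-in for LayoutTextBlock
    text: str = ""
    type_: str = "paragraph"
    blocks: List["TextBlock"] = field(default_factory=list)


def walk(block: TextBlock, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, type_, text) for a block and all nested child blocks."""
    yield depth, block.type_, block.text
    for child in block.blocks:
        yield from walk(child, depth + 1)


root = TextBlock("Report", "heading-1", [
    TextBlock("Summary", "heading-2", [TextBlock("All good.", "paragraph")]),
])
outline = list(walk(root))
```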

class Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.

text_anchor

Optional. Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

google.cloud.documentai_v1beta3.types.Document.TextAnchor

type_

Required. Entity type from a schema e.g. Address.

Type

str

mention_text

Optional. Text value of the entity e.g. 1600 Amphitheatre Pkwy.

Type

str

mention_id

Optional. Deprecated. Use id field instead.

Type

str

confidence

Optional. Confidence of detected Schema entity. Range [0, 1].

Type

float

page_anchor

Optional. Represents the provenance of this entity with respect to the location on the page where it was found.

Type

google.cloud.documentai_v1beta3.types.Document.PageAnchor

id

Optional. Canonical id. This will be a unique value in the entity list for this document.

Type

str

normalized_value

Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types.

Type

google.cloud.documentai_v1beta3.types.Document.Entity.NormalizedValue

properties

Optional. Entities can be nested to form a hierarchical data structure representing the content in the document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Entity]

provenance

Optional. The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

redacted

Optional. Whether the entity will be redacted for de-identification purposes.

Type

bool
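Since properties nests entities inside entities, reading them usually means a recursive traversal, often combined with a confidence filter. The sketch below assumes a simplified stand-in class and made-up entity type names (supplier_address, line_item and so on are illustrative, not taken from any particular schema).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Entity:  # hypothetical stand-in for Document.Entity
    type_: str
    mention_text: str = ""
    confidence: float = 0.0
    properties: List["Entity"] = field(default_factory=list)


def iter_entities(
    entities: List[Entity], min_confidence: float = 0.0, prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (type path, mention_text) for entities at or above the threshold."""
    for entity in entities:
        path = f"{prefix}/{entity.type_}" if prefix else entity.type_
        if entity.confidence >= min_confidence:
            yield path, entity.mention_text
        # Nested properties are visited even if the parent was filtered out.
        yield from iter_entities(entity.properties, min_confidence, path)


entities = [
    Entity("supplier_address", "1600 Amphitheatre Pkwy", 0.97),
    Entity("line_item", "Pen 1.50", 0.90, [
        Entity("description", "Pen", 0.95),
        Entity("amount", "1.50", 0.40),
    ]),
]
found = list(iter_entities(entities, min_confidence=0.5))
```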

class NormalizedValue(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Parsed and normalized entity value.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

money_value

Money value. See also:

https://github.com/googleapis/googleapis/blob/master/google/type/money.proto

This field is a member of oneof structured_value.

Type

google.type.money_pb2.Money

date_value

Date value. Includes year, month, day. See also: https://github.com/googleapis/googleapis/blob/master/google/type/date.proto

This field is a member of oneof structured_value.

Type

google.type.date_pb2.Date

datetime_value

DateTime value. Includes date, time, and timezone. See also: https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto

This field is a member of oneof structured_value.

Type

google.type.datetime_pb2.DateTime

address_value

Postal address. See also: https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto

This field is a member of oneof structured_value.

Type

google.type.postal_address_pb2.PostalAddress

boolean_value

Boolean value. Can be used for entities with binary values, or for checkboxes.

This field is a member of oneof structured_value.

Type

bool

integer_value

Integer value.

This field is a member of oneof structured_value.

Type

int

float_value

Float value.

This field is a member of oneof structured_value.

Type

float

text

Optional. A field that stores a normalized string. For some entity types, one of the respective structured_value fields may also be populated. Not all structured_value types are normalized; for example, some processors may not generate float or integer normalized text by default.

Below are sample formats mapped to structured values.

  • Money/Currency type (money_value) is in the ISO 4217 text format.

  • Date type (date_value) is in the ISO 8601 text format.

  • Datetime type (datetime_value) is in the ISO 8601 text format.

Type

str
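To make the ISO 8601 mapping for date_value concrete, the sketch below formats a year/month/day triple with the standard library. The SimpleNamespace object is a stand-in for google.type.date_pb2.Date (an assumption so the example has no dependencies); whether a given processor produces exactly this text is not confirmed here.

```python
from datetime import date
from types import SimpleNamespace

# Stand-in for a populated date_value (year, month and day all non-zero).
date_value = SimpleNamespace(year=2023, month=4, day=9)

# datetime.date rejects zero month/day, so partial dates need separate handling.
iso_text = date(date_value.year, date_value.month, date_value.day).isoformat()
```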

class EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Relationship between [Entities][google.cloud.documentai.v1beta3.Document.Entity].

subject_id

Subject entity id.

Type

str

object_id

Object entity id.

Type

str

relation

Relationship description.

Type

str

class Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A page in a [Document][google.cloud.documentai.v1beta3.Document].

page_number

1-based index for current [Page][google.cloud.documentai.v1beta3.Document.Page] in a parent [Document][google.cloud.documentai.v1beta3.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta3.Document] for individual processing.

Type

int

image

Rendered image for this page. This image is preprocessed to remove any skew, rotation, and distortions such that the annotation bounding boxes can be upright and axis-aligned.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Image

transforms

Transformation matrices that were applied to the original document image to produce [Page.image][google.cloud.documentai.v1beta3.Document.Page.image].

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Matrix]

dimension

Physical dimension of the page.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Dimension

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the page.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

blocks

A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Block]

paragraphs

A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Paragraph]

lines

A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Line]

tokens

A list of visually detected tokens on the page.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Token]

visual_elements

A list of detected non-text visual elements e.g. checkbox, signature etc. on the page.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.VisualElement]

tables

A list of visually detected tables on the page.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table]

form_fields

A list of visually detected form fields on the page.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.FormField]

symbols

A list of visually detected symbols on the page.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Symbol]

detected_barcodes

A list of detected barcodes.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedBarcode]

image_quality_scores

Image quality scores.

Type

google.cloud.documentai_v1beta3.types.Document.Page.ImageQualityScores

provenance

The history of this page.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta3.Document.Page.Block].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

provenance

The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class DetectedBarcode(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A detected barcode.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [DetectedBarcode][google.cloud.documentai.v1beta3.Document.Page.DetectedBarcode].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

barcode

Detailed barcode information of the [DetectedBarcode][google.cloud.documentai.v1beta3.Document.Page.DetectedBarcode].

Type

google.cloud.documentai_v1beta3.types.Barcode

class DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Detected language for a structural component.

language_code

The BCP-47 language code, such as en-US or sr-Latn.

Type

str

confidence

Confidence of detected language. Range [0, 1].

Type

float

class Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Dimension for the page.

width

Page width.

Type

float

height

Page height.

Type

float

unit

Dimension unit.

Type

str

class FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A form field detected on the page.

field_name

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta3.Document.Page.FormField] name. e.g. Address, Email, Grand total, Phone number, etc.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

field_value

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta3.Document.Page.FormField] value.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

name_detected_languages

A list of detected languages for name together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

value_detected_languages

A list of detected languages for value together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

value_type

If the value is non-textual, this field represents the type. Current valid values are:

  • blank (this indicates the field_value is normal text)

  • unfilled_checkbox

  • filled_checkbox

Type

str
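One possible way to interpret value_type is to turn the two checkbox values into booleans and treat everything else as text. The helper name form_value is hypothetical, and the text argument stands for whatever string was resolved from field_value.

```python
from typing import Union


def form_value(value_type: str, text: str) -> Union[bool, str]:
    """Map a form field to True/False for checkboxes, else to its text."""
    if value_type == "filled_checkbox":
        return True
    if value_type == "unfilled_checkbox":
        return False
    # A blank value_type indicates that field_value is normal text.
    return text.strip()


checked = form_value("filled_checkbox", "")
unchecked = form_value("unfilled_checkbox", "")
name = form_value("", " Jane Doe \n")
```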

corrected_key_text

Created for Labeling UI to export key text. If corrections were made to the text identified by the field_name.text_anchor, this field will contain the correction.

Type

str

corrected_value_text

Created for Labeling UI to export value text. If corrections were made to the text identified by the field_value.text_anchor, this field will contain the correction.

Type

str

provenance

The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class Image(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Rendered image contents for this page.

content

Raw byte content of the image.

Type

bytes

mime_type

Encoding media type (MIME type) for the image.

Type

str

width

Width of the image in pixels.

Type

int

height

Height of the image in pixels.

Type

int

class ImageQualityScores(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Image quality scores for the page image.

quality_score

The overall quality score. Range [0, 1] where 1 is perfect quality.

Type

float

detected_defects

A list of detected defects.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.ImageQualityScores.DetectedDefect]

class DetectedDefect(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Image Quality Defects

type_

Name of the defect type. Supported values are:

  • quality/defect_blurry

  • quality/defect_noisy

  • quality/defect_dark

  • quality/defect_faint

  • quality/defect_text_too_small

  • quality/defect_document_cutoff

  • quality/defect_text_cutoff

  • quality/defect_glare

Type

str

confidence

Confidence of detected defect. Range [0, 1] where 1 indicates strong confidence that the defect exists.

Type

float
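A typical use of detected_defects is to keep only the defects the model is reasonably sure about. The sketch below uses SimpleNamespace stand-ins and an arbitrary 0.5 threshold; both are assumptions for illustration.

```python
from types import SimpleNamespace

# Stand-ins for DetectedDefect messages.
detected_defects = [
    SimpleNamespace(type_="quality/defect_blurry", confidence=0.8),
    SimpleNamespace(type_="quality/defect_glare", confidence=0.2),
    SimpleNamespace(type_="quality/defect_dark", confidence=0.6),
]

THRESHOLD = 0.5  # arbitrary cut-off chosen for this example

# Keep likely defects, strongest first.
likely = sorted(
    (d for d in detected_defects if d.confidence >= THRESHOLD),
    key=lambda d: d.confidence,
    reverse=True,
)
likely_types = [d.type_ for d in likely]
```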

class Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Visual element describing a layout unit on a page.

text_anchor

Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

google.cloud.documentai_v1beta3.types.Document.TextAnchor

confidence

Confidence of the current [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].

Type

float

bounding_poly

The bounding polygon for the [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout].

Type

google.cloud.documentai_v1beta3.types.BoundingPoly

orientation

Detected orientation for the [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout.Orientation

class Orientation(value)[source]

Bases: proto.enums.Enum

Detected human reading orientation.

Values:
ORIENTATION_UNSPECIFIED (0):

Unspecified orientation.

PAGE_UP (1):

Orientation is aligned with page up.

PAGE_RIGHT (2):

Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read.

PAGE_DOWN (3):

Orientation is aligned with page down. Turn the head 180 degrees from upright to read.

PAGE_LEFT (4):

Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read.
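The enum descriptions can be summarized as the clockwise angle a reader turns their head to read the text. The IntEnum below is a local stand-in that mirrors the documented values, and the lookup table name is hypothetical.

```python
from enum import IntEnum


class Orientation(IntEnum):  # local stand-in mirroring the documented values
    ORIENTATION_UNSPECIFIED = 0
    PAGE_UP = 1
    PAGE_RIGHT = 2
    PAGE_DOWN = 3
    PAGE_LEFT = 4


# Clockwise head turn, in degrees, needed to read text in each orientation.
# 90 degrees counterclockwise (PAGE_LEFT) is expressed as 270 clockwise.
HEAD_TURN_CLOCKWISE = {
    Orientation.PAGE_UP: 0,
    Orientation.PAGE_RIGHT: 90,
    Orientation.PAGE_DOWN: 180,
    Orientation.PAGE_LEFT: 270,
}

turn = HEAD_TURN_CLOCKWISE.get(Orientation(2))
unknown = HEAD_TURN_CLOCKWISE.get(Orientation.ORIENTATION_UNSPECIFIED)
```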

class Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta3.Document.Page.Line].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

provenance

The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class Matrix(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.

rows

Number of rows in the matrix.

Type

int

cols

Number of columns in the matrix.

Type

int

type_

This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html

Type

int

data

The matrix data.

Type

bytes
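The sketch below shows one way the rows, cols, type_ and data fields could be decoded into nested lists without OpenCV. It relies on the OpenCV convention that the low three bits of the type give the element depth and the remaining bits give the channel count minus one. Row-major layout and little-endian byte order are assumptions not stated in this reference, and decode_matrix is a hypothetical helper.

```python
import struct
from typing import List

# OpenCV depth codes (CV_8U=0 ... CV_64F=6) mapped to struct format characters.
_DEPTH_FORMATS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "i", 5: "f", 6: "d"}


def decode_matrix(rows: int, cols: int, type_: int, data: bytes,
                  byte_order: str = "<") -> List[List[float]]:
    """Decode a single-channel matrix into a list of rows."""
    depth = type_ & 7              # low three bits: element depth
    channels = (type_ >> 3) + 1    # remaining bits: channel count minus one
    if channels != 1 or depth not in _DEPTH_FORMATS:
        raise ValueError(f"unsupported matrix type {type_}")
    fmt = f"{byte_order}{rows * cols}{_DEPTH_FORMATS[depth]}"
    values = struct.unpack(fmt, data)  # assumes row-major element order
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


# A 2x3 affine identity transform stored as CV_64F (type 6).
raw = struct.pack("<6d", 1, 0, 0, 0, 1, 0)
matrix = decode_matrix(2, 3, 6, raw)
```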

class Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A collection of lines that a human would perceive as a paragraph.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta3.Document.Page.Paragraph].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

provenance

The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class Symbol(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A detected symbol.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Symbol][google.cloud.documentai.v1beta3.Document.Page.Symbol].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

class Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A table representation similar to HTML table structure.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta3.Document.Page.Table].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

header_rows

Header rows of the table.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableRow]

body_rows

Body rows of the table.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableRow]

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

provenance

The history of this table.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

class TableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A cell representation inside the table.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta3.Document.Page.Table.TableCell].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

row_span

How many rows this cell spans.

Type

int

col_span

How many columns this cell spans.

Type

int

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

class TableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A row of table cells.

cells

Cells that make up this row.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableCell]

class Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A detected token.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta3.Document.Page.Token].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

detected_break

Detected break at the end of a [Token][google.cloud.documentai.v1beta3.Document.Page.Token].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Token.DetectedBreak

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

provenance

The history of this annotation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance

style_info

Text style attributes.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Token.StyleInfo

class DetectedBreak(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Detected break at the end of a [Token][google.cloud.documentai.v1beta3.Document.Page.Token].

type_

Detected break type.

Type

google.cloud.documentai_v1beta3.types.Document.Page.Token.DetectedBreak.Type

class Type(value)[source]

Bases: proto.enums.Enum

Enum to denote the type of break found.

Values:
TYPE_UNSPECIFIED (0):

Unspecified break type.

SPACE (1):

A single whitespace.

WIDE_SPACE (2):

A wider whitespace.

HYPHEN (3):

A hyphen that indicates that a token has been split across lines.
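The break type can be used to rebuild running text from tokens. The following is a minimal sketch under stated assumptions: the IntEnum is a local stand-in, token text is assumed not to include the line-break hyphen, and rendering WIDE_SPACE as two spaces is an arbitrary choice.

```python
from enum import IntEnum
from typing import List, Tuple


class BreakType(IntEnum):  # local stand-in for DetectedBreak.Type
    TYPE_UNSPECIFIED = 0
    SPACE = 1
    WIDE_SPACE = 2
    HYPHEN = 3


# Separator appended after a token for each break type.
SEPARATORS = {
    BreakType.TYPE_UNSPECIFIED: "",
    BreakType.SPACE: " ",
    BreakType.WIDE_SPACE: "  ",
    BreakType.HYPHEN: "",  # token was split across lines: rejoin directly
}


def join_tokens(tokens: List[Tuple[str, BreakType]]) -> str:
    """Concatenate (text, break) pairs into a single string."""
    return "".join(text + SEPARATORS[brk] for text, brk in tokens).rstrip()


sentence = join_tokens([
    ("Docu", BreakType.HYPHEN),
    ("ment", BreakType.SPACE),
    ("AI", BreakType.TYPE_UNSPECIFIED),
])
```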

class StyleInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Font and other text style attributes.

font_size

Font size in points (1 point is ¹⁄₇₂ inches).

Type

int

pixel_font_size

Font size in pixels, equal to unrounded [font_size][google.cloud.documentai.v1beta3.Document.Page.Token.StyleInfo.font_size] × resolution ÷ 72.0.

Type

float
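The relationship between point size and pixel size is simple arithmetic: font size in points times resolution, divided by 72. The sketch below assumes the resolution is expressed in pixels per inch, and the function name is hypothetical.

```python
def pixel_font_size(font_size_pt: float, resolution_ppi: float) -> float:
    """Convert a font size in points to pixels (1 point is 1/72 inch)."""
    return font_size_pt * resolution_ppi / 72.0


px_at_72 = pixel_font_size(12, 72)    # one pixel per point
px_at_144 = pixel_font_size(12, 144)  # twice the resolution, twice the pixels
```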

letter_spacing

Letter spacing in points.

Type

float

font_type

Name or style of the font.

Type

str

bold

Whether the text is bold (equivalent to [font_weight][google.cloud.documentai.v1beta3.Document.Page.Token.StyleInfo.font_weight] is at least 700).

Type

bool

italic

Whether the text is italic.

Type

bool

underlined

Whether the text is underlined.

Type

bool

strikeout

Whether the text is strikethrough. This feature is not supported yet.

Type

bool

subscript

Whether the text is a subscript. This feature is not supported yet.

Type

bool

superscript

Whether the text is a superscript. This feature is not supported yet.

Type

bool

smallcaps

Whether the text is in small caps. This feature is not supported yet.

Type

bool

font_weight

TrueType weight on a scale 100 (thin) to 1000 (ultra-heavy). Normal is 400, bold is 700.

Type

int

handwritten

Whether the text is handwritten.

Type

bool

text_color

Color of the text.

Type

google.type.color_pb2.Color

background_color

Color of the background.

Type

google.type.color_pb2.Color

class VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Detected non-text visual elements e.g. checkbox, signature etc. on the page.

layout

[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta3.Document.Page.VisualElement].

Type

google.cloud.documentai_v1beta3.types.Document.Page.Layout

type_

Type of the [VisualElement][google.cloud.documentai.v1beta3.Document.Page.VisualElement].

Type

str

detected_languages

A list of detected languages together with confidence.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]

class PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Referencing the visual context of the entity in the [Document.pages][google.cloud.documentai.v1beta3.Document.pages]. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.

page_refs

One or more references to visual page elements.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.PageAnchor.PageRef]

class PageRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Represents a weak reference to a page element within a document.

page

Required. Index into the [Document.pages][google.cloud.documentai.v1beta3.Document.pages] element, for example using [Document.pages][page_refs.page] to locate the related page element. This field is skipped when its value is the default 0. See https://developers.google.com/protocol-buffers/docs/proto3#json.

Type

int

layout_type

Optional. The type of the layout element that is being referenced if any.

Type

google.cloud.documentai_v1beta3.types.Document.PageAnchor.PageRef.LayoutType

layout_id

Optional. Deprecated. Use [PageRef.bounding_poly][google.cloud.documentai.v1beta3.Document.PageAnchor.PageRef.bounding_poly] instead.

Type

str

bounding_poly

Optional. Identifies the bounding polygon of a layout element on the page. If layout_type is set, the bounding polygon must be exactly the same as that of the layout element it’s referring to.

Type

google.cloud.documentai_v1beta3.types.BoundingPoly

confidence

Optional. Confidence of detected page element, if applicable. Range [0, 1].

Type

float

class LayoutType(value)[source]

Bases: proto.enums.Enum

The type of layout that is being referenced.

Values:
LAYOUT_TYPE_UNSPECIFIED (0):

Layout Unspecified.

BLOCK (1):

References a [Page.blocks][google.cloud.documentai.v1beta3.Document.Page.blocks] element.

PARAGRAPH (2):

References a [Page.paragraphs][google.cloud.documentai.v1beta3.Document.Page.paragraphs] element.

LINE (3):

References a [Page.lines][google.cloud.documentai.v1beta3.Document.Page.lines] element.

TOKEN (4):

References a [Page.tokens][google.cloud.documentai.v1beta3.Document.Page.tokens] element.

VISUAL_ELEMENT (5):

References a [Page.visual_elements][google.cloud.documentai.v1beta3.Document.Page.visual_elements] element.

TABLE (6):

References a [Page.tables][google.cloud.documentai.v1beta3.Document.Page.tables] element.

FORM_FIELD (7):

References a [Page.form_fields][google.cloud.documentai.v1beta3.Document.Page.form_fields] element.
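Each LayoutType value names the Page list it refers to, so a page reference can be resolved with a small lookup table. The sketch below uses a local IntEnum and SimpleNamespace stand-ins (assumptions for a dependency-free example); the helper name is hypothetical.

```python
from enum import IntEnum
from types import SimpleNamespace


class LayoutType(IntEnum):  # local stand-in mirroring the documented values
    LAYOUT_TYPE_UNSPECIFIED = 0
    BLOCK = 1
    PARAGRAPH = 2
    LINE = 3
    TOKEN = 4
    VISUAL_ELEMENT = 5
    TABLE = 6
    FORM_FIELD = 7


# Page attribute that each layout type refers to.
PAGE_ATTRIBUTE = {
    LayoutType.BLOCK: "blocks",
    LayoutType.PARAGRAPH: "paragraphs",
    LayoutType.LINE: "lines",
    LayoutType.TOKEN: "tokens",
    LayoutType.VISUAL_ELEMENT: "visual_elements",
    LayoutType.TABLE: "tables",
    LayoutType.FORM_FIELD: "form_fields",
}


def referenced_elements(document, page_ref):
    """Return the page element list that a page reference points into."""
    page = document.pages[page_ref.page]  # page is an index into Document.pages
    attribute = PAGE_ATTRIBUTE.get(page_ref.layout_type)
    return list(getattr(page, attribute)) if attribute else []


document = SimpleNamespace(pages=[SimpleNamespace(tokens=["Hello", "world"])])
page_ref = SimpleNamespace(page=0, layout_type=LayoutType.TOKEN)
elements = referenced_elements(document, page_ref)
```

Picking the exact element within the returned list would still require comparing bounding_poly values, which this sketch leaves out.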

class Provenance(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Structure to identify provenance relationships between annotations in different revisions.

revision

The index of the revision that produced this element.

Type

int

id

The Id of this operation. Needs to be unique within the scope of the revision.

Type

int

parents

References to the original elements that are replaced.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Provenance.Parent]

type_

The type of provenance operation.

Type

google.cloud.documentai_v1beta3.types.Document.Provenance.OperationType

class OperationType(value)[source]

Bases: proto.enums.Enum

If a processor or agent does an explicit operation on existing elements.

Values:
OPERATION_TYPE_UNSPECIFIED (0):

Operation type unspecified. If no operation is specified a provenance entry is simply used to match against a parent.

ADD (1):

Add an element.

REMOVE (2):

Remove an element identified by parent.

UPDATE (7):

Updates any fields within the given provenance scope of the message. It overwrites the fields rather than replacing them. Use this when you want to update a field value of an entity without also updating all the child properties.

REPLACE (3):

Currently unused. Replace an element identified by parent.

EVAL_REQUESTED (4):

Deprecated. Request human review for the element identified by parent.

EVAL_APPROVED (5):

Deprecated. Element is reviewed and approved at human review, confidence will be set to 1.0.

EVAL_SKIPPED (6):

Deprecated. Element is skipped in the validation process.

class Parent(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.

revision

The index into the current revision’s parent_ids list.

Type

int

index

The index of the parent item in the corresponding item list (e.g. list of entities, properties within entities, etc.) in the parent revision.

Type

int

id

The id of the parent provenance.

Type

int
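Parent.revision is not a revision id itself; it is a position in the current revision's parent_ids list. The sketch below resolves it with SimpleNamespace stand-ins; the ids shown are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for the current Document.Revision.
current_revision = SimpleNamespace(id="rev-3", parent_ids=["rev-1", "rev-2"])

# Stand-in for a Provenance.Parent entry on some annotation.
parent = SimpleNamespace(revision=1, index=0, id=7)


def parent_revision_id(revision, parent_ref) -> str:
    """Look up the revision id that a provenance parent refers to."""
    return revision.parent_ids[parent_ref.revision]


resolved = parent_revision_id(current_revision, parent)
```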

class Revision(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Contains past or forward revisions of this document.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

agent

If the change was made by a person, specify the name or id of that person.

This field is a member of oneof source.

Type

str

processor

If the annotation was made by a processor, identify the processor by its resource name.

This field is a member of oneof source.

Type

str

id

Id of the revision, internally generated by doc proto storage. Unique within the context of the document.

Type

str

parent

The revisions that this revision is based on. This can include one or more parents (when documents are merged). This field represents the index into the revisions field.

Type

MutableSequence[int]

parent_ids

The revisions that this revision is based on. Must include all the ids that have anything to do with this revision, e.g. there are provenance.parent.revision fields that index into this field.

Type

MutableSequence[str]

create_time

The time that the revision was created, internally generated by doc proto storage at the time of create.

Type

google.protobuf.timestamp_pb2.Timestamp

human_review

Human Review information of this revision.

Type

google.cloud.documentai_v1beta3.types.Document.Revision.HumanReview

class HumanReview(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Human Review information of the document.

state

Human review state. e.g. requested, succeeded, rejected.

Type

str

state_message

A message providing more details about the current state of processing. For example, the rejection reason when the state is rejected.

Type

str

class ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

shard_index

The 0-based index of this shard.

Type

int

shard_count

Total number of shards.

Type

int

text_offset

The index of the first character in [Document.text][google.cloud.documentai.v1beta3.Document.text] in the overall document global text.

Type

int
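Given these three fields, shard texts can be stitched back together by ordering on shard_index and checking each text_offset against the running length. This is a sketch under assumptions: the stand-in objects carry a text attribute next to the shard fields for brevity, and text_offset is treated as a Python string offset.

```python
from types import SimpleNamespace
from typing import List


def merge_shard_text(shards: List[SimpleNamespace]) -> str:
    """Concatenate shard texts in shard_index order, validating offsets."""
    ordered = sorted(shards, key=lambda s: s.shard_index)
    if ordered and len(ordered) != ordered[0].shard_count:
        raise ValueError("some shards are missing")
    parts, length = [], 0
    for shard in ordered:
        if shard.text_offset != length:
            raise ValueError(f"unexpected offset in shard {shard.shard_index}")
        parts.append(shard.text)
        length += len(shard.text)
    return "".join(parts)


shards = [
    SimpleNamespace(shard_index=1, shard_count=2, text_offset=6, text="world"),
    SimpleNamespace(shard_index=0, shard_count=2, text_offset=0, text="Hello "),
]
full_text = merge_shard_text(shards)
```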

class Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

text_anchor

Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

google.cloud.documentai_v1beta3.types.Document.TextAnchor

color

Text color.

Type

google.type.color_pb2.Color

background_color

Text background color.

Type

google.type.color_pb2.Color

font_weight

Font weight. Possible values are normal, bold, bolder, and lighter.

Type

str

text_style

Text style. Possible values are normal, italic, and oblique.

Type

str

text_decoration

Text decoration. Follows CSS standard.

Type

str

font_size

Font size.

Type

google.cloud.documentai_v1beta3.types.Document.Style.FontSize

font_family

Font family such as Arial, Times New Roman. https://www.w3schools.com/cssref/pr_font_font-family.asp

Type

str

class FontSize(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Font size with unit.

size

Font size for the text.

Type

float

unit

Unit for the font size. Follows CSS naming (such as in, px, and pt).

Type

str
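Because unit follows CSS naming, sizes can be normalized with the standard CSS ratios (1in = 72pt = 96px). The sketch below covers only the three units named above; the function name and the choice of points as the target unit are assumptions.

```python
# Points per CSS unit: 1in = 72pt and 1px = 1/96in = 0.75pt.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "px": 0.75}


def font_size_in_points(size: float, unit: str) -> float:
    """Convert a FontSize (size, unit) pair to points."""
    try:
        return size * POINTS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


from_px = font_size_in_points(16, "px")
from_in = font_size_in_points(1, "in")
```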

class TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Text reference indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].

text_segments

The text segments from the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.TextAnchor.TextSegment]

content

Contains the content of the text span so that users do not have to look it up in the text_segments. It is always populated for formFields.

Type

str

class TextSegment(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A text segment in the [Document.text][google.cloud.documentai.v1beta3.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta3.Document.ShardInfo.text_offset].

start_index

[TextSegment][google.cloud.documentai.v1beta3.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

int

end_index

[TextSegment][google.cloud.documentai.v1beta3.Document.TextAnchor.TextSegment] half open end UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta3.Document.text].

Type

int
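The half-open [start_index, end_index) convention maps directly onto Python slicing. The sketch below assumes the indices can be used as Python string offsets into the document text (this reference describes them as UTF-8 char indices, so that is an assumption), and uses SimpleNamespace stand-ins for the segments.

```python
from types import SimpleNamespace
from typing import Iterable


def anchor_text(document_text: str, text_segments: Iterable) -> str:
    """Concatenate the text covered by each segment of a text anchor."""
    # Slices clamp silently, so out-of-bounds indices (text continuing in
    # another shard) return only the part present in this text.
    return "".join(
        document_text[seg.start_index:seg.end_index] for seg in text_segments
    )


text = "Invoice 42\nTotal: 10"
segments = [
    SimpleNamespace(start_index=0, end_index=7),
    SimpleNamespace(start_index=11, end_index=16),
]
covered = anchor_text(text, segments)
tail = anchor_text(text, [SimpleNamespace(start_index=18, end_index=99)])
```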

class TextChange(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

This message is used for text changes, also known as OCR corrections.

text_anchor

Provenance of the correction. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text]. There can only be a single TextAnchor.text_segments element. If the start and end index of the text segment are the same, the text change is inserted before that index.

Type

google.cloud.documentai_v1beta3.types.Document.TextAnchor

changed_text

The text that replaces the text identified in the text_anchor.

Type

str

provenance

The history of this annotation.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Document.Provenance]
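The replace-or-insert behavior of a text change can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: apply_text_change is a hypothetical helper, it takes the single segment's indices directly instead of a TextAnchor message, and it treats the indices as character offsets into a Python str.

```python
def apply_text_change(document_text: str, start_index: int,
                      end_index: int, changed_text: str) -> str:
    """Replace document_text[start_index:end_index] with changed_text.

    When start_index == end_index the slice is empty, so changed_text is
    inserted before that index, as described for text_anchor.
    """
    if not 0 <= start_index <= end_index <= len(document_text):
        raise ValueError("segment lies outside the document text")
    return document_text[:start_index] + changed_text + document_text[end_index:]


# Correct an OCR misread ("Tota1" -> "Total").
print(apply_text_change("Tota1: 10", 0, 5, "Total"))
```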

class google.cloud.documentai_v1beta3.types.DocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Document Identifier.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

gcs_managed_doc_id

A document id within user-managed Cloud Storage.

This field is a member of oneof type.

Type

google.cloud.documentai_v1beta3.types.DocumentId.GCSManagedDocumentId

unmanaged_doc_id

A document id within an unmanaged dataset.

This field is a member of oneof type.

Type

google.cloud.documentai_v1beta3.types.DocumentId.UnmanagedDocumentId

revision_ref

Points to a specific revision of the document if set.

Type

google.cloud.documentai_v1beta3.types.RevisionRef

class GCSManagedDocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Identifies a document uniquely within the scope of a dataset in the user-managed Cloud Storage option.

gcs_uri

Required. The Cloud Storage URI where the actual document is stored.

Type

str

cw_doc_id

Id of the document (indexed) managed by Content Warehouse.

Type

str

class UnmanagedDocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Identifies a document uniquely within the scope of a dataset in the unmanaged option.

doc_id

Required. The id of the document.

Type

str

class google.cloud.documentai_v1beta3.types.DocumentLabelingState(value)[source]

Bases: proto.enums.Enum

Describes the labeling status of a document.

Values:
DOCUMENT_LABELING_STATE_UNSPECIFIED (0):

Default value if the enum is not set.

DOCUMENT_LABELED (1):

Document has been labeled.

DOCUMENT_UNLABELED (2):

Document has not been labeled.

DOCUMENT_AUTO_LABELED (3):

Document has been auto-labeled.

class google.cloud.documentai_v1beta3.types.DocumentMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about a document.

document_id

Document identifier.

Type

google.cloud.documentai_v1beta3.types.DocumentId

page_count

Number of pages in the document.

Type

int

dataset_type

Type of the dataset split to which the document belongs.

Type

google.cloud.documentai_v1beta3.types.DatasetSplitType

labeling_state

Labeling state of the document.

Type

google.cloud.documentai_v1beta3.types.DocumentLabelingState

display_name

The display name of the document.

Type

str

class google.cloud.documentai_v1beta3.types.DocumentOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Config that controls the output of documents. All documents will be written as a JSON file.

gcs_output_config

Output config to write the results to Cloud Storage.

This field is a member of oneof destination.

Type

google.cloud.documentai_v1beta3.types.DocumentOutputConfig.GcsOutputConfig

class GcsOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The configuration used when outputting documents.

gcs_uri

The Cloud Storage URI (a directory) of the output.

Type

str

field_mask

Specifies which fields to include in the output documents. Only top-level document fields and pages fields are supported, so each path must be in the form of {document_field_name} or pages.{page_field_name}.

Type

google.protobuf.field_mask_pb2.FieldMask
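The two accepted path shapes for field_mask can be checked with a small helper. This is only a sketch of the shape rule stated above: is_supported_mask_path is a hypothetical name, and the function does not verify that a field name actually exists on the document or page message.

```python
def is_supported_mask_path(path: str) -> bool:
    """Return True for '{document_field_name}' or 'pages.{page_field_name}'."""
    parts = path.split(".")
    if any(not part for part in parts):
        return False  # empty path or empty component such as "pages."
    if len(parts) == 1:
        return True   # a top-level document field
    # Nested paths are only accepted one level below "pages".
    return len(parts) == 2 and parts[0] == "pages"


for candidate in ["text", "pages.layout", "pages.layout.text_anchor"]:
    print(candidate, is_supported_mask_path(candidate))
```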

sharding_config

Specifies the sharding config for the output document.

Type

google.cloud.documentai_v1beta3.types.DocumentOutputConfig.GcsOutputConfig.ShardingConfig

class ShardingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The sharding config for the output document.

pages_per_shard

The number of pages per shard.

Type

int

pages_overlap

The number of overlapping pages between consecutive shards.

Type

int
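How pages_per_shard and pages_overlap might interact can be sketched as below. The exact sharding algorithm is not documented here, so this is an assumption: each shard holds at most pages_per_shard pages, and each shard after the first starts pages_overlap pages before the previous shard ended. The function name and the zero-based, half-open page ranges are illustrative choices.

```python
from typing import List, Tuple


def shard_page_ranges(page_count: int, pages_per_shard: int,
                      pages_overlap: int = 0) -> List[Tuple[int, int]]:
    """Return zero-based, half-open (start, end) page ranges, one per shard."""
    if pages_per_shard <= 0:
        raise ValueError("pages_per_shard must be positive")
    if not 0 <= pages_overlap < pages_per_shard:
        raise ValueError("pages_overlap must be in [0, pages_per_shard)")
    step = pages_per_shard - pages_overlap  # how far each new shard advances
    ranges = []
    start = 0
    while start < page_count:
        end = min(start + pages_per_shard, page_count)
        ranges.append((start, end))
        if end == page_count:
            break  # the last page is covered; no further shard is needed
        start += step
    return ranges


print(shard_page_ranges(10, 4, 1))
```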

class google.cloud.documentai_v1beta3.types.DocumentPageRange(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Range of pages present in a document.

start

First page number (one-based index) to be returned.

Type

int

end

Last page number (one-based index) to be returned.

Type

int
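Because start and end are one-based, selecting the matching pages from a zero-based Python list needs an offset. The sketch below assumes that end is inclusive, which the wording "last page number to be returned" suggests but does not state outright. The helper name pages_in_range is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def pages_in_range(pages: List[T], start: int, end: int) -> List[T]:
    """Select pages start..end (one-based, end assumed inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # One-based inclusive [start, end] maps to the zero-based slice [start-1:end].
    return pages[start - 1:end]


print(pages_in_range(["p1", "p2", "p3", "p4", "p5"], 2, 4))
```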

class google.cloud.documentai_v1beta3.types.DocumentSchema(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The schema defines the output of the processed document by a processor.

display_name

Display name to show to users.

Type

str

description

Description of the schema.

Type

str

entity_types

Entity types of the schema.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType]

metadata

Metadata of the schema.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema.Metadata

class EntityType(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.

enum_values

If specified, lists all the possible values for this entity. This should not be more than a handful of values. If the number of values is >10 or could change frequently, use the EntityType.value_ontology field and specify a list of all possible values in a value ontology file.

This field is a member of oneof value_source.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType.EnumValues

display_name

User defined name for the type.

Type

str

name

Name of the type. It must be unique within the schema file and cannot be a “Common Type”. The following naming conventions are used:

  • Use snake_casing.

  • Name matching is case-sensitive.

  • Maximum 64 characters.

  • Must start with a letter.

  • Allowed characters: lowercase ASCII letters, digits, underscores, and hyphens ([a-z0-9_-]). (For backward compatibility, internal infrastructure and tooling can handle any ASCII character.)

  • The / is sometimes used to denote a property of a type. For example line_item/amount. This convention is deprecated, but will still be honored for backward compatibility.

Type

str
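The naming conventions listed above can be approximated with a regular expression. This is a local sketch, not the service's validator: the function name and the allow_deprecated_slash flag are hypothetical, and requiring each slash-separated part to start with a letter is an assumption. Uniqueness within the schema and the "Common Type" restriction are not checked.

```python
import re

# A lowercase letter followed by lowercase letters, digits, "_" or "-".
_NAME_PART = re.compile(r"[a-z][a-z0-9_-]*")
_MAX_NAME_LENGTH = 64


def is_valid_entity_type_name(name: str,
                              allow_deprecated_slash: bool = False) -> bool:
    """Check a name against the documented naming conventions."""
    if len(name) > _MAX_NAME_LENGTH:
        return False
    # The "/" property notation is deprecated but still honored, so it is
    # accepted only when explicitly requested.
    parts = name.split("/") if allow_deprecated_slash else [name]
    return all(_NAME_PART.fullmatch(part) is not None for part in parts)


print(is_valid_entity_type_name("invoice_id"))
print(is_valid_entity_type_name("line_item/amount", allow_deprecated_slash=True))
```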

description

The description of the entity type. Could be used to provide more information about the entity type for model calls.

Type

str

base_types

The entity type that this type is derived from. For now, one and only one should be set.

Type

MutableSequence[str]

properties

Describes the nested structure, or composition, of an entity.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType.Property]

entity_type_metadata

Metadata for the entity type.

Type

google.cloud.documentai_v1beta3.types.EntityTypeMetadata

class EnumValues(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Defines a list of enum values.

values

The individual values that this enum values type can include.

Type

MutableSequence[str]

class Property(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Defines properties that can be part of the entity type.

name

The name of the property. Follows the same guidelines as the EntityType name.

Type

str

description

The description of the property. Could be used to provide more information about the property for model calls.

Type

str

display_name

User defined name for the property.

Type

str

value_type

A reference to the value type of the property. This type is subject to the same conventions as the Entity.base_types field.

Type

str

occurrence_type

Occurrence type limits the number of instances of an entity type that appear in the document.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType.Property.OccurrenceType

property_metadata

Any additional metadata about the property can be added here.

Type

google.cloud.documentai_v1beta3.types.PropertyMetadata

class OccurrenceType(value)[source]

Bases: proto.enums.Enum

Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.

Values:
OCCURRENCE_TYPE_UNSPECIFIED (0):

Unspecified occurrence type.

OPTIONAL_ONCE (1):

There will be zero or one instance of this entity type. The same entity instance may be mentioned multiple times.

OPTIONAL_MULTIPLE (2):

The entity type will appear zero or multiple times.

REQUIRED_ONCE (3):

The entity type will appear exactly once. The same entity instance may be mentioned multiple times.

REQUIRED_MULTIPLE (4):

The entity type will appear one or more times.
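The instance-count limits implied by each value can be expressed as a small check. The enum below is a plain-Python stand-in that mirrors the documented names and numbers, not the library's proto enum, and instance_count_allowed is a hypothetical helper. Treating the unspecified value as "no constraint" is an assumption. Counts refer to distinct instances, not mentions.

```python
from enum import Enum


class OccurrenceType(Enum):
    """Stand-in mirroring the documented enum values."""
    OCCURRENCE_TYPE_UNSPECIFIED = 0
    OPTIONAL_ONCE = 1
    OPTIONAL_MULTIPLE = 2
    REQUIRED_ONCE = 3
    REQUIRED_MULTIPLE = 4


def instance_count_allowed(occurrence: OccurrenceType, count: int) -> bool:
    """Return True if `count` instances satisfy the occurrence type."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if occurrence is OccurrenceType.OPTIONAL_ONCE:
        return count <= 1
    if occurrence is OccurrenceType.REQUIRED_ONCE:
        return count == 1
    if occurrence is OccurrenceType.REQUIRED_MULTIPLE:
        return count >= 1
    # OPTIONAL_MULTIPLE allows any count; UNSPECIFIED is assumed unconstrained.
    return True


print(instance_count_allowed(OccurrenceType.REQUIRED_ONCE, 1))
```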

class Metadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata for global schema behavior.

document_splitter

If true, a document entity type can be applied to a subdocument (splitting). Otherwise, it can only be applied to the entire document (classification).

Type

bool

document_allow_multiple_labels

If true, on a given page, there can be multiple document annotations covering it.

Type

bool

prefixed_naming_on_properties

If set, all the nested entities must be prefixed with the parents.

Type

bool

skip_naming_validation

If set, we will skip the naming format validation in the schema. So the string values in DocumentSchema.EntityType.name and DocumentSchema.EntityType.Property.name will not be checked.

Type

bool

class google.cloud.documentai_v1beta3.types.EnableProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.EnableProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method.

name

Required. The processor resource name to be enabled.

Type

str

class google.cloud.documentai_v1beta3.types.EnableProcessorResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method. Intentionally empty proto for adding fields in future.

class google.cloud.documentai_v1beta3.types.EntityTypeMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about an entity type.

inactive

Whether the entity type should be considered inactive.

Type

bool

class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata of the [EvaluateProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.EvaluateProcessorVersion] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Evaluates the given [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] against the supplied documents.

processor_version

Required. The resource name of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to evaluate. projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

Type

str

evaluation_documents

Optional. The documents used in the evaluation. If unspecified, use the processor’s dataset as evaluation input.

Type

google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig

class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response of the [EvaluateProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.EvaluateProcessorVersion] method.

evaluation

The resource name of the created evaluation.

Type

str

class google.cloud.documentai_v1beta3.types.Evaluation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

An evaluation of a ProcessorVersion’s performance.

name

The resource name of the evaluation. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processor_version}/evaluations/{evaluation}

Type

str
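A resource name in the format above can be split into its components with a regular expression. This is a convenience sketch: parse_evaluation_name is a hypothetical helper, the assumption that no component contains a slash is mine, and the identifiers in the example are made up.

```python
import re
from typing import Dict

_EVALUATION_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)"
    r"/processorVersions/(?P<processor_version>[^/]+)"
    r"/evaluations/(?P<evaluation>[^/]+)"
)


def parse_evaluation_name(name: str) -> Dict[str, str]:
    """Split an evaluation resource name into its path components."""
    match = _EVALUATION_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an evaluation resource name: {name!r}")
    return match.groupdict()


# Hypothetical identifiers, for illustration only.
example = ("projects/my-project/locations/us/processors/abc123"
           "/processorVersions/v1/evaluations/eval-1")
print(parse_evaluation_name(example))
```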

create_time

The time that the evaluation was created.

Type

google.protobuf.timestamp_pb2.Timestamp

document_counters

Counters for the documents used in the evaluation.

Type

google.cloud.documentai_v1beta3.types.Evaluation.Counters

all_entities_metrics

Metrics for all the entities in aggregate.

Type

google.cloud.documentai_v1beta3.types.Evaluation.MultiConfidenceMetrics

entity_metrics

Metrics across confidence levels, for different entities.

Type

MutableMapping[str, google.cloud.documentai_v1beta3.types.Evaluation.MultiConfidenceMetrics]

kms_key_name

The KMS key name used for encryption.

Type

str

kms_key_version_name

The KMS key version with which data is encrypted.

Type

str

class ConfidenceLevelMetrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Evaluation metrics at a specific confidence level.

confidence_level

The confidence level.

Type

float

metrics

The metrics at the specific confidence level.

Type

google.cloud.documentai_v1beta3.types.Evaluation.Metrics

class Counters(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Evaluation counters for the documents that were used.

input_documents_count

How many documents were sent for evaluation.

Type

int

invalid_documents_count

How many documents were not included in the evaluation as they didn’t pass validation.

Type

int

failed_documents_count

How many documents were not included in the evaluation as Document AI failed to process them.

Type

int

evaluated_documents_count

How many documents were used in the evaluation.

Type

int

class EntityMetricsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class Metrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Evaluation metrics, either in aggregate or about a specific entity.

precision

The calculated precision.

Type

float

recall

The calculated recall.

Type

float

f1_score

The calculated f1 score.

Type

float

predicted_occurrences_count

The number of occurrences in predicted documents.

Type

int

ground_truth_occurrences_count

The number of occurrences in ground truth documents.

Type

int

predicted_document_count

The number of documents with a predicted occurrence.

Type

int

ground_truth_document_count

The number of documents with a ground truth occurrence.

Type

int

true_positives_count

The number of true positives.

Type

int

false_positives_count

The number of false positives.

Type

int

false_negatives_count

The number of false negatives.

Type

int

total_documents_count

The number of documents that had an occurrence of this label.

Type

int
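The relationship between the count fields and the precision, recall, and f1_score fields can be illustrated with the standard textbook definitions. This page does not state how the service computes these values, so the sketch below is an assumption, including the convention of returning 0.0 when a denominator is zero. The function name is hypothetical.

```python
from typing import Tuple


def precision_recall_f1(true_positives: int, false_positives: int,
                        false_negatives: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from TP/FP/FN counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Assumed convention: an undefined ratio (zero denominator) becomes 0.0.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


print(precision_recall_f1(8, 2, 8))
```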

class MultiConfidenceMetrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metrics across multiple confidence levels.

confidence_level_metrics

Metrics across confidence levels with fuzzy matching enabled.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation.ConfidenceLevelMetrics]

confidence_level_metrics_exact

Metrics across confidence levels with only exact matching.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation.ConfidenceLevelMetrics]

auprc

The calculated area under the precision recall curve (AUPRC), computed by integrating over all confidence thresholds.

Type

float
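One common way to integrate a precision-recall curve is the trapezoidal rule over (recall, precision) points, one point per confidence threshold. The integration method used by the service is not specified here, so the sketch below is only an illustration of the idea and may not reproduce the reported auprc value. The function name and the tuple-based input are hypothetical.

```python
from typing import Iterable, Tuple


def auprc_trapezoid(points: Iterable[Tuple[float, float]]) -> float:
    """Approximate the area under a precision-recall curve.

    `points` are (recall, precision) pairs, one per confidence threshold.
    """
    ordered = sorted(points)  # integrate along increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(ordered, ordered[1:]):
        # Area of the trapezoid between two neighboring recall values.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


print(auprc_trapezoid([(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]))
```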

estimated_calibration_error

The Estimated Calibration Error (ECE) of the confidence of the predicted entities.

Type

float

auprc_exact

The AUPRC for metrics with fuzzy matching disabled, i.e., exact matching only.

Type

float

estimated_calibration_error_exact

The ECE for the predicted entities with fuzzy matching disabled, i.e., exact matching only.

Type

float

metrics_type

The metrics type for the label.

Type

google.cloud.documentai_v1beta3.types.Evaluation.MultiConfidenceMetrics.MetricsType

class MetricsType(value)[source]

Bases: proto.enums.Enum

A type that determines how metrics should be interpreted.

Values:
METRICS_TYPE_UNSPECIFIED (0):

The metrics type is unspecified. By default, metrics without a particular specification are for leaf entity types (i.e., top-level entity types without child types, or child types which are not parent types themselves).

AGGREGATE (1):

Indicates whether metrics for this particular label type represent an aggregate of metrics for other types instead of being based on actual TP/FP/FN values for the label type. Metrics for parent (i.e., non-leaf) entity types are an aggregate of metrics for their children.

class google.cloud.documentai_v1beta3.types.EvaluationReference(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Gives a short summary of an evaluation, and links to the evaluation itself.

operation

The resource name of the Long Running Operation for the evaluation.

Type

str

evaluation

The resource name of the evaluation.

Type

str

aggregate_metrics

An aggregate of the statistics for the evaluation with fuzzy matching on.

Type

google.cloud.documentai_v1beta3.types.Evaluation.Metrics

aggregate_metrics_exact

An aggregate of the statistics for the evaluation with fuzzy matching off.

Type

google.cloud.documentai_v1beta3.types.Evaluation.Metrics

class google.cloud.documentai_v1beta3.types.FetchProcessorTypesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes] method. Some processor types may require the project be added to an allowlist.

parent

Required. The location of processor types to list. Format: projects/{project}/locations/{location}.

Type

str

class google.cloud.documentai_v1beta3.types.FetchProcessorTypesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes] method.

processor_types

The list of processor types.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType]

class google.cloud.documentai_v1beta3.types.FieldExtractionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata for how this field value is extracted.

summary_options

Summary options config.

Type

google.cloud.documentai_v1beta3.types.SummaryOptions

class google.cloud.documentai_v1beta3.types.GcsDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Specifies a document stored on Cloud Storage.

gcs_uri

The Cloud Storage object URI.

Type

str

mime_type

An IANA MIME type (RFC6838) of the content.

Type

str

class google.cloud.documentai_v1beta3.types.GcsDocuments(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Specifies a set of documents on Cloud Storage.

documents

The list of documents.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.GcsDocument]

class google.cloud.documentai_v1beta3.types.GcsPrefix(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Specifies all documents on Cloud Storage with a common prefix.

gcs_uri_prefix

The URI prefix.

Type

str

class google.cloud.documentai_v1beta3.types.GetDatasetSchemaRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request for GetDatasetSchema.

name

Required. The dataset schema resource name. Format:

projects/{project}/locations/{location}/processors/{processor}/dataset/datasetSchema

Type

str

visible_fields_only

If set, only returns the visible fields of the schema.

Type

bool

class google.cloud.documentai_v1beta3.types.GetDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

dataset

Required. The resource name of the dataset that the document belongs to. Format:

projects/{project}/locations/{location}/processors/{processor}/dataset

Type

str

document_id

Required. Document identifier.

Type

google.cloud.documentai_v1beta3.types.DocumentId

read_mask

If set, only fields listed here will be returned. Otherwise, all fields will be returned by default.

Type

google.protobuf.field_mask_pb2.FieldMask

page_range

List of pages for which the fields specified in the read_mask must be served.

Type

google.cloud.documentai_v1beta3.types.DocumentPageRange

class google.cloud.documentai_v1beta3.types.GetDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

document

Type

google.cloud.documentai_v1beta3.types.Document

class google.cloud.documentai_v1beta3.types.GetEvaluationRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Retrieves a specific Evaluation.

name

Required. The resource name of the [Evaluation][google.cloud.documentai.v1beta3.Evaluation] to get. projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}/evaluations/{evaluation}

Type

str

class google.cloud.documentai_v1beta3.types.GetProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessor] method.

name

Required. The processor resource name.

Type

str

class google.cloud.documentai_v1beta3.types.GetProcessorTypeRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetProcessorType][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessorType] method.

name

Required. The processor type resource name.

Type

str

class google.cloud.documentai_v1beta3.types.GetProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [GetProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessorVersion] method.

name

Required. The processor resource name.

Type

str

class google.cloud.documentai_v1beta3.types.HumanReviewStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The status of human review on a processed document.

state

The state of human review on the processing request.

Type

google.cloud.documentai_v1beta3.types.HumanReviewStatus.State

state_message

A message providing more details about the human review state.

Type

str

human_review_operation

The name of the operation triggered by the processed document. This field is populated only when the [state][google.cloud.documentai.v1beta3.HumanReviewStatus.state] is HUMAN_REVIEW_IN_PROGRESS. It has the same response type and metadata as the long-running operation returned by [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument].

Type

str

class State(value)[source]

Bases: proto.enums.Enum

The final state of human review on a processed document.

Values:
STATE_UNSPECIFIED (0):

Human review state is unspecified. Most likely due to an internal error.

SKIPPED (1):

Human review is skipped for the document. This can happen because human review isn’t enabled on the processor or the processing request has been set to skip this document.

VALIDATION_PASSED (2):

Human review validation is triggered and passed, so no review is needed.

IN_PROGRESS (3):

Human review validation is triggered and the document is under review.

ERROR (4):

Some error happened during triggering human review, see the [state_message][google.cloud.documentai.v1beta3.HumanReviewStatus.state_message] for details.

class google.cloud.documentai_v1beta3.types.ImportDocumentsMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata of the import document operation.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

individual_import_statuses

The list of response details of each document.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ImportDocumentsMetadata.IndividualImportStatus]

import_config_validation_results

Validation statuses of the batch documents import config.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ImportDocumentsMetadata.ImportConfigValidationResult]

total_document_count

Total number of the documents that are qualified for importing.

Type

int

class ImportConfigValidationResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The validation status of each import config. Status is set to an error if there are no documents to import in the import_config, or OK if the operation will try to proceed with at least one document.

input_gcs_source

The source Cloud Storage URI specified in the import config.

Type

str

status

The validation status of import config.

Type

google.rpc.status_pb2.Status

class IndividualImportStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The status of each individual document in the import process.

input_gcs_source

The source Cloud Storage URI of the document.

Type

str

status

The status of the importing of the document.

Type

google.rpc.status_pb2.Status

output_document_id

The document id of the imported document if the import was successful; otherwise empty.

Type

google.cloud.documentai_v1beta3.types.DocumentId

class google.cloud.documentai_v1beta3.types.ImportDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

dataset

Required. The dataset resource name. Format:

projects/{project}/locations/{location}/processors/{processor}/dataset

Type

str

batch_documents_import_configs

Required. The Cloud Storage URI containing raw documents that must be imported.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ImportDocumentsRequest.BatchDocumentsImportConfig]

class BatchDocumentsImportConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Config for importing documents. Each batch can have its own dataset split type.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

dataset_split

Target dataset split where the documents must be stored.

This field is a member of oneof split_type_config.

Type

google.cloud.documentai_v1beta3.types.DatasetSplitType

auto_split_config

If set, documents will be automatically split into training and test splits with the specified ratio.

This field is a member of oneof split_type_config.

Type

google.cloud.documentai_v1beta3.types.ImportDocumentsRequest.BatchDocumentsImportConfig.AutoSplitConfig

batch_input_config

The common config to specify a set of documents used as input.

Type

google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig

class AutoSplitConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The config for auto-split.

training_split_ratio

Ratio of training dataset split.

Type

float

class google.cloud.documentai_v1beta3.types.ImportDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response of the import document operation.

class google.cloud.documentai_v1beta3.types.ImportProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.

common_metadata

The basic metadata for the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.ImportProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The request message for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.

The Document AI Service Agent of the destination project must have Document AI Editor role on the source project.

The destination project is specified as part of the [parent][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.parent] field. The source project is specified as part of the [source][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.processor_version_source] or [external_processor_version_source][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.external_processor_version_source] field.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

processor_version_source

The source processor version to import from. The source processor version and destination processor need to be in the same environment and region. Note that ProcessorVersions with model_type MODEL_TYPE_LLM are not supported.

This field is a member of oneof source.

Type

str

external_processor_version_source

The source processor version to import from. It can be from a different environment and region than the destination processor.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.ImportProcessorVersionRequest.ExternalProcessorVersionSource

parent

Required. The destination processor name to create the processor version in. Format: projects/{project}/locations/{location}/processors/{processor}

Type

str

class ExternalProcessorVersionSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The external source processor version.

processor_version

Required. The processor version name. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

Type

str

service_endpoint

Optional. The Document AI service endpoint. For example, 'https://us-documentai.googleapis.com'.

Type

str
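The source oneof described above can be sketched in plain Python. The helper name build_import_request_kwargs is hypothetical and not part of the library; it only assembles a dictionary whose keys mirror the documented field names. Insisting on exactly one source is this sketch's own choice, since the oneof itself only says at most one member can be set.

```python
from typing import Optional


def build_import_request_kwargs(
    parent: str,
    processor_version_source: Optional[str] = None,
    external_processor_version: Optional[str] = None,
    service_endpoint: Optional[str] = None,
) -> dict:
    """Assemble field values for an import request (hypothetical helper).

    The two source arguments mirror the members of the `source` oneof,
    so the helper refuses to set both (or neither) of them.
    """
    if (processor_version_source is None) == (external_processor_version is None):
        raise ValueError("set exactly one of the two source fields")

    kwargs = {"parent": parent}
    if processor_version_source is not None:
        kwargs["processor_version_source"] = processor_version_source
    else:
        # Nested message: processor_version is required, service_endpoint optional.
        external = {"processor_version": external_processor_version}
        if service_endpoint is not None:
            external["service_endpoint"] = service_endpoint
        kwargs["external_processor_version_source"] = external
    return kwargs
```

The resulting dictionary has the same shape as the message fields, so it is easy to inspect before it is turned into a request.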

class google.cloud.documentai_v1beta3.types.ImportProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The response message for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.

processor_version

The destination processor version name.

Type

str

class google.cloud.documentai_v1beta3.types.ListDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

dataset

Required. The resource name of the dataset to be listed. Format:

projects/{project}/locations/{location}/processors/{processor}/dataset

Type

str

page_size

The maximum number of documents to return. The service may return fewer than this value. If unspecified, at most 20 documents will be returned. The maximum value is 100; values above 100 will be coerced to 100.

Type

int

page_token

A page token, received from a previous ListDocuments call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListDocuments must match the call that provided the page token.

Type

str

filter

Optional. Query to filter the documents based on https://google.aip.dev/160.

Currently supported query strings are:

  • SplitType=DATASET_SPLIT_TEST|DATASET_SPLIT_TRAIN|DATASET_SPLIT_UNASSIGNED

  • LabelingState=DOCUMENT_LABELED|DOCUMENT_UNLABELED|DOCUMENT_AUTO_LABELED

  • DisplayName="file_name.pdf"

  • EntityType=abc/def

  • TagName="auto-labeling-running"|"sampled"

Note:

  • Only AND, = and != are supported. e.g. DisplayName=file_name AND EntityType!=abc IS supported.

  • The wildcard * is supported only in the DisplayName filter.

  • No duplicate filter keys are allowed, e.g. EntityType=a AND EntityType=b is NOT supported.

  • String match is case sensitive (for filter DisplayName & EntityType).

Type

str
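The filter rules listed above can be checked on the client side before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named build_filter that is not part of the library; it enforces only the rules stated in the notes (AND-joined clauses, = and != operators, no duplicate keys, wildcard only in DisplayName).

```python
from typing import Iterable, Tuple

ALLOWED_OPERATORS = ("=", "!=")


def build_filter(clauses: Iterable[Tuple[str, str, str]]) -> str:
    """Join (key, operator, value) clauses with AND (hypothetical helper)."""
    seen = set()
    parts = []
    for key, op, value in clauses:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        if key in seen:
            raise ValueError(f"duplicate filter key: {key!r}")
        if "*" in value and key != "DisplayName":
            raise ValueError("wildcard * is only supported in DisplayName")
        seen.add(key)
        parts.append(f"{key}{op}{value}")
    return " AND ".join(parts)


# Reproduces the supported example from the notes above.
print(build_filter([("DisplayName", "=", "file_name"), ("EntityType", "!=", "abc")]))
```

The helper does not validate key names or enum values; the service remains the authority on what it accepts.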

return_total_size

Optional. Controls if the request requires a total size of matched documents. See [ListDocumentsResponse.total_size][google.cloud.documentai.v1beta3.ListDocumentsResponse.total_size].

Enabling this flag may adversely impact performance.

Defaults to false.

Type

bool

skip

Optional. Number of results to skip beginning from the page_token if provided. https://google.aip.dev/158#skipping-results. It must be a non-negative integer. Negative values will be rejected. Note that this is not the number of pages to skip. If this value causes the cursor to move past the end of results, [ListDocumentsResponse.document_metadata][google.cloud.documentai.v1beta3.ListDocumentsResponse.document_metadata] and [ListDocumentsResponse.next_page_token][google.cloud.documentai.v1beta3.ListDocumentsResponse.next_page_token] will be empty.

Type

int

class google.cloud.documentai_v1beta3.types.ListDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

document_metadata

Document metadata corresponding to the listed documents.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.DocumentMetadata]

next_page_token

A token, which can be sent as [ListDocumentsRequest.page_token][google.cloud.documentai.v1beta3.ListDocumentsRequest.page_token] to retrieve the next page. If this field is omitted, there are no subsequent pages.

Type

str

total_size

Total count of documents queried.

Type

int
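The page_size and page token behaviour described for ListDocuments can be modelled without the service. This is a sketch under stated assumptions: fetch_page is a hypothetical callable standing in for one list call, an empty token means "first page" or "no more pages", and a page_size of 0 stands for "unspecified". Negative sizes are not covered by the description and are not handled here.

```python
from typing import Callable, Iterator, List, Tuple

DEFAULT_PAGE_SIZE = 20   # used when page_size is unspecified
MAX_PAGE_SIZE = 100      # larger values are coerced down to this

# One page of results: (items, next_page_token).
Page = Tuple[List[str], str]


def effective_page_size(page_size: int = 0) -> int:
    """Model the documented page_size handling for ListDocuments."""
    if page_size == 0:
        return DEFAULT_PAGE_SIZE
    return min(page_size, MAX_PAGE_SIZE)


def iterate_all(fetch_page: Callable[[str], Page]) -> Iterator[str]:
    """Follow next_page_token until it comes back empty."""
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            return
```

When paginating against the real service, every other request parameter must stay the same between calls, as noted for page_token.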

class google.cloud.documentai_v1beta3.types.ListEvaluationsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Retrieves a list of evaluations for a given [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].

parent

Required. The resource name of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to list evaluations for. projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

Type

str

page_size

The standard list page size. If unspecified, at most 5 evaluations are returned. The maximum value is 100. Values above 100 are coerced to 100.

Type

int

page_token

A page token, received from a previous ListEvaluations call. Provide this to retrieve the subsequent page.

Type

str

class google.cloud.documentai_v1beta3.types.ListEvaluationsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The response from ListEvaluations.

evaluations

The evaluations requested.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation]

next_page_token

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorTypesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ListProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorTypes] method. Some processor types may require the project be added to an allowlist.

parent

Required. The location of processor types to list. Format: projects/{project}/locations/{location}.

Type

str

page_size

The maximum number of processor types to return. If unspecified, at most 100 processor types will be returned. The maximum value is 500. Values above 500 will be coerced to 500.

Type

int

page_token

Used to retrieve the next page of results, empty if at the end of the list.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorTypesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorTypes] method.

processor_types

The processor types.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType]

next_page_token

Points to the next page, otherwise empty.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorVersionsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for listing all processor versions that belong to a processor.

parent

Required. The parent (project, location and processor) to list all versions. Format: projects/{project}/locations/{location}/processors/{processor}

Type

str

page_size

The maximum number of processor versions to return. If unspecified, at most 10 processor versions will be returned. The maximum value is 20. Values above 20 will be coerced to 20.

Type

int

page_token

We will return the processor versions sorted by creation time. The page token will point to the next processor version.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorVersionsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListProcessorVersions][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorVersions] method.

processor_versions

The list of processor versions.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorVersion]

next_page_token

Points to the next processor version, otherwise empty.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for listing all processors that belong to a project.

parent

Required. The parent (project and location) which owns this collection of Processors. Format: projects/{project}/locations/{location}

Type

str

page_size

The maximum number of processors to return. If unspecified, at most 50 processors will be returned. The maximum value is 100. Values above 100 will be coerced to 100.

Type

int

page_token

We will return the processors sorted by creation time. The page token will point to the next processor.

Type

str

class google.cloud.documentai_v1beta3.types.ListProcessorsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ListProcessors][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessors] method.

processors

The list of processors.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.Processor]

next_page_token

Points to the next processor, otherwise empty.

Type

str

class google.cloud.documentai_v1beta3.types.NormalizedVertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.

x

X coordinate.

Type

float

y

Y coordinate (starts from the top of the image).

Type

float
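Because normalized coordinates range from 0 to 1 relative to the original image, converting a vertex to pixel coordinates is a multiplication by the image size. The helper below is a hypothetical sketch; rounding to the nearest pixel is this example's choice, not something the API prescribes.

```python
from typing import Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized (x, y) vertex to pixel coordinates.

    y is measured from the top of the image, matching the field description.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    return round(x * width), round(y * height)


print(to_pixels(0.5, 0.25, 1000, 800))  # (500, 200)
```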

class google.cloud.documentai_v1beta3.types.OcrConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Config for Document OCR.

hints

Hints for the OCR model.

Type

google.cloud.documentai_v1beta3.types.OcrConfig.Hints

enable_native_pdf_parsing

Enables special handling for PDFs with existing text information. Results in better text extraction quality in such PDF inputs.

Type

bool

enable_image_quality_scores

Enables intelligent document quality scores after OCR. Can help with diagnosing why OCR responses are of poor quality for a given input. Adds additional latency comparable to regular OCR to the process call.

Type

bool

advanced_ocr_options

A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are:

  • legacy_layout: a heuristics layout detection algorithm, which serves as an alternative to the current ML-based layout detection algorithm. Customers can choose the best suitable layout algorithm based on their situation.

Type

MutableSequence[str]

enable_symbol

Includes symbol level OCR information if set to true.

Type

bool

compute_style_info

Turn on font identification model and return font style information. Deprecated, use [PremiumFeatures.compute_style_info][google.cloud.documentai.v1beta3.OcrConfig.PremiumFeatures.compute_style_info] instead.

Type

bool

disable_character_boxes_detection

Turn off character box detector in OCR engine. Character box detection is enabled by default in OCR 2.0 (and later) processors.

Type

bool

premium_features

Configurations for premium OCR features.

Type

google.cloud.documentai_v1beta3.types.OcrConfig.PremiumFeatures

class Hints(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Hints for OCR Engine

language_hints

List of BCP-47 language codes to use for OCR. In most cases, not specifying it yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong).

Type

MutableSequence[str]

class PremiumFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Configurations for premium OCR features.

enable_selection_mark_detection

Turn on selection mark detector in OCR engine. Only available in OCR 2.0 (and later) processors.

Type

bool

compute_style_info

Turn on font identification model and return font style information.

Type

bool

enable_math_ocr

Turn on the model that can extract LaTeX math formulas.

Type

bool

class google.cloud.documentai_v1beta3.types.ProcessOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Options for Process API

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

individual_page_selector

Which pages to process (1-indexed).

This field is a member of oneof page_range.

Type

google.cloud.documentai_v1beta3.types.ProcessOptions.IndividualPageSelector

from_start

Only process certain pages from the start. Process all if the document has fewer pages.

This field is a member of oneof page_range.

Type

int

from_end

Only process certain pages from the end. As with from_start, all pages are processed if the document has fewer pages.

This field is a member of oneof page_range.

Type

int

ocr_config

Only applicable to OCR_PROCESSOR and FORM_PARSER_PROCESSOR. Returns error if set on other processor types.

Type

google.cloud.documentai_v1beta3.types.OcrConfig

layout_config

Optional. Only applicable to LAYOUT_PARSER_PROCESSOR. Returns error if set on other processor types.

Type

google.cloud.documentai_v1beta3.types.ProcessOptions.LayoutConfig

schema_override

Optional. Override the schema of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion]. Will return an Invalid Argument error if this field is set when the underlying [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] doesn’t support schema override.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema

class IndividualPageSelector(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A list of individual page numbers.

pages

Optional. Indices of the pages (starting from 1).

Type

MutableSequence[int]
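The three page_range options can be illustrated with a small model of which 1-indexed pages each one selects. select_pages is a hypothetical helper, not a library function. Returning every page when no option is set is an assumption of this sketch, and it passes individual page numbers through without range checks because the handling of out-of-range pages is not described here.

```python
from typing import List, Optional, Sequence


def select_pages(
    page_count: int,
    individual_pages: Optional[Sequence[int]] = None,
    from_start: Optional[int] = None,
    from_end: Optional[int] = None,
) -> List[int]:
    """Return the 1-indexed pages a page_range option would pick (a model)."""
    chosen = [v for v in (individual_pages, from_start, from_end) if v is not None]
    if len(chosen) > 1:
        raise ValueError("page_range is a oneof: set at most one option")

    if individual_pages is not None:
        return list(individual_pages)
    if from_start is not None:
        # All pages if the document has fewer pages than requested.
        return list(range(1, min(from_start, page_count) + 1))
    if from_end is not None:
        first = max(page_count - from_end, 0) + 1
        return list(range(first, page_count + 1))
    return list(range(1, page_count + 1))
```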

class LayoutConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Serving config for layout parser processor.

chunking_config

Optional. Config for chunking in layout parser processor.

Type

google.cloud.documentai_v1beta3.types.ProcessOptions.LayoutConfig.ChunkingConfig

class ChunkingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Serving config for chunking.

chunk_size

Optional. The chunk sizes to use when splitting documents, in order of level.

Type

int

include_ancestor_headings

Optional. Whether or not to include ancestor headings when splitting.

Type

bool

semantic_chunking_group_size

Optional. The number of tokens to group together when evaluating semantic similarity.

Type

bool

breakpoint_percentile_threshold

Optional. The percentile of cosine dissimilarity that must be exceeded between a group of tokens and the next. The smaller this number is, the more chunks will be generated.

Type

int
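To see why a smaller breakpoint_percentile_threshold yields more chunks, consider the toy model below. It is an illustration only, not the service's actual algorithm: the dissimilarity values are made up, the nearest-rank percentile is one of several possible definitions, and both function names are hypothetical.

```python
from typing import List, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def breakpoints(dissimilarities: Sequence[float], pct: int) -> List[int]:
    """Indices where the dissimilarity to the next group exceeds the threshold."""
    if not dissimilarities:
        return []
    threshold = nearest_rank_percentile(dissimilarities, pct)
    return [i for i, d in enumerate(dissimilarities) if d > threshold]


scores = [0.1, 0.9, 0.2, 0.8, 0.3]  # made-up cosine dissimilarities
print(breakpoints(scores, 80))  # [1]
print(breakpoints(scores, 40))  # [1, 3, 4]
```

Each breakpoint starts a new chunk, so lowering the percentile from 80 to 40 in this toy data raises the number of split points from one to three.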

class google.cloud.documentai_v1beta3.types.ProcessRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ProcessDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ProcessDocument] method.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

inline_document

An inline document proto.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.Document

raw_document

A raw document content (bytes).

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.RawDocument

gcs_document

A raw document on Google Cloud Storage.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.GcsDocument

name

Required. The resource name of the [Processor][google.cloud.documentai.v1beta3.Processor] or [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to use for processing. If a [Processor][google.cloud.documentai.v1beta3.Processor] is specified, the server will use its [default version][google.cloud.documentai.v1beta3.Processor.default_processor_version]. Format: projects/{project}/locations/{location}/processors/{processor}, or projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

Type

str
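The two accepted formats for name differ only in the optional processorVersions suffix. A minimal sketch of building either form follows; processor_resource_name is a hypothetical helper and the identifiers in the usage line are placeholders.

```python
from typing import Optional


def processor_resource_name(
    project: str,
    location: str,
    processor: str,
    processor_version: Optional[str] = None,
) -> str:
    """Build a Processor name, or a ProcessorVersion name if a version is given."""
    name = f"projects/{project}/locations/{location}/processors/{processor}"
    if processor_version is not None:
        name += f"/processorVersions/{processor_version}"
    return name


# Placeholder identifiers, for illustration only.
print(processor_resource_name("my-project", "us", "abc123"))
```

Omitting the version yields a Processor name, in which case the server uses that processor's default version.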

document

The document payload. The [content][google.cloud.documentai.v1beta3.Document.content] and [mime_type][google.cloud.documentai.v1beta3.Document.mime_type] fields must be set.

Type

google.cloud.documentai_v1beta3.types.Document

skip_human_review

Whether human review should be skipped for this request. Defaults to false.

Type

bool

field_mask

Specifies which fields to include in the [ProcessResponse.document][google.cloud.documentai.v1beta3.ProcessResponse.document] output. Only supports top-level document and pages field, so it must be in the form of {document_field_name} or pages.{page_field_name}.

Type

google.protobuf.field_mask_pb2.FieldMask
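The shape restriction on field_mask paths (a top-level document field, or pages followed by exactly one page field) can be checked with a short predicate. is_supported_mask_path is a hypothetical helper; it checks only the shape of a path, not whether the named field exists on the Document message.

```python
def is_supported_mask_path(path: str) -> bool:
    """True for '{document_field_name}' or 'pages.{page_field_name}'."""
    parts = path.split(".")
    if len(parts) == 1:
        return bool(parts[0])
    return len(parts) == 2 and parts[0] == "pages" and bool(parts[1])


for candidate in ("text", "pages.page_number", "pages.layout.text_anchor"):
    print(candidate, is_supported_mask_path(candidate))
```

The third candidate is rejected because it reaches below the top level of a page.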

process_options

Inference-time options for the process API

Type

google.cloud.documentai_v1beta3.types.ProcessOptions

labels

Optional. The labels with user-defined metadata for the request. Label keys and values can be no longer than 63 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter.

Type

MutableMapping[str, str]
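The label constraints above lend themselves to a client-side check. The following sketch uses a hypothetical helper, label_problems. It reads "lowercase letters" together with "international characters are allowed" as "any letter that has no uppercase form or is already lowercase", which is an interpretation of the description rather than a documented rule.

```python
from typing import List, Mapping

MAX_LABEL_LENGTH = 63  # Python's len() counts Unicode code points


def _is_allowed(ch: str) -> bool:
    """Lowercase letters (any script), digits, underscores and dashes."""
    if ch in "_-" or ch.isdigit():
        return True
    return ch.isalpha() and ch == ch.lower()


def label_problems(labels: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the labels pass."""
    problems = []
    for key, value in labels.items():
        if not key or not key[0].isalpha():
            problems.append(f"{key!r}: key must start with a letter")
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{key!r}: {kind} is longer than {MAX_LABEL_LENGTH} characters")
            if not all(_is_allowed(ch) for ch in text):
                problems.append(f"{key!r}: {kind} contains a disallowed character")
    return problems
```

Empty values pass the check because label values are optional.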

imageless_mode

Optional. Option to remove images from the document.

Type

bool

class LabelsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Bases: proto.message.Message

class google.cloud.documentai_v1beta3.types.ProcessResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ProcessDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ProcessDocument] method.

document

The document payload. Fields are populated based on the processor’s behavior.

Type

google.cloud.documentai_v1beta3.types.Document

human_review_operation

The name of the operation triggered by the processed document. If the human review process isn’t triggered, this field is empty. It has the same response type and metadata as the long-running operation returned by [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument].

Type

str

human_review_status

The status of human review on the processed document.

Type

google.cloud.documentai_v1beta3.types.HumanReviewStatus

class google.cloud.documentai_v1beta3.types.Processor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.

name

Output only. Immutable. The resource name of the processor. Format: projects/{project}/locations/{location}/processors/{processor}

Type

str

type_

The processor type, such as: OCR_PROCESSOR, INVOICE_PROCESSOR. To get a list of processor types, see [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes].

Type

str

display_name

The display name of the processor.

Type

str

state

Output only. The state of the processor.

Type

google.cloud.documentai_v1beta3.types.Processor.State

default_processor_version

The default processor version.

Type

str

processor_version_aliases

Output only. The processor version aliases.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorVersionAlias]

process_endpoint

Output only. Immutable. The HTTP endpoint that can be called to invoke processing.

Type

str

create_time

The time the processor was created.

Type

google.protobuf.timestamp_pb2.Timestamp

kms_key_name

The KMS key used for encryption and decryption in CMEK scenarios.

Type

str

satisfies_pzs

Output only. Reserved for future use.

Type

bool

satisfies_pzi

Output only. Reserved for future use.

Type

bool

class State(value)[source]

Bases: proto.enums.Enum

The possible states of the processor.

Values:
STATE_UNSPECIFIED (0):

The processor is in an unspecified state.

ENABLED (1):

The processor is enabled, i.e., has an enabled version which can currently serve processing requests and all the feature dependencies have been successfully initialized.

DISABLED (2):

The processor is disabled.

ENABLING (3):

The processor is being enabled and will become ENABLED if successful.

DISABLING (4):

The processor is being disabled and will become DISABLED if successful.

CREATING (5):

The processor is being created, will become either ENABLED (for successful creation) or FAILED (for failed ones). Once a processor is in this state, it can then be used for document processing, but the feature dependencies of the processor might not be fully created yet.

FAILED (6):

The processor failed during creation or initialization of feature dependencies. The user should delete the processor and recreate one as all the functionalities of the processor are disabled.

DELETING (7):

The processor is being deleted and will be removed if successful.

class google.cloud.documentai_v1beta3.types.ProcessorType(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A processor type is responsible for performing a certain document understanding task on a certain type of document.

name

The resource name of the processor type. Format: projects/{project}/processorTypes/{processor_type}

Type

str

type_

The processor type, such as: OCR_PROCESSOR, INVOICE_PROCESSOR.

Type

str

category

The processor category, used by UI to group processor types.

Type

str

available_locations

The locations in which this processor is available.

Type

MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType.LocationInfo]

allow_creation

Whether the processor type allows creation. If true, users can create a processor of this processor type. Otherwise, users need to request access.

Type

bool

launch_stage

Launch stage of the processor type

Type

google.api.launch_stage_pb2.LaunchStage

sample_document_uris

A set of Cloud Storage URIs of sample documents for this processor.

Type

MutableSequence[str]

class LocationInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The location information about where the processor is available.

location_id

The location ID. For supported locations, refer to regional and multi-regional support.

Type

str

class google.cloud.documentai_v1beta3.types.ProcessorVersion(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.

name

Identifier. The resource name of the processor version. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processor_version}

Type

str

display_name

The display name of the processor version.

Type

str

document_schema

The schema of the processor version. Describes the output.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema

state

Output only. The state of the processor version.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.State

create_time

The time the processor version was created.

Type

google.protobuf.timestamp_pb2.Timestamp

latest_evaluation

The most recently invoked evaluation for the processor version.

Type

google.cloud.documentai_v1beta3.types.EvaluationReference

kms_key_name

The KMS key name used for encryption.

Type

str

kms_key_version_name

The KMS key version with which data is encrypted.

Type

str

google_managed

Output only. Denotes that this ProcessorVersion is managed by Google.

Type

bool

deprecation_info

If set, information about the eventual deprecation of this version.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.DeprecationInfo

model_type

Output only. The model type of this processor version.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.ModelType

satisfies_pzs

Output only. Reserved for future use.

Type

bool

satisfies_pzi

Output only. Reserved for future use.

Type

bool

gen_ai_model_info

Output only. Information about Generative AI model-based processor versions.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.GenAiModelInfo

class DeprecationInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Information about the upcoming deprecation of this processor version.

deprecation_time

The time at which this processor version will be deprecated.

Type

google.protobuf.timestamp_pb2.Timestamp

replacement_processor_version

If set, the processor version that will be used as a replacement.

Type

str

class GenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Information about Generative AI model-based processor versions.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

foundation_gen_ai_model_info

Information for a pretrained Google-managed foundation model.

This field is a member of oneof model_info.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.GenAiModelInfo.FoundationGenAiModelInfo

custom_gen_ai_model_info

Information for a custom Generative AI model created by the user.

This field is a member of oneof model_info.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.GenAiModelInfo.CustomGenAiModelInfo

class CustomGenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Information for a custom Generative AI model created by the user. These are created with Create New Version in either the Call foundation model or Fine tuning tabs.

custom_model_type

The type of custom model created by the user.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion.GenAiModelInfo.CustomGenAiModelInfo.CustomModelType

base_processor_version_id

The base processor version ID for the custom model.

Type

str

class CustomModelType(value)[source]

Bases: proto.enums.Enum

The type of custom model created by the user.

Values:
CUSTOM_MODEL_TYPE_UNSPECIFIED (0):

The model type is unspecified.

VERSIONED_FOUNDATION (1):

The model is a versioned foundation model.

FINE_TUNED (2):

The model is a finetuned foundation model.

class FoundationGenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Information for a pretrained Google-managed foundation model.

finetuning_allowed

Whether finetuning is allowed for this base processor version.

Type

bool

min_train_labeled_documents

The minimum number of labeled documents in the training dataset required for finetuning.

Type

int

class ModelType(value)[source]

Bases: proto.enums.Enum

The possible model types of the processor version.

Values:
MODEL_TYPE_UNSPECIFIED (0):

The processor version has unspecified model type.

MODEL_TYPE_GENERATIVE (1):

The processor version has generative model type.

MODEL_TYPE_CUSTOM (2):

The processor version has custom model type.

class State(value)[source]

Bases: proto.enums.Enum

The possible states of the processor version.

Values:
STATE_UNSPECIFIED (0):

The processor version is in an unspecified state.

DEPLOYED (1):

The processor version is deployed and can be used for processing.

DEPLOYING (2):

The processor version is being deployed.

UNDEPLOYED (3):

The processor version is not deployed and cannot be used for processing.

UNDEPLOYING (4):

The processor version is being undeployed.

CREATING (5):

The processor version is being created.

DELETING (6):

The processor version is being deleted.

FAILED (7):

The processor version failed and is in an indeterminate state.

IMPORTING (8):

The processor version is being imported.
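The numeric values listed above can be mirrored in a standard-library enum when working with raw state numbers, for example from logs. VersionState and can_process are hypothetical names for this sketch. Treating only DEPLOYED as usable is a conservative assumption based on the descriptions of DEPLOYED and UNDEPLOYED; the other states are not described either way.

```python
import enum


class VersionState(enum.IntEnum):
    """Local mirror of the ProcessorVersion.State values listed above."""

    STATE_UNSPECIFIED = 0
    DEPLOYED = 1
    DEPLOYING = 2
    UNDEPLOYED = 3
    UNDEPLOYING = 4
    CREATING = 5
    DELETING = 6
    FAILED = 7
    IMPORTING = 8


def can_process(state: int) -> bool:
    """Conservatively report whether a version in this state can process."""
    return VersionState(state) is VersionState.DEPLOYED


print(VersionState(8).name, can_process(1), can_process(3))
```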

class google.cloud.documentai_v1beta3.types.ProcessorVersionAlias(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Contains the alias and the aliased resource name of a processor version.

alias

The alias in the form of processor_version resource name.

Type

str

processor_version

The resource name of the aliased processor version.

Type

str

class google.cloud.documentai_v1beta3.types.PropertyMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata about a property.

inactive

Whether the property should be considered as “inactive”.

Type

bool

field_extraction_metadata

Field extraction metadata on the property.

Type

google.cloud.documentai_v1beta3.types.FieldExtractionMetadata

class google.cloud.documentai_v1beta3.types.RawDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Payload message of raw document content (bytes).

content

Inline document content.

Type

bytes

mime_type

An IANA MIME type (RFC6838) indicating the nature and format of the [content][google.cloud.documentai.v1beta3.RawDocument.content].

Type

str

display_name

The display name of the document. It supports all Unicode characters except the following, which are reserved: *, ?, [, ], %, {, }, ', ", the comma (,), ~, = and :. If not specified, a default ID is generated.

Type

str
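The three RawDocument fields can be prepared from a file name and its bytes using only the standard library. raw_document_kwargs is a hypothetical helper for this sketch. Guessing the MIME type from the file extension is one possible approach, and counting the comma among the reserved display_name characters is this sketch's reading of the list in the description.

```python
import mimetypes

# Characters the display_name description lists as reserved.
RESERVED_CHARS = set("*?[]%{}'\",~=:")


def raw_document_kwargs(file_name: str, content: bytes) -> dict:
    """Assemble content, mime_type and display_name for a raw document."""
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None:
        raise ValueError(f"cannot guess a MIME type for {file_name!r}")
    reserved = sorted(RESERVED_CHARS.intersection(file_name))
    if reserved:
        raise ValueError(f"display name contains reserved characters: {reserved}")
    return {"content": content, "mime_type": mime_type, "display_name": file_name}


print(raw_document_kwargs("invoice.pdf", b"%PDF-1.7")["mime_type"])  # application/pdf
```

Whether the service accepts a guessed MIME type depends on the processor, so the guess is a convenience rather than a guarantee.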

class google.cloud.documentai_v1beta3.types.ReviewDocumentOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.

state

Used only when Operation.done is false.

Type

google.cloud.documentai_v1beta3.types.ReviewDocumentOperationMetadata.State

state_message

A message providing more details about the current state of processing. For example, the error message if the operation failed.

Type

str

create_time

The creation time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

update_time

The last update time of the operation.

Type

google.protobuf.timestamp_pb2.Timestamp

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

question_id

The Crowd Compute question ID.

Type

str

class State(value)[source]

Bases: proto.enums.Enum

State of the long-running operation.

Values:
STATE_UNSPECIFIED (0):

Unspecified state.

RUNNING (1):

Operation is still running.

CANCELLING (2):

Operation is being cancelled.

SUCCEEDED (3):

Operation succeeded.

FAILED (4):

Operation failed.

CANCELLED (5):

Operation is cancelled.

class google.cloud.documentai_v1beta3.types.ReviewDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.

inline_document

An inline document proto.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.Document

human_review_config

Required. The resource name of the [HumanReviewConfig][google.cloud.documentai.v1beta3.HumanReviewConfig] that the document will be reviewed with.

Type

str

document

The document that needs human review.

Type

google.cloud.documentai_v1beta3.types.Document

enable_schema_validation

Whether the validation should be performed on the ad-hoc review request.

Type

bool

priority

The priority of the human review task.

Type

google.cloud.documentai_v1beta3.types.ReviewDocumentRequest.Priority

document_schema

The document schema of the human review task.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema

class Priority(value)[source]

Bases: proto.enums.Enum

The priority level of the human review task.

Values:
DEFAULT (0):

The default priority level.

URGENT (1):

The urgent priority level. The labeling manager should allocate labeler resources to the urgent task queue to respect this priority level.

class google.cloud.documentai_v1beta3.types.ReviewDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.

gcs_destination

The Cloud Storage URI of the human-reviewed document, if the review succeeded.

Type

str

state

The state of the review operation.

Type

google.cloud.documentai_v1beta3.types.ReviewDocumentResponse.State

rejection_reason

The reason why the review was rejected by the reviewer.

Type

str

class State(value)[source]

Bases: proto.enums.Enum

Possible states of the review operation.

Values:
STATE_UNSPECIFIED (0):

The default value. This value is used if the state is omitted.

REJECTED (1):

The review operation was rejected by the reviewer.

SUCCEEDED (2):

The review operation succeeded.
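
The three response fields are meant to be read together: gcs_destination is meaningful on success, rejection_reason on rejection. The sketch below shows one possible way to branch on the state. ReviewResult and ReviewState are local stand-ins with the documented field names and values, and reviewed_document_uri is a hypothetical helper, not part of the library.

```python
import enum
from dataclasses import dataclass


class ReviewState(enum.IntEnum):
    """Local stand-in mirroring the documented values."""

    STATE_UNSPECIFIED = 0
    REJECTED = 1
    SUCCEEDED = 2


@dataclass
class ReviewResult:
    """Stand-in carrying the three documented response fields."""

    gcs_destination: str = ""
    state: ReviewState = ReviewState.STATE_UNSPECIFIED
    rejection_reason: str = ""


def reviewed_document_uri(result: ReviewResult) -> str:
    """Return the Cloud Storage URI on success, otherwise raise ValueError."""
    if result.state == ReviewState.SUCCEEDED:
        return result.gcs_destination
    if result.state == ReviewState.REJECTED:
        raise ValueError(f"review rejected: {result.rejection_reason}")
    raise ValueError("review state is unspecified")
```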

class google.cloud.documentai_v1beta3.types.RevisionRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The revision reference specifies which revision on the document to read.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

revision_case

Reads the revision by the predefined case.

This field is a member of oneof source.

Type

google.cloud.documentai_v1beta3.types.RevisionRef.RevisionCase

revision_id

Reads the revision given by the id.

This field is a member of oneof source.

Type

str

latest_processor_version

Reads the revision generated by the processor version. The value is the full resource name of the processor version, in the format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}

This field is a member of oneof source.

Type

str

class RevisionCase(value)[source]

Bases: proto.enums.Enum

Some predefined revision cases.

Values:
REVISION_CASE_UNSPECIFIED (0):

Unspecified case; falls back to reading LATEST_HUMAN_REVIEW.

LATEST_HUMAN_REVIEW (1):

The latest revision made by a human.

LATEST_TIMESTAMP (2):

The latest revision based on timestamp.

BASE_OCR_REVISION (3):

The first (OCR) revision.
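
The oneof behaviour described above (setting one member of source clears the others) can be pictured with a small pure-Python model. OneofSource is a hypothetical stand-in written for this illustration. It returns None for members that are not set, whereas the real message class may report unset members differently.

```python
from typing import Any, Optional


class OneofSource:
    """Toy model of the `source` oneof: at most one member holds a value."""

    _MEMBERS = ("revision_case", "revision_id", "latest_processor_version")

    def __init__(self) -> None:
        self._active: Optional[str] = None
        self._value: Any = None

    def set(self, member: str, value: Any) -> None:
        """Set one member; whichever member was set before is cleared."""
        if member not in self._MEMBERS:
            raise AttributeError(member)
        self._active, self._value = member, value

    def get(self, member: str) -> Any:
        """Return the member's value, or None if it is not the active member."""
        if member not in self._MEMBERS:
            raise AttributeError(member)
        return self._value if self._active == member else None

    def which(self) -> Optional[str]:
        """Return the name of the member currently set, if any."""
        return self._active
```

Setting revision_id and then revision_case on such an object leaves only revision_case populated.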

class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.

processor

Required. The resource name of the [Processor][google.cloud.documentai.v1beta3.Processor] whose default version is to be changed.

Type

str

default_processor_version

Required. The resource name of the child [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to use as the default. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{version}

Type

str
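
Because default_processor_version must name a child of processor, it can help to build and check the two strings before sending a request. The following is a minimal sketch using only string handling. Both function names are hypothetical helpers, not library calls, and the processor name is assumed to follow the projects/{project}/locations/{location}/processors/{processor} pattern.

```python
def processor_version_name(project: str, location: str, processor: str, version: str) -> str:
    """Fill in the documented processor version resource name format."""
    return (
        f"projects/{project}/locations/{location}"
        f"/processors/{processor}/processorVersions/{version}"
    )


def is_child_version(processor_name: str, version_name: str) -> bool:
    """Check that version_name sits directly under processor_name."""
    prefix = processor_name.rstrip("/") + "/processorVersions/"
    version_id = version_name[len(prefix):]
    return version_name.startswith(prefix) and bool(version_id) and "/" not in version_id
```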

class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.

class google.cloud.documentai_v1beta3.types.SummaryOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Metadata for document summarization.

length

How long the summary should be.

Type

google.cloud.documentai_v1beta3.types.SummaryOptions.Length

format_

The format the summary should be in.

Type

google.cloud.documentai_v1beta3.types.SummaryOptions.Format

class Format(value)[source]

Bases: proto.enums.Enum

The Format enum.

Values:
FORMAT_UNSPECIFIED (0):

Default.

PARAGRAPH (1):

Format the output in paragraphs.

BULLETS (2):

Format the output in bullets.

class Length(value)[source]

Bases: proto.enums.Enum

The Length enum.

Values:
LENGTH_UNSPECIFIED (0):

Default.

BRIEF (1):

A brief summary of one or two sentences.

MODERATE (2):

A paragraph-length summary.

COMPREHENSIVE (3):

The longest option available.

class google.cloud.documentai_v1beta3.types.TrainProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The metadata that represents a processor version being created.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

training_dataset_validation

The training dataset validation information.

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionMetadata.DatasetValidation

test_dataset_validation

The test dataset validation information.

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionMetadata.DatasetValidation

class DatasetValidation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The dataset validation information. This includes any and all errors with documents and the dataset.

document_error_count

The total number of document errors.

Type

int

dataset_error_count

The total number of dataset errors.

Type

int

document_errors

Error information pertaining to specific documents. A maximum of 10 document errors will be returned. Documents with errors are not used in training.

Type

MutableSequence[google.rpc.status_pb2.Status]

dataset_errors

Error information for the dataset as a whole. A maximum of 10 dataset errors will be returned. A single dataset error is terminal for training.

Type

MutableSequence[google.rpc.status_pb2.Status]
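
Since at most 10 errors are returned per list, the count fields can exceed the length of the corresponding lists. The sketch below shows one way to turn the four fields into a short summary. DatasetValidation here is a local stand-in that stores error messages as plain strings rather than google.rpc.status_pb2.Status objects, and summarize is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetValidation:
    """Stand-in with the documented field names; errors are plain strings here."""

    document_error_count: int = 0
    dataset_error_count: int = 0
    document_errors: List[str] = field(default_factory=list)
    dataset_errors: List[str] = field(default_factory=list)


def summarize(validation: DatasetValidation) -> str:
    """Describe the outcome: dataset errors are terminal, document errors are not."""
    if validation.dataset_error_count > 0:
        return f"training blocked: {validation.dataset_error_count} dataset error(s)"
    if validation.document_error_count > 0:
        # The returned list is capped, so some errors may not be listed.
        unlisted = max(0, validation.document_error_count - len(validation.document_errors))
        return f"{validation.document_error_count} document(s) skipped, {unlisted} not listed"
    return "no validation errors"
```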

class google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [TrainProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.TrainProcessorVersion] method.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

custom_document_extraction_options

Options to control Custom Document Extraction (CDE) Processor.

This field is a member of oneof processor_flags.

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest.CustomDocumentExtractionOptions

foundation_model_tuning_options

Options to control foundation model tuning of a processor.

This field is a member of oneof processor_flags.

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest.FoundationModelTuningOptions

parent

Required. The parent (project, location and processor) to create the new version for. Format: projects/{project}/locations/{location}/processors/{processor}.

Type

str

processor_version

Required. The processor version to be created.

Type

google.cloud.documentai_v1beta3.types.ProcessorVersion

document_schema

Optional. The schema the processor version will be trained with.

Type

google.cloud.documentai_v1beta3.types.DocumentSchema

input_data

Optional. The input data used to train the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest.InputData

base_processor_version

Optional. The processor version to use as a base for training. This processor version must be a child of parent. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}.

Type

str

class CustomDocumentExtractionOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Options to control the training of the Custom Document Extraction (CDE) Processor.

training_method

Training method to use for CDE training.

Type

google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest.CustomDocumentExtractionOptions.TrainingMethod

class TrainingMethod(value)[source]

Bases: proto.enums.Enum

Training Method for CDE. TRAINING_METHOD_UNSPECIFIED will fall back to MODEL_BASED.

Values:
TRAINING_METHOD_UNSPECIFIED (0):

No description available.

MODEL_BASED (1):

No description available.

TEMPLATE_BASED (2):

No description available.

class FoundationModelTuningOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Options to control foundation model tuning of the processor.

train_steps

Optional. The number of steps to run for model tuning. Valid values are between 1 and 400. If not provided, recommended steps will be used.

Type

int

learning_rate_multiplier

Optional. The multiplier to apply to the recommended learning rate. Valid values are between 0.1 and 10. If not provided, recommended learning rate will be used.

Type

float
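
The documented ranges can be checked on the client side before a training request is sent. This is a minimal sketch: check_tuning_options is a hypothetical helper, the bounds are assumed to be inclusive, and None stands in for "not provided".

```python
from typing import List, Optional


def check_tuning_options(
    train_steps: Optional[int] = None,
    learning_rate_multiplier: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    # Assumption: both documented ranges include their end points.
    if train_steps is not None and not 1 <= train_steps <= 400:
        problems.append(f"train_steps must be between 1 and 400, got {train_steps}")
    if learning_rate_multiplier is not None and not 0.1 <= learning_rate_multiplier <= 10:
        problems.append(
            f"learning_rate_multiplier must be between 0.1 and 10, got {learning_rate_multiplier}"
        )
    return problems
```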

class InputData(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The input data used to train a new [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].

training_documents

The documents used for training the new version.

Type

google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig

test_documents

The documents used for testing the trained version.

Type

google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig

class google.cloud.documentai_v1beta3.types.TrainProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The response for [TrainProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.TrainProcessorVersion].

processor_version

The resource name of the processor version produced by training.

Type

str

class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

The long-running operation metadata for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request message for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.

name

Required. The processor version resource name to be undeployed.

Type

str

class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Response message for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.

class google.cloud.documentai_v1beta3.types.UpdateDatasetOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

common_metadata

The basic metadata of the long-running operation.

Type

google.cloud.documentai_v1beta3.types.CommonOperationMetadata

class google.cloud.documentai_v1beta3.types.UpdateDatasetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

dataset

Required. The name field of the Dataset is used to identify the resource to be updated.

Type

google.cloud.documentai_v1beta3.types.Dataset

update_mask

The update mask applies to the resource.

Type

google.protobuf.field_mask_pb2.FieldMask

class google.cloud.documentai_v1beta3.types.UpdateDatasetSchemaRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

Request for UpdateDatasetSchema.

dataset_schema

Required. The name field of the DatasetSchema is used to identify the resource to be updated.

Type

google.cloud.documentai_v1beta3.types.DatasetSchema

update_mask

The update mask applies to the resource.

Type

google.protobuf.field_mask_pb2.FieldMask

class google.cloud.documentai_v1beta3.types.Vertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]

Bases: proto.message.Message

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

x

X coordinate.

Type

int

y

Y coordinate (starts from the top of the image).

Type

int
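
Because vertex coordinates use the scale of the original image, comparing positions across images of different sizes requires dividing by the image dimensions. The sketch below shows that conversion under stated assumptions: Vertex is a local NamedTuple stand-in, normalize is a hypothetical helper, and the image width and height are assumed to be known to the caller.

```python
from typing import NamedTuple, Tuple


class Vertex(NamedTuple):
    """Stand-in for the documented type: integer pixel coordinates."""

    x: int
    y: int  # measured from the top of the image


def normalize(vertex: Vertex, image_width: int, image_height: int) -> Tuple[float, float]:
    """Scale pixel coordinates to fractions of the image width and height."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return vertex.x / image_width, vertex.y / image_height
```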