Types for Google Cloud Documentai v1beta3 API¶
- class google.cloud.documentai_v1beta3.types.Barcode(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Encodes the detailed information of a barcode.
- format_¶
Format of a barcode. The supported formats are:
CODE_128: Code 128 type.
CODE_39: Code 39 type.
CODE_93: Code 93 type.
CODABAR: Codabar type.
DATA_MATRIX: 2D Data Matrix type.
ITF: ITF type.
EAN_13: EAN-13 type.
EAN_8: EAN-8 type.
QR_CODE: 2D QR code type.
UPC_A: UPC-A type.
UPC_E: UPC-E type.
PDF417: PDF417 type.
AZTEC: 2D Aztec code type.
DATABAR: GS1 DataBar code type.
- Type
- value_format¶
Value format describes the format of the value that a barcode encodes. The supported formats are:
CONTACT_INFO: Contact information.
EMAIL: Email address.
ISBN: ISBN identifier.
PHONE: Phone number.
PRODUCT: Product.
SMS: SMS message.
TEXT: Text string.
URL: URL address.
WIFI: Wifi information.
GEO: Geo-localization.
CALENDAR_EVENT: Calendar event.
DRIVER_LICENSE: Driver’s license.
- Type
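The format names in the format_ list can be collected into a small lookup for validating or grouping detected barcodes. The following is a minimal sketch, not part of the library: SUPPORTED_FORMATS, LABELLED_2D_FORMATS, and describe_barcode are hypothetical names, and the "2D" grouping follows only the labels that appear in the list itself.

```python
# Format names copied from the Barcode.format_ description.
SUPPORTED_FORMATS = frozenset({
    "CODE_128", "CODE_39", "CODE_93", "CODABAR", "DATA_MATRIX", "ITF",
    "EAN_13", "EAN_8", "QR_CODE", "UPC_A", "UPC_E", "PDF417", "AZTEC",
    "DATABAR",
})

# Only the formats that the description explicitly labels as 2D.
LABELLED_2D_FORMATS = frozenset({"DATA_MATRIX", "QR_CODE", "AZTEC"})


def describe_barcode(format_: str) -> str:
    """Classify a format_ string (hypothetical helper, not a library call)."""
    if format_ not in SUPPORTED_FORMATS:
        return "unknown"
    return "2D" if format_ in LABELLED_2D_FORMATS else "other"
```

A caller might use describe_barcode(barcode.format_) to skip values that are not in the documented list before acting on them.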
- class google.cloud.documentai_v1beta3.types.BatchDatasetDocuments(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Dataset documents that the batch operation will be applied to.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- filter¶
A filter matching the documents. Follows the same format and restrictions as [google.cloud.documentai.master.ListDocumentsRequest.filter].
This field is a member of oneof criteria.
- Type
- class IndividualDocumentIds(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
List of individual DocumentIds.
- document_ids¶
Required. List of Document IDs indicating where the actual documents are stored.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.DocumentId]
- class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- common_metadata¶
The basic metadata of the long-running operation.
- individual_batch_delete_statuses¶
The list of response details of each document.
- class IndividualBatchDeleteStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The status of each individual document in the batch delete process.
- document_id¶
The document id of the document.
- status¶
The status of deleting the document in storage.
- Type
google.rpc.status_pb2.Status
- class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- dataset¶
Required. The dataset resource name. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset
- Type
- dataset_documents¶
Required. Dataset documents input. If filter is given, all documents satisfying the filter will be deleted. If documentIds are given, a maximum of 50 documents can be deleted in a batch; the request will be rejected if more than 50 document_ids are provided.
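Because a request is rejected when it carries more than 50 document IDs, a caller deleting a longer list has to split it first. The sketch below shows one way to do that with the standard library only. It assumes the IDs are held as plain strings (the real field holds DocumentId messages), and chunk_document_ids and MAX_IDS_PER_REQUEST are hypothetical names.

```python
from typing import Iterator, List, Sequence

# Limit stated in the dataset_documents description.
MAX_IDS_PER_REQUEST = 50


def chunk_document_ids(
    ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Each yielded slice could then back one delete request, so that no single request exceeds the limit.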
- class google.cloud.documentai_v1beta3.types.BatchDeleteDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response of the delete documents operation.
- class google.cloud.documentai_v1beta3.types.BatchDocumentsInputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The common config to specify a set of documents used as input.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- gcs_prefix¶
The set of documents that match the specified Cloud Storage gcs_prefix.
This field is a member of oneof source.
- class google.cloud.documentai_v1beta3.types.BatchProcessMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].
- state¶
The state of the current batch processing.
- state_message¶
A message providing more details about the current state of processing. For example, the error message if the operation failed.
- Type
- create_time¶
The creation time of the operation.
- update_time¶
The last update time of the operation.
- individual_process_statuses¶
The list of response details of each document.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.BatchProcessMetadata.IndividualProcessStatus]
- class IndividualProcessStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The status of each individual document in the batch process.
- input_gcs_source¶
The source of the document, same as the [input_gcs_source][google.cloud.documentai.v1beta3.BatchProcessMetadata.IndividualProcessStatus.input_gcs_source] field in the request when the batch process started.
- Type
- status¶
The status of processing the document.
- Type
google.rpc.status_pb2.Status
- output_gcs_destination¶
The Cloud Storage output destination (in the request as [DocumentOutputConfig.GcsOutputConfig.gcs_uri][google.cloud.documentai.v1beta3.DocumentOutputConfig.GcsOutputConfig.gcs_uri]) of the processed document if it was successful, otherwise empty.
- Type
- human_review_operation¶
The name of the operation triggered by the processed document. If the human review process isn’t triggered, this field will be empty. It has the same response type and metadata as the long-running operation returned by the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.
- Type
- human_review_status¶
The status of human review on the processed document.
- class State(value)[source]¶
Bases:
proto.enums.Enum
Possible states of the batch processing operation.
- Values:
- STATE_UNSPECIFIED (0):
The default value. This value is used if the state is omitted.
- WAITING (1):
Request operation is waiting for scheduling.
- RUNNING (2):
Request is being processed.
- SUCCEEDED (3):
The batch processing completed successfully.
- CANCELLING (4):
The batch processing is being cancelled.
- CANCELLED (5):
The batch processing was cancelled.
- FAILED (6):
The batch processing has failed.
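When polling a batch operation it is often useful to know whether a state is final. The sketch below mirrors the State values listed above in a standard-library enum so that it runs without the client library. Treating SUCCEEDED, CANCELLED, and FAILED as the final states is an assumption drawn from their descriptions, and is_finished is a hypothetical helper.

```python
import enum


class State(enum.IntEnum):
    """Stand-in that mirrors the BatchProcessMetadata.State values."""
    STATE_UNSPECIFIED = 0
    WAITING = 1
    RUNNING = 2
    SUCCEEDED = 3
    CANCELLING = 4
    CANCELLED = 5
    FAILED = 6


# Assumption: these three states are final; the rest are still in progress.
TERMINAL_STATES = frozenset({State.SUCCEEDED, State.CANCELLED, State.FAILED})


def is_finished(state: int) -> bool:
    """Return True if `state` is one of the assumed final states."""
    return State(state) in TERMINAL_STATES
```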
- class google.cloud.documentai_v1beta3.types.BatchProcessRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].
- name¶
Required. The resource name of [Processor][google.cloud.documentai.v1beta3.Processor] or [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion]. Format:
projects/{project}/locations/{location}/processors/{processor}
or
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
- input_configs¶
The input config for each single document in the batch process.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.BatchProcessRequest.BatchInputConfig]
- output_config¶
The overall output config for batch process.
- input_documents¶
The input documents for the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.
- document_output_config¶
The output configuration for the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.
- skip_human_review¶
Whether human review should be skipped for this request. Defaults to false.
- Type
- process_options¶
Inference-time options for the process API.
- labels¶
Optional. The labels with user-defined metadata for the request. Label keys and values can be no longer than 63 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter.
- class BatchInputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The message for input config in batch process.
- mime_type¶
An IANA published media type (MIME type) of the input. If the input is a raw document, refer to supported file types for the list of media types. If the input is a [Document][google.cloud.documentai.v1beta3.Document], the type should be application/json.
- Type
- class BatchOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The output configuration in the [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments] method.
- class LabelsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)¶
Bases:
proto.message.Message
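The constraints in the labels description can be checked on the client before a request is sent. The sketch below is one reading of those rules, not the service's own validation: it accepts any non-uppercase letter (so that international characters pass), digits, underscores, and dashes, and it requires keys to be non-empty. The names is_valid_label and MAX_LABEL_LENGTH are hypothetical.

```python
# Maximum length, in Unicode code points, stated in the labels description.
MAX_LABEL_LENGTH = 63


def _allowed_char(ch: str) -> bool:
    # Underscore, dash, digit, or a letter that is not uppercase.
    return ch in "_-" or ch.isdigit() or (ch.isalpha() and not ch.isupper())


def is_valid_label(key: str, value: str = "") -> bool:
    """Check one label against the documented rules (assumed interpretation)."""
    if not key or len(key) > MAX_LABEL_LENGTH or len(value) > MAX_LABEL_LENGTH:
        return False
    first = key[0]
    # Keys must start with a letter; values are optional and may be empty.
    if not (first.isalpha() and not first.isupper()):
        return False
    return all(_allowed_char(ch) for ch in key + value)
```

The service remains the authority on what it accepts; a local check like this only catches obvious mistakes early.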
- class google.cloud.documentai_v1beta3.types.BatchProcessResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for [BatchProcessDocuments][google.cloud.documentai.v1beta3.DocumentProcessorService.BatchProcessDocuments].
- class google.cloud.documentai_v1beta3.types.BoundingPoly(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A bounding polygon for the detected image annotation.
- vertices¶
The bounding polygon vertices.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Vertex]
- normalized_vertices¶
The bounding polygon normalized vertices.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.NormalizedVertex]
- class google.cloud.documentai_v1beta3.types.CommonOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The common metadata for long-running operations.
- state¶
The state of the operation.
- create_time¶
The creation time of the operation.
- update_time¶
The last update time of the operation.
- class State(value)[source]¶
Bases:
proto.enums.Enum
State of the long-running operation.
- Values:
- STATE_UNSPECIFIED (0):
Unspecified state.
- RUNNING (1):
Operation is still running.
- CANCELLING (2):
Operation is being cancelled.
- SUCCEEDED (3):
Operation succeeded.
- FAILED (4):
Operation failed.
- CANCELLED (5):
Operation is cancelled.
- class google.cloud.documentai_v1beta3.types.CreateProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [CreateProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.CreateProcessor] method. Note that this request is sent to a regionalized backend service. If the [ProcessorType][google.cloud.documentai.v1beta3.ProcessorType] isn’t available in that region, the creation fails.
- parent¶
Required. The parent (project and location) under which to create the processor. Format:
projects/{project}/locations/{location}
- Type
- processor¶
Required. The processor to be created, requires [Processor.type][google.cloud.documentai.v1beta3.Processor.type] and [Processor.display_name][google.cloud.documentai.v1beta3.Processor.display_name] to be set. Also, the [Processor.kms_key_name][google.cloud.documentai.v1beta3.Processor.kms_key_name] field must be set if the processor is under CMEK.
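Several request types on this page take resource names in fixed formats (parent, processor, processor version, dataset). The sketch below builds those strings from their parts using plain string formatting. The function names are hypothetical helpers written for illustration, and the argument values in the usage note are placeholders.

```python
def location_path(project: str, location: str) -> str:
    """Parent format used by CreateProcessorRequest.parent."""
    return f"projects/{project}/locations/{location}"


def processor_path(project: str, location: str, processor: str) -> str:
    """Processor format used by BatchProcessRequest.name."""
    return f"{location_path(project, location)}/processors/{processor}"


def processor_version_path(
    project: str, location: str, processor: str, processor_version: str
) -> str:
    """Processor version format used by BatchProcessRequest.name."""
    base = processor_path(project, location, processor)
    return f"{base}/processorVersions/{processor_version}"


def dataset_path(project: str, location: str, processor: str) -> str:
    """Dataset format used by BatchDeleteDocumentsRequest.dataset."""
    return f"{processor_path(project, location, processor)}/dataset"
```

For example, location_path("my-project", "us") yields projects/my-project/locations/us, where both arguments are placeholder values.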
- class google.cloud.documentai_v1beta3.types.Dataset(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A singleton resource under a [Processor][google.cloud.documentai.v1beta3.Processor] which configures a collection of documents.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- gcs_managed_config¶
Optional. User-managed Cloud Storage dataset configuration. Use this configuration if the dataset documents are stored under a user-managed Cloud Storage location.
This field is a member of oneof storage_source.
- document_warehouse_config¶
Optional. Deprecated. Warehouse-based dataset configuration is not supported.
This field is a member of oneof storage_source.
- unmanaged_dataset_config¶
Optional. Unmanaged dataset configuration. Use this configuration if the dataset documents are managed by the document service internally (not user-managed).
This field is a member of oneof storage_source.
- spanner_indexing_config¶
Optional. A lightweight indexing source with low latency and high reliability, but lacking advanced features like CMEK and content-based search.
This field is a member of oneof indexing_source.
- name¶
Dataset resource name. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset
- Type
- state¶
Required. State of the dataset. Ignored when updating dataset.
- class DocumentWarehouseConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Configuration specific to the Document AI Warehouse-based implementation.
- collection¶
Output only. The collection in Document AI Warehouse associated with the dataset.
- Type
- class GCSManagedConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Configuration specific to the Cloud Storage-based implementation.
- gcs_prefix¶
Required. The Cloud Storage URI (a directory) where the documents belonging to the dataset must be stored.
- class SpannerIndexingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Configuration specific to spanner-based indexing.
- class State(value)[source]¶
Bases:
proto.enums.Enum
Different states of a dataset.
- Values:
- STATE_UNSPECIFIED (0):
Default unspecified enum, should not be used.
- UNINITIALIZED (1):
Dataset has not been initialized.
- INITIALIZING (2):
Dataset is being initialized.
- INITIALIZED (3):
Dataset has been initialized.
- class UnmanagedDatasetConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Configuration specific to an unmanaged dataset.
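The oneof rule stated for Dataset (at most one member set, and setting one clears the others) can be pictured with a toy model. The class below is purely illustrative and is not how the library implements oneof fields. OneofGroup and its methods are hypothetical, and the bucket name is a placeholder.

```python
from typing import Any, Iterable, Optional


class OneofGroup:
    """Toy model of a oneof group: at most one member holds a value."""

    def __init__(self, name: str, members: Iterable[str]) -> None:
        self.name = name
        self._members = frozenset(members)
        self._active: Optional[str] = None
        self._value: Any = None

    def assign(self, member: str, value: Any) -> None:
        """Set one member; whichever member was set before is cleared."""
        if member not in self._members:
            raise KeyError(f"{member!r} is not a member of oneof {self.name!r}")
        self._active, self._value = member, value

    def which(self) -> Optional[str]:
        """Return the name of the member currently set, or None."""
        return self._active

    def get(self, member: str) -> Any:
        """Return the value if `member` is the one set, else None."""
        return self._value if member == self._active else None


storage_source = OneofGroup(
    "storage_source",
    ["gcs_managed_config", "document_warehouse_config", "unmanaged_dataset_config"],
)
storage_source.assign("gcs_managed_config", {"gcs_prefix": "gs://example-bucket/docs"})
# Assigning a second member replaces the first one.
storage_source.assign("unmanaged_dataset_config", {})
```

After the second assignment, only unmanaged_dataset_config is set in the model, which is the behaviour the oneof note describes.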
- class google.cloud.documentai_v1beta3.types.DatasetSchema(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Dataset Schema.
- name¶
Dataset schema resource name. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset/datasetSchema
- Type
- document_schema¶
Optional. Schema of the dataset.
- class google.cloud.documentai_v1beta3.types.DatasetSplitType(value)[source]¶
Bases:
proto.enums.Enum
Documents belonging to a dataset will be split into different groups referred to as splits: train, test.
- Values:
- DATASET_SPLIT_TYPE_UNSPECIFIED (0):
Default value if the enum is not set.
- DATASET_SPLIT_TRAIN (1):
Identifies the train documents.
- DATASET_SPLIT_TEST (2):
Identifies the test documents.
- DATASET_SPLIT_UNASSIGNED (3):
Identifies the unassigned documents.
- class google.cloud.documentai_v1beta3.types.DeleteProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [DeleteProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessor] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.DeleteProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [DeleteProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessor] method.
- class google.cloud.documentai_v1beta3.types.DeleteProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [DeleteProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessorVersion] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.DeleteProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [DeleteProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeleteProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.DeployProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.DeployProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.DeployProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [DeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.DeployProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.DisableProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.DisableProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method.
- class google.cloud.documentai_v1beta3.types.DisableProcessorResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [DisableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.DisableProcessor] method. Intentionally empty proto for adding fields in the future.
- class google.cloud.documentai_v1beta3.types.Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- uri¶
Optional. Currently supports Google Cloud Storage URIs of the form gs://bucket_name/object_name. Object versioning is not supported. For more information, refer to Google Cloud Storage Request URIs.
This field is a member of oneof source.
- Type
- content¶
Optional. Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
This field is a member of oneof source.
- Type
- mime_type¶
An IANA published media type (MIME type).
- Type
- text_styles¶
Styles for the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Style]
- pages¶
Visual page layout for the [Document][google.cloud.documentai.v1beta3.Document].
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page]
- entities¶
A list of entities detected on [Document.text][google.cloud.documentai.v1beta3.Document.text]. For document shards, entities in this list may cross shard boundaries.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Entity]
- entity_relations¶
Placeholder. Relationship among [Document.entities][google.cloud.documentai.v1beta3.Document.entities].
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.EntityRelation]
- text_changes¶
Placeholder. A list of text corrections made to [Document.text][google.cloud.documentai.v1beta3.Document.text]. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.TextChange]
- shard_info¶
Information about the sharding if this document is a sharded part of a larger document. If the document is not sharded, this message is not specified.
- error¶
Any error that occurred while processing this document.
- Type
google.rpc.status_pb2.Status
- revisions¶
Placeholder. Revision history of this document.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Revision]
- document_layout¶
Parsed layout of the document.
- chunked_document¶
Document chunked based on chunking config.
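The note on the content field says that bytes are raw in the binary form but base64 in JSON. The sketch below shows that conversion with the standard library, for the case where a JSON payload is built or read by hand. The helper names are hypothetical, and the sample bytes in the usage note are a placeholder.

```python
import base64


def content_to_json_value(content: bytes) -> str:
    """Encode raw document bytes as the base64 text used in JSON."""
    return base64.b64encode(content).decode("ascii")


def content_from_json_value(value: str) -> bytes:
    """Decode the base64 text found in JSON back into raw bytes."""
    return base64.b64decode(value)
```

For instance, content_to_json_value(b"hi") returns "aGk=", and passing that string to content_from_json_value gives back b"hi".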
- class ChunkedDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents the chunks that the document is divided into.
- chunks¶
List of chunks.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk]
- class Chunk(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a chunk.
- page_span¶
Page span of the chunk.
- page_headers¶
Page headers associated with the chunk.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageHeader]
- page_footers¶
Page footers associated with the chunk.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.ChunkedDocument.Chunk.ChunkPageFooter]
- class ChunkPageFooter(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents the page footer associated with the chunk.
- text¶
Footer in text format.
- Type
- page_span¶
Page span of the footer.
- class ChunkPageHeader(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents the page header associated with the chunk.
- page_span¶
Page span of the header.
- class DocumentLayout(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents the parsed layout of a document as a collection of blocks that the document is divided into.
- blocks¶
List of blocks in the document.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]
- class DocumentLayoutBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a block. A block could be one of the various types (text, table, list) supported.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- page_span¶
Page span of the block.
- class LayoutListBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a list type block.
- list_entries¶
List entries that constitute a list block.
- class LayoutListEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents an entry in the list.
- blocks¶
A list entry is a list of blocks. Repeated blocks support further hierarchies and nested blocks.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]
- class LayoutPageSpan(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents where the block starts and ends in the document.
- class LayoutTableBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a table type block.
- header_rows¶
Header rows at the top of the table.
- body_rows¶
Body rows containing main table content.
- caption¶
Table caption/title.
- Type
- class LayoutTableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a cell in a table row.
- blocks¶
A table cell is a list of blocks. Repeated blocks support further hierarchies and nested blocks.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]
- class LayoutTableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a row in a table.
- cells¶
A table row is a list of table cells.
- class LayoutTextBlock(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a text type block.
- type_¶
Type of the text in the block. Available options are: paragraph, subtitle, heading-1, heading-2, heading-3, heading-4, heading-5, header, footer.
- Type
- blocks¶
A text block could further have child blocks. Repeated blocks support further hierarchies and nested blocks.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.DocumentLayout.DocumentLayoutBlock]
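The type_ options of a text block include five heading levels, which callers often want as a number when rebuilding an outline. The sketch below maps the documented strings to levels. heading_level is a hypothetical helper, and it treats any value outside heading-1 to heading-5 as "not a heading".

```python
from typing import Optional

# Built from the heading options listed for LayoutTextBlock.type_.
_HEADING_LEVELS = {f"heading-{n}": n for n in range(1, 6)}


def heading_level(type_: str) -> Optional[int]:
    """Return 1 to 5 for 'heading-1' to 'heading-5', otherwise None."""
    return _HEADING_LEVELS.get(type_)
```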
- class Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
- text_anchor¶
Optional. Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- page_anchor¶
Optional. Represents the provenance of this entity with respect to the location on the page where it was found.
- id¶
Optional. Canonical id. This will be a unique value in the entity list for this document.
- Type
- normalized_value¶
Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types.
- properties¶
Optional. Entities can be nested to form a hierarchical data structure representing the content in the document.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Entity]
- provenance¶
Optional. The history of this annotation.
- class NormalizedValue(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Parsed and normalized entity value.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- money_value¶
Money value. See also: https://github.com/googleapis/googleapis/blob/master/google/type/money.proto
This field is a member of oneof structured_value.
- Type
google.type.money_pb2.Money
- date_value¶
Date value. Includes year, month, day. See also: https://github.com/googleapis/googleapis/blob/master/google/type/date.proto
This field is a member of oneof structured_value.
- Type
google.type.date_pb2.Date
- datetime_value¶
DateTime value. Includes date, time, and timezone. See also: https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto
This field is a member of oneof structured_value.
- Type
google.type.datetime_pb2.DateTime
- address_value¶
Postal address. See also: https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto
This field is a member of oneof structured_value.
- Type
google.type.postal_address_pb2.PostalAddress
- boolean_value¶
Boolean value. Can be used for entities with binary values, or for checkboxes.
This field is a member of oneof structured_value.
- Type
- text¶
Optional. An optional field to store a normalized string. For some entity types, one of the respective structured_value fields may also be populated. Also, not all types of structured_value will be normalized. For example, some processors may not generate float or integer normalized text by default.
Below are sample formats mapped to structured values.
Money/Currency type (money_value) is in the ISO 4217 text format.
Date type (date_value) is in the ISO 8601 text format.
Datetime type (datetime_value) is in the ISO 8601 text format.
- Type
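Since the normalized text for a date is described as ISO 8601, it can usually be parsed with the standard library. The sketch below assumes the calendar form YYYY-MM-DD, which is only one of the forms ISO 8601 permits, and returns None for anything else. parse_date_text is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def parse_date_text(text: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None if it does not parse."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None
```

Returning None instead of raising keeps the caller in control when a processor leaves the text empty or uses a form this sketch does not handle.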
- class EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Relationship between [Entities][google.cloud.documentai.v1beta3.Document.Entity].
- class Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A page in a [Document][google.cloud.documentai.v1beta3.Document].
- page_number¶
1-based index for current [Page][google.cloud.documentai.v1beta3.Document.Page] in a parent [Document][google.cloud.documentai.v1beta3.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta3.Document] for individual processing.
- Type
- image¶
Rendered image for this page. This image is preprocessed to remove any skew, rotation, and distortions such that the annotation bounding boxes can be upright and axis-aligned.
- transforms¶
Transformation matrices that were applied to the original document image to produce [Page.image][google.cloud.documentai.v1beta3.Document.Page.image].
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Matrix]
- dimension¶
Physical dimension of the page.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the page.
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- blocks¶
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Block]
- paragraphs¶
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Paragraph]
- lines¶
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Line]
- tokens¶
A list of visually detected tokens on the page.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Token]
- visual_elements¶
A list of detected non-text visual elements on the page, such as checkboxes and signatures.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.VisualElement]
- tables¶
A list of visually detected tables on the page.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table]
- form_fields¶
A list of visually detected form fields on the page.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.FormField]
- symbols¶
A list of visually detected symbols on the page.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Symbol]
- detected_barcodes¶
A list of detected barcodes.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedBarcode]
- image_quality_scores¶
Image quality scores.
- provenance¶
The history of this page.
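Because page_number is 1-based and a page may have been taken out of its parent document, a page's position in a list does not have to match its page_number. The sketch below looks pages up by the field instead of by index. It works on any objects with a page_number attribute, find_page is a hypothetical helper, and the usage note uses SimpleNamespace stand-ins in place of real Page messages.

```python
from typing import Any, Iterable, Optional


def find_page(pages: Iterable[Any], page_number: int) -> Optional[Any]:
    """Return the first page whose page_number matches, or None."""
    for page in pages:
        if page.page_number == page_number:
            return page
    return None
```

With stand-ins such as types.SimpleNamespace(page_number=3) and types.SimpleNamespace(page_number=4) in a list, find_page(pages, 3) returns the first element even though it sits at index 0.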
- class Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta3.Document.Page.Block].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- provenance¶
The history of this annotation.
- class DetectedBarcode(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A detected barcode.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [DetectedBarcode][google.cloud.documentai.v1beta3.Document.Page.DetectedBarcode].
- barcode¶
Detailed barcode information of the [DetectedBarcode][google.cloud.documentai.v1beta3.Document.Page.DetectedBarcode].
- class DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Detected language for a structural component.
- language_code¶
The BCP-47 language code, such as en-US or sr-Latn.
- Type
- class Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Dimension for the page.
- class FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A form field detected on the page.
- field_name¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta3.Document.Page.FormField] name, e.g. Address, Email, Grand total, Phone number, etc.
- field_value¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta3.Document.Page.FormField] value.
- name_detected_languages¶
A list of detected languages for name together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- value_detected_languages¶
A list of detected languages for value together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- value_type¶
If the value is non-textual, this field represents the type. Current valid values are:
blank (this indicates the field_value is normal text)
unfilled_checkbox
filled_checkbox
- Type
- corrected_key_text¶
Created for Labeling UI to export key text. If corrections were made to the text identified by the field_name.text_anchor, this field will contain the correction.
- Type
- corrected_value_text¶
Created for Labeling UI to export value text. If corrections were made to the text identified by the field_value.text_anchor, this field will contain the correction.
- Type
- provenance¶
The history of this annotation.
- class Image(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Rendered image contents for this page.
- mime_type¶
Encoding media type (MIME type) for the image.
- Type
- class ImageQualityScores(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Image quality scores for the page image.
- detected_defects¶
A list of detected defects.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.ImageQualityScores.DetectedDefect]
- class DetectedDefect(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Image Quality Defects
- type_¶
Name of the defect type. Supported values are:
quality/defect_blurry
quality/defect_noisy
quality/defect_dark
quality/defect_faint
quality/defect_text_too_small
quality/defect_document_cutoff
quality/defect_text_cutoff
quality/defect_glare
- Type
- class Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Visual element describing a layout unit on a page.
- text_anchor¶
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- confidence¶
Confidence of the current [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] within context of the object this layout is for. For example, confidence can be for a single token, a table, a visual element, etc., depending on context. Range: [0, 1].
- Type
- bounding_poly¶
The bounding polygon for the [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout].
- orientation¶
Detected orientation for the [Layout][google.cloud.documentai.v1beta3.Document.Page.Layout].
- class Orientation(value)[source]¶
Bases:
proto.enums.Enum
Detected human reading orientation.
- Values:
- ORIENTATION_UNSPECIFIED (0):
Unspecified orientation.
- PAGE_UP (1):
Orientation is aligned with page up.
- PAGE_RIGHT (2):
Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read.
- PAGE_DOWN (3):
Orientation is aligned with page down. Turn the head 180 degrees from upright to read.
- PAGE_LEFT (4):
Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read.
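The head-turn descriptions above can be read as a lookup from orientation to a clockwise rotation. The following is a minimal sketch using a local mirror of the documented enum values; the `Orientation` class here is a stand-in defined for illustration, not the library class, and `clockwise_turn` is a hypothetical helper.

```python
from enum import IntEnum


class Orientation(IntEnum):
    """Local mirror of the documented enum values (illustrative stand-in)."""
    ORIENTATION_UNSPECIFIED = 0
    PAGE_UP = 1
    PAGE_RIGHT = 2
    PAGE_DOWN = 3
    PAGE_LEFT = 4


# Clockwise head turn, in degrees from upright, needed to read the text.
CLOCKWISE_TURN = {
    Orientation.PAGE_UP: 0,
    Orientation.PAGE_RIGHT: 90,
    Orientation.PAGE_DOWN: 180,
    Orientation.PAGE_LEFT: 270,  # same as 90 degrees counterclockwise
}


def clockwise_turn(orientation):
    """Return the clockwise turn in degrees, or None when unspecified."""
    return CLOCKWISE_TURN.get(Orientation(orientation))
```

Returning None for ORIENTATION_UNSPECIFIED leaves the fallback policy to the caller.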
- class Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta3.Document.Page.Line].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- provenance¶
The history of this annotation.
- class Matrix(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
- type_¶
This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html
- Type
- class Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A collection of lines that a human would perceive as a paragraph.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta3.Document.Page.Paragraph].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- provenance¶
The history of this annotation.
- class Symbol(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A detected symbol.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Symbol][google.cloud.documentai.v1beta3.Document.Page.Symbol].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- class Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A table representation similar to HTML table structure.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta3.Document.Page.Table].
- header_rows¶
Header rows of the table.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableRow]
- body_rows¶
Body rows of the table.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableRow]
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- provenance¶
The history of this table.
- class TableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A cell representation inside the table.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta3.Document.Page.Table.TableCell].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- class TableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A row of table cells.
- cells¶
Cells that make up this row.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.Table.TableCell]
- class Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A detected token.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta3.Document.Page.Token].
- detected_break¶
Detected break at the end of a [Token][google.cloud.documentai.v1beta3.Document.Page.Token].
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- provenance¶
The history of this annotation.
- style_info¶
Text style attributes.
- class DetectedBreak(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Detected break at the end of a [Token][google.cloud.documentai.v1beta3.Document.Page.Token].
- type_¶
Detected break type.
- class Type(value)[source]¶
Bases:
proto.enums.Enum
Enum to denote the type of break found.
- Values:
- TYPE_UNSPECIFIED (0):
Unspecified break type.
- SPACE (1):
A single whitespace.
- WIDE_SPACE (2):
A wider whitespace.
- HYPHEN (3):
A hyphen that indicates that a token has been split across lines.
- class StyleInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Font and other text style attributes.
- pixel_font_size¶
Font size in pixels, equal to unrounded [font_size][google.cloud.documentai.v1beta3.Document.Page.Token.StyleInfo.font_size] × resolution ÷ 72.0.
- Type
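As a worked example of the pixel size relation, the sketch below assumes that font_size is measured in points and that resolution is in dots per inch, so that dividing by 72.0 converts points to inches. The function and parameter names are hypothetical.

```python
def pixel_font_size(font_size_pt: float, resolution_dpi: float) -> float:
    """Convert a font size in points to pixels.

    Assumes 72 points per inch and a resolution given in dots per inch.
    """
    return font_size_pt * resolution_dpi / 72.0
```

Under these assumptions, 12 pt text rendered at 300 DPI is 50 pixels tall.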
- bold¶
Whether the text is bold (equivalent to [font_weight][google.cloud.documentai.v1beta3.Document.Page.Token.StyleInfo.font_weight] being at least 700).
- Type
- font_weight¶
TrueType weight on a scale from 100 (thin) to 1000 (ultra-heavy). Normal is 400, bold is 700.
- Type
- text_color¶
Color of the text.
- Type
google.type.color_pb2.Color
- background_color¶
Color of the background.
- Type
google.type.color_pb2.Color
- class VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Detected non-text visual elements e.g. checkbox, signature etc. on the page.
- layout¶
[Layout][google.cloud.documentai.v1beta3.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta3.Document.Page.VisualElement].
- type_¶
Type of the [VisualElement][google.cloud.documentai.v1beta3.Document.Page.VisualElement].
- Type
- detected_languages¶
A list of detected languages together with confidence.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Page.DetectedLanguage]
- class PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Referencing the visual context of the entity in the [Document.pages][google.cloud.documentai.v1beta3.Document.pages]. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.
- page_refs¶
One or more references to visual page elements
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.PageAnchor.PageRef]
- class PageRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Represents a weak reference to a page element within a document.
- page¶
Required. Index into the [Document.pages][google.cloud.documentai.v1beta3.Document.pages] element, for example using [Document.pages][page_refs.page] to locate the related page element. This field is skipped when its value is the default 0. See https://developers.google.com/protocol-buffers/docs/proto3#json.
- Type
- layout_type¶
Optional. The type of the layout element that is being referenced if any.
- layout_id¶
Optional. Deprecated. Use [PageRef.bounding_poly][google.cloud.documentai.v1beta3.Document.PageAnchor.PageRef.bounding_poly] instead.
- Type
- bounding_poly¶
Optional. Identifies the bounding polygon of a layout element on the page. If layout_type is set, the bounding polygon must be exactly the same as that of the layout element it refers to.
- class LayoutType(value)[source]¶
Bases:
proto.enums.Enum
The type of layout that is being referenced.
- Values:
- LAYOUT_TYPE_UNSPECIFIED (0):
Layout Unspecified.
- BLOCK (1):
References a [Page.blocks][google.cloud.documentai.v1beta3.Document.Page.blocks] element.
- PARAGRAPH (2):
References a [Page.paragraphs][google.cloud.documentai.v1beta3.Document.Page.paragraphs] element.
- LINE (3):
References a [Page.lines][google.cloud.documentai.v1beta3.Document.Page.lines] element.
- TOKEN (4):
References a [Page.tokens][google.cloud.documentai.v1beta3.Document.Page.tokens] element.
- VISUAL_ELEMENT (5):
References a [Page.visual_elements][google.cloud.documentai.v1beta3.Document.Page.visual_elements] element.
- TABLE (6):
References a [Page.tables][google.cloud.documentai.v1beta3.Document.Page.tables] element.
- FORM_FIELD (7):
References a [Page.form_fields][google.cloud.documentai.v1beta3.Document.Page.form_fields] element.
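Each LayoutType value above names one repeated field on Page, so resolving a PageRef amounts to a table lookup followed by a page index. The sketch below uses plain dicts as stand-ins for Page messages; `LAYOUT_TYPE_TO_PAGE_FIELD` and `candidate_elements` are hypothetical names introduced for illustration.

```python
# Maps each documented LayoutType name to the Page field it references.
LAYOUT_TYPE_TO_PAGE_FIELD = {
    "BLOCK": "blocks",
    "PARAGRAPH": "paragraphs",
    "LINE": "lines",
    "TOKEN": "tokens",
    "VISUAL_ELEMENT": "visual_elements",
    "TABLE": "tables",
    "FORM_FIELD": "form_fields",
}


def candidate_elements(pages, page_index, layout_type_name):
    """Return the element list on the referenced page for a layout type.

    `pages` is a list of dicts standing in for Document.pages.
    """
    field = LAYOUT_TYPE_TO_PAGE_FIELD.get(layout_type_name)
    if field is None:  # LAYOUT_TYPE_UNSPECIFIED or an unknown name
        return []
    return pages[page_index].get(field, [])
```

A caller could then compare each candidate's bounding polygon with the one carried by the PageRef to find the referenced element.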
- class Provenance(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Structure to identify provenance relationships between annotations in different revisions.
- parents¶
References to the original elements that are replaced.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Provenance.Parent]
- type_¶
The type of provenance operation.
- class OperationType(value)[source]¶
Bases:
proto.enums.Enum
If a processor or agent does an explicit operation on existing elements.
- Values:
- OPERATION_TYPE_UNSPECIFIED (0):
Operation type unspecified. If no operation is specified, a provenance entry is simply used to match against a parent.
- ADD (1):
Add an element.
- REMOVE (2):
Remove an element identified by parent.
- UPDATE (7):
Updates any fields within the given provenance scope of the message. It overwrites the fields rather than replacing them. Use this when you want to update a field value of an entity without also updating all the child properties.
- REPLACE (3):
Currently unused. Replace an element identified by parent.
- EVAL_REQUESTED (4):
Deprecated. Request human review for the element identified by parent.
- EVAL_APPROVED (5):
Deprecated. Element is reviewed and approved at human review; confidence will be set to 1.0.
- EVAL_SKIPPED (6):
Deprecated. Element is skipped in the validation process.
- class Parent(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.
- index¶
The index of the parent item in the corresponding item list (e.g. list of entities, properties within entities, etc.) in the parent revision.
- Type
- class Revision(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Contains past or forward revisions of this document.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- agent¶
If the change was made by a person, specify the name or ID of that person.
This field is a member of oneof source.
- Type
- processor¶
If the annotation was made by a processor, identify the processor by its resource name.
This field is a member of oneof source.
- Type
- id¶
Id of the revision, internally generated by doc proto storage. Unique within the context of the document.
- Type
- parent¶
The revisions that this revision is based on. This can include one or more parents (when documents are merged). This field represents the index into the revisions field.
- Type
MutableSequence[int]
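Because each entry in parent is an index into the revisions list, the full ancestry of a revision can be collected by following those indices. The sketch below uses a list of dicts as a stand-in for the revisions field; `ancestor_indices` is a hypothetical helper.

```python
def ancestor_indices(revisions, index):
    """Return the sorted indices of every revision that `index` is based on,
    directly or through intermediate revisions.

    `revisions` is a list of dicts, each with a "parent" list of indices.
    """
    seen = []
    stack = list(revisions[index]["parent"])
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # merged histories can reach the same parent twice
        seen.append(current)
        stack.extend(revisions[current]["parent"])
    return sorted(seen)
```

The visited check matters for merged documents, where two parents may share a common ancestor.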
- parent_ids¶
The revisions that this revision is based on. Must include all the IDs that have anything to do with this revision, e.g. there are provenance.parent.revision fields that index into this field.
- Type
MutableSequence[str]
- create_time¶
The time that the revision was created, internally generated by doc proto storage at the time of create.
- human_review¶
Human Review information of this revision.
- class HumanReview(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Human Review information of the document.
- class ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
- class Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
- text_anchor¶
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- color¶
Text color.
- Type
google.type.color_pb2.Color
- background_color¶
Text background color.
- Type
google.type.color_pb2.Color
- font_weight¶
Font weight. Possible values are normal, bold, bolder, and lighter.
- Type
- text_style¶
Text style. Possible values are normal, italic, and oblique.
- Type
- text_decoration¶
Text decoration. Follows CSS standard.
- Type
- font_size¶
Font size.
- font_family¶
Font family such as Arial, Times New Roman. https://www.w3schools.com/cssref/pr_font_font-family.asp
- Type
- class TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Text reference indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- text_segments¶
The text segments from the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.TextAnchor.TextSegment]
- content¶
Contains the content of the text span so that users do not have to look it up in the text_segments. It is always populated for formFields.
- Type
- class TextSegment(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A text segment in the [Document.text][google.cloud.documentai.v1beta3.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta3.Document.ShardInfo.text_offset].
- start_index¶
[TextSegment][google.cloud.documentai.v1beta3.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta3.Document.text].
- Type
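A text anchor is resolved by slicing the document text with each segment's indices and joining the pieces. The sketch below is a simplified illustration on plain Python values: it assumes each segment is a (start_index, end_index) pair with an exclusive end, and that the indices can be used directly as Python string offsets. It does not handle shard offsets; `anchor_text` is a hypothetical helper.

```python
def anchor_text(document_text, segments):
    """Concatenate the slices of `document_text` named by the segments.

    `segments` is an iterable of (start_index, end_index) pairs;
    end_index is assumed to be exclusive.
    """
    parts = []
    for start_index, end_index in segments:
        # Clamp so out-of-range indices yield only the text available here.
        start = min(max(start_index, 0), len(document_text))
        end = min(max(end_index, start), len(document_text))
        parts.append(document_text[start:end])
    return "".join(parts)
```

For instance, segments (0, 7) and (15, 17) over the text "Invoice total: 42 USD" select "Invoice" and "42".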
- class TextChange(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
This message is used for text changes, also known as OCR corrections.
- text_anchor¶
Provenance of the correction. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta3.Document.text]. There can only be a single TextAnchor.text_segments element. If the start and end index of the text segment are the same, the text change is inserted before that index.
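The replace-or-insert rule can be sketched on a plain string. This is an illustration only: `apply_text_change` and its parameters are hypothetical, the end index is assumed to be exclusive, and the indices are assumed to be usable as Python string offsets.

```python
def apply_text_change(text, start_index, end_index, changed_text):
    """Replace text[start_index:end_index] with `changed_text`.

    When start_index == end_index nothing is removed, so `changed_text`
    is inserted before that index.
    """
    if not 0 <= start_index <= end_index <= len(text):
        raise ValueError("segment indices out of range")
    return text[:start_index] + changed_text + text[end_index:]
```

With equal start and end indices the slice being replaced is empty, which is why the same expression covers both correction and insertion.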
- provenance¶
The history of this annotation.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Document.Provenance]
- class google.cloud.documentai_v1beta3.types.DocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Document Identifier.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- gcs_managed_doc_id¶
A document id within user-managed Cloud Storage.
This field is a member of oneof type.
- revision_ref¶
Points to a specific revision of the document if set.
- class GCSManagedDocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Identifies a document uniquely within the scope of a dataset in the user-managed Cloud Storage option.
- class UnmanagedDocumentId(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Identifies a document uniquely within the scope of a dataset in unmanaged option.
- class google.cloud.documentai_v1beta3.types.DocumentLabelingState(value)[source]¶
Bases:
proto.enums.Enum
Describes the labeling status of a document.
- Values:
- DOCUMENT_LABELING_STATE_UNSPECIFIED (0):
Default value if the enum is not set.
- DOCUMENT_LABELED (1):
Document has been labeled.
- DOCUMENT_UNLABELED (2):
Document has not been labeled.
- DOCUMENT_AUTO_LABELED (3):
Document has been auto-labeled.
- class google.cloud.documentai_v1beta3.types.DocumentMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata about a document.
- document_id¶
Document identifier.
- dataset_type¶
Type of the dataset split to which the document belongs.
- labeling_state¶
Labeling state of the document.
- class google.cloud.documentai_v1beta3.types.DocumentOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Config that controls the output of documents. All documents will be written as a JSON file.
- gcs_output_config¶
Output config to write the results to Cloud Storage.
This field is a member of oneof destination.
- class GcsOutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The configuration used when outputting documents.
- field_mask¶
Specifies which fields to include in the output documents. Only supports top-level document and pages fields, so it must be in the form of {document_field_name} or pages.{page_field_name}.
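The two accepted path shapes can be checked before a request is sent. The sketch below assumes the mask is available as a list of path strings and that field names are lowercase snake_case; neither assumption is stated in this reference, and `valid_field_mask_paths` is a hypothetical helper.

```python
import re

# One path: a bare top-level field name, or "pages." plus one page field name.
_PATH = re.compile(r"^(?:pages\.)?[a-z][a-z0-9_]*$")


def valid_field_mask_paths(paths):
    """Return True when every path is a top-level document field or a
    field exactly one level under pages."""
    return all(_PATH.match(path) is not None for path in paths)
```

Under these assumptions, paths such as text or pages.tokens pass, while a deeper path such as pages.tokens.layout does not.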
- sharding_config¶
Specifies the sharding config for the output document.
- class ShardingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The sharding config for the output document.
- class google.cloud.documentai_v1beta3.types.DocumentPageRange(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Range of pages present in a document.
- class google.cloud.documentai_v1beta3.types.DocumentSchema(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The schema defines the output of the processed document by a processor.
- entity_types¶
Entity types of the schema.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType]
- metadata¶
Metadata of the schema.
- class EntityType(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.
- enum_values¶
If specified, lists all the possible values for this entity. This should not be more than a handful of values. If the number of values is greater than 10 or could change frequently, use the EntityType.value_ontology field and specify a list of all possible values in a value ontology file.
This field is a member of oneof value_source.
- name¶
Name of the type. It must be unique within the schema file and cannot be a “Common Type”. The following naming conventions are used:
Use snake_casing.
Name matching is case-sensitive.
Maximum 64 characters.
Must start with a letter.
Allowed characters: ASCII letters [a-z0-9_-]. (For backward compatibility, internal infrastructure and tooling can handle any ASCII character.)
The / is sometimes used to denote a property of a type, for example line_item/amount. This convention is deprecated, but will still be honored for backward compatibility.
- Type
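The naming conventions for an entity type name can be approximated with a single regular expression. This is a strict sketch: it rejects the deprecated / form and any character outside [a-z0-9_-], even though the reference notes that tooling tolerates more for backward compatibility. `is_valid_entity_type_name` is a hypothetical helper.

```python
import re

# A lowercase letter first, then up to 63 more of [a-z0-9_-] (64 in total).
_NAME = re.compile(r"[a-z][a-z0-9_-]{0,63}")


def is_valid_entity_type_name(name):
    """Return True when `name` follows the documented naming conventions."""
    return _NAME.fullmatch(name) is not None
```

Uniqueness within the schema and the “Common Type” restriction depend on the rest of the schema and are not checked here.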
- description¶
The description of the entity type. Could be used to provide more information about the entity type for model calls.
- Type
- base_types¶
The entity type that this type is derived from. For now, one and only one should be set.
- Type
MutableSequence[str]
- properties¶
Describes the nested structure, or composition, of an entity.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.DocumentSchema.EntityType.Property]
- entity_type_metadata¶
Metadata for the entity type.
- class EnumValues(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Defines a list of enum values.
- class Property(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Defines properties that can be part of the entity type.
- description¶
The description of the property. Could be used to provide more information about the property for model calls.
- Type
- value_type¶
A reference to the value type of the property. This type is subject to the same conventions as the Entity.base_types field.
- Type
- occurrence_type¶
Occurrence type limits the number of instances an entity type appears in the document.
- property_metadata¶
Any additional metadata about the property can be added here.
- class OccurrenceType(value)[source]¶
Bases:
proto.enums.Enum
Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.
- Values:
- OCCURRENCE_TYPE_UNSPECIFIED (0):
Unspecified occurrence type.
- OPTIONAL_ONCE (1):
There will be zero or one instance of this entity type. The same entity instance may be mentioned multiple times.
- OPTIONAL_MULTIPLE (2):
The entity type will appear zero or multiple times.
- REQUIRED_ONCE (3):
The entity type will appear exactly once. The same entity instance may be mentioned multiple times.
- REQUIRED_MULTIPLE (4):
The entity type will appear one or more times.
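Each occurrence type above corresponds to a lower and upper bound on the number of entity instances (not mentions). The sketch below encodes those bounds by enum name; `OCCURRENCE_BOUNDS` and `count_is_allowed` are hypothetical names, and OCCURRENCE_TYPE_UNSPECIFIED is deliberately left out because the reference does not give it a bound.

```python
# (minimum, maximum) instance counts; None means no upper bound.
OCCURRENCE_BOUNDS = {
    "OPTIONAL_ONCE": (0, 1),
    "OPTIONAL_MULTIPLE": (0, None),
    "REQUIRED_ONCE": (1, 1),
    "REQUIRED_MULTIPLE": (1, None),
}


def count_is_allowed(occurrence_type, instance_count):
    """Return True when `instance_count` fits the named occurrence type."""
    low, high = OCCURRENCE_BOUNDS[occurrence_type]
    return instance_count >= low and (high is None or instance_count <= high)
```

In the bank statement example, a single account_number instance satisfies REQUIRED_ONCE no matter how many times it is mentioned.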
- class Metadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata for global schema behavior.
- document_splitter¶
If true, a document entity type can be applied to subdocument (splitting). Otherwise, it can only be applied to the entire document (classification).
- Type
- document_allow_multiple_labels¶
If true, on a given page, there can be multiple document annotations covering it.
- Type
- prefixed_naming_on_properties¶
If set, all the nested entities must be prefixed with the parents.
- Type
- class google.cloud.documentai_v1beta3.types.EnableProcessorMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.EnableProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method.
- class google.cloud.documentai_v1beta3.types.EnableProcessorResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [EnableProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.EnableProcessor] method. Intentionally empty proto for adding fields in future.
- class google.cloud.documentai_v1beta3.types.EntityTypeMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata about an entity type.
- class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata of the [EvaluateProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.EvaluateProcessorVersion] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Evaluates the given [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] against the supplied documents.
- processor_version¶
Required. The resource name of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to evaluate.
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
- evaluation_documents¶
Optional. The documents used in the evaluation. If unspecified, use the processor’s dataset as evaluation input.
- class google.cloud.documentai_v1beta3.types.EvaluateProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response of the [EvaluateProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.EvaluateProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.Evaluation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
An evaluation of a ProcessorVersion’s performance.
- name¶
The resource name of the evaluation. Format:
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processor_version}/evaluations/{evaluation}
- Type
- create_time¶
The time that the evaluation was created.
- document_counters¶
Counters for the documents used in the evaluation.
- all_entities_metrics¶
Metrics for all the entities in aggregate.
- entity_metrics¶
Metrics across confidence levels, for different entities.
- Type
MutableMapping[str, google.cloud.documentai_v1beta3.types.Evaluation.MultiConfidenceMetrics]
- class ConfidenceLevelMetrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Evaluation metrics at a specific confidence level.
- metrics¶
The metrics at the specific confidence level.
- class Counters(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Evaluation counters for the documents that were used.
- invalid_documents_count¶
How many documents were not included in the evaluation as they didn’t pass validation.
- Type
- failed_documents_count¶
How many documents were not included in the evaluation as Document AI failed to process them.
- Type
- class EntityMetricsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)¶
Bases:
proto.message.Message
- class Metrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Evaluation metrics, either in aggregate or about a specific entity.
- class MultiConfidenceMetrics(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metrics across multiple confidence levels.
- confidence_level_metrics¶
Metrics across confidence levels with fuzzy matching enabled.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation.ConfidenceLevelMetrics]
- confidence_level_metrics_exact¶
Metrics across confidence levels with only exact matching.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation.ConfidenceLevelMetrics]
- auprc¶
The calculated area under the precision recall curve (AUPRC), computed by integrating over all confidence thresholds.
- Type
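To give a sense of what integrating over confidence thresholds means, the sketch below computes the area under a precision-recall curve from a handful of (recall, precision) points, one per threshold. The trapezoidal rule is an assumption made for illustration; this reference does not say which integration method the service uses, and `auprc_trapezoid` is a hypothetical helper.

```python
def auprc_trapezoid(points):
    """Approximate the area under a precision-recall curve.

    `points` is an iterable of (recall, precision) pairs, one per
    confidence threshold, in any order.
    """
    ordered = sorted(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(ordered, ordered[1:]):
        # Trapezoid between two neighbouring recall values.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area
```

A single point contributes no area, so a meaningful estimate needs metrics at several confidence levels.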
- estimated_calibration_error¶
The Estimated Calibration Error (ECE) of the confidence of the predicted entities.
- Type
- auprc_exact¶
The AUPRC for metrics with fuzzy matching disabled, i.e., exact matching only.
- Type
- estimated_calibration_error_exact¶
The ECE for the predicted entities with fuzzy matching disabled, i.e., exact matching only.
- Type
- metrics_type¶
The metrics type for the label.
- class MetricsType(value)[source]¶
Bases:
proto.enums.Enum
A type that determines how metrics should be interpreted.
- Values:
- METRICS_TYPE_UNSPECIFIED (0):
The metrics type is unspecified. By default, metrics without a particular specification are for leaf entity types (i.e., top-level entity types without child types, or child types which are not parent types themselves).
- AGGREGATE (1):
Indicates whether metrics for this particular label type represent an aggregate of metrics for other types instead of being based on actual TP/FP/FN values for the label type. Metrics for parent (i.e., non-leaf) entity types are an aggregate of metrics for their children.
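The auprc field of MultiConfidenceMetrics is described above as an integral over all confidence thresholds. As a rough illustration of what such an integral looks like, the sketch below applies the trapezoidal rule to a list of (recall, precision) points, one per threshold. The function name and the choice of the trapezoidal rule are assumptions made for this example; the source does not state how the service performs the integration.

```python
from typing import Iterable, Tuple


def auprc_trapezoid(points: Iterable[Tuple[float, float]]) -> float:
    """Approximate the area under a precision-recall curve.

    `points` holds (recall, precision) pairs, one per confidence threshold.
    The trapezoidal rule is an assumption made for illustration only.
    """
    ordered = sorted(points)  # integrate along increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(ordered, ordered[1:]):
        # Area of the trapezoid between two neighbouring recall values.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area
```

A curve with perfect precision at every recall level gives an area of 1.0, and a single point gives 0.0 because there is no interval to integrate over.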
- class google.cloud.documentai_v1beta3.types.EvaluationReference(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Gives a short summary of an evaluation, and links to the evaluation itself.
- aggregate_metrics¶
An aggregate of the statistics for the evaluation with fuzzy matching on.
- aggregate_metrics_exact¶
An aggregate of the statistics for the evaluation with fuzzy matching off.
- class google.cloud.documentai_v1beta3.types.FetchProcessorTypesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes] method. Some processor types may require the project be added to an allowlist.
- class google.cloud.documentai_v1beta3.types.FetchProcessorTypesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes] method.
- processor_types¶
The list of processor types.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType]
- class google.cloud.documentai_v1beta3.types.FieldExtractionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata for how this field value is extracted.
- summary_options¶
Summary options config.
- class google.cloud.documentai_v1beta3.types.GcsDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Specifies a document stored on Cloud Storage.
- class google.cloud.documentai_v1beta3.types.GcsDocuments(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Specifies a set of documents on Cloud Storage.
- documents¶
The list of documents.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.GcsDocument]
- class google.cloud.documentai_v1beta3.types.GcsPrefix(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Specifies all documents on Cloud Storage with a common prefix.
- class google.cloud.documentai_v1beta3.types.GetDatasetSchemaRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request for GetDatasetSchema.
- name¶
Required. The dataset schema resource name. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset/datasetSchema
- Type
- class google.cloud.documentai_v1beta3.types.GetDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- dataset¶
Required. The resource name of the dataset that the document belongs to. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset
- Type
- document_id¶
Required. Document identifier.
- read_mask¶
If set, only fields listed here will be returned. Otherwise, all fields will be returned by default.
- page_range¶
List of pages for which the fields specified in the read_mask must be served.
- class google.cloud.documentai_v1beta3.types.GetDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- document¶
- class google.cloud.documentai_v1beta3.types.GetEvaluationRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Retrieves a specific Evaluation.
- class google.cloud.documentai_v1beta3.types.GetProcessorRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [GetProcessor][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessor] method.
- class google.cloud.documentai_v1beta3.types.GetProcessorTypeRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [GetProcessorType][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessorType] method.
- class google.cloud.documentai_v1beta3.types.GetProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [GetProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.GetProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.HumanReviewStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The status of human review on a processed document.
- state¶
The state of human review on the processing request.
- human_review_operation¶
The name of the operation triggered by the processed document. This field is populated only when the [state][google.cloud.documentai.v1beta3.HumanReviewStatus.state] is HUMAN_REVIEW_IN_PROGRESS. It has the same response type and metadata as the long-running operation returned by [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument].
- Type
- class State(value)[source]¶
Bases:
proto.enums.Enum
The final state of human review on a processed document.
- Values:
- STATE_UNSPECIFIED (0):
Human review state is unspecified. Most likely due to an internal error.
- SKIPPED (1):
Human review is skipped for the document. This can happen because human review isn’t enabled on the processor or the processing request has been set to skip this document.
- VALIDATION_PASSED (2):
Human review validation is triggered and passed, so no review is needed.
- IN_PROGRESS (3):
Human review validation is triggered and the document is under review.
- ERROR (4):
An error occurred while triggering human review. See the [state_message][google.cloud.documentai.v1beta3.HumanReviewStatus.state_message] for details.
- class google.cloud.documentai_v1beta3.types.ImportDocumentsMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata of the import document operation.
- common_metadata¶
The basic metadata of the long-running operation.
- individual_import_statuses¶
The list of response details of each document.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ImportDocumentsMetadata.IndividualImportStatus]
- import_config_validation_results¶
Validation statuses of the batch documents import config.
- class ImportConfigValidationResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The validation status of each import config. Status is set to an error if there are no documents to import in the import_config, or OK if the operation will try to proceed with at least one document.
- status¶
The validation status of import config.
- Type
google.rpc.status_pb2.Status
- class IndividualImportStatus(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The status of each individual document in the import process.
- status¶
The status of the importing of the document.
- Type
google.rpc.status_pb2.Status
- output_document_id¶
The document ID of the imported document if the import succeeded; otherwise empty.
- class google.cloud.documentai_v1beta3.types.ImportDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- dataset¶
Required. The dataset resource name. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset
- Type
- batch_documents_import_configs¶
Required. The Cloud Storage URI containing raw documents that must be imported.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ImportDocumentsRequest.BatchDocumentsImportConfig]
- class BatchDocumentsImportConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Config for importing documents. Each batch can have its own dataset split type.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- dataset_split¶
Target dataset split where the documents must be stored.
This field is a member of oneof split_type_config.
- auto_split_config¶
If set, documents will be automatically split into training and test split category with the specified ratio.
This field is a member of oneof split_type_config.
- batch_input_config¶
The common config to specify a set of documents used as input.
- class AutoSplitConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The config for auto-split.
- class google.cloud.documentai_v1beta3.types.ImportDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response of the import document operation.
- class google.cloud.documentai_v1beta3.types.ImportProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.
- common_metadata¶
The basic metadata for the long-running operation.
- class google.cloud.documentai_v1beta3.types.ImportProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The request message for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.
The Document AI Service Agent of the destination project must have Document AI Editor role on the source project.
The destination project is specified as part of the [parent][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.parent] field. The source project is specified as part of the [source][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.processor_version_source] or [external_processor_version_source][google.cloud.documentai.v1beta3.ImportProcessorVersionRequest.external_processor_version_source] field.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- processor_version_source¶
The source processor version to import from. The source processor version and destination processor need to be in the same environment and region. Note that ProcessorVersions with model_type MODEL_TYPE_LLM are not supported.
This field is a member of oneof source.
- Type
- external_processor_version_source¶
The source processor version to import from. It can be from a different environment and region than the destination processor.
This field is a member of oneof source.
- parent¶
Required. The destination processor name to create the processor version in. Format:
projects/{project}/locations/{location}/processors/{processor}
- Type
- class ExternalProcessorVersionSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The external source processor version.
- processor_version¶
Required. The processor version name. Format:
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
- service_endpoint¶
Optional. The Document AI service endpoint. For example, ‘https://us-documentai.googleapis.com’
- Type
- class google.cloud.documentai_v1beta3.types.ImportProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The response message for the [ImportProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.ImportProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.ListDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- dataset¶
Required. The resource name of the dataset to be listed. Format:
projects/{project}/locations/{location}/processors/{processor}/dataset
- Type
- page_size¶
The maximum number of documents to return. The service may return fewer than this value. If unspecified, at most 20 documents will be returned. The maximum value is 100; values above 100 will be coerced to 100.
- Type
- page_token¶
A page token, received from a previous ListDocuments call. Provide this to retrieve the subsequent page.
When paginating, all other parameters provided to ListDocuments must match the call that provided the page token.
- Type
- filter¶
Optional. Query to filter the documents based on https://google.aip.dev/160.
Currently supported query strings are:
SplitType=DATASET_SPLIT_TEST|DATASET_SPLIT_TRAIN|DATASET_SPLIT_UNASSIGNED
LabelingState=DOCUMENT_LABELED|DOCUMENT_UNLABELED|DOCUMENT_AUTO_LABELED
DisplayName="file_name.pdf"
EntityType=abc/def
TagName="auto-labeling-running"|"sampled"
Note:
Only AND, = and != are supported. For example, DisplayName=file_name AND EntityType!=abc is supported.
The wildcard * is supported only in the DisplayName filter.
No duplicate filter keys are allowed. For example, EntityType=a AND EntityType=b is NOT supported.
String match is case sensitive (for the DisplayName and EntityType filters).
- Type
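The notes on the filter field above can be checked before a request is sent. The following is a minimal sketch that assembles a filter string from (key, operator, value) triples and rejects the cases the notes rule out. The helper name build_filter and the decision to raise ValueError are assumptions for illustration; the sketch does not call the client library and does not validate key names or values against the service.

```python
from typing import Iterable, Tuple

SUPPORTED_OPS = ("=", "!=")


def build_filter(conditions: Iterable[Tuple[str, str, str]]) -> str:
    """Join (key, op, value) triples with AND, enforcing the documented notes."""
    seen = set()
    parts = []
    for key, op, value in conditions:
        if op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        if key in seen:
            # "No duplicate filter keys are allowed."
            raise ValueError(f"duplicate filter key: {key!r}")
        if "*" in value and key != "DisplayName":
            # The wildcard is documented for DisplayName only.
            raise ValueError("wildcard * is only supported in DisplayName")
        seen.add(key)
        parts.append(f"{key}{op}{value}")
    return " AND ".join(parts)
```

Passing the two conditions from the note, ("DisplayName", "=", "file_name") and ("EntityType", "!=", "abc"), yields the string DisplayName=file_name AND EntityType!=abc.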
- return_total_size¶
Optional. Controls if the request requires a total size of matched documents. See [ListDocumentsResponse.total_size][google.cloud.documentai.v1beta3.ListDocumentsResponse.total_size].
Enabling this flag may adversely impact performance.
Defaults to false.
- Type
- skip¶
Optional. Number of results to skip beginning from the page_token if provided. https://google.aip.dev/158#skipping-results. It must be a non-negative integer. Negative values will be rejected. Note that this is not the number of pages to skip. If this value causes the cursor to move past the end of results, [ListDocumentsResponse.document_metadata][google.cloud.documentai.v1beta3.ListDocumentsResponse.document_metadata] and [ListDocumentsResponse.next_page_token][google.cloud.documentai.v1beta3.ListDocumentsResponse.next_page_token] will be empty.
- Type
- class google.cloud.documentai_v1beta3.types.ListDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- document_metadata¶
Document metadata corresponding to the listed documents.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.DocumentMetadata]
- next_page_token¶
A token, which can be sent as [ListDocumentsRequest.page_token][google.cloud.documentai.v1beta3.ListDocumentsRequest.page_token] to retrieve the next page. If this field is omitted, there are no subsequent pages.
- Type
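The page_token and next_page_token fields describe a token-based loop: send the token from the previous response and stop once the returned token is empty. A minimal sketch of that loop follows. It runs against an in-memory stand-in rather than the service; fetch_page, make_fake_fetch, and the use of a stringified offset as the token are all hypothetical and exist only to make the example self-contained.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical callable: (page_token, page_size) -> (items, next_page_token).
FetchPage = Callable[[str, int], Tuple[List[str], str]]


def iterate_all(fetch_page: FetchPage, page_size: int = 20) -> Iterator[str]:
    """Yield every item by following next_page_token until it is empty."""
    token = ""
    while True:
        items, token = fetch_page(token, page_size)
        yield from items
        if not token:  # an empty token means there are no subsequent pages
            break


def make_fake_fetch(data: List[str]) -> FetchPage:
    """In-memory stand-in for a list call; the token is a stringified offset."""
    def fetch(page_token: str, page_size: int) -> Tuple[List[str], str]:
        start = int(page_token) if page_token else 0
        end = start + page_size
        next_token = str(end) if end < len(data) else ""
        return data[start:end], next_token
    return fetch
```

With 45 items and a page size of 20, the loop makes three calls and yields the items in their original order.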
- class google.cloud.documentai_v1beta3.types.ListEvaluationsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Retrieves a list of evaluations for a given [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].
- parent¶
Required. The resource name of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to list evaluations for.
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
- page_size¶
The standard list page size. If unspecified, at most 5 evaluations are returned. The maximum value is 100. Values above 100 are coerced to 100.
- Type
- class google.cloud.documentai_v1beta3.types.ListEvaluationsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The response from ListEvaluations.
- evaluations¶
The evaluations requested.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Evaluation]
- class google.cloud.documentai_v1beta3.types.ListProcessorTypesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [ListProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorTypes] method. Some processor types may require the project be added to an allowlist.
- parent¶
Required. The location of processor types to list. Format:
projects/{project}/locations/{location}
- Type
- page_size¶
The maximum number of processor types to return. If unspecified, at most 100 processor types will be returned. The maximum value is 500. Values above 500 will be coerced to 500.
- Type
- class google.cloud.documentai_v1beta3.types.ListProcessorTypesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [ListProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorTypes] method.
- processor_types¶
The processor types.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType]
- class google.cloud.documentai_v1beta3.types.ListProcessorVersionsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for listing all processor versions that belong to a processor.
- parent¶
Required. The parent (project, location and processor) to list all versions. Format:
projects/{project}/locations/{location}/processors/{processor}
- Type
- page_size¶
The maximum number of processor versions to return. If unspecified, at most 10 processor versions will be returned. The maximum value is 20. Values above 20 will be coerced to 20.
- Type
- class google.cloud.documentai_v1beta3.types.ListProcessorVersionsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [ListProcessorVersions][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessorVersions] method.
- processor_versions¶
The list of processor versions.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorVersion]
- class google.cloud.documentai_v1beta3.types.ListProcessorsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for listing all processors that belong to a project.
- parent¶
Required. The parent (project and location) which owns this collection of Processors. Format:
projects/{project}/locations/{location}
- Type
- page_size¶
The maximum number of processors to return. If unspecified, at most 50 processors will be returned. The maximum value is 100. Values above 100 will be coerced to 100.
- Type
- class google.cloud.documentai_v1beta3.types.ListProcessorsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [ListProcessors][google.cloud.documentai.v1beta3.DocumentProcessorService.ListProcessors] method.
- processors¶
The list of processors.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.Processor]
- class google.cloud.documentai_v1beta3.types.NormalizedVertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
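Because normalized coordinates range from 0 to 1 relative to the original image, mapping them back to pixels is a multiplication by the image dimensions. A minimal sketch, assuming the caller already knows the image width and height in pixels; the function name and the rounding to whole pixels are choices made for this example only.

```python
from typing import Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Scale normalized coordinates in [0, 1] to pixel coordinates."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Rounding to the nearest pixel is an illustrative choice.
    return round(x * width), round(y * height)
```

For a 1000 by 800 pixel image, the normalized point (0.5, 0.25) maps to pixel (500, 200).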
- class google.cloud.documentai_v1beta3.types.OcrConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Config for Document OCR.
- hints¶
Hints for the OCR model.
- enable_native_pdf_parsing¶
Enables special handling for PDFs with existing text information. Results in better text extraction quality in such PDF inputs.
- Type
- enable_image_quality_scores¶
Enables intelligent document quality scores after OCR. Can help with diagnosing why OCR responses are of poor quality for a given input. Adds additional latency comparable to regular OCR to the process call.
- Type
- advanced_ocr_options¶
A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are:
legacy_layout: a heuristics layout detection algorithm, which serves as an alternative to the current ML-based layout detection algorithm. Customers can choose the best suitable layout algorithm based on their situation.
- Type
MutableSequence[str]
- compute_style_info¶
Turn on font identification model and return font style information. Deprecated, use [PremiumFeatures.compute_style_info][google.cloud.documentai.v1beta3.OcrConfig.PremiumFeatures.compute_style_info] instead.
- Type
- disable_character_boxes_detection¶
Turn off character box detector in OCR engine. Character box detection is enabled by default in OCR 2.0 (and later) processors.
- Type
- premium_features¶
Configurations for premium OCR features.
- class Hints(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Hints for OCR Engine
- language_hints¶
List of BCP-47 language codes to use for OCR. In most cases, not specifying it yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong).
- Type
MutableSequence[str]
- class PremiumFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Configurations for premium OCR features.
- enable_selection_mark_detection¶
Turn on selection mark detector in OCR engine. Only available in OCR 2.0 (and later) processors.
- Type
- class google.cloud.documentai_v1beta3.types.ProcessOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Options for Process API
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- individual_page_selector¶
Which pages to process (1-indexed).
This field is a member of oneof page_range.
- from_start¶
Only process certain pages from the start. Process all if the document has fewer pages.
This field is a member of oneof page_range.
- Type
- from_end¶
Only process certain pages from the end, same as above.
This field is a member of oneof page_range.
- Type
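The three page_range members (individual_page_selector, from_start, from_end) are mutually exclusive, and the two count-based members fall back to the whole document when it has fewer pages than requested. The sketch below models that selection in plain Python. The function and parameter names are hypothetical, and raising an error when more than one member is given is a choice made for clarity; in the message itself, setting one oneof member clears the others.

```python
from typing import List, Optional, Sequence


def select_pages(
    total_pages: int,
    *,
    individual_pages: Optional[Sequence[int]] = None,
    from_start: Optional[int] = None,
    from_end: Optional[int] = None,
) -> List[int]:
    """Return the 1-indexed pages selected by at most one page_range member."""
    chosen = [v for v in (individual_pages, from_start, from_end) if v is not None]
    if len(chosen) > 1:
        raise ValueError("page_range is a oneof: set at most one member")
    all_pages = list(range(1, total_pages + 1))
    if individual_pages is not None:
        return list(individual_pages)
    if from_start is not None:
        # Slicing past the end returns every page, matching "process all".
        return all_pages[:from_start]
    if from_end is not None:
        return all_pages[-from_end:] if from_end > 0 else []
    return all_pages
```

On a five-page document, from_start=2 selects pages 1 and 2, and from_end=2 selects pages 4 and 5.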
- ocr_config¶
Only applicable to OCR_PROCESSOR and FORM_PARSER_PROCESSOR. Returns error if set on other processor types.
- layout_config¶
Optional. Only applicable to LAYOUT_PARSER_PROCESSOR. Returns error if set on other processor types.
- schema_override¶
Optional. Override the schema of the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion]. Will return an Invalid Argument error if this field is set when the underlying [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] doesn’t support schema override.
- class IndividualPageSelector(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A list of individual page numbers.
- class LayoutConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Serving config for layout parser processor.
- chunking_config¶
Optional. Config for chunking in layout parser processor.
- class ChunkingConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Serving config for chunking.
- include_ancestor_headings¶
Optional. Whether or not to include ancestor headings when splitting.
- Type
- semantic_chunking_group_size¶
Optional. The number of tokens to group together when evaluating semantic similarity.
- Type
- class google.cloud.documentai_v1beta3.types.ProcessRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [ProcessDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ProcessDocument] method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- name¶
Required. The resource name of the [Processor][google.cloud.documentai.v1beta3.Processor] or [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion] to use for processing. If a [Processor][google.cloud.documentai.v1beta3.Processor] is specified, the server will use its [default version][google.cloud.documentai.v1beta3.Processor.default_processor_version]. Format:
projects/{project}/locations/{location}/processors/{processor}, or
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
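Since the name field accepts either a processor name or a processor version name, it can be useful to tell the two apart before sending a request. The sketch below parses both documented formats with a regular expression. The helper name and the returned dictionary keys are assumptions for illustration; the sketch checks only the shape of the string, not whether the resource exists.

```python
import re
from typing import Dict, Optional

# Matches both documented formats; the processorVersions segment is optional.
_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)"
    r"(?:/processorVersions/(?P<processor_version>[^/]+))?$"
)


def parse_processor_name(name: str) -> Dict[str, Optional[str]]:
    """Split a processor or processor version resource name into its parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a processor or processor version name: {name!r}")
    return match.groupdict()
```

When the result has processor_version set to None, the name refers to a processor, in which case the description above says the server uses the default version.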
- document¶
The document payload. The [content][google.cloud.documentai.v1beta3.Document.content] and [mime_type][google.cloud.documentai.v1beta3.Document.mime_type] fields must be set.
- skip_human_review¶
Whether human review should be skipped for this request. Defaults to false.
- Type
- field_mask¶
Specifies which fields to include in the [ProcessResponse.document][google.cloud.documentai.v1beta3.ProcessResponse.document] output. Only supports top-level document and pages field, so it must be in the form of {document_field_name} or pages.{page_field_name}.
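The two accepted path shapes for field_mask, {document_field_name} and pages.{page_field_name}, can be checked with a few lines of string handling. This is a minimal sketch; the function name is hypothetical, and it checks only the shape of a path, not whether the named field exists on the document.

```python
def is_supported_mask_path(path: str) -> bool:
    """Return True for `{document_field_name}` or `pages.{page_field_name}`."""
    parts = path.split(".")
    if any(not part for part in parts):
        return False  # empty path or empty segment such as "pages."
    if len(parts) == 1:
        return True  # top-level document field
    # Deeper paths are accepted only one level below "pages".
    return len(parts) == 2 and parts[0] == "pages"
```

Under this check, text and pages.layout pass, while entities.type and pages.layout.text_anchor do not.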
- process_options¶
Inference-time options for the process API
- labels¶
Optional. The labels with user-defined metadata for the request. Label keys and values can be no longer than 63 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter.
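The label constraints above can be approximated on the client side before a request is built. The sketch below is one possible reading of those rules using Python's Unicode-aware string methods; the helper names are hypothetical, and the character classification (str.isalpha, str.isdigit) is an approximation that may differ from what the service actually accepts.

```python
from typing import Dict, List

MAX_LEN = 63  # measured in Unicode code points, as len() does for str


def _chars_ok(text: str) -> bool:
    # Letters must be lowercase; digits, underscores and dashes are allowed.
    # str.isalpha() is Unicode-aware, which covers international characters.
    return all(
        ch in "_-" or ch.isdigit() or (ch.isalpha() and ch == ch.lower())
        for ch in text
    )


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for key, value in labels.items():
        if not key or not key[0].isalpha():
            problems.append(f"{key!r}: key must start with a letter")
        if len(key) > MAX_LEN or len(value) > MAX_LEN:
            problems.append(f"{key!r}: key and value are limited to {MAX_LEN} characters")
        if not (_chars_ok(key) and _chars_ok(value)):
            problems.append(f"{key!r}: unsupported character")
    return problems
```

An empty value is accepted because label values are optional, whereas a key such as 1team or Team is reported.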
- class LabelsEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)¶
Bases:
proto.message.Message
- class google.cloud.documentai_v1beta3.types.ProcessResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [ProcessDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ProcessDocument] method.
- document¶
The document payload. Fields are populated based on the processor’s behavior.
- human_review_operation¶
The name of the operation triggered by the processed document. If the human review process isn’t triggered, this field is empty. It has the same response type and metadata as the long-running operation returned by [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument].
- Type
- human_review_status¶
The status of human review on the processed document.
- class google.cloud.documentai_v1beta3.types.Processor(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.
- name¶
Output only. Immutable. The resource name of the processor. Format:
projects/{project}/locations/{location}/processors/{processor}
- Type
- type_¶
The processor type, such as: OCR_PROCESSOR, INVOICE_PROCESSOR. To get a list of processor types, see [FetchProcessorTypes][google.cloud.documentai.v1beta3.DocumentProcessorService.FetchProcessorTypes].
- Type
- state¶
Output only. The state of the processor.
- processor_version_aliases¶
Output only. The processor version aliases.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorVersionAlias]
- process_endpoint¶
Output only. Immutable. The HTTP endpoint that can be called to invoke processing.
- Type
- create_time¶
The time the processor was created.
- class State(value)[source]¶
Bases:
proto.enums.Enum
The possible states of the processor.
- Values:
- STATE_UNSPECIFIED (0):
The processor is in an unspecified state.
- ENABLED (1):
The processor is enabled, i.e., has an enabled version which can currently serve processing requests and all the feature dependencies have been successfully initialized.
- DISABLED (2):
The processor is disabled.
- ENABLING (3):
The processor is being enabled, will become ENABLED if successful.
- DISABLING (4):
The processor is being disabled, will become DISABLED if successful.
- CREATING (5):
The processor is being created, will become either ENABLED (for successful creation) or FAILED (for failed ones). Once a processor is in this state, it can then be used for document processing, but the feature dependencies of the processor might not be fully created yet.
- FAILED (6):
The processor failed during creation or initialization of feature dependencies. The user should delete the processor and recreate one as all the functionalities of the processor are disabled.
- DELETING (7):
The processor is being deleted, will be removed if successful.
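Several of the Processor.State values above are transitional and name the state they settle into. The sketch below mirrors the listed values in a local IntEnum and records those documented outcomes in a lookup table. The class, table, and function names are hypothetical and stand apart from the library's own enum; the table covers only the outcomes the descriptions state explicitly.

```python
from enum import IntEnum
from typing import Dict, Tuple


class ProcessorState(IntEnum):
    """Local mirror of the Processor.State values listed above."""
    STATE_UNSPECIFIED = 0
    ENABLED = 1
    DISABLED = 2
    ENABLING = 3
    DISABLING = 4
    CREATING = 5
    FAILED = 6
    DELETING = 7


# Transitional state -> states it may settle into, per the descriptions above.
SETTLES_INTO: Dict[ProcessorState, Tuple[ProcessorState, ...]] = {
    ProcessorState.ENABLING: (ProcessorState.ENABLED,),
    ProcessorState.DISABLING: (ProcessorState.DISABLED,),
    ProcessorState.CREATING: (ProcessorState.ENABLED, ProcessorState.FAILED),
}


def is_transitional(state: ProcessorState) -> bool:
    """True while the processor is still moving toward another state."""
    # DELETING has no follow-up state: the processor is removed on success.
    return state in SETTLES_INTO or state is ProcessorState.DELETING
```

A caller polling a processor could use is_transitional to decide whether to keep waiting, and SETTLES_INTO to know which outcomes to expect.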
- class google.cloud.documentai_v1beta3.types.ProcessorType(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A processor type is responsible for performing a certain document understanding task on a certain type of document.
- name¶
The resource name of the processor type. Format:
projects/{project}/processorTypes/{processor_type}
- Type
- available_locations¶
The locations in which this processor is available.
- Type
MutableSequence[google.cloud.documentai_v1beta3.types.ProcessorType.LocationInfo]
- allow_creation¶
Whether the processor type allows creation. If true, users can create a processor of this processor type. Otherwise, users need to request access.
- Type
- launch_stage¶
Launch stage of the processor type
- Type
google.api.launch_stage_pb2.LaunchStage
- sample_document_uris¶
A set of Cloud Storage URIs of sample documents for this processor.
- Type
MutableSequence[str]
- class LocationInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The location information about where the processor is available.
- location_id¶
The location ID. For supported locations, refer to regional and multi-regional support.
- Type
- class google.cloud.documentai_v1beta3.types.ProcessorVersion(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.
- name¶
Identifier. The resource name of the processor version. Format:
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processor_version}
- Type
- document_schema¶
The schema of the processor version. Describes the output.
- state¶
Output only. The state of the processor version.
- create_time¶
The time the processor version was created.
- latest_evaluation¶
The most recently invoked evaluation for the processor version.
- deprecation_info¶
If set, information about the eventual deprecation of this version.
- model_type¶
Output only. The model type of this processor version.
- gen_ai_model_info¶
Output only. Information about Generative AI model-based processor versions.
- class DeprecationInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Information about the upcoming deprecation of this processor version.
- deprecation_time¶
The time at which this processor version will be deprecated.
- class GenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Information about Generative AI model-based processor versions.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- foundation_gen_ai_model_info¶
Information for a pretrained Google-managed foundation model.
This field is a member of oneof model_info.
- custom_gen_ai_model_info¶
Information for a custom Generative AI model created by the user.
This field is a member of oneof model_info.
- class CustomGenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Information for a custom Generative AI model created by the user. These are created with Create New Version in either the Call foundation model or Fine tuning tabs.
- custom_model_type¶
The type of custom model created by the user.
- class CustomModelType(value)[source]¶
Bases:
proto.enums.Enum
The type of custom model created by the user.
- Values:
- CUSTOM_MODEL_TYPE_UNSPECIFIED (0):
The model type is unspecified.
- VERSIONED_FOUNDATION (1):
The model is a versioned foundation model.
- FINE_TUNED (2):
The model is a fine-tuned foundation model.
- class FoundationGenAiModelInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Information for a pretrained Google-managed foundation model.
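The oneof rule described for GenAiModelInfo (setting any member clears the others) can be illustrated with a toy class. This is a minimal sketch using only the standard library; `GenAiModelInfoSketch` and `which_model_info` are hypothetical names invented for the illustration, and the class only imitates the documented behaviour rather than reproducing how the library implements it.

```python
class GenAiModelInfoSketch:
    """Toy stand-in that imitates the oneof group named model_info."""

    _MODEL_INFO_MEMBERS = (
        "foundation_gen_ai_model_info",
        "custom_gen_ai_model_info",
    )

    def __init__(self):
        # Start with every member of the oneof unset.
        for member in self._MODEL_INFO_MEMBERS:
            object.__setattr__(self, member, None)

    def __setattr__(self, name, value):
        if name not in self._MODEL_INFO_MEMBERS:
            raise AttributeError(f"unknown field: {name}")
        # Setting one member clears every other member of the oneof.
        for member in self._MODEL_INFO_MEMBERS:
            object.__setattr__(self, member, None)
        object.__setattr__(self, name, value)

    def which_model_info(self):
        """Return the name of the member that is currently set, or None."""
        for member in self._MODEL_INFO_MEMBERS:
            if getattr(self, member) is not None:
                return member
        return None


info = GenAiModelInfoSketch()
info.foundation_gen_ai_model_info = "placeholder foundation info"
info.custom_gen_ai_model_info = "placeholder custom info"
# Only the member assigned last remains set.
print(info.which_model_info())
```

The placeholder strings stand in for the nested FoundationGenAiModelInfo and CustomGenAiModelInfo messages.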
- class ModelType(value)[source]¶
Bases:
proto.enums.Enum
The possible model types of the processor version.
- Values:
- MODEL_TYPE_UNSPECIFIED (0):
The processor version has unspecified model type.
- MODEL_TYPE_GENERATIVE (1):
The processor version has generative model type.
- MODEL_TYPE_CUSTOM (2):
The processor version has custom model type.
- class State(value)[source]¶
Bases:
proto.enums.Enum
The possible states of the processor version.
- Values:
- STATE_UNSPECIFIED (0):
The processor version is in an unspecified state.
- DEPLOYED (1):
The processor version is deployed and can be used for processing.
- DEPLOYING (2):
The processor version is being deployed.
- UNDEPLOYED (3):
The processor version is not deployed and cannot be used for processing.
- UNDEPLOYING (4):
The processor version is being undeployed.
- CREATING (5):
The processor version is being created.
- DELETING (6):
The processor version is being deleted.
- FAILED (7):
The processor version failed and is in an indeterminate state.
- IMPORTING (8):
The processor version is being imported.
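The resource name format and the State values documented for ProcessorVersion lend themselves to a small helper. The following is a minimal sketch using only the standard library. `parse_processor_version_name` and `can_process` are hypothetical helper names, and the local `State` enum is a stand-in that mirrors the values listed above rather than the library's own nested enum. Only DEPLOYED is documented as usable for processing; treating every other state as unusable is an assumption of this sketch.

```python
import enum
import re

# Pattern for the documented format:
# projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processor_version}
_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)"
    r"/processorVersions/(?P<processor_version>[^/]+)$"
)


class State(enum.IntEnum):
    """Local stand-in mirroring the documented State values."""

    STATE_UNSPECIFIED = 0
    DEPLOYED = 1
    DEPLOYING = 2
    UNDEPLOYED = 3
    UNDEPLOYING = 4
    CREATING = 5
    DELETING = 6
    FAILED = 7
    IMPORTING = 8


def parse_processor_version_name(name: str) -> dict:
    """Split a processor version resource name into its path components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a processor version resource name: {name!r}")
    return match.groupdict()


def can_process(state: State) -> bool:
    """Return True only for the state documented as usable for processing."""
    return state == State.DEPLOYED
```

A caller might parse the `name` field once and log the components, then check `can_process` before sending a document to that version.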
- class google.cloud.documentai_v1beta3.types.ProcessorVersionAlias(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Contains the alias and the aliased resource name of a processor version.
- class google.cloud.documentai_v1beta3.types.PropertyMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata about a property.
- field_extraction_metadata¶
Field extraction metadata on the property.
- class google.cloud.documentai_v1beta3.types.RawDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Payload message of raw document content (bytes).
- mime_type¶
An IANA MIME type (RFC6838) indicating the nature and format of the [content][google.cloud.documentai.v1beta3.RawDocument.content].
- Type
- class google.cloud.documentai_v1beta3.types.ReviewDocumentOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.
- state¶
Used only when Operation.done is false.
- state_message¶
A message providing more details about the current state of processing. For example, the error message if the operation has failed.
- Type
- create_time¶
The creation time of the operation.
- update_time¶
The last update time of the operation.
- common_metadata¶
The basic metadata of the long-running operation.
- class State(value)[source]¶
Bases:
proto.enums.Enum
State of the long-running operation.
- Values:
- STATE_UNSPECIFIED (0):
Unspecified state.
- RUNNING (1):
Operation is still running.
- CANCELLING (2):
Operation is being cancelled.
- SUCCEEDED (3):
Operation succeeded.
- FAILED (4):
Operation failed.
- CANCELLED (5):
Operation is cancelled.
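When polling a review operation, it can help to separate states that are still in progress from states that will not change again. The sketch below uses a local `OperationState` enum that mirrors the values listed above. The helper name `is_terminal` is hypothetical, and the choice of which states count as terminal is inferred from the value descriptions, not stated by the source.

```python
import enum


class OperationState(enum.IntEnum):
    """Local stand-in mirroring ReviewDocumentOperationMetadata.State."""

    STATE_UNSPECIFIED = 0
    RUNNING = 1
    CANCELLING = 2
    SUCCEEDED = 3
    FAILED = 4
    CANCELLED = 5


# Assumption: these three states are final; RUNNING and CANCELLING are not.
_TERMINAL_STATES = frozenset(
    {OperationState.SUCCEEDED, OperationState.FAILED, OperationState.CANCELLED}
)


def is_terminal(state: OperationState) -> bool:
    """Return True if the operation is not expected to change state again."""
    return state in _TERMINAL_STATES
```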
- class google.cloud.documentai_v1beta3.types.ReviewDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.
- human_review_config¶
Required. The resource name of the [HumanReviewConfig][google.cloud.documentai.v1beta3.HumanReviewConfig] that the document will be reviewed with.
- Type
- document¶
The document that needs human review.
- enable_schema_validation¶
Whether schema validation should be performed on the ad-hoc review request.
- Type
- priority¶
The priority of the human review task.
- document_schema¶
The document schema of the human review task.
- class Priority(value)[source]¶
Bases:
proto.enums.Enum
The priority level of the human review task.
- Values:
- DEFAULT (0):
The default priority level.
- URGENT (1):
The urgent priority level. The labeling manager should allocate labeler resources to the urgent task queue to respect this priority level.
- class google.cloud.documentai_v1beta3.types.ReviewDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [ReviewDocument][google.cloud.documentai.v1beta3.DocumentProcessorService.ReviewDocument] method.
- gcs_destination¶
The Cloud Storage URI of the human-reviewed document, if the review succeeded.
- Type
- state¶
The state of the review operation.
- class State(value)[source]¶
Bases:
proto.enums.Enum
Possible states of the review operation.
- Values:
- STATE_UNSPECIFIED (0):
The default value. This value is used if the state is omitted.
- REJECTED (1):
The review operation was rejected by the reviewer.
- SUCCEEDED (2):
The review operation succeeded.
- class google.cloud.documentai_v1beta3.types.RevisionRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The revision reference specifies which revision on the document to read.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- latest_processor_version¶
Reads the revision generated by the processor version. The value is the full resource name of the processor version, in the format:
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
This field is a member of oneof source.
- Type
- class RevisionCase(value)[source]¶
Bases:
proto.enums.Enum
Some predefined revision cases.
- Values:
- REVISION_CASE_UNSPECIFIED (0):
Unspecified case; falls back to reading LATEST_HUMAN_REVIEW.
- LATEST_HUMAN_REVIEW (1):
The latest revision made by a human.
- LATEST_TIMESTAMP (2):
The latest revision based on timestamp.
- BASE_OCR_REVISION (3):
The first (OCR) revision.
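The documented fallback for REVISION_CASE_UNSPECIFIED can be made explicit on the client side, for example when logging which revision a request will read. This is a minimal sketch with a local `RevisionCase` enum mirroring the values above; `effective_revision_case` is a hypothetical helper, and the actual fallback is applied by the service, not by this code.

```python
import enum


class RevisionCase(enum.IntEnum):
    """Local stand-in mirroring RevisionRef.RevisionCase."""

    REVISION_CASE_UNSPECIFIED = 0
    LATEST_HUMAN_REVIEW = 1
    LATEST_TIMESTAMP = 2
    BASE_OCR_REVISION = 3


def effective_revision_case(case: RevisionCase) -> RevisionCase:
    """Resolve the unspecified case to the documented fallback."""
    if case == RevisionCase.REVISION_CASE_UNSPECIFIED:
        return RevisionCase.LATEST_HUMAN_REVIEW
    return case
```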
- class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.
- processor¶
Required. The resource name of the [Processor][google.cloud.documentai.v1beta3.Processor] whose default version is to be changed.
- Type
- class google.cloud.documentai_v1beta3.types.SetDefaultProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [SetDefaultProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.SetDefaultProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.SummaryOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Metadata for document summarization.
- length¶
How long the summary should be.
- format_¶
The format the summary should be in.
- class Format(value)[source]¶
Bases:
proto.enums.Enum
The Format enum.
- Values:
- FORMAT_UNSPECIFIED (0):
Default.
- PARAGRAPH (1):
Format the output in paragraphs.
- BULLETS (2):
Format the output in bullets.
- class Length(value)[source]¶
Bases:
proto.enums.Enum
The Length enum.
- Values:
- LENGTH_UNSPECIFIED (0):
Default.
- BRIEF (1):
A brief summary of one or two sentences.
- MODERATE (2):
A paragraph-length summary.
- COMPREHENSIVE (3):
The longest option available.
- class google.cloud.documentai_v1beta3.types.TrainProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The metadata that represents a processor version being created.
- common_metadata¶
The basic metadata of the long-running operation.
- training_dataset_validation¶
The training dataset validation information.
- test_dataset_validation¶
The test dataset validation information.
- class DatasetValidation(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The dataset validation information. This includes any and all errors with documents and the dataset.
- document_errors¶
Error information pertaining to specific documents. A maximum of 10 document errors will be returned. Documents with errors are excluded from training.
- Type
MutableSequence[google.rpc.status_pb2.Status]
- dataset_errors¶
Error information for the dataset as a whole. A maximum of 10 dataset errors will be returned. A single dataset error is terminal for training.
- Type
MutableSequence[google.rpc.status_pb2.Status]
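The two error lists in DatasetValidation have different consequences: a dataset error stops training, while a document error only excludes that document. The sketch below shows one way to summarize that distinction. `StatusSketch`, `DatasetValidationSketch`, and `summarize_validation` are hypothetical stand-ins built with the standard library; the real fields hold `google.rpc.status_pb2.Status` messages, of which only `code` and `message` are modelled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatusSketch:
    """Stand-in for a status message, reduced to code and message."""

    code: int
    message: str


@dataclass
class DatasetValidationSketch:
    """Stand-in with the two documented error lists."""

    document_errors: List[StatusSketch] = field(default_factory=list)
    dataset_errors: List[StatusSketch] = field(default_factory=list)


def summarize_validation(validation: DatasetValidationSketch) -> str:
    """Describe whether training can proceed given the reported errors."""
    if validation.dataset_errors:
        # A single dataset error is terminal for training.
        messages = "; ".join(e.message for e in validation.dataset_errors)
        return f"blocked: {messages}"
    if validation.document_errors:
        # At most 10 document errors are returned, so this count is a lower bound.
        count = len(validation.document_errors)
        return f"ok with document errors: {count} reported"
    return "ok"
```

Because at most 10 errors of each kind are returned, the reported count should be read as a lower bound on the number of affected documents.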
- class google.cloud.documentai_v1beta3.types.TrainProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [TrainProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.TrainProcessorVersion] method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
- custom_document_extraction_options¶
Options to control Custom Document Extraction (CDE) Processor.
This field is a member of oneof processor_flags.
- foundation_model_tuning_options¶
Options to control foundation model tuning of a processor.
This field is a member of oneof processor_flags.
- parent¶
Required. The parent (project, location and processor) to create the new version for. Format:
projects/{project}/locations/{location}/processors/{processor}
- Type
- processor_version¶
Required. The processor version to be created.
- document_schema¶
Optional. The schema the processor version will be trained with.
- input_data¶
Optional. The input data used to train the [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].
- base_processor_version¶
Optional. The processor version to use as a base for training. This processor version must be a child of parent. Format:
projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
- Type
- class CustomDocumentExtractionOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Options to control the training of the Custom Document Extraction (CDE) Processor.
- training_method¶
Training method to use for CDE training.
- class TrainingMethod(value)[source]¶
Bases:
proto.enums.Enum
Training Method for CDE. TRAINING_METHOD_UNSPECIFIED falls back to MODEL_BASED.
- Values:
- TRAINING_METHOD_UNSPECIFIED (0):
No description available.
- MODEL_BASED (1):
No description available.
- TEMPLATE_BASED (2):
No description available.
- class FoundationModelTuningOptions(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Options to control foundation model tuning of the processor.
- train_steps¶
Optional. The number of steps to run for model tuning. Valid values are between 1 and 400. If not provided, the recommended number of steps is used.
- Type
- class InputData(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The input data used to train a new [ProcessorVersion][google.cloud.documentai.v1beta3.ProcessorVersion].
- training_documents¶
The documents used for training the new version.
- test_documents¶
The documents used for testing the trained version.
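The documented range for train_steps in FoundationModelTuningOptions (1 to 400, or omitted to use the recommended value) can be checked before a request is built. This is a minimal sketch; `validate_train_steps` and the two constants are hypothetical names, and using `None` to represent "not provided" is an assumption of the sketch rather than how the message type encodes an unset field.

```python
from typing import Optional

# Bounds taken from the field description above.
MIN_TRAIN_STEPS = 1
MAX_TRAIN_STEPS = 400


def validate_train_steps(train_steps: Optional[int]) -> Optional[int]:
    """Return train_steps unchanged if acceptable, else raise ValueError.

    None means "not provided", in which case the recommended number of
    steps is used according to the field description.
    """
    if train_steps is None:
        return None
    if not MIN_TRAIN_STEPS <= train_steps <= MAX_TRAIN_STEPS:
        raise ValueError(
            f"train_steps must be between {MIN_TRAIN_STEPS} and "
            f"{MAX_TRAIN_STEPS}, got {train_steps}"
        )
    return train_steps
```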
- class google.cloud.documentai_v1beta3.types.TrainProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The response for [TrainProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.TrainProcessorVersion].
- class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
The long-running operation metadata for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request message for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.UndeployProcessorVersionResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Response message for the [UndeployProcessorVersion][google.cloud.documentai.v1beta3.DocumentProcessorService.UndeployProcessorVersion] method.
- class google.cloud.documentai_v1beta3.types.UpdateDatasetOperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- common_metadata¶
The basic metadata of the long-running operation.
- class google.cloud.documentai_v1beta3.types.UpdateDatasetRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
- dataset¶
Required. The name field of the Dataset is used to identify the resource to be updated.
- update_mask¶
The update mask applies to the resource.
- class google.cloud.documentai_v1beta3.types.UpdateDatasetSchemaRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
Request for UpdateDatasetSchema.
- dataset_schema¶
Required. The name field of the DatasetSchema is used to identify the resource to be updated.
- update_mask¶
The update mask applies to the resource.
- class google.cloud.documentai_v1beta3.types.Vertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)[source]¶
Bases:
proto.message.Message
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.