public class DocumentOcrTemplate extends Object
Constructor and Description |
---|
DocumentOcrTemplate(ImageAnnotatorClient imageAnnotatorClient, Storage storage, Executor executor, int jsonOutputBatchSize) |
Modifier and Type | Method and Description |
---|---|
DocumentOcrResultSet | readOcrOutputFile(GoogleStorageLocation jsonFile) Parses a single JSON output file and returns the list of pages stored in the file. |
DocumentOcrResultSet | readOcrOutputFileSet(GoogleStorageLocation jsonOutputFilePathPrefix) Parses the OCR output files that have the specified jsonOutputFilePathPrefix. |
org.springframework.util.concurrent.ListenableFuture<DocumentOcrResultSet> | runOcrForDocument(GoogleStorageLocation document, GoogleStorageLocation outputFilePathPrefix) Runs OCR processing for a specified document and generates OCR output files under the path specified by outputFilePathPrefix. |
public DocumentOcrTemplate(ImageAnnotatorClient imageAnnotatorClient, Storage storage, Executor executor, int jsonOutputBatchSize)
public org.springframework.util.concurrent.ListenableFuture<DocumentOcrResultSet> runOcrForDocument(GoogleStorageLocation document, GoogleStorageLocation outputFilePathPrefix)
Runs OCR processing for a specified document and generates OCR output files under the path specified by outputFilePathPrefix.
For example, if you specify an outputFilePathPrefix of
"gs://bucket_name/ocr_results/myDoc_", all the output files of OCR processing will be
saved under that prefix.
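The prefix behaves like a plain string prefix on the object path rather than a folder name. The following is a minimal sketch of that idea using only the JDK; the class name, the helper `filesUnderPrefix`, and the object names are hypothetical and are not part of DocumentOcrTemplate. The real output file names are chosen by the OCR service.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OcrOutputPrefixSketch {

    /** Returns the object names that start with the given prefix, in sorted order. */
    static List<String> filesUnderPrefix(List<String> objectNames, String prefix) {
        return objectNames.stream()
                .filter(name -> name.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String prefix = "gs://bucket_name/ocr_results/myDoc_";
        // Hypothetical object names, used only to illustrate prefix matching.
        List<String> objects = Arrays.asList(
                "gs://bucket_name/ocr_results/myDoc_part-2.json",
                "gs://bucket_name/ocr_results/myDoc_part-1.json",
                "gs://bucket_name/ocr_results/otherDoc_part-1.json");
        // Only the two myDoc_ entries match; otherDoc_ is excluded.
        filesUnderPrefix(objects, prefix).forEach(System.out::println);
    }
}
```

Because matching is by prefix, output from two different documents should not share a prefix if the files are later read back as one set.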
Note: OCR processing operations may take several minutes to complete, so blocking on the
completion of the operation may not be advisable. Use the returned ListenableFuture
to register callbacks or track the status of the operation.
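The non-blocking pattern described in the note can be sketched with the JDK alone. This example uses `CompletableFuture` as a stand-in for the returned future so that it runs without Spring on the classpath; `startFakeOcrJob` and `describeWhenDone` are hypothetical helpers, not part of this class. With the real API, Spring's `ListenableFuture.addCallback` plays the role that `handle` plays here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingOcrSketch {

    /** Hypothetical stand-in for a long-running OCR job; completes with a page count. */
    static CompletableFuture<Integer> startFakeOcrJob(ExecutorService executor, int pages) {
        return CompletableFuture.supplyAsync(() -> pages, executor);
    }

    /** Registers a callback instead of blocking; the result is a status message. */
    static CompletableFuture<String> describeWhenDone(CompletableFuture<Integer> job) {
        return job.handle((pages, error) ->
                error == null
                        ? "OCR finished: " + pages + " pages"
                        : "OCR failed: " + error.getMessage());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> status =
                    describeWhenDone(startFakeOcrJob(executor, 3));
            // The caller is free to do other work here; the demo waits only at the end.
            System.out.println(status.get(5, TimeUnit.SECONDS));
        } finally {
            executor.shutdown();
        }
    }
}
```

Handling both the success and failure cases in the callback keeps the calling thread free while the operation runs.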
Parameters:
document - The GoogleStorageLocation of the document to run OCR processing on
outputFilePathPrefix - The GoogleStorageLocation of a file, folder, or bucket describing the path under which all output files will be saved
Returns:
A ListenableFuture allowing you to register callbacks or wait for the completion of the operation.
public DocumentOcrResultSet readOcrOutputFileSet(GoogleStorageLocation jsonOutputFilePathPrefix)
Parses the OCR output files that have the specified jsonOutputFilePathPrefix. This
method assumes that all of the OCR output files with the prefix are part of the same
document.
Parameters:
jsonOutputFilePathPrefix - the folder location containing all of the JSON files of OCR output
Returns:
A DocumentOcrResultSet describing the OCR content of a document
public DocumentOcrResultSet readOcrOutputFile(GoogleStorageLocation jsonFile)
Parses a single JSON output file and returns the list of pages stored in the file.
Each page of the document is represented as a TextAnnotation, which contains the
parsed OCR data.
Parameters:
jsonFile - the location of the JSON output file
Returns:
TextAnnotation containing the OCR results
Throws:
RuntimeException - if the JSON file cannot be deserialized into a TextAnnotation object
Copyright © 2021. All rights reserved.