Class: Google::Cloud::DocumentAI::V1::OcrConfig

Inherits:
Object
Extended by:
Protobuf::MessageExts::ClassMethods
Includes:
Protobuf::MessageExts
Defined in:
proto_docs/google/cloud/documentai/v1/document_io.rb

Overview

Config for Document OCR.

Defined Under Namespace

Classes: Hints, PremiumFeatures

Instance Attribute Details

#advanced_ocr_options ⇒ ::Array<::String>

A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are:

  • legacy_layout: a heuristic layout detection algorithm that serves as an alternative to the current ML-based layout detection algorithm. Customers can choose whichever layout algorithm best suits their situation.

Returns:

  • (::Array<::String>)
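
As a minimal sketch of how this attribute might be populated, the snippet below builds a plain Hash shaped like OcrConfig and rejects option strings that are not in the list above. The helper name ocr_config_with_options and the constant KNOWN_ADVANCED_OCR_OPTIONS are hypothetical and not part of the library; the sketch also assumes the generated client accepts an equivalent Hash in place of the message object.

```ruby
# Option values taken from the attribute description; extend if the service adds more.
KNOWN_ADVANCED_OCR_OPTIONS = ["legacy_layout"].freeze

# Hypothetical helper (not part of the library): returns an OcrConfig-shaped Hash
# and raises if any requested option is not a known value.
def ocr_config_with_options(options)
  unknown = options - KNOWN_ADVANCED_OCR_OPTIONS
  raise ArgumentError, "unknown advanced OCR options: #{unknown.join(', ')}" unless unknown.empty?

  { advanced_ocr_options: options.dup }
end

config = ocr_config_with_options(["legacy_layout"])
puts config.inspect
```

Validating on the client side is only a convenience here; the service remains the authority on which values are accepted.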


# File 'proto_docs/google/cloud/documentai/v1/document_io.rb', line 164

class OcrConfig
  include ::Google::Protobuf::MessageExts
  extend ::Google::Protobuf::MessageExts::ClassMethods

  # Hints for OCR Engine
  # @!attribute [rw] language_hints
  #   @return [::Array<::String>]
  #     List of BCP-47 language codes to use for OCR. In most cases, not
  #     specifying it yields the best results since it enables automatic language
  #     detection. For languages based on the Latin alphabet, setting hints is
  #     not needed. In rare cases, when the language of the text in the
  #     image is known, setting a hint will help get better results (although it
  #     will be a significant hindrance if the hint is wrong).
  class Hints
    include ::Google::Protobuf::MessageExts
    extend ::Google::Protobuf::MessageExts::ClassMethods
  end

  # Configurations for premium OCR features.
  # @!attribute [rw] enable_selection_mark_detection
  #   @return [::Boolean]
  #     Turn on selection mark detector in OCR engine. Only available in OCR 2.0
  #     (and later) processors.
  # @!attribute [rw] compute_style_info
  #   @return [::Boolean]
  #     Turn on font identification model and return font style information.
  # @!attribute [rw] enable_math_ocr
  #   @return [::Boolean]
  #     Turn on the model that can extract LaTeX math formulas.
  class PremiumFeatures
    include ::Google::Protobuf::MessageExts
    extend ::Google::Protobuf::MessageExts::ClassMethods
  end
end

#compute_style_info ⇒ ::Boolean

Deprecated.

This field is deprecated and may be removed in the next major version update.

Turns on the font identification model and returns font style information. Deprecated; use PremiumFeatures.compute_style_info instead.

Returns:

  • (::Boolean)
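
The deprecation note suggests moving this flag under premium_features. A rough sketch of that migration on a Hash-shaped configuration follows; the helper migrate_compute_style_info is hypothetical, and the sketch assumes the configuration is held as a plain Hash with symbol keys rather than as a message object.

```ruby
# Hypothetical helper (not part of the library): moves the deprecated top-level
# compute_style_info flag into premium_features without mutating the input.
def migrate_compute_style_info(ocr_config)
  migrated = ocr_config.dup
  return migrated unless migrated.key?(:compute_style_info)

  flag = migrated.delete(:compute_style_info)
  premium = (migrated[:premium_features] || {}).dup
  # An explicit premium_features value wins over the deprecated flag.
  premium[:compute_style_info] = flag unless premium.key?(:compute_style_info)
  migrated[:premium_features] = premium
  migrated
end

old_config = { enable_symbol: true, compute_style_info: true }
new_config = migrate_compute_style_info(old_config)
puts new_config.inspect
```

Letting an existing premium_features value take precedence is a design choice of this sketch, not documented library behavior.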




#disable_character_boxes_detection ⇒ ::Boolean

Turns off the character box detector in the OCR engine. Character box detection is enabled by default in OCR 2.0 (and later) processors.

Returns:

  • (::Boolean)




#enable_image_quality_scores ⇒ ::Boolean

Enables intelligent document quality scores after OCR. This can help diagnose why OCR responses are of poor quality for a given input. Adds latency comparable to that of regular OCR to the process call.

Returns:

  • (::Boolean)




#enable_native_pdf_parsing ⇒ ::Boolean

Enables special handling for PDFs with existing text information, which results in better text extraction quality for such PDF inputs.

Returns:

  • (::Boolean)




#enable_symbol ⇒ ::Boolean

Includes symbol-level OCR information if set to true.

Returns:

  • (::Boolean)




#hints ⇒ ::Google::Cloud::DocumentAI::V1::OcrConfig::Hints

Hints for the OCR model.

Returns:

  • (::Google::Cloud::DocumentAI::V1::OcrConfig::Hints)
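
The Hints documentation notes that leaving language_hints unset usually gives the best results because it enables automatic language detection. One way to respect that, sketched here with a hypothetical helper named ocr_config_for, is to add the hints entry only when the caller actually supplies BCP-47 codes. The Hash shape mirrors the message fields and is an assumption about how the configuration is passed.

```ruby
# Hypothetical helper (not part of the library): builds an OcrConfig-shaped Hash,
# omitting hints entirely when no language codes are given so that automatic
# language detection stays in effect.
def ocr_config_for(languages = [])
  config = {}
  codes = Array(languages).map(&:to_s).reject(&:empty?)
  config[:hints] = { language_hints: codes } unless codes.empty?
  config
end

auto_detect = ocr_config_for          # no hints key at all
japanese    = ocr_config_for(["ja"])  # hint supplied because the language is known
puts auto_detect.inspect
puts japanese.inspect
```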




#premium_features ⇒ ::Google::Cloud::DocumentAI::V1::OcrConfig::PremiumFeatures

Configurations for premium OCR features.

Returns:

  • (::Google::Cloud::DocumentAI::V1::OcrConfig::PremiumFeatures)
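
To illustrate how the premium feature flags nest inside a larger request, the sketch below assembles a Hash whose keys follow the PremiumFeatures attributes listed in the source. The processor resource name is a placeholder, and placing the configuration under process_options in a process request, as well as passing a Hash rather than message objects, are assumptions about the surrounding API rather than something this page states.

```ruby
# OcrConfig-shaped Hash; the premium feature keys match the PremiumFeatures attributes.
ocr_config = {
  enable_native_pdf_parsing: true,
  premium_features: {
    enable_selection_mark_detection: true, # OCR 2.0 (and later) processors only
    compute_style_info: true,
    enable_math_ocr: false
  }
}

# Placeholder resource name (hypothetical); substitute a real processor.
request = {
  name: "projects/my-project/locations/us/processors/my-processor",
  process_options: { ocr_config: ocr_config }
}

puts request.dig(:process_options, :ocr_config, :premium_features).keys.inspect
```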


