Class: Google::Apis::AiplatformV1::GoogleCloudAiplatformV1InputDataConfig
- Inherits:
-
Object
- Object
- Google::Apis::AiplatformV1::GoogleCloudAiplatformV1InputDataConfig
- Includes:
- Core::Hashable, Core::JsonObjectSupport
- Defined in:
- lib/google/apis/aiplatform_v1/classes.rb,
lib/google/apis/aiplatform_v1/representations.rb
Overview
Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.
Instance Attribute Summary
-
#annotation_schema_uri ⇒ String
Applicable only to custom training with Datasets that have DataItems and Annotations.
-
#annotations_filter ⇒ String
Applicable only to Datasets that have DataItems and Annotations.
-
#bigquery_destination ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1BigQueryDestination
The BigQuery location for the output content.
-
#dataset_id ⇒ String
Required.
-
#filter_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1FilterSplit
Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored.
-
#fraction_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1FractionSplit
Assigns the input data to training, validation, and test sets as per the given fractions.
-
#gcs_destination ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1GcsDestination
The Google Cloud Storage location where the output is to be written.
-
#persist_ml_use_assignment ⇒ Boolean
(also: #persist_ml_use_assignment?)
Whether to persist the ML use assignment to data item system labels.
-
#predefined_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1PredefinedSplit
Assigns input data to training, validation, and test sets based on the value of a provided key.
-
#saved_query_id ⇒ String
Only applicable to Datasets that have SavedQueries.
-
#stratified_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1StratifiedSplit
Assigns input data to the training, validation, and test sets so that the distribution of values found in the categorical column (as specified by the key field) is mirrored within each split.
-
#timestamp_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1TimestampSplit
Assigns input data to training, validation, and test sets based on provided timestamps.
Instance Method Summary
-
#initialize(**args) ⇒ GoogleCloudAiplatformV1InputDataConfig
constructor
A new instance of GoogleCloudAiplatformV1InputDataConfig.
-
#update!(**args) ⇒ Object
Update properties of this object.
Constructor Details
#initialize(**args) ⇒ GoogleCloudAiplatformV1InputDataConfig
Returns a new instance of GoogleCloudAiplatformV1InputDataConfig.
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11746
def initialize(**args)
  update!(**args)
end
Instance Attribute Details
#annotation_schema_uri ⇒ String
Applicable only to custom training with Datasets that have DataItems and
Annotations. Cloud Storage URI that points to a YAML file describing the
annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The
schema files that can be used here are found in
gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen
schema must be consistent with the metadata of the Dataset specified by
dataset_id. Only Annotations that both match this schema and belong to
DataItems not ignored by the split method are used in the training, validation,
or test role, respectively, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for
training are filtered by both annotations_filter and annotation_schema_uri.
Corresponds to the JSON property annotationSchemaUri
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11646
def annotation_schema_uri
  @annotation_schema_uri
end
#annotations_filter ⇒ String
Applicable only to Datasets that have DataItems and Annotations. A filter on
Annotations of the Dataset. Only Annotations that both match this filter and
belong to DataItems not ignored by the split method are used in the training,
validation, or test role, respectively, depending on the role of the DataItem
they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A
filter with the same syntax as the one used in ListAnnotations may be used, but
note that here it filters across all Annotations of the Dataset, not just
within a single DataItem.
Corresponds to the JSON property annotationsFilter
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11658
def annotations_filter
  @annotations_filter
end
#bigquery_destination ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1BigQueryDestination
The BigQuery location for the output content.
Corresponds to the JSON property bigqueryDestination
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11663
def bigquery_destination
  @bigquery_destination
end
#dataset_id ⇒ String
Required. The ID of the Dataset, in the same Project and Location, whose data
will be used to train the Model. The Dataset must use a schema compatible with
the Model being trained; what counts as compatible should be described in the
training_task_definition of the TrainingPipeline in use. For tabular Datasets,
all of their data is exported to training, to pick and choose from.
Corresponds to the JSON property datasetId
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11672
def dataset_id
  @dataset_id
end
#filter_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1FilterSplit
Assigns input data to training, validation, and test sets based on the given
filters; data pieces not matched by any filter are ignored. Currently only
supported for Datasets containing DataItems. If any of the filters in this
message is meant to match nothing, it can be set to '-' (the minus sign).
Supported only for unstructured Datasets.
Corresponds to the JSON property filterSplit
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11681
def filter_split
  @filter_split
end
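The assignment rule described above can be sketched in plain Ruby. This only illustrates the semantics and is not the service's implementation: the `assign_by_filters` helper, the item hashes, and the use of lambdas as filters are all assumptions made for the example (real filters are strings evaluated by Vertex AI). `'-'` is modeled as a filter that matches nothing, and "first matching filter wins" is a simplification of this sketch.

```ruby
# Illustrative model only: real filters are strings evaluated by Vertex AI.
# Here each filter is either '-' (match nothing) or a lambda (hypothetical).
MATCH_NOTHING = '-'

def assign_by_filters(items, filters)
  assignment = { training: [], validation: [], test: [] }
  items.each do |item|
    # First matching filter wins (a simplification of this sketch).
    role, _filter = filters.find do |_role, filter|
      filter != MATCH_NOTHING && filter.call(item)
    end
    # Items matched by no filter are ignored.
    assignment[role] << item if role
  end
  assignment
end

items = [
  { id: 1, ml_use: 'training' },
  { id: 2, ml_use: 'test' },
  { id: 3, ml_use: nil } # matched by no filter, so it is ignored
]

filters = {
  training: ->(item) { item[:ml_use] == 'training' },
  validation: MATCH_NOTHING, # validation set intentionally left empty
  test: ->(item) { item[:ml_use] == 'test' }
}

result = assign_by_filters(items, filters)
```

With these inputs, item 1 lands in the training set, item 2 in the test set, the validation set stays empty, and item 3 is dropped.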
#fraction_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1FractionSplit
Assigns the input data to training, validation, and test sets as per the given
fractions. Any of training_fraction, validation_fraction, and test_fraction may
optionally be provided; together they must sum to at most 1. If the provided
ones sum to less than 1, the remainder is assigned to sets as decided by Vertex
AI. If none of the fractions are set, by default roughly 80% of data is used
for training, 10% for validation, and 10% for test.
Corresponds to the JSON property fractionSplit
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11691
def fraction_split
  @fraction_split
end
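The constraint on the fractions can be checked client-side before a request is built. The following is a minimal sketch, assuming a hypothetical helper named `remaining_fraction` that is not part of the gem; it rejects fractions that sum to more than 1 and returns the share that would be left for Vertex AI to distribute.

```ruby
# Hypothetical helper, not part of google-apis-aiplatform_v1.
# Fractions are converted to Rationals to avoid floating-point drift.
def remaining_fraction(training: nil, validation: nil, test: nil)
  provided = [training, validation, test].compact.map { |f| Rational(f.to_s) }
  total = provided.sum(Rational(0))
  raise ArgumentError, "fractions sum to #{total.to_f}, which exceeds 1" if total > 1

  1 - total
end

# Only two fractions are given; the remainder is left to the service.
leftover = remaining_fraction(training: 0.7, validation: 0.1)
puts leftover.to_f
```

Here 0.7 and 0.1 are provided, so one fifth of the data remains unassigned by the caller.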
#gcs_destination ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1GcsDestination
The Google Cloud Storage location where the output is to be written.
Corresponds to the JSON property gcsDestination
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11696
def gcs_destination
  @gcs_destination
end
#persist_ml_use_assignment ⇒ Boolean
Also known as: persist_ml_use_assignment?
Whether to persist the ML use assignment to data item system labels.
Corresponds to the JSON property persistMlUseAssignment
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11701
def persist_ml_use_assignment
  @persist_ml_use_assignment
end
#predefined_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1PredefinedSplit
Assigns input data to training, validation, and test sets based on the value
of a provided key. Supported only for tabular Datasets.
Corresponds to the JSON property predefinedSplit
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11708
def predefined_split
  @predefined_split
end
#saved_query_id ⇒ String
Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery
(annotation set) under the Dataset specified by dataset_id, used for filtering
Annotations for training. Only Annotations that are associated with this
SavedQuery are used in training. When used in conjunction with
annotations_filter, the Annotations used for training are filtered by both
saved_query_id and annotations_filter. Only one of saved_query_id and
annotation_schema_uri should be specified, as both of them represent the same
thing: problem type.
Corresponds to the JSON property savedQueryId
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11720
def saved_query_id
  @saved_query_id
end
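Since saved_query_id and annotation_schema_uri should not both be specified, a caller may want to guard against that before constructing the object. A minimal sketch follows, assuming a hypothetical `check_annotation_selectors!` helper operating on a plain Hash of constructor arguments; the gem itself performs no such check in the code shown on this page, and the IDs are placeholders.

```ruby
# Hypothetical pre-flight check, not part of google-apis-aiplatform_v1.
def check_annotation_selectors!(args)
  both = args[:saved_query_id] && args[:annotation_schema_uri]
  raise ArgumentError, 'specify only one of saved_query_id and annotation_schema_uri' if both

  args
end

args = { dataset_id: '1234567890', saved_query_id: '42' } # placeholder IDs
checked = check_annotation_selectors!(args)
```

The returned Hash could then be splatted into the constructor as keyword arguments.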
#stratified_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1StratifiedSplit
Assigns input data to the training, validation, and test sets so that the
distribution of values found in the categorical column (as specified by the key
field) is mirrored within each split. The fraction values determine the
relative sizes of the splits. For example, if the specified column has three
values, with 50% of the rows having value "A", 25% value "B", and 25% value
"C", and the split fractions are specified as 80/10/10, then the training set
will constitute 80% of the training data, with about 50% of the training set
rows having the value "A" for the specified column, about 25% having the value
"B", and about 25% having the value "C". Only the top 500 occurring values are
used; any values not in the top 500 values are randomly assigned to a split.
If fewer than three rows contain a specific value, those rows are randomly
assigned. Supported only for tabular Datasets.
Corresponds to the JSON property stratifiedSplit
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11736
def stratified_split
  @stratified_split
end
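The 80/10/10 example in the description can be worked through numerically. The sketch below assumes a hypothetical table of 1,000 rows (the row count is not from the documentation) and computes the expected number of rows per value in each split; actual counts produced by the service are only approximately equal to these.

```ruby
# Expected row counts for the documented example, assuming 1,000 rows.
# Rationals keep the arithmetic exact.
total_rows = 1_000
value_shares = { 'A' => Rational(1, 2), 'B' => Rational(1, 4), 'C' => Rational(1, 4) }
split_fractions = {
  training: Rational(8, 10),
  validation: Rational(1, 10),
  test: Rational(1, 10)
}

# For each split, multiply its size by each value's share of the column.
expected = split_fractions.transform_values do |fraction|
  value_shares.transform_values { |share| (total_rows * fraction * share).round }
end

expected.each { |split, counts| puts "#{split}: #{counts}" }
```

Under these assumptions the training split holds about 400 "A" rows and 200 each of "B" and "C", while the validation and test splits each hold about 50, 25, and 25.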
#timestamp_split ⇒ Google::Apis::AiplatformV1::GoogleCloudAiplatformV1TimestampSplit
Assigns input data to training, validation, and test sets based on the provided
timestamps. The youngest data pieces are assigned to the training set, the next
to the validation set, and the oldest to the test set. Supported only for
tabular Datasets.
Corresponds to the JSON property timestampSplit
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11744
def timestamp_split
  @timestamp_split
end
Instance Method Details
#update!(**args) ⇒ Object
Update properties of this object.
# File 'lib/google/apis/aiplatform_v1/classes.rb', line 11751
def update!(**args)
  @annotation_schema_uri = args[:annotation_schema_uri] if args.key?(:annotation_schema_uri)
  @annotations_filter = args[:annotations_filter] if args.key?(:annotations_filter)
  @bigquery_destination = args[:bigquery_destination] if args.key?(:bigquery_destination)
  @dataset_id = args[:dataset_id] if args.key?(:dataset_id)
  @filter_split = args[:filter_split] if args.key?(:filter_split)
  @fraction_split = args[:fraction_split] if args.key?(:fraction_split)
  @gcs_destination = args[:gcs_destination] if args.key?(:gcs_destination)
  @persist_ml_use_assignment = args[:persist_ml_use_assignment] if args.key?(:persist_ml_use_assignment)
  @predefined_split = args[:predefined_split] if args.key?(:predefined_split)
  @saved_query_id = args[:saved_query_id] if args.key?(:saved_query_id)
  @stratified_split = args[:stratified_split] if args.key?(:stratified_split)
  @timestamp_split = args[:timestamp_split] if args.key?(:timestamp_split)
end
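Because update! assigns an attribute only when its key is present in args, a partial update leaves other attributes alone, while an explicit nil overwrites. The sketch below demonstrates this with `MiniConfig`, a hypothetical stand-in that copies the pattern above for two attributes so the example runs without the gem installed; the ID values are placeholders.

```ruby
# MiniConfig is a hypothetical stand-in that mirrors the update! pattern
# for two attributes; it is not part of google-apis-aiplatform_v1.
class MiniConfig
  attr_accessor :dataset_id, :saved_query_id

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @dataset_id = args[:dataset_id] if args.key?(:dataset_id)
    @saved_query_id = args[:saved_query_id] if args.key?(:saved_query_id)
  end
end

config = MiniConfig.new(dataset_id: '123', saved_query_id: '42')
config.update!(dataset_id: '456')      # saved_query_id is left untouched
query_after_partial_update = config.saved_query_id
config.update!(saved_query_id: nil)    # an explicit nil clears the attribute
config.update!(unknown_key: 'ignored') # unrecognized keys are silently skipped
```

One consequence of this design is that misspelled keys raise no error, so a typo in a keyword silently leaves the intended attribute unset.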