Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1InputDataConfig

Inherits: Object

Includes: Core::Hashable, Core::JsonObjectSupport

Defined in: lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

Specifies the Vertex AI-owned input data to be used for training, and possibly evaluating, the Model.

Instance Attribute Summary

#annotation_schema_uri ⇒ String
Applicable only to custom training with Datasets that have DataItems and Annotations.
#annotations_filter ⇒ String
Applicable only to Datasets that have DataItems and Annotations.
#bigquery_destination ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1BigQueryDestination
The BigQuery location for the output content.
#dataset_id ⇒ String
Required.
#filter_split ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FilterSplit
Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored.
#fraction_split ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FractionSplit
Assigns the input data to training, validation, and test sets as per the given fractions.
#gcs_destination ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1GcsDestination
The Google Cloud Storage location where the output is to be written.
#persist_ml_use_assignment ⇒ Boolean (also: #persist_ml_use_assignment?)
Whether to persist the ML use assignment to data item system labels.
#predefined_split ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1PredefinedSplit
Assigns input data to training, validation, and test sets based on the value of a provided key.
#saved_query_id ⇒ String
Only applicable to Datasets that have SavedQueries.
#stratified_split ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1StratifiedSplit
Assigns input data to the training, validation, and test sets so that the distribution of values found in the categorical column (as specified by the key field) is mirrored within each split.
#timestamp_split ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1TimestampSplit
Assigns input data to training, validation, and test sets based on provided timestamps.

Instance Method Summary

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1InputDataConfig constructor
A new instance of GoogleCloudAiplatformV1beta1InputDataConfig.
#update!(**args) ⇒ Object
Update properties of this object.

Constructor Details

#initialize(**args) ⇒ `GoogleCloudAiplatformV1beta1InputDataConfig`

Returns a new instance of GoogleCloudAiplatformV1beta1InputDataConfig.




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 15004

def initialize(**args)
  update!(**args)
end
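
The constructor only forwards its keyword arguments to update!. The sketch below reproduces that pattern with a hypothetical stand-in class (InputDataConfigSketch, modeling just two of the attributes) so that it runs without the gem installed; it is an illustration, not the generated class itself.

```ruby
# Hypothetical stand-in that mirrors the initialize/update! pattern shown above.
# Only two of the documented attributes are modeled here.
class InputDataConfigSketch
  attr_accessor :dataset_id, :annotations_filter

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @dataset_id = args[:dataset_id] if args.key?(:dataset_id)
    @annotations_filter = args[:annotations_filter] if args.key?(:annotations_filter)
  end
end

# '1234567890' is a placeholder Dataset ID.
config = InputDataConfigSketch.new(dataset_id: '1234567890')
puts config.dataset_id
puts config.annotations_filter.inspect # nil, because it was never supplied
```

Attributes that are not passed to the constructor are simply left unset and read back as nil.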

Instance Attribute Details

#annotation_schema_uri ⇒ `String`

Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation, or test role, respectively, depending on the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri. Corresponds to the JSON property annotationSchemaUri

Returns:

(String)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14904

def annotation_schema_uri
  @annotation_schema_uri
end
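
Each attribute description on this page ends with the JSON property it corresponds to, and the names follow a snake_case to lowerCamelCase convention. The helper below is a hypothetical illustration of that naming convention only; json_property_name is not part of the gem, and it does not show how the gem itself performs the mapping.

```ruby
# Illustrative helper (not part of the gem): converts a snake_case attribute
# name to the lowerCamelCase JSON property name quoted in the descriptions.
def json_property_name(attribute)
  head, *tail = attribute.to_s.split('_')
  head + tail.map(&:capitalize).join
end

puts json_property_name(:annotation_schema_uri)     # annotationSchemaUri
puts json_property_name(:persist_ml_use_assignment) # persistMlUseAssignment
```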

#annotations_filter ⇒ `String`

Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used in the training, validation, or test role, respectively, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem. Corresponds to the JSON property annotationsFilter

Returns:

(String)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14916

def annotations_filter
  @annotations_filter
end

#bigquery_destination ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1BigQueryDestination`

The BigQuery location for the output content. Corresponds to the JSON property bigqueryDestination

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1BigQueryDestination)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14921

def bigquery_destination
  @bigquery_destination
end

#dataset_id ⇒ `String`

Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the training_task_definition of the TrainingPipeline in use. For tabular Datasets, all of their data is exported to training, to pick and choose from. Corresponds to the JSON property datasetId

Returns:

(String)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14930

def dataset_id
  @dataset_id
end

#filter_split ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FilterSplit`

Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message should match nothing, it can be set to '-' (the minus sign). Supported only for unstructured Datasets. Corresponds to the JSON property filterSplit

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FilterSplit)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14939

def filter_split
  @filter_split
end
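
As a sketch of the '-' convention described for filter_split, the hypothetical helper below fills in '-' for any filter that is not supplied. The key names training_filter, validation_filter, and test_filter, as well as the sample filter string, are assumptions made for illustration and are not taken from this page.

```ruby
# Hypothetical helper (not part of the gem): builds three filter strings,
# substituting '-' (match nothing) for any filter that was not supplied.
# The key names below are assumptions for illustration.
MATCH_NOTHING = '-'

def filter_split_args(training: nil, validation: nil, test: nil)
  {
    training_filter: training || MATCH_NOTHING,
    validation_filter: validation || MATCH_NOTHING,
    test_filter: test || MATCH_NOTHING
  }
end

# 'labels.my_split=train' is a placeholder filter expression.
args = filter_split_args(training: 'labels.my_split=train')
puts args.inspect
```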

#fraction_split ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FractionSplit`

Assigns the input data to training, validation, and test sets as per the given fractions. Any of training_fraction, validation_fraction, and test_fraction may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data is used for training, 10% for validation, and 10% for test. Corresponds to the JSON property fractionSplit

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1FractionSplit)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14949

def fraction_split
  @fraction_split
end
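
The sum rule for fraction_split can be checked on the client side before a request is built. The following is a minimal sketch under that assumption; fraction_remainder is a hypothetical helper, not part of the gem, and it uses Rational arithmetic to avoid floating-point rounding when summing.

```ruby
# Hypothetical helper (not part of the gem): checks that the supplied
# fractions sum to at most 1 and returns the remainder that would be left
# for Vertex AI to assign.
def fraction_remainder(training: nil, validation: nil, test: nil)
  provided = [training, validation, test].compact
  # Convert via the decimal string so that 0.1 becomes exactly 1/10.
  total = provided.sum(Rational(0)) { |f| Rational(f.to_s) }
  raise ArgumentError, "fractions sum to #{total.to_f}, which exceeds 1" if total > 1

  (1 - total).to_f
end

puts fraction_remainder(training: 0.6, validation: 0.1) # 0.3
```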

#gcs_destination ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1GcsDestination`

The Google Cloud Storage location where the output is to be written. Corresponds to the JSON property gcsDestination

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1GcsDestination)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14954

def gcs_destination
  @gcs_destination
end

#persist_ml_use_assignment ⇒ `Boolean` Also known as: persist_ml_use_assignment?

Whether to persist the ML use assignment to data item system labels. Corresponds to the JSON property persistMlUseAssignment

Returns:

(Boolean)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14959

def persist_ml_use_assignment
  @persist_ml_use_assignment
end

#predefined_split ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1PredefinedSplit`

Assigns input data to training, validation, and test sets based on the value of a provided key. Supported only for tabular Datasets. Corresponds to the JSON property predefinedSplit

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1PredefinedSplit)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14966

def predefined_split
  @predefined_split
end

#saved_query_id ⇒ `String`

Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type. Corresponds to the JSON property savedQueryId

Returns:

(String)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14978

def saved_query_id
  @saved_query_id
end
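
Because only one of saved_query_id and annotation_schema_uri should be specified, a caller may want to guard against setting both before constructing the object. The check below is a hypothetical sketch of such a guard; check_annotation_selector! is not part of the gem, and the IDs are placeholders.

```ruby
# Hypothetical pre-flight check (not part of the gem): rejects argument
# hashes that set both saved_query_id and annotation_schema_uri.
def check_annotation_selector!(args)
  both = %i[saved_query_id annotation_schema_uri].all? { |key| !args[key].nil? }
  raise ArgumentError, 'specify only one of saved_query_id and annotation_schema_uri' if both

  args
end

# Placeholder IDs; only saved_query_id is set, so the hash passes the check.
ok_args = check_annotation_selector!(dataset_id: '123', saved_query_id: '456')
puts ok_args.inspect
```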

#stratified_split ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1StratifiedSplit`

Assigns input data to the training, validation, and test sets so that the distribution of values found in the categorical column (as specified by the key field) is mirrored within each split. The fraction values determine the relative sizes of the splits. For example, if the specified column has three values, with 50% of the rows having value "A", 25% value "B", and 25% value "C", and the split fractions are specified as 80/10/10, then the training set will constitute 80% of the training data, with about 50% of the training set rows having the value "A" for the specified column, about 25% having the value "B", and about 25% having the value "C". Only the top 500 occurring values are used; any values not in the top 500 values are randomly assigned to a split. If fewer than three rows contain a specific value, those rows are randomly assigned. Supported only for tabular Datasets. Corresponds to the JSON property stratifiedSplit

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1StratifiedSplit)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 14994

def stratified_split
  @stratified_split
end
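
The 80/10/10 example in the stratified_split description can be worked through on in-memory rows. The sketch below is a simplified illustration of the idea and not how Vertex AI implements it: stratified_split_sketch is a hypothetical function, it assigns rows in order rather than randomly, and it ignores the top-500 rule and the special handling of values with fewer than three rows.

```ruby
# Illustrative only: a simplified stratified split over in-memory rows.
# Each group of rows sharing a key value is cut by the same fractions,
# so every split mirrors the overall distribution of that key.
def stratified_split_sketch(rows, key:, fractions: [0.8, 0.1, 0.1])
  splits = { training: [], validation: [], test: [] }
  rows.group_by { |row| row[key] }.each_value do |group|
    n_train = (group.size * fractions[0]).round
    n_val = (group.size * fractions[1]).round
    splits[:training].concat(group[0, n_train])
    splits[:validation].concat(group[n_train, n_val] || [])
    splits[:test].concat(group[(n_train + n_val)..] || [])
  end
  splits
end

# 50% "A", 25% "B", 25% "C", as in the description.
rows = Array.new(20) { { label: 'A' } } +
       Array.new(10) { { label: 'B' } } +
       Array.new(10) { { label: 'C' } }
result = stratified_split_sketch(rows, key: :label)

training = result[:training]
share_a = training.count { |row| row[:label] == 'A' } / training.size.to_f
puts "training rows: #{training.size}, share of A: #{share_a}"
```

With these 40 rows, the training split holds 32 rows (80%), half of which carry the value "A".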

#timestamp_split ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1TimestampSplit`

Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to the training set, the next to the validation set, and the oldest to the test set. Supported only for tabular Datasets. Corresponds to the JSON property timestampSplit

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1TimestampSplit)




# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 15002

def timestamp_split
  @timestamp_split
end

Instance Method Details

#update!(**args) ⇒ `Object`

Update properties of this object.

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 15009

def update!(**args)
  @annotation_schema_uri = args[:annotation_schema_uri] if args.key?(:annotation_schema_uri)
  @annotations_filter = args[:annotations_filter] if args.key?(:annotations_filter)
  @bigquery_destination = args[:bigquery_destination] if args.key?(:bigquery_destination)
  @dataset_id = args[:dataset_id] if args.key?(:dataset_id)
  @filter_split = args[:filter_split] if args.key?(:filter_split)
  @fraction_split = args[:fraction_split] if args.key?(:fraction_split)
  @gcs_destination = args[:gcs_destination] if args.key?(:gcs_destination)
  @persist_ml_use_assignment = args[:persist_ml_use_assignment] if args.key?(:persist_ml_use_assignment)
  @predefined_split = args[:predefined_split] if args.key?(:predefined_split)
  @saved_query_id = args[:saved_query_id] if args.key?(:saved_query_id)
  @stratified_split = args[:stratified_split] if args.key?(:stratified_split)
  @timestamp_split = args[:timestamp_split] if args.key?(:timestamp_split)
end
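
Each assignment in update! is guarded by args.key?, so only the keys that are actually passed are overwritten, and passing a key with an explicit nil clears the attribute. The sketch below demonstrates this with a hypothetical stand-in class (UpdateSketch, two attributes only) rather than the generated class, so it runs without the gem; the IDs are placeholders.

```ruby
# Hypothetical stand-in reproducing the args.key? guard used by update!.
class UpdateSketch
  attr_accessor :dataset_id, :saved_query_id

  def update!(**args)
    @dataset_id = args[:dataset_id] if args.key?(:dataset_id)
    @saved_query_id = args[:saved_query_id] if args.key?(:saved_query_id)
  end
end

obj = UpdateSketch.new
obj.update!(dataset_id: '111', saved_query_id: '222')

obj.update!(dataset_id: '333')         # saved_query_id is left untouched
after_partial = obj.saved_query_id

obj.update!(saved_query_id: nil)       # an explicit nil clears the value
puts obj.dataset_id
puts after_partial
puts obj.saved_query_id.inspect
```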
