TRAIN MODEL
===========

Purpose
-------

Use the ``TRAIN MODEL`` statement to build a model by training a modeltype on selected columns of a table, or of several joined tables.

Syntax
------

Diagram
~~~~~~~

**trainModel**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/trainModel1.rrd.*

   .. image:: ../_static/rrd/trainModel2.rrd.*

**trainDataClause**

.. only:: html

   .. raw:: html

**columnNameList**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/columnNameList.rrd.*

**trainDataConditionClause**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/trainDataConditionClause.rrd.*

**trainSampleClause**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/trainSampleClause.rrd.*

**trainModelOptionsClause**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/trainModelOptionsClause.rrd.*

**optionKeyValue**

.. only:: html

   .. raw:: html

.. only:: latex

   .. image:: ../_static/rrd/optionKeyValue.rrd.*

Keywords and Parameters
~~~~~~~~~~~~~~~~~~~~~~~

**modelName**
   This is an identifier that specifies the name of the model to be built.

**modeltypeName**
   This is an identifier that specifies the name of the modeltype to be used for model training.

**UPDATE**
   Use the UPDATE clause to build the model by further training an existing model on additional data.

**LIKE**
   Use the LIKE clause to train a model on the same columns as an existing model.

**exModelName**
   This is an identifier that specifies the name of the existing model.

**trainDataClause**
   Specify the target data for model training.
   To train a model on columns from multiple tables, combine the tables with the JOIN clause.

**schemaName**
   This is an identifier that specifies the name of the schema that contains the training target table.
   If omitted, the current (default) schema is used.

**tableName**
   This is an identifier that specifies the name of the training target table.

**columnNameList**
   Specify the target columns for model training as a comma-separated list of one or more column names.

**trainDataConditionClause**
   Specify the conditions for retrieving the target data for model training.
   Use this clause to give the join conditions when training a model on multiple tables, or to filter the target data when updating an existing model.

**trainSampleClause**
   Use the SAMPLE clause to train on only a part of the original table.

**trainModelOptionsClause**
   Specify the model training options, including hyperparameters such as ``epochs``.
   The available options depend on the modeltype.

**'optionKey'**
   This is a string literal that specifies the key of the option.

**optionValue**
   This is a string literal or a numeric value that specifies the value of the option.

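
The following Python sketch shows how the single-table clauses described above combine into one statement. ``build_train_model``, ``render_value``, and ``quote_literal`` are hypothetical helpers, not part of any product API. The sketch assumes three things. First, the data source is introduced with ``FROM``, as in the multi-table and update examples. Second, several options are separated by commas. Third, a string literal escapes an embedded quote by doubling it, as in standard SQL. It does not cover JOIN, SAMPLE, UPDATE, or LIKE.

.. code-block:: python

   from typing import Dict, Optional, Sequence, Union

   OptionValue = Union[str, int, float]


   def quote_literal(text: str) -> str:
       # Assumed SQL-style string literal: single quotes, embedded quotes doubled.
       return "'" + text.replace("'", "''") + "'"


   def render_value(value: OptionValue) -> str:
       # optionValue is either a string literal or a numeric value.
       if isinstance(value, bool):
           raise TypeError("option values must be strings or numbers")
       if isinstance(value, str):
           return quote_literal(value)
       return str(value)


   def build_train_model(
       model_name: str,
       modeltype_name: str,
       table_name: str,
       columns: Sequence[str],
       schema_name: Optional[str] = None,
       options: Optional[Dict[str, OptionValue]] = None,
   ) -> str:
       if not columns:
           raise ValueError("columnNameList needs at least one column")
       # Without a schema prefix, the current (default) schema applies.
       table = f"{schema_name}.{table_name}" if schema_name else table_name
       lines = [
           f"TRAIN MODEL {model_name} MODELTYPE {modeltype_name}",
           f"FROM {table}({', '.join(columns)})",
       ]
       if options:
           # 'optionKey' is always written as a string literal.
           pairs = ", ".join(
               f"{quote_literal(key)} = {render_value(value)}"
               for key, value in options.items()
           )
           lines.append(f"OPTIONS ( {pairs} )")
       return "\n".join(lines) + ";"


   print(build_train_model(
       "tgan", "tablegan", "order_products",
       ["reordered", "add_to_cart_order"],
       schema_name="instacart",
       options={"epochs": 100},
   ))

Run as shown, the sketch prints a statement that trains ``tgan`` with ``'epochs' = 100``. Identifiers are inserted verbatim and are neither validated nor quoted.
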
Examples
--------

Training a Model
~~~~~~~~~~~~~~~~

The following statement trains a model ``tgan`` of the ``tablegan`` modeltype on the columns ``reordered`` and ``add_to_cart_order`` of the ``order_products`` table in the ``instacart`` schema.

.. code-block:: console

   TRAIN MODEL tgan MODELTYPE tablegan
   FROM instacart.order_products(reordered, add_to_cart_order);

To also specify the ``epochs`` hyperparameter, add an ``OPTIONS`` clause:

.. code-block:: console

   TRAIN MODEL tgan MODELTYPE tablegan
   FROM instacart.order_products(reordered, add_to_cart_order)
   OPTIONS ( 'epochs' = 100 );

To train a model on data from multiple tables, list each table with its columns, combine the tables with ``JOIN``, and give the join condition with ``ON``:

.. code-block:: console

   TRAIN MODEL tgan_multi_tables MODELTYPE tablegan
   FROM instacart.order_products(reordered, add_to_cart_order, order_id)
   JOIN instacart.orders(order_id, order_dow)
   ON orders.order_id = order_products.order_id;

Updating a Model
~~~~~~~~~~~~~~~~

The following statements first train a model ``rspn_op`` of the ``rspn`` modeltype on the columns ``reordered`` and ``add_to_cart_order`` of the ``order_products`` table in the ``instacart`` schema. They then build a new model ``rspn_op_update`` by updating ``rspn_op`` with additional data selected by the condition ``order_products.order_id > 3000000``.

.. code-block:: console

   TRAIN MODEL rspn_op MODELTYPE rspn
   FROM instacart.order_products(reordered, add_to_cart_order);

   TRAIN MODEL rspn_op_update UPDATE rspn_op
   ON order_products.order_id > 3000000;
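
The update form above needs only three pieces: the new model name, the existing model, and the condition that selects the additional data. The following Python sketch assembles it, mirroring the ``rspn_op_update`` example. ``build_update_model`` is a hypothetical helper, and the condition is passed through verbatim as SQL text. Like the example, the sketch always emits the ``ON`` clause. Whether that clause is optional is not covered here.

.. code-block:: python

   def build_update_model(new_model: str, existing_model: str, condition: str) -> str:
       # UPDATE names the existing model whose training is continued.
       # The ON clause (trainDataConditionClause) selects the additional data.
       if not condition.strip():
           raise ValueError("a condition selecting the additional data is required")
       return f"TRAIN MODEL {new_model} UPDATE {existing_model}\nON {condition};"


   print(build_update_model("rspn_op_update", "rspn_op",
                            "order_products.order_id > 3000000"))
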