Model formats#

Models exist in multiple formats:

  • Yggdrasil model: A directory containing a Yggdrasil model. The directory is recognizable by a <optional prefix>data_spec.pb file. This is the native format of the library. This format is used with the CLI, C++, and Go APIs.

  • Zipped Yggdrasil model: A zip file containing a Yggdrasil model. This format is used with the JavaScript serving API.

  • A TensorFlow Decision Forests model: This is a TensorFlow SavedModel containing one or more Yggdrasil models and recognizable by a saved_model.pb file . A SavedModel itself is a directory and the Yggdrasil models are stored in one of its subdirectories. The directory containing the Yggdrasil models (currently named assets). This format is used with the TensorFlow Decision Forests API.

Models can be “converted” between formats:

  • Yggdrasil model → Zipped Yggdrasil model: Zip the model directory.

  • Zipped Yggdrasil model → Yggdrasil model: Unzip the model directory.

  • TensorFlow Decision Forests model → Yggdrasil model: The Yggdrasil model is contained in the assets sub-directory of the TensorFlow Decision Forests model directory. See details in the the next section.

  • (Zipped) Yggdrasil model → TensorFlow Decision Forests model: This conversion requires some code. See the next section.

Convert a a TensorFlow Decision Forests model to a Yggdrasil model#

A TensorFlow Decision Forests model directory contains a sub-directory containing a Yggdrasil model. In the next example, we create a TF-DF model in /tmp/model, and show the Yggdrasil model in /tmp/model/assets.

import tensorflow_decision_forests as tfdf

# Train a TF-DF model (without any pre-processing)
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(...)

# Export the model as a TF Saved Model
# Note: /tmp/model/assets is an Yggdrasil Decision Forests model
model.save("/tmp/model")

# Show the structure of the TF SavedModel.
!tree /tmp/model
# /tmp/model
# ├── assets
# │   ├── <prefix>data_spec.pb
# │   ├── <prefix>done
# │   ├── <prefix>gradient_boosted_trees_header.pb
# │   ├── <prefix>header.pb
# │   └── <prefix>nodes-00000-of-00001
# ├── keras_metadata.pb
# ├── saved_model.pb
# └── variables
#     ├── variables.data-00000-of-00001
#     └── variables.index

Yggdrasil tools can be used. The next example use the show_model Yggdrasil command on a TF-DF model:

# Show the model structure using Yggdrasil tool box
!yggdrasil_decision_forests/cli/show_model --model=/tmp/model/assets

An Yggdrasil model contained in a TF-DF model differ from the TF-DF model in the following points:

  • Any TensorFlow preprocessing (e.g., using of the preprocessor model constructor argument) is not taken into account.

  • Categorical integer features need to be offset by 1. Therefore, for simplicity of use, it is better to use categorical string features.

Convert a Yggdrasil model to a TensorFlow Decision Forests model#

The conversion of an Yggdrasil model to to a TensorFlow Decision Forests is done using the tfdf.keras.yggdrasil_model_to_keras_model() function available in in TensorFlow Decision Forests.

The following example converts a YDF model into a TensorFlow Decision Forests model:

# Prepare and load the model with TensorFlow
import tensorflow_decision_forests as tfdf

tfdf.keras.yggdrasil_model_to_keras_model(<path to ydf model>, <path to tfdf-model>)

By default, yggdrasil_model_to_keras_model creates a model with raw tensor inputs. The type of the input tensor is determined by the semantics of the features. For example, numerical features are always encoded as float32. You can change this signature using the input_model_signature_fn argument.

The following example converts a YDF model while making sure that feature_1 is encoded into a int64 tensor.

def custom_model_input_signature(inspector) -> Any:

    # Default signature.
    input_spec = tfdf.keras.build_default_input_model_signature(inspector)

    # Override the type of feature_1.
    input_spec["feature_1"] = tf.TensorSpec(shape=[None], dtype=tf.int64)
    return input_spec

tfdf.keras.yggdrasil_model_to_keras_model(
    ygg_model_path,
    tfdf_model_path,
    input_model_signature_fn=custom_model_input_signature)

In YDF, pre-integerized categorical features (i.e., categorical features represented as integers) are 1-indexed. The value 0 (zero) is reserved for the out-of-vocabulary item. However, many TensorFlow tools assume that categorical integer features are 0-indexed. For this reason, TF-DF models are 0-indexed and contain an implicit +1 offset to the YDF model underneath. This offset does not impact users in most cases. However, when converting models, depending on how this model will be used, removing this offset might be better. This is done with the disable_categorical_integer_offset_correction argument.