Write filters to disk

Regular Contributor

The ModelGrouper operator is a convenient way to write several preprocessing and prediction models to disk
together. However, a data mining process often also contains filters such as the "FeatureNameFilter"
operator, and these are not written to disk when the ModelWriter is used.

In the following process, is there a way to also dump the "FeatureNameFilter" into a file, so that the complete
process can later be read back in and applied to unseen data?

<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">

  <operator name="Root" class="Process" expanded="yes">
      <parameter key="logverbosity" value="init"/>
      <parameter key="random_seed" value="2001"/>
      <parameter key="encoding" value="SYSTEM"/>
      <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
          <parameter key="target_function" value="polynomial classification"/>
          <parameter key="number_examples" value="100"/>
          <parameter key="number_of_attributes" value="5"/>
          <parameter key="attributes_lower_bound" value="-10.0"/>
          <parameter key="attributes_upper_bound" value="10.0"/>
          <parameter key="local_random_seed" value="-1"/>
          <parameter key="datamanagement" value="double_array"/>
      </operator>
      <operator name="NoiseGenerator" class="NoiseGenerator">
          <parameter key="random_attributes" value="3"/>
          <parameter key="label_noise" value="0.05"/>
          <parameter key="default_attribute_noise" value="0.0"/>
          <list key="noise">
          </list>
          <parameter key="offset" value="0.0"/>
          <parameter key="linear_factor" value="1.0"/>
          <parameter key="local_random_seed" value="-1"/>
      </operator>
      <operator name="Normalization" class="Normalization">
          <parameter key="return_preprocessing_model" value="true"/>
          <parameter key="create_view" value="false"/>
          <parameter key="method" value="Z-Transformation"/>
          <parameter key="min" value="0.0"/>
          <parameter key="max" value="1.0"/>
      </operator>
      <operator name="FeatureNameFilter" class="FeatureNameFilter">
          <parameter key="filter_special_features" value="false"/>
          <parameter key="skip_features_with_name" value="result"/>
      </operator>
      <operator name="NearestNeighbors" class="NearestNeighbors">
          <parameter key="keep_example_set" value="false"/>
          <parameter key="k" value="3"/>
          <parameter key="weighted_vote" value="false"/>
          <parameter key="measure_types" value="MixedMeasures"/>
          <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
          <parameter key="nominal_measure" value="NominalDistance"/>
          <parameter key="numerical_measure" value="EuclideanDistance"/>
          <parameter key="divergence" value="GeneralizedIDivergence"/>
          <parameter key="kernel_type" value="radial"/>
          <parameter key="kernel_gamma" value="1.0"/>
          <parameter key="kernel_sigma1" value="1.0"/>
          <parameter key="kernel_sigma2" value="0.0"/>
          <parameter key="kernel_sigma3" value="2.0"/>
          <parameter key="kernel_degree" value="3.0"/>
          <parameter key="kernel_shift" value="1.0"/>
          <parameter key="kernel_a" value="1.0"/>
          <parameter key="kernel_b" value="0.0"/>
      </operator>
      <operator name="ModelGrouper" class="ModelGrouper">
      </operator>
      <operator name="ModelWriter" class="ModelWriter">
          <parameter key="model_file" value="combined_model_bin.mod"/>
          <parameter key="overwrite_existing_file" value="true"/>
          <parameter key="output_type" value="XML"/>
      </operator>
  </operator>

</process>

Elite II

Re: Write filters to disk

Hi Chris,
unfortunately, this is not possible. You still have to design a separate process for application. But there is a trick that simplifies this:
If you store all the preprocessing steps in a single process, you can load and apply it in both the training process and the application process using the process embedder. The embedded process then behaves like a model itself.
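
As a rough sketch of how the application process might look, assuming the filter has been saved on its own in a hypothetical file preprocessing.xml (a Root operator containing only the FeatureNameFilter from the training process): the operator classes ExampleSource, ProcessEmbedder, ModelLoader and ModelApplier, the parameters process_file and use_input, and the file name unseen_data.aml are assumptions from memory of the 4.x operator set and are not confirmed in this thread, so please check them against your installation.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">

  <operator name="Root" class="Process" expanded="yes">
      <!-- Assumption: unseen data is described by a hypothetical attribute description file -->
      <operator name="ExampleSource" class="ExampleSource">
          <parameter key="attributes" value="unseen_data.aml"/>
      </operator>
      <!-- Assumption: runs the shared preprocessing process (the filter) on the incoming example set -->
      <operator name="ProcessEmbedder" class="ProcessEmbedder">
          <parameter key="process_file" value="preprocessing.xml"/>
          <parameter key="use_input" value="true"/>
      </operator>
      <!-- Loads the grouped model written by the ModelWriter in the training process -->
      <operator name="ModelLoader" class="ModelLoader">
          <parameter key="model_file" value="combined_model_bin.mod"/>
      </operator>
      <!-- Applies the grouped normalization and nearest neighbor models -->
      <operator name="ModelApplier" class="ModelApplier">
      </operator>
  </operator>

</process>
```

The training process would embed the same preprocessing.xml in place of its own FeatureNameFilter, so both processes share one definition of the filter. Note that in this sketch the filter runs before the stored normalization model, whereas in the training process above it runs after the Normalization operator; whether that difference matters depends on your data.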
