Bug while running clustering with the Data to Similarity operator

daniel_foerster Member Posts: 5 Contributor I
edited December 2018 in Help
Hello RapidMiner Community,

I think I found a bug in the Data to Similarity operator.

I'm clustering a small data set (around 20k rows), and since updating to 9.0.003 the Data to Similarity operator no longer works.

Without that operator enabled, clustering works fine. With it enabled, RapidMiner takes up around 97% of memory and the process I ran doesn't show up in the result history.

To verify that I'm not doing something wrong, I tested the following:
- Executing only the Read Database operator ---> works
- Executing the whole process without data2sim ---> works

- Clustering numerical ID + numerical value + data2sim ---> Bug
- Clustering nominal value + text + data2sim ---> Bug
- Clustering text with text + data2sim ---> Bug
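
In process XML terms, the runs "without data2sim" presumably correspond to the operator's "activated" attribute being switched off. A minimal sketch, assuming the operator was disabled rather than deleted (the attribute and operator names are taken from the process posted below; the parameters are omitted here for brevity):

```xml
<!-- Sketch only: Data to Similarity left in the process but disabled.
     Assumption: disabling the operator flips activated to "false". -->
<operator activated="false" class="data_to_similarity" compatibility="9.0.003" expanded="true" height="82" name="Data to Similarity" width="90" x="581" y="340">
  <!-- measure and kernel parameters unchanged from the posted process -->
</operator>
```

With the attribute set back to "true", the same process would reproduce the failing runs listed above.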

I attached the process XML and the RapidMiner log file below.

Thanks for any help in this case!

My System:
CPU: Intel Xeon Gold 6140 @ 2.3 GHz
RAM: 64 GB
HDD: 1 TB
Windows Server 2016 64-bit


Process:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="jdbc_connectors:read_database" compatibility="9.0.003" expanded="true" height="68" name="Read Database" width="90" x="45" y="34">
    <parameter key="define_connection" value="predefined"/>
    <parameter key="connection" value="DB"/>
    <parameter key="database_system" value="MySQL"/>
    <parameter key="define_query" value="query"/>
    <parameter key="query" value="SELECT *&#10;FROM &quot;dbo&quot;.&quot;barillakaeufer&quot;"/>
    <parameter key="use_default_schema" value="true"/>
    <parameter key="prepare_statement" value="false"/>
    <enumeration key="parameters"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="Quantity_Barillakaeufer|receipt_header_id"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="shuffle" compatibility="9.0.003" expanded="true" height="82" name="Shuffle" width="90" x="313" y="34">
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1234567890"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="normalize" compatibility="9.0.003" expanded="true" height="103" name="Normalize" width="90" x="447" y="34">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="Z-transformation"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    <parameter key="allow_negative_values" value="false"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="multiply" compatibility="9.0.003" expanded="true" height="103" name="Multiply" width="90" x="45" y="187"/>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="remove_correlated_attributes" compatibility="9.0.003" expanded="true" height="82" name="Remove Correlated Attributes" width="90" x="246" y="340">
    <parameter key="correlation" value="0.8"/>
    <parameter key="filter_relation" value="greater equals"/>
    <parameter key="attribute_order" value="original"/>
    <parameter key="use_absolute_correlation" value="true"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="x_means" compatibility="9.0.003" expanded="true" height="82" name="X-Means" width="90" x="447" y="289">
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="true"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="k_min" value="2"/>
    <parameter key="k_max" value="5"/>
    <parameter key="determine_good_start_values" value="true"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    <parameter key="clustering_algorithm" value="KMeans"/>
    <parameter key="max_runs" value="10"/>
    <parameter key="max_optimization_steps" value="100"/>
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1234567890"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="data_to_similarity" compatibility="9.0.003" expanded="true" height="82" name="Data to Similarity" width="90" x="581" y="340">
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <operator activated="true" class="concurrency:correlation_matrix" compatibility="9.0.003" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="187">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="normalize_weights" value="true"/>
    <parameter key="squared_correlation" value="false"/>
  </operator>
</process>

RM Log File:
Nov 19, 2018 2:31:05 PM com.rapidminer.gui.RapidMinerGUI run
INFORMATION: Launching RapidMiner 9.0.003, platform WIN64
Nov 19, 2018 2:31:05 PM com.rapidminer.tools.I18N <clinit>
INFO: Set locale to en.
Nov 19, 2018 2:31:06 PM com.rapidminer.core.license.ProductConstraintManager initialize
INFO: Initializing license manager.
Nov 19, 2018 2:31:06 PM com.rapidminer.core.license.ProductConstraintManager initialize
INFO: Using default license location.
Nov 19, 2018 2:31:06 PM com.rapidminer.core.license.ProductConstraintManager initialize
INFO: Registering default product.
Nov 19, 2018 2:31:06 PM com.rapidminer.search.GlobalSearchRegistry registerSearchCategory
INFO: Global Search category repository registered.
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Cloud Connectivity
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Cloud Execution
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Clustering Performance Plugin
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Data Editor
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: H2O
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Model Simulator
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Operator Recommender
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Process Scheduling
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Advanced File Connectors
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Concurrency
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: JDBC Connectors
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Legacy Result Access
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Productivity
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Professional
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Remote Repository
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Social Media
Nov 19, 2018 2:31:06 PM com.rapidminer.tools.plugin.Plugin registerPlugins
INFO: Register plugin: Time Series
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Twitter Connection.
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Salesforce Connection.
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Amazon S3 Connection.
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Azure Blob Storage Connection.
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Google Cloud Storage Connection.
Nov 19, 2018 2:31:08 PM com.rapidminer.tools.config.ConfigurationManager register
INFO: Registered configurator Dropbox Connection.
Nov 19, 2018 2:31:09 PM com.rapidminer.extension.jdbc.tools.jdbc.JDBCProperties <init>
WARNING: Missing database driver class name for ODBC Bridge (e.g. Access)
Nov 19, 2018 2:31:09 PM com.rapidminer.extension.jdbc.tools.jdbc.JDBCProperties <init>
WARNING: Missing database driver class name for Ingres
Nov 19, 2018 2:31:09 PM com.rapidminer.extension.jdbc.tools.jdbc.JDBCProperties registerDrivers
WARNING: Driver jar file C:\Program Files\RapidMiner\RapidMiner Studio referenced for JDBC driver Test does not exist.
Nov 19, 2018 2:31:09 PM com.rapidminer.extension.jdbc.tools.jdbc.JDBCProperties registerDrivers
INFO: JDBC driver  not found in 
Nov 19, 2018 2:31:11 PM com.rapidminer.repository.RepositoryManager registerExtensionSamples
INFO: Registered 'Time Series' as sample folder.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Process Scheduling was loaded in 146ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Professional was loaded in 148ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Productivity was loaded in 150ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Operator Recommender was loaded in 150ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Legacy Result Access was loaded in 153ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Clustering Performance Plugin was loaded in 157ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Remote Repository was loaded in 159ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Social Media was loaded in 171ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Advanced File Connectors was loaded in 185ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Model Simulator was loaded in 192ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Data Editor was loaded in 194ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Cloud Execution was loaded in 224ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Time Series was loaded in 234ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Concurrency was loaded in 255ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension H2O was loaded in 447ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension Cloud Connectivity was loaded in 450ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.tools.plugin.Plugin initAll
INFO: Extension JDBC Connectors was loaded in 1609ms.
Nov 19, 2018 2:31:11 PM com.rapidminer.search.GlobalSearchRegistry registerSearchCategory
INFO: Global Search category operator registered.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Amazon S3 Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Amazon S3 Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Amazon S3 Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Azure Blob Storage Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Azure Blob Storage Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Azure Blob Storage Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Dropbox Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Dropbox Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Dropbox Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Salesforce Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Salesforce Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Salesforce Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Google Cloud Storage Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Google Cloud Storage Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Google Cloud Storage Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Load configuration for Twitter Connection.
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ClientConfigurationManager loadAllParameters
INFO: No configuration file found for Twitter Connection
Nov 19, 2018 2:31:18 PM com.rapidminer.tools.config.ConfigurationManager loadConfiguration
INFO: Loaded configurations for 0 objects of type Twitter Connection.
Nov 19, 2018 2:31:19 PM com.rapidminer.search.GlobalSearchRegistry registerSearchCategory
INFO: Global Search category actions registered.
Nov 19, 2018 2:31:22 PM com.rapidminer.gui.search.GlobalSearchGUIRegistry registerSearchVisualizationProvider
INFO: Global Search GUI provider added for category operator.
Nov 19, 2018 2:31:22 PM com.rapidminer.gui.search.GlobalSearchGUIRegistry registerSearchVisualizationProvider
INFO: Global Search GUI provider added for category repository.
Nov 19, 2018 2:31:22 PM com.rapidminer.gui.search.GlobalSearchGUIRegistry registerSearchVisualizationProvider
INFO: Global Search GUI provider added for category actions.
Nov 19, 2018 2:31:22 PM com.rapidminer.search.GlobalSearchRegistry registerSearchCategory
INFO: Global Search category marketplace registered.
Nov 19, 2018 2:31:22 PM com.rapidminer.gui.search.GlobalSearchGUIRegistry registerSearchVisualizationProvider
INFO: Global Search GUI provider added for category marketplace.
Nov 19, 2018 2:31:47 PM com.rapidminer.Process setProcessLocation
INFO: Decoupling process from location C:\Users\dfoerste\.RapidMiner\autosave\autosaved_process.xml. Process is now associated with file //Local Repository/processes/Clusteranalyse.
Nov 19, 2018 2:34:30 PM com.rapidminer.tools.ResultService init
INFO: No filename given for result file, using stdout for logging results!
Nov 19, 2018 2:34:30 PM com.rapidminer.Process execute
INFO: Process //Local Repository/processes/Clusteranalyse starts
Nov 19, 2018 2:34:30 PM com.rapidminer.extension.jdbc.tools.jdbc.DatabaseHandler executeStatement
INFO: Executing query: 'SELECT *
FROM "dbo"."barillakaeufer"'
Nov 19, 2018 2:34:31 PM com.rapidminer.Process saveResults
INFO: Saving results.
Nov 19, 2018 2:34:31 PM com.rapidminer.Process execute
INFO: Process //Local Repository/processes/Clusteranalyse finished successfully after 0 s

Answers

  • mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,874  RM Data Scientist
    Hi @daniel_foerster,
    I had some problems importing your process. Could you share the RMP file?
    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • daniel_foerster Member Posts: 5 Contributor I
    Hi @mschmitz,

    I'm sorry that you're having problems. The RMP file is attached now.

    This is my bachelor thesis project and I'm using the educational license. I forgot to mention this above!
  • mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,874  RM Data Scientist
    Hi @daniel_foerster,
    X-Means has a bug in the current version that causes it to return twice the number of examples. Can you try setting its compatibility level to 9.0.0?
    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
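
    As a rough illustration of the suggestion above, the compatibility level is stored as an attribute on the operator in the process XML. A minimal sketch, assuming "9.0.0" is written as "9.0.000" in the XML (an assumption based on the "9.0.003" format used in the posted process) and showing only two of the X-Means parameters:

    ```xml
    <!-- Sketch only: X-Means from the posted process with a lowered
         compatibility level. "9.0.000" is an assumed spelling of "9.0.0". -->
    <operator activated="true" class="x_means" compatibility="9.0.000" expanded="true" height="82" name="X-Means" width="90" x="447" y="289">
      <!-- remaining parameters unchanged from the posted process -->
      <parameter key="k_min" value="2"/>
      <parameter key="k_max" value="5"/>
    </operator>
    ```

    Only the compatibility attribute differs from the original operator; the same setting should also be reachable through the operator's compatibility level field in the Studio GUI.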
  • daniel_foerster Member Posts: 5 Contributor I
    Hi @mschmitz,

    I changed the compatibility level of X-Means to 9.0.0, but I still don't get results and don't see the process in the process history.


  • daniel_foerster Member Posts: 5 Contributor I
    Hi @mschmitz,

    Do you have any update on my problem?

    Thanks in advance.