How can I give some attributes more influence? Some of my attributes are more important than others.
I ran k-means clustering in my process and now want some attributes to carry more weight. I tried the Generate Attributes operator, but it does not seem to affect the attributes. What do I need to do? Which approach or operator would make this work?
I have shared my XML below.
Thanks
-------------------
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="9.2.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
<parameter key="excel_file" value="C:\Users\selimcelebi\Desktop\k-means.xlsx"/>
<parameter key="sheet_selection" value="sheet number"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="F1:J36"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ürün ID.true.integer.attribute"/>
<parameter key="1" value="hacim.true.integer.attribute"/>
<parameter key="2" value="satış miktarı.true.integer.attribute"/>
<parameter key="3" value="ağırlık.true.real.attribute"/>
<parameter key="4" value="kırılganlık.true.polynominal.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">
<parameter key="attribute_name" value="ürün ID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="9.2.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="85">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="kırılganlık"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="coding_type" value="dummy coding"/>
<parameter key="use_comparison_groups" value="false"/>
<list key="comparison_groups"/>
<parameter key="unexpected_value_handling" value="all 0 and warning"/>
<parameter key="use_underscore_in_name" value="false"/>
</operator>
<operator activated="true" class="normalize" compatibility="9.2.001" expanded="true" height="103" name="Normalize" width="90" x="447" y="85">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="Z-transformation"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="1.0"/>
<parameter key="allow_negative_values" value="false"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="581" y="85">
<list key="function_descriptions">
<parameter key="ağırlık" value="[ağırlık]*0.2"/>
<parameter key="hacim" value="[hacim]*0.5"/>
<parameter key="satış miktarı" value="[satış miktarı]*0.3"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="782" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="F"/>
<parameter key="attributes" value="|F|ürün ID"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="238">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="F"/>
<parameter key="attributes" value="|ürün ID"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="983" y="238">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="set_macros" compatibility="9.2.001" expanded="true" height="82" name="Set Macros" width="90" x="916" y="34">
<list key="macros">
<parameter key="cluster_number" value="5"/>
</list>
</operator>
<operator activated="true" class="python_scripting:execute_python" compatibility="9.2.000" expanded="true" height="103" name="Execute Python" width="90" x="1050" y="34">
<parameter key="script" value="import pandas as pd&#10;from operator import itemgetter&#10;import numpy as np&#10;import random&#10;import sys&#10;from scipy.spatial import distance&#10;from sklearn.cluster import KMeans&#10;&#10;C = %{cluster_number}&#10;&#10;def k_means(X):&#10;    kmeans = KMeans(n_clusters=C, random_state=0).fit(X)&#10;    return kmeans.cluster_centers_&#10;&#10;def samesizecluster(D):&#10;    &quot;&quot;&quot; in: point-to-cluster-centre distances D, Npt x C&#10;        out: xtoc, X -&gt; C, equal-size clusters&#10;    &quot;&quot;&quot;&#10;    Npt, C = D.shape&#10;    clustersize = (Npt + C - 1) // C&#10;    xcd = list(np.ndenumerate(D))  # ((0,0), d00), ((0,1), d01) ...&#10;    xcd.sort(key=itemgetter(1))&#10;    xtoc = np.ones(Npt, int) * -1&#10;    nincluster = np.zeros(C, int)&#10;    nall = 0&#10;    for (x, c), d in xcd:&#10;        if xtoc[x] &lt; 0 and nincluster[c] &lt; clustersize:&#10;            xtoc[x] = c&#10;            nincluster[c] += 1&#10;            nall += 1&#10;            if nall &gt;= Npt:&#10;                break&#10;    return xtoc&#10;&#10;def rm_main(data):&#10;    data_2 = data.values&#10;    centres = k_means(data_2)&#10;    D = distance.cdist(data_2, centres)&#10;    xtoc = samesizecluster(D)&#10;    data['cluster'] = xtoc&#10;    return data"/>
<parameter key="use_default_python" value="true"/>
<parameter key="package_manager" value="conda (anaconda)"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role (2)" width="90" x="1184" y="34">
<parameter key="attribute_name" value="cluster"/>
<parameter key="target_role" value="cluster"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID" width="90" x="1318" y="34">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="1251" y="238">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="true"/>
<list key="key_attributes"/>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<connect from_port="input 1" to_op="Read Excel" to_port="file"/>
<connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
<connect from_op="Execute Python" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best Answer
yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 364; RM Data Scientist)
Hi @Selim,
Thanks for sharing your process with the integrated Python script.
I see that you are clustering your data with k-means. One option is to control the weighting through normalization: use the range transformation method to normalize the important attributes into a wider range, e.g. [0, 3], and normalize everything else into [0, 1].
Because k-means is computed here with Euclidean distance, an attribute normalized into a wider range contributes more to the calculated distance and therefore has more influence on the clusters.
Take the Iris data as an example. We put a much higher weight on a3 than on the other attributes a1, a2 and a4.
Here is the normalization process, applied before clustering:
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
<description align="center" color="blue" colored="true" width="126">The Iris data set is retrieved from the Samples folder.&lt;br/&gt;The label Attribute remains in the ExampleSet for comparison the results of the Clustering. It is not used in the Clustering itself.</description>
</operator>
<operator activated="true" class="normalize" compatibility="9.2.001" expanded="true" height="103" name="Normalize" width="90" x="179" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="a1|a2|a4"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="range transformation"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="1.0"/>
<parameter key="allow_negative_values" value="false"/>
</operator>
<operator activated="true" class="normalize" compatibility="9.2.001" expanded="true" height="103" name="Normalize (2)" width="90" x="313" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="a3"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="range transformation"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="3.0"/>
<parameter key="allow_negative_values" value="false"/>
</operator>
<operator activated="true" class="concurrency:k_means" compatibility="9.0.001" expanded="true" height="82" name="Clustering" origin="GENERATED_TUTORIAL" width="90" x="447" y="34">
<parameter key="add_cluster_attribute" value="true"/>
<parameter key="add_as_label" value="false"/>
<parameter key="remove_unlabeled" value="false"/>
<parameter key="k" value="3"/>
<parameter key="max_runs" value="10"/>
<parameter key="determine_good_start_values" value="false"/>
<parameter key="measure_types" value="BregmanDivergences"/>
<parameter key="mixed_measure" value="MixedEuclideanDistance"/>
<parameter key="nominal_measure" value="NominalDistance"/>
<parameter key="numerical_measure" value="EuclideanDistance"/>
<parameter key="divergence" value="SquaredEuclideanDistance"/>
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="1.0"/>
<parameter key="kernel_sigma1" value="1.0"/>
<parameter key="kernel_sigma2" value="0.0"/>
<parameter key="kernel_sigma3" value="2.0"/>
<parameter key="kernel_degree" value="3.0"/>
<parameter key="kernel_shift" value="1.0"/>
<parameter key="kernel_a" value="1.0"/>
<parameter key="kernel_b" value="0.0"/>
<parameter key="max_optimization_steps" value="100"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1992"/>
<description align="center" color="green" colored="true" width="126">The k-Means algorithm is used to determine three clusters on the Iris data set and assign each Example to one cluster.</description>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Normalize (2)" to_port="example set input"/>
<connect from_op="Normalize (2)" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="clustered set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="left" color="yellow" colored="false" height="281" resized="true" width="788" x="24" y="371">Look into the results of the process:&lt;br&gt;ExampleSet (Rename):&lt;br&gt;- cluster_0 consist mainly of iris_virginica Examples (36) with only a few (3) iris_versicolor Examples&lt;br&gt;- cluster_1 consists completely of iris_setosa Examples (50). Also iris_setosa Example cannot be found in other clusters.&lt;br&gt;- cluster_2 consists most of iris_versicolor Examples (47) but with also some (14) iris_virginica Examples&lt;br&gt;&lt;br&gt;ExampleSet (Clustering):&lt;br&gt;- You can visualize the assignment of the Examples to the clusters by using the 'Scatter' Chart, plotting two of the Attributes a1,a2,a3,a4 on x-and y-axis and the cluster Attribute as Color Column&lt;br&gt;&lt;br&gt;Cluster Model (Clustering):&lt;br&gt;- The Cluster Model consist information which Example is assigned to which cluster&lt;br/&gt;- the size of the clusters can be visualized as a graph&lt;br/&gt;- the position of the centroids is listed</description>
</process>
</operator>
</process>
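To see why the wider range acts as a weight, here is a minimal plain-Python sketch of the same idea outside RapidMiner. It is only an illustration: the `range_scale` and `euclidean` helpers and the toy values are hypothetical and not part of the process above. It rescales one attribute to [0, 1] and an "important" one to either [0, 1] or [0, 3], then compares the Euclidean distance between the same two rows.

```python
import math


def range_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of numbers into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def euclidean(p, q):
    """Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical toy data: one ordinary and one important attribute.
ordinary = [1.0, 2.0, 3.0, 5.0]
important = [10.0, 20.0, 40.0, 50.0]

ordinary_01 = range_scale(ordinary, 0.0, 1.0)
important_01 = range_scale(important, 0.0, 1.0)   # same range as the others
important_03 = range_scale(important, 0.0, 3.0)   # wider range = more weight

equal_rows = list(zip(ordinary_01, important_01))
weighted_rows = list(zip(ordinary_01, important_03))

# Distance between the first and last row under both scalings.
d_equal = euclidean(equal_rows[0], equal_rows[3])
d_weighted = euclidean(weighted_rows[0], weighted_rows[3])

print("both attributes in [0, 1]:", d_equal)
print("important attribute in [0, 3]:", d_weighted)
```

With the important attribute in [0, 3], a difference in that attribute counts three times as much in the distance (nine times as much in the squared distance) as the same relative difference in an attribute scaled to [0, 1], so it dominates which centroid a row is closest to.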
Best,
YY
Answers
Here is my latest XML; please check the coefficients.
----------------------------------