W-Apriori Missing Parameter
Hey,
I am new to data mining, and to RapidMiner in particular. I am currently working on my bachelor thesis, in which I have to compare the FP-Growth and Apriori algorithms.
The FP-Growth algorithm within RapidMiner works fine, but when I try to use Apriori via the Weka extension there is a problem. The help window says there is a parameter Z (Z: Treat zero (i.e. first value of nominal attributes) as missing. Range: boolean; default: false), but the operator does not offer this parameter in the process. Is there a way to set it anyway? At the moment my Apriori results are wrong, and I think setting this parameter to true could fix that.
My dataset looks like this: http://s7.directupload.net/file/d/2998/6noxngwr_png.htm
This is what the XML output looks like:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="643" width="1028">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="210" y="190">
<parameter key="repository_entry" value="Transformierte Transatkionsdatenbank"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="435">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Hol_Link_1"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="weka:W-Apriori" compatibility="5.1.001" expanded="true" height="60" name="W-Apriori" width="90" x="648" y="255">
<parameter key="M" value="0.5"/>
<parameter key="I" value="true"/>
<parameter key="V" value="true"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="W-Apriori" to_port="example set"/>
<connect from_op="W-Apriori" from_port="associator" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
The operator manual is outdated in this spot. However, you can use the Declare Missing Value operator before W-Apriori to explicitly define 0 as missing.
Best, Marius
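To illustrate what declaring 0 as missing is meant to achieve, here is a minimal Python sketch of 1-itemset counting. It is not Weka's or RapidMiner's implementation; the toy transactions, the column name "Item B" and the helper `one_itemset_support` are made up for illustration, and it assumes that a missing value simply contributes no item to the count.

```python
from collections import Counter

# Hypothetical toy data: each row is one transaction with 0/1 attributes.
transactions = [
    {"Abra Alba": 1, "Item B": 0},
    {"Abra Alba": 1, "Item B": 1},
    {"Abra Alba": 0, "Item B": 1},
    {"Abra Alba": 1, "Item B": 0},
]


def one_itemset_support(rows, zero_as_missing=False):
    """Count how often each (attribute, value) pair occurs in the rows."""
    counts = Counter()
    for row in rows:
        for attribute, value in row.items():
            if zero_as_missing and value == 0:
                # Assumption: a missing value contributes no item at all.
                continue
            counts[(attribute, value)] += 1
    return counts


# Without special handling, "attribute=0" is counted like any other item.
with_zeros = one_itemset_support(transactions)
# With 0 treated as missing, only "attribute=1" items remain.
only_ones = one_itemset_support(transactions, zero_as_missing=True)

print(with_zeros)
print(only_ones)
```

In the first count, "Abra Alba=0" shows up as an item of its own; in the second, only the "=1" items are left, which is the kind of result the Z option describes.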
But it seems that doesn't solve my problem. The algorithm generates the wrong large itemsets.
As you can see in the picture below, the algorithm creates, for example, a large itemset for Abra Alba=0 with a support of 173.
But there is no large itemset for Abra Alba=1, although its support is over 500.
I only want itemsets where the attribute value is 1.
I thought declaring 0 as a missing value would help, but when I use the Declare Missing Value operator as you suggested, I get no results.
Is there maybe something wrong with my data set?
Large 1-itemsets
http://s1.directupload.net/file/d/3000/6pw5klye_png.htm
EDIT: This seems to be the same problem as in this thread: http://rapid-i.com/rapidforum/index.php?topic=2281.0
But the solution suggested there isn't working correctly either.