
Recommender System on RapidMiner

hattanhattan Member Posts: 7 Contributor II
edited November 2018 in Help
Hi,

I'm trying to build a recommendation system that recommends goals to users based on their goal lists.

It's my senior project. I'm working on it on my own and running out of time,

so please, I need any available help or advice you can provide.


I have my dataset (125 users, 7 categories, 1800 goal names) as an Access database:

image


I clustered each goal category separately and got 5 clusters per category:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
   <process expanded="true" height="431" width="681">
     <operator activated="true" class="retrieve" compatibility="5.1.014" expanded="true" height="60" name="Retrieve" width="90" x="6" y="38">
       <parameter key="repository_entry" value="Travel &amp; Entertainment"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
       <parameter key="attribute_filter_type" value="subset"/>
       <parameter key="attribute" value="Goal  Name"/>
       <parameter key="attributes" value="Goal  Name|Goal-ID|User_ID"/>
     </operator>
     <operator activated="true" class="nominal_to_text" compatibility="5.1.014" expanded="true" height="76" name="Nominal to Text" width="90" x="45" y="210">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="Goal  Name"/>
     </operator>
     <operator activated="true" class="text:process_document_from_data" compatibility="5.1.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="45" y="300">
       <parameter key="vector_creation" value="Binary Term Occurrences"/>
       <list key="specify_weights"/>
       <process expanded="true" height="370" width="563">
         <operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
         <operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" height="60" name="Transform Cases" width="90" x="45" y="120"/>
         <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="30"/>
         <operator activated="true" class="text:stem_porter" compatibility="5.1.004" expanded="true" height="60" name="Stem (Porter)" width="90" x="448" y="30"/>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
         <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
         <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
         <connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="k_means" compatibility="5.1.014" expanded="true" height="76" name="Clustering" width="90" x="179" y="30">
       <parameter key="k" value="5"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="255">
       <parameter key="attribute_filter_type" value="subset"/>
       <parameter key="attributes" value="|Goal-ID|User_ID|cluster"/>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
     <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
     <connect from_op="Process Documents from Data" from_port="example set" to_op="Clustering" to_port="example set"/>
     <connect from_op="Clustering" from_port="clustered set" to_op="Select Attributes (2)" to_port="example set input"/>
     <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
image


I did a lot of manual work: I renamed the clusters, made a crosstab query of cluster names against user ID, and did other modifications to get binominal values as shown here (f = the user has no goal in the cluster, t = they do):



image
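
To make the manual step concrete, here is a minimal Python sketch (outside RapidMiner) of the crosstab I built by hand. The function name `user_cluster_table`, the cluster labels, and the sample rows are made up for illustration; only the idea of one true/false value per user and cluster comes from my table.

```python
from collections import defaultdict

def user_cluster_table(rows, clusters):
    """Pivot (user_id, cluster) rows into one true/false row per user."""
    seen = defaultdict(set)
    for user_id, cluster in rows:
        seen[user_id].add(cluster)
    # True = the user has at least one goal in that cluster (my "t"),
    # False = the user has none (my "f").
    return {
        user_id: {c: (c in user_clusters) for c in clusters}
        for user_id, user_clusters in seen.items()
    }

# Hypothetical sample rows: (User_ID, cluster label)
rows = [(1, "cluster_0"), (1, "cluster_2"), (2, "cluster_0"), (3, "cluster_1")]
clusters = ["cluster_0", "cluster_1", "cluster_2"]
table = user_cluster_table(rows, clusters)
```

This is the transformation I would like to do inside the process itself instead of in Access.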

Then, in another process, I apply association rules to the frequent item sets, as here:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
   <process expanded="true" height="390" width="614">
     <operator activated="true" class="retrieve" compatibility="5.1.014" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
       <parameter key="repository_entry" value="read User_Cluster"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="User-ID"/>
       <parameter key="invert_selection" value="true"/>
     </operator>
     <operator activated="true" class="fp_growth" compatibility="5.1.014" expanded="true" height="76" name="FP-Growth" width="90" x="313" y="75">
       <parameter key="min_support" value="0.5"/>
     </operator>
     <operator activated="true" class="create_association_rules" compatibility="5.1.014" expanded="true" height="76" name="Create Association Rules" width="90" x="447" y="75">
       <parameter key="min_confidence" value="0.5"/>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
     <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
     <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

image
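
For anyone who wants to see what the two thresholds mean, here is a rough Python sketch of support and confidence on a true/false table like mine. It is a brute-force check over pairs of clusters, not FP-Growth, and the function name `pair_rules` and the sample transactions are assumptions for illustration. The 0.5 values mirror `min_support` and `min_confidence` in the process above.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.5):
    """Return (premise, conclusion, support, confidence) for item pairs.

    Brute force over pairs only; assumes at least one transaction.
    """
    n = len(transactions)

    def support(items):
        # Fraction of users whose cluster set contains all given items.
        return sum(1 for t in transactions if items <= t) / n

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Each frequent pair can give a rule in both directions.
        for premise, conclusion in ((a, b), (b, a)):
            confidence = pair_support / support({premise})
            if confidence >= min_confidence:
                rules.append((premise, conclusion, pair_support, confidence))
    return rules

# Hypothetical users, each represented by the set of clusters they have goals in.
transactions = [
    {"travel_0", "health_1"},
    {"travel_0", "health_1", "career_2"},
    {"travel_0"},
    {"health_1", "career_2"},
]
rules = pair_rules(transactions)
```

With only 125 users and a support threshold of 0.5, a pair of clusters has to appear together for at least half of the users before any rule is produced, which may be part of why I get so few rules.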


I know how terrible this is; I didn't even get a definite final recommended goal!

My interface will be a web page.

How can I fix my process?

How can I run the association analysis directly on the output of the clustering operator, without all my manual work?
Every category has its own process. How can I group them together?

What should the process be when a new user enters a new goal?
What should it go through?

Is there a better way to build my system?

Any help will be very much appreciated.
Regards
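
To show what I have in mind for that last step, here is a small Python sketch of one possible flow, assuming the new goal has already been assigned to a cluster and the rules are available as (premise, conclusion, confidence) triples. The function `recommend`, the rule values, and the cluster names are all hypothetical; I don't know whether this is the right approach.

```python
def recommend(user_clusters, rules, top_n=3):
    """Suggest clusters the user has no goals in yet, ranked by rule confidence."""
    scores = {}
    for premise, conclusion, confidence in rules:
        # A rule applies if the user already has the premise cluster
        # and does not yet have the conclusion cluster.
        if premise in user_clusters and conclusion not in user_clusters:
            scores[conclusion] = max(scores.get(conclusion, 0.0), confidence)
    # Highest confidence first; ties broken by name for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]

# Hypothetical rules: (premise cluster, conclusion cluster, confidence)
rules = [
    ("travel_0", "health_1", 0.8),
    ("travel_0", "career_2", 0.6),
    ("health_1", "career_2", 0.9),
]
suggested = recommend({"travel_0"}, rules)
```

This only gives me a recommended cluster, so I would still need a way to pick a specific goal from inside it.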
