"[SOLVED] Sparse Data/Collaborative Filtering"
Hello,
I am trying to do collaborative filtering, but I am having difficulty reading the data in.
Originally I formatted the data as a map, where each line contained ID, ID, Boolean. This was processed in a few seconds.
What I need is a matrix with the two ID fields as coordinates and the Boolean as the entry. I could not figure out how to do this.
I moved on to trying Read Sparse, but it now takes 1 minute to read in the data. This seems odd and probably won't scale.
*I am new to RapidMiner, so any suggestions on resources would be great.
Thanks in advance
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="-20" width="-50">
<operator activated="true" class="read_sparse" compatibility="5.2.008" expanded="true" height="60" name="Read Sparse" width="90" x="28" y="230">
<parameter key="format" value="yx"/>
<parameter key="data_file" value="*****************t"/>
<parameter key="dimension" value="216370"/>
<parameter key="datamanagement" value="boolean_sparse_array"/>
<list key="prefix_map"/>
</operator>
<connect from_op="Read Sparse" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
As you can see in the help view, the Read Sparse operator does not support the format you are trying to load. It supports only simple arrays. What do you want to do with your data afterwards? Maybe you can try another Read operator.
Best,
Nils
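For reference, the matrix described in the question (two ID fields as coordinates, the Boolean as the entry) can be sketched outside RapidMiner. The snippet below is only an illustration in plain Python, not a RapidMiner operator or the method suggested above. The names `build_sparse_matrix` and `entry` and the sample triples are hypothetical. It stores only the True entries, so memory grows with the number of observed pairs rather than with the full matrix dimension.

```python
from collections import defaultdict


def build_sparse_matrix(triples):
    """Turn (row_id, col_id, flag) triples into a sparse Boolean matrix.

    The matrix is represented as {row_id: set of col_ids whose flag is True}.
    Entries that are False are simply not stored.
    """
    rows = defaultdict(set)
    for row_id, col_id, flag in triples:
        if flag:
            rows[row_id].add(col_id)
    return dict(rows)


def entry(matrix, row_id, col_id):
    """Look up one cell; anything not stored counts as False."""
    return col_id in matrix.get(row_id, set())


# Hypothetical sample data in the "ID, ID, Boolean" layout from the question.
triples = [
    (1, 10, True),
    (1, 11, False),
    (2, 10, True),
    (2, 12, True),
]

matrix = build_sparse_matrix(triples)
```

Here `entry(matrix, 2, 12)` is True and `entry(matrix, 1, 11)` is False. Whether a comparable layout can be produced inside RapidMiner depends on which operators are available in your version.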