Nominal to Binominal in large data sets

dehghan-v Member Posts: 3 Contributor I
edited November 2018 in Help
Hello, I have a problem preparing my data set.
I am working with a Tehran traffic transaction database.
The data has these attributes:
    iD, HighWayCode, day, AirCondition, TrafficType, Time

The data set has over 530000 records.

I decided to do association rule mining on this data set, for example with FP-Growth.
To work with this algorithm (ARM), the attributes must be converted to binominal.
day, AirCondition and TrafficType were successfully converted to binominal in RapidMiner,
but converting HighWayCode to binominal crashed.

My process is: Read Database, Select Attributes, Nominal to Binominal, Write Database.

Can anyone help me solve this problem? Please.

I should mention that the HighWayCode attribute has 1000 distinct values.

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    I'm afraid more information is needed to provide any help here.
    Please post your process XML (to get it, select the XML tab above your RapidMiner process and copy and paste the contents) and the error message from the log. If possible, a sample line of data that leads to the crash would also be very useful.

    Regards,
    Marco
  • dehghan-v Member Posts: 3 Contributor I
    Hi,
    when I run this process, memory usage goes very high and RapidMiner hangs.

    Example row: 1،Sunday,12:00-1:00,Fluent,cloudy


    This is the process XML:


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <process expanded="true" height="363" width="827">
         <operator activated="true" class="read_database" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
           <parameter key="connection" value="1"/>
           <parameter key="query" value="SELECT  *&#10;FROM dbo.BOZ&#10;where id&gt;=250000 and id&lt;550000"/>
         </operator>
         <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="182" y="18"/>
         <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="372" y="22">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="CodeBozorgRah"/>
         </operator>
         <operator activated="true" class="write_database" expanded="true" height="60" name="Write Database" width="90" x="645" y="91">
           <parameter key="connection" value="1"/>
           <parameter key="table_name" value="BOZ2"/>
           <parameter key="overwrite_mode" value="overwrite first, append then"/>
         </operator>
         <connect from_op="Read Database" from_port="output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
         <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Write Database" to_port="input"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
    </process>


    (CodeBozorgRah is the HighWayCode attribute.)
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    When you convert a polynominal attribute to binominal, a new attribute is created for each distinct string ("attribute = 1", "attribute = 2", etc.). For large data sets with thousands of different entries, the result will be very large. You can therefore either increase the memory available to RapidMiner (see this) or use a different learning scheme (see the example processes in the samples repository).

    Regards,
    Marco
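
To put rough numbers on the explanation above, here is a minimal Python sketch (not RapidMiner code) of what a one-column-per-value conversion does and how large a dense result could get. The row count and the number of distinct HighWayCode values are taken from the question as I read it. The 8 bytes per cell figure, the sample codes and the function names are assumptions made for illustration only; RapidMiner's actual internal storage may differ.

```python
# Rough illustration of a dense "one new attribute per distinct value" expansion.
# Figures from the thread: over 530000 records, 1000 distinct HighWayCode values.
# BYTES_PER_CELL is an assumption, not a statement about RapidMiner internals.

ROWS = 530_000
DISTINCT_VALUES = 1_000
BYTES_PER_CELL = 8  # assumption: one 8-byte number per cell


def one_hot(values, attribute="HighWayCode"):
    """Create one true/false column per distinct value of a nominal attribute."""
    distinct = sorted(set(values))
    names = [f"{attribute} = {d}" for d in distinct]
    rows = [[v == d for d in distinct] for v in values]
    return names, rows


def dense_cells(rows, distinct_values):
    """Number of cells after the expansion: rows times new columns."""
    return rows * distinct_values


def dense_gigabytes(rows, distinct_values, bytes_per_cell=BYTES_PER_CELL):
    """Approximate size of the dense result in gigabytes (10**9 bytes)."""
    return dense_cells(rows, distinct_values) * bytes_per_cell / 1e9


# Tiny example with hypothetical codes: three distinct values give three columns.
names, rows = one_hot([12, 7, 12, 30])
print(names)
print(rows[0])

# The same expansion at the scale described in the question.
print(dense_cells(ROWS, DISTINCT_VALUES), "cells")
print(dense_gigabytes(ROWS, DISTINCT_VALUES), "GB (under the 8-byte assumption)")
```

Under these assumptions the HighWayCode column alone expands to 530 million cells, or about 4.24 GB, which would be consistent with the memory usage climbing until RapidMiner hangs.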