
Importing .dat data set

peleitor Member Posts: 10 Contributor II
edited November 2018 in Help
Hello. There is a quite popular retail dataset from anonymized Belgian stores, which can be found here:

http://fimi.ua.ac.be/data/retail.dat.gz


The first lines of the file are:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32
33 34 35
36 37 38 39 40 41 42 43 44 45 46
38 39 47 48
38 39 48 49 50 51 52 53 54 55 56 57 58
32 41 59 60 61 62
3 39 48
63 64 65 66 67 68
32 69
48 70 71 72
39 73 74 75 76 77 78 79

But I don't know the retail.dat file format.

How can it be imported into RapidMiner?

I could not find any description of this format.

Thanks

Answers

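  • fras Member Posts: 93 Contributor II
    Try this process. It reads each line of the file as a single column and then splits that column on whitespace; set csv_file to the path of your unpacked file: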

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="6.0.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
            <parameter key="csv_file" value="c:\data.csv"/>
            <parameter key="column_separators" value="\n"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="split" compatibility="6.0.008" expanded="true" height="76" name="Split" width="90" x="179" y="75">
            <parameter key="split_pattern" value="\s+"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
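
For readers who want to check the result outside RapidMiner, the same two steps (read each line as one raw value, then split it on the pattern `\s+`) can be sketched in Python. This is only an illustrative sketch, not part of the original answer: the function names `split_lines` and `pad_rows` and the sample text are made up for the example, and padding short rows with `None` is an assumption about how a rectangular table would represent the missing cells.

```python
import re

def split_lines(text):
    """Treat every non-empty line as one raw value and split it on whitespace."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rows.append(re.split(r"\s+", line))
    return rows

def pad_rows(rows, fill=None):
    """Pad shorter rows so that all rows have the same number of columns."""
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]

# A few lines taken from the file excerpt in the question.
sample = "30 31 32\n33 34 35\n38 39 47 48\n"
table = pad_rows(split_lines(sample))

for row in table:
    print(row)
```

Rows with fewer items than the widest row end up with empty cells at the end, which is why the loaded data looks like a sparse matrix.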

  • peleitor Member Posts: 10 Contributor II
    Thank you, I could load it now. But as far as I can see, it is just a two-dimensional matrix of integers; I reckon this will not be useful without some descriptor file. I expected to see something like an array of tickets with product codes inside.
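
One possible reading of the file, assuming the usual convention for this kind of market-basket data: each line is one ticket (transaction) and each integer is an anonymized product code, so no separate descriptor file is needed to treat the rows as baskets. The sketch below is written under that assumption; the names `parse_transactions` and `item_support` are hypothetical, and the sample lines are copied from the excerpt in the question.

```python
from collections import Counter

def parse_transactions(text):
    """Return one list of integer item codes per non-empty line."""
    transactions = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            transactions.append([int(tok) for tok in tokens])
    return transactions

def item_support(transactions):
    """Count in how many transactions each item code appears."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))  # count an item once per basket
    return counts

sample = """30 31 32
33 34 35
38 39 47 48
38 39 48 49 50
3 39 48
"""

baskets = parse_transactions(sample)
support = item_support(baskets)
print(len(baskets), "baskets")
print("item 39 appears in", support[39], "baskets")
```

Read this way, the data is already the "array of tickets" described above; only the product names behind the codes are unavailable.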


  • mdellias Member Posts: 1 Learner III
    Hi, I'm new to RapidMiner. Can you please explain step by step how I can import a .dat file? I need to open the dataset from http://www.damianospina.com/projects/the-replab-2014-dataset/
  • FBT Member Posts: 106 Unicorn

    I just made a few spot checks, but it looks like the files are regular Excel files with a header (despite having a .dat file extension). Hence you will want to use the "Read Excel" operator. However, before importing, you will need to save the files with an Excel extension: open the .dat file in Excel -> Save as -> remove the .dat from the file name -> select an Excel file format -> Save.
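
Since a .dat extension says nothing about the content, a quick look at the first bytes of a file can hint at whether it is a real Excel workbook or plain text that Excel merely opens. The following is a rough sketch, not a definitive check: `sniff_format` is a hypothetical helper, and it only recognizes the ZIP signature (used by .xlsx, but also by any other ZIP archive) and the OLE2 signature (used by legacy .xls, but also by other Office files).

```python
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP container, e.g. .xlsx
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file, e.g. legacy .xls

def sniff_format(path):
    """Guess the container type of a file from its first eight bytes."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"    # possibly .xlsx
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # possibly .xls
    return "other"      # possibly delimited plain text
```

Calling `sniff_format("some_file.dat")` (the file name is a placeholder) returns `"other"` for a plain-text file; in that case a text-based reader with the right column separator is likely a better fit than "Read Excel".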

     

    The guys at RapidMiner have made a nice series of introductory videos, which can be found on YouTube. The one dealing with importing data is here: https://www.youtube.com/watch?v=1EZk9w1ln0g&index=2&list=PLssWC2d9JhOZLbQNZ80uOxLypglgWqbJA
