Importing .dat data set

peleitor Member Posts: 10 Contributor II
edited November 2018 in Help
Hello. There is a fairly popular retail dataset from anonymized Belgian stores, which can be found here:

http://fimi.ua.ac.be/data/retail.dat.gz


The first lines of the file are:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32
33 34 35
36 37 38 39 40 41 42 43 44 45 46
38 39 47 48
38 39 48 49 50 51 52 53 54 55 56 57 58
32 41 59 60 61 62
3 39 48
63 64 65 66 67 68
32 69
48 70 71 72
39 73 74 75 76 77 78 79

But I don't know the format of the retail.dat file.

How can it be imported into RapidMiner?

I could not find any description of the format.

Thanks

Answers

  • fras Member Posts: 93 Contributor II
    Try this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="6.0.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
            <parameter key="csv_file" value="c:\data.csv"/>
            <parameter key="column_separators" value="\n"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="split" compatibility="6.0.008" expanded="true" height="76" name="Split" width="90" x="179" y="75">
            <parameter key="split_pattern" value="\s+"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
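
For anyone who wants to see what the process above does outside RapidMiner, here is a minimal Python sketch of the same idea: treat every line as one string (which is what setting the column separator to `\n` achieves), then split it on runs of whitespace (the `\s+` pattern). The function name `split_lines` and the padding with `None` are assumptions for illustration only; the padding is meant to mimic how the split columns presumably end up as a rectangular table with missing values for the shorter lines.

```python
import re

def split_lines(lines):
    """Split each non-empty line on whitespace and pad rows to equal width."""
    rows = [re.split(r"\s+", line.strip()) for line in lines if line.strip()]
    width = max((len(row) for row in rows), default=0)
    # Pad shorter rows with None so the result is rectangular (illustrative choice).
    return [row + [None] * (width - len(row)) for row in rows]

# A few lines copied from the question.
sample = ["30 31 32", "38 39 47 48", "32 69"]
for row in split_lines(sample):
    print(row)
```

To try it on the real file, the lines could come from `gzip.open(path, "rt")` for the downloaded `.gz` archive, where `path` is wherever you saved it.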

  • peleitor Member Posts: 10 Contributor II
    Thank you, I can load it now. But as I see it, it is just a two-dimensional matrix of integers; I reckon this will not be useful without some descriptor file. I expected to see something like an array of tickets, with product codes inside.
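
One possible reading of the loaded data: in market-basket files of this kind, each line is usually one transaction (a ticket) and each integer an anonymized item code, so the rows are variable-length sets of items and not matrix columns. Below is a minimal Python sketch under that assumption, counting in how many transactions each item code appears. The names `read_baskets` and `item_support` are made up for illustration, and the sample lines are copied from the question.

```python
from collections import Counter

def read_baskets(lines):
    """Turn each non-empty line into a set of integer item codes."""
    return [set(map(int, line.split())) for line in lines if line.strip()]

def item_support(baskets):
    """Count the number of baskets that contain each item code."""
    counts = Counter()
    for basket in baskets:
        counts.update(basket)
    return counts

sample = [
    "38 39 47 48",
    "38 39 48 49 50 51 52 53 54 55 56 57 58",
    "32 41 59 60 61 62",
    "3 39 48",
]
baskets = read_baskets(sample)
support = item_support(baskets)
print(len(baskets), support[39], support[48])  # prints: 4 3 3
```

Using sets drops repeated codes within a line; if quantities matter, a list per transaction would be the safer choice.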


  • mdellias Member Posts: 1 Contributor I
    Hi, I'm new to RapidMiner. Can you please explain step by step how I can import a .dat file? I need to open the dataset from http://www.damianospina.com/projects/the-replab-2014-dataset/
  • FBT Member Posts: 106 Unicorn

    I only made a few spot checks, but the files appear to be regular Excel files with a header row, despite the .dat file extension. You would therefore want to use the "Read Excel" operator. Before importing, however, you will need to save the files with an Excel extension: open the .dat file in Excel -> Save As -> remove the .dat from the file name -> select an Excel file format -> Save.
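
Before renaming anything, it can help to check what a .dat file really contains by looking at its first bytes. The Python sketch below relies on two well-known file signatures: .xlsx files are ZIP archives that start with `PK\x03\x04`, and legacy .xls files are OLE2 compound documents that start with `D0 CF 11 E0 A1 B1 1A E1`. The function names and the returned labels are illustrative, and a match only identifies the container type; it does not prove the content is a valid workbook.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # container used by .xlsx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # container used by legacy .xls

def guess_container(head):
    """Classify a file by its leading bytes."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"

def guess_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return guess_container(handle.read(8))
```

Calling `guess_file` on one of the downloaded files (the path is whatever you saved it as) returns `"unknown"` for plain text, in which case a text reader such as Read CSV is probably the better fit.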

     

    The guys at RapidMiner have made a nice series of introductory videos, which can be found on YouTube. The one on importing data is here: https://www.youtube.com/watch?v=1EZk9w1ln0g&index=2&list=PLssWC2d9JhOZLbQNZ80uOxLypglgWqbJA
