Options

"read database"

gfyanggfyang Member Posts: 29 Maven
edited May 2019 in Help
Hi,

RM could read the database table like,
attribute-1 attribute-2 ... attribute-N
record-1  value-11 value-12 ... value-1N
record-2  value-21 value-22 ... value-2N
... ... ... ... ...
Now, due to the huge number of attributes (more than 60000), the format of a table is:

1, record-1, attribute-1, value-11
2, record-1, attribute-2, value-12
3, ....

How to read this database?

Thank you very much.

Sincerely yours,
gfyang

Tagged:

Answers

  • Options
    steffensteffen Member Posts: 347 Maven
    Hi,

    The data is loaded correctly, it is just no displayed in the standard way due to its huge size. How to "fix" this:
    Go to Tools->Preferences->GUI and increase "rapidminer.gui.max statistics rows" (whatever your pc can handle)

    regards,

    steffen


  • Options
    gfyanggfyang Member Posts: 29 Maven
    Hi, Steffen,

    Thanks.

    It is not only about how to display the data. My question is how to input this database for ExampleSet?

    Of course, DatabaseExampleSource operator could read this database, but it could not know the actual meaning, thus could not do the following tasks, like classification or clustering.For example, in this table,

    1, record-1, attribute-1, value-11
    2, record-1, attribute-2, value-12
    3, ....

    DatabaseExampleSource thinks there are only four attributes, however, in fact there are 60000 attributes. How to let the operator know this?

    Thank you.

    Sincerely yours,
    gfyang
  • Options
    steffensteffen Member Posts: 347 Maven
    Hello gfyang

    Ok, it seems that I did not understand you the first time.

    Let me get this straight. Does your database really looks like this  :o ?
    1, record-1, attribute-1, value-11
    2, record-1, attribute-2, value-12
    3, ....

    Let me guess. Your data is a collection of key-value pairs with varying number of attributes. For instance:

    record-1{
      attribute-1: value-11
      attribute-3: value-13
    }
    record-2{
      attribute-2: value-1122
      attribute-123: value-13121
    }
    etc...
    Storing such data in the format described by you  in a relational database is simply abusive.

    @Your question:
    To the best of my knowledge, there is no operator which does this automatically for you. You have to write one on your own ... either this or you convert the data into standard tabular format (eg csv) using another programming language/tool and then import the csv into rapidminer.

    good luck,

    steffen
  • Options
    rakirkrakirk Member Posts: 29 Contributor II
    Have you tried querying a transpose of your table? Additionally, you may want to make such a transpose table into a view to speed up your processing. If you try this approach, let me know of your success since I could use such a method for a few of the datasets I have been to busy to analyze.

    regards,
    rk
  • Options
    gfyanggfyang Member Posts: 29 Maven
    Hi,

    Thanks.

    I have to use this format to store data because usually database could not support such a large number of columns. For example, SQL Server could only support 1024 columns/attributes.

    I will try to solve it by writing data from database to CSV file, and writing a corresponding aml file, too.

    Sincerely yours,
    gfyang
  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    actually the operator Pivot does exactly the desired transformation from your data format to a column based format. We frequently have to use it in our projects for exactly the reason you pointed out: databases can only handle a fixed amount of columns.

    There are  several examples for using the Pivot operator at myExperiment:

    http://www.myexperiment.org/search?query=pivot&;type=all

    Just use the Community Extension of RapidMiner and search for "Pivot" in order to download one of those processes. I am sure you will be able to configure it for your data set.

    Cheers,
    Ingo
  • Options
    steffensteffen Member Posts: 347 Maven
    Hi,

    upsie ... well, it seems that I do not know all operators ;) ... sorry, gfyang ! Additionally, I encounter key-value based data on a daily basis, so I guess I start to see it everywhere ...

    I wonder whether it is better to perform such a transformation and store such (sparse, I assume) data in relational db or store it as key-value pairs in a nosql like e.g. cassandra. But since more relational dbs are around and the first solution works ...

    greetings,

    steffen
  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Steffen,

    don't worry  ;) I am sure that you know exactly as much operators than me but a different subset. This makes the combined wisdom especially useful  ;D

    Good point about the nosql aspect. Actually this is one of the advantages I would expect of those db systems with respect to data analysis. If I only had more time...

    Cheers,
    Ingo
  • Options
    gfyanggfyang Member Posts: 29 Maven
    Hi,

    Thank you, Steffen,  ;) you also gave a good advice. You are right: the data is the processed text data, very sparse.I have not used cassandra system before, but I also agree it may provide a better way.

    Also thank you, Ingo, I am trying the operator you said, and let you know the result later.

    Sincerely yours,
    gfyang
Sign In or Register to comment.