"Stream Database operator: metadata ?"
camielcoenen
Member Posts: 4 Contributor I
Hi,
I am working with a large dataset (approx. 250,000 rows and 300+ columns) that is loaded in a MySQL database table, and I would like to use the Stream Database operator to use this dataset in a process. However, unlike the Read Database operator, the Stream Database operator doesn't output the metadata, which makes it impossible to use other operators like Select Attributes in the steps following Stream Database. I am using RapidMiner 5.1.
Answers
I think none of the data import operators can provide the metadata up front, because RapidMiner can only read the metadata once you start the process.
The easiest way is to save the dataset to the repository with the Store operator. You then have fast access to the dataset through the Retrieve operator, and the metadata is always available.
Greetings
Matthias
Greetings,
Camiel
Let me put it this way: do you use the Community Edition?
Greetings,
Sebastian
Thanks,
Camiel
Currently not, but as a Community Edition user you simply have to wait until someone has idle time to fix it. As an enterprise customer, your wishes would have a "little" bit more importance to us. Not to mention that we could hire more people to help us with the coding if you became an enterprise customer.
Anyway, I think that handling large amounts of data will become an enterprise feature sooner or later, so I wouldn't bet on the improvements to Stream Database making it into the Community Edition.
Greetings,
Sebastian
Is it a JDBC connection issue that needs to be fixed? The Read Database operator, on the other hand, is working fine.
Nevertheless, I would like to know how to handle a large dataset in RapidMiner Community Edition. What kind of operators can be used to make the dataset more manageable? Are there tutorials or samples on how to do this?
Greetings,
Camiel
Aggregate it before loading it. Split the data set before loading it. Try to cluster things beforehand by using samples where possible, apply in batches...
Well, everything depends on your problem. But the basic idea is to use only samples or batches where possible, or to compress the data before you even load it.
Greetings,
Sebastian
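For readers who want to try the "aggregate before loading" and "work in batches" ideas outside of RapidMiner, here is a minimal Python sketch. It is not RapidMiner code and not how Stream Database works internally. It assumes an in-memory SQLite table as a stand-in for the MySQL table, and the table name `measurements`, its columns, and all function names are made up for illustration. It shows three things: reading the column names without fetching any rows, streaming rows in fixed-size batches, and letting the database do the aggregation.

```python
import sqlite3
from typing import Iterator, List, Tuple


def build_demo_db(n_rows: int = 10) -> sqlite3.Connection:
    """Create a small in-memory table (hypothetical stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (id INTEGER PRIMARY KEY, grp TEXT, value REAL)"
    )
    rows = [(i, "a" if i % 2 == 0 else "b", float(i)) for i in range(n_rows)]
    conn.executemany(
        "INSERT INTO measurements (id, grp, value) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    """Read column names without loading data (table name is assumed trusted)."""
    cur = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    return [col[0] for col in cur.description]


def iter_batches(
    conn: sqlite3.Connection, query: str, batch_size: int = 1000
) -> Iterator[List[Tuple]]:
    """Yield the query result in lists of at most batch_size rows."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def batched_mean(conn: sqlite3.Connection, batch_size: int = 4) -> float:
    """Compute the mean of `value` while holding only one batch in memory."""
    total, count = 0.0, 0
    for batch in iter_batches(conn, "SELECT value FROM measurements", batch_size):
        total += sum(row[0] for row in batch)
        count += len(batch)
    return total / count


def aggregate_in_database(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Let the database aggregate, so only one row per group is loaded."""
    cur = conn.execute(
        "SELECT grp, AVG(value) FROM measurements GROUP BY grp ORDER BY grp"
    )
    return cur.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(column_names(demo, "measurements"))
    print(batched_mean(demo))
    print(aggregate_in_database(demo))
```

With a real MySQL table the same pattern should carry over to any DB-API driver, since `fetchmany` and `cursor.description` are part of the Python DB-API, but driver-specific settings for server-side cursors are not covered here.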