
DatabaseExampleSource: Metadata

calvinus Member Posts: 5 Contributor II
edited November 2018 in Help
Hi there,

I'm quite new to RapidMiner, but have some experience with SAS Enterprise Miner. I have already set up a Decision Tree learning process and got some exciting results (well, at least judging by how they look ;)).
But what I'm missing now is the following:
In SAS Enterprise Miner you can tweak the variable roles on every node, for example setting a variable to "ignore", "target" or whatever. I use a DatabaseExampleSource, and when using the Database wizard I can set up the roles once. After that, however, I can't find a way to go back and change those roles without running the whole wizard again. What's the right approach to changing the roles of variables from a DatabaseExampleSource? And is there a node type which can be used to do this?

Thanks again in advance and for this great piece of software.

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    thanks for your kind words.

    And is there a node type which can be used to do this?
    Sure (is there anything in SAS not possible with RM  :P )

    The node ("operator" in RapidMiner terminology) you are searching for is called "ChangeAttributeRole" - you can search for this in the Add Operator dialog or in the search field at the bottom of the New Operator tab. With this operator you can change regular attributes (used as input variables for the mining method) to labels, ids, weights etc. and back again.

    Cheers,
    Ingo
  • calvinus Member Posts: 5 Contributor II
    Hi Ingo,

    thank you very much for your fast and helpful answer.
    I wondered if it is possible to access and change the metadata that is set in the DatabaseExampleSource. The problem I have with the ChangeAttributeRole is that it only changes one variable at a time.
    If I have an input set of 400 or more variables and for a certain run I want to select those which should be taken into account, it's not really feasible to apply 200 of those operators. How would you suggest dealing with that situation?
    In SAS EM (sorry to come back to that, but it's my reference because I know it) you simply change the metadata of the InputSource node. In the DatabaseExampleSource there is a similar table when you run the wizard, but I didn't find a way to access this table again to change it.
    Ingo Mierswa wrote:

    [...]
    Sure (is there anything in SAS not possible with RM  :P )
    [...]
    I don't know - tell me!  :D

    Thanks again for your fast answer, especially at this time :)

    Jörg
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Jörg,

    you are right, 200 operators of the same kind are not an option. Fortunately, RM offers several ways to avoid that, but which one fits depends on the attributes you want to deselect for analysis. Are these attributes simply chosen at random, or do they share a common characteristic (e.g. a common name stem or a common attribute type)? One solution would be to use an AttributeFilter and simply filter them out by setting the condition appropriately.

    Regards,
    Tobias
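
As a rough illustration of the name-based filtering idea mentioned above, here is a minimal Python sketch. It is not RapidMiner code and does not reproduce the AttributeFilter operator's actual parameters; the function name `filter_attributes`, the attribute names, and the `tmp_` name stem are all assumptions made up for the example.

```python
import re


def filter_attributes(names, drop_pattern):
    """Return the attribute names that do NOT fully match drop_pattern."""
    pattern = re.compile(drop_pattern)
    return [name for name in names if not pattern.fullmatch(name)]


# Hypothetical attribute names; the "tmp_" stem is an assumption for illustration.
attributes = ["age", "income", "tmp_score", "tmp_flag", "label"]

# Drop every attribute whose name starts with the common stem "tmp_".
kept = filter_attributes(attributes, r"tmp_.*")
print(kept)  # ['age', 'income', 'label']
```

This only works when the attributes to drop share something that can be written as a condition, which is exactly the limitation discussed in the following posts.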
  • calvinus Member Posts: 5 Contributor II
    Hi Tobias,

    thanks for your answer.
    My approach is the following: I have a set of many variables and I would like to use only some of them when e.g. learning a decision tree for a certain target. Instead of changing the input set every time, it would be more convenient to see a list of variables in which you can simply adjust the role, e.g. set it to "ignore" to exclude a variable or set it to "label".
    RapidMiner already has something similar in the DatabaseExampleSource wizard, but I can't access it without going through the whole wizard, which means reusing all the settings made before (and is clumsy as well). So is there a possibility to get to this table, where all variables are shown and their roles can be adjusted, without having to redo the wizard? Especially in the initial phase there is a lot of back and forth until meaningful and valid variables are selected. And then you might want to examine different aspects of one dataset. So there should be some easy means to adjust the variable selection. What is your suggestion for that? (I hope I could explain it well enough :) )

    I looked at the AttributeFilter, which is useful, but renaming the variables is not an option (then I would have to know up-front which variables are valid and which are not, but that's determined exploratively by running the mining process). I could attach a certain role "ignore" to the variables I don't want to include and filter them with an AttributeFilter - but then again it only offers built-in attribute roles. Can it be expanded to user defined roles?

    Sorry for bothering you continuously with this topic, but I'm seriously considering using RapidMiner and I know some people who are interested in the results, so I somehow have to get past the obstacles ;)

    Thanks and have a nice day,
    Jörg



  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Jörg,

    ok, let's see if I got you right:

    You want to exploratively select attributes and learn a decision tree, meaning you repeatedly load the data, select some attributes which you think are relevant (or which you think should be included in the tree), then learn the tree and finally investigate whether the attributes seem relevant by looking at the tree. Is that correct?

    Well, you can achieve something like that by using an InteractiveAttributeWeighting operator, setting the weights of the attributes you wish to deselect to zero and applying the weights to the data afterwards with the AttributeWeightsApplier operator. You could also put these operators (and the learner) inside an IteratingOperatorChain, which prevents the data from being read multiple times. You then have to set a breakpoint after the learner. However, with more than 400 attributes this is of course still tedious work.

    Have you thought about inspecting the results of an attribute weighting scheme like InfoGainWeighting, which gives you a hint of how important attributes are for the learning problem? Or letting RM identify an optimal feature set automatically through its feature selection operators? That might save you a lot of work ...

    Additionally, it should be mentioned that a decision tree algorithm also selects the more important attributes and ignores the less important ones if you allow the algorithm to prune the tree.

    Regards,
    Tobias
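
To make the idea of weighting attributes by information gain and then dropping zero-weight attributes more concrete, here is a small Python sketch. It is a textbook information-gain calculation for nominal attributes, not the implementation behind the InfoGainWeighting or AttributeWeightsApplier operators; the toy data and the names `entropy`, `info_gain`, `columns`, and `label` are assumptions made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def info_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label_value in zip(values, labels):
        groups.setdefault(value, []).append(label_value)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two nominal attributes and a binary label.
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy": ["yes", "no", "yes", "no"],
}
label = ["no", "no", "yes", "yes"]

# One weight per attribute; attributes with zero weight are deselected.
weights = {name: info_gain(col, label) for name, col in columns.items()}
selected = [name for name, w in weights.items() if w > 1e-12]
print(weights)
print(selected)
```

In this toy data "outlook" separates the label perfectly and "windy" carries no information, so only "outlook" survives. With 400 attributes, sorting by weight gives a starting point for the manual exploration described above.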
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    I forgot to explain: user-defined roles may be specified (in the same operator). However, the attribute role is unique, meaning you can only set one attribute to one role at a time.

    Regards,
    Tobias
  • calvinus Member Posts: 5 Contributor II
    Yes, you're right. I'm starting to get my feet wet with RapidMiner and have a dataset used for another (let's say "manual") analysis and I want to try what RapidMiner can get out of that. I first try to build decision trees because they're such a nice visualization. I just run it and see what comes out, take some variables out, run again,... just to get used to the tool and to some relations in the data.

    The InteractiveAttributeWeighting operator is close to what I looked for - thanks for the hint. Does it keep the settings and apply them automatically each time I load and run the process? It probably always displays the dialogue?

    The automatic feature selection sounds very exciting. However, I have not yet managed to get this right: it always complains about a missing PerformanceVector, and when I put one in, it needs something else again... but I will try the InfoGainWeighting as well :) Thanks for all those valuable hints.

    And finally, my first question: is there a possibility to adjust the attribute settings in the DatabaseExampleSource like one can do when executing the wizard?

    Thanks again and a lot for your help.
    Cheers,
    Jörg
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,
    calvinus wrote:

    Yes, you're right. I'm starting to get my feet wet with RapidMiner and have a dataset used for another (let's say "manual") analysis and I want to try what RapidMiner can get out of that. I first try to build decision trees because they're such a nice visualization. I just run it and see what comes out, take some variables out, run again,... just to get used to the tool and to some relations in the data.

    The InteractiveAttributeWeighting operator is close to what I looked for - thanks for the hint. Does it keep the settings and apply them automatically each time I load and run the process? It probably always displays the dialogue?
    Yes, it does. Just a bit of explanation: the InteractiveAttributeWeighting operator does what it says, namely it lets you specify attribute weights interactively at runtime, hence every time you run the process, and creates an AttributeWeights object. If you would like to save your settings, you can save the attribute weights by using the AttributeWeightsWriter.
    calvinus wrote:

    The automatic feature selection sounds very exciting. However, I have not yet managed to get this right: it always complains about a missing PerformanceVector, and when I put one in, it needs something else again... but I will try the InfoGainWeighting as well :) Thanks for all those valuable hints.
    Have a look at the feature selection examples contained in the RM online tutorial. From these you should understand very quickly how the feature selection works. The basic idea is that each feature selection operator constructs attribute subsets which are passed to inner operators for evaluation. Evaluation can be done e.g. by a cross-validation containing an inner learner.
    calvinus wrote:

    And finally, my first question: is there a possibility to adjust the attribute settings in the DatabaseExampleSource like one can do when executing the wizard?
    Not in the form that the interactive attribute weighting offers. The only possibility is the wizard, which actually sets the operator's sql statement parameter. Hence, you can set the sql statement of the operator yourself, but you will have to do that manually. It may, however, be an idea to divide the wizard into multiple parts, i.e. the connection settings, the choice of the table and the choice of the attributes. We will discuss this, but we will probably not be able to include it in RM in the short term since we are very busy at the moment.

    Hope that helps,
    Tobias
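
Since the attribute choice ends up in the SQL statement, one way to picture the manual route is to generate that statement from a list of wanted columns. The Python sketch below only illustrates this idea with the standard-library `sqlite3` module and an in-memory table; the table name `customers`, its columns, and the helper `build_query` are assumptions made up for the example, and nothing here touches RapidMiner itself. The resulting string would then be pasted into the operator's sql statement parameter by hand.

```python
import sqlite3


def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table, columns):
    """Build a SELECT statement that reads only the chosen columns."""
    column_list = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)}"


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, age INTEGER, income REAL, churn TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 34, 52000.0, 'no')")

# Only these attributes should reach the learner in this run.
selected = ["age", "churn"]
query = build_query("customers", selected)
rows = conn.execute(query).fetchall()
print(query)  # SELECT "age", "churn" FROM "customers"
print(rows)   # [(34, 'no')]
```

Keeping the list of selected columns in one place makes it cheap to change the selection between runs, although the roles (label, id and so on) would still have to be set separately.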