How to make comparisons between treatment and control groups using "Data to Similarity"?
I don't have an implementation with XML code yet, but I'm working on applying models that deal with treatment effects, for example the effect of drinking wine on lifespan, or the effect of higher ticket prices on attendance at a concert. To do this I'd like to create a kind of matched sample where I match each example from the control group to its most similar example from the treatment group, and compare their outcomes.
Assume computation time is not a concern for the solution.
My plan is to use Data to Similarity to identify the most similar examples across the treatment and control groups, and then apply Generate Attributes to calculate the difference between the two outcome values in each pair (lifespan, sales, etc.). Ultimately, the objective is to estimate a treatment effect for each individual.
I know Data to Similarity will provide me with the most similar examples in the example set, but from there I'm trying to determine:
- How can I find the most similar match from another group? (i.e., if I take one example from the control group, what is the most similar example from the treatment group)
- Once I have each pair of IDs how can I put this back into the process to make comparisons within each pair?
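To make the goal concrete, here is a rough sketch in plain Python (not RapidMiner) of the logic I have in mind. Everything in it is an assumption for illustration: the toy rows, the attribute names (`age`, `income`, `lifespan`), the hypothetical helper `match_and_compare`, the use of unscaled Euclidean distance, and matching with replacement (one treatment example may be matched to several control examples).

```python
import math

def match_and_compare(control, treatment, features, outcome):
    """For each control row, find the closest treatment row and
    return the pair of IDs plus the difference in outcome."""
    pairs = []
    for c in control:
        c_vec = [c[f] for f in features]
        # Nearest treatment example by Euclidean distance (assumed metric).
        best = min(
            treatment,
            key=lambda t: math.dist(c_vec, [t[f] for f in features]),
        )
        pairs.append({
            "control_id": c["id"],
            "treatment_id": best["id"],
            # Individual effect estimate: treated outcome minus control outcome.
            "effect": best[outcome] - c[outcome],
        })
    return pairs

# Made-up toy data, only to show the shape of the input and output.
control = [
    {"id": "c1", "age": 40, "income": 50, "lifespan": 78},
    {"id": "c2", "age": 60, "income": 30, "lifespan": 74},
]
treatment = [
    {"id": "t1", "age": 41, "income": 52, "lifespan": 81},
    {"id": "t2", "age": 59, "income": 29, "lifespan": 75},
    {"id": "t3", "age": 25, "income": 90, "lifespan": 85},
]

pairs = match_and_compare(control, treatment, ["age", "income"], "lifespan")
for p in pairs:
    print(p)
```

In this toy case c1 is paired with t1 and c2 with t2, and the `effect` column is what I would like to end up with as an attribute in RapidMiner.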
This is my first post. Sorry if anything is unclear; I'm happy to provide more detail if helpful.