How to map a predicted result from Auto Model to original data?

budyonosaputro Member Posts: 5 Contributor II
edited August 2019 in Help

Hi everyone,

I'm very new to RapidMiner and have run into a difficulty. My data has two columns: the first is text that I crawled from Twitter, and the second is the category assigned to that text. Part of the classification was done manually, and the rest needs to be predicted by RapidMiner. So I ran Auto Model on my text data, chose the "Predict" task on the first screen, and clicked Next until the results came out.
I've exported the prediction results to an Excel sheet, but I'm confused by them. The sheet does contain a prediction for my category, but I don't know how to map the results back to my original data, so I can't tell which rows belong to the positive or negative category.

Your help is really appreciated.
Thanks,
Budyono from Indonesia

Best Answer

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited August 2019 Solution Accepted
    Hello @budyonosaputro

    After some analysis of the Auto Model process for text, I found one way to keep the text column in the results so that you can fill in the empty values.

    1. Once you have run Auto Model, select "Open Process".

    2. In the opened process there is a block called "Handle Texts".

    3. Double-click "Handle Texts". Inside you will find the "Text Vectorization" block. Click on it and select the "Keep Originals" option in the Parameters panel.

    4. Run the process. You will see multiple result tabs; select the "Explain PredictionsIOObject" tab. It now shows the texts alongside the predictions, which you can use to fill your empty column.

    5. You can also write the results to Excel so that filling is easier. In the same process, connect a "Write Excel" operator to the "exa" port of the "Explain Predictions" operator, set the file name parameter of Write Excel, and run the process to get the predictions and texts in an Excel file.
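
Once the exported file contains both the texts and the predictions, the last step (copying predictions into the empty category cells) can also be done outside RapidMiner. The following is only a minimal Python sketch under several assumptions: both sheets have been saved as CSV, the column names `text`, `category`, and `prediction(category)` are hypothetical and must be adjusted to the real export, and the tweet text is identical in both files so it can serve as the lookup key.

```python
import csv
import io


def fill_missing_labels(original_rows, prediction_rows,
                        text_col="text", label_col="category",
                        pred_col="prediction(category)"):
    """Return copies of original_rows with empty labels filled from predictions.

    All column names are hypothetical placeholders for this sketch.
    """
    # Build a lookup from the tweet text to its predicted label.
    predicted = {row[text_col].strip(): row[pred_col] for row in prediction_rows}

    filled = []
    for row in original_rows:
        new_row = dict(row)
        if not (new_row.get(label_col) or "").strip():
            # Only rows without a manual label are filled in.
            new_row[label_col] = predicted.get(new_row[text_col].strip(), "")
        filled.append(new_row)
    return filled


# Tiny in-memory stand-ins for the two CSV files (made-up example rows).
original_csv = """text,category
great phone,positive
terrible service,
love it,
"""
predictions_csv = """text,prediction(category)
terrible service,negative
love it,positive
"""

original = list(csv.DictReader(io.StringIO(original_csv)))
predictions = list(csv.DictReader(io.StringIO(predictions_csv)))
result = fill_missing_labels(original, predictions)

for row in result:
    print(row["text"], "->", row["category"])
```

Manually labeled rows are left untouched, and a row whose text has no match in the predictions stays empty. If the same tweet text appears more than once, all copies receive the same prediction, which may or may not be what you want.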


    Hope this helps.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited August 2019
    Hello @budyonosaputro

    Auto Model separates the rows that have labels from the rows that do not. As I understand it, one reason is that performance metrics such as accuracy cannot be calculated on unlabeled data. The labeled data is then split 60:40 (train:test), and performance is calculated on the test data (divided into 7 folds and tested).
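
The separation and split described above could be pictured roughly like this. It is an illustrative Python sketch only, not RapidMiner's actual implementation: the function name, the `category` key, the shuffling, and the round-robin way of cutting the test set into 7 parts are all assumptions made for the example.

```python
import random


def split_for_validation(rows, label_key="category",
                         train_ratio=0.6, n_subsets=7, seed=42):
    """Separate labeled from unlabeled rows, then split the labeled rows 60:40.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    labeled = [r for r in rows if r.get(label_key)]
    unlabeled = [r for r in rows if not r.get(label_key)]

    # Shuffle a copy so the split does not depend on the input order.
    shuffled = labeled[:]
    random.Random(seed).shuffle(shuffled)

    cut = int(round(len(shuffled) * train_ratio))
    train, test = shuffled[:cut], shuffled[cut:]

    # Deal the test rows into n_subsets disjoint parts for performance estimation.
    subsets = [test[i::n_subsets] for i in range(n_subsets)]
    return train, test, subsets, unlabeled


# Made-up data: 100 labeled tweets and 20 unlabeled ones.
rows = [{"text": f"tweet {i}", "category": "positive" if i % 2 else "negative"}
        for i in range(100)]
rows += [{"text": f"new tweet {i}", "category": ""} for i in range(20)]

train, test, subsets, unlabeled = split_for_validation(rows)
print(len(train), len(test), len(unlabeled), [len(s) for s in subsets])
```

The point of the sketch is that the unlabeled rows never take part in training or in the performance calculation; they are set aside and only scored afterwards.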

    What happens to unlabeled data?
    The unlabeled data is not removed. It is scored by the "Apply TV on scoring" operator or the "Explain Predictions" operator, depending on the type of data given to Auto Model. You can see this in the Predictions tab, as shown in the figure. There you will find predictions for the 40% test set as well as for the unlabeled data. You can also tell which row each prediction belongs to from the feature columns.



    This statement from your post is a bit confusing:
     "but I don't know how to map the results with my original data"
    Do you mean that you don't know how to map the predictions for the unlabeled rows to their original labels?
    If so, I don't think that is possible without knowing the original labels. You can map predictions to the data based on the attributes (features), but computational modeling cannot recover an original label (you can only assume one based on the model's performance).

    Please let me know if you need more information. If this is not what you are looking for, provide an example with the process XML or an Excel file. If you would like to post some images of the problem, @Tghadially can help you with that.
    Regards,
    Varun

  • budyonosaputro Member Posts: 5 Contributor II
    Hi @varunm1,

    Thanks for your explanation.
    What I mean is that I want to fill in the unlabeled category with the prediction from RapidMiner. The result from RapidMiner is quite confusing to me and I don't know how to read it.

    FYI, my original data has only 2 columns: the first is free text from Twitter and the second is the positive/negative category of that text. Some of the rows are already categorized as positive or negative, and the rest need to be filled in with the prediction from RapidMiner.

    I hope you can understand my question. I'd like to post a picture here so you can see what I need, but it always says "You have to be around for a little while longer before you can post links." I just sent a message to @Tghadially about this issue and I hope I can upload my pictures here soon.

    Thanks,
    Budyono from Indonesia
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Thanks for your response. I understand your requirement now. Once you get access to post images, we can take a look at the predictions and clear up the confusion. It's Sunday night here in the US, so you will most likely get access tomorrow morning once they are in the office.
    Regards,
    Varun

  • budyonosaputro Member Posts: 5 Contributor II
    Thanks @varunm1 for the info. I didn't notice that it's currently Sunday night there, because in Indonesia it's already Monday morning, haha. OK then, I'll get back to you once I can post some images here.

    Thanks,
    Budyono from Indonesia
  • budyonosaputro Member Posts: 5 Contributor II
    Hi @varunm1

    I'm back, and I can post images now, hehe. Sorry for the late reply, by the way.
    Below is the picture I promised you.

    The picture above is an example of my original data. I need to fill the yellow column with the predicted result from RapidMiner. But the result in RapidMiner looks like the image below, and I don't know how to fill my original data with it. Can you help me?


    Thanks,
    Budyono from Indonesia
  • budyonosaputro Member Posts: 5 Contributor II
    Dear @varunm1

    Thank you so much for your ultimate solution. It really helps me a lot. aaahhh I'm so happy right now. hahaha

    Thanks and Regards,
    Budyono from Indonesia
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @IngoRM

    While working on this question, I thought of something I'd like your input on. Can you explain why we cannot access a process in Auto Model unless we run it first? Is it possible to access the process from the page below when I click on a decision tree? I ask because, if I want to change something in the model, I currently need to run it first and then access the process.


    Regards,
    Varun
