Input data with each cell containing an array instead of a single numerical or categorical entry.

explorer Member Posts: 7 Newbie
Urgent!! I have posted this before under a different title but have not received any response. I am trying to build a model that must take all of its inputs as arrays, with each cell holding an array of the same size. Both the numerical and the categorical inputs have to be arrays. The reason is that the predicted output is given for a "group", but each group has several members with their own decision variables, and each member contributes to the group output differently depending on those variables. Imagine, for example, that I have 1000 football matches as sample data and would like to predict the number of goals a team will score. I know that the number of goals depends on teamwork and that each player contributes to it. So I collect the decision variables for each player (age, skill level, experience, role, etc.), but my predicted output (number of goals) is a group value: I can't assign an output to each player, only to each team. Still, I need to be able to provide the input variables for every player of that team individually, as an array, in each cell. How is this kind of problem solved in RapidMiner?

Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    You don't need arrays to do this, but you do need to think about how you structure your dataset so that RapidMiner produces the outcome you want. In your example, you would predict at the team level: each row would be one game for one team, and the label would be the number of goals that team scored. You would then have attributes for each individual player (as many as you need), plus indicators for which players took part in each game for each team. You could end up with hundreds of attributes, depending on what you track for each individual player.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
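To make the layout described in this answer concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the player names (`p1`, `p2`, `p3`), their attributes, and the helper `build_row` are hypothetical and do not come from the thread or from RapidMiner. Each row stands for one team in one game, with a participation indicator and attribute columns per player, and the team's goals as the label.

```python
# Hypothetical roster: attributes tracked for each individual player.
PLAYERS = {
    "p1": {"age": 24, "skill": 7},
    "p2": {"age": 31, "skill": 9},
    "p3": {"age": 19, "skill": 5},
}


def build_row(lineup, goals):
    """Return one wide-format example: one team in one game."""
    row = {}
    for name, attrs in PLAYERS.items():
        playing = name in lineup
        # Indicator column: did this player take part in this game?
        row[f"{name}_played"] = 1 if playing else 0
        for attr, value in attrs.items():
            # Attribute stays missing (None) when the player did not play.
            row[f"{name}_{attr}"] = value if playing else None
    # Label at the team level.
    row["goals"] = goals
    return row


rows = [
    build_row({"p1", "p2"}, goals=3),
    build_row({"p2", "p3"}, goals=1),
]
```

With three players and two attributes each, every row has nine input columns plus the label, whichever players took part. A table of this shape could then be exported (for example as CSV) and loaded like any other dataset.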

Answers

  • explorer Member Posts: 7 Newbie
    First of all, I wish to express my sincere gratitude for getting a response after quite some time. Greatly appreciated! My apologies for posting a second time; this was Tyler's recommendation when I chatted with him. Going back to my example: I had thought about something similar to your solution, but note that the football team and players were only meant to illustrate the need for an "array". In my actual task, the number of members in a given entity (group) varies from one entity to another. Also, I cannot predetermine the maximum number of members possible for any given entity. I would go with your suggestion as a last resort, but if there is any way to define the inputs as arrays I would be glad to know. Also, let me know if there is a way I can close the other post to avoid a duplicate. Thanks very much.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I just replied on the other ticket. If you mark one as Solved, it will at least not attract additional attention.
    To my knowledge, there is no way to handle arrays as you are requesting with native RapidMiner operators, but you might be able to accomplish it with R or Python scripting.
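One way a script could cope with groups of varying size is sketched below. Note that this is a swapped-in technique, not true array input: each group's member-level values are aggregated into a fixed set of summary columns (count, mean, min, max), so every group yields a row of the same width however many members it has. The data, field names, and the `summarize` helper are hypothetical, and the sketch uses only the Python standard library rather than any RapidMiner API.

```python
from statistics import mean

# Hypothetical long-format data: one record per member, tagged with its group.
members = [
    {"group": "A", "age": 24, "skill": 7},
    {"group": "A", "age": 31, "skill": 9},
    {"group": "B", "age": 19, "skill": 5},
    {"group": "B", "age": 27, "skill": 6},
    {"group": "B", "age": 35, "skill": 8},
]


def summarize(records, numeric_fields=("age", "skill")):
    """Collapse a variable number of members per group into fixed columns."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["group"], []).append(rec)

    table = {}
    for group_id, recs in groups.items():
        # The member count is kept as a feature in its own right.
        row = {"n_members": len(recs)}
        for field in numeric_fields:
            values = [r[field] for r in recs]
            row[f"{field}_mean"] = mean(values)
            row[f"{field}_min"] = min(values)
            row[f"{field}_max"] = max(values)
        table[group_id] = row
    return table


features = summarize(members)
```

Here group A (two members) and group B (three members) both produce seven columns. The trade-off is that summaries discard per-member detail, so which statistics to keep is a modeling decision for the actual task.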
  • explorer Member Posts: 7 Newbie
    A little more explanation regarding missing entries, as I am completely new to RapidMiner: my plan is to assume a fixed number of entity members, so when an entity has fewer members than there are columns (attributes), the extra columns will be empty. In other words, some rows will have every column filled and others will not. Of course, I do not want to replace the missing entries in that case. Does RapidMiner automatically exclude the empty ones from the prediction, or is there something additional I need to do to tell it which columns/attributes should be ignored for a given row?
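The fixed-slot plan described in this post can be sketched as follows. This is an illustration only: the cap `MAX_SLOTS`, the column names, and the `pad_group` helper are assumptions made for the example. Unused slots are left as missing values (`None`), and a per-slot presence flag plus a member count record explicitly which slots are real. Whether a given learner accepts missing values at all varies by learner, so that should be checked for the model actually used.

```python
MAX_SLOTS = 4  # assumed cap on members per entity (hypothetical)


def pad_group(values, max_slots=MAX_SLOTS):
    """Spread one entity's member values over a fixed number of slot columns."""
    if len(values) > max_slots:
        raise ValueError(f"entity has {len(values)} members, cap is {max_slots}")

    row = {}
    for i in range(max_slots):
        present = i < len(values)
        # Presence flag marks whether this slot holds a real member.
        row[f"member{i + 1}_present"] = 1 if present else 0
        # Empty slots stay missing instead of being filled with a fake value.
        row[f"member{i + 1}_value"] = values[i] if present else None
    row["n_members"] = len(values)
    return row


row = pad_group([7, 9])
```

Raising an error when an entity exceeds the cap makes the fixed-size assumption visible instead of silently dropping members.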