
SOLVED - Basic data consolidation problem

sgtrock Member Posts: 17 Contributor II
edited November 2018 in Help
Never mind. After sleeping on it, I realised that the clue to doing what I wanted was shown in the video, Using RapidAnalytics 3: Exposing RapidAnalytics Processes as Web Services. All I really needed to do was Aggregate by Asset and sum Seats. Dead obvious once I really understood what that function did!
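
For anyone who finds this thread later, here is a rough sketch of the same idea (group by Asset, sum Seats) in plain Python rather than RapidMiner. The sample data and the function name total_seats_by_asset are made up for illustration; only the column names Asset, Version, and Seats come from my extract. It assumes every Seats value is a whole number, which hand-entered data may not guarantee.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the asset extract; the real file has about 30 columns.
SAMPLE_CSV = """Asset,Version,Seats
EditorX,1.0,5
EditorX,2.0,7
DrawTool,3.1,2
"""


def total_seats_by_asset(csv_file):
    """Group records by Asset and sum Seats across all Versions."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Assumes Seats is always a clean integer string.
        totals[row["Asset"]] += int(row["Seats"])
    return dict(totals)


totals = total_seats_by_asset(io.StringIO(SAMPLE_CSV))
for asset, total_seats in totals.items():
    print(asset, total_seats)
```

The result has one record per Asset instead of one per Asset/Version pair, which is why the output is shorter than the input.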

========
Hi, all;

My primary job function is as a system administrator. I have multiple text data sources that have been generated from logs, data extracts, what have you, that can come at me multiple times a day. Like many other sysadmins, I started out using traditional tools like shell scripts, Perl, Python, and Excel spreadsheets to make sense of the data.

Over time, I've found that this approach becomes inflexible whenever I need to build new ways to look at the data. I'm trying to figure out how to handle these multiple fire hoses of data without spending an arm and a leg. RapidMiner, especially 5.x, looks like it might be a good alternative for me.

I have gone through the 5.0 User Manual, the wiki, and every video tutorial that I could find. Unfortunately, my lack of background in data mining techniques is hampering my ability to get much done so far, as virtually all of the available documentation appears to be aimed at data mining professionals. Here is a typical example that I'm hoping someone might help me figure out:

I've imported a .CSV file with about 30 attributes successfully. This file is an extract from our software asset management system. The data has been entered largely by hand and we know it's not perfect. However, it's the best that we have for the time being.

The first step is to create a picture of the number of users per application. The software asset file has a field called Asset, another called Version, and a third called Seats (total number of assigned licenses for a piece of software). Each software Asset will have zero, one, or more Versions. Each Asset/Version pair has zero, one, or more Seats.

What I want to do is simple enough. I have isolated these three attributes successfully. Now I wish to iterate through them by Asset, summing Seats across all Versions into a new attribute called TotalSeats. The output should include just those two attributes, Asset and TotalSeats. Based on some offline work using other tools, I would expect the output to have roughly 30% fewer records than the input file.

I've been beating my head against the wall for two days trying to figure this out, with no success. I've tried many combinations of GenerateAttribute, various forms of WorkOnSubset, Aggregate, Loop, Pivot, and I don't know what all, to no avail. The closest I've come was using GenerateAttribute. It gives me a result where the number of records in the output is equal to the number in the input. TotalSeats gets created, but each instance equals the value in Seats.
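
To illustrate what I think is happening, here is a small Python sketch contrasting the two behaviours. It assumes that GenerateAttribute evaluates its expression once per record, which matches the result I'm seeing but is my guess rather than something I've confirmed in the documentation. The sample records are invented.

```python
from collections import Counter

# Invented sample records: (Asset, Version, Seats)
records = [
    ("EditorX", "1.0", 5),
    ("EditorX", "2.0", 7),
    ("DrawTool", "3.1", 2),
]

# Per-record calculation: each output row only sees its own Seats value,
# so TotalSeats is just a copy of Seats and the row count is unchanged.
per_record = [(asset, seats) for asset, _version, seats in records]

# Grouped calculation: rows sharing an Asset collapse into one row
# whose TotalSeats is the sum of their Seats.
grouped = Counter()
for asset, _version, seats in records:
    grouped[asset] += seats

print(len(records), len(per_record), len(grouped))
```

The per-record version keeps every input row, while the grouped version produces fewer rows, which is the reduction I'm expecting.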

This is such a simple task that I know that I'm missing something basic. Can someone tell me where I'm going wrong, or at least point me at a resource that might help a newbie like me?

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Welcome aboard! You have already mastered the first step on a very steeeep learning curve. :)

    Greetings,
      Sebastian