"Multiple Iteration Linear Regression for a range of values for the coefficients"
I’m using RapidMiner Studio 6.4 with a spreadsheet as the data source.
I’m trying to help a law firm figure out the standard costs for some of the services they perform for their clients. They have found that people like the firm because its standard costs are clearly stated, rather than having to pay by the hour without knowing the total in advance.
In the past they would do time studies, asking representatives how many minutes it takes them to do certain parts of their services. My problem with this is that people are very fuzzy on their times. They have trouble answering because the time can vary with many parameters of the request that I have no way to look up, and because of the standard cost method people do not always track how much time they put into each thing they do (that’s another discussion).
I’m wondering if I can solve it with linear regression. They all have Outlook calendars that they keep up to date. Can I take all the available working hours and subtract vacation, department meetings, and anything else that is obviously not income-producing, to get the total hours per month available for lawyer output? Then I would count how many items of each type were completed each month over several years and run a linear regression of total hours on those counts.
| Month | Wills | House Closing | Employment Contracts | Total Working Hours Available (label) |
The coefficients that come back for each product would tell me the average hours it takes for each one to complete.
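To make the setup concrete, here is a minimal sketch of that regression in Python with NumPy rather than RapidMiner. The monthly counts and hours are synthetic, the "true" hours per item are made up, and `estimate_hours_per_item` is a hypothetical helper name. It also assumes there is no intercept, i.e. that all available hours go into the listed products; a fixed monthly overhead would need an extra constant column.

```python
import numpy as np

# Hypothetical column names matching the spreadsheet layout above.
PRODUCTS = ["Wills", "House Closing", "Employment Contracts"]

def estimate_hours_per_item(counts, total_hours):
    """Least-squares fit of total_hours ~ counts @ hours_per_item (no intercept)."""
    counts = np.asarray(counts, dtype=float)
    total_hours = np.asarray(total_hours, dtype=float)
    coef, _, _, _ = np.linalg.lstsq(counts, total_hours, rcond=None)
    return coef

# Synthetic data (assumption): 36 months, made-up average hours per item.
rng = np.random.default_rng(0)
true_hours = np.array([5.0, 3.0, 8.0])
counts = rng.integers(0, 15, size=(36, 3))  # items completed per month
total_hours = counts @ true_hours + rng.normal(0.0, 4.0, size=36)

coef = estimate_hours_per_item(counts, total_hours)
for name, hours in zip(PRODUCTS, coef):
    print(f"{name}: about {hours:.2f} hours per item")
```

Each fitted coefficient is read as the average hours one item of that type consumes, which is only meaningful if the monthly counts vary enough from month to month to separate the products.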
The one issue I have with this is that they all tell me each product can take more or less time depending on a variety of conditions they might not know about, and they want that averaged in. So a will could take 2 hours if it is simple, but for someone else it could take 14. They’d like to know what kind of range each one could take.
Is there a way to run iterations of multiple linear regressions and get a best (average?) coefficient for each product as well as the range of values that worked? For example, I might have some linear regression ensemble method that tells me that on average it takes 5 hours for a will, but it could take a range of one hour to 14 hours and the standard deviation is 1.5 hours.
Is this at all possible in RapidMiner? I don’t even know which search terms to use in trying to figure this out.
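One way to read "iterations of multiple linear regressions" is a bootstrap: resample the months with replacement, refit the regression each time, and summarize the spread of the coefficients. The sketch below does this in Python with NumPy, not RapidMiner. The data is synthetic, and `bootstrap_coefficients`, its parameters, and the no-intercept model are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_coefficients(counts, total_hours, n_iter=500, seed=0):
    """Refit a no-intercept least-squares regression on resampled months."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    total_hours = np.asarray(total_hours, dtype=float)
    n_months = len(total_hours)
    coefs = np.empty((n_iter, counts.shape[1]))
    for i in range(n_iter):
        # Draw n_months row indices with replacement.
        idx = rng.integers(0, n_months, size=n_months)
        coefs[i] = np.linalg.lstsq(counts[idx], total_hours[idx], rcond=None)[0]
    return coefs

# Synthetic data (assumption): 36 months, three products.
rng = np.random.default_rng(1)
true_hours = np.array([5.0, 3.0, 8.0])
counts = rng.integers(0, 15, size=(36, 3))
total_hours = counts @ true_hours + rng.normal(0.0, 4.0, size=36)

coefs = bootstrap_coefficients(counts, total_hours)
mean = coefs.mean(axis=0)             # "best" coefficient per product
std = coefs.std(axis=0, ddof=1)       # spread across the refits
low, high = np.percentile(coefs, [2.5, 97.5], axis=0)

for j, name in enumerate(["Wills", "House Closing", "Employment Contracts"]):
    print(f"{name}: mean {mean[j]:.2f} h, sd {std[j]:.2f} h, "
          f"95% interval {low[j]:.2f} to {high[j]:.2f} h")
```

Note that the spread this produces describes how uncertain the average hours per item is, not how much an individual will can vary (say 2 versus 14 hours). Monthly totals alone do not pin down the per-item range, so that would likely need additional data about individual matters.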