01-30-2011 07:06 AM
02-09-2011 01:58 AM
02-12-2011 03:18 AM
09-22-2014 06:22 AM
Balázs Bárány wrote:
Basically, MythMiner creates two "corpuses" from the MySQL database that is used by MythTV. One corpus consists of the categories, titles and descriptions of programs that were recorded by the user in the past. The second corpus consists of the same data of programs that *weren't* recorded. (The assumption is that the user did record everything he/she is interested in.) The second corpus is sampled, of course, because it can be a huge list.
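To make the two-corpus idea concrete, here is a minimal sketch in plain Python. It is not the actual RapidMiner process: the field names (`category`, `title`, `description`), the `recorded_titles` set and the function name are all assumptions for illustration, and the real data would come from the MythTV MySQL database rather than from a list of dicts.

```python
import random

def build_corpora(programs, recorded_titles, sample_size, seed=0):
    """Split program records into a 'recorded' corpus and a sampled
    'not recorded' corpus. All names here are hypothetical."""
    def as_text(p):
        # One document per program: category, title and description joined.
        return " ".join([p["category"], p["title"], p["description"]])

    recorded = [as_text(p) for p in programs if p["title"] in recorded_titles]
    others = [as_text(p) for p in programs if p["title"] not in recorded_titles]

    # The not-recorded list can be huge, so only a random sample is kept.
    rng = random.Random(seed)
    sampled = rng.sample(others, min(sample_size, len(others)))
    return recorded, sampled
```

A fixed seed keeps the sample reproducible between runs, which helps when comparing models.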
The rest is normal text mining: building a model from the recorded and not recorded samples and applying this model and wordlist to tomorrow's program data.
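As a rough illustration of that step, the sketch below trains a tiny multinomial naive Bayes classifier on the two corpora and scores a new program description. This is a stand-in written in plain Python, not the RapidMiner operators used in MythMiner; the function names and the labels `rec` and `other` are made up, and it assumes both corpora are non-empty.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; a real process would likely also filter stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())

def train(recorded_docs, other_docs):
    """Build the wordlist and per-class token counts (assumes both lists are non-empty)."""
    counts = {"rec": Counter(), "other": Counter()}
    for label, docs in (("rec", recorded_docs), ("other", other_docs)):
        for doc in docs:
            counts[label].update(tokenize(doc))
    vocab = set(counts["rec"]) | set(counts["other"])
    total_docs = len(recorded_docs) + len(other_docs)
    priors = {"rec": len(recorded_docs) / total_docs,
              "other": len(other_docs) / total_docs}
    return counts, vocab, priors

def confidence(model, text):
    """Return the estimated probability that the program would be recorded."""
    counts, vocab, priors = model
    log_scores = {}
    for label in ("rec", "other"):
        total = sum(counts[label].values())
        score = math.log(priors[label])
        for tok in tokenize(text):
            if tok in vocab:  # words outside the wordlist are ignored
                # Laplace smoothing avoids zero probabilities.
                score += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        log_scores[label] = score
    # Normalise the two log scores into a probability for the 'rec' class.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp["rec"] / (exp["rec"] + exp["other"])
```

Usage would be something like `model = train(recorded, sampled)` followed by `confidence(model, description)` for each of tomorrow's programs.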
I did a lot of experimenting with Bayes and SVM operators and got slightly better results from SVM. My machines also worked day and night on parameter optimization (grid and evolutionary). I had to try all the things I learned in the courses in Dortmund.
The result is saved in an HTML page using the reporting plugin and optionally (when the confidence exceeds a configurable threshold) inserted into the scheduled recordings table. (MythTV supports recording priorities, and MythMiner uses the lowest priority. So the user's manual recordings won't be interfered with.)
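The threshold logic can be sketched as follows. This only shows the selection step: the function name, the tuple layout and the `lowest_priority` value of -99 are assumptions, and the actual insert into MythTV's scheduling table is left out because its schema is not described here.

```python
def select_for_scheduling(scored_programs, threshold, lowest_priority=-99):
    """Pick programs whose confidence exceeds the threshold and tag them
    with the lowest priority so manual recordings always win.

    scored_programs: iterable of (title, confidence) pairs (hypothetical layout).
    lowest_priority: placeholder value, not taken from MythTV documentation.
    """
    return [
        {"title": title, "confidence": conf, "priority": lowest_priority}
        for title, conf in scored_programs
        if conf > threshold
    ]
```

Everything that was scored would still go into the HTML report; only the selected entries would be candidates for automatic scheduling.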
The process also works in RapidAnalytics; that's how I use it currently. But it works equally well from the RapidMiner command line.
The resulting HTML can be viewed as is, but my cron script applies a few transformations to it (e.g. it makes URLs clickable) and mails it to me each morning.
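One such transformation, making bare URLs clickable, might look like the sketch below. The original cron script is not shown in this thread, so this is only a guess at the idea in Python; `linkify` and `URL_RE` are made-up names. URLs that already sit inside an attribute or tag are skipped, and trailing punctuation directly after a URL is not handled.

```python
import re

# Match http(s) URLs that are not directly preceded by a quote, '>' or '=',
# so URLs already inside href="..." or used as link text are left untouched.
URL_RE = re.compile(r"(?<![\"'>=])\bhttps?://[^\s<>\"']+")

def linkify(html_text):
    """Wrap bare URLs in anchor tags."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)),
        html_text,
    )
```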
By the way, I'm a bit unhappy with the reporting plugin: I fully understand that you include a text like "Created with RapidMiner" with a link to Rapid-I in the default template. But if the user creates a template on his own, that shouldn't be changed by RapidMiner. This is shareware mentality, not real Open Source mentality. (Of course I could change the source code but then I would have to do that each time the plugin is updated.)
You might consider removing this "feature" and just leaving the message in the default template.