MythTV RapidMiner = MythMiner - A suggestion system for TV programs

BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
edited July 2019 in Help
Hi!

Over the last few months, I created a complex RapidMiner process that suggests TV programs based on the programs the user has recorded. It's a bit like a TiVo, but for the open-source personal video recorder software MythTV (http://www.mythtv.org/). MythMiner also works in RapidAnalytics.

MythTV users can now receive daily suggestions for new TV programs they might be interested in. MythMiner can even automatically schedule recordings of interesting programs if the user wishes.

Homepage: http://tud.at/programm/mythminer/

RapidMiner users, please note that MythTV is a complex system that is currently only available for Unix/Linux. Installing it can take weeks, and you probably need a dedicated computer for it. So you may want to look at MythMiner to learn how I solved this problem, but it can only be used with a full MythTV installation.

Thanks to the RapidMiner team for this amazing software and especially to Mr. Ralf Klinkenberg at Rapid-I who taught me everything about data and text mining in the training courses.

Answers

  • B_B_ Member Posts: 70 Maven
    Unique application and a good example for ideas.

    Very cool.
  • RalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    Dear Balázs Bárány,

    thanks a lot for your kind words. I have to return your compliments: you are one of the most experienced and knowledgeable data mining experts I have met in our training courses so far. And I have given a lot of courses during the last five years or so, so that means something. It is a pleasure to work with you, and I hope we have a chance to meet again this year.

    Best wishes,
    Ralf
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,

    that really sounds interesting. How does it work technically?

    Greetings,
      Sebastian
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hello,

    I uploaded the process to myExperiment.org so you can take a look yourself:
    http://www.myexperiment.org/workflows/1796.html
    But if you'd like to use it, better download the ZIP file from the homepage http://tud.at/programm/mythminer/, which also contains the parameter definitions and the cron script.

    Basically, MythMiner creates two "corpora" from the MySQL database used by MythTV. One corpus consists of the categories, titles and descriptions of programs the user recorded in the past. The second corpus consists of the same data for programs that *weren't* recorded. (The assumption is that the user recorded everything he/she is interested in.) The second corpus is sampled, of course, because it can be a huge list.
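
    A minimal Python sketch of the corpus-building step, for readers who want to see the idea outside RapidMiner. It is not the MythMiner process itself, which uses RapidMiner's database operators against MythTV's MySQL database. The table and column names (`program`, `recorded`, `chanid`, `starttime`, ...) and the rule "recorded = same channel and start time" are simplifying assumptions, loosely modeled on MythTV but not its real schema; an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import random
import sqlite3


def build_corpora(conn, sample_size, seed=42):
    """Return (recorded_docs, not_recorded_docs) as lists of strings.

    Each document joins category, title and description. The
    not-recorded corpus is randomly sampled because it can be huge.
    """
    def docs(sql):
        # NULL columns are skipped before joining.
        return [" ".join(col for col in row if col) for row in conn.execute(sql)]

    # Hypothetical schema: a program counts as "recorded" if a row with
    # the same channel and start time exists in the recorded table.
    recorded = docs(
        "SELECT p.category, p.title, p.description FROM program p "
        "WHERE EXISTS (SELECT 1 FROM recorded r "
        "WHERE r.chanid = p.chanid AND r.starttime = p.starttime)"
    )
    not_recorded = docs(
        "SELECT p.category, p.title, p.description FROM program p "
        "WHERE NOT EXISTS (SELECT 1 FROM recorded r "
        "WHERE r.chanid = p.chanid AND r.starttime = p.starttime)"
    )
    if len(not_recorded) > sample_size:
        not_recorded = random.Random(seed).sample(not_recorded, sample_size)
    return recorded, not_recorded


# Tiny in-memory stand-in for the MythTV database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (chanid INTEGER, starttime TEXT, "
             "category TEXT, title TEXT, description TEXT)")
conn.execute("CREATE TABLE recorded (chanid INTEGER, starttime TEXT)")
conn.executemany("INSERT INTO program VALUES (?, ?, ?, ?, ?)", [
    (1, "2011-01-01 20:00", "Documentary", "Deep Oceans", "Marine life"),
    (1, "2011-01-01 21:00", "Soap", "Love Street", "A wedding goes wrong"),
    (2, "2011-01-01 20:00", "News", "Evening News", None),
    (2, "2011-01-01 21:00", "Documentary", "Space Frontier", "Off to Mars"),
])
conn.executemany("INSERT INTO recorded VALUES (?, ?)",
                 [(1, "2011-01-01 20:00"), (2, "2011-01-01 21:00")])

rec, not_rec = build_corpora(conn, sample_size=1)
```

    Using a seeded `random.Random` keeps the sample reproducible between runs, which helps when comparing models.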

    The rest is standard text mining: building a model from the recorded and not-recorded samples, then applying this model and its wordlist to tomorrow's program data.
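
    The essential detail is that tomorrow's programs must be vectorized with the same wordlist that was built from the training corpora; words the model has never seen are simply ignored. The sketch below illustrates this with a hand-written multinomial Naive Bayes in Python, standing in for RapidMiner's text processing and Bayes operators. The tokenizer, the Laplace smoothing constant `alpha` and the example documents are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-letters (a stand-in for RapidMiner's
    # tokenizing and case-transforming operators).
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def build_wordlist(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def train_nb(pos_docs, neg_docs, wordlist, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over a fixed wordlist."""
    vocab = set(wordlist)
    n_docs = len(pos_docs) + len(neg_docs)
    model = {}
    for label, docs in (("rec", pos_docs), ("not_rec", neg_docs)):
        counts = Counter(t for d in docs for t in tokenize(d) if t in vocab)
        total = sum(counts.values())
        log_prior = math.log(len(docs) / n_docs)
        log_lik = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                   for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def confidence_recorded(model, text):
    """Estimated P(recorded | text); words outside the wordlist are ignored."""
    scores = {label: log_prior + sum(log_lik[t] for t in tokenize(text) if t in log_lik)
              for label, (log_prior, log_lik) in model.items()}
    # Normalize the two log-scores into a probability (softmax).
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    return exp["rec"] / sum(exp.values())


recorded_docs = ["Documentary Deep Oceans marine life",
                 "Documentary Space Frontier journey to Mars"]
not_recorded_docs = ["Soap Love Street wedding drama", "News Evening headlines"]
wordlist = build_wordlist(recorded_docs + not_recorded_docs)
model = train_nb(recorded_docs, not_recorded_docs, wordlist)
high = confidence_recorded(model, "Documentary about marine life")
low = confidence_recorded(model, "Soap wedding special")
```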

    I did a lot of experimenting with Bayes and SVM operators and got slightly better results with SVM. My machines also ran day and night on parameter optimization (grid and evolutionary). I had to try all the things I learned in the courses in Dortmund ;)
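
    The grid-search part of such a parameter optimization can be sketched like this. As an assumption, a linear SVM without bias term is trained with the Pegasos stochastic sub-gradient method, a self-contained substitute for RapidMiner's SVM operators and not what MythMiner itself uses, and the regularization strength `lam` is chosen by k-fold cross-validation. The grid values, iteration count and binary bag-of-words features are illustrative; the evolutionary search is not shown.

```python
import random
import re


def features(text):
    # Binary bag-of-words features as a sparse dict.
    return {t: 1.0 for t in re.split(r"[^a-z]+", text.lower()) if t}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(data, lam, iters=2000, seed=0):
    """Linear SVM (no bias term) via Pegasos; data = [(features, +1 or -1)]."""
    rng = random.Random(seed)
    w = {}
    for t in range(1, iters + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = y * dot(w, x)  # uses the weights before this step
        # Regularization step: shrink all weights.
        for k in w:
            w[k] *= 1.0 - eta * lam
        # Hinge-loss sub-gradient step if the margin is violated.
        if margin < 1.0:
            for k, v in x.items():
                w[k] = w.get(k, 0.0) + eta * y * v
    return w


def cv_accuracy(data, lam, folds=3):
    correct = 0
    for f in range(folds):
        train = [d for i, d in enumerate(data) if i % folds != f]
        test = [d for i, d in enumerate(data) if i % folds == f]
        w = train_pegasos(train, lam)
        correct += sum(1 for x, y in test if (dot(w, x) > 0) == (y > 0))
    return correct / len(data)


def grid_search(data, grid=(0.001, 0.01, 0.1, 1.0)):
    # Pick the regularization strength with the best cross-validated accuracy.
    return max(grid, key=lambda lam: cv_accuracy(data, lam))


docs = [("documentary deep sea marine life", 1),
        ("documentary space mars journey", 1),
        ("documentary volcanoes of iceland", 1),
        ("soap wedding goes wrong", -1),
        ("soap love street", -1),
        ("soap family secrets", -1)]
data = [(features(text), y) for text, y in docs]
best_lam = grid_search(data)
weights = train_pegasos(data, best_lam)
```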

    The result is saved as an HTML page using the reporting plugin and optionally (when the confidence exceeds a configurable threshold) inserted into the scheduled recordings table. MythTV supports recording priorities, and MythMiner uses the lowest priority, so it won't interfere with the user's manual recordings.
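
    The final decision step might look roughly like this. The `Suggestion` fields, the `recpriority` key and the `LOWEST_PRIORITY` value are hypothetical placeholders, not MythTV's actual `record` table layout or priority range. Every suggestion goes into the report, but only those whose confidence exceeds the threshold become recording rules, and those always get the lowest priority.

```python
from dataclasses import dataclass

# Hypothetical placeholder for "lowest priority"; not MythTV's real range.
LOWEST_PRIORITY = -99


@dataclass
class Suggestion:
    title: str
    chanid: int
    starttime: str
    confidence: float


def plan(suggestions, threshold):
    """Return (report, rules): all suggestions, best first, plus the
    recording rules for those whose confidence exceeds the threshold."""
    report = sorted(suggestions, key=lambda s: s.confidence, reverse=True)
    rules = [
        # Auto-scheduled recordings always get the lowest priority, so
        # they yield to the user's own (higher-priority) recordings.
        {"title": s.title, "chanid": s.chanid,
         "starttime": s.starttime, "recpriority": LOWEST_PRIORITY}
        for s in report
        if s.confidence > threshold
    ]
    return report, rules


report, rules = plan([
    Suggestion("Deep Oceans", 1, "2011-01-02 20:00", 0.91),
    Suggestion("Love Street", 1, "2011-01-02 21:00", 0.12),
    Suggestion("Space Frontier", 2, "2011-01-02 21:00", 0.75),
], threshold=0.8)
```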

    The process also works in RapidAnalytics; that's how I use it currently. But it works equally well from the RapidMiner command line.

    The resulting HTML can be viewed as is, but my cron script applies a few transformations to it (e.g. it makes URLs clickable) and mails it to me each morning.
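
    The cron script's post-processing is not shown in the post, so the following is only a guess at a comparable transformation in Python: wrapping bare URLs in `<a>` tags while leaving existing links and attribute values alone, then packing the HTML into an e-mail with the standard library's `email.message.EmailMessage`. The subject line and addresses are placeholders, the regex is a naive heuristic rather than a real HTML parser, and actually sending the message (e.g. with `smtplib`) is left out.

```python
import re
from email.message import EmailMessage

URL_RE = re.compile(
    r"""(<a\b[^>]*>.*?</a>)"""                # group 1: existing link, keep as is
    r"""|(?<!["'=])(https?://[^\s<>"']+)""",  # group 2: bare URL, not in an attribute
    re.IGNORECASE | re.DOTALL,
)


def _wrap(match):
    if match.group(1):
        return match.group(1)
    url, trail = match.group(2), ""
    # Keep trailing sentence punctuation outside the link.
    while url and url[-1] in ".,;:!?)":
        url, trail = url[:-1], url[-1] + trail
    return f'<a href="{url}">{url}</a>{trail}'


def linkify(html):
    """Turn bare http(s) URLs in an HTML string into clickable links."""
    return URL_RE.sub(_wrap, html)


def build_mail(html, sender, recipient):
    """Build (but do not send) an HTML e-mail containing the linkified report."""
    msg = EmailMessage()
    msg["Subject"] = "MythMiner suggestions"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(linkify(html), subtype="html")
    return msg
```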

    By the way, I'm a bit unhappy with the reporting plugin: I fully understand that you include a text like "Created with RapidMiner" with a link to Rapid-I in the default template. But if users create their own template, RapidMiner shouldn't modify it. This is a shareware mentality, not a real Open Source mentality. (Of course I could change the source code, but then I would have to do that each time the plugin is updated.)

    You might consider removing this "feature" and just leave the message in the default template.
  • MarkHaxton Member Posts: 1 Contributor I
    Hi Balázs Bárány, very cool script you have created. I looked through your workflow and I like the way you work. I was wondering if you're still around to help with a project I'm working on...
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi MarkHaxton,

    yes, I still exist ;-)

    See my private message.

    Best,

    Balázs