best unbiased estimator vs second moment

steffen · June 2008

Hello

A quite unimportant question:

Is there any specific reason why you use the second moment instead of the best unbiased estimator for variance? I have often stumbled over different calculations in different libraries, which sometimes caused a little headache: "why do the results differ now?".
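To make the difference concrete, here is a minimal sketch. It assumes that "second moment" means the second central moment (divide by n) and that "best unbiased estimator" means the usual sample variance (divide by n - 1). The function names and the data are made up for illustration and are not taken from RapidMiner or any other library.

```python
import math
import statistics


def second_moment_variance(values):
    """Second central moment: mean squared deviation, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def unbiased_variance(values):
    """Sample variance: same sum of squares, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical example data: mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

biased = second_moment_variance(data)   # 32 / 8 = 4.0
unbiased = unbiased_variance(data)      # 32 / 7, about 4.571

# Python's standard library offers both conventions under different names.
assert math.isclose(biased, statistics.pvariance(data))
assert math.isclose(unbiased, statistics.variance(data))

print(biased, unbiased)
```

The two results differ only by the factor n / (n - 1), so the gap shrinks as the sample grows, but for small samples it is large enough to explain why two libraries disagree.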

greetings

Steffen

TobiasMalbrecht · June 2008

Hi Steffen,

this seems to be largely a question of taste. As far as I know, there is no particular reason why RM uses the second moment rather than the best unbiased estimator for variance. Moreover, the statement does not hold for all RM variance/standard deviation calculations: not all of them use the second moment. Hence, I think we should discuss this topic internally, maybe centralize the aforementioned calculations, and clearly communicate which way we chose to calculate and possibly also why. Maybe our Wiki would be a good place to explain and document such calculations for users and developers. What do you think?

Regards,
Tobias

steffen · June 2008

Hello Tobias

Ah yeah, the wiki thing. I have thought a lot about some kind of "Data Mining Wiki". As far as I can see, there is no such thing as a successful Data Mining Wiki out there, ignoring Wikipedia, which already contains a lot of articles about statistics and various algorithms.

I looked at your wiki, too, and thought about writing something. To be honest, these are my main problems...

- The wiki is proprietary: If the situation of your company changes in the future (good or bad), there is always the chance someone will pull the plug.
- The money thing: I am still amazed by RapidMiner and I (truly) wish you success with your courses and the enterprise edition (although I am not able to afford it (yet :P)). Knowledge is money (it is as it is), so maybe you are not going to be too happy about a well-written and reasonably complete wiki.
- I am somewhat disappointed by the community of RapidMiner. A lot of people jump in to ask questions, but only a few try to answer or to participate generally (dropping by for a little talk, for example). This may not be obvious at the moment, but it was in the old sourceforge forum. So the question arises: WHO is going to write articles for the wiki?

So, these are my problems.

I hope you understand

greetings

Steffen

PS: Where is Jean Charles ??? ;D

jdouet · June 2008

> PS: Where is Jean Charles
I am here and...oh gosh, I was supposed to write on the wiki !! ::) ;D
Well, to say it clearly, I began to write a few sketches, for instance "regexp". I had the idea to write two types of articles :
- "as is" operator description, that is to reproduce the operator tree and each "F1 box" text
- cross-theme articles, as "regexp", "attributes' roles", each of these articles giving a stub towards more specific operators...

You will see that I began with the following articles :
http://yale.sourceforge.net/wiki/index.php/Regular_expressions
http://yale.sourceforge.net/wiki/index.php/Mapping_the_data_in_RapidMiner_table
http://yale.sourceforge.net/wiki/index.php/Category:Data_Formats
http://yale.sourceforge.net/wiki/index.php/Computing_load_issues

What do you think ? any idea of "cross-theme" articles ? "Encoding" ? "XPath" ? "relational algebra" ? "Handling ExampleSource" ? "text mining basics" ?
At the moment, I am sure that all that is needed is to "cast the sketch and keywords" for each planned article...

Cheers,
Jean-Charles.

steffen · June 2008

Ok...

Regarding the community aspect: Two people for the wiki.

More to come ?

Please don't get me wrong. I am more the pessimistic type of guy, the advocatus diaboli. I would like to contribute, I just want to make sure (and motivate myself) that I do not spend my precious time in vain.

greetings

Steffen

PS: Nice to see you

IngoRM · June 2008

Hi,

just a few thoughts:

> The wiki is proprietary: If the situation of your company changes in the future (good or bad), there is always the chance someone will pull the plug.

Well, this is basically true for almost all wikis on the net. If Wikipedia decided to do so, there wouldn't be much you could do about it. And the chances are probably much better for a wiki for open-source software than for most proprietary products. To be honest: there is no reason for us to pull the plug, and we would be really interested in seeing a larger community ourselves. Independently of the large amount of valuable wisdom which could be written in a wiki, I am pretty sure that we still have a lot of knowledge about Data Mining in general and RapidMiner in particular in our brains. And that's what people are paying us for, not the information which is freely available on the net anyway. So you can be assured: the Wiki, the freely available written tutorial, RapidMiner itself: everything will stay free and open and will remain on the net (although it might move, like this forum for example).

> The money thing: I am still amazed by RapidMiner and I (truly) wish you success with your courses and the enterprise edition (although I am not able to afford it (yet :P)). Knowledge is money (it is as it is), so maybe you are not going to be too happy about a well-written and reasonably complete wiki.

We don't have a problem with this as long as it helps the community grow and helps other users. As I said: I am sure that we still know more than what could and probably would be written, and we can provide this additional knowledge together with our experience to our customers.

> I am somewhat disappointed by the community of RapidMiner. A lot of people jump in to ask questions, but only a few try to answer or to participate generally (dropping by for a little talk, for example). This may not be obvious at the moment, but it was in the old sourceforge forum. So the question arises: WHO is going to write articles for the wiki?

And here I fully agree: there are only a few people who stop by on a regular basis. But Data Mining is a highly specialized field and only a few people work in it. There are also (related) communities like statistics and communities built around other products, so I find it quite natural that the group of "core users" grows only slowly. On the other hand, it was really great to see how people like you, Steffen and Jean-Charles, but also others of course, started to form a small user community. And we highly appreciate that, not only because your answers are always of really high quality.

And there are other movements, too, for example a first user group meeting in New York. We will certainly organize a free 1-day workshop for users where we can meet and exchange our experiences. So in my understanding, there are a lot of users who just have a problem and want an answer, but there is also a small group of people who are starting to form a real community around the software. And this actually is one of the greatest things for us here at Rapid-I.

Ok, back to the Wiki. We at Rapid-I will certainly also provide some articles, at least for the most common questions we often have to answer here in the forum (and there are also a lot of people who mail us directly, although we cannot guarantee an answer for those support requests).

But some people simply have to start and others will follow as the information grows. I also do not think that there is a good data mining wiki around on the net, but we could start to build it and other people will follow then. So we would provide a free Wiki and a free tool to use the processes and hints given there. I strongly believe that all sides will benefit from that (I would not have founded a company around open-source software otherwise). Of course, among these sides there will always be people who do not contribute but only benefit. But that's ok with me.

Cheers,
Ingo

steffen · June 2008

Hello again

> We don't have a problem with this as long as it helps the community grow and helps other users. As I said: I am sure that we still know more than what could and probably would be written, and we can provide this additional knowledge together with our experience to our customers.

I didn't think about that. As a student, I am not yet able to estimate the power of experience in this field.

> And here I fully agree: there are only a few people who stop by on a regular basis. But Data Mining is a highly specialized field and only a few people work in it. There are also (related) communities like statistics and communities built around other products, so I find it quite natural that the group of "core users" grows only slowly.

Didn't think about that either :-[

> And we highly appreciate that, not only because your answers are always of really high quality.

thank you!

> But some people simply have to start and others will follow as the information grows. I also do not think that there is a good data mining wiki around on the net, but we could start to build it and other people will follow then.

I see...

Well, that was quite motivating. I won't promise anything, because I am afraid it would turn out to be a lie, but ....

good night

Steffen

IngoRM · July 2008

Just take your time

I just want to remind you how long it took to set up this forum...

All the best,
Ingo

jdouet · July 2008

Hi All,

About the wiki, especially about "operators" pages : I would like to suggest a kind of form, which would take the following structure :

[tt]==Name==

==Group==

==Class==

==Input==

==Output==

==Min Inner==

==Max Inner==

==Inner operator conditions==

==Description==[/tt]

It reproduces the "F1" box, but is there any "template" facility for reusing such a structure ?

Cheers,
Jean-Charles.

IngoRM · July 2008

Hi,

good idea. I don't know whether a template mechanism exists for Wikis, though. But maybe we can just post the operator template somewhere on the main page so that others can simply copy it from there.

Cheers,
Ingo
