Parse Numbers not parsing some attributes - attribute remains nominal
chazzmoney
Hello all,
I'm starting to get the hang of this - a fully integrated RM and R setup with database access and all. However, I am struggling to get Parse Numbers to work on all data returned by R.
Some attributes are converted correctly, but others remain nominal even though they contain numerical text; Parse Numbers doesn't seem to convert them. I'd like to determine what is causing this and how to fix it. Some examples of the data that is not converted:
1 93.8050166776113 0.0454157740586547 0.0443342630670029 1.47691587794877e-05
2 74.3344711293915 0.0468817728698001 0.0448437650275624 1.34038276456967e-05
3 74.3344711293915 0.0474954230324842 0.0453740966285467 1.22608784444541e-05
4 92.6475985034064 0.0491963308363763 0.0461385434701127 1.65735980572911e-05
2476 26.7948477790146 -0.00422690861908714 -0.0094301112003816 1.45483738479543e-05
2477 8.61930163773315 -0.00573894600796043 -0.00869187816189736 1.35195186313268e-05
2478 37.0749500040653 -0.00627755754951664 -0.00820901403942122 1.64033364144268e-05
2479 78.0625143961145 -0.00488596064822655 -0.00754440336118229 1.51513082738902e-05
The only guess I have is that Parse Numbers can't handle the number of digits in the string and somehow fails? If any of you have thoughts, they would be greatly appreciated.
Thanks!
-Charles
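One quick way to test the digit-count guess is to parse the same strings in R before they are handed to RapidMiner. This is a minimal sketch, assuming the values arrive as character strings exactly as shown in the rows above; the vector name `samples` is illustrative. It only shows that R's own `as.numeric()` reads these strings, and says nothing about how Parse Numbers works internally.

```r
# Sample values copied from the rows above, held as text.
samples <- c("93.8050166776113",
             "-0.00422690861908714",
             "1.47691587794877e-05")

# as.numeric() converts character strings to doubles, including
# scientific notation; strings it cannot read become NA with a warning.
parsed <- as.numeric(samples)

is.numeric(parsed)   # TRUE
anyNA(parsed)        # FALSE: every string was parsed
```

If the conversion succeeds in R, returning the numeric column directly may avoid the need for Parse Numbers on that attribute.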
Answers
For the scientific notation, Parse Numbers appeared unable to deal with it even after splitting the value, cutting the first part, and regenerating it via a concat. I ended up just leaving the two split variables, assuming that the learning algorithms will be able to figure it out themselves. Seems like a hack, but it will work.
However, I still can't get this one to parse, even after cutting it as low as 6 characters: What other kinds of unexpected things will prevent a nominal from being parsed by Parse Numbers besides a long string?
Thanks!
-Charles
Thanks.
1) When in R, if you execute as.character() on a column in an xts or zoo object, ALL columns are converted to characters.
2) RapidMiner will not accept data frame columns in character format. Make sure you convert them to factors using as.factor(). They will come into RapidMiner as nominal.
Thanks,
-Charles
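Both points can be reproduced in plain R. This is a minimal sketch with made-up names and values (`m`, `df`, `price`, `ret`, `label`); it uses a base matrix rather than a real zoo or xts object, on the assumption that those classes store their data as a matrix and therefore follow the same single-type rule.

```r
# Point 1: a matrix holds a single type, so writing text into one
# column coerces every column to character.
m <- cbind(price = c(93.8, 74.3), ret = c(0.045, 0.047))
m[, "price"] <- as.character(m[, "price"])
is.character(m)   # TRUE: "ret" is now text as well

# Point 2: in a data frame, turn character columns into factors
# before returning the result.
df <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
char_cols <- vapply(df, is.character, logical(1))
df[char_cols] <- lapply(df[char_cols], as.factor)
vapply(df, class, character(1))   # id is "integer", label is "factor"
```

Converting only the character columns leaves numeric columns untouched, so they should not need Parse Numbers afterwards.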
Thanks for this hint. I'm not an expert in R, and character vectors hadn't occurred to me until now. Could you send me a script that generates such a data frame? I could improve the import routine for data frames to cover this format, too.
By the way, if you use RM's R Extension frequently, would you be interested in sharing your experience in our Special Interest Group for R? We want feedback from our users to improve the extension further...
Greetings,
Sebastian
Thanks for the opportunity. I'm not really an expert in R - I've been using it for about two weeks so far, about half as long as I've been using RapidMiner. I'm grateful to you for all your hard work in making these amazing products. I'd be happy to share my experiences, but I'm definitely not an expert.
As for the script, there is probably a better / faster way to do this but: If you request x back from R as a data frame, RapidMiner will give you an error. If you remove the last line of the R script, it will succeed.
I think that in R the preferred form may be to keep things as factors, but there might be some use to leaving things as characters. Someone who knows R better than I may have an answer.
-Charles
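The script referred to above is not shown in this thread. As a hypothetical stand-in, and not the poster's actual code, the situation described could look like the sketch below: a data frame named `x` whose final line turns a column into character. The column names and values are invented for illustration.

```r
# Hypothetical stand-in for the missing script; names and values are
# made up for illustration.
x <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

# The "last line": it turns a numeric column into a character column,
# which is the format the post reports as being rejected.
x$value <- as.character(x$value)

class(x$value)   # "character"
```

Dropping the final line leaves `value` numeric, which matches the post's remark that the script succeeds without it.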
Thanks, I have noted it down and will try to include it in one of the next releases.
Greetings,
Sebastian