"[SOLVED] Separating CapitalizedString into single words"
Hi all
I have been trying for quite a long time to solve the following problem but cannot find a way. Maybe someone has had a similar issue:
I have a set of examples which have attribute values like:
"CapitalizedStringIntoSingleWords"
but I want them in the form of "Capitalized String Into Single Words" (separated at each capital letter; I don't mind whether the resulting words are capitalized or not). I could use regular expressions, and I can easily filter out the capital letters, but then I only get something like:
" apitalized tring nto ingle ords"
... huh, that seems to be a more general problem, and I cannot get my thinking out of the box.. :-[
Any ideas, help? ???
Cheers,
Monika
Answers
You can use the operator "Replace" with a regular expression and capturing groups. In the simplest variant, the regular expression for the parameter "replace what" is '([A-Z])' and the one for "replace by" is ' $1' (both without the quotes; note the leading space). A following "Trim" operator is just for removing the leading space if there is one. If this is not desired, you could also define the regular expression so that only capitals not at the start of the line are replaced: then "replace what" is '((?<!^)[A-Z])' and "replace by" is again ' $1'.
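Outside RapidMiner, the same two variants can be tried in plain Java, since the "Replace" operator relies on java.util.regex. This is only a minimal sketch: the class and method names are made up for illustration, and String.replaceAll stands in for the operator itself.

```java
public class CapitalSplitSketch {
    // Variant 1: put a space before every capital letter, then trim the
    // leading space (this mirrors "Replace" followed by "Trim").
    static String splitThenTrim(String value) {
        return value.replaceAll("([A-Z])", " $1").trim();
    }

    // Variant 2: the negative lookbehind (?<!^) rejects a capital letter at
    // the very start of the string, so no leading space is introduced.
    static String splitNotAtStart(String value) {
        return value.replaceAll("((?<!^)[A-Z])", " $1");
    }

    public static void main(String[] args) {
        String value = "CapitalizedStringIntoSingleWords";
        System.out.println(splitThenTrim(value));   // Capitalized String Into Single Words
        System.out.println(splitNotAtStart(value)); // Capitalized String Into Single Words
    }
}
```

In both variants, $1 in the replacement refers to the text matched by the first pair of parentheses, which is the capital letter itself.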
Hope that helps,
Ingo
Exception: java.lang.IndexOutOfBoundsException
Message: No group 1
Stack trace:
java.util.regex.Matcher.group(Unknown Source)
java.util.regex.Matcher.appendReplacement(Unknown Source)
java.util.regex.Matcher.replaceAll(Unknown Source)
com.rapidminer.operator.preprocessing.filter.AttributeValueReplace.applyOnFiltered(AttributeValueReplace.java:114)
com.rapidminer.operator.preprocessing.filter.AbstractFilteredDataProcessing.apply(AbstractFilteredDataProcessing.java:136)
I tried it on the single attribute value "CompanyEarningsAnnouncement", so there should not be any problems...
A simplified example process throws the same exception. What could be the reason?
You have deleted the most important part of my solution: the brackets "(" and ")" indicate a so-called capturing group, which can be reused in "replace by" with $X, where X denotes the number of the group. Just use "([A-Z])" (this introduces a leading space, which can be removed by "Trim" as in my first process above) or "((?<!^)[A-Z])" (this does not introduce a leading space, as in my second process above) and you will be fine again.
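The exception can be reproduced in plain Java as well. The sketch below assumes the failing pattern was "[A-Z]" without the brackets (the process itself is not shown here), and the class and method names are hypothetical.

```java
public class MissingGroupSketch {
    // Returns the replaced string, or a short failure note if the
    // replacement refers to a group that the pattern does not define.
    static String tryReplace(String value, String replaceWhat, String replaceBy) {
        try {
            return value.replaceAll(replaceWhat, replaceBy);
        } catch (IndexOutOfBoundsException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String value = "CompanyEarningsAnnouncement";
        // No parentheses, so there is no group 1 for $1 to refer to.
        System.out.println(tryReplace(value, "[A-Z]", " $1"));
        // With the capturing group, $1 is the matched capital letter.
        System.out.println(tryReplace(value, "([A-Z])", " $1"));
    }
}
```

The first call fails in the same way as the stack trace above, while the second returns the value with a space in front of each capital letter.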
For more information about capturing groups, please check out Section 3.4 of the following tutorial:
http://www.vogella.de/articles/JavaRegularExpressions/article.html#regex_grouping
Hope that helps,
Ingo
Thank you