How to extract YEAR from a string?
Hi,
I have this attribute in a dataset that is a string of text:
- Name: "Angove's 2006 Red Belly Black Shiraz (South Australia)"
How do I create a new attribute, say Vintage, that contains only the year (2006 in this case)?
- Vintage: 2006
I'm using the Generate Attributes operator, but can't quite work out the correct syntax.
Thanks.
Best Answers
kayman (Member, Posts: 662, Unicorn)
Use regex. If there are no other numbers in your string it is pretty easy; you can use something like:
replaceAll([myField],"\\D","")
Read as 'remove everything that's not a digit', so what is left will be your year.
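If you want to try the pattern outside RapidMiner first, here is a rough Python equivalent. It assumes RapidMiner's `replaceAll` follows Java regex rules, which for a simple class like `\D` behave the same as Python's `re` module. The helper name `digits_only` and the second sample name are hypothetical. RapidMiner's `"\\D"` (backslash escaped inside the string) corresponds to the raw Python string `r"\D"`.

```python
import re

def digits_only(text: str) -> str:
    # Remove every character that is not a digit (\D = "non-digit"),
    # mirroring replaceAll([myField],"\\D","") in Generate Attributes.
    return re.sub(r"\D", "", text)

print(digits_only("Angove's 2006 Red Belly Black Shiraz (South Australia)"))  # 2006
print(digits_only("Shiraz 2006 750ml"))  # 2006750
```

The second call shows the catch: any other digits in the name (here a hypothetical bottle size) get glued onto the year.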
If there are other numbers, you can match a range instead. Assuming your years fall between 2000 and 2019 (the pattern below actually accepts anything from 2000 to 2099), you could use something like:
replaceAll([myField],"^.*?(20[0-9]{2}).*$","$1")
which reads as 'start at the beginning of the string, find the first occurrence of 20 followed by 2 more digits, keep it (that's the $1), and remove everything else'.
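The same idea as a Python sketch, assuming RapidMiner's `replaceAll` uses Java-style regex semantics. The only syntax difference is the group reference: RapidMiner/Java writes `$1`, Python writes `\1`. The helper name `extract_20xx` and the extra sample names are hypothetical.

```python
import re

# Same pattern as the RapidMiner expression above.
YEAR_20XX = re.compile(r"^.*?(20[0-9]{2}).*$")

def extract_20xx(text: str) -> str:
    # The lazy .*? lets the group capture the FIRST "20" + two digits.
    # If nothing matches, sub() returns the text unchanged,
    # just as Java's String.replaceAll does.
    return YEAR_20XX.sub(r"\1", text)

print(extract_20xx("Angove's 2006 Red Belly Black Shiraz (South Australia)"))  # 2006
print(extract_20xx("Bin 8 Cabernet 2012 750ml"))  # 2012
print(extract_20xx("Non-vintage Brut"))  # Non-vintage Brut
```

Note that a name without a matching year comes back unchanged rather than empty, so it is worth checking the new attribute for values that aren't years.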
If your data can also contain older years, you could try the following:
replaceAll([myField],"^.*?([12][0-9]{3}).*$","$1")
This time the pattern looks for a number starting with either a 1 or a 2, followed by 3 more digits.
It's not foolproof, depending on your data, but it might do what you need.
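A quick Python check of the wider pattern, assuming RapidMiner's `replaceAll` behaves like Java's regex (with `\1` in Python standing in for `$1`). It also shows why the pattern isn't foolproof: any earlier four-digit number starting with 1 or 2 wins. The function name and the sample names are hypothetical.

```python
import re

# Four-digit numbers starting with 1 or 2 (1000-2999); the first match wins.
ANY_YEAR = re.compile(r"^.*?([12][0-9]{3}).*$")

def extract_year(text: str) -> str:
    # Returns the first candidate year, or the original text if none is found.
    return ANY_YEAR.sub(r"\1", text)

print(extract_year("Example Estate 1998 Shiraz"))  # 1998
print(extract_year("Bin 1234 Shiraz 1998"))  # 1234 (a bin number, not the vintage)
```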
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn)
You may also want to consider parsing your text field even further to separate out other information, such as the geography, the vineyard, etc. You can use the Split or Tokenize operators for that purpose.
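As a rough illustration of that kind of parsing, here is a Python sketch. It uses a single regex with named groups rather than RapidMiner's Split or Tokenize operators. It assumes every name follows the layout `<producer> <vintage> <wine> (<region>)`, which holds for the example in the question but may not for the rest of the data. The group names and the function name are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed (hypothetical) layout: "<producer> <vintage> <wine> (<region>)".
WINE_NAME = re.compile(
    r"^(?P<producer>.*?)\s+"          # shortest text before the year
    r"(?P<vintage>[12][0-9]{3})\s+"   # four-digit year starting with 1 or 2
    r"(?P<wine>.*?)\s*"               # wine name up to the opening parenthesis
    r"\((?P<region>[^)]*)\)\s*$"      # region inside the trailing parentheses
)

def parse_wine_name(text: str) -> Optional[Dict[str, str]]:
    # Returns the named parts, or None when the name doesn't fit the layout.
    m = WINE_NAME.match(text)
    return m.groupdict() if m else None

print(parse_wine_name("Angove's 2006 Red Belly Black Shiraz (South Australia)"))
```

For the question's example this yields producer "Angove's", vintage "2006", wine "Red Belly Black Shiraz" and region "South Australia", each of which could become its own attribute. Names that don't fit the assumed layout return None, so they'd need a separate rule.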