I have a bunch of items and I want to group them by brand. The item description in the data I receive seems to concatenate the brand name and the item name, and the brand names vary in length. For example, in the picture I would like all OREO items grouped together, but instead they are separated into different groups. Thank you!
Use the following:
This regular expression means "capture anything that isn't whitespace (\S+) that comes before one or more whitespace characters (\s+), which in turn come before any characters (.*)". That is why you use $1: it refers to the first (and only) capture group, which is the string before the whitespace.
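As a rough illustration of that explanation outside RapidMiner, here is a minimal Python sketch. The pattern is rebuilt from the description above and is an assumption, not necessarily the exact expression that was posted; the sample descriptions and the `first_token` helper are hypothetical.

```python
import re

# Pattern rebuilt from the explanation above (an assumption, not the posted
# expression): group 1 is the first run of non-whitespace characters.
FIRST_TOKEN = re.compile(r"^(\S+)\s+.*$")


def first_token(description: str) -> str:
    """Return the text before the first whitespace, or the input unchanged."""
    # r"\1" is Python's spelling of the $1 back-reference.
    return FIRST_TOKEN.sub(r"\1", description.strip())


# Hypothetical sample descriptions, used only for illustration.
samples = ["OREO Mini chocolate 95G", "OREO Chock 137G", "Kraft Cheddar 250G"]
brands = [first_token(s) for s in samples]
print(brands)
```

Note that under this pattern a brand made of several words is cut down to its first word.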
All the best,
In your example it seems the brand is separated from the other content by a tab (or multiple spaces). Can you confirm that?
If that's the case, it should be fairly straightforward. Your regex needs to be adjusted as follows in the case of tabs:
Or, even easier: install the Operator Toolbox extension and use the 'Create ExampleSet' operator to copy your data and convert it to a dataset. The attached example gives an idea of how to do this.
<?xml version="1.0" encoding="UTF-8"?>
<process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
        <parameter key="generator_type" value="comma_separated_text"/>
        <list key="function_descriptions"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="input_csv_text" value="Brand&#9;Type&#9;Weight&#10;Kraft&#9;Oreo Chock&#9;137G&#10;OREO&#9;Mini chocolate&#9;95G"/>
        <parameter key="column_separator" value="\t"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
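For readers who want to try the tab-or-multiple-spaces idea outside RapidMiner, the following Python sketch splits each description at the first tab or run of two or more whitespace characters. The delimiter rule is an assumption about the data, the `split_brand` helper is hypothetical, and the third row is invented for illustration (the first two mirror the sample data in the process above).

```python
import re
from collections import defaultdict

# Assumed delimiter: a single tab, or a run of two or more whitespace chars.
# A single space is NOT a delimiter, so multi-word brands stay intact.
DELIMITER = re.compile(r"\t|\s{2,}")


def split_brand(description: str) -> tuple[str, str]:
    """Split a description into (brand, remainder) at the first delimiter."""
    parts = DELIMITER.split(description.strip(), maxsplit=1)
    brand = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return brand, rest


rows = [
    "Kraft\tOreo Chock\t137G",
    "OREO\tMini chocolate\t95G",
    "OREO\tChock\t137G",  # hypothetical extra row
]

# Group the remainders under their brand.
groups = defaultdict(list)
for row in rows:
    brand, rest = split_brand(row)
    groups[brand].append(rest)

for brand, items in groups.items():
    print(brand, len(items))
```

This only helps when the delimiter really differs from the single spaces used inside brand and item names.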
That is not the case. Some brand names have the form shown in the attached picture, and this causes a problem when we create the column. Is there any way to extract them correctly? Thank you!
Do you have a dictionary containing all possible brand names? If not, I believe your best choice is to combine the ideas from the previous responses (i.e. build some regex logic) to create such a dictionary, on which you can then run your grouping. This requires some manual labour and, depending on the number of different brand names, can take a lot of time, but given the structure of your input data there is no way to aggregate directly on brand name.
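Once such a dictionary exists, the grouping step could look roughly like the Python sketch below. The brand list, the sample descriptions, the `UNKNOWN` bucket and both helper functions are hypothetical; the matching rule (case-insensitive, longest brand name first, brand must be followed by whitespace or end of text) is one possible choice rather than a prescribed one.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical brand dictionary; in practice it would be curated by hand.
KNOWN_BRANDS = ["OREO", "Kraft", "Kraft Foods"]


def match_brand(description: str, brands=KNOWN_BRANDS) -> Optional[str]:
    """Return the longest known brand that prefixes the description."""
    text = description.strip().lower()
    # Try longer names first so "Kraft Foods" wins over "Kraft".
    for brand in sorted(brands, key=len, reverse=True):
        key = brand.lower()
        if text.startswith(key):
            # Require a word boundary so "Oreos" does not match "OREO".
            if len(text) == len(key) or text[len(key)].isspace():
                return brand
    return None


def group_by_brand(descriptions):
    """Group descriptions by matched brand; unmatched ones go to UNKNOWN."""
    groups = defaultdict(list)
    for description in descriptions:
        groups[match_brand(description) or "UNKNOWN"].append(description)
    return dict(groups)


# Hypothetical sample descriptions.
items = [
    "OREO Mini chocolate 95G",
    "oreo Chock 137G",
    "Kraft Foods Cheddar 250G",
    "Acme Biscuit 100G",
]
result = group_by_brand(items)
print({brand: len(rows) for brand, rows in result.items()})
```

Anything that lands in the `UNKNOWN` bucket is a candidate for a new dictionary entry, which keeps the manual work focused on the descriptions that did not match.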