
# Vector attributes

Member Posts: 2 Contributor I
edited November 2018 in Help
Hey,

I have a question about attribute representation. I'm doing some image preprocessing (feature extraction), and some of the extracted attributes (features) are vectors. For example:

Attribute 1 (Density): [12, 23, 23, 54, 2, 43, 6]
Attribute 2 (OffsetN): [32, 45, 3]
Attribute 3 (OffsetS): [3, 5, 2, 1, 43, 1, 2]
Attribute 4: 12
.
.
.
Attribute N ...

How do I deal with this kind of attribute? For example, I don't want a tree learner to split on a single value but on a whole attribute (vector). I thought it had something to do with value_series, but either I cannot set it up properly or it's not what I need.

Thanks,
baze

RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
Hi,
A tree learner compares the values of one single attribute and splits the examples into two groups: greater, or smaller/equal. Unfortunately, this implies that it cannot cope with vector-valued data. Which vector is greater than another? You cannot say in more than one dimension.
Even if you provided it as a value series, it would not work. For most learners, you will have to find a transformation into a tabular format with single values. If you store the vector values as single attributes, for example Attribute1_1 .. Attribute1_7, you could use AttributeConstruction to calculate more complex measures for comparing the vectors, for example the distance to a hyperplane. This value could then be used by the decision tree later on, while the original data attributes could be filtered out so that they don't disturb the learning process.
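
To make the transformation concrete, here is a minimal sketch in plain Python (outside RapidMiner, so it does not show the AttributeConstruction operator itself). It flattens vector-valued attributes into single-valued columns and derives one scalar, the signed distance to a hyperplane. The function names, the `OffsetN_dist` column, and the hyperplane parameters are hypothetical and chosen only for illustration.

```python
import math


def flatten_example(example):
    """Expand vector-valued attributes into columns name_1, name_2, ..."""
    flat = {}
    for name, value in example.items():
        if isinstance(value, (list, tuple)):
            for i, component in enumerate(value, start=1):
                flat[f"{name}_{i}"] = component
        else:
            flat[name] = value
    return flat


def hyperplane_distance(vector, normal, offset):
    """Signed distance from `vector` to the hyperplane normal . x + offset = 0."""
    norm = math.sqrt(sum(w * w for w in normal))
    return (sum(w * x for w, x in zip(normal, vector)) + offset) / norm


# One example, using the values from the question
example = {
    "Density": [12, 23, 23, 54, 2, 43, 6],
    "OffsetN": [32, 45, 3],
    "Attribute4": 12,
}
flat = flatten_example(example)

# Hypothetical hyperplane for the 3-dimensional OffsetN vector (an assumption)
normal, offset = [1.0, 0.0, 0.0], -30.0
flat["OffsetN_dist"] = hyperplane_distance(example["OffsetN"], normal, offset)
```

A tree learner could then split on the single column `OffsetN_dist`, and the `OffsetN_1 .. OffsetN_3` columns could be dropped before learning.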

Greetings,
Sebastian

Member Posts: 2 Contributor I
Hey Sebastian,

For trees, I was thinking about calculating a mean vector for every feature and splitting based on the distance from that mean. I'm not sure whether that makes much sense, though.

Anyway, what would the aml representation be for this kind of attribute? Can vector-valued data be treated as a single unit when it comes to plots (variance, std. dev., etc.)?

br,
Piotr
Member Posts: 439 Maven
Hi,

I'm not absolutely sure what you are trying to achieve. If you are trying to compute the "mean vector" over all examples in your set, you can have one attribute per vector dimension, use the Aggregation operator to compute the means, and then add the means as new attributes to the old example set using a Cartesian operator. Then you can compute the distance from this mean using an AttributeConstruction. Whether or not this distance from the mean vector is a useful attribute depends very much on your domain.
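
As a rough illustration of the arithmetic behind these steps, here is a sketch in plain Python rather than with the RapidMiner operators. The first `OffsetN` row comes from the question; the other two rows are made up, and the choice of Euclidean distance is an assumption, since the thread does not name a distance measure.

```python
import math


def mean_vector(vectors):
    """Component-wise mean over all examples (one column per dimension)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# OffsetN for three examples; only the first row is from the question
offset_n = [
    [32, 45, 3],
    [30, 41, 5],
    [34, 49, 1],
]

mean = mean_vector(offset_n)
# One new scalar attribute per example: its distance from the mean vector
distances = [euclidean_distance(v, mean) for v in offset_n]
```

Each entry of `distances` is a single value, so it can be used as an ordinary numeric attribute by a tree learner.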

Cheers,
Simon