Discretize all the attributes (together)

fjcuberos Member Posts: 18 Maven
edited June 2019 in Help
I'm working on the treatment of multivariate time series using AttributeSubsetPreprocessing.
The idea is to process the ExampleSet several times (once per dimension), selecting only the attributes of one dimension each time.
Suppose the attribute list for a 2D trajectory is:
x1
x2
...
x200
y1
y2
...
y200

I need to discretize the values of x1..x200, but taking the values of all of these attributes into account together, so that the discretization model holds 200 identical range maps.
This could be accomplished with a new parameter in the discretization operators.
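
To make the idea concrete outside of RapidMiner, here is a minimal standalone sketch of equal-width binning with limits shared by all series. The class and method names (SharedBins, sharedLimits, binOf) are hypothetical and chosen only for illustration, and the rule that a value belongs to the first bin whose upper limit is greater than or equal to it is an assumption, not something taken from the RapidMiner sources.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of RapidMiner.
public class SharedBins {

    /** Upper limits of equal-width bins computed over ALL series at once. */
    public static double[] sharedLimits(double[][] series, int numberOfBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        // One global minimum and maximum across every series
        for (double[] column : series) {
            for (double v : column) {
                if (Double.isNaN(v)) {
                    continue; // treat NaN as a missing value and skip it
                }
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        double[] limits = new double[numberOfBins];
        for (int b = 0; b < numberOfBins - 1; b++) {
            limits[b] = min + ((double) (b + 1) / numberOfBins) * (max - min);
        }
        // The last bin is open-ended
        limits[numberOfBins - 1] = Double.POSITIVE_INFINITY;
        return limits;
    }

    /** Assumed rule: first bin whose upper limit is >= value. */
    public static int binOf(double value, double[] limits) {
        for (int b = 0; b < limits.length; b++) {
            if (value <= limits[b]) {
                return b;
            }
        }
        return limits.length - 1;
    }

    public static void main(String[] args) {
        // Two short series standing in for x1 and x2
        double[][] x = { {0.0, 1.0, 3.0}, {5.0, 8.0, 2.0} };
        double[] limits = sharedLimits(x, 4);
        System.out.println(Arrays.toString(limits)); // [2.0, 4.0, 6.0, Infinity]
        System.out.println(binOf(5.0, limits));      // 2
    }
}
```

Because the limits come from the global minimum and maximum, the same value falls into the same bin whichever series it appears in, which is the point of discretizing the attributes together.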

I include a sample based on BinDiscretization that I developed for my own use.

DiscretizationModelSeries is an empty subclass of DiscretizationModel, needed because the DiscretizationModel constructor is not publicly accessible.

public Model createPreprocessingModel(ExampleSet exampleSet) throws OperatorException {
    if (getParameterAsBoolean(PARAMETER_ALL_ATTRIBUTES)) {
        DiscretizationModelSeries model = new DiscretizationModelSeries(exampleSet);

        exampleSet.recalculateAllAttributeStatistics();
        int numberOfBins = getParameterAsInt(PARAMETER_NUMBER_OF_BINS);
        HashMap<Attribute, double[]> ranges = new HashMap<Attribute, double[]>();

        // Find the global minimum and maximum over all numerical attributes
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (Attribute attribute : exampleSet.getAttributes()) {
            if (attribute.isNumerical()) { // skip nominal and date attributes
                double mi = exampleSet.getStatistics(attribute, Statistics.MINIMUM);
                double ma = exampleSet.getStatistics(attribute, Statistics.MAXIMUM);
                if (mi < min) min = mi;
                if (ma > max) max = ma;
            }
        }
        // Compute the upper limit of each bin; the last bin is open-ended
        double[] binRange = new double[numberOfBins];
        for (int b = 0; b < numberOfBins - 1; b++) {
            binRange[b] = min + (((double) (b + 1) / (double) numberOfBins) * (max - min));
        }
        binRange[numberOfBins - 1] = Double.POSITIVE_INFINITY;
        // Assign the same limits to every attribute
        for (Attribute attribute : exampleSet.getAttributes()) {
            ranges.put(attribute, binRange);
        }

        model.setRanges(ranges, "range", getParameterAsBoolean(PARAMETER_USE_LONG_RANGE_NAMES));
        return model;
    } else {
        return super.createPreprocessingModel(exampleSet);
    }
}


public List<ParameterType> getParameterTypes() {
    List<ParameterType> types = super.getParameterTypes();

    ParameterType type = new ParameterTypeBoolean(PARAMETER_ALL_ATTRIBUTES, "Indicates if ALL the attributes are discretized together.", false);
    type.setExpert(false);
    types.add(type);
    return types;
}

Thanks, and congratulations on making RM better with every release.

F.J. Cuberos

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Thanks for sending this in. I will add it to our to-do list, but we will first ship the next release, which will probably come at the end of this week or the beginning of the next.

    Cheers,
    Ingo