"Classifying with SVM through the java API"

Legacy User · September 2008

Hi,

I am trying to do a simple classification by integrating RapidMiner into Java. The code is approximately the same as a Process I have defined in the GUI, which works great. This is how I try to do it in code:
(I call train() once and then classify() for each text).
The problem is that all texts always get the same classification, as if no learning had occurred or some default were being returned. The GUI classifies these same texts properly (they belong to 5 different classes, a polynominal problem), and so do other classifiers (LingPipe and a homebrewed one).


public void train(List<Text> documents) {
		RapidMiner.init(false, false, false, true);

		wvtoolOperator = (OperatorChain) OperatorService
				.createOperator(TextInputOperator.class);

		wvtoolOperator.addOperator(OperatorService
				.createOperator("StringTokenizer"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("EnglishStopwordFilter"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("TokenLengthFilter"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("PorterStemmer"));

		List list = new ArrayList();
		for (Text text : documents) {
			String filename = ...
			String classname = ...
			list.add(new Object[] { filename, classname});
		}

		wvtoolOperator.setListParameter("texts", list);


		IOContainer container = wvtoolOperator.apply(new IOContainer());
		ExampleSet exampleSet = container.get(ExampleSet.class);
		Learner learner = (Learner)OperatorService.createOperator(LibSVMLearner.class);
		//Maybe set parameters here?
		model = learner.learn(exampleSet);
		// Create the model applier
		modelApplier = OperatorService.createOperator("ModelApplier");

		// Create a new SingleTextInput for processing test strings
		wvtoolOperator = (OperatorChain) OperatorService
				.createOperator(SingleTextInput.class);

		// Add additional processing steps.
		// Note the setup must be same as the one you used when creating the classification model
		wvtoolOperator.addOperator(OperatorService
				.createOperator("StringTokenizer"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("EnglishStopwordFilter"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("TokenLengthFilter"));
		wvtoolOperator.addOperator(OperatorService
				.createOperator("PorterStemmer"));

	}

	public String classify(String text) {
		try {

		// Set the text
		wvtoolOperator.setParameter("text", text);

		// Call the text input operator
		IOContainer container = wvtoolOperator.apply(new IOContainer());

		container = container.append(model);
		// Call the model applier (the model was appended to the container on the previous line)
		container = modelApplier.apply(container);

		// Obtain the example set from the io container. It contains only a single example with our text in it.
		ExampleSet eset = container.get(ExampleSet.class);
		Example e = eset.iterator().next();

		// The commented-out line below does the same thing as the two statements that follow it.
		//return e.getValueAsString(eset.getAttributes().getPredictedLabel());

		int predLabelIndex = (int) e.getPredictedLabel();
		return e.getAttributes().getPredictedLabel().getMapping().mapIndex(predLabelIndex);
		} catch (Exception ex) {
			//...
		}
	}

The result is the same whether or not I set parameters at the "//Maybe set parameters here?" comment.
When I do set them, I do it this way:


		((Operator)learner).setParameter(LibSVMLearner.PARAMETER_SVM_TYPE, new Integer(LibSVMLearner.SVM_TYPE_C_SVC).toString());
		((Operator)learner).setParameter(LibSVMLearner.PARAMETER_KERNEL_TYPE, "0");//linear
		((Operator)learner).setParameter(LibSVMLearner.PARAMETER_EPSILON, "0.001");
		//((Operator)learner).setParameter(LibSVMLearner.PARAMETER_C, "0.0");
		((Operator)learner).setParameter(LibSVMLearner.PARAMETER_P, "0.1");
		((Operator)learner).setParameter(LibSVMLearner.PARAMETER_CONFIDENCE_FOR_MULTICLASS, "true");

I am probably overlooking something simple, but I'm completely out of ideas. I have looked around a lot and tried many approaches.

Thanks a lot,
Nimrod.

Legacy User · September 2008

Hi,

Okay, I have understood that I have to save and load the wordlist via the parameters. However, I feel like there should be some kind of object I could pass around between the filters instead of having to write it to a file and load it. Is this supported?
Also, does that mean it will be loaded every time I apply() the SingleTextInput()?

Thanks,
Nimrod
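For readers wondering why the wordlist matters here: the model learned during training expects each attribute (column) to stand for a specific word, so a text vectorized against a different wordlist no longer lines up with the model. The following is a minimal, self-contained sketch of that idea in plain Java. It does not use the RapidMiner API, and the class and method names (WordListDemo, buildWordList, vectorize) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy bag-of-words vectorizer; illustrative only, not part of RapidMiner. */
class WordListDemo {

    /** Builds a word -> column index map from the training texts, in first-seen order. */
    static Map<String, Integer> buildWordList(List<String> trainingTexts) {
        Map<String, Integer> wordList = new LinkedHashMap<>();
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                wordList.putIfAbsent(token, wordList.size());
            }
        }
        return wordList;
    }

    /** Maps a text onto the columns of an existing word list; unknown words are dropped. */
    static double[] vectorize(String text, Map<String, Integer> wordList) {
        double[] vector = new double[wordList.size()];
        for (String token : tokenize(text)) {
            Integer column = wordList.get(token);
            if (column != null) {
                vector[column] += 1.0;
            }
        }
        return vector;
    }

    /** Lower-cases the text and splits it on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> training = Arrays.asList("cheap pills online", "meeting agenda attached");
        Map<String, Integer> trainedWordList = buildWordList(training);
        String newText = "agenda for the meeting";

        // Reusing the training word list keeps the columns aligned with the model.
        System.out.println(Arrays.toString(vectorize(newText, trainedWordList)));

        // A word list built from the new text alone has different columns,
        // so a model trained on the first layout cannot interpret it.
        Map<String, Integer> freshWordList = buildWordList(Arrays.asList(newText));
        System.out.println(Arrays.toString(vectorize(newText, freshWordList)));
    }
}
```

In this sketch the word list is an ordinary in-memory object handed from the training step to the classification step. Whether the text plugin offers an equivalent in-memory hand-over, rather than a file, is exactly the open question in the post above.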

Legacy User · September 2008

Hi,

And another question while I'm at it... Using the code shown above, it takes about 1.5 seconds to classify each text (around 200 words) after learning a model from a few hundred documents. In the GUI it is closer to your published performance of 25 ms per post: it takes 66 seconds to cross-validate the same 350 or so documents in 10 folds (I end up classifying 700+ documents, so it is actually even faster). I'm running the example from the text plugin samples, 04_Learning/01_TextClassificationXVal.xml.
The slow step is ModelApplier.apply()... What could it be? Is my Java development environment inherently over 1,000 times slower? Or is something done differently in the GUI environment for that sample?

Thank you,
Nimrod
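As a rough sanity check on the figures quoted in this post, the per-document cost can be worked out directly. The sketch below is plain Java arithmetic with a hypothetical helper name (millisPerDocument). It takes the post's numbers at face value and assumes the 66 seconds cover about 700 classifications, even though that total also includes training time.

```java
/** Back-of-the-envelope throughput comparison; the inputs are the figures quoted in the post. */
class ThroughputCheck {

    /** Average milliseconds per document for a batch that took totalSeconds. */
    static double millisPerDocument(double totalSeconds, int documents) {
        return totalSeconds * 1000.0 / documents;
    }

    public static void main(String[] args) {
        // GUI cross-validation: 66 seconds for roughly 700 classified documents.
        double guiMillis = millisPerDocument(66.0, 700);
        // Java API: about 1.5 seconds per classified text.
        double apiMillis = 1500.0;

        System.out.printf("GUI: %.1f ms/doc%n", guiMillis);
        System.out.printf("API: %.1f ms/doc%n", apiMillis);
        System.out.printf("Slowdown factor: %.1f%n", apiMillis / guiMillis);
    }
}
```

Under these assumptions the GUI run averages roughly 94 ms per document, so the Java API path is about 16 times slower on these numbers, or 60 times slower when measured against the published 25 ms figure.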

Legacy User · September 2008

Hi,

Okay I see now this is because of pruning, which seriously affects the performance of the SVM Model.

Thanks,
I hope this will be useful to someone for posterity.
But please answer my last question if you have the time (in the previous thread).

Nimrod.
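To illustrate why pruning can matter so much: pruning removes words that occur in too few or too many documents, which shrinks the number of attributes the SVM has to look at for every example. The sketch below shows the general technique in plain Java. It is not the text plugin's implementation, and the names (PruningDemo, pruneByDocumentFrequency, minDocs, maxDocs) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy document-frequency pruning; illustrative only, not RapidMiner's implementation. */
class PruningDemo {

    /** Keeps only words that occur in at least minDocs and at most maxDocs documents. */
    static Set<String> pruneByDocumentFrequency(List<String> documents, int minDocs, int maxDocs) {
        Map<String, Integer> documentFrequency = new TreeMap<>();
        for (String document : documents) {
            // Use a set so that each word is counted once per document.
            Set<String> distinct = new HashSet<>(Arrays.asList(document.toLowerCase().split("\\W+")));
            distinct.remove("");
            for (String word : distinct) {
                documentFrequency.merge(word, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : documentFrequency.entrySet()) {
            int count = entry.getValue();
            if (count >= minDocs && count <= maxDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("the cat sat", "the dog sat", "the bird flew");
        // Drop words seen in only one document (too rare) or in all three (too common).
        Set<String> vocabulary = pruneByDocumentFrequency(documents, 2, 2);
        System.out.println("Kept " + vocabulary.size() + " word(s): " + vocabulary);
    }
}
```

In this toy corpus six distinct words shrink to one. On a real corpus the reduction is what keeps the attribute count, and with it the cost of applying the model to each text, manageable.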
