ArrayIndexOutOfBoundsException when loading PDF files

behrangsa Member Posts: 7 Contributor II
edited May 23 in Help
Hi,

When I load PDF files in my process I get the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at com.rapidminer.operator.TermWeightClusterCharacterizer.apply(Unknown Source)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.Process.run(Process.java:612)
        at com.rapidminer.Process.run(Process.java:582)
        at com.rapidminer.Process.run(Process.java:572)
        at org.behrang.clustering.Main.createProcess(Main.java:77)
        at org.behrang.clustering.Main.main(Main.java:26)
Here's my process:

// Point RapidMiner at its installation directory before initializing.
System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
RapidMiner.init();

Process p = new Process();

// Text input chain with language and pruning settings.
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
// textInput.setParameter(PARAMETER_DEFAULT_CONTENT_TYPE, "pdf");

// One entry per file in fit4005; the absolute path is used for both elements.
List<Object[]> textList = new LinkedList<Object[]>();
for (File f : new File("fit4005").listFiles()) {
    textList.add(new Object[] {
        f.getAbsolutePath(),
        f.getAbsolutePath()
    });
}
// for (File f : new File("newsgroup/graphics").listFiles()) {
//     textList.add(new Object[] {
//         f.getAbsolutePath(),
//         f.getAbsolutePath()
//     });
// }
// for (File f : new File("newsgroup/hardware").listFiles()) {
//     textList.add(new Object[] {
//         f.getAbsolutePath(),
//         f.getAbsolutePath()
//     });
// }
// textList.add(new Object[] {"graphics","newsgroup/graphics"});
// textList.add(new Object[] {"hardware","newsgroup/hardware"});
textInput.setListParameter("texts", textList);

// Inner operators, added in this order: tokenizer, stopword filter,
// token length filter, stemmer.
textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));

Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
tlfOperator.setParameter("min_chars", "5");
textInput.addOperator(tlfOperator);
textInput.addOperator(OperatorService.createOperator("PorterStemmer"));

// Root chain: text input, then clustering, then cluster characterization.
p.getRootOperator().addOperator(textInput);
p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));

// Save the process setup to an XML file, then run it.
p.save(new File("Process.xml"));

IOContainer io = p.run();
SimpleExampleSet ses = (SimpleExampleSet) io.get(SimpleExampleSet.class);
System.out.println(ses.getExample(0));
System.exit(0);
The fit4005 directory contains the PDF files. If I load text files instead, everything works fine. Any idea why this is happening and how I can fix it?

Thanks in advance,
Behi
Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,702  RM Founder
    Hi,

    Sorry, but I do not have a direct solution. I would suggest that you set up the process in the GUI first and use breakpoints etc. to track down the problem. If everything works fine in the GUI, you can then simply use

    Process process = new Process(xmlFile);
    or

    Process process = new Process(xmlString);
    and

    process.run();
    in order to deploy the process. It is usually much easier to get things right in GUI mode before you integrate the complete process into your own application.

    Cheers,
    Ingo
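
One check that can be done outside RapidMiner before building the process is to confirm what actually ends up in the texts list. In the question's loop, `File.listFiles()` returns null when the directory does not exist, and otherwise returns every entry in it, including subdirectories and non-PDF files. The following is only a minimal sketch using the standard library: the class name `PdfInputCheck` and the helper `collectPdfTexts` are hypothetical, and nothing here is confirmed to be the cause of the exception.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PdfInputCheck {

    /**
     * Hypothetical helper: returns one {path, path} pair per regular *.pdf
     * file in dir, mirroring the pairs built in the question's loop.
     */
    static List<Object[]> collectPdfTexts(File dir) {
        File[] files = dir.listFiles(); // null if dir is missing or not a directory
        if (files == null) {
            throw new IllegalArgumentException(
                "Not a readable directory: " + dir.getAbsolutePath());
        }
        Arrays.sort(files); // listFiles() makes no ordering guarantee
        List<Object[]> texts = new ArrayList<Object[]>();
        for (File f : files) {
            // Skip subdirectories and anything without a .pdf extension.
            if (f.isFile() && f.getName().toLowerCase().endsWith(".pdf")) {
                texts.add(new Object[] { f.getAbsolutePath(), f.getAbsolutePath() });
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Defaults to the directory name used in the question.
        File dir = new File(args.length > 0 ? args[0] : "fit4005");
        List<Object[]> texts = collectPdfTexts(dir);
        System.out.println(texts.size() + " PDF file(s) found in " + dir.getAbsolutePath());
        for (Object[] t : texts) {
            System.out.println(t[1]);
        }
    }
}
```

If the printed list is empty or contains unexpected entries, the problem lies in the input rather than in the operators. If the list looks right, tracing the process in the GUI as suggested above is the next step.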