new features in the 4.4 release

emaema Member Posts: 33 Maven
edited November 2018 in Help
Hi,
I got an email saying that the new RapidMiner 4.4 will be released soon.

I can't wait...

What are the new features,
especially in clustering and classification?

Answers

  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    here's a snapshot from the changes.txt in the current developer repository:

    Changes from RapidMiner 4.3.2 to RapidMiner 4.4 [2009/??/??]
    ---------------------------------------------------------------

      * New operators:
     
          - ExampleSetSuperset
          - ExampleSetUnion
          - MacroConstruction
          - CumulateSeries
          - FastLargeMargin

     
      * Parameters will now be adapted according to an operator
        rename, for example the settings of operators like
        the ProcessLog or the parameter optimization operators
        are automatically corrected to the new operator names
       
      * Graphs like the similarity graph display the strengths
        of the edges now by their color
       
      * Added new tree layout algorithm for the decision trees
        preventing most overlapping, the old tighter version
        is available as layout type "Tree (Tight)"
       
      * Decision trees now show the subtree size as tool tip
        for the inner nodes, the edges are now darker for
        larger subtrees and brighter for smaller ones
     
      * Tables like the (meta) data view now support a new
        context menu for common table operations like column
        sorting or row / column selection
       
      * The New Operator dialog now also supports full text
        search in the description texts of the operators
       
      * RapidMiner now stores all parameter values in the
        process files including the default values which ensures
        a better compatibility with future versions. The XML tab,
        however, only shows the values differing from the default
     
      * Univariate and multivariate series windowing operators
        now also support nominal attributes and even mixed
        types in cases where the series is represented by
        the examples (rows) of the data set
     
      * The range statistics of nominal attributes in the
        meta data view now show the values with highest and
        lowest occurrence counts, sort the values according
        to the counts, and display only an excerpt of the
        occurring values if large numbers of different values
        exist
     
      * List of recent files is now directly saved after opening
        a new process and not only during shutdown
     
      * Changes in the process setup are now allowed even during
        process runtime, e.g. when waiting at a breakpoint
       
      * Updated to latest version of Weka (as of February 26th, 2009)
               
      * Bugfixes:
     
          - Fixed bug in the accuracy criterion for the revised
            decision tree learner
          - Fixed bug in parameter list of ValueSubgroupIterator
          - Fixed bug in ExceptionHandling which sometimes led to
            doubled outputs
          - Fixed bug in ProcessBranch which sometimes led to
            doubled outputs
          - ViewAttributes did not add min and max statistics
            so that those statistics were not calculated on
            data table views
         
               
    Changes from RapidMiner 4.3.1 to RapidMiner 4.3.2 [2009/02/17]
    ---------------------------------------------------------------

      * New operators:
     
          - LinearDiscriminantAnalysis
          - QuadraticDiscriminantAnalysis
          - RegularizedDiscriminantAnalysis
          - DasyLabExampleSource
          - FileIterator
          - ExceptionHandling
          - ChangeAttributeNamesReplace
          - ChangeAttributeNames2Generic
          - DateAdjust
          - MinMaxBinDiscretization
          - RainflowMatrix


      * Deprecated operators:
       
          - DirectoryIterator (use FileIterator instead)
       
      * Renamed parameters:
     
          - ExampleSetWriter:
            quote_whitespace is now named quote_nominal_values
         
         
      * ExampleSetMerge can now handle missing values
     
      * RapidMiner now better supports counts for the input
        and output types, which should considerably reduce the
        number of warnings if operators like IOConsumer,
        IOMultiplier or ExampleSetMerge (reducing several objects
        of the same type to one of the same) are used
       
      * FileIterator replaces DirectoryIterator and adds many
        new features like recursive iteration, file name based
        filtering, and a new macro for the parent path
     
      * Centroid based clusterings now support assigning unseen
        examples to the nearest cluster at apply time
     
      * ProcessBranch now supports a branching with respect
        to the existence of an input object
     
      * ClearProcessLog now also allows removing the complete
        logging table
       
      * The logging tables of the ProcessLog operator will now
        not be generated during start up but during the first
        operator usage (and also during the following if the
        table was deleted in the meantime, e.g. in a loop)
     
      * Added support for different time zones, users can now
        define the preferred time zone in the settings dialog
        and time conversion operators are now able to respect
        this setting
       
      * Date and times are now displayed in the system's local
        settings
     
      * New plotter: Block
     
      * Added support for applying a log scale for the color
        column for the Scatter plot and the new Block plotter
       
      * Data tables like those generated by the process log
        are now decoupled from the table used for plotting,
        which prevents rows from being sampled and removed
        from the data table
     
      * A double click on the region between two columns in
        the table header now automatically resizes the left
        column to a fitting size (known from Windows programs)
       
      * A double click on the same region while pressing CTRL
        will resize all table columns according to the contents
     
      * GuessValueTypes now only works on regular attributes
        and provides a parameter for extending it on the special
        attributes (work_on_special)
     
      * AttributeFilter now also provides a new parameter
        work_on_special
     
      * The operator Replace now also allows empty replace_by
        values
     
      * The ExampleSetJoin operator now also works if the
        id of the first example set is not part of the second
     
      * Guess value types can now handle missing values

      * CSVExampleSetWriter now supports the parameter quote_nominal
     
      * All feature selection and weighting operators now also
        provide the possibility to log the names of the features
        of the current generation's best individual
     
      * The Replace operator now supports capturing groups
       
      * The file based example source operators (ExampleSource,
        SimpleExampleSource, CSVExampleSource...) now better
        support quoted strings and also escaped quotes (escaping
        with \")
         
      * Implementation details:
     
          - The method Tools.quotedSplit(...) should now be used
            instead of a regular split followed by the method
            Tools.mergeQuotedSplits(...)
     
         
      * Bugfixes:
     
          - fixed bug in DBScan for empty cluster models
          - fixed bug for simple sampling in cases where a local
            random seed was used
          - fixed bug in process logging to files which prevented
            the writing of the first logged result
          - fixed bug in PSO optimization for cases where the fitness
            should be minimized instead of maximized
          - fixed bug in binary performance measure which was not
            delivering the fitness for specificity, sensitivity,
            and youden index
          - fixed bug in meta data table viewer in cases where huge
            numbers of long nominal values existed which caused a
            crash of the Java Virtual Machine in some cases
         
       
    Changes from RapidMiner 4.3 to RapidMiner 4.3.1 [2009/01/12]
    ---------------------------------------------------------------

      * New operators:
     
          - RemoveDuplicates
          - Cluster2Prediction
          - DirectoryIterator
          - TextObjectWriter
          - TextObjectLoader
          - TextExtractor
          - SingleTextObjectInput
          - TextCleaner
          - TextObject2ExampleSet
          - TextSegmenter
          - AddAttribute
          - SetData
          - EMClustering
          - AttributeWeights2ExampleSet
          - TransitionGraph
          - DatabaseExampleVisualizationOperator
         
       
      * Revised decision tree learning which led to drastically
        reduced runtimes and better tree models in terms of
        generalization capabilities
           
      * The bar chart now displays the category as label in the
        domain axis
       
      * Removed plotter: Bars 3D
     
      * The IOObjectReader now allows the definition of the expected
        output type
       
      * The LiftParetoChart no longer re-applies the input model if
        a predicted label already exists
       
      * Added the ability to "explode" tiles of pie and ring charts
     
      * Added several new options for the reporting operators of the
        RapidMiner Enterprise Edition as well as true parameter handling
        including type checks
       
      * Updated to latest release of Jung
     
      * Fixed GUI related memory leaks
     
     
      * Implementation details:
     
        - The class AttributeWeightsCreator was renamed to
          ExampleSet2AttributeWeights
       
         
      * Bugfixes:
         
        - Fixed a combination of GUI and process thread related
          memory leaks
        - Fixed bug in Series Multiple Plotter which prevented
          rescaling
        - Pie and Bar charts used class limit instead of legend
          limit in order to decide if the legend should be shown
        - special format in ExampleSetWriter ignored quote
          whitespace setting
        - bug in XVPrediction fixed
       

    Hope that satisfies your needs :P


    Greetings,
      Sebastian
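Since the question was about clustering: the 4.3.2 entry on centroid based clusterings says unseen examples can be assigned to the nearest cluster at apply time. A minimal Python sketch of that idea follows. It is not RapidMiner code; the function name `assign_to_nearest_centroid`, the use of Euclidean distance, and the sample data are assumptions made for illustration only.

```python
import math

def assign_to_nearest_centroid(example, centroids):
    """Return the index of the centroid closest to `example`.

    Assumption: Euclidean distance; the distance measure actually used
    depends on the clustering operator and its settings.
    """
    best_index, best_distance = None, math.inf
    for index, centroid in enumerate(centroids):
        distance = math.dist(example, centroid)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

# Hypothetical centroids from an already trained centroid based model.
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Examples that were not part of the training data.
unseen = [(1.0, 2.0), (9.0, 8.5), (4.0, 4.0)]

assignments = [assign_to_nearest_centroid(x, centroids) for x in unseen]
print(assignments)
```

Each unseen example simply receives the cluster whose centroid is closest, so no re-clustering is needed at apply time.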
  • Options
    emaema Member Posts: 33 Maven
    Thank you very much... I can't wait.
  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Ema,
    in that case you could check out the developer version from the developer branch in CVS. A guide for checking out with Eclipse is on our website.

    Greetings,
      Sebastian
  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    The new version 4.4 will be released this week. So there are only a few days left to wait ;D

    Cheers,
    Ingo
  • Options
    emaema Member Posts: 33 Maven
    Hi,
    I downloaded the new RapidMiner...

    I was wondering how to use Cluster2Prediction?

    Thank you
  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Ema,
    Cluster2Prediction enables you to use classification performance measures for clustering if label information is available. For example, think of a situation where you know which examples belong in the same cluster for a subset of your data. You can then use any flat clustering algorithm and test whether it discovers your cluster structure. To achieve this, the operator matches the given cluster labels with the class labels in the best fitting way and converts the cluster attribute into a prediction attribute. You can then use the standard performance operators for classification to calculate the performance.

    Greetings,
      Sebastian
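The matching step described above can be sketched in Python. This is only an illustration of the idea, not the operator's actual implementation: it assumes "best fitting" means the one-to-one mapping from clusters to classes that maximizes the number of agreeing examples, finds it by brute force, and uses a hypothetical helper name `map_clusters_to_classes` and made-up data.

```python
from itertools import permutations

def map_clusters_to_classes(cluster_ids, class_labels):
    """Find a one-to-one cluster -> class mapping with maximal agreement.

    Assumption: brute force over all assignments, which is only
    practical for a small number of clusters and classes.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch needs at least as many classes as clusters")
    best_mapping, best_hits = None, -1
    for chosen in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, chosen))
        hits = sum(mapping[c] == y for c, y in zip(cluster_ids, class_labels))
        if hits > best_hits:
            best_mapping, best_hits = mapping, hits
    return best_mapping

# Made-up cluster assignments and known class labels for six examples.
cluster_ids = [0, 0, 1, 1, 1, 2]
class_labels = ["b", "b", "a", "a", "c", "c"]

mapping = map_clusters_to_classes(cluster_ids, class_labels)

# Turn the cluster attribute into a prediction attribute.
predictions = [mapping[c] for c in cluster_ids]

# Any classification measure can now be computed, e.g. accuracy.
accuracy = sum(p == y for p, y in zip(predictions, class_labels)) / len(class_labels)
print(mapping, predictions, accuracy)
```

Once the clusters are renamed to class labels, the clustering result can be evaluated exactly like a classifier's predictions.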
  • Options
    emaema Member Posts: 33 Maven
    Hi,
    Thank you very much.

    It works great,

    but when I tried to use it with agglomerative clustering,
    it did not work.

    I tried to flatten the cluster model and then use example2cluster,

    but it still does not work...


    Thank you in advance
  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    there seems to be a problem during the flattening of the agglomerative clustering. I will forward this topic to Sebastian, who is our clustering expert.

    Cheers,
    Ingo