[SOLVED] RapidMiner faster than RapidAnalytics

veve Member Posts: 63 Contributor II
edited November 2018 in Help
Hello,

I built a project in RapidMiner. However, it runs faster on my computer (i3, 8 GB RAM) than on RapidAnalytics (EC2, 15.5 GB RAM, 4 cores).

Could you help me with this?


Thank you!

Answers

  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Alina,

    There may be a plausible explanation for your observation. I do not know enough about the virtual computing cores of an EC2 instance to provide a technical explanation, but depending on your process setup a local core may be faster than a virtual CPU core. If your process makes heavy use of an operator whose internal processing steps cannot be parallelized due to the chosen algorithm, your local core may well compute this step faster than one of the virtual cores.

    Cheers,
    Helge
  • veve Member Posts: 63 Contributor II
    Hello,

    I have an association rules workflow:

    - pivoting the user-item pairs
    - transforming the table into a binary one
    - creating frequent item sets with FP-Growth
    - applying the Create Association Rules operator

    Does that mean that these algorithms are not parallelized?

    To be clear, here are more details:
    - I'm using RapidMiner on my machine and launching the process from it on RapidAnalytics (Amazon). RapidAnalytics reads the data from a MySQL database.
    - I have 1,234,600 user-item pairs.
    - Simply reading from the database on RapidAnalytics took 20 minutes. Reading the CSV file that corresponds to this database on my local machine takes 1-2 seconds.
    - Running the process on a sample of 1,000 pairs took several seconds on my computer, while on the server it did not finish within one hour.

    I'm using the community edition of RapidMiner (5.3) and RapidAnalytics (community edition) version 1.3.015.

    Alina
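The four workflow steps listed above can be sketched in plain Python on toy data. This is only an illustration under stated assumptions, not RapidMiner's implementation: frequent item sets are found by brute-force enumeration instead of an FP-tree, and the function names, sample pairs, and thresholds are all made up.

```python
from itertools import combinations

# Hypothetical toy data: (user, item) pairs, like the rows read from the database.
pairs = [
    ("u1", "bread"), ("u1", "milk"),
    ("u2", "bread"), ("u2", "milk"), ("u2", "eggs"),
    ("u3", "milk"), ("u3", "eggs"),
    ("u4", "bread"), ("u4", "milk"),
]


def pivot_to_binary(pairs):
    """Steps 1-2: pivot pairs into one row per user with a True/False flag per item."""
    items = sorted({item for _, item in pairs})
    baskets = {}
    for user, item in pairs:
        baskets.setdefault(user, set()).add(item)
    table = {user: {i: (i in basket) for i in items} for user, basket in baskets.items()}
    return items, table


def frequent_itemsets(items, table, min_support):
    """Step 3 (brute force, not FP-Growth): keep item sets with support >= min_support."""
    n_rows = len(table)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            hits = sum(all(row[i] for i in candidate) for row in table.values())
            support = hits / n_rows
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


def association_rules(itemsets, min_confidence):
    """Step 4: derive rules premise -> conclusion from the frequent item sets."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # Every subset of a frequent item set is itself frequent, so the lookup is safe.
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, confidence))
    return rules


items, table = pivot_to_binary(pairs)
itemsets = frequent_itemsets(items, table, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
for premise, conclusion, confidence in rules:
    print(sorted(premise), "->", sorted(conclusion), round(confidence, 2))
```

The brute-force step enumerates every combination of items, so it only works for a handful of items; FP-Growth exists precisely to avoid that enumeration on data of the size described in the post.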
  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Alina,

    In this case we can easily reject the "single-thread" theory. There are some algorithms you simply cannot distribute, but this is not applicable here. The huge differences in runtimes make it much more likely that there is a connectivity issue somewhere. Where is your database located? Could you try to upload your example set to the server and check the runtime again?

    Cheers,
    Helge
  • veve Member Posts: 63 Contributor II
    Hello,

    The database is located on the server (on the EC2 instance).

    I created the database because neither of the other ways of getting the data onto the server worked:
    - the upload button said it had uploaded the data (.csv), but I couldn't find it on the server afterwards
    - putting the file or the RapidMiner data object into the RapidAnalytics repository gave a socket error (after half an hour). Only storing workflows works perfectly.

    Note: I tried each of these two methods several times and got the same result.


    Thank you!!

    Alina


  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Alina,

    Do I understand you correctly that a process which runs on your local machine in a few seconds does not finish on your server after one hour? If this is the case, there must be a significant issue in the server configuration. Could you check whether there is enough disk capacity available for the server and its database? Once logged in to the server, you can click on >>Administration -> System information<< and have a look at the server log and memory usage. Maybe you can post the log file or send more details about the errors you receive when trying to access the server repository.

    Cheers,
    Helge
  • veve Member Posts: 63 Contributor II
    Hello,

    Yes, you understand correctly.

    Server information:

    Time Sep 25, 2014 9:02:39 AM
    Up since Sep 25, 2014 8:51:52 AM
    Total memory 4.5 GB
    Maximum memory 12 GB
    Free memory 2.2 GB

    There are a lot of errors in the logs; I'm not sure if this is normal for RapidMiner.
    In the server logs there are a lot of X11 errors (the EC2 server has no graphical interface).

    The java version:

    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

    I will post only the error lines from the logs since I have 10 719 and  lines of logs (boot + server)

    I have started copying the files to the server through the RapidAnalytics repository; I will post the error message as soon as I have it!

    Thank you!
  • veve Member Posts: 63 Contributor II
    In the server logs I have this error a lot of times:


    2014-09-25 09:03:26,472 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost].[/RA].[Faces Servlet]] (http-8088-7) Servlet.service() for servlet Faces Servlet threw exception: java.lang.NoClassDefFoundError: Could not initialize class sun.awt.X11GraphicsEnvironment
           at java.lang.Class.forName0(Native Method) [:1.7.0_67]
           at java.lang.Class.forName(Class.java:190) [:1.7.0_67]
           at java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:102) [:1.7.0_67]
           at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:81) [:1.7.0_67]
           at java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1182) [:1.7.0_67]
           at org.ajax4jsf.resource.Java2Dresource.getImage(Java2Dresource.java:115) [:3.3.3.Final]
           at org.ajax4jsf.resource.Java2Dresource.send(Java2Dresource.java:89) [:3.3.3.Final]
           at org.ajax4jsf.resource.ResourceLifecycle.sendResource(ResourceLifecycle.java:221) [:3.3.3.Final]
           at org.ajax4jsf.resource.ResourceLifecycle.send(ResourceLifecycle.java:160) [:3.3.3.Final]
           at org.ajax4jsf.resource.InternetResourceService.load(InternetResourceService.java:335) [:3.3.3.Final]
           at org.ajax4jsf.cache.LRUMapCache.load(LRUMapCache.java:116) [:3.3.3.Final]
           at org.ajax4jsf.cache.LRUMapCache.get(LRUMapCache.java:87) [:3.3.3.Final]
           at org.ajax4jsf.resource.InternetResourceService.serviceResource(InternetResourceService.java:195) [:3.3.3.Final]
           at org.ajax4jsf.resource.InternetResourceService.serviceResource(InternetResourceService.java:141) [:3.3.3.Final]
           at org.ajax4jsf.webapp.BaseFilter.doFilter(BaseFilter.java:508) [:3.3.3.Final]
           at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.0.0.Final]
           at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.0.0.Final]
           at de.rapidanalytics.web.filter.RichFacesFirefox11Filter.doFilter(RichFacesFirefox11Filter.java:36) [:]
           at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.0.0.Final]
           at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.0.0.Final]
           at de.rapidanalytics.web.filter.AccessLogFilter.doFilter(Unknown Source) [:]
           at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.0.0.Final]
           at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.0.0.Final]
           at de.rapidanalytics.web.filter.IE8CompatibilityFilter.doFilter(Unknown Source) [:]
           at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.0.0.Final]
           at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.0.0.Final]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:312) [:3.1.0.M2]
           at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116) [:3.1.0.M2]
           at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at de.rapidanalytics.web.filter.auth.SpringEJBAuthorizationFilter.doFilter(Unknown Source) [:]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:95) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:100) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:79) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:35) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:187) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:80) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324) [:3.1.0.M2]
           at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:165) [:3.1.0.M2]
           at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237) [:3.0.5.RELEASE]
           at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167) [:3.0.5.RELEASE]
           at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.0.0.Final]
           at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.0.0.Final]
           at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275) [:6.0.0.Final]
           at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) [:6.0.0.Final]
           at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181) [:6.0.0.Final]
           at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285) [:1.1.0.Final]
           at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261) [:1.1.0.Final]
           at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88) [:6.0.0.Final]
           at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100) [:6.0.0.Final]
           at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) [:6.0.0.Final]
           at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [:6.0.0.Final]
           at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158) [:6.0.0.Final]
           at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [:6.0.0.Final]
           at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53) [:6.0.0.Final]
           at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362) [:6.0.0.Final]
           at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877) [:6.0.0.Final]
           at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654) [:6.0.0.Final]
           at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951) [:6.0.0.Final]
           at java.lang.Thread.run(Thread.java:745) [:1.7.0_67]

    2014-09-25 09:04:42,158 INFO  [com.rapidminer.example.db.ExampleSetToDB] (QuartzScheduler_Worker-1) Stored 4607/638092 rows. ETA in 687 s (Thu Sep 25 09:16:09 UTC 2014).
    2014-09-25 09:04:47,159 INFO  [com.rapidminer.example.db.ExampleSetToDB] (QuartzScheduler_Worker-1) Stored 8659/638092 rows. ETA in 727 s (Thu Sep 25 09:16:54 UTC 2014).

    The error I get when I try to copy into the repository: "Error copying repository entry to //.... Cannot store object at ... java.net.SocketTimeoutException: Read time out"

    Note: I do not get this timeout error when I copy a workflow!

    Thank you!
  • veve Member Posts: 63 Contributor II
    Some selected errors from the server and from the boot logs:

    Line 1655: 08:51:53,210 ERROR [STDERR]   Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
    Line 1656: 08:51:53,210 ERROR [STDERR]   NOT STARTED.
    Line 1657: 08:51:53,210 ERROR [STDERR]   Currently in standby mode.
    Line 1658: 08:51:53,210 ERROR [STDERR]   Number of jobs executed: 0
    Line 1659: 08:51:53,210 ERROR [STDERR]   Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
    Line 1660: 08:51:53,210 ERROR [STDERR]   Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreCMT' - which supports persistence. and is not clustered.
    Line 1661: 08:51:53,210 ERROR [STDERR]

    Line 1122: 08:51:44,156 ERROR [STDERR]        name : MySQL
    Line 1123: 08:51:44,156 ERROR [STDERR]     version : 5.5.38-0ubuntu0.14.04.1
    Line 1124: 08:51:44,156 ERROR [STDERR]       major : 5
    Line 1125: 08:51:44,156 ERROR [STDERR]       minor : 5
    Line 1126: 08:51:44,157 ERROR [STDERR] 23564 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - Driver ->
    Line 1127: 08:51:44,157 ERROR [STDERR]        name : MySQL-AB JDBC Driver
    Line 1128: 08:51:44,157 ERROR [STDERR]     version : mysql-connector-java-5.1.24 ( Revision: ${bzr.revision-id} )
    Line 1129: 08:51:44,157 ERROR [STDERR]       major : 5
    Line 1130: 08:51:44,157 ERROR [STDERR]       minor : 1
    Line 1131: 08:51:44,158 ERROR [STDERR] 23565 [Thread-2] INFO org.hibernate.dialect.Dialect - Using dialect: org.hibernate.dialect.MySQLDialect
    Line 1132: 08:51:44,164 ERROR [STDERR] 23571 [Thread-2] INFO org.hibernate.transaction.TransactionFactoryFactory - Transaction strategy: org.hibernate.ejb.transaction.JoinableCMTTransactionFactory
    Line 1133: 08:51:44,164 ERROR [STDERR] 23571 [Thread-2] INFO org.hibernate.transaction.TransactionManagerLookupFactory - instantiating TransactionManagerLookup: org.hibernate.transaction.JBossTransactionManagerLookup
    Line 1134: 08:51:44,164 ERROR [STDERR] 23571 [Thread-2] INFO org.hibernate.transaction.TransactionManagerLookupFactory - instantiated TransactionManagerLookup
    Line 1135: 08:51:44,165 ERROR [STDERR] 23572 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - Automatic flush during beforeCompletion(): disabled
    Line 1136: 08:51:44,165 ERROR [STDERR] 23572 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - Automatic session close at end of transaction: disabled
    Line 1137: 08:51:44,165 ERROR [STDERR] 23572 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - JDBC batch size: 15
    Line 1138: 08:51:44,166 ERROR [STDERR] 23573 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - JDBC batch updates for versioned data: disabled
    Line 365: 08:51:37,896 ERROR [STDERR] 17303 [Thread-2] INFO org.hibernate.cfg.AnnotationBinder - Binding entity from annotated class: org.jboss.ejb3.timerservice.mk2.persistence.TimerEntity
    Line 379: 08:51:38,302 ERROR [STDERR] 17709 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - Database ->
    Line 380: 08:51:38,302 ERROR [STDERR]        name : HSQL Database Engine
    Line 381: 08:51:38,302 ERROR [STDERR]     version : 1.8.0
    Line 382: 08:51:38,302 ERROR [STDERR]       major : 1
    Line 383: 08:51:38,303 ERROR [STDERR]       minor : 8
    Line 384: 08:51:38,303 ERROR [STDERR] 17710 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - Driver ->
    Line 385: 08:51:38,303 ERROR [STDERR]        name : HSQL Database Engine Driver
    Line 386: 08:51:38,303 ERROR [STDERR]     version : 1.8.0
    Line 387: 08:51:38,303 ERROR [STDERR]       major : 1
    Line 388: 08:51:38,303 ERROR [STDERR]       minor : 8
    Line 389: 08:51:38,337 ERROR [STDERR] 17744 [Thread-2] INFO org.hibernate.dialect.Dialect - Using dialect: org.hibernate.dialect.HSQLDialect
    Line 396: 08:51:38,365 ERROR [STDERR] 17772 [Thread-2] INFO org.hibernate.cfg.SettingsFactory - JDBC batch size: 15

    Thank you!
  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Alina,

    Thank you for the detailed information. Here are two configuration changes you might try:

    1) In RapidMiner you can set a higher timeout value for server connections (Tools -> Preferences -> System -> connection.timeout). Increasing this value might help when connecting to servers via the Internet.

    2) It looks like your MySQL database server is running really slowly. This is a known issue for version 5.5 using the InnoDB engine. Please check your MySQL configuration (usually /etc/my.cnf) and, if necessary, try setting the following options:

    innodb_support_xa=0
    innodb_flush_log_at_trx_commit=0
    sync_binlog=0

    Please back up your data before experimenting with these settings. Hope it helps.

    Cheers,
    Helge
  • veve Member Posts: 63 Contributor II
    Hello,

    I tried changing the timeout value in RapidMiner: it stopped (with the timeout error) after an hour and a half. (Note: I transferred the same file with FileZilla in 5 minutes.)

    With the MySQL configuration change the reading time has improved (it now takes 3 minutes). Thank you very much!

    However, the same process that works on my local computer (the association rules one I described in my second post) gives a StackOverflowError after three and a half minutes in the FP-Growth operator.

    Sep 25, 2014 2:44:20 PM de.rapidanalytics.execution.AbstractProcessExecutor runProcessNow
    SEVERE: Process failed: java.lang.StackOverflowError
    java.lang.StackOverflowError
    at java.util.HashMap.inflateTable(HashMap.java:317)
    at java.util.HashMap.put(HashMap.java:488)
    at com.rapidminer.operator.learner.associations.fpgrowth.FPTreeNode.addItemSet(FPTreeNode.java:84)
    at com.rapidminer.operator.learner.associations.fpgrowth.FPTreeNode.addItemSet(FPTreeNode.java:102)
    at com.rapidminer.operator.learner.associations.fpgrowth.FPTreeNode.addItemSet(FPTreeNode.java:102)

              subprocess 'Nested Chain'
          ==>   |     +- FP-Growth[1] (FP-Growth)
                |     +- Create Association Rules[0] (Create Association Rules)

    Watching memory and CPU, I see that only about 1.5 of the 4 CPUs are in use, and memory is at 50% plus about 20% cache.

    Thank you!


    Alina
  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Alina,

    The error message shows that the Java thread stack ran out of space. This could be caused by a misconfiguration or by an infinite recursion (a bug, so to speak) in the FP-Growth operator. Since I have not found anything about such a bug in our bug tracker, let's see if you can increase the Java stack size a little:

    1) Please edit bin/run.conf in your RA installation folder and add -Xss4m to the Java command line.
    2) I noticed you have the Java setting -XX:MaxPermSize=4096m active in your setup. You may try decreasing this value to something like 512m, since Java allocates this on top of the heap. In your setup this may cause a memory allocation of up to 19 gigs.

    Cheers,
    Helge
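The stack trace above shows FPTreeNode.addItemSet calling itself repeatedly, which suggests that the recursion depth grows with the number of items in a transaction. The sketch below illustrates that effect in Python rather than Java, and it is not RapidMiner's code: a recursive prefix-tree insert fails on a very long transaction (Python raises RecursionError where the JVM throws StackOverflowError), while an equivalent loop does not. The class and function names are hypothetical.

```python
import sys


class Node:
    """A minimal prefix-tree node (hypothetical, loosely modelled on an FP-tree node)."""

    def __init__(self):
        self.count = 0
        self.children = {}


def add_recursive(node, items, start=0):
    """Insert items[start:] below node, using one recursive call per item."""
    if start == len(items):
        return
    child = node.children.setdefault(items[start], Node())
    child.count += 1
    add_recursive(child, items, start + 1)


def add_iterative(node, items):
    """Same insert written as a loop: call-stack use stays constant for any length."""
    for item in items:
        node = node.children.setdefault(item, Node())
        node.count += 1


def depth(node):
    """Length of the longest path below node, computed without recursion."""
    longest, stack = 0, [(node, 0)]
    while stack:
        current, level = stack.pop()
        longest = max(longest, level)
        stack.extend((child, level + 1) for child in current.children.values())
    return longest


# One long transaction, deliberately longer than Python's recursion limit.
long_transaction = list(range(sys.getrecursionlimit() + 100))

try:
    add_recursive(Node(), long_transaction)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

root = Node()
add_iterative(root, long_transaction)
print("recursive insert succeeded:", recursive_ok)
print("iterative insert depth:", depth(root))
```

Raising the limit (sys.setrecursionlimit in Python, -Xss on the JVM) makes room for deeper recursion, but a sufficiently long transaction would still exhaust it.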
  • veve Member Posts: 63 Contributor II
    Hello,

    I made the changes in the run.conf script but it didn't work (I still get the error message in the FP-Growth operator):

    My Java configuration:


    if [ "x$JAVA_OPTS" = "x" ]; then
      JAVA_OPTS="-Xms1024m -Xmx14336m -XX:MaxPermSize=512m -Xss4m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
    fi

    I didn't know the Java -Xss option before. Is there any chance that I need a bigger -Xss value (than 4m)?
    Thank you!

    Alina
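As a quick sanity check on the settings posted above, the memory-related flags can be pulled out of the JAVA_OPTS string and converted to MiB. This is a small Python sketch with a hypothetical helper name, and it only handles the three flags discussed in this thread. Since -Xss applies to each thread, it is reported separately rather than added to the heap total.

```python
import re

# The JAVA_OPTS value from the post above.
JAVA_OPTS = (
    "-Xms1024m -Xmx14336m -XX:MaxPermSize=512m -Xss4m "
    "-Dorg.jboss.resolver.warning=true "
    "-Dsun.rmi.dgc.client.gcInterval=3600000 "
    "-Dsun.rmi.dgc.server.gcInterval=3600000"
)

# Conversion factors from the JVM size suffixes to MiB.
UNIT_TO_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def memory_flags(java_opts):
    """Return max heap, max PermGen and thread stack size in MiB (hypothetical helper)."""
    patterns = {
        "max_heap": r"-Xmx(\d+)([kmg])",
        "max_permgen": r"-XX:MaxPermSize=(\d+)([kmg])",
        "thread_stack": r"-Xss(\d+)([kmg])",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, java_opts, flags=re.IGNORECASE)
        if match:
            sizes[name] = int(match.group(1)) * UNIT_TO_MIB[match.group(2).lower()]
    return sizes


sizes = memory_flags(JAVA_OPTS)
heap_plus_permgen = sizes["max_heap"] + sizes["max_permgen"]
print("max heap + max PermGen: %.1f GiB" % (heap_plus_permgen / 1024))
print("stack per thread: %d MiB" % sizes["thread_stack"])
```

With the posted values, maximum heap plus PermGen comes to 14848 MiB (14.5 GiB) on an instance reported as having 15.5 GB, which leaves little room for MySQL on the same machine. Whether that played a role here is not established in the thread.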
  • veve Member Posts: 63 Contributor II
    With -Xss128m it runs as fast as on my Windows machine with RapidMiner :)
  • veve Member Posts: 63 Contributor II
    However, this only works up to 10,000 user-item pairs (3 minutes). At 30,000 pairs I got a 500 error after 18 minutes.
    (The blocking point is in replacing missing values with zeros!)
    I will try to increase -Xss even more...
    Do you have other ideas?

    Thank you,

    Alina
  • veve Member Posts: 63 Contributor II
    I will mark the problem as solved since the speed of RapidAnalytics increased significantly with the Java and MySQL options noted above.