"Reading a csv file?"

andres222es Member Posts: 1 Newbie
edited May 2019 in Help
I am just starting with RapidMiner; it's my second day with the platform. I am managing a data file for a bus company here in Spain. I would like a starting point (a model) for obtaining several basic stats: sales, transactions, sales by platform, unique users per year (identified by device ID), recurrence factor per month (total transactions per month / unique device IDs), median net ticket price, etc. I have 4 years' worth of data.
My data comes from CSV files with ; as the separator and 50 columns (device ID, operating system, return or one-way ticket, etc.).

Once I have those stats, I would like to use RapidMiner's modelling or predictive analysis to estimate, for example, the number of tickets sold next year and the total sales, and whether our number of unique users will increase.
I've tried using the CSV process and haven't gotten anything back. How do I post a minimal chunk of my data here so that someone can give me a starting point? If anyone can suggest how to obtain a starting template for what I want, that would also be helpful.
Another problem is that my data is logged in a 6 GB CSV file. I've managed to split it into ten chunks, but this is a bit annoying. I think RapidMiner can't handle a file that large: when I try to open it, the program stops responding and I have to close it.
Any help would be greatly appreciated.

 
Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Indeed a file that large is probably going to cause problems, at least in RapidMiner Studio on a typical desktop.  How much RAM do you have available?  RapidMiner is going to try to put the whole file in memory at once.
    Did you look at the Read CSV operator tutorial process?  Simply reading your file in shouldn't be difficult as long as it isn't too big.
    Personally when I have a project like this (with a large raw data file) I think it is easier to start by taking just the first 100 rows or so (manually copy them from the original file) plus the header.  Then you can set up your entire data import and ETL process using that small file and make sure you are getting all the output you want.  Then you can run the whole thing on your larger files.
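If copying rows by hand from a 6 GB file is impractical (most editors will struggle to open it), the same small sample can be produced with a short script outside RapidMiner. The following is only a sketch in plain Python: the function name `write_sample` and the file names in the comment are hypothetical, and it assumes every record sits on a single line (no quoted fields containing line breaks).

```python
from itertools import islice


def write_sample(src_path, dst_path, n_rows=100, encoding="utf-8"):
    """Copy the header line plus the first n_rows data lines to dst_path.

    Reads the source lazily, so the large file is never loaded into memory.
    Returns the number of data rows actually copied.
    """
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        # The first line is the header, so take n_rows + 1 lines in total.
        lines = list(islice(src, n_rows + 1))
        dst.writelines(lines)
    return max(len(lines) - 1, 0)


# Hypothetical usage (file names are placeholders):
# write_sample("tickets_all.csv", "tickets_sample.csv", n_rows=100)
```

The resulting small file can then be loaded with Read CSV to build and check the import and ETL process before pointing it at the larger files.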
    Everything that you have described is easy to do in RapidMiner (outside of the memory constraints already noted).  Summarizing information for different buses and routes by date will be handled by the Aggregate operator.
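To make the aggregation concrete, here is a rough sketch, in plain Python rather than RapidMiner, of the monthly recurrence factor described in the question (total transactions per month divided by unique device IDs). It is not RapidMiner's API; it only illustrates the group-by-month, count, and count-distinct logic. The column names `date` and `device_id`, the ISO date format (`YYYY-MM-DD...`), and the function name are assumptions, since the actual file layout isn't shown. It reads one row at a time, so only the per-month counters and device ID sets are kept in memory.

```python
import csv
from collections import defaultdict


def monthly_recurrence(path, date_col="date", device_col="device_id",
                       encoding="utf-8"):
    """Stream a ;-separated CSV and compute per-month recurrence.

    recurrence = transactions in the month / unique device IDs in the month.
    Column names and the ISO date format are assumptions, not known facts
    about the original data.
    """
    transactions = defaultdict(int)   # month -> number of rows
    devices = defaultdict(set)        # month -> distinct device IDs seen

    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            month = row[date_col][:7]  # "2018-03-15" -> "2018-03"
            transactions[month] += 1
            devices[month].add(row[device_col])

    return {
        month: {
            "transactions": transactions[month],
            "unique_devices": len(devices[month]),
            "recurrence": transactions[month] / len(devices[month]),
        }
        for month in sorted(transactions)
    }
```

Because the function takes a single path, it could be run on each of the ten chunks separately, but the per-chunk results would then need merging on the device ID sets (not on the ratios) to stay correct for months that span two chunks.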


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts