"Automated short term gas production forecasting using machine learning/big data/data mining"

maurits_freriks Member Posts: 28 Contributor I
edited June 2019 in Help



First, let me introduce myself briefly. I'm Maurits Freriks, a Business Analytics student at VU Amsterdam. I'm currently doing a three-month internship in which I have to investigate whether short-term gas production forecasting can be automated. In other words: a prediction based on historical data. I have a little experience with RapidMiner, but not much. So first of all, I'm wondering whether this problem can be solved with RapidMiner?


What I've done so far:

- I've received a dataset with historical values from the last 3 years. The data comes from measuring points, for example: the gas flow as a time series, temperature (degrees), pressure, etc.

- I've divided this dataset into a smaller dataset containing only 1 month of data.

- I've built a process with the small dataset and the Polynomial Regression operator. I got a solution with some coefficients, but when I tested it on the total dataset, the deviation was too high, so the formula was useless.


Now my question, before spending more and more time in RapidMiner, is whether there are recommendations on which operators I should use. For example, do I have to make a test set and a training set? If yes, is it right to divide the total dataset into an 80% training set and a 20% test set?
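
For reference, an 80/20 split on time-ordered data is usually made chronologically, without shuffling, so that the test set lies entirely after the training period. Below is a minimal Python sketch of that idea, outside RapidMiner. The field names `timestamp` and `flow` and the synthetic hourly readings are assumptions for illustration only; they are not taken from the dataset in this thread.

```python
from datetime import datetime, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    # Sort by time so the test set is strictly later than the training set.
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Synthetic hourly readings (hypothetical values and field names).
start = datetime(2019, 1, 1)
rows = [
    {"timestamp": start + timedelta(hours=i), "flow": 100.0 + i % 24}
    for i in range(100)
]

train, test = chronological_split(rows)
print(len(train), len(test))
```

A random 80/20 split would mix future and past readings, which tends to make the test error look better than it would be in real forecasting.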


I appreciate your attention, effort and time. Hopefully someone could help me out! 

And by the way: sorry for my English!!


With kind regards,


Maurits Freriks 


  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @maurits_freriks


    From the description of your task, it seems that you could use the RapidMiner time series extension to predict production volumes. It is hard to give practical advice without seeing the actual data, but this type of prediction is quite common in some domains, and if you search this forum for 'time series prediction' you'll find dozens of practical solutions on different data. This could be a pretty good starting point for your problem as well.


    PS: I personally have only played around a bit with the time series extension, but I know that many people here on the forum are very skilled in this topic. As I mentioned, it would be beneficial if you could also share the data itself.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @maurits_freriks - welcome to the community, and very glad that you're using RapidMiner to solve your problem.  :)  I had a client a while ago who was in the oil & gas industry, and I think you are on the right path.  To help choose a model, I would recommend using the mod.rapidminer.com page.  As for splitting the data and other "best practices", please go through all the tutorial processes.  They were written by data scientists and are very well done.

    Good luck!


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,524 RM Data Scientist

    Dear Maurits,


    Great to have you here! Have a look at my recent blog post on validation: https://towardsdatascience.com/when-cross-validation-fails-9bd5a57f07b5 . It has a different focus, but the use case was similar.




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • maurits_freriks Member Posts: 28 Contributor I


    Hi @kypexin,


    Thanks for your quick reply. I've attached a screenshot of my dataset. Both flows are exactly the same; the only difference is the measurement. Using the historical flow from the day before and the actual pressure, CO2 and temperature, I would like to make a prediction. Is this still possible with the time series extension for RapidMiner?


    I've searched a bit for the term "time series", but I didn't find any answers that helped me understand the method.
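
    To make the setup described above concrete: using the previous day's flow as a predictor means adding a lagged copy of the flow to each day's row, next to that day's pressure, CO2 and temperature. The Python sketch below illustrates this outside RapidMiner. It assumes one row per consecutive day, already sorted by date; the field names and values are hypothetical and not taken from the attached screenshot.

```python
def add_previous_day_flow(daily_rows):
    """Return copies of the rows extended with the previous day's flow.

    Assumes one row per consecutive day, sorted by date. The first day is
    dropped because it has no previous day to take the flow from.
    """
    extended = []
    for previous, current in zip(daily_rows, daily_rows[1:]):
        row = dict(current)  # copy so the input rows are not modified
        row["flow_prev_day"] = previous["flow"]
        extended.append(row)
    return extended


# Hypothetical daily measurements.
daily_rows = [
    {"day": 1, "flow": 120.0, "pressure": 60.1, "co2": 1.2, "temperature": 8.0},
    {"day": 2, "flow": 125.0, "pressure": 60.4, "co2": 1.1, "temperature": 7.5},
    {"day": 3, "flow": 118.0, "pressure": 59.8, "co2": 1.3, "temperature": 9.0},
]

features = add_previous_day_flow(daily_rows)
for row in features:
    # Predictors: flow_prev_day, pressure, co2, temperature. Target: flow.
    print(row["day"], row["flow_prev_day"], row["flow"])
```

    Each resulting row can then be fed to a regression model, with `flow` as the value to predict.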

  • maurits_freriks Member Posts: 28 Contributor I

    Hi @sgenzer,


    Thanks for your quick reply! I really appreciate your effort!

    Could you be so kind as to share your client's contact details in a PM? Maybe he could help me out and give me some tips and tricks!


