"Automated short term gas production forecasting using machine learning/big data/data mining"
Hi,
First, let me quickly introduce myself. I'm Maurits Freriks, a Business Analytics student at VU Amsterdam. I'm currently doing a 3-month internship, and I have to investigate whether it's possible to automate short-term gas production forecasting. In other words: a prediction based on historical data. I have a little experience with RapidMiner, but not much. So first of all, I'm wondering whether this problem can be solved with RapidMiner?
What I've done so far:
- I've received a dataset with historical values covering the last 3 years. The data comes from measurement points, for example: the gas flow as a time series, temperature in degrees, pressure, etc.
- I've divided this dataset into a smaller dataset containing only 1 month of data.
- I've built a process with the small dataset and the Polynomial Regression operator. I got a solution with some coefficients, but when I tested it on the total dataset, the deviation was too high, so the formula was useless.
Now my question, before spending more and more time in RapidMiner, is whether there are any recommendations on which operators I should use. For example, do I have to make a test set and a training set? If yes, is it right to divide the total dataset into 80% training and 20% test?
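To make the 80/20 question concrete: for time-ordered data, one common way to split is chronologically, so the model is trained on the older 80% and tested on the newest 20%, without shuffling. Below is a minimal Python sketch of that idea, not a RapidMiner process. The function name `chronological_split` and the sample values are made up for illustration.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling.

    Assumes `rows` is sorted oldest first, so the test set only
    contains observations that come after everything in the training set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical flow readings, oldest first (illustrative values only).
flow = [100 + i for i in range(10)]
train, test = chronological_split(flow)
print(len(train), len(test))  # 8 2
```

A random 80/20 split would mix future and past observations, which tends to make a forecast look better in testing than it will be in practice.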
I appreciate your attention, effort and time. Hopefully someone could help me out!
And by the way: sorry for my English!
With kind regards,
Maurits Freriks
Answers
Hi @maurits_freriks
From the description of your task, it seems that you could use the RapidMiner time series extension to predict production volumes. It's hard to give any practical advice without seeing the actual data, but this type of prediction is quite common in some domains, and you can search this forum for 'time series prediction' to find tens of practical solutions on different data. That could be a pretty good starting point for your problem as well.
PS I personally have only played around a bit with the time series extension, but I know that many people here on the forum are very skilled in this topic; as I mentioned, it would be beneficial if you could also share the data itself.
Vladimir
http://whatthefraud.wtf
Hello @maurits_freriks - welcome to the community, and very glad that you're using RapidMiner to solve your problem. I had a client a while ago who was in the oil & gas industry, and I think you are on the right path. To help choose a model, I would recommend using the mod.rapidminer.com page. As for splitting the data and other "best practices", please go through all the tutorial processes. They were written by data scientists and are very well done.
Good luck!
Scott
Dear Maurits,
Great to have you here! Have a look at my recent blog post on validation: https://towardsdatascience.com/when-cross-validation-fails-9bd5a57f07b5. It has a different focus, but the use case was similar.
Cheers,
Martin
Dortmund, Germany
Hi @kypexin,
Thanks for your quick reply. I've attached a screenshot of my dataset. Both flows are exactly the same; the only difference is the measurement. Using the historical flow from the day before and the current pressure, CO2 and temperature, I would like to make a prediction. Is this still possible with the RapidMiner time series extension?
I've searched a bit for the term "time series", but I didn't find any answers that helped me understand the method.
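The setup described above (the flow from the day before plus the current pressure, CO2 and temperature) can be sketched as a table of lagged features. The Python sketch below is only an illustration of that idea, not how the time series extension works internally. The function names, the `lag` parameter and the sample numbers are all assumptions. It also computes a naive "persistence" baseline, which simply repeats the lagged flow as the forecast, so any model has a reference error to beat.

```python
def build_lagged_rows(flow, pressure, co2, temperature, lag):
    """Pair each target flow[t] with flow[t - lag] and the sensor values at t.

    All inputs are equally long lists ordered oldest first. `lag` is the
    number of rows that corresponds to "the day before" in the data.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    n = len(flow)
    if not (len(pressure) == len(co2) == len(temperature) == n):
        raise ValueError("all series must have the same length")
    features, targets = [], []
    for t in range(lag, n):
        features.append([flow[t - lag], pressure[t], co2[t], temperature[t]])
        targets.append(flow[t])
    return features, targets


def persistence_mae(features, targets):
    """Mean absolute error of the naive forecast 'flow now = lagged flow'."""
    errors = [abs(row[0] - y) for row, y in zip(features, targets)]
    return sum(errors) / len(errors)


# Made-up daily values, for illustration only.
flow = [10.0, 12.0, 11.0, 13.0, 12.0]
pressure = [50.0, 51.0, 50.5, 52.0, 51.5]
co2 = [2.0, 2.1, 2.0, 2.2, 2.1]
temperature = [15.0, 16.0, 15.5, 17.0, 16.0]

X, y = build_lagged_rows(flow, pressure, co2, temperature, lag=1)
print(X[0], y[0])             # [10.0, 51.0, 2.1, 16.0] 12.0
print(persistence_mae(X, y))  # 1.5
```

Each row of `X` with its value in `y` is one training example for a regression model; if the model's error on held-out data is not clearly below the persistence error, the extra sensor inputs are probably not helping yet.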
Hi @sgenzer,
Thanks for your quick reply! I really appreciate your effort!
Could you be so kind as to share your client's contact details in a PM? Maybe he could help me out and give me some tips and tricks!
Thanks!