Generating Synthetic Data or Simulated Data
I am new to RapidMiner but not new to data science. Synthetic data has its uses in developing data science solutions. I am looking for the best RapidMiner approach to simulate booking events, such as airline bookings. As an example, consider a single flight: each day, a certain number of passengers book or cancel for this flight. If the flight leaves on, say, 3/1/2019, bookings could start coming in about 60 days prior, say on 1/1/2019, and continue through the days leading up to the flight. So I have 60 booking days and one flight. In principle this is easy to simulate, even in Excel.
Imagine now that I have a hundred flights and a 60-day booking window. With a page of Python/Pandas I can quickly create this synthetic data, with different booking characteristics for each of my flights depending on flight date, origin and destination, among other factors.
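For concreteness, the kind of Python/Pandas script described above might look roughly like the following. This is only a sketch under stated assumptions: the function name `simulate_bookings`, the Poisson demand curve, the 10% cancellation rate, and the one-departure-per-day schedule are all illustrative choices, not anything taken from a real booking system.

```python
import numpy as np
import pandas as pd


def simulate_bookings(n_flights=100, window_days=60, seed=0):
    """Return one row per (flight, booking day) with synthetic counts.

    All distributions and parameters here are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    first_departure = pd.Timestamp("2019-03-01")
    # Days before departure: window_days - 1, ..., 1, 0 (departure day itself).
    days_out = np.arange(window_days - 1, -1, -1)
    frames = []
    for flight_id in range(n_flights):
        # Assumption: one flight departs per day, starting 2019-03-01.
        departure = first_departure + pd.Timedelta(days=flight_id)
        # Assumption: each flight has its own base demand, and the mean
        # number of daily bookings rises as departure approaches.
        base_rate = rng.uniform(1.0, 5.0)
        mean_bookings = base_rate * (1.0 + (window_days - 1 - days_out) / window_days)
        bookings = rng.poisson(mean_bookings)
        # Simplification: cancellations are a 10% share of that day's bookings.
        cancellations = rng.binomial(bookings, 0.1)
        frames.append(pd.DataFrame({
            "flight_id": flight_id,
            "departure_date": departure,
            "booking_date": departure - pd.to_timedelta(days_out, unit="D"),
            "days_to_departure": days_out,
            "bookings": bookings,
            "cancellations": cancellations,
        }))
    return pd.concat(frames, ignore_index=True)


bookings_df = simulate_bookings()
```

With the defaults this gives 100 flights x 60 booking days = 6,000 rows. Factors such as origin and destination could be added as extra columns that drive `base_rate`.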
How should I conceptually get started with this in RapidMiner Studio? I can assure you I have rummaged through the nodes named "Generate" but I did not see an obvious and simple way to go about this. I am sure I must have missed something. This is where RapidMiner experts like you, dear Reader, can be very helpful. I am looking for some guidance, not a full solution. Many thanks.
Best Answer
SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
Hi Omar,
There is a very simple way that is right at hand: you can reuse the same Python scripts that you have been using. You just need to install the Python Scripting extension.
I avoid repeating myself or reinventing the wheel as much as I can, so I think that in your case it is also the "expert" solution.
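As a rough sketch of what that could look like: the extension's Execute Python operator runs a script that defines a function named `rm_main`, and a pandas DataFrame returned from it is delivered as an ExampleSet on the operator's output port. The single-flight data below is an assumed, simplified stand-in for your own script (the Poisson rate of 3 bookings per day is purely illustrative).

```python
import numpy as np
import pandas as pd


def rm_main():
    # No input ports are connected in this sketch, so rm_main takes no arguments.
    rng = np.random.default_rng(42)
    departure = pd.Timestamp("2019-03-01")
    # 60 booking days: 59 days before departure down to the departure day.
    days_out = np.arange(59, -1, -1)
    return pd.DataFrame({
        "booking_date": departure - pd.to_timedelta(days_out, unit="D"),
        "days_to_departure": days_out,
        # Assumption: daily bookings follow a Poisson distribution with mean 3.
        "bookings": rng.poisson(3.0, size=days_out.size),
    })
```

The returned table can then be connected to any downstream operator like a regular data set.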
Regards,
Sebastian
Answers
Thanks Sebastian. Of course that makes a lot of sense: using the encapsulated Python. I am able to do this, yes. I will explore the capabilities of the scripting extensions some more as well.