Request for advice on processing big data (geospatial) using RapidMiner
A_Houghstow
Hi RM Community,
I am a newbie looking for some advice on getting started. I am currently trying to predict which locations around the world are most vulnerable to experiencing environmental conflict. My goal is to build a model that can predict this at a local (e.g., town/country/subdistrict) level. I've assembled a PostGIS database of global environmental, governance, development, and conflict data, including a lot of high-resolution global-scale rasters. The database is stored on AWS.
I recently tried importing a small subset of this data into RapidMiner Studio to see if I could run my first query. The import included one global raster mapping cropland, one point file of conflict locations, and one set of polygons (~25 sq km hexagons, global) to serve as boundaries of interest. The import took a really long time. I had to stop after a couple of hours and change locations, and this meant stopping the import entirely since I was running Studio locally.
I have been trying to figure out a workaround so I can ultimately work with all my data using RapidMiner. Perhaps running RapidMiner Studio on an AWS instance would work? (I am doing research with an academic license and don't need to deploy the model yet, so Server may be out of the picture at this point.) Maybe there is some intermediate step I should take to make working with the data easier for RapidMiner?
My background is in social science and stats, but I am new to big data, ML, and database architecture, so I would very much appreciate any advice on this challenge!
Thank you so much.
@sgenzer, putting this question on your radar. Thank you for answering my question about RapidMiner Server previously!
Answers
This isn't a direct answer to your question, but have you looked at previous discussions such as https://community.rapidminer.com/discussion/25118/geographic-operations-in-rapidminer?
Hope this helps.
Cheers
Sven
There's also the possibility of executing the most resource-intensive processes in the RapidMiner Cloud, which is available from within your Studio.
If it's a one-time thing (importing and processing the data), this could be sufficient.
Regards,
Balázs
Dortmund, Germany
and shared by @DocMusher. I'll check back in and share what worked after taking some time to work through the tutorial.