Request for advice on processing big geospatial data using RapidMiner
I am a newbie looking for some advice on getting started. I am trying to predict which locations around the world are most vulnerable to environmental conflict, and my goal is to build a model that can make this prediction at a local level (e.g., town, county, or subdistrict). I've assembled a PostGIS database of global environmental, governance, development, and conflict data, including many high-resolution global rasters. The database is hosted on AWS.
I recently tried importing a small subset of this data into RapidMiner Studio to see if I could run my first query. The import included one global raster of cropland, one point layer of conflict locations, and one set of polygons (global ~25 sq km hexagons) to serve as the boundaries of interest. The import was very slow. After a couple of hours I had to move to a different location, and because I was running Studio locally, that meant abandoning the import entirely.
I have been trying to find a workaround that would eventually let me work with all of my data in RapidMiner. Would running RapidMiner Studio on an AWS instance work? (I am doing research under an academic license and don't need to deploy the model yet, so RapidMiner Server may not be an option at this point.) Or is there some intermediate step I should take to make the data easier for RapidMiner to work with?
My background is in social science and stats, but I am new to big data, ML, and database architecture, so I would very much appreciate any advice on this challenge!
Thank you so much.
@sgenzer, I'm putting this question on your radar. Thanks again for answering my earlier question about RapidMiner Server!