how to find for each object all containing objects within 3km radius using attributes such as lon la
Hello, I am new to RapidMiner and I am clueless about how to tackle my problem.
As training for RM, I would like to find the number of examples in one data set that lie within a 5 km radius of an example in a different data set. I found the haversine formula for calculating geographical distances, and I also found an aggregate function for orthodromic GIS calculations.
For example: I would like to find out how many ATMs are around a museum. In one data set I have a list of museums with their lat/long information, and in the other set I have a complete list of the ATMs of a large region with their lat/long info.
The attribute generated for the museum data set would be the number of ATMs around each museum. Both data sets are rather large, and I don't want to calculate every combination in a cascade and then apply a specific filter (which would probably take my whole life to complete). I am sure there is a much more convenient way, but I don't see how.
Thanks in advance for any clues and tips.
Best Answers
BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
In RapidMiner you can only go with the "compare all with all" route. It's actually not too slow on a modern computer. (Let's assume you have 1,000 museums and 10,000 ATMs - ten million comparisons are manageable.)
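For a sense of what the "compare all with all" route involves, here is a minimal sketch in plain Python, outside RapidMiner. It is only an illustration of the idea, not the RapidMiner process from the blog. The function names (haversine_km, count_within_radius), the sample coordinates and the mean Earth radius of 6371 km are assumptions made for this example; the 5 km default comes from the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, adequate for a 5 km test


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def count_within_radius(museums, atms, radius_km=5.0):
    """Compare every museum with every ATM and count the ATMs in range.

    museums: iterable of (name, lat, lon); atms: iterable of (lat, lon).
    """
    counts = {}
    for name, m_lat, m_lon in museums:
        counts[name] = sum(
            1 for a_lat, a_lon in atms
            if haversine_km(m_lat, m_lon, a_lat, a_lon) <= radius_km
        )
    return counts


# Made-up coordinates, for illustration only.
museums = [("Museum A", 48.2000, 16.3700)]
atms = [(48.2010, 16.3710), (48.2300, 16.3700), (48.3000, 16.3700)]
print(count_within_radius(museums, atms))
```

The nested loop keeps only one running count per museum, so the full list of museum/ATM pairs is never held in memory at once.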
If you want to optimize the performance, install a PostgreSQL database with the PostGIS extension and process your data there. This is the fastest and probably best way for geographical processing of large datasets.
If you want to stay in RapidMiner, check out this blog entry and the linked entries that describe geoprocessing in RapidMiner.
https://datascientist.at/2016/05/improved-geo-joins-in-rapidminer/#english
Regards,
Balázs
BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
I just downloaded the process from https://datascientist.at/2016/06/generic-joins-in-rapidminer/ and saved it as "Advanced joining.rmp". Then I used File/Import process... to open it in Studio 9.2.001. It works flawlessly.
Please make sure that groovy-all-2.4.5.jar is no longer in your lib folder. Leave RapidMiner's newer groovy jar in place.
Regards,
Balázs
Answers
I already figured that writing your own script might be the best solution. I have tried the cartesian join method, but the memory issue doesn't let me do that with the sheer size of my data sets.
Are the tools you recommend installing in RM still up to date for version 9.2? (datascientist.at/2015/12/gis-in-rapidminer-1/)
And what about the other instructions for getting the toolbox working in RM?
The scripting you've done in your example sounds promising for my own approach.
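One way a self-written script could sidestep the memory cost of a full cartesian join is to sort the ATMs by latitude once and, for each museum, run the haversine test only on the ATMs inside a narrow latitude band. The sketch below is plain Python and a hedged illustration only; it is not the script from the blog. The names (count_nearby, KM_PER_DEG_LAT), the sample coordinates and the Earth radius of 6371 km are assumptions made for this example.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180.0  # roughly 111.2 km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def count_nearby(museums, atms, radius_km=5.0):
    """Count ATMs within radius_km of each museum without building all pairs.

    museums: iterable of (name, lat, lon); atms: iterable of (lat, lon).
    """
    atms_sorted = sorted(atms)  # tuples sort by latitude first
    lats = [lat for lat, _ in atms_sorted]
    # An ATM within radius_km cannot differ in latitude by more than this.
    band_deg = radius_km / KM_PER_DEG_LAT
    counts = {}
    for name, m_lat, m_lon in museums:
        lo = bisect.bisect_left(lats, m_lat - band_deg)
        hi = bisect.bisect_right(lats, m_lat + band_deg)
        # Exact distance test only for candidates inside the latitude band.
        counts[name] = sum(
            1 for i in range(lo, hi)
            if haversine_km(m_lat, m_lon, *atms_sorted[i]) <= radius_km
        )
    return counts


# Made-up coordinates, for illustration only.
museums = [("Museum A", 48.2000, 16.3700)]
atms = [(10.0, 10.0), (48.2010, 16.3710), (48.2300, 16.3700), (48.3000, 16.3700)]
print(count_nearby(museums, atms))
```

The latitude band is only a prefilter: it discards ATMs that are certainly too far north or south, and the haversine test still decides the final count. Memory use stays proportional to the two input lists rather than to their product.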
Regards
I finally installed your library selection in RM and tested your example script. Somehow one of the new jar files is not loaded when RM launches (see below). I guess that is also the reason why your example script doesn't work when it is run.
Do you think this might be a compatibility issue with the current version, or an installation bug?
I just wanted to check whether this is a known issue before I start writing my own geoscripts.
Thanks for the awesome documentation on your blog. Sorry, I am not yet allowed to upload any screenshots/links etc.
Launch:
Error Message in RM:
I just tried some of my example processes in the current Studio 9.2.1. Everything works. I get the GDAL native warning, too, but that doesn't affect my processes.
Your problem is probably different.
Could you try Execute Script without anything GeoTools-related? E.g., the first example here would do that.
https://datascientist.at/2016/06/generic-joins-in-rapidminer/
If this works, then the problem is with the GeoTools installation. If not, then you messed up your Groovy. (When I published my HOWTO it was necessary to update the Groovy lib to a newer version. However, nowadays RapidMiner ships a newer Groovy so overwriting that could be harmful.)
Regards,
Balázs
Sorry, I was busy last week. Now I really want to solve this issue^^ I tried to load your generic join process. Unfortunately, RM doesn't let me import the XML. The log file says "invalid xml". Maybe it is a compatibility issue. The XML process seems to be compatible with 7.1.001. In the parameters field of an empty process, I only have the option to change the compatibility back to 6.2. So I can't run this example.
This is the console output when trying to copy and paste it into RM:
As for the other option, the Groovy lib: I did install the newer version groovy-all-2.4.5.jar as mentioned in the installation instructions. The latest Groovy lib shipped with the original RM (9.2.001) installation was groovy-all-2.4.10.jar. But switching back and forth between the two versions doesn't improve the outcome.
Regards
Biersepp