how to find for each object all containing objects within 3km radius using attributes such as lon la

Biersepp · May 2019

Hello, I am new to RapidMiner and am clueless about how to tackle my problem.
As training for RM, I would like to find the number of examples in one data set that lie within a 5 km radius of an example in a different data set. I found the haversine formula to calculate geographical distances, and I also found an aggregate function for orthodromic GIS calculations.
For example: I would like to find out how many ATMs are around a museum. In one data set I have a list of museums with their lat/long information, and in the other I have a complete list of the ATMs of a large region with their lat/long info.
The generated attribute in the museum data set would be the number of ATMs around each museum. Both data sets are rather large, and I don't want to calculate each combination in a cascade and then set a specific filter (which would probably take my whole life to complete). I am sure there is a much more convenient way, but I don't see how.

Thanks in advance for any clues and tips.

BalazsBarany · May 2019

Hi!

In RapidMiner you can only go with the "compare all with all" route. It's actually not too slow on a modern computer. (Let's assume you have 1,000 museums and 10,000 ATMs - ten million comparisons are manageable.)
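To make the "compare all with all" idea concrete outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner process and not the method from the linked blog; the function names, the sample coordinates, and the mean Earth radius of 6371.0088 km are assumptions made for illustration. It uses the haversine formula mentioned in the question and keeps only one counter per museum, so the full set of pairs is never stored.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_within_radius(museums, atms, radius_km):
    """For each museum, count the ATMs within radius_km (all-with-all comparison)."""
    counts = {}
    for name, m_lat, m_lon in museums:
        counts[name] = sum(
            1 for a_lat, a_lon in atms
            if haversine_km(m_lat, m_lon, a_lat, a_lon) <= radius_km
        )
    return counts


# Made-up sample coordinates, for illustration only.
museums = [("Museum A", 48.2000, 16.3700), ("Museum B", 47.0700, 15.4400)]
atms = [(48.2050, 16.3750), (48.2300, 16.3700),
        (48.3000, 16.3700), (47.0710, 15.4410)]

print(count_within_radius(museums, atms, radius_km=5.0))
# prints {'Museum A': 2, 'Museum B': 1}
```

The run time still grows with the number of museums times the number of ATMs, but memory use stays small because each pair is evaluated and then discarded.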

If you want to optimize the performance, install a PostgreSQL database with the PostGIS extension and process your data there. This is the fastest and probably best way for geographical processing of large datasets.

If you want to stay in RapidMiner, check out this blog entry and the linked entries that describe geoprocessing in RapidMiner.
https://datascientist.at/2016/05/improved-geo-joins-in-rapidminer/#english

Regards,

Balázs

BalazsBarany · May 2019

Hi!

I just downloaded the process from https://datascientist.at/2016/06/generic-joins-in-rapidminer/ and saved it as "Advanced joining.rmp". Then I used File/Import process... to open it in Studio 9.2.001. It works flawlessly.

Please make sure that groovy-all-2.4.5.jar is no longer in your lib folder. Leave RapidMiner's newer Groovy jar in place.

Regards,
Balázs

Biersepp · May 2019

Thank you so far, Balázs!
I had already figured that writing my own script might be the best solution. I have tried the Cartesian join method, but given the sheer size of my data sets, I run into memory issues.

Are the tools you recommend installing in RM still up to date for version 9.2? (datascientist.at/2015/12/gis-in-rapidminer-1/)
Do the other instructions for getting the toolbox working in RM still apply as well?
The scripting you've done in your example sounds promising for my own approach.

Regards
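On the memory issue with the Cartesian join mentioned above: one common way to cut the work without materializing all pairs is a latitude-band prefilter. The sketch below is plain Python, not a RapidMiner or GeoTools script, and the function names, sample data, and Earth radius are assumptions for illustration. It sorts the ATMs by latitude once, uses binary search to select only the ATMs whose latitude lies within the radius of each museum, and runs the exact haversine check on those candidates only. The prefilter is safe because the great-circle distance between two points is never smaller than the distance implied by their latitude difference alone.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0088                          # assumed mean Earth radius
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180.0   # roughly 111.2 km per degree


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_with_lat_band(museums, atms, radius_km):
    """Count ATMs within radius_km of each museum, checking only a latitude band."""
    atms_sorted = sorted(atms)                 # tuples sort by latitude first
    lats = [lat for lat, _ in atms_sorted]
    # Band half-width in degrees, with a tiny slack for floating-point rounding.
    band = radius_km / KM_PER_DEG_LAT * (1 + 1e-9)
    counts = {}
    for name, m_lat, m_lon in museums:
        lo = bisect.bisect_left(lats, m_lat - band)
        hi = bisect.bisect_right(lats, m_lat + band)
        # Exact distance check only for the candidates inside the band.
        counts[name] = sum(
            1 for i in range(lo, hi)
            if haversine_km(m_lat, m_lon, *atms_sorted[i]) <= radius_km
        )
    return counts


# Made-up sample coordinates, for illustration only.
museums = [("Museum A", 48.2000, 16.3700), ("Museum B", 47.0700, 15.4400)]
atms = [(48.2050, 16.3750), (48.2300, 16.3700),
        (48.3000, 16.3700), (47.0710, 15.4410)]

print(count_with_lat_band(museums, atms, radius_km=5.0))
```

How much this helps depends on how the ATMs are spread over latitude; for a region that is wide from east to west, a grid over both latitude and longitude, or a spatial index such as the one PostGIS provides, would likely prune more candidates.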

BalazsBarany · May 2019

I try them with current versions from time to time and have had no problems so far. I haven't tested them with 9.2.1 yet, though.

Biersepp · May 2019

Hello Balázs,
I finally installed your library selection in RM and tested your example script. For some reason, one of the new jar files is not loaded when RM launches (see below). I guess that is also why your example script fails when I run it.
Do you think this might be a compatibility issue with the current version, or an installation bug?
I just want to make sure whether this is a known issue before I start writing my own geoscripts.
Thanks for the awesome documentation in your blog. Sorry, I am not yet allowed to upload any screenshots, links, etc.

Launch:

Mai 13, 2019 4:03:59 PM com.rapidminer.gui.RapidMinerGUI run
INFORMATION: Launching RapidMiner 9.2.001, platform WIN64
Mai 13, 2019 4:03:59 PM it.geosolutions.imageio.gdalframework.GDALUtilities loadGDAL
WARNiNG: Failed to load the GDAL native libs. This is not a problem unless you need to use the GDAL plugins: they won't be enabled.
java.lang.UnsatisfiedLinkError: no gdaljni in java.library.path

Error Message in RM:

May 13, 2019 4:20:30 PM com.rapidminer.gui.ProcessThread run
SEVERE: Process failed: The scripting engine Groovy reported an error in the script: java.lang.NullPointerException.
com.rapidminer.operator.UserError: The scripting engine Groovy reported an error in the script: java.lang.NullPointerException.
    at com.rapidminer.operator.ScriptingOperator.doWork(ScriptingOperator.java:264)
    at com.rapidminer.operator.Operator.execute(Operator.java:1026)
    at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
    at com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:812)
    at com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:807)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
    at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:423)
    at com.rapidminer.operator.Operator.execute(Operator.java:1026)
    at com.rapidminer.Process.executeRoot(Process.java:1386)
    at com.rapidminer.Process.execute(Process.java:1327)
    at com.rapidminer.Process.run(Process.java:1300)
    at com.rapidminer.Process.run(Process.java:1186)
    at com.rapidminer.Process.run(Process.java:1139)
    at com.rapidminer.Process.run(Process.java:1134)
    at com.rapidminer.Process.run(Process.java:1124)
    at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
Caused by: java.lang.NullPointerException
    at com.rapidminer.example.Example.getNominalValue(Example.java:97)
    at com.rapidminer.example.Example$getNominalValue.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at customScript.run(customScript:67)
    at com.rapidminer.operator.ScriptingOperator.doWork(ScriptingOperator.java:254)
    ... 16 more

BalazsBarany · May 2019

Hi,

I just tried some of my example processes in the current Studio 9.2.1. Everything works. I get the GDAL native warning, too, but that doesn't affect my processes.

Your problem is probably different.
Could you try the Execute Script operator without anything GeoTools-related? For example, the first example here would do that.
https://datascientist.at/2016/06/generic-joins-in-rapidminer/

If this works, then the problem is with the GeoTools installation. If not, then you messed up your Groovy. (When I published my HOWTO it was necessary to update the Groovy lib to a newer version. However, nowadays RapidMiner ships a newer Groovy so overwriting that could be harmful.)

Regards,
Balázs

Biersepp · May 2019

Hi,

Sorry, I was busy last week. Now I really want to solve this issue^^ I tried to load your generic join process. Unfortunately, RM doesn't let me import the XML; the log file says the XML is invalid. Maybe it is a compatibility issue: the XML process seems to be compatible with 7.1.001, and the only option I have is to change the compatibility level back to 6.2 in the parameters field of an empty process. So I can't run this example.

This is the console output when I try to copy and paste it into RM:

[Fatal Error] :34:34: Element type "parameter" must be followed by either attribute specifications, ">" or "/>".

May 21, 2019 2:03:53 PM com.rapidminer.gui.dnd.ReceivingOperatorTransferHandler importData

WARNING: Invalid process xml!

As for the other option, the Groovy lib: I did install groovy-all-2.4.5.jar as the newer version, as mentioned in the installation instructions. The latest Groovy lib shipped with the original RM (9.2.001) installation was groovy-all-2.4.10.jar. But switching back and forth between the two versions doesn't improve the outcome.

Regards

Biersepp
