"Spatial Clustering with RapidMiner"
WaggaWagga
Member Posts: 6 Contributor II
Hello,
For an analysis of POIs, I would like to consider a spatial clustering use case. Normally, DBSCAN is suitable for clustering based on geo positions (lat, long). But what distance measure should be used? The Haversine distance is not part of RapidMiner. Is it also possible to combine lat/long and nominal values in a clustering analysis?
In the RapidMiner forum they refer to external libraries:
http://rapid-i.com/rapidforum/index.php?topic=6888.0
Best
Answers
I think there is not much built in. But you might check this post by Tom: http://www.neuralmarkettrends.com/2015/11/04/Geo-Distance-In-RapidMiner-and-Python/
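For reference, here is a minimal sketch of the Haversine distance asked about in the question, in plain Python. It assumes a spherical Earth with a mean radius of 6371.0088 km. The function name haversine_km and the example coordinates are made up for illustration and are not taken from the linked post.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius of a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical example: approximate city-centre coordinates of Dortmund and Munich.
print(haversine_km(51.5136, 7.4653, 48.1351, 11.5820))
```

A function like this could serve as the distance measure for a density-based algorithm, with the neighbourhood radius expressed in kilometres instead of degrees.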
~Martin
Dortmund, Germany
Best
Sorry, but I cannot comment on RapidMiner's roadmap. In the end, this is internal information.
I do not know of any ongoing community project. But maybe you are the one to start this :-)
Best.
Martin
Dortmund, Germany
If the area is small enough that the shape of the Earth (an ellipsoid) doesn't matter, you can transform your latitude/longitude coordinates into a meter-based projection. There are many open source tools for that (e.g. http://www.gdal.org/ogr2ogr.html). Just search for a projection that is commonly used by cartographers in your area.
Once you have meter-based coordinates, Euclidean distances are easy to interpret and will be quite accurate.
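As a rough illustration of the idea, the sketch below uses a simple equirectangular projection around a reference point. This is a simplified stand-in, not one of the cartographic projections a tool like ogr2ogr would apply, and it is only reasonable for small areas. The function names, the spherical radius, and the coordinates are assumptions for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumption: mean Earth radius, spherical model


def to_local_meters(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) in degrees to (x, y) metres relative to a reference point."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def euclidean_m(p, q):
    """Plain Euclidean distance between two projected points, in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical POIs close to each other; the first one doubles as reference point.
ref = (51.5136, 7.4653)
a = to_local_meters(51.5136, 7.4653, *ref)
b = to_local_meters(51.5200, 7.4800, *ref)
print(euclidean_m(a, b))  # distance in metres
```

With coordinates in metres, a neighbourhood radius for DBSCAN can be chosen directly as a distance on the ground.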
From about the size of a country like Germany or France upward, and increasingly with distance from the equator, raw latitude/longitude coordinates do not express true distances on the Earth.
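To put rough numbers on this, the following sketch prints how long one degree of longitude is at different latitudes. It assumes a spherical Earth, and the function and constant names are made up for the example. On a sphere, one degree of latitude is always about 111 km, while one degree of longitude shrinks with the cosine of the latitude, to about half that at 60 degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius, spherical model

# On a sphere, one degree of latitude has the same length everywhere.
KM_PER_DEGREE_LAT = math.radians(1.0) * EARTH_RADIUS_KM


def km_per_degree_lon(lat):
    """Length of one degree of longitude at the given latitude (degrees), in km."""
    return KM_PER_DEGREE_LAT * math.cos(math.radians(lat))


for lat in (0, 40, 50, 60):
    print(f"lat {lat:2d}: 1 deg lon = {km_per_degree_lon(lat):6.1f} km, "
          f"1 deg lat = {KM_PER_DEGREE_LAT:6.1f} km")
```

This is why a Euclidean distance computed directly on degrees treats east-west and north-south differences as equal even though they are not.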
You will get the best results if you use a geospatially enabled database like PostgreSQL with the PostGIS extension. You can then convert between coordinate reference systems/projections and even calculate exact distances.
I guess I still lack the experience with RapidMiner to implement new RapidMiner functions ;-)
Best
Thanks for your comments. The data corpus is based on European POIs (from Germany and France to the UK). We are using PostgreSQL and PostGIS (e.g., the geometry data type), and I also found the function ST_ClusterIntersecting during my research. But I guess it is not a full geospatial clustering algorithm. I have to admit, the documentation is very sparse.
Best
Just do a self join on the dataset (select ... from data d1 cross join data d2) and calculate the distance between the geometries: ST_Distance(d1.geom, d2.geom) (assuming that you're using a meter-based projection/CRS).
Or even better, use a Geography-based instead of a Geometry-based calculation (slower, but more precise):
ST_Distance(ST_Transform(d1.geom, 4326)::geography, ST_Transform(d2.geom, 4326)::geography) as distance
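The self join above boils down to computing one distance per pair of rows. Here is a rough Python equivalent, as a sketch only: it uses a spherical Haversine distance, which is an assumption of this example and not the spheroid-based calculation PostGIS performs on geography values. The POI names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371008.8  # assumption: mean Earth radius, spherical model


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical POIs: id -> (lat, lon)
pois = {
    "dortmund": (51.5136, 7.4653),
    "paris": (48.8566, 2.3522),
    "london": (51.5074, -0.1278),
}

# combinations() yields each unordered pair once. A plain cross join would
# also return (a, a) as well as both (a, b) and (b, a).
distances = {
    (a, b): haversine_m(*pois[a], *pois[b])
    for a, b in combinations(pois, 2)
}

for (a, b), d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{a} - {b}: {d / 1000:.0f} km")
```

Note that the number of pairs grows quadratically with the number of POIs, in SQL just as in Python, so for a large corpus a spatial index or a distance threshold is worth considering.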
Maybe this is interesting for you: I have found the ELKI library. It is the result of a research project at LMU Munich. It is Java-based, but I recommend using the frontend. One weak point of the source code is the missing or sparse documentation.
http://elki.dbs.ifi.lmu.de/
ELKI contains five different variations of the OPTICS algorithm and a long list of distance metrics. For geo-spatial analysis with OPTICS, they provide a latlong-distance metric.
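To illustrate what density-based clustering with a lat/long metric looks like, here is a minimal sketch in Python. It is plain DBSCAN (the algorithm from the original question), not OPTICS, and it is not ELKI's implementation. The spherical Haversine distance, the function names, the parameters, and the sample POIs are all assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius, spherical model


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) tuples in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_pts, dist=haversine_km):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force range query; includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical POIs: three near Dortmund, three near Paris, one alone in London.
points = [
    (51.5136, 7.4653), (51.5140, 7.4660), (51.5150, 7.4640),
    (48.8566, 2.3522), (48.8570, 2.3530), (48.8560, 2.3510),
    (51.5074, -0.1278),
]
print(dbscan(points, eps_km=1.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The brute-force neighbour search is quadratic in the number of points, so this is only meant to show the mechanics; for a real POI corpus, a library with spatial indexing would be the more sensible choice.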
All the best
WaggaWagga