# "Spatial Clustering with RapidMiner"

WaggaWagga
Member Posts:

**6** Contributor II

Hello,

For an analysis of POIs, I would like to consider a spatial clustering use case. Normally, DBSCAN is suitable for clustering use cases based on geo positions (lat, long). But which distance measure should be used? The Haversine distance is not part of RapidMiner. Is it also possible to use lat/long together with nominal values in a clustering analysis?

In the RapidMiner forum, they refer to external libraries:

http://rapid-i.com/rapidforum/index.php?topic=6888.0

Best

## Answers

**3,507** RM Data Scientist

I think there is not much built in. But you might check this post by Tom: http://www.neuralmarkettrends.com/2015/11/04/Geo-Distance-In-RapidMiner-and-Python/
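For readers who just want the distance measure asked about above, here is a minimal sketch of the Haversine formula in plain Python. It is not taken from the linked post; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions made for illustration, and the formula treats the Earth as a sphere.

```python
import math

# Assumption: mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine term; clamp to 1.0 to guard against rounding just above 1.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# Example: approximate distance between Berlin and Paris, in kilometers.
berlin_paris_km = haversine_m(52.52, 13.405, 48.8566, 2.3522) / 1000
```

A function like this could be used as the distance measure in any clustering tool that accepts a user-defined metric.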

~Martin

Dortmund, Germany

**6** Contributor II

Best

**3,507** RM Data Scientist

Sorry, but I cannot comment on RapidMiner's roadmap. In the end, this is internal information.

I do not know of any ongoing community project. But maybe you are the one to start this :-)

Best.

Martin

Dortmund, Germany

**955** Unicorn

If the area is small enough that the shape of the Earth (an ellipsoid) doesn't matter, you can transform your latitude-longitude coordinates into a meter-based projection. There are many open source tools for that (e.g. http://www.gdal.org/ogr2ogr.html). Just search for a projection that is commonly used by cartographers in your area.

Once you have meter-based coordinates, you can easily interpret Euclidean distances, and they will be quite accurate.
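To illustrate the idea of "project first, then use Euclidean distance", here is a rough Python sketch. Note that it does not use a proper cartographic projection as recommended above: it uses a simple equirectangular approximation around a reference point, which is only reasonable for small areas. The names `to_local_meters` and `euclidean_m` and the Earth radius are assumptions for illustration.

```python
import math

# Assumption: mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def to_local_meters(lat, lon, lat0, lon0):
    """Project (lat, lon) in degrees to x/y meters relative to (lat0, lon0).

    Equirectangular approximation: east-west distances are scaled by
    cos(lat0), so the error grows with the distance from the reference point.
    """
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


def euclidean_m(p, q):
    """Plain Euclidean distance between two projected (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Example: two nearby points, projected around the first one.
a = to_local_meters(51.5136, 7.4653, 51.5136, 7.4653)
b = to_local_meters(51.5236, 7.4653, 51.5136, 7.4653)
distance_m = euclidean_m(a, b)
```

For real data, a projection chosen for the region (for example via ogr2ogr or PostGIS) should be more accurate than this approximation.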

Starting at about the size of a country like Germany or France, and also depending on the distance from the equator, raw latitude/longitude coordinates no longer express true distances on the Earth.

You will get the best results if you use a geospatially enabled database like PostgreSQL with the PostGIS extension. You can then convert between coordinate reference systems/projections and even calculate exact distances.

**6** Contributor II

I guess I still lack the experience with RapidMiner to implement new RapidMiner functions... ;-)

Best

**6** Contributor II

Thanks for your comments. The data corpus is based on European POIs (from Germany and France to the UK). We are using PostgreSQL and PostGIS (e.g., the data type geometry), and I also found the function ST_ClusterIntersecting during my research. But I guess it is not a full geospatial clustering algorithm. I have to admit, the documentation is very sparse.

Best

**955** Unicorn

Just do a self join of the dataset (`select ... from data d1 cross join data d2`) and calculate the distance between the geometries: `ST_Distance(d1.geom, d2.geom)` (assuming that you're using a meter-based projection/CRS).

Or, even better, use a geography-based instead of a geometry-based calculation (slower, but more precise):

ST_Distance(ST_Transform(d1.geom, 4326)::geography, ST_Transform(d2.geom, 4326)::geography) as distance
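Once the pairwise distances from such a self join are available, they could be fed into a density-based clustering such as DBSCAN, which the original question mentions. The following is a minimal pure-Python sketch that works on a precomputed distance matrix. The function name `dbscan_precomputed` and its parameters are hypothetical, and the approach needs O(n²) memory, so it is only meant for small datasets.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Cluster points from a square matrix of pairwise distances.

    dist[i][j] is the distance between points i and j (e.g. in meters).
    Returns one label per point: 0, 1, ... for clusters, -1 for noise.
    A point counts as its own neighbor, as in the usual DBSCAN definition.
    """
    n = len(dist)
    noise = -1
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: expand further
                queue.extend(reachable)
    return labels


# Example: six points on a line (positions in meters), eps = 15 m.
positions = [0, 10, 20, 500, 510, 2000]
matrix = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan_precomputed(matrix, eps=15, min_pts=2)
```

In this example the first three points form one cluster, the next two form a second cluster, and the last point is left as noise.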

**6** Contributor II

Maybe this is interesting for you: I have found the ELKI library. It is the result of a research project at LMU Munich. It is Java-based, but I recommend using the frontend. As for the source code, one weak point is the missing/sparse documentation.

http://elki.dbs.ifi.lmu.de/

ELKI contains five variations of the OPTICS algorithm and a long list of distance metrics. For geospatial analysis with OPTICS, it provides a lat/long distance metric.

All the best

WaggaWagga