Clustering of GPS coordinate data?

CausalityvsCorr · May 2017

I have a dataset that contains only a set of coordinate IDs (building names) and their latitude and longitude coordinates. The dataset has three rows: the name (coordinate ID) of each attribute, then the latitude, and finally the longitude below it.

I need to cluster the IDs based on their locations, so that each cluster consists of IDs that are near each other.

=>

How to pre-process the data?

Does RapidMiner have a suitable algorithm for this task?

Thomas_Ott · May 2017

Hi, to use Geo and GIS functions in RapidMiner you'll have to hack some Groovy. There's a great thread here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Geographic-operations-in-RapidMiner/m-p/25118

Telcontar120 · May 2017

If the attributes are on different rows, you probably also need to pivot your data so that the lat and long for each building are on the same row. That's the way you'll want it to be for the clustering.

CausalityvsCorr · May 2017

Thank you very much for the feedback.

For Thomas: Do I really need all these extra packages? For example, I do not need to show anything on a map.

For Telcontar120: Yes, currently the dataset has the form row 1: building ID, row 2: lat, row 3: long. What exactly does "to have all in one row" mean? And if everything is in one row, how would it be possible to use clustering such as k-means when the number of rows is just one?

BalazsBarany · May 2017

Hi,

the representation you want to use in RapidMiner is this (CSV example):

Building ID;latitude;longitude

123;22.524;19.4904

etc. This means that each example is a row.

You describe your representation like this:

123

22.524

19.49

If this is actually the case, you need to preprocess your data to get the first (tabular) form.
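If you end up doing this preprocessing outside RapidMiner, here is a minimal Python sketch of the reshaping step. It assumes the values arrive as one flat sequence in the order ID, latitude, longitude; the function name `rows_to_table` and the second building (124) are made up for illustration.

```python
def rows_to_table(values):
    """Group a flat sequence [id, lat, lon, id, lat, lon, ...]
    into a list of (id, lat, lon) tuples, one per building."""
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of three values")
    table = []
    for i in range(0, len(values), 3):
        building_id, lat, lon = values[i:i + 3]
        table.append((str(building_id), float(lat), float(lon)))
    return table


# Hypothetical input: building 123 is from the example above, 124 is invented.
flat = ["123", "22.524", "19.4904", "124", "22.530", "19.5001"]
table = rows_to_table(flat)

# Write it out in the semicolon-separated form shown above.
print("Building ID;latitude;longitude")
for building_id, lat, lon in table:
    print(f"{building_id};{lat};{lon}")
```

If the file instead holds three long rows (all IDs, then all latitudes, then all longitudes), `list(zip(ids, lats, lons))` produces the same one-row-per-building structure.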

About the need for converting coordinates: this depends on the geographical area. In an ideal world, one degree of latitude and one degree of longitude would represent the same distance. This is almost true around the equator but spectacularly wrong at high northern and southern latitudes. The problem is that two objects one degree of latitude apart are not the same distance from each other as two objects one degree of longitude apart. If you have an actual globe, you can easily see why (the cells formed by the coordinate lines are not squares but trapezoids).
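To put rough numbers on this, here is a small sketch that assumes a spherical Earth with a mean radius of 6371 km (a simplification, the real Earth is slightly flattened). On that model a degree of latitude has a fixed length, while a degree of longitude shrinks with the cosine of the latitude. The function names are mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def km_per_degree_latitude():
    """Length of one degree of latitude on the sphere (same everywhere)."""
    return math.pi * EARTH_RADIUS_KM / 180.0


def km_per_degree_longitude(latitude_deg):
    """Length of one degree of longitude at the given latitude."""
    return km_per_degree_latitude() * math.cos(math.radians(latitude_deg))


for lat in (0, 30, 60):
    print(lat, km_per_degree_latitude(), km_per_degree_longitude(lat))
```

At 60 degrees north or south, a degree of longitude is only half as long as a degree of latitude, so a plain Euclidean distance on raw lat/long values overweights east-west differences by a factor of two there.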

So the correct, applicable-to-every-situation way is to convert your coordinates into a representation (projection or CRS = coordinate reference system) that is defined for the area you're applying your process in. Doing this conversion is possible in RapidMiner with the mentioned Groovy scripts (or ready-to-use processes), but you need to install the GeoScript libraries for it to work. If you don't need to do this often, you might want to transform the coordinates in QGIS (graphical) or with a command line program like ogr2ogr.
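For a small area such as a campus or a city, a lightweight stand-in for a real projection is a local equirectangular approximation. The sketch below is not a substitute for a proper CRS: it assumes a spherical Earth, ignores everything a real projection handles, and gets worse as the area grows. The function name `project_local` is my own.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: mean radius of a spherical Earth


def project_local(points):
    """Convert (lat, lon) pairs in degrees to (x, y) offsets in metres
    from the centroid, using a local equirectangular approximation."""
    lat0 = sum(lat for lat, _ in points) / len(points)
    lon0 = sum(lon for _, lon in points) / len(points)
    cos_lat0 = math.cos(math.radians(lat0))
    projected = []
    for lat, lon in points:
        # East-west offset is scaled by cos(latitude); north-south is not.
        x = math.radians(lon - lon0) * EARTH_RADIUS_M * cos_lat0
        y = math.radians(lat - lat0) * EARTH_RADIUS_M
        projected.append((x, y))
    return projected


# Two made-up points at 60 degrees north, 0.001 degrees of longitude apart.
xy = project_local([(60.0, 25.0), (60.0, 25.001)])
print(xy)
```

After this step x and y are in the same unit, so Euclidean distance between two rows approximates the real distance in metres.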

That said, you might be in a region where the difference between latitude and longitude distance is negligible, or it wouldn't harm the operation you're applying. For example, some clustering methods would be less affected than others. (You might want to Normalize your lat/long data for some clustering algorithms.)

Regards,

Balázs

Thomas_Ott · May 2017

Hi CvC, you'll need those libraries if you want to do distance calculations based on lat/long and other geo calculations.
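For reference, outside RapidMiner the basic distance between two lat/long points can be sketched in plain Python with the haversine formula. This is only an illustration of the calculation, not what the Groovy scripts or libraries in the linked thread do; it assumes a spherical Earth with a 6371 km mean radius, and the function name is mine.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# One degree of longitude at the equator versus at 60 degrees north.
print(haversine_km(0, 0, 0, 1))
print(haversine_km(60, 0, 60, 1))
```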

Telcontar120 · May 2017

Clustering groups examples (rows) by comparing their attributes (columns), so each building has to be its own example. Based on your description, you need to take the data from the following structure:

id1 Building #1

id1 Lat #1

id1 Long #1

id2 Building #2

id2 Lat #2

id2 Long #2

To this:

id Building Lat Long

So each building is its own row with an associated latitude and longitude. You can then cluster on latitude and longitude to find the buildings that are closest to each other. Don't forget to normalize your numerical data too before clustering!
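To show why one row per building is what the algorithm needs, here is a minimal k-means sketch in plain Python (Lloyd's algorithm), not RapidMiner's operator. The four buildings and their coordinates are made up, the centroids start from the first k rows for repeatability, and instead of a generic normalization the longitude is scaled by the cosine of the mean latitude so both axes are comparable. All of that is my assumption.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    # Assumption: deterministic start from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical table: one row per building (id, latitude, longitude).
buildings = [
    ("A", 22.524, 19.4904),
    ("B", 22.525, 19.4910),
    ("C", 22.700, 19.8000),
    ("D", 22.701, 19.8010),
]

mean_lat = sum(lat for _, lat, _ in buildings) / len(buildings)
scale = math.cos(math.radians(mean_lat))
points = [(lon * scale, lat) for _, lat, lon in buildings]

labels, centroids = kmeans(points, k=2)
for (building_id, _, _), label in zip(buildings, labels):
    print(building_id, label)
```

With these invented coordinates, A and B end up in one cluster and C and D in the other.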

CausalityvsCorr · May 2017

Thank you for the excellent feedback and the multiple viewpoints. I think I can now manage my stuff properly.
