Clustering of GPS coordinate data?
CausalityvsCorr
Member Posts: 17 Contributor II
I have a dataset that contains only a set of coordinate IDs (building names) and their latitude and longitude coordinates. The dataset has three rows: first the name (coordinate ID), then the latitude, and finally the longitude below it.
I need to cluster the IDs based on their mutual information so that each cluster consists of IDs that are near each other.
=>
How to pre-process the data?
Does RapidMiner have a proper algorithm for this task?
Answers
Hi, to use Geo and GIS functions in RapidMiner you'll have to hack some Groovy. There's a great thread here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Geographic-operations-in-RapidMiner/m-p/25118
If the attributes are on different rows, you probably also need to pivot your data so that the lat and long for each building are on the same row. That's the way you'll want it to be for the clustering.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thank you very much for the feedback.
For Thomas: Do I really need all these extra packages? For example, I do not need to show anything on a map.
For Telcontar 120: Yes, currently the dataset has the form row 1: building ID, row 2: lat, row 3: long. What exactly does "to have all in one row" mean? And if everything is in one row, how would it be possible to use clustering such as k-means when the number of rows is just one?
Hi,
the representation you want to use in RapidMiner is this (CSV example):
Building ID;latitude;longitude
123;22.524;19.4904
etc. This means that each example is a row.
You describe your representation like this:
123
22.524
19.49
If this is actually the case, you need to preprocess your data to get the first (tabular) form.
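As a rough illustration of that preprocessing step, here is a minimal Python sketch done outside RapidMiner. It assumes the values really are stacked as ID, latitude, longitude in repeating groups of three; the function names and the second building are made up for the example.

```python
def triples_to_rows(values):
    """Group a flat sequence [id, lat, lon, id, lat, lon, ...] into rows."""
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of three values")
    rows = []
    for i in range(0, len(values), 3):
        building_id, lat, lon = values[i:i + 3]
        rows.append((str(building_id), float(lat), float(lon)))
    return rows


def rows_to_csv(rows, sep=";"):
    """Render the rows in the semicolon-separated layout shown above."""
    lines = [sep.join(["Building ID", "latitude", "longitude"])]
    lines += [sep.join([bid, str(lat), str(lon)]) for bid, lat, lon in rows]
    return "\n".join(lines)


# Hypothetical input: the first triple is from this post, the second is invented.
stacked = ["123", "22.524", "19.4904", "124", "22.531", "19.4872"]
table = triples_to_rows(stacked)
print(rows_to_csv(table))
```

If the file instead holds three long rows (all IDs, then all latitudes, then all longitudes), `list(zip(ids, lats, lons))` does the same transposition.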
About the need for converting coordinates: this depends on the geographical area. In an ideal world, one degree of latitude and one degree of longitude would represent the same distance. This is almost true around the equator but spectacularly wrong in the northern and southern regions. The problem is that objects being one degree of latitude apart don't have the same distance between them as objects one degree of longitude apart. If you have an actual globe, you can easily see why (the coordinate lines are not squares but trapezoids).
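To put numbers on this, here is a small Python sketch that assumes a perfectly spherical Earth with a mean radius of 6371 km (an approximation). A degree of latitude then has the same length everywhere, while a degree of longitude shrinks with the cosine of the latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def km_per_degree(lat_deg):
    """Return (km per degree of latitude, km per degree of longitude) at lat_deg."""
    per_lat = math.pi * EARTH_RADIUS_KM / 180.0
    per_lon = per_lat * math.cos(math.radians(lat_deg))
    return per_lat, per_lon


for lat in (0, 22.524, 60):
    per_lat, per_lon = km_per_degree(lat)
    print(f"lat {lat:>6}: 1 deg lat = {per_lat:.1f} km, 1 deg lon = {per_lon:.1f} km")
```

At 60 degrees north or south, a degree of longitude covers only half the distance of a degree of latitude, because cos(60°) = 0.5.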
So the correct, applicable-to-every-situation way is to convert your coordinates into a representation (projection or CRS = coordinate reference system) that is defined for the area you're applying your process in. Doing this conversion is possible in RapidMiner with the mentioned Groovy scripts (or ready-to-use processes), but you need to install the GeoScript libraries for it to work. If you don't need to do this often, you might want to transform the coordinates in QGIS (graphical) or with a command-line program like ogr2ogr.
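For a small area, a rough stand-in for a real projection is the equirectangular approximation sketched below in Python. This is not what GeoScript, QGIS, or ogr2ogr do and it is not a proper CRS; it assumes a spherical Earth and only holds up when the points lie within a few tens of kilometres of each other. The function name and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def to_local_xy(points, origin=None):
    """Map (lat, lon) in degrees to (x, y) in metres around an origin.

    Uses the equirectangular approximation: longitude differences are
    scaled by cos(origin latitude). Defaults to the centroid as origin.
    """
    if origin is None:
        lat0 = sum(p[0] for p in points) / len(points)
        lon0 = sum(p[1] for p in points) / len(points)
    else:
        lat0, lon0 = origin
    cos_lat0 = math.cos(math.radians(lat0))
    xy = []
    for lat, lon in points:
        x = EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0
        y = EARTH_RADIUS_M * math.radians(lat - lat0)
        xy.append((x, y))
    return xy


# Hypothetical points: one degree east and one degree north of (60, 10).
projected = to_local_xy([(60.0, 11.0), (61.0, 10.0)], origin=(60.0, 10.0))
print(projected)
```

After this step, x and y are in the same unit, so Euclidean distances between the projected points approximate ground distances.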
That said, you might be in a region where the difference between latitude and longitude distance is negligible, or it wouldn't harm the operation you're applying. For example, some clustering methods would be less affected than others. (You might want to Normalize your lat/long data for some clustering algorithms.)
Regards,
Balázs
Hi CvC, you'll need those libraries if you want to do distance calculations based on lat/long and other geo calculations.
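Clustering groups examples (rows) by comparing their attributes (columns), so the latitude and longitude of a building have to be attributes of the same example. Based on your description, you need to take the data from the following structure: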
id1 Building #1
id1 Lat #1
id1 Long #1
id2 Building #2
id2 Lat #2
id2 Long #2
To this:
id Building Lat Long
So each building is its own row with an associated latitude and longitude. You can then cluster on latitude and longitude to find the buildings that are closest to each other. Don't forget to normalize your numerical data too before clustering!
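Outside RapidMiner, the same idea can be sketched in plain Python: z-score normalize the two columns, then run a basic k-means. This is only an illustration under stated assumptions. The four buildings are invented, the centroids are naively initialised with the first k points, and the normalized values are treated as plain Euclidean coordinates, which ignores the latitude/longitude distance difference discussed earlier in the thread.

```python
import math


def zscore(column):
    """Scale a column to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in column]


def kmeans(points, k, iterations=100):
    """Basic k-means; centroids start at the first k points (naive init)."""
    centroids = list(points[:k])
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical data: (building ID, latitude, longitude), one building per row.
buildings = [
    ("A", 22.524, 19.4904),
    ("B", 22.525, 19.4910),
    ("C", 22.610, 19.6020),
    ("D", 22.611, 19.6031),
]
lats = zscore([b[1] for b in buildings])
lons = zscore([b[2] for b in buildings])
labels, _ = kmeans(list(zip(lats, lons)), k=2)
for (name, _, _), label in zip(buildings, labels):
    print(name, "-> cluster", label)
```

With data this well separated, the two nearby pairs should end up in different clusters; real data would call for a better initialisation and a considered choice of k.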
Thank you for the excellent feedback and the multiple viewpoints. I think I can now manage my stuff properly.