Samples / Help for Location Process

AlexO · May 2016

Hello,

I have the following task to explore:
We want to predict the position of a WiFi Client in a certain room. We have the positions of the Access-Points and the RSSI-Values (WLAN field strength).

I watched tutorials 1-8 on YouTube and experimented with the decision tree in Studio 7.1. I am a beginner in data mining, and it is hard for me to judge whether this is the right approach.

Does anybody have samples for this task, or for similar tasks?
Is the "decision tree" the right process in Studio to get a good result?

Thank you!

Regards

AlexO

MartinLiebig · May 2016

Alex,

thanks for trying out RapidMiner! I think there are some ways you can get better results.

First of all, most problems in data science are about the representation of the data. What does your table look like? I assume you have something like:

Truth, WIFI-Strength1, WIFI-Strength2, WIFI-Strength3

etc? Thinking about a useful representation is key.

My personal feeling (if you have a similar representation) is that a different model might be better. A Logistic Regression or SVM inside a Polynominal by Binominal Classification operator might make sense.

Can you please tell us a bit more about the structure of your data?

Best,
Martin

JEdward · May 2016

There is an academic paper somewhere that used RapidMiner to calculate the location of people using WiFi signal strength. I'm not sure where it is, though.

Try searching Google Scholar for RapidMiner + wifi or signal strength; that should give you some pointers.

AlexO · May 2016

Hello JEdward,

unfortunately I could not find this paper. Thank you anyway.

Alex

AlexO · May 2016

Hi Martin,

thanks for the encouragement. The question about the data is quickly answered: I am free to define whatever data I need.
What I will/should have is:
- The number of access points (e.g. 5): data 1 .. n
- The boundaries of the room in which I have to predict (e.g. a square of 50x50 meters).
- "Learning data" (I am not sure how the position should be represented...)
--> I want to train the system before any prediction
- RSSI (field strength) + position for the learning data
- RSSI (field strength) without position for the prediction

That's it.

I would be glad to get feedback.

Regards
Alex

BalazsBarany · May 2016

What will be your target variable? The room the device is in, or actual X/Y coordinates? In the first case it's a classification problem, in the second it's regression. (Which could be used for classification if you have a "map" of the building - then you can calculate the room from the predicted coordinates).
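
The "map" idea can be sketched in a few lines of plain Python, outside RapidMiner. This is only an illustration: the room names, the rectangles in `ROOMS`, and the function `room_for` are hypothetical and assume rooms that can be described as axis-aligned rectangles in meters.

```python
from typing import Optional

# Hypothetical floor plan: room name -> (x_min, y_min, x_max, y_max) in meters.
ROOMS = {
    "Room A": (0.0, 0.0, 25.0, 50.0),
    "Room B": (25.0, 0.0, 50.0, 50.0),
}


def room_for(x: float, y: float, rooms=ROOMS) -> Optional[str]:
    """Return the first room whose rectangle contains (x, y), or None."""
    for name, (x0, y0, x1, y1) in rooms.items():
        # Half-open intervals: a point on a shared wall belongs to one room only.
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


# Predicted coordinates from a regression model would be passed in here.
print(room_for(12.3, 40.0))
print(room_for(60.0, 10.0))  # outside the hypothetical floor plan
```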

You'll probably make measurements on defined points of the building and record the coordinates or the room identifier as the target variable (label). Then you can build models from this data and apply them to new data.
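
To make "build a model and apply it to new data" concrete, here is a minimal sketch in plain Python. It uses nearest-neighbour averaging, which is an assumption chosen for brevity and not a method recommended in this thread; the readings in `TRAINING` are made up, and `predict_position` is a hypothetical helper.

```python
import math

# Made-up training readings: (RSSI per access point, measured (x, y) in meters).
TRAINING = [
    ((-40.0, -70.0, -80.0), (5.0, 5.0)),
    ((-70.0, -40.0, -75.0), (45.0, 5.0)),
    ((-80.0, -75.0, -40.0), (25.0, 45.0)),
    ((-55.0, -55.0, -60.0), (25.0, 15.0)),
]


def predict_position(rssi, training=TRAINING, k=2):
    """Average the positions of the k training readings closest in RSSI space."""
    # Euclidean distance between the new RSSI vector and each stored one.
    nearest = sorted(training, key=lambda row: math.dist(row[0], rssi))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return x, y


# A new reading without a known position.
print(predict_position((-42.0, -68.0, -79.0)))
```

Each training row has the same number of RSSI values, which is why the fixed attribute layout matters.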

You'll have a variable number of RSSIs. This is usually not easy to express in RapidMiner. So you'll probably filter for the top 3 or 5 signals and use the Pivot operator to transform the dataset so it only has one record per reading.

AlexO · May 2016

The target will be coordinates. Coordinates could be X/Y or geodata. Both give a resolution of 1 m.

The variable number of RSSIs is by design. There are many effects which can change the RSSI...

So is RapidMiner the wrong choice for this project?

BalazsBarany · May 2016

No, that's not what I meant. RapidMiner is of course a good solution for this problem. You just have to be smart when preparing the data.

Models just need to have a fixed attribute schema (in each product). They can't work with non-tabular data. Many algorithms also can't work with missing data (this is again conceptual, not a RapidMiner limitation).

Some possible solutions:

- If you have a fixed number of stations installed, your table could be like this:

Measurement ID; Position; Station1; Station2; ... StationN

If no signal strength of Station5 is available, you just put 0 into it.

RapidMiner can work well with a huge number of attributes, and the structure can be created automatically, e.g. with the Pivot operator.

- If the number of stations is not fixed and higher than you'd like to express in the previous data structure, you could go with this:

Measurement ID; Position; Top1StationID; Top1StationStrength; Top2StationID; Top2StationStrength; ... as long as it makes sense.
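
As a rough sketch of the first layout in plain Python (not the RapidMiner Pivot operator itself): the readings, the station names, and the helper `pivot_fixed` are made up for illustration. The strengths are assumed to be on a scale where 0 means "no signal"; if RSSI is recorded in dBm, where values are negative, a different placeholder would be needed.

```python
from collections import defaultdict

# Made-up long-format readings: (measurement_id, position, station, strength).
READINGS = [
    (1, (5.0, 5.0), "Station1", 62),
    (1, (5.0, 5.0), "Station2", 31),
    (2, (45.0, 5.0), "Station2", 70),
    (2, (45.0, 5.0), "Station3", 28),
]
STATIONS = ["Station1", "Station2", "Station3"]


def pivot_fixed(readings, stations=STATIONS, fill=0):
    """Return one row per measurement with one column per station."""
    position = {}
    strengths = defaultdict(dict)
    for mid, pos, station, strength in readings:
        position[mid] = pos
        strengths[mid][station] = strength
    rows = []
    for mid in sorted(position):
        row = {"MeasurementID": mid, "Position": position[mid]}
        for station in stations:
            # Stations that were not heard get the placeholder value.
            row[station] = strengths[mid].get(station, fill)
        rows.append(row)
    return rows


for row in pivot_fixed(READINGS):
    print(row)
```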

Your ultimate requirement is to express each "example" (measurement, position) in one row in a tabular data structure. That's it.

I would guess that the first representation is easier to work with and it's also better suited for most modeling algorithms.
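
For comparison, the top-N layout could be built roughly like this in plain Python. The helper `top_n_row` and the station IDs are hypothetical, and padding with `None` when fewer than N stations are heard is an assumption; many algorithms would need those missing values handled before modeling.

```python
def top_n_row(measurement_id, position, signals, n=2):
    """Flatten one measurement into a row holding its n strongest stations.

    signals maps a station ID to its measured strength (higher = stronger).
    """
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    row = {"MeasurementID": measurement_id, "Position": position}
    for rank in range(1, n + 1):
        if rank <= len(ranked):
            station, strength = ranked[rank - 1]
        else:
            # Fewer than n stations were heard: pad with missing values.
            station, strength = None, None
        row[f"Top{rank}StationID"] = station
        row[f"Top{rank}StationStrength"] = strength
    return row


print(top_n_row(1, (5.0, 5.0), {"AP7": 31, "AP2": 62, "AP9": 12}))
```

Every row has the same columns regardless of how many stations were heard, which is the fixed schema the models need.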

AlexO · May 2016

Thank you Balázs
