Samples / Help for Location Process

AlexO Member Posts: 5 Contributor II
edited November 2018 in Help

I have the following task to explore:
We want to predict the position of a WiFi Client in a certain room. We have the positions of the Access-Points and the RSSI-Values (WLAN field strength).

I watched tutorials 1-8 on YouTube and experimented with the decision tree in Studio 7.1. I am a beginner in data mining, and it is hard for me to judge whether this is the right approach.

Does anybody have samples for this task, or for similar tasks?
Is the "decision tree" the right process in Studio to get a good result?

Thank you!





  • Options
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Thanks for trying out RapidMiner! I think there are some ways you can get better results.

    First of all, most problems in data science are about the representation of the data. What does your table look like? I assume you have something like:

    Truth, WIFI-Strength1, WIFI-Strength2, WIFI-Strength3

    etc.? Thinking about a useful representation is key.

    My personal feeling (if you have a similar representation) is that a different model might be better. A Logistic Regression or an SVM inside a Polynominal by Binominal Classification operator might make sense.
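
    To illustrate the idea behind wrapping a binary learner for a multi-class label, here is a minimal Python sketch, not a RapidMiner process. It assumes the label is a zone name and the attributes are signal strengths scaled to 0..1; the zone names, the data, and the functions `train_binary`, `train_one_vs_rest`, and `predict` are all hypothetical. The decomposition shown is one-vs-rest around a tiny logistic regression; the operator's actual strategy and defaults may differ.

```python
import math

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_binary(X, y, epochs=300, lr=0.5):
    """Fit a tiny logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score(w, b, xi)))
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def train_one_vs_rest(X, labels):
    """Train one binary model per class: this class against all others."""
    return {
        c: train_binary(X, [1.0 if label == c else 0.0 for label in labels])
        for c in sorted(set(labels))
    }

def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c][0], models[c][1], x))

# Hypothetical training data: three scaled signal strengths per row.
X = [
    [0.8, 0.3, 0.2], [0.7, 0.2, 0.3],   # zone A, closest to access point 1
    [0.2, 0.8, 0.3], [0.3, 0.7, 0.2],   # zone B, closest to access point 2
    [0.3, 0.2, 0.8], [0.2, 0.3, 0.7],   # zone C, closest to access point 3
]
labels = ["A", "A", "B", "B", "C", "C"]

models = train_one_vs_rest(X, labels)
print(predict(models, [0.75, 0.25, 0.25]))
```

    Each binary model only has to answer "this zone or not", which is why a two-class learner becomes usable for a label with more than two values.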

    Can you please tell us a bit more about the structure of your data?

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    There is an academic paper somewhere that used RapidMiner to calculate the location of people using wifi signal strength. I'm not sure where it is, though.

    Try searching Google Scholar for RapidMiner + wifi or signal strength; that should give you some pointers.
  • Options
    AlexO Member Posts: 5 Contributor II
    Hello JEdward,

    Unfortunately I could not find this paper. Thank you anyway.

  • Options
    AlexO Member Posts: 5 Contributor II
    Hi Martin,

    thanks for the encouragement. The question about the data is quickly answered: I am free to define whatever data I need.
    What I will (or should) have is:
    - The number of access points (e.g. 5): data 1 .. n
    - The boundaries of the area in which I have to predict (e.g. a square of 50 x 50 meters)
    - "Learning data" (I am not sure how the position should be represented...)
      --> I want to train the system before making any predictions
    - RSSI (field strength) plus position for the learning data
    - RSSI (field strength) without position for the prediction

    That's it.

    I would be glad to get feedback.

  • Options
    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    What will be your target variable? The room the device is in, or actual X/Y coordinates? In the first case it's a classification problem, in the second it's regression. (Which could be used for classification if you have a "map" of the building - then you can calculate the room from the predicted coordinates).
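
    As a rough sketch of the "map" idea, the following Python snippet turns predicted X/Y coordinates into a room name. The room names, the rectangles in `ROOMS`, and the function `room_for` are made up for illustration; a real building map would likely need more than axis-aligned rectangles.

```python
from typing import Optional

# Hypothetical building "map": room name -> (x_min, y_min, x_max, y_max) in meters.
ROOMS = {
    "Office": (0.0, 0.0, 25.0, 50.0),
    "Lab": (25.0, 0.0, 50.0, 50.0),
}

def room_for(x: float, y: float, rooms=ROOMS) -> Optional[str]:
    """Return the first room whose rectangle contains (x, y), or None."""
    for name, (x_min, y_min, x_max, y_max) in rooms.items():
        # Half-open intervals so a point on a shared wall belongs to one room only.
        if x_min <= x < x_max and y_min <= y < y_max:
            return name
    return None

# Coordinates as a regression model might predict them (invented values).
predicted = [(3.2, 10.0), (40.5, 12.1), (60.0, 5.0)]
print([room_for(x, y) for x, y in predicted])
```

    A prediction that falls outside every rectangle yields `None`, which is a useful signal that the regression output left the mapped area.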

    You'll probably make measurements on defined points of the building and record the coordinates or the room identifier as the target variable (label). Then you can build models from this data and apply them to new data.

    You'll have a variable number of RSSIs. This is usually not easy to express in RapidMiner. So you'll probably filter for the top 3 or 5 signals and use the Pivot operator to transform the dataset so it only has one record per reading.
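
    Outside RapidMiner, the filter-and-pivot step could look roughly like the Python sketch below. It assumes the raw data is in "long" format with one row per (measurement, station) pair and RSSI in dBm; the column names, the sample values, and the function `top_n_pivot` are hypothetical and do not reproduce the Pivot operator's exact behavior.

```python
from collections import defaultdict

# Hypothetical long-format readings: (measurement_id, station_id, rssi_dbm).
readings = [
    (1, "AP1", -48), (1, "AP2", -71), (1, "AP3", -60), (1, "AP4", -85),
    (2, "AP2", -55), (2, "AP5", -67),
]

def top_n_pivot(rows, n=3):
    """Keep the n strongest signals per measurement and emit one wide row each."""
    by_measurement = defaultdict(list)
    for measurement_id, station_id, rssi in rows:
        by_measurement[measurement_id].append((station_id, rssi))

    table = []
    for measurement_id in sorted(by_measurement):
        # RSSI in dBm is negative, so a larger value means a stronger signal.
        strongest = sorted(
            by_measurement[measurement_id], key=lambda s: s[1], reverse=True
        )[:n]
        row = {"measurement_id": measurement_id}
        for rank in range(1, n + 1):
            if rank <= len(strongest):
                station_id, rssi = strongest[rank - 1]
            else:
                station_id, rssi = None, None  # fewer than n stations were heard
            row[f"top{rank}_station"] = station_id
            row[f"top{rank}_rssi"] = rssi
        table.append(row)
    return table

for row in top_n_pivot(readings, n=3):
    print(row)
```

    Measurements that heard fewer than `n` stations get `None` in the unused columns, so a later step still has to decide how to handle those missing values.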
  • Options
    AlexO Member Posts: 5 Contributor II
    The target will be coordinates. Coordinates could be X/Y or geodata. With both you can get a resolution of 1 m.

    The variable number of RSSIs is by design. There are many effects that can change the RSSI...

    So is RapidMiner the wrong choice for this project?

  • Options
    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    No, that's not what I meant. RapidMiner is of course a good solution for this problem. You just have to be smart when preparing the data.

    Models just need to have a fixed attribute schema (in each product). They can't work with non-tabular data. Many algorithms also can't work with missing data (this is again conceptual, not a RapidMiner limitation).

    Some possible solutions:

    - If you have a fixed number of stations installed, your table could be like this:

    Measurement ID; Position; Station1; Station2; ... StationN

    If no signal strength of Station5 is available, you just put 0 into it.

    RapidMiner can work well with a huge number of attributes, and the structure can be created automatically, e.g. with the Pivot operator.

    - If the number of stations is not fixed and higher than you'd like to express in the previous data structure, you could go with this:

    Measurement ID; Position; Top1StationID; Top1StationStrength; Top2StationID; Top2StationStrength; ... as long as it makes sense.

    Your ultimate requirement is to express each "example" (measurement, position) in one row in a tabular data structure. That's it.

    I would guess that the first representation is easier to work with and it's also better suited for most modeling algorithms.
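
    The first representation can be sketched in a few lines of Python. The station names, positions, and readings are invented, and `to_fixed_schema` is a hypothetical helper. The fill value defaults to 0 as described above; if the RSSI values are in dBm (negative numbers, where values closer to 0 mean a stronger signal), a very low sentinel such as -100 may be a safer choice, so it is left as a parameter.

```python
def to_fixed_schema(measurements, stations, missing=0):
    """Build one row per measurement with one column per station."""
    rows = []
    for measurement_id, position, signals in measurements:
        row = {"measurement_id": measurement_id, "position": position}
        for station in stations:
            # Stations that were not heard get the fill value.
            row[station] = signals.get(station, missing)
        rows.append(row)
    return rows

# Hypothetical fixed installation of five stations.
STATIONS = ["Station1", "Station2", "Station3", "Station4", "Station5"]

# Hypothetical readings: (measurement ID, (x, y) position, {station: RSSI}).
measurements = [
    (1, (12.0, 7.5), {"Station1": -48, "Station2": -71, "Station3": -60}),
    (2, (30.0, 41.0), {"Station2": -55, "Station5": -67}),
]

table = to_fixed_schema(measurements, STATIONS)
for row in table:
    print(row)
```

    Every row ends up with the same columns in the same order, which is the fixed attribute schema a model needs.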
  • Options
    AlexO Member Posts: 5 Contributor II
    Thank you, Balázs