Parsing Latitudes and Longitudes

Shourya (Member, Posts: 5, Newbie)
Hello,
I am using RapidMiner Studio 9.6.000 on Ubuntu 18.04.
I have latitude and longitude values expressed as, e.g., 50.833.333, where 50 is the degrees, 833 the minutes and 333 the seconds. By default the values are loaded as nominal, and I have not been able to convert them with the Format Numbers or Parse Numbers operators.
Can anyone please help me understand how I can use these values to plot maps in the visualizations?

Answers

  • jacobcybulski (Member, University Professor, Posts: 391, Unicorn)
    edited April 2020
    I am not aware of a built-in function for converting map coordinates. However, you could use regular expressions to split your nominal lat and long values into their three components (degrees, minutes and seconds) and then generate a new attribute with the formula:
    decimal_degrees = degrees + (minutes/60) + (seconds/3600)
    Alternatively, you can do the conversion in R or Python via the R or Python Scripting extension.
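    A minimal Python sketch of the same idea, assuming the values are plain dot-separated strings of the form degrees.minutes.seconds (the helper names `dms_to_decimal` and `parse_dms` are made up for illustration). It splits the string with a regular expression and applies the formula above:

```python
import re

# Matches "degrees.minutes.seconds", e.g. "29.47.22" (assumed input format).
DMS_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def dms_to_decimal(degrees: float, minutes: float, seconds: float) -> float:
    """decimal_degrees = degrees + minutes/60 + seconds/3600"""
    return degrees + minutes / 60.0 + seconds / 3600.0


def parse_dms(text: str) -> float:
    """Split a D.M.S string into its three components and convert it."""
    match = DMS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in degrees.minutes.seconds form: {text!r}")
    degrees, minutes, seconds = (int(group) for group in match.groups())
    return dms_to_decimal(degrees, minutes, seconds)


print(parse_dms("29.47.22"))
```

    The regular expression only accepts exactly three integer fields; anything else raises a `ValueError`, so malformed rows surface instead of silently becoming wrong coordinates.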
    Jacob
    P.S. I'd expect some additional information in your coordinates, such as E, W, N or S, or at least an optional sign in front of them. Also, are whole seconds precise enough for your map? I'd expect some additional decimal places at the end, e.g. 37.58.01.8.S,145.25.02.5.E.
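    A hedged Python sketch for the format in that example, assuming each value is degrees.minutes.seconds[.fraction] followed by a hemisphere letter, and treating the fraction as a decimal fraction of a second (so `01.8` means 1.8 seconds). S and W become negative, following the usual decimal-degree convention. `parse_dms_hemisphere` is a hypothetical name:

```python
import re

# degrees . minutes . seconds[.fraction] . hemisphere, e.g. "37.58.01.8.S"
PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+(?:\.\d+)?)\.([NSEW])$")


def parse_dms_hemisphere(text: str) -> float:
    match = PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match.group(1))
    minutes = int(match.group(2))
    seconds = float(match.group(3))  # "01.8" -> 1.8 seconds
    hemisphere = match.group(4)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


for lat, lon in [("37.58.01.8.S", "145.25.02.5.E"), ("29.47.22.1.N", "95.23.15.3.W")]:
    print(parse_dms_hemisphere(lat), parse_dms_hemisphere(lon))
```

    Reading the seconds together with their fraction as one float avoids having to guess whether the trailing digits are tenths or milliseconds.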


  • jacobcybulski (Member, University Professor, Posts: 391, Unicorn)
    Having said this, you need to be careful when converting map coordinates, as rounding errors can move your points by 20 to 30 meters on the ground. You need to do the conversion in double precision. It is probably best to convert your geolocations outside RapidMiner with a specialist package. On Linux there is a package called GeoConvert; I have never used it, but if you are going to put places back on maps, it will pay to spend some time doing it right.
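    To get a feel for the numbers, here is a rough Python estimate of how far rounding decimal degrees to a given number of places can move a point. It assumes a spherical Earth with a mean radius of 6,371 km, so one degree along a meridian is about 111.2 km, and east-west distances shrink with the cosine of the latitude. The function name is illustrative only:

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def max_rounding_error_m(decimal_places: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Worst-case north-south and east-west shift (in meters) caused by
    rounding a coordinate to `decimal_places` decimal degrees."""
    max_error_deg = 0.5 * 10 ** (-decimal_places)
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = max_error_deg * meters_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for places in (3, 4, 5, 6):
    ns, ew = max_rounding_error_m(places, latitude_deg=38.0)
    print(f"{places} decimals: up to {ns:.2f} m N-S, {ew:.2f} m E-W")
```

    Double precision (Python's `float`) keeps about 15 to 16 significant digits, which is far finer than any of these figures.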
  • Shourya (Member, Posts: 5, Newbie)
    I am making the file available as well. I also have numbers like 50.23 or 50.2, and I have not been able to write a regex that catches them all. So there are six formats to handle:

    1. 50.333.333
    2. 50.2
    3. 50.333
    4. 50.33
    5. 4.333.333
    6. 433.333
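    One way to handle all six formats in Python, following the idea of padding missing parts with zeros: split on the dots, read the first field as degrees and any further fields as minutes and seconds, and use 0 for fields that are absent. This is a sketch under that assumption only. It does no range checking, and some inputs are ambiguous (433.333 comes out as 433° 333', and values such as 833 minutes fall outside the usual 0 to 59 range), so confirm what each field really means before trusting the output. `parse_flexible_dms` is a hypothetical helper:

```python
import re

# One to three dot-separated integer fields: D, D.M or D.M.S
FLEXIBLE_PATTERN = re.compile(r"^\d+(?:\.\d+){0,2}$")


def parse_flexible_dms(text: str) -> float:
    text = text.strip()
    if not FLEXIBLE_PATTERN.match(text):
        raise ValueError(f"unsupported coordinate format: {text!r}")
    fields = [int(part) for part in text.split(".")]
    fields += [0] * (3 - len(fields))  # missing minutes/seconds -> 0
    degrees, minutes, seconds = fields
    return degrees + minutes / 60.0 + seconds / 3600.0


samples = ["50.333.333", "50.2", "50.333", "50.33", "4.333.333", "433.333"]
for sample in samples:
    print(sample, "->", parse_flexible_dms(sample))
```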

  • jacobcybulski (Member, University Professor, Posts: 391, Unicorn)
    edited April 2020
    I attach an example using the formats I think should be there, but you are welcome to adapt it to your needs. For example, when the minutes or seconds are missing, you can add an "if" statement that matches the pattern and, if the match fails, inserts a zero for the missing parts (the way I dealt with W, E, S and N). Check this out (you will need to save this XML as an RMP file in your repository).
    I made a correction to the millisecond translation.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="id,place,lat,long&#10;1,Cardinia Reservoir Emerald VIC 3782,37.58.01.8.S,145.25.02.5.E&#10;2,French Island VIC 3921,38.20.48.5.S,145.20.56.2.E&#10;3,1020 Studewood St Houston TX 77008 United States,29.47.22.1.N,95.23.15.3.W"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="dd_long_deg" value="parse(replaceAll(long,&quot;^([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_min" value="parse(replaceAll(long,&quot;^[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_sec" value="parse(replaceAll(long,&quot;^[0-9]+\\.[0-9]+\\.([0-9]+(\\.[0-9]+)?)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_sign" value="if(matches(long,&quot;^.*\\.W$&quot;),-1,1)"/>
              <parameter key="dd_long" value="dd_long_sign*(dd_long_deg + (dd_long_min/60.0) + (dd_long_sec/3600.0))"/>
              <parameter key="dd_lat_deg" value="parse(replaceAll(lat,&quot;^([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_min" value="parse(replaceAll(lat,&quot;^[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_sec" value="parse(replaceAll(lat,&quot;^[0-9]+\\.[0-9]+\\.([0-9]+(\\.[0-9]+)?)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_sign" value="if(matches(lat,&quot;^.*\\.S$&quot;),-1,1)"/>
              <parameter key="dd_lat" value="dd_lat_sign*(dd_lat_deg + (dd_lat_min/60.0) + (dd_lat_sec/3600.0))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Check Precision" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="dd_long_x" value="dd_long*1000"/>
              <parameter key="dd_lat_x" value="dd_lat*1000"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Check Precision" to_port="example set input"/>
          <connect from_op="Check Precision" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Also note that if you incorporate milliseconds, RapidMiner seems to be losing precision, which will translate into meters of difference.

    Jacob
  • jacobcybulski (Member, University Professor, Posts: 391, Unicorn)
    edited April 2020
    BTW, without information about East, West, North and South, your conversion will be incorrect for southern or western coordinates. So if all your coordinates come from one region, e.g. somewhere in the USA, you will have to assume the appropriate E/W and N/S hemispheres.
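    If the hemisphere has to be assumed, it can be applied as a sign after conversion. A minimal Python sketch, assuming unsigned decimal degrees as input and a caller-chosen hemisphere per axis (for places in the USA that would typically be N for latitude and W for longitude); `apply_hemisphere` is an illustrative name:

```python
def apply_hemisphere(unsigned_degrees: float, hemisphere: str) -> float:
    """Return signed decimal degrees: S and W are negative, N and E positive."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"hemisphere must be N, S, E or W, not {hemisphere!r}")
    if unsigned_degrees < 0:
        raise ValueError("expected an unsigned value")
    return -unsigned_degrees if hemisphere in ("S", "W") else unsigned_degrees


# Houston point from the sample data (29.47.22.1, 95.23.15.3), with N/W assumed
# rather than read from the input.
latitude = apply_hemisphere(29 + 47 / 60 + 22.1 / 3600, "N")
longitude = apply_hemisphere(95 + 23 / 60 + 15.3 / 3600, "W")
print(latitude, longitude)
```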
  • jacobcybulski (Member, University Professor, Posts: 391, Unicorn)
    I have just discovered that RapidMiner has a GeoProcessing extension. I have never used it, as I have always done my coordinate processing outside RapidMiner (in R or Python), but it may do the trick.
  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Posts: 955, Unicorn)
    Hi,

    I'm the author of the GeoProcessing extension, feel free to ask me. It doesn't have an operator or conversion for this format, though.

    Regards,

    Balázs