
Syntax error: I am getting ? while extracting data using XPath

Shahzad Member Posts: 4 Newbie
edited December 2018 in Help
Hi, I am trying to extract some data from the donedeal.ie website, but I am getting ? instead of values. I am not sure whether my syntax is correct.

I extracted the XPath using Google Chrome: right-click the element, choose Inspect, and copy the XPath. For example, I extracted the following XPath:
/html/body/main/div/div[1]/div/div[2]/div[2]/div[3]/div[1]/div/div[1]/div/h1

I have also tried adding h: before div and html, but it didn't help.
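For context on why the h: prefix comes up: when a page is handled as XHTML, every element lives in the XHTML namespace, so unprefixed path steps match nothing. The minimal Python sketch below (standard library only, not RapidMiner itself) shows this on a tiny made-up page; the markup is an assumption and is not the real donedeal.ie HTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny, made-up XHTML page standing in for the real site's markup.
page = f"""<html xmlns="{XHTML_NS}">
  <body>
    <main><div><h1>2012 Ford Focus</h1></div></main>
  </body>
</html>"""

root = ET.fromstring(page)

# Without a prefix, ElementTree looks for elements in *no* namespace,
# so none of the XHTML elements match.
plain = root.findall("./body/main/div/h1")

# Mapping the prefix h: to the XHTML namespace makes every step match.
ns = {"h": XHTML_NS}
prefixed = root.findall("./h:body/h:main/h:div/h:h1", ns)

print(len(plain))                    # 0
print([el.text for el in prefixed])  # ['2012 Ford Focus']
```

Note that the prefix only helps when the value is actually present as an element in the page source; if the data is delivered some other way, no XPath will find it.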

Can you please help?

Regards
/Shahzad

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,954 Community Manager
    Hi @Shahzad, can you please post your XML?

    Scott

  • Shahzad Member Posts: 4 Newbie
    edited November 2018
    Hello Scott

    The XML is pasted below. I have two processes, Adverts Process and Donedeal Process. In the Adverts process I am not able to fetch "Year"; all the other attributes are OK.

    From the Donedeal process, I can't fetch any attribute from the web page. Any help would be appreciated.

    Regards
    /Shahzad 
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,954 Community Manager
    Hi @Shahzad, for some weird reason your .txt file has no <> symbols in it, so it's impossible to paste into RapidMiner. Can you please insert the XML directly into this thread by using the ¶ button and then choosing "Code"?

    Thank you.

    Scott

  • Shahzad Member Posts: 4 Newbie
    Hello Scott

    I have tried to paste the code, but the web page is not allowing me to post the comment. I have attached a file that includes the XML tags. Hope that helps.

    Regards
    /Shahzad
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,954 Community Manager
    Hello @Shahzad, thank you for this. Some thoughts:

    - For Adverts, if you want the year of the car, why not just create a new attribute from the prefix of your Vehicle Name or Description fields, which already contain that information? As the year is always at the beginning and four digits long, you could simply take the first four characters of that field.



    - For Donedeal, the issue is that your information is in JSON format, not XML. Just use the Json path option instead of XPath in your Extract Information operator.
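The Adverts suggestion (a Year attribute taken from the leading four digits of Vehicle Name) can be sketched in Python as follows. In RapidMiner itself this would be an attribute-generation expression rather than Python; the function name `year_from_prefix` is hypothetical, and the sketch relies on the assumption stated above that the year always opens the title.

```python
import re

# Four digits at the very start of the title (after optional spaces),
# not followed by a fifth digit.
_LEADING_YEAR = re.compile(r"\s*(\d{4})(?!\d)")


def year_from_prefix(vehicle_name):
    """Return the leading four-digit year of a title like '2012 Ford Focus', else None."""
    m = _LEADING_YEAR.match(vehicle_name)
    return int(m.group(1)) if m else None


print(year_from_prefix("2012 Ford Focus 1.6"))  # 2012
print(year_from_prefix("Ford Focus 1.6"))       # None
```

Returning None instead of a garbage prefix keeps titles without a leading year as missing values rather than wrong ones.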



    If you're not familiar with JSONPath, this is always my go-to resource: https://goessner.net/articles/JsonPath/
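To show what a JSONPath expression such as `$.ads[0].year` does, here is a minimal Python sketch that resolves only the simplest subset of the syntax (root `$`, child names, array indices). It is not how RapidMiner's Extract Information operator is implemented, and the JSON structure and field names (`ads`, `title`, `year`) are made up for illustration.

```python
import json
import re

# One JSONPath step: ".name" or "[index]". A deliberately tiny subset of
# JSONPath: no wildcards, filters or recursive descent.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_path_get(doc, path):
    """Resolve a simple JSONPath such as '$.ads[0].year' against parsed JSON.

    Returns None when a step is missing, much like a '?' (missing value)
    in RapidMiner.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node, pos = doc, 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported JSONPath syntax at {path[pos:]!r}")
        name, index = m.groups()
        try:
            node = node[name] if name is not None else node[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
        pos = m.end()
    return node


# Hypothetical JSON; the real field names on donedeal.ie are not known here.
page_json = '{"ads": [{"title": "2012 Ford Focus", "year": 2012}]}'
doc = json.loads(page_json)

print(json_path_get(doc, "$.ads[0].year"))  # 2012
print(json_path_get(doc, "$.ads[1].year"))  # None
```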

    Scott

  • kayman Member Posts: 510 Unicorn
    http://jsonpath.com/ is an easy-to-use online tool for testing your JSON paths.
    Combined with Scott's link, it has already saved me a lot of time.
  • Shahzad Member Posts: 4 Newbie
    Thanks for the update, guys. In a few cases the year is not part of the Vehicle name, so JSON won't work. I have used the cut operator to extract the year from the Vehicle name, but as mentioned, if the year is not in the Vehicle title then I am back to square one :(
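When the year is not always at the start of the title, one option is to search several text fields (for example Vehicle Name, then Description) for the first plausible year anywhere in the text. The sketch below assumes model years look like 19xx or 20xx; `find_year` is a hypothetical helper, and if none of the fields passed in mentions a year, the result stays missing.

```python
import re

# A plausible model year (assumption: 19xx or 20xx), not glued to other digits.
YEAR_RE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def find_year(*fields):
    """Return the first plausible year found in the given text fields, in order, else None."""
    for text in fields:
        if not text:
            continue  # skip empty or missing fields
        m = YEAR_RE.search(text)
        if m:
            return int(m.group(1))
    return None


print(find_year("Ford Focus 1.6 TDCi", "Reg 2011, one owner"))  # 2011
print(find_year("Ford Focus", "Full service history"))          # None
```

A bare number such as the 2000 in "2000cc" would also match, so in practice the pattern may need tightening for this data.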

    I am not sure whether the website is badly designed or whether the information in the grid simply cannot be accessed via XPath.

    Regards
    /Shahzad