
[SOLVED] Save/export data to Excel file (after crawling and extracting info with XPath)

PASEE Member Posts: 5 Contributor II
edited November 2018 in Help
Hey folks,

I'm working on a project where I want to download data from a website (crawl), filter this data using XPath (and maybe select some of the attributes), and save the extracted attributes to a file (e.g. Excel).

What I'm currently doing:
1. Crawl the web (using "Crawl Web")
Result of this step: various HTML files in a given directory

2. Load the files and extract information with XPath (using "Process Documents from Files" and "Extract Information")
Result of this step: an ExampleSet with some metadata (file, path, date) and my XPath attributes, e.g. "title", "header1", etc.

So far, everything seems to work fine.

What's not working yet:
3. (Optional) Select certain attributes (the "Select Attributes" operator only lists the metadata attributes, but not my XPath attributes)
4. Save to an Excel file (I have absolutely no idea how this can be achieved)

This shouldn't be too difficult, but after watching all the vancouverdata/neuralmarkettrends/rapid-i tutorials I still haven't found out how to do it. I've tried a lot of operators and searched Google and this forum, but there seems to be no documentation that answers this question.

Thanks in advance for your help!

best regards,
PASEE

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey PASEE,

    In Select Attributes you can type the attribute names by hand; the drop-down list is only a best-effort aid for the user.

    You can write Excel files with the Write Excel operator.

    Best, Marius
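
For readers who want to see the same idea outside RapidMiner, here is a minimal Python sketch of the workflow: extract values with XPath, select attributes by typing their names, and write a file that Excel can open. It is an illustration only, not how the RapidMiner operators are implemented. The page contents, the `PAGES` and `XPATHS` names, and both helper functions are hypothetical; the standard library's ElementTree supports only a subset of XPath and needs well-formed markup, and the output is CSV rather than a native .xlsx file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML pages standing in for the crawled files.
PAGES = {
    "page1.html": "<html><head><title>First page</title></head>"
                  "<body><h1>Welcome</h1></body></html>",
    "page2.html": "<html><head><title>Second page</title></head>"
                  "<body><h1>News</h1></body></html>",
}

# Attribute name -> XPath expression (ElementTree's limited XPath subset).
XPATHS = {"title": "./head/title", "header1": ".//h1"}


def extract_rows(pages, xpaths):
    """Build one row per page: the file name plus one value per XPath query."""
    rows = []
    for filename, markup in pages.items():
        root = ET.fromstring(markup)
        row = {"file": filename}
        for name, path in xpaths.items():
            # findtext returns the text of the first match, or the default.
            row[name] = root.findtext(path, default="")
        rows.append(row)
    return rows


def write_selected(rows, selected, out):
    """Write only the attributes named in `selected`, typed by hand."""
    writer = csv.DictWriter(out, fieldnames=selected, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(PAGES, XPATHS)
buffer = io.StringIO()  # stand-in for an output file opened with newline=""
write_selected(rows, ["title", "header1"], buffer)
csv_text = buffer.getvalue()
```

Here the attribute names passed to `write_selected` are plain strings, so they do not need to appear in any list beforehand; names that are not selected (such as `file`) are simply left out of the output.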
  • PASEE Member Posts: 5 Contributor II
    Thanks, that helped, although I'm surprised I didn't try this. I think I had the same problem as someone else.

    I was puzzled by the "buffered file" message at the end, and there was no way to define an output file (so I couldn't find one). This was solved by removing the connection from the "fil" port to the result connector.

    Selecting the attributes worked as well, but I think it's not intuitive. I wasn't expecting it to work, since my attributes were not listed. (Just a remark for future development.)

    Thanks again and best regards!
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    The case of the "fil" output is a bit counterintuitive; in the next release, at least, it won't be autoconnected anymore.
    As for the attributes not being displayed: it is technically not always possible to list all attributes without actually executing the process, because they depend not only on the structure of the input data (which can be determined quickly) but also on its actual contents.
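
The distinction between structure-dependent and content-dependent attributes can be sketched loosely in Python. This is an analogy under stated assumptions, not RapidMiner's actual metadata propagation: both function names are hypothetical, and the "one attribute per distinct word" step is just one example of a transformation whose output columns cannot be known before the data is read.

```python
def static_attributes(configured_names):
    """Names that follow from the configuration alone, without reading data."""
    return ["file", "path", "date"] + list(configured_names)


def content_dependent_attributes(documents):
    """Names that exist only after reading the data: one per distinct word."""
    names = set()
    for text in documents:
        names.update(text.lower().split())
    return sorted(names)


# Known up front from the (hypothetical) configuration:
known_early = static_attributes(["title", "header1"])

# Known only after the documents have actually been processed:
known_late = content_dependent_attributes(["Hello world", "hello RapidMiner"])
```

A tool can list `known_early` in a drop-down immediately, while `known_late` requires running the step on real input, which is why typing a name by hand can work even when the name is not offered.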