Quant/Qualitative analysis - Key Search Terms and Website Analysis help?

Haines1997 Member Posts: 1 Learner I
edited November 2018 in Help

Hi all, 


For my research, I want to analyse the text of a set of websites in their entirety (excluding any files hosted on the sites), searching for a predetermined list of key terms and ranking them on searchability and quality. The search terms are my own and may need translating, since some of the websites are international.


Is there a way either to copy and paste the data (e.g. the website text and the key search terms) into a given format, such as Excel, or to feed all of the website URLs into the system, in order to produce:

A count of words/phrases (e.g. the number of occurrences of a particular term when searched for)

The location of the terms in question on the web pages, to feed the searchability and quality outputs above.


Also, is it possible to give the data a baseline, from what would be classed as a "control" website, to compare that to other sites?


I believe I'll have to do much of this manually by going through the web pages and judging the quality of the text; however, any help with the above would be greatly appreciated.






    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Some of this can be done using the text mining and web mining extensions.  Scraping the web pages (if it's allowed by the Terms of Use) can be automated, as can the parsing of all the content into words and then counting them.  
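
    Outside RapidMiner, the parse-and-count step can be roughed out in a few lines of Python. This is only a minimal sketch using the standard library: the names `TextExtractor`, `html_to_text` and `count_terms`, the sample HTML and the sample terms are all made up for illustration, and it assumes the page HTML has already been fetched (and that fetching it is permitted). Matching is case-insensitive on whole words, which may not suit every language.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term or phrase."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical page and search terms, for illustration only.
sample_html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Data privacy</h1>"
    "<p>Our privacy policy explains data privacy.</p>"
    "</body></html>"
)
term_counts = count_terms(html_to_text(sample_html), ["data privacy", "policy"])
print(term_counts)
```

    The resulting dictionary (one row per site, one column per term) can be written to CSV and opened in Excel.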

    However, specifying the location on the web page would be quite tricky. Is there even a single defined metric for that?
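
    There is no standard metric that I know of, but one candidate (purely an assumption here, not something built into RapidMiner) is the relative position of each match in the extracted page text, where 0.0 is the very top and values near 1.0 are the bottom. A rough Python sketch, with the hypothetical function name `term_positions` and a made-up sample string:

```python
import re


def term_positions(text, term):
    """Return each match's start offset as a fraction of the text length."""
    if not text:
        return []
    pattern = r"\b" + re.escape(term) + r"\b"
    return [
        round(match.start() / len(text), 3)
        for match in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]


# Hypothetical extracted page text, for illustration only.
page_text = "Privacy notice. We sell shoes. Read our privacy policy."
positions = term_positions(page_text, "privacy")
print(positions)
```

    Note that this measures position in the text stream, not where the term appears visually once the page is rendered, so it is only a proxy.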

    Some of your more subjective judgments about usability and searchability would also be difficult to automate, so you may end up having to do some parts manually and others in RapidMiner.
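
    The comparison against a "control" website is one part that could probably be automated once the counts exist. As a sketch only: normalise each term count to occurrences per 1,000 words, then subtract the control site's rate. The function names and the numbers below are invented for illustration, and per-1,000-word rates are just one possible normalisation.

```python
def rate_per_1000(count, total_words):
    """Occurrences per 1,000 words; 0.0 for an empty page."""
    return 1000.0 * count / total_words if total_words else 0.0


def compare_to_control(site_counts, site_words, control_counts, control_words):
    """Per-term difference in rate (site minus control), per 1,000 words."""
    differences = {}
    for term, control_count in control_counts.items():
        site_rate = rate_per_1000(site_counts.get(term, 0), site_words)
        control_rate = rate_per_1000(control_count, control_words)
        differences[term] = site_rate - control_rate
    return differences


# Hypothetical counts and page lengths, for illustration only.
control_counts = {"privacy": 10, "policy": 4}
site_counts = {"privacy": 3}
differences = compare_to_control(site_counts, 1500, control_counts, 2000)
print(differences)
```

    A negative value means the term is rarer on the site than on the control; terms missing from the site are treated as a count of zero.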


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts