Which approach and model to use to optimize keyword selection?

louismlouism Member Posts: 8 Contributor II
edited November 2018 in Help

I have the following data:

1) Web pages ranked by traffic.  I don't have the absolute traffic, but I know a given page ranks 1st, 2nd etc... in terms of total page views.

2) A list of search strings and their relative popularity.  So again, I don't know how often they are used, but I know for example the search term 'Ford' is more frequently searched for than 'clutch'.

3) For each page, I have its ranking for any particular search string.  So I know, for example, that Page A would be the first result for 'Ford' and Page B would be result number 174 for the same search string.

4) I have a few other signals, both numerical and categorical, e.g. page age, page topic classification, etc.

I think you can see me coming here... :)

I am trying to find the combination of search strings a page should cater to in order to rank as high as possible in terms of total traffic.  The results have to be human readable so I can understand them and act upon them.

I can scatter plot search terms by frequency of use against the number of results returned.  Intuitively, the search terms that are used often and return few results are the best to target, since there is a higher probability that my page is seen among the limited results.  The problem is that all my ranking data is relative, and I know that search term frequency falls off roughly exponentially with rank, i.e. the most frequent search term is probably used 10x more often than the 5th one, which is in turn used 10x more often than the 10th one.  Also, some search terms return 10,000 results and others return 100.  This makes it very difficult to estimate the shape of the optimum region of frequency vs. number of returned results.
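To make the intuition concrete, here is a minimal sketch of the kind of score I have in mind. It is only an assumption-laden illustration: the 10x drop every 5 ranks is my rough guess from above, the function names and the `ranks_per_decade` parameter are made up for this example, and the three terms with their ranks and result counts are toy values, not my real data.

```python
# Sketch: turn a popularity rank into an estimated relative frequency,
# then score each term as (estimated frequency) / (number of results).
# ASSUMPTION: frequency drops by a factor of 10 every `ranks_per_decade` ranks.

def estimated_frequency(rank, ranks_per_decade=5.0):
    """Relative frequency implied by a 1-based rank (rank 1 maps to 1.0)."""
    return 10.0 ** (-(rank - 1) / ranks_per_decade)


def term_score(rank, n_results, ranks_per_decade=5.0):
    """Higher is better: often searched, few competing results."""
    return estimated_frequency(rank, ranks_per_decade) / n_results


# Toy, made-up values purely for illustration.
terms = [
    {"term": "ford", "rank": 1, "n_results": 10000},
    {"term": "clutch", "rank": 6, "n_results": 100},
    {"term": "gasket", "rank": 11, "n_results": 500},
]

ranked = sorted(
    terms,
    key=lambda t: term_score(t["rank"], t["n_results"]),
    reverse=True,
)

for t in ranked:
    print(t["term"], term_score(t["rank"], t["n_results"]))
```

The ordering is very sensitive to the assumed decay rate, which is exactly my problem: with only relative ranks I cannot tell whether `ranks_per_decade` should be 5 or 50.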

I don't want to predict position or anything like that.  I just want to know which terms to target to get the maximum boost.  For example, I'd be very happy to know which combinations of terms are more common among pages in the first quantile of traffic than among the others.
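As a rough illustration of that quantile comparison, something like the sketch below is what I imagine. Everything in it is hypothetical: the `share_difference` function, the `traffic_rank` and `positions` field names, the cutoff of position 10 for "ranks well", and the four toy pages. It only uses relative ranks, so it does not need absolute traffic.

```python
from itertools import combinations

# Sketch: for each combination of `size` search terms, compare the share of
# top-traffic pages that rank well for ALL terms in the combination with the
# share among the remaining pages. A large positive difference suggests the
# combination is over-represented among high-traffic pages.

def share_difference(pages, size=1, top_fraction=0.25, good_position=10):
    ordered = sorted(pages, key=lambda p: p["traffic_rank"])
    cut = max(1, int(len(ordered) * top_fraction))
    top, rest = ordered[:cut], ordered[cut:]
    all_terms = sorted({t for p in pages for t in p["positions"]})

    def share(group, combo):
        if not group:
            return 0.0
        hits = sum(
            1
            for p in group
            # A missing term counts as "does not rank well".
            if all(p["positions"].get(t, float("inf")) <= good_position
                   for t in combo)
        )
        return hits / len(group)

    return {
        combo: share(top, combo) - share(rest, combo)
        for combo in combinations(all_terms, size)
    }


# Toy, made-up pages: traffic rank plus result position per search term.
pages = [
    {"page": "A", "traffic_rank": 1, "positions": {"ford": 1, "clutch": 3}},
    {"page": "B", "traffic_rank": 2, "positions": {"ford": 5, "clutch": 174}},
    {"page": "C", "traffic_rank": 3, "positions": {"ford": 174, "clutch": 2}},
    {"page": "D", "traffic_rank": 4, "positions": {"ford": 300}},
]

singles = share_difference(pages, size=1, top_fraction=0.5)
pairs = share_difference(pages, size=2, top_fraction=0.5)
print(singles)
print(pairs)
```

The output is at least human readable (one number per term or combination), but it ignores how popular each term is and the number of combinations explodes quickly as `size` grows, so I am not sure this is the right approach.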

What would you use to analyse this?

Thank you so much!  I read a ton of stuff and my brain is so overloaded I don't even know where to start...

PS: No, this is not data from Google...  Although I am sure thousands of researchers have tried to estimate which signals they use using similar techniques. :)