It's the same problem I have. Did you solve it?
Would you have a practical, high-level example?
What I sometimes do is use a reverse logic flow. If the tokens can be used to remove the 'good content' (for example, words in the language you are looking for), then what remains is the 'bad content'. That remainder can then be used as a filter on your original set: items that leave little or nothing behind are the good ones, so you identify them by subtracting the outcome of the token process.
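A minimal sketch of that reverse filter, assuming the 'tokens' are a set of known words in the target language and the items are plain strings. The names `tokenize`, `leftover`, `split_by_leftover` and `max_bad_ratio` are hypothetical, just for illustration, and the regex tokenizer is a rough stand-in for whatever tokenizer you actually use:

```python
import re

def tokenize(text):
    # Lower-case word tokens; \w+ is a rough stand-in for a real tokenizer
    return re.findall(r"\w+", text.lower())

def leftover(text, good_tokens):
    # Remove the 'good content'; whatever is left is the 'bad content'
    return [tok for tok in tokenize(text) if tok not in good_tokens]

def split_by_leftover(texts, good_tokens, max_bad_ratio=0.0):
    # Use the leftover as a filter on the original set: items whose share of
    # leftover tokens is at or below max_bad_ratio count as good.
    # Texts with no tokens at all are treated as bad.
    good, bad = [], []
    for text in texts:
        total = len(tokenize(text))
        bad_share = len(leftover(text, good_tokens)) / total if total else 1.0
        (good if bad_share <= max_bad_ratio else bad).append(text)
    return good, bad

if __name__ == "__main__":
    vocab = {"the", "cat", "sat", "on", "mat"}  # hypothetical target-language word list
    texts = ["The cat sat on the mat", "Le chat est sur le tapis", "The chat sat"]
    print(split_by_leftover(texts, vocab))
    # prints (['The cat sat on the mat'], ['Le chat est sur le tapis', 'The chat sat'])
```

With the default `max_bad_ratio=0.0` only fully 'clean' items pass; raising it (say to 0.5) tolerates a few stray tokens, which lets "The chat sat" through as well.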
Not sure if it's helpful; there may be other and better ways, but more details about your setup would help.