
How to loop through a Python data set?

kaymankayman Member Posts: 652   Unicorn
edited November 2018 in Help

Hi there, I'm a bit stuck on how to use the pandas data set when running some Python scripts.

The basic idea is to use a Python script to check what language an example is written in. I have record sets that consist of a title field and some other fields, in a variety of languages. I use Python to check which language the title is in, keep the English ones, and ignore the rest.

 

Below is the (simplified) code I use :

 

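import pandas
import translator

cl = translator.check_language

def rm_main(data):
    # t is the whole column (a pandas Series), not a single title
    t = data["my_title_field"]
    try:
        l = cl(t)
    except:
        l = 'undefined'
    # a single value assigned here is copied to every row
    data['detected_lang'] = l
    return data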

 

This works fine if I filter my data set down to a single row, but if I send multiple rows they are all assigned the same language. So this means I need to iterate through the data, but I can't get it to work. I tried a few ways (including the one below) but always get an unhelpful parse error, so I am a bit stuck. What would be the correct way to iterate through the pandas data set, apply the change to each row, and then return the set?
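A small sketch of why every row can end up with the same value, assuming `data` arrives as a pandas DataFrame: selecting a column yields the whole column rather than one title, and assigning a single value to a column copies it to all rows. The sample titles are made up for illustration.

```python
import pandas as pd

# Made-up sample data standing in for the real record set.
data = pd.DataFrame(
    {"my_title_field": ["Hello world", "Bonjour le monde", "Hallo Welt"]}
)

# Selecting a column returns the whole column as a Series, not one title.
titles = data["my_title_field"]
column_type = type(titles).__name__
column_length = len(titles)

# Assigning a single scalar to a column broadcasts it to every row.
data["detected_lang"] = "undefined"
unique_langs = data["detected_lang"].unique().tolist()

print(column_type, column_length, unique_langs)
```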

 

This did not work :

 

def rm_main(data):
    langs = []
    for row in data.iterrows():
        try:
            rl = msc.detect_lang(row["title_field"])
        except:
            rl = "undefined"
        langs.append(rl)
    data['langs'] = langs
    return data

 

Any advice?

Best Answer

  • kaymankayman Member Posts: 652   Unicorn
    Solution Accepted

    Never mind, I stupidly forgot to add the index, so it couldn't work.

     

    If anybody ever wants to do the same thing, this script works:

     

    def rm_main(data):
        langs = []
        # iterrows() yields (index, row) pairs, so unpack both
        for index, row in data.iterrows():
            s = row["my_check_field"]
            try:
                rl = ...  # placeholder: do something smart with s here
            except:
                rl = "undefined"
            langs.append(rl)
        data['langs'] = langs
        return data
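For anyone who prefers to avoid the explicit loop, the same per-row check can probably also be written with `Series.apply`. This is only a sketch: `detect_language` is a hypothetical stand-in (a toy keyword check, not a real detector), `safe_detect` is a helper name invented here, and the sample rows are made up. Swap in whatever language check you actually use.

```python
import pandas as pd


def detect_language(text):
    """Hypothetical stand-in detector; replace with a real language check."""
    if not isinstance(text, str):
        raise TypeError("expected a string")
    english_markers = {"the", "and", "of", "is"}
    words = set(text.lower().split())
    return "en" if words & english_markers else "other"


def safe_detect(text):
    # Fall back to "undefined" when the detector raises, as in the loop version.
    try:
        return detect_language(text)
    except Exception:
        return "undefined"


def rm_main(data):
    # apply() calls safe_detect once per value in the column.
    data["langs"] = data["my_check_field"].apply(safe_detect)
    return data


# Made-up sample rows, including a missing title.
example = pd.DataFrame(
    {"my_check_field": ["The cat is here", "Le chat est ici", None]}
)
result = rm_main(example)

# Keep only the rows detected as English, as described in the question.
english_only = result[result["langs"] == "en"]
print(result)
```

Wrapping the detector in a small helper keeps the fallback to "undefined" in one place, so one bad title does not stop the whole run.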
