How to loop through a pandas data set in Python?
Hi there, I'm a bit stuck on how to use a pandas data set when running some Python scripts.
The basic idea is to use a Python script to check what language an example is written in. I have record sets that consist of a title field and some other fields, in a variety of languages. I use Python to check which language the title is in, keep the English ones and ignore the rest.
Below is the (simplified) code I use:
import pandas
import translator
cl=translator.check_language
def rm_main(data):
    t=data["my_title_field"]
    try:
        l=cl(t)
    except:
        pass
        l='undefined'
    data['detected_lang']=l
    return data
This works fine if I filter my data set down to a single row, but if I send multiple rows they are all assigned the same language. So this means I need to iterate through the data, but I can't get it to work. I tried a few ways (including the one below) but always get a meaningless parse error, so I am a bit stuck. What would be the correct way to iterate through the pandas data set, apply the change to each row, and then return the set?
This did not work:
def rm_main(data):
    langs=[]
    for row in data.iterrows():
        try:
            rl = msc.detect_lang(row["title_field"])
        except:
            pass
            rl = "undefined"
        langs.append(rl)
    data['langs']=langs
    return data
Any advice?
Best Answer
kayman
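Never mind, I stupidly forgot to unpack the index, so it couldn't work: iterrows() yields (index, row) pairs, so my loop variable was a tuple rather than the row itself.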
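To see what iterrows() actually hands back on each pass, here is a minimal sketch on a made-up two-row frame. The rows are invented for illustration; only the column name is borrowed from the question.

```python
import pandas as pd

# Toy frame: the column name mirrors the question, the rows are made up.
data = pd.DataFrame({"title_field": ["Hello world", "Bonjour le monde"]})

# Each item from iterrows() is an (index, row) tuple, not the row itself.
first_item = next(data.iterrows())
print(type(first_item))

# Unpacking gives the row as a Series, which can be indexed by column name.
index, row = first_item
print(index, row["title_field"])
```

Indexing the tuple with a column name raises an error, and a bare except around that lookup will hide it and mark every row as "undefined".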
If anybody ever wants to do the same thing, this script works:
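def rm_main(data):
    langs=[]
    # iterrows() yields (index, row) pairs, so unpack both
    for index,row in data.iterrows():
        s=row["my_check_field"]
        try:
            rl = do_something_smart(s)  # placeholder: your own per-row check goes here
        except Exception:
            rl = "undefined"
        langs.append(rl)
    # one entry per row, in row order
    data['langs']=langs
    return data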
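For what it's worth, the same per-row idea can probably also be written without an explicit loop by using Series.apply. This is only a sketch under assumptions: detect_lang below is a hypothetical stand-in (a crude keyword check, not a real language detector), safe_detect is a helper name I made up, and the example rows are invented. Swap in whatever detection function you actually use.

```python
import pandas as pd


def detect_lang(text):
    """Hypothetical stand-in detector; replace with a real language check."""
    if not isinstance(text, str):
        raise TypeError("title must be a string")
    english_hints = {"the", "and", "of", "to"}
    words = set(text.lower().split())
    return "en" if words & english_hints else "other"


def safe_detect(text):
    """Return the detected language, or 'undefined' if detection fails."""
    try:
        return detect_lang(text)
    except Exception:
        return "undefined"


def rm_main(data):
    # apply() calls safe_detect once per value in the column.
    data["langs"] = data["my_check_field"].apply(safe_detect)
    return data


# Invented example rows, including a missing title.
example = pd.DataFrame(
    {"my_check_field": ["The art of war", "Le petit prince", None]}
)
result = rm_main(example)
print(result)
```

Once the column is filled in, keeping only the English rows should be a matter of filtering on it, for example result[result["langs"] == "en"].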