Python script problems need help

JustMe Member Posts: 1 Learner I
edited May 2022 in Help
I have some lines of Python that fill in missing values using text scraping. I would like to execute this Python code within RapidMiner. When I run it in PyCharm it works just fine and gives the expected output.

So I wrapped the code in def rm_main(data): but when I run the process I get an error saying that a certain item doesn't have the correct data type. For example: Process failed: Could not parse this value (attribute 'year') as number: 'Acura'

Does anyone else have this problem?
How can I debug the code within RapidMiner?

Thanks in advance! :)

    import math


    def rm_main(data):
        print(type(data))

        # your code goes here
        df1 = data

        # --> year
        # The first variable with empty values that I deal with is "year".
        # Looking at the other variables for info that could help fill in the
        # NaN values, I found that the "description" column contains the year.
        # The problem is that "description" also has NaN values. The good news
        # is that only 27 values in "description" are empty.
        # Of those 27 rows, 3 already have "year" filled in, and the rest have
        # almost nothing filled in for the other columns either.
        # My solution: fill "year" from "description" where "description" is
        # not empty, and drop the whole row for the 24 observations where
        # "description" and several other columns are empty, since there is
        # not enough info.
        # First, drop the rows where both "year" and "description" are null.
        df1.dropna(subset=['year', 'description'], how='all', inplace=True)
        df1['year'] = df1.apply(lambda x: x['description'][:5] if math.isnan(x['year']) else x['year'], axis=1)

        # After filling the NaN values in "year", some values are not years,
        # such as '92 to', '03 je', 'Auto ' and 'Nice '. I fix these by hand.
        df1['year'] = df1.apply(lambda x: 1992 if x['year'] == '92 to' else x['year'], axis=1)
        df1['year'] = df1.apply(lambda x: 2003 if x['year'] == '03 je' else x['year'], axis=1)
        df1.drop(df1.loc[df1['year'] == 'Auto '].index, inplace=True)
        df1.drop(df1.loc[df1['year'] == 'Nice '].index, inplace=True)
        return df1
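
A possible explanation, offered as an assumption rather than a confirmed diagnosis: `x['description'][:5]` copies the first five characters of the description into "year", so a description that starts with the make puts a string such as 'Acura' into a column that RapidMiner still expects to be numeric. The sketch below is one way to keep "year" numeric before returning it. The helper name `fill_year_from_description` and the four-digit-year pattern are hypothetical choices, not part of the original script, and unlike the original this variant does not handle two-digit years such as '92'.

```python
import pandas as pd


def fill_year_from_description(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing 'year' values from a four-digit year found in 'description'.

    Hypothetical helper: assumes years appear as 19xx or 20xx in the text.
    """
    # Work on a copy and drop rows where both columns are missing.
    out = df.dropna(subset=["year", "description"], how="all").copy()

    # Take the first 19xx/20xx token from the description, if there is one.
    extracted = out["description"].str.extract(r"\b((?:19|20)\d{2})\b", expand=False)

    # Coerce both sources to numbers; anything unparseable becomes NaN.
    year = pd.to_numeric(out["year"], errors="coerce")
    out["year"] = year.fillna(pd.to_numeric(extracted, errors="coerce"))

    # Drop rows that still have no usable year so the column can be integer.
    return out.dropna(subset=["year"]).astype({"year": "int64"})


def rm_main(data):
    result = fill_year_from_description(data)
    # Printing the dtypes is a cheap check that 'year' is numeric on return.
    print(result.dtypes)
    return result
```

Calling `rm_main` on a small hand-built DataFrame in PyCharm and checking the printed dtypes reproduces the type check outside RapidMiner, which may be easier than debugging inside the process.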