Google Cloud Translate via Web Enrichment Transform
paul_balas
Member Posts: 11 Contributor II
in Help
Hi,
I am attempting to translate Spanish and Indonesian text to English via the Google Translate API. I am using the example posted on this board, with the following URL:
*** I split this apart as RapidMiner thinks I'm posting a nefarious link... ***
translation . googleapis . com / language / translate / v2 / detect?key=mykey
I have a 'Read Database' query that gets the data I need. I named the field with the text to be translated 'ActionDesc'.
I have set the following on the 'Enrich Data by Webservice' control:
query type = JSONPath
attribute type = nominal
jsonpath queries with attribute name = ActionDescTrans and query expression = $..translatedText
request method = POST
service method = foo
body =
{
'q': '<%ActionDesc%>',
'target': 'en',
'format': 'text'
}
request properties: property = ActionDesc, value = ActionDesc
The process runs and returns the 10 rows I expect, but the ActionDescTrans field just has '?'.
Any suggestions would be helpful.
TIA,
Paul
Best Answer
paul_balas Member Posts: 11 Contributor II
Here's what ended up working for me:
You'll need the following process flow at a minimum:
1. A source of data - mine is a database query - this is the data you want to translate
2. An 'Encode URL' operator - to clean up the string you'll pass to the next step
3. An 'Enrich Data by Webservice' operator - this is to call the Google Cloud Translate API
4. Optional - add a target to write your results to (I used Excel).
My data set had 3,483 records (most in Spanish, some in English, with no way to know which in advance). End to end, the process took 1.5 hours to run. I ran it locally; I assume it would run faster had I deployed it to a RapidMiner Server in Google Cloud. I decided to write the intermediate output of the translation to an Excel workbook. My next step will be to parse the output from Google Translate in order to extract entities and do semantic analysis. Here is what the output for a translated title looks like:
{"data": {"translations": [{"translatedText": "ENVIRONMENTAL EVENT LEVEL1 - DUST GENERATION","detectedSourceLanguage": "es"}]}}
My Example Workflow
Set up your data source (in my case, a query that pulls only the fields I needed to translate). Please remember that calling the Google Translate API costs money or credits, so while you are testing, limit the number of rows you process to 10 or so. That lets you fix bugs faster and keeps your development cost down.
Next you have to encode the URL. If you don't know what that does, here you go: "Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces."
In the 'Encode URL' operator, select the field from your query that you want to translate. The next part is pretty important: you need to choose the right 'encoding' method. If you choose 'UTF-8', that will work fine on short text fields, but if you have a paragraph, you'll need to set the encoding to 'UTF-16'. I'm no UTF expert, but this solved my issue when encoding a larger field. *** UPDATE *** Choose 'SYSTEM' encoding instead of UTF-8 or UTF-16. I discovered that things like accent marks ended up being decoded into '%00' text in the results. With SYSTEM encoding, your results should be fine and you won't get strange decoded text.
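To see why the encoding choice matters, here is a minimal Python sketch. It uses Python's urllib rather than RapidMiner's 'Encode URL' operator, so treat any parallel with the operator's behavior as an assumption. It percent-encodes the same text as UTF-8 and as UTF-16: the UTF-16 version pads every ASCII character with a zero byte, which shows up as '%00' and is one plausible source of the stray '%00' results. The sample text is made up for illustration.

```python
from urllib.parse import quote

# Hypothetical sample text, not taken from the original data set.
text = "GENERACIÓN DE POLVO"

# UTF-8: ASCII letters pass through unchanged, the accented letter becomes
# two escaped bytes, and each space becomes %20.
utf8_encoded = quote(text, safe="", encoding="utf-8")

# UTF-16 (little-endian, no byte-order mark): every character takes at
# least two bytes, so each ASCII letter is followed by an escaped zero byte.
utf16_encoded = quote(text, safe="", encoding="utf-16-le")

print(utf8_encoded)
print(utf16_encoded)
```

If a service expects UTF-8 and receives the second form, those extra '%00' bytes end up in the decoded text.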
Next, you use the 'Enrich Data by Webservice' control. Before you do this, you need to set up your Google Cloud Project (below).
This part was a bit of trial and error, with help from YY at RapidMiner. You need to update a few dialogs here and set the 'url' field. Have your Google API Security Token ready (explained later). Here is my example 'url' setting:
https://translation.googleapis.com/language/translate/v2?q=<%Event_Title%>&target=en&key=YOURKEY
Notice that I put 'Event_Title' in the URL to correspond with the parameter I encoded in the 'Encode URL' control. Wrap your parameter like this: <%YOUR PARAMETER%>
The 'target=en' parameter translates the incoming text to English. The API auto-detects the language of the text you send it, so there is no need to define that.
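For anyone who wants to check the request outside RapidMiner, here is a rough Python sketch of how the same URL is put together. The endpoint and the q, target and key parameters come from the URL above; the helper name build_translate_url and the sample text are hypothetical. It only builds the URL and does not call the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(text, api_key, target="en"):
    """Return a GET URL for the translate endpoint with the text escaped.

    Hypothetical helper for illustration, not part of any library.
    """
    # urlencode escapes each value as UTF-8 (spaces become '+'), which plays
    # roughly the role of the 'Encode URL' step in the RapidMiner process.
    params = {"q": text, "target": target, "key": api_key}
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


# Made-up sample text and a placeholder key.
url = build_translate_url("GENERACIÓN DE POLVO", "YOURKEY")
print(url)
```

Fetching that URL (in a browser or with urllib.request) makes a real, billable call, so test with a handful of rows first.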
The 'regular expression queries' section needs an attribute to hold your translated text, which is returned via the query expression '.*'. In my case I call it 'TranslatedTitle'; it shows up as an extra attribute on the result set and contains the translated text for the Event_Title field.
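Because '.*' captures the whole response, the new attribute holds the raw JSON shown earlier rather than just the translated sentence. Here is a small Python sketch of pulling the text out afterwards. The response string is copied from the sample output above; the function name extract_translation is hypothetical.

```python
import json

# Sample response copied from the output shown earlier in this post.
raw = (
    '{"data": {"translations": [{"translatedText": '
    '"ENVIRONMENTAL EVENT LEVEL1 - DUST GENERATION",'
    '"detectedSourceLanguage": "es"}]}}'
)


def extract_translation(response_text):
    """Return (translated text, detected source language) from one response."""
    first = json.loads(response_text)["data"]["translations"][0]
    # Use .get() for the language in case the field is absent.
    return first["translatedText"], first.get("detectedSourceLanguage")


translated, source_lang = extract_translation(raw)
print(translated, source_lang)
```

The same two fields are what the JSONPath expression $..translatedText from the original question is aiming at.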
Next, update the 'request properties' 'Edit List' section. Here you replace the value with your Google API Security Token (yes, explained below...). You'll need to put the token in the URL as well (no clue why, but it didn't work without it).
GOOGLE CLOUD PROJECT SETUP
1. You'll need an account on GCP: https://console.cloud.google.com
2. Set up a project and then go to the API library: https://console.cloud.google.com/apis/library
3. Then click on the 'Cloud Translation API' (you need a valid billing method, so set that up first)
4. Then enable the 'Cloud Translation API'
5. Go to your project's 'credentials' page: https://console.cloud.google.com/apis/credentials?
6. Add a credential for an API key and call it RM_Translate (anything you like, really). This generates a key that you can use in the above examples; it is the security token you need to make the call to GCP.
I hope this saves you hours of trouble.
Paul
Answers
That would make it easier to troubleshoot
Please refer to the process and remember to insert your own key.
Hope it helps.