
How to loop through pictures for text recognition

tngotngo Member Posts: 3 Learner I
edited June 2020 in Help
Hi everyone,

I am new to RapidMiner and would appreciate any help you can provide. I have a database with a field of URLs, all of which point to pictures. I need a process that extracts the text from the image behind each URL, for every row in my dataset, without my having to click on the URLs manually. My dataset has hundreds of thousands of rows.

Answers

  • kaymankayman Member Posts: 662 Unicorn
    As RapidMiner has no out-of-the-box 'img to text' operators, you will need to use the Python extension here.

    One possible workflow: use RM to loop over all of your db records -> use the Web Mining extension to download each image and store it locally -> use Python (for instance OpenCV) to read the image -> use pytesseract to run the OCR and get the text -> return the text to RapidMiner and continue with the next image.
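
The Python part of the workflow above could look roughly like the sketch below. It assumes the images are already stored locally and that the example set passed to the Python scripting operator has a column named `image_path`; that column name, the `ocr_text` output column, and the helper names `ocr_image_file` and `ocr_paths` are made up for illustration. It also assumes `opencv-python`, `pytesseract`, and the Tesseract binary are installed. The `rm_main` entry point follows the convention of RapidMiner's Python scripting operator, which hands the script a pandas DataFrame.

```python
from typing import Callable, Iterable, List, Optional


def ocr_image_file(path: str) -> str:
    """OCR a single local image file with OpenCV and pytesseract."""
    # Imported lazily so the module still loads where these packages are missing.
    import cv2
    import pytesseract

    image = cv2.imread(path)  # cv2.imread returns None if the file cannot be read
    if image is None:
        raise ValueError(f"could not read image: {path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale input often helps OCR
    return pytesseract.image_to_string(gray)


def ocr_paths(
    paths: Iterable[str],
    ocr: Callable[[str], str] = ocr_image_file,
) -> List[Optional[str]]:
    """Run OCR on every path; a failing image yields None instead of stopping the loop."""
    texts: List[Optional[str]] = []
    for path in paths:
        try:
            texts.append(ocr(path).strip())
        except Exception:
            texts.append(None)
    return texts


def rm_main(data):
    """Entry point for the Python scripting operator; `data` is a pandas DataFrame."""
    # 'image_path' and 'ocr_text' are assumed column names, not RapidMiner defaults.
    data["ocr_text"] = ocr_paths(data["image_path"])
    return data
```

Catching errors per image matters with hundreds of thousands of rows: one unreadable file then leaves a missing value in that row rather than aborting the whole run.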


  • rdesairdesai Employee-RapidMiner, RMResearcher, Member Posts: 15 RM Research
    With the new functionality in the Deep Learning extension, you can do this easily using the "extract text from image" operator, which uses the Tesseract OCR library. If you have multiple images, you can loop over them by adding another operator, "Read Image Meta-Data", inside the process.
  • tngotngo Member Posts: 3 Learner I
    @kayman
    Hi Kayman, thank you for your help! Can you be more specific about how to download the images? I used the Get Pages operator, but I don't see any option to download the images from the URLs.
  • tngotngo Member Posts: 3 Learner I
    @rdesai, Thank you so much! I tried your process and it worked. However, I either need to be able to automatically download all images from the URLs in the database to my own folder, or I need an alternative way to run this without needing to download images to a folder. Do you have any thoughts? 
  • kaymankayman Member Posts: 662 Unicorn
    You could use the [open file] operator, which allows you to select a file based on a URL. If you combine this with the [write file] operator, you can save the file to your disk. You will probably need to do some tweaking with macros to define the filename and folder, but in essence this should work fine.
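
For comparison, the same "open from URL, write to disk" idea can be sketched in plain Python, for example inside the Python extension. This is not the [open file] / [write file] operator setup itself, only an illustration using the standard library. The helper names `filename_for` and `download_image`, the zero-padded row-id naming scheme, and the `.img` fallback extension are all assumptions.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_for(url: str, row_id: int) -> str:
    """Build a unique local file name from the row id plus the URL's file extension."""
    # Use only the path part of the URL so query strings do not end up in the name.
    ext = os.path.splitext(urlparse(url).path)[1] or ".img"
    return f"{row_id:08d}{ext}"


def download_image(url: str, folder: str, row_id: int, timeout: float = 30.0) -> str:
    """Download one image into `folder` and return the path it was saved to."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename_for(url, row_id))
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

Naming files by row id rather than by the original file name avoids collisions when several URLs end in the same name, and it makes it easy to join the extracted text back to the database rows later.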
  • kaymankayman Member Posts: 662 Unicorn
    @rdesai, oh wow, didn't know that one yet