Question regarding the feasibility of a Table data Extraction project with RM
Would it be feasible to extract tables from PDFs with RM? I realize the PDFs might have to be converted to another format first, but would it be possible to run through the entire text of a financial report, identify the table data, and extract it to ExampleSets?
I am thinking of trying it out but would like to hear from more seasoned people if they think it is reasonably feasible or if there is a hard wall along the way that I am not yet seeing.
Best Answer
kayman Member Posts: 662 Unicorn
There is an operator to do this. Look for the PDF extension on the Marketplace.
It is fairly good at converting tables from a PDF into a dataset, provided your tables are cleanly structured.
If this is not the case, you can also use the import document operator from the text extension and select PDF. This will convert your PDF to plain text. Getting the table content out of that text with the text operators is feasible, but not so straightforward.
Finally, you could also use the Python extension. There are a few good libraries that deal with table extraction from PDFs, but try option 1 first.
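To give an idea of why the plain-text route is feasible but fiddly, here is a minimal Python sketch using only the standard library. It assumes the PDF has already been converted to plain text and that table columns are separated by two or more spaces, which will not hold for every report. The helper name extract_table_rows, the min_cols threshold, and the sample text are made up for illustration and are not part of RapidMiner or of any PDF library.

```python
import re

# Assumption: columns in the converted text are separated by runs of
# two or more whitespace characters. Real reports may need other rules.
COLUMN_GAP = re.compile(r"\s{2,}")


def extract_table_rows(text, min_cols=3):
    """Hypothetical helper: return lines that look like table rows.

    A line counts as a table row when splitting it on wide gaps
    yields at least `min_cols` cells.
    """
    rows = []
    for line in text.splitlines():
        cells = COLUMN_GAP.split(line.strip())
        if len(cells) >= min_cols:
            rows.append(cells)
    return rows


# Made-up sample standing in for the plain text of a financial report.
sample = """Annual Report 2023
Revenue grew in all segments during the year.

Segment        2022      2023
Retail         1,200     1,350
Wholesale        800       910

See the notes for details.
"""

rows = extract_table_rows(sample)

# Treat the first detected row as the header and the rest as data.
header, data = rows[0], rows[1:]
records = [dict(zip(header, row)) for row in data]

for record in records:
    print(record)
```

The weak point is the row heuristic: a table with only two columns, cells that contain wide gaps, or ordinary prose laid out with wide spacing would all be misread, which is why a dedicated extraction operator or library is usually the better first attempt.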
Answers