[SOLVED] Format input documents
Hello friends of the community.
I have a question regarding the format of the input documents.
I tried the Tokenize procedure on files in ".txt" format and it runs smoothly.
The original files I need to work with are in ".docx" and ".doc" format (Microsoft Word). When I repeat the Tokenize procedure on them, the documents are read as strange characters.
Is there a way to process documents in ".docx" and ".doc" format?
Answers
Did you find a solution for .docx and .doc? I've got the same problem.
Thanks in advance.
Johan
Yes, I solved it.
What I did was convert the documents from ".pdf" format to ".txt" (plain text) instead of converting from the Microsoft Word formats (.docx / .doc).
Greetings from Argentina
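For anyone hitting the same problem: a .docx file is actually a ZIP archive of XML files, which is why reading it as plain text produces strange characters. Converting to plain .txt first, as suggested above, avoids this. Below is a minimal sketch, assuming Python with only the standard library (not part of the original thread), that pulls the paragraph text out of `word/document.xml` and writes a UTF-8 .txt file. The function names `docx_to_text` and `convert` are hypothetical. It is a rough extraction that ignores headers, footers, tabs and line breaks, and it does not handle the older binary .doc format, which still needs to be saved as .txt from Word or another converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one paragraph per line.

    Rough sketch: only <w:t> text runs are kept; tabs, line breaks,
    headers, footers and other document parts are ignored.
    """
    # A .docx is a ZIP archive; the main body lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    # Every <w:p> element is a paragraph (including those in table cells)
    for para in root.iter(W_NS + "p"):
        # Join the text runs (<w:t>) that make up this paragraph
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


def convert(src, dst):
    """Write the extracted text of the .docx file src to dst as UTF-8."""
    with open(dst, "w", encoding="utf-8") as out:
        out.write(docx_to_text(src))
```

Usage, with hypothetical file names: `convert("report.docx", "report.txt")`, then point the Tokenize process at the resulting .txt files.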
Thank you for the tips.
Greetings from France