"Crawl Web", "Get Page[s]", "Extract Content" and document encoding/charset
Hi all
I crawl web sites in Russian. Some of them return content in UTF-8, others use Windows-1251 encoding. Is there a way to convert the retrieved pages to a single encoding (preferably UTF-8) based on the Content-Type server header and the META tags in the document?
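To illustrate what I mean, here is a minimal Python sketch of the conversion logic, written outside RapidMiner. The function names `detect_charset` and `to_utf8` are hypothetical, and the sketch assumes the raw page bytes and the Content-Type header value are both available. It also assumes the header should take precedence over the META tag, with UTF-8 as the fallback when neither declares a charset.

```python
import re
from typing import Optional

# Matches "charset=..." in a Content-Type header value.
_HEADER_CHARSET_RE = re.compile(
    r"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)
# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET_RE = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def detect_charset(content_type: Optional[str], body: bytes,
                   default: str = "utf-8") -> str:
    """Pick a charset: header first, then META tag, then the default."""
    if content_type:
        match = _HEADER_CHARSET_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # Assumption: the META declaration appears near the top of the page.
    match = _META_CHARSET_RE.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def to_utf8(content_type: Optional[str], body: bytes,
            default: str = "utf-8") -> bytes:
    """Decode the page with the detected charset and re-encode as UTF-8."""
    charset = detect_charset(content_type, body, default)
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to the default encoding.
        text = body.decode(default, errors="replace")
    return text.encode("utf-8")


# Usage: a Windows-1251 page declared only through the server header.
page = "Привет".encode("windows-1251")
print(to_utf8("text/html; charset=windows-1251", page).decode("utf-8"))
```

Note that this sketch leaves any META charset declaration inside the page unchanged, so after conversion it may no longer match the actual encoding of the bytes. Is something equivalent possible with the web mining operators?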