"embedded crawler (websphinx) and RegEx"
(How) can I use RegEx within that crawler? It did not work...
I tried this several times as follows (see also the attachment):
visit_content: ^water$
or
visit_content: \<water\>
or
visit_content: (?s)\<water\>
...
(I don't want waterfall...)
Please don't suggest HTTRACK. As far as I know, HTTRACK cannot filter on the content of pages, only on URLs.
[attachment deleted by admin]
Answers
The crawler does not support regular expressions. These are the only condition types supported for specifying which links to follow:
follow_url A link is followed only if the target URL contains all terms stated in this parameter.
link_text A link is followed only if the link text contains all terms stated in this parameter.
The conditions that determine whether a page is stored allow the following expressions:
visit_url A page is stored only if its URL contains all terms stated in this parameter.
visit_content A page is stored only if its content contains all terms stated in this parameter.
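To make the "contains all terms" behaviour concrete, here is a minimal Python sketch. It is not the crawler's actual code: it assumes the terms are separated by whitespace and compared as plain, case-sensitive substrings, and the function names and sample pages are made up for illustration. Under those assumptions a term like water also matches waterfall, and a pattern like ^water$ is just literal text. Whole-word filtering would then have to happen as a separate step on the stored pages, outside the crawler.

```python
import re


def contains_all_terms(text: str, terms: str) -> bool:
    """Assumed visit_content / visit_url semantics: every term must occur as a plain substring."""
    return all(term in text for term in terms.split())


def contains_whole_word(text: str, word: str) -> bool:
    """Post-filter run outside the crawler: match `word` only as a whole word."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


# Hypothetical page contents, for illustration only.
pages = {
    "a.html": "The waterfall is loud.",
    "b.html": "Drink more water every day.",
}

# Step 1: what a substring-based visit_content: water would store.
stored = [name for name, body in pages.items() if contains_all_terms(body, "water")]

# Step 2: whole-word filtering applied afterwards to the stored pages.
kept = [name for name in stored if contains_whole_word(pages[name], "water")]

print(stored)  # both pages, because "waterfall" contains "water"
print(kept)    # only the page with the standalone word "water"
```

In this sketch the regular expression lives entirely in the post-processing step, so the crawler parameters stay plain term lists.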
Further information can be found at http://nemoz.org/joomla/content/view/64/53/lang,de/
Greetings,
Sebastian