"Get page error in Web Mining Package"
user194372
Hi All,
I'm working at a client's company as a project member. The client wants us to collect information from a web site, so I am trying to use the "Get Page" operator. Because the address is HTTPS, we have trouble accessing the web page we need.
Although the https URL can be opened in a web browser such as IE or Chrome, the "Get Page" operator cannot access it and fails with an error message.
I'm sure it is a network security issue at this company. (I can access the same page with the "Get Page" operator from other locations.)
What I want to know is what I should ask the company's network manager in order to resolve this blocking.
I look forward to your advice.
Thanks
Answers
Hi,
If I try to connect from the command line, I get this:
Resolving www.naver.com... 104.121.126.27
Connecting to www.naver.com|104.121.126.27|:443... connected.
ERROR: cannot verify www.naver.com's certificate, issued by `/C=US/O=GeoTrust Inc./CN=GeoTrust SSL CA - G3':
Unable to locally verify the issuer's authority.
which might explain the error.
Hmm, very strange. I have no problem here.
I agree with Martin: try the command line. If you're on Unix, I would run "curl -v https://www.naver.com" to see the handshake. You should see a 200 OK response and so forth.
If that works, you can use the Execute Program operator in RapidMiner and just insert the same curl statement instead of using the Get Page operator. Pretty much the same thing.
Scott
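For anyone who would rather test the same fetch from a script than from curl, here is a minimal Python sketch using only the standard library. It is not what the Get Page operator does internally; `build_context`, `fetch_page`, and the `cafile` parameter are names made up for this illustration. `cafile` is assumed to be a PEM file of extra trusted CA certificates, for example one supplied by a network administrator.

```python
import ssl
import urllib.error
import urllib.request


def build_context(cafile=None):
    """Return a TLS context that verifies server certificates.

    cafile (hypothetical parameter): optional path to a PEM bundle of
    additional CA certificates to trust on top of the system defaults.
    """
    # The default context has certificate and hostname checks enabled.
    context = ssl.create_default_context()
    if cafile is not None:
        context.load_verify_locations(cafile=cafile)
    return context


def fetch_page(url, cafile=None, timeout=10):
    """Fetch url and return (status, text); status is None on failure."""
    context = build_context(cafile)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, body
    except urllib.error.URLError as err:
        # Certificate verification failures arrive here wrapped in URLError.
        return None, str(err.reason)
```

Calling `fetch_page("https://www.naver.com")` returns the HTTP status and page text when the certificate chain verifies. If verification fails, it returns `None` together with the reason, which is the detail worth passing on to whoever manages the network.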
Thank you for the good answers.
But I don't have enough knowledge to fully understand your replies.
Could you please explain a command-line script on Windows that would work similarly to the Get Page operator?
If it is an authority problem, what should I ask the network manager of this company?
Hello @user194372, as @mschmitz said, it is likely an SSL certificate that has expired or something like that. In very general terms, your computer is trying to protect you by not allowing you to connect to a website that offers an SSL connection but does not have a properly registered SSL certificate. There is a LOT of information on SSL certificates, and on error messages to this effect, on the internet.
As for cURL on Windows, my understanding is that it is not native, but people often install and use it. I will defer to other Windows users here on the forum...
Scott
Dear Team,
I tried to access the URL with curl and got the message below.
* Rebuilt URL to: https://www.naver.com/
* Trying 202.179.177.21...
* TCP_NODELAY set
* Connected to www.naver.com (202.179.177.21) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: \Profiles\apjt0187\Downloads\curl-7.55.1-win32-mingw\bin\curl-ca-bundle.crt
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: self signed certificate in certificate chain
* stopped the pause stream!
* Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
HTTPS-proxy has similar options --proxy-cacert and --proxy-insecure.
Can you tell what the problem is in this network environment?
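If curl is run from a script (for instance through the Execute Program operator mentioned earlier in the thread), the final `curl: (NN) message` line is the part worth capturing. The following is a small sketch, assuming the output has the same shape as the log above; `parse_curl_error` and `is_certificate_failure` are hypothetical helper names, not part of curl or RapidMiner. Exit code 60 is the one curl documents for a peer certificate that cannot be authenticated with known CA certificates.

```python
import re

# Matches curl's final error line, e.g. "curl: (60) SSL certificate problem: ..."
_CURL_ERROR = re.compile(r"^curl: \((\d+)\) (.+)$", re.MULTILINE)


def parse_curl_error(output):
    """Return (exit_code, message) from curl's error line, or None if absent."""
    match = _CURL_ERROR.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def is_certificate_failure(output):
    """True if curl reported exit code 60 (peer certificate not verified)."""
    parsed = parse_curl_error(output)
    return parsed is not None and parsed[0] == 60
```

Fed the log above, `parse_curl_error` picks out code 60 and the "self signed certificate in certificate chain" text, which is a concise thing to quote when asking the network manager about the blocking.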
So yes, that just confirms that it is an SSL certificate issue: "curl: (60) SSL certificate problem: self signed certificate in certificate chain". The certificate chain presented to you contains a "self-signed" certificate, so it is not externally verified. Time to go to GoDaddy and buy a real one.
Scott
... or go to Let's Encrypt and get one for free ;-)