RapidMiner

Get page error in Web Mining Package

Contributor II

Hi All,

 

I'm working at a client's company as a project member. The client wants us to collect information from a website, so I am trying to use the "Get Page" operator. Because the address uses HTTPS, we have trouble accessing the web page we need.

Even though the HTTPS URL can be opened in a web browser such as IE or Chrome, we cannot access it with the "Get Page" operator, which fails with the message below:

error.PNG

I'm sure it's a network security issue at this company. (I can access the same page with the "Get Page" operator from other locations.)

What I want to know is what I should ask the company's network manager to do in order to remove this block.

Looking forward to your expertise.

Thanks

4 REPLIES
RMStaff

Re: Get page error in Web Mining Package

Hi,

 

If I try to connect from the command line, I get this:

Resolving www.naver.com... 104.121.126.27
Connecting to www.naver.com|104.121.126.27|:443... connected.
ERROR: cannot verify www.naver.com's certificate, issued by `/C=US/O=GeoTrust Inc./CN=GeoTrust SSL CA - G3':
Unable to locally verify the issuer's authority.

 

which might explain the error.
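
To separate "the certificate cannot be verified" from "the network is blocking the connection", a small check like the one below may help. It is only a sketch in Python (an assumption, not something RapidMiner ships), and the function names `classify_tls_error` and `check_https` are made up for illustration. Note that Python uses its own trust store, which may differ from the one used by the Java runtime behind RapidMiner, so a result here is a hint rather than proof.

```python
import socket
import ssl


def classify_tls_error(exc: BaseException) -> str:
    """Map an exception from a TLS connection attempt to a rough cause."""
    # Check the most specific SSL error first; it is a subclass of ssl.SSLError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate verification failed (untrusted issuer, expiry, or hostname mismatch)"
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake failed"
    if isinstance(exc, (socket.timeout, ConnectionError, socket.gaierror)):
        return "network unreachable or blocked"
    return "unknown error"


def check_https(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and report the issuer or the failure cause."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                # "issuer" is a tuple of RDNs, each holding (name, value) pairs.
                issuer = dict(rdn[0] for rdn in cert["issuer"])
                return "ok, issued by " + issuer.get("commonName", "unknown")
    except OSError as exc:
        return classify_tls_error(exc)
```

Calling `print(check_https("www.naver.com"))` on the affected machine and on a machine where Get Page works would show whether the two see a different certificate issuer or a different kind of failure.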

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Community Manager

Re: Get page error in Web Mining Package

Hmm, very strange. I have no problem here:

 

Screen Shot 2017-09-04 at 3.21.06 PM.png
Screen Shot 2017-09-04 at 3.21.19 PM.png

 

I agree with Martin: try the command line. If you're on Unix, I would run "curl -v https://www.naver.com" to see the handshake. You should see a 200 OK response and so forth:

 

$ curl -v https://www.naver.com
* Rebuilt URL to: https://www.naver.com/
*   Trying 23.66.210.98...
* TCP_NODELAY set
* Connected to www.naver.com (23.66.210.98) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: ssl.pstatic.net
* Server certificate: GeoTrust SSL CA - G3
* Server certificate: GeoTrust Global CA
> GET / HTTP/1.1
> Host: www.naver.com
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: NWS
< Content-Type: text/html; charset=UTF-8
< Cache-Control: no-cache, no-store, must-revalidate
< Pragma: no-cache
< P3P: CP="CAO DSP CURa ADMa TAIa PSAa OUR LAW STP PHY ONL UNI PUR FIN COM NAV INT DEM STA PRE"
< X-Frame-Options: SAMEORIGIN
< X-EdgeConnect-MidMile-RTT: 30
< X-EdgeConnect-Origin-MEX-Latency: 8
< X-EdgeConnect-MidMile-RTT: 203
< X-EdgeConnect-Origin-MEX-Latency: 8
< X-EdgeConnect-Cache-Status: 0
< Date: Mon, 04 Sep 2017 19:25:10 GMT
< Transfer-Encoding:  chunked
< Connection: keep-alive
< Connection: Transfer-Encoding
< 
<!doctype html>
<html lang="ko" class="svgless">
<head>
<meta charset="utf-8">
<meta name="Referrer" content="origin">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=1100">
<meta name="apple-mobile-web-app-title" content="NAVER" />
<meta property="og:title" content="네이버">

If that works, you can use the Execute Program operator in RapidMiner and just insert the same curl statement instead of using the Get Page operator. Pretty much the same thing.

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Contributor II

Re: Get page error in Web Mining Package

Thank you for the good answers.

However, I don't have enough knowledge to fully understand your replies.

Could you please explain a command-line script for Windows that would work similarly to the Get Page operator?

If it's an authority problem, what should I ask the network manager of this company?

 

Community Manager

Re: Get page error in Web Mining Package

Hello @user194372 - well, as @mschmitz said, it is likely that the SSL certificate cannot be verified, for example because it is expired or its issuer is not trusted by your machine. In very general terms, your computer is trying to protect you by refusing to connect to a website that offers an SSL connection but does not present a certificate it can verify. There is a LOT of information on SSL certificates, and on error messages to this effect, on the internet.

 

As for cURL statements on Windows, it is my understanding that cURL is not native to Windows, but people often use it there. I will defer to other Windows users here on the forum, though...
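
If cURL is not available, one possible alternative is a short Python script, assuming Python can be installed on the Windows machine (it is not native either). This is only a rough sketch: `build_request` and `fetch_page` are hypothetical helper names, and the User-Agent string is an arbitrary placeholder. It performs the same certificate verification as any other client, so it will fail in a similar way if the certificate cannot be verified on that network.

```python
import urllib.request
from typing import Tuple


def build_request(url: str, user_agent: str = "example-fetch/0.1") -> urllib.request.Request:
    """Create a GET request; the User-Agent value is an arbitrary placeholder."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_page(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, decoded body).

    Raises urllib.error.URLError if the TLS handshake or the network fails.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset, errors="replace")
```

For example, `status, body = fetch_page("https://www.naver.com")` followed by `print(status, len(body))` would show whether the page can be retrieved from that machine.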

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.