How to get the page content with wget when the server returns a 404


April 2019


I have two URLs: one points to a working page and the other to a page that has been deleted. The working URL is fine, but for the deleted page wget receives a 404 and returns nothing instead of the page content.

Working URL:

import os

def curl(url):
    # Despite its name, this shells out to wget: -q (quiet), -O- (write the page to stdout)
    data = os.popen('wget -qO- %s' % url).read()
    print(url)
    print(len(data))
    # print(data)

curl("https://www.reverbnation.com/artist_41/bio")

Output:

https://www.reverbnation.com/artist_41/bio
80067

Deleted-page URL:

import os

def curl(url):
    # Despite its name, this shells out to wget: -q (quiet), -O- (write the page to stdout)
    data = os.popen('wget -qO- %s' % url).read()
    print(url)
    print(len(data))
    # print(data)

curl("https://www.reverbnation.com/artist_42/bio")

Output:

https://www.reverbnation.com/artist_42/bio
0

The length is 0, yet the live page does show some content.

How can I get the exact page content with wget or curl?

Answer

wget has an option called --content-on-error. From the wget manual:

--content-on-error
           If this is set to on, wget will not skip the content when the server responds with a http status code that indicates error.
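As a minimal sketch of the option in isolation, assuming Python 3.7+ (for capture_output), the helper below builds the wget command as an argument list and runs it with subprocess.run instead of os.popen, so the URL never passes through a shell. The names build_wget_args and fetch_with_wget are hypothetical, chosen for illustration. Per the wget manual, exit status 8 means "Server issued an error response", so the caller can still tell that the page was a 404 while keeping its body.

```python
import subprocess


def build_wget_args(url):
    """Hypothetical helper: argument list for fetching url with wget, keeping error pages."""
    # --content-on-error: keep the body of 4xx/5xx responses instead of discarding it
    # -q: quiet; -O -: write the downloaded document to stdout instead of a file
    return ["wget", "--content-on-error", "-q", "-O", "-", url]


def fetch_with_wget(url):
    """Hypothetical helper: run wget and return (exit_code, body_text).

    Passing a list (no shell) avoids quoting problems with the URL.
    An exit code of 8 means the server sent an error response (e.g. 404);
    the body is still captured because of --content-on-error.
    """
    result = subprocess.run(build_wget_args(url), capture_output=True)
    body = result.stdout.decode("utf-8", errors="replace")
    return result.returncode, body
```

For example, fetch_with_wget("https://www.reverbnation.com/artist_42/bio") should return the deleted page's HTML together with a non-zero exit code, rather than an empty string.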

So just add --content-on-error to the wget command in your code and you will get the content of 404 pages too:

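import os
import shlex

def curl(url):
    # --content-on-error keeps the response body even for 4xx/5xx status codes;
    # shlex.quote protects the URL from being interpreted by the shell
    data = os.popen('wget --content-on-error -qO- %s' % shlex.quote(url)).read()
    print(url)
    print(len(data))
    # print(data)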

curl("https://www.reverbnation.com/artist_42/bio")
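If you would rather not depend on an external wget binary at all, the standard library can do the same thing. This is a sketch of a swapped-in technique, not part of the wget answer: urllib.request.urlopen raises urllib.error.HTTPError for a 404, but that exception object also behaves like a response, so the error page body can still be read from it. The function name fetch_body is hypothetical.

```python
import urllib.error
import urllib.request


def fetch_body(url, timeout=10):
    """Hypothetical helper: return (status_code, body_text), even for error pages."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status (e.g. 404); err.read() returns the error page body
        try:
            return err.code, err.read().decode("utf-8", errors="replace")
        finally:
            err.close()
```

Calling fetch_body("https://www.reverbnation.com/artist_42/bio") should give a 404 status together with the HTML the server sent, so len() of the second element is no longer 0.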