Detecting missing links and 404s #492

Open
asitemade4u opened this issue Apr 30, 2020 · 2 comments

Comments

@asitemade4u asitemade4u commented Apr 30, 2020

The developer of the website I intend to scrape information from is sloppy and has left a lot of broken links.
When I execute an otherwise effective Ferret script on a list of pages, it stops altogether at every 404.
Is there a DOCUMENT_EXISTS function or anything similar that would let the script continue?

@asitemade4u asitemade4u changed the title from "Detecting 404" to "Detecting 404s" Apr 30, 2020
@asitemade4u asitemade4u changed the title from "Detecting 404s" to "Detecting missing links and 404s" Apr 30, 2020
@ziflex ziflex (Contributor) commented May 2, 2020

Nope, there is no such function. But we can come up with something like that.

@ksdme ksdme commented May 22, 2020

@ziflex I would like to pick up this issue.
