
If you build a web crawler, how do you check whether a URL has already been crawled?

Ahmed Ashraf
·Dec 1, 2016

I'm building a web crawler for some of our customers. They have a news website and publish about 1,000 new articles per day, so I think I could use Redis to store the crawled URLs in a Redis set, with a key schema like "domain:1":

redis.sadd("domain:1", url_string)
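A minimal sketch of the set approach, assuming redis-py and a key like "domain:1". Two details help here: SADD returns the number of members actually added, so a single call both checks and records a URL (no separate SISMEMBER round trip), and normalizing URLs first keeps trivially different spellings from counting as new. The normalization rules below (lowercase scheme and network location, empty path becomes "/", drop the #fragment) are assumptions to adapt to the site.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so equivalent spellings map to one set member.

    Assumed rules: lowercase scheme and network location, default an
    empty path to "/", keep the query string, drop the #fragment.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def mark_if_new(client, key: str, url: str) -> bool:
    """Add the normalized URL to the Redis set at `key`.

    Returns True if the URL was not in the set before (SADD added 1 member),
    False if it had already been seen.
    """
    return client.sadd(key, normalize_url(url)) == 1
```

With redis-py this would be used as `r = redis.Redis()` followed by `if mark_if_new(r, "domain:1", url): ...` before fetching the page; `mark_if_new` and `normalize_url` are illustrative helper names, not part of any library.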

It works well enough for me now, but I guess it will become hard to manage a month from now.

So is there a better solution for this? Any hints?
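A rough sizing sketch for the growth concern, using only the 1,000 articles/day figure from the question. One option for keeping each set member small is to store a fixed-size hash of the normalized URL instead of the full string; the 16-byte BLAKE2b digest here is an assumed choice, and at a few hundred thousand URLs the chance of two URLs colliding at 128 bits is negligible. The byte counts cover member payload only and ignore Redis's own per-entry overhead, which is not small.

```python
import hashlib

ARTICLES_PER_DAY = 1000  # figure taken from the question


def urls_after(days: int, per_day: int = ARTICLES_PER_DAY) -> int:
    """Distinct article URLs accumulated after `days` days, assuming one URL per article."""
    return days * per_day


def url_digest(url: str, size: int = 16) -> bytes:
    """Fixed-size BLAKE2b digest that could be stored in the set instead of the URL string."""
    return hashlib.blake2b(url.encode("utf-8"), digest_size=size).digest()


def payload_bytes(days: int, digest_size: int = 16) -> int:
    """Raw member bytes if every URL is stored as a digest (excludes Redis overhead)."""
    return urls_after(days) * digest_size


if __name__ == "__main__":
    for days in (30, 365):
        print(days, urls_after(days), payload_bytes(days))
```

Running it prints `30 30000 480000` and `365 365000 5840000`: about 30,000 URLs after a month and 365,000 after a year, or roughly 5.8 MB of digest payload per year before overhead, which suggests a single Redis set per site stays manageable at this rate.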


© Hashnode 2024 — LinearBytes Inc.
