Attila Vágó
2 min read · Mar 13, 2022


OK, so being a web developer I can tell you that anything publicly available on the web has more than likely been scraped multiple times. There are robots that do this all the time, and I would expect most of them to keep stock of the data they gather. That data often gets sold on via the dark web. This is nothing new, and media companies have to deal with copyright infringement all the time; they have entire departments and legal teams just dealing with it. Once content has been crawled and scraped, it has a life of its own. That's the sad reality, and it's one of the reasons I am so hesitant to publish my book(s) on Medium or Substack and would rather go via an official publisher.
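
Just to show how low the barrier is, here's a rough sketch of what one of those scrapers boils down to. This is purely illustrative: it assumes the third-party requests and beautifulsoup4 packages, and the URL is a placeholder, not a real target.

```python
# A bare-bones article scraper, purely for illustration.
# Assumes `pip install requests beautifulsoup4`; the URL is a placeholder.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/some-article", timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Grab the headline and every paragraph of body text.
heading = soup.find("h1")
title = heading.get_text(strip=True) if heading else "(no title found)"
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

print(title)
print("\n\n".join(paragraphs))
```

A dozen lines like that, pointed at a sitemap and run in a loop, is all it takes to hoover up an entire publication.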

The other aspect of new scrapes is that, more often than not, these robots are entirely mindless and just scrape en masse. It's quite likely they'll end up collecting a third- or fourth-hand copy of previous scrapes that got sold and posted on some random site. But again, this is unavoidable, and the only partial remedy is to spend tons of effort reporting plagiarised content and taking people to court.

The rule of thumb is to write your articles and stories locally first, on your own machine, so the file will have a created date that helps prove authorship. If for some reason that cannot be done, get in touch with Medium or whatever platform you wrote it on; they might have database backups and be able to give you an export that clearly shows the created and edited dates as well.
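
For what it's worth, this is the kind of file metadata I mean. A quick sketch using only Python's standard library; the path is a hypothetical local draft, and note that a true creation date is only exposed on some platforms.

```python
# Inspect a local draft's timestamps with the standard library only.
# The path is a hypothetical local draft, not a real file.
from datetime import datetime
from pathlib import Path

draft = Path("my-article-draft.md")
stats = draft.stat()

print("Last modified:", datetime.fromtimestamp(stats.st_mtime))

# A true "created" (birth) time is only exposed on some platforms,
# e.g. macOS and some BSDs; many Linux filesystems don't report it here.
created = getattr(stats, "st_birthtime", None)
if created is not None:
    print("Created:", datetime.fromtimestamp(created))
else:
    print("Creation time not available on this platform.")
```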

I'm not sure this answers all your questions. Feel free to ping me if you need clarification on any of it.
