03-19-25 11:37 PM |
Main - Computers and technology - Evil OpenAI web crawler
fruityloops |
OpenAI is using crawlers to yank training data from your sites, without even knowing the licensing of the content they are stealing.
So if you want to prevent this, you can put the following in your robots.txt:

    User-agent: GPTBot
    Disallow: /

Alternatively, you can give it garbage data to fuck up their training data (example with nginx config):

    if ($http_user_agent ~* "GPTBot") {
        return 200 'asdlkfjsdjklfjsdlkfjsdkjfhgdfskjhgfd'; # realistically, you would put something more convincing than just a keyboard smash
    }
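For anyone who wants to sanity-check the robots.txt rules before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. The example.com URL and the "SomeOtherBot" name are placeholders, and `is_allowed` is a made-up helper. This only shows how a well-behaved parser reads the rules; it says nothing about whether a given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt snippet above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page.html"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The file has to be reachable at the root of the site (`/robots.txt`) for crawlers to find it.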
HEYimHeroic |
huh, i always wondered if there was a way to stop openai from vacuuming my website. thanks! this will be helpful.
Digital Cheese |
Probably a stupid question, but since I'm not hosting my site on my own servers and am just using Neocities, does it support robots.txt files, or would I have to host the site myself for it to work? If it will work no matter what, I'm gonna add it to my website because I don't want OpenAI vacuuming this shit even if it's just a static website.
fruityloops |
yep |
Digital Cheese |
(in reply to fruityloops) Let's goooooooo, gonna add it to any future websites I create along with any existing ones that I have.
fruityloops |
Oh, and also: ByteDance does the same thing (they're like 90% of the bots on this website). Their crawler will cost you gigabytes of bandwidth per hour, so if you want to block it too, block the 'Bytespider' user agent.
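A rough sketch of generating one robots.txt that turns away both crawlers named in this thread, then checking it with Python's standard `urllib.robotparser`. `build_robots_txt` and `BLOCKED_BOTS` are made-up names for illustration, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens mentioned in this thread.
BLOCKED_BOTS = ["GPTBot", "Bytespider"]


def build_robots_txt(bots):
    """Build one 'Disallow: /' group per bot, separated by blank lines."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in bots]
    return "\n".join(groups)


def is_blocked(robots_txt, user_agent, url="https://example.com/"):
    """Return True if the rules forbid `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(BLOCKED_BOTS)
    print(text)
    for bot in BLOCKED_BOTS + ["SomeOtherBot"]:
        print(bot, "blocked:", is_blocked(text, bot))
```

Keep in mind that robots.txt is only a request. A crawler that ignores it has to be handled on the server side, for example by matching the user agent the way the nginx rule in the first post does.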
Digital Cheese |
(in reply to fruityloops) I'm pretty sure ByteDance is the company behind TikTok, so if that's true, being able to block them is even better.
Generic aka RSDuck |
the most malicious (in a good way) thing to do would be to feed back text already generated by the model.
That is hard to filter and will subtly decrease the model's quality when it is trained on it.
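A minimal sketch of how that could be wired up, assuming you already have a stash of model-generated text lying around. Everything here is hypothetical: `pick_body`, `BOT_PATTERN`, and the `DECOY_TEXTS` placeholders are illustration only, and how you hook this into your actual server is up to you.

```python
import re
import zlib

# Crawler names mentioned in this thread, matched case-insensitively
# (similar in spirit to nginx's `~*` operator).
BOT_PATTERN = re.compile(r"GPTBot|Bytespider", re.IGNORECASE)

# Placeholders standing in for text previously generated by a model.
DECOY_TEXTS = [
    "placeholder for model-generated paragraph one",
    "placeholder for model-generated paragraph two",
    "placeholder for model-generated paragraph three",
]


def pick_body(user_agent: str, path: str, real_body: str) -> str:
    """Return the real page for normal visitors and a decoy for matched bots."""
    if not BOT_PATTERN.search(user_agent or ""):
        return real_body
    # Hash the path so the same URL always yields the same decoy text.
    index = zlib.crc32(path.encode("utf-8")) % len(DECOY_TEXTS)
    return DECOY_TEXTS[index]


if __name__ == "__main__":
    print(pick_body("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0", "/index.html", "real page"))
    print(pick_body("Mozilla/5.0 (compatible; GPTBot/1.0)", "/index.html", "real page"))
```

Picking the decoy from a hash of the path means a repeat visit to the same URL gets the same text, which makes the swap a little less obvious than serving something random every time.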