Kuribo64
04-27-24 11:49 AM

Main - Computers and technology - Evil OpenAI web crawler


fruityloops
Posted on 08-07-23 04:34 PM | #101282
OpenAI is using crawlers to yank training data from your sites, without even checking how the content they are stealing is licensed.

So if you want to prevent this, you can put the following in your robots.txt:

User-agent: GPTBot
Disallow: /
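If you want to sanity-check the rule before uploading it, Python's standard-library `urllib.robotparser` evaluates robots.txt the way a well-behaved crawler would. Minimal sketch: `allowed` is a made-up helper name and the example.com URL is a placeholder. Keep in mind it only tells you what a crawler that honors robots.txt should do; robots.txt is a request, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# The same rule as above: block GPTBot from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler using this
    user-agent token may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page.html"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        verdict = "allowed" if allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Agents with no matching entry (Googlebot here) fall through to "allowed", so the rule only affects GPTBot.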

Alternatively, you can serve it garbage to fuck up their training data (example nginx config):

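# goes inside your server { } (or a location { }) block
# ~* is a case-insensitive regex match on the User-Agent header
if ($http_user_agent ~* "GPTBot") {
    # realistically, you would put something more convincing than just a keyboard smash
    return 200 'asdlkfjsdjklfjsdlkfjsdkjfhgdfskjhgfd';
}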
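If your site isn't behind nginx, the same trick can be sketched as a tiny Python WSGI app. Minimal sketch, not a drop-in: `app`, `fetch`, `BOT_PATTERN`, `DECOY_TEXT` and `REAL_PAGE` are made-up names, the page contents are placeholders, and like the nginx version it only catches bots that honestly send `GPTBot` in their User-Agent.

```python
import re

# Case-insensitive search of the User-Agent header, like nginx's ~* operator.
BOT_PATTERN = re.compile(r"GPTBot", re.IGNORECASE)

# Placeholder decoy, same keyboard smash as the nginx example.
DECOY_TEXT = b"asdlkfjsdjklfjsdlkfjsdkjfhgdfskjhgfd"

# Stand-in for whatever your site normally serves.
REAL_PAGE = b"<p>The real page.</p>"


def app(environ, start_response):
    """WSGI app: decoy text for matching bots, the real page for everyone else."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if BOT_PATTERN.search(user_agent):
        # Same effect as nginx's `return 200 '...';`
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [DECOY_TEXT]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [REAL_PAGE]


def fetch(user_agent):
    """Demo helper: call the app in-process and return the response body."""
    def start_response(status, headers):
        pass  # a real server would send these; we only want the body
    return b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))


if __name__ == "__main__":
    print(fetch("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # decoy
    print(fetch("Mozilla/5.0 (X11; Linux x86_64)"))        # real page
```

To run it for real you'd hand `app` to a WSGI server (the standard library's `wsgiref.simple_server.make_server` is enough for local testing). To also catch the ByteDance crawler mentioned later in the thread, the pattern could be widened to `GPTBot|Bytespider`.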

HEYimHeroic
Posted on 08-11-23 03:05 AM | #101294
huh, i always wondered if there was a way to stop openai from vacuuming my website. thanks! this will be helpful.

____________________
yeah

Digital Cheese
Posted on 08-11-23 03:16 AM | #101296
Probably a stupid question, but since I'm not hosting my site on my own servers and am just using Neocities, does it support robots.txt files, or is that something I'd have to host myself for it to work? If it works no matter what, I'm gonna add it to my website, because I don't want OpenAI vacuuming this shit even if it's just a static website.

____________________
My Website | 3DSPaint

fruityloops
Posted on 08-11-23 10:53 AM | #101297
yep

Digital Cheese
Posted on 08-11-23 08:52 PM | #101298
Posted by fruityloops
Let's goooooooo, gonna add it to any future websites I create, along with any existing ones I have.


fruityloops
Posted on 08-20-23 06:16 PM | #101304
Oh, and also: ByteDance does the same thing (they're like 90% of the bots on this website). They will cost you gigabytes of bandwidth per hour, so if you want to block those too, block 'Bytespider'.
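Before blocking, you can check how much bandwidth each crawler is actually eating by summing the bytes sent per User-Agent in your access log. Minimal sketch, assuming nginx's default `combined` log format; `BOTS` is a hypothetical list of tokens to look for, and the log path in the usage comment is a common default that varies by setup.

```python
import re
import sys
from collections import Counter

# nginx's default "combined" log format:
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical list of crawler tokens to look for in the User-Agent.
BOTS = ("Bytespider", "GPTBot")


def bytes_per_bot(lines):
    """Sum body bytes sent per bot token; all other clients count as 'other'."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that don't fit the combined format
        size = match["bytes"]
        sent = 0 if size == "-" else int(size)
        agent = match["agent"].lower()
        label = next((bot for bot in BOTS if bot.lower() in agent), "other")
        totals[label] += sent
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bot_bytes.py /var/log/nginx/access.log  (path varies by setup)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for label, total in bytes_per_bot(log).most_common():
            print(f"{label}: {total / 1e9:.2f} GB")
```

Matching is a plain case-insensitive substring check, so a bot that doesn't identify itself ends up under "other".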

Digital Cheese
Posted on 08-21-23 09:40 PM | #101305
Posted by fruityloops
Oh and also, ByteDance does the same thing (they're like 90% of the bots on this website), they will cost you gigabytes of bandwidth per hour, so if you want to block those too, block 'Bytespider'
I'm pretty sure ByteDance is the company behind TikTok, so if that's true, that's even better to be able to block :D


Generic aka RSDuck
Posted on 08-21-23 09:50 PM | #101306
the most malicious (in a good way) thing to do would be to feed back text already generated by the model.

That is hard to filter out, and it will subtly decrease the model's quality when it's trained on it.


