10-02-23 03:56 PM |
Main - Computers and technology - Evil OpenAI web crawler
fruityloops |
Newcomer Normal user Level: 4 Posts: 4/7 EXP: 139 Next: 140 Since: 08-07-23 From: Germany Last post: 42 days ago Last view: 21 hours ago |
OpenAI is using crawlers to yank training data from your sites, without even knowing the licensing of the content they are stealing.
So if you want to prevent this, you can put the following in your robots.txt:

    User-agent: GPTBot
    Disallow: /

Alternatively, you can give it garbage data to fuck up their training data (example with nginx config):

    if ($http_user_agent ~* "GPTBot") {
        # realistically, you would put something more convincing than just a keyboard smash
        return 200 'asdlkfjsdjklfjsdlkfjsdkjfhgdfskjhgfd';
    }
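If you want to sanity-check the robots.txt rule before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The `allowed` helper, the `SomeOtherBot` name, and the `example.com` URL are made up for illustration. It only shows how a well-behaved parser reads the rule; robots.txt is advisory, so it does not prove that any particular crawler will obey it.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above: block GPTBot from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/index.html"  # placeholder URL
    print("GPTBot allowed:", allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

With only this one rule in the file, agents that are not named stay allowed, which is usually what you want for normal search engines.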
HEYimHeroic |
Lantern Ghost i'm alice Level: 51 Posts: 769/769 EXP: 1011678 Next: 2260 Since: 08-04-17 Last post: 52 days ago Last view: 20 days ago |
huh, i always wondered if there was a way to stop openai from vacuuming my website. thanks! this will be helpful. ____________________ yeah |
Digital Cheese |
Goomba TDK Owner Level: 9 Posts: 26/31 EXP: 2521 Next: 641 Since: 03-03-23 From: Amerika Last post: 41 days ago Last view: 18 hours ago |
Probably a stupid question, but since I'm not hosting my site on my own servers and am just using Neocities, does it support robots.txt files, or is that something I'd have to host myself for it to work? If it will work no matter what, I'm gonna add it to my website because I don't want OpenAI vacuuming this shit even if it's just a static website.
____________________ My Website | YouTube Channel | 3DSPaint |
fruityloops |
Newcomer Normal user Level: 4 Posts: 5/7 EXP: 139 Next: 140 Since: 08-07-23 From: Germany Last post: 42 days ago Last view: 21 hours ago |
yep |
Digital Cheese |
Goomba TDK Owner Level: 9 Posts: 27/31 EXP: 2521 Next: 641 Since: 03-03-23 From: Amerika Last post: 41 days ago Last view: 18 hours ago |
Posted by fruityloops
Let's goooooooo, gonna add it to any future websites I create along with any existing ones that I have. ____________________ My Website | YouTube Channel | 3DSPaint |
fruityloops |
Newcomer Normal user Level: 4 Posts: 7/7 EXP: 139 Next: 140 Since: 08-07-23 From: Germany Last post: 42 days ago Last view: 21 hours ago |
Oh and also, ByteDance does the same thing (they're like 90% of the bots on this website). They will cost you gigabytes of bandwidth per hour, so if you want to block those too, block the 'Bytespider' user agent. |
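For anyone doing the blocking in application code instead of nginx, here is a rough Python sketch of the same kind of check, extended to both crawler names mentioned in this thread. The `is_blocked` function and the sample User-Agent strings are invented for illustration; real crawler headers may look different, so treat the pattern as a starting point.

```python
import re

# Crawler tokens named in this thread. The pattern is unanchored and
# case-insensitive, similar in spirit to nginx's `~*` operator.
BLOCKED_AGENTS = re.compile(r"GPTBot|Bytespider", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header mentions a blocked crawler.

    Hypothetical helper for illustration, not part of any framework.
    """
    return BLOCKED_AGENTS.search(user_agent or "") is not None


if __name__ == "__main__":
    # Made-up sample headers, not copied from real crawler traffic.
    samples = [
        "Mozilla/5.0 (compatible; gptbot/1.0)",
        "Mozilla/5.0 (compatible; Bytespider)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

A substring match like this is easy to spoof and easy to over-match, so it only helps against bots that identify themselves honestly.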
Digital Cheese |
Goomba TDK Owner Level: 9 Posts: 31/31 EXP: 2521 Next: 641 Since: 03-03-23 From: Amerika Last post: 41 days ago Last view: 18 hours ago |
Posted by fruityloops
I'm pretty sure ByteDance is the company behind TikTok, so if that's true, being able to block it is even better. ____________________ My Website | YouTube Channel | 3DSPaint |
Generic aka RSDuck |
Newcomer Normal user Level: 6 Posts: 8/8 EXP: 845 Next: 62 Since: 12-06-19 Last post: 41 days ago Last view: 10 days ago |
the most malicious (in a good way) thing to do would be to feed back text already generated by the model.
That is hard to filter out and will subtly decrease the model's quality when it is trained on it. |