What's the plan to tackle the horde of incoming AI bots?

zer0@thelemmy.club · 1 year ago

What's the plan to tackle the horde of incoming AI bots?

Muddybulldog@mylemmy.win · 1 year ago

Somewhat of a loaded question but, if we need to scroll through their comment history meticulously to separate real from bot, does it really matter at that point?

SPAM is SPAM and we’re all in agreement that we don’t want bots junking up the communities with low effort content. However if they reach the point that it takes real effort to ferret them out they must be successfully driving some sort of engagement.

I’m not positive that’s a bad thing.

usrtrv@lemmy.ml · 1 year ago

I think we’ll be in bad shape when you can’t trust any opinions about products, media, politics, etc. Sure, shills currently exist, so everything you read already needs skepticism. But at some point bots will be able to flood communities with very high quality posts, and these will of course be lies to push a product or ideology. The truth will be noise.

I do think this is inevitable, and the only real guard would be to move back to smaller social circles.

Muddybulldog@mylemmy.win · 1 year ago

I’m of the mind that the truth already is noise and has been for a long, long time. AI isn’t introducing anything new, it’s just enabling faster creation of agenda-driven content. Most people already can’t identify the AI generated content that’s been spewing forth in years past. Most people aren’t looking for quality content, they’re looking for bias-affirming content. The overall quality is irrelevant.

zer0@thelemmy.club · 1 year ago

The outcome is that people will ditch platforms like Lemmy and seek true information somewhere else.

usernotfound@lemmy.ml · 1 year ago

Where did you have in mind?

HelloHotel@lemm.ee · 1 year ago

Things like ChatGPT are not designed to think using object relations like a human. It’s designed to respond the way a human would (a speech cortex with no brain): it is made to figure out what a human would respond with rather than give a well thought out answer.

Robert Miles can explain it better than I ever could.

PipedLinkBot@feddit.rocks · 1 year ago

Here is an alternative Piped link(s): https://piped.video/watch?v=w65p_IIp6JY

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

usernotfound@lemmy.ml · 1 year ago

BOT! KILL IT!

shagie@programming.dev · 1 year ago

On Usenet, spammers (bots weren’t so much a thing - but spammers were) when found, found their way into cancel messages rather promptly. The Breidbart Index was created to measure the severity of spam and trusted organizations were used by news hosts that would then cancel the spam messages from their feeds. This is widely used even today and if you look at the current feeds on Usenet for offered vs accepted
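For anyone who hasn't run into it: the Breidbart Index is usually described as the sum, over every copy of a message, of the square root of the number of newsgroups that copy was posted to. A minimal sketch of that arithmetic, where the function name and the sample numbers are made up for illustration and the threshold is a commonly cited convention rather than something any particular news host is guaranteed to use:

```python
import math

# Commonly cited cancel threshold (an assumption here, not a universal rule).
CANCEL_THRESHOLD = 20


def breidbart_index(newsgroups_per_copy):
    """Sum of sqrt(n) over all copies, where n is the number of
    newsgroups that copy was crossposted to."""
    return sum(math.sqrt(n) for n in newsgroups_per_copy)


# Hypothetical spam run: one copy crossposted to 9 groups, another to 16.
copies = [9, 16]
bi = breidbart_index(copies)  # sqrt(9) + sqrt(16)
is_cancellable = bi >= CANCEL_THRESHOLD
```

The square root is what makes one message crossposted to many groups count for less than many separate copies posted to one group each.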

Lemmy was designed with an anti-censorship goal which makes identifying and deleting spam from others more difficult. To the best of my understanding of how Lemmy implements ActivityPub (and ActivityPub has a bit of this too), there is no way to delete a message except by individual action of moderators of a /c/ or server admins. That is, if someone was to set up a dropship-spam-finder which federated with lemmy servers and then published delete messages… they would fail.

https://www.w3.org/wiki/ActivityPub/Primer/Delete_activity

Here are some important checks that implementers should make when they receive a Delete activity:

Check that the object’s creator is the same as the actor for the Delete activity. This could be stored in a number of ways; the attributedTo property may be used for this check.

This puts the burden of dealing with spam on the moderators of a /c/ and the server admins, who have to delete posts individually, block users, and possibly defederate sites. It may be useful in time to have some additional functionality that one could federate with for trusted Delete activity messages that would identify spammers and delete those messages from your instance… but that’s not something available today.
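The creator check quoted above from the primer can be sketched roughly like this. It is an illustration only, not how Lemmy actually implements it: the function name and the example URLs are hypothetical, and real ActivityPub objects can carry `actor` and `attributedTo` as nested objects or lists rather than plain strings.

```python
def delete_is_authorized(delete_activity, stored_object):
    """Accept a Delete only if its actor is the creator of the stored object.

    Assumes both values are plain ID strings, which is a simplification.
    """
    actor = delete_activity.get("actor")
    creator = stored_object.get("attributedTo")
    return actor is not None and actor == creator


# Hypothetical stored post and two incoming Delete activities.
post = {
    "id": "https://instance.example/post/1",
    "attributedTo": "https://instance.example/u/alice",
}
delete_from_author = {
    "type": "Delete",
    "actor": "https://instance.example/u/alice",
    "object": post["id"],
}
delete_from_spam_finder = {
    "type": "Delete",
    "actor": "https://spamfinder.example/u/cancelbot",
    "object": post["id"],
}

author_ok = delete_is_authorized(delete_from_author, post)
spam_finder_ok = delete_is_authorized(delete_from_spam_finder, post)
```

With a check like this in place, a third-party cancel service is rejected no matter how accurate its spam detection is, which is the difference from the Usenet model.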

zer0@thelemmy.club · 1 year ago

Could something like this be implemented as a nsfw filter you can turn on and off?

shagie@programming.dev · 1 year ago

I’m not going to say “no”, but NSFW filtering is done by a user supplied flag on an item.

There is work that is being done to add an auto mod… https://github.com/LemmyNet/lemmy/issues/3281 but that’s different than a cancel bot approach that Usenet uses.

Not saying it is impossible, just that the structure seems to be trying to replicate Reddit’s functionality (which isn’t federated) rather than Usenet’s functionality (which is federated)… and that trying to replicate the solution that works for Reddit may work at the individual sub level but wouldn’t work at the network level (compare: when spammers are identified on reddit their posts are removed across the entire system).

The Usenet cancel system is federated spam blocking (and according to spammers of old, Lumber Cartel censorship).

zer0@thelemmy.club · 1 year ago

Ahaha the lumber cartel thing is pretty funny. Anyway let me ask you shagie, from usenet what do you think went wrong that led us to the centralized services we have now? How do we not make the same mistake again?

shagie@programming.dev · edit-2 1 year ago

The tools that Usenet had to maintain its culture were insufficient compared to the spam and the arrival of the rest of the net (today is the 10925th day of September 1993). Moderation was based on barrier to entry (see also the moderation tooling of alt.sysadmin.recovery) or limited federation (bofh restricted hierarchy).

Combine this with the… reduction of, let’s call it ‘deep computer literacy’ with the techies - moving away from the command line and to web browsers and guis. This allowed people to get much of the content of the emerging web while drying up people arriving on Usenet.

While myspace and geocities allowed for the regular person to establish a web presence, these also were centralized systems. The web as a “stand up and self host” is far beyond the technical literacy level of most people… and frankly, those who do know how to do it don’t because keeping your web server up to date with the latest security patches or dealing with someone who is able to do an RCE on your AWS instance and run up your credit card bill is decidedly “not fun.”

And so, instead of running individual forums, you’ve got Reddit. It means that you don’t have to have deep knowledge of system administration or even if you do, spend your days patching servers in order to host and interact with people in the internet.

So, today…

New software needs to be turnkey and secure by default. Without this, instability of smaller instances will result in single large instances being the default. Consider WordPress… and you’ve got a few large servers that run for the regular person and do automated software updates and patching because I’ve got no business anymore running a PHP application somewhere without spending time doing regular patching. When (not if) Lemmy has an RCE security issue (and not just the “can inject scripts into places” level of problems but rather underlying machine compromised) there will be a “who is staying up to date with the latest patches for Lemmy and the underlying OS?” day of reckoning.

Communities (not /c/ but people) need to be able to protect the culture that is established through sufficient moderation tooling. The moderation tooling on Reddit is ok and supplemented by the reddit admins being able to take deeper actions against the more egregious problem users. That level of moderation tooling isn’t yet present for the ownership level moderation of a /c/ nor at the user level being able to remove themselves from interactions with other individuals.

Culture needs to be defined as something that is rather than something that is not. This touches on A Group is its Own Worst Enemy ( https://gwern.net/doc/technology/2005-shirky-agroupisitsownworstenemy.pdf ) which I highly recommend. Pay attention to Three Things to Accept and Four Things to Design For. Having a culture of “this is not reddit, but everything we are doing is a clone of reddit” is ultimately self defeating as Conway’s Law works both ways and you’ll get reddit again… with all the problems of federation added in (the moderation one being important).

On that culture point, given federation it is even more important to establish a positive culture (though not toxic positivity, which has its own problems). The alternative is a culture of discontents swearing because they can, with no moderation to say no and no equivalent of elder statesmen to establish and maintain a tone (tangent: very lightly moderated chat on a game I play has a distinctly different tone if the ‘elder statesmen’ of that particular section of the game are present and chatting or not… just being there and being reasonable and polite has the effect of discouraging trolls - it’s no fun to troll people who won’t get mad at you, and people seek to be as good as the elder statesmen of the channel).

So, as long as Lemmy is copying Reddit (and Mastodon is copying Twitter - though they’re doing a better job of not copying it now), and moderation isn’t solved, and the core group (read A Group) isn’t sufficiently empowered to set the tone, people will leave. Without a sufficiently large user base to engage with (and be able to discover other places as appropriate if one /c/ isn’t to one’s liking), blocking users is less palatable, and when you’re seeing a larger percentage of messages being ones that you’d rather not interact with… you’d leave. If you sat down at the bar and the guy next to you is swearing every other word because they can and the bartender won’t throw them out - you leave and avoid going back to that bar. Same is true of social media. Mastodon has the advantage that you’re interacting with individuals rather than communities.

On reddit, on subs I moderate (yes, I’m still there), I’ve got auto mod set up to filter all vulgarity. I approve nearly all of it, but it has also let me catch problems that are getting heated in word choice… and I can say “nope”, delete the comments (all the way down to the root) that are setting the wrong tone for the sub. And I’ve only had to do that twice in the past year.

So… there’s my big point. The rate of new people joining has to be equal to or greater than the people who leave because of cultural or technical reasons. Technical reasons are fixed by fixing the software. Cultural ones are done by giving the tools for moderation. And if a given community starts causing evaporation of people because local or all on an instance becomes not something that you’d enjoy seeing, the culture of the admins needs to be sufficiently empowered to boot it. The ideal of “anything as long as it isn’t illegal - we don’t censor anything” often results in a culture of the site that isn’t enjoyable to be part of. A ‘dangerous’ part of the fediverse is that that culture can spread to other instances much more easily.

… And that’s probably enough rambling now. Make sure you read A Group is its Own Worst Enemy though. While it was something from nearly two decades ago - the things that it talks of are timeless and should not be forgotten when designing social software.

RoundSparrow@lemmy.ml · edit-2 1 year ago

One of the cool things to me about Lemmy is it is like email where people have their own custom domain names. Personally I think people using their real identity should come back into fashion, and the post-9/11/2001 USA culture of terrorism fear-ism should not be the dominating media emotion in 2023.

“Real humans, not bots” for the ongoing Social Media reboot of Twitter since September 2022 and Reddit since May 2023 could really leverage it. The “throwaway” culture of Reddit.

ChatGPT GPT-4 is incredibly good at convincing human beings it gives factual information when it really is great at “sounding good, but being factually wrong”. It’s amazing to me how many people have embraced and even shown deep love towards the machines. It’s pretty weird to me that a computer fed facts spits out anti-facts. Back in March I was doing a lot of research on ChatGPT’s fabrication of facts. It made wild claims like Bill Gates traveled to New Mexico when BASIC was first created. It would even give page numbers from Bill Gates’s book that did not have the quotes it provided. https://www.AuthoredByComputer.com/ has examples I documented.

EDIT: another example: it would make up facts about simple computer chips and attribute them to a book, claiming they had more RAM in the chip than they did, etc: https://www.AuthoredByComputer.com/chatgpt4/chatgpt4-ibm-ps2-uart-2023-03-16a

Lmaydev@programming.dev · 1 year ago

It’s because it isn’t fed facts really. Words are converted into numbers and it understands the relationship between them.

It has absolutely no understanding of facts, just how words are used with other words.

It’s not like it’s looking up things in a database. It’s taking the provided words and applying a mathematical formula to create new words.
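To make the "words in, words out, no facts" point concrete, here is a toy next-word predictor. To be clear, this is a simple bigram counter and nowhere near what ChatGPT actually does internally; the function names and the sample text are invented for the example. It only shows how text can be produced from word statistics alone, with no notion of whether the result is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up training text; the model knows nothing beyond these word pairs.
model = train_bigrams("the cat sat on the mat the cat ate")
prediction = most_likely_next(model, "the")
```

The model picks "cat" after "the" purely because that pair showed up most often, not because it knows anything about cats.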

RoundSparrow@lemmy.ml · 1 year ago

It’s because it isn’t fed facts really.

That’s an interesting theory of why it works that way. Personally, I think rights usage, as in copyright, is a huge problem for OpenAI and Microsoft (Bing)… and they are trying to avoid paying money for the training material they use. And if they accurately quoted source material, they would run into expensive costs they are trying to avoid.

!aicopyright@lemm.ee

ezmack@lemmy.ml · 1 year ago

The horde aspect might make it easier. The ones on twitter at least you can tell are just running the same script through a thesaurus basically. 20 people leaving the same comment is a little more obvious than just one
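A rough sketch of how that kind of copy-paste horde could be flagged, assuming nothing about any real moderation tool: compare comments by word overlap (Jaccard similarity). The function names, sample comments and threshold are all made up for illustration.

```python
def word_set(comment):
    """Lowercased set of whitespace-separated words in a comment."""
    return set(comment.lower().split())


def jaccard(a, b):
    """Share of distinct words the two comments have in common (0.0 to 1.0)."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical comments from two different accounts.
first = "great product buy now"
second = "great product buy today"
similarity = jaccard(first, second)  # 3 shared words out of 5 distinct

# Arbitrary cutoff chosen for this example only.
looks_like_same_script = similarity >= 0.5
```

This only catches the thesaurus-swap style of bot; a comment that is fully reworded shares few words with the original and would slip straight past it.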

usernotfound@lemmy.ml · 1 year ago

That’s why they’re talking about the next generation.

With AI you can easily generate 100 different ways to say the same thing. And it’s hard to distinguish a bot that’s parroting someone else from a person who’s repeating something they heard.

Ocelot@lemmies.world · 1 year ago

The architecture of Lemmy means that the API is completely open. Bots do not need to even scrape or do anything with the website, they can do everything through the API. It doesn’t matter how simple the layout is. Lemmy is open source as well and the API is fully documented and available to the public.

Lemmy devs are going to need to do some additional work to differentiate bot/human accounts. In the meantime it’s going to be on the admins to identify and remove/ban these accounts.

sugar_in_your_tea@sh.itjust.works · 1 year ago

API is fully documented

Yes, but last I checked, it was documented very poorly. It’s just bad enough that I’m not super motivated to build helpful tools, but not bad enough to discourage trolls.