That’s a really clever login system.
It should all be opt in
Then you introduce self-selection bias and the data is worthless.
Aggregate data can be used to personally identify
You can’t identify someone based on how they interact with a service. If you spend 5 minutes on one page and 2 minutes on another that could be anyone. Even if you for some reason personally knew someone’s browsing habits it would be nearly impossible to pick them out in a sea of millions of data points.
I see you linked privacyguides.org in the thread as “alternatives”, one of the services it recommends is Proton (Mail, Drive, etc.). Look at their privacy policy:
2.1 Visiting proton.me or protonvpn.com website: We employ a local installation of self-developed analytics tools. Analytics are anonymized whenever possible and stored locally (and not on the cloud). IP addresses are not retained and stored for such analytics.
When you use our native applications, we (or the mobile app platform providers) may collect certain information. We may use mobile analytics software (e.g. fabric.io) app statistics and crash reporting, Play Store app statistics, App Store app statistics, or self-hosted Sentry crash reporting to send crash information to our developers in order to rapidly fix bugs.
Or how about addy.io that privacyguides recommends for email forwarding? From their privacy policy:
We use a self-hosted instance of Umami, an open-source, privacy-focused and lightweight option for website analytics. All the site measurement is carried out absolutely anonymously.
ALL online services collect this kind of data. Even the privacy-focused ones. There is nothing nefarious about it.
Like the comment I replied to already explained, this information is necessary to make informed development decisions. If you don’t know who is using what feature you might be wasting resources on something barely anyone uses while neglecting something everyone needs.
You also need some of that data for security purposes. You can’t implement rate limiting or prevent abuse if you can’t log and track how your services are being interacted with.
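To make the rate limiting point concrete, here is a rough sketch of a sliding-window limiter. Everything in it is made up for illustration (the class name, the limit of 3 requests per 60 seconds, the example IP), and it is not how any particular service does it. The point is just that the limiter has to remember who made requests and when, which is exactly the kind of logging people object to.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # client key -> timestamps of that client's recent requests
        self.hits = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        recent = self.hits[key]
        # Forget timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit, reject this request
        recent.append(now)
        return True


# Hypothetical client making requests at t = 0, 1, 2, 3 and 61 seconds.
limiter = SlidingWindowLimiter(limit=3, window=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

Without the per-client timestamps there is nothing to compare the fourth request against, so there is no way to reject it.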
And this is aggregate data. I can promise you not a single person cares about what any individual user is doing (assuming it’s not illegal).
Yeah, as someone who has worked in web development for over 20 years, everything in here is completely standard. Almost every major website in existence collects this kind of analytical data.
This happens to me constantly. Just the other day I asked some friends for something and they sent the literal exact opposite of that thing. Pretend I asked for blue with red stripes and they gave me green with yellow polka dots. And it wasn’t just one person, it was three separate people who all decided that made sense for some reason.
I was extremely specific too, even more than usual because I know people constantly misinterpret me. I made extra sure to not use any language with vague meanings and it still happened anyway. It’s like we live in alternate realities where words have completely different meanings.
It makes me not want to talk to people at all.
Again, even an exact copy is not stealing. It’s copyright infringement. Theft is a different crime.
But paraphrasing is not copyright infringement either. It’s no different than Wikipedia having a synopsis for every single episode of a TV series. Telling someone about what a work contains for informational purposes is perfectly fine.
Sorry, I misinterpreted what you meant. You said “any AI models” so I thought you meant the model itself should somehow know where the data came from. Obviously the companies training the models can catalog their data sources.
But besides that, if you work on AI you should know better than anyone that removing training data is counter to the goal of fixing overfitting. You need more data to make the model more generalized. All you’d be doing is making it more likely to reproduce existing material because it has less to work off of. That’s worse for everyone.
What you’re asking for is literally impossible.
A neural network is basically nothing more than a set of weights. If one word makes a weight go up by 0.0001 and then another word makes it go down by 0.0001, and you do that billions of times for billions of weights, how do you determine what in the data created those weights? Every single thing that’s in the training data had some kind of effect on everything else.
It’s like combining billions of buckets of water together in a pool and then taking out 1 cup from that and trying to figure out which buckets contributed to that cup. It doesn’t make any sense.
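Here is a toy version of the bucket problem, just to show the shape of it. It is a deliberately oversimplified sketch: one weight, nudges of exactly 0.0001 up or down, and two made-up training runs. Real networks are far messier than this, which only makes the problem harder.

```python
import random

STEP = 0.0001  # size of one nudge, as in the example above


def final_weight(nudges):
    """Add up +1/-1 nudges (counted in whole steps) into a single weight."""
    return sum(nudges) * STEP


rng = random.Random(0)

# Two hypothetical training runs with completely different data:
run_a = [+1] * 600 + [-1] * 400      # 1,000 examples
run_b = [+1] * 5100 + [-1] * 4900    # 10,000 examples
rng.shuffle(run_a)
rng.shuffle(run_b)

w_a = final_weight(run_a)
w_b = final_weight(run_b)

# Both runs end on the same weight, so the weight alone cannot tell you
# which examples (or even how many) went into it.
print(w_a == w_b)
```

Both runs net out to 200 steps up, so the finished weight is identical even though one run saw ten times as much data as the other.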
If the model isn’t overfitted it’s also not even copying. By their nature LLMs are transformative which is the whole point of fair use.
For me on Arch, Flatpaks are kinda useless. I can maybe see the appeal for other distros but Arch already has up-to-date versions of everything and anything that’s missing from the main repos is in the AUR.
I also don’t like how it’s a separate package manager, they take up more space, and to run things from the CLI it’s flatpak run com.website.Something instead of just something. It’s super cumbersome compared to using normal packages.
Same here. Switched to Arch in 2015 so I am also coming up on the 9 year mark. I have had very few issues, and the ones I have had were usually my fault for doing something stupid. I used Windows, OS X, and Ubuntu previously and compared to those Arch is a dream. Hence why I’ve stuck with it for so long now.
Obviously you’ve never used Arch btw. We live for the sudo pacman -Syu.
I pretty much never reboot the Pi. It currently has over 18 months of uptime on it. My NAS on the other hand I probably restart for one reason or another maybe once every 6 months. So yeah I’d say I reboot it minimum 3x more often.
Plus a reboot takes much longer on my NAS than on the Pi. The server board is slow to start, the SAS cards are slow to start, and unRAID is slow to start. Then I need to manually enter the password for disk encryption. Then wait for the array to start up. Then wait a bit more for the docker containers to start. Add all of that up and even the absolute fastest reboot is like 10 minutes while the Pi probably takes 30 seconds.
And what if I want to swap hard drives? Now it’s down for an hour. I guess I could wait until 3am to do all my upgrades so everyone is asleep, but I’d rather not. I suppose if it were just for myself it would matter a lot less. But again, it’s only $15 to not have to think about it at all.
I used to do that, but it comes with the problem of your DNS going down any time you want to restart or do a hardware swap on your NAS. Or since it was running in docker something as simple as reloading docker would knock out the internet for a few minutes. It’s worth the $15 to have them operate separately.
$80? I run mine on a Pi Zero that I got for $9 with a $6 wired network adapter for a grand total of $15. No problems for a household of five with one of us (me) being an extremely heavy user.
Yes it’s an exaggeration but it’s not far off. The one for $290 is the aforementioned AOC.
This isn’t a perfect list but pcpartpicker only has 15 monitors with HDR1000 or higher, with one being a duplicate, so it’s actually 14. If you remove the HDR filters there are 773 monitors.
That means only 14 out of 773 monitors support HDR properly. And that doesn’t even mean they’re good, just that they support it.
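For anyone who wants the actual percentage, the math works out like this (the variable names are just mine, the counts are the pcpartpicker numbers above):

```python
hdr1000_listed = 15   # monitors listed with HDR1000 or higher
duplicates = 1        # one listing is a duplicate
total = 773           # monitors listed with the HDR filters removed

hdr1000 = hdr1000_listed - duplicates
share = hdr1000 / total

print(f"{hdr1000} of {total} = {share:.1%}")  # 14 of 773 = 1.8%
```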
And oops I should have specified 27 inches or under, that is my bad. 27 inches is what I was shopping for recently. Personally I actually prefer 24 inches but they pretty much stopped making good 24 inch ones.
Yep I don’t even play that many games but I watch a lot of movies/TV. HDR works great in mpv. Couple of tweaks in your mpv.conf and you’re off to the races.
Yeah there are like 5 monitors with full array local dimming, most being $500+ except for that one AOC. And OLEDs are still $700+ and have burn-in after a year of desktop use.
Really? For me rspamd blocks at least 15 spam emails a day, usually from China or Russia. An additional 2-3 go to the junk folder, and some still slip through the cracks, especially if they’re coming from a Gmail address.
But it could be as simple as my email being publicly available (GitHub, my website, etc.) so scrapers are picking it up.