Internet Archive is in danger

Moorshou@lemmy.zip · edited 6 months ago

A1kmm@lemmy.amxl.com · 6 months ago

Blockchain is great for when you need global consensus on the ordering of events (e.g. Alice gave all her 5 ETH to Bob first, so a later transaction to give 5 ETH to Charlie is invalid). It is an unnecessarily expensive solution just for archival, since it necessitates storing the data on every node forever.
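To make the ordering example concrete, here is a minimal sketch of a ledger that applies transfers in an agreed order. The `apply_in_order` helper and the starting balance are hypothetical, chosen only to mirror the Alice/Bob/Charlie example; a real chain layers signatures, fees and consensus on top of this.

```python
def apply_in_order(balances, transfers):
    """Apply (sender, receiver, amount) transfers in order; reject overdrafts.

    Hypothetical helper for illustration, not any real chain's API.
    """
    balances = dict(balances)  # work on a copy so the input is untouched
    rejected = []
    for sender, receiver, amount in transfers:
        if balances.get(sender, 0) < amount:
            # The sender no longer has the funds, so this transfer is invalid.
            rejected.append((sender, receiver, amount))
            continue
        balances[sender] -= amount
        balances[receiver] = balances.get(receiver, 0) + amount
    return balances, rejected


start = {"Alice": 5}
transfers = [("Alice", "Bob", 5), ("Alice", "Charlie", 5)]
final, rejected = apply_in_order(start, transfers)
print(final)     # {'Alice': 0, 'Bob': 5}
print(rejected)  # [('Alice', 'Charlie', 5)]
```

Swapping the two transfers would pay Charlie and reject Bob instead, which is why every node has to agree on the order.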

Ethereum charges ‘gas’ fees per transaction, which helps ensure it doesn’t collapse under the weight of excess usage. Blocks have transaction limits, and transactions have size limits. It currently works out at about US$7,500 per MB of block data (which is stored forever and replicated to every node in the network). The Internet Archive apparently holds ~50 PB of data, which would cost US$371 trillion to put onto Ethereum (in practice, attempting this would push the price of ETH up further, and if they succeeded, most nodes would not be able to keep up with the network). Really, this just tells us that blockchain is not appropriate for that use case: the designers of real-world blockchains have created mechanisms that make it financially unviable to attempt at that scale, because it would effectively destroy the ability to operate nodes.
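As a rough check on that figure, the arithmetic can be reproduced from the two rounded numbers above. This sketch assumes decimal units (1 PB = 10^9 MB). With the rounded inputs it lands on about US$375 trillion, the same order of magnitude as the US$371 trillion quoted, which presumably came from a less rounded per-MB price.

```python
USD_PER_MB = 7_500   # rounded per-MB price quoted above
ARCHIVE_PB = 50      # approximate archive size quoted above
MB_PER_PB = 10**9    # assumption: decimal units (1 PB = 10^9 MB)

total_usd = USD_PER_MB * ARCHIVE_PB * MB_PER_PB
print(f"US${total_usd / 10**12:,.0f} trillion")  # US$375 trillion
```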

The only real reason to use an existing blockchain anyway would be the theory that it is too big to fail because of its legitimate business use cases, which would make censorship-resistant data stored on it too hard to remove. However, if it became used in the majority for censorship resistant data sharing, and transactions were the minority, I doubt that this would stop authorities going after node operators and so on.

The real problems that an archival project faces are:

- The cost of storing and retrieving large amounts of data. That could be decentralised using a solution where not all data is stored on a chain (for example, IPFS).
- The problem of curating data and deciding what is worth archiving, and what is a true-to-source archive vs fake copy. This probably requires either a centralised trusted party, or maybe a voting system.
- The problem of censorship. Anonymity and opaqueness about what is on a particular node can help - but they might in some cases undermine the other goals of archival.
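A minimal sketch of the content-addressing idea behind systems like IPFS: data is stored under the hash of its own bytes, so any node can hold it and any reader can check a copy against its address. The `ContentStore` class is hypothetical and uses a plain SHA-256 hex digest, which is a simplification of IPFS's actual content identifiers.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 digest of the data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so a tampered or fake copy is detected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"<html>archived page</html>")
assert store.get(address) == b"<html>archived page</html>"

# Simulate a node serving an altered copy under the same address.
store._blobs[address] = b"<html>altered page</html>"
try:
    store.get(address)
except ValueError as err:
    print("rejected:", err)
```

This only checks a copy against a digest you already trust. Deciding which digest is the true-to-source one is still the curation problem from the list above.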

parpol@programming.dev · 6 months ago

You suggest IPFS, but isn’t that what web3 is?

Web3 is blockchain + IPFS and/or torrents or whatever p2p protocol.

I am not suggesting storing the data itself on the blockchain, only the index: the equivalent of simple HTML pages, kept on the blockchain so we never lose track of the data we share with torrents or whatever peer-to-peer protocol.
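Roughly what I have in mind, as a sketch rather than a real design: an append-only index where each record holds a URL, the hash of the off-chain content, and a link to the previous record. The `make_entry` and `verify` helpers, the record layout and the example URLs are all hypothetical, and the price is the rounded US$7,500 per MB quoted earlier in the thread.

```python
import hashlib
import json

USD_PER_MB = 7_500  # rounded on-chain price quoted earlier in the thread


def _encode(prev_hash: str, url: str, content_digest: str) -> bytes:
    body = {"prev": prev_hash, "url": url, "digest": content_digest}
    return json.dumps(body, sort_keys=True).encode()


def make_entry(prev_hash: str, url: str, content_digest: str) -> dict:
    """Build one index record that links back to the previous record's hash."""
    encoded = _encode(prev_hash, url, content_digest)
    return {
        "prev": prev_hash,
        "url": url,
        "digest": content_digest,
        "hash": hashlib.sha256(encoded).hexdigest(),
        "size": len(encoded),
    }


def verify(index: list) -> bool:
    """Check that every record's hash and back-link are intact."""
    prev = "0" * 64
    for entry in index:
        encoded = _encode(entry["prev"], entry["url"], entry["digest"])
        if entry["prev"] != prev:
            return False
        if entry["hash"] != hashlib.sha256(encoded).hexdigest():
            return False
        prev = entry["hash"]
    return True


page = b"<html>archived page</html>"  # the data itself stays off-chain
digest = hashlib.sha256(page).hexdigest()
index = [make_entry("0" * 64, "https://example.org/", digest)]
index.append(make_entry(index[-1]["hash"], "https://example.org/about", digest))

print(verify(index))  # True
cost = index[0]["size"] / 10**6 * USD_PER_MB
print(f"{index[0]['size']} bytes per record, about US${cost:.2f} at the quoted rate")
```

Each record is a couple of hundred bytes regardless of how large the archived file is, which is the whole point of keeping only the index on-chain.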

> However, if it became used in the majority for censorship resistant data sharing, and transactions were the minority, I doubt that this would stop authorities going after node operators and so on.

I doubt it would exceed transactions, but if it did, authorities would need a global agreement with every single nation to take down all nodes, and that is never happening.

> The problem of curating data and deciding what is worth archiving, and what is a true-to-source archive vs fake copy. This probably requires either a centralised trusted party, or maybe a voting system.

I agree with you on this, but a voting system doesn’t sound too difficult to implement. Alternatively, the Internet Archive could be that centralized trusted party. Arresting them for reporting on what data is correct would surely be unconstitutional.
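For the voting part, a bare-bones sketch of what the tally could look like: curators each vote for the content hash they believe is the true copy, and a hash wins only with a quorum and a two-thirds majority. The `tally` function, the curator names, the placeholder digests and the thresholds are all made up for illustration. It assumes one vote per known curator and does nothing about fake identities.

```python
from collections import Counter


def tally(votes: dict, quorum: int = 3, threshold: float = 2 / 3):
    """Return the winning digest, or None without a quorum or clear majority."""
    if len(votes) < quorum:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count / len(votes) >= threshold else None


# Hypothetical curators voting on which digest is the true-to-source copy.
votes = {"curator-a": "abc123", "curator-b": "abc123", "curator-c": "def456"}
print(tally(votes))  # abc123
```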