How does it work? Does it download all of Wikipedia (or whatever other site I point it at, like Project Gutenberg, as it says on the website) up front, or does it download pages as I access them?
Both the Internet Archive and Wikipedia are really important. I suppose the answer is to build alternatives, or to data-hoard, in these difficult times.
Crawling the web is an important right for access to information. I don't think big crawlers should dominate the market, especially since Google is no longer up to the task of finding what you're actually looking for.
You see this on GitHub already. People publish paper results and manuals, along with a few files, and treat that as if it were open source. And this isn't limited to LLMs: people with CNN papers, crawlers, and other projects publish a few files and their results on GitHub as if that were open source. I think this is a clash between current scientific-community thinking plus Big Tech on one side, and Free Software and Free Culture initiatives on the other.
Additionally, you can’t expect something Microsoft/Meta touches to remain untainted for long.
I completely get that someone used to monopolies can’t understand Mastodon. I don’t think it has anything to do with understanding technology, though.
I think that if we work together as people, we can achieve more than a couple of good organizations that may fade. That doesn't mean I wouldn't donate to Wikipedia or the Internet Archive. The goal isn't to compete; it isn't a business. It's just to make things stronger.