How does it work? Does it download all of Wikipedia (or whatever other site I point it at, like Project Gutenberg, as it says on the website) up front, or does it download pages as I access them?
Both the Internet Archive and Wikipedia are really important. I suppose the answer is to build alternatives, or to data-hoard, in these difficult times.
Crawling the web is an important right for access to information. I don't think big crawlers should dominate the market, especially since Google is no longer up to the task of finding what you're actually looking for.
You see this on GitHub already. People publish paper results and manuals, along with a few files, and treat that as if it were open source. And this isn't limited to LLMs: people with CNN papers, crawlers, and other projects publish a few files and their results on GitHub as if that were open source. I think this is a clash between current scientific-community thinking plus Big Tech on one side, and Free Software and Free Culture initiatives on the other.
Additionally, you can’t expect something Microsoft/Meta touches to remain untainted for long.
I completely get that someone used to monopolies can’t understand Mastodon. I don’t think it has anything to do with understanding technology, though.
I think that if we work together as people, we can achieve more than a couple of good organizations that may fade. That doesn't mean I wouldn't donate to Wikipedia or the Internet Archive. The goal isn't to compete; it isn't a business. It's just to make things stronger.