• Mika@sopuli.xyz · 3 days ago

    I can almost guarantee that hundred-billion-parameter LLMs are not trained on that; they are trained on the whole web, scraped to the furthest extent.

    The only sane and ethical solution going forward is to force all LLMs to be open-sourced. Use the datasets generated by humanity - give back to humanity.

    • Skullgrid@lemmy.world · 3 days ago

      The only sane and ethical solution going forward is to force all LLMs to be open-sourced.

      Jesus fucking christ. There are SO GODDAMN MANY open-source LLMs, even from fucking scumbags like Facebook. I get that there are subtleties to the argument on the ProAI vs AntiAI side, but you guys just screech and scream.

      https://github.com/eugeneyan/open-llms

      • vrighter@discuss.tchncs.de · 1 day ago

        There are barely any; I can't name a single one offhand. Open weights means absolutely nothing about the actual source of those weights.

      • Mika@sopuli.xyz · 3 days ago

        even meta

        Lol, ofc Meta, they have the biggest big-data trove out there, full of private data.

        Most of the open-source models are derivatives of existing open-source LLMs.

        And the page you've linked lists mostly <10B-parameter models, bar the LLMs with huge financing, which generally have either corporate or Chinese backers.