Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • bryndos@fedia.io
    1 day ago

    Is there such a thing as a modular FPGA, so that you could “plug in” another one to add more gates and sort of daisy-chain them? I don’t know if such interfaces exist; it sounds like it might need a lot of bandwidth.

    • iceberg314@midwest.social
      17 hours ago

      I bet you could! The interface can literally be whatever you want with FPGAs. You’d just have to keep things organized and program them one at a time, I think.

    • morto@piefed.social
      22 hours ago

      I know very little about FPGAs, so I can’t answer your question, but let’s hope someone else can.