

Yes, it is. But I run llama-swap and Open WebUI. If you spend some time on the llama-swap configuration, you have a good chance of running the model on 2 cards through llama.cpp. The gain, however, will of course not be 2x, and it scales sublinearly with the number of cards. You also need a motherboard with enough PCI-E lanes (2 × PCI-E x16 or more). But it's still cheaper than one large card. Example:
# ROCm: expose both cards and split the model's layers evenly between them.
HIP_VISIBLE_DEVICES=0,1 \
/opt/llama.cpp/build/bin/llama-server \
  --host 127.0.0.1 \
  --port 8082 \
  --model /storage/models/model.gguf \
  --n-gpu-layers all \
  --split-mode layer \
  --tensor-split 1,1 \
  --ctx-size 32768 \
  --batch-size 512 \
  --ubatch-size 512 \
  --flash-attn on \
  --parallel 1
There is a less stable but faster option: --split-mode row
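Since llama-swap came up: here is a minimal sketch of wiring that same command into a llama-swap config. The model name, config path, and ttl value are placeholders, and the env/${PORT} fields are how I recall current llama-swap versions documenting them; check the README of your version for the exact schema.
# Hypothetical llama-swap entry; llama-swap substitutes ${PORT} and proxies requests to it.
cat > /opt/llama-swap/config.yaml <<'EOF'
models:
  "my-model":
    env:
      - "HIP_VISIBLE_DEVICES=0,1"
    cmd: >
      /opt/llama.cpp/build/bin/llama-server
      --host 127.0.0.1 --port ${PORT}
      --model /storage/models/model.gguf
      --n-gpu-layers all --split-mode layer --tensor-split 1,1
      --ctx-size 32768 --flash-attn on
    ttl: 300   # unload the model after 5 minutes of idle time
EOF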
P.S. By the way, a single RX 9070 XT on my instance translates posts and comments. You can test it if you want. =)
If you have an uplink of 1 Gbit/s or less, you can easily solve the shortage of ports by buying a switch for $3. By the way, there are mini PCs with 4/6/8 ports, and some even with fiber ports.
And in general, if the topic starter is building his own server, he can just turn it into a router too. The set of programs is not very large: kea-dhcp, radvd, iptables, and that's all (see the sketch below). For Wi-Fi you will need a compatible card in the server, or a separate access point like Ubiquiti.
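A rough sketch of the router side, assuming eth0 is the WAN uplink and eth1 the LAN interface (interface names and addresses are placeholders, and radvd/IPv6 is left out for brevity):
# Enable IPv4 forwarding (persist it via /etc/sysctl.d/ in practice).
sysctl -w net.ipv4.ip_forward=1

# NAT the LAN out through the uplink and allow return traffic back in.
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Minimal kea-dhcp4 subnet definition for the LAN.
cat > /etc/kea/kea-dhcp4.conf <<'EOF'
{ "Dhcp4": {
    "interfaces-config": { "interfaces": [ "eth1" ] },
    "subnet4": [ {
        "id": 1,
        "subnet": "192.168.1.0/24",
        "pools": [ { "pool": "192.168.1.100 - 192.168.1.200" } ],
        "option-data": [
            { "name": "routers", "data": "192.168.1.1" },
            { "name": "domain-name-servers", "data": "1.1.1.1" }
        ]
    } ]
} }
EOF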