I am planning to build a multipurpose home server. It will be a NAS, virtualization host, and have the typical selfhosted services. I want all of these services to have high uptime and be protected from power surges/balckouts, so I will put my server on a UPS.

I also want to run an LLM server on this machine, so I plan to add one or more GPUs and pass them through to a VM. I do not care about high uptime on the LLM server. However, this of course means that I will need a more powerful UPS, which I do not have the space for.

My plan is to get a second power supply to power only the GPUs. I do not want to put this PSU on the UPS. I will turn on the second PSU via an Add2PSU.

In the event of a blackout, this means that the base system will get full power and the GPUs will get power via the PCIe slot, but they will lose the power from the dedicated power plug.

Obviously this will slow down or kill the LLM server, but will this have an effect on the rest of the system?

  • antsu@lemmy.wtf
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    What you want are two servers, one for each purpose. What you are proposing is very janky and will compromise the reliability of your services.

  • just_another_person@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    The amount of absolutely wrong answers in here is astounding.

    NO. PCIE is not plug and play. Moreover, having a dead PCIE device that was previously accepting information, and then suddenly stops, is almost guaranteed to cause a kernel panic on any OS because of an overflowing bus of tons of data that can’t just sit there waiting. It’s a house of cards at that point. It’s also going to possibly harm the physical machine when the power comes back on due to a sudden influx of power from an outside PSU powering up a device not meant for such things.

    Why wouldn’t instead think of maybe NOT running an insane workload on such a machine with insanely power hungry GPUs, and maybe go for an AMD APU instead? Then you’ll get all the things you want.

    • catloaf@lemm.ee
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      4 months ago

      PCIe is absolutely plug and play. Cards have been PnP since the ISA era. You probably meant hot-plug, but it’s hot-pluggable too: https://lwn.net/Articles/767885/

      Any buffered data will sit in the buffer, and eventually be dropped. Any data sent to the buffer while the buffer is full will be dropped. I’m not intimately familiar with communicating with GPUs, but I imagine the only buffers are in the GPU driver (which would either handle the removal or crash) or in the application (which would probably not handle the removal and just crash). Buffering is not really where I would expect to see a problem.

      That said, a GPU disappearing unexpectedly will probably crash your program, if not your whole OS. Physical damage is unlikely, though I definitely wouldn’t recommend connecting two PSUs to one system due to the potential for unexpected… well, potential. Inrush current wouldn’t really be my concern, since it would be pulling from the external PSU which should have plenty of capacity (and over-current protection too, I would hope). And it’s mostly a concern for AC systems, rarely for DC.

  • Decronym@lemmy.decronym.xyzB
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    4 months ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters More Letters
    NAS Network-Attached Storage
    PCIe Peripheral Component Interconnect Express
    PSU Power Supply Unit
    PoE Power over Ethernet
    SSL Secure Sockets Layer, for transparent encryption

    5 acronyms in this thread; the most compressed thread commented on today has 4 acronyms.

    [Thread #806 for this sub, first seen 15th Jun 2024, 14:15] [FAQ] [Full list] [Contact] [Source code]

  • Ptsf@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    Most UPS systems of quality will come with software capabilities. You can leverage this and just use a daemon to check the charge status every minute or so. If it’s ever off AC or reporting charge levels lowering, you can toss the system into a low power profile. This might accomplish what you’re trying to do.

  • You999@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    I think the safe option would be to use a smart UPS and Network UPS Tools to shutdown the LLM virtual machine when it’s running on battery. I do something similar with my NAS as it’s running on an older dell R510 so when the UPS goes onto battery it’ll safely shut down that whole machine to extend how long my networking gear will stay powered.