Let’s say I have reasonable hardware and want to create some number n of virtual machines nested inside each other, for no particular reason other than to experiment. I have a few questions:

What would be the best way to approach the problem?

What weird problems would I encounter?

After how many levels would my computer become unusable?

Would the loss of performance from a level to another be linear or exponential?

What’s the deepest level of nesting reached?

  • litchralee@sh.itjust.works · 70 points · 1 day ago

    I’ll take a stab at the question, but we will need to dive into a small amount of computer engineering to explain. To start, I am going to assume an x86_64 platform, because while ARM64 and other platforms do support hardware-virtualization, x86_64 is the most popular and most relevant since its introduction at the beginning of the 2000s. My next assumption is that we are using non-ancient hardware, for reasons that will become clear.

    As a base concept, a virtual machine means that we have a guest OS that runs subordinate to a host OS on the same piece of hardware. The host OS essentially treats the guest OS as though it were just another userspace process, giving the guest some time on the CPU however the host sees fit. The guest OS, meanwhile, is itself a full-blown OS that manages its own userspace processes, divvying out whatever CPU time and memory it can get from the host OS, essentially identically to how it would behave if it were running directly on hardware.

    The most rudimentary form of virtual machine isolation was achieved back in the 1960s, with software-based virtual machines. This meant that the host emulated every single instruction that the guest OS would issue, recreating every side-effect and memory access that the guest wanted. In this way, the guest OS could run without change, and could even have been written for an entirely different CPU architecture. The IBM System/360 family of mainframes could do this, as a way of assuring business customers that their old software could still run on new hardware.
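    To make the idea concrete, here is a toy sketch of software-based virtualization: the host interprets every "guest" instruction itself, reproducing each side-effect in its own data structures. The three-opcode instruction set here is entirely made up for illustration; a real emulator models a full ISA.

```python
def emulate(program, regs=None):
    """Interpret a list of (opcode, *operands) tuples one at a time.

    The host never hands the guest's instructions to the real CPU;
    it reproduces their effects itself, one by one.
    """
    regs = regs or {}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":        # LOAD reg, immediate
            regs[args[0]] = args[1]
        elif op == "ADD":       # ADD dst, src  (dst += src)
            regs[args[0]] += regs[args[1]]
        elif op == "HALT":
            break
        pc += 1
    return regs

# The "guest program" computes 2 + 3 entirely under host control.
guest = [("LOAD", "r0", 2), ("LOAD", "r1", 3), ("ADD", "r0", "r1"), ("HALT",)]
print(emulate(guest))   # {'r0': 5, 'r1': 3}
```

    This is why emulation is slow: every guest instruction costs the host many instructions of its own.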

    The drawback is that performance is generally less-than-stellar, but in an era that valued program correctness, this worked rather well. The idea would also carry into higher-level languages, most notably the Java Virtual Machine (JVM). The Java language generally compiles down to bytecode suitable to run on the JVM (which doesn’t exist as physical hardware), and real machines then essentially run a JVM emulator to actually execute the program. In this way, Java is a high-level language that can run anywhere, provided a JVM implementation exists for the platform.

    An advancement over software virtualization is hardware-assisted virtualization, where some amount of the emulation task is offloaded to the machine itself. This is most relevant when virtualizing the same CPU architecture, such as an x86_64 guest on an x86_64 host. The idea is that lots of instructions have no side-effects that affect the host, and so can run natively on the CPU, returning control to the host only upon reaching an instruction that has side-effects. For example, the basic arithmetic operation of adding two registers imposes no risks to the stability of the machine.

    Hardware-assisted virtualization requires that the hardware can intercept (or “trap”) such instructions as they appear, since the nature of branches and conditionals means we can’t detect in advance whether the guest OS will issue those instructions or not. The CPU will merrily execute all the “safe” instructions within the scope of the guest, but the moment it sees an “unsafe” instruction, it must stop and kick control back to the host OS, which can then deal with that instruction in the original, emulated fashion.
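    The trap-and-emulate flow can be sketched like this. The opcodes and the "unsafe" set are invented for illustration; on real hardware the CPU itself performs the trap, not a software loop.

```python
# Made-up privileged opcodes that must not run directly under the guest.
UNSAFE = {"OUT", "CR_WRITE", "HLT"}

def host_emulate(op, args):
    # The host reproduces the side effects of the privileged instruction,
    # then hands control back to the guest.
    pass

def run_guest(program):
    """Count how the guest's instructions are handled."""
    executed_natively, traps = 0, 0
    for op, *args in program:
        if op in UNSAFE:
            traps += 1                 # control kicks back to the host
            host_emulate(op, args)
        else:
            executed_natively += 1     # would run on the bare CPU
    return executed_natively, traps

program = [("ADD",), ("MUL",), ("OUT", 0x3F8), ("ADD",), ("HLT",)]
print(run_guest(program))   # (3, 2): three native instructions, two traps
```

    The fewer traps a workload causes, the closer it runs to native speed, which is the whole appeal of hardware assistance.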

    The benefit is that the guest OS remains unmodified (yay for program correctness!) while getting a substantial speed boost compared to emulation. The drawback is that we need the hardware to help us. Fortunately, Intel and AMD rose to the challenge once x86-on-x86 software virtualization started to show its worth in the early 2000s, when VMware et al. demonstrated that the concept was feasible on x86. Intel VT-x and AMD-V are the hardware helpers, introducing a new set of instructions that the host can issue, which cause the CPU to start executing guest OS instructions until trapping and returning control to the host.

    I will pause to note why same-on-same CPU architecture virtualization is even desirable, since compared to the emulation-oriented history, this might not seem immediately useful. Essentially, software-based virtualization achieved two goals, the latter of which would become extremely relevant only decades later: 1) allow running a nested “machine”, and 2) isolate the nested machine from the parent machine. When emulation was a given, isolation was practically assured. But for same-on-same virtualization, the benefit of isolation is all that remains. And that proved commercially viable when silicon hit a roadblock at ~4 GHz, and we were unable to make practical single-core CPUs go any faster.

    That meant that growing compute would come in the form of multiple cores per CPU chip, and this overlapped with a problem in the server market: having a separate physical server for your web server, your database server, and your proxy server all costs money. But with new CPUs having multiple cores, it would save a bunch of money to consolidate these disparate servers onto the same physical machine, so long as they could be assured that they were still logically running independently. That is to say, if only they were isolated.

    Lo and behold, Intel VT-x and AMD-V were introduced just as core counts were scaling up in the 2010s. And this worked decently well, since hardware-assisted virtualization was a fair order of magnitude faster than trying to emulate x86, which we could have done but was just too slow to commercialize.


    Some problems quickly emerged, due to the limitations of the hardware assistance. The first has to do with how the guest OS expects to operate, and the second to do with how memory in general is accessed in a performant manner. The fix for these problems involves more hardware assistance features, but also relaxing the requirement that the guest OS remain unchanged. When the guest OS is modified to be better virtualized, this is known as paravirtualization.

    All modern multi-tasking OSes with non-trivial amounts of memory (which includes all guest OSes we care about) do not organize their accessible memory as though it were a “flat” plane of memory. Rather, memory is typically “paged” – meaning that it’s divvied out in pre-ordained chunks, such as 4096 bytes – and frequently also makes use of “virtual memory”. Unfortunately, this is a clash in nomenclature, since “virtual memory” long predates virtualization. But understand that “virtual memory” means that userspace programs won’t see physical addresses for their pointers, but rather fictional addresses which are cleverly mapped back to physical addresses.

    When combining virtual memory with pages, the OS is able to give userspace programs the appearance of near-unlimited, contiguous memory, even though the physical memory behind those virtual addresses is scattered all over the place. This is a defining feature of an OS: to organize and present memory sensibly.
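    A minimal sketch of that translation, with an invented page table: a virtual address splits into a page number and an offset, and the page table maps page numbers to scattered physical frames.

```python
PAGE_SIZE = 4096  # the pre-ordained chunk size mentioned above

# Virtual page number -> physical frame number (made-up mapping).
page_table = {0: 7, 1: 3, 2: 42}

def translate(vaddr):
    """Map a virtual address to a physical one via the page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[page]   # a missing entry here would be a page fault
    return frame * PAGE_SIZE + offset

# Virtual addresses 0..8191 look contiguous to the process,
# but actually land in physical frames 7 and 3.
print(translate(100))        # 7*4096 + 100 = 28772
print(translate(4096 + 5))   # 3*4096 + 5  = 12293
```

    On real hardware the MMU does this lookup on every memory access, cached by the TLB, which is why it stays fast.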

    The problem for virtualization is that if the host OS is already doing virtual+paged memory management, then it forces the guest OS to live within the host’s virtual+paged environment, all while the guest OS also wants to do its own virtual+paged memory management to service its own processes. While the host OS can rely upon the physical MMU to efficiently implement virtual+paged memory management, the guest OS cannot. And so the guest OS is always slowed down by the host having to emulate this job.

    The second issue relates to caching: a CPU can accelerate memory accesses by fetching larger chunks of memory than what the program is currently accessing, in anticipation. This works remarkably well, but only if the program has some sense of locality – that is, if the program isn’t reading randomly from memory. But from the hardware’s perspective, it sees the host OS, the guest OS, and all their processes, whose combined access pattern starts to look close to random when they’re all running in tandem, and that deeply impacts caching performance.

    The hardware solution is to introduce an MMU that is amenable to virtualization, one which can manage both the host OS’s paged+virtual memory as well as any guest OS’s paged+virtual memory. Generally, this is known as Second Level Address Translation (SLAT) and is implemented as AMD’s Rapid Virtualization Indexing (RVI) or Intel’s Extended Page Tables (EPT). This feature allows the MMU to consider page tables – the basic unit of any MMU – that nest below a superior page table. In this way, the host OS can delegate a range of pages to the guest, and the guest OS can manage those pages, all while the MMU gives the guest OS some acceleration because this is all done in hardware.
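    Extending the earlier single-level sketch, SLAT-style translation is two mappings chained together: the guest’s own page table (guest-virtual to guest-physical), then the host’s nested table (guest-physical to host-physical). All frame numbers here are made up.

```python
PAGE_SIZE = 4096

guest_pt = {0: 2, 1: 5}   # guest virtual page  -> guest physical frame
host_pt  = {2: 9, 5: 1}   # guest physical frame -> host physical frame

def slat_translate(guest_vaddr):
    """Two-stage translation, which SLAT hardware performs in one walk."""
    page, offset = divmod(guest_vaddr, PAGE_SIZE)
    gpa_frame = guest_pt[page]       # stage 1: the guest's own mapping
    hpa_frame = host_pt[gpa_frame]   # stage 2: the host's nested mapping
    return hpa_frame * PAGE_SIZE + offset

print(slat_translate(10))          # 9*4096 + 10 = 36874
print(slat_translate(4096 + 10))   # 1*4096 + 10 = 4106
```

    Without SLAT, the host must emulate stage 2 in software (“shadow page tables”), which is exactly the slowdown described above; with SLAT, the MMU walks both levels itself.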

    This also helps with the caching situation, since if the MMU is aware that the memory is in a nested page table (i.e. guest OS memory), then the existing cache for the host is likely irrelevant, and vice-versa. An optimization would be to split the cache space, so that it remains relevant only to the host or to the guest, without mixing up the two.


    With all that said, we can now answer your question about what would happen. With hardware extensions like VT-x and SLAT, I would expect that cascading VMs would consume CPU and memory resources roughly linearly, since each guest OS adds its own overhead and runs its own kernel. At some point, memory performance would slow to a crawl, since there’s a limit on how much the physical cache can be split. But CPU performance would likely be just fine, such as if you ran a calculation for digits of Pi on the 50th inner VM. Such calculations tend to use CPU registers rather than DDR memory, and so could run natively on the CPU without trapping out to any of the layers above.
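    A back-of-the-envelope model of that roughly-linear memory cost, with entirely invented numbers: if each nesting level burns a fixed chunk of RAM on its kernel plus hypervisor bookkeeping, usable memory shrinks linearly with depth until it runs out.

```python
# All figures are illustrative guesses, not measurements.
TOTAL_RAM_GB = 32
PER_LEVEL_OVERHEAD_GB = 1.5   # guest kernel + hypervisor bookkeeping per level

def usable_at_depth(n):
    """RAM left for the innermost guest after n nesting levels."""
    return TOTAL_RAM_GB - n * PER_LEVEL_OVERHEAD_GB

for depth in (1, 5, 10, 20):
    print(depth, usable_at_depth(depth))
# In this toy model, memory runs out around depth 21 (32 / 1.5 ≈ 21),
# though cache splitting would make things crawl well before that.
```

    The real per-level overhead depends heavily on the guest OS and hypervisor chosen, which is why a minimal distro stretches the depth furthest.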

    But I like the other commenter’s idea of just trying it and see what happens.

  • zxqwas@lemmy.world · 25 points · 1 day ago

    Best approach: Try it. Tell us how it goes. It seems interesting to me but not interesting enough to spend an afternoon doing it.

    Weird problems: some virtualising software uses specific hardware features and just outright does not work in a virtual machine.

    Your limit should be roughly how many times bigger your actual CPU or RAM is compared to what is used by the operating system plus virtualization software.

  • new_guy@lemmy.world · 8 points · 1 day ago

    I’ve seen a guy start from Windows 10 or 11 and nest VMs with older versions of Windows, going down to maybe 98 or 95.

    Mind you, it took a few days to actually install Windows 98, so… Good luck

  • notfromhere@lemmy.ml · 8 points · 1 day ago

    Each new VM would get a smaller and smaller amount of resources available to it, and as you nest VMs, things start slowing down due to overhead. Think of a hallway. You have to take messages from one end to the other end for instruction processing. Each further nested VM has a longer and longer hallway.

  • slazer2au@lemmy.world · 6 points · 1 day ago

    Would entirely depend on what your starting hardware is like.

    My 8-core desktop with 32 GB of RAM would do less than the HP DL380 with 24 cores and 192 GB of RAM.

    It would also depend on which hypervisor you use. Microsoft Hyper-V would likely run worse than ESXi or QEMU.

    How would I approach this? Start with something like Damn Small Linux or Alpine Linux for the smallest footprint. Install QEMU and allocate everything to the VM. Rinse and repeat until I get an out-of-resources error.

  • troed@fedia.io · 4 points · 1 day ago

    I’ve done Red Hat in VMware on Windows 10, and then run Red Hat VMs two layers deep with HVM/KVM under that one.

    Not recommended, really unstable.

  • BananaTrifleViolin@lemmy.world · 4 points · 1 day ago

    Interesting question. I’d imagine that one major limit would be the number of cores your CPU has available. Once you got to more VMs than cores, I’d guess things would quickly grind to a halt?

    But I wonder if you could even get anywhere near that point: on searching, only L2 nesting (a VM inside a VM) is mentioned on various sites, and that comes with warnings of severe performance limitations and is recommended for development testing only. While L3 might work, the problems may get so bad that you can’t practically go beyond that level?

  • just_another_person@lemmy.world · 3 points · 1 day ago

    I would say somewhere between 0 and 2 if you’re using a hardware-based hypervisor, because the ability to address certain host hardware pretty much stops after Host>VM>VM (there are caveats), and the returns diminish immediately after the first.

    If using a software hypervisor, you can probably go as far as your resources will let you, but I’d guess about 3 or 4 layers deep on most commercial hardware, because the resources the hypervisor needs to track all the layer translations will balloon quite quickly.

  • ReallyZen@lemmy.ml · 2 points · 1 day ago

    By spinning up headless server machines, you’d probably run way more of them. And the process of allocating resources, transferring the image, and spinning it up will probably be much faster.

    Tell us how it goes!