
Current state of GC in Java 23
The JVM is famous for its memory management. Automatic memory cleaning sits at the heart of the Java ecosystem, with the garbage collector doing its job all the time. The whole concept is very broad: you could certainly get a PhD in the area and there would still be something left to learn. My ambition here is not that high. What I plan to create is a concise summary of the GC solutions present in modern Java, using Java 23 as a reference. Let’s go!
Ok. One step back. Where are we even going?
When it comes to garbage collection as a generic process, I have a confession to make. On my previous blog, I dedicated a whole series of posts to that topic, covering classic books like ‘Garbage Collection Algorithms for Automatic Dynamic Memory Management’ and ‘The Garbage Collection Handbook’. If you are interested in the specifics of GC algorithms, the linked material is a good place to start. In this blog post, however, I will concentrate on how GC works in the JVM, without going into the nitty-gritty details. Every time a new language version appears, the existing collectors get updated. What is more, new ones are being added too!
That sounds fun. However, this knowledge is usually not applied in practice. Piotr Przybył, a Java Champion, recently gave a presentation (in Polish) about new features in Java 21. What I find interesting is that during the talk the audience was asked a couple of questions (the linked talk shows a picture of the audience). The questions were about using GCs other than the default one, and about GC tuning in general. Almost no hands were raised. Of course, it is a single anecdote that proves nothing. However, let’s be honest: the knowledge presented here is not a must-have for every Java developer.
I could leave it at that, but obviously I won’t 😉 As part of my SeniorDevRevamp project, this topic was high on my to-do list from the very beginning. As I am a keen believer in writing things down before they fade from short-term memory, let’s not wait any longer.
JMM&GC 101
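It is impossible to fully understand GC in Java without a quick recap of how the JVM lays out memory. (This is often loosely called the Java Memory Model, although strictly speaking the JMM is the specification of how threads interact through memory.) As GC frees memory, we must know what that memory looks like. Let’s start with the overall view of the JVM architecture.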
What we are interested in are the boxes within the Runtime Data Area: the heap and the stack, to be precise. Those two are the ones every programmer (un)knowingly interacts with. Every thread executing in the JVM has its own stack, where it keeps the values of local primitive variables and references to the actual objects (the ones that inherit from the Object class). The actual objects are created and live on the heap.
The heap is organized according to the generational principle (the so-called weak generational hypothesis). It is a simple rule based on a bold assumption: most objects created during program execution die fast (and young 😉). The number of objects that live long is comparatively small. Therefore, the heap is split into two separate parts: the young generation and the old generation. Within these two there are more fine-grained parts, depicted below.
The picture should be read from left to right. Every newly created object lands in the Eden space of the young generation. With every garbage collection it survives, it moves further to the right. GC cleans each part of a given generation, and objects that are still needed are moved to the “older” part of the heap.
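To make the aging idea concrete, here is a toy model (not HotSpot code) of a young generation: objects that survive a minor GC get one year older, and once they reach a tenuring threshold they are promoted to the old gen. It collapses the two survivor spaces into one list, and the threshold of 3 is an arbitrary assumption for the demo; in HotSpot the corresponding knob is -XX:MaxTenuringThreshold, which defaults to 15. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of generational aging and promotion; not how HotSpot is implemented. */
class GenerationalAging {

    // Assumption for the demo; HotSpot's -XX:MaxTenuringThreshold defaults to 15.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        final boolean reachable; // stands in for the result of marking
        int age;                 // number of minor GCs survived
        Obj(String name, boolean reachable) { this.name = name; this.reachable = reachable; }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> survivor = new ArrayList<>(); // both survivor spaces collapsed into one
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name, boolean reachable) {
        Obj o = new Obj(name, reachable);
        eden.add(o); // every new object starts in Eden
        return o;
    }

    /** Minor GC: dead young objects are dropped, live ones age or get promoted. */
    void minorGc() {
        List<Obj> young = new ArrayList<>(eden);
        young.addAll(survivor);
        eden.clear();
        survivor.clear();
        for (Obj o : young) {
            if (!o.reachable) continue; // garbage is simply not copied anywhere
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o);
            else survivor.add(o);
        }
    }

    public static void main(String[] args) {
        GenerationalAging heap = new GenerationalAging();
        heap.allocate("cache", true); // long-lived object
        for (int gc = 0; gc < 3; gc++) {
            for (int i = 0; i < 5; i++) heap.allocate("tmp" + gc + "-" + i, false); // die young
            heap.minorGc();
        }
        System.out.println("old=" + heap.old.size() + " survivor=" + heap.survivor.size());
    }
}
```

Most of the temporary objects never leave Eden alive, which is exactly why copying only the survivors is cheap.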
The original approach to cleaning the memory was a simple loop executed all the time:
- mark
- sweep
- compact
Those were the steps performed by the GC. First, it marked the objects that were still reachable (live). Then it removed everything that was not marked and compacted the memory. If you are a dinosaur like me, you probably remember the disk defragmentation tool in Windows. That is what GCs were doing back in the day. The problem was the pause needed to perform that operation. At its core, the main problem was not the cleaning itself, but stopping all the application’s threads so that they could not change the object graph in the meantime. The stop-the-world pause used to be the source of all the bad things that were said about Java’s performance. If, again, you are a dinosaur like me and remember using NetBeans IDE version 6 around 2008, you know what I am talking about 😉
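The mark, sweep and compact loop can be sketched as a deliberately naive, single-threaded Java toy. The simplifying assumptions are that the heap is a list of slots, references are slot indices, and one collect() call represents one stop-the-world pause. All names are hypothetical; real collectors work on raw memory, not Java lists.

```java
import java.util.*;

/** Toy mark-sweep-compact over a heap modelled as a list of slots. */
class MarkSweepCompact {

    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>(); // indices of referenced heap slots
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();      // null = free slot
    final List<Integer> roots = new ArrayList<>(); // e.g. stack variables pointing into the heap

    int allocate(String name) {
        heap.add(new Obj(name));
        return heap.size() - 1;
    }

    /** Mark: everything reachable from the roots is live. */
    void mark() {
        Deque<Integer> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Obj o = heap.get(todo.pop());
            if (o.marked) continue;
            o.marked = true;
            todo.addAll(o.refs);
        }
    }

    /** Sweep: every unmarked object is garbage, its slot becomes free. */
    void sweep() {
        for (int i = 0; i < heap.size(); i++) {
            Obj o = heap.get(i);
            if (o != null && !o.marked) heap.set(i, null);
        }
    }

    /** Compact: slide live objects to the front and fix up every reference. */
    void compact() {
        Map<Integer, Integer> forwarding = new HashMap<>(); // old slot -> new slot
        List<Obj> compacted = new ArrayList<>();
        for (int i = 0; i < heap.size(); i++) {
            Obj o = heap.get(i);
            if (o == null) continue;
            forwarding.put(i, compacted.size());
            compacted.add(o);
        }
        for (Obj o : compacted) {
            o.refs.replaceAll(forwarding::get);
            o.marked = false; // reset for the next cycle
        }
        roots.replaceAll(forwarding::get);
        heap.clear();
        heap.addAll(compacted);
    }

    /** The application is "stopped" for the whole duration of this call. */
    void collect() {
        mark();
        sweep();
        compact();
    }

    public static void main(String[] args) {
        MarkSweepCompact gc = new MarkSweepCompact();
        int a = gc.allocate("a");
        gc.allocate("garbage");
        int b = gc.allocate("b");
        gc.heap.get(a).refs.add(b); // a -> b
        gc.roots.add(a);
        gc.collect();
        System.out.println(gc.heap.size() + " live, a -> " + gc.heap.get(gc.roots.get(0)).refs);
    }
}
```

Note that compaction has to rewrite every reference and root, which is exactly the part that forces the application threads to stand still in this naive design.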
Nowadays the climate has changed. A lot has been done in the area of GC algorithms in recent years. What is more – computers got faster, and contain more memory than ever. We witnessed the birth of new GC algorithms that pushed the limits in ways that were unimaginable 10 years ago. Right. 10 years ago. That’s when Java 8 was born.
GC landscape in Java 8
We start with Java 8, as it is still with us and is not going anywhere (at least that’s what the JetBrains 2023 survey says).
In the fresh release of Java 8, we had four garbage collectors at our disposal:
- serial
- parallel (a.k.a throughput)
- concurrent mark-sweep
- G1 – was considered experimental up until Java 7u4, marked as production-ready in Java 8u40, and it became the default GC mechanism in Java 9.
Below you can find a short description of every one of them.
- Serial GC – “serial” in the name comes from serial processing, which translates to a single thread doing the GC job. It is the simplest possible mechanism, but at the same time the slowest one. It used to be chosen by the JVM based on the number of available CPUs and the amount of memory; you can check out the “logic” behind that choice, still present in Java 10. When should you use it? When the JVM runs on a machine with a single CPU or with roughly 1.8 GB of memory (or less).
- Parallel GC – used to be the default on server-class machines up to and including Java 8. Like Serial GC, it stopped the world while cleaning both the young and the old gen, but it could use several threads for the cleaning, which explains the “parallel” in the name. It is one of the most efficient collectors in terms of reclaiming memory (throughput), at the cost of longer pause times.
- CMS (Concurrent Mark-Sweep) GC – the main idea behind this collector was to reduce pause times during old gen cleaning. For the young gen, it still stopped all application threads and performed the cleaning with multiple threads, although with a different algorithm than Parallel GC. The biggest difference was in the old gen: background threads marked and swept it concurrently while the application kept running. That shortened the old gen pauses, and that is why we can call CMS the first low-pause GC. However, once G1 appeared, CMS lost its relevance: it was deprecated in Java 9 and eventually removed in Java 14.
- G1 (Garbage First) – the idea behind this GC was to address the problem of big heaps. Back in the day (remember, we’re around 2013-2014), a heap over 4 GB was considered big. Today we call it a Spring Boot microservice 😉 The main idea was to divide the heap into regions (still with generations in mind). For young gen cleaning, G1 stops the application threads and uses multiple threads to copy the live objects out of the young regions. For the old gen, there is no huge stop-the-world pause: marking runs concurrently, and old regions are then cleaned incrementally, a few at a time, during short pauses. Live objects found in a region are moved to a separate region. The source region can then be seen as containing only garbage and can be freely reused for new data. The picture below presents the heap divided into regions.
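The ergonomic “logic” mentioned for Serial GC can be sketched in a few lines. The thresholds (at least 2 CPUs and at least 2 GB of physical memory minus 256 MB of slack, i.e. 1792 MB) are my reading of HotSpot’s server-class check and should be treated as an assumption. Since Java 9, a server-class machine gets G1 by default (in Java 8 it got Parallel GC), everything else gets Serial. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Sketch of the server-class check behind the default GC choice (thresholds assumed). */
class ServerClassCheck {

    static final long MB = 1024L * 1024L;
    // Assumed from HotSpot: 2 GB of physical memory minus 256 MB of slack.
    static final long SERVER_MEMORY = 2048 * MB - 256 * MB; // 1792 MB
    static final int SERVER_CPUS = 2;

    static boolean isServerClass(int cpus, long physicalMemoryBytes) {
        return cpus >= SERVER_CPUS && physicalMemoryBytes >= SERVER_MEMORY;
    }

    /** Java 9+ ergonomics: G1 on server-class machines, Serial otherwise. */
    static String defaultCollector(int cpus, long physicalMemoryBytes) {
        return isServerClass(cpus, physicalMemoryBytes) ? "G1" : "Serial";
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long memory = os.getTotalMemorySize(); // available since JDK 14
        System.out.printf("%d CPUs, %d MB -> ergonomics would pick %s%n",
                cpus, memory / MB, defaultCollector(cpus, memory));
    }
}
```

This is only the selection heuristic; passing any explicit -XX:+Use...GC flag bypasses it entirely.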
G1 from Java 8 to modern era
That was the landscape in 2014. From then on, Java had to face new challenges, such as:
- rise of the containers
- cloud computing everywhere
- hardware power increase
- existence of gigantic monoliths, and the birth of the microservices architecture at the same time
All of those challenges were addressed by promoting G1 to the default GC in Java 9. Originally designed as a low-pause collector, it gained a lot of capabilities along the way and remains the default choice in modern JVMs. Let’s see what changed in G1 during all those years.
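If you want to confirm which collector your JVM actually picked, the standard java.lang.management API lists the active collectors. A minimal sketch (the class name is hypothetical): on recent HotSpot builds running G1 the names typically look like “G1 Young Generation” and “G1 Old Generation”, but treat the exact strings as implementation details rather than a stable contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the collectors of the running JVM together with basic statistics. */
class WhichGc {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount()/getCollectionTime() return -1 if undefined for a collector
            System.out.printf("%s: %d collections, %d ms in total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it once plainly and once with, say, -XX:+UseParallelGC makes the difference in the reported collectors visible.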
In his 2022 presentation, one of the G1 developers shared that over 1,000 fixes had gone into G1 since Java 8. That is a significant number. Below I list the changes worth mentioning when it comes to G1.
Java 9
- JEP-248 made G1 the default GC
Java 10
- JEP-307 made G1’s full (major) GC fully parallel
Java 12
- JEP-346 fixed a problem with long wait time for heap memory to be released to the OS
- JEP-344 made mixed collections abortable, dealing with the problem of a collection set being too big for G1 to clean within the pause-time goal
Java 14
- JEP-345 introduced NUMA-aware memory allocation, which improved G1 performance on bigger machines
Java 22
- JEP-423 introduced region pinning, so G1 no longer has to disable GC while threads are in JNI critical regions
Unfortunately, there are no more JEPs after that in this area. However, I think it’s not that bad: G1 has reached maturity, and obviously there won’t be groundbreaking changes with every release. An overview of all the changes in this GC can be found in the aforementioned presentation, and very detailed information about G1 and Parallel GC can be found on Thomas Schatzl’s blog. In general, G1 is a compromise between latency, throughput and CPU usage. In the published benchmarks, Oracle shows constant improvements for G1, but in several use cases its competitors do better. More on that later.
So how do I tweak this thing?
I couldn’t stop myself from posting this 😉 I also thought hard about following up with the “Big Bang Theory” one with “It’s funny because it’s true”, but that would have been overkill. The truth, however, is that G1 was designed to avoid the million gauges, valves and cogs that create a huge matrix of possible configurations nobody is able to figure out (ZGC is even more ascetic, keep on reading). Of course, there are a couple of settings we can use, and here are the main ones:
- -XX:G1HeapRegionSize=n – sets the region size. It must be a power of 2; ergonomics pick a value between 1 MB and 32 MB (since JDK 18, up to 512 MB can be set explicitly). By default, G1 aims to divide the heap into about 2048 regions.
- It is advised not to set the young gen size explicitly (e.g. with -Xmn), because a fixed young gen size overrides the pause-time goal. By default, the young gen starts at 5% of the heap.
- -XX:MaxGCPauseMillis=200 – sets the target maximum GC pause time in milliseconds (a goal, not a guarantee); 200 is the default.
- -XX:InitiatingHeapOccupancyPercent=45 – the heap occupancy (in %) at which G1 starts a concurrent marking cycle; 45 is the default starting value, which G1 adjusts adaptively since Java 9.
- -XX:+UseStringDeduplication – enables string deduplication, a feature introduced for G1 in Java 8u20 and, since JDK 18, supported by the other collectors as well. That’s how important it is. You can read more about how it can improve the memory footprint of Spring Boot applications.
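To get a feel for the “about 2048 regions” rule, here is a simplified sketch of how a region size could be derived from the maximum heap size when -XX:G1HeapRegionSize is not set: divide by 2048, round down to a power of 2, clamp to [1 MB, 32 MB]. It is an approximation under stated assumptions, not a copy of HotSpot’s code; the exact rounding direction has differed between JDK versions. Names are hypothetical.

```java
/** Simplified sketch of G1's ergonomic region sizing (approximation, not HotSpot code). */
class G1RegionSizeSketch {

    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;
    static final long MIN_REGION = MB;
    static final long MAX_ERGONOMIC_REGION = 32 * MB;

    static long regionSize(long maxHeapBytes) {
        long size = maxHeapBytes / TARGET_REGIONS;
        // Round down to a power of two (assumption: HotSpot's rounding may differ by version).
        size = size <= 0 ? MIN_REGION : Long.highestOneBit(size);
        return Math.max(MIN_REGION, Math.min(MAX_ERGONOMIC_REGION, size));
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long region = regionSize(maxHeap);
        System.out.printf("max heap %d MB -> region %d MB, ~%d regions%n",
                maxHeap / MB, region / MB, maxHeap / region);
    }
}
```

The clamp explains why very small heaps end up with fewer than 2048 regions and very large heaps with many more.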
The topic is (as with every fine-tuning process) very deep and nuanced. I posted just the very basic stuff here; more about tuning G1 can be found in the official Oracle docs and in this nice article. The overall advice is not to go full-metal-GC on this, and to let the JVM figure out the details on its own.
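Letting the JVM decide is easier when you can see what it decided. A minimal sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean reads the effective values of the flags discussed above, together with their origin (for example DEFAULT or ERGONOMIC). The class name and flag selection are my own; flags unknown to a given JVM build are reported as “n/a”.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Prints the values the JVM actually uses for a few G1-related flags. */
class G1Flags {

    static final List<String> FLAGS = List.of(
            "UseG1GC", "G1HeapRegionSize", "MaxGCPauseMillis",
            "InitiatingHeapOccupancyPercent", "UseStringDeduplication");

    static Map<String, String> read() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            try {
                VMOption option = bean.getVMOption(flag);
                // getOrigin() tells whether the value is a default, set by you, or chosen ergonomically
                values.put(flag, option.getValue() + " (" + option.getOrigin() + ")");
            } catch (IllegalArgumentException e) { // flag does not exist in this JVM build
                values.put(flag, "n/a");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        read().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

Running it with different -Xmx values is a quick way to watch G1HeapRegionSize change while its origin stays ergonomic.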
And what about the oldboys?
Truth be told, since Java 9 there has not been much movement around Serial, Parallel and CMS. Oh, wait. CMS was actually removed in Java 14 😉 Besides that, we got some JEPs removing or deprecating rarely used stuff, like JEP-214 and JEP-366, and tasks to keep up with work done in other areas of the JVM (like string deduplication, which is now supported in all the other GCs). Some performance improvements were made too (e.g. shorter young-gen pauses for Parallel GC in Java 17).
Bigger work landed in Java 23, which changed the algorithm Parallel GC uses for full collections. And it is something really unusual: as Thomas Schatzl points out on his blog, a G1 mechanism crept in there.
With JDK-8329203 we replaced the somewhat unique algorithm of Parallel GC with (basically) G1’s parallel Full GC which does not suffer from these hiccups. At the same time that second end bitmap (taking 1.5% of Java heap size) could be removed as well, while our measurements showed that overall the performance stayed the same (and improved a lot in these problematic cases).
New kids on the block
I mentioned above that from time to time a new contender appears in the JVM world. In this chapter I am going to present them all. To keep things short, I will limit myself to the general principles and the use cases where these new algorithms shine.
Epsilon
Java 11 introduced Epsilon (JEP-318), a GC that does not act like a GC 😉 That’s right. Epsilon is a tool for benchmarking and troubleshooting rather than a useful GC algorithm. When we select it (it is experimental, so it requires -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC), the JVM starts as usual. However, no GC is ever performed. Every object created inside the JVM stays there forever, no memory gets reclaimed, and once the heap is exhausted the JVM fails with an OutOfMemoryError. As I’ve written above, it is a tool for troubleshooting and testing. You can read more about it in this Oracle blog post.
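Because Epsilon never reclaims anything, the total amount a program allocates over its whole run has to fit into -Xmx. A minimal sketch of measuring that allocation volume uses the HotSpot-specific com.sun.management.ThreadMXBean#getCurrentThreadAllocatedBytes (JDK 14+). The class name and the workload are hypothetical; the volatile sink field only keeps the allocations from being optimized away.

```java
import java.lang.management.ManagementFactory;

/** Measures how many bytes a piece of work allocates on the current thread. */
class AllocationBudget {

    static volatile Object sink; // keeps allocations observable, so they are not optimized away

    static long allocatedBytes(Runnable work) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long before = threads.getCurrentThreadAllocatedBytes();
        work.run();
        return threads.getCurrentThreadAllocatedBytes() - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> {
            for (int i = 0; i < 100; i++) {
                sink = new byte[1024 * 1024]; // 100 short-lived 1 MB arrays
            }
        });
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("allocated ~%d MB, max heap %d MB -> %s under Epsilon%n",
                bytes >> 20, maxHeap >> 20, bytes < maxHeap ? "fits" : "would not fit");
    }
}
```

With a regular collector the arrays die young and are reclaimed. With -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx64m, the same loop should end in an OutOfMemoryError before anything is printed, which is exactly the kind of allocation-pressure signal Epsilon is meant to surface.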
Shenandoah
With all the changes around licensing and OpenJDK, a lot of companies started to dabble with the JVM. One of them is Red Hat, which wanted a GC algorithm focused purely on latency, minimizing GC pauses as much as possible. This GC is available only in specific OpenJDK builds (Oracle does not provide it!), so it’s best to check the appropriate section in the official docs.
Similar to G1, it uses the concept of regions and applies it to the whole heap, but without assigning regions to the young or old generation. As I am lazy, but also treat this as a remind-me post, below you can find an image taken from the official docs and a description of the overall process.
- Init Mark initiates the concurrent marking. It prepares the heap and application threads for concurrent mark, and then scans the root set. This is the first pause in the cycle, and the most dominant consumer is the root set scan. Therefore, its duration is dependent on the root set size.
- Concurrent Marking walks over the heap, and traces reachable objects. This phase runs alongside the application, and its duration is dependent on the number of live objects and the structure of object graph in the heap. Since the application is free to allocate new data during this phase, the heap occupancy goes up during concurrent marking.
- Final Mark finishes the concurrent marking by draining all pending marking/update queues and re-scanning the root set. It also initializes evacuation by figuring out the regions to be evacuated (collection set), pre-evacuating some roots, and generally prepares runtime for the next phase. Part of this work can be done concurrently during Concurrent precleaning phase. This is the second pause in the cycle, and the most dominant time consumers here are draining the queues and scanning the root set.
- Concurrent Cleanup reclaims immediate garbage regions – that is, the regions where no live objects are present, as detected after the concurrent mark.
- Concurrent Evacuation copies the objects out of collection set to other regions. This is the major difference against other OpenJDK GCs. This phase is again running along with application, and so application is free to allocate. Its duration is dependent on the size of chosen collection set for the cycle.
- Init Update Refs initializes the update references phase. It does almost nothing except making sure all GC and applications threads have finished evacuation, and then preparing GC for next phase. This is the third pause in the cycle, the shortest of them all.
- Concurrent Update References walks over the heap, and updates the references to objects that were moved during concurrent evacuation. This is the major difference against other OpenJDK GCs. Its duration is dependent on number of objects in heap, but not the object graph structure, because it scans the heap linearly. This phase runs concurrently with the application.
- Final Update Refs finishes the update references phase by re-updating the existing root set. It also recycles the regions from the collection set, because now heap does not have references to (stale) objects to them. This is the last pause in the cycle, and its duration is dependent on the size of root set.
- Concurrent Cleanup reclaims the collection set regions, which now have no references to.
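The phases that set Shenandoah apart (evacuation followed by a linear update-references walk) can be illustrated with a toy, single-threaded Java model. The simplifying assumptions are loud: liveness is handed in instead of coming from marking, every object has at most one outgoing reference, and the concurrency and barriers that make the real collector work are not modelled at all. The forwarding field only loosely mirrors how the old copy points to the new one. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy, single-threaded model of evacuation and update-references; no concurrency, no barriers. */
class EvacuationSketch {

    static final class Obj {
        final String name;
        Obj ref;        // a single outgoing reference, to keep things short
        Obj forwardee;  // set once this object has been copied to to-space
        Obj(String name) { this.name = name; }
    }

    static final class Region {
        final List<Obj> objects = new ArrayList<>();
        boolean inCollectionSet;
    }

    final List<Region> regions = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();

    /** Evacuation: copy live objects out of the collection set. */
    void evacuate(List<Obj> live, Region toSpace) {
        for (Region r : regions) {
            if (!r.inCollectionSet) continue;
            for (Obj o : r.objects) {
                if (!live.contains(o)) continue; // dead objects are left behind
                Obj copy = new Obj(o.name);
                copy.ref = o.ref;                // may still point into from-space
                toSpace.objects.add(copy);
                o.forwardee = copy;              // old copy now forwards to the new one
            }
        }
        regions.add(toSpace);
    }

    /** Update references: linear heap walk that fixes stale pointers. */
    void updateRefs() {
        for (Region r : regions) {
            if (r.inCollectionSet) continue;     // about to be recycled anyway
            for (Obj o : r.objects) {
                if (o.ref != null && o.ref.forwardee != null) o.ref = o.ref.forwardee;
            }
        }
    }

    /** Final update refs plus cleanup: fix the roots, then recycle the collection set. */
    void finalUpdateRefsAndCleanup() {
        roots.replaceAll(o -> o.forwardee != null ? o.forwardee : o);
        regions.removeIf(r -> r.inCollectionSet);
    }

    public static void main(String[] args) {
        EvacuationSketch heap = new EvacuationSketch();
        Region from = new Region();
        Region other = new Region();
        Obj a = new Obj("a");
        Obj dead = new Obj("dead");
        Obj b = new Obj("b");
        from.objects.add(a);
        from.objects.add(dead);
        other.objects.add(b);
        b.ref = a;                               // b (outside the collection set) -> a
        from.inCollectionSet = true;
        heap.regions.add(from);
        heap.regions.add(other);
        heap.roots.add(a);

        heap.evacuate(List.of(a, b), new Region());
        heap.updateRefs();
        heap.finalUpdateRefsAndCleanup();
        System.out.println("regions: " + heap.regions.size() + ", b.ref moved: " + (b.ref != a));
    }
}
```

The update-references walk visits every object in every remaining region regardless of the object graph, which matches the description above that its duration depends on the number of objects rather than on the graph structure.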
What’s the caveat? Well, as the GC works concurrently most of the time, it consumes more CPU. I tried to find solid benchmarks comparing Shenandoah with other GCs, but I came up empty. Some charts and diagrams can be found in this 2018 presentation by Aleksey Shipilev (around the 47th minute), but it is 2024 already…
ZGC
Java 11 also brought a new player into the game: ZGC (only on Linux, and only as an experimental feature). To conclude the topic of its availability, Java 14 added it as experimental on Windows and macOS, and Java 15 declared it production-ready on all platforms. This GC is authored by Oracle and focuses mostly on supporting big heaps (even TB-sized ones) while keeping pause times low. Truth be told, the authors succeeded: in terms of pause times, ZGC seems to perform better than all the other GCs (including Shenandoah).
The official docs say that in Java 21 ZGC was rebuilt to support generations (JEP-439). From then on we had two flavors of ZGC: the non-generational one, which remained the default, and the generational one, enabled with -XX:+ZGenerational. However, the newer version is faster and better than the non-generational one; you can see for yourself here. With that in mind, Java 23 made the generational mode the default with JEP-474.
What is even more impressive is that ZGC was designed to need (almost) no tuning. The official docs clearly state that the most important setting is the maximum heap size (-Xmx), and that’s it! For more advanced scenarios, you can also configure the number of GC threads, uncommitting unused memory back to the OS, or NUMA support.
If you’re interested in the details of how this GC works (yes, I am lazy and don’t want to rewrite the docs), I can recommend two great sources. The first is a YouTube presentation by Oracle, where a Software Development Director presents the whole solution (with some benchmarks compared to the non-generational version). The second is an actual research paper about ZGC. It dates back to 2022, but it is an in-depth source of truth.
One GC to rule them all
The aforementioned Aleksey Shipilev once said that there is no silver bullet when it comes to GC: there is always a trade-off between latency and throughput. Which GC should you choose, then? The answer is, as always: it depends. Below you can see a comparison taken from here.
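As a recap of this post, and nothing more authoritative than that, the rules of thumb above can be condensed into a tiny function. The goal categories, the 1792 MB threshold and the mapping are simplifications of what was said here, not an official decision procedure; the only reliable answer is to measure your own workload. The names are hypothetical.

```java
/** Rule-of-thumb summary of this post, NOT an official decision procedure. */
class GcRuleOfThumb {

    enum Goal { BALANCED, THROUGHPUT, LOW_LATENCY }

    static final long SMALL_MACHINE_MB = 1792; // same threshold as the server-class check

    static String suggest(int cpus, long memoryMb, Goal goal) {
        if (cpus < 2 || memoryMb < SMALL_MACHINE_MB) {
            return "-XX:+UseSerialGC"; // single CPU or small memory
        }
        return switch (goal) {
            case THROUGHPUT -> "-XX:+UseParallelGC"; // batch jobs, longer pauses acceptable
            case LOW_LATENCY -> "-XX:+UseZGC";       // or -XX:+UseShenandoahGC where the build ships it
            case BALANCED -> "-XX:+UseG1GC";         // the default compromise
        };
    }

    public static void main(String[] args) {
        System.out.println(suggest(8, 16_384, Goal.LOW_LATENCY));
    }
}
```

In practice, the BALANCED branch is what you get anyway when you pass no flag on a server-class machine.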
How can I see what’s going on?
OpenJDK has its own issue tracker, where all the JEPs and tickets are logged. The easiest way to see what’s going on in a specific area is to create a filter. The URL gets very long, so I won’t paste it here, but below I present a screenshot with the settings that can be used for tracking GC work. Pay attention to the component, subcomponent and fix version filters.
Another great source of information (less low-level than the issue tracker) is Oracle’s InsideJava site, which revolves around JDK internals. You can filter its articles by topic, and of course there is one dedicated to GC too.
Summary
All right, that was an adventure. Obviously, we have just scratched the surface of the GC area. Unfortunately, a lot of other topics are demanding my attention now. Maybe someday I will get a job that lets me dive into this stuff and get paid for it. For the time being, I am thinking about giving some love to another neglected topic in my Java knowledge: modern concurrency. Stay tuned!
Sources used:
- “Java Performance: The Definitive Guide 1st ed.” book – I have it from the old days. The second edition covers Java up to version 17.
- “The well-grounded Java developer 2nd ed.” book
- “Java memory model” book
- “Java 8 in Action” book
- https://tschatzl.github.io/
- https://kstefanj.github.io/
- https://inside.java/tag/gc
- https://connect2grp.medium.com/evolution-of-java-memory-model-af24d5365581
- https://medium.com/platform-engineer/understanding-java-memory-model-1d0863f6d973
- https://foojay.io/today/demystifying-jvm-memory-management/
- https://blog.gceasy.io/java-garbage-collection/
- https://inside.java/2021/10/11/p99-g1-to-infinity-and-beyond/
- https://docs.oracle.com/en/java/javase/21/gctuning/introduction-garbage-collection-tuning.html