Clickbait title IMO. Essentially Java presents an abstraction and whether the underlying I/O is truly async or emulated depends on the OS. However, such a design allows them to change the underlying implementation to an OS async API at a later time (example: io_uring on linux). At most, perhaps it would be nice if there was an API to query whether it is truly async or not, but most code is going to want it to "just work" and know their code will take advantage of underlying async I/O, when available.
I doubt it, as that would break any FileChannel.write(ByteBuffer) caller that expects the buffer to be fully consumed (remaining()==0)
I suppose it's possible to create a new set of APIs, or augment the existing ones so they don't use a thread pool, but that's still rather unlikely. It would also require general, widely supported async I/O from the OS, including Windows.
> I doubt it, as that would break any FileChannel.write(ByteBuffer) caller that expects the buffer to be fully consumed (remaining()==0)
You can't expect that with AsynchronousFileChannel::write at the moment, though, because the buffer is consumed on a background thread. As the docs say [1]:
> Buffers are not safe for use by multiple concurrent threads so care should be taken to not access the buffer until the operation has completed.
And as the article says, on Windows, these writes are "truly" async at the OS level, so portable code already needs to deal with that.
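To make the buffer point concrete, here is a minimal sketch of a write through AsynchronousFileChannel that only looks at the buffer again after the returned Future has completed. The class and method names (AsyncWriteDemo, writeAll) are made up for illustration; the channel calls are the standard java.nio ones.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

class AsyncWriteDemo {
    // Hypothetical helper: writes the whole string starting at offset 0
    // and returns the number of bytes written.
    static int writeAll(Path path, String text) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                path, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long position = 0;
            int total = 0;
            while (buf.hasRemaining()) {
                Future<Integer> pending = ch.write(buf, position);
                // The write may still be in flight here, so buf must not be
                // read or modified until the Future completes.
                int n = pending.get();
                position += n;
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-write", ".txt");
        System.out.println(writeAll(tmp, "hello") + " bytes written");
        Files.delete(tmp);
    }
}
```

The loop is there because a single write is not required to consume the whole buffer, so portable code checks hasRemaining() after each completion rather than assuming remaining()==0.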
It's pretty important to understand the timeline here, as the article title can lead the consumer to a lot of wrong conclusions.
* AsynchronousFileChannel was released in JDK 7 on 2011-07-28; it works by submitting I/O to a JVM thread pool
* Linux io_uring (AIO) was released on 2019-05-05; it works at the kernel level
The article's point is that AsynchronousFileChannel does not use io_uring, which makes sense given the API was implemented roughly 8 years earlier. Yes, AsynchronousFileChannel is asynchronous; however, it operates in user space and does not use io_uring.
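For anyone wondering what "asynchronous in user space" looks like, here is a rough sketch of the general idea: a blocking FileChannel write handed off to a thread pool, with the caller getting a future back. This is not the JDK's actual implementation, just an illustration of the technique; PoolBackedWrite and writeAsync are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolBackedWrite {
    // Hypothetical helper: runs a blocking positional write on the given pool
    // and completes the future with the number of bytes written.
    static CompletableFuture<Integer> writeAsync(
            ExecutorService pool, Path path, byte[] data, long position) {
        return CompletableFuture.supplyAsync(() -> {
            try (FileChannel ch = FileChannel.open(
                    path, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                long pos = position;
                int written = 0;
                while (buf.hasRemaining()) {
                    int n = ch.write(buf, pos); // blocks a pool thread, not the caller
                    pos += n;
                    written += n;
                }
                return written;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Path tmp = Files.createTempFile("pool-write", ".bin");
        try {
            int n = writeAsync(pool, tmp, new byte[] {1, 2, 3}, 0).join();
            System.out.println(n + " bytes written");
        } finally {
            pool.shutdown();
            Files.delete(tmp);
        }
    }
}
```

From the caller's side this is asynchronous, but a thread is still parked in a blocking syscall for the duration of the write, which is the distinction the article is drawing.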
My observation (and praise) is that the JVM implementation is unlikely to break backward compatibility. If io_uring support is desired in Java, you have to jump through a few hoops with the current APIs. Not a big deal in my opinion... if you truly need those few extra percentage points of performance, none of this is probably news to you.
Before uring, Linux AIO was kind of fake; most operations were blocking. It makes sense that Java would use a thread pool instead of attempting to use Linux AIO.
io_uring is similarly “fake async” because it’s also just a thread pool (just owned and operated by the kernel, not a runtime or library). I don’t think anyone has ever bothered to make FS code async, it’s already quite horrible as it is. Not in a mainstream kernel anyway. So you need to be on the FS-less happy path - properly aligned I/O to allocated blocks on some specific FSes - to get async disk IO; this automatically excludes all extending writes and metadata ops. Same for Windows disk IO with IOCPs.
>I don’t think anyone has ever bothered to make FS code async, it’s already quite horrible as it is. Not in a mainstream kernel anyway. [...] Same for Windows disk IO with IOCPs.
IOCP merely allows IO completion to happen on worker threads in a threadpool rather than the particular thread that initiated the IO request. Many concepts in the NT kernel come from a long line of production kernels in mainstream DEC operating systems going back to RSX-11M and VMS. That entire lineage of operating systems (culminating in NT) all have true asynchronous IO using IO Request Packets (IRPs) within the kernel to initiate/queue IO requests and immediately return. There are individual cases within NTFS where blocking will occur but those are special cases rather than the rule as the kernel IO system in general is entirely async.
Despite various attempts at Linux async file APIs over the years, it turned out using a well-implemented thread pool for file I/O was the fastest way to do async file I/O on Linux, in most cases. In cases where it wasn't fastest, it was still close.
A thread pool was also more reliable than the kernel APIs at ensuring overlapped I/O for throughput. libaio (kernel API) calls sometimes turned into blocking calls depending on filesystem internals, in a way that thread pools don't.
POSIX AIO, on Linux, just uses a userspace thread pool. It's implemented in libc and is not particularly fast. You can do better with your own thread pool.
Perhaps Linux threads got faster, faster than the APIs got better.
Even with io_uring, in my tests on fast NVMe storage RAIDs where it should make the most difference, I found io_uring wasn't noticeably faster than a well-optimised (for Linux) thread pool.
io_uring can potentially adapt better to shared workloads and different cache situations, because it has access to kernel information that userspace thread pools are not granted. But I was surprised to find no significant random-access I/O performance increase between my thread pool and io_uring, when I was trying to optimise both methods to get the best throughput out of a system.
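As a rough illustration of the thread-pool approach to random-access I/O (in Java, since that's what the article is about), here is a sketch that issues several positional reads concurrently against one FileChannel. It is not the tuned pool described above, and ParallelReads, readBlocks and readBlock are hypothetical names. It relies on the documented property that FileChannel is safe for use by multiple threads and that positional reads do not touch the channel's own position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelReads {
    // Hypothetical helper: reads up to blockSize bytes at each offset,
    // running the blocking reads concurrently on a fixed-size pool.
    static List<byte[]> readBlocks(Path path, long[] offsets, int blockSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (long offset : offsets) {
                Callable<byte[]> task = () -> readBlock(ch, offset, blockSize);
                pending.add(pool.submit(task));
            }
            List<byte[]> blocks = new ArrayList<>();
            for (Future<byte[]> f : pending) {
                blocks.add(f.get()); // results come back in submission order
            }
            return blocks;
        } finally {
            pool.shutdown();
        }
    }

    // Blocking positional read; stops early at end of file.
    static byte[] readBlock(FileChannel ch, long offset, int size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(size);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) break;
            pos += n;
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

Something like readBlocks(path, new long[] {0, 4096, 8192}, 4096, 3) keeps up to three reads in flight at once; how many threads actually help depends on the device and filesystem.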
This is highly dependent on the filesystem - as far as I know, basically only XFS works, and only for a small subset of operations, and everything else just blocks.
Tokio's filesystem handling in Rust is the same way by default - a pool of IO threads.
Readiness notification, like epoll, is not really suitable for file-backed IO. A file on disk is not a FIFO where you can wait for data to become available; what would it even mean for a file fd to be "ready to read"? Consider that reads can be at any offset.
You need completion notifications instead (like IOCP or io_uring or POSIX aio). You could in principle start an async splice from an FD into a pipe and then poll for the pipe to be ready to read, but IIRC there is no async splice.
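Java's AsynchronousFileChannel exposes exactly this completion-style model: you name the offset up front and get called back when the read is done. A minimal sketch, assuming a small file where one read is enough (a single read may return fewer bytes than requested); CompletionRead, readAt and closeQuietly are made-up names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class CompletionRead {
    // Hypothetical helper: starts one read at the given offset and completes
    // the returned future from the completion handler.
    static CompletableFuture<String> readAt(Path path, long offset, int length)
            throws IOException {
        AsynchronousFileChannel ch =
                AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        ByteBuffer buf = ByteBuffer.allocate(length);
        CompletableFuture<String> result = new CompletableFuture<>();
        ch.read(buf, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                // Only now is it safe to touch the buffer.
                buf.flip();
                String text = StandardCharsets.UTF_8.decode(buf).toString();
                closeQuietly(ch);
                result.complete(text);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                closeQuietly(ch);
                result.completeExceptionally(exc);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel ch) {
        try {
            ch.close();
        } catch (IOException ignored) {
            // Nothing useful to do in this sketch.
        }
    }
}
```

Note there is no "is this fd readable yet?" step anywhere: the request carries the offset, and the notification carries the result.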
Side note: most comments here are talking about Linux, but despite Windows having a nice async IO API, it turns out that it is also sometimes just synchronous, yikes.
The file, not the file system. NTFS encryption is applied at the file level (and should be fairly rare these days). Bitlocker should not be affected as it is an encrypted volume with a normal file system.
Note that Windows also has weird sync IO in unexpected places, e.g. when normalizing file paths (unless this has changed since I last coded using kernelXX.dll).
"An AsynchronousFileChannel is associated with a thread pool to which tasks are submitted to handle I/O events and dispatch to completion handlers that consume the results of I/O operations on the channel."
This part is close to 17 years old now (java 1.7, 2007). The NIO for files that came with 1.4 (20y+) was not async for files either. I wonder where the 'truly async' idea has come from as the docs are rather explicit about it.
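The thread pool in that doc quote is even user-visible: there is an open overload that takes an ExecutorService. Here is a small sketch that passes a pool with named threads and reports which thread ran the completion handler. CustomPoolChannel, handlerThreadName and the "file-io-worker" thread name are made up for the example, and which thread ends up running the handler is an implementation detail that may vary by platform.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class CustomPoolChannel {
    // Hypothetical helper: returns the name of the thread that ran the handler.
    static String handlerThreadName(Path path) throws Exception {
        ThreadFactory named = r -> {
            Thread t = new Thread(r, "file-io-worker");
            t.setDaemon(true);
            return t;
        };
        ExecutorService pool = Executors.newFixedThreadPool(2, named);
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                path, Set.of(StandardOpenOption.READ), pool)) {
            CompletableFuture<String> name = new CompletableFuture<>();
            ch.read(ByteBuffer.allocate(16), 0L, null,
                    new CompletionHandler<Integer, Void>() {
                        @Override
                        public void completed(Integer bytesRead, Void attachment) {
                            name.complete(Thread.currentThread().getName());
                        }

                        @Override
                        public void failed(Throwable exc, Void attachment) {
                            name.completeExceptionally(exc);
                        }
                    });
            return name.get(); // wait for the handler before closing the channel
        } finally {
            pool.shutdown();
        }
    }
}
```

If the handler reports one of your own pool threads, that's the documented behaviour showing through: the "async" work is being dispatched via an ordinary executor.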
> When not specified, the following system applies:
> - JDK: openjdk-17.0-amd64
> - OS: Ubuntu 20.04.6
> - Scala: 2.13.10
> - Physical disks: USB 2.0 16GB, FAT32
Generally you'd construct a test to be as similar to a real-world situation as possible to get results that resemble those you'd encounter in reality. Ubuntu on a FAT32 thumb stick is, well, unusual.
This doesn't seem to be that big of a problem, just something to know about and keep track of.
First, some framing:
Asynchronous scheduling is about sharing scarce resources efficiently. It allows you to keep your hardware busy without spending gobs of memory on OS threads. Specifically, it allows the system to make progress on many suspendable tasks in parallel. When a task needs to wait for something time-consuming to happen, it can be suspended, freeing up a CPU core to work on other tasks that are not waiting. The time-consuming work can be executed by some sort of centralized shared worker that has a broader view of the work that many tasks are waiting on. Often, that worker is a single thread or a small thread pool that performs blocking IO against a given type of resource (file, network socket, etc.).
About the JVM:
On the JVM, this type of sharing can currently only occur within a single process (at least on Linux). This is because the JVM doesn't currently take advantage of any OS-level asynchronous scheduling primitives. Therefore, the Java standard library itself manages some central IO worker threads within each JVM process, which suspendable tasks can offload work to.
Implementing io_uring, as is being done for the JVM now, will move it onto the kernel's own asynchronous scheduling primitives. This will allow sharing to occur cross-process, since the shared workers will now be inside of the kernel. That'll be a nice efficiency gain for systems where many tiny processes run on the same kernel, but it likely won't help that much for big single-purpose machines.
Also of note: The virtual threads adopted in Project Loom allow programs written in a blocking style to behave more like lightweight suspendable tasks, so they have a lot of synergy with an io_uring implementation.
That's a great summary!
Your level of insight does not exist in most devs though. That is part of my motivation with this article... trying to peek under the hood and explore the fundamentals.
This has almost nothing to do with Java and there is nothing revealing in this blog post. Block device operations on Linux always went through layers of indirection. That's why things like page cache or writeback exist. You can always go O_DIRECT if you need the write to be truly synchronous.
If anything the author just discovered how IO works on Linux.
How does Project Loom in JDK 21 change this?
It seems that Java has expended a lot of effort in implementing low cost virtual threads including all the effort needed to make async truly async throughout the Java runtime library.
Loom goes in the opposite direction: it presents a programming model where everything is blocking all the time, then makes threads lightweight, so that's scalable. It doesn't do anything to make file operations truly async; it makes it unnecessary for them to be async.
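A minimal sketch of that model, assuming JDK 21: plain blocking file reads, one virtual thread per task, no async API in sight. VirtualThreadReads and sizes are made-up names. As far as I understand JEP 444, many filesystem operations still block the underlying carrier thread, and the scheduler compensates by temporarily adding parallelism, so this changes the programming model rather than the I/O itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadReads {
    // Hypothetical helper: reads every file with ordinary blocking I/O,
    // one virtual thread per file, and returns the byte counts in order.
    static List<Long> sizes(List<Path> paths) throws Exception {
        // ExecutorService is AutoCloseable since JDK 19; close() waits for tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Long>> pending = new ArrayList<>();
            for (Path p : paths) {
                Callable<Long> task = () -> (long) Files.readAllBytes(p).length;
                pending.add(executor.submit(task));
            }
            List<Long> result = new ArrayList<>();
            for (Future<Long> f : pending) {
                result.add(f.get());
            }
            return result;
        }
    }
}
```

The code reads like the synchronous version because it is the synchronous version; the only Loom-specific line is the choice of executor.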
The JVM with virtual threads will do userspace task scheduling to take jobs on and off active OS threads. As long as Java knows how to safely suspend the running operation, like socket.read(), the JVM could use a read selector on a separate thread and something like epoll() to resume the socket and its calling task once the blocking call is ready and an OS thread is available to resume it.
It's not really related at all. Loom allows a Java thread, in certain cases, to block but then yield so that a different thread's code can run. That principle is unrelated to file IO.
Post from mailing list from a couple years ago, for example: https://mail.openjdk.org/pipermail/loom-dev/2021-November/00...