Memory Mapped I/O in Java
Bhaskar S | 04/20/2011 |
Overview
Recently, one of my friends expressed a need to access log files from production systems running a Java application without logging into them (for troubleshooting purposes). As many of us know, developers are kept away from production systems for various compliance and control reasons.
One of the thoughts was to include an HTTP server in the user's Java application to allow access to the log files. This way, the log file(s) can be viewed through a browser during emergencies. For a typical application in production, there are usually several log files (each up to 10 MB) per day.
My friend tried this approach of using an HTTP interface, used standard file I/O to read the log files, and indicated that the performance of the Java application suffered.
This is what prompted me to write this article.
Hands-on with Code
The following is a simple Java program that will allow us to generate a 10 MB log file that we can use for the IO tests:
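A minimal sketch of such a generator, assuming Java 7 or later. The class name `TestLogGenerator`, the `generate` helper and the text of the log line are illustrative choices, not taken from the original listing:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TestLogGenerator {
    // Target size of about 10 MB, matching the log size used in the tests
    public static final long TARGET_BYTES = 10L * 1024 * 1024;

    // Writes numbered log lines until at least targetBytes have been written.
    // Returns the number of bytes written.
    public static long generate(Path file, long targetBytes) throws IOException {
        long written = 0;
        long lineNo = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                // Hypothetical log line format; any ASCII text works for the tests
                String line = String.format("%010d INFO Sample log line for the I/O tests%n", ++lineNo);
                out.write(line);
                written += line.length(); // ASCII only, so chars == bytes
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        long bytes = generate(Paths.get("TestLog.log"), TARGET_BYTES);
        System.out.println("Wrote " + bytes + " bytes to TestLog.log");
    }
}
```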
Running this simple program will generate a log file called TestLog.log which is about 10 MB in size.
The following is a simple program that allows us to read the TestLog.log file using the standard Java IO:
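A minimal sketch of such a reader, assuming an unbuffered `FileInputStream` read one byte at a time. The class name `SimpleFileReader` and the `readAll` helper are illustrative, not the original listing:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SimpleFileReader {
    // Reads the file one byte per read() call and returns the number of bytes read
    public static long readAll(String fileName) throws IOException {
        long count = 0;
        try (InputStream in = new FileInputStream(fileName)) {
            // FileInputStream is unbuffered, so each read() asks the OS for a single byte
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String fileName = "TestLog.log";
        long start = System.currentTimeMillis();
        readAll(fileName);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Time to read file " + fileName + ": " + elapsed + " ms");
    }
}
```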
Running this program results in the following output:
Time to read file TestLog.log: 15180 ms
It took just over 15 secs to read the entire 10 MB file. The reason for this is that we are reading 1 byte at a time, and each read is a separate call into the OS. Performance can be improved using buffered I/O.
The following is a simple program that allows us to read the TestLog.log file using the standard Java Buffered I/O:
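A minimal sketch of such a reader, assuming the stream is simply wrapped in a `BufferedInputStream` with its default buffer size. The class name `BufferedFileReader` and the `readAll` helper are illustrative, not the original listing:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedFileReader {
    // Reads the file byte by byte through a buffer and returns the number of bytes read
    public static long readAll(String fileName) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            // read() is served from the in-memory buffer; the underlying stream
            // is only asked for more data when the buffer runs empty
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String fileName = "TestLog.log";
        long start = System.currentTimeMillis();
        readAll(fileName);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Time to read file " + fileName + ": " + elapsed + " ms");
    }
}
```

The only change from the unbuffered version is the `BufferedInputStream` wrapper; the read loop itself stays the same.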
Running this program results in the following output:
Time to read file TestLog.log: 494 ms
WOW !!! What happened here ? This is about 30 times faster. With buffered I/O, a whole block of data is fetched from the OS in one read and kept in an in-memory buffer, so most of the single-byte reads are served from that buffer without a system call.
So let us understand the behavior of the simple file reader. When you read a file using the standard file I/O, the following steps are involved:
Allocate memory for read buffer in the user space
Invoke the system call into the kernel for the read operation
Allocate memory for kernel buffer space for the read
Access the appropriate disk sector(s) associated with the file
Copy the disk sector data into the kernel buffer space
Copy the data from the kernel buffer into the read buffer in the user space
As you can see, a simple file read involves a lot of operations, which can be inefficient when reading large amounts of data.
Performance can be improved further by using memory mapped file I/O.
The following is a simple program that allows us to read the TestLog.log file using the Java memory-mapped file I/O (in NIO):
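A minimal sketch of such a reader, assuming Java 7 or later and a file small enough to map in one piece (a single mapping cannot exceed `Integer.MAX_VALUE` bytes). The class name `MappedFileReader` and the `readAll` helper are illustrative, not the original listing:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedFileReader {
    // Maps the whole file read-only and walks the buffer byte by byte.
    // Returns the number of bytes read.
    public static long readAll(String fileName) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // get() reads straight from the mapped memory region
            while (buffer.hasRemaining()) {
                buffer.get();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String fileName = "TestLog.log";
        long start = System.currentTimeMillis();
        readAll(fileName);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Time to read file " + fileName + ": " + elapsed + " ms");
    }
}
```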
Running this program results in the following output:
Time to read file TestLog.log: 21 ms
AWESOME !!! What have we got here ? This is more than 20 times faster than the buffered read. So let us understand the behavior of the memory-mapped file reader.
With memory-mapped file I/O, the underlying OS maps the file to be read into the virtual memory of the process.
When a read touches a part of the file that is not yet in memory, the following steps are executed by the underlying OS:
Cause a page-fault
Access the appropriate disk sector(s) associated with the file
Copy the disk sector data into the physical memory pages
Read the file data as if reading data from a memory byte buffer
As you can see, this is a much more efficient way of reading the file in terms of the resources used.
In other words, memory-mapped I/O allows us to map a file on disk to a memory byte buffer so that when we read the memory buffer, the data is read from the file. Once the mapping is set up, there is no overhead of a system call per read, nor of creating and copying intermediate buffers. More importantly, from a Java perspective, the memory buffer resides in the native space and not in the JVM's heap.
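The point about the buffer living outside the heap can be checked with a small sketch. The class name `MappedBufferInfo`, the `map` helper and the temporary file are assumptions made for illustration; `isDirect()` and `hasArray()` are standard `ByteBuffer` methods:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedBufferInfo {
    // Maps the whole file read-only and returns the buffer
    public static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } // the channel is closed here, but the mapping itself stays valid
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file used only for this demonstration
        Path file = Files.createTempFile("mapped-demo", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));

        MappedByteBuffer buffer = map(file);
        // A mapped buffer is a direct buffer, not backed by a heap byte[]
        System.out.println("isDirect: " + buffer.isDirect());
        System.out.println("hasArray: " + buffer.hasArray());
        // Still readable although the channel has already been closed
        System.out.println("first byte: " + (char) buffer.get(0));
    }
}
```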
Notice there is no method to unmap the memory-mapped file. Once the reference to the MappedByteBuffer is relinquished, the mapping is released when the buffer is garbage collected. The reason given by Sun/Oracle can be found in this bug report:
Hope this helps !!!