Memory Mapped IO in Java

Bhaskar S

04/20/2011

Overview

Recently, one of my friends expressed a need to access log files from production systems running a Java application without logging into them (for troubleshooting purposes). As many of us know, developers are usually kept away from production systems for various compliance and control reasons.

One of the thoughts was to include an HTTP server in the user's Java application to allow access to the log files. This way, the log file(s) can be viewed through a browser during emergencies. For a typical application in production, there are usually several log files (each up to 10 MB) per day.

My friend tried this approach, using an HTTP interface with standard file I/O to read the log files, and reported that the Java application's performance suffered.

This is what prompted me to write this article.

Hands-on with Code

The following is a simple Java program that will allow us to generate a 10 MB log file that we can use for the IO tests:

Listing.1

package com.polarsparc.mmio;

import java.util.Date;
import java.util.Random;
import java.util.logging.Logger;
import java.util.logging.Formatter;
import java.util.logging.FileHandler;
import java.util.logging.LogRecord;

public class LogFileGenerator {
    private static int count = 0;
    private static int logsz = 0;

    public static void main(String[] args) {
        try {
            Random rand = new Random();
            
            Logger logger = Logger.getLogger("com.polarsparc.mmio");

            FileHandler handler = new FileHandler("TestLog.log", true);
            handler.setFormatter(new MyLogFormatter());

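            logger.addHandler(handler);
            // Keep the records out of the default console handler; only the file is needed
            logger.setUseParentHandlers(false);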

            for (;;) {
                if (logsz >= 10 * 1024 * 1024) {
                    break;
                }

                logger.info("This is a log line #" + ++count + 
                    " which will be logged for ID " +
                    rand.nextInt(Integer.MAX_VALUE));
            }
        }
        catch (Exception ex) {
            ex.printStackTrace(System.err);
        }
    }

    private static class MyLogFormatter extends Formatter {
        public String format(LogRecord rec) {
            StringBuilder sb = new StringBuilder();
            sb.append(new Date());
            sb.append(" ");
            sb.append(rec.getLoggerName());
            sb.append(" <");
            sb.append(rec.getThreadID());
            sb.append("> ");
            sb.append(rec.getMessage());
            sb.append("\n");

            logsz += sb.length();

            return sb.toString();
        }
    }
}

Running this simple program will generate a log file called TestLog.log which is about 10 MB in size.

The following is a simple program that allows us to read the TestLog.log file using the standard Java IO:

Listing.2

package com.polarsparc.mmio;

import java.io.FileInputStream;
import java.io.BufferedInputStream;

public class StandardFileIO {
    public static void main(String[] args) {
        long tm = 0;

        BufferedInputStream bin = null;

        try {
            bin = new BufferedInputStream(new FileInputStream("TestLog.log"));

            tm = System.currentTimeMillis();

            while (bin.available() != 0) {
                bin.read();
            }

            System.out.printf("Time to read file TestLog.log: %d ms\n", (System.currentTimeMillis()-tm));
        }
        catch (Exception ex) {
            ex.printStackTrace(System.err);
        }
        finally {
            if (bin != null) {
                try {
                    bin.close();
                }
                catch (Exception ex) {
                }
            }
        }
    }
}

Running this program results in the following output:

Output.1

Time to read file TestLog.log: 15180 ms

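It took a little over 15 secs to read the entire 10 MB file. Even though the stream is wrapped in a BufferedInputStream, the loop handles 1 byte at a time and calls available() before every read, and each available() call is passed down to the underlying FileInputStream. That per-byte overhead is the likely reason for the poor performance. Performance can be improved by reading in larger units.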
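
One detail of Listing.2 is worth isolating: the stream is already a BufferedInputStream, but the loop calls available() before every read(). The following is a minimal sketch (not one of the original tests, and not timed here) of the same byte-at-a-time loop that relies on read() returning -1 at the end of the file instead. The class name ByteAtATimeRead and the method countBytes are made up for illustration, and the try-with-resources statement assumes Java 7 or later:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class ByteAtATimeRead {
    // Reads the file one byte at a time through the buffer and returns the byte count
    static long countBytes(String fileName) throws IOException {
        long total = 0;
        try (BufferedInputStream bin = new BufferedInputStream(new FileInputStream(fileName))) {
            // read() returns -1 at end of file, so available() is not needed
            while (bin.read() != -1) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long tm = System.currentTimeMillis();
        long total = countBytes("TestLog.log");
        System.out.printf("Read %d bytes in %d ms%n", total, System.currentTimeMillis() - tm);
    }
}
```

How much this changes the timing will depend on the system, so it is best measured against the same TestLog.log file.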

The following is a simple program that allows us to read the TestLog.log file a line at a time using the standard Java BufferedReader:

Listing.3

package com.polarsparc.mmio;

import java.io.FileReader;
import java.io.BufferedReader;

public class StandardBufferedIO {
    public static void main(String[] args) {
        long tm = 0;

        BufferedReader br = null;

        try {
            br = new BufferedReader(new FileReader("TestLog.log"));

            tm = System.currentTimeMillis();

            while (br.readLine() != null) {
            }

            System.out.printf("Time to read file TestLog.log: %d ms\n", (System.currentTimeMillis()-tm));
        }
        catch (Exception ex) {
            ex.printStackTrace(System.err);
        }
        finally {
            if (br != null) {
                try {
                    br.close();
                }
                catch (Exception ex) {
                }
            }
        }
    }
}

Running this program results in the following output:

Output.2

Time to read file TestLog.log: 494 ms

WOW !!! What happened here ? This is about 30 times faster. With buffered I/O, each read request to the underlying OS fetches a whole block of data into the buffer, and readLine() is then served from that buffer, so far fewer calls reach the OS.

So let us understand the behavior of the simple file reader. When you read a file using the standard file I/O, the following steps are executed by the application and the underlying OS:

Allocate memory for read buffer in the user space
Invoke the system call into the kernel for the read operation
Allocate memory for kernel buffer space for the read
Access the appropriate disk sector(s) associated with the file
Copy the disk sector data into the kernel buffer space
Copy the data from the kernel buffer into the read buffer in the user space

As you can see, a simple file read involves a lot of operations, which can be inefficient when reading large amounts of data.
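
The steps above can be sketched from the Java side with an explicit read buffer. This is only an illustration under a few assumptions: the class name ChunkedFileRead, the method countBytes and the 8192 byte buffer size are made up for this sketch, and the try-with-resources statement assumes Java 7 or later. The byte array is the read buffer in the user space, and each read() call hands the request to the OS, which copies the data into that array:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ChunkedFileRead {
    // Reads the whole file in chunks and returns the number of bytes read
    static long countBytes(String fileName, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize]; // read buffer in the user space (JVM heap)
        long total = 0;
        try (InputStream in = new FileInputStream(fileName)) {
            int n;
            // Each read() asks the OS to fill the buffer; -1 signals end of file
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long tm = System.currentTimeMillis();
        long total = countBytes("TestLog.log", 8192);
        System.out.printf("Read %d bytes in %d ms%n", total, System.currentTimeMillis() - tm);
    }
}
```

A larger buffer means fewer trips to the OS for the same file, at the cost of more heap memory per reader.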

Performance can be improved further by using memory mapped file I/O.

The following is a simple program that allows us to read the TestLog.log file using the Java memory-mapped file I/O (in NIO):

Listing.4

package com.polarsparc.mmio;

import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MemoryMappedFileIO {
    public static void main(String[] args) {
        long tm = 0;

        FileInputStream fis = null;

        try {
            fis = new FileInputStream("TestLog.log");

            tm = System.currentTimeMillis();

            FileChannel fc = fis.getChannel();

            MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

            while (mbb.hasRemaining()) {
                mbb.get();
            }

            System.out.printf("Time to read file TestLog.log: %d ms\n", (System.currentTimeMillis()-tm));
        }
        catch (Exception ex) {
            ex.printStackTrace(System.err);
        }
        finally {
            if (fis != null) {
                try {
                    fis.close();
                }
                catch (Exception ex) {
                }
            }
        }
    }
}

Running this program results in the following output:

Output.3

Time to read file TestLog.log: 21 ms

AWESOME !!! What have we got here ? This is more than 20 times faster again. So let us understand the behavior of the memory-mapped file reader.

With memory mapped file I/O, the file to be read is mapped into the virtual memory of the process by the underlying OS.

When a read is performed, the following steps are executed by the underlying OS:

Cause a page-fault
Access the appropriate disk sector(s) associated with the file
Copy the disk sector data into the physical memory pages
Read the file data as if reading data from a memory byte buffer

As you can see, this is a much more efficient way of reading the file in terms of the resources used.

In other words, memory-mapped I/O allows us to map a file on disk to a memory byte buffer so that when we read the memory buffer, the data is read from the file. Once the file is mapped, there is no system call per read and no copying of data between kernel and user buffers. More importantly, from a Java perspective, the memory buffer resides in the native space and not in the JVM's heap.
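
A mapping does not have to cover the whole file. For the log viewing case described in the overview, it may be enough to map only the end of the file. The following is a minimal sketch under stated assumptions: the class name MappedTail and the method tail are made up for illustration, the log is assumed to be single-byte text (decoded here as ISO-8859-1), and the try-with-resources statement and StandardCharsets assume Java 7 or later:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

class MappedTail {
    // Returns up to maxBytes from the end of the file as text
    static String tail(String fileName, int maxBytes) throws IOException {
        try (FileInputStream fis = new FileInputStream(fileName)) {
            FileChannel fc = fis.getChannel();
            long size = fc.size();
            long start = Math.max(0, size - maxBytes);
            int len = (int) (size - start);
            if (len == 0) {
                return ""; // nothing to map for an empty file
            }
            // Map only the region [start, start + len) of the file
            MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, start, len);
            byte[] bytes = new byte[len];
            mbb.get(bytes); // bulk copy from the mapped region into a heap array
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Print roughly the last 4 KB of the test log
        System.out.print(tail("TestLog.log", 4096));
    }
}
```

The copy into a heap array is there only to build a String; code that can work on the bytes directly can read them from the MappedByteBuffer without that copy.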

Notice there is no method to unmap the memory mapped file. Once the reference to the MappedByteBuffer is relinquished, the buffer becomes eligible for garbage collection, and the file is unmapped when the buffer is collected. The reason given by Sun/Oracle can be found in this bug report:

http://bugs.sun.com/view_bug.do?bug_id=4724038
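
One consequence of this design is that the mapping is tied to the buffer and not to the channel. The Javadoc for FileChannel.map states that closing the channel has no effect on the validity of the mapping. The following is a minimal sketch of that behavior. The class name MappingOutlivesChannel and the method firstByteAfterClose are made up for illustration, the file is assumed to be non-empty, and the try-with-resources statement assumes Java 7 or later:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappingOutlivesChannel {
    // Maps the file, closes the stream and channel, then reads the first byte from the buffer
    static int firstByteAfterClose(String fileName) throws IOException {
        MappedByteBuffer mbb;
        try (FileInputStream fis = new FileInputStream(fileName)) {
            FileChannel fc = fis.getChannel();
            mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        } // the stream and its channel are closed here
        // The buffer is still usable; get(0) throws if the file was empty
        return mbb.get(0) & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("First byte of TestLog.log: %d%n", firstByteAfterClose("TestLog.log"));
    }
}
```

So closing the FileInputStream in the finally block of Listing.4 releases the file handle, but the mapped region goes away only after the buffer itself is collected.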

Hope this helps !!!