PolarSPARC |
Introduction to Bytecode Handling with ASM
Bhaskar S | 12/26/2021 |
Overview
ASM is a fast, robust, small, low-level Java bytecode framework that can be used either for analysing existing Java bytecode (from Java classes) or for dynamically generating or manipulating Java bytecode.
ASM uses the Visitor behavioral design pattern. This design pattern is used for separating the operations from an object structure hierarchy (where there are different object types). This results in the flexibility to add newer operations on the objects in the hierarchy without modifying them.
To elaborate, there is often a need to perform some operation on each of the objects in the hierarchy, for example, print. Adding this operation to every object is cumbersome when the hierarchy contains many object types. And what happens when we later have to add another operation, say get_size? We would have to modify all the objects in the hierarchy again.
To solve this problem, create a visitor class with all the operations and add an accept method to all the objects in the hierarchy, which takes the visitor as a parameter. Each accept method simply calls back the visitor, passing in the object itself.
One can think of a Java class as a hierarchy of objects (or nodes). There is the class, then the fields, then the methods, and so on.
The ASM framework provides two types of APIs:
Event Based :: traversing the objects (nodes) of the hierarchy results in events (in the form of callbacks). The advantage of this approach is that it is faster and more memory efficient. The drawback, however, is that transformations are more difficult to perform, as we only have access to the current object (node) in the hierarchy.
Tree Based :: traversing the objects (nodes) of the hierarchy stores all the objects (nodes) of the hierarchy in memory. The advantage of this approach is that it lends itself well to complex transformations. The drawback, however, is that it is more memory intensive.
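As a rough illustration of the trade-off between the two styles, the following sketch walks the same list of member names twice: once pushing each name to a callback as it is encountered (event style), and once collecting everything into a list first (tree style). This is plain Java and does not use ASM; the names MemberListener, walkWithEvents, and buildTree are hypothetical and exist only for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EventVsTreeSketch {
    // Event style: the listener is called back once per member and
    // only ever sees the current member (hypothetical interface)
    interface MemberListener {
        void onMember(String name);
    }

    static void walkWithEvents(String[] members, MemberListener listener) {
        for (String member : members) {
            listener.onMember(member); // the walker retains nothing
        }
    }

    // Tree style: every member is stored in memory before the caller
    // gets to work with the structure as a whole
    static List<String> buildTree(String[] members) {
        return new ArrayList<>(List.of(members));
    }

    public static void main(String[] args) {
        String[] members = {"MESSAGE", "<init>", "main"};

        // Event style: counting needs no storage of the members
        int[] count = {0};
        walkWithEvents(members, name -> count[0]++);
        System.out.println("events seen = " + count[0]); // events seen = 3

        // Tree style: a change that needs the whole structure
        // (here, reversing the order) is straightforward
        List<String> tree = buildTree(members);
        Collections.reverse(tree);
        System.out.println("tree = " + tree); // tree = [main, <init>, MESSAGE]
    }
}
```

The event style keeps no state of its own, which is why it is cheap, while the tree style pays for the in-memory copy but can look at any node at any time.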
Setup
The setup will be on an Ubuntu 20.04 LTS based Linux desktop. Ensure Java 17 or above is installed and set up, since the Maven project file used in this article compiles for Java 17. Also, ensure Apache Maven is installed and set up.
To set up the Java directory structure for the demonstrations in this article, execute the following commands:
$ cd $HOME
$ mkdir -p $HOME/java/JavaASM
$ cd $HOME/java/JavaASM
$ mkdir -p src/main/java src/main/resources target
$ mkdir -p src/main/java/com/polarsparc/asm
$ mkdir -p src/main/java/com/polarsparc/visitor
The following is the listing for the Maven project file pom.xml that will be used:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.polarsparc.asm</groupId>
    <artifactId>JavaASM</artifactId>
    <version>1.0</version>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.ow2.asm</groupId>
            <artifactId>asm</artifactId>
            <version>9.2</version>
        </dependency>
    </dependencies>
</project>
Visitor Pattern
Before we dig into ASM, let us get a grasp on the Visitor behavioral design pattern through an example.
The following is the code for FieldNode that represents a field in an object hierarchy:
/*
 * Description: Simple class that represents a Field
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.visitor;

public class FieldNode {
    private final String name;
    private final String type;

    public FieldNode(String n, String t) {
        this.name = n;
        this.type = t;
    }

    public String getName() {
        return name;
    }

    public String getType() {
        return type;
    }

    public void accept(Visitor v) {
        v.visit(this);
    }
}
The following is the code for MethodNode that represents a method in an object hierarchy:
/*
 * Description: Simple class that represents a Method
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.visitor;

public class MethodNode {
    private final String name;

    public MethodNode(String n) {
        this.name = n;
    }

    public String getName() {
        return name;
    }

    public void accept(Visitor v) {
        v.visit(this);
    }
}
The following is the code for Visitor that represents an interface for the various operations on the nodes of the object hierarchy:
/*
 * Description: Interface that indicates the various operations
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.visitor;

public interface Visitor {
    void visit(FieldNode node);
    void visit(MethodNode node);
}
The following is the code for NodeVisitor that implements the Visitor interface for printing the details of the nodes of the object hierarchy:
/*
 * Description: Concrete class that implements Visitor to perform print operation
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.visitor;

public class NodeVisitor implements Visitor {
    @Override
    public void visit(FieldNode node) {
        System.out.printf("FIELD: Name: %s, Type: %s\n", node.getName(), node.getType());
    }

    @Override
    public void visit(MethodNode node) {
        System.out.printf("METHOD: Name: %s\n", node.getName());
    }
}
Finally, the following is the code for DemoVisitor that demonstrates the Visitor pattern:
/*
 * Description: Demo of the Visitor Pattern
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.visitor;

public class DemoVisitor {
    public static void main(String[] args) {
        FieldNode fn = new FieldNode("name", "string");
        MethodNode mn = new MethodNode("greet");

        Visitor visitor = new NodeVisitor();

        fn.accept(visitor);
        mn.accept(visitor);
    }
}
Executing the code in Listing.5 above would result in the following output:
FIELD: Name: name, Type: string
METHOD: Name: greet
Later, when we need to add a new operation, say getSize(), we can implement it as another class that implements Visitor, without modifying any of the nodes in the object hierarchy.
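A minimal sketch of how such a new operation could look is shown below. To keep it self-contained, the Visitor, FieldNode, and MethodNode types are repeated as nested types in a trimmed-down form. The SizeVisitor class and its notion of "size" (the total number of characters in the names and types) are assumptions made up for this illustration and are not part of the listings above.

```java
class SizeVisitorSketch {
    interface Visitor {
        void visit(FieldNode node);
        void visit(MethodNode node);
    }

    // Trimmed-down copies of the nodes; note that they know nothing
    // about the size operation
    static final class FieldNode {
        final String name;
        final String type;

        FieldNode(String name, String type) {
            this.name = name;
            this.type = type;
        }

        void accept(Visitor v) {
            v.visit(this);
        }
    }

    static final class MethodNode {
        final String name;

        MethodNode(String name) {
            this.name = name;
        }

        void accept(Visitor v) {
            v.visit(this);
        }
    }

    // The new operation lives entirely in this hypothetical visitor
    static final class SizeVisitor implements Visitor {
        private int size;

        @Override
        public void visit(FieldNode node) {
            size += node.name.length() + node.type.length();
        }

        @Override
        public void visit(MethodNode node) {
            size += node.name.length();
        }

        int getSize() {
            return size;
        }
    }

    static int totalSize() {
        SizeVisitor visitor = new SizeVisitor();
        new FieldNode("name", "string").accept(visitor);
        new MethodNode("greet").accept(visitor);
        return visitor.getSize();
    }

    public static void main(String[] args) {
        // "name" (4) + "string" (6) + "greet" (5)
        System.out.println("Total size: " + totalSize()); // Total size: 15
    }
}
```

The nodes are untouched; only a new Visitor implementation was added.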
Hands-on with ASM
The following is the code for SimpleHello that will print a simple message on the console:
/*
 * Description: Simple Java class to print 'Hello ASM !!!'
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.asm;

public class SimpleHello {
    private final static String MESSAGE = "Hello ASM !!!";

    public static void main(String[] args) {
        System.out.println(MESSAGE);
    }
}
The following is the code for SimpleVisitorUsingASM that demonstrates the event based approach of ASM by printing the details of the different nodes (class, field, method, etc) of a Java class object hierarchy:
/*
 * Description: Simple Java class to visit nodes of a class using ASM
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.asm;

import org.objectweb.asm.*;

public class SimpleVisitorUsingASM {
    static class SimpleMethodVisitor extends MethodVisitor {
        public SimpleMethodVisitor() {
            super(Opcodes.ASM9);
        }

        @Override
        public void visitVarInsn(int opcode, int var) {
            System.out.printf("visitVarInsn: opcode = %d, var = %d\n", opcode, var);
        }

        @Override
        public void visitLocalVariable(String name, String desc, String signature, Label start, Label end, int index) {
            System.out.printf("visitLocalVariable: Name = %s, Desc = %s, Signature = %s, Index = %d\n",
                    name, desc, signature, index);
        }

        @Override
        public void visitMaxs(int maxStack, int maxLocals) {
            System.out.printf("visitMaxs: max stack = %d, max locals = %d\n", maxStack, maxLocals);
        }
    }

    static class SimpleClassVisitor extends ClassVisitor {
        public SimpleClassVisitor() {
            super(Opcodes.ASM9);
        }

        @Override
        public FieldVisitor visitField(int access, String name, String desc, String signature, Object value) {
            System.out.printf("visitField: Name = %s, Desc = %s, Signature = %s, Value = %s\n",
                    name, desc, signature, value);
            return super.visitField(access, name, desc, signature, value);
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
            System.out.printf("visitMethod: Name = %s, Desc = %s, Signature = %s\n", name, desc, signature);
            return new SimpleMethodVisitor();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.printf("Usage: java %s <class-name>\n", SimpleVisitorUsingASM.class.getName());
            System.exit(1);
        }

        try {
            ClassReader reader = new ClassReader(args[0]);
            reader.accept(new SimpleClassVisitor(), ClassReader.SKIP_FRAMES | ClassReader.SKIP_DEBUG);
        }
        catch (Exception ex) {
            ex.printStackTrace(System.out);
        }
    }
}
The code in Listing.7 above needs some explanation.
Opcodes :: the ASM interface that defines all the constants including the Java JVM opcodes
MethodVisitor(API_VERSION) :: is an abstract visitor class for visiting the object structure within a Java method. The constructor method takes the API version of ASM to use. As of this article, the most current version of ASM is Opcodes.ASM9
SimpleMethodVisitor :: is our custom method visitor that extends MethodVisitor so we can intercept some of the interesting node visits of the method object structure
visitVarInsn(OPCODE, VAR) :: callback method that is invoked when a local variable instruction is visited. The OPCODE is the Java JVM opcode of the instruction and VAR is the index of the local variable
visitLocalVariable(NAME, DESC, SIGNATURE, START, END, INDEX) :: callback method that is invoked when a local variable declaration is visited. NAME is the name of the local variable, DESC is the JVM type descriptor for the local variable, SIGNATURE is the JVM type signature associated with the local variable, START is the first instruction corresponding to the scope of the local variable, END is the last instruction corresponding to the scope of the local variable, and INDEX is the index of the local variable
visitMaxs(STACK, LOCALS) :: callback method that is invoked after all the instructions of the Java method have been visited. It reports the maximum stack size (STACK) and the maximum number of local variables (LOCALS) of the method
ClassVisitor(API_VERSION) :: is an abstract visitor class for visiting the object structure within a Java class. The constructor method takes the API version of ASM to use. As of this article, the most current version of ASM is Opcodes.ASM9
SimpleClassVisitor :: is our custom class visitor that extends ClassVisitor so we can intercept some of the interesting node visits of the class object structure
visitField(ACCESS, NAME, DESC, SIGNATURE, VALUE) :: callback method that is invoked when a field in a class is visited. ACCESS is the access flags of the field, NAME is the name of the field, DESC is the JVM type descriptor for the field, SIGNATURE is the JVM type signature associated with the field, and VALUE is the initial value of the field
visitMethod(ACCESS, NAME, DESC, SIGNATURE, EXCEPTIONS) :: callback method that is invoked when a method in a class is visited. ACCESS is the access flags of the method, NAME is the name of the method, DESC is the JVM type descriptor associated with the method, SIGNATURE is the JVM signature associated with the method, and EXCEPTIONS is the list of internal names of the exception classes the method throws
ClassReader(CLASS_NAME) :: a parser that locates the class file of the specified CLASS_NAME as a resource through the system class loader, reads its bytecode, and allows one to traverse all the object nodes of the class file structure through its accept method
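The ACCESS argument in the callbacks above is a bit mask of flags rather than a single modifier. As a rough way to see what a given value means, the following sketch decodes a few masks using java.lang.reflect.Modifier from the Java standard library (not ASM). It relies on the fact that the flag values for public (0x0001), private (0x0002), static (0x0008), and final (0x0010) are the same in the class file format and in Modifier. The class name AccessFlagsSketch and the helper describe are hypothetical.

```java
import java.lang.reflect.Modifier;

class AccessFlagsSketch {
    // Decodes a flag mask into Java source-level modifier keywords
    static String describe(int access) {
        return Modifier.toString(access);
    }

    public static void main(String[] args) {
        // Flags of a public constructor
        System.out.println(describe(0x0001)); // public

        // Flags of a public static method such as main
        System.out.println(describe(0x0001 | 0x0008)); // public static

        // Flags of a private static final field
        System.out.println(describe(0x0002 | 0x0008 | 0x0010)); // private static final
    }
}
```

This is only a convenience for the common flags. Some class file flags reuse bit positions that Modifier interprets differently (for instance, the class-level ACC_SUPER flag 0x0020 shares its bit with the synchronized method modifier), so the decoded text should not be trusted blindly for every mask.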
Executing the code in Listing.7 above would result in the following output:
visitField: Name = MESSAGE, Desc = Ljava/lang/String;, Signature = null, Value = Hello ASM !!!
visitMethod: Name = <init>, Desc = ()V, Signature = null
visitVarInsn: opcode = 25, var = 0
visitMaxs: max stack = 1, max locals = 1
visitMethod: Name = main, Desc = ([Ljava/lang/String;)V, Signature = null
visitMaxs: max stack = 2, max locals = 1
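The Desc values in the output above are JVM type descriptors: Ljava/lang/String; denotes the class java.lang.String, a leading [ denotes an array, and ([Ljava/lang/String;)V denotes a method that takes a String[] and returns void. The following sketch uses only the Java standard library (Class.descriptorString, available since Java 12, and java.lang.invoke.MethodType) to convert between Java types and descriptors. The class name DescriptorSketch and the helper parse are hypothetical.

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    // Parses a method descriptor into its parameter and return types;
    // a null class loader means the system class loader is used
    static MethodType parse(String methodDescriptor) {
        return MethodType.fromMethodDescriptorString(methodDescriptor, null);
    }

    public static void main(String[] args) {
        // Java type to field descriptor
        System.out.println(String.class.descriptorString());   // Ljava/lang/String;
        System.out.println(String[].class.descriptorString()); // [Ljava/lang/String;

        // Method descriptor of main to Java types
        MethodType mainType = parse("([Ljava/lang/String;)V");
        System.out.println(mainType.parameterType(0).getSimpleName()); // String[]
        System.out.println(mainType.returnType());                     // void

        // Java types to method descriptor (no arguments, returns void)
        System.out.println(MethodType.methodType(void.class).toMethodDescriptorString()); // ()V
    }
}
```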
In the next example, we will recreate the Java bytecode for SimpleHello using ASM.
Before we do that, we will dump the bytecode instructions from the SimpleHello class using the javap utility, which is installed as part of the JDK.
Open a terminal window and execute the following commands:
$ cd $HOME/java/JavaASM
$ javap -v target/classes/com/polarsparc/asm/SimpleHello.class
The following would be the typical output:
Classfile com/polarsparc/asm/SimpleHello.class
  Last modified Dec 25, 2021; size 639 bytes
  SHA-256 checksum 6089eb88882ebc34754f1b1b093640d37a45db9c9bd3c03b7c00c738915bd3fd
  Compiled from "SimpleHello.java"
public class com.polarsparc.asm.SimpleHello
  minor version: 0
  major version: 61
  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
  this_class: #13                         // com/polarsparc/asm/SimpleHello
  super_class: #2                         // java/lang/Object
  interfaces: 0, fields: 1, methods: 2, attributes: 1
Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Fieldref           #8.#9          // java/lang/System.out:Ljava/io/PrintStream;
   #8 = Class              #10            // java/lang/System
   #9 = NameAndType        #11:#12        // out:Ljava/io/PrintStream;
  #10 = Utf8               java/lang/System
  #11 = Utf8               out
  #12 = Utf8               Ljava/io/PrintStream;
  #13 = Class              #14            // com/polarsparc/asm/SimpleHello
  #14 = Utf8               com/polarsparc/asm/SimpleHello
  #15 = String             #16            // Hello ASM !!!
  #16 = Utf8               Hello ASM !!!
  #17 = Methodref          #18.#19        // java/io/PrintStream.println:(Ljava/lang/String;)V
  #18 = Class              #20            // java/io/PrintStream
  #19 = NameAndType        #21:#22        // println:(Ljava/lang/String;)V
  #20 = Utf8               java/io/PrintStream
  #21 = Utf8               println
  #22 = Utf8               (Ljava/lang/String;)V
  #23 = Utf8               MESSAGE
  #24 = Utf8               Ljava/lang/String;
  #25 = Utf8               ConstantValue
  #26 = Utf8               Code
  #27 = Utf8               LineNumberTable
  #28 = Utf8               LocalVariableTable
  #29 = Utf8               this
  #30 = Utf8               Lcom/polarsparc/asm/SimpleHello;
  #31 = Utf8               main
  #32 = Utf8               ([Ljava/lang/String;)V
  #33 = Utf8               args
  #34 = Utf8               [Ljava/lang/String;
  #35 = Utf8               SourceFile
  #36 = Utf8               SimpleHello.java
{
  public com.polarsparc.asm.SimpleHello();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 10: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/polarsparc/asm/SimpleHello;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #15                 // String Hello ASM !!!
         5: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 14: 0
        line 15: 8
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       9     0  args   [Ljava/lang/String;
}
SourceFile: "SimpleHello.java"
The Output.3 from above will guide us to invoke the appropriate ASM visitXXX calls to generate the equivalent Java bytecode.
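The minor version and major version lines in the javap output come from the first few bytes of the class file: a 4-byte magic number (0xCAFEBABE), followed by a 2-byte minor version and a 2-byte major version. A major version of 61 corresponds to Java 17. The following sketch reads that header with only the Java standard library. The class name ClassHeaderSketch and the helper readHeader are hypothetical, and the sketch assumes a top-level class whose class file can be located as a resource next to the class itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassHeaderSketch {
    // Returns {magic, minor version, major version} of the class file
    // that backs the given top-level class
    static int[] readHeader(Class<?> clazz) {
        String resource = clazz.getSimpleName() + ".class";
        try (InputStream in = clazz.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // 4 bytes, big-endian
            int minor = data.readUnsignedShort(); // 2 bytes
            int major = data.readUnsignedShort(); // 2 bytes
            return new int[] {magic, minor, major};
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // java.lang.Object is used here only because it is always present;
        // its version depends on the JDK that runs this sketch
        int[] header = readHeader(Object.class);

        // For Java 5 and later, the Java release is the major version minus 44
        System.out.printf("magic = 0x%08X, minor = %d, major = %d (Java %d)%n",
                header[0], header[1], header[2], header[2] - 44);
    }
}
```

The same major version is what the VERSION argument of the class visit call encodes, which is why the generator in the next listing passes Opcodes.V17.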
The following is the code for SimpleHelloUsingASM that will generate the Java bytecode equivalent of SimpleHello using the ASM framework:
/*
 * Description: Simple Java class to print '*** Hello using ASM !!!' created using ASM
 * Author: Bhaskar S
 * Date: 12/25/2021
 * Blog: https://www.polarsparc.com
 */

package com.polarsparc.asm;

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

import java.lang.reflect.Method;

public class SimpleHelloUsingASM {
    // Emits the class header: public class com.polarsparc.asm.SimpleHelloASM extends Object
    private static void createJavaClass(ClassWriter writer) {
        writer.visit(Opcodes.V17,
                Opcodes.ACC_PUBLIC,
                "com/polarsparc/asm/SimpleHelloASM",
                null,
                "java/lang/Object",
                null);
    }

    // Emits the default constructor that only calls Object.<init>
    private static void createDefaultConstructor(ClassWriter writer) {
        MethodVisitor visitor = writer.visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V", null, null);
        visitor.visitCode();
        visitor.visitVarInsn(Opcodes.ALOAD, 0); // this
        visitor.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
        visitor.visitInsn(Opcodes.RETURN);
        visitor.visitMaxs(1, 1);
        visitor.visitEnd();
    }

    // Emits public static void main(String[]) and then ends the class
    private static void createStaticMainMethod(ClassWriter writer) {
        MethodVisitor visitor = writer.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC,
                "main",
                "([Ljava/lang/String;)V",
                null,
                null);
        visitor.visitCode();
        visitor.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
        visitor.visitLdcInsn("*** Hello using ASM !!!");
        visitor.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        visitor.visitInsn(Opcodes.RETURN);
        visitor.visitMaxs(2, 1);
        visitor.visitEnd();

        writer.visitEnd();
    }

    // Class loader that exposes defineClass so the generated bytes can be loaded
    static class ByteCodeClassLoader extends ClassLoader {
        public Class<?> define(String name, byte[] code) {
            return super.defineClass(name, code, 0, code.length);
        }
    }

    public static void main(String[] args) {
        ClassWriter writer = new ClassWriter(ClassWriter.COMPUTE_MAXS);

        createJavaClass(writer);
        createDefaultConstructor(writer);
        createStaticMainMethod(writer);

        ByteCodeClassLoader loader = new ByteCodeClassLoader();

        try {
            Class<?> helloUsingASMClazz = loader.define("com.polarsparc.asm.SimpleHelloASM", writer.toByteArray());
            Method main = helloUsingASMClazz.getMethod("main", String[].class);
            main.invoke(null, (Object) new String[] {});
        }
        catch (Exception ex) {
            ex.printStackTrace(System.out);
        }
    }
}
The code in Listing.8 above needs some explanation.
ClassWriter(ClassWriter.COMPUTE_MAXS) :: is a class that allows one to visit the various nodes to create the Java class structure in memory. The passed in argument indicates that the writer should automatically compute the maximum size of the stack and the maximum number of local variables of the methods in the generated class. With this flag, visitMaxs must still be called, but the values passed to it are ignored and recomputed
visit(VERSION, ACCESS, NAME, SIGNATURE, PARENT, INTERFACES) :: before we can add anything else, we need a Java class structure to hold the other nodes in the class hierarchy. This method visits the class structure. The VERSION is the class file version, ACCESS is the access flags of the class, NAME is the internal name of the class, SIGNATURE is the JVM type signature associated with the class, PARENT is the internal name of the super class, and INTERFACES is the list of internal names of the interfaces this class implements
The following illustration depicts how the details from Output.3 related to the class map to the visit call:
visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V", null, null) :: this method visit allows us to create the default constructor - notice the name "<init>" and descriptor "()V"
visitCode() :: this method visit begins the method's code
visitEnd() :: this method visit ends the code of a method or of a class
visitVarInsn(Opcodes.ALOAD, 0) :: this method visit emits an instruction that loads a local variable, which is this in this example
visitMethodInsn(OPCODE, OWNER, NAME, DESC, FLAG) :: this method visit instructs a method call of type OPCODE, OWNER indicates the internal name of the class that owns this method, NAME indicates the name of the method, DESC indicates the JVM type descriptor for the method, and FLAG indicates if the OWNER class is an interface
visitInsn(OPCODE) :: this method visit injects a zero operand instruction indicated by OPCODE
visitMaxs(STACK, LOCALS) :: this method visit sets the maximum size of the stack and the maximum number of local variables
The following illustration depicts how the details from Output.3 related to the default constructor map to the various visit calls:
visitFieldInsn(OPCODE, OWNER, NAME, DESC) :: this method visit emits an instruction that either loads or stores a field. OPCODE specifies the type of instruction, OWNER indicates the internal name of the class that owns this field, NAME indicates the name of the field, and DESC indicates the JVM type descriptor for the field type
visitLdcInsn(VALUE) :: this method visit emits an instruction that loads the constant value provided in VALUE
The following illustration depicts how the details from Output.3 related to the main method map to the various visit calls:
Executing the code in Listing.8 above would result in the following output:
*** Hello using ASM !!!
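To see where the stack values passed to visitMaxs (1 for the constructor and 2 for main) come from, the following sketch tracks the operand stack depth across the instructions of each method and records the peak. This is a simplified illustration in plain Java and not the algorithm ASM uses for COMPUTE_MAXS: it assumes straight-line code with no branches and that every value occupies a single stack slot. The class name MaxStackSketch and the helper maxStack are hypothetical.

```java
class MaxStackSketch {
    // Each entry is the net stack effect of one instruction
    // (values pushed minus values popped); returns the peak depth
    static int maxStack(int[] stackEffects) {
        int depth = 0;
        int max = 0;
        for (int effect : stackEffects) {
            depth += effect;
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // <init>: aload_0 pushes this (+1),
        //         invokespecial Object.<init> pops this (-1),
        //         return leaves the stack unchanged (0)
        int[] constructor = {+1, -1, 0};

        // main: getstatic pushes System.out (+1),
        //       ldc pushes the string constant (+1),
        //       invokevirtual println pops both (-2),
        //       return leaves the stack unchanged (0)
        int[] main = {+1, +1, -2, 0};

        System.out.println("<init> max stack = " + maxStack(constructor)); // <init> max stack = 1
        System.out.println("main max stack = " + maxStack(main));          // main max stack = 2
    }
}
```

These match the stack=1 and stack=2 values reported by javap for the two methods of SimpleHello.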