The class file
The JVM class file format is a specification of how bytes are laid out in every .class file. Every .class file is built from the following structure:
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Types
The format defines a few types that describe the size of each component of the .class file. All multibyte values are stored in big-endian order (most significant byte first).
u4
4 bytes in size.
The u4 type is an unsigned 32-bit quantity, so it holds values from 0 to 2^32 - 1 (4294967295).
u2
2 bytes in size.
The u2 type is an unsigned 16-bit quantity, so it holds values from 0 to 65535.
u1
1 byte in size.
The u1 type is an unsigned 8-bit quantity, so it holds values from 0 to 255.
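Java has no unsigned integer types, so a reader written in Java has to widen each value, otherwise a u4 such as 0xCAFEBABE would come out negative. A minimal sketch, assuming the class file bytes are already in memory; UnsignedReader and its methods are hypothetical names:

```java
import java.nio.ByteBuffer;

// Hypothetical helper that reads the class file's unsigned types.
// ByteBuffer defaults to big-endian order, which is what the class file uses.
final class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] bytes) {
        this.buf = ByteBuffer.wrap(bytes);
    }

    // u1: 0 to 255, widened to int because Java's byte is signed
    int u1() {
        return Byte.toUnsignedInt(buf.get());
    }

    // u2: 0 to 65535, widened to int because Java's short is signed
    int u2() {
        return Short.toUnsignedInt(buf.getShort());
    }

    // u4: 0 to 4294967295, widened to long because Java's int is signed
    long u4() {
        return Integer.toUnsignedLong(buf.getInt());
    }
}
```

Reading the bytes FF, FF FF and CA FE BA BE in that order yields 255, 65535 and 3405691582.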
Elements
These are the most important elements of the .class file format. I say "most important" because I will not cover all of them: the format is large and I am only one person, so covering all of it is difficult. If anyone reading wants to help cover the remaining elements, you are free to do so.
Magic number
The magic number is a fixed value the JVM uses to identify .class files, and it must occupy the first 4 bytes (u4) of every .class file.
Its value is 0xCAFEBABE in hex, or 3405691582 in decimal, which means the first 4 bytes of every .class file must be 0xCA 0xFE 0xBA 0xBE.
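A minimal sketch of the check a tool might run before parsing anything else, assuming the whole file has already been read into a byte array; MagicCheck and hasClassFileMagic are hypothetical names:

```java
// Hypothetical check: does the byte array start with 0xCAFEBABE?
final class MagicCheck {
    // Valid Java int literal; it is negative when viewed as a signed int,
    // but comparing bit patterns with == still works.
    private static final int MAGIC = 0xCAFEBABE;

    static boolean hasClassFileMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false; // too short to contain the u4 magic
        }
        // Assemble the first 4 bytes big-endian, masking each to avoid sign extension
        int first4 = ((bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
        return first4 == MAGIC;
    }
}
```

For example, hasClassFileMagic(Files.readAllBytes(Path.of("Hello.class"))) should return true for a class file produced by javac (the file name is only illustrative), while a ZIP or JAR file, which starts with 0x50 0x4B, returns false.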
Major and minor versions
The major and minor versions together identify the version of the class file format, which determines the minimum Java SE release whose JVM can execute the .class file.
The major_version element is a u2 that changes with each significant Java SE release, such as Java 8, 17, 21 and so on. Each release maps to a major version as follows:
Java SE version    Major version
18                 62
17                 61
16                 60
15                 59
14                 58
13                 57
12                 56
11                 55
10                 54
9                  53
8                  52
7                  51
6.0                50
5.0                49
1.4                48
1.3                47
1.2                46
1.1                45
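The table follows a simple pattern: the major version is always the release number plus 44 (1.2 gives 46, 8 gives 52, 18 gives 62). A sketch of the reverse lookup, with hypothetical names, assuming the same pattern continues for releases newer than the table:

```java
// Hypothetical helper: maps a class file major_version to a Java release name.
final class MajorVersions {
    static String javaRelease(int majorVersion) {
        if (majorVersion < 45) {
            throw new IllegalArgumentException("unknown major_version " + majorVersion);
        }
        int release = majorVersion - 44;
        if (release <= 4) {
            return "1." + release;  // 45..48 -> 1.1..1.4
        }
        if (release <= 6) {
            return release + ".0";  // 49, 50 -> 5.0, 6.0
        }
        return Integer.toString(release); // 51 -> 7, 52 -> 8, and so on
    }
}
```

With this, javaRelease(52) returns "8" and javaRelease(49) returns "5.0".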
The minor_version element is also a u2. It does not track JDK updates such as bugfix or security releases. Historically it distinguished minor revisions of the class file format, but for class files with a major version of 56 (Java 12) or above it must be either 0, or 65535 (0xFFFF) to mark a class file that depends on preview features.
In practice it is almost always 0, so it can usually just be written as two zero bytes.
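The Java Virtual Machine Specification only constrains minor_version for newer class files: from major version 56 onwards it must be 0, or 65535 for a class file that uses preview features, while older class files may use any minor version. A sketch of that rule, with hypothetical names, assuming both values have already been read as u2:

```java
// Hypothetical validator for the minor_version rule.
final class MinorVersions {
    static final int PREVIEW_MINOR = 0xFFFF; // 65535 marks preview features

    static boolean isValidMinor(int major, int minor) {
        if (major >= 56) {
            // Java 12 and later: only 0 or 0xFFFF are allowed
            return minor == 0 || minor == PREVIEW_MINOR;
        }
        // Older class files: any u2 value is allowed
        return minor >= 0 && minor <= 0xFFFF;
    }

    static boolean usesPreviewFeatures(int major, int minor) {
        return major >= 56 && minor == PREVIEW_MINOR;
    }
}
```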
Constant pool count
The constant_pool_count element is a u2 equal to the number of entries in the constant pool plus one, where CONSTANT_Long and CONSTANT_Double entries count as two entries each. The constant pool is indexed from 1 to constant_pool_count - 1 (index 0 is never used), so the count must always be greater than 0.
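Because entries have different sizes, the only way to find entry N is to walk the pool from index 1. A minimal sketch, assuming the buffer is positioned just after constant_pool_count; the class name is hypothetical, and the sketch also knows the standard tags that are not covered on this page (Float 4, Long 5, Double 6, InterfaceMethodref 11, MethodHandle 15, MethodType 16, Dynamic 17, InvokeDynamic 18, Module 19, Package 20), since skipping an entry requires knowing its size:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: records the byte offset at which each constant pool entry starts.
// Valid indices run from 1 to constantPoolCount - 1; CONSTANT_Long (5) and
// CONSTANT_Double (6) take up two indices each.
final class ConstantPoolWalker {
    static int[] entryOffsets(ByteBuffer buf, int constantPoolCount) {
        int[] offsets = new int[constantPoolCount];
        Arrays.fill(offsets, -1); // -1 marks index 0 and the unusable slot after Long/Double
        for (int index = 1; index < constantPoolCount; index++) {
            offsets[index] = buf.position();
            int tag = Byte.toUnsignedInt(buf.get());
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length followed by that many bytes
                    int length = Short.toUnsignedInt(buf.getShort());
                    skip(buf, length);
                    break;
                }
                case 7: case 8: case 16: case 19: case 20: // a single u2 index
                    skip(buf, 2);
                    break;
                case 15: // CONSTANT_MethodHandle: u1 reference_kind + u2 reference_index
                    skip(buf, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: // 4 bytes
                    skip(buf, 4);
                    break;
                case 5: case 6: // CONSTANT_Long / CONSTANT_Double: 8 bytes, two indices
                    skip(buf, 8);
                    index++; // the following index is not usable
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
        }
        return offsets;
    }

    private static void skip(ByteBuffer buf, int bytes) {
        buf.position(buf.position() + bytes);
    }
}
```

For a pool holding a Utf8 entry, a Long and a Class entry, constant_pool_count is 5 and index 3 is never assigned an offset.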
Constant pool
The constant pool is a very important element and must exist in every .class file.
Here we must compromise a little: specifying the entire constant pool is also a large task, so I will only gloss over how it works and how data is stored in it and accessed from it.
Pool entries
These are the constant pool entries you will see most often.
Entry                        Tag   Description
CONSTANT_Utf8_info           1     String content encoded in modified UTF-8; used very heavily by other entries.
CONSTANT_Class_info          7     Points to a CONSTANT_Utf8 entry containing the name of a class or interface.
CONSTANT_String_info         8     Represents a constant object of type String; points to a CONSTANT_Utf8 entry holding its content.
CONSTANT_Integer_info        3     Represents a 4-byte signed integer constant (int).
CONSTANT_Fieldref_info       9     A field reference used by getfield, putfield, getstatic and putstatic; points to a CONSTANT_Class entry (the owning class) and a CONSTANT_NameAndType entry.
CONSTANT_Methodref_info      10    A method reference used by instructions such as invokevirtual, invokespecial and invokestatic; points to a CONSTANT_Class entry and a CONSTANT_NameAndType entry.
CONSTANT_NameAndType_info    12    Describes a field or method by pointing to two CONSTANT_Utf8 entries: its name and its descriptor.
Each entry starts with a one-byte (u1) tag identifying its kind, followed by the entry's data.
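A minimal sketch of decoding one entry of the kinds listed above, assuming the entry's bytes (tag included) are already isolated; EntryReader and describe are hypothetical names. It relies on the fact that DataInputStream.readUTF reads a u2 length followed by modified UTF-8 bytes, which is the same layout as the data of a CONSTANT_Utf8 entry:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: turns the bytes of one constant pool entry into readable text.
final class EntryReader {
    static String describe(byte[] entryBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entryBytes))) {
            int tag = in.readUnsignedByte(); // every entry starts with a u1 tag
            switch (tag) {
                case 1:  // CONSTANT_Utf8: u2 length + modified UTF-8 bytes
                    return "Utf8 \"" + in.readUTF() + "\"";
                case 3:  // CONSTANT_Integer: 4-byte big-endian signed int
                    return "Integer " + in.readInt();
                case 7:  // CONSTANT_Class: u2 name_index
                    return "Class name_index=#" + in.readUnsignedShort();
                case 8:  // CONSTANT_String: u2 string_index
                    return "String string_index=#" + in.readUnsignedShort();
                case 9:
                case 10: { // CONSTANT_Fieldref / CONSTANT_Methodref
                    int classIndex = in.readUnsignedShort();
                    int nameAndTypeIndex = in.readUnsignedShort();
                    return (tag == 9 ? "Fieldref" : "Methodref")
                            + " class_index=#" + classIndex
                            + " name_and_type_index=#" + nameAndTypeIndex;
                }
                case 12: { // CONSTANT_NameAndType
                    int nameIndex = in.readUnsignedShort();
                    int descriptorIndex = in.readUnsignedShort();
                    return "NameAndType name_index=#" + nameIndex
                            + " descriptor_index=#" + descriptorIndex;
                }
                default:
                    return "tag " + tag + " (not covered here)";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```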
Entries are used by instructions like ldc, and by field and method definitions, which need entries for their names, descriptors and, in the case of fields, constant values.
Entries are always referenced by their index in the constant pool. This means that when a disassembler shows something like ldc "Hello", the instruction actually stored is
ldc #15, assuming 15 is the index of the required CONSTANT_String entry. The disassembler simply resolves the index used by the ldc instruction to the entry's content, so we don't have to keep looking in the pool to find out what ldc is referencing.
This is only a light explanation, so I suggest reading the JVM specification to understand how entries work in detail.
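The index resolution a disassembler performs for ldc can be sketched with a tiny in-memory pool. This is a hypothetical model, not any real tool's API: it only knows CONSTANT_String and CONSTANT_Utf8 entries, and the output format is only roughly what a disassembler prints:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory constant pool: index -> entry.
final class LdcResolver {
    interface Entry {}

    // CONSTANT_Utf8_info: the actual text
    static final class Utf8 implements Entry {
        final String value;
        Utf8(String value) { this.value = value; }
    }

    // CONSTANT_String_info: only stores the index of a Utf8 entry
    static final class StringConst implements Entry {
        final int stringIndex;
        StringConst(int stringIndex) { this.stringIndex = stringIndex; }
    }

    private final Map<Integer, Entry> pool = new HashMap<>();

    void put(int index, Entry entry) {
        pool.put(index, entry);
    }

    // Renders "ldc #index", appending the resolved string when the index
    // points at a String entry whose string_index points at a Utf8 entry.
    String render(int ldcIndex) {
        Entry e = pool.get(ldcIndex);
        if (e instanceof StringConst) {
            Entry target = pool.get(((StringConst) e).stringIndex);
            if (target instanceof Utf8) {
                return "ldc #" + ldcIndex + " // \"" + ((Utf8) target).value + "\"";
            }
        }
        return "ldc #" + ldcIndex;
    }
}
```

With a Utf8 entry "Hello" at index 20 and a String entry at index 15 pointing to it, render(15) produces ldc #15 // "Hello": the bytecode only ever stores the 15.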
Access flags
The access_flags element is a u2 bit mask that specifies access modifiers.
To use multiple flags, combine their values with a bitwise OR; for example, ACC_PUBLIC | ACC_FINAL = 0x0011.
Flag name                    Value    Interpretation
ACC_PUBLIC (public)          0x0001   Declared public; may be accessed from outside its package.
ACC_FINAL (final)            0x0010   Declared final; no subclasses allowed.
ACC_SUPER                    0x0020   Treat superclass methods specially when invoked by the invokespecial instruction.
ACC_INTERFACE (interface)    0x0200   Is an interface, not a class.
ACC_ABSTRACT (abstract)      0x0400   Declared abstract; must not be instantiated.
ACC_SYNTHETIC                0x1000   Declared synthetic; not present in the source code.
ACC_ANNOTATION (@interface)  0x2000   Declared as an annotation type.
ACC_ENUM (enum)              0x4000   Declared as an enum type.
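Decoding access_flags is just a matter of testing each bit with a bitwise AND. A minimal sketch using the values from the flag table; ClassAccessFlags and decode are hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decodes a class's access_flags into flag names.
final class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_SUPER = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;
    static final int ACC_SYNTHETIC = 0x1000;
    static final int ACC_ANNOTATION = 0x2000;
    static final int ACC_ENUM = 0x4000;

    // Insertion order keeps the output in table order
    private static final Map<Integer, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put(ACC_PUBLIC, "ACC_PUBLIC");
        NAMES.put(ACC_FINAL, "ACC_FINAL");
        NAMES.put(ACC_SUPER, "ACC_SUPER");
        NAMES.put(ACC_INTERFACE, "ACC_INTERFACE");
        NAMES.put(ACC_ABSTRACT, "ACC_ABSTRACT");
        NAMES.put(ACC_SYNTHETIC, "ACC_SYNTHETIC");
        NAMES.put(ACC_ANNOTATION, "ACC_ANNOTATION");
        NAMES.put(ACC_ENUM, "ACC_ENUM");
    }

    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : NAMES.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) { // bit set -> flag present
                names.add(e.getValue());
            }
        }
        return names;
    }
}
```

A public class compiled by javac typically has access_flags 0x0021 (ACC_PUBLIC | ACC_SUPER), and a public interface 0x0601 (ACC_PUBLIC | ACC_INTERFACE | ACC_ABSTRACT).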
This class and super class
The this_class element is a u2 holding an index to a CONSTANT_Class entry, which in turn points to a CONSTANT_Utf8 entry containing the name of this class.
The super_class element is a u2 holding an index to a CONSTANT_Class entry, which points to a CONSTANT_Utf8 entry containing the name of the direct superclass of this class (e.g. java/lang/Object). The only class file whose super_class may be 0 is the one for java.lang.Object itself, since it has no superclass.
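Following this_class or super_class always takes two hops: index to CONSTANT_Class, then name_index to CONSTANT_Utf8. A minimal sketch with a hypothetical in-memory pool that models only those two entry kinds:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves this_class / super_class indices to class names.
final class ClassNameResolver {
    private final Map<Integer, String> utf8 = new HashMap<>();     // index -> Utf8 text
    private final Map<Integer, Integer> classes = new HashMap<>(); // index -> name_index

    void putUtf8(int index, String text) {
        utf8.put(index, text);
    }

    void putClass(int index, int nameIndex) {
        classes.put(index, nameIndex);
    }

    // Returns the internal name (slash-separated, e.g. java/lang/Object).
    // Returns null for 0, which is only valid as super_class of java.lang.Object.
    String resolve(int classIndex) {
        if (classIndex == 0) {
            return null;
        }
        Integer nameIndex = classes.get(classIndex); // first hop: CONSTANT_Class
        if (nameIndex == null) {
            throw new IllegalArgumentException("#" + classIndex + " is not a CONSTANT_Class entry");
        }
        String name = utf8.get(nameIndex);            // second hop: CONSTANT_Utf8
        if (name == null) {
            throw new IllegalArgumentException("#" + nameIndex + " is not a CONSTANT_Utf8 entry");
        }
        return name;
    }
}
```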
The rest
The rest has not been written yet because other pages need work too. It will be finished in the near future, but for now that is all.