Java Integral Data Types Demystified
The data type of a variable in Java determines its nature, that is to say: the values it can hold and the operations it supports. Since Java is a statically and strongly typed language, every variable must be declared with a data type before it can be used. In other words, by the time a variable is declared, it is already meant to hold a value of a particular, predefined nature, like an integer, a decimal, a character, an alphanumeric string, a boolean or a reference to an object.
Some data types are known as value types. Their nature is very basic, very primitive: they are intended to hold a single value that can easily be stored in a single memory location, for instance an integer or a decimal value. These kinds of data types are typically called basic, primitive or built-in data types.
Other data types are more complex, and they are typically composed of other basic data types. For instance, an alphanumeric string is composed of a series of characters. These kinds of data types are typically known as composite data types.
Basic data types, in Java, are known as primitive data types or built-in data types. The primitive data types in Java are: byte, short, int, long, float, double, boolean and char. These data types are “built into” the Java programming language, and the programmer cannot define new primitive data types.
On the other hand, composite data types are expressed as object references in Java. For instance, an alphanumeric value is declared as a reference to an object of type String. Unlike primitive data types, composite data types can be defined by the programmer, since they are built out of other data types. In Java these are called classes, and they are the building blocks of every Java program.
Since Java is an object-oriented programming language, it is also possible to express the primitive data types as object references, through wrapper classes such as Byte, Short, Integer, Long and Character. These are, ultimately, simply wrappers over the primitive value. Nonetheless, primitive values are less expensive to create, have a smaller memory footprint, and operations on them are expected to be faster. That is precisely the reason why Java provides them.
The Size Matters
Every primitive data type has a predefined size in bits. This full size is reserved for the variable even if the value being stored in it does not require the whole space.
Therefore, the programmer is responsible for understanding the size of every primitive data type in order to choose the right one for a variable. If the programmer chooses a data type that is too small, there may be circumstances in which it cannot hold a value; conversely, if the programmer chooses a data type that is too big, memory will be wasted. This latter problem may reach beyond RAM if the variable is stored on disk or sent through the network, in which case disk space and network bandwidth are wasted too.
For instance, if we need to store the age of a person in a variable, we know that we need to store an integer between 0 and 120 (just to be sure!). If we use a 4-byte signed integer data type to store this information, then 3 of the 4 bytes will never be used, because all the numbers between 0 and 120 fit in a single byte. Conversely, if we use a 1-byte signed integer to store the age of a turtle (and we know that turtles may live around 150 years!), then we will be unable to store ages above 127, because a signed byte can only store values from -128 to 127. Choosing the right data type is therefore of utmost importance, particularly in the second case, where errors could be introduced in the program.
Some may argue that in the first case, with the amount of RAM that computers have these days, we may just as well choose the biggest data type, just to be sure, even if some memory is wasted. This alternative is certainly better than choosing an incorrect, smaller data type (which will introduce errors in the program), and it will probably not impact the memory footprint seriously. But it is always better to understand the nature of a variable so that we can choose the appropriate data type and avoid waste, not only in memory but also in disk space or network bandwidth (if the variable is serialized). Take into account that an application server may be creating thousands of these objects every minute, serializing some of them into files or databases and sending others through the network. That said, if you are not sure about the range of a variable, it is better to choose the biggest data type available for its kind.
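To make the turtle example concrete, here is a small sketch. Note that Java rejects `byte age = 150;` at compile time because the literal is out of range, so the sketch forces the narrowing with an explicit cast (wrapped in a hypothetical helper, `toByte`), which is what effectively happens when a computed value is squeezed into a byte.

```java
// Sketch of the turtle-age example: 150 fits in a short but not in a signed byte.
class TurtleAge {

    // Hypothetical helper: narrowing an int to a byte keeps only its low 8 bits.
    static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        int turtleAge = 150;               // an age that needs more than 7 bits of magnitude
        byte asByte = toByte(turtleAge);   // outside -128..127, so the value wraps
        short asShort = (short) turtleAge; // a short has room to spare

        System.out.println("as byte:  " + asByte);  // prints "as byte:  -106"
        System.out.println("as short: " + asShort); // prints "as short: 150"
    }
}
```

No exception is raised: the byte silently ends up holding -106 (that is, 150 - 256), which is exactly the kind of error the paragraph above warns about.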
The following table shows the sizes of the Java integral primitive data types which are the ones we will be covering in this article:
Data Type | Size | Minimum | Maximum |
byte | 8 bits | -(2^7) | (2^7)-1 |
short | 16 bits | -(2^15) | (2^15)-1 |
int | 32 bits | -(2^31) | (2^31)-1 |
long | 64 bits | -(2^63) | (2^63)-1 |
char | unsigned 16 bits | 0 | (2^16)-1 |
Table 1: Integral Data Types
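The values in Table 1 can be checked directly against the constants that Java's wrapper classes expose (`SIZE`, `MIN_VALUE` and `MAX_VALUE`). The following is a minimal sketch; the class name is only illustrative.

```java
// Prints the width and range of each Java integral primitive type
// using the constants of its wrapper class.
class IntegralRanges {
    public static void main(String[] args) {
        // SIZE is the width in bits; MIN_VALUE and MAX_VALUE are the range limits
        System.out.println("byte  " + Byte.SIZE + " bits: " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short " + Short.SIZE + " bits: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int   " + Integer.SIZE + " bits: " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long  " + Long.SIZE + " bits: " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // char constants are characters, so widen them to int to print their numeric value
        System.out.println("char  " + Character.SIZE + " bits: "
                + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);
    }
}
```

Running it should print one line per row of Table 1, for example `byte  8 bits: -128 to 127` and `char  16 bits: 0 to 65535`.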
Integral Data Type Architecture
In order to understand how information is stored in an integral data type, we are going to use the example of a hypothetical 4-bit data type called nibble. This data type does not exist in Java, but thanks to its small size it simplifies the explanations and the examples. First, let’s take this nibble into the operating room and see what’s inside of it. What does an integral data type in Java look like under the hood?
The Nibble Architecture
Every integral data type in Java has a fixed size in bits. The nibble has a size of 4 bits, where the least significant bit is called bit-0 and the most significant bit is bit-3, as Table 2 shows.
3 | 2 | 1 | 0 |
0 | 0 | 0 | 0 |
Table 2: Inside the Nibble
Note: If we applied this to a Java byte, the bits would range from bit-0 to bit-7 (and from bit-0 to bit-31 for an int). But for the sake of simplicity we will stick to the 4-bit nibble from now on.
Signed vs. Unsigned
Now, if we want to store both positive and negative numbers in the nibble, then we say that the nibble is a signed data type; conversely, if we only want to store non-negative numbers, that is to say zero and the positive numbers, then we say that the nibble is an unsigned data type.
If the nibble is a signed data type, then bit-3 indicates the sign: if bit-3 is 0, the number is zero or positive, and if bit-3 is 1, the number is negative. Note that Java represents negative integers in two’s complement, so the remaining 3 bits of a negative number are not simply its magnitude: 1000 stands for -8 and 1111 stands for -1, as Table 3 shows.
Therefore, if we use one bit for the sign, we have fewer bits left for the number. Since the nibble has 4 bits, only 3 bits remain to hold it.
Example 1: The Signed Nibble
Let’s see the possible values a signed nibble could hold:
bit-3 | bit-2 | bit-1 | bit-0 | Positive | bit-3 | bit-2 | bit-1 | bit-0 | Negative |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -8 |
0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | -7 |
0 | 0 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | -6 |
0 | 0 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | -5 |
0 | 1 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | -4 |
0 | 1 | 0 | 1 | 5 | 1 | 1 | 0 | 1 | -3 |
0 | 1 | 1 | 0 | 6 | 1 | 1 | 1 | 0 | -2 |
0 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 1 | -1 |
Table 3: Signed Nibble
Notice how, in both halves of the table, bit-3 represents the sign of the number. For zero and the positive numbers, bit-3 is always 0; for the negative numbers, bit-3 is always 1.
At first sight, it might appear that the negative side holds more numbers than the positive side, because the negative side reaches -8 while the positive side only reaches 7. But this is only apparent: both sides hold exactly 8 numbers, it is just that the positive side is also responsible for holding the zero. There are exactly 8 numbers from 0 to 7 and exactly 8 numbers from -1 to -8.
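Since Java has no nibble type, Table 3 can be reproduced with a sketch that treats the low 4 bits of an int as a signed nibble. The helper `decode` is hypothetical: it shifts the 4-bit pattern to the top of the int and back with an arithmetic right shift, which copies bit-3 into every higher bit, the same sign extension that happens when a negative two’s complement value is widened.

```java
// Sketch: read the low 4 bits of an int as a signed (two's complement) nibble.
class SignedNibble {

    // Hypothetical helper: << 28 moves bit-3 into the int's sign bit (discarding
    // anything above bit-3), and >> 28 shifts back while replicating that sign bit.
    static int decode(int bits) {
        return (bits << 28) >> 28;
    }

    public static void main(String[] args) {
        for (int bits = 0; bits < 16; bits++) {
            // Pad the binary string to 4 digits, e.g. 1 -> "0001"
            String pattern = String.format("%4s", Integer.toBinaryString(bits)).replace(' ', '0');
            System.out.println(pattern + " = " + decode(bits));
        }
    }
}
```

The output lists 0000 to 0111 as 0 to 7 and 1000 to 1111 as -8 to -1, matching the two halves of Table 3.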
Calculating Minimum and Maximum Values
It is very simple to calculate the minimum and maximum values that a signed data type can hold. You can do it using powers of 2, since the nibble holds a binary representation of the number.
The maximum positive value can be calculated with the following formula:
f(x) = (2^x)-1
Where x is the number of bits in the data type available to hold a number.
Notice that we have to subtract 1 because zero takes up one of the 2^x values on the positive side. In the case of the signed nibble this would be: f(3) = (2^3)-1, which is 7.
The minimum value (the most negative number) can be calculated with the following formula:
f(x) = -(2^x)
Where x is the number of bits in the data type available to hold a number.
In this case we do not subtract 1, because the negative side does not hold the zero, so its most negative number always corresponds to the amount of numbers the negative side can hold. In the case of the signed nibble, f(3) = -(2^3), which is -8.
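The two formulas translate directly into code. In this sketch, `maxSigned` and `minSigned` are hypothetical helpers, and, as in the text, x is the number of bits available for the number (the total size minus the sign bit). The arithmetic is done in long, so it is only valid for x up to 62; for the long type itself (x = 63), 2^63 no longer fits.

```java
// Sketch of the signed-range formulas, with x = total bits - 1 (the sign bit).
class SignedRange {

    // f(x) = (2^x) - 1
    static long maxSigned(int x) {
        return (1L << x) - 1;
    }

    // f(x) = -(2^x)
    static long minSigned(int x) {
        return -(1L << x);
    }

    public static void main(String[] args) {
        System.out.println("nibble: " + minSigned(3) + " to " + maxSigned(3));   // -8 to 7
        System.out.println("byte:   " + minSigned(7) + " to " + maxSigned(7));   // -128 to 127
        System.out.println("short:  " + minSigned(15) + " to " + maxSigned(15)); // -32768 to 32767
        System.out.println("int:    " + minSigned(31) + " to " + maxSigned(31)); // -2147483648 to 2147483647
    }
}
```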
Example 2: The Unsigned Nibble
Let’s see the possible values an unsigned nibble could hold:
bit-3 | bit-2 | bit-1 | bit-0 | Value | bit-3 | bit-2 | bit-1 | bit-0 | Value |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 8 |
0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 9 |
0 | 0 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | 10 |
0 | 0 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 11 |
0 | 1 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 12 |
0 | 1 | 0 | 1 | 5 | 1 | 1 | 0 | 1 | 13 |
0 | 1 | 1 | 0 | 6 | 1 | 1 | 1 | 0 | 14 |
0 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 1 | 15 |
Table 4: Unsigned Nibble
Notice how in this case all 4 bits are used to hold the number and none of them is used to hold the sign; that is why this nibble is considered unsigned. Since this time we need to store neither the sign nor any negative numbers, we have room for twice as many non-negative numbers: 16 instead of 8.
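The difference between Table 3 and Table 4 is only in how the same bits are read. The sketch below uses two hypothetical helpers to read each pattern from 1000 to 1111 both ways, and then shows the real Java counterpart: `byte` is signed, but the standard `Byte.toUnsignedInt` method (Java 8 and later) reads the same 8 bits as an unsigned value.

```java
// Sketch: the same bit patterns read as unsigned and as signed (two's complement) nibbles.
class UnsignedNibble {

    // Hypothetical helper: an unsigned nibble is just the low 4 bits, with no sign.
    static int decodeUnsigned(int bits) {
        return bits & 0xF;
    }

    // Hypothetical helper: the signed reading of the same 4 bits (sign-extends bit-3).
    static int decodeSigned(int bits) {
        return (bits << 28) >> 28;
    }

    public static void main(String[] args) {
        // Only the patterns with bit-3 set differ between the two readings
        for (int bits = 8; bits < 16; bits++) {
            System.out.println(Integer.toBinaryString(bits)
                    + " unsigned = " + decodeUnsigned(bits)
                    + ", signed = " + decodeSigned(bits));
        }
        // Java's byte is signed, but Byte.toUnsignedInt reads its 8 bits as unsigned
        byte allOnes = (byte) 0b1111_1111;
        System.out.println(allOnes + " as unsigned = " + Byte.toUnsignedInt(allOnes)); // -1 as unsigned = 255
    }
}
```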
Calculating Minimum and Maximum Values
The minimum number will always be 0, since an unsigned data type can only hold non-negative numbers. The maximum number is calculated in the same way we already showed for signed data types; the only difference is that in this case we have 4 bits instead of 3 available to hold the number.
The maximum positive value can be calculated with the following formula:
f(x) = (2^x)-1
Where x is the number of bits in the data type available to hold a number.
Notice that we still have to subtract 1, because zero still takes up one of the 2^x values. In the case of the unsigned nibble this would be: f(4) = (2^4)-1, which is 15.
The only unsigned Java integral data type is char, which is a 16-bit data type. Therefore, its maximum value is f(16) = (2^16)-1, which is 65,535. All other Java integral data types are signed, as Table 1 showed.
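The unsigned formula can be checked against the unsigned nibble and against `char`. In this sketch `maxUnsigned` is a hypothetical helper; `Character.MIN_VALUE` and `Character.MAX_VALUE` are the standard constants, widened to int so that their numeric value is printed instead of a character.

```java
// Sketch: the unsigned maximum f(x) = (2^x) - 1, checked against char.
class CharRange {

    // f(x) = (2^x) - 1 for an unsigned type with x bits
    static long maxUnsigned(int x) {
        return (1L << x) - 1;
    }

    public static void main(String[] args) {
        System.out.println("unsigned nibble max: " + maxUnsigned(4));           // 15
        System.out.println("char max (formula):  " + maxUnsigned(16));          // 65535
        System.out.println("Character.MAX_VALUE: " + (int) Character.MAX_VALUE); // 65535
        System.out.println("Character.MIN_VALUE: " + (int) Character.MIN_VALUE); // 0
    }
}
```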
Common Mistakes
Overflow/Underflow
It is very easy to introduce bugs in the code if the programmer is not aware of the fact that Java integral operators do not throw any exception when an arithmetic overflow or underflow occurs.
Then, if no exception is thrown, what happens if we add 1 to the maximum integer, or add -1 to the minimum integer? Again, let’s see how the signed and unsigned nibbles would react in order to understand the problem.
Signed Nibble Overflow
Let’s add 1 to the maximum integer in the signed nibble, which in this case is 7. The following table shows the result of the operation 7 + 1 using the signed nibble.
0 | 1 | 1 | 1 | =7 |
0 | 0 | 0 | 1 | =1 |
1 | 0 | 0 | 0 | =-8 |
Table 5: Signed Nibble Overflow
As you can see, the result of this addition is the negative number -8. The overflow caused the sign bit of the signed nibble to turn on. The problem here is evident, because nobody expects 7 + 1 to be equal to -8, right?
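The same behavior can be observed in code. The sketch below simulates Table 5 with a hypothetical helper, `addSignedNibbles`, that keeps only the low 4 bits of the sum (discarding any carry out of bit-3) and reads them back as a signed nibble; it then shows a real Java int overflowing in exactly the same way.

```java
// Sketch: 7 + 1 in a simulated signed nibble, and the same wrap-around in a real int.
class SignedOverflow {

    // Hypothetical helper: add two signed nibbles, keep only 4 bits of the sum,
    // then sign-extend bit-3 to read the result back as a signed nibble.
    static int addSignedNibbles(int a, int b) {
        int low4 = (a + b) & 0xF;
        return (low4 << 28) >> 28;
    }

    public static void main(String[] args) {
        System.out.println("nibble: 7 + 1 = " + addSignedNibbles(7, 1)); // -8

        int max = Integer.MAX_VALUE;
        int wrapped = max + 1; // no exception: the sign bit simply turns on
        System.out.println("int: " + max + " + 1 = " + wrapped); // -2147483648
    }
}
```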
Signed Nibble Underflow
Now let’s add -1 to the minimum integer in the signed nibble, which in this case is -8. The following table shows the result of the operation -8 + -1 (the carry out of bit-3 is discarded).
1 | 0 | 0 | 0 | =-8 |
1 | 1 | 1 | 1 | =-1 |
0 | 1 | 1 | 1 | =7 |
Table 6: Signed Nibble Underflow
As you can see, the result of this addition is the positive number 7. The underflow caused the sign bit of the signed nibble to turn off. Again, the problem here is evident, because nobody expects -8 + -1 to be equal to 7, right?
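Table 6 can be reproduced with the same kind of simulation. In this sketch, `addSignedNibbles` is again a hypothetical helper that keeps the low 4 bits of the sum and sign-extends them; the second part shows a Java byte wrapping from its minimum to its maximum when decremented, because `b--` narrows the result back to byte without any check.

```java
// Sketch: -8 + -1 in a simulated signed nibble, and the same wrap-around in a byte.
class SignedUnderflow {

    // Hypothetical helper: keep the low 4 bits of the sum and read them
    // back as a signed nibble (the carry out of bit-3 is lost).
    static int addSignedNibbles(int a, int b) {
        int low4 = (a + b) & 0xF;
        return (low4 << 28) >> 28;
    }

    public static void main(String[] args) {
        System.out.println("nibble: -8 + -1 = " + addSignedNibbles(-8, -1)); // 7

        byte b = Byte.MIN_VALUE; // -128
        b--;                     // computed as (byte) (b - 1), so it wraps around
        System.out.println("byte: -128 - 1 = " + b); // 127
    }
}
```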
Unsigned Nibble Overflow
Let’s add 1 to the maximum integer in the unsigned nibble, which in this case is 15. The following table shows the result of the operation 15 + 1 using the unsigned nibble (the carry out of bit-3 is discarded).
1 | 1 | 1 | 1 | =15 |
0 | 0 | 0 | 1 | =1 |
0 | 0 | 0 | 0 | =0 |
Table 7: Unsigned Nibble Overflow
As you can see, the result of this addition is the number 0. Again, nobody expects 15 + 1 to be equal to 0, right?
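For the unsigned case the simulation is even simpler: the hypothetical helper `addUnsignedNibbles` just masks the sum down to its low 4 bits. The real Java counterpart is char, Java’s only unsigned integral type, which wraps from 65,535 back to 0 when incremented.

```java
// Sketch: 15 + 1 in a simulated unsigned nibble, and the same wrap-around in a char.
class UnsignedOverflow {

    // Hypothetical helper: add two unsigned nibbles and keep only the low 4 bits.
    static int addUnsignedNibbles(int a, int b) {
        return (a + b) & 0xF;
    }

    public static void main(String[] args) {
        System.out.println("nibble: 15 + 1 = " + addUnsignedNibbles(15, 1)); // 0

        char c = Character.MAX_VALUE; // 65535
        c++;                          // wraps around to 0, no exception
        System.out.println("char: 65535 + 1 = " + (int) c); // 0
    }
}
```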
Unsigned Nibble Underflow
Let’s add -1 to the minimum integer in the unsigned nibble, which in this case is 0. The following table shows the result of the operation 0 + -1 using the unsigned nibble.
Note: Since we cannot express -1 using an unsigned nibble, the operation must be done using signed nibbles, and the result is then reinterpreted as an unsigned nibble.
0 | 0 | 0 | 0 | =0 |
1 | 1 | 1 | 1 | =-1 |
1 | 1 | 1 | 1 | =15 |
Table 8: Unsigned Nibble Underflow
As you can see, the result of this addition is the number 15. Again, nobody expects 0 - 1 to be equal to 15, right?
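Following the note above, this sketch performs 0 + -1 with ordinary signed arithmetic and then keeps only the low 4 bits, which reinterprets the result as an unsigned nibble (`addUnsignedNibbles` is a hypothetical helper). A Java char decremented below 0 behaves the same way and lands on 65,535.

```java
// Sketch: 0 + -1 in a simulated unsigned nibble, and the same wrap-around in a char.
class UnsignedUnderflow {

    // Hypothetical helper: do the addition as signed ints, then keep the low
    // 4 bits, i.e. reinterpret the result as an unsigned nibble.
    static int addUnsignedNibbles(int a, int b) {
        return (a + b) & 0xF;
    }

    public static void main(String[] args) {
        System.out.println("nibble: 0 + -1 = " + addUnsignedNibbles(0, -1)); // 15

        char c = 0;
        c--; // wraps around to Character.MAX_VALUE, no exception
        System.out.println("char: 0 - 1 = " + (int) c); // 65535
    }
}
```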
This all takes us back to the importance of choosing the right data type for a variable because, evidently, size matters.
Example
According to the Java Language Specification (JLS 4.2.2), the only integer operators that can throw an exception are the integer division operator / and the integer remainder operator %, which throw an ArithmeticException if the right-hand operand is zero.
The following listing is an example taken from the same section of the JLS, which shows what happens when an arithmetic overflow occurs in Java.
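A quick way to see this rule in action is to compare a division by zero with an overflowing addition. The sketch below uses two hypothetical helpers that report whether / or % throws; it also shows a lesser-known overflow case that the JLS defines for division: Integer.MIN_VALUE / -1 does not throw, and the result is Integer.MIN_VALUE itself.

```java
// Sketch: integer / and % by zero throw, while overflowing operations just wrap.
class DivideByZero {

    // Hypothetical helper: true if a / b throws ArithmeticException
    static boolean divisionThrows(int a, int b) {
        try {
            int ignored = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Hypothetical helper: true if a % b throws ArithmeticException
    static boolean remainderThrows(int a, int b) {
        try {
            int ignored = a % b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("1 / 0 throws: " + divisionThrows(1, 0));   // true
        System.out.println("1 % 0 throws: " + remainderThrows(1, 0));  // true
        // Overflow never throws: both results silently wrap around
        System.out.println("MAX_VALUE + 1 = " + (Integer.MAX_VALUE + 1));   // -2147483648
        System.out.println("MIN_VALUE / -1 = " + (Integer.MIN_VALUE / -1)); // -2147483648
    }
}
```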
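class Test {
    public static void main(String[] args) {
        int i = 1000000;
        System.out.println(i * i);           // int overflow: prints -727379968
        long l = i;
        System.out.println(l * l);           // computed in long: prints 1000000000000
        System.out.println(20296 / (l - i)); // l - i is 0: throws ArithmeticException
    }
}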
Listing 1: Integer Overflow
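This code produces the output -727379968 and 1000000000000, and then encounters an ArithmeticException in the division 20296 / (l - i), because l - i is zero. The first result is wrong because 1,000,000 * 1,000,000 = 10^12 does not fit in an int, so only its low 32 bits are kept, and they happen to represent -727379968. The second result is correct because the multiplication is done in long.
In the next article I will cover the architecture of the floating-point data types. Until then, have fun.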
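If silent wrap-around is not acceptable, Java 8 and later provide exact-arithmetic methods in java.lang.Math, such as `Math.multiplyExact`, that throw an ArithmeticException on overflow instead of returning a wrapped value. The sketch below applies it to the first line of Listing 1; the helper `exactSquare` and its use of `OptionalInt` are just one possible design, not the only way to handle the exception.

```java
import java.util.OptionalInt;

// Sketch: detect the int overflow from Listing 1 instead of printing a wrong number.
class ExactArithmetic {

    // Hypothetical helper: square an int, or return empty if the result overflows.
    static OptionalInt exactSquare(int i) {
        try {
            return OptionalInt.of(Math.multiplyExact(i, i)); // throws on int overflow
        } catch (ArithmeticException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt square = exactSquare(1000000);
        if (square.isPresent()) {
            System.out.println(square.getAsInt());
        } else {
            System.out.println("int overflow detected"); // printed: 10^12 does not fit in an int
        }
        // Doing the multiplication in long avoids the overflow, as in Listing 1
        System.out.println(Math.multiplyExact(1000000L, 1000000L)); // 1000000000000
    }
}
```

The largest int whose square still fits in an int is 46,340, so `exactSquare(46340)` returns a value while `exactSquare(46341)` does not.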
More Reading
Java Language Specification | Data Type | Primitive Data Type | Composite Data Type |