Obscure Corners of the Java Language
CONTENTS
Lexical Issues
Types
Integral types
Floating point types
Reference types
Variables
Conversions
Names
Packages, Classes, Interfaces
Arrays
Exceptions
Execution
Statements and Expressions
Threads

Lexical Issues

Java language elements are whitespace, comments and tokens. Tokens are identifiers, keywords, literals, separators and operators.
Java files are written in ASCII but may include Unicode characters via a Unicode escape (\uXXXX). For example:
public class Test {
    public static void \u006d\u0061\u0069\u006e(String[] args) {
        System.out.println("This works!");
    }
}
Greedy parsing is used: "a--b" is parsed as "a", "--", "b", rather than "a", "-", "-", "b".
Recognized line terminators are '\n', '\r' and "\r\n". Whitespace consists of space (' '), tab ('\t'), form feed ('\f') and line terminators.
\u001a, or Ctrl-Z, is ignored if it is the last character in the input stream.
Identifiers start with a Java letter, and continue with Java letters or digits. Java letters include a-z, A-Z, _ and $.
The keywords "const" and "goto" are reserved, but are not used in the language.
An integer literal is of type int if it has no suffix, or long if it has 'l' or 'L' as suffix. So there's no suffix indicator for the type int. An integer literal may be decimal, octal or hexadecimal. Decimals start with 1-9 (or are the single digit 0), hexadecimals start with 0x or 0X and octals start with 0. ints range from -2^31 to 2^31-1.
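As a minimal sketch of the literal rules above (the class and method names are illustrative, not part of any library), the following shows how prefix and suffix affect a literal's value and type:

```java
// Illustrative sketch; IntegerLiterals and its methods are hypothetical names.
class IntegerLiterals {
    static int octal() { return 010; }   // leading 0 means octal: value 8
    static int hex() { return 0x10; }    // leading 0x means hexadecimal: value 16

    // 2^31 does not fit in an int, so the L suffix is required here.
    static long pastIntRange() { return 2147483648L; }

    public static void main(String[] args) {
        System.out.println(octal());
        System.out.println(hex());
        System.out.println(pastIntRange());
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
    }
}
```

Dropping the L suffix from 2147483648 would be rejected by the compiler, since the literal would then have to fit in an int.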
A floating point literal is of type float if it has F or f as suffix. A floating point literal is of type double if it has no suffix or if it has D or d as suffix. The formats are defined by IEEE 754. Floats range roughly from 1.4e-45 to 3.4e+38 in magnitude. Doubles range roughly from 4.9e-324 to 1.8e+308 in magnitude.
In a character literal, or inside a string, you can't use the Unicode escape for CR or LF, since they are converted to actual CR and LF characters during translation. You have to use \n or \r.
"null" is the null literal, and is of null type.

Types

Integral types

If an integer operator has at least one operand of type long, then the computation is carried out in 64 bits and the result is long. If there are no long operands, the computation is carried out in 32 bits, and the result is int.
Integer operations never signal underflow or overflow; the result silently wraps around. / and % throw ArithmeticException if the right-hand operand is 0.
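A small sketch of both behaviours, assuming nothing beyond the standard library (the class and method names are made up for illustration):

```java
// Illustrative sketch; IntegerArithmetic and its methods are hypothetical names.
class IntegerArithmetic {
    // Adding 1 to the largest int wraps around to the smallest int; no exception is thrown.
    static int wrapAround() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    // Reports whether integer division by the given divisor throws ArithmeticException.
    static boolean divisionThrows(int divisor) {
        try {
            int unused = 1 / divisor;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapAround() == Integer.MIN_VALUE);
        System.out.println(divisionThrows(0));
    }
}
```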
Casts between any two integral types are permitted.

Floating point types

A JVM may use the "float value set" and "double value set" as possible values for float and double. These correspond to IEEE 754. It is also possible, however, to use the "float extended exponent value set" and the "double extended exponent value set" for float and double, which extend the range of values available. Each value in the float value set is also in the float extended exponent value set, and the same holds for double.
The extended sets are used during computations, but the end result, when assigned to a member variable, local variable, method parameters, etc. must be in the normal value set. If strictfp is specified (exactly how is too much to cover here), then intermediate values must always be in the non-extended value set.
float and double values include, in addition to standard numerical values, the special values NaN (not a number), positive zero, negative zero, positive infinity, negative infinity. 0.0 is represented differently from -0.0. However 0.0 == -0.0 is true, and 0.0 > -0.0 is false. 1.0/0.0 is positive infinity, and 1.0/-0.0 is negative infinity.
The result of <, <=, >, or >= is false if either or both operands are NaN. The result of == is false if either or both operands are NaN, so NaN == NaN is false. The result of != is true if either or both operands are NaN, so NaN != NaN is true.
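The comparison rules for NaN and the signed zeros can be checked with a short sketch like the one below (class and method names are illustrative only):

```java
// Illustrative sketch; FloatSpecials and its methods are hypothetical names.
class FloatSpecials {
    static boolean nanEqualsItself() {
        double nan = Double.NaN;
        return nan == nan;          // false: == is false whenever an operand is NaN
    }

    static boolean nanDiffersFromItself() {
        double nan = Double.NaN;
        return nan != nan;          // true: != is true whenever an operand is NaN
    }

    static boolean zerosCompareEqual() {
        return 0.0 == -0.0;         // true, although the two values are represented differently
    }

    static double oneOverPositiveZero() { return 1.0 / 0.0; }   // positive infinity
    static double oneOverNegativeZero() { return 1.0 / -0.0; }  // negative infinity

    public static void main(String[] args) {
        System.out.println(nanEqualsItself());
        System.out.println(nanDiffersFromItself());
        System.out.println(zerosCompareEqual());
        System.out.println(oneOverPositiveZero());
        System.out.println(oneOverNegativeZero());
    }
}
```

Because NaN never compares equal to anything, Double.isNaN(x) is the usual way to test for it.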
Any floating type value may be cast to any numeric type. When casting to an integral type, the result is truncated towards 0 (e.g. (int)(-2.9) equals -2).
Floating point values are ordered, apart from NaN.
If a numerical operator is applied to two operands, at least one of which is a double, the computation proceeds using 64 bit floating point arithmetic and the result is a double. If a numerical operator is applied to two operands, at least one of which is a float, and neither of which is double, the computation proceeds using 32 bit floating point arithmetic and the result is a float.
Floating point operations produce no exceptions.

Reference types

Two types are the same run-time type if they are both class or both interface types, loaded by the same class loader, and have the same binary name. Or, if they are both array types, and the component types have the same run-time type.

Variables

There are seven kinds of variables:
  • class variables
  • instance variables
  • array components
  • method parameters
  • constructor parameters
  • exception parameters
  • local variables
A variable may be declared final. It is an error to assign to a final variable more than once. A value does not need to be assigned upon declaration, however. A final variable that is not assigned upon declaration is called a blank final. If a final variable holds a reference, the referred object or array may still be modified.
Default values for variables:
  • byte: (byte)0
  • short: (short)0
  • int: 0
  • long: 0L
  • float: 0.0f
  • double: 0.0d
  • char: '\u0000'
  • boolean: false
  • reference types: null

Conversions

1. Identity conversion: no change in type
2. Widening primitive conversion: numerical value to "wider" numerical value.
  • byte to short, int, long, float, double
  • short to int, long, float, double
  • char to int, long, float, double
  • int to long, float, double
  • long to float, double
  • float to double
Conversion of integral types to float or double may involve loss of precision.
3. Narrowing primitive conversion: numerical value to "narrower" numerical value.
  • byte to char
  • short to byte, char
  • char to byte, short
  • int to byte, short, char
  • long to byte, short, char, int
  • float to byte, short, char, int, long
  • double to byte, short, char, int, long, float
Narrowing conversion may involve loss of precision or magnitude. Note that char is unsigned, so conversion to byte or short may result in a negative value.
Conversion from floating point to integral involves two steps: first to long or int, and then if necessary to short, char or byte. In the first step, NaN produces 0, and values outside the range of representable values produce the most positive or most negative representable values. Then conversion to short, char or byte discards all but the lowest n bits. For example, (int)2e25f -> 2147483647, (short)2e25f -> -1, (byte)2e25f -> -1, (int)(char)2e25f -> 65535.
4. Widening reference conversion: from a type to a super-type.
  • class S to class T, if S is a subclass of T
  • class S to interface I, if S implements I
  • null to any class, interface or array type
  • interface I to interface J, if I is a subinterface of J
  • interface I to Object
  • array to Object
  • array to Cloneable
  • array to java.io.Serializable
  • array S[] to array T[], if S widens to T, and S and T are reference types
5. Narrowing reference conversion: from a type to a sub-type.
  • class S to class T, if S is a superclass of T
  • class S to interface I, if S is not final and doesn't implement I
  • Object to array
  • Object to interface
  • interface to non-final class
  • interface I to final class S, if S implements I
  • interface I to interface J, where I is not a subinterface of J, and there's no method in both with the same name and signature but different return types
  • array S[] to array T[], if S narrows to T, and S and T are reference types
These are possible but require run-time verification, and can produce a ClassCastException.
6. String conversion: anything converts to String.
The null type converts to String, producing "null".
7. Value set conversion: between different value sets for floating point values.
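The two-step narrowing from floating point to integral types described in item 3 can be reproduced with a sketch along these lines (the class and method names are hypothetical):

```java
// Illustrative sketch; NarrowingDemo and its methods are hypothetical names.
class NarrowingDemo {
    static int toInt(float f) { return (int) f; }        // saturates at the int range, NaN gives 0
    static short toShort(float f) { return (short) f; }  // float -> int, then keep the low 16 bits
    static byte toByte(float f) { return (byte) f; }     // float -> int, then keep the low 8 bits
    static int viaChar(float f) { return (int) (char) f; } // char is unsigned, so the low 16 bits stay positive

    public static void main(String[] args) {
        float huge = 2e25f;
        System.out.println(toInt(huge));
        System.out.println(toShort(huge));
        System.out.println(toByte(huge));
        System.out.println(viaChar(huge));
        System.out.println(toInt(Float.NaN));
        System.out.println(toInt(-2.9f));
    }
}
```

The -1 results come from 2147483647 (0x7FFFFFFF) having all of its low 16 bits set.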
Conversion can take place during assignment, method invocation, casting, string conversion and numeric promotion.
Assignment conversion can use identity conversion, or widening conversions. In addition, a narrowing conversion can be used when the value is a constant and the compiler can determine that the value is representable in the narrowed type. This permits:
byte b1 = 78;      // ok
final int i = 89;
byte b2 = i;       // ok
int j = 89;
byte b3 = j;       // compile error
Method invocation conversion can use identity conversion, or widening conversions, but never narrowing conversions.
String conversion applies only in the context of the + operator, when one of the operands is a String.
Casting conversions include the identity, widening and narrowing conversions.
You can cast a class S to an interface T even if S doesn't implement T (without compiler error), since a subclass of S may in fact implement T. If S is final, though, the compiler will complain.
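As a hedged illustration of the cast rule above, the sketch below compiles even though Plain does not implement Runnable; whether the cast succeeds is only decided at run time. All class names are invented for the example.

```java
// Illustrative sketch; CastDemo, Plain and RunnablePlain are hypothetical names.
class CastDemo {
    static class Plain { }

    // A subclass of Plain that does implement the interface.
    static class RunnablePlain extends Plain implements Runnable {
        public void run() { }
    }

    // The cast is accepted by the compiler because Plain is not final.
    static boolean castSucceeds(Plain p) {
        try {
            Runnable r = (Runnable) p;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds(new Plain()));
        System.out.println(castSucceeds(new RunnablePlain()));
    }
}
```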
Numeric promotion permits identity conversions and widening conversions. Unary numeric promotion converts a single byte, char or short to int, when it is used as an array dimension, an array index, with unary + or -, with ~, or as either operand (separately) of a shift operator. For binary numeric promotion:
- if either operand is double, the other is converted to double
- else if either is float, the other converts to float
- else if either is long, the other converts to long
- else both are converted to int
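One way to observe these promotion rules is to box the result of an expression and inspect the wrapper class. This is only a sketch; PromotionDemo and its methods are hypothetical names, and it relies on autoboxing (Java 5 or later).

```java
// Illustrative sketch; PromotionDemo and its methods are hypothetical names.
class PromotionDemo {
    // Boxing the argument reveals the static type of the expression passed in.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    static String byteSum() {
        byte a = 1, b = 2;
        return typeOf(a + b);               // both operands are promoted to int
    }

    static String negatedShort() {
        short s = 5;
        return typeOf(-s);                  // unary promotion to int
    }

    static String intPlusLong() { return typeOf(1 + 2L); }         // long
    static String longPlusFloat() { return typeOf(2L + 3.0f); }    // float
    static String floatPlusDouble() { return typeOf(3.0f + 4.0); } // double

    public static void main(String[] args) {
        System.out.println(byteSum());
        System.out.println(negatedShort());
        System.out.println(intPlusLong());
        System.out.println(longPlusFloat());
        System.out.println(floatPlusDouble());
    }
}
```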

Names

A package cannot contain a class or interface, and a subpackage of the same name; that would be ambiguous.
A field and method in the same class can have the same name. The usage context is always enough to disambiguate.
Arrays implement the clone() method. It's a shallow clone, though; if the contents are references, the clone will contain the same references (i.e. not clones of the referenced objects).
A toplevel class not declared public is accessible from within the class's package, not just from within the class.
All members of interfaces (fields and methods) are implicitly public.
Protected members and fields are accessible from within the same package, as well as from subclasses.
Members and fields without any access declaration (i.e. public, protected, private) are accessible from within the same package.
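Here's an interesting case which is a bit hard to understand: a subclass in another package may access a protected member only through a reference whose type is the subclass itself (or one of its subclasses).
// In p/A.java
package p;
public class A {
    protected int x;
}

// In q/B.java
package q;
import p.A;
class B extends A {
    public void f(A a) {
        System.out.println(a.x); // compile error!
    }
    public void f(B b) {
        System.out.println(b.x); // ok!
    }
}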

Packages, Classes, Interfaces

class/interface
+-- toplevel class/interface
|   +-- public
|   +-- default
+-- nested class/interface
    +-- inner class
    |   +-- local class
    |   +-- anonymous class
    |   +-- non-static member class
    +-- static member class
    +-- member interface (implicitly static)
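Nested classes/interfaces come in three varieties: inner classes, static member classes and member interfaces.
Inner classes include: local classes, anonymous classes and non-static member classes.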
A toplevel class or interface may not be declared protected, private or static. Member classes or interfaces can be declared static, public, protected or private.
A class declared final may not have subclasses.
A method may be declared final, in which case it cannot be overridden.
Inner classes may NOT declare static members, unless they are compile-time constants. They may inherit static members, though.
class A {
    static int w;
}
public class B {
    class C extends A {
        static final int x = 3; // ok
        static int y = 4;       // error
        static final int z;     // error
    }
}
It's possible for a single method to implement more than one method declared in multiple interfaces that the class implements (if they have the same signature and return type).
Constructors are not inherited:
class A {
    public A(int x) { ... }
}
class B extends A { }
B b = new B(12); // error
Private constructors are not accessible, even through an implicit constructor:
class A {
    private A() { }
}
class B extends A { }
B b = new B(); // error: implicit constructor for B tries to call super(), which is private
A method of an inaccessible class can be called, if it overrides a method in an accessible superclass.
// in one compilation unit
public class A {
    public void f() { }
}
// in a second unit
class B extends A {
    public void f() { }
}
public class C {
    public static A getA() { return new B(); }
}
// in a third unit
A a = C.getA();
a.f(); // calls B's version of f()
A field inherited via multiple paths is not treated as multiple fields.
public interface I { int x = 1; }
public interface J extends I { }
public class A implements I { }
public class B extends A implements J {
    // one instance of member x!
}
A static blank final MUST be initialized in a static initializer. A blank final instance member MUST be initialized by the end of every constructor.
It is a compile time error if the initializer for a member can terminate with a checked exception.
Instance initializers may use static member values; the static members are initialized beforehand, even if they appear later in the text. For static and instance members, initialization takes place in textual order, except that final static members initialized with constant expressions are initialized first.
public class A {
    int x = s;          // ok
    static int s = 5;
}
public class B {
    static int t = s;   // error (compile-time)
    static int s = 5;
}
public class C {
    static int t = C.s; // ok: constant-initialized final statics are initialized beforehand
    final static int s = 5;
}
public class D {
    int x = y;          // error (compile-time)
    int y = 5;
}
Instance initializers can use this and super.
An abstract method cannot be private, static, strictfp, native, final or synchronized. A native method cannot be strictfp.
An abstract class B that inherits from another abstract class A can redeclare a method from A with fewer exceptions in the throws clause.
public abstract class A {
    abstract void f() throws E, F;
}
public abstract class B extends A {
    abstract void f() throws E;
}
An abstract class can declare abstract a non-abstract method inherited from a superclass, thereby forcing subclasses to implement the method.
public abstract class A {
    public abstract String toString(); // must stay public, as in Object
}
Private methods and all methods of a final class are implicitly final. They can be redundantly declared final.
If a method in a subclass B overrides a method in the superclass A, then the set of exceptions declared to be thrown by the method in B must be a subset of the set of exceptions declared to be thrown by the method in A.
A method declared to return a value need not contain any return statement, if it always throws an exception.
A class may inherit methods of the same name and signature, or member classes/interfaces of the same name, from more than one interface, but it is a compile-time error if it uses the name.
Member interfaces are implicitly static. It is permitted to include the static keyword.
public class A {
    interface B {
        // ok: static interface
    }
    static interface C {
        // ok: redundant declaration
    }
}
An instance initializer of a named class may throw a checked exception only if all constructors of the class declare the exception (or a superclass), and the class has at least one explicitly declared constructor. An instance initializer of an anonymous class may throw any exception. A static member initializer may not throw any checked exceptions.
A constructor may not be abstract, static, final, native, strictfp, or synchronized.
The first line of a constructor may be a 'constructor invocation', referring to an alternate constructor (using 'this'), to a superclass constructor (using 'super') or to the super of an expression which instantiates another class (in the case of inner classes). Example of the third:
class A {
    class B { }
}
class C extends A.B {
    public C() {
        (new A()).super();
    }
}
In this case, B must be created in the context of an enclosing A.
A constructor invocation may not use any instance variables or methods declared in the class or any superclass, or use this or super in an expression.
Order of initialization:
If a class provides no constructor, then a parameterless default constructor is implicitly provided. It invokes super() with no arguments (except in the case of Object!), and has no throws clause. It has the same accessibility as the class itself.
An interface can be declared abstract, but this is redundant and discouraged.
Each interface with no superinterface implicitly declares all the methods of Object. Consequently, it is not possible for an interface to declare a method that has the same name and signature as a method of Object, with different return type or incompatible throws clause.
All interface members are implicitly public and abstract. They may be redundantly declared public or abstract, if desired. However the use of 'abstract' in this case is for backwards compatibility and is strongly discouraged. Every field member of an interface is implicitly static and final. These may be redundantly declared if desired.
public interface A {
    int x = 1;                      // implicitly public, final, static
    int f();                        // implicitly public
    public final static int y = 2;  // ok
    public int g();                 // ok
    public abstract int h();        // ok but discouraged
}
All fields in an interface must be initialized. It need not be a constant expression, but can refer to fields from superinterfaces, and fields earlier in the interface definition. However, initializers which are compile-time constant expressions are executed before initializers that are not constant expressions.
public interface A {
    int x = 1; // implicitly public, final, static
}
public interface B extends A {
    int y = x; // executed second
    int z = 3; // executed first
}
An interface may inherit two fields of the same name from different superinterfaces. This is permitted as long as it does not refer to the name.
An interface may include a member type declaration (class or interface). It is implicitly public and static.

Arrays

Array indexes can be byte, short, char or int, but not long.
long x = 5L;
int[] a = new int[10];
a[x] = 23; // error
A trailing comma in explicit array initializers is acceptable.
int[] x = { 1, 2, 3 };  // ok
int[] y = { 1, 2, 3, }; // ok
Arrays have a public final field 'length', a public method 'clone()' which overrides the version in Object but throws no checked exceptions (unlike the version in Object which throws CloneNotSupportedException), plus the remaining methods from Object. Consequently, it is always possible to clone an array.
The superclass of any array class is Object. Every array class implements java.io.Serializable.
The declaration for an array would be similar to that of this class:
class THE_ARRAY implements Cloneable, java.io.Serializable {
    public final int length = THE_LENGTH;
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            // should never occur!
            throw new InternalError(e.getMessage());
        }
    }
}
Cloning is shallow: primitives or references are just copied, so in the case of references, there is no attempt to call clone() on the array elements. Consequently, a clone of a multi-dimensional array only provides new storage for the highest dimension: lower-level dimensional storage is shared.
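The shallow-clone behaviour for multi-dimensional arrays can be seen in a sketch like this one (names are illustrative): the outer array is new, but its rows are the same objects.

```java
// Illustrative sketch; ShallowCloneDemo and its methods are hypothetical names.
class ShallowCloneDemo {
    // The clone is a distinct outer array.
    static boolean outerArrayIsNew() {
        int[][] grid = new int[2][3];
        int[][] copy = grid.clone();
        return copy != grid;
    }

    // The row references are copied, so both outer arrays point at the same rows.
    static boolean rowsAreShared() {
        int[][] grid = new int[2][3];
        int[][] copy = grid.clone();
        return copy[0] == grid[0] && copy[1] == grid[1];
    }

    // A write through the clone is therefore visible through the original.
    static int writeThroughClone() {
        int[][] grid = new int[2][3];
        int[][] copy = grid.clone();
        copy[1][2] = 42;
        return grid[1][2];
    }

    public static void main(String[] args) {
        System.out.println(outerArrayIsNew());
        System.out.println(rowsAreShared());
        System.out.println(writeThroughClone());
    }
}
```

A deep copy would need to clone each row explicitly, for example in a loop over the outer array.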
ArrayStoreException is possible if a value assigned to an array element is not assignment-compatible to the run-time type of the array.
class A { }
class B extends A { }
B[] b = new B[3];
A[] a = b;
a[0] = new B(); // ok
a[1] = new A(); // run-time error

Exceptions

Throwable is the base class for exceptions.
Throwable
+-- Exception
|   +-- RuntimeException
+-- Error
Exception is the base class for exceptions which a program might reasonably want to catch.
RuntimeException is the base class for exceptions which do not have to be declared in throws clauses.
Error is the base class for exceptions which a program should generally not try to catch.
RuntimeException and its subclasses, as well as Error and its subclasses are unchecked exceptions. All other subclasses of Throwable are checked exceptions, and must be declared in throws clauses.
If an exception is never caught, then all finally clauses are executed, the uncaughtException() method of the thread's ThreadGroup is called, and the thread is terminated.
If a finally clause causes an uncaught exception, then any exception that caused termination of the try block is discarded, and the new exception is propagated.
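A minimal sketch of an exception being discarded by a finally clause (the class and method names are made up for illustration):

```java
// Illustrative sketch; FinallyDemo and its methods are hypothetical names.
class FinallyDemo {
    // The exception from the try block is lost; only the one from finally propagates.
    static void throwInBoth() {
        try {
            throw new IllegalStateException("from try");
        } finally {
            throw new IllegalArgumentException("from finally");
        }
    }

    // Returns the message of whichever exception actually reaches the caller.
    static String survivingMessage() {
        try {
            throwInBoth();
            return "none";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(survivingMessage());
    }
}
```

This is one reason to avoid throwing (or returning) from a finally clause: the original failure disappears without a trace.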

Execution

Steps in launching a Java program:
Interfaces are initialized when the first field that is not initialized by a compile-time constant is used. Note that fields initialized via compile-time constants are handled by the compiler, and do not need to be initialized at run time.
Classes are initialized only when used:
class A {
    static int x = 1;
    static { System.out.println("A"); }
}
class B extends A {
    static { System.out.println("B"); }
}
int y = B.x;   // only "A" is printed, class B has not been used
B b = new B(); // now "B" is printed
Initialization of an interface does not require initialization of any superinterfaces. In fact, if a field of an interface initialized via a compile-time constant is used, the interface does not even need to be initialized.
During instance construction, overridden methods called in a superclass constructor call the methods in the derived class (unlike C++):
class A {
    public A() { out(); }
    public void out() { System.out.println("A"); }
}
class B extends A {
    public B() { super(); }
    public void out() { System.out.println("B"); }
}
B b = new B(); // prints "B"
During instance construction, the superclass constructor completes before initialization of fields in the subclass:
class A {
    public A() { out(); }
    public void out() { System.out.println("A"); }
}
class B extends A {
    private int x = 1;
    public B() { super(); }
    public void out() { System.out.println(x); }
}
B b = new B(); // prints "0", since x has not yet been initialized
Finalizers do not automatically invoke the superclass finalizer.
Exceptions during the call to a finalizer are ignored.
A finalizer may be invoked explicitly. It will be called again when garbage collected. Calling the finalizer explicitly does not prevent subsequent references to the object.
A program can exit when:

Statements and Expressions

A label can have the same name as a package, class, interface, method, field, parameter, or local variable.
The value of the expression in a switch statement must be of type byte, short, char or int. Each of the case values must be a compile-time constant, and must be assignable to the type of the expression.
byte b = 2;
switch (b) {
    case 4: b++; break;
    case 300: b--; break; // compile-time error
    default: break;
}
It is not permitted to include statements which are not reachable, except in the case of an if statement. The intent is to allow conditional compilation of code.
final static boolean DEBUG = false;
int x = 1;
while (DEBUG) { // compile error
    x++;
}
if (DEBUG) {    // ok
    x++;
}
When creating arrays, all dimensions can be defined.
int[][][] a = new int[3][4][5];
When creating class instances, the availability of memory is checked before the constructor parameters are evaluated. When creating arrays, the availability of memory is checked after all dimension expressions are evaluated.
Field references are interpreted according to the declared type.
class A { int x = 1; }
class B extends A { int x = 2; }
B b = new B();
int y = b.x; // y gets 2
A a = b;
int z = a.x; // z gets 1
When referring to a static member field through an instance variable, the variable does not need to be non-null.
class A { static int x = 1; }
A a = null;
int y = a.x; // y gets 1
Return types are not considered when determining which overloaded method to call.
class A { }
class B extends A { }
class C {
    public int f(A a) { return 0; }
    public String f(B b) { return "0"; }
    public void g() {
        B b = new B();
        int i = f(b); // compile error - the version of f() returning String is selected
    }
}
When evaluating an array access expression, first the array reference is evaluated, then the index expressions, then the array entry. So if the array is null, the indexes are still evaluated before the null pointer exception is thrown.
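The evaluation order for array access can be observed by giving the index expression a side effect. The sketch below uses invented names and a counter to show that the index is evaluated even though the array reference is null.

```java
// Illustrative sketch; ArrayAccessOrder and its members are hypothetical names.
class ArrayAccessOrder {
    static int indexEvaluations = 0;

    // Index expression with a visible side effect.
    static int nextIndex() {
        indexEvaluations++;
        return 0;
    }

    // Returns true if the index was evaluated once before the NullPointerException was thrown.
    static boolean indexEvaluatedBeforeNullCheck() {
        indexEvaluations = 0;
        int[] a = null;
        try {
            int unused = a[nextIndex()];
            return false; // not reached: the access on a null array must throw
        } catch (NullPointerException e) {
            return indexEvaluations == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexEvaluatedBeforeNullCheck());
    }
}
```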
The prefix and postfix operators can be applied to any numeric type, including float and double.
Integer division rounds towards 0: 5/2 = 2, and -5/2 = -2.
The modulus operator satisfies a%b = a - (a/b)*b. So:
The signs don't work the same way as for multiplication and division!
Modulus can be applied to floats and doubles as well: 3.59f % 0.2f = 0.19f (or a value very close to it).
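To make the sign behaviour concrete, here is a small sketch (hypothetical names) that evaluates % for each combination of signs and checks the identity a%b = a - (a/b)*b:

```java
// Illustrative sketch; RemainderDemo and its methods are hypothetical names.
class RemainderDemo {
    static int rem(int a, int b) { return a % b; }

    // Checks the identity a % b == a - (a / b) * b for one pair of operands.
    static boolean identityHolds(int a, int b) {
        return a % b == a - (a / b) * b;
    }

    public static void main(String[] args) {
        int[][] pairs = { { 5, 2 }, { -5, 2 }, { 5, -2 }, { -5, -2 } };
        for (int[] p : pairs) {
            System.out.println(p[0] + " % " + p[1] + " = " + rem(p[0], p[1])
                    + ", identity holds: " + identityHolds(p[0], p[1]));
        }
    }
}
```

In each case the result takes the sign of the left-hand operand: the four pairs give 1, -1, 1 and -1 respectively.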
The + sign is left associative, so there is the possibility of confusion when it's used as arithmetic addition and string concatenation in the same expression:
1 + 2 + "x" gives "3x"
"x" + 1 + 2 gives "x12"
In a << b, a >> b and a >>> b, both a and b must be an integral type. Only the lowest order 5 bits (for int) or 6 bits (for long) of b are used. So 5 << 1024 is 5!
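The masking of the shift distance can be checked with a sketch like the following (class and method names are illustrative):

```java
// Illustrative sketch; ShiftDemo and its methods are hypothetical names.
class ShiftDemo {
    // For an int left operand only the low 5 bits of the distance are used (distance & 31).
    static int shiftInt(int value, int distance) { return value << distance; }

    // For a long left operand only the low 6 bits of the distance are used (distance & 63).
    static long shiftLong(long value, int distance) { return value << distance; }

    public static void main(String[] args) {
        System.out.println(shiftInt(5, 1024)); // 1024 & 31 is 0, so the value is unchanged
        System.out.println(shiftInt(1, 33));   // 33 & 31 is 1
        System.out.println(shiftLong(1L, 33)); // 33 & 63 is 33
        System.out.println(shiftLong(1L, 64)); // 64 & 63 is 0
    }
}
```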
The operators &, ^ and | apply to integral types and to boolean types. The operators && and || apply to boolean types; the difference with & and | is that the second operand is not evaluated if the result of the expression is known from the evaluation of the first operand.
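The difference between the short-circuit and the plain boolean operators shows up when the operands have side effects. A minimal sketch, with invented names, counting how many operands get evaluated:

```java
// Illustrative sketch; ShortCircuitDemo and its members are hypothetical names.
class ShortCircuitDemo {
    static int calls;

    // Returns its argument and records that it was evaluated.
    static boolean touch(boolean value) {
        calls++;
        return value;
    }

    // && stops after the first operand once the result is known.
    static int callsWithConditionalAnd() {
        calls = 0;
        boolean result = touch(false) && touch(true);
        return calls;
    }

    // & always evaluates both operands.
    static int callsWithPlainAnd() {
        calls = 0;
        boolean result = touch(false) & touch(true);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWithConditionalAnd());
        System.out.println(callsWithPlainAnd());
    }
}
```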
The ?: operator is right-associative: a ? b : c ? d : e == a ? b : (c ? d : e). The subexpression which is not chosen in an ?: expression is not evaluated.
int x = 0;
int y = (x > 0) ? ++x : --x; // x now has the value -1

Threads

Two ways to create a thread:
// via subclass of Thread
public class ThreadedClass1 extends Thread {
    public void run() { ... }
}
ThreadedClass1 tc1 = new ThreadedClass1();
tc1.start();

// via implementation of Runnable
public class ThreadedClass2 implements Runnable {
    public void run() { ... }
}
ThreadedClass2 tc2 = new ThreadedClass2();
Thread t = new Thread(tc2);
t.start();
Thread states:
In Java 1.5, Thread has a getState() method, which returns a Thread.State:
There's also the isAlive() method, which returns false if the thread is new or dead, and true otherwise.
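As a rough sketch of how getState() and isAlive() behave at the two ends of a thread's life (the class and method names are hypothetical; Thread.State.NEW and Thread.State.TERMINATED are the standard enum constants):

```java
// Illustrative sketch; ThreadStateDemo and its methods are hypothetical names.
class ThreadStateDemo {
    // Starts the thread and waits for it to finish, converting interruption to an unchecked exception.
    static void runToCompletion(Thread t) {
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns { state before start(), state after join() } for a thread with an empty run().
    static Thread.State[] lifecycle() {
        Thread t = new Thread(new Runnable() {
            public void run() { }
        });
        Thread.State before = t.getState();
        runToCompletion(t);
        return new Thread.State[] { before, t.getState() };
    }

    // isAlive() is false both before start() and after the thread has finished.
    static boolean aliveBeforeOrAfter() {
        Thread t = new Thread(new Runnable() {
            public void run() { }
        });
        boolean before = t.isAlive();
        runToCompletion(t);
        return before || t.isAlive();
    }

    public static void main(String[] args) {
        Thread.State[] states = lifecycle();
        System.out.println(states[0] + " -> " + states[1]);
        System.out.println(aliveBeforeOrAfter());
    }
}
```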
Thread synchronization:
notifyAll() is useful in cases where not all of the threads which are waiting will be able to proceed. All threads are woken, and have a chance, rather than just one being woken, which then finds it is unable to proceed, and then notifies another thread, and so on.
Example: producer consumer
public class ProducerConsumer {
    public static void main(String[] args) {
        Slot s = new Slot();
        Producer p = new Producer(s);
        Consumer c = new Consumer(s);
        p.start();
        c.start();
    }

    static class Slot {
        int value;
        boolean available = false;

        public synchronized void put(int v) {
            while (available) {
                try { wait(); } catch (InterruptedException e) { }
            }
            available = true;
            notifyAll();
            value = v;
            System.out.println("Put " + value);
        }

        public synchronized int get() {
            while (!available) {
                try { wait(); } catch (InterruptedException e) { }
            }
            available = false;
            notifyAll();
            System.out.println("Got " + value);
            return value;
        }
    }

    static class Producer extends Thread {
        Slot slot;
        public Producer(Slot s) { slot = s; }
        public void run() {
            for (int i = 0; i < 100; i++) {
                slot.put(i);
            }
        }
    }

    static class Consumer extends Thread {
        Slot slot;
        public Consumer(Slot s) { slot = s; }
        public void run() {
            for (int i = 0; i < 100; i++) {
                int v = slot.get();
            }
        }
    }
}
ThreadLocal can be used to privately associate values with the current thread.
// definition (abridged):
public class ThreadLocal {
    public Object get();
    public void set(Object newValue);
    protected Object initialValue();
}

// typical use: db connections are not thread-safe, so a separate one
// needs to be associated with each thread
public class ConnectionProvider {
    private static class ConnectionPerThread extends ThreadLocal {
        public Object initialValue() {
            // handling of SQLException omitted
            return DriverManager.getConnection(... the url ...);
        }
    }
    // must be static, since it is used from the static method below
    private static ConnectionPerThread cpt = new ConnectionPerThread();
    public static Connection getConnection() {
        return (Connection) cpt.get();
    }
}
Other useful methods:
Daemon threads: when a user thread exits, the JVM checks if there are any other non-daemon user threads, and exits if there are none. So the existence of a user daemon thread does not prevent application exit, whereas the existence of a non-daemon user thread does. Use setDaemon(boolean) (before starting) and isDaemon() (anytime).
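A minimal sketch of a daemon thread, assuming a hypothetical helper named newDaemon: the background thread loops forever, yet the JVM can still exit once main returns, because the only remaining thread is a daemon.

```java
// Illustrative sketch; DaemonDemo and newDaemon are hypothetical names.
class DaemonDemo {
    // Creates (but does not start) a daemon thread; setDaemon must be called before start().
    static Thread newDaemon(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        Thread background = newDaemon(new Runnable() {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        background.start();
        System.out.println("daemon: " + background.isDaemon());
        // main returns here; the looping daemon thread does not keep the JVM alive
    }
}
```

Removing the setDaemon(true) call would leave a non-daemon thread running, and the program would then never exit on its own.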