Monday 31 July 2017

Hacking lambda expressions in Java

Company News, Java, Hacks, Lambdas, SNAMP

In this article we show little-known tricks with lambda expressions in Java 8 and their limitations. The main audience is senior Java developers, researchers and authors of instrumentation tools. We use only the public Java API, without com.sun or other internal classes, so the code is portable across different JVM implementations.

Quick Intro

Lambda expressions were introduced in Java 8 as a way to implement anonymous methods and, in some cases, as an alternative to anonymous classes. At the bytecode level, a lambda expression is replaced with an invokedynamic instruction. This instruction creates an implementation of the functional interface whose single method delegates the call to the actual method containing the code of the lambda body.

For instance, we have the following code:

void printElements(List<String> strings){
  strings.forEach(item -> System.out.printf("Item = %s%n", item));
}

This code will be translated by the Java compiler to something like this:

private static void lambda_forEach(String item){ //generated by Java compiler
  System.out.printf("Item = %s%n", item);
}

private static CallSite bootstrapLambda(Lookup lookup, String name, MethodType type) throws Exception {
  //lookup = provided by VM
  //name = "lambda_forEach", provided by VM
  //type = String -> void
  MethodHandle lambdaImplementation = lookup.findStatic(lookup.lookupClass(), name, type);
  return LambdaMetafactory.metafactory(lookup,
    "accept",
    MethodType.methodType(Consumer.class), //signature of lambda factory
    MethodType.methodType(void.class, Object.class), //signature of method Consumer.accept after type erasure
    lambdaImplementation, //reference to method with lambda body
    type);
}

void printElements(List<String> strings){
  Consumer<String> lambda = invokedynamic(#bootstrapLambda);
  strings.forEach(lambda);
}

The invokedynamic instruction can be roughly represented by the following Java code:

private static CallSite cs;

void printElements(List<String> strings){
  Consumer<String> lambda;
  //begin invokedynamic
  if(cs == null)
    cs = bootstrapLambda(MethodHandles.lookup(), "lambda_forEach", MethodType.methodType(void.class, String.class));
  lambda = (Consumer<String>) cs.getTarget().invokeExact();
  //end invokedynamic
  strings.forEach(lambda);
}

As you can see, LambdaMetafactory produces a call site whose target method handle represents a factory method. Invoking this factory through invokeExact returns an implementation of the functional interface. If the lambda captures variables from the enclosing scope, invokeExact accepts these variables as actual arguments.
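
The factory behavior described above can be reproduced by hand. The following is a minimal sketch, not the code javac actually emits: the class CapturingLambdaDemo and the methods greet and greeter are hypothetical names introduced only for this illustration. It builds the equivalent of name -> prefix + name, where prefix is a captured variable passed to the factory.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

final class CapturingLambdaDemo {
    // Plays the role of the compiler-generated method that holds the lambda body.
    // The captured value comes first, followed by the argument of Function.apply.
    private static String greet(String prefix, String name) {
        return prefix + name;
    }

    // Builds the equivalent of: name -> prefix + name
    @SuppressWarnings("unchecked")
    static Function<String, String> greeter(String prefix) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle body = lookup.findStatic(CapturingLambdaDemo.class, "greet",
                    MethodType.methodType(String.class, String.class, String.class));
            CallSite site = LambdaMetafactory.metafactory(lookup,
                    "apply",
                    MethodType.methodType(Function.class, String.class), // factory: captured String -> Function
                    MethodType.methodType(Object.class, Object.class),   // Function.apply after type erasure
                    body,
                    MethodType.methodType(String.class, String.class));  // specialized signature: String -> String
            // The captured value is passed to the factory as an ordinary argument.
            return (Function<String, String>) site.getTarget().invokeExact(prefix);
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(greeter("Hello, ").apply("world"));
    }
}
```

Note that the factory type has one parameter per captured variable, and that the implementation method receives the captured values before the arguments of the functional interface method.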

In Oracle JRE 8, the metafactory uses ObjectWeb ASM to dynamically generate a Java class that implements the functional interface. If the lambda expression captures external variables, additional fields are added to the generated class. This approach is similar to anonymous classes in the Java language, with the following differences:

  • An anonymous class is generated by the Java compiler at compile time
  • A class for a lambda implementation is generated by the JVM at runtime


Note that the implementation of the metafactory depends on the JVM vendor and version.
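
The difference can be observed from plain Java code. The sketch below uses a hypothetical class LambdaClassDemo and prints the runtime class names of an anonymous class and of a lambda. The exact name printed for the lambda is an implementation detail of the JVM and should not be relied upon.

```java
import java.util.function.Supplier;

final class LambdaClassDemo {
    static Supplier<String> anonymous() {
        // javac writes a separate class file for this anonymous class.
        return new Supplier<String>() {
            @Override
            public String get() {
                return "anonymous";
            }
        };
    }

    static Supplier<String> lambda() {
        // No class file exists for this lambda; its class is produced at runtime.
        return () -> "lambda";
    }

    public static void main(String[] args) {
        System.out.println(anonymous().getClass().getName()); // name assigned by the compiler
        System.out.println(lambda().getClass().getName());    // JVM-specific generated name
    }
}
```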

Of course, the invokedynamic instruction is not used exclusively for lambda expressions in Java. It was introduced primarily for dynamic languages running on top of the JVM. The Nashorn JavaScript engine, provided by Java out of the box, relies heavily on this instruction.

In the rest of this article we focus on the LambdaMetafactory class and its capabilities. The following chapters assume that you understand how the metafactory method works and what a MethodHandle is. If you want to dive deeper into invokedynamic and lambda translation, please read these articles:


Tricks with lambdas

In this section we show how to use dynamic construction of lambdas for day-to-day tasks.

Checked exceptions and lambdas

It is no secret that none of the functional interfaces provided by Java support checked exceptions. Checked versus unchecked exceptions is an old holy war in Java. In our opinion, checked exceptions are a great mistake in the Java language.

What if you want to use code with checked exceptions inside lambdas used in conjunction with Java Streams? For example, we need to transform a list of strings into a list of URLs like this:

Arrays.asList("http://localhost/", "https://github.com")
.stream()
.map(URL::new)
.collect(Collectors.toList())

The constructor URL(String) declares a checked exception in its throws clause; therefore, it cannot be used directly as a method reference for Function.

You may say, "Yes, this is possible using tricks like this":

public static <T> T uncheckCall(Callable<T> callable) {
  try { return callable.call(); }
  catch (Exception e) { return sneakyThrow(e); }
}

private static <E extends Throwable, T> T sneakyThrow0(Throwable t) throws E { throw (E)t; }

public static <T> T sneakyThrow(Throwable e) {
  return Util.<RuntimeException, T>sneakyThrow0(e);
}

// Usage sample
//return s.filter(a -> uncheckCall(a::isActive))
//        .map(Account::getNumber)
//        .collect(toSet());

This is a dirty hack, and here is why:

  1. It uses a try-catch block
  2. Re-throwing the exception degrades performance
  3. It exploits type erasure in Java


This problem can be solved in a more "legal" way using the following facts:

  1. Checked exceptions are recognized only by the compiler of the Java programming language
  2. The throws clause is just metadata for the method, without semantic meaning at the JVM level
  3. Checked and unchecked exceptions are indistinguishable at the bytecode and JVM level
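
That the throws clause is stored as metadata can be seen through the Reflection API. The following sketch (the class ThrowsMetadataDemo and its helper are hypothetical names) reads the declared exceptions of Callable.call. It only shows that the information is recorded with the method; it does not by itself prove that the JVM ignores it.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.concurrent.Callable;

final class ThrowsMetadataDemo {
    // Returns the exception types recorded in the throws clause of a public no-arg method.
    static Class<?>[] declaredExceptions(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            return method.getExceptionTypes();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Callable.call is declared as "V call() throws Exception"
        System.out.println(Arrays.toString(declaredExceptions(Callable.class, "call")));
    }
}
```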


The solution is simply to wrap the invocation of Callable.call in a method without a throws clause:

static <V> V callUnchecked(Callable<V> callable){
  return callable.call();
}

This code will not be compiled by the Java compiler because the method call declares the checked exception Exception in its throws clause. But we can erase this clause using a dynamically constructed lambda expression.

First, we declare a functional interface that has no throws clause but delegates the call to Callable.call:

@FunctionalInterface
interface SilentInvoker {
        MethodType SIGNATURE = MethodType.methodType(Object.class, Callable.class);//signature of method INVOKE

        <V> V invoke(final Callable<V> callable); 
}

The second step is to create an implementation of this interface using LambdaMetafactory and to delegate the call of SilentInvoker.invoke to Callable.call. As stated previously, the throws clause is ignored at the bytecode level; therefore, SilentInvoker.invoke is able to call Callable.call without declaring a checked exception:

private static final SilentInvoker SILENT_INVOKER;

static {
    final MethodHandles.Lookup lookup = MethodHandles.lookup();
    try {
        final CallSite site = LambdaMetafactory.metafactory(lookup,
                    "invoke",
                    MethodType.methodType(SilentInvoker.class),
                    SilentInvoker.SIGNATURE,
                    lookup.findVirtual(Callable.class, "call", MethodType.methodType(Object.class)),
                    SilentInvoker.SIGNATURE);
        SILENT_INVOKER = (SilentInvoker) site.getTarget().invokeExact();
    } catch (final Throwable e) { //linkage and invokeExact may throw any Throwable
        throw new ExceptionInInitializerError(e);
    }
}

Third, write a utility method that calls Callable.call without declaring a checked exception:

public static <V> V callUnchecked(final Callable<V> callable) {
  return SILENT_INVOKER.invoke(callable);
}

Now we can rewrite our stream without any problems with checked exceptions:

Arrays.asList("http://localhost/", "https://github.com")
.stream()
.map(url -> callUnchecked(() -> new URL(url)))
.collect(Collectors.toList());

This code compiles successfully because callUnchecked declares no exceptions in its throws clause. Moreover, the call to callUnchecked may be inlined using monomorphic inline caching because only one class in the JVM implements the SilentInvoker interface.

If Callable.call throws an exception at runtime, it propagates to the caller and can be caught without any problem:

try{
  callUnchecked(() -> new URL("Invalid URL"));
} catch (final Exception e){
  System.out.println(e);
}
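
For reference, the three steps can be combined into one self-contained class. This is a sketch under assumptions rather than the SNAMP implementation: the enclosing class UncheckedCalls and the helper thrownBy are hypothetical names, and an IOException is used as the checked exception to keep the example independent of URL.

```java
import java.io.IOException;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.Callable;

final class UncheckedCalls {
    @FunctionalInterface
    interface SilentInvoker {
        // Erased signature of invoke: (Callable)Object
        MethodType SIGNATURE = MethodType.methodType(Object.class, Callable.class);

        <V> V invoke(Callable<V> callable);
    }

    private static final SilentInvoker SILENT_INVOKER;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            CallSite site = LambdaMetafactory.metafactory(lookup,
                    "invoke",
                    MethodType.methodType(SilentInvoker.class),
                    SilentInvoker.SIGNATURE,
                    lookup.findVirtual(Callable.class, "call", MethodType.methodType(Object.class)),
                    SilentInvoker.SIGNATURE);
            SILENT_INVOKER = (SilentInvoker) site.getTarget().invokeExact();
        } catch (Throwable e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Calls Callable.call without declaring or wrapping its checked exception.
    static <V> V callUnchecked(Callable<V> callable) {
        return SILENT_INVOKER.invoke(callable);
    }

    // Hypothetical helper: returns whatever the callable throws, or null if it completes normally.
    static Throwable thrownBy(Callable<?> callable) {
        try {
            callUnchecked(callable);
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(callUnchecked(() -> Integer.parseInt("42")));
        // The original checked exception arrives unwrapped.
        System.out.println(thrownBy(() -> { throw new IOException("boom"); }));
    }
}
```

Because the compiler does not know that callUnchecked can throw a checked exception, a catch clause for a specific checked type such as IOException would be rejected at compile time. Catching Exception, as in thrownBy, is one way around that.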

This approach can also be used when you are sure that the checked exception will never be thrown:

callUnchecked(() -> new ObjectName("")); //empty string is a valid object name and never throws MalformedObjectNameException

The complete implementation of this utility method is available as part of the open-source project SNAMP.

Working with getters and setters

This chapter is useful for authors of serialization/deserialization libraries for data formats such as JSON, Thrift, etc. It may also be useful if your code relies heavily on Java Reflection for JavaBean getters and setters.

A getter declared in a JavaBean is a method named getXXX with no parameters and a non-void return type. A setter declared in a JavaBean is a method named setXXX with a single parameter and a void return type. These two notations can be represented as functional interfaces:

  1. Getter can be represented as Function where the argument of the function is this reference
  2. Setter can be represented as BiConsumer where the first argument is this reference and the second is a value to be passed into setter.
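
With ordinary method references, the two shapes look as follows. The bean Person and the class AccessorShapesDemo are hypothetical and exist only for this illustration.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

final class AccessorShapesDemo {
    // Hypothetical JavaBean used only for this illustration.
    static final class Person {
        private String name = "";

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    // The receiver (this) becomes the explicit argument of Function.apply.
    static final Function<Person, String> NAME_GETTER = Person::getName;
    // The receiver is the first argument, the new value is the second one.
    static final BiConsumer<Person, String> NAME_SETTER = Person::setName;

    static String roundTrip(String name) {
        Person person = new Person();
        NAME_SETTER.accept(person, name);
        return NAME_GETTER.apply(person);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Alice"));
    }
}
```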


Now we create two methods that can convert any getter or setter into these functional interfaces. It doesn't matter that both functional interfaces are generic and that, after type erasure, the actual types are equal to Object. LambdaMetafactory performs the automatic casting of the return type and arguments. Additionally, Guava's Cache helps to cache lambdas for the same getter or setter.

First, it is necessary to declare caches for getters and setters. A Method from the Reflection API represents the actual getter or setter and is used as the key. The value in the cache is the dynamically constructed functional interface for that particular getter or setter.

private static final Cache<Method, Function> GETTERS = CacheBuilder.newBuilder().weakValues().build();
private static final Cache<Method, BiConsumer> SETTERS = CacheBuilder.newBuilder().weakValues().build();

Second, create factory methods that create an instance of the functional interface from a method handle pointing to a getter or setter:

private static Function createGetter(final MethodHandles.Lookup lookup,
                                         final MethodHandle getter) throws Exception{
        final CallSite site = LambdaMetafactory.metafactory(lookup, "apply",
                MethodType.methodType(Function.class),
                MethodType.methodType(Object.class, Object.class), //signature of method Function.apply after type erasure
                getter,
                getter.type()); //actual signature of getter
        try {
            return (Function) site.getTarget().invokeExact();
        } catch (final Exception e) {
            throw e;
        } catch (final Throwable e) {
            throw new Exception(e);
        }
}

private static BiConsumer createSetter(final MethodHandles.Lookup lookup,
                                           final MethodHandle setter) throws Exception {
        final CallSite site = LambdaMetafactory.metafactory(lookup,
                "accept",
                MethodType.methodType(BiConsumer.class),
                MethodType.methodType(void.class, Object.class, Object.class), //signature of method BiConsumer.accept after type erasure
                setter,
                setter.type()); //actual signature of setter
        try {
            return (BiConsumer) site.getTarget().invokeExact();
        } catch (final Exception e) {
            throw e;
        } catch (final Throwable e) {
            throw new Exception(e);
        }
}

Automatic casting between the Object-based arguments of the functional interface after type erasure and the actual argument and return types of the getter or setter is achieved through the difference between samMethodType and instantiatedMethodType (the fourth and sixth arguments of the metafactory method, respectively). The instantiated method type is a specialization of the signature of the functional interface method, and it must be compatible with the method that provides the implementation of the lambda.
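
The adaptation between the two method types can be seen in isolation, without Guava or reflection. The sketch below (AdaptedGetterDemo and lengthGetter are hypothetical names) turns String.length() into a Function<String, Integer>. As a deliberate choice, the instantiated type uses the boxed Integer, which mirrors what javac emits for the method reference String::length, rather than passing the primitive-based type of the handle.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

final class AdaptedGetterDemo {
    @SuppressWarnings("unchecked")
    static Function<String, Integer> lengthGetter() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Type of the handle is (String)int: the receiver is the first parameter.
            MethodHandle length = lookup.findVirtual(String.class, "length",
                    MethodType.methodType(int.class));
            CallSite site = LambdaMetafactory.metafactory(lookup,
                    "apply",
                    MethodType.methodType(Function.class),
                    MethodType.methodType(Object.class, Object.class),   // samMethodType: erased Function.apply
                    length,
                    MethodType.methodType(Integer.class, String.class)); // instantiatedMethodType: String -> Integer
            return (Function<String, Integer>) site.getTarget().invokeExact();
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The generated class casts the argument to String and boxes the int result.
        System.out.println(lengthGetter().apply("lambda"));
    }
}
```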

Third, create a facade for these factories with caching support:

public static Function reflectGetter(final MethodHandles.Lookup lookup, final Method getter) throws ReflectiveOperationException {
        try {
            return GETTERS.get(getter, () -> createGetter(lookup, lookup.unreflect(getter)));
        } catch (final ExecutionException e) {
            throw new ReflectiveOperationException(e.getCause());
        }
}

public static BiConsumer reflectSetter(final MethodHandles.Lookup lookup, final Method setter) throws ReflectiveOperationException {
        try {
            return SETTERS.get(setter, () -> createSetter(lookup, lookup.unreflect(setter)));
        } catch (final ExecutionException e) {
            throw new ReflectiveOperationException(e.getCause());
        }
}

Method information obtained as a Method instance through the Java Reflection API can easily be transformed into a MethodHandle. Take into account that instance methods always have a hidden first argument used for passing this into the method. Static methods do not have this hidden parameter. For example, the method Integer.intValue() effectively has the signature public static int intValue(Integer this). This trick is used in our implementation of functional wrappers for getters and setters.
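
The hidden receiver parameter is visible in the type of the unreflected method handle. A small sketch, with UnreflectDemo and handleType as hypothetical names:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class UnreflectDemo {
    // Converts a public method found via reflection into a method handle and returns its type.
    static MethodType handleType(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return MethodHandles.publicLookup()
                    .unreflect(owner.getMethod(name, parameterTypes))
                    .type();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Instance method: the receiver appears as the first parameter of the handle.
        System.out.println(handleType(Integer.class, "intValue"));
        // Static method: the handle has only the declared parameters.
        System.out.println(handleType(Integer.class, "parseInt", String.class));
    }
}
```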

Now it is time to test the code:

final Date d = new Date();
final BiConsumer<Date, Long> timeSetter = reflectSetter(MethodHandles.lookup(), Date.class.getDeclaredMethod("setTime", long.class));
timeSetter.accept(d, 500L); //the same as d.setTime(500L);
final Function<Date, Long> timeGetter = reflectGetter(MethodHandles.lookup(), Date.class.getDeclaredMethod("getTime"));
System.out.println(timeGetter.apply(d)); //the same as d.getTime()
//output is 500

This approach with cached getters and setters can be used effectively in serialization/deserialization libraries (such as Jackson) that call getters and setters during serialization and deserialization. Invoking these functional interfaces is faster than invoking through the Java Reflection API. The complete code is available as part of the open-source project SNAMP.

Limitations and bugs

In this section we show some bugs and limitations associated with lambdas in the Java compiler and the JVM. All of these limitations are reproducible on OpenJDK and Oracle JDK with javac version 1.8.0_131 on Windows and Linux.

Construct lambdas from method handles

As you know, a lambda can be constructed dynamically using LambdaMetafactory. To achieve that, you specify a MethodHandle that points to an implementation of the single method declared by the functional interface. Let's take a look at this simple example:

final class TestClass {
    String value = "";

    public String getValue() {
        return value;
    }

    public void setValue(final String value) {
        this.value = value;
    }
}
final TestClass obj = new TestClass();
obj.setValue("Hello, world!");
final MethodHandles.Lookup lookup = MethodHandles.lookup();
final CallSite site = LambdaMetafactory.metafactory(lookup,
                "get",
                MethodType.methodType(Supplier.class, TestClass.class),
                MethodType.methodType(Object.class),
                lookup.findVirtual(TestClass.class, "getValue", MethodType.methodType(String.class)),
                MethodType.methodType(String.class));
final Supplier<String> getter = (Supplier<String>) site.getTarget().invokeExact(obj);
System.out.println(getter.get());

This code is equivalent to:

final TestClass obj = new TestClass();
obj.setValue("Hello, world!");
final Supplier<String> elementGetter = () -> obj.getValue();
System.out.println(elementGetter.get());

The method Supplier.get in the lambda implementation simply delegates the call to the method getValue specified as a MethodHandle, and the object obj is used as the this argument because getValue is not a static method. But what if we replace the getValue method handle with a method handle that represents a field getter?

final CallSite site = LambdaMetafactory.metafactory(lookup,
                "get",
                MethodType.methodType(Supplier.class, TestClass.class),
                MethodType.methodType(Object.class),
                lookup.findGetter(TestClass.class, "value", String.class), //field getter instead of method handle to getValue
                MethodType.methodType(String.class));

One would expect this code to work because findGetter returns a method handle that points to a field getter and has a valid signature. But if you run the code, you will see the following exception:

java.lang.invoke.LambdaConversionException: Unsupported MethodHandle kind: getField

Interestingly, the field getter works well if we use MethodHandleProxies:

final Supplier<String> getter = MethodHandleProxies
                                       .asInterfaceInstance(Supplier.class, lookup.findGetter(TestClass.class, "value", String.class)
                                       .bindTo(obj));

Note that MethodHandleProxies is not a good way to create lambdas dynamically because this class just wraps the MethodHandle into a proxy class and delegates the call of InvocationHandler.invoke to MethodHandle.invokeWithArguments. This approach uses Java Reflection and is very slow.

As we have seen, not all method handles can be used for constructing lambdas at runtime. Only a few kinds of method handles, all related to methods, can be used for dynamic construction of lambda expressions:

  1. REF_invokeInterface that can be constructed by Lookup.findVirtual for interface methods
  2. REF_invokeVirtual that can be constructed by Lookup.findVirtual for virtual methods provided by class
  3. REF_invokeStatic that can be constructed by Lookup.findStatic for static methods
  4. REF_newInvokeSpecial that can be constructed by Lookup.findConstructor for constructors
  5. REF_invokeSpecial that can be constructed by Lookup.findSpecial for private methods and early binding to virtual methods provided by class


Other method handles will cause LambdaConversionException.
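
Before passing a handle to the metafactory, its kind can be inspected with Lookup.revealDirect. The following sketch uses hypothetical names (HandleKindDemo, Bean, kindOf) and compares a method handle with a field getter handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class HandleKindDemo {
    // Hypothetical class with one field and one getter, similar to TestClass above.
    static class Bean {
        String value = "";

        String getValue() {
            return value;
        }
    }

    // Cracks a direct method handle and returns its reference kind (one of MethodHandleInfo.REF_*).
    static int kindOf(MethodHandle handle) {
        return MethodHandles.lookup().revealDirect(handle).getReferenceKind();
    }

    static MethodHandle methodHandle() {
        try {
            return MethodHandles.lookup()
                    .findVirtual(Bean.class, "getValue", MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static MethodHandle fieldHandle() {
        try {
            return MethodHandles.lookup().findGetter(Bean.class, "value", String.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A kind from the list above: accepted by the metafactory.
        System.out.println(MethodHandleInfo.referenceKindToString(kindOf(methodHandle())));
        // A field access kind: rejected with LambdaConversionException.
        System.out.println(MethodHandleInfo.referenceKindToString(kindOf(fieldHandle())));
    }
}
```

A check like this could be used to fall back to another strategy, for example an ordinary lambda that invokes the handle, when the kind is not one of the supported ones.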

Generic exceptions

This bug is associated with the Java compiler and the ability to declare a generic exception in the throws clause. The following simple code demonstrates this compiler behavior:

interface ExtendedCallable<V, E extends Exception> extends Callable<V>{
        @Override
        V call() throws E;
}

final ExtendedCallable<URL, MalformedURLException> urlFactory = () -> new URL("http://localhost");
urlFactory.call();

This code should compile successfully because the URL constructor throws MalformedURLException. But it does not. The compiler produces the following error message:

Error:(46, 73) java: call() in <anonymous Test$> cannot implement call() in ExtendedCallable
  overridden method does not throw java.lang.Exception

But if we replace the lambda expression with an anonymous class, the code compiles successfully:

final ExtendedCallable<URL, MalformedURLException> urlFactory = new ExtendedCallable<URL, MalformedURLException>() {
            @Override
            public URL call() throws MalformedURLException {
                return new URL("http://localhost");
            }
        };
urlFactory.call();

Generic constraints

A generic type parameter with multiple bounds can be declared using the ampersand sign: <T extends A & B & C & ... Z>. This kind of generic parameter definition is rarely used, but it has some impact on lambdas in Java due to its limitations:

  1. Every bound, except the first one, must be an interface
  2. The raw version of a class with this kind of generic parameter takes into account only the first bound of the constraint
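
The second point can be checked with reflection: after type erasure, a parameter of type T is represented by the first bound only. ErasureDemo, read and erasedParameterType are hypothetical names used for this sketch.

```java
import java.lang.reflect.Method;
import java.util.function.IntSupplier;

final class ErasureDemo {
    // T has two bounds; only the first one survives in the erased signature.
    static <T extends Number & IntSupplier> int read(T value) {
        return value.getAsInt();
    }

    // Returns the erased type of the single parameter of read.
    static Class<?> erasedParameterType() {
        for (Method method : ErasureDemo.class.getDeclaredMethods()) {
            if (method.getName().equals("read")) {
                return method.getParameterTypes()[0];
            }
        }
        throw new IllegalStateException("method read not found");
    }

    public static void main(String[] args) {
        System.out.println(erasedParameterType());
    }
}
```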


The second limitation leads to different behavior of the Java compiler at compile time and of the JVM at runtime, when linkage of the lambda expression occurs. This behavior can be reproduced using the following code:

final class MutableInteger extends Number implements IntSupplier, IntConsumer{ //mutable container of int value
        private int value;

        public MutableInteger(final int v){
            value = v;
        }

        @Override
        public int intValue() {
            return value;
        }

        @Override
        public long longValue() {
            return value;
        }

        @Override
        public float floatValue() {
            return value;
        }

        @Override
        public double doubleValue() {
            return value;
        }

        @Override
        public int getAsInt() {
            return intValue();
        }

        @Override
        public void accept(final int value) {
            this.value = value;
        }
}

static <T extends Number & IntSupplier> OptionalInt findMinValue(final Collection<T> values){
  return values.stream().mapToInt(IntSupplier::getAsInt).min();
}

final List<MutableInteger> values = Arrays.asList(new MutableInteger(10), new MutableInteger(20));
final int mv = findMinValue(values).orElse(Integer.MIN_VALUE);
System.out.println(mv);

This code is absolutely correct and is compiled successfully by the Java compiler. The class MutableInteger satisfies the multiple bounds of the generic parameter T:

  1. MutableInteger inherits from Number
  2. MutableInteger implements IntSupplier


But this code will throw an exception at runtime:

java.lang.BootstrapMethodError: call site initialization exception

    at java.lang.invoke.CallSite.makeSite(CallSite.java:341)
    at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
    at java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
    at Test.minValue(Test.java:77)
Caused by: java.lang.invoke.LambdaConversionException: Invalid receiver type class java.lang.Number; not a subtype of implementation type interface java.util.function.IntSupplier
    at java.lang.invoke.AbstractValidatingLambdaMetafactory.validateMetafactoryArgs(AbstractValidatingLambdaMetafactory.java:233)
    at java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:303)
    at java.lang.invoke.CallSite.makeSite(CallSite.java:302)
    ... 26 more

That happens because the pipeline of the Java Stream captures only the raw type, which is the Number class. The Number class itself doesn't implement the IntSupplier interface. This issue can be fixed by explicitly declaring the parameter type in a separate method used as a method reference:

private static int getInt(final IntSupplier i){
  return i.getAsInt();
}

private static <T extends Number & IntSupplier> OptionalInt findMinValue(final Collection<T> values){
  return values.stream().mapToInt(UtilsTest::getInt).min();
}

Handling of multiple bounds in conjunction with lambdas is not consistent between compile time and runtime in Java. In our opinion, this is a bug in the Java compiler: it should correctly identify the signature of the call site when a generic parameter is defined with multiple bounds.
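
Another workaround that, to our understanding, avoids the problem is to replace the method reference with an explicit lambda, so that the cast to IntSupplier is compiled into the lambda body. The sketch below is an assumption-based illustration, not a verified fix for every compiler version: BoundedLambdaDemo and IntBox are hypothetical names, and IntBox is a reduced stand-in for MutableInteger.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.OptionalInt;
import java.util.function.IntSupplier;

final class BoundedLambdaDemo {
    // Reduced, hypothetical stand-in for MutableInteger.
    static final class IntBox extends Number implements IntSupplier {
        private final int value;

        IntBox(int value) {
            this.value = value;
        }

        @Override public int intValue() { return value; }
        @Override public long longValue() { return value; }
        @Override public float floatValue() { return value; }
        @Override public double doubleValue() { return value; }
        @Override public int getAsInt() { return value; }
    }

    static <T extends Number & IntSupplier> OptionalInt findMinValue(Collection<T> values) {
        // Explicit lambda instead of IntSupplier::getAsInt.
        return values.stream().mapToInt(v -> v.getAsInt()).min();
    }

    public static void main(String[] args) {
        System.out.println(findMinValue(Arrays.asList(new IntBox(10), new IntBox(20))));
    }
}
```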

Conclusion

We hope that this post helps to demystify the internal implementation of lambda expressions in Java and shows how this knowledge may be used. We also expect that the inconsistent behavior of the Java compiler and the JVM will be resolved by Oracle developers in the future. Lambda expressions in Java are a powerful language feature, but they lack some features present in other languages. A comparison between lambdas in different languages will be presented in one of the next posts.

Copyright © 2015-2017 Bytex Solutions