2014-05-05

Using JSON in your tests to write less testing code

Introduction

Tests are often repetitive; for instance, you have several implementations of a single interface and you want to test not the interface itself (you will have done that already), but what each implementation of this interface "produces".

I recently created grappa, a fork of parboiled (1) aimed at continuing its development. And I stumbled upon this in its tests:

        test(parser.Clause(), "1+5").hasNoErrors()
            .hasParseTree("" +
                "[Clause] '1+5'\n" +
                "  [digit] '1'\n" +
                "  [Operator] '+'\n" +
                "    ['+'] '+'\n" +
                "  [digit] '5'\n" +
                "  [EOI]\n");

Uh. OK, so what this is supposed to do is test a generated parse tree... You can see that this is very far from being sustainable. For one thing, if you change the string output for some reason, you're basically f*ed.

My plan was therefore to replace this with a more sustainable approach which did not depend on string outputs... And as I am a big fan of JSON, I decided to go JSON.

This is the result of my efforts so far; it is not perfect yet (I will have more things to test; parse inputs will also find their way into JSON or even separate files in the future), but it can give you a good indication of how to factorize your test code!

Software used

There are quite a few pieces of software involved here:

first and foremost, Guava; if you don't know this library, you should really have a look at it, and use it;
Jackson for JSON processing; if you don't know which library to choose for your JSON tasks, use this one;
TestNG as a basic test framework;
AssertJ as an assertion library.

How the parse tree is implemented

A parse tree is created if you annotate your parser class appropriately (by default, a parser class will not create a parse tree for obvious performance reasons); when created, it is a set of Nodes with one node at the root and children nodes. Among other information, a node contains:

the name of the parsing rule having created this node;
the start and end index of the match if the node has matched at all;
an ordered list of its children.

Testing a parse tree therefore requires three things:

a parser class;
assertions for one single node;
assertions for a whole tree.

All of this is done below... The code is only written once; after that you just have to write JSON files to test!

Assertions

The code for the two assertions is below; a later section then explains how it all works.

Node assertion

Here is the code for the node assertion; it extends AssertJ's AbstractAssert:

public final class NodeAssert<V>
    extends AbstractAssert<NodeAssert<V>, Node<V>>
{
    private final InputBuffer buffer;

    NodeAssert(final Node<V> actual, final InputBuffer buffer)
    {   
        super(actual, NodeAssert.class);
        this.buffer = buffer;
    }   

    private NodeAssert<V> doHasLabel(final String expectedLabel)
    {   
        final String actualLabel = actual.getLabel();
        assertThat(actualLabel).overridingErrorMessage(
            "node's label is null! I didn't expect it to be" 
        ).isNotNull();
        assertThat(actualLabel).overridingErrorMessage(
            "node's label is not what was expected!\n"
            + "Expected: '%s'\nActual  : '%s'\n", 
            expectedLabel, actualLabel
        ).isEqualTo(expectedLabel);
        return this;
    }   

    NodeAssert<V> hasLabel(@Nonnull final Optional<String> label)
    {   
        return label.isPresent() ? doHasLabel(label.get()) : this;
    }   

    private NodeAssert<V> doHasMatch(final String expectedMatch)
    {   
        final String actualMatch = buffer.extract(
            actual.getStartIndex(), actual.getEndIndex());
        assertThat(actualMatch).overridingErrorMessage(
            "rule did not match what was expected!\n"
            + "Expected: -->%s<--\nActual  : -->%s<--\n",
            expectedMatch, actualMatch
        ).isEqualTo(expectedMatch);
        return this;
    }   

    NodeAssert<V> hasMatch(@Nonnull final Optional<String> match)
    {   
        return match.isPresent() ? doHasMatch(match.get()) : this;
    }   
}

Parse tree assertion

More Guava usage in there; heavy Guava usage, in fact, and this is also where Jackson is used:

public abstract class ParseTreeAssert<V>
{
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private static final String RESOURCE_PREFIX = "/parseTrees/";

    public abstract void verify(
        @Nonnull final Optional<Node<V>> node);

    public static final class Builder<V>
    {
        private String label;
        private String match;
        private final List<Builder<V>> children 
            = Lists.newArrayList();

        Builder()
        {
        }

        @JsonProperty("label")
        public Builder<V> withLabel(@Nonnull final String label)
        {
            this.label = Preconditions.checkNotNull(label);
            return this;
        }

        @JsonProperty("children")
        public Builder<V> withChildren(
            @Nonnull final List<Builder<V>> list)
        {
            Preconditions.checkNotNull(list);
            children.addAll(list);
            return this;
        }

        @JsonProperty("match")
        public Builder<V> withMatch(@Nonnull final String match)
        {
            this.match = Preconditions.checkNotNull(match);
            return this;
        }

        public ParseTreeAssert<V> build(final InputBuffer buffer)
        {
            return new WithNode<V>(this, buffer);
        }
    }

    @ParametersAreNonnullByDefault
    private static final class WithNode<E>
        extends ParseTreeAssert<E>
    {
        private final InputBuffer buffer;
        private final Optional<String> label;
        private final Optional<String> match;
        private final List<ParseTreeAssert<E>> children;

        private WithNode(final Builder<E> builder,
            final InputBuffer buffer)
        {
            this.buffer = buffer;
            label = Optional.fromNullable(builder.label);
            match = Optional.fromNullable(builder.match);

            final ImmutableList.Builder<ParseTreeAssert<E>>
                listBuilder = ImmutableList.builder();

            for (final Builder<E> element: builder.children)
                listBuilder.add(element.build(buffer));

            children = listBuilder.build();
        }

        @Override
        public void verify(final Optional<Node<E>> node)
        {
            assertThat(node.isPresent()).overridingErrorMessage(
                "expected to have a node, but I didn't!"
            ).isTrue();

            final NodeAssert<E> nodeAssert
                = new NodeAssert<E>(node.get(), buffer);
            nodeAssert.hasLabel(label).hasMatch(match);
            verifyChildren(node.get());
        }

        private void verifyChildren(final Node<E> node)
        {
            final List<Node<E>> nodeList
                = node.getChildren();
            final int size
                = Math.max(children.size(), nodeList.size());

            ParseTreeAssert<E> childDescriptor;
            Optional<Node<E>> childNode;

            for (int i = 0; i < size; i++) {
                childDescriptor = Optional
                    .fromNullable(Iterables.get(children, i, null))
                    .or(new NoNode<E>(i));
                childNode = Optional
                    .fromNullable(Iterables.get(nodeList, i, null));
                childDescriptor.verify(childNode);
            }
        }
    }

    private static final class NoNode<E>
        extends ParseTreeAssert<E>
    {
        private final int index;

        private NoNode(final int index)
        {
            this.index = index;
        }

        @Override
        public void verify(@Nonnull final Optional<Node<E>> node)
        {
            fail("did not expect a node at index " + index);
        }
    }

    public static <E> ParseTreeAssert<E> read(
        final String resourceName, final InputBuffer buffer)
        throws IOException
    {
        final String path = RESOURCE_PREFIX + resourceName;
        final TypeReference<Builder<E>> typeRef
            = new TypeReference<Builder<E>>() {};

        final Closer closer = Closer.create();
        final InputStream in;
        final Builder<E> builder;

        try {
            in = closer.register(ParseTreeAssert.class
                .getResourceAsStream(path));
            if (in == null)
                throw new IOException("resource " + path 
                    + " not found");
            builder = MAPPER.readValue(in, typeRef);
            return builder.build(buffer);
        } finally {
            closer.close();
        }
    }
}

OK, so, how does it work?

First of all, you can see that the core assertions in NodeAssert take Optionals as arguments (reminds you of Java 8? Yes, Java 8 stole it from Guava). This allows you to test only what you actually want to test for a given parse tree. For instance, you may want to test the match only and not the label.

You will also have noticed that ParseTreeAssert is abstract and has two implementations, one where a node is expected (WithNode) and one where a node is not expected (NoNode); and also that its .verify() method takes an Optional as an argument. This allows you to test the following scenarios:

you expected to see n matchers but you only have m where m is less than n: in this case, node.isPresent() will return false, and the failure is detected;
the reverse: you expected to see fewer nodes than the tree actually contains. In this case the ParseTreeAssert implementation is a NoNode, which fails immediately, reporting that it did not expect a node to exist at this point.
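
To make these two scenarios more concrete, here is a simplified sketch of the same index walk. It is not the code above: it compares plain lists of labels instead of Nodes, uses java.util.Optional instead of Guava's Optional, and collects failure messages instead of failing on the first one. The class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ChildrenWalkSketch
{
    /* Element at index i, or an absent Optional if the list is too short */
    static Optional<String> at(final List<String> list, final int i)
    {
        return i < list.size()
            ? Optional.of(list.get(i))
            : Optional.<String>empty();
    }

    /* Walk both lists up to the size of the longer one */
    static List<String> compare(final List<String> expected,
        final List<String> actual)
    {
        final List<String> failures = new ArrayList<String>();
        final int size = Math.max(expected.size(), actual.size());

        for (int i = 0; i < size; i++) {
            final Optional<String> descriptor = at(expected, i);
            final Optional<String> node = at(actual, i);
            if (!descriptor.isPresent())
                // the "NoNode" case: the tree has more nodes than expected
                failures.add("did not expect a node at index " + i);
            else if (!node.isPresent())
                // the "WithNode" case with an absent node
                failures.add("expected to have a node at index " + i);
            else if (!descriptor.get().equals(node.get()))
                failures.add("label mismatch at index " + i);
        }
        return failures;
    }

    public static void main(final String... args)
    {
        System.out.println(compare(
            Arrays.asList("digit", "operator", "digit", "EOI"),
            Arrays.asList("digit", "operator")));
        System.out.println(compare(
            Arrays.asList("digit"),
            Arrays.asList("digit", "EOI")));
    }
}
```

Walking up to the size of the longer list is what makes both kinds of mismatch visible with a single loop.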

Finally, the static .read() method in ParseTreeAssert is where the "real magic" happens: it reads your JSON file which contains the tree you actually expect!

The test classes

Two things are needed: a base parser for testers to extend, and a core test file to extend when you want to run actual tests.

The basic test parser to extend

This one is quite simple; there is a "trick" however:

@BuildParseTree
public abstract class TestParser
    extends BaseParser<Object>
{
    public abstract Rule mainRule();
}

The trick here is the @BuildParseTree annotation; OK, its name is self-explanatory, but if you do not annotate a parser class with it, the parse tree will not be built and you won't be able to test it!

The core test class...

... Is abstract. Note that there is yet another trick in there:

@Test
public abstract class ParseTreeTest
{
    private final Node<Object> tree;
    private final ParseTreeAssert<Object> treeAssert;

    protected ParseTreeTest(final Class<? extends TestParser> c,
        final String resourceName, final String input)
        throws IOException
    {
        final TestParser parser = Parboiled.createParser(c);
        final ParseRunner<Object> runner
            = new ReportingParseRunner<Object>(parser.mainRule());
        final ParsingResult<Object> result = runner.run(input);
        tree = result.parseTreeRoot;
        treeAssert = ParseTreeAssert.read(resourceName, 
            result.inputBuffer);
    }

    @Test
    public final void treeIsWhatIsExpected()
    {
        treeAssert.verify(Optional.of(tree));
    }
}

The trick is the @Test annotation at the class level; if you do not do that, extending classes will not run @Test methods inherited from the abstract class... And the only test method is in the abstract class (and what is more, it is final).

You will note the use of ParseTreeAssert.read() which will read from the JSON file (a resource in the classpath) and deserialize, all in one.

And now, finally, how to use it!

And here is what becomes of the test mentioned in the introduction. First, the test class:

public final class SplitParseTreeTest
    extends ParseTreeTest
{
    static class SplitParser
        extends TestParser
    {
        final Primitives primitives 
            = Parboiled.createParser(Primitives.class);

        public Rule clause() {
            return sequence(
                digit(),
                primitives.operator(),
                primitives.digit(),
                EOI
            );
        }

        @Override
        @DontLabel
        public Rule mainRule()
        {
            return clause();
        }
    }

    @BuildParseTree
    static class Primitives
        extends BaseParser<Object>
    {

        public Rule operator()
        {
            return firstOf('+', '-');
        }
    }

    public SplitParseTreeTest()
        throws IOException
    {
        super(SplitParser.class, "split.json", "1+5");
    }
}

The parser is coded within the test class here; you could create it outside of it, of course. The constructor of the abstract class is called with all necessary arguments: the parser class, the JSON file to read and the input to test. As to the JSON file, here it is:

{
    "label": "clause",
    "match": "1+5",
    "children": [
        {
            "label": "digit",
            "match": "1"
        },
        {
            "label": "operator",
            "match": "+",
            "children": [
                {
                    "label": "'+'",
                    "match": "+"
                }
            ]
        },
        {
            "label": "digit",
            "match": "5"
        },
        {
            "label": "EOI"
        }
    ]
}

And that's it! There are nearly 50 other tests to convert from the "string based testing" into that, but all I have to do now is write the JSON files and very simple test classes!

But this is not over yet...

As I said, this is the beginning of my effort; more can be done and will be done. For instance, handling parser inputs differently, and adding more assertions; the latter is quite simple:

add the field to ParseTreeAssert.Builder and ParseTreeAssert.WithNode,
update NodeAssert,
update the relevant JSON files.

Much easier than modifying strings!

That's all folks...

2014-03-22

Strings, characters, bytes and character sets: clearing up the confusion

Introduction

On all forums dedicated to Java development, a lot of problems reported by inexperienced Java developers, and even by seasoned ones in some cases, relate to text handling: text being corrupted in files, strings not matching the expected content, displays unable to render some text or rendering it incorrectly, etc. You name it.

All of these problems stem from a misunderstanding of the fundamental concepts behind what "text" actually is: how you see it on your display, how Java represents it, and how you store it in files/send it over the network (or read it from files/receive it over the network).

This page will try to sum up what you need to know in order to avoid making errors, or at least narrow the problem space.

Some definitions

Unicode and your display

Unicode is quite a complex piece of machinery, but ultimately, you can view it as a dictionary of graphemes covering nearly all human written language. Each grapheme is a combination of one, or more, code points (such as U+0013). And not all graphemes are letters, spaces or punctuation marks: Unicode also defines code points for smileys, for instance.

Unicode is not set in stone. Different revisions are regularly published, which add new graphemes and therefore new code points. Now, how does this relate to your display? Well, your display may, or may not, have the ability to display this or that grapheme.

What Unicode also defines is character encodings. A character encoding is a way for computing devices to translate a Unicode code point into a sequence of one or more bytes. UTF-8 is such a character encoding (UTF, in UTF-8, means Unicode Transformation Format).

byte and char

A byte is the basic computer storage unit: 8 bits. The fact that byte is signed is irrelevant to this discussion. A byte is 8 bits and that is all there is to it. You don't need to know more ;)

On to char. And first things first: a char is NOT two bytes. It is only two bytes "storage wise". A char is an individual code unit in the UTF-16 character encoding.

At run time, all Strings in Java are sequences of chars. This includes string literals you have in your source files, whatever the character encoding of your source files. Which nicely leads to...

Charset, encoding and decoding

When you want to write text, whether that be to a file or a socket, what you initially have is some char sequence. But what you will effectively write are bytes. Similarly, when you read text, you don't read chars, you read bytes.

Which means you need two processes:

turning a sequence of chars into a sequence of bytes: this process is known as encoding;
turning a sequence of bytes into a sequence of chars: this process is known as decoding.

These two processes depend on what charset you use; a charset is essentially a byte<->char mapping, but not a one-to-one mapping: one char can give multiple bytes, and some byte sequences can be decoded into multiple chars.
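
Here is a small sketch of both processes, using only the JDK. The class and method names are made up for the illustration; the test character is written as a Unicode escape so that the encoding of the source file does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeDecodeDemo
{
    /* Encoding: turn chars into bytes using the given charset */
    static byte[] encode(final String s, final Charset charset)
    {
        return s.getBytes(charset);
    }

    /* Decoding: turn bytes back into chars using the given charset */
    static String decode(final byte[] bytes, final Charset charset)
    {
        return new String(bytes, charset);
    }

    public static void main(final String... args)
    {
        // U+00E9, "e" with an acute accent: one single char
        final String s = "\u00e9";

        // One char can give multiple bytes: 2 in UTF-8, 1 in ISO-8859-1
        final byte[] utf8 = encode(s, StandardCharsets.UTF_8);
        final byte[] latin1 = encode(s, StandardCharsets.ISO_8859_1);
        System.out.printf("UTF-8: %d byte(s); ISO-8859-1: %d byte(s)%n",
            utf8.length, latin1.length);

        // Same charset on both sides: the round trip is lossless
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(s));

        // Different charset when decoding: each byte becomes one char,
        // so we get two chars and not the one we started with
        System.out.println(
            decode(utf8, StandardCharsets.ISO_8859_1).length());

        // Some byte sequences decode into multiple chars: these four
        // bytes are U+1F600 in UTF-8, which needs two chars in UTF-16
        final byte[] smiley = {
            (byte) 0xf0, (byte) 0x9f, (byte) 0x98, (byte) 0x80
        };
        System.out.println(decode(smiley, StandardCharsets.UTF_8).length());
    }
}
```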

Fundamental Java classes

In Java, you have one class for each of a charset, an encoder and a decoder:

a charset is represented by the Charset class;
an encoder is represented by a CharsetEncoder;
a decoder is represented by a CharsetDecoder.

You also have a static method for encoding one single code point into its equivalent UTF-16 char sequence: Character.toChars().
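
A quick sketch of what Character.toChars() gives you, and of why a char is not the same thing as a code point. The class and method names are made up for the illustration.

```java
final class CodePointDemo
{
    /* Encode one code point into its UTF-16 char sequence */
    static char[] toUtf16(final int codePoint)
    {
        return Character.toChars(codePoint);
    }

    public static void main(final String... args)
    {
        // U+0041 ("A") fits in one single char
        System.out.println(toUtf16(0x41).length);

        // U+1F600 (a smiley) does not: it needs a surrogate pair,
        // that is, two chars (0xD83D and 0xDE00)
        final char[] smiley = toUtf16(0x1F600);
        System.out.printf("%d chars: %04X %04X%n", smiley.length,
            (int) smiley[0], (int) smiley[1]);

        // .length() counts chars; .codePointCount() counts code points
        final String s = new String(smiley);
        System.out.println(s.length());
        System.out.println(s.codePointCount(0, s.length()));
    }
}
```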

`InputStream`/`OutputStream` vs `Reader`/`Writer`

Again this byte vs char analogy here. If you want to read bytes, use an InputStream; if you want to read chars, you will use a Reader. When not reading from a char source, a Reader will try its best to decode the input bytes into chars and what you will get is the result of the Reader's efforts.

In the same vein, for writing bytes, use an OutputStream, and use a Writer if you want to write chars. If your output is a byte sink, the Writer will encode the chars you submit into bytes and write those bytes.

We can make a crude drawing of what happens when you send text data over the network to a peer, for instance:

        SENDER            |          RECEIVER
        encodes           |           decodes
char[] --------> byte[] ----> byte[] --------> char[]

Now, in this crude drawing, imagine that the sender encodes using one charset but the receiver decodes using another...

Avoiding such a case is simple enough: always specify the charset when using a Reader or a Writer. Otherwise, you will have problems. Eventually. You may not have had problems until now, but you will. This is a guarantee.
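
The crude drawing above can be played out in memory. In this sketch (class and method names are made up), the "network" is just a byte array; the sender uses a Writer, the receiver uses a Reader, and both take the charset as an argument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderWriterDemo
{
    /* The sender: a Writer encodes chars into bytes */
    static byte[] send(final String text, final Charset charset)
        throws IOException
    {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (final Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    /* The receiver: a Reader decodes bytes into chars */
    static String receive(final byte[] bytes, final Charset charset)
        throws IOException
    {
        final StringBuilder sb = new StringBuilder();
        final char[] buf = new char[1024];
        try (final Reader reader = new InputStreamReader(
            new ByteArrayInputStream(bytes), charset)) {
            int read;
            while ((read = reader.read(buf)) != -1)
                sb.append(buf, 0, read);
        }
        return sb.toString();
    }

    public static void main(final String... args)
        throws IOException
    {
        final String text = "M\u00e9m\u00e9";
        final byte[] wire = send(text, StandardCharsets.UTF_8);

        // Same charset on both ends: the text survives
        System.out.println(
            receive(wire, StandardCharsets.UTF_8).equals(text));

        // The receiver decodes with another charset: the text is mangled
        System.out.println(
            receive(wire, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```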

Illustration: what not to do

Relying on the default charset

By far the most common thing that can go wrong is code like this:

    // DON'T DO THAT -- no charset is specified
    new FileReader("somefile.txt"); // or new FileWriter()
    // DON'T DO THAT -- no charset is specified
    someString.getBytes();

The charset to use is not specified here, which means the default charset is used; and this default charset depends on your JRE/OS environment.

Say you write a file using an implementation using ISO-8859-15 as the default charset; you send this file to a peer whose JRE/OS combination uses UTF-8. Your peer won't be able to read the file correctly...

Avoiding that is simple. Apply the rule above. And if you use Java 7 or greater, use Files instead. As in:

    // This method requires that you specify the charset...
    Files.newBufferedReader(Paths.get("somefile.txt"), 
        StandardCharsets.UTF_8);
    someString.getBytes(StandardCharsets.UTF_8);

Using `String`s for binary data

The mapping of chars to bytes means that only certain sequences of bytes will be generated by an encoder; and similarly, only these byte sequences will be readable as chars.

The following program shows that. It creates a byte array which cannot be fully decoded in UTF-8; it also demonstrates the default behaviour of String in the event of an unmappable sequence:

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class Main
{
    /*
     * This byte is unmappable by the UTF-8 encoding
     */
    private static final byte POISON = (byte) 0xfc;

    public static void main(final String... args)
    {
        final Charset charset = StandardCharsets.UTF_8;

        /*
         * We create three decoders. Their behaviour only differs 
         * by the way they treat unmappable byte sequences: the 
         * first one will ignore errors, the second one will 
         * replace the unmappable bytes with a default value, and
         * the third one will throw an exception.
         */
        final CharsetDecoder lossy = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.IGNORE);

        final CharsetDecoder lenient = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE);

        final CharsetDecoder assTight = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT);

        /*
         * The string we are testing against
         */
        final String original = "Mémé dans les orties";

        /*
         * Encode the test string into a byte array; then allocate a
         * buffer whose length is that of the encoded array plus 1. 
         * At the end of this buffer, add our poison.
         */
        final byte[] encoded = original.getBytes(charset);
        final ByteBuffer buf
            = ByteBuffer.allocate(encoded.length + 1);
        buf.put(encoded).put(POISON);

        /*
         * For reference, let us print the length of our poisoned 
         * array.
         */
        System.out.printf("Original byte array has length %d\n",
            buf.array().length);

        /*
         * Now, attempt to build a string again from our poisoned 
         * byte input. First by invoking the appropriate String 
         * constructor (note that we specify the charset), then 
         * by trying each of the three decoders we have
         * initialized above.
         */

        System.out.println("--- DECODING TESTS ---");
        
        final String decoded = new String(buf.array(), charset);
        System.out.printf("String constructor: %s\n", decoded);
        
        tryDecoder(lossy, "lossy", buf);
        tryDecoder(lenient, "lenient", buf);
        tryDecoder(assTight, "assTight", buf);
        
        System.out.println("--- END DECODING TESTS ---");

        /*
         * Now try and regenerate our original byte array. 
         * And weep.
         */
        System.out.printf("Reencoded byte array length: %d\n",
            decoded.getBytes(charset).length);

    }

    private static void tryDecoder(final CharsetDecoder decoder,
        final String name, final ByteBuffer buf)
    {
        buf.rewind();
        try {
            System.out.printf("%s decoder: %s\n", name, 
                decoder.decode(buf));
        } catch (CharacterCodingException e) {
            System.out.printf("%s FAILED! Exception follows...\n",
                name);
            e.printStackTrace(System.out);
        }
    }
}

And the output is...

Original byte array has length 23
--- DECODING TESTS ---
String constructor: Mémé dans les orties�
lossy decoder: Mémé dans les orties
lenient decoder: Mémé dans les orties�
assTight FAILED! Exception follows...
java.nio.charset.MalformedInputException: Input length = 1
 at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
 at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816)
 at com.github.fge.Main.tryDecoder(Main.java:87)
 at com.github.fge.Main.main(Main.java:70)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
--- END DECODING TESTS ---
Reencoded byte array length: 25

As you can see, the default behaviour of the String constructor is to replace malformed byte sequences with a replacement character (U+FFFD, displayed as a big, fat question mark); you cannot change that default behaviour, but what you can do, as this program also illustrates, is attempt the decoding operation yourself with a CharsetDecoder.

That's all folks...

2014-03-17

Working with the Java 7 file API: recursive copy and deletion

Introduction

In one of my previous posts, I compared the file API in Java 6 and the new file API in Java 7.

You will have noticed, if you have been curious enough to read the Javadoc and try it, that there are still two convenience methods missing from the new API (and they were not in the old API either): recursive copy and recursive deletion of a directory.

In this post, I will show an implementation of both. The basis for both of these is to use the Files.walkFileTree() method. Copying and deleting is therefore "only" a matter of implementing the FileVisitor interface.

There are limitations to both; refer to the end of this post for more details.

Recursive copy

Here is the code of a FileVisitor for recursive copying:

public final class CopyFileVisitor
    implements FileVisitor<Path>
{
    private final Path srcdir;
    private final Path dstdir;

    public CopyFileVisitor(final Path srcdir, final Path dstdir)
    {
        this.srcdir = srcdir.toAbsolutePath();
        this.dstdir = dstdir.toAbsolutePath();
    }

    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
        final BasicFileAttributes attrs)
        throws IOException
    {
        Files.createDirectories(toDestination(dir));
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
        final BasicFileAttributes attrs)
        throws IOException
    {
        Files.copy(file, toDestination(file));
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(final Path file,
        final IOException exc)
        throws IOException
    {
        throw exc;
    }

    @Override
    public FileVisitResult postVisitDirectory(final Path dir,
        final IOException exc)
        throws IOException
    {
        if (exc != null)
            throw exc;
        return FileVisitResult.CONTINUE;
    }

    private Path toDestination(final Path victim)
    {
        final Path tmp = victim.toAbsolutePath();
        final Path rel = srcdir.relativize(tmp);
        return dstdir.resolve(rel.toString());
    }
}

In order to use it, you would then do:

final Path srcdir = Paths.get("/the/source/dir");
final Path dstdir = Paths.get("/the/destination/dir");
Files.walkFileTree(srcdir, new CopyFileVisitor(srcdir, dstdir));

Recursive deletion

Here is the code of a FileVisitor for recursive deletion:

public final class DeletionFileVisitor
    implements FileVisitor<Path>
{
    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
        final BasicFileAttributes attrs)
        throws IOException
    {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
        final BasicFileAttributes attrs)
        throws IOException
    {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(final Path file,
        final IOException exc)
        throws IOException
    {
        throw exc;
    }

    @Override
    public FileVisitResult postVisitDirectory(final Path dir,
        final IOException exc)
        throws IOException
    {
        if (exc != null)
            throw exc;
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
    }
}

To use it:

final Path victim = Paths.get("/directory/to/delete");

Files.walkFileTree(victim, new DeletionFileVisitor());
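
If you want to try this without risking a real directory, here is a sketch which builds a small tree in a temporary directory and then deletes it. It is not the visitor above: for brevity it extends the JDK's SimpleFileVisitor, which provides default implementations of all four methods, and only overrides the two methods which actually delete something. The class, method and file names are made up.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class DeleteDemo
{
    static void deleteRecursively(final Path victim)
        throws IOException
    {
        Files.walkFileTree(victim, new SimpleFileVisitor<Path>()
        {
            @Override
            public FileVisitResult visitFile(final Path file,
                final BasicFileAttributes attrs)
                throws IOException
            {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(final Path dir,
                final IOException exc)
                throws IOException
            {
                if (exc != null)
                    throw exc;
                // All entries have been visited: the directory is empty
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /* Build a small tree in a temporary directory; return its root */
    static Path createSampleTree()
        throws IOException
    {
        final Path root = Files.createTempDirectory("deletion-demo");
        final Path sub = Files.createDirectories(root.resolve("a/b"));
        Files.createFile(root.resolve("top.txt"));
        Files.createFile(sub.resolve("nested.txt"));
        return root;
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path root = createSampleTree();
        deleteRecursively(root);
        System.out.println("still there? " + Files.exists(root));
    }
}
```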

Limitations

The implementation of recursive copy is limited to paths on the same filesystem. Indeed, you cannot .resolve() a path issued from another filesystem... More on that in another post.

The recursive deletion will stop at the first element (file or directory) which fails to be deleted.

That's all folks...

2014-03-14

Working with files: Java 6 versus Java 7

Introduction

When searching for examples to manipulate files on the net, most examples found use either of:

Java 6's old file API,
a utility library (Apache commons mostly).

But Java 8 is out now, and it seems people still haven't gotten to grips with Java 7's new file API... This post aims to do two things:

describe some of the advantages of the new API;
give examples of Java 6 code and the equivalent (better!) Java 7 code.

Hopefully, after reading this, you will ditch the old API! Which you should, really.

Part 1: advantages of the new API

Meaningful exceptions

Oh boy is that missing from the old API.

Basically, any filesystem-level error (permission denied, file does not exist etc) with the old API would throw FileNotFoundException. Not informative at all. Making sense out of this exception basically requires that you dig into the error message.

With the new API, that changes: you have FileSystemException. And its subclasses have equally meaningful names: NoSuchFileException, AccessDeniedException, NotDirectoryException etc.

Common filesystem operations now throw exceptions

A very common bug with Java 6 code (and which external libraries have struggled to work around) is to not check the return values of many methods on File objects. Examples of such methods are file.delete(), file.createNewFile(), file.renameTo(), file.mkdirs() etc.

Not anymore. For instance, Files.delete() throws an exception on failure; and the exception thrown will be meaningful! You will know whether, for instance, you attempted to delete a non empty directory.
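
Here is a small sketch of the difference, run against a temporary directory. The class and method names are made up; also note that the Javadoc lists DirectoryNotEmptyException as an "optional specific exception", so I am assuming the default filesystem of a common platform here.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class DeleteFailureDemo
{
    /* Try to delete a path and describe what happened */
    static String tryDelete(final Path path)
    {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "no such file";
        } catch (DirectoryNotEmptyException e) {
            return "directory not empty";
        } catch (IOException e) {
            return "other I/O error: " + e;
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("delete-demo");
        final Path file = Files.createFile(dir.resolve("file.txt"));

        // Old API: a boolean, and no hint as to why it failed
        System.out.println(dir.toFile().delete());

        // New API: the exception class tells you what went wrong
        System.out.println(tryDelete(dir.resolve("missing")));
        System.out.println(tryDelete(dir));

        // Delete the file first, and the directory can go too
        System.out.println(tryDelete(file));
        System.out.println(tryDelete(dir));
    }
}
```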

Useful shortcut methods

Just an example: Files.copy()! Several of these shortcut methods will be shown in the examples below.

Less room for errors when dealing with text data

Have you ever been hit by code using a Reader or a Writer without specifying the encoding to use when reading/writing to files?

Well, the good news is that Files methods opening readers or writers (resp. Files.newBufferedReader() and Files.newBufferedWriter()) require you to specify the encoding!

This is also the case of the Files.readAllLines() method.
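
As a quick sketch of these methods working together (class and method names are made up, and the file is a temporary one): write lines with Files.newBufferedWriter(), then read them back with Files.readAllLines(), specifying the same charset both times.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class ReadLinesDemo
{
    /* Write the lines to a temporary file, then read them back */
    static List<String> roundTrip(final List<String> lines)
        throws IOException
    {
        final Path path = Files.createTempFile("lines", ".txt");
        try {
            try (final BufferedWriter writer = Files.newBufferedWriter(
                path, StandardCharsets.UTF_8)) {
                for (final String line: lines) {
                    writer.write(line);
                    writer.newLine();
                }
            }
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } finally {
            Files.delete(path);
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        final List<String> lines
            = Arrays.asList("M\u00e9m\u00e9", "dans les orties");
        System.out.println(roundTrip(lines).equals(lines));
    }
}
```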

Advanced usage: filesystem implementations

Here, a filesystem does not mean only an on-disk format of storing files; provided you want to implement it, or find an existing implementation, it can be anything you like: an FTP server, a CIFS filesystem, etc.

Or even a ZIP file (therefore, jars, wars, ears etc as well); in fact, Oracle provides a filesystem implementation for these.
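
To give an idea of what this looks like in practice, here is a sketch using the ZIP filesystem shipped with the JDK, opened through a "jar:" URI with the "create" property. The class name, method name and entry name are made up; the point is that once the filesystem is open, the entry is a Path like any other and the usual Files methods apply.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

final class ZipFsDemo
{
    /*
     * Create a zip at the given (not yet existing) path, write one
     * entry into it, then reopen the zip and read the entry back.
     */
    static String writeThenRead(final Path zip, final String content)
        throws IOException
    {
        final URI uri = URI.create("jar:" + zip.toUri());
        final Map<String, String> create
            = Collections.singletonMap("create", "true");

        try (final FileSystem zipfs
            = FileSystems.newFileSystem(uri, create)) {
            final Path entry = zipfs.getPath("/hello.txt");
            Files.write(entry, content.getBytes(StandardCharsets.UTF_8));
        }

        try (final FileSystem zipfs = FileSystems.newFileSystem(uri,
            Collections.<String, String>emptyMap())) {
            final Path entry = zipfs.getPath("/hello.txt");
            return new String(Files.readAllBytes(entry),
                StandardCharsets.UTF_8);
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("zipfs-demo");
        final Path zip = dir.resolve("demo.zip");
        System.out.println(writeThenRead(zip, "hello from a zip"));
        Files.delete(zip);
        Files.delete(dir);
    }
}
```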

Part 2: sample usages

Abstract path names

In Java 6, it was File. In Java 7, you use Path.

For backwards compatibility reasons, you can convert one to the other and vice versa: file.toPath(), path.toFile().

To create a path in Java 7, use Paths.get().
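
A minimal sketch of these three calls (class and method names are made up; the path does not need to exist, since a Path, like a File, is only an abstract name):

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathConversionDemo
{
    /* Convert a Path to a File and back; true if we get the same path */
    static boolean roundTrips(final Path path)
    {
        final File file = path.toFile();
        return file.toPath().equals(path);
    }

    public static void main(final String... args)
    {
        // Paths.get() accepts several name elements and joins them with
        // the separator of the default filesystem
        final Path path = Paths.get("some", "dir", "file.txt");
        System.out.println(path.getFileName());
        System.out.println(path.getNameCount());
        System.out.println(roundTrips(path));
    }
}
```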

Operations on abstract paths

The table below lists operations on File objects, and their equivalent using Java 7's Files class. There are fundamental differences between Java 6 and Java 7 here:

these operations in Java 7 require that you create the Path object;
as mentioned above, Java 7 operations will throw a (meaningful!) exception on failure; Java 6 operations return a boolean which should be checked for, but which most people forget to check for...
for all creation operations using Java 7, you can specify file attributes; those are filesystem dependent, and specifying an attribute which is not supported by the filesystem implementation will throw an unchecked exception.

Java 6	Java 7	Differences
file.createNewFile()	Files.createFile()	See above
file.mkdir()	Files.createDirectory()	See above
file.mkdirs()	Files.createDirectories()	See above
file.exists()	Files.exists()	Java 7 supports symbolic links; you can therefore check whether the link itself exists by adding the LinkOption.NO_FOLLOW_LINKS option, regardless of whether the link target exists. On filesystems without symlink support, this option has no effect.
file.delete()	Files.delete()	Java 7 also has Files.deleteIfExists().
file.isFile(), file.isDirectory()	Files.isRegularFile(), Files.isDirectory()	Here also, symlink support makes a difference. Symlinks are followed by default; if you do not want to follow them, specify the LinkOption.NO_FOLLOW_LINKS option. Java 7 also has Files.isSymbolicLink().
file.renameTo()	Files.move()	Like Java 6, this method will fail to move a non-empty directory if the target path is not on the same filesystem (the same FileStore).
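To illustrate the boolean versus exception difference, here is a sketch comparing the first row of the table. The class CreateDemo and its two helpers are made up; FileAlreadyExistsException is a real class but documented as an "optional specific exception", so I assume the default filesystem here:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CreateDemo
{
    // Java 6 style: a boolean which is all too easy to ignore
    static String java6Create(final File file)
        throws IOException
    {
        return file.createNewFile() ? "created" : "not created";
    }

    // Java 7 style: the failure has a name
    static String java7Create(final Path path)
        throws IOException
    {
        try {
            Files.createFile(path);
            return "created";
        } catch (FileAlreadyExistsException e) {
            return "already exists";
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("create-demo");
        final Path path = dir.resolve("foo.txt");

        System.out.println(java7Create(path));          // first creation
        System.out.println(java7Create(path));          // second attempt
        System.out.println(java6Create(path.toFile())); // same, Java 6 way
    }
}
```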

Copying a file

In plain Java 6, you would have to do something like this:

public static void copyFile(final String from, final String to)
    throws IOException
{
    final byte[] buf = new byte[32768];
    final InputStream in = new FileInputStream(from);
    final OutputStream out = new FileOutputStream(to);
    int read;
    try {
        while ((read = in.read(buf)) != -1)
            out.write(buf, 0, read);
        out.flush();
    } finally {
        in.close();
        out.close();
    }        
}

But even this code is flawed, so if you are stuck with Java 6, do yourself a favour and use Guava, which has (since version 14.0) Closer:

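public static void copyFile(final String from, final String to)
    throws IOException
{
    final Closer closer = Closer.create();
    final RandomAccessFile src, dst;
    final FileChannel in, out;

    try {
        src = closer.register(new RandomAccessFile(from, "r"));
        // RandomAccessFile has no "w" mode; "rw" creates the file if needed
        dst = closer.register(new RandomAccessFile(to, "rw"));
        // "rw" does not truncate an existing file, so do it by hand
        dst.setLength(0L);
        in = closer.register(src.getChannel());
        out = closer.register(dst.getChannel());
        final long size = in.size();
        long pos = 0L;
        // transferTo() may transfer fewer bytes than requested
        while (pos < size)
            pos += in.transferTo(pos, size - pos, out);
    } catch (Throwable e) {
        // Let Closer remember the primary exception
        throw closer.rethrow(e);
    } finally {
        closer.close();
    }
}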

With Java 7, this becomes very, very simple:

public static void copyFile(final String from, final String to)
    throws IOException
{
    Files.copy(Paths.get(from), Paths.get(to));
}
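One thing to be aware of: by default, Files.copy() refuses to overwrite an existing target. The sketch below (the class CopyDemo and the overwrite parameter are my own additions for the example) shows how to opt in with StandardCopyOption.REPLACE_EXISTING:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class CopyDemo
{
    // Hypothetical variant of copyFile() with an explicit overwrite switch
    static void copyFile(final String from, final String to,
        final boolean overwrite)
        throws IOException
    {
        final Path src = Paths.get(from);
        final Path dst = Paths.get(to);

        if (overwrite)
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        else
            // Throws FileAlreadyExistsException if dst exists
            Files.copy(src, dst);
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("copy-demo");
        final Path src = Files.createFile(dir.resolve("src.txt"));
        final String dst = dir.resolve("dst.txt").toString();

        copyFile(src.toString(), dst, false); // dst does not exist yet
        copyFile(src.toString(), dst, true);  // dst exists: replace it
    }
}
```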

Opening a BufferedWriter/OutputStream to a file

In this case, the code is not much shorter for Java 7; but you do get the benefit of better exceptions.

With Java 6:

// BufferedWriter
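// (FileWriter cannot take a charset in Java 6, hence OutputStreamWriter)
new BufferedWriter(new OutputStreamWriter(new FileOutputStream(myFile), Charset.forName("UTF-8")));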
// OutputStream
new FileOutputStream(myFile);

With Java 7:

// BufferedWriter
Files.newBufferedWriter(myFile, StandardCharsets.UTF_8);
// OutputStream
Files.newOutputStream(myFile);

Note that only the simplest form of these methods is presented here. By adding options (see StandardOpenOption), you can specify, among other things, whether you want to fail if the file does not exist, or create it only if it does not already exist, or append to it.
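As a sketch of two such options (the class OpenOptionsDemo and its helpers are invented for the example; the StandardOpenOption constants are the real ones):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OpenOptionsDemo
{
    // Hypothetical helper: fails if the file already exists
    static void writeNew(final Path path, final String line)
        throws IOException
    {
        try (BufferedWriter writer = Files.newBufferedWriter(path,
            StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW,
            StandardOpenOption.WRITE)) {
            writer.write(line);
            writer.newLine();
        }
    }

    // Hypothetical helper: creates the file if needed, appends otherwise
    static void appendLine(final Path path, final String line)
        throws IOException
    {
        try (BufferedWriter writer = Files.newBufferedWriter(path,
            StandardCharsets.UTF_8, StandardOpenOption.CREATE,
            StandardOpenOption.APPEND)) {
            writer.write(line);
            writer.newLine();
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("options-demo");
        final Path path = dir.resolve("log.txt");

        writeNew(path, "first");
        appendLine(path, "second");
        System.out.println(Files.readAllLines(path, StandardCharsets.UTF_8));
    }
}
```

Without any option, these methods create the file if it does not exist and truncate it if it does.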

Listing all files in a directory

Java 6:

final File rootDir = new File(...);
for (final File file: rootDir.listFiles())
    // do something

Java 7:

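final Path rootDir = Paths.get(...);
// A DirectoryStream holds resources: close it when you are done
try (DirectoryStream<Path> stream = Files.newDirectoryStream(rootDir)) {
    for (final Path file: stream)
        // do something
}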

OK, this is not really shorter. However, the difference in behaviour alone speaks for itself:

if the path is not a directory, .listFiles() will happily return null; with Java 7, you get a meaningful exception (including the NotDirectoryException mentioned above);
as Java 7's method name says, it will be a stream; .listFiles() loads the whole list of files into memory at once, which makes it impractical if you have a very large number of files.
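Both points can be seen in this small sketch. The class ListDemo and its names() helper are made up for the illustration, and the sorting is only there because the iteration order of a DirectoryStream is not specified:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListDemo
{
    // Hypothetical helper: sorted names of the entries in a directory
    static List<String> names(final Path dir)
        throws IOException
    {
        final List<String> ret = new ArrayList<String>();

        // Entries are read as the loop advances, not all at once
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (final Path entry: stream)
                ret.add(entry.getFileName().toString());
        }

        Collections.sort(ret);
        return ret;
    }

    public static void main(final String... args)
        throws IOException
    {
        final Path dir = Files.createTempDirectory("list-demo");
        final Path file = Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.txt"));

        System.out.println(names(dir));
        // Java 6 on a regular file: no exception, just null
        System.out.println(file.toFile().listFiles());
        // Java 7 on a regular file: throws (NotDirectoryException here)
        names(file);
    }
}
```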
