July 17, 2022 05:49 pm GMT

A JDK 17 alternative to using binary files in your Java tests

What do you do when you require binary data to write a test?

Say you are developing a pure Java Git implementation. You will need the bits that make up a Git repository: blob, tree or commit objects. Or perhaps you want to understand what's in a Java class file. In this case you will need the data in a Java class.

One solution is to use regular files. If you are using Maven you put them in your src/test/resources directory. You can then access their contents using, for example, the Class.getResourceAsStream method.
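As a rough sketch of the file-based approach, a helper along these lines reads a classpath resource into a byte array. The class and method names (ResourceBytes, read) are made up for illustration. The example loads Object.class from the JDK itself only so that it runs without any file in src/test/resources; this assumes the runtime exposes JDK class files as resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original post.
final class ResourceBytes {
  private ResourceBytes() {}

  // Reads the named resource, resolved relative to the given class.
  static byte[] read(Class<?> anchor, String name) {
    try (InputStream in = anchor.getResourceAsStream(name)) {
      if (in == null) {
        throw new IllegalArgumentException("Resource not found: " + name);
      }
      return in.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // In a Maven test this would be something like read(MyTest.class, "/readme.blob"),
    // where readme.blob is an assumed file name under src/test/resources.
    byte[] bytes = read(Object.class, "Object.class");
    // Class files start with the magic number 0xCAFEBABE.
    System.out.printf("%02x%02x%02x%02x%n", bytes[0], bytes[1], bytes[2], bytes[3]);
  }
}
```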

The files solution works fine. However, there is an alternative if:

  • you are using JDK 17 or later; and
  • your test data is relatively small.

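It involves using two features:

  • text blocks, a standard language feature since JDK 15; and
  • the java.util.HexFormat class, added in JDK 17.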

In this blog post I will show how we can use them together as a source of binary data.

Using xxd to get a textual representation of the binary data

Suppose we are writing a Git blob (loose) object reader in Java. Put simply, a Git blob object stores the contents of a single file under Git. Further details on Git internals are beyond the scope of this post. If you want to learn more you can refer to the Pro Git book chapter on Git internals.

To test our reader we will need data. Let's quickly create a test repository:

    $ git init test
    $ cd test/
    $ cat > README.md <<EOF
    # Our test project

    This will be our test blob.
    Let's see if we can read it from our test.
    EOF
    $ git add README.md
    $ git commit -m "Add README.md file"
    $ find .git/objects/ -type f
    .git/objects/fe/210da9f7dc83fefa49ef54ba73f74e55e453e6
    .git/objects/75/a8b365ca1a5e731f49d3624960b314d0480ca3
    .git/objects/18/acf6a96e6e43829c703ec8a8b6092b98829422
    $ git cat-file -p 75a8b365ca1a5e731f49d3624960b314d0480ca3
    # Our test project

    This will be our test blob.
    Let's see if we can read it from our test.

So the Git computed hash of the contents of our README file is:

75a8b365ca1a5e731f49d3624960b314d0480ca3

Let's use the xxd tool to obtain a hex dump of it:

    $ xxd -plain .git/objects/75/a8b365ca1a5e731f49d3624960b314d0480ca3
    78013dcb310a80300c4661e79ee20707b782a377105cbc40532356aa9126
    d2ebeba2f3fb1e65210c7dd362ba0b8cd57015d9399a73f3961435e50c62
    c897e93dbc1bd93a853223ada88c184e140e0b92612d72fcdebb07525820
    c6

Great! We now have a textual representation of our binary data.

Using a text block to store our hex dump

A text block is a java.lang.String literal suited for multi-line strings. Even though the output of the xxd tool is multi-line, the line terminators are not part of the actual data. Therefore, before using it, we must strip the string of those characters. To do it, I can think of two options:

  1. before consuming the string, do a replace("\n", "") (the compiler normalizes text block line terminators to \n, so System.lineSeparator() would not match on Windows); or
  2. use the \<line-terminator> escape sequence.

Both options are fine. In this blog post we will use the latter:

    public class BlobReaderTest {
      private static final String README = """
          78013dcb310a80300c4661e79ee20707b782a377105cbc40532356aa9126\
          d2ebeba2f3fb1e65210c7dd362ba0b8cd57015d9399a73f3961435e50c62\
          c897e93dbc1bd93a853223ada88c184e140e0b92612d72fcdebb07525820\
          c6\
          """;
    }

Notice that each "line" of the text block ends with a \ (backslash) character. It tells the Java compiler to suppress the line terminator from the resulting string value. For more information you can refer to the Programmer's Guide to Text Blocks.
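To make the difference between the two options concrete, here is a small sketch. The class name and the short sample value (the first eight bytes of an uncompressed blob) are chosen for illustration only. It strips "\n" because the compiler normalizes line terminators inside a text block to \n, whatever the platform.

```java
// Illustrative only: names and sample data are assumptions, not code from the post.
final class TextBlockLineTerminators {
  private TextBlockLineTerminators() {}

  // A plain text block: each line ends with a "\n" that must be stripped later.
  static final String WITH_NEWLINES = """
      626c6f62
      20393100
      """;

  // The \<line-terminator> escape: the compiler joins the lines for us.
  static final String ESCAPED = """
      626c6f62\
      20393100\
      """;

  // Option 1 from the list above, applied at run time.
  static String stripNewlines(String s) {
    return s.replace("\n", "");
  }

  public static void main(String[] args) {
    System.out.println(ESCAPED);
    System.out.println(stripNewlines(WITH_NEWLINES).equals(ESCAPED));
  }
}
```

Both approaches end up with the same string; the escape sequence simply moves the work from run time to compile time.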

Nice. We now have the blob data available in our Java source code.

Using java.util.HexFormat to obtain our byte array

The Javadoc for the java.util.HexFormat class states:

HexFormat converts between bytes and chars and hex-encoded strings which may include additional formatting markup such as prefixes, suffixes, and delimiters.

In our case we want to convert from a hex-encoded string to an array of bytes. Converting the output provided by the xxd tool using the HexFormat class is straightforward:

    @Test
    public void readme() {
      var hexFormat = HexFormat.of();
      byte[] bytes = hexFormat.parseHex(README);
      // consume bytes
    }

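We first obtained an instance of the HexFormat class. We used the of() factory, which expects plain hexadecimal digits with no delimiter, prefix or suffix; this matches the output of xxd -plain once the line terminators are removed.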

Next, we invoked the parseHex method with the README string of the previous section. It returns the blob data as a byte[].
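The following standalone sketch shows the conversion in both directions on a short value. The class and method names are hypothetical, and the sample is the eight-byte header of an uncompressed blob rather than the full README data.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical wrapper around HexFormat, for illustration only.
final class HexFormatSketch {
  private HexFormatSketch() {}

  // Hex string -> bytes. Throws IllegalArgumentException on malformed input,
  // for example when a line terminator is left in the string.
  static byte[] parse(String hex) {
    return HexFormat.of().parseHex(hex);
  }

  // Bytes -> lowercase hex string without delimiters.
  static String format(byte[] bytes) {
    return HexFormat.of().formatHex(bytes);
  }

  public static void main(String[] args) {
    byte[] header = parse("626c6f6220393100");
    // The first seven bytes are printable ASCII; the eighth is the NUL separator.
    System.out.println(new String(header, 0, 7, StandardCharsets.US_ASCII)); // prints "blob 91"
    System.out.println(format(header)); // prints "626c6f6220393100"
  }
}
```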

Great. We are now ready to consume our data and test the blob reader.

Consuming the binary data

How we consume our data depends on the API we are testing. Suppose our BlobReader provides a read method that takes a java.io.InputStream like so:

Blob readInputStream(InputStream inputStream) throws IOException;

In this case we need to wrap our byte array in a ByteArrayInputStream. The full version of the test is listed below:

    @Test
    public void readme() throws IOException {
      var hexFormat = HexFormat.of();
      var bytes = hexFormat.parseHex(README);
      try (var inputStream = new ByteArrayInputStream(bytes)) {
        var reader = new BlobReader();
        var blob = reader.readInputStream(inputStream);
        assertEquals(
          blob.text(),
          """
          # Our test project

          This will be our test blob.
          Let's see if we can read it from our test.
          """
        );
      }
    }

ByteArrayInputStream is an in-memory InputStream. By this I mean that it does not do any actual I/O. In other words, neither its read nor its close method will complete abruptly with an IOException. Regardless, we use a try-with-resources statement, as we would with any other InputStream.

Next, we create our BlobReader instance and invoke it with our InputStream.

Finally, we verify that the blob contents match the expected value.

Writing the data to a temporary file

At times you are not in control of the API you are testing or using in your tests. Suppose our blob reader does not provide a method that takes an InputStream. Instead it takes a file. And a java.io.File, no less:

Blob readFile(File file) throws IOException;

We have to write our data to a temporary file prior to invoking the method we are testing. The full version of the test is listed below:

    @Test
    public void readmeWithFile() throws IOException {
      var hexFormat = HexFormat.of();
      var bytes = hexFormat.parseHex(README);
      var file = File.createTempFile("blob-", ".tmp");
      file.deleteOnExit();
      try (var out = new FileOutputStream(file)) {
        out.write(bytes);
      }
      var reader = new BlobReader();
      var blob = reader.readFile(file);
      assertEquals(
        blob.text(),
        """
        # Our test project

        This will be our test blob.
        Let's see if we can read it from our test.
        """
      );
    }

We create a temporary file using the File.createTempFile static method. We immediately call the deleteOnExit method: we want the file to be deleted when the JVM exits, after we are done testing.

Next, we write our bytes to the file via a FileOutputStream.

Finally, we read the file with our BlobReader and verify if the returned blob has the expected contents.
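If you prefer the java.nio.file API, the same steps could be sketched as follows. This is an alternative to the listing above, not the code from the post; the class and method names are hypothetical and the sample bytes are just the eight-byte blob header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HexFormat;

// Hypothetical helper showing a java.nio.file variant of the temp file steps.
final class TempFileSketch {
  private TempFileSketch() {}

  // Creates a temporary file, registers it for deletion at JVM exit
  // and writes the given bytes to it.
  static Path writeToTempFile(byte[] bytes) {
    try {
      Path path = Files.createTempFile("blob-", ".tmp");
      path.toFile().deleteOnExit();
      return Files.write(path, bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = HexFormat.of().parseHex("626c6f6220393100");
    Path path = writeToTempFile(bytes);
    // path.toFile() yields the java.io.File that an API such as readFile(File) expects.
    System.out.println(path.toFile().length() + " bytes written to " + path);
  }
}
```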

Manually editing our data

Our test data is in Java source code. So, if required, we can manually edit the data. Of course, you can also edit binary files. But I find that text files are easier to edit; it is possible to do it directly in the Java editor.

Let's put this into practice. We will modify our blob hex dump so that we edit the README contents.

In Git, loose objects are compressed using DEFLATE. So we can get the uncompressed hex dump like so:

    $ zlib-flate -uncompress \
        < .git/objects/75/a8b365ca1a5e731f49d3624960b314d0480ca3 \
        | xxd -plain
    626c6f622039310023204f757220746573742070726f6a6563740a0a5468
    69732077696c6c206265206f7572207465737420626c6f622e0a4c657427
    73207365652069662077652063616e20726561642069742066726f6d206f
    757220746573742e0a
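If zlib-flate is not at hand, the same decompression can be approximated in Java with the java.util.zip classes. The sketch below uses hypothetical names and only demonstrates a round trip on the eight-byte blob header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HexFormat;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: zlib compression and decompression of in-memory data.
final class ZlibSketch {
  private ZlibSketch() {}

  // Decompresses zlib-wrapped DEFLATE data, the format used by loose objects.
  static byte[] inflate(byte[] compressed) {
    try (var in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      return in.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Compresses raw bytes with the default zlib settings.
  static byte[] deflate(byte[] raw) {
    var out = new ByteArrayOutputStream();
    try (var deflater = new DeflaterOutputStream(out)) {
      deflater.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    var hex = HexFormat.of();
    byte[] raw = hex.parseHex("626c6f6220393100");
    byte[] roundTrip = inflate(deflate(raw));
    System.out.println(hex.formatHex(roundTrip)); // prints "626c6f6220393100"
  }
}
```

Feeding inflate the bytes parsed from the compressed hex dump of the README blob should give the same result as the zlib-flate command above.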

The following listing is an interpretation of the uncompressed data. To understand it, you should know this:

  • every two characters represent a single byte;
  • Git blob (loose) objects have the following format: blob {size in ascii}\0{contents}; and
  • it helps to have an ASCII table at hand.

    626c6f62 -- 'blob' in ASCII/UTF-8
    20       -- SPACE
    3931     -- object size in ASCII/UTF-8. size=91 bytes
    00       -- NULL
    23       -- first char of the contents: c='#'
    -- rest of the contents
    204f757220746573742070726f6a6563740a0a546869732077696c6c206265206f7572207465737420626c6f622e0a4c65742773207365652069662077652063616e20726561642069742066726f6d206f757220746573742e0a
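The same interpretation can be done programmatically. The sketch below splits the uncompressed bytes into type, size and contents. It is not the BlobReader from the accompanying repository: the class, record and method names are invented for illustration, and it assumes the header is well formed.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical parser for an uncompressed loose object: {type} SPACE {size} NUL {contents}.
final class LooseObjectHeader {
  private LooseObjectHeader() {}

  record Parsed(String type, int size, String contents) {}

  static Parsed parse(byte[] raw) {
    int space = indexOf(raw, (byte) ' ', 0);
    int nul = indexOf(raw, (byte) 0, space + 1);
    String type = new String(raw, 0, space, StandardCharsets.US_ASCII);
    String size = new String(raw, space + 1, nul - space - 1, StandardCharsets.US_ASCII);
    // Assumes the contents are UTF-8 text, which holds for our README.
    String contents = new String(raw, nul + 1, raw.length - nul - 1, StandardCharsets.UTF_8);
    return new Parsed(type, Integer.parseInt(size), contents);
  }

  private static int indexOf(byte[] bytes, byte target, int from) {
    for (int i = from; i < bytes.length; i++) {
      if (bytes[i] == target) {
        return i;
      }
    }
    throw new IllegalArgumentException("Malformed object: separator not found");
  }

  // The uncompressed hex dump shown above.
  static final String README = """
      626c6f622039310023204f757220746573742070726f6a6563740a0a5468\
      69732077696c6c206265206f7572207465737420626c6f622e0a4c657427\
      73207365652069662077652063616e20726561642069742066726f6d206f\
      757220746573742e0a\
      """;

  public static void main(String[] args) {
    Parsed parsed = parse(HexFormat.of().parseHex(README));
    System.out.println(parsed.type() + " " + parsed.size());
    System.out.print(parsed.contents());
  }
}
```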

Let's change the first character of our README from '#' to '='. The equals sign character has the hex code 0x3d. The following test passes:

    public class UncompressedTest {
      private static final String README = """
          626c6f6220393100\
          3d\
          204f757220746573742070726f6a6563740a0a5468\
          69732077696c6c206265206f7572207465737420626c6f622e0a4c657427\
          73207365652069662077652063616e20726561642069742066726f6d206f\
          757220746573742e0a\
          """;

      @Test
      public void readme() throws IOException {
        var out = new ByteArrayOutputStream();
        try (var outputStream = new DeflaterOutputStream(out)) {
          var hexFormat = HexFormat.of();
          outputStream.write(hexFormat.parseHex(README));
        }
        var bytes = out.toByteArray();
        try (var inputStream = new ByteArrayInputStream(bytes)) {
          var reader = new BlobReader();
          var blob = reader.readInputStream(inputStream);
          assertEquals(
            blob.text(),
            """
            = Our test project

            This will be our test blob.
            Let's see if we can read it from our test.
            """
          );
        }
      }
    }

We have successfully edited the blob data.

A variation using java.util.Base64

For the example in this blog post, using java.util.Base64 would be mostly the same. In fact, it has a few advantages:

  1. java.util.Base64 has been available since JDK 8; and
  2. there is no need to escape the line terminator in the string literal, because the MIME decoder ignores line separators.

The following is a snippet of our running example using Base64:

    private static final String README = """
        eAE9yzEKgDAMRmHnnuIHB7eCo3cQXLxAUyNWqpEm0uvrovP7HmUhDH3TYroLjNVwFdk5mnPzlhQ1
        5QxiyJfpPbwb2TqFMiOtqIwYThQOC5JhLXL83rsHUlggxg==
        """;

    @Test
    public void readme() throws IOException {
      var decoder = Base64.getMimeDecoder();
      var bytes = decoder.decode(README);
      // consume the bytes
    }

It has a (possible) drawback though. It makes it harder to manually edit the data.

Doing something similar to the previous section using Base64 would not be as simple. The uncompressed data encoded with Base64 is the following:

    $ zlib-flate -uncompress \
        < .git/objects/75/a8b365ca1a5e731f49d3624960b314d0480ca3 \
        | base64
    YmxvYiA5MQAjIE91ciB0ZXN0IHByb2plY3QKClRoaXMgd2lsbCBiZSBvdXIgdGVzdCBibG9iLgpMZXQncyBzZWUgaWYgd2UgY2FuIHJlYWQgaXQgZnJvbSBvdXIgdGVzdC4K

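Every Base64 character represents 6 bits of information, so characters do not line up with byte boundaries. Editing a single character of the Base64 data can change up to two adjacent bytes of our blob, and changing a single byte means recomputing every character that shares bits with it.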
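A tiny experiment illustrates the point. "Ymxv" is the Base64 encoding of the three bytes "blo", the start of our blob header. The helper below, whose name is made up for this sketch, counts how many decoded bytes differ after changing one Base64 character.

```java
import java.util.Base64;

// Illustrative only: counts the decoded bytes affected by a Base64 edit.
final class Base64EditSketch {
  private Base64EditSketch() {}

  // Decodes both strings and returns the number of positions where the bytes differ.
  static int changedBytes(String original, String edited) {
    byte[] a = Base64.getDecoder().decode(original);
    byte[] b = Base64.getDecoder().decode(edited);
    if (a.length != b.length) {
      throw new IllegalArgumentException("Decoded lengths differ");
    }
    int count = 0;
    for (int i = 0; i < a.length; i++) {
      if (a[i] != b[i]) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // The second character straddles the first and second bytes.
    System.out.println(changedBytes("Ymxv", "YXxv")); // prints 2
    // The first character only holds bits of the first byte.
    System.out.println(changedBytes("Ymxv", "Zmxv")); // prints 1
  }
}
```

With the hex representation, by contrast, every pair of characters maps to exactly one byte.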

Conclusion

In this blog post we saw a way to store binary data in Java source code using text blocks. We used the java.util.HexFormat class to convert the string to an array of bytes.

We focused on using this data for testing. But, if needed, it is also possible to use this technique in production code.

Storing the data in text format makes it easier to edit. This assumes the data:

  • has a defined format; and
  • has a binary format that allows for manipulation with some ease.

Additionally, since the data is in Java source code, edits can be visualized in Git diffs.

As mentioned, this technique is better suited for data that is relatively small.

You can find the source code for all of the examples in this GitHub repository. It includes the source code of the BlobReader.

Originally published at the Objectos Software Blog on July 11th, 2022.

Original Link: https://dev.to/marcioendo/a-jdk-17-alternative-to-using-binary-files-in-your-java-tests-23jo
