Writing large files in Java

Reading and writing a large file using Java NIO

How can I efficiently read from a large file and write bulk data to a file using the Java NIO framework? I’m working with ByteBuffer and FileChannel and have tried something like the below:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class LargeFileCopy {

    public static void main(String[] args) {
        String inFileStr = "screen.png";
        String outFileStr = "screen-out.png";
        long startTime, elapsedTime;
        int bufferSizeKB = 4;
        int bufferSize = bufferSizeKB * 1024;

        // Check file length
        File fileIn = new File(inFileStr);
        System.out.println("File size is " + fileIn.length() + " bytes");
        System.out.println("Buffer size is " + bufferSizeKB + " KB");
        System.out.println("Using FileChannel with an indirect ByteBuffer of " + bufferSizeKB + " KB");

        try (FileChannel in = new FileInputStream(inFileStr).getChannel();
             FileChannel out = new FileOutputStream(outFileStr).getChannel()) {

            // Allocate an indirect ByteBuffer
            ByteBuffer bytebuf = ByteBuffer.allocate(bufferSize);

            startTime = System.nanoTime();
            int bytesCount = 0;
            // Read data from the file into the ByteBuffer
            while ((bytesCount = in.read(bytebuf)) > 0) {
                // Flip the buffer: sets the limit to the current position, and the position to 0
                bytebuf.flip();
                // Write data from the ByteBuffer to the file
                out.write(bytebuf);
                // Clear the buffer for the next read
                bytebuf.clear();
            }
            elapsedTime = System.nanoTime() - startTime;
            System.out.println("Elapsed Time is " + (elapsedTime / 1000000.0) + " msec");
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

Can anybody tell me whether I should follow the same procedure if my file size is more than 2 GB? And what should I do if I want to perform similar bulk write operations?

2 Answers

Note that you can simply use Files.copy(Paths.get(inFileStr), Paths.get(outFileStr), StandardCopyOption.REPLACE_EXISTING) to copy the file as your example code does, just likely faster and with only one line of code.
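For reference, a minimal self-contained sketch of that one-liner (the class name is a placeholder; the file names match your example):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FilesCopyExample {
    public static void main(String[] args) throws IOException {
        // Copies the whole file, replacing the target if it already exists
        Files.copy(Paths.get("screen.png"), Paths.get("screen-out.png"),
                StandardCopyOption.REPLACE_EXISTING);
    }
}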

Otherwise, if you have already opened the two file channels, you can just use in.transferTo(0, in.size(), out) to transfer the entire contents of the in channel to the out channel. Note that this method allows you to specify a range within the source file that will be transferred to the target channel’s current position (which is initially zero), and that there’s also a method for the opposite direction, i.e. out.transferFrom(in, 0, in.size()), which transfers data from the source channel’s current position to an absolute range within the target file. An example is sketched below.
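A minimal sketch of the robust form, using the in and out channels from your code (a single transferTo call is permitted to transfer fewer bytes than requested, so looping until the whole range has been moved is the safe pattern):

// Copy the entire contents of 'in' to 'out' via transferTo.
// A single call may transfer fewer bytes than requested, so loop
// until everything up to in.size() has been transferred.
long position = 0;
long size = in.size();
while (position < size) {
    position += in.transferTo(position, size - position, out);
}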

Together, they allow almost every imaginable nontrivial bulk transfer to be performed efficiently, without the need to copy the data into a Java-side buffer. If that doesn’t solve your needs, you will have to be more specific in your question.

By the way, you can open a FileChannel directly without the FileInputStream / FileOutputStream detour since Java 7.
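For example, the try-with-resources header from your code could become something like this (a sketch using the java.nio.file API and your inFileStr/outFileStr variables; the open options shown are the usual ones for a plain copy):

try (FileChannel in = FileChannel.open(Paths.get(inFileStr), StandardOpenOption.READ);
     FileChannel out = FileChannel.open(Paths.get(outFileStr),
             StandardOpenOption.CREATE, StandardOpenOption.WRITE,
             StandardOpenOption.TRUNCATE_EXISTING)) {
    in.transferTo(0, in.size(), out);
}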


Java Large Files – Efficient Processing

A guide to optimal ways of processing large files in Java while avoiding OutOfMemoryError. We compare the fastest and the most memory-efficient ways to read and write files.

Overview

This tutorial discusses ways to process large files in Java and how to avoid a Java OutOfMemoryError while transferring or processing them. Java File IO and Java NIO provide various ways of dealing with files. However, handling large files is challenging because we must find the right balance between speed and memory utilization.

This article will use different ways to read a massive file from one place and copy it to another. While doing so, we will monitor the time it takes and the memory it consumes. Finally, we will discuss their performances and find the most efficient Java Large File Processing method.


We will write examples of transferring large files using Java Streams, Java Scanner, Java FileChannel, and Java BufferedInputStream. To begin with, however, we will discuss the fastest way of transferring a file.

Fastest Way of Java Large File Processing

This section covers the fastest way of reading and writing large files in Java. However, a quicker way doesn’t mean a better way, and we will discuss that soon.

When we use Java IO to read or write a file, the slowest part of the process is transferring the file contents between the hard disk and the JVM memory. Thus, to make file IO faster, we can reduce the number of data transfers. And the easiest way of doing this is to transfer everything in one go.

For example, using Files.readAllBytes()

byte[] bytes = Files.readAllBytes(sourcePath);

Or using Files.readAllLines().

List<String> lines = Files.readAllLines(sourcePath);

In the first snippet, the entire content of the file is copied into a byte array, which is held in memory. Similarly, in the second snippet, the entire content of a text file is read as a List of strings and stored in memory too.

The following method reads a byte[] from a source file and writes those bytes to the target file.

private void copyByUsingByteArray() throws IOException {
    Path sourcePath = Path.of(source);
    Path targetPath = Path.of(target);
    byte[] bytes = Files.readAllBytes(sourcePath);
    Files.write(targetPath, bytes, StandardOpenOption.CREATE);
}

Using this method, we will process a 667 MB file, reading it from the source and writing it to the target. We run the copy in a separate thread so we can observe the memory footprint: while the copy happens in the worker thread, the parent thread prints the amount of memory used (in MB) at fixed intervals, as sketched below.
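The article does not list the monitoring harness itself; the following is a minimal sketch of how such a measurement could look. The worker-thread body, the 500 ms interval, and the use of Runtime for the memory figures are assumptions, not the author’s exact code.

// Hypothetical monitoring harness (not the author's original code):
// runs the copy in a worker thread and prints the used heap (in MB)
// at a fixed interval until the copy completes.
Thread copyThread = new Thread(() -> {
    try {
        copyByUsingByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
});
copyThread.start();

Runtime runtime = Runtime.getRuntime();
while (copyThread.isAlive()) {
    long usedMB = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
    System.out.println("Memory used: " + usedMB);
    Thread.sleep(500); // assumed interval
}
copyThread.join();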

Source File Size 667
Memory used: 9
Memory used: 676
Memory used: 676
total time 1803

The transfer finished fast; however, it consumed a lot of memory. This solution is impractical when copying large files or processing multiple such files simultaneously.

Using BufferedReader and Java Streams

Now, we will test the performance of the Java Streams to process a huge file. To do that, we will use BufferedReader, which provides a Stream of strings read from the file.

Next is an example of using the Java Stream provided by BufferedReader to process a huge (10 GB) file.

private void copyUsingJavaStreams() throws IOException {
    try (InputStream inputStream = new FileInputStream(source);
         BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
         FileWriter fileWriter = new FileWriter(target, true);
         PrintWriter printWriter = new PrintWriter(new BufferedWriter(fileWriter));
         Stream<String> linesStream = bufferedReader.lines()) {
        linesStream.forEach(printWriter::println);
    }
}

Now, we will test the method that uses BufferedReader to read a 10GB file.

Source File Size 10471
Memory used: 9
Memory used: 112
Memory used: 71
Memory used: 17
Memory used: 124
Memory used: 76
Memory used: 28
Memory used: 69
Memory used: 35
Memory used: 47
total time 42025

Java Streams are lazy, and that is what keeps the memory footprint low: while each line from the stream is being written to the target, the next ones are read from the source on demand. That is evident in the memory logs, where the highest memory consumption was less than 125 MB and the Garbage Collector was doing its job in between. However, although the method performed well on memory, it took around 42 seconds to finish processing the file.


Java Scanner

Java Scanner is used to scan through a file and supports streaming the content without exhausting a large amount of memory.

Next is an example of using Java Scanner to copy a 10GB file.

private void copyUsingScanner() throws IOException {
    try (InputStream inputStream = new FileInputStream(source);
         Scanner scanner = new Scanner(inputStream, StandardCharsets.UTF_8);
         FileWriter fileWriter = new FileWriter(target, true);
         PrintWriter printWriter = new PrintWriter(new BufferedWriter(fileWriter))) {
        while (scanner.hasNext()) {
            printWriter.println(scanner.next());
        }
    }
}
Source File Size 10471
Memory used: 9
Memory used: 8
Memory used: 9
Memory used: 110
Memory used: 27
Memory used: 176
Memory used: 44
Memory used: 13
Memory used: 74
Memory used: 17
Memory used: 184
Memory used: 35
total time 660054

Although the Scanner used almost the same amount of memory, its performance was far slower: it took around 11 minutes to copy the 10 GB file from one location to another.

Using FileChannel

Next, we will cover an example of using Java FileChannels to transfer a large amount of data from one file to another.

private void copyUsingChannel() throws IOException {
    try (FileChannel inputChannel = new FileInputStream(source).getChannel();
         FileChannel outputChannel = new FileOutputStream(target).getChannel()) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(4 * 1024);
        while (inputChannel.read(buffer) != -1) {
            buffer.flip();
            outputChannel.write(buffer);
            buffer.clear();
        }
    }
}

Here, we use a 4 KB buffer (4 * 1024 bytes).

Source File Size 10471
Memory used: 9
Memory used: 10
Memory used: 10
Memory used: 10
total time 21403

From the output, it is clear that this is, so far, the fastest and most memory-efficient way of processing large files.

Process Large File in Chunks (BufferedInputStream)

Finally, we will look at the traditional Java IO way of processing large amounts of data. We will use a BufferedInputStream with the same buffer size we used for the FileChannel example, and analyze the results.

Next is an example of Reading and Writing Large Files in Chunks using Java BufferedInputStream.

private void copyUsingChunks() throws IOException {
    try (InputStream inputStream = new FileInputStream(source);
         BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
         OutputStream outputStream = new FileOutputStream(target)) {
        byte[] buffer = new byte[4 * 1024];
        int read;
        while ((read = bufferedInputStream.read(buffer, 0, buffer.length)) != -1) {
            outputStream.write(buffer, 0, read);
        }
    }
}
Source File Size 10471
Memory used: 9
Memory used: 10
Memory used: 10
Memory used: 10
total time 20581

And the performance we see is similar to that of the FileChannel example. That is because we used a buffer of the same size.

Most Efficient Way of Java Large File Processing

We have tried various ways of reading and writing huge files in Java. In this section, we will discuss their performance and understand which one is the optimal way of extensive file handling in Java.

Читайте также:  Python appending to array

In Memory Transfer

As stated earlier, the in-memory transfer is a fast way of moving data. However, holding the entire content of a file in memory, for example in a byte[] or a List<String>, is not practical with very large files. It can quickly exhaust all available memory when a file is very large or when the application serves multiple such requests simultaneously.

Java Stream and Scanner

In the Java Stream example, we generated a Stream of lines using BufferedReader, which produced a decent memory result. Similarly, the Java Scanner example also turned out to be light on memory. However, both of these transfers were slow.

FileChannel and Chunk Transfer using BufferedInputStream

We have also seen examples of using FileChannel and BufferedInputStream to read and write huge files. At the base of both examples, we used a fixed-size buffer. Both of these approaches demonstrated good speed and low memory consumption.

Moreover, we can still improve the performance of these two approaches by using larger buffers, because larger buffers mean fewer interactions with the underlying files. However, larger buffers also mean higher memory consumption. To prove that, we will rerun both examples with a buffer size of 1048576 bytes (1 MB).

BufferedInputStream

We will modify the buffer size.

byte[] buffer = new byte[1048576];
Source File Size 10471
Memory used: 9
Memory used: 12
Memory used: 12
Memory used: 12
total time 11390

FileChannel

Similarly, we will increase the ByteBuffer size in the FileChannel example.

ByteBuffer buffer = ByteBuffer.allocateDirect(1048576);

And the result looks like this:

Source File Size 10471
Memory used: 9
Memory used: 10
Memory used: 10
Memory used: 10
total time 11431

Both of the outputs above show a performance improvement with only a slightly larger memory footprint.

Conclusion

This detailed practical comparison concludes that using a buffer is the best way to transfer a large amount of data using Java IO. Copying the file in chunks helps to limit the amount of memory consumed by the file content.

Both the FileChannel and BufferedInputStream examples performed head-to-head in our tests. The advantage of using BufferedInputStream or FileChannel to read large files is that they have a configurable buffer. Thus, based on the nature of the server load and the size of the files, we can tune the buffer size and eventually find an optimal and efficient way to read large files in Java IO.

Summary

In this long, practice-oriented tutorial, we discussed Java large file processing. We began by understanding that we can speed up large file reads at the cost of memory consumption, or keep memory utilization to a minimum at the cost of slower processing.

Also, we practically tested these approaches by using Java Streams, Java Scanner, Java FileChannel, and Java BufferedInputStream to transfer a 10 GB file, and we analyzed their performance. Finally, we concluded that BufferedInputStream and FileChannel are the optimal and most efficient ways to read and write large files in Java IO. They offer excellent control for optimizing large file handling in Java. For more on Java, please visit Java Tutorials.

