
Reading a Large File Efficiently in Java

Learn to read all lines from a large file (gigabytes in size) in Java while avoiding performance pitfalls such as very high memory usage, or even an OutOfMemoryError if the file is large enough.

1. Approach to Read Large Files

Similar to the DOM and SAX parsers for XML files, we can read a file with one of two approaches:

  • Reading the complete file in memory before processing it
  • Reading the file content line by line and processing each line independently

The first approach looks cleaner and is suitable for small files where memory requirements are very low (kilobytes or a few megabytes). If used to read large files, it will quickly result in an OutOfMemoryError for files gigabytes in size.
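As a minimal sketch of the first approach (shown only to illustrate the pitfall), Files.readAllLines() pulls every line into memory at once:

// the ENTIRE file is held in memory; fine for small files,
// but risks OutOfMemoryError on gigabyte-sized input
List<String> allLines = Files.readAllLines(Paths.get("C:/temp/file.txt"), StandardCharsets.UTF_8);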

The second approach is suitable for reading very large files, in gigabytes, when it is not feasible to read the whole file into memory. In this approach, we use line streaming, i.e. we read the lines from the file in the form of a stream or iterator.

This tutorial is focused on the solutions using the second approach.

2. Using New IO’s Files.lines()

Using the Files.lines() method, the contents of the file are read and processed lazily so that only a small portion of the file is stored in memory at any given time.

The good thing about this approach is that we can write the consumer actions directly and use newer language features such as lambda expressions with the Stream API.

Path filePath = Paths.get("C:/temp/file.txt");

// try-with-resources closes the stream automatically
try (Stream<String> lines = Files.lines(filePath)) {
    lines.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

3. Commons IO’s FileUtils.lineIterator()

The lineIterator() method uses a Reader to iterate over the lines of a specified file. Use try-with-resources to auto-close the iterator after reading the file.

Do not forget to add the latest version of the commons-io module to the project dependencies.

File file = new File("C:/temp/file.txt");

try (LineIterator it = FileUtils.lineIterator(file, "UTF-8")) {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with the line
        System.out.println(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}

4. Reading Large Binary Files

Note that when we read files in a stream or line by line, we are referring to character-based or text files. For binary files, the UTF-8 charset may corrupt the data, so the above solutions do not apply to binary data files.

To read large raw data files, such as movies or large images, we can use Java NIO’s ByteBuffer and FileChannel classes. Remember that you will need to try different buffer sizes and pick the one that works best for you.

try (RandomAccessFile aFile = new RandomAccessFile("test.txt", "r");
     FileChannel inChannel = aFile.getChannel()) {

    // buffer size is 1024 bytes
    ByteBuffer buffer = ByteBuffer.allocate(1024);

    while (inChannel.read(buffer) > 0) {
        buffer.flip();
        for (int i = 0; i < buffer.limit(); i++) {
            System.out.print((char) buffer.get());
        }
        // do something with the data, then clear/compact the buffer
        buffer.clear();
    }
} catch (IOException e) {
    e.printStackTrace();
}

This Java tutorial discussed a few efficient solutions for reading very large files. The correct solution depends on the type of file and other deciding factors specific to the problem.


I suggest benchmarking all the solutions in your environment and choosing based on their performance.
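As a starting point, a minimal timing sketch (readWithYourApproach() is a hypothetical placeholder for whichever solution you are measuring):

long start = System.nanoTime();
readWithYourApproach(); // hypothetical placeholder for the solution under test
long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
System.out.println("Elapsed: " + elapsedMillis + " ms");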


How to Read Large File in Java

In our last article, we covered how to read a file in Java. This post will cover how to read a large file in Java efficiently.

Reading a large file efficiently in Java has always been a challenge; with new enhancements coming to the Java IO package, it is becoming more and more efficient.

We have used a sample file of 1 GB in size for all these tests. Reading such a large file into memory is not a good option, so we will cover various methods for reading a large file in Java line by line.

1 Using Java API

We will cover various options for reading a file in Java efficiently using the plain Java API.

1.1 Using Java BufferedReader

public class ReadLargeFileByBufferReader {

    public static void main(String[] args) throws IOException {
        String fileName = "/tutorials/fileread/file.txt"; // this path is on my local machine

        try (BufferedReader fileBufferReader = new BufferedReader(new FileReader(fileName))) {
            String fileLineContent;
            while ((fileLineContent = fileBufferReader.readLine()) != null) {
                // process the line
            }
        }
    }
}
Max Memory Used: 258 MB
Time Taken: 100 Seconds

1.2 Using Java 8 Stream API

public class ReadLargeFIleUsingStream {

    public static void main(String[] args) throws IOException {
        String fileName = "/tutorials/fileread/file.txt"; // this path is on my local machine

        // lines(Path path, Charset cs)
        try (Stream<String> inputStream = Files.lines(Paths.get(fileName), StandardCharsets.UTF_8)) {
            inputStream.forEach(System.out::println);
        }
    }
}
Max Memory Used: 390 MB
Time Taken: 60 Seconds

1.3 Using Java Scanner

The Java Scanner API also provides a way to read a large file line by line.

public class ReadLargeFileByScanner {

    public static void main(String[] args) throws FileNotFoundException {
        String fileName = "/Users/umesh/personal/tutorials/fileread/file.txt"; // this path is on my local machine

        InputStream inputStream = new FileInputStream(fileName);
        try (Scanner fileScanner = new Scanner(inputStream, StandardCharsets.UTF_8.name())) {
            while (fileScanner.hasNextLine()) {
                System.out.println(fileScanner.nextLine());
            }
        }
    }
}
Max Memory Used: 460 MB
Time Taken: 60 Seconds

2 Streaming File Using Apache Commons IO

This can also be achieved using the Apache Commons IO FileUtils.lineIterator() method.

public class ReadLargeFileUsingApacheCommonIO {

    public static void main(String[] args) throws IOException {
        String fileName = "/Users/umesh/personal/tutorials/fileread/file.txt"; // this path is on my local machine

        LineIterator fileContents = FileUtils.lineIterator(new File(fileName), StandardCharsets.UTF_8.name());
        while (fileContents.hasNext()) {
            System.out.println(fileContents.nextLine());
        }
    }
}
Max Memory Used: 400 MB
Time Taken: 60 Seconds

We have seen how to read a large file in Java efficiently. A few things need close attention:

  1. Reading the large file in one go is not a good option (you will get an OutOfMemoryError).
  2. We adopted the technique of reading the large file line by line to keep the memory footprint low.

I used VisualVM to monitor memory, CPU, and thread pool information while running these programs.

Based on our tests, BufferedReader has the lowest memory footprint, though the overall execution was slow.

All the code for this article is available over on GitHub. This is a Maven-based project.


The Techno Journals

A large file can be any plain text or binary file that is huge in size and cannot fit in JVM memory at once. For example, if a Java application is allocated 256 MB of memory and it tries to load a file completely whose size is close to or greater than that, it may throw an out-of-memory error.

Points to be remembered

  • Never read the whole file at once.
  • Read the file line by line or in chunks: a few lines at a time from a text file, or a few bytes at a time from a binary file (see the sketch after this list).
  • Do not store the whole data in memory, e.g. by reading all lines and keeping them as one string.
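As a minimal sketch of chunked reading for a binary file (the file path and the 8 KB buffer size are illustrative assumptions):

try (InputStream in = new FileInputStream("/Users/Downloads/sample.bin")) { // hypothetical path
    byte[] chunk = new byte[8192]; // arbitrary 8 KB chunk size
    int len;
    while ((len = in.read(chunk)) != -1) {
        // process chunk[0] .. chunk[len - 1] without keeping the whole file in memory
    }
}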

CSV file

In this example I am going to read a CSV file that is around 500 MB in size. A sample is given below.

"year_month","month_of_release","passenger_type","direction","citizenship","visa","country_of_residence","estimate","standard_error","status"

File reading and counting year wise

We will read the CSV file and provide year-wise counts using the first column of this CSV file. We will see it done in two different ways: a synchronous way, and an asynchronous way using CompletableFuture. We will see the code in the next sections.

Instance Variables

private final long mb = 1024 * 1024;
private final String file = "/Users/Downloads/sample.csv";

Common Methods

public void yearCount(String line, Map<String, Integer> countMap) {
    // the year is the first four characters inside the quoted "year_month" column
    String key = line.substring(1, 5);
    if (countMap.containsKey(key)) {
        countMap.put(key, countMap.get(key) + 1);
    } else {
        countMap.put(key, 1);
    }
}

I have annotated the method below with @EventListener so that it is invoked automatically when the application is up and ready. This method also calculates memory consumption and execution time.

@EventListener(ApplicationReadyEvent.class)
public void testLargeFile() throws Exception {
    long premem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    long start = System.currentTimeMillis();
    System.out.println("Used memory pre run (MB): " + (premem / mb));

    // PLEASE UNCOMMENT ONE OF THE BELOW 2 LINES AT A TIME TO TEST THE DESIRED FUNCTIONALITY
    // System.out.println("Year count: " + simpleYearCount(file)); // process file synchronously and print details
    // System.out.println("Year count: " + asyncYearCount(file)); // process file asynchronously and print details

    long postmem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    System.out.println("Used memory post run (MB): " + (postmem / mb));
    System.out.println("Memory consumed (MB): " + (postmem - premem) / mb);
    System.out.println("Time taken in MS: " + (System.currentTimeMillis() - start));
}

Synchronous processing

Below is the code that reads the file using the NIO API and calculates the year counts synchronously. The code is pretty simple and small.

public Map<String, Integer> simpleYearCount(String file) throws IOException {
    Map<String, Integer> yearCountMap = new HashMap<>();
    try (Stream<String> lines = Files.lines(Paths.get(file))) {
        lines.skip(1) // skip the header line
             .forEach(s -> yearCount(s, yearCountMap));
    }
    return yearCountMap;
}

Output

Used memory pre run (MB): 41
Year count:
Used memory post run (MB): 304
Memory consumed (MB): 262
Time taken in MS: 1971

Asynchronous processing

Here we are going to read the file using the NIO API and then process it asynchronously using CompletableFuture. For example, we will read 10000 lines and process them asynchronously, then the next 10000, and so on. See the code below.

public Map<String, Integer> asyncYearCount(String file)
        throws IOException, InterruptedException, ExecutionException {
    try {
        List<CompletableFuture<Map<String, Integer>>> futures = new ArrayList<>();
        List<String> items = new ArrayList<>();
        Files.lines(Paths.get(file))
            .skip(1) // skip the header line
            .forEach(line -> {
                items.add(line);
                if (items.size() % 10000 == 0) {
                    // add a completable task for each batch of 10000 rows
                    futures.add(CompletableFuture.supplyAsync(
                            yearCountSupplier(new ArrayList<>(items), new HashMap<>())));
                    items.clear();
                }
            });
        if (items.size() > 0) {
            // add a completable task for the remaining rows
            futures.add(CompletableFuture.supplyAsync(yearCountSupplier(items, new HashMap<>())));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()]))
            .thenApply($ -> {
                // join all tasks to collect the results after all tasks have completed
                return futures.stream().map(ftr -> ftr.join()).collect(Collectors.toList());
            })
            .thenApply(maps -> {
                Map<String, Integer> yearCountMap = new HashMap<>();
                maps.forEach(map -> {
                    // merge the results of all the tasks
                    map.forEach((key, val) -> {
                        if (yearCountMap.containsKey(key)) {
                            yearCountMap.put(key, yearCountMap.get(key) + val);
                        } else {
                            yearCountMap.put(key, val);
                        }
                    });
                });
                return yearCountMap;
            })
            .get();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return new HashMap<>();
}

// Supplier that counts the years in the given rows
public Supplier<Map<String, Integer>> yearCountSupplier(List<String> items, Map<String, Integer> map) {
    return () -> {
        items.forEach(line -> yearCount(line, map));
        return map;
    };
}

Output

Used memory pre run (MB): 120
Year count:
Used memory post run (MB): 262
Memory consumed (MB): 142
Time taken in MS: 1549

Conclusion

Now we have seen how to read and process a huge file, both synchronously and asynchronously. Comparing the output of the two runs shows the difference in memory consumption and execution time: the async execution is faster, but it may use more memory because multiple threads are processing data at the same time. Async execution becomes more useful with heavier files, where the difference is significant.
I would suggest synchronous execution if you have limited memory; otherwise, use async execution for better performance. You can also run the async version with less memory, but it may not be very beneficial due to the small chunks and the number of threads involved.



Java – Reading a Large File Efficiently

What’s the most efficient and easiest way to read a large file in Java? Well, one way is to read the whole file at once into memory. Let us examine some issues that arise when doing so.

2. Loading Whole File Into Memory

One way to load the whole file into a String is to use NIO. This can be accomplished in a single line as follows:

String str = new String(Files.readAllBytes(Paths.get(pathname)), StandardCharsets.UTF_8);

There are several other ways to read a whole file into memory. Check this article for more details, including benchmarks.
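For instance, on Java 11 or later (a sketch, assuming that JDK version is available), Files.readString() does the same job in a single call:

// Java 11+: reads the whole file into a String, decoded as UTF-8 by default
String str = Files.readString(Paths.get(pathname));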

The problem with the above approach is that, with a sufficiently large file, you end up with an OutOfMemoryError.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

On my machine with 4 GB of RAM and 12 GB of swap, I cannot load a 300 MB file successfully using this method. So we need to look at alternative methods of processing a whole file.
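Note that what matters here is the JVM heap limit (-Xmx) rather than physical RAM; a quick sketch to check what your JVM will actually allow:

// the -Xmx heap limit governs whether the load succeeds, not the machine's RAM
long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
System.out.println("Max heap (MB): " + maxHeapMb);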

3. Loading a Binary File in Chunks

The following code demonstrates how to load and process the bytes in a file (can be a binary file) a chunk at a time.

try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(pathname))) {
    byte[] bbuf = new byte[4096];
    int len;
    while ((len = in.read(bbuf)) != -1) {
        // process data here: bbuf[0] through bbuf[len - 1]
    }
}

4. Reading a Text File Line By Line

Processing a text file is easiest when you do it line by line. There are several methods for doing so. Here is one using a BufferedReader:

try (BufferedReader in = new BufferedReader(new FileReader(pathname))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here
    }
}

5. Using a Scanner

The Scanner class provides another convenient way to read a file line by line, using the hasNextLine() and nextLine() methods.

try (Scanner scanner = new Scanner(new File(pathname))) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // process line here
    }
}

If you need to read line-by-line, I recommend the method above using BufferedReader since the Scanner method is slow as molasses.

6. With Java 8 Streams

Java 8 provides the streams facility, which is useful in a wide variety of cases. Here we can use the Files.lines() method to create a stream of lines from a file, apply any filters and do any processing we want. In the following example, we select lines that contain the string abc and collect the results into a List.

List<String> alist;
try (Stream<String> lines = Files.lines(Paths.get(pathname))) {
    alist = lines.filter(line -> line.contains("abc"))
                 .collect(Collectors.toList());
}

Review

We discussed some methods for loading and processing files efficiently. First off, you could just load the whole file into memory if the file is small enough. For large files, you need to process chunks. A binary file can be processed in chunks of, say, 4 kB. A text file can be processed line by line.

