Java lines from file

How can I read a large text file line by line using Java?

I need to read a large text file of around 5-6 GB line by line using Java. How can I do this quickly?

@kamaci et al. This question should not be marked as a duplicate. «Quickly read the last line» is not an alternative, and it's debatable whether «Quickest way to read text-file line by line» is. The quickest way to do something is not necessarily the common way. Furthermore, the answers below include code, while the most relevant alternative you list does not. This question is useful. It is currently the top Google search result for «java read file line by line». Finally, it's off-putting to arrive at Stack Overflow and find that 1 in every 2 questions is flagged for disposal.

Even though I have been reading comments arguing that SO's close policy sucks, SO persists in it. It's such a narrow-minded developer perspective to want to avoid redundancy at all costs! Just let it be! The cream will rise to the top and the sh*t will sink to the bottom just fine all by itself. Even though a question may have been asked before (which question isn't??), that does not mean that a new question may not be able to phrase it better, get better answers, rank higher in search engines, etc. Interestingly, this question is now ‘protected’.

After Shog’s edit this is indeed a duplicate of stackoverflow.com/q/5800361/103167 but this one has gotten far more activity.

22 Answers

A common pattern is to use

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line.
    }
}

You can read the data faster if you assume there is no character encoding. e.g. ASCII-7 but it won’t make much difference. It is highly likely that what you do with the data will take much longer.

EDIT: A less common pattern to use which avoids the scope of line leaking out of the loop.

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    for (String line; (line = br.readLine()) != null; ) {
        // process the line.
    }
    // line is not visible here.
}

UPDATE: In Java 8 you can do

try (Stream<String> stream = Files.lines(Paths.get(fileName)))

NOTE: You have to place the Stream in a try-with-resource block to ensure the #close method is called on it, otherwise the underlying file handle is never closed until GC does it much later.
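To make that concrete, here is a complete, compilable sketch of the Java 8 approach (the file name is a placeholder):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadLinesDemo {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("textfile.txt"); // placeholder file name

        // try-with-resources guarantees the underlying file handle is closed
        try (Stream<String> stream = Files.lines(path)) {
            stream.forEach(line -> {
                // process the line
                System.out.println(line);
            });
        }
    }
}
```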

What does this pattern look like with proper exception handling? I note that br.close() throws IOException, which seems surprising: what could happen when closing a file that is opened for read, anyway? FileReader's constructor might throw a FileNotFoundException.

If I have a 200MB file and it can read at 90MB/s then I expect it to take ~3s? Mine seem to take minutes, with this «slow» way of reading. I am on an SSD so read speeds should not be a problem?

@JiewMeng I would suspect something else you are doing is taking the time. Can you try just reading the lines of the file and nothing else?

Why not for (String line = br.readLine(); line != null; line = br.readLine())? Btw, in Java 8 you can do try (Stream<String> lines = Files.lines(...)) { for (String line : (Iterable<String>) lines::iterator) { ... } } Which is hard not to hate.

@AleksandrDubinsky The problem I have with closures in Java 8 is that it very easily makes the code more complicated to read (as well as being slower) I can see lots of developers overusing it because it is «cool».

The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;

// Read file line by line
while ((strLine = br.readLine()) != null) {
    // Print the content on the console
    System.out.println(strLine);
}

// Close the input stream
fstream.close();
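As a sketch of the buffer-size point above, the size can be passed as a second constructor argument (16 KB here is an arbitrary example, not a recommendation):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BufferedSizeDemo {
    public static void main(String[] args) throws IOException {
        // second argument is the buffer size in chars; 16384 is an arbitrary example
        try (BufferedReader br = new BufferedReader(new FileReader("textfile.txt"), 16 * 1024)) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```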

Downvoted for poor quality link. There is a completely pointless DataInputStream , and the wrong stream is closed. Nothing wrong with the Java Tutorial, and no need to cite arbitrary third-party Internet rubbish like this.


Once Java 8 is out (March 2014) you’ll be able to use streams:

try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
    lines.forEachOrdered(line -> process(line));
}

Printing all the lines in the file:

try (Stream<String> lines = Files.lines(file, Charset.defaultCharset()))

Use StandardCharsets.UTF_8, use Stream<String> for conciseness, and avoid using forEach() and especially forEachOrdered() unless there's a reason.

@steventrouble Take a look at: stackoverflow.com/questions/16635398/… It’s not bad if you pass a short function reference like forEach(this::process) , but it gets ugly if you write blocks of code as lambdas inside forEach() .

@msayag, You’re right, you need forEachOrdered in order to execute in-order. Be aware that you won’t be able to parallelize the stream in that case, although I’ve found that parallelization doesn’t turn on unless the file has thousands of lines.

Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.

If you just want the default charset you can skip the InputStream and use FileReader.

InputStream ins = null;       // raw byte-stream
Reader r = null;              // cooked reader
BufferedReader br = null;     // buffered for readLine()
try {
    String s;
    if (true) {
        String data = "#foobar\t1234\n#xyz\t5678\none\ttwo\n";
        ins = new ByteArrayInputStream(data.getBytes());
    } else {
        ins = new FileInputStream("textfile.txt");
    }
    r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
    br = new BufferedReader(r);
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
} catch (Exception e) {
    System.err.println(e.getMessage()); // handle exception
} finally {
    if (br != null) { try { br.close(); } catch (Throwable t) { /* ensure close happens */ } }
    if (r != null)  { try { r.close(); }  catch (Throwable t) { /* ensure close happens */ } }
    if (ins != null) { try { ins.close(); } catch (Throwable t) { /* ensure close happens */ } }
}

Here is the Groovy version, with full error handling:

File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
    br.eachLine { line ->
        println line;
    }
}

Absolutely useless closes. There is zero reason to close every stream: if you close the outermost stream, you automatically close all the underlying streams.

I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.

Note that when running the performance tests I didn’t output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.
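A minimal timing harness of the kind described might look like this (the test-file path is a placeholder, and the method under test can be swapped out):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimer {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("c:\\temp\\sample-1GB.txt"); // placeholder test file

        long start = System.nanoTime();
        // method under test; note: no per-line console output during the timed section
        byte[] bytes = Files.readAllBytes(path);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("read " + bytes.length + " bytes in " + elapsedMs + " ms");
    }
}
```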

Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-1GB.txt";
        File file = new File(fileName);

        byte[] fileBytes = Files.readAllBytes(file.toPath());
        char singleChar;
        for (byte b : fileBytes) {
            singleChar = (char) b;
            System.out.print(singleChar);
        }
    }
}

This was tested successfully in Java 8 and 9 but it won’t work in Java 7 because of the lack of support for lambda expressions. It took about 3.5 seconds to read in a 1GB file which put it in second place as far as reading larger files.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-1GB.txt";
        File file = new File(fileName);

        try (Stream<String> linesStream = Files.lines(file.toPath())) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}

Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-1GB.txt";
        FileReader fileReader = new FileReader(fileName);

        try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

You can find the complete rankings for all 10 file reading methods here.



java: interface to get lines from a File or String?

I have a method that iterates over the lines of a file. At present it does the whole dance of opening the file and closing it. Now I want to change the method, so that I can pass in an instance of some interface, possibly Iterator<String>, so that I can either read from a file, or just get the lines from a List if I want to provide the input directly. Is there a convenient way to do this? Writing my own method of deriving an Iterator from a File seems like it would be very tricky to get correct. I guess the closest way I can think of is to use Guava's Files.readLines(), but that's not an iterator, so it has problems with very large files.

Can you elaborate why LineProcessor not being Iterator would be a problem to read large files? Even if you had an iterator, it would still not allow skipping some content without reading it. Whilst with LineProcessor you can always return false to stop processing which is basically the same thing. It’s the same thing, just with reversed control.

5 Answers

java.util.Scanner implements Iterator<String>, so I think that's exactly what you want. Write your method to expect an Iterator<String>, and then you could pass a Scanner opened on a file, or an Iterator from a list of Strings.
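A minimal sketch of that idea (the method name processLines and the sample data are made up; note that Scanner's next() returns tokens, so a line-terminator delimiter is set as the comments below suggest):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

public class LineSourceDemo {
    // works with any Iterator<String>: a Scanner over a file, or a List's iterator
    static int processLines(Iterator<String> lines) {
        int count = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            // process the line here; we just count for the demo
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // from an in-memory list
        List<String> data = Arrays.asList("one", "two", "three");
        System.out.println(processLines(data.iterator())); // prints 3

        // from a Scanner, using line terminators as the delimiter
        Scanner sc = new Scanner("a\nb\r\nc").useDelimiter("\r\n|\n");
        System.out.println(processLines(sc)); // prints 3
    }
}
```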

Huh. Just read the javadoc download.oracle.com/javase/1.5.0/docs/api/java/util/… — Looks like I could do that if I set the delimiter to the end-of-line characters.

I know this sounds bad, but you can consider extending Scanner and writing a finalizer which does .close(). Just don’t tell anybody I suggested this 🙂 By the way, Scanner implements AutoCloseable in Java 7.

oh good, so they’ve fixed it. I was trying to file a bug but Oracle’s site hung when I tried logging in. >:-(

It doesn’t strike me as too difficult to write an Iterator implementation that simply wraps a BufferedReader (around a FileReader or similar).

As is common with self-rolled iterator implementations, you might need to do single-element look-ahead in order to implement hasNext() properly, but asides from that wrinkle you can more or less just delegate to BufferedReader.readLine() .

In fact, it wouldn’t surprise me to learn that there’s already a third-party class that does this (though I’m not aware of one at present).

@Jason S: true, but that’s essentially unavoidable. There’s always a possibility of IOException when doing file IO, so if you want the lines to be read lazily and they’re coming from a file, any invocation of hasNext() / next() could fail with an IO error. The Iterator interface doesn’t give you a clean way to recover here, but unless you’ve cached the whole file beforehand, all possible implementations will face the same problem.
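A sketch of such a wrapper, using the single-element look-ahead described above; since Iterator's methods cannot throw checked exceptions, IOException is rewrapped as UncheckedIOException (the class name BufferedLineIterator is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class BufferedLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // look-ahead buffer so hasNext() can answer truthfully

    public BufferedLineIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // Iterator can't throw checked exceptions
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) throw new NoSuchElementException();
        String current = nextLine;
        advance();
        return current;
    }

    public static void main(String[] args) {
        BufferedLineIterator it =
            new BufferedLineIterator(new BufferedReader(new StringReader("a\nb\nc")));
        while (it.hasNext()) System.out.println(it.next());
    }
}
```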

The recommended usage pattern is:

LineIterator it = FileUtils.lineIterator(file, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

interface ISourceOfLines {
    List<String> getLines();
}

public class FileSource : ISourceOfLines {
    public FileSource(String filename) {
        // store fileName
    }
    public List<String> getLines() {
        // open file and return lines
    }
}

I ended up writing my own interface and two lightweight classes that implement it, one that encapsulates an Iterator<String> and one that takes its input from a File:

import java.io.Closeable;
import java.io.IOException;

public interface LineReader2 extends Closeable {
    String readLine() throws IOException;
}

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

public class LineReaders2 {
    private LineReaders2() {}

    static class FileLineReader implements LineReader2 {
        final private BufferedReader br;

        public FileLineReader(Reader in) {
            this.br = new BufferedReader(in);
        }

        @Override
        public void close() throws IOException {
            this.br.close();
        }

        @Override
        public String readLine() throws IOException {
            return this.br.readLine();
        }
    }

    static class StringIteratorReader implements LineReader2 {
        final private Iterator<String> it;

        public StringIteratorReader(Iterator<String> it) {
            this.it = it;
        }

        @Override
        public void close() {}

        @Override
        public String readLine() {
            return this.it.hasNext() ? this.it.next() : null;
        }
    }

    static public LineReader2 createReader(File f) throws IOException {
        return new FileLineReader(new FileReader(f));
    }

    static public LineReader2 createReader(Iterable<String> iterable) {
        return new StringIteratorReader(iterable.iterator());
    }

    static public LineReader2 createReader(Iterator<String> iterator) {
        return new StringIteratorReader(iterator);
    }
}



How to read all the lines of a file using java code?

I have a strange problem where I have a log file called transactionHandler.log. It is a very big file, having 17102 lines. This I obtain when I do the following on the Linux machine:

wc -l transactionHandler.log
17102 transactionHandler.log
import java.io.*;
import java.util.Scanner;

public class Reader {
    public static void main(String[] args) throws IOException {
        int counter = 0;
        String line = null;

        // Location of file to read
        File file = new File("transactionHandler.log");

        try {
            Scanner scanner = new Scanner(file);
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
                counter++;
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        System.out.println(counter);
    }
}

Have you compared the output of your program with the original logfile? Do you see any differences (and what, if any)? Have you tried with smaller input files? Do you observe the error with any input file, or only with specific ones?

Are you running your Java program on the same Linux machine, or did you copy the file to another machine and run the program there?

I'd recommend running your program with the output redirected to another file. Then run the diff command to compare the original and the new file. I believe you will see the difference quickly.

@Phoenix225 What comes to my mind is that wc -l counts the occurrences of all EOL delimiters. The Scanner of Java probably (need to test to confirm this) will ignore repeated EOL delimiters (this means it will ignore empty lines).

Moreover, the Scanner class has its own limitations with loading large files. Please check this post stackoverflow.com/questions/10336478/…

1 Answer

From what I know, Scanner uses \n as delimiter by default. Maybe your file has \r\n . You could modify this by calling scanner.useDelimiter or (and this is much better) try using this as an alternative:

import java.io.*;

public class IOUtilities {
    public static int getLineCount(String filename) throws FileNotFoundException, IOException {
        LineNumberReader lnr = new LineNumberReader(new FileReader(filename));
        while (lnr.readLine() != null) {}
        return lnr.getLineNumber();
    }
}

According to the documentation of LineNumberReader:

A line is considered to be terminated by any one of a line feed (‘\n’), a carriage return (‘\r’), or a carriage return followed immediately by a linefeed.

so it’s very adaptable for files that have different line terminating characters.

Give it a try, see what it does.
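For completeness, the scanner.useDelimiter fix mentioned at the start of the answer might look like this (a sketch; the pattern accepts \r\n, bare \n, or bare \r, mirroring LineNumberReader's rules):

```java
import java.util.Scanner;

public class ScannerDelimiterDemo {
    public static void main(String[] args) {
        // token-based scanning with an explicit delimiter covering
        // all three common line terminators
        String data = "first\r\nsecond\nthird";
        Scanner scanner = new Scanner(data).useDelimiter("\r\n|\n|\r");
        int count = 0;
        while (scanner.hasNext()) {
            System.out.println(scanner.next());
            count++;
        }
        System.out.println("lines: " + count); // prints "lines: 3"
    }
}
```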

