File hashing in Java

Java: How to create SHA-1 for a file?

What is the best way to create a SHA-1 hash of a very large file in pure Java 6? How would I implement this method:

public abstract String createSha1(java.io.File file); 

3 Answers

Use the MessageDigest class and supply data piece by piece. The example below ignores details like turning byte[] into string and closing the file, but should give you the general idea.

public byte[] createSha1(File file) throws Exception {
    MessageDigest digest = MessageDigest.getInstance("SHA-1");
    InputStream fis = new FileInputStream(file);
    int n = 0;
    byte[] buffer = new byte[8192];
    while (n != -1) {
        n = fis.read(buffer);
        if (n > 0) {
            digest.update(buffer, 0, n);
        }
    }
    return digest.digest();
}

The DigestInputStream class is an alternative. It may not actually be easier to use, but it is worth trying and comparing with the approach above.

Use BufferedInputStream instead of creating your own buffer: InputStream fis = new BufferedInputStream(new FileInputStream(file));

The OP asked for a function that returns the SHA-1 as a String, so I took @jeff's answer and added the missing conversion to String:

/**
 * Read the file and calculate the SHA-1 checksum
 *
 * @param file
 *            the file to read
 * @return the hex representation of the SHA-1 using uppercase chars
 * @throws FileNotFoundException
 *             if the file does not exist, is a directory rather than a
 *             regular file, or for some other reason cannot be opened for
 *             reading
 * @throws IOException
 *             if an I/O error occurs
 * @throws NoSuchAlgorithmException
 *             should never happen
 */
private static String calcSHA1(File file) throws FileNotFoundException,
        IOException, NoSuchAlgorithmException {

    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    try (InputStream input = new FileInputStream(file)) {

        byte[] buffer = new byte[8192];
        int len = input.read(buffer);

        while (len != -1) {
            sha1.update(buffer, 0, len);
            len = input.read(buffer);
        }

        return new HexBinaryAdapter().marshal(sha1.digest());
    }
}
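HexBinaryAdapter lives in javax.xml.bind, which is no longer bundled with the JDK from Java 11 onward. If that matters to you, the same uppercase hex conversion can be sketched with core classes only. The class name HexEncoder and the method toHex are made up for illustration and are not part of the answer above:

```java
final class HexEncoder {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    private HexEncoder() {}

    /** Converts each byte to two uppercase hex characters, keeping leading zeros. */
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;            // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];       // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F];  // low nibble
        }
        return new String(out);
    }
}
```

With this helper, the last line of the method would become something like return HexEncoder.toHex(sha1.digest());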

How to Calculate File Checksum MD5, SHA in Java

A checksum is used to ensure the integrity of a file after it has been transmitted from one storage device to another. It is a way to ensure that the transmitted file is exactly the same as the source file. It functions as a fingerprint of that file. The checksum or hash sum is calculated using a hash function. In this tutorial we will show you how to calculate file checksum using MD5 and SHA algorithms.

Calculate File Checksum

Here is a class that will generate a checksum hash in one of the registered hash algorithms like MD5 or SHA. This class allows you to simply create a checksum of a file using one of the popular hashing algorithms.

package com.memorynotfound.file;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public enum Hash {

    MD5("MD5"),
    SHA1("SHA1"),
    SHA256("SHA-256"),
    SHA512("SHA-512");

    private String name;

    Hash(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public byte[] checksum(File input) {
        try (InputStream in = new FileInputStream(input)) {
            MessageDigest digest = MessageDigest.getInstance(getName());
            byte[] block = new byte[4096];
            int length;
            while ((length = in.read(block)) > 0) {
                digest.update(block, 0, length);
            }
            return digest.digest();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}

Creating a checksum of a File

Using this class to calculate a file checksum is straightforward.

package com.memorynotfound.file;

import javax.xml.bind.DatatypeConverter;
import java.io.File;

public class FileChecksumExample {

    public static void main(String[] args) throws Exception {
        File file = new File("/tmp/test.pdf");
        System.out.println("MD5    : " + toHex(Hash.MD5.checksum(file)));
        System.out.println("SHA1   : " + toHex(Hash.SHA1.checksum(file)));
        System.out.println("SHA256 : " + toHex(Hash.SHA256.checksum(file)));
        System.out.println("SHA512 : " + toHex(Hash.SHA512.checksum(file)));
    }

    private static String toHex(byte[] bytes) {
        return DatatypeConverter.printHexBinary(bytes);
    }
}

Output

Each algorithm produces a different checksum for the same file. You can use any of these checksums as a fingerprint for your file.

MD5    : C58FB4E4ABBAE09557566ED313C18DDB
SHA1   : 8489CEBDF4AC646417E2AAC108AB643AA8299BEE
SHA256 : 53D2ED4AABBE64D4B93A79BA1B579FCAFB53C8890443DE8200F021671322382A
SHA512 : 395294CE9D805B18FA2DFE86DF7F25932DE773451D4EFAA387131429F50EF60F1465FFC1EDCAD77C10C99D88EBBB668A312534ACC34CFF459155B224DD50DFD3

Java File Checksum – MD5 and SHA-256 Hash Examples

A checksum hash is a fixed-length sequence of characters produced by applying a hash algorithm to some content. It is a one-way digest, not an encrypted copy of the content. In this Java hashing tutorial, we will learn to generate the checksum hash for files.

1. Why Generate a File’s Checksum?

Most serious file providers publish a checksum for their downloadable files. A checksum lets us verify that the file we downloaded arrived intact.

A checksum acts as proof of a file's validity: if the file gets corrupted, its checksum changes, which tells us that the file is no longer the same or was altered during transfer.

We can also use a file's checksum to detect changes made by third parties, for example in license files. We provide licenses to clients, which they may upload to their servers. We can then compare checksums to verify that a license file has not been modified after creation.
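Such a cross-check could be sketched as follows. This is a minimal example under our own assumptions: the class ChecksumVerifier and its methods digest, matches, and fromHex are hypothetical names, and the expected checksum is assumed to be available as a hex string.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumVerifier {
    private ChecksumVerifier() {}

    /** Streams the file through the digest so large files are not loaded into memory. */
    static byte[] digest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return md.digest();
    }

    /** Returns true when the file's digest equals the expected hex string (any letter case). */
    static boolean matches(Path file, String algorithm, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digest(file, algorithm), fromHex(expectedHex));
    }

    /** Decodes a hex string (two digits per byte) into bytes. */
    private static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

A call such as ChecksumVerifier.matches(path, "SHA-256", expectedHex) then returns false as soon as a single byte of the file differs from the version the checksum was created for.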

To create a checksum for a file, we need to read the file's content and then generate the hash for it using one of the following methods. Both approaches support multiple hash algorithms, so we can use the same code for other algorithms such as SHA-1 or SHA-512.

2. Generate File Checksum with MessageDigest

The MessageDigest class provides applications with the functionality of a message digest algorithm, such as MD5 or SHA-256. Its getInstance() method returns a MessageDigest object that implements the specified digest algorithm.

Example 1: Generate MD5 Hash for a File in Java

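import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

Path filePath = Path.of("c:/temp/testOut.txt");
byte[] data = Files.readAllBytes(filePath); // loads the whole file into memory
byte[] hash = MessageDigest.getInstance("MD5").digest(data);
String checksum = String.format("%032x", new BigInteger(1, hash)); // zero-padded to 32 hex digits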

Example 2: Generate SHA-256 Hash for a File in Java

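import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

Path filePath = Path.of("c:/temp/testOut.txt");
byte[] data = Files.readAllBytes(filePath); // loads the whole file into memory
byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
String checksum = String.format("%064x", new BigInteger(1, hash)); // zero-padded to 64 hex digits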
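One detail worth knowing when converting a hash through BigInteger: a plain toString(16) drops leading zero digits, so a hash whose first byte is small comes out shorter than expected. The sketch below illustrates this with a made-up three-byte value; the class HexPadding and its method names are our own, for illustration only.

```java
import java.math.BigInteger;

final class HexPadding {
    private HexPadding() {}

    /** Plain BigInteger conversion: leading zero digits are dropped. */
    static String unpadded(byte[] hash) {
        return new BigInteger(1, hash).toString(16);
    }

    /** Pads with zeros to two hex digits per byte (expects a non-empty array). */
    static String padded(byte[] hash) {
        return String.format("%0" + (hash.length * 2) + "x", new BigInteger(1, hash));
    }

    public static void main(String[] args) {
        byte[] hash = {0x00, 0x0f, (byte) 0xa1}; // stand-in for a real digest
        System.out.println(unpadded(hash)); // prints "fa1"
        System.out.println(padded(hash));   // prints "000fa1"
    }
}
```

Padding to two hex digits per byte keeps the checksum string comparable with the values that file providers publish.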

3. Generate File Checksum with Guava

In Google Guava, the ByteSource.hash() method hashes the source's contents with the hash function passed as its argument.

Start with adding the latest version of Guava to the project’s classpath.

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>

Now we can use the hash() function as follows.

Example 1: Generate MD5 Hash for a File in Java

File file = new File("c:/temp/test.txt");
ByteSource byteSource = com.google.common.io.Files.asByteSource(file);
HashCode hc = byteSource.hash(Hashing.md5());
String checksum = hc.toString();

Example 2: Generate SHA-256 Hash for a File in Java

File file = new File("c:/temp/test.txt");
ByteSource byteSource = com.google.common.io.Files.asByteSource(file);
HashCode hc = byteSource.hash(Hashing.sha256());
String checksum = hc.toString();

Java: Calculate SHA-256 hash of large file efficiently

I need to calculate a SHA-256 hash of a large file (or a portion of it). My implementation works fine, but it's much slower than the calculation with C++'s CryptoPP (25 min vs. 10 min for a ~30GB file). What I need is a similar execution time in C++ and Java, so the hashes are ready at almost the same time. I also tried the Bouncy Castle implementation, but it gave me the same result. Here is how I calculate the hash:

int buff = 16384;
try {
    RandomAccessFile file = new RandomAccessFile("T:\\someLargeFile.m2v", "r");

    long startTime = System.nanoTime();
    MessageDigest hashSum = MessageDigest.getInstance("SHA-256");

    byte[] buffer = new byte[buff];
    byte[] partialHash = null;

    long read = 0;

    // calculate the hash of the whole file for the test
    long offset = file.length();
    int unitsize;
    while (read < offset) {
        unitsize = (int) (((offset - read) >= buff) ? buff : (offset - read));
        file.read(buffer, 0, unitsize);

        hashSum.update(buffer, 0, unitsize);

        read += unitsize;
    }

    file.close();
    partialHash = new byte[hashSum.getDigestLength()];
    partialHash = hashSum.digest();

    long endTime = System.nanoTime();

    System.out.println(endTime - startTime);

} catch (FileNotFoundException e)

7 Answers

My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you’re reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.

BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, reading through the file, and getting the calculated hash from the DigestInputStream, instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.
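A minimal sketch of that DigestInputStream approach, assuming Java 7 or later for try-with-resources. The class name StreamingSha256 and the BUFFER_SIZE constant are illustrative choices of mine, not something from the question:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingSha256 {
    // 1 MB read buffer instead of the 16 kB used in the question
    private static final int BUFFER_SIZE = 1 << 20;

    private StreamingSha256() {}

    static byte[] hash(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream in = new DigestInputStream(new FileInputStream(path), md)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            // Every read passes the bytes to the digest as a side effect,
            // so the loop body has nothing left to do.
            while (in.read(buffer) != -1) {
                // intentionally empty
            }
        }
        return md.digest();
    }
}
```

The stream takes care of partial reads, so there is no need to track offsets or chunk sizes by hand.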

I did a few performance tests with older Java versions, and there seems to be a significant difference between Java 5 and Java 6 here. I'm not sure, though, whether the SHA implementation is better optimized or whether the VM simply executes the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:

  • Sun JDK 1.5.0_15 (client): 28MB/s, limited by CPU
  • Sun JDK 1.5.0_15 (server): 45MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (client): 42MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (server): 52MB/s, limited by disk I/O (85-90% CPU load)

I was a little bit curious about the impact of the assembler part in the CryptoPP SHA implementation, as the benchmark results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. I was unfortunately not able to build CryptoPP with gcc on cygwin (the build succeeded, but the generated exe failed immediately). However, I built a performance benchmark with VS2005 (default release configuration), with and without assembler support in CryptoPP, and compared it to the Java SHA implementation on an in-memory buffer, leaving out any disk I/O. I get the following results on a 2.5GHz Phenom:

  • Sun JDK1.6.0_13 (server): 26.2 cycles/byte
  • CryptoPP (C++ only): 21.8 cycles/byte
  • CryptoPP (assembler): 13.3 cycles/byte

Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP’s SHA256.Update function (C++).
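For anyone who wants to repeat the Java side, here is a rough sketch of such an in-memory benchmark. It is not the code I actually ran: the class Sha256Benchmark and its methods are made-up names, it does no JIT warm-up, and the cycles/byte figure assumes you plug in your own CPU clock (2.5 GHz below).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Benchmark {
    private Sha256Benchmark() {}

    /**
     * Hashes totalBytes of zeros in chunkSize pieces and returns the throughput
     * in bytes/second. Assumes totalBytes is a multiple of chunkSize.
     */
    static double measureThroughput(long totalBytes, int chunkSize)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] chunk = new byte[chunkSize]; // zero-filled, reused for every update
        long start = System.nanoTime();
        for (long done = 0; done < totalBytes; done += chunkSize) {
            md.update(chunk, 0, chunkSize);
        }
        md.digest();
        long elapsedNanos = System.nanoTime() - start;
        return totalBytes / (elapsedNanos / 1e9);
    }

    /** cycles/byte = clock frequency (Hz) / throughput (bytes/s). */
    static double cyclesPerByte(double clockHz, double bytesPerSecond) {
        return clockHz / bytesPerSecond;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // 1 GiB in 1 MiB chunks keeps the run short; scale up for steadier numbers.
        double throughput = measureThroughput(1L << 30, 1 << 20);
        System.out.printf("%.1f MB/s, %.1f cycles/byte at 2.5 GHz%n",
                throughput / 1e6, cyclesPerByte(2.5e9, throughput));
    }
}
```

Reusing one zero-filled chunk keeps disk I/O and memory bandwidth out of the picture, so the number mostly reflects the digest implementation itself.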

I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only approximately half the throughput compared to the results from the VS exe. I am not sure how much of the difference is attributable to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get more exact results from gcc right now.
