MD5 (Message Digest 5) is a cryptographic hash function commonly used to verify that a file has not been modified. Instead of confirming that two datasets are identical by comparing the raw data, you compute an MD5 checksum for each dataset and compare the checksums to make sure they match.

MD5 has known weaknesses that make it unsuitable for advanced cryptographic applications, but it is perfectly acceptable for standard file checks.

What is a hash function?

A hash function folds an input array of any size into a fixed-length bit string; for MD5, the output is 128 bits long. What is it for? Suppose you have two arrays and need to compare them quickly for equality. A hash function can do this for you: if the two arrays have different hashes, the arrays are guaranteed to be different, and if the hashes are equal, the arrays are most likely equal.
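The comparison described above can be sketched in Python with the standard hashlib module. The helper name md5_hex and the sample arrays are illustrative assumptions, not part of the original text.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Illustrative inputs: two equal arrays and one that differs in its last byte.
a = bytes(range(256)) * 4
b = bytes(range(256)) * 4
c = a[:-1] + b"\x00"

# The digest is always 16 bytes (128 bits), whatever the input size.
print(hashlib.md5(a).digest_size * 8)

# Different digests guarantee that the arrays differ.
print(md5_hex(a) != md5_hex(c))

# Equal digests mean the arrays are most likely equal.
print(md5_hex(a) == md5_hex(b))
```

Comparing two 32-character strings is much cheaper than comparing the arrays byte by byte, which is the point of hashing them first.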

Most often, however, hash functions are used to check the integrity of a password, file, string, etc. For example, when downloading a file from the Internet, you often see a line next to it like b10a8db164e0754105b7a99be72e3fe5. This is the file's hash. If you run the downloaded file through the MD5 algorithm and get the same string, you can say that the file is most likely genuine (with some reservations, of course).
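A minimal sketch of such a download check, assuming Python's standard hashlib module. The function names file_md5 and matches_published_hash are hypothetical helpers introduced here for illustration.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published: str) -> bool:
    """Compare the computed digest with the hash shown next to the download."""
    # Published hashes may differ in case or carry stray whitespace.
    return file_md5(path) == published.strip().lower()
```

A call might look like matches_published_hash("download.bin", "b10a8db164e0754105b7a99be72e3fe5"), where download.bin is a placeholder file name. A match only shows that the file corresponds to the published hash; it does not prove the hash itself came from a trusted source.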

History

MD5 is a hash function developed by MIT Professor Ronald L. Rivest. It was developed in 1991 as a more robust version of the previous MD4 hash function and is described in RFC 1321. Hans Dobbertin later found shortcomings in the MD4 algorithm.

In 1993, Bert den Boer and Antoon Bosselaers showed that pseudo-collisions are possible in the algorithm: different initialization vectors can produce the same digest for the same input message.

In 1996, Hans Dobbertin announced a collision in the algorithm, and already at that time it was proposed to use other hashing algorithms such as Whirlpool, SHA-1, or RIPEMD-160.

Because the hash is only 128 bits long, birthday attacks become worth considering. In March 2004, the MD5CRK project was launched to expose the algorithm's vulnerability using a birthday attack. The project ended in August 2004, when Wang Xiaoyun, Lai Xuejia, Feng Dengguo, and Yu Hongbo discovered vulnerabilities in the algorithm.
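To give a rough sense of why a 128-bit digest invites birthday attacks, the sketch below uses the standard birthday approximation for an ideal hash function. The function name birthday_bound is a hypothetical helper, and the estimate says nothing about the structural weaknesses of MD5 itself.

```python
import math


def birthday_bound(bits: int, probability: float = 0.5) -> float:
    """Approximate number of random inputs that must be hashed before two of
    them collide with the given probability, for an ideal hash of `bits` bits.
    Uses the approximation n ~ sqrt(2 * N * ln(1 / (1 - p))) with N = 2**bits.
    """
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


n = birthday_bound(128)
print(f"about {n:.2e} hashes")          # on the order of 10**19
print(f"about 2**{math.log2(n):.1f}")   # a little over 2**64
```

In other words, a generic collision search needs on the order of 2^64 hash evaluations rather than 2^128, which is why the work could be distributed across many machines.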

On March 1, 2005, Arjen Lenstra, Xiaoyun Wang, and Benne de Weger demonstrated the construction of two X.509 certificates with different public keys and the same MD5 hash.

In late 2008, US-CERT urged software developers, website owners, and users to stop using MD5 for any purpose, as research had shown the algorithm to be unreliable.

Using an MD5 Checker or MD5 Generator

Microsoft File Checksum Integrity Verifier (FCIV) is a free tool that can generate MD5 checksums from files, not just text.

One easy way to get an MD5 hash of a sequence of letters, symbols, and numbers is the Miracle Salad MD5 Hash Generator. There are many similar tools, such as MD5 Hash Generator, PasswordsGenerator, and OnlineMD5.

The same hashing algorithm always produces the same result for the same input. This means you can use one MD5 calculator to get the MD5 checksum of a specific text and then use a completely different MD5 calculator and get the same result. This holds for every tool that generates a checksum with the MD5 hash function.
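As a small illustration of this determinism, here is a sketch using Python's standard hashlib module as one more "MD5 calculator". The sample text is an arbitrary choice, and the encoding caveat in the comments is an assumption about how the tools being compared handle text.

```python
import hashlib

text = "hello"

# MD5 works on bytes, so two tools only agree when they hash the same byte
# sequence (same text encoding, no extra trailing newline).
digest = hashlib.md5(text.encode("utf-8")).hexdigest()
print(digest)  # 5d41402abc4b2a76b9719d911017c592

# Feeding the same bytes piece by piece yields the same digest.
piecewise = hashlib.md5()
piecewise.update(b"hel")
piecewise.update(b"lo")
print(piecewise.hexdigest() == digest)  # True
```

If two calculators disagree on the same text, the usual cause is a difference in the bytes they were actually given, not in the algorithm.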

Reliability

It is sometimes claimed that an MD5 hash cannot be cracked, but this is not true. Many programs recover the source word from its hash. The vast majority of them simply try every word in a dictionary, but there are also methods such as RainbowCrack, which generates a set of hashes from a set of characters in advance and then searches the resulting database for the target hash.
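The two approaches can be sketched as follows in Python. The function names and the tiny word list are hypothetical. The precomputed table is a plain hash-to-word dictionary, a deliberate simplification: real rainbow tables store chains built with reduction functions to save space, which this sketch does not implement.

```python
import hashlib
from typing import Dict, Iterable, Optional


def md5_hex(word: str) -> str:
    """MD5 digest of a word, hashed as UTF-8 bytes."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def dictionary_attack(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate in turn and return the first one matching the target."""
    target = target_hex.strip().lower()
    for word in candidates:
        if md5_hex(word) == target:
            return word
    return None


def build_lookup(candidates: Iterable[str]) -> Dict[str, str]:
    """Precompute hash -> word once so that later lookups are a single dict access."""
    return {md5_hex(word): word for word in candidates}


words = ["password", "letmein", "hello"]   # illustrative word list
target = md5_hex("hello")                  # pretend only this hash is known

print(dictionary_attack(target, words))    # hello
print(build_lookup(words).get(target))     # hello
```

Both approaches only recover inputs that appear in the candidate set, which is why long, unusual source strings are much harder to recover this way.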

MD5, like any hash function, is also subject to collisions: different source strings that produce the same hash. In 1996, Hans Dobbertin found pseudo-collisions in MD5 using a specific initialization buffer (ABCD). In 2004, Chinese researchers Wang Xiaoyun, Feng Dengguo, Lai Xuejia, and Yu Hongbo announced that they had discovered a vulnerability in the algorithm that allowed collisions to be found in a short time (1 hour on an IBM p690 cluster). Then, in 2006, Czech researcher Vlastimil Klima published an algorithm that finds collisions on an ordinary computer with any initial vector (A, B, C, D), using a method he called tunneling.