File containing its own checksum

Is it possible to create a file that will contain its own checksum (MD5, SHA1, whatever)? And to upset jokers I mean checksum in plain, not function calculating it.

Answers


I created a piece of code in C, then ran bruteforce for less than 2 minutes and got this wonder:

The CRC32 of this string is 4A1C449B

Note the must be no characters (end of line, etc) after the sentence.

You can check it here: http://www.crc-online.com.ar/index.php?d=The+CRC32+of+this+string+is+4A1C449B&en=Calcular+CRC32

This one is also fun:

I killed 56e9dee4 cows and all I got was...

Source code (sorry it's a little messy) here: http://www.latinsud.com/pub/crc32/


Yes. It's possible, and it's common with simple checksums. Getting a file to include it's own md5sum would be quite challenging.

In the most basic case, create a checksum value which will cause the summed modulus to equal zero. The checksum function then becomes something like

(n1 + n2 ... + CRC) % 256 == 0

If the checksum then becomes a part of the file, and is checked itself. A very common example of this is the Luhn algorithm used in credit card numbers. The last digit is a check digit, and is itself part of the 16 digit number.


Check this:

echo -e '#!/bin/bash\necho My cksum is 918329835' > magic

"I wish my crc32 was 802892ef..."

Well, I thought this was interesting so today I coded a little java program to find collisions. Thought I'd leave it here in case someone finds it useful:

import java.util.zip.CRC32;

public class Crc32_recurse2 {

    public static void main(String[] args) throws InterruptedException {

        long endval = Long.parseLong("ffffffff", 16);

        long startval = 0L;
//      startval = Long.parseLong("802892ef",16); //uncomment to save yourself some time

        float percent = 0;
        long time = System.currentTimeMillis();
        long updates = 10000000L; // how often to print some status info

        for (long i=startval;i<endval;i++) {

            String testval = Long.toHexString(i);

            String cmpval = getCRC("I wish my crc32 was " + testval + "...");
            if (testval.equals(cmpval)) {
                System.out.println("Match found!!! Message is:");
                System.out.println("I wish my crc32 was " + testval + "...");
                System.out.println("crc32 of message is " + testval);
                System.exit(0);
            }

            if (i%updates==0) {
                if (i==0) {
                    continue; // kludge to avoid divide by zero at the start
                }
                long timetaken = System.currentTimeMillis() - time;
                long speed = updates/timetaken*1000;
                percent =  (i*100.0f)/endval;
                long timeleft = (endval-i)/speed; // in seconds
                System.out.println(percent+"% through - "+ "done "+i/1000000+"M so far"
                        + " - " + speed+" tested per second - "+timeleft+
                        "s till the last value.");
                time = System.currentTimeMillis();
            }       
        }       
    }

    public static String getCRC(String input) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes());
        return Long.toHexString(crc.getValue());
    }

}

The output:

49.825756% through - done 2140M so far - 1731000 tested per second - 1244s till the last value.
50.05859% through - done 2150M so far - 1770000 tested per second - 1211s till the last value.
Match found!!! Message is:
I wish my crc32 was 802892ef...
crc32 of message is 802892ef

Note the dots at the end of the message are actually part of the message.

On my i5-2500 it was going to take ~40 minutes to search the whole crc32 space from 00000000 to ffffffff, doing about 1.8 million tests/second. It was maxing out one core.

I'm fairly new with java so any constructive comments on my code would be appreciated.

"My crc32 was c8cb204, and all I got was this lousy T-Shirt!"


Certainly, it is possible. But one of the uses of checksums is to detect tampering of a file - how would you know if a file has been modified, if the modifier can also replace the checksum?


Sure, you could concatenate the digest of the file itself to the end of the file. To check it, you would calculate the digest of all but the last part, then compare it to the value in the last part. Of course, without some form of encryption, anyone can recalculate the digest and replace it.

edit

I should add that this is not so unusual. One technique is to concatenate a CRC-32 so that the CRC-32 of the whole file (including that digest) is zero. This won't work with digests based on cryptographic hashes, though.


I don't know if I understand your question correctly, but you could make the first 16 bytes of the file the checksum of the rest of the file.

So before writing a file, you calculate the hash, write the hash value first and then write the file contents.


If the question is asking whether a file can contain its own checksum (in addition to other content), the answer is trivially yes for fixed-size checksums, because a file could contain all possible checksum values.

If the question is whether a file could consist of its own checksum (and nothing else), it's trivial to construct a checksum algorithm that would make such a file impossible: for an n-byte checksum, take the binary representation of the first n bytes of the file and add 1. Since it's also trivial to construct a checksum that always encodes itself (i.e. do the above without adding 1), clearly there are some checksums that can encode themselves, and some that cannot. It would probably be quite difficult to tell which of these a standard checksum is.


There is a neat implementation of the Luhn Mod N algorithm in the python-stdnum library ( see luhn.py). The calc_check_digit function will calculate a digit or character which, when appended to the file (expressed as a string) will create a valid Luhn Mod N string. As noted in many answers above, this gives a sanity check on the validity of the file, but no significant security against tampering. The receiver will need to know what alphabet is being used to define Luhn mod N validity.


There are many ways to embed information in order to detect transmission errors etc. CRC checksums are good at detecting runs of consecutive bit-flips and might be added in such a way that the checksum is always e.g. 0. These kind of checksums (including error correcting codes) are however easy to recreate and doesn't stop malicious tampering.

It is impossible to embed something in the message so that the receiver can verify its authenticity if the receiver knows nothing else about/from the sender. The receiver could for instance share a secret key with the sender. The sender can then append an encrypted checksum (which needs to be cryptographically secure such as md5/sha1). It is also possible to use asymmetric encryption where the sender can publish his public key and sign the md5 checksum/hash with his private key. The hash and the signature can then be tagged onto the data as a new kind of checksum. This is done all the time on internet nowadays.

The remaining problems then are 1. How can the receiver be sure that he got the right public key and 2. How secure is all this stuff in reality?. The answer to 1 might vary. On internet it's common to have the public key signed by someone everyone trusts. Another simple solution is that the receiver got the public key from a meeting in personal... The answer to 2 might change from day-to-day, but what's costly to force to day will probably be cheap to break some time in the future. By that time new algorithms and/or enlarged key sizes has hopefully emerged.


You can of course, but in that case the SHA digest of the whole file will not be the SHA you included, because it is a cryptographic hash function, so changing a single bit in the file changes the whole hash. What you are looking for is a checksum calculated using the content of the file in way to match a set of criteria.


Sure.

The simplest way would be to run the file through an MD5 algorithm and embed that data within the file. You can split up the check sum and place it at known points of the file (based on a portion size of the file e.g. 30%, 50%, 75%) if you wish to try and hide it.

Similarly you could encrypt the file, or encrypt a portion of the file (along with the MD5 checksum) and embed that in the file. Edit I forgot to say that you would need to remove the checksum data before using it.

Of course if your file needs to be readily readable by another program e.g. Word then things become a little more complicated as you don't want to "corrupt" the file so that it is no longer readable.


Need Your Help

Java - shorten the path of a directory

java file path directory

I would like to shorten the path of a directory.

Configure WAMP server to send email

php wamp phpmailer

Is there a way that I can configure the WAMP server for PHP to enable the mail() function?