How do I check that two folders are the same in Linux?
I have moved a web site from one server to another and I copied the files using SCP
I now wish to check that all the files have been copied OK.
How do I compare the sites?
Count the files in a folder?
Get the total file size for a folder tree?
Or is there a better way to compare the sites?
If you were using scp, you could probably have used rsync.
rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.
If you were doing something like this on the old host:
scp -r from/my/dir newhost:/to/new/dir
Then you could do something like
rsync -a --progress from/my/dir newhost:/to/new/dir
The '-a' is short for 'archive' which does a recursive copy and preserves permissions, ownerships etc. Check the man page for more info, as it can do a lot of clever things.
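For the verification step specifically, here is a minimal sketch of a dry run that lists what rsync would still change, without copying anything. The function name verify_copy is made up for illustration and the paths in the usage comment are just the ones from the example above. The -c flag makes rsync compare file contents instead of only size and modification time, which is slower but stricter.

```shell
# Hypothetical helper: list what rsync would still transfer, without copying.
#   -a  archive mode (recursive, preserves permissions, times, etc.)
#   -n  dry run: report only, change nothing
#   -c  compare by checksum rather than size + modification time
#   -i  itemize: print one line per item that differs
# The trailing slashes compare the contents of the two directories.
verify_copy() {
    rsync -anci "$1"/ "$2"/
}

# Example (paths are placeholders):
#   verify_copy from/my/dir newhost:/to/new/dir
# No output means rsync found nothing left to update.
```

Because -a also compares metadata, a line may show up for a file whose content is fine but whose timestamp or permissions differ.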
Use diff with the recursive -r and quick -q options. In my experience it is the best and by far the fastest way to do this, provided both directories are accessible from the same machine.
diff -r -q /path/to/dir1 /path/to/dir2
It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.
If it shows no output, all the files are the same, otherwise it will list the files that are different.
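If you want to use this in a script rather than read the output, diff's exit status carries the same information. A small sketch, with a function name (dirs_match) that I made up for illustration:

```shell
# Hypothetical wrapper around diff's exit status:
#   0 = no differences, 1 = differences found, 2 = trouble (e.g. unreadable path)
dirs_match() {
    diff -r -q "$1" "$2" >/dev/null 2>&1
}

# Example (paths are placeholders):
#   if dirs_match /path/to/dir1 /path/to/dir2; then
#       echo "identical"
#   else
#       echo "different, or diff could not read something"
#   fi
```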
cd website
find . -type f -print | sort | xargs sha1sum
will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
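To make the two lists directly comparable across servers, it helps if both are produced the same way. Here is a rough sketch, assuming GNU find, sort and xargs; the function name manifest and the file names in the usage comment are my own placeholders, not anything from the question.

```shell
# Hypothetical helper: print a sorted SHA-1 manifest with paths relative to
# the directory, so manifests from two hosts are directly comparable.
# -print0 / sort -z / xargs -0 keep file names with spaces intact (GNU tools);
# LC_ALL=C makes the sort order identical on both machines;
# -r stops xargs from running sha1sum when there are no files.
manifest() (
    cd "$1" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha1sum
)

# Example (paths and file names are placeholders):
#   manifest website > old.sha1        # on the old server
#   manifest /to/new/dir > new.sha1    # on the new server
#   diff old.sha1 new.sha1             # after copying one list to the other host
```

Only the small manifest files need to travel between the servers, not the site itself.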
maybe you can use something similar to this:
(cd <original root dir> && find . -type f | sort | xargs md5sum) > original
(cd <new root dir> && find . -type f | sort | xargs md5sum) > new
diff original new
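If you used scp, you can probably also use rsync over ssh. rsync cannot copy between two remote hosts in a single command, so run it on the new server (2.example.com here) and pull from the old one:
rsync -avH --delete-after 1.example.com:/path/to/your/dir /path/to/your/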
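rsync verifies every file it transfers with a checksum. By default it decides which files to transfer by size and modification time; add -c (--checksum) if you want existing files compared by content as well.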
Be sure to use the -n option to perform a dry-run. Check the manual page.
I prefer rsync over scp or even local cp, every time I can use it.
If rsync is not an option, md5sum can generate md5 digests and md5sum --check will check them.
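A minimal sketch of that md5sum workflow, assuming GNU md5sum and xargs. The two function names and the paths in the usage comment are invented for illustration.

```shell
# Hypothetical helpers for the md5sum route.
#   write_md5_list DIR       prints an md5 list with paths relative to DIR
#   check_md5_list DIR LIST  verifies DIR against LIST (LIST must be an absolute path)
write_md5_list() (
    cd "$1" || exit 1
    find . -type f -print0 | xargs -0 -r md5sum
)

check_md5_list() (
    cd "$1" || exit 1
    # --quiet prints only the files that fail; the exit status is
    # non-zero if any file is missing or has a different digest
    md5sum --check --quiet "$2"
)

# Example (paths are placeholders):
#   write_md5_list /path/on/old/server > /tmp/site.md5   # on the old server
#   check_md5_list /path/on/new/server /tmp/site.md5     # on the new server
```

One caveat: the check only covers files named in the list, so extra files that exist only on the new server go unnoticed.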
To add to Sidney's reply: if you only want to know whether the same files exist on both sides, it is not really necessary to filter with -type f or to produce hashes. Comparing the plain find listings is enough, and that works for empty directories as well. In reply to zidarsk8: keep the sort. Unlike ls, find does not sort its output; it returns entries in directory order, which can differ between two machines.
To summarize, the top 3 answers would be the following (P.S. it is nice to do a dry run with rsync):
diff -r -q /path/to/dir1 /path/to/dir2
diff <(cd dir1 && find . | sort) <(cd dir2 && find . | sort)
rsync --dry-run -avh from/my/dir newhost:/to/new/dir
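Building on the find listing idea, here is a sketch that prints only the paths present in one tree but not the other, using comm instead of diff. The function name only_in_one is made up, and it compares names only, not file contents.

```shell
# Hypothetical helper: show paths that exist in only one of two trees.
# Column 1 = only in the first directory,
# column 2 (tab-indented) = only in the second directory.
only_in_one() (
    export LC_ALL=C   # comm needs both lists sorted the same way
    tmp=$(mktemp -d) || exit 1
    (cd "$1" && find . | sort) > "$tmp/one"
    (cd "$2" && find . | sort) > "$tmp/two"
    comm -3 "$tmp/one" "$tmp/two"   # -3 hides lines common to both lists
    rm -rf "$tmp"
)

# Example (paths are placeholders):
#   only_in_one /path/to/dir1 /path/to/dir2
```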
Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.
Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.
I have moved a web site from one server to another and I copied the files using SCP
You could do this with rsync, it is great if you just want to mirror something.
Update: Seems like @rjack beat me to the rsync answer by 6 seconds :-)
I would add this as a comment to Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do:
(cd [dir1] && find . -type f -print0 | sort -z | xargs -0 [preferred hash function]) > [file1]
(cd [dir2] && find . -type f -print0 | sort -z | xargs -0 [preferred hash function]) > [file2]
diff -y [file1] [file2]
Just from experimenting, I also like to pass the -W ### argument (the output width in columns) to diff and write the output to a file, which makes the side-by-side view easier to read and parse than in the terminal.
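As a rough illustration of that last tip, assuming GNU diff: the width of 200 and the function name side_by_side are arbitrary choices of mine, and --suppress-common-lines is an extra I added so that only the differing lines end up in the report.

```shell
# Hypothetical helper: side-by-side diff, 200 columns wide, differing lines only.
# --suppress-common-lines is a GNU diff option.
side_by_side() {
    diff -y -W 200 --suppress-common-lines "$1" "$2"
}

# Example (file names are placeholders):
#   side_by_side [file1] [file2] > report.txt
```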