How does Amazon EBS Snapshot determine modified files?
Typically, a sync application detects file modifications based on file size changes and modified dates. Does this hold true for Amazon EBS Snapshot processes?
Recently I noticed that a fixed-size file appeared to be excluded from a snapshot even though the file had been modified numerous times in the meantime. When the most recent snapshot (in fact, any snapshot created after the first one) was loaded into a new instance, the file contained only its initial content. Even the file's modified date was set to the initial date.
This made me wonder how Amazon EBS Snapshot determines modified files. Are there any settings I can change to ensure fixed-size files are captured in snapshots correctly?
It does detect modifications, but at the block level. That is, it detects changed low-level blocks rather than files, so file sizes and modified dates play no role in what gets included.
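To illustrate why a fixed-size file is still picked up, here is a minimal sketch of block-level change detection. It is a toy model, not how EBS is implemented internally: the 4-byte `BLOCK_SIZE` and the helper names `split_blocks` and `changed_blocks` are made up for this example, and it compares block contents directly purely for illustration.

```python
BLOCK_SIZE = 4  # toy block size in bytes (assumption; real block sizes are far larger)


def split_blocks(volume: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a raw volume image into fixed-size blocks."""
    return [volume[i:i + block_size] for i in range(0, len(volume), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks that differ between two equal-length images."""
    old_blocks = split_blocks(old, block_size)
    new_blocks = split_blocks(new, block_size)
    return [i for i, (a, b) in enumerate(zip(old_blocks, new_blocks)) if a != b]


# Same total size before and after; one byte was overwritten in place.
before = b"AAAABBBBCCCCDDDD"
after = b"AAAABBBBCCXCDDDD"

print(changed_blocks(before, after))  # [2]
```

The volume size is identical in both images, yet block 2 is reported as changed, because the comparison never looks at file metadata at all.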
EBS does use block-level diffs to determine what to store (as @Dmitry says). However, EBS also keeps whatever data from earlier snapshots is needed to fully re-create the file system as it was when a given snapshot was taken.
If your snapshot shows an old state of a given file, you certainly are looking at an old snapshot.
Amazon EBS snapshots are incremental backups, meaning that only the blocks on the device that have changed since your last snapshot will be saved. If you have a device with 100 GBs of data, but only 5 GBs of data has changed since your last snapshot, only the 5 additional GBs of snapshot data will be stored back to Amazon S3. Even though the snapshots are saved incrementally, when you delete a snapshot, only the data not needed for any other snapshot is removed. So regardless of which prior snapshots have been deleted, all active snapshots will contain all the information needed to restore the volume.
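The incremental behaviour described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `SnapshotStore` and its methods are hypothetical names, each "block" is a single byte string, and the real service's storage layout is not public in this detail.

```python
class SnapshotStore:
    """Toy model: a snapshot is a list of block ids, and blocks are shared."""

    def __init__(self):
        self.blocks = {}     # block id -> block data
        self.snapshots = {}  # snapshot name -> list of block ids
        self._next_id = 0

    def take(self, name, volume_blocks, parent=None):
        """Store only blocks that differ from the parent; return how many were stored."""
        parent_ids = self.snapshots.get(parent, [])
        ids, stored = [], 0
        for i, data in enumerate(volume_blocks):
            if i < len(parent_ids) and self.blocks[parent_ids[i]] == data:
                ids.append(parent_ids[i])  # unchanged: reuse the existing block
            else:
                self.blocks[self._next_id] = data  # changed or new: store it
                ids.append(self._next_id)
                self._next_id += 1
                stored += 1
        self.snapshots[name] = ids
        return stored

    def delete(self, name):
        """Remove a snapshot, dropping only blocks no other snapshot needs."""
        removed = self.snapshots.pop(name)
        still_needed = {bid for ids in self.snapshots.values() for bid in ids}
        for bid in set(removed) - still_needed:
            del self.blocks[bid]

    def restore(self, name):
        """Rebuild the full volume from a single snapshot."""
        return [self.blocks[bid] for bid in self.snapshots[name]]


store = SnapshotStore()
v1 = [b"A", b"B", b"C", b"D"]
v2 = [b"A", b"X", b"C", b"D"]  # only the second block changed

print(store.take("snap1", v1))                  # 4
print(store.take("snap2", v2, parent="snap1"))  # 1
store.delete("snap1")
print(store.restore("snap2") == v2)             # True
```

Deleting `snap1` discards only the old copy of the second block; `snap2` can still be restored in full because the three unchanged blocks it shares with `snap1` are kept.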