Today memories, documents and other important information have moved from paper to digital. Nearly everybody uses a computer and moves more and more assets into a purely digital world. I, too, am constantly building up and enlarging digital storage for my assets, and I rely on it more every day.
This article describes how I organize my assets, especially pictures, and how I do backups that let me sleep well. Let's start with the first step: your assets, i.e. your important data.
Organizing Your Data
On Windows and Linux, most of your data is stored in the so-called user or home directory. But as not everything under this tree is worth (or meant) to be backed up, I organize my data as follows:
My pictures go into a folder named my_pictures. This is not the folder the system generated (I don't like any predefined semantics or operations); it is simply a folder inside my home folder. Below it there is just a flat structure with one folder for each "Event" or date.
For me it is essential to keep the hierarchy flat. The only subdirectories I create are ones like "prints" or "mail".
Similarly, the folder "my_data" holds all essential data: documents, scans, whatever is of importance.
Versioning is more important than you may think at first glance. Imagine you are writing your thesis and simply call the file thesis.odt. Even if you copy it every now and then to a USB drive or another storage device, I promise you, there will come a time when you are looking for an important change you made recently and have to scan all the copies of thesis.odt, hoping that one of them is the most recent.
My strategy is simple: if I make an important change to an image, document or other file, I add a version number to the file name. So I would name the file above thesis_v1-0.odt. Minor changes are saved to the same file, but very soon I do a "save as" to thesis_v1-1.odt. That makes it very easy to see which file is the most recent and to compare versions (and how up to date they are) between backup media and the primary source. I do this with images too (when editing them), and some software does it automatically (e.g. darktable when exporting JPEGs).
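The naming scheme is easy to script. The following is only a minimal sketch, assuming the name_vMAJOR-MINOR.ext pattern from the example above; the function name next_version is made up for illustration, and the rule of starting unversioned files at v1-0 is an assumption.

```shell
#!/usr/bin/env bash
# Print the next minor version of a file name that follows the
# name_vMAJOR-MINOR.ext scheme, e.g. thesis_v1-0.odt -> thesis_v1-1.odt.
# next_version is a hypothetical helper, not part of any existing tool.
next_version() {
    local name=$1
    # Groups: base name, major number, minor number, optional extension.
    local pattern='^(.*)_v([0-9]+)-([0-9]+)(\.[^.]*)?$'
    if [[ $name =~ $pattern ]]; then
        local base=${BASH_REMATCH[1]} major=${BASH_REMATCH[2]}
        local minor=${BASH_REMATCH[3]} ext=${BASH_REMATCH[4]}
        # 10# forces base 10 so a minor number like 09 is not read as octal.
        printf '%s_v%s-%s%s\n' "$base" "$major" "$((10#$minor + 1))" "$ext"
    else
        # No version suffix yet: start at v1-0 and keep any extension.
        case $name in
            *.*) printf '%s_v1-0.%s\n' "${name%.*}" "${name##*.}" ;;
            *)   printf '%s_v1-0\n' "$name" ;;
        esac
    fi
}
```

Calling next_version thesis_v1-0.odt prints thesis_v1-1.odt, which can then be used as the target of a copy or "save as".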
Caution: this does not work with files where the name matters (source code) or with RAW-sidecar files!
But in most cases I am fine with that. This is also one of the core principles of my backup strategy: I never overwrite target files, but always add the new versions to the backup media.
Online Backup Media
One of the most important points about backups is that doing them has to be very, very simple and convenient: it has to be just "one click" or one command at the shell, with everything else done automatically. A backup that forces you to organize and think will be done too rarely. So a backup medium that is always connected to your workplace and online is one of the best ways to do backups regularly. My solution is a NAS connected to my home network, which is automatically mounted on my workstation at boot time, always at the same path.
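Because the backup relies on the NAS being mounted at a fixed path, a script can check for the mount before copying anything. This is a sketch under assumptions: the path /mnt/nas/backup and the function name nas_is_mounted are placeholders, and mountpoint is the util-linux tool found on most Linux systems.

```shell
#!/usr/bin/env bash
# Succeed only if the given path is an active mount point.
# nas_is_mounted is a hypothetical helper name.
nas_is_mounted() {
    mountpoint -q "$1"
}

# Assumed mount path; replace it with wherever your NAS is mounted.
BACKUP_ROOT=${BACKUP_ROOT:-/mnt/nas/backup}

if nas_is_mounted "$BACKUP_ROOT"; then
    echo "NAS is mounted at $BACKUP_ROOT, safe to run the backup."
else
    echo "NAS is not mounted at $BACKUP_ROOT, skipping the backup." >&2
fi
```

Without such a guard, a backup run while the NAS is offline would quietly fill the empty mount directory on the local disk.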
Doing the Backup
So after downloading images from the camera into my structure, I go to the Bash prompt and simply enter backup.sh (more exactly, there is a symbolic link named bu.sh which points, as you may guess, to the most recent version of backup.sh, e.g. backup_v1-6.sh).
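Keeping such a link up to date takes one command. A minimal sketch, using the file names from the example above; the wrapper function link_latest is invented for illustration.

```shell
#!/usr/bin/env bash
# Point a stable short name at the newest versioned script.
# link_latest is a hypothetical helper name.
link_latest() {
    local target=$1 link=$2
    # -s creates a symbolic link, -f replaces an existing link,
    # -n treats an existing link to a directory as a plain file.
    ln -sfn "$target" "$link"
}

# Example call, run in the directory that holds the scripts:
# link_latest backup_v1-6.sh bu.sh
```

After each new script version, repeating the call with the new file name moves bu.sh to it, so the command typed for a backup never changes.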
Only after having done the backup, I delete the images from the camera.
As my data organization and versioning guarantee that relevant changes to files or images lead to a new file with a new (higher) version number, I can do the backup using rsync with the --ignore-existing option. That is exactly what my backup script does, starting from the different directories where my assets reside (my_pictures, my_data and so on).
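The core of such a script can be very small. The following is a sketch and not my actual backup.sh: the function name backup_assets and the example paths are assumptions, while -a and --ignore-existing are standard rsync options.

```shell
#!/usr/bin/env bash
# Copy each asset directory to the backup location without ever
# overwriting a file that already exists there.
# backup_assets and all paths are illustrative assumptions.
backup_assets() {
    local source_root=$1 backup_root=$2
    shift 2
    local dir
    for dir in "$@"; do
        # Make sure the target directory exists on the backup side.
        mkdir -p "$backup_root/$dir"
        # -a recurses and keeps timestamps and permissions;
        # --ignore-existing skips files already present on the target.
        # The trailing slashes copy the contents of the source directory
        # into the directory of the same name on the backup side.
        rsync -a --ignore-existing \
            "$source_root/$dir/" "$backup_root/$dir/" || return 1
    done
}

# Example call (paths are placeholders):
# backup_assets "$HOME" /mnt/nas/backup my_pictures my_data
```

Because existing target files are skipped, an edited file only reaches the backup if it was saved under a new version name, which is exactly what the versioning scheme ensures.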
Offline Backup Media
Following the 3-2-1 backup strategy, I use two USB drives which I store off-site (e.g. at my office), always having just one of the two at home for a refresh. As rumors on the internet say that off-the-shelf USB drives contain refurbished or low-quality disks, I assemble the USB drives myself using NAS-grade disks and a good, passively cooled metal case.
Dangling Files – The Deleting Sync
Sometimes I do not have the time to organize everything from the camera nicely in place. I first download the images to a kind of temporary or staging directory, which is always part of my main image directory my_pictures. The backup then covers this directory as well, everything exists twice, and I can format the SD card.
Later on I move the images to the corresponding directories or delete low-quality images. The backup still holds the old copies!
Thus I developed a safe method to get rid of those: I wrote a Bash script, https://github.com/bashforever/safeback, which does the following:
- it does the rsync backup described above
- after the rsync backup (which uses ignore-existing and is thus incremental), it reverses the search and scans the target (backup) location for files that are not found on the source (primary) side. Every file found on the backup location but not on the source side is moved to a subdirectory named SAVE (the name can be changed). Furthermore, for each of these dangling files, a Bash remove line is written to a text file (rmsafe.sh). After safeback has run, you can inspect the list of dangling files in the generated rmsafe.sh. Edit this file and then "source" it in Bash to really delete these files.
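The reverse scan from the second step can be sketched as follows. This is an illustrative reimplementation of the idea, not the code from the safeback repository; the function name collect_dangling and its argument order are assumptions, and it covers only the scan, not the rsync step.

```shell
#!/usr/bin/env bash
# Move files that exist only on the backup side into a SAVE folder and
# write one "rm" line per moved file to a review script.
# Illustrative sketch only; not the code of the safeback project.
collect_dangling() {
    local source_dir=${1%/} backup_dir=${2%/}
    local rm_script=${3:-rmsafe.sh}
    local save_dir=$backup_dir/SAVE
    local file rel
    : > "$rm_script"    # start with an empty review script
    # List regular files on the backup side, skipping the SAVE folder.
    find "$backup_dir" -path "$save_dir" -prune -o -type f -print0 |
    while IFS= read -r -d '' file; do
        rel=${file#"$backup_dir"/}
        if [[ ! -e $source_dir/$rel ]]; then
            # Keep the relative folder structure inside SAVE.
            mkdir -p "$save_dir/$(dirname "$rel")"
            mv "$file" "$save_dir/$rel"
            # %q quotes the path so names with spaces survive sourcing.
            printf 'rm -- %q\n' "$save_dir/$rel" >> "$rm_script"
        fi
    done
}

# Example call (paths are placeholders):
# collect_dangling "$HOME/my_pictures" /mnt/nas/backup/my_pictures
```

Nothing is deleted by the function itself. The files are only moved aside, and the actual removal happens when you review the generated script and source it.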
This method helps if you want to make an early backup but clean up later on.
Files with persistent names, like source code or XMP files, are not handled correctly by this method! It relies on new file versions being created whenever data changes.
I hope I could give you some insight into my backup methods. They may differ from others, but maybe they help you when organizing your own backups.