Checking for corrupt FLAC files

Recently I ran into some strange behavior during playback of certain FLAC files in my digital tango collection. At first I thought there must be a bug in my player, but the player’s logs showed that the problem was in the FLAC stream itself. Somehow, some of the files must have become corrupted. This can have different causes, such as power failures and unclean shutdowns, an error on the storage medium, or an incomplete copy task. Once a byte gets changed in a FLAC file, in the worst case the file might not be playable anymore. In my case, I saw two different symptoms: in one, the file would start playing and then the audio output stopped at a certain point; in the other, the corrupted FLAC file triggered an exception in the decoder and crashed the whole player application.

Detecting this kind of corruption is in fact a feature of the FLAC audio format: “Suitable for archiving: FLAC is an open format, and there is no generation loss if you need to convert your data to another format in the future. In addition to the frame CRCs and MD5 signature, FLAC has a verify option that decodes the encoded stream in parallel with the encoding process and compares the result to the original, aborting with an error if there is a mismatch.”

So the good news is that the FLAC format has an integrated MD5 checksum mechanism, which makes an integrity check easy. There is also a command line program to check a FLAC file:

flac -wst flacfile.flac

Here are the command options explained:
-w, --warnings-as-errors Treat all warnings as errors (which cause flac to terminate with a non-zero exit code).
-s, --silent Silent: do not show encoding/decoding statistics.
-t, --test Test (same as -d except no decoded file is written). The exit codes are the same as in decode mode.
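
Because the result of the test is reported through the exit code, it can be checked directly in the shell. The following is a minimal sketch, assuming the flac program is on the PATH; the function name check_flac is my own invention for illustration and not part of any tool:

```shell
#!/usr/bin/env bash
# check_flac FILE: run the integrity test and report the result.
# (check_flac is a hypothetical helper name, not part of flac.)
check_flac() {
    if flac -wst "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        # In the else branch, $? still holds the exit code of flac
        echo "FAILED (exit code $?): $1"
    fi
}

# Test every file given on the command line
for f in "$@"; do
    check_flac "$f"
done
```

Saved under a name of your choice, it can be called with one or more FLAC files as arguments and prints one line per file.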

This is useful for checking an individual FLAC file, but when you have to scan several thousand files it is more practical to put the command into a shell script and run it against the whole music folder. I found this script, which I saved in my home folder:

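cd ~/Musique || exit 1
# Start with a fresh, empty error log on every run
if [[ -f flac-errors.txt ]]; then
    rm flac-errors.txt
fi
touch flac-errors.txt
# Let ** match files in all subdirectories
shopt -s globstar
for file in ./**/*.flac; do
    # Log the exit code and path of every file that fails the test
    flac -wst "$file" 2>/dev/null || printf '%3d %s\n' "$?" "$file" >> flac-errors.txt
done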

The script changes into the Musique folder in my home directory and creates a text file called flac-errors.txt. If the file already exists from an earlier run, the script deletes it and recreates it as an empty file before proceeding.

shopt -s globstar means that Bash will perform recursive globbing on **, therefore matching files in all subdirectories below the current position in the filesystem, rather than only one level down.
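
To see the difference for yourself, here is a small experiment that needs neither flac nor real music files. It is only a sketch: the folder and file names are made up, and it assumes Bash 4 or newer, since older versions do not know globstar.

```shell
#!/usr/bin/env bash
# Build a throwaway tree: one .flac at the top level, two in nested folders.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/album/disc1"
touch "$demo_dir/top.flac" "$demo_dir/album/a.flac" "$demo_dir/album/disc1/b.flac"
cd "$demo_dir" || exit 1

shopt -s nullglob   # a pattern without matches expands to nothing

shopt -u globstar   # default: ** behaves like a single *
without=( ./**/*.flac )

shopt -s globstar   # ** now matches zero or more directory levels
with=( ./**/*.flac )

echo "without globstar: ${#without[@]} file(s)"
echo "with globstar:    ${#with[@]} file(s)"
```

Without globstar the pattern only finds the file exactly one folder down, while with globstar it finds all three, including the one at the top level.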

The for loop goes through all FLAC files in the Musique folder and its subfolders and runs the integrity test on each one. The messages that flac prints are sent to /dev/null, which means they are discarded. If the test fails, meaning that the file is corrupt, the exit code and the path of the file are written to flac-errors.txt with a little formatting. So you get the path of each corrupt FLAC file on its own line in the text file, for later analysis and eventual restoration of the damaged files.
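
Once the scan has finished, the log can be processed further, for example to get a plain list of paths for restoring the files from a backup. This is a rough sketch that assumes the log format written by the printf call above (exit code, a space, then the path); the function name list_corrupt is hypothetical, and file names with trailing spaces or line breaks would not survive it.

```shell
#!/usr/bin/env bash
# list_corrupt LOG: print only the file paths recorded in LOG.
# (list_corrupt is a hypothetical helper name.)
list_corrupt() {
    local code path
    # read puts the first field (the exit code) into code; the rest of
    # the line, including spaces in the file name, ends up in path.
    while read -r code path; do
        printf '%s\n' "$path"
    done < "$1"
}

log=${1:-flac-errors.txt}
if [[ -f $log ]]; then
    echo "$(wc -l < "$log") file(s) failed the test:"
    list_corrupt "$log"
fi
```

Run it in the folder that contains flac-errors.txt, or pass the path of the log as the first argument.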

The script will take quite some time to loop through all files. What I do is open flac-errors.txt for continuous reading in another console, so that I can see new entries as they are written, like this:

tail -f /home/jens/Musique/flac-errors.txt

So running such a test every once in a while is a good idea to check that all the music files in the collection are still OK. This avoids bad surprises during playback!

By the way, the foobar2000 player has such an integrity test built into its interface, and Mixxx writes FLAC stream errors to its log file.

The test script is also useful to run on newly added folders in the music collection, to check that all FLAC files are in good condition and that the encoding worked out well, like a last test before playback or archival. The verify feature of the FLAC audio format is actually a big advantage compared to other formats which don’t have such a mechanism!
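
For checking only a newly added folder, the loop from the script can be wrapped in a function that takes the folder as an argument. Again this is just a sketch under a few assumptions: flac is installed, Bash is version 4 or newer, and the name scan_flac_dir is made up for illustration.

```shell
#!/usr/bin/env bash
# ** matches all subfolder levels; a pattern without matches expands to nothing
shopt -s globstar nullglob

# scan_flac_dir DIR LOG: test every .flac below DIR, append failures to LOG.
# (scan_flac_dir is a hypothetical helper name.)
scan_flac_dir() {
    local dir=$1 log=$2 file
    for file in "$dir"/**/*.flac; do
        flac -wst "$file" 2>/dev/null || printf '%3d %s\n' "$?" "$file" >> "$log"
    done
}

# First argument: folder to scan. Second (optional): log file.
if [[ $# -ge 1 ]]; then
    scan_flac_dir "$1" "${2:-flac-errors.txt}"
fi
```

Unlike the original script, this variant appends to the log instead of recreating it, so the results of several folders can be collected in one file.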