What is the best sampling frequency?

First of all, let’s make clear what “sampling frequency” stands for:

The sampling frequency is how many snapshots of an audio signal your recording device takes in one second.

Simple as that.

Example: 44.1 kHz means that your device takes 44,100 snapshots (samples) of your incoming audio signal every second.
That’s a lot, innit? Yes it is.
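To put a number on it, here is a minimal sketch of the arithmetic. The function name `samples_taken` is just made up for illustration; it only multiplies the sampling frequency by the duration.

```python
def samples_taken(sample_rate_hz: int, seconds: float) -> int:
    """Number of snapshots (samples) taken over a given duration."""
    return round(sample_rate_hz * seconds)


# One second at 44.1 kHz
print(samples_taken(44_100, 1))    # 44100

# A three-minute song at 44.1 kHz (per channel)
print(samples_taken(44_100, 180))  # 7938000
```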

That said,

Best sampling frequencies for audio recording

Option 1: music

When you record music, your purpose is to reproduce an acoustic signal. For plain reproduction and post-production, you can work out the best sampling frequency by understanding one theorem: the Nyquist-Shannon sampling theorem. It’s a good read: I suggest you read the whole thing.

But,

If you’re in a hurry, here’s the TL;DR:

The best sampling frequency is at least 2 times the highest frequency in your incoming signal.

Which, in the case of music, is 20,000 Hz (20 kHz), the upper limit of human hearing. Therefore: 2 x 20,000 = 40,000 Hz.

There we go: the best frequency for audio sampling is around 40 kHz. And talking about industry standards, we have the mighty 44.1 kHz at our disposal.
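If you’d like to see that rule in numbers, here is a rough sketch. Both function names are invented for this example, and `alias_frequency` assumes a single pure tone sampled with no filtering at all: it shows where a tone above half the sampling frequency would wrongly turn up in the recording, which is the problem the theorem warns about.

```python
def minimum_sample_rate(highest_hz: float) -> float:
    """Nyquist-Shannon rule of thumb: twice the highest frequency."""
    return 2 * highest_hz


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling (no filtering)."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sample rate folds back below it.
    return min(folded, sample_rate_hz - folded)


print(minimum_sample_rate(20_000))      # 40000
print(alias_frequency(1_000, 44_100))   # 1000: captured correctly
print(alias_frequency(25_000, 44_100))  # 19100: folded back into the audible range
```

The last line is the reason recording devices filter the signal before sampling it: a 25 kHz tone would otherwise be recorded as a 19.1 kHz tone that was never there.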

Summarizing:

The best frequency for recording, post-producing and reproducing music is 44.1 kHz.

Going above that, for music purposes, is pointless: you’d just get lots of useless data. As you’ve already read in the Nyquist-Shannon sampling theorem, you would be sampling frequencies that our ears can’t hear, and that our modern reproduction devices won’t play back either: every sampling device has low-pass and high-pass filters to keep out frequencies above and below our hearing range. Basically, you would just be pointlessly bloating the size of your files.
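How much bloat are we talking about? Here is a back-of-the-envelope sketch for uncompressed audio. The 16-bit depth, the two channels and the function name `pcm_size_bytes` are my assumptions for the example, not something fixed by the sampling frequency itself.

```python
def pcm_size_bytes(sample_rate_hz: int, seconds: int,
                   bit_depth: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# One minute of 16-bit stereo audio at three common sampling frequencies
for rate in (44_100, 96_000, 192_000):
    size_mb = pcm_size_bytes(rate, 60) / 1_000_000
    print(f"{rate} Hz: {size_mb:.1f} MB per minute")
```

Under these assumptions, one minute goes from roughly 10.6 MB at 44.1 kHz to roughly 46.1 MB at 192 kHz.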

Option 2: scientific purpose

If you’re working in a forensics lab, an audio restoration workshop, or any other place where you make extensive use of heavy spectral processing (noise removal, pitch shifting, harmonic processing, custom-made processing…), then you might want to use higher sampling frequencies. Now your interest goes beyond the realm of the immediately audible: the moment you pitch down a signal, you’re using the inaudible snapshots of your sampled signal. It’s just like raising the FPS of a video camera to slow down the image: you take more snapshots than your eye can actually see in real time.
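A quick sketch of why the extra snapshots help when pitching down. It assumes the simplest possible case, slowing the recording down tape-style so that every frequency is divided by the slowdown factor; real pitch-shifting tools are more sophisticated, and the function names are made up for illustration.

```python
def captured_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency a recording can hold: half the sampling frequency."""
    return sample_rate_hz / 2


def bandwidth_after_slowdown(sample_rate_hz: float, slowdown_factor: float) -> float:
    """Highest frequency left after slowing playback by the given factor."""
    return captured_bandwidth_hz(sample_rate_hz) / slowdown_factor


# Slow a recording to half speed (one octave down)
print(bandwidth_after_slowdown(44_100, 2))  # 11025.0: the top of the audible range is empty
print(bandwidth_after_slowdown(96_000, 2))  # 24000.0: still covers the whole audible range
```

In other words, the content recorded between 20 kHz and 48 kHz at a 96 kHz sampling frequency is what ends up filling the audible range once the signal is slowed down.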

Needless to say: if you’re working in such environments, you already know about this. :P