VLevel 0.3 technical.txt
This is an outline of how VLevel works, which will hopefully be useful
to anyone who helps develop it. (And me in a month.)
Code Layout
The core of VLevel is the VolumeLeveler class. It contains the
look-ahead buffer. It works on blocks of double-precision numbers,
called values (value_t). Some number (channels) of values makes up
a sample. Many of VolumeLeveler's functions take lengths in samples
instead of values, so be careful.
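To make the values-versus-samples distinction concrete, here is a
minimal sketch. Only value_t comes from the text above; the helper
names and the interleaved channel layout are assumptions made for
illustration, not VolumeLeveler's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// value_t is the name used in the text; everything else is hypothetical.
typedef double value_t;

// Number of values occupied by `samples` samples of `channels` channels.
inline std::size_t samples_to_values(std::size_t samples,
                                     std::size_t channels) {
    return samples * channels;
}

// Number of whole samples contained in `values` values.
inline std::size_t values_to_samples(std::size_t values,
                                     std::size_t channels) {
    assert(channels > 0);
    return values / channels;
}

// Largest absolute value within one sample, assuming the buffer is
// interleaved (all channels of sample 0, then all of sample 1, ...).
inline value_t sample_peak(const std::vector<value_t>& buf,
                           std::size_t sample, std::size_t channels) {
    assert(samples_to_values(sample + 1, channels) <= buf.size());
    value_t peak = 0;
    for (std::size_t ch = 0; ch < channels; ++ch) {
        value_t v = std::fabs(buf[sample * channels + ch]);
        if (v > peak) peak = v;
    }
    return peak;
}
```

With two channels, a length of 1024 samples covers 2048 values, which
is the kind of mix-up the warning above is about.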
vlevel-bin drives the VolumeLeveler class, and also uses CommandLine
to parse its options.
Soon, there will be a Ruby script using SoX that makes Ogg, FLAC, and
WAV files work with a nice drag-n-drop GUI.
General Idea
The intent is to make the quiet parts louder. This is done by
computing the volume at each point, then scaling it as described in
the Math section. The complex part is finding out what the volume
is at each point (VolumeLeveler::avg_amp). It must vary smoothly so
the volume changes aren't obvious, but it must always be above the
peaks of the waveform, or clipping could occur.
To do this, there is a fifo buffer. Imagine a graph of position
vs. amplitude showing the contents of the buffer. A horizontal line
is drawn across it, representing VolumeLeveler::avg_amp. From where
avg_amp crosses the y-axis, a line is drawn with the maximum
possible slope to intercept one of the amplitude points. This line,
which is as steep as possible, is the smooth line that avg_amp will
follow.
When the value at the head of the buffer is removed, it is scaled
based on avg_amp. avg_amp is then incremented by the slope of the
line. If we reach the point the line was drawn to (max_slope_pos),
we search the fifo for the next point of highest slope. Otherwise,
we only need to check the incoming sample to see if a line drawn to
it has the highest slope.
y                        y   (a few samples later)
^         ^              ^
|        / max_slope     |
|       /                |
|      /s                |s\---------- avg_amp
|     / s                |s \
|    /  s                |s  \ max_slope
|   /   s   s            |s   s \
|--s-ss-s----avg_amp     |s   s  \
| ss ss s   s            |s   s   ss
|ssssssssss              |ssssss
+------------> x         +---------> x
Sorry for the ASCII art. The result is that the average amplitude
(avg_amp) varies slowly and always stays above the amplitude of each
sample. When the samples are removed, they are scaled based on the
next section.
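The line-following logic above can be sketched roughly as follows.
This is a simplified mono model that tracks amplitudes only. The
class name SlopeTracker, its interface, and the rounding guard are
assumptions for illustration; only avg_amp, max_slope, and
max_slope_pos are names from the text, and this is not the actual
VolumeLeveler code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical, simplified model of the look-ahead logic.
class SlopeTracker {
public:
    struct Output {
        double amp;      // amplitude leaving the buffer
        double avg_amp;  // value of the smooth line at that point
    };

    explicit SlopeTracker(std::size_t length)
        : buf_(length, 0.0), avg_amp_(0.0), max_slope_(0.0),
          max_slope_pos_(1) {
        assert(length >= 2);  // need at least one look-ahead position
    }

    // Push one incoming amplitude and pop the oldest one.
    Output step(double in_amp) {
        assert(in_amp >= 0.0);
        Output out = { buf_.front(), avg_amp_ };
        buf_.pop_front();
        buf_.push_back(in_amp);

        // Move one position along the current line.
        avg_amp_ += max_slope_;
        --max_slope_pos_;
        // Guard (an assumption): rounding must not leave the line
        // just below the new head of the buffer.
        if (avg_amp_ < buf_.front()) avg_amp_ = buf_.front();

        if (max_slope_pos_ == 0) {
            search();  // reached the target point: find the next line
        } else {
            // Otherwise only the incoming amplitude can raise the slope.
            std::size_t last = buf_.size() - 1;
            double slope = (in_amp - avg_amp_) / static_cast<double>(last);
            if (slope > max_slope_) {
                max_slope_ = slope;
                max_slope_pos_ = last;
            }
        }
        return out;
    }

private:
    // Find the steepest line from (0, avg_amp_) to any buffered point.
    void search() {
        max_slope_ = buf_[1] - avg_amp_;
        max_slope_pos_ = 1;
        for (std::size_t i = 2; i < buf_.size(); ++i) {
            double slope = (buf_[i] - avg_amp_) / static_cast<double>(i);
            if (slope >= max_slope_) {
                max_slope_ = slope;
                max_slope_pos_ = i;
            }
        }
    }

    std::deque<double> buf_;     // look-ahead buffer, head at the front
    double avg_amp_;             // line value at the head
    double max_slope_;           // slope of the current line
    std::size_t max_slope_pos_;  // distance to the point the line targets
};
```

Feeding a single spike into a tracker of length 4 makes avg_amp start
rising a few steps before the spike leaves the buffer, and it is at
least as large as the spike when the spike is output.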
Math
Once we have avg_amp, each sample is scaled when it is output
according to this:
output_sample = sample * avg_amp ^ (-strength)
This is derived as follows:
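First, we convert the amplitude of avg_amp to decibels (1 = 0dB).
The usual factor for amplitudes is 20 rather than 10, but it cancels
when we convert back, so the result is the same either way:
avg_amp_db = 10 * log10(avg_amp)
For avg_amp below 1, avg_amp_db is less than zero. We want to scale
it to be closer to zero, in such a way that if strength is 1, it will
become zero, and if strength is 0 it will remain unchanged.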
ideal_amp_db = avg_amp_db * (1 - strength)
ideal_amp_db = 10 * log10(avg_amp) * (1 - strength)
Now we convert back to a linear amplitude:
ideal_amp = 10 ^ (ideal_amp_db / 10)
ideal_amp = 10 ^ (log10(avg_amp) * (1 - strength))
ideal_amp = (10 ^ log10(avg_amp)) ^ (1 - strength)
ideal_amp = avg_amp ^ (1 - strength)
Now we find out what we should multiply the samples by to change
their peak amplitude, avg_amp, to their ideal peak amplitude,
ideal_amp:
multiplier = ideal_amp / avg_amp
multiplier = avg_amp ^ (1 - strength) / avg_amp
multiplier = avg_amp ^ (-strength)
And finally, we multiply the sample by the multiplier:
output_sample = sample * multiplier
output_sample = sample * avg_amp ^ (-strength)
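As a quick check of the derivation, the sketch below computes the
multiplier both from the closed form and step by step through
decibels. The function names are made up for this example, and it
assumes avg_amp is greater than zero.

```cpp
#include <cassert>
#include <cmath>

// Closed form from the derivation: avg_amp ^ (-strength).
inline double multiplier(double avg_amp, double strength) {
    assert(avg_amp > 0.0);  // assumption: silence is handled elsewhere
    return std::pow(avg_amp, -strength);
}

// The same multiplier, following the decibel steps one at a time.
inline double multiplier_via_db(double avg_amp, double strength) {
    assert(avg_amp > 0.0);
    double avg_amp_db = 10.0 * std::log10(avg_amp);
    double ideal_amp_db = avg_amp_db * (1.0 - strength);
    double ideal_amp = std::pow(10.0, ideal_amp_db / 10.0);
    return ideal_amp / avg_amp;
}
```

For example, with avg_amp = 0.25 and strength = 0.5 the multiplier is
2, so a peak of 0.25 becomes 0.5. With strength = 1 the peak becomes
1, and with strength = 0 nothing changes.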
Undoing the effect
If the original value of strength wasn't too close to 1, you can
undo the VLevel effect by giving the undo option. It works by
changing strength as shown below.
When we first leveled, we scaled the amplitudes like so:
ideal_amp_db = avg_amp_db * (1 - strength)
To get that back, we solve for avg_amp_db:
avg_amp_db = ideal_amp_db * 1 / (1 - strength)
In the undo pass, however, the roles are swapped: what was
ideal_amp_db is now the measured level, what was avg_amp_db is now
the target, and the factor (1 - strength) becomes
1 / (1 - strength). Now we skip ahead a bit:
multiplier = avg_amp ^ (1 - strength) / avg_amp
Substituting as explained above and continuing:
multiplier = avg_amp ^ (1 / (1 - strength)) / avg_amp
multiplier = avg_amp ^ ((1 / (1 - strength)) - 1)
multiplier = avg_amp ^ (strength / (1 - strength))
But how do we get VLevel to do this? Well, we can give it any
strength, and it does this:
multiplier = avg_amp ^ (-strength)
And we want it to do this:
multiplier = avg_amp ^ (undo_strength / (1 - undo_strength))
So...
-strength = undo_strength / (1 - undo_strength)
strength = undo_strength / (undo_strength - 1)
By choosing strength as above before starting VLevel, we can then
undo the first VLevel, with no change to the main algorithm.
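Here is a small sketch of that substitution, checked on a single
steady peak amplitude. The function names are hypothetical, and it
ignores the look-ahead buffer entirely, so it only illustrates the
arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Strength to give the unchanged algorithm so that it undoes a pass
// that was made with undo_strength.
inline double undo_to_strength(double undo_strength) {
    assert(undo_strength != 1.0);  // a strength of 1 cannot be undone
    return undo_strength / (undo_strength - 1.0);
}

// Peak amplitude after leveling a steady signal whose peak is avg_amp.
inline double level_peak(double avg_amp, double strength) {
    assert(avg_amp > 0.0);
    return avg_amp * std::pow(avg_amp, -strength);
}
```

Leveling a peak of 0.25 with strength 0.5 gives 0.5. The matching
undo strength is 0.5 / (0.5 - 1) = -1, and leveling 0.5 with
strength -1 gives 0.5 * 0.5 = 0.25 again.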
To be totally precise, we'd also have to make a min_multiplier with
a value of 1 / orig_max_multiplier, but that would be slow, and does
anybody care if we drop the static anyway?
The undo is not perfect, probably because avg_amp moves linearly, not
logarithmically, so there are some rounding errors. Someday I might
try changing that, but it's a big change.