Custom 2-input Neural Network #259
Replies: 5 comments
-
I think so, depending on exactly what you want. The starting point: you want to configure your hardware so PiPedal has a STEREO input: the piezo on the left channel, and the condenser mic on the right channel (or vice versa). You can split the stereo input into two separate chains using a Splitter. Configure the Splitter as an L/R splitter. The left channel is fed to the top effect chain; the right channel is fed to the bottom effect chain. You can then place separate NAM or ML effects on the top and bottom chains.

**Which plugin?**

You probably don't want to use Aida DSP (which is fairly old, and doesn't get on well with PiPedal). There are two main neural net libraries that underlie almost ALL open source neural net audio plugins.
Plugins that use the ML library: Proteus (https://guitarml.com/tonelibrary/tonelib-pro.html) and TooB ML; they share the same file format. Plugins that use the NAM library: TooB NAM.
The ML library supports two basic neural net architectures. The NAM library supports fairly arbitrary neural net architectures. NAM is preferred to ML in all regards, except for the fact that TooB ML runs somewhat faster than TooB NAM for certain classes of models. (And that TooB ML can load the Proteus tone libraries, which are AMAZING.) In all other respects, NAM is a superset of ML. Training procedures are available for both libraries. (I've never done it, you'd have to research this for yourself, but they are out there, and reasonably well documented.)

**Two channels into one model**

If you want to feed both channels into the SAME neural network, NONE of the plugins will do that. And neither library has off-the-shelf code for building and training such a model. They all take ONE audio channel, and an optional CONTROL channel (used for gain, or tone, depending on how the models were trained). The ML library will not support this scenario. The NAM library probably will, but will require some serious programming chops to do so. You would have to:

* design a two-input network architecture;
* work out a training procedure for it;
* produce suitable paired training data;
* modify the NAM C++ model loader code to accept two audio inputs;
* modify (or write) an LV2 plugin that feeds two audio input channels to the NAM DSP code.
All of these tasks are moderately difficult from a programming perspective, but definitely doable. Not for beginners; not beyond the abilities of a competent intermediate programmer. But the first three are serious research projects, and are more expert than I would be comfortable with personally.

**Do you REALLY want to do that?**

I'm not sure you do. Both inputs are conveying pretty much the same information to the neural network. If you train EITHER input against the output you want, I think you'll get pretty much the same thing. Both inputs are going to be highly correlated during training; and if both inputs to the model aren't highly correlated in exactly the same way when playing via the model, I think all bets would be off as to what the model is going to do. Unless you have something in mind that I haven't really thought of.

**So I think that...**

What you really want is separate NAM or ML plugins for each input channel. Give PiPedal a stereo signal, and use an L/R splitter to separate each input channel. There are very good reasons why you might want to train one input channel at a time against an output signal that you have -- particularly for the piezo channel, I would think, since available models that have been trained on piezo input signals are probably pretty rare.

Training a model to generate a simulated condenser mic output from a piezo input: AMAZING idea. :-) And very definitely doable with existing tools and plugins, since it's one input/one output. Several hundred million guitarists would be grateful for such a model, which would allow cheap guitars with nasty piezo outputs to sound like Taylors or Martins. No pressure. :-P (And my old beater Fender acoustic, too. Apologies in advance to anyone who actually LIKES the sound of piezo pickups.)
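To make the "two channels into one model" idea concrete, here is a minimal sketch of what such a network might look like on the training side, assuming a PyTorch-style LSTM that takes the piezo and mic channels as a 2-feature input sequence and predicts one output sample per frame. The class and parameter names are hypothetical; neither the ML nor the NAM training code ships anything like this.

```python
# Hypothetical sketch of a two-input (piezo + condenser mic) recurrent model.
# Neither the ML nor NAM libraries provide this; it only illustrates the idea.
import torch
import torch.nn as nn

class TwoInputLSTM(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        # input_size=2: one feature per audio channel (piezo, condenser mic)
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # one output sample per frame

    def forward(self, x):
        # x: (batch, samples, 2) -> y: (batch, samples, 1)
        y, _ = self.lstm(x)
        return self.head(y)

if __name__ == "__main__":
    model = TwoInputLSTM()
    piezo = torch.randn(1, 4096, 1)   # stand-in for a piezo recording
    mic = torch.randn(1, 4096, 1)     # stand-in for the internal condenser mic
    out = model(torch.cat([piezo, mic], dim=-1))
    print(out.shape)  # torch.Size([1, 4096, 1])
```

Even if a model like this trained well, you would still face the loader and plugin work listed above before it could run in realtime.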
-
Sorry. Just want to drive home that last point a little more, because I think it's a HUGE idea, and somebody needs to try it. Probably you! And because, 10 minutes ago, this might have been a patentable idea that isn't any more, because this counts as "publication". This is an idea that wants to be free! Probably a decade too late for that, but just in case.

**The abstract**

Make a cheap acoustic guitar sound like a very expensive acoustic guitar, using piezo pickup signals as input.

**The procedure**

Capture simultaneous recordings of an acoustic guitar from a piezo bridge pickup and a microphone, internal or external, optionally run through a pre-amp. Train a neural net to simulate the latter signal from the former signal. Use the trained neural net model in a realtime audio signal chain whose input is the signal from piezo pickups on another guitar. And any trivial and obvious permutation of that.

**Discussion**

The piezo signal is obviously going to be affected by soundboard resonance to some extent; but I think it's plausible to imagine that the condenser mic signal captures the resonance of the soundboard and body cavity more completely, and better. If that's true, it's not difficult to imagine that running a signal from a cheap guitar through a model built using the two signals might capture some elements of the sound of a really good guitar -- if the signals were recorded on a really good guitar, but played back on a bad guitar.

The target signal should be taken after the pre-amp, in order to capture any desirable signal shaping of the pre-amp, and should be recorded in such a way as to minimize reverb from the ambient environment (or from the pre-amp), since reverb generally isn't well modeled by neural networks.

The differences between the signals are going to be:

* linear tone differences due to differences in mic response;
* non-linear wave-shaping due to non-ideal behavior of the condenser mic;
* non-linear circuit response of the pre-amplifier used to "sweeten" the overall sound.

All of which are well captured by ML and NAM models.
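A minimal sketch of the data-capture side of that procedure, assuming the two signals were recorded simultaneously to separate mono WAV files (the file names and the soundfile/numpy usage are illustrative, not part of any existing NAM or ML training script): align the two recordings and write them out as an input/target pair for a one-input, one-output trainer.

```python
# Illustrative only: prepare a piezo -> mic training pair from simultaneous recordings.
# Assumes mono WAV files recorded at the same sample rate; file names are hypothetical.
import numpy as np
import soundfile as sf

piezo, sr1 = sf.read("piezo_take1.wav")
mic, sr2 = sf.read("mic_take1.wav")
assert sr1 == sr2, "both signals must share a sample rate"

# Crude time alignment: find the lag that maximizes cross-correlation over the
# first few seconds, then trim both signals to a common, aligned length.
n = min(len(piezo), len(mic), sr1 * 10)
corr = np.correlate(mic[:n], piezo[:n], mode="full")
lag = int(np.argmax(corr)) - (n - 1)
if lag > 0:
    mic = mic[lag:]
else:
    piezo = piezo[-lag:]
length = min(len(piezo), len(mic))

# The result is an ordinary input/target pair for a one-input, one-output trainer.
sf.write("train_input.wav", piezo[:length], sr1)
sf.write("train_target.wav", mic[:length], sr1)
```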
-
Hi Robin,
Thanks a lot for your detailed answer.
The rationale behind my idea of using 2 input sources is that the internal condenser mic and the piezo both have defects, but provide complementary information in terms of frequency and dynamics. On stage I currently mix the two signals. Using a piezo and a condenser mic is something that many people do on stage with acoustic instruments.
My intuition is that a 2-input NN will benefit from this complementary information. It will also help to improve SNR, as the noise in the 2 sources is independent. Having a single NN is also computationally more efficient than 1 NN on each source. Another benefit of having a single NN that merges the 2 input signals is that there won't be phase issues when mixing the 2 inputs.
Of course I will first try your suggestion, which is straightforward (1 NN on each source), but I wanted to know about the prospects for improvement if I choose your framework on stage.
Besides being a musician, I'm also a researcher in NNs applied to images. So the first 3 points you mention actually motivate me a lot (I have to study the specificities of sound vs image signal processing, but I can also transpose some concepts, e.g. data augmentation for robustness to noise, treating sound as images in the frequency domain, etc.).
I will however definitely need guidance/help with the last 2 points, as I don't code in C++ and I'm not familiar with LV2 plugins or realtime processing.
Note that improving the sound of a piezo to match the quality of an external mic is classically done with IRs, but in my experience they are limited. I believe a NN will have much more power to generate harmonics that don't exist in the source, to adapt to how the instrument is played (e.g. when played soft vs loud, I believe different IRs would be needed), and to be more resilient to noise.
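For context on that limitation: applying an IR is a single fixed linear convolution, so the output can only contain frequencies already present in the source, scaled and delayed; it cannot add new harmonics or react to playing dynamics. A tiny sketch of that, with placeholder file names:

```python
# Applying an IR is one fixed linear convolution -- no new harmonics,
# no dependence on playing dynamics. File names below are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

piezo, sr = sf.read("piezo.wav")
ir, sr_ir = sf.read("piezo_to_mic_ir.wav")
assert sr == sr_ir

corrected = fftconvolve(piezo, ir)[: len(piezo)]
corrected /= max(np.max(np.abs(corrected)), 1e-9)  # normalize to avoid clipping
sf.write("piezo_through_ir.wav", corrected, sr)
```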
Best Regards,
Jean
-
**Can I help?**

I can provide you with some help, as long as the modifications to the C++ NAM architectures are trivial.

Providing a modified LV2 plugin that feeds two audio input channels to the NAM DSP code is pretty trivial -- a day or so of work. I can help with that.

Modifications to the NAM C++ library are considerably more perilous, and would have to be limited to very trivial modifications.

The realtime code runs without use of GPUs; so the choice of ML modules offered by the NAM loader is strictly limited: primarily layers of small ConvNet, WaveNet or LSTM modules, of configurable sizes, with -- perhaps -- some ability to customize mixing between layers. There are also a series of ConvXX mixing layers that WaveNet uses, which may or may not be exposed by the loader code.

NAM model files are JSON files that contain the model weights, and metadata that describes how layers are configured and composed in the model. It's based on dumping data from a binary format (can't remember the name, sorry). The binary format is pretty general; NAM's C++ implementation is strictly limited. You may be familiar with the format if you've been doing training of image-processing neural nets. It does provide dynamic loading of layers with configurable sizes. But my off-the-top-of-my-head impression is that it does not support multiple audio inputs -- although I may be wrong about that.

So, very tentatively, I can provide help with basic tweaking of the NAM C++ model loader code, depending on what you need. Ideally, changes would be limited to the introduction of custom mixing layers between major model components. Even minor tweaking of something like a ConvNet module is probably out of scope -- very difficult to replicate the behaviour of TPU code in C++. I haven't really analyzed the code yet, so any promises at this point are tentative. If you're looking for anything other than trivial changes, you might find it productive to have a discussion with Steven Atkinson, author of the NAM library.

My suggestion: get yourself to the point that you're ready to train a model using the existing NAM training libraries. We can then have a serious discussion about what customization would have to be done to the NAM C++ loader code to support it, and whether I can provide the changes or not.

**Is it a good idea?**

After a great deal of thought, maybe. In the real world, with sound running through a PA, the non-linear effects of something like an LR Baggs pre-amp are going to affect how the body and soundboard resonate, which means that ML solutions should produce better results than Impulse-Response-based emulations. This is enormously complicated by the question of whether you are re-amping or performing live (in which case the input signal will incorporate feedback from the model output). So it is perhaps amenable to an ML emulation; but it's going to be complicated.

To my ears, the LR Baggs pre-amps sound way sweeter and warmer than a strictly-digital preamp should. So I do suspect that they have some form of emulation of a non-linear tube-based pre-amp. And, because of the presence of non-linear amplification stages, your approach might produce better results than pure IR solutions. However, I am not totally convinced that the combination of the input signals isn't entirely linear (and therefore better dealt with before feeding signals to the ML models).

You should know that the current-gen ML models optionally support one audio channel and either zero or one input control (which usually controls either tone, or gain). So -- generally -- you will have to train models with one fixed set of tone-control settings. The big challenge with multiple controls -- I would think -- would be producing a reliable, broad spectrum of recorded training data sets.

Anyway. Interesting problem. Good luck with it. Keep me posted.
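For reference, a quick way to see how a particular .nam file composes its layers is simply to read it as JSON and print the architecture and configuration fields. The top-level key names below ("architecture", "config", "weights") reflect my understanding of the current .nam format and may differ between NAM versions; treat this as an exploratory sketch rather than documented API.

```python
# Exploratory sketch: peek at how a .nam model file is structured.
# Top-level key names ("architecture", "config", "weights") are assumptions
# about the current NAM JSON format and may vary between versions.
import json
import sys

with open(sys.argv[1], "r") as f:
    model = json.load(f)

print("top-level keys:", sorted(model.keys()))
print("architecture:  ", model.get("architecture"))      # e.g. "WaveNet", "LSTM"
print("config:        ", json.dumps(model.get("config"), indent=2)[:500])
print("weight count:  ", len(model.get("weights", [])))   # flat list of floats
```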
-
Hi Robin,
Thanks a lot for your answer, and for offering to help with some parts. Of course I will first see if there is a significant improvement with 2 inputs over 2 x 1 input.
I'm not sure I understand what you said about mixing audio inputs. Regarding this matter, my understanding is that there can be phase issues when mixing two distinct mics recording the same source (especially if they are located far away from each other), and my intuition is that ML could solve this issue.
The question of live performance, with problems of feedback but also bleed from other instruments in the stage monitors, is very interesting. I wonder if there is a way to attenuate this using ML. In the realm of image processing, a common practice is to add parasite signals as part of data augmentation at train time, so that the NN learns to separate and reduce them. I will investigate how this can be translated to sound signals, or whether there are other options. Intuitively, that could be an advantage of having two different input sources: a piezo and a condenser mic react differently to feedback, so the NN could be able to extract information from this difference.
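As a rough illustration of that kind of augmentation transposed to audio (the function and variable names here are made up for the example): mix a recording of stage bleed into the clean input at a random SNR during training, while leaving the target unchanged, so the network is pushed to suppress the parasite signal.

```python
# Illustrative augmentation: add stage bleed / noise to the input at a random SNR,
# leaving the training target untouched. Function and file names are hypothetical.
import numpy as np

def add_bleed(clean: np.ndarray, bleed: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `bleed` into `clean` at the requested signal-to-noise ratio (in dB)."""
    bleed = np.resize(bleed, clean.shape)          # loop/trim bleed to match length
    clean_power = np.mean(clean ** 2) + 1e-12
    bleed_power = np.mean(bleed ** 2) + 1e-12
    gain = np.sqrt(clean_power / (bleed_power * 10 ** (snr_db / 10)))
    return clean + gain * bleed

rng = np.random.default_rng(0)
clean = rng.standard_normal(48000)                 # stand-in for a piezo recording
bleed = rng.standard_normal(48000)                 # stand-in for monitor bleed
augmented_input = add_bleed(clean, bleed, snr_db=rng.uniform(10, 30))
# `augmented_input` is what the network sees; the target stays the clean mic signal.
```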
I will get back to you as soon as I have performed tests, which may take a bit of time because I have a lot to learn about sound signal processing first!
Best Regards,
Jean
-
Hi @rerdavies,
I have an acoustic instrument with 2 internal pickups (a piezo and a condenser mic), and I want to use the 2 signals as input to a custom neural network to match an external microphone. From my research on the web I think your framework is the best suited (I can use a USB audio interface).
I have two questions:
1/ Is it possible to have two mono inputs (or one stereo input) to a neural network, with a single mono output, in the PiPedal framework?
2/ Regarding the neural network, ideally I would like to use a custom architecture and training procedure. I saw that there is an LV2 plugin for RTNeural (https://github.com/AidaDSP/aidadsp-lv2). I can easily export a trained neural network to RTNeural. Would it work in the PiPedal framework?
Otherwise, what would you suggest? Can I customise the neural network architectures you currently use for amp modelling?
Thanks,
Jean