Sound mixing, and matching tracks recorded in different ways
  • I'm finalizing a video project, and there's a big issue with our sound recordings.

    Here's the background. While shooting, we obviously got camera mike sound for all takes. But we'd rented and used a couple of shotgun mikes with a Tascam recorder for many scenes, and the sound we got from those is marvelous. Obviously, my question is about making the camera mike tracks sound a bit more like the fuller, shotgun-mike tracks. Or vice versa, if that's what's needed.

    I'm not much of a sound person, so I'd have to understand the principles involved, and then learn to use the Premiere Pro or Audition tools to accomplish the job. Can anyone point me to some good tutorials that might help?

  • @Brian_siano the first thing I would do is find a take for which you have both shotgun and camera audio and put them in the same audio editing program. Play them side by side and switch between the two. This will let you hear the differences, so that you can try to match the camera audio to the shotgun audio or vice versa. Once you can identify the differences, you can find plugins to reduce ambience/reverb, reduce noise, adjust the tonal balance, and adjust the volume level. I would look at iZotope RX 6 and see which version you need to solve the problems you are trying to correct. They have a 10-day trial period, so maybe you could get all the audio repair done within the demo timeframe and not have to purchase anything. If not, by the end of the demo you would know whether it is a tool you will want to have access to indefinitely. My other recommendation is not to work only on headphones but to play it in your room through speakers as well. And to your original question, iZotope has a number of tutorials here:

    I don't work for Izotope, but own and use their products, so I am confident that they offer some good tools for your application.

  • Yikes...
    If it's dialogue, it will be tough, but you want to look at physical modelling to bring both sounds to a common denominator. I would start by fine-tuning the EQ so the waveforms match, then add some ambience, convolution, or reverb to try to thicken the cam mic to match the real mics. You are going to need some noise reduction (NR) as well. You will need a very short reverb, and even then, it may not work.
    Camera mics use heavy compression, so you can experiment with overdriving the compression on the good mics to get them to sound a bit closer to the cam mics. You can also add a bit of noise as you add compression, since cam mics drive up the noise floor. If you have the option to pan, try panning the good mics narrow, as the cam mics will be narrow unless there is some weird surround thing going on, as with Sony. If you can't get it narrow enough, you can also invert one of the channels, which is going to add some cheese, if you want that.
    A lot of convolution programs have "rooms" or spaces. You can put both types of audio into the same space to try to make them sound more similar. Again, don't use a long tail; work with the density.
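A minimal sketch of the narrowing idea, assuming a mid/side width control (my choice of technique; the post only says to pan narrow) and plain Python lists of float samples. `narrow_stereo` is a hypothetical helper, not an Audition or Premiere feature.

```python
def narrow_stereo(left, right, width):
    """Mid/side width control on two equal-length lists of float samples.

    width=1.0 leaves the stereo image unchanged, 0.0 collapses it to mono,
    and values in between narrow it.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0            # what both channels share
        side = (l - r) / 2.0 * width   # the stereo difference, scaled down
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# Example: narrow a shotgun-mic stereo pair to 30% width.
shotgun_l = [0.2, 0.5, -0.1]
shotgun_r = [0.0, 0.3, -0.3]
narrow_l, narrow_r = narrow_stereo(shotgun_l, shotgun_r, 0.3)
```

Scaling only the side component keeps the centre (typically the dialogue) at the same level while the image shrinks toward the narrower camera-mic picture.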

  • I got a lot to learn, then. But I do have a question that might be answered quickly. Is there some way I could add some bass or low-end resonance to a fairly tinny audio track? Not room echo or ambience, but some "bottom?" (And if so, what tool in PP or Audition would be best for this?)

  • Can try

    I mean if you want something other than EQ.

  • If you use Adobe Audition's Studio Reverb with bare-minimum settings so it only affects 200 Hz and below, it will make cheap mics sound like a Sennheiser 416, because it's adding microscopic reverb to only the very low frequencies. Do that after a basic Parametric Equalizer 'D'-shape curve: raise 30 Hz, raise 95 Hz by 2.5 dB, raise 972 Hz by 4.5 dB, raise 1306 Hz by 4 dB, and lower the frequencies around 10 kHz. Finally, multiband compression will bring out the frequencies more evenly; that is always the last step.
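A rough sketch of that 'D'-shape EQ as cascaded peaking filters, using the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. Assumptions: 48 kHz audio, Q = 1.0 for every band, and only the three bands whose boost amounts are given above; the 30 Hz boost and the 10 kHz cut are left out because no amounts are stated. This is not how Audition's Parametric Equalizer is implemented internally, just an illustration of what each band does.

```python
import math


def peaking_biquad(f0, gain_db, fs, q=1.0):
    """RBJ cookbook peaking-EQ coefficients, normalized so a0 == 1."""
    a = 10 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha / a
    b = [(1.0 + alpha * a) / a0, (-2.0 * cos_w0) / a0, (1.0 - alpha * a) / a0]
    den = [1.0, (-2.0 * cos_w0) / a0, (1.0 - alpha / a) / a0]
    return b, den


def apply_biquad(samples, b, den):
    """Direct Form I filtering of a list of float samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - den[1] * y1 - den[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def magnitude_db(b, den, f, fs):
    """Filter gain in dB at frequency f, handy for checking a band's boost."""
    w = 2.0 * math.pi * f / fs
    z_inv = complex(math.cos(w), -math.sin(w))  # e^{-jw}
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    dnm = den[0] + den[1] * z_inv + den[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(num / dnm))


# Bands whose boost amounts the post spells out: (centre Hz, gain dB).
BANDS = [(95.0, 2.5), (972.0, 4.5), (1306.0, 4.0)]


def d_curve(samples, fs=48000):
    """Run the samples through each peaking band in turn."""
    for f0, gain in BANDS:
        b, den = peaking_biquad(f0, gain, fs)
        samples = apply_biquad(samples, b, den)
    return samples
```

Each peaking band reaches exactly its stated gain at its centre frequency and returns to 0 dB far from it, so the bands can be tuned independently by ear.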

  • @hardimpact Thanks. I'll give that a try, and I hope I can learn to understand it.

  • I myself never use Audition for anything, but you can certainly try it. Basically, try everything you can think of in different orders, but keep everything at the highest bit depth. If you just want to work on the low end, you can certainly use multiband compression, but I would look into parallel compression. Remember, compression on a crap track can boost the noise; you may have to go through any gaps manually and lower the levels.
    There are different tools for different types of material. If it is music, you can use a doubler, for example, and you can even use that on voice to thicken it.
    Different EQs and multiband compressors have different sounds. The Gerzon shelves, while somewhat dated, might give a smooth EQ, depending on the material. They are part of the Waves Renaissance EQ. Samplitude has a good multiband compressor. But there are zillions to choose from.
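Parallel compression, sketched minimally: a heavily compressed copy is blended with the untouched signal, so quiet detail comes up relative to the peaks without squashing the transients. The envelope follower, threshold, ratio, and `wet` mix below are illustrative assumptions, not settings from this thread.

```python
import math


def compress(samples, threshold_db=-30.0, ratio=8.0, fs=48000,
             attack_ms=5.0, release_ms=100.0):
    """Hard-knee feed-forward compressor with a one-pole peak envelope follower."""
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        over_db = 20.0 * math.log10(max(env, 1e-9)) - threshold_db
        # Above the threshold, cut so the excess shrinks by the ratio.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        out.append(x * 10 ** (gain_db / 20.0))
    return out


def parallel_compress(samples, wet=0.5, **compressor_settings):
    """Blend the dry signal with a heavily compressed copy of itself."""
    squashed = compress(samples, **compressor_settings)
    return [(1.0 - wet) * dry + wet * comp for dry, comp in zip(samples, squashed)]
```

Below the threshold the wet and dry paths are identical, so quiet passages pass through unchanged; only the loud parts are pulled down, which is why make-up gain afterwards raises the quiet material. As noted above, that also raises the noise in the gaps.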

  • You shouldn't ever be using the tracks on the camera itself, they only exist as scratch reference to line up with your real audio from your Tascam recorder.

  • Okay, I'm making some progress on this, but lemme sound you guys out on a workflow. I might save myself some time and work by using Audition's match volume feature for the clips in a sequence.

    But, as you may know if you've used this feature, sometimes it doesn't work right. For example, one clip in my sequence is just some ambient noise to fill a gap, and Audition boosted that clip to match the dialogue volume. More generally, the problem is that the adjusted clips also have adjusted background noise, so even if the dialogue is at the same volume, there's still some tweaking, noise reduction, and the like to do.

    It looks like a good strategy would be to a) run the Match Volume, and then b) tweak each clip to a closer match, using levels, noise reduction, and the like. Does that seem like a good strategy?

  • I know that with Samplitude you can copy an FFT dynamics snapshot of one recording, then paste the filter onto another recording to help match up the dynamics.

  • If the camera audio is compressed, you will need to decompress it. You could make little edits in the transitions from sounds to background, and duck the background. Samplitude is a good choice unless you want to go high end.

  • Add in some ambience to blend the cuts together.

  • What you don't want to do is boost the noise and then try to remove it. Try to reverse the compression first.

  • I agree, compression should always be the last or second-to-last step. I put the steps in order from least to most destructive.

    1. DC offset - a DC offset error eats into dynamic range, so removing it goes first; it's a critical pre-alignment step.
    2. Match volume - normalize everything to -6 dB peak amplitude so serious errors like clipping, offsets, and phasing are easier to see.
    3. Declip - run a declip effect, remove buzzing with dynamics processing, and notch-filter the rest.
    4. Equalize - run the Parametric Equalizer to reduce wind and hiss with frequency cuts, and give vocal clarity with a 'D' curve.
    5. Phase correct - once the tracks are laid, detect any phasing errors.
    6. Noise removal - declick, de-reverb, remove hiss and wind, learn a sound model/noise reduction, and fix everything else with the healing brush.
    7. Studio Reverb - to add vocal weight.
    8. RMS -23 dB - roughly match the vocals, then begin basic audio editing with levels (vocals centered and 12 dB higher than music, etc.).
    9. Compression - multiband compress SFX at 4:1 and vocals at 2:1 to give a pleasing dynamic range. It's always second to last because any other process affects dynamic range. It should soft-limit for you; hard limits are audible and not recommended.
    10. Quality control - match loudness per ITU-R BS.1770-3: -23 LUFS for film, -16 LUFS for YouTube.
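Steps 1, 2, and 8 are simple arithmetic on the samples. A minimal sketch, assuming float samples in the range -1.0 to 1.0 and measuring dBFS against a full-scale sample value of 1.0. Note that RMS is not loudness: the LUFS targets in step 10 need the K-weighting and gating defined in ITU-R BS.1770, which this sketch does not implement.

```python
import math


def remove_dc(samples):
    """Step 1: subtract the mean so the waveform is centred on zero."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


def peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(x) for x in samples))


def rms_dbfs(samples):
    return 20.0 * math.log10(math.sqrt(sum(x * x for x in samples) / len(samples)))


def apply_gain(samples, from_db, to_db):
    gain = 10 ** ((to_db - from_db) / 20.0)
    return [x * gain for x in samples]


def normalize_peak(samples, target_db=-6.0):
    """Step 2: scale so the largest absolute sample sits at target_db dBFS."""
    return apply_gain(samples, peak_dbfs(samples), target_db)


def normalize_rms(samples, target_db=-23.0):
    """Step 8: scale so the clip's RMS level sits at target_db dBFS."""
    return apply_gain(samples, rms_dbfs(samples), target_db)
```

Both normalizations are a single multiplication by a constant, which is why they don't change a clip's internal dynamic range, only where it sits.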
  • How can one reverse the camera compression? I'm using Audition and PP.

  • Most compression is destructive. The closest 'undo' you can do is Audition's Dynamics Processing effect acting as an expander/compander. The quality depends on whether the audio was 16- or 24-bit, similar to 8-bit vs. 10-bit video. Also, if the audio was hard-limited, the quality may have been reduced as well.
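A minimal downward expander, as a sketch of the 'closest undo' described above: below the threshold, each dB the input falls becomes `ratio` dB of output drop, pushing down the background that the camera's compression lifted. The threshold, ratio, and release time are assumptions to tune by ear; this does not restore anything the camera clipped or limited.

```python
import math


def expander_gain_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Static downward-expansion curve: 0 dB above threshold, extra cut below it."""
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


def expand(samples, fs=48000, threshold_db=-40.0, ratio=2.0, release_ms=50.0):
    """Apply downward expansion to a list of float samples."""
    coeff = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Instant attack, smoothed release, so word onsets are not swallowed.
        env = level if level > env else coeff * env + (1.0 - coeff) * level
        env_db = 20.0 * math.log10(max(env, 1e-9))
        gain = 10 ** (expander_gain_db(env_db, threshold_db, ratio) / 20.0)
        out.append(x * gain)
    return out
```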

  • Brian, is the camera sound stereo? Also, are you doing this because you are missing takes from shotgun mic?

  • @hardimpact I would not normalize the volume until you balance out the compression (using micro edits), otherwise you are just boosting the noise. Also, I would recommend using some sort of physical modelling to match the cam sound to the mic sound, or match both sounds to an intermediary sound space.
    @Brian_Siano although you might be able to do what you want to do in Audition, to get a somewhat better result you probably would have to use a real DAW, or just pay someone to process the audio for you.

  • Peak-amplitude normalization is non-destructive in a 32-bit environment and retains dynamic range; it just makes noise easier to see and work on. Matching mics would be step 8b, 'matching shots', which I didn't list. I am always open to suggestions and would appreciate your input on how you approach 'balancing compression'.

    For example, I use RMS and tweak the mixer to decide whether something needs compression/limiting, i.e. whether it's too close to 0 dB. -23 dB RMS works extremely well for audio editing because it keeps the voices in a 'talking range', so you don't spend years micro-managing each clip. Natural voices will usually peak at around -10 dB, so it actually creates a perfect setup for whispering, normal talking, and yelling, all while preserving dynamic range.

    Usually, if something goes from -30 to 0 dB, it's going to be compressed 4:1, because human ears get tired after a few hours in a theatre with a larger dynamic range.
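The arithmetic behind that 4:1 figure, using a static hard-knee compressor curve. The threshold of -30 dB is an assumption chosen to match the range in the post; with it, a 30 dB input range comes out as 7.5 dB.

```python
def compressor_output_db(level_db, threshold_db, ratio):
    """Static hard-knee compressor curve, levels in dB."""
    if level_db <= threshold_db:
        return level_db
    # Above the threshold, output rises 1 dB for every `ratio` dB of input.
    return threshold_db + (level_db - threshold_db) / ratio


THRESHOLD_DB = -30.0  # assumed: bottom of the -30..0 dB range in the post
RATIO = 4.0

quietest = compressor_output_db(-30.0, THRESHOLD_DB, RATIO)
loudest = compressor_output_db(0.0, THRESHOLD_DB, RATIO)
print(loudest - quietest)  # 7.5
```

In practice the threshold usually sits higher and make-up gain restores the overall level; the ratio only sets how much the range above the threshold shrinks.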

  • @spacewig Yes, that's exactly the situation: shotgun mikes for most of the project, camera mikes for the rest. They're stereo, too.

    I'm going to try to use Audition to plow on through a rough sound cleanup. It won't be perfect, but it'll be better than what we currently have. Eventually, I'll have these skills nailed.

  • What camera audio must you use? Is it dialogue that was not captured with the shotgun mic?

  • @hardimpact it might be that normalizing is lossless in Audition, but I've never tested it. For example, it may automatically add dither; I have no idea. I've never gotten good results from it, but I don't use it that often.
    But the question is how to handle the basic problem of in-camera compression. You could use or refine a ducking limit, so some parts of the background noise would automatically be set below the threshold, but I would just take the time to go chunk by chunk: split each part into audio and background with a crossfade for each bit, and then set each crossfade smoothly so it sounds natural, before introducing any processing. Each background chunk would then have a custom amplitude, set 2-6 dB lower, so that when you process the audio you mainly boost the "good" part, and any dither simply adds detail instead of adding noise. You may need some light dither to help with the compression and convolution, but you only want to add it once. If you tested the system and could see that the normalization was lossless, sure, do the micro edits after boosting it, but even on a high-end DAW I would trim the audio before processing. By using the micro edits you are reversing what the camera did, which is to boost the low-level signals. It won't affect any clipping or brick-wall effects, and of course it won't sound as good as a regular recording.
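The chunk-by-chunk approach amounts to a gain envelope: background chunks get a fixed cut (2-6 dB in the post), and each boundary is a short linear crossfade. A minimal sketch, assuming you have already marked which samples are background by hand; `duck_background` and its default `fade` length (480 samples, 10 ms at 48 kHz) are hypothetical.

```python
def duck_background(samples, background, duck_db=-4.0, fade=480):
    """Lower hand-marked background samples by duck_db with linear crossfades.

    samples:    list of float samples
    background: list of bools, same length (True = background/noise chunk)
    fade:       samples needed for a full dialogue<->background transition
    """
    duck = 10 ** (duck_db / 20.0)
    step = (1.0 - duck) / fade
    gain = duck if background[0] else 1.0
    out = []
    for x, is_bg in zip(samples, background):
        target = duck if is_bg else 1.0
        # Move toward the target by at most one step per sample: a linear ramp.
        if gain < target:
            gain = min(target, gain + step)
        elif gain > target:
            gain = max(target, gain - step)
        out.append(x * gain)
    return out
```

Because the ramp is applied before any other processing, later gain or compression boosts the dialogue more than the already-lowered background.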

  • Yes, I process each clip separately. I did a test in Audition with a generated square-wave tone and 15 random normalizations, some way over 0 dB. I zoomed into the samples and there was 0% dither, so it's 100% mathematically lossless, if you're curious about that. Like I said, the Dynamics Processing effect works well as your 'ducking limit' to remove low-level noise.

    Crossfades would only be necessary if there was still noise in the clips. Once reverb is matched, either by de-reverb or reverb, use iZotope's EQ matching (RX or Ozone) to match EQ across clips, or do it manually in Audition with the Frequency Analysis view and the Parametric Equalizer effect.

    Add room tone manually via Dynamics Processing, or with RX 6 Advanced's Ambience Match to match it automatically. If noise is completely removed, reverb matched, and EQ matched, then a simple -23 dB RMS should make all your clips roughly match across the whole timeline. Finally, lay in the room tone track.

  • Anyway, I was directly comparing iZotope RX 6 to Audition for noise removal. Audition worked well with certain FFT sizes, like 4096, but not so well with others. I usually work backwards because it's easier to hear whether you're on the right track: ramp the intensity of sound removal up to max, then output noise only. This way, if I hear any dialogue, I know I'm not on the right track and simply need to try other settings.

    Anyway, the main reason I am posting is that RX 6 Advanced has all these new dialogue filters, and they all seemed very similar. Apparently, each was machine-learned differently: one on dialogue vs. footsteps and traffic, another on random broadband noise like radio static, and one just for wind. As it turned out, I had one file that had all three problems.

    So I decided to try them out. They all worked reasonably well, at least as well as Audition or slightly better. But I was wondering: what if I tried all three? What order would they go in? Would the added distortion make it worse overall? What I found out is very interesting.

    I found that if you run Dialogue Isolate with a very low setting, like 1.5 separation and -6 dB reduction, and then Voice De-noise in adaptive mode at -6 dB, all that's left is a nice voice with no noise except the wind, and then De-wind works 10x better. Now you're left with clear dialogue with zero distortion. So, apparently you need all three in that specific order. I tried other orders, but my instincts were correct: you use the least damaging filters first, and then all the excess noise falls through the chain in a certain way.

    This is also how I work in Audition, and in grading as well.

    Your thoughts?

  • Quoting: "audition worked well with certain FFT sizes like 4096, but others not so well."

    iZotope RX 6 noise removal is not a dumb one-button thing, especially if you compare the spectral denoisers.

    Quoting: "RX 6 advanced had all these new dialogue filters and they all seemed very similar."

    What do you mean?

    RX 6 has the new, outstanding Dialogue Isolate. And it seems like you don't understand how it works.

    The proper sequence is usually:

    1. Spectral denoise (multiple passes most of the time)
    2. Dialogue Isolate
    3. De-wind, and only on the affected parts (this should be very rare, as wind noise should not happen in the first place).

    The actual limit is how natural the speech sounds.

    Reducing noise to zero, or to very low levels, is also usually a very bad idea.
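Spectral noise reduction, reduced to its core, estimates a noise magnitude per frequency bin and scales each bin down; capping the maximum reduction is one way to avoid pushing noise to zero. The sketch below is generic spectral subtraction on a single frame, not iZotope's or Audition's algorithm: it uses a naive DFT, no windowing or overlap-add, and a hypothetical `max_reduction_db` floor. `removed_part` gives the 'noise only' residual mentioned earlier in the thread; if speech is audible in it, the settings are too aggressive.

```python
import cmath
import math


def spectral_gain(signal_mag, noise_mag, max_reduction_db=12.0):
    """Per-bin spectral-subtraction gain, floored so no bin drops by more than max_reduction_db."""
    floor = 10 ** (-max_reduction_db / 20.0)
    gains = []
    for s, n in zip(signal_mag, noise_mag):
        g = 1.0 - n / s if s > 0 else 0.0
        gains.append(max(g, floor))
    return gains


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def denoise_frame(frame, noise_mag, max_reduction_db=12.0):
    """noise_mag: full-length magnitude spectrum learned from a noise-only section."""
    spec = dft(frame)
    gains = spectral_gain([abs(c) for c in spec], noise_mag, max_reduction_db)
    return idft([c * g for c, g in zip(spec, gains)])


def removed_part(frame, cleaned):
    """The 'noise only' residual: what the pass took out."""
    return [a - b for a, b in zip(frame, cleaned)]
```

With the floor at 12 dB, even bins judged to be pure noise keep a quarter of their amplitude, which leaves some natural background under the voice instead of an unnatural silence.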