Sound mixing, and matching tracks recorded in different ways
  • I'm finalizing a video project, and there's a big issue with our sound recordings.

    Here's the background. While shooting, we obviously got camera mike sound for all takes. But we'd rented and used a couple of shotgun mikes with a Tascam recorder for many scenes, and the sound we got from those is marvelous. Obviously, my question is about making the camera mike tracks sound a bit more like the fuller, shotgun-mike tracks. Or vice versa, if that's what's needed.

    I'm not much of a sound person, so I'd have to understand the principles involved, and then learn to use the Premiere Pro or Audition tools to accomplish the job. Can anyone point me to some good tutorials that might help?

  • 37 Replies
  • The first part is getting a balanced frequency response across the whole range, since speakers/headphones don't respond equally to something like pink noise; i.e., some speakers/headphones are stronger in bass or treble. I'm looking at you, Beats headphones :)

    As for loudness calibration, some people actually own an SPL meter, but the recommendation is to lower the reference setting if you're in a small room. At that point, you might as well calibrate to theatre settings. Sonarworks also does loudness too; I forgot to mention that.

  • @hardimpact

    What is it about? Matching speakers to individual ears?

    Calibration is not about that.

  • I thought this would be useful for frugal filmmakers who don't own an audio editing studio or calibration equipment: free DIY EQ and volume calibration for audio editing.

    "Play pink noise and apply a graphic EQ to just one channel (left or right, doesn't matter). Set all sliders to zero except one at a time, and write down the level where that band matches the apparent volume of the other speaker. Finally, play all frequencies to set the master gain against the pink-noise channel on the other speaker, because EQ'ing changes volume."

    Example: if you work through a 20-band EQ one slider at a time, the pink noise of your left speaker will match the apparent volume of each single frequency band on the right speaker. You can then model the frequency response of poor headphones against a Sony MDL so they don't sound so tinny. Finally, with EQ complete, you can set a comfortable talking level with pink noise at -23 dB in the OS's volume control.

    This can also be applied to video players and even your web browser via an equalizer plugin. The professional way is Sonarworks, but it's not free ;)
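
Since the procedure above leans on pink noise and per-band level matching, here is a minimal Python/numpy sketch of the underlying idea: generate pink (1/f) noise and read off per-octave RMS levels that you could compare between left and right, one band at a time. It is only an illustration of the concept, not the Sonarworks or graphic-EQ workflow itself; the octave band edges and the ten-second duration are arbitrary choices.

```python
import numpy as np

def pink_noise(n_samples, rate=48000, seed=0):
    """Generate pink (1/f) noise by shaping white noise in the frequency domain."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / rate)
    freqs[0] = freqs[1]                     # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)              # power falls off as 1/f
    pink = np.fft.irfft(spectrum, n=n_samples)
    return pink / np.max(np.abs(pink))      # normalize to full scale

def band_rms_db(signal, rate, low, high):
    """RMS level (in dB) of the signal restricted to one frequency band."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    spectrum[(freqs < low) | (freqs > high)] = 0
    band = np.fft.irfft(spectrum, n=len(signal))
    return 20 * np.log10(np.sqrt(np.mean(band ** 2)) + 1e-12)

rate = 48000
noise = pink_noise(10 * rate, rate)   # ten seconds of pink noise
# Per-octave levels: for true pink noise these come out roughly equal,
# which is what makes it useful for comparing bands between speakers.
for low in (63, 125, 250, 500, 1000, 2000, 4000, 8000):
    print(f"{low:>5}-{low * 2:<5} Hz: {band_rms_db(noise, rate, low, low * 2):6.1f} dB")
```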

  • I wish I could pay a skilled audio engineer (and a colorist too), but my work has almost no budget. We are a two-person band right now. Anyway, thank you guys for the advice.

  • My 3 cents: I would not use Audition for anything. You should upload some samples for people to look at, have them take a whack at it, and then pick the one you like best.
    NB: you can definitely "boost the whole track", not sure why you couldn't, and if you want to boost the good bits and not the bad bits, you do this with micro edits. Don't do this in Audition, but if you must, just keep splitting the audio, lower the chunks that have the bad bits, and raise the chunks that have the good bits. There are a million other ways to do this, including multi-band compression, noise pattern reduction, and so on, but if you aren't going chunk by chunk and are instead filtering the whole enchilada, your results will be worse. IMHO the typical YouTube videos on audio are mostly junk, but there are some good ones by Kraznet on Samplitude.
    The other thing to ask is why you aren't paying a skilled audio engineer for a few hours' work.

  • Although more academic than immediately practical, this channel is good for learning about various audio engineering concepts in a succinct manner https://www.youtube.com/channel/UC9KI12liJIeXLpcNqDN4QNQ/videos

  • Thanks a lot @hardimpact!! Now viewing his video tutorial "The easiest audio editing for non-engineers in Adobe Audition".

  • Audition would be a better tool for this. If your voices are widely dynamic even after a -23 RMS pass, then apply multiband compression till it sounds nice, then add BS.1770 loudness last. The Adobe YouTube videos by 'videorevealed' are really, really good.

  • @hardimpact Trying to follow your 10-step guide, but I'm a complete noob at sound. Any video tutorial of this or a similar procedure that I can play (and pause), for noobs?

  • I can't say I've done a great job with the soundtrack, and we're near deadline, but I have one last question.

    I'd like to keep most of the vocal levels within the same decibel range. I obviously can't simply raise or lower the entire track. But is there a way to boost the low levels, and keep the louder levels more or less the same, and have everything within a certain decibel range? Sort of like boosting the gamma in visual data? I think this is what compression's used for, but I'm not sure if I'm using PP's tools for that very effectively.
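
For what it's worth, the "boost the quiet parts, leave the loud parts alone" idea in the question above is essentially upward compression, or leveling. Below is a rough numpy sketch of that gain curve, just to make the concept concrete; it is not how Premiere's or Audition's dynamics tools work internally, and the threshold, ratio, and window size are made-up values. Note that it lifts background noise along with quiet dialogue, which is why replies elsewhere in the thread recommend cleaning up noise (or going chunk by chunk) first.

```python
import numpy as np

def lift_quiet_parts(samples, rate, threshold_db=-30.0, ratio=2.0, win_ms=50):
    """Toy leveler: raise material below the threshold, leave louder material alone.

    samples: mono float audio in the -1..1 range; all parameter values here are
    illustrative, not recommendations.
    """
    win = max(1, int(rate * win_ms / 1000))
    # short-term RMS level in dB
    rms = np.sqrt(np.convolve(samples ** 2, np.ones(win) / win, mode="same"))
    level_db = 20 * np.log10(rms + 1e-9)
    # below the threshold, close part of the gap toward it; above it, unity gain
    gain_db = np.where(level_db < threshold_db,
                       (threshold_db - level_db) * (1.0 - 1.0 / ratio),
                       0.0)
    return samples * 10 ** (gain_db / 20)
```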

  • I tried spectral denoise, and voice denoise worked better for my clip. The noise/wind was all over the dialogue in every frequency and dB range. I tried many combinations, and this combo sounded the best with the least distortion. I hope this answers your questions.

  • Anyways, I was directly comparing iZotope RX 6 to Audition for noise removal. Audition worked well with certain FFT sizes like 4096, but with others not so well.

    iZotope RX 6 noise removal is not a stupid one-key thing, especially if you compare spectral denoisers.

    Anyway, the main reason I am posting is that RX 6 Advanced had all these new dialogue filters, and they all seemed very similar. Apparently, they were each machine-learned in a different way: one on dialogue vs. footsteps and traffic, another on random broadband noise like radio static, and one just for wind. As it turned out, I had one file that had all three problems.

    What do you mean?

    RX 6 has the new, outstanding Dialogue Isolate. And it seems like you're not understanding how it works.

    The proper sequence is usually:

    1. Spectral De-noise (multiple passes most of the time)
    2. Dialogue Isolate
    3. De-wind, and only on the parts affected (very rare, as it should not happen in the first place).

    The actual limit is how natural the speech sounds.

    Reducing noise to zero or to very low levels is also usually a very bad idea.

  • Anyways, I was directly comparing iZotope RX 6 to Audition for noise removal. Audition worked well with certain FFT sizes like 4096, but with others not so well. I usually work backwards, because it's easier to hear whether you're on the right track by ramping the intensity of the removal up to max and outputting noise only. That way, if I hear any dialogue, I am not on the right track and simply need to try other settings.

    Anyway, the main reason I am posting is that RX 6 Advanced had all these new dialogue filters, and they all seemed very similar. Apparently, they were each machine-learned in a different way: one on dialogue vs. footsteps and traffic, another on random broadband noise like radio static, and one just for wind. As it turned out, I had one file that had all three problems.

    So I decided to try them out. They all worked reasonably well, at least as well as or slightly better than Audition. But I was wondering: what if I tried all three? What order would they go in? Would the distortion make things worse overall? What I found out is very interesting.

    I've found that if you run Dialogue Isolate with a very low setting, like 1.5 separation and -6 dB reduction, then do Voice De-noise (adaptive, -6), all that's left is a nice voice with no noise except the wind, and then the de-wind works 10x better. Now you're left with clear dialogue and zero distortion. So apparently you need all three, in that specific order. I tried other orders, but my instinct was right that you use the least damaging filters first, and then all the excess noise falls through the chain in a certain way.

    This is also how I use Audition, and how I approach grading as well.

    Your thoughts?

  • Yes, I process each clip separately. I did a generated square-tone test in Audition with 15 random normalizations, some way over 0 dB. I zoomed into the samples and saw 0% dither, so it's 100% mathematically lossless, if you're curious about that. Like I said, the Dynamics Processing effect works well to remove low-dB noise as your 'ducking limit'.

    Crossfades would only be necessary if there was still noise in the clips. Once reverb is matched, either by de-reverb or added reverb, use the RX/Ozone equalizer to match EQ across clips, or do it manually in Audition with the frequency-analysis view and the Parametric Equalizer effect.

    Add room tone manually via Dynamics Processing, or use RX 6 Advanced's ambience matching to do it automatically. If noise is completely removed, reverb is matched, and EQ is matched, then a simple -23 dB RMS pass should make all your clips roughly match across the whole timeline. Finally, lay in the room-tone track.
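
As a rough illustration of the "-23 dB RMS across clips" step, here is a small Python sketch that measures a clip's plain whole-file RMS and scales it to a target; it is a simplification, not Audition's match-volume tool. The soundfile package is assumed purely for WAV I/O (any reader works), and the file names are hypothetical. Writing back as 32-bit float keeps headroom, but check the reported peak before exporting to a fixed-point format.

```python
import numpy as np
import soundfile as sf  # assumed only for WAV I/O; not part of the Audition workflow

def match_rms(path_in, path_out, target_db=-23.0):
    """Scale one clip so its overall RMS lands at roughly target_db dBFS."""
    data, rate = sf.read(path_in)            # float samples in -1..1
    rms_db = 20 * np.log10(np.sqrt(np.mean(data ** 2)) + 1e-12)
    gain = 10 ** ((target_db - rms_db) / 20)
    out = data * gain
    peak_db = 20 * np.log10(np.max(np.abs(out)) + 1e-12)
    print(f"{path_in}: {rms_db:.1f} dB RMS, gain {20 * np.log10(gain):+.1f} dB, "
          f"new peak {peak_db:.1f} dBFS")
    sf.write(path_out, out, rate, subtype="FLOAT")  # keep 32-bit float headroom
    return gain

# e.g. match_rms("scene12_camera.wav", "scene12_camera_-23rms.wav")  # hypothetical names
```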

  • @hardimpact it might be that normalizing is lossless in Audition, but I've never tested it. For example, it may automatically add dither, I have no idea. I've never gotten good results from it, but I don't use it that often.
    But the question is how to handle the basic problem of in-camera compression. You could use or refine a ducking limit, so some parts of the background noise would automatically be set below the threshold, but I would just take the time to go chunk by chunk: split each part into audio and background with a crossfade for each bit, and smoothly set the crossfade for each part so it sounds natural, before introducing any processing. Each background chunk would then have a custom amplitude, set 2-6 dB lower, and when you process the audio you mainly boost the "good" part, so any dither simply adds detail instead of adding noise.
    You may need some light dither to help with the compression and convolution, but you only want to add it once. If you tested the system and could see that the normalization was lossless, sure, do the micro edits after boosting, but even on a high-end DAW I would trim the audio before processing. With the micro edits you are reversing what the camera did, which is to boost the low-level signals. It won't affect any clipping or brick-wall effects, and it won't sound as good as a regular recording, of course.
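
A crude numpy analogue of the chunk-by-chunk idea described above: drop one region (say, a background-only chunk) by a few dB, with short ramps at the edges so the level change doesn't click. This is only to show the concept; in practice these micro edits are done by hand in the DAW, and the gain and fade length here are arbitrary.

```python
import numpy as np

def gain_region(samples, rate, start_s, end_s, gain_db=-4.0, fade_ms=20.0):
    """Lower (or raise) one chunk of a mono take, ramping in and out of the new level."""
    out = samples.copy()
    start, end = int(start_s * rate), int(end_s * rate)
    fade = min(int(fade_ms / 1000 * rate), (end - start) // 2)
    g = 10 ** (gain_db / 20)
    gains = np.full(end - start, g)
    gains[:fade] = np.linspace(1.0, g, fade)                  # ramp into the new level
    gains[end - start - fade:] = np.linspace(g, 1.0, fade)    # ramp back out
    out[start:end] *= gains
    return out
```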

  • What camera audio must you use? Is it dialogue that was not captured with the shotgun mic?

  • @spacewig Yes, that's exactly the situation: shotgun mikes for most of the project, camera mikes for the rest. They're stereo, too.

    I'm going to try to use Audition to plow on through a rough sound cleanup. It won't be perfect, but it'll be better than what we currently have. Eventually, I'll have these skills nailed.

  • Peak-amplitude normalization is non-destructive in a 32-bit environment and retains dynamic range. It just makes noise easier to see and work on. And matching mics would be step 8b, 'matching shots', not listed. I am always open to suggestions and appreciate your input on how you approach 'balancing compression'.

    For example, I use RMS and tweak the mixer to decide if something needs compression/limiting, i.e. if it's too close to 0 dB. -23 RMS works extremely well for audio editing because it keeps the voices in a 'talking range', so you don't spend years micro-managing each clip. Natural voices will usually peak at around -10, so it actually creates a perfect setup for whispering, normal talking, and yelling, all while dynamic range is preserved.

    Usually, if something goes from -30 to 0 dB, it's going to be compressed 4:1 (roughly a 30 dB range squeezed down to about 7.5 dB), because human ears get tired after a few hours in a theatre with a larger dynamic range.

  • @hardimpact I would not normalize the volume until you balance out the compression (using micro edits), otherwise you are just boosting the noise. Also, I would recommend using some sort of physical modelling to match the cam sound to the mic sound, or match both sounds to an intermediary sound space.
    @Brian_Siano although you might be able to do what you want to do in Audition, to get a somewhat better result you probably would have to use a real DAW, or just pay someone to process the audio for you.

  • Brian, is the camera sound stereo? Also, are you doing this because you are missing takes from the shotgun mic?

  • Most compression is destructive. The closest 'undo' you can do is Audition's Dynamics Processing effect acting as an expander/compander. The quality depends on whether the audio was 16- or 24-bit, similar to 8-bit vs. 10-bit video. Also, if the audio was hard-limited, the quality has possibly been reduced as well.
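
To make the expander idea concrete, here is a toy downward expander in the same spirit as the leveling sketch earlier in the thread: below a threshold, it pushes the signal further down, which is roughly the inverse of what an in-camera compressor/AGC does to quiet passages. It is a conceptual sketch, not Audition's Dynamics Processing effect, and the parameter values are invented.

```python
import numpy as np

def downward_expand(samples, rate, threshold_db=-40.0, ratio=2.0, win_ms=30):
    """Toy downward expander: attenuate material below the threshold even further."""
    win = max(1, int(rate * win_ms / 1000))
    rms = np.sqrt(np.convolve(samples ** 2, np.ones(win) / win, mode="same"))
    level_db = 20 * np.log10(rms + 1e-9)
    # below threshold: every dB under the threshold becomes `ratio` dB under it
    gain_db = np.where(level_db < threshold_db,
                       (level_db - threshold_db) * (ratio - 1.0),
                       0.0)
    return samples * 10 ** (gain_db / 20)
```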

  • How can one reverse the camera compression? I'm using Audition and PP.

  • I agree, compression should always be the last or second-to-last step. I put the steps in order from least to most destructive.

    1. DC offset - power offset errors affect dynamic range, so this goes first; a critical pre-alignment step.
    2. Match volume - normalize everything to -6 dB peak amplitude to make serious errors like clipping, offsets, and phasing easier to spot.
    3. Declip - run the DeClip effect, remove buzzing with Dynamics Processing, and notch-filter the rest.
    4. Equalize - run a parametric equalizer to reduce wind and hiss via frequency cuts, and give vocal clarity with a 'D' curve.
    5. Phase correct - once the tracks are laid, detect any phasing errors.
    6. Noise removal - declick, de-reverb, remove hiss and wind, learn a noise model for noise reduction, and use the healing brush for everything else.
    7. Studio reverb - to add vocal weight.
    8. RMS -23 dB - to roughly match vocals; begin basic audio editing with levels (vocals centered and 12 dB higher than music, etc.).
    9. Compression - multiband compress SFX at 4:1 and vocals at 2:1 to give a pleasing dB range. Always the second-to-last step, because other processes affect dynamic range. It should soft-limit for you; hard limits are audible and not recommended.
    10. Quality control - match loudness to ITU-R BS.1770-3: -23 LUFS for film, -16 LUFS for YouTube (see the loudness-check sketch after this list).
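
For step 10, the loudness check can also be scripted outside Audition. A minimal sketch, assuming the third-party pyloudnorm package (a BS.1770 meter for Python, not part of Audition) and hypothetical file names:

```python
import soundfile as sf
import pyloudnorm as pyln  # ITU-R BS.1770 loudness meter (assumed installed)

data, rate = sf.read("final_mix.wav")      # hypothetical file name
meter = pyln.Meter(rate)                   # BS.1770 K-weighting and gating
loudness = meter.integrated_loudness(data)
print(f"Integrated loudness: {loudness:.1f} LUFS")

# Pull the mix to -23 LUFS for film; use -16.0 for a YouTube deliverable
normalized = pyln.normalize.loudness(data, loudness, -23.0)
sf.write("final_mix_-23LUFS.wav", normalized, rate, subtype="FLOAT")
```
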
  • What you don't want to do is boost the noise and then try to remove it. Try to reverse the compression first.