Blender VSE: How to Fix Rendered Audio Going Out of Sync

Cerbyo · September 17, 2018

Cerbyo writes:

This covers an audio sync issue you may unknowingly run into in Blender at some point if you render with audio+video settings.
Specifically, this issue shows up after you have lined everything up and it plays correctly in VSE playback. It occurs because sync during the actual render follows different rules than VSE playback. This matters most if you render longer videos, such as those over 10 minutes in length. Based on Stack Exchange threads, your individual system settings might affect how noticeable the sync drift is. I can only speak from my own system’s experience and what others have reported online, so run your own tests if this is a concern. In 2.79b I’ve been able to replicate it back to back across 2 different projects. I honestly don’t recall it always happening, but currently I can replicate it every time, so be cautious of it occurring at some point if it isn’t already.
More specifics about other possible solutions, etc., are over on YouTube. Thanks.

Author Cerbyo

7 responses

Reaction

17 September 2018 at 14:15

I’m glad you have highlighted the audio sync issues in Blender, which always prevented me using Blender as a video editor. But suggesting that we use an external program to recombine video/audio is not really a ‘fix’. I wonder if the Blender studio does the audio in Blender for Sintel, Agent 327, etc., or if they secretly have to add it outside Blender?

I had a feeling that this bug had been fixed, because I think the last video I edited about 6 months ago (24p / 48kHz, from an SLR) actually stayed in sync, which was a first for me ;-).

Cerbyo
17 September 2018 at 23:56
I agree that out-of-house solutions aren’t solutions; I posted as much over on YouTube. But in this case ffmpeg is what Blender is actually using to render anyway. It’s literally a case of downloading a “part” of Blender and using it separately. If you are proficient at coding in Python and with ffmpeg you might be able to create an in-house solution, assuming it doesn’t break anything else, but you would effectively be doing Blender development and replacing the outdated system (aka a lot of work).
I understand your viewpoint, but I would urge you to consider things like this as special cases and make an exception. I’ll add that ffmpeg is technically already on your system as part of Blender; you are just downloading a standalone command-line build so you can use it directly.
In order to use Blender effectively as a video editor you have no choice but to rely on external programs. There is no way to do it all ‘in-house’. For example, the audio system in Blender is also a placeholder system (although one dev contests it’s just “different” and more “ideal” for his purposes…fair enough). When you combine two waveforms it will amplify the output waveform, which is dangerous given you have no way of confirming whether clipping will occur in Blender, short of rendering and getting a ‘feel’ for when things are hitting the max. Normally in Audacity and other programs you have the option of combining without increasing the amplitude of the waveform…this is the ideal in my mind. Even the pitch feature in Blender’s audio options is messy and a placeholder system…on VSE playback you have to play from start to finish in order to find the proper position to line things up, and there is no visual indicator either. It’s impractical to use on a 3 minute track that has to line up with multiple video segments (not impossible, but you have to play from the start of the track to find the position for every single video segment).
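To illustrate the clipping concern above, here is a minimal sketch in plain Python. It assumes the combine step is a simple sample-by-sample sum, which is an assumption about how the mix behaves and not something taken from Blender’s source. The function names and the test tones are made up for illustration.

```python
import math

def sine(freq_hz, amplitude, n_samples, rate=48000):
    """Return n_samples of a sine wave as floats in [-amplitude, amplitude]."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(n_samples)]

def mix_sum(a, b):
    """Additive mix: sample-by-sample sum, so peaks can exceed full scale (1.0)."""
    return [x + y for x, y in zip(a, b)]

def mix_average(a, b):
    """Averaged mix: halves the sum, so the peak never exceeds the louder input."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

a = sine(440, 0.8, 48000)
b = sine(440, 0.8, 48000)  # identical and in phase: the worst case for summing

summed = mix_sum(a, b)
averaged = mix_average(a, b)

print(f"sum peak:     {peak(summed):.2f}")    # anything above 1.0 would clip
print(f"average peak: {peak(averaged):.2f}")
```

Checking the peak of a mixdown this way (or simply looking at it in Audacity) gives the visual confirmation that the comment says is missing inside the VSE.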
The other good example is when you need to make a small correction to an audio track you rendered out in Blender. What do you do? Do you toss it back in Blender, do the correction and go through the entire render time again? Do you toss in the entire rendered product and re-encode for a shorter render, but in doing so risk creating artifacts? Do you always create a 16-bit PNG or higher quality transcode of the completed edit so you have something to return to for faster corrective renders (this is actually the recommended method in the manual, but you can imagine the prep time and file sizes)? Or…you just toss it in Audacity, use the many great tools available there, export it as an audio track and recombine it with the video via ffmpeg, making sure you use the -c copy option to avoid re-encoding (very fast, very easy fixing and rendering).
It’s ridiculous to adhere to the Blender placeholder systems. Do yourself a huge favour and consider the usage of third party ones.
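The recombine step described above (video from the render, audio from Audacity, joined with -c copy) could look roughly like the sketch below. The file names are hypothetical, and the helper only builds the command so it can be inspected before running. Stream copy only works when the audio is already in a codec the output container accepts, which is why the example uses FLAC audio in an MKV container.

```python
import shlex

def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that takes the video stream from one file and
    the audio stream from another, writing both out without re-encoding."""
    return [
        "ffmpeg",
        "-i", video_path,     # input 0: the rendered video
        "-i", audio_path,     # input 1: the corrected audio
        "-map", "0:v:0",      # first video stream of input 0
        "-map", "1:a:0",      # first audio stream of input 1
        "-c", "copy",         # stream copy: neither stream is re-encoded
        output_path,
    ]

# Hypothetical file names, for illustration only
cmd = build_mux_command("render_video.mkv", "fixed_audio.flac", "final.mkv")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The explicit -map options make sure any audio already inside the rendered video is ignored in favour of the corrected track.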

Eric D

17 September 2018 at 15:45

Hmm – this surprises me a bit. I have been editing video for personal use for years in Blender, with multiple video tracks, multiple soundtracks, effects, masking, etc… and I have mostly only run into the problem you describe when using video with variable frame rate. The videos I typically edit/re-author are between 1 and 2.5 hours long. There are occasional synchronization issues in the VSE (comparing VSE playback vs. the rendered movie) – especially when using masks – but A/V sync has long been resolved for me by converting variable framerate movies to constant framerate (using Handbrake for example) _before_ adding the movie to the VSE. I also extract the audio from the video with Audacity (which can produce 2 to 6 tracks/channels) and add those separately in the VSE. Doing this, the length of audio and video usually matches within a couple of frames – which over 2.5 hours of video at ~24fps or ~30fps is as close to perfect as one would want, I think.
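To put “within a couple of frames over 2.5 hours” into numbers, here is a small worked calculation. It takes “a couple” to mean 2 frames at 24 fps, which is an assumption for the sake of the example.

```python
def sync_error(duration_hours, fps, frames_off):
    """Return (total_frames, offset_seconds, relative_error) for a length mismatch."""
    total_frames = duration_hours * 3600 * fps
    offset_seconds = frames_off / fps
    return total_frames, offset_seconds, frames_off / total_frames

total, offset, rel = sync_error(2.5, 24, 2)
print(f"{total:.0f} frames, {offset * 1000:.0f} ms offset, {rel:.1e} relative")
```

That works out to 216,000 frames in total, an offset of roughly 83 ms at the very end of the movie, and a relative error under 10 parts per million.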

Cerbyo
18 September 2018 at 01:06
That’s an interesting method, it’s like the opposite of the way a dev I talked to and I use Audacity. We simply toss our audio tracks into Blender, merge them into as few channels as possible, and then export them 1 channel at a time back into Audacity via mixdown to do the fine tuning and editing once the video is complete. The VSE simply serves as a positioning tool to line things up to the video content; we do all main edits and combining of audio tracks in Audacity.
Is there a way to recognize variable frame rate video content? Or rather, to tell whether Blender recognizes it as such? I ask because the footage I get online might be the problem. I never see any labeling to indicate variable frame rate on it. Or perhaps the footage had an error that was in effect causing a pseudo variable frame rate, given the frame rate is generally some form of decimal value (5/0.21). I either leave it and do the project in Blender at the native decimal value (Blender does respect all those decimals!) or change the project workspace to the nearest full value like 24 fps before any editing is done.
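One possible way to check for variable frame rate is to compare the two frame rates that ffprobe (which ships with ffmpeg) reports for a video stream: the nominal rate (r_frame_rate) and the average rate (avg_frame_rate). If they differ noticeably, the file may be variable frame rate. This is only a heuristic and not a definitive test, and it says nothing about how Blender itself interprets the file. The sample JSON strings and the file name below are made up for illustration.

```python
import json
from fractions import Fraction

# Command that would produce JSON like the samples parsed below
FFPROBE_ARGS = [
    "ffprobe", "-v", "error", "-select_streams", "v:0",
    "-show_entries", "stream=r_frame_rate,avg_frame_rate",
    "-of", "json", "input.mp4",   # hypothetical file name
]

def looks_variable(ffprobe_json, tolerance=Fraction(1, 1000)):
    """Heuristic: flag the first video stream when its nominal and average
    frame rates differ by more than the given relative tolerance."""
    stream = json.loads(ffprobe_json)["streams"][0]
    nominal = Fraction(stream["r_frame_rate"])   # e.g. "24000/1001"
    average = Fraction(stream["avg_frame_rate"])
    return abs(nominal - average) > tolerance * nominal

# Made-up sample outputs, not taken from real files
sample_cfr = '{"streams": [{"r_frame_rate": "24000/1001", "avg_frame_rate": "24000/1001"}]}'
sample_vfr = '{"streams": [{"r_frame_rate": "30/1", "avg_frame_rate": "2997/112"}]}'

print(looks_variable(sample_cfr))
print(looks_variable(sample_vfr))
```

Some files report a rate of 0/0, which this sketch does not handle, so treat it as a starting point for your own tests.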
It probably is the format of the video content that causes the conflict with Blender’s audio/video render. So if you are preparing the footage in a way Blender likes, then that’s probably another solution to the problem, though a very time-consuming one.
I honestly only ran into the problem last week, and was taken aback by it and by the documentation I found online about it. I’ve done hour long content as well using the audio+video render and never noticed an issue. The format of that content was different though: they were OBS recordings saved as .flv at integer fps values (30 or 24). There were enough gaps and inconsistencies in my previous projects (3-5 min videos) to confirm something weird has probably always been happening, but those all used H.264 encoded footage I got from online.
I would still default to simply using mixdown. I can see the conversion taking a lot of time and perhaps requiring re-encoding, and if it’s a transcode it means a huge file size that is harder to edit, all to get the footage into a Blender-happy format ahead of importing it into the VSE. I’ll pass on that, given I can’t afford to create unnecessary artifacts or file sizes for my editing purposes.
Plus the quality difference received from mixdown is too good to pass up, depending on the level of quality the video needs.
Eric D
  18 September 2018 at 02:11
  Given the kind of movies I edit, I assume a variable frame rate by default. You can still get frame rate information from something like VLC – but I don’t think it’ll tell you VFR vs. CFR. If you’re familiar enough with Handbrake you can produce virtually lossless constant frame rate output – whether the source used a variable framerate originally or not. So I do it most of the time anyway :). As for sound quality, once I split the audio into separate channels in Audacity, I import them into the Blender VSE and I use the Pan setting on each channel to produce back stereo (with 2 tracks) or surround sound (with 6) – spread anywhere between -2 and 2 depending on how “surround” I want it to sound. The audio in the rendered movie is different from the original movie (because of the panning adjustments I make on purpose), but the quality of the output is extremely good.
  Lastly I don’t think you can pick any constant framerate in relation to the original one. It has to be the same, or one that’s a multiple of it, otherwise you’ll get a movie that has “hiccups” because frames repeat or are cut out because of your conversion. I hope that makes sense.
  In any case that’s the approach that has worked best for me so far and has allowed me to eliminate those AV/Sync problems altogether – regardless of the length of the rendered movie.
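The point about choosing a constant frame rate that is the same as, or a multiple of, the original can be illustrated with a small sketch. It assumes a naive conversion where each output frame simply shows whichever source frame is on screen at that moment. Real converters may blend or retime frames differently, so this only shows where the “hiccups” come from.

```python
from collections import Counter
from fractions import Fraction

def source_frame_usage(src_fps, dst_fps, n_src_frames):
    """Count how often each source frame is shown after a naive CFR conversion.

    Output frame i is displayed at time i / dst_fps and shows the source frame
    on screen at that moment: floor(i * src_fps / dst_fps).
    """
    n_dst = int(Fraction(n_src_frames) * dst_fps / src_fps)
    ratio = Fraction(src_fps) / Fraction(dst_fps)
    shown = [int(i * ratio) for i in range(n_dst)]
    return Counter(shown)

def summarize(src_fps, dst_fps, n_src_frames=24):
    """Report dropped source frames and the distinct per-frame repeat counts."""
    usage = source_frame_usage(src_fps, dst_fps, n_src_frames)
    counts = [usage.get(f, 0) for f in range(n_src_frames)]
    return {"dropped": counts.count(0),
            "distinct_repeat_counts": sorted(set(counts))}

print(summarize(24, 48))   # exact multiple: every frame repeated equally
print(summarize(24, 30))   # non-multiple: some frames repeated, others not
print(summarize(24, 20))   # lower rate: some frames never shown
```

With an exact multiple every source frame is held for the same number of output frames, so motion stays even. With any other ratio the repeats or drops are uneven, which is what shows up as stutter.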
  Cerbyo
    18 September 2018 at 05:41
    When you say “virtually lossless constant frame rate output” does this process still re-encode? And what’s an example of when you’d use it? Would you use it to change fractional fps to an integer value, or would you keep fps at whatever value it’s ‘mostly’ supposed to be?
    I ask because for my purposes I have to deal with H.264 animated footage, so I have to keep things down to a single re-encode (the final Blender output) to limit grainy artifacts. So when I have to re-encode in between, I transcode to a higher colour bit depth format (usually 16-bit PNG), which generally explodes the file size.
    Also, for my purposes I don’t care about syncing the original video footage to the original audio; in fact I generally delete the original audio anyway. So I don’t need to worry about losing or gaining any frames to keep a sync. Changing your workspace fps in Blender doesn’t cut frames from your video, it just creates a gap between the timing of the video and its assigned audio. Right? Hopefully that’s right.
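If the description above holds (every source frame occupies exactly one project frame while the audio keeps its original length), the size of the resulting gap is simple arithmetic. The sketch below works it out for a hypothetical clip; the frame count and rates are chosen only to give round numbers.

```python
from fractions import Fraction

def av_gap_seconds(n_frames, source_fps, project_fps):
    """Gap between a clip's original audio length and its video length when
    every source frame occupies exactly one project frame."""
    audio_seconds = Fraction(n_frames) / Fraction(source_fps)
    video_seconds = Fraction(n_frames) / Fraction(project_fps)
    return float(audio_seconds - video_seconds)

# 14400 frames recorded at 24000/1001 fps (about 23.976) in a 24 fps project
gap = av_gap_seconds(14400, Fraction(24000, 1001), 24)
print(f"audio outlasts video by {gap:.3f} s")
```

In this example about ten minutes of footage ends up with the original audio running roughly 0.6 seconds longer than the video, and the gap grows in proportion to the clip length.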
    I’m simply butchering videos and lining up images (every image is found across 2 or more frames) to external audio sources. So the framerate of the workspace is assigned, and then my video content is remolded to fit the timing of external audio.
    Sorry, I guess my purpose here is to probe whether I’ve got something wrong in my current understanding of that stuff.
    +++++++++++++++++++++++++++++++
    In terms of your audio, I don’t understand. What you outline is contrary to my own experience of the limitations of Blender’s audio system. You use Blender to line up audio to video, and determine your pan values by viewing it with the video in realtime….got it. But why rely on it for any of those other tasks you state?
    1. VSE audio playback has 2 systems (OpenAL, SDL) that are completely independent of the rendered audio output system. So you cannot hear in the VSE exactly what the render will sound like. I don’t have experience with how closely 5.1/7.1 surround playback mimics the render though, so maybe they provide closer comparisons.
    2. You can’t see the amplitude of the waveform past its original state (via draw waveform). You can guess it or calculate it, but not see it. But I’m guessing you don’t use the volume values in Blender.
    3. Every time you merge/combine waveforms in Blender the result is amplified. There are no visual or audible indicators when clipping occurs either, so you must be careful what is combined. I’m guessing again you don’t combine stuff much with your method.
    4. What output settings are you using? The only settings I haven’t touched yet in Blender are the Properties->Scene->Audio->Rate and distance models. Are you fiddling with those…? Your results are inconsistent with my own experience, where I have been unable to get a good output out of the audio+video render settings.
    So based on those statements/questions: if you are doing 5.1 surround sound…. You toss a track into Audacity, however many channels it may be, then divide and export it into 6 channels. Then you toss it in Blender and modify those 6 different channels via pan. So everything except pan and perhaps positioning was already good and done before you tossed it in Blender?
    How do you even assign each channel to the right speaker? Does Blender have some UI that allows you to do this? Or is there some order based on top to bottom or something I don’t know about?
    Do you have much experience fiddling with the distance model under Scene->Audio? I’ve always kept it on the default: inverse clamped, 343.3 speed, 1.0 Doppler. Just thinking that might have something to do with my problems. So sorry for a million questions here, it’s just that your experience is so different from mine that I want to see if I got something wrong.
    Eric D
      18 September 2018 at 15:50
      Our use cases look very different and we’ve evidently optimized our use of Blender VSE (and Audacity, and in my case Handbrake) to our respective ones.
      1) Audio to video syncing is extremely important over the entire length of the movie for my use case. Since A/V Sync in Blender VSE doesn’t act predictably with VFR video, conversion of the movie to CFR was the only way I saw to make it work – and it did.
      2) Yes – I re-encode the movie with Handbrake. I don’t think there’s any way around this in order to convert VFR to CFR. But again, the process in Handbrake can be tuned to provide high quality output. It usually creates a ~10GB file for a ~2 hour movie at ~24fps, but the result looks extremely close to original quality, and I am picky :)
      3) The reason I remove the audio channel for the movie in VSE and instead import separate audio tracks from the movie to the VSE is because I often need to edit the audio, or put keyframes on volume for one or more audio channels. With 6 audio channels, I can very often remove speech from audio without losing ambient sound which, again, is important for my use case. If I don’t distribute audio output using the pan setting on each of the 6 (or 2) channels, Blender creates mono audio output. So panning the audio channels is my way of restoring stereo or surround sound for the rendered movie. I usually pick AC3 or AAC for the audio codec and I don’t fiddle with any other audio settings.
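For readers who prefer a command line over the Handbrake GUI, the VFR to CFR re-encode from point 2 could be sketched with HandBrakeCLI as below. The encoder, quality value and file names are assumptions chosen for illustration; they are not the settings used in the comment above, which does not list any. The helper only builds the command so it can be checked before running.

```python
def build_cfr_command(src, dst, fps, quality=18):
    """Build a HandBrakeCLI command that re-encodes src at a constant frame rate."""
    return [
        "HandBrakeCLI",
        "-i", src,
        "-o", dst,
        "-e", "x264",          # H.264 encoder (assumed choice)
        "-q", str(quality),    # constant quality; lower means better quality, larger files
        "-r", str(fps),        # target frame rate
        "--cfr",               # force constant frame rate output
    ]

# Hypothetical file names, for illustration only
cmd = build_cfr_command("movie_vfr.mkv", "movie_cfr.mkv", 24)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

As noted in the thread, this is still a re-encode, so the quality setting is a trade-off between file size and how close the result stays to the source.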
