IEC 62503 Multimedia quality – Method of assessment of synchronization of audio and video
This International Standard provides a subjective (or perceptible) and statistical method of assessment of the overall, or end-to-end, difference of delays between real-world and reproduced scenes in terms of video and accompanying audio recorded in a medium.
This International Standard does not specify limiting values for those results obtained by the application of the provisions in this standard. It excludes applications to professional broadcast systems.
2 Normative reference
The following referenced document is indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ITU-R BT.500-11:2002, Methodology for the subjective assessment of the quality of television pictures
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
video delay against accompanying audio
subjective opinion score outside m s , where m is a sample mean of the original scores of a set of subjects for the same video delay and s is a standard deviation of the scores
ordinary untrained human audience of audio and video reproduction; random sample of individual members of general public
test video clip
short duration of video frames with accompanying audio to be used as original
test video sequence
random series of test video clips where the audio channels are shifted in time compared to the original
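The outlier definition above can be illustrated with a minimal sketch. The symbol between m and s is not legible in the definition as reproduced here, so the sketch assumes an interval of the form m ± k·s and leaves the multiplier `k` as an explicit, hypothetical parameter. The function name `flag_outliers` and the use of the sample (n − 1) standard deviation are likewise assumptions, not provisions of this standard.

```python
from statistics import mean, stdev


def flag_outliers(scores, k):
    """Return the opinion scores lying outside the interval m - k*s .. m + k*s.

    scores: original scores of a set of subjects for the same video delay.
    k:      hypothetical multiplier on s (assumption; not given in the text).
    """
    m = mean(scores)   # sample mean of the original scores
    s = stdev(scores)  # sample standard deviation (assumption: n - 1 form)
    lower, upper = m - k * s, m + k * s
    return [x for x in scores if x < lower or x > upper]


if __name__ == "__main__":
    # Illustrative scores only; not data from any assessment.
    example = [3, 3, 3, 3, 3, 3, 3, 3, 3, 10]
    print(flag_outliers(example, k=2))
```

Keeping `k` as an argument avoids committing to a threshold that the text here does not state.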
4 Overview of methods of assessment
Figure 1 gives an overview of the possible objective methods of measurement and the subjective method of assessment used to acquire the necessary parameters corresponding to lip sync.
5 Subjective assessment of lip sync
5.1 Items to be assessed
Subjective grading level of mis-synchronization of video and audio.
5.2 Preparation of test video clips and test video sequence
5.2.1 Selection of content of a test video clip
Since lip sync is a kind of human perception, it may depend on the contents of the video and accompanying audio. Especially when it is related to movement of the lips of a human speaker, a match between a spoken language and a mother tongue may affect the result.
NOTE In this International Standard, in order to provide worked examples, speech in the Japanese language uttered by a well-trained professional news reader is watched and listened to by subjects with the same mother tongue.
A bust shot of a news reader shall be extracted; its duration should be around 10 s to 20 s. The data of the audio channel of the video clip shall be taken as the timing reference.
The possible amount of time caused by mis-synchronization in this original video clip, $\Delta t_0$ at the section 1-1′, is unknown. However, this International Standard provides the method to estimate the overall lip sync $\Delta t_s$ including $\Delta t_0$ and $\Delta t_1$. Namely, $\Delta t_s = \Delta t_0 + \Delta t_1 + \Delta t_2$.
5.2.2 Creation of a test video sequence
The test video sequence shall be a randomised series of the video clip selected in 5.2.1, in which each of the audio channels shall be replaced by time-shifted audio data with the necessary duration of padding as a leader or a trailer depending on the direction of the time shift.
Preparation of such video clips is shown in Figure 2, which depicts the image frames with delayed audio and with leading audio. The amounts of time shift, T and Ta, are adjustable.
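The construction described in 5.2.2 can be sketched as follows. This is a minimal illustration, not a provision of this standard, and it rests on several assumptions. The audio channel is modelled as a plain list of samples. A positive shift means the audio is delayed relative to the video (silent leader inserted), and a negative shift means the audio leads (silent trailer appended). The shifted audio is kept at the original clip length. The names `shift_audio` and `build_test_sequence`, the millisecond units and the `seed` parameter are hypothetical.

```python
import random


def shift_audio(samples, shift, silence=0):
    """Time-shift audio by `shift` samples while keeping the clip length.

    shift > 0: delayed audio, padded with a silent leader; the tail is dropped.
    shift < 0: leading audio, the head is dropped; padded with a silent trailer.
    """
    n = len(samples)
    if shift >= 0:
        return ([silence] * shift + list(samples))[:n]
    return list(samples[-shift:]) + [silence] * min(-shift, n)


def build_test_sequence(samples, shifts_ms, sample_rate, seed=None):
    """Return a randomised list of (shift_ms, shifted_audio) pairs.

    Each entry reuses the same original clip audio with a different shift.
    """
    rng = random.Random(seed)  # seeded so the order can be reproduced
    order = list(shifts_ms)
    rng.shuffle(order)         # randomise the presentation order
    return [
        (ms, shift_audio(samples, round(ms * sample_rate / 1000)))
        for ms in order
    ]


if __name__ == "__main__":
    # Toy audio at 1000 samples per second; shift values are illustrative only.
    clip_audio = [1, 2, 3, 4, 5]
    for ms, audio in build_test_sequence(clip_audio, [-2, 0, 2], 1000, seed=1):
        print(ms, audio)
```

In practice the list of shifts would be chosen by the experimenter, since the amounts of time shift are adjustable. Fixing the seed makes it possible to record the presentation order alongside the collected scores.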