How do companies like JW Player, Vimeo, and Google create a dataset for certifying their encoding presets?


March 2019


A number of sites, for example YouTube, JW Player, and Vimeo, publish encoding recommendations for uploaded videos. How do they create the datasets behind those recommendations? The dataset has to be representative of every kind of content that might be served, at every resolution. Is picking videos for the dataset an algorithmic or a manual process? And how do you conduct the subjective/objective quality assessment needed to decide that what you recommend on your site will work universally?

Say we train a model on that dataset, then score the remaining videos (those not in the dataset) against that model and check whether the scores are justified. If the dataset is not representative of the content, that evaluation would be incorrect. Sites like YouTube host videos from a plethora of semantic categories. So would the encoding settings be the same for, say, sports and a product ad, or animation and broadcast news?
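
To make the "algorithmic" option concrete, here is a minimal sketch of what I have in mind: stratified sampling by semantic category, so that every category ends up in the dataset. It assumes each video already carries a category label. The function name `stratified_sample` and the catalog layout are hypothetical, made up for illustration, and I am not claiming this is how any of these sites actually does it.

```python
import random
from collections import defaultdict


def stratified_sample(catalog, per_category, seed=0):
    """Pick up to `per_category` videos from each semantic category.

    `catalog` is a list of (video_id, category) pairs. This layout is an
    assumption for the sketch, not a real site's schema.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable

    # Group video ids by their category label.
    by_category = defaultdict(list)
    for video_id, category in catalog:
        by_category[category].append(video_id)

    # Draw the same number of videos from every category, or all of them
    # when a category has fewer than `per_category` videos.
    selected = {}
    for category, ids in sorted(by_category.items()):
        k = min(per_category, len(ids))
        selected[category] = sorted(rng.sample(ids, k))
    return selected


# Toy catalog with three categories of different sizes.
catalog = [
    ("v1", "sports"), ("v2", "sports"), ("v3", "sports"),
    ("v4", "animation"), ("v5", "animation"),
    ("v6", "news"),
]
dataset = stratified_sample(catalog, per_category=2)
```

Even with something like this, the category labels only capture semantics, so two videos in the same category can still differ a lot in how hard they are to encode. That is part of why I am asking how it is done in practice.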

