I would like to know what's wrong with this approach. I watch a lot of commentated speed-run videos, which are often something like ~240p video plus soft subtitles. The subtitles get rendered at the source resolution (presumably into the video framebuffer) and then upscaled along with the image, turning them into a tiny blurry mess instead of the crisp, readable text they could be.
Closed captions are positioned on the screen to indicate who's talking, include descriptions of sound effects, and should be in a high-contrast, easy-to-read font (most people with hearing deficiencies also have problems seeing, i.e. out-of-date prescriptions for both hearing aids and eyeglasses).
As far as I know, QuickTime does it right while the Apple TV, Netflix, and YouTube fuck it up; then again, I know that because I helped write the QuickTime implementation way back.
Here is a demo: https://www.youtube.com/watch?v=BbqPe-IceP4
Please do not spread falsehoods.
Disclaimer: I work at YouTube.
The issue you can run into in practice is stuff like softsubbed signs, which can clash and look out of place against the native video if you render them at full res. There's a related issue too: if you're using something like motion interpolation (e.g. "smoothmotion", "fluidmotion", or even stuff like MVTools/SVP), softsubbed signs won't match the video during pans, making them stutter and look very out of place. The only way to fix that is to render them on top of the video before applying the relevant motion interpolation algorithms.
Personally I've always wished for a world in which subtitles are split into two files, one for dialogue and one for signs, with an ability to distinguish between the two. (Heck, I think softsubbed signs should just be separate transparent video streams overlaid on top of the native picture, which would essentially let you hardsub signs while still being able to disable them.)
Also, sometimes, rendering at full resolution is prohibitively expensive, e.g. watching heavily softsubbed 720p content on a 4K screen.
Sure, you have to transform the coordinates to the output. But still, better to render fonts at the final resolution; they'll always look better than if scaled after rendering.
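The coordinate transform the comment mentions is cheap; a minimal sketch (my own illustration, with made-up example numbers, not anyone's actual player code) of mapping subtitle anchor positions from video space to display space so the glyphs can be rasterised at the final size:

```python
# Hypothetical helper: subtitle positions authored against the video
# resolution are scaled to the display resolution; the font is then
# rendered at display resolution so the text stays sharp.

def map_to_display(x, y, video_res, display_res):
    """Scale subtitle coordinates from video space to display space."""
    vw, vh = video_res
    dw, dh = display_res
    return x * dw / vw, y * dh / vh

# A caption anchored at (160, 220) in a 320x240 source lands at the
# equivalent point of a 1920x1080 display.
pos = map_to_display(160, 220, (320, 240), (1920, 1080))
print(pos)  # -> (960.0, 990.0)
```

Rendering then happens once per subtitle event at display resolution, instead of once at source resolution followed by a blurry upscale.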
The only practical downside I have noticed is that accurate rendering of subs containing complex vector graphics or effects (ASS supports that) at greater-than-HD resolutions takes a lot of CPU time, sometimes more than a single core can handle in real time.
There probably is a lot of potential for optimization, but those are hobby projects for their maintainers.
whilst i don't necessarily agree... i do agree that if you want to conform to specs then you can't go thinking this way.
I've no direct experience with, say, Russian or Latin American governments, but cultures that use explicit patronymic or matronymic names might expect that broken out as well.
If you ever need to submit user data to the government (e.g. for tax reasons), and you don't ask your user to break the name apart, then you will necessarily be guessing, which seems strictly worse than just asking them how their name might split.
At the end of the day, if you operate in a given culture, then you need to address those cultural norms. Bending over backwards to support every possible edge case seems unwise if they also happen to disagree with those norms.
"Dear [first name]," flows better than "Dear [opaque string],".
1. Throw an error and refuse to let me enter my last name as it is supposed to be spelled.
2. Truncate the last part of my last name.
3. Try to be clever and end up shoving the first half of my last name into a middle-name field.
My preference for names, addresses, and other personal data is to stop trying to constrain people to preconceived "standards" and just let them enter their information the way they want it to be.
Normally I just use it with a space in the last name field, but then I get exactly the same problems you mention.
So many mundane things invite the "how hard can this be?" reaction...
I honestly think this genre is horrible and counterproductive, even though the writer's intentions are good. It gives no examples, no explanations, no guidelines for proper implementations - just a list of condescending gotchas, showing off the superior intellect and perception of the author.
The "Name" version is a good example of that; I can easily see how most of the examples on this list could be falsehoods.
On the other hand, some of the claims in TFA leave me more perplexed. For instance, regarding color conversion: "converting from A to B is just the inverse of converting from B to A". I wonder what's meant here. Is it just a matter of rounding, or is there more to it than that?
The catch 22 here is that if you understand this list then chances are you already knew about most of these gotchas.
So yeah, a pretty bad format. Now we just have to write "`Falsehood programmers believe about X` considered harmful".
A better approach would be to pick the list up and turn them into a collaborative work. Wiki, maybe?
Hah, this strikes really close to home. I've had to work with so, so many subtitle files in Eastern European and Turkish Windows codepages, mostly but not entirely compatible with Win-1252. There's no way to tell them apart programmatically, so you check whether the extended characters make sense. It's a bit of a nightmare.
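The "check whether the extended characters make sense" heuristic can be sketched roughly like this (my own illustration, not the commenter's actual tooling; the candidate codepages and letter sets are assumptions for the example):

```python
# Hypothetical heuristic: decode the raw bytes with each candidate
# legacy codepage and score what fraction of the resulting non-ASCII
# characters are plausible letters for that codepage's languages.

CANDIDATES = {
    # Central European letters (Polish/Czech-ish) for cp1250.
    "cp1250": set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻáéíúýčďěňřšťůž"),
    # Turkish letters for cp1254.
    "cp1254": set("çğıöşüÇĞİÖŞÜâîû"),
}

def guess_codepage(data: bytes) -> str:
    best, best_score = "cp1252", -1.0  # fall back to plain Win-1252
    for cp, letters in CANDIDATES.items():
        try:
            text = data.decode(cp)
        except UnicodeDecodeError:
            continue  # bytes invalid in this codepage; rule it out
        extended = [ch for ch in text if ord(ch) > 127]
        if not extended:
            continue  # pure ASCII tells us nothing
        score = sum(ch in letters for ch in extended) / len(extended)
        if score > best_score:
            best, best_score = cp, score
    return best
```

It's fuzzy by nature, which is exactly the nightmare the comment describes: the bytes alone carry no codepage label, so all you can do is guess from plausibility.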
hell, they don't survive alt-tabbing into a game that has a different resolution than the monitor
MPlayer and co., on the other hand, can cope with it, but my window manager can mess it up, so I don't bother.
> I can exclusively use the video clock for timing
Heh. I just finished writing up a design doc to address problems I had with this, and I referenced "Falsehoods programmers believe about time". Then I opened Hacker News and saw this article. So this is very timely for me.
(My doc: https://github.com/scottlamb/moonfire-nvr/blob/new-schema/de...)
it's a nightmare, but the reason for these observations is precisely that it shouldn't be a nightmare. this area of programming is a wasteland ... nobody that good wants to solve these trivial problems :/
Try experimenting with chroma subsampling in JPGs, but note that not all image viewers have good chroma upscaling. MPV can display still images as well as video and you can choose the chroma scaling algorithm.
What's more, YCbCr is more efficiently compressed than RGB even if you don't subsample, for the same reason that a DCT saves bits even if you don't quantize: linearly dependent or redundant information is moved into fewer components. In this case most of the information moves into the Y channel, with Cb and Cr both being very flat in comparison. (Just look at a typical YCbCr image reinterpreted as grayscale to see what I mean.)
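That energy-compaction point is easy to see numerically. A small sketch of my own (using the full-range BT.601 conversion and a synthetic gradient as a stand-in for a photo; none of this is from the comment itself):

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Full-range BT.601 RGB -> YCbCr conversion."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

# Synthetic stand-in for a natural image: a smooth brightness ramp with
# a mild colour cast, so brightness varies far more than colour.
h, w = 64, 64
ramp = np.linspace(0, 255, w)
img = np.zeros((h, w, 3))
img[..., 0] = ramp * 0.9   # R
img[..., 1] = ramp         # G
img[..., 2] = ramp * 0.8   # B

ycc = rgb_to_ycbcr(img)
stds = ycc.std(axis=(0, 1))  # per-channel spread: Y, Cb, Cr
print(dict(zip(["Y", "Cb", "Cr"], stds.round(1))))
```

Most of the variance lands in Y, with Cb and Cr nearly flat, which is why the chroma planes compress so cheaply even without subsampling.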
but you get the exact same effect from higher resolutions, e.g. going from SD->HD->2K->4K we see the same thing... and we are still doing it, so i would question highly that it is subjectively better in a long-term sense given this continuing trend.
i remember hearing people discuss this sort of thing when HD was new, and they stopped after while - i suspect because they got used to it, and they now realise how low the quality of the SD image was. i noticed this in myself as well...
edit: incidentally there is a discussion about this here (first google thing i found): http://www.neogaf.com/forum/showthread.php?t=1308591
it seems either nobody or very few people take the perspective that 4:2:0/4:2:2 looks better, and there are even a few descriptions of precisely what they notice as being worse.
what i think of as undershooting or overshooting is relative to the range... and besides that, what's wrong with clamping? it's how computer graphics has always had to deal with these things... limited range simply doesn't exist in that context, and it doesn't harm anything.
when computer games are forced into limited range for consoles you don't get these problems, unless your tv is applying one of those god awful filters that ruins everything anyway... (i'm still not sure why so many tvs have these - reference monitors never do anything this insane) ... but i can tell you what you do get: a subjectively /and/ measurably worse quality of image than from a monitor.
(i don't think i'm alone in this based on the contents of ITU-R BT.2100 either... which defines a full range as well as a 'narrow' one)
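For reference, the full-range vs limited ("narrow") range distinction the comments above argue about can be sketched like this for 8-bit luma (my own illustration of the standard BT.601/BT.709 convention, not anyone's production code):

```python
# Limited ("narrow") range maps 8-bit luma into [16, 235], leaving
# headroom/footroom; full-range pipelines use all of [0, 255] and
# simply clamp anything that lands outside it.

def full_to_limited(y: int) -> int:
    """Map full-range luma [0, 255] to limited range [16, 235]."""
    return round(16 + y * 219 / 255)

def clamp_full(y: float) -> int:
    """Full-range handling: clamp out-of-range values to [0, 255]."""
    return min(255, max(0, round(y)))

print(full_to_limited(0), full_to_limited(255))  # -> 16 235
print(clamp_full(-12.5), clamp_full(301.0))      # -> 0 255
```

The "undershoot/overshoot" values limited range reserves room for are exactly the ones a full-range pipeline would clamp away.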
/sarcasm
and
> video decoding is easily parallelizable
At a previous job, I don't know if it was just the field I was in or just bad luck, but having to explain this over and over again was kind of a personal nightmare.
That being said, this is an excellent list!
If you can jump ahead, it would seem easy to have multiple threads start at key frames and decode the content. You'd have to splice the results together, but this seems possible.
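The idea can be sketched as a toy (purely illustrative; a stand-in `decode_chunk` replaces a real decoder, and real streams have open-GOP and reference-frame complications that break this simple picture, as the reply below notes):

```python
# Toy keyframe-parallel decode: split the frame indices into chunks,
# each starting at a keyframe, decode chunks concurrently, then splice
# the results back together in order.
from concurrent.futures import ThreadPoolExecutor

def split_at_keyframes(frames, is_keyframe):
    """Group frame indices into chunks, each starting at a keyframe."""
    chunks, current = [], []
    for i in frames:
        if is_keyframe(i) and current:
            chunks.append(current)
            current = []
        current.append(i)
    if current:
        chunks.append(current)
    return chunks

def decode_chunk(chunk):
    # Stand-in for real decoding; assumes each chunk is independent.
    return [f"frame{i}" for i in chunk]

frames = range(10)
chunks = split_at_keyframes(frames, is_keyframe=lambda i: i % 4 == 0)
with ThreadPoolExecutor() as pool:
    decoded = [f for part in pool.map(decode_chunk, chunks) for f in part]
print(decoded[:3])  # -> ['frame0', 'frame1', 'frame2']
```

The splicing step is the easy part; the hard part, in real codecs, is that chunks are not as independent as this toy assumes.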
It's a resource issue (memory, cpu, etc; and meeting latency requirements between those constraints), versus the subtly different standards "H.264" hardware and software follow, as well as a few other intricacies with how the whole standard works anyways. Again, it's not that it can't be done, but as the article says it can't be done easily or at least in certain situations done consistently.
Key frames are a good anchor for anything you're doing with H.264 (and other formats), but they're not the be-all and end-all, and they may even cause you trouble if you "trust" them too much. It is perhaps a bit like date-time programming: you can fairly easily create something that works for a decent amount of time, and even if it ends up being incorrect your clients may not even notice... or it may break down in a catastrophic manner in the future. But building it that way is certainly not correct, and it's certainly not professional. Quite honestly, I'd say date-time programming looks like a dream compared to the inconsistent nightmare that is video programming. Date/time logic needs to be sound because many programs rely on consistent and sane output from a program's perspective, whereas video programming gets to slide as long as the output is generally correct from a human visual perspective.
It's been a few years since I've dived into this stuff, so some things may have changed/gotten cleaned up. But the article seems to indicate that the ecosystem hasn't really changed.
although i contend that most decoders are very threadable - it's just that the people trying to do it usually lack the time or the skill, more usually the former.
the state of video in programming is a total mess from my experiences.
Also, none of these unfounded preconceptions make intuitive sense, so I don't see why people would believe them.
Interlaced video files should no longer exist.
Seriously, fk interlaced video.
> upscaling algorithms can invent information that doesn’t exist in the image
That's not a falsehood. Upscaling does invent information that doesn't exist in the image.
Yes, they should, as should silent movies, black and white movies, old game consoles with exotic output formats like vector graphics, and the like.
It is a worthy endeavor to create and maintain video playback software that lets people consume beloved content that was made for the technology of its day, including home videos, sports games, TV shows with special effects edited in 60i, and video games.
The upscaled image does not contain more information than the original image: you can reconstruct the upscaled image given only the information in the original image, the output dimensions, and the upscaling algorithm.
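A tiny illustration of that point (my own example, using nearest-neighbour on a 2-D list of pixels): upscaling is a deterministic function of (source image, target size, algorithm), so running it twice gives bit-identical output and no new information.

```python
def upscale_nearest(img, factor):
    """Nearest-neighbour upscale of a 2-D list of pixel values."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in img
        for _ in range(factor)
    ]

src = [[1, 2],
       [3, 4]]
a = upscale_nearest(src, 2)
b = upscale_nearest(src, 2)  # same inputs -> identical output
print(a)  # -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Fancier kernels (bicubic, Lanczos, even ML upscalers) invent plausible detail, but the output is still fully determined by the inputs; nothing that wasn't derivable from the source is recovered.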
https://hn.algolia.com/?query=falsehoods%20programmers%20bel...
And while this topic is not personally relevant to me since I don't work with video decoding, I do find learning about different technologies interesting. Reading this gives me an appreciation for how much effort goes into making video, something we all take for granted, work.
If people only posted articles that were relevant to a majority of readers, HN would be a much less interesting place.