-- Linus Torvalds
I'm not saying that to defend anyone BTW. This complexity and opacity (which is transitive in the sense that a combined result including even one opaque part itself becomes opaque) is very much the problem. What I'm saying is that it's likely impossible for the companies to comply without making fundamental changes ... which might well be the intent, but if that's the case it should be more explicit.
At a broad level:
What are the input sources, like IP address, clicks on other websites, etc., that you use to feed the model?
What is the overall system optimized for, like some combination of engagement, view time, etc.? Just listing them, if possible in order of preference, is good enough.
Alternatively, what does your human management measure and monitor as the business metrics of success?
I want to know what behaviors (not necessarily how) are used. I want to know what the feed is trying to optimize for: more engagement, more view time, etc.
This is not adversarial; knowing this helps us modify user behavior to make the model work better.
Users already have some sense of this and work around it blindly. For example, YouTube puts heavy emphasis on recent views and search. I (and I'm sure others) would use a signed-out session to watch content way outside my interest area so my feed isn't polluted with poor recommendations. I may have watched thousands of hours of educational content, but Google would still think some how-to video I watched once means I only want to see that kind of content.
Google knows it is me, sure, even when I'm signed out, but they don't use it to change my feed. That's the important part, and knowing that can help improve my user experience.
You are an insider?
Even if some of that is off, the premise of a chain of processors, some ML and some not, means they probably can't really tell you exactly why anything ranks where it does.