Twitter's Recommendation Algorithm

1185 comments

Context: I teach at Princeton and study social media and recommendation systems.

From a very quick skim of the repositories, this appears to be quite limited transparency. The documentation gives a decent high-level overview of how Tweet recommendation works (no surprises), and the code tracks that roadmap. Those are meaningful positive steps. But the underlying policies and models are almost entirely missing (there are a couple of valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."

[1] https://github.com/twitter/the-algorithm-ml

eterevsky 3y ago

I work on Google Assistant Suggestions and I don't think it's very practical to open-source an algorithm like that including the models and the underlying data. Both of them can live in separate services and be frequently updated.

I am assuming that open sourcing the code aims to increase transparency about the business logic of the ranking decisions. At the same time you don't want spammers to be able to easily run experiments against a cloned version of your system.

bilekas 3y ago

> But the underlying policies and models are almost entirely missing (there are a couple valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."

Haven't gone through it yet, but yeah, if that's the case, all this is is a glorified framework to plug your own models into. Not exactly what was promised.

tpmx 3y ago

Did you also skim the accompanying (or rather, main) repo, https://github.com/twitter/the-algorithm ?

From a quick clone and line-count, it has:

  235 kLOC .scala
  136 kLOC .java
  22  kLOC .py
  7   kLOC .rs
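For anyone who wants to reproduce a tally like that, here is a rough Scala sketch. It assumes Scala 2.13+ and a local clone whose path is passed as the first argument; `LineCount` and its helpers are hypothetical names, not anything in the repo. It counts physical lines, blanks and comments included, so its totals are not directly comparable to the code-only counts a tool like cloc reports.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Using

object LineCount {
  // Extensions from the tally above; adjust as needed.
  val Extensions: Set[String] = Set("scala", "java", "py", "rs")

  // File extension without the dot, or "" if there is none.
  def extensionOf(p: Path): String = {
    val name = p.getFileName.toString
    val dot = name.lastIndexOf('.')
    if (dot < 0) "" else name.substring(dot + 1)
  }

  // Physical line count: newline bytes, plus one for a final unterminated line.
  def lineCount(p: Path): Long = {
    val bytes = Files.readAllBytes(p)
    val newlines = bytes.count(_ == '\n'.toByte).toLong
    if (bytes.nonEmpty && bytes.last != '\n'.toByte) newlines + 1 else newlines
  }

  // Sums line counts per extension for every matching regular file under `root`.
  def countLines(root: Path): Map[String, Long] =
    Using.resource(Files.walk(root)) { paths =>
      paths.iterator().asScala
        .filter(p => Files.isRegularFile(p) && Extensions.contains(extensionOf(p)))
        .toVector
        .groupMapReduce(extensionOf)(lineCount)(_ + _)
    }

  def main(args: Array[String]): Unit = {
    val root = Paths.get(if (args.nonEmpty) args(0) else ".")
    for ((ext, lines) <- countLines(root).toSeq.sortBy(-_._2))
      println(f"${lines / 1000.0}%.0f kLOC .$ext")
  }
}
```

Reading whole files into memory keeps the sketch short. It also sidesteps charset errors on non-UTF-8 sources, because it counts newline bytes rather than decoded lines.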

So I don't think you did, since you posted so quickly and that's a LOT of code.

I also haven't skimmed this code except very superficially, but perhaps you should since you're out there making statements with your Princeton credentials.

(I posted this comment with the heads-up a few minutes after your comment above and then expanded it as you didn't respond.)

Lord_Zero 3y ago

I think you misunderstood. He's saying the training models are not there.

kadavy 3y ago

For example, MostRecentCombinedUserSnapshotSource seems to be influential (such as for calculating "tweepcred"), but we can't see how it's calculated.

eecc 3y ago

Wouldn’t that make them easy prey for “spam SEO”? However, given the framework, isn’t it still possible to guess the models?

makeitdouble 3y ago

The spam SEO issue should have been dealt with / thought about _before_ engaging in the whole adventure, and having to guess how it would work if implemented properly defeats the "open source" spirit of it.

More credit would be given if the very idea of open sourcing the algorithm hadn't already been discussed to death, with predictions of the difficult points and of how it probably wouldn't happen in any sane way.

modeless 3y ago

What about these? https://huggingface.co/Twitter

simonw 3y ago

Those look older to me. They all have last updated dates for October and November 2022.

EastSmith 3y ago

FB open source algo looks much better, right? /s

zhte415 3y ago

Is it valid to focus on tracking a Dem/Rep split when that split is an exclusionary design for many Americans? Or is it not exclusionary, in your view? I'm curious about a social science perspective. Ignoring the global nature of Twitter for a moment.

meghan_rain 3y ago

So why did they open source it?

daveguy 3y ago

So they could pretend to be open. It's the "Open"AI model. Open-washing?

joshspankit 3y ago

That was one of Elon’s core statements when he first talked about buying Twitter. If he had gotten it out sooner there would be an easier link between the two, but if you want more context just go read the old tweets and articles from the Twitter vs Elon days.

kzrdude 3y ago

If we can't build anything with this, is it "source"?

justapassenger 3y ago

You must be new to Musk's business practices.

avanti 3y ago

It's no secret that Twitter, like any other social media platform, is driven by user engagement and ad revenues. The more time we spend on the platform, the more valuable it becomes for them. With this new open-source algorithm, they're essentially crowdsourcing improvements to their system to better serve us the content we crave.

This move could be seen as a strategic PR play to boost their public image amidst the growing concerns around algorithmic bias and lack of transparency. By inviting the community to collaborate and address these issues, they're not only shifting some of the responsibility onto the users but also deflecting potential criticism.

bradly 3y ago

Because they let go many of the engineers working on it?

carstenhag 3y ago

No one has mentioned this before. I don't know if it's really related, but afaik the European Union is thinking about requiring social media platforms to be more transparent when it comes to recommendations etc. If you can already say "hey, we already have a lot online!" then maybe the laws will become less strict.

llx2 3y ago

bc he has no devs anymore and thinks the community will fix it for free

w0m 3y ago

PR and it was already leaked last week.

anigbrowl 3y ago

helsinkiandrew 3y ago

> But the underlying policies and models are almost entirely missing... Without those, we can't evaluate the behavior and possible effects of "the algorithm."

And neither can spammers find and test the cracks and edge cases that would allow them to break the system. That does sound reasonable to me. If they were public, there would be an arms race between spammers/those wishing to game the system and Twitter engineers.

novok 3y ago

It's an open algorithm, but it's not open data! (joking)

ngrilly 3y ago

What did you expect?

TaylorAlexander 3y ago

I don’t know if the parent’s expectations matter here. This is more about making sure others don’t misunderstand the meaning here.

bobobob420 3y ago

Can I audit your class for free?

phailhaus 3y ago

Great! But nothing is going to change until people realize that the problem is the feedback loop. It's not the recommendation engine itself, it's the fact that there's no way "out" of the feed that the engine produces. It recommends you stuff, you have little choice but to engage with it, and then it trains on that information.

This is the problem with most of social media today. It is a very well known problem in ML [1], but nobody is willing to do anything about it because it's a fundamental UX change. Facebook, Twitter, YouTube, TikTok, they have defined themselves by their recommendation engines.

[1] https://towardsdatascience.com/dangerous-feedback-loops-in-m...
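To make the loop concrete, here is a toy simulation in Scala. It is a hypothetical model, not how Twitter or any other platform actually ranks. The feed always shows the topic with the highest learned score, every impression earns a fixed amount of engagement, and that engagement is added back to the shown topic's score. The names (`FeedbackLoop`, `simulate`, `engagementPerImpression`) are invented for illustration.

```scala
object FeedbackLoop {
  // Toy model: one learned score per topic; the feed shows whichever topic scores highest.
  def simulate(initial: Map[String, Double], rounds: Int, engagementPerImpression: Double): Map[String, Double] =
    (1 to rounds).foldLeft(initial) { (scores, _) =>
      val shown = scores.maxBy(_._2)._1 // the engine's pick this round
      // The user engages with what was shown, and the model trains on that engagement.
      scores.updated(shown, scores(shown) + engagementPerImpression)
    }

  def main(args: Array[String]): Unit = {
    // Nearly identical starting interests.
    val start = Map("sports" -> 0.51, "cooking" -> 0.50, "hiking" -> 0.49)
    println(simulate(start, rounds = 100, engagementPerImpression = 0.1))
  }
}
```

A 0.01 head start is enough. "sports" wins the first round and every round after, while the other two scores never move. Topics that are never shown never generate the engagement that would raise them, which is the missing way "out" described above.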

dmonitor 3y ago

Reminds me of a story about a guy who was given a gift, a decorative plate with a rooster on it, I think it was. He didn't care for it too much, but out of politeness he put it on display in an empty cabinet he had. A while later someone noticed he had it, figured he liked it, and got him a similar decorative plate with a rooster on it. Again, out of politeness, he put it next to the old one. Now other people started to think he just really liked roosters, and started giving him little rooster statues and knickknacks. Eventually he has a whole display cabinet of rooster-themed gifts that he never really cared for to begin with, but people just assume he likes them because people keep giving them to him.

khy 3y ago

I think Instagram in particular is bad in this regard. It seemingly becomes convinced that I care deeply about the subject of any post I even momentarily linger on.

LZ_Khan 3y ago

Don't you have a choice... to not engage with it? If you didn't like it then, assuming the metrics system is working correctly, this would be negative feedback to the ML model, causing that content not to be shown in the future.

hdivider 3y ago

Thank you! Working on a concept for a big org on what may become a large ML-based system one day. I knew about this feedback loop issue, but was too dumb to actually remember and face this problem. :) It's all over today's rec engines -- and yet, just like the things we're not shown in these systems, the problem itself seems to become invisible. Because it requires new thinking.

Worth exploring.

rejectfinite 3y ago

I feel like the YouTube one is good. You can mark videos and channels as "not interested", and YouTube really knows me due to my account age and usage... It recommends me unknown videos that I tend to like, but also more mainstream stuff.

mgiannopoulos 3y ago

Isn’t the “I’m not interested in this tweet”, “Show me fewer tweets from X”, etc options working for you? They seem to have an effect on my end.

seydor 3y ago

So, all we need to do is open source humans?

PenguinRevolver 3y ago

Great pull request here which improves the algorithm: https://github.com/twitter/the-algorithm/pull/17

simonsarris 3y ago

That would be great (unweighting bluechecks) but they actually plan to go in the other direction: Starting April 15th non-bluechecks won't show up in the "For you" section (the algorithm timeline) at all. Unpaid users are being written completely out of the algo.

https://twitter.com/elonmusk/status/1640502698549075972

anigbrowl 3y ago

Inaccurate. Musk stated that people you follow will continue to show up. I only use the 'For you' feed when I'm bored and want stupid dopamine hits; I leave it on Following almost all the time. But that's on desktop; my understanding is that it keeps resetting itself for mobile users (of whom I am not one).

alfor 3y ago

I don’t see a way out of this with GPT/AI able to create fake personas in an instant.

bradly 3y ago

I believe LeBron James said recently he isn't going to waste his money on a blue checkmark, so it should be interesting to see what stays and what goes.

seydor 3y ago

Followed users will still show

But I agree it is a bad idea. The worst actors have the money to buy blue checkmarks for an army of accounts.

This is basically making it easier for authoritarian governments to abuse it

cwkoss 3y ago

it's a shame we can no longer short twitter stock

hrpnk 3y ago

Aside from the spam PRs, there is actually one PR that fixes a bug: https://github.com/twitter/the-algorithm/pull/242/files

BbzzbB 3y ago

It removes the extra weight to Twitter blue tweets?

idle_zealot 3y ago

If the property names are to be believed it sets a weight multiplier to 0. So it prevents recommending them entirely.
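Whether a multiplier of 0 prevents recommendation "entirely" depends on what happens downstream. This is a hypothetical Scala sketch; the `Candidate` fields and the `blueVerifiedBoost` parameter are invented, not the repo's names. Here a zero multiplier only sinks those candidates to the bottom of the ranking. They would vanish only if a later step drops zero-scored or low-ranked candidates, for example by truncating the feed.

```scala
object BoostSketch {
  // Hypothetical candidate; field names are illustrative only.
  final case class Candidate(id: Long, baseScore: Double, authorIsBlueVerified: Boolean)

  // Multiplies Blue-verified authors' scores by `blueVerifiedBoost` (1.0 = neutral),
  // then ranks all candidates by score, highest first.
  def rank(candidates: Seq[Candidate], blueVerifiedBoost: Double): Seq[(Long, Double)] =
    candidates
      .map { c =>
        val multiplier = if (c.authorIsBlueVerified) blueVerifiedBoost else 1.0
        c.id -> c.baseScore * multiplier
      }
      .sortBy { case (_, score) => -score }
}
```

With a boost above 1.0, a Blue author's tweet can outrank a stronger non-Blue tweet. With 0.0, it lands last with a score of 0 but is still present in the output.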

matsemann 3y ago

My feed has lately been full of accounts that have blocked me. Like, I see a tweet from someone unknown, click their profile, and it says I'm blocked.

So I wonder if some value is wrong in one of those constants. Anyways, the blocking feature is broken.

super256 3y ago

I think it's okay to give Twitter Blue users a boost, as it's most likely not spam (unlike the 95% of my non-blue followers who are bots).

kevincox 3y ago

I think it makes sense for out-of-network. However I see no value for boosting among people that I have followed. If I have followed them they definitely aren't spammers (from my PoV).

mlindner 3y ago

Drive-by pull requests that break the intention of something aren't ever going to be taken by a maintainer.

willmeyers 3y ago

Why do companies even bother to put source up on GitHub? To put up a front that they're open source? What a joke.

sroussey 3y ago

Yes please! I definitely put my thumbs up in there!

drstewart 3y ago

That will definitely do something! Good job!!

jillesvangurp 3y ago

The irony is that I prefer Mastodon's sort by time and don't try to be clever approach to this expensive and futile attempt to feed me an endless stream of click bait. I objectively spend more time on Mastodon than on Twitter at this point. It's more engaging for me. It's how Twitter used to work when it was still nice to use.

If Twitter wants to put a stop to the user exodus and save lots of money in the process, here's what they could do:

1) Add an off switch to the for you feed. I'll click it right away and never turn it on again. Stop wasting minutes of CPU time on my behalf. I never asked for it. It doesn't do anything for me that I need or want.

2) Sort by time, filter by hashtag. Twitter used to be about real time information. I don't care about things that happened days or weeks ago. I don't need to see all of it. This is the core feature that made Twitter popular. Mastodon has it and it is absorbing users from Twitter by the millions. It still works. Restore this feature and make it the default.

3) Join the fediverse. That's where a lot of the former hard-core users went. They still exist. They still post messages. They still engage with each other. They just don't use Twitter anymore. Allow people to follow Mastodon users. Allow Mastodon users to follow Twitter users. Not that hard to implement, and it would probably do wonders for user engagement.

woodduck 3y ago

They did add an off switch to the for you feed. There's a tab for following, albeit not sorted by time.

ISL 3y ago

As near as I can tell, Mastodon doesn't really have #2 in the list above. Last I heard, the social architecture was hostile to comprehensive indexing of the entire fediverse for search.

That's probably one of the biggest reasons that I have remained on Twitter even after setting up a Mastodon persona.

HellsMaddy 3y ago

Interesting:

    // we only keep unfollows in the past 90 days due to the huge size of this dataset,
    // and to prevent permanent "shadow-banning" in the event of accidental unfollows.
    // we treat unfollows as less critical than above 4 negative signals, since it deals more with
    // interest than health typically, which might change over time.
    val unfollows: SCollection[InteractionGraphRawInput] =
      GraphUtil
        .getSocialGraphFeatures(
          readSnapshot(SocialgraphUnfollowsScalaDataset, sc),
          FeatureName.NumUnfollows,
          endTs)
        .filter(_.age < 90)

https://github.com/twitter/the-algorithm/blob/main/src/scala...
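Stripped of the `SCollection` pipeline machinery, the retention rule is just a sliding window. Here is a minimal plain-collections sketch. It assumes `age` is measured in whole days relative to the snapshot's end timestamp: the quoted comment says 90 days, but the repo's exact age computation isn't shown. `UnfollowEvent`, `ageInDays`, and `recentUnfollows` are hypothetical names.

```scala
import java.time.{Duration, Instant}

object UnfollowWindow {
  // Hypothetical record: `sourceId` unfollowed `targetId` at time `at`.
  final case class UnfollowEvent(sourceId: Long, targetId: Long, at: Instant)

  val RetentionDays = 90L

  // Whole days between the event and the snapshot end time (cf. `endTs` above).
  def ageInDays(e: UnfollowEvent, endTs: Instant): Long =
    Duration.between(e.at, endTs).toDays

  // Keep only unfollows younger than the window, mirroring `.filter(_.age < 90)`.
  def recentUnfollows(events: Seq[UnfollowEvent], endTs: Instant): Seq[UnfollowEvent] =
    events.filter(e => ageInDays(e, endTs) < RetentionDays)
}
```

An accidental unfollow therefore stops counting as a negative signal once it is 90 days old. That is the "no permanent shadow-banning" behavior the code comment describes.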

dmix 3y ago

How long does the NSA record them?

tric 3y ago

From https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

    (
      "author_is_elon",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None).contains(candidate.getOrElse(DDGStatsElonFeature, 0L))),
    (
      "author_is_power_user",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsVitsFeature, Set.empty[Long]).contains)),
    (
      "author_is_democrat",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsDemocratsFeature, Set.empty[Long]).contains)),
    (
      "author_is_republican",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsRepublicansFeature, Set.empty[Long]).contains)),
    )

CathalMullan 3y ago

Only used for metrics, apparently. [0]

  /**
   * These author ID lists are used purely for metrics collection. We track how often we are
   * serving Tweets from these authors and how often their tweets are being impressed by users.
   * This helps us validate in our A/B experimentation platform that we do not ship changes
   * that negatively impacts one group over others.
   */

[0]: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
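That comment describes a measurement pattern rather than a ranking rule. Here is a hypothetical Scala sketch of what such a guardrail could look like. `Impression`, `groupShares`, `regressedGroups`, and the "control"/"treatment" arm labels are invented; only the group names echo the snippet. The sketch computes each author group's share of served impressions per A/B arm, then flags groups whose share falls in the treatment arm by more than a tolerance. Nothing here feeds back into scores, though a guardrail like this still influences which changes ship.

```scala
object GroupMetrics {
  // Hypothetical impression record: which A/B arm served it and the author's group labels.
  final case class Impression(arm: String, authorGroups: Set[String])

  // Fraction of an arm's impressions whose author belongs to each group.
  def groupShares(impressions: Seq[Impression], arm: String): Map[String, Double] = {
    val inArm = impressions.filter(_.arm == arm)
    if (inArm.isEmpty) Map.empty
    else
      inArm
        .flatMap(_.authorGroups)
        .groupMapReduce(identity)(_ => 1)(_ + _)
        .map { case (group, n) => group -> n.toDouble / inArm.size }
  }

  // Groups whose share in "treatment" fell by more than `tolerance` relative to "control".
  def regressedGroups(impressions: Seq[Impression], tolerance: Double): Set[String] = {
    val control = groupShares(impressions, "control")
    val treatment = groupShares(impressions, "treatment")
    control.collect {
      case (group, share) if share - treatment.getOrElse(group, 0.0) > tolerance => group
    }.toSet
  }
}
```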

mochomocha 3y ago

... Metrics tracked in AB test. So even if it's not explicitly encoded in the algo (or implicitly through some of the features plugged in), they'll pick the winning cell as long as it doesn't hurt Elon's metrics (I'm just parroting the comment you quoted).

It doesn't have to be in the algorithm for the systems to be tweaked to please Elon vanity metrics.

[I've been running lots of ML AB tests over the years, some in organizations of similar size & complexity as Twitter]

infogulch 3y ago

So many unnecessarily cynical takes here. Let's say you were in charge of a large legacy system, and some segment of customers complains that it doesn't work as well for them as it does for other segments. How would you know whether their complaints are valid unless you measured it? You have to know first. So measure it.

roughly 3y ago

I expect they're tracking the red team/blue team metrics because of the political shitstorm that's been the GOP's assertions they're being silenced by The Algorithm.

sva_ 3y ago

Ahh, the group of Elons.

I was wondering why I see so many tweets by him, and what his "Group's" impression quote is.

This is actually pretty hilarious.

minimaxir 3y ago

The original code is a part of the home-mixer service, which is the "Main service used to construct and serve the Home Timeline."

I suspect the flag corresponds to weights not present in the repo.

hn2017 3y ago

Per the original source, the code that was released today doesn't show the parts that actually alter the scores of Elon and other users. The part of the code referenced below just tracks Elon stats (from what I know). Employees removed most PII before the code was released.

jalapenos 3y ago

Correct. It's a binary metric. Did the number go up, yes/no (kept job / not).

jasonhansel 3y ago

Interesting which "groups" they care about (e.g. mainstream political parties).

andy_ppp 3y ago

But who chooses the users to be metrics…

minimaxir 3y ago

Update: Elon was asked about these in a Twitter Space, he says it's not appropriate and will be removed from the codebase.

Additionally, from another Twitter engineer, the Democrat/Republican flags are apparently 10 years old and not important and do not have high feature importance.

sillysaurusx 3y ago

Elon seems embarrassed: https://twitter.com/elonmusk/status/1641908130274525187?s=61...

It’ll be interesting to see what gets cut. Maybe just the Elon flag, but maybe others too.

lawn 3y ago

He's only upset that people found out about it.

If they remove his artificial boosts, he'll just turn around and shout at his engineers to reimplement it another way.

lhnz 3y ago

I think the decade old comment related to a different part of the code regarding the number of followers you have in relation to the number of accounts you follow. (Everybody on the call wants to remove this: I wonder why they haven't yet.)

doomleika 3y ago

I would say it’s who will be removed.

Considering how Twitter is doing now, getting a severance isn’t that bad of an idea TBH

jawns 3y ago

The author_is_elon flag doesn't surprise me, but the two political designators are somewhat shocking. I'd sure like to know what changes based on what Twitter knows about your political affiliation.

jandrese 3y ago

I thought it was interesting how it explicitly doesn't boost independents. So much of the two-party system is self-reinforcing.

6nf 3y ago

So many questions. How are users tagged D or R? Is that a manual process or automated somehow? What is the effect of these tags? Can I find out if my Twitter account is in one of those buckets?

stouset 3y ago

I suspect that these are used for metrics tracking rather than being fed back into the recommendation engine. But there's no real way to know for sure given the limited release. These predicates aren't actually used anywhere in the code that's been made available.

jen20 3y ago

It's not that shocking...

Half the people that got promoted on my timeline were perpetually candidates for elections I couldn't vote in, and they _self-identified_ as Republican or Democrat in their own bios, or via the registration of their candidacy...

This is why I exclusively used to use Twitter in the "people I follow only" mode, and simply shut my account down when they pushed harder on the algorithm.

partiallypro 3y ago

Facebook guesses your political affiliation as well; you can even look in your settings to see what they guessed.

krapp 3y ago

The repo suggests it's about tracking engagement metrics[0], so Team Red people see more Team Red content and vice versa. Nothing nefarious.

[0]https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

JustSomeNobody 3y ago

Exactly. I don't see NPA (no party affiliation) anywhere.

sschueller 3y ago

Well someone just asked about it in the live spaces[1] Elon is hosting and he said that should not be there. An engineer said afterwards it is just for metrics but then Elon chimed in again and said "we should get rid of it, it should be gone."

[1] https://twitter.com/elonmusk/status/1641880448061120513?s=20

JustSomeNobody 3y ago

Doesn't necessarily mean he didn't want it there in the first place. Why else would it be there?

XorNot 3y ago

Of course he did because it makes him look bad and he's desperate for praise and attention.

What he wanted was everything that feature provides, without it ever being shown that it's there. But since he refuses to hire PR people and almost certainly came up with this idea in the last few days, no one was paid to hide its existence.

The next story out of Twitter will be the remaining engineers being threatened because Musk can't see his tweet statistics any more.

maccam912 3y ago

Just removed https://github.com/twitter/the-algorithm/commit/ec83d01dcaeb...

AdamH12113 3y ago

I read the code snippet before I saw the link and thought you were joking, but yeah, there really is an author_is_elon flag right there in the main branch.

ibraheemdev 3y ago

> But we are deleting this bs. I only learned about it now! Will be gone by tomorrow.

https://twitter.com/elonmusk/status/1641908130274525187?t=5t...

minimaxir 3y ago

The full list of model features in that file is interesting.

I am surprised at the number of inherently redundant and collinear features, though (e.g. has_1_image, has_2_images, has_3_images, has_4_images).

qyph 3y ago

Those aren't redundant or collinear though? Maybe you are surprised they didn't encode this as an integer "num_images"? It is fairly common to one hot encode ordinal variables with only a few common/possible values this way.
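Here is a Scala sketch of the encoding qyph describes: an image count becomes one indicator per possible value. The feature names are the ones minimaxir lists. The `ImageCountFeatures` object and the cap of four (Twitter's per-Tweet image limit) are assumptions of this sketch, not code from the repo.

```scala
object ImageCountFeatures {
  // Assumed cap: a Tweet can carry at most four images.
  val MaxImages = 4

  // One boolean indicator per possible count; at most one is true.
  def encode(numImages: Int): Map[String, Boolean] =
    (1 to MaxImages).map { k =>
      val name = if (k == 1) "has_1_image" else s"has_${k}_images"
      name -> (numImages == k)
    }.toMap
}
```

A Tweet with no images has all four indicators false, so the columns do not sum to a constant and are not perfectly collinear with a model's intercept. Unlike a single `num_images` value, they also let each count get its own weight instead of forcing the effect to be linear in the count.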

hn2017 3y ago

Per Zoe: The code that was released today doesn't show the parts that actually alter the scores of Elon and other users. The part of the code referenced below just tracks Elon stats (from what I know). Employees removed most PII before the code was released.

https://twitter.com/ZoeSchiffer/status/1641902570921943044?s...

GaryNumanVevo 3y ago

I wonder who's on the "VIT" (Very Important Tweeter) list?

minimaxir 3y ago

There are a few: https://www.rollingstone.com/culture/culture-news/twitter-vi...

wahnfrieden 3y ago

People like Ben Shapiro, Glenn Greenwald, @catturd2

tinyhouse 3y ago

VITs are verified accounts.

schemescape 3y ago

Did they not expect people to notice suspicious code like this?

Or did they leave this in just so they could hold its removal up as an example of listening to the community?

commandlinefan 3y ago

At first I thought this post was a joke - and it was actually a pretty good joke. Yikes.

philistine 3y ago

Coded as if the only two political parties on the planet were the Rs and the Ds. Shameful.

bdw5204 3y ago

How hard would it be to replace this entire algorithm with the following pseudocode?

    if !user.follows_author(author) then
        don't show tweet on timeline
    else if tweet.timestamp is later than all other tweets then
        show tweet first

This is vastly superior to any other possible recommendation algorithm because users can choose what tweets they see/don't see by whom they follow and everybody has an equal chance to have their tweets seen by their followers. When Twitter moved away from this, it rendered my timeline useless so I started just pulling up people's profiles to read their tweets in order and eventually deleted my (pseudonymous) account that had several thousand followers. Almost nobody was seeing my tweets anyway thanks to this algorithm and deleting the account did not prevent me from browsing accounts I'm interested in.

All Elon needed to do to fix Twitter was to reverse all of the bad changes they've made since 2015 or so and restore the platform to what it was in the late 00s/early 10s.
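The follows-only, newest-first rule proposed above is short enough to sketch directly. This is a hypothetical Scala version; the `Tweet` type and `timeline` function are invented for illustration. It ignores pagination, retweets, and replies.

```scala
import java.time.Instant

object FollowingTimeline {
  // Hypothetical minimal tweet type for illustration.
  final case class Tweet(id: Long, authorId: Long, createdAt: Instant)

  // Show only tweets by authors the user follows, newest first.
  def timeline(follows: Set[Long], tweets: Seq[Tweet]): Seq[Tweet] =
    tweets
      .filter(t => follows.contains(t.authorId))
      .sortWith((a, b) => a.createdAt.isAfter(b.createdAt))
}
```

There is no learned score anywhere in this version. Ordering depends only on the follow graph and timestamps, so every followed author's tweets surface in order regardless of engagement.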

nelox 3y ago

It’s April Fools’ Day where I live.

iaseiadit 3y ago

If you buy a company for $44B and take it private, I for one say you should get your own flag.

pessimizer 3y ago

If anybody is actually reading this thread, it looks like twitter is using "author_is_democrat" and "author_is_republican" to evaluate "Community Notes."

As with all of the media outlets that elevate these two private clubs into the arbiters of truth, votes for Community Notes have to be relatively balanced between the two parties. Bipartisanship is a trash metric for determining truth, but absolutely none of the people raging at Musk in this thread would disagree with it.

steele 3y ago

Occam's Razor: engineers worried about their jobs (and potentially residency) appease a volatile narcissist as fast as possible.

cbeach 3y ago

The “author_is_elon” flag may have been assigned to him because Elon’s Twitter account has the most followers on the platform.

So, for technical / performance reasons, changes to the algos might want to be benchmarked against this account in particular, because it’s the account most likely to be at the centre of capacity- / load-related issues.

culi 3y ago

what is vits?

  private val DarkRequestAnnotation = "clnt/has_dark_request"
  private val Democrats = "democrats"
  private val Republicans = "republicans"
  private val Elon = "elon"
  private val Vits = "vits"

jdnordy 3y ago

The GitHub link now shows a warning at the top of the page:

> This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Is this new? Perhaps Twitter already removed the code from their main branch? Or was this just a joke from the beginning?

nothngssimpl 3y ago

„This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.“

Isn’t this top-of-the-page disclaimer relevant for you? It seems not to be part of the main branch.

adharmad 3y ago

Also this: https://github.com/twitter/the-algorithm/pull/160

gigglesupstairs 3y ago

Wait, am I missing something here, or is the author name clearly mentioned as elon here while Musk’s Twitter ID is @elonmusk? Why is everyone assuming this code is about Elon?

RoyGBivCap 3y ago

Elon just said in the space "that shouldn't be there. Consider it gone"

emehrkay 3y ago

why aren't these strings constants?

afrcnc 3y ago

As a European I find this very offensive

bikeformind 3y ago

I opened this thread just to verify this would be top comment, good job hn

koolba 3y ago

It's reassuring to know that billion dollar tech companies write CI exactly like I do:

https://github.com/twitter/the-algorithm/blob/main/ci/ci.sh

Permalink: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

gspencley 3y ago

It has been my personal experience, over 25 years in the industry, that often times the bigger the company the worse the code.

It's not an absolute rule, I've certainly inherited projects in a consulting capacity that were written by small teams and were atrocious. But more often than not, a small team working for a small company has fewer of the internal "forces" that incur "technical debt."

Those forces are things like

- Siloed teams working on a common code base in parallel but never talking to each other, thus duplicating code and having wildly different conventions

- Layers of middle management each with different management styles, leading to inconsistency and product-wide short-cuts

- Dealing with sudden success-induced scalability disasters that result in bandaid solutions

- More employee churn which means that the way we did things yesterday is not the way we're doing things today because someone new is in charge ... more inconsistency in code and software decisions

- More "old code." Companies very rarely do rewrites and when they do they're often failures. So the bigger the company, the more "legacy" spaghetti code typically because you don't fix what isn't broken (especially when the entire system is broken because it's one big giant mess that no one understands and yet somehow it actually works ... as long as we don't breathe on it or get a sudden surge of new account sign-ups).

dblitt 3y ago

It wouldn't surprise me if they had a script referencing internal build infrastructure that got gutted in the open source release

xmcqdpt2 3y ago

That's definitely what this is. Not a twitter employee but probably all internal projects have a ci.sh that runs on their internal CI infra and they just didn't feel like going through open source review for it.

agilob 3y ago

It's called Volkswagen CI

quickthrower2 3y ago

Joking aside, and assuming it got blanked out for security reasons, there is something nice about having CI be a single shell script rather than the proprietary YAML format of your favourite CI provider.

eyelidlessness 3y ago

Well for such a flawless CI setup I would expect a much longer commit history of “trying it again…” and “descending into madness…” and “might as well summon the eldritch…” and “oh no what have I done, …” and “there this should probably work, nothing unspeakable to see here!” and “oh no not again…”

toxik 3y ago

I can code golf that script

    #!/bin/true

Bam.

calvinmorrison 3y ago

Maybe it's weird, but for all the work I have ever done, I have never used CI/CD in the way that it was meant to be used, or never really leveraged it. Maybe all of my past jobs were unprofessional, but like, I see a lot of jobs using "CI/CD experience required" and I think... huh I wonder if they actually do it

sekai 3y ago

Same mortals as us

jmull 3y ago

Considering it statistically, likely on a lower plane.

sillysaurusx 3y ago

Say what you will about Elon, but this wouldn't have happened without him. Thanks!

And thank you to everyone at Twitter who helped organize this release. Open sourcing something like this is no small effort.

anigbrowl 3y ago

I am not sure about that. Twitter has open sourced a lot of stuff in the past. There were certainly people there who would run the site as a nonprofit public service if they had the choice.

sangnoir 3y ago

Twitter contributed a lot to the Map-Reduce, ETL and Scala communities: IMO they punched above their weight.

Sadly, I think their best open-source contribution days are behind them with all the hardcore engineering they now have to do with fewer engineers.

Edit: I forgot about Bootstrap! That project saved the world from millions of ugly web apps and dashboards built by clueless backend engineers.

drexlspivey 3y ago

Twitter had 18 years to publish their algorithm under the previous management and they didn’t.

BryantD 3y ago

This release seems less immediately valuable than their other contributions, but historically more significant. It’s a pity we don’t have commits although that would be a huge privacy issue.

But yeah, while I would never work for Elon I’m glad he did this.

nonethewiser 3y ago

Well, they didn’t

robopsychology 3y ago

But they haven't open sourced their recommendation engine in the past

zachnwhite 3y ago

sillysaurusx 3y ago

I'd be nothing without Twitter. It's had more impact on my life than any other platform. I got lucky, but luck was only part of it.

Being able to DM people is incredible. It's the AOL Messenger of 2023. If it went offline, it'd be a terrible loss.

hutzlibu3y ago

Politicians and companies all over the world use it. Controlling that information space is real power. And I am not yet clear how much needed transparency this release will bring, as the algorithm in production can have major tweaks.

kayodelycaon3y ago

The importance of Twitter was it being the primary posting location for a lot of things. A number of the artists and other creatives I know have gotten absolutely gutted by this.

hutzlibu3y ago

Judging by the many "issues" already, it might have been a bad idea to release on a Friday, though.

dawnerd3y ago

This is 100% not their working copy.

bitshiftfaced3y ago

I'm not connecting the dots. Why is it bad to release on Friday?

tech234a3y ago

I wonder what the "author_is_elon", "author_is_power_user", "author_is_democrat", and "author_is_republican" labels are for [1].

[1]: https://github.com/twitter/the-algorithm/blob/main/home-mixe...

montag3y ago

Elon is addressing this in the Twitter Space right now. "It definitely shouldn't be dividing people into Republican and Democrats; that makes no sense[...] you've identified something we should be getting rid of right away."

mseepgood3y ago

Does it make sense to divide people into Elon and not Elon?

fnimick3y ago

It's for content analytics, and I assume it's to make sure that changes to the platform can't be argued to bias one party over another.

Imnimo3y ago

As if Elon has a clue what that feature is or is not being used for.

franky473y ago

We should also not divide people into Elon and non-Elon.

Calzifer3y ago

Well, sounds like this pull request won't get merged. https://github.com/twitter/the-algorithm/pull/234

asddubs3y ago

but how can we be sure it isn't doing that?! first, we would need to figure out a way to identify who-

overthrow3y ago

I'd love to see the exact date author_is_elon was added. Too bad they didn't publish the commit history.

qbasic_forever3y ago

IIRC it was very recent. A Twitter engineer was fired after explaining to Elon that the algorithm was not biased against him: https://www.salon.com/2023/02/10/petulant-elon-musk-fired-tw... Almost certainly after that event Elon had them explicitly boost the reach of his tweets.

nijave3y ago

I think around mid February https://www.theverge.com/2023/2/14/23600358/elon-musk-tweets...

Someone12343y ago

Here is a screenshot in case this changes later:

https://i.imgur.com/F8GSeyH.png

And, no, this wasn't in a merge-request, it was in the "main" branch of HomeTweetTypePredicates.scala.

tantalor3y ago

What's all the "DDG"? Is this data from DuckDuckGo?

jaywalk3y ago

  *
  * These author ID lists are used purely for metrics collection. We track how often we are
  * serving Tweets from these authors and how often their tweets are being impressed by users.
  * This helps us validate in our A/B experimentation platform that we do not ship changes
  * that negatively impacts one group over others.
  *

From: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
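The comment describes the author lists as metrics-only: count how often each group's Tweets are impressed in each experiment arm, then check that a change doesn't hurt one group. A minimal sketch of that kind of check, assuming hypothetical group names, author IDs, arm labels and a hypothetical 20% relative-drop cutoff (none of these come from the repo):

```python
from collections import Counter

# Hypothetical author-ID lists; the real ones are not reproduced here.
AUTHOR_GROUPS = {
    "group_a": {101, 102},
    "group_b": {201, 202},
}


def impression_shares(impressions):
    """impressions: iterable of (arm, author_id) pairs, one per impressed Tweet.

    Returns {arm: {group: fraction of that arm's impressions from the group}}.
    """
    totals = Counter()
    by_group = Counter()
    for arm, author_id in impressions:
        totals[arm] += 1
        for group, ids in AUTHOR_GROUPS.items():
            if author_id in ids:
                by_group[(arm, group)] += 1
    return {
        arm: {group: by_group[(arm, group)] / totals[arm] for group in AUTHOR_GROUPS}
        for arm in totals
    }


def groups_hurt(shares, control="control", treatment="treatment", max_drop=0.2):
    """Groups whose impression share falls by more than max_drop (relative) in treatment."""
    hurt = []
    for group in AUTHOR_GROUPS:
        base = shares[control][group]
        new = shares[treatment][group]
        if base > 0 and (base - new) / base > max_drop:
            hurt.append(group)
    return hurt
```

Nothing in a check like this changes ranking; it only flags experiments for review, which is consistent with the "purely for metrics collection" wording.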

gregw1343y ago

So now engineers working on the algo can ensure their launches won't lower Elon's tweet visibility. Looks like those remaining at Twitter have a knack for corporate survival.

afavour3y ago

Still smells to high heaven to me. Not the Elon part, I don't really care about that. But collecting metrics about "republican" vs "democrat" sounds like a particularly bad set of priorities at work.

tech234a3y ago

That makes sense; I guess that means Elon is considered a "group" now.

darth_avocado3y ago

I would not be surprised if “author_is_elon” was added after he bought the company and worked the engineers too hard to figure out why his tweets don’t have a lot of engagement.

bilekas3y ago

All in service of 'anti-bias' of course... /s

sekai3y ago

Haha, that's pretty funny, of course that's a thing

tgv3y ago

Wait ... that was not a joke? And they actually removed it from the repo about 4 hours later? That doesn't look good.

spaceman_20203y ago

pretty sure Elon gets a boost in the algorithm. All okay - he's the owner of a private entity and can do as he pleases.

summarity3y ago

Main repos:

- https://github.com/twitter/the-algorithm

- https://github.com/twitter/the-algorithm-ml

Blogs:

- Eng: https://blog.twitter.com/engineering/en_us/topics/open-sourc...

- Biz: https://blog.twitter.com/en_us/topics/company/2023/a-new-era...

jmeister3y ago

Twitter spaces live right now: https://twitter.com/i/spaces/1jMJgLdenVjxL

rogerallen3y ago

"Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user."

I have spent significant effort creating a network and there you go choosing to ignore my efforts by putting in 50% of crap-I-don't-want-to-see.

That is why I despise your algorithm.

bluetidepro3y ago

> "Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user."

> I have spent significant effort creating a network and there you go choosing to ignore my efforts by putting in 50% of crap-I-don't-want-to-see.

> That is why I despise your algorithm.

This is just one feed (the "For You" recommendations feed). They also have the "Following" tab next to it that is 100% your network (what you want), and it remembers your selection when you switch between them (they fixed that a few months ago), so this is kind of a pointless thing to despise. It's just an option you can 100% avoid if you don't want to see it.

In fact, Twitter is probably one of the few left in the large social media space that actually gives you a 100% following-network feed (minus maybe ads) in chronological order that REMEMBERS your selection (Facebook, Instagram, and TikTok don't). Which makes this even sillier to say. Facebook, Instagram, and TikTok all do have in-network-only chronological feeds, BUT they are extremely hard to find, or don't remember your selection.

Hate of Twitter is easy to spoon out, but at least complain about things that aren't already solved for you.

vagabund3y ago

I'm not on twitter enough for the chronological feed to be appealing to me, and instead want to see the notable tweets from the accounts I follow since the time I last visited. There's no straightforward way to achieve this, but if anyone else has this preference, the workaround is to create a twitter list with all the accounts you follow and set it to show top tweets first.

Sebguer3y ago

If you try to use the Following tab on Android, every refresh brings you back to the For You tab.

rogerallen3y ago

I said I despised the algorithm, I did not say I hated Twitter. Now I at least know why I hate it.

Yes "Following" is what I use. The reason I use it is because of this algorithm that thinks I could possibly want 50% tweets that make me "engaged^H^H^H^H^Hraged". To me, that is a ridiculous mixture.

I'm happy they have a "Following" and I sure hope they keep it, but I will not be surprised if it goes away.

corbulo3y ago

I'm confused, then why not just use your 'followed' feed instead of 'for you'?

dbbk3y ago

Because the Followed feed is purely chronological. An interesting tweet from someone I follow could have happened 3 hours before I opened the app, and I would miss it.

This is why I preferred the old For You tab - it was (mostly) the people I had chosen to follow, but meant that I had the best content show up whatever time of day I opened the app. This is particularly important when I'm in the UK and most of the people I follow are in the US, so they're not tweeting generally at the same time I'm on the app.

glenstein3y ago

I'm also confused. You can still see everything you've manually curated.

bakugo3y ago

On the Android app at least, a recent update made it so pressing the home button while you're in the "followed" tab switches to the "for you" tab. It's extremely annoying.

JustSomeNobody3y ago

"Control Panel for Twitter" plugin. You can get rid of "For You".

anderspitman3y ago

I'm not opposed to social media feeds having complex recommendation algorithms. I just wish they allowed you to opt in to a reverse chronological feed of only people you follow, like RSS.

infogulch3y ago

Twitter has this now. The home page is split into two tabs: "For you", the algorithmic feed, and "Following", the reverse chronological feed of just who you follow.

madeofpalk3y ago

Twitter has always had a "chronological timeline" behind a confusing "sparkle" button (except for a brief period a few months back where they removed it, or always defaulted back to the algo timeline? and then restored it a week later).

They called it "Latest Tweets" https://web.archive.org/web/20200205092104/https://help.twit...

LiquidPolymer3y ago

On my “following” tab (on the phone app) , I’m still getting recommendations for bomb throwers I don’t follow. Am I weird? It’s like an unhinged relative. Not pleasant.

Edit: I reversed “for you” and “following” in my original reply.

conradfr3y ago

Is "Following" really the entire chronological feed? I feel like I miss tweets from people I follow that do appear in the "For You" tab.

spike0213y ago

It doesn't always stay on whatever you last used, though. I mostly use Following but it always inevitably ends up back on "For you".

BbzzbB3y ago

It always had it.

Edit: Why am I downvoted? It literally did, it even was named as you'd expect it ("sort by latest" or something), tho the location was less obvious as it was under the stars icon above the feed.

corbulo3y ago

It's disappointing the comments are so obsessed with the political angle to this that there's a total lack of appreciation (or discussion) of opening up the most influential social media platform in the world.

smt883y ago

This is transparency theatre, not actual transparency.

There's no way to actually use this limited release to understand how or why any tweet is boosted, so we're in exactly the same boat we were in yesterday.

nonethewiser3y ago

The funny thing is that angle owes itself to Elon coming through on his promise to open source this.

This is a great thing.

lawgimenez3y ago

Just read the article and not the comments. Comments here used to be where you learned new things; apparently that is not the case anymore.

yurodivuie3y ago

I'm sure we can all think of examples where a power structure (a company, a country, a prison, a family) invited people in for a supervised tour that was less than honest in its presentation.

But really, if people respond to Twitter's actions politically, that response exists within a context that was certainly influenced by Twitter's prior actions.

fanagra323y ago

"Opening up"? You must be kidding. Nothing is open there. It's just open-washing. A few nice diagrams, but how the services _actually_ work is still hidden.

mlindner3y ago

I had to go to the second page to find this. I completely agree. I clicked this thread looking forward to some intelligent discussion of the merits of the source and its issues and interesting tidbits; instead I see a bunch of Elon Musk ranting and complaining, and people pointing out poor drive-by pull requests and issues. I really wish the adults would start to talk.

SilverBirch3y ago

One of the things that makes my spidey sense tingle is when people say oddly sycophantic things about Elon Musk. Twitter is big, it's important. It's not "the most influential social media platform in the world".

HeckFeck3y ago

I have to admit I am geeking out whilst skimming source code I barely understand.

KyleBerezin3y ago

I would love for the system to be somehow auditable, to verify this algorithm is THE algorithm.

DuckFeathers3y ago

Attacks on Elon have been growing since he started calling out corruption.

danso3y ago

> Twitter has several Candidate Sources that we use to retrieve recent and relevant Tweets for a user. For each request, we attempt to extract the best 1500 Tweets from a pool of hundreds of millions through these sources. We find candidates from people you follow (In-Network) and from people you don’t follow (Out-of-Network).

> Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.

It would’ve been interesting to see what changes were made since Musk’s takeover. As someone who followed 5,000+ users, I know I never saw a tweet that wasn’t either from or retweeted by someone I followed; e.g. I never saw those “[user you follow] liked [someone you don’t follow]’s tweet” posts.

50%/50% in FYP seems to reflect my experience today — which is much worse, to the point that I’ll regularly switch to viewing by List b/c I miss seeing people who I want to read.

I wonder how much testing and analysis went into deciding on the 50/50 ratio — e.g. how does it impact user engagement and behavior. Because it sounds like an easy round value that you’d land on when thinking “users should be pushed out of their bubbles”
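The quoted blog text describes pulling the best ~1500 candidates from In-Network and Out-of-Network sources, with For You averaging a 50/50 split. One simple way to hit such a split is sketched below, assuming every candidate already carries a relevance score; the scores, the backfill rule and the `in_network_fraction` knob are assumptions, not Twitter's code, and the blog itself says the real mix varies per user.

```python
import heapq


def blend_candidates(in_network, out_of_network, total=1500, in_network_fraction=0.5):
    """Pick up to `total` Tweet ids, aiming for `in_network_fraction` from in-network.

    Each pool is a list of (score, tweet_id) pairs; higher score is better.
    """
    want_in = round(total * in_network_fraction)
    top_in = heapq.nlargest(want_in, in_network)
    top_out = heapq.nlargest(total - len(top_in), out_of_network)
    # If the out-of-network pool came up short, backfill from in-network.
    if len(top_in) + len(top_out) < total:
        top_in = heapq.nlargest(total - len(top_out), in_network)
    merged = sorted(top_in + top_out, reverse=True)
    return [tweet_id for _, tweet_id in merged]
```

A fixed fraction like this is exactly the kind of round number the comment wonders about: it is easy to state and to A/B test, but it ignores how many accounts a user already follows.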

coldcode3y ago

A year ago my account with 5,700 followers got an average of 3,000 impressions per post (art). Today it's only 200-500. The post mentions their fanout system was replaced by something new; I'm not sure when, or whether that's in the drop, but my impression count dropped around April-May last year. Clearly something decided my posts should not be shown to my followers very often.

cubefox3y ago

Perhaps if you did follow so many people they got drowned out, but with substantially fewer following, those recommended tweets were a big part of what I saw. Especially in the last year or so before Musk took over: Twitter went a lot more aggressive and didn't just show tweets which people you follow "liked", but also other tweets, which the algorithm somehow determined you might like, which was often wrong, and, moreover, so frequent that it made a big portion of the timeline. The "following" tab fixed this problem.

danso3y ago

Yep, having created a few throwaway accounts, I definitely got a sense of how the algorithm compensated for the majority of users who aren't super active. And it makes sense -- most new users aren't going to want to spend account creation picking 50 accounts to follow.

But if someone has hit the follow button 1,000+ times, it's reasonable to have some faith that they've seen a lot of tweets and know what they want. Showing a few out-of-network tweets seems reasonable (I got enough as it is through followings' retweets). But 50% of a feed that already can't fit tweets from thousands of followings just feels like shit.

The worst part is that the share of in-network tweets seems to be highly concentrated to the last 10 or so people I most recently interacted with, e.g. seeing the same user over and over just because I liked one of their tweets the other day. Which makes sense to save on computation costs, but it's pushed me into a much tighter bubble than I ever had when the timeline wasn't so out-of-network focused.

roddylindsay3y ago

  For ranking the candidates these predictions are combined into a score by
  weighting them:

  "recap.engagement.is_favorited": 0.5
  "recap.engagement.is_good_clicked_convo_desc_favorited_or_replied": 11* (the
  maximum prediction from these two "good click" features is used and weighted by
  11, the other prediction is ignored).
  "recap.engagement.is_good_clicked_convo_desc_v2": 11*
  "recap.engagement.is_negative_feedback_v2": -74
  "recap.engagement.is_profile_clicked_and_profile_engaged": 12
  "recap.engagement.is_replied": 27
  "recap.engagement.is_replied_reply_engaged_by_author": 75
  "recap.engagement.is_report_tweet_clicked": -369
  "recap.engagement.is_retweeted": 1
  "recap.engagement.is_video_playback_50": 0.005

Who set those weights, and why were they chosen?
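Per the quoted description, each candidate's predicted engagement probabilities are combined into a weighted sum, with only the larger of the two "good click" predictions counted at weight 11. A sketch of that arithmetic using the quoted weights; the probabilities in any example are made up, and production code may normalize or post-process scores in ways the quote doesn't show.

```python
PREFIX = "recap.engagement."

# Weights as quoted above.
WEIGHTS = {
    PREFIX + "is_favorited": 0.5,
    PREFIX + "is_negative_feedback_v2": -74,
    PREFIX + "is_profile_clicked_and_profile_engaged": 12,
    PREFIX + "is_replied": 27,
    PREFIX + "is_replied_reply_engaged_by_author": 75,
    PREFIX + "is_report_tweet_clicked": -369,
    PREFIX + "is_retweeted": 1,
    PREFIX + "is_video_playback_50": 0.005,
}
GOOD_CLICK_FEATURES = (
    PREFIX + "is_good_clicked_convo_desc_favorited_or_replied",
    PREFIX + "is_good_clicked_convo_desc_v2",
)
GOOD_CLICK_WEIGHT = 11


def score(predictions):
    """predictions: {feature: predicted probability}; missing features count as 0."""
    total = sum(w * predictions.get(name, 0.0) for name, w in WEIGHTS.items())
    # Only the larger of the two "good click" predictions contributes.
    good_click = max(predictions.get(name, 0.0) for name in GOOD_CLICK_FEATURES)
    return total + GOOD_CLICK_WEIGHT * good_click


def rank(candidates):
    """candidates: {tweet_id: predictions}; returns Tweet ids, best first."""
    return sorted(candidates, key=lambda t: score(candidates[t]), reverse=True)
```

The weights encode a strong asymmetry: a 1% predicted chance of a report (-3.69) outweighs a 90% predicted chance of a like (+0.45).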

bobbygoodlatte3y ago

  "recap.engagement.is_replied": 27
  "recap.engagement.is_replied_reply_engaged_by_author": 75

I wonder if this is why threads rank so obnoxiously high. They get artificially boosted by the author replying to their own tweet

localplume3y ago

Isn't that the author replying to a reply on their tweet? So it's promoting positive discussion, hence pushing the engagement higher?

Mehdi22773y ago

Having worked at similar companies on similar systems: usually it's A/B experiments. Also, the smaller the probability of an action, the bigger the weight it needs to matter much overall. The constants are generally set through A/B tests to get reasonable overall behavior, but they are a pain to tune and very unlikely to be optimal in any real sense, as it's often too difficult to search them extensively. Often I'll see a new target get a couple of different weights tried in an A/B test, and then maybe a second set of experiments once the rough magnitude is determined.

dmak3y ago

Could you link to the code on github?

d_sc3y ago

I think they have a bug here: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

Code:

  ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),

Should be:

  ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 10000))),
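A label named for 10k favs paired with a `>= 1000` check is the kind of mismatch a small lint can catch. A Python sketch, assuming `favCountV2` is an optional count (so a missing value never matches, as Scala's `Option.exists` behaves); `has_gte`, `threshold_from_name` and `mismatched_labels` are hypothetical helpers, not repo code.

```python
import re


def has_gte(fav_count, threshold):
    """Mirror `favCountV2.exists(_ >= threshold)`: a missing count (None) never matches."""
    return fav_count is not None and fav_count >= threshold


def threshold_from_name(label):
    """Parse the number a label promises, e.g. 'has_gte_10k_favs' -> 10000; None if absent."""
    match = re.search(r"gte_(\d+)(k?)_", label)
    if match is None:
        return None
    value = int(match.group(1))
    return value * 1000 if match.group(2) == "k" else value


def mismatched_labels(label_thresholds):
    """Labels whose threshold in code disagrees with the number in their name."""
    return [
        label
        for label, threshold in label_thresholds.items()
        if threshold_from_name(label) not in (None, threshold)
    ]
```

With the threshold as written, a Tweet with 5,000 favs would be tagged `has_gte_10k_favs`.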

dmak3y ago

They might be trying to preserve the previous tag label.

ryzvonusef3y ago

https://twitter.com/jarokrolewski/status/1641892148084629504

    > the main neural network part of @Twitter recsys algo is based on 2021 work of #SinaWeibo - Chinese clone of Twitter

interesting claim

ryzvonusef3y ago

Some more strange quirks:

https://twitter.com/Ben_Cary_/status/1641893540614623258

    > Twitter use to rank posts higher for  those who had more followers/less people they follow

    > They are removing that as of today but kinda interesting that someone with 10k/10k followers would get less reach than if they had 10k followers and only followed 6k
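The claim is that reach depended on an account's following-to-follower ratio. A minimal sketch of how such a penalty could work; the constants (minimum followings, ratio cutoff, penalty strength) are hypothetical and chosen only so the tweet's 10k/10k vs 10k/6k example lands on either side of the cutoff. This is not the repo's formula.

```python
import math

# Hypothetical constants, for illustration only.
MIN_FOLLOWINGS = 2500
MAX_RATIO = 0.6
PENALTY_STRENGTH = 5.0


def adjusted_reputation(base, followers, followings):
    """Shrink `base` when an account follows many more people than follow it back."""
    if followings <= MIN_FOLLOWINGS:
        return base
    ratio = followings / max(followers, 1)
    if ratio <= MAX_RATIO:
        return base
    # Exponential decay in how far the ratio exceeds the cutoff.
    return base / math.exp(PENALTY_STRENGTH * (ratio - MAX_RATIO))
```

Under these made-up constants, 10k followers / 6k following keeps the full score, while 10k / 10k is cut to about 14% of it.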

ryzvonusef3y ago

Some summaries I found online:

https://twitter.com/modern_mindset/status/164207843202770534...

    > Twitter algo is finally opensource.
    > • Twitter Blue 2x boosts 
    > • Likes have 30x comment value
    > • Links/mentions/names deboosts
    > • Retweets have 20x comment value
    > • Restrictions/suspensions deboost
    > • Images/videos/trending topics 2x boost

    > Will write a thread about it later. GM

https://twitter.com/petergyang/status/1642004729390858241

    > Twitter algo 101

    > Boosts
    > - Likes 30x
    > - Retweets 20x
    > - Twitter Blue 2-4x
    > - Trusted circle 3x
    > - Images/videos 2x
    > - Replies 1x

    > Negatives
    > - URL only
    > - No text
    > - Mute
    > - Block
    > - Unfollow
    > - Report
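Both summaries describe boosts as multipliers. If they are applied multiplicatively (an assumption; the summaries are unverified and already disagree, e.g. Blue "2x" vs "2-4x"), independent boosts stack. A sketch with hypothetical flag names and a hypothetical deboost factor, since neither summary gives one:

```python
# Multipliers loosely based on the unverified summaries above; names are hypothetical.
BOOSTS = {
    "author_is_blue": 2.0,
    "has_image_or_video": 2.0,
}
URL_ONLY_DEBOOST = 0.5  # hypothetical; the summaries give no number


def boosted_score(base_score, flags):
    """Apply every matching multiplier to base_score; flags is {flag_name: bool}."""
    score = base_score
    for flag, factor in BOOSTS.items():
        if flags.get(flag, False):
            score *= factor
    if flags.get("url_only", False):
        score *= URL_ONLY_DEBOOST
    return score
```

The point is only that per-factor numbers understate combined effects: two 2x boosts give 4x.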

ryzvonusef3y ago

https://twitter.com/Sandeeparuchuri/status/16419015979860172...

    > Part of twitter's algo Jack Dorsey, Katy Perry, Stephen Curry and Barack Obama as “testing accounts” for getting random Tweets for testing

varjag3y ago

> Rank each Tweet using a machine learning model.

This does a lot of heavy lifting here.

thieving_magpie3y ago

There appears to be a repo for the-algorithm-ml: https://github.com/twitter/the-algorithm-ml

simonsarris3y ago

This is pretty limited. I picked a term used in the diagram to see what I could find out about it. But there seems to be next to nothing in the released code about the mentioned "author diversity". No real code or description.

mardifoufs3y ago

I think the relevant part of the code is in this other repo:

https://github.com/twitter/the-algorithm

Not sure if it has what you were looking for (and maybe you already checked this repo, too!), but it's more relevant than the linked repo imo

crop_rotation3y ago

Wouldn't any such system depend on 10 other internal systems and 20 databases, directly or indirectly, each affecting the behaviour of the recommendation engine? That makes me doubt that studying such a recommendation engine is anything more than a purely academic exercise.

justrealist3y ago

Having anything public at all is wildly better than the nothing that is standard among social media companies.

Let's not focus criticism on an attempt to do something.

softfalcon3y ago

You’re probably right, but analyzing such things could still be useful for research.

I know that open source code around commenting online directly impacted the direction my current team went building our community tooling.

I’ll take even a glimpse into the machinations of any social media giant. It’s better than nothing!

sithlord3y ago

That's why it's "the algorithm", not the source of data/truth.

jonkneeOP3y ago

projects/home/recap/FEATURES.md has some interesting stuff:

https://github.com/twitter/the-algorithm-ml/blob/main/projec...

In realgraph you can see some of the things they keep track of, which include what you have in your address book, total time spent "dwelling" and a few other interesting nuggets.

motohagiography3y ago

While I would never install a platform app because I know what kinds of privacy controls some platforms have - seizing a graph of your phone, sms and email contacts (realgraph) to weight engagement is pretty egregious.

The minority of people who understood what this was already worked for platform companies and wanted to again, and the few who didn't but also knew how invasive this was could always be discredited as conspiracy theorists.

Ever wonder who else gets those graphs from platform companies? Today this is all interesting, but a couple of weeks from now when this all sinks in, I wouldn't be surprised if I were mad as hell.

paxys3y ago

Since this is what most people are going to want to see:

> We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.

rvz3y ago

So 12 days later, this [0] is a 'broken promise' isn't it?

[0] https://news.ycombinator.com/item?id=35214063

paxys3y ago

While open sourcing code is always great, and kudos to them for doing so, let's be real: most people didn't care about the internal plumbing of how their recommendation system runs. It's going to be a mess of decades-old code, microservices and ML pipelines, just like one would expect. If you want to dig deeper to check for biases (the reason they claimed to be open sourcing it in the first place), you will however run into:

which is a shame.

etc_passwd3y ago

Democrats / Republicans looks like it was added outside of SDLC [1]. This order without those features is sorted, likely by a linter, suggesting Elon and Vits are properly implemented, and Democrats/Republicans was just inserted alongside the Elon feature, perhaps just for this extract. Sorting it now results in a different order than the commit.

[1]: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

lenzm3y ago

Or Elon was the addition, the other 3 are in alpha order.
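Both comments infer which entry was inserted by asking what has to be removed for the rest to be in alphabetical order. That check can be done mechanically; the example lists are hypothetical, not the actual order in the file.

```python
def is_sorted(items):
    """True if items are in non-decreasing order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def insertion_suspects(items):
    """Indices whose single removal leaves the rest sorted (empty if already sorted)."""
    if is_sorted(items):
        return []
    return [i for i in range(len(items)) if is_sorted(items[:i] + items[i + 1:])]
```

When more than one index qualifies, ordering alone can't tell which entry was added. If several entries were inserted at once, no single removal restores order, and you'd need something like a longest-increasing-subsequence comparison instead.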

tric3y ago

GitHub repo: https://github.com/twitter/the-algorithm/

minimaxir3y ago

Notably, it's AGPL-licensed.

hooverd3y ago

I wonder how useful this is without the knowledge and tooling around deploying it.

rurp3y ago

That's my thought as well. Complicated systems like this rely on all sorts of related services and data stores. This seems like the sort of thing that sounds a lot more interesting than it is in practice. I would bet many non-technical people expect "The Algorithm" to be a straightforward and self-contained system.

jongjong3y ago

WTF is AuthorIsEligibleForConnectBoostFeature? I guess this may explain why some people seem to accumulate a lot of followers very quickly while all those trying to grow organically seem to struggle. You can imagine if a lot of people benefit from this Connect Boost feature, it would make it impossible for others to be noticed through the noise created by all of these boosted individuals. That's essentially what Twitter feels like ATM. Recently, I manually unfollowed anyone who I suspect may have received a special boost from the algorithms.

Me10003y ago

Squashing the commit history before releasing it was an interesting (and completely predictable) decision.

jkubicek3y ago

It doesn't seem particularly interesting? I would never make a formerly private repo public without first erasing the history. There's no upside to showing everyone your work in progress and almost unlimited downsides.

tapland3y ago

There’s no way everyone had the same weight in all the recommendation config files.

It’s not about hiding old work, but changes just before making it public.

mrguyorama3y ago

If they allowed you to git-blame the algorithm, some poor coder would have definitely gotten murdered by a crazy person who thought they purposely changed something to hurt them

hk__23y ago

> Squashing the commit history before releasing it was an interesting (and completely predictable) decision.

This is standard practice when it comes to open-sourcing such repos that were closed-source for years.

evantahler3y ago

So uh... they use BigQuery and here's the dataset https://github.com/twitter/the-algorithm/blob/main/ann/src/m...

rco87863y ago

So as expected, there is exactly nothing that favors posters from one side of the political spectrum. I don't expect that this article will do anything to calm down those who are convinced otherwise though.

Well written article, from an engineer's perspective.

vore3y ago

Well, it does say this:

   Ranking is achieved with a ~48M parameter neural network that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies). This ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement. We rank the Tweets from these scores.

This is basically the ultimate black box, so I don't think you can really conclude anything like this either way.
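The quote says the network outputs ten labels, each the probability of an engagement. How the output layer maps to probabilities isn't described; a per-label sigmoid is the standard choice for independent yes/no outcomes, so the sketch below assumes it. The label names are hypothetical, and nothing here reproduces the ~48M-parameter model itself.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def label_probabilities(logits, label_names):
    """Turn one output logit per engagement label into an independent probability.

    Sigmoid rather than softmax: a user can both like and reply, so the
    probabilities need not sum to 1.
    """
    if len(logits) != len(label_names):
        raise ValueError("expected one logit per label")
    return {name: sigmoid(z) for name, z in zip(label_names, logits)}
```

Even with the output layer understood, the learned weights that produce the logits are what decide behavior, which is the black-box point.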

bombcar3y ago

More like the ultimate hug box generator, that will quickly partition you into a self-reinforcing bucket.

waynenilsen3y ago

Shadowbanning was real and widely applied. That is the human part of the algorithm (manual mode) and it was very politically skewed

IngvarLynn3y ago

Algorithm exists and is non-trivial, therefore it favors those groups of posters that spend more effort to hack it.

Egoist3y ago

Aaaand the issues turned into a shitpost

HeckFeck3y ago

In fairness they could save some RAM by rewriting it in Rust 6 or 7 times.

quotemstr3y ago

Typically, we expect to be able to run "open source" software ourselves. If you open-source your C compiler, I can compile a C program with it. In a few recent high-profile cases though, companies have "open sourced" ML systems without releasing the model weights. This practice is just like releasing the build scripts for your C compiler, but not the compiler itself. While more transparency from social media will be enlightening, calling a release like this (or LLaMA) "open source" feels like equivocation. I'd love to see more full releases, weights included.

vonmoltke3y ago

Running this code would require a lot more than just the exported models. There are a large number of code and system dependencies missing.

quotemstr3y ago

Of course --- but without the model parameters, even stubbing those systems would be useless. My point is that while this release gives the public some information about how Twitter ranks tweets, it doesn't tell the story because huge pieces of "the algorithm" are missing. For example: the NSFW classifier "open source" release doesn't tell us anything about what Twitter considers NSFW and what it doesn't.

robopsychology3y ago

Why are there two spaces instead of four in this Python code, it hurts my soul

anigbrowl3y ago

Cost saving measure. This sort of emotionalism is why engineers need to be kept out of the C-suite.

robopsychology3y ago

How is it a cost saving measure? Or are you being sarcastic? Hard to tell over text!

SpEd3Y3y ago

Not sure if you're being sarcastic, but if you're serious, I'm pretty sure the OP is talking metaphorically. It's just a slight annoyance he's not "emotional" about it.

I also fail to see how someone who is annoyed by code that doesn't follow well established standards is somehow not a good fit in the C-suite.

brucethemoose23y ago

Space bloat.

holler3y ago

I guess they haven't read https://peps.python.org/pep-0008/#indentation

"Use 4 spaces per indentation level."

xdennis3y ago

Seems random. This file has both 2 and 4: https://github.com/twitter/the-algorithm/blob/main/trust_and...

paulddraper3y ago

I assume they copied it from Google.

https://www.quora.com/Why-does-Google-use-2-spaces-for-Pytho...

tayo423y ago

I think back in the day they copied Google's Python style guide.

Laaas3y ago

Praise where praise is due. Wasn't completely sure whether they would in fact release it or keep posturing.

sroussey3y ago

Does it show the part where it recommends Elon more than anyone else?

Chinjut3y ago

Perhaps that's related to this line. Though perhaps this is just used for observing metrics. https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

jrwr3y ago

That whole list is a hoot.

has_toxicity_score_above_threshold

is an interesting value; I wonder where the 0.91 was thought up.

devrand3y ago

I couldn't find anything specific to that, but I did find this blurb where they seem to explicitly track how often they're serving Elon's tweets for A/B testing experiments: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

ano-ther3y ago

This is one: https://github.com/twitter/the-algorithm/issues/121

Search for Elon gives this: https://github.com/twitter/the-algorithm/search?q=Elon&type=

jmholla3y ago

I think this PR is modifying the inputs to the methods that do it: https://github.com/twitter/the-algorithm/pull/17

dang3y ago

Url changed from https://github.com/twitter/the-algorithm-ml, which points to this.

abalaji3y ago

huh, legit open source too with 'Affero-GPL'

madeofpalk3y ago

AGPL is probably useless for any other site who'll want to use it, as it would require them to open source their site that uses it.

joeyh3y ago

Mastodon is conveniently also AGPL...

sho_hn3y ago

My main questions: Will these repositories be used in production by Twitter? Is this now the mainline, not a semi-regularly-synced mirror?

agluszak3y ago

Of course not

cubefox3y ago

Musk said that releasing the algorithm will initially be embarrassing, but that they will quickly update it. So it seems that means they intend to at least regularly publish newer versions.

Of course it could also be that they change their mind when spammers abuse the openness.

junto3y ago

Did anyone else notice this? I can’t even begin to imagine how many CPUs that would require and what the cost must be… just for a recommendation engine.

> The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150x the latency you perceive on the app.

mkj3y ago

5e9 * 220 / 3600 / 24 implies they are using about 12.7 million CPU cores continuously? That seems nearly implausible, but perhaps it's true?
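The back-of-the-envelope estimate can be reproduced from the two figures quoted from the blog (5 billion runs per day, 220 CPU-seconds per run), under the simplifying assumption that load is spread evenly over 24 hours. Real fleets need headroom for peaks, and a "CPU second" need not mean a full physical core, so treat this as an order-of-magnitude check only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_busy_cores(runs_per_day, cpu_seconds_per_run):
    """Cores that would be 100% busy around the clock to supply this much CPU time.

    Assumes load is spread evenly over the day.
    """
    return runs_per_day * cpu_seconds_per_run / SECONDS_PER_DAY


def implied_parallelism(cpu_seconds_per_run, wall_seconds_per_run):
    """How many cores a single run keeps busy on average while it executes."""
    return cpu_seconds_per_run / wall_seconds_per_run


print(round(average_busy_cores(5e9, 220)))  # 12731481
print(round(implied_parallelism(220, 1.5), 1))  # 146.7
```

The second figure is the blog's "nearly 150x": each request fans out to roughly 147 cores' worth of work in parallel.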

froggychairs3y ago

Why is nobody pointing out that this is likely an April Fools joke? We just deployed our April Fools joke into production today too.

nabakin3y ago

I fell for it too until a friend pointed it out. I wonder why it's working so well.

Edit: hi friend

froggychairs3y ago

Lmao

endorphine3y ago

Yeah this confused me a lot while reading the comments here. I wonder what percentage of the comments are trolling vs. fell for it vs. think it's legit.

Perhaps this calls for an HN poll...

thumbsup-_-3y ago

The barebones README makes me feel this repository was open-sourced against the wishes of its engineers, by top-down directive.

firstSpeaker3y ago

How so? More details and reasoning?

matesz3y ago

It is really nice to see how Bazel is used in the wild. It looks so clean. Why aren't we using it for everything?

mort963y ago

I wouldn't want to use a build system written in Java for non-Java code. Adding the whole JVM as a dependency just for the build system isn't worth it.

ryanisnan3y ago

I want to go back to a world where there isn't an algorithm feeding me what someone "thinks" I want to read.

I want to see a chronological list of what the sources I follow have posted.

Yes, I understand you can do this on Twitter still, but I would guess most people are more influenced by "the algorithm".

Weidenwalker3y ago

I visualized this codebase here: https://codeatlas.dev/github/codeatlasHQ/the-algorithm/main

Maybe this is helpful to anyone for navigating what's in there!

stusmall3y ago

I thought it was an April Fools joke when I saw this: https://github.com/twitter/the-algorithm/blob/main/ci/ci.sh

Like a dig at the code quality.

endorphine3y ago

Is it not?

bagels3y ago

"Written by the Twitter Team"

I found it interesting that there is no attribution. Most other companies list the authors on engineering blogs (eg. Facebook, Uber, etc.)

This topic seems to draw the attention of unhinged people, so I suppose I wouldn't want my name on it either.

f38zf5vdt3y ago

No one wants to go to jail for Elon, who has been flagrantly violating FTC orders.[1] There's a good chance the commit history and authors may attest to that.

https://thehill.com/policy/technology/3928219-musk-was-denie...

AlbertCory3y ago

I haven't read the "algorithm" and this observation might be seriously out of date, but:

For Google Ads, you couldn't easily know which ads would be shown for a given query without a whole lot of data that isn't contained in any code: the experiment settings in the server, for one thing, and the user who's doing the query, for another.

An "experiment" could apply to 100% of the traffic, so it's not really an experiment anymore. And even if you think X has been put into production, there is still a "holdback" experiment, where some part of the traffic does not get X applied to it.
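A minimal sketch of the kind of deterministic, hash-based traffic split with a holdback described above. The bucket count, the SHA-256 hash, and the names `bucket`, `arm` and `holdback_pct` are all hypothetical choices for illustration, not Google's or Twitter's actual experiment framework. It shows why the code alone doesn't tell you what a given user sees: the outcome also depends on a config value and the user's ID.

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical granularity: each bucket is 0.1% of users


def bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket for one experiment.

    Salting the hash with the experiment name keeps assignments
    independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def arm(user_id: str, experiment: str, holdback_pct: float) -> str:
    """Return 'holdback' for a fixed slice of users and 'treatment' otherwise.

    In this sketch holdback_pct stands in for server-side experiment config:
    0 means the feature is fully launched, while a small non-zero value
    keeps a control group that never sees the feature."""
    cutoff = int(NUM_BUCKETS * holdback_pct / 100)
    return "holdback" if bucket(user_id, experiment) < cutoff else "treatment"


users = [f"user{i}" for i in range(10_000)]
held_back = sum(arm(u, "feature_x", holdback_pct=1.0) == "holdback" for u in users)
print(f"{held_back / len(users):.2%} of users held back from feature_x")
```

Because the assignment is a pure function of the user ID and experiment name, a user stays in the same arm across requests without any stored state.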

NicoJuicy3y ago

Are they measuring whether people get more Republican posts? Because I'm getting a ton of those, which I constantly need to mute and block (mostly dumb remarks).

And I don't even live in the US.

It would explain why they are tracking it, to increase visibility.

cmckn3y ago

Including the search engine itself in “the algorithm” repo is an interesting choice. Obviously it’s a major player in what gets returned to clients, but the details of that infrastructure aren’t really relevant, and they are a notable portion of their secret sauce.

https://github.com/twitter/the-algorithm/tree/main/src/java/...

rblion3y ago

First thing I would like to see gone is business bros sharing 'guides' after you follow them, threatening to start charging real soon. Go fuck yourself, get a real job.

oulu20063y ago

<tongue-in-cheek> Didn't Twitter already open-source their code?

https://www.databreachtoday.com/twitter-says-source-code-lea...

vonwoodson3y ago

Folks talk about media bias: Twitter popularity is a media bias. It’s the laziest kind of journalism to write a “news” article about what Kim, or Don, or Elon’s PR team tweeted. But however “social” this media may be, Twitter is a one-way street. No one is actually responding to or interacting with Tweets. It’s just a comment section for flame bait.

Maybe we’ll all get lucky and Elon will cause Twitter to go away forever.

amq3y ago

Surprised no one mentioned this:

    s.SpaceSafetyLabelType.MedicalMisinfo -> MedicalMisinfo,
    s.SpaceSafetyLabelType.GenericMisinfo -> GenericMisinfo,
    s.SpaceSafetyLabelType.DmcaWithheld -> DmcaWithheld,
    s.SpaceSafetyLabelType.HatefulHighRecall -> HatefulHighRecall,
    ...
    s.SpaceSafetyLabelType.UkraineCrisisTopic -> UkraineCrisisTopic,

https://github.com/twitter/the-algorithm/blob/ec83d01dcaebf3...

WinstonSmith843y ago

Yes, this thread is particularly interesting

https://twitter.com/aakashg0/status/1641976869460275201

Speaking about Ukraine, it seems to be literally a Twitter policy violation ... https://github.com/twitter/the-algorithm/blob/main/visibilit...

frob3y ago

Well that was a giant nothing-burger. This seems to be your standard ranking stack. We find candidates based on who you follow, who they follow, who is trending, and what we think you like. We then rank them based on how likely you are to engage with them and continue to come back and give us money via our subscription service and ad views. We then try to remove spam and other negative experiences.

Where's the beef?

Reason0773y ago

One flaw I've noticed in Twitter's recommendations recently is the tendency to send notifications for "BREAKING NEWS"-type Tweets. Great, except they're usually for news that happened in the past - typically 12-24 hours ago!

The algorithm really needs to recognise when tweets are time-sensitive and not recommend them just because they got a lot of engagement the previous day!

pram3y ago

I wonder what determines 'cred' for this part:

https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

pram3y ago

I answered my own question https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

"This method reduces the page rank of users who have a low number of followers but a high number of followings."
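The quoted comment describes the behavior but not the formula. A minimal sketch of one way such a penalty on a PageRank-style user score could work; the function name, the 500-following minimum, the 0.6 followers-to-following threshold and the 0.1 floor are all invented for illustration and are not the repository's actual logic.

```python
def adjusted_pagerank(pagerank: float, followers: int, following: int,
                      min_following: int = 500,
                      ratio_threshold: float = 0.6,
                      floor: float = 0.1) -> float:
    """Hypothetical penalty for accounts that follow many but are followed by few.

    All three parameters are invented for illustration; the repository's
    actual formula and constants are not reproduced here."""
    if following < min_following:
        return pagerank  # too few followings to judge the account
    # With a positive min_following, `following` is > 0 here.
    ratio = followers / following
    if ratio >= ratio_threshold:
        return pagerank  # follower count keeps up with followings: no penalty
    # Scale down in proportion to how far the ratio falls below the threshold,
    # but never below `floor` times the original score.
    return pagerank * max(ratio / ratio_threshold, floor)
```

Under these made-up constants, an account following 1,000 users with 300 followers keeps half its score, and one with no followers keeps 10%.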

anigbrowl3y ago

Heh, I knew it. You need to prune your own following list regularly or become less and less visible. I suspect (but have yet to check) that they also weight visibility in terms of historical follower growth.

That's why you see so many trolls with very low follower counts; it's more effective to make/purchase a new firstname-bunchanumbers account and poop in people's replies than to let Twitter decide placement based on historical factors.

beebmam3y ago

I don't use Twitter, but this is awesome. I hope this will help more people realize how complex it is to build and operate web services.

belter3y ago

Unless a trusted third party forensically audits Twitter, there is no guarantee that the published code corresponds to the code actually running in production. Also, multiple parts are missing, as the blog itself acknowledges.

This should be seen as a possible snapshot of some code, that might have run, might run in the future, or is possibly running in some parts of the production infrastructure at Twitter.

kossTKR3y ago

I've pretty much ignored all of the superficial political theatre but noticed the actual algo worsening over the last 6 months.

I get way too much random crap now: promoted tweets, "things that might interest me", tweets from users who never used to show up in my feed, etc.

Twitter seems to go in the direction of all other social media, feeds that are 100% digital crack with no way to control your media diet.

HAL30003y ago

Expect to see A LOT more spam on Twitter after this release. It's like giving SEO spammers access to Google's search ranking algorithm.

dmix3y ago

Stuff like this always has consequences, but that doesn’t mean it’s a net negative for society. It means you need to adapt and actually fix the problems, while also benefiting from the added accountability.

That’s always been a risk of open source and not being hyper-centralized.

muratsu3y ago

Given the complex relationship between advertisers, platform, and users I don't know if any meaningful contribution can be made to the algorithm without pissing anyone off. The following tab already gave people who're not interested in algo recommendations a way out. I don't quite understand the reasoning behind open sourcing the algorithm. Any thoughts?

WhereIsTheTruth3y ago

Is it even what they use in production?

There is code that favors Elon's tweets, so I'd say yes, that's probably what they use.

sho_hn3y ago

Humorous conspiracy theory: Imagine if it is not, but sanitized, and then someone added in Elon Boost to make it look credible. :-)

WhereIsTheTruth3y ago

Or perhaps it does nothing at all and was put there so we'd talk about it. The "is_democrat"/"is_republican" flags are also ridiculous, as if the goal were to demonstrate a point about social media in general, hmm

0l3y ago

> There is code that favors Elon's tweets, so I'd say yes, that's probably what they use.

Where?

zaroth3y ago

Spoiler - there isn’t.

anshumankmr3y ago

Oh god... The MRs opened today are the craziest ever. https://github.com/twitter/the-algorithm/pulls?q=is%3Apr+is%...

They made my morning

perceptronas3y ago

It seems most of the code in the repository is just simple Scala. Codebase is easy to read and understand.

I don't see any Typelevel stuff. This probably lets them hire and train engineers faster while still gaining most of the benefits

I hope this will encourage more companies to pick Scala.

evntdrvn3y ago

It would be super interesting if, when logged in to Twitter, you could look at your current calculated scores/weights for all the params that are part of these algorithms, similar to the Netflix "Stats for nerds" menu...

jerrygoyal3y ago

> The goal of our open source endeavor is to provide full transparency to you, our users, about how our systems work

The majority of users didn't ask for this, so I'm not sure what the exact motive behind their efforts is. It could be a PR stunt.

ouraf3y ago

Honestly, there's too much garbage in the code dump they made.

Maybe a UML diagram, or even a presentation or written guide on how they measure and apply each weight or group policy, would make it easier to form a solid take on how it works.

WA3y ago

Will this make it easier to game the algo or does it depend so heavily on individual user interaction that it’s close to impossible to game it? For example, by carefully crafting Tweets or by buying likes/retweets etc?

tcmart143y ago

Repo has 1.5% Rust code and no

  author_is_uwu

That is the biggest problem.

wslh3y ago

It's the data, stupid [1] (not the algorithm).

[1] https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid

13years3y ago

A feature proposal to put you in control of the algorithm

https://github.com/twitter/the-algorithm/issues/1363

rvz3y ago

If Twitter was 'dead' why on earth are we still talking so much about this blue bird site?

It looks like, once again, the people predicting that he wouldn't open-source the algorithm are going to start eating their words [0], just as they did after incorrectly predicting Twitter's immediate collapse [1], and they will look at the source code anyway and keep talking about "Twitter".

If Twitter can open-source their algorithm, why not TikTok? Either way, the bots are now going to have a very expensive time on Twitter.

[0] https://news.ycombinator.com/item?id=35213213

[1] https://news.ycombinator.com/item?id=33701371

anigbrowl3y ago

Are you kidding me, running a botnet is easier than it has been in years if you're that way inclined. The amount of spam I see has gone way up over the last 6 months.

rvz3y ago

> running a botnet is easier than it has been in years if you're that way inclined.

Even if it is 'easier', the bots are identified, down-ranked straight to the bottom and shadow-banned to invisibility. It is essentially evaporating money and time.

> The amount of spam I see has gone way up over the last 6 months.

Yeah. The spam has gone way up into smoke over the last 6 months. It is only going to get more expensive to spam as soon as the paid changes come in.

hotpathdev3y ago

The issue tracker and pull requests are being hit with very funny suggestions. Many people suspect this is an April Fools joke. It's possible this entire repo was generated by an LLM to appear plausible.

I especially like the suggestions to rewrite the algorithm in Rust [1] and this pull request, which simplifies the algorithm to a single C file [2].

[1] https://github.com/twitter/the-algorithm/issues?q=is%3Aissue... [2] https://github.com/twitter/the-algorithm/pull/712

say_it_as_it_is3y ago

And yet they require their software engineer applicants to be well versed in algorithms and data structures? These tech company managers know nothing about how the sausage is made.

mkl953y ago

I couldn't care less about Twitter's high level abstractions. They were never renowned for those. Their database schemas and infrastructure on the other hand...

javajosh3y ago

Is there demand for a service that simply shows you the things the people you follow wrote? (It would be up to you not to follow so many people that you can't keep up.)

woolion3y ago

So, the day after the headline that Twitter is artificially promoting polarizing political voices, Twitter open-sources their algorithm!

What does the commit history say? There are 3 commits, like a very very real programming project. The issues and pull requests show how much people are fooled by this very transparent move.

So this is an obvious attempt at a digital Potemkin village that, like the real one, does a poor job of hiding the truth. Elon does not want to upset the apple cart (political, economic, or ideological) but to make his followers believe in it, and so we get this. Great spectacle, if that's what you're interested in.

Reptur3y ago

They didn't open-source the data that the abuse, toxicity, and NSFW censoring algorithms check against, so I'd call it a partial open-sourcing.

cwkoss3y ago

The twitter algorithm sucks balls and heavily overweights who's paid for a checkmark.

The default feed view has grown increasingly useless over the past ~6 months.

lhnz3y ago

I don't think any changes to bias towards bluechecks have been made yet.

pledess3y ago

There may be a hint of which elections were of interest:

https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

jml23y ago

  (
    "has_toxicity_score_above_threshold",
    _.getOrElse(EarlybirdFeature, None).exists(_.toxicityScore.exists(_ > 0.91))
  )
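For readers less used to Scala's `Option`, the expression above is true only when the Earlybird feature is present, carries a toxicity score, and that score is strictly greater than 0.91; missing data counts as false. A minimal Python sketch of the same predicate, where `EarlybirdFeatures` and its `toxicity_score` field are hypothetical stand-ins for the real types:

```python
from dataclasses import dataclass
from typing import Optional

TOXICITY_THRESHOLD = 0.91  # value taken from the quoted snippet


@dataclass
class EarlybirdFeatures:
    # Hypothetical stand-in for the Earlybird feature record; only the
    # field used by the snippet is modelled.
    toxicity_score: Optional[float] = None


def has_toxicity_score_above_threshold(features: Optional[EarlybirdFeatures]) -> bool:
    """Mirror of the Option chaining in the snippet:
    a missing record or a missing score both yield False."""
    if features is None or features.toxicity_score is None:
        return False
    return features.toxicity_score > TOXICITY_THRESHOLD
```

Note the strict comparison: a score of exactly 0.91 does not trip the flag.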

jml23y ago

`if (sourceUserId.isDefined || sourceUserId.isDefined) Some(true)`

https://github.com/twitter/the-algorithm/blob/main/timeliner...

infamouscow3y ago

I'm glad to see this is licensed AGPL. I hope this sets a precedent for everyone else in the space to do the same.

abdnafees3y ago

I think it's April fools. It's a joke at the expense of open source and should be taken down ASAP.

jeffbee3y ago

Why does anyone use "for you"?

kzrdude3y ago

A plain follow stream is a firehose of mundane messages: messages from everyone you follow, sorted most recent first.

If some people you follow are more important than others (family), the stream doesn't care, and you get bogged down by less important messages.

I think some "algorithm" is necessary, but people will disagree on the balance. (Unfortunately, it's in Twitter's interest to push all kinds of random shallow stuff and get people addicted to it.) I hope Mastodon can provide some flexibility and customizability in terms of what the mix between recent and likely-to-be-interesting should be, and what interesting means to you.
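The trade-off described above, a plain reverse-chronological stream versus one that gives some authors more weight, can be sketched in a few lines. Everything here is hypothetical: the `Post` type, the per-author weights and the 6-hour half-life are illustrative choices, not how Twitter's "For you" or Mastodon actually rank posts.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    timestamp: float  # seconds since epoch
    text: str


def chronological(posts):
    """Plain follow stream: newest first, every author treated equally."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)


def blended(posts, now, author_weight, half_life_hours=6.0):
    """Rank by author importance multiplied by an exponential recency decay.

    author_weight maps author -> weight (default 1.0); a post loses half
    of its recency score every half_life_hours."""
    def score(p):
        age_hours = (now - p.timestamp) / 3600
        recency = 0.5 ** (age_hours / half_life_hours)
        return author_weight.get(p.author, 1.0) * recency
    return sorted(posts, key=score, reverse=True)


now = 100_000.0
posts = [
    Post("family", now - 4 * 3600, "dinner photos"),
    Post("acquaintance", now - 600, "lunch"),
]
print([p.author for p in chronological(posts)])  # ['acquaintance', 'family']
print([p.author for p in blended(posts, now, {"family": 5.0})])  # ['family', 'acquaintance']
```

The half-life is the knob for "recent vs. likely to be interesting": a very short one approaches the chronological stream, a very long one ranks almost purely by author weight.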

Not that I use twitter much, but since it became clear that Elon made sure to promote himself in the algorithmic feeds, I've avoided "For you" anyway since I don't accept that in my mix of messages.

dbbk3y ago

Because I'm not on Twitter 24 hours a day. I want a recap of the best stuff that's happened since I last opened it. If I use the Following tab, I'm only getting the realtime firehose, meaning if the best stuff happened an hour ago I'd miss it.

suddenclarity3y ago

To see what people talk about in your friends circles. It can be interesting in moderation. Similar to skimming the frontpage of HN or recommended list on YouTube. Especially during major news event when your friends might not be the ones posting about it.

teach3y ago

Probably the same reason some people browse /r/all on Reddit. I think the desire for that sort of thing has waned a lot over the past couple of years, though.

whalesalad3y ago

Two space indent in .py? Provocative.

m11173y ago

As I understand, they open sourced only the abstraction, but still have a way to control anything.

capableweb3y ago

I'm no fan of either Twitter or Elon Musk, but this is a great move, and I hope other companies follow Twitter's lead and start open-sourcing more core parts like this. It may be mostly useful for learning how things work rather than for direct use in your own product, but the amount of transparency it gives users cannot be overstated. That is, as long as this actually is the code they run, which no one but Twitter can verify.

cubefox3y ago

I think it mainly helps with accountability regarding free speech. They did, and still do, several kinds of shadow banning and down-boosting to combat spammers, which always produces some false positives. If the algorithm is published, you can at least better judge and argue when you are unfairly "silenced", since this may be due to an avoidable flaw in the algorithm rather than accepted collateral damage.

firstSpeaker3y ago

Will it be developed in the open as well, or will there be frequent merges from their internal repos?

dools3y ago

And yet my Twitter feed was always so boring.

Reminds me of the Sirius Cybernetics Nutri-matic drinks machine.

systemvoltage3y ago

Astounding amount of cynicism here, so I'll say something positive: transparency is undoubtedly important, and I'm glad we can see how all of this works and what sort of effort goes into building a social media system. It's licensed under GPL, which is a bummer (I would have preferred BSD), but it's better than nothing.

sho_hn3y ago

Assuming anything in this codebase is worth reusing, I'm glad it's GPL. It's a case where I'd like open-first to spread.

systemvoltage3y ago

GPL would be good if it is a self contained library. If anyone would use it, it would be small portions of it, but GPL makes it completely useless. You can't contaminate anything with it. We'll stare at it, that's about it.

That makes me think, this is actually a good call. Twitter can claim that they have complete transparency while not allowing anyone to touch their code (because it is GPL). "Anyone" being future competitors. If it was BSD licensed, it'd be tremendously useful in building a Twitter competitor (on paper, you still need network effects, I am just spitballing to make a point).

TMWNN3y ago

>Astounding amount of cynicism here

You can tell that those who rushed in to find something to criticize can't, when they are reduced to making jokes about coding stylistic conventions.

systemvoltage3y ago

Yea I mean, all discussions about Twitter have double standards. If this was literally any other company, there would be resounding praise.

bastardoperator3y ago

My favorite is ci/ci.sh

  #!/bin/sh

  exit 0

inparen3y ago

Issue list is growing rapidly for a repo created an hour ago.

ThalesX3y ago

Non-issues most of them:

- author_is_elon: the problem is his tweets suck. stop recommending them.

- Include 'who viewed my profile' option in twitter

- Only one commit on repo

- How do I use it?

- Cool

- allow "AI" to tweet and like tweets on your behalf

- IMPORTANT: Guys please keep this place for real bugs and contributions,

etc...

kilianinbox3y ago

Summary so far:

- Code from Twitter's algorithm GitHub repository shared

- Algorithm checks for specific author types (e.g., Elon Musk, power users, Democrats, Republicans)

- Author ID lists used for metrics collection in A/B experimentation platform

- Metrics tracked in A/B tests to avoid negative impacts on specific groups

- VIPs like Musk, LeBron James, AOC used as indicators for algorithm's behavior

- Algorithm changes that negatively affect Musk unlikely to go live

- Speculation about code changes pre- and post-Elon's purchase of Twitter

- Discussion on the importance of measuring and testing for potential biases

- Debate on moral decisions in the context of Twitter's algorithm and content moderation
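The summary mentions author ID lists being used to slice metrics in A/B experiments. A minimal sketch of what per-group metric slicing could look like, assuming a flat log of `(author_id, arm, engaged)` impressions and hypothetical group lists; `per_group_lift` is an illustrative name, not the repository's implementation.

```python
from collections import defaultdict


def per_group_lift(impressions, author_groups):
    """Relative change in engagement rate (treatment vs control) per author group.

    impressions: iterable of (author_id, arm, engaged), with arm in
    {"control", "treatment"} and engaged a bool.
    author_groups: dict of group name -> set of author ids (hypothetical lists).
    """
    # counts[group][arm] = [engagements, impressions]
    counts = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for author_id, arm, engaged in impressions:
        for group, members in author_groups.items():
            if author_id in members:
                counts[group][arm][0] += int(engaged)
                counts[group][arm][1] += 1

    lifts = {}
    for group, arms in counts.items():
        c_eng, c_n = arms["control"]
        t_eng, t_n = arms["treatment"]
        if c_n == 0 or t_n == 0 or c_eng == 0:
            continue  # not enough data to compare this group
        control_rate = c_eng / c_n
        treatment_rate = t_eng / t_n
        lifts[group] = (treatment_rate - control_rate) / control_rate
    return lifts
```

An author can sit in several lists, so one impression may count toward more than one group; a strongly negative lift for one group is the kind of signal that could hold back a launch.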

bluelightning2k3y ago

Late to the party here so unlikely anyone sees this comment. But the double take for me was seeing the article end with "if this sounds interesting to you, come join us!"

benatkin3y ago

Party in the issues: https://github.com/twitter/the-algorithm/issues

paulddraper3y ago

> 1.4k forks

Wow, we're getting some collaboration going!

Thaxll3y ago

Let's dig into Twitter code quality.

Kpourdeilami3y ago

https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

```
def query_keys(self, language, task=2, size="50"):
    if task == 2:
      if language == "ar":
        self.query_settings["adhoc_v2"]["table"] = "..."
      elif language == "tr":
        self.query_settings["adhoc_v2"]["table"] = "..."
      elif language == "es":
        self.query_settings["adhoc_v2"]["table"] = f"..."
      else:
        self.query_settings["adhoc_v2"]["table"] = "..."
      return self.query_settings["adhoc_v2"]
    if task == 3:
      return self.query_settings["adhoc_v3"]
    raise ValueError(f"There are no other tasks than 2 or 3. {task} does not exist.")
```
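One common way to flatten an if/elif chain like this is a lookup table with a default. A minimal sketch; the table names are hypothetical placeholders because the real ones are redacted as "..." in the repo, and the wrapping `QuerySettings` class is invented so the method can run standalone.

```python
# Hypothetical table names: the originals are redacted as "..." in the repo.
ADHOC_V2_TABLES = {
    "ar": "adhoc_v2_table_ar",
    "tr": "adhoc_v2_table_tr",
    "es": "adhoc_v2_table_es",
}
DEFAULT_ADHOC_V2_TABLE = "adhoc_v2_table_default"


class QuerySettings:
    """Hypothetical holder for the query_settings dict used by the snippet."""

    def __init__(self):
        self.query_settings = {"adhoc_v2": {}, "adhoc_v3": {}}

    def query_keys(self, language, task=2, size="50"):
        # `size` is kept only to match the quoted signature; it is unused there too.
        if task == 2:
            # Like the original, this mutates and returns the shared settings dict.
            settings = self.query_settings["adhoc_v2"]
            settings["table"] = ADHOC_V2_TABLES.get(language, DEFAULT_ADHOC_V2_TABLE)
            return settings
        if task == 3:
            return self.query_settings["adhoc_v3"]
        raise ValueError(f"There are no other tasks than 2 or 3. {task} does not exist.")
```

Adding a language then becomes a one-line change to the table rather than another branch.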

tentacleuno3y ago

Looking through it, the ... seems to be a placeholder for information they'd prefer to be kept private. For example, look in the keywords section in the same file you shared.

drakonka3y ago

Is this not an April Fools joke?

throwaway6892363y ago

It's better than nothing.

diebeforei4853y ago

Kudos for open-sourcing this.

voz_3y ago

hmmm https://github.com/search?q=repo%3Atwitter%2Fthe-algorithm-m...

Twitter hmu if you need help trying PyTorch 2.0 ;)

bluish293y ago

I wonder whether it will ever be possible to find out the values of `author_is_power_user`, `author_is_democrat` and `author_is_republican` for your own account. Does GDPR help with that? Probably not, since they may only compute these for people inside the US, so it wouldn't involve the EU anyway.

bilekas3y ago

I'm supposed to be going out in 20 mins....

throwayyy4790873y ago

You gotta hand it to Elon - he actually did it.

minimaxir3y ago

If you look at the GitHub repo, most of it is READMEs describing systems, not the models or code subtleties which actually explain how certain weird behaviors on Twitter happen (e.g. the preference for certain users in the For You tab. EDIT: bad example, since there appears to be a flag for that in the code, although it does not specify which users are on the list).

mquander3y ago

The links in the README just go to other documents, but the repo seems to have most of the code for the components the documents are describing.

jonahbenton3y ago

LOL. My algorithm at twitter had been very simple-

See tweets from people I followed.

Don't see tweets from people I didn't follow.

Trust people I follow in their retweets to signal something interesting.

Unfollow unhelpful people.

Once that algorithm was rendered impossible, I left twitter.

Haven't missed it.

Having someone say- here's the way we are going to promote something to you- doesn't make me inclined to accept the promotion!

mgiannopoulos3y ago

This still exists as the Following tab and viewing it is a persistent option. You don’t need to see the algorithm feed (“For You”) ever.

mempko3y ago

You should try Mastodon then!

nemothekid3y ago

https://twitter.com/dril/status/831805955402776576

aaa_aaa3y ago

Progressives have totally lost their minds.

jdthedisciple3y ago

So Elon is ISIL now?

Weird reply.

raydev3y ago

Do we "gotta hand it to Elon" for not missing one of his 40-50 self-imposed deadlines and feature announcements?

horns4lyfe3y ago

Given the previous leadership was secretly working with the feds to suppress political dissidents, ya, this is a good step.

addisonl3y ago

Did he? Considering the vast majority of the algorithm is waved away as “ML model”.

thieving_magpie3y ago

I can't pretend to know if this contains the actual ML model code but there is a second repo the-algorithm-ml: https://github.com/twitter/the-algorithm-ml

lern_too_spel3y ago

Where is the file containing accounts that are artificially boosted? We can guess what its single line is, but how is it incorporated into the algorithm?

sudo_navendu3y ago

Weights on different metrics. From https://github.com/twitter/the-algorithm/blob/ec83d01dcaebf3...

  private def getLinearRankingParams: ThriftRankingParams = {
    ThriftRankingParams(
      `type` = Some(ThriftScoringFunctionType.Linear),
      minScore = -1.0e100,
      retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
      replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
      reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
      luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
      textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
      urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
      isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
      favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
      langEnglishUIBoost = 0.5,
      langEnglishTweetBoost = 0.2,
      langDefaultBoost = 0.02,
      unknownLanguageBoost = 0.05,
      offensiveBoost = 0.1,
      inTrustedCircleBoost = 3.0,
      multipleHashtagsOrTrendsBoost = 0.6,
      inDirectFollowBoost = 4.0,
      tweetHasTrendBoost = 1.1,
      selfTweetBoost = 2.0,
      tweetHasImageUrlBoost = 2.0,
      tweetHasVideoUrlBoost = 2.0,
      useUserLanguageInfo = true,
      ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
    )
  }
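The snippet lists weights and boosts but not how they are combined. A minimal sketch of one plausible reading: a weighted sum of features, multiplied by each applicable boost, multiplied by an age-decay factor. Only a subset of the parameters is used, raw counts are fed in un-normalized, and the decay form `base / (1 + slope * age_minutes)` is a guess; none of this is confirmed by the quoted code.

```python
# Weights and boosts copied from the quoted getLinearRankingParams; how they
# are combined below is an assumption for illustration only.
LINEAR_WEIGHTS = {
    "retweet_count": 20.0,
    "reply_count": 1.0,
    "reputation": 0.2,
    "lucene_score": 2.0,
    "text_score": 0.18,
    "has_url": 2.0,
    "is_reply": 1.0,
    "fav_count": 30.0,
}
BOOSTS = {
    "in_direct_follow": 4.0,
    "in_trusted_circle": 3.0,
    "has_image_url": 2.0,
    "has_video_url": 2.0,
    "offensive": 0.1,
}
AGE_DECAY_SLOPE = 0.005
AGE_DECAY_BASE = 1.0


def score(features: dict, flags: set, age_minutes: float) -> float:
    """Hypothetical combination: weighted sum, times boosts, times age decay.

    The decay form base / (1 + slope * age) and the use of minutes are
    guesses; the snippet only names `slope` and `base`."""
    linear = sum(w * features.get(name, 0.0) for name, w in LINEAR_WEIGHTS.items())
    for flag in flags:
        linear *= BOOSTS.get(flag, 1.0)  # unknown flags leave the score unchanged
    decay = AGE_DECAY_BASE / (1.0 + AGE_DECAY_SLOPE * age_minutes)
    return linear * decay
```

Under these assumptions a fresh tweet with 10 likes scores 300, the same tweet from someone you follow directly scores 1200, and an "offensive" flag cuts it to 30.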

pictur3y ago

It's a really scary codebase. Do you really need that much code for the world's crappiest recommendation algorithm? I think you can do more crap with less code. we trust you elon.

distrill3y ago

the-algorithm is such a pretentious name for a repo

BbzzbB3y ago

It's a colloquial term for recommendation engines, how often do you hear people say "the algorithm" (vs. "the recommendation engine") on YouTube?

distrill3y ago

yes, but this is the first repository i have seen named like this

sho_hn3y ago

Eh, it's name-spaced.

anoncow3y ago

This is the latest comment.

anoncow3y ago

I posted this to check if Bard can read HN posts in order.

Patrickmi3y ago

Didn’t Elon check the codebase before open-sourcing it? Was he expecting everyone to be happy to see author_is_elon?

ericzawo3y ago

It's really dismaying watching the space man light this website on fire.

https://twitter.com/alexblechman/status/1641905502043926530?...

photochemsyn3y ago

I generally have a very low opinion of social media platforms, but I did create a Twitter account for the first time after Musk bought the platform.

My conclusion is that it's basically entertainment, with very little of what I'd call high-quality useful information that deserves further examination (unlike a lot of HN posts, in contrast). I also notice something of a TikTok approach to video being implemented, which is not surprising given TikTok's success (and makes one wonder who exactly is lobbying so hard for a TikTok ban, and whether it's more a commercial competition issue than anything else).

As far as the recommendation algorithm, it appears to be a siloing setup - look at content of one particular flavor, it gives you more of that flavor. A 'flush settings' or 'forget browsing history' or 'reset to defaults' button would be useful, if probably not what advertisers want in terms of delivering to target audiences. I suppose setting up multiple accounts is something of a solution, although too much effort to be that interesting.

In terms of news reports, it's broader in scope than traditional corporate media outlets, so that's a plus in its favor. Reliability is perhaps similar (i.e. low).

lhnz3y ago

You can follow accounts that only post arxiv.org links for ML papers or anything else you're interested in if you want to. If you're only getting entertainment then it says a lot about the original accounts you followed.
