Twitter Account Auditing System

The scores explained here required a high-throughput system that could pull large amounts of data from Twitter's API and parse it into nodes and edges for analysis. Built on Ray, a framework for parallel and distributed computing in Python, together with an advanced asynchronous queueing system developed in-house, the system collected 1.8B users (essentially every user account) and more than 300M tweets, and it processed hundreds of billions of edge relationships between them. The result was a series of account audits that together gave a holistic view of an account, covering its followers, activity and engagement, audience, and community overlap.
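To make "parse it into nodes and edges" concrete, here is a minimal in-memory sketch. The class name EdgeGraph, the ("user", id) and ("tweet", id) node keys, and the edge labels are hypothetical. At the scale described, edges would be streamed to a database rather than held in memory.

```python
from dataclasses import dataclass, field

@dataclass
class EdgeGraph:
    """Hypothetical minimal node/edge store, for illustration only."""
    # Nodes are keyed by type so user IDs and tweet IDs cannot collide.
    nodes: set[tuple[str, int]] = field(default_factory=set)
    # Edges are (source node, target node, relationship label).
    edges: set[tuple[tuple[str, int], tuple[str, int], str]] = field(default_factory=set)

    def add_followers(self, account_id: int, follower_ids: list[int]) -> None:
        # Each follower relationship becomes a directed edge: follower -> account.
        account = ("user", account_id)
        self.nodes.add(account)
        for fid in follower_ids:
            follower = ("user", fid)
            self.nodes.add(follower)
            self.edges.add((follower, account, "follows"))

    def add_engagement(self, user_id: int, tweet_id: int, kind: str) -> None:
        # Engagements (favorite, retweet, reply, quote) become user -> tweet edges.
        user, tweet = ("user", user_id), ("tweet", tweet_id)
        self.nodes.update({user, tweet})
        self.edges.add((user, tweet, kind))

g = EdgeGraph()
g.add_followers(1, [2, 3])
g.add_engagement(2, 100, "favorite")
```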

Follower Audits

The most basic audit was the follower audit. It involved paging through Twitter's API for an account's follower_ids (5,000 per request) and then collecting the fully hydrated user objects with the get_user endpoint (100 per request). At this scale, the work required a robust queueing system that let multiple scripts run asynchronously across different machines.
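The batching and work-queue pattern can be sketched with Python's standard asyncio library. This is a single-process illustration, not the in-house multi-machine system. fake_hydrate is a hypothetical stand-in for the user-lookup call, and only the batch size of 100 comes from the text.

```python
import asyncio

HYDRATE_BATCH_SIZE = 100  # user objects are hydrated 100 at a time (per the text)

def chunk(ids: list[int], size: int) -> list[list[int]]:
    """Split a list of IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

async def fake_hydrate(batch: list[int]) -> list[dict]:
    # Hypothetical stand-in for the user-lookup API call; returns stub user objects.
    await asyncio.sleep(0)
    return [{"id": uid} for uid in batch]

async def worker(queue: asyncio.Queue, results: list[dict]) -> None:
    # Each worker pulls batches until it is cancelled.
    while True:
        batch = await queue.get()
        try:
            results.extend(await fake_hydrate(batch))
        finally:
            queue.task_done()  # mark the batch done even if the call failed

async def hydrate_all(follower_ids: list[int], n_workers: int = 4) -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    for batch in chunk(follower_ids, HYDRATE_BATCH_SIZE):
        queue.put_nowait(batch)
    results: list[dict] = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()  # wait until every batch has been processed
    for w in workers:
        w.cancel()      # workers loop forever, so cancel them once the queue drains
    await asyncio.gather(*workers, return_exceptions=True)
    return results

users = asyncio.run(hydrate_all(list(range(12_345))))
```

Replacing the in-process asyncio.Queue with a shared broker, such as a database table or message queue, would let workers on different machines pull from the same backlog.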

Activity & Engagement Audits

After collecting data about an account's followers, the next step was to understand the account's posting activity and engagement. This required collecting the last week's or month's worth of tweets for each account from Twitter's API and then calculating means and medians. The most useful metrics for judging an account's value turned out to be median impressions and median impressions per follower.
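Here is a minimal sketch of the per-account summary. It assumes "median impressions per follower" means median impressions divided by the current follower count. The helper engagement_summary and the sample numbers are hypothetical.

```python
from statistics import mean, median

def engagement_summary(impressions: list[int], follower_count: int) -> dict[str, float]:
    """Summarize one account's recent tweets (hypothetical helper)."""
    med = median(impressions)
    return {
        "impression_mean": mean(impressions),
        "impression_median": med,
        # Assumption: "per follower" = median impressions / follower count.
        "impression_median_per_follower": med / follower_count if follower_count else 0.0,
    }

# One viral tweet (40,000 impressions) among otherwise ordinary ones.
summary = engagement_summary([1200, 800, 950, 40000, 1100], follower_count=10_000)
```

In this sample, the single viral tweet pulls the mean up to 8,810 while the median stays at 1,100. That sensitivity to outliers is one plausible reason the median proved the more reliable signal.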

Audience Audits

Audience audits were crucial to understanding whether engagement was genuine. They were also the most demanding. They required iterating through an account's tweets and collecting every account that favorited, retweeted, replied to, or quoted each tweet. All data was stored as edge relationships in Postgres and queried later for audience quality scoring. Each tweet was assigned a quality score and a percentage of suspicious engagement, and each account received a score aggregated from its tweets' performance.
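The edge storage and per-tweet scoring can be sketched with Python's built-in sqlite3 standing in for Postgres. Several details are assumptions for illustration: the schema, the suspicious flag on users (how it is derived is not described here), and the scoring rule. The rule used here sets tweet quality to the share of non-suspicious engagement and the account score to the mean of its tweet scores.

```python
import sqlite3
from statistics import mean

# sqlite3 stands in for Postgres; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, suspicious INTEGER NOT NULL);
CREATE TABLE engagement_edges (
    tweet_id INTEGER NOT NULL,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    kind     TEXT NOT NULL CHECK (kind IN ('favorite', 'retweet', 'reply', 'quote'))
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 0), (2, 0), (3, 1), (4, 1)])
conn.executemany(
    "INSERT INTO engagement_edges VALUES (?, ?, ?)",
    [(100, 1, "favorite"), (100, 2, "retweet"), (100, 3, "favorite"), (100, 4, "reply"),
     (200, 1, "favorite"), (200, 2, "quote"), (200, 3, "favorite"), (200, 1, "reply")],
)

# Percentage of engagements per tweet that came from suspicious accounts.
rows = conn.execute("""
    SELECT e.tweet_id, 100.0 * AVG(u.suspicious) AS pct_suspicious
    FROM engagement_edges AS e JOIN users AS u USING (user_id)
    GROUP BY e.tweet_id ORDER BY e.tweet_id
""").fetchall()
tweet_pct = dict(rows)

# Hypothetical scoring: tweet quality = share of non-suspicious engagement,
# account score = mean of its tweets' quality scores.
tweet_quality = {t: 1 - pct / 100 for t, pct in tweet_pct.items()}
account_score = mean(tweet_quality.values())
```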

Overlap Audits

A need that surfaced was identifying clusters of high-affinity users. This came up during influencer searches, when creating custom audiences for advertising, and when finding top-quality accounts to engage with organically that might not surface through other methods. Three overlap reports were developed: follower, following, and engager overlap. The outcome of these audits was hyper-targeted custom audiences for use in Twitter's advertising platform, described here.
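One simple way to surface a high-affinity cluster is to count how many seed accounts each user follows and keep users who appear in several. The same idea applies to following lists or engager lists for the other two report types. A Jaccard index adds a pairwise measure of overlap between two accounts. This is a sketch under stated assumptions, not the method used in the reports. The function names, the min_seeds threshold, and the sample IDs are hypothetical.

```python
from collections import Counter

def overlap_audience(seed_sets: dict[str, set[int]], min_seeds: int = 2) -> dict[int, int]:
    """Return users found in at least `min_seeds` seed accounts' sets, with their counts."""
    counts = Counter(uid for ids in seed_sets.values() for uid in ids)
    return {uid: n for uid, n in counts.items() if n >= min_seeds}

def jaccard(a: set[int], b: set[int]) -> float:
    """Shared users divided by total distinct users across two accounts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical follower sets for three seed accounts.
followers = {
    "acct_a": {1, 2, 3, 4},
    "acct_b": {3, 4, 5},
    "acct_c": {4, 6},
}
audience = overlap_audience(followers, min_seeds=2)
```

Users 3 and 4 follow multiple seeds, so they form the overlap audience. Raising min_seeds trades audience size for tighter affinity.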