4 years ago (holy shit!), I launched a little project called Yell at the TV!. Until recently, I’d mostly forgotten about it, but then I got several emails asking me to remove some tweets. I’d assumed the site had stopped working at some point, but apparently not.
For the past 4 years, it’s been slowly slurping up tweets with a few searches (most of which probably failed). It was running on Dreamhost shared hosting, on an ancient version of Rails, using SQLite. It had a 1.7GB production.sqlite. For any number of reasons the site should have stopped working a while ago, but it kept chugging. It was sloooooooooooooooow, but Google stuck it out and kept crawling it.
All told, I collected 7,240,187 tweets from 3,224,219 users. I only built in 21 shows and ended up with data about 2,441 episodes of those shows. The data collection and episode classification were pretty dumb, so I know not all of these tweets were really about the show or episode I assumed they were.
Given how pervasive Twitter and #hashtags are on TV, I still think there’s an experience to explore here. I would love to see more metadata attached to tweets so somebody could build something like SoundCloud, but for video streaming. There’s only so much machine-parsable data you can get humans to encode while still conveying something valuable to other humans in 140 characters. You can see Twitter trying to do this with #Music, so we’ll see if they decide to do something for video too.
Anyway, all data has been removed from the site. I may resurrect it in some way if I ever find the time, though honestly just searching Twitter is probably good enough.