Twitterbot: Lessons Learned

It wasn't too terribly long ago that I posted a simple Twitterbot here. I'm sure you'll be absolutely shocked to learn that it had a few, uh, shortcomings. If you're making a Twitterbot (and particularly one in Ruby), here's some advice:

Forget about AIM
I've written AIM bots before, and AIM is the primary IM network we use at EchoDitto. Consequently, the net-toc gem is the first one I turned to when I decided to write a queryable Twitter bot. But, as Ethan helped me figure out, Twitter's AIM interface is a bit flaky, and not at all worth your time. Stick with Jabber, which is considerably more solid. This also has the benefit of making the jabber-bot gem available to you, which should simplify your life considerably.
Making friends is hard
...in Twitter, anyway. As you'll no doubt quickly realize, a Twitter user can only direct-message accounts that are following them. That means that in order for your bot to receive messages, it has to follow its own followers: you'll need to periodically check for new followers of your bot's Twitter account and start following them.

This is nominally accomplished by making an authenticated call to http://twitter.com/followers/befriend_all. Simple, right? The only downside is that it doesn't work. At all. And the problem can't be solved via the API, at least not that I've been able to figure out.

So you're left to screen-scrape your way out of this mess. Here's some horrifying code that accomplishes the feat via the invaluable mechanize gem:

require 'rubygems'
require 'mechanize'
require 'net/http'
require 'uri'

# befriend everyone following your account on Twitter
def befriend_all
   agent = WWW::Mechanize.new

   # fetch the login page, trying up to three times
   attempts = 0
   begin
      page = agent.get('http://twitter.com/login')
   rescue
      sleep 3
      attempts += 1
      retry if attempts<3
   end
   return if page.nil? # every attempt failed; try again on the next run

   # find the login form
   form = nil
   page.forms.each do |f|
      if f.has_field?('username_or_email')
         form = f
      end
   end

   if form!=nil
      # log in, trying up to three times
      form.username_or_email = YourBotsConfig::TWITTER_user
      form.password = YourBotsConfig::TWITTER_password
      attempts = 0
      begin
         agent.submit(form)
      rescue
         sleep 3
         attempts += 1
         retry if attempts<3
      end

      # grab each non-followed follower and follow them
      result = agent.get('http://twitter.com/followers')
      (result/"div.person-actions").each do |button_container|
         # NOTE: this pattern has no capture group, so scan returns plain
         # strings and match[0] below is not a user id. Adjust the pattern
         # so that it captures the numeric id from the page's markup.
         need_to_follow = button_container.inner_html.scan(/follow<\/button>/i)
         need_to_follow.each do |match|
            user_id = match[0].to_i
            attempts = 0
            begin
               # post_form sends the credentials in the URL as HTTP basic auth
               response = Net::HTTP.post_form(URI.parse("http://#{YourBotsConfig::TWITTER_user}:#{YourBotsConfig::TWITTER_password}@twitter.com/friendships/create/#{user_id}.json"), {})
            rescue
               sleep 3
               attempts += 1
               retry if attempts<3
            end
         end
      end
   end
end
Pretty ugly, huh? This brings up my third point...
Expect network failures. When possible, fork.
I'm still new to Ruby, as the above code no doubt demonstrates. So it came as a bit of a surprise to see so many HTTP requests timing out. It seems that Ruby's default HTTP timeout is a bit low, but my skills (and level of bravery) aren't up to the task of adjusting it myself. Instead I just have my code try the request three times, then give up. It's deeply kludgy, but good enough.
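
For what it's worth, the try-three-times pattern can be pulled out into a small helper, and Net::HTTP does expose open_timeout and read_timeout if you feel like adjusting them. What follows is only a rough sketch: with_retries and fetch are names made up for illustration, and the timeout values are guesses rather than recommendations.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: run the block, retrying on any StandardError.
# Returns the block's value, or nil if every attempt failed.
def with_retries(max_attempts = 3, delay = 3)
  attempts = 0
  begin
    yield
  rescue StandardError
    attempts += 1
    if attempts < max_attempts
      sleep delay
      retry
    end
    nil
  end
end

# Hypothetical helper: GET a URL with explicit timeouts (in seconds).
# The default values here are assumptions, not tuned numbers.
def fetch(url, open_timeout = 10, read_timeout = 30)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout   # time allowed to open the connection
  http.read_timeout = read_timeout   # time allowed for each read
  http.request_get(uri.request_uri)
end

# Usage (not run here, since it hits the network):
#   response = with_retries { fetch('http://twitter.com/followers') }
#   puts response.code if response
```

The caller has to check for nil, which is the "give up" case after the last failed attempt.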

The situation is exacerbated by Twitter's not-entirely-infrequent outages, and the fact that my bot's utility comes from scraping another fairly flaky third party site. Could I write endless error-handling routines? Yes. Yes I could. But I'd rather just fork a new process and live with the consequences of it occasionally arriving stillborn. The befriend_all routine is a good example of why this is fine: if you fail to befriend somebody, no big deal — the bot will presumably get 'em when it spawns again in a minute or two. Above all, avoid risking your daemon's death at the hands of a failed connection.

The downside to this approach is, of course, system resource use. But given that even the NYTimes has just over 1000 Twitter followers, the odds of your bot dying under an avalanche of Ruby-interpreting processes seem low. Cross that bridge when you come to it.
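
Here's roughly what the forking idea looks like in code. Treat it as a sketch: run_in_child is a made-up name, it assumes a platform where fork is available (so not stock Windows), and the two-minute sleep in the usage comment is just the "minute or two" from above.

```ruby
# Hypothetical helper: run the block in a child process so that a failure
# there cannot kill the parent daemon. Returns the watcher thread created
# by Process.detach; its value is the child's Process::Status.
def run_in_child
  pid = fork do
    status = 0
    begin
      yield
    rescue StandardError => e
      warn "child #{Process.pid} failed: #{e.message}"
      status = 1
    end
    # exit! skips at_exit handlers inherited from the parent
    exit! status
  end
  # reap the child in the background so it doesn't linger as a zombie
  Process.detach(pid)
end

# Usage (not run here): spawn the risky work periodically and move on.
#   loop do
#     run_in_child { befriend_all }
#     sleep 120
#   end
```

The parent never waits on the child unless it asks for the watcher thread's value, so a hung or stillborn child costs a process slot and nothing more.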
Use the gems!
Although my above code doesn't demonstrate it, I plan to port things over to one of the available Twitter gems. The simplicity of Twitter's REST API makes using direct HTTP calls sorely tempting — why incur another dependency, right? But if you're like me, you'll find you're ultimately better off leaning on more accomplished Rubyists' work.
Talk to Twitter
The guys at Twitter are extremely friendly and helpful. You may find your bot exhibiting unexpected behavior during its development. Likely as not, this will be due to Twitter's abuse-prevention routines. Things became a lot more comprehensible once Twitter whitelisted my bot and debug-query accounts. I can't promise they'll do the same for you, of course, but if you explain what you're trying to accomplish I bet they'll be happy to help.

And that brings us up to the current extent of my TwitterBot knowledge. More updates on my code's horrific shortcomings as they become apparent...
