Yahoo implemented OAuth sometime last year. So now you can use the same OAuth libraries that you use to connect to Google, Twitter, MySpace, etc. to access your users’ Yahoo address books. This is the Yahoo OAuth Documentation.
The only problem is that Yahoo quietly pushed through an extension to OAuth called the OAuth Session Extension. This was all done publicly but it involves some major changes to OAuth token management, which is the exact thing that has been hardest for people to understand about OAuth in the first place.
Right now I’m feeling a bit like Arthur Dent in The Hitchhiker’s Guide to the Galaxy, lying in front of the bulldozer that is trying to demolish my home:
“But Mr. Dent, the plans have been available in the local planning office for the last nine months.”
“Oh yes, well, as soon as I heard I went straight round to see them, yesterday afternoon. You hadn’t exactly gone out of your way to call attention to them, had you? I mean, like actually telling anybody or anything.”
The thing is, authentication standards are hard; they are by necessity complex. Therefore it is vital that they are predictable. It isn’t the job of a developer trying to connect to their users’ Yahoo address books to understand all the different complexities of the standard. This is the job of the library writers.
I think we have done a great job in the Ruby community with the OAuth libraries. The “Ruby Gem”, of which I was one of the co-authors, does a great job of handling the complexity, but it still caused several problems for new users as it required them to understand the OAuth flow. My new OAuth Consumer Rails Plugin attempts to hide that, to make it extremely easy for developers to write applications that access Twitter, Google, FireEagle, etc.
The Token Refresh Step
I hadn’t personally been looking at the special requirements for communicating with Yahoo, as I haven’t had the need until now, but I had noticed the large number of problems people have had with Yahoo. I knew there were a couple of extra fields to pass along throughout the flow, which isn’t a problem. I also noticed that the tokens were quite large at around 800 bytes. Those are all things you can work around as they don’t change the flow. The big problem is the completely new Token Refresh step.
Basically, when you get an AccessToken from Yahoo it is only valid for an hour. After that you need to refresh it by requesting a new access token before attempting to use it again. This means that we as consumers of Yahoo web services need to monitor the expiration of Yahoo tokens and make an additional HTTP request before we can connect for whatever it is we are using it for.
No, this is not insurmountable; it just adds a lot more complexity to our code.
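To make that extra bookkeeping concrete, here is a minimal sketch of what a consumer ends up writing. It is not taken from any OAuth library: `ExpiringToken`, `RefreshingClient` and the `refresher` callable are hypothetical names, and the real refresh would be a signed HTTP request to Yahoo rather than a lambda.

```ruby
# Hypothetical wrapper around an access token that remembers when it was issued.
class ExpiringToken
  LIFETIME = 3600 # seconds; Yahoo access tokens are valid for about an hour

  attr_reader :token, :secret, :issued_at

  def initialize(token, secret, issued_at = Time.now)
    @token = token
    @secret = secret
    @issued_at = issued_at
  end

  def expired?(now = Time.now)
    now - issued_at >= LIFETIME
  end
end

# Hypothetical client that refreshes the token before every use if needed.
class RefreshingClient
  attr_reader :refresh_count

  # refresher: any callable that takes the old token and returns a fresh
  # ExpiringToken (in real life this is the extra HTTP round trip).
  def initialize(token, refresher)
    @token = token
    @refresher = refresher
    @refresh_count = 0
  end

  # Yields a usable token, refreshing it first if it has expired.
  def with_token(now = Time.now)
    if @token.expired?(now)
      @token = @refresher.call(@token)
      @refresh_count += 1
    end
    yield @token
  end
end
```

Every API call now has to go through something like `with_token`, and the application has to persist `issued_at` alongside the token and secret.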
We are asked to pay for Yahoo’s scalability issues
However, when you dig down and find out the reason for this, it is quite interesting. Apparently it is due to Yahoo’s security architecture.
Allen Tom from Yahoo writes in the original proposition:
Very large service providers may have a distributed architecture in
which the service endpoints are able to cryptographically verify the
authentication credentials without calling back to a central database.
Permission revocation is implemented by issuing consumers a relatively
short lived session credential, and requiring consumers to
periodically obtain new session credentials from a central
authorization service. The authentication service performs the
necessary checks before issuing new session credentials.
Although it is technically possible for SPs to build an OAuth
translation layer on top of their existing services, the cost may
involve additional latency and decreased reliability if OAuth
verification required a query to a database located in a very distant
datacenter.
This is also mentioned in a tiny paragraph at the end of the OAuth Session Extension Spec:
Allowing Access Tokens to be revoked before they expire requires Service Providers to perform a database lookup before serving a Protected Resource. For performance reasons, Service Providers may want to issue Access Tokens that can be validated without a database lookup, provided that the Access Token lifetime is less than then the Service Provider’s allowable latency for Access Token revocation.
So Yahoo proposed this, for all intents and purposes, to avoid a database call on every request. I can understand that they want to avoid this, but there are lots of other solutions.
This all explains the Yahoo token size. Basically Yahoo’s token contains expiry information, probably a user id and some sort of consumer id. This is all encrypted and signed, similar to Rails’ cookie-based sessions. This is very clever and I can appreciate the design. This allows the various servers in Yahoo’s cloud to verify OAuth tokens without accessing a central OAuth token database.
The expiry and refresh step exists so that users can revoke an application’s token. By the time my application attempts to refresh the token from Yahoo’s token servers, my user might have removed the permissions, in which case the token refresh is declined.
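As a rough illustration of the idea, and not Yahoo’s actual token format, a self-verifying token can be sketched with an HMAC signature over a small payload. The field names, the `SERVER_KEY` constant and the helper methods are all assumptions made for this example, and the sketch only signs the payload; it does not encrypt it.

```ruby
require "openssl"
require "base64"
require "json"

# Hypothetical key shared by all API servers but never given to consumers.
SERVER_KEY = "not-a-real-key"

def sign(data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SERVER_KEY, data)
end

# Compares two strings without bailing out at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Packs user, consumer and expiry into the token itself and appends a signature.
def issue_token(user_id, consumer_id, expires_at)
  fields = { "user" => user_id, "consumer" => consumer_id, "exp" => expires_at.to_i }
  payload = Base64.urlsafe_encode64(JSON.generate(fields))
  "#{payload}.#{sign(payload)}"
end

# Returns the token fields if the signature is valid and the token has not
# expired, otherwise nil. No database lookup is involved.
def verify_token(token, now = Time.now)
  payload, signature = token.split(".", 2)
  return nil if payload.nil? || signature.nil?
  return nil unless secure_equal?(signature, sign(payload))
  fields = JSON.parse(Base64.urlsafe_decode64(payload))
  return nil if now.to_i >= fields["exp"]
  fields
end
```

Any server holding the key can call `verify_token` on its own, which is also why the only way to revoke such a token is to let it expire.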
A solution to Yahoo’s problem
Rather than us, the developers, having to refresh the token, I think it would make a lot more sense to leave expiry out of the token completely. The following would happen within Yahoo and be completely hidden from external applications:
- Application requests contact info from Yahoo
- Yahoo’s Contacts API server looks in its local token cache for the application’s token
- Token is not found in cache
- Contacts API server asks the internal OAuth server to verify the token, and the OAuth server returns the token secret
- Token and token secret are stored in the cache with a 1 hour expiry
This allows Yahoo to manage their internal OAuth token expiry in a very scalable way without adding any additional complexity for external developers. And yes, Yahoo, I will happily sign an IPR for you if you need it to implement this.
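The steps in that list can be sketched roughly as follows. Both classes are hypothetical stand-ins: I obviously have no idea what Yahoo’s internal servers look like, and a real deployment would use something like memcached rather than a Ruby hash.

```ruby
# Hypothetical stand-in for the central, internal OAuth server.
class InternalOAuthServer
  attr_reader :lookups

  def initialize(secrets)
    @secrets = secrets # token => token secret
    @lookups = 0
  end

  # Returns the token secret, or nil if the token is unknown or revoked.
  def verify(token)
    @lookups += 1
    @secrets[token]
  end

  def revoke(token)
    @secrets.delete(token)
  end
end

# Hypothetical API server that caches verified tokens locally.
class ContactsApiServer
  CACHE_TTL = 3600 # seconds; the 1 hour expiry from the list above

  def initialize(oauth_server)
    @oauth_server = oauth_server
    @cache = {} # token => [secret, cached_at]
  end

  # Returns the token secret, asking the central server only on a cache miss
  # or after the cached entry has expired.
  def secret_for(token, now = Time.now)
    secret, cached_at = @cache[token]
    return secret if secret && now - cached_at < CACHE_TTL

    @cache.delete(token)
    secret = @oauth_server.verify(token)
    @cache[token] = [secret, now] if secret
    secret
  end
end
```

A revoked token keeps working until its cache entry expires, which is the same one hour revocation latency Yahoo accepts today, except that the consumer never has to know about it.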
Just say no to extra vendor tax
Yahoo’s OAuth implementation is a de facto tax on people developing against their APIs. It adds additional overhead and complexity to libraries, requires extra unnecessary HTTP requests from developers, and breaks what is supposed to be a standard.
I have decided to stand firm and not add support for Yahoo to the OAuth Plugin. I started work on it but I have removed it from the current release and won’t add it back until they follow the standard OAuth flow. I encourage other library authors to do the same.
Why is this so important?
OAuth was designed as a lower-level, modern authentication standard that other higher-level applications can be built upon. An extension on top of it is the OAuth discovery proposal, which in theory would allow you to connect to any provider by just giving it a single top-level URL.
So let’s say I wanted to connect to Yahoo. I could just do something like:
@yahoo = OAuth::Consumer.new :site=>"http://yahoo.com"
Everything afterwards would be automatic. No specifying of request token URLs, authorization URLs, etc.
What makes this even more interesting is when you start adding higher-level APIs on top of OAuth, such as Portable Contacts. This would allow you to access your users’ address books with a single library and no extra code, whether they are on Yahoo, Google, Plaxo or even your own personal address book server. Having a straightforward discoverable protocol without extra overhead makes it simple for developers to access. It removes all the proprietary code exceptions we have to write today.
Having proprietary extensions on something as fundamental as the OAuth token request cycle breaks the beauty and simplicity of PortableContacts and other future standards.
Should standards be dictated by large corporations?
My point about all of this is that I don’t think it is right that large companies try and push through changes to community created standards solely because it will make their life a little bit easier. OAuth came out of small startups and individuals who saw a common problem and wanted to create a solution. We should not just blindly accept every suggestion that comes out of large companies just so we can announce with much excitement that we have the support of Google, Yahoo or Microsoft. I think it’s great that they join in the process, but we should not bend our principles, just for the sake of them joining.
The debate on the mailing list
I have been debating this on the OAuth mailing list and while my initial tone was definitely negative, it was never targeted at any individuals. Unfortunately the discussion has now become more of a personal fight, which I don’t think is particularly useful.
While this particular discussion is about Yahoo, I would be equally disturbed if any other major provider tried to do the same thing. Yahoo is famous in OAuth circles for their corporate culture and infrastructure leaking into the standards, as was last seen during the OAuth IPR debate last year.
Yahoo has done great work in a whole lot of instances. FireEagle in particular is still one of the best OAuth implementations out there. I just wish they had done a better job of thinking through their corporate OAuth strategy.