Dealing with Unsafe Methods in RESTful Services
Your update request to the service just timed out. Is it safe to send it again? Maybe. Here's how to ensure that all your update, delete and add requests are safe, plus some advice on what you should really be calling them and on handling concurrency.
I admit that early in my consulting career I built a service for one of my clients without understanding that, sometimes, clients resend requests that the service has already processed. A client would, for example, get a timeout when calling my service to add a Customer and resend their request. Unfortunately, my service had successfully processed the request and the only problem was that the client hadn't got the 200 OK return code from the service. As a result of the client resending the request, my service was (occasionally) adding duplicate Customers.
Once my client and I tracked down the problem (we called it a "stutter"), we tried to deal with it by checking to see if a matching Customer already existed. Searching through all Customers turned out to be expensive so, in the end, we had to settle for keeping a RequestList of recent additions and searching it for duplicate adds. It was not only a lot of code, it wasn't a solution we could apply to other services.
I had failed to consider idempotency: Is it safe for a client to send a request multiple times? Some requests are, we would think, safe (idempotent): GET and DELETE. If a GET or DELETE times out, the client can just send it again. What could go wrong?
Certainly, retrieving a Customer with a GET request should have no effect on the Customer's representation (that is, the timestamp on the retrieval may change but the Customer's data will not). Issuing the same GET over and over again should be safe. Issuing updates and adds repeatedly, however, is not.
Interestingly, I've had clients assert that RESTful PUTs are idempotent: If you repeatedly set the Customer's LastName to "Vogel," what's the problem? There are a couple of problems with this approach, one of which has to do with how "pure" your REST design is.
First, it's easy to imagine an unsafe update scenario. Imagine, for example, a sale of 15 units of item C789. When that request is received at the service, the application will lower the quantity-in-stock for C789 by 15. If the client sends the request again because of a timeout, that request will result in another deduction of 15 but, this time, without a corresponding sale. Altering the request to have the client send the updated quantity-in-stock so the server can update it isn't going to work -- effectively, you would have multiple independent clients trying to simultaneously manage the quantity-in-stock. Clients know what was ordered; the service knows what's in stock. Trying to swap either of those responsibilities is not going to end well.
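The stock scenario above can be reduced to a few lines. This is only a sketch: the in-memory `stock` dictionary, the `record_sale` function and the starting quantity of 45 are assumptions made for illustration, not part of any real service.

```python
# Hypothetical in-memory inventory; names and quantities are illustrative.
stock = {"C789": 45}


def record_sale(item_id: str, quantity: int) -> int:
    """Deduct a sale from stock and return the new quantity-in-stock."""
    stock[item_id] -= quantity
    return stock[item_id]


record_sale("C789", 15)  # original request: processed, but the response is lost
record_sale("C789", 15)  # the client's retry after the timeout: deducted again
print(stock["C789"])     # 15, although only one sale of 15 really happened
```

Nothing in the request tells the service that the second call is a repeat of the first, which is exactly what makes the update unsafe to resend.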
The second problem with my client's lack of concern about updates is that my client was suggesting using PUT to update part of the entity identified in the URL. That's not really the pure definition of PUT in REST: PUT is supposed to be used to replace the complete entity at the URL. In fact, if there is no entity at the URL, you can use PUT to add it.
According to Roy Thomas Fielding (who defined REST and should know), if you want to change just part of an entity, then you should be using PATCH as the HTTP verb in your request. Having said that, current practice seems to use PUT (you'll notice that the default controller in a Web API application includes a PUT method but not a PATCH method, though NuGet has a package to add PATCH support).
Replacing the whole entity, of course, ensures idempotency but hardly solves our problem of updating the quantity-in-stock for item C789. All we've done with this distinction is move the idempotency problem with updating inventory from PUT to PATCH.
The easiest way to implement "exactly once" updates for PATCH (or PUT if you're not a REST purist) is to accompany each PATCH with a unique transaction identifier. If a client wants to sell 15 C789s and then another 15 C789s, then the client would send two messages, each with a different transaction identifier. If, however, the first request times out, the client must resend the first request with the same transaction identifier to keep the request idempotent (and to avoid being charged for two sales, only one of which is real). The transaction identifier (often called an idempotency key) could be part of the request's URL, payload, or headers (I like putting it in the headers).
To support this solution you need a RequestsProcessed list that holds the transaction key and result for any completed request. The service, on receiving a request with a transaction key, would first check the list to see if the transaction key has already been processed. If the key is found, the service returns the result from the previous request (which might be a 200 OK, a 400 Bad Request or a 500 Internal Server Error, any of which might have some accompanying data). If the key isn't in the list, the service performs the transaction, adds the key and response to the RequestsProcessed list, and returns the response. The update to the entity and the addition to RequestsProcessed must be part of the same transaction so that if either fails, they both fail.
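The check-then-record flow just described might be sketched as follows, using Python's built-in sqlite3 module so that the stock update and the RequestsProcessed entry share one database transaction. The table layout, the `handle_sale` function, the starting stock of 45 and the choice of 409 for an unfillable sale are all assumptions made for this sketch.

```python
import json
import sqlite3

# Hypothetical schema: a stock table plus the RequestsProcessed list.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item_id TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
db.execute(
    "CREATE TABLE requests_processed ("
    "txn_key TEXT PRIMARY KEY, status INTEGER NOT NULL, body TEXT NOT NULL)"
)
db.execute("INSERT INTO stock VALUES ('C789', 45)")
db.commit()


def handle_sale(txn_key: str, item_id: str, quantity: int) -> tuple[int, dict]:
    """Process a sale at most once per transaction key."""
    with db:  # one transaction: commit on success, roll back on an exception
        row = db.execute(
            "SELECT status, body FROM requests_processed WHERE txn_key = ?",
            (txn_key,),
        ).fetchone()
        if row is not None:
            # Key already processed: replay the stored response, change nothing.
            return row[0], json.loads(row[1])
        updated = db.execute(
            "UPDATE stock SET quantity = quantity - ? "
            "WHERE item_id = ? AND quantity >= ?",
            (quantity, item_id, quantity),
        ).rowcount
        if updated:
            status, body = 200, {"item": item_id, "sold": quantity}
        else:
            status, body = 409, {"error": "unknown item or insufficient stock"}
        # Record the key and the response in the same transaction as the update.
        db.execute(
            "INSERT INTO requests_processed VALUES (?, ?, ?)",
            (txn_key, status, json.dumps(body)),
        )
        return status, body
```

Calling `handle_sale("txn-1", "C789", 15)` twice deducts 15 once and returns the same response both times; a call with a new key, such as `"txn-2"`, is treated as a genuinely separate sale.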
You can generate the idempotency keys at the service and send them to the client as part of the GET that must, now, precede any PATCH or PUT (this assumes that requiring a preceding GET is reasonable in any PATCH/PUT scenario). This is an excellent strategy if you want all your keys to look alike (if, for example, the keys are doing double duty as identifiers in some logging system).
But you may not care what the client's transaction key looks like: If you only require the key to be unique within the requests from each client, then you can allow the client to build their transaction key any way they want. That may sound like you're giving up too much control to your clients, but if you don't trust your clients to generate unique transaction keys, then you probably shouldn't be trusting them to use your transaction keys correctly.
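On the client side, the one rule that matters is that a retry reuses the key from the original attempt. A minimal sketch, assuming the key travels in a header called `Idempotency-Key` (the name is an assumption; any name agreed with the service would do) and that `send` is a hypothetical callable standing in for the real HTTP call:

```python
import uuid


def send_with_retry(send, payload: dict, max_attempts: int = 3):
    """Send payload, reusing a single transaction key across every retry."""
    # The key is generated once per logical request, not once per attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # header name is assumed
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, headers)
        except TimeoutError:
            if attempt == max_attempts:
                raise


# Stand-in transport for the demonstration: the first attempt times out.
calls = []


def flaky_send(payload, headers):
    calls.append(headers["Idempotency-Key"])
    if len(calls) == 1:
        raise TimeoutError("response lost")
    return 200


status = send_with_retry(flaky_send, {"item": "C789", "quantity": 15})
```

Both attempts recorded in `calls` carry the same key, so a service that keeps a RequestsProcessed list can recognize the second attempt as a resend.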
If you're worried about concurrency with PATCHes (or "partial entity updates with PUT"), then you can use preconditions: values the client sends with the request that must match what the service currently holds before the update is accepted. To implement this you'd have the client, when directly or indirectly updating a property, send the "current" value of the property as a precondition. This is easy to overdo, though. Consider my sale of item C789: A client retrieves the information about the item, which shows a quantity-in-stock of 45, and sells 15. During this time another customer buys 7 items, reducing the quantity-in-stock for C789 to 38. When our client sends a PATCH with the sale of 15 items, should the client send the original quantity-in-stock of 45 as a precondition? I suspect you only care that there's enough on hand to satisfy the sale, not whether or not another sale has taken place.
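The difference between the two kinds of precondition can be shown side by side. Both functions below are hypothetical sketches: one demands that the client's view of the quantity-in-stock still matches exactly, the other only checks that enough stock remains.

```python
def apply_sale_strict(stock_now: int, expected_stock: int, quantity: int):
    """Reject the sale unless the client's view of stock matches exactly."""
    if stock_now != expected_stock:
        return 409, stock_now
    return 200, stock_now - quantity


def apply_sale_relaxed(stock_now: int, quantity: int):
    """Reject the sale only if there is not enough stock to satisfy it."""
    if stock_now < quantity:
        return 409, stock_now
    return 200, stock_now - quantity


stock_seen_by_client = 45
stock_now = stock_seen_by_client - 7  # another sale of 7 went through meanwhile

strict_result = apply_sale_strict(stock_now, stock_seen_by_client, 15)
relaxed_result = apply_sale_relaxed(stock_now, 15)
```

The strict version refuses a sale that the service could easily have filled, forcing the client to GET and retry for no business reason; the relaxed version accepts it.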
Safe Adds with PUT
If PUTs can add a new resource at a location, you have to ask: What's the difference between POST and PUT? Essentially, in a pure REST implementation, the difference depends on whether the service or the client gets to name the entity being added. PUT allows the client to specify the name of the entity being added (in REST parlance, the client specifies the URI for the resource); POST does not.
In a pure REST implementation, a PUT request that adds a Customer would need to provide the CustomerId, so the URL for the PUT would end with the client-chosen CustomerId as its final segment.
A POST request would not permit the client to include the name: the request goes to the URL for the Customers collection as a whole, and it would be the responsibility of the message returned from processing the POST to include the name the service assigned.
This means that a "pure" PUT has no idempotency issues when doing adds because the service begins processing by retrieving the item using the provided name. If no item is found, the service adds the item and returns a 201 Created code. If, after the Customer is added, the client doesn't get the response and resends the request, then the existing item is found, replaced, and the service returns a 200 OK code (as it would with any replacement). Either way, the client is happy.
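A "pure" PUT handler along those lines could look like this. It is a sketch only: the in-memory `customers` dictionary, the `put_customer` function and the identifier `A123` are made up for the example.

```python
# Hypothetical in-memory store keyed by the client-supplied CustomerId.
customers: dict[str, dict] = {}


def put_customer(customer_id: str, representation: dict) -> int:
    """Add or wholly replace the Customer at customer_id; return the status."""
    created = customer_id not in customers
    customers[customer_id] = dict(representation)  # full replacement
    return 201 if created else 200


first = put_customer("A123", {"LastName": "Vogel"})  # response lost in transit
retry = put_customer("A123", {"LastName": "Vogel"})  # client resends after timeout
```

The first call returns 201 and the resend returns 200, and in both cases the store holds exactly one Customer with the state the client asked for.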
When replacing an existing object in a PUT, you can require, as a precondition, that the client include the current state of the entity it's replacing. If the state that the client sends doesn't match the state of the entity at the service, the service can refuse the request and return a 409 Conflict code. Typically, this is done to handle concurrency problems (two clients PUTting the same object at the same time).
This does create some extra work for the client when a request fails. Imagine a client using PUT to add a Customer. The client sends the request with a null precondition. The Customer is added but the request times out and the client doesn't get the response. The client resends the message. This time, the null precondition is invalid because the Customer already exists so the client gets a 409 code. Unlike the 200 and 201 codes in the earlier scenario, the client can't be sure if its original PUT succeeded or there is a different entity with the same name at that resource added by some other client (remember: The precondition is there to handle concurrency problems). If knowing whether its object has been added is important to the client, then the client will need to GET the entity and check to see if it's the "right" version.
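The recovery step described above might be sketched like this. Everything here is hypothetical: the in-memory store, the function names, and the convention that a precondition of `None` means "I expect nothing to exist at this URL yet".

```python
# Hypothetical in-memory store keyed by the client-supplied CustomerId.
store: dict[str, dict] = {}


def put_if_match(customer_id: str, new_state: dict, expected_state) -> int:
    """Replace the Customer only if its current state matches the precondition."""
    current = store.get(customer_id)
    if current != expected_state:
        return 409  # someone (possibly this client, earlier) changed it
    store[customer_id] = dict(new_state)
    return 201 if current is None else 200


def get_customer(customer_id: str):
    return store.get(customer_id)


def resend_add(customer_id: str, new_state: dict) -> bool:
    """Client-side retry of an add; on a 409, GET and compare."""
    status = put_if_match(customer_id, new_state, None)
    if status == 409:
        # The entity exists: is it ours, or did another client add it?
        return get_customer(customer_id) == new_state
    return status == 201


vogel = {"LastName": "Vogel"}
original = put_if_match("A123", vogel, None)  # succeeds, but the response is lost
confirmed = resend_add("A123", vogel)         # the retry gets a 409, then checks via GET
```

Comparing the whole representation is the simplest possible test of "the right version"; a real client might compare only the fields it cares about.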
Safe Adds with POST
A POST always has idempotency problems. If a client repeatedly sends a request to the Customers service to add a customer called "Peter Vogel" then, without a transaction key, I could easily end up with multiple "Peter Vogel" customers being created, each with a unique CustomerId (and with separate credit limits to max out!).
Including a unique transaction key with each POST (as I suggested for PATCH requests) solves this problem ... as it would have solved the stuttering problem I created for my client. Unlike our solution, the transaction key solution would also have been simpler to code and could have been generalized to other services. Sadly, we had no opportunity to change our clients to have them send a transaction key so we had to live with our uglier solution.
Summing up: I have no desire to tell you how to use PUT and whether you should really be using PATCH (but you should consider it). In requests that update part of an entity and in POSTs, require a transaction key. And always consider idempotency before you start coding so you can put the solution in place as part of designing your entities and your messages. To put it another way: There's no need for you to repeat my mistakes. I'm certainly not going to.
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.