Author Topic: raSA, REST and the Civi API (Read 4656 times)

jesse.wolfe · November 18, 2008, 01:42:40 pm

Justification
At raSA (http://rasantiago.com/), we're writing an application in Ruby on Rails that uses CiviCRM as a back-end datastore. Rails has a core library called "ActiveResource", which is meant to map REST APIs to Ruby objects (it's almost an ORM, except that REST is never quite a relational database). Rails is implemented as "Opinionated Software" -- when things are done in an industry-standard way (or at least in the way that the Rails platform developers think is best), they work with minimal configuration. ActiveResource is, perhaps, more opinionated than most - it is essentially unconfigurable. For an API to work with ActiveResource, it must:

Represent objects as XML
Always use XML nodes that have names that either represent what type of object they are (like <contact> ) or have a type attribute (like <ResultSet type="array"> )
Use ID fields named "id" for every class of object
Give unique URLs to each object, by ID, like: http://servername/contacts/1
Respond differently to each of the HTTP protocol's methods: GET, POST, PUT, DELETE
Accept raw XML in the POST body of a HTTP request (not in the traditional application/x-www-form-urlencoded style)
Use the HTTP "Status" header to denote errors
Report the URL of a newly created object in the HTTP "Location" header

CiviCRM's existing rest.php interface only meets the first requirement on this list. After long consideration, we decided that it would be more useful to the community as a whole to create an interface in CiviCRM that follows these conventions, rather than making a new driver for Ruby or Rails that speaks CiviCRM's existing dialect of REST.
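As a rough illustration of the first three requirements in the list above (XML output, elements named after their type or marked with a type attribute, and an "id" field), here is a minimal Python sketch. The field names and the naive "add an s" pluralization are assumptions made for the example; this is not the REALLY_REST code.

```python
import xml.etree.ElementTree as ET


def record_to_element(type_name, record):
    """Render one flat record as an element named after its type, e.g. <contact>."""
    elem = ET.Element(type_name)
    for key, value in record.items():
        ET.SubElement(elem, key).text = str(value)
    return elem


def record_to_xml(type_name, record):
    """Serialize a single record to an XML string."""
    return ET.tostring(record_to_element(type_name, record), encoding="unicode")


def records_to_xml(type_name, records):
    """Serialize a list of records inside a container marked type="array"."""
    # Assumption: the container name is just the type name plus "s".
    root = ET.Element(type_name + "s", {"type": "array"})
    for record in records:
        root.append(record_to_element(type_name, record))
    return ET.tostring(root, encoding="unicode")


print(record_to_xml("contact", {"id": 1, "display_name": "Bob"}))
print(records_to_xml("contact", [{"id": 1}, {"id": 2}]))
```

The single record comes out as a <contact> element with an <id> child, and the list comes out as a <contacts type="array"> container holding one <contact> per record.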

Process
In our local branch of the CiviCRM code, we have forked extern/rest.php and CRM/Utils/REST.php into a new interface, which I'm currently calling REALLY_REST (I'm open to suggestions for a better name). The existing REST.php interface uses URLs that look like this:
http://localhost/civicrm/extern/rest.php?q=civicrm/contact/get&contact_id=1
The equivalent REALLY_REST URL looks like:
http://localhost/civicrm/extern/really_rest.php/contact/1
Note that there are no query parameters on this URL -- all of the information about the action is encoded into the path (plus the HTTP method, which in this case is "GET"). On many web servers (including Apache), it is legal to append arbitrary data to the URL of a PHP script - this is used to simulate virtual directories. Also, Apache supports PUT and DELETE on PHP scripts without any additional configuration.

CiviCRM's API v2 is already organized in a way that is conducive to building method names dynamically:
GET http://localhost/civicrm/extern/really_rest.php/contact/1 => civicrm_contact_get( 'contact_id' => 1 )
GET http://localhost/civicrm/extern/really_rest.php/contact?name=Bob => civicrm_contact_search( 'name' => "Bob" )
PUT http://localhost/civicrm/extern/really_rest.php/contact => civicrm_contact_add( $parsed_XML )
POST http://localhost/civicrm/extern/really_rest.php/contact => civicrm_contact_add( $parsed_XML ) (Civi's API does not distinguish between 'add' and 'edit' operations)
DELETE http://localhost/civicrm/extern/really_rest.php/contact/1 => civicrm_contact_delete( 'contact_id' => 1 )
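The mapping above can be sketched as a small routing function. This is a hypothetical Python illustration of the idea, not the PHP code in really_rest.php: the function name route, its arguments, and the use of a plain dict in place of the parsed XML body are all assumptions.

```python
from urllib.parse import urlsplit, parse_qsl


def route(method, url, body=None, script="really_rest.php"):
    """Map an HTTP method and URL to (api_function_name, params)."""
    parts = urlsplit(url)
    # Everything after the script name is the virtual path, e.g. "/contact/1".
    _, found, virtual = parts.path.partition(script)
    segments = [s for s in virtual.split("/") if s]
    if not found or not segments:
        raise ValueError("URL does not name an entity: " + url)

    entity = segments[0]
    record_id = int(segments[1]) if len(segments) > 1 else None
    id_param = {entity + "_id": record_id}

    if method == "GET" and record_id is not None:
        return "civicrm_%s_get" % entity, id_param
    if method == "GET":
        return "civicrm_%s_search" % entity, dict(parse_qsl(parts.query))
    if method in ("PUT", "POST"):
        # 'add' and 'edit' are the same API call; body stands in for the parsed XML.
        return "civicrm_%s_add" % entity, dict(body or {})
    if method == "DELETE" and record_id is not None:
        return "civicrm_%s_delete" % entity, id_param
    raise ValueError("unsupported request: %s %s" % (method, url))


base = "http://localhost/civicrm/extern/really_rest.php"
print(route("GET", base + "/contact/1"))
print(route("GET", base + "/contact?name=Bob"))
print(route("DELETE", base + "/contact/1"))
```

Because the function name is built from the entity segment of the URL, any entity whose API functions do not follow the civicrm_<entity>_<action> pattern will fail to resolve.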

Roadblocks
One problem with this system of building method names dynamically is that it requires absolute consistency in the API. As soon as I started testing a second model, I started finding that several of my assumptions about the API methods were failing:

Some methods require named ID parameters like "contact_id" and some require just "id"
Some objects have separate "get" and "search" methods for find-one and find-many, while some only have one or the other -- and there isn't consistency about whether they return arrays or single records
It isn't always clear what parameters are mandatory for adding various objects. Also, some objects have read-only fields, and trying to edit them via the API doesn't raise an error.

I've been gradually modifying the API methods so that they can take (for example) either "group_id" or "id" parameters, which should be a back-compatible change. On the other hand, fixing the "get" and "search" methods to return a single record and an array of records, respectively, is an incompatible change. I hope to eventually have these changes merged into the main codebase -- is there a way I can make these changes that will do the least amount of damage to other uses of the API?
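To make the back-compatible part concrete, here is a minimal Python sketch of accepting either "group_id" or "id". The helper name normalize_id and the choice to reject conflicting values are assumptions for illustration; the actual changes are in the PHP API methods.

```python
def normalize_id(entity, params):
    """Return a copy of params in which both "id" and "<entity>_id" are set.

    Callers may pass either spelling; passing both with different
    values is treated as an error rather than silently picking one.
    """
    named = entity + "_id"
    result = dict(params)
    if "id" in result and named in result:
        if str(result["id"]) != str(result[named]):
            raise ValueError("conflicting values for id and " + named)
    elif "id" in result:
        result[named] = result["id"]
    elif named in result:
        result["id"] = result[named]
    return result


print(normalize_id("group", {"id": 5}))
print(normalize_id("group", {"group_id": 5, "title": "Volunteers"}))
```

Existing callers that pass "group_id" keep working, and new callers can pass "id", which is why this kind of change should not break other users of the API.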

Also, I don't currently have a working unit test infrastructure for CiviCRM -- I've been using ruby rspec tests through the REALLY_REST interface. I'm not sure that I understand the current state of unit test infrastructure for CiviCRM -- I hear that it was discussed at CiviCamp -- is it currently working well enough that we could be writing API unit tests?

Other API enhancements
The API currently only exposes a small subset of CiviCRM's functionality: there are only sixteen modules in /api/v2, while the CRM libraries are divided into thirty-four namespaces, and CiviCRM's database has one-hundred-and-thirteen (!) tables. We're trying to gradually expand the API -- our first goal is to deal with Roles and ACLs, but other major areas that need work are with Pledges and Memberships. Are CiviCRM's objects dealing with these subjects considered stable?

Future possibilities
Even these changes only scratch the surface of REST's potential. CiviCRM's API currently treats all objects as flat dictionaries, while sometimes a more correct XML rendering of an object would have nested structure. A simple example is the Contact object: the API currently mixes the user's address fields into the toplevel of the Contact's XML, even though contacts are allowed to have multiple addresses -- the API's behavior in this case is ambiguous. It is not obvious to me how to implement this without dramatically changing the behavior of CiviCRM's API methods.
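For comparison, a nested rendering of a contact with several addresses might look like the output of the Python sketch below. The element names and address fields are assumptions chosen for the example; nothing here reflects how CiviCRM's API currently behaves.

```python
import xml.etree.ElementTree as ET


def contact_to_nested_xml(contact, addresses):
    """Render a contact with its addresses as nested elements, not flat fields."""
    root = ET.Element("contact")
    for key, value in contact.items():
        ET.SubElement(root, key).text = str(value)
    # Each address gets its own element, so multiple addresses stay distinct.
    container = ET.SubElement(root, "addresses", {"type": "array"})
    for address in addresses:
        node = ET.SubElement(container, "address")
        for key, value in address.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(contact_to_nested_xml(
    {"id": 1, "display_name": "Bob"},
    [{"city": "Santiago"}, {"city": "Lima"}],
))
```

With this shape there is no ambiguity about which address a "city" field belongs to, but the API methods would have to accept and return nested data instead of flat dictionaries.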

acrosman · November 18, 2008, 03:48:28 pm

My sense is that the improvements you're suggesting to the API would make the most sense as part of a new version of the API. Xavier, Lobo, and I have been kicking around several ideas/changes to the current REST interface over the last few days that we're trying to rush to complete before the 2.2 code freeze that is scheduled for Dec 1 (see: http://forum.civicrm.org/index.php/topic,5506.0.html and http://forum.civicrm.org/index.php/topic,5590.0.html). We've been working to make the current interface easier to use, while not breaking backward compatibility. While I know you have been working to avoid breaking compatibility as well, it seems to me that the level of work required to complete your task would require major disruptions.

Therefore I would take your ideas and make them part of whatever else comes to the surface as part of API v3, and we should encourage the core team to help us start thinking about what changes are needed to make a major step forward. This would also give us a good place to expand the API to cover other sections of CiviCRM, and to develop standards for keeping the system consistent as new modules are added. It would also be a time to change the response format to make it more flexible.

I say all this in part because it doesn't look to me like you can do any of what you're after, unless you do ALL of what you suggest. The ActiveResource library sounds restrictive enough that you need CiviCRM to comply with all of its demands or it will not work.

I do have a couple of questions about details of how this system works.

How is authentication handled? Lobo just settled a debate between Xavier and myself about how to handle authentication for our coming updates. That answer doesn't need to be the right one forever, but before we commit to the direction picked, in actually released code, it would be good to know if another voice needs to be heard (quickly please!). Authentication seems to be absent from your description, but has been a big part of recent discussions.

Also, what does "Respond differently to each of the HTTP protocol's methods: GET, POST, PUT, DELETE" mean? It's easy to make things reply "differently", is there guidance somewhere about what the different replies should be so that they are useful?

Thanks for throwing your hat into the ring.
Aaron

Donald Lobo · November 18, 2008, 06:40:13 pm

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Some methods require named ID parameters like "contact_id" and some require just "id"

For the core object we should always use id. For example, when updating a contribution, the contribution's own ID should be called id, BUT the contact that is associated with it will be called contact_id. For most of the objects, you will have the main ID and at least one other associated id. If we are inconsistent, please file one issue for all the places where you've noticed this and we will fix them.

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Some objects have separate "get" and "search" methods for find-one and find-many, while some only have one or the other -- and there isn't consistency about whether they return arrays or single records

IMO: get should always return a single record (i.e. an array of name=>value pairs), search should always return an array of records (each record is another array). Please file another issue with all the places where you've noticed a discrepancy and we will fix
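The two return shapes described above can be sketched in Python as a pair of small wrappers. The names as_single and as_list are hypothetical, and a dict stands in for a PHP array of name=>value pairs.

```python
def as_single(result):
    """Shape for "get": exactly one record (a dict of name => value pairs)."""
    if isinstance(result, dict):
        return result
    records = list(result)
    if len(records) != 1:
        raise LookupError("expected exactly one record, got %d" % len(records))
    return records[0]


def as_list(result):
    """Shape for "search": always a list of records, even for one match."""
    if isinstance(result, dict):
        return [result]
    return list(result)


print(as_single([{"id": 1, "display_name": "Bob"}]))
print(as_list({"id": 1, "display_name": "Bob"}))
```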

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

It isn't always clear what parameters are mandatory for adding various objects. Also, some objects have read-only fields, and trying to edit them via the API doesn't raise an error.

We need to figure out how to have a unified "validation" system that both the form system and the API can use. This is a big issue and I suspect we should tackle (on a small scale to begin with) in 3.0

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Also, I don't currently have a working unit test infrastructure for CiviCRM -- I've been using ruby rspec tests through the REALLY_REST interface. I'm not sure that I understand the current state of unit test infrastructure for CiviCRM -- I hear that it was discussed at CiviCamp -- is it currently working well enough that we could be writing API unit tests?

We use SimpleTest and the Drupal SimpleTest module (hacked for our use in packages/drupal/simpletest). All of the api/v2 tests should be working as of 2.1.2.

lobo

xavier · November 18, 2008, 11:53:06 pm

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Justification
At raSA (http://rasantiago.com/), we're writing an application in Ruby on Rails that uses CiviCRM as a back-end datastore. Rails has a core library called "ActiveResource", which is meant to map REST APIs to Ruby objects (it's almost an ORM, except that REST is never quite a relational database). Rails is implemented as "Opinionated Software" -- when things are done in an industry-standard way (or at least in the way that the Rails platform developers think is best), they work with minimal configuration. ActiveResource is, perhaps, more opinionated than most - it is essentially unconfigurable. For an API to work with ActiveResource, it must:

Represent objects as XML
Always use XML nodes that have names that either represent what type of object they are (like <contact> ) or have a type attribute (like <ResultSet type="array"> )
Use ID fields named "id" for every class of object
Give unique URLs to each object, by ID, like: http://servername/contacts/1
Respond differently to each of the HTTP protocol's methods: GET, POST, PUT, DELETE
Accept raw XML in the POST body of a HTTP request (not in the traditional application/x-www-form-urlencoded style)
Use the HTTP "Status" header to denote errors
Report the URL of a newly created object in the HTTP "Location" header

I think there's a clear benefit to having a coherent set of rules across the API, and I'd like to add a new constraint: enforce the same action names for all the objects (e.g. fetching a list shouldn't be /entityA/read in one place and /entityB/search, /entityC/get, /entityD/query ... elsewhere). I don't mind what the names are, as long as they are the same.

However, I do believe the REST interface should be generic. It's good to be inspired by, and compatible with, already defined "standards", but that should be a means to get something good, not an end in itself, and if we find better examples for some part of the REST interface, we shouldn't find ourselves limited by the Rails limitations.

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Always use XML nodes that have names that either represent what type of object they are (like <contact> ) or have a type attribute (like <ResultSet type="array"> )

Well, so far, the rest interface also handles JSON, and that's a good thing IMO.

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Use ID fields named "id" for every class of object

That's mostly the case already from what I've seen, and where it isn't, it should be (for the main ID; obviously the relationships will need to be entity_id).

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Give unique URLs to each object, by ID, like: http://servername/contacts/1
Respond differently to each of the HTTP protocol's methods: GET, POST, PUT, DELETE

So reading contact 1 is a GET on /contacts/1, modifying it is a POST on /contacts/1, and deleting it is a DELETE?

Can you do that via a browser (for an ajax interface)? As far as I know, most of the JS libraries deal well with ajax calls as either GET or POST; I haven't seen DELETE or PUT handled.

Moreover, how do you handle several types of modifications to a contact (add a phone number, delete a tag, add them to a group...)?

Besides, have you tried this with lighttpd or IIS servers?

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Accept raw XML in the POST body of a HTTP request (not in the traditional application/x-www-form-urlencoded style)

I don't see the benefit, and you wouldn't be able to use this interface for a "regular" ajax post, would you?

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Use the HTTP "Status" header to denote errors

What status ? 500 ?

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Report the URL of a newly created object in the HTTP "Location" header

Well, returning the ID "in clear" makes sense (i.e. if I just want the id, I don't want to have to parse the location). I'm not sure whether that's compatible with the Location header (i.e. whether we can provide both).
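On the question of providing both: HTTP itself does not prevent a "201 Created" response from carrying a Location header and a body at the same time. Here is a hypothetical Python sketch of what a create response could contain; the function names and the XML body shape are assumptions, not anything the existing interface does.

```python
def created_response(base_url, entity, new_id):
    """Build (status, headers, body) for a newly created object."""
    location = "%s/%s/%d" % (base_url.rstrip("/"), entity, new_id)
    # The id is repeated in the body so clients need not parse the header.
    body = "<%s><id>%d</id></%s>" % (entity, new_id, entity)
    return 201, {"Location": location}, body


def id_from_location(location):
    """Recover the numeric id from a Location URL like .../contact/42."""
    return int(location.rstrip("/").rsplit("/", 1)[-1])


status, headers, body = created_response(
    "http://localhost/civicrm/extern/really_rest.php", "contact", 42
)
print(status, headers["Location"])
print(body)
print(id_from_location(headers["Location"]))
```

A client that only wants the id can read it from the body, while a client that follows the Location convention can use the header.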

Quote from: jesse.wolfe on November 18, 2008, 01:42:40 pm

Roadblocks
One problem with this system of building method names dynamically is that it requires absolute consistency in the API. As soon as I started testing a second model, I started finding that several of my assumptions about the API methods were failing:
Some methods require named ID parameters like "contact_id" and some require just "id"
Some objects have separate "get" and "search" methods for find-one and find-many, while some only have one or the other -- and there isn't consistency about whether they return arrays or single records
It isn't always clear what parameters are mandatory for adding various objects. Also, some objects have read-only fields, and trying to edit them via the API doesn't raise an error.

Good point about the consistency; the inconsistencies should be treated as errors to be fixed.

As for knowing what the parameters are, which ones are possible and which ones are mandatory... Do you have any idea how to make that clear? I mean, besides a huge task of documenting by hand?

X+
