CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017. Learn more.

Author Topic: Batch import API  (Read 1961 times)

Adam Wight

  • CiviCRM version: 4.2.x
  • CMS version: Drupal
  • MySQL version: 5
  • PHP version: 5.3
Batch import API
April 29, 2012, 11:30:20 am
I want to programmatically import batches of CRM data from an external source into Civi.  The existing interfaces are pretty nice, but could be improved by hiding several implementation details from the calling code:
  • Hide the object encapsulation design: accept input data rows with any mix of Contact, Address, Contribution, or other fields
  • Process batches using an internal loop, which can be optimized for Civi's particular needs
  • Implement the import in the API layer (see also http://issues.civicrm.org/jira/browse/CRM-6273): accept strictly declarative data and return errors and warnings as a list

The algorithm will iterate over the input data in several passes, building up data according to the entity dependency graph.  The first pass will create contacts and supporting location, email records and so on.  The next pass will import contribution, participant and membership data and link to the previously created contacts.  Finally, module hooks are run, and a dedupe is performed.
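The pass ordering described above could be sketched roughly like this. It is only an illustration under stated assumptions, not CiviCRM code: `example_multipass_import` and the `$create` callback are hypothetical names standing in for whatever actually stores an entity, and each row is assumed to be already split into per-entity field arrays.

```php
<?php
// Hypothetical sketch: $create($entity, $params) stands in for the real
// entity-creation call and is assumed to return the new entity id.
function example_multipass_import(array $rows, $create) {
    $contactIds = array();

    // Pass 1: create contacts first, since the other entities depend on them.
    foreach ($rows as $offset => $row) {
        if (isset($row['contact'])) {
            $contactIds[$offset] = call_user_func($create, 'Contact', $row['contact']);
        }
    }

    // Pass 2: create dependent entities, linked to the contact from the same row.
    $dependents = array();
    foreach ($rows as $offset => $row) {
        foreach (array('contribution', 'participant', 'membership') as $entity) {
            if (isset($row[$entity]) && isset($contactIds[$offset])) {
                $params = $row[$entity] + array('contact_id' => $contactIds[$offset]);
                $id = call_user_func($create, ucfirst($entity), $params);
                $dependents[] = array(ucfirst($entity), $id);
            }
        }
    }

    // The final pass (module hooks, dedupe) is left out of this sketch.
    return array('contacts' => $contactIds, 'dependents' => $dependents);
}
```

Keeping the contact ids indexed by row offset is what lets the second pass link records without re-reading the input.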

My plan so far is to implement a Civi API which has something like the following call semantics:
Code:
$crm->Import->mixed(array(
    /*
     * Note that Civi field names might contain some
     * magic like "billing_" to assign basic secondary
     * attributes.
     */
    'mapper' => array(
        'name' => 'contact.display_name',
        'email' => 'contact.primary_email',
        'bill_mail' => 'contact.billing_email',
        'amount' => 'contribution.total_amount',
    ),
    /*
     * Records can be specified in heterogeneous formats, may
     * not completely fulfill the mapping schema, or can be
     * written as an array without the key names.
     */
    'records' => array(
        array('name' => 'Mikhail B', 'email' => 'intl@congress.it'),
        array('Nom dePlume', 'ndp@example.fr', null, null),
        array('bill_mail' => 'anonymous@admirer@scumlabs.corp', 'amount' => '9999.44'),
    ),
    'options' => array(
        'stopOnError' => false,
    ),
));
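
One way the mapper could be applied to such heterogeneous records is sketched below. The helper name `example_map_record` is hypothetical, and the sketch assumes that a record without key names follows the order of the mapper's keys and that every mapper value has the form "entity.field".

```php
<?php
// Hypothetical helper: turn one input record into per-entity field arrays.
function example_map_record(array $mapper, array $record) {
    $sourceNames = array_keys($mapper);

    // A record written without key names is matched to the mapper by position.
    $isPositional = array_keys($record) === range(0, count($record) - 1);
    if ($isPositional) {
        $named = array();
        foreach ($record as $i => $value) {
            if (isset($sourceNames[$i])) {
                $named[$sourceNames[$i]] = $value;
            }
        }
        $record = $named;
    }

    $byEntity = array();
    foreach ($record as $name => $value) {
        // Skip columns the mapper does not know and values that are null.
        if (!isset($mapper[$name]) || $value === null) {
            continue;
        }
        list($entity, $field) = explode('.', $mapper[$name], 2);
        $byEntity[$entity][$field] = $value;
    }
    return $byEntity;
}
```

A record that only partly fulfils the mapping simply yields fewer fields, so the caller can decide later whether the result is enough to create anything.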


Output in this case would indicate that the third row could not be imported at all, but otherwise the import was processed successfully.  Custom, third-party code might have been run as hooks, or a subsequent api call might have access to the contact and contribution ids created during this call.
Code:
$result = array(
    'numSuccessfulRows' => 2,
    'errors' => array(
        array(
            'message' => 'Invalid email address',
            // Zero-based offset: this is the third record.
            'rowOffset' => 2,
            'fieldName' => 'bill_mail',
            'source' => array('bill_mail' => 'anonymous@admirer@scumlabs.corp', 'amount' => '9999.44'),
        ),
    ),
);
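
A result of that shape could be built along these lines. This is a minimal sketch, not the proposed implementation: `example_validate_rows` is a hypothetical name, only email fields are checked, and PHP's built-in `filter_var` with `FILTER_VALIDATE_EMAIL` is assumed to be an acceptable validity test.

```php
<?php
// Hypothetical sketch: report bad rows as data instead of aborting the import.
function example_validate_rows(array $rows, array $emailFields, $stopOnError = false) {
    $result = array('numSuccessfulRows' => 0, 'errors' => array());
    foreach ($rows as $offset => $row) {
        // Find the first email field in this row that fails validation, if any.
        $failedField = null;
        foreach ($emailFields as $field) {
            if (isset($row[$field]) && filter_var($row[$field], FILTER_VALIDATE_EMAIL) === false) {
                $failedField = $field;
                break;
            }
        }
        if ($failedField === null) {
            $result['numSuccessfulRows']++;
            continue;
        }
        $result['errors'][] = array(
            'message' => 'Invalid email address',
            'rowOffset' => $offset,
            'fieldName' => $failedField,
            'source' => $row,
        );
        if ($stopOnError) {
            break; // leave the remaining rows untouched
        }
    }
    return $result;
}
```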

Obviously, this is a very rough outline.  Please share any thoughts!
-Adam
« Last Edit: April 29, 2012, 11:31:51 am by Adam Wight »

Donald Lobo

  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Batch import API
April 29, 2012, 12:26:02 pm

hey adam:

might want to check:

http://forum.civicrm.org/index.php/topic,13630.0.html
http://civicrm.org/blogs/lobo/implementing-batch-import-api

A couple of key things we should accomplish (some of which you've also mentioned):

1. Allow really large imports, either sync or async, within the current php / apache limitations (memory and execution time)
2. Use a queue mechanism (which will be part of 4.2) to do the above
3. Allow the user to override the core part of the import process, i.e. i might decide that for this particular case i can do things in a few lines of sql which will be far more efficient than anything else. This is similar to our dedupe hooks, where the user can basically do their own implementation
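
The three points above could fit together roughly as follows. This is a sketch under loud assumptions: a plain PHP array stands in for the queue (it is not the 4.2 queue mechanism), and the function names and the `$override` callback are hypothetical.

```php
<?php
// Hypothetical sketch: split a large import into chunks small enough to stay
// within memory and execution-time limits. A plain array stands in for a queue.
function example_enqueue_chunks(array $rows, $chunkSize) {
    // The third argument preserves the original row offsets for error reporting.
    return array_chunk($rows, $chunkSize, true);
}

// Process one chunk per call, so each web request or cron run does bounded work.
function example_run_next_chunk(array &$queue, $defaultHandler, $override = null) {
    if (count($queue) === 0) {
        return null; // nothing left to do
    }
    $chunk = array_shift($queue);
    // A site-specific override (say, a few lines of SQL) replaces the default path.
    $handler = ($override !== null) ? $override : $defaultHandler;
    return call_user_func($handler, $chunk);
}
```

Because each call handles exactly one chunk, the same two functions serve a synchronous loop and an asynchronous worker alike.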

overall i think you're on the right track :) would be great to replace the current code, which has the distinction of being the oldest code in civi that has not been reworked at all (yes, a very very dubious honor)

lobo


Eileen

Re: Batch import API
April 29, 2012, 02:12:59 pm
So, firstly, if create could accept either a params array or an array of params arrays and do multiple creates, would that be a step in the right direction? Or would it be better as a separate generic function?

Perhaps we should add an action to api/v3/Generic.php:

function civicrm_api3_generic_import($apiRequest) {

}

which just provides a wrapper for the API create functions.

I.e. I suspect we should have two separate mechanisms / wrappers: one for multiple create plus error reporting, and one for mapping.
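
A wrapper for the "multiple create plus error reporting" half might look something like this. It is a sketch only: `example_multi_create` is a hypothetical name, `$createOne` stands in for a single-entity create call that throws on failure, and telling a list of params arrays from a single params array by its keys is a heuristic, not established API behaviour.

```php
<?php
// Hypothetical sketch: accept one params array or a list of params arrays.
function example_multi_create($params, $createOne) {
    // Heuristic: a list has sequential integer keys and an array as first element.
    $isList = is_array(reset($params))
        && array_keys($params) === range(0, count($params) - 1);
    $batch = $isList ? $params : array($params);

    $values = array();
    $errors = array();
    foreach ($batch as $offset => $one) {
        try {
            $values[$offset] = call_user_func($createOne, $one);
        } catch (Exception $e) {
            // Record the failure and carry on with the remaining rows.
            $errors[] = array('rowOffset' => $offset, 'message' => $e->getMessage());
        }
    }
    return array('values' => $values, 'errors' => $errors);
}
```

Mapping is deliberately absent here, which keeps the two mechanisms separate as suggested.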


xavier

  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Batch import API
April 29, 2012, 02:27:45 pm
Hi,

Quote
$crm->Import->mixed(

btw, there is already OO code for the api (api/class.api.php) that might be the right place to add that, with the added benefit of working both locally and over REST.

Syntax bikeshedding: $crm->Mixed->Import (always entity->action).

I think we should improve the api to return error_codes. That's especially important for imports, where the "reader" is going to be a program, and I'd rather not have to rely on parsing error_messages and hoping to get lucky (I've had this issue in the tests when changing an error message).
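
To make the error_code point concrete, here is a small sketch. The `is_error` and `error_message` keys mirror what the API already returns; the `error_code` key, the two code values and the function names are hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical error codes; the names are illustrative, not an existing list.
const EXAMPLE_ERR_INVALID_EMAIL = 'invalid_email';
const EXAMPLE_ERR_MISSING_FIELD = 'mandatory_missing';

function example_error($code, $message, $field) {
    return array(
        'is_error' => 1,
        'error_code' => $code,       // stable, meant for programs
        'error_message' => $message, // free text, meant for humans
        'field' => $field,
    );
}

// A calling program groups errors by code, so rewording a message breaks nothing.
function example_count_by_code(array $errors) {
    $counts = array();
    foreach ($errors as $error) {
        $code = $error['error_code'];
        $counts[$code] = isset($counts[$code]) ? $counts[$code] + 1 : 1;
    }
    return $counts;
}
```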


The mapping might become complicated, and it might need more info about how to process each row (e.g. should we modify the contribution if it exists, should we always create the bill_email, what's the default contribution type to use).

I'm not saying it has to be implemented, but it would be good if the syntax chosen is extensible enough to add some of these features later. And definitely the right direction indeed.

X+

totten

Re: Batch import API
April 29, 2012, 02:36:07 pm
(Writing on phone -- apologies for terseness)

A few thoughts:

1. Agree that it's important to support importing mixed entity data, but I don't think it should be encapsulated -- the "encapsulated" form seems like it will become complex and leaky. It's better to do something like API chaining, which allows you to describe entity relations while exposing the canonical data model.

2. Also think it's important to decouple the mappings from the main loop. There are different strategies for running an importer's main loop (web-based progress bars, CLI, batching, queueing, etc.).

Here's an alternative interface for defining mappings - ask the dev to provide a callback function:

function ($row) => $apiParams

Where $row is an array representing a raw, flat row of data, and the result is an array that can be passed to the API. The result array can be a tree (with nested API calls to update related entities). The decision about which main loop to use (all-at-once, batching, queueing) can be made at runtime.
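
A rough sketch of that callback interface, with the mapping kept apart from the main loop. Everything named here is hypothetical: the `entity` / `params` / `children` tree is a neutral stand-in for nested API calls rather than the real chaining syntax, and `$send` stands in for the actual API call.

```php
<?php
// Hypothetical mapping callback: flat row in, tree of API params out.
function example_row_to_params(array $row) {
    return array(
        'entity' => 'Contact',
        'params' => array('display_name' => $row['name'], 'email' => $row['email']),
        // The related entity is expressed as a nested call.
        'children' => array(
            array('entity' => 'Contribution', 'params' => array('total_amount' => $row['amount'])),
        ),
    );
}

// One possible main loop (all at once). A batching or queueing loop could
// reuse the same $mapRow callback unchanged.
function example_import_all(array $rows, $mapRow, $send) {
    $sent = 0;
    foreach ($rows as $row) {
        call_user_func($send, call_user_func($mapRow, $row));
        $sent++;
    }
    return $sent;
}
```

Since the loop only ever sees the callback, swapping in a different loop strategy at runtime does not touch the mapping code.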

Adam Wight

Re: Batch import API
May 02, 2012, 03:18:01 am
Great suggestions!  I'm still incubating questions. Meanwhile, I tried to pare this specification down to something that can be implemented without much trouble, but that could later be improved to read from a pluggable data source, run custom transformations on each row, and bypass BAO logic when desired.

http://wiki.civicrm.org/confluence/display/CRM/Multi-entity%2C+batch+import

This forum was archived on 2017-11-26.