CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017.




Topic: Contact Importing scalability, and scalability of the system in general  (Read 15164 times)

xcf33

  • I post frequently
  • Posts: 181
  • Karma: 7
  • CiviCRM version: 3.3.2
  • CMS version: Drupal 6.19/6.20
  • MySQL version: 5.x
  • PHP version: 5.2.6
Re: Contact Importing scalability, and scalability of the system in general
May 25, 2010, 02:38:22 pm
Hi guys

To answer michaelmcandrew's and Erik's points (I'm sure Lobo can chime in on this later as well):

The purpose of the import API is to abstract the current import process (not just for importing contacts, although that is where we are starting).

It basically takes the GUI out of the equation and gives developers the flexibility to redesign their own import process.


The idea is that you will have something like a one-step process rather than going through four screens when importing contacts.


So everything an import job needs will be fed into the import API as a nested array parameter, including the mapping of the CSV columns, the de-dupe check, the date format, etc.

Using the API will allow developers to take full advantage of all current import processes.
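To make the "one nested array" idea concrete, here is a minimal Python sketch of what such a job description could look like, plus a check that every mapped column actually exists in the CSV header. The key names (`mapping`, `dedupe_rule`, `on_duplicate`, `date_format`) are hypothetical illustrations, not the real parameters of the 3.2 import branch API.

```python
import csv
import io

# Hypothetical job description: the settings the four import screens
# would normally collect, bundled into one nested structure.
IMPORT_JOB = {
    "entity": "contact",
    "contact_type": "Individual",
    "mapping": {               # CSV column -> CiviCRM field (illustrative names)
        "First": "first_name",
        "Last": "last_name",
        "Email": "email",
        "DOB": "birth_date",
    },
    "dedupe_rule": "email",    # hypothetical: match existing contacts on email
    "on_duplicate": "skip",    # hypothetical: skip | update | fill
    "date_format": "%d/%m/%Y",
}


def validate_job(job, header):
    """Return a list of problems with the job spec for a given CSV header."""
    problems = []
    for column in job["mapping"]:
        if column not in header:
            problems.append(f"mapped column {column!r} not in CSV header")
    if job["on_duplicate"] not in ("skip", "update", "fill"):
        problems.append(f"unknown on_duplicate mode {job['on_duplicate']!r}")
    return problems


sample = io.StringIO("First,Last,Email\nAda,Lovelace,ada@example.org\n")
header = next(csv.reader(sample))
print(validate_job(IMPORT_JOB, header))
# prints ["mapped column 'DOB' not in CSV header"]
```

Validating the whole spec up front is one way such an API could give the error checking mentioned below before any row is touched.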



In our case, we are building a queue/batch-based import process, since our clients always have big import files (50k to 500k records). Having the flexibility to schedule import jobs, and being able to do other things while an import job is running, would be ideal.
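A minimal sketch of the queue/batch idea, assuming the import file is split into fixed-size chunks that a background worker processes one at a time. Python's standard `queue` and `threading` modules stand in for a real job queue, and `import_batch` is a hypothetical stub for whatever actually calls the CiviCRM APIs.

```python
import queue
import threading
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_batch(chunk):
    """Hypothetical stand-in for the real per-batch API calls."""
    return len(chunk)  # pretend every row imported successfully


def worker(jobs, results):
    # Pull batches until the sentinel None arrives.
    while True:
        chunk = jobs.get()
        if chunk is None:
            jobs.task_done()
            break
        results.append(import_batch(chunk))
        jobs.task_done()


jobs = queue.Queue()
results = []
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()

rows = range(125_000)          # e.g. 125k records from a large CSV
for chunk in batches(rows, 10_000):
    jobs.put(chunk)
jobs.put(None)                 # tell the worker there is no more work

# Here we wait for completion; a real scheduler would return immediately
# and let the worker keep running in the background.
jobs.join()
print(len(results), sum(results))  # 13 125000
```

Fixed-size batches also give natural checkpoints: a failed job can be resumed from the last completed batch instead of restarting a 500k-row file.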

One advantage of the import API, as opposed to using a combination of the current APIs, will be speed, data consistency and error checking.

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Contact Importing scalability, and scalability of the system in general
May 25, 2010, 06:17:10 pm

ok, a few clarifications and a MAJOR request for help:

1. the current focus is on getting this right for contact import. other imports will follow in later 3.x / 4.x versions

2. This is NOT a restructuring of the contact import process. This is a thin layer around the current import so we can do things via API calls (in addition to the screen). We are trying to minimize the amount of change in the core code, but will change it in various places to make the API possible :)

3. We'd like to build a LARGE suite of import tests so that we can

a. test and verify that import works as advertised
b. improve code coverage
c. improve our current coding / design / implementation process
d. have a good test suite so that we can go in and refactor the import code and ensure it still works

This is where we need everyone's help. The more test cases we have, the more confident we can be in any change. We do have lots of great test cases hitting the 3.2 branch already. So step in and help out :)

4. Finally, I do think that we can improve import performance significantly by taking a critical look at the queries executed. I consider this a refactoring :) but if folks are interested in increasing the speed, start thinking about query optimization and restructuring (but first write a lot of test cases!)
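Points 3 and 4 fit together: pin the behavior down with tests, then optimize the queries. The sketch below uses Python's `unittest` and `sqlite3` purely for illustration (CiviCRM's own suite is PHPUnit against MySQL), and the `contact` table and `insert_contacts` function are hypothetical. `insert_contacts` shows one common optimization, sending all rows through one prepared statement inside one transaction instead of committing per row. The tests fix the behavior any later refactor must keep, including rolling back the whole batch when one row is bad.

```python
import sqlite3
import unittest


def insert_contacts(conn, rows):
    """Insert (first_name, last_name) rows in one transaction with one prepared statement."""
    with conn:  # commits once at the end, or rolls back everything on error
        conn.executemany(
            "INSERT INTO contact (first_name, last_name) VALUES (?, ?)", rows
        )


class InsertContactsTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE contact (id INTEGER PRIMARY KEY,"
            " first_name TEXT NOT NULL, last_name TEXT)"
        )

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0]

    def test_imports_every_row(self):
        rows = [(f"First{i}", f"Last{i}") for i in range(5000)]
        insert_contacts(self.conn, rows)
        self.assertEqual(self.count(), 5000)

    def test_bad_row_rolls_back_whole_batch(self):
        rows = [("Ada", "Lovelace"), (None, "NoFirstName")]
        with self.assertRaises(sqlite3.IntegrityError):
            insert_contacts(self.conn, rows)
        self.assertEqual(self.count(), 0)
```

Run it with `python -m unittest`. Whether all-or-nothing batches are the right semantics for an import is itself a design choice; the point is that the test states it explicitly before anyone touches the queries.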

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Contact Importing scalability, and scalability of the system in general
May 25, 2010, 10:23:58 pm
Where are the existing tests for the import?

I looked here and couldn't find any:

http://svn.civicrm.org/civicrm/branches/v3.2.import/tests/phpunit/api/v2/

-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Contact Importing scalability, and scalability of the system in general
May 26, 2010, 07:02:36 am

don't have them as yet :(

am waiting for chang to start contributing them. i'll try to write one so others can build on it

weirdly enough, i need to do a fairly large import today for the school (lots of custom fields etc), so am thinking about how i'll approach it :)

most likely, i'll do it as a custom script, since it typically requires a lot of data massaging

lobo

Michael McAndrew

  • Forum Godess / God
  • I live on this forum
  • Posts: 1274
  • Karma: 55
    • Third Sector Design
  • CiviCRM version: various
  • CMS version: Nearly always Drupal
  • MySQL version: 5.5
  • PHP version: 5.3
Re: Contact Importing scalability, and scalability of the system in general
May 26, 2010, 11:20:47 am
Happy to move this to another thread if people think it has drifted too far from the original topic.

Quote from: Erik Hommel on May 25, 2010, 11:13:15 am
Michael,
I recognize part of what I do in your second story. I tend to:
* import the source data into MySQL
* do some simple manipulation in MySQL
* use some of my little tools to clean some more
* use my base script to run the API's in one batch
I have added a little error reporting as well.

I do something similar, except I use PHP rather than 'MySQL and my little tools' to do the cleaning, because then I can clean and import in one step. That is handy because it lets me follow this workflow:
1) do a test import and show it to the client
2) they say "that isn't quite right"
3) I make the change in the script and re-import.
It also means I still have the file in the format they originally gave it to me, so when they say "oh, we updated our data", it only takes a couple of clicks for me to redo the import.
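Here is a minimal Python sketch of this clean-as-you-go workflow, assuming the client's original CSV is kept untouched and every run re-reads it from the start. `clean_row` holds the client-specific rules that change after each review round. `create_contact` is a hypothetical stub standing in for the real CiviCRM API call, so the only thing this shows is the loop structure and the per-line error report.

```python
import csv
import io


def clean_row(row):
    """Client-specific cleaning rules; edit these after each review round."""
    return {
        "contact_type": "Individual",
        "first_name": row["first"].strip().title(),
        "last_name": row["last"].strip().title(),
        "email": row["email"].strip().lower(),
    }


def create_contact(params):
    """Hypothetical stub for the real API call; returns (ok, message)."""
    if not params["email"]:
        return False, "missing email"
    return True, "created"


def run_import(csv_file):
    errors = []
    imported = 0
    # start=2 so reported line numbers match the file (line 1 is the header).
    for line_no, row in enumerate(csv.DictReader(csv_file), start=2):
        ok, message = create_contact(clean_row(row))
        if ok:
            imported += 1
        else:
            errors.append((line_no, message))
    return imported, errors


source = io.StringIO("first,last,email\n ada ,LOVELACE,Ada@Example.org\nbob,smith,\n")
print(run_import(source))  # (1, [(3, 'missing email')])
```

Because cleaning and importing happen in one pass over the original file, "we updated our data" just means rerunning the script on the new file.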

Does that ring a bell? Is anyone interested in sharing techniques on this, and maybe building up a library for cleaning data and getting it ready for import into CiviCRM?

Michael
Service providers: Grow your business, build your reputation and support CiviCRM. Become a partner today

Erik Hommel

  • Forum Godess / God
  • I live on this forum
  • Posts: 1773
  • Karma: 59
    • EE-atWork
  • CiviCRM version: all sorts
  • CMS version: Drupal
  • MySQL version: Ubuntu's latest LTS version
  • PHP version: Ubuntu's latest LTS version
Re: Contact Importing scalability, and scalability of the system in general
May 26, 2010, 11:23:32 am
My little tools are in PHP, and I am quite willing to share whatever I have. Most of it is already included in the API code snippet examples on the wiki.
I am starting tomorrow with a small change to the contact API, adding one custom data field. I will post that code on the wiki too.
Consultant/project manager at EEatWork and CiviCooP (http://www.civicoop.org/)

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Contact Importing scalability, and scalability of the system in general
May 26, 2010, 11:37:48 am
Hi,

same here: all the scripts run from the shell, so I don't have memory/timeout issues. I tend to skip the MySQL step and read the CSV/JSON/XML file directly (passed as a parameter, e.g. nice -19 php5-cli mylovelyimportprogram.php filetoimport.csv)

Tip: I use tools/bin/scripts/cli.php (bin/cli.php in 3.2) in the scripts, so it takes care of reading civicrm.settings.php, finding the right database, setting the right user and so on.

So the real calls look like:

nice -19 php5-cli bin/mylovelyimportprogram.php -sdomain.dev -umyadminuser -pmypassword filetoimport.csv
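xavier's scripts are PHP run through bin/cli.php. As a language-neutral illustration, here is a Python sketch of the same shape: the import file is passed on the command line and read row by row, so memory use stays flat however large the file is. The `-s`/`-u`/`-p` options only mirror the invocation above and are hypothetical here; in the real setup cli.php handles them.

```python
import argparse
import csv


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Batch contact import (sketch)")
    parser.add_argument("-s", dest="site", help="site/domain (mirrors cli.php -s)")
    parser.add_argument("-u", dest="user", help="admin user (mirrors cli.php -u)")
    parser.add_argument("-p", dest="password", help="password (mirrors cli.php -p)")
    parser.add_argument("file", help="CSV file to import")
    return parser.parse_args(argv)


def stream_rows(path):
    """Yield one dict per CSV row without loading the whole file into memory."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def main(argv=None):
    args = parse_args(argv)
    count = 0
    for row in stream_rows(args.file):
        count += 1  # the per-row API calls would go here
    print(f"{count} rows read from {args.file}")
    return count
```

To run it, call `main()` under the usual `if __name__ == "__main__":` guard and invoke it the same way as above, e.g. `nice -19 python3 import_contacts.py -sdomain.dev -umyadminuser -pmypassword filetoimport.csv` (the script name is hypothetical). argparse accepts the attached `-sdomain.dev` form as well as `-s domain.dev`.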




xcf33

  • I post frequently
  • Posts: 181
  • Karma: 7
  • CiviCRM version: 3.3.2
  • CMS version: Drupal 6.19/6.20
  • MySQL version: 5.x
  • PHP version: 5.2.6
Re: Contact Importing scalability, and scalability of the system in general
May 27, 2010, 11:11:02 am
Quote
am waiting for chang to start contributing them. i'll try to write one so others can build on it

Sorry guys, there have been a lot of things on my plate over the last couple of days; I apologize. Unit testing is still fairly new to me, so I'm taking baby steps.

I did just commit some code for different test cases; maybe you can help build on it. The delay has been that I haven't been able to set up the PHPUnit testing environment on my server yet :(


Anyway, while coding, I found that one of the most annoying things about the current API is that I have to make separate Location API calls when importing addresses, phone numbers, etc.

That requires the contacts to be imported first using the Contact API, and only then can the Location API calls be made. If some of the contacts fail to be created by the Contact API, the later calls no longer line up with the rows of the original CSV file. It is still safer to build a temporary import table than to do everything in memory.
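A minimal sketch of the temporary-table approach, using Python's `sqlite3` as a stand-in for a MySQL staging table. Each CSV row is stored with its original line number, the contact ID is written back after the contact step, and the location step only touches rows that actually got a contact, so nothing can drift out of line with the CSV. Table and column names are illustrative, and `create_contact` is a hypothetical stub.

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE import_staging (
           line_no    INTEGER PRIMARY KEY,  -- row number in the original CSV
           email      TEXT,
           phone      TEXT,
           contact_id INTEGER,              -- filled in after the contact step
           error      TEXT                  -- why the contact step failed, if it did
       )"""
)
conn.executemany(
    "INSERT INTO import_staging (line_no, email, phone) VALUES (?, ?, ?)",
    [(2, "ada@example.org", "555-0100"), (3, "", "555-0101"), (4, "bob@example.org", "")],
)

_next_id = count(101)


def create_contact(email):
    """Hypothetical stub: fails when there is no email, else returns a new ID."""
    if not email:
        raise ValueError("missing email")
    return next(_next_id)


# Step 1: create contacts and record the outcome against the original line.
for line_no, email in conn.execute(
    "SELECT line_no, email FROM import_staging ORDER BY line_no"
).fetchall():
    try:
        cid = create_contact(email)
        conn.execute("UPDATE import_staging SET contact_id = ? WHERE line_no = ?", (cid, line_no))
    except ValueError as exc:
        conn.execute("UPDATE import_staging SET error = ? WHERE line_no = ?", (str(exc), line_no))

# Step 2: location data only for rows that have a contact and a phone number.
pending = conn.execute(
    "SELECT line_no, contact_id, phone FROM import_staging "
    "WHERE contact_id IS NOT NULL AND phone <> '' ORDER BY line_no"
).fetchall()
print(pending)  # [(2, 101, '555-0100')]
```

The `error` column doubles as the report for the client: every failed row is still traceable to its line in their file.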


My script's process flow is as follows:

1. Call the Contact API to create contacts
2. If the import file contains address data, phone numbers, etc., invoke the Location API
3. Add the contacts to groups if that option is chosen
4. Add tags to the contacts if that option is chosen
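The four steps above can be sketched as a per-row pipeline in which the later steps run only if the contact step succeeded and the corresponding data or option is present. The `api` function and the entity/action names passed to it are hypothetical stubs that just record the calls; they are not the actual CiviCRM API signatures.

```python
calls = []


def api(entity, action, params):
    """Hypothetical stub for a CiviCRM API call; records the call, returns an ID."""
    calls.append((entity, action))
    if entity == "contact" and not params.get("email"):
        raise ValueError("missing email")
    return len(calls)


def import_row(row, group_id=None, tag_id=None):
    # 1. Create the contact; nothing else happens if this raises.
    contact_id = api("contact", "create", {"email": row.get("email")})
    # 2. Location data only if the row actually carries some.
    location = {k: row[k] for k in ("street_address", "phone") if row.get(k)}
    if location:
        api("location", "create", {"contact_id": contact_id, **location})
    # 3. and 4. Group and tag only if those options were chosen for this job.
    if group_id is not None:
        api("group_contact", "create", {"contact_id": contact_id, "group_id": group_id})
    if tag_id is not None:
        api("entity_tag", "create", {"contact_id": contact_id, "tag_id": tag_id})
    return contact_id


failed = []
rows = [{"email": "ada@example.org", "phone": "555-0100"}, {"email": ""}]
for line_no, row in enumerate(rows, start=2):
    try:
        import_row(row, group_id=7)
    except ValueError as exc:
        failed.append((line_no, str(exc)))

print(calls)
print(failed)  # [(3, 'missing email')]
```

Handling each row end to end keeps a failed contact from leaving orphaned location, group or tag calls behind. The trade-off is more API round trips than running each step once over the whole file.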

fen

  • I post frequently
  • Posts: 216
  • Karma: 13
    • CivicActions
  • CiviCRM version: 3.3-4.3
  • CMS version: Drupal 6/7
  • MySQL version: 5.1/5.5
  • PHP version: 5.3/5.4
Re: Contact Importing scalability, and scalability of the system in general
February 18, 2011, 07:08:15 pm
Wow - I have just discovered that I am reinventing the same wheel that you all have already created.  I have scripts to pull the CSV into mysql, mysql scripts to do a lot of the cleaning (though I like @michaelmcandrew's process of "loop through each of the records, cleaning as I go, and creating an array which I feed to the relevant API to do the import") and I'm just starting on building my API calls.  I took a look at the framework at http://svn.civicrm.org/civicrm/branches/v3.2.import/api/v2/Import.php and that's more abstract than I can afford to dive into now.  I just need results, and it sounds like we've all gone down the same path (though I'm just starting).

My API calls will need to handle custom data fields (haven't tested yet, but it appears I may be in luck, with this handled automagically in civicrm_contact_create) and some relationships (e.g., employee of). Have any of you published your tools? The API code snippets are excellent (thank you, Erik, and anyone else who contributed); they give me the confidence to try this out. But as I'm under the gun, something that already works to some degree would be much appreciated. I will surely return any updates I make...

Erik Hommel

  • Forum Godess / God
  • I live on this forum
  • Posts: 1773
  • Karma: 59
    • EE-atWork
  • CiviCRM version: all sorts
  • CMS version: Drupal
  • MySQL version: Ubuntu's latest LTS version
  • PHP version: Ubuntu's latest LTS version
Re: Contact Importing scalability, and scalability of the system in general
February 19, 2011, 03:39:16 am
I have a code example I am happy to share with you; just give me a shout at my email address (mailto:hommel@ee-atwork.nl).

xcf33

  • I post frequently
  • Posts: 181
  • Karma: 7
  • CiviCRM version: 3.3.2
  • CMS version: Drupal 6.19/6.20
  • MySQL version: 5.x
  • PHP version: 5.2.6
Re: Contact Importing scalability, and scalability of the system in general
February 23, 2011, 08:06:47 pm
I have developed a Drupal module that we have been using with good results.

https://github.com/emotive/CiviCRM-Scalable-Import-Tool

I have been able to use it to import up to 100,000 contacts.

It's under active development and improvement, and it's pretty stable for the most part. It mainly uses the Drupal API as the form builder and front end, and the CiviCRM Contact/Location/Group/Tag APIs to import the contacts.

There is some validation built into it as well. It has some limitations, but it might be worth giving it a try.

The 3.2 import branch is a bit slow; I think Lobo is working on abstracting the native import process into APIs so developers can "plug" their front-end interface into it.


Cheers!

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Contact Importing scalability, and scalability of the system in general
February 24, 2011, 05:08:11 pm
Quote from: xcf33 on February 23, 2011, 08:06:47 pm
The 3.2 import branch is a bit slow; I think Lobo is working on abstracting the native import process into APIs so developers can "plug" their front-end interface into it.

this is currently on hold since it's a fair number of hours and investment (100-200?). I suspect we'll launch a Make It Happen to try to accomplish this once we have a few seed funders

lobo

alanms

  • I post occasionally
  • Posts: 72
  • Karma: 5
Re: Contact Importing scalability, and scalability of the system in general
March 07, 2011, 04:59:53 am
That Drupal module looks like a promising step forward.

I'm putting in a proposal to a client to do some work on this. I don't want to take this fine discussion off-topic, as my aims and spec are different from what's being discussed here, but it'd be great to get input and thoughts, and what I'm proposing might be useful to some of you: http://forum.civicrm.org/index.php/topic,18923.0.html

JoeMurray

  • Administrator
  • Ask me questions
  • Posts: 578
  • Karma: 24
    • JMA Consulting
  • CiviCRM version: 4.4 and 4.5 (as of Nov 2014)
  • CMS version: Drupal, WordPress, Joomla
  • MySQL version: MySQL 5.5, 5.6, MariaDB 10.0 (as of Nov 2014)
Re: Contact Importing scalability, and scalability of the system in general
March 17, 2011, 07:02:16 am
I'm wondering if we would be better off with a different approach for this problem.

In general, migrations involve inter-related records for contacts; associated phone, address and email records; membership info; contribution info; activities; etc.

We also need to generally transform data so that it is in a format that CiviCRM can import.
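To illustrate the kind of transformation meant here, this minimal Python sketch covers only the "T" step. One flat source row is split into a contact record plus the related email, phone and address records that CiviCRM stores separately, all linked by an external identifier. The source column names and the output structure are assumptions for illustration. A tool such as Kettle would express the same mapping as transformation steps rather than code.

```python
def transform(source_row):
    """Split one flat legacy row into a contact and its related records."""
    ext_id = source_row["legacy_id"]
    contact = {
        "external_identifier": ext_id,
        "contact_type": "Individual",
        "first_name": source_row["fname"].strip(),
        "last_name": source_row["lname"].strip(),
    }
    related = []
    if source_row.get("email"):
        related.append(("email", {"external_identifier": ext_id,
                                  "email": source_row["email"].strip().lower()}))
    # Legacy systems often have several phone columns; emit one record each.
    for column, phone_type in (("home_tel", "Home"), ("work_tel", "Work")):
        if source_row.get(column):
            related.append(("phone", {"external_identifier": ext_id,
                                      "phone": source_row[column].strip(),
                                      "phone_type": phone_type}))
    if source_row.get("street"):
        related.append(("address", {"external_identifier": ext_id,
                                    "street_address": source_row["street"].strip(),
                                    "city": source_row.get("city", "").strip()}))
    return contact, related


contact, related = transform({
    "legacy_id": "L-42", "fname": "Ada ", "lname": "Lovelace",
    "email": "ADA@example.org", "home_tel": "555-0100", "work_tel": "",
    "street": "12 St James's Sq", "city": "London",
})
print(contact["first_name"], [kind for kind, _ in related])
# Ada ['email', 'phone', 'address']
```

Keying related records on an external identifier rather than a CiviCRM contact ID lets the transform run before anything is loaded, which is what makes a separate ETL stage possible at all.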

As I have a potential client who will periodically need to import about 3 million records, and initially will need to support a variety of formats, I'm looking into Pentaho Kettle. As well as being a leading open source ETL project, it integrates with Hadoop for higher end scalability and speed (though I haven't found that code available for download to the community - perhaps it is reserved for their Enterprise customers).

Anyone else interested in pursuing this approach?
Co-author of Using CiviCRM https://www.packtpub.com/using-civicrm/book

davej

  • Ask me questions
  • Posts: 404
  • Karma: 21
Re: Contact Importing scalability, and scalability of the system in general
July 27, 2011, 08:45:12 am
Hi,

Re the 30 second limit with Rackspace Cloud, Rackspace tell me this just applies to their Cloud Sites hosting platform: http://www.rackspace.com/cloud/cloud_hosting_products/sites/ .

They suggested: "One thing they could try would be to program a 'progress bar' of sorts into their application so that the connection at least has some data going through it, forcing it to stay open."
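A minimal sketch of Rackspace's suggestion, assuming the web layer can stream a response body from a generator, as a WSGI application can. Each completed batch yields a short progress line, so bytes keep flowing to the client during a long import. The batch size and the `import_batch` stub are hypothetical. Servers and proxies may buffer output, so whether this actually keeps a given host's connection open has to be tested on that platform.

```python
from itertools import islice


def import_batch(chunk):
    """Hypothetical stub for the real per-batch import work."""
    return len(chunk)


def import_with_progress(rows, batch_size=500):
    """Yield a progress line after every batch so the connection sees traffic."""
    rows = list(rows)
    total = len(rows)
    done = 0
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        done += import_batch(chunk)
        yield f"{done}/{total} imported\n".encode("utf-8")
    yield b"done\n"


def app(environ, start_response):
    """WSGI application: the returned iterable is sent to the client piece by piece."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return import_with_progress(range(1200))
```

This only keeps the request alive; the import still runs inside the web request, so it is a stopgap rather than a substitute for a background job.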

Perhaps that's an interim option until the import refactoring is done.

Dave J


This forum was archived on 2017-11-26.