CiviCRM Community Forums (archive)

This forum was archived on 25 November 2017. Learn more.

Author Topic: contact API - dedupe  (Read 4444 times)

lcdweb

  • Forum Godess / God
  • I live on this forum
  • *****
  • Posts: 1620
  • Karma: 116
    • www.lcdservices.biz
  • CiviCRM version: many versions...
  • CMS version: Joomla/Drupal
  • MySQL version: 5.1+
  • PHP version: 5.2+
Re: contact API - dedupe
March 26, 2012, 10:04:51 am
Personally, I would want that to be handled by the hook.
We don't make that assumption. In fact, for our data the existing address is typically the more reliable one.
support CiviCRM through 'make it happen' initiatives!
http://civicrm.org/mih

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: contact API - dedupe
March 26, 2012, 10:20:34 am
If 100% of the data is similar, what's the risk of merging address fields?

X+
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

lcdweb

Re: contact API - dedupe
March 26, 2012, 10:24:59 am
It's not a question of risk, it's a question of performance.

xavier

Re: contact API - dedupe
March 26, 2012, 11:10:37 am
Are you using the API? For phone/website/email, we identified some performance issues that could (mostly easily) be fixed.

X+

lcdweb

Re: contact API - dedupe
March 26, 2012, 11:34:01 am
Xavier --
Fen is working with the batch merge functionality that will be included in 4.2.
It's a pretty time- and resource-intensive function as it is. I'm just suggesting that an address comparison is not trivial to spec and process, and would potentially add quite a bit of time to what is already a long-running script. So my recommendation is to allow the safe batch merge to proceed as it currently does, and then handle the cleanup afterward. You *could* handle it via the hook, which may be fine if you have a small volume.

xavier

Re: contact API - dedupe
March 26, 2012, 12:01:33 pm
Fine for me, and a great addition to Civi.

Do you already have some benchmarks on how many contacts can be matched/merged?

I'm assuming that's a big job to be run in the middle of the night or during weekends.

X+

fen

  • I post frequently
  • ***
  • Posts: 216
  • Karma: 13
    • CivicActions
  • CiviCRM version: 3.3-4.3
  • CMS version: Drupal 6/7
  • MySQL version: 5.1/5.5
  • PHP version: 5.3/5.4
Re: contact API - dedupe
March 28, 2012, 12:45:37 am
Attached is my current attempt to 1) not duplicate phone, email or address location info (with a very simple-minded '==' check), and 2) switch the primary address to be the more recently added one (adding an operation '3' to CRM/Dedupe/Merger.php).  I've just finished writing this and have not tested it yet (that's for tomorrow), but I'd be interested to hear whether you think I'm moving in the right direction or whether I have a fatal flaw in my logic somewhere.

I think this would be relatively lightweight (wrt the rest of the process) and generally advantageous.
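
To illustrate the kind of '==' check described above, here is a minimal sketch in Python rather than CiviCRM's PHP. It is not the attached patch: the field names, the dict layout, and the helper names `is_duplicate` and `addresses_to_move` are assumptions made for illustration only.

```python
# Hypothetical sketch: the field list and dict layout are assumptions,
# not the structures used in CRM/Dedupe/Merger.php.
COMPARE_FIELDS = ("street_address", "city", "postal_code", "location_type_id")


def is_duplicate(existing, incoming, fields=COMPARE_FIELDS):
    """Simple-minded '==' check on a fixed set of fields."""
    return all(existing.get(f) == incoming.get(f) for f in fields)


def addresses_to_move(main_addresses, other_addresses):
    """Return the Other contact's addresses that are not already on Main."""
    return [
        addr
        for addr in other_addresses
        if not any(is_duplicate(m, addr) for m in main_addresses)
    ]
```

Each Other address is compared against every Main address, so the cost grows with the product of the two list lengths. For the handful of addresses a contact normally has, that should stay small next to the rest of the merge.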

Thanks!
=Fen

xavier

Re: contact API - dedupe
March 28, 2012, 01:04:58 am
Hi,

Sounds like great additions.

It reminds me of a "could be done better" in the merge: if the merged contacts work for the same organisation, the result ends up with two "employee/employer" relationships. Is this something that will happen with batch merge?

X+

fen

Re: contact API - dedupe
March 28, 2012, 04:44:40 pm
There's also a potential problem that:
IF the Main contact
 * has a HOME address (location_type_id=1) that is_primary, AND
 * has a WORK address (location_type_id=2)
AND IF the Other contact
 * has a HOME address (location_type_id=1) that is_primary,
THEN
 * the resulting merged contact will have two WORK addresses, the second one being the old Other contact's HOME address.

I believe this breaks the constraint (that should exist if it doesn't already) that each contact can have at most one address of each location type.  One place the need for such a constraint shows up is with shared addresses, as one chooses to share an address using two criteria: the contact (id) and the location_type_id.

Not sure yet of a smart (and efficient) way to deal with this.

Eileen

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4195
  • Karma: 218
    • Fuzion
Re: contact API - dedupe
March 28, 2012, 04:53:32 pm
Quote
I believe this breaks the constraint (that should exist if it doesn't already) that each contact can have at most one address of each location type.

I don't believe this constraint exists, or that it is always the case that people only ever have one work phone number, for example.
Make today the day you step up to support CiviCRM and all the amazing organisations that are using it to improve our world - http://civicrm.org/contribute

fen

Re: contact API - dedupe
March 28, 2012, 07:15:36 pm
I'd really like to get clarity on this.  I believe I have read (e.g. at http://forum.civicrm.org/index.php/topic,22366.msg93784.html#msg93784) that multiple emails and phones can exist per location type, but that addresses are treated specially and only one address per location type should be allowed.  This constraint is enforced in the UI (e.g., once a Home address is created, Home no longer appears in the pull-down for additional addresses), but the API will cheerily allow more than one (e.g.) Work address (as I mentioned above).  While some people have multiple homes or places of business, this does seem to cause a problem with the shared address facility (see above link).
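
Whether or not the constraint is meant to exist, checking for violations is straightforward. The sketch below is in Python, not PHP, and assumes address rows have already been fetched as dicts with `id`, `contact_id` and `location_type_id` keys. The function name `location_type_collisions` is hypothetical, not part of any CiviCRM API.

```python
from collections import defaultdict


def location_type_collisions(addresses):
    """Map (contact_id, location_type_id) to address ids, for pairs that
    hold more than one address."""
    seen = defaultdict(list)
    for addr in addresses:
        key = (addr["contact_id"], addr["location_type_id"])
        seen[key].append(addr["id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

Running this over a contact's addresses after a merge would flag, for example, a contact that ended up with two Work addresses.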

fen

Re: contact API - dedupe
March 29, 2012, 09:49:53 pm
Quote from: fen on March 28, 2012, 12:45:37 am
Attached is my current attempt to 1) not duplicate phone, email or address location info (with a very simple-minded '==' check), and 2) switch the primary address to be the more recently added one (adding an operation '3' to CRM/Dedupe/Merger.php).

I've been trying to get this code to work but nothing is making sense to me (obviously there are intricacies in the code that I am as yet unaware of).  I get the &$data reference in the hook and make changes, but I don't see them carry through to Merger::moveAllBelongings().  One specific example: if the hook sees an Other address that is newer (read: has a greater id) than the Main address, it sets 'operation' to 3, which should trigger a 'switch' operation in moveAllBelongings().  But I never see a '3' come into that code.

What are the expected operations that one can make in the 'batch' process?  Is it limited to conflict management?  Do I need to set state somewhere and do all the operations in the 'sqls' hook?

fen

Re: contact API - dedupe
April 02, 2012, 09:30:34 pm
Quote from: lcdweb on March 26, 2012, 11:34:01 am
an address comparison is not trivial to spec and process, and would potentially add quite a bit of time to what is already a long-processing script. So my recommendation is to allow the safe batch merge to proceed as it currently does, and then handle the cleanup afterward.

Looking at doing this as a post-merge cleanup script, too much information is needed.  For example, you'd need to know which addresses were is_primary and is_billing before any merges, so that if a newer {location_type_id == 1, is_primary == 1} otherId address gets merged into being a {location_type_id == 2, is_primary == 0} mainId address, we can know this and do the right thing.

before Merge:
Code: [Select]
mainId == 50:   address_1: id == 100, loc_type_id == 1, is_primary == 1
                address_2: id == 200, loc_type_id == 2, is_primary == 0
otherId == 100: address_3: id == 300, loc_type_id == 1, is_primary == 1

current after Merge:
Code: [Select]
mainId == 50:  address_1: id == 100, loc_type_id == 1, is_primary == 1
               address_2: id == 200, loc_type_id == 2, is_primary == 0
               address_3: id == 300, loc_type_id == 2, is_primary == 0

My experience is that newer contacts and/or newer address records usually come with newer, more accurate address information.  (If my experience is not the norm, please let me know and I will better understand why my patches are not being considered.)

patched after Merge:
Code: [Select]
mainId == 50:  address_1: id == 100, loc_type_id == 2, is_primary == 0
               address_2: id == 200, loc_type_id == 2, is_primary == 0
               address_3: id == 300, loc_type_id == 1, is_primary == 1

I believe this can be accomplished with very little extra processing (see merge_switch2.txt, above) but I may be wrong as the patch is not working due to some magic behind some curtain I am not yet aware of.  True, this will not be correct for every case, but I believe it will be correct more of the time, which is a good thing.
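
The "patched after Merge" behaviour can be sketched as follows, in Python rather than PHP. This is not the contents of merge_switch2.txt. It assumes that "newer" means "greater id", and that the losing address is demoted to location type 2 and made non-primary, as in the example. The function name and the `fallback_location_type` parameter are hypothetical.

```python
def merge_addresses(main_addresses, other_addresses, fallback_location_type=2):
    """Move Other's addresses onto Main. When two addresses share a location
    type, the newer one (greater id) keeps that type and takes the Main
    address's primary flag; the older one is demoted."""
    merged = [dict(a) for a in main_addresses]  # copy, leave inputs untouched
    for incoming in other_addresses:
        incoming = dict(incoming)
        clash = next(
            (a for a in merged if a["loc_type_id"] == incoming["loc_type_id"]),
            None,
        )
        if clash is None:
            # No collision: keep the type, but never add a second primary.
            if any(a["is_primary"] for a in merged):
                incoming["is_primary"] = 0
        elif incoming["id"] > clash["id"]:
            # Incoming address is newer: it takes over, the old one is demoted.
            incoming["is_primary"] = clash["is_primary"]
            clash["loc_type_id"] = fallback_location_type
            clash["is_primary"] = 0
        else:
            # Incoming address is older: demote it instead.
            incoming["loc_type_id"] = fallback_location_type
            incoming["is_primary"] = 0
        merged.append(incoming)
    return merged
```

With the "before Merge" data above, address 300 ends up as the primary location type 1 address and address 100 is demoted to location type 2. The demoted address still shares location type 2 with address 200, so this sketch does nothing about the one-address-per-location-type question raised earlier in the thread.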

fen

Re: contact API - dedupe
April 10, 2012, 08:17:04 am
Coming back to this:

I missed it the first (and second) times through, but I see now that updating $data in hook_merge('batch') won't do anything, as 'old_migration_info' is passed, not the real $migrationInfo.  So only changes to $data['conflicts'] are carried through.  It would be much more powerful if the data structure were available for modification.  But that seems to be by design: the data structure is very specific, could break easily if modified incorrectly, and would need careful documentation and handling.
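
A small model of that behaviour, written in Python rather than PHP, shows why edits made in the hook go missing. It is an illustration of copy-versus-reference semantics only, not CiviCRM's actual code: `run_batch_hook`, `example_hook`, and the 'address_primary' conflict key are hypothetical names.

```python
import copy


def run_batch_hook(hook, migration_info):
    """Model of the behaviour described above: the hook receives a copy of
    the migration info, and only 'conflicts' is read back afterwards."""
    data = {
        "old_migration_info": copy.deepcopy(migration_info),
        "conflicts": {},
    }
    hook(data)
    return data["conflicts"]  # everything else the hook changed is discarded


def example_hook(data):
    data["old_migration_info"]["operation"] = 3     # lost: this edits a copy
    data["conflicts"]["address_primary"] = "other"  # kept: read by the caller
```

Calling `run_batch_hook(example_hook, info)` leaves `info` unchanged and returns only the conflicts entry, which matches the symptom of never seeing a '3' arrive in moveAllBelongings().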

So I've added a hook for your consideration: 'address' called in skipMerge enables address updates in the hook.  I've attached the Merger.patch and some example code for the hook.  Does this seem generally useful?  (It's certainly needed in our use case.)

Thanks!

This forum was archived on 2017-11-26.