CiviCRM Community Forums (archive)

This forum was archived on 25 November 2017.

Author Topic: Excluding known non-duplicates from the Merge logic  (Read 1762 times)

ken

  • I live on this forum
  • *****
  • Posts: 916
  • Karma: 53
    • City Bible Forum
  • CiviCRM version: 4.6.3
  • CMS version: Drupal 7.36
  • MySQL version: 5.5.41
  • PHP version: 5.3.10
Excluding known non-duplicates from the Merge logic
March 06, 2010, 09:07:08 pm
I know there is an issue requesting that known non-duplicates be excluded from Find and Merge Duplicates (CRM-3702).

In the meantime I'm trying to simulate this by creating a custom data field which takes on the value '0' by default, and which gets included in deduplication rules. If I find 2 people with (say) the same first name and last name, I can set this value to non-zero for one of the contacts and they won't be included in future dedupe reports.

How can I set this value for all existing contacts? I could try the action Batch Update With Profile, but this does 50 at a time ...s-l-o-w!

Any ideas?

Ken

ken

Re: Excluding known non-duplicates from the Merge logic
March 06, 2010, 09:49:22 pm
I've found a way of excluding known non-duplicates which doesn't require adding data to all contacts.

  • Create a custom data group with a single field
  • Change the dedupe rules to include that field, but with a NEGATIVE weight (i.e., so that a match on that field pulls the overall score below the threshold)
  • Using that rule, when two or more contacts match, but are shown to be non-duplicates, set their custom data field to the same value
  • That group of contacts will not appear in any future search for duplicates
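
For anyone wanting to see why the negative weight works, here is a minimal sketch of the scoring idea in Python. It is a toy model, not CiviCRM code: the field name `not_duplicate_marker`, the weights and the threshold are all made-up assumptions, and it assumes (as the steps above imply) that an empty field never counts as a match.

```python
from itertools import combinations

# Hypothetical rule: field name -> weight. The names, weights and threshold
# are illustrative assumptions, not values from a real CiviCRM install.
RULE_WEIGHTS = {"first_name": 5, "last_name": 5, "not_duplicate_marker": -10}
THRESHOLD = 10


def match_score(a, b, weights=RULE_WEIGHTS):
    """Sum the weights of every field where both contacts hold the same non-empty value."""
    score = 0
    for field, weight in weights.items():
        value_a, value_b = a.get(field), b.get(field)
        if value_a and value_a == value_b:
            score += weight
    return score


def find_duplicates(contacts, threshold=THRESHOLD):
    """Return the id pairs whose combined score reaches the threshold."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(contacts, 2)
        if match_score(a, b) >= threshold
    ]


contacts = [
    # Contacts 1 and 2 were reviewed and marked as known non-duplicates.
    {"id": 1, "first_name": "John", "last_name": "Smith", "not_duplicate_marker": "smith-1"},
    {"id": 2, "first_name": "John", "last_name": "Smith", "not_duplicate_marker": "smith-1"},
    # Contact 3 is new and has no marker yet.
    {"id": 3, "first_name": "John", "last_name": "Smith"},
]

print(find_duplicates(contacts))  # [(1, 3), (2, 3)]
```

Contacts 1 and 2 score 5 + 5 - 10 = 0 and drop out of the report, while the unmarked contact 3 still scores 10 against each of them and is reported for review.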

Ken

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Excluding known non-duplicates from the Merge logic
March 07, 2010, 01:26:49 am
Smart one. Thanks for sharing.

Dave Greenberg

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 5760
  • Karma: 226
    • My CiviCRM Blog
Re: Excluding known non-duplicates from the Merge logic
March 08, 2010, 02:59:00 pm
Ken - this seems like an interesting approach, and probably quite useful for other folks.

Would be cool if you could add this as a {tip} to the wiki section on Finding Dupes!

http://wiki.civicrm.org/confluence/display/CRMDOC/Find+Duplicate+Contacts

ken

Re: Excluding known non-duplicates from the Merge logic
March 08, 2010, 07:38:13 pm
Dave,

I'm still playing with the approach, so let me iron out the bugs (and I hope there are none) and I'll update that page.

Ken

DanilaD

  • I post occasionally
  • **
  • Posts: 93
  • Karma: 11
Re: Excluding known non-duplicates from the Merge logic
September 08, 2010, 08:21:51 am
Excellent idea! Thanks a lot, Ken.

petednz

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4899
  • Karma: 193
    • Fuzion
  • CiviCRM version: 3.x - 4.x
  • CMS version: Drupal 6 and 7
Re: Excluding known non-duplicates from the Merge logic
September 08, 2010, 01:43:14 pm
Ken - another approach that I wondered about was using a Relationship, i.e. Person A is 'not duplicate of' Person B

That would mean only needing to do the modification on one of the contacts - and then have a Group of 'all except those with above relationships' - any comments since you have been trying things out?
Sign up to StackExchange and get free expert advice: https://civicrm.org/blogs/colemanw/get-exclusive-access-free-expert-help

pete davis : www.fuzion.co.nz : connect + campaign + communicate

ken

Re: Excluding known non-duplicates from the Merge logic
September 08, 2010, 05:43:12 pm
Peter,

I updated the page http://wiki.civicrm.org/confluence/display/CRMDOC/Find+Duplicate+Contacts back on August 20 as Dave G suggested. Sorry for not posting back here to advise.

We haven't had a great deal of experience with this approach, as we are a small outfit that has to live with some degree of ambiguity: we generally only clean up data when a need presents itself, and when the team has time (which isn't often).

I like your Relationship approach as it relies on an existing feature rather than requiring new custom data. The custom data approach presents itself on every contact record (noisy) while the relationship only appears where it needs to (efficient). (Though perhaps the contra there is that being in-your-face, the custom data approach might be easier to train people in?)

How would this scenario work?

Say "John Smith" 1 and "John Smith" 2 have a relationship "share the same name". Another "John Smith" comes along: will he be a candidate for deduping? I think he will, as he's in the group you propose (persons not in a "shares name" relationship). If #3 is a duplicate he gets merged (ouch!). If he's not, we create a relationship with one of the first two.

If we realise we've made a mistake (they are duplicates), we can remove the relationship from the duplicate and then merge it with the original (I think we need to do things in that order because if we merge while the relationship exists, we run the risk of either having a relationship from and to the same contact, or 2 relationships between the same pair.)
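
To illustrate the ordering worry, here is a small Python sketch. It is a toy model only: it assumes a merge simply repoints every relationship from the duplicate to the kept contact, which may not be what CiviCRM's merge actually does, and the function names and relationship label are made up for the example.

```python
def merge(relationships, duplicate_id, kept_id):
    """Repoint every relationship end from duplicate_id to kept_id (toy model)."""
    def repoint(contact_id):
        return kept_id if contact_id == duplicate_id else contact_id

    return [(repoint(a), repoint(b), kind) for a, b, kind in relationships]


def self_relationships(relationships):
    """Return relationships that start and end on the same contact."""
    return [r for r in relationships if r[0] == r[1]]


relationships = [(1, 2, "shares name with")]

# Merging while the relationship still exists leaves contact 1 related to itself.
merged_first = merge(relationships, duplicate_id=2, kept_id=1)
print(self_relationships(merged_first))  # [(1, 1, 'shares name with')]

# Removing the relationship before merging avoids the problem.
cleaned = [r for r in relationships if {r[0], r[1]} != {1, 2}]
removed_first = merge(cleaned, duplicate_id=2, kept_id=1)
print(self_relationships(removed_first))  # []
```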

Finally, how do you create the group "not in a 'share name' relationship"? My mind's gone blank on that one.

Smart thinking, oh fellow Antipodean!

Ken

(we will beat you guys in the Rugby one day, you know! People have told me I'm a patient guy, and I might need that!)

petednz

Re: Excluding known non-duplicates from the Merge logic
September 08, 2010, 07:45:13 pm
Quote from: ken on September 08, 2010, 05:43:12 pm
Finally, how do you create the group "not in a 'share name' relationship"? My mind's gone blank on that one.
The in-built custom search that says 'yes this group, not that group'
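
In effect that search is a set difference. A minimal Python sketch of the idea follows, assuming the "exclude" group has already been filled with the contacts that hold the relationship; the contact ids and group names here are invented for illustration.

```python
def include_exclude(include_group, exclude_group):
    """Return the contact ids that are in include_group but not in exclude_group."""
    return sorted(set(include_group) - set(exclude_group))


# Hypothetical contact ids.
all_individuals = [1, 2, 3, 4]
shares_name_relationship = [1, 2]  # contacts already marked via the relationship

print(include_exclude(all_individuals, shares_name_relationship))  # [3, 4]
```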

But more important (for others who may be reading this, hint hint) is to fund the real solution to this via the 'make it happen' funding option (I know funds are hard to come by)

Quote
(we will beat you guys in the Rugby one day, you know!)
As our famous beer ad billboards say 'Yeah right!' ;-)

I still have painful memories of John Eales killing us in the final minute on more than one occasion. And don't mention Geoff Wilson being tackled mid-air on way to touchdown - brings shivers still.


This forum was archived on 2017-11-26.