CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017. Learn more.




Topic: default dedupe rules (Read 11658 times)

lcdweb

  • Forum Godess / God
  • I live on this forum
  • *****
  • Posts: 1620
  • Karma: 116
    • www.lcdservices.biz
  • CiviCRM version: many versions...
  • CMS version: Joomla/Drupal
  • MySQL version: 5.1+
  • PHP version: 5.2+
default dedupe rules
August 04, 2010, 11:24:51 pm
I think the default dedupe rules for individuals that ship with Civi need to be swapped.
The fuzzy rule (first name + last name + email) is stricter than the strict rule (email only).
support CiviCRM through 'make it happen' initiatives!
http://civicrm.org/mih

Dave Greenberg

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 5760
  • Karma: 226
    • My CiviCRM Blog
Re: default dedupe rules
August 05, 2010, 09:55:55 am
Brian - Not sure there is a good "one size fits all" default, but it would be helpful to hear more details about your goals in suggesting this (i.e. less accidental matching? or ??), and why you think "first, last, email" would provide better results. It would also be great to hear from others what they're using and how it's working for them.

Some rationales for the current STRICT default:
* STRICT rule is used primarily for public facing forms (online contributions, event registration ...) where no staff / admin person from the site's owner is evaluating the decision as to whether this transaction is coming from a new or existing contact.
* The only data point we're sure to get on these forms is email address.
* We've assumed that email address is a decent unique "key" for a person (although we know that some folks share them - that seems more like an edge case).
* Adding the requirement to also match on name as the default behavior seems likely to produce more unintended no-match conditions (and hence more dupes to clean up):
-- Dave Green dave@foobar.org
-- David Green dave@foobar.org
etc.
Protect your investment in CiviCRM by  becoming a Member!

Eileen

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4195
  • Karma: 218
    • Fuzion
Re: default dedupe rules
August 05, 2010, 06:07:27 pm
I think I'm with Brian on this - I'm seeing quite a few installations with large numbers of people who share e-mail addresses - usually families but sometimes organisations. I suspect it depends on the IT-savviness of the constituents.

I would probably compromise on e-mail + first 3 letters of first name. This would produce the right results for

Susan Green & Bob Green bobnsusan@thegreens.org
and for your Dave Green example, but would obviously still fall down on Mike vs Michael vs Michelle etc.
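
To make the trade-offs in the examples so far concrete, here is a minimal sketch in plain Python. It is not CiviCRM code or its API: the `Contact` class, the three rule functions, the `compare_rules` helper, and the surname and email address used for Mike/Michael/Michelle are all hypothetical, and matching is assumed to be a case-insensitive exact comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    first_name: str
    last_name: str
    email: str


def _norm(value: str) -> str:
    """Compare case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower()


def email_only(a: Contact, b: Contact) -> bool:
    """The shipped strict default, as described in this thread."""
    return _norm(a.email) == _norm(b.email)


def first_last_email(a: Contact, b: Contact) -> bool:
    """The shipped fuzzy default, as described in this thread."""
    return (
        email_only(a, b)
        and _norm(a.first_name) == _norm(b.first_name)
        and _norm(a.last_name) == _norm(b.last_name)
    )


def email_plus_first3(a: Contact, b: Contact) -> bool:
    """The compromise: email plus the first 3 letters of the first name."""
    return email_only(a, b) and _norm(a.first_name)[:3] == _norm(b.first_name)[:3]


RULES = (email_only, first_last_email, email_plus_first3)

# Pairs from the thread. The surname and address of the last two are made up.
PAIRS = {
    "Dave/David (same person)": (
        Contact("Dave", "Green", "dave@foobar.org"),
        Contact("David", "Green", "dave@foobar.org"),
    ),
    "Susan/Bob (different people)": (
        Contact("Susan", "Green", "bobnsusan@thegreens.org"),
        Contact("Bob", "Green", "bobnsusan@thegreens.org"),
    ),
    "Mike/Michael (same person)": (
        Contact("Mike", "Example", "m@example.org"),
        Contact("Michael", "Example", "m@example.org"),
    ),
    "Michael/Michelle (different people)": (
        Contact("Michael", "Example", "m@example.org"),
        Contact("Michelle", "Example", "m@example.org"),
    ),
}


def compare_rules() -> dict:
    """Return {pair label: {rule name: matched?}} for every pair and rule."""
    return {
        label: {rule.__name__: rule(a, b) for rule in RULES}
        for label, (a, b) in PAIRS.items()
    }


if __name__ == "__main__":
    for label, results in compare_rules().items():
        print(label, results)
```

Under these assumptions, email only matches all four pairs (including Susan and Bob, who are different people), first + last + email matches none of them, and the compromise gets Dave/David and Susan/Bob right but misses Mike/Michael and wrongly matches Michael/Michelle.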
Make today the day you step up to support CiviCRM and all the amazing organisations that are using it to improve our world - http://civicrm.org/contribute

lcdweb

Re: default dedupe rules
August 05, 2010, 06:20:32 pm
My immediate observation was simply that the Individual default fuzzy rule is stricter than the default strict rule, which is, at the very least, a bit confusing.

But I do think that setting a default strict rule of *just* the email is not strict enough. Even if there's just a single contact in the system that shares an email with another contact, the rule as currently defined could easily result in data being inadvertently overwritten during a member signup, donation, event registration, or other form.

Event registration, in particular, is highly prone to having the same email used for multiple contacts, triggering the "you've already registered" message, etc. Or, if this rule is still set as shipped, someone else could overwrite the contact while registering for an event.

Personally, when I set the rules I want to make sure the default strict rule will *guarantee* there's no accidental overwrite by another contact, as it's the one rule that impacts automated actions. And I think email alone is too risky.
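
The overwrite risk described above can be sketched as follows. This is a simplified model in plain Python, not CiviCRM's actual form processing: the dict-based contacts and the `submit_public_form` and `demo` functions are hypothetical, and the sketch assumes that an unattended form updates the first contact the rule matches and creates a new contact otherwise.

```python
def email_only(existing: dict, submitted: dict) -> bool:
    """Match on email address alone."""
    return existing["email"].lower() == submitted["email"].lower()


def name_and_email(existing: dict, submitted: dict) -> bool:
    """Match only when email, first name and last name all agree."""
    return email_only(existing, submitted) and all(
        existing[key].lower() == submitted[key].lower()
        for key in ("first_name", "last_name")
    )


def submit_public_form(contacts: list, submitted: dict, is_match) -> str:
    """Unattended handling: update the first matching contact, else create one."""
    for index, existing in enumerate(contacts):
        if is_match(existing, submitted):
            # Nobody reviews this step, so submitted values replace stored ones.
            contacts[index] = {**existing, **submitted}
            return "updated"
    contacts.append(dict(submitted))
    return "created"


def demo(is_match) -> list:
    """Susan is already a contact; Bob then registers with the shared address."""
    contacts = [
        {"first_name": "Susan", "last_name": "Green",
         "email": "bobnsusan@thegreens.org"},
    ]
    bob = {"first_name": "Bob", "last_name": "Green",
           "email": "bobnsusan@thegreens.org"}
    submit_public_form(contacts, bob, is_match)
    return [contact["first_name"] for contact in contacts]


if __name__ == "__main__":
    print("email only:    ", demo(email_only))
    print("name and email:", demo(name_and_email))
```

In this model the email-only rule leaves a single contact whose first name is now Bob (Susan's record has been overwritten), while the stricter rule keeps Susan and adds Bob as a second contact that can be merged later if it turns out to be a duplicate.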

Eileen

Re: default dedupe rules
August 05, 2010, 06:21:38 pm
On a far more important note - I'm only 6 karma behind Brian  :P

lcdweb

Re: default dedupe rules
August 05, 2010, 07:40:50 pm
Oh no!
I plateaued at 85 for a while. Looks like it's moving again.

Hey -- if we pool our karma, we could beat Dave...


lcdweb

Re: default dedupe rules
August 05, 2010, 07:44:04 pm
Actually, what's more telling is the posts-to-karma ratio. I think it's sort of the equivalent of on-base percentage in baseball.
So for every 13.3 posts on average, I get a karma. Not bad.

Eileen

Re: default dedupe rules
August 05, 2010, 07:45:11 pm
 >:(

petednz

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4899
  • Karma: 193
    • Fuzion
  • CiviCRM version: 3.x - 4.x
  • CMS version: Drupal 6 and 7
Re: default dedupe rules
August 05, 2010, 11:42:25 pm
Ah - the joys of supporting the Joomla brigade  ;D
Sign up to StackExchange and get free expert advice: https://civicrm.org/blogs/colemanw/get-exclusive-access-free-expert-help

pete davis : www.fuzion.co.nz : connect + campaign + communicate

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: default dedupe rules
August 06, 2010, 03:15:07 pm
If I recall properly, the use case for having email only was as follows:

For an event, set up online registration with no profile, or with a profile that doesn't contain first + last name, so you only have the email.

With a stricter rule, every new registrant is going to generate a new contact, as an email-only submission will never match email + last name (or any other field).

Couple of questions/issues

1) Is strict vs. fussy the right name? Shouldn't it be "default"? As Brian noticed, the fussy can be stricter than the strict, and the practical distinction is that the "strict" one is applied by default.

2) Wouldn't it be better to choose the dedupe rule on a per-activity basis? E.g. for an event type "members meeting" you'd choose a stricter rule than for the "everyone and your uncle" event.

At least for the import, that's one of the common issues we have: run an import and realise we f**ked up the database because someone changed the default rule (or because we forgot that we made it more or less strict to fit another specific need for a specific event/import or whatever uses dedupe).

3) Why isn't anyone "applauding" me ;)

X+
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

petednz

Re: default dedupe rules
August 06, 2010, 03:22:32 pm
Because you wrote 'fussy' not 'fuzzy'  8)

lcdweb

Re: default dedupe rules
August 06, 2010, 04:05:28 pm
I love that you wrote fussy not fuzzy. Often my fuzzy rules are fussy.

Fuzzy default is used internally (dedupe check when a contact is saved) which can be fuzzier because it's "safe" -- the user is prompted before any action is taken. Strict default is used externally (signup forms) where the process is automated and there's risk of overwriting unintentionally. Which is why, in my mind, the rules should err on the side of being too strict. I'd rather have dupes that I can clean up later than have data unintentionally overwritten.

Yes, for an event that does not have a profile, any rule with email + something would generate a new record. But I suspect that's uncommon -- most people will have at least a few fields added to the event registration. And part of my concern is that Civi ships with too loose a default strict rule, and people lose data unexpectedly.

Obviously there's no perfect solution here.

But -- I do like the idea of having the option of selecting a rule for use with a specific profile. The 3.2 advanced settings improve the options and explanation for dedupe handling. Adding the ability to select a rule would give people the option of having a rule that meets their comfort level for different situations, and would also help just bring the concepts to their attention.
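
The idea of selecting a rule per profile or per event could look roughly like this. It is a hedged sketch in plain Python, not an existing CiviCRM setting: the rule names, the form ids (`members_meeting`, `open_day`), and the `pick_rule` helper are all hypothetical.

```python
from typing import Callable, Dict

Rule = Callable[[dict, dict], bool]


def email_only(a: dict, b: dict) -> bool:
    """Match on email address alone."""
    return a["email"].lower() == b["email"].lower()


def name_and_email(a: dict, b: dict) -> bool:
    """Match only when email, first name and last name all agree."""
    return email_only(a, b) and all(
        a[key].lower() == b[key].lower() for key in ("first_name", "last_name")
    )


RULES: Dict[str, Rule] = {
    "email_only": email_only,
    "name_and_email": name_and_email,
}

# Forms that pin their own rule; anything not listed uses the site default.
RULE_BY_FORM: Dict[str, str] = {"members_meeting": "name_and_email"}


def pick_rule(form_id: str, rule_by_form: Dict[str, str], site_default: str) -> Rule:
    """Use the rule pinned to this form, falling back to the site-wide default."""
    return RULES[rule_by_form.get(form_id, site_default)]


susan = {"first_name": "Susan", "last_name": "Green",
         "email": "bobnsusan@thegreens.org"}
bob = {"first_name": "Bob", "last_name": "Green",
       "email": "bobnsusan@thegreens.org"}

members_rule = pick_rule("members_meeting", RULE_BY_FORM, site_default="email_only")
open_day_rule = pick_rule("open_day", RULE_BY_FORM, site_default="email_only")

if __name__ == "__main__":
    print("members_meeting matches Susan/Bob:", members_rule(susan, bob))
    print("open_day matches Susan/Bob:", open_day_rule(susan, bob))
```

A side effect of pinning a rule to a form is that a later change to the site-wide default no longer alters that form's behaviour, which speaks to the import problem Xavier describes.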

Eileen

Re: default dedupe rules
August 06, 2010, 04:41:25 pm
I think that if we want people to have something that works pretty well out of the box, we should consider making the 'new individual' profile the default on events. People can change what is there, but it would bridge the gap between the points Brian is making and the situation Xavier is describing, and possibly make it easier for new users to get started.

xavier

Re: default dedupe rules
August 06, 2010, 08:23:19 pm
@Pete ;)

I think Brian is right: we don't explain well enough what these dedupe rules are, nor that the "strict" one is applied to both online registration and import, besides the obvious manual dedupe.

I'm not even sure where else it's applied, e.g. donation, membership registration, newsletter? Where else?

As for the terminology, "strict" doesn't sound like something that is applied by default everywhere. Got a better idea? Dave, what do you think?

FYI, I spent my first few months not knowing that the strict rule was used for online forms and import; I thought it was about the Levenshtein distance between two names.

X+

Eileen

Re: default dedupe rules
August 06, 2010, 08:46:24 pm
Strict is intended to apply anywhere the person entering the data is not in a position to inspect the two contacts and make the comparison themselves. Therefore it should err on the side of creating duplicates rather than losing distinct contacts.


This forum was archived on 2017-11-26.