CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017. Learn more.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Support »
  • Using CiviCRM »
  • Using Core CiviCRM Functions (Moderator: Yashodha Chaku) »
  • Help needed avoiding duplicates.
Pages: [1]

Author Topic: Help needed avoiding duplicates.  (Read 2070 times)

AJA

  • Guest
Help needed avoiding duplicates.
December 29, 2008, 05:09:33 pm
Hi there,

Does anyone have any definite strategies to avoid the introduction of duplicates into the CiviCRM contact database? For instance, anything that's used in your organization that might be helpful to us?

At the company I work for, we have had (and are still having) some issues with duplicate matching on import. Is anyone aware of the rules used for duplicate detection on contact import versus those used on the "Find and Merge" screen, or of any documentation that covers them? I have run various tests on import, and it appears to ignore the rules I set in the duplicate finder. Also, are there API hooks for accessing the duplicate matching system? We are running v2.1 and haven't updated to the latest 2.1.2 (which I hear has REST interface improvements, and maybe SOAP?) because some custom code would need reimplementation.

This could be useful for us to know about, and, I'm sure, many others in the future. Thanks!

Sarah

  • I post occasionally
  • **
  • Posts: 72
  • Karma: 3
    • American Friends Service Committee - Southeastern New England
Re: Help needed avoiding duplicates.
January 02, 2009, 01:57:52 am
I'm not sure about the technical side of what you are asking, but I think the duplicate matching rule you use would depend on how you identify the users.
I use first and last name as an indicator for individuals and Organization name and the first line of the address for Organizations, as a general rule.
I always set duplicate matching to skip duplicate records so that I can make sure I am not overwriting records or matching them incorrectly. This works if you only get one or two records at a time that you can check by hand.

You can set the default rule for importing and then find and merge based on a custom rule you create, and you have the option of making that default or not. All duplicate rules are listed under find and merge contacts for you to choose from.

What is your default duplicate rule set to? Are you matching by name and address, or by email?



AJA

  • Guest
Re: Help needed avoiding duplicates.
January 02, 2009, 02:38:20 pm
Hi there Sarah, I appreciate your response.

At the present time we have a hundred or so individuals to import into the system every month or so.

The general strategy, at least in our Find and Merge Duplicate Contacts work, is to run several passes, each with a different rule, and merge the contacts that are obviously duplicates. This, unfortunately, takes a great deal of time, particularly because we use a great many groups and tags to organize our contacts, and these do not display in a very informative manner in the duplicate matching screens.

I am unsure what you mean, though, when you talk about default import rule sets. As far as I am aware (or have been able to discover), there is no way to set up a default contact matching rule for import. As for the "skip, fill, update" options, we generally use update on import, as this is, at least in theory, what we want.

But, if I understand you correctly, the basic idea is to import and skip duplicates (though I don't know what rule the import would use to determine which ones to skip), and then run a duplicate merge based on the rules we design. Does this seem to work best for you? The reason this would be unappealing to us is that the imported data contains information that needs to be filled into fields of whatever contacts are already present in the system. Otherwise we would have to change this data manually, which is an equally unappealing scenario.
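To make the difference between the three import options concrete, here is a minimal Python sketch of how I understand them. It is not CiviCRM code: the `apply_import` function, the field names, and the choice to ignore blank imported values are all assumptions made for illustration.

```python
def apply_import(existing, incoming, mode):
    """Return the contact that results from importing `incoming` onto a
    matched `existing` contact. Hypothetical helper, not a CiviCRM API."""
    if mode not in ("skip", "update", "fill"):
        raise ValueError(f"unknown mode: {mode}")
    merged = dict(existing)
    if mode == "skip":
        # The matched contact is left untouched.
        return merged
    for field, value in incoming.items():
        if value in (None, ""):
            # Assumption: a blank imported value never overwrites anything.
            continue
        # "update" overwrites; "fill" only writes into empty fields.
        if mode == "update" or not merged.get(field):
            merged[field] = value
    return merged


existing = {"first_name": "Pat", "phone": "", "city": "Boston"}
incoming = {"first_name": "Patricia", "phone": "555-0100", "city": ""}
```

With this data, "update" would replace the first name and add the phone number, while "fill" would add only the phone number and leave the first name alone.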

Thanks again!

Sarah

  • I post occasionally
  • **
  • Posts: 72
  • Karma: 3
    • American Friends Service Committee - Southeastern New England
Re: Help needed avoiding duplicates.
January 02, 2009, 10:53:33 pm
The way that I understand it, when in the Find and Merge Duplicate contacts screen, you can use any Strict Type duplicate matching as the default for importing by clicking Make Default. The fuzzy matching is only to be used from the find and merge duplicate contacts screens. You can use one default for each type of record.

The thing to watch when using First Name, Last Name and Email is the case where the first and last names match and there is no email address: if the weight for each field is set properly, the records will still match. (First and Last should each be set to 5 and the matching threshold to 10, but Email should be set to 10, which means that an identical email is identified as a match on its own.)
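The weight and threshold arithmetic can be sketched in a few lines of Python. This is only an illustration of the scoring idea described above, not CiviCRM's actual implementation; the names `RULE_WEIGHTS`, `match_score` and `is_duplicate` are made up, and treating values as equal after trimming and lowercasing is an assumption.

```python
# Hypothetical weights mirroring the rule described above.
RULE_WEIGHTS = {"first_name": 5, "last_name": 5, "email": 10}
THRESHOLD = 10


def match_score(a, b, weights=RULE_WEIGHTS):
    """Sum the weights of fields that are non-empty and equal in both records."""
    score = 0
    for field, weight in weights.items():
        value_a = (a.get(field) or "").strip().lower()
        value_b = (b.get(field) or "").strip().lower()
        if value_a and value_a == value_b:
            score += weight
    return score


def is_duplicate(a, b, weights=RULE_WEIGHTS, threshold=THRESHOLD):
    """Two records match when their score reaches the threshold."""
    return match_score(a, b, weights) >= threshold


no_email = {"first_name": "Pat", "last_name": "Lee", "email": ""}
with_email = {"first_name": "pat", "last_name": "Lee", "email": "pat@example.org"}
other_name = {"first_name": "Patricia", "last_name": "Li", "email": "pat@example.org"}
```

Here `no_email` and `with_email` match on names alone (5 + 5 reaches the threshold), and `with_email` and `other_name` match on email alone (10), while a shared first name by itself scores only 5.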

So that should be the way you are able to configure your duplicate matching. Define a rule based on the criteria you like (First, Last and Email, or just First and Last) and set it as the default to configure matching. If you want to update the records, they should then match and update correctly with the update choice in the import.


I only skip duplicate records and eyeball them before I import because we use a database of national contacts who are likely to have a first and last name match but live in separate parts of the country. I also imported data containing duplicates from different databases in different locations, so the data available for one record might be slightly different from the data available for another, and not all records have email addresses. So I import with matching on first name, last name and address (most of our records have addresses), and then use the duplicate matching screens to check just first and last name and compare the two addresses to see whether the records may be related. It's possible you have more emails than addresses, which would make it easier to match on email address. Unlike an address, though, two records in the database probably shouldn't share the same email address (because that would mean the person receives each email twice), so I set the weight of email for duplicate matching to the full threshold. I do that only for the duplicate matching screens and not for imports, because if you are updating during imports and matching on email, you wouldn't want to overwrite information in the database without first looking to see which record has that address.
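The two-pass idea above (a strict rule for import, then a looser rule for manual review) can be sketched roughly as follows. This is a plain Python illustration under assumed field names (`id`, `first_name`, `last_name`, `street_address`); it does not call or reproduce anything in CiviCRM.

```python
from itertools import combinations

# Assumed field sets: strict rule for import, looser rule for review.
IMPORT_FIELDS = ("first_name", "last_name", "street_address")
REVIEW_FIELDS = ("first_name", "last_name")


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join((value or "").lower().split())


def key(record, fields):
    return tuple(normalize(record.get(f)) for f in fields)


def review_candidates(contacts):
    """Pairs that share a full name but would not match under the strict
    import rule. These are the ones to compare by eye."""
    pairs = []
    for a, b in combinations(contacts, 2):
        name_a = key(a, REVIEW_FIELDS)
        if not all(name_a):
            continue  # skip records with a missing first or last name
        same_name = name_a == key(b, REVIEW_FIELDS)
        same_strict = key(a, IMPORT_FIELDS) == key(b, IMPORT_FIELDS)
        if same_name and not same_strict:
            pairs.append((a["id"], b["id"]))
    return pairs


contacts = [
    {"id": 1, "first_name": "Pat", "last_name": "Lee", "street_address": "12 Elm St"},
    {"id": 2, "first_name": "pat", "last_name": "lee", "street_address": "99 Oak Ave"},
    {"id": 3, "first_name": "Pat", "last_name": "Lee", "street_address": "12  elm st"},
    {"id": 4, "first_name": "Sam", "last_name": "Roe", "street_address": "12 Elm St"},
]
```

Contacts 1 and 3 are identical under the strict rule, so they would already have matched on import; the review pass flags only the same-name pairs with different addresses.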



The best way for you to do it really depends on your process.


But it sounds like you don't have the default duplicate rule configured to work properly for you. Creating a new rule and using that as the default may help.

I hope this wasn't too convoluted for you to understand.  ;D


This forum was archived on 2017-11-26.