CiviCRM Community Forums (archive)

This forum was archived on 25 November 2017. Learn more.

Author Topic: Long loading time of Finding Duplicate Contacts  (Read 4694 times)

aeszq

  • I post occasionally
  • **
  • Posts: 53
  • Karma: 0
Long loading time of Finding Duplicate Contacts
March 24, 2010, 11:01:36 pm
Hi guys,

I have about 37,000 contacts in my CiviCRM database. The page never finishes loading when I run 'Find and Merge Duplicate Contacts'; in other words, the MySQL query that finds duplicate contacts takes an extremely long time to run (I let it run for 10 hours and it still had not finished). The rule I use checks first name, last name and email address. I wonder whether this is normal, given that 37k contacts is not a small number.


Cheers,
George.
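
For readers wondering why a three-field rule should not need hours on 37k rows, here is a minimal sketch of the general idea. It is not CiviCRM's dedupe code. It assumes the rule means an exact match on all three fields after trimming whitespace and ignoring case, and the field names (`first_name`, `last_name`, `email`) and the function name are hypothetical. Grouping by a key touches each contact once, whereas comparing every contact with every other one grows quadratically.

```python
from collections import defaultdict

def find_duplicate_groups(contacts):
    """Group contact ids that share the same normalised (first, last, email) key."""
    groups = defaultdict(list)
    for contact in contacts:
        # Assumption: "match" means equal after trimming and lower-casing.
        key = tuple(
            (contact.get(field) or "").strip().lower()
            for field in ("first_name", "last_name", "email")
        )
        # Skip contacts with a missing field rather than matching them on blanks.
        if all(key):
            groups[key].append(contact["id"])
    # Only keys shared by two or more contacts are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]

contacts = [
    {"id": 1, "first_name": "George", "last_name": "Smith", "email": "g@example.org"},
    {"id": 2, "first_name": "george", "last_name": "Smith ", "email": "G@example.org"},
    {"id": 3, "first_name": "Roly", "last_name": "Jones", "email": "r@example.org"},
    {"id": 4, "first_name": "Roly", "last_name": "Jones", "email": None},
]
print(find_duplicate_groups(contacts))  # [[1, 2]]
```

Contact 4 is left out because it has no email, so under this assumed rule it cannot be confirmed as a duplicate of contact 3.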

roland

  • I’m new here
  • *
  • Posts: 20
  • Karma: 0
Re: Long loading time of Finding Duplicate Contacts
March 28, 2010, 06:50:05 am
Further to George's message, we are actually using the GUI under "administer > manage > Find and Merge Duplicate Contacts" to perform this task. We didn't think this process would take that long, so perhaps there is some additional primary key allocation required within the DB that we are unaware of.

The contact records will increase from around 37,000 to over 70,000 in the coming weeks and we believe there will be a number of duplicates. We would prefer to use the functionality within Civi (and allow the administrator to run this process) but at the moment we cannot. Our alternative is to write our own code to determine the duplicates and merge them.

Any advice on this issue would be greatly appreciated.

Sincerely,
Roly.

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Long loading time of Finding Duplicate Contacts
March 29, 2010, 07:05:42 am

hey roly:

check: http://forum.civicrm.org/index.php/topic,12563.msg55510.html#msg55510

dave and rob give a few ideas on what we can potentially do. if you can take the next step and help optimize the queries / code and contribute it back that would be great :) You can also potentially sponsor the core team to tackle that and optimize it (though it would be done for 3.2 and might need to be backported for your use case)

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

roland

  • I’m new here
  • *
  • Posts: 20
  • Karma: 0
Re: Long loading time of Finding Duplicate Contacts
April 12, 2010, 06:56:49 am
Hi Lobo,
Sounds good. I'll get George to take a look and get back to you. I'll see if I can get some "sponsorship" funds as well.

Cheers,
Roly.

jbertolacci

  • I post occasionally
  • **
  • Posts: 54
  • Karma: 1
Re: Long loading time of Finding Duplicate Contacts
April 21, 2010, 01:48:27 pm
I have been talking to dlobo about sponsoring work. His estimate is that a "base set of changes / sql fixes would be between 40-50 hours to optimize all queries and take less resources", which comes to about $5k of work.

I can sponsor half of that and am looking for a co-sponsor. Would anyone like to be a co-sponsor to improve the merge duplicate code and sql in civi?

roland

  • I’m new here
  • *
  • Posts: 20
  • Karma: 0
Re: Long loading time of Finding Duplicate Contacts
April 26, 2010, 09:13:53 pm
Hi,
I think I can get a sponsorship of around $AU2,500, but I am trying for more. When do you think the project can kick off?

Lobo, I think I can get the funds by mid-May. Is that a reasonable timeframe?

We're still trying to sort out the data dictionaries from the external system.

Cheers,
Roly.

Donald Lobo

Re: Long loading time of Finding Duplicate Contacts
April 26, 2010, 10:20:44 pm

we are starting on the 3.2 QA process, so putting an optimized dedupe in the 3.3 release cycle is probably a good idea. We do hope that 3.2 hits the beta cycle by the end of may (at which time we start working on 3.3)

lobo

Donald Lobo

Re: Long loading time of Finding Duplicate Contacts
April 28, 2010, 09:40:21 pm

hey roly:

u should also check:

http://forum.civicrm.org/index.php/topic,12563.msg57472.html#msg57472

i think if u combine forces with jason, we can get some major changes into 3.3 (or maybe even 3.2.x)

lobo

jase700

  • I post occasionally
  • **
  • Posts: 57
  • Karma: 3
Re: Long loading time of Finding Duplicate Contacts
November 12, 2010, 07:58:45 pm
Here is one strategy that can help a bit.

Create smartgroups based on email domain ending (.com, .net, .org, .edu, etc.). Then, when you go through the merge workflow, just work through each of these groups in turn. Each one gives you a much smaller subset to work with. I have 135,000 contacts and it's near impossible to do what I need to do otherwise.
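
The partitioning idea can be sketched outside CiviCRM like this. It is only an illustration of splitting contacts by the ending of their email address, not a way to create smartgroups; the `email` and `id` field names and the function name are hypothetical.

```python
from collections import defaultdict

def partition_by_tld(contacts):
    """Bucket contact ids by the last label of their email domain (com, org, ...)."""
    buckets = defaultdict(list)
    for contact in contacts:
        email = (contact.get("email") or "").strip().lower()
        _local, at, domain = email.rpartition("@")
        # Only treat the text after the last dot as a TLD when the address
        # actually has an "@" and a dotted domain part.
        tld = domain.rpartition(".")[2] if at and "." in domain else ""
        buckets[tld or "(no email)"].append(contact["id"])
    return dict(buckets)

contacts = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "B@Example.ORG"},
    {"id": 3, "email": "c@another.com"},
    {"id": 4, "email": None},
]
print(partition_by_tld(contacts))
# {'com': [1, 3], 'org': [2], '(no email)': [4]}
```

If the dedupe rule requires the email address to match, two contacts in different buckets could not match anyway, so running the rule per bucket should lose nothing. With a rule that does not include email, duplicates split across buckets would be missed.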

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Long loading time of Finding Duplicate Contacts
November 12, 2010, 09:17:20 pm
Could you try installing a 3.3 version (it's a beta, so do that on a copy of prod)? The duplicate search is *much* faster now.

X+
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result


This forum was archived on 2017-11-26.