CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017. Learn more.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion »
  • Scalability (Moderator: Donald Lobo) »
  • smart automatic merge dedupe?
Pages: 1 ... 3 4 [5]

Author Topic: smart automatic merge dedupe?  (Read 23789 times)

Erich Schulz

  • I post frequently
  • ***
  • Posts: 142
  • Karma: 5
    • When no-one understands what you are going on about its time to start a blog
  • CiviCRM version: 4.4
  • CMS version: Drupal 7
  • MySQL version: 5.something
  • PHP version: 5.3.3
Re: smart automatic merge dedupe?
August 09, 2011, 03:35:43 am
mmmm  :-\

btw Xavier, just so you know I can use the v3 api:

Code:
  /**
   * Automatically merge two contacts.
   *
   * @see autoMergeSQL()
   *
   * @param array $param
   *  - 'id' integer contact_id of record to keep (injection attack safe)
   *  - 'id_duplicate' integer contact_id of record to be assimilated (injection attack safe)
   * @return array result
   *  - is_error
   *  - sql
   *  - error_message
   *  - report
   */
  public static function merge($param) {
    // cast to int so the ids are safe to embed in the generated SQL:
    $keep = (int) $param['id'];
    $lose = (int) $param['id_duplicate'];
    // attempt to generate SQL:
    $result = AgcAutoMerge::autoMergeSQL($keep, $lose);
    if (!$result['is_error']) {
      // perform merge, running all statements inside one transaction:
      require_once 'CRM/Core/Transaction.php';
      $transaction = new CRM_Core_Transaction();
      foreach ($result['sql'] as $sql) {
        CRM_Core_DAO::executeQuery($sql, CRM_Core_DAO::$_nullArray, true, null, true);
      }
      $transaction->commit();
      // add notes to both contacts: //todo fix note created by
      $note = array(
        'entity_table' => 'civicrm_contact',
        'entity_id' => $keep,
        'note' => "This contact has been merged from the duplicate $lose",
        'subject' => 'Target of automerge',
        'version' => 3,
      );
      $result['report'] .= "Added note to $keep: " . json_encode(civicrm_api('note', 'create', $note));
      $note = array(
        'entity_table' => 'civicrm_contact',
        'entity_id' => $lose,
        'note' => "This contact was merged to $keep",
        'subject' => 'Duplicate. Deleted during automerge',
        'version' => 3,
      );
      $result['report'] .= "Added note to $lose: " . json_encode(civicrm_api('note', 'create', $note));
    }
    return $result;
  }

This wrapper does the merge and adds a brief note to both contacts. I've updated the GitHub repo with the latest version.

If anyone else is planning on putting this to use soon they may wish to PM me, as I may be motivated to package it up better (i.e. that Drupal module Lobo suggested).

Any hints on how to access the current user id? I'm assuming that if I pass that to the create note API as contact_id it gets stored in the 'created by' field?
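On the 'created by' question, here is a minimal sketch. It assumes that the note's `contact_id` field is what records the creator, and that the logged-in contact id can be read from the CiviCRM session (commonly via `CRM_Core_Session::singleton()->get('userID')`; worth checking against your version). The helper name `agc_build_merge_note` is hypothetical. It only builds the parameter array, so it runs without CiviCRM:

```php
<?php
/**
 * Build parameters for a v3 note 'create' call (hypothetical helper).
 *
 * Assumption: 'contact_id' on a note identifies the contact who created it.
 */
function agc_build_merge_note($entityId, $text, $subject, $creatorId = null) {
  $note = array(
    'entity_table' => 'civicrm_contact',
    'entity_id' => (int) $entityId,
    'note' => $text,
    'subject' => $subject,
    'version' => 3,
  );
  // Only set the creator when one is known; otherwise leave the default.
  if ($creatorId !== null) {
    $note['contact_id'] = (int) $creatorId;
  }
  return $note;
}

// Inside CiviCRM the creator would come from the session, e.g. (assumption):
//   $creatorId = CRM_Core_Session::singleton()->get('userID');
$creatorId = 42; // placeholder value so the sketch runs standalone
$note = agc_build_merge_note(
  101,
  'This contact has been merged from the duplicate 202',
  'Target of automerge',
  $creatorId
);
print_r($note);
```

The resulting array could then be passed to `civicrm_api('note', 'create', $note)` in place of the hand-built arrays in `merge()`.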
« Last Edit: August 09, 2011, 04:14:03 am by Erich Schulz »

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: smart automatic merge dedupe?
August 09, 2011, 04:46:36 am
Might be better to store it as an activity than a note (easier to search?)

Give me a few more minutes and I'll try to chat with you
X+
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

Erich Schulz

Re: smart automatic merge dedupe?
August 09, 2011, 05:19:26 am
for xavier - this is the current interface

Code:
/**
 * Automatically merge two contacts.
 *
 * Creates a note on both the main and the other contact.
 *
 * @param int $mainId contact_id of the record to keep
 * @param int $otherId contact_id of the duplicate to merge in
 * @return string html
 */
function _agc_dedupe_merge($mainId, $otherId) {
  require_once 'class.automerge.php';
  $params = array(
    'id' => $mainId,
    'id_duplicate' => $otherId,
  );
  $result = AgcAutoMerge::merge($params);
  $html = "<h3>Planned merger of $otherId into $mainId</h3>";
  if ($result['is_error']) {
    // merge refused or failed: show the error and whatever was reported
    $html .= "<h4>Error</h4><p class='error'>{$result['error_message']}</p><p>{$result['report']}</p>";
  } else {
    // merge succeeded: show the SQL that was executed
    $html .= "<h4>SQL:</h4>" . AgcAutoMergeDevWindow::arrayFormat($result['sql']);
  }
  return $html . '<p>Report:' . $result['report'] . '</p>';
}

Menu item for the .module file:
Code:
  $items['agc/dedupe/merge'] = array(
    'title' => 'Perform merge',
    'description' => 'Merge a pair of duplicate contacts',
    'file' => 'agc.automerge.inc',
    'page callback' => '_agc_dedupe_merge',
    'page arguments' => array(3, 4), // $mainId, $otherId
    'access arguments' => $adminRole, // must be an array of permission names
    'type' => MENU_CALLBACK,
  );

I'm currently working on another function that will take an SQL statement and loop through the results... it's a bit fast and loose but gets this thing going. :D
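A rough sketch of what such a loop might look like, not the actual function. It assumes the SQL statement yields rows with `id` (keep) and `id_duplicate` (lose) columns, matching the parameters of `merge()`. The name `agc_merge_pairs` and the stub callback are hypothetical stand-ins so the example runs without CiviCRM:

```php
<?php
/**
 * Run a merge callback over a list of duplicate pairs (hypothetical helper).
 *
 * @param array $pairs rows with 'id' (keep) and 'id_duplicate' (lose)
 * @param callable $merge called with one pair, returns an array with 'is_error'
 * @return array counts of merged and failed pairs
 */
function agc_merge_pairs(array $pairs, $merge) {
  $summary = array('merged' => 0, 'failed' => 0);
  foreach ($pairs as $pair) {
    $result = call_user_func($merge, $pair);
    if (empty($result['is_error'])) {
      $summary['merged']++;
    } else {
      $summary['failed']++;
    }
  }
  return $summary;
}

// Stand-in for AgcAutoMerge::merge(): refuses to merge a contact into itself.
$stubMerge = function ($pair) {
  return array('is_error' => (int) ($pair['id'] == $pair['id_duplicate']));
};

// In real use these rows would be fetched by running the SQL statement.
$pairs = array(
  array('id' => 1, 'id_duplicate' => 2),
  array('id' => 3, 'id_duplicate' => 3),
);
$summary = agc_merge_pairs($pairs, $stubMerge);
print_r($summary);
```

Passing the merge step in as a callback keeps the loop testable on its own; in the module it would presumably be `array('AgcAutoMerge', 'merge')`.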

Erich Schulz

Re: smart automatic merge dedupe?
August 17, 2011, 07:49:19 pm
just a little update that we've been running this code

and have now merged 1300 duplicates automatically

just a few glitches  :-X

will be motivated to expand if others need this process

Erich Schulz

Re: smart automatic merge dedupe?
August 29, 2011, 03:13:28 am
have now added in functionality to merge custom multivalue fields (in beta test)

xavier

Re: smart automatic merge dedupe?
August 29, 2011, 03:44:10 am
That's great! Thx for all your work.

X+

daven

  • I’m new here
  • *
  • Posts: 5
  • Karma: 0
  • CiviCRM version: 3.4
  • CMS version: Drupal 6
  • MySQL version: 5.1
  • PHP version: 5.3 / 5.2
Re: smart automatic merge dedupe?
October 20, 2011, 04:46:44 pm
Hi Erich

Thank you for sharing your code. I will be using it for a project I'm on at CivicActions with around 80k contacts and guessing 10% duplicates right now. Auto-merging will certainly enable us to actually dedupe that many. I'll let you know how it goes.

Erich Schulz

Re: smart automatic merge dedupe?
November 18, 2011, 07:13:19 pm
Thanks Daven - good to know someone is giving it a go... how'd you go?

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: smart automatic merge dedupe?
December 15, 2011, 07:21:23 am

This is now a Make It Happen for 4.2: http://civicrm.org/mih#massdedupe

Hopefully we'll meet the target and ship it with 4.2. We'll read this thread again when spec'ing the issue. Blog post and issue coming soon

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Erich Schulz

Re: smart automatic merge dedupe?
December 23, 2011, 11:56:49 pm
that's nice to hear, Lobo - hopefully some of my stuff will be usable - I'm afraid I'm now only a few months out from an election so will be a bit distracted, but happy to reply to messages etc.

poor data quality, poor data entry of nicknames and the desire not to overwrite customised greetings are currently major roadblocks for us...

provision of a customisable nickname table may be a tremendous boost
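To illustrate the nickname-table idea, a minimal sketch, assuming a simple lookup from nickname to canonical first name. The table contents and the function names `agc_canonical_first_name` and `agc_same_first_name` are made up for the example and are not part of CiviCRM or of the automerge code:

```php
<?php
// Hypothetical nickname table: nickname => canonical first name.
// A site would customise this for its own data.
$nicknames = array(
  'bill' => 'william',
  'will' => 'william',
  'liz' => 'elizabeth',
  'beth' => 'elizabeth',
);

/**
 * Map a first name to its canonical form, ignoring case and padding.
 * Names not in the table are returned lower-cased and trimmed.
 */
function agc_canonical_first_name($name, array $nicknames) {
  $key = strtolower(trim($name));
  return isset($nicknames[$key]) ? $nicknames[$key] : $key;
}

/**
 * Decide whether two first names refer to the same canonical name.
 */
function agc_same_first_name($a, $b, array $nicknames) {
  return agc_canonical_first_name($a, $nicknames)
    === agc_canonical_first_name($b, $nicknames);
}

var_dump(agc_same_first_name('Bill', 'William', $nicknames));
var_dump(agc_same_first_name('Liz', 'William', $nicknames));
```

A dedupe rule could compare canonical names instead of raw first names, so that "Bill Smith" and "William Smith" are treated as candidates without touching the stored greeting.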

Donald Lobo

Re: smart automatic merge dedupe?
December 24, 2011, 03:16:24 am

Our current focus is on making mass merge quick and efficient.

I suspect data cleanup / improving data quality and data normalization will come in later releases

lobo

Erich Schulz

Re: smart automatic merge dedupe?
December 24, 2011, 04:36:35 am
understood, and a laudable priority!

I decided that rather than making the automerge more complicated to deal with inconsistent data, I was better off just cleaning the data (and then setting crons to keep it clean)... this meant two simple and clean processes

be interesting to see how you decide to go


This forum was archived on 2017-11-26.