CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017.




Author Topic: Remove Junk Data (would writing a script be the best way?)  (Read 1887 times)

tim g.

  • I post occasionally
  • **
  • Posts: 57
  • Karma: 4
  • Seek God first
    • Gott Milk
Remove Junk Data (would writing a script be the best way?)
March 17, 2011, 02:07:40 am
Any suggestions on how I could remove some junk data from a CiviCRM installation? The system was opened up (now corrected) to allow users to sign themselves up. While it was open it was hit by some sort of exploit, and several 'user' accounts were created in Drupal; and of course for every Drupal user created, a corresponding CiviCRM record is created.

It's not too bad to go through and manually select the Drupal accounts that look bogus. The problem comes in trying to manually select the corresponding CiviCRM record for every bogus Drupal user. We're talking about four to five hundred records, I think.

I have someone who can write PHP scripts for me and knows MySQL. So if the easiest solution involves any of this, feel free to tell me what needs to be done and I'll pass it along for implementation. Thanks.
* If you like any of my answers then click the little applaud link next to my picture. It kinda tickles.
“Why is it when we talk to God, we're said to be praying—but when God talks to us, we're schizophrenic?” - Lily Tomlin

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Remove Junk Data (would writing a script be the best way?)
March 17, 2011, 03:18:19 am
Not trivial, as for now we don't record the source of the contact as "user self-registration" (or anything else easy to distinguish).

Delete all the fake user accounts first.
Then have your script delete all the contacts that:
- were created by themselves,
- haven't been modified,
- don't have any activity,
- and don't have a matching user account.
That should be good.
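A minimal sketch of those four checks, written as a dry run that only lists candidates and deletes nothing. The tables `contact`, `activity` and `uf_match`, and the columns `created_by` and `modified_count`, are hypothetical stand-ins invented for illustration; the real CiviCRM schema stores this information differently, so the query would need to be adapted before use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contact (id INTEGER PRIMARY KEY, created_by INTEGER, modified_count INTEGER);
        CREATE TABLE activity (id INTEGER PRIMARY KEY, contact_id INTEGER);
        CREATE TABLE uf_match (uf_id INTEGER, contact_id INTEGER);

        INSERT INTO contact VALUES
            (1, 1, 3),  -- admin, edited several times
            (2, 2, 0),  -- self-created, untouched: junk candidate
            (3, 3, 0),  -- self-created but still has a user account
            (4, 4, 0),  -- self-created but has an activity
            (5, 1, 0),  -- created by the admin
            (6, 6, 0);  -- self-created, untouched: junk candidate
        INSERT INTO activity (contact_id) VALUES (4);
        INSERT INTO uf_match VALUES (10, 1), (30, 3);
        """
    )
    return conn


def find_junk_contacts(conn):
    """Return ids of contacts that satisfy all four criteria."""
    rows = conn.execute(
        """
        SELECT c.id
        FROM contact AS c
        WHERE c.created_by = c.id          -- created by themselves
          AND c.modified_count = 0         -- never modified
          AND NOT EXISTS (SELECT 1 FROM activity AS a WHERE a.contact_id = c.id)
          AND NOT EXISTS (SELECT 1 FROM uf_match AS m WHERE m.contact_id = c.id)
        ORDER BY c.id
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(find_junk_contacts(build_demo_db()))
```

Reviewing the returned ids by hand before turning the SELECT into a DELETE is probably the safer order of operations.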

Please share the script when you have written it; I'm sure you aren't the only one having this problem ;)

-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

tim g.

Re: Remove Junk Data (would writing a script be the best way?)
March 17, 2011, 11:33:09 pm
<he he> And one of the guys I'm considering for this threw out the idea of putting the script in a cron job once it's finished. ... And 'that' just tickled me.

Having a script like this in a cron job would allow me to open user self-registration back up (as long as the methods of the scripting robots [is that the right term?] didn't change). It would just go through and take the erroneous records out every ten or fifteen minutes. .... TAKE THAT YOU ROBOT SCRIPTING SCUM !!! <he he>

:-)


« Last Edit: March 23, 2011, 11:15:18 pm by tim g. »

tim g.

Re: Remove Junk Data (would writing a script be the best way?)
March 23, 2011, 11:17:19 pm
I'd be happy to share the script with the community after it's finished. Just from the little that I know about scripting and MySQL, I have a few questions and/or points to verify that I think will help speed up the work.

- Both the Drupal users and the CiviCRM records are kept in MySQL tables, correct? Is there a unique identifier that a record shares in both the Drupal and CiviCRM tables? If so, would it be better to write a script that identifies erroneous entries from either the Drupal or the CiviCRM direction and then deletes everything in both sets of tables that shares that unique identifier?
          * In my case I think I could safely identify every Drupal entry that matches two criteria:
             1) Doesn't have a Drupal "Role" assigned.
             2) Has a first and last name that are 'exactly' identical.

- My scripter/coder isn't going to break anything if I have him do this, is he? ... :-)
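For what it's worth, the two criteria above (no role, identical first and last name) could be checked with a single query along these lines. This is only a sketch: the tables `users`, `users_roles`, `uf_match` and `contact` are simplified, hypothetical stand-ins, and it assumes the first and last names live on the contact record and that a link table joins the two sides. It selects candidates and deletes nothing.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
        CREATE TABLE uf_match (uf_id INTEGER, contact_id INTEGER);
        CREATE TABLE contact (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);

        INSERT INTO users VALUES
            (10, 'admin'), (11, 'xyzq'), (12, 'ann'), (13, 'bob'), (14, 'qwer');
        INSERT INTO users_roles VALUES (10, 3), (13, 4);
        INSERT INTO uf_match VALUES
            (10, 100), (11, 101), (12, 102), (13, 103), (14, 104);
        INSERT INTO contact VALUES
            (100, 'Site', 'Admin'),
            (101, 'xyzq', 'xyzq'),  -- no role, identical names: suspect
            (102, 'Ann', 'Lee'),    -- no role, but a plausible name
            (103, 'Bob', 'Bob'),    -- identical names, but has a role
            (104, 'qwer', 'qwer');  -- no role, identical names: suspect
        """
    )
    return conn


def find_suspect_users(conn):
    """Return (uid, contact_id) pairs matching both criteria."""
    return conn.execute(
        """
        SELECT u.uid, m.contact_id
        FROM users AS u
        JOIN uf_match AS m ON m.uf_id = u.uid
        JOIN contact AS c ON c.id = m.contact_id
        WHERE NOT EXISTS (SELECT 1 FROM users_roles AS r WHERE r.uid = u.uid)
          AND c.first_name <> ''            -- ignore blank names
          AND c.first_name = c.last_name    -- first and last name identical
        ORDER BY u.uid
        """
    ).fetchall()


if __name__ == "__main__":
    for uid, contact_id in find_suspect_users(build_demo_db()):
        print(uid, contact_id)
```

One caveat: MySQL's default collations usually compare text case-insensitively, so 'exactly' identical may need a binary comparison there. Taking a database backup before running any delete based on this list would answer the "break anything" worry.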



And I was also curious: when you register via Drupal, a CiviCRM record is automatically created. So would the reverse be true (or possible), that if you delete a Drupal user, the corresponding CiviCRM record is automatically removed?



Glad to revert back to your previous advice:
Quote from: xavier on March 17, 2011, 03:18:19 am
Have your script delete all the contacts
- that are created by themselves, that haven't been modified, that haven't any activity and that don't have matching user account.
if my thoughts (again I'm not a coder) aren't helpful towards the most beneficial end.

And actually, I'm considering whether merging your idea with the unique identifier thought might be a good idea as well. I don't know enough to say which one would be better. But what if there is a script that:
1) goes to the CiviCRM MySQL table first, collects all the unique identifiers of all the contacts that are:
        - created by themselves
        - that haven't been modified
        - that haven't any activity
        - and that don't have matching user account.
2) deletes all of the records from both CiviCRM and Drupal that match that unique identifier.

Thanks for the help !


Eileen

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4195
  • Karma: 218
    • Fuzion
Re: Remove Junk Data (would writing a script be the best way?)
March 24, 2011, 01:48:14 am
There may well be a link between the username and the civicrm contact.

The UFMatch table (civicrm_uf_match in the database) holds the links between Drupal accounts and contact IDs, and there is *probably* a record for each user account, especially if they have logged in.
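As a rough sketch of how a script might use that link: given the Drupal user ids already judged bogus, look up the contact id for each one and note where no link exists (hence the *probably*). The `uf_match` table below is a cut-down stand-in for UFMatch (which I believe is civicrm_uf_match in the database) holding only a `uf_id` and a `contact_id` column; the function name and the sample ids are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the UFMatch link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE uf_match (uf_id INTEGER, contact_id INTEGER);
        INSERT INTO uf_match VALUES (11, 101), (14, 104);
        """
    )
    return conn


def contacts_for_users(conn, uids):
    """Map each Drupal uid to its contact id, or None when no link exists."""
    mapping = {}
    for uid in uids:
        row = conn.execute(
            "SELECT contact_id FROM uf_match WHERE uf_id = ?", (uid,)
        ).fetchone()
        mapping[uid] = row[0] if row else None
    return mapping


if __name__ == "__main__":
    # uid 12 has no link record, so it maps to None.
    print(contacts_for_users(build_demo_db(), [11, 12, 14]))
```

Users that map to None would need to be matched some other way (by email address, for example) or handled by hand.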

But before you get too heavy-handed, be sure that there is actually harm done by having these spam accounts in your database. I'm sure they are annoying, but if you degrade your users' experience to prevent them from being created, then the 'cure' may be worse than the symptoms.
Make today the day you step up to support CiviCRM and all the amazing organisations that are using it to improve our world - http://civicrm.org/contribute


This forum was archived on 2017-11-26.