CiviCRM Community Forums (archive)

*

News:

Have a question about CiviCRM?
Get it answered quickly at the new
CiviCRM Stack Exchange Q+A site

This forum was archived on 25 November 2017. Learn more.
How to get involved.
What to do if you think you've found a bug.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Support »
  • Using CiviCRM »
  • Using CiviMail (Moderator: Piotr Szotkowski) »
  • Cleaning up the CiviMail event queue
Pages: [1] 2 3

Author Topic: Cleaning up the CiviMail event queue  (Read 6870 times)

torenware

  • I post frequently
  • ***
  • Posts: 153
  • Karma: 4
Cleaning up the CiviMail event queue
January 04, 2009, 01:27:40 pm
On a site that makes very heavy use of CiviMail, I've discovered that the the table civicrm_mailing_event_queue can become very, very large, so much so that it is affecting our ability to use backups.  Currently, the table has *millions* of records.  This means that it's hard for a developer to load the DB on a lap top (and even on a fast server, a load is now taking close to an hour and a half).

I suspect that many of the records are for mailings done months ago, and that we no longer really need them around.  Does anyone have advice as to how best to clean this up.  I haven't worked out the dependencies of the different mailing tables, although I'd expect that events would be pretty far "downstream".

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Cleaning up the CiviMail event queue
January 04, 2009, 05:46:23 pm

ahh, i wondered when we'd hit this issue :)

your best bet would be to delete the old mailings (which should delete all those records). Considering you have lots of records, you might want to create a more efficient batch delete script for multiple mailings (since all the event tables have a job id, you can get a list of job ids that should be deleted and then apply them in batches).

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

torenware

  • I post frequently
  • ***
  • Posts: 153
  • Karma: 4
Re: Cleaning up the CiviMail event queue
January 04, 2009, 08:22:30 pm
Any reason I can't just "work backwards" from the event types (bounce, etc.) once I can identify the event_queue_id?

I don't think that I could get through that many records via any of the PHP calls in reasonable (or even "reasonably finite") time, and I suspect I can go from job_id ==> event_queue_id ==> records.

This "looks" like it would work, at least judging by the way the keys are set up.

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Cleaning up the CiviMail event queue
January 05, 2009, 06:06:16 am

there will be millions of event_queue_id's and just thousands of job id's, so a more efficient approach might be to do a set of mass deletions using the job_id, which you can retrieve from the civicrm_mailing_job table

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Denver Dave

  • Ask me questions
  • ****
  • Posts: 471
  • Karma: 9
Re: Cleaning up the CiviMail event queue
January 09, 2009, 06:25:04 pm
I posted a similar issue earlier about how time consuming it is to even delete individual mailings.  We have many people doing mailings and the queue of old mailings keeps building and someone (me), has to go in and delete one at a time.   Currently we have 217 mailings in the queue, which is not as many as it sounds like, because each real mailing typically has 2 or 3 test runs before the final run.
http://forum.civicrm.org/index.php/topic,5060.0.html

torenware

  • I post frequently
  • ***
  • Posts: 153
  • Karma: 4
Re: Cleaning up the CiviMail event queue
January 09, 2009, 11:24:25 pm
It does appear to be a problem.

I did some tests to simply deal with the activity history records (we're using 1.9) that CiviMail creates.  Again, we are talking about millions of records.

Since AH is a pretty flat table, I did a pretty simple query to do the deletion.  It took about 3 hours to run on my laptop.

I expect that unwinding the event queue table would be much slower.  The number of records is comparable, but the indexes are much more elaborate.  This is assuming that you do this as pure SQL, and do not use the PHP interfaces, which I am sure would be orders of magnitude slower.

torenware

  • I post frequently
  • ***
  • Posts: 153
  • Karma: 4
Re: Cleaning up the CiviMail event queue
January 29, 2009, 07:22:58 pm
I've done a fair bit of research on this, and have a reasonable (and reasonably fast) solution.

The key thing to do, assuming that you may need the information in future, is to create archive tables.  MySQL has an engine type Archive, which are tables that you can use for long term data storage.  The tables are very easy to write to, but do not have indexes, so they are not useful for anything you need to get at fast.  But for archival purposes, they work well.

I'll give the example of 1.9 activity history first, since the table has very simple structure.  The details will differ for different tables and for different versions of CiviCRM, but once you understand the trick, it's easy to extend this.

The steps are:

First, reate a new database to hold your archive tables.  This will let you backup more efficiently.

Then, create a framework for the archive table from the existing table:
Code: [Select]
create new_db.old_activity_history like old_db.civicrm_activity_history;
Since the Archive engine does not know about indexes, you need to delete all the indexes from the new table format.  For 1.9 activity history, this looks like this:
Code: [Select]
use new_db;
-- the primary key is also auto_increment, so you need to remove
-- this before you can proceed
alter table  old_activity_history modify id int(10) unsigned;

-- remove the primary key itself
alter table old_activity_history drop PRIMARY KEY;

-- remove the regular indexes
alter table old_activity_history drop index index_entity;
alter table old_activity_history drop index index_activity;

-- and the foreign key indexes as well
alter table old_activity_history drop index FK_civicrm_activity_history_relationship_id;
alter table old_activity_history drop index FK_civicrm_activity_history_group_id;

Now you have an InnoDB table with no indexes.  You can convert it to an Archive table like this:
Code: [Select]
alter table old_activity_history engine = archive;
Now fill the new table with the old data.  Get some coffee, this will take a while:
Code: [Select]
insert into old_activity_history select * from old_db.civicrm_activity_history;
Now a minor trick.  InnoDB will forget what's the next ID for the old table if you delete everything.  And you want to delete everything -- truncation is at least hundreds of times faster than any kind of DELETE statement.  So save away the maximum ID in the old table.  I did it this way:
Code: [Select]
select max(id) from old_db.civicrm_activity_history into @ah_max_id;
Now you can finally purge the old data:
Code: [Select]
truncate old_db.civicrm_activity_history;
Finally, restore the max ID record so that auto_increment will work right going forward:
Code: [Select]
insert into old_db.civicrm_activity_history select * from old_activity_history where id = @ah_max_id;

As long and as nasty is this is, it's simpler than doing CiviMail's event_queue, since there are dependencies with a variety of tables.  But the basic procedure works the same way.

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Cleaning up the CiviMail event queue
January 30, 2009, 06:50:42 am

any chance you can convert this into a php script?

ideally the script would do the following:

1. take in an optional set or range of mailing ids

2. take in an optional date to archive all mailings before that date

3. take an optional database name (for archival)

and archive the mailings that fall under the above constraints to the new db if present and delete the mailings from the old db. maybe the community can help sponsor this development :)

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Cleaning up the CiviMail event queue
February 03, 2009, 08:30:38 am

some more thoughts on this (after reading this blog post: http://github.com/blog/328-database-troubles). At some point soon, i think civimail will need a batch archive and delete facility (i.e. ability to handle this for multiple mailings at a time). So here is the rough idea:

The archive feature will install mysql tables of type archive in the database. Basically this will include most of the CiviMail tables which hold mailing / event information. To reduce the number of tables, we can potentially collapse all the civicrm_mailing_event_queue_* table into one archive table (all these tables have a common set of 4 fields). The archive code will then move all the entries to the relevant tables on a periodic basis based on some frequency parameter. I suspect most sites will not need archival features for quite some time.

Another easier option might be to wait for an official mysql 5.1 release and use the partitioning feature: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html

If you are interested in working on this or sponsoring this feature, let us know

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Denver Dave

  • Ask me questions
  • ****
  • Posts: 471
  • Karma: 9
Re: Cleaning up the CiviMail event queue
February 03, 2009, 10:05:24 am
My issue on the mailing event queue is the amount of time that it takes to manually delete the events one at a time.  After deleting a mailing queue item the sort tends to be off and I have to search again to find the next mailing to delete or click the back button several times.

There must be a function or script that deletes a queue item.  Perhaps if this could be issolated, we could pass a list of event ids and delete several at a time.  I don't think this has to be built into the interface (but could be).

I'm afraid there may be a little more to this than just whacking the civicrm_mailing table - related tables and such.

= = = = = = =

As for archives - I do see we 438,000 records in the database for a contact size of 4,400  - does seem like we are running CiviCRM until it drops.

Dave

torenware

  • I post frequently
  • ***
  • Posts: 153
  • Karma: 4
Re: Cleaning up the CiviMail event queue
February 03, 2009, 11:32:30 am
The approach used for activity history does work for the CiviMail event tables.  The main issue I've found isn't speed (the archive table approach is actually reasonably fast, even for millions of records), but making sure you save the right "stuff".

Part of what the CiviMail event tables do is "explain" when and why people opt out of email or unsubscribe from lists.  For some installations, this is valuable information, since you can audit with it.  But that's a very small fraction of the events -- for the install I'm looking at, about 0.1% of the records.  So if you can lop off the "opened" and the "delivered" tables, and any event_queue record that isn't needed for the remaining event records, you can reduce the size of those tables by a factor of about 1000.

Mostly, this needs to be a SQL only solution, because you never want to load most of the records in the system, and CiviCRM's system wants to load everything so it can use its object wrappers.  But a PHP wrapper for the script would make it a lot easier to "rewrite" the SQL, since MySQL's dialect doesn't let you parameterize things like table names and database names:  @my_value will substitute for a value, but @my_table or @my_database will cause a syntax error.

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: Cleaning up the CiviMail event queue
February 04, 2009, 12:27:19 am
Quote from: Donald Lobo on February 03, 2009, 08:30:38 am
Another easier option might be to wait for an official MySQL 5.1 release and use the partitioning feature: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html

For what it’s worth, Sun released MySQL 5.1.30 on November 27th and declared it stable on December 8th, so we can start thinking about using its features for such add-ons (but be prepared for the usual outcries if a feature requiring MySQL 5.1 becomes popular/useful…).
If you found the above helpful, please consider helping us in return – you can even steer CiviCRM’s future and help us extend CiviCRM in ways useful to you.

Dennis Gray

  • Ask me questions
  • ****
  • Posts: 472
  • Karma: 1
  • CiviCRM version: Various. See post.
  • CMS version: Drupal, Wordpress and Joomla. See post.
  • MySQL version: TBA
  • PHP version: TBA
Re: Cleaning up the CiviMail event queue
July 25, 2009, 08:45:28 pm
Has there been any progress on this since Rob's last post?

Denver Dave

  • Ask me questions
  • ****
  • Posts: 471
  • Karma: 9
Re: Cleaning up the CiviMail event queue
July 25, 2009, 09:18:02 pm
We now have over a million records in our Drupal / CiviCRM database.  Rather than run the system until it drops, I'm thinking we will have to do something.  The lions share of the records are in just 4 few tables:

civicrm_activity_history 329,000
civicrm_mailing_event delivered 195,000
civicrm_mailing_event_opened 76,000
civicrm_mailing_event_queue 225,000

Total 825,000 for just these 4 files.

Any idea of the ramifications if we just empty these 4 tables?  

I think the last 3 may relate to hard it is to manually delete items from the even mailing queue one by one - would love to have a check box or other way to purge faster.  We currently have 423 items in our mailing event queue, partly because we have quite a few people doing mailings to different segments of our contact list.

I'm not sure the risks of purging the above files are any greater than letting the number of records grow unchecked.  Also, if we ever have to restore from backups, I'm pretty sure that I would have trouble with the big tables.

What are your thoughts - thanks

Dave

« Last Edit: July 27, 2009, 10:33:59 am by Denver Dave »

Denver Dave

  • Ask me questions
  • ****
  • Posts: 471
  • Karma: 9
Re: Cleaning up the CiviMail event queue
July 27, 2009, 10:36:32 am
Thanks - glad someone else recognizes the issue.  I don't think it will take very long to empty the tables, but with related tables it might stop me or maybe have unintended consequences, so thought I'd ask around.

We don't have that big a database, only 6,400 contacts.  However, with many people doing mailings, I'll bet we have more items in the CiviMail event queue.

Thanks - Dave

Pages: [1] 2 3
  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Support »
  • Using CiviCRM »
  • Using CiviMail (Moderator: Piotr Szotkowski) »
  • Cleaning up the CiviMail event queue

This forum was archived on 2017-11-26.