CiviCRM Community Forums (archive)


This forum was archived on 25 November 2017.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion »
  • Scalability (Moderator: Donald Lobo) »
  • CiviMail Mult-Threading

Author Topic: CiviMail Mult-Threading  (Read 5499 times)

mbriney

  • I’m new here
  • *
  • Posts: 21
  • Karma: 2
  • Technical Product Manager / VP at Edelman
    • Edelman Public Relations
CiviMail Mult-Threading
May 27, 2010, 05:53:15 am
We did our first large e-mail using CiviMail and a 3rd party MTA yesterday.  Our total list size was 62,236 e-mails.  We mailed in a few batches and did a few experiments and discovered what I think could be a great opportunity to increase overall throughput.

Our first batch went to 8,426 e-mails.  The process took 93 minutes in total, averaging 90.6 e-mails/minute.  This was a single mailing running in a single process.

The total delivery time in the first batch was not ideal so we decided in the next batch of 53,810 e-mails to split it into two e-mails and send them both at the same time.

What's interesting is that when we did this, each individual job still processed at 90.6 e-mails/minute, but the two ran at the same time.  The result was a faster overall send rate: we delivered all 53,810 within 372 minutes, a combined rate of 144.65 e-mails a minute.

Based on this data I'm wondering whether it would make sense to have a setting somewhere in CiviMail that allows you to define the number of threads per job.  Then when you send a mailing in CiviMail it takes the queued list size and splits it into separate jobs each running on a separate process.  From a reporting standpoint it would all be seen as one e-mail but just offer a way to process mail quicker.
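
A minimal sketch of that splitting step, assuming PHP 7 or later and a plain array of queued recipients. The function name `splitIntoSubJobs` and the "processes" setting are hypothetical, not existing CiviMail code.

```php
<?php
// Hypothetical helper, not an existing CiviMail function: split one
// mailing's queued recipients into at most $processes sub-jobs.
function splitIntoSubJobs(array $recipients, int $processes): array
{
    if ($processes < 1) {
        throw new InvalidArgumentException('Need at least one process.');
    }
    if (count($recipients) === 0) {
        return [];
    }
    // Rounding up keeps the number of sub-jobs at or below $processes.
    $size = (int) ceil(count($recipients) / $processes);
    return array_chunk($recipients, $size);
}

// The 53,810-address batch from above, split across two processes.
$queued  = range(1, 53810);
$subJobs = splitIntoSubJobs($queued, 2);
echo count($subJobs), ' sub-jobs, ', count($subJobs[0]), " recipients in the first\n";
```

Each sub-job could then be handed to its own cron process, while reporting would still group them under the one mailing.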

Has anyone looked into multi-threading individual jobs before?
support CiviCRM through 'make it happen' initiatives!
http://civicrm.org/mih

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: CiviMail Mult-Threading
May 27, 2010, 07:23:51 am

What you describe basically means that your SMTP connection is the bottleneck. One potentially easy way to increase the delivery rate is to run a local SMTP server that relays all email to the real remote mail server.

The architecture is set up so that you can deliver multiple jobs at the same time from one or multiple machines. Changing this to deliver portions of a job is potentially not too hard (maybe by introducing a sub_job_id with chunks during queue creation?). But I do think a local SMTP server will give you an even bigger speedup.

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

mbriney

Re: CiviMail Mult-Threading
May 27, 2010, 07:45:28 am
Yes, using a local server, even just as a relay, would be faster, but our issue is access to a local server.  Most web hosts don't offer this, and the cloud architecture that we're using doesn't either.

The sub_job_id concept seems like it could be more scalable for a variety of hosting solutions.

admin

  • Guest
Re: CiviMail Mult-Threading
May 27, 2010, 01:55:17 pm

If you have access to your cloud infrastructure as a VPS, can't you install a local mail server there?

The sub_job_id concept is possible but will require schema and code changes. It's a bit late for us to consider it for 3.2.

It comes down to locking a certain set of email delivery records so that only one thread can get them.
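
A rough sketch of that kind of claim, using one lock file per chunk of delivery records and PHP's `flock()`. This is only an illustration of the locking idea: the helper names, the lock-file layout and the chunk numbering are assumptions, and it is not how CiviMail takes its locks.

```php
<?php
// Hypothetical helpers: one lock file per chunk of delivery records.

// Returns an open file handle if this caller now owns the chunk, or null
// if the lock file cannot be opened or another holder already has the lock.
function claimChunk(string $lockDir, int $chunkId)
{
    $handle = fopen($lockDir . '/chunk-' . $chunkId . '.lock', 'c');
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return at once instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Gives the chunk back so that another caller can claim it.
function releaseChunk($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

$lockDir = sys_get_temp_dir() . '/civimail-lock-demo-' . getmypid();
if (!is_dir($lockDir)) {
    mkdir($lockDir, 0700, true);
}

$first = claimChunk($lockDir, 1);   // claims chunk 1
$other = claimChunk($lockDir, 2);   // chunk 2 is independent of chunk 1
var_dump($first !== null, $other !== null);

releaseChunk($first);
releaseChunk($other);
```

A cron process that gets null back would simply move on to the next chunk rather than wait.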

lobo

mbriney

Re: CiviMail Mult-Threading
May 27, 2010, 02:11:32 pm
We don't quite get the same access as other VPS.  This is the system we're running:
http://www.rackspacecloud.com/cloud_hosting_products/sites

We might take a crack at developing the code to split the job.  If we can get it up and running we'll post the commits so you guys can see whether it makes sense.  I agree it's not a 3.2 change... there's already enough on that plate.

Any advice on what files we should be looking at?

Donald Lobo

Re: CiviMail Mult-Threading
May 27, 2010, 02:42:25 pm

the code is here:

CRM/Mailing/BAO/Job.php, function deliver

the very first query would need to be split up among multiple cron jobs

the function runJobs allocates the "job" level run lock

Looking at it, I don't think you need the sub_job_id. You'll basically need to partition the "email queued" table in a deterministic manner. So you can say that you'll partition it into N sub-jobs (where N is some constant) and work on the first unlocked sub-job.
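
One way to read "deterministic" here is a modulo on the queue row id. The sketch below assumes that reading; the constant `SUB_JOBS` and both function names are made up for illustration, and the list of locked sub-jobs is passed in as a plain array rather than read from real locks.

```php
<?php
const SUB_JOBS = 4; // N: an illustrative constant, not a CiviMail setting

// Map a queue row id to a sub-job index in 0..N-1. The same id always
// lands in the same sub-job, so every cron run sees the same partition.
function subJobFor(int $queueId, int $n = SUB_JOBS): int
{
    return $queueId % $n;
}

// Return the first sub-job index that is not in $locked, or null when
// every sub-job is already taken.
function firstUnlockedSubJob(array $locked, int $n = SUB_JOBS): ?int
{
    for ($i = 0; $i < $n; $i++) {
        if (!in_array($i, $locked, true)) {
            return $i;
        }
    }
    return null;
}

$queueIds = range(101, 112);
$mine = firstUnlockedSubJob([0, 1]);   // sub-jobs 0 and 1 are busy
$work = array_values(array_filter(
    $queueIds,
    function (int $id) use ($mine): bool {
        return subJobFor($id) === $mine;
    }
));
print_r($work);
```

Because the mapping depends only on the id, no extra column is needed to remember which row belongs to which sub-job.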

You'll also need to separate the locks for queuing vs. delivering (since the above is basically a delivery lock).

Ping us on IRC if you want to discuss further. Note that there was a race condition in the delivery lock system (which resulted in duplicate sending). This has been resolved in 3.2.

lobo



xcf33

  • I post frequently
  • ***
  • Posts: 181
  • Karma: 7
  • CiviCRM version: 3.3.2
  • CMS version: Drupal 6.19/6.20
  • MySQL version: 5.x
  • PHP version: 5.2.6
Re: CiviMail Mult-Threading
June 09, 2010, 02:47:34 pm
I'm looking at the code in CRM/Mailing/BAO/Job.php

In runJobs

I see that the first query basically gets the jobs that have been submitted or scheduled to run as of the time the cron runs.

It then puts a lock on the job or jobs and sets their status to 'Running'.

It then queues up all the email addresses and delivers them.

So I guess my question is: instead of creating one lock for each job to be run by one cron process, will we need to use some logic (like a partition of N emails) to create multiple locks, so that the queue and deliver methods are called multiple times for a single job?


I also think I'm in way over my head :)

Donald Lobo

Re: CiviMail Mult-Threading
June 09, 2010, 09:30:42 pm

i would probably do it as follows:

1. Introduce the concept of a worker job and a new table to hold worker job ids and status

2. Separate the locks for scheduled -> running and for processing a running job. So the global job-level lock basically only moves the job to the running state (i.e. queues up all the mail in civicrm_mailing_event_queue). It also partitions the job into smaller worker jobs and divides the event queue up among them (additional column)

3. The cron tries to get a lock on the first not complete worker job. Once it gets a lock on the worker job, it goes in and sends all the mail (or the batch limit) for that worker job. On completion it checks if the overall job is completed and if so switches the status to complete

I think there will not be a lot of changes, but a few changes in the core pieces of code
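
Step 3 above could look roughly like the following. This is an in-memory sketch only: the array stands in for the proposed worker-job table, `runCronPass` is a hypothetical name, and the actual locking is reduced to a comment.

```php
<?php
// In-memory stand-in for a hypothetical worker-job table.
$workerJobs = [
    ['id' => 1, 'status' => 'Complete', 'pending' => []],
    ['id' => 2, 'status' => 'Running',  'pending' => ['a@example.org', 'b@example.org', 'c@example.org']],
    ['id' => 3, 'status' => 'Running',  'pending' => ['d@example.org']],
];

// One cron pass: take the first worker job that is not complete, send at
// most $batchLimit messages from it, and mark it complete once drained.
// Returns the id of the worker job it worked on, or null if none was left.
function runCronPass(array &$workerJobs, int $batchLimit, callable $send): ?int
{
    foreach ($workerJobs as &$job) {
        if ($job['status'] === 'Complete') {
            continue;
        }
        // A real implementation would take a lock on $job['id'] here.
        $batch = array_splice($job['pending'], 0, $batchLimit);
        foreach ($batch as $address) {
            $send($address);
        }
        if (count($job['pending']) === 0) {
            $job['status'] = 'Complete';
        }
        return $job['id'];
    }
    return null;
}

$sent = [];
$send = function (string $address) use (&$sent): void {
    $sent[] = $address;   // stand-in for the real delivery call
};

$firstPass  = runCronPass($workerJobs, 2, $send);  // sends 2 from worker job 2
$secondPass = runCronPass($workerJobs, 2, $send);  // drains worker job 2
$thirdPass  = runCronPass($workerJobs, 2, $send);  // drains worker job 3
```

The check for whether the overall job is complete is left out here to keep the pass itself easy to follow.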

lobo

xcf33

Re: CiviMail Mult-Threading
June 14, 2010, 03:20:09 pm
Hi Lobo:

So I studied the code a little bit more and found the following:

Although the queue and deliver methods in Job.php are based on job_id, the problem is that the relationship between civicrm_mailing and civicrm_mailing_event_job is effectively one to one. Excluding test mailings, you effectively have one job for each mailing.

Moreover, it looks to me like the key lies in

CRM/Mailing/BAO/Mailing.php

in particular  function &getRecipients($job_id, $includeDelivered = false, $mailing_id = null)

It basically handles the inclusion and exclusion of recipients for each job (but essentially for each mailing).


So I guess the following steps could be used to split up the mailing.


1. When a mailing is submitted, it creates multiple jobs based on how many records a job can hold (say 50000)
2. There will be one or two additional columns in the civicrm_mailing_event_job table to hold the job limit and offset
3. In function &getRecipients($job_id, $includeDelivered = false, $mailing_id = null), introduce new parameters for limit and offset so it only returns a portion of the emails from the mailing
4. Locks can still be set at the job level, and not much code will need to change in Job.php
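
To make steps 1 and 2 concrete, here is a small sketch of how the limit and offset for each job could be worked out from the list size. `childJobWindows` is a hypothetical helper, offsets are assumed to start at 0, and the numbers are the full list size from the first post with the 50000 figure from step 1.

```php
<?php
// Hypothetical helper: compute the offset/limit pair for each job, given
// the total number of recipients and the most records one job may hold.
function childJobWindows(int $totalRecipients, int $perJob): array
{
    if ($perJob < 1) {
        throw new InvalidArgumentException('perJob must be at least 1.');
    }
    $windows = [];
    for ($offset = 0; $offset < $totalRecipients; $offset += $perJob) {
        $windows[] = [
            'offset' => $offset,
            // The last window only covers what is left.
            'limit'  => min($perJob, $totalRecipients - $offset),
        ];
    }
    return $windows;
}

$windows = childJobWindows(62236, 50000);
print_r($windows);
```

Each pair would be stored on its job row and passed through to the recipient query.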


What do you think?

Donald Lobo

Re: CiviMail Mult-Threading
June 14, 2010, 03:54:32 pm

hey chang:

we'll also need to figure out when a mailing is "complete". Each job by itself will have a running/completed status. So I do think that in addition to limit/offset, we'll also need a "parent_job_id".

So the cron job will have 3 tasks (the first two are handled by the current code base; however, task 1 will need to be modified):

1. Moving a job from scheduled to running, and in the process creating multiple job ids with limit and offset, each pointing to the parent job id (the queue function in Job.php can do this)

2. Delivering the emails that are part of a current job

3. Setting the status of a parent job to completed if all its "child jobs" are complete.

Thus we can still set a lock on each job but get the parallelism. How does this sound?
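
Task 3 is the only genuinely new piece, so here is a sketch of it on plain arrays. The `parent_id` and `status` keys and the function name are assumptions for illustration; a real version would run equivalent queries against the job table.

```php
<?php
// Hypothetical roll-up: mark a running parent job complete once every one
// of its child jobs is complete. $jobs is keyed by job id.
function completeFinishedParents(array $jobs): array
{
    foreach ($jobs as $id => $job) {
        // Only running parent jobs (no parent of their own) are candidates.
        if ($job['parent_id'] !== null || $job['status'] !== 'Running') {
            continue;
        }
        $children = array_filter($jobs, function (array $candidate) use ($id): bool {
            return $candidate['parent_id'] === $id;
        });
        $unfinished = array_filter($children, function (array $child): bool {
            return $child['status'] !== 'Complete';
        });
        if (count($children) > 0 && count($unfinished) === 0) {
            $jobs[$id]['status'] = 'Complete';
        }
    }
    return $jobs;
}

$jobs = [
    1 => ['parent_id' => null, 'status' => 'Running'],
    2 => ['parent_id' => 1,    'status' => 'Complete'],
    3 => ['parent_id' => 1,    'status' => 'Complete'],
    4 => ['parent_id' => null, 'status' => 'Running'],
    5 => ['parent_id' => 4,    'status' => 'Complete'],
    6 => ['parent_id' => 4,    'status' => 'Running'],
];
$jobs = completeFinishedParents($jobs);
echo $jobs[1]['status'], ' / ', $jobs[4]['status'], "\n";
```

Parent 1 has both children done and flips to complete; parent 4 still has a running child and stays as it is.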

lobo



xcf33

Re: CiviMail Mult-Threading
June 16, 2010, 12:21:34 pm
Quick question,

If I alter the table to add more columns, can I still use the DAO methods of the entity? For example, in this case, something like this:

Code: [Select]
<?php
$saveJob = new CRM_Mailing_DAO_Job( );
$saveJob->start_date = date('YmdHis');
$saveJob->status     = 'Running';

// new fields I added to civicrm_mailing_job table
$saveJob->type     = 'child';
$saveJob->parentid = $job->id;

$saveJob->save();
?>

Right now I'm just using a CRM_Core_DAO object and its query method to run INSERT SQL. I'm guessing that I can't use the entity DAO because those new fields are not wrapped.


Also, what's the difference between instantiating CRM_Core_DAO and instantiating an entity DAO object?


Thanks,


Chang

xcf33

Re: CiviMail Mult-Threading
June 17, 2010, 01:13:19 pm
Hi Lobo, here is how far I've got with the process right now:


1. The mailing job (the original one that is created when the user submits the mailing) is now split into equal parts depending on the limit set

For example, if the mailing contains 500 emails and the limit is 100, then 5 'child jobs' are created, all with a reference to the parent job

2. The 'parent' job's status is now updated to running, with the start date set

3. The runJobs query now fetches *only the child jobs* ($query is slightly modified to add AND type = 'child') and runs the queue function to queue up emails for each child job, update its status to 'running' and set the start date.

4. The problem occurs in the queue function. I've added a few wrinkles to it, but it is not working:

Code: [Select]
<?php
public function _queue($testParams = null) {
    require_once 'CRM/Mailing/BAO/Mailing.php';
    $mailing = new CRM_Mailing_BAO_Mailing();
    $mailing->id = $this->mailing_id;
    if (!empty($testParams)) {
        $mailing->getTestRecipients($testParams);
    } else {
        // Chang is here:
        // We are still getting all the recipients from the parent job
        // (the original, so we don't mess with the include/exclude logic)
        $recipients =& $mailing->getRecipientsObject($this->parentid);

        // Chang is here:
        // We use the parent job id to fetch the recipients, but apply the
        // limit and offset from the child job DAO object so that only one
        // segment of the recipients is queued instead of the whole list
        $i = 0;
        while ($recipients->fetch()) {
            // Only the limit is tested here: assuming offsets start at 0,
            // the first child job has offset 0 and must still be queued
            if ($this->limit != 0) {
                if (($i >= $this->offset) && ($i < $this->offset + $this->limit)) {
                    $params = array(
                        // job_id should be the child job id
                        'job_id'     => $this->id,
                        'email_id'   => $recipients->email_id,
                        'contact_id' => $recipients->contact_id
                    );
                    CRM_Mailing_Event_BAO_Queue::create($params);
                }
            }
            $i++;
        }
    }
}
?>


Basically, I'm still getting a queue created for the parent job with the entire list of email addresses for the specific mailing. In addition, queue blocks are created for each of the child jobs with the correct limit and offset. So I end up with double the number of emails being queued.


Is there any particular reason this would happen? In my $job->fetch() loop I'm only querying for the child jobs, but it somehow still queues up for the parent job.



I can provide you my whole modified Job.php if needed.



Thanks,


Chang

xcf33

Re: CiviMail Mult-Threading
June 18, 2010, 03:43:23 pm
Lobo, thanks for pointing out the issues with my code on IRC.

So where I am now: the "child jobs" queue up part of the mailing based on limit and offset. I'm still firing the deliver() method for each child job and updating the child's status once deliver() returns $IsComplete.

I also now have a runJobs_post() that queries all running "parent" jobs and checks whether all child jobs of each parent are complete; if so, it updates the status of the parent job and marks the mailing as complete.

Will the new relationship between mailing and mailing jobs affect the mailing report etc.?


Donald Lobo

Re: CiviMail Mult-Threading
June 18, 2010, 04:06:47 pm

We'll basically need to change all the queries that deal with a job to only interact with jobs where the type is null (i.e. ignore all child jobs). So a job type should be either "child" or null.

Make sure that you are locking in both runJobs_pre and runJobs_post. It might be easier to move those functions within runJobs and have runJobs call the pre and post steps; that will keep the code more centralized.

lobo

xcf33

Re: CiviMail Mult-Threading
July 02, 2010, 06:56:11 am
It turned out that the queries themselves were already in place. I took a look at every query used in the mailing report and the presentation logic, and it already sums up the statistics from multiple jobs for each mailing. The only thing I had to do was modify the template to use the "sum" variable rather than job.0.
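
As a rough picture of what that summing amounts to, here is a sketch on made-up per-job rows. The field names, the numbers and `sumJobStats` are all invented for illustration; the real report queries and template variables are not shown in this thread.

```php
<?php
// Hypothetical helper: add up the named statistics across all jobs of a
// mailing, so the report can show one total per statistic.
function sumJobStats(array $perJobStats, array $fields): array
{
    $totals = [];
    foreach ($fields as $field) {
        $totals[$field] = array_sum(array_column($perJobStats, $field));
    }
    return $totals;
}

// Invented sample rows, one per child job of the same mailing.
$perJobStats = [
    ['job_id' => 11, 'queued' => 100, 'delivered' => 97, 'bounced' => 3],
    ['job_id' => 12, 'queued' => 100, 'delivered' => 99, 'bounced' => 1],
    ['job_id' => 13, 'queued' => 36,  'delivered' => 36, 'bounced' => 0],
];

$sum = sumJobStats($perJobStats, ['queued', 'delivered', 'bounced']);
print_r($sum);
```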

However, my question now is whether this new delivery system will affect CiviMailProcessor.php and bounce processing.


This forum was archived on 2017-11-26.