CiviCRM Community Forums (archive)

*

News:

Have a question about CiviCRM?
Get it answered quickly at the new
CiviCRM Stack Exchange Q+A site

This forum was archived on 25 November 2017. Learn more.
How to get involved.
What to do if you think you've found a bug.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • General Discussion (please no support requests here!) (Moderator: Michał Mach) »
  • database performance issues involving custom files
Pages: [1]

Author Topic: database performance issues involving custom files  (Read 3514 times)

nmiracle

  • Guest
database performance issues involving custom files
August 14, 2008, 04:44:12 am
We are a new user of civicrm with a database of roughly 1 million individuals.  All have addresses, and many have at least one phone and some have emails.  We also have a fairly large number of custom fields that specifically have to do with our end application:  an election campaign.  We have WAY more than the 20 suggested 'extra' fields on the wiki   

1.  Some of this is because there are fields we need in the database that are not surfaced in the UI (at least not that we can find).  There are specifically the fields in civicrm_address that split the addresses into the pieces that are needed for creating canvassing lists.    There also appears to be no automated mechanism for canonicalizing addresses and splitting automatically into these useful fields

a)  does anyone know why these are not surfaced in the UI for querying?
b)  any history on why these even exist if there is no maintenance for them?

 If we can surface these, then we can eliminate about a half dozen of our 'extra' fields.   Anyone have any idea why these are not visible and do not appear to be maintained?

We are having some unpleasant performance problems with the db and there seem to be two causes: 

a)  all custom fields are automatically set up as varchar(255), which is not very efficient to index (particularly if the source field was a boolean)
b)  I'm seeing that the queries on the custom fields look like this, which isn't the best:

SELECT count(DISTINCT contact_a.id)  FROM civicrm_contact contact_a
LEFT JOIN civicrm_value_1_fieldsfromexistingdatabase2_5 ON civicrm_value_1_field
sfromexistingdatabase2_5.entity_id = contact_a.id  WHERE  ( LOWER(civicrm_value_
1_fieldsfromexistingdatabase2_5.housedistrict) = '98' )  AND  ( 1 );

I've already done the drops of all unneeded indices.

Now I'm thinking about making two changes:

1.  modifying the custom fields to match the datatypes of our original fields;
2.  finding whatever it is that constructs the queries against the db and removing the LOWER in the statement.  Doing so will cause no harm in the matching because MySQL matching is case-insensitive.  Using 'lower' means that the query won't use any existing index and that has a pretty significant performance effect.

Any comments, caveats or suggestions?

Nancy Miracle


Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: database performance issues involving custom files
August 14, 2008, 12:02:04 pm
Quote from: nmiracle on August 14, 2008, 04:44:12 am
We have WAY more than the 20 suggested 'extra' fields on the wiki   

the new custom group structure can handle a significantly larger set of custom groups/fields.

Quote from: nmiracle on August 14, 2008, 04:44:12 am
1.  Some of this is because there are fields we need in the database that are not surfaced in the UI (at least not that we can find).  There are specifically the fields in civicrm_address that split the addresses into the pieces that are needed for creating canvassing lists.    There also appears to be no automated mechanism for canonicalizing addresses and splitting automatically into these useful fields

a)  does anyone know why these are not surfaced in the UI for querying?
b)  any history on why these even exist if there is no maintenance for them?

This is for a future enhancement and to potentially be able to generate walklists etc. There was a proposal to get rid of them in 2.1 (since we are not exposing/using them, but folks in the community felt they should stay and they can use it outside of civicrm via drupal modules etc

Quote from: nmiracle on August 14, 2008, 04:44:12 am
If we can surface these, then we can eliminate about a half dozen of our 'extra' fields.   Anyone have any idea why these are not visible and do not appear to be maintained?

You can surface them and make the appropriate changes to the UI etc. There has not been sufficient demand / no contributions to make this happen in core as yet

Quote from: nmiracle on August 14, 2008, 04:44:12 am
We are having some unpleasant performance problems with the db and there seem to be two causes: 

a)  all custom fields are automatically set up as varchar(255), which is not very efficient to index (particularly if the source field was a boolean)

we do not ask for the size of the 'data_type' "string" in case of text/alphanumeric fields. hence the default of 255. If you choose yes/no, i'm pretty sure we use a tinyint for it. That raises a good point, and we can add the "size" in 2.2 to create a column of a more appropriate size

Quote from: nmiracle on August 14, 2008, 04:44:12 am
b)  I'm seeing that the queries on the custom fields look like this, which isn't the best:

SELECT count(DISTINCT contact_a.id)  FROM civicrm_contact contact_a
LEFT JOIN civicrm_value_1_fieldsfromexistingdatabase2_5 ON civicrm_value_1_field
sfromexistingdatabase2_5.entity_id = contact_a.id  WHERE  ( LOWER(civicrm_value_
1_fieldsfromexistingdatabase2_5.housedistrict) = '98' )  AND  ( 1 );

In 2.1 we have eliminated LOWER in quite a few cases. Is housedistrict of type 'Integer'? If so, that query is wrong, if of type 'string', our users (so far) prefer to have that query as: civicrm_value_1_fieldsfromexistingdatabase2_5.housedistrict LIKE '%98%'; yes, we do know the major performance drain this has. In 2.1 for the sort_name / email fields we do the inefficient way, but have a flag to allow admins to restrict the wildcard to just the suffix.

Quote from: nmiracle on August 14, 2008, 04:44:12 am
I've already done the drops of all unneeded indices.

We create an index only if you have specified the field as searchable


Now I'm thinking about making two changes:

Quote from: nmiracle on August 14, 2008, 04:44:12 am
1.  modifying the custom fields to match the datatypes of our original fields;

i assume u mean reduce the length of the varchar in terms of size. Also please do file an issue so we get this change for 2.2

Quote from: nmiracle on August 14, 2008, 04:44:12 am
2.  finding whatever it is that constructs the queries against the db and removing the LOWER in the statement.  Doing so will cause no harm in the matching because MySQL matching is case-insensitive.  Using 'lower' means that the query won't use any existing index and that has a pretty significant performance effect.

For custom fields, most of the query code is in: CRM/Core/BAO/CustomQuery.php. We'd be happy to work with you to optimize the code and the queries generated. If you need some real time help or explanations, feel free to get on IRC: http://embed.mibbit.com/?server=irc.freenode.net&channel=%23civicrm&forcePrompt=true

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

nmiracle

  • Guest
Re: database performance issues involving custom files
August 22, 2008, 08:28:54 pm
Per your suggestion, I did a couple of things and it seems to have significantly improved the performance:

1.  Removed a bunch of the custom fields
2.  Restructured them to be type 'int' or smaller varchar fields
3.  changed line 298 of CustomQuery.php to not use the 'LOWER'  ($sql = "($fieldName)";)

I'm not sure what the best thing to do about this is:
$sqlValue[] = "( $sql like '%" . CRM_Core_BAO_CustomOption::VALUE_SEPERATOR . $k . CRM_Core_BAO_CustomOption::VALUE_SEPERATOR . "%' ) ";

I'll have to think about it a bit.

I did file a couple of issues on the size of the fields, so y'all have that as documentation.

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: database performance issues involving custom files
August 23, 2008, 12:55:33 am
Quote from: nmiracle on August 22, 2008, 08:28:54 pm
I'm not sure what the best thing to do about this is:
$sqlValue[] = "( $sql like '%" . CRM_Core_BAO_CustomOption::VALUE_SEPERATOR . $k . CRM_Core_BAO_CustomOption::VALUE_SEPERATOR . "%' ) ";

that might be a hard one to normalize since we are collapsing and storing checkbox/multi-select data in one db column (i.e. implicitly denormalizing them).

lobo

A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: database performance issues involving custom files
August 23, 2008, 12:09:47 pm

i thought about this issue a wee bit last nite. One way of solving this issue in the long term for multi-select / checkbox is to create a database column of type boolean / tinyint(1) for every option. (this is feasible in the 2.x series since each custom group has its own table). The number of columns will be large if you have custom fields with lots of options (but that should not be a big hassle). However search performance will be significantly faster since the below query now will change to:

foreach option value selected in search form:

$sqlValue[] = "$table_name.column_name_option_key = 1"

and then an implode with either AND/OR depending on the sql op chosen (a 2.1 feature)

this is a pretty big change (from a db and upgrade perspective) that i'd prefer to push it to 3.0. However if you are willing to work on it and incorporate it into CiviCRM we might be willing to work it into 2.2. I think this is the right path to go down in the long run. It also make export much easier

lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

Pages: [1]
  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • General Discussion (please no support requests here!) (Moderator: Michał Mach) »
  • database performance issues involving custom files

This forum was archived on 2017-11-26.