CiviCRM Community Forums (archive)

*

News:

Have a question about CiviCRM?
Get it answered quickly at the new
CiviCRM Stack Exchange Q+A site

This forum was archived on 25 November 2017. Learn more.
How to get involved.
What to do if you think you've found a bug.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion (Moderator: Donald Lobo) »
  • Slow SQL SELECT on external_identifier
Pages: [1]

Author Topic: Slow SQL SELECT on external_identifier  (Read 2081 times)

bpmccain

  • I post frequently
  • ***
  • Posts: 255
  • Karma: 5
  • CiviCRM version: 4.1
  • CMS version: Drupal 7.12
  • MySQL version: 5.2
  • PHP version: 5.2
Slow SQL SELECT on external_identifier
April 23, 2012, 09:15:51 am
I'm doing some very simple mysql db queries using external identifier, such as

Code: [Select]
SELECT *
FROM  `civicrm_contact` contact
WHERE contact.external_identifier =123456

but they seem to be taking a really long time - anywhere between 3 and 6 seconds per query. If instead of external_identifier I use contact.id then the query time drops to 0.004s.

It is a problem I need to solve, as we regularly import external data coming from a third party based on external_identifier. Importing close to 1 million records when it takes 3 to 6 seconds per query, makes the import process impractical.

I know that contact.id is a Primary Key, while external_identifier is set up as a Unique Index, but the speed shouldn't be as slow as it is.

Brian


xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Slow SQL SELECT on external_identifier
April 23, 2012, 11:00:22 am
Querying the table on a field that is a unique index should be instantaneous.

Could you explain the query to see if it is really using it?

Code: [Select]
explain select * from civicrm_contact where external_identifier='1';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (0.00 sec)


It doesn't look ok to me, but I have no idea what's the pb. SQL Guru to the rescue?


X+
« Last Edit: April 23, 2012, 11:03:59 am by xavier »
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

demeritcowboy

  • Ask me questions
  • ****
  • Posts: 570
  • Karma: 42
  • CiviCRM version: Always the latest!
  • CMS version: Drupal 6 mostly, still evaluating 7.
  • MySQL version: Mix of 5.0 / 5.1 / 5.5
  • PHP version: 5.3, usually on Windows
Re: Slow SQL SELECT on external_identifier
April 23, 2012, 02:26:53 pm
If you select * it probably won't use the index.

Try just selecting what you need, or select id, then select * where id=

bpmccain

  • I post frequently
  • ***
  • Posts: 255
  • Karma: 5
  • CiviCRM version: 4.1
  • CMS version: Drupal 7.12
  • MySQL version: 5.2
  • PHP version: 5.2
Re: Slow SQL SELECT on external_identifier
April 23, 2012, 02:48:50 pm
I've tried that and it improves it somewhat, but not by much. Still in the 2s range.

I have tried changing external_identifier to type INT instead of VARCHAR and that has dropped it to 0.006s. What would the broader implications of that change be? All of our external_identifiers will be INT.

Chris Burgess

  • Ask me questions
  • ****
  • Posts: 675
  • Karma: 59
Re: Slow SQL SELECT on external_identifier
April 24, 2012, 01:42:55 am
Quote from: bpmccain on April 23, 2012, 02:48:50 pm
I've tried that and it improves it somewhat, but not by much. Still in the 2s range.

I have tried changing external_identifier to type INT instead of VARCHAR and that has dropped it to 0.006s. What would the broader implications of that change be? All of our external_identifiers will be INT.

Things will probably seem fine ... until your next CiviCRM upgrade :)

At that point, if CiviCRM makes any upgrade references to the column you've altered, the upgrade process will probably fail hard. CiviCRM's DB update processes expect the DB to be in the exact state they created it (and built as they are as SQL generated from .tpl, it's not really possible for them to check column types or indexes, or to catch errors).

Changing the column type is probably minor and low-risk, but fair warning ...

Quote from: demeritcowboy on April 23, 2012, 02:26:53 pm
If you select * it probably won't use the index.

Try just selecting what you need, or select id, then select * where id=

Are you sure about that? I've never heard that MySQL uses the SELECTed columns to determine which indexes it will use ... can you give any reference for that behaviour?

EXPLAIN shows the same results for the query with * and with the columns listed explicitly -

Code: [Select]
mysql> explain select id, contact_type,  contact_sub_type,  do_not_email,  do_not_phone,  do_not_mail,  do_not_sms,  do_not_trade,  is_opt_out,  legal_identifier,  external_identifier,  sort_name,  display_name,  nick_name,  legal_name,  image_URL,  preferred_communication_method,  preferred_language,  preferred_mail_format,  hash,  api_key,  source,  first_name,  middle_name,  last_name,  prefix_id,  suffix_id,  email_greeting_id,  email_greeting_custom,  email_greeting_display,  postal_greeting_id,  postal_greeting_custom,  postal_greeting_display,  addressee_id,  addressee_custom,  addressee_display,  job_title,  gender_id,  birth_date,  is_deceased,  deceased_date,  household_name,  primary_contact_id,  organization_name,  sic_code,  user_unique_id,  employer_id,  is_deleted from civicrm_contact where external_identifier=1;
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys          | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | civicrm_contact | ALL  | UI_external_identifier | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
Code: [Select]
mysql> explain select * from civicrm_contact where external_identifier=1;
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys          | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | civicrm_contact | ALL  | UI_external_identifier | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+-----------------+------+------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

Quote from: xavier on April 23, 2012, 11:00:22 am
Querying the table on a field that is a unique index should be instantaneous.

Could you explain the query to see if it is really using it?

Code: [Select]
explain select * from civicrm_contact where external_identifier='1';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (0.00 sec)

It doesn't look ok to me, but I have no idea what's the pb. SQL Guru to the rescue?

That "Extra" result is a false alarm - MySQL has checked the index and verified that no rows match the WHERE provided (you have no external_identifier=1) so it's internally added something like WHERE 0, which is the Impossible WHERE.
@xurizaemon ● www.fuzion.co.nz

demeritcowboy

  • Ask me questions
  • ****
  • Posts: 570
  • Karma: 42
  • CiviCRM version: Always the latest!
  • CMS version: Drupal 6 mostly, still evaluating 7.
  • MySQL version: Mix of 5.0 / 5.1 / 5.5
  • PHP version: 5.3, usually on Windows
Re: Slow SQL SELECT on external_identifier
April 24, 2012, 06:03:13 am
I don't know exactly why, but compare where you are just selecting id:
explain select id from civicrm_contact where external_identifier=1
Probably something about having to read the whole row makes it change its mind - the optimizer determines there's no benefit (rightly or wrongly).

I do find it interesting though about the int vs. varchar. I'd expect that on a really old computer, but maybe worth investigating the use of indexes in civi more.

Hershel

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4640
  • Karma: 176
    • CiviHosting
  • CiviCRM version: Latest
  • CMS version: Mostly WordPress and Drupal
Re: Slow SQL SELECT on external_identifier
April 24, 2012, 08:32:50 am
Quote from: demeritcowboy on April 24, 2012, 06:03:13 am
I do find it interesting though about the int vs. varchar. I'd expect that on a really old computer, but maybe worth investigating the use of indexes in civi more.

Also on really new computers. If an index is integer, the column type should be int. With very large datasets, this type of optimization becomes critical actually.
CiviHosting and CiviOnline -- The CiviCRM hosting experts, since 2007

See here for the official: What to do if you think you've found a bug.

demeritcowboy

  • Ask me questions
  • ****
  • Posts: 570
  • Karma: 42
  • CiviCRM version: Always the latest!
  • CMS version: Drupal 6 mostly, still evaluating 7.
  • MySQL version: Mix of 5.0 / 5.1 / 5.5
  • PHP version: 5.3, usually on Windows
Re: Slow SQL SELECT on external_identifier
April 24, 2012, 02:23:43 pm
Just seems like such a large order of magnitude difference, but I see now he said 1 million rows.

I'm still curious for my own benefit - if I get some time will do some indexing tests.

xavier

  • Forum Godess / God
  • I’m (like) Lobo ;)
  • *****
  • Posts: 4453
  • Karma: 161
    • Tech To The People
  • CiviCRM version: yes probably
  • CMS version: drupal
Re: Slow SQL SELECT on external_identifier
April 24, 2012, 11:26:59 pm
Trying to understand: So indeed, when querying one existing identifier, it seems to use the index, all good.

And you are saying that it's dog slow on your side but normal fast if you convert the column to an  int? Have you run a mysql tuner and checked if  the params for memory & co are ok?

Code: [Select]
mysql> explain select * from civicrm_contact where external_identifier="00132ed0388c91fdbbc333d0ba9cbed3";
+----+-------------+-----------------+-------+------------------------+------------------------+---------+-------+------+-------+
| id | select_type | table           | type  | possible_keys          | key                    | key_len | ref   | rows | Extra |
+----+-------------+-----------------+-------+------------------------+------------------------+---------+-------+------+-------+
|  1 | SIMPLE      | civicrm_contact | const | UI_external_identifier | UI_external_identifier | 99      | const |    1 |       |
+----+-------------+-----------------+-------+------------------------+------------------------+---------+-------+------+-------+
1 row in set (0.00 sec)
-Hackathon and data journalism about the European parliament 24-26 jan. Watch out the result

bpmccain

  • I post frequently
  • ***
  • Posts: 255
  • Karma: 5
  • CiviCRM version: 4.1
  • CMS version: Drupal 7.12
  • MySQL version: 5.2
  • PHP version: 5.2
Re: Slow SQL SELECT on external_identifier
April 25, 2012, 08:37:43 am
That is what it seems. It is lightning fast with the column coverted to an INT.

I have run mysqltuner.pl several times on the server, and while it is probably still not completely optimized it is pretty good. I have it back on a 1.7GB machine right now, but the problem existed even when I scaled it up to a 17.5 GB machine that was optimized.

As for upgrades, I'll just have to make a note to change it back to VARCHAR before the upgrade and back to INT after.

Brian

Pages: [1]
  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion (Moderator: Donald Lobo) »
  • Slow SQL SELECT on external_identifier

This forum was archived on 2017-11-26.