CiviCRM Community Forums (archive)

*

News:

Have a question about CiviCRM?
Get it answered quickly at the new
CiviCRM Stack Exchange Q+A site

This forum was archived on 25 November 2017. Learn more.
How to get involved.
What to do if you think you've found a bug.



  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion (Moderator: Donald Lobo) »
  • Attempt at PostgreSQL made
Pages: [1]

Author Topic: Attempt at PostgreSQL made  (Read 2319 times)

dhruvahuja

  • Guest
Attempt at PostgreSQL made
June 07, 2010, 01:15:01 pm
Since last month, I have spent some time on trying to port CiviCRM to PostgreSQL. The primary reason being MySQL's performance. In this era of cloud computing and slower multi-core processors, rather than 3.0+ GHz multi-processor machines, MySQL fails to perform well. I went through various benchmarks on the Internet before starting with this, and discovered that PostgreSQL's performance was by far better than MySQL's on multi-core or multi-processor systems. MySQL's InnoDB tables, which are the engine of choice by CiviCRM, cannot perform parallel query execution. InnoDB relies on the clock speed of the CPU to deliver fast results; whereas PostgreSQL can can do parallel query execution (GridSQL/PgPool). I mostly take dedicated servers from The Planet. Finding a high clock speed machine with them is becoming difficult each day.

So, here are my efforts until now. I have not been able to make it all work on PostgreSQL because the changes seem endless.

I am linking 3 files here:

structure.sql - this has the columns, primary keys, indexes, unique constraints
data.sql - this is the sample data included with CiviCRM
fk_constraints.sql - this has the ALTER statements to introduce foreign key constraints into the structure, after the data has been loaded

http://staging.techmada.com/civicrm_pgsql/structure.zip
http://staging.techmada.com/civicrm_pgsql/data.zip
http://staging.techmada.com/civicrm_pgsql/fk_constraints.zip

As of CiviCRM's code, changes to get things working seem endless. Here are a few major changes I remember:

1. PostgreSQL likes column names in double quotes when required. Back-ticks are incompatible. Inclusion of hyphens and/or capital letters, or keywords as column names mandate the use of double quotes.
2. Dates and Times in queries must be enclosed in single quotes, and if mentioned together, must be separated by a space. The date format (same) is specified at several places in the code. Each occurence requires manual repair.
3. Automatic Type-Casting is not available in the latest versions of PostgreSQL. I have included some functions to automatically typecast 1 and 0 as True and False respectively. And Integers as Characters. Directly using TRUE, FALSE instead of 1, 0 is compatible with MySQL. But these clauses occur at thousands of places in the code. A global find and replace did not work because of changing syntax, usage and formatting.
4. LIMIT's syntax needed to be changed to include the OFFSET keyword. MySQL supports this variation.
5. Several functions like IF, LEFT, TIMESTAMPDIFF do not exist or work the same way in PostgreSQL.
6. SHOW COLUMNS had to be replaced with a SELECT query to the information schema. Replacement looks compatible with MySQL though.
7. LIKE is case sensitive in PostgreSQL. ILIKE solves the purpose but is incompatible with MySQL.

Again, these are only some of the things that required change. There are many other types of changes I come across whenever I continue testing CiviCRM on PostgreSQL.

I am writing this post to take community feedback on porting CiviCRM to PostgreSQL, the best way to do it, and to team up with any possible members of this forum willing to contribute to this effort.
« Last Edit: June 28, 2010, 05:21:01 pm by dhruvahuja »

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: Attempt at PostgreSQL made
June 07, 2010, 04:09:14 pm

hey dhruv:

a couple of thoughts and ideas:

1. I've enabled blogging for your account on civicrm.org. You might want to blog on your efforts and see if you can get more support from other folks

2. We'd be happy to make core changes to the SQL generation to facilitate this. A lot of the query code and DAO code is automatically generated and hence changes in a few places can basically fix things across the codebase :)

3. that said, maintaining and supporting civicrm on postgres will need to be a community project. we just dont have the time / energy to do so

lobo
 
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

dhruvahuja

  • Guest
Re: Attempt at PostgreSQL made
June 08, 2010, 12:15:38 am
Hey Lobo

First of all, thank you very much for the blogging account.

About the SQL generation facility - this sounds promising. If you could point me to an overview of how it works, I will see to what extent it helps my purpose. Is it something available in the SVN? I have noticed that some code generation scripts need to be run when making an SVN checkout work.

I understand that your core team is fully occupied with CiviCRM on MySQL. With the help of this forum, and now the blog, I will try my best to create an independent momentum in PostgreSQL's direction.

Finally, you had a two queries at http://civicrm.org/blogs/lobo/thoughts-and-ideas-civicrm-40 . Find my answers next.

1. "plagued with varying coding patterns, styles and arbitrary repetitions of otherwise re-usable code." giving examples of all the places where u see this happening would be great (and we can clean it up also)

The best example is the date formatting string. If you recursively search for the string pattern YmdHis in the code, you will find it at over 80 locations throughout the code. It should rather have been defined as a constant somewhere. The most common use of this pattern is to prepare a timestamp for an SQL query. There could be a globally available function instead which just does that, and has the date pattern defined in itself.

Another example would be the queries. Sometimes they would be enclosed in single quotes, sometimes in double, and sometimes the variables used inside the double quotes would be enclosed within braces. There is a lack of uniformity in how queries would be formed. There is also some code where variables in queries have been passed as parameters to the executeQuery function for their preparation.

Also, randomly in the code, column and table names have been enclosed in backticks. While I believe that's a good idea for MySQL, it makes migration to another database extremely difficult. Also, I understand that enclosing is sometimes necessary, and there are places in the code where the backticks have been used very intuitively, but this could be accomplished using a central query rewrite function.

The code should also have a strict formatting and indentation standards, like Drupal. I personally use php_beautifier from PEAR. It is integrated into Eclipse's external tools and Subversion's pre-commit hook to keep the code under check. I think Drupal now has scripts offering the same.

The rest of the observations in my journey so far into CiviCRM's code are of lesser importance and effect, but on the same lines as the above.

2. a bit more on your thinking behind: "a framework with a lot of abstraction". if you can make that a bit more concrete will help us rationalize what you mean :)

Let me elaborate on this with examples of the same issues as above, and CakePHP.

In CakePHP, I would create a 'Behavior' and attach it all the Models (database tables.) This Behavior would always and automatically format the date between the frontend format and the format expected by the database, in a bi-directional manner, whenever data would be sent to or queried from its attached tables. The attachment of this behavior to the models can also be global or on per model basis. This will save the hassle of converting to the right format before every query in the code and allow the end programmer to focus more on the logic. Newer team members would also be less likely to make mistakes and/or forget things.

CakePHP has libraries for several databases. It also caches the database design and table structure and writes the queries itself. The programmers are expected to deliver the query requirements to CakePHP in an array format defined in CakePHP's manual. It automatically creates a query from the parameters in the array compatible with the backend DBMS, adds backticks or double quotes as compatible with the DBMS, applies user defined global parameters to all those parameters of the query which were not explicitly defined in the array (like ORDER BY), passes it through 'before' and 'after' functions (sort of hooks) and passes it through classes like Behaviors. This is the kind of abstraction I am talking about.

This format of queries is where well formatted code and array structures become really useful - while reviewing code and looking at diffs. I use php_beautifier's ArrayNested Filter to accomplish this.

Finally, contrary to the belief of some, CakePHP has an amazing performance as well. The trick is to use the recursion level in a relational database carefully and use the Containable Behavior for complex queries. Also, CakePHP does allow totally custom queries as well.

-- Dhruv

xcf33

  • I post frequently
  • ***
  • Posts: 181
  • Karma: 7
  • CiviCRM version: 3.3.2
  • CMS version: Drupal 6.19/6.20
  • MySQL version: 5.x
  • PHP version: 5.2.6
Re: Attempt at PostgreSQL made
June 08, 2010, 09:04:17 pm
Hi

just to chime in a little bit:

dhruvahuja: I have worked for an open source ERP company that swears by Postgres database for its performance and I cannot argue with them. It has close resemblance of Oracle in my opinion (Domains, functions, triggers, etc). I found that it has a more standardized SQL syntax than MySQL and proper storage objects for things like a serial id, etc.

With that being said, I do see advantages of MySQL with CiviCRM at this moment.

1. MySQL has a much larger user base, but more Importantly most hosting company offers LAMP environment, not LAPP environment.
2. My past experience with Drupal on Postgres has been that because many modules were written with MySQL specific syntax (LIMIT clause, not having group by, etc) I often had to patch modules and standardize SQL.


The future (CiviCRM 4.0) may very well include a database abstraction layer (PDO, etc, whatever). New issues would surely arise from it as well. I have some limited experience with CakePHP and I like it very well to but in general any MVC framework can be the right one for a PHP application in my opinion.

Then again, this could be driven by scalability and performance issues of MySQL. The only problem is I don't think CiviCRM has hit the ceiling with MySQL yet.



Cheers!

dhruvahuja

  • Guest
Re: Attempt at PostgreSQL made
June 28, 2010, 05:22:51 pm
A suited Amazon Machine Image can be found at http://aws.amazon.com/solutions/solution-providers/EnterpriseDB/ (III. Postgres Plus Advanced Server v8.3R2 on CentOS v5.3 (64 bit)) .

Pages: [1]
  • CiviCRM Community Forums (archive) »
  • Old sections (read-only, deprecated) »
  • Developer Discussion (Moderator: Donald Lobo) »
  • Attempt at PostgreSQL made

This forum was archived on 2017-11-26.