Author Topic: Test data (and changes in v4.6) (Read 571 times)

totten · February 04, 2015, 02:02:18 am

One of the most important (but most annoying) parts of writing unit tests is managing the test data. This is particularly true in Civi -- much of Civi's behavior is driven by configuration options (in the DB), and (more generally) data is central to the "CRM" genre. There's a good, general problem-statement about managing test data in the PHPUnit/DBUnit manual -- see the first page of https://phpunit.de/manual/current/en/database.html . Take-away: it's hard!

Civi test code includes a few different patterns for managing test data. Each one comes with some trade-offs. In v4.6, riffing on the work+ideas from dawnthorne and systopia, I added another pattern for cleanup. This seems like a decent moment to recap cleanup in Civi's test suite.

(Aside: I use the term "leakage" in a few places to refer to the phenomenon wherein tests do not cleanup properly. Leakage is often silent/asymptomatic at first... but turns malignant as the test suite grows. Sometimes leakage becomes a conflict where test A and B succeed individually but not en masse. Sometimes leakage becomes a hidden dependency where test A and B succeed en masse but fail individually.)

1. quickCleanup and SQL TRUNCATE

The most common pattern has been to call quickCleanup('table_1', 'table_2',...) inside the setUp() or tearDown() function -- which basically issues a SQL TRUNCATE across the listed tables.

Code: [Select]

<?php
  function testFoo() {
    $cid = createContact();
    $aid = createActivity(...$cid...);
  }
  function tearDown() {
    ...
    $this->quickCleanup(array('civicrm_activity', 'civicrm_contact'));
    ...
  }

Pros:
+ It's the most widely used pattern in our codebase.
+ TRUNCATE resets the table ID#s, so basically every test-case starts counting IDs at the same point (1, 2, 3...)
Cons:
- Need to identify and write down the name of every table touched by a test case. If you miss one, then you get leakage.
- The idiom is applied a bit inconsistently -- sometimes in setUp(), sometimes in tearDown(), sometimes in both. (Doing it in both adds superfluous overhead in the average case but provides stronger consistency guarantees in some edge cases.)
- When tests generate custom-data tables, they're on their own.

2. Manual ID tracking and SQL DELETE

This pattern has been around for a while but not used as much.

Code: [Select]

<?php
protected $cid, $aid;
function testFoo() {
  $this->cid = createContact();
  $this->aid = createActivity(...$this->cid...);
}
function tearDown() {
  ...
  if ($this->aid) deleteActivity($this->aid);
  if ($this->cid) deleteContact($this->cid);
  ...
}

Pros:
+ The technique is not limited to core SQL tables -- it can be adapted to any kind of resource (files, settings, custom-data tables, etc).
Cons:
- Need to log and then delete every test record (often one-by-one). If you miss one, then you get leakage.
- It's hardly even an idiom... just a general idea. There's not much convention around it.
- The tearDown() code should be written defensively (i.e. don't assume that $this->cid and $this->aid are set at the end of every test-case.)
Mixed:
= The delete logic might be SQL ("DELETE FROM ... WHERE...") or it can be an API or BAO function (which tends to boil-down to SQL DELETE but may have some value-add).

3. useTransaction() and SQL ROLLBACK

This pattern is new v4.6. If you flag a test class with useTransaction(TRUE), then it will automatically issue SQL BEGIN (before the test) and SQL ROLLBACK (after the test).

Code: [Select]

<?php
function setUp() {
  ...
  $this->useTransaction(TRUE);
  ...
}
function testFoo() {
  $cid = createContact();
  $aid = createActivity(...$cid...);
}

Pros:
+ One simple statement (which doesn't need to be adapted). The test doesn't need to reference particular SQL schema.
+ ROLLBACK is faster than TRUNCATE or DELETE (e.g. https://gist.github.com/totten/895f44035401eb3e1930 ; in my trial run with a vanilla HDD-backed MySQL server, it seemed to be 2x-20x faster)
Cons:
- If any part of the test involves a schema change or a TRUNCATE, then it will autocommit the transaction -- and produce leakage. This is mostly a problem for tests that involve custom-data or the ACL cache.
- PHPUnit (err, DBUnit) and Civi (err, DB_DataObject) use separate DB APIs, separate DB connections, and therefore separate transactions. Generally, PHPUnit_Extensions_Database_* is not compatible with the transactions in DB_DataObject.
- Doesn't work with WebTests (Aside: https://github.com/Dawnthorn/mysql-concentrator/ provides a cool solution... but it doesn't work with MAMP

).

Note: If you find yourself adding a test to a class which declares useTransaction(), and if you need custom data, then I'd suggest putting the test in a separate class.

Coleman Watts · February 04, 2015, 07:23:52 am

As someone who has stuck a few toes in the waters of unit tests (running the tests, writing a few, fixing some broken ones) I want to understand the premise here, which is that tests ought to clean up after themselves. Is this premise generally accepted throughout the unit testing world, or is it something we've started doing to accommodate brittleness within our own tests?

I'm imagining a simple scenario where test A creates a contact (with id 1 of course) and then checks to make sure it exists. The test passes but it doesn't bother to clean up and delete the contact, and then we move on to test B, which also creates a contact (this one gets id 2) and then test B fails because the contact did not have id 1. In my mind, this is a bad test that makes too many assumptions about the initial state of the database.

totten · February 04, 2015, 06:08:48 pm

In reverse order...

Quote from: Coleman Watts on February 04, 2015, 07:23:52 am

I'm imagining a simple scenario where test A creates a contact (with id 1 of course) and then checks to make sure it exists. The test passes but it doesn't bother to clean up and delete the contact, and then we move on to test B, which also creates a contact (this one gets id 2) and then test B fails because the contact did not have id 1. In my mind, this is a bad test that makes too many assumptions about the initial state of the database.

In this example, both test A and test B have problems, although they're slightly different. One would want to fix both problems:

Test B's problem is that it checks for a specific ID. If you run test B by itself, then it can pass. If you run test A+B in a suite, then it will fail. This will be true even if test A runs SQL DELETE for cid 1 (because MySQL does not reuse ID's by default). Test B should track $id instead of hard coding cid 1.
Test A's problem is that it doesn't cleanup, so it can interfere with subsequent test queries. For example, if subsequent test C creates a batch of 6 contacts and then runs a search, it might expect to get 3 results -- but in fact it gets 4 results (when run in suite with test A). Test C isn't doing anything wrong, but it's still broken.

Quote from: Coleman Watts on February 04, 2015, 07:23:52 am

As someone who has stuck a few toes in the waters of unit tests (running the tests, writing a few, fixing some broken ones) I want to understand the premise here, which is that tests ought to clean up after themselves. Is this premise generally accepted throughout the unit testing world, or is it something we've started doing to accommodate brittleness within our own tests?

I believe it's accepted that one must cleanup; it's the raison d'etre of tearDown(). However, there is room for subjectivity/disagreement/negotiation:

The cleanup requirements depend on the application's design, and some designs require more cleanup than others. If an application's design relies heavily on globals/singletons/statics or external I/O (MySQL or filesystem), then it will require more complex/heavy-handed cleanup, and many folks will frown upon that. However, the problem is with the application's design -- complex cleanup is just the symptom. (By contrast, many tutorials focus on simple designs where cleanup is basically free.)
There are many ways to perform a cleanup (with trade-offs in correctness, simplicity, performance, etc.). Two people might agree that cleanup is necessary but disagree on how to achieve it. (Witness: Three different patterns above! And there are more!)
Cleanup is the flipside of defining a baseline; cleaning up == restoring a baseline. But the baseline is negotiable. The baseline could be an empty DB; it could be a single, standard DB dump; or there could be different baselines for different contexts. (For example, CiviUnitTestCase and CiviSeleniumTestCase establish different baselines.)

Coleman Watts · February 04, 2015, 06:16:24 pm

Makes sense. I'll throw in another approach to baseline which is that we could decide the baseline was "any valid database" in which case the problem you described isn't the fault of test A (for not deleting the contact) but the fault of test C (for not running the search first and taking an initial count of contacts prior to the test). Not sure which approach is ultimately better but I'll accept that that's not our status quo.

CiviCRM Community Forums (archive)

News:

Author Topic: Test data (and changes in v4.6) (Read 571 times)

totten

Test data (and changes in v4.6)

Coleman Watts

Re: Test data (and changes in v4.6)

totten

Re: Test data (and changes in v4.6)

Coleman Watts

Re: Test data (and changes in v4.6)