Author Topic: How to find and merge "near duplicate" strings (Read 2134 times)

Michael McAndrew · December 01, 2014, 11:44:04 am

erm, well it is on the list...

mathieu · December 01, 2014, 12:19:36 pm

@Coleman: the "master" branch will be sent to Transifex when 4.6 is branched. If we sent strings regularly, we would risk adding/removing too many strings all the time, before review. Changes to 4.5 should be sent when new releases are done, but honestly I have not been doing it very regularly.

"Sentence case" vs "Title Case": having a style/standard would be great, but please avoid massively renaming existing strings (this will really annoy translators).

Michael McAndrew · December 02, 2014, 03:53:12 am

Here is a start at User interface text standards: http://wiki.civicrm.org/confluence/display/CRM/User+interface+text

I also started collecting legacy interface guidelines here: http://wiki.civicrm.org/confluence/display/CRM/Legacy+interface+standard+pages

It is ripped off from https://www.drupal.org/node/604342.

Quote

please avoid massively renaming existing strings (this will really annoy translators).

Well, choosing sentence case over Title Case would do that, right? I do think it is a lot nicer, but that's only me. If there is no way to do it automatically and it is really going to annoy translators, then maybe we should leave things as they are.

If you want to take a quick look over what I have done and add stuff from this forum that you think is useful, that would be cool.

Coleman Watts · December 02, 2014, 01:16:44 pm

Thanks Michael for getting the ball rolling there.

My biggest concern remains that there is nothing to assist developers with this. IDEs do not understand ts() and so cannot do anything helpful like auto-suggesting existing strings as the developer types (man that would be cool!). The best I can think of would be to do some kind of PR-level testing to report on what new strings are being introduced by each commit (and ideally put up a red flag if they are very similar to any existing string). But I'm not really sure how to implement that either...

totten · December 03, 2014, 03:00:26 am

Quote from: Michael McAndrew on December 02, 2014, 03:53:12 am

Here is a start at User interface text standards: http://wiki.civicrm.org/confluence/display/CRM/User+interface+text

Cool. A lot of these are good points. Added some comments.

Quote from: Michael McAndrew on December 02, 2014, 03:53:12 am

Quote
please avoid massively renaming existing strings (this will really annoy translators).

Well, choosing sentence case over Title Case would do that, right? I do think it is a lot nicer, but that's only me. If there is no way to do it automatically and it is really going to annoy translators, then maybe we should leave things as they are.

Agree we shouldn't change things on a whim. But it seems like this would be a common enough problem ... that someone would have written a re-keying mechanism for use when the English-language string changes in trivial ways ...
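To make the re-keying idea concrete, here is a rough sketch, not an existing tool. It assumes translations are available as a plain array keyed by the English source string, and that "trivial" means a difference in letter case or whitespace only. The function names normalizeSource() and rekeyTranslations() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: reduce a source string to a key that ignores
// letter case and runs of whitespace.
function normalizeSource(string $s): string {
  return strtolower(trim(preg_replace('/\s+/', ' ', $s)));
}

// Carry existing translations over to new source strings that differ from
// an old source string only in case or whitespace.
// $old maps old English strings to translations (assumed format);
// $newSources lists the current English strings.
function rekeyTranslations(array $old, array $newSources): array {
  $byNormalized = [];
  foreach ($old as $source => $translation) {
    $byNormalized[normalizeSource($source)] = $translation;
  }
  $result = [];
  foreach ($newSources as $source) {
    if (isset($old[$source])) {
      // Exact match: keep the translation as-is.
      $result[$source] = $old[$source];
    }
    elseif (isset($byNormalized[normalizeSource($source)])) {
      // Trivial change: reuse the translation under the new key.
      $result[$source] = $byNormalized[normalizeSource($source)];
    }
    // Otherwise the string is genuinely new and is left untranslated.
  }
  return $result;
}

// Illustrative data only.
$old = ['Find Contacts' => 'Rechercher des contacts'];
$rekeyed = rekeyTranslations($old, ['Find contacts', 'New string']);
```

A carried-over translation may still need a human look, since capitalisation conventions differ between languages, so a real tool would probably mark such entries for review rather than accept them silently.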

Quote from: Coleman Watts on December 02, 2014, 01:16:44 pm

The best I can think of would be to do some kind of PR-level testing to report on what new strings are being introduced by each commit (and ideally put up a red flag if they are very similar to any existing string). But I'm not really sure how to implement that either...

Trying to break that down into smaller questions...

What's the baseline? -- The tool should report "new" strings, but compared to what? We could compare to "the current official code in the target branch" or "the strings in the last release" or "the strings currently known to Transifex" or "all of the above".
How to report? -- It's straightforward to add a new item to the Jenkins navbar (e.g. like "CiviBuild" in https://test.civicrm.org/job/CiviCRM-Core-Matrix/CIVIVER=4.4,label=test-debian6-1/433/ ) or to mark a build as *failed* (red). It's a bit more involved to display detailed messages in GitHub's UI.
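Whichever baseline is chosen, the comparison step itself could be fairly small. The following is only a sketch under assumptions: both string lists are already extracted into plain arrays, similarity is measured with PHP's built-in similar_text() on lowercased strings, and the 90% threshold is an arbitrary starting point. findNearDuplicates() is a made-up name, not part of any existing tool.

```php
<?php
// Hypothetical helper: for each candidate string that is not already in the
// baseline, list the baseline strings it closely resembles.
// $threshold is an assumed similarity cut-off in percent.
function findNearDuplicates(array $baseline, array $candidates, float $threshold = 90.0): array {
  $known = array_flip($baseline);
  $flagged = [];
  foreach ($candidates as $candidate) {
    if (isset($known[$candidate])) {
      // Exact match: not a new string, nothing to report.
      continue;
    }
    foreach ($baseline as $existing) {
      // similar_text() writes the similarity percentage into $percent.
      similar_text(strtolower($candidate), strtolower($existing), $percent);
      if ($percent >= $threshold) {
        $flagged[$candidate][] = $existing;
      }
    }
  }
  return $flagged;
}

// Illustrative data only.
$baseline = ['Find Contacts', 'Save and New'];
$new = ['Find contacts', 'Export as CSV', 'Save and New'];
$flagged = findNearDuplicates($baseline, $new);
```

Here "Find contacts" is flagged against "Find Contacts", while "Export as CSV" passes as genuinely new and "Save and New" is skipped as an exact match. This compares every new string against every baseline string, so with a large baseline it would probably need a cheaper first pass (for example grouping by normalized form) before the pairwise check.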

Regarding PHPUnit -- suppose we had a rule like "strings must not include markup." A simple unit test might look like this (pseudocode):

Code: [Select]

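class StringConformanceTest extends \PHPUnit\Framework\TestCase {
  function testNoMarkup() {
    // bin/extract-strings is a placeholder for a script that prints one string per line.
    system("bin/extract-strings > /tmp/all-strings.txt");
    foreach (file("/tmp/all-strings.txt", FILE_IGNORE_NEW_LINES) as $string) {
      // Heuristic: strip_tags() leaves a markup-free string unchanged.
      $this->assertSame($string, strip_tags($string), "Markup found in: $string");
    }
  }
}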

If one wanted to run the same test but focus only on changed files, then maybe pseudocode like:

Code: [Select]

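$ git diff $targetBranch..HEAD > /tmp/changes.diff
$ env DIFF=/tmp/changes.diff ./scripts/phpunit StringConformanceTest

class StringConformanceTest extends \PHPUnit\Framework\TestCase {
  function testNoMarkup() {
    // If DIFF is set, pass the diff file to the (placeholder) extractor so only
    // changed code is scanned; otherwise scan everything.
    $diffFile = getenv('DIFF');
    $args = $diffFile ? ' ' . escapeshellarg($diffFile) : '';
    system("bin/extract-strings$args > /tmp/all-strings.txt");
    foreach (file("/tmp/all-strings.txt", FILE_IGNORE_NEW_LINES) as $string) {
      // Heuristic: strip_tags() leaves a markup-free string unchanged.
      $this->assertSame($string, strip_tags($string), "Markup found in: $string");
    }
  }
}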

joanne · December 03, 2014, 02:42:18 pm

I realise the discussion has moved on from this, but I am sure I have met situations where having Title Case instead of Sentence case offered more flexibility in terms of word replacement.

CiviCRM Community Forums (archive)
