CiviCRM Community Forums (archive)

This forum was archived on 25 November 2017. Learn more.

Author Topic: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML  (Read 3727 times)

obiuquido144

  • I post occasionally
  • **
  • Posts: 34
  • Karma: 4
2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
October 27, 2009, 04:36:42 pm
Hi, I just went ahead and checked this bug against 3.0.1

If you create an HTML message in CiviMail that contains special characters, e.g. "ěščřžýáíé", some of the letters are converted to entities ("ě&scaron;čřž&yacute;&aacute;&iacute;&eacute;").

Unfortunately, the "entitized" letters are then lost on the automatic generation of a plain-text version of the message (if no plain-text version is directly added in the wizard).
Moreover, if you do this in "simple mail", the plain-text body is not generated at all.

Maybe this is an easy fix that could make it into the next release. I am quite surprised Piotr hasn't encountered this issue before, as we both use these weird-accented languages :)

code from CiviMail:
...
Content-Type: multipart/alternative;
boundary="=_aea2b3b4038814efa74b3546d37fb406"
Date: Tue, 27 Oct 2009 23:16:04 +0000

--=_aea2b3b4038814efa74b3546d37fb406
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Sample Header for HTML formatted content.

ěčřž

ĚČŘ

<div class="location vcard"><span class="adr"><span class="street-address">S 15S El Camino Way E</span>
<span class="locality">Collinsville</span>, <span class="region">CT</span> <span class="postal-code">6022</span>
<span class="country-name">United States</span></span></div>http://localhost/drupal/index.php?q=civicrm/mailing/optout&reset=1&jid=2&qid=2&h=ec3ec6d908adc1a2 Sample Footer for HTML formatted
content.
--=_aea2b3b4038814efa74b3546d37fb406
Content-Transfer-Encoding: 8bit
Content-Type: text/html; charset="utf-8"

Sample Header for HTML formatted content.
<p>ě&scaron;čřž&yacute;&aacute;&iacute;&eacute;</p>
<p>Ě&Scaron;ČŘ&Yacute;&Aacute;&Eacute;</p>
<p>&nbsp;</p>
<p><div class="location vcard"><span class="adr"><span class="street-address">S 15S El Camino Way E</span><br /><span class="locality">Collinsville</span>, <span class="region">CT</span> <span class="postal-code">6022</span><br /><span class="country-name">United States</span></span></div>http://localhost/drupal/index.php?q=civicrm/mailing/optout&amp;reset=1&amp;jid=2&amp;qid=2&amp;h=ec3ec6d908adc1a2</p>
Sample Footer for HTML formatted content.
--=_aea2b3b4038814efa74b3546d37fb406--

and from "simple mail"
Date: Tue, 27 Oct 2009 23:09:54 +0000

--=_1ef0b528da8b4bbd36aa84a27a601a39
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"


--=_1ef0b528da8b4bbd36aa84a27a601a39
Content-Transfer-Encoding: 8bit
Content-Type: text/html; charset="utf-8"

<p>ě&scaron;čřž&yacute;&aacute;&iacute;&eacute;</p>
--=_1ef0b528da8b4bbd36aa84a27a601a39--


And I just noticed that the default {domain.address} token is not 'plaintextized' at all. The issues above seem to be more serious to me, tho.
Can someone please file a bug report, Piotr?

Thank you, Boris
« Last Edit: October 27, 2009, 04:40:26 pm by obiuquido144 »

Donald Lobo

  • Administrator
  • I’m (like) Lobo ;)
  • *****
  • Posts: 15963
  • Karma: 470
    • CiviCRM site
  • CiviCRM version: 4.2+
  • CMS version: Drupal 7, Joomla 2.5+
  • MySQL version: 5.5.x
  • PHP version: 5.4.x
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
October 27, 2009, 06:20:33 pm
Quote from: obiuquido144 on October 27, 2009, 04:36:42 pm
And I just noticed that the default {domain.address} token is not 'plaintextized' at all. The issues above seem to be more serious to me, tho.

this has been fixed in 3.0.2

For the html2text issue, we use a 3rd party open source library in: packages/html2text/class.html2text.inc

and use the function htmlToText($html) in CRM/Utils/String.php

you might want to spend some time figuring out how to fix it and/or file an issue with those package folks. I suspect it's a bit beyond our scope


lobo
A new CiviCRM Q&A resource needs YOUR help to get started. Visit our StackExchange proposed site, sign up and vote on 5 questions

obiuquido144

  • I post occasionally
  • **
  • Posts: 34
  • Karma: 4
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
October 29, 2009, 01:28:34 am
The problem is that the entities are created by the FCKeditor by default and htmlToText then discards entities.
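
To make the failure concrete, here is a minimal JavaScript sketch of a converter that strips tags and then throws away every entity it sees. It is not the code in class.html2text.inc; the function name and both regular expressions are assumptions made up for illustration. It reproduces the same loss seen in the plain-text part of the CiviMail sample.

```javascript
// Hypothetical stand-in for an HTML-to-text step that discards entities.
// Not the real class.html2text.inc logic, only an illustration of the symptom.
function naiveHtmlToText(html) {
  return html
    .replace(/<[^>]+>/g, "")      // drop all tags
    .replace(/&[^&;\s]+;/g, "")   // drop every entity, so the letter is lost
    .trim();
}

const editorOutput = "<p>ě&scaron;čřž&yacute;&aacute;&iacute;&eacute;</p>";
console.log(naiveHtmlToText(editorOutput)); // prints "ěčřž"
```

Only the letters the editor left as raw UTF-8 survive, which matches the "ěčřž" in the generated text part.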

The solution is to change the default FCKeditor behavior (which as I found out on the net has been quite a frustration for many users for some time).
It's in fckconfig.js on line 65 - should be FCKConfig.ProcessHTMLEntities = false;
The two lines below should also be set to false (Latin and Greek entities).
I suggest this customization becomes the default setting for FCKeditor in CiviCRM.
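
As a fragment, the change would look roughly like this in fckconfig.js. The first line is the one named above; the other two option names are how I recall the Latin and Greek settings in FCKeditor 2.x, so treat them as an assumption and check them (and the line numbers) against your own copy of the file.

```javascript
// fckconfig.js fragment (FCKConfig is provided by the editor itself).
FCKConfig.ProcessHTMLEntities  = false; // stop turning accented letters into entities
FCKConfig.IncludeLatinEntities = false; // assumed name of the "Latin entities" switch
FCKConfig.IncludeGreekEntities = false; // assumed name of the "Greek entities" switch
```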

Regarding "simple mail" not having any content at all in the plaintext section, I'll investigate that later. Most probably it's a different issue.

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
October 30, 2009, 02:19:57 am
Quote from: obiuquido144 on October 29, 2009, 01:28:34 am
The problem is that the entities are created by the FCKeditor by default and htmlToText then discards entities.

The solution is to change the default FCKeditor behavior (which as I found out on the net has been quite a frustration for many users for some time).
It's in fckconfig.js on line 65 - should be FCKConfig.ProcessHTMLEntities = false;
The two lines below should also be set to false (Latin and Greek entities).
I suggest this customization becomes the default setting for FCKeditor in CiviCRM.

I definitely agree; filed CRM-5321 to track this (I need to check whether this is also used for escaping the HTML-meaningful <, >, " and & chars, and whether we want them escaped or not).

Alternatively, we can try teaching html2text() to handle escaped entities (and unescape them back); maybe using PHP’s html_entity_decode() would take care of this…
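
A rough JavaScript sketch of that alternative: decode the entities instead of dropping them. PHP's html_entity_decode() knows the full HTML entity table; the small NAMED_ENTITIES map, decodeEntities() and htmlToTextDecoding() below are hypothetical stand-ins that cover only the characters from this thread, and they are not the eventual patch.

```javascript
// Tiny, deliberately incomplete entity table (illustration only).
const NAMED_ENTITIES = {
  scaron: "š", Scaron: "Š", yacute: "ý", Yacute: "Ý",
  aacute: "á", Aacute: "Á", iacute: "í", eacute: "é", Eacute: "É",
  nbsp: "\u00a0", amp: "&", lt: "<", gt: ">", quot: '"',
};

// Replace named and numeric entities with the characters they stand for.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|\w+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range numeric references untouched.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as-is rather than silently dropped.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Strip tags first, then decode, so a decoded "&lt;" is never mistaken for a tag.
function htmlToTextDecoding(html) {
  return decodeEntities(html.replace(/<[^>]+>/g, "")).trim();
}

console.log(htmlToTextDecoding("<p>ě&scaron;čřž&yacute;&aacute;&iacute;&eacute;</p>"));
```

Decoding on the converter's side has the advantage that it does not depend on how any particular editor is configured.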

Quote
Regarding "simple mail" not having any content at all in the plaintext section, I'll investigate that later. Most probably it's a different issue.

I just fixed this on the v3.1.email branch, which will get merged to trunk later today. I’m not sure whether we should backport this to v3.0 – I’m for it, as currently we’re sending an empty text part if the text part is missing, which (a) brings confusion to those of us who display the text part by default and (b) can potentially confuse email archive viewers, which also may prefer the text part if it’s available in the email.
If you found the above helpful, please consider helping us in return – you can even steer CiviCRM’s future and help us extend CiviCRM in ways useful to you.

obiuquido144

  • I post occasionally
  • **
  • Posts: 34
  • Karma: 4
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
October 30, 2009, 05:06:29 am
Hi Piotr, not sure if I understood your note correctly, but shouldn't it be the other way around? If there's no plain text, fetch the HTML section, convert it into plain text and use that (that's what CiviMail does and "simple mail" currently does not), instead of discarding the whole plain-text section (which is what I think you're saying you just implemented)?
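
The fallback proposed here can be sketched in a few lines of JavaScript. Everything in it is hypothetical: textPartWithFallback() is not a CiviCRM function, and the converter is passed in as a parameter so the sketch makes no claim about how the real htmlToText() behaves.

```javascript
// Hypothetical helper: prefer the supplied text, otherwise derive it from the HTML.
function textPartWithFallback(text, html, htmlToText) {
  const hasText = typeof text === "string" && text.trim() !== "";
  if (hasText) return text;                 // author supplied a text version
  const hasHtml = typeof html === "string" && html.trim() !== "";
  return hasHtml ? htmlToText(html) : "";   // convert only when there is HTML
}

// Usage with a trivial tag stripper standing in for the real converter.
const stripTags = (markup) => markup.replace(/<[^>]+>/g, "").trim();
console.log(textPartWithFallback("", "<p>Hello</p>", stripTags));
```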

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 02, 2009, 12:31:06 am
I’m for an automated HTML → text conversion when the text part is not provided, but we’ll have to discuss that before we’re sure it handles most use cases without problems. I think our main uncertainty is this: if we can’t be sure the HTML → text conversion is 100% accurate, we really shouldn’t use it for stuff like financial data (and most non-CiviMail CiviCRM emails are at least partially related to financial data).

What I implemented was a small fix that at least stops CiviCRM from sending a blank text version when the text part passed to CRM_Utils_Mail::send() is empty-ish (CiviCRM 3.0 and earlier send an empty text part, which makes clients that prefer the text part over the HTML part display the email as blank rather than showing that the contents are in the HTML part).
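
In outline, the behaviour described above amounts to leaving the text/plain part out of the message when it would be blank. The JavaScript below is only a sketch of that idea; buildAlternativeParts() and the part objects are made up for illustration and do not mirror the signature of CRM_Utils_Mail::send().

```javascript
// Hypothetical helper: collect the multipart/alternative parts,
// skipping any part whose body is missing or whitespace-only.
function buildAlternativeParts(text, html) {
  const isFilled = (value) => typeof value === "string" && value.trim() !== "";
  const parts = [];
  if (isFilled(text)) parts.push({ type: "text/plain", body: text });
  if (isFilled(html)) parts.push({ type: "text/html", body: html });
  return parts;
}

// A blank text part is omitted, so clients fall through to the HTML part.
console.log(buildAlternativeParts("  \n", "<p>Receipt</p>"));
```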

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 02, 2009, 12:33:57 am
I created CRM-5334 to track this.

obiuquido144

  • I post occasionally
  • **
  • Posts: 34
  • Karma: 4
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 02, 2009, 02:33:52 pm
Thanks. Now I understand your decision. The current html->text conversion in the CiviMail module creates a new row for every column in a table, and inserts a blank row before moving to another table row.

But it's super easy to make it tabular - in class.html2text.inc, just delete the last "\n" on line 223 and tables are converted nicely. If you'd like me to send you an example email, let me know.
With this update to the html2text class I think the html->text autoconversion can be safely used for "simple mail" as well, and we actually wouldn't need to disable anything :) And the functionality would be consistent between "simple mail" vs CiviMail.

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 09, 2009, 09:39:49 am
I fixed the entities issue in r24966 – if you want to, you can apply the patch or overwrite your packages/html2text/class.html2text.inc file with the one from http://svn.civicrm.org/civicrm/branches/v3.0/packages/html2text/ and verify whether this works for you.

I’ll tackle the HTML → text conversion when the text part is missing sometime later this week.

obiuquido144

  • I post occasionally
  • **
  • Posts: 34
  • Karma: 4
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 09, 2009, 01:33:11 pm
Thanks, works great. With the html_entity_decode() call this workaround doesn't even require changing the default setting for entities in FCKeditor. (Good for the future - no need to carry over customizations for two packages but just one).
Please also consider removing the ending line break ("\n") on the <td>...</td> conversion line:
        "\t\t\\1\n",                            // <td> and </td>
That would make tables look great even in plaintext.
Thank you again for your engagement in this issue. Boris
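
The effect of that trailing "\n" can be shown with a small JavaScript sketch. Only the "\t\t" prefix and the optional newline come from the replacement string quoted above; the <td> pattern, the rule that turns </tr> into a line break, and the tableToText() name are assumptions for illustration, not the contents of class.html2text.inc.

```javascript
// Hypothetical table rules: each cell becomes "\t\t" + content + cellSuffix,
// each closing </tr> becomes a line break, remaining table tags are dropped.
function tableToText(html, cellSuffix) {
  return html
    .replace(/<td\b[^>]*>(.*?)<\/td>/gis, (match, cell) => "\t\t" + cell + cellSuffix)
    .replace(/<\/tr>/gi, "\n")
    .replace(/<\/?(table|tr)\b[^>]*>/gi, "");
}

const sampleTable =
  "<table><tr><td>Name</td><td>Amount</td></tr>" +
  "<tr><td>Ann</td><td>10</td></tr></table>";

// With the newline: one cell per line and a blank line between rows.
console.log(JSON.stringify(tableToText(sampleTable, "\n")));
// Without it: one table row per line, cells separated by tabs.
console.log(JSON.stringify(tableToText(sampleTable, "")));
```

The second form is only tabular when each row ends in its own line break, which is the case the sketch assumes.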

Piotr Szotkowski

  • Moderator
  • I live on this forum
  • *****
  • Posts: 1497
  • Karma: 57
Re: 2.2.2 and 3.0.1 thrash accented letters when generating plain-text from HTML
November 12, 2009, 04:57:33 am
Quote from: obiuquido144 on November 09, 2009, 01:33:11 pm
Thanks, works great. With the html_entity_decode() call this workaround doesn't even require changing the default setting for entities in FCKeditor. (Good for the future - no need to carry over customizations for two packages but just one).

Right, that was the reason to fix it on the html2text’s side. Also, if any other WYSIWYG editor decides to escape non-US-ASCII characters we’ll handle that automatically.

Quote
Please also consider removing the ending line break ("\n") on the <td>...</td> conversion line:
        "\t\t\\1\n",                            // <td> and </td>
That would make tables look great even in plaintext.

Ok, after some consideration (to the tune of ‘but what if the <td>…</td>s all are strung together on one line?’) I did that in r25064, but as it’s not very urgent, I’m a bit reluctant to make the fix on the 3.0 branch (don’t want to change people’s emails when they upgrade from 3.0.2 to 3.0.3).

You can test this by getting class.html2text.inc from http://svn.civicrm.org/civicrm/trunk/packages/html2text/


This forum was archived on 2017-11-26.