{"id":147,"date":"2005-12-08T22:11:45","date_gmt":"2005-12-08T11:11:45","guid":{"rendered":"http:\/\/www.somethinkodd.com\/oddthinking\/?p=147"},"modified":"2005-12-08T22:56:10","modified_gmt":"2005-12-08T11:56:10","slug":"wordpress-and-text-encoding","status":"publish","type":"post","link":"https:\/\/www.somethinkodd.com\/oddthinking\/2005\/12\/08\/wordpress-and-text-encoding\/","title":{"rendered":"WordPress and Text Encoding"},"content":{"rendered":"<p>Dear Mythical WordPress Architect,<\/p>\n<p>I am one of your biggest fans; of all the mythical creatures I believe in, you are my favourite.  However, this is an issue that continues to bother me, and I thought I should bring it to your attention.<\/p>\n<p>I have been looking through the code trying to understand why sometimes tags appear in inappropriate places, and sometimes they are stripped out in inappropriate places.<\/p>\n<p>My conclusion is that there simply isn&#8217;t sufficient modelling of the different versions of my content. <\/p>\n<p>The pipelining model has its advantages in terms of making it easy for many simple plugins. It has some disadvantages too, which I have <a href=\"http:\/\/www.somethinkodd.com\/oddthinking\/2005\/09\/03\/lining-up-the-wordpress-filters\/\">covered before<\/a>.<\/p>\n<p>One of the disadvantages that I am uncovering is that the string that represents the blog content is going through repeated transformations from one encoding to another, but it retains the same type &#8211; indeed the same name &#8211; throughout the transformation.<\/p>\n<p>This makes it hard to see that there isn&#8217;t just one type, but many. 
As a result, it seems to be commonplace for the wrong encoding to be used at the wrong levels.<\/p>\n<p>Let me give a quick example.<\/p>\n<p>My blog content can appear in at least three encodings.<\/p>\n<ol>\n<li>My original content, in a human-writeable markup language.<\/li>\n<li>The same content, converted to standard HTML.<\/li>\n<li>The same content, converted to plain text.<\/li>\n<\/ol>\n<p>In practice, all three of these roles seem to be served by the same <code>the_content()<\/code> function.<\/p>\n<p>This results in my old issue of <a href=\"http:\/\/www.somethinkodd.com\/oddthinking\/2005\/11\/21\/markdown-is-dead-long-live-markdown\/\">MarkDown conflicting with Live Comment Preview<\/a>. MarkDown is writing #1 in the same string that Live Comment Preview is reading as #2.<\/p>\n<p>It also results in email-based subscribers sometimes seeing inappropriate markup. (They should be seeing #3, but are seeing #1 or #2.)<\/p>\n<p>The problem gets worse when we talk about <code>the_excerpt()<\/code> (which is used to summarise posts; sometimes the excerpt is written by the author, or else it is automatically generated).  It should have the same three encodings as above, but WordPress doesn&#8217;t recognise this.<\/p>\n<p>Sometimes, the excerpt is displayed to the user&#8217;s browser with the HTML tags stripped off. Sometimes it isn&#8217;t. (This cost me over an hour this week, as I went off on a tangent trying to find where my markup was being stripped out. I eventually discovered it was only stripped out of automatic excerpts, not manual ones.)<\/p>\n<p>The very same issue applies to article titles. Article titles are often treated as plain text in the code, but they don&#8217;t have to be.<\/p>\n<p>I hope this observation is useful to you as you work on the next major version of WordPress. 
I am afraid I am not coming up with positive suggestions on how to fix this without breaking many plugins, but I hope merely becoming aware of the problem will help guide you toward a solution.<\/p>\n<p>Regards,<\/p>\n<p>Julian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An Open Letter to the WordPress Architect about an issue that I have found.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","footnotes":""},"categories":[29,25,34],"tags":[],"class_list":["post-147","post","type-post","status-publish","format-standard","hentry","category-influencing-others","category-insufficiently-advanced-technology","category-software-development"],"_links":{"self":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/comments?post=147"}],"version-history":[{"count":0,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/147\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/media?parent=147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/categories?post=147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/tags?post=147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}