API call character set encoding errors

Dustin Rambo · March 2016

We have built a tool that lets participants update their page copy from within their own personal page, but we are having issues with em dashes (— or &mdash;).

Our default personal page copy contains several em dashes. We use getPersonalPageInfo() to populate this copy into a jQuery WYSIWYG editor and then use the updatePersonalPageInfo method to push the copy back to the server. The success callback then uses getPersonalPageInfo to pull the updates back down into the page, all via AJAX and the luminateExtend library.

All of this works well until we hit special characters, specifically em dashes. The first getPage request pulls down the proper character set. Inspecting the updatePage request shows that the proper character set is pushed back to the BB servers. Inspecting the second getPage request shows the incorrectly encoded em dash, displaying â€” instead of —. Does anyone have any experience with these APIs and how to ensure the encoding is consistent? It's worth noting that TR pages use ISO-8859-1 encoding but the API method headers state UTF-8, which is preferable. Any insight is greatly appreciated!
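
For anyone landing here with the same symptom: a three-character sequence starting with â€ is what you would expect if the em dash's three UTF-8 bytes (E2 80 94) were each read as a separate Windows-1252 character, the encoding that is commonly substituted for ISO-8859-1. The sketch below reproduces that in plain JavaScript. It only illustrates the mismatch and makes no claim about what the server does internally; `decodeAsWindows1252` and `CP1252_HIGH` are made-up names.

```javascript
// Code points that Windows-1252 assigns to bytes 0x80-0x9F. Every other byte
// maps to the code point with the same number, as in ISO-8859-1.
var CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];

// Hypothetical helper: encode text as UTF-8, then misread each byte as one
// Windows-1252 character.
var decodeAsWindows1252 = function(text) {
  var bytes = new TextEncoder().encode(text); // UTF-8 bytes
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    out += String.fromCharCode(b >= 0x80 && b <= 0x9F ? CP1252_HIGH[b - 0x80] : b);
  }
  return out;
};

var garbled = decodeAsWindows1252('\u2014'); // U+2014 is the em dash
console.log(garbled);        // three characters, beginning with â and €
console.log(garbled.length); // 3
```

Plain ASCII text passes through this unchanged, which would explain why the problem only shows up with characters such as the em dash.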

Noah Cooper · March 2016

Just to confirm, you're URL-encoding the content when you submit it, right? Do you have an example of your JavaScript I could take a look at?

For reference, here's the JavaScript I use to get the content of an editor on another customer's site where we allow inline page editing. I use the jQuery Text Editor plugin. This function does some sanitization: it replaces jqte's HTML elements with the ones the API expects, converts Unicode characters to numeric character references, and strips out any HTML comments (which are likely to be those dumb Microsoft Office mso comment tags).


var getEditorContent = function(editor) {
  var $editor = $(editor);

  // Lowercase any tags that contain uppercase element names.
  var editorContent = $editor.val().replace(/<\/?[A-Z]+.*?>/g, function(m) {
                        return m.toLowerCase();
                      })
                      // Swap presentational elements for the ones the API expects.
                      .replace(/<font>/g, '<span>').replace(/<font /g, '<span ').replace(/<\/font>/g, '</span>')
                      .replace(/<b>/g, '<strong>').replace(/<b /g, '<strong ').replace(/<\/b>/g, '</strong>')
                      .replace(/<i>/g, '<em>').replace(/<i /g, '<em ').replace(/<\/i>/g, '</em>')
                      .replace(/<u>/g, '<span style="text-decoration: underline;">').replace(/<u /g, '<span style="text-decoration: underline;" ').replace(/<\/u>/g, '</span>')
                      // Convert characters from U+00A0 to U+9999, plus ampersands,
                      // to numeric character references such as &#8212;
                      .replace(/[\u00A0-\u9999\&]/gm, function(i) {
                        return '&#' + i.charCodeAt(0) + ';';
                      })
                      // Restore the ampersands that the previous step encoded.
                      .replace(/&#38;/g, '&')
                      // Strip HTML comments.
                      .replace(/<!--[\s\S]*?-->/g, '');

  return editorContent;
};
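
To see what the Unicode step does on its own, here is a minimal sketch that pulls just those two replace calls into a standalone function. The name `toNumericEntities` and the sample string are made up for illustration; the regular expressions are the same ones used in the function above.

```javascript
// Hypothetical helper isolating the numeric-character-reference step.
var toNumericEntities = function(html) {
  return html
    // Encode characters from U+00A0 to U+9999, and ampersands.
    .replace(/[\u00A0-\u9999&]/g, function(ch) {
      return '&#' + ch.charCodeAt(0) + ';';
    })
    // Restore the ampersands, so the net effect on '&' is nothing.
    .replace(/&#38;/g, '&');
};

var sample = 'Walk \u2014 don\u2019t run &amp; rest';
console.log(toNumericEntities(sample));
// Walk &#8212; don&#8217;t run &amp; rest
```

The result contains only ASCII characters for anything inside that range, so it no longer depends on which character set the page or the server assumes. Characters above U+9999 are left untouched by this pattern.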

The function's return value is passed to the API using luminateExtend like so:


'&message_body=' + encodeURIComponent(getEditorContent('#page-content-editor'))
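
As a rough illustration of why the encodeURIComponent() call matters, the sketch below parses a form-encoded string with and without it. `parseBody`, the `exampleMethod` value, and the sample copy are assumptions made for the example, and URLSearchParams stands in for whatever parsing the server really does.

```javascript
// Hypothetical helper: parse a form-encoded body into a plain object.
var parseBody = function(body) {
  return Object.fromEntries(new URLSearchParams(body));
};

// Sample editor output, not taken from the thread.
var copy = 'Walk &#8212; run & rest';

var unencoded = parseBody('method=exampleMethod&message_body=' + copy);
var encoded = parseBody('method=exampleMethod&message_body=' + encodeURIComponent(copy));

console.log(unencoded.message_body); // "Walk " (cut off at the first ampersand)
console.log(encoded.message_body);   // "Walk &#8212; run & rest"

// A raw em dash is percent-encoded as its three UTF-8 bytes.
console.log(encodeURIComponent('\u2014')); // "%E2%80%94"
```

Without the encoding, every ampersand in the content starts a new parameter, and any non-ASCII character is sent in whatever form the browser picks. With it, the whole value travels as ASCII.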

Dustin Rambo · March 2016

Well, now I see my mistake: no, I was not URI-encoding the message. Wrapping the textarea value in encodeURIComponent() was all I needed, thanks so much!

Thanks for sharing the sanitization regex, I'm sure it will be useful. Regex is not something I'm well practiced in.
