Removing empty HTML tags from TinyMCE

A little post for those who run into the same problem I had with TinyMCE, where 'blank' instances are not quite blank. I put together a small piece of configuration that removes empty HTML tags.

If you have an editable box with TinyMCE, and that box will appear on the website if it contains content, sometimes you can get a ‘blank’ box. When a user deletes content from the box, they might not delete everything in the box, leaving some blank HTML tags.

For example, if there was a list of links, and you use the mouse to select the list, and press delete, it can leave either <ul></ul>, or <ul><li></li></ul> depending on the browser.

In our CMS, when that went through to the front-end page the HTML content meant the box was displayed, but without visible content.
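
To make the problem concrete, here is a minimal sketch of the kind of check that gets fooled. The shouldShowBox function is a hypothetical stand-in, not our CMS's actual code; I'm assuming the front end does something like a trimmed, non-empty string test.

```javascript
// Hypothetical stand-in for a CMS "does this box have content?" check.
// Assumption: the real check is a simple non-empty string test like this one.
function shouldShowBox(html) {
    return String(html || "").replace(/^\s+|\s+$/g, "").length > 0;
}

// The leftovers described above: both are non-empty strings,
// so the box is shown even though nothing visible is in it.
var leftovers = ["<ul></ul>", "<ul><li></li></ul>"];
var results = leftovers.map(shouldShowBox);
console.log(results); // [ true, true ]
```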

I messed around with some of the TinyMCE API events, ruling out onSaveContent and onSubmit.

The key event was onPostProcess, which allows you to run a script on the content before it is submitted.

Combine that with a little script from Stack Overflow that strips HTML from text with JavaScript, and you get:

tinyMCE.init({
    // ... other settings here
    setup : function(ed) {
        // If there is no text content, return nothing.
        //   NB: Image-only content would get swallowed.
        ed.onPostProcess.add(function(ed, o) {
            var text = "";
            // Detached element used only to extract the text from the markup.
            var tmp = document.createElement("DIV");

            tmp.innerHTML = o.content;
            //console.debug("inner html=" + tmp.innerHTML);

            if (tmp.innerHTML) {
                text = tmp.textContent || tmp.innerText || "";
                // \s already matches newlines and tabs, so one pass is enough.
                text = text.replace(/\s/g, "");
                //console.debug("if content, text=" + text);
            } else {
                text = "";
                //console.debug("else no content, and typeof =" + typeof(text));
            }
            if (text === "") {
                // Nothing but empty tags and whitespace: submit an empty string.
                o.content = "";
                //console.debug("content set, possibly, get content = " + o.content);
            }
        });
    } // add comma here if there's another thing in the list
});

The only weakness I know of is that content containing only an image and no text would come back as a blank box. I left the console logging in (commented out) in case you'd like to see what happens through Firebug.
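
One possible way around the image weakness is to look for tags that are visible without any text before deciding the content is empty. This is only a sketch: isEffectivelyEmpty and VISIBLE_TAGS are names I made up, the tag list is my guess at what should count, and it works on the HTML string with regular expressions rather than the DOM, so unusual markup (a ">" inside an attribute value, say) could trip it up.

```javascript
// Sketch only: a string-based check, not the DOM approach used above.
// Assumption: these are the tags that should count as content with no text.
var VISIBLE_TAGS = /<(img|iframe|video|audio|object|embed|hr|input)\b/i;

function isEffectivelyEmpty(html) {
    html = String(html || "");

    // An image (or similar) means the box has something to show.
    if (VISIBLE_TAGS.test(html)) {
        return false;
    }

    var text = html
        .replace(/<[^>]*>/g, "")         // drop the tags themselves
        .replace(/&nbsp;|&#160;/gi, "")  // non-breaking space entities
        .replace(/\s/g, "");             // spaces, tabs, newlines

    return text === "";
}
```

Inside the onPostProcess handler this would be used along the lines of: if isEffectivelyEmpty(o.content) returns true, set o.content to an empty string.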

Feel free to use this, or to suggest any improvements below.