By Abhijit Ghatnekar
Most websites today provide text areas in the form of editors, and people copy and paste text into them from all sorts of text editors. Editors like MS Word introduce unwanted special characters, such as curly quotes and long dashes, which end up in the back end as well. The result can be puzzling: the text looks valid, yet it fails the validation mechanisms of the server-side web application.
To strip these off, use the following regex:
$output = preg_replace('/[^\x20-\x7F]+/', '', $output);
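This strips all non-ASCII characters. Note that it also removes the ASCII control characters below \x20, which includes tabs and line breaks.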
This is PHP, but it can be translated into any equivalent web scripting language.
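If losing line breaks or quotation marks from a pasted paragraph is a problem, a slightly gentler variant might look like the sketch below. It is only an illustration, not part of the original tip: the function name clean_pasted_text is made up, the input is assumed to be UTF-8, the list of Word characters is just a small sample, and the "\u{...}" escapes need PHP 7 or later.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original tip).
// Assumes the incoming text is UTF-8 encoded.
function clean_pasted_text(string $text): string
{
    // Swap a few common word-processor characters for ASCII equivalents
    // so they are converted rather than silently dropped.
    $replacements = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    $text = strtr($text, $replacements);

    // Keep tab, line feed, carriage return and printable ASCII (\x20-\x7E);
    // remove every other byte, including all bytes of multi-byte characters.
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]+/', '', $text) ?? '';
}
```

For example, clean_pasted_text("\u{201C}Caf\u{00E9}\u{201D}") returns "Caf" wrapped in straight double quotes: the curly quotes are converted, and the accented letter, which has no mapping in the list, is removed.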