Generating HTML Character Entities



HTML "character entities" (names for special symbols) begin with ampersands, e.g. é is "&eacute;". Unfortunately, the ampersand is a special character in the replacement string of awk's regular expression substitution functions (sub and gsub). It means "insert a copy of the text that was matched". For example:

	gsub(/foo/,"<b>&</b>")

will replace "foo" with "<b>foo</b>".

It is therefore necessary to escape ampersands with a backslash to get awk to treat them literally. However, backslashes in string constants are also interpreted by awk's lexical analyzer (for example, when it reads something like \n and treats it as a newline character). This processing strips the backslash from the sequence "\&", leaving just a bare ampersand by the time the run-time system executes the regular expression substitution. You therefore need TWO backslashes before the ampersand: the lexical analyzer reduces "\\&" to "\&", and the substitution then treats that as a literal ampersand, e.g.:

	gsub(/e'/,"\\&eacute;")
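To try this from a shell prompt, something like the following sketch should work. The function name eacute is made up for illustration, and the regular expression uses the octal escape \047 in place of the single quote, since a literal single quote cannot appear inside a single-quoted awk program.

```shell
# Hypothetical helper (the name is an assumption, not part of awk):
# rewrite the two-character sequence e' as the HTML entity &eacute;.
eacute() {
    # \047 is the octal escape for a single quote.
    # "\\&" reaches gsub as \& and is inserted as a literal ampersand.
    awk '{ gsub(/e\047/, "\\&eacute;"); print }'
}

# For contrast: an unescaped & inserts a copy of the matched text.
printf '%s\n' "foo" | awk '{ gsub(/foo/, "<b>&</b>"); print }'   # prints <b>foo</b>

printf '%s\n' "cafe'" | eacute                                   # prints caf&eacute;
```

The awk program is kept in single quotes so that the shell passes both backslashes through to awk untouched. Inside double quotes the shell would remove one of them first.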

For details see: http://www.gnu.org/manual/gawk/html_node/Gory-Details.html#Gory%20Details.