Generating HTML Character Entities

HTML "character entities" (names for special symbols) begin with ampersands, e.g. é is "&eacute;". Unfortunately, the ampersand is a special character in the replacement string of a regular expression substitution in awk: it means to insert a copy of the text that was matched. For example, a call such as:

    sub(/foo/, "<b>&</b>")

will replace "foo" with "<b>foo</b>".
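
To see this behaviour from the command line, here is a minimal sketch. The function name bold_foo and the sample input are made up for illustration; only sub() and the "&" convention come from awk itself.

```shell
# bold_foo is a hypothetical helper name used only for this illustration.
# The unescaped "&" in the replacement string stands for whatever /foo/ matched.
bold_foo() {
    awk '{ sub(/foo/, "<b>&</b>"); print }'
}

printf '%s\n' 'a foo here' | bold_foo
```

This should print "a <b>foo</b> here": the ampersand has been replaced by the matched text rather than appearing literally.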

It is therefore necessary to quote ampersands with a backslash to get awk to treat them literally. However, backslashes are also interpreted by awk's lexical analyzer (for example, when it reads \n and treats it as a linefeed character). This processing strips the backslash from the sequence "\&", leaving just a bare ampersand by the time the run-time system executes the regular expression substitution. You therefore need TWO backslashes before the ampersand, e.g. in a call such as:

    gsub(/é/, "\\&eacute;")

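As a slightly larger sketch of the double-backslash form, the following escapes the three characters that most often need entities in HTML text. The function name html_escape and the sample input are assumptions made for illustration, not part of the original note.

```shell
# html_escape is a hypothetical helper name used only for this illustration.
# Each "\\&" reaches gsub() as "\&", which gsub() treats as a literal ampersand.
html_escape() {
    awk '{
        gsub(/&/, "\\&amp;")   # escape existing ampersands first
        gsub(/</, "\\&lt;")    # then the angle brackets
        gsub(/>/, "\\&gt;")
        print
    }'
}

printf '%s\n' 'if a < b && b > c' | html_escape
```

This should print "if a &lt; b &amp;&amp; b &gt; c". The ampersand substitution is done first so that the ampersands introduced by "&lt;" and "&gt;" are not escaped a second time.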
For details see: