Mojibake Decoder

Restore byte A0

Allow a literal space (U+0020) to be interpreted as a non-breaking space (U+00A0) when that would make it part of a fixable mojibake string.
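A minimal sketch of the idea, using only the Python standard library. It assumes the mojibake came from UTF-8 bytes misread as Latin-1, and it handles only the "Ã" + space case (the damaged form of "à", whose UTF-8 bytes are C3 A0). The function name `restore_a0_and_decode` is hypothetical and is not part of this tool.

```python
def restore_a0_and_decode(text: str) -> str:
    """Treat a space after 'Ã' as the lost byte A0, then re-decode as UTF-8."""
    # Assumption: the space following U+00C3 was originally U+00A0.
    candidate = text.replace("\u00c3 ", "\u00c3\u00a0")
    try:
        # Reinterpret the characters as Latin-1 bytes, then read them as UTF-8.
        return candidate.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not fixable this way: leave the text untouched.
        return text


print(restore_a0_and_decode("voil\u00c3 "))
```

Note that the space is consumed by the repair, which is why this behavior is something a user may want to switch off.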

Replace lossy sequences

Detect mojibake in which some characters have already been replaced by ‘�’ or ‘?’. If the rest of the sequence could otherwise be decoded, replace the whole detected sequence with ‘�’.
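The following sketch illustrates the detection step under simplifying assumptions: it only looks at text viewed as Latin-1, only recognizes ‘�’ (U+FFFD) as the loss marker, not ‘?’, and does not check whether the surrounding text is decodable. `LOSSY_RE` and `replace_lossy` are hypothetical names.

```python
import re

# A UTF-8 lead byte (seen as a Latin-1 character) whose continuation bytes
# have been partly or wholly replaced by U+FFFD.
LOSSY_RE = re.compile(
    "[\u00c2-\u00df]\ufffd"                      # 2-byte sequence, tail lost
    "|[\u00e0-\u00ef](?:\ufffd[\u0080-\u00bf\ufffd]"  # 3-byte, first tail lost
    "|[\u0080-\u00bf]\ufffd)"                    # 3-byte, second tail lost
)


def replace_lossy(text: str) -> str:
    """Collapse each damaged mojibake sequence into one replacement character."""
    return LOSSY_RE.sub("\ufffd", text)


print(replace_lossy("\u00c3\ufffdguila"))
```

The original character cannot be recovered, so the sketch marks the whole sequence as lost rather than leaving a stray "Ã" behind.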

Decode inconsistent UTF-8

When we see sequences that distinctly look like UTF-8 mojibake, but there’s no consistent way to reinterpret the string in a new encoding, replace the mojibake with the appropriate UTF-8 characters anyway. This helps to decode strings that are concatenated from different encodings.
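As a rough illustration, the sketch below decodes each mojibake-looking sequence on its own instead of re-decoding the whole string. It assumes two- and three-byte UTF-8 sequences that were misread as Latin-1; a real decoder would cover more encodings and sequence lengths. `UTF8_AS_LATIN1_RE` and `decode_inconsistent` are hypothetical names.

```python
import re

# Latin-1 characters that line up with a UTF-8 lead byte followed by the
# right number of continuation bytes.
UTF8_AS_LATIN1_RE = re.compile(
    "[\u00c2-\u00df][\u0080-\u00bf]"
    "|[\u00e0-\u00ef][\u0080-\u00bf]{2}"
)


def _decode_match(match: re.Match) -> str:
    try:
        return match.group(0).encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        # Looks like UTF-8 but is not valid (e.g. overlong): keep as-is.
        return match.group(0)


def decode_inconsistent(text: str) -> str:
    """Fix UTF-8 mojibake piece by piece, leaving everything else alone."""
    return UTF8_AS_LATIN1_RE.sub(_decode_match, text)


# "cafÃ©" is mojibake, "naïve" is already correct, so the string as a
# whole cannot be re-decoded consistently.
print(decode_inconsistent("caf\u00c3\u00a9 and na\u00efve"))
```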

Fix C1 controls

Replace C1 control characters (the useless characters U+0080 to U+009F that come from Latin-1) with their Windows-1252 equivalents, like HTML5 does, even if the whole string doesn’t decode as Latin-1.
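One way to sketch this replacement with the Python standard library is shown below. It relies on Python's `windows-1252` codec, which leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined; the sketch keeps those characters unchanged. `C1_RE` and `fix_c1_controls` are hypothetical names.

```python
import re

C1_RE = re.compile("[\u0080-\u009f]")


def _to_windows_1252(match: re.Match) -> str:
    # A C1 control character has the same number as a single Latin-1 byte.
    byte = match.group(0).encode("latin-1")
    try:
        return byte.decode("windows-1252")
    except UnicodeDecodeError:
        # Byte has no Windows-1252 meaning in Python's codec: keep it.
        return match.group(0)


def fix_c1_controls(text: str) -> str:
    """Replace C1 controls with the characters Windows-1252 puts there."""
    return C1_RE.sub(_to_windows_1252, text)


print(fix_c1_controls("\u0093quoted\u0094"))
```

Because each character is mapped individually, this works even when the rest of the string contains characters outside Latin-1.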