There is no double backslash in your file. When you look at the repr of a bytes object, it shows every backslash escaped, to avoid confusion between, e.g., \n (a newline) and \\n (a backslash followed by an n):
>>> s = rb'\x84'
So, the problem you're asking about doesn't exist. There are only single backslashes in your file.
The actual problem is that you didn't want the four bytes backslash, x, 8, and 4; you wanted the single byte b'\x84' (value 0x84). But the four bytes are what you have in your file.
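To make the difference concrete, here is a small sketch comparing the two cases. The variable names `escaped` and `single` are only for illustration; they are not from your code.

```python
# The raw-bytes literal rb'\x84' holds four bytes: backslash, x, 8, 4.
escaped = rb'\x84'
# The ordinary bytes literal b'\x84' holds the single byte 0x84.
single = b'\x84'

print(len(escaped), list(escaped))  # 4 [92, 120, 56, 52]
print(len(single), list(single))    # 1 [132]

# repr() doubles the backslash so the two cases can't be confused on screen.
print(repr(escaped))  # b'\\x84'
print(repr(single))   # b'\x84'
```

The doubled backslash only ever appears in the repr; the data itself has one.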
So your bug is in whatever code you used to create this file. Somehow, instead of dumping the bytes to the file, you dumped a backslash-escaped string representation of those bytes. The right place to fix it is in the code that created the file. Not writing corrupt data is always better than writing corrupt data, and then trying to figure out how to uncorrupt it.
But if it's too late for that—e.g., if you've used that broken code to encrypt a bunch of plaintext that you no longer have access to, and now you need to try to recover it—then this transformation happens to be reversible. You just have to do it in two steps.
First, you decode the bytes with a backslash-escape or more general unicode-escape codec:
>>> s=rb'[w\x84\[email protected]\xc6\xab\xc8'
Then you turn each Unicode character into the byte matching the same number, either explicitly:
>>> bytes(map(ord, s.decode('unicode-escape')))
… or, somewhat hackily, by relying on Python's interpretation of Latin-1 (see note 1):
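As a rough sketch of both routes side by side, assuming a shortened, made-up sample (`escaped` below is hypothetical, not your actual data):

```python
# Hypothetical sample: escaped text as it would sit in the file.
escaped = rb'[w\x84\xc6\xab\xc8'

# Step 1: interpret the backslash escapes, giving a str of code points 0-255.
text = escaped.decode('unicode_escape')

# Step 2a: map each code point to the byte with the same number.
explicit = bytes(map(ord, text))

# Step 2b: the same mapping via Python's Latin-1 codec.
via_latin1 = text.encode('latin-1')

print(explicit)                # b'[w\x84\xc6\xab\xc8'
print(explicit == via_latin1)  # True
print(len(escaped), len(explicit))  # 18 6
```

Both routes give the same six bytes here; the Latin-1 one is just shorter to type.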
Again, those backslashes aren't actually in the data; that's just how Python represents a bytes object. For example, if you index into the result, you get 0x84 for the byte \x84, not 0x5c for the backslash character.
Your creation code is the real problem:
with open(file,'a') as f:
You're converting the bytes to their string representation (with the b prefix, the quotes around it, and the backslash escaping for every byte that isn't printable ASCII), then stripping off the b and the quotes, then encoding the whole thing as UTF-8 by writing it to a text-mode file.
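Here is a hedged, in-memory reconstruction of that failure mode. Your full code isn't shown, so the slicing with `[2:-1]` and the sample `data` are assumptions about how the prefix and quotes were stripped, not a copy of what you wrote.

```python
data = b'[w\x84'  # hypothetical ciphertext: three bytes

# Assumed equivalent of the broken code: take the repr, then strip the
# leading b' and the trailing ' from it.
as_text = repr(data)[2:-1]

# Writing a str to a text-mode file encodes it; UTF-8 is assumed here.
on_disk = as_text.encode('utf-8')

print(as_text)                  # [w\x84
print(len(data), len(on_disk))  # 3 6
print(on_disk == rb'[w\x84')    # True
```

The single byte 0x84 has turned into the four ASCII characters backslash, x, 8, 4, which is exactly what you're seeing in the file.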
What you want to do is just open the file in binary mode and write the bytes to it:
with open(file, 'ab') as f:
(Also, you don't need to call f.close(); the with statement already takes care of that.)
Then you can read the file in binary mode and just decrypt the bytes as-is.
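A minimal round-trip sketch of that, using a temporary directory so it can run anywhere. The file name `data.bin` and the sample `ciphertext` are made up for the example.

```python
import os
import tempfile

ciphertext = b'[w\x84\xc6\xab\xc8'  # hypothetical encrypted bytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')  # hypothetical file name

    # 'ab' appends the raw bytes: no repr, no text encoding.
    with open(path, 'ab') as f:
        f.write(ciphertext)

    # 'rb' reads back exactly the bytes that were written.
    with open(path, 'rb') as f:
        restored = f.read()

print(restored == ciphertext)  # True
```

Whatever comes back from `f.read()` can go straight into your decryption call.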
(Or, if you really want the file to be human-editable or something, you want to pick a format that's designed to be human-editable and easily reversible, like base64, not "whatever Python does to represent bytes objects for debugging".)
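If you go the base64 route, a sketch with the standard library's `base64` module might look like this (again with a made-up `ciphertext`):

```python
import base64

ciphertext = b'[w\x84\xc6\xab\xc8'  # hypothetical encrypted bytes

# b64encode returns ASCII-only bytes; decode them to get a str that is
# safe to write to a text-mode file.
line = base64.b64encode(ciphertext).decode('ascii')

# b64decode reverses it exactly.
decoded = base64.b64decode(line)

print(line.isascii())         # True
print(decoded == ciphertext)  # True
```

Unlike the repr trick, this is a documented encoding with a matching decoder, so there is nothing to reverse-engineer later.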
1. Unicode is guaranteed to line up with Latin-1 for all characters in Latin-1. Python interprets that to mean that Latin-1 should encode every byte from 0-255 as code point 0-255, instead of just the ones actually defined in ISO-8859-1. Which is valid, since ISO-8859-1 doesn't say what to do with bytes that it doesn't define, but not every tool will agree with Python.
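You can check Python's side of that claim yourself; this sketch only shows what Python's `latin-1` codec does, not what other tools do:

```python
all_bytes = bytes(range(256))

# Python's latin-1 codec decodes every byte, including the ones
# ISO-8859-1 leaves undefined, to the code point with the same number.
text = all_bytes.decode('latin-1')

print([ord(c) for c in text] == list(range(256)))  # True
print(text.encode('latin-1') == all_bytes)         # True
```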