Overview


What is XOR cipher ?


In cryptography, the simple XOR cipher is a type of additive cipher, an encryption algorithm that operates according to the principles:

A ⊕0 = A,

A ⊕ A = 0,

(A ⊕ B ) ⊕ C = A ⊕ (B ⊕ C),

(B ⊕ A) ⊕ A = B ⊕ 0 = B,

where ⊕ denotes the exclusive disjunction (XOR) operation. This operation is sometimes called modulus 2 addition (or subtraction, which is identical). With this logic, a string of text can be encrypted by applying the bitwise XOR operator to every character using a given key. To decrypt the output, merely reapplying the XOR function with the key will remove the cipher.

XOR is a logical operation, pronounced exclusive or. It can be used to cipher messages simply and fast.
Code:
#!/usr/bin/env python

from itertools import izip, cycle

def xor_crypt_string(data, key):
    return ''.join(chr(ord(x) ^ ord(y)) for (x,y) in izip(data, cycle(key)))

my_data = "Hello. This is a secret message! How fun."
my_key= "firefly"

# Do the actual encryption
encrypted = xor_crypt_string(my_data, key=my_key)

print encrypted
print '---->'

# This will obtain the original data from the encrypted
original = xor_crypt_string(encrypted, key=my_key)

print original


Output :

Code:
.     BY2F
FRR
DF$IB
---->
Hello. This is a secret message! How fun.


How to ?


How is XOR used for encryption ?


The mention of "XOR" comes up quite a bit in cryptography. Is this the same XOR as the bitwise operator? If so, how is it used to encrypt a large amount of data rather than just an integer? Wouldn't you need the "password" to be the same length as the data you are encrypting?

Yes, it's the same XOR. It gets used inside most of the algorithms, or just to merge a stream cipher and the plaintext.

Everything is just bits, even text. The word "hello" is in ASCII 01101000 01100101 01101100 01101100 01101111. Just normal bits, grouped in 5 bytes. Now you can encrypt this string with a random string of 5 bytes, like an One-time pad. Let's say we got the randomly generated string 10001001 10000010 00001011 01001101 11101101 (generated with www.random.org). Now we XOR both strings, getting 11100001 11100111 01100111 00100001 10000010. If you never reuse or reveal the key, nobody can crack this cipher. (Well, I did reveal the key, so it's not secure anymore.)

Many block ciphers use XOR. Let's take AES: The Advanced Encryption Standard uses xor on single bytes (some other algorithms use blocks of 16 or 32 bits; there's no problem with sizes other than 8 bits). The round key will be XORed with the intermediate result and after that permuted and substituted. XOR also gets used in the key schedule.

IDEA also uses XOR as one of its three main functions: XOR, addition and multiplication.

XOR has (inter alias) these advantages when used for cryptography:

Very fast computable, especially in hardware.
Not making a difference between the right and left site. (Being commutative.)
It doesn't matter how many and in which order you XOR values. (Being associative.)
Easy to understand and analyze.

Of course, some of this "advantages" can be disadvantages, depending on the context. The fast speed makes it possible to use XOR often without huge performance drops. The security of Threefish, another block cipher, relies on the non-linearity of alternately using modulo addition and XOR. Despite of the use of 72 rounds (as the base of the hash function Skein) it's still quite fast.

XOR alone is not enough to create a secure block or stream cipher. You need other elements like additions, S-boxes or a random, equally long bit stream. This is because of the linearity of the XOR operation itself. Without non-linear elements, a cipher can easily be broken.

How to encrypt using XOR cipher ?


Encryption uses the XOR operator (Exclusive Or, symbol : ⊕) with the plain text and the key as operand (that should be binary encoded).

Let the plain message be 1001 and the key 10

One take the first bit (0 or 1) of the plain text and the first bit of the key and the one multiply with XOR to get the ciphered bit.

1 ⊕ 1 = 0

The operation is epeated with the second bit of the plaintext and the second bit of the key. If we get to the end of the key, one takes back the first bit.

0 ⊕ 0 = 0, puis 0 ⊕ 1 = 1 puis 1 ⊕ 0 = 1. The ciphered message is 0011.

One of the cool things about XOR encryption is that when you apply it twice, you get back the original string

Decrypting Simple XOR encryption


With XOR, decrypting is exactly the same operation as encrypting. Run the encrypted string through the xor_encrypt method again same key) and you have the plain text back.

Warning 1: null characters



One thing to watch out for: if the character in the string matches the corresponding character in the key, your result will be '\0'. This will be interpreted by your current code as the "end of string" and would stop the decryption short. To circumvent this, you want to pass the length of the "actual" string as a parameter to your function.

Warning 2: short keys



You also want to make sure you don't run past the end of your key - if the plain text is very long you might have to repeat the key. You can do this with the % operator - just recycle the key from the beginning.

Here is a complete example that shows these techniques :
Code:
#include <stdio.h>
#include <string.h>

void xor_encrypt(char *key, char *string, int n)
{
    int i;
    int keyLength = strlen(key);
    for( i = 0 ; i < n ; i++ )
    {
        string[add letter "i" here, i removed the i to noparse]=string[add letter "i" here, i removed the i to noparse]^key[i%keyLength];
    }
}

int main(void) {
  char plain[] = "This is plain text";
  char key[] = "Abcdabcdabciabcdabcd";
  int n = strlen(plain);
  // encrypt:
  xor_encrypt(key, plain, n);
  printf("encrypted string: \n");
  for(int ii = 0; ii < n; ii++) {
    if(plain[ii] > 0x32 && plain[ii] < 0x7F ) printf("%c", plain[ii]);
   else printf(" 0x%02x ", plain[ii]);
  }
  printf("\n");
  // **** if you include this next line, things go wrong!
  n = strlen(plain);
  xor_encrypt(key, plain, n);
  printf("after round trip, plain string is '%s'\n", plain);
}


This (not realizing the problem with kay == string) results in truncated decryption (the i in plain matches the same letter in key) :
Code:
encrypted string:
 0x15  0x0a  0x0a  0x17 A 0x0b  0x10 D 0x11  0x0e  0x02  0x00  0x0f B 0x17  0x01  0x19  0x16
after round trip, plain string is 'This is pla'


Leaving out the line above (i.e., keeping the value of n as the original length of the string), your result is :
Code:
encrypted string:
 0x15  0x0a  0x0a  0x17 A 0x0b  0x10 D 0x11  0x0e  0x02  0x00  0x0f B 0x17  0x01  0x19  0x16
after round trip, plain string is 'This is plain text'


exactly as you would expect.

How to break XOR cipher with repeating key ?


Take the two encrypted messages and XOR them with each other. You'll get the XOR of the two original, unencrypted messages since the identical keys cancel out. Deciphering this just requires patience and a good understanding of the encoding (what exactly is being XORed - the ASCII values of the letters? Some other binary encoding? Are spaces retained?).

You can try some common words and letter combinations and look for places where XORing the word with the ciphertext yields an English-looking string (which would then be the corresponding word or letter combination at the other text). Also, the ciphertext will likely have numerous 0 values corresponding to places where both messages have the same letter. Such matches are more likely to be a pair of E's or a pair of Spaces than a pair of Q's or X's.

Having said all that, I don't know where you can find a tool to do it for you, and if you find one it'll be very sensitive to the particular encoding. It's much more fun to do this yourself.

How do I solve a XOR cipher without knowing the key ?


There are different statistical approaches to code breaking but all of them need a sufficiently large amount of encrypted text to begin with. You sample seems too small to use any statistics based approach. But let me outline the strategy any way.

[1] You gather various letter frequency statistics and pairwise occurrence statistics for the language.
[2] Try different keys and check the "quality" of each decoded text based on the quality of match to standard statistics.
[3] We have to run some algorithm which maximizes the quality of this decoded text. One possible candidate is the Genetic Algorithm
but many others are possible.
[4] Since we are using statistics, a large sample is needed for this approach to work.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Okay, and that's the end of my article.
i know i am bad at ending things like this, and i know you lose your interest just by looking at how i write this.
however, hope you enjoy it and learn it