Background
It has been a while since I played with cryptography. I am working on a project where I have to encrypt and decrypt large files (possibly many GiB of data), both fast and cheap. The encryption must be strong, so AES came to mind. AES can be run in several modes of operation, each with its own trade-offs. I went with Cipher Block Chaining (CBC), a widely supported mode in which every ciphertext block depends on the previous one (I'm not going through the entire argument here but, as always, GIYF). Probably the best-known open source implementation of the SSL/TLS protocol is OpenSSL, and I'm using it in my PHP project as well.
The encryption/decryption must be done transparently, without the intervention of the (non-technical) end user. It must be simple but at the same time fast and cheap. This is a use case for symmetric encryption. AES (the Advanced Encryption Standard) seems to be the best candidate for the job, so the project will use AES through PHP's OpenSSL support. Furthermore, we must be able to decrypt the result later with other tools (in addition to ours). For instance, the openssl command-line tool comes in handy.
Now, the problem I encountered is that the OpenSSL extension for PHP does not have a function that takes a file and encrypts/decrypts it. It works with buffers of data (strings) held in memory. Obviously we cannot pass 10 GiB of data as a single string, because we would exhaust the system's resources. So the solution would be to read small chunks of data, encrypt them and join the encrypted segments into a whole. If we do the same while decrypting then we are done. Job done successfully. You wish!
The AES encryption model in CBC mode consists of a key and an additional value, the initialization vector (IV), whose role is to hide patterns in the plain text that would otherwise show up in the cipher text and help an attacker. The key has a fixed size (128/192/256 bits, i.e. 16/24/32 bytes), but the block that AES works on is always 128 bits (16 bytes), whatever the key length. So one condition of this model is that the input must be aligned to the block size, i.e. its length must be a multiple of 16 bytes. Obviously that's not always possible, because sometimes the plain text I want to encrypt is shorter, like "Howdy!". In this case the last block must be padded until it gets aligned. There are many padding schemes out there, but OpenSSL uses PKCS#7 (often called PKCS#5) by default. It's simple and efficient. It works like this: if your data is 26 bytes long, the next multiple of 16 is 32, so the data is 6 bytes short and we append 6 more bytes. They cannot be arbitrary: PKCS#5/7 says that every padding byte must hold the number of padding bytes. More exactly, if the text is 6 bytes short then the padding byte has the value 6 (0x06), repeated 6 times.
Now we are in trouble! What happens when the data is already aligned and needs no padding? Should we append nothing, or a single byte with value 0 (null)? Then the decrypting side could not tell whether the last byte of the message is padding or data. Instead we append an entire extra block (16 bytes), each byte holding, what exactly? The value 16 (0x10).
Note that not all the blocks are padded, only the last one (the others are full 16-byte blocks and therefore already aligned).
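The padding rule described above can be sketched in a few lines of PHP. This is only an illustration: OpenSSL applies the padding for you, and the helper names pkcs7Pad() and pkcs7Unpad() are hypothetical, chosen here for the example.

```php
<?php
// Hypothetical helpers that mimic the PKCS#7 padding OpenSSL applies internally.
// The default block size of 16 bytes is the AES block size.

function pkcs7Pad(string $data, int $blockSize = 16): string
{
    // Bytes missing up to the next block boundary: always between 1 and $blockSize,
    // so already aligned data receives one full extra block.
    $padLen = $blockSize - (strlen($data) % $blockSize);
    return $data . str_repeat(chr($padLen), $padLen);
}

function pkcs7Unpad(string $data, int $blockSize = 16): string
{
    $len = strlen($data);
    if ($len === 0 || $len % $blockSize !== 0) {
        throw new InvalidArgumentException('Input is not block aligned');
    }
    // The last byte tells how many padding bytes were appended.
    $padLen = ord($data[$len - 1]);
    if ($padLen < 1 || $padLen > $blockSize
        || substr($data, -$padLen) !== str_repeat(chr($padLen), $padLen)) {
        throw new InvalidArgumentException('Invalid PKCS#7 padding');
    }
    return substr($data, 0, $len - $padLen);
}
```

With these helpers, 26 bytes of input come back as 32 bytes ending in six 0x06 bytes, and 16 bytes of input come back as 32 bytes ending in sixteen 0x10 bytes.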
PHP OpenSSL AES encryption
So you read the first chunk (e.g. 4096 bytes), you encrypt it (but first it gets padded with some extra bytes) and then you save it. Since 4096 is already a multiple of 16, the padding adds a full 16-byte block, so every encrypted chunk is 4112 bytes long. You take the second chunk (of the plain-text file), you encrypt it and you append it to the existing file. You repeat the procedure until there are no more chunks to read.
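A minimal sketch of this chunk-by-chunk procedure might look as follows. It assumes the openssl extension is loaded, a raw binary key of the right length for the cipher, and a 16-byte IV. The function name encryptFileChunked() and its parameters are hypothetical, not part of PHP.

```php
<?php
// Naive approach described above: every chunk is encrypted (and therefore padded) on its own.
// encryptFileChunked() is a hypothetical helper, not a PHP built-in.
function encryptFileChunked(
    string $src,
    string $dst,
    string $key,
    string $iv,
    int $chunkSize = 4096,
    string $cipher = 'aes-256-cbc'
): void {
    $in = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    // Read until fread() returns nothing more.
    while (($chunk = fread($in, $chunkSize)) !== false && $chunk !== '') {
        // OPENSSL_RAW_DATA returns raw bytes instead of base64 text.
        // PKCS#7 padding is applied to each chunk because no other flag is set.
        $enc = openssl_encrypt($chunk, $cipher, $key, OPENSSL_RAW_DATA, $iv);
        if ($enc === false) {
            throw new RuntimeException('Encryption failed');
        }
        fwrite($out, $enc);
    }
    fclose($in);
    fclose($out);
}
```

A 5000-byte input produces 5024 bytes of output here: 4112 bytes for the first chunk and 912 for the remaining 904 bytes.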
Decryption
You reverse the encryption steps, i.e. you read an encrypted chunk, you decrypt it (which also removes its padding) and then you save it as a decrypted piece of the file. You repeat the process until there are no more encrypted chunks to read.
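The reverse direction can be sketched the same way. The key point is that the reader must step through the encrypted file in units of the padded chunk size, not the plain-text chunk size. As before, this assumes the openssl extension, and decryptFileChunked() is a hypothetical name.

```php
<?php
// Reverse of the per-chunk scheme: each encrypted chunk is decrypted and unpadded on its own.
// decryptFileChunked() is a hypothetical helper, not a PHP built-in.
function decryptFileChunked(
    string $src,
    string $dst,
    string $key,
    string $iv,
    int $chunkSize = 4096,
    string $cipher = 'aes-256-cbc'
): void {
    // A plain chunk of $chunkSize bytes grows to the next multiple of 16,
    // or by a whole block when it is already aligned (4096 becomes 4112).
    $encChunkSize = $chunkSize + 16 - ($chunkSize % 16);

    $in = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    while (($chunk = fread($in, $encChunkSize)) !== false && $chunk !== '') {
        $dec = openssl_decrypt($chunk, $cipher, $key, OPENSSL_RAW_DATA, $iv);
        if ($dec === false) {
            throw new RuntimeException('Decryption failed (wrong key, IV or chunk size?)');
        }
        fwrite($out, $dec);
    }
    fclose($in);
    fclose($out);
}
```

The chunk size passed here has to match the one used while encrypting, otherwise the chunk boundaries fall in the wrong place and decryption fails.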
Job done successfully! You wish...
The problem
You try to use some other external tools for decryption, namely openssl. The syntax for decryption (it can take more arguments but I don't want to go too far with this) is:
openssl enc -aes-256-cbc -d -K <YOUR-KEY-IN-HEXA> -iv <YOUR-IV-IN-HEXA> -in encrypted.file -out decrypted.file
The tool will eventually decrypt your file. If it is a text file you can open it, and you'll probably see some funny characters from time to time. Those are the padding bytes. But why didn't openssl remove them? Because it assumed that only the last block of the input file was padded. It had no idea that we padded each chunk. But why did we pad all the chunks if only the last one requires it? Because we sent one chunk at a time, the encryption library saw each chunk as its entire input, so it padded it and then encrypted it. On top of that, every chunk was encrypted starting from the same IV, so when the file is decrypted as one continuous CBC stream the first 16 bytes of each chunk after the first one come out garbled as well.
Workaround
Try to force openssl to decrypt one chunk at a time, then join these decrypted chunks together. You will get your fully functional decrypted file. For this we will use the dd program (if you're on Linux) to read one chunk at a time and send it to openssl to decrypt. Repeat the process until there are no more chunks to read.
# this is the default chunk size this program uses while encoding a file
# we MUST keep it the same while decrypting
bs=4096
fin=myarchive.tar.bz2.enc         # your encrypted file
fout=myarchive.tar.bz2            # this will be the decrypted file
rm -f "$fout"                     # remove a leftover output file, if any
inl=$(du -b "$fin" | cut -f1)     # get the encrypted file size
bs=$((bs + 16))                   # read the file in blocks of bs+16 bytes
count=$(( (inl + bs - 1) / bs ))  # number of blocks, rounded up
i=0
while [ "$i" -lt "$count" ]; do
  # read from input file then decrypt using the openssl tool
  dd if="$fin" bs="$bs" count=1 skip="$i" 2>/dev/null |
    openssl enc -d -aes-192-cbc -K BF6FAAED7FA1A21C7D1E4FA485A0AD00DEA136FB8811890A -iv D8B5CF37D975A8D8B1A9B72A17C3ABB0 >> "$fout"
  i=$((i + 1))
done
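The workaround rescues files that were already encrypted chunk by chunk. For new files, one possible alternative is to make PHP behave like a single CBC stream: pad only the final chunk and feed the last cipher-text block of each chunk as the IV of the next one. The sketch below is an assumption-laden illustration, not the method used by this project. encryptFileStream() is a hypothetical name, the chunk size must be a multiple of 16, and the input is assumed to be a regular local file so that fread() returns full chunks.

```php
<?php
// Hypothetical streaming variant: the output is one continuous CBC stream
// with PKCS#7 padding on the final block only.
function encryptFileStream(
    string $src,
    string $dst,
    string $key,
    string $iv,
    int $chunkSize = 4096,
    string $cipher = 'aes-256-cbc'
): void {
    if ($chunkSize % 16 !== 0) {
        throw new InvalidArgumentException('Chunk size must be a multiple of 16');
    }
    $in = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    $chunk = fread($in, $chunkSize);
    if ($chunk === false) {
        $chunk = '';
    }
    do {
        // Look one chunk ahead to learn whether the current chunk is the last one.
        $next = fread($in, $chunkSize);
        $isLast = ($next === false || $next === '');
        // Intermediate chunks are already aligned, so padding is switched off for them.
        $flags = $isLast ? OPENSSL_RAW_DATA : (OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING);
        $enc = openssl_encrypt($chunk, $cipher, $key, $flags, $iv);
        if ($enc === false) {
            throw new RuntimeException('Encryption failed');
        }
        fwrite($out, $enc);
        // CBC chaining: the last cipher-text block becomes the IV of the next chunk.
        $iv = substr($enc, -16);
        $chunk = $next;
    } while (!$isLast);
    fclose($in);
    fclose($out);
}
```

A file written this way should decrypt in a single pass, either with one openssl_decrypt() call or with the openssl enc -d command shown earlier, as long as the same cipher, key and initial IV are supplied.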
@Edit:
In case you need a function to generate a random encryption key then, among the many functions that can be found on the Internet, here is my candidate. Simple but efficient.
// first create an alphabet; I chose every byte value except 0
// you may be happy with the range from 32..128
$alphabet = '';
for ($i = 1; $i <= 255; $i++) {
    $alphabet .= chr($i);
}

function createRandomKey($len) {
    global $alphabet;
    $alphabet = str_shuffle($alphabet);
    $key = '';
    for ($i = 0; $i < $len; $i++) {
        // random_int() (PHP 7+) draws from a cryptographically secure source, unlike rand()
        $key .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return str_shuffle($key);
}
So first I create an alphabet. Size matters even here, so the bigger the better. Every time I generate a new key I shuffle the alphabet (a full permutation of all its elements), so the chance of picking the symbols in the same order gets lower. Then I pick the symbols one by one, each from a random position, until the key length is reached. To shuffle things even more I also shuffle the resulting key. Keep in mind that the strength of the key comes only from the random source behind the picks: rand() and str_shuffle() are not cryptographically secure, so for real keys the picks should come from random_int(), or the whole key from random_bytes(). Now, what are the odds of picking a weak key, or even the same key twice? Of course, that would be a nice candidate for the number theory class.
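As a shorter alternative to the alphabet approach, PHP 7 and later can produce the key and the IV directly from the system's secure random source. The sketch below assumes PHP 7+. createSecureKey() and toOpensslHex() are hypothetical helper names. The hex form is what the openssl tool expects after -K and -iv.

```php
<?php
// Hypothetical helpers built on random_bytes(), which reads from the OS CSPRNG.
function createSecureKey(int $len = 32): string
{
    // 16, 24 or 32 bytes for AES-128, AES-192 or AES-256 respectively.
    return random_bytes($len);
}

function toOpensslHex(string $bytes): string
{
    // The openssl tool takes the key and the IV as hexadecimal strings.
    return strtoupper(bin2hex($bytes));
}

$key = createSecureKey(32); // raw key for aes-256-cbc
$iv = random_bytes(16);     // the IV always matches the 16-byte AES block size
echo '-K ', toOpensslHex($key), ' -iv ', toOpensslHex($iv), PHP_EOL;
```

The raw bytes go to openssl_encrypt() and openssl_decrypt(), while the hex strings are only needed on the command line.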
Now, if you think this article was interesting, don't forget to rate it. It shows me that you care, and so I will continue to write about these things.
Eugen Mihailescu