How to conduct an attack correctly? #49

Closed
durimars opened this issue Nov 4, 2021 · 52 comments

@durimars

durimars commented Nov 4, 2021

Hello.

How to conduct an attack correctly? Plaintext: "TOP SECRET! ". There is a space at the end of the text.

Archive ZIP: UEsDBBQAAwAIAM8EZFM7VfjrKwAAAB0AAAAMAAAAcGxhaW4wMDEudHh04Tlvb+niDagMDzYfQkqGINYMQ4yRlVDUg7pRSncttBoiwDmBAbNambU1HlBLAQIUABQAAwAIAM8EZFM7VfjrKwAAAB0AAAAMAAAAAAAAAAEAIAAAAAAAAABwbGFpbjAwMS50eHRQSwUGAAAAAAEAAQA6AAAAVQAAAAAA

Thanks.

@cxzstuff

cxzstuff commented Nov 5, 2021

?

@durimars
Author

durimars commented Nov 5, 2021

I use the command: bkcrack -C plain001.zip -c plain001.txt -P plain.zip -p plain.txt -d end.txt

But I don't get the expected result.

@cxzstuff

cxzstuff commented Nov 5, 2021

See if there are more keys...
bkcrack -C plain001.zip -c plain001.txt -p plain.txt -e

@durimars
Author

durimars commented Nov 5, 2021

It always turns out something like this: TOP SECRET! сњМ_и эајЂџвxћлкF�ѕ

@cxzstuff

cxzstuff commented Nov 5, 2021

And that "TOP SECRET! " text is at the top of that encrypted file?

Try with this. It has it at the start of the file.
plain007.zip

@durimars
Author

durimars commented Nov 5, 2021

And that "TOP SECRET! " text is at the top of that encrypted file?

Yes. Password from the archive: 1234554321

Try with this. It has it at the start of the file. plain007.zip

bkcrack -C plain007.zip -c plain007.txt -k 28bfec92 767080fe 8c0f6a93 -d end01.txt

This vector came up.

Following my example, the keys were found, but they did not fit. :(
61555723 556d45e6 7226427f
fe25d361 254e54ab ebdd5741
eb40594f 739c4541 cd9a5781
9606f37f 0c58bd68 b1d1175d

@cxzstuff

cxzstuff commented Nov 5, 2021

7-Zip reports the method as "ZipCrypto Deflate:Maximum". Maybe that has something to do with why it doesn't crack? @kimci86

I had a case that was packed in a slightly exotic way: files extracted with bkcrack had a five-byte header. Maybe you could try adding -o to see if it helps?
Like -o 1, 2, 3, 4, and so on. Use the full text as plain.txt to make testing faster.
Or better yet, make a bigger text file and a slightly smaller plain.txt from it (bkcrack might start to say that it's too large, if I remember correctly).
Never mind. Tested using a bigger plain.txt. No keys. Some offsets the same.

@cxzstuff

cxzstuff commented Nov 6, 2021

I use the command: bkcrack -C plain001.zip -c plain001.txt -P plain.zip -p plain.txt -d end.txt

But I don't get the expected result.

So you also made the plain.zip with the same program and compression?
With -P, plain.txt is an entry inside plain.zip. What if there is also a file on disk with the same name?
Would bkcrack then use that file, as if the command had no -P at all?
Hope not.
This: #41
edit: tested and it doesn't.

@durimars
Author

durimars commented Nov 6, 2021

So you also made the plain.zip with the same program and compression?
Yes.

@cxzstuff

cxzstuff commented Nov 6, 2021

There might be a compression difference anyway. It seems that small files are just stored by these programs. ---> Try using bigger files.
BTW, what's the exact program you're using?

@durimars
Author

durimars commented Nov 6, 2021

what's the exact program you're using?
Total Commander

@cxzstuff

cxzstuff commented Nov 6, 2021

Okay, tested TC Android version. Uses that Deflate:Maximum even when Fastest is used. Used bigger files while at it. No keys.

Enhanced deflate

The "enhanced deflate" method is similar to the original deflate but operates on larger chunks of data at a time, often resulting in improved compression. It can be particularly useful for compressing large files containing large amounts of highly compressible data such as large text files and text-based database files.

Earlier versions of WinZip referred to this compression method as Maximum (enhanced deflate).
https://kb.winzip.com/help/help_compression.htm

@cxzstuff

cxzstuff commented Nov 7, 2021

Here it's said that Deflate64 and Enhanced Deflate are the same.
https://en.wikipedia.org/wiki/Deflate#Deflate64/Enhanced_Deflate

7zip though sees them differently. Flagged your file as Deflate64 but in vain.
Then tested Maximum made with infozip and it worked ok with -P.
And pkcrack did the job too without problems.

So, Total Commander probably compresses files somehow differently depending on whether they are encrypted or not... Maybe file size matters too???

Your file's keys work as expected, so it is like others - besides that TC's oddity.
bkcrack -C plain001.zip -c plain001.txt -k 1d3438e1 fde00324 ced26693 -d end.txt

@kimci86
Owner

kimci86 commented Nov 8, 2021

Hello! Sorry for the delay.

Let us go through this example (assuming a Unix environment).
First, decode the archive from the given base64 string.

$ echo UEsDBBQAAwAIAM8EZFM7VfjrKwAAAB0AAAAMAAAAcGxhaW4wMDEudHh04Tlvb+niDagMDzYfQkqGINYMQ4yRlVDUg7pRSncttBoiwDmBAbNambU1HlBLAQIUABQAAwAIAM8EZFM7VfjrKwAAAB0AAAAMAAAAAAAAAAEAIAAAAAAAAABwbGFpbjAwMS50eHRQSwUGAAAAAAEAAQA6AAAAVQAAAAAA | base64 -d > archive.zip

Inspect metadata by running unzip -Z -v archive.zip or 7z l -slt archive.zip for example.
In particular, we learn the following information:

Path = plain001.txt
Size = 29
Packed Size = 43
Encrypted = +
CRC = EBF8553B
Method = ZipCrypto Deflate
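
The same fields can also be read programmatically. The following is a minimal sketch using Python's standard zipfile module; the function name describe_entries and the dictionary keys are only illustrative, chosen to mirror the 7z listing above. The demo at the end runs on a tiny unencrypted archive built in memory, since zipfile cannot write encrypted entries.

```python
import io
import zipfile

# Only the two methods that appear in this thread are named here.
METHODS = {zipfile.ZIP_STORED: "Store", zipfile.ZIP_DEFLATED: "Deflate"}


def describe_entries(archive):
    """List per-entry metadata. `archive` is a path or a binary file object."""
    rows = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            rows.append({
                "Path": info.filename,
                "Size": info.file_size,
                "Packed Size": info.compress_size,
                # Bit 0 of the general purpose flags marks an encrypted entry.
                "Encrypted": bool(info.flag_bits & 0x1),
                "CRC": format(info.CRC, "08X"),
                "Method": METHODS.get(info.compress_type, str(info.compress_type)),
            })
    return rows


# Demo on an in-memory archive (hypothetical content, not the thread's file).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("plain.txt", b"TOP SECRET! ")
for row in describe_entries(io.BytesIO(demo.getvalue())):
    print(row)
```

Reading metadata does not require the password, so calling describe_entries("archive.zip") on the decoded archive from this issue should report the same size, packed size and CRC as listed above.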

The difficult part for a successful attack is to have correct plaintext. You know in your case that this text file plain001.txt starts with "TOP SECRET! ". As the encrypted file is compressed, the known plaintext must also be compressed with the same compression algorithm and parameters.
Deflate compression is used and it makes the file 2 bytes longer (and encryption adds a 12-byte header). It is a case with small files where there is not enough redundancy to really compress, and the compression is predictable. For big files, it is difficult to predict how compression behaves without knowing a lot of uncompressed data.

So in this case, we can just compress "TOP SECRET! ", for example with the script tools/deflate.py provided with bkcrack. It will use Huffman encoding with a fixed Huffman tree. We hope the encrypted file was compressed the same way, which is likely given its size. This will generate 14 bytes of compressed data. The last two bytes, however, must be ignored: they contain the end-of-data marker, which is not at that position in the ciphertext because the original file is longer, followed by a few 0 bits of padding to reach the byte boundary. Some knowledge of the deflate compression algorithm is needed to get this part. For a better understanding, see the article "An Explanation of the Deflate Algorithm" on zlib's website, or RFC 1951 (DEFLATE Compressed Data Format Specification) for all the details.

$ echo -n "TOP SECRET! " | python3 tools/deflate.py > compressed

So we have 12 bytes of compressed known plaintext. 12 bytes is the minimum and there might be some collisions, so an exhaustive search is recommended.
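
To make the "last two bytes" point concrete, here is a small sketch of a fixed-Huffman encoder written from RFC 1951. It is not the code of tools/deflate.py, and the function name deflate_fixed_literals is made up for this illustration. It only emits literals, which is the assumption made above for a short text without repeated patterns.

```python
import zlib


def deflate_fixed_literals(data: bytes) -> bytes:
    """Encode data as a single final fixed-Huffman block made of literals only."""
    bits = [1, 1, 0]  # BFINAL=1, then BTYPE=01 sent least significant bit first
    for value in data:
        if value < 144:
            code, length = 0x30 + value, 8            # 00110000 .. 10111111
        else:
            code, length = 0x190 + (value - 144), 9   # 110010000 .. 111111111
        # Huffman codes are packed starting with their most significant bit.
        bits.extend((code >> shift) & 1 for shift in range(length - 1, -1, -1))
    bits.extend([0] * 7)                  # end-of-block symbol: seven 0 bits
    bits.extend([0] * (-len(bits) % 8))   # padding up to the byte boundary
    out = bytearray()
    for i in range(0, len(bits), 8):
        # Within a byte, the first bit of the stream is the least significant.
        out.append(sum(bit << k for k, bit in enumerate(bits[i:i + 8])))
    return bytes(out)


compressed = deflate_fixed_literals(b"TOP SECRET! ")
print(compressed.hex(" "))   # 14 bytes in total
known = compressed[:12]      # drop the end-of-block marker and padding
print(known.hex(" "))

# Sanity check: zlib can inflate the raw stream back to the input.
assert zlib.decompress(compressed, -15) == b"TOP SECRET! "
```

Each 8-bit literal code is shifted by the 3 header bits, so the last literal spills three bits into byte 12, and the rest of bytes 12 and 13 only holds the end-of-block marker and padding. That is why only the first 12 bytes are safe to use as known plaintext.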

Now, a trick: the encryption header prepended to the compressed data before encryption is not completely random. The last byte at least is set to the CRC's most significant byte (or a timestamp in some cases), also available in the entry's metadata. It is used to quickly check if a password seems correct in regular zip file managers. We can include it in our known plaintext.

So we now have 13 bytes of known plaintext. There should not be collisions and the exhaustive search is not needed, which saves time.

$ ./bkcrack -C archive.zip -c plain001.txt -p compressed -t 12 -x -1 eb

(Update to version 1.3.3 or higher to do this because there was a small bug with negative offsets.)

With some patience, it gives a solution 1d3438e1 fde00324 ced26693.
It is enough to decipher with either -U or -d parameters.
Using -d, the deciphered data is compressed and can be decompressed with the tools/inflate.py script.

$ ./bkcrack -C archive.zip -c plain001.txt -k 1d3438e1 fde00324 ced26693 -d deciphered.deflate
$ python3 tools/inflate.py < deciphered.deflate > deciphered.txt

If the password is also wanted, it can be recovered quickly because it is short.
For example, running ./bkcrack -k 1d3438e1 fde00324 ced26693 -r 12 ?p gives the password 1234554321 instantly.
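
For readers who want to see what the three key values are, below is a sketch of the traditional PKWARE (ZipCrypto) key schedule and stream cipher as described in the ZIP application note. It is an illustration, not bkcrack's implementation, and the function names are invented for this example. The keys passed with -k are assumed to be the cipher state right after processing the password.

```python
import zlib


def _make_crc_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def update_keys(keys, byte):
    """Advance the three 32-bit keys with one plaintext (or password) byte."""
    k0, k1, k2 = keys
    k0 = (k0 >> 8) ^ CRC_TABLE[(k0 ^ byte) & 0xFF]
    k1 = ((k1 + (k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
    k2 = (k2 >> 8) ^ CRC_TABLE[(k2 ^ (k1 >> 24)) & 0xFF]
    return k0, k1, k2


def keys_from_password(password: bytes):
    keys = (0x12345678, 0x23456789, 0x34567890)
    for byte in password:
        keys = update_keys(keys, byte)
    return keys


def keystream_byte(keys):
    temp = (keys[2] | 2) & 0xFFFF
    return ((temp * (temp ^ 1)) >> 8) & 0xFF


def decrypt(keys, ciphertext: bytes) -> bytes:
    """Decrypt raw entry data, including the 12-byte encryption header."""
    out = bytearray()
    for c in ciphertext:
        p = c ^ keystream_byte(keys)
        keys = update_keys(keys, p)
        out.append(p)
    return bytes(out)


def decipher_deflated_entry(keys, ciphertext: bytes) -> bytes:
    """Decrypt, skip the encryption header, then inflate the raw deflate stream."""
    return zlib.decompress(decrypt(keys, ciphertext)[12:], -15)


print(" ".join(format(k, "08x") for k in keys_from_password(b"1234554321")))
```

If the sketch is right, the printed keys should match the solution found above for the password 1234554321. Note that the cipher only ever uses the three keys, which is why the attack can decipher data without recovering the password itself.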

The conclusion for me is that I should write more documentation on how to use bkcrack :)
Do you have more questions?

@cxzstuff

cxzstuff commented Nov 9, 2021

This is most instructive. That -p value is quite confusing. It can be plain text, deflated plain text or zipped plain text. And I read somewhere that -p can also be just a line of text, e.g. -p "The quick brown fox jumps over the lazy dog".

I don't understand why then zipped one doesn't work if deflated one does.
Or is it that the zippers don't really compress those small files?

You could remove that end-of-data marker in deflate.py, or add an option to remove it when needed.

This could go at the beginning of readme.md, followed if necessary by how to use the options.

usage: bkcrack [options]

Mandatory:
 -c cipherfile      File containing the ciphertext
 -p plainfile       File containing the known plaintext

    or

 -k X Y Z           Internal password representation as three 32-bit integers
                      in hexadecimal (requires -d, -U, or -r)

Optional:
 -C encryptedzip    Zip archive containing cipherfile

 -P plainzip        Zip archive containing plainfile
 -o offset          Known plaintext offset relative to ciphertext
                      without encryption header (may be negative)
 -t size            Maximum number of bytes of plaintext to read
 -x offset data     Additional plaintext in hexadecimal starting
                      at the given offset (may be negative)

 -e                 Exhaustively try all the keys remaining after Z reduction

 -d decipheredfile  File to write the deciphered text (requires -c)
 -U unlockedzip password
                    File to write the encrypted zip with the password set
                      to the given new password (requires -C)

 -r length charset  Try to recover the password up to the given length using
                      characters in the given charset. The charset is a
                      sequence of characters or shortcuts for predefined
                      charsets listed below. Example: ?l?d-.@

                      ?l lowercase letters
                      ?u uppercase letters
                      ?d decimal digits
                      ?s punctuation
                      ?a alpha-numerical characters (same as ?l?u?d)
                      ?p printable characters (same as ?a?s)
                      ?b all bytes (0x00 - 0xff)

@kimci86
Owner

kimci86 commented Nov 9, 2021

I don't understand why then zipped one doesn't work if deflated one does.
Or is it that the zippers don't really compress those small files?

For small files without patterns, compression makes files larger, so zippers prefer not to use compression. Both 7zip and Info-ZIP's zip tools use the 'store' method when making an archive with a file containing "TOP SECRET! ". The tool used (Total Commander) makes a suboptimal choice by using deflate compression here.

@cxzstuff

cxzstuff commented Nov 9, 2021

The tool used (Total Commander) makes a suboptimal choice by using deflate compression here.

So it does compress like deflate.py so it should work really??
But it didn't. Weird things to me... but I'm learning...

@kimci86
Owner

kimci86 commented Nov 9, 2021

Yes, making the plaintext archive with Total Commander is OK, as it uses deflate compression even on a small file. You still need to ignore the last two bytes with -t 12, though. I just tested it and it works for me.

@durimars
Author

durimars commented Nov 9, 2021

-1 eb

Thank you very much for the answer.

What does this mean: "-1 eb"? A byte is added to plain text? At the beginning, at the end?

@kimci86
Owner

kimci86 commented Nov 9, 2021

EB is one byte written in hexadecimal. It comes from the CRC of the encrypted entry which is EBF8553B.
-1 is the position of this byte relative to the compressed known plaintext, i.e. just before its beginning.
So -x -1 eb tells bkcrack to assume the byte just before compressed data (i.e. the last byte of the encryption header) was EB before it was encrypted.
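
As a small illustration of where that byte comes from, the sketch below derives the check byte from ZIP metadata. The function name check_byte is hypothetical. The assumption (labeled in the code) is that the relevant flag is general purpose bit 3, in which case the high byte of the DOS modification time is used instead of the CRC, as Python's own zipfile module does when checking passwords.

```python
def check_byte(crc: int, flag_bits: int = 0, dos_time: int = 0) -> int:
    """Return the expected last byte of the 12-byte encryption header."""
    if flag_bits & 0x0008:
        # Assumption: with the data descriptor flag (bit 3) set, the check
        # byte is the high byte of the last modification time.
        return (dos_time >> 8) & 0xFF
    # Otherwise it is the most significant byte of the CRC-32.
    return (crc >> 24) & 0xFF


# CRC of plain001.txt from the archive metadata shown earlier.
print(f"-x -1 {check_byte(0xEBF8553B):02x}")
```

For the first archive in this issue this prints the same extra-plaintext argument as used in the command above.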

@durimars
Author

durimars commented Nov 9, 2021

EB ...

Thanks. I see.
Is it possible to perform an attack knowing only the first 3 bytes? CRC is not known.

@cxzstuff

cxzstuff commented Nov 9, 2021

You need at least 8 more. How many of them have to be contiguous? It's written somewhere.
Then just do 256 attacks...

@kimci86
Owner

kimci86 commented Nov 9, 2021

8 contiguous bytes are required to generate 2 ^ 32 internal password representation candidates, and 4 more bytes (contiguous or not) are used to filter those candidates to have only one solution on average. Even if filtering was done in some other way, the bare minimum is 8 contiguous bytes.
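
The rule above (at least 8 contiguous bytes, at least 12 bytes in total) can be written as a quick pre-check. This is only a sketch of the rule as stated in this comment, not bkcrack's own validation code, and the function name is made up.

```python
def enough_known_plaintext(offsets) -> bool:
    """offsets: byte offsets (relative to the ciphertext without the
    encryption header, so -1 is the check byte) whose plaintext is known."""
    ordered = sorted(set(offsets))
    longest = run = 0
    previous = None
    for offset in ordered:
        # Extend the current run if this offset directly follows the previous one.
        run = run + 1 if previous is not None and offset == previous + 1 else 1
        longest = max(longest, run)
        previous = offset
    return longest >= 8 and len(ordered) >= 12


print(enough_known_plaintext(range(12)))               # 12 contiguous bytes
print(enough_known_plaintext([0, 1, 2]))               # only 3 bytes
print(enough_known_plaintext([-1] + list(range(11))))  # 11 bytes plus check byte
```

With only the first 3 bytes known, the check fails regardless of the CRC, which is the situation asked about above.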
If you do not have enough data for a known plaintext attack, you can try a dictionary or brute-force attack on the password with other tools such as john the ripper or hashcat.

@durimars
Author

EB

Hello.
There is one question left. I know part of the plaintext, but it doesn't start at the beginning and I don't know the offset. Does that mean I cannot carry out the attack?

@kimci86
Owner

kimci86 commented Nov 12, 2021

You need to provide the offset. If you do not know the offset, you can try several possible values, starting with what seems the most likely, until you try with the right one. It might take a long time though.
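
Trying several offsets can be scripted. The sketch below only builds the command lines, one per candidate offset, using the -C, -c, -p and -o options from the usage text. The file names and the function name are placeholders, and each run may take minutes, so the list of offsets should stay short.

```python
def candidate_commands(archive, entry, plain_file, offsets):
    """Build one bkcrack command line per candidate plaintext offset."""
    return [
        ["bkcrack", "-C", archive, "-c", entry, "-p", plain_file, "-o", str(offset)]
        for offset in offsets
    ]


# Placeholder names; most likely offsets should come first.
for command in candidate_commands("archive.zip", "secret.txt", "known.bin", [0, 16, 32]):
    print(" ".join(command))
```

Each list can be passed to subprocess.run as is, stopping at the first run that reports keys.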

@magnumripper
Contributor

That's only viable if it's stored. If it's deflated I'd say it's virtually impossible.

@cxzstuff

If password is short and amount of offsets is reasonable, I cannot see why it's not possible.

@kimci86
Owner

kimci86 commented Nov 14, 2021

If password is short and amount of offsets is reasonable, I cannot see why it's not possible.

The offset is needed to recover the internal keys, not to recover the password. The password complexity has no impact on this part. The time needed for an attack when the offset is not known depends on the contiguous plaintext length (the longer, the better) and the number of offsets tried before finding the right one (the lower, the better).

In addition, if plaintext has to be compressed, not knowing the uncompressed plaintext offset makes it very difficult to get the corresponding compressed plaintext. It is already hard to guess how deflate compression behaves when only a small part of the uncompressed plaintext is known. Not knowing where the uncompressed plaintext starts makes it harder because symbols are not necessarily byte-aligned.

@cxzstuff

Yes, the keys. And keys for passwords between 1 to 12 are tested in seconds.

@magnumripper
Contributor

In addition, if plaintext has to be compressed, not knowing the uncompressed plaintext offset makes it very difficult to get the corresponding compressed plaintext. It is already hard to guess how deflate compression behaves when only a small part of the uncompressed plaintext is known. Not knowing where the uncompressed plaintext starts makes it harder because symbols are not necessarily byte-aligned.

Exactly. We'd even have to be able to give the offset (-o) in bits as opposed to bytes. But implementing that would be overkill because we'll be out of luck trying to create the deflated plaintext.

@kimci86
Owner

kimci86 commented Dec 13, 2021

I close this as I understand you have no more questions. Feel free to reopen or open a new issue otherwise. Thank you all for your contribution to this discussion.

kimci86 closed this as completed Dec 13, 2021
@durimars
Author

$ echo -n "TOP SECRET! " | python3 tools/deflate.py > compressed

Hello. What do you have in the file "compressed", what bytes?
The latest version of the program does not find the decryption signature?

@kimci86
Owner

kimci86 commented Sep 22, 2022

Hello. What do you have in the file "compressed", what bytes?

I got those bytes: 0b f1 0f 50 08 76 75 0e 72 0d 51 54 00 00.

The latest version of the program does not find the decryption signature?

What do you mean?

kimci86 reopened this Sep 22, 2022
@durimars
Author

Thanks.

@durimars
Author

durimars commented Sep 23, 2022

Hello.
I do not understand. :)
Archive:
UEsDBBQAAwAIAGJlNlUW4hR4OAAAAC0AAAAIAAAAT3Blbi50eHQtQNFW4Ma/Pt9+ KNa8L4AMqye/iIc1fyTgrr8xZwk7AfwV22FDFvm1Z+zbksa6Fgpsl3HZ9wD32FBL AQIUABQAAwAIAGJlNlUW4hR4OAAAAC0AAAAIAAAAAAAAAAEAIAAAAAAAAABPcGVu LnR4dFBLBQYAAAAAAQABADYAAABeAAAAAAA=
Known plaintext (first characters):
TOP SECRET!!
HEX: 54 4F 50 20 53 45 43 52 45 54 21 21
Compressed plaintext (TOP SECRET!!):
HEX: 0BF10F500876750E720D51540400922FF13B0C000000, where
922FF13B - CRC32
It turns out I can send these bytes to search for a signature:
0BF10F500876750E720D51540400
Where did I go wrong? How to carry out an attack using the information listed above?

@kimci86
Owner

kimci86 commented Sep 23, 2022

Compressing TOP SECRET!! gives bytes 0b f1 0f 50 08 76 75 0e 72 0d 51 54 04 00.
Ignoring the last two bytes because of the deflate end of stream marker and padding bits, we can use bytes 0b f1 0f 50 08 76 75 0e 72 0d 51 54 as known compressed plaintext. (This works because the file is small so the compression is predictable.)

Regarding the CRC value 922FF13B, I don't see where it comes from. The archive metadata tells a CRC value of 7814e216.
In addition, the check byte comes before compressed data, not after.
Anyway, since version 1.5.0 bkcrack loads the check byte automatically, so there is no longer any need to pass the CRC's most significant byte manually.

So this is how you can run the attack on your example archive:

$ echo -n 'TOP SECRET!!' | python3 tools/deflate.py > compressed
$ bkcrack -C archive.zip -c Open.txt -p compressed -t 12

After a couple of minutes, it gives the internal password representation. Then, getting the password:

$ bkcrack -k d087eb63 375335b3 68557e28 -r 12 ?p

Gives the password 1234567890 instantly.

@durimars
Author

Works. Thanks again.

@durimars
Author

Can I use the program to stream encrypted data?

@kimci86
Owner

kimci86 commented Sep 24, 2022

Sorry, I do not understand your question. Could you elaborate?

@durimars
Author

If I do not have an archive, but only part of it in the form of encrypted text.

@kimci86
Owner

kimci86 commented Sep 25, 2022

If you have ciphertext in a file on its own (including 12 bytes of encryption header), you can pass it with the -c argument without specifying an archive with -C.

For example, below is the result of calling xxd on your last example archive. The ciphertext is the 56 bytes at offsets 0x26 through 0x5d, from 2d 40 d1 56 up to f7 d8 (shown in bold italic in the original post).
If you have those bytes in a file, you can pass this file as the -c argument without the -C argument.

00000000: 50 4b 03 04 14 00 03 00 08 00 62 65 36 55 16 e2  PK........be6U..
00000010: 14 78 38 00 00 00 2d 00 00 00 08 00 00 00 4f 70  .x8...-.......Op
00000020: 65 6e 2e 74 78 74 2d 40 d1 56 e0 c6 bf 3e df 7e  en.txt-@.V...>.~
00000030: 28 d6 bc 2f 80 0c ab 27 bf 88 87 35 7f 24 e0 ae  (../...'...5.$..
00000040: bf 31 67 09 3b 01 fc 15 db 61 43 16 f9 b5 67 ec  .1g.;....aC...g.
00000050: db 92 c6 ba 16 0a 6c 97 71 d9 f7 00 f7 d8 50 4b  ......l.q.....PK
00000060: 01 02 14 00 14 00 03 00 08 00 62 65 36 55 16 e2  ..........be6U..
00000070: 14 78 38 00 00 00 2d 00 00 00 08 00 00 00 00 00  .x8...-.........
00000080: 00 00 01 00 20 00 00 00 00 00 00 00 4f 70 65 6e  .... .......Open
00000090: 2e 74 78 74 50 4b 05 06 00 00 00 00 01 00 01 00  .txtPK..........
000000a0: 36 00 00 00 5e 00 00 00 00 00                    6...^.....
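
Cutting those bytes out by hand is error-prone, so here is a sketch that extracts the raw (still encrypted) data of one entry with Python's standard library. The function name raw_entry_data is invented for this example. It reads the entry's position and packed size from the central directory via zipfile, then skips the 30-byte local header, the file name and the extra field.

```python
import io
import struct
import zipfile


def raw_entry_data(archive: bytes, name: str) -> bytes:
    """Return the stored bytes of an entry exactly as they sit in the archive.

    For an encrypted entry this includes the 12-byte encryption header.
    """
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
    offset = info.header_offset
    (signature,) = struct.unpack_from("<I", archive, offset)
    if signature != 0x04034B50:
        raise ValueError("local file header not found")
    # File name length and extra field length sit at bytes 26..29 of the header.
    name_len, extra_len = struct.unpack_from("<HH", archive, offset + 26)
    start = offset + 30 + name_len + extra_len
    return archive[start:start + info.compress_size]
```

For the archive dumped above, the local header is 30 bytes and the name Open.txt is 8 bytes with no extra field, so the data starts at offset 38 (0x26) and runs for the packed size of 56 bytes. Writing the result to a file gives something that can be passed with -c alone.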

I am curious to know how you end up having encrypted data without a ZIP archive. Could you explain? Is it a scenario that bkcrack could better support?

@durimars
Author

Hello. I seem to have misunderstood you again. I saved the highlighted binary data to a secret.txt file. And used the command:
bkcrack -c secret.txt -p compressed -t 12
Where did I go wrong?

@kimci86
Owner

kimci86 commented Oct 13, 2022

I don't know, that command works for me.
Can you explain precisely what steps you are taking and what is the result?

@durimars
Author

Yes, it worked. I didn't wait. But why is searching slower?

Option 1 (zip):
[08:32:52] Z reduction using 5 bytes of known plaintext
100.0 % (5 / 5)
[08:32:52] Attack on 1210520 Z values at index 6
Keys: d087eb63 375335b3 68557e28
11.0 % (132920 / 1210520)
[08:35:51] Keys
d087eb63 375335b3 68557e28

Option 2 (stream):
[08:39:53] Z reduction using 4 bytes of known plaintext
100.0 % (4 / 4)
[08:39:53] Attack on 1389021 Z values at index 7
Keys: d087eb63 375335b3 68557e28
73.8 % (1024470 / 1389021)
[09:02:24] Keys
d087eb63 375335b3 68557e28

@kimci86
Owner

kimci86 commented Oct 14, 2022

Great. The difference is that, when loading ciphertext from a ZIP archive, bkcrack automatically loads one additional byte of plaintext which is derived from ZIP metadata. (This can be disabled with the --ignore-check-byte flag.) Notice that option 1 used 5 bytes and option 2 used only 4 bytes during the Z reduction step. This additional byte makes it possible to filter out more Z value candidates (1210520 remain instead of 1389021 in your example above). Looking for the solution is then faster on average because the search space is smaller.
But the search space size difference is not so significant here. The actual time taken depends on the number of candidates tested before finding the solution, which can be anything between 1 and the number of candidates. You were lucky with option 1 because the solution was found after testing only 11 % of the search space, and not so lucky with option 2 because it needed to test 73.8 % of candidates before hitting the solution. That is why option 2 took much longer.
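
A quick back-of-the-envelope check of the two logs supports this. The sketch below divides the number of candidates tested by the elapsed time between the "Attack on ... Z values" and "Keys" timestamps. It assumes the last progress line shows the count at the moment the solution was found, which the logs do not state explicitly.

```python
from datetime import datetime


def candidates_per_second(start: str, end: str, tested: int) -> float:
    """Average number of Z candidates tested per second between two timestamps."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return tested / elapsed


zip_rate = candidates_per_second("08:32:52", "08:35:51", 132920)      # option 1
stream_rate = candidates_per_second("08:39:53", "09:02:24", 1024470)  # option 2
print(round(zip_rate), round(stream_rate))
```

Both runs come out at roughly 750 candidates per second, so the speed was about the same. The stream run simply had to test many more candidates before reaching the solution.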

@durimars
Author

Thanks for the answer! What open byte are you talking about? Can I include it in the stream?

@kimci86
Owner

kimci86 commented Oct 15, 2022

The check byte is the last byte of the file's encryption header, a chunk of 12 bytes which is put right before actual compressed data before being encrypted. It is usually the most significant byte of the original file's CRC. Depending on some flag in ZIP metadata, the check byte can be a byte of the file's last modification time instead. When loading data from a ZIP archive, bkcrack loads the check byte automatically.

The check byte can be passed manually as additional plaintext at offset -1 with -x option. Offset -1 because it is one byte just before actual compressed data before being encrypted.
In your example, you would pass the CRC's most significant byte. CRC is 7814e216 so you would use parameters -x -1 78.

But if we did not have the ZIP archive, then we could not derive the check byte from encrypted data alone. So I don't think it is ever needed to pass the check byte manually. Letting bkcrack get it from the ZIP archive is simpler.

@durimars
Author

Thank you so much for your patience and replies.
What command to use when decrypting the stream in this case:

  1. The first 5 characters are unknown.
  2. 6-8 characters are known - "ABC".
  3. 9-12 characters are unknown.
  4. 13-22 characters are known - "1234567890".
  5. The remaining characters are unknown.

@kimci86
Owner

kimci86 commented Oct 16, 2022

In this case, your plain uncompressed data starts like this:
(I put dots where it is unknown)

     Offset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
      ASCII:  .  .  .  .  .  A  B  C  .  .  .  .  1  2  3  4  5  6  7  8  9  0
Hexadecimal: .. .. .. .. .. 41 42 43 .. .. .. .. 31 32 33 34 35 36 37 38 39 30

If encrypted data was not compressed, this command would work:

$ bkcrack -c your_file -x 5 414243 -x 12 31323334353637383930
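
The -x arguments in this command can be generated from the dotted pattern. This is a small sketch with a made-up helper name. It treats "." as the marker for an unknown byte, so it cannot express a known literal dot, and it only applies to the uncompressed (stored) case.

```python
def extra_plaintext_args(pattern: str, unknown: str = ".") -> list:
    """Turn a pattern such as '.....ABC....1234567890' into -x arguments."""
    args = []
    start = None
    # A trailing unknown marker acts as a sentinel that flushes the last run.
    for i, ch in enumerate(pattern + unknown):
        if ch != unknown and start is None:
            start = i
        elif ch == unknown and start is not None:
            args += ["-x", str(start), pattern[start:i].encode("ascii").hex()]
            start = None
    return args


print(" ".join(["bkcrack", "-c", "your_file"] + extra_plaintext_args(".....ABC....1234567890")))
```

The printed line should be the same command as the one above.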

If encrypted data was compressed, this will not work and we need to guess how known data was compressed.

Let's try by making the following assumptions on the original file:

  1. The original file was small.
  2. The few unknown bytes at the beginning of the file don't have repeated patterns.

Assumption 1 means deflate compression probably uses fixed Huffman codes when compressing the original file. Those fixed Huffman codes are defined by deflate format specification and do not depend on input data (which is convenient for us).
Assumption 2 means deflate compressed data will not include pointers to duplicated strings.

This is how I would do it: compress a string that has the known bytes where we expect and characters to fill up unknown bytes without repetition.

$ echo -n "abcdeABCfghi1234567890" | python3 tools/deflate.py > compressed

Running infgen confirms the compressed string we just created uses fixed Huffman codes and does not have pointers to duplicated strings.
This is consistent with the original compressed file if our assumptions are correct.

$ infgen compressed
! infgen 3.0 output
!
last
fixed
literal 'abcdeABCfghi1234567890
end

Let's have a look at the compressed bits:

$ xxd -b -c 1 compressed

With some knowledge about deflate compression, we can find the bytes that correspond to our known bytes.
I colored bits manually and added the meaning of each group of bits on the right to show my reasoning. I do not know of a tool that does it automatically.

[Image: bit dump of the compressed file, with bits colored and annotated by the literal they encode]

So our known data corresponds to bytes at offset 6, 7, and 13 to 21 (respectively 6, 7, d and 15 in hexadecimal) in the compressed file.
Unfortunately, we cannot use known bits at offset 5, 8, 12 and 22 because bkcrack works with bytes and not bits at the moment. It is a limitation that could be overcome in the future.
So we are left with only 11 known bytes of compressed data. Another limitation is that bkcrack won't start with only 11 bytes. But, assuming your encrypted data comes from a ZIP archive, the check byte can be used so we actually have 12 bytes which is enough.

Conclusion: this command should work, if our assumptions are correct and if I did not make any mistake.
Because there might be a few solutions when only 12 bytes are available, I added option -e to look for all of them.

$ bkcrack -C archive.zip -c your_file -x 6 7472 -x 13 343236313533b7b034 -e
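
The byte offsets and values in this command can be reproduced with a short script instead of coloring bits by hand. The sketch below relies on the same two assumptions (fixed Huffman codes, literals only) and tracks, for every bit of the stream, whether it comes from a known character. The function name is invented, and this is not a feature of bkcrack.

```python
def known_compressed_bytes(filler: bytes, known) -> dict:
    """Return {offset: value} for compressed bytes built only from known bits.

    filler: uncompressed data with arbitrary non-repeating bytes at unknown positions
    known:  one boolean per byte of filler, True where the byte is really known
    """
    # Block header: BFINAL=1 and BTYPE=01 (fixed Huffman), known by assumption.
    bits = [(1, True), (1, True), (0, True)]
    for value, is_known in zip(filler, known):
        if value >= 144:
            raise ValueError("9-bit literal codes are not handled in this sketch")
        code = 0x30 + value  # 8-bit fixed Huffman code, sent most significant bit first
        bits.extend(((code >> shift) & 1, is_known) for shift in range(7, -1, -1))
    result = {}
    # Only complete bytes are considered; what follows the last literal is unknown.
    for offset in range(len(bits) // 8):
        chunk = bits[8 * offset:8 * offset + 8]
        if all(flag for _, flag in chunk):
            result[offset] = sum(bit << k for k, (bit, _) in enumerate(chunk))
    return result


pattern = ".....ABC....1234567890"
filler = b"abcdeABCfghi1234567890"
found = known_compressed_bytes(filler, [ch != "." for ch in pattern])
for offset, value in sorted(found.items()):
    print(offset, format(value, "02x"))
```

Because of the 3 header bits, every compressed byte mixes bits from two neighboring literals. A byte is therefore usable only when both neighbors are known, which is why 3 known characters yield 2 bytes and 10 known characters yield 9.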

@durimars
Author

Thank you so much!
Let me complicate the conditions, or perhaps simplify them.

  1. The stream starts with the characters "ABC", and with 99% probability they are not repeated elsewhere in the plaintext.
  2. The stream contains frequently repeated words. The words are known, but their positions are not.
  3. The data stream is large enough that static Huffman is definitely not used for compression, i.e. we have dynamic Huffman.
    As I understand it, at the beginning of the compressed data there will be a dictionary with repeated data, and it is in this block that the "repeated words" from item 2 will appear.
    Is it possible to search for "password" according to these rules? It may be necessary to run the program many times, each time changing (guessing) the search offset.

@kimci86
Owner

kimci86 commented Oct 17, 2022

If dynamic Huffman codes are used, the compressed stream starts with a Huffman tree, itself encoded in a compact way (section 3.2.7. of RFC1951 - DEFLATE Compressed Data Format Specification). This Huffman tree is a statistical model built from characters frequencies.
There is no dictionary for repeated words. The first time a frequent word or pattern appears, it is encoded as if it was not frequent. The next occurrences however are replaced by a pair of numbers (length, backwards distance) pointing to a previous occurrence. This is LZ77 algorithm.
So I don't think deriving compressed plaintext is feasible in your situation.
You might get more lucky with good old password cracking with john the ripper or hashcat.
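
To illustrate the "no dictionary" point, here is a toy LZ77 tokenizer. It is a naive greedy search written for this explanation, not zlib's actual matching strategy, and real deflate output additionally Huffman-codes these tokens. It shows that the first occurrence of a word is emitted as plain literals and only later occurrences become (length, distance) pairs.

```python
def lz77_tokens(data: bytes, min_len: int = 3, window: int = 32768, max_len: int = 258):
    """Greedy toy LZ77: returns single-byte literals and (length, distance) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_len, best_dist))  # back-reference to earlier data
            i += best_len
        else:
            tokens.append(data[i:i + 1])          # literal byte
            i += 1
    return tokens


print(lz77_tokens(b"password=1;password=2"))
```

In this example the second "password=" becomes a single pair pointing 11 bytes back, so where and how a known word shows up in the compressed stream depends on everything that precedes it. That is the reason guessing compressed plaintext for a large file is considered infeasible here.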

@kimci86
Owner

kimci86 commented Jan 25, 2023

As far as I understand, your questions have been answered so I close this. Feel free to reopen or create a new issue if you have more questions or feedback.
