Some of the team attended the BSides security conference in San Francisco this weekend (https://bsidessf.com), and we had a blast competing in capture the flag. First off, I would just like to say thank you to all the organizers, as the challenges and talent on that side of the house were awesome to see. Now, on to the challenge that would not allow my brain to stop working for the next few hours. If you enjoy a so-so story and want to hear about a lot of different compression algorithms and container formats, then please keep reading.
The challenge is named Matryoshka and reads as below:
After a lecture on files and the structure of the file system, William James was accosted by a little old lady.
“Your theory that the file system is the primary unit of storage has a very convincing ring to it, Mr. James, but it's wrong. I've got a better theory,” said the little old lady.
"And what is that, madam?" inquired James politely.
"That every file we create is just inside of an archive."
Not wishing to demolish this absurd little theory by bringing to bear the masses of computer scientific evidence he had at his command, James decided to gently dissuade his opponent by making her see some of the inadequacies of her position.
"If your theory is correct, madam," he asked, "what is this archive stored in?"
"You're a very clever man, Mr. James, and that's a very good question," replied the little old lady, "but I have an answer to it. And it is this: The first archive is stored in a second, far larger, archive."
"But what is this second archive stored in?" persisted James patiently.
To this the little old lady crowed triumphantly. "It's no use, Mr. James – it's archives all the way down."
At this point I was intrigued, especially by the name and story, and I downloaded the file.bin, which was the artifact of this challenge. The name “Matryoshka," referring to Russian nesting dolls, had me thinking that this was going to have some recursion involved. So I dove in, not realizing this was going to be how I spent the next few hours...
Since the challenge mentioned containers, I tried the obvious and ran “unzip file.bin”, which succeeded, and another file.bin emerged. One down.
I began using the file command to help identify what each file at least appeared to be, not quite realizing at the time how that command worked. Running unzip on the newly created file immediately threw an error: “unsupported compression method 95”. After some quick Googling, I was presented with a WinZip page, which explained when they had added support for this compression method, and that it referred to xz compression. After a few failed attempts, I reluctantly rebooted my Mac and installed WinZip, which gave me lots of ads and still failed to unzip. Then I thought of 7zip, which has always served me well, so I pulled that down and was able to extract the next file.bin, no problem. Two down.
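In hindsight, there was no need for WinZip at all just to see what tripped unzip up: Python's stdlib zipfile will happily report each member's compression-method id even when it can't decompress the data. A quick sketch (the example filenames are placeholders, not the challenge's contents):

```python
import zipfile

def list_methods(path):
    """Map each archive member to its compression-method id.
    Method 95 is the XZ id from the zip spec (PKWARE APPNOTE),
    which stock unzip doesn't support -- hence the error."""
    with zipfile.ZipFile(path) as z:
        return {info.filename: info.compress_type for info in z.infolist()}
```

For the second layer of this challenge, this would presumably have reported method 95 for the nested file.bin, pointing straight at xz without any Googling.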
The next few were pretty easy, with native support in common unzipping tools. At some point during this, I attempted a recursion script to detect each file type with the file command and perform the appropriate action, but it was unsuccessful because so many different compression formats were used to create this endless challenge. Back to the files: number three was zip, number four was compress’d, number five was gzip, number six was a tar file, and number seven was a RAR split archive that needed to be joined, which resulted in, you guessed it, another file named file.bin. That one read as a Windows imaging (WIM) image, but it also extracted with 7zip and got me to eight completed steps.
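For the curious, the idea behind that script was roughly the following. This is a minimal sketch, not my actual script: my version shelled out to the file command for identification, while here stdlib checks stand in for it so it runs anywhere; it handles only a couple of formats and bails on anything it doesn't know, which on this challenge was often.

```python
import gzip, shutil, tarfile, zipfile
from pathlib import Path

def extract_once(path, dest):
    """Peel one layer of `path` into `dest`; return True on success."""
    path, dest = Path(path), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(path):
        zipfile.ZipFile(path).extractall(dest)
    elif tarfile.is_tarfile(path):
        tarfile.open(path).extractall(dest)
    elif path.read_bytes()[:2] == b"\x1f\x8b":          # gzip magic bytes
        with gzip.open(path) as src, open(dest / "file.bin", "wb") as out:
            shutil.copyfileobj(src, out)
    else:
        return False            # DACT, frozen files, ACE... no handler
    return True

def peel(path, workdir):
    """Keep extracting file.bin layers until nothing recognizable is left;
    return how many layers came off."""
    layer = 0
    current = Path(path)
    while extract_once(current, Path(workdir) / f"layer{layer}"):
        layer += 1
        current = Path(workdir) / f"layer{layer - 1}" / "file.bin"
    return layer
```

The loop assumes every layer yields another file.bin, which held true for this challenge; it was the long tail of obscure formats, not the loop itself, that killed the approach.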
The next file, the file command reported simply as “data.” This sent me down a rabbit hole, first researching compression algorithms on Wikipedia, and then everything else under the sun, because I could not determine the file type. I tried hex editors, the strings command, and various other techniques without much success, before I just started trying random tools. The file started with a string that said “DCT*”, but that also yielded no results. So I decided to learn how the file command had actually determined everything else I had discovered so far. I began reading about the “magic number,” the signature at the start of most files that is unique to each file type. Armed with that, I went looking for a tool with a bigger signature database (because when in doubt, go for the scattershot) and ended up using http://mark0.net/soft-trid-e.html.
This told me that the file was in DACT (Dynamic Adaptive Compression Tool) format, and I found more information here: http://www.rkeene.org/oss/dact/. After a quick compile from source, the file was decompressed. At this point I was thinking, “Surely it must be almost done, because that took way too long,” but, again, another file. The next few files were pretty straightforward, with each being more obscure than the previous one.
At this stage in the game, I was in my hotel room a couple hours before the conference was to begin again Monday morning, and was in it for the long haul. I spun up an Ubuntu server in Amazon Web Services and moved my file.bin up there.
Next up, ASCII cpio was the 11th container, followed by frozen file 2.1, which is my go-to file compression tool, in case you were wondering (sarcasm). For your reading pleasure, check out the man page (https://linux.die.net/man/1/unfreeze) and the accompanying obscure RPM (https://www.rpmfind.net/linux/RPM/dag/redhat/el6/x86_64/freeze-2.5.0-3.el6.rf.x86_64.html), which I threw on another temp server before running “unfreeze < file.bin > file”, which then identified as an xar file. I pulled down the source code and compiled that one on my Mac. As I write this up, I am far more organized in my approach, so don’t ask me why I didn’t just use 7zip or the equivalent package on Ubuntu at the time. Compiling from source was the obvious answer then.
The next few were more of the same: find obscure compression tool, find source code or package, install and decompress another file.bin.
This site, http://www.webutils.pl/index.php?idx=binhex, took care of the eighteenth file, a BinHex, no problem. Then I used The Unarchiver on my Mac to handle the resulting CAB file, as well as the subsequent StuffIt archive. The next one of interest was an HFS Plus volume. I installed hfsprogs and the appropriate linux-image-extra package, as the server did not come with the extra filesystem drivers. I just knew this was it: inside would be a flag.txt, and I would be on to the next challenge. But, alas, file.bin was the only thing my ls command returned. At this point, I could only laugh to myself, since I was still sitting in my hotel room and hadn’t made it back to the conference.
Untarring the next one dumped out several PAR2 files. After poking around, I switched over to Windows; by this point I was not picky about what worked, I just wanted to make something that felt like headway.
This screen grab is from the parity restore, because we need a picture here to break up my rambling… This was number 26 (if you’ve stopped counting), and I was starting to think the lady in the story from earlier was actually right.
Zoo, xz, lzop, ARC, and LHarc were no problem: either packages were readily available, or 7z handled them natively on Linux. ACE was a little tricky, as the version readily available in a repo is 1.2. For an archive this old, that is an even older version, and it just pretended to extract, producing no actual data. Once I had version 2.5 of ACE, it decompressed, and I got another filesystem, this time Squashfs, which was easy enough. The next file was a ZPAQ file, so I pulled down and compiled the code from http://mattmahoney.net/dc/zpaq.html, and then extracted the enclosed ar file. cpio, LZMA, and FLAC were next, and that was when I knew I was near the end. The FLAC file was making odd sounds, and I knew it must hold the key. One final decompression, the thirty-ninth file, and I had it. The FLAC file decompressed to WAV format, and catting the resulting file gave the content below:
RIFF?WAVEfmtdataw.. - ... - .... . .. -. -.-. .-. . -.. .. -... .-.. . ... .... .-. .. -. -.- .. -. --. -- --- .-. ... . -.-. --- -.. .
I had a string of Morse code, and the flag was solved.
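After thirty-nine layers of obscure decompression tools, decoding the Morse itself is the easy part; a quick lookup-table sketch does the job (I'll leave the decoded flag as an exercise):

```python
# International Morse table for letters (sufficient for this flag).
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}

def decode(signal):
    """Translate space-separated dots and dashes into letters."""
    return "".join(MORSE[symbol] for symbol in signal.split())
```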
So, in summary, what we’ve learned here is that if anyone ever asks me to decompress something for them, the only thing I’ll be able to say is, “Well, let me tell you a story about an old lady…”
Finally, here's the breakdown of how the "file" command interpreted each file:
Zip archive data, at least v2.0 to extract
RAR archive data
Zip archive data, at least v2.0 to extract
compress'd data 16 bits
gzip compressed data
RAR archive data, v80, flags: Commented, Authenticated
Windows imaging (WIM) image
DACT (Dynamic Adaptive Compression Tool)
ASCII cpio archive (SVR4 with CRC)
frozen file 2.1
xar archive - version 1
rzip compressed data - version 2.1 (5438 bytes)
LZ4 compressed data (v1.4+)
ASCII cpio archive (pre-SVR4 or odc)
7-zip archive data, version 0.4
BinHex binary text, version 4.0
Microsoft Cabinet archive data, 4712 bytes, 1 file
lzip compressed data, version: 1
bzip2 compressed data, block size = 900k
Macintosh HFS Extended version 4 data
ARJ archive data, v11, slash-switched, original name: , os: Unix
POSIX tar archive (GNU)
Parity Archive Volume Set
Zoo archive data, v2.10, modify: v2.0+, extract: v1.0+
XZ compressed data
lzop compressed data - version 1.030, LZO1X-999, os: Unix
ARC archive data, packed
LHarc 1.x/ARX archive data [lh0]
ACE archive data version 20, from Win/32, version 20 to extract, contains AV-String (unregistered), solid
Squashfs filesystem, little endian, version 4.0
current ar archive
LZMA compressed data, streamed
FLAC audio bitstream data, 8 bit, mono, 119 samples
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 1 Hz