By Tyler W
January 21, 2024
The Playful Art of Steganography
Steganography is nothing new, but in terms of digital evidence it is something that has intrigued me, and it can be a true challenge to uncover. Steganography is the process of hiding one message inside of another. The most common means of doing this is to embed a file, such as a zip or text file, inside of an image, movie or audio file. The impact on the 'cover file' is minimal, and identifying these files can take some time and effort, especially when not all tools handle them in the same manner. This is not a common occurrence in my investigations; however, it occurred recently, and I thought it would be worth exploring further. It is possible to hide files using any operating system, but given that Windows is the most common, we will hide a file inside of another using Windows, and then test and explore the result with Linux.
I have created a folder on my desktop called 'Stegano' and imported a standard image file:
We have also confirmed the properties of this original file (pay particular attention to the file size, as this will be important later), but we can see there is nothing strange regarding this file.
Secondly, we will create a txt file containing some text that we will later embed inside the image. Generally, any file type can be embedded into another: it could be, for example, one image inside another, an encrypted document, anything really. For now we can confirm our simple text file has been created and has data within it. At this point there is nothing overly secretive or important within this document or file structure.
Let's now work to embed the file hidden_data.txt inside the Waterfall.jpg file. Within Windows we do not need to install any special software; we just use the copy command with the /b switch so that Windows treats these files as binary. The bytes are joined exactly as they are, and the file types are effectively ignored for the purpose of this operation.
We run the command copy /b "Waterfall.jpg" + "hidden_data.txt" "notsuspicious.jpg" and the new file (notsuspicious.jpg), which we specify at the end of the command, will be created.
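For readers who are not on Windows, the same byte-level join can be sketched in Python. This is a minimal illustration rather than a replacement for the copy command: the function name append_payload is our own invention, and the file names in the usage note are simply the ones used in this walkthrough.

```python
from pathlib import Path


def append_payload(cover: Path, payload: Path, output: Path) -> int:
    """Write the cover file's bytes followed by the payload's bytes.

    Mirrors what `copy /b cover + payload output` does: a plain binary
    concatenation, with no regard for either file's type.
    Returns the size of the output file in bytes.
    """
    data = cover.read_bytes() + payload.read_bytes()
    output.write_bytes(data)
    return len(data)
```

Calling append_payload(Path("Waterfall.jpg"), Path("hidden_data.txt"), Path("notsuspicious.jpg")) should produce a file whose size is the sum of the two inputs, which is the same size difference we look at in the file properties.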
We can confirm the creation of the new file, notsuspicious.jpg:
We are also able to open this file as normal by navigating to it in Windows Explorer and opening it as we would any other file. We open both files, the original Waterfall.jpg file and the notsuspicious.jpg file, and can see that they look identical.
As we can see from the properties, there is no visual difference and the size of the file is very similar, with the hidden data file size of 368 bytes being the difference (see image below). The size variance depends on the size of the data hidden in the image, but without the original image to compare against, it may be difficult to spot a file size anomaly. A tell-tale sign of an embedded file may be a file that is implausibly large for its type, e.g. an image file in the GB range. Locating these is difficult, especially if an audio or movie file has been used to hide / embed data, as large files of those types are not uncommon. Generally, we would need to have some level of suspicion, or user behaviour suggesting that such a tactic may have been deployed.
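When the original cover file is available, the comparison does not have to stop at file sizes. A rough sketch, assuming the suspect file was produced by simple appending as in our example (the helper name appended_bytes is hypothetical):

```python
from typing import Optional


def appended_bytes(original: bytes, suspect: bytes) -> Optional[bytes]:
    """Return whatever follows the original's bytes inside the suspect file.

    Returns None when the suspect does not begin with the original, which
    means it was not created by simply appending data to that cover file.
    An empty result means the two files are byte-for-byte identical.
    """
    if suspect.startswith(original):
        return suspect[len(original):]
    return None
```

Reading both files with Path.read_bytes() and passing them in would, for our example, be expected to return the 368 bytes of the text file. It only works when the untouched original is at hand, which is rarely the case in a real investigation.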
Identifying these types of hidden and embedded files is inconsistent, to say the least. In our experience, different software solutions have different strike rates depending on the type of file that has been embedded.
One of the common solutions we use when checking files for embedded / hidden data is binwalk. To test for the presence of embedded data, we copy the file to our Linux box and check it with binwalk, which returns the following outcome.
As we can see from this output, there is no indication that there is anything hidden in this image. Binwalk has proven successful in locating compressed or zip files within evidence, but in this instance it has provided no leads or pivot points to suggest that the image has anything suspicious within it.
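One likely explanation is that tools of this kind search for known file signatures ('magic bytes'), and a short plain text file does not have one. The sketch below illustrates the idea only; it is not how binwalk is implemented, and the signature table is a small hand-picked sample.

```python
from typing import List, Tuple

# A few well-known file signatures; real tools ship far larger databases.
SIGNATURES = {
    "zip": b"PK\x03\x04",
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "gzip": b"\x1f\x8b\x08",
}


def scan_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, name) for every known signature found in data."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

An appended zip archive would show up as a hit at the offset where the image data ends, whereas appended plain text produces no hits at all, which is consistent with what we saw above.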
We can further test with another solution, called steghide.
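We have a similar outcome using this tool and, as we can see, we are not able to locate any hidden data with it either (noting that no passphrase was used to protect the data in the creation phase, so this is not the cause of the failure). This is not entirely surprising: steghide is designed to extract data that was embedded with steghide itself, whereas our text was simply appended to the end of the image.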
For data that cannot be easily identified, specialist forensic software will be required to identify it. In this instance, we can see that my forensic software has located the additional data after the file in question was added as evidence and processed accordingly (we will not detail these steps as the software is proprietary and outside the scope of this example). What is relevant, though, is that it was able to identify that the notsuspicious.jpg file actually contained additional information, with the 'trailing data' evidence item located:
The file thumbnail.jpg is the notsuspicious.jpg file (thumbnail version) we have been inspecting all along. The trailing data, however, is new and interesting, and clicking on it presents the following data (the plain text translation of the hex data proves our suspicions).
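We do not know how the forensic software does this internally, but the general idea of 'trailing data' can be sketched: a JPEG ends at its End Of Image marker (FF D9), so anything after that marker is not part of the picture. The code below is a simplified illustration under that assumption. The function names are our own, it only handles the baseline JPEG segment layout, and some cameras and editors legitimately append data after the marker, so a hit is a lead rather than proof.

```python
from typing import Optional


def find_jpeg_end(data: bytes) -> Optional[int]:
    """Return the offset just past the EOI marker, or None if parsing fails.

    Segments are walked by their declared lengths so that an FF D9 inside
    an embedded thumbnail (stored in an APPn segment) is not mistaken for
    the end of the image.
    """
    if not data.startswith(b"\xff\xd8"):  # SOI marker
        return None
    n = len(data)
    pos = 2
    while pos + 1 < n:
        if data[pos] != 0xFF:
            return None  # a marker was expected here
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:  # EOI
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # no length field
            pos += 2
            continue
        if pos + 4 > n:
            return None
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        if seg_len < 2:
            return None
        pos += 2 + seg_len
        if marker == 0xDA:  # SOS: compressed image data follows the header
            while True:
                pos = data.find(b"\xff", pos)
                if pos == -1 or pos + 1 >= n:
                    return None
                nxt = data[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2  # stuffed byte or restart marker
                elif nxt == 0xFF:
                    pos += 1  # fill byte
                else:
                    break  # a real marker starts at pos
    return None


def trailing_data(data: bytes) -> Optional[bytes]:
    """Return the bytes after the end of the JPEG, or None if not parsable."""
    end = find_jpeg_end(data)
    return None if end is None else data[end:]


def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex and printable-ASCII columns."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart} {text}")
    return "\n".join(lines)
```

Passing the bytes of notsuspicious.jpg to trailing_data and the result to hexdump would be expected to show the hex and plain text of hidden_data.txt side by side, much like the view in the forensic tool.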
Through this entire process we have been able to hide data by embedding a file into an image, and without specialist detection software this could go largely undetected (although for malicious files an antivirus may pick up the fact that file signatures are not aligned). If there was enough evidence for an investigator to know that this image contained an embedded file, there are other tools that could be used. However, it is possible to see here that some simple scans using open source software failed to detect the hidden file. The forensic software was able to carve out this file and deliver a plain text version of it. Regardless of the file type, the embedded data could be obtained by the forensic software, although further analysis and effort may be required in order to fully recover the hidden information - for the purpose of this example a simple .txt file was embedded.
The ability to embed information within information, steganography, is a very useful tool, especially in censored areas; however, it is also something that is used for nefarious reasons. There may be legitimate reasons to use a technique like this, but again, it should be used responsibly, and for good. In my real world application it was the presence of out of context images in a file directory that seemed peculiar and prompted this exploration, with some evidence uncovered as a result (an embedded PDF of an email print out).
If something seems odd, why not pull at that thread...