To hack a file, we first need to know how to recognize the data inside it. The easiest way to do that is with a hex editor, which you can pick from the list here.
I quickly whipped up simple programs in four different languages that all wrote the same amount of data in the following order: int, float, double, bool, char, str, and byte. Then I did the logical thing and went looking for patterns across the files in HxD (the hex editor I use). This is what I got:
Hacking | C#: L : P | Java: L : P | VB.Net: L : P | C++: L : P
- Int | 4 : n/a | 2 : n/a | 4 : n/a | 4 : n/a
- Float | 4 : ?-(3F-4F) | 5 : (3F-4F)-? | 4 : ?-(3F-4F) | 4 : ?-(3F-4F)
- Double | 8 : ?-(3F-4F) | 9 : (3F-4F)-? | 8 : ?-(3F-4F) | 8 : ?-(3F-4F)
- Boolean | 1 : 00, 01 | 1 : 00, 01 | 1 : 00, 11 | 1 : 00, 01
- Char | 1 : n/a | 3 : 00-?-00 | 1 : n/a | 1 : n/a
- String | ? : 0C-?, 0D-? | ? : 000C-?-00 | ? : 0C-?, 0D-? | ? : 0C-?, 0D-?
- Byte | 1 : n/a | 2 : ?-00 | 1 : n/a | 1 : n/a
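- L : Length in bytes
- P : Pattern of recognizable hex bytes
- ? : Unknown length or collection of hex digits
- - : Append
- The main interpreters for Ruby, Python, and Perl are written in C, so I consider them covered by the two C-family languages I already tested.
- The languages most commonly used across platforms are C and Java.
- Java reads and writes file data in big-endian byte order, while C-family programs on common PC hardware write little-endian. That is why the float and double marker byte comes first in the Java column and last in the others.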
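The float and double rows can be reproduced without the original test programs. The following is a minimal sketch using Python's standard `struct` module, which lets you pick the byte order explicitly. The sample value `1.0` and the helper `hex_bytes` are assumptions for illustration, not taken from the original files. It shows the marker byte landing last in little-endian output and first in big-endian output, matching the `?-(3F-4F)` and `(3F-4F)-?` patterns in the table.

```python
import struct

def hex_bytes(data: bytes) -> str:
    """Format bytes the way a hex editor shows them, e.g. '00 00 80 3F'."""
    return " ".join(f"{b:02X}" for b in data)

value = 1.0  # arbitrary sample value, not from the original test files

float_le = struct.pack("<f", value)   # 4-byte float, little-endian
float_be = struct.pack(">f", value)   # 4-byte float, big-endian
double_le = struct.pack("<d", value)  # 8-byte double, little-endian
double_be = struct.pack(">d", value)  # 8-byte double, big-endian

print(hex_bytes(float_le))   # 00 00 80 3F
print(hex_bytes(float_be))   # 3F 80 00 00
print(hex_bytes(double_le))  # 00 00 00 00 00 00 F0 3F
print(hex_bytes(double_be))  # 3F F0 00 00 00 00 00 00
```

Trying a few other everyday values (0.5, 100.0, 12345.678) is a quick way to see how far the marker byte drifts within the 3F-4F range.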
C# | Length : Patterns | (Found in homebrewed editor)
- Intgr: 4bytes : N / A
- Float: 4bytes : 3D, 3E, 3F-4F
- Doubl: 8bytes : 3F-4F
- Booln: 1bytes : 00, 01
- Chars: 3bytes : E0-EF
- Bytes: 1bytes : N / A
- Strng: ?bytes : 0E, 0C
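One plausible reading of the `0C`, `0D`, and `0E` string patterns is a length prefix: the byte says how many bytes of text follow (0x0C = 12). That reading is an assumption, not something the tables confirm. Under it, a string field could be read roughly like this. `read_prefixed_string` is a hypothetical helper, and the one-byte prefix, the two-byte big-endian prefix, and the UTF-8 encoding are all assumed layouts.

```python
import struct

def read_prefixed_string(data: bytes, offset: int = 0, prefix_size: int = 1):
    """Read a string stored as <length><text>, starting at offset.

    prefix_size=1 assumes a single length byte; prefix_size=2 assumes a
    2-byte big-endian length. Both layouts are assumptions for illustration.
    Returns (text, offset_of_next_field).
    """
    if prefix_size == 1:
        length = data[offset]
    elif prefix_size == 2:
        (length,) = struct.unpack_from(">H", data, offset)
    else:
        raise ValueError("prefix_size must be 1 or 2")
    start = offset + prefix_size
    end = start + length
    if end > len(data):
        raise ValueError("length prefix points past the end of the data")
    return data[start:end].decode("utf-8"), end

sample_one = b"\x0cHello, World"      # 0x0C = 12, then 12 bytes of text
sample_two = b"\x00\x0cHello, World"  # 00 0C = 12 as a 2-byte length

print(read_prefixed_string(sample_one))                 # ('Hello, World', 13)
print(read_prefixed_string(sample_two, prefix_size=2))  # ('Hello, World', 14)
```

If the decoded text comes out as garbage, or the length runs past the end of the file, the byte was probably not a length prefix after all.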
You may have noticed that some of the patterns, for 'Int' for example, are written as 'n/a'. That is because an integer has no really recognizable hex pattern; all we know is that it is either 2 or 4 bytes long. But, being a programmer, I know that integers can never have a fractional part: they are whole numbers, so '0' and never '0.0'. So if I read those 2 to 4 bytes and they only make sense as a decimal value, I know they were not an integer.
Bytes and chars are a bit more complex: a char is basically what you would see in ASCII as a visible character or a space, while a byte is basically what you are seeing in hex representation in a hex editor. Usually, though, the pattern for those data types isn't in the data itself but just before or after the collection, or not in the file at all but in the program that created it (most commonly it is one of the first two). I'll leave finding that pattern to you, as that is all part of file hacking anyway...
Using the patterns I provided above, you should be able to work out the file's data structure fairly easily. The only thing left then is to figure out what the data means and/or how to use it.
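Strictly speaking, any 4 bytes will decode as some integer, so in practice the int-versus-float question comes down to which reading looks plausible. Here is a minimal sketch of that check, assuming little-endian data and borrowing the 3D-4F range from the C# table as a heuristic. `decode_both_ways` and `looks_like_float` are hypothetical helpers, and the sample values are made up.

```python
import struct

def decode_both_ways(chunk: bytes):
    """Decode the same 4 little-endian bytes as a signed int and as a float."""
    (as_int,) = struct.unpack("<i", chunk)
    (as_float,) = struct.unpack("<f", chunk)
    return as_int, as_float

def looks_like_float(chunk: bytes) -> bool:
    """Heuristic only: last byte of a little-endian float falls in 3D-4F."""
    return len(chunk) == 4 and 0x3D <= chunk[3] <= 0x4F

int_bytes = struct.pack("<i", 1000)   # E8 03 00 00
float_bytes = struct.pack("<f", 2.5)  # 00 00 20 40

for chunk in (int_bytes, float_bytes):
    as_int, as_float = decode_both_ways(chunk)
    # Print the raw bytes, the heuristic's verdict, and both readings
    print(chunk.hex(" "), looks_like_float(chunk), as_int, as_float)
```

The heuristic can misfire: a large integer whose top byte happens to fall in the same range will also pass, so treat a match as a hint and check the neighboring fields before trusting it.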
Other interesting facts:
- Most files, especially those created by professional programs, have 'headers' (a byte or string signature that identifies the file). Headers mainly help prevent program errors from reading in the wrong file, since there are many file extensions out there and some are reused by several formats. When hacking, headers are usually checked to determine whether the file is compressed and, if so, which type of decompression to apply before hacking any embedded files further. I don't currently know of any other reason for hacking them besides comparing files with different extensions.
- To better understand unfamiliar files and their structures, it is best to understand familiar ones first, for example picture, movie, or model files. Once you are familiar with one structure, the next won't be as difficult, because you'll know what to look for and where.
- Remember also: if it worked for one file of a given type, it should work for another; otherwise, you'll need to go back to the structure drawing board until you get it right, or at least right enough to satisfy your needs.
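The header check described above can be sketched in a few lines. This is a minimal example, not a complete detector: the signature table only covers a handful of well-known formats (PNG, gzip, ZIP, GIF), and `SIGNATURES` and `identify_header` are hypothetical names chosen for illustration.

```python
import gzip

# A few well-known file signatures ("magic numbers") found at offset 0
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\x1f\x8b": "gzip-compressed data",
    b"PK\x03\x04": "ZIP archive",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}

def identify_header(data: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

compressed = gzip.compress(b"some embedded file contents")
print(identify_header(compressed))           # gzip-compressed data
print(identify_header(b"GIF89a" + bytes(8))) # GIF image
print(identify_header(b"no header here"))    # unknown

# Once the compression type is known, the payload can be unpacked
if identify_header(compressed) == "gzip-compressed data":
    print(gzip.decompress(compressed))
```

In practice you would read only the first few bytes of the file (for example `open(path, "rb").read(8)`) and pass those to `identify_header`.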
Good Luck!