An Exploration of PNGs
This project originally started because I wanted a way to extract pixel data from a PNG, transform it, and export it as another PNG. For this task, I needed to understand what a PNG was made of, so I could take it apart and put it back together. For starters, what is a PNG? A PNG is a binary file made up of blocks (the specification calls them chunks), and each block has a length, a type, data, and a CRC code. PNGs turned out to be much more complicated than I originally imagined, with many different rules and configurations. I will not exhaustively cover the contents of the PNG standard (nor would I be able to), but I will cover enough that you could build a small PNG decoder of your own!
Detecting PNGs & Extracting Blocks
A good starting point is this: how do we know that a stream of binary data is actually a PNG? Every PNG starts with the same eight bytes, known as magic numbers (the specification calls them the PNG signature): 89 50 4e 47 0d 0a 1a 0a. If these are the first eight bytes of the stream, in all likelihood we have our PNG! Now for the blocks. As mentioned above, each block in a PNG consists of four parts, in this order: length, type, data, and CRC. Length, type and CRC are always four bytes each, while the length of the data is given by the four length bytes (read as a big-endian unsigned integer). The type is four ASCII letters, and the CRC is a checksum computed over the type and data bytes. Blocks fall into two categories, critical and ancillary. Ancillary chunks are not essential for rendering the PNG, so I will ignore them completely. Critical chunks are essential, and I'll list them here.
- IHDR - Defines the metadata for an image: bit depth, color type, width and height
- PLTE - Necessary for color type three; contains the list of colors
- IDAT - Contains the compressed and filtered pixel data
- IEND - Denotes the end of the PNG; always the last block to appear
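Both the signature check and the critical/ancillary split take only a few lines. Below is a minimal sketch in Node.js (the environment whose zlib library comes up later); `hasPngSignature` and `isCriticalChunk` are hypothetical helper names, not library functions. The second one relies on a rule from the PNG specification: a chunk is critical when the first letter of its type is uppercase, which is why IHDR, PLTE, IDAT and IEND all start with a capital.

```javascript
// The eight "magic number" bytes (the PNG signature) that begin every PNG.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// True if the buffer starts with the PNG signature.
function hasPngSignature(buf) {
  return (
    buf.length >= PNG_SIGNATURE.length &&
    buf.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Bit 5 (0x20) of the first type byte is the spec's "ancillary bit":
// clear (uppercase letter) means critical, set (lowercase letter) means ancillary.
function isCriticalChunk(type) {
  return (type.charCodeAt(0) & 0x20) === 0;
}

console.log(hasPngSignature(Buffer.from("89504e470d0a1a0a", "hex"))); // true
console.log(isCriticalChunk("IDAT"), isCriticalChunk("tEXt")); // true false
```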
Let's start with a very basic example: a single red pixel, taken directly from Wikipedia's PNG article. The entire PNG is just this sequence:
89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 00 00 00 01 00 00 00 01 08 02 00 00 00 90 77 53 DE 00 00 00 0C 49 44 41 54 08 D7 63 F8 CF C0 00 00 03 01 01 00 18 DD 8D B0 00 00 00 00 49 45 4E 44 AE 42 60 82
Wow! That seems like a lot of information for one pixel. Really, though, most of it is the structure every PNG needs, not the pixel data itself. Let's break it down.
- Magic numbers - 89 50 4e 47 0d 0a 1a 0a
- IHDR - 00 00 00 0d 49 48 44 52 00 00 00 01 00 00 00 01 08 02 00 00 00 90 77 53 de
- IDAT - 00 00 00 0c 49 44 41 54 08 d7 63 f8 cf c0 00 00 03 01 01 00 18 dd 8d b0
- IEND - 00 00 00 00 49 45 4e 44 ae 42 60 82
Notice how the above has no PLTE block? It isn't needed, since this PNG uses color type two (Truecolor).
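For comparison, here is a minimal sketch of what a decoder would do with a PLTE block if one were present (color type three). It assumes a bit depth of 8, so each unfiltered pixel byte is one palette index; `parsePalette` and `applyPalette` are hypothetical helper names.

```javascript
// PLTE data is a list of 3-byte (R, G, B) entries.
function parsePalette(plteData) {
  if (plteData.length % 3 !== 0) throw new Error("PLTE length must be a multiple of 3");
  const palette = [];
  for (let i = 0; i < plteData.length; i += 3) {
    palette.push([plteData[i], plteData[i + 1], plteData[i + 2]]);
  }
  return palette;
}

// Assuming bit depth 8: each pixel byte is an index into the palette,
// so expand the indices into R, G, B bytes.
function applyPalette(indices, palette) {
  const rgb = [];
  for (const index of indices) {
    const entry = palette[index];
    if (!entry) throw new Error(`Palette index ${index} out of range`);
    rgb.push(...entry);
  }
  return rgb;
}

const palette = parsePalette(Buffer.from([0xff, 0x00, 0x00, 0x00, 0x00, 0xff])); // red, blue
console.log(applyPalette([1, 0], palette)); // [ 0, 0, 255, 255, 0, 0 ]
```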
How did we separate out those blocks? Applying what we know about the block layout from above, we find that after the eight bytes of magic numbers we have 00 00 00 0d, followed by 49 48 44 52, which we know to be the length and type respectively. The first sequence is the number 13 (0x0d), so the data is 13 bytes long, and the second is the ASCII text IHDR. Since we know the length of the data, and that the CRC is always four bytes, we know exactly where this block ends and the next one begins, and we can repeat the process until we reach IEND. I recommend using a tool like the PNG chunk inspector listed in the resources below to analyze different PNGs.
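That walk can be written as a loop. Here is a minimal Node.js sketch that reads each block's length and type, slices out the data and CRC, and jumps to the next block; `readChunks` is a hypothetical name, and the sketch assumes the buffer has already passed the signature check. It does not verify the CRCs.

```javascript
// Splits a PNG into its blocks (chunks). Each one is laid out as:
//   4-byte big-endian length | 4-byte ASCII type | `length` data bytes | 4-byte CRC
function readChunks(png) {
  const chunks = [];
  let offset = 8; // skip the eight magic-number bytes
  while (offset + 12 <= png.length) {
    const length = png.readUInt32BE(offset);
    const type = png.toString("ascii", offset + 4, offset + 8);
    const dataEnd = offset + 8 + length;
    if (dataEnd + 4 > png.length) throw new Error(`Truncated ${type} chunk`);
    chunks.push({
      type,
      length,
      data: png.subarray(offset + 8, dataEnd),
      crc: png.readUInt32BE(dataEnd),
    });
    offset = dataEnd + 4; // skip past the CRC to the start of the next block
    if (type === "IEND") break;
  }
  return chunks;
}

// The single red pixel from Wikipedia, grouped like the breakdown above.
const redPixel = Buffer.from(
  "89504e470d0a1a0a" +
    "0000000d" + "49484452" + "00000001000000010802000000" + "907753de" +
    "0000000c" + "49444154" + "08d763f8cfc0000003010100" + "18dd8db0" +
    "00000000" + "49454e44" + "ae426082",
  "hex"
);

for (const chunk of readChunks(redPixel)) {
  console.log(chunk.type, chunk.length);
}
// IHDR 13
// IDAT 12
// IEND 0
```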
Getting Information from a PNG
With our blocks separated, we can set aside the magic numbers and IEND, because these will always be the same. Focusing first on IHDR, the first two four-byte sets after the type, 00 00 00 01 and 00 00 00 01, are the width and height respectively. Both are one, hence a single pixel. The next byte, 08, means the bit depth of the image is eight bits per channel. 02 means color type two, or Truecolor (R, G, B per pixel). The last three bytes are the compression method and filter method, which are always zero, and the interlace method, which I won't be covering.
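Reading those fields in code comes down to reading big-endian integers, since PNG stores multi-byte numbers most significant byte first. A minimal Node.js sketch; `parseIHDR` and its field names are my own labels, not a library API.

```javascript
// Decodes the 13 data bytes of an IHDR block. PNG stores multi-byte
// integers big-endian (most significant byte first).
function parseIHDR(data) {
  if (data.length !== 13) throw new Error("IHDR data must be exactly 13 bytes");
  return {
    width: data.readUInt32BE(0),
    height: data.readUInt32BE(4),
    bitDepth: data[8],
    colorType: data[9],
    compressionMethod: data[10], // always 0 (zlib/deflate)
    filterMethod: data[11],      // always 0 (the five row filter types)
    interlaceMethod: data[12],   // 0 = none, 1 = Adam7
  };
}

// The IHDR data from the red pixel.
const ihdr = parseIHDR(Buffer.from("00000001000000010802000000", "hex"));
console.log(ihdr.width, ihdr.height, ihdr.bitDepth, ihdr.colorType); // 1 1 8 2
```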
Focusing now on the IDAT chunk, we can do our usual routine to separate out the data, which is 08 d7 63 f8 cf c0 00 00 03 01 01 00. IDAT data is compressed with the deflate algorithm and wrapped in a zlib stream. I'm not going to claim to know much about this part; what matters is inflating (decompressing) the data, and I used the Node zlib library for this. As a side note, if a PNG has multiple IDAT blocks, concatenate their data in order, then inflate the combined buffer. OK, so what do we get after inflating? We get 00 ff 00 00! This is starting to look much more like actual pixel data.
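In Node, the inflate step is a single call to `zlib.inflateSync`, which expects exactly this zlib-wrapped format. A minimal sketch; `inflateIdat` is a hypothetical helper that takes the data fields of every IDAT block, in file order, and concatenates them before inflating.

```javascript
const zlib = require("zlib");

// IDAT data is a zlib stream (a small header, deflate-compressed data, and a
// checksum). With several IDAT blocks, concatenate their data fields in file
// order and inflate the combined buffer once.
function inflateIdat(idatDataParts) {
  return zlib.inflateSync(Buffer.concat(idatDataParts));
}

// The IDAT data from the red pixel.
const idatData = Buffer.from("08d763f8cfc0000003010100", "hex");
console.log(inflateIdat([idatData])); // <Buffer 00 ff 00 00>

// Splitting the same stream across two blocks inflates to the same bytes.
console.log(inflateIdat([idatData.subarray(0, 5), idatData.subarray(5)])); // <Buffer 00 ff 00 00>
```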
Processing Information from a PNG
To get the pixel data out of a PNG, we have to start by undoing the filter. What is filtering? It is a technique that transforms the image data so that it compresses more efficiently. Filters are applied per row, and there are five filter types, numbered 0 to 4; a PNG may use different filter types on different rows. The filter type is always stored in the first byte of each row. Filter 0 (which is that first byte from our red pixel!) means that no filtering has taken place, so the remaining bytes map one to one onto the pixel data: ff 00 00, or pure red. Congratulations! That means that for non-indexed color types where every row uses filter 0, you can now decode PNGs! That might seem narrow, but it puts us on a path to a full decoder.
The other filters store each byte as the difference from a prediction, so undoing them means adding the prediction back, with the sum wrapping around modulo 256. Filters work on bytes rather than whole pixels, so "left" means the corresponding byte of the previous pixel and "above" means the same byte in the previous row. Filter 1 (Sub) adds the byte to the left, filter 2 (Up) adds the byte above, and filter 3 (Average) adds the average of those two, rounded down. Filter 4 (Paeth) is a little more complicated: it computes p = left + above - upper-left, then adds whichever of left, above, or upper-left is closest to p. Once every row is unfiltered, you are left with the final pixel data!
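All five filters can be undone in a single pass over the rows. A minimal Node.js sketch, assuming the inflated data of a non-interlaced image; `unfilter` and `paeth` are hypothetical names, `rowBytes` is the number of bytes in a row (not counting the filter byte), and `bpp` is the number of bytes per complete pixel, rounded up to 1 for bit depths below 8. The Paeth predictor follows the definition in the PNG specification, including its tie-breaking order.

```javascript
// Paeth predictor as defined in the PNG spec: estimate p = a + b - c, then
// return whichever of a (left), b (above), c (upper-left) is closest to p,
// preferring a, then b, then c when there is a tie.
function paeth(a, b, c) {
  const p = a + b - c;
  const pa = Math.abs(p - a);
  const pb = Math.abs(p - b);
  const pc = Math.abs(p - c);
  if (pa <= pb && pa <= pc) return a;
  if (pb <= pc) return b;
  return c;
}

// Undoes filtering. `raw` is the inflated IDAT data: `height` rows, each one
// filter-type byte followed by `rowBytes` filtered bytes. `bpp` is bytes per
// complete pixel (3 for 8-bit Truecolor). Bytes outside the image count as 0.
function unfilter(raw, height, rowBytes, bpp) {
  const out = Buffer.alloc(height * rowBytes);
  for (let y = 0; y < height; y++) {
    const filterType = raw[y * (rowBytes + 1)];
    const src = y * (rowBytes + 1) + 1; // first filtered byte of this row
    const dst = y * rowBytes;           // first output byte of this row
    for (let x = 0; x < rowBytes; x++) {
      const a = x >= bpp ? out[dst + x - bpp] : 0;                     // left
      const b = y > 0 ? out[dst + x - rowBytes] : 0;                   // above
      const c = x >= bpp && y > 0 ? out[dst + x - rowBytes - bpp] : 0; // upper-left
      let prediction;
      switch (filterType) {
        case 0: prediction = 0; break;                       // None
        case 1: prediction = a; break;                       // Sub
        case 2: prediction = b; break;                       // Up
        case 3: prediction = Math.floor((a + b) / 2); break; // Average
        case 4: prediction = paeth(a, b, c); break;          // Paeth
        default: throw new Error(`Unknown filter type ${filterType}`);
      }
      out[dst + x] = (raw[src + x] + prediction) & 0xff; // sums wrap modulo 256
    }
  }
  return out;
}

// The inflated red pixel: filter byte 0, then R, G, B.
console.log(unfilter(Buffer.from([0x00, 0xff, 0x00, 0x00]), 1, 3, 3)); // <Buffer ff 00 00>
```

Reconstructed bytes are written into `out` as they are computed, so the "left" and "above" values the predictors read are always already unfiltered, which is what the specification requires.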
Getting the pixel data back into a PNG
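This is honestly the easy part. You can write something to do all of this in reverse, or, if you don't mind a larger file, skip filtering altogether: prepend a zero (filter type 0) to the front of each row so the bytes map one to one onto the pixels, compress the result with zlib's deflate, and arrange it back into the proper block shapes, computing a fresh CRC for each block. Then set the color type and bit depth in IHDR to match your data. Quick and dirty PNG!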
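A minimal Node.js sketch of that quick-and-dirty writer, assuming 8-bit Truecolor (color type 2) or Truecolor with alpha (color type 6) pixels packed row after row with no padding. `crc32`, `makeChunk` and `encodePng` are hypothetical names; the CRC-32 is the standard table-driven version PNG uses, written out by hand so the sketch only depends on Node's built-in zlib for compression.

```javascript
const zlib = require("zlib");

// CRC-32 lookup table (reflected polynomial 0xEDB88320, as used by PNG).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes) {
  let c = 0xffffffff;
  for (const byte of bytes) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// Builds one block: length | type | data | CRC computed over type + data.
function makeChunk(type, data) {
  const typeBytes = Buffer.from(type, "ascii");
  const length = Buffer.alloc(4);
  length.writeUInt32BE(data.length, 0);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(Buffer.concat([typeBytes, data])), 0);
  return Buffer.concat([length, typeBytes, data, crc]);
}

// Encodes 8-bit pixels as a PNG where every row uses filter type 0.
function encodePng(pixels, width, height, colorType) {
  if (colorType !== 2 && colorType !== 6) throw new Error("Sketch supports color types 2 and 6 only");
  const bpp = colorType === 6 ? 4 : 3;
  const rowBytes = width * bpp;
  if (pixels.length !== height * rowBytes) throw new Error("Pixel buffer has the wrong size");
  const raw = Buffer.alloc(height * (rowBytes + 1)); // zero-filled, so each filter byte is 0
  for (let y = 0; y < height; y++) {
    pixels.copy(raw, y * (rowBytes + 1) + 1, y * rowBytes, (y + 1) * rowBytes);
  }
  const ihdr = Buffer.alloc(13); // compression, filter and interlace bytes stay 0
  ihdr.writeUInt32BE(width, 0);
  ihdr.writeUInt32BE(height, 4);
  ihdr[8] = 8;         // bit depth
  ihdr[9] = colorType; // color type
  return Buffer.concat([
    Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
    makeChunk("IHDR", ihdr),
    makeChunk("IDAT", zlib.deflateSync(raw)),
    makeChunk("IEND", Buffer.alloc(0)),
  ]);
}

// A 1x1 red pixel; save it with require("fs").writeFileSync("red.png", redPng) if you like.
const redPng = encodePng(Buffer.from([255, 0, 0]), 1, 1, 2);
```

For the single red pixel, the IHDR block this produces should match Wikipedia's example byte for byte, including its CRC. The IDAT bytes may differ, because a different deflate encoder can produce a different (but equivalent) compressed stream.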
Closing thoughts
Processing the actual image data from PNGs is where I had the most difficulty. Some of the references I've included at the end of this article were invaluable when trying to make heads or tails of what I was doing wrong. Another big misconception I had was assuming all PNGs were created equally. In reality, PNGs can have many different configurations. Between five color types and five possible bit depths (minus the combinations that aren't allowed), you can derive 15 different combos! That's not even including interlacing, or the myriad other block types that can suggest different rendering behaviour. This proved to be a wonderful exercise in research, even having me read the actual W3C specification! I'd highly recommend this project to anyone who wants to get more experience parsing binary data. Thank you very much if you made it this far!
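The 15 comes from the table of allowed bit depths in the PNG specification. A minimal Node.js sketch that encodes that table; all names here are hypothetical. `bytesPerPixel` and `bytesPerRow` are included because a decoder needs them, for example as the bytes-per-pixel and row length when undoing filters; `bytesPerPixel` rounds up to one byte for depths below 8, as the specification does.

```javascript
// Bit depths the PNG specification allows for each color type.
const ALLOWED_BIT_DEPTHS = {
  0: [1, 2, 4, 8, 16], // Grayscale
  2: [8, 16],          // Truecolor
  3: [1, 2, 4, 8],     // Indexed-color
  4: [8, 16],          // Grayscale with alpha
  6: [8, 16],          // Truecolor with alpha
};

// Samples (channels) per pixel for each color type; an indexed pixel is one palette index.
const CHANNELS = { 0: 1, 2: 3, 3: 1, 4: 2, 6: 4 };

function isValidCombo(colorType, bitDepth) {
  return (ALLOWED_BIT_DEPTHS[colorType] || []).includes(bitDepth);
}

// Bytes per complete pixel, rounded up to 1 for depths below 8.
function bytesPerPixel(colorType, bitDepth) {
  return Math.max(1, (CHANNELS[colorType] * bitDepth) / 8);
}

// Bytes in one row of pixel data, not counting the filter byte.
function bytesPerRow(width, colorType, bitDepth) {
  return Math.ceil((width * CHANNELS[colorType] * bitDepth) / 8);
}

const combos = Object.values(ALLOWED_BIT_DEPTHS).reduce((sum, depths) => sum + depths.length, 0);
console.log(combos); // 15
```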
Resources
- PNG chunk inspector - Super useful for viewing individual chunks, especially ones containing metadata
- Writing a simple PNG decoder - Absolutely essential for this article, got me through a lot of hurdles
- W3C PNG specification - Long and technical, but absolutely essential for some tiny details
- How PNG filters work - A huge help for understanding and implementing filters
- Portable Network Graphics - Great suite of PNG resources, including great test images