PNG extracted from PDF via Flate decoding is unrecognisable - C#

The C# software I work on has a component that reads barcodes from scanned documents. The PDFs themselves are opened using PDFSharp.

Unfortunately, we're running into a problem when the PDF image uses Flate decoding. The extracted image is just noise, so there is no barcode to read and the document is not recognised.

Our code (which we shamelessly "borrowed" from another Stack Overflow case!) is as follows:

private FileInfo ExportAsPngImage(PdfDictionary image, string sourceFileName, ref int count)
    {
        //This code basically comes from http://forum.pdfsharp.net/viewtopic.php?f=2&t=2338#p6755 
        //and http://stackoverflow.com/questions/10024908/how-to-extract-flatedecoded-images-from-pdf-with-pdfsharp
        string tempFile = string.Format("{0}_Image{1}.png", sourceFileName, count);

        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);
        int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
        var pixelFormat = new PixelFormat();

        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                break;
            case 24:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        var fd = new FlateDecode();
        byte[] decodedBytes = fd.Decode(image.Stream.Value);
        byte[] resultBytes = null;
        int newWidth = width;
        int alignment = 4;

        if (newWidth % alignment != 0)
        //Image data in BMP files always starts at a DWORD boundary, in PDF it starts at a BYTE boundary.
        //Most images have a width that is a multiple of 4, so there is no problem with them.
        //You must copy the image data line by line and start each line at the DWORD boundary.
        {
            while (newWidth % alignment != 0)
            {
                newWidth++;
            }

            var copy_dword_boundary = new byte[height, newWidth];
            for (int y = 0; y < height; y++)
            {
                for (int x = 0; x < newWidth; x++)
                {
                    if (x < width && (x + (y * width) < decodedBytes.Length))
                        // while not at end of line, take original array
                        copy_dword_boundary[y, x] = decodedBytes[x + (y * width)];
                    else //fill new array with ending 0
                        copy_dword_boundary[y, x] = 0;
                }
            }
            resultBytes = new byte[newWidth * height];

            int counter = 0;
            for (int x = 0; x < copy_dword_boundary.GetLength(0); x++)
            {
                for (int y = 0; y < copy_dword_boundary.GetLength(1); y++)
                {   //put 2dim array back in 1dim array
                    resultBytes[counter] = copy_dword_boundary[x, y];
                    counter++;
                }
            }
        }
        else
        {
            resultBytes = new byte[decodedBytes.Length];
            decodedBytes.CopyTo(resultBytes, 0);
        }

        //Create a new bitmap and shove the bytes into it
        var bitmap = new Bitmap(newWidth, height, pixelFormat);
        BitmapData bitmapData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.WriteOnly, bitmap.PixelFormat);
        int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);

        for (int i = 0; i < height; i++)
        {
            int offset = i * (resultBytes.Length / height); //row length in resultBytes, including any padding added above
            int scanOffset = i * bitmapData.Stride;
            //IntPtr.Add avoids the overflow that Scan0.ToInt32() can cause in a 64-bit process
            Marshal.Copy(resultBytes, offset, IntPtr.Add(bitmapData.Scan0, scanOffset), length);
        }
        bitmap.UnlockBits(bitmapData);

        //Now save the bitmap to disk as a PNG (tempFile is already fully formatted above)
        using (var fs = new FileStream(tempFile, FileMode.Create, FileAccess.Write))
        {
            bitmap.Save(fs, ImageFormat.Png);
        }
        count++;

        return new FileInfo(tempFile);
    }

Unfortunately, all we get out of it is this: http://i.stack.imgur.com/FwatQ.png

Any ideas on where we're going wrong, or suggestions for things we might try would be greatly appreciated.

Cheers

Answers


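Thanks for the suggestions, guys. One of the other developers managed to crack it: the image was (as Jongware suggested) a JPEG, but the JPEG data was also zipped (Flate-compressed) inside the PDF. Flate decoding therefore gave us JPEG file bytes rather than raw pixels, which is why copying them into a bitmap produced fuzz. Once unzipped and treated as a JPEG, it could be processed and recognised as normal.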
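
For anyone hitting the same thing, here is a minimal sketch of the "unzip, then check for a JPEG" step. It is not our production code. The class and method names (PdfImageStreams, Inflate, LooksLikeJpeg, TryGetEmbeddedJpeg) are made up for illustration, and it assumes the input is the raw stream bytes (image.Stream.Value in the question's code). It uses only System.IO.Compression and skips the 2-byte zlib header by hand, because DeflateStream expects raw deflate data.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PdfImageStreams
{
    // Inflates zlib-wrapped data (what a Flate-compressed PDF stream contains).
    public static byte[] Inflate(byte[] zlibData)
    {
        if (zlibData == null || zlibData.Length < 2)
            throw new ArgumentException("Too short to be zlib data.", "zlibData");

        // Skip the 2-byte zlib header; DeflateStream reads raw deflate data only.
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }

    // A JPEG file starts with the start-of-image marker FF D8.
    public static bool LooksLikeJpeg(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0xFF && data[1] == 0xD8;
    }

    // Returns true and the inflated bytes when the stream is a zipped JPEG.
    // Returns false when the data is not deflate data or does not start like a JPEG,
    // in which case the caller can fall back to the raw-pixel path.
    public static bool TryGetEmbeddedJpeg(byte[] rawStream, out byte[] jpeg)
    {
        jpeg = null;
        if (rawStream == null || rawStream.Length < 2)
            return false;
        try
        {
            byte[] inflated = Inflate(rawStream);
            if (!LooksLikeJpeg(inflated))
                return false;
            jpeg = inflated;
            return true;
        }
        catch (InvalidDataException)
        {
            return false; // not valid deflate data
        }
    }
}
```

If TryGetEmbeddedJpeg returns true, the bytes can be written straight to a .jpg file (for example with File.WriteAllBytes) and handed to the barcode reader, with no bitmap copying needed. Checking the image dictionary's /Filter entry for more than one filter would be a more robust test than sniffing the first two bytes, but the sketch above is enough to confirm the diagnosis.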

