Wednesday, March 30, 2011

PDF docs with Mixed Raster Content

Mixed Raster Content (MRC) is a way of storing images in PDF docs. In short, the background image (or any other image in the document) is kept in pieces (segments) in order to improve the contrast resolution of a raster image composed of pixels. More info on MRC can be found here

The problem is that many PDF viewers, including Okular and Evince, show such PDF docs with a completely noisy background, i.e. the background image of the document appears as a random mix of colors. Even the thumbnails shown in Ubuntu look that way.

This is especially problematic when viewing scanned documents saved as MRC PDFs. Even conversion tools such as swftools could not solve the problem, so the only reasonable way is to create the PDFs with the MRC option disabled. In ABBYY FineReader this is easily done in the save options.

Another option is to use the pdf2pdf converter below, which converts PDFs with MRC to ordinary ones.

The code of the pdf2pdf converter is:

#!/bin/sh
# Convert a PDF to another PDF. This effectively strips
# out a lot of stuff from most PDF files.

pdfver=1.2    # default target version, as stated in the usage text
OPTS=""       # extra options passed through to Ghostscript

# Locate the Ghostscript binary.
gs=`which gs 2>/dev/null`
if [ ! -x "$gs" ]; then
    echo "Error: install ghostscript first" >&2
    exit 1
fi

# Parse leading options; anything else starting with "-" goes to gs.
while true; do
    case "$1" in
    --pdf-version) pdfver="$2"; shift 2;;
    -?*) OPTS="$OPTS $1"; shift;;
    *) break;;
    esac
done

# Two arguments: use the given output name; one: derive it from the input.
if [ $# -eq 2 ]; then
    outfile="$2"
elif [ $# -eq 1 ]; then
    outfile="`basename \"$1\" .pdf`.new.pdf"
else
    cat <<EOF >&2
Usage: pdf2pdf [--pdf-version (1.2|1.3|1.4)] [gs-options ...] <input.pdf> [output.pdf|-]

Converts a PDF from whatever PDF specification version it currently
exists as to the one specified by \`--pdf-version'. Default: 1.2

One side-effect of this conversion is the resulting document will have
the no-printing and no-copying flags removed in the output document if
they are set in the input document.
EOF
    exit 1
fi

exec "$gs" $OPTS -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
    -dCompatibilityLevel="$pdfver" -sOutputFile="$outfile" -f "$1"
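
With a whole folder of scanned documents it is tedious to run the converter file by file. Below is a minimal sketch of a batch wrapper, assuming the script above is saved as pdf2pdf somewhere on the PATH. The function names out_name and convert_dir are my own and only for illustration.

```shell
#!/bin/sh
# Sketch: batch-convert the PDFs of one directory with pdf2pdf.
# Assumption: the converter script is installed on PATH as "pdf2pdf".

# Derive the output name the same way pdf2pdf does by default:
# "scan.pdf" becomes "scan.new.pdf".
out_name() {
    echo "$(basename "$1" .pdf).new.pdf"
}

# Convert every *.pdf in directory $1, writing results next to the inputs.
convert_dir() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue                   # glob matched nothing
        case "$f" in *.new.pdf) continue;; esac   # skip earlier results
        pdf2pdf "$f" "$1/$(out_name "$f")" || echo "failed: $f" >&2
    done
}
```

Calling convert_dir with a directory (for example convert_dir ~/scans) leaves the originals untouched and puts a .new.pdf copy next to each of them, so the converted files can be checked in Okular or Evince before the MRC versions are deleted.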