Humdrum Extras

humpdf manpage


COMMAND

    humpdf -- Embed Humdrum files into a PDF file.

SYNOPSIS

    humpdf -p input.pdf input(s) > output.pdf

OPTIONS

    -A            Output only the altered portion of the PDF file.
    -c            Report the number of Humdrum files embedded in the PDF file. See the -n and --list options. (not yet implemented)
    -C            Report the number of all files embedded in the PDF file. See the -N and --list-all options. (not yet implemented)
    -D            Store directory information of input Humdrum files from the command line.
    -l            License file to embed into the PDF. (not yet implemented)
    -n #          Extract the nth Humdrum file embedded in the PDF file (offset from 1). (not yet implemented)
    -N #          Extract the nth file embedded in the PDF file (offset from 1). (not yet implemented)
    -p pdf        Use the following PDF file as input.
    -P prefix     Use the following prefix before embedded filenames.
    --list        List the embedded Humdrum files in the input PDF. (not yet implemented)
    --list-all    List all embedded files in the input PDF (not only Humdrum files). (not yet implemented)

DESCRIPTION

EXAMPLES

    Embedding data

    Here is an example demonstrating how to embed a Humdrum file into a PDF file of the musical notation created from the Humdrum file:
        hum2abc middlec.krn > middlec.abc
        abcm2ps middlec.abc  -O=
        ps2pdf middlec.ps 
    The resulting file, middlec.pdf, is then used as the PDF into which the original source file is embedded:
        humpdf -p middlec.pdf middlec.krn > middlecembedded.pdf 

    Notice that the input PDF file does not contain an embedded file, while the output file does.

    Technical notes

    PDF files are composed of a series of segments called indirect objects. Each indirect object starts with two numbers: (1) the object number, and (2) the generation number (which is typically set to zero). Following these two numbers is the string "obj" followed by the content of the indirect object, ending with the string "endobj":
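
    The framing of an indirect object can be illustrated with a short Python sketch. It is not taken from humpdf; the helper names make_indirect_object and parse_object_header are hypothetical, and the dictionary used as the object body is only a placeholder.

```python
import re


def make_indirect_object(number: int, generation: int, body: bytes) -> bytes:
    """Wrap `body` in the 'N G obj ... endobj' framing described above."""
    return b"%d %d obj\n" % (number, generation) + body + b"\nendobj\n"


# Matches the object number, the generation number and the "obj" keyword.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def parse_object_header(data: bytes) -> tuple:
    """Return (object number, generation number) read from the start of `data`."""
    match = OBJ_HEADER.match(data)
    if match is None:
        raise ValueError("data does not start with an indirect object header")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    obj = make_indirect_object(4, 0, b"<< /Type /Page >>")
    print(obj.decode("ascii"))
    print(parse_object_header(obj))
```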

    A header precedes this basic file organization. The first line of the file must start with the string "%PDF-" followed by the version of the PDF standard to which the file conforms. In the following example, PDF specification 1.4 is being used. The second line of the file should be a comment line containing at least four bytes with values greater than 127 (that is, outside the ASCII range). This is a hack used to force certain pieces of software or operating systems to process the file as binary data rather than text data: a PDF can contain newlines in any of the Windows, Apple, or Unix formats, and reading the file as text may incorrectly translate them to the native format of the local computer.
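
    As a rough illustration of the two header lines, the following Python sketch reads the version string and checks for the binary-marker comment. The function name check_pdf_header and the sample bytes are assumptions made for this example, not part of humpdf.

```python
def check_pdf_header(data: bytes) -> tuple:
    """Return (version, has_binary_marker) for the first two lines of a PDF."""
    lines = data.splitlines()  # splits on \n, \r and \r\n
    if not lines or not lines[0].startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header")
    version = lines[0][len(b"%PDF-"):].decode("ascii").strip()

    # The second line should be a comment holding at least 4 bytes above 127.
    second = lines[1] if len(lines) > 1 else b""
    high_bytes = sum(1 for byte in second if byte > 127)
    has_marker = second.startswith(b"%") and high_bytes >= 4
    return version, has_marker


if __name__ == "__main__":
    sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n"
    print(check_pdf_header(sample))
```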

    At the bottom of a PDF file, the last line is required to be "%%EOF". The penultimate line gives the byte offset of the most recent cross-reference table, and the line before that must contain the string "startxref". In this case the value 192 means to go to the 193rd byte in the file (byte 192 when counting from 0). At that point you will find a cross-reference table, which starts with the string "xref". Following the "xref" string are two numbers: (1) the number of the first indirect object listed below, and (2) the total number of entries listed in sequential order, counting that first object. In this example, the string "0 5" means that the first entry is for object 0 and that there are five entries in all: object 0 plus four more. Indirect object 0 is a special object which functions like a NULL pointer in C.
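
    Locating the cross-reference table from the end of the file can be sketched in Python as follows. This is a minimal illustration, not humpdf's own code; find_startxref is a hypothetical helper and the tiny sample file is constructed by hand so that its startxref value really points at its "xref" keyword.

```python
def find_startxref(data: bytes) -> int:
    """Return the byte offset stored after the last 'startxref' keyword."""
    tail = data.rstrip()
    if not tail.endswith(b"%%EOF"):
        raise ValueError("file does not end with %%EOF")
    position = tail.rfind(b"startxref")
    if position < 0:
        raise ValueError("startxref keyword not found")
    # The offset is the only token between 'startxref' and '%%EOF'.
    offset_text = tail[position + len(b"startxref"):-len(b"%%EOF")]
    return int(offset_text.decode("ascii").strip())


if __name__ == "__main__":
    body = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
    xref_offset = len(body)  # the table starts right after the body
    sample = (
        body
        + b"xref\n0 2\n"
        + b"0000000000 65535 f\r\n"
        + b"0000000009 00000 n\r\n"
        + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
        + b"startxref\n%d\n%%%%EOF\n" % xref_offset
    )
    offset = find_startxref(sample)
    print(offset, sample[offset:offset + 4])
```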

    Each object entry in the cross-reference table consists of a line with exactly 20 characters. The first ten characters are digits which give the byte offset of the start of the indirect object. For example, object 1 starts at byte offset 16 (the 17th byte in the file). Offsets smaller than a billion are padded on the left with zeros. Next comes exactly one space, followed by five digits which indicate the generation number (typically zero), then another space, then either the character 'n' (meaning "iN use") or 'f' (meaning a "Free", or unused, object), followed by a two-character newline (0dh 0ah in hex notation). After one listing of object offsets, another set of entries can be given, again starting with a line that holds the first object number and a count of how many objects are in the sequential list.
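
    The fixed 20-character layout lends itself to a small Python sketch. The helpers format_xref_entry and parse_xref_entry are hypothetical names chosen for this illustration; they only mirror the field widths described above.

```python
def format_xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Build one 20-byte entry: 10-digit offset, 5-digit generation, flag, CR LF."""
    flag = b"n" if in_use else b"f"
    entry = b"%010d %05d %s\r\n" % (offset, generation, flag)
    if len(entry) != 20:
        raise ValueError("offset or generation does not fit the fixed-width fields")
    return entry


def parse_xref_entry(entry: bytes) -> tuple:
    """Return (offset, generation, in_use) from one 20-byte entry."""
    if len(entry) != 20:
        raise ValueError("a cross-reference entry must be exactly 20 bytes")
    offset = int(entry[0:10].decode("ascii"))
    generation = int(entry[11:16].decode("ascii"))
    flag = entry[17:18]
    if flag not in (b"n", b"f"):
        raise ValueError("flag must be 'n' or 'f'")
    return offset, generation, flag == b"n"


if __name__ == "__main__":
    entry = format_xref_entry(16)
    print(entry)
    print(parse_xref_entry(entry))
```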

    The trailer, which follows the xref section, gives a few important pieces of information in the form of a dictionary (a set of key/value pairs surrounded by double angle brackets: <<...>>). A trailer is required to have entries called /Size and /Root, and may have other optional entries.

    The /Size entry's value of 5 means that there are a maximum of five indirect objects in the PDF file (as might be suitable for the above example PDF with objects numbered 1-4, plus the 0 object). The /Root entry's value "1 0 R" is a reference to indirect object number 1 (generation 0): a reference is written as the object number, then the generation number, then the letter R. The Root object is the catalog dictionary of the PDF and behaves like the root of a file system.
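
    Reading the two required trailer entries can be sketched with regular expressions in Python. This is a simplification for illustration only (a real parser would tokenize the dictionary properly), and read_trailer is a hypothetical name. Note that an indirect reference is written with the object number first and the generation number second.

```python
import re


def read_trailer(trailer: bytes) -> tuple:
    """Return (size, (root object number, root generation)) from a trailer."""
    size = re.search(rb"/Size\s+(\d+)", trailer)
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
    if size is None or root is None:
        raise ValueError("trailer lacks a required /Size or /Root entry")
    return int(size.group(1)), (int(root.group(1)), int(root.group(2)))


if __name__ == "__main__":
    sample = b"trailer\n<< /Size 5 /Root 1 0 R >>\n"
    print(read_trailer(sample))
```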

    Incremental update

    The humpdf program embeds data files using a feature of PDF files called incremental updates. With this method of changing a PDF file, the original contents of the file can be recovered from the modified file if necessary. All new objects are appended to the previous PDF file's contents, together with an updated version of any indirect object from the previous contents that has changed.
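
    The general shape of an incremental update can be sketched in Python as below. This is an illustration under stated assumptions, not humpdf's implementation: append_incremental_update is a hypothetical helper, all appended objects use generation 0, each object gets its own one-entry xref subsection for simplicity, and the caller is assumed to already know the new /Size, the /Root reference, and the offset of the previous cross-reference table (written as /Prev). The object body in the demonstration is a placeholder.

```python
def append_incremental_update(original: bytes, objects: dict, size: int,
                              root: tuple, prev_xref: int) -> bytes:
    """Append `objects` ({object number: body bytes}) after `original`.

    The original bytes are left untouched; a new xref section, trailer and
    startxref/%%EOF footer are written after the appended objects.
    """
    out = bytearray(original)
    if not out.endswith(b"\n"):
        out += b"\n"

    offsets = {}
    for number in sorted(objects):
        offsets[number] = len(out)  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + objects[number] + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n"
    for number in sorted(offsets):
        out += b"%d 1\n" % number  # subsection: first object, entry count
        out += b"%010d 00000 n\r\n" % offsets[number]

    out += b"trailer\n<< /Size %d /Root %d %d R /Prev %d >>\n" % (
        size, root[0], root[1], prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


if __name__ == "__main__":
    old = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nstartxref\n0\n%%EOF\n"
    new = append_incremental_update(old, {6: b"<< /Placeholder true >>"},
                                    size=7, root=(1, 0), prev_xref=0)
    print(new[len(old):].decode("ascii"))
```

    Because the original bytes are never modified, truncating the updated file at the length of the original recovers the earlier version.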

    Here is the basic structure of the middlec.pdf file with the compressed data for the page description removed:

    Looking at the trailer, the root object is number 1. Object 1 contains three entries in its dictionary: /Type, /Pages, and /Metadata. The /Catalog value of the /Type entry is required for the root object, and the /Pages entry contains a pointer to indirect object 3, which lists the pages in the file. Going to object 3, you will see a list containing one page, which is held in object 4. Object 4 gives basic descriptive information for the page, in particular the page contents, which are held in object 5. Object 5's content is compressed, so it has been removed from the above example text.
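
    The chain of references in this walk-through can be followed with a toy Python sketch. The object bodies below are hand-written stand-ins that mirror the description, not the actual contents of middlec.pdf; the /Metadata entry is left out because its target object is not stated in the text, and reference_after is a hypothetical helper.

```python
import re

# Hypothetical object bodies mirroring the walk-through (not from middlec.pdf).
OBJECTS = {
    1: b"<< /Type /Catalog /Pages 3 0 R >>",
    3: b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
    4: b"<< /Type /Page /Parent 3 0 R /Contents 5 0 R >>",
    5: b"<< /Length 0 >>",
}


def reference_after(key: bytes, body: bytes) -> int:
    """Return the object number of the first 'N G R' reference after `key`."""
    pattern = re.escape(key) + rb"\s*\[?\s*(\d+)\s+\d+\s+R"
    match = re.search(pattern, body)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


if __name__ == "__main__":
    pages = reference_after(b"/Pages", OBJECTS[1])        # catalog -> page tree
    page = reference_after(b"/Kids", OBJECTS[pages])      # page tree -> first page
    contents = reference_after(b"/Contents", OBJECTS[page])  # page -> content stream
    print(pages, page, contents)
```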

    To embed data files in a PDF, there is an analogous entry to /Pages in the Root dictionary called /Names. The /Names entry in the root dictionary points to another object which contains an entry called /EmbeddedFiles, which in turn points to an indirect object containing a list of embedded files. [...]

REFERENCES

BUGS

    The humpdf program currently cannot be used to embed files into a PDF which already has embedded files.

DOWNLOAD

    The compiled humpdf program can be downloaded for the following platforms:
    • Linux (i386 processors) (dynamically linked) compiled on 28 Jun 2012.
    • Windows compiled on 29 Jun 2012.
    • Mac OS X/i386 compiled on 13 Nov 2013.

    The source code for the program was last modified on 6 May 2010. The full source code is available from the source-code download page.