Humdrum Extras

serialize manpage


COMMAND

    serialize -- Rearrange multiple input spines into a single spine sequence.

SYNOPSIS

    serialize [-m] [-i interp] [-c [-s char]] [-p] [-S] [input] [> output]

OPTIONS

-c Serialize chord notes (sub-tokens) in addition to spines. See also the -s option.
-f Preserve only the first sub-token in a data token. See also the -l and -n options.
-i interp Process only input spines with the given exclusive interpretation.
-m Merge adjacent serialized spines in output so that there is only one exclusive interpretation at the top of the data and only one data terminator at the end. If different data types are being serialized, only the exclusive interpretation of the first spine will be used.
-l Preserve only the last sub-token in a data token. See also the -f and -n options.
-n # Preserve only the nth sub-token in a data token. See also the -f and -l options.
-p Serialize sub-spines by placing secondary spines after the primary one.
-s char Use the given character as a separator for multi-stop token serialization with the -c option.
-S Serialize sub-spines by interleaving multiple columns of the same spine.
-t tag Tagging string to place at end of each spine's data. When the -p option is used, also place the tag string between non-contiguous sub-spine data.

DESCRIPTION

    The serialize program is designed to facilitate data input into the context command, which requires Humdrum data to contain only one spine. Serialize merges data from multiple spines, sub-spines and sub-tokens into a single output spine according to the options described below.

    Suppose that you want to extract three-note sequences from the following data:

    One possible method for extracting the sequences would be to extract each spine and process it separately with context:




         extract -f1 input | context -n 3 > output1
         extract -f2 input | context -n 3 > output2
         humcat output1 output2 | rid -d > output3 

    Serialize can be used to merge these two spines into a single spine which can then be processed with the context command, giving similar results without the need for temporary files:

    serialize 
    serialize -mtX
    serialize -mtX | context -n3 \
    | rid -d | grep -v X

    Without any options, all spines will be placed one after another in their original order from left to right in the input Humdrum data (shown in the first column of the above examples). Adding the -m option will merge the individual serialized spines into a single spine (regardless of whether they are actually of the same data type), and the option -t X places the string "X" at the end of each original spine's data in the merged output.

    A use of serialize with context is given in the last column of the above examples. Context is used to extract triplets of notes. The rid command with the -d option removes null data lines (lines containing only null tokens, "."), and grep -v X removes any line that contains an X. Any output line from context that contains an X is invalid in this case, since that sequence would cross the boundary between the end of one spine and the start of another spine.
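    The following Python sketch shows why the X tag makes boundary-crossing sequences easy to filter. It models serialize -mtX | context -n 3 | grep -v X under simplifying assumptions. Spines are plain lists of data tokens, with no interpretations, barlines or null tokens. context -n 3 is approximated as a sliding window of three consecutive tokens. All function names are hypothetical and are not part of Humdrum Extras.

```python
def serialize_with_tag(spines, tag="X"):
    """Model serialize -m -t X: join spines left to right, ending each
    spine's data with the tag."""
    merged = []
    for spine in spines:
        merged.extend(spine)
        merged.append(tag)
    return merged


def windows(tokens, n=3):
    """Approximate context -n 3 as a sliding window of n consecutive tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def valid_sequences(spines, n=3, tag="X"):
    """Model grep -v X: discard any window containing the tag string,
    because such a window spans the end of one spine and the start of the next."""
    return [seq for seq in windows(serialize_with_tag(spines, tag), n)
            if tag not in seq]


# Hypothetical two-spine input, pitch names only.
spines = [["c", "d", "e", "f"], ["g", "a", "b"]]
print(valid_sequences(spines))  # ['c d e', 'd e f', 'g a b']
```

    The four discarded windows are the ones that contain X. The surviving windows match what extract plus context would produce for each spine separately.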

    Spine Terminology

    The following figure shows names for various components of Humdrum data structure. It is useful to know this terminology and how it relates to serialize options.

    • A column is a series of tokens (cells) occurring at a fixed field count (using tab characters as the field separator) from the beginning of a line. Humdrum spines can have a variable column width, so a fixed column position does not always have the same meaning as a spine of data. The command-line tool cut can be used to extract a column from a Humdrum file (which is not a common thing to do).

    • A spine of data is shown in the second part of the above figure. A spine's starting column may shift depending on how many columns the previous spines on the line take up, and a spine may be more than one column wide. The extractx program can be used to extract a particular spine or set of spines; extracting spines is more complicated than extracting columns, which is all that cut does. The spine manipulators **, *^, *v, *x, *+, and *- can alter the column position and column width of spines on each line.

    • As just alluded to, spines can split into more than one column (by using the *^ spine manipulator), and these multiple columns are called sub-spines. The above example has a section with two sub-spines which merge back into a single-column spine after the *v manipulators.

      The -S option will place all sub-spine data into a single column in the output. The token from each sub-spine on a row is placed on its own line, so that the data is interleaved. The -p option also collapses sub-spines into a single spine, but secondary sub-spines are placed after the primary sub-spine (the first sub-spine when a spine is split).

    • Data fields within spines are called tokens. The Humdrum file format does not have a formal specification for multiple items within a single token. However, individual data types may define a sub-token formalization. For example, **kern data can have multiple stops (chords) containing several pitches within a single token. A C major chord in one token can be represented as "4c 4e 4g", where each of the three pitches is separated from the others by a single space character. The top illustration of sub-tokens on the far right in the above figure shows an example of using another character, such as a colon (:), to separate the sub-tokens.

      The -c option will split sub-tokens into tokens on successive lines in the output data. By default, the sub-token separator is a space character (to match the sub-token separator definition for **kern). But the sub-token separator can be set to any other single character by adding the -s option.
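    The Python sketch below shows the column and sub-token terminology in code. A data line is split on tab characters into columns, which is what cut -f sees. A **kern token is then split on spaces into sub-tokens. The sample line and function names are hypothetical. The sketch deliberately ignores spines, because mapping columns to spines requires tracking spine manipulators such as *^ and *v.

```python
def columns(line):
    """Split a Humdrum line into columns (tab-separated fields), as cut -f does."""
    return line.split("\t")


def subtokens(token, sep=" "):
    """Split a token into sub-tokens; **kern chords use a space separator."""
    return token.split(sep)


# Hypothetical data line: a C major chord in column 1, a single note in column 2.
line = "4c 4e 4g\t2C"
print(columns(line))                # ['4c 4e 4g', '2C']
print(subtokens(columns(line)[0]))  # ['4c', '4e', '4g']
```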

    Extracting data from specific data types

    By using the -i option, you can selectively extract the spines with a particular exclusive interpretation from the input data. The following example serializes data from the spines in the input file that use the **a exclusive interpretation. The name of the exclusive interpretation can optionally start with the formal two stars (**), or the stars can be omitted. If you include the stars, you must also enclose the name in single quotes so that the command-line shell does not treat the name as a filename pattern (in which * is a wildcard character).
    serialize -i a input
    or
    serialize -i '**a' input
    The context command cannot handle multiple spines serialized as shown in the above example, but instead requires only a single spine (without sub-spines) to occur in its input. The -m option can be used to merge multiple serialized spines into a single data stream. The exclusive interpretations do not need to match. The exclusive interpretation of the first spine found in the input data will be used in the output data.
    serialize -m -i a input
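    The Python sketch below shows how -i and -m interact. It assumes a simplified model in which each spine is an (exclusive interpretation, data tokens) pair. Without -m, each serialized spine keeps its own header and *- terminator. With -m, only the first selected spine's exclusive interpretation and a single terminator remain, as described for the -m option. The function name and sample spines are hypothetical.

```python
def serialize(spines, interp=None, merge=False):
    """Simplified model of serialize -i and -m.

    spines: list of (exclusive_interpretation, data_tokens) pairs in
            left-to-right order, e.g. ("**a", ["1", "2"]).
    interp: keep only spines of this type; leading stars are optional (-i).
    merge:  write one header and one terminator for all spines (-m).
    """
    if interp is not None:
        wanted = "**" + interp.lstrip("*")
        spines = [spine for spine in spines if spine[0] == wanted]
    if not spines:
        return []
    if merge:
        # Only the first spine's exclusive interpretation survives.
        merged = [spines[0][0]]
        for _, tokens in spines:
            merged.extend(tokens)
        return merged + ["*-"]
    # Without -m, each spine keeps its own header and terminator.
    out = []
    for name, tokens in spines:
        out += [name, *tokens, "*-"]
    return out


spines = [("**a", ["1", "2"]), ("**b", ["x"]), ("**a", ["3"])]  # hypothetical
print(serialize(spines, "a"))                # ['**a', '1', '2', '*-', '**a', '3', '*-']
print(serialize(spines, "**a", merge=True))  # ['**a', '1', '2', '3', '*-']
```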

    Sub-spine serialization

    To remove sub-spines (columns that share the same initial exclusive interpretation declaration) in addition to serializing parallel spines, use the -S option. By default, sub-spines are not merged into a single-column spine (shown on left below). However, if you add the -S option, sub-spine data will be interleaved row by row (shown on right below).

    serialize -m input > output
    serialize -m -S input > output
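    The Python sketch below models the -S interleaving for a single spine. The spine is represented as a list of rows, and each row holds that spine's tokens from left to right: one token, or two while the spine is split. Spine-manipulator lines such as *^ and *v are omitted. This representation and the function name are assumptions for illustration only.

```python
def interleave_subspines(rows):
    """Model serialize -S for one spine: the tokens of each row are written
    on successive lines, primary sub-spine first."""
    return [token for row in rows for token in row]


# Hypothetical spine that splits into two sub-spines for two rows.
rows = [["4c"], ["4d", "4f"], ["4e", "4g"], ["2c"]]
print(interleave_subspines(rows))  # ['4c', '4d', '4f', '4e', '4g', '2c']
```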

    An alternate form of sub-spine serialization is available with the -p option, which causes secondary sub-spines to be serialized after the entire primary sub-spine has been processed. It is useful to add the -t option, which inserts a tag string at each break in continuity in the output data, so that non-sequential "sequences" can be filtered from the output of context.

    serialize -mp input | rid -i
    serialize -mptX input | rid -i
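    The Python sketch below models -p with the -t tag, using the same hypothetical row-per-line representation of one spine. The whole primary sub-spine is output first, followed by each contiguous run of secondary sub-spine data. The tag marks every point where consecutive output tokens were not consecutive in the music. The exact tag placement in the real program is an assumption here, and so is the function name. The point is only that without a tag, context could join the unrelated tokens 4g and 4b.

```python
from itertools import groupby


def primary_then_secondary(rows, tag=None):
    """Model serialize -p (optionally with -t) for one spine."""
    out = [row[0] for row in rows]          # whole primary sub-spine first
    if tag is not None:
        out.append(tag)                     # primary data ends here
    # Group consecutive rows by whether the spine is split on that row.
    for is_split, group in groupby(rows, key=lambda row: len(row) > 1):
        if not is_split:
            continue
        out.extend(row[1] for row in group)  # one contiguous secondary run
        if tag is not None:
            out.append(tag)                 # separate non-contiguous runs
    return out


# Hypothetical spine with two separate split sections.
rows = [["4c"], ["4d", "4f"], ["4e", "4g"], ["4c"], ["4d", "4b"], ["2c"]]
print(primary_then_secondary(rows, tag="X"))
# ['4c', '4d', '4e', '4c', '4d', '2c', 'X', '4f', '4g', 'X', '4b', 'X']
```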

    Chord-note serialization

    Using the -c option will also serialize chord-notes (multiple stops):
    serialize -c input > output

    If a data type uses a character other than a space to separate multiple elements within a token, the -s option can be used to specify that separator character:

    serialize -c -s ':' input > output
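    The Python sketch below models chord-note serialization with -c and a custom -s separator. Only data tokens are modeled; interpretations, barlines and sub-spines are left out. The function name is hypothetical.

```python
def split_subtokens(tokens, sep=" "):
    """Model serialize -c (with -s): every sub-token goes on its own line."""
    out = []
    for token in tokens:
        out.extend(token.split(sep))
    return out


print(split_subtokens(["4c 4e 4g", "4d", "."]))   # ['4c', '4e', '4g', '4d', '.']
print(split_subtokens(["4c:4e", "4d"], sep=":"))  # ['4c', '4e', '4d']
```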

    When the chord-note serialization option (-c) is used without a sub-spine option (-S or -p), the output from serialize may not be a valid Humdrum file. Invalid Humdrum data will be output if chords are present in any sub-spine:

    serialize -c input > output

    If you use the -c option together with the -S or -p option, the output will always have a valid structure. The -c option alone will also generate valid Humdrum file syntax if no chords occur in the sections of the spine that contain sub-spines.

    Sub-token selection

    Three options (-f, -l and -n) can be used to extract particular sub-tokens within the data. For example, if you are interested in note-to-note transitions of the top or bottom note in a chord, use the -f option to select the first sub-token in the data (typically the bottom note in a chord), or the -l option to select the last sub-token in the data (typically the top note in a chord).

    serialize -f input > output
    serialize -l input > output

    The -n option can be used to select a sub-token at a fixed position counted from the start or end of the sub-token list. Negative indexes count from the end of the token, while positive indexes count from the start of the token. The -f option is equivalent to "-n 1" and the -l option is equivalent to "-n -1". If the requested sub-token does not exist in the data, a null token (.) is output in its place. Here are examples of extracting the second sub-token from the front or back of the data tokens:

    serialize -n 2 input > output
    serialize -n -2 input > output
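    The Python sketch below implements the indexing rule described above. Positive n counts from 1 at the start of the token, and negative n counts from -1 at the end. A missing sub-token yields a null token. Treating n = 0 as an error is an assumption, since the manpage does not define it. The function name is hypothetical.

```python
def select_subtoken(token, n, sep=" "):
    """Model serialize -n: -f is n = 1 and -l is n = -1."""
    if n == 0:
        raise ValueError("n must be nonzero")  # assumption: 0 is not a valid index
    parts = token.split(sep)
    index = n - 1 if n > 0 else n  # convert to a Python list index
    if -len(parts) <= index < len(parts):
        return parts[index]
    return "."  # requested sub-token does not exist: output a null token


chord = "4C 4c 4e 4g"
print(select_subtoken(chord, 2), select_subtoken(chord, -2))  # 4c 4e
print(select_subtoken("4d", 2))                               # .
```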

EXAMPLES

    Below is some Humdrum data which contains a mixture of **kern and **dint (diatonic interval) data. Suppose that you want to extract a list of the most common three-interval sequences from this data. In order to do this, the -i dint option should be used to select only the **dint spines. Also, -m should be used to merge the individual serialized spines, and -t X should be used to add an X tag at the end of each original spine's data so that invalid context sequences across spine boundaries can be filtered out.

    Running the following command generates a histogram of three-interval sequences sorted by the most common patterns first:

    serialize -i dint -m -t X | context -n 3 | rid -GLId \
    | grep -v X | sort | uniq -c | sort -nr

    The most common three-interval sequence is +2 +2 +2 (up three seconds) which occurs 5 times in the data. Here is the meaning of each segment of the above command string:

      serialize -i dint -m -t X
          Extract the **dint spines into a single merged column, placing an "X" token after the end of each original spine's data.
      context -n 3
          Use the Humdrum Toolkit command context to join overlapped triplets of data into a single token.
      rid -GLId
          Remove Humdrum-file structure, leaving only non-null Humdrum data.
      grep -v X
          Remove any line containing the character X. This is the tag string that marks the end of a spine. Any sequence generated by context that contains this tag crosses the boundary between the end of one spine and the start of another spine, so that sequence should be removed.
      sort
          Sort input lines into alphabetical order.
      uniq -c
          Remove duplicate adjacent lines, counting how many adjacent lines were identical in the input (-c).
      sort -nr
          Sort lines of input into reverse (-r) numerical order (-n).
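    The final three stages, sort | uniq -c | sort -nr, turn a list of lines into a frequency table. The Python sketch below mirrors those stages. It sorts the lines, then counts runs of identical adjacent lines with itertools.groupby (like uniq -c), then orders the counts from largest to smallest. The sample triplets are hypothetical. Python compares strings by code point, as sort does in the C locale. Ties may come out in a different order than with sort -nr.

```python
from itertools import groupby


def histogram(lines):
    """Model sort | uniq -c | sort -nr: (count, line) pairs, most common first."""
    counted = [(len(list(group)), line) for line, group in groupby(sorted(lines))]
    return sorted(counted, key=lambda pair: pair[0], reverse=True)


# Hypothetical context output after grep -v X.
lines = ["+2 +2 +2", "-2 -2 +3", "+2 +2 +2", "+2 -3 +2", "+2 +2 +2", "-2 -2 +3"]
for count, line in histogram(lines):
    print(count, line)
```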

    Counting harmonic intervals

    Serialize can be used to process hint output so that individual intervals can be counted to generate a histogram of intervals. In the following J.S. Bach chorale, the most common simultaneous note attack interval is a minor third, occurring 27 times (excluding repeats).

    humcat h://371chorales/chor021.krn | hint -a | serialize -c | \
    egrep -v "^=|^-" | rid -GLId | sort | uniq -c | sort -nr > output

      humcat ...
          Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
      hint -a
          Identify harmonic intervals in the music, examining the relations between all simultaneous note permutations. Only simultaneous note attacks are compared.
      serialize -c
          Split sub-tokens in hint output into separate tokens.
      egrep -v "^=|^-"
          Remove barlines and single-note intervals.
      rid -GLId
          Remove Humdrum data structure, leaving only non-null data.
      sort
          Sort data lines alphabetically.
      uniq -c
          Remove duplicate adjacent lines, counting the number of duplicates.
      sort -nr
          Sort lines numerically in reverse order.
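    After serialize -c, the hint output is a single column, and the egrep and rid stages reduce it to bare interval tokens. The Python sketch below approximates those two stages based only on the one-line descriptions above. It drops lines beginning with "=" or "-", comments (!), interpretations (*) and null tokens (.). The full semantics of each rid option are in the rid documentation. The sample lines and function name are hypothetical.

```python
def clean(lines):
    """Approximate egrep -v "^=|^-" | rid -GLId on single-column data."""
    kept = []
    for line in lines:
        if line.startswith(("=", "-")):
            continue  # egrep -v "^=|^-": barlines and lines starting with "-"
        if line.startswith(("!", "*")):
            continue  # rid -G/-L/-I: comments and interpretations
        if line == ".":
            continue  # rid -d: null data lines
        kept.append(line)
    return kept


sample = ["**hint", "!! comment", "m3", "=1", ".", "P8", "m3", "*-"]  # hypothetical
print(clean(sample))  # ['m3', 'P8', 'm3']
```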

    The following command sequence can be used to create a histogram of harmonic intervals weighted by the duration of the interval in the score. In the previous example, only simultaneous note attacks are compared (including ending notes of ties), but the following process will not have this limitation. The following example uses ditto to fill in null tokens with the pitch that they represent. Timebase is used to make each line of data have the same duration (sixteenth notes in this case). Therefore, the output histogram lists the duration of each interval in the music in units of sixteenth notes.

    humcat h://371chorales/chor021.krn | timebase -t 16 | ditto | hint -a | \
    serialize -c | ridx -GLIMd | sort | uniq -c | sort -nr > output

    In this case the most common harmonic interval is the octave, with a total duration of 124 sixteenth notes (31 quarter notes) in the score. Minor thirds are only slightly less common in terms of duration, at 122 sixteenth notes (30.5 quarter notes).

    Here is a description of each command in the command sequence for creating the above data:

      humcat ...
          Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
      timebase -t 16
          Make each data line equivalent to a sixteenth note.
      ditto
          Replace null tokens with the data token which they represent.
      hint -a
          Identify harmonic intervals in the music, examining the relations between all simultaneous note permutations. Only simultaneous attacks are compared, but since ditto has filled in sustained notes, every sounding note is present on each data line.
      serialize -c
          Split sub-tokens in hint output into separate tokens.
      ridx -GLIMd
          Remove Humdrum data structure and barlines, leaving only non-null data.
      sort
          Sort data lines alphabetically.
      uniq -c
          Remove duplicate adjacent lines, counting the number of duplicates.
      sort -nr
          Sort lines numerically in reverse order.
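    Counting lines gives durations in this pipeline for the following reason. After timebase -t 16 and ditto, every data line stands for one sixteenth note. An interval that sounds for k sixteenths therefore contributes k lines. The Python sketch below expands hypothetical (interval, duration in sixteenths) events onto such a grid, counts the grid lines per interval, and converts sixteenth-note counts to quarter notes (four sixteenths per quarter). That conversion is how 124 sixteenths corresponds to 31 quarter notes and 122 to 30.5. The events and function names are illustrative assumptions.

```python
from collections import Counter


def weighted_histogram(events):
    """events: (interval, duration in sixteenth notes) pairs. Expand each event
    onto a sixteenth-note grid, then count grid lines per interval."""
    grid = [interval for interval, sixteenths in events for _ in range(sixteenths)]
    return Counter(grid)


def to_quarters(sixteenths):
    return sixteenths / 4  # four sixteenth notes per quarter note


counts = weighted_histogram([("P8", 8), ("m3", 4), ("P8", 4)])  # hypothetical events
print(counts.most_common())                # [('P8', 12), ('m3', 4)]
print(to_quarters(124), to_quarters(122))  # 31.0 30.5
```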

    More examples of serialize usage are available on the serialize examples page.

SEE ALSO

BUGS

    Be careful when using -c without the -S or -p option, since invalid Humdrum file syntax may be output from the program.

DOWNLOAD

    The compiled serialize program can be downloaded for the following platforms:
    • Linux (i386 processors) (dynamically linked) compiled on 16 Apr 2013.
    • Windows compiled on 29 Jun 2012.
    • Mac OS X/i386 compiled on 13 Nov 2013.

    The source code for the program was last modified on 1 Apr 2013. The full source code is available from the source-code download page.