serialize manpage

COMMAND

serialize

SYNOPSIS

serialize [-m] [-i interp] [-c [-s char]] [[-p] [-S]] [-t tag] [-f | -l | -n #] [input] [> output]

OPTIONS

`-c`	Serialize chord notes (sub-tokens) in addition to spines. See also the `-s` option.
`-f`	Preserve only the first sub-token in a data token. See also the `-l` and `-n` options.
`-i interp`	Process only input spines with the given exclusive interpretation.
`-m`	Merge adjacent serialized spines in the output so that there is only one exclusive interpretation at the top of the data and only one data terminator at the end. If different data types are serialized, only the exclusive interpretation of the first spine is used.
`-l`	Preserve only the last sub-token in a data token. See also the `-f` and `-n` options.
`-n #`	Preserve only the nth sub-token in a data token (negative values count from the end). See also the `-f` and `-l` options.
`-p`	Serialize sub-spines by placing secondary sub-spines after the primary one.
`-s char`	Use the given character as the separator for multiple-stop token serialization with the `-c` option.
`-S`	Serialize sub-spines by interleaving the multiple columns of the same spine.
`-t tag`	Tag string to place at the end of each spine's data. When the `-p` option is used, the tag string is also placed between non-contiguous sub-spine data.

DESCRIPTION

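The serialize command converts the spines of a Humdrum file into a single column of data by placing the data of each spine one after another. This is useful for preparing data to be processed with the context command.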

Suppose that you want to extract three-note sequences from the following data:

     **kern	**kern
     4c	8d
     .	8d
     4c	8d
     .	8d
     4c	8d
     .	8d
     4c	8d
     .	8d
     *-	*-

One way to extract the sequences is to extract each spine, process each one separately with context, and then concatenate the results:

     extract -f1 input | context -n 3 > output1
     extract -f2 input | context -n 3 > output2
     humcat output1 output2 | rid -d > output3


Serialize can be used to merge these two spines into a single spine which can then be processed with the context command, giving similar results without the need for temporary files:

     serialize       serialize -mtX  serialize -mtX | context -n 3 | rid -d | grep -v X
     **kern          **kern          **kern
     4c              4c              4c 4c 4c
     .               .               4c 4c 4c
     4c              4c              8d 8d 8d
     .               .               8d 8d 8d
     4c              4c              8d 8d 8d
     .               .               8d 8d 8d
     4c              4c              8d 8d 8d
     .               .               8d 8d 8d
     *-              X               *-
     **kern          8d
     8d              8d
     8d              8d
     8d              8d
     8d              8d
     8d              8d
     8d              8d
     8d              8d
     8d              X
     *-              *-

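The -m option merges the serialized spines so that the output has only one exclusive interpretation and one data terminator. The -t X option places the tag string X at the end of each spine's data, which marks the boundary between one spine and the next.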

The last column of the above example shows serialize used together with context, which extracts triplets of notes. The rid command with the -d option removes null data lines (lines containing only "." tokens), and grep -v X removes any line that contains an X. In this case any line of context output that contains an X is invalid, because that sequence crosses the boundary between the end of one spine and the start of the next.
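The boundary-filtering idea can be sketched in Python. This is a hypothetical model, not the serialize or context source code. The function names are invented, and the model assumes, consistently with the output shown above, that context skips null tokens when it collects each window of notes.

```python
# Hypothetical model of: serialize -mtX | context -n 3 | rid -d | grep -v X
# Function names are invented for illustration; this is not Humdrum code.


def serialize_tagged(spines, tag="X"):
    """Place each spine's data tokens one after another, ending each with the tag."""
    merged = []
    for spine in spines:
        merged.extend(spine)
        merged.append(tag)
    return merged


def context_windows(tokens, n=3):
    """Join each non-null token with the next n-1 non-null tokens (assumed behavior)."""
    notes = [tok for tok in tokens if tok != "."]
    return [" ".join(notes[i:i + n]) for i in range(len(notes) - n + 1)]


def drop_tagged(lines, tag="X"):
    """Like `grep -v X`: discard any sequence that contains the tag."""
    return [line for line in lines if tag not in line]


spine1 = ["4c", ".", "4c", ".", "4c", ".", "4c", "."]
spine2 = ["8d"] * 8
sequences = drop_tagged(context_windows(serialize_tagged([spine1, spine2])))
for seq in sequences:
    print(seq)
```

The tag X is an ordinary data token, so every window that spans two spines contains it. That is why a plain text filter such as grep -v X is enough to discard those windows.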

Spine Terminology

The following figure shows names for various components of Humdrum data structure. It is useful to know this terminology and how it relates to serialize options.

A column is a series of tokens (cells) occurring at a fixed field count (using tab characters as the field separator) from the beginning of a line. Humdrum spines can have a variable column width, so a fixed column position does not always have the same meaning as a spine of data. The command-line tool cut can be used to extract a column from a Humdrum file (which is not a common thing to do).
A spine of data is shown in the second part of the above figure. A spine's starting column may shift depending on how many columns the previous spines on the line take up, and a spine may be more than one column wide. The extractx program can be used to extract a particular spine or set of spines. Extracting spines is more complicated than extracting columns, which is what cut does. The spine manipulators **, *^, *v, *x, *+, and *- can alter the column position and token width of spines on each line.
As just alluded to, spines can split into more than one column (by using the *^ spine manipulator), and these multiple columns are called sub-spines. The above example has a section with two sub-spines which merge back into a single-column spine after the *v manipulators.
The -S option will place all sub-spine data into a single column in the output. Each sub-spine on the row will follow on a separate line so that the data is interleaved. The -p option also collapses sub-spines into a single spine, but secondary spines are placed after the primary spine (the first sub-spine when a spine is split).
Data fields within spines are called tokens. The Humdrum file format does not have a formal specification for multiple items within a single token. However, different data types defined within spines may have a sub-token formalization. For example, **kern data can have multiple stops (chords) containing several pitches within a single token. A C major chord in one token can be represented as "4c 4e 4g". Each of the three pitches is separated from the others by a single space character. The top illustration of sub-tokens on the far right in the above figure shows an example of using another character, such as a colon (:), to separate the sub-tokens.
The -c option will split sub-tokens into tokens on successive lines in the output data. By default, the sub-token separator is a space character (to match the sub-token separator definition for **kern). But the sub-token separator can be set to any other single character by adding the -s option.
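Here is a minimal Python sketch of the sub-token splitting that -c performs. The separator that -s would set is passed in as a parameter. The function name is hypothetical, and the sketch assumes that empty pieces produced by repeated separators are dropped. It does not model sub-spines.

```python
# Hypothetical model of the -c option with an optional -s separator.
# Not the serialize source code; the function name is invented.


def split_subtokens(tokens, sep=" "):
    """Put every sub-token of every token on its own output line."""
    lines = []
    for token in tokens:
        # Skip empty pieces so that doubled separators do not create empty lines.
        lines.extend(part for part in token.split(sep) if part)
    return lines


print(split_subtokens(["4c 4e 4g", "2d"]))      # ['4c', '4e', '4g', '2d']
print(split_subtokens(["4c:4e:4g"], sep=":"))   # ['4c', '4e', '4g']
```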

Extracting data from specific data types

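The -i option restricts serialization to spines with a given exclusive interpretation. The leading ** may be omitted from the interpretation name, so either of the following commands serializes only the **a spines in the input: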

`serialize -i a input` or `serialize -i '**a' input`
Input:

     **a	**b	**c	**a
     a1	b	c	a2
     a1	b	c	a2
     a1	b	c	a2
     a1	b	c	a2
     *-	*-	*-	*-

Output:

     **a
     a1
     a1
     a1
     a1
     *-
     **a
     a2
     a2
     a2
     a2
     *-

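To produce a single spine that is suitable for processing with context, add the -m option to merge the serialized spines: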

`serialize -m -i a input`
With the same input as in the previous example, the output is:

     **a
     a1
     a1
     a1
     a1
     a2
     a2
     a2
     a2
     *-
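The selection and merging steps above can be modelled in Python. This is an illustrative sketch, not the program's implementation. Each spine is represented as a hypothetical list holding its exclusive interpretation, data tokens, and terminator. Sub-spines are not handled, and interpretations are matched exactly, with no wildcard support.

```python
# Hypothetical model of the -i and -m options (not the serialize source).
# Each spine is a list: exclusive interpretation, data tokens, terminator.


def serialize(spines, interp=None, merge=False):
    """Serialize spines one after another, optionally filtering and merging."""
    if interp is not None and not interp.startswith("**"):
        interp = "**" + interp  # "-i a" and "-i '**a'" select the same spines
    selected = [s for s in spines if interp is None or s[0] == interp]
    if not merge or not selected:
        return [tok for spine in selected for tok in spine]
    # -m: keep the first exclusive interpretation and a single terminator.
    merged = [selected[0][0]]
    for spine in selected:
        merged.extend(spine[1:-1])
    merged.append("*-")
    return merged


columns = [
    ["**a", "a1", "a1", "a1", "a1", "*-"],
    ["**b", "b", "b", "b", "b", "*-"],
    ["**c", "c", "c", "c", "c", "*-"],
    ["**a", "a2", "a2", "a2", "a2", "*-"],
]
print(serialize(columns, interp="a"))
print(serialize(columns, interp="**a", merge=True))
```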

Sub-spine serialization

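The -S option serializes sub-spines by interleaving the multiple columns of a split spine, so that each sub-spine token on a line is placed on a separate line of the output. The following commands serialize a file containing sub-spines, first without and then with the -S option:

     serialize -m input > output

     serialize -m -S input > output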

The -p option selects an alternate form of sub-spine serialization, which serializes the secondary sub-spines after all of the primary sub-spine has been processed. It is useful to add the -t option as well, so that a tag string marks the non-sequential joins in the output data and non-sequential "sequences" can be filtered from the output of context:

     serialize -mp input | rid -i

     serialize -mptX input | rid -i
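The difference between -S and -p can be sketched in Python for a spine that splits into two sub-spines and later merges again. The model is hypothetical. Each row holds the spine's tokens on one data line, and spine-manipulator lines are left out. It also assumes that, with -t, the tag is placed before each run of secondary sub-spine data and at the end of the spine. The real program's exact tag placement may differ.

```python
# Hypothetical model of -S and -p for a spine that splits into at most
# two sub-spines. Each row lists the spine's tokens on one data line;
# spine-manipulator lines (*^, *v) are omitted. Not the serialize source.


def interleave(rows):
    """-S: emit the sub-spine tokens of each row from left to right."""
    return [tok for row in rows for tok in row]


def primary_first(rows, tag=None):
    """-p: emit the primary sub-spine, then each run of secondary data."""
    primary = [row[0] for row in rows]
    runs, current = [], []
    for row in rows:
        if len(row) > 2:
            raise ValueError("this model handles at most two sub-spines")
        if len(row) == 2:
            current.append(row[1])
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    out = list(primary)
    for run in runs:
        if tag:
            out.append(tag)  # mark the non-contiguous join (assumed placement)
        out.extend(run)
    if tag:
        out.append(tag)      # end of the spine's data
    return out


rows = [["4c"], ["4d"], ["4e", "4g"], ["4f", "4a"], ["4e"]]
print(interleave(rows))
print(primary_first(rows, tag="X"))
```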

Chord-note serialization

The -c option serializes chord notes by placing each sub-token on its own line in the output:

     serialize -c input > output

If a data type separates multiple elements in a token with a character other than a space, use the -s option to specify that separator character:

     serialize -c -s ':' input > output

When the chord-note serialization option is used without also specifying a sub-spine option, the output from serialize is not always a valid Humdrum file. Invalid Humdrum data will be output if chords are present in any sub-spine:

     serialize -c input > output

If you use the -c option together with the -S or -p option, the output will always have a valid structure. The -c option alone will also generate valid Humdrum syntax as long as no chords occur in sections of a spine that are split into sub-spines.

Sub-token selection

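The -f, -l, and -n options keep only one sub-token from each data token. The -f option keeps the first sub-token, and the -l option keeps the last one:

     serialize -f input > output

     serialize -l input > output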

The -n option selects the sub-token at a fixed position from the start or end of the sub-token list. Positive indexes count from the start of the token, and negative indexes count from the end. The -f option is equivalent to "-n 1", and the -l option is equivalent to "-n -1". If the requested sub-token does not exist in a token, a null token is output in its place. Here are examples of extracting the second sub-token from the front and from the back of the data tokens:

     serialize -n 2 input > output

     serialize -n -2 input > output
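The indexing rule for -n, with -f and -l as its special cases, can be expressed as a short Python function. This is a sketch of the documented behaviour, not the serialize source. The function name is invented. Treating an index of 0 as an error is an assumption, because the documentation does not say what -n 0 does.

```python
# Hypothetical model of the -f, -l and -n options (not the serialize source).


def select_subtoken(token, n, sep=" "):
    """Return the nth sub-token (1-based); negative n counts from the end.

    A null token "." is returned when the requested sub-token does not exist.
    """
    if n == 0:
        raise ValueError("n must be a non-zero index")  # assumption: 0 is invalid
    parts = [p for p in token.split(sep) if p]
    index = n - 1 if n > 0 else len(parts) + n
    return parts[index] if 0 <= index < len(parts) else "."


chord = "4c 4e 4g"
print(select_subtoken(chord, 1))    # like -f
print(select_subtoken(chord, -1))   # like -l
print(select_subtoken(chord, 2))    # like -n 2
print(select_subtoken(chord, -2))   # like -n -2
print(select_subtoken("4c", 2))     # missing sub-token gives "."
```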

EXAMPLES

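The following example uses -i dint to serialize only the **dint (diatonic interval) spines, -m to merge them into a single spine, and -t X to place the tag X after the data of each original spine so that context sequences crossing a spine boundary can be removed. Running the following command on a file containing **dint spines generates a histogram of three-interval sequences, sorted with the most common patterns first:

     serialize -i dint -m -t X input | context -n 3 | rid -GLId | grep -v X | sort | uniq -c | sort -nr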

The most common three-interval sequence is +2 +2 +2 (three ascending seconds), which occurs 5 times in the data. Here is the meaning of each segment of the above command string:

`serialize -i dint -m -t X`	Extract the **dint spines into a single merged column, placing an "X" token after the end of each original spine's data.
`context -n 3`	Use the Humdrum Toolkit command context to join overlapped triplets of data into a single token.
`rid -GLId`	Remove Humdrum-file structure, leaving only non-null Humdrum data.
`grep -v X`	Remove any line that contains the character `X`. This is the tag string that marks the end of each spine. Any sequence generated by context that contains this tag crosses the boundary between the end of one spine and the start of another, so it should be removed.
`sort`	Sort input lines into alphabetical order.
`uniq -c`	Remove duplicate adjacent lines, counting how many adjacent lines were identical in input (`-c`).
`sort -nr`	Sort lines of input into reverse (`-r`) numerical order (`-n`).
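The last three stages of the pipeline (sort | uniq -c | sort -nr) build a frequency table. A rough Python equivalent using collections.Counter is shown below. The input sequences are made up for illustration and are not the data behind the result reported above.

```python
from collections import Counter

# Hypothetical Python equivalent of the tail of the pipeline:
#   sort | uniq -c | sort -nr


def histogram(lines):
    """Count identical lines and list them from most to least common."""
    counts = Counter(lines)
    # Ties are ordered alphabetically here; `sort -nr` may order ties differently.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up interval sequences, for illustration only.
sequences = ["+2 +2 +2", "-2 -2 +3", "+2 +2 +2", "+3 -2 -2", "+2 +2 +2"]
for sequence, count in histogram(sequences):
    print(count, sequence)
```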

Counting harmonic intervals

Serialize can also be combined with hint to count harmonic intervals. The following command sequence counts the harmonic intervals in J.S. Bach's Chorale 21:

     humcat ... | hint -a | serialize -c | egrep -v "^=|^-" | rid -GLId | sort | uniq -c | sort -nr

Here is the meaning of each segment of the command:

`humcat ...`	Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
`hint -a`	Identify harmonic intervals in the music, examining the relations between all simultaneous note permutations. Only simultaneous note attacks are compared.
`serialize -c`	Split sub-tokens in hint output into separate tokens.
`egrep -v "^=|^-"`	Remove barlines and single-note intervals.
`rid -GLId`	Remove Humdrum data structure, leaving only non-null data.
`sort`	Sort data lines alphabetically.
`uniq -c`	Remove duplicate adjacent lines, counting the number of duplicates.
`sort -nr`	Sort lines numerically in reverse order.

The following command sequence creates a histogram of harmonic intervals weighted by the duration of each interval in the score. The previous example compares only simultaneous note attacks (including the ending notes of ties), but the following process does not have this limitation. It uses ditto to fill in null tokens with the pitches that they represent, and timebase to make every data line the same duration (a sixteenth note in this case). As a result, the histogram lists the duration of each interval in the music in units of sixteenth notes:

     humcat ... | timebase -t 16 | ditto | hint -a | serialize -c | ridx -GLIMd | sort | uniq -c | sort -nr

In this case the most common harmonic interval is the octave, with a total duration of 124 sixteenth notes (31 quarter notes) in the score. Minor thirds are only slightly less common in terms of duration, at 122 sixteenth notes (30.5 quarter notes).

Here is a description of each command in the command sequence for creating the above data:

`humcat ...`	Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
`timebase -t 16`	Make each data line equivalent to a sixteenth note.
`ditto`	Replace null-tokens with the data token which they represent.
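`hint -a`	Identify harmonic intervals in the music, examining the relations between all simultaneous note permutations. Only simultaneous attacks are compared, but because ditto has repeated each sustained note on every sixteenth-note line, all sounding notes are compared.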
`serialize -c`	Split sub-tokens in hint output into separate tokens.
`ridx -GLIMd`	Remove Humdrum data structure and barlines, leaving only non-null data.
`sort`	Sort data lines alphabetically.
`uniq -c`	Remove duplicate adjacent lines, counting the number of duplicates.
`sort -nr`	Sort lines numerically in reverse order.
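A small Python model shows why combining timebase and ditto weights each interval by its duration. It is a simplification and not how hint works. Pairs of pitches stand in for hint's interval names, the voices are made-up data on a sixteenth-note grid, and the function names are hypothetical. Once every line represents one sixteenth note and every null token has been filled in, the number of lines on which a pair sounds equals its duration in sixteenth notes.

```python
from collections import Counter

# Hypothetical model of duration weighting after timebase -t 16 and ditto.
# Pitch pairs stand in for hint's interval names to keep the sketch short.


def ditto(column):
    """Replace each null token "." with the last non-null token above it."""
    filled, last = [], "."
    for token in column:
        if token != ".":
            last = token
        filled.append(last)
    return filled


def weighted_pairs(lower, upper):
    """Count sounding (lower, upper) pairs; each line is one sixteenth note."""
    pairs = zip(ditto(lower), ditto(upper))
    return Counter(pair for pair in pairs if "." not in pair)


# Made-up voices on a sixteenth-note grid (one quarter note of music).
bass = ["C", ".", ".", "."]
soprano = ["c", ".", "e", "."]
for pair, sixteenths in weighted_pairs(bass, soprano).items():
    print(pair, sixteenths, "sixteenths =", sixteenths / 4, "quarter notes")
```

Dividing by 4 converts sixteenth notes to quarter notes. This is the same conversion used for the 124 and 122 sixteenth-note totals above.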

More examples of using serialize are available on the serialize examples page.

BUGS

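When the -c option is used without the -S or -p option, the output is not valid Humdrum data if chords occur within sub-spines.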

DOWNLOAD

Compiled versions of serialize are available for the following platforms:

Linux (i386 processors) (dynamically linked) compiled on 16 Apr 2013.
Windows compiled on 29 Jun 2012.
Mac OS X/i386 compiled on 13 Nov 2013.

The source code for the program was last modified on 1 Apr 2013. The full source code is available from the source-code download page.
