serialize manpage
COMMAND
serialize -- Rearrange multiple input spines into a single spine sequence.
SYNOPSIS
serialize [-m] [-i interp] [-c [-s char]] [[-p] [-S]] [-t tag] [-f | -l | -n #] [input] [> output]
OPTIONS
-c
    Serialize chord notes (sub-tokens) in addition to spines. See also the -s option.
-f
    Preserve only the first sub-token in a data token. See also the -l and -n options.
-i interp
    Process only input spines with the given exclusive interpretation.
-m
    Merge adjacent serialized spines in the output so that there is only one exclusive
    interpretation at the top of the data and only one data terminator at the end.
    If different data types are being serialized, only the exclusive interpretation
    of the first spine will be used.
-l
    Preserve only the last sub-token in a data token. See also the -f and -n options.
-n #
    Preserve only the #th sub-token in a data token; negative values count from the
    end of the token. See also the -f and -l options.
-p
    Serialize sub-spines by placing secondary sub-spines after the primary one.
-s char
    Use the given character as the separator for multi-stop token serialization with
    the -c option.
-S
    Serialize sub-spines by interleaving multiple columns of the same spine.
-t tag
    Tagging string to place at the end of each spine's data. When the -p option is
    used, the tag string is also placed between non-contiguous sub-spine data.
DESCRIPTION
The serialize program is designed to facilitate data input into
the context
command which requires Humdrum data to contain only one spine.
Serialize will merge data from multiple spines, sub-spines and sub-tokens
into a single output spine according to various options which are
described below.
Suppose that you want to extract three-note sequences from the
following data:
One possible method for extracting the sequences would be to
extract each spine and process separately with context:
Serialize can be used to merge these two spines into a single spine
which can then be processed with the
context
command, giving similar results without the need for temporary files:
[Example table columns: (1) serialize, (2) serialize -mtX, (3) serialize -mtX | context -n3 | rid -d | grep -v X]
Without any options, all spines will be placed one after another in
their original order from left to right in the input Humdrum data (shown
in the first column of the above examples). Adding the -m
option will merge the individual serialized spines into a single spine
(regardless of whether they are actually of the same data type), and
the option -t X places the string "X" at the end of
each original spine's data in the merged output.
A use of serialize with
context
is shown in the last column of the
above examples. Context is used to extract triplets of notes.
The rid command with the -d option removes null data records (lines
containing only the null token "."), and grep -v X removes any line
that contains an X. Any output line from context
that contains an X in this case is invalid, since it represents a
sequence crossing the boundary between the end of one spine and the
start of the next spine.
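The tag-and-filter idea above can be sketched in Python. This is a simplified model, not serialize's implementation: it assumes a file with a fixed set of spines and no spine manipulators, and the function names serialize_merged and triplets_without_tag are hypothetical. The second function stands in for the context -n3 | rid -d | grep -v X part of the pipeline.

```python
# Simplified model of "serialize -mtX" followed by "context -n3 | grep -v X".
# Assumes no spine manipulators (*^, *v, *x, ...) and no comments inside data.

def serialize_merged(lines, tag="X"):
    """Place each spine's data one after another, left to right.

    Like -m, only the first spine's exclusive interpretation is kept at the
    top and a single terminator (*-) at the end; like -t, the tag string is
    appended after each original spine's data.
    """
    rows = [line.split("\t") for line in lines]
    header = rows[0][0]  # exclusive interpretation of the first spine
    data_rows = [r for r in rows if not r[0].startswith(("*", "!"))]
    out = [header]
    for col in range(len(rows[0])):
        out.extend(row[col] for row in data_rows)
        out.append(tag)
    out.append("*-")
    return out


def triplets_without_tag(stream, n=3, tag="X"):
    """Join overlapping n-token windows and drop any that contain the tag."""
    data = [t for t in stream if not t.startswith(("*", "!"))]
    windows = [" ".join(data[i:i + n]) for i in range(len(data) - n + 1)]
    return [w for w in windows if tag not in w]  # like grep -v X


if __name__ == "__main__":
    humdrum = ["**kern\t**kern", "4c\t4e", "4d\t4f", "4e\t4g", "4f\t4a", "*-\t*-"]
    merged = serialize_merged(humdrum)
    print(merged)
    print(triplets_without_tag(merged))
```

Windows such as "4e 4f X" or "4f X 4e" straddle the boundary between the two original spines and are discarded, leaving only the within-spine triplets.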
Spine Terminology
The following figure shows names for various components of
Humdrum data structure. It is useful to know this terminology and
how it relates to serialize options.
- A column is a series of tokens (cells) occurring
at a fixed field count (using tab characters as the field
separator) from the beginning of a line. Humdrum spines
can have a variable column width, so a fixed column position does
not always have
the same meaning as a spine of data. The command-line tool cut can be used to
extract a column from a Humdrum file (which is not a common thing to do).
- A spine of data is shown in the second part of the above
figure. A spine's starting column may shift depending on how many columns
the previous spines on the line take up, and a spine may
be more than one column wide. The extractx
program can be used to extract a particular spine or set of spines.
Extracting spines is more complicated than extracting columns with
cut, because the spine manipulators
**, *^, *v, *x, *+, and
*- can
alter the column position and width of spines on each line.
- As just alluded to, spines can split into more than one column (by
using the *^ spine manipulator), and these multiple columns are
called sub-spines. The above example has a section with two sub-spines
which merge back into a single-column spine after the *v manipulators.
The -S option will place all sub-spine data into a single column
in the output, with the sub-spine tokens of each row placed on successive lines
so that the data is interleaved. The -p option also collapses
sub-spines into a single spine, but secondary sub-spines are placed after the primary sub-spine (the first sub-spine when a spine is split).
- Data fields within spines are called tokens. The Humdrum
file format does not have a formal specification for multiple items within
a single token. However, different data types defined within spines may
have a sub-token formalization. For example, **kern
data can have multiple-stops (chords) containing several pitches within
a single token. A C major chord in one token can be represented as
"4c 4e 4g". Each of the three pitches is separated from the
others by a single space character. The top illustration of sub-tokens
on the far right in the above figure shows an example of using another
character such as a colon (:) to separate the sub-tokens.
The -c option will split sub-tokens into tokens on successive
lines in the output data. By default, the sub-token separator is a space
character (to match the sub-token separator definition for
**kern), but it can be set to any other
single character with the -s option.
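A minimal Python sketch of the sub-token splitting performed by -c (with -s supplying the separator) might look like the following. The function name split_subtokens is hypothetical, the **x interpretation in the usage lines is made up, and non-data records are simply passed through unchanged.

```python
def split_subtokens(tokens, sep=" "):
    """Expand each data token into its sub-tokens, one per output line.

    Interpretations (*), comments (!) and barlines (=) are copied unchanged;
    empty pieces from repeated separators are dropped.
    """
    out = []
    for tok in tokens:
        if tok.startswith(("*", "!", "=")):
            out.append(tok)
        else:
            out.extend(piece for piece in tok.split(sep) if piece)
    return out


if __name__ == "__main__":
    print(split_subtokens(["**kern", "4c 4e 4g", "4d", "*-"]))
    print(split_subtokens(["**x", "a:b", "c", "*-"], sep=":"))
```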
Extracting data from specific data types
By using the -i option, you can selectively extract the spines
with a particular exclusive interpretation from the input data.
The following example serializes data from the input file which
uses the **a exclusive interpretation. The name of
the exclusive interpretation can optionally start with the formal
two stars (**), or they can be omitted. If you include
the stars, then you must also enclose the name in single quotes
so that the shell does not interpret the * characters as
filename wildcards.
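Accepting the optional ** prefix for -i amounts to a simple normalization step. This Python sketch (the function name spines_with_interp is hypothetical) returns the column indices of matching spines in an exclusive-interpretation line, assuming no spine manipulators precede the data.

```python
def spines_with_interp(header_line, interp):
    """Return 0-based column indices whose exclusive interpretation matches.

    The leading '**' on interp is optional, so "dint" and "**dint" select
    the same spines.
    """
    wanted = interp if interp.startswith("**") else "**" + interp
    return [i for i, name in enumerate(header_line.split("\t")) if name == wanted]


if __name__ == "__main__":
    header = "**kern\t**dint\t**kern\t**dint"
    print(spines_with_interp(header, "dint"))    # [1, 3]
    print(spines_with_interp(header, "**dint"))  # [1, 3]
```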
The context command cannot handle multiple spines serialized as
shown in the above example, but instead requires only a single spine
(without sub-spines) to occur in its input. The -m option can
be used to merge multiple serialized spines into a single data stream.
The exclusive interpretations do not need to match. The exclusive
interpretation of the first spine found in the input data will be used
in the output data.
Sub-spine serialization
To remove sub-spines (columns which share the same
initial exclusive interpretation declaration) in addition to serializing parallel
spines, use the -S option. By default, sub-spines are not merged into a
single-column spine (shown on left below).
However, if you add the -S option, sub-spine data will be interleaved
row-by-row (shown on right below).
An alternate form of sub-spine serialization can be done by adding the
-p option. This causes secondary sub-spines to be serialized
after all of the primary sub-spine data has been processed. It is useful
to add the -t option to insert a tag string wherever non-contiguous data are joined in the
output, so that non-sequential "sequences" can be filtered from the output
of context.
[Example table columns: (1) serialize -mp input | rid -i, (2) serialize -mptX input | rid -i]
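The difference between -S and -p can be sketched in Python on a single spine that splits into two sub-spines for part of its length. This is a simplified model, assuming at most two sub-spines and rows from which the spine-manipulator lines have already been removed. The function names are hypothetical, and the exact tag placement in serialize's own output may differ.

```python
def interleave_subspines(rows):
    """-S style: the sub-spine tokens of each row follow one another."""
    return [tok for row in rows for tok in row]


def primary_then_secondary(rows, tag=None):
    """-p style: all primary tokens first, then each run of secondary tokens.

    When a tag is given, it is placed before each secondary run (where the
    data is non-contiguous) and at the end of the spine's data.
    """
    out = [row[0] for row in rows]
    run = []
    for row in rows + [[None]]:  # sentinel row flushes the final run
        if len(row) > 1:
            run.append(row[1])
        elif run:
            if tag is not None:
                out.append(tag)
            out.extend(run)
            run = []
    if tag is not None:
        out.append(tag)
    return out


if __name__ == "__main__":
    rows = [["4c"], ["4d", "4f"], ["4e", "4g"], ["4f"]]
    print(interleave_subspines(rows))
    print(primary_then_secondary(rows))
    print(primary_then_secondary(rows, tag="X"))
```

Without the tag, a three-token window such as "4e 4f 4f" would falsely join the end of the primary sub-spine to the start of the secondary one; with the tag, that window contains an X and can be removed with grep -v X.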
Chord-note serialization
Using the -c option will also serialize chord-notes (multiple stops):
serialize -c input > output
If a data type uses a separator other than a space between multiple elements, the -s option can be used to specify that separator character:
serialize -c -s ':' input > output
When the chord-note serialization option is used without also
specifying a sub-spine option, there will be cases where the output
from serialize is not a valid Humdrum file. Invalid Humdrum
data will be output if chords are present in any sub-spine:
serialize -c input > output
If you use the -c option with the -S or -p
options, the output will always have a valid structure. Also, if
chords occur only in sections of a spine that contain no sub-spines,
the -c option alone will still generate valid Humdrum file syntax in
the output.
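One plausible way to see why -c alone can break sub-spines: splitting a chord in one sub-spine adds lines to that column only, so the parallel sub-spine columns no longer have the same number of rows. The following Python illustration uses made-up column data and a hypothetical helper, expand_column. Under this reading, -S and -p avoid the mismatch because the sub-spines have already been collapsed into one column.

```python
def expand_column(column, sep=" "):
    """Split every token of one column into its sub-tokens (like -c)."""
    return [piece for tok in column for piece in tok.split(sep) if piece]


if __name__ == "__main__":
    left = ["4c", "4d 4f", "4e"]   # sub-spine containing a chord
    right = ["4g", "4a", "4b"]     # parallel sub-spine without chords
    print(len(expand_column(left)), len(expand_column(right)))  # 4 3
```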
Sub-token selection
Three options (-f, -l and -n) can be used
to extract particular sub-tokens within the data. For example, if you
are interested in note-to-note transitions of the top or bottom note
in a chord, use the -f option to select the first sub-token
in the data (typically the bottom note in a chord), or the -l option
to select the last sub-token in the data (typically the top note in a
chord).
The -n option can be used to select a sub-token located at a fixed
position from the start or end of the sub-token list. Negative values of -n count from the end of the token, while positive values
count from the start of the token. The -f option
is equivalent to "-n 1" and the -l option is equivalent
to "-n -1". If the requested sub-token does not exist in the
data, a null token will be output for that token. For example, "-n 2"
extracts the second sub-token from the front of each data token, and "-n -2" extracts the second sub-token from the back.
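The indexing rules for -n, including the null-token fallback, can be expressed as a short Python function. This is an illustrative sketch with a hypothetical name (select_subtoken), assuming a space separator by default; it is not taken from serialize's source.

```python
def select_subtoken(token, n, sep=" "):
    """Return the nth sub-token of a token, or the null token '.' if absent.

    n is 1-based from the front; negative n counts from the back, so n=1
    behaves like -f and n=-1 like -l.
    """
    parts = [p for p in token.split(sep) if p]
    if 1 <= n <= len(parts):
        return parts[n - 1]
    if -len(parts) <= n <= -1:
        return parts[n]
    return "."


if __name__ == "__main__":
    chord = "4C 4c 4e 4g"
    print(select_subtoken(chord, 2))     # 4c
    print(select_subtoken(chord, -2))    # 4e
    print(select_subtoken("4c", 2))      # .
```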
EXAMPLES
Below is some Humdrum data which contains a mixture of
**kern and **dint (diatonic interval) data.
Suppose that you want to extract a list of the most common three-interval
sequences from this data. In order to do this, the -i dint
option should be used to select only the **dint spines.
Also, -m should be used to merge the individual serialized spines,
and -t X should be used to add an X tag at the end of each
original spine's data so that invalid context sequences across spine boundaries
can be filtered out.
Running the following command generates a histogram of three-interval sequences
sorted by the most common patterns first:
serialize -i dint -m -t X | context -n 3 | rid -GLId | grep -v X | sort | uniq -c | sort -nr
The most common three-interval sequence is +2 +2 +2 (three ascending
seconds), which occurs 5 times in the data. Here is the meaning of each
segment of the above command string:
serialize -i dint -m -t X |
    Extract the **dint spines into a single merged column, placing an "X" token
    after the end of each original spine's data.
context -n 3 |
    Use the Humdrum Toolkit command context to join overlapping triplets of data
    into a single token.
rid -GLId |
    Remove Humdrum-file structure, leaving only non-null Humdrum data.
grep -v X |
    Remove any lines that contain the character X. This is the tag string which
    marks the end of each spine. Any sequence generated by context which contains
    this tag crosses the boundary between the end of one spine and the start of
    another spine, so that sequence should be removed.
sort |
    Sort input lines into alphabetical order.
uniq -c |
    Collapse duplicate adjacent lines into one, prefixing each with the number of
    identical adjacent input lines (-c).
sort -nr |
    Sort lines of input into reverse (-r) numerical order (-n).
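The last three stages of the pipeline build a frequency table. In Python the same histogram could be computed with collections.Counter. This sketch (hypothetical function name histogram) assumes its input is already the list of tag-free sequences produced by the earlier stages; the sample sequences are made up.

```python
from collections import Counter


def histogram(lines):
    """Count identical lines and order them most-common first.

    Mirrors `sort | uniq -c | sort -nr`; equal counts are ordered in
    reverse character order, roughly as sort -r does in the C locale.
    """
    counts = Counter(lines)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


if __name__ == "__main__":
    sequences = ["+2 +2 +2", "-2 +2 +2", "+2 +2 +2",
                 "+2 -3 +2", "-2 +2 +2", "+2 +2 +2"]
    for seq, count in histogram(sequences):
        print(f"{count:4d} {seq}")
```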
Counting harmonic intervals
Serialize can be used to process hint output so that
individual intervals can be counted to generate a histogram of intervals.
In the following J.S. Bach chorale, the most common simultaneous note attack interval is a minor third, occurring 27 times (excluding repeats).
humcat h://371chorales/chor021.krn | hint -a | serialize -c | \
egrep -v "^=|^-" | rid -GLId | sort | uniq -c | sort -nr > output
humcat ... |
    Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
hint -a |
    Identify harmonic intervals in the music, examining the relations between
    all simultaneous note permutations. Only simultaneous note attacks are
    compared.
serialize -c |
    Split sub-tokens in hint output into separate tokens.
egrep -v "^=|^-" |
    Remove barlines and single-note intervals.
rid -GLId |
    Remove Humdrum data structure, leaving only non-null data.
sort |
    Sort data lines alphabetically.
uniq -c |
    Remove duplicate adjacent lines, counting the number of duplicates.
sort -nr |
    Sort lines numerically in reverse order.
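Once serialize -c has placed one interval per line, the filtering stages reduce to simple line tests. This Python sketch (hypothetical function name interval_data) approximates the egrep and rid -GLId stages on a made-up fragment of serialized hint output; it does not reproduce every rid option.

```python
def interval_data(lines):
    """Keep only non-null data lines from single-column hint output.

    A rough stand-in for `egrep -v "^=|^-" | rid -GLId`: lines starting
    with "=" or "-", comments (!), interpretations (*) and null tokens (.)
    are discarded.
    """
    return [
        line for line in lines
        if line and not line.startswith(("=", "-", "!", "*")) and line != "."
    ]


if __name__ == "__main__":
    serialized = ["**hint", "*M4/4", "m3", "P8", "=1", ".", "!! note", "m3", "*-"]
    print(interval_data(serialized))   # ['m3', 'P8', 'm3']
```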
The following command sequence can be used to create a histogram of
harmonic intervals weighted by the duration of the interval in the score.
In the previous example, only simultaneous note attacks are compared
(including ending notes of ties), but the following process will not have
this limitation. The following example uses ditto to
fill in null tokens with the pitch that they represent. Timebase
is used to make each line of data have the same duration (sixteenth notes in
this case). Therefore, the output histogram lists the duration
of each interval in the music in units of sixteenth notes.
humcat h://371chorales/chor021.krn | timebase -t 16 | ditto | hint -a | \
serialize -c | ridx -GLIMd | sort | uniq -c | sort -nr > output
In this case the most common harmonic interval is an octave, which
occurs for a total duration of 124 sixteenth notes (31 quarter notes) in
the score. Minor thirds are only slightly less common in terms of
duration, at 122 sixteenth notes (30.5 quarter notes).
Here is a description of each command in the command sequence for creating
the above data:
humcat ... |
    Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
timebase -t 16 |
    Make each data line equivalent to a sixteenth note.
ditto |
    Replace null tokens with the data token which they represent.
hint -a |
    Identify harmonic intervals in the music, examining the relations between
    all simultaneous note permutations. Because ditto has filled in the null
    tokens, sustained notes are compared as well as simultaneous attacks.
serialize -c |
    Split sub-tokens in hint output into separate tokens.
ridx -GLIMd |
    Remove Humdrum data structure and barlines, leaving only non-null data.
sort |
    Sort data lines alphabetically.
uniq -c |
    Remove duplicate adjacent lines, counting the number of duplicates.
sort -nr |
    Sort lines numerically in reverse order.
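The duration weighting works because, after timebase, every data line represents the same span of time, and ditto makes each sounding note visible on every line it occupies. The following Python sketch models that idea with a hypothetical ditto function and counts note pairs in place of the intervals that hint would compute; the two-voice grid is made up.

```python
from collections import Counter


def ditto(rows):
    """Replace each null token '.' with the last non-null token above it in
    the same column (a simplified model of the ditto command)."""
    last = {}
    filled_rows = []
    for row in rows:
        filled = []
        for col, tok in enumerate(row):
            if tok == ".":
                tok = last.get(col, ".")
            else:
                last[col] = tok
            filled.append(tok)
        filled_rows.append(filled)
    return filled_rows


if __name__ == "__main__":
    # Two voices on a sixteenth-note grid, as after "timebase -t 16":
    # the first voice has two eighth notes, the second voice one quarter note.
    grid = [["8c", "4e"], [".", "."], ["8d", "."], [".", "."]]
    filled = ditto(grid)
    # Each row now lasts one sixteenth, so counting rows weights by duration.
    print(Counter(tuple(row) for row in filled))
```

Each note pair is counted twice, once per sixteenth-note line, which corresponds to its eighth-note duration; without ditto, the null tokens would hide the sustained quarter note on three of the four lines.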
More example usages of the serialize program are available on the serialize examples page.
SEE ALSO
BUGS
Be careful when using -c without the -S or -p option, since
invalid Humdrum file syntax may be output from the program.
DOWNLOAD
The compiled serialize program can
be downloaded for the following platforms:
- Linux (i386 processors)
(dynamically linked) compiled on 16 Apr 2013.
- Windows compiled on 29 Jun 2012.
- Mac OS X/i386 compiled on 13 Nov 2013.
The source code for the program was last modified on 1 Apr 2013. The full source code is available from the source-code download page.