serialize manpage
 
COMMAND
 serialize -- Rearrange multiple input spines into a single spine sequence. 
SYNOPSIS
 serialize [-m] [-i interp] [-c [-s char]] [[-p] [-S]] [input] [> output]
OPTIONS
 -c
      Serialize chord notes (sub-tokens) in addition to spines.
      See also the -s option.
 -f
      Preserve only the first sub-token in a data token.
      See also the -l and -n options.
 -i interp
      Process only input spines with the given exclusive interpretation.
 -m
      Merge adjacent serialized spines in the output so that there
      is only one exclusive interpretation at the top of the data
      and only one data terminator at the end.
      If different data types are being serialized, only the exclusive
      interpretation of the first spine will be used.
 -l
      Preserve only the last sub-token in a data token.
      See also the -f and -n options.
 -n #
      Preserve only the nth sub-token in a data token.
      See also the -f and -l options.
 -p
      Serialize sub-spines by placing secondary spines after
      the primary one.
 -s char
      Use the given character as a separator for multi-stop token
      serialization with the -c option.
 -S
      Serialize sub-spines by interleaving multiple columns
      of the same spine.
 -t tag
      Tagging string to place at the end of each spine's data.
      When the -p option is used, also place the tag
      string between non-contiguous sub-spine data.
 
 
DESCRIPTION
 
 The serialize program is designed to facilitate data input into
 the context
 command which requires Humdrum data to contain only one spine.
 Serialize will merge data from multiple spines, sub-spines and sub-tokens
 into a single output spine according to various options which are
 described below.
 
  Suppose that you want to extract three-note sequences from the 
 following data:
  
     
 
 
  One possible method for extracting the sequences would be to
 extract each spine and process separately with context:
  
 
 
 
 
 
 Serialize can be used to merge these two spines into a single spine
 which can then be processed with the 
 context 
 command, giving similar results without the need for temporary files:
 
  
 
 
 
 
 
     serialize
     serialize -mtX
     serialize -mtX | context -n3 | rid -d | grep -v X
 
 
 
 Without any options, all spines will be placed one after another in
 their original order from left to right in the input Humdrum data (shown
 in the first column of the above examples).  Adding the -m
 option will merge the individual serialized spines into a single spine
 (regardless of whether they are actually of the same data type), and
 the option -t X places the string "X" at the end of
 each original spine's data in the merged output.
 
 
 A use of serialize with 
 context
 is given in the last column of the 
 above examples.  Context is used to extract triplets of notes.
 Using the rid command with the -d option removes null tokens (.)
 from the data, and grep -v X means remove any line in the
 data which contains an X.  Any output line from context
 which contains an X in this case is invalid, since that would mean
 the sequence crosses the boundary between the end of one spine and the
 start of another spine.
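
 The filtering step described above can be sketched with standard tools alone. The lines fed in below are made-up stand-ins for what context might produce from tagged, merged data, and drop_tagged is a hypothetical helper name, so treat this as an illustration of the idea and not as real serialize output.

```shell
# Hypothetical helper: remove every line that contains the tag string in $1.
drop_tagged() {
    grep -v -- "$1"
}

# Toy three-note sequences; any line containing X spans a spine boundary.
filtered=$(printf '%s\n' 'c d e' 'd e X' 'e X f' 'X f g' 'f g a' 'g a X' | drop_tagged X)

# Only the sequences that stay inside one original spine remain.
printf '%s\n' "$filtered"
```

 Because grep matches the tag anywhere on the line, the tag string should be one that cannot occur in the data itself.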
 
 
 
   Spine Terminology 
 
  The following figure shows names for various components of 
 Humdrum data structure.  It is useful to know this terminology and 
 how it relates to serialize options.
 
  
      
 
 
 
 -  A column is a series of tokens (cells) occurring
 at a fixed field count (using tab characters as the field
 separator) from the beginning of a line.  Humdrum spines
 can have a variable column width, so a fixed column position does 
 not always have
 the same meaning as a spine of data.  The command-line tool cut can be used to
 extract a column from a Humdrum file (which is not a common thing to do).
 
  
 
  -  A spine of data is shown in the second part of the above
 figure.  A spine's starting column may shift depending on how many columns
 the previous spines on the line take up, and the width of a spine may
 be more than one column wide.  The extractx
 program can be used to extract a particular spine or set of spines.
 The process of extracting spines is more complicated than the 
 process of extracting columns which cut does.  The spine manipulators
 **, *^, *v, *x, *+, and *- can
 alter the column position and token width of spines on each line.
 
  
 
  -  As just alluded to, spines can split into more than one column (by 
 using the *^ spine manipulator), and these multiple columns are 
 called sub-spines.  The above example has a section with two sub-spines
 which merge back into a single-column spine after the *v manipulators.
 
 
 The -S option will place all sub-spine data into a single column
 in the output.  Each sub-spine on the row will follow on a separate line
 so that the data is interleaved.  The -p option also collapses
 sub-spines into a single spine, but secondary spines are placed after the primary spine (the first sub-spine when a spine is split).
  
 
   -  Data fields within spines are called tokens.  The Humdrum
 file format does not have a formal specification for multiple items within
 a single token.  However, different data types defined within spines may
 have a sub-token formalization.  For example, **kern
 data can have multiple-stops (chords) containing several pitches within
 a single token.  A C major chord in one token can be represented as
 "4c 4e 4g".  Each of the three pitches is separated from the
 others by a single space character.  The top illustration of sub-tokens
 on the far right in the above figure shows an example of using another
 character such as a colon (:) to separate the sub-tokens.
 
 
 The -c option will split sub-tokens into tokens on successive
 lines in the output data.  By default, the sub-token separator is a space
 character (to match the sub-token separator definition for 
 **kern). But the sub-token separator can be set to any other 
 single character by adding the -s option.
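
 As a rough picture of what splitting sub-tokens onto successive lines means, the sketch below uses tr on a single token. It only mimics the idea for one isolated token; split_subtokens is a hypothetical helper and does not reproduce how serialize handles interpretations, barlines, or multiple spines.

```shell
# Hypothetical helper: put each sub-token on its own line,
# using the single separator character given in $1.
split_subtokens() {
    tr "$1" '\n'
}

# Default **kern convention: sub-tokens separated by a space.
kern_notes=$(printf '%s\n' '4c 4e 4g' | split_subtokens ' ')

# A data type that uses a colon as the sub-token separator.
colon_notes=$(printf '%s\n' 'a:b:c' | split_subtokens ':')

printf '%s\n' "$kern_notes" "$colon_notes"
```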
 
    
 
 
 
  
  Extracting data from specific data types 
 
 By using the -i option, you can selectively extract a 
 particular exclusive interpretation spine set from the input data.
 The following example serializes data from the input file which
 uses the **a exclusive interpretation.  The name of
 the exclusive interpretation can optionally start with the formal
 two stars (**), or they can be omitted.  If you include
 the stars, then you must also enclose the name in single quotes
 so that the command-line interpreter does not parse the name as
 a file (which would then interpret the * as a 
 wildcard character).
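
 The quoting rule can be checked without serialize itself. In the sketch below, show_arg is a hypothetical helper that prints its first argument exactly as the shell delivered it; the commented command lines show the two equivalent ways the text describes for naming the interpretation, with "input" standing for any input file.

```shell
# Hypothetical helper: print the first argument exactly as received.
show_arg() {
    printf '%s\n' "$1"
}

# Single quotes stop the shell from treating * as a filename wildcard.
quoted=$(show_arg '**a')
# Without the stars, no quoting is needed.
bare=$(show_arg a)

printf '%s\n' "$quoted" "$bare"

# Corresponding invocations, following the description above:
#   serialize -i '**a' input
#   serialize -i a input
```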
 
 
 
 
 
 
  
 The context command cannot handle multiple spines serialized as
 shown in the above example, but instead requires only a single spine
 (without sub-spines) to occur in its input.  The -m option can
 be used to merge multiple serialized spines into a single data stream.
 The exclusive interpretations do not need to match.  The exclusive
 interpretation of the first spine found in the input data will be used
 in the output data.
 
 
 
 
 
 
 
  
 Sub-spine serialization
 
 To remove sub-spines (columns which share the same
 initial exclusive interpretation declaration) in addition to serializing parallel
 spines, use the -S option.  By default, sub-spines are not merged into
 a single-column spine (shown on left below).
 However, if you add the -S option, sub-spine data will be interleaved
 row-by-row (shown on right below).
 
 
 
 
  
 
 
 
 
  
  An alternate form of sub-spine serialization can be done by adding the
 -p option.  This will cause secondary sub-spines to be serialized
 after all of the primary sub-spine has been processed.  It is useful
 to add the -t option to place a tag string in the output data wherever
 adjacent lines are not sequential in the input, so that non-sequential
 "sequences" can be filtered from the output of context.
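
 The difference between the two sub-spine orderings can be pictured with a toy two-column region. The sketch below uses only tr and cut on invented data rows (a1, b1, and so on); it ignores interpretation and manipulator lines, and the exact tag placement is an assumption based on the -t description, so it illustrates the ordering only.

```shell
# Three data rows of a spine that has split into two sub-spines (tab-separated).
rows=$(printf '%s\t%s\n' a1 b1 a2 b2 a3 b3)

# Interleaved ordering (the idea behind -S): row by row, left to right.
interleaved=$(printf '%s\n' "$rows" | tr '\t' '\n')

# Sequential ordering (the idea behind -p): primary column first, then secondary.
sequential=$(printf '%s\n' "$rows" | cut -f1; printf '%s\n' "$rows" | cut -f2)

# Sequential ordering with a tag X marking each break in continuity (assumed placement).
tagged=$(printf '%s\n' "$rows" | cut -f1; printf 'X\n'; printf '%s\n' "$rows" | cut -f2; printf 'X\n')

printf '%s\n' "$interleaved" "--" "$sequential" "--" "$tagged"
```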
 
 
  
  
 
 
 
 
 serialize -mp input | rid -i 
    
 
 
 
 
 
 
 serialize -mptX input | rid -i 
    
 
 
 
 
 
 
  
 Chord-note serialization
 
 Using the -c option will also serialize chord-notes (multiple stops):
 
 
 serialize -c input > output 
    
 
 
  
 
 If an exclusive interpretation data type uses a different separator than space for separating multiple elements, the -s option can be used to specify the different separator character:
 
 
  
 serialize -c -s ':' input > output 
    
 
 
 
 
 When the chord-note serialization option is used without also
 specifying the sub-spine option, there will be cases where the output
 from serialize is not a valid Humdrum file.  Invalid Humdrum
 file format data will be output if chords are present in any subspine:
 
  
 serialize -c input > output 
    
 
 
 
 If you use the -c option with the -S or -p
 options, the output will always have a valid structure.  Also, if there
 are no chords in sections of a spine which contain sub-spines,
 the -c option on its own will generate valid Humdrum file syntax in
 the output.
 
  
  
  
   Sub-token selection 
 
 Three options (-f, -l and -n) can be used
 to extract particular sub-tokens within the data.  For example, if you
 are interested in note-to-note transitions of the top or bottom note
 in a chord, use the -f option to select the first sub-token
 in the data (typically the bottom note in a chord), or the -l option
 to select the last sub-token in the data (typically the top note in a 
 chord).
 
 
  
 
 
  The -n option can be used to select a sub-token located at a fixed
 position from the start or end of the sub-token list.  Negative indexes
 with -n count from the end of the token, while positive indexes
 count from the start of the token.  The -f option
 is equivalent to "-n 1" and the -l option is equivalent
 to "-n -1".  If the requested sub-token does not exist in the 
 data, a null token will be output for that token.  Here are examples of
 extracting the second sub-token from the front or back of the data tokens:
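
 The indexing rule can be sketched in awk. The helper below, pick_subtoken, is hypothetical and works on isolated space-separated tokens, one per line; it follows the rule stated above (positive counts from the start, negative from the end, a null token "." when the sub-token is missing) but is not the serialize implementation.

```shell
# Hypothetical helper: print the Nth sub-token of each input line.
# Positive N counts from the start, negative N from the end;
# "." (the Humdrum null token) is printed when no such sub-token exists.
pick_subtoken() {
    awk -v n="$1" '{
        i = (n > 0) ? n : NF + n + 1
        if (n != 0 && i >= 1 && i <= NF) print $i; else print "."
    }'
}

tokens=$(printf '%s\n' '4c 4e 4g' '4d' '4e 4g')

second_from_front=$(printf '%s\n' "$tokens" | pick_subtoken 2)
second_from_back=$(printf '%s\n' "$tokens" | pick_subtoken -2)

printf '%s\n' "$second_from_front" "--" "$second_from_back"
```

 With these toy tokens, the single-note token yields a null token in both cases because it has no second sub-token.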
 
 
  
  
 
 
 
 
 
EXAMPLES
 
 Below is some Humdrum data which contains a mixture of
  **kern and **dint (diatonic interval) data.
 Suppose that you want to extract a list of the most common three-interval
 sequences from this data.  In order to do this, the -i dint
 option should be used to select only the **dint spines.
 Also, -m should be used to merge the individual serialized spines, 
 and -t X should be used to add an X tag at the end of each
 original spine's data so that invalid context sequences across spine boundaries
 can be filtered out.
 
 
  
     
 
 
 
 Running the following command generates a histogram of three-interval sequences
 sorted by the most common patterns first:
 
  
 serialize -i dint -m -t X | context -n 3 | rid -GLId | grep -v X | sort | uniq -c | sort -nr
  
 
 
 
 
 The most common three-interval sequence is +2 +2 +2 (up three
 seconds) which occurs 5 times in the data.  Here is the meaning of each
 segment of the above command string:
 
  
 
 serialize -i dint -m -t X
      Extract the **dint spines into a single merged column, placing an
      "X" token after the end of each original spine's data.
 | context -n 3
      Use the Humdrum Toolkit command context to join overlapped
      triplets of data into a single token.
 | rid -GLId
      Remove Humdrum-file structure, leaving only non-null Humdrum data.
 | grep -v X
      Remove any line which contains the character X.  This is the
      tag string which marks the end of a spine.  Any sequence generated
      by context which has this tag crosses the boundary between
      the end of one spine and the start of another spine, so that
      sequence should be removed.
 | sort
      Sort input lines into alphabetical order.
 | uniq -c
      Remove duplicate adjacent lines, counting how many adjacent lines
      were identical in the input (-c).
 | sort -nr
      Sort lines of input into reverse (-r) numerical (-n) order.
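
 The counting tail of the pipeline can be tried on its own. The sketch below feeds a few invented interval triplets through sort, uniq -c and sort -nr; the histogram helper name and the data are assumptions for illustration, and the final awk step only normalizes the leading spaces that uniq -c prints.

```shell
# Hypothetical helper: count identical lines, most frequent first.
histogram() {
    sort | uniq -c | sort -nr
}

# Toy three-interval sequences, one per line.
counts=$(printf '%s\n' '+2 +2 +2' '-2 -2 +2' '+2 +2 +2' \
                       '+2 -3 +2' '+2 +2 +2' '-2 -2 +2' |
         histogram | awk '{ $1 = $1; print }')

printf '%s\n' "$counts"
```

 The first sort is needed because uniq only collapses lines that are adjacent.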
  
  
 
  Counting harmonic intervals 
 
 Serialize can be used to process hint output so that 
 individual intervals can be counted to generate a histogram of intervals.
 In the following J.S. Bach chorale, the most common simultaneous note attack interval is a minor third, occurring 27 times (excluding repeats).
 
 
  
 humcat h://371chorales/chor021.krn | hint -a | serialize -c | \
      egrep -v "^=|^-" | rid -GLId | sort | uniq -c | sort -nr > output
 
 
 
 
 
 
 humcat ...
      Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
 | hint -a
      Identify harmonic intervals in the music, examining the relations between
      all simultaneous note permutations.  Only simultaneous note attacks are
      compared.
 | serialize -c
      Split sub-tokens in hint output into separate tokens.
 | egrep -v "^=|^-"
      Remove barlines and single-note intervals.
 | rid -GLId
      Remove Humdrum data structure, leaving only non-null data.
 | sort
      Sort data lines alphabetically.
 | uniq -c
      Remove duplicate adjacent lines, counting the number of duplicates.
 | sort -nr
      Sort lines numerically in reverse order.
  
  
 
 
 The following command sequence can be used to create a histogram of
 harmonic intervals weighted by the duration of the interval in the score.
 In the previous example, only simultaneous note attacks are compared
 (including ending notes of ties), but the following process will not have
 this limitation.  The following example uses ditto to
 fill in null tokens with the pitch that they represent.  Timebase
 is used to make each line of data have the same duration (sixteenth notes in
 this case).  Therefore, the output histogram lists the duration
 of each interval in the music in units of sixteenth notes.
 
 
  
  
 humcat h://371chorales/chor021.krn | timebase -t 16 | ditto | hint -a | \
      serialize -c | ridx -GLIMd | sort | uniq -c | sort -nr > output
 
 
 
 
  In this case the most common harmonic interval is an octave, which
 accounts for 124 sixteenth notes (31 quarter notes) of duration in
 the score.  Minor thirds are only slightly less common in terms of
 duration at 122 sixteenth notes (30.5 quarter notes).
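
 Since each counted line represents one sixteenth note, converting a count to quarter notes is a division by four. A minimal sketch of that arithmetic, using a hypothetical to_quarters helper:

```shell
# Hypothetical helper: convert a count of sixteenth notes to quarter notes.
to_quarters() {
    awk -v n="$1" 'BEGIN { print n / 4 }'
}

octave_quarters=$(to_quarters 124)
minor_third_quarters=$(to_quarters 122)

printf '%s\n' "$octave_quarters" "$minor_third_quarters"
```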
 
 
   Here is a description of each command in the command sequence for creating
 the above data:
 
  
 
 humcat ...
      Download the data for J.S. Bach's Chorale 21 (Breitkopf & Härtel numbering).
 | timebase -t 16
      Make each data line equivalent to a sixteenth note.
 | ditto
      Replace null-tokens with the data token which they represent.
 | hint -a
      Identify harmonic intervals in the music, examining the relations between
      all simultaneous note permutations.  Only simultaneous attacks are
      compared.
 | serialize -c
      Split sub-tokens in hint output into separate tokens.
 | ridx -GLIMd
      Remove Humdrum data structure and barlines, leaving only non-null data.
 | sort
      Sort data lines alphabetically.
 | uniq -c
      Remove duplicate adjacent lines, counting the number of duplicates.
 | sort -nr
      Sort lines numerically in reverse order.
  
  
 
 
 More example usages of the serialize program are available on the serialize examples page
  
SEE ALSO
BUGS
 
 Be careful when using -c without the -S or -p option, since
 invalid Humdrum file syntax may be output from the program.
 
 
DOWNLOAD
The compiled serialize program can
be downloaded for the following platforms:
 -  Linux (i386 processors, dynamically linked) compiled on 16 Apr 2013.
 -  Windows compiled on 29 Jun 2012.
 -  Mac OS X/i386 compiled on 13 Nov 2013.
  
 The source code for the program was last modified on 1 Apr 2013. The full source code is available from the source-code download page.
  
 
 