Humdrum Extras

theloc manpage


COMMAND

    theloc -- Identify note locations within Humdrum files from themax search results.

SYNOPSIS

    [cat input |] theloc [options] [input] [> output]

OPTIONS

-a Display the absolute quarter-note position of the match location in the original score (number of quarter notes from beginning of file to match location starting position).
-B Suppress metric position information for start/end note of match.
-c Display the column in the file where the start/end match note was found, where each column is a spine token separated from its neighbors by a tab character (counting from 1).
-D Remove directory name from input index filename.
-G Grace notes are to be ignored in input note numbers. Use this option if -G was used in tindex when creating the search index.
-l Display the line in the file on which the match was found.
-m Mark matches in the input file, starting at the first note of the match and continuing the marks to the last note of the match (if given).
-M Don't echo the measure number of the measure where the match was found.
-N Don't echo the original note number used to locate notes in the original score.
-p string List of directory paths to search for Humdrum source files.
-P Display percent location of the match in the total duration of the score.
-q Display the quarter-note duration from start of bar for the match location.
-r Display metric location as a rational number rather than a floating-point number.
--all Display all location descriptors (equivalent to -claq).
--file string Use the given filename instead of extracting the filename from the themax output.
--fixedmark # Mark matches in the input file, starting at the first note of the match, and marking a total of # notes.
--matchlist List matches in a global comment at the top of Humdrum files produced by the -m option.
--mchar string Set the marking character used with the -m option.
--tie Display match markers on medial and ending tied notes as well as the start of a tied group of notes.

DESCRIPTION

    The theloc command takes search-result output from themax when using the --loc or --locstart option, and converts the matched note numbers into locations within the original Humdrum file which was indexed with tindex. The identified location can be reported as a line and column number in the file, the measure and beat number within the measure, or the duration in quarter notes from the start of the file to the start of the matched notes' location.

    As an example, the pitch sequence "F♯, B♭" occurs once in the song springsong.krn. Using the three programs tindex, themax and theloc together, the location of that sequence in the file can be identified.

    First, a search index entry for the song has to be created with the tindex command:

    tindex springsong.krn > springsong.thema

    The first component of a thema index line is typically the filename which was indexed (although this can be changed with options given to tindex). Following the filename is a colon, followed by an arbitrary label (such as the instrument name when searching orchestral scores). In this case the label is an empty string. Next is another colon, followed by the number of the spine whose data follows on the same line (with a period (.) and a sub-spine layer number appended if the sub-spine is not the first one on the line). In this case the --mono option to tindex can also be used, since this is a monophonic file (there are no sub-spines, and the first column in the data is the **kern spine). In the --mono case, only the filename is encoded, without the extra fields or colons.
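    As an illustration of the tag layout just described, the following Python sketch splits a tag such as "sonata01-1.krn::1" (the form quoted in the Beethoven example below) into filename, label, spine and layer. It is not code from tindex or theloc; it assumes the tag has already been separated from the feature data that follows it and that filenames contain no colons. The "violin" label in the second call is hypothetical.

```python
from typing import NamedTuple, Optional


class SpineTag(NamedTuple):
    filename: str
    label: str
    spine: Optional[int]  # None for --mono entries, which carry only a filename
    layer: int            # sub-spine layer; 1 when no ".layer" suffix is present


def parse_tag(tag: str) -> SpineTag:
    """Split a thema index tag of the form filename:label:spine[.layer]."""
    parts = tag.split(":")
    if len(parts) == 1:
        # --mono entry: only the filename is encoded
        return SpineTag(tag, "", None, 1)
    if len(parts) != 3:
        raise ValueError(f"unexpected tag format: {tag!r}")
    filename, label, spine_field = parts
    spine_text, _, layer_text = spine_field.partition(".")
    layer = int(layer_text) if layer_text else 1
    return SpineTag(filename, label, int(spine_text), layer)


print(parse_tag("sonata01-1.krn::1"))
print(parse_tag("score.krn:violin:2.2"))  # hypothetical label and sub-spine
```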

    The theloc program requires the first field in the thema index data to be the filename, and it also requires the spine/sub-spine information (if the file is not monophonic). It uses these two pieces of information plus a note number within a spine to locate the note's position in the score. The theloc command also requires access to the originally indexed file. If the filename includes a pathname, theloc will look for the file in that directory (or in the current working directory if the -D option is given). A list of directories in which to search for the file can be provided to theloc with the -p option.

    After creating a thema index with tindex, the themax command can be used to search for the pitch sequence within the index. Adding the --loc (or long-form --location) option will print the note number(s) within the indexed pitch sequence where the match occurs. In the following example command pipeline, the output of tindex is piped directly into the themax command. When a large number of files need to be searched, save the output from tindex to a file which can be read later into the themax command. But in this case, the amount of data which tindex needs to process is small, so the indexing process is very fast, and intermediate storage of the index data is not necessary.

    tindex input | themax -p "f# b-" --loc

    The above output from themax means that the 28th note in springsong.krn is at the start of the matched sequence, and the 29th note in the file is the last note which is part of the match. In this case there are two pitches in the search query, so the match results are expected to have a length of two notes, but when using wildcards in the search query, variable-length matches may occur. If there were more matches of the query sequence within the song, other note locations would have been listed on the line, each match separated from others by a space character.
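    The match list described above has a simple shape: space-separated entries, each either "start-end" or a single starting note. A minimal Python sketch for reading it, assuming the note-number field has already been separated from the filename tag that precedes it on the line; the extra matches "40-41 57" in the second call are invented for illustration.

```python
def parse_matches(field):
    """Turn a themax match list such as "28-29" into (start, end) note pairs."""
    matches = []
    for entry in field.split():  # matches are separated by spaces
        start, dash, end = entry.partition("-")
        first = int(start)
        # A start-only entry (no dash) is treated as ending where it begins.
        last = int(end) if dash else first
        matches.append((first, last))
    return matches


print(parse_matches("28-29"))
print(parse_matches("28-29 40-41 57"))  # hypothetical extra matches
```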

    The tindex command does not encode the position of each note within the original file (although a reasonable reconstruction could be made from the rhythmic features). Therefore, the themax program only knows the position of a match within the search-index feature sequences, not the position of the note within the score. To locate the match within the original file, you could count notes in the file up to the match (the 28th note in this case), but that would be tedious. The theloc program automates this identification of note locations within a score. Here is the default output from theloc after receiving the above data from themax:

    tindex input | themax -p "f# b-" --loc | theloc

    After the original note numbers 28-29 which are output from themax, the theloc program adds two pieces of information: "=13" which means measure 13, and "B2.5" which means beat 2.5 within measure 13, which is the second eighth-note of the second beat in the measure. Likewise, note 29 occurs in measure 13 on beat 3 of the measure.
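    The note counting that theloc automates can be sketched in Python as follows. This is a simplified model, not theloc's code: it assumes a spine with no splits or merges (so the column equals the spine number), counts every non-null, non-rest data token as one note (chords count once, tied notes are not merged), and treats tokens containing "q" or "Q" as grace notes, which can be skipped to mimic the -G option. The short song is made-up data.

```python
def locate_note(lines, spine, note_number, skip_grace=False):
    """Return the 1-based (line, column) of the note_number-th note in a spine."""
    count = 0
    for line_number, line in enumerate(lines, start=1):
        # Skip blank lines, comments (!), interpretations (*) and barlines (=).
        if not line or line[0] in "!*=":
            continue
        token = line.split("\t")[spine - 1]
        if token == "." or "r" in token:  # null token or rest
            continue
        if skip_grace and ("q" in token or "Q" in token):
            continue
        count += 1
        if count == note_number:
            return line_number, spine
    raise ValueError(f"spine {spine} has fewer than {note_number} notes")


song = ["**kern", "*M4/4", "=1", "4c", "8qd", "4r", "4e", "=2", "2f", "*-"]
print(locate_note(song, 1, 3))                   # grace note counted
print(locate_note(song, 1, 3, skip_grace=True))  # grace note ignored
```

    The two calls land on different lines, which illustrates why the grace-note setting must agree between tindex and theloc.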

    The original note number can be removed from the location information by using the -N option:

    tindex input | themax -p "f# b-" --location | theloc -N

    A second form of output from the theloc program is the original data with marker characters indicating which notes were part of the match results. Matched notes in the original data file can be marked, by default with an '@' character, by using the -m (or long-form --mark) option:

    tindex input | themax -p "f# b-" --loc | theloc -m

    The marked data file can be perused in a text editor by searching for the mark character. The mark character can also be used to display the matched notes in color, for example with hum2abc. Additionally, the myank program is aware of marked notes and can extract any measures containing them. See the section on marking below for more information and options relating to marking the original data.

    Location options

    By default, theloc will output the original note number, then without spaces, the character "=" followed by the measure number and finally "B" followed by the beat number within the measure. To suppress the note number from being echoed in the output of theloc, use the -N option.

    To display the line on which the match occurs, use the -l option, and to display the column (token number on the line), use the -c option. Both the -l and -c options count lines/columns starting from 1. When -l or -c is given, the measure and beat numbers are still listed. To turn off the measure number or beat number, use -M and/or -B respectively.

    There are two additional location descriptions which can be returned by theloc. To view the quarter-note duration between the match location and the last downbeat, use the -q option. This is similar to the beat location, but differs for meters which are not simple meters with a quarter-note beat. Both -q and the beat number start at a value of 1 on the downbeat (start of the measure). Also, use the -a option to display the absolute beat location in terms of quarter notes since the start of the music. All location descriptions can be displayed in the output by using the --all option (be careful not to type this option with a single dash (-all), since that means to use the -a and -l options).

    tindex input | themax -p "f# b-" --location | theloc --all
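    To see how the -q value and the beat number can diverge, here is a small Python sketch using exact fractions. It assumes that the beat unit is the time-signature denominator (a quarter note in 4/4, an eighth note in 6/8, a half note in 2/2); how theloc actually defines the beat in compound meters is not stated in this manpage, so treat that choice as an assumption.

```python
from fractions import Fraction


def beat_and_q(offset, beat_unit):
    """Return (beat, q) for a note `offset` quarter notes after the downbeat.

    beat_unit is the assumed beat length in quarter notes. Both results
    count from 1 on the downbeat, as described for -q and the beat number.
    """
    beat = offset / beat_unit + 1
    q = offset + 1
    return beat, q


offset = Fraction(3, 2)  # one and a half quarter notes after the downbeat
for meter, unit in [("4/4", Fraction(1)), ("6/8", Fraction(1, 2)), ("2/2", Fraction(2))]:
    beat, q = beat_and_q(offset, unit)
    print(meter, "beat", beat, "q", q)
```

    In 4/4 both values are 5/2 (the "B2.5" of the earlier example), while under the 6/8 assumption the beat becomes 4 and q stays at 5/2.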

    The character codes in the location information:

    Code  Meaning                           Option
    L     Line number                       -l to show
    C     Column number                     -c to show
    A     Absolute beat                     -a to show
    P     Percent                           -P to show
    =     Measure number                    -M to suppress
    B     Beat number                       -B to suppress
    Q     Quarter-note location in measure  -q to show

    Note that for measure numbers to be displayed properly, they have to be explicitly present in the original Humdrum file. If the original file does not have measure numbers, use the barnum command to add them; otherwise, do not display the measure number location (-M) or the beat location (-B).
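    A location string can be taken apart with one regular expression keyed on the code characters in the table above. A Python sketch, assuming that each descriptor is its code character followed immediately by a number (a decimal, or with -r a form like 2+1/2) and that fields are concatenated without separators. The --all sample string is assembled from the values quoted for note 28 in the Starting/Ending Location section and is not verbatim theloc output.

```python
import re

# A code character from the table followed by a number such as 13, 2.5 or 2+1/2.
FIELD = re.compile(r"([LCAP=BQ])(\d+(?:\.\d+)?(?:\+\d+/\d+)?)")


def parse_location(text):
    """Map each code character in a location string to its value (as text)."""
    return dict(FIELD.findall(text))


print(parse_location("28=13B2.5"))          # default output: note, measure, beat
print(parse_location("28L61C1A39=13B2.5"))  # assumed --all layout
```

    The leading note number is ignored because it has no code character in front of it.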

    Starting/Ending Location options

    When the themax program is given the --location2 option, both the start and ending note for the match will be given (--location only gives the starting note of the match). The starting and ending notes are separated by a dash, with no spaces. When the theloc program processes the data, it automatically detects whether an ending location is given, and provides the same kind of information for the ending note as it does for the starting note (see the above section). Compare the following command pipeline to the similar one from the last section:

    tindex input | themax -p "f# b-" --location2 | theloc --all

    In this case, the match starts on note 28 of the file and ends on note 29 (the number after the dash character). Note 28 occurs on line 61, column 1, absolute beat 39, measure 13, beat 2.5 (2.5th quarter note of the measure). Note 29 occurs on line 62, column 1, absolute beat 39.5, measure 13, beat 3 (3rd quarter note into the measure).

    The themax program can search by multiple features at the same time. In order for a match to be detected, all features must match starting at the same note. However, the ending note is defined by the longest match on any particular sub-feature which was searched for. For example, here is a search for the note C followed by a rising perfect fourth. The search for the pitch "C" is one note long, while the search for the perfect fourth involves two notes (the C and the F which follows it).

    tindex input | themax -p "c" -I "+P4" --location2 | theloc --all
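    The rule for the ending note can be written down directly. In this Python sketch (an illustration of the rule stated above, not themax code), each sub-feature contributes the number of notes its own match spans, and n intervals span n + 1 notes; the starting note 28 is hypothetical.

```python
def notes_spanned_by_intervals(interval_count):
    """n melodic intervals connect n + 1 notes."""
    return interval_count + 1


def match_end(start, notes_per_feature):
    """Last note of a multi-feature match beginning on note `start`.

    The longest sub-feature match determines where the whole match ends.
    """
    return start + max(notes_per_feature) - 1


pitch_notes = 1                                  # the query "c" is one note
interval_notes = notes_spanned_by_intervals(1)   # "+P4" is one interval
print(match_end(28, [pitch_notes, interval_notes]))
```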

    Search path

    The theloc program requires the original file which was indexed in order to identify the locations of matches within the file. The location of the file can be stored within the index entry, or a list of paths to search can be given with the -p option. In the string following -p, each directory path is followed by a colon (:), with no spaces included.

    To prevent theloc from searching for files in the directory attached to the filename in the index data, use the -D option.
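    How the search path might be applied is sketched below in Python. The lookup order (the directory stored in the index, or the current directory when -D is given, followed by each -p directory in turn) is an assumption based on the description above, and find_source is a hypothetical helper rather than part of theloc. The directories in the example call are made up.

```python
import os


def find_source(filename, search_path="", strip_dir=False):
    """Return the first existing candidate path for an indexed file, or None.

    strip_dir mimics -D; search_path mimics the colon-separated -p string.
    """
    base = os.path.basename(filename)
    # With -D, ignore the stored directory and look in the current directory.
    candidates = [base if strip_dir else filename]
    for directory in search_path.split(":"):
        if directory:  # tolerate a trailing colon
            candidates.append(os.path.join(directory, base))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_source("scores/springsong.krn", search_path="/data/kern:/tmp/humdrum:"))
```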

    Grace notes

    If the search index created with tindex did not include grace notes in the extracted features (by using the -G option), then you must also use the -G option with theloc in order for the correct locations of notes to be identified. If you use the tindex -Q option and you use the themax -Q option, then the -G option to theloc will not be required since the embedded "#NOGRACE" message will be passed to the theloc command via the input data.

    Marking search location within original data

    Matched notes in the original data file can be marked with an '@' character by using the --mark option:

    tindex input | themax -p "f# b-" --loc | theloc --mark 2
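    As a rough model of what marking does to the data, the following Python sketch appends a mark to every note token of one spine between a start line and an end line (for example, the lines reported by -l), then adds the explanatory !!!RDF record described in the Examples section. It is a simplification, not theloc's algorithm: it assumes no spine splits, skips rests and null tokens, and does not treat chords or ties specially; mark_notes and the sample data are hypothetical.

```python
def mark_notes(lines, spine, first_line, last_line, mark="@"):
    """Append `mark` to the notes of `spine` on lines first_line..last_line (1-based)."""
    marked = []
    for line_number, line in enumerate(lines, start=1):
        in_range = first_line <= line_number <= last_line
        if in_range and line and line[0] not in "!*=":
            tokens = line.split("\t")
            token = tokens[spine - 1]
            if token != "." and "r" not in token:  # leave nulls and rests alone
                tokens[spine - 1] = token + mark
            line = "\t".join(tokens)
        marked.append(line)
    # Reference record declaring the meaning of the mark.
    marked.append(f"!!!RDF**kern: {mark}= matched note")
    return marked


data = ["**kern", "=13", "8a", "8f#", "4b-", "*-"]
print("\n".join(mark_notes(data, 1, 4, 5)))
```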

    Changing mark character

    The default character for marking matches is "@". By using the --mchar option, a different character can be used to mark the matched notes. Allowable marks include: i, l, N, U, V, Z, +, |, <, and >.

    tindex input | themax -p "f# b-" --loc | theloc -m --mchar i | myank

    An optional color parameter may follow the marking character in the --mchar option. This color parameter can be used to color the marked notes in graphical music display programs (such as abcm2ps via hum2abc).

    tindex input | themax -p "f# b-" --loc \
    | theloc -m --mchar "i color=008100" | myank

    Beat position as a rational number

    To express the beat number as a rational fraction (for avoiding round-off errors in tuplet rhythms), use the -r option. This will convert 2.5 into 2+1/2 in the example case:

    tindex input | themax -p "f# b-" --location | theloc -r
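    The rational form can be produced from an exact fraction by splitting off the whole-number part. A minimal Python sketch, assuming the "whole+numerator/denominator" layout shown above, and guessing that a beat with no remainder prints as a bare integer:

```python
from fractions import Fraction


def rational_beat(beat):
    """Format a beat as a whole part plus remainder, e.g. 5/2 -> "2+1/2"."""
    whole = beat.numerator // beat.denominator
    remainder = beat - whole
    return str(whole) if remainder == 0 else f"{whole}+{remainder}"


print(rational_beat(Fraction(5, 2)))   # the beat 2.5 from the example
print(rational_beat(Fraction(7, 3)))   # a triplet position with no exact decimal
print(float(Fraction(7, 3)))           # the floating-point form is rounded
```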

EXAMPLES

    Marking thematic motives in Beethoven piano sonata no.1, mvmt. 1

    As an example usage of the theloc command, search for the opening motives of the first and second themes in the first movement of Beethoven's piano sonata no. 1 in F minor and highlight them in a musical score, using red notes for the theme 1 motive and blue notes for the theme 2 motive. Here is the opening of the first theme and the other locations within the movement where this motive occurs:

    There are two basic feature queries which could be done in this case to find all occurrences of this motive in the music. In terms of pitch, the best feature is refined contour which categorizes melodic intervals as steps, leaps, repetitions, and interval directions. Another feature which would match to all of the theme 1 motives is the durations of the notes. Two complications occur in the motive: (1) sometimes the first note of the motive is dropped, and (2) a grace note occurs within the 5th occurrence of the motive.

    The pitch refined contour query to use in themax is: -c "U?UUUUdddu". The question mark is a wildcard character which indicates that the first interval of an upward leap is optional in the search. Note that absolute pitch names cannot be used in this case since the motive starts on different pitches: C4, (D4), G2, E♭4, (F4), (C4), (D4) and C2. Also, 12-tone intervals cannot be used, since the motive is played in a minor key in the exposition and recapitulation but in a major key in the development section, which swaps major and minor thirds. The motive is also played on the major dominant chord as part of the first theme. Diatonic intervals work to a certain extent, but the sequence of thirds and fourths differs between versions of the motive. Thus the pitch refined contour is the most effective feature in this case.
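    To make the refined-contour idea concrete, here is a Python sketch that classifies intervals between MIDI note numbers. The letter scheme (U/D for leaps up/down, u/d for steps, s for repetitions) and the rule that a step is one or two semitones are assumptions chosen to agree with the query above, not themax's documented definitions; the pitch list is one reading of the motive starting on C4 and may differ in detail from the score. Conveniently, "?" also means "optional" in Python regular expressions, so the query string can be tested directly against the contour.

```python
import re


def refined_contour(pitches):
    """Classify each melodic interval between consecutive MIDI note numbers.

    Assumed scheme: U/D = leap up/down (3 or more semitones),
    u/d = step up/down (1 or 2 semitones), s = repeated pitch.
    """
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        diff = current - previous
        if diff == 0:
            symbols.append("s")
        elif abs(diff) <= 2:
            symbols.append("u" if diff > 0 else "d")
        else:
            symbols.append("U" if diff > 0 else "D")
    return "".join(symbols)


# Assumed reading of the motive: C4 F4 A-flat4 C5 F5 A-flat5 G5 F5 E5 F5
theme1 = [60, 65, 68, 72, 77, 80, 79, 77, 76, 77]
contour = refined_contour(theme1)
print(contour)
# "?" marks the first leap as optional in both themax and Python regexes.
print(bool(re.search("U?UUUUdddu", contour)))
print(bool(re.search("U?UUUUdddu", refined_contour(theme1[1:]))))  # first note dropped
```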

    The rhythmic duration query to use in themax is: -u "4? 4 4 4 4 4. 24 24 24 4". Again, the question mark is a wildcard character, which here indicates that the first quarter note of the motive is optional in the search.
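    The numbers in the rhythm query are **kern reciprocal durations: 4 is a quarter note, 4. a dotted quarter, and 24 is 1/24 of a whole note, so three of them fill one eighth note. A small Python sketch converting such tokens to quarter-note lengths follows; it handles only plain numbers with augmentation dots (and strips the query's "?"), which is enough for this query but not for every **kern rhythm.

```python
from fractions import Fraction


def recip_to_quarters(token):
    """Quarter-note length of a simple **kern rhythm such as "4", "4." or "24"."""
    token = token.rstrip("?")                # drop the query's "optional" marker
    digits = token.rstrip(".")
    dots = len(token) - len(digits)
    base = Fraction(4, int(digits))          # a whole note lasts 4 quarter notes
    # Each augmentation dot adds half of the previous value: 1 + 1/2 + 1/4 + ...
    return base * (2 - Fraction(1, 2 ** dots))


query = "4? 4 4 4 4 4. 24 24 24 4".split()
durations = [recip_to_quarters(t) for t in query]
print(durations)
print("with optional first note:", sum(durations))
print("without it:", sum(durations[1:]))
```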

    Either the pitch or the rhythm query on its own will find all occurrences of the full motive in this particular example. In certain cases, you might need to search for pitch and rhythm at the same time. This can be done in themax by using both the -c and -u queries together, or by using the -q option to interleave these two features into a single query string:

         -q "u:c 4?:U? 4:U 4:U 4:U 4:U 4.:d 24:d 24:d 24:u 4"
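    The pairing in the -q string above follows a simple pattern: each duration is joined to the contour symbol for the interval that follows it, and the final duration stands alone. The Python sketch below reproduces that string from the separate -u and -c queries; the rule is inferred from this one example, so treat it as an assumption rather than a specification of the -q syntax.

```python
import re


def interleave(durations, contour):
    """Combine a -u rhythm query and a -c contour query into one -q string."""
    dur_tokens = durations.split()
    # One contour symbol per interval; a trailing "?" stays with its symbol.
    con_tokens = re.findall(r".\??", contour)
    pairs = [f"{d}:{c}" for d, c in zip(dur_tokens, con_tokens)]
    # Durations left over after the last interval stand alone.
    pairs.extend(dur_tokens[len(con_tokens):])
    return "u:c " + " ".join(pairs)


print(interleave("4? 4 4 4 4 4. 24 24 24 4", "U?UUUUdddu"))
```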

    The grace note in the development motive is best dealt with by removing grace notes from the search index. To do this, use the -G option when tindex creates the thema index. When using theloc, the -G option must also be given if it was used with tindex. However, by default the tindex program will transmit the control message "#NOGRACE" in the thema index, which themax echoes and passes on to theloc along with the search results. When theloc sees this message in its input data, it automatically sets the -G option.

    Listed below are the starting measure numbers for occurrences of the motive from the first theme. The initial tag "sonata01-1.krn::1" means that the measure numbers listed on the line are for matches occurring in the first spine of the file sonata01-1.krn (the bottom staff of the piano part). Likewise, the second line contains a list of measure numbers where matches occur in the second spine of the data file (the top staff). The --locstart option for themax outputs the starting note numbers where matches are found in the index data. The -N option of theloc suppresses echoing of these note numbers, and the -B option suppresses printing of the beat location in the measure where the match starts.

        tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --locstart | theloc -NB 

    The following command, using rhythms as the search query, will identify the same areas, with a slight problem in identifying the starting note of one of the motives (measure 100/101): the preceding note, which is not actually part of the motive, happens to be a quarter note, so the ? wildcard absorbs it into the match.

     tindex --rest -G sonata01-1.krn \
        | themax -u "4? 4 4 4 4 4. 24 24 24 4" --locstart | theloc -NB 

    These matches can be marked in the original Humdrum **kern score by using the -m option with theloc. By default, the at character (@) will be added to each note which was part of a match. In order for theloc to mark the entire matched sequence, the --loc option must be given to themax instead of --locstart.

     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m  > theme1.krn 

    At the end of the data, the following line is added:

    !!!RDF**kern: @= matched note

    This reference record is understood by the hum2abc command, which converts Humdrum data into ABC Plus data for printing with abcm2ps. The following command pipeline creates a PDF file of the score with the theme 1 motive highlighted:
     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m  | hum2abc | abcm2ps - -O - | ps2pdf - - > theme1.pdf
    Searching for the theme 2 motive is done in a similar manner using diatonic intervals in this case (so as to catch any possible modal variations in the theme):
     tindex -G sonata01-1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
        | theloc -m  | hum2abc | abcm2ps - -O - | ps2pdf - - > theme2.pdf
    The first and second themes can be highlighted in different colors by using a different marker for each theme and specifying a different color for each marker. In this case the default marker "@" is used for the first theme motive, and "+" is used for the second theme motive.
     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m  > theme1.krn
     
     tindex -G theme1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
        | theloc -m  --mchar "+ color=#0000ff" > theme1and2.krn
     
     hum2abc theme1and2.krn | abcm2ps - -O - | ps2pdf - - > theme1and2.pdf
     

    Multiply-marked notes

    If you need to see clearly where matches start, add the --double option to theloc. This will cause a double mark to be printed at the start of the match. One use for the double mark is to make it easier to search for the start of matches in the original Humdrum data. The hum2abc program will translate the second mark character in a note (sub)token into a circle around the notehead.

     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m --double > double1.krn
     
     tindex -G double1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
        | theloc -m  --mchar "+ color=#0000ff" > double12.krn
     
     hum2abc double12.krn | abcm2ps - -O - | ps2pdf - - > double12.pdf 
    With the doubly-marked match starts, the individual instances of the second theme motive become clearer in the notated music.

    Multiple searches can overlap. The first mark on a note (leftmost in the token) will color the notehead when using hum2abc to display the notated music. Subsequent marks on the same note will be rendered as colored circles around the note, each additional mark adding another circle with a larger diameter than the previous one. Here is an example which colors the first theme motive red, the second theme motive blue, and the last half of the first theme motive green. This example also adds a comment in the !!!RDF reference record which explains the function of the mark.

     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m --double \
        --mchar "@ color=#ff0000 Theme 1 motive" > double1.krn
     
     tindex -G double1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
        | theloc -m --double \
        --mchar "+ color=#0000ff Theme 2 motive" > double12.krn
     
     tindex --rest -G double12.krn | themax -u "4.? 24 24 24 4 4?" --loc \
        | theloc -m  --double \
        --mchar "Z color=#33aa33 Theme 1 rhythmic sub-motive" > double12b.krn
     
     hum2abc double12b.krn | abcm2ps - -O - | ps2pdf - - > double12b.pdf 

    The --rest option to tindex is used in the last search to prevent matching across rests.

    Yanking measures containing marked notes

    The myank program is aware of the !!!RDF marking convention, and it can be used to extract all measures in a file which contain notes marked as search results.

     tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
        | theloc -m | myank --double > theme1extract.krn 
    The --double option used with myank places a double barline between non-consecutive segments of the score. Printing the resulting extracted measures:
        hum2abc theme1extract.krn -n 1 | abcm2ps - -O - | ps2pdf - - > t1ex.pdf
    Doing the same for the second theme:
     tindex -G sonata01-1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
        | theloc -m --double --mchar "+ color=#0000ff" \
        | myank --double > theme2extract.krn 
     hum2abc theme2extract.krn -n 1 | abcm2ps - -O - | ps2pdf - - > t2ex.pdf

    The file theme2extract.krn needed to be edited to remove a dangling tie caused by a break in the music, since abcm2ps complained about that tie.
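    The selection step that myank performs can be approximated by scanning for barlines and marks. This Python sketch is not myank's implementation: it assumes numbered barlines such as "=13" at the start of a line, reports marked notes before the first numbered barline as measure 0, skips comment lines (so the !!!RDF record itself is not counted), and ignores the --double barline insertion; marked_measures and the sample score are hypothetical.

```python
import re


def marked_measures(lines, mark="@"):
    """Return, in order, the measure numbers whose data lines contain `mark`."""
    measures = []
    current = 0  # notes before the first numbered barline count as measure 0
    for line in lines:
        barline = re.match(r"=(\d+)", line)
        if barline:
            current = int(barline.group(1))
        elif line and line[0] not in "!*=" and mark in line:
            if current not in measures:
                measures.append(current)
    return measures


score = ["**kern", "=1", "4c", "=2", "4d@", "=3", "4e", "=4", "4f@", "4g@", "==", "*-"]
print(marked_measures(score))
```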

SEE ALSO

    • tindex -- creates search index entries from Humdrum files containing **kern data.
    • themax -- searches index for matches. If the --location option is used, the data can be processed by the theloc program to identify the location of the match in the original Humdrum file.
    • hum2abc -- understands marks which are output from theloc when using the -m option.
    • myank -- understands marks which are output from theloc when using the -m option.

DOWNLOAD

    The compiled theloc program can be downloaded for the following platforms:
    • Linux (i386 processors) (dynamically linked) compiled on 7 Apr 2013.
    • Windows compiled on 29 Jun 2012.
    • Mac OS X/i386 compiled on 13 Nov 2013.

    The source code for the program was last modified on 24 Feb 2011.