As an example, the pitch sequence "F♯, B♭" occurs once in the following song. Using the set of three programs tindex, themax and theloc, the location of that sequence can be identified in the following file:
First, a search index entry for the song has to be created with the tindex command:
The first component of a thema index line is typically the name of the file which was indexed (although this can be changed with options given to tindex). Following the filename is a colon, followed by an arbitrary label (such as for storing the instrument name when searching orchestral scores). In this case the label is an empty string. Next is another colon, followed by the number of the spine which was indexed in the data that follows on the same line (the spine number is followed by a period (.) and a subspine layer number if the subspine is not the first one on a line). In this case the --mono option to tindex can also be used to index a monophonic file (where there are no sub-spines, and the first column in the data is the **kern spine). In the --mono case, only the filename will be encoded, without the extra field information or colons.
The theloc program requires the first field in the thema index data to be the filename, and it also requires the spine/sub-spine information (if the file is not monophonic). It uses these two pieces of information plus a note number within a spine in the file to locate its position in the score. The theloc command also requires access to the originally indexed file. If the filename includes a pathname, theloc will search for the file in that directory (or in the current working directory if the -D option is given). A list of directories in which to search for the file can be provided to theloc with the -p option.
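The field layout described above can be illustrated with a small shell sketch that splits an index tag into its parts. The function name parse_index_tag and the second sample tag are hypothetical; the sketch assumes the tag is exactly filename:label:spine[.subspine], or a bare filename in the --mono case, and that the filename itself contains no colons.

```shell
# Split a thema index tag of the form filename:label:spine[.subspine].
# parse_index_tag is a hypothetical helper, not part of tindex or theloc.
parse_index_tag() {
    tag=$1
    case $tag in
        *:*:*) ;;                        # at least two colons: full tag
        *)                               # --mono style: filename only
            printf 'file=%s label= spine= subspine=\n' "$tag"
            return 0 ;;
    esac
    spinefield=${tag##*:}                # text after the last colon
    rest=${tag%:*}                       # filename:label
    label=${rest##*:}                    # text after the remaining colon
    file=${rest%:*}                      # everything before that colon
    case $spinefield in
        *.*) spine=${spinefield%%.*}; subspine=${spinefield#*.} ;;
        *)   spine=$spinefield; subspine= ;;
    esac
    printf 'file=%s label=%s spine=%s subspine=%s\n' \
        "$file" "$label" "$spine" "$subspine"
}

parse_index_tag 'sonata01-1.krn::1'      # empty label, spine 1
parse_index_tag 'score.krn:violin:2.1'   # hypothetical labeled tag with a subspine
parse_index_tag 'springsong.krn'         # --mono style tag
```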
After creating a thema index with tindex, the themax command can be used to search for the pitch sequence within the index. Adding the --loc (or long-form --location) option will print the note number(s) within the indexed pitch sequence where the match occurs. In the following example command pipeline, the output of tindex is piped directly into the themax command. When a large number of files need to be searched, save the output from tindex to a file which can be read later into the themax command. But in this case, the amount of data which tindex needs to process is small, so the indexing process is very fast, and intermediate storage of the index data is not necessary.
The above output from themax means that the 28th note in springsong.krn is at the start of the matched sequence, and the 29th note in the file is the last note which is part of the match. In this case there are two pitches in the search query, so the match results are expected to have a length of two notes, but when using wildcards in the search query, variable-length matches may occur. If there were more matches of the query sequence within the song, other note locations would have been listed on the line, each match separated from others by a space character.
The tindex command does not encode the position of each note within the original file (although a somewhat reasonable reconstruction could be made using the rhythmic features). Therefore, the themax program only knows the position of a match with the search-index feature sequences, not the position of the note within the score. To locate the score position of the match within the original file, you can count notes in the file up to the match (the 28th note in this case), but that would be rather tedious. This is where the theloc program should instead be used to automate the identification of specific locations for notes within a score. Here is the default output from theloc after receiving the above data from themax:
After the original note numbers 28-29 which are output from themax, the theloc program adds two pieces of information: "=13" which means measure 13, and "B2.5" which means beat 2.5 within measure 13, which is the second eighth-note of the second beat in the measure. Likewise, note 29 occurs in measure 13 on beat 3 of the measure.
The original note number can be removed from the location information by using the -N option:
A second form of output from the theloc program is the original data with marker characters indicating which notes were part of the match results. By default, matched searches in the original data file can be marked with a '@' character by using the -m (or long-form --mark) option:
The marked data file can be perused in a text editor by searching for the mark character. The mark character can also be used to display the matched notes with color highlighting, such as with hum2abc. Additionally, the myank program is aware of marked notes and can extract any measures containing marked notes. See the mark section for more information and options relating to marking the original data.
Location options
By default, theloc will output the original note number, then without spaces, the character "=" followed by the measure number and finally "B" followed by the beat number within the measure. To suppress the note number from being echoed in the output of theloc, use the -N option.
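A location token in the default form described here (note number, "=" plus measure, "B" plus beat) can be taken apart with plain shell parameter expansion. This is only a sketch: parse_location is a hypothetical helper, and the sample tokens are assembled from the description in this section rather than copied from real theloc output.

```shell
# Split a location token such as 28=13B2.5 into note, measure and beat.
# parse_location is a hypothetical helper; the token layout is assumed
# from the prose description (note number, =measure, Bbeat, no spaces).
parse_location() {
    token=$1
    note=${token%%=*}        # text before the first "=" (empty with -N)
    rest=${token#*=}         # measure and beat, e.g. 13B2.5
    measure=${rest%%B*}      # text before the "B"
    beat=${rest#*B}          # text after the "B"
    printf 'note=%s measure=%s beat=%s\n' "$note" "$measure" "$beat"
}

parse_location '28=13B2.5'
parse_location '=13B3'       # form expected when -N suppresses the note number
```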
To display the line on which the match occurs, use the -l option, and to display the column (token number on the line), use the -c option. Both the -l and -c options start counting lines/columns from 1. When -l or -c is given, the measure and beat number are still listed. To turn off either the measure number or the beat number, use -M and/or -B respectively.
There are two additional location descriptions which can be returned by theloc. To view the quarter note duration between the match location and the last downbeat, use the -q option. This is similar to the beat location, but is different for meters which are not simple meters with a quarter note beat. Both -q and the beat number start at a value of 1 on the downbeat (start of the measure). Also, use the -a option to display the absolute beat location in terms of quarter notes since the start of the music. All location descriptions can be displayed in the output by using the --all option. Be careful not to type this option with a single dash (-all), since that means to use the -a and -l options.
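The single-dash pitfall is the usual bundling of one-letter options. The sketch below uses the shell's own getopts builtin to show how a string such as -all is read one letter at a time; it does not reproduce theloc's actual option parser, and show_opts is a hypothetical helper that only knows the two letters a and l.

```shell
# Show how single-dash bundling splits "-all" into one-letter options.
# This demonstrates the general convention with the shell's getopts;
# it is not theloc's own parser.
show_opts() {
    OPTIND=1                 # restart option parsing for each call
    out=
    while getopts 'al' opt "$@"; do
        out="$out -$opt"
    done
    printf '%s\n' "${out# }" # drop the leading space
}

show_opts -all               # read as -a, then -l, then -l again
show_opts -a -l
```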
The character codes in the location information:
Note that for measure numbers to be displayed properly, they have to be explicitly present in the original Humdrum file. If the original file does not have measure numbers, use the barnum command to add them; otherwise, do not display the measure number location (-M) or the beat location (-B).
Starting/Ending Location options
When the themax program is given the --location2 option, both the starting and ending notes of the match will be given (--location only gives the starting note of the match). The starting and ending notes are separated by a dash, with no spaces. When the theloc program processes the data, it will automatically detect whether an ending location is given, and provide the same kind of information for it as it does for the starting note (see the above section). Compare the following command pipeline to the similar one from the last section:
In this case, the match starts on note 28 of the file and ends on note 29 (the first number after the dash character). Note 28 occurs on line 61, column 1, absolute beat 39, measure 13, beat 2.5 (quarter-note position 2.5 within the measure). Note 29 occurs on line 62, column 1, absolute beat 39.5, measure 13, beat 3 (the 3rd quarter note of the measure).
The themax program can search by multiple features at the same time. In order for a match to be detected, all features must match starting at the same note. However, the ending note is defined by the longest match on any particular sub-feature which was searched for. For example, here is a search for the note C followed by a rising perfect fourth. The search for the pitch "C" is one note long, while the search for the perfect fourth involves two notes (the C and the F which follows it).
Search path
The theloc program requires the original file which was indexed in order to identify the locations of matches within the file. The location of the file can be stored within the index entry, or a list of paths to search can be given with the -p option. The directory paths in the string following -p are delimited by colons (:), with no spaces included.
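A colon-delimited directory string for the -p option can be assembled in the shell rather than typed by hand. The sketch below assumes the list is written like a PATH variable, with a colon between entries and none at the end; join_paths and the directory names are hypothetical.

```shell
# Join directory names with colons, PATH-style, for use after -p.
# join_paths and the directories are hypothetical; the assumption is
# that entries are delimited by single colons with no spaces.
join_paths() {
    old_ifs=$IFS
    IFS=:
    joined="$*"              # "$*" joins arguments with the first IFS char
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

join_paths /data/scores /data/beethoven .
# Possible use: theloc -p "$(join_paths /data/scores /data/beethoven .)"
```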
If you want to prevent theloc searching for files from the pathname attached to the filename in the index data, use the -D option.
Grace notes
If the search index created with tindex did not include grace notes in the extracted features (by using the -G option), then you must also use the -G option with theloc in order for the correct locations of notes to be identified. If you use the tindex -Q option and you use the themax -Q option, then the -G option to theloc will not be required since the embedded "#NOGRACE" message will be passed to the theloc command via the input data.
Marking search location within original data
The matched searches in the original data file can be marked with a '@' character by using the --mark option:
Changing mark character
The default character for marking matches is the "@" character. By using the --mchar option, a different character can be used to mark the matched notes. Allowable marks include: i, l, N, U, V, Z, +, |, <, and >.
An optional color parameter may follow the marking character in the --mchar option. This color parameter can be used to color the marked notes in graphical music display programs (such as abcm2ps via hum2abc).
| theloc -m --mchar "i color=008100" | myank
Beat position as a rational number
To express the beat number as a rational fraction (for avoiding round-off errors in tuplet rhythms), use the -r option. This will convert 2.5 into 2+1/2 in the example case:
Marking thematic motives in Beethoven piano sonata no.1, mvmt. 1
As an example usage of the theloc command, search for the opening motives of the first and second themes in the first movement of Beethoven's piano sonata no. 1 in F minor and highlight them in a musical score, using red notes for the theme 1 motive and blue notes for the theme 2 motive. Here is the opening of the first theme and the other locations within the movement where this motive occurs:
There are two basic feature queries which could be done in this case to find all occurrences of this motive in the music. In terms of pitch, the best feature is refined contour which categorizes melodic intervals as steps, leaps, repetitions, and interval directions. Another feature which would match to all of the theme 1 motives is the durations of the notes. Two complications occur in the motive: (1) sometimes the first note of the motive is dropped, and (2) a grace note occurs within the 5th occurrence of the motive.
The pitch refined contour query to use in themax is: -c "U?UUUUdddu". The question mark is a wildcard character which indicates that the first interval of an upward leap is optional in the search. Note that absolute pitch names cannot be used in this case since the motive starts on different pitches: C4, (D4), G2, E♭4, (F4), (C4), (D4) and C2. Also, 12-tone intervals cannot be used since the motive is played in a minor key in the exposition and recapitulation, but it is played in a major key in the development section which switches major and minor thirds. Also, the motive is played in the major dominant chord as part of the first theme. Diatonic intervals work to a certain extent, but the sequence of thirds and fourths differs between each version of the motive. Thus the pitch refined contour is the most effective in this case.
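To make the contour string concrete, the sketch below derives refined-contour letters from a list of MIDI note numbers and checks the result against the query. Several things here are assumptions rather than statements about tindex or themax: the letter set (U/u for upward leap/step, D/d for downward leap/step, s for a repeated pitch), the step threshold of two semitones, the MIDI numbers (a transcription of the opening motive starting on C4), and the use of a grep regular expression as a stand-in for the themax "?" wildcard.

```shell
# Derive refined-contour letters from successive MIDI note numbers.
# Assumed mapping: |interval| of 1-2 semitones is a step (u/d), larger
# is a leap (U/D), and 0 is a repetition (s). contour is a hypothetical
# helper, not the algorithm used by tindex.
contour() {
    prev=
    out=
    for note in "$@"; do
        if [ -n "$prev" ]; then
            diff=$((note - prev))
            if   [ "$diff" -gt 2 ];  then out="${out}U"
            elif [ "$diff" -gt 0 ];  then out="${out}u"
            elif [ "$diff" -eq 0 ];  then out="${out}s"
            elif [ "$diff" -ge -2 ]; then out="${out}d"
            else                          out="${out}D"
            fi
        fi
        prev=$note
    done
    printf '%s\n' "$out"
}

# Assumed transcription of the motive: C4 F4 Ab4 C5 F5 Ab5 G5 F5 E5 F5
contour 60 65 68 72 77 80 79 77 76 77
# The same motive with its first note dropped
contour 65 68 72 77 80 79 77 76 77

# Both forms fit the query when "?" is read as "optional", here
# approximated with an extended regular expression.
contour 60 65 68 72 77 80 79 77 76 77 | grep -Eq '^U?UUUUdddu$' && echo "full motive matches"
contour 65 68 72 77 80 79 77 76 77 | grep -Eq '^U?UUUUdddu$' && echo "shortened motive matches"
```

Because only the direction and rough size of each interval are kept, the same string results whatever pitch the motive starts on, which is why this feature copes with the transposed and major-mode statements.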
The rhythmic duration query to use in themax is: -u "4? 4 4 4 4 4. 24 24 24 4". Again, the question mark is a wildcard character, which here indicates that the first quarter-note duration is optional in the search.
Either the pitch or the rhythm query on its own will find all occurrences of the full motive in this particular example. In certain cases, you might need to search by pitch and rhythm at the same time. This can be done in themax by using both the -c and -u queries at the same time, or by using the -q option to interleave these two features into a single query string.
-q "u:c 4?:U? 4:U 4:U 4:U 4:U 4.:d 24:d 24:d 24:u 4"
The grace note in the development motive is best dealt with by removing grace notes from the search index. To do this, use the -G option when tindex creates the thema index. When using theloc, the -G option must also be given if it was used in tindex. However by default, the tindex program will transmit the control message "#NOGRACE" in the thema index which will be echoed by themax and passed on to the theloc input along with the search results. When theloc sees this message in its input data, it will automatically set the -G option.
Listed below are the starting measure numbers for occurrences of the motive from the first theme. The initial tag "sonata01-1.krn::1" means that the measure numbers listed on the line are for the matches occurring in the first spine found in the file sonata01-1.krn (the bottom staff of the piano part). Likewise, the second line contains a list of measure numbers where the match occurs in the second spine of the datafile (the top staff). The --locstart option for themax outputs the starting note numbers where matches are found in the index data. The -N option of theloc is used to suppress echoing of these note numbers, and the -B option suppresses printing of the beat location in the measure where the match starts.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --locstart | theloc -NB
The following command, which uses rhythms as the search query, will identify the same areas, with a slight problem in identifying the starting note of one of the motives (measure 100/101): the preceding note, which is not actually part of the motive, happens to be a quarter note, so the ? wildcard absorbs it into the match.
tindex --rest -G sonata01-1.krn \
   | themax -u "4? 4 4 4 4 4. 24 24 24 4" --locstart | theloc -NB
These matches can be marked in the original Humdrum **kern score by using the -m option with theloc. By default, the at character (@) will be added to each note which was part of a match. In order for theloc to mark the entire matched sequence, the --loc option must be given to themax instead of --locstart.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m > theme1.krn
At the end of the data, the following line is added:
!!!RDF**kern: @= matched note
This reference record is understood by the hum2abc command which converts Humdrum data into ABC Plus data for printing with abcm2ps. Click on the PDF file link at the end of the following command pipeline to view the score with the theme 1 motive highlighted.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m | hum2abc | abcm2ps - -O - | ps2pdf - - > theme1.pdf
Searching for the theme 2 motive is done in a similar manner using diatonic intervals in this case (so as to catch any possible modal variations in the theme):
tindex -G sonata01-1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
   | theloc -m | hum2abc | abcm2ps - -O - | ps2pdf - - > theme2.pdf
The highlighting color can be made different for both the first and second themes by using a different marker for each theme, and specifying a different color for each marker. In this case the default marker of "@" is used for the first theme motive, and "+" is used for the second theme motive.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m > theme1.krn
tindex -G theme1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
   | theloc -m --mchar "+ color=#0000ff" > theme1and2.krn
hum2abc theme1and2.krn | abcm2ps - -O - | ps2pdf - - > theme1and2.pdf
If you need to see clearly where matches start, add the --double option to theloc. This will cause a double mark to be printed at the start of the match. One use for the double mark is to make it easier to search for the start of matches in the original Humdrum data. The hum2abc program will translate the second mark character in a note (sub)token into a circle around the notehead.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m --double > double1.krn
tindex -G double1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
   | theloc -m --mchar "+ color=#0000ff" > double12.krn
hum2abc double12.krn | abcm2ps - -O - | ps2pdf - - > double12.pdf
With the doubly-marked match starts, the individual instances of the second theme motive become clearer in the notated music.
Multiple searches can overlap. The first mark on the note (leftmost in the token) will color the notehead when using hum2abc to display the notated music. Subsequent marks on the same note will be rendered as colored circles around the note, and each additional mark causes another circle around the notehead with a larger diameter than the previous one. Here is an example which colors the first theme motive red, the second theme motive blue, and the last half of the first theme motive green. This example also adds a comment in the !!!RDF reference record which explains the function of the mark.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m --double \
     --mchar "@ color=#ff0000 Theme 1 motive" > double1.krn
tindex -G double1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
   | theloc -m --double \
     --mchar "+ color=#0000ff Theme 2 motive" > double12.krn
tindex --rest -G double12.krn | themax -u "4.? 24 24 24 4 4?" --loc \
   | theloc -m --double \
     --mchar "Z color=#33aa33 Theme 1 rhythmic sub-motive" > double12b.krn
hum2abc double12b.krn | abcm2ps - -O - | ps2pdf - - > double12b.pdf
The --rest option to tindex is used in the last search to prevent matching across rests.
Yanking measures containing marked notes
The myank program is aware of the !!!RDF marking convention, and it can be used to extract all measures in a file which contain notes marked as search results.
tindex -G sonata01-1.krn | themax -c "U?UUUUdddu" --loc \
   | theloc -m | myank --double > theme1extract.krn
The --double option used with myank places a double barline between each segment of the score which is non-consecutive. Printing the resulting extracted measures:
hum2abc theme1extract.krn -n 1 | abcm2ps - -O - | ps2pdf - - > t1ex.pdf
Doing the same for the second themes:
tindex -G sonata01-1.krn | themax -I "-2 -2 -3 -3 -2? -2? +4?" --loc \
   | theloc -m --double --mchar "+ color=#0000ff" \
   | myank --double > theme2extract.krn
hum2abc theme2extract.krn -n 1 | abcm2ps - -O - | ps2pdf - - > t2ex.pdf
The file theme2extract.krn needed to be edited to remove a dangling tie caused by a break in the music, since abcm2ps complained about that tie.
The source code for the program was last modified on 24 Feb 2011.