The tindex program is an expanded version of the original themebuilder command written in AWK by David Huron, utilizing Humdrum Toolkit commands to extract pitch features from monophonic **kern data. The tindex program can emulate the original version by using the --mono option.
The tindex program allows indexing of polyphonic music consisting of strictly monophonic voices (molyphonic) as well as more complicated polyphony characteristic of keyboard music where voices enter and drop out from the overall texture of the music. The tindex/themax program pair can only handle monophonic sequences, so for each chord (multi-stop token) tindex extracts either the first or the last note listed. Other added capabilities include rhythmic feature extraction, reference (bibliographic) record extraction, selective feature extraction, grace note inclusion/exclusion, and segmentation boundary encoding.
As a basic example, consider the following six Humdrum files:
Passing these files to tindex will generate a search entry line for each file:
Application of the search index
Once the index data has been created, it can be used by the themax command to search for feature patterns in the index. For example, themax can be used to search for the song(s) which contain the pitch sequence "e-flat, f, g". The default output of the themax command is a list of entries from the original thema index file created by tindex which match the search query. In this case there is one file (ex4.krn) which contains the pitch sequence:
If you want to perform an "AND" search with another independent musical feature, then the output from themax can be piped into another call to the program with the matches from the first search. To search for features in parallel (such as pitch and rhythm at the same time), the search queries are given as multiple options to a single call to themax. For example, the "e-flat, f, g" sequence occurs with the durations "dotted-half, eighth, eighth" which can be queried by the -u option:
An additional program called theloc (thema location) can then be used to identify the location in the original file when the --location option is given to themax. In this case "=1B1" means the matched sequence occurs starting at measure 1, beat 1 in the original data file (ex4.krn):
An additional output option from the theloc program will also mark the individual notes which caused the match. In the following search, the "e-flat, f, g" sequence is searched without considering the rhythm, and there are two locations in the file where the query is found. The --ending option has to be supplied along with the --location option so that both the starting and ending notes of the matches can be highlighted in the output data (otherwise, only the first note at the start of the match will be marked).
Each matched note is marked with an "@" character, and an !!!RDF: record explaining that character's meaning (a matched note) is given at the bottom of the file. The example on the right is generated by adding the --tie option to the theloc program so that it will highlight all tied notes after the first note in a group of notes tied together. The marking character can be used to locate matches in a text editor (by searching for "@" in the resulting file), or it can be used to highlight the note in graphical music notation, such as coloring the matched notes red:
Thema index pitch fields
Each line contains multiple fields separated by a tab character, and each field except the first one at the start of the line begins with a unique tag character to facilitate searches in the thema command. The ten tab-separated default entries on each line are:
Rhythmic analysis option
The -r option extracts eight rhythmic features into the output search index. When the -r option is used alone, the pitch features are suppressed. To include both pitch and rhythm features, use the option pair -p -r, or use -a (for all) to include all musical features.
Listed below are example rhythmic features extracted from the six melodies given above. The first example extracted only the rhythmic features with the -r option, while the second example extracts all musical features with the -a option (both pitch and rhythm features).
Selective feature indexing
By default, tindex will extract all pitch features and no rhythmic features (to simulate the behavior of the original themebuilder program). To extract only the rhythm features, use the -r option. To extract all pitch and all rhythm features together, use either -a or -p -r.
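The flag combinations described above can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not code from tindex: the function name `selected_feature_groups` and the group labels "pitch" and "rhythm" are invented for illustration, and the treatment of -p given alone (pitch only) is an assumption inferred from the -p -r pairing.

```python
def selected_feature_groups(p=False, r=False, a=False):
    """Return the feature groups implied by the -p, -r and -a flags.

    Hypothetical helper that mirrors the documented defaults only.
    """
    if a or (p and r):
        # -a, or the pair -p -r: pitch and rhythm features together.
        return {"pitch", "rhythm"}
    if r:
        # -r alone suppresses the pitch features.
        return {"rhythm"}
    # No flag: pitch only, as in themebuilder.
    # Assumption: -p alone behaves the same way.
    return {"pitch"}


print(sorted(selected_feature_groups()))         # default
print(sorted(selected_feature_groups(r=True)))   # -r
print(sorted(selected_feature_groups(a=True)))   # -a
```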
However, if you only want a specific subset of the extractable features, use the -f option followed by a list of the features to extract according to the feature tags in the following table. This option is useful when only specific musical features will be searched. In these cases, including only the desired musical features minimizes the index file size and reduces search processing time.
When a particular feature has more than one tag, any of those tags are aliases for the same musical feature. For example, to extract only the diatonic pitch class feature, use the option -f "PCH", or equivalently: -f "P", -f "PC".
The musical key and meter features can be suppressed with the -E option (meaning: no extra features).
Multiple selected features can be extracted by adding them to the -f option string, separated by one or more space characters. The ordering of the features in the option string does not matter: all features will be output in the canonical order required for searching with the themax command. In the following example, the pitch and duration features are extracted. Even though the features are listed in the order duration/pitch, the output index is ordered pitch/duration.
Including segmentation markers
Rests, fermatas and phrase endings can be used to segment the output stream of pitches/rhythms so that searches will not cross over boundaries defined by these features. If any of the following options are used to include segment markers (R) in the output data, control messages will be output with the index data if the -Q option is used to enable them.
Rests as segmentation boundaries
The --rest option will cause "R" segmentation markers to be placed within all extracted pitch and rhythm features whenever pitch sequences are separated by one or more rests. Only one rest marker will be inserted between two pitch/rhythm features, even if there are multiple intervening rests.
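The collapsing of consecutive rests into a single marker can be sketched in Python as follows. This is an illustration of the rule stated above, not the tindex implementation. The token list is a simplified stand-in for **kern data (plain pitch names, with "r" for a rest), and the function name `insert_rest_markers` is hypothetical. Since the text only describes rests that separate pitch sequences, this sketch assumes that leading and trailing rests produce no marker.

```python
def insert_rest_markers(tokens, rest="r", marker="R"):
    """Replace each run of rests between notes with one marker.

    Assumption: rests before the first note or after the last note
    are dropped without emitting a marker.
    """
    out = []
    pending = False  # True when a rest has been seen since the last note
    for tok in tokens:
        if tok == rest:
            # Only rests that follow at least one note can separate sequences.
            pending = bool(out)
        else:
            if pending:
                out.append(marker)
                pending = False
            out.append(tok)
    return out


print(insert_rest_markers(["c", "d", "r", "r", "e"]))
```

With two rests between "d" and "e", the result contains a single "R" between them.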
For pitch/rhythm interval features, if a single pitch is surrounded by rests both before and after the note, there will be two "R" markers in a row in the extracted features data (see example below). This is used to keep track of the number of notes in the original musical data for later alignment of the search matches in the musical data.
Fermatas as segmentation boundaries
Like rests, fermatas can function as segmentation boundaries, particularly by implicitly marking phrase endings. Use the --fermata option to add an R marker to the feature sequences when a pitch with a fermata is found in the input data. Fermatas must occur on the first note in a chord (multi-stop) **kern token. Fermatas on rests are ignored; include the --rest option instead. The following example shows some sample music with fermatas and rests. The data can be extracted in three ways which will cause different segmentation markers (R) to be inserted within the output indexing data. Notice that when --fermata and --rest are both used at the same time, only one R segmentation marker will be generated in the output index data.
Phrase endings as segmentation boundaries
Phrase endings (}) can also be used to mark segmentation boundaries with the R character in the output feature index data. Phrase endings can fall on rests.
Grace notes
By default, grace-note information is extracted from the input music. If grace notes should be suppressed in the output index, use the -G option. Use the -Q option to output the #NOGRACE control message, which the theloc command needs in order to work properly. Also use the -Q option in themax when passing the data to theloc.
Polyphonic data extraction can be done by using the --poly option. This option extracts multiple entries for a file, with one line for each **kern spine in the file.
Only the first layer of a spine is used for building an index. For example, here is a file with two spines of **kern data:
Running the command "tindex --poly poly.krn" will generate two entries:
Each entry adds a double colon (::) after the filename (or text string substitution when using the -t option), followed by the spine number from which the indexing data was extracted. Note that the second column of data in the second spine is currently ignored.
Fully polyphonic melodic extraction
Use the --poly2 option to extract all sub-spine melodic sequences from a file. The --poly option will only extract the first layer (first occurrence of a given spine on the line), while --poly2 will extract all layers (all sub-spine occurrences on the line).
tindex --poly2 poly.krn -E -f "P"
Notice that with the --poly2 option, an additional line is added to the output index data. The third line in the above index data represents the pitch sequence found in the second sub-spine (i.e., the second layer) of the second spine. Secondary sub-spine data is indicated in the voice number that follows the filename: the sub-spine number is appended to the primary spine number after a period character. In the above example "2.2" means that the sequence is from the second spine in the file (first 2), and from the second sub-spine in spine 2 (second 2).
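A small Python sketch shows how such a voice label could be taken apart. It assumes the label has the form filename::spine or filename::spine.layer, as described above. The function name `parse_voice_label` and the sample strings are invented for illustration, and any note-offset suffix is not handled here.

```python
def parse_voice_label(entry_name):
    """Split 'file::spine[.layer]' into (filename, spine, layer).

    Hypothetical helper: the layer defaults to 1 (the first layer)
    when no sub-spine number is present.
    """
    filename, sep, voice = entry_name.rpartition("::")
    if not sep:
        raise ValueError("no '::' voice separator in %r" % entry_name)
    spine, _, layer = voice.partition(".")
    return filename, int(spine), int(layer) if layer else 1


print(parse_voice_label("poly.krn::2.2"))
print(parse_voice_label("poly.krn::1"))
```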
When secondary sub-spines are not contiguous, a segmentation marker will be added to the output data.
tindex --poly2 poly2.krn -EfP
In the above example, the pitch sequence "ff gg" in the second sub-spine of the second spine is not immediately followed by the pitch sequence "bb ccc" later in the sub-spine. Therefore, a segmentation marker (R) is added between these two sub-sequences. In addition, since the second sub-spine (second layer) of the second spine does not start at the beginning of the music, a segmentation marker starts the index data for the second layer of the second spine.
In order to allow searching by instrument, the -i option can be given to store an instrument name within the initial tag field of an index entry. Instrument names are given in Humdrum **kern data with a tandem interpretation starting with the characters *I:. Currently the instrument label will only work when the --poly or --poly2 option is also given.
Running the command "tindex -i --poly instrument.krn" will generate two entries which include the instrument label:
Including bibliographic records
When the -b option is given to tindex, all bibliographic (reference) records found in the input Humdrum file will be appended to the end of the feature list on an output index line. All bibliographic records will be placed in sorted ASCII order which is required for searching in multiple records using themax. Each bibliographic entry will be separated by a tab character on the output index line.
The -B option can be used to select only particular bibliographic records to store in the output index data. For example, to only store title records (if they are present in the input data), then the option would be -B "OTL". In this case all other bibliographic records, such as COM (composer's name records) will be suppressed.
To allow more than one bibliographic record type in the -B record filter string, separate the bibliographic keys with spaces, colons, and/or commas. For example, to allow only the composer and title in the output index, use -B "COM, OTL" or -B "COM:OTL". The order of the bibliographic keys in the argument string for -B is not important, since the output index data will always produce bibliographic records in sorted ASCII order.
The bibliographic keys within the -B string are actually regular expressions. This allows for more specific filtering rules, such as:
By default -B "OTL" will match to bibliographic keys such as: OTL, OTL1, OTL@@FRE, OTL@ENG since all of them contain the string "OTL". The regular expression anchors for start and end of line (^ and $) are local to each bibliographic key in a -B option string.
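The matching rule for the -B filter string can be approximated in Python. This is a sketch of the behavior described above, not the tindex source: the function names `split_filter` and `keep_record` are hypothetical, and Python's `re` syntax may differ in detail from the regular expression flavor that tindex uses.

```python
import re


def split_filter(filter_string):
    """Split a -B style string on spaces, colons and/or commas."""
    return [key for key in re.split(r"[\s:,]+", filter_string) if key]


def keep_record(record_key, filter_string):
    """True if any filter pattern matches somewhere in the record key.

    Each pattern is searched against one key at a time, so ^ and $
    anchor to the start and end of that key only.
    """
    return any(re.search(pattern, record_key)
               for pattern in split_filter(filter_string))


for key in ["OTL", "OTL1", "OTL@@FRE", "OTL@ENG", "COM"]:
    print(key, keep_record(key, "OTL"), keep_record(key, "^OTL$"))
```

The unanchored pattern "OTL" keeps every key that contains that string, while "^OTL$" keeps only the plain OTL key.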
Command-line settings which can affect the operation of themax and theloc are stored in control messages in the output data if the -Q option is specified. These messages start with a hash sign (#). All of these messages are suppressed in the output if the -Q option is not given. Messages will not contain tab characters on the line, which could interfere with the search mechanism within themax. Here is a list of the messages which may occur:
Directory processing
The tindex program can be given a mixture of files and directories as command-line arguments. Each Humdrum file will generate a line of data in the output index. If a directory is given to the program, then all files within that directory and its subdirectories will be processed if they end in .krn or .thm. The path names of the files will be included in the output index data.
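The file selection rule above can be sketched with Python's standard `os.walk`. This only illustrates which files would be picked up. The function name `collect_humdrum_files` is hypothetical, and the traversal order shown here is an assumption, since the text does not say in which order tindex visits files.

```python
import os


def collect_humdrum_files(paths, extensions=(".krn", ".thm")):
    """Expand a mixed list of files and directories into file paths.

    Directories are searched recursively for names ending in one of
    the given extensions; explicitly named files are kept as given.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    if name.endswith(extensions):
                        found.append(os.path.join(root, name))
        else:
            found.append(path)
    return found


if __name__ == "__main__":
    import sys
    for filename in collect_humdrum_files(sys.argv[1:]):
        print(filename)
```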
The tindex program processes sequences of notes, and therefore it is not useful for searching notes occurring at the same time (see sonority for that). When tindex encounters a chord (or multi-stop) token, it processes only the first note in the token. Typically this note is the lowest note in the chord (although this is not required). If you instead prefer the highest note in the chord, use the --end option to extract the last note in multi-stop tokens.
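In **kern data the notes of a chord are written in one token, separated by spaces, so the choice between the first and last note can be sketched as below. The function name `melodic_note` and its `use_last` parameter are invented for illustration; `use_last=True` stands in for the effect of the --end option.

```python
def melodic_note(token, use_last=False):
    """Pick one note from a **kern data token.

    A multi-stop token lists its notes separated by spaces; a single
    note is returned unchanged.
    """
    notes = token.split()
    return notes[-1] if use_last else notes[0]


print(melodic_note("4c 4e 4g"))                 # first note listed
print(melodic_note("4c 4e 4g", use_last=True))  # last note listed
```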
Note offsets
When data is processed with tindex, the usual assumption is that the first note of the data is the first note in the music. If you want to partially index a musical score, chop it into selected pieces and index each piece separately. In order to link back to the score with theloc, add a comment like this to the start of the extracted **kern spine:
!noff:17
This comment will be read by tindex as a note offset value which will be stored after the voice number, preceded by a semicolon.
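Reading such a comment out of a spine is straightforward, as the following Python sketch suggests. It assumes a single **kern spine given as a list of lines and a comment written exactly as "!noff:" followed by digits. The function name `read_note_offset` is hypothetical, and how tindex itself parses the comment is not documented here.

```python
import re


def read_note_offset(spine_lines):
    """Return the value of the first '!noff:N' comment, or None.

    None means no offset comment was found, in which case the first
    note of the data is taken to be the first note of the music.
    """
    for line in spine_lines:
        match = re.fullmatch(r"!noff:(\d+)", line.strip())
        if match:
            return int(match.group(1))
    return None


print(read_note_offset(["**kern", "!noff:17", "4c", "4d", "*-"]))
```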
For example, the following music contains sections in 2/4 and 3/4. Since each entry in a thema index can only indicate a single key/meter, the music can be chopped into two segments, one for each section. The second segment of the music starts with the 7th note of the original music, so add !noff:7 before the first data line in the second segment:
When tindex processes the two parts, the note offset value will be stored in the entry for the second segment:
In order to fully link back to the original file, add a global comment to the segmented files which gives the name of the original file:
Then when the index data is created with tindex the original filename will be used instead of the segment's filename:
Now when themax is used, the correct note numbers will be marked. For example, searching for the pitch sequence "G A" should find two matches: one starting on note 4 and the other starting on note 7 in the original file.
This information can be fed into theloc to mark the matched notes in the original file:
Which can then be converted to highlighted notes in a conversion to graphical music notation:
| autostem | hum2muse | muse2ps =z21j | pstopnm -dpi=300 \
| convert - -trim -negate -alpha copy -resize '33%' -negate towmeter.png
If you want to search only the music in triple meter, the split data segments make this possible:
| autostem | hum2muse | muse2ps =z21j | pstopnm -dpi=300 \
| convert - -trim -negate -alpha copy -resize '33%' -negate towmeterTriple.png
program file.krn
It can also read the data over the web:
program http://www.some-computer.com/some-directory/file.krn
Piped data works in a somewhat similar manner:
cat file.krn | program
is equivalent to a web file using this form:
echo http://www.some-computer.com/some-directory/file.krn | program
Besides the http:// protocol, there is another special resource indicator prefix called humdrum:// which downloads data from the kernscores website. For example, using the URI humdrum://brandenburg/bwv1046a.krn:
program humdrum://brandenburg/bwv1046a.krn
will download the URL:
Musedata Bach Brandenburg Concerto collection.
This online-access of Humdrum data can also interface with the classical Humdrum Toolkit commands by using humcat to download the data from the kernscores website. For example, try the command pipeline:
humcat humdrum://brandenburg/bwv1046a.krn | census -k