SYNOPSIS
OPTIONS
DESCRIPTION
Each line in the index file consists of an ordered sequence of musical features extracted from an underlying Humdrum file. Each musical feature description starts with a unique identifying character, and fields are separated by tab characters. The themax program searches the index file with regular expressions. The original thema command uses the command-line regular expression tool grep to do the final searching in the index file. The themax program instead uses the Perl-Compatible Regular Expression library to search the index file internally, so it does not require any external tools to run, which is useful when running the program on Windows computers.

As an example application, suppose that you want to search the melodies of the vocal parts in Ludwig Erk's Deutscher Liederschatz, volume 1, which consists of 201 songs. First, you will have to extract the vocal parts from the full score. The vocal part is always in spine three of the score for these particular songs (the first two spines being the piano accompaniment), so the following command can extract the vocal parts using a bash-shell for-loop:

    for i in *.krn
    do
       extractx -f3 "$i" > "$(basename "$i" .krn).thm"
    done
    tindex -a *.thm > liederschatz1.thema

The above commands will create the following content in the file liederschatz1.thema:
The first column contains a filename given as input to the tindex command (the .thm extensions were removed manually for this example). The filename can also be replaced with some other unique identifier for the melodic data described on the line. Subsequent tab-separated fields on the line each start with a unique character which identifies the musical feature that follows. For example, "Z" marks the start of the key-designation field, and "{" marks the start of the twelve-tone interval feature.

With the -a option (extract all musical features), the tindex command will extract all seven pitch features and eight rhythm features into the index data. You can select which features to store in the index file using the -f option of the tindex program. Here is an example of the features extracted from song 53 in the example set:
The tindex program will always place these search features in the given order, although not all features need be extracted into the index. This ordering is important if you want to search multiple features in parallel, since a single regular expression is generated to do the search in a single pass.

In the above example of generating an index file, the score sequence of the notes as printed on paper was used. This may cause problems by identifying sequences which span a first/second repeat measure. If you are concerned about these sorts of errors, you should pass the songs through the thrux program first so that the performance sequence of the notes is used in the index file.

Another caveat is that the tindex program requires an explicit key designation when creating the key field in the index data. If there is only a key signature (such as *k[f#] for G major or E minor), the input music file still needs a key designation, such as *G: for G major, or *e: for E minor. This key designation, in turn, is required for calculating the scale-degree feature.

Search filters

Search in major keys only

When using the -M option with themax, only music in major keys will be searched. The -M option can also be used without any other search queries. In that case, the themax program will return a list of the lines in the index file which represent music in a major key. For example, to count the number of songs in any major key in the example Deutscher Liederschatz volume, type the following command, which should give an answer of 196 songs:

    themax -M indexfile | wc -l
    196

Search in minor keys only

Likewise, the -m option is used to selectively search minor-key entries in the index file. In the example set of songs, only 5 are in a minor key:

    themax -m indexfile | wc -l
    5

The themax program echoes the entire text line in the index on which a match was found for a search query.
The command "wc -l" is used above to count the number of lines being returned by the themax command, which is also a count of the number of songs which match the query. If you only want to see which files contained the match, then try this command:

    themax -m liederschatz1.thema | awk '{print $1}'
    erk018
    erk042
    erk081
    erk090
    erk133

Filtering music for a particular tonic

The -t option can be used to only search music with a particular tonic pitch. For example, "-t C" will only search music in either C major or C minor. Like -M and -m, the -t option does not have to be used with a particular music query, and will return all lines which match the requested tonic if no additional search queries are added.

For tonics which contain a flat sign, use a minus sign after the diatonic pitch name (such as "-t A-" for A-flat). For sharp signs use a "#" character; however, you should probably enclose the option in single quotes on the command line so that the sharp sign is not interpreted as a comment marker: "-t 'C#'" for a C-sharp tonic. Alternatively, the themax program will accept forms such as "-t A-flat" for A-flat, and "-t F-sharp" for F-sharp.

As an example, here are searches for various keys which can be used to get statistics on the frequency of tonics used in the songs:

    themax -t C indexfile | wc -l
    30
    themax -t D-flat indexfile | wc -l
    1
    themax -t D indexfile | wc -l
    15
    themax -t E-flat indexfile | wc -l
    20
    themax -t E indexfile | wc -l
    5
    themax -t F indexfile | wc -l
    33
    themax -t G indexfile | wc -l
    47
    themax -t A-flat indexfile | wc -l
    3
    themax -t A indexfile | wc -l
    22
    themax -t B-flat indexfile | wc -l
    24
    themax -t B indexfile | wc -l
    1

In this case, the most commonly used key has a tonic on G. The -t option does not select the major/minor modality. Use the -M or -m options to choose the mode:

    themax -M -t G indexfile | wc -l
    46
    themax -m -t G indexfile | wc -l
    1

In this case, there is one piece in G minor, and 46 in G major.
Of course, a more efficient method of doing this particular analysis would be to process the second data field of the index directly without using themax:

    awk '{print $2}' indexfile | sort | uniq -c
      3 ZA-=
     19 ZA=
     24 ZB-=
     30 ZC=
      1 ZD-=
     15 ZD=
     20 ZE-=
      5 ZE=
     33 ZF=
     46 ZG=
      3 zA=
      1 zB=
      1 zG=

Filtering music for a particular meter

Music in a particular meter or metrical category can be selected by using the -T option. The metrical description of the music contains the time signature (numerator/denominator) followed by the string "duple", "triple", "quadruple" or "irregular", and finally "simple" or "compound". As with the previously described options, the -T option can be used alone without any musical feature search.

In the example data set, the -T option can be used to identify how many songs are in 4/4:

    themax -T 4/4 indexfile | wc -l
    63

To search music in any duple meter (2/4 and 6/8, for example), use "duple":

    themax -T duple indexfile | wc -l
    62

Any meter with more than 4 beats is classified as "irregular" (such as 5/4, but not 9/8, which is considered to be a compound time signature with three beats):

    themax -T irregular indexfile | wc -l
    1

You can compare the number of songs in a simple meter (such as 2/4, 3/4) to songs in a compound meter (such as 6/8, 9/8) with the following commands:

    themax -T simple indexfile | wc -l
    176
    themax -T compound indexfile | wc -l
    25

So about 1/8 of the songs in the example set are in a compound meter. Be careful, because the labeling of meters as compound or simple is done automatically by the tindex program, and it cannot distinguish between 6/8 as a compound meter (two beats at the dotted quarter duration) and 6/8 as a simple meter (six beats at the eighth-note duration). This ambiguity is inherent in modern time signatures. Also, 3/8 is currently identified as being simple (three beats at the eighth-note level) rather than compound (one beat at the dotted-quarter-note level).
Multiple components of the metrical description can be searched at the same time. The order must be (1) time signature, (2) beat count, (3) simple/compound. Here is an example search for music in a triple simple meter:

    themax -T triplesimple indexfile | wc -l
    74

Note that -T simpletriple would not return any matches since the ordering of the metric components is incorrect.

Search anchoring

By including the -a option when searching with themax, matches are required to start at the beginning of a data entry. Potential matches which start after the initial position in a data entry will be ignored. This option requires an actual feature query (unlike the previously described search filtering options). If -a is used without any other arguments, all lines in the search index will be echoed to standard output.

Here is an example search for the pitch sequence "C D E" in the example index, searching both unanchored (the pattern can start anywhere in the data) and anchored (the pattern must start at the beginning of the music). Notice that this melodic pattern occurs in 29 songs, but only occurs at the start of one song.

    themax -p "C D E" indexfile | wc -l
    29
    themax -a -p "C D E" indexfile | awk '{print $1}'
    erk131

Pitch query options

diatonic pitch class

The -p option is used to search the music by pitch class. The pitch class name is composed of a diatonic letter A-G, and can be followed by one or more flat (-) or sharp (#) signs. For example, A-- is A double-flat, and C## is C double-sharp. An "X" can be used to represent a double sharp. Accidentals only apply to the note which they follow, so repeated notes using the same accidental must also repeat the accidental signs. In the classic thema command, a single space between note names is required; however, this is optional in themax, and the diatonic pitch names are case-insensitive.
Here is an example search for the melodic sequence "D F# A" in the example data index:

    themax -p "D F# A" indexfile | wc -l
    8

Extended pitch-class query syntax

Northern European pitch-class names are understood by themax, where "is" is equivalent to a sharp sign, and "es" is equivalent to a flat sign. You will have to be careful about the B/H diatonic pitch name. If an "H" is present in the query, it will be assumed to be equivalent to an English "B" pitch, while "B" will be assumed to be equivalent to an English "B-". If no "H" is present, then "B" is assumed to be equivalent to an English "B" pitch.

    themax -p "d fis a" indexfile | wc -l
    8

Southern European and fixed-do solfège names are also understood in the pitch-class query string: C = ut or do, D = re, E = mi, F = fa, G = sol or so, A = la, B = si or ti. English or German accidental syntax can be applied to these basic diatonic codes. Spaces between pitch names are optional.

    themax -p "re fa# la" indexfile | wc -l
    8
    themax -p "refaisla" indexfile | wc -l
    8

Wildcards in pitch-class query strings

Five characters are used to represent special meanings in the pitch class search query:
Diatonic mode for pitch class queries

Using the -D option with a -p search is equivalent to adding a "^" wildcard after each diatonic pitch name. This option allows searching for diatonic pitch sequences with any accidental after the pitch name. This capability is useful for searching for modal equivalents and various interpretations of musica ficta.

Count the number of songs which contain the exact pitch sequence "A B C D E":

    themax -p "a b c d e" indexfile --total
    8

Now count the number of songs which contain the diatonic pitch sequence "A B C D E", allowing for accidentals attached to any of those notes:

    themax -D -p "a b c d e" indexfile --total
    13

The previous search is equivalent to:

    themax -p "a^ b^ c^ d^ e^" indexfile --total
    13

Scale degree

The -d option is used to search using a scale-degree representation of the music. 1 is assigned to the tonic note, and values from 1-7 represent the 7 scale degrees of the major and minor scales. Accidentals are not considered. For example, in C major a C-sharp is encoded as "1" for this musical feature. Spaces are optional between scale degrees.

An example query by scale degree, searching for "1 3 5":

    themax -d 135 indexfile | wc -l
    56
    themax -d "1 3 5" indexfile | wc -l
    56

Solfège syllables or letter names can be used in scale-degree queries. Movable-do is used in this case, so do=1, re=2, mi=3 and so on. If letter names are used, then the key of C is used to convert to scale degrees. In that case C=1, D=2, E=3 and so on.

    themax -d "do mi so" indexfile | wc -l
    56
    themax -d "C E G" indexfile | wc -l
    56

12-tone pitch class

The -P option is used to search using a twelve-tone pitch class representation of the music. The pitch classes are numbers from 0 (for C) to 11 (for B). For values 10 and higher, the pitch class numbers are mapped to letters of the alphabet. So 10 = Y (representing the pitch class B-flat/A-sharp), and 11 = Z (representing the pitch class B).
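The one-character-per-note encoding described above can be illustrated with a small shell helper. This is only a sketch: the function name pc_compact is hypothetical and is not part of themax or tindex. It rewrites a list of numeric pitch classes into the compact form, using Y for 10 and Z for 11 as stated above.

```shell
# Hypothetical helper (not part of themax): convert numeric 12-tone pitch
# classes (0-11) into the compact one-character-per-note form.
pc_compact() {
  out=""
  for pc in "$@"; do
    case "$pc" in
      [0-9]) out="$out$pc" ;;      # single digits are kept as they are
      10)    out="${out}Y" ;;      # 10 (B-flat/A-sharp) is written as Y
      11)    out="${out}Z" ;;      # 11 (B) is written as Z
      *)     echo "invalid pitch class: $pc" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

pc_compact 6 8 10    # prints 68Y
```

The result could then be given to the -P option, for example themax -P "$(pc_compact 6 8 10)" indexfile --trim.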
The multi-digit pitch class names can also be used, provided spaces separate the numbers from adjacent values. If you use "Y" or "Z" for pitch classes, then you cannot also use 10 or 11 as the respective alternate names for the pitch classes.

Searching for the 12-tone pitch class sequence "6 8 Y":

    themax -P 68Y indexfile --trim
    erk035
    erk081
    themax -P "6 8 10" indexfile --trim
    erk035
    erk081
    themax -P "68 10" indexfile --trim
    erk035
    erk081

Note that twelve-tone pitch classes are less selective than diatonic+accidental based pitch classes:

    themax -p "f# g# a#" indexfile --trim
    erk081
    themax -p "g- a- b-" indexfile --trim
    erk035

Pitch names can be used for 12-tone pitch class searches, with accidentals up to two sharps/flats. These values will be converted automatically into numeric 12-tone pitch classes within themax before searching the index file:

    themax -P "g- solis b-flat" indexfile --trim
    erk035
    erk081

Musical interval

The -I option can be used to search by musical interval. A musical interval has three components in this order: (1) interval direction, (2) interval quality, and (3) diatonic interval size.
Not all components are required: if any component is missing, then all states for that component will be matched. Example interval descriptions: "+P5" is a rising perfect fifth. "+3" is a rising third (either a rising major third or a rising minor third). "P4" is a perfect fourth (either rising or falling). A "+" by itself represents a rising interval of any quality and diatonic size.

Songs which start with a rising perfect fifth:

    themax -a -I "+P5" indexfile --trim
    erk035

Number of songs which contain a rising perfect fifth anywhere in the song:

    themax -I +P5 indexfile --total
    87

Number of songs which contain a rising or falling octave:

    themax -I P8 indexfile --total
    61

Songs which contain a long up/down sequence of intervals:

    themax -I "+-+-+-+-+-" indexfile --trim
    erk009
    erk016
    erk038
    erk066
    erk103

Songs which contain 3 rising thirds in a row:

    themax -I "+3 +3 +3" indexfile --trim
    erk029
    erk074
    erk099
    erk101

Wildcards for musical interval search queries

Not all of these wildcards are active yet.
12-tone interval

The -i option can be used to search the music using twelve-tone intervals. This is equivalent to counting the number of half-steps between notes in the sequence. Rising intervals may be preceded by an optional "+" (plus), and falling intervals must be preceded by a "-" (minus). To specify that the direction is optional, prepend a "~" (tilde) character to the interval value. A repeated note is represented by a "0" interval. If an interval is preceded by a plus, minus or tilde sign, then spaces between the intervals are optional.

Look for songs which contain three rising whole-tones in a row:

    themax -i "2 2 2" indexfile --trim
    erk052
    erk078
    erk147

Count the number of songs which contain three major seconds in a row, which can be either rising or falling:

    themax -i "~2 ~2 ~2" indexfile --total
    119

Count the number of songs which have a major sixth interval up, followed by a major third down:

    themax -i "+9 -4" indexfile --total
    19

Find a song with a long string of repeated notes:

    themax -i "0 0 0 0 0 0 0 0 0" liederschatz1.thema --trim
    erk130

Pitch gross contour
The -C option can be used to search the musical data for gross contour (also called Parsons Code), which is a basic intervallic description of the melodic line split into three categories: (1) up (next pitch is higher than the current one), (2) down (next pitch is lower than the current one), (3) same (next pitch is the same as the current one). For up intervals, you can use u, U, or /. For down intervals, you can use d, D, or \. For repeated (same) intervals you can use s, S, = (equals sign), or - (dash). Extended regular expression syntax works in the pitch gross contour search queries. For example, S+ means one or more repeated intervals (two or more repeated notes in a row).

Find the songs which have 6 up intervals followed by 6 down intervals:

    themax -C "uuuuuudddddd" indexfile --trim
    erk115

Same search as above, but using extended regular expressions:

    themax -C "u{6}d{6}" indexfile --trim
    erk115

Count the number of songs which have at least 4 upward intervals followed by any type of intervals, and eventually followed by 4 or more falling intervals:

    themax -C "u{4,}.*d{4,}" indexfile --total
    34

Wildcard characters in Pitch gross contour queries:
Pitch refined contour

The -c option searches using refined interval contour. Refined contour contains five intervallic levels rather than the three of gross contour. In the refined case, up and down intervals are split into two sub-categories: (1) step-wise movement, and (2) leap movement. Any step-wise movement up or down (a half-step, whole-step or augmented second) is represented by a lower-case "u" or "d". Any leap up or down (a third or larger) is represented by an upper-case "U" or "D". A repeated note is represented as an "s", "S" or "-" as with gross contour features. As with gross contour, all extended regular expression wildcards are allowed.

Count the number of songs which contain a leap down followed by one or more steps or leaps upwards, followed by a leap down:

    themax -c "D(u|U)+D" indexfile --total
    109

Find songs which contain 10 or more successive leaps (any mixture of up and down leaps):

    themax -c "(U|D){10,}" indexfile --trim
    erk025
    erk047
    erk124

Wildcard characters in Pitch refined contour queries:
Rhythm query options

Duration

The -u option allows search by duration. Durations are specified by rhythmic values (actually the reciprocal of a duration value) using Humdrum's **recip representation. In other words, "4" is a quarter note, "16" is a 16th note, and "12" is a triplet-eighth note (there are 12 triplet-eighth notes in a whole note). Note that notated eighth notes in a 6/8 meter are not considered triplet eighth notes but rather plain eighth notes ("8").

Count the number of songs which contain any 16th notes:

    themax -u 16 indexfile --total
    121

Multiple rhythms in the search query must be separated by one or more spaces in order to parse the rhythmic entities properly. Count the number of songs which contain the rhythmic pattern "4 8 8":

    themax -u "4 8 8" indexfile --total
    154

A period (".") or "d" can be used to represent dotted rhythmic values. Count the number of songs which have the rhythmic sequence of a dotted eighth note followed by a sixteenth note:

    themax -u "8. 16" indexfile --total
    100

Count the number of songs which contain dotted rhythms (note that rests are not examined by themax):

    themax -u "d" indexfile --total
    174

The duration feature allows for one regular expression wildcard. An "X" (or "x") can be used to represent any single duration of any type. A dot cannot be used as in the pitch feature, since it might be confused with an augmentation dot. For example, count the number of songs which contain a sequence of a quarter note, followed by any rhythm, followed by a half note:

    themax -u "4 x 2" indexfile --total
    47

Duration gross contour

The -R option is used to search duration gross contour features: "S" if the following note has a shorter duration than the current note, "L" if the following note is longer, and "=" if the following note has the same duration as the current note. The search features are case-insensitive, and extended regular expressions can be used in this search option.
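To make the S/L/= encoding concrete, here is a minimal shell sketch which derives a duration gross contour string from a list of **recip values. It is an illustration only and not the code used by tindex: the function name duration_contour is hypothetical, and the sketch assumes plain numeric reciprocal values with optional augmentation dots written as "." or "d".

```shell
# Hypothetical helper (not part of themax or tindex): print the duration
# gross contour (S = shorter, L = longer, = = same) for a note sequence.
duration_contour() {
  # $1: space-separated **recip values, e.g. "4 8 8. 16"
  awk -v recip="$1" 'BEGIN {
    n = split(recip, tok, " ")
    for (i = 1; i <= n; i++) {
      t = tok[i]
      dots = gsub(/[.d]/, "", t)                # count and strip the dots
      dur[i] = (1 / t) * (2 - 1 / (2 ^ dots))   # duration in whole notes
    }
    out = ""
    for (i = 1; i < n; i++) {
      if (dur[i + 1] < dur[i])      out = out "S"
      else if (dur[i + 1] > dur[i]) out = out "L"
      else                          out = out "="
    }
    print out
  }'
}

duration_contour "4 8 8 2"    # prints S=L
```

A sequence of n notes yields n-1 contour symbols, which is why a query such as "={30}" describes 31 notes of equal duration.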
Search for songs which contain a long string of equal durations (30 or more repeated durations):

    themax -R "={30}" indexfile --trim
    erk056
    erk063
    erk110

Search for songs which contain four notes, each shorter than the previous one:

    themax -R 'sss' indexfile --trim
    erk052 (starting in second measure of second page)
    erk131
    erk134
    erk142
    erk162

Search for songs which have a long sequence of shorter-longer duration pairs:

    themax -R '(SL){10}' indexfile --trim
    erk131
    erk148

Duration refined contour

The -r option is used to search duration refined contour, which is similar to duration gross contour described above, but allows for 5 states, like pitch refined contour:
Find songs with two shorter notes (of the same duration) followed by three longer notes (of the same duration) which are more than twice the duration of the previous two notes:

    themax -r '=L==' indexfile --trim
    erk009
    erk059
    erk111 (starting in measure 8)
    erk130
    erk168
    erk169
    erk197

Beat level

The -b option can be used to search using the beat level musical feature. If a note is to be played on a beat, its feature value is "1". If a note is to be played off the beat, its feature value is "0". All extended regular expression wildcards are allowed in this search field.

Count songs which start with an anacrusis:

    themax -a -b 0 indexfile --total
    94

Search for songs which contain a sequence of at least 40 notes which alternate on/off the beat:

    themax -b "(10){20}" indexfile --trim
    erk010
    erk017
    erk020
    erk027
    erk035
    erk148
    erk195

Metric position

The -l (lower-case L) option is used to search the "metrical position" feature. This is the beat number on which a note attack occurs. Beat values must be separated by one or more spaces.

In 4/4 meters, count the number of songs which contain notes on each of the four beats of a measure, followed by a note on the downbeat of the next measure:

    themax -T 4/4 -l "1 2 3 4 1" indexfile --total
    34

Off-beats are specified by adding a space and then a fractional offset into the given beat. A dash ("-") can also be used to separate the beat from the fractional offbeat part of the number. For example, to search for a dotted eighth followed by a sixteenth note on beat four, you can search using the feature "4 4 3/4" or "4 4-3/4":

    themax -l "4 4 3/4" indexfile --total
    15
    themax -l "4 4-3/4" indexfile --total
    15

Wildcards in metric position features
The wildcard "~" is used after a beat to indicate that the note must fall within a given beat, either on the beat or on an offbeat after that beat position, but before the next beat. Count the number of songs in a 2/4 meter which have three notes occurring during the span of beat 2, with the first note occurring on the beat and two others on subsequent off-beats, before a note on beat 1 of the next measure. In other words, the search query will match an eighth and two sixteenths on beat 2, or three triplet eighth notes on beat 2:

    themax -T 2/4 -l "2 2~ 2~ 1" indexfile --total
    37

The wildcard "^" is used to indicate any offbeat within the given beat, not including any note attack occurring at the start of the beat. Search for songs which have a note on the first beat of a measure, none on the second beat but two notes on off-beats of beat two, followed by a note on beat three:

    themax -l "1 2^ 2^ 3" indexfile --trim
    erk009
    erk028
    erk058
    erk078
    erk147
    erk148

The dot wildcard in metric position searches represents any beat position. Here is a search for melodies which have a note on beat one, followed by two notes on any beat, followed by a note on beat 3 (either in the same measure or in a different measure):

    themax -l "1 . . 3" liederschatz1.thema --total
    111

Metric level

The -L option is used to search metric level features. The metric level is a log2 indication of the metric stress of a note. Beats are assigned the value 0, eighth-note off-beats are assigned -1, sixteenth notes after beats and after eighth-note off-beats are assigned -2, and so on. In 4/4, the first beat of a measure is +2 and the third beat is +1. For non-negative metric levels (i.e., beats), the symbol "B" (or "b") can be used to indicate any beat. Likewise, "S" (or "s") can be used to indicate any sub-beat (a metric position which does not fall on a beat). Note that this is similar to the beat level features. This feature is used to generate the metric refined contour and metric gross contour features.
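The log2 relationship for sub-beat positions can be sketched in a few lines of shell. This is an assumption-laden illustration rather than the tindex algorithm: the function name metric_level is hypothetical, the beat is assumed to be a quarter note, the offset into the beat is assumed to be a fraction in lowest terms with a power-of-two denominator, and the positive levels given to strong beats (such as +2 for a 4/4 downbeat) are not modeled.

```shell
# Hypothetical helper (not part of themax or tindex): metric level of a
# note from its fractional offset into a quarter-note beat.
metric_level() {
  # $1: offset within the beat as a reduced fraction, e.g. 0, 1/2, 3/4
  awk -v frac="$1" 'BEGIN {
    n = split(frac, p, "/")
    if (n == 1 || p[1] == 0) { print 0; exit }   # on the beat: level 0
    level = 0
    for (d = p[2]; d > 1; d /= 2) level--        # level = -log2(denominator)
    print level
  }'
}

metric_level 1/2    # prints -1 (eighth-note off-beat)
metric_level 3/4    # prints -2 (sixteenth-note position)
```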
Count the number of songs which have a downbeat in 4/4 followed by an eighth-note offbeat:

    themax -L "+2 -1" indexfile --total
    30

Count the number of songs which start with an eighth-note upbeat:

    themax -a -L "-1" indexfile --total
    46

Find any songs which do not contain sub-beats:

    themax -L "S" -v indexfile --trim
    erk039

Metric gross contour

Metric gross contour is analogous to pitch gross contour. There are three states which describe the rhythmic relationship between a note and the following one, which can be on a stronger metrical position, a weaker position or an equivalent position. The -E option (or --MGC) is used to search metric gross contours, where there are 3 possible states:
Metric refined contour

Like metric gross contour, metric refined contour is modeled after pitch refined contour. The -e option (or --MRC) is used to search metric refined contour, where there are 5 metric levels:

    themax -e 'SSSS' indexfile --trim
    erk030

Other options

Interleaved search query

Features can be searched in parallel by specifying multiple feature options to themax at the same time. Alternatively, you can create a single search string which contains the parallel features interleaved together. For example, to search for both pitch and rhythm at the same time for the notes C, D and E, all in quarter-note durations:

    themax -p "c d e" -u "4 4 4"

The -q option can be used to interleave these two options into a single equivalent search string:

    themax -q "p:u c:4 d:4 e:4"

or, reversing the order of the features:

    themax -q "u:p 4:c 4:d 4:e"

Spaces separate individual elements in the search, and the first element gives the names of the options being interleaved. Interleaved options within each element are separated by a colon character (:). Some search features have longer equivalent names, so the following command is equivalent to the above three:

    themax -q "pitch:duration c:4 d:4 e:4"

Any number of interleaved features can occur in the -q option string. Also, interleaved features do not need to have the same number of elements, provided that shorter-length feature queries occur later in each element list. For example, here is a search primarily by interval and duration, but with a starting pitch specified:

    themax -q "interval:duration:pitch P4:4:C +m3:4 -M2:4"

which is equivalent to:

    themax -u "4 4 4" -p "C" -I "P4 +m3 -M2"

which means: search for four notes in a row which have the interval pattern perfect fourth (up or down), followed by a rising minor third, followed by a falling major second. The first three notes of the search match must be quarter notes, and the search match must start on the pitch class "C". Note that in this case, there are three elements in the interval and duration queries, but only one in the pitch query. Therefore the pitch query must be listed after the other two features in the interleaved query string.
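The interleaving rule can be illustrated with a small shell helper which builds a -q query string from two separately written feature queries. This is a sketch only: the function name interleave is hypothetical and not part of themax, it handles exactly two features, and it expects the longer (or equal-length) query to be given first so that the shorter one ends up later in each element, as required above.

```shell
# Hypothetical helper (not part of themax): build an interleaved -q string
# from two space-separated feature queries.
interleave() {
  # $1: first feature name   $2: its values (the longer list)
  # $3: second feature name  $4: its values
  awk -v n1="$1" -v v1="$2" -v n2="$3" -v v2="$4" 'BEGIN {
    c1 = split(v1, a, " ")
    c2 = split(v2, b, " ")
    out = n1 ":" n2                      # first element names the features
    for (i = 1; i <= c1; i++) {
      out = out " " a[i]
      if (i <= c2) out = out ":" b[i]    # attach the second feature if present
    }
    print out
  }'
}

interleave p "c d e" u "4 4 4"    # prints p:u c:4 d:4 e:4
```

The output could be passed on as themax -q "$(interleave p "c d e" u "4 4 4")" indexfile.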
Returning total entry match count

By default, themax returns the lines in the index file which the search query matched, so that further processing of the index data can be done (via a pipe to themax again, for example). Instead, the --total option can be used to return the number of matches found in the index file. This is equivalent to piping the default output of themax through the command "wc -l".

    themax --total -p "df#a" indexfile
    8
    themax -p "df#a" indexfile | wc -l
    8

Negated queries

The -v option allows a search query to be negated. This causes only entries which do not match to be returned.

Count the number of songs which do not contain the pitch C:

    themax -v -p C indexfile --total
    39

Search for themes which do not contain a descending minor second:

    themax -v -I "-m2" indexfile --trim
    erk040
    erk042
    erk068
    erk108
    erk159
    erk162
    erk188
    erk191

Display regular expression search query

The --regex option can be used to display the regular expression which will be used to search the index file given the input query options. The actual search will not be done, and the program will exit after the regular expression is printed to standard output. The regular expression uses extended syntax which can be used in the egrep program.

    themax -a -M --regex -d 135 indexfile
    Z[^=]*=.*%135
    themax -a -M -d 135 indexfile --total
    12
    egrep `themax -a -M --regex -d 135` indexfile | wc -l
    12

Prevent cleaning of search queries

By default, the themax command will attempt to automatically clean up the search queries for musical features so that users can input the features in multiple ways. For example, the pitch search using the -p option accepts C, ut and do for the pitch name C. If you are using themax under automated conditions, you can add the --raw option to prevent such pre-processing of the search queries. This will save a negligible amount of time, and will allow the use of extended regular expressions directly on the data.
Wildcard characters specific to themax (and not to regular expressions) are not available when using the --raw option.

Display post-processed user query by features

A more verbose version of --regex can be viewed with the --features option. This option will display the cleaned version of the user input queries, which is useful for debugging complex queries.

In the following example, -M is converted into Z[^=]*=, -p "utdoissies" is converted into (?:C) (?:C#) (?:Bb), and the final regular expression which will be used to search the index file is Z[^=]*=.*J(?:C) (?:C#) (?:Bb).

    themax2 -a -M -p "utdoissies" indexfile --queries
    Tonic: Z[^=]*=
    Pitch-class: (?:C) (?:C#) (?:Bb)
    Final Regular Expression: Z[^=]*=.*J(?:C) (?:C#) (?:Bb)[ ]

Match counts per line

The --count option will display the number of matches to the search query found on each line in the index file. This can be used to obtain statistics on the frequency of a pattern in a database or a particular file.

    themax -p "g g e" indexfile --count
    erk027 1
    erk033 1
    erk043 2
    erk045 1
    erk058 1
    erk075 2
    erk101 1
    erk109 2
    erk129 2
    erk192 2
    erk200 3

The output from themax when the --count option is given is the first column from the matched line, followed by the number of times the search query was found on that line. In this case, erk200 contains three match locations for the search query "g g e". Other matching entries contain one or two "g g e" pitch sequences.

When the --count option is used at the same time as the --total option, the last line of the output will be a count of the number of times the search query was found in all matched entries:

    themax -p "df#a" indexfile --count --total
    erk005 1
    erk024 1
    erk047 2
    erk050 1
    erk094 1
    erk104 1
    erk107 1
    erk166 1
    9

In the above example, 9 is a count of the number of occurrences of the search query in all matched entries in the index file.
There are 8 matched songs from the index file, but there is one song where the pattern "D F# A" occurs twice, so the total count is listed as 9 (number of matched patterns) instead of 8 (number of songs containing at least one matched pattern).

Here is the number of perfect 4ths, perfect 5ths, tritones, and major/minor 6ths which occur in all of the songs ("tail -n 1" means to display only the last line from the output of themax):

    themax -I P4 --total --count indexfile | tail -n 1
    792
    themax -I P5 --total --count indexfile | tail -n 1
    309
    themax -i 6 --total --count indexfile | tail -n 1
    27
    themax -I 6 --total --count indexfile | tail -n 1
    383

Using --total with --count is also an easy method to count the number of notes of a particular pitch class in a corpus. Counting the number of "C" pitches in the set of example songs:

    themax -p C --total --count indexfile | tail -n 1
    1586

Counting the number of C-sharps and D-flats:

    themax -p C-sharp --total --count indexfile | tail -n 1
    451
    themax -p D-flat --total --count indexfile | tail -n 1
    30

Counting both C-sharps and D-flats at the same time, as a 12-tone pitch class feature (which should be the sum of the C-sharps and D-flats counted independently):

    themax -P 1 --total --count indexfile | tail -n 1
    481

Displaying match starting-note locations

The --location option is similar to the --count option, except that the starting note(s) of each match within the features is listed rather than the total number of matches within an index line.

    themax -p "g g e" indexfile --location
    erk027 10
    erk033 14
    erk043 9 20
    erk045 4
    erk058 2
    erk075 6 38
    erk101 12
    erk109 6 46
    erk129 4 20
    erk192 25 33
    erk200 62 76 99

For these search results, the pitch pattern "g g e" occurs starting on note 10 of erk027, note 14 of erk033, on both notes 9 and 20 of erk043, and so on.

Displaying match start/stop notes

Use the --location2 option instead of --location in order to list both the starting note and the ending note of the match.
As an example usage of the --location2 option, consider an eight-note ascending C major scale stored in the file scale.krn, which can be indexed with:

    tindex -E -f "INT" scale.krn

Searching for the interval pattern of three seconds in a row will find many matches in the data. The pattern is found on notes 1 through 4, 2 through 5, 3 through 6, 4 through 7, and 5 through 8.

    tindex -E -f "INT" scale.krn | themax --location2 -I "2 2 2"

However, searching for three major seconds in a row will only find a match on notes 4 through 7.

    tindex -E -f "INT" scale.krn | themax --location2 -I "M2 M2 M2"

Returning filenames only

The --trim option removes all data fields in the matching lines except for the first column of data (the filename or identity string). This is equivalent to piping the default output of themax through the command "awk '{print $1}'".

    themax --trim -p "df#a" indexfile
    erk005
    erk024
    erk047
    erk050
    erk094
    erk104
    erk107
    erk166

    themax -p "df#a" indexfile | awk '{print $1}'
    erk005
    erk024
    erk047
    erk050
    erk094
    erk104
    erk107
    erk166

Parallel feature searching

The themax command allows multiple musical features to be searched at the same time. By default, the features are required to start on the same note, although the various feature queries can be of different lengths. When wildcards match a variable number of notes, the only requirement is that each independent feature query starts on the same first note.

As an example, here is a search for the pitch sequence "G E G G" and the duration sequence "8 8 8 8" at the same time:

    themax -p "g e g g" -u "8 8 8 8" indexfile --trim
    erk075
    erk192

In this case there are two songs which contain the pitch sequence "g e g g" and the duration sequence "8 8 8 8" starting on the same note.
If, however, you want to search for songs which have both a pitch sequence of "g e g g" and a rhythm sequence of "8 8 8 8", but not necessarily at the same time, then use the --unlink option to indicate that the multiple feature queries are not required to be linked to the same starting note:

    themax -p "g e g g" -u "8 8 8 8" --unlink indexfile --trim
    erk075
    erk192
    erk200

In this case there is an extra song which matches the search query. This song contains both the pitch and rhythm sequences, but these features do not align to the same notes (the pitch sequence occurs on the rhythms "8 2 4 8").

Another way to unlink multiple features (or to do two independent searches on the same feature) is to pipe the output from themax into another call to themax:

    themax -p "g e g g" indexfile | themax -u "8 8 8 8" --trim
    erk075
    erk192
    erk200

The themax program does not search diatonic pitch values, only pitch chroma. One way to search in a semi-absolute pitch manner is to combine the pitch (or 12-tone pitch) feature with an interval or contour search. For example, searching for the sequence "C A" will return all songs which contain a C pitch followed by an A pitch, regardless of whether the A pitch is above or below the C pitch:

    themax -p "C A" indexfile --total
    82

A parallel contour search can separate the cases where the A pitch is above or below the C pitch:

    themax -p "C A" -C D indexfile --total
    78
    themax -p "C A" -C U indexfile --total
    9
    themax -pca -Cu indexfile | themax -pca -Cd --total
    5

78 songs contain "C A" with the A pitch below the C pitch, while 9 songs contain a rising A pitch. Five songs contain both a rising A pitch and a falling A pitch.

Kern-based note searching

A special case of parallel feature searches can be done with the -k option. This option allows a sequence of **kern notes to be used as the search query. The **kern notes can contain both pitch and duration information (in any order), or just one feature of pitch or duration.
The **kern notes can only contain pitch and duration values, and each note must be separated from adjacent notes in the sequence by one or more spaces. No other **kern characters (such as articulations or stem directions) are allowed. Search for the pitch/duration sequence "8c 8e 4g":

    themax -k "8c 8e 4g" indexfile --trim
    erk171
    erk200

The above search is equivalent to specifying the features independently:

    themax -p "c e g" -u "8 8 4" indexfile --trim
    erk171
    erk200

If a note is missing either pitch or duration information, then a wildcard for the missing feature is inserted automatically in the equivalent independent feature searches. In the following example, the middle note contains only a duration value, so any pitch is allowed as the second note in the search query:

    themax -k "8c 8 4g" liederschatz1.thema --trim
    erk027
    erk034
    erk107
    erk110
    erk160
    erk171
    erk200

which is equivalent to the following search:

    themax -p "c . g" -u "8 8 4" indexfile --trim
    erk027
    erk034
    erk107
    erk110
    erk160
    erk171
    erk200

Search limit

The --limit option can be used to limit the amount of searching done within the index file(s). When a limit is given, themax stops searching once the specified number of matching entries has been found in the index file.

    themax -P "C" indexfile --total
    162
    themax -P "C" --limit 100 indexfile --total
    100

When using --limit with the --count or --location options, the limit still applies to index entries, not to the values output by --count or --location. In the following two example uses of themax, the total number of C pitches in the index file is 1472, but there are 1009 C pitches in the first 100 songs which contain at least one C pitch.
    themax -P "C" --limit 100 indexfile --count --total | tail -n 1
    1009
    themax -P "C" indexfile --count --total | tail -n 1
    1472

Segmentation boundaries

When a thema index has been created with tindex using any of the options --rest, --phrase or --fermata, the resulting index contains segmentation boundaries, each encoded as the character R followed by an optional space (depending on the feature).

If segmentation markers are encoded within the features, the symbol R or r can be used in feature queries. For example, searching for the diatonic pitch-class sequence "C R G" will find a C followed by a segmentation boundary, followed by a G. These segmentation boundaries can be ignored when searching features such as diatonic pitch-class names by using the -B option. The -B option should not be used when searching intervallic data (in either rhythm or pitch features), since this will yield inaccurate results. For intervallic searches without segmentation boundaries, run the tindex command without --rest, --fermata or --phrase instead.

Thema indexes generated with tindex may contain the control messages #REST, #FERMATA, or #PHRASE if the -q option was not used to suppress them. In any case, the presence of an R within a feature indicates that one or more of the segmentation options in tindex were used.

Control messages

The themebuilderx program may store
control messages
in the thema index data if certain options are used. By default, these
messages are automatically echoed by the themax command. If you need to
count the matching entries, use the --no-messages
option to suppress these messages, or filter out lines in the output
which start with a "#" character.
The number of matching entries can then be counted by piping the
output of themax --no-messages into the
wc -l command,
which counts the number of lines in its input.
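The filtering and counting described above can be sketched with standard shell tools. The sample text in this sketch is a made-up stand-in for themax output (the entry names and the #REST line are illustrative, not real search results); with a real index, the output of themax would be piped in instead.

```shell
# Hypothetical themax output: one control-message line plus three matched entries.
sample_output='#REST
erk005
erk024
erk047'

# Drop lines starting with "#", then count the remaining lines (one per entry).
# tr removes the padding that some wc implementations add around the number.
count=$(printf '%s\n' "$sample_output" | grep -v '^#' | wc -l | tr -d ' ')
echo "$count"
```

With a real index file, the same filter would presumably follow the themax call directly, for example: themax -p "df#a" indexfile --trim | grep -v '^#' | wc -l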
In some cases these messages are necessary for theloc to produce correct output. If the
required messages are not passed to theloc, you can instead specify them
manually on the command-line call to theloc.
SEE ALSO
DOWNLOAD
The source code for the program was last modified on 17 Jan 2011.