HTK Version 1.4
===============

HTK is a software toolkit for building and manipulating continuous
density Hidden Markov Models (HMMs). It consists of a set of 12 basic
tools plus 7 additional, more specialised tools. All tools interface
with the outside world via a comprehensive set of library modules. This
HTK Library ensures that all tools behave in a uniform way and
simplifies the development of user-written tools. HTK is distributed in
source form and will run on any 32-bit machine with an ANSI C compiler.

The HTK Toolkit provides facilities for speech analysis, label editing,
HMM initialisation and training, testing and results analysis. HTK
imposes no built-in limits on model size: any HMM can have any number
of states and any number of mixtures. Both full and diagonal covariance
matrices can be used, and there is full support for working with
continuous speech and context dependent models. A variety of speech
data formats are supported, and input can be split into multiple
independent data streams. Furthermore, HTK includes a unique
generalised parameter tying mechanism which facilitates the
manipulation of large sets of HMMs. Standard techniques such as Grand
Variance and Tied-Mixtures are fully supported within this general
mechanism.

The library includes modules that handle the OS interface; HMM
definitions and HMM i/o; speech data file i/o (raw data and
parameterised data); speech file transcription i/o; speech training
database manipulation; signal processing routines; math support;
grammar support; and a simple interactive graphics interface.

There are 12 basic tools and 7 more specialised tools in the current
version of HTK (V1.4). These perform a range of functions, including:

  HCode    - performs FFT and LPC based speech analysis. The types of
             output include LPC filter coefficients, reflection
             coefficients, LP cepstra, Mel-scale filter-bank and
             Mel-scale cepstra.
  HInit    - initialises a set of HMMs using a segmental k-means
             procedure.
  HRest    - basic isolated-word style Baum-Welch re-estimation of HMM
             parameters (can also be applied to continuous speech).
  HERest   - the main training tool. It performs embedded Baum-Welch
             re-estimation with forward-backward pruning. It allows
             parallel operation across a network of machines.
  HVite    - a continuous speech Viterbi based speech recogniser with
             finite-state grammar constraints, beam search and garbage
             collection.
  HResults - compares HVite output with transcriptions to perform
             results analysis. This tool complies with the US NIST
             scoring system.
  HHEd     - a general HMM editor that allows HMM cloning and merging,
             parameter tying and untying, state clustering, addition
             and deletion of transitions, and mixture splitting and
             repair.
  HLEd     - provides batch editing of transcription files, such as
             deleting, merging, folding and conversion to context
             dependent labels.
  HSLab    - a simple interactive label editor. It allows labels to be
             attached to a spoken utterance and their boundaries set
             using both visual and audio feedback. (Note: requires
             X Windows)
  HSmooth  - smooths tied mixture weights using deleted interpolation.

HTK is presently installed at more than 50 sites world-wide, both for
research and for product development. Its current uses include isolated
word recognition, phoneme-based continuous speech recognition, speaker
identification, word spotting and other pattern matching applications
such as face recognition.

The V1.4 HTK distribution consists of all source code (about 30,000
lines), a Reference Manual, a User Manual, a Programmer Manual and an
Installation Guide. Also included is a suite of demonstration scripts
further illustrating the main ways of using HTK. It is distributed on
either MS-DOS or Apple format 3.5 inch floppy disks (for distribution
only - HTK runs on any 32-bit machine). There are 3 disks in the
distribution containing 1) sources, 2) demonstration, 3) Unix
compressed tar archive.
The last disk contains all of 1) and 2), and is included for rapid
installation on Unix systems.

The cost of 1 copy of the HTK V1.3 distribution plus a site license for
its use is (in pounds sterling)

    950  (commercial/industrial sites)
    450  (academic)

VAT is chargeable on all UK orders. Post and packing charges are 5
pounds for UK orders and 20 pounds for overseas. Existing HTK V1.2
users can deduct their initial payment as a discount. Existing V1.3
users may upgrade to V1.4 free of charge.

Orders should be sent to

    Lynxvale Limited
    20 Trumpington Street
    Cambridge CB2 1QA
    England
    FAX: +44 223 332797

The HTK package is shipped immediately on receipt of order by first
class post in UK, air mail for overseas. Express delivery can also be
provided if requested.

*** Note that Lynxvale Limited is a wholly owned company of the ***
*** University of Cambridge.                                    ***

-----------------------------------------------------------------

For further information contact:

    Steve Young                             Email: sjy@eng.cam.ac.uk
    Cambridge University Engineering Dept   Tel:   +44-223-332654
    Trumpington Street                      Fax:   +44-223-332662
    England, CB2 1PZ                        Telex: 81239

OR

    Phil Woodland                           Email: pcw@eng.cam.ac.uk
    Cambridge University Engineering Dept.  Phone: +44-223-332669
    Trumpington Street,                     Fax:   +44-223-332662
    Cambridge, CB2 1PZ, England             Telex: 81239