Nabu-asr
Please find the documentation page here
Nabu is an ASR framework for end-to-end networks built on top of TensorFlow. Nabu's design focuses on adaptability, making it easy for the designer to adjust everything from the model structure to the way it is trained.
Last built with TensorFlow version: 1.8.0
Nabu works in several stages: data preparation, training and finally testing and decoding. Each of these stages uses a recipe for a specific model and database. The recipe contains configuration files for all the components and defines all the necessary parameters for the database and the model. You can find more information on the components in a recipe here.
In the data preparation stage all the data is prepared (feature computation, target normalization etc.) for training and testing. Before running the data preparation you should create a database.conf file in the recipe directory based on the database.cfg that should already be there, and fill in all the paths. Should you want to modify parameters in the processors, you can modify the config files that are pointed to in the database config. You can find more information about processors here.
You can run the data preparation with:
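A minimal sketch of the invocation, assuming the run entry-point script in the repository root; the --computing flag is an assumption, recipe points to the recipe directory and expdir to a writable experiment directory where configurations and logs end up:

    # prepare the data (feature computation, target normalization, ...) for the recipe
    run data --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>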
In the training stage the model will be trained to minimize a loss function. During training the model can be evaluated to adjust the learning rate if necessary. Multiple configuration files in the recipe are used during training, among others those for the model, the trainer and the evaluator.
You can find more information about models here, about trainers here and about evaluators here.
You can run the training with:
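A sketch analogous to the data preparation command above (flag names are assumptions; mode is the extra parameter described below):

    # train the model defined in the recipe
    run train --recipe=/path/to/recipe --expdir=/path/to/expdir --mode=<mode> --computing=<computing>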
The parameters are the same as for the data preparation script (see above) with one extra parameter: mode (default: non_distributed). Mode is the distribution mode and should be one of non_distributed, single_machine or multi_machine. You can find more information about this here.
In the testing stage the performance of the model is evaluated on a testing set. To modify the way the model is evaluated you can modify the test_evaluator.cfg file in the recipe directory. You can find more information on evaluators here.
You can run testing with:
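A sketch following the same pattern as the commands above (flag names assumed); reuse the expdir from training:

    # evaluate the trained model on the test set
    run test --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>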
The parameters for this script are similar to the training script (see above). You should use the same expdir that you used for training the model.
In the decoding stage the model is used to decode the test set and the resulting nbest lists are written to disk in the expdir. To modify the way the model is used for decoding, look into the recognizer.cfg file. You can find more information about decoders here.
You can run decoding with:
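A sketch following the same pattern as above (flag names assumed); reuse the expdir from training:

    # decode the test set; the resulting nbest lists are written to the expdir
    run decode --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>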
The parameters for this script are similar to the training script (see above). You should use the same expdir that you used for training the model.
You can automatically do a parameter search using Nabu. To do this you should create a sweep file. A sweep file contains blocks of parameters; each block will change the parameters in the recipe and run a script. A sweep file looks like this:
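A sketch of the format (the exact syntax should be checked against the documentation): every block starts with an experiment name and then lists, per line, a config file in the recipe, a section, an option and the value to set:

    name_of_experiment_1
    <configfile> <section> <option> <value>
    <configfile> <section> <option> <value>

    name_of_experiment_2
    <configfile> <section> <option> <value>
    <configfile> <section> <option> <value>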
For example, if you want to try several numbers of layers and numbers of units:
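A hypothetical example, assuming the recipe's model.cfg has an encoder section with num_layers and num_units options:

    4layers_1024units
    model.cfg encoder num_layers 4
    model.cfg encoder num_units 1024

    5layers_2048units
    model.cfg encoder num_layers 5
    model.cfg encoder num_units 2048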
The parameter sweep can then be executed as follows:
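A sketch, again assuming the run entry point; sweep points to the sweep file and the remaining options are those of the command being swept:

    run sweep --command=<command> --sweep=/path/to/sweepfile --expdir=/path/to/expdir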
where command can be any of the commands discussed above.
There are some scripts available to use a Nabu neural network in the Kaldi framework. Kaldi is an ASR toolkit. You can find more information here.
Using Kaldi with Nabu happens in several steps: 1) data preparation 2) GMM-HMM training 3) aligning the data 4) computing the prior 5) training the neural network 6) decoding and scoring.
The data preparation is database dependent. Kaldi has many scripts for data preparation and you should use them.
You can train the GMM-HMM model as follows, with the arguments described in the sketch below:
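A sketch with a hypothetical script path and argument order; check the Kaldi scripts that ship with Nabu for the actual names. From the surrounding text the script needs at least the Kaldi-prepared training data and a training directory to write to; the language directory is an assumption based on standard Kaldi usage:

    # hypothetical invocation: script name and argument order are assumptions
    # <datadir>:  Kaldi-prepared training data
    # <langdir>:  Kaldi language directory
    # <traindir>: output directory; the training alignments end up in <traindir>/pdfs
    nabu/scripts/kaldi/train_gmm.sh <datadir> <langdir> <traindir>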
The script will compute the features, train the GMM-HMM models and align the training data, so you do not have to repeat this in the next step. The alignments for the training set can be found in <traindir>/pdfs.
The training data has already been aligned in the previous step, but if you want to align e.g. the validation set you can do that as follows:
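A sketch with a hypothetical script name; the three arguments are the ones described just below:

    # hypothetical script name; aligns the data in <datadir> with the models in <traindir>
    nabu/scripts/kaldi/align_data.sh <datadir> <traindir> <targetdir>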
The datadir should point to the data you want to align, the traindir should be the traindir you used in the previous step and the targetdir is the directory where the alignments will be written. The alignments can be found in <targetdir>/pdfs.
The prior is needed to convert the pdf posteriors to pdf pseudo-likelihoods. The prior can be computed with:
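A sketch with a hypothetical script name; per the next sentence it only needs the training directory:

    # hypothetical script name; writes the prior to <traindir>/prior.npy
    nabu/scripts/kaldi/compute_prior.sh <traindir>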
traindir should be the same as the traindir in the previous step. The prior can then be found in numpy format in <traindir>/prior.npy.
Training the neural network happens using the Nabu framework. To do this you should create a recipe (see the section on training). You can find an example recipe in config/recipes/DNN/WSJ. You can use this recipe, but you should still create the database.conf file based on the database.cfg file. In your database configuration you should create sections for the features, just as you would for a normal Nabu neural network, and sections for the alignments. The alignment sections should get the special type "alignments". A section should look something like this:
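A sketch of such a section; only the type and dir options follow from the text here, the section name and the remaining options are illustrative and should be checked against the database.cfg in the example recipe:

    [trainalignments]
    # the special type for alignment sections
    type = alignments
    # illustrative: where the raw alignments come from and how they are processed
    datafiles = /path/to/traindir/pdfs
    processor_config = /path/to/recipe/ali_processor.cfg
    # the directory where the processed alignments will be written
    dir = /path/to/expdir/alignments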
dir is just the directory where the processed alignments will be written.
The rest of the training procedure is the same as the normal procedure, so follow the instructions in the sections above.
To decode using the trained system you should first compute the pseudo-likelihoods as follows:
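This uses Nabu's decoding stage, so a sketch looks like the decode command shown earlier (flag names assumed); use the expdir of the trained network:

    # compute the pseudo-likelihoods with the trained network
    run decode --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>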
The pseudo-likelihoods can then be found in <expdir>/decode/decoded/alignments.
You can then do the Kaldi decoding and scoring with:
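A sketch with a hypothetical script name and arguments; as stated below, the arguments resemble those of the scripts above, plus an output folder:

    # hypothetical invocation: decode the pseudo-likelihoods with Kaldi and score the result
    nabu/scripts/kaldi/decode_and_score.sh <traindir> <datadir> <likelihoods> <outputs>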
The arguments are similar to those of the scripts above. The outputs will be written to the <outputs> folder.
As mentioned in the beginning, Nabu focuses on adaptability. Everything in the recipe can be modified (more information about recipes here). Most classes used in Nabu have a general class that defines an interface and common functionality for all children and a factory that is used to create the necessary class. Look into the respective README files to see how to implement a new class.
In general, if you want to add your own type of class (like a new model), follow the steps described in the README of the corresponding package.
As an example you can find the results for running the LAS/TIMIT recipe. The training loss in average cross entropy is plotted in the following image:
The performance on the validation set, measured in character error rate, is plotted in the following image (I don't know why the extra lines are there).
The error rate on the test set is 21.6% and training took just under 4 hours on an Nvidia GeForce GTX 1080 Ti.