A language classifier powered by a recurrent neural network (RNN), implemented in Python without AI libraries.
Features
The classifier identifies whether a word is English, Spanish, Finnish, Dutch, or Polish. It predicts the correct language approximately 85% of the time. It is implemented purely with numpy and Python's built-in libraries.
Model Architecture
Input Layer: 47 nodes representing 47 different characters
First Hidden Layer: 100 nodes
Second Hidden Layer: 100 nodes
Output Layer: 5 nodes representing 5 languages
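To make the 47-node input layer concrete, here is a minimal sketch of one-hot encoding a word, one row per character. The alphabet below is a hypothetical 47-character set invented for illustration (26 ASCII letters plus 21 accented letters); the project's actual character set may differ, and `one_hot_word` is not a name taken from the project.

```python
import numpy as np

# Hypothetical 47-character alphabet (an assumption, not the project's real one):
# 26 ASCII letters plus 21 accented letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz" + "áéíñóúüäöåąćęłńśźżëïè"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_word(word):
    """Return an array of shape (len(word), 47) with one one-hot row per character."""
    word = word.lower()
    encoded = np.zeros((len(word), len(ALPHABET)))
    for position, ch in enumerate(word):
        # Set the single input node that corresponds to this character.
        encoded[position, CHAR_TO_INDEX[ch]] = 1.0
    return encoded


x = one_hot_word("cat")
print(x.shape)
```

Each row has exactly one active node, so a word of any length becomes a sequence of 47-dimensional vectors that can be fed to the network one character at a time.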
This project uses a recurrent neural network (RNN), which reads a word one character at a time and updates a hidden state after each character. For example, an RNN encodes the three-letter word “cat” into a fixed-size vector h3.
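The encoding step can be sketched in a few lines of numpy. This is only an illustration under assumptions: the weights are random rather than trained, biases are omitted, ReLU is assumed as the activation, and the way the two 100-node hidden layers are wired (a recurrent layer followed by a feed-forward layer) is a guess rather than the project's confirmed design. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CHARS, N_HIDDEN, N_LANGS = 47, 100, 5

# Randomly initialised parameters; a trained model would learn these.
W_x = rng.normal(scale=0.1, size=(N_CHARS, N_HIDDEN))    # input -> first hidden
W_h = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))   # hidden -> hidden (recurrence)
W_2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))   # first -> second hidden
W_out = rng.normal(scale=0.1, size=(N_HIDDEN, N_LANGS))  # second hidden -> output


def relu(v):
    return np.maximum(v, 0.0)


def encode(char_vectors):
    """Fold a (length, 47) sequence of one-hot rows into one 100-dim vector."""
    h = np.zeros(N_HIDDEN)
    for x_t in char_vectors:
        # The new state depends on the current character and the previous state.
        h = relu(x_t @ W_x + h @ W_h)
    return h


def language_scores(char_vectors):
    """Map the final hidden state to one unnormalised score per language."""
    h = encode(char_vectors)
    return relu(h @ W_2) @ W_out


# One-hot rows for "c", "a", "t", assuming a-z occupy indices 0-25.
cat = np.eye(N_CHARS)[[2, 0, 19]]
h3 = encode(cat)
print(h3.shape, language_scores(cat).shape)
```

The hidden state keeps the same size no matter how long the word is, which is what lets a single output layer handle words of any length.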
Sample Run
Training runs until validation accuracy reaches a certain level:
epoch 1 iteration 24 validation-accuracy 43.0%
shaking English ( 22.4%) Pred: Dutch |en 22%|es 20%|fi 18%|nl 26%|pl 14%
relaxing English ( 23.7%) Pred: Dutch |en 24%|es 20%|fi 18%|nl 25%|pl 13%
prophecy English ( 17.6%) Pred: Spanish |en 18%|es 24%|fi 24%|nl 16%|pl 19%
tiroteo Spanish ( 25.8%) |en 21%|es 26%|fi 18%|nl 18%|pl 17%
vientre Spanish ( 24.2%) |en 17%|es 24%|fi 21%|nl 21%|pl 17%
estupenda Spanish ( 31.4%) |en 16%|es 31%|fi 18%|nl 19%|pl 16%
osti Finnish ( 21.2%) Pred: Polish |en 15%|es 19%|fi 21%|nl 20%|pl 25%
veljensä Finnish ( 19.8%) Pred: Spanish |en 21%|es 22%|fi 20%|nl 20%|pl 18%
aikoinaan Finnish ( 22.3%) |en 15%|es 21%|fi 22%|nl 21%|pl 21%
betwijfel Dutch ( 22.8%) Pred: English |en 24%|es 23%|fi 15%|nl 23%|pl 15%
merkte Dutch ( 17.1%) Pred: Spanish |en 17%|es 22%|fi 22%|nl 17%|pl 21%
beseffen Dutch ( 24.5%) |en 21%|es 19%|fi 21%|nl 25%|pl 15%
kończę Polish ( 21.5%) Pred: Spanish |en 17%|es 23%|fi 20%|nl 18%|pl 21%
firmy Polish ( 20.7%) Pred: Finnish |en 15%|es 22%|fi 23%|nl 19%|pl 21%
decyzje Polish ( 16.2%) Pred: Dutch |en 19%|es 22%|fi 20%|nl 23%|pl 16%
. . .
epoch 6 iteration 153 validation-accuracy 84.2%
shaking English ( 86.4%) |en 86%|es 0%|fi 1%|nl 12%|pl 1%
relaxing English ( 84.6%) |en 85%|es 0%|fi 0%|nl 15%|pl 0%
prophecy English ( 54.2%) |en 54%|es 0%|fi 0%|nl 4%|pl 41%
tiroteo Spanish ( 38.9%) |en 12%|es 39%|fi 36%|nl 6%|pl 8%
vientre Spanish ( 43.4%) |en 19%|es 43%|fi 2%|nl 29%|pl 7%
estupenda Spanish ( 75.2%) |en 1%|es 75%|fi 15%|nl 2%|pl 7%
osti Finnish ( 75.7%) |en 1%|es 1%|fi 76%|nl 3%|pl 20%
veljensä Finnish ( 81.7%) |en 0%|es 1%|fi 82%|nl 17%|pl 0%
aikoinaan Finnish ( 99.9%) |en 0%|es 0%|fi100%|nl 0%|pl 0%
betwijfel Dutch ( 98.7%) |en 1%|es 0%|fi 0%|nl 99%|pl 1%
merkte Dutch ( 71.9%) |en 10%|es 1%|fi 6%|nl 72%|pl 10%
beseffen Dutch ( 96.6%) |en 2%|es 0%|fi 0%|nl 97%|pl 0%
kończę Polish (100.0%) |en 0%|es 0%|fi 0%|nl 0%|pl100%
firmy Polish ( 29.4%) Pred: English |en 59%|es 5%|fi 2%|nl 5%|pl 29%
decyzje Polish ( 87.7%) |en 1%|es 1%|fi 0%|nl 10%|pl 88%
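The per-language percentages in each row add up to roughly 100%, which is consistent with a softmax over the five output nodes. The sketch below assumes that is how the probabilities are produced; the scores are made up, and `softmax` and `format_row` are hypothetical helpers that only imitate the layout of the log.

```python
import numpy as np

LANGS = ["en", "es", "fi", "nl", "pl"]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps np.exp from overflowing on large scores.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()


def format_row(probs):
    """Render probabilities in a '|en 86%|es  0%|...' style row."""
    return "".join(f"|{lang}{p:4.0%}" for lang, p in zip(LANGS, probs))


scores = np.array([2.0, -1.0, 0.5, 1.0, -0.5])  # made-up output-layer scores
probs = softmax(scores)
print(format_row(probs))
```

Early in training the scores are nearly equal, so every language sits near 20%; as training progresses the scores separate and one language dominates the row.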
Test Results:
test set accuracy is: 83.800000%

word: tervetuloa # welcome
predicted language is: Finnish, with a confidence of 80.011147%

word: ciudades # cities
predicted language is: Spanish, with a confidence of 88.442353%

word: właź # hatch
predicted language is: Polish, with a confidence of 99.979566%

word: algorithm
predicted language is: English, with a confidence of 79.893499%

word: resolution
predicted language is: English, with a confidence of 94.786443%

word: ademt # breathe
predicted language is: Dutch, with a confidence of 47.399565%

word: invitar # invite
predicted language is: Spanish, with a confidence of 93.986880%
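As a rough illustration of how numbers like these can be derived, the sketch below computes accuracy as the fraction of words whose most probable language matches the label, and reports the top probability as the confidence. The probabilities and labels are made up, the language index order is an assumption, and `accuracy` and `describe` are hypothetical helpers rather than the project's functions.

```python
import numpy as np

# Assumed index order, matching the en/es/fi/nl/pl columns of the training log.
LANGUAGES = ["English", "Spanish", "Finnish", "Dutch", "Polish"]


def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability language equals the label."""
    predictions = np.argmax(probabilities, axis=1)
    return float(np.mean(predictions == labels))


def describe(probs):
    """Report the most probable language and its probability as a confidence."""
    best = int(np.argmax(probs))
    return (f"predicted language is: {LANGUAGES[best]}, "
            f"with a confidence of {probs[best]:%}")


# Made-up probabilities for three words and their true language indices.
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.50, 0.10, 0.10, 0.10],
    [0.40, 0.10, 0.30, 0.10, 0.10],
])
labels = np.array([0, 1, 2])

print(f"test set accuracy is: {accuracy(probs, labels):%}")
print(describe(probs[1]))
```

A low confidence, such as the 47% reported for "ademt", means the top language won without a clear margin over the others.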
Dependencies
You will need numpy for this project.
How To Use
Clone this project or download the zip file.
Improvements To Make
support saving and loading models
classify more languages
improve accuracy
classify sentences or paragraphs instead of single words
…
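For the first item, one possible approach is to store the weight arrays with numpy's own `np.savez` and read them back with `np.load`. This is a sketch of that idea, not a feature the project currently has; the parameter names, shapes, and the `save_model` / `load_model` helpers are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_model(path, params):
    """Write a dict of named weight arrays to a single .npz file."""
    np.savez(path, **params)


def load_model(path):
    """Read every stored array back into a plain dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Placeholder weights standing in for a trained model.
params = {"W_x": np.ones((47, 100)), "W_out": np.zeros((100, 5))}

path = os.path.join(tempfile.mkdtemp(), "model.npz")
save_model(path, params)
restored = load_model(path)
print(sorted(restored))
```

Keeping the weights in a name-to-array dict means the same two functions keep working if layers are added or resized later.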
Reference
The dataset lang_id.npz, the image demonstrating the RNN, and the project skeleton are from cs188.ml.
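Since lang_id.npz is a numpy archive, its contents can be listed before writing any loading code. The sketch below makes no assumption about which arrays the file holds; it only prints whatever names and shapes it finds. `describe_npz` is a hypothetical helper, and the file is assumed to sit in the current directory.

```python
import os

import numpy as np


def describe_npz(path):
    """Return {array name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


# Only inspect the dataset if it is present in the working directory.
if os.path.exists("lang_id.npz"):
    for name, shape in describe_npz("lang_id.npz").items():
        print(name, shape)
```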