Wrong entries detected in MW by comparison between dictionaries


I have compared the vowel and consonant patterns of MW against those of PWG.
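The logic of the actual script is described in the Google doc mentioned in this post and is not reproduced here. As a rough illustration only, the sketch below assumes that a "pattern" means a maximal run of SLP1 vowels or of non-vowels, and it flags MW headwords containing a run that occurs in no PWG headword. The function names and the tiny word lists are hypothetical.

```python
import re

# SLP1 vowels; every other character is treated as a non-vowel (assumption).
SLP1_VOWELS = "aAiIuUfFxXeEoO"
RUN_RE = re.compile(f"[{SLP1_VOWELS}]+|[^{SLP1_VOWELS}]+")


def runs(word):
    """Split a headword into maximal vowel runs and non-vowel runs."""
    return RUN_RE.findall(word)


def unseen_runs(mw_words, pwg_words):
    """Map each MW headword to the runs that occur in no PWG headword."""
    known = {run for word in pwg_words for run in runs(word)}
    report = {}
    for word in mw_words:
        missing = [run for run in runs(word) if run not in known]
        if missing:
            report[word] = missing
    return report


# Toy data (hypothetical): a misspelled and a correct MW form, one PWG form.
mw_sample = ["hariharamanRqala", "hariharamaRqala"]
pwg_sample = ["hariharamaRqala"]
print(unseen_runs(mw_sample, pwg_sample))
```

With these toy lists only the form containing the run 'nRq' is reported, which is the kind of hit listed as item 3 in this thread.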

The result is attached herewith.

The code for checking is attached here. A Google doc explains the logic behind the approach. Video tutorial for running the code: http://youtu.be/qLqYUZUGM6M

Input data : MW and PWG

I am checking the HTML file thoroughly. This approach has turned up many issues.

Here is the video tutorial about how to use the HTML file for error finding.


Well done. Both submitted on your behalf. Checked the book - you are right, it's a digitization error. Will see what Shalu will say about the rest.


3 hariharamanRqalazoqaSaliNgodBava instead of hariharamaRqalazoqaSaliNgodBava (extra 'n' before 'R').


4 brAhmaRAcCaMsinaukTya Possibly wrong. I couldnt trace the word properly in scan


5 iqepahavana wrong. iqopahavana right.


6 upAMSukFqita wrong. upAMSukrIqita right.


7 check gose. I am not sure about it.


8 tejOra wrong. tejAura correct


9 Check out triyfca. seems wrong. maybe tryca?


10 nyagBAvayiyf wrong -> nyagBAvayitf correct


11 proGiya -> proGIya


12 mAMsamayIpeSI -> mAMsamayI peSI (add space)


13 SabdaparicCedarahasyepUrvavAdarahasya - SabdaparicCedarahasye pUrvavAdarahasya (Add space)


As per the space issue. I have given you the 'key1' output, which here is 'māṁsamayīpeśī'. It is great for coding, reading, and searching, but there is no such entry in the printed dictionary. The printed dictionary has what is inside 'key2', in this case 'māṁsa--ma° yī peśī'. So there is actually no issue. But having the normalized 'key1' is a good strategy; otherwise we would get even more issues.
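To make the key1 / key2 relation concrete, here is a minimal sketch that derives a key1-like string from a key2 string by keeping letters only. This is an assumption based on the single example above, not the rule actually used to build mw.xml; the function name `key1_from_key2` is hypothetical.

```python
import unicodedata


def key1_from_key2(key2):
    """Keep only letters: drops spaces, hyphens, the degree sign, avagraha."""
    # NFC first, so accented IAST letters are single code points, not
    # base letter + combining mark (a combining mark is not a letter).
    composed = unicodedata.normalize("NFC", key2)
    return "".join(ch for ch in composed if ch.isalpha())


# The key2 quoted in this thread, in IAST.
print(key1_from_key2("māṁsa--ma° yī peśī"))
```

A string containing an apostrophe used as avagraha, such as aDo'kzeRa, loses the apostrophe under the same rule.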


14 saMgItavinodenftyADyAya - saMgItavinode nftyADyAya (I have pointed out the spacing issue because it is visible in the front end: http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc/indexcaller.php?key=saMgItavinodenftyADyAya&input=slp1&output=SktDevaUnicode )


15 sarvatomuKOgAtraprayoga -> sarvatomuKOdgAtraprayoga (missing 'd' after 'O')


16 sahasraYit -> sahasrajit


17 sAvitrIpeYjara -> sAvitrIpaYjara


18 aDokzeRa -> aDo'kzeRa (Add an avagraha)


19 ozaDihomna -> ozaDihoma


20 gAnDArivlARija -> gAnDArivARija


21 catuscatvAriMSadakzara -> catuzcatvAriMSadakzara (catus->catuz)


22 cAruSfRgin -> cAruSfNgin


23 cIpUdru -> cIpudru


4 brAhmaRAcCaMsinaukTya - au >> brAhmaRAcCaMsina ukTya - au (space before ukTya)

Sorry, it's already listed above, so the number is 4, not 24. I will add another issue as 24.


25 mahAyogapaYcaratneASvalAyanopayogyADAnaprakaraRa - eA >> mahAyogapaYcaratne ASvalAyanopayogyADAnaprakaraRa - eA (space needed)

In the MW print it appears like this, separated by hyphens.


26 jIvantiSUlAmkf - jIvantiSUlAMkf


27 JiRkA -> JiRikA


28 WunWupadDati -> WuRWupadDati


29 quRDi -> quRQi


24 aDaupAsana - au >> aDa upAsana - au


30 AUrRu >> A UrRu



32 dIrGatamasorka -> dIrGatamaso'rka


Suspected wrong entries in the print edition itself.
33 dvicatvArinSika -> dvicatvAriMSika
34 dvicAtvArinSika -> dvicAtvAriMSika



34 nidigdigDikA -> nidigDikA


35 nividCaMSam -> nivicCaMSam


36 paYcataponvita - paYcatapo'nvita



Probable errors in the printed edition itself.
37 paYcatrinSat -> paYcatriMSat
38 paYcatrinSacClokI -> paYcatriMSacClokI
39 paYcatrinSatpIWikA -> paYcatriMSatpIWikA



40 parigatArWa -> parigatArTa



41 pAtranirnega -> pAtranirRega


42 pArAvatAKza -> pArAvatAkza



43 pAruskika -> pAruzika


44 pugrodBava -> purodBava



45 purumedba -> purumeDa



Sometimes I spend a lot of time finding the needed word on a known page. In very rare cases the page is wrong, so I submit it for correction. I would like the .pdf pages that open when we click on them to have an OCR layer, just as my MW .pdf has. I can easily add the layer to all the .pdf pages as separate files. It speeds up my work in hard cases.


That is a great idea Marcis.


46 batwa - bawwa


Suspect wrong entry in printed edition itself. 47 bARAparnI


Why I think this is

capture


Tested just now in FF: the OCR layer is usable and works. Now it's up to Jim. If I do the .pdf trick on all the pages, will you, Jim, upload them to the server? I can calculate the size per page if wanted; the increase is not big.



Case 46 (batwa - bawwa) is exactly the nṭ -> ṇṭ kind of checking / replacement: https://github.com/drdhaval2785/SanskritSpellCheck/issues/1 We are getting closer.
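A minimal sketch of an nṭ -> ṇṭ style check in SLP1, not taken from the linked issue. It assumes the usual five stop classes with their class nasals (N, Y, R, n, m), and it flags a nasal immediately followed by a stop of a different class. Extending the check from the retroflex class to all five classes is an assumption, the function name is hypothetical, and every hit is only a candidate to verify against the scan.

```python
# Class nasal -> stops of the same class, in SLP1.
STOP_CLASSES = {
    "N": "kKgG",  # gutturals
    "Y": "cCjJ",  # palatals
    "R": "wWqQ",  # retroflexes
    "n": "tTdD",  # dentals
    "m": "pPbB",  # labials
}
EXPECTED_NASAL = {
    stop: nasal for nasal, stops in STOP_CLASSES.items() for stop in stops
}


def nasal_mismatches(word):
    """Return (index, found, suggested) for each nasal+stop pair whose
    nasal does not belong to the class of the following stop."""
    hits = []
    for i in range(len(word) - 1):
        nasal, stop = word[i], word[i + 1]
        if nasal in STOP_CLASSES and stop in EXPECTED_NASAL:
            expected = EXPECTED_NASAL[stop]
            if expected != nasal:
                hits.append((i, nasal + stop, expected + stop))
    return hits


for headword in ["WunWupadDati", "ratnamanjarI", "cAruSfRgin", "gAnDArivARija"]:
    print(headword, nasal_mismatches(headword))
```

The suggestion is always the class nasal. Where the dictionary writes anusvara (M) instead, as in item 65, the suggested form has to be adjusted by hand.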


47 BinrArTa -> BinnArTa



Possible error in the print edition 48 mahAnAmRIvrata -> mahAnAmnIvrata



49 mahApuruzavidyAyAMvizRurahasyekzetrakARqejagannATamAhAtmya - mahApuruzavidyAyAM vizRurahasye kzetrakARqe jagannATamAhAtmya


50 mAdbavollAsa -> mADavollAsa



51 mIQuztama needs checking. In my opinion, it should be converted to mIQuzwama. But the printed dictionary gives only the '-tama' suffix. So maybe we will have to convert it on our own.


52 medorbuda -> medo'rbuda



53 ratnamanjarI -> ratnamaYjarI



54 ratnasArajAtakejyotizasArasaMgraha -> ratnasArajAtake jyotizasArasaMgraha


55 raTaDurDUrgata -> these are two words raTADur and raTaDUrgata



56 lavaRAnbataka -> lavaRAntaka



57 varnopeta -> varRopeta



58 viramamRa -> viramaRa



59 vEtAlikarRikaRTa -> vEtAlikarRikanTa



probable error in print edition

60 vyanSuka -> vyaMSuka


Please note the different 'n'. The same letter in vyaMSaka has been rendered as 'M' in the digitization. Have a closer look at all occurrences of 'nS' in the dictionary and find the pattern. In my opinion this kind of 'n' should be entered as 'M'.


61 vyAkIrnakeSara -> vyAkIrRakeSara
62 vyAkIrnamAlyakavara -> vyAkIrRamAlyakavara



63 vratapratIzWA -> vratapratizWA
64 vratapratIzWAprayoga -> vratapratizWAprayoga



65 SacIpuraRdara -> SacIpuraMdara


66 SabdaparicCedarahasyepUrvavAdarahasya -> SabdaparicCedarahasye pUrvavAdarahasya


Potential error in print edition. 67 SamanIcameDra -> SamanIcameQra


To my knowledge, meQra is the proper word for the generative organ.


68 sAmagirta -> sAmagIta



69 sAzwfka -> sAzwrika



Possible error in Printed edition 70 svayamAtfnRA -> svayamAtfRRA



71 hatAGaSansa -> hatAGaSaMsa


Please consider listing all 'nS' combinations, checking them, and changing them to 'MS'. This kind of error seems quite frequent.
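A minimal sketch of such a listing, assuming the headwords are available as plain SLP1 strings. The function name and the `following` parameter are hypothetical. The default follows the 'nS' wording above, but item 71 itself has 'ns', so the set of following letters is left configurable. The result is a list of candidates to check against the scan, not an automatic correction.

```python
import re


def anusvara_candidates(headwords, following="S"):
    """List (headword, proposal) pairs where 'n' stands directly before
    one of the letters in `following`; the proposal replaces it by 'M'."""
    pattern = re.compile(f"n(?=[{re.escape(following)}])")
    candidates = []
    for word in headwords:
        proposal = pattern.sub("M", word)
        if proposal != word:
            candidates.append((word, proposal))
    return candidates


# Headwords quoted in this thread.
sample = ["vyanSuka", "paYcatrinSat", "hatAGaSansa", "ratnamaYjarI"]
print(anusvara_candidates(sample))                  # 'n' before 'S' only
print(anusvara_candidates(sample, following="Ss"))  # also 'n' before 's'
```

Run with the default, the listing picks up vyanSuka and paYcatrinSat; hatAGaSansa is only caught once 's' is added to `following`.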


72 aloluptva -> alolupatva



73 GanadundBisvana -> GanadunduBisvana



74 cOrodDraRika -> cOrodDaraRika



75 jAlAbadDANgulipARipAdatalstA - jAlAbadDANgulipARipAdatalatA



76 dartri -> dartf



77 paYcAKyAnavarttika -> paYcAKyAnavArttika


78 pAtaYjalaBAzyavarttika -> pAtaYjalaBAzyavArttika



79 pizwarAtryAHkalpa -> pizwarAtryAH kalpa


80 putrikApuntra -> putrikAputra



81 pfTivizadzWa are two different entries -> pfTivizad and pfTivizWa



82 pfTivizadzWA are two different entries -> pfTivizadz and pfTivizWA



83 bahvAsintva -> bahvASintva



84 mahASivaratryudyApana -> mahASivarAtryudyApana



How many possible cases are left?


A safe guess would be 20 more errors, maybe 30.


Regarding '4 brAhmaRAcCaMsinaukTya Possibly wrong. I couldnt trace the word properly in scan':

Since the word is printed in IAST, it should probably be spelled in SLP1 as brAhmaRAcCaMsinOkTya. As confirmation, since it is a N. of wk., it appears in Aufrecht's Catalogus Catalogorum spelled as brAhmaRACaMsinOkTyam.

If you confirm, I will enter the brAhmaRAcCaMsinOkTya correction in MW.


Re 'gose': MW does have it, with 'gosa'. PWG under 'gosa' also shows 'gose' with the same sense (at daybreak). So no correction is needed.


Regarding '9 Check out triyfca. seems wrong. maybe tryca?'

triyfca is correct. Evidence:

  1. It agrees with the scan.

  2. PWG has triyyfca, one meaning being = tfca, agreeing with MW.

Regarding '12 mAMsamayIpeSI -> mAMsamayI peSI (add space)'

We have adopted the convention in mw.xml to have no spaces in key1. There is a space in key2.

Doing a grep on mw.xml shows there are 61 instances, including this one, with a space in key2. They will all have no space in key1.

> grep '<key2>.* .*</key2>' mw.xml > temp
> wc -l temp
61 temp

Similar comment applies to:

13 SabdaparicCedarahasyepUrvavAdarahasya - SabdaparicCedarahasye pUrvavAdarahasya (Add space)

14 saMgItavinodenftyADyAya - saMgItavinode nftyADyAya

I agree with Marcis' comment on 'As per the space...'.


Regarding '18 aDokzeRa -> aDo'kzeRa (Add an avagraha)':

The situation is similar to the previous comment. Our coding of key1 in MW has the convention of excluding avagraha. key2 has the avagraha.

> grep "<key2>.*'.*</key2>" mw.xml > temp
> wc -l temp
225 temp
So there are 225 records in mw.xml with an avagraha in key2.
But there is no avagraha in key1:
> grep "<key1>.*'.*</key1>" mw.xml > temp
> wc -l temp
0 temp
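The same counts can be sketched in Python. This assumes, as the grep commands above do, that each record sits on one line and carries its headword in `<key1>…</key1>` and `<key2>…</key2>` elements. The function name and the sample lines are hypothetical and are not copied from mw.xml.

```python
import re


def count_with_char(lines, tag, char):
    """Count lines whose first <tag>...</tag> element contains `char`."""
    pattern = re.compile(f"<{tag}>(.*?)</{tag}>")
    total = 0
    for line in lines:
        match = pattern.search(line)
        if match and char in match.group(1):
            total += 1
    return total


# Hypothetical records in the assumed one-record-per-line layout.
sample_lines = [
    "<key1>aDokzeRa</key1><key2>aDo'kzeRa</key2>",
    "<key1>medorbuda</key1><key2>medo'rbuda</key2>",
    "<key1>proGIya</key1><key2>proGIya</key2>",
]
print(count_with_char(sample_lines, "key2", "'"))  # avagraha in key2
print(count_with_char(sample_lines, "key1", "'"))  # avagraha in key1
```

To run it on the real file, pass an open file object instead of the sample list, for example `open("mw.xml", encoding="utf-8")`, and use `" "` as `char` for the space count.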

Jim, when quoting stats such as "61 instances, including this one, with a space in key2", can you please upload the generated .txt document to the GitHub code section so we can dive deeper into it? What about the other possible cases?

https://github.com/sanskrit-lexicon/MWS/blob/master/key2-space-61-entries.txt
https://github.com/sanskrit-lexicon/MWS/blob/master/key2-avagraha-225-entries.txt


I buy the argument that key1 skips the avagraha, space etc. I won't be posting such cases from now on.


85 vaDakAnkzin -> vaDakANkzin



86 varzartuvarRnana -> varzartuvarRana



87 vivAhAdikArmaRAmprayoga -> vivAhAdikarmaRAm prayoga (Please note change of kA -> ka. It is not about space).



88 saMgarakskama -> saMgarakzama



89 sarvadevatApratisTWAsArasaMgraha -> sarvadevatApratisWAsArasaMgraha
