Well done. I submitted both on your behalf. I checked the book; you are right, it's a digitization error. Let's see what Shalu says about the rest.
3 hariharamanRqalazoqaSaliNgodBava instead of hariharamaRqalazoqaSaliNgodBava (extra 'n' before 'R').
4 brAhmaRAcCaMsinaukTya Possibly wrong. I couldn't trace the word properly in the scan.
5 iqepahavana wrong. iqopahavana right.
6 upAMSukFqita wrong. upAMSukrIqita right.
7 Check gose. I am not sure about it.
8 tejOra wrong. tejAura correct
9 Check out triyfca. seems wrong. maybe tryca?
10 nyagBAvayiyf wrong -> nyagBAvayitf correct
11 proGiya -> proGIya
12 mAMsamayIpeSI -> mAMsamayI peSI (add space)
13 SabdaparicCedarahasyepUrvavAdarahasya - SabdaparicCedarahasye pUrvavAdarahasya (Add space)
As per the space issue: I gave you the 'key1' output, i.e. 'māṁsamayīpeśī'. That form is great for coding, reading and searching, but there is no such entry in the printed dictionary. The printed dictionary has what is in 'key2', in this case 'māṁsa--ma° yī peśī'. So there is actually no issue. Still, it's a good strategy to have the normalized 'key1'; otherwise we would get even more issues.
14 saMgItavinodenftyADyAya - saMgItavinode nftyADyAya. I pointed out this spacing issue because it shows up in the front end: http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc/indexcaller.php?key=saMgItavinodenftyADyAya&input=slp1&output=SktDevaUnicode
15 sarvatomuKOgAtraprayoga -> sarvatomuKOdgAtraprayoga (missing 'd' after 'O')
16 sahasraYit -> sahasrajit
17 sAvitrIpeYjara -> sAvitrIpaYjara
18 aDokzeRa -> aDo'kzeRa (Add an avagraha)
19 ozaDihomna -> ozaDihoma
20 gAnDArivlARija -> gAnDArivARija
21 catuscatvAriMSadakzara -> catuzcatvAriMSadakzara (catus->catuz)
22 cAruSfRgin -> cAruSfNgin
23 cIpUdru -> cIpudru
4 brAhmaRAcCaMsinaukTya - au brAhmaRAcCaMsina ukTya - au (space before ukTya)
Sorry, it's already done above; the number is 4, not 24. I will add another issue at 24.
25 mahAyogapaYcaratneASvalAyanopayogyADAnaprakaraRa - eA >> mahAyogapaYcaratne ASvalAyanopayogyADAnaprakaraRa - eA (space needed)
In the MW print it is separated by hyphens.
26 jIvantiSUlAmkf - jIvantiSUlAMkf
27 JiRkA -> JiRikA
28 WunWupadDati -> WuRWupadDati
29 quRDi -> quRQi
24 aDaupAsana - au >> aDa upAsana - au
30 AUrRu >> A UrRu
31 I am unable to trace the word tUryOGa at http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc/servepdf.php?page=1328 . Please check.
32 dIrGatamasorka -> dIrGatamaso'rka
Suspected error in the print edition itself:
33 dvicatvArinSika -> dvicatvAriMSika
34 dvicAtvArinSika -> dvicAtvAriMSika
34 nidigdigDikA -> nidigDikA
35 nividCaMSam -> nivicCaMSam
36 paYcataponvita - paYcatapo'nvita
Probable errors in the printed edition itself:
37 paYcatrinSat -> paYcatriMSat
38 paYcatrinSacClokI -> paYcatriMSacClokI
39 paYcatrinSatpIWikA -> paYcatriMSatpIWikA
40 parigatArWa -> parigatArTa
41 pAtranirnega -> pAtranirRega
42 pArAvatAKza -> pArAvatAkza
43 pAruskika -> pAruzika
44 pugrodBava -> purodBava
45 purumedba -> purumeDa
Sometimes I spend a lot of time finding the needed word on a known page. In very rare cases the page itself is wrong, so I submit it for correction. I would like the .pdf pages that open when we click on them to have an OCR layer, just like my MW .pdf has. I can easily add the layer to all the .pdf pages, as separate files. It speeds up my work in hard cases.
That is a great idea Marcis.
46 batwa - bawwa
Suspected error in the printed edition itself: 47 bARAparnI
Why I think this is
I just tested it in Firefox: the OCR layer is usable and works. Now it's up to Jim. If I do the .pdf trick on all the pages, will you, Jim, upload them to the server? I can calculate the size per page if wanted; it is not a big increase.
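Marcis doesn't say which tool he uses for the OCR layer. As a sketch only, a batch run over all pages could look like this, assuming ocrmypdf (an open-source wrapper around Tesseract) and one PDF file per scan page. The folder names, the function name and the eng+san language choice are assumptions. By default it only prints the commands, so it can be tried safely.

```shell
#!/usr/bin/env bash
# Sketch only: add an OCR text layer to every single-page PDF in a folder.
# Assumptions (not from the thread): ocrmypdf is the OCR tool, the pages
# are stored as separate *.pdf files, and Tesseract's "eng" and "san"
# language data are installed. DRY_RUN=1 (the default) only prints the
# commands; set DRY_RUN=0 to actually run them.

ocr_pages() {   # hypothetical helper: ocr_pages SRC_DIR OUT_DIR
  local src_dir=$1 out_dir=$2 f
  local -a cmd
  if [ "${DRY_RUN:-1}" != 1 ]; then
    mkdir -p "$out_dir" || return 1
  fi
  for f in "$src_dir"/*.pdf; do
    [ -e "$f" ] || continue   # folder has no PDFs: the glob did not expand
    cmd=(ocrmypdf -l eng+san "$f" "$out_dir/$(basename "$f")")
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || printf 'OCR failed: %s\n' "$f" >&2
    fi
  done
}

# Example (hypothetical paths): DRY_RUN=0 ocr_pages scans/ scans-ocr/
```

Each page stays a separate file, so the server only needs the OCR'd files swapped in under the same names.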
We are getting closer: the 46 batwa -> bawwa case is exactly the nṭ -> ṇṭ kind of checking/replacement discussed at https://github.com/drdhaval2785/SanskritSpellCheck/issues/1
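The same idea can be turned into a quick grep over key1. Here is a rough sketch, assuming SLP1 key1 values appear on lines of the form <key1>...</key1> as in mw.xml. It lists headwords where a dental (t T d D n) stands directly before a retroflex stop (w W q Q), a cluster that sandhi normally assimilates. The function name and the sample lines are made up, and every hit still needs checking against the scan.

```shell
#!/usr/bin/env bash
# Sketch: list SLP1 key1 values where a dental (t T d D n) stands directly
# before a retroflex stop (w W q Q). Sandhi normally assimilates such a
# cluster (batwa -> bawwa), so each hit is a candidate, not a sure error.
# Assumption: each record has its headword as <key1>...</key1> on one line.

dental_before_retroflex() {
  # Pull out the key1 text, then keep only words with the suspicious cluster.
  sed -n 's/.*<key1>\([^<]*\)<\/key1>.*/\1/p' "$1" | grep '[tTdDn][wWqQ]'
}

# Made-up sample; on the real data: dental_before_retroflex mw.xml
sample=$(mktemp)
printf '%s\n' '<key1>batwa</key1>' '<key1>bawwa</key1>' '<key1>kaRWa</key1>' > "$sample"
dental_before_retroflex "$sample"   # prints: batwa
rm -f "$sample"
```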
47 BinrArTa -> BinnArTa
Possible error in the print edition: 48 mahAnAmRIvrata -> mahAnAmnIvrata
49 mahApuruzavidyAyAMvizRurahasyekzetrakARqejagannATamAhAtmya - mahApuruzavidyAyAM vizRurahasye kzetrakARqe jagannATamAhAtmya
50 mAdbavollAsa -> mADavollAsa
51 mIQuztama needs checking. In my opinion, it should be converted to mIQuzwama. But the printed dictionary has only the '-tama' suffix, so we may have to convert it on our own.
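If we do convert such suffixes ourselves, the join needs at least the retroflex rule z + t -> zw (and z + T -> zW). Below is a hypothetical helper, sketched in shell. It knows only this one rule, so any other sandhi at the join would still need manual handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join a stem and a suffix printed as '-tama'/'-tara',
# applying only the retroflex rule z + t -> zw and z + T -> zW (SLP1).

join_suffix() {
  local stem=$1 suffix=${2#-}   # drop the leading hyphen of '-tama'
  if [[ $stem == *z ]]; then
    case $suffix in
      t*) suffix="w${suffix#t}" ;;   # ṣ + t  -> ṣṭ
      T*) suffix="W${suffix#T}" ;;   # ṣ + th -> ṣṭh
    esac
  fi
  printf '%s%s\n' "$stem" "$suffix"
}

join_suffix mIQuz -tama   # prints: mIQuzwama
join_suffix priya -tama   # prints: priyatama
```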
52 medorbuda -> medo'rbuda
53 ratnamanjarI -> ratnamaYjarI
54 ratnasArajAtakejyotizasArasaMgraha -> ratnasArajAtake jyotizasArasaMgraha
55 raTaDurDUrgata -> these are two words: raTADur and raTaDUrgata
56 lavaRAnbataka -> lavaRAntaka
57 varnopeta -> varRopeta
58 viramamRa -> viramaRa
59 vEtAlikarRikaRTa -> vEtAlikarRikanTa
Probable error in the print edition:
60 vyanSuka -> vyaMSuka
Please note the different 'n'. The same letter in vyaMSaka has been rendered as 'M' in the digitization. Have a closer look at all occurrences of 'nS' in the dictionary and find the pattern. In my opinion this kind of 'n' should be entered as 'M'.
61 vyAkIrnakeSara -> vyAkIrRakeSara
62 vyAkIrnamAlyakavara -> vyAkIrRamAlyakavara
63 vratapratIzWA -> vratapratizWA
64 vratapratIzWAprayoga -> vratapratizWAprayoga
65 SacIpuraRdara -> SacIpuraMdara
66 SabdaparicCedarahasyepUrvavAdarahasya -> SabdaparicCedarahasye pUrvavAdarahasya
Potential error in the print edition: 67 SamanIcameDra -> SamanIcameQra
For 'generative organ', meQra is the proper word, to my knowledge.
68 sAmagirta -> sAmagIta
69 sAzwfka -> sAzwrika
Possible error in the printed edition: 70 svayamAtfnRA -> svayamAtfRRA
71 hatAGaSansa -> hatAGaSaMsa
Please consider listing all 'nS' combinations, checking them, and changing them to 'MS'. This kind of error seems quite frequent.
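Here is a sketch of such a listing, assuming key1 lines as in mw.xml. It also catches 'ns', since 71 hatAGaSansa is the same kind of error. It only proposes the M form, and each hit still has to be checked against the scan before anything is changed. The function name and sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list key1 values with n before S or s and propose the M form.
# Assumption: each record has its headword as <key1>...</key1> on one line.

list_n_before_sibilant() {
  local k
  sed -n 's/.*<key1>\([^<]*\)<\/key1>.*/\1/p' "$1" | grep 'n[Ss]' |
    while IFS= read -r k; do
      # Proposed correction only: n -> M before S/s; check the scan first.
      printf '%s -> %s\n' "$k" "$(printf '%s\n' "$k" | sed 's/n\([Ss]\)/M\1/g')"
    done
}

# Made-up sample; on the real data: list_n_before_sibilant mw.xml
sample=$(mktemp)
printf '%s\n' '<key1>dvicatvArinSika</key1>' '<key1>vyaMSaka</key1>' \
  '<key1>hatAGaSansa</key1>' > "$sample"
list_n_before_sibilant "$sample"
# dvicatvArinSika -> dvicatvAriMSika
# hatAGaSansa -> hatAGaSaMsa
rm -f "$sample"
```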
72 aloluptva -> alolupatva
73 GanadundBisvana -> GanadunduBisvana
74 cOrodDraRika -> cOrodDaraRika
75 jAlAbadDANgulipARipAdatalstA - jAlAbadDANgulipARipAdatalatA
76 dartri -> dartf
77 paYcAKyAnavarttika -> paYcAKyAnavArttika
78 pAtaYjalaBAzyavarttika -> pAtaYjalaBAzyavArttika
79 pizwarAtryAHkalpa -> pizwarAtryAH kalpa
80 putrikApuntra -> putrikAputra
81 pfTivizadzWa: two different entries -> pfTivizad and pfTivizWa
82 pfTivizadzWA: two different entries -> pfTivizadz and pfTivizWA
83 bahvAsintva -> bahvASintva
84 mahASivaratryudyApana -> mahASivarAtryudyApana
How many possible cases are left?
A safe guess would be 20 more errors, maybe 30.
Regarding '4 brAhmaRAcCaMsinaukTya Possibly wrong. I couldn't trace the word properly in the scan':
Since the word is given in IAST, it should probably be spelled in SLP1 as brAhmaRAcCaMsinOkTya. As confirmation: since it is a N. of wk., it also appears in Aufrecht's Catalogus Catalogorum, spelled brAhmaRACaMsinOkTyam.
If you confirm, I will enter the brAhmaRAcCaMsinOkTya correction in MW.
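A related quick check can be sketched under the same assumptions (SLP1 key1 lines as in mw.xml). Since SLP1 writes the diphthongs as E and O, a key1 containing 'ai' or 'au' may be a leftover of IAST-to-SLP1 conversion, like brAhmaRAcCaMsinaukTya. Genuine hiatus cases such as aDaupAsana (aDa upAsana) will also show up, so the output is a list of candidates only. The function name and sample are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: key1 values containing 'ai' or 'au'. SLP1 writes the diphthongs
# as E and O, so these are candidates for an unconverted IAST spelling;
# genuine hiatus (e.g. aDaupAsana = aDa upAsana) will also be listed.
# Assumption: each record has its headword as <key1>...</key1> on one line.

diphthong_candidates() {
  sed -n 's/.*<key1>\([^<]*\)<\/key1>.*/\1/p' "$1" | grep 'a[iu]'
}

# Made-up sample; on the real data: diphthong_candidates mw.xml
sample=$(mktemp)
printf '%s\n' '<key1>brAhmaRAcCaMsinaukTya</key1>' \
  '<key1>brAhmaRAcCaMsinOkTya</key1>' '<key1>aDaupAsana</key1>' > "$sample"
diphthong_candidates "$sample"
# brAhmaRAcCaMsinaukTya
# aDaupAsana
rm -f "$sample"
```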
Re 'gose': MW does have it, under 'gosa'. PWG under 'gosa' also shows 'gose' with the same sense (at daybreak). So no correction is needed.
Regarding '9 Check out triyfca. seems wrong. maybe tryca?'
triyfca is correct. Evidence: it agrees with the scan.
Regarding '12 mAMsamayIpeSI -> mAMsamayI peSI (add space)'
We have adopted the convention in mw.xml to have no spaces in key1. There is a space in key2.
Doing a grep on mw.xml shows 61 instances, including this one, with a space in key2. None of them has a space in key1.
> grep '<key2>.* .*</key2>' mw.xml > temp
> wc -l temp
61 temp
A similar comment applies to:
13 SabdaparicCedarahasyepUrvavAdarahasya - SabdaparicCedarahasye pUrvavAdarahasya (Add space)
14 saMgItavinodenftyADyAya - saMgItavinode nftyADyAya
I agree with Marcis' comment on 'As per the space...'.
Regarding '18 aDokzeRa -> aDo'kzeRa (Add an avagraha)':
The situation is similar to the previous comment. By our key1 convention in MW, the avagraha is excluded; key2 keeps it.
> grep "<key2>.*'.*</key2>" mw.xml > temp
> wc -l temp
225 temp
So there are 225 records in mw.xml with an avagraha in key2, but none with an avagraha in key1:
> grep "<key1>.*'.*</key1>" mw.xml > temp
> wc -l temp
0 temp
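Since key1 should equal key2 with spaces and avagraha removed, any other difference between the two may point to a real error (for example a kA/ka slip hidden behind a space). Here is a rough sketch, assuming each record line carries <key1>...</key1> and <key2>...</key2> in plain SLP1. Any other markup inside key2 would also be reported, so expect some noise. The function name and sample are made up.

```shell
#!/usr/bin/env bash
# Sketch: report records whose key1 is not simply key2 with spaces and
# avagraha (') removed. Assumption: each record line holds
# <key1>...</key1> and <key2>...</key2> in plain SLP1; any other markup
# inside key2 will also be reported.

key_mismatches() {
  awk -v q="'" '
    /<key1>/ && /<key2>/ {
      k1 = $0; sub(/.*<key1>/, "", k1); sub(/<\/key1>.*/, "", k1)
      k2 = $0; sub(/.*<key2>/, "", k2); sub(/<\/key2>.*/, "", k2)
      norm = k2
      gsub("[ " q "]", "", norm)          # drop spaces and avagraha
      if (k1 != norm) print k1 " | " k2
    }' "$1"
}

# Made-up sample; on the real data: key_mismatches mw.xml
sample=$(mktemp)
cat > "$sample" <<'EOF'
<key1>mAMsamayIpeSI</key1><key2>mAMsamayI peSI</key2>
<key1>aDokzeRa</key1><key2>aDo'kzeRa</key2>
<key1>vivAhAdikArmaRAmprayoga</key1><key2>vivAhAdikarmaRAm prayoga</key2>
EOF
key_mismatches "$sample"
# vivAhAdikArmaRAmprayoga | vivAhAdikarmaRAm prayoga
rm -f "$sample"
```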
Jim, when quoting stats such as "61 instances, including this one, with a space in key2", can you please upload the generated .txt file to the GitHub code section so we can dive deeper into it? What about the other possible cases?
https://github.com/sanskrit-lexicon/MWS/blob/master/key2-space-61-entries.txt
https://github.com/sanskrit-lexicon/MWS/blob/master/key2-avagraha-225-entries.txt
I accept the argument that key1 omits the avagraha, spaces, etc. I won't post such cases from now on.
85 vaDakAnkzin -> vaDakANkzin
86 varzartuvarRnana -> varzartuvarRana
87 vivAhAdikArmaRAmprayoga -> vivAhAdikarmaRAm prayoga (Please note the change of kA -> ka. It is not about the space.)
88 saMgarakskama -> saMgarakzama
89 sarvadevatApratisTWAsArasaMgraha -> sarvadevatApratisWAsArasaMgraha
I have compared the vowel and consonant patterns of MW against those of PWG.
The result is attached herewith.
The code for checking is attached here.
Google doc for the logic behind the approach:
Video tutorial on running the code: http://youtu.be/qLqYUZUGM6M
Input data: MW and PWG
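The attached code and the Google doc are the authority on the actual method. What follows is one possible reading, sketched under assumptions. Split each SLP1 headword into maximal runs of vowels and of consonants, collect every run seen in PWG, and flag MW headwords containing a run that PWG never has (e.g. the cluster dzW in pfTivizadzWa). The input files are assumed to be plain headword lists, one SLP1 word per line; the function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of one possible vowel/consonant pattern check (not necessarily
# the attached code's logic). Split each SLP1 headword into maximal runs
# of vowels and of consonants; report MW headwords with a run that never
# occurs in PWG. Inputs (assumed): plain lists, one SLP1 headword per line.

unseen_runs() {   # usage: unseen_runs pwg-headwords.txt mw-headwords.txt
  awk '
    # Fill arr with the vowel/consonant runs of w; return their count.
    function runs(w, arr,    n, i, c, cur, isv, prev) {
      split("", arr); n = 0; cur = ""; prev = -1
      for (i = 1; i <= length(w); i++) {
        c = substr(w, i, 1)
        isv = (index("aAiIuUfFxXeEoO", c) > 0)   # SLP1 vowel letters
        if (isv != prev && cur != "") { arr[++n] = cur; cur = "" }
        cur = cur c; prev = isv
      }
      if (cur != "") arr[++n] = cur
      return n
    }
    # First file (PWG): remember every run.
    NR == FNR { n = runs($1, r); for (i = 1; i <= n; i++) seen[r[i]] = 1; next }
    # Second file (MW): print the word and its first run unknown to PWG.
    {
      n = runs($1, r)
      for (i = 1; i <= n; i++)
        if (!(r[i] in seen)) { print $1 "\t" r[i]; break }
    }' "$1" "$2"
}
```

Hypothetical usage: unseen_runs pwg-headwords.txt mw-headwords.txt > suspects.tsv. Legitimate rare clusters and hiatus (such as aDaupAsana) will also appear, so the output is a list to review, like the HTML file.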
I am checking the HTML file thoroughly. This approach has found many issues.
Here is a video tutorial on how to use the HTML file for finding errors.