Thành viên:Laurent Bouvier/Free Vietnamese Dictionary Project Vietnamese-English
Comments
Words that are already in the dictionary are doubled now. That should be corrected somehow... David 16:41, 17 June 2006 (UTC)
- I imagine this is actually caused by the fact that multiple language code conventions are used ... Do you have an example? Laurent Bouvier 16:45, 17 June 2006 (UTC)
- Insect, dictionary, asiatique. There are indeed different language codes. David 10:08, 18 June 2006 (UTC)
Sorry for my absence. When you have some free computing time, could you please run an interwiki bot through the modified entries? Pages like insect currently have two sets of interwiki links that should be merged, and the Vietnamese Wikipedia uses a "plain alphabetized" sorting order for the links: see m:Interwiki sorting order. Otherwise, things are looking pretty good! Thanks for your help, Laurent and everyone. I'll give PiedBot bot status shortly, so that other changes don't slip through the cracks. Remember that you can click "xem rôbốt" in Đặc biệt:Recentchanges or append ?hidebots=0 to its URL to show PiedBot's changes. – Nguyễn Xuân Minh (talk, contributions) 06:17, 19 June 2006 (UTC)
- Hi Laurent, apart from the few doubled words, everything looks fine overall. Don't worry about the fact that French or English words are overwhelming the Vietnamese ones; the latter will catch up in due time. Trần Thế Trung 07:50, 19 June 2006 (UTC)
- Is everybody in agreement with the comment from Trần Thế Trung? Do you want to have more items in French and English? Laurent Bouvier 12:54, 20 June 2006 (UTC)
- I am. The more items the better, to me. Greets, David 13:34, 21 June 2006 (UTC)
- OK. I will fix the issue of double interwiki links and the duplicates caused by different language code conventions, and let's go up to 5,000 (just ahead of the Norwegian Wiktionary, in 25th position)! Laurent Bouvier 16:47, 21 June 2006 (UTC)
- In the end, it was a little more ... ;-) The Vietnamese Wiktionary is now the 12th biggest by total item count, with more than 23,000 articles, most of which are English. Laurent Bouvier 10:39, 25 June 2006 (UTC)
Status
==> This import is now complete. Some articles may not have been imported because of problems in their structure that I could not convert automatically. Laurent Bouvier 11:51, 19 September 2006 (UTC)
- A good thing has been done. Thank you, Laurent. How many entries could not be imported? Trần Thế Trung 13:15, 19 September 2006 (UTC)
- Thanks so much, Laurent. Could you have PiedBot generate a list of entries that couldn't be imported? We can go through and create them ourselves using the original website. – Nguyễn Xuân Minh (talk, contributions) 18:11, 19 September 2006 (UTC)
The FVDP claims to have 109,000 words.
Out of those:
- 10,804 are phrases, e.g. "a la carte"
- 121 contain a comma
- 509 contain a parenthesis
- 11,662 contain a hyphen
So far I have been able to import 56,711 words, so maybe I should try to loosen my validation rules ... Laurent Bouvier 10:03, 23 September 2006 (UTC)
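The breakdown above could be reproduced with a small tally over the headword list. This is only a sketch: the function names and the sample list are hypothetical, and treating any headword that contains a space as a "phrase" is an assumption, not necessarily the rule the import script used.

```python
from collections import Counter


def classify_title(title: str) -> set[str]:
    """Return the structural features found in one headword."""
    features = set()
    if " " in title.strip():  # assumption: a space marks a phrase
        features.add("phrase")
    if "," in title:
        features.add("comma")
    if "(" in title or ")" in title:
        features.add("parenthesis")
    if "-" in title:
        features.add("hyphen")
    return features


def tally(titles) -> Counter:
    """Count how many headwords show each feature (a headword may show several)."""
    counts = Counter()
    for title in titles:
        counts.update(classify_title(title))
    return counts


# Hypothetical sample, not taken from the FVDP data.
sample = ["a la carte", "re-entry", "insect", "dog's-tail", "word (sense)", "one, two"]
print(tally(sample))
```

Because one headword can fall into several groups, the four counts can overlap and need not add up to the number of rejected entries.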
- Of the entries that contain commas, parentheses, or hyphens in the title, I can detect the Vietnamese ones using some conditional statements in {{VieIPA}}. I've been slogging through Thể loại:Mục từ tiếng Việt bất thường to resolve these issues. Currently that category detects hyphens, commas, and slashes, but I could easily expand that to include other invalid characters – I already fixed the ones that contain parentheses a while back, which is why I removed that conditional from VieIPA, but if you import more, then I can reinstate that conditional. – Nguyễn Xuân Minh (talk, contributions) 16:47, 23 September 2006 (UTC)
- "a la carte" does not seem bad: it has a grammatical category (phó từ = adverb). Trần Thế Trung 07:40, 24 September 2006 (UTC)
- For your information, I have found a trick: I check in the English Wiktionary whether the word has only one grammatical category and, if so, I assign it. This means approx. 5,000 additional words. Laurent Bouvier 09:39, 26 September 2006 (UTC)
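The single-category check could look roughly like the sketch below. It assumes the English Wiktionary page is already available as wikitext, that the English part sits under a level-2 `==English==` heading, and that parts of speech appear as headings such as `===Noun===`. The heading list and the function names are illustrative assumptions, not the import script's actual code.

```python
import re

# Assumed, non-exhaustive list of part-of-speech heading names.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun",
                "Preposition", "Conjunction", "Interjection"}

# Matches wiki headings such as "==English==" or "===Noun===".
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def english_pos_headings(wikitext: str) -> list[str]:
    """Collect part-of-speech headings found inside the English section."""
    found = []
    in_english = False
    for match in HEADING_RE.finditer(wikitext):
        level, name = len(match.group(1)), match.group(2)
        if level == 2:
            # A new language section starts; track whether it is English.
            in_english = (name == "English")
        elif in_english and name in POS_HEADINGS:
            found.append(name)
    return found


def single_pos(wikitext: str):
    """Return the part of speech if exactly one distinct one is listed, else None."""
    distinct = set(english_pos_headings(wikitext))
    return distinct.pop() if len(distinct) == 1 else None
```

Returning `None` whenever zero or several categories are found keeps the ambiguous words on the exclusion side, which matches the cautious intent of the trick.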
Re-
I noticed that you've just imported loads of words beginning with "re-" and "door-". A lot of these words have already been imported without the hyphen: we have reentry and re-entry, for example. Could you insert a {{-syn-}} list to link the two entries together where this has happened? You could even generate an etymology for these entries:
- Let A be whatever's before the hyphen and B be whatever's after the hyphen.
- If AB has an entry here and A has an English-language entry here, the etymology is:
- If AB has an entry here but A doesn't have an entry:
  - Từ A- + B. (This should apply to entries like re-entry/reentry. There are a few exceptions that aren't synonyms, such as re-creation/recreation, but we can solve that using a list of homophones.)
You'd also have to ignore possessives, such as the "'s" in dog's-tail. That could be:
This idea will only work if the FVDP English database doesn't include any entries on prefixes. Maybe some others can comment on this proposal, since my algorithm is very unsophisticated and would probably make a lot of mistakes.
– Nguyễn Xuân Minh (talk, contributions) 22:42, 23 September 2006 (UTC)
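A rough sketch of the proposal, for illustration only. All names are hypothetical. It pairs a hyphenated title A-B with an existing AB entry, skips possessives outright (one possible reading of "ignore"), and takes a hand-made exception list for pairs such as re-creation/recreation. It only fills in the etymology for the case spelled out above, where A has no entry of its own; the other case is left empty because its format is not given.

```python
def split_hyphenated(title: str):
    """Split 'A-B' at the first hyphen; return None if either side is empty."""
    head, sep, tail = title.partition("-")
    if not sep or not head or not tail:
        return None
    return head, tail


def propose_links(titles, not_synonyms=frozenset()):
    """Suggest synonym links between A-B and AB entries that both exist."""
    existing = set(titles)
    proposals = []
    for title in sorted(existing):
        parts = split_hyphenated(title)
        if parts is None or title in not_synonyms:
            continue
        head, tail = parts
        if head.endswith("'s"):  # assumption: possessives are skipped entirely
            continue
        joined = head + tail
        if joined not in existing:
            continue
        # Only the "A has no entry" case has a stated etymology format.
        etymology = None if head in existing else f"Từ {head}- + {tail}."
        proposals.append(
            {"hyphenated": title, "joined": joined, "etymology": etymology}
        )
    return proposals


# Hypothetical sample titles.
titles = {"reentry", "re-entry", "re-creation", "recreation",
          "dog's-tail", "dogtail", "door-bell"}
for proposal in propose_links(titles, not_synonyms={"re-creation"}):
    print(proposal)
```

Requiring both A-B and AB to exist before doing anything keeps the bot from creating links to missing pages.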
- May I suggest that the above algorithm be applied to cases of AB only when A-B exists? Trần Thế Trung 07:33, 24 September 2006 (UTC)
- It does not seem so simple ... Laurent Bouvier 21:16, 24 September 2006 (UTC)
Okay, never mind this algorithm; we actually have prefixes like pre-. :^)
But now we can have a bot use queries like Đặc biệt:Prefixindex/pre- to create these etymologies. – Nguyễn Xuân Minh (talk, contributions) 00:45, 28 September 2006 (UTC)
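For a bot, one way to get the same listing as Đặc biệt:Prefixindex/pre- is the MediaWiki web API's `list=allpages` query with the `apprefix` parameter. The sketch below only builds the request URL and makes no network call. The endpoint address and the helper name are assumptions, and whether a given bot framework exposes the query this way is not confirmed here.

```python
from urllib.parse import urlencode

# Assumed API endpoint for the Vietnamese Wiktionary.
API_ENDPOINT = "https://vi.wiktionary.org/w/api.php"


def prefix_query_url(prefix: str, limit: int = 50) -> str:
    """Build an allpages query URL listing titles that start with `prefix`."""
    params = {
        "action": "query",
        "list": "allpages",
        "apprefix": prefix,   # title prefix to match, e.g. "pre-"
        "aplimit": limit,     # maximum number of titles per request
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(prefix_query_url("pre-"))
```

A bot would fetch this URL, read the returned titles, and then decide per title whether an etymology of the form discussed above applies.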
Conclusion #2
→ This time it's for real! I think that I have imported everything technically possible from this dictionary. They claim to have 109,000 items; I have personally counted 102,705 items, of which 10,000 have been excluded. You can find the exclusion list there. Laurent Bouvier 08:55, 7 October 2006 (UTC)
- Okay, I'll go through and add some of them. That list has a few {{-clitic-}}s, some abbreviations, and other entries that would be very useful. By the way, (as far as I can tell) all but one of the entries that begin with a capital letter are nouns (they're from technology and economics dictionaries). The only non-noun beginning with a capital letter that I've found is "apriori", which could be either a noun or adjective. – Nguyễn Xuân Minh (talk, contributions) 18:11, 7 October 2006 (UTC)