


[Computers] Overseas users' views on the Google plagiarism incident

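The thoughtful replies from overseas users don't get copied over, but the offhand joke posts get reposted.

Here are a few of the longer, more meaningful ones.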
1) I've been thinking about this. Throwing the evilness of Google aside for a moment, why should someone be able to copyright a listing of the phonetic pronunciation of an alphabet?

Let's just imagine how I might create this list. I would have to hire people who spoke Chinese. Then I would ask them to record the pronunciation of each character that they know. This is pretty easy because in Chinese each character has only one pronunciation (per dialect, anyway). There are about 3500 characters that you need to know in order to be literate. And all of these people would have learned these at school.

But how did they learn them? Well, they had a textbook and they memorized the list from the textbook.

Wait. I can't just memorize a list from one book and put it in another book. That's copyright infringement. In order for it not to be copyright infringement, I need to make sure that my sources all memorized the pronunciations from different sources. That's going to be difficult.

But let's say I do that. Now I have a list of the 3500 most common characters. And with that, I've probably got 99% of everything that's in a newspaper. But that's probably not good enough. I probably want a list of, say, 60,000 characters. Otherwise it's pretty useless in a general sense. Uncommon characters are uncommon, but you *will* bump into the words over time.

So where do I find these characters? Can I hire some guy that knows them all? It would be very difficult. The best place to look is in a book. But wait... what am I going to do? Every time I find a character my people don't know, look it up in a book? Why don't I just copy it from the book in the first place? That's just copyright infringement again.

Really, the task of creating this list authoritatively without infringing copyright is monumental. Probably the *only* way to do it is with a community project where people just submit the pronunciations they know.

But if I'm going to have a community project like this, what the heck do I need copyright for? What am I protecting? If everyone is going to contribute, everyone should benefit.

So, personally, I don't think one should have copyright on this kind of material (same thing for spelling). It's just not in the public interest. This goes doubly so now that we have the internet and creating these kinds of projects is very inexpensive.

OK, I've gone on long enough... But one more rant. What's with this "do no evil" thing? Isn't that setting the bar a little low? If I told my parents that I'd work hard not to be evil, I think they'd be somewhat disappointed in me. If Google wanted to actually "do some good" rather than "do no evil", they could start a community project to collect this data and share it with the world.

Sigh... I guess we'll have to wait for some guy in his garage (but here's betting that someone has already started something).

2) Exactly. Reading 95% of the comments for this story and yesterday's story, everyone seems to think that this is about stealing code. This is about Google using the same data to train an algorithm. Both algorithms make the same mistakes because they were trained using the same data, which contained incorrectly labeled information. Whether or not this data was publicly available is the issue.

For (a horribly contrived) example: Let's say that I write some handwriting recognition software using a neural net. In order to train my software, I use a large database of handwriting samples that I have found on the web. However, the person that compiled this database made the mistake of labeling all of the sample images of the letter 'n' as the letter 'q', and all of the images of the letter 'q' as the letter 'n'. Person B comes along and uses the same data set to train a naïve-Bayes classifier. Guess what? Both algorithms will make the same mistakes when it comes to the letters 'n' and 'q'. Not because I stole code from Person B, but because we used the same training data.
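
A minimal sketch of that point in Python, under stated assumptions: the "handwriting" is reduced to made-up 2-D feature points, and a nearest-centroid classifier stands in for the neural net to keep the example short. The function names (`make_swapped_dataset`, `train_centroid`, `train_gaussian_nb`) are hypothetical and exist only for illustration. The two learners share no code, only the mislabeled training set.

```python
import math
import random


def make_swapped_dataset(n_per_class=50, seed=0):
    """Toy 2-D 'handwriting features' (an assumption, not real image data).

    True 'n' samples cluster near (0, 0) and true 'q' samples near (5, 5),
    but the compiler of the dataset swapped the two labels by mistake.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(0, 1), rng.gauss(0, 1)), "q"))  # really an 'n'
        data.append(((rng.gauss(5, 1), rng.gauss(5, 1)), "n"))  # really a 'q'
    return data


def train_centroid(data):
    """Learner 1: nearest-centroid classifier (stand-in for the neural net)."""
    sums = {}
    for (x, y), label in data:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    centroids = {
        label: (sx / count, sy / count)
        for label, (sx, sy, count) in sums.items()
    }

    def predict(point):
        # Pick the label whose class mean is closest to the point.
        return min(centroids, key=lambda label: math.dist(point, centroids[label]))

    return predict


def train_gaussian_nb(data):
    """Learner 2: Gaussian naive Bayes, written independently of learner 1."""
    by_label = {}
    for point, label in data:
        by_label.setdefault(label, []).append(point)

    params = {}
    for label, points in by_label.items():
        stats = []
        for dim in range(2):
            values = [p[dim] for p in points]
            mean = sum(values) / len(values)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-9
            stats.append((mean, var))
        params[label] = (math.log(len(points) / len(data)), stats)

    def predict(point):
        def log_posterior(label):
            log_prior, stats = params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
                for x, (mean, var) in zip(point, stats)
            )

        return max(params, key=log_posterior)

    return predict


data = make_swapped_dataset()
centroid_predict = train_centroid(data)
nb_predict = train_gaussian_nb(data)

n_like_point = (0.0, 0.0)  # looks like a true 'n' in this toy setup
print(centroid_predict(n_like_point), nb_predict(n_like_point))
```

Both learners should call the 'n'-like point a 'q', which is the same shared mistake described above, traced to the data rather than to any shared code.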

I'm not defending Google at all here. If they stole the data from Sohu, they should get in trouble. Based on the fact that Google is in the web-mining business, I would guess that they just grabbed this data off of the net, and someone forgot to think about whether they had the right to use it.

3) No, actually, "gook" is a term that originated in the Korean War for Korean people. Because many of the soldiers who fought in the Korean War were officers in the Vietnam War, their racial slurs were adopted and modified by a new generation, leading to great confusion about the origins of the term.

The etymology of the word gook is interesting, because it may be one of the few racial slurs that originated with a people's term for themselves. In Korean, guk means "country" and by extension a country's people; when it is not modified (cf. waiguk, outside country, foreigner) it is understood to be Korea or its peoples. Speakers of Chinese will recognize the word as having Sinitic origin (guó, country, and wàiguó, foreign country, respectively, in Mandarin).

The term was appropriated by the Americans during the Korean War and used as a racial slur for Korean people in general, which must have been confusing to the Koreans (imagine someone using "American" as a slur for Americans to get an idea). Then, in Vietnam, the old "Asians are all the same" mentality prompted GIs to extend its meaning (imagine "American" being a racial slur for all white people, for example -- yes, I know many Americans aren't white, it's not a perfect analogy, deal with it).

