Post on 10-Aug-2021
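The relationship between word2vec representations of Japanese Wikipedia and lexical properties
Tadahisa Kondo (1), Shin-ichi Asakawa (2) (1: Kogakuin University, 2: Tokyo Woman's Christian University)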
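■ Introduction
In recent years, a family of machine learning methods known as word embedding models or vector space models (Mikolov et al., 2013) has attracted attention because it has been shown that they may be able to represent meaning.
In natural language processing, each word is conventionally represented as a one-hot vector whose dimensionality equals the total vocabulary size. Word embedding models derive from the intermediate representations that were devised to convert such one-hot vectors into dense vectors (Elman, 1990).
Operations such as addition and subtraction can be formally defined between vectors in the space produced by a word embedding model, so the space can be regarded as a kind of representation of meaning.
Example: vec(king) - vec(man) + vec(woman) = vec(queen) (see figure). This can be likened to the human ability to reason by analogy.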
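The vector arithmetic in the example can be illustrated with a minimal sketch. The two-dimensional vectors below are invented for illustration and are not taken from the trained model; the function names `cosine` and `analogy` are likewise hypothetical. The sketch computes vec(a) - vec(b) + vec(c) and returns the remaining word whose vector has the highest cosine similarity to the result.

```python
import math

# Toy 2-d vectors invented for illustration (NOT from the trained model).
# Axis 0 loosely encodes gender, axis 1 loosely encodes royalty.
TOY_VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "king": (1.0, 1.0),
    "queen": (-1.0, 1.0),
    "uncle": (1.0, 0.2),
    "aunt": (-1.0, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))  # prints: queen
```

With real embeddings the match is approximate rather than exact, which is why the nearest neighbour by cosine similarity is used instead of an equality test.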
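■ Method
Corpus: Japanese Wikipedia (http://dumps.wikimedia.org/jawiki/latest)
  -> downloaded on 12 Sep 2016
  -> segmented into morphemes (words) with MeCab
Corpus size: the MeCab output contained 1,265,697,011 morphemes (segmented units). After excluding words that occurred fewer than 5 times, there were 1,140,357 distinct words (vocabularies) and 5,774,139 tokens in total.
  -> trained with word2vec
word2vec settings: 128 dimensions, window width 10, negative sampling 10, skip-gram.
Items examined: the 17,842 words that could be matched between the words occurring in Wikipedia and the words in the NTT database (hereafter NTT-DB).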
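The preprocessing and skip-gram settings above can be pictured with a small sketch. It is a simplification under stated assumptions, not the training code used for the poster: the function names are hypothetical, low-frequency words are dropped before windowing, and the window is fixed rather than randomly shortened as some word2vec implementations do. It shows how a minimum count of 5 prunes the vocabulary and how a window of 10 yields (center, context) training pairs.

```python
from collections import Counter


def build_vocab(tokens, min_count=5):
    """Keep only words that occur at least min_count times."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count >= min_count}


def skipgram_pairs(tokens, vocab, window=10):
    """Generate (center, context) pairs within a fixed symmetric window.

    Out-of-vocabulary tokens are removed first, so the window spans
    only the retained words (a simplifying assumption).
    """
    kept = [t for t in tokens if t in vocab]
    pairs = []
    for i, center in enumerate(kept):
        start = max(0, i - window)
        stop = min(len(kept), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, kept[j]))
    return pairs


# Tiny made-up token sequence; real input would be the MeCab output.
demo_tokens = ["a", "b", "a", "c", "b", "a"]
demo_vocab = build_vocab(demo_tokens, min_count=2)
demo_pairs = skipgram_pairs(demo_tokens, demo_vocab, window=1)
print(sorted(demo_vocab), len(demo_pairs))
```

In skip-gram training each pair becomes one positive example, and negative sampling (10 in the settings above) draws additional random words as negative examples for each pair.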
Fig. 1. Correlation plots of each property value against its estimated value
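■ Future plans
We plan to examine the model further and to explore what role the learned vectors play. By improving estimation accuracy, we will attempt to automatically generate indices that are normally obtained by subjective rating, for example when new words appear or when the values change over time.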
[Fig. 1 panels:
 a) word frequency in NTT-DB (axes: frequency count, predicted frequency count)
 b) written word imageability in NTT-DB (axes: V-imageability, predicted V-imageability)
 c) written word familiarity in NTT-DB (axes: V-familiarity, predicted V-familiarity)]
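■ References
Amano, S., & Kondo, T. (1999, 2000). Nihongo no goi tokusei [Lexical properties of Japanese], Vol. 1: Word familiarity; Vol. 7: Word frequency. Sanseido.
Sakuma, N., Ijuin, M., Fushimi, T., Tatsumi, I., Tanaka, M., Amano, S., & Kondo, T. (2008). Nihongo no goi tokusei [Lexical properties of Japanese], Vol. 8: Word imageability. Sanseido.
Mikolov, T., Yih, W.-t., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), Atlanta, GA, USA.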
[Figure: word pairs shown as vector offsets: MAN/WOMAN, UNCLE/AUNT, KING/QUEEN, KINGS/QUEENS]
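■ Purpose
Lexical properties such as word familiarity and word frequency (Amano & Kondo, 1999, 2000) and word imageability (Sakuma et al., 2008), which are recorded in the NTT database series, are values that quantify various properties of Japanese. Word familiarity is a subjective rating given by participants and expresses how familiar each word feels. Word frequency was produced by counting word occurrences in a corpus of 12 years of newspaper text. Word imageability, like word familiarity, is a subjective rating and expresses how easily the meaning of each word can be imagined.
Models built with word2vec (e.g., Mikolov et al., 2013), on the other hand, learn from very large amounts of text and are thought to be able to represent a semantic space through word co-occurrence and context.
The purpose of this study was to examine the relationship between the values in the lexical property database and the vector space representations of words learned by word2vec. In other words, we aimed to clarify what the word2vec vector space learns and what it represents. To do so, we compared the NTT database results, which were obtained from psychological experiments, with a word embedding model of Japanese Wikipedia, and tested how the results represented in the two data sets are related.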
Fig. 1'. Correlation plots of each property value against its estimated value
[Fig. 1' panels:
 a') word frequency in Wiki (axes: frequency count, predicted frequency count)
 b') spoken word imageability in NTT-DB (axes: A-imageability, predicted A-imageability)
 c') spoken word familiarity in NTT-DB (axes: A-familiarity, predicted A-familiarity)]
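■ Demo site
The trained word2vec data used in this presentation (vector data) and sample word2vec code in Python (Jupyter notebook) are available at: https://github.com/ShinAsakawa/2017jpa/
Note that the data there are from the July 2017 Japanese Wikipedia and come in two versions, with 200 and 300 dimensions. The algorithms are skip-gram and CBOW.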
■ Results (multiple regression analysis)
Regression models predicting each lexical property value from the word2vec parameters (128 dimensions) of the 17,842 words.

Predicted property                          adjusted R2
NTT-DB word frequency (log)                 0.5495
NTT-DB word imageability (written)          0.4232
NTT-DB word familiarity (written)           0.3741
Word frequency within Wikipedia (log)       0.6539
NTT-DB word imageability (spoken)           0.4242
NTT-DB word familiarity (spoken)            0.3189
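For readers unfamiliar with the adjusted R2 reported above, the following sketch shows the standard definition. It assumes an ordinary least-squares regression with an intercept, 17,842 observations and 128 predictors; the poster does not state which software was used, and the function names here are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalise R2 for the number of predictors (model with an intercept)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# With n = 17,842 words and p = 128 dimensions the penalty factor is small:
# (n - 1) / (n - p - 1) = 17841 / 17713, roughly 1.007.
print(adjusted_r_squared(0.5, 17842, 128))
```

Because the number of words is large relative to the 128 predictors, the adjusted value differs from the unadjusted R2 by well under 0.01 at these sample sizes.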
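■ Discussion
Estimation of lexical properties from the vector values that word2vec learned for each word performs well for frequency. For familiarity and imageability, which are subjective ratings, it performs worse than for frequency. Comparing the two, accuracy is slightly higher for imageability. Because imageability is a property that indicates how easily a meaning can be imagined, this is thought to reflect a link between the word2vec parameter space and meaning.
For frequency, estimation performs better for the frequency in Wiki, which was used for training, than for NTT-DB. This reflects a difference in the nature of the two measures: the NTT-DB value is the frequency of occurrence in newspapers.
Estimation of spoken word familiarity is worse than that of written word familiarity. This is thought to be because properties such as the difficulty of the characters (kanji) themselves are involved. For imageability, in contrast, the difference between presentation media is small, presumably because the ease of imagining a meaning does not differ much between written and spoken words.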
APPENDIX A. Relationship between countries and capitals in two dimensions of the vector values learned by word2vec (after Mikolov et al., 2013)
[Figure: two-dimensional scatter plot of country and capital names; both axes run from -2 to 2]
APPENDIX B. Correlations among the frequency, familiarity and imageability values recorded in NTT-DB (from Amano & Kondo, 1999, 2000; Sakuma et al., 2008)