bug-apl


[Bug-apl] More word2vec


From: Fred Weigel
Subject: [Bug-apl] More word2vec
Date: Thu, 25 May 2017 12:20:08 -0400

Jürgen, GNU APL Gurus

More on my current AI in APL work. I have implemented functions
setup∆word2vec, distance and analogy in GNU APL. Run setup∆word2vec
first, and then distance (try 'dog' when prompted for input). Try
analogy with 'paris france berlin' (which should, of course, yield
germany). The file vector8 must be in the current directory when running
the setup function.
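
For readers without the workspace at hand, here is a minimal Python sketch
of what distance and analogy queries usually compute in word2vec-style
tools. It is not taken from the APL workspace: the function names only
mirror the APL ones, the three-dimensional vectors are invented for
illustration, and the real data would come from the vector8 file.

```python
import math

# Toy 3-d embeddings, invented for illustration only (not from vector8).
RAW = {
    "france":  [1.0, 1.0, 0.0],
    "paris":   [0.0, 1.0, 0.0],
    "germany": [1.0, 0.0, 1.0],
    "berlin":  [0.0, 0.0, 1.0],
    "dog":     [-1.0, 0.5, 0.5],
    "cat":     [-1.0, 0.4, 0.6],
}

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Unit-length vectors, so a dot product with a unit query is a cosine.
VECTORS = {w: normalize(v) for w, v in RAW.items()}

def nearest(query, exclude, k=3):
    """Return the k words whose vectors have the highest cosine with query."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, v)), w)
        for w, v in VECTORS.items()
        if w not in exclude
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

def distance(word, k=3):
    """Words closest to a single word (the word itself is skipped)."""
    return nearest(VECTORS[word], {word}, k)

def analogy(a, b, c, k=3):
    """a is to b as c is to ?: rank words near vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return nearest(target, {a, b, c}, k)

print(distance("dog"))
print(analogy("paris", "france", "berlin"))
```

With these toy vectors the top analogy result is germany, matching the
example above; with real embeddings the ranking depends on the trained
model.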

To use this, you will have to build mem.cc: put it into src/native in
your GNU APL source tree, add lib_mem.la to pkglib_LTLIBRARIES in the
Makefile.am there, and add the line 'lib_mem_la_SOURCES = mem.cc'.

You then need to run 'autoreconf', 'configure' and 'make'. Since this is
still early development, none of that has been automated. Also, this has
ONLY been run on 64-bit Linux (no other platform has been tried).

See describe∆word2vec for some details on data sizing. You can, of
course, examine the functions in the workspace without having a
lib_mem.so file, but the native functions it provides are needed to run
the sample.

Here are the files (gzip compressed).

https://www.dropbox.com/s/cfcaojjuzjxra7j/mem.cc.gz?dl=0
https://www.dropbox.com/s/97f5umkh3xd72cb/vector8.gz?dl=0
https://www.dropbox.com/s/pfheb6qic9wefqd/word2vec.xml.gz?dl=0

I am still using C code to generate vector8, but I would like to convert
the training to APL as well.

This is an embarrassingly parallel problem. I am thinking about how to
push the access to the dataset lower into the APL to achieve more
efficiency.
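
To illustrate why the lookup parallelizes so easily, here is a small
Python sketch: each word's score depends only on the query and that
word's own vector, so the vocabulary can be cut into chunks, each chunk
scored on its own, and the partial top-k lists merged at the end. The
vectors and the names score_chunk and parallel_nearest are hypothetical,
and threads are used only to show the decomposition (CPython threads
will not speed up this arithmetic).

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

# Toy vocabulary, invented for illustration only.
VOCAB = [
    ("france",  [1.0, 1.0, 0.0]),
    ("paris",   [0.0, 1.0, 0.0]),
    ("germany", [1.0, 0.0, 1.0]),
    ("berlin",  [0.0, 0.0, 1.0]),
    ("dog",     [-1.0, 0.5, 0.5]),
    ("cat",     [-1.0, 0.4, 0.6]),
]

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def score_chunk(query, chunk, k):
    """Top-k (score, word) pairs within one chunk; needs no other chunk."""
    return heapq.nlargest(k, ((cosine(query, v), w) for w, v in chunk))

def parallel_nearest(query, k=2, workers=2):
    """Score independent chunks concurrently, then merge the partial results."""
    chunks = [VOCAB[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ch: score_chunk(query, ch, k), chunks))
    merged = [pair for part in partials for pair in part]
    return [w for _, w in heapq.nlargest(k, merged)]

print(parallel_nearest([-1.0, 0.45, 0.55], k=2, workers=3))
```

The only shared step is the final merge of k candidates per chunk, which
is cheap compared with scoring the whole vocabulary.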

Any comments/feedback/ideas are welcome. This is a very simple AI
application, using (at present) a very small model. I am looking to
begin "scaling" this development soon. I need to be able to support both
very dense datasets and sparse datasets (using additional transfer
calls). The sparse datasets will be for tensor support. Again, feedback
is welcome. I haven't yet implemented any of the tensor stuff -- right
now I am concentrating on tooling issues (I like APL for this work).

Fred Weigel


