This is really interesting! I've experimented with a similar idea, but applying time-series forecasting to the sentence embeddings: https://github.com/Srakai/embcaster.
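(To give the flavour: this isn't embcaster's actual pipeline, just a toy sketch of what forecasting in embedding space looks like, with sentence-transformers and scikit-learn as stand-ins.)

    # Toy "forecasting in embedding space": embed a sequence of sentences,
    # fit a one-step-ahead linear predictor, and predict the next vector.
    from sklearn.linear_model import Ridge
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
    sentences = [
        "Monday: markets open flat.",
        "Tuesday: tech stocks rally.",
        "Wednesday: the rally broadens.",
    ]
    E = model.encode(sentences)                        # (T, d) array, one row per step
    forecaster = Ridge(alpha=1.0).fit(E[:-1], E[1:])   # multi-output linear regression
    next_vec = forecaster.predict(E[-1:])              # predicted embedding for step T+1
    # next_vec can then be decoded back to text with an inverter like vec2text.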
It turns out you can tokenise arbitrary information into a constant-size vector, which is really useful for later processing. vec2text (https://github.com/vec2text/vec2text) is an excellent tool if you want to reverse the embeddings back to text. Together that lets you encode arbitrary data into standardized vectors and decode it all the way back.
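A minimal sketch of that round trip, adapted from the vec2text README (it assumes the pretrained gtr-base corrector and a CUDA GPU; num_steps trades speed for reconstruction quality):

    import torch
    import vec2text
    from transformers import AutoModel, AutoTokenizer

    # Encoder/corrector pairing from the vec2text README.
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/gtr-t5-base")
    encoder = AutoModel.from_pretrained("sentence-transformers/gtr-t5-base").encoder.to("cuda")
    corrector = vec2text.load_pretrained_corrector("gtr-base")

    def embed(texts):
        """Mean-pooled GTR embeddings: variable-length text -> fixed-size vector."""
        inputs = tokenizer(texts, return_tensors="pt", max_length=128,
                           truncation=True, padding="max_length").to("cuda")
        with torch.no_grad():
            hidden = encoder(**inputs).last_hidden_state
        mask = inputs["attention_mask"].unsqueeze(-1)
        return (hidden * mask).sum(1) / mask.sum(1)

    vecs = embed(["You can round-trip arbitrary text through a fixed-size vector."])
    # Iteratively refine a hypothesis until its embedding matches the target vector.
    print(vec2text.invert_embeddings(embeddings=vecs, corrector=corrector, num_steps=20))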
londons_explore 1 day ago
You could probably make a jointly trained decoder that turns a vector back into the new document that most closely matches it.
It would be cool to add together the vectors for Harry Potter and The Lord of the Rings and then decode the sum into a new book about Frodo going to wizard school to collect the ring so he can help push Voldemort into Mount Doom.
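vec2text's inverter isn't jointly trained with the encoder, but it can approximate the same trick. A toy version of the mashup, reusing the embed() helper and corrector from the sketch above (with short stand-in sentences, since a whole novel won't fit in one 768-d GTR vector):

    # Embed two stand-in "books", average the vectors, decode the midpoint.
    e = embed([
        "Harry goes to wizard school and duels Voldemort.",
        "Frodo carries the ring to Mount Doom.",
    ])
    midpoint = e.mean(dim=0, keepdim=True)  # naive vector "addition", rescaled
    print(vec2text.invert_embeddings(embeddings=midpoint, corrector=corrector, num_steps=20))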