Picture a teacher calling on an unprepared student who isn't particularly studious but is highly inventive. He gives fantastically creative but typically wrong answers, entertaining the whole class. This neural network is programmed to generate similarly whimsical answers just for entertainment. Start a sentence and watch it spin out an amusing narrative for as long as you like, punctuating where necessary.
Although the neural network is trained on a corpus of Wikipedia articles, it must generate connected, logically coherent, plausible text WITHOUT QUOTING (!!!) pieces of these articles. The network succeeds if a layman in the subject would believe the generated text, and it fails if it outputs nonsense or a quote from a Wikipedia article!
A special parameter, “smoothing”, regulates the intellectual tone of the generated text. A network trained with small “smoothing” values returns “academic” texts, while large values make the output read like a “schoolboy” answer.
Find examples of generated text with the specified smoothing values below. Note that Smoothing = 0 should correspond to highly intelligent text, while Smoothing = 0.1 should correspond to a schoolboy's text. Seeded text is in bold. Training is still ongoing, so the current model output may have some rough edges; however, it gives you a flavor of the network's capabilities.
Smoothing = 0:
- **In the early twentieth century, it was suggested that** to develop a consistent understanding of the fundamental concepts of mathematics, it was sufficient to study observation. For example, a single electron in an unexcited atom is classically depicted as a particle moving in a circular path around the atomic nucleus...
- **Charles_darwin in his book** The Road to Serfdom (1944), Friedrich Hayek (1899–1992) asserted that the free-market understanding of economic freedom as present in capitalism is a requisite of political freedom. This philosophy is really sent to think that are said to be true of the evil trait that is very possible for it. Although many slaves have escaped or have been freed since 2007, as of 2012, only one slave owner had been sentenced to serve time in prison.
- **The idea of philosophy is** a source of academic discussion.
- **The story begins with** a chapter on the island of Thrinacia, with the crew overriding odysseus's wishes to remain away from the island.
- **Mathematics is one of** is one of the most important forms of philosophical knowledge.
Smoothing = 0.1:
- **In the early twentieth century, it was suggested that** the chinese crossbow was transmitted to the roman world on such occasions, although the greek gastraphetes provides an alternative origin. (My comment: a gastraphetes is an ancient Greek crossbow.)
- **Charles_darwin in his book** The Road to Serfdom (1944), friedrich hayek (1899–1992) asserted that the free-market understanding of economic freedom as present in capitalism is a requisite of political freedom. This philosophy is really not 206 and stated that it is good for the consequences of actions.
- **The idea of philosophy is** a myth.
- **The story begins with** a chapter on the Islands of Weathertop, and is known as Five Years.
- **Mathematics is one of** the most important aspects of the argues of the mathematicians.
This neural network predicts the next words in a sequence, enabling it to generate text that continues an input seed text. The model is trained on a text corpus, tokenized, and converted into numerical sequences for learning. The architecture uses embeddings, LSTMs, and feed-forward layers to predict multiple next words in a sequence.
This neural network (NN) predicts the following words in a text sequence (an incomplete sentence). It accepts a phrase and continues it for as long as needed, inserting appropriate punctuation. The purposes of this NN are:
- To test whether an NN can fool software aimed at detecting AI-generated texts.
- To serve as a simple, demonstrative example of a natural-language-processing NN.
- To entertain. The NN produces funny stories that make you wonder whether they are real.
The model is trained on a text corpus (generated by `Extract_wiki_text_content.py`), tokenized, and converted into numerical sequences for learning. The architecture uses embeddings, LSTMs, and feed-forward layers. Note that the NN uses its own tokenizer instead of the nltk package, so that potential users can inspect its machinery.
Training does not yet have a scheduler that reduces the learning rate, adds the second cost function, etc. This is still done manually, but it will be fixed in the near future.
- Corpus Loading: The dataset created by `Extract_wiki_text_content.py` is loaded using Python's `pickle` module.
- Tokenizer:
  - A custom tokenizer preprocesses text by adding spaces around punctuation and mapping words to unique indices.
  - The `Tokenizer` class includes methods to preprocess text, fit the tokenizer on a corpus, and convert text to sequences of indices.
- TextDataset:
  - Converts the tokenized corpus into input-output pairs for training. For each sequence, n-gram sub-sequences are created where a portion of the sequence is the input and the subsequent tokens are the target for prediction.
  - The dataset supports multi-word prediction through a `predict_steps` parameter.
- DataLoader:
  - Handles batching, shuffling, and padding sequences so that batches can be processed efficiently by the model. A custom `collate_fn` function is used for padding (a minimal sketch of this pipeline follows the list).
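The listing below is a minimal sketch of this pipeline, assuming the behaviour described above; the names mirror the description (`Tokenizer`, `collate_fn`, `predict_steps`), but the repository's actual implementations may differ in detail.

```python
import re
import torch
from torch.nn.utils.rnn import pad_sequence

class Tokenizer:
    """Minimal sketch: adds spaces around punctuation and maps words to unique indices."""
    def __init__(self):
        self.word_index = {"<pad>": 0}
        self.index_word = {0: "<pad>"}

    def preprocess(self, text):
        # Surround punctuation with spaces so each mark becomes a separate token.
        return re.sub(r"([.,!?;:])", r" \1 ", text.lower()).split()

    def fit(self, corpus):
        for text in corpus:
            for word in self.preprocess(text):
                if word not in self.word_index:
                    idx = len(self.word_index)
                    self.word_index[word] = idx
                    self.index_word[idx] = word

    def texts_to_sequences(self, texts):
        return [[self.word_index[w] for w in self.preprocess(t) if w in self.word_index]
                for t in texts]

def make_ngram_pairs(sequence, predict_steps):
    """For every prefix of a tokenized sequence, the next `predict_steps` tokens are the target."""
    pairs = []
    for i in range(1, len(sequence) - predict_steps + 1):
        pairs.append((sequence[:i], sequence[i:i + predict_steps]))
    return pairs

def collate_fn(batch):
    """Pad variable-length inputs so a batch can be stacked into one tensor."""
    inputs, targets = zip(*batch)
    inputs = pad_sequence([torch.tensor(x) for x in inputs], batch_first=True, padding_value=0)
    return inputs, torch.tensor(targets)
```

A `TextDataset` would collect the pairs produced by `make_ngram_pairs` over the whole corpus, and the `DataLoader` would be constructed with `collate_fn=collate_fn` to obtain padded batches.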
The `NextWordPredictor` model is designed to handle multi-word predictions and consists of the following components (a condensed sketch follows the list):
- Embedding Layer: Converts input tokens into dense vector representations of size `embed_size`.
- LSTM: A two-layer LSTM processes the input embeddings, capturing temporal dependencies in the sequence.
- Layer Normalization: Normalizes the LSTM's final hidden state for improved stability.
- Feed-Forward Layers: A series of fully connected layers (optionally with BatchNorm) process the hidden state to generate predictions.
- Final Linear Layer: Outputs a tensor of shape `(batch_size, predict_steps, vocab_size)` containing predictions for multiple words.
- Custom Weight Initialization: Xavier initialization is used for weights, and biases are initialized to zero for better convergence.
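A condensed sketch of such a model is shown below. It mirrors the components listed above; the hyperparameter defaults and the single output head producing `predict_steps * vocab_size` logits are assumptions for illustration, not necessarily the repository's exact choices.

```python
import torch
import torch.nn as nn

class NextWordPredictor(nn.Module):
    """Sketch of the architecture described above; details may differ from the repository."""
    def __init__(self, vocab_size, embed_size=256, hidden_size=512, predict_steps=3):
        super().__init__()
        self.predict_steps = predict_steps
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers=2, batch_first=True)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.feed_forward = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
        )
        # One linear head emits logits for all predicted steps at once.
        self.out = nn.Linear(hidden_size, predict_steps * vocab_size)
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        # Xavier initialization for linear weights, zeros for biases.
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x):
        embedded = self.embedding(x)             # (batch, seq_len, embed_size)
        _, (hidden, _) = self.lstm(embedded)     # hidden: (num_layers, batch, hidden_size)
        state = self.layer_norm(hidden[-1])      # final layer's last hidden state
        state = self.feed_forward(state)
        logits = self.out(state)                 # (batch, predict_steps * vocab_size)
        return logits.view(-1, self.predict_steps, self.vocab_size)
```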
Modern natural-language-processing developers prefer attention layers, or attention layers combined with LSTMs, over pure LSTMs. However, attention layers require a fairly large training set to outperform an LSTM, which is not the case for Storyteller.
The model uses two loss functions. The first is a custom loss function (`multi_word_loss`) that computes the average cross-entropy across the predicted steps. It is quite conventional for natural language processing neural networks.
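As a sketch, such a loss can be written as the average cross-entropy over the predicted positions. The function below illustrates that idea and is consistent with the description above, not necessarily identical to the repository's code.

```python
import torch.nn.functional as F

def multi_word_loss(logits, targets):
    """Average cross-entropy over the predicted steps.

    logits:  (batch_size, predict_steps, vocab_size)
    targets: (batch_size, predict_steps)
    """
    predict_steps = logits.size(1)
    losses = [F.cross_entropy(logits[:, step, :], targets[:, step])
              for step in range(predict_steps)]
    return sum(losses) / predict_steps
```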
The second loss, `LabelSmoothingLoss`, is also a cross-entropy loss, multiplied by a smoothing parameter that damps the target word's probability and increases the probability of the other words from the corpus. It helps avoid overconfidence; in other words, this cost mimics your hesitation about the correct answer to a question. The second cost should be switched on once training reaches a plateau after successive learning-rate reductions; this helps training to continue. The switch is done manually so far and will be automated in the future.
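A common label-smoothing formulation that matches this description is sketched below; the `smoothing` hyperparameter plays the role described above, while the exact implementation in the repository may differ.

```python
import torch
import torch.nn as nn

class LabelSmoothingLoss(nn.Module):
    """Cross-entropy against a smoothed target distribution (illustrative sketch)."""
    def __init__(self, vocab_size, smoothing=0.1):
        super().__init__()
        self.vocab_size = vocab_size
        self.smoothing = smoothing

    def forward(self, logits, targets):
        # logits: (batch, vocab_size); targets: (batch,) of word indices.
        log_probs = torch.log_softmax(logits, dim=-1)
        # Damp the target word's probability and spread the rest over the vocabulary.
        smooth_target = torch.full_like(log_probs, self.smoothing / (self.vocab_size - 1))
        smooth_target.scatter_(1, targets.unsqueeze(1), 1.0 - self.smoothing)
        return torch.mean(torch.sum(-smooth_target * log_probs, dim=-1))
```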
Good cost values: the model starts producing meaningful text when `multi_word_loss` returns values smaller than 0.35.
- The model generates text by recursively predicting the next tokens for a given seed text (an illustrative generation loop is sketched after this list).
- Predictions are translated back into words using the tokenizer's `index_word` dictionary.
- Since the neural network serves to amuse the user, the user performs the evaluation: the model can be considered well trained as soon as the user finds most of the answers amusing and logically structured.
- The user must put the seed text in `seeders.py` for inference.
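An illustrative greedy generation loop is sketched below; the function name `generate` and its arguments are assumptions for this sketch (the repository exposes `predict_sequence` for this purpose).

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, seed_text, num_words=50):
    """Recursively predict the next token and append it to the running text."""
    model.eval()
    tokens = tokenizer.texts_to_sequences([seed_text])[0]
    for _ in range(num_words):
        inputs = torch.tensor([tokens])
        logits = model(inputs)                     # (1, predict_steps, vocab_size)
        next_token = logits[0, 0].argmax().item()  # greedily take the first predicted step
        tokens.append(next_token)
    return " ".join(tokenizer.index_word[t] for t in tokens)
```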
- The training loop uses the Adam optimizer at the beginning of training and then alternates between `Adam` and `AdamW` once it reaches a plateau.
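A hedged sketch of such a switch is shown below; the plateau criterion (patience and tolerance values) is an assumption for illustration, since in the repository the switch is currently done by hand.

```python
import torch

def has_plateaued(recent_losses, patience=3, tol=1e-3):
    """True if the loss has not improved by more than `tol` over the last `patience` epochs."""
    if len(recent_losses) <= patience:
        return False
    return recent_losses[-patience - 1] - min(recent_losses[-patience:]) < tol

def switch_optimizer(model, optimizer, lr):
    """Swap between Adam and AdamW, keeping the current learning rate."""
    if type(optimizer) is torch.optim.Adam:
        return torch.optim.AdamW(model.parameters(), lr=lr)
    return torch.optim.Adam(model.parameters(), lr=lr)
```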
Copy the zip file with the code from this repository and unzip it in your home folder, or run in your terminal:
git clone https://github.com/Vlasenko2006/Storyteller.git
Once you have the code, install the required packages:
- Python 3.8+
- PyTorch
- tqdm
- scikit-learn
- numpy
- Anaconda (recommended for managing the environment)
I recommend installing the dependencies with Anaconda using a `.yaml` file (see below).
Find the `environment.yml` file in your folder. If you don't have it, copy and save the code below as `environment.yml`, and run `conda env create -f environment.yml` to create the environment.
name: story_gen
channels:
- pytorch
- defaults
- conda-forge
dependencies:
- python=3.8
- pytorch=1.10
- torchvision
- torchaudio
- cudatoolkit=11.3
- tqdm
- scikit-learn
- numpy
- pyyaml
- pip
- pip:
- wikipedia-api
Once you have created the environment, activate it by running the command below in your terminal:
conda activate story_gen
Once the environment is activated, you can run the script:
python story_telling_nn.py
Run the provided script to:
- Load the dataset.
- Train the model on the dataset.
- Periodically save checkpoints and generate text predictions.
After training, you can generate new stories by providing a seed text to the `predict_sequence` function.
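For illustration, a call might look like the snippet below; the argument names and order are assumptions for this sketch, so check `predict_sequence` in the repository for its actual signature.

```python
# Hypothetical usage -- argument names are assumptions, not the repository's exact API.
seed = "The story begins with"
story = predict_sequence(model, tokenizer, seed, num_words=50)
print(story)
```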
- Handles multi-word predictions.
- Customizable architecture: embedding size, LSTM size, feed-forward layers, and more can be adjusted.
- Flexible tokenizer with preprocessed text.
- Trains efficiently using `DataLoader` with padding support.