Embedding doc #424

Merged: 4 commits, Apr 26, 2018
Changes from 2 commits
12 changes: 5 additions & 7 deletions frontend/src/common/component/AppMenu.vue
@@ -63,13 +63,11 @@ export default {
        title: 'TEXTS',
        name: 'texts',
      },
-     /* // Hide the top menu
-     {
-       url: '/HighDimensional',
-       title: 'HighDimensional',
-       name: 'HighDimensional'
-     }
-     */
+     {
+       url: '/HighDimensional',
+       title: 'HighDimensional',
+       name: 'HighDimensional',
+     },
    ],
  };
},
8 changes: 7 additions & 1 deletion frontend/src/high-dimensional/HighDimensional.vue
@@ -7,6 +7,7 @@
      :search-text="config.searchText"
      :dimension="config.dimension"
      :embedding-data="embeddingData"
+     :show-loading="showLoading"
    />
  </div>
  <div class="visual-dl-page-right">
@@ -39,11 +40,12 @@ export default {
        searchText: '',
        displayWordLabel: true,
        dimension: '2',
-       reduction: 'tsne',
+       reduction: 'pca',
        selectedRun: '',
        running: true,
      },
      embeddingData: [],
+     showLoading: false,
    };
  },
  created() {
@@ -83,13 +85,17 @@
  },
  methods: {
    fetchDatasets() {
+     this.showLoading = true;
+
      // Fetch the data from the server. Passing dimension and reduction method
      let params = {
        dimension: this.config.dimension,
        reduction: this.config.reduction,
        run: this.config.selectedRun,
      };
      getHighDimensionalDatasets(params).then(({errno, data}) => {
+       this.showLoading = false;
+
        let vectorData = data.embedding;
        let labels = data.labels;

16 changes: 11 additions & 5 deletions frontend/src/high-dimensional/ui/Chart.vue
@@ -35,6 +35,10 @@ export default {
      type: String,
      required: true,
    },
+   showLoading: {
+     type: Boolean,
+     required: true,
+   },
  },
  data() {
    return {
@@ -53,15 +57,11 @@
  created() {},
  mounted() {
    this.createChart();
-   this.myChart.showLoading();
-
    this.set2DChartOptions();
    this.setDisplayWordLabel();
  },
  watch: {
    embeddingData: function(val) {
-     this.myChart.hideLoading();
-
      // Got new data, pass to the filter function to render the 'matched' set and 'not matched' set
      this.filterSeriesDataAndSetOption(this.searchText);
    },
@@ -70,7 +70,6 @@
    },
    dimension: function(val) {
      this.myChart.clear();
-     this.myChart.showLoading();
      if (val === '2') {
        this.set2DChartOptions();
        this.setDisplayWordLabel();
@@ -82,6 +81,13 @@
    searchText: function(val) {
      this.filterSeriesDataAndSetOption(val);
    },
+   showLoading: function(val) {
+     if (val) {
+       this.myChart.showLoading();
+     } else {
+       this.myChart.hideLoading();
+     }
+   },
  },
  methods: {
    createChart() {
7 changes: 4 additions & 3 deletions frontend/src/high-dimensional/ui/Config.vue
@@ -31,12 +31,13 @@
label="Reduction Method"
v-model="config.reduction"
dark>
<v-radio
label="T-SNE"
value="tsne"/>
<v-radio
label="PCA"
value="pca"/>
<v-radio
label="T-SNE"
value="tsne"/>

</v-radio-group>

<v-radio-group
15 changes: 12 additions & 3 deletions visualdl/logic/pybind.cc
@@ -253,10 +253,19 @@ PYBIND11_MODULE(core, m) {
.def("total_records", &cp::TextReader::total_records)
.def("size", &cp::TextReader::size);

py::class_<cp::Embedding>(m, "EmbeddingWriter")
py::class_<cp::Embedding>(m, "EmbeddingWriter", R"pbdoc(
PyBind class. Must instantiate through the LogWriter.
)pbdoc")
.def("set_caption", &cp::Embedding::SetCaption)
.def("add_embeddings_with_word_list",
&cp::Embedding::AddEmbeddingsWithWordList);
.def(
"add_embeddings_with_word_list"
R"pbdoc(
Add embedding record. Each run can only store one embedding data.

:param embedding: hot vector of embedding words
:type embedding: list
)pbdoc",
&cp::Embedding::AddEmbeddingsWithWordList);

py::class_<cp::EmbeddingReader>(m, "EmbeddingReader")
.def("get_all_labels", &cp::EmbeddingReader::get_all_labels)
13 changes: 13 additions & 0 deletions visualdl/python/storage.py
@@ -143,6 +143,9 @@ def text(self, tag):
        return self.reader.get_text(tag)

    def embedding(self):
+       """
+       Get the embedding reader.
+       """
        return self.reader.get_embedding(EMBEDDING_TAG)

    def audio(self, tag):
@@ -292,9 +295,19 @@ def text(self, tag):
        return self.writer.new_text(tag)

    def embedding(self):
+       """
+       Create an embedding writer that used to write
+       embedding data.

Collaborator: "that used to write" should read "that is used to write".

+
+       :return: A embedding writer to record embedding data

Collaborator: "A embedding writer" should read "An embedding writer".

+       :rtype: embeddingWriter
+       """
        return self.writer.new_embedding(EMBEDDING_TAG)

    def save(self):
+       """
+       Force the VisualDL to sync with the file system.
+       """
        self.writer.save()

    def __enter__(self):
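For orientation, here is a minimal usage sketch of the API these docstrings describe. It is an illustration, not part of the diff: the LogWriter/LogReader setup follows VisualDL's existing examples, and the words and vectors below are made up.

    import numpy as np
    from visualdl import LogReader, LogWriter

    log_writer = LogWriter("./log", sync_cycle=10)

    # Write one embedding record (each run can only store one).
    with log_writer.mode("train") as writer:
        embedding_writer = writer.embedding()
        word_list = ["apple", "banana", "cherry"]
        embeddings = np.random.rand(3, 128).tolist()  # one vector per word
        embedding_writer.add_embeddings_with_word_list(embeddings, word_list)

    # Read it back, mirroring how lib.py's get_embeddings() consumes it.
    log_reader = LogReader("./log")
    with log_reader.mode("train") as reader:
        embedding_reader = reader.embedding()
        labels = embedding_reader.get_all_labels()
        vectors = embedding_reader.get_all_embeddings()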
37 changes: 28 additions & 9 deletions visualdl/server/lib.py
@@ -307,19 +307,18 @@ def get_embeddings(storage, mode, reduction, dimension=2, num_records=5000):
    with storage.mode(mode) as reader:
        embedding = reader.embedding()
        labels = embedding.get_all_labels()
-       high_dimensional_vectors = embedding.get_all_embeddings()
+       high_dimensional_vectors = np.array(embedding.get_all_embeddings())

        # TODO: Move away from sklearn
        if reduction == 'tsne':
-           from sklearn.manifold import TSNE
-           tsne = TSNE(
-               perplexity=30, n_components=dimension, init='pca', n_iter=5000)
-           low_dim_embs = tsne.fit_transform(high_dimensional_vectors)
+           import tsne
+           low_dim_embs = tsne.tsne(
+               high_dimensional_vectors,
+               dimension,
+               initial_dims=50,
+               perplexity=30.0)

        elif reduction == 'pca':
-           from sklearn.decomposition import PCA
-           pca = PCA(n_components=3)
-           low_dim_embs = pca.fit_transform(high_dimensional_vectors)
+           low_dim_embs = simple_pca(high_dimensional_vectors, dimension)

        return {"embedding": low_dim_embs.tolist(), "labels": labels}

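As an aside, `import tsne` above presumably picks up a vendored copy of the reference t-SNE script rather than a pip package; the call assumes the classic entry point tsne(X, no_dims, initial_dims, perplexity). A hedged sketch of that contract:

    # Assumes a local tsne.py module exposing the reference implementation's
    # signature: tsne(X, no_dims=2, initial_dims=50, perplexity=30.0).
    import numpy as np
    import tsne

    X = np.random.rand(200, 128)  # e.g. 200 words, 128-d embeddings
    low = tsne.tsne(X, 2, initial_dims=50, perplexity=30.0)
    print(low.shape)  # (200, 2)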
@@ -393,3 +392,23 @@ def _handler(key, func, *args, **kwargs):
        return data

    return _handler
+
+
+# A simple PCA implementation to do the dimension reduction.
Collaborator: Normally we comment methods like this:

    def kos_root():
        """Return the pathname of the KOS root directory."""
        global _kos_root
        if _kos_root: return _kos_root
        ...
+def simple_pca(x, dimension):
+    # Center the data.
+    x -= np.mean(x, axis=0)
+
+    # Compute the covariance matrix.
+    cov = np.cov(x, rowvar=False)
+
+    # Get eigenvectors and eigenvalues from the covariance matrix.
+    eigvals, eigvecs = np.linalg.eig(cov)
Collaborator: What a math guy, Jeff! But do we need to do it ourselves? Moreover, SVD is more stable than eigendecomposition for computing principal components. Check here.

Author: The reason behind it is so we don't have to import another pip package to do the calculation. And actually the TSNE file already has a clean PCA implementation, and I could just use that.

+    # Sort the eigvals from high to low.
+    order = np.argsort(eigvals)[::-1]
+
+    # Drop the eigenvectors with low eigenvalues.
+    eigvecs = eigvecs[:, order[:dimension]]
+
+    return np.dot(x, eigvecs)
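Following up on the review thread, here is a minimal sketch of the SVD-based alternative the reviewer alludes to (not part of this PR). For a centered matrix, the rows of V^T from numpy's SVD are the principal directions, already sorted by singular value, and the decomposition is numerically more stable than eigendecomposition of the covariance matrix:

    import numpy as np

    def svd_pca(x, dimension):
        # Center the data without mutating the caller's array.
        x = x - np.mean(x, axis=0)
        # Rows of vt are the principal directions, sorted by singular value.
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        # Project onto the top `dimension` components.
        return np.dot(x, vt[:dimension].T)

For well-conditioned inputs this agrees with simple_pca above up to a sign flip per component.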