<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Semantic Similarity Table - Calculate Batch Similarity with AI Models</title>
<!-- Meta Descriptions and Keywords for SEO -->
<meta name="description" content="Use Semantic Similarity Table to batch-process Excel or CSV files and calculate the semantic similarity of every row to a text query using AI models. Runs privately in your browser.">
<meta name="keywords" content="semantic similarity, text analysis, AI model, batch processing, transformers.js, model2vec, NLP, CSV, Excel, private data processing">
<!-- Open Graph / Facebook -->
<meta property="og:title" content="Semantic Similarity Table - Batch Text Analysis with AI Models">
<meta property="og:description" content="Easily process your Excel or CSV files for semantic similarity analysis using AI models with complete data privacy.">
<meta property="og:image" content="path/to/thumbnail.jpg">
<meta property="og:url" content="https://yourwebsite.com">
<meta property="og:type" content="website">
<!-- Twitter -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Semantic Similarity Table - Batch Text Analysis with AI Models">
<meta name="twitter:description" content="Easily process your Excel or CSV files for semantic similarity analysis using AI models with complete data privacy.">
<meta name="twitter:image" content="path/to/thumbnail.jpg">
<!-- Favicon -->
<link rel="icon" href="path/to/favicon.ico" type="image/x-icon">
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>
<script src="https://cdn.tailwindcss.com"></script>
<script>
tailwind.config = {
darkMode: 'class'
}
</script>
</head>
<body class="bg-gray-50 dark:bg-gray-900 min-h-screen transition-colors duration-200">
<div class="max-w-4xl mx-auto p-6">
<!-- Theme Switcher -->
<div class="absolute top-2 right-2 md:top-4 md:right-4">
<div class="flex items-center space-x-2 bg-white dark:bg-gray-800 p-2 rounded-lg shadow-sm">
<select id="themeSelect"
class="bg-gray-100 dark:bg-gray-700 text-gray-800 dark:text-gray-200 rounded px-2 py-1 text-sm">
<option value="system">System</option>
<option value="light">Light</option>
<option value="dark">Dark</option>
</select>
</div>
</div>
<div class="mb-10">
<!-- Logo and Title -->
<div class="flex flex-wrap items-center justify-center md:justify-start space-x-0 md:space-x-4 space-y-2 md:space-y-0">
<img src="data/sst.svg" alt="Semantic Similarity Table Logo" class="h-12 md:h-16 w-auto">
<h1 class="text-3xl font-bold text-gray-900 dark:text-white text-center md:text-left">
Semantic Similarity Table -
<a href="https://github.com/do-me/semantic-similarity-table" target="_blank" class="text-blue-500 hover:underline">
GitHub
</a>
</h1>
</div>
<p class="text-gray-600 dark:text-gray-400">
Drag and drop your Excel or CSV file to calculate, in batch, the semantic similarity between a text query and every row using the latest embedding models.
Everything runs privately in your browser, using <a href="https://github.com/huggingface/transformers.js" target="_blank" class="text-blue-500 hover:underline">transformers.js</a>
with a small model from <a href="https://github.com/MinishLab/model2vec" target="_blank" class="text-blue-500 hover:underline">model2vec</a>.
Your data stays on your computer and is not sent anywhere!<br>
Need sample data? Use <a href="data/legal_texts_EU.xlsx" download="legal_texts_EU.xlsx" class="text-blue-500 hover:underline">legal_texts_EU.xlsx</a>,
<a href="data/legal_texts_EU.csv" download="legal_texts_EU.csv" class="text-blue-500 hover:underline">legal_texts_EU.csv</a> or
<a href="data/SDG_Targets_2023.xlsx" download="SDG_Targets_2023.xlsx" class="text-blue-500 hover:underline">SDG_Targets_2023.xlsx</a>.
Note that the first run is slower because the model is downloaded and cached once (the smallest is only 32 MB); subsequent runs are nearly instant.
Created by <a href="https://www.linkedin.com/in/dominik-weckm%C3%BCller/" target="_blank" class="text-blue-500 hover:underline">Dominik Weckmüller</a>
(<a href="https://geo.rocks" target="_blank" class="text-blue-500 hover:underline">Blog</a>).<br>
Some of my other projects that inspired Semantic Similarity Table:
<a href="https://do-me.github.io/SemanticFinder/" target="_blank" class="text-blue-500 hover:underline">SemanticFinder</a>,
<a href="https://do-me.github.io/js-text-chunker/" target="_blank" class="text-blue-500 hover:underline">JS Text Chunker</a>,
<a href="https://github.com/do-me/SDG-Analyzer" target="_blank" class="text-blue-500 hover:underline">SDG Analyzer</a>,
<a href="https://do-me.github.io/semantic-hexbins/" target="_blank" class="text-blue-500 hover:underline">Geospatial Semantic Search</a>,
<a href="https://do-me.github.io/qdrant-frontend/" target="_blank" class="text-blue-500 hover:underline">Qdrant Frontend</a>.
I'm available as a freelancer for anything related to AI, LLMs and NLP, particularly semantic search and RAG. Let's connect on <a href="https://www.linkedin.com/in/dominik-weckm%C3%BCller/" target="_blank" class="text-blue-500 hover:underline">LinkedIn</a>!
</p>
</div>
<div class="space-y-6">
<div id="dropZone"
class="border-2 border-dashed border-gray-300 dark:border-gray-700 rounded-lg p-12 text-center hover:border-blue-500 transition-colors duration-200 cursor-pointer bg-white dark:bg-gray-800">
<div class="space-y-4">
<div class="flex justify-center">
<svg xmlns="http://www.w3.org/2000/svg" class="h-12 w-12 text-gray-400 dark:text-gray-500"
fill="none" viewBox="0 0 24 24" stroke="currentColor">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12" />
</svg>
</div>
<div class="text-gray-600 dark:text-gray-400">
<span class="font-medium">Drop your Excel or CSV file here</span><br>
<span class="text-sm">or click to select</span>
</div>
</div>
<input type="file" id="fileInput" class="hidden" accept=".xlsx,.xls,.csv">
</div>
<div id="columnSettings" class="bg-white dark:bg-gray-800 rounded-lg p-6 shadow-sm space-y-4 hidden">
<!-- Model selection dropdown -->
<div class="space-y-2">
<label for="modelSelect" class="block text-sm font-medium text-gray-700 dark:text-gray-300">
Select the model to use (more parameters give better quality but a larger download)
</label>
<select id="modelSelect"
class="block w-full rounded-md border-gray-300 dark:border-gray-600 shadow-sm focus:border-blue-500 focus:ring-blue-500 sm:text-sm p-2 border bg-white dark:bg-gray-700 text-gray-900 dark:text-gray-100">
<option value="minishlab/potion-base-2M">minishlab/potion-base-2M (32 MB, good model)</option>
<option value="minishlab/potion-base-4M">minishlab/potion-base-4M (35 MB, better model)</option>
<option selected value="minishlab/potion-base-8M">minishlab/potion-base-8M (50 MB, best model)</option>
<option value="minishlab/potion-science-8M">minishlab/potion-science-8M (50 MB, best model for scientific texts)</option>
<option value="minishlab/M2V_multilingual_output">minishlab/M2V_multilingual_output (548 MB, multilingual model)</option>
<option value="minishlab/M2V_base_output">minishlab/M2V_base_output (32 MB, older model)</option>
<!--<option value="minishlab/M2V_base_glove">minishlab/M2V_base_glove (32Mb)</option>
<option value="minishlab/M2V_base_glove_subword">minishlab/M2V_base_glove_subword (32Mb)</option>-->
</select>
</div>
<div class="space-y-2">
<label for="columnName" class="block text-sm font-medium text-gray-700 dark:text-gray-300">Text
column to use for semantic similarity (the first row is treated as the header row)</label>
<input type="text" id="columnName"
class="block w-full rounded-md border-gray-300 dark:border-gray-600 shadow-sm focus:border-blue-500 focus:ring-blue-500 sm:text-sm p-2 border bg-white dark:bg-gray-700 text-gray-900 dark:text-gray-100"
placeholder="Enter the name of the column containing the text to analyze" value="text">
</div>
<div class="space-y-2">
<label for="queryText" class="block text-sm font-medium text-gray-700 dark:text-gray-300">Query
text (theoretically unlimited in length, but for best results keep it under 300 words)</label>
<input type="text" id="queryText"
class="block w-full rounded-md border-gray-300 dark:border-gray-600 shadow-sm focus:border-blue-500 focus:ring-blue-500 sm:text-sm p-2 border bg-white dark:bg-gray-700 text-gray-900 dark:text-gray-100"
placeholder="Enter query text" value="Incredibly tasty food">
</div>
<!-- Checkboxes for each optional output column -->
<div class="space-y-4">
<label class="block text-sm font-medium text-gray-700 dark:text-gray-300">Select columns to
add:</label>
<div>
<label class="inline-flex items-center">
<input type="checkbox" id="maxSimCheckbox" class="text-blue-600 dark:text-blue-400" checked>
<span class="ml-2 text-gray-700 dark:text-gray-300">Max similarity</span>
</label>
</div>
<div>
<label class="inline-flex items-center">
<input type="checkbox" id="meanSimCheckbox" class="text-blue-600 dark:text-blue-400"
checked>
<span class="ml-2 text-gray-700 dark:text-gray-300">Mean similarity</span>
</label>
</div>
<div>
<label class="inline-flex items-center">
<input type="checkbox" id="maxSimChunkCheckbox" class="text-blue-600 dark:text-blue-400"
checked>
<span class="ml-2 text-gray-700 dark:text-gray-300">Max similarity chunk</span>
</label>
</div>
<div>
<label class="inline-flex items-center">
<input type="checkbox" id="chunksCheckbox" class="text-blue-600 dark:text-blue-400" checked>
<span class="ml-2 text-gray-700 dark:text-gray-300">Chunk count</span>
</label>
</div>
<div>
<label class="inline-flex items-center">
<input type="checkbox" id="embeddingsCheckbox" class="text-blue-600 dark:text-blue-400">
<span class="ml-2 text-gray-700 dark:text-gray-300">Include embeddings (produces large files; use only with CSV, as XLSX cells are limited to 32,767 characters)</span>
</label>
</div>
</div>
<button id="processButton"
class="w-full bg-blue-600 dark:bg-blue-500 text-white px-4 py-2 rounded-md hover:bg-blue-700 dark:hover:bg-blue-600 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 dark:focus:ring-offset-gray-800 transition-colors duration-200">
Process File
</button>
<div id="status" class="hidden">
<div class="bg-green-50 dark:bg-green-900 rounded-md p-4">
<div class="flex">
<div class="flex-shrink-0">
<svg class="h-5 w-5 text-green-400 dark:text-green-300"
xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor">
<path fill-rule="evenodd"
d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.707-9.293a1 1 0 00-1.414-1.414L9 10.586 7.707 9.293a1 1 0 00-1.414 1.414l2 2a1 1 0 001.414 0l4-4z"
clip-rule="evenodd" />
</svg>
</div>
<div class="ml-3">
<p id="statusMessage" class="text-sm font-medium text-green-800 dark:text-green-200">
</p>
</div>
</div>
</div>
</div>
</div>
</div>
<script type="module" src="main.js"></script>
</body>
</html>