<div id="about" class="section level1" number="1">
<h1><span class="header-section-number">1</span> About</h1>
<p>This is the documentation of the ADViSELipidomics package. ADViSELipidomics is a novel Shiny app for the preprocessing, analysis, and visualization of lipidomics data. It handles the outputs of LipidSearch and LIQUID for lipid identification and quantification, as well as data available from the Metabolomics Workbench. ADViSELipidomics extracts information by parsing lipid species (using the LIPID MAPS classification) and, together with the information available on the samples, supports several exploratory and statistical analyses. When internal lipid standards are present in the experiment, ADViSELipidomics can normalize the data matrix, providing absolute concentration values per lipid and sample. Moreover, it identifies differentially abundant lipids in simple and complex experimental designs, including batch effect correction.</p>
<p>If you use <strong>ADViSELipidomics</strong> in your publications, we would appreciate it if you cite:</p>
<p>E. Del Prete <em>et al.</em> (2022) ADViSELipidomics: a workflow for analyzing lipidomics data DOI: …………</p>
<!--chapter:end:index.Rmd-->
</div>
<div id="install" class="section level1" number="2">
<h1><span class="header-section-number">2</span> Install</h1>
<p>ADViSELipidomics is a stand-alone Shiny application developed in the RStudio IDE (RStudio > 1.4) and implemented in the R language (R > 4.0). It is available on GitHub: <a href="https://github.com/ShinyFabio/ADViSELipidomics" class="uri">https://github.com/ShinyFabio/ADViSELipidomics</a>. ADViSELipidomics is multi-platform: we tested it on the main operating systems (Windows 10, Windows 11, macOS 12, Ubuntu 18, and Ubuntu 20).
The user must first install R (<a href="https://www.r-project.org" class="uri">https://www.r-project.org</a>) and RStudio (<a href="https://www.rstudio.com" class="uri">https://www.rstudio.com</a>), if they are not already installed. Then, before installing ADViSELipidomics, the user may need to perform a few additional steps that depend on the operating system:</p>
<ul>
<li><p><strong>Windows</strong> Install Rtools, a collection of tools necessary for building R packages in Windows, available at the following link: <a href="https://cran.r-project.org/bin/windows/Rtools" class="uri">https://cran.r-project.org/bin/windows/Rtools</a></p></li>
<li><p><strong>macOS</strong> Run the following commands in a terminal (they require the Homebrew package manager):</p></li>
</ul>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>brew install imagemagick<span class="sc">@</span><span class="dv">6</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>brew install cairo</span></code></pre></div>
<ul>
<li><strong>Ubuntu</strong> Run the following commands in a terminal:</li>
</ul>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>sudo apt install build<span class="sc">-</span>essential libcurl4<span class="sc">-</span>gnutls<span class="sc">-</span>dev libxml2<span class="sc">-</span>dev libssl<span class="sc">-</span>dev</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>sudo apt<span class="sc">-</span>get install libcairo2<span class="sc">-</span>dev</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>sudo apt<span class="sc">-</span>get install libxt<span class="sc">-</span>dev</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>sudo apt install libmagick<span class="sc">++-</span>dev</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>sudo apt<span class="sc">-</span>get install libc6</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>sudo apt<span class="sc">-</span>get install libnlopt<span class="sc">-</span>dev</span></code></pre></div>
<p>Then, on all operating systems, ADViSELipidomics can be installed by typing the following code in the RStudio console:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span>(<span class="sc">!</span><span class="fu">require</span>(<span class="st">"devtools"</span>)){</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">install.packages</span>(<span class="st">"devtools"</span>)</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(devtools)</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="fu">install_github</span>(<span class="st">"ShinyFabio/ADViSELipidomics"</span>)</span></code></pre></div>
<p>We recommend updating all the R packages requested during the installation of ADViSELipidomics. Note that if many packages need to be installed and you choose to compile them from source, the process can take a long time, depending on your hardware and operating system.
Finally, to launch ADViSELipidomics, type the following code in the RStudio console:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(ADViSELipidomics)</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">run_ADViSELipidomics</span>()</span></code></pre></div>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/circle-exclamation.svg" width="15" height="15"> <strong>NOTE</strong><br />
Depending on the screen size and especially the resolution of your monitor, the ADViSELipidomics interface may look slightly different from its intended layout. Try zooming in or out with <strong>Ctrl +</strong>/<strong>Ctrl -</strong> on Windows or <strong>Command +</strong>/<strong>Command -</strong> on Mac.</p>
<p>When a new version of ADViSELipidomics is released, it can be updated by running the same code used for the installation.</p>
<!--chapter:end:01-intro.Rmd-->
</div>
<div id="inputdata" class="section level1" number="3">
<h1><span class="header-section-number">3</span> Input Data</h1>
<p>ADViSELipidomics allows the user to import files concerning different types of data:</p>
<ul>
<li><strong>LipidSearch</strong> or <strong>LIQUID.</strong> ADViSELipidomics deals with the data files containing information on chromatographic peak area or peak intensity per lipid, obtained as output from external software for identifying and quantifying lipids (ADViSELipidomics currently supports the output formats from LipidSearch or LIQUID). Moreover, it requires the Target File with details on samples (such as treatments or biological replicates), and the Internal Reference File with bounds for the filtering step in the following modules. ADViSELipidomics shows a quality plot based on the sum of chromatographic peak area per sample (or replicate). In the case of LipidSearch output associated with internal lipid standards, ADViSELipidomics also requires all the Calibration Files for the construction of the calibration curves.</li>
<li><strong>Metabolomics Workbench.</strong> ADViSELipidomics can download selected lipidomic experiments in real time from the online repository;</li>
<li><strong>Excel.</strong> The user can upload two Excel files: the data matrix and the Target File;</li>
<li><strong>SummarizedExperiment.</strong> The user can upload a SummarizedExperiment R object (SE), with several types of information (data matrix, information on lipids, information on samples, metadata if available).</li>
</ul>
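<p>The quality plot is produced by ADViSELipidomics itself; the snippet below is only a minimal base-R sketch of the underlying idea, assuming the peak areas are already in a numeric matrix with lipids in rows and samples (or replicates) in columns. The lipid names, sample names, and values are invented for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy peak-area matrix (invented values): lipids in rows, samples in columns
areas <- matrix(
  c(1200,  950, 1100,
     300,   NA,  280,
    4100, 3900, 4300),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("PC(16:0_18:1)", "PE(18:0_20:4)", "TG(16:0_18:1_18:2)"),
    c("AF-1C-M_1", "AF-1C-M_2", "AF-1C-M_3")
  )
)

# Sum of peak area per sample, ignoring missing values
total_area <- colSums(areas, na.rm = TRUE)
total_area

# Bar chart of the totals, one bar per sample
barplot(total_area, ylab = "Sum of peak area", las = 2)</code></pre></div>
<p>A sample whose total is much lower or higher than the others may deserve a closer look before further analysis.</p>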
<p>As a result, the required files differ between data types. The following list summarizes the required files for each data type:</p>
<ul>
<li><strong>LipidSearch with Internal Standard lipid:</strong>
<ul>
<li>Target file (.xlsx)</li>
<li>Internal Reference file (.xlsx)</li>
<li>Data files coming from LipidSearch related to your samples (.txt)</li>
<li>Calibration File for deuterated (.xlsx)</li>
<li>Calibration File for nonlabeled (.xlsx)</li>
<li>Concentration files coming from LipidSearch related to the internal standard (.txt)</li>
</ul></li>
<li><strong>LipidSearch without Internal Standard lipid:</strong>
<ul>
<li>Target file (.xlsx)</li>
<li>Internal Reference file (.xlsx)</li>
<li>Data files coming from LipidSearch related to your samples (.txt)</li>
</ul></li>
<li><strong>LIQUID</strong>
<ul>
<li>Target file (.xlsx)</li>
<li>Internal Reference file (.xlsx)</li>
<li>Output coming from LIQUID related to your samples (.tsv)</li>
</ul></li>
<li><strong>User’s Excel File</strong>
<ul>
<li>Target file (.xlsx)</li>
<li>Data Matrix File (.xlsx)</li>
</ul></li>
<li><strong>SummarizedExperiment</strong>
<ul>
<li>SummarizedExperiment object (.rds)</li>
</ul></li>
</ul>
<p>For <strong>Metabolomics Workbench</strong>, you do not need to import anything: just choose the Metabolomics Workbench study ID.</p>
<p>Before running ADViSELipidomics, make sure that you have all the required files and that they are filled in correctly. Apart from the output files of LipidSearch and LIQUID, ADViSELipidomics requires the Excel files to have a given structure with some mandatory columns. The following sections explain how to create these Excel files.</p>
<div id="sec21" class="section level2" number="3.1">
<h2><span class="header-section-number">3.1</span> Data files (LipidSearch or LIQUID)</h2>
<p>LipidSearch and LIQUID produce text files containing information on the chromatographic peak area or peak intensity per lipid. If your data come from <strong>LipidSearch</strong>, you should have a deuterated file and a nonlabeled file for each sample (or replicate), with a .txt extension. ADViSELipidomics accepts files from LipidSearch version 4.2.29. If your data come from <strong>LIQUID</strong>, you can have a positive and a negative file (with a .tsv extension). In both cases, put your data files in a folder and rename each file with its sample ID, as described below.</p>
<p><strong>Example:</strong><br />
Your sample is called “AF-1C-M” and you have two technical replicates. Depending on the software that produced the data, the data files should be named:</p>
<ul>
<li>for LipidSearch: “AF-1C-M_deuterated_1.txt”, “AF-1C-M_nonlabeled_1.txt”, “AF-1C-M_deuterated_2.txt”, and “AF-1C-M_nonlabeled_2.txt”</li>
<li>for LIQUID: “AF-1C-M_positive_1.tsv”, “AF-1C-M_negative_1.tsv”, “AF-1C-M_positive_2.tsv”, and “AF-1C-M_negative_2.tsv”</li>
</ul>
<p>The last two characters (e.g. “_1”) identify the technical replicate. If you do not have technical replicates, just remove these two characters (for example, with LipidSearch you should have “AF-1C-M_deuterated.txt” and “AF-1C-M_nonlabeled.txt”).</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/circle-exclamation.svg" width="15" height="15"> <strong>NOTE</strong><br />
When choosing your sample names, avoid special characters and <strong>DO NOT use underscores (_)</strong>. ADViSELipidomics uses this character to split the file name into three parts: the sample name, the type of file (deuterated/nonlabeled or positive/negative), and the technical replicate, as shown in the following picture:</p>
<p><img src="images/notation_files.png" width="100%" /></p>
<p>For example, a bad name could be “Blood_bag_deuterated_1.txt”, while a good name is “Blood-bag_deuterated_1.txt”.</p>
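<p>The naming rule can be checked in R before uploading. The helper below, <code>parse_data_file()</code>, is hypothetical (it is not part of ADViSELipidomics) and simply mirrors the convention described above: it splits a file name on underscores into sample name, file type, and optional technical replicate, and rejects names with too many underscores.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper, not part of ADViSELipidomics: split a data file name
# into sample name, file type and optional technical replicate
parse_data_file <- function(file_name) {
  # Drop the folder and the extension (.txt for LipidSearch, .tsv for LIQUID)
  stem  <- tools::file_path_sans_ext(basename(file_name))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # Two parts: no technical replicate; three parts: with replicate
  if (!(length(parts) %in% c(2, 3))) {
    stop("Unexpected file name (check the underscores): ", file_name)
  }
  list(
    sample    = parts[1],
    type      = parts[2],
    replicate = if (length(parts) == 3) parts[3] else NA_character_
  )
}

parse_data_file("AF-1C-M_deuterated_1.txt")  # sample, type and replicate "1"
parse_data_file("AF-1C-M_nonlabeled.txt")    # replicate is NA
# "Blood_bag_deuterated_1.txt" splits into four parts, so this call fails:
# parse_data_file("Blood_bag_deuterated_1.txt")</code></pre></div>
<p>Applying it with <code>lapply(list.files(folder), parse_data_file)</code>, where <code>folder</code> is your data folder, is one way to spot badly named files before ADViSELipidomics reads them.</p>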
</div>
<div id="sec22" class="section level2" number="3.2">
<h2><span class="header-section-number">3.2</span> Target File (LipidSearch, LIQUID, and User’s Excel File)</h2>
<p>The Target File is an Excel file that contains all the information about your samples. It is the most important file, since it is used for the LipidSearch, LIQUID, and User’s Excel File imports. It requires some mandatory columns that must be filled in according to the following criteria:</p>
<ul>
<li><strong>SampleID (LipidSearch, LIQUID, and User’s Excel)</strong> this column contains the ID of each sample. To prevent errors, write the IDs as “samplename_1”, where “samplename” is your sample name and “_1” identifies the technical replicate. If your experiment does not have technical replicates, simply write “samplename”. A good SampleID is “AF-1C_1” if technical replicates are present, or “AF-1C” if not. A bad SampleID is “AF_1C_1” (it contains an extra underscore).</li>
<li><strong>File_name (LipidSearch, LIQUID)</strong> this column contains the name of the data files coming from LipidSearch or LIQUID. In both cases, for each sample there are two different files. In LipidSearch you have a “deuterated” and a “nonlabeled” file, while in LIQUID you have a “positive” and a “negative” file. Depending on your data type, write both file names in the corresponding cell separated by a <strong>semicolon “;”</strong> without any space.</li>
<li><strong>Norm_factor (LipidSearch, LIQUID - optional)</strong> if you need to normalize your data by a normalization factor, add this column and write a number for each sample (check that decimal separators are entered correctly). If this column is not present, the data are not normalized.</li>
</ul>
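<p>The two formatting rules above can be checked in R before uploading the Target File. This is a hypothetical pre-upload check, not a function provided by ADViSELipidomics: the toy Target File is built in memory with invented values, and reading the real file with <code>readxl::read_excel()</code> is an assumption about your setup.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy Target File (invented values). In practice it would be read from the
# .xlsx file, for example with readxl::read_excel().
target <- data.frame(
  SampleID  = c("AF-1C_1", "AF-1C_2", "AF_1C_3"),
  File_name = c("AF-1C_deuterated_1.txt;AF-1C_nonlabeled_1.txt",
                "AF-1C_deuterated_2.txt; AF-1C_nonlabeled_2.txt",
                "AF_1C_deuterated_3.txt;AF_1C_nonlabeled_3.txt"),
  stringsAsFactors = FALSE
)

# SampleID: at most one underscore, the one before the technical replicate
n_underscores <- nchar(gsub("[^_]", "", target$SampleID))
bad_ids <- target$SampleID[n_underscores > 1]

# File_name: exactly two file names separated by ";" and no spaces
file_lists <- strsplit(target$File_name, ";", fixed = TRUE)
bad_file_names <- target$SampleID[lengths(file_lists) != 2 |
                                    grepl(" ", target$File_name, fixed = TRUE)]

bad_ids         # "AF_1C_3"
bad_file_names  # "AF-1C_2"</code></pre></div>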
<p>The picture below shows a Target File example, with the mandatory columns highlighted in yellow and the optional column in green. You can add any other informative columns to the Target File; just avoid special characters such as \^$?*/|+()[]-{} and whitespace. Use <strong>_</strong> instead of whitespace. For example, a column “Bio_replicate” may contain values like “BD1” or “BD_1” but not “BD-1” (the Differential Analysis does not work if values contain “-”).</p>
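<p>Values in the informative columns can be screened for the characters listed above with a regular expression. This is a minimal sketch, assuming the column has been read into a character vector; the pattern is ours, built from the character list in this section, and replacing offending characters with “_” is only one possible fix.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy informative column: "BD-1" and "BD 1" break the rule above
bio_replicate <- c("BD1", "BD_1", "BD-1", "BD 1")

# Perl-compatible character class: \ ^ $ ? * / | + ( ) [ ] { } -, plus whitespace
forbidden <- "[\\\\^$?*/|+()\\[\\]{}\\s-]"

has_forbidden <- grepl(forbidden, bio_replicate, perl = TRUE)
bio_replicate[has_forbidden]   # values to fix

# One possible fix: replace each run of forbidden characters with "_"
fixed_values <- gsub(paste0(forbidden, "+"), "_", bio_replicate, perl = TRUE)
fixed_values</code></pre></div>
<p>Note that after this replacement “BD-1” and “BD_1” become the same value, so check that distinct replicates stay distinct.</p>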
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/circle-exclamation.svg" width="15" height="15"> <strong>NOTE</strong><br />
If your Target File does not contain at least one informative column about the samples (e.g. Product, Model_type, etc.), you cannot perform any exploratory or statistical analysis.</p>
<p><img src="images/target_file_example.png" width="100%" /></p>
<p>An example Target File can be downloaded here:</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Targetfile_Lipidomics_example.xlsx">Targetfile_Lipidomics.xlsx</a></p>
</div>
<div id="sec23" class="section level2" number="3.3">
<h2><span class="header-section-number">3.3</span> Internal Reference File (LipidSearch, LIQUID)</h2>
<p>For the LipidSearch and LIQUID options, ADViSELipidomics also requires another Excel file, here called the Internal Reference File. It contains the list of Internal Standard lipids defined per class and adduct, upper/lower bounds for the number of carbon atoms, upper/lower bounds for the number of double bonds, the nominal standard concentration, and upper/lower bounds for the concentration linearity in the calibration curves. Its mandatory columns depend both on the external software (LipidSearch or LIQUID) and on the presence of internal standards (only for LipidSearch).</p>
<ul>
<li><strong>LipidSearch</strong>
<ul>
<li><strong>Class</strong> lipid class of interest according to the nomenclature of LipidSearch (e.g. <em>DG</em> )</li>
<li><strong>Ion</strong> the ion of interest, written according to the nomenclature of LipidSearch (e.g. <em>M-H</em> )</li>
<li><strong>MinRt</strong> minimum retention time of the class (numeric)</li>
<li><strong>MaxRt</strong> maximum retention time of the class (numeric)</li>
<li><strong>InternalStandardLipidIon</strong> name of each internal lipid standard according to the <strong>nomenclature of LipidSearch*</strong> (e.g. <em>Cer(d18:1_17:0)-H</em> )</li>
<li><strong>MinLinearity</strong> minimum value of the linearity range in the calibration curves (numeric). ONLY IF YOU USE INTERNAL STANDARDS</li>
<li><strong>MaxLinearity</strong> maximum value of the linearity range in the calibration curves (numeric). ONLY IF YOU USE INTERNAL STANDARDS</li>
<li><strong>NominalStdConcentration</strong> concentration of each internal lipid standard initially spiked into the sample (numeric). ONLY IF YOU USE INTERNAL STANDARDS</li>
</ul></li>
</ul>
<p>*<strong>nomenclature of LipidSearch:</strong> LipidSearch nomenclature is very similar to the LIPID MAPS nomenclature, except that it uses an underscore instead of a slash to separate the chains at different stereospecific numbering (sn) positions, and it omits the double-bond geometry (e.g. Z/E). An example of the required nomenclature is shown in the picture below.</p>
<p><img src="images/nomenclature.png" width="100%" /></p>
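<p>The two differences listed above can be applied mechanically to a LIPID MAPS-style name. The helper <code>to_lipidsearch()</code> below is hypothetical and covers only these two rules (slash to underscore, and removal of geometry annotations such as “(9Z)”); any other naming differences are not handled.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper, not part of ADViSELipidomics: rewrite a LIPID MAPS-style
# shorthand name following the two LipidSearch differences described above
to_lipidsearch <- function(lipid) {
  # 1. Drop double-bond geometry annotations such as "(9Z)" or "(9Z,12Z)"
  lipid <- gsub("\\(\\d+[EZ](,\\d+[EZ])*\\)", "", lipid, perl = TRUE)
  # 2. Replace the "/" between sn positions with "_"
  gsub("/", "_", lipid, fixed = TRUE)
}

to_lipidsearch(c("PC(16:0/18:1(9Z))", "Cer(d18:1(4E)/17:0)"))
# gives "PC(16:0_18:1)" and "Cer(d18:1_17:0)"</code></pre></div>
<p>For the InternalStandardLipidIon column, the ion is then appended to the converted name, as in <em>Cer(d18:1_17:0)-H</em>.</p>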
<p>The picture below shows an Internal Reference File example for LipidSearch with Internal Standards. The mandatory columns are highlighted in yellow, and the columns needed only in the presence of Internal Standards in green. The “Unit_measure” column is not used, but it can help you check that all standard concentrations have the same unit of measure. If they have different units, convert them to a single unit.</p>
<p><img src="images/internal_reference_example.png" width="100%" /></p>
<p>The Internal Reference File example for the LipidSearch with Internal Standards option can be downloaded from here:</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Internal_Reference_file_LipidSearch_withIS.xlsx">Internal_Reference_file_LipidSearch_withIS.xlsx</a></p>
<ul>
<li><strong>LIQUID</strong>
<ul>
<li><strong>Class</strong> lipid class of interest according to the nomenclature of LIQUID (e.g. <em>DG</em> )</li>
<li><strong>Adduct</strong> the ion of interest, written according to the nomenclature of LIQUID (e.g. <em>[M-H]+</em> )</li>
<li><strong>MinRt</strong> minimum retention time of the class (numeric)</li>
<li><strong>MaxRt</strong> maximum retention time of the class (numeric)</li>
</ul></li>
</ul>
<p>The picture below shows an Internal Reference File example for LIQUID. The mandatory columns are highlighted in yellow.</p>
<p><img src="images/int_reference_LIQUID.png" width="100%" /></p>
<p>The Internal Reference File example for the LIQUID option can be downloaded from here:</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Internal_Reference_file_LIQUID.xlsx">Internal_Reference_file_LIQUID.xlsx</a></p>
</div>
<div id="sec24" class="section level2" number="3.4">
<h2><span class="header-section-number">3.4</span> Calibration Files (LipidSearch with Internal Standards)</h2>
<p>In the case of LipidSearch, if your experiment includes Internal Standards, you can choose whether to use them. If you do, you also need to upload two Excel Calibration Files and the LipidSearch data files for the internal standards (here called concentration files). The concentration files are the same kind of .txt files described in Section @ref(sec21); please refer to that Section for how to rename them. Make sure that all the concentration files are in a single folder and that they are not mixed with the data files of Section @ref(sec21).
The two Calibration Excel files, one for the nonlabeled and one for the deuterated standards, share the same structure:</p>
<ul>
<li><strong>Concentration (ng/mL)</strong> the concentration of the standard (numeric)</li>
<li><strong>Class</strong> the lipid classes of interest separated by a comma <strong>,</strong> (e.g. <em>PG,PS,PI,PE,SM,PC,TG,DG</em> )</li>
<li><strong>Name</strong> the names of the data files coming from LipidSearch, which must exactly match the file names. If you have technical replicates, separate them with a <strong>semicolon “;”</strong> without any space (e.g. for the deuterated file: <em>ISMix_5ugmL_deuterated_1.txt;ISMix_5ugmL_deuterated_2.txt;ISMix_5ugmL_deuterated_3.txt</em>)</li>
</ul>
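<p>Because the Class and Name cells pack several values into one cell, it can help to expand them before uploading and check that every listed file is present. A minimal sketch with invented rows; the folder variable <code>conc_dir</code> is hypothetical and should point to your concentration-file folder.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy rows of a deuterated Calibration File (invented values)
calib <- data.frame(
  `Concentration (ng/mL)` = c(5000, 1000),
  Class = c("PG,PS,PI,PE,SM,PC,TG,DG", "PG,PS,PI,PE,SM,PC,TG,DG"),
  Name  = c(paste0("ISMix_5ugmL_deuterated_", 1:3, ".txt", collapse = ";"),
            "ISMix_1ugmL_deuterated_1.txt"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One row per replicate file: split Name on ";"
file_lists <- strsplit(calib$Name, ";", fixed = TRUE)
calib_long <- data.frame(
  concentration = rep(calib[["Concentration (ng/mL)"]], lengths(file_lists)),
  file          = unlist(file_lists),
  stringsAsFactors = FALSE
)

# Lipid classes covered by each row: split Class on ","
class_lists <- strsplit(calib$Class, ",", fixed = TRUE)

# Files listed in the Calibration File but absent from the folder
conc_dir <- tempdir()  # hypothetical: replace with your concentration-file folder
missing_files <- calib_long$file[!file.exists(file.path(conc_dir, calib_long$file))]
missing_files</code></pre></div>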
<p>The picture below shows an example of a Calibration Excel file for the deuterated.</p>
<p><img src="images/calibration_deut_example.png" width="100%" /></p>
<p>An example of the Calibration Deuterated and Calibration Nonlabeled Excel files can be downloaded here:</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Calibration_Deuterated.xlsx">Calibration_Deuterated.xlsx</a></p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Calibration_NonLabeled.xlsx">Calibration_NonLabeled.xlsx</a></p>
</div>
<div id="sec25" class="section level2" number="3.5">
<h2><span class="header-section-number">3.5</span> User’s Excel File</h2>
<p>If you already have a matrix file containing the abundance of each lipid, you need just two Excel files: a Target File and a Data Matrix File. Here, the Target File has only one mandatory column, SampleID. The Data Matrix (.xlsx file) must have the list of lipids in the first column, which must be called <em>“Lipids”</em>, followed by the samples (or replicates) in the other columns, whose names must match the <em>SampleID</em> values in the Target File. The matrix does not need to be complete (i.e. without missing values), since NAs can be filtered and imputed after upload. The picture below shows an example of a Data Matrix Excel file.</p>
<p><img src="images/excel_file_example.png" width="100%" /></p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/circle-exclamation.svg" width="15" height="15"> <strong>NOTE</strong><br />
The column names in the data matrix must follow the same SampleID rules described in Section @ref(sec22).</p>
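<p>Before uploading, you can check the Data Matrix against the Target File in R. This is a minimal sketch with both tables built in memory (invented values); with real files you would read them first, for example with <code>readxl::read_excel()</code>, which is an assumption about your setup rather than what ADViSELipidomics does internally.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy Target File and Data Matrix (invented values)
target <- data.frame(SampleID = c("AF-1C_1", "AF-1C_2", "CTRL_1"),
                     stringsAsFactors = FALSE)
data_matrix <- data.frame(
  Lipids    = c("PC(16:0_18:1)", "PE(18:0_20:4)"),
  `AF-1C_1` = c(1.2, NA),
  `AF-1C_2` = c(1.4, 0.8),
  `CTRL_1`  = c(0.9, 0.7),
  check.names = FALSE, stringsAsFactors = FALSE
)

# The first column must be called "Lipids"
stopifnot(names(data_matrix)[1] == "Lipids")

# Sample columns and SampleIDs must match in both directions
sample_cols <- names(data_matrix)[-1]
setdiff(sample_cols, target$SampleID)  # columns without a SampleID
setdiff(target$SampleID, sample_cols)  # SampleIDs without a column

# Numeric lipid x sample matrix; missing values are allowed
mat <- as.matrix(data_matrix[, -1])
rownames(mat) <- data_matrix$Lipids</code></pre></div>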
<p>An example of the Data Matrix can be downloaded from here:</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Excel_Data_Matrix.xlsx">Excel_Data_Matrix.xlsx</a></p>
</div>
<div id="sec26" class="section level2" number="3.6">
<h2><span class="header-section-number">3.6</span> SummarizedExperiment</h2>
<p>ADViSELipidomics allows the user to load a SummarizedExperiment (SE) object saved as an .rds file, either prepared beforehand or downloaded from a previous ADViSELipidomics session. Since the required SE object has a complex structure, we do not recommend uploading an SE object that was not downloaded from ADViSELipidomics. This option allows the user to save the SE object after the preprocessing steps and perform the exploratory and statistical analyses later.</p>
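<p>An SE object downloaded from ADViSELipidomics can also be inspected directly in R with the Bioconductor package SummarizedExperiment. The sketch below builds a tiny toy object only so that it runs on its own; it is not a valid ADViSELipidomics input, and the assay name <code>abundance</code> and the <code>Product</code> column are invented. With a real file, only the <code>readRDS()</code> and inspection lines are needed.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Requires the Bioconductor package SummarizedExperiment
library(SummarizedExperiment)

# Toy object standing in for a downloaded one (invented values)
counts <- matrix(c(1.2, 0.8, 1.4, 0.9, 1.1, 0.7), nrow = 2,
                 dimnames = list(c("PC(16:0_18:1)", "PE(18:0_20:4)"),
                                 c("AF-1C_1", "AF-1C_2", "CTRL_1")))
se <- SummarizedExperiment(
  assays  = list(abundance = counts),
  colData = DataFrame(Product = c("AF", "AF", "CTRL"),
                      row.names = colnames(counts))
)

# Save to and reload from an .rds file
path <- file.path(tempdir(), "lipidomics_se.rds")
saveRDS(se, path)
se2 <- readRDS(path)

# Inspect the main components
assay(se2)     # data matrix (lipids x samples)
colData(se2)   # information on samples
rowData(se2)   # information on lipids (empty in this toy object)
metadata(se2)  # metadata list (empty here)</code></pre></div>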
</div>
<div id="sec27" class="section level2" number="3.7">
<h2><span class="header-section-number">3.7</span> Metabolomics Workbench</h2>
<p>In the case of Metabolomics Workbench, you do not need to import anything, because ADViSELipidomics downloads a selected Metabolomics Workbench experiment and converts it into an SE object.</p>
<!--chapter:end:02-inputdata.Rmd-->
</div>
</div>
<div id="guide" class="section level1" number="4">
<h1><span class="header-section-number">4</span> Guide</h1>
<p>ADViSELipidomics has a graphical user interface (GUI) implemented using the shiny and golem R packages. It has five main sections: Home, Data Import & Preprocessing, SumExp Visualization, Exploratory Analysis, and Statistical Analysis. Each section is accessible from a side menu on the left.</p>
<div id="sec31" class="section level2" number="4.1">
<h2><span class="header-section-number">4.1</span> Home section</h2>
<p>The Home section includes general information about ADViSELipidomics, such as the citation, the link to the GitHub page, and the link to this manual. The “Start!” button takes the user to the next section, where the lipidomic data can be uploaded.</p>
</div>
<div id="sec32" class="section level2" number="4.2">
<h2><span class="header-section-number">4.2</span> Data Import & Preprocessing</h2>
<p>This section allows the user to import and preprocess lipidomic data from various sources.
When you open this section for the first time after launch, a message box asks for your name and your company. This information is stored in the final output of ADViSELipidomics. By default, if you click on “Run”, the User is <em>“Name”</em> and the Company is <em>“Company”</em>.
The picture below shows the Data Import & Preprocessing section, with its different parts highlighted by red rectangles.</p>
<p><img src="images/module_inpute.png" width="100%" /></p>
<ul>
<li><strong>Rectangle A</strong> allows the user to choose between LipidSearch, LIQUID, Excel files, Summarized Experiment, and Metabolomics Workbench. Moreover, it is also possible to select between experiments with or without internal standards (this option is available only for LipidSearch import).</li>
<li><strong>Rectangle B</strong> shows three steps for Importing & Filtering: importing data, storing and reading data, and filtering data.</li>
<li><strong>Rectangle C</strong> shows five additional steps for Calibration: importing calibration files, storing calibration files, selecting the folder for the results, selecting calibration options, and applying the recovery.</li>
<li><strong>Rectangle D</strong> shows two steps: filtering and imputing missing data, and creating the SummarizedExperiment object. Note that the layout of the Data Import & Preprocessing section and the required files depend on the input data format chosen by the user. Go to Chapter @ref(inputdata) if you need help gathering all the required files.</li>
</ul>
<p>Since the LipidSearch output with Internal Standards (IS) option has the largest number of required files and steps, we provide a complete guide for this case. However, the guide also applies to LipidSearch without IS and to LIQUID: in these cases, the only difference is that the CALIBRATION module (Section @ref(sec322)) is not present.</p>
<div id="sec321" class="section level3" number="4.2.1">
<h3><span class="header-section-number">4.2.1</span> LipidSearch (IS) EXAMPLE - IMPORTING & FILTERING module</h3>
<p>The first module is the IMPORTING & FILTERING module, where the user can upload the Target File, the Internal Reference File, and the Data files from LipidSearch.</p>
<p><img src="images/input_step.png" width="60%" style="display: block; margin: auto;" /></p>
<ul>
<li><strong>Step 1.</strong> The first files you must import are the Target File and the Internal Reference File (Rectangle A, steps 1 and 2). Next to each of them there is a button (yellow square) that allows you to edit the Excel file: you can filter the rows by one or more conditions, select only the needed columns, and download the edited data. To apply the edits, enable the button next to the download button and click on the “Done” button (top-right corner). A help button guides you through the editing options.</li>
<li><strong>Step 2.</strong> Here, you choose the folder containing the data files coming from LipidSearch (only the data files related to the samples, NOT to the IS). After selecting the folder, click on the “Read Data” button, and ADViSELipidomics starts reading all the data files. A progress bar shows the percentage of completion. When the reading process is completed, you can check the total peak area of each sample by clicking the “Quality check” button.</li>
<li><strong>Step 3.</strong> Finally, you can filter out non-informative lipids based on retention time within the range, number of carbon atoms within the range, even number of carbon atoms, number of double bonds within the range, and duplicated lipids. The two sliders allow you to choose the ranges for the number of carbon atoms and the number of double bonds. The other filters come from the Internal Reference File. If there are duplicated lipids (same m/z values for lipid peaks), ADViSELipidomics keeps only the one with the largest peak area. The “Filter Data” button starts this process. At the end, you can check the filtered data for each sample.</li>
</ul>
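<p>The Step 3 filters can be illustrated with a small base-R sketch. It is not the ADViSELipidomics implementation: the table layout and column names (<code>Carbons</code>, <code>DB</code>, <code>Rt</code>, <code>MZ</code>, <code>Area</code>) are hypothetical, the slider ranges are arbitrary, and the retention-time window mimics the MinRt/MaxRt columns of the Internal Reference File.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy identified lipids for one sample (hypothetical columns, invented values)
lipids <- data.frame(
  Class   = c("PC", "PC", "PC", "TG", "TG"),
  Ion     = c("M+H", "M+H", "M+H", "M+NH4", "M+NH4"),
  Carbons = c(34, 34, 35, 52, 52),
  DB      = c(1, 1, 1, 2, 9),
  Rt      = c(12.1, 12.3, 12.5, 25.0, 24.8),
  MZ      = c(760.6, 760.6, 774.6, 874.8, 860.7),
  Area    = c(5e6, 2e6, 1e6, 8e6, 3e5),
  stringsAsFactors = FALSE
)
# Retention-time window per class and ion, as in the Internal Reference File
ref <- data.frame(Class = c("PC", "TG"), Ion = c("M+H", "M+NH4"),
                  MinRt = c(10, 20), MaxRt = c(15, 30),
                  stringsAsFactors = FALSE)

carbon_range <- c(20, 60)  # slider values (arbitrary)
db_range     <- c(0, 6)

x <- merge(lipids, ref, by = c("Class", "Ion"))
keep <- x$Rt >= x$MinRt & x$Rt <= x$MaxRt &              # retention time
  x$Carbons >= carbon_range[1] & x$Carbons <= carbon_range[2] &
  x$Carbons %% 2 == 0 &                                   # even carbon number
  x$DB >= db_range[1] & x$DB <= db_range[2]               # double bonds
x <- x[keep, ]

# Duplicates (same m/z): keep only the lipid with the largest peak area
x <- x[order(x$MZ, -x$Area), ]
x <- x[!duplicated(x$MZ), ]
x</code></pre></div>
<p>Here m/z values are compared exactly, which only works because the toy values are identical; real data would need a matching tolerance.</p>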
</div>
<div id="sec322" class="section level3" number="4.2.2">
<h3><span class="header-section-number">4.2.2</span> LipidSearch (IS) EXAMPLE - CALIBRATION module</h3>
<p>Once the previous module is completed, the CALIBRATION module appears next to it. The Calibration module creates the calibration curves and the calibration matrix. It uses the Internal Lipid Standards reported in the Internal Reference File and the correspondence between the Concentration Files and the lipid classes declared in the Calibration File. This module extracts the relationship between peak area and concentration for each internal lipid standard, fits the calibration curves with a linear model, and plots them. The linear regression model can be classical or robust, with zero or non-zero intercept. Finally, the calibration matrix summarizes all the points of the calibration curves. After the calibration process, ADViSELipidomics stores the slope and intercept values for the recovery module.
As already stated, this module appears only if you are using LipidSearch output with Internal Standards (and you selected “Yes” for the radio button asking <em>“Do you have internal standards?”</em>). In this module, you need the two Calibration Files (.xlsx, see Section @ref(sec24)) and the Concentration files coming from LipidSearch related to the internal standards (.txt, see Section @ref(sec21)).</p>
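<p>The four model variants mentioned above (classical or robust, with or without intercept) can be fitted in R with <code>lm()</code> and <code>MASS::rlm()</code>. This is a sketch with invented calibration points for a single standard; we do not know which robust estimator ADViSELipidomics uses, and <code>rlm()</code> (M-estimation with Huber weights by default) is our choice.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(MASS)  # rlm(): robust linear model; MASS ships with R

# Toy calibration points for one standard (invented values):
# nominal concentration (ng/mL) and measured peak area
calib <- data.frame(
  conc = c(10, 50, 100, 250, 500, 1000),
  area = c(2.1e4, 1.0e5, 2.05e5, 5.1e5, 9.9e5, 2.02e6)
)

fits <- list(
  classical_intercept = lm(area ~ conc, data = calib),
  classical_zero      = lm(area ~ 0 + conc, data = calib),
  robust_intercept    = rlm(area ~ conc, data = calib),
  robust_zero         = rlm(area ~ 0 + conc, data = calib)
)

# Intercept and slope of each fit (intercept is 0 for zero-intercept models)
coefs <- t(sapply(fits, function(f) {
  b <- coef(f)
  c(intercept = if ("(Intercept)" %in% names(b)) unname(b["(Intercept)"]) else 0,
    slope     = unname(b["conc"]))
}))
coefs

# Inverting a fitted line turns a peak area into a concentration
area_to_conc <- function(area, intercept, slope) (area - intercept) / slope
area_to_conc(3e5, coefs["classical_zero", "intercept"],
             coefs["classical_zero", "slope"])</code></pre></div>
<p>The last line shows a common use of a calibration curve; the exact way ADViSELipidomics uses the stored slope and intercept in the recovery step is not described here.</p>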
<p><img src="images/calibration_step.png" width="60%" style="display: block; margin: auto;" /></p>
<ul>
<li><strong>Step 1.</strong> Here, you can upload the Calibration Files (.xlsx) for both Deuterated and Nonlabeled standards. This step is very similar to Step 1 of the IMPORTING &amp; FILTERING module. You can find further information about these files in Section @ref(sec24).</li>
<li><strong>Step 2.</strong> Select the folder containing the Concentration Files from LipidSearch and click on “Read the concentration files”. This step is also very similar to Step 2 of the previous module.</li>
<li><strong>Step 3.</strong> Here, you can select the folder where the output will be saved; ADViSELipidomics creates the folder structure for you.</li>
<li><strong>Step 4.</strong> In this step, you can choose some calibration options and visualize the calibration plot for each standard.</li>
<li><strong>Step 5.</strong> Finally, you can apply the recovery percentage to the concentration values of each lipid, using the Internal Lipid Standards as lipid class references. This normalization provides absolute concentration values for the lipids, and the resulting concentration matrix can be inspected by clicking on the “Check concentration matrix” button. Here, you can also visualize the missing values (if any). Moreover, the “Download LOL” button lets you download a table containing all the lipids that were filtered out because they fall outside the linearity range.</li>
</ul>
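<p>Using a stored slope and intercept, a peak area can be converted back into a concentration, and lipids whose area falls outside the calibrated range can be set aside. The sketch below assumes that the linearity range is the interval of peak areas covered by the calibration points; the function names are hypothetical, and the recovery-percentage correction itself is not modelled because its formula is not given in this section.</p>

```python
def area_to_concentration(area, slope, intercept):
    """Invert the calibration line area = slope * conc + intercept."""
    return (area - intercept) / slope


def apply_calibration(areas, slope, intercept, area_min, area_max):
    """Split lipids into back-calculated concentrations and out-of-range lipids.

    areas: dict lipid -> peak area; [area_min, area_max] is the assumed linearity range.
    """
    concentrations, out_of_range = {}, []
    for lipid, area in areas.items():
        if area_min <= area <= area_max:
            concentrations[lipid] = area_to_concentration(area, slope, intercept)
        else:
            out_of_range.append(lipid)
    return concentrations, out_of_range


conc, outside = apply_calibration({"PC 34:1": 9.0, "PC 36:2": 50.0},
                                  slope=2.0, intercept=1.0, area_min=3.0, area_max=17.0)
print(conc, outside)  # {'PC 34:1': 4.0} ['PC 36:2']
```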
</div>
<div id="sec323" class="section level3" number="4.2.3">
<h3><span class="header-section-number">4.2.3</span> LipidSearch (IS) EXAMPLE - MISSING DATA & SUMMARIZED EXPERIMENT</h3>
<p>This is the last module of the Preprocessing menu, where you can filter and impute missing values (NAs), build the SummarizedExperiment object, and download it.</p>
<p><img src="images/filtering_step.png" width="60%" style="display: block; margin: auto;" /></p>
<ul>
<li><strong>Step 1.</strong> First, ADViSELipidomics computes the percentage of NAs for each lipid (matrix rows) and each sample (matrix columns). It then lets you retain only the lipids and/or samples whose percentage of missing values is below the thresholds chosen with the sliders. For example, if you set <em>Max missing data percentage allowable on lipids</em> to 0.3 and <em>Max missing data percentage allowable on samples</em> to 0.6, only lipids (rows) with less than 30% NAs and samples (columns) with less than 60% NAs are retained. Then, by clicking on the “Check filtered NAs” button, ADViSELipidomics shows the missing data distributions and the data dimensions before and after filtering the NAs.</li>
<li><strong>Step 2.</strong> Next, you can impute the remaining NAs with one of four imputation methods: three non-model-based (mean, median, and knn) and one model-based (irmi).</li>
<li><strong>Step 3.</strong> In the final step, ADViSELipidomics builds the SummarizedExperiment object, which you can then download. By clicking on the “See the results” button, you will be redirected to the next menu, SumExp Visualization.</li>
</ul>
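<p>The NA filter of Step 1 and the two simplest imputations of Step 2 can be sketched in plain Python, with <code>None</code> marking a missing value, lipids as rows and samples as columns. Assumptions: both missingness percentages are computed on the unfiltered matrix, and mean/median imputation is done per lipid (row); the function names are hypothetical, and the knn and irmi methods are not reproduced.</p>

```python
from statistics import mean, median


def filter_missing(matrix, lipids, samples, max_lipid_na, max_sample_na):
    """Keep lipids (rows) and samples (columns) whose NA fraction is below the thresholds."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    keep_rows = [i for i in range(n_rows)
                 if sum(v is None for v in matrix[i]) / n_cols < max_lipid_na]
    keep_cols = [j for j in range(n_cols)
                 if sum(matrix[i][j] is None for i in range(n_rows)) / n_rows < max_sample_na]
    sub = [[matrix[i][j] for j in keep_cols] for i in keep_rows]
    return sub, [lipids[i] for i in keep_rows], [samples[j] for j in keep_cols]


def impute_rows(matrix, method="mean"):
    """Replace each NA with the mean or median of the observed values in its row."""
    stat = mean if method == "mean" else median
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = stat(observed) if observed else None  # an all-NA row stays NA
        imputed.append([fill if v is None else v for v in row])
    return imputed


m = [[1.0, None, 3.0, 4.0],
     [None, None, None, 1.0],
     [2.0, 2.0, None, 6.0]]
sub, kept_lipids, kept_samples = filter_missing(m, ["A", "B", "C"], ["s1", "s2", "s3", "s4"],
                                                max_lipid_na=0.3, max_sample_na=0.6)
print(kept_lipids, kept_samples)                  # ['A', 'C'] ['s1', 's4']
print(impute_rows([[1.0, None, 3.0]]))            # [[1.0, 2.0, 3.0]]
print(impute_rows([[1, None, 2, 10]], "median"))  # [[1, 2, 2, 10]]
```

<p>With the thresholds from the example above (0.3 and 0.6), lipid B (75% NAs) and samples s2 and s3 (about 67% NAs each) are dropped.</p>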
</div>
</div>
<div id="sec33" class="section level2" number="4.3">
<h2><span class="header-section-number">4.3</span> SumExp Visualization</h2>
<p>Once you have successfully completed the Preprocessing module, the first thing you can do is check the newly created SummarizedExperiment (SE) object. This is done in the SumExp Visualization menu. The complex structure of the SE object can be explored through the red gear icon, where you can choose which part of the SE object to show and summarize the data (if you have technical replicates).</p>
<p><img src="images/sumexp.png" width="100%" /></p>
<p>The picture above shows the rowData part of the SE object, which contains the lipid annotations. Each lipid in the “Lipids” column contains a hyperlink to the SwissLipids online repository, which provides structural, biological, and analytical details.</p>
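<p>When technical replicates are present, summarizing the data amounts to collapsing the replicate columns into one column per sample. A minimal sketch, assuming the summary is the arithmetic mean (consistent with the averaged SE object used later in the Differential Analysis) and that a mapping from replicate to sample is available, e.g. from the target file; all names here are hypothetical.</p>

```python
from statistics import mean


def average_replicates(matrix, columns, sample_of):
    """Average replicate columns that belong to the same sample.

    matrix: rows = lipids, columns aligned with `columns` (replicate IDs);
    sample_of: replicate ID -> sample ID.
    """
    samples = list(dict.fromkeys(sample_of[c] for c in columns))  # first-seen order
    groups = {s: [j for j, c in enumerate(columns) if sample_of[c] == s] for s in samples}
    averaged = [[mean(row[j] for j in groups[s]) for s in samples] for row in matrix]
    return averaged, samples


matrix = [[1.0, 3.0, 5.0],
          [2.0, 4.0, 6.0]]
columns = ["S1_r1", "S1_r2", "S2_r1"]
sample_of = {"S1_r1": "S1", "S1_r2": "S1", "S2_r1": "S2"}
print(average_replicates(matrix, columns, sample_of))  # ([[2.0, 5.0], [3.0, 6.0]], ['S1', 'S2'])
```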
</div>
<div id="sec34" class="section level2" number="4.4">
<h2><span class="header-section-number">4.4</span> Exploratory Analysis</h2>
<p>The Exploratory Analysis menu includes three sub-menus: Plots, Clustering, and Dimensionality Reduction.</p>
<div id="sec341" class="section level3" number="4.4.1">
<h3><span class="header-section-number">4.4.1</span> Plots</h3>
<p>This sub-menu allows the user to create different types of plots to show the trends and behavior of the data, exploring them from the lipid and/or sample point of view. It has four panels: Lipids, Scatterplot, Heatmap, and Quality plots.</p>
<ul>
<li><strong>Lipid plots.</strong> It is possible to 1) represent the lipid class distribution (counts of lipids per class) with a pie chart, boxplot, or spider plot; 2) visualize the percentage proportion of each lipid class in each sample using a barplot; 3) compare the lipid species abundance for each condition; 4) inspect the abundance of a lipid selected by the user in relation to a feature from the target file (e.g., treatment) using boxplots;</li>
<li><strong>Scatterplots.</strong> It is possible to visualize the relationship between lipid abundance in two samples;</li>
<li><strong>Heatmap.</strong> It provides a highly customizable heatmap to show possible clusters among lipids or samples. The user can set many parameters: a) row annotation with a feature from the target file; b) column annotation with the information from lipid parsing; c) dendrograms for lipids and/or samples; d) distance function (Euclidean, maximum, Canberra); e) clustering method (complete, average, median, Ward); f) number of clusters for lipids and/or samples. The user can also select an area of the heatmap to get a detailed zoom of that area, with its associated information;</li>
<li><strong>Quality plots.</strong> It provides different types of plots (barplot, boxplot, density plot) showing the total abundance (logarithmic scale) per sample, using a feature from the target file as reference, to reveal possible unexpected behavior among samples or among replicates of the same sample.</li>
</ul>
<p>The picture below shows an example of the Lipid class proportion plot.</p>
<p><img src="images/taxabarplot.png" width="100%" /></p>
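<p>The lipid class counts and the per-sample class proportions behind these plots can be sketched as follows. The sketch assumes shorthand names of the form "CLASS chains" (e.g. "PC 34:1") and that a class proportion is the summed abundance of that class divided by the total abundance of the sample; the function names are hypothetical.</p>

```python
from collections import Counter


def lipid_class(name):
    """Take the class from a shorthand name, e.g. 'PC 34:1' -> 'PC'."""
    return name.split(" ", 1)[0]


def class_counts(lipids):
    """Number of lipids per class (class distribution)."""
    return Counter(lipid_class(name) for name in lipids)


def class_proportions(abundance):
    """Percentage of a sample's total abundance contributed by each class.

    abundance: dict lipid -> abundance in one sample.
    """
    totals = Counter()
    for name, value in abundance.items():
        totals[lipid_class(name)] += value
    grand_total = sum(totals.values())
    return {cls: 100 * value / grand_total for cls, value in totals.items()}


print(class_counts(["PC 34:1", "PC 36:2", "PE 38:4"]))  # Counter({'PC': 2, 'PE': 1})
print(class_proportions({"PC 34:1": 30.0, "PC 36:2": 20.0, "PE 38:4": 50.0}))
# {'PC': 50.0, 'PE': 50.0}
```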
</div>
<div id="sec342" class="section level3" number="4.4.2">
<h3><span class="header-section-number">4.4.2</span> Clustering</h3>
<p>The Clustering sub-menu allows the user to cluster the data by lipids or by samples. The user can choose the number of clusters and the clustering method among the following algorithms: hierarchical clustering (using single, complete, or Ward linkage) or partitioning clustering (k-means, PAM, CLARA). If you choose a partitioning method, ADViSELipidomics first performs a PCA. Additional plots, such as the silhouette plot, can suggest the number of clusters to use.</p>
<p><img src="images/clustering.png" width="100%" /></p>
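<p>The distance functions offered in the Heatmap panel and the silhouette width used to judge a clustering can be sketched in plain Python. This is an illustrative sketch, not the app's code: the distances follow the standard Euclidean, maximum (Chebyshev) and Canberra definitions, and the silhouette of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its smallest mean distance to another cluster. Values near 1 indicate well-separated clusters; negative values indicate points closer to another cluster.</p>

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def maximum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms where both coordinates are 0 (0/0) are skipped.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def mean_silhouette(points, labels, dist=euclidean):
    """Average silhouette width; needs at least two clusters. Singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(euclidean((0, 0), (3, 4)), maximum((0, 0), (3, 4)), canberra((1, 0), (3, 0)))  # 5.0 4 0.5
print(mean_silhouette(points, [0, 0, 1, 1]))  # close to 1: two well-separated groups
```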
</div>
<div id="sec343" class="section level3" number="4.4.3">
<h3><span class="header-section-number">4.4.3</span> Dimensionality Reduction</h3>
<p>The Dimensionality Reduction sub-menu allows the user to choose between an unsupervised approach (PCA) and supervised approaches (PLS-DA, sPLS-DA) to represent the data in a two- or three-dimensional space. It contains three panels: PCA, PLS-DA, and sPLS-DA.</p>
<ul>
<li><strong>PCA.</strong> ADViSELipidomics computes the Principal Component Analysis (PCA), showing the results with different plots: a) 2D plot, b) biplot, c) scree plot, d) loadings plot, e) 3D plot. The user can highlight the features from the target file with different colors and select the number of components to use for the loadings plot.</li>
<li><strong>PLS-DA.</strong> ADViSELipidomics computes the Partial Least Squares Discriminant Analysis, showing the results with a 2D plot and a Correlation Circle plot. The user can select the group variable and the number of components for the computation. The 2D plot can be customized from the red gear icon. Furthermore, ADViSELipidomics can perform cross-validation to identify the best number of components; this may take a while.</li>
<li><strong>sPLS-DA.</strong> ADViSELipidomics can also compute the sparse version of PLS-DA. The panel is very similar to the PLS-DA panel, but, since this is a sparse method, it is possible to choose the number of variables to select on each component (called “KeepX”). Here too, ADViSELipidomics can perform cross-validation to help the user choose the best number of components and the best “KeepX”.</li>
</ul>
<p>Here’s an example of a 2D plot for the PCA.</p>
<p><img src="images/pca.png" width="100%" /></p>
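<p>What a PCA computes can be illustrated with a small power-iteration sketch in plain Python: center each variable (lipid column), build the covariance matrix, and iterate to its leading eigenvector, which gives the PC1 loadings; the sample scores are the projections of the centered data onto it. This is a sketch under stated assumptions (no scaling of variables, first component only, hypothetical function name), not how ADViSELipidomics computes its PCA.</p>

```python
import math


def first_principal_component(data, n_iter=500):
    """PC1 of a samples x variables matrix via power iteration.

    Returns (loadings, explained_variance_ratio, scores). The sign of the
    loadings is arbitrary, as in any PCA.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary, non-symmetric start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(row[j] * v[j] for j in range(p)) for row in X]
    return v, eigenvalue / total_variance, scores


samples = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.1]]
loadings, ratio, scores = first_principal_component(samples)
print([round(x, 2) for x in loadings], round(ratio, 3))
```

<p>Because the two toy variables are almost perfectly correlated, PC1 points roughly along the diagonal and explains nearly all of the variance, which is what a scree plot would show as one dominant bar.</p>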
</div>
</div>
<div id="sec35" class="section level2" number="4.5">
<h2><span class="header-section-number">4.5</span> Statistical Analysis</h2>
<p>The Statistical Analysis menu includes two sub-menus: Differential Analysis and Enrichment Analysis.</p>
<div id="sec351" class="section level3" number="4.5.1">
<h3><span class="header-section-number">4.5.1</span> Differential Analysis</h3>
<p>The Differential Analysis sub-menu applies statistical algorithms to identify lipids whose abundance differs among samples associated with different experimental conditions (e.g., treatment versus control). It has two panels: <strong>Build DA</strong> and <strong>Comparisons</strong>. The first allows the user to build and run the differential analysis, while the second shows the “differentially expressed” lipids with a Venn diagram and an UpSet plot.</p>
<p>The picture below shows the first panel, <strong>Build DA</strong>, with its different parts highlighted by red rectangles.</p>
<p><img src="images/DA.png" width="100%" /></p>
<ul>
<li><strong>Rectangle A.</strong> Here, you can select one of the two SE objects obtained in the previous steps (i.e., the one with the lipid abundances of all samples, or the one where the technical replicates are averaged). At this stage, you can normalize the data matrix by a scaling factor (“Normalization between replicates or samples”). Moreover, when the data matrix has technical replicates, you can also include the replicate effect in the model by checking the “Replicates effect” box.</li>
<li><strong>Rectangle B.</strong> Here, you can build your experimental design. An ADViSELipidomics complex design can include up to two experimental conditions and up to two variables to be treated as batch effects. First, you select a primary variable; with the plus button, you can add a second variable. Next, you can choose to account for the batch effect by selecting up to two batch variables. ADViSELipidomics handles batch effects either by including the batch variables in the model fit or by removing the batch effect before fitting the model. To remove the batch effect, ADViSELipidomics uses the removeBatchEffect function from the limma package or the ComBat function (parametric or non-parametric method) from the sva package. Finally, the “Write contrasts” button opens a box where you can generate the contrast list. This functionality works with up to two variables in total (e.g., primary variable + secondary variable with “Batch type” set to “remove”, or primary variable + primary batch variable with “Batch type” set to “fit”).</li>
<li><strong>Rectangle C.</strong> Here, you can select a threshold for the adjusted p-values and a method for the <em>decideTests</em> function (limma package), which is used to identify “differentially expressed” lipids. Then, click on the “Run DA” button to run the differential analysis. You can also check and download a table of results (TopTable).</li>
<li><strong>Rectangle D.</strong> This rectangle contains some plot options, such as the choice of the contrast, the threshold for the logFC, and the possibility of adding a second plot. Two plots are available: a volcano plot and an MA plot. Both are interactive: you can change the fill variable and add labels to lipids of interest.</li>
</ul>
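<p>The thresholding behind the volcano plot can be sketched as follows: p-values are adjusted with the Benjamini-Hochberg procedure, and a lipid is called up or down when its adjusted p-value is below the chosen threshold and its absolute logFC reaches the logFC threshold. In the app, the statistics themselves come from limma; this Python sketch only reproduces the adjustment and threshold logic, and the function names, the input format, and the default thresholds are assumptions for illustration.</p>

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_lipids(results, adj_p_threshold=0.05, logfc_threshold=1.0):
    """results: lipid -> (logFC, p-value). Returns lipid -> 'up' | 'down' | 'ns'."""
    lipids = list(results)
    adjusted = bh_adjust([results[name][1] for name in lipids])
    calls = {}
    for name, q in zip(lipids, adjusted):
        lfc = results[name][0]
        if q < adj_p_threshold and abs(lfc) >= logfc_threshold:
            calls[name] = "up" if lfc > 0 else "down"
        else:
            calls[name] = "ns"
    return calls


print([round(q, 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.2])])  # [0.04, 0.0533, 0.0533, 0.2]
calls = call_lipids({"PC 34:1": (2.0, 0.001), "PE 38:4": (-1.5, 0.002),
                     "TG 52:2": (0.2, 0.001), "SM 36:1": (3.0, 0.5)})
print(calls)  # {'PC 34:1': 'up', 'PE 38:4': 'down', 'TG 52:2': 'ns', 'SM 36:1': 'ns'}
```

<p>Note how "TG 52:2" is significant after adjustment but is still not called, because its fold change is below the logFC threshold.</p>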
<p>After ADViSELipidomics has performed the DA, you can go to the <strong>Comparisons</strong> panel to visualize the “differentially expressed” lipids and perform pairwise comparisons between different contrasts using the Venn diagram and the UpSet plot. The panel also reports the list of common lipids in tabular form. These two plots are available only when there are at least two contrasts.</p>
<p><img src="images/venn.png" width="100%" /></p>
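<p>The sets shown in the Venn diagram and the UpSet plot can be derived with plain set operations. In this hypothetical sketch, each contrast name maps to the set of lipids called differentially abundant in it; pairwise overlaps correspond to the Venn intersections, and grouping each lipid by the exact combination of contrasts it belongs to gives the exclusive intersections that an UpSet plot displays.</p>

```python
from itertools import combinations


def pairwise_overlaps(de_sets):
    """Venn-style overlaps for every pair of contrasts."""
    return {(a, b): de_sets[a] & de_sets[b] for a, b in combinations(sorted(de_sets), 2)}


def common_to_all(de_sets):
    """Lipids called in every contrast."""
    return set.intersection(*de_sets.values())


def exclusive_intersections(de_sets):
    """UpSet-style: group lipids by the exact combination of contrasts they appear in."""
    groups = {}
    for lipid in set().union(*de_sets.values()):
        combo = tuple(sorted(c for c, lipids in de_sets.items() if lipid in lipids))
        groups.setdefault(combo, set()).add(lipid)
    return groups


de = {"TvsC": {"PC 34:1", "PE 38:4"}, "T2vsC": {"PC 34:1", "SM 36:1"}}
print(pairwise_overlaps(de))  # {('T2vsC', 'TvsC'): {'PC 34:1'}}
print(common_to_all(de))      # {'PC 34:1'}
```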
</div>
<div id="sec352" class="section level3" number="4.5.2">
<h3><span class="header-section-number">4.5.2</span> Enrichment Analysis</h3>
<p>The Enrichment Analysis sub-menu allows the user to build different lipid sets from the chemical features of the lipids, namely lipid class, total chain length (the sum of all carbon atoms in the tails), and total unsaturation (the sum of all double bonds in the tails). After a ranking of the differentially abundant lipids is defined (based on the logarithmic fold change, p-value, adjusted p-value, or B statistic), it identifies enriched lipid sets using a permutation test. To achieve a robust result, a few million permutations are performed, so this process may take a while. Since the Enrichment Analysis takes the differential analysis results as input, you first need to run the Differential Analysis.</p>
<p><img src="images/enrichment.png" width="100%" /></p>
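<p>The two ingredients of this analysis, building lipid sets and testing them by permutation, can be sketched as follows. This is an assumption-laden illustration rather than the app's method: names are assumed to use simple sum notation (e.g. "PC 34:1"), the set statistic is taken to be the mean rank of its members (rank 1 being the top lipid of the chosen ranking), and the p-value is the fraction of random sets of the same size that rank at least as well. The function names are hypothetical, and the number of permutations is kept small for the example.</p>

```python
import random


def build_sets(lipids):
    """Group shorthand names such as 'PC 34:1' by class, total chain length, and unsaturation."""
    sets = {}
    for name in lipids:
        cls, chains = name.split(" ", 1)
        length, unsat = chains.split(":")
        for key in (f"class:{cls}", f"length:{length}", f"unsat:{unsat}"):
            sets.setdefault(key, []).append(name)
    return sets


def mean_rank(ranks, members):
    return sum(ranks[name] for name in members) / len(members)


def permutation_pvalue(ranks, members, n_perm=10000, seed=0):
    """One-sided p-value that the set ranks better (lower mean rank) than chance.

    ranks: lipid -> rank, where 1 is the top lipid of the chosen ranking.
    """
    rng = random.Random(seed)
    lipids = list(ranks)
    observed = mean_rank(ranks, members)
    hits = sum(
        mean_rank(ranks, rng.sample(lipids, len(members))) <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


print(build_sets(["PC 34:1", "PE 34:2", "PC 36:2"])["length:34"])  # ['PC 34:1', 'PE 34:2']
ranks = {f"L{i}": i for i in range(1, 21)}
print(permutation_pvalue(ranks, ["L1", "L2", "L3"], n_perm=2000))  # small: the set holds the top 3
```

<p>The (hits + 1) / (n_perm + 1) form keeps the estimate away from exactly zero; with only a few thousand permutations the smallest reachable p-value is coarse, which is one reason to run millions of them on real data.</p>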
</div>
</div>
</div>
<div id="filestudy" class="section level1" number="5">
<h1><span class="header-section-number">5</span> Files Case Study #1</h1>
<p>If you want to test the software with a LipidSearch output, here we provide the files used in Case Study #1, as described in the supplementary material of our paper […add link].</p>
<p><img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/solid/download.svg" width="20" height="20"> <a href="https://github.com/ShinyFabio/ADViSELipidomics_book/raw/main/data_example/Case_Study_%231.zip">Case_Study_#1.zip</a></p>
<p>Before use, please extract the files from the archive. Since this experiment uses internal standards, you can follow the example in Section @ref(sec32).</p>
</div>