1
00:00:00,000 --> 00:00:01,710
Helena Rasche: My name is Helena Rasche,
I've been working in
2
00:00:01,710 --> 00:00:05,130
open source for quite some time
now. I'm currently employed by
3
00:00:05,130 --> 00:00:07,620
the Erasmus Medical Center and
the Avans Hogeschool in
4
00:00:07,620 --> 00:00:10,890
Breda. And I'm here to tell
you a little bit about open
5
00:00:10,890 --> 00:00:15,270
source in research. So first,
what is free and open source
6
00:00:15,270 --> 00:00:19,110
software? The term is a bit
important, because open source
7
00:00:19,110 --> 00:00:22,290
software and free software are
not necessarily the same thing.
8
00:00:22,320 --> 00:00:26,490
Free software doesn't mean open.
Many of us remember
9
00:00:26,490 --> 00:00:29,550
freeware on the internet from
when we were younger. This is
10
00:00:29,550 --> 00:00:31,830
software where you have no
access to the source code, and you
11
00:00:31,830 --> 00:00:35,400
can't work with it, you can't
modify it due to the license. This
12
00:00:35,400 --> 00:00:39,120
is not great for science. On the
other hand, open source doesn't
13
00:00:39,120 --> 00:00:42,000
mean free. So for those of you
who are producing research
14
00:00:42,000 --> 00:00:44,310
software, just because you're
making your software open
15
00:00:44,310 --> 00:00:48,900
source doesn't mean it has
to be free software. And
16
00:00:48,900 --> 00:00:52,110
together, the intersection of
these two is often known as free
17
00:00:52,110 --> 00:00:55,680
and open source software, or
FOSS.
18
00:00:55,680 --> 00:00:58,800
And this just
refers to software that's
19
00:00:58,800 --> 00:01:03,750
both open and free. Open source
is simply licensing your work so
20
00:01:03,750 --> 00:01:07,440
it can be used how you want.
Making your software open
21
00:01:07,440 --> 00:01:11,490
source is just a matter of
setting out the terms of how you
22
00:01:11,490 --> 00:01:14,730
want your software to be used,
it doesn't mean that you have to
23
00:01:14,730 --> 00:01:16,860
give up control of your
software, it doesn't mean that
24
00:01:16,860 --> 00:01:19,680
you're putting it out to the
community forever. It just means
25
00:01:19,680 --> 00:01:22,590
you're choosing for
yourself and setting boundaries on
26
00:01:22,590 --> 00:01:25,830
how you want your software to be
used. So it's a very important
27
00:01:25,830 --> 00:01:30,150
step every time. For those who
want it, open source also makes
28
00:01:30,150 --> 00:01:34,170
it easy for others to remix your
software, to reuse it, to build
29
00:01:34,170 --> 00:01:37,860
upon your software, add new
features if they want. And if
30
00:01:37,860 --> 00:01:41,130
you're building a project where
community is important, where
31
00:01:41,130 --> 00:01:44,070
you want your software to be
used by a lot of people, making
32
00:01:44,070 --> 00:01:46,860
it open source in such a way
that people can reuse and work
33
00:01:46,860 --> 00:01:50,160
with it can be a really great
boost for your software, just in
34
00:01:50,160 --> 00:01:52,950
terms of visibility and people
who want to use it, things like
35
00:01:52,950 --> 00:01:59,610
that. So why use and promote
open source? Open source is part
36
00:01:59,640 --> 00:02:04,200
of the ethics of scientists and
hackers, a little bit, when you're
37
00:02:04,200 --> 00:02:07,530
writing the software when you're
working on projects together. A
38
00:02:07,530 --> 00:02:12,120
lot of the libraries you use for
Python, things like this, if
39
00:02:12,120 --> 00:02:13,980
you're using different
programming languages, a lot of
40
00:02:13,980 --> 00:02:17,190
Python libraries you use will be
open source. You're benefiting
41
00:02:17,190 --> 00:02:20,940
from a lot of work that other
people have done so far. And
42
00:02:21,390 --> 00:02:25,500
it's a nice feeling to
collaborate and give back. Now
43
00:02:25,500 --> 00:02:30,360
you've built your software or
your project on years and years
44
00:02:30,390 --> 00:02:33,630
of free work that was produced
before, and now you're giving
45
00:02:33,630 --> 00:02:37,980
back to the community. It also
means distributed innovation. So
46
00:02:38,010 --> 00:02:41,970
when you're making software open
source and free so that other
47
00:02:41,970 --> 00:02:44,580
people can work on it, this
means that other people will
48
00:02:44,580 --> 00:02:47,850
give back contributions, they'll
report bugs when they have
49
00:02:47,850 --> 00:02:50,520
issues running your software,
and they'll tell you about new features
50
00:02:50,520 --> 00:02:53,640
that they want. You don't have
to implement them. But you're
51
00:02:53,640 --> 00:02:56,760
able to take all of these good
ideas from the community and
52
00:02:56,760 --> 00:03:00,540
things like this. Importantly,
for science, open source is
53
00:03:00,540 --> 00:03:04,830
easier to review, reuse and
integrate. And when we talk
54
00:03:04,830 --> 00:03:07,860
about open source in science, if
you want your software to be
55
00:03:07,860 --> 00:03:11,220
distributed widely to be used in
a lot of systems, one of the
56
00:03:11,220 --> 00:03:14,850
best things you can do for it is
licensing it in such a way that
57
00:03:14,850 --> 00:03:17,880
everyone can use it and set it
up on their own system. Things
58
00:03:17,880 --> 00:03:21,240
like Galaxy. So I'm of course
very biased here, and we'll talk
59
00:03:21,240 --> 00:03:25,980
a lot about Galaxy. But in the
Galaxy community, we can give a
60
00:03:26,010 --> 00:03:29,910
huge platform of, you know,
UseGalaxy.eu, with 30,000 users,
61
00:03:29,910 --> 00:03:33,570
to open source; it's free
advertising. But we can only do
62
00:03:33,570 --> 00:03:37,260
that for open source software.
There are a lot of risks
63
00:03:37,260 --> 00:03:40,830
associated with closed code, too.
When you have closed source
64
00:03:40,830 --> 00:03:44,460
software that's used in science,
you can't review it very easily,
65
00:03:44,460 --> 00:03:47,310
the community can't review it
very easily; only the reviewers
66
00:03:47,310 --> 00:03:50,670
of the paper, sometimes. And if
there are errors in the
67
00:03:50,670 --> 00:03:53,670
software, if there are bugs,
then these can be missed for
68
00:03:53,670 --> 00:03:57,570
years and years. And these
affect a lot of scientific
69
00:03:57,570 --> 00:04:02,340
results downstream. So having
open source, reviewable software
70
00:04:02,340 --> 00:04:05,250
is very important for
reproducible, good-quality
71
00:04:05,250 --> 00:04:09,900
science. A lot of the journals
are adopting this and starting
72
00:04:09,900 --> 00:04:15,630
to require disclosure or
publishing of your source code
73
00:04:15,630 --> 00:04:18,750
for all of your software, too. This
is, from a scientific
74
00:04:18,750 --> 00:04:22,200
perspective, really a good thing
for the community. So how do you
75
00:04:22,200 --> 00:04:24,900
know if something is open
source? You'll see on a lot of
76
00:04:24,900 --> 00:04:27,270
GitHub repositories, which I
think everyone here is familiar
77
00:04:27,270 --> 00:04:31,260
with thanks to OLS, a little
license icon on the right-hand
78
00:04:31,260 --> 00:04:35,700
side of the repository. Also, on
things like slides, you can
79
00:04:35,700 --> 00:04:40,170
see the CC BY on this
slide deck at the top. You can
80
00:04:40,170 --> 00:04:42,720
look for a couple of different
files, including the license. And
81
00:04:42,720 --> 00:04:45,990
these files need to be there if
it's going to be open source
82
00:04:45,990 --> 00:04:48,960
software that you can reuse. But
it's not just for software. I
83
00:04:48,960 --> 00:04:52,140
know I talk a lot about software
but it's also for data. There
84
00:04:52,140 --> 00:04:55,020
are different licenses for data
for things like databases that
85
00:04:55,020 --> 00:04:58,740
you want to make accessible for
photos or training materials,
86
00:04:58,740 --> 00:05:02,730
things like this. All these are
options. And if you want people
87
00:05:02,730 --> 00:05:04,530
to be able to use them, you need
to license
88
00:05:04,529 --> 00:05:09,839
them. One common fear I hear
about a lot is: if I publish
89
00:05:09,839 --> 00:05:13,409
my software online, if I do it
in the open, I'll be
90
00:05:13,409 --> 00:05:16,289
scooped; someone will take my
software and claim it as their own.
91
00:05:16,739 --> 00:05:20,369
But this isn't necessarily true.
If someone steals your software,
92
00:05:20,609 --> 00:05:24,509
there is at least a traceable
log in GitHub, which we'll
93
00:05:24,509 --> 00:05:28,679
get to in a minute. There is a
copy of your software already
94
00:05:28,679 --> 00:05:31,529
online. And if you already have
a community around that, it'll
95
00:05:31,529 --> 00:05:34,229
be very obvious that someone
took it. And if you're still
96
00:05:34,229 --> 00:05:36,449
worried, I've heard some people
say that they publish
97
00:05:36,449 --> 00:05:40,259
preprints as a way to document
that, hey, they were first to write
98
00:05:40,259 --> 00:05:43,499
the software, and to really make
sure they stake their claim on
99
00:05:43,499 --> 00:05:48,089
that software. So don't worry
about that; just work in the
100
00:05:48,089 --> 00:05:50,399
open, it's better for the
community, it's better for the
101
00:05:50,399 --> 00:05:53,159
world. And it's good for
science. Publishing and sharing
102
00:05:53,159 --> 00:05:56,069
open source code. One of the
easiest and most effective ways
103
00:05:56,069 --> 00:05:59,369
to do this is version control.
I'm sure you're all learning
104
00:05:59,369 --> 00:06:02,789
about Git and GitHub if you
haven't already. But version
105
00:06:02,789 --> 00:06:05,579
control is a fantastic way to
publish and share code with
106
00:06:05,579 --> 00:06:08,579
others. It gives you a whole
timeline of your software, and
107
00:06:08,579 --> 00:06:12,899
it makes it easy to reuse,
contribute and make
108
00:06:12,899 --> 00:06:16,919
modifications to your software.
So, why? Collaborating is easy.
109
00:06:17,429 --> 00:06:21,119
One of the common things is
reverting accidents. If you make
110
00:06:21,119 --> 00:06:23,489
some bad changes in your code,
or one of your collaborators
111
00:06:23,489 --> 00:06:26,099
does, you can always revert; you
can always go back to before
112
00:06:26,099 --> 00:06:29,189
then. It makes it easy to
integrate the changes from
113
00:06:29,189 --> 00:06:33,089
multiple developers. Like with
Galaxy, there are some 200
114
00:06:33,089 --> 00:06:36,209
contributors to the code base or
the training materials as well.
115
00:06:36,599 --> 00:06:39,959
And all of us can work together
collaboratively because we use
116
00:06:39,959 --> 00:06:43,769
version control. Also, offsite
copies of your software:
117
00:06:44,369 --> 00:06:48,089
everyone has computer issues,
everyone loses a hard drive at
118
00:06:48,089 --> 00:06:52,319
some point or gets their
hard drive encrypted by some
119
00:06:52,319 --> 00:06:55,949
hackers, things like this. If you
have all of your work in the
120
00:06:55,949 --> 00:06:58,979
open in public, then you can
just download a new copy again
121
00:06:58,979 --> 00:07:04,049
and start working. Git and
GitHub are a very
122
00:07:04,049 --> 00:07:07,199
common choice. Git is one of the
most common version control
123
00:07:07,199 --> 00:07:10,559
systems, but there are others.
GitHub, likewise, is one of the
124
00:07:10,559 --> 00:07:15,389
most common Git hosts, but
there are others, depending on what
125
00:07:15,389 --> 00:07:18,509
you want to use. One of the nice
things you get with GitHub is a
126
00:07:18,509 --> 00:07:21,599
large existing user base and
large community of people who
127
00:07:21,599 --> 00:07:24,659
will be able to contribute to
your software; a low barrier to
128
00:07:24,659 --> 00:07:29,279
entry. If you need to learn more
about Git, there is a great set
129
00:07:29,279 --> 00:07:32,129
of lessons from the Software
Carpentry. But one of the
130
00:07:32,129 --> 00:07:35,729
important notes is that Git is
very, very complex. My partner
131
00:07:35,729 --> 00:07:42,239
teaches a get-together session
where they teach how to use Git
132
00:07:42,239 --> 00:07:45,809
to the colleagues in our office.
And there is just so much to
133
00:07:45,809 --> 00:07:48,479
learn about Git, but you don't
need to learn all of it now.
134
00:07:49,139 --> 00:07:51,479
Just start with the important
parts, the rest comes later.
135
00:07:51,839 --> 00:07:54,299
You'll see a lot of
guides online that will say, oh,
136
00:07:54,299 --> 00:07:56,789
you need to learn about how the
commit graph works and things
137
00:07:56,789 --> 00:08:00,119
like this. But if you just want
to publish your code, you don't
138
00:08:00,119 --> 00:08:03,719
need any of the fancy stuff. So
a few steps to make your work
139
00:08:03,719 --> 00:08:08,249
open source. A README is a very
important part of this. If you
140
00:08:08,249 --> 00:08:11,429
want people to know what your
software is and how to start
141
00:08:11,429 --> 00:08:15,329
using it. That's the number one
thing people will see. So be
142
00:08:15,329 --> 00:08:18,599
sure to include lots of good
images there. License: if
143
00:08:18,599 --> 00:08:21,239
it's going to be open
source, it needs a license file.
144
00:08:21,239 --> 00:08:24,599
So it takes two minutes
to add a license to GitHub,
145
00:08:24,599 --> 00:08:29,009
it's really easy. Same
thing with the contributing guide: GitHub
146
00:08:29,009 --> 00:08:31,829
has some template contributing
guides that make it really easy
147
00:08:31,829 --> 00:08:35,969
to tell people how you want them
to contribute to your
148
00:08:35,969 --> 00:08:41,039
repositories. Having a public
roadmap, we aim to talk about
149
00:08:41,039 --> 00:08:44,459
Kanban boards and agile.
Having a public roadmap is a
150
00:08:44,459 --> 00:08:47,369
great way to tell the community
what you're working on, what
151
00:08:47,369 --> 00:08:49,949
features you're going to
implement things like that, that
152
00:08:49,949 --> 00:08:54,299
can help people get excited for
your software. Publishing a list
153
00:08:54,299 --> 00:08:57,959
of issues: same thing. It feels bad to
say, hey, my software has bugs.
154
00:08:57,959 --> 00:09:00,899
But at least by putting them out in
the open, you can track the
155
00:09:00,899 --> 00:09:04,049
things you've done or not done.
A Code of Conduct, as you've
156
00:09:04,049 --> 00:09:07,169
learned from OLS, is a very
important thing. Contact and
157
00:09:07,169 --> 00:09:09,809
citation can be very useful as
well. If you have a GitHub
158
00:09:09,809 --> 00:09:14,069
repository, you can easily get
a DOI from Zenodo or Figshare,
159
00:09:14,609 --> 00:09:17,429
if you need one. There's a
nice website for how to choose a
160
00:09:17,429 --> 00:09:22,169
license. There are lots of
different license choices. And
161
00:09:22,169 --> 00:09:25,109
they give you a lot of different
freedom to choose what you want
162
00:09:25,469 --> 00:09:28,079
your software to be able to do
or what you want other people to
163
00:09:28,079 --> 00:09:30,869
be able to do with your
software. Some people don't want
164
00:09:30,869 --> 00:09:34,589
businesses to use their software
for free. And there are licenses
165
00:09:34,589 --> 00:09:37,289
that support this. Or some people
want everyone to use it for
166
00:09:37,289 --> 00:09:40,949
free. Lots of different choices.
But the ultimate goal, of
167
00:09:40,949 --> 00:09:43,589
course, is full reproducibility.
And we're getting a lot closer
168
00:09:43,589 --> 00:09:46,169
with things like Jupyter and
Binder, where you can publish
169
00:09:46,169 --> 00:09:48,479
your software but also a
notebook where people can run
170
00:09:48,479 --> 00:09:51,389
your software online, which is a
fantastic way to get people to
171
00:09:51,389 --> 00:09:52,259
use your software.
172
00:09:53,550 --> 00:09:56,400
Taking it further: there are a lot
of ways for you to contribute if
173
00:09:56,400 --> 00:10:00,000
you're a first time contributor
and to get involved in the open
174
00:10:00,000 --> 00:10:03,090
source software movement and
contributing to the open source
175
00:10:03,090 --> 00:10:05,970
communities. So there are lots
of nice links here if you want
176
00:10:05,970 --> 00:10:09,090
to explore them. And lastly,
The Turing Way has a nice
177
00:10:09,090 --> 00:10:12,690
handbook on reproducible data
science and making software open
178
00:10:12,690 --> 00:10:16,260
source, and publishing it and making it
accessible to people. So with
179
00:10:16,260 --> 00:10:17,070
that, thank you.