forked from cbx33/gitt
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathchap2.tex
469 lines (387 loc) · 26.8 KB
/
chap2.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
% chap2.tex - Week 2
\chapter{Week 2}
\section{Day 1 - ``We are coders, we use Git!''}
\subsection{Setting Up the Environment}
So now we are ready to begin delving into and actually using Git, right? Well, not exactly.
First we have to decide upon how the workflow model we have envisaged is implemented in our version control system.
With Git being so versatile, it's both a blessing and a danger.
It is a good idea to define from early on, exactly how you would like the developers, lieutenants and dictators to behave, before you begin actually committing any code.
Sometimes this isn't possible.
It's quite feasible that you have never used a version control system like Git before and you begin by muddling your way through.
This is normal, but if you are in charge of implementing this type of system for a professional environment, you should really consider first how this is going to work.
Conceptually, the model which was discussed previously is easy to imagine.
We have two dictators, who both have access to the blessed repository and then several developers, who are going to have their changes reviewed and included, by the aforementioned dictators.
The physical representation of the workflow model is summarised in the diagram below.
\figuregith{7cm}{images/f-w2-d1.pdf}{Tamagoyaki Inc's Physical Structure}
The physical structure is all well and good, but it doesn't determine exactly how the data is moved, just who is responsible for it at each stage in the process.
What is required, is a detailed analysis of where the data flows from and to.
A data flow diagram is useful, but not essential.
However, we will create a slightly different form of diagram to show how the data will be moved from one person to another.
Before we go ahead and look at the diagram, let's go back to the trenches to see how the guys are coping with their repository design.
\begin{trenches}
``John, why are we all sat in here at 9:45am on Monday morning.'' Klaus whined.
``I haven't even ingested enough coffee to check emails yet, let alone meet with people.''
John grinned, ``I don't think any amount of coffee will help you there Klaus, it's your winning personality that will pull you through.''
The rest of the team laughed and then subsided as John started drawing furiously on the board.
``So we have our physical model. We know which people are going to be in charge of things, but we don't know yet how to arrange our repositories.''
``Good point,'' chimed Mike.
``So. Obviously we're going to have a blessed repository,'' said John, drawing a circle on the board.
He stepped back, one hand on chin.
``Then I would imagine Klaus and I will have clones of that repository on our local machines. We will then modify those and push our changes back up to the central copy.''
``I thought Git didn't have a central copy?'' asked Martha.
There were other moans and grunts.
``Well,'' said John,
``as far as I understand it, it doesn't. I mean Klaus and I will have local copies of the repository too. We will work on those and then sync our changes back to the server. It's a sync, moreover a copy. I think it's actually called a clone.''
He nodded to himself, ``And, since Klaus and I will hardly ever overlap on code, we shouldn't ever need to merge or deal with conflicts.''
``But what about us monkeys?'' asked Martha,
``Where do we get our clones from?''
``From the central server of course,'' Rob stated smiling.
``Yes,'' John said,
``but I think what Martha is trying to say, is how will you get your updates?''
He started to walk around the room, and one or two of the developers followed him as he reached the windows and leant on the sill.
``I guess you would merge your branch with the blessed one.''
The room went silent and the only noise that could be heard was the rattling of the air conditioner in the ceiling above.
Simon spoke out, ``Well, I was reading over the weekend about this thing called rebase and how in some cases a rebase is better than merging.''
``What's rebase and how is it different to merging?'' asked Mike.
``Well, rebasing is pretty darn clever. Think of it this way. You have an upstream branch, in this case, our blessed repository. You are happily making changes. When the upstream changes, you could merge the changes in from blessed. If you do this, you create a single commit which merges the changes in. It works, but\ldots'' he trailed off a little,
``it can cause problems in certain instances. A better way to handle it is with rebasing. Rebasing can take all the changes you have made, squirrel them away, pull down all the changes to bring it up to date, and then whack your changes on top.
It's not always the best choice, but we should consider it.''
John breathed out, ``It sounds pretty cool Simon, but one thing is abundantly clear, we need to learn more about the Git basics before we start delving into this merging and rebasing. Let's spend the rest of the day playing with some test repositories and reconvene tomorrow.''
\end{trenches}
If you've never played with a version control system before it is a good idea to take some time to just play.
Pretty soon you'll have learnt the basics and will be in a position where you will want to put your newly honed skills into practice.
Though playing on test repositories is good, it is quite usual that you need to actually use the system in a real environment before real problems arise.
The rest of this chapter is a very quick introduction to Git.
It is presented as an introduction, because it is hoped and expected that the you will take some time out to get to know the system and how it works.
Knowing something about the underlying mechanisms of Git will definitely help you as you progress and will save you an awful lot of frustration later on when operations do not seem to function as you expect.
After Hours 2 focuses on the Git object model, something all Git users should have a basic understanding of.
\subsection{Initialising A Repository}
The first thing we need to do is to understand two very important things:
\begin{enumerate}
\item How to create a Git repository
\item What a Git repository actually is
\end{enumerate}
The first of these is relatively easy to perform.
\begin{code}
john@satsuki:~$ mkdir coderepo
john@satsuki:~$ cd coderepo/
john@satsuki:~/coderepo$ git init
Initialized empty Git repository in /home/john/coderepo/.git/
john@satsuki:~/coderepo$
\end{code}
If you are a Windows user, the above output may seem a little strange to you.
In the Linux world, the shell often has a much more descriptive prompt than on Windows.
In the case above, it takes the format \texttt{<user>@<host>:<current\_directory>}.
The \textasciitilde{} is a shortcut meaning \emph{Home Directory}, so really \texttt{~/coderepo} actually means \texttt{/home/john/coderepo}.
What we've done here is create a new directory called coderepo, moved into it, and then run the \indexgit{init} command.
The result of this command is a new directory in the coderepo directory called \texttt{.git}.
This directory will hold a local copy of our entire repository.
This will allow us to create branches, merge changes, rebase things and ultimately push our changes to somewhere else.
Something that is crucial to the running of a repository, whether you are an administrator of Git, or a developer who is using it, is an understanding of how Git works.
It is fine to jump in and play with the repository and test the water, but before committing to using Git in a production environment, you should understand what Git actually does in the background in some detail.
During the writing of this book several people have told me that Git is one of the only version control systems where a good understanding of how the underlying system works is not just highly recommended, but bordering on essential.
Let us take a few minutes to talk about how Git works internally and how the data is actually stored.
Git doesn't store changes to files, but actual snapshots of files at specific points in time.
In fact, each time a commit is made, Git actually makes a record of how the entire filesystem looked at that point, even if only one file is changed.
It refers to files by calculating a \index{SHA-1}SHA-1 hash of the file and using this as an ID, because the hash is unique to the contents of the file, it is easy to detect if a file has changed.
If the SHA-1 hash of a file changes, then the file must have been modified.
When a commit is made to the repository, Git stores a few things.
A \index{commit}commit object is created.
This contains, among other things, information about who made the commit, the parent of the commit and a hash that points to a tree object.
The tree object describes what the filesystem looked like at the time of the commit.
In other words the tree object, tells Git what files were in there.
Lastly, Git stores the files that were in the repository under their SHA-1 names in the objects directory.
Of course Git is super clever here because if you have exactly the same file in multiple commits, the SHA-1 hash of that file doesn't change and therefore Git only stores one copy of the file to save space.
The commit object is also referred to by an SHA-1 hash.
This is different to many other version control systems which use either a number that refers to the repository or a per file version number.
Getting used to seeing 40 character SHA-1 hashes can take a little time.
Saying "I need the commit referred to as bf81617d6417d9380e06785f8ed23b247bea8f6d," is certainly not as easy as saying you need revision 6.
However, Git handles these hashes well, and you can reference a commit using a few of the characters from the beginning, as long as those characters uniquely refer to that commit, i.e.,\ as long as your choice is not in any way ambiguous.
The above description may sound rather foreign to you.
If this is the case, you should really spend some time reading through it again and possibly even jump to the After Hours section at the end of this chapter.
Understanding the way Git stores objects is a rather important aspect of Git and though it may seem rather confusing at first,
learning this will help you later on in understanding more complex matters.
\section{Day 2 - ``Making commitments''}
\subsection{Let's work on our repository}
The most simple way of committing a file into the repository is to create it, or copy it to the working copy of our repository and use the commands below.
The working copy is the version of the repository we have currently checked out.
This terminology will be explained more later on.
\begin{code}
john@satsuki:~/coderepo$ touch my_first_committed_file
john@satsuki:~/coderepo$ git add my_first_committed_file
john@satsuki:~/coderepo$ git commit -m 'My First Ever Commit'
[master (root-commit) cfe23cb] My First Ever Commit
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 my_first_committed_file
john@satsuki:~/coderepo$
\end{code}
What we have done here, is to create a new blank file, using the unix \texttt{touch} command and add it into the repository using the \indexgit{add} command.
Windows users note, you will not have the touch command in your default command set, but if you are using Git Bash from the msysgit package, you should have it available to you.
Then we have committed it into the repository using the \indexgit{commit} command.
Let's make a few changes to our working copy and see what the result is.
First we are going to add another two new files, then we are going to make changes to our original file and finally we are going to run git status to see what Git has to say about our changes.
\begin{code}
john@satsuki:~/coderepo$ echo "Change1" > my_first_committed_file
john@satsuki:~/coderepo$ touch my_second_committed_file
john@satsuki:~/coderepo$ touch my_third_committed_file
john@satsuki:~/coderepo$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: my_first_committed_file
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_second_committed_file
# my_third_committed_file
no changes added to commit (use "git add" and/or "git commit -a")
john@satsuki:~/coderepo$
\end{code}
So we can see that \indexgit{status} is reporting that there are changes to our first committed file, and that our second and third files are \textbf{untracked}.
\index{untracked!files}Untracked files are ones which Git detects as being present in the working directory, but which haven't yet been added and there for upon running a commit, these files will not be added to the repository.
Notice that if we tried to run a commit now, nothing would actually be committed to the repository.
Even though there are changes to to my\_first\_committed\_file, we have not asked Git to include these.
So, let's go ahead and do that, and at the same time we'll make a few changes to my\_second\_committed\_file, and add those too.
\begin{code}
john@satsuki:~/coderepo$ git add my_first_committed_file
john@satsuki:~/coderepo$ echo "Change1" > my_second_committed_file
john@satsuki:~/coderepo$ git add my_second_committed_file
john@satsuki:~/coderepo$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: my_first_committed_file
# new file: my_second_committed_file
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_third_committed_file
john@satsuki:~/coderepo$
\end{code}
Now we can see that one of the sections has changed to "Changes to be committed".
So this means that Git has recognised and remembered that we are expecting these files to be committed when we next run a \texttt{git commit}.
\subsection{Committing the Uncommitted}
\begin{trenches}
``John, what is going on here?'' shouted Klaus from across the hallway.
The entire office had heard Klaus banging his hands down on the desk for the last fifteen minutes.
``John!'' the shout turned into a scream.
``Calm down Klaus, I'm just coming.''
John walked over to Klaus and pulled up one of the folding plastic chairs.
After a few minutes of fumbling he finally managed to take up his position next to an infuriated Klaus.
``John, Git is driving me crazy. I have added files to the repository and I keep running a commit, but the changes aren't getting put into the blasted repo.''
Klaus was clearly distressed and John resisted the urge make jokes.
John pointed at the screen.
``Run a git status Klaus and I'll show you what the problem is.''
\end{trenches}
To understand what Klaus was getting in a spin about, let's make a change to \texttt{my\_second\_committed\_file} now and see how this affects things.
Remember we have already added the file, but we haven't yet made a commit.
\begin{callout}{Note}{A little Linux note}It should be noted that there is a subtle difference between \texttt{>} and \texttt{>>}.
The former will \emph{redirect} the result of a command to a file, overwriting its contents.
A double arrow appends the result of a command to the specified file.
We use both of them in the examples as a way to show how we can simulate changing a file completely and appending extra lines to a file.
\end{callout}
\begin{code}
john@satsuki:~/coderepo$ echo "Change2" >> my_second_committed_file
john@satsuki:~/coderepo$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: my_first_committed_file
# new file: my_second_committed_file
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: my_second_committed_file
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_third_committed_file
john@satsuki:~/coderepo$
\end{code}
How interesting! We now have three sections and one of our files appears twice under both Changes to be committed and Changed but not updated.
What does this mean? If you remember back, we spoke about a \index{staging}staging area.
This is one area in which Git differs to many version control systems.
When you \textbf{add} a file into the repository, Git will actually make a copy of that file and move it into the staging area.
If you then go ahead and change that file, you would need to run another git add in order for Git to copy your changed file into the staging area.
The most important thing to remember is that Git will only ever commit what is in the staging area.
So, if we go ahead and run our commit now, we will only have the changes marked in Changes to be committed appearing in our repository.
\begin{code}
john@satsuki:~/coderepo$ git commit -m 'Made a few changes to first and second files'
[master 163f061] Made a few changes to first and second files
2 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 my_second_committed_file
john@satsuki:~/coderepo$
\end{code}
In our examples, we have used the syntax \texttt{git commit -m 'Message'}.
This is a slightly special way of committing, it allows us to specify our commit log message on the command line.
If we wanted to, we could run the command git commit and this would open a text editor that we could use to input our commands.
Let us finish off our round of committing by using the \texttt{git commit -a} option.
This commits all of the changes to files which are already tracked.
Consequently we do not have to specify the files with \texttt{git add}, like we have had to previously.
Any file which has been modified and has previously been added to the repository, will have it's changes committed upon running that command.
\begin{code}
john@satsuki:~/coderepo$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: my_second_committed_file
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_third_committed_file
no changes added to commit (use "git add" and/or "git commit -a")
john@satsuki:~/coderepo$
\end{code}
\begin{code}
john@satsuki:~/coderepo$ git commit -a -m 'Finished adding
initial files'
[master 9938a0c] Finished adding initial files
1 files changed, 1 insertions(+), 0 deletions(-)
john@satsuki:~/coderepo$
\end{code}
\begin{code}
john@satsuki:~/coderepo$ git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_third_committed_file
nothing added to commit but untracked files present (use "git add" to track)
john@satsuki:~/coderepo$
\end{code}
\section{Day 4 - ``Let's do this right, not fast''}
\subsection{Uh-Oh I Think I Made A Mistake}
So now we are fairly well acquainted with adding files into the repository and performing commits.
In a short while we will learn about how to view the changes we have made and perform diffs against various objects.
Before we close out the week, we need to go back to the trenches one last time.
\begin{trenches}
``Rob, ya got a second?'' asked Mike.
``Sure, what's up?'' replied Rob from across the office.
``Gimme two secs to make this commit.'' The office went silent again whilst Rob's fingers darted across the keyboard.
``Ahh. Damn it!'' shouted Rob.
Mike rose from his chair and walked over to Rob.
``What's up?''
``I just added a file into the staging area, but I don't want it there.''
He shook his head, ``Well not yet anyway.''
Mike chuckled, ``Sorry for interrupting dude.''
``Nah, it's OK, I just need to know how to pull this file out of the index.''
``Git reset,'' shouted a voice.
The stillness of the office was interrupted by a chair free wheeling across the floor.
The occupant of the chair was Klaus.
He seemed proud that he was finally getting to grips with things.
``You can use git reset to reset a file that's in the index.''
He grabbed at the keyboard, ``Here, lemme show you.''
\end{trenches}
The \indexgit{reset} command is great at removing things from the index that you don't want to be there.
Of course, it can do a great many other things, but for now, let us concern ourselves with the scenario presented above.
We are working away, and have added a number of files into the index ready for committing, when we discover that we are actually not ready to commit them.
In the following example, we are going to add the file my\_third\_committed\_file and then remove it from the index.
\begin{code}
john@satsuki:~/coderepo$ git add my_third_committed_file
john@satsuki:~/coderepo$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: my_third_committed_file
#
john@satsuki:~/coderepo$
\end{code}
Notice how my\_third\_committed\_file is now ready to be committed to repository.
The problem is we need to add something more to it before we do.
Remember that when we run the git add command, we are copying the file from our working copy to the index.
If we decide we no longer want that file in the repository, we can run the following.
\begin{code}
john@satsuki:~/coderepo$ git reset my_third_committed_file
john@satsuki:~/coderepo$ git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# my_third_committed_file
nothing added to commit but untracked files present (use "git add" to track)
john@satsuki:~/coderepo$
\end{code}
We have discarded the file which was residing in the index.
This is very important to note.
We are not moving the file from the index back into our working directory, we are literally just deleting the file from the index.
Our working copy remains unaffected.
We could run the \indexgit{reset} command without appending a file.
If we did this, all the files in the index would have been discarded.
\begin{trenches}
``So, I think we are all agreed, I'll keep a version of the repository under Git version control.
Until everyone else feels comfortable with some of the more advanced features.'' John looked around the room for any disagreements but there were none.
``Agreed John,'' said Markus, ``I'm pleased with how you guys are progressing, very pleased, but like John said, it's far better for us to take our time and to implement this correctly, than to rush it and to end up with something that we can't administrate and that we don't know how it works.''
``So next week, I want you all to start playing with diffing and logs and don't forget we have an important release due too.'' John pushed his glasses further up his nose.
``The week after that we'll start looking at branching and by then we may be at the stage where we can implement our model.''
Everyone nodded in agreement.
\end{trenches}
\begin{callout}{Knowledge}{How do we change the commit message editor?}\index{changing!commit editor}
We spoke earlier about the configuration file and how it stores information about our Git instance.
Git can use any text editor you require, even a graphical one, though the need rarely arises.
As mentioned earlier, Git has a preference lever when talking about configuration.
First and foremost it will look in the repositories own 'config' file in the .git folder.
Then, it will look in the users \texttt{\textasciitilde/.gitconfig} file.
Finally, Git will look in your distributions own global folder.
If we wanted to change the editor that Git would use to modify commit messages, we can either modify the files directly, or run a command similar to the following;
\begin{code}
git config core.editor "nano"
\end{code}
If we want the changes to apply globally, meaning it would affect all repositories we administrate as this user, unless overridden by a repository setting, we would run the following;
\begin{code}
git config --global core.editor "nano"
\end{code}
It is worth noting that you can also use the \$EDITOR environment variable to accomplish the same thing.
Many people use this in preference to the modifying the Git configuration simply because many other programs honour this setting.
\end{callout}
\index{files!moving}\index{files!deleting}\index{files!adding}Now we know how to add files into the repository.
The question is, what do we do if we need to remove a file, or even rename it.
Well, git has some commands to help with that.
\indexgit{rm} and \indexgit{mv} delete and move files respectively.
Usually when you want to remove files from the repository, or move them, this is how you will handle it, but what if you have already deleted a tracked file manually? Well, you have two options.
You could run \texttt{git commit -a}, but remember this will commit all changes to tracked files.
You could also run a \texttt{git rm <filename>} with the name of the file you have just deleted.
Git will then push that change into the staging area ready for commit.
The same applies to moving a file
However, it is worth noting something in the way that Git handles renames.
Git does not track renames explicitly.
This means that by running the \texttt{git mv <source> <dest>} command, you are essentially running a Linux \texttt{mv} command, followed by the \texttt{git rm} on the source file and \texttt{git add} on the destination file.
Running the \texttt{git mv} command is a shorthand way of doing just that.
It is worth playing with this to ensure that you understand what is happening.
As an exercise, inspect the repository after each command so that you understand at what point Git recognises your actions as a rename.
We have run through a few basic commands in Git.
If you are familiar with version control systems, then possibly the only real difference you will have noticed is that of the staging area.
It really is powerful, and allows you to organise and prepare your commits, so that they are both meaningful and coherent.
For Tamagoyaki Inc, their plan to implement version control was far too aggressive.
Most of the members of the team had never even used a version control system.
When deciding to implement version control, it is essential to ensure that you are doing it for the right reasons.
Version control is a tool to help you to keep things in order, but remember tools are nothing without process.
It is process that is key to the order.
\clearpage
\section{Summary - John's Notes}
\subsection{Commands}
\begin{itemize}
\item\texttt{git add} - Add files into the index or staging area
\item\texttt{git commit} - Commit files into the repository, using text editor for commit message
\item\texttt{git commit -m '<Message>'} - Commit files into the repository, using the command line to supply commit message
\item\texttt{git commit -a} - Commit all tracked files into the repository that have changed, using text editor for commit message
\item\texttt{git reset <path>} - Remove file from index or staging area
\item\texttt{git status} - Show the status of tracked, changed, untracked files
\end{itemize}
\subsection{Terminology}
\begin{itemize}
\index{Terminology!Branch}\item\textbf{Branch} - A way of working on the same set of code in parallel without modifications overlapping
\index{Terminology!Commit}\item\textbf{Commit} - A group of objects and a tree in a Git repository
\end{itemize}