The minute I learned about "rebase -i" and "add -p" changed how I think about commits. I learned how easily I could keep the history clean and, conversely, I learned the huge value that a clean history has for maintenance.
Now, building the commits as self-contained entities that don't break the build in between not only helps me while searching bugs later on, it sometimes helps me detect code smells around unneeded dependencies.
That said, I still like to merge big features with --no-ff if they change a lot of code and evolved over a long time, as that, again, helps keep history clean because a reader can clearly distinguish code before the big change from code after the big change.
Of course the individual commits in the branch are still clean and readable, but the explicit merge still helps if you look at the history after some time.
"You said 'a long time in development' - surely the merge target has changed in between. Why still --no-ff?" you might ask.
The reason, again, is clean history: before merging I usually rebase on top of the merge target to remove any bitrot and to keep the merge commit clean. Having merge commits with huge code changes in them which were caused by fixing merge conflicts, again, feels bad.
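For readers who have not seen this workflow, here is a minimal sketch of "rebase first, then merge with --no-ff" in a throwaway repository. The branch name big-feature and all file names are made up for illustration, and a real rebase may of course stop for conflicts that this toy example does not have.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # use "main" regardless of the configured default
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > base.txt
git add base.txt && git commit -q -m "initial commit"

# A long-running feature branch with two clean commits (hypothetical names)
git checkout -q -b big-feature
echo one > feature1.txt && git add feature1.txt && git commit -q -m "feature: part one"
echo two > feature2.txt && git add feature2.txt && git commit -q -m "feature: part two"

# Meanwhile the merge target moves on
git checkout -q main
echo fix > hotfix.txt && git add hotfix.txt && git commit -q -m "unrelated fix on main"

# Rebase first, so any conflict is resolved inside the feature commits
git checkout -q big-feature
git rebase -q main

# Then force a merge commit even though a fast-forward would be possible
git checkout -q main
git merge -q --no-ff -m "Merge big-feature" big-feature

git log --graph --oneline
```

Because the branch was rebased right before merging, the merge commit itself carries no changes of its own; it only marks where the feature starts and ends in the graph.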
Just like you, I enjoy rebase -i to change history, but I also hear some people claim the history should be kept as it is and should not be rewritten. What are your arguments for rebase?
The public history is what ends up on the repository we deploy from. Whenever a commit is pushed there, it stays there. There will never be any rebasing (minus emergencies like removing accidentally committed files which we don't own a license for - didn't happen so far though).
"rebase -i" is a tool for personal development use. It's not a tool to use on a public repository as it will make following history incredibly hard and it will screw with the clones other developers might have.
Conversely though, what I do on my personal development machine or on my personal public clone (every one of us has a personal public clone we use for code reviews or discussions around code) is my business.
Nobody is telling me which editor to use and nobody is telling me whether I can clean up my commits or not.
Now in general, since learning that having clean commits is possible (it's not in subversion for example), I encourage my fellow developers to have clean commits and I discourage them from committing those famous "oops - removed typo" or "oops - added forgotten file" commits as they are completely useless for the overall history of the project.
Two months from now, nobody is going to care about you forgetting to add a file. But I'm likely going to care about when a feature has been added and why some lines have changed. So that's what I want to have in the public repository. Not a history of your personal forgetfulness.
If they manage to do that without ever rebasing (you can do it with add -p, it's just easy to make a mistake), then fine. In the end, I only care about a clean history on our public repository.
IMO, that's really the wrong way to go, and it's one of the big reasons I absolutely loathe git. I want any changes that are in my tree, ever, to be in the order and position in which they happened. If somebody screwed up and forgot to add a file, fine--add it in another commit. It's not like commits cost money.
As far as rolling back later--meh? I've never had trouble in a 300Krev heavily branched SVN barf, I strongly doubt it's suddenly harder in a DVCS. Merge tags are your friend, and indelible history is a good thing.
Commits don't cost money, but time wasted on "added forgotten files"-commits while parsing the history to trace a bug does cost money, so I'd rather not have the commits.
Additionally, it's impossible for you or anybody else to find out whether I have rebased my personal history before pushing. As such, it's totally inconsequential for the main repository whether I rebased or not.
As I said: I think rebase is a personal development tool, not one you would alter public history with.
I dunno, I think the claim that those commits "waste time" (in the sense of any meaningful amount of time, even cumulatively) is a little hyperbolic.
I guess you view history differently than I do: I consider all development history to be "public history" regardless of whether it was pulled in from a clone or not. If you commit it to a repository I am going to be fulfilling a pull request from, I want the history there.
It can help with git bisect, for example: you avoid having half of your commits not compile properly because half the time you forgot to add one of the files to your commit.
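To make the bisect point concrete, here is a rough sketch in a throwaway repository. The script check.sh is a hypothetical stand-in for "build and test"; it exits 0 on good commits and 1 on bad ones. Every commit in this toy history "builds", which is exactly what lets bisect run unattended. A commit that cannot be built at all would have to be skipped (git bisect run treats exit code 125 as "skip"), and every skip makes the result less precise.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical check script: exit 0 means "this commit is good"
printf '#!/bin/sh\nexit 0\n' > check.sh
git add check.sh && git commit -q -m "commit 1: working"
for i in 2 3; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt" && git commit -q -m "commit $i: working"
done

# The commit that introduces the bug: from here on the check fails
printf '#!/bin/sh\nexit 1\n' > check.sh
git commit -q -am "commit 4: introduces the bug"
for i in 5 6; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt" && git commit -q -m "commit $i: still broken"
done

# bad = HEAD, good = the first commit; let git drive the search
git bisect start HEAD HEAD~5 >/dev/null
git bisect run sh ./check.sh >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)     # must be read before "bisect reset"
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$first_bad"
```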
Indelible history is good for public projects (and no one is arguing about that) and for change control systems but less good, in my estimation, in cases where mistakes are easily made, have small to non-existent consequences, and serve no historic purpose.
But you will get it in the form that I would like you to have it, not as it happened.
But there is no way for you to know besides the fact that all commits you are going to pull are self-contained and none of them breaks the build.
Would you reject perfect looking commits, self-contained, perfectly documented and forming a perfect temporal history based on the fact that they are too perfect not to have been created using rebase? Because that's the only indication you have that rebase was used.
> But you will get it in the form that I would like you to have it, not as it happened.
And that's actually a very large part of why I don't use something that makes rebase easy. Because, fundamentally, I do not care how you "want" me to get it. I want to get it how it was put into the repository to begin with. I want to get it how it was committed--because how it was done matters to me, as a developer and as a person. This is an entirely emotional position and I don't care: show me how it was done. It matters to me, and given my personal axioms there is a decent argument, in my mind, that doing otherwise is disrespecting people who might want to see how you "put it all together."
You're certainly welcome to disagree. But because I use Hg for my projects and for any project I contribute to, and because my usual co-workers aren't going to go through the hassle of enabling Hg's rebase extension and using it just for the hell of it, I generally get what I want in the areas I care about. =)
If your workflow or VCS doesn't allow rebase natively then people will implement it in the filesystem by never committing.
Git's approach to committing is: "commit early, commit often". The corollary is: "don't worry about perfection, we can fix it later". Such flexibility to me is very enabling and allows for a lot of very beneficial experimentation in the process of developing features.
Honestly, I don't understand your fascination with the sausage-making. If I were to give you the pre-rebase and post-rebase version of patches I would wager you would find much more value in the latter. And even if seeing the wandering, hacking, slashing, typoing, re-indenting, etc. is instructive to you, any future maintainer of the code will be far less pleased.
"I want to get it how it was committed--because how it was done matters to me, as a developer and as a person."
Or as a micro-managing boss. I don't need someone staring over my shoulder as I work or after I work. My mistakes aren't your business and I find your attitude unpleasant.
If a change was worth committing then it is worth sharing that commit with everyone. Otherwise you run the very real risk of losing important information about the design of a feature and the bugs that were found and addressed during development.
Every change should be accompanied by a well-described commit message, and big changes are much harder to review.
I can see a very small positive in "hiding" the commits which resolve process issues like forgetting to add a file, but in the long run you shouldn't have very many of these anyway, so you shouldn't worry about them.
> If a change was worth committing then it is worth
> sharing that commit with everyone
You're misunderstanding some of the workflows that people are discussing. Sometimes I commit things that are half-finished, or even half-baked because I know that when it comes time to push I can rewrite things into a set of commits that makes sense.
This workflow makes sense because rewriting is easy enough. Obviously, I might not do this if I had to publish every commit that I made. But then I would just resort to using something like quilt to manage patches on top of SVN, which is ridiculous. Your VCS is a patch management system. The idea that someone would use a patch management system on top of a patch management system suggests that something is broken (yes, I have had people on HN claim that git sucks because SVN + quilt 'work for me').
How are they useful? Why would anyone care about how you developed a single bugfix or a feature?
In your model, often the commits are not even sequential in the log because you might find a mistake only after committing several other changes. I can't see how not rebasing makes commit history better in any way at all. I would like to hear your reasoning.
The way I see it, instead of a series of commits that implement something, you could have a single patch (commit) that implements something, making it much easier to
0) find all the code that implements a certain feature, because it's a self-contained commit,
1) find bugs via bisection,
2) port to other versions via cherry-picking and
3) read the changelog and figure out what the hell is actually happening because there are no trash commits around obscuring things.
EDIT:
Just to clarify, I do not think that this is a matter of opinion or preference. That would imply that both approaches are equally valid.
I consider rebasing a tool that enables a vastly superior workflow. I have given a few reasons why I think it is superior and I am interested in counterarguments or at least reasoning as to why not making perfect commits (to the best of your ability) is preferable or even acceptable at all.
> Hiding away the development of a feature into one large commit makes it harder for people to review.
You have it backwards. One commit is much easier to review than three commits that you might not even know are related.
Assuming you know what you're doing, the commits you create with rebase are not large, they are just the perfect size. They contain the code needed for a single change and nothing else.
Sometimes a feature might actually take two or more commits, but then those commits represent two subfeatures... For example, you might first need to implement a new API, and then write a new feature that uses that API. That's two commits. If you forget something from the API or notice that it's problematic while coding the feature, then rebase will allow you to fix the first patch, instead of splitting code across multiple commits that make no sense separately.
One might argue that in this situation the feature patch makes no sense without the API patch but in fact it's a feature dependency, not a code dependency... As long as the API exists, the feature implementation is just fine as a standalone commit.
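The "fix the first patch" step described above can be sketched with git commit --fixup followed by an autosquashing interactive rebase. The file and branch names are hypothetical. GIT_SEQUENCE_EDITOR=true merely accepts the todo list git generates, so the script runs unattended; by hand you would simply save and close the editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README && git add README && git commit -q -m "initial commit"

git checkout -q -b feature
echo "api v1" > api.txt && git add api.txt && git commit -q -m "add new API"
api_commit=$(git rev-parse HEAD)
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "add feature using the API"

# While writing the feature we notice the API is missing something
echo "api helper" >> api.txt
git add api.txt
git commit -q --fixup="$api_commit"            # recorded as "fixup! add new API"

# Fold the fixup into the API commit; the feature commit stays separate
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..feature
```

Afterwards the branch again consists of exactly two commits, and the API commit contains the helper as if it had been there from the start.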
When used properly, rebase gives you freedom to use as many WIP commits and make all the mistakes you want in your private branch, while still allowing you to create good, easy-to-review patches that result in a logical, clean, and informative history.
If you still hold the opinion that rebase is bad, I am interested in further arguments, since the one reason you gave does not hold water.
>In my opinion in a collaborative environment it is immensely
>useful to know about how a feature / bug fix was developed.
>
>Hiding away the development of a feature into one large
>commit makes it harder for people to review.
This is exactly what we pro-rebasers were talking about.
None of us wants to do one big huge commit that contains the whole feature.
All of us want small, self-contained commits, every commit fixing or adding one specific thing.
What we don't want is a commit adding a thing, quickly followed by another commit "forgot to add this file", because that later commit provides no value to a reviewer.
What we rebasers are talking about is forging the history in a way that a patch reviewer can go over every single commit and, in one glance, decide whether that patch makes sense or not.
Let's assume it's the old days of svn: A whole file is the smallest unit of change you can commit and there's no way to change history.
Let's further assume that you want to add a new feature to a file. While doing so, you also notice that there's a bug in another part of that code in the same file that became apparent while writing your feature.
Your feature only works with the bug fixed, but the bug fix also makes sense independently of the feature.
In the old days, when committing that file, you have two options for commit messages:
1) "adding feature foobar"
leaving out the fact that you also fixed a bug. This is bad if your bugfix contains another bug and I have to dig in the history, wondering why you changed this seemingly unrelated piece of code. If I have to review the code, I will have to ask you why you also changed a seemingly unrelated piece of code.
2) "adding feature foobar and fixing bug bar"
this is better, but weren't you taught that a commit should only do one thing? This clearly does two.
At that point, you could use diff, patch and an editor to remove the feature but leave the bugfix in. Then you commit that as "fixing bug bar", followed by more diff and patch to get the feature in, which you commit as "adding feature foobar".
Fine, but very cumbersome, so hardly ever done.
Git, on the other hand, with the help of "add -p" and "rebase -i" makes exactly this possible and turns something incredibly painful into something you can do with closed eyes in your sleep.
And this is why there is this vocal pro-rebasing-crowd.
We are EXACTLY NOT talking about mushing everything together in a big commit.
We are talking about creating MANY, MANY more SMALLER commits that are independent of each other and thus much more maintainable.
Case in point: Since we migrated to git for our product and since everybody learned about rebase and began using it, we made the same number of commits in one year that we did in the three previous years.
I would be seriously upset if somebody used the power of rebase to create big huge commits and wanted to push them to our main repo. This is not what we are advocating rebase use for. Not at all.
Personally, I think that this idea is somewhat bogus:
1) In most cases, all of those extra commits are just noise. They make it really confusing to determine what actually changed from point A to point B because of all of the dead ends that were hit and backed out in between.
2) This is like saying that every time that someone produces anything they should be required to save all of their dead ends for people that look at their work in the future. If a carpenter at a building site cuts a piece of wood wrong, should he just recut it correctly, or set it aside so that on the off-chance that someone needs to see how he mis-cut the piece of wood, they can?
3) The fundamental flaw here is that you're trying to use the tool to enforce the workflow. Would you feel the same way if the next version of Ubuntu enforced that no image files could be placed anywhere on disk except ~/Photos just because someone determined that that's what 'makes sense?'
Why, then, not commit every keystroke? After all, you're losing history every time someone types backspace.
I imagine the reason that seems absurd is that you don't consider all the false steps and reworking that go on while a commit is crafted to be part of its official meaning. The working set is malleable until it's ready, and then you commit it. Well, private branches as pilif describes them are malleable in just this way. In both cases, you work your code like clay until it's ready to be presented and then bake it in to the public history.
I strongly dispute the assertion that a private clone is malleable in the way you're describing. I consider all history important enough to be committed to be "public" history. But (apparently unlike folks who are fast with their downvote buttons) I certainly acknowledge that it is a matter of taste.
> I consider all history important enough to be committed to be "public" history
This argument seems to me to boil down to an attachment to a single meaning (the traditional one) of the word "commit".
p.s. Instead of complaining about being downvoted, it would be better to make your tone less aggressive in the first place. None of your other comments made it clear that you regard this as taste; actually quite the opposite.
> I want any changes that are in my tree, ever,
> to be in the order and position in which they
> happened.
Could you explain this better? When I read this I hear, "I absolutely loathe modern operating systems, I think that all code should go in ~/code and all documents in ~/Documents, but modern operating systems allow you to put things anywhere. The horror!"
Do you really loathe the tool because it allows for flexibility? Do you have so little faith in the end-user? Do you really understand git, or only at a cursory level? (That isn't meant to be an insult)
> indelible history is a good thing
So long as you have a central git repository and manage it so that no one can rewrite the master/trunk branch, then you have that. If someone screws up the history in their local tree, then it won't allow them to overwrite the history on the canonical version.
(If you say, "well someone with access to the canonical version could do X," then you're just trolling, because someone with access to the central SVN repo could 'rm -rf' it too. It's the same issue.)
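One way to get that kind of protection, sketched here under the assumption that the canonical repository is a plain bare repository you administer yourself, is the receive.denyNonFastForwards setting. Hosted services usually offer their own branch protection instead. The repository and file names below are made up.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# The "canonical" repository refuses anything that is not a fast-forward
git init -q --bare central.git
git -C central.git config receive.denyNonFastForwards true

git clone -q central.git dev 2>/dev/null       # cloning an empty repo prints a warning
cd dev
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"

echo a > a.txt && git add a.txt && git commit -q -m "first"
echo b > b.txt && git add b.txt && git commit -q -m "second"
git push -q origin main

# Rewrite already-published history locally...
git commit -q --amend -m "second, rewritten"

# ...and try to overwrite the canonical branch, even with --force
if git push -q --force origin main 2>/dev/null; then
  result=accepted
else
  result=rejected
fi
echo "forced push was $result"
```

The local clone still holds the rewritten commit, but the canonical branch keeps the history exactly as it was first pushed.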
Commits don't cost money, but they increase complexity which takes brain power away from the development process.
Second, git won't change anything by itself, so things will always be exactly where you left them; so I am really interested in why you think rewriting is a bad idea.
For better integration with an issue tracker, and because a master branch should have a formal commit history.
I make tons of little commits on a local branch with messages such as "reformat styles", "whoops, fixed typos", "fixed the query", and such. I use "rebase -i" to squash all of them together with a better, and more formal, commit message, like "Repairs and styles main navigation, closes #9".
I hate a master branch with a commit history with casual commit messages.
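A minimal sketch of that squash, using the commit messages quoted above on a hypothetical branch nav-fix. In real use you would edit the todo list by hand and change "pick" to "squash" or "fixup" on every line but the first; the sed command below only stands in for the editor so the script can run unattended.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > index.html && git add index.html && git commit -q -m "initial commit"

# Lots of casual little commits on a local branch
git checkout -q -b nav-fix
echo "nav styles" >> index.html && git commit -q -am "reformat styles"
echo "typo fixed" >> index.html && git commit -q -am "whoops, fixed typos"
echo "query fixed" >> index.html && git commit -q -am "fixed the query"

# Keep the first commit as "pick", turn every later one into "fixup"
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i main

# Give the single resulting commit its formal message
git commit -q --amend -m "Repairs and styles main navigation, closes #9"

git log --oneline main..nav-fix
```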
But this is certainly a matter of taste.