mtdewcmu 12y ago

It looks like .git/logs contains the history. It looks like the file format is a space-separated list, with the format "$parentcommitsha1 $newcommitsha1 ... $commitmessage". That's fairly comprehensible. What are the SHA-1 sums of? Are they of the entire snapshot, or the delta? I went into objects/ and ran `sha1sum $objfile`, and the sum did not match the file name. So that remains obscure. `file $objfile` could not identify the format; it gave nonsense.

Thanks for the help.

>One of those meta-data items is "Parent Commit," so if you change one item in history, it changes the SHA-1 sum of all subsequent items (because at the very least they all need to be re-parented).

What sequence of operations would change a history item in that way?

pyre 12y ago

> It looks like .git/logs contains the history. It looks like the file format is a space-separated list, with the format "$parentcommitsha1 $newcommitsha1 ... $commitmessage". That's fairly comprehensible.

I've never looked at .git/logs, but it looks like that is used by the `git reflog` command. It's basically a history (or log) of every commit that a particular reference has pointed to[1]. For example, I cloned the git source code:

  user@host ~/src/git % cat .git/logs/HEAD
  0000000000000000000000000000000000000000 d7aced95cd681b761468635f8d2a8b82d7ed26fd First Last <user@example.com> 1387237920 -0500	clone: from https://github.com/git/git.git

  user@host ~/src/git % git reflog
  d7aced9 HEAD@{0}: clone: from https://github.com/git/git.git
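For what it's worth, here is a rough sketch of how a line like the one above could be split into fields. The field layout is inferred from that single sample line (old value, new value, identity, timestamp, timezone, then a tab and the message); `ReflogEntry` and `parse_reflog_line` are names I made up for illustration and are not part of git.

```python
import re
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_sha: str    # value of the ref before the update (all zeros: ref did not exist)
    new_sha: str    # value of the ref after the update
    name: str
    email: str
    timestamp: int  # seconds since the Unix epoch
    tz: str
    message: str


# Layout assumed from the sample line: two 40-hex ids, "Name <email>",
# a timestamp, a timezone offset, then a tab and a free-form message.
REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> (?P<ts>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<msg>.*))?"
)


def parse_reflog_line(line: str) -> ReflogEntry:
    """Parse one line of a file under .git/logs/ (hypothetical helper)."""
    m = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"unrecognized reflog line: {line!r}")
    return ReflogEntry(
        m["old"], m["new"], m["name"], m["email"],
        int(m["ts"]), m["tz"], m["msg"] or "",
    )


if __name__ == "__main__":
    sample = (
        "0" * 40 + " d7aced95cd681b761468635f8d2a8b82d7ed26fd "
        "First Last <user@example.com> 1387237920 -0500\t"
        "clone: from https://github.com/git/git.git"
    )
    entry = parse_reflog_line(sample)
    print(entry.new_sha[:7], entry.message)
```

The all-zero first field is how the file records that the reference did not exist before the clone.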

Note: `HEAD` is a reference to the current branch. E.g.:

  ~/src/git $ cat .git/HEAD
  ref: refs/heads/master

  ~/src/git $ cat .git/refs/heads/master
  d7aced95cd681b761468635f8d2a8b82d7ed26fd

It's also of note that branches are referred to as 'references' too, hence storing them under `.git/refs/`.
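The two `cat` commands above amount to following a pointer. A minimal sketch of that lookup, assuming the refs are stored as plain files as in the listing (`resolve_ref` is a made-up helper, and it deliberately ignores `.git/packed-refs`, where git can also keep refs):

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow 'ref: <path>' indirections until a file holds a plain object id.

    Sketch only: reads loose ref files and does not consult packed-refs.
    """
    git_dir = Path(git_dir)
    for _ in range(max_depth):
        content = (git_dir / name).read_text().strip()
        if content.startswith("ref: "):
            # Symbolic ref: the file names another ref, so look that one up next.
            name = content[len("ref: "):]
            continue
        return content
    raise ValueError("too many levels of symbolic refs")


if __name__ == "__main__":
    # Build a throwaway directory that mimics the two files shown above.
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
        (git_dir / "refs" / "heads" / "master").write_text(
            "d7aced95cd681b761468635f8d2a8b82d7ed26fd\n"
        )
        print(resolve_ref(git_dir))
```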

> What are the SHA-1 sums of? Are they of the entire snapshot, or the delta? I went into objects/ and ran `sha1sum $objfile`, and the sum did not match the file name. So that remains obscure.

See: http://stackoverflow.com/questions/5290444/why-does-git-hash...
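The short version, as a sketch: for a file's contents (a "blob"), the object name is the SHA-1 of a small header (`blob <size>` plus a NUL byte) followed by the uncompressed content, while the file under objects/ holds the zlib-compressed form of those same bytes. That is why `sha1sum $objfile` doesn't match and why `file` can't make sense of it. The function names below are mine, not git's.

```python
import hashlib
import zlib


def git_blob_sha1(content: bytes) -> str:
    """Object name git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_file(content: bytes) -> bytes:
    """Bytes as they would sit on disk: zlib-compressed header + content."""
    return zlib.compress(b"blob %d\x00" % len(content) + content)


def read_loose_object(raw: bytes):
    """Undo the compression and split off the '<type> <size>' header."""
    header, _, body = zlib.decompress(raw).partition(b"\x00")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size), body


if __name__ == "__main__":
    data = b"hello world\n"
    on_disk = loose_object_file(data)
    print(git_blob_sha1(data))                # the object name
    print(hashlib.sha1(on_disk).hexdigest())  # what `sha1sum $objfile` sees
    print(read_loose_object(on_disk))
```

The first two hex digits of the name become the directory under objects/ and the remaining 38 become the file name.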

[1]: Since the local repository was created. This information does not sync between local and remote.

mtdewcmu (OP) 12y ago

>I've never looked at .git/logs, but it looks like that is used by the `git reflog` command. It's basically a history (or log) of every commit that a particular reference has pointed to[1]. For example, I cloned the git source code:

I think it's more or less the DAG represented as an adjacency list. I'd have to think a bit about why there is a separate log file for each branch. It seems that there's some redundancy in doing that, and I'm wondering what the advantages and disadvantages are of splitting the history up in that way.

>It's also of note that branches are referred to as 'references' too, hence storing them under `.git/refs/`.

I've developed a loathing of excessive hierarchies/trees, so I'd rather see them flattened in a single directory. But that makes sense.

>See: http://stackoverflow.com/questions/5290444/why-does-git-hash....

That's a good link. What's in an object? If an object corresponds to a commit, then it must aggregate data about changes to multiple files.

pyre 12y ago

> I think it's more or less the DAG represented as an adjacency list. I'd have to think a bit about why there is a separate log file for each branch. It seems that there's some redundancy in doing that, and I'm wondering what the advantages and disadvantages are of splitting the history up in that way.

Think of each branch as a pointer. Then realize that you can make that pointer point anywhere on the DAG, even to parts of the DAG that have no connection to each other. The `reflog` is a (local, non-comprehensive) history of where that pointer has pointed. That's why there is a separate log for each branch. I guess that technically they could have a single log file and add another field to specify the branch, but using the same directory tree structure as under .git/refs/ makes the mental model simpler. It's probably a performance improvement too, since you don't have to parse the reflog for every branch just to see the reflog for one branch.

> I've developed a loathing of excessive hierarchies/trees, so I'd rather see them flattened in a single directory. But that makes sense.

I'm not sure what branches living under .git/refs has to do with excessive hierarchies/trees. There are enough things stored in the .git directory that if you mashed them all together, it wouldn't make any sense.

> What's in an object?

If you really care to dive deeper, you can check objects here: https://github.com/git/git/blob/master/object.h

You can get a shorter version towards the bottom of the git manpage (e.g. `man git`):

  IDENTIFIER TERMINOLOGY
         <object>
             Indicates the object name for any
             type of object.
  
         <blob>
             Indicates a blob object name.
  
         <tree>
             Indicates a tree object name.
  
         <commit>
             Indicates a commit object name.
  
         <tree-ish>
             Indicates a tree, commit or tag
             object name. A command that takes a
             <tree-ish> argument ultimately wants
             to operate on a <tree> object but
             automatically dereferences <commit>
             and <tag> objects that point at a
             <tree>.
  
         <commit-ish>
             Indicates a commit or tag object
             name. A command that takes a
             <commit-ish> argument ultimately
             wants to operate on a <commit> object
             but automatically dereferences <tag>
             objects that point at a <commit>.
  
         <type>
             Indicates that an object type is
             required. Currently one of: blob,
             tree, commit, or tag.
  
         <file>
             Indicates a filename - almost always
             relative to the root of the tree
             structure GIT_INDEX_FILE describes.

mtdewcmu (OP) 12y ago

I'm guessing that the reason each branch has its own history is probably related to the goal of only appending new entries at the end of things. Since any branch can be under development, they need their own files. It sort of makes sense.

pyre 12y ago

I still think that you're a little confused. The reflog is a "history of where this branch has pointed since the repository was created/cloned." If I clone a repository with a history of 100 commits on the 'master' branch, the reflog for the 'master' branch will only have one entry. You can delete `.git/logs` completely and still run `git log` successfully.

Here's an example:

  $ git clone blah
  
  DAG:
  
    A - B - C - D - E
        \
         Z - X - Y
  
  
  Branches:
  
   master => E
   topic/new-feature => Y
  
  
  reflog:
  
    master
      E - clone from blah
  
    topic/new-feature
      Y - clone from blah

Notice how cloning a repository with an existing DAG doesn't populate the reflog with that history. It just gives each branch a single entry saying that the branch was updated from 'nothing' to whatever commit it was pointing to remotely.

Now let's change where 'master' is pointing:

  $ git reset --hard C    # with master checked out
  
  
  DAG:
  
    A - B - C - D - E
        \
         Z - X - Y
  
  
  Branches:
  
   master => C
   topic/new-feature => Y
  
  
  reflog:
  
    master
      E - clone from blah
      C - reset to C
  
    topic/new-feature
      Y - clone from blah

Notice how the reflog is a history of the values that the branch has referenced, not the history you get when you run 'git log'. After the reset, 'git log master' would show you commits A, B and C, but A and B are nowhere in the reflog.
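If it helps, here is the same example as a toy model. This is only a sketch of the idea, not how git stores anything: `ToyRepo`, `update_ref` and `reachable` are invented names, the commits are the letters from the diagram, and Z is assumed to branch off B as drawn.

```python
# Parent links for the DAG in the diagram above.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "Z": ["B"], "X": ["Z"], "Y": ["X"],
}


def reachable(parents, tip):
    """Commits reachable from tip by following parent links, tip first."""
    seen, stack = [], [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.append(commit)
        stack.extend(parents[commit])
    return seen


class ToyRepo:
    def __init__(self, parents):
        self.parents = parents
        self.branches = {}  # branch name -> commit it points at right now
        self.reflog = {}    # branch name -> every value the pointer has had

    def update_ref(self, branch, commit, reason):
        # Moving the pointer appends to that branch's reflog; the DAG is untouched.
        self.branches[branch] = commit
        self.reflog.setdefault(branch, []).append((commit, reason))

    def log(self, branch):
        # What 'git log <branch>' walks: the DAG, starting from the pointer.
        return reachable(self.parents, self.branches[branch])


if __name__ == "__main__":
    repo = ToyRepo(PARENTS)
    repo.update_ref("master", "E", "clone from blah")
    repo.update_ref("topic/new-feature", "Y", "clone from blah")
    repo.update_ref("master", "C", "reset to C")
    print(repo.log("master"))     # walks the DAG from C
    print(repo.reflog["master"])  # only where the pointer has been
```

The log comes from walking parent links in the DAG; the reflog is just a list of where the pointer has been, which is why deleting it loses no commits.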
