“The implication here is that any code committed to a public repository may be accessible forever as long as there is at least one fork of that repository,” the report’s authors claim.
Am I dumb or is this exactly the purpose of forks? I feel like I’m missing something.
I’m still not sure that answers it. If I fork a project, and the upstream project commits an API key (after I’ve forked it), then they delete the commit, does this commit stay available to me (unexpected behaviour)? Or is it only if I sync that commit into my repo while it’s in the upstream repo (expected behaviour)?
Or is it talking about this from a comment here:
Someone replies and said by having garbage collection kick in it removes this unconnected commit, but it’s not clear to me whether this works for github or just the local git repo.
Perhaps the issue is that these commits are synced into upstream/downstream repos when synced when they should not be?
Like I said, I’m really confused about the specifics of this.
I think Github keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the one that was committed to was private/is deleted/no references exist to the commit.
The big issue is discovery. If no-one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. Github makes this easier for you:
I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.