git partial-clone rollout experience - Engineering Productivity
—
Background: Partial clone is a Git feature that lets Git function with a copy of only part of the repository. In other words, your checkout can work with a mix of locally downloaded objects and lazy objects that are downloaded on demand. Git, GitLab, and GitHub have all done an excellent job describing this functionality; I encourage you to read the implementation details.
The checkout size of the Android operating system has historically been a major challenge for internal and external developers. A single checkout takes hundreds of gigabytes of disk space, and that does not include a build of Android, which takes hundreds of gigabytes more on top of the source code. This effectively limits the number of source trees one can have checked out and built to ~4. Given the nature of Android development, this sort of limitation directly impacts the productivity of engineers. I want to share how we improved this situation and how nothing is ever “simple”.
Historically, Android developers have used git clone depth options (shallow clone) to mitigate the size problem. This came with an unfortunate tradeoff: they could not use tooling like git blame and git log to understand the history of the files they were touching. Git partial clone seemed like a perfect solution. Developers get to retain history (through on-demand downloads of blobs and history when needed), and they can also significantly reduce the checkout size.
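To make the difference between the two approaches concrete, here is a minimal sketch that builds the two kinds of clone command. The helper name `clone_command` and the example URL are hypothetical and only for illustration; `--depth` and `--filter=blob:none` are standard `git clone` options. This is not the tooling we actually used for the rollout, which went through the repo tool.

```python
from typing import List, Optional


def clone_command(
    url: str,
    dest: str,
    *,
    depth: Optional[int] = None,
    blob_filter: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `git clone` (hypothetical helper).

    depth       -> shallow clone: history is truncated to `depth` commits.
    blob_filter -> partial clone: commits and trees are kept, blobs matching
                   the filter are fetched lazily on demand.
    """
    cmd = ["git", "clone"]
    if depth is not None:
        cmd.append(f"--depth={depth}")
    if blob_filter is not None:
        cmd.append(f"--filter={blob_filter}")
    cmd += [url, dest]
    return cmd


if __name__ == "__main__":
    url = "https://example.com/some/repo.git"  # placeholder URL, an assumption
    # Shallow clone: small, but git log / git blame see almost no history.
    print(" ".join(clone_command(url, "shallow", depth=1)))
    # Partial clone: full commit history, file contents downloaded when needed.
    print(" ".join(clone_command(url, "partial", blob_filter="blob:none")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the clone.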
Our EngProd testing showed that, compared to a normal (non-partial) checkout, there was a 33% decrease in disk utilization. So we began manually rolling out this functionality through the git repo tool to our internal developer population, and almost instantly we were flooded with feedback from our testers, ranging from “my editor just got really slow” to “when I run git commit it’s now taking hours”. This was unexpected and initially puzzling. However, after watching the git process tree during these slow actions, we found that git was resolving all of the lazy objects (hundreds and hundreds of async fetches) inside of a synchronous workflow.
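One way to get a feel for how much work is hiding behind those lazy objects is to count how many objects a checkout is still missing. The sketch below is not how we diagnosed the problem (we watched the process tree); it is an illustration that assumes `git rev-list --objects --all --missing=print`, which lists missing objects with a leading `?`. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_missing(rev_list_output: str) -> List[str]:
    """Return the object IDs that rev-list reported as missing ('?' prefix)."""
    missing = []
    for line in rev_list_output.splitlines():
        if line.startswith("?"):
            oid = line[1:].strip()
            if oid:
                missing.append(oid)
    return missing


def missing_objects(repo_dir: str) -> List[str]:
    """List objects that are not yet present locally in a partial clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--objects", "--all", "--missing=print"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_missing(result.stdout)
```

Calling `len(missing_objects("."))` inside a partial clone gives a rough upper bound on how many on-demand fetches a history-walking tool could trigger.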
Digging into the root cause revealed that a multitude of developer tools were running git blame behind the scenes to understand what files were changed. Some were trivial to fix, while others were not tractable due to their existing ecosystem integrations (for example, the git layer for IntelliJ).
While filter=blob:none provided the most significant disk use savings, it came with an unacceptable latency tradeoff in the developer inner loop. To find a way to deliver these space savings, we went back to the fundamentals of exactly what platform development involves and how it is done. Developer tooling in the time-sensitive developer path was consistently running on text files. Was there a way we could tune our checkout to retain history information for these files while excluding the binary files? The filter-spec does not allow filtering by content type, but it does support blob size limits. Through some quick corpus analysis we found that 99+% of the non-binary files inside of Android were under 10MB. Given this fact, we tweaked our rollout to go from using filter=blob:none to filter=blob:limit=10M. We went from fetching only what you need for the tip-of-tree checkout to fetching all files under 10MB and their history, while excluding the history of all other files until asked. This allowed developer tooling to run git blame operations quickly while at the same time saving 30% of disk space when compared to the prior solution. With this one tweak we were able to complete the rollout and bring the benefits of partial clone to our developer populations.
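The kind of corpus analysis described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis we actually ran: it walks a working tree (rather than the full object history), treats a file as binary if it has a NUL byte in its first 8000 bytes, and reads "10M" as 10 * 1024 * 1024 bytes. All function names are hypothetical.

```python
import os
from typing import Iterable, Iterator

# Assumption: "10M" is interpreted as 10 MiB.
LIMIT_BYTES = 10 * 1024 * 1024


def looks_binary(path: str, sniff: int = 8000) -> bool:
    """Heuristic: a NUL byte near the start of the file means binary."""
    with open(path, "rb") as f:
        return b"\0" in f.read(sniff)


def text_file_sizes(root: str) -> Iterator[int]:
    """Yield the size in bytes of every non-binary regular file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip git's own object store
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if not looks_binary(path):
                yield os.path.getsize(path)


def fraction_under_limit(sizes: Iterable[int], limit: int = LIMIT_BYTES) -> float:
    """Fraction of sizes strictly below limit (0.0 for an empty corpus)."""
    sizes = list(sizes)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < limit) / len(sizes)


if __name__ == "__main__":
    share = fraction_under_limit(text_file_sizes("."))
    print(f"{share:.2%} of text files are under the limit")
```

The comparison is strict (`s < limit`) because a blob size limit filter omits blobs that are at least the given size, so a file of exactly 10M would stay lazy.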