
Implement Git with Rust Part0

forfd8960
Author

What is Git

We use git to manage our code and keep it under version control.

  • Generally, I use git init or git clone to start a project locally.
  • Then I do some work in the git repo.
  • Then I use git add to add the files to the staging area.
  • Then I use git commit to commit the changes to the local repository.
  • Then I use git push to push the commits to the remote repository.
  • I use git pull origin master to pull changes from the remote repository.
  • I use git checkout -b <new_branch> to create a new branch and switch to it.
  • I use git rebase master to replay the current branch's commits on top of master.

Why use Rust to implement Git

I want to understand in depth how Git works internally, and building it myself is how I plan to get there.

The first step

Here are some questions I want to answer:

  • What happens when a new git repo is initialized?

  • What happens when a new file is added to the staging area?

  • What happens when changes are committed to the local repository?

  • How does git store files?

  • How does git store history?

  • How does git store branches?

  • How does git store tags?

  • How does git store information about remote repositories?

  • How does git know the current branch?

  • How does git restore files when checking out another branch?

I read some blog posts about how git works internally to understand the basic concepts.

Some core git objects:

Blob: Binary Large Object

In git, the contents of files are stored in objects called blobs (binary large objects). A blob is just content, a binary stream of data: it doesn’t record its creation date, its file name, or anything other than its contents.

Every blob in git is identified by its SHA-1 hash. SHA-1 hashes consist of 20 bytes, usually represented by 40 characters in hexadecimal form.
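The id is not the hash of the raw file bytes alone: git prepends a header made of the object type, a space, the content length in decimal, and a NUL byte, and hashes the whole thing. The object is then stored zlib-compressed under .git/objects/, in a directory named after the first two hex characters of the id. Below is a minimal sketch of computing a blob id in Rust. To stay free of crates it carries a small hand-written SHA-1; the function names (sha1, blob_id, loose_object_path) are my own, and the sketch neither compresses nor writes anything.

```rust
/// Minimal SHA-1 (FIPS 180-4) so this sketch needs no external crates.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Pad: a 0x80 byte, zeros up to 56 mod 64, then the message length in bits (big-endian u64).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Expand the 16 input words of this block into 80 schedule words.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        // 80 rounds; the round function and constant change every 20 rounds.
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// 20 raw bytes -> the usual 40-character lowercase hex form.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// git hashes "blob <size>\0" followed by the raw file contents.
fn blob_id(content: &[u8]) -> String {
    let mut store = format!("blob {}\0", content.len()).into_bytes();
    store.extend_from_slice(content);
    to_hex(&sha1(&store))
}

/// Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38 hex chars>.
fn loose_object_path(id: &str) -> String {
    format!(".git/objects/{}/{}", &id[..2], &id[2..])
}

fn main() {
    let id = blob_id(b"hello\n");
    println!("{}", id); // prints ce013625030ba8dba906f756967f9e9ca394464a
    println!("{}", loose_object_path(&id));
}
```

Running echo hello | git hash-object --stdin in a shell prints the same id, which is a handy way to check a Rust implementation against real git.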

A blob object:

git cat-file -p 2373d25e28b1fa10d1e9cee7b0380860b59451f4
[package]
name = "git-rs"
version = "0.1.0"
edition = "2021"

[dependencies]

Commit: Pointer to a snapshot tree

Like blobs, commits are stored as objects on the filesystem.

In git, a snapshot is a commit. A commit object includes a pointer to the main tree (the root directory), as well as other metadata such as the author, the committer, the commit message, and the commit time.

Every commit holds the entire snapshot, not just diffs from the previous commit(s).

A commit object:

git cat-file -p 1321f01cf1ef8ac81b839e9f7976d740d2d27246
tree 9a8f8cfb359e7a106b96c749c69aa13a5e74a09a
author forfd8960 <[email protected]> 1723947713 +0800
committer forfd8960 <[email protected]> 1723947713 +0800

init git-rs
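git cat-file -p prints a commit's raw content: a tree line, then one parent line per parent commit, then the author and committer lines, a blank line, and the message. The example above has no parent line because it is the first commit; later commits point back to their parents, and that chain of parent links is how git stores history. Below is a minimal sketch of building that text. The Commit struct and its fields are hypothetical, and, as with blobs, the whole object is hashed with a header (here "commit <size>" plus a NUL byte) in front.

```rust
/// Hypothetical in-memory commit; the field names are illustrative only.
struct Commit {
    tree: String,         // hex id of the root tree
    parents: Vec<String>, // hex ids: none for a first commit, two or more for a merge
    author: String,       // "Name <email> <unix-seconds> <utc-offset>"
    committer: String,
    message: String,
}

impl Commit {
    /// The commit body, laid out the way `git cat-file -p` prints it.
    fn body(&self) -> String {
        let mut s = format!("tree {}\n", self.tree);
        for parent in &self.parents {
            s.push_str(&format!("parent {}\n", parent));
        }
        s.push_str(&format!("author {}\n", self.author));
        s.push_str(&format!("committer {}\n", self.committer));
        s.push('\n'); // a blank line separates the headers from the message
        s.push_str(&self.message);
        s
    }

    /// The bytes that get hashed and stored: "commit <size>\0" followed by the body.
    fn store_bytes(&self) -> Vec<u8> {
        let body = self.body();
        let mut out = format!("commit {}\0", body.len()).into_bytes();
        out.extend_from_slice(body.as_bytes());
        out
    }
}

fn main() {
    // Values modeled on the commit above; the email is a placeholder.
    let sig = "forfd8960 <user@example.com> 1723947713 +0800".to_string();
    let first = Commit {
        tree: "9a8f8cfb359e7a106b96c749c69aa13a5e74a09a".to_string(),
        parents: vec![],
        author: sig.clone(),
        committer: sig,
        message: "init git-rs\n".to_string(),
    };
    print!("{}", first.body());
    println!("{} bytes to hash", first.store_bytes().len());
}
```

Because a commit names its tree by hash and the tree names its blobs by hash, the commit id covers the whole snapshot; changing any file produces a new commit id.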

Tree

A tree is basically a directory listing, referring to blobs as well as other trees.

Trees are identified by their SHA-1 hashes as well, and a tree refers to each of its entries, whether a blob or another tree, by that entry's SHA-1 hash.

A tree object:

git cat-file -p 9a8f8cfb359e7a106b96c749c69aa13a5e74a09a
100644 blob ea8c4bf7f35f6f77f75d92ad8ce8349f6e81ddba .gitignore
100644 blob 2373d25e28b1fa10d1e9cee7b0380860b59451f4 Cargo.toml
100644 blob c83c092da787cf77f810b961909987b55ccf8db9 README.md
040000 tree 305157a396c6858705a9cb625bab219053264ee4 src
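The cat-file -p listing is a pretty-printed view. In the stored tree object each entry is "<mode> <name>", a NUL byte, and the 20 raw bytes of the id (not 40 hex characters), and directories use the mode 40000 without the leading zero. Entries are sorted by name, with directory names compared as if they ended in "/". Below is a minimal sketch of serializing a tree body under those rules; TreeEntry, hex_to_id, sort_key, and tree_body are hypothetical names.

```rust
/// One entry of a tree object (hypothetical type for this sketch).
struct TreeEntry {
    mode: &'static str, // "100644" regular file, "100755" executable, "40000" directory
    name: String,
    id: [u8; 20], // raw 20-byte SHA-1, not the 40-char hex form
}

/// Parse a 40-char hex id into 20 raw bytes (panics on malformed input, fine for a sketch).
fn hex_to_id(hex: &str) -> [u8; 20] {
    let mut id = [0u8; 20];
    for (i, byte) in id.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).expect("invalid hex");
    }
    id
}

/// Sort key: the name's bytes, with a trailing '/' for directories.
fn sort_key(e: &TreeEntry) -> Vec<u8> {
    let mut k = e.name.as_bytes().to_vec();
    if e.mode == "40000" {
        k.push(b'/');
    }
    k
}

/// Tree body: for each sorted entry, "<mode> <name>\0" followed by the raw id.
/// The stored object would additionally get a "tree <size>\0" header, like blobs and commits.
fn tree_body(mut entries: Vec<TreeEntry>) -> Vec<u8> {
    entries.sort_by_key(sort_key);
    let mut out = Vec::new();
    for e in &entries {
        out.extend_from_slice(format!("{} {}\0", e.mode, e.name).as_bytes());
        out.extend_from_slice(&e.id);
    }
    out
}

fn main() {
    // The four entries from the cat-file listing above, deliberately out of order.
    let entries = vec![
        TreeEntry { mode: "40000", name: "src".to_string(), id: hex_to_id("305157a396c6858705a9cb625bab219053264ee4") },
        TreeEntry { mode: "100644", name: "README.md".to_string(), id: hex_to_id("c83c092da787cf77f810b961909987b55ccf8db9") },
        TreeEntry { mode: "100644", name: "Cargo.toml".to_string(), id: hex_to_id("2373d25e28b1fa10d1e9cee7b0380860b59451f4") },
        TreeEntry { mode: "100644", name: ".gitignore".to_string(), id: hex_to_id("ea8c4bf7f35f6f77f75d92ad8ce8349f6e81ddba") },
    ];
    let body = tree_body(entries);
    println!("tree body is {} bytes", body.len());
}
```

With these four entries the byte-wise sort yields .gitignore, Cargo.toml, README.md, src, the same order cat-file shows.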

Branch
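A branch in git is only a named pointer. With the default files ref format, the branch master is a small file, .git/refs/heads/master, whose content is the 40-character id of the commit it points to (refs can also be packed into .git/packed-refs). Tags are stored the same way under .git/refs/tags/. The current branch is recorded in .git/HEAD, normally as the line "ref: refs/heads/master"; when HEAD holds a commit id directly, the repository is in "detached HEAD" state. Below is a minimal sketch of reading HEAD, with hypothetical names, ignoring packed refs and ref namespaces other than refs/heads.

```rust
use std::path::{Path, PathBuf};

/// What .git/HEAD can contain (simplified).
#[derive(Debug, PartialEq)]
enum Head {
    /// "ref: refs/heads/<branch>": HEAD follows a branch (the usual case).
    Branch(String),
    /// A bare 40-char hex id: detached HEAD, pointing straight at a commit.
    Detached(String),
}

/// Parse the contents of .git/HEAD; returns None for anything this sketch doesn't handle.
fn parse_head(contents: &str) -> Option<Head> {
    let line = contents.trim_end();
    if let Some(branch) = line.strip_prefix("ref: refs/heads/") {
        Some(Head::Branch(branch.to_string()))
    } else if line.len() == 40 && line.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(Head::Detached(line.to_string()))
    } else {
        None
    }
}

/// A loose branch ref is a file whose contents are the commit id it points to.
fn branch_ref_path(git_dir: &Path, branch: &str) -> PathBuf {
    git_dir.join("refs").join("heads").join(branch)
}

fn main() {
    let head = parse_head("ref: refs/heads/master\n");
    println!("{:?}", head); // prints Some(Branch("master"))
    if let Some(Head::Branch(name)) = head {
        println!("{}", branch_ref_path(Path::new(".git"), &name).display());
    }
}
```

So "how git knows the current branch" comes down to one line in HEAD, and switching branches mostly means rewriting that line (plus updating the working tree and index).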

Init the Project

cargo new --bin git-rs
cd git-rs
cargo add clap --features derive

Define the git commands with clap

  • `src/command/mod.rs`
use clap::{Parser, Subcommand};

// Top-level CLI: `git-rs <COMMAND>`.
#[derive(Debug, Parser)]
#[command(name = "simple-git", version = "0.0.1", about, long_about = None)]
pub struct SimpleGit {
    #[command(subcommand)]
    pub command: GitSubCommand,
}

// One variant per git subcommand; only `init` exists so far.
#[derive(Debug, Subcommand)]
pub enum GitSubCommand {
    #[command(name = "init", about = "init a git repo")]
    Init(InitOpts),
}

// Options for `init`, modeled on the flags of `git init`.
#[derive(Debug, Parser)]
pub struct InitOpts {
    /// Only print error and warning messages; all other output will be suppressed.
    #[arg(short, long)]
    pub quiet: bool,

    /// Specify the given object <format> (hash algorithm) for the repository. The valid values are sha1 and (if enabled) sha256. sha1 is the default.
    #[arg(long = "object-format", default_value = "sha1")]
    pub object_format: String,

    #[arg(long = "ref-format", default_value = "files")]
    pub ref_format: String,

    /// Specify the directory from which templates will be used.
    #[arg(long = "template")]
    pub template: String,

    /// Use <branch-name> for the initial branch in the newly created repository. If not specified, fall back to the default name.
    // As a plain String (no default) clap treats this flag as required; Option<String> would make it optional.
    #[arg(short, long = "initial-branch")]
    pub branch: String,
}

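  • src/main.rs
use clap::Parser;
// Importing from the library crate `git_rs` requires a src/lib.rs containing `pub mod command;`.
use git_rs::command::SimpleGit;

fn main() {
    // Parse the command line into SimpleGit; clap handles --help, --version and usage errors.
    let cmd = SimpleGit::parse();
    println!("{:?}", cmd);
}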

Run:

cargo run -- --help
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/git-rs --help`
Usage: git-rs <COMMAND>

Commands:
  init  init a git repo
  help  Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

cargo run -- help init
   Compiling git-rs v0.1.0
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.93s
     Running `target/debug/git-rs help init`
init a git repo

Usage: git-rs init [OPTIONS] --template <TEMPLATE> --initial-branch <BRANCH>

Options:
  -q, --quiet                          Only print error and warning messages; all other output will be suppressed
      --object-format <OBJECT_FORMAT>  Specify the given object <format> (hash algorithm) for the repository. The valid values are sha1 and (if enabled) sha256. sha1 is the default [default: sha1]
      --ref-format <REF_FORMAT>        [default: files]
      --template <TEMPLATE>            Specify the directory from which templates will be used
  -b, --initial-branch <BRANCH>        Use <branch-name> for the initial branch in the newly created repository. If not specified, fall back to the default name
  -h, --help                           Print help
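So far init only parses its options. As a next step it has to answer the first question above: what happens when a new repo is initialized. Below is a minimal sketch of the directories and the HEAD file git needs in order to recognize a repository. init_repo is a hypothetical helper; it takes the initial branch as an Option (unlike the required String field in InitOpts) and assumes "master" as the fallback, which is git's built-in default when init.defaultBranch is not configured. Real git init also writes config, description, hooks/ and info/exclude, which this sketch skips.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create a minimal .git layout under `root` (hypothetical helper).
fn init_repo(root: &Path, initial_branch: Option<&str>) -> io::Result<()> {
    let git_dir = root.join(".git");
    // Object database: loose objects go in objects/<2 hex>/<38 hex>.
    fs::create_dir_all(git_dir.join("objects"))?;
    // Branches and tags are plain files under refs/.
    fs::create_dir_all(git_dir.join("refs").join("heads"))?;
    fs::create_dir_all(git_dir.join("refs").join("tags"))?;
    // HEAD names the initial branch even though that branch has no commit (no ref file) yet.
    let branch = initial_branch.unwrap_or("master");
    fs::write(git_dir.join("HEAD"), format!("ref: refs/heads/{}\n", branch))?;
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("git-rs-init-demo");
    init_repo(&dir, None)?;
    print!("{}", fs::read_to_string(dir.join(".git").join("HEAD"))?);
    Ok(())
}
```

Re-running it on the same directory is harmless here: create_dir_all accepts existing directories and fs::write overwrites HEAD.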