[pull] master from git:master #153

pull · 2026-01-21T21:02:25Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


…urce

* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`

* js/test-symlink-windows:
  t7800: work around the MSYS path conversion on Windows
  t6423: introduce Windows-specific handling for symlinking to /dev/null
  t1305: skip symlink tests that do not apply to Windows
  t1006: accommodate for symlink support in MSYS2
  t0600: fix incomplete prerequisite for a test case
  t0301: another fix for Windows compatibility
  t0001: handle `diff --no-index` gracefully
  mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
  apply: symbolic links lack a "trustable executable bit"
  t9700: accommodate for Windows paths

…rovements

* jc/object-read-stream-fix:
  odb: do not use "blank" substitute for NULL

From user feedback: three users commented that the `git reset [mode]` form is the one that they primarily use, and that they were surprised to see it listed last. ("I've never used git reset in any mode other than --hard"). Move it to be first, since the `git reset [mode]` form is what "Reset current HEAD to the specified state" at the beginning refers to, and because the `git reset [mode]` form is the only thing that `git reset` uniquely does; the others could also be done with `git restore`. Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

From user feedback, there were several points of confusion: - What "tree-ish", "entries", "working tree", "HEAD", and "index" mean ("I have no clue what the index is", "I've been using git for 20 years and still don't know what a tree-ish is"). Avoid using these terms where it makes sense. - What "optionally modifying index and working tree to match" means ("to match what?" "optionally based on what?") Remove this from the intro; we can say it later when giving more details. - One user suggested that "The <tree-ish>/<commit> defaults to HEAD in all forms." should be repeated later on, since it's easy to miss. Instead say that HEAD is the default in each case later. Another issue is that `git reset` consistently describes the action it does as "Reset ..." (commands should not use their name to describe themselves), and that the word "mode" is used to mean several different things on this page. Address these by being more clear about two use cases for `git reset` ("to undo operations" and "to update staged files"), and explaining what the conditions are for each case instead of forcing the user to figure out what the pattern is in the first form vs the other 3 forms. Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

From user feedback, there was some confusion about the differences between the modes, including: 1. Sometimes it says "index" and sometimes "index file". Fix by replacing "index file" with "index". 2. Many comments about not being able to understand what `--merge` does. Fix by mentioning obscure situations, since that seems to be what it's for. Most folks will use `git <cmd> --abort`. 3. Issues telling the difference between --soft and --mixed, as well as --keep. Leave --keep alone because I couldn't understand its use case, but change `--soft` / `--mixed` / `--hard` as follows: --mixed is the default, so put it first. Describe --soft/--mixed/--hard with the following structure: * Start by saying what happens to the files in the working directory, because the thing users want to avoid most is irretrievably losing changes to their working directory files. * Then describe what happens to the staging area. Right now it seems to frame leaving the index alone as being a sort of neutral action. I think this is part of what's confusing users, because in Git when you update HEAD, Git almost always updates the index to match HEAD. So leaving the index unchanged while updating HEAD is actually quite unusual, and it deserves to be flagged. * Finally, give an example for --soft to explain a common use case. Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
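The ordering described above (working directory first, then staging area) can be illustrated with a toy model. This is a minimal Python sketch, not Git's implementation: the `Repo` class and the `reset()` helper are hypothetical, and HEAD, the index, and the working directory are each reduced to a single label naming the snapshot they hold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Repo:
    head: str      # commit that HEAD points at
    index: str     # snapshot held in the staging area
    worktree: str  # snapshot held in the working directory files


def reset(repo: Repo, target: str, mode: str = "mixed") -> Repo:
    """Return the state after a (modeled) `git reset --<mode> <target>`."""
    if mode == "soft":
        # Files and staging area untouched; only HEAD moves.
        return replace(repo, head=target)
    if mode == "mixed":
        # Files untouched; staging area updated to match the new HEAD.
        return replace(repo, head=target, index=target)
    if mode == "hard":
        # Files overwritten as well: local changes are lost.
        return replace(repo, head=target, index=target, worktree=target)
    raise ValueError(f"unknown mode: {mode}")


before = Repo(head="B", index="B", worktree="B")
soft = reset(before, "A", "soft")
mixed = reset(before, "A")          # --mixed is the default
hard = reset(before, "A", "hard")
```

In this model only `hard` changes `worktree`, which is why it is the one mode that can irretrievably lose work, and `soft` is the unusual case where HEAD and the index end up disagreeing.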

From user feedback: - Continued confusion about the terms "tree-ish" and "pathspec" - The word "hunks" is confusing folks, use "changes" instead. - On the part about `git restore`, there were a few comments to the effect of "wait, this doesn't actually update any files? What? Why?" Be more direct that `git reset` does not update files: there's no obvious reason to suggest that folks use `git reset` followed by `git restore`, instead suggest just using `git restore`. Continue avoiding the use of the word "reset" to describe what "git reset" does. Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

* ps/odb-misc-fixes:
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs

In subsequent patches we're about to move the packfile store from the object database layer into the object database source layer. Once done, we'll have one packfile store per source, where the source is owning the store. Prepare for this future and refactor `packfile_store_new()` to be initialized via an object database source instead of via the object database itself. This refactoring leads to a weird in-between state where the store is owned by the object database but created via the source. But this makes subsequent refactorings easier because we can now start to access the owning source of a given store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When preparing a packfile we pass various pieces attached to the pack's object database source via the `struct prepare_pack_data`. Refactor this code to instead pass in the source directly. This reduces the number of variables we need to pass and allows for a subsequent refactoring where we start to prepare the pack via the source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The function `unuse_one_window()` is responsible for unmapping one of the packfile windows, which is done when we have exceeded the allowed number of windows. The function receives a `struct packed_git` as input, which serves as an additional packfile that should be considered to be closed. If not given, we seemingly skip that and instead go through all of the repository's packfiles. The conditional that checks whether we have a packfile though does not make much sense anymore, as we dereference the packfile regardless of whether or not it is a `NULL` pointer to derive the repository's packfile store. The function was originally introduced via f0e17e8 (pack: move release_pack_memory(), 2017-08-18), and here we indeed had a caller that passed a `NULL` pointer. That caller was later removed via 9827d4c (packfile: drop release_pack_memory(), 2019-08-12), so starting with that commit we always pass a `struct packed_git`. In 9c5ce06 (packfile: use `repository` from `packed_git` directly, 2024-12-03) we then inadvertently started to rely on the fact that the pointer is never `NULL` because we use it now to identify the repository. Arguably, it didn't really make sense in the first place that the caller provides a packfile, as the selected window would have been overridden anyway by the subsequent loop over all packfiles if there was an older window. So the overall logic is quite misleading. The only case where it _could_ make a difference is when there were two packfiles with the same `last_used` value, but that case doesn't ever happen because the `pack_used_ctr` is strictly increasing. Refactor the code so that we instead pass in the object database to help make the code less misleading. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
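The argument that the caller-provided packfile never mattered can be checked with a small model. The following is a hedged Python sketch, not the C code: `Window`, `Pack`, `touch()` and `find_lru_window()` are hypothetical names, and only the `last_used` field and the strictly increasing counter are taken from the message above.

```python
from dataclasses import dataclass, field
from itertools import count

# Stand-in for `pack_used_ctr`: strictly increasing, so no two windows
# ever share the same `last_used` value.
_pack_used_ctr = count(1)


@dataclass
class Window:
    last_used: int = 0
    in_use: bool = False


@dataclass
class Pack:
    name: str
    windows: list = field(default_factory=list)


def touch(window: Window) -> None:
    """Record that a window was just used."""
    window.last_used = next(_pack_used_ctr)


def find_lru_window(packs):
    """Return (pack, window) for the least recently used idle window."""
    best = None
    for pack in packs:
        for window in pack.windows:
            if window.in_use:
                continue
            if best is None or window.last_used < best[1].last_used:
                best = (pack, window)
    return best


a = Pack("a", [Window(), Window()])
b = Pack("b", [Window()])
for w in (b.windows[0], a.windows[1], a.windows[0]):
    touch(w)  # b's window ends up being the oldest

hinted_first = find_lru_window([a, b])  # scan the "hinted" pack a first
other_order = find_lru_window([b, a])
```

Both scans select the window of pack `b`: because ties cannot occur, the pack that is looked at first has no influence on the result.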

The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
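The resulting ownership can be pictured as follows. This is only an illustrative Python sketch with hypothetical class names; `for_each_pack()` plays the role that the message attributes to `repo_for_each_pack()`, namely enumerating packs across all sources so that most callers need not know where the store lives.

```python
from dataclasses import dataclass, field


@dataclass
class PackfileStore:
    packs: list = field(default_factory=list)


@dataclass
class Source:
    path: str
    # After the change, every source owns its own packfile store.
    packfiles: PackfileStore = field(default_factory=PackfileStore)


@dataclass
class ObjectDatabase:
    # The database no longer holds a store itself, only its sources.
    sources: list = field(default_factory=list)


def for_each_pack(odb: ObjectDatabase):
    """Yield every pack of every source, in source order."""
    for source in odb.sources:
        yield from source.packfiles.packs


odb = ObjectDatabase(sources=[
    Source("objects", PackfileStore(["pack-1", "pack-2"])),
    Source("alternate/objects", PackfileStore(["pack-3"])),
    Source("no-packs-here"),  # a source that has no packfiles at all
])
all_packs = list(for_each_pack(odb))
```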

When calling `packfile_store_get_packs()` we prepare not only the provided packfile store, but also all those of all other sources part of the same object database. This was required when the store was still sitting on the object database level. But now that it sits on the source level, that is no longer necessary. Adapt the code so that we only prepare the MIDX of the provided store. All callers only work in the context of a single store or call the function in a loop over all sources, so this change shouldn't have any practical effects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When calling `packfile_store_prepare()` we prepare not only the provided packfile store, but also all those of all other sources part of the same object database. This was required when the store was still sitting on the object database level. But now that it sits on the source level, that is no longer necessary. Refactor the code so that we only prepare the single packfile store passed by the caller. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The `find_kept_pack_entry()` function is only used in `has_object_kept_pack()`, which is only a trivial wrapper itself. Inline the latter into the former. Furthermore, reorder the code so that we can drop the declaration of the function in "packfile.h". This allows us to make the function file-local. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The function `find_pack_entry()` doesn't work on a specific packfile store, but instead works on the whole repository. This causes a bit of a conceptual mismatch in its callers: - `packfile_store_freshen_object()` supposedly acts on a store, and its callers know to iterate through all sources already. - `packfile_store_read_object_info()` behaves likewise. The only exception that doesn't know to handle iteration through sources is `has_object_pack()`, but that function is trivial to adapt. Refactor the code so that `find_pack_entry()` works on the packfile store level instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Fsck has a race when operating on live repositories; consider the following simple script that writes new commits as fsck runs:

    #!/bin/bash
    git fsck &
    PID=$!

    while ps -p $PID >/dev/null; do
        sleep 3
        git commit -q --allow-empty -m "Another commit"
    done

Since fsck walks objects for connectivity and then reads the refs at the end to check, this can cause fsck to get confused and think that the new refs refer to missing commits and that new reflog entries are invalid. Running the above script in a clone of git.git results in the following (output ellipsized to remove additional errors of the same type):

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Checking connectivity: 833846, done.
    missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Verifying commits in commit graph: 100% (242243/242243), done.

We can minimize the race opportunities by taking a snapshot of refs at program invocation, doing the connectivity check, and then checking the snapshotted refs afterward. This avoids races with regular refs between fsck and adding objects to the database, though it still leaves a race between a gc and fsck. We are less concerned about folks simultaneously running gc with fsck; though, if it becomes an issue, we could lock fsck during gc. We definitely do not want to lock fsck during operations that may add objects to the object store; that would be problematic for forges.

Note that refs aren't the only problem, though; reflog entries and index entries could be problematic as well. For now we punt on index entries, just leaving a TODO comment, and for reflogs we use a coarse solution of taking the time at the beginning of the program and ignoring reflog entries newer than that time. That may be imperfect if dealing with a network filesystem, so we leave a TODO comment for those that want to improve that handling as well.

As a high level overview:

  * In addition to fsck_handle_ref(), which now is only a few lines long to process a ref, there's also a snapshot_ref() which is called early in the program for each ref and takes all the error checking logic.
  * The iterating over refs that used to be in get_default_heads() plus a loop over the arguments now appears in snapshot_refs().
  * There's a new process_refs() as well that kind of looks like the old get_default_heads() though it is streamlined due to the work done by snapshot_refs().

This combination of changes modifies the output of running the script (from the beginning of this commit message) to:

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    Checking connectivity: 833846, done.
    Verifying commits in commit graph: 100% (242243/242243), done.

While worries about live updates while running fsck are likely of most interest for forge operators, the change may also benefit those with automated jobs (such as git maintenance) or even casual users who want to do other work in their clone while fsck is running.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
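The effect of snapshotting refs before the connectivity walk can be reproduced in a toy model. The Python sketch below is an assumption-laden simplification, not fsck itself: the repository is a plain dictionary, the "connectivity walk" is just a copy of the object set, and `check_refs()`, `commit()` and `new_repo()` are hypothetical helpers.

```python
def new_repo():
    """A repository with one commit and one branch pointing at it."""
    return {"objects": {"c1"}, "refs": {"refs/heads/main": "c1"}}


def commit(repo):
    """A concurrent writer: adds an object and moves the branch to it."""
    repo["objects"].add("c2")
    repo["refs"]["refs/heads/main"] = "c2"


def check_refs(repo, writer, snapshot_refs):
    """Return the refs that appear to point at missing objects."""
    # Either copy the refs up front or keep looking at the live ones.
    refs = dict(repo["refs"]) if snapshot_refs else repo["refs"]
    # Stand-in for the connectivity walk: objects known at this moment.
    reachable = set(repo["objects"])
    # A commit lands while the check is still running.
    writer(repo)
    return sorted(name for name, oid in refs.items() if oid not in reachable)


racy = check_refs(new_repo(), commit, snapshot_refs=False)
fixed = check_refs(new_repo(), commit, snapshot_refs=True)
```

Without the snapshot the branch is reported as pointing at a missing commit, even though the repository is healthy; with the snapshot the check only considers the ref values that existed when the walk began.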

As pointed out in git-for-windows#1676, the `git rev-parse --is-inside-work-tree` command currently fails when the current directory's path contains symbolic links. The underlying reason for this bug is that `getcwd()` is supposed to resolve symbolic links, but our `mingw_getcwd()` implementation did not. We do have all the building blocks for that, though: the `GetFinalPathByHandleW()` function will resolve symbolic links. However, we only called that function if `GetLongPathNameW()` failed, for historical reasons: the latter function was supported for a long time, but the former API function was introduced only with Windows Vista, and we used to support also Windows XP. With that support having been dropped, we are free to call the symbolic link-resolving function right away. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>

In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we need to parse the config setting `core.symlinks` to know if it has been set to `true`. In `git init`, we must do that before copying the templates because they might contain symbolic links. Even if the support for symbolic links on Windows has not made it to upstream Git yet, we really should make sure that all the `core.*` settings are parsed before proceeding, as they might very well change the behavior of `git init` in a way the user intended. This fixes git-for-windows#3414 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The `strbuf_readlink()` function calls `readlink()` twice if the hint argument specifies the exact size of the link target (e.g. by passing stat.st_size as returned by `lstat()`). This is necessary because `readlink(..., hint) == hint` could mean that the buffer was too small. Use `hint + 1` as buffer size to prevent this. Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
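Why one extra byte saves a system call can be seen in a small simulation. This Python sketch does not call the real `readlink()`; `fake_readlink()` merely mimics its silent truncation, and `read_link()` is a hypothetical retry loop in the spirit of the description above.

```python
def fake_readlink(target: bytes, bufsize: int) -> bytes:
    """Mimic readlink(2): return at most `bufsize` bytes, no error on truncation."""
    return target[:bufsize]


def read_link(target: bytes, first_bufsize: int):
    """Retry with a larger buffer until the result is shorter than the buffer.

    Returns the link target and the number of (simulated) readlink calls.
    """
    bufsize = max(first_bufsize, 1)
    calls = 0
    while True:
        calls += 1
        data = fake_readlink(target, bufsize)
        if len(data) < bufsize:
            # Strictly shorter than the buffer: nothing was cut off.
            return data, calls
        # A completely filled buffer is ambiguous, so try again.
        bufsize *= 2


target = b"some/link/target"
hint = len(target)                          # e.g. st_size from lstat()
exact = read_link(target, hint)             # buffer exactly as large as the target
one_more = read_link(target, hint + 1)      # the proposed `hint + 1`
```

With the exact size the first call fills the buffer completely and has to be repeated; with `hint + 1` the first result is already unambiguous.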

The `strbuf_readlink()` function refuses to read link targets that exceed 2*PATH_MAX (even if a sufficient size was specified by the caller). The reason that limit is 2*PATH_MAX instead of PATH_MAX is that the symlink targets do not need to be normalized. After running `ln -s a/../a/../a/../a/../b c`, the target of the symlink `c` will not be normalized to `b` but instead be much longer. As such, symlink targets' lengths can far exceed PATH_MAX. They are frequently much longer than 2*PATH_MAX on Windows, which actually supports paths up to 32,767 characters, but sets PATH_MAX to 260 for backwards compatibility (for full details, see https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation). Let's just hard-code the limit used by `strbuf_readlink()` to 32,767 and make it independent of the current platform's PATH_MAX. Based-on-a-patch-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
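The claim that symlink targets are stored verbatim is easy to verify. The following Python sketch assumes a POSIX system where unprivileged symlink creation works; `stored_target()` is a hypothetical helper that creates the link in a temporary directory and reads it back.

```python
import os
import tempfile


def stored_target(target: str) -> str:
    """Create a symlink named `c` pointing at `target` and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "c")
        # The target does not have to exist; dangling links are allowed.
        os.symlink(target, link)
        return os.readlink(link)


raw = "a/../a/../a/../a/../b"
read_back = stored_target(raw)
normalized = os.path.normpath(raw)
```

The string read back is the long, unnormalized form, while lexical normalization would have collapsed it to `b`. A length limit on link targets therefore cannot be derived from the length of the path the link effectively refers to.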

Currently, this function hard-codes the directory separator as the forward slash. However, on Windows the backslash character is valid, too. And we want to call this function in the upcoming support for symlinks on Windows with the symlink targets (which naturally use the canonical directory separator on Windows, which is _not_ the forward slash). Prepare that function to be useful also in that context. Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>

`git subtree split --prefix P` detects splits that are outside of path prefix `P` and prunes them from history graph processing. This improves the performance of repeated `split --rejoin` with many different prefixes.

Both before and after 83f9dad (contrib/subtree: fix split with squashed subtrees, 2025-09-09), the pruning logic does not detect **rebased** or **cherry-picked** git-subtree commits. If `split` encounters any of these commits, the split output may have incomplete history.

All commits authored by `git subtree merge [--squash] --prefix Q` have a first or second parent that has *only* subtree commits as ancestors. When splitting a completely different path `P/`, it is safe to ignore:

1. the merged tree
2. the subtree parent
3. *all* of that parent's ancestry, which applies only to path `Q/` and not `P/`.

But this relationship no longer holds if the git-subtree commit is rebased or otherwise reauthored. After a rebase, the former git-subtree commit will have other unrelated commits as ancestors. Ignoring these commits may exclude the history of `P/`, leading to incomplete `subtree split` output.

The pruning logic relies solely on the `git-subtree-*:` trailers to detect git-subtree commits, which it blindly accepts without further validation. The split logic also takes its time about being wrong: `cmd_split()` execs a `git show` for *every* commit in the split range… twice. This is inefficient in a shell script.

Add a "reality check" to ignore rebased or rewritten commits:

* Rewrites of non-merge commits cannot be detected, so the new detector no longer looks for them.
* Merges carry a `git-subtree-mainline:` trailer with the hash of the **first parent**. If this hash differs, or if the "merge" commit no longer has multiple parents, a rewrite has occurred.

To increase speed, package this logic in a new method, `find_other_splits()`. Perform the check up-front by iterating over a single `git log`. Add ignored subtrees to:

1. the `notree` cache, which excludes them from the `split` history
2. a `prune` negative refs list.

The negative refs prevent recursing into other subtrees. Since there are potentially a *lot* of these, cache them on disk and use rev-list's `--stdin` mode.

Reported-by: George <george@mail.dietrich.pub>
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
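The "reality check" for merges boils down to two comparisons. The Python sketch below models it on in-memory data rather than on real commits; the `Commit` class and `is_intact_subtree_merge()` are hypothetical, and only the `git-subtree-mainline` trailer and the two conditions come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    oid: str
    parents: list
    trailers: dict = field(default_factory=dict)


def is_intact_subtree_merge(commit: Commit) -> bool:
    """True if the commit still looks like an unrewritten git-subtree merge."""
    mainline = commit.trailers.get("git-subtree-mainline")
    if mainline is None:
        return False  # carries no subtree merge trailer at all
    if len(commit.parents) < 2:
        return False  # the "merge" no longer has multiple parents
    # The trailer must still name the actual first parent.
    return commit.parents[0] == mainline


original = Commit("m1", ["p1", "s1"], {"git-subtree-mainline": "p1"})
rebased = Commit("m2", ["p9", "s1"], {"git-subtree-mainline": "p1"})
flattened = Commit("m3", ["p1"], {"git-subtree-mainline": "p1"})
```

Only `original` passes; in this model, the rebased and flattened variants would not be pruned and would be treated as ordinary history instead.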

The not_found and forbidden methods currently do not write a newline to stderr after the error message. This means that if git-http-backend is invoked through something like fcgiwrap, and the stderr of that fcgiwrap process is sent to a logging daemon (e.g. journald), the error messages of several git-http-backend invocations will just get strung together, e.g. > Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git' I think it's git-http-backend's responsibility to format these messages properly, rather than it being fcgiwrap's job to notice that the script didn't terminate stderr with a newline and do so itself. Signed-off-by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Replace raw `test -f` and `! test -f` checks in the rewind test with `test_path_is_file` and `test_path_is_missing`. This provides clearer failure diagnostics and keeps the test consistent with the rest of the test suite. Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

There are some early returns in `odb_source_loose_read_object_info()` in cases where we don't have to open the loose object. These return paths do not set `struct object_info::whence` to `OI_LOOSE` though, so it becomes impossible for the caller to tell the format of such an object. The root cause of this really is that we have so many different return paths in the function. As a consequence, it's harder than necessary to make sure that all successful exit paths set up the `whence` field as expected. Address this by refactoring the function to have a single exit path. Like this, we can trivially set up the `whence` field when we exit successfully from the function. Note that we also: - Rename `status` to `ret` to match our usual coding style, but also to show that the old `status` variable is now always getting the expected value. Furthermore, the value is not initialized anymore, which has the consequence that most compilers will warn for exit paths where we forgot to set it. - Move the setup of scratch pointers closer to `parse_loose_header()` to show where it's needed. - Guard a couple of variables on cleanup so that they only get released in case they have been set up. - Reset `oi->delta_base_oid` towards the end of the function, together with all the other object info pointers. Overall, all these changes result in a diff that is somewhat hard to read. But the end result is significantly easier to read and reason about, so I'd argue this one-time churn is worth it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When reading object info via a packfile we yield one of two types: - The object can either be OI_PACKED, which is what a caller would typically expect. - Or it can be OI_DBCACHED if it is stored in the delta base cache. The latter really is an implementation detail though, and callers typically don't care at all about the difference. Furthermore, the information whether or not it is part of the delta base cache can already be derived via the `is_delta` field, so the fact that we discern between OI_PACKED and OI_DBCACHED only further complicates the interface. There aren't all that many callers that care about the `whence` field in the first place. In fact, there's only three: - `packfile_store_read_object_info()` checks for `whence == OI_PACKED` and then populates the packfile information of the object info structure. We now start to do this also for deltified objects, which gives its callers strictly more information. - `repack_local_links()` wants to determine whether the object is part of a promisor pack and checks for `whence == OI_PACKED`. If so, it verifies that the packfile is a promisor pack. It's arguably wrong to declare that an object is not part of a promisor pack only because it is stored in the delta base cache. - `is_not_in_promisor_pack_obj()` does the same, but checks that a specific object is _not_ part of a promisor pack. The same reasoning as above applies. Drop the OI_DBCACHED enum completely. None of the callers seem to care about the distinction. Note that this also fixes a segfault introduced in 8c1b84b (streaming: move logic to read packed objects streams into backend, 2025-11-23), which refactors how we stream packed objects. The intent is to only read packed objects in case they are stored non-deltified as we'd otherwise have to deflate them first. But the check for whether or not the object is stored as a delta was unconditionally done via `oi.u.packed.is_delta`, which is only valid in case `oi.whence` is `OI_PACKED`. 
But under some circumstances we got `OI_DBCACHED` here, which means that none of the `oi.u.packed` fields were initialized at all. Consequently, we assumed the object was not stored as a delta, and then tried to read the object from `oi.u.packed.pack`, which is a `NULL` pointer and thus causes a segfault. Add a test case for this issue so that this cannot regress in the future. Reported-by: Matt Smiley <msmiley@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The `struct object_info::u::packed::is_delta` field determines whether or not a specific object is stored as a delta. It only stores whether or not the object is stored as delta, so it is treated as a boolean value. This boolean is insufficient though: when reading a packed object via `packfile_store_read_object_info()` we know to skip parsing the actual object when the user didn't request any object-specific data. In that case we won't read the object itself, but will only look up its position in the packfile. Consequently, we do not know whether it is a delta or not. This isn't really an issue right now, as the check for an empty request is broken. But a subsequent commit will fix it, and once we do we will have the need to also represent an "unknown" delta state. Prepare for this change by introducing a new enum that encodes the object type. We don't use the "unknown" state just yet, but will start to do so in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
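The need for a third state can be sketched as follows. This is a hypothetical Python model, not the enum that the commit introduces: `DeltaState` and `delta_state()` are invented names that only illustrate why a boolean cannot express "we did not look".

```python
from enum import Enum


class DeltaState(Enum):
    UNKNOWN = 0    # the object itself was never parsed
    NOT_DELTA = 1
    DELTA = 2


def delta_state(stored_as_delta: bool, parsed_object: bool) -> DeltaState:
    """Report the delta state only if the object was actually read."""
    if not parsed_object:
        # Only the position in the packfile was looked up, so answering
        # either "delta" or "not a delta" would be a guess.
        return DeltaState.UNKNOWN
    return DeltaState.DELTA if stored_as_delta else DeltaState.NOT_DELTA


skipped = delta_state(stored_as_delta=True, parsed_object=False)
parsed = delta_state(stored_as_delta=True, parsed_object=True)
```

With a plain boolean, the `skipped` case would have to be reported as "not a delta", which a caller could not distinguish from a real answer.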

The function `files_fsck_refs()` only has a single callsite and forwards all of its arguments as-is, so it's basically a useless indirection. Inline the function call. While at it, also remove the bitwise or that we have for return values. We don't really want to or them at all, but rather just want to return an error in case either of the functions has failed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When checking the consistency of references we create a directory iterator and then verify each single reference in a loop. The logic to perform the actual checks is embedded into that loop, which makes it hard to reuse. But in a subsequent commit we're about to introduce a second path that wants to verify references. Prepare for this by extracting the logic to check a single reference into a standalone function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The error handling when verifying symbolic refs is a bit on the wild side:

- `fsck_report_ref()` can be told to ignore specific errors. If an error has been ignored and a previous check raised an unignored error, then assigning `ret = fsck_report_ref()` will cause us to swallow the previous error.

- When the target reference is not valid we bail out early without checking for other errors.

Fix both of these issues by consistently or'ing the return value and not bailing out early. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
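The difference between assigning and or'ing the return value can be illustrated with a small sketch. `report_sketch()` is a hypothetical stand-in for a reporting function that returns 0 when an error is configured to be ignored; it is not the real `fsck_report_ref()` signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reporter: returns 0 if the error is ignored, -1 otherwise. */
static int report_sketch(int ignored)
{
	return ignored ? 0 : -1;
}

/* Problematic pattern: a later ignored report overwrites an earlier error. */
static int check_assigning(const int *ignored, size_t nr)
{
	int ret = 0;

	assert(ignored != NULL);
	for (size_t i = 0; i < nr; i++)
		ret = report_sketch(ignored[i]);
	return ret;
}

/* Fixed pattern: accumulate results so that no earlier error gets lost. */
static int check_oring(const int *ignored, size_t nr)
{
	int ret = 0;

	assert(ignored != NULL);
	for (size_t i = 0; i < nr; i++)
		ret |= report_sketch(ignored[i]);
	return ret;
}
```

With one unignored error followed by one ignored error, `check_assigning()` returns 0 while `check_oring()` still reports failure.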

While the "files" backend already knows to perform consistency checks for the "refs/" hierarchy, it doesn't verify any of its root refs. Plug this omission. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The `struct fsck_ref_report` has a couple fields that are intended to improve the error reporting for broken ref reports by showing which object ID or target reference the ref points to. These fields are never set though and are thus essentially unused. Remove them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The consistency checks for the "files" backend contain a couple of verifications for symrefs that verify generic properties of the target reference. These properties need to hold for every backend, no matter whether it's using the "files" or "reftable" backend. Reimplementing these checks for every single backend doesn't really make sense. Extract it into a generic `refs_fsck_symref()` function that can be used by other backends, as well. The "reftable" backend will be wired up in a subsequent commit. While at it, improve the consistency checks so that we don't complain about refs pointing to a non-ref target in case the target refname format does not verify. Otherwise it's very likely that we'll generate both error messages, which feels somewhat redundant in this case. Note that the function has a couple of `UNUSED` parameters. These will become referenced in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

In a subsequent commit we'll introduce new generic checks for direct refs. These checks will be independent of the actual backend. Introduce a new function `refs_fsck_ref()` that will be used for this purpose. At the current point in time it's still empty, but it will get populated in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Adapt the includes to be sorted and to use include paths that are relative to the "refs/" directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Pull out the logic to retrieve a backend for a given worktree. This function will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The ref consistency checks are driven via `cmd_refs_verify()`. That function loops through all worktrees (including the main worktree) and then checks the ref store for each of them individually. It follows that the backend is expected to only verify refs that belong to the specified worktree. While the "files" backend handles this correctly, the "reftable" backend doesn't. In fact, it completely ignores the passed worktree and instead verifies refs of _all_ worktrees. The consequence is that we'll end up checking every ref store N times, where N is the number of worktrees. Or rather, that would be the case if we actually iterated through the worktree reftable stacks correctly. We use `strmap_for_each_entry()` to iterate through the stacks, but the map is in fact not even properly populated. So instead of checking stacks N^2 times, we actually only end up checking the reftable stack of the main worktree. Fix this bug by only verifying the stack of the passed-in worktree and constructing the backends via `backend_for_worktree()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

In a preceding commit we have extracted generic checks for both direct and symbolic refs that apply for all backends. Wire up those checks for the "reftable" backend. Note that this is done by iterating through all refs manually with the low-level reftable ref iterator. We explicitly don't want to use the higher-level iterator that is exposed to users of the reftable backend as that iterator may swallow for example broken refs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

While most of the logic that verifies the consistency of refs is driven by `refs_fsck()`, we still have a small handful of checks in `fsck_head_link()`. These checks don't use the git-fsck(1) reporting infrastructure, and as such it's impossible to for example disable some of those checks. One such check detects refs that point to the all-zeroes object ID. Extract this check into the generic `refs_fsck_ref()` function that is used by both the "files" and "reftable" backends. Note that this will cause us to not return an error code from `fsck_head_link()` anymore in case this error was detected. This is fine though: the only caller of this function does not check the error code anyway. To demonstrate this, adapt the function to drop its return value altogether. The function will be removed in a subsequent commit anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
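A backend-independent check for refs that point at the all-zeroes object ID might be sketched as follows. The function names and the raw-hash representation are assumptions made for illustration; the real `refs_fsck_ref()` reports through the git-fsck(1) infrastructure, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SHA1_RAWSZ 20 /* raw SHA-1 length in bytes */

/* Return 1 if every byte of the object ID is zero, 0 otherwise. */
static int oid_is_all_zero(const unsigned char *hash, size_t len)
{
	assert(hash != NULL);
	for (size_t i = 0; i < len; i++)
		if (hash[i])
			return 0;
	return 1;
}

/*
 * Hypothetical generic check that any backend could call for a direct
 * ref: returns -1 for the all-zeroes object ID, 0 otherwise.
 */
static int check_ref_oid_sketch(const unsigned char *hash, size_t len)
{
	return oid_is_all_zero(hash, len) ? -1 : 0;
}
```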

Move the check that detects "HEAD" refs that do not point at a branch into `refs_fsck()`. This follows the same motivation as the preceding commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The function `fsck_head_link()` was historically used to perform a couple of consistency checks for refs. (Almost) all of these checks have now been moved into the refs subsystem. There's only a single check remaining that verifies whether `refs_resolve_ref_unsafe()` returns a `NULL` pointer. This may happen in a couple of cases:

- When `refs_is_safe()` declares the ref to be unsafe. We already have checks for this as we verify refnames with `check_refname_format()`.

- When the ref doesn't exist. A repository without "HEAD" is completely broken though, and we would notice this error ahead of time already.

- In case the caller passes `RESOLVE_REF_READING` and the ref is a symref that doesn't resolve. We don't pass this flag though.

As such, this check doesn't cover anything anymore that isn't already covered by `refs_fsck()`. Drop it, which also allows us to inline the call to `refs_resolve_ref_unsafe()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

macOS 14 (Sonoma) has started to ship an iconv library with bugs. The same bugs exist even in macOS 15 (Sequoia). A bug report from running the Git test suite says:

three tests of t3900 fail on macOS 26.1 for me:

    not ok 17 - ISO-2022-JP should be shown in UTF-8 now
    not ok 25 - ISO-2022-JP should be shown in UTF-8 now
    not ok 38 - commit --fixup into ISO-2022-JP from UTF-8

Here's the verbose output of the first one:

    =================
    expecting success of 3900.17 'ISO-2022-JP should be shown in UTF-8 now':
        compare_with ISO-2022-JP "$TEST_DIRECTORY"/t3900/2-UTF-8.txt
    --- /Users/x/src/git/t/t3900/2-UTF-8.txt 2024-10-01 19:43:24.605230684 +0000
    +++ current 2025-12-08 21:52:45.786161909 +0000
    @@ -1,5 +1,5 @@
     はれひほふしているのが、いるので。
    -濱浜ほれぷりぽれまびぐりろへ。
    +濱浜ほれぷりぽれまび$0$j$m$X!#
    not ok 17 - ISO-2022-JP should be shown in UTF-8 now
    1..17
    =================

compare_with runs git show to display a commit message, which in this case here was encoded using ISO-2022-JP and is supposed to be reencoded to UTF-8, but git show only does that half-way -- the "$0$j$m$X!#" part is from the original ISO-2022-JP representation.

That botched conversion is done by utf8.c::reencode_string_iconv(). It calls iconv(3) to do the actual work, initially with an output buffer of the same size as the input. If the output needs more space the function enlarges the buffer and calls iconv(3) again.

iconv(3) won't tell us how much space it needs, but it will report what part it already managed to convert, so we can increase the buffer and continue from there. ISO-2022-JP has escape codes for switching between character sets, so it's a stateful encoding. I guess the iconv(3) on my machine forgets the state at the end of part one and then messes up part two.

[end of citation]

Working around the buggy iconv shipped with the OS can be done in two ways:

a) Link Git against a different version of iconv
b) Improve the handling when iconv needs a larger output buffer

a) is already done by default when either Fink [1] or MacPorts [2] or Homebrew [3] is installed.

b) is implemented here, in case no fixed iconv is available: when the output buffer is too short, increase it (as before) and start from scratch (this is new).

This workaround needs to be enabled with '#define ICONV_RESTART_RESET', and a makefile knob will be added in the next commit.

Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>

[1] https://www.finkproject.org/
[2] https://www.macports.org/
[3] https://brew.sh/

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
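The "enlarge the buffer and start from scratch" approach described above can be sketched with plain iconv(3) calls. This is an illustration only, not the code in utf8.c: the function name `reencode_restart_sketch()` and its `start_size` parameter are hypothetical, and the real implementation is gated behind `ICONV_RESTART_RESET`.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert `in` (inlen bytes) from encoding `from` to encoding `to`.
 * Returns a NUL-terminated malloc'ed buffer, or NULL on failure.
 * `start_size` is the initial output buffer size in bytes.
 */
static char *reencode_restart_sketch(const char *in, size_t inlen,
				     const char *from, const char *to,
				     size_t start_size)
{
	size_t outsz = start_size ? start_size : 1;
	char *out = NULL;
	iconv_t cd;

	assert(in && from && to);
	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return NULL;

	for (;;) {
		char *inp = (char *)in; /* always restart at the beginning */
		size_t inleft = inlen;
		char *tmp = realloc(out, outsz + 1);
		char *outp;
		size_t outleft = outsz;

		if (!tmp)
			break;
		out = tmp;
		outp = out;

		/* Reset the conversion state before every attempt. */
		iconv(cd, NULL, NULL, NULL, NULL);

		/* Convert the input, then flush any pending shift sequence. */
		if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1 &&
		    iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1) {
			*outp = '\0';
			iconv_close(cd);
			return out;
		}
		if (errno != E2BIG)
			break; /* a real conversion error, not a short buffer */
		outsz *= 2; /* output too small: grow and start from scratch */
	}
	free(out);
	iconv_close(cd);
	return NULL;
}
```

Because every attempt re-reads the input from its first byte with a freshly reset descriptor, the result does not depend on iconv(3) preserving shift state across a partial conversion, at the cost of redoing work whenever the buffer has to grow.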

The previous commit introduced a workaround in utf8.c to deal with broken iconv implementations. It is enabled when a macOS version is used that has a buggy iconv library and no external library is provided by (and linked against from) MacPorts, Homebrew, or Fink. For Homebrew, MacPorts and Fink we check if libiconv exists. Introduce 2 new macros: HAS_GOOD_LIBICONV and NEEDS_GOOD_LIBICONV. For Homebrew, HAS_GOOD_LIBICONV is set when the libiconv directory exists. MacPorts can be installed with or without libiconv, so check if libiconv.dylib exists (which is a softlink). Fink compiles and installs libiconv by default. Note that a fresh installation of Fink now defaults to /opt/sw. Older versions used /sw as default, so leave the check and setting of BASIC_CFLAGS and BASIC_LDFLAGS as is. For the new default, check for the existence of /opt/sw as well. Add a check for /opt/sw/lib/libiconv.dylib which sets HAS_GOOD_LIBICONV. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>

"git fsck" used inconsistent set of refs to show a confused warning, which has been corrected. * en/fsck-snapshot-ref-state: fsck: snapshot default refs before object walk

Documentation updates. * je/doc-reset: doc: git-reset: clarify `git reset <pathspec>` doc: git-reset: clarify `git reset [mode]` doc: git-reset: clarify intro doc: git-reset: reorder the forms

The split command in "git subtree" (in contrib/) has been taught to deal better with rebased history. * cs/rebased-subtree-split: contrib/subtree: detect rewritten subtree commits

The iconv library on macOS fails to correctly handle stateful ISO/IEC 2022 encoded strings. Work it around instead of replacing it wholesale from homebrew. * tb/macos-iconv-workarounds: utf8.c: enable workaround for iconv under macOS 14/15 utf8.c: prepare workaround for iconv under macOS 14/15

Update code paths that check data integrity around refs subsystem. cf. <CAOLa=ZShPP3BPXa=YnC-vuX4zF=pUTFdUidZwOdna8bfVTNM9w@mail.gmail.com> * ps/ref-consistency-checks: builtin/fsck: drop `fsck_head_link()` builtin/fsck: move generic HEAD check into `refs_fsck()` builtin/fsck: move generic object ID checks into `refs_fsck()` refs/reftable: introduce generic checks for refs refs/reftable: fix consistency checks with worktrees refs/reftable: extract function to retrieve backend for worktree refs/reftable: adapt includes to become consistent refs/files: introduce function to perform normal ref checks refs/files: extract generic symref target checks fsck: drop unused fields from `struct fsck_ref_report` refs/files: perform consistency checks for root refs refs/files: improve error handling when verifying symrefs refs/files: extract function to check single ref refs/files: remove useless indirection refs/files: remove `refs_check_dir` parameter refs/files: move fsck functions into global scope refs/files: simplify iterating through root refs

Test clean-up. * ps/t1410-cleanup: t1410: use test helpers in reflog rewind test

Some error messages from the http transport layer lacked the terminating newline, which has been corrected. * kt/http-backend-errors: http-backend: write newlines to stderr when responding with errors

The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source

The object-info API has been cleaned up. * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info

Further preparation to upstream symbolic link support on Windows. * js/prep-symlink-windows: trim_last_path_component(): avoid hard-coding the directory separator strbuf_readlink(): support link targets that exceed 2*PATH_MAX strbuf_readlink(): avoid calling `readlink()` twice in corner-cases init: do parse _all_ core.* settings early mingw: do resolve symlinks in `getcwd()`

Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitster and others added 30 commits December 15, 2025 17:40

Merge branch 'jc/object-read-stream-fix' into ps/read-object-info-imp…

00f117f

…rovements * jc/object-read-stream-fix: odb: do not use "blank" substitute for NULL

Merge branch 'ps/odb-misc-fixes' into ps/packfile-store-in-odb-source

f1ec43d

* ps/odb-misc-fixes: odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs

pks-t and others added 27 commits January 12, 2026 06:55

refs/reftable: adapt includes to become consistent

ab67f0a

Adapt the includes to be sorted and to use include paths that are relative to the "refs/" directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: extract function to retrieve backend for worktree

78384e2

Pull out the logic to retrieve a backend for a given worktree. This function will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'en/fsck-snapshot-ref-state'

6edbb7b

"git fsck" used inconsistent set of refs to show a confused warning, which has been corrected. * en/fsck-snapshot-ref-state: fsck: snapshot default refs before object walk

Merge branch 'je/doc-reset'

9813aac

Documentation updates. * je/doc-reset: doc: git-reset: clarify `git reset <pathspec>` doc: git-reset: clarify `git reset [mode]` doc: git-reset: clarify intro doc: git-reset: reorder the forms

Merge branch 'cs/rebased-subtree-split'

79e3055

The split command in "git subtree" (in contrib/) has been taught to deal better with rebased history. * cs/rebased-subtree-split: contrib/subtree: detect rewritten subtree commits

Merge branch 'ps/t1410-cleanup'

e01178b

Test clean-up. * ps/t1410-cleanup: t1410: use test helpers in reflog rewind test

Merge branch 'kt/http-backend-errors'

ab72d23

Some error messages from the http transport layer lacked the terminating newline, which has been corrected. * kt/http-backend-errors: http-backend: write newlines to stderr when responding with errors

Git 2.53-rc1

83a69f1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

pull bot locked and limited conversation to collaborators Jan 21, 2026

pull bot added the ⤵️ pull label Jan 21, 2026

pull bot merged commit 83a69f1 into turkdevops:master Jan 21, 2026
2 checks passed
