This addresses https://github.com/go-gitea/gitea/issues/18352
It aims to improve performance (and resource use) of the `SyncReleasesWithTags` operation for pull-mirrors.
For large repositories with many tags, `SyncReleasesWithTags` can be a costly operation (taking several minutes to complete). The reason is two-fold:
1. on sync, every upstream repo tag is compared (for changes) against existing local entries in the release table to ensure that they are up-to-date.
2. the procedure for getting _each tag_ involves a series of git operations
```bash
git show-ref --tags -- v8.2.4477
git cat-file -t 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
git cat-file -p 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
git rev-list --count 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
```
of which the `git rev-list --count` can be particularly heavy.
This PR optimizes performance for pull-mirrors. We utilize the fact that a pull-mirror is always identical to its upstream and rebuild the entire release table on every sync and use a batch `git for-each-ref .. refs/tags` call to retrieve all tags in one go.
For large mirror repos, with hundreds of annotated tags, this brings down the duration of the sync operation from several minutes to a few seconds. A few unscientific examples run on my local machine:
- https://github.com/spring-projects/spring-boot (223 tags)
- before: `0m28,673s`
- after: `0m2,244s`
- https://github.com/kubernetes/kubernetes (890 tags)
- before: `8m00s`
- after: `0m8,520s`
- https://github.com/vim/vim (13954 tags)
- before: `14m20,383s`
- after: `0m35,467s`
I added a `foreachref` package which contains a flexible way of specifying which reference fields are of interest (`git-for-each-ref(1)`) and to produce a parser for the expected output. These could be reused in other places where `for-each-ref` is used. I'll add unit tests for those if the overall PR looks promising.
This follows
* https://github.com/go-gitea/gitea/issues/18553
Introduce `RunWithContextString` and `RunWithContextBytes` to help the refactoring. Add related unit tests. They keep the same behavior to save stderr into err.Error() as `RunInXxx` before.
Remove `RunInDirTimeoutPipeline` `RunInDirTimeoutFullPipeline` `RunInDirTimeout` `RunInDirTimeoutEnv` `RunInDirPipeline` `RunInDirFullPipeline` `RunTimeout`, `RunInDirTimeoutEnvPipeline`, `RunInDirTimeoutEnvFullPipeline`, `RunInDirTimeoutEnvFullPipelineFunc`.
Then remaining `RunInDir` `RunInDirBytes` `RunInDirWithEnv` can be easily refactored in next PR with a simple search & replace:
* before: `stdout, err := RunInDir(path)`
* next: `stdout, _, err := RunWithContextString(&git.RunContext{Dir:path})`
Other changes:
1. When `timeout <= 0`, use default. Because `timeout==0` is meaningless and could cause bugs. And now many functions becomes more simple, eg: `GitGcRepos` 9 lines to 1 line. `Fsck` 6 lines to 1 line.
2. Only set defaultCommandExecutionTimeout when the option `setting.Git.Timeout.Default > 0`
Adding additional usernames which are already routes, remove unused ones.
In future, avoid reserving names as much as possible, use `/-/` in path instead.
Gitea was not able to supply any authentication parameters to it. So this brings support to do that, along with some light extraction of a couple of bits into some separate functions for easier testing.
I looked at other libraries supporting similar RedisUri-style connection strings (e.g. Lettuce), but it looks like this type of configuration is beyond what would typically be done in a connection string. Since gitea doesn't have configuration options for manually specifying all this redis connection detail, I went ahead and just chose straightforward names for these new parameters.
* ROOT_URL issues: some users did wrong to there app.ini config, then:
* The assets can not be loaded (AppSubUrl != "" and users try to access http://host:3000/)
*The ROOT_URL is wrong, then many URLs in Gitea are broken.
Now Gitea show enough information to users.
* JavaScript error issues, there are many users affected by JavaScript errors, some are caused by frontend bugs, some are caused by broken customized templates. If these JS errors can be found at first time, then maintainers do not need to ask about how bug occurs again and again.
* Some people like to modify the `head.tmpl`, so we separate the script part to `head_script.tmpl`, then it's much safer.
* use specialized CSS class "js-global-error", end users still have a chance to hide error messages by customized CSS styles.
Strangely #19038 appears to relate to an issue whereby a tag appears to
be listed in `git show-ref --tags` but then does not appear when `git
show-ref --tags -- short_name` is called.
As a solution though I propose to stop the second call as it is
unnecessary and only likely to cause problems.
I've also noticed that the tags calls are wildly inefficient and aren't using the common cat-files - so these have been added.
I've also noticed that the git commit-graph is not being written on mirroring - so I've also added writing this to the migration which should improve mirror rendering somewhat.
Fix#19038
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
There is yet another problem with conflicted files not being reset when
the test patch resolves them.
This PR adjusts the code for checkConflicts to reset the ConflictedFiles
field immediately at the top. It also adds a reset to conflictedFiles
for the manuallyMerged and a shortcut for the empty status in
protectedfiles.
Signed-off-by: Andrew Thornton <art27@cantab.net>
The last PR about clone buttons introduced an JS error when visiting an empty repo page:
* https://github.com/go-gitea/gitea/pull/19028
* `Uncaught ReferenceError: isSSH is not defined`, because the variables are scoped and doesn't share between sub templates.
This:
1. Simplify `templates/repo/clone_buttons.tmpl` and make code clear
2. Move most JS code into `initRepoCloneLink`
3. Remove unused `CloneLink.Git`
4. Remove `ctx.Data["DisableSSH"] / ctx.Data["ExposeAnonSSH"] / ctx.Data["DisableHTTP"]`, and only set them when is is needed (eg: deploy keys / ssh keys)
5. Introduce `Data["CloneButton*"]` to provide data for clone buttons and links
6. Introduce `Data["RepoCloneLink"]` for the repo clone link (not the wiki)
7. Remove most `ctx.Data["PageIsWiki"]` because it has been set in the `/wiki` middleware
8. Remove incorrect `quickstart` class in `migrating.tmpl`
As discussed on #19221 we should store the results of the last task message on the
crontask and show them on the monitor page.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
This PR adds the necessary work to make it possible to create files on empty
repos using the API.
Fix#10993
Signed-off-by: Andrew Thornton <art27@cantab.net>
* remove the global methods but create dynamiclly
* Fix lint
* Fix windows lint
* Fix windows lint
* some improvements
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
There is a bug in the system webhooks whereby the active state is not checked when
webhooks are prepared and there is a bug that deactivating webhooks do not prevent
queued deliveries.
* Only add SystemWebhooks to the prepareWebhooks list if they are active
* At the time of delivery if the underlying webhook is not active mark it
as "delivered" but with a failed delivery so it does not get delivered.
Fix#19220
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
So whilst #19225 fixes one issue it caused another. We need to initialise the Git
module first.
Related #19225Fix#19162
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
* Touch mirrors on even on fail to update
If a mirror fails to be synchronised it should be pushed to the bottom of the queue
of the awaiting mirrors to be synchronised. At present if there LIMIT number of
broken mirrors they can effectively prevent all other mirrors from being synchronized
as their last_updated time will remain earlier than other mirrors.
Signed-off-by: Andrew Thornton <art27@cantab.net>
The git command by default adds a number of global arguments. These are not
helpful to be displayed in the process manager and so should be skipped for
default process descriptions.
Signed-off-by: Andrew Thornton <art27@cantab.net>
The RepoIndexerTest is failing with considerable frequency due to a race inherrent in
its design. This PR adjust this test to avoid the reliance on waiting for the populate
repo indexer to run and forcibly adds the repo to the queue. It then flushes the queue.
It may be worth separating out the tests somewhat by testing the Index function
directly away from the queue however, this forceful method should solve the current
problem.
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Set the default branch for repositories generated from templates
* Allows default branch to be set through the API for repos generated from templates
* Update swagger API template
* Only set default branch to the one from the template if not specified
* Use specified default branch if it exists while generating git commits
Fix#19082
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
Co-authored-by: zeripath <art27@cantab.net>
Change all cron tasks to make them no notice on success default. Instead if a user
wants notices on success they need to add NOTICE_ON_SUCCESS=true instead.
## ⚠️ BREAKING ⚠️
This changes the cron config so that notices on success are no longer set by default
and breaks NO_SUCCESS_NOTICE settings. Instead users who want notices on success
must set NOTICE_ON_SUCCESS=true instead.
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Update custom/conf/app.example.ini
Co-authored-by: Norwin <noerw@users.noreply.github.com>
Co-authored-by: Norwin <noerw@users.noreply.github.com>
* Add auto logging of goroutine pid label
This PR uses unsafe to export the hidden runtime_getProfLabel function from the
runtime package and then casts the result to a map[string]string.
We can then interrogate this map to get the pid label from the goroutine allowing
us to log it with any logging request.
Reference #19202
Signed-off-by: Andrew Thornton <art27@cantab.net>
This PR adds a middleware which sets a ContextUser (like GetUserByParams before) in a single place which can be used by other methods. For routes which represent a repo or org the respective middlewares set the field too.
Also fix a bug in modules/context/org.go during refactoring.