code.dlang.org has not been responding all day.
https://www.isitdownrightnow.com/code.dlang.org.html
-- Bastiaan.
April 10 Re: code.dlang.org is down
Posted in reply to Bastiaan Veelo
On Thursday, 10 April 2025 at 18:22:40 UTC, Bastiaan Veelo wrote:
> code.dlang.org has not been responding all day.
>
> https://www.isitdownrightnow.com/code.dlang.org.html
>
> -- Bastiaan.
All credits to Elias:
Maybe not all day, but it has some issues.
April 10 Re: code.dlang.org is down
Posted in reply to Bastiaan Veelo
On Thursday, 10 April 2025 at 18:22:40 UTC, Bastiaan Veelo wrote:
> code.dlang.org has not been responding all day.
>
> https://www.isitdownrightnow.com/code.dlang.org.html
>
> -- Bastiaan.
This too: https://run.dlang.io/
Matheus.
April 11 Re: code.dlang.org is down
Posted in reply to Bastiaan Veelo
On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
> code.dlang.org has not been responding all day.
>
> https://www.isitdownrightnow.com/code.dlang.org.html
>
> -- Bastiaan.
I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests. Maybe it's possible to get that under control through the Cloudflare frontend? Of course they are using a Safari user agent string instead of something that would identify them as bots.
Finally, the fallback server logic for dub doesn't appear to work correctly anymore - at least for me it hangs more or less indefinitely instead of falling back to codemirror.dlang.org.
I don't have a lot of time to look into this right now, but I'll see if I can do something. It would be good if someone with Cloudflare access could look into a possible mitigation there.
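For illustration only, the kind of per-client throttling that the crawler traffic calls for can be sketched as a token bucket. This is neither Cloudflare configuration nor code from the registry; the class name `TokenBucketLimiter`, the refill rate, and the burst size are assumptions picked to mirror the "8 parallel requests" observation. Because the crawlers send a Safari user agent string, the key would have to be something like the client IP rather than the user agent.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket (hypothetical helper, not part of any registry code)."""

    def __init__(self, rate_per_s: float, burst: float, clock=time.monotonic):
        self.rate = rate_per_s   # tokens added per second
        self.burst = burst       # maximum number of stored tokens
        self.clock = clock       # injectable so the behaviour can be tested
        self._state = {}         # client key -> (tokens, time of last update)

    def allow(self, client: str) -> bool:
        """Return True if this request may proceed, False if it should be rejected."""
        now = self.clock()
        tokens, last = self._state.get(client, (self.burst, now))
        # Refill according to the time elapsed, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client] = (tokens - 1.0, now)
            return True
        self._state[client] = (tokens, now)
        return False
```

A request handler would call `limiter.allow(client_ip)` and answer with HTTP 429 when it returns False. Crawlers that spread their requests over many addresses would not be slowed down by this alone.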
April 11 Re: code.dlang.org is down
Posted in reply to Sönke Ludwig
On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
> On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
>> [...]
>
> I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
>
> [...]
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries
April 11 Re: code.dlang.org is down
Posted in reply to Sönke Ludwig
On 11.04.2025 at 11:06, Sönke Ludwig wrote:
> On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
>> code.dlang.org has not been responding all day.
>>
>> https://www.isitdownrightnow.com/code.dlang.org.html
>>
>> -- Bastiaan.
>
> I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
>
> The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests. Maybe it's possible to get that under control through the Cloudflare frontend? Of course they are using a Safari user agent string instead of something that would identify them as bots.
>
> Finally, the fallback server logic for dub doesn't appear to work correctly anymore - at least for me it hangs more or less indefinitely instead of falling back to codemirror.dlang.org.
>
> I don't have a lot of time to look into this right now, but I'll see if I can do something. It would be good if someone with Cloudflare access could look into a possible mitigation there.
It's a little bit better now with one source of GC allocations temporarily eliminated. As a workaround, you can manually configure codemirror.dlang.org to take precedence by putting this in ~/.dub/settings.json:
{
    "registryUrls": ["https://codemirror.dlang.org/"]
}
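If ~/.dub/settings.json already holds other settings, overwriting it by hand would lose them. A minimal sketch of merging the workaround into an existing file follows; the function name `prefer_registry` is hypothetical, and the only facts taken from the post are the file location and the "registryUrls" key.

```python
import json
from pathlib import Path

FALLBACK_REGISTRY = "https://codemirror.dlang.org/"


def prefer_registry(settings_path: Path, url: str = FALLBACK_REGISTRY) -> dict:
    """Put `url` first in "registryUrls" and keep every other setting untouched."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8").strip()
        if text:
            settings = json.loads(text)
    # Drop an existing entry for the same URL so it is not listed twice.
    others = [u for u in settings.get("registryUrls", []) if u != url]
    settings["registryUrls"] = [url] + others
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return settings
```

It would be called as `prefer_registry(Path.home() / ".dub" / "settings.json")`. Removing the entry again restores the default registry order.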
6 days ago Re: code.dlang.org is down
Posted in reply to Sönke Ludwig
On Friday, 11 April 2025 at 15:19:18 UTC, Sönke Ludwig wrote:
> On 11.04.2025 at 11:06, Sönke Ludwig wrote:
>> On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
>>> code.dlang.org has not been responding all day.
>>>
>>> https://www.isitdownrightnow.com/code.dlang.org.html
>>>
>>> -- Bastiaan.
>>
>> I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
>>
>> The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests. Maybe it's possible to get that under control through the Cloudflare frontend? Of course they are using a Safari user agent string instead of something that would identify them as bots.
>>
>> Finally, the fallback server logic for dub doesn't appear to work correctly anymore - at least for me it hangs more or less indefinitely instead of falling back to codemirror.dlang.org.
>>
>> I don't have a lot of time to look into this right now, but I'll see if I can do something. It would be good if someone with Cloudflare access could look into a possible mitigation there.
>
> It's a little bit better now with one source of GC allocations temporarily eliminated. As a workaround, you can manually configure codemirror.dlang.org to take precedence by putting this in ~/.dub/settings.json:
>
> {
>     "registryUrls": ["https://codemirror.dlang.org/"]
> }
Anubis would also be an option, at least for the site frontend.
https://anubis.techaro.lol/
6 days ago Re: code.dlang.org is down
Posted in reply to Luna
On 10.05.2025 at 14:20, Luna wrote:
> On Friday, 11 April 2025 at 15:19:18 UTC, Sönke Ludwig wrote:
>> (...)
>>
>> It's a little bit better now with one source of GC allocations temporarily eliminated. As a workaround, you can manually configure codemirror.dlang.org to take precedence by putting this in ~/.dub/settings.json:
>>
>> {
>> "registryUrls": ["https://codemirror.dlang.org/"]
>> }
>
> Anubis would also be an option, at least for the site frontend.
> https://anubis.techaro.lol/
>
The problem is that the frontend is Cloudflare now, so I think the only solution would be to use their own bot labyrinth functionality. I don't have access to the Cloudflare account, though.
4 days ago Re: code.dlang.org is down
Posted in reply to Sönke Ludwig
I would like to ask you if you thought about some of the following measures (not necessarily in their order of appearance):
On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
> [...]
> I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
- Turn off the GC.
- Run vibe.d under a hard memory/CPU limit (which terminates the process if exceeded).
- Deploy a monitoring process which checks if vibe.d responds in, say, 20 ms. If not, kill vibe.d with SIGKILL.
- flock with LOCK_EX on an fd opened on the vibe.d binary right before vibe.d issues the bind call.
- Have a second vibe.d process running, blocking on its flock call (= auto restart).
> The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests.
To me it seems that code.dlang.org is mostly not a web app but a web site with static content. Have you been thinking of serving that static content with apache/nginx using rate limiting (mod_throttle etc.)? Or putting this content directly on Cloudflare, which is also a web hoster?
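The flock-based auto restart from the list above can be sketched as follows. This is an illustration in Python rather than vibe.d code; `acquire_serving_lock`, `run_server`, and the `serve` callback are hypothetical names. It relies on two documented properties of flock: an exclusive lock can be taken on a descriptor opened read-only, and the kernel releases the lock when the holding process exits or is killed.

```python
import fcntl
import os


def acquire_serving_lock(lock_path: str, blocking: bool = True) -> int:
    """Take an exclusive flock on `lock_path` and return the open descriptor.

    The descriptor must stay open while the process serves requests. When the
    process dies, the kernel closes it and a waiting standby gets the lock.
    """
    # Read-only is enough for flock, and a running binary cannot be opened for writing.
    fd = os.open(lock_path, os.O_RDONLY)
    flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
    try:
        fcntl.flock(fd, flags)
    except OSError:
        os.close(fd)
        raise
    return fd


def run_server(lock_path: str, serve) -> None:
    """Block until this process owns the lock, then bind and serve."""
    fd = acquire_serving_lock(lock_path)  # a standby process waits here
    try:
        serve()  # hypothetical callback: bind the socket and run the event loop
    finally:
        os.close(fd)
```

Two instances started with the same `lock_path` behave as primary and standby: the second one sits in `flock` until the first exits, then proceeds to bind.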
4 days ago Re: code.dlang.org is down
Posted in reply to kdevel
On 12.05.2025 at 13:49, kdevel wrote:
> I would like to ask you if you thought about some of the following measures (not necessarily in their order of appearance):
>
> On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
>> [...]
>> I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.
>
> - Turn off the GC.
In the situation that led to the issues, that would blow up memory usage within a very short amount of time, replacing the bad responsiveness with a non-responsive system or with the OOM killer terminating some process.
> - Run vibe.d under a hard memory/CPU limit (which terminates the process if exceeded).
The problem here was not hardware resources, but the fact that the GC was stopping the process for a large amount of time, as well as allocation overhead. The concurrent GC might improve this, but the question is whether that would then lead to excessive memory usage.
> - Deploy a monitoring process which checks if vibe.d responds in, say, 20 ms. If not, kill vibe.d with SIGKILL.
There is a certain startup overhead and there are some requests that can take longer (registry dump being the longest, but also requesting information about a dependency graph, which is done by dub). I think this should really only be a last-resort approach (e.g. response time > 5s), because it replaces bad response times with failed requests.
> - flock with LOCK_EX on an fd opened on the vibe.d binary right before vibe.d issues the bind call.
>
> - Have a second vibe.d process running, blocking on its flock call (= auto restart).
This would still mean that active connections will fail, which is not ideal in a situation where the restart would be frequently necessary.
>> The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests.
>
> To me it seems that code.dlang.org is mostly not a web app but a web site with static content. Have you been thinking of serving that static content with apache/nginx using rate limiting (mod_throttle etc.)? Or putting this content directly on Cloudflare, which is also a web hoster?
This is not really true; the truly static content was moved to dub.pm a while ago, and the rest is made up of dynamic views on the package database. Of course it would be possible to cache pages, but that wouldn't help against crawlers. Writing out all package and package list pages and then serving them as static content would result in a huge number of files and a lot of occupied memory, and it would be time-consuming. This would only really make sense if the number of pages were reduced massively (e.g. no per-version package pages, limiting the number of result pages for the popular/new/updated package lists).
Optimizing the memory allocation patterns is, I think, the most efficient approach to improve the situation in the short term. Redesigning the package update process so that it runs in a separate process that communicates with one or more web frontends would enable scaling and load balancing, but would be more involved.
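The last-resort watchdog discussed above (kill only when the response time exceeds something like 5 s) could look roughly like this. It is a sketch, not the registry's monitoring; the function names, the probe interval, and the requirement of several consecutive failures before killing are assumptions, the latter added so that one slow request such as a registry dump does not trigger a restart.

```python
import os
import signal
import time
import urllib.error
import urllib.request


def responds_within(url: str, limit_s: float) -> bool:
    """True if `url` produces any HTTP response within roughly `limit_s` seconds."""
    start = time.monotonic()
    try:
        # The timeout applies to each blocking socket operation, so the
        # elapsed-time check at the end enforces the overall limit.
        with urllib.request.urlopen(url, timeout=limit_s) as response:
            response.read(1)
    except urllib.error.HTTPError:
        pass  # an error status is still a response, so the process is alive
    except OSError:
        return False  # refused, unreachable, or timed out
    return time.monotonic() - start <= limit_s


def watchdog_step(probe, failures: int, max_failures: int = 3):
    """Run one probe and return (updated_failure_count, should_kill)."""
    if probe():
        return 0, False
    failures += 1
    return failures, failures >= max_failures


def watch(pid: int, url: str, limit_s: float = 5.0, interval_s: float = 10.0) -> None:
    """Probe `url` periodically and SIGKILL `pid` after repeated failures."""
    failures = 0
    while True:
        failures, kill = watchdog_step(lambda: responds_within(url, limit_s), failures)
        if kill:
            os.kill(pid, signal.SIGKILL)
            return
        time.sleep(interval_s)
```

As noted in the reply, every kill turns slow responses into failed requests for the connections that were active, so the thresholds would need to be generous.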