National Usenet Breakage Day
Aug. 13th, 2007 03:43 pm
...Or something. I didn't seem to get the memo, so I'm guessing.
Prequel: the primary reader crashed Friday night. Rebooted, no big deal. However, since then the natives have been woeful, noting that posting times are increased--that is, you post the article and it takes anywhere up to a minute and change for your newsreader to come back and get on with its life. At first I thought it might be related to said crash, but it was happening on both readers, and with any news client, including just talking directly to the NNTP port. (Which I still remember how to do, because I am awesome.) And then sweh reminded me that, even though it doesn't look like what's happening, when our users post to the readers it doesn't actually post the article per se; it gets passed through shogun and then fed back to news-xfer which actually posts the article, so logically the nose goblins must be on news-xfer. But how does the article get there since it's not through the usual method?
I keep expecting someone whose job title officially says "sysadmin" to deal with the deep magic of news-related troubles, but that keeps not happening. I am in fact the one with the biggest INN clue except for El Jefe himself, and although he had a wizard's knowledge of it in the day, he hasn't actually touched it in years. So here am I with the leader badge.
The hunt goes through dark places, because our news setup is weird and mystical and somewhat ill-documented: that is, parts of it are meticulously documented, and others were Not Got Around To before the people who knew them left. Happily for me, many of those people are still around my world. So I got jdev's help on ktrace and ktruss, and 5tonsflax kindly spent 40+ minutes on the phone with me as we unravelled the silliness together. And as often happens, the solution is very simple once you know where to look: the history database had gotten frightfully bloated (even for modern history databases), because expire hadn't run in $SOMETIME, because for some reason /news/db had become owned by root. None of the files in it, mind you, which I woulda noticed. Just the directory. Which is the more odd because the mount point in the dist structure is correctly permissioned, so this breakage should have been fixed at any of the times we've disted news-xfer in the last $SOMETIME (including this morning).
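A wrong owner on the directory with correct owners on everything inside it is easy to miss by eye. Here is a minimal Python sketch of a check that looks at the directory itself as well as its contents. The `news` user name in the usage note and the function names are assumptions made for illustration, not anything taken from the actual setup.

```python
import os
import pwd


def owner_name(path):
    """Return the login name that owns path, or the numeric uid as a string
    if that uid has no passwd entry. Uses lstat so dangling symlinks do not raise."""
    uid = os.lstat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def misowned(directory, expected_user):
    """Return (path, owner) pairs for the directory itself and its immediate
    entries whose owner is not expected_user. The directory is checked first."""
    candidates = [directory] + [
        os.path.join(directory, name) for name in sorted(os.listdir(directory))
    ]
    return [
        (path, owner_name(path))
        for path in candidates
        if owner_name(path) != expected_user
    ]
```

Something like `misowned("/news/db", "news")` would then report the directory alone if only its ownership had drifted, which is the situation described above.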
So, fixed that, was about to run expire when I realized there was probably not enough space on the partition. Phoned El Jefe. He thought there was, because he thought that the re-written database would be smaller. I didn't think it would be smaller enough. What a stupid time to be right... It's now running and dumping the new copy in a really silly part of the filesystem that nevertheless has scads of space, and hopefully it will finish in time for me to make sure it is all okay before I GO HOME.
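The "is there enough room" argument can be settled with arithmetic before anything gets rewritten. Below is a rough Python sketch that takes the pessimistic view that the rewritten copy might be as large as the existing files. The function names and the `headroom` multiplier are hypothetical, and the paths are whatever history files and target directory apply locally.

```python
import os
import shutil


def bytes_needed(paths, headroom=1.0):
    """Worst-case space for a rewritten copy: the combined size of the
    existing files, scaled by a safety multiplier."""
    total = sum(os.path.getsize(p) for p in paths)
    return int(total * headroom)


def has_room(paths, target_dir, headroom=1.0):
    """True if the filesystem holding target_dir has at least that much free."""
    free = shutil.disk_usage(target_dir).free
    return free >= bytes_needed(paths, headroom)
```

If `has_room` comes back False for the database's own partition, pointing the rewrite at a directory on a roomier filesystem (as ended up happening here) is the obvious workaround.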
Also, some Usenet performance artists are splashing their poo on various walls, and making a clumsy attempt to joe-job Supernews with it. Anyone know if "usenetserver.com" takes action against this kind of crap? And had to explain some basic newsgroup functionality relating to this to a guy who says he's been moderating for 11+ years. THEN YOU SHOULD KNOW WHAT I AM TELLING YOU, BUNKY, AND YOU SHOULD NOT ACT LIKE YOU KNOW MORE THAN ME BECAUSE CLEARLY YOU DO NOT.