10:37 fdobridge: <m​arysaka> :painpeko:
10:37 fdobridge: <m​arysaka> https://cdn.discordapp.com/attachments/1034184951790305330/1059782471257956392/image.png
15:13 fdobridge: <m​arysaka> two more tests to go and I should be mostly done ✨
15:13 fdobridge: <m​arysaka> https://cdn.discordapp.com/attachments/1034184951790305330/1059851885491798086/image.png
15:43 fdobridge: <m​arysaka> well for DRAM limit I cannot really do the test like on tu104 because there is no DWRITE
15:44 fdobridge: <m​arysaka> I tried using that one unknown opcode instead to see aaand
15:44 fdobridge: <m​arysaka> ```
15:44 fdobridge: <m​arysaka> [10744.185077] nouveau 0000:07:00.0: gr: TRAP ch 3 [103f93a000 mme_fermi_sim_h[38657]]
15:44 fdobridge: <m​arysaka> [10744.185096] nouveau 0000:07:00.0: gr: MACRO 80000001 [TOO_FEW_PARAMS], pc: 0x006, op: 0x00000006
15:44 fdobridge: <m​arysaka> ```
15:44 fdobridge: <m​arysaka> I guess this is some kind of multi parameter load maybe? :AkkoDerp:
15:46 fdobridge: <m​arysaka> yeah I think it's read fifoed considering related methods are defined in Fermi 2D engine, I guess I will test that out
15:52 KungFuJesus: Awesome, karolherbst, I have a bisectable range for that issue I was exhibiting in mythtv
15:53 KungFuJesus: deinterlacing is still causing flickering but I don't start to a blank screen 3/4 of the times when launching it
16:10 fdobridge: <k​arolherbst🐧🦀> do you check envytools for some of the unknown bits?
16:12 fdobridge: <m​arysaka> There wasn't much for MME macros in envydis in general and I don't see any definition for opcode 6
16:13 fdobridge: <m​arysaka> interesting enough, all read fifoed methods seems to report as invalid on Maxwell Gen 2 so maybe it's not that... (edited)
16:14 fdobridge: <k​arolherbst🐧🦀> seems like opcode 4 is also something unknown? (edited)
16:14 fdobridge: <k​arolherbst🐧🦀> ehh wait.. the number is encoded somewhere else
16:15 fdobridge: <m​arysaka> my only missing opcode is 6 :AkkoDerp:
16:15 fdobridge: <k​arolherbst🐧🦀> given that 5 is read, 6 might be write
16:17 fdobridge: <m​arysaka> I tried that and nope it's not sadly
16:17 fdobridge: <m​arysaka> I'm going to try again just in case
16:18 fdobridge: <k​arolherbst🐧🦀> mhh
16:18 fdobridge: <k​arolherbst🐧🦀> could ping mwk and see if they know anything
16:23 fdobridge: <g​fxstrand> Finally back from holiday. What did I miss?
16:24 fdobridge: <m​arysaka> Not much I think :AkkoYay:
16:37 fdobridge: <m​arysaka> Okay so I forgot to change the encoding part earlier so of course that wasn't helping...
16:37 fdobridge: <m​arysaka> @karolherbst🐧 it seems that this opcode has similar special requirement like BRANCH, it reports as invalid if the assignment opcode is set
16:37 fdobridge: <k​arolherbst🐧🦀> mhh interesting
16:37 fdobridge: <k​arolherbst🐧🦀> maybe a relative vs absolute branch thing?
16:38 fdobridge: <k​arolherbst🐧🦀> or just something entirely different
16:39 fdobridge: <k​arolherbst🐧🦀> @marysaka did you dump all the macros nvidia uploads? @gfxstrand had some hacky script to dump them for turing, but I am sure the same method might work for previous gens
16:40 fdobridge: <k​arolherbst🐧🦀> might make it more obvious what the unknown opcodes/bits are doing
16:40 fdobridge: <m​arysaka> Well I guess I could dump some with Ryujinx but that opcode never got used
16:40 fdobridge: <m​arysaka> (And that's with OpenGL, Vulkan and even NVN APIs)
16:40 fdobridge: <g​fxstrand> My horrible hacks should work for any generation. I really should clean that stuff up and try to find a real plan. (edited)
16:42 fdobridge: <m​arysaka> wait a sec... it seems the code right after that instruction just never get called, so it's maybe a branch...?
16:43 fdobridge: <k​arolherbst🐧🦀> or it just disables the next one based on a condition?
16:43 fdobridge: <k​arolherbst🐧🦀> or you did really jump over it...
16:45 fdobridge: <g​fxstrand> It's possible branch requires a delay slot
16:45 fdobridge: <g​fxstrand> Or that there's an off-by-one in your address calculations.
16:46 fdobridge: <g​fxstrand> Getting that right for Turing was a PITA
16:47 fdobridge: <m​arysaka> There is a no delay slot bit on the branch variant, but it seems to not apply here (getting ILLEGAL_OPCODE)
16:47 fdobridge: <m​arysaka> I do wonder if it's not like the EXTENDED one that Turing have?
16:47 fdobridge: <g​fxstrand> For forward branches, my method for R/E for that was to have a series of ADD instructions which incremented a register and then jump into the middle of it and look at the count.
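A minimal C sketch of that ADD-ladder measurement, purely as a model under stated assumptions: a forward branch is followed by `n_adds` instructions that each increment one register, landing slots are counted from 0, and an optional delay slot executes the instruction right after the branch once. The function names are hypothetical and nothing here encodes real MME instructions.

```c
#include <assert.h>

/* Toy model of the "ADD ladder" trick for probing forward branches.
 * Layout (assumed): BRANCH, then n_adds instructions that each do r += 1.
 * If the branch lands on ladder slot k (0-based), slots k..n_adds-1 run,
 * so r ends at n_adds - k. With a delay slot, slot 0 (the instruction
 * right after the branch) also runs once before the jump takes effect. */
static unsigned ladder_count(unsigned n_adds, unsigned landing_slot, int delay_slot)
{
   assert(landing_slot <= n_adds);
   return (n_adds - landing_slot) + (delay_slot ? 1u : 0u);
}

/* Invert the model: recover the landing slot from the observed counter.
 * Returns -1 if the observation does not fit the model at all. */
static int infer_landing_slot(unsigned n_adds, unsigned observed, int delay_slot)
{
   unsigned extra = delay_slot ? 1u : 0u;
   if (observed < extra || observed - extra > n_adds)
      return -1;
   return (int)(n_adds - (observed - extra));
}
```

For example, with an 8-instruction ladder an observed count of 6 fits either a landing on slot 2 with no delay slot or on slot 3 with one, which is why a delay slot and an off-by-one in the address calculation are easy to confuse from a single run.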
16:48 fdobridge: <g​fxstrand> Oh, you're getting an illegal opcode? Hrm...
16:48 fdobridge: <g​fxstrand> Sorry. I thought we were talking about branches. Maybe I need to read more context. 😅
16:48 fdobridge: <m​arysaka> going to try what it seems to like but so far immediate + src[0] seems to not get me on illegal territory :notlikethis:
16:50 fdobridge: <m​arysaka> So basically so far I have BZ and BNZ (opcode 7), I'm trying to figure out opcode 6
16:50 fdobridge: <g​fxstrand> So if you don't have DWRITE, how do you even test DREAD?
16:50 fdobridge: <m​arysaka> I use methods
16:50 fdobridge: <m​arysaka> and scratch
16:50 fdobridge: <m​arysaka> I use semaphore method (edited)
16:51 fdobridge: <m​arysaka> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/175/diffs#658ecb9648761e891273ecd1594f6227ae764697_0_821
16:54 fdobridge: <m​arysaka> I should add a check for correct values here, but it does work (edited)
16:55 fdobridge: <g​fxstrand> Reading that test, it looks like what you're calling DREAD is more like STATE on Turing
17:00 fdobridge: <g​fxstrand> The Turing DREAD/DWRITE are for reading/writing a special section of scratch memory accessible to the MME. There are also ways to DMA from the RAM area to other memory on the command streamer once the MME is done. I think you can also pre-populate it but I never figured out how.
17:01 fdobridge: <g​fxstrand> The STATE opcode is for reading back bits of GPU state such as those stored in the MME scratch state
17:03 fdobridge: <m​arysaka> hmm so I was trying to confirm that it could read DRAM, and got something interesting actually
17:05 fdobridge: <g​fxstrand> oh?
17:06 fdobridge: <m​arysaka> ```cpp
17:06 fdobridge: <m​arysaka> mme_value x2 = mme_fermi_dread(&b, mme_imm(NV9097_SET_REPORT_SEMAPHORE_C / 4));
17:06 fdobridge: <m​arysaka> ```
17:06 fdobridge: <m​arysaka> This end up reading 0
17:08 fdobridge: <m​arysaka> I kind of remember that in the early days of Switch emulation, people were dumping "empty" channel state with that opcode
17:08 fdobridge: <g​fxstrand> That would be the way to do it, yes.
17:09 fdobridge: <g​fxstrand> As for why `REPORT_SEMAPHORE_C` reads 0, I'm not sure. It probably reads whatever was last written to `REPORT_SEMAPHORE_C`.
17:09 fdobridge: <g​fxstrand> I could also believe that, for some state, it always reads back 0 if it's a write-only state.
17:09 fdobridge: <g​fxstrand> IDK if there are any such states but I could believe there are.
17:10 fdobridge: <m​arysaka> hmm so I suppose this should be named STATE then?
17:11 fdobridge: <g​fxstrand> Yeah, I think so.
17:14 fdobridge: <m​arysaka> About the command stream thing, if I understand correctly you are talking about command buffer being populated by a MME macro (or even just a previous command buffer)?
17:16 fdobridge: <g​fxstrand> On Turing+, there is a special RAM attached to the MME. I think its intended use is as scratch space if you need more than you have registers for. There are also commands to DMA to/from that ram from arbitrary GPU addresses. I got DMA out to work but never DMA in. I don't know why. I don't think Maxwell and earlier have this RAM at all.
17:16 fdobridge: <g​fxstrand> That RAM is what the DREAD/DWRITE opcodes on Turing access.
17:17 fdobridge: <m​arysaka> Oh I see okay I misunderstood completely then
17:18 fdobridge: <g​fxstrand> Yeah, it's all quite confusing. There are a lot of bits and pieces strapped onto the MME on Turing. It's not clear what they're all for.
17:18 fdobridge: <m​arysaka> :painpeko:
17:19 fdobridge: <m​arysaka> I thought you were talking about the prefetch logic on command buffer initially, that also have weird rules if I remember correctly
17:25 fdobridge: <m​arysaka> @gfxstrand so to recenter, the RAM you are talking about is set with ``NVC597_SET_MME_DATA_RAM_ADDRESS``?
17:25 fdobridge: <m​arysaka> or maybe ``SET_MME_MEM_ADDRESS_A`` and others
17:31 fdobridge: <g​fxstrand> All three
17:32 fdobridge: <m​arysaka> FERMI 2D engine have those registers defined
17:32 fdobridge: <m​arysaka> tho they report as invalid if you try to access them on a Maxwell GPU from what I tested earlier so it's kind of odd :notlikethis:
17:33 fdobridge: <g​fxstrand> It's set up `MEM_ADDRESS_A/B` as some GPU memory address, `RAM_ADDRESS` as a dword offset into the MME RAM and then `MME_DMA_WRITE` to copy from the MME RAM to whatever GPU address.
17:33 fdobridge: <g​fxstrand> There's also a `MME_DMA_READ` but I've never gotten it to work.
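The setup sequence described above, sketched in C as the values one would emit before `MME_DMA_WRITE`. Assumptions, flagged: `MEM_ADDRESS_A` holds the upper 32 bits of the GPU address and `MEM_ADDRESS_B` the lower 32 bits (the usual A/B split for NVIDIA address method pairs, but check the class header), and the struct and function names are hypothetical. Only the arithmetic is shown; pushing the actual methods is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values for the three setup methods that precede MME_DMA_WRITE.
 * Assumption: A = upper 32 bits of the GPU VA, B = lower 32 bits. */
struct mme_dma_setup {
   uint32_t mem_address_a; /* GPU VA, upper 32 bits (assumed) */
   uint32_t mem_address_b; /* GPU VA, lower 32 bits (assumed) */
   uint32_t ram_address;   /* dword offset into the MME data RAM */
};

static struct mme_dma_setup
mme_dma_setup_for(uint64_t gpu_va, uint32_t ram_byte_offset)
{
   /* RAM_ADDRESS is a dword offset, so the byte offset must be 4-aligned. */
   assert(ram_byte_offset % 4 == 0);
   struct mme_dma_setup s = {
      .mem_address_a = (uint32_t)(gpu_va >> 32),
      .mem_address_b = (uint32_t)(gpu_va & 0xffffffffu),
      .ram_address = ram_byte_offset / 4,
   };
   return s;
}
```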
17:38 fdobridge: <m​arysaka> I see
17:45 fdobridge: <k​arolherbst🐧🦀> wondering if some of them need a privileged channel to actually work
17:46 fdobridge: <k​arolherbst🐧🦀> but why does `DMA_WRITE` work but not `DMA_READ`?
17:47 fdobridge: <g​fxstrand> I suspect I was just doing it wrong.
17:47 fdobridge: <g​fxstrand> Once I got the FIFOED version to work, I gave up on trying to DMA into the MME RAM.
17:50 fdobridge: <k​arolherbst🐧🦀> heh
17:54 KungFuJesus: karolherbst: the offending commit: ca1ec7272685bdadd4e339cb989ac503db0abd18
17:54 fdobridge: <m​arysaka> Just added the scratch limit test... there is only space for 128 entries :painpeko:
17:54 KungFuJesus: which I guess isn't too surprising, there's bug reports / PRs suggesting that TGSI is broken on nv30
17:55 KungFuJesus: but...the commit message implies it was a pretty substantial win. It seems like fixing TGSI on nv30 would be worthwhile?
17:56 karolherbst: KungFuJesus: uhhh :'( pain
17:56 fdobridge: <g​fxstrand> That's probably plenty. We're only using a couple right now.
17:56 karolherbst: guess there is something wrong with the nir to tgsi translation then
17:57 KungFuJesus: more annoyingly, it's not consistently broken, lol
17:57 KungFuJesus: every few launches it might work
17:58 fdobridge: <m​arysaka> oh yeah I just thought there was 256 entries initially
17:58 karolherbst: KungFuJesus: huh....
17:58 karolherbst: tried to run with valgrind or with libasan?
17:58 karolherbst: could be some annoying memory corruption thing
17:58 KungFuJesus: or it might partially work, where the text won't render correctly (mythtv's gl rendering does use shaders)
17:59 anholt: I found nv30 to be unusuably unstable, even single threaded, even before that commit.
17:59 KungFuJesus: anholt: I'm guessing you're hitting more surface area of the GL API than I am
17:59 karolherbst: yeah.. anyway
18:00 karolherbst: if using nir to tgsi regresses something, I suspect it's a bug either inside nir to tgsi or nv30 does start seeing TGSI stuff it didn't before
18:00 anholt: my assessment was "either nv30 is plain broken already, or the hw is so old and rotten that all 3 gpus I have have failed, so there's no way to test it."
18:00 karolherbst: in either case...
18:00 anholt: so my assumption on someone bisecting to that commit would be more of the "nv30 was already broken, and now we're tickling that for this workload"
18:00 KungFuJesus: anholt: I don't think there was anything wrong with your GPU, this GPU is in pretty good shape and sees issues
18:00 karolherbst: anholt: I have one particular nv40 GPU where gnome works without issues, but starting glxgears would literally kill the session...
18:01 KungFuJesus: karolherbst: heh, my MR should fix that one, at least
18:01 karolherbst: nice
18:01 KungFuJesus: the minimal reproduction for that one was to construct a display list, so I can see how it might have been missed
18:02 karolherbst: yeah, let's merge that fix then
18:02 karolherbst: it's quite fun, that nv30 actually started to become "useable" last year...
18:02 karolherbst: or two years ago
18:02 karolherbst: ?
18:03 KungFuJesus: lol, only took 16 years
18:03 karolherbst: anyway.. I'm happy to keep the driver if people keep hacking on it :D I just probably won't look into issues myself unless I really have to (which I probably never will)
18:03 KungFuJesus: I mean there's another texture byteswapping issue that's fixed by another MR that's been in there for about a year or so. That bug is independent of nouveau and goes back at _least_ to 2018
18:03 karolherbst: ahh yes...
18:04 karolherbst: big endian is just a mess atm
18:04 karolherbst: because a lot of code treats big endian wrong
18:05 KungFuJesus: this was a case where a packed texture format didn't have special big endian handling that was needed in the broader scope of mesa. The guy that fixed that found it via some comment bread trail
18:05 karolherbst: figures
18:05 KungFuJesus: Gotta keep fighting the fight for endian correctness. The Big Endian will rise again
18:05 karolherbst: I mean... a lot of the code assumed you could handle big endian by swizzling channels
18:05 karolherbst: which you can't
18:06 karolherbst: but apparently that's how we handled big endian from the start
18:06 karolherbst: all that swizzling code has to go
18:06 karolherbst: there is no other sane way out there for proper big endian support
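A small C illustration of why swizzling cannot stand in for byte order, using hypothetical helpers. For RGBA8888 packed into a `uint32_t`, a byte swap does happen to equal reversing the channels (RGBA read as ABGR), which may be where the swizzling approach came from. For a packed format like RGB565, the green channel straddles the byte boundary, so a byte swap mixes bits from different channels and no channel permutation can reproduce it.

```c
#include <assert.h>
#include <stdint.h>

/* RGBA8888 packed with R in the most significant byte. */
static uint32_t pack_rgba8888(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
   return (r << 24) | (g << 16) | (b << 8) | a;
}

static uint32_t bswap32(uint32_t v)
{
   return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_rgb565(unsigned r, unsigned g, unsigned b)
{
   assert(r < 32 && g < 64 && b < 32);
   return (uint16_t)((r << 11) | (g << 5) | b);
}

static uint16_t bswap16(uint16_t v)
{
   return (uint16_t)((v >> 8) | (v << 8));
}
```

For instance, pure red packs to 0xF800; byte-swapped it becomes 0x00F8, which decodes as red 0, green 7, blue 24.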
18:09 KungFuJesus: I do wonder how the proprietary nvidia driver for Mac OS of yore handled it
18:11 karolherbst: in the proper way
18:13 fdobridge: <m​arysaka> Well I think my MR is done, the only missing part is making tu104 and fermi builder share code... and probably fixing the codestyle a bit
18:13 KungFuJesus: Using altivec based vperm to swap 16 bytes of the texture at a time?
18:15 KungFuJesus: karolherbst: You think that the TGSI issue here is host based memory corruption that valgrind can sus out? If so I can give it a go, I'll just need to again rebuild my glibc in split debug configuration
18:30 fdobridge: <g​fxstrand> Cool! Why don't I pop a Maxwell in my box and play around with it a bit.
18:31 fdobridge: <g​fxstrand> My brain's not working for anything else today (post holiday fog) so helping you get that landed sounds like a good task.
18:33 fdobridge: <m​arysaka> sure :AkkoYay:
18:40 fdobridge: <g​fxstrand> I did just rebase the nvk/main branch a few minutes ago (starting off the new year with a fresh rebase). Mind rebasing on that quick?
18:41 fdobridge: <g​fxstrand> Oh, the fan noise...
18:43 fdobridge: <m​arysaka> on it 👍
18:47 fdobridge: <m​arysaka> should be rebased now
18:47 karolherbst: KungFuJesus: yeah.. hopefully
18:47 fdobridge: <k​arolherbst🐧🦀> @marysaka I guess I can also test it on the previous gens next week or something.. let's see how much I get done this week 🙃
18:48 fdobridge: <m​arysaka> I can try this week end if you want, I still have my GT 440 around I guess
18:48 fdobridge: <m​arysaka> for Kepler uuurgh I have an old laptop with a 860M I think that's Kepler but not sure
18:48 fdobridge: <k​arolherbst🐧🦀> fermi isn't supported though
18:49 fdobridge: <k​arolherbst🐧🦀> and probably never will be
18:49 fdobridge: <k​arolherbst🐧🦀> I mean.. if you want you can look into fermi support
18:49 fdobridge: <k​arolherbst🐧🦀> there are just two major things: no copy subchannel, so everything using xxb5 needs to be replaced by something fermi specific, also, compute launches are different
18:50 fdobridge: <k​arolherbst🐧🦀> fermi does have the xxb5 subchannel in theory, but only for privileged channels afaik
18:50 fdobridge: <m​arysaka> oh yeah I mean just to test MME tests on it
18:50 fdobridge: <k​arolherbst🐧🦀> sounds about right.. but probably better to check
18:51 fdobridge: <k​arolherbst🐧🦀> ahh...
18:51 fdobridge: <k​arolherbst🐧🦀> I meant testing all of nvk
18:51 fdobridge: <m​arysaka> ah I see
18:51 fdobridge: <k​arolherbst🐧🦀> I'll run the CTS on all the hardware, no problem
18:51 fdobridge: <k​arolherbst🐧🦀> I just want to land spirv support in rusticl this week first, and that requires me to fix the CL CTS
18:51 fdobridge: <m​arysaka> It's a bit of mess in the current state as mme_value and others are probably going to conflict between tu104 and fermi side :nya_sad:
18:52 fdobridge: <k​arolherbst🐧🦀> yeah.. just ping me once you are done or something 🙂
18:52 fdobridge: <m​arysaka> I mean I'm not sure how to separate that tbh
18:52 fdobridge: <g​fxstrand> Sanity test fails on my Maxwell. 😭
18:53 fdobridge: <m​arysaka> nani
18:53 fdobridge: <k​arolherbst🐧🦀> noooo 😦
18:53 fdobridge: <m​arysaka> is that Gen 1?
18:53 fdobridge: <k​arolherbst🐧🦀> does compute still work tho
18:53 fdobridge: <g​fxstrand> Yeah, it's a Maxwell A in this case
18:53 fdobridge: <m​arysaka> I don't have any of those urgh :nya_flop:
18:53 fdobridge: <k​arolherbst🐧🦀> @gfxstrand you mean sanity vulkan or mme testing?
18:54 fdobridge: <g​fxstrand> Sanity MME
18:54 fdobridge: <k​arolherbst🐧🦀> ahh
18:54 fdobridge: <m​arysaka> is it the device finder not working or something else?
18:54 fdobridge: <g​fxstrand> Device finder seems to work.
18:54 fdobridge: <g​fxstrand> I've got both cards plugged in and it seems to be running on the Maxwell
18:55 fdobridge: <g​fxstrand> Getting piles of ILLEGAL_CLASS
18:55 fdobridge: <k​arolherbst🐧🦀> ehh
18:55 fdobridge: <m​arysaka> huh how that possible
18:55 fdobridge: <k​arolherbst🐧🦀> guess the subchannel stuff is wrong then
18:55 fdobridge: <k​arolherbst🐧🦀> or something
18:56 fdobridge: <m​arysaka> can you check what device is being selected in ``mme_fermi_sim_test::SetUp()`` via the device id?
18:56 fdobridge: <g​fxstrand> Yeah. I think we just need to init subchans
18:56 fdobridge: <m​arysaka> ah ok
18:57 fdobridge: <g​fxstrand> Yup. That was it.
18:58 fdobridge: <g​fxstrand> Pushed a patch on top to fix it
18:59 fdobridge: <g​fxstrand> I wasn't doing `SET_OBJECT` in the Turing tests either. 😲
18:59 fdobridge: <k​arolherbst🐧🦀> yeah... for some hw it doesn't matter
18:59 fdobridge: <k​arolherbst🐧🦀> it's weird
18:59 fdobridge: <k​arolherbst🐧🦀> for some the kernel works around broken userspace
18:59 fdobridge: <g​fxstrand> Easy fix
18:59 fdobridge: <k​arolherbst🐧🦀> yeah
18:59 fdobridge: <k​arolherbst🐧🦀> for ampere it matters again I think
19:00 fdobridge: <k​arolherbst🐧🦀> @gfxstrand you don't have an ampere gpu, do you?
19:00 fdobridge: <g​fxstrand> No, not yet
19:00 fdobridge: <k​arolherbst🐧🦀> k
19:00 fdobridge: <k​arolherbst🐧🦀> now that the kernel stuff is merged, maybe I'll look into ampere and see how broken it is
19:00 fdobridge: <m​arysaka> I kind of followed your tests and builder quite a bit :AkkoDerp:
19:01 fdobridge: <m​arysaka> there are possibly bugs in the turing simulator around the carry/borrow part that I need to confirm at some point
19:01 fdobridge: <m​arysaka> (like ADDC/SUBB touch the carry on fermi MME, need to confirm that for turing but I suppose it's the same)
19:04 fdobridge: <g​fxstrand> Oh, they do? Interesting. On Turing, each instruction has 2 ALU ops, 2 mthd, and 2 emit, all of which can be no-op. ADDC and SUBB are required to be the 2nd one with the initial ADD/SUB as the first.
19:04 fdobridge: <g​fxstrand> IDK if the carry value even exists between full instructions
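The ADD + ADDC (and SUB + SUBB) pairing is the standard way to build 64-bit arithmetic out of 32-bit operations. A plain C model of the arithmetic only, not of the MME encoding, with hypothetical names: the first op produces the low word plus a carry or borrow bit, and the second op folds that bit into the high word. Whether the carry survives between full MME instructions is the open question above, so this only models the within-instruction pair.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit "registers". */
struct u64_pair { uint32_t lo, hi; };

static struct u64_pair add64(struct u64_pair a, struct u64_pair b)
{
   struct u64_pair r;
   r.lo = a.lo + b.lo;            /* ADD: wraps modulo 2^32 */
   uint32_t carry = r.lo < a.lo;  /* carry out of the low word */
   r.hi = a.hi + b.hi + carry;    /* ADDC: high word plus carry in */
   return r;
}

static struct u64_pair sub64(struct u64_pair a, struct u64_pair b)
{
   struct u64_pair r;
   r.lo = a.lo - b.lo;            /* SUB: wraps modulo 2^32 */
   uint32_t borrow = a.lo < b.lo; /* borrow out of the low word */
   r.hi = a.hi - b.hi - borrow;   /* SUBB: high word minus borrow in */
   return r;
}
```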
19:09 fdobridge: <g​fxstrand> @marysaka So, for registers, I see you're reserving R0 and R1. Is there anything else special we need to do for RA there?
19:10 fdobridge: <g​fxstrand> Also, is R0 always zero or can it be written to?
19:11 fdobridge: <g​fxstrand> Ok, I think I see roughly how this works. Hrm... 🤔
19:13 fdobridge: <g​fxstrand> Only 7 registers. That's gonna be limiting...
19:16 fdobridge: <g​fxstrand> contemplates NIR back-ends again
19:19 KungFuJesus: hah, also realized that a lot of the terrible issue you guys had been seeing with nv30 are probably not hitting me as badly since I'm building mesa without xa. Then again, I'm using fluxbox so hard to say if those features are being used all that much, anyway
19:21 fdobridge: <g​fxstrand> I was really hoping the Fermi MME would be a tiny bit more capable. 😕
19:27 fdobridge: <g​fxstrand> Our simple draw macro already uses 7 registers. Draw indexed uses 8 (but can be reduced to 7).
19:29 fdobridge: <g​fxstrand> With real liveness and RA they both will fit just fine in the Fermi MME's 7 regs.
19:29 fdobridge: <g​fxstrand> But that means liveness and real RA. 😭
19:29 fdobridge: <g​fxstrand> Or a couple well-placed `free_reg()` calls.
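One way to picture the `free_reg()` approach is a bitmask allocator over a small register file. A minimal C sketch, assuming registers R0..R7 with R0 hard-wired to zero so that R1..R7 give the 7 usable registers; the names are hypothetical and this is not the builder's actual implementation. Freeing a value as soon as it is dead lets a later value reuse its register without any liveness analysis.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8u /* R0..R7; R0 assumed to always read as zero */

/* A set bit means the register is in use. */
struct reg_alloc { uint8_t used; };

static void reg_alloc_init(struct reg_alloc *ra)
{
   ra->used = 1u << 0; /* R0 is never handed out */
}

/* Returns the lowest free register index, or -1 if all are in use. */
static int alloc_reg(struct reg_alloc *ra)
{
   for (unsigned r = 1; r < NUM_REGS; r++) {
      if (!(ra->used & (1u << r))) {
         ra->used |= (uint8_t)(1u << r);
         return (int)r;
      }
   }
   return -1;
}

/* Release a register whose value is dead so it can be reused. */
static void free_reg(struct reg_alloc *ra, int r)
{
   assert(r > 0 && (unsigned)r < NUM_REGS && (ra->used & (1u << r)));
   ra->used &= (uint8_t)~(1u << r);
}
```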
19:32 fdobridge: <g​fxstrand> I think I'll proceed under the assumption that we don't need RA for now. 😅
19:32 fdobridge: <g​fxstrand> The MME for queries is just not possible on Fermi
19:33 fdobridge: <g​fxstrand> We'll have to fire off a compute shader for that but that's ok. I was assuming we'd need to do that eventually anyway. I did MME because I'm lazy and didn't want to think about meta for queries.
19:35 anholt: KungFuJesus: my testing didn't even have a window system running
19:36 fdobridge: <k​arolherbst🐧🦀> @gfxstrand why would we need a compute shader?
19:37 fdobridge: <g​fxstrand> `vkCmdCopyQueryPoolResults()`
19:37 fdobridge: <g​fxstrand> Right now it's a giant MME
19:38 fdobridge: <g​fxstrand> Could do it with a vertex shader too if we want to avoid the channel switch stall.
19:38 fdobridge: <k​arolherbst🐧🦀> let me see the macro first tho
19:39 fdobridge: <g​fxstrand> https://gitlab.freedesktop.org/nouveau/mesa/-/blob/nvk/main/src/nouveau/vulkan/nvk_query_pool.c#L536
19:39 KungFuJesus: karolherbst: Trying valgrind first, but is there anything special to make mesa build with asan?
19:42 karolherbst: KungFuJesus: nope
19:42 karolherbst: just use the meson flag and it should just work
19:43 KungFuJesus: asan is just so much better in my experience
19:43 karolherbst: it is
19:45 anholt: KungFuJesus: -D b_sanitize=address, and disable tests because some compiler tests fail
19:45 anholt: if you end up looking at leaks, then you need libdlclose-skip.so (check .gitlab-ci/test/gitlab-ci.yml)
19:46 anholt: oh, and run your app with LD_PRELOAD=libasan.so.6
19:46 KungFuJesus: right, done that one about a million times
19:55 fdobridge: <m​arysaka> yeaaah :notlikethis:
19:56 fdobridge: <m​arysaka> there is still some saving that could be done here and there, especially around mme_mov/mme_add with immediate, but it's a mess
19:57 fdobridge: <g​fxstrand> Yeah
19:57 fdobridge: <g​fxstrand> It's ok. The good news is that it will be the same every time (it's not dynamic) so as long as we test on something pre-Turing regularly, it should be ok.
19:59 fdobridge: <m​arysaka> I wonder tho maybe it would be okay to use scratch space a bit more with fermi
19:59 fdobridge: <m​arysaka> I haven't seen the current turing macro yet tho (edited)
20:01 KungFuJesus: lol, excellent, can't reproduce with debug build flags. Sounds like I might need to turn on ubsan as well. Does -D b_sanitize=address,undefined work as well?
20:08 fdobridge: <m​arysaka> @gfxstrand do you happen to know what happens when you use the EXTENDED instruction with an empty fifo memory?
20:08 fdobridge: <g​fxstrand> no idea
20:08 fdobridge: <m​arysaka> asking for that unknown opcode I'm still looking at
20:09 fdobridge: <g​fxstrand> I've been working under the assumption that it's just a sort of barrier
20:09 fdobridge: <g​fxstrand> I really don't know much about it
20:09 fdobridge: <m​arysaka> like on Fermi, when you do a LOAD when you don't have parameter it just kill the macro it seems
20:09 fdobridge: <m​arysaka> without any exception in kernel logs :notlikethis:
20:09 fdobridge: <g​fxstrand> 😦
20:10 fdobridge: <m​arysaka> and well that one unknown opcode also kill it the same way so I'm kind of assuming it's related... but I still get illegal error on the address for that MME memory thing
20:10 fdobridge: <m​arysaka> I should really grab a Fermi and test this, maybe that opcode is just a leftover or something :AkkoDerp:
20:26 KungFuJesus: not seeing anything from asan or ubsan, I think we might be looking at a bug emitted in the shader compilation :-/
20:26 KungFuJesus: either that or some buffer's not being properly flushed
20:41 KungFuJesus: are we sure b_sanitize= is right?
20:44 KungFuJesus: ohhh, I don't think it's enough to inject into the build flags
20:47 KungFuJesus: ok, injecting "c_args" now
20:57 fdobridge: <g​fxstrand> @marysaka You can probably go ahead and make the isaspec change an MR against main.
21:00 fdobridge: <m​arysaka> ah right I forgot that one 👍
21:02 KungFuJesus: I _think_ this issue hides with -O0, but that may take a bit to track down
21:06 fdobridge: <g​fxstrand> Ugh... I'm starting to dislike the current MME builder.
21:06 fdobridge: <g​fxstrand> I think we need a tiny bit more abstraction.
21:06 fdobridge: <g​fxstrand> IDK that we need NIR but I think we need some sort of tiny intermediate something.
21:07 fdobridge: <g​fxstrand> At the very least, an enum of all ops and not use the TU104 enum directly.
21:07 fdobridge: <m​arysaka> I guess yeah
21:07 fdobridge: <g​fxstrand> And fermi can just not support some of them
21:08 fdobridge: <m​arysaka> :painpeko:
21:08 fdobridge: <g​fxstrand> I'm trying really hard to make it so that, inside NVK, we don't have to type every macro twice.
21:08 fdobridge: <g​fxstrand> If we can accomplish that, I think we win.
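A rough C sketch of the "enum of all ops" idea: a generation-neutral op list plus a per-generation check the builder could assert on, so that Fermi simply rejects ops it cannot encode. The names are hypothetical, the op list is limited to ops mentioned in this discussion, and treating DREAD/DWRITE as Turing-only rests on the remark that Maxwell and earlier probably lack the MME data RAM.

```c
#include <assert.h>
#include <stdbool.h>

enum mme_hw_gen { MME_GEN_FERMI, MME_GEN_TU104 };

/* Generation-neutral ops (subset, for illustration only). */
enum mme_op {
   MME_OP_ADD,
   MME_OP_ADDC,
   MME_OP_SUB,
   MME_OP_SUBB,
   MME_OP_STATE,  /* read back GPU/method state */
   MME_OP_DREAD,  /* read the MME data RAM */
   MME_OP_DWRITE, /* write the MME data RAM */
   MME_OP_COUNT,
};

static bool mme_op_supported(enum mme_hw_gen gen, enum mme_op op)
{
   assert(op < MME_OP_COUNT);
   switch (op) {
   case MME_OP_DREAD:
   case MME_OP_DWRITE:
      /* The MME data RAM appears to exist only on Turing and later. */
      return gen == MME_GEN_TU104;
   default:
      return true;
   }
}
```

A shared macro would then be written once against `enum mme_op`, and each back-end would assert `mme_op_supported()` before encoding.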
21:08 fdobridge: <g​fxstrand> Let me do some typing and see how horrible it all is.
21:15 KungFuJesus: karolherbst: Does this get used by nv30/40? https://pastebin.com/yEg2Gzfe
21:44 karolherbst: KungFuJesus: nope
21:45 fdobridge: <k​arolherbst🐧🦀> @gfxstrand well.. whatever you come up with, it won't be worse than what we have inside gallium atm 🙃
21:47 KungFuJesus: karolherbst: got a couple of hits, some of which look sus
21:47 KungFuJesus: first: ../src/gallium/auxiliary/pipe-loader/pipe_loader.c:101:4: runtime error: null pointer passed as argument 2, which is declared to never be null
21:47 karolherbst: well
21:47 karolherbst: thing is, if it would be that you'd notice
21:48 KungFuJesus: and: https://pastebin.com/XCVabNpU
21:48 karolherbst: ahh yes..
21:48 karolherbst: that looks like a bug
21:48 karolherbst: uhhh
21:49 karolherbst: _mesa_array_format_flip_channels
21:49 karolherbst: :pain:
21:49 karolherbst: this has to go
21:49 KungFuJesus: at _least_ an off by one bug, though it may have been the first access that bombed
21:49 karolherbst: it's sooo wrong
21:49 karolherbst: honestly... big endian inside mesa is done in the most wrong way possible
21:52 KungFuJesus: so what is this code actually trying to accomplish? Is flip_xy a lookup table of sorts?
21:52 karolherbst: apparently somebody was thinking that rgba on big endian looks like abgr
21:52 karolherbst: and then things just evolved from that
21:52 KungFuJesus: lol, I see why big endian is only affected
21:53 karolherbst: yeah.. it's all so horribly wrong
21:53 KungFuJesus: big ol ifdef in there
21:53 KungFuJesus: what's a good thing to try here?
21:53 karolherbst: dunno.. remove all big endian code and hope it's just fixed? dunno
21:54 karolherbst: I think someone has to take a real good look at this mess and do it once from a clean state for real
21:54 karolherbst: because this swizzling channels around makes no sense
21:54 karolherbst: however
21:54 karolherbst: there are times we load pixels as uint32_t packs and sometimes we only access the channels
21:55 karolherbst: and do weird load/store operations with different pointer types and what not
21:55 karolherbst: so things just happen to be swapped around sometimes
21:55 KungFuJesus: you think it _ought_ to behave properly if this ifdef gets ripped out?
21:56 karolherbst: no, but probably easier to fix it this way for real
21:57 KungFuJesus: The immediate problem being found by asan is a direct out of bounds index due to swizzle's values
21:57 karolherbst: yeah.. figures
21:57 karolherbst: what's the format?
21:58 karolherbst: also..
21:58 karolherbst: what's this flip_xy[6] thing if only the first 4 values are accessed anyway?
21:59 karolherbst: uhhh
21:59 KungFuJesus: all very good questions - should I inject some printfs?
21:59 karolherbst: I hope the magic _mesa_array_format_set_swizzle doesn't mess stuff up
21:59 karolherbst: ohhh
22:00 karolherbst: it's a double indirect
22:00 karolherbst: k...
22:00 KungFuJesus: yeah, swizzle is an indexing into flip_xy
22:00 KungFuJesus: uber efficient, lol
22:00 karolherbst: this is all so wrong
22:01 KungFuJesus: I plopped an #if 0 instead of ARCH_BIG_ENDIAN just to see if it gets further or gets me a proper image, even if byteswapped
22:02 KungFuJesus: (which actually, it already is at the moment until that one PR goes in)
22:02 karolherbst: probably renders garbage in a predictable way
22:02 karolherbst: or maybe it just works
22:02 karolherbst: dunno
22:04 KungFuJesus: not immediately crashing but, also seeing some bit shifts that may be posing some sign extension issues?
22:05 KungFuJesus: https://pastebin.com/Amki2uw8
22:05 karolherbst: heh
22:05 karolherbst: sounds like more big endian mess up... maybe not dunno
22:05 KungFuJesus: I think PPC specifically struggles with these sort of things
22:06 karolherbst: yeah well...
22:06 KungFuJesus: I've seen something similar with char versus unsigned char
22:06 karolherbst: ehh wait what
22:07 karolherbst: ohhh
22:07 karolherbst: int vs unsigned int
22:07 KungFuJesus: right
22:07 karolherbst: uhhh
22:07 karolherbst: why is C that terrible
22:07 karolherbst: fun part
22:07 karolherbst: line_stipple_pattern is uint32_t
22:07 KungFuJesus: now if sign extension is the issue because the value itself is now way bigger due to endianness...then yeah maybe I did that by disabling that byte swap path
22:07 karolherbst: I honestly don't know why it complains about it
22:08 karolherbst: it's an unsigned 0x0000ffff value
22:08 karolherbst: shifted by 16 left
22:08 karolherbst: which is 0xffff0000
22:08 karolherbst: sooo...
22:08 karolherbst: what's the problem?
22:09 KungFuJesus: compiler definitely thinks something is interpreted as a signed int somewhere
22:09 karolherbst: try shifting by 16u
22:09 karolherbst: uh... ul?
22:09 karolherbst: mhh
22:09 karolherbst: I guess u should be enough
22:09 karolherbst: dunno
22:09 karolherbst: probably some weirdo C spec thing which makes it signed
22:10 KungFuJesus: and the value being OR'd with it, also unsigned?
22:10 karolherbst: correct
22:14 KungFuJesus: I'm a bit confused about the fragprog.c line...there's a logical or there, not a left shift
22:15 KungFuJesus: oh maybe the macro has a shift
22:16 karolherbst: ahh yes...
22:16 karolherbst: should probably use 1u instead of 1...
22:18 KungFuJesus: yeah, just fixed that
22:18 KungFuJesus: lol, I wonder how many cryptic bugs that fixed
22:18 karolherbst: probably none
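A short C illustration of both shift fixes, with hypothetical names. `1 << 31` overflows a signed 32-bit `int`, which is undefined behavior and is what UBSan flags; `1u << 31` does not. Without seeing the exact expression in the paste, one common way a shift ends up signed despite `uint32_t` variables is integer promotion: where `int` is 32 bits, a `uint8_t` or `uint16_t` operand is promoted to `int` before the shift, so `(uint16_t)0xffff << 16` overflows as well. Casting to `uint32_t` before shifting avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned constant, so shifting into bit 31 is well defined. */
#define BIT_U(n) (1u << (n))

/* Cast before shifting: a uint16_t operand would otherwise be promoted
 * to int, and shifting 0xffff left by 16 overflows a 32-bit int. */
static uint32_t shift_into_high_half(uint16_t v)
{
   return (uint32_t)v << 16;
}
```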
22:27 KungFuJesus: we sure these null pointer access claims are false positives?
22:27 KungFuJesus: (pipe_surface)
22:28 KungFuJesus: I'm also seeing a bunch of misalignment issues but maybe that could be due to the lack of swizzling? https://pastebin.com/euTGQhWT
22:35 KungFuJesus: oh fun stuff, format_info header is generated via some python
22:37 KungFuJesus: good god that's a lot of formats. I'm going to need to print the index and hope it's not something monstrous I have to count into this thing
22:37 KungFuJesus: assuming any of this is our issue
22:37 KungFuJesus: seemingly the issue disappears when optimizations are off, so maybe something being explicitly zero initialized somewhere makes it work?
22:38 karolherbst: probably
22:38 KungFuJesus: grepping for "Swizzle", quite a few show a value of 6
22:42 KungFuJesus: it's really not clear to me where python is pulling in the swizzle array that's being printed in that format string
22:43 KungFuJesus: oh it's hiding in format_parser.py
22:43 KungFuJesus: thanks, Intel
22:46 KungFuJesus: It looks like 6 is supposed to represent "swizzle_none"? As in, "No data available for this channel". So, being undefined is somehow ok in that circumstance?
22:46 KungFuJesus: this still seems odd to me
22:47 KungFuJesus: This isn't _just_ nv30 code, like everything is using this, no?
22:48 KungFuJesus: but seemingly only for big endian architectures
22:49 KungFuJesus: alright so, my guess : this is one of those formats that have the "None" or undefined bit where we don't care about that channel. But because the "swizzle" shader being emitted is an invalid index with uninitialized memory, we end up with some rather undefined behavior
22:49 KungFuJesus: I'll try to define the last bit as "0" in flip_xy and see if it fixes it
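A C sketch of a bounds-safe version of that table, with local names: selectors 0 to 3 pick X/Y/Z/W, 4 and 5 are constant zero and one, and 6 means "no data for this channel", as described above. A six-entry `flip_xy` only covers selectors 0 to 5, so selector 6 reads one past the end. Sizing the table by `SWZ_NONE + 1` covers every selector. This sketch maps NONE to itself; mapping it to 0 would also stop the out-of-bounds read, but would turn a "don't care" channel into a real X read.

```c
#include <assert.h>
#include <stdint.h>

/* Swizzle selectors (names local to this sketch). */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE, SWZ_NONE };

/* Swap X and Y, pass everything else through, including NONE. */
static const uint8_t flip_xy[SWZ_NONE + 1] = {
   [SWZ_X] = SWZ_Y,       [SWZ_Y] = SWZ_X,
   [SWZ_Z] = SWZ_Z,       [SWZ_W] = SWZ_W,
   [SWZ_ZERO] = SWZ_ZERO, [SWZ_ONE] = SWZ_ONE,
   [SWZ_NONE] = SWZ_NONE,
};

static void flip_xy_swizzle(uint8_t swizzle[4])
{
   for (int i = 0; i < 4; i++) {
      assert(swizzle[i] <= SWZ_NONE); /* catch garbage before indexing */
      swizzle[i] = flip_xy[swizzle[i]];
   }
}
```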
23:22 KungFuJesus: this glsl compiler here is doing some rather unsavory things in C++ with types
23:22 KungFuJesus: that's basically what the sanitizers are complaining about with that type cast
23:23 KungFuJesus: I'm not even confident that's legal in C++, given that those are subclasses and not the other way around
23:23 fdobridge: <g​fxstrand> Stupid isaspec and its hard-coded generated filenames. 😦
23:27 KungFuJesus: base class is a struct and is not a polymorphic type
23:32 KungFuJesus: I mean in theory this cast _should_ be fine if you can guarantee that the pointer is of type ir_variable, but the sanitizers seem to disagree with that sentiment
23:36 KungFuJesus: vtable invalid entries also maybe a bit troubling
23:46 KungFuJesus: eh, this bit is also looking awfully suspicious: ../src/gallium/auxiliary/util/u_inlines.h:131:8: runtime error: member access within null pointer of type 'struct pipe_surface'
23:46 KungFuJesus: old->reference should be illegal and it sounds like ubsan/asan are definitely finding old to be a null. I'm quite surprised that didn't cause a crash
23:48 KungFuJesus: perhaps reference is at a 0 offset within the struct
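If `reference` is indeed the first member of `struct pipe_surface`, that would explain the lack of a crash. A C sketch with hypothetical stand-in types: with the refcount at offset 0, `&old->reference` computes the same address as `old`, so a NULL `old` yields NULL without any memory access, yet forming it from a null pointer is still undefined behavior, which is what UBSan reports. Testing for NULL before taking the member address keeps the same behavior and is well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a refcounted object with the count first. */
struct refcount { int count; };
struct surface { struct refcount reference; int width, height; };

/* Well-defined way to get the refcount pointer of a possibly-NULL object. */
static struct refcount *ref_of(struct surface *s)
{
   return s ? &s->reference : NULL;
}

/* Swap references: bump the new one, drop the old one.
 * Returns nonzero when the old object's count reached zero. */
static int reference(struct refcount *old, struct refcount *new_ref)
{
   if (old == new_ref)
      return 0;
   if (new_ref)
      new_ref->count++;
   return old != NULL && --old->count == 0;
}
```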