08:06asdqueerfromeu[d]: mangodev[d]: I wonder if you get macroblock glitches before then
09:30karolherbst[d]: LDG [R + UR] variants aren't wired up yet, are they?
09:30karolherbst[d]: Seeing this:
09:30karolherbst[d]: r20 p0 = lea r11 4 cx[ur2..4][0x0] // delay=1
09:30karolherbst[d]: r22 p1 = lea r62 4 cx[ur2..4][0x0] // delay=3
09:30karolherbst[d]: r21 = lea.x.hi r11 4 cx[ur2..4][0x4] rZ p0 // delay=1
09:30karolherbst[d]: r23 = lea.x.hi r62 4 cx[ur2..4][0x4] rZ p1 // delay=4
09:30karolherbst[d]: r24..28 = ld.global.a64.strong.gpu.b128 [r20..22] // delay=1 wr:0
09:30karolherbst[d]: r64..68 = ld.global.a64.strong.gpu.b128 [r22..24] // delay=2 rd:1 wr:2
09:31karolherbst[d]: this could be.... `ld.global.a64.strong.gpu.b128 [r11 + u0]`, where `u0 = cx[ur2..4][0x4] << 4`?
09:31karolherbst[d]: *ur0
09:31karolherbst[d]: let me see...
09:32karolherbst[d]: mhhh
09:32karolherbst[d]: the shift might overflow, so it's still a 64 bit shift sadly...
09:32karolherbst[d]: but it could remove some reg usages..
09:33karolherbst[d]: but there are two forms where the register can either be 32 or 64 bits
09:35karolherbst[d]: yeah.. the source to lea is `(u2u64 (ushr %194 0x3))`
09:35karolherbst[d]: so overflow by one bit...
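The one-bit overflow can be checked with plain integer arithmetic. A minimal sketch, assuming the index is a 32-bit value shifted right by 3 (the `ushr %194 0x3` source) and then shifted left by 4 by the `lea`; the function names are hypothetical and this is not NAK code:

```python
MASK32 = 0xFFFF_FFFF

def scaled_offset_32(index: int) -> int:
    """(index >> 3) << 4 with the left shift done in 32 bits; the top bit can be lost."""
    return (((index & MASK32) >> 3) << 4) & MASK32

def scaled_offset_64(index: int) -> int:
    """Same computation, but widened before the left shift (u2u64, then shift)."""
    return ((index & MASK32) >> 3) << 4

worst = 0xFFFF_FFFF
print(hex(scaled_offset_32(worst)))          # 0xfffffff0
print(hex(scaled_offset_64(worst)))          # 0x1fffffff0
print(scaled_offset_64(worst).bit_length())  # 33
```

After the right shift by 3 the value has at most 29 significant bits, so the left shift by 4 can produce a 33-bit result, which is why the shift cannot be narrowed to 32 bits as-is.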
09:35karolherbst[d]: I wonder if we could optimize those as well..
10:37karolherbst[d]: gfxstrand[d]: did you want to work on this? Or something similar?
10:38karolherbst[d]: ohh maybe I write a proper patch for the latency calculation for the last instructions of a block...
10:38karolherbst[d]: that should help
10:51karolherbst[d]: uhh.. there is another bug with the static cycle counting...
11:06karolherbst[d]: yeah uhh.. not all blocks are counted
11:07karolherbst[d]: uhm.. actually it's fine
11:09karolherbst[d]: okay.. soo.. mhh
11:09karolherbst[d]: do we want to use the number from `calc_instr_deps` or `opt_instr_sched_postpass`? Because atm it uses the latter and that one is higher
11:11karolherbst[d]: and the difference is rather impressive here 6500 vs 3500
11:11karolherbst[d]: and I don't see improvements in the chosen waits in the former
11:13karolherbst[d]: do we want to report both?
11:13karolherbst[d]: and then what's the difference
11:16karolherbst[d]: like I have an opt here that changes the result of `calc_instr_deps` from 3792 to 3562 and performance changes from 46.5 to 47.3
11:16karolherbst[d]: but the `opt_instr_sched_postpass` value stays the same
12:02gfxstrand[d]: karolherbst[d]: I'd like to get back to it but IDK when. I keep getting distracted by other (useful) things.
12:02karolherbst[d]: right...
12:02karolherbst[d]: any opinions on the cycle count?
12:03karolherbst[d]: anyway.. I want to clean up the nop mess as well, but with the current stats there is no change, but perf increases, so... dunno
12:03karolherbst[d]: we only report from `opt_instr_sched_postpass`, but what `calc_instr_deps` comes up with gets ignored in the stats
12:04karolherbst[d]: and if I ditch nops, that only gets reflected in what `calc_instr_deps` calculates
12:38karolherbst[d]: the barrier thing wasn't too bad: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36877
13:36mohamexiety[d]: did anyone try gb20x in games btw?
13:36mohamexiety[d]: I am trying to test vkd3d-proton on mine and I am getting mmu faults on both cyberpunk 2077 and avatar. I tried horizon zero dawn remastered but that's having a different issue
13:42mohamexiety[d]: dont know if anyone else has blackwell airlied[d] gfxstrand[d]
14:08kar1m0[d]: mohamexiety[d]: Last time I tried playing cyberpunk on nouveau and the game was unplayable due to graphical bugs
14:08kar1m0[d]: Like the ui just disappeared on the screen
14:09kar1m0[d]: The mouse pointer and the interface buttons
14:13mohamexiety[d]: I do have the UI working fine etc here but the game mmu faults like 10 secs into the benchmark
14:14mohamexiety[d]: There are some graphical glitches tho in the benchmark
14:20phomes_[d]: I bought cyberpunk on sale just the other day. I got both graphics glitches, menu ui disappearing, and occasional mmu fault
14:20phomes_[d]: on a 4070
14:23phomes_[d]: On proton bleeding edge. I should test stable versions
14:30phomes_[d]: also broken on proton 9 and 10
14:43gfxstrand[d]: mohamexiety[d]: I haven't tried. I can plug it in and try to spin something up quick. I've got a Maxwell in there right now because I was trying to help snowycoder[d] out but I think we've got that sorted well enough now.
14:44mohamexiety[d]: gfxstrand[d]: Yeah would be cool to see if maybe there’s something we are doing wrong
14:45snowycoder[d]: gfxstrand[d]: Well, want to laugh?
14:45snowycoder[d]: I was trying to check if other tests did the same and.
14:45snowycoder[d]: We might have the same problem with glsl arrays.
14:45snowycoder[d]: e.g. `dEQP-VK.query_pool.statistics_query.host_query_reset.input_assembly_primitives.primary.32bits_patch_list_v28_p3`
14:45snowycoder[d]: ```glsl
14:45snowycoder[d]: vec4 positions[4] = vec4[](
14:45snowycoder[d]: vec4(-1.0f, -1.0f, 0.0f, 1.0f),
14:45snowycoder[d]: vec4( 1.0f, -1.0f, 0.0f, 1.0f),
14:46snowycoder[d]: vec4(-1.0f, 1.0f, 0.0f, 1.0f),
14:46snowycoder[d]: vec4( 1.0f, 1.0f, 0.0f, 1.0f)
14:46snowycoder[d]: );
14:46snowycoder[d]: layout(location = 0) out vec4 out_color;
14:46snowycoder[d]: void main() {
14:46snowycoder[d]: gl_Position = positions[gl_VertexIndex]; // gl_VertexIndex -> 0..28
14:46snowycoder[d]: gl_PointSize = 1.0f;
14:46snowycoder[d]: out_color = vec4(0.0f, 0.0f, 1.0f, 1.0f); // blue
14:46snowycoder[d]: }
14:46snowycoder[d]: ```
14:46snowycoder[d]: Don't worry though, I'll sort this out :3
14:46mohamexiety[d]: For Avatar it mmu faulted right in the menu at the start. For cyberpunk I could get into the benchmark but it mmu faulted a bit after
14:47gfxstrand[d]: snowycoder[d]: I'm not 100% sure what I'm looking at there. What `gl_VertexIndex` value is going OOB?
14:49snowycoder[d]: I'm still checking the bug out, I think that OOB accesses into GLSL arrays cause OOR_ADDR.
14:49snowycoder[d]: The bug goes away if that gl_VertexIndex is bounded with a `% 4`
14:51gfxstrand[d]: OOB accesses of GLSL arrays are definitely invalid in SPIR-V.
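The `% 4` workaround amounts to wrapping the index into the array bounds. A minimal sketch in Python, assuming `POSITIONS` mirrors the shader's four-element array; the function names are hypothetical, and Python's `IndexError` only stands in for the fault the hardware reports:

```python
POSITIONS = [
    (-1.0, -1.0, 0.0, 1.0),
    ( 1.0, -1.0, 0.0, 1.0),
    (-1.0,  1.0, 0.0, 1.0),
    ( 1.0,  1.0, 0.0, 1.0),
]

def position_unbounded(vertex_index: int):
    """Index directly, like positions[gl_VertexIndex]; fails for indices past 3."""
    return POSITIONS[vertex_index]

def position_wrapped(vertex_index: int):
    """Index with the `% 4` bound applied, so every vertex index stays in range."""
    return POSITIONS[vertex_index % len(POSITIONS)]
```

With 28 vertices, `position_wrapped` cycles through the four corners, while `position_unbounded(4)` is already out of bounds.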
14:51karolherbst[d]: Soo.. in regards to instructions whose dest is used across blocks... nobody has any great ideas for those I bet?
14:51gfxstrand[d]: The problem with `gl_in` is that it has a size in SPIR-V which might not actually match the amount of hardware storage and I'm not sure there are any VUs asserting anything useful there.
14:52gfxstrand[d]: karolherbst[d]: What do you mean?
14:52karolherbst[d]: gfxstrand[d]: we assume the worst case for instructions whose values are used in other blocks
14:52karolherbst[d]: so latencies are higher than needed
14:53gfxstrand[d]: Yeah, we don't do cross-block latency tracking ATM
14:53karolherbst[d]: yeah.. and I kinda see what's so painful about it
14:53gfxstrand[d]: I think we could if we modeled it off of the stuff snowycoder[d] did for texture deps
14:53karolherbst[d]: but it's a lot of perf we are missing out here.. at least in some shaders
14:54gfxstrand[d]: Or we could just have fewer blocks.
14:54karolherbst[d]: well it's control flow here
14:55karolherbst[d]: gfxstrand[d]: mhhhhhh.. not sure maybe
15:00karolherbst[d]: the bigger issue here is that it does require the block where the use is to be scheduled, because there still might be instructions in between and you kinda need to know everything already, but the user might be in the very same block that gets currently processed
15:10karolherbst[d]: mhhh.. we also don't do reuse atm, right?
15:12karolherbst[d]: maybe I look into that, because that's gonna help everything by a lot more, lol
15:13snowycoder[d]: The problem with cross-block delays is that we need a way to merge RegTracker, right?
15:20karolherbst[d]: It's more of a cyclic dependency problem
15:21snowycoder[d]: We can assume `worst_latency`, it would be refined in another cycle
15:21karolherbst[d]: yeah.... I was considering it
15:21karolherbst[d]: but sadly
15:21karolherbst[d]: the shader I'm look at is a big loop
15:22karolherbst[d]: *looking
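The "assume `worst_latency`, then refine" idea can be sketched as a fixed-point iteration over the CFG, which also copes with loops. This is a simplified model tracking one pending result, not how NAK does it: `blocks`, `preds`, and `refine_entry_latency` are hypothetical names, and each block is described only by its cycle count and, if it writes the value, the latency still outstanding at its end.

```python
def refine_entry_latency(blocks, preds, worst_latency, max_rounds=100):
    """blocks: name -> (cycles, produced), where produced is the latency still
    pending at the end of the block if the block writes the value, else None.
    preds: name -> list of predecessor block names.
    Returns the pending latency at the entry of each block."""
    # Start pessimistic: every block assumes the worst case on entry.
    entry = {name: worst_latency for name in blocks}
    for _ in range(max_rounds):
        exit_lat = {}
        for name, (cycles, produced) in blocks.items():
            if produced is not None:
                exit_lat[name] = produced
            else:
                # The block only lets the pending latency drain.
                exit_lat[name] = max(0, entry[name] - cycles)
        new_entry = {
            name: max((exit_lat[p] for p in preds.get(name, [])), default=0)
            for name in blocks
        }
        if new_entry == entry:
            break
        entry = new_entry
    return entry

blocks = {"head": (2, 14), "loop": (5, None), "tail": (1, None)}
preds = {"loop": ["head", "loop"], "tail": ["loop"]}
print(refine_entry_latency(blocks, preds, worst_latency=20))
# {'head': 0, 'loop': 14, 'tail': 9}
```

The back edge of `loop` starts at the worst case and shrinks each round until the value coming from `head` dominates.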
15:22karolherbst[d]: https://gist.githubusercontent.com/karolherbst/52ea968e11698265a422d4b8382fcb5e/raw/850db98e869394ff6ba8bc7a082d834d0aadc27c/gistfile1.txt
15:22karolherbst[d]: `block.u 5` is a good example
15:23karolherbst[d]: `r92..94 = hmma.m16n8k16.f16 r92..96 r30..32 r76..78 // delay=14 wr:5` specifically
15:24karolherbst[d]: though maybe that one is actually simple..
15:24karolherbst[d]: mhhh
15:24karolherbst[d]: maybe if all dsts are used in already scheduled blocks we could support it for those cases for now...
15:24karolherbst[d]: and deal with the rest later...
15:24karolherbst[d]: anyway
15:24karolherbst[d]: there is also reuse
15:25karolherbst[d]: and maybe I look at this instead because I have the info needed to implement it properly
15:27karolherbst[d]: no idea how much it will matter tho
15:31karolherbst[d]: ooof
15:31karolherbst[d]: that's a can of worms, I like it
15:35karolherbst[d]: oof...
15:35karolherbst[d]: okay.... that's all very funky
15:47karolherbst[d]: actually.. I don't think they are a thing with uniform registers.. interesting
15:48karolherbst[d]: I see more tables
15:52karolherbst[d]: I tried to set reuse on everything, but if you do that it gets constantly invalidated..
15:53karolherbst[d]: maybe I start with simple cases and check if it matters perf wise...
15:54karolherbst[d]: but I kinda wished we'd print `.reuse` on the operands, and not in the sched section...
16:01gfxstrand[d]: Looks like we'll need cbindgen 0.28 if I land my `#[unsafe(no_mangle)]` changes
16:01karolherbst[d]: can't parse it otherwise?
16:02gfxstrand[d]: Nope.
16:02gfxstrand[d]: So do I drop those and stick them in a branch for later or do we update the cbindgen requirement?
16:02karolherbst[d]: your call, I don't use it and I doubt anything else does
16:03karolherbst[d]: but I doubt this change is important enough to increase the dep
16:03gfxstrand[d]: Yeah
16:03karolherbst[d]: like.. it's a `git sed` to change it
16:03gfxstrand[d]: Yeah. Not complicated.
16:04gfxstrand[d]: if/when we care to bump to 2024, we can do it then
16:05karolherbst[d]: Okay.. I think I have all the information I need for reuse... the model codegen uses is... well... not great
16:30gfxstrand[d]: mohamexiety[d]: I just tried and I have no DRI3 support because I don't have a new enough system Mesa.
16:30karolherbst[d]: I wonder if I want to push that into `calc_delays`... .reuse is kinda trivial to do if you ignore branches. well actually a branch invalidates reuse anyway, soo...
16:30mohamexiety[d]: gfxstrand[d]: yeah I can only try it on arch due to this. fedora is still on 6.15.9 and 25.1.7
16:34gfxstrand[d]: I found a copr that gets me 25.2
16:37karolherbst[d]: mhhhhh.. sooo.. I need to know in which slot a source goes in the encoding...
16:37karolherbst[d]: *a source
16:39karolherbst[d]: e.g. for lea it's kinda messed up
16:39snowycoder[d]: karolherbst[d]: By slot you mean the position in the final instruction bytes?
16:40karolherbst[d]: snowycoder[d]: basically yes, but I think there are a few weird exceptions
16:40karolherbst[d]: but maybe not on Volta+
16:40karolherbst[d]: I know that `IADD` used slot a and c on earlier gens or something weirdo
16:41karolherbst[d]: ohh that was FADD
16:41karolherbst[d]: yeah. so `FADD` uses the a and c slot
16:41karolherbst[d]: but is encoded as a and b
16:42karolherbst[d]: but non regs are encoded in c
16:42karolherbst[d]: so dunno how it works out for FADD actually
16:44karolherbst[d]: but we also don't have a great interface there.. there is the sources iterator, but that doesn't guarantee that the slot matches
16:44karolherbst[d]: and won't work for `FADD` anyway
16:45snowycoder[d]: Why is the slot important?
16:45karolherbst[d]: because reuse only works within the same slot
16:46karolherbst[d]: FFMA R3, R3, R0.reuse, R4
16:46karolherbst[d]: FFMA R4, R2, R0, R3
16:46karolherbst[d]: can't do it if the second FFMA would have R2 and R0 swapped
16:46karolherbst[d]: which also means that I might need a pass to swap things around as well
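The slot constraint can be sketched as a per-slot comparison between neighbouring instructions. This is a deliberately simplified model: the `Instr` type and `reuse_flags` are hypothetical, only the directly following instruction is considered, and skipping a register that the current instruction overwrites is a conservative assumption rather than a documented hardware rule.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dst: str
    srcs: tuple  # register names indexed by encoding slot (a, b, c)

def reuse_flags(instrs):
    """For each instruction, return the slot indices that could carry .reuse."""
    flags = []
    for cur, nxt in zip(instrs, instrs[1:]):
        slots = {
            slot
            for slot, reg in enumerate(cur.srcs)
            # Same register in the same slot of the next instruction,
            # and not clobbered by the current instruction's destination.
            if slot < len(nxt.srcs) and nxt.srcs[slot] == reg and reg != cur.dst
        }
        flags.append(slots)
    if instrs:
        flags.append(set())  # the last instruction has no successor
    return flags

prog = [
    Instr("FFMA", "R3", ("R3", "R0", "R4")),
    Instr("FFMA", "R4", ("R2", "R0", "R3")),
]
print(reuse_flags(prog))  # [{1}, set()]
```

Swapping `R2` and `R0` in the second FFMA moves `R0` to slot 0, so the first instruction gets no flag; that is the case where a pass reordering commutative sources would help.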
16:50karolherbst[d]: mhhh
16:50karolherbst[d]: there are apparently 4 input slots for registers, but I don't know which instructions (on ampere) use 4...
16:50karolherbst[d]: I know there are some on hopper+
16:53mhenning[d]: karolherbst[d]: This was done at Faith's request. The opt_instr_sched_postpass one estimates the length of variable latency instructions like memory loads. The calc_instr_deps one does not.
16:53karolherbst[d]: mhhhhhhhh
16:54karolherbst[d]: so we kinda have two poor choices here?
16:55mhenning[d]: karolherbst[d]: I have ideas for this, but I haven't really been working on scheduling until my existing scheduling mr gets reviewed
16:55karolherbst[d]: yeah.. that's fair
16:55karolherbst[d]: I'm just sad that I don't have stats for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36877
16:55mhenning[d]: karolherbst[d]: I mean, they're all estimates. None of them will be perfect
16:56karolherbst[d]: yeah, but my MR makes things better and the stats stay the same... and if we improve the sched calculator otherwise it would also not show
16:58karolherbst[d]: but whatever.. I mean it's not often that we'd improve things there, but would also be cool to at least see some impact, even if the values aren't like.. correct
18:02snowycoder[d]: For OOR_ADDR and MISALIGNED_ADDR I say to disable them by default, we should still keep an interface around for debugging when something fails (NAK_DEBUG=warp_warnings?). OOR_ADDR seems to just catch out-of-bounds accesses and there are a lot of those in the tests, estimating by the frequency I'd say at least 100 tests.
18:04snowycoder[d]: To be fair, I'm tired of debugging tests that pass, I want to work on games and performance improvements
18:50airlied[d]: mohamexiety[d]: I think I only tested Talos on it
18:50mohamexiety[d]: I see, fair
19:01mohamexiety[d]: gfxstrand[d]: I am thinking of trying this out (+ any coprs for official 6.16). are there any potential risks/consequences to doing this?
19:08gfxstrand[d]: Damnit! This copr stopped building (thanks to rust module deps) right before we landed the blackwell patches.
19:09karolherbst[d]: there is another way
19:13karolherbst[d]: ehh..
19:14karolherbst[d]: dnf hates me today
19:15karolherbst[d]: nevermind.. fedora is on 25.1 even on rawhide
19:25mohamexiety[d]: I hate build systems I hate build systems I hate build systems I hate build systems I hate build systems I hate build systems
19:37ermine1716[d]: Who doesn't
19:43gfxstrand[d]: mohamexiety[d]: So say we all
19:44airlied[d]: is the che copr not up to date?
19:44gfxstrand[d]: which copr?
19:45karolherbst[d]: ohh the che copr..
19:45karolherbst[d]: https://copr.fedorainfracloud.org/coprs/che/mesa/
19:45karolherbst[d]: `25.3.0-0.18.git1fad151`
19:46gfxstrand[d]: Okay, yeah, that one looks new enough. I was looking at a different one
19:49gfxstrand[d]: Yeah, now I have an accelerated desktop
19:51gfxstrand[d]: I'm curious to know how brrr my 5090 goes with a game
19:58karolherbst[d]: mhhh
19:58karolherbst[d]: I was doing some benchmarking and apparently zink+NVK has a lower kernel launch overhead than iris or zink+anv...
19:59karolherbst[d]: but not sure if intel is a great comparison here, because it's like spending a lot of time calculating a timestamp for the timestamp query 🥲
20:00karolherbst[d]: nooo zink what are you doing :blobcatnotlikethis:
20:01gfxstrand[d]: mohamexiety[d]: ```
20:01gfxstrand[d]: gsp: mmu fault queued
20:01gfxstrand[d]: [ 593.523292] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:8 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:0000003ee5676000 fault_type:00000002
20:01gfxstrand[d]: [ 593.523296] nouveau 0000:01:00.0: fifo:d00000:0008:0008:[Dragon Age The [6522]] errored - disabling channel
20:01gfxstrand[d]: [ 593.523299] nouveau 0000:01:00.0: Dragon Age The [6509]: channel 8 killed!
20:01gfxstrand[d]: ```
20:02gfxstrand[d]: I'm gonna plug in my 4060 just to make sure it's really blackwell and not some other regression
20:02airlied[d]: gonna magic eight ball and say bindless textures
20:03gfxstrand[d]: ?
20:03airlied[d]: I might have also only tested on my bound texture branch
20:03airlied[d]: the only major feature difference I can think of with nvk blackwell and pre-blackwell
20:05gfxstrand[d]: Oh, and not the extra 2 layers of page tables they added?
20:05gfxstrand[d]: Or some other misc instruction encoding wrong
20:06airlied[d]: nah I'd expect CTS to choke on those
20:07mohamexiety[d]: gfxstrand[d]: yeah I was excited for this too and it just died. so disappointing
20:07mohamexiety[d]: gfxstrand[d]: yup
20:07mohamexiety[d]: gfxstrand[d]: I did and on my end at least, Ada doesnt have this
20:07mohamexiety[d]: but I only tested cyberpunk. I couldnt test avatar as my Ada system doesnt have space woops
20:07mohamexiety[d]: but yeah, that's nice to see it reproes at least
20:08mohamexiety[d]: hopefully it's nothing kernel side and isnt awful to fix
20:13gfxstrand[d]: Could also have to do with bindless UBOs
20:17gfxstrand[d]: gfxstrand[d]: That's actually looking really plausible...
20:18airlied[d]: ah yes bindless ubos could also be it
20:18gfxstrand[d]: That fault address is missing the top 2 bits
20:20gfxstrand[d]: No, never mind. It's okay
20:20gfxstrand[d]: Bindless UBOs still seem way too plausible.
20:24gfxstrand[d]: Okay, full apps work on 4060 still
20:32gfxstrand[d]: I wonder if I can capture a trace on 4060 and replay on 5090...
20:32gfxstrand[d]: Without image compression, it'll probably work
20:45mhenning[d]: uhh one thing I've been wondering about with blackwell is that on the kernel side we did this: https://www.mail-archive.com/nouveau@lists.freedesktop.org/msg46453.html
20:45mhenning[d]: but we never made a similar change on the userspace side
20:46mhenning[d]: but I'd expect cts to fall over immediately if that was the issue? so I have no idea how anything works on blackwell right now
20:46gfxstrand[d]: We did for a couple things but uh...
20:47mhenning[d]: There's definitely still uses of the old method lying around, eg. nvk_CmdWaitEvents2
20:51mohamexiety[d]: also DEs would probably fail over too
20:51gfxstrand[d]: DEs aren't going to use events
20:51gfxstrand[d]: but games might
20:51mohamexiety[d]: hm
20:54gfxstrand[d]: But the CTS should
20:54gfxstrand[d]: If those were erroring, I'd think we'd know
20:59calico: copr?
20:59calico: gfxstrand: btw I tried to DM you
21:00chikuwad[d]: DMs don't work across the irc <-> discord bridge
21:00calico: you guys are using Discord???
21:01chikuwad[d]: that's what the [d] in the nick represents, yeah
21:01calico: oh shit
21:01calico: I thought she was just ignoring me
21:01calico: kek
21:01chikuwad[d]: https://discord.gg/U5ghAFSh
21:02magic_rb[d]: And some people like me are using discord through matrix :D matrix <-> discord <-> irc
21:02magic_rb[d]: Pretty sure "some people" is just "me"
21:02chikuwad[d]: no I'm pretty sure it's just you here :p
21:02magic_rb[d]: :p
21:03magic_rb[d]: Im special
21:03gfxstrand[d]: I'm also on IRC but sometimes I forget and don't check it for days at a time
21:04calico: chikuwad: thanks to you I have to quit another server :P 200 limit reached
21:04chikuwad[d]: glad to be of service :salute:
21:04magic_rb[d]: Holy 200 servers, how tf do you keep track of them, wow
21:05magic_rb[d]: Are you a server emoji collector?
21:05calico: I don't browse most of them
21:05calico: ok quit Lutris
21:06fellfromthesky[d]: hey me how are you @calico
21:07calico: hey me how are you @fellfromthesky
21:07karolherbst[d]: just reply
21:08chikuwad[d]: calico: ah, nerds
21:08chikuwad[d]: :3
21:09gfxstrand[d]: chikuwad[d]: Pretty sure that's a retirement (or at least highly recommended) to hang out in this channel. 🤪
21:09chikuwad[d]: oh yeah absolutely
21:09chikuwad[d]: I meant nerds (endearing)
21:09chikuwad[d]: not nerds (derogatory)
21:10steel01[d]: You have to be a retired nerd to be here? Whelp, I'm out.
21:10chikuwad[d]: :blobwoozy:
21:11mohamexiety[d]: o/