IRC Logs of #nouveau on irc.freenode.net for 2025-08-27

00:24 gfxstrand[d]: Here's a thought: We should use the push dumper or something similar to detect subc switches. If we plugged it into nvkmd, it would have full cross-command-buffer context to be able to track accurately.
00:25 mhenning[d]: Yeah, I was thinking we could get some statistics on it
00:26 mhenning[d]: don't really need stats for the simple changes so far but I could see it being useful in the future
00:26 mhenning[d]: or apparently the hardware tracks it if we ever hook perf counters up
00:29 gfxstrand[d]: Yeah, that'd be nice
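A rough sketch of the subchannel-switch counting idea. It assumes the Fermi-and-later push buffer method header layout: SEC_OP in bits 31:29, COUNT/IMMD_DATA in bits 28:16, SUBCHANNEL in bits 15:13. The class and helper names are hypothetical, and the nvkmd integration is not shown. The counter keeps the last subchannel between calls, so a switch that straddles two command buffers is still counted. That is the cross-command-buffer context gfxstrand describes.

```python
# Hypothetical sketch: count subchannel switches across a channel's push buffers.
# Assumed header layout (Fermi+): SEC_OP 31:29, COUNT/IMMD_DATA 28:16, SUBCHANNEL 15:13.
INC_METHOD, NON_INC_METHOD, IMMD_DATA_METHOD, ONE_INC = 1, 3, 4, 5
OPS_WITH_DATA = {INC_METHOD, NON_INC_METHOD, ONE_INC}  # followed by COUNT data words
METHOD_OPS = OPS_WITH_DATA | {IMMD_DATA_METHOD}        # ops that target a subchannel


class SubcSwitchCounter:
    """Remembers the last subchannel between feed() calls (one call per buffer)."""

    def __init__(self):
        self.last_subc = None
        self.switches = 0

    def feed(self, words):
        i = 0
        while i < len(words):
            hdr = words[i]
            op = hdr >> 29
            count = (hdr >> 16) & 0x1FFF
            subc = (hdr >> 13) & 0x7
            i += 1
            if op in METHOD_OPS:
                if self.last_subc is not None and subc != self.last_subc:
                    self.switches += 1
                self.last_subc = subc
            if op in OPS_WITH_DATA:
                i += count  # skip this method's data words
            # Other opcodes are treated as single header words in this sketch.
        return self.switches


def make_hdr(op, subc, count):
    """Build a header word for testing (method address left as 0)."""
    return (op << 29) | (count << 16) | (subc << 13)
```

Feeding `[make_hdr(1, 0, 2), 0xDEAD, 0xBEEF]` and then a second buffer that starts with `make_hdr(1, 1, 1)` counts one switch, even though no single buffer contains it.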
01:07 gfxstrand[d]: Compression takes DA:TV from 42 to a pretty solid 50 FPS. I'll take 20%.
01:07 gfxstrand[d]: mohamexiety[d]: ^^
01:07 gfxstrand[d]: Especially given that it's probably stalling like mad in there.
01:07 gfxstrand[d]: And descriptors are garbage
01:08 gfxstrand[d]: I'll try to review tomorrow
01:08 gfxstrand[d]: For now I'm happy with a full D3D12 title working with no glitches
01:08 HdkR: \o/
01:25 gfxstrand[d]: I think compression and zcull need to be top priority for review since they're gating kernel changes. If we can get both landed before the new window, we can bump to 1.4.1 for both of them and it'll be a nice little package.
01:25 gfxstrand[d]: Compiler stuff can happen whenever
01:26 gfxstrand[d]: Oh, and we should really get the one video patch landed for the next kernel, too.
01:26 gfxstrand[d]: But IDK how long it'll take to get the Mesa bits in shape. Fortunately, the kernel patch there is a one-liner so maybe we can just slide it in and not worry as much about having userspace solid first.
01:26 gfxstrand[d]: airlied[d]: thoughts?
01:28 airlied[d]: seems like it should be possible, would want to be on the list for review in next couple of weeks
01:51 gfxstrand[d]: Yeah. zcull and compression definitely look possible. I'm a little more worried about video not making the cutoff but that's probably not as important, TBH.
02:32 redsheep[d]: gfxstrand[d]: I don't have the game or a 5090 to compare but I am still curious what settings and resolution this is at. If that's 4k native, given we have no dlss, I am not even sure that's that bad if the settings were turned up.
02:33 gfxstrand[d]: FHD
02:33 gfxstrand[d]: My poor pikvm can't do 4K
02:34 redsheep[d]: Ok, so that's a ways behind prop, but still very promising
02:36 redsheep[d]: Especially considering I remember before I dropped off the map that was a game that started out at seconds per frame, like 6 months ago
02:37 redsheep[d]: Probably running hundreds of times faster
02:42 redsheep[d]: gfxstrand[d]: Found it. Yeah, 50 fps is a 115x increase over where it was almost exactly 7 months ago.
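As a sanity check on the "115x" figure, working backwards from 50 FPS gives the earlier frame rate. Note that the comparison also spans the 4060 to 5090 upgrade mentioned just below, so not all of the gain is driver work.

```latex
\text{before} \approx \frac{50\ \text{FPS}}{115} \approx 0.43\ \text{FPS}
\quad\Longrightarrow\quad
\frac{1}{0.43\ \text{FPS}} \approx 2.3\ \text{s per frame}
```

This is consistent with the "seconds per frame" recollection above.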
02:45 gfxstrand[d]: Yeah. It helps that I upgraded to a 5090 between this. But very annoying that it didn't help that much. We've still got a lot of stalls hidden in there somewhere.
02:46 gfxstrand[d]: The difference between a 4060 and a 5090 is only like +50%. Probably not even that.
02:46 gfxstrand[d]: So yeah... Stall City.
02:48 redsheep[d]: I saw a pretty big perf jump from 3090 to 4090 and I think that was probably just frequency letting it speed through those stalls and whatever else keeps big gpus from really going to town. 4060 and 5090 have similar frequency, only 50% makes a lot of sense to me
02:49 redsheep[d]: Honestly I am surprised you saw that much
02:57 gfxstrand[d]: It means there's a little bit of rendering time between the stalls. 😅
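A naive Amdahl's-law reading of "only +50%". Assume a fraction p of the 4060 frame time scales with GPU width and the rest (stalls, serialization) does not. At similar clocks, take the width ratio as s ≈ 7 (170 SMs on the 5090 vs. 24 on the 4060). Solving for p with the observed speedup S = 1.5 gives:

```latex
S = \frac{1}{(1 - p) + p / s}
\quad\Longrightarrow\quad
p = \frac{1 - 1/S}{1 - 1/s}
  = \frac{1 - 1/1.5}{1 - 1/7}
  = \frac{1/3}{6/7}
  = \frac{7}{18} \approx 0.39
```

Under this simplification, which ignores memory bandwidth, clock differences, and CPU-bound time, only about 39% of the frame time scales with the bigger GPU. If the real gain is below +50%, as gfxstrand suspects, p is lower still.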
02:59 redsheep[d]: I'm very curious how NAK will shine once those stalls start to get resolved. It was like a year ago that a few specific compute workloads started to hit performance parity, like with that one tflop benchmark I forget the name of
03:03 airlied[d]: dang it, fixing the storm code, still have another race
03:38 gfxstrand[d]: redsheep[d]: There's still plenty of compiler work to do. Karol is cranking away at address calculations. I need to review and land Mel's second scheduler. We need to do predication. We need to probably implement the Nvidia bounds checking extension so structured buffers suck less.
03:39 redsheep[d]: Oh I'm not trying to pretend NAK is done. I just think it will be cool to see what it looks like when the gpu isn't stuck in a 3 legged race
03:44 gfxstrand[d]: Yeah
03:44 gfxstrand[d]: Big GPU need to go brrrrr
03:51 mangodev[d]: *little* gpu also need to go brrrrr
03:54 redsheep[d]: Little going brrrrr will probably come naturally with big going brrrrr
03:55 redsheep[d]: Usually when cpu bound nvk is already reasonably fast, sometimes very fast. The little gpus will be just fine.
03:59 gfxstrand[d]: And tiny GPU need to go brrrr, too. But first tiny GPU needs to power on. 😢
04:00 airlied[d]: hmm just reading the fence a second time seems to be sufficient
04:11 airlied[d]: gfxstrand[d]: mhenning[d] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/nouveau-fix-fence-races?ref_type=heads if you have a chance for any weird channel fence stalls
04:13 redsheep[d]: I'm reading the Nova todo doc, and I am noticing a lack of mentioning display code. Does that exist anywhere yet, and is that nova core or nova drm?
04:16 airlied[d]: it would be in nova drm and it doesn't exist
04:17 mhenning[d]: airlied[d]: is that any different from just waiting a little bit
04:18 airlied[d]: I did an nsleep(500) but I'm not sure how to work out shortest wait that actually matters, reading it twice seemed simpler
04:18 airlied[d]: ndelay(500) that is
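A sketch of the "read the fence a second time" logic, as an alternative to `ndelay(500)`. `read_seqno` is a hypothetical stand-in for reading the semaphore value the GPU wrote. The wraparound-safe comparison uses the usual signed-difference trick and is an addition for completeness. This is a workaround for a visibility race, not an ordering guarantee; later in the log gfxstrand reports the double read alone still hangs sometimes.

```python
def seqno_passed(current, target):
    """True if 32-bit sequence number `current` has reached `target`,
    treating the difference as signed so wraparound is handled."""
    diff = (current - target) & 0xFFFFFFFF
    return diff < 0x80000000


def fence_signaled(read_seqno, target):
    """If the first read looks stale, read once more before reporting
    that the fence has not signaled (hypothetical sketch)."""
    if seqno_passed(read_seqno(), target):
        return True
    return seqno_passed(read_seqno(), target)
```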
04:20 redsheep[d]: I should really properly learn rust so I can actually help instead of just complaining
04:22 airlied[d]: doesn't seem to help transfer queues 🙁
04:25 redsheep[d]: How much of this todo needs to disappear before trying to write display code for nova would even be relevant? I've been going on about display issues for two years now and I kinda feel like even if there are much more qualified people to write it I am quite likely the one who cares the most about display working right, so I might as well actually help.
04:27 redsheep[d]: Even windows display handling has been infuriating lately and I am itching to do something about not having my setup work right
04:47 airlied[d]: I think all the to-do has to be done before display
04:48 airlied[d]: Just checked and yes all of that and rust KMS bindings
06:51 mohamexiety[d]: gfxstrand[d]: Oooh yesss! That’s super good to hear
06:51 mohamexiety[d]: gfxstrand[d]: Yup! I didn’t want to send the kernel patches as I wasn’t sure we would need more or not
07:21 robinpatrol: Do you actually understand what collision is? as when 16powers first time yield a non-invariant 2to1 map of a probability as 50 percent each/both, not fifty sixty as like late matti nykanen calculated :D. This means you have to reorder the sets, so this can be done only through conditional such as value remains the same or gets incremented or get's decremented. That in the end means
07:21 robinpatrol: casting arithmetic operators zero and permutation fields to IR encoding's decoder. So for mul the zero minification field is say 5+9+13+17=44 meaning it from now on permutes to 5 to 15 which itself is presented by 5 going forward. however 14 as 4 it's just that under the base transition of 5 you now commit/contribute 15 to the decoder, so under 5+5 in case of mul is now mapping 15*15=225
07:21 robinpatrol: however 14 transitions to 13, so that is why engines are also basing on calculations, cause transitioning box for cars are also number based transitions virtually backed up by physical resources albeit however. So 10 in the first permutation field offset transitions to 5, so you have two 10s but they are with different checksum offsets/indexes. for example one has 256 and another one 257
07:21 robinpatrol: etc. This is a fiction basing on true manipulation of the outcome fixings also can be called a conditional that is something based of modulus or modulo divide that hardware is meant to actually solve and does solve it true logics.
08:06 gfxstrand[d]: airlied[d]: Another, terrible, option would be to do a semaphore wait on the command streamer before signaling the interrupt. We're doing a full WFI anyway so it's not really stalling more. But that *should* ensure it lands in memory before we trigger the interrupt
08:07 gfxstrand[d]: Seems kinda horrible, though.
08:09 gfxstrand[d]: But I also don't see nvidia doing anything like that in openrm
08:12 airlied[d]: My only explanation is their irq handling latency sucks
08:12 airlied[d]: Or maybe it has to get to userspace for it to read the fence in the end
08:13 gfxstrand[d]: I mean, for a CPU wait where you know what you're waiting on, you can potentially use that as just sort of a wakeup and do a bit of busy-looping?
08:15 gfxstrand[d]: gfxstrand[d]: Not an error. It really doesn't like it. Not sure if it's Steam or Zink.
08:20 mohamexiety[d]: Probably the broken modifiers
08:26 gfxstrand[d]: airlied[d]: Still hangs sometimes. It takes a while, though.
08:29 marysaka[d]: if it's the channel timeout + kill, I get that /a lot/ on my 4060 when running only weston and steam
08:29 marysaka[d]: (even more when steam download something or fossil compilation is running :blobcatnotlikethis: )
08:30 gfxstrand[d]: I'm gonna try something
08:31 gfxstrand[d]: I have a vague notion that this did actually come up when we were talking about timeline semaphores back in the day
08:44 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1410183110854705274/0001-nouveau-Wait-for-semaphore-writes-to-land-before-rai.patch?ex=68b016dd&is=68aec55d&hm=df45b20ceeb0cdb6a60afcd6fe88ddf90989f825bb138278240c638d8859d5c6&
08:44 gfxstrand[d]: airlied[d]: This was my other thought.
08:46 airlied[d]: Yeah I don't like it but if it works I'd be okay with it, I do wonder if there is a cache somewhere getting in the way, but the wfi should also sysmembar
08:47 gfxstrand[d]: The WFI should sysmembar but the WFI happens before the write. We need a sysmembar after the write
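A small model of the ordering argument. The command names below are hypothetical labels, not real NVIDIA class methods. The point is that a barrier implied by the WFI sits before the semaphore write, so nothing forces that write to land before the interrupt unless a membar is placed between the two.

```python
# Hypothetical command labels; not real class method names.
WFI, SEM_WRITE, MEMBAR, INTERRUPT = "WFI", "SEM_WRITE", "MEMBAR", "INTERRUPT"


def fence_emit(with_trailing_membar):
    """Emit one fence: wait for idle, write the semaphore, (maybe) membar, interrupt."""
    cmds = [WFI, SEM_WRITE]
    if with_trailing_membar:
        cmds.append(MEMBAR)
    cmds.append(INTERRUPT)
    return cmds


def write_flushed_before_interrupt(cmds):
    """True if a membar sits between the last semaphore write and the interrupt."""
    last_write = max(i for i, c in enumerate(cmds) if c == SEM_WRITE)
    irq = cmds.index(INTERRUPT)
    return MEMBAR in cmds[last_write + 1:irq]
```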
08:47 mohamexiety[d]: Isn’t adding more WFIs kinda bad?
08:49 gfxstrand[d]: If they happen back-to-back maybe not too bad?
08:50 mohamexiety[d]: Hopefully :thonk:
08:53 airlied[d]: I also tried adding another sysmembar just in case
08:53 airlied[d]: And dma_mb on the cpu
08:54 gfxstrand[d]: Is that enough, though? Wouldn't the barrier have to come from the thing with pending writes?
08:54 gfxstrand[d]: Or is the fabric coherent enough that it can come from anywhere?
08:55 markmason: There should never be any barriers used for anything! And such behaviour has to be enforced by the compiler.
09:03 gfxstrand[d]: lmao
09:05 markmason: and you perv are laughing at this or what? It was posted many years ago, boyi has schedulers at opencl level, and there are three already available, 1.2 2.1 and 3.2 one
09:06 markmason: You never use memory consistency in one of them, since the memory space is divided in such way
09:07 markmason: all semaphores mutexes, memory locks, barriers nothing should be used in real codegirl
09:08 markmason: there are 100slides or more about their work, which all i posted, same with dlvc reports, papers all available.
09:09 markmason: And my execution answer set programming you seem to laugh or what at too, it's the virtual range modifier posted to the decoder that takes care of the range?
09:10 markmason: so if 10 is captured at field1 it can result in a probable outcome, where the probability is 100percent, hence it has nothing to do with markov chains
09:10 markmason: it does not predict, cause it knows that
09:13 markmason: And that jack as well as alex and many of those tyrans who assaulted me, are soon both dead, is not a prediction, i know ...and if same thing happens from another front samewise, they get executed with 100percent probability
09:17 ermine1716[d]: Lockfree opencl wen
09:17 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1410191517364129853/0001-nouveau-Membar-before-between-semaphore-writes-and-t.patch?ex=68b01eb1&is=68aecd31&hm=646fc19b5d455e9c8303180c3b5eddc1554fd55764478fcd9507b54a3604e6c2&
09:17 gfxstrand[d]: airlied[d]: Another idea
09:18 gfxstrand[d]: Rebooting now to test
09:21 markmason: when you have correct memory management you do not need locks, you read everything from their copies.
09:22 gfxstrand[d]: gfxstrand[d]: Looking good so far
09:22 markmason: so in another words, if you do not have performance problems in the way you do not need locks for anything
09:23 markmason: they come from control flow recovergence and convergence
09:23 airlied[d]: I already tried that one, and it failed but maybe not in association with the double read
09:24 gfxstrand[d]: I've pulled the double-read out
09:24 markmason: this is in other words warp stacks , and originating from intruction data graph , data dependence in other words
09:24 airlied[d]: On its own here that and mb on CPU didn't fix it
09:24 gfxstrand[d]: I've got your first patch and that
09:24 gfxstrand[d]: But I'm not 1000% sure things are stable yet
09:33 karolherbst[d]: mhenning[d]: gfxstrand[d] now that the DFS fixes are merged, we should also merge this one: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36528/diffs
09:33 gfxstrand[d]: yeah
09:33 karolherbst[d]: will do some testing on the bigger shader-db today
09:34 karolherbst[d]: just to double check
09:34 gfxstrand[d]: airlied[d]: I'm running DA:TV with `NVK_DEBUG=push_sync` right now and it's nice and stable at 32 FPS
09:36 gfxstrand[d]: (I know `push_sync` is working because otherwise it'd be at 50)
09:40 gfxstrand[d]: airlied[d]: My whole frankenkernel is here: https://gitlab.freedesktop.org/gfxstrand/linux/-/commits/nvk?ref_type=heads
09:41 nilseggert: what i actually said was that this code belongs to firmware that get's defense invested in. And should not be released to people who connect wifi antenna and baseband firmware to accelerators like this. People such as doug freed murdering idiot freaks.
09:45 airlied[d]: Can't remember if I did the membar after fixing the other bug, I might have only done sys membar which i though was a bigger hammer
09:47 gfxstrand[d]: I don't love throwing extra membar into the command stream but if we need it... :shrug_anim:
09:48 gfxstrand[d]: Ideally, we'd do all the semaphores for all the fences and then one membar and one interrupt. But that's not the way anything is architected.
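A sketch contrasting per-fence emission with the batched shape described here: all semaphore writes, then one membar and one interrupt. The labels are hypothetical, and the sketch makes no claim about how nouveau's fence code is actually structured.

```python
# Hypothetical command labels; not real class method names.
WFI, SEM_WRITE, MEMBAR, INTERRUPT = "WFI", "SEM_WRITE", "MEMBAR", "INTERRUPT"


def emit_fences_individually(num_fences):
    """Each fence pays for its own WFI, membar, and interrupt."""
    cmds = []
    for _ in range(num_fences):
        cmds += [WFI, SEM_WRITE, MEMBAR, INTERRUPT]
    return cmds


def emit_fences_batched(num_fences):
    """All semaphore writes first, then a single membar and a single interrupt."""
    return [WFI] + [SEM_WRITE] * num_fences + [MEMBAR, INTERRUPT]
```

With three fences, the batched stream carries one membar and one interrupt instead of three of each.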
09:52 mary-annhilbert: such a problem as framerate performance is not existing, what exists is the risk to expose that causing rebellion and killings to be increased, such that a random fags like you start to bother real people. And this is what laura and it's cranks were doing in our territory btw. And on continuounce their execution takes place with hundred percent probability i can guarantee this, and what
09:52 mary-annhilbert: i can guarantee is that i do not want this slut nor it's asshole fungus from nigeria cambodia south-africa and sweden crank stis.
09:53 airlied[d]: gfxstrand[d]: I think I found uvm doing a membar at one point but I need to go and dig again, maybe it was a wfi
09:55 HdkR: Weather is a bit warm today eh?
09:55 x512[m]: mary-annhilbert: nilseggert markmason Spambots?
09:56 gfxstrand[d]: airlied[d]: I'm pretty sure they do something somewhere. I remember write ordering coming up when we did the timeline semaphore spec and James being annoyed that he had to emit stuff to enforce it.
09:56 Mary: x512[m]: ignore them, they just want attention
09:56 Mary: (somehow that pinged me)
09:57 airlied[d]: There may also be stuff in userspace dumps
11:50 michalkrynow: It does not have any multipliers, what it has is 5+5 is mapped to two exclusive states 1 and 225 effectively carrying 225 over from 44, if the number has more bits in in the future just like 41 or 37 it's no longer 10 for operand1 it is now 5+41+37 and hence would carry over (1024+512)*1 as explained, now that is presented as a mapping from 89=(operand1=5+41+37+operand2=5)to83 , so at
11:50 michalkrynow: which bank is 89? out of 1024*1024*1024*4-1 cells where, it calculates the bank itself, cause it is in bank1 due to none of the 4bit fields doing any transitions we say the bank of the answer is banknr1 item 89 aka 1536 after translation. so if it had indirection from 44 we account it as bank nr1 for an example, since 225 is 33+29+25+5=92 so 44 and bank1 consists of 16 transitions per
11:50 michalkrynow: field or values virtually position dependent decodeable to array of powers of twos, since 16*16*16*16is65536. So it assembles the answers at runtime from transitions. and it is fair to say every 2fields has 1bank 256entries wide with storage requirement of previously shown so as for tenth powers colliding it goes into transition level1. So tiled chip is pretty easy to implement. now 64
11:50 michalkrynow: can have wider sets on that format ,it transitions to level1 from 4096 or even 65536 and hence will indeed have fewer gears of levels of transitions or indirection would be very much the term for internal transition levels. So pagetables for 32bit aka grearbox of it can be only one indirection deep at 65536 wide banks, and 64bit 4 levels deep as it seems offhand to me for 65536bank width.
11:50 michalkrynow: More levels you have the accuracy remains the same, however you get storage efficiency in that tradeoff on same performance. So this is known mostly as optimal warp size scales linearly on number of threads or work-items. So more indirections uptocertain limit will raise the compression in our example instead, and indirectly goes throughput of computation raises raise performance
11:50 michalkrynow: indirectly. In other words, you need to utlize the an optimal indirection occupancy.
13:16 barryfrazier: the internal transitions/indirections corresponding to the collisions itself are changing the base/stem of the result of the needed calculation. so as seen 225 was taken to be stem where to seek at, it relies on the rule that all arithmetic spectrum or IR is continuous, they are continously distributed over the spectrum. Distribution does not have to uniform either, hardware from the 70s
13:16 barryfrazier: 1969 when armostrong and aldrin went to moon, it was able to do it already. I doubt that computer engineers are stupid especially the hw people, i think they were not stupid even during the times of punchcode programming then they had similar decimal codes, and engineering is not thought to be so difficult, when you can think or use your brain, which obviously you never could do. And since
13:16 barryfrazier: you abused me to get better starts platforms in all ways, your silence or clowny shit you do here, does not save you in the future, you likely all end up in jail , like my family cause things you do is reversing nature and psychology and everything that makes sense by conspiracy that works illegally in favour of you. You will expect my lines to clash with such trash.
13:17 karolherbst[d]: gfxstrand[d]: posted new stats: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36528#note_3071765 The increase looks reasonable to me looking at the shaders being most impacted by the scaling
13:38 gfxstrand[d]: The moon landing, eh?
13:46 chikuwad[d]: h u h
13:46 chikuwad[d]: ../mesa/src/compiler/nir/nir_lower_atomics.c:93:39: error: ‘struct _nir_ssbo_atomic_swap_indices’ has no member named ‘offset_shift’
13:46 chikuwad[d]: 93 | .offset_shift = nir_intrinsic_offset_shift(intr));
13:47 chikuwad[d]: gfxstrand[d]: when I rebase this MR on top of current main
13:47 chikuwad[d]: wtf
13:47 chikuwad[d]: none of the changes touch that file .-.
13:49 chikuwad[d]: OH
13:49 chikuwad[d]: ok am dumb
13:51 chikuwad[d]: there we go, that's sorted
13:56 sergeigolovkin: I do not pick sides, i think russians are part of europe and almost nearly always correct on what they complain in comparison with europes human shit. I know that when they attack Estonia in few years, they have mild own interests and would deviate or alianate mildly off from killing all the correct trash off here. But europe deviates with everything against my interests. For an example
13:56 sergeigolovkin: hitler did not implement anything useful from mein kampf or previously promised, he just collected the wermacht gold from arabic kings, and did not kill the retards criminals nor the correct scamterrorjews, pretty much at random. To me not acceptable.
15:46 cubanismo[d]: Do folks know what/who spawns these bots?
15:46 cubanismo[d]: I don't really understand the motivation
15:47 mohamexiety[d]: I am new here but my understanding is it's a very dedicated person who needs help
15:47 cubanismo[d]: Ah, K
15:47 mohamexiety[d]: has been here for over a decade and used to jump in people's DMs even from what I heard
15:47 cubanismo[d]: So just vanilla trolling
15:47 mohamexiety[d]: yeeeah
15:47 mohamexiety[d]: just taken a bit too extreme
15:48 cubanismo[d]: Yeah, I mean, I was 12 once.
15:48 cubanismo[d]: On IRC
15:48 mohamexiety[d]: heh :KEKW:
15:48 cubanismo[d]: Just curious. Thanks
15:52 ermine1716[d]: Iirc it's first time he denies holocaust
15:55 karolherbst[d]: cubanismo[d]: brain damage actually, but yeah....
15:58 karolherbst[d]: like I highly doubt it's on purpose in the "I know what I'm doing" sense or for the laughs or whatever
15:58 karolherbst[d]: It's a thing for the past 10+ years or so
16:10 gfxstrand[d]: He even came to XDC one time long ago.
16:12 chikuwad[d]: ok I should probably take a break my head is starting to hurt
16:14 mhenning[d]: chikuwad[d]: relatable
16:14 chikuwad[d]: I'm a fair bit confused by what exactly I'm supposed to do to hook up the shared lowering, but I have a slightly better understanding of NAK and NIR than I did when I started, so.. progress?
16:15 mhenning[d]: yeah, that's getting closer!
16:15 chikuwad[d]: I tried looking at what we do for VK_KHR_shader_atomic_int64 but it did not enlighten me as much as I thought it would
16:19 chikuwad[d]: what amuses me is that int64 atomics have a separate extension dedicated to images
16:20 chikuwad[d]: while the f16vec one says `buffer, workgroup, and image storage classes are all supported`
16:24 chikuwad[d]: maybe tomorrow I'll look at output from RUST_BACKTRACE on a failing test and NAK_DEBUG=print and try to figure out what to do that way
16:34 sergeigolovkin: Somebody has to be a biggest donor illegally and the showbase for all the actors , and i am not playing a victim, i know those things are we are prepared to kill very big amount of those fecalists. The black market for the substances sold from my stem is very large, and it does not span to Estonia to england, but started from Sweden and had the same alliance in finland as well. Very big
16:34 sergeigolovkin: powers are spawned to finnish you off. I never came to any of the xdc's and there is no more ratarded human on planet than karolherbst very hippocratic stupid and and egocentric animal.
16:57 gfxstrand[d]: chikuwad[d]: Feel free to ping if you get stuck. If you wanna keep trying to figure it out for yourself, that's fine, too.
16:58 gfxstrand[d]: But sometimes fighting through something is good for learning.
16:58 chikuwad[d]: yeah I'm gonna keep at it for a few more days at least
16:58 chikuwad[d]: if only for the purpose of more familiarity with how everything is laid out
17:09 chikuwad[d]: how to get into driver dev as a beginner: a guide
17:09 chikuwad[d]: step 1:
17:09 chikuwad[d]: https://tenor.com/view/slam-wall-gif-8599782
17:12 cubanismo[d]: I think the first step is post a message on mesa-dev saying you did a C program and want to help.
17:13 karolherbst[d]: you buy a laptop with a kepler nvidia GPU and notice that games are kinda slow... and figure maybe you could do something about it 🙃
17:23 tavidsharon: *finish , this is achieved through the war, since there is no jurisdiction , i have in my basic reach very many about the whole book worth of cheques and payment systems where the money circulated etc. there is no court in those countries who would get me to justice, cause simply there are so many ill people in the world.
17:23 tavidsharon: I am 42 years of age also, and similarly i know enough about X and you were never the real programmer force behind that nor fork one etc. fork was made by keith packard and jim gettys, and it succeeded more than accelerated x or whatever it was called. Jim gettys i have seen from a picture of one laptop per child project, keith packard last time i checked was programming some sensor stuff
17:23 tavidsharon: for i dunno drones or something in java and i never met him in person either.
17:23 tavidsharon: X foundation is actually a cool thing for society in any form like for wayland and other protocols or apps and apis first stem harvesting and growth limiting chip was ordered at me when i was 11 years of age.this command was placed from England by jews, but technology Germans had also... The info had all leaked to me long time ago in 2008. And nearly any real intelligent people knew about
17:23 tavidsharon: my case also in cambodia, Russians have bombed those genefolding hospitals, and Estonian police is corrupted into this with along the syndicate.
17:23 tavidsharon: Me i am dying certainly if thte chip is capable enough, and likely it is as of today meant to end all my chapters this time for sure, we already have the info whom to kill for that, and it will be done.
17:54 mohamexiety[d]: mohamexiety[d]: gfxstrand[d] ready now and no longer a draft
18:03 chikuwad[d]: oh while we're doing reviews
18:03 chikuwad[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31518
19:24 sabrinacantale: You were 12 LOL you were are and always will be abortion leftovers nomatter what you try to revert, as 11years old i already won tournaments in national team in a style of not losing any set in six consecutive games. You do not even understand how to add numbers. Your criminal penis science and anal does not make you older than 12now!! And i never denied anything about that massacre,
19:24 sabrinacantale: what i stated jews were killed in masses, but they were not those tribes that were ordered to be killed just random ones who were also classified as jews.
20:14 mohamexiety[d]: gfxstrand[d]: hm, mhenning[d]'s comment inspired me to try running the MR without the MME upload and it passes the compute CTS and pyrowave also works. do I drop the upload? :thonk:
20:14 mohamexiety[d]: it kinda feels unintuitive that the compute and 3d MME share memory though...
20:18 mhenning[d]: I mean, there might just be a single MME, meaning only one set of states
20:19 mhenning[d]: There are a few things like that where compute/graphics share resources and so we only need to do an operation on one to apply to both
20:19 gfxstrand[d]: There's two separate things her: Whether or not they share MME_SCRATCH registers and whether or not they share upload area.
20:20 gfxstrand[d]: Should be easy enough to test with the MME unit tests
20:20 mohamexiety[d]: I just deleted the entire upload, so I'd assume they share both
20:20 mhenning[d]: Well, that shows they share upload area
20:20 mhenning[d]: but maybe not scratch
20:21 mohamexiety[d]: hm, how do I run the tests?
20:23 gfxstrand[d]: `-Dbuild-tests=true` and then it's in _build/src/nouveau/mme
20:23 mohamexiety[d]: do I need to modify anything in the tests?
20:23 mohamexiety[d]: like to test the compute mme
20:23 gfxstrand[d]: Yeah so you'd need to modify the HW tests
20:23 karolherbst[d]: ohh right
20:23 karolherbst[d]: there was this MR I wanted to test
20:23 karolherbst[d]: which one was it..
20:24 karolherbst[d]: the one making compute super duper fast
20:24 mohamexiety[d]: mohamexiety[d]: this one
20:24 karolherbst[d]: thanks!
20:25 karolherbst[d]: I know I'll get disappointed, but you never know
20:26 gfxstrand[d]: gfxstrand[d]: Specifically, we need a way to run compute tests and then make a test which writes a scratch reg on one subc and reads it on the other and either.
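A sketch of the test logic only. `FakeMME` is a hypothetical stand-in for the hardware. The real tests under src/nouveau/mme would drive an actual GPU. As mhenning notes, a shared upload area does not imply shared MME_SCRATCH, so the check writes a distinct sentinel from each subchannel and reads it back on the other.

```python
class FakeMME:
    """Hypothetical hardware model: `shared` decides whether the 3D and compute
    subchannels see one scratch bank or two separate ones."""

    def __init__(self, shared):
        self.shared = shared
        self.banks = {"3d": {}, "compute": {}}

    def _bank(self, subc):
        return self.banks["3d"] if self.shared else self.banks[subc]

    def write_scratch(self, subc, reg, value):
        self._bank(subc)[reg] = value

    def read_scratch(self, subc, reg):
        return self._bank(subc).get(reg, 0)


def scratch_is_shared(dev, reg=0):
    """Write on one subchannel, read on the other, in both directions."""
    dev.write_scratch("3d", reg, 0x3D3D3D3D)
    compute_sees_3d = dev.read_scratch("compute", reg) == 0x3D3D3D3D
    dev.write_scratch("compute", reg, 0xC0C0C0C0)
    sees_compute = dev.read_scratch("3d", reg) == 0xC0C0C0C0
    return compute_sees_3d and sees_compute
```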
20:26 karolherbst[d]: I'm almost 99% confident that it's all shared
20:27 karolherbst[d]: but yeah.. one should verify it
20:27 gfxstrand[d]: I'm 100% confident we should test it. 😛
20:27 karolherbst[d]: 😄 it's fiiine
20:28 mohamexiety[d]: this test code is all weird
20:32 karolherbst[d]: Yooooooo
20:32 karolherbst[d]: 🚢 it
20:32 karolherbst[d]: some sub-tests speed up by like 20% 😄
20:32 karolherbst[d]: https://gist.githubusercontent.com/karolherbst/bc702ef50ba1dfb429d02e5caea94e41/raw/0133257d9b441f7410df61851f3dd3cb37eea605/gistfile1.txt
20:33 mohamexiety[d]: karolherbst[d]: smh my head
20:33 karolherbst[d]: the subc got me nothing
20:33 karolherbst[d]: *one
20:34 karolherbst[d]: ohh yeah.. let me do nvidia for comparison
20:34 karolherbst[d]: the int subtest should go above 100TFlops now with that hopefully
20:34 mhenning[d]: Assuming you mean https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36823 then yeah, that one mostly helps with certain kinds of flushes that your test might not use at all
20:35 karolherbst[d]: yeah...
20:36 mohamexiety[d]: do I get a billion dollars for helping AI go faster
20:36 karolherbst[d]: nvidia is having a day with some of those tests
20:36 karolherbst[d]: `TILE_M=256 TILE_N=128, TILE_K=16 BColMajor=0 workgroupSize=128 1.392538 TFlops` 😄
20:36 karolherbst[d]: weak
20:36 karolherbst[d]: mohamexiety[d]: you get fired and replaced by AI
20:36 mohamexiety[d]: :bleaker_kekw:
20:36 karolherbst[d]: also with AI
20:37 karolherbst[d]: nvidia: https://gist.githubusercontent.com/karolherbst/9aa9431e6f1403d9dd289882b619cb7e/raw/35fcd4dccd7d34aee8319045260bdc89b28c2f6f/gistfile1.txt
20:37 karolherbst[d]: getting there
20:38 karolherbst[d]: ohh I have an idea
20:40 karolherbst[d]: with some totally legit opts: https://gist.githubusercontent.com/karolherbst/b51778a3e9e3913d2524f26dfb87a3af/raw/c121b283d667159a17333e95a2374b4cda83973c/gistfile1.txt
20:41 mohamexiety[d]: funsafe math type of legit?
20:41 karolherbst[d]: nah...
20:41 karolherbst[d]: compared to the "new" it's `op.max_unroll_iterations = 1024;`
20:42 karolherbst[d]: let me try an even higher number, because I think the bigger ones aren't unrolled there
20:42 mohamexiety[d]: I wonder why that works so well
20:42 karolherbst[d]: the shader is like one loop being iterated 1000 times 😄
20:43 karolherbst[d]: but yeah.. cross block scheduling isn't implemented
20:43 karolherbst[d]: and if you nuke a few cycles each iteration it kinda gives you those numbers
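A toy cost model of why raising `max_unroll_iterations` helps a shader that is one loop run about 1000 times. It assumes the limit caps full unrolling of loops with a known trip count. This simplifies NIR's actual heuristics, which also weigh instruction count. Once fully unrolled, the per-iteration counter, compare, and branch overhead disappears. The cycle counts are hypothetical inputs, not measurements, and the model ignores the extra scheduling freedom unrolling gives when cross-block scheduling is missing.

```python
def estimated_cycles(trip_count, body_cycles, overhead_cycles, max_unroll_iterations):
    """Toy model: a loop whose trip count fits under the unroll limit is fully
    unrolled, so only the body cost remains per iteration."""
    fully_unrolled = trip_count <= max_unroll_iterations
    per_iter = body_cycles if fully_unrolled else body_cycles + overhead_cycles
    return trip_count * per_iter
```

With a hypothetical 20-cycle body and 2 cycles of loop overhead, 1000 iterations cost 22000 cycles under a small limit and 20000 once the limit reaches 1024. Loops with larger trip counts stay rolled, which matches the suspicion that "the bigger ones aren't unrolled".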
20:43 karolherbst[d]: `TILE_M=256 TILE_N=128, TILE_K=64 BColMajor=1 workgroupSize=256 119.232197 TFlops` yoooo
20:44 karolherbst[d]: 16x8x32 int matrix
20:44 snowycoder[d]: karolherbst[d]: It shouldn't be long, the algorithm is there.
20:44 snowycoder[d]: I just need debugging and refactoring
20:44 karolherbst[d]: nice
20:45 mohamexiety[d]: karolherbst[d]: what's nvidia here?
20:45 karolherbst[d]: nah, that's nouveau
20:46 karolherbst[d]: ohh
20:46 karolherbst[d]: nvidia's perf
20:46 karolherbst[d]: around 160 or so
20:46 mohamexiety[d]: yeah I know I mean how do we compare
20:46 mohamexiety[d]: oh
20:46 karolherbst[d]: let me check
20:46 mohamexiety[d]: well the good news is you're getting closer to NV perf here than we are for games
20:46 karolherbst[d]: yeah.. they peak at around 160/170 with the int tests
20:47 karolherbst[d]: heh
20:47 karolherbst[d]: I should do the address calc opts
20:47 karolherbst[d]: that's probably the only real thing that I can still optimize
20:47 karolherbst[d]: like the shaders.... look good
20:47 karolherbst[d]: https://gist.githubusercontent.com/karolherbst/c99cb27f6560b293fd45b7e5880d098f/raw/18f0e525ecd2202718cf16be82184967a58e5ba2/gistfile1.txt
20:48 karolherbst[d]: you see the loop in the middle
20:48 karolherbst[d]: and the address calcs of the exit block should probably need some love
20:48 karolherbst[d]: but yeah...
20:48 karolherbst[d]: there isn't really _that_ much room for improvement besides the ugpr + gpr IO stuff
20:49 karolherbst[d]: like the ldsm + hmma section is pretty much perfect
20:49 karolherbst[d]: the cross block stuff will help a bit there as well
20:53 karolherbst[d]: ohh yeah.. let me try it out
21:06 karolherbst[d]: anybody else want to review the LDSM MR or should I just land it with Mary's review?
21:09 karolherbst[d]: this part of the loop annoys me a bit: https://gist.github.com/karolherbst/c99cb27f6560b293fd45b7e5880d098f#file-gistfile1-txt-L98-L144
21:11 gfxstrand[d]: karolherbst[d]: Let me skim quick
21:17 snowycoder[d]: I have a new weird bug, sddm doesn't work on nouveau, but if I start plasma wayland it works :/
21:17 snowycoder[d]: reading journalctl:
21:17 snowycoder[d]: sddm[1131]: Failed to read display number from pipe
21:17 snowycoder[d]: sddm[1131]: Display server stopping...
21:17 snowycoder[d]: sddm[1131]: Attempt 1 starting the Display server on vt 2 failed
21:17 snowycoder[d]: dmesg reports no kernel failures/warnings for nouveau
21:19 mohamexiety[d]: might kinda match some weird behavior I see: when booting up I get some corruption and then plasma wayland starts after. On COSMIC the corruption persists and the display doesn't start
21:22 mohamexiety[d]: issue is it doesn't happen on Fedora, so I figured it was just one of my setups being weird and didn't think much of it
21:24 mohamexiety[d]: gfxstrand[d]: working on this btw. just haven't really read C++ in ages and hadn't looked at that part before, so I'm figuring stuff out
21:24 gfxstrand[d]: hehe. No worries. It's a little disorienting compared to some things but not too bad once you get used to it.
22:41 phomes_[d]: gfxstrand[d]: I think your kernel fixed all the timeout problems I have had with gnome-shell
22:46 gfxstrand[d]: Sweet!
22:46 gfxstrand[d]: airlied[d]: ^^
22:46 gfxstrand[d]: Just need to decide which two patches we want to land.
22:47 gfxstrand[d]: And I guess I could review the lock fix but I suspect skeggsb9778 would be much better for that.
23:04 gfxstrand[d]: phomes_[d]: Yeah, I've seen quite a few complaints of mystery timeouts with NVK+Zink. It would be great if this fixed them.
23:06 phomes_[d]: I will keep testing but so far it looks really promising
23:27 cubanismo[d]: gfxstrand[d]: It's concerning that membar or the acquire fix help things. I can't help but wonder if they're just fiddling the timing. We don't do anything like that.
23:27 cubanismo[d]: At least, not that I'm aware of.
23:27 cubanismo[d]: But then again, it's possible our interrupt handling path is just different and *that* masks the issue everywhere for us somehow.
23:29 cubanismo[d]: Something about the membar seems vaguely familiar though. I feel like I discussed this theoretical issue with some architecture folks at some point.
23:35 mohamexiety[d]: gfxstrand[d]: Not sure if you've read this, but it might refresh your memory :Thonk: cubanismo[d]
23:39 mhenning[d]: I cherry-picked "nouveau: fix disabling the nonstall irq due to storm code." and "nouveau: Membar before between semaphore writes and the interrupt" onto 6.16 and it seems to fix the Talos + Transfer queue issue that I had
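A toy model of why a barrier between the semaphore write and the interrupt could matter: if the interrupt can reach the CPU before the semaphore value is visible, the handler reads a stale value, sees nothing to wake, and the waiter only notices on a timeout. Everything here (`ToyWriter`, its store buffer, `irq_handler`, `signal`) is hypothetical and only stands in for the GPU's write path; as cubanismo notes, it is still an open question whether this is the real failure mode or the fix just shifts timing.

```python
class ToyWriter:
    """Writer whose stores sit in a buffer until a barrier drains them."""

    def __init__(self):
        self.memory = {"semaphore": 0}  # what the CPU-side handler can see
        self.store_buffer = []          # pending (addr, value) writes

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def membar(self):
        # Barrier: every earlier store becomes visible before anything after it.
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def raise_interrupt(self, handler):
        # The handler only observes what has already reached memory.
        return handler(self.memory)


def irq_handler(memory):
    return memory["semaphore"]  # value the waiter would be woken with


def signal(value, with_membar):
    w = ToyWriter()
    w.store("semaphore", value)
    if with_membar:
        w.membar()
    return w.raise_interrupt(irq_handler)


print(signal(5, with_membar=False))  # 0: stale read, wakeup missed
print(signal(5, with_membar=True))   # 5: write visible before the interrupt
```

The acquire fix mentioned above is the reader-side half of the same contract: the handler must not read the semaphore "early" relative to observing the interrupt.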
23:43 mhenning[d]: cubanismo[d]: That is a bit odd. Is it possible that the coherency is affected by page table bits or something? Or is there a way to flush the cache from the interrupt handler?