00:19Lyude: how about qctl?
00:25Lyude: imirkin: https://android.googlesource.com/kernel/tegra/+/b445e5296764d18861a6450f6851f25b9ca59dee/drivers/video/tegra/host/gk20a/hw_gr_gk20a.h#21681
00:28imirkin: looks like a counter selector thing?
00:28Lyude: if no one knows that's cool since it's part of figuring this out anyway, but figured I'd give asking a shot
00:28imirkin: not 100% sure...
00:28Lyude: yeah, just trying to get a rnndb description so i can get a better idea of what's going on in these traces
00:29skeggsb: meh, just figuring out which regs to poke with what value on each different chipset is probably enough :P
00:30Lyude: hm, true
00:30skeggsb: you *could* try your luck with the nvidia question email... perhaps pointing out that they've released the info for a few tegra chipsets... buuut, it's probably futile
00:31Lyude: eh, saving that route as a last resort :P
00:31skeggsb: i'd do it early, then continue figuring it out yourself
00:32skeggsb: most likely, you'll get no answer.. second most likely, you'll eventually get an answer, after you've figured it out already, but the info will clarify stuff for you... most unlikely: an immediate response that answers everything :P
00:33skeggsb: *sometimes* the latter does happen, and it's always a very nice surprise
00:34Lyude: skeggsb: what is the email I should use to ask btw?
00:35imirkin: skeggsb: what are you hacking on these days? reclocking? or finalizing your mesa rewrite?
00:41dboyan_: imirkin: If I want to write a scheduling pass, do I just add a pass in e.g. optimizeSSA or I need to introduce a new method and invoke it in nv50_ir_generate_code?
00:43imirkin: dboyan_: there's already an empty one
00:43imirkin: not sure if it's a good API though
00:45dboyan_: ah, I didn't notice that
00:45imirkin: it's called right before allocate registers
00:45imirkin: and a couple other places....weird
00:46imirkin: ah. those places just want the serials.
00:46imirkin: they don't want the order to actually change
00:46imirkin: so you could split that out
00:46imirkin: anyways... don't have to go with that API. just pointing it out.
00:46annadane: hi. on debian testing, i'm having this sporadic issue where my screen starts flickering. i've been "solving" the problem by restarting sddm. i wonder if it's perhaps due to missing firmware or otherwise i'm not sure how to fix it. https://paste.debian.net/970547/
00:47skeggsb: Lyude: iirc it's firstname.lastname@example.org, but might want to verify that yourself
00:47imirkin: annadane: the firmware thing is just for vdpau-accelerated decoding
00:47imirkin: skeggsb: it's nothing that simple. it's like gpu-open-documentation or something
00:48Lyude: cool, thank you!
00:48imirkin: don't expect an answer.
00:48Lyude: don't worry, I'm not :P
00:48skeggsb: imirkin: mesa (until something *else* interrupts me).. i got blocked with weird bugs from some changes, but making progress again since yesterday
00:49imirkin: skeggsb: cool. feel free to ask if you run into specific issues.
00:49skeggsb: then i'll finish the gf100 ddr3/gddr5 clock patches (almost done, just 1-2 more days of mind-numbing bit-bashing for ddr3, and cleanups).. then vulkan-related kernel changes
00:49annadane: i tried changing my refresh rate to 60 hz from auto but it just resets itself back to auto
00:49skeggsb: that's the current plan
00:50imirkin: annadane: pastebin 'xrandr --props' output
00:50skeggsb: imirkin: yeah, i might have some questions when i'm done with the "draft", but, i'm not 100% decided yet.. what i initially post will hopefully work fine, but there's some changes that can be done after it all to further improve things
00:50annadane: imirkin, https://paste.debian.net/970549/
00:50imirkin: skeggsb: sure
00:51annadane: i missed a line, sorry
00:52imirkin: annadane: there's only one refresh rate available at 1600x900 according to that...
00:52imirkin: although if you configure a different mode, switching to it should work...
00:53imirkin: skeggsb: i presume you're now rewriting basically the whole thing? are you keeping nv30 working?
00:54annadane: not sure which mode i want to configure to solve the issue
00:54skeggsb: it's not really a rewrite actually, current diff is +4533 -3286
00:54skeggsb: which, is not much considering
00:55airlied: skeggsb: did you figure out the weird crashes?
00:55skeggsb: airlied: yeah, sorted that out yesterday
00:55skeggsb: as i suspected, stupidity with fencing :P
00:55imirkin: fencing is _really_ hard to get right :( it took me 3 attempts, and it was still imperfect.
00:56skeggsb: it doesn't help that the driver is horribly inconsistent
00:56skeggsb: most of the patches are fixing that tbh
00:56imirkin: which is part of what makes it hard to get right ;)
00:56imirkin: the main thing was that nouveau_bo_wait can trigger a kick at random
00:57imirkin: which makes it _really_ hard to reason about code
00:57skeggsb: there's loads of places where such things can happen, not just bo waits.. the fence stuff does it under numerous circumstances too
00:57skeggsb: there won't be when i'm done :P
00:57imirkin: ship it!
00:58skeggsb: oh, and to answer your question, yes i'm dealing with nv30 too
00:58skeggsb: i was hoping to ignore it (bad me), but, some of the common code that they all use needs changes that make it necessary to care
00:59imirkin: yeah, there are some sad nouveau_buffer interactions
00:59imirkin: like the whole migration logic
00:59imirkin: user buffers
00:59imirkin: so confusing.
01:00skeggsb: it'd be nice to try and make all that stuff saner/more understandable too, but, that's beyond the scope of this project :P
01:00imirkin: big enough as it is? :)
01:00skeggsb: complicated enough as it is :P
09:32karolherbst: skeggsb: do you have that nouveau panic on boot with 4.9+4.11 on the plan? because this will hit more and more users over the next days
09:33karolherbst: because if not, then I would simply ask airlied to revert the one patch of you on 4.9, 4.11 and 4.12
09:37karolherbst: or greg or whoever does that
12:37pmoreau: skeggsb: If you need testers for your works, feel free to ping me. (Though I can only test on a GK107 until end of August.)
12:49pmoreau: imirkin: How about having as boards: "Kernel", "DDX", "Compiler", "OpenGL", "Vulkan", "OpenCL"? And "OpenGL", "OpenCL" and "Vulkan" could each contain different lists for "Bugs", "Missing Extensions", "Missing Features", "CTS".
12:55karolherbst_: pmoreau: please no more boards :/
12:55karolherbst_: one is plenty enough
12:58karolherbst_: we can also add fake cards containing only "==== KEPLER =====" to organize a bit inside the lists or just keep them sorted or maybe don't. But I would prefer one board with many cards over many boards with 5 cards each
12:58imirkin: pmoreau: imho that's way too separated
12:58karolherbst_: we could separate kernel and userspace and maybe have new boards if one topic gets too big
13:14pmoreau: imirkin: How would you split it then? I am not sure why CTS would be a board on its own, rather than a list alongside the other OpenGL stuff.
13:16pmoreau: karolherbst_: I don’t think they would be 5 cards each, except for the DDX one, and Vulkan and OpenCL for now. Kernel vs userspace could be a way to split things.
13:17karolherbst: pmoreau: you know what I mean. It doesn't make sense to create new boards if we won't be able to have more than one list with a few cards
13:17karolherbst: sure we can have a more detailed separation and all that
13:17karolherbst: but more important question
13:17karolherbst: is the current situation that bad?
13:20pmoreau: I know what you mean. Though OpenGL is more than just a single list with a few cards, and the kernel as well. And OpenCL could be quite a few cards as well, as there are many things to implement.
13:21karolherbst: yeah sure
13:21pmoreau: The current situation is not too bad. But if we start adding ~5 more columns (OpenCL missing features + CTS, OpenGL CTS, Vulkan missing features + CTS), it is starting to be a bit too much imho.
13:21karolherbst: we could have one board focusing on the CTS stuff
13:21karolherbst: this would make sense actually
13:22karolherbst: so that we can always see what is actually missing for conformance
13:22pmoreau: Why not group the CTS with the API it is related to?
13:23karolherbst: because there is no hard reason to do so. We can move cards across boards at any time. And having a CTS board has a clear focus on conformance
13:24karolherbst: I also don't know how much code sharing we would get between OpenGL/OpenCL/Vulkan in the end
13:24pmoreau: I can see the two sides: having a clear view on an API support (bugs, missing features, conformance), or having a clear view on conformance support for all APIs. Though I tend to prefer the former one over the latter one.
13:25pmoreau: The compiler is shared, I would say. Not sure for the remaining of the code.
13:25karolherbst: yeah, so we end up with compiler bug cards
13:25karolherbst: where to put it?
13:26karolherbst: is it a bug card on the OpenGL board, because the bug in the compiler was found while doing OpenGL stuff although it also affects OpenCL and vulkan?
13:26pmoreau: Well, that is why I was suggesting a "compiler" board, though it might not be that full.
13:26karolherbst: or we have a mesa board
13:26pmoreau: We already have a compiler list
13:27karolherbst: then we could have: kernel/X/mesa
13:27pmoreau: Or kernel, X/Wayland/Mir/Whatever, Mesa
13:27karolherbst: and if we do vulkan outside mesa, we need a shared compiler library anyway... maybe it makes sense to make it based on the actual projects?
13:28pmoreau: We can always create more boards later if needed, and move lists around.
13:35pmoreau: So, "Kernel", "Window Server", "Mesa"? With window server encompassing also the DDX. Not sure where video acceleration would go… I guess some kernel support is needed, but the remaining happens in userspace?
13:38karolherbst: I like this
13:39karolherbst: you could make a own video acceleration thing, but it is mainly inside mesa
13:39karolherbst: so I would keep it there for now
13:40karolherbst: we also have random stuff like Documentation and re'ing tasks and stuff
13:40karolherbst: no idea how to fit those things
13:40pmoreau: Well, each section could have its own doc/re'ing list, if needed
13:42pmoreau: If we are talking about documenting the code. Otherwise, if it is writing some extensive documentation in envytools or on the wiki, I don’t know.
13:42karolherbst: we could have an envytools board as well, but I think there isn't really much going there anyway besides randomly documenting registers and such
13:43karolherbst: both actually I think
13:43karolherbst: we could have documentation cards listing what parts should be documented in the projects, but also having external documentation for explaining how the hardware works
13:43pmoreau: Yeah… And for envytools, we can also create issues on GitHub and use the board features there. Not too different from having it on Trello though.
13:43karolherbst: ohh true
13:44karolherbst: I think we can link it
13:44karolherbst: trello has some github integration things
13:44karolherbst: under "Power-Ups"
13:44karolherbst: "Attach branches, commits, issues, and pull requests to cards. To do this, enable the Power-Up and click the “GitHub” button on a card back"
13:46karolherbst: there is also a github autosync thing
13:46karolherbst: let me try that out
13:47karolherbst: uhhh business license stuff crap
13:47pmoreau: I created an envytools board, and enabled the GitHub powerup. Have fun! :-)
13:48karolherbst: yeah... we can only have one power up per board... how annoying
13:49pmoreau: Yeah… they switched from having most of the powerups reserved to business license but being able to have as many free ones as you liked, to being able to use any of the powerups, but only a single one if not on a business license.
13:52karolherbst: I've disabled the github power-up and enabled that github autosync thing. I think this one is a bit more useful
13:52karolherbst: check the envytools board
13:53karolherbst: but now my user is commenting that stuff :D
13:53karolherbst: but we can change that later
13:54karolherbst: pmoreau: this issue was created by trello: https://github.com/envytools/envytools/issues/94
13:55karolherbst: but no idea what benefit this gives us
13:56pmoreau: Hum… having a single place to check all the tasks available?
13:57pmoreau: karolherbst: Feel free to create the other boards and move things around. I’ll be back later.
14:02karolherbst: I am at work :D
14:02imirkin_: pmoreau: i wanted separate lists for each generation
14:37ccaione: guys, any chance to have NV138/GP108 supported? I have the hardware to provide traces if necessary :)
14:51karolherbst: ccaione: do you have them uploaded somewhere?
14:51ccaione: karolherbst: as usual, not my machine. What do you need exactly?
14:51karolherbst: traces and vbios
14:52ccaione: traces of what in particular?
14:52karolherbst: nvidia kernel module, mmiotrace
14:55ccaione: alright, I'll see what I can do
15:06imirkin_: ccaione: don't have firmware for it
15:06imirkin_: ccaione: so can't support it with acceleration
15:07imirkin_: ccaione: i'm guessing that getting modesetting going on it would be pretty easy though
15:07imirkin_: just copy the GP107 section and hope for the best.
15:36ccaione: imirkin_: we actually tried the GP107 config and it didn't work
15:36imirkin_: ccaione: define 'not work'
15:37imirkin_: you wouldn't get acceleration
15:38ccaione: imirkin_: AFAICT (remote machine) `nouveau 0000:01:00.0: unknown chipset (138000a1)` `nouveau: probe of 0000:01:00.0 failed with error -12`
15:38ccaione: also using the GP107 config
15:38imirkin_: so ... you didn't sufficiently copy the config ;)
15:38imirkin_: or copied it too much
15:38imirkin_: depending on what you did specifically
15:39ccaione: I guess the guy did something like `case 0x138: device->chip = &nv137_chipset; break;`
15:39imirkin_: then it shouldn't have hit that default case and said "unknown chipset"
15:39ccaione: let me see what he did exactly :D
15:40imirkin_: so perhaps didn't update the module in the initrd? or something else weird that distros like to do?
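[Editorial sketch of the point made above: if a case for 0x138 had really been added, the lookup could not have fallen through to the "unknown chipset" default. The table, the function names, and the assumption that the chipset id sits in bits 20..28 of the boot0 value are illustrative, not nouveau's actual code.]

```python
# Hypothetical table: chipset id -> name of the chipset description to use.
KNOWN_CHIPSETS = {
    0x137: "nv137_chipset",  # GP107
}


def chipset_id(boot0: int) -> int:
    """Extract the chipset id (assumed here to live in bits 20..28 of boot0)."""
    return (boot0 & 0x1FF00000) >> 20


def probe(boot0: int, table: dict) -> str:
    """Return the chipset entry, or the 'unknown chipset' message on a miss."""
    entry = table.get(chipset_id(boot0))
    if entry is None:
        # This is the default case imirkin_ refers to.
        return f"unknown chipset ({boot0:08x})"
    return entry


# Unpatched table: 0x138 is missing, so the probe reports an unknown chipset.
stock = probe(0x138000A1, KNOWN_CHIPSETS)

# Patched table, reusing the GP107 entry for 0x138 as ccaione describes.
patched = probe(0x138000A1, {**KNOWN_CHIPSETS, 0x138: "nv137_chipset"})
```

[With the patched table the message cannot appear, which is why a stale module in the initrd is a plausible explanation for still seeing it.]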
16:19pmoreau: imirkin_: So, have lists for each gen, and mix in those, bugs, missing features, conformance fails, for example?
16:23imirkin_: for CTS it's all related though, i wanted to have it in one place
17:44martsteiner: imirkin: some ancient times i cited/quoted/referred one paper, that did register reuse trick, with some folds like 4 improvements on the perf with their trick, i do not remember the content
17:45martsteiner: but it appears there is an efficient way to make behavioral changes when spilling is starting to hit the perf too, my calcs though show maniacal boost in such cases though
17:49martsteiner: that has been my final theory , i do not those days anymore think extensively about drivers enhancements, and will soon entirely quit the scenes
17:50martsteiner: especially since i've known for longer period most of the possible things to do, have not got especially much to learn anymore
17:55martsteiner: last time i remember imirkin was seeming to start to get it, but mwk i have no hopes about, such clueless person would me milked forever by me, i just do not have anymore interest to do that, entirely clueless man is he
17:55pmoreau: imirkin_: Is it possible to store a single byte, or does it have to be at least 4 bytes (which are aligned to 0x4 boundaries)?
17:56imirkin_: pmoreau: i believe it's possible to store a single byte
17:56imirkin_: pmoreau: https://github.com/envytools/envytools/blob/master/envydis/gf100.c#L516
17:57imirkin_: the load/store ops (for gmem, maybe smem too) can have types, and some of those types are < 32-bit
17:57pmoreau: Hum… then I need to find out how `st u8 # g[$r0d+0x10] %r111`, as outputted in the final state of the program, results in `st u8 wb g[$r63d+0x3f] 0x0` when feeding the binary to envydis
17:58imirkin_: check with nvdisasm
17:58imirkin_: perhaps the decoding is wrong
17:58imirkin_: i think you need the "wb" in there to feed to envyas
17:59imirkin_: oh, and there's a 100% chance that the emitter in codegen gets it wrong :)
17:59pmoreau: I’ll try nvdisasm
17:59imirkin_: coz it's never been tested
18:00pmoreau: But there is something wrong going on, as I get "write fault at 0000000000 engine 00 [GR] client 03 [GPC0/L1_1] reason 02 [PTE] on channel 2 [007f9a3000 g_arg_rw_struct]". Though it could very well be me doing crazy stuff in the first place :-D
18:00imirkin_: no, i'm sure it's just emitted incorrectly
18:01pmoreau: Yeah, seems like it: input: https://hastebin.com/cuxuvevomu.pl -> output: https://hastebin.com/elotihiyuc.go
18:02karolherbst: anybody looked into the shader trap handler at some point? I think I want to work on that next
18:03karolherbst: anything inside rnndb already regarding this or just some wip code in mesa?
18:03imirkin_: do you have a fd.o account?
18:04karolherbst: I think I have
18:04karolherbst: not quite sure
18:04karolherbst: ohh wait, no
18:04imirkin_: can you look in ~imirkin/public_html ?
18:04karolherbst: just for the wiki
18:04imirkin_: i don't remember the name of the file
18:04imirkin_: and i don't have a link
18:05imirkin_: there ya go
18:05imirkin_: note that this is for fermi. i think kepler's different.
18:05imirkin_: also i remember someone from that japanese group asking about it
18:05imirkin_: and developing *something*
18:06karolherbst: what japanese group?
18:06imirkin_: shinpei's group? something like that
18:06imirkin_: i don't know if anything came of it
18:07imirkin_: someone was in here, 1-2 years ago, asking questions about it
18:07imirkin_: and iirc did implement some part of something
18:08imirkin_: oh, who is still here apparently
18:08karolherbst: the main thing I want to have is like some kind of error message saying: in shader X there was a trap at instruction Y
18:09imirkin_: yeah, i know that'd be nice, but that's not how it works ;)
18:09karolherbst: sad :/
18:09imirkin_: on nv50, that's indeed how it works
18:09imirkin_: on nvc0+, you have to have a trap handler which returns information somehow
18:09karolherbst: yeah well
18:09karolherbst: that would be fine as well
18:09imirkin_: read over mwk's doc, that has the info
18:09karolherbst: that trap handler is normal nvc0 code, right?
18:09imirkin_: it's just a shader
18:11karolherbst: ohh the trap handler gets added to the normal code
18:11karolherbst: or not
18:14karolherbst: mhh it kind of sounds easy enough though, basically I just need to add the code to all shaders when starting an OpenGL application in debug mode, setup the trap handler, store the information I want to have somewhere, break, read out the state and continue with the shader or abort, basically
18:15karolherbst: sounds like enough work though
18:15imirkin_: yeah, the firmware calls are important though
18:15imirkin_: because the thing is context-switched
18:15karolherbst: why does it have to be firmware though?
18:15imirkin_: i.e. for setting the trap handler
18:15karolherbst: can't it be set via pushbufs like all the other stuff?
18:15imirkin_: well ... there's no other way to do it ;)
18:15imirkin_: pushbufs call methods. there is no method for setting the trap handler
18:16imirkin_: so you have to write a firmware method to do it.
18:16karolherbst: ahhhh I see
18:16imirkin_: and it makes sense to do it in a generic way, i.e. write value X to register Y
18:16imirkin_: and obviously we can't do that on GM200+ =/
18:16karolherbst: well and it doesn't make sense at the same time ;)
18:16imirkin_: although the signed firmware may have something useful in it already, dunno
18:16karolherbst: because basically everybody could write anything
18:17imirkin_: nvidia firmware has it ;)
18:17imirkin_: anyways, a dedicated method would be fine too
18:17imirkin_: iirc that's the bit that yusukesuzuki already did though
18:17imirkin_: look in that tree i pointed you at
18:17karolherbst: I don't really understand how that channel stuff works, but I guess it ain't that complicated
18:18imirkin_: oh ok. so he did it via SW methods. that works too.
18:18imirkin_: (or maybe that's even the right way and i'm confusing myself wrt the firmware thing?)
18:18karolherbst: firmware is written inside that document
18:19karolherbst: "The simplest way to ensure it is to write the MMIO registers via fµc firmware calls."
18:20karolherbst: 0x610 and 0x658 are the values nvidia also uses?
18:27karolherbst: no out of range register access in a trap handler, how handy
19:39karolherbst: I hate it when I run traces I made under nouveau and the window stays black with nvidia
19:39imirkin_: buffer storage?
19:39karolherbst: it works on intel though
19:40imirkin_: so... "no" :)
19:40karolherbst: no idea what the cause is
19:40karolherbst: there are no errors
19:40imirkin_: is a black window the correct output?
19:40karolherbst: but maybe the rendering issue isn't here with intel
19:40karolherbst: ahh, it isn't
19:44karolherbst: intel: https://i.imgur.com/DpUHh1s.png
19:44karolherbst: nouveau: https://i.imgur.com/k2frhgi.png
19:44karolherbst: it's the same frame, but still so different...
19:45karolherbst: I care about the black areas near the center though
19:45karolherbst: because those black things flicker in an annoying way
19:47karolherbst: imirkin_: any pointers until I check which draw call causes those black thingies?
19:48imirkin_: some depth/stencil thing
19:54imirkin_: karolherbst: if you have intel + nouveau, you can use tracediff
19:54imirkin_: to find the specific draw call
19:54karolherbst: is it part of apitrace?
19:55imirkin_: it's a script in apitrace
19:55karolherbst: but both images are already too different
19:55imirkin_: oh, the overall rendering is too different
19:55karolherbst: see the cloth thing
19:55karolherbst: the angle is different
19:56karolherbst: no idea where this comes from, but maybe some randomness in the shader code?
19:56imirkin_: stupid sin/cos accuracy =/
19:56karolherbst: no clue
19:56karolherbst: but it's kind of related to the shadows of that tree
19:56karolherbst: or something like this
19:57karolherbst: maybe some alpha issue?
19:57karolherbst: doesn't look like it
19:57karolherbst: I just try to find the right call
20:02karolherbst: that entire tree is like drawn in 50 calls
20:08karolherbst: https://i.imgur.com/hy5g8Ob.png, next call: https://i.imgur.com/Yzk4y9U.png
20:09karolherbst: should check intel as well on the last one
20:11karolherbst: it just hit me, with nouveau vs intel, I can have two qapitrace instances open at the same time :O
20:11imirkin_: thats what tracediff does
20:12imirkin_: and then memcmp's in memory
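[Editorial sketch of the idea described above: replay the same trace on two drivers, snapshot the framebuffer after each call, and report the first call whose snapshots differ byte for byte. This is not apitrace's implementation; the function name and the toy snapshot data are assumptions.]

```python
def first_divergence(ref_frames, src_frames):
    """Return the index of the first snapshot pair that differs, else None."""
    for index, (ref, src) in enumerate(zip(ref_frames, src_frames)):
        # Comparing bytes objects is the Python analogue of a memcmp.
        if ref != src:
            return index
    return None


# Toy snapshots standing in for per-draw-call framebuffer dumps.
ref_driver = [b"\x00\x00", b"\x10\x20", b"\x10\x21", b"\xff\xff"]
src_driver = [b"\x00\x00", b"\x10\x20", b"\x10\x00", b"\xff\x00"]

bad_call = first_divergence(ref_driver, src_driver)
```

[An exact comparison flags every difference, including harmless precision ones, which is the concern karolherbst raises next.]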
20:12karolherbst: last one with intel: https://i.imgur.com/akooLj9.png
20:13imirkin_: my guess is that tracediff will be instructional
20:14karolherbst: sure, but my fear is it points to other differences I don't care about as well
20:16karolherbst: ohh, I can select calls to compare, nice
20:16karolherbst: ohhh wait
20:16karolherbst: imirkin_: tracediff compares two traces, right?
20:16karolherbst: thing is, I have only one I run under intel and nouveau
20:17karolherbst: ohh you meant retracediff, don't you?
20:19karolherbst: or not... odd
20:20imirkin_: retracediff sounds right
20:20karolherbst: it only diffs the final frame
20:21karolherbst: ohhh wait
20:21karolherbst: no, I am wrong I think
20:21karolherbst: ahhh, now it is better
20:22karolherbst: it's just using intel twice
20:22karolherbst: ./retracediff.py --ref-driver=nouveau --src-driver=i965 /var/tmp/TombRaider.flickering.trace -S 3973800-3973900
20:22karolherbst: ohh, I should use the -env= thing
20:25Lyude: mupuf: i don't suppose we would potentially want a kernel parameter to disable/enable blcg would we?
20:57karolherbst: imirkin_: found the issue: compiler optimizations
20:57imirkin_: we optimize too much? :)
20:57karolherbst: everything is fine with NV50_PROG_OPTIMIZE=1
20:57imirkin_: the 2's are what... memoryopt and something else?
20:57imirkin_: can you try to figure out which opt is doing it?
20:57imirkin_: and then generate a NV50_PROG_DEBUG=1 log for the 2
20:57imirkin_: and try to look at the diffs?
20:58karolherbst: I can also just fix the opt
20:58imirkin_: and depending on the opt, there might be ways to half-do the opt
20:58karolherbst: should be easy
20:58imirkin_: well ... heh. first you have to figure out the error ;)
20:58imirkin_: sometimes the error isn't in that opt
20:58imirkin_: but a downstream opt
20:58karolherbst: I am aware
20:58imirkin_: that breaks as a result of "that" opt doing something
20:58karolherbst: tracking down the difference in the shader should be easy, because I know which shader is doing it :)
20:59imirkin_: oh good
20:59imirkin_: grab the tgsi for it
20:59karolherbst: and knowing the shader helps a lot
21:01karolherbst: glretrace should disable vsync by default...
21:05karolherbst: I am sure it is algebraic opt
21:06karolherbst: because everything else would be too easy, wouldn't it
21:07karolherbst: yep, algebraicopt it is
21:17mupuf: Lyude: we do want that!
21:17mupuf: we have to have a way to disable anything related to power management
21:17Lyude: ah, guess i guessed right :)
21:17mupuf: something like a bitfield would be good
21:17mupuf: so we don't have to introduce too many options
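[Editorial sketch of the bitfield suggestion above: one integer parameter where each bit switches off one power management feature, so new features need a new bit rather than a new option. The flag names and the parameter are hypothetical and do not correspond to an existing nouveau option.]

```python
from enum import IntFlag


class PmDisable(IntFlag):
    """Hypothetical 'disable' bits, one per power management feature."""
    CLOCKGATING = 1 << 0
    RECLOCKING = 1 << 1
    FAN_CONTROL = 1 << 2


def feature_enabled(param: int, feature: PmDisable) -> bool:
    """A feature stays enabled unless its bit is set in the parameter."""
    return (param & feature) == 0


# Example: a parameter value of 0b101 disables clockgating and fan control.
param = 0b101
clockgating_on = feature_enabled(param, PmDisable.CLOCKGATING)
reclocking_on = feature_enabled(param, PmDisable.RECLOCKING)
```

[A value of 0 leaves everything enabled, so the default behaviour does not change when the parameter is absent.]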
21:18Lyude: started separating the register init values for normal init + cg init so we can turn it on or off by just switching the gr_init mmio structs we use
21:18Lyude: so many hex values omg
21:19karolherbst: imirkin_: the shader compiles to this: https://gist.github.com/karolherbst/164835ebeec68508e99cab5c05a46d2b :/
21:19karolherbst: I mean when running with piglit or shader-db
21:19karolherbst: although it's a fragment+vertex shader, and neither is that small
21:20imirkin_: that's .. odd.
21:20imirkin_: yeah, you have to grab tgsi out of the trace
21:20imirkin_: you can't use shader-db
21:20karolherbst: why not?
21:20imirkin_: coz you're messing it up :p
21:20imirkin_: you have to put the whole pipeline in
21:20imirkin_: not just that one shader
21:21karolherbst: uhm, what do you mean?
21:21imirkin_: you pasted a tess ctrl shader
21:21karolherbst: I have the fragment + vertex shader
21:21imirkin_: there have to be at least frag, vertex, and tess eval shader
21:21imirkin_: unless that's the no-op tess ctrl system shader
21:21karolherbst: yes, it is
21:21imirkin_: in which case you've just pasted the wrong shader
21:22karolherbst: no I didn't
21:22imirkin_: well, your shader_test file is bogus
21:22imirkin_: it's missing its most critical section
21:22imirkin_: link success
21:22karolherbst: not needed
21:23imirkin_: ok, then everything's working perfectly fine.
21:23imirkin_: and i'm the one that's wrong.
21:23karolherbst: it's for shader-db, there you don't need any test section in the generated shader_test files
21:23imirkin_: for the run program, sure
21:23imirkin_: but you won't get the proper output from that
21:23imirkin_: use shader_runner
21:23karolherbst: I tried both
21:24imirkin_: ST_DEBUG=precompile maybe?
21:24imirkin_: [since you don't have a draw]
21:24imirkin_: ok, well no output = you're doing something wrong.
21:24imirkin_: you need to figure out what that is.
21:26karolherbst: imirkin_: I removed the fragment shader, now I get output
21:30karolherbst: nice, only a tiny difference
21:30karolherbst: and add+mul=mad going wrong? odd
21:35imirkin_: fmad or imad?
21:35imirkin_: very odd.
21:35imirkin_: does the mul have flags on it?
21:35imirkin_: like * x^2
21:35imirkin_: or whatever
21:36karolherbst: at least not visually
21:36karolherbst: it's super boring
21:37karolherbst: imirkin_: maybe rounding accuracy?
21:37imirkin_: looks like it should work =]
21:37imirkin_: gotta be something else.
21:37karolherbst: RA maybe
21:38karolherbst: but you know
21:38karolherbst: I only disabled that tryADDToMADOrSAD and every opt _after_ algebraicopt and then it magically works
21:39karolherbst: when I enable tryADDToMADOrSAD again, it is broken (with all opts after disabled)
21:40karolherbst: maybe I check the emitted code
21:41karolherbst: imirkin_: it's one of the games which uses fma as well, so maybe they care about precision here? dunno
21:41karolherbst: but even then
21:42mupuf: Lyude: sounds good
21:42mupuf: and yes, there are a shit-ton of them!
21:42mupuf: at least, they should mostly be tagged
21:42mupuf: thanks to the idiot who spent forever doing this :s
21:42Lyude: mupuf: hehe, i would have been going way more slowly on this if they weren't
21:43karolherbst: mupuf: many thanks for that :p
21:43Lyude: even with the labels it's still taking a bit for me to make sure i'm keeping my place correctly in the register dumps and not putting registers in the wrong order by accident
21:43karolherbst: imirkin_: fun, the output looks different for shader_runner and shader-db :/
21:43mupuf: first used the nvgpu code for identifying regs, then I checked the bitfields and mapped them to other desktop gpus
21:44mupuf: then I moved backward in time
21:44mupuf: until fermi IIRC
21:44mupuf: the bitfields changed but sometimes the addresses remained
21:45karolherbst: something is odd with that mul
21:46karolherbst: or that add
21:51Lyude: btw, do you have any idea if all of the registers labeled PM_MUX (for instance, PGRAPH.GPC_BROADCAST.UNKF00.PM_MUX) are related to the clockgating stuff or not? it looks like we already write to them in nouveau
22:05karolherbst: imirkin_: okay, it is something about f32 add+mul->mad, no idea what exactly, but exactly this part of the passes causes that issue
22:05karolherbst: u32 is fine
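[Editorial sketch of why merging an f32 mul and add into one mad can change results: a fused operation rounds once, while mul followed by add rounds twice. This only illustrates the rounding effect under the assumption that the merged instruction is fused; it is not a diagnosis of the bug discussed here, and the helper names are made up.]

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate mul and add: the product is rounded to f32 before the add."""
    return f32(f32(a * b) + c)


def fused_mad(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b+c, rounded once at the end.

    float(exact) rounds to double first; that is adequate for this example.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return f32(float(exact))


# Inputs chosen so the exact product needs one bit more than f32 can hold.
a = b = 1.0 + 2.0 ** -12
c = -(1.0 + 2.0 ** -11)

unfused = mul_then_add(a, b, c)  # low bit of the product is rounded away
fused = fused_mad(a, b, c)       # low bit survives until the final rounding
```

[Integer mul and add are exact modulo 2^32, so merging them cannot change the result this way, which is consistent with u32 being fine.]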
22:20RSpliet: karolherbst: which GPU? And what are the final shaders like after RA?
22:20Lyude: ...TFB_UNFUCKUP_OFFSET_QUERIES? that sounds like a fun register name
22:21RSpliet: Lyude: Not sure if that one came from official 'documentation'...
22:21Lyude: hehe, i wouldn't expect it to
22:21RSpliet: but it describes perfectly what the bit does I believe
22:22karolherbst: RSpliet: gk106
22:23karolherbst: RSpliet: are you planning to stay up until 6am?
22:26karolherbst: RSpliet: and after RA it still makes perfectly sense, that's the odd thing about it
23:07karolherbst: imirkin_, RSpliet: I printed out every add and mul in that opt via gdb, but there doesn't seem to be anything wrong: https://gist.githubusercontent.com/karolherbst/0cd11510774a1efd1dd0b8e3c4afa2c9/raw/94a5615c36f10b4c007107a5501dd1955bf997c7/gistfile1.txt
23:08karolherbst: guess I will look at RA and the emitter tomorrow if I don't spot anything