00:04imirkin: fincs: sound like a reasonable test for the viewport stuff to basically have an "inverse" matrix multiply the position that would be output, and then have a corresponding viewport swizzle?
00:04imirkin: and in the end it should be a no-op
00:05imirkin: for the viewport swizzle stuff, that is
00:05fincs: I guess you could do that
00:05fincs: However I think it's a bit overcomplicated
00:06imirkin: is there an easier way to test it?
00:06fincs: Just swizzle manually gl_Position in the vsh instead?
00:06imirkin: well, the matrix is the way to do that
00:06imirkin: that way i just have one shader with a configurable matrix
00:06fincs: Ah
00:06imirkin: vs 8 shaders that do it "by hand"
00:06imirkin: right?
00:06fincs: Well yes
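The test idea above (one shader with a configurable "inverse" matrix, undone by the viewport swizzle) can be sketched on the CPU. This is only an illustration, assuming each swizzle output component picks one input component with an optional sign flip and that every input is used exactly once, so the swizzle is a signed permutation whose inverse is its transpose. The helper names and the example swizzle are hypothetical, not taken from piglit or Mesa.

```python
# Sketch: pre-multiplying the position by the inverse of a swizzle and then
# applying the swizzle should be a no-op. All names here are illustrative.

def swizzle_matrix(swizzle):
    """Build a 4x4 matrix from a list of (source_component, sign) pairs."""
    m = [[0] * 4 for _ in range(4)]
    for row, (src, sign) in enumerate(swizzle):
        m[row][src] = sign
    return m

def transpose(m):
    """Transpose; for a signed permutation matrix this is also the inverse."""
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def apply_swizzle(swizzle, v):
    """What the (assumed) hardware swizzle does to a position."""
    return [sign * v[src] for src, sign in swizzle]

# Hypothetical swizzle: out.x = +y, out.y = -x, out.z = +z, out.w = +w
swizzle = [(1, 1), (0, -1), (2, 1), (3, 1)]
inverse = transpose(swizzle_matrix(swizzle))

pos = [0.25, -0.5, 0.75, 1.0]        # position the test wants rasterized
pre = mat_vec(inverse, pos)          # what the vertex shader would output
result = apply_swizzle(swizzle, pre) # what the swizzle turns it back into
assert result == pos
```

The transpose trick only holds for swizzles that are permutations; a swizzle that reads the same component twice has no inverse, so a test built this way would have to skip those combinations.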
00:07imirkin: i find viewports to be one of the less understandable aspects of all this stuff, unfortunately
00:07imirkin: should be interesting to come up with a decent test...
00:07fincs: If you want a shader that will work with any possible combination of swizzles then yeah...
00:08fincs: The viewport swizzle stuff is meant to be used alongside gl_ViewportMask to autogenerate all the 6 faces of a cube
00:08imirkin: ("all this stuff" = GL)
00:08imirkin: yeah whatever
00:08imirkin: functional tests have nothing to do with how it's MEANT to be used :)
00:08fincs: Yeah but :p
00:09imirkin: anyways, dinner shortly, but might be able to finish up NV_viewport_swizzle and NV_viewport_array2 tonight
00:09fincs: :)
00:09imirkin: (the former just needs tests ... the latter is "done", but needs some actual checking, followed by tests)
00:10fincs: I'm excited to see support for these missing NV features being added :)
00:11imirkin: the passthrough shader will require some thought
00:11fincs: True
00:11imirkin: if there are any other useful NV_* that you're aware of, esp easy-to-do ones, let me know
00:11imirkin: (or EXT_*, i'm not picky)
00:13fincs: Fragment shader interlock :p
00:13fincs: (probably too cursed though)
00:14fincs: (ARB_fragment_shader_interlock, dunno if the NV_ version has more stuff in it)
00:15fincs: Looks identical at first glance (other than ARB<->NV suffixes)
00:17fincs: 0x1224 has the fragment shader interlock layout: 0=disabled 1=pixel_interlock_ordered 2=pixel_interlock_unordered 3=sample_interlock_ordered 4=sample_interlock_unordered (code forces 3/4 setting if per-sample invocations are enabled)
00:18fincs: The SASS generated for the beginInvocationInterlock/endInvocationInterlock calls looks absolutely disgusting, I still need to reverse that
00:20fincs: Also (unrelated) - it looks like there is a special fragment shader mode when the output is a constant color, and nvidia detects/uses it
00:25fincs: I tried playing with those regs myself but I was unable to make the output change at all; not even if the output is non-constant
01:28imirkin: hm ok
01:28imirkin: yeah, the interlock stuff requires ... work
01:28imirkin: it'd be easy to detect constant color output
01:29imirkin: is that really a common use-case?
01:29imirkin: i'd think it's better not to have that opt to get proper coverage via tests (which tend to output a fixed color)
01:29fincs: Disclaimer: I don't really know what this constant color thing is really doing, and I haven't been able to see any visible changes by playing around with the regs
01:29fincs: Compiler generates proper fsh anyway in that case, and it's bound
01:30fincs: The regs are used "in addition", not "replacing"
01:36imirkin: hm, need to check where that gl_ViewportMask thing is enabled
01:36imirkin: coz otherwise it writes off the end of the shader header...
01:36imirkin: good thing i still have the original traces
01:36fincs: Hm wdym?
01:37imirkin: the export enable bit in the shader header
01:37imirkin: it'd be at 0x50 for 0x3a0
01:37imirkin: but the shader header is 0x50, so ...
01:37fincs: Heh
01:38imirkin: either it's longer, or the bit is elsewhere
01:38fincs: What's the offset for writing to gl_ViewportMask?
01:38fincs: (in e.g. AST)
01:39fincs: Ah, 0x3a0
01:39fincs: Hmm let me think
01:39imirkin: 0x01000000 19 = 0x1000000
01:39imirkin: i guess it's there =]
01:41imirkin: which leads one to ask ... doesn't that conflict with texcoord output enablement?
01:41imirkin: there should be 8 texcoords, 4 components each...
01:42fincs: Don't the bits in the shader header correspond directly with hardware attribute indices?
01:42imirkin: oh no, i'm doing math wrong
01:42imirkin: there's an extra 0x40 in there
01:43imirkin: yeah, all good
01:43imirkin: it fits.
01:43fincs: ( ͡° ͜ʖ ͡°)
01:43imirkin: no funny business required.
01:44fincs: If my calculations are correct, it would be bit0 of "OmapReserved" (i.e. last field)
01:44fincs: (in https://nvidia.github.io/open-gpu-doc/Shader-Program-Header/Shader-Program-Header.html)
01:44imirkin: your calculations are off
01:45imirkin: it's bit 24 of the last word
01:45fincs: Yeah
01:45fincs: Same thing
01:45fincs: "OmapReserved" is the last byte :p
01:45imirkin: oh. right. the remainder is texcoord
01:45imirkin: yea
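The arithmetic in this exchange can be reproduced with a small sketch. It assumes the output map starts at word 13 of the header and that attribute addresses are rebased by 0x40 before being turned into a bit index; the 0x40 rebase and the resulting "bit 24 of the last word" come from the chat, while the word-13 starting point is an assumption chosen to be consistent with those numbers. The constant and function names are made up for illustration.

```python
# Sketch of the export-enable bit calculation for a 0x50-byte shader header.
HEADER_BYTES = 0x50        # header size quoted in the chat (20 words)
OMAP_FIRST_WORD = 13       # ASSUMPTION: first header word of the output map
OMAP_BASE_ADDR = 0x40      # the "extra 0x40" mentioned in the chat

def omap_bit(attr_addr, base=OMAP_BASE_ADDR):
    """Return (word, bit) of the enable bit for an output attribute address."""
    slot = (attr_addr - base) // 4      # one bit per 32-bit component
    return OMAP_FIRST_WORD + slot // 32, slot % 32

# gl_ViewportMask lives at 0x3a0 according to the chat.
word, bit = omap_bit(0x3a0)
assert (word, bit) == (19, 24)
assert word == HEADER_BYTES // 4 - 1    # i.e. the last word of the header

# Forgetting the 0x40 rebase lands one word past the end of the header,
# which is the "writes off the end" worry from earlier.
bad_word, _ = omap_bit(0x3a0, base=0)
assert bad_word * 4 == HEADER_BYTES
```

With the rebase the bit is `1 << 24` in word 19, matching the `0x01000000` value quoted above.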
01:45imirkin: btw, i'm just noticing this -- bit 25 of the first word is also set on this vertex shader
01:46imirkin: which exports a layer
01:46imirkin: and/or viewport mask
01:46fincs: Hmmm
01:46imirkin: could be unrelated
01:46fincs: But that's a different bit from the passthrough gs thing
01:46imirkin: yes
01:47fincs: I wonder if that has anything to do with the dual vertex shader shit
01:48imirkin: the VP_A vs VP_B stuff?
01:48fincs: Yes
01:48imirkin: dunno. it's not set on all vertex shaders, i think
01:48imirkin: i would have noticed by now
01:48fincs: I still don't know what makes the compiler generate VP_A/VP_B
01:48imirkin: i've never seen it use VP_A
01:48fincs: Apparently some Switch games do
01:49imirkin: i have no idea what it's for, or how it works
01:49fincs: I found a patent for it
01:50fincs: https://patents.google.com/patent/US8564616B1/en
02:06imirkin: fincs: if i do gl_ViewportMask[0] = 0xf
02:07imirkin: should i expect 4 primitives to be rasterized separately with separate gl_ViewportIndex's
02:07imirkin: or does it only rasterize it once?
02:12imirkin: hrmph. well, looks like setting gl_ViewportMask[0] = 0xf has no effect...
02:13imirkin: ah no, i failed at setting that high bit
02:13fincs: Now comes the debug phase :p
02:13imirkin: but setting the bit does nothing.
02:14imirkin: well, i'm not even 100% sure that what i'm doing is meant to work
02:14fincs: Did you actually set up 4 viewports?
02:14imirkin: https://hastebin.com/uwariheqor.cs
02:14imirkin: ("viewport indexed" calls glViewportIndexedfv)
02:15fincs: Mmm
02:15imirkin: and the first one correctly shows up as red in a 50x50 box
02:16imirkin: but since i'm unfamiliar with the ext, was hoping to get verification that it at least *looks* like it should work
02:16imirkin: before i go and do some more poking
02:16fincs: What happens if you write to gl_ViewportIndex instead of gl_ViewportMask?
02:18imirkin: it works
02:18imirkin: i mean, e.g. i just set to 1, and i got a green square
02:18fincs: What happens if you do gl_ViewportMask[0] = 0x8?
02:18imirkin: blank
02:18imirkin: hm, i guess the other viewports are somehow not enabled?
02:19imirkin: let me do some checking....
02:19fincs: "If gl_ViewportMask[] is written by the final VTG stage, then gl_ViewportIndex in the fragment stage will have the index of the viewport that was used in generating that fragment."
02:20imirkin: yes
02:20fincs: So it should have worked...
02:20imirkin: =]
02:20imirkin: ok
02:20fincs: Unless something in piglit is screwing you up
02:21imirkin: well the fact that writing 0x1 works but 0x2 doesn't to viewport mask suggests there's some enable missing
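For reference, the behaviour being expected here can be written down as a tiny model: each set bit in gl_ViewportMask[0] should send the primitive to the viewport with that index. This is a sketch of the expected semantics only, not of what the hardware or nouveau actually does; the function name is hypothetical and the 16-viewport limit is the figure confirmed later in the chat.

```python
# Sketch: which viewports a primitive should be broadcast to for a given mask.
def viewports_from_mask(mask, num_viewports=16):
    """Return the viewport indices selected by the set bits of the mask."""
    return [i for i in range(num_viewports) if mask & (1 << i)]

assert viewports_from_mask(0xf) == [0, 1, 2, 3]  # expected: four squares
assert viewports_from_mask(0x8) == [3]           # observed in the chat: blank
assert viewports_from_mask(0x1) == [0]           # observed: works
assert viewports_from_mask(0x0) == []            # primitive is dropped
```

Under this model 0x2 and 0x8 should each hit exactly one viewport, which is why a result that only works for 0x1 points at a missing enable rather than at the mask value itself.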
02:21imirkin: do you have notes on what 0x1138 does?
02:22fincs: Huh, when is that register written?
02:23imirkin: search me
02:23imirkin: but it's in the trace
02:23fincs: "search me"
02:23imirkin: aka "i don't know"
02:23imirkin: https://hastebin.com/savoqeqegu.http
02:24fincs: I don't seem to have encountered that reg before
02:24fincs: What card are you testing on?
02:24imirkin: (i.e. "you could search me and you wouldn't find it")
02:24imirkin: i'm testing on GP108, but the trace is from some GM20x
02:25fincs: Hmmm
02:25imirkin: someone came in here a couple years ago and did some traces
02:25imirkin: i don't know if they were planning on doing the work to implement in mesa, but if they were, it certainly hasn't materialized
02:26imirkin: either way, writing 0x1138 = 1 didn't help
02:26fincs: 0x1138 (or 0x44E as I'd call it) doesn't seem to be used in nvn's main function for configuring shaders
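The two names for the same register differ only in units: 0x1138 is a byte offset, while 0x44E is the same method counted in 32-bit words. A minimal conversion sketch (the function name is made up for illustration):

```python
# Sketch: convert between byte-offset and word-index method numbering.
def method_index(byte_offset):
    """Convert a method byte offset (e.g. 0x1138) to a 32-bit word index."""
    if byte_offset % 4:
        raise ValueError("method offsets are 4-byte aligned")
    return byte_offset // 4

assert method_index(0x1138) == 0x44E
assert method_index(0x1224) == 0x489   # the interlock register from earlier
```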
02:27fincs: Hmm are you sure you enabled the viewportmask output correctly in the imap?
02:27fincs: Err omap
02:27fincs: Can you force word[19] |= 1<<24;
02:28imirkin: https://hastebin.com/izaweperem.bash
02:29fincs: Hmm let me try something
02:29imirkin: which decodes as https://hastebin.com/egiqumakip.php
02:33fincs: Other than Nvidia not using ALD.128/AST.128, their SASS is basically the same
02:33fincs: Let me look at the header
02:35imirkin: yeah, it's probably better if we didn't. but it works fine.
02:35imirkin: setting the position isn't the bit that's not working
02:35imirkin: making the shader header 100% identical to what's in the trace doesn't help either
02:36fincs: Nvidia's shader header: https://hastebin.com/enuwitixut
02:36fincs: word0 looks different
02:37imirkin: interesting. they also have that bit 25 set
02:37imirkin: i already tried 40461. let's try 60461
02:38imirkin: https://hastebin.com/tutubulize.bash - still no go.
02:38imirkin: let me try with a GS in there
02:38imirkin: see if the issue is something dumb
02:40imirkin: nope, same problem from GS
02:41fincs: Hmm
02:41imirkin: i can push what i have if you're interested in playing with it
02:41fincs: Yeah I could look at it, but tomorrow (currently it's almost 5am for me)
02:41imirkin: there's also the occasional problem that we don't init something correctly in the kernel
02:42imirkin: k, i'll try to clean things up a bit and push
02:42fincs: Doubt that's the issue
02:42imirkin: it's rarely the issue.
02:42imirkin: xfb resume was messed on fermi for a while coz of that.
02:44fincs: There are 16 viewports, right?
02:45imirkin: yes
02:45fincs: Tomorrow I'll try testing with my setup
02:45fincs: And see if it really is an init/reg problem or if there's something else going on
02:46fincs: But for now
02:46fincs:-> bed
02:48imirkin: nite
03:58imirkin: fincs: https://github.com/imirkin/mesa/commits/viewport_array2 (the branch also has the swizzle work)
04:54imirkin: fincs: looks like i'm missing something silly. writing gl_ViewportIndex = 1; gl_ViewportMask = 0xf *does* broadcast to all 4
04:55imirkin: so there's something silly that is keyed on viewport index that also needs to account for the mask
05:02imirkin: the sampler shader that i have writes 0x1 to viewport mask. perhaps if something else is written it also enables the viewport index output (even if it's not actually written)
10:51fincs: Interesting, that makes me suspect Imap/Omap fuckery
11:01fincs: fsh has ImapViewportIndex set as expected
11:11fincs: Looks like nvidia sets that weirdo 0x02000000 bit for all vshs
11:11fincs: I tested with a simple gl_Position-writing shader, so that's most certainly unrelated
11:12fincs: (maybe that has something to do with the multi-vsh thing)
11:13fincs: Also, nvn seems to only handle that layer_viewport_relative reg during GSH setup, nowhere else
11:13fincs: (not sure if that's intentional, or if I misunderstood what this reg is)
11:28fincs: (Also, I misreported the values of the fragment shader interlock layout register, there's some bit magic I missed)
11:33HdkR: fincs: Why are you touching fragment shader interlock things?
11:33fincs: I was asked for a list of other fun features that aren't in nouveau :p
11:35HdkR: oof FSI is one of those things that only non-real people use :P
11:35fincs: Yeah
11:35fincs: I saw what the nvidia compiler generates for the intrinsics and it is really disgusting
11:36HdkR: Was one of those things that make OIT fast but only Intel ever made it fast
11:37fincs: There's like, a mini support lib appended to the shader and it does function calls... yuck
11:39HdkR: It's simple enough to beginInvocation + endInvocation right after each other to see the pain train
11:40fincs: CAL everywhere :p
11:41fincs: Anyway, we were trying to figure out if using gl_ViewportMask needs some special enable we're missing
11:42fincs: According to imirkin, it seems to work if you write to gl_ViewportIndex as well
11:43HdkR: :)
11:43fincs: (I suspect Imap/Omap fuckery but I haven't been able to observe any weird bits set)
11:43HdkR: Don't forget to look at the program header
11:43fincs: Yeah that's where I'm looking
11:44fincs: Other than an undocumented bit in word0 set (for all vshs apparently), it's what it should be
12:26fincs: Okay
12:27fincs: I ported over imirkin's changes to my setup and my API; and gl_ViewportMask works absolutely fine
12:27fincs: So this tells me there's some register setting we're indeed missing
12:28fincs: Let me try to figure out which reg it is
14:16fincs: More testing - if I write to both gl_ViewportMask[0] and gl_ViewportIndex, I get a nice GPU crash (pushbuffer kickoff failed)
14:16karolherbst: Lyude: https://gist.githubusercontent.com/karolherbst/01056e8a6b7c6d7d326a8d413f80d3af/raw/336a16b24debd36f7028120409183f8e1e2b8ab5/gistfile1.txt
14:17karolherbst: that looks exactly like nouveau behaves when I had the PCIe link to 8.0 workaround
14:17fincs: I'm starting to suspect there is some issue with piglit that is making the test not behave properly
14:17karolherbst: works for a few cycles and then breaks
14:27fincs: I think it would be a good idea to write a test using straight up GL without piglit/any other test framework
14:27fincs: Just to rule that possibility out
14:28karolherbst: fincs: piglit isn't doing all that much though
14:29fincs: The test uses multiple viewports though; maybe something was overlooked in that test file
14:30imirkin: fincs: well, those shader_test files are just a really simple way to write GL programs
14:30fincs: So far I can't seem to find any magic reg that makes this work or fail
14:30imirkin: i can make an apitrace for you of what that shader test does. pretty sure there's nothing too odd.
14:31fincs: There shouldn't be anything odd in theory, yeah
14:31fincs: At least you now know that the shader compiler side of this is fine
14:32imirkin: i suspect, but nice to be sure :)
14:32fincs: (I.e. your code works for me™)
14:32imirkin: did you write viewport index before the mask, or after? i only tried it before?
14:32fincs: Oh, I tried writing after
14:32fincs: Let me try before
14:32imirkin: i only tried before
14:33fincs: Nope, still crash
14:33imirkin: weird.
14:33imirkin: so ... that actually lends credibility to "forgot to set something up"
14:33imirkin: coz i don't get any sort of error when i do that
14:33fincs: Yeah... the question is... what exactly
14:34fincs: (also fyi that strange 0x02000000 bit in word[0] seems to be set for all vshs)
14:35fincs: Does nouveau use viewport transform?
14:36fincs: Looks like it
14:36fincs: Always on
14:37fincs: Maybe it's a magic bit in VIEW_VOLUME_CLIP_CTRL?
14:42karolherbst: fun... the PCI reg changed in turing or volta
14:42karolherbst: but .. subtle
14:44karolherbst: _fun_ nvidia has the exact same runpm bug nouveau had
14:45karolherbst: let's see if my workaround works there as well
14:49fincs: I've tried disabling every write to unknown registers in my code and despite that, this is still working
14:53imirkin: i meant like a mmio reg
14:53fincs: So not a method in the pushbuf?
14:53imirkin: right
14:54imirkin: something that would be done in the kernel
14:54imirkin: or potentially one of the fw methods
14:54imirkin: which also allow rmw on various context things
14:54fincs: Stuff related to passing attributes between shader stages needs to be configured in kernel? That would be odd though
14:55imirkin: eh, more like feature enablement
14:55imirkin: lots of "CYA" bits in there
14:55imirkin: aka "cover-your-ass"
14:55imirkin: although i prefer intel's "chicken" bits
14:55imirkin: esp their "half_chicken" register
14:55fincs: What's this about? :p
14:55imirkin: stuff hw designers put in for the case where they screw something up
14:56imirkin: so that flipping a bit can turn some bit of questionable functionality off
14:56fincs: There are quite a lot of hardware reg writes in the initialization sequence
14:56imirkin: the sure are.
14:57imirkin: there*
14:57RSpliet: imirkin: what's the half_chicken register for? Or is it just a 16-bit register full of chicken bits?
14:57imirkin: yep
14:57imirkin: 16-bits of chicken
14:57RSpliet: 16 bits of chicken is usually enough for a sandwich
14:58fincs: I see 6, plus 1 that is in the middle of conservative raster shit (so it's very likely that one), plus a whole bunch of ones that are in the code for initializing the tiled cache
14:58fincs: Hmm so there's 6 I have to test then
15:01fincs: Ok so absolutely none of these do anything to affect the viewport mask stuff
15:02imirkin: i see firmware writes to 0x418800, 0x419a08, 0x419f78, 0x404468, 0x419a04, and then a second time to 0x419a04
15:02imirkin: this is on a GM204
15:02fincs: Same
15:02imirkin: er, GM206. same idea though.
15:02fincs: Those are the exact same 6 ones I see and use
15:02imirkin: so that leaves ... the kernel not setting something up, or something much simpler that we're totally missing
15:03fincs: Tried out commenting them out and it does nothing
15:03fincs: My money is on "something much simpler that we're totally missing"
15:03imirkin: ;)
15:03imirkin: can you replay traces?
15:03fincs: This absolutely looks like a stupid bug somewhere
15:03fincs: I don't have a setup for replaying traces, nope
15:03imirkin: ok, well i'll make a trace anyways, see if anything obvious in there
15:04fincs: (And also fyi my host GPU is Maxwell 1st gen, which is *behind* the Switch lol)
15:05imirkin: here's the apitrace: https://people.freedesktop.org/~imirkin/traces/viewportmask.trace
15:06fincs: Heh I thought that was going to be a raw pushbuffer dump
15:06imirkin: you were asking if the piglit was doing something dumb
15:07fincs: Heh, I have no idea how to use this file, what program should be used to view this file?
15:07imirkin: https://github.com/apitrace/apitrace
15:09fincs: Okay I'm looking at it
15:12fincs: Hmm, triangle_strip is being used
15:12imirkin: you can also replay it ("apitrace replay")
15:12imirkin: (or "glretrace")
15:13fincs: (See above - my desktop gpu is maxwell 1st gen so I can't even use this extension)
15:13imirkin: for drawing, sure
15:13imirkin: i meant on the switch. dunno what kind of setup you have there.
15:13fincs: Switch uses a custom microkernel operating system written from scratch by Nintendo
15:13imirkin: so no glretrace then ;)
15:14fincs: Nvidia's Linux kernel driver blob basically runs as a user process, and applications talk to it through IPC
15:14fincs: We ported mesa/nouveau to this setup and it works way better than it should :p
15:14fincs: (only the userland bits)
15:14imirkin: right, of course
15:15imirkin: i think someone who was involved in the porting of it talked to me about it a while back
15:15fincs: Yeah
15:15fincs: Armada did the initial work, then I fixed it so that it would work properly :p
15:15imirkin: yeah, right. Armada.
15:16fincs: I'm going to try changing my code to use a triangle strip instead of a plain triangle
15:16imirkin: can't imagine that would matter
15:17fincs: Yeah but maybe it gets confused because you're generating two primitives with that
15:17imirkin: don't see why that would matter, but ok
15:18imirkin: btw - i rechecked... the sample program actually wrote "1 << viewport" where viewport was a uniform.
15:18imirkin: so the mmt trace should be fully representative
15:19fincs: I see a hardcoded 0xf write in the trace you posted
15:20fincs: "draw rect -1 -1 2 2" <-- is this the cmd that is generating the rect?
15:20imirkin: yes, the one i posted. but the mmt is from a shader someone else did
15:20fincs: Ah
15:21imirkin: https://people.freedesktop.org/~imirkin/traces/gm206/vs-viewportmask.shader_test
15:21imirkin: although they didn't set up the secondary viewports
15:21imirkin: so the blob could be getting smart about it? dunno
15:22fincs: Okay so yeah, with a rect/trianglestrip it still works
15:26fincs: Do you need to enable any feature in GL in order to use glViewportIndexedf?
15:26imirkin: it's part of ARB_viewport_array
15:26imirkin: (which is also core in GL 4.1 or 2, iirc)
15:27imirkin: remember - adding the index write makes it work
15:27imirkin: so it's not a GL-level fail, most likely
15:27fincs: You were testing on Pascal, right?
15:27imirkin: yes, GP108
15:28fincs: Hmm I wonder if behaviour is different on Pascal
15:28imirkin: i don't have any GM20x
15:28imirkin: could be, but unlikely
15:35fincs: Ah, I missed nvc0_magic_3d_init as a source of unk methods
15:40fincs: All the writes in nvc0_magic_3d_init are things I already do... except for the write to 0x02d0 which I don't do
15:43fincs: Adding this write to my code does nothing
15:43imirkin: :)
15:44imirkin: skeggsb: any idea on what might be missing in gr init or somewhere else which would cause viewport mask to behave oddly?
15:44fincs: So yeah... not even the magic init is the culprit
15:49imirkin: skeggsb: basically the way the issue appears is that writing viewport mask doesn't seem to actually broadcast anything to viewports other than the first (setting it to 0 *does*, however cause the primitive not to appear). adding a shader write to the viewport index all of a sudden makes the mask work properly. however blob does not do this. i suspect that writing the viewport index triggers some mode which allows any viewport to be used, which should
15:49imirkin: also be triggered for the mask but isn't. my guess is some missed bit of gr init since we've ruled out pretty much everything else.
15:50imirkin: skeggsb: note that this is a GM20x+ feature.
15:50imirkin: karolherbst: do you have a GM20x around?
15:59karolherbst: imirkin: yes
16:00karolherbst: GM204 I think and the gm20b :p
16:00imirkin: could you try my viewport array stuff?
16:01karolherbst: imirkin: you mean the MR or something else?
16:03imirkin: karolherbst: something else
16:03imirkin: karolherbst: https://github.com/imirkin/mesa/commits/viewport_array2
16:04karolherbst: why gm20x though? Would pascal be fine or do you want to make sure it works on gm20x as well?
16:04imirkin: it's already not working on pascal :)
16:04imirkin: at least on GP108
16:04karolherbst: I see :)
16:04imirkin: i want to check that it also doesn't work on GM20x, since the trace i'm going off of is GM206 and fincs is testing on GM20B
16:04fincs: Fwiw, the compiler bits in that branch at least are fine
16:04imirkin: (but he's testing with the switch nvidia blob kernel driver, so it's not 1:1 testing)
16:04fincs: Generated shaders are correct
16:06imirkin: karolherbst: and a piglit patch ... https://people.freedesktop.org/~imirkin/traces/0001-shader_runner-allow-setting-viewports-directly.patch
16:06imirkin: karolherbst: and this shader_test: https://hastebin.com/jufacowabu.cs
16:09karolherbst: imirkin: btw as I have currently the blob running on the gm204 system, should I create traces first?
16:09imirkin: go for it!
16:09karolherbst: k
16:09imirkin: pretty sure i have what i need
16:09imirkin: but ... can't hurt to have more info
16:09fincs: Tracing blob with that piglit patch + test?
16:10imirkin: the trace i have is from a slightly different test with GM206
16:10karolherbst: need to do some admin stuff on the machine first, so it will take a while, but I assume until the end of the day you should get your answers
16:10imirkin: karolherbst: also if you have just a general mmiotrace of a gm20x coming up, that'd be useful. (do we have some in the repo?)
16:10karolherbst: we should have some in the repo
16:10imirkin: ok cool, i'll check
16:12imirkin: hrmph. i can no longer access the repo...
16:13imirkin: aha, but in my checkout there are a couple of gm206 traces =]
16:13imirkin: oh. even gp108 in there.
16:14imirkin: and gv100. neat.
16:16imirkin: so now i'm looking for a reg like this one: https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_pgraph/tpc.xml#L32
16:21imirkin: looks like we're missing a write of 0 to 0x419810 -- unlikely that's it though.
16:21imirkin: yeah, it's 0 anyway
16:23karolherbst: imirkin: do you even know if the test passes on the blob?
16:23imirkin: no
16:23karolherbst: maybe I just test this first, because that's the easiest here right now :)
16:24imirkin: still need the piglit patch obviously
16:29karolherbst: mhh.. this is literally the first time I try to use nvidia on wayland.. let's see how that goes
16:30fincs: Doesn't the blob have no support whatsoever for wayland?
16:30karolherbst: it does, a little
16:30fincs: I remember being forced into using X, the last time I attempted to use Linux
16:31karolherbst: you get no GLX support in XWayland
16:31karolherbst: which is a big issue
16:33karolherbst: "EGL vendor string: NVIDIA" \o/ well at least eglinfo seems to work
16:33imirkin: karolherbst: replay the apitrace i posted earlier
16:33imirkin: should work with eglretrace i'd think
16:33karolherbst: piglit works with EGL as well :p
16:34imirkin: oh right, yea
16:34imirkin: trace here: https://people.freedesktop.org/~imirkin/traces/viewportmask.trace
16:34imirkin: should see a frame with 4 squares of different colors
16:34karolherbst: mhh PIGLIT_PLATFORM was the env var?
16:35imirkin: PIGLIT_PLATFORM iirc :p
16:35karolherbst: PIGLIT_PLATFORM=wayland __NV_PRIME_RENDER_OFFLOAD=1 ./bin/shader_runner tmp.shader_test -auto -fbo
16:35karolherbst: mhhh
16:35karolherbst: no idea if the nvidia driver is used actually
16:36karolherbst: but I am getting a Test requires unsupported extension GL_NV_viewport_array2
16:36imirkin: i'd say "no" then
16:37karolherbst: oh well... hopefully I don't mess up spawning an Xserver on the nvidia gpu
16:57karolherbst: finally :D I should create a shell script to start this damn X server
16:58karolherbst: okay.. nvidia passes that test
16:58imirkin: sorry, didn't mean for it to be a pain for you
16:58imirkin: cool
16:59fincs: Alright so it is not a piglit issue then
16:59karolherbst: it's not.. just Xorgs args are messy
16:59karolherbst: and you know.. if it does a vtswitch and you can't switch back...
16:59karolherbst: If I enable the "gl_ViewportIndex=0" line it still passes
16:59karolherbst: in case it matters
17:00fincs: Hmm, I haven't checked what the official compiler does when you write to both Index and Mask
17:00karolherbst: okay.. now mmt
17:03imirkin: fincs: afaik it's allowed. although probably less-than-fully-defined if a shader actually writes both
17:03imirkin: but you could imagine something like if () { index = foo; } else { mask = bar; }
17:04fincs: Official compiler ignores the Index write, only writes to Mask
17:04imirkin: what if you do the if/else?
17:05fincs: (However the bit in the omap for Index is still set)
17:05fincs: Let me test if/else
17:06fincs: Oh wait
17:06fincs: I misread this
17:06fincs: It does write to both
17:06fincs: Sorry
17:06fincs: Doesn't help that viewportindex is so close to gl_Position in AST...
17:07karolherbst: imirkin: https://filebin.net/9soublg0uyfxveew/file-bin.log.xz?t=9nlxinpx
17:10fincs: Hmm I wonder why does this crash for me then
17:10fincs: Need to investigate
17:10imirkin: karolherbst: which GPU is this? GPsomething?
17:10karolherbst: gp107, yes
17:11karolherbst: thought this makes more sense for you as you are on a gp108 :p
17:11imirkin: my demmt can't decode it =/
17:11karolherbst: mhhh
17:11karolherbst: mhhh
17:11karolherbst: do I still have local commits for that?
17:11imirkin: ERROR: nvrm_ioctl_host_map56: cannot find object 0xc1d0002d 0xcaf0000a
17:11karolherbst: ahhh
17:11karolherbst: okay.. wait
17:12karolherbst: imirkin: ohh no, I think you just need to update your install
17:13imirkin: ah, maybe
17:13karolherbst: imirkin: you need db2163cd929cb3daa33c689f68256739eb23d947 :)
17:13imirkin: kk
17:14karolherbst:still has local UVM stuff...
17:14karolherbst: and yes.. nvidia makes incompatible changes to UVM
17:14karolherbst: :(
17:14karolherbst: removing fields and stuff
17:15imirkin: annoying.
17:15karolherbst: yes
17:15karolherbst: well.. we could import the uvm headers for every driver version :/
17:15karolherbst: ...
17:18imirkin: karolherbst: the nouveau_vbios_trace repo appears to be weirdly messed up -- its first commit has a diff hash than the original repo.
17:18karolherbst: yes
17:18karolherbst: I converted it to lfs
17:18imirkin: ahhh
17:18imirkin: ok
17:42imirkin: fincs: i think someone switch-related managed to dump some nvidia method names for that gpu ... do you remember where i can find that?
17:50fincs: Wait what? Someone did that? I don't remember such a thing
17:52imirkin: i remember seeing the nvidia method names
17:52imirkin: associated with each index
17:52fincs: Huh
17:52fincs: Are you sure that was the 3d engine?
17:52imirkin: yes
17:52fincs: Hmmm
17:52imirkin: potentially by one of the emulator efforts
17:53fincs: Are you sure you didn't just get linked to this page (which is really outdated at this point)? https://switchbrew.org/wiki/GPU
17:53imirkin: yes. those are the nvidia method names iirc
17:53fincs: Some of them
17:53imirkin: yeah, but some not. o well.
17:54fincs: That uses a mix of method names lifted from compute, and ones reworded from NVN command names, and even from nouveau's register database
17:54imirkin: ah sad ok
17:54imirkin: i thought it was from a driver dump
17:54imirkin: sometimes they leave that stuff in
17:54fincs: If nvidia had left full 3d method names then that would be a treasure lol
17:56karolherbst: imirkin: will you still need me to test that stuff on maxwell or is the mmt trace enough to figure out what's wrong?
17:57imirkin: karolherbst: not enough =/
17:57karolherbst: :/
17:57karolherbst: mhhh
17:57karolherbst: weird
17:58karolherbst: want an mmiotrace as well when executing this test?
17:59imirkin: nah
18:08imirkin: karolherbst: so yeah, based on that trace, no clue what's missing
18:08imirkin: i expect it's some bit of GR init
18:08karolherbst: maybe...
18:08karolherbst: is there any mme script involved here?
18:29Lyude: karolherbst: btw - did you check suspend/resume with your rpm workaround?
18:30imirkin: karolherbst: no. the mme scripts show up as PM: in the demmt
18:30imirkin: (so i would have caught it)
18:35karolherbst: Lyude: I think so, why?
18:36karolherbst: at least on the XPS I did
18:41karolherbst: Lyude: just checked on the 2nd gen P1, works there as well
18:45karolherbst: Lyude: anyway, I digged deeper into the runpm issue on nvidia and it behaves exactly like with nouveau :/
18:45karolherbst: oh well
18:47Lyude: karolherbst: pardon? runpm seems to work just fine on mine
18:47Lyude: w/ your patches
18:50karolherbst: "<Lyude> karolherbst: btw - did you check suspend/resume with your rpm workaround?"
18:50karolherbst: or what was your question about now?
18:50Lyude: karolherbst: no I meant - did you hit some new runpm issue?
18:50karolherbst: no, I just found out that nvidia is equally broken as nouveau was
18:50Lyude: ahhh lol
18:51karolherbst: yeah
18:51karolherbst: I even changed the PCIe link speed to 2.5
18:51karolherbst: as this is what nouveau does through devinit
18:51karolherbst: insta fail
18:51karolherbst: so...
18:51karolherbst: and I even applied my workaround to the nvidia driver, kept working for 250 cycles
18:51karolherbst: then I stopped the test
18:57karolherbst: no idea what to do about this entire issue anymore
18:57karolherbst: maybe just leave it be
18:57karolherbst: the workaround seems to work and nvidia broken...
18:57karolherbst: so...
18:58HdkR: Winning against the blob is always a good thing :P
20:56imirkin: karolherbst: reminder to check the viewport mask thing on your gm20x
20:56imirkin: i think the tegra x1 would be fine btw
20:56karolherbst: imirkin: is there a point in checking if it's broken on pascal? Or maybe I just misunderstood the situation
20:57imirkin: the point is to check if it's broken on maxwell too
20:57imirkin: or if it's just a pascal thing
20:57imirkin: it works for fincs on the switch, but that's not exactly an apples-to-apples comparison
20:57fincs: Yeah, it's not
20:58imirkin: but it's still nouveau setting up the 3d context / emitting commands / etc. but with the nvidia RM, basically.
20:58karolherbst: I see
20:58karolherbst: yeah, I could check on the tegra, would be easier anyway
21:04karolherbst:hates machines without hostnames
21:06karolherbst:likes managed switches
21:48karolherbst:should setup a cross compiler
22:03fincs: Join/parts spam :\
22:05imirkin: gr
22:06fincs: Yay moderation \o/