02:31imirkin: karolherbst: did you have a fix for https://hastebin.com/asebewewiw.cs ?
02:43imirkin: skeggsb: looks like it was a plain bug in the nouveau ddx: when DPMS is on, drmWaitVBlank fails, and that failure was not handled properly in the Present code
09:23pmoreau: Finally fixed my script to run the OpenGL CTS on my test computer; it should be running the CTS daily against the latest Mesa master on a G94 now.
12:01pmoreau: Results for the OpenGL 3.3 CTS on G94 against Mesa 20.1.5: “Failed: 3/3971 (0.1%)”
12:01pmoreau: Now, to figure out which tests failed…
12:03karolherbst: :)
13:58karolherbst: ehh.. https://gist.github.com/karolherbst/56f4afd6f0e988bc74947db5b8118086
13:58karolherbst: I am sure that 12 and 13 are messed up
13:59karolherbst: RZ gets emitted though
17:43imirkin: pmoreau: i'm aware of a handful of failures with GL 3.3 CTS
17:44AndrewR: so, I tried to build https://gitlab.freedesktop.org/pmoreau/mesa/-/commits/nv50_compute_support/ but ran into a meson configuration error: "meson.build:1450:2: ERROR: Unknown variable "_minimum_llvmspirvlib_version_array""
17:44imirkin: but i don't appear to have made a note of which ones. i pushed some fixes upstream for most of it, iirc the remainder are "our bad"
17:51imirkin: karolherbst: is %r112 an undef?
17:51karolherbst: nope
17:51karolherbst: but it was 0
17:51karolherbst: the emitted code is correct. just... no idea why the ssa value was still displayed
17:52imirkin: that usually means RA failure
17:52karolherbst: I guess
17:52karolherbst: will try to figure it out
18:12imirkin: pmoreau: https://pastebin.com/raw/gR5n9tY4 -- these are the failures i have on record
18:12imirkin: pmoreau: iirc the texture_swizzle one is a problem with the hw. i forget what it was, might be in the channel logs
18:12imirkin: pmoreau: i didn't look at the pipeline stats one, assuming hw is just counting something slightly different
18:13imirkin: pmoreau: and i don't remember the xfb thing -- i fixed many different xfb things, so it's a bit mush in my memory
18:24imirkin: pmoreau: also i have an unpushed nv50 xfb change
18:25imirkin: i can't remember where it mattered...
18:25imirkin: i definitely didn't write it at random though
18:29AndrewR: ..rolled back meson.build changes.. now it configures and the build started ... I wonder what else I have outdated (updated meson to 0.55.1, but it was not enough ..?)
18:32pmoreau: AndrewR Indeed I messed up; pushed a fix.
18:33AndrewR: pmoreau, sorry for a really dumb question, but how do I update your branch correctly? I tried `git pull` while on your branch, and it often messed up the tree to the point where I was forced to redownload the whole repo... where am I being stupid?
18:34imirkin: git pull --rebase
18:34imirkin: that will fix it
18:34imirkin: git pull is not what you want with force-pushed branches
18:34imirkin: but git pull --rebase should work out ok
18:34pmoreau: imirkin: That sounds like the same failure I’m having.
18:35imirkin: if you get in trouble, you can always just do git reset --hard pmorea/nv50_compute_whatever
18:35pmoreau: And what Ilia said, `git pull --rebase` should work fine.
18:35imirkin: pmoreau: yeah, those tests were run on G84, which feature-wise is identical to G94
18:36imirkin: pmoreau: i expect there will be additional failures on GT21x's, since they enable some additional features
18:36imirkin: (esp around xfb)
18:36AndrewR: imirkin, pmoreau - thanks!
18:36pmoreau: What’s xfb? transform framebuffer?
18:36imirkin: pmoreau: xfb == transform feedback
18:37pmoreau: Ah feedback, right
18:37imirkin: but a lot shorter to type.
18:37pmoreau: Definitely shorter
18:37imirkin: it's the DX name for it, i believe
18:37imirkin: in GL, might be called tfb too
18:37imirkin: but it's not as recognizable, i think -- most people are familiar with "xfb" but not "tfb"
18:38pmoreau: tfb looks like someone typo’ed tbf :-D
18:38RSpliet: to fe bair, it kind of does...
18:38pmoreau: Hahahaha, Roy :-)
18:39imirkin: pmoreau: fwiw this is the unpushed patch i have in my tree - https://pastebin.com/kfFM454A
18:39imirkin: pmoreau: i'm not 100% sure that the test references in the description are correct though.
18:39pmoreau: Thanks!
18:40imirkin: (note these are in the GTF suite, not the KHR suite ... you need special friends to get the GTF ones)
18:44airlied: had to fix some gtf tests for llvmpipe, not fun
18:45imirkin: airlied: fix the tests, or fix llvmpipe?
18:45airlied: tests
18:45imirkin: airlied: from what i remember, the GTF test failures were legit in our case
18:45imirkin: unfortunately some are unfixable with the hw
18:46imirkin: i expect nvidia got exceptions to them
18:46airlied: i just have exposed corner cases
18:46airlied: i can find out
18:46airlied: if you give me test name
18:46imirkin: on tesla -- all works fine on fermi+
18:46imirkin: GTF-GL33.gtf31.GL3Tests.draw_instanced.draw_instanced_max_vertex_attribs
18:46airlied: for gl3.3 not sure anyone cared
18:47imirkin: yeah, this would have been for GL 3.3 (or earlier)
18:47imirkin: basically it maxes out VBO inputs, and uses gl_InstanceID
18:47imirkin: can't do that on tesla :)
18:47imirkin: gl_InstanceID / gl_VertexID count as vertex inputs
18:47imirkin: so ... you can only have 64 components input total
18:48imirkin: unless they know something about the hw that i don't
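The budget arithmetic behind this failure can be sketched as below. It is a minimal illustration, assuming (as stated above) 64 vertex input components in total on Tesla, four components per vec4 attribute, and one component each for gl_InstanceID and gl_VertexID; `fitsTeslaInputs` is a hypothetical helper, not nouveau code.

```cpp
// Assumed per the discussion: Tesla exposes 64 vertex input components in total,
// and system values such as gl_InstanceID / gl_VertexID consume input slots too.
constexpr unsigned kTeslaMaxInputComponents = 64;

// Hypothetical check: does a vertex shader's input set fit in the budget?
// Each vec4 attribute costs 4 components; each system value is assumed to cost 1.
constexpr bool fitsTeslaInputs(unsigned vec4Attribs, bool usesInstanceID, bool usesVertexID) {
    return vec4Attribs * 4u + (usesInstanceID ? 1u : 0u) + (usesVertexID ? 1u : 0u)
           <= kTeslaMaxInputComponents;
}

// 16 vec4 attributes (GL 3.3's minimum GL_MAX_VERTEX_ATTRIBS) fill the budget exactly...
static_assert(fitsTeslaInputs(16, false, false), "16 vec4 attribs = 64 components");
// ...so maxing out the attributes and also reading gl_InstanceID cannot fit.
static_assert(!fitsTeslaInputs(16, true, false), "one component too many");
```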
18:48airlied: llvmpipe exposes unorm32 depth
18:48airlied: which trips up a few things
18:48imirkin: that's just an unforced error... don't do that ;)
18:49airlied: d3d9 has it i think
18:49imirkin: not aware of any hw which does it
18:49airlied: yeah neither were the gtf tests
18:49airlied: assume 32 bit was float
18:49imirkin: it poses all sorts of problems for accuracy too
18:50imirkin: since fp32 can't have the same precision as unorm32
18:50imirkin: (and all the tests verifying precision would be using fp32 most likely)
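A small illustration of the precision point, assuming the usual unorm conversion v / (2^32 - 1) carried out in fp32. fp32 has a 24-bit significand, so neighbouring unorm32 values collapse to the same float, both near 1.0 and around 0.5. `unorm32ToFloat` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: unorm32 -> float using the standard v / (2^32 - 1) mapping,
// evaluated entirely in fp32. Note that 4294967295.0f itself rounds to 2^32.
constexpr float unorm32ToFloat(std::uint32_t v) {
    return static_cast<float>(v) / 4294967295.0f;
}

// Endpoints still map exactly to 0.0 and 1.0...
static_assert(unorm32ToFloat(0u) == 0.0f, "0 -> 0.0");
static_assert(unorm32ToFloat(0xFFFFFFFFu) == 1.0f, "max -> 1.0");
// ...but distinct unorm32 values become indistinguishable once converted.
static_assert(unorm32ToFloat(0xFFFFFFFEu) == unorm32ToFloat(0xFFFFFFFFu), "collapse near 1.0");
static_assert(unorm32ToFloat(0x80000000u) == unorm32ToFloat(0x80000001u), "collapse near 0.5");
```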
18:51airlied: also blitter gets angry when you transition through a float
18:51imirkin: right
18:51imirkin: just don't do that :)
18:52airlied: but it's all compliant now :-p
18:52imirkin: hehe ok
18:52airlied: well, one patch to use uint for d32 nearest blits
18:53imirkin: i don't think you can do linear blits with depth
18:53imirkin: (can you?)
18:53airlied: even if you can, precision loss is fine
18:54imirkin: right
18:54airlied: since the test code likely uses floats :-p
18:54imirkin: i just don't think it's allowed by the API
18:54imirkin: GL_INVALID_OPERATION is generated if mask contains any of the GL_DEPTH_BUFFER_BIT or GL_STENCIL_BUFFER_BIT and filter is not GL_NEAREST.
18:55imirkin: http://docs.gl/gl4/glBlitFramebuffer
18:55imirkin: not authoritative, but the man pages usually get this stuff right
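The quoted rule is simple enough to express as a check. This sketch covers only that one condition (real glBlitFramebuffer validation checks much more); the enum values are copied from the GL headers so it builds without GL installed, and `validateBlitFilter` is a hypothetical name.

```cpp
#include <cstdint>

// Values copied from the GL headers so the sketch needs no GL installation.
constexpr std::uint32_t kDepthBufferBit   = 0x00000100; // GL_DEPTH_BUFFER_BIT
constexpr std::uint32_t kStencilBufferBit = 0x00000400; // GL_STENCIL_BUFFER_BIT
constexpr std::uint32_t kColorBufferBit   = 0x00004000; // GL_COLOR_BUFFER_BIT
constexpr std::uint32_t kNearest          = 0x2600;     // GL_NEAREST
constexpr std::uint32_t kLinear           = 0x2601;     // GL_LINEAR
constexpr std::uint32_t kNoError          = 0;          // GL_NO_ERROR
constexpr std::uint32_t kInvalidOperation = 0x0502;     // GL_INVALID_OPERATION

// Hypothetical validator for just the depth/stencil filter rule quoted above:
// any blit touching depth or stencil must use GL_NEAREST.
constexpr std::uint32_t validateBlitFilter(std::uint32_t mask, std::uint32_t filter) {
    return ((mask & (kDepthBufferBit | kStencilBufferBit)) != 0 && filter != kNearest)
               ? kInvalidOperation
               : kNoError;
}

static_assert(validateBlitFilter(kDepthBufferBit, kLinear) == kInvalidOperation, "linear depth blit");
static_assert(validateBlitFilter(kDepthBufferBit, kNearest) == kNoError, "nearest depth blit");
static_assert(validateBlitFilter(kColorBufferBit, kLinear) == kNoError, "linear color blit");
```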
19:05AndrewR: pmoreau, I think your branch introduced regression in Celestia 1.6.1 (blocky sun and planets) and q3arena (overly dark rooms) - https://imgur.com/a/dZXDdyq
19:07pmoreau: Oh, mmh :-/
19:09pmoreau: AndrewR: When would be the previous working version that you tested?
19:10imirkin: pmoreau: i'd encourage you to get the state management change patches all done and reviewed and pushed
19:11AndrewR: pmoreau, _i think_ master from 15 aug 2020 was fine... I'll try to rebuild with the tree set to just before your specific changes, as a possible good point for a bisect ....
19:15pmoreau: That is interesting… the only changes I have added are implementation of an OpenCL extension (and some rework of clover internals), and some NIR related patches. Unless it’s the changes to shared from Karol.
19:16pmoreau: imirkin: I really need to get those out, you are right…
19:17imirkin: pmoreau: that will give you a good baseline to work with for the rest of the enablement stuff
19:17imirkin: so you don't end up accidentally breaking graphics
19:19pmoreau: I would never do such a thing… oh wait, I already did 🙃
19:31pmoreau: imirkin: How do you parse the results from the CTS? I’m trying with either the log_to_xml or log_to_csv, but they are both failing with some “UnicodeDecodeError: 'utf8' codec can't decode byte 0xa4 in position 14: invalid start byte”.
19:31imirkin: i use log_to_csv with python2
19:31imirkin: py3 breaks strings unfortunately
19:33imirkin: they add implicit conversions to the locale's charset in seemingly random places
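For context on the `UnicodeDecodeError` above: 0xa4 lies in the UTF-8 continuation-byte range (0x80 to 0xBF), so it can never start a character, which is what the Python codec reports. A minimal sketch that checks lead bytes only, not a full UTF-8 validator; `isValidUtf8LeadByte` is a hypothetical helper.

```cpp
// Hypothetical helper: can this byte begin a well-formed UTF-8 sequence?
// 0x00-0x7F are ASCII; 0xC2-0xF4 start 2- to 4-byte sequences.
// Continuation bytes (0x80-0xBF), overlong leads (0xC0, 0xC1) and 0xF5-0xFF cannot.
constexpr bool isValidUtf8LeadByte(unsigned char b) {
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}

static_assert(!isValidUtf8LeadByte(0xA4), "continuation byte, as in the CTS log error");
static_assert(isValidUtf8LeadByte('A'), "ASCII");
static_assert(isValidUtf8LeadByte(0xC3), "lead byte of a 2-byte sequence");
```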
19:33AndrewR: pmoreau, https://pastebin.com/YuhPdZDT - I tried luxcoreui :} I think this time the error is different, but I tried a different scene, too ... (I know this is very early pre-alpha code, just found it interesting how my new gt215/geforce 240 card reacts ...)
19:34imirkin: that sounds like a code emission fail
19:34imirkin: like it's trying to emit [x+0x1f] but instead emits [0x1f]
19:34imirkin: an 0x1f offset is only possible with u8/s8, so something to look at
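The reasoning behind "only possible with u8/s8" is alignment. Assuming an N-byte access needs an N-byte-aligned offset, an odd offset like 0x1f rules out every access size except a single byte. A hypothetical sketch of that check, not the actual emitter logic:

```cpp
#include <cstdint>

// Assumption (illustrative): an access of accessBytes bytes needs an offset that is
// a multiple of accessBytes. A zero size is rejected before the modulo is evaluated.
constexpr bool offsetAlignedFor(std::uint32_t offset, std::uint32_t accessBytes) {
    return accessBytes != 0 && offset % accessBytes == 0;
}

// 0x1f is odd, so only a 1-byte (u8/s8) access can use it directly.
static_assert(offsetAlignedFor(0x1f, 1), "u8/s8: any offset is fine");
static_assert(!offsetAlignedFor(0x1f, 4), "32-bit access: 0x1f is misaligned");
static_assert(offsetAlignedFor(0x20, 4), "0x20 works for 32-bit accesses");
```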
19:35pmoreau: I’m running it with python2 to be safe.
19:35airlied: pmoreau: i edit the file and remove the utf8
19:35pmoreau: Mmh okay, I should look if any issues were reported against the CTS.
19:36airlied: i think there are 5 or 6 tests
19:36AndrewR: pmoreau, https://pastebin.com/rJt69StB - stderr/stdout from luxcoreui program ... it _nearly_ starts to render .... and hangs :}
19:36airlied: invalid char ones and some 420 pack
19:36pmoreau: AndrewR: And you did not have those issues before with Luxcore? Or are those different (IIRC you still had some)?
19:37AndrewR: pmoreau, I think your previous iteration of those patches just resulted in a glsl (???) error somewhere ...
19:37pmoreau: > (I know this is very early pre-alpha code, just found it interesting how my new gt215/geforce 240 card reacts ...)
19:37pmoreau: Don’t worry about it, it’s awesome that you are testing it so often: much appreciated!
19:38AndrewR: pmoreau, I think card #2 is a bit too upset about being abused this way again :}
19:38pmoreau: airlied: I could try that, let’s see…
19:40imirkin: AndrewR: that gt240 shouldn't be so weak, esp with reclocking ... obv nothing compared to modern cards
19:41AndrewR: imirkin, I mean after this ch. unload error... when I got this with vdpau, it was a sure hang/watchdog timeout on the second attempt, even with just vdpauinfo/glxinfo ....
19:41imirkin: yeah
19:42pmoreau: Why didn’t I try to open those qpa files earlier… I thought they were in a binary format and that’s why they were providing scripts for converting them to a readable format. 🤦
19:42imirkin: it's just xml
19:42imirkin: but not really xml
19:42imirkin: there's also some xslt thing to view them in a browser
19:42pmoreau: But enough readable that I can grep through them to find fails.
19:42imirkin: sorta yeah
19:44airlied: yah for small num of fails they are fine
19:49imirkin: you can also just paste the images into chrome
19:49imirkin: with data:image/png;base64,<paste>
19:50imirkin: (in the url bar)
19:50AndrewR: pmoreau, commit aa8661141a0f466994145e99be4d4bd4f9684a9d (HEAD -> nv50_compute_support, origin/master, origin/HEAD, master) actually restores Celestia ...time to bisect ....
19:56AndrewR: pmoreau, maybe I should make an apitrace, so there will be no need for you to install/build it ...
19:56pmoreau: I can install Celestia, it’s not a large program.
19:58pmoreau: Oof, this is clearly broken :-/
20:21AndrewR: https://yadi.sk/d/vjE-weMWVT8Cjg - 50 mb xz trace ....
20:32pmoreau: Thanks
20:49pmoreau: https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/e12e7785937b0e2d75876111b4a4305ddafa5239 is breaking Celestia
20:50pmoreau: AndrewR: ^
20:50pmoreau: I’m going to redo that patch and the one before.
20:50imirkin: if (s == 1) s = 2;
20:50imirkin: else if (s == 2) s = 1;
20:50imirkin: not hacky at all :)
20:51imirkin: just normalize it properly everywhere
20:51imirkin: the literal values don't matter afaik
20:51imirkin: just has to match up everywhere
20:53pmoreau: Right, I added that hack because it was breaking supertuxkart otherwise and I hadn’t had time to figure out why. :-D
20:53pmoreau: I moved those commits to a separate branch now: https://gitlab.freedesktop.org/pmoreau/mesa/-/commits/nv50_resource_rework.
20:54AndrewR: pmoreau, only obvious difference i can see is around lines 1163-1164 :} But Celestia doesn't use geometry shaders .... I think
21:08imirkin: pmoreau: btw, feel free to dump notes about CTS failures in https://trello.com/b/lfM6VGGA/nouveau-cts
21:18karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6338 :p
21:20karolherbst: is more or less required for private memory in CL as well
21:32imirkin: AndrewR: btw, just #if 0 around the print which keeps happening
21:32imirkin: AndrewR: i had not intended for it to appear in non-debug logs
21:32imirkin: just not super-familiar with how Xorg internal logging works
21:33karolherbst: ehhh
21:33karolherbst: why does private memory fail for chars and shorts :/
21:34karolherbst: ahh.. "gr: SKED: 00001000 [TOTAL_TEMP_SIZE]"
21:35karolherbst: mhhh "SHADER_LOCAL_MEMORY_LOW_SIZE : 0x900"
21:35karolherbst: OHHH
21:36karolherbst: mhh
21:39karolherbst: imirkin: do you know if there is some weird validation in the hw in regards to the tls space?
21:39imirkin: btw, there's also "positive" and "negative" lmem
21:39imirkin: but i never understood wtf that was about
21:39imirkin: i mean ... define weird
21:39karolherbst: well..
21:39imirkin: there is validation :)
21:40imirkin: there's not like an infinite amount of on-chip memory (sadly)
21:40karolherbst: I have 0x900 as SHADER_LOCAL_MEMORY_LOW_SIZE and 0x800 as SHADER_LOCAL_MEMORY_CRS_SIZE.. and the hardware throws TOTAL_TEMP_SIZE
21:40karolherbst: well.. it's for compute :p
21:40karolherbst: so it's all VRAM anyway
21:40karolherbst: or not?
21:40imirkin: local memory comes from L2
21:41karolherbst: ehh
21:41imirkin: L2 gets partitioned into shared and local i believe
21:41karolherbst: I meant local as in nvidia local, not CL local
21:41imirkin: not sure
21:41karolherbst: shared is L2
21:41karolherbst: local is VRAM
21:41karolherbst: we even allocate a tls bo
21:41imirkin: i don't think that's quite right
21:41imirkin: i mean yes
21:41imirkin: but
21:41imirkin: there's also on-chip, etc
21:41imirkin: also the tls bo has to be big enough
21:41karolherbst: for non compute or not?
21:41imirkin: always
21:41karolherbst: ohh.. mhh
21:41imirkin: tls isn't different for compute vs non-compute afaik
21:42karolherbst: mhh
21:42AndrewR: imirkin, OK ...but it was useful to see
21:42imirkin: AndrewR: yeah, but it's expected
21:42imirkin: it's not a real error
21:42karolherbst: but for compute we at least always specify the off chip C/R stack or something
21:42imirkin: presumably because we assume that compute will have complex control flow
21:42imirkin: while regular shaders won't
21:42karolherbst: probably
21:43karolherbst: okay.. let's see
21:43karolherbst: ehh...
21:43karolherbst: I don't need as many anyway
21:43karolherbst: anyway.. I should figure out what's wrong
21:44karolherbst: imirkin: hah!
21:44karolherbst: we don't resize the tls bo
21:44karolherbst: I doubled its size and now I didn't get the error
21:45karolherbst: well...
21:45karolherbst: we never resize it
21:45karolherbst: oh well..
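The fix implied here is to grow the TLS buffer on demand instead of allocating it once. A sketch under stated assumptions: how the required size is computed (e.g. from SHADER_LOCAL_MEMORY_LOW_SIZE and the CRS size) is left to the caller, and rounding up to a power of two is just one possible growth policy; `tlsSizeFor` and `nextPow2` are hypothetical names, not nouveau functions.

```cpp
#include <cstdint>

// Smallest power of two >= v (returns 1 for v == 0). Assumes v <= 2^63.
constexpr std::uint64_t nextPow2(std::uint64_t v, std::uint64_t p = 1) {
    return p >= v ? p : nextPow2(v, p << 1);
}

// Hypothetical policy: keep the current TLS buffer if it is big enough,
// otherwise grow it (here: to the next power of two) before launching.
constexpr std::uint64_t tlsSizeFor(std::uint64_t current, std::uint64_t required) {
    return required <= current ? current : nextPow2(required);
}

static_assert(tlsSizeFor(0x10000, 0x8000) == 0x10000, "big enough: keep");
static_assert(tlsSizeFor(0x10000, 0x18000) == 0x20000, "too small: grow instead of faulting");
```

Growing geometrically keeps reallocations rare when successive kernels need slightly more space each time.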
21:50karolherbst: imirkin: memory opt crashes on this shader: https://gist.github.com/karolherbst/e80f8f4467da4fc16bb647272ca16d34 :/
21:50imirkin: heh
21:50imirkin: do you know where?
21:50karolherbst: yes
21:51imirkin: not sure, but iirc MemoryOpt isn't super-compatible with CL
21:51imirkin: at least the bits that combine loads/stores
21:51imirkin: into wider things
21:51karolherbst: imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n2862
21:51imirkin: because the dynamic offset may not be aligned the way it is for GL
21:52karolherbst: well.. it's aligned for local memory
21:52imirkin: hmmm
21:52karolherbst: but that would cause issues at runtime
21:52karolherbst: not compile time
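Why dynamic offsets matter for combining: merging two accesses into one wider access is only safe if the final address is aligned to the new width, and with a register base that depends on what is known about the base, not just the constant offset. A hypothetical sketch of such a check, assuming power-of-two access widths up to 16 bytes; it is not MemoryOpt's actual logic, and `canMerge` is an illustrative name.

```cpp
#include <cstdint>

constexpr bool isPow2(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical merge check. baseAlign is the alignment (in bytes) known for the
// dynamic base address: GL-style accesses may assume a generous value, while
// CL pointers may only guarantee less.
constexpr bool canMerge(std::uint32_t off1, std::uint32_t size1,
                        std::uint32_t off2, std::uint32_t size2,
                        std::uint32_t baseAlign) {
    return off1 + size1 == off2               // contiguous, first access comes first
        && isPow2(size1 + size2)              // assumed: only power-of-two widths
        && size1 + size2 <= 16                // assumed: at most 128-bit accesses
        && off1 % (size1 + size2) == 0        // constant offset aligned to the new width
        && baseAlign % (size1 + size2) == 0;  // dynamic base aligned to it as well
}

static_assert(canMerge(0, 4, 4, 4, 16), "two dwords at 0 -> one 64-bit access");
static_assert(!canMerge(4, 4, 8, 4, 16), "offset 4 is not 8-byte aligned");
static_assert(!canMerge(0, 4, 4, 4, 4), "base only known to be 4-byte aligned");
```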
21:52imirkin: so presumably st->getSrc(s + 1) is null?
21:52karolherbst: yes
21:52karolherbst: s is 1
21:52imirkin: right
21:52imirkin: so
21:53imirkin: this thing is really not ready for sub-32-bit things, i think
21:53karolherbst: yeah...
21:53imirkin: e.g.
21:53karolherbst: although the vec8 version works
21:53imirkin: int s = sizeSt / 4;
21:53karolherbst: it's just vec16 which is screwed
21:53karolherbst: right..
21:53imirkin: that only makes sense if you're storing stuff in 32-bit units
21:53imirkin: so i'm guessing when it combines e.g. 8 + 8 into 16
21:54imirkin: it doesn't quite do it right
21:54karolherbst: I guess
21:54karolherbst: sizeSt is also -3 :)
21:54imirkin: int sizeSt = typeSizeof(st->dType);
21:54imirkin: right, it does -= other-size
21:55karolherbst: ehhh
21:55imirkin: again, i suspect some things aren't set quite right for sub-32 sizes
21:55karolherbst: so s starts with 0
21:55karolherbst: yeah...
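The truncation can be reproduced in isolation. The first function mirrors the quoted `int s = sizeSt / 4;` expression: integer division sends every sub-32-bit size (and the -3 seen above) to 0. The guard is one hypothetical way to keep such stores out of the 32-bit-unit path; neither function is taken from nv50_ir_peephole.cpp.

```cpp
// Mirrors `int s = sizeSt / 4;`: treats sizes as counts of 32-bit units.
constexpr int srcIndexFromSize(int sizeSt) { return sizeSt / 4; }

// Hypothetical guard: only sizes that are whole 32-bit units fit that model.
constexpr bool isWholeRegSize(int bytes) { return bytes > 0 && bytes % 4 == 0; }

static_assert(srcIndexFromSize(16) == 4, "vec4 of 32-bit values");
static_assert(srcIndexFromSize(2) == 0, "u16 collapses to 0");
static_assert(srcIndexFromSize(1) == 0, "u8 collapses to 0");
static_assert(srcIndexFromSize(-3) == 0, "C++ integer division truncates toward zero");
static_assert(!isWholeRegSize(2) && isWholeRegSize(8), "guard rejects sub-32-bit sizes");
```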
21:55karolherbst: maybe I should just disable memoryOpt entirely when dealing with nir/CL
21:56karolherbst: we can vectorize loads in nir as well
21:56karolherbst: I think...
21:56karolherbst: it's not perfect
21:56imirkin: or you can fix MemoryOpt
21:56imirkin: it's not that complex
21:56karolherbst: but at least nir gives us the alignment of pointers more or less
21:56imirkin: wtvr...
21:57karolherbst: yeah.. mhh
21:57karolherbst: let's see
21:57karolherbst: imirkin: I think memoryOpt is also a bit wacky for 64 bit loads...
21:58imirkin: ok
21:58karolherbst: I think I ran into the issue once
21:58karolherbst: so I forced 32 bit loads for some stuff
21:58karolherbst: mhhh
21:58karolherbst: I remember that being a bigger issue
21:59karolherbst: ahhh..
21:59karolherbst: I know
22:01karolherbst: uhhh