02:31 imirkin: karolherbst: did you have a fix for https://hastebin.com/asebewewiw.cs ?
02:43 imirkin: skeggsb: looks like it was a plain bug in nouveau ddx: when dpms is on, drmWaitVBlank fails, and that failure was not handled properly in the present code
09:23 pmoreau: Finally fixed my script to run the OpenGL CTS on my test computer; it should be running the CTS daily against the latest Mesa master on a G94 now.
12:01 pmoreau: Results for the OpenGL 3.3 CTS on G94 against Mesa 20.1.5: “Failed: 3/3971 (0.1%)”
12:01 pmoreau: Now, to figure out which tests failed…
12:03 karolherbst: :)
13:58 karolherbst: ehh.. https://gist.github.com/karolherbst/56f4afd6f0e988bc74947db5b8118086
13:58 karolherbst: I am sure that 12 and 13 are messed up
13:59 karolherbst: RZ gets emitted though
17:43 imirkin: pmoreau: i'm aware of a handful of failures with GL 3.3 CTS
17:44 AndrewR: so, I tried to build https://gitlab.freedesktop.org/pmoreau/mesa/-/commits/nv50_compute_support/ but ran into a meson configuration error: "meson.build:1450:2: ERROR: Unknown variable "_minimum_llvmspirvlib_version_array""
17:44 imirkin: but i don't appear to have made a note of which ones. i pushed some fixes upstream for most of it, iirc the remainder are "our bad"
17:51 imirkin: karolherbst: is %r112 an undef?
17:51 karolherbst: nope
17:51 karolherbst: but it was 0
17:51 karolherbst: the emitted code is correct. just... no idea why the ssa value was still displayed
17:52 imirkin: that usually means RA failure
17:52 karolherbst: I guess
17:52 karolherbst: will try to figure it out
18:12 imirkin: pmoreau: https://pastebin.com/raw/gR5n9tY4 -- these are the failures i have on record
18:12 imirkin: pmoreau: iirc the texture_swizzle one is a problem with the hw. i forget what it was, might be in the channel logs
18:12 imirkin: pmoreau: i didn't look at the pipeline stats one, assuming hw is just counting something slightly different
18:13 imirkin: pmoreau: and i don't remember the xfb thing -- i fixed many different xfb things, so it's a bit mush in my memory
18:24 imirkin: pmoreau: also i have an unpushed nv50 xfb change
18:25 imirkin: i can't remember where it mattered...
18:25 imirkin: i definitely didn't write it at random though
18:29 AndrewR: ..rolled back meson.build changes.. now it configures and build started ... I wonder what else I have outdated (updated meson to 0.55.1 , but it was not enough ..?)
18:32 pmoreau: AndrewR Indeed I messed up; pushed a fix.
18:33 AndrewR: pmoreau, sorry for the really dumb question, but how do I update your branch correctly? I tried git pull while on your branch, and it often messed up the tree to the point that I was forced to redownload the whole repo ... where did I go wrong?
18:34 imirkin: git pull --rebase
18:34 imirkin: that will fix it
18:34 imirkin: git pull is not what you want with force-pushed branches
18:34 imirkin: but git pull --rebase should work out ok
18:34 pmoreau: imirkin: That sounds like the same failure I’m having.
18:35 imirkin: if you get in trouble, you can always just do git reset --hard pmoreau/nv50_compute_whatever
18:35 pmoreau: And what Ilia said, `git pull --rebase` should work fine.
18:35 imirkin: pmoreau: yeah, those tests were run on G84, which feature-wise is identical to G94
18:36 imirkin: pmoreau: i expect there will be additional failures on GT21x's, since they enable some additional features
18:36 imirkin: (esp around xfb)
18:36 AndrewR: imirkin, pmoreau - thanks!
18:36 pmoreau: What’s xfb? transform framebuffer?
18:36 imirkin: pmoreau: xfb == transform feedback
18:37 pmoreau: Ah feedback, right
18:37 imirkin: but a lot shorter to type.
18:37 pmoreau: Definitely shorter
18:37 imirkin: it's the DX name for it, i believe
18:37 imirkin: in GL, might be called tfb too
18:37 imirkin: but it's not as recognizable, i think -- most people are familiar with "xfb" but not "tfb"
18:38 pmoreau: tfb looks like someone typo’ed tbf :-D
18:38 RSpliet: to fe bair, it kind of does...
18:38 pmoreau: Hahahaha, Roy :-)
18:39 imirkin: pmoreau: fwiw this is the unpushed patch i have in my tree - https://pastebin.com/kfFM454A
18:39 imirkin: pmoreau: i'm not 100% sure that test references in the description are correct though.
18:39 pmoreau: Thanks!
18:40 imirkin: (note these are in the GTF suite, not the KHR suite ... you need special friends to get the GTF ones)
18:44 airlied: had to fix some gtf tests for llvmpipe, not fun
18:45 imirkin: airlied: fix the tests, or fix llvmpipe?
18:45 airlied: tests
18:45 imirkin: airlied: from what i remember, the GTF test failures were legit in our case
18:45 imirkin: unfortunately some are unfixable with the hw
18:46 imirkin: i expect nvidia got exceptions to them
18:46 airlied: i just have exposed corner cases
18:46 airlied: i can find out
18:46 airlied: if you give me test name
18:46 imirkin: on tesla -- all works fine on fermi+
18:46 imirkin: GTF-GL33.gtf31.GL3Tests.draw_instanced.draw_instanced_max_vertex_attribs
18:46 airlied: for gl3.3 not sure anyone cared
18:47 imirkin: yeah, this would have been for GL 3.3 (or earlier)
18:47 imirkin: basically it maxes out VBO inputs, and uses gl_InstanceID
18:47 imirkin: can't do that on tesla :)
18:47 imirkin: gl_InstanceID / gl_VertexID count as vertex inputs
18:47 imirkin: so ... you can only have 64 components input total
18:48 imirkin: unless they know something about the hw that i don't
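The input budget described above can be checked with a little arithmetic. This is only a sketch: the 64-component limit comes from the discussion, while the count of 16 vec4 attributes (the minimum GL_MAX_VERTEX_ATTRIBS that GL 3.3 requires) and the cost of one component per system value are assumptions made for illustration, not confirmed hardware details.

```python
# Limit quoted in the discussion for tesla-class hardware.
MAX_INPUT_COMPONENTS = 64

# Assumption: the test uses 16 vec4 attributes, the GL 3.3 minimum
# for GL_MAX_VERTEX_ATTRIBS.
VEC4_ATTRIBS = 16
COMPONENTS_PER_ATTRIB = 4

# Assumption: gl_InstanceID / gl_VertexID each cost one input component.
SYSTEM_VALUE_COST = 1


def components_needed(num_attribs, num_system_values):
    """Total vertex input components for attributes plus system values."""
    return (num_attribs * COMPONENTS_PER_ATTRIB
            + num_system_values * SYSTEM_VALUE_COST)


def fits(num_attribs, num_system_values):
    """True if the inputs stay within the 64-component budget."""
    return components_needed(num_attribs, num_system_values) <= MAX_INPUT_COMPONENTS


only_attribs = fits(VEC4_ATTRIBS, 0)      # attributes alone use the whole budget
with_instance_id = fits(VEC4_ATTRIBS, 1)  # adding gl_InstanceID goes over
```

Under these assumptions the attributes alone consume all 64 components, so any system value pushes the shader over the limit.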
18:48 airlied: llvmpipe exposes unorm32 depth
18:48 airlied: which trips up a few things
18:48 imirkin: that's just an unforced error... don't do that ;)
18:49 airlied: d3d9 has it i think
18:49 imirkin: not aware of any hw which does it
18:49 airlied: yeah neither were the gtf tests
18:49 airlied: assume 32 bit was float
18:49 imirkin: it poses all sorts of problems for accuracy too
18:50 imirkin: since fp32 can't have the same precision as unorm32
18:50 imirkin: (and all the tests verifying precision would be using fp32 most likely)
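The precision point can be illustrated with a small sketch. It rounds normalized 32-bit values through an IEEE single-precision float using `struct`; the helper names are hypothetical and this is not how any driver or test performs the conversion.

```python
import struct

UNORM32_MAX = 0xFFFFFFFF


def to_f32(x):
    """Round a Python float (double) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def unorm32_to_f32(v):
    """Convert a unorm32 code to the fp32 value closest to v / (2**32 - 1)."""
    return to_f32(v / UNORM32_MAX)


# Three distinct unorm32 codes near the top of the range.
top = unorm32_to_f32(UNORM32_MAX)
below = unorm32_to_f32(UNORM32_MAX - 1)
below2 = unorm32_to_f32(UNORM32_MAX - 2)

# fp32 has a 24-bit significand, so all three land on the same float.
collapsed = (top == below == below2)
```

Near 1.0 the spacing between adjacent fp32 values is far larger than one unorm32 step, so many distinct depth codes become indistinguishable once they pass through a float.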
18:51 airlied: also blitter gets angry when you transition through a float
18:51 imirkin: right
18:51 imirkin: just don't do that :)
18:52 airlied: but it's all compliant now :-p
18:52 imirkin: hehe ok
18:52 airlied: well one patch to use uint for d32 nearest blits
18:53 imirkin: i don't think you can do linear blits with depth
18:53 imirkin: (can you?)
18:53 airlied: even if you can precision loss is fine
18:54 imirkin: right
18:54 airlied: since the test code likely uses floats :-p
18:54 imirkin: i just don't think it's allowed by the API
18:54 imirkin: GL_INVALID_OPERATION is generated if mask contains any of the GL_DEPTH_BUFFER_BIT or GL_STENCIL_BUFFER_BIT and filter is not GL_NEAREST.
18:55 imirkin: http://docs.gl/gl4/glBlitFramebuffer
18:55 imirkin: not authoritative, but the man pages usually get this stuff right
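The rule quoted from the man page can be written as a small check. This is a sketch of the validation logic only, not Mesa's implementation; the function name is hypothetical, and the numeric values are the standard GL enum constants.

```python
# Standard OpenGL enum values.
GL_DEPTH_BUFFER_BIT = 0x00000100
GL_STENCIL_BUFFER_BIT = 0x00000400
GL_COLOR_BUFFER_BIT = 0x00004000
GL_NEAREST = 0x2600
GL_LINEAR = 0x2601


def blit_filter_allowed(mask, filter_mode):
    """Return False where the quoted rule would generate GL_INVALID_OPERATION.

    Depth or stencil bits in the mask are only valid with GL_NEAREST.
    """
    depth_or_stencil = mask & (GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT)
    if depth_or_stencil and filter_mode != GL_NEAREST:
        return False
    return True


color_linear = blit_filter_allowed(GL_COLOR_BUFFER_BIT, GL_LINEAR)
depth_linear = blit_filter_allowed(GL_DEPTH_BUFFER_BIT, GL_LINEAR)
depth_nearest = blit_filter_allowed(GL_DEPTH_BUFFER_BIT, GL_NEAREST)
```

Only the filter rule from the quoted sentence is modelled here; a real glBlitFramebuffer call has further error conditions.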
19:05 AndrewR: pmoreau, I think your branch introduced regression in Celestia 1.6.1 (blocky sun and planets) and q3arena (overly dark rooms) - https://imgur.com/a/dZXDdyq
19:07 pmoreau: Oh, mmh :-/
19:09 pmoreau: AndrewR: What was the last working version that you tested?
19:10 imirkin: pmoreau: i'd encourage you to get the state management change patches all done and reviewed and pushed
19:11 AndrewR: pmoreau, _i think_ master from 15 aug 2020 was fine ..I'll try rebuild with tree set to just before your specific changes as possible good point for bisect ....
19:15 pmoreau: That is interesting… the only changes I have added are implementation of an OpenCL extension (and some rework of clover internals), and some NIR related patches. Unless it’s the changes to shared from Karol.
19:16 pmoreau: imirkin: I really need to get those out, you are right…
19:17 imirkin: pmoreau: that will give you a good baseline to work with for the rest of the enablement stuff
19:17 imirkin: so you don't end up accidentally breaking graphics
19:19 pmoreau: I would never do such a thing… oh wait, I already did 🙃
19:31 pmoreau: imirkin: How do you parse the results from the CTS? I’m trying with either the log_to_xml or log_to_csv, but they are both failing with some “UnicodeDecodeError: 'utf8' codec can't decode byte 0xa4 in position 14: invalid start byte”.
19:31 imirkin: i use log_to_csv with python2
19:31 imirkin: py3 breaks strings unfortunately
19:33 imirkin: they add implicit conversions to the locale's charset in seemingly random places
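The decode failure can be reproduced, and sidestepped, in a few lines. This is a sketch only: the sample bytes and the `decode_log` helper are made up for illustration and are not taken from the CTS scripts.

```python
# Made-up log fragment with a stray 0xa4 byte, which is not a valid
# UTF-8 start byte.
raw = b"some log text \xa4 more"


def decode_log(data):
    """Decode log bytes as UTF-8, substituting U+FFFD for invalid bytes."""
    return data.decode("utf-8", errors="replace")


# Strict decoding raises, as in the reported traceback.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding keeps the rest of the text usable.
text = decode_log(raw)
```

Opening the file in binary mode and decoding explicitly like this avoids any dependence on the locale's default charset.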
19:33 AndrewR: pmoreau, https://pastebin.com/YuhPdZDT - I tried luxcoreui :} I think this time error is different, but I tried different scene, too ... (I know this is very early pre-alpha code, just found it interesting how my new gt215/geforce 240 card reacts ...)
19:34 imirkin: that sounds like a code emission fail
19:34 imirkin: like it's trying to emit [x+0x1f] but instead emits [0x1f]
19:34 imirkin: an 0x1f offset is only possible with u8/s8, so something to look at
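The remark about 0x1f follows from natural alignment. A minimal sketch, assuming an access must sit at an offset that is a multiple of its size in bytes (the helper name is hypothetical):

```python
def naturally_aligned_sizes(offset, sizes=(1, 2, 4, 8, 16)):
    """Access sizes in bytes for which `offset` is naturally aligned."""
    return [size for size in sizes if offset % size == 0]


odd_offset = naturally_aligned_sizes(0x1F)   # 31 is odd
even_offset = naturally_aligned_sizes(0x20)  # 32 is a multiple of every size
```

An odd offset such as 0x1f only works for single-byte accesses, which is why it stands out in emitted code.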
19:35 pmoreau: I’m running it with python2 to be safe.
19:35 airlied: pmoreau: i edit the file and remove the utf8
19:35 pmoreau: Mmh okay, I should look if any issues were reported against the CTS.
19:36 airlied: i think there are 5 or 6 tests
19:36 AndrewR: pmoreau, https://pastebin.com/rJt69StB - stderr/stdout from luxcoreui program ... it _nearly_ starts to render .... and hangs :}
19:36 airlied: invalid char ones and some 420 pack
19:36 pmoreau: AndrewR: And you did not have those issues before with Luxcore? Or are those different (IIRC you still had some)?
19:37 AndrewR: pmoreau, I think your previous iteration of those patches just resulted in glsl (???) error somewhere ...
19:37 pmoreau: > (I know this is very early pre-alpha code, just found it interesting how my new gt215/geforce 240 card reacts ...)
19:37 pmoreau: Don’t worry about it, it’s awesome that you are testing it so often: much appreciated!
19:38 AndrewR: pmoreau, I think card #2 a bit too upset for abusing it again this way :}
19:38 pmoreau: airlied: I could try that, let’s see…
19:40 imirkin: AndrewR: that gt240 shouldn't be so weak, esp with reclocking ... obv nothing compared to modern cards
19:41 AndrewR: imirkin, I mean after this ch. unload error ...when I got this with vdpau - it was sure hang/watchdog timeout on second attempt to try even with just vdpauinfo/glxinfo ....
19:41 imirkin: yeah
19:42 pmoreau: Why didn’t I try to open those qpa files earlier… I thought they were in a binary format and that’s why they were providing scripts for converting them to a readable format. 🤦
19:42 imirkin: it's just xml
19:42 imirkin: but not really xml
19:42 imirkin: there's also some xslt thing to view them in a browser
19:42 pmoreau: But readable enough that I can grep through them to find fails.
19:42 imirkin: sorta yeah
19:44 airlied: yah for small num of fails they are fine
19:49 imirkin: you can also just paste the images into chrome
19:49 imirkin: with data:image/png;base64,<paste>
19:50 imirkin: (in the url bar)
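Building the URL by hand is easy to get wrong when the base64 text is wrapped over several lines. The sketch below is a hypothetical helper, not part of the CTS tooling; it assumes the image text copied from the log is plain base64 PNG data.

```python
import base64

PREFIX = "data:image/png;base64,"


def data_uri_from_base64(b64_text):
    """Build a data: URL from base64 text, dropping any line wrapping."""
    return PREFIX + "".join(b64_text.split())


# Example input: the 8-byte PNG signature, encoded and wrapped mid-string.
signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(signature).decode("ascii")
wrapped = encoded[:4] + "\n  " + encoded[4:]

uri = data_uri_from_base64(wrapped)
```

The resulting string can be pasted into the browser's URL bar as described above.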
19:50 AndrewR: pmoreau, commit aa8661141a0f466994145e99be4d4bd4f9684a9d (HEAD -> nv50_compute_support, origin/master, origin/HEAD, master) actually restores Celestia ...time to bisect ....
19:56 AndrewR: pmoreau, maybe I should make an apitrace, so there will be no need for you to install/build it ...
19:56 pmoreau: I can install Celestia, it’s not a large program.
19:58 pmoreau: Oof, this is clearly broken :-/
20:21 AndrewR: https://yadi.sk/d/vjE-weMWVT8Cjg - 50 mb xz trace ....
20:32 pmoreau: Thanks
20:49 pmoreau: https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/e12e7785937b0e2d75876111b4a4305ddafa5239 is breaking Celestia
20:50 pmoreau: AndrewR: ^
20:50 pmoreau: I’m going to redo that patch and the one before.
20:50 imirkin: if (s == 1) s = 2;
20:50 imirkin: else if (s == 2) s = 1;
20:50 imirkin: not hacky at all :)
20:51 imirkin: just normalize it properly everywhere
20:51 imirkin: the literal values don't matter afaik
20:51 imirkin: just has to match up everywhere
20:53 pmoreau: Right, I added that hack because it was breaking supertuxcart otherwise and I hadn’t had time to figure out why. :-D
20:53 pmoreau: I separated those commits to a separate branch now: https://gitlab.freedesktop.org/pmoreau/mesa/-/commits/nv50_resource_rework.
20:54 AndrewR: pmoreau, only obvious difference i can see is around lines 1163-1164 :} But Celestia doesn't use geometry shaders .... I think
21:08 imirkin: pmoreau: btw, feel free to dump notes about CTS failures in https://trello.com/b/lfM6VGGA/nouveau-cts
21:18 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6338 :p
21:20 karolherbst: is more or less required for private memory in CL as well
21:32 imirkin: AndrewR: btw, just #if 0 around the print which keeps happening
21:32 imirkin: AndrewR: i had not intended for it to appear in non-debug logs
21:32 imirkin: just not super-familiar with how Xorg internal logging works
21:33 karolherbst: ehhh
21:33 karolherbst: why does private memory fail for chars and shorts :/
21:34 karolherbst: ahh.. "gr: SKED: 00001000 [TOTAL_TEMP_SIZE]"
21:35 karolherbst: mhhh "SHADER_LOCAL_MEMORY_LOW_SIZE : 0x900"
21:35 karolherbst: OHHH
21:36 karolherbst: mhh
21:39 karolherbst: imirkin: do you know if there is some weird validation in the hw in regards to the tls space?
21:39 imirkin: btw, there's also "positive" and "negative" lmem
21:39 imirkin: but i never understood wtf that was about
21:39 imirkin: i mean ... define weird
21:39 karolherbst: well..
21:39 imirkin: there is validation :)
21:40 imirkin: there's not like an infinite amount of on-chip memory (sadly)
21:40 karolherbst: I have 0x900 as SHADER_LOCAL_MEMORY_LOW_SIZE and 0x800 as SHADER_LOCAL_MEMORY_CRS_SIZE.. and the hardware throws TOTAL_TEMP_SIZE
21:40 karolherbst: well.. it's for compute :p
21:40 karolherbst: so it's all VRAM anyway
21:40 karolherbst: or not?
21:40 imirkin: local memory comes from L2
21:41 karolherbst: ehh
21:41 imirkin: L2 gets partitioned into shared and local i believe
21:41 karolherbst: I meant local as in nvidia local, not CL local
21:41 imirkin: not sure
21:41 karolherbst: shared is L2
21:41 karolherbst: local is VRAM
21:41 karolherbst: we even allocate a tls bo
21:41 imirkin: i don't think that's quite right
21:41 imirkin: i mean yes
21:41 imirkin: but
21:41 imirkin: there's also on-chip, etc
21:41 imirkin: also the tls bo has to be big enough
21:41 karolherbst: for non compute or not?
21:41 imirkin: always
21:41 karolherbst: ohh.. mhh
21:41 imirkin: tls isn't different for compute vs non-compute afaik
21:42 karolherbst: mhh
21:42 AndrewR: imirkin, OK ...but it was useful to see
21:42 imirkin: AndrewR: yeah, but it's expected
21:42 imirkin: it's not a real error
21:42 karolherbst: but for compute we at least always specify the off chip C/R stack or something
21:42 imirkin: presumably because we assume that compute will have complex control flow
21:42 imirkin: while regular shaders won't
21:42 karolherbst: probably
21:43 karolherbst: okay.. let's see
21:43 karolherbst: ehh...
21:43 karolherbst: I don't need as many anyway
21:43 karolherbst: anyway.. I should figure out what's wrong
21:44 karolherbst: imirkin: hah!
21:44 karolherbst: we don't resize the tls bo
21:44 karolherbst: I doubled its size and now I didn't get the error
21:45 karolherbst: well...
21:45 karolherbst: we never resize it
21:45 karolherbst: oh well..
21:50 karolherbst: imirkin: memory opt crashes on this shader: https://gist.github.com/karolherbst/e80f8f4467da4fc16bb647272ca16d34 :/
21:50 imirkin: heh
21:50 imirkin: do you know where?
21:50 karolherbst: yes
21:51 imirkin: not sure, but iirc MemoryOpt isn't super-compatible with CL
21:51 imirkin: at least the bits that combine loads/stores
21:51 imirkin: into wider things
21:51 karolherbst: imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n2862
21:51 imirkin: because the dynamic offset may not be aligned the way it is for GL
21:52 karolherbst: well.. it's aligned for local memory
21:52 imirkin: hmmm
21:52 karolherbst: but that would cause issues at runtime
21:52 karolherbst: not compile time
21:52 imirkin: so presumably st->getSrc(s + 1) is null?
21:52 karolherbst: yes
21:52 karolherbst: s is 1
21:52 imirkin: right
21:52 imirkin: so
21:53 imirkin: this thing is really not ready for sub-32-bit things, i think
21:53 karolherbst: yeah...
21:53 imirkin: e.g.
21:53 karolherbst: although the vec8 version works
21:53 imirkin: int s = sizeSt / 4;
21:53 karolherbst: it's just vec16 which is screwed
21:53 karolherbst: right..
21:53 imirkin: that only makes sense if you're storing stuff in 32-bit units
21:53 imirkin: so i'm guessing when it combines e.g. 8 + 8 into 16
21:54 imirkin: it doesn't quite do it right
21:54 karolherbst: I guess
21:54 karolherbst: sizeSt is also -3 :)
21:54 imirkin: int sizeSt = typeSizeof(st->dType);
21:54 imirkin: right, it does -= other-size
21:55 karolherbst: ehhh
21:55 imirkin: again, i suspect some things aren't set quite right for sub-32 sizes
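Why `sizeSt / 4` only makes sense for 32-bit units can be seen from the integer division alone. The sketch below is not the Mesa code; `dword_slots` is a hypothetical stand-in for the `int s = sizeSt / 4;` line, restricted to non-negative sizes where Python's `//` matches C integer division.

```python
def dword_slots(size_bytes):
    """Number of whole 32-bit slots covered by `size_bytes` bytes.

    Mirrors C's `size / 4` for non-negative sizes only.
    """
    return size_bytes // 4


# Store sizes in bytes: 8-bit, 16-bit, 32-bit, 64-bit, 128-bit.
slots = {size: dword_slots(size) for size in (1, 2, 4, 8, 16)}
```

For 8-bit and 16-bit stores the slot count truncates to 0, so any logic that indexes sources by that count treats sub-32-bit values as if they occupied no slot at all.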
21:55 karolherbst: so s starts with 0
21:55 karolherbst: yeah...
21:55 karolherbst: maybe I should just disable memoryOpt entirely when dealing with nir/CL
21:56 karolherbst: we can vectorize loads in nir as well
21:56 karolherbst: I think...
21:56 karolherbst: it's not perfect
21:56 imirkin: or you can fix MemoryOpt
21:56 imirkin: it's not that complex
21:56 karolherbst: but at least nir gives us the alignment of pointers more or less
21:56 imirkin: wtvr...
21:57 karolherbst: yeah.. mhh
21:57 karolherbst: let's see
21:57 karolherbst: imirkin: I think memoryOpt is also a bit wacky for 64 bit loads...
21:58 imirkin: ok
21:58 karolherbst: I think I ran into the issue once
21:58 karolherbst: so I forced 32 bit loads for some stuff
21:58 karolherbst: mhhh
21:58 karolherbst: I remember that being a bigger issue
21:59 karolherbst: ahhh..
21:59 karolherbst: I know
22:01 karolherbst: uhhh