03:31 AndrewR: so, I tried to update mesa on my old cel1200/nv43 (agp) machine, and now dri disappeared: dri_fill_in_modes: driCreateConfigs failed / libGL error: failed to create dri screen / libGL error: failed to load driver: nouveau / Mesa 18.2.0-devel implementation error: Invalid GLSL version in shading_language_version()
03:32 AndrewR: then swrast driver loaded, but it reports again: OpenGL version string: 2.1 Mesa 18.2.0-devel (git-58fb613a51) / OpenGL shading language version string: (null)
04:30 imirkin: AndrewR: hm, fun. i only have a NV34 plugged in right now ... the version detection stuff has had some changes recently, perhaps they messed things up
04:35 AndrewR: imirkin, https://pastebin.com/41HuNrB3 - a bit more complete error (sorry, I only copy/pasted top of it, not long list of swrast's glx/fbconfigs)
04:35 imirkin: hm
04:36 AndrewR: imirkin, also, this was with the '--disable-dri3' config switch, on Slackware 14.1, so maybe some prototypes etc. are too old? (on my main machine with g92 and xserver 1.19.5 everything works ...)
04:37 imirkin: that should be fine
04:37 imirkin: this is something dumb happening
04:37 imirkin: just have to figure out what :)
04:41 AndrewR: sorry, need some sleep :/
12:53 Orbstheorem: Hello, does nouveau support GDM?
13:58 imirkin: Orbstheorem: should be fine...
13:58 imirkin: although i haven't personally tested any version since the 3.x release
14:27 Orbstheorem: Okay, thanks ^^
14:53 imirkin: AndrewR: good news - i can repro on my nv34
15:02 imirkin: lol. i'm the one who broke it.
15:10 imirkin: patch on list: https://patchwork.freedesktop.org/patch/225789/ -- will push tomorrow unless someone complains loudly
17:10 imirkin: pendingchaos: just thinking about the current logic for gm107 scheduling... can't we get WaR issues when dealing with loads/stores to memory?
17:10 imirkin: hakzsam: --^
17:10 imirkin: like store g[], val; load val, g[] -- where the memory address is the same
17:10 imirkin: i wonder if we shouldn't just make every memory load depend on every memory store, as far as scheduling goes
17:11 imirkin: or does one have to throw in an explicit memoryBarrier() in a compute shader?
17:47 karolherbst: imirkin: doesn't memoryBarrier only help regarding cross thread accesses, especially when they diverge?
17:50 karolherbst: but if you only look at a single thread, then everything needs to be "sane", no? like in your example we can't do the load before the store has completed
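
The conservative rule imirkin floats above (every memory load depends on every earlier memory store) can be sketched roughly as follows. This is an illustration only, not nouveau's scheduler: the function name `add_memory_deps` and the string opcodes are hypothetical, and addresses are ignored on purpose, so no alias analysis is attempted.

```python
def add_memory_deps(ops):
    """Return, for each op, the indices of earlier ops it must wait for.

    Hypothetical model: ops is a list of opcode strings. Every "load"
    is made to depend on every earlier "store", whatever the address.
    """
    deps = [[] for _ in ops]
    stores_seen = []  # indices of all stores encountered so far
    for i, op in enumerate(ops):
        if op == "load":
            deps[i] = list(stores_seen)
        elif op == "store":
            stores_seen.append(i)
    return deps


ops = ["store", "add", "load", "store", "load"]
print(add_memory_deps(ops))
```

The trade-off is that loads which could never alias a pending store are still serialized behind it, which costs scheduling freedom in exchange for safety.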
17:50 pendingchaos: imirkin: nvcc generated https://hastebin.com/zifuhetoyo.bash for https://hastebin.com/edubobecuv.cu
17:50 pendingchaos: looking at lines 5, 7 and 8, I don't think there's a problem with stg and ldg?
17:52 pendingchaos: I'm not sure about st and ld, though I think they are meant to be safer than stg and ldg
17:52 imirkin: pendingchaos: interesting... store does rd 0x0 but nothing reads it
17:52 imirkin: well, stg/ldg go through some other kind of memory i think
17:52 imirkin: which might serialize everything
17:52 imirkin: fwiw, i suspect we should always be using them
17:55 imirkin: pendingchaos: btw, are you ok with https://github.com/imirkin/mesa/commit/5cf02528524d673fc8b15aceef5f2cf8e4b5ad59 ?
17:55 imirkin: i made some minor edits
17:55 pendingchaos: (the stg and ldg were using the cv cache qualifier though, I don't know what nouveau uses)
17:57 pendingchaos: yes, it looks fine to me
18:00 imirkin: ok. going to look at your shader replacement one next.
18:01 pendingchaos: https://hastebin.com/feduqaqoso.cu generates https://hastebin.com/ivuhasosub.bash, which is about the same as the previous one, but without the cv qualifier
18:02 imirkin: well, that one has an explicit membar in it...
18:02 pendingchaos: imirkin: I think fname is leaked in dumpProgram and leakProgram if fp == NULL btw
18:03 pendingchaos: I didn't think it mattered? I'll look a bit closer at the behaviour of __threadfence and membar
18:03 imirkin: well, membar flushes all pending writes
18:03 imirkin: and invalidates some caches
18:03 imirkin: iirc gl == global
18:03 imirkin: ival = invalidate
18:04 pendingchaos: I'll see if I can get nvcc to generate some code without optimizing away the load without __threadfence()
18:05 imirkin: maybe a separate function?
18:05 imirkin: dunno how sophisticated its analysis is
18:09 pendingchaos: https://hastebin.com/roqeqeginu.cpp gives https://hastebin.com/xowagikibe.bash (relevant stg and ldg are at lines 4 and 10)
18:21 pendingchaos: I'm not sure how to get nvcc to generate st and ld instead of stg and ldg
18:21 pendingchaos: perhaps we might be able to create a test application to see if what we're currently doing is safe?
18:21 imirkin: another question is whether we should just move to stg/ldg?
18:21 imirkin: (what happens if you throw an atomic in there btw?)
18:25 pendingchaos: like https://hastebin.com/piwukoxova.cu? gives me https://hastebin.com/coxejuseco.bash
18:27 pendingchaos: if https://hastebin.com/iteqiwewiv.cu is what you were asking for, it gives me https://hastebin.com/pegukosawe.bash
18:27 imirkin: iirc "ncg" == non-coherent global memory
18:27 imirkin: or something along those lines
18:43 pendingchaos: *and readProgram
18:53 mooch: mwk: it seems the pc nvidia drivers require FULL emulation of all things pfifo
20:07 imirkin: pendingchaos: all reviewed. let me know if you have questions
20:32 pendingchaos: imirkin: did you mean "if (replace) replace; else compile; dump;"? since dump requires the compiled code
20:46 glennk: imirkin, i think the sample values get whatever interpretation is currently set, rather than any implicit resolving
20:48 imirkin: pendingchaos: erm ... right. i was thinking tgsi dump. but yes, the code dump needs to be later.
20:48 imirkin: pendingchaos: probably should be careful not to dump replaced code
20:50 pendingchaos: so "if (replace) replace else {compile; dump;}"?
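
The ordering agreed on here ("if (replace) replace else {compile; dump;}") can be sketched as below. All names (`build_program`, `replacements`, `compile_fn`, `dump_fn`) are hypothetical stand-ins rather than Mesa functions, and keying the lookup directly on the source is an assumption made to keep the sketch short.

```python
def build_program(source, replacements, compile_fn, dump_fn):
    """Use a replacement binary if one exists, otherwise compile and dump.

    Hypothetical helper: replacements maps a source key to replacement
    code; compile_fn and dump_fn are caller-supplied callables.
    """
    if source in replacements:
        # Replaced code is returned as-is and is never dumped.
        return replacements[source]
    code = compile_fn(source)
    # The dump needs the compiled code, so it runs after compilation.
    dump_fn(code)
    return code


dumped = []
print(build_program("shader_a", {"shader_a": b"REPLACED"}, lambda s: b"COMPILED", dumped.append))
print(build_program("shader_b", {"shader_a": b"REPLACED"}, lambda s: b"COMPILED", dumped.append))
print(dumped)
```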
20:56 pendingchaos: Is there currently any way to get the size of driver->source that isn't IR-specific btw?
21:20 imirkin: pendingchaos: i figured you could keep doing what you were doing... just do it before the compiler, and just pass the value in via nv50_ir_driver_info
21:33 pendingchaos: imirkin: the sha1 is truncated to make it easier to work with, otherwise it's rather long and unmemorable
21:35 imirkin: maybe truncate to 1 letter then? :p
21:36 imirkin: either you're looking for something unique or you're not... could just do crc32 if you want something short
21:39 pendingchaos: crc32 would probably be fine
21:42 imirkin: it just seems really bad to compute a sha1 and then cut it off
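
For comparison, the two ways of getting a short identifier discussed above look roughly like this with Python's standard library. The helper names and the 8-digit truncation length are assumptions for illustration; the log does not say how many characters the truncated sha1 kept.

```python
import hashlib
import zlib


def short_sha1(data: bytes, digits: int = 8) -> str:
    """Compute a full SHA-1 and keep only the first `digits` hex characters."""
    return hashlib.sha1(data).hexdigest()[:digits]


def crc32_hex(data: bytes) -> str:
    """Compute a CRC-32, which is 32 bits (8 hex characters) by construction."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


shader_source = b"hypothetical shader source"
print(short_sha1(shader_source))
print(crc32_hex(shader_source))
```

Both identifiers are equally short, but the CRC-32 avoids computing 160 bits of hash only to discard most of them. Neither is collision-resistant at this length.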
21:56 pendingchaos: imirkin: how useful is this for information on program headers?: http://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
21:56 pendingchaos: quickly looking at it and nvc0_program.c, they seem to share some fields and the page mentions things like tessellation, which gives the impression of nv50+ hardware
21:56 pendingchaos: though some of them on the page look rather old (e.g. ImapColor)
21:56 imirkin: the shader header is nvc0+
21:56 imirkin: it includes little things like "tls size" and other things
21:57 imirkin: enabled attributes
21:57 imirkin: as well as sysvals
21:57 imirkin: (on nv50, that info is fed in more directly... so don't worry about that)
21:58 imirkin: the shader header could be fed into rnn/lookup somehow
21:58 imirkin: i dunno, i guess it's not critical. but seems appropriate.
21:58 imirkin: of course for kepler+ compute, it's yet-another thing
22:11 pendingchaos: the stuff in envytools looks rather minimal, but I think it's enough to justify not including a shader header parser/compiler
22:11 pendingchaos: should stuff for kepler+ compute be included then?