16:22 karolherbst: heh.. trying to reproduce the crysis broken shader, now I am running into i915 bugs :(
19:03 Lyude: karolherbst: what was your question about pulse?
19:04 karolherbst: Lyude: nvm, user disabled runpm on the audio device
19:05 karolherbst: so the GPU never suspended
19:05 karolherbst: I am mainly wondering now how much of a problem that could be in general.. but I guess all distributions will enable it by now
19:06 karolherbst: at first I thought that something polls the audio device to keep it active, but... yeah
19:07 Lyude: karolherbst: i mean yeah, that's pretty much expected behavior
19:07 Lyude: devices which depend on other devices also do so in terms of rpm dependencies
19:08 karolherbst: yep
19:09 karolherbst: wondering if we should print a warning in nouveau when runpm is disabled on the audio device
19:09 Lyude: probably not, tbh
19:09 Lyude: I don't know of any other drivers that do that (i915 and amdgpu also have similar rpm deps)
19:09 karolherbst: well.. nobody will complain to the audio devs about this
19:10 Lyude: that's probably our job :P
19:10 karolherbst: and if the laptop runs hotter than usual, they check the GPU for obvious reasons
19:10 karolherbst: well, we also have this silly HDA controller enabled message now
19:11 karolherbst: I have that 12 times in my dmesg already
19:11 karolherbst: since yesterday
19:11 karolherbst: fun fact: I don't have any audio devices on the nvidia GPU :(
19:11 Lyude: karolherbst: we should probably change that quirk to use printk_once
19:11 karolherbst: then I still have this message even though it doesn't apply to me
19:12 karolherbst: I would just remove it
19:12 karolherbst: or is there any benefit of having it?
19:12 karolherbst: I mean, the message doesn't tell the user anything
19:12 Lyude: karolherbst: I mean, if we have more PCI scan oddities in the future it'd probably be useful to have that printed yeah
19:12 karolherbst: why?
19:12 karolherbst: that's an unconditional print
19:13 karolherbst: well. it doesn't print for older nvidia gpus
19:13 karolherbst: but otherwise on anything else
19:13 Lyude: karolherbst: oh-I didn't realize that, yeah we should at least make it conditional then
19:13 karolherbst: I think it even prints on desktop systems
19:13 karolherbst: yep.. should
19:14 karolherbst: Lyude: my point was rather, this message doesn't give any relevant information
19:14 karolherbst: if the audio device doesn't appear then .. well, you notice
19:14 karolherbst: mhh, I guess the only case where it might matter is when the multifunc bit is indeed set but no audio device is there
19:15 Lyude: karolherbst: well I'd think we'd at least want to let devs/users know when quirks are being applied
19:15 karolherbst: well.. this quirk isn't a quirk
19:15 karolherbst: in the sense of it only applies to a very limited set of hardware
19:16 Lyude: true
19:17 karolherbst: I think what we should check is if there is an audio device after we see the multifunc bit being set
19:17 karolherbst: if there is no audio device, but the GPU enters or was in multi-func mode, then we should print an error
19:17 Lyude: karolherbst: yeah, that sounds p reasonable
19:18 Lyude: that's what we do with the p50 gpu reset quirk
19:18 Lyude: or rather, we only print a message when we see that the firmware on the GPU is initialized
19:19 karolherbst: right, but that only runs on the p50, so a message is totally reasonable
19:19 karolherbst: and only if it matters
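The check karolherbst proposes (stay silent unless the GPU is in multi-function mode and no audio function showed up) could be sketched roughly as below. This is a minimal userspace illustration, not nouveau's actual code: the function names and the two boolean inputs are hypothetical, and `fprintf` merely stands in for whatever kernel logging call would be used.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether the state seen after a PCI rescan
 * is inconsistent enough to deserve an error message.
 *   multifunction_bit_set: the GPU reports multi-function mode
 *   audio_function_found:  an HDA function actually appeared on the bus
 */
static bool hda_state_needs_error(bool multifunction_bit_set,
                                  bool audio_function_found)
{
    return multifunction_bit_set && !audio_function_found;
}

/* Hypothetical reporting wrapper: prints only in the inconsistent case. */
static void report_hda_state(bool multifunction_bit_set,
                             bool audio_function_found)
{
    if (hda_state_needs_error(multifunction_bit_set, audio_function_found))
        fprintf(stderr,
                "GPU is in multi-function mode but no audio device was found\n");
    /* Every other combination stays silent, so dmesg is not spammed. */
}
```

With this shape the message only fires on systems where it carries information, which is the point made in the discussion above.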
20:12 linearain: hi, this may not be the right place to ask but do you guys know whether any SiS (silicon integrated systems) gpus or onboard gpus have 3d acceleration on linux? SiS mirage graphics 1, SiS662, it's an integrated gpu on intel D201GLY2 mobo
20:20 Lyude: ask #dri-devel, this channel is for nouveau; the reverse engineered driver for nvidia GPUs
20:20 linearain: ok
20:23 Rodrigo_: hi, there are a few discrepancies in F2I and F2F with nvdisasm
20:23 Rodrigo_: https://github.com/envytools/envytools/blob/master/envydis/gm107.c#L1275-L1285
20:24 Rodrigo_: U8 and S8 don't exist (at least on SM53)
20:24 Rodrigo_: the second U64 is S64 (but that's probably a typo)
20:25 imirkin: mmm
20:25 Rodrigo_: in bit 41, F2I can decide which half to read
20:25 imirkin: yeah, obviously the final u64 is a typo
20:25 imirkin: as for U8/S8 ... which arg of the F2I does this apply to?
20:25 imirkin: destination or source?
20:26 Rodrigo_: dest format
20:26 imirkin: there are variants which convert to unorm
20:26 imirkin: obviously i don't remember the details... hold on
20:27 imirkin: aha, that's I2F ( and I2I)
20:27 imirkin: https://cgit.freedesktop.org/mesa/mesa/commit?h=63cb85e567ad1025ee990b38f43c2f1ef811821b
20:27 imirkin: wow. almost 5 years ago.
20:28 imirkin: i need to get a new hobby...
20:28 Rodrigo_: ^^
20:28 Rodrigo_: as for F2F, there's another detail
20:29 imirkin: so afaik I2I.U64.U32 and so on - those are "legal" encodings, in that nvdisasm works, but the actual op fails
20:31 Rodrigo__: my internet died for a sec
20:31 imirkin: i didn't say much... just
20:31 imirkin: so afaik I2I.U64.U32 and so on - those are "legal" encodings, in that nvdisasm works, but the actual op fails
20:32 Rodrigo__: what I was trying to say is that if source=F16 and dest=F32 (for instance), roundings like {.FLOOR, .TRUNC, ...} don't pop up, instead they use the {.RZ, .RM, .RP, ...} family
20:32 Rodrigo__: on F2F
20:32 imirkin: right
20:32 imirkin: that makes sense.
20:32 imirkin: feel free to update the decoding
20:33 Rodrigo__: ok
20:33 Rodrigo__: except that the `PASS` and friends are shifted 3 slots, because of course they are
20:33 imirkin: of course.
20:33 imirkin: now think about the joy of supporting multiple generations worth of encodings :)
20:35 Rodrigo__: sounds fun
20:41 karolherbst: imirkin, Rodrigo: somewhere I have implemented full cvt opcodes (all rounding modes + all bits + lowering). There are also some issues like 64 -> 16 cvts not working either
20:42 karolherbst: but
20:42 karolherbst: a u8/s8 does exist afaik
20:44 Rodrigo: when I try them I get: F2I.FTZ.INVALID1.F16.TRUNC
20:44 Rodrigo: on SM53
20:44 karolherbst: tried f32 -> u8?
20:44 karolherbst: f16 is a weird thing
20:45 karolherbst: but f32->u8 might actually not work
20:45 karolherbst: let me check the details
20:47 karolherbst: Rodrigo: f32->u8 doesn't exist either, but u32 -> u8
20:47 Rodrigo: makes sense
20:47 karolherbst: or I missed something, but I know that CVT can handle 8 bit types
20:49 karolherbst: mhh, or maybe I got smart and just ignored high bits somewhere
20:49 karolherbst: would have to check on the actual hardware
20:50 karolherbst: the bigger issue is just that most of the ALU can't do anything besides 32 bit
20:50 karolherbst: so you end up with a ton of and/shift instructions to counter that
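As a rough illustration of the and/shift work mentioned here, the sketch below pulls a narrow field out of a 32-bit register value, in both unsigned and sign-extending flavours. The helper names `bfe_u32` and `bfe_s32` are made up for this example and do not correspond to any particular ISA encoding. It assumes `1 <= bits <= 32` and `offset + bits <= 32`.

```c
#include <stdint.h>

/* Unsigned bitfield extract: take 'bits' bits starting at bit 'offset'. */
static uint32_t bfe_u32(uint32_t value, unsigned offset, unsigned bits)
{
    /* Avoid the undefined shift by 32 when the whole word is requested. */
    uint32_t mask = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    return (value >> offset) & mask;
}

/* Signed bitfield extract: same field, sign-extended to 32 bits. */
static int32_t bfe_s32(uint32_t value, unsigned offset, unsigned bits)
{
    uint32_t field = bfe_u32(value, offset, bits);
    uint32_t sign = 1u << (bits - 1);
    /* The xor/subtract pair propagates the field's top bit upwards. */
    return (int32_t)((field ^ sign) - sign);
}
```

For example, `bfe_u32(x, 8, 8)` reads the second byte of `x` as a u8, and `bfe_s32(x, 8, 8)` reads it as an s8.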
20:52 karolherbst: let's see what nvidia is doing
21:00 Rodrigo: what are you trying to implement? NV_gpu_shader5 u16 and u8 types?
21:00 imirkin: opencl
21:00 Rodrigo: ah
21:02 karolherbst: seems like nvidia also ends up doing bfe
21:02 karolherbst: but that could be because cvt sucks
21:02 imirkin: yeah, cvt has some unfortunate timings iirc
21:02 karolherbst: it's slow, yes :)
21:02 Rodrigo: iirc bfe and bfi were removed in Turing
21:02 karolherbst: not really, they got replaced
21:02 Rodrigo: same as vmad and friends :(
21:03 karolherbst: they use the bitfield insert/extract style AMD and intel are using I think
21:03 karolherbst: anyway, cvt is slow, use anything else
21:04 Rodrigo: not as slow as u64 / u64 :P
21:04 karolherbst: u64 doesn't exist
21:04 karolherbst: :p
21:04 Rodrigo: have you seen what NV emits for u64/u64?
21:04 karolherbst: because there are no u64 ops on hardware
21:04 Rodrigo: it's a large snippet
21:04 Rodrigo: I know
21:04 karolherbst: well
21:04 karolherbst: it's not as large
21:04 Rodrigo: there are no div instructions either
21:04 karolherbst: only div is large
21:04 karolherbst: most int ops can be easily implemented in 32 bit
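To make "easily implemented in 32 bit" concrete, here is a minimal sketch of a 64-bit add built from two 32-bit adds plus a carry. The `u64_pair` type and the helper names are hypothetical and only serve this example; division, as noted in the chat, is the case that blows up and is not shown.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit ALU would see it. */
struct u64_pair {
    uint32_t lo;
    uint32_t hi;
};

static struct u64_pair split_u64(uint64_t v)
{
    struct u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join_u64(struct u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add using only 32-bit operations. */
static struct u64_pair add_u64(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    r.lo = a.lo + b.lo;
    /* Unsigned wraparound means the low add overflowed iff r.lo < a.lo. */
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;
    return r;
}
```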
21:05 karolherbst: and I think CVT even supports u64 kind of
21:05 karolherbst: uhm.. it doesn't
21:06 karolherbst: Rodrigo: but div in hardware is so slow, nobody sane enough does it
21:06 Rodrigo: I hope no Switch game ever emits that code
21:06 Rodrigo: or fragment interlock's
21:07 karolherbst: they won't because the games would be slow :p
21:07 HdkR: fragment interlocks would be fun though :p
21:07 karolherbst: not the worst thing
21:08 karolherbst: the good thing about div is, that it's slow everywhere, so if you execute a div instruction or just execute the lowered code.. doesn't matter
21:09 karolherbst: there are examples though where one glsl statement leads to a hell of a lot of code
21:09 karolherbst: which... emulating is fun on its own
21:09 karolherbst: like when quadops get involved
21:10 karolherbst: like TXD
21:10 karolherbst: imirkin: what was the trigger for our handleManualTXD code btw?
21:10 karolherbst: like what glsl code leads to that?
21:11 imirkin: when the non-manual one can't handle it :)
21:11 imirkin: cube, shadow definitely do
21:11 karolherbst: ahh, right
21:11 imirkin: 3d too
21:11 imirkin: _maybe_ the array ones? not sure
21:12 karolherbst: ohh found the condition
21:12 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#L1307
21:13 imirkin: yeah
21:14 imirkin: but like ... expected_args isn't easy to quantify
21:14 imirkin: in glsl terms :)
21:14 karolherbst: right..
21:14 imirkin: indirect and offsets can count
21:14 imirkin: but differently on different gens, of course
21:14 karolherbst: well, we are on maxwell here, so
21:15 imirkin: :)
21:15 imirkin: array || offset counts as 1
21:15 imirkin: indirect counts as another
21:15 karolherbst: well, dims already handles most of it :)
21:15 imirkin: each underlying dim obv also counts
21:15 karolherbst: so 2darray already hits it
21:15 imirkin: 2darray dim == 2
21:15 karolherbst: ohhh, really?
21:15 imirkin: yes.
21:16 imirkin: even cube dim == 2 (which is why dim = dim + iscube)
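The counting rules imirkin lists can be read as the small function below. This is only a literal transcription of what is said in the chat for the Maxwell-era case, not Mesa's code: the function name and parameters are hypothetical, shadow comparisons are left out, and the threshold at which the manual TXD path kicks in is deliberately not encoded because the conversation does not state it.

```c
#include <stdbool.h>

/*
 * Count texture arguments the way the chat describes it:
 *   - each underlying dimension counts (2D, 2DArray and cube all have dim == 2)
 *   - a cube adds one on top of its dim (dim = dim + iscube)
 *   - array || offset counts as one
 *   - indirect counts as another
 */
static int txd_counted_args(int dim, bool is_cube, bool is_array,
                            bool has_offset, bool has_indirect)
{
    int count = dim + (is_cube ? 1 : 0);

    if (is_array || has_offset)
        count++;
    if (has_indirect)
        count++;

    return count;
}
```

Under this reading a plain 2DArray and a plain cube both come out at 3, and a 2D texture with both an offset and an indirect handle comes out at 4.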
21:17 karolherbst: I am actually wondering if this is the most painful thing to handle in an emulator or if we have even more annoying lowering somewhere
21:18 imirkin: that's definitely up there with annoying things to emulate
21:18 imirkin: best is to detect the specific patterns
21:18 karolherbst: well.. optimized shaders and stuff
21:18 imirkin: that stuff intrinsically doesn't optimize well
21:19 karolherbst: ohh, I meant pattern detection on optimized binaries is annoying
21:19 imirkin: sure
21:19 imirkin: but i mean that pattern will be pretty much the same everywhere
21:19 karolherbst: probably
21:19 imirkin: coz there aren't 20 different ways to do it
21:22 Rodrigo: stores and loads with b8 and b16 are a bit annoying since they are not widely supported on GL
21:22 Rodrigo: I emulate them with load + bitfieldInsert + store for now
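The load + bitfieldInsert + store pattern Rodrigo describes looks roughly like this when written out in C instead of GLSL. The names `bfi_u32` and `store_b8` are made up for the example, the buffer is modelled as an array of 32-bit words, and little-endian byte order within a word is an assumption. The read-modify-write is not atomic, so two invocations touching different bytes of the same word could race.

```c
#include <stddef.h>
#include <stdint.h>

/* Insert the low 'bits' bits of 'insert' into 'base' at bit 'offset'. */
static uint32_t bfi_u32(uint32_t base, uint32_t insert,
                        unsigned offset, unsigned bits)
{
    uint32_t field = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    uint32_t mask = field << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Emulated 8-bit store into a buffer that only supports 32-bit accesses. */
static void store_b8(uint32_t *words, size_t byte_addr, uint8_t value)
{
    size_t index = byte_addr / 4;                    /* containing word */
    unsigned offset = (unsigned)(byte_addr % 4) * 8; /* bit position in it */

    uint32_t old = words[index];                     /* load */
    words[index] = bfi_u32(old, value, offset, 8);   /* insert + store */
}
```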
21:22 karolherbst: yes and that gets optimized to b8/b16 load/stores
21:23 karolherbst: you can even produce shorter/better code by adding "hints" through masks
21:23 Rodrigo: nice
21:23 Rodrigo: sadly I mark them SSBOs as volatile
21:23 Rodrigo: because I don't know the intention of the original program
21:23 karolherbst: like if an input is always 8 bit, you can just mask that input to be 8 bits to tell the compiler you only care about that
21:23 karolherbst: and then magic happens
21:23 karolherbst: (sometimes)
21:24 Rodrigo: -them*
21:24 karolherbst: well, volatile doesn't prevent that
21:25 karolherbst: anyway, volatile shouldn't even make a difference in terms of performance
21:25 karolherbst: nvidias compiler will have all those load/stores optimized already, so if you optimize some of them away, it's probably wrong
21:26 Rodrigo: we have to turn fastmath off to avoid glitches on Nvidia
21:26 imirkin: volatile means you can't hit L2 cache iirc
21:27 Rodrigo: isn't that controlled by these? https://github.com/envytools/envytools/blob/d799c46c16e6a04c0395c04018b4c326971d0294/envydis/gm107.c#L618-L624
21:27 karolherbst: imirkin: why?
21:28 karolherbst: I am sure you can't hit L1, but L2?
21:28 imirkin: mmmm
21:28 imirkin: yeah, ignore that
21:28 imirkin: anyways, you have to set a non-default access type on loads/stores
21:29 imirkin: CV is what we call it, i think?
21:29 karolherbst: yeah, I think so
21:29 imirkin: the L2 thing is atomics related maybe
21:29 karolherbst: maybe volatile even considers the CPU writing to the memory, in which case you wouldn't hit L2.. yes
21:29 karolherbst: but.. mhh
21:30 karolherbst: could be that CV was the worst caching mode though
21:32 HdkR: volatile assumes backing memory can change outside of the shader invocation
21:32 karolherbst: aka host updates memory, right?
21:32 Rodrigo: that's scary
21:32 karolherbst: well, compute
21:32 Rodrigo: yuzu doesn't handle it at all
21:33 Rodrigo: because guest CPU allocations != OpenGL allocations
21:33 karolherbst: Rodrigo: I think you can just ignore volatile and it will probably work
21:33 HdkR: yea
21:33 karolherbst: make it an option :D
21:33 Rodrigo: what about synchronization between warps?
21:33 Rodrigo: I mean, I saw games invoking compute to write a constant uint to a fixed location in an SSBO
21:33 Rodrigo: I don't trust those devs :P
21:34 karolherbst: I think you can cache a bit lower than that.. no idea how to define those in glsl though
21:34 imirkin: it maps pretty 1:1
21:34 imirkin: restrict = compiler hint
21:34 imirkin: volatile = volatile
21:34 imirkin: coherent = coherent
21:35 imirkin: nothing = nothing
21:35 Rodrigo: HdkR: did you see my trick to implement GL_CLAMP on Vulkan?
21:35 imirkin: so you should definitely mark everything restrict
21:35 imirkin: to avoid the downstream compiler trying to get clever
21:36 karolherbst: well, coherent might be safer, no?
21:36 imirkin: a little too safe.
21:36 karolherbst: better than volatile, no?
21:37 Rodrigo: https://github.com/yuzu-emu/yuzu/blob/bc55c05947030ba746d9d8e7e90660efe151ded7/src/video_core/renderer_opengl/gl_shader_decompiler.cpp#L639-L650
21:37 Rodrigo: GPU side performance is not an issue for now though
21:37 imirkin: ok, but why not just do what the original code wanted :p
21:37 HdkR: Rodrigo: I don't watch your source tree and your bridge has been down for months
21:37 Rodrigo: because I have no idea what it wanted :P
21:37 karolherbst: yeah.. I think it's safe to assume what the generated binary contains is the right thing
21:37 Rodrigo: HdkR: oh, I'll notify them
21:37 karolherbst: Rodrigo: it's encoded in the shader, no?
21:38 Rodrigo: HdkR: https://github.com/yuzu-emu/yuzu/pull/3290/files#diff-8d91f991269f0f120efff26f79f30becR62
21:38 Rodrigo: wdym?
21:38 karolherbst: all load/stores have the caching mode set
21:38 karolherbst: you can just use that
21:38 karolherbst: nvidia usually follows very strictly what the shader had
21:39 Rodrigo: the .CG, .WT and friends?
21:39 karolherbst: yes
21:39 Rodrigo: I haven't thought about using those
21:39 karolherbst: should give you all a decent perf bump
21:40 Rodrigo: unless it's Intel, GPU side performance is not an issue
21:40 Rodrigo: most of the time is spent in our somewhat slow caching system
21:40 Rodrigo: and uploading stuff to the GPU
21:41 Rodrigo: thankfully someone pointed me to the fast path for glBufferSubData on Nvidia :P
21:41 Rodrigo: so it can be tricked to use CbData
21:47 skeggsb: karolherbst: no, >=volta use shifts etc for bitfield insert/extract
21:48 karolherbst: ohh, really?
21:48 skeggsb: yep
21:48 karolherbst: ufff
21:48 karolherbst: so full lowering.. annoying
21:48 skeggsb:is back to dealing with that now
23:22 Manizuca: hi there!
23:23 Manizuca: i have a Nyan-Big chromebook, with a tegra K1 using tegra+nouveau mesa drivers
23:24 Manizuca: sadly, when using a recent version of mesa i can't get mutter to work
23:24 Manizuca: the log is full of errors like "kernel: nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 [04001eb000 gnome-shell[597]] subc 0 class a297 mthd 17e0 data 0000004e"
23:24 imirkin: that is sad.
23:25 Manizuca: x11 used to work up until mesa 18.3.x
23:25 imirkin: what's the first such error in your logs?
23:26 Manizuca: and i bisected it to commit 9d4565100527d1de5941e4ff41a88318a95ba2cc on the 18.3 branch
23:27 Manizuca: imirkin, the first one is 'nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 [04001eb000 gnome-shell[597]] subc 0 class a297 mthd 0d78 data 00000004'
23:27 imirkin: sounds familiar
23:27 imirkin: i think modifiers are/were variously broken
23:27 Manizuca: ather two gdm-x-session messages: (WW) modeset(0): Page flip failed: No such device or address and (EE) modeset(0): present flip failed
23:27 Manizuca: after*
23:28 Manizuca: imirkin, it never worked with wayland (similar error), and x11 worked before that commit
23:28 imirkin: tagr: do you have the context on that?
23:29 Manizuca: i'm currently using mesa 19.3.2 without that commit, and x11 works fine
23:30 imirkin: so ... this 100% sounds familiar, but tbh i'm not sure how it was resolved, if at all
23:30 Manizuca: as additional info, all packages are updated using arch linux ARM, except for the kernel (using 5.3.18)
23:33 imirkin: tagr will know what's up, but i forget what TZ he's in ... i think europe somewhere
23:34 Manizuca: ok, ill check the logs tomorrow morning
23:35 Manizuca: all the people i know with this chromebook are currently using llvmpipe with nouveau.modeset=0 (the last pages of https://archlinuxarm.org/forum/viewtopic.php?f=49&t=12185&start=190)
23:36 imirkin: sad!
23:36 Manizuca: yah, clearly using llvmpipe is not ideal :'(
23:36 imirkin: esp when there's a perfectly capable GPU sitting around
23:39 Manizuca: i can live reverting that commit for the moment
23:39 Manizuca: but that way i can't play with wayland
23:39 imirkin: yeah, i mean that obviously stinks
23:41 Manizuca: well, thanks imirkin . ill wait for tagr or whoever can help me!
23:47 imirkin: if you search in the logs, this has definitely been brought up before
23:47 imirkin: i bet searching for that hash + site:people.freedesktop.org should find some results
23:47 imirkin: or not =/
23:48 imirkin: it was talked about on https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-08-14 at least (look for 13:20)
23:53 gruetzkopf: hey, i'm having a strange-ish issue on i965+GF108GLM (as DRI display sink). when i dock my laptop two identical displays are attached and enabled. most of the times i do that, there seems to be some buffer mixup, with both displays showing both contents
23:54 gruetzkopf: any clue where i would even start debugging that?
23:55 imirkin: look at the output of "xrandr" after the problem situation happens
23:55 imirkin: could be that everything's working as intended, i.e. whatever the drivers are told to do by userspace