02:02imirkin: in case ReinUsesLisp reads logs ... look at what i did for the existing h* support wrt the H1 flag. iirc it goes before the reg.
02:05imirkin: look at e.g. 5cc0_1/2/3
02:06imirkin: or 50f0
03:26imirkin: ReinUsesLisp: see chan logs for your h0 question
03:29ReinUsesLisp: in this case it's different, they are only available when src is F16
03:29ReinUsesLisp: I could add them right after src's F16, but that's not how it looks in nvdisasm
03:30imirkin: yeah, there are a lot of differences against nvdisasm
03:30imirkin: we stick the neg/abs thing before
03:30imirkin: i think h0 fits into that
03:30imirkin: or b0/b1/b2/b3 for certain I2F cases
03:30ReinUsesLisp: ok ^^
03:30ReinUsesLisp: do you know if these are SM60?
03:31ReinUsesLisp: because I'm not testing these in desktop
03:31imirkin: no. just check with nvdisasm -b SM50 vs SM60
03:31imirkin: this is what i do: https://hastebin.com/wowigabaci.java
03:32ReinUsesLisp: oh, hastebin is alive
03:32imirkin: have to do it in groups of 3 ops, otherwise nvdisasm bails
03:32imirkin: yeah, it came back
03:33ReinUsesLisp: btw nvdisasm detects 0ULL as NOP, I don't know if it's worth adding that one
03:33imirkin: you sure?
03:33imirkin: i get a segfault
03:34imirkin: perhaps it doesn't like the 0x7e0 schedule for the NOP?
03:35imirkin: btw, thanks for improving envydis :)
03:36ReinUsesLisp: np, I've been assembling shaders by hand with envyas lately
03:37imirkin: my nvdisasm disagrees.
03:37ReinUsesLisp: I usually get more segfaults with Linux' nvdisasm
03:38imirkin: that means something's amiss
03:38imirkin: or a bug in nvdisasm
03:39ReinUsesLisp: I could send you that shader but it's copyrighted :P
03:39imirkin: well, note how it's after the EXIT
03:39imirkin: so it's just padding
03:39imirkin: the fact that nvdisasm decoded it as NOP could be a different manifestation of a bug
03:40imirkin: how's your emu going?
03:40ReinUsesLisp: I think those zeroes are added by my extractor
03:40ReinUsesLisp: we are getting really complex shaders now
03:40imirkin: what's the name of the project again?
03:41ReinUsesLisp: yuzu or Ryujinx, but I've been working on yuzu
03:42ReinUsesLisp: we added support for compute shaders recently so all the ugly instructions are starting to pop up
03:42ReinUsesLisp: STG, VOTE, SHFL, ATOMS
03:43imirkin: well, ATOMS isn't so bad - that's directly supported
03:43imirkin: just atomics on shared memory
03:43imirkin: and VOTE is available with ARB_something
03:43ReinUsesLisp: yes, the problem is that GLSL doesn't let you reinterpret shared memory
03:44ReinUsesLisp: not with an inout parameter :P
03:44ReinUsesLisp: I emulate it with SSBOs because those can overlap in binding
03:44imirkin: shared memory wouldn't be an inout parameter
03:45ReinUsesLisp: how do you tell GLSL to do an integer atomic addition instead of a float atomic addition?
03:45ReinUsesLisp: if shared memory is declared as float
03:45imirkin: just do integer. nobody does float :)
03:46imirkin: you can also fake it with compare-and-swap
03:46imirkin: (+ loop)
03:46ReinUsesLisp: yes, I want to cover all cases :P
03:46ReinUsesLisp: wouldn't that be non-atomic?
03:46imirkin: nah, it'd be atomic
03:46imirkin: just inefficient
03:46ReinUsesLisp: oh, ATOMS doesn't support floats, just signed/unsigned integers
03:47ReinUsesLisp: it was SUATOM the evil one
03:47imirkin: however shared memory DOES support float atomic ops
03:47imirkin: which means they have to be emulated with something like this:
03:47ReinUsesLisp: let's see what Nvidia emits
03:47imirkin: oh, nevermind. i don't have an example handy
03:48imirkin: but it's basically a while loop that tries to compare-and-swap until it works out
03:48imirkin: (atomic compare and swap)
03:49ReinUsesLisp: glslangValidator rejects atomic additions with float
03:49imirkin: SUATOM is similar ... just with an image. esp if you require bindless images, should be a breeze.
03:49imirkin: you forgot to #extension GL_NV_shader_atomic_float: enable ?
03:50ReinUsesLisp: I think it doesn't affect shared memory
03:50imirkin: the ext sure does
03:52ReinUsesLisp: yup, now it built, time to extract the binary
03:52imirkin: iirc there is a ATOMS.ADD.F32 variant though with GM107+
03:53imirkin: on kepler it required a fallback
03:53imirkin: ah no, ok. it does the compare-and-swap
03:53imirkin: (CAS = compare and swap)
03:54imirkin: basically CAS returns the original value
03:54imirkin: so it does like
03:54imirkin: x' = x + 5
03:54imirkin: t = cas(x, x')
03:54imirkin: if t != x: repeat
03:55ReinUsesLisp: interesting, I guess I'll do that, calculating the workgroup padding of shared memory is annoying
03:55imirkin: and you can totally just do that in GLSL when emulating ATOM.E.ADD.F32
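
[Note: the compare-and-swap loop described above can be sketched in C11 as follows. This is a minimal illustration under stated assumptions, not what any driver or emulator actually emits: the helper names (`float_to_bits`, `bits_to_float`, `atomic_add_f32`) are hypothetical, and the float is stored as its 32-bit pattern so that an integer CAS can be used. Comparing bit patterns instead of float values also means the loop still terminates when the stored value is a NaN.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret a float as its raw bits (and back) without aliasing issues. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical helper: atomically adds `value` to the float whose bits live
 * in *cell and returns the value that was stored before the addition. */
static float atomic_add_f32(_Atomic uint32_t *cell, float value)
{
    uint32_t old_bits = atomic_load(cell);
    for (;;) {
        /* x' = x + value, computed on the snapshot we last observed. */
        uint32_t new_bits = float_to_bits(bits_to_float(old_bits) + value);
        /* The CAS succeeds only if *cell still holds old_bits; on failure
         * old_bits is refreshed with the current contents and we retry. */
        if (atomic_compare_exchange_weak(cell, &old_bits, new_bits))
            return bits_to_float(old_bits);
    }
}
```

[The same shape carries over to GLSL by running `atomicCompSwap` on a `uint` and converting with `floatBitsToUint` / `uintBitsToFloat`.]
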
03:55ReinUsesLisp: I'll use the NV ext when available too :P
03:56imirkin: well that's just cheating!
03:56imirkin: if you're going to do nv exts, might as well just use NV_gpu_shader5
03:56imirkin: which basically lets you specify the various nasty nvidia ops directly
03:56ReinUsesLisp: I thought of using ARB, but it'll get messy to maintain
03:57imirkin: yeah, it's ARB-ish
03:57ReinUsesLisp: I already use NV_gpu_shader5 to emulate votes and shuffles, it's the only way to ensure it works just like in NX without using SSBOs
03:57imirkin: it'll effectively be a separate backend for you
03:57ReinUsesLisp: err NV_whatever_shuffles
03:57ReinUsesLisp: when not available I emulate it as if the GPU had a warp size of 1
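
[Note: one possible reading of "a warp size of 1" is sketched below in C. With a single lane every vote degenerates to the thread's own predicate, and a shuffle can only return the thread's own value. The function names are hypothetical and this is not taken from yuzu's source.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-lane fallbacks: the "warp" contains only this thread. */

/* "All lanes true" and "any lane true" both reduce to the lone predicate. */
static bool vote_all(bool pred) { return pred; }
static bool vote_any(bool pred) { return pred; }

/* The ballot mask has at most bit 0 set, since lane 0 is the only lane. */
static uint32_t vote_ballot(bool pred) { return pred ? 1u : 0u; }

/* A shuffle has no other lane to read from, so it yields the own value. */
static uint32_t shuffle_single_lane(uint32_t value) { return value; }
```
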
03:57imirkin: NV_shader_thread_shuffle presumably
03:58imirkin: have a browse through that :)
03:58ReinUsesLisp: I've already watched it over and over, I can't find what emits VOTE.VTG :P
03:58imirkin: what's VTG?
03:58ReinUsesLisp: it's some clip state machine trash
03:59imirkin: right, but ... i'm familiar with like VOTE.EQ
03:59ReinUsesLisp: vertex, tessellation, geometry
03:59imirkin: oh, i wonder if it's just vote for those stages?
03:59ReinUsesLisp: you can vote on any stage afaik
03:59ReinUsesLisp: it has a different encoding https://github.com/envytools/envytools/blob/master/envydis/gm107.c#L2043
04:00imirkin: yeah, but what it does on non-frag stages is questionable
04:00imirkin: huh. takes an immediate too...
04:00imirkin: have you ever encountered it in an actual program?
04:00ReinUsesLisp: one second
04:01imirkin: hopefully a non-frag stage?
04:02ReinUsesLisp: the immediate might be controlling something with the predicates
04:02imirkin: looks like a shuffle of some sort
04:02imirkin: i won't even ask wtf EXIT CC.FCSM_TR is
04:02ReinUsesLisp: FLOAT CLIP STATE MACHINE (TR???)
04:03imirkin: impressive that these get used. we never do.
04:03ReinUsesLisp: looks like Nintendo paid extra to Nvidia on Smash Ultimate
04:04imirkin: ok, so varyings 0x70..0x7c are gl_Position
04:04ReinUsesLisp: now that I think about it I think FCSM_TR was used in a game that used double vertex shaders
04:04imirkin: if gl_Position isn't set, that primitive effectively gets discarded
04:04ReinUsesLisp: doesn't it default to zero?
04:04imirkin: ok, so there's some shady VP_A and VP_B shader business
04:04imirkin: doubt it
04:04imirkin: i never understood what it really was
04:05imirkin: we just always use VP_B, iirc
04:05ReinUsesLisp: it looks like a D3D10 era compute
04:05ReinUsesLisp: so does NV
04:06ReinUsesLisp: thanks for supporting GL_ARB_compute_variable_group_size, we use that to emulate compute for now ^^
04:07imirkin: you could just do a lot of recompiles
04:07ReinUsesLisp: that'd invalidate people's transferable cache, which gets them angry
04:07imirkin: iirc we reduce the max allowed group size when it's variable
04:07ReinUsesLisp: I usually stack some breaking changes and push them together
04:08imirkin: you only get 512 on fermi, 1024 on kepler+ though
04:08imirkin: and iirc amd does something similar
04:08imirkin: basically there aren't an infinite number of regs on the chip, sadly
04:08imirkin: invocs * per-thread gpr usage <= N
04:11imirkin: 32K on fermi, 64K on kepler1, and i guess it got bumped up to 256K on kepler2+
04:11imirkin: or we limit to 64 regs on kepler2+ for compute with varying groups?
04:13ReinUsesLisp: I didn't investigate it since desktop Nvidia limits are NX limits (as long as NVN is not getting premium access to something)
04:13imirkin: hm. looks like 64k throughout, which means we limit compute to 64 regs on all gens if you use variable size
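
[Note: the constraint "invocs * per-thread gpr usage <= N" can be worked through with the numbers quoted above, a 64K register file and a maximum group size of 1024. The C helper below is a hypothetical illustration of that arithmetic, not driver code.]

```c
#include <assert.h>
#include <stdint.h>

/* Largest per-thread register count that still satisfies
 * invocations * regs_per_thread <= register_file_size. */
static uint32_t max_regs_per_thread(uint32_t register_file_size,
                                    uint32_t invocations)
{
    assert(invocations > 0);
    return register_file_size / invocations;
}
```

[With `register_file_size = 64 * 1024` and `invocations = 1024` this gives 64, which matches the 64-register limit mentioned for variable group sizes.]
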
04:14ReinUsesLisp: when you support half floats, here's how HSETP2 seems to work https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/shader/decode/half_set_predicate.cpp
04:14ReinUsesLisp: and an unmerged fixup https://github.com/yuzu-emu/yuzu/pull/2802/files#diff-fcf26e378dd9a62ff9a1127ccee983f8
04:16ReinUsesLisp: that code has a bug if a shader looks like this https://hastebin.com/owofehedey.bash
04:16ReinUsesLisp: but that's minor and doesn't matter on hardware
04:17imirkin: i did the h* ops based on nvdisasm, i could have messed something up btw
04:17imirkin: check carefully.
04:19ReinUsesLisp: not my work so I can't talk much about it yet, I'm running those tests on NX so if something's broken it will pop up
04:25imirkin: just looking randomly at the code...
04:25imirkin: doesn't seem to distinguish between SHR.U32 and SHR
04:25imirkin: i.e. arithmetic shift vs logical shift or whatever
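
[Note: the difference between the two right shifts can be sketched in C as follows, assuming SHR.U32 is the zero-filling (logical) form and plain SHR the sign-filling (arithmetic) one. The function names are hypothetical. The arithmetic variant is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.]

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeros. */
static uint32_t shr_logical(uint32_t value, uint32_t shift)
{
    assert(shift < 32);
    return value >> shift;
}

/* Arithmetic shift right: vacated high bits copy the sign bit (bit 31). */
static uint32_t shr_arithmetic(uint32_t value, uint32_t shift)
{
    assert(shift < 32);
    uint32_t logical = value >> shift;
    /* Mask covering the top `shift` bits; it is empty when shift == 0. */
    uint32_t high_bits = ~(0xFFFFFFFFu >> shift);
    return (value & 0x80000000u) ? (logical | high_bits) : logical;
}
```
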
04:26ReinUsesLisp: we don't implement SHL.W either
04:26imirkin: yeah, wrap is missing too :)
04:26imirkin: except glsl is wishy-washy on how it should work
04:26imirkin: so you get what you get there
04:26ReinUsesLisp: I skipped a beat when I saw FSETP missing a neg
04:26imirkin: we might even emit it in codegen
04:26ReinUsesLisp: not on envydis, it was missing on yuzu too
04:27imirkin: envydis ain't perfect
04:27imirkin: emitNEG (0x06, insn->src(1));
04:28imirkin: in there since the initial commit, so i guess ben just missed it
04:30imirkin: alright. time for me to go
04:30imirkin: lmk if you have any questions. even if i'm not around, i check logs.
04:31ReinUsesLisp: ok, see you!
04:31ReinUsesLisp: I'll be going to sleep too
04:52HdkR: Shift mask versus clamp is easy enough to emulate with doing it manually in the shader
04:53HdkR: Just mask or clamp the operand depending on instruction mode
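
[Note: a minimal C sketch of the "mask or clamp the operand" idea for a 32-bit left shift. The names are hypothetical, and the clamp variant assumes that a shift amount of 32 or more yields zero.]

```c
#include <stdint.h>

/* Wrap mode: only the low 5 bits of the shift amount are used. */
static uint32_t shl_wrap(uint32_t value, uint32_t shift)
{
    return value << (shift & 31u);
}

/* Clamp mode (assumed semantics): shifting by 32 or more pushes every
 * bit out, so the result is zero. The check also avoids the undefined
 * behaviour of a C shift by the full width or more. */
static uint32_t shl_clamp(uint32_t value, uint32_t shift)
{
    return shift >= 32u ? 0u : value << shift;
}
```
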
05:31PaulePanter: danvet: How should https://patchwork.freedesktop.org/series/65489/ be tested?
05:37PaulePanter: But the oldest card, I could find, is nv50.
10:06cosurgi: imirkin: !
10:06cosurgi: imirkin: I have it.
10:06cosurgi: It just crashed, and I have full backtrace from gdb
10:08cosurgi: imirkin: this is gdb 'bt full': https://paste.ubuntu.com/p/cChBxgTGVr/
10:09cosurgi: imirkin: this is Xorg.log: https://paste.ubuntu.com/p/rkzdSzDtdN/
10:09cosurgi: imirkin: this is dmesg: https://paste.ubuntu.com/p/PCwMgjYRwR/
10:10cosurgi: Same error: [3248211.193] nouveau_exa_upload_to_screen:380 - falling back to memcpy ignores tiling
10:10cosurgi: skeggsb, pmoreau !
10:12cosurgi:reconnects gdb to another Xorg session.
10:13cosurgi: BTW: is that the correct repository to pull from: git://anongit.freedesktop.org/nouveau/xf86-video-nouveau ?
10:13cosurgi: this is the one that I'm running right now.
10:13cosurgi: and against which I have the gdb backtrace.
10:13cosurgi: ec2b45d 'Bump version to 1.0.16'
13:48imirkin: cosurgi: you still have it in bt?
13:49imirkin: er, in gdb
13:49imirkin: if so, "p *bo"
13:49imirkin: anyways, the dst=0x7f1741a8a280 <error: Cannot access memory at address 0x7f1741a8a280> thing is ominous
13:49imirkin: of the bus error :)
13:51karolherbst: wild guess, the dst pointer got corrupted :p
13:51karolherbst: or was never set in the first place
14:07mooboat: Hi there. Is there a configuration option to change that will stop screen tearing?
15:15cosurgi: imirkin: unfortunately no
15:16cosurgi: next time I will try to not end it. I thought that my VTs got completely blocked. But I didn't check for sure.
15:16cosurgi: imirkin: this happened when I was waking up the screens from 'xset dpms force off' after the night.
15:17cosurgi: imirkin: the CPU cores were under heavy load for several days. So this happened (again) under heavy CPU load.
15:18cosurgi: imirkin: next time I will do "p *bo" whatever it does, you will get the output.
20:33cosurgi: imirkin: maybe this crash indeed is related to 'xset dpms force off' - something, some bitmap is freed when the screens are off. And somewhere else it is expected that it wasn't freed?