00:01 HdkR: Being a person that has the ARMv8 ARM open daily, it always makes me sad when I have to figure out where other documentation is in a form that isn't easily digestible
00:01 HdkR: I may be spoiled
00:02 imirkin_: like i said, it'd be pretty cool if we generated an intel-style ISA manual
00:02 imirkin_: but ... a ton of work
00:02 HdkR: yea, it's a ton of work that isn't very rewarding
13:48 pendingchaos: imirkin_: Yeah, it seems I didn't look closely enough at iadd3's various forms
13:48 pendingchaos: Can you point me to the limm form? I only see forms with 1 destination register and 3 operands.
13:50 pendingchaos: also: should the "neg" flag at bit 56 on https://github.com/envytools/envytools/blob/master/envydis/gm107.c#L2071 be called something like "sign"?
13:56 imirkin: pendingchaos: oh. i think i got confused. i saw iadd32i
13:56 imirkin: but that's not iadd3_32i ;)
13:57 imirkin: pendingchaos: the bit 56 thing - that's the sign-extend on the short-imm right?
13:57 imirkin: integer short imms are always (1,19) -- 19 low bits, and then 1 bit to fill the rest, either all 0 or all 1
13:57 pendingchaos: seems that way
13:58 pendingchaos: seems other instructions call it "neg"
13:59 pendingchaos: though it's a bit confusing for iadd3 since there is another flag for the operand called "neg"
13:59 imirkin: welllll ... neg and that bit are different
14:00 imirkin: since neg 1 = -1, but 0xfff80001 != -1 :)
14:05 imirkin: pendingchaos: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp#n332
14:05 imirkin: that's how that bit 56 is treated in practice
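A minimal sketch of the (1,19) rule described above, written as an encoder-side check. Only the field positions (19 low bits starting at bit 20, one fill bit at bit 56) come from the discussion; the helper name `emit_short_imm` and the bare `uint64_t` instruction word are hypothetical and are not the actual Mesa emitter code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: try to encode a 32-bit immediate in the short form.
 * The upper 13 bits must be all 0 or all 1; bit 56 records which one. */
static bool emit_short_imm(uint64_t *insn, uint32_t imm)
{
    uint32_t high = imm & 0xfff80000u;

    if (high != 0 && high != 0xfff80000u)
        return false;                          /* does not fit the short form */

    *insn |= (uint64_t)(imm & 0x7ffffu) << 20; /* 19 low bits at bit 20 */
    if (high)
        *insn |= 1ull << 56;                   /* fill bit: upper bits all 1 */
    return true;
}

/* The example from the chat: 0xfff80001 has low bits 1 with the fill bit
 * set, which is not the same thing as negating 1. */
static void short_imm_examples(void)
{
    uint64_t a = 0, b = 0;

    assert(emit_short_imm(&a, 0xfff80001u));
    assert(emit_short_imm(&b, (uint32_t)-1));
    assert(a != b);
}
```

In this sketch the fill bit is purely a sign-extension of the stored field, which is why it behaves differently from an operand-level "neg" modifier.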
14:06 * pendingchaos nods
14:07 imirkin: and it's the same on all fermi+ generations
14:07 imirkin: (i recently fixed up the other emitters, they were getting it slightly wrong for quite a long time)
14:10 pendingchaos: I think I'll call bit 56 "sign" or something in the int.rst and state that it's called "neg" in gm107.c
14:13 imirkin: hmmmmmm
14:13 imirkin: we really should teach envydis how to intelligently represent that
14:13 imirkin: i think it already knows
14:14 imirkin: we just don't use that knowledge
14:14 imirkin: hold on....
14:15 imirkin: didn't know about the latter 2
14:16 imirkin: yeah ok, this is doable.
14:19 imirkin: in the gm107 emitter
14:19 imirkin: this should read
14:19 imirkin: instead of
14:19 imirkin: static struct rbitfield u1920_bf = { { 20, 19 }, RBF_UNSIGNED };
14:19 imirkin: it should be
14:19 imirkin: static struct rbitfield u1920_bf = { { 20, 19, 56, 1 }, RBF_SIGNED };
14:20 imirkin: and nuke all the ON(56, neg) bs
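For the decoder side, a rough sketch of what treating `{ 20, 19, 56, 1 }` as one signed field could mean. The `struct split_field` type and `extract_field` function are hypothetical stand-ins for illustration and do not reproduce envydis's `struct rbitfield` or its handling code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a field made of two pieces of the
 * instruction word; the second piece supplies the high bits. */
struct split_field {
    unsigned lo_pos, lo_len;   /* low piece: start bit and length */
    unsigned hi_pos, hi_len;   /* high piece; hi_len == 0 means absent */
    int is_signed;             /* sign-extend from the top bit if nonzero */
};

static int64_t extract_field(uint64_t insn, const struct split_field *f)
{
    unsigned total = f->lo_len + f->hi_len;
    uint64_t lo, hi, raw;

    assert(total > 0 && total < 63);

    lo = (insn >> f->lo_pos) & ((1ull << f->lo_len) - 1);
    hi = f->hi_len ? (insn >> f->hi_pos) & ((1ull << f->hi_len) - 1) : 0;
    raw = lo | (hi << f->lo_len);              /* concatenate the pieces */

    if (f->is_signed && ((raw >> (total - 1)) & 1))
        return (int64_t)raw - ((int64_t)1 << total);  /* sign-extend */
    return (int64_t)raw;
}
```

With a signed `{ 20, 19, 56, 1 }` description, an instruction word that has bit 56 set and the value 1 in the low piece decodes to -524287, whose 32-bit pattern is 0xfff80001. That matches the point that a separate "neg" flag printed next to an unsigned 19-bit value would misrepresent the immediate.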
14:20 pendingchaos: what does the "u" mean btw?
14:20 imirkin: unsigned ;)
14:21 imirkin: but it's just a variable name
14:21 imirkin: a global U19_20 -> S19_20 would be fine by me
14:21 imirkin: since it really is signed, even for like AND
14:23 pendingchaos: wouldn't S20_20 be better? since it's a 20 bit integer, not 19
14:24 imirkin: fine by me :)
14:25 imirkin: and then replace "ON(56, neg), U19_20" with "S20_20"
14:25 pendingchaos: I'll start working on a PR
14:26 imirkin: should probably triple-check it with a few ops, but i'm 99.999% sure that's right
14:27 imirkin: Lyude: friendly reminder re DP-MST info gathering
14:30 pendingchaos: I think I'll also do U09_20, it seems to be used similarly
14:32 pendingchaos: (or maybe not, I'll have to look a bit closer)
14:35 imirkin: pendingchaos: doubtful...
14:36 imirkin: i added that for the half-immediates
14:37 imirkin: so it's actually a sign situation, pretty sure.
14:37 imirkin: but feel free to check
14:37 imirkin: i could have missed something when playing with it
14:47 pendingchaos: should I also change the { 20, 19 } on line 243 and line 244 to { 20, 19, 56, 1 } and remove the ON(56, neg) for F19_20 and D19_20 too?
14:48 imirkin: no
14:48 imirkin: er
14:48 imirkin: hm
14:48 imirkin: yes. but leave them as RBF_UNSIGNED.
14:49 pendingchaos: yeah, that would make sense
14:49 imirkin: (coz they have the shr thing)
15:21 ClaudiusMaximus: having an issue with texture tearing, my code does glTexSubImage2D();glGenerateMipmap(); each frame, but sometimes it is scrambled
15:22 ClaudiusMaximus: https://mathr.co.uk/tmp/nouveau/ has some screenshots, 03.png is "ok", the rest are corrupt
15:22 imirkin: GPU?
15:22 ClaudiusMaximus: 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce G 105M] (rev a1)
15:23 imirkin: are you using a TBO?
15:23 ClaudiusMaximus: don't think so
15:23 imirkin: do you have a trace i can repro with?
15:23 imirkin: (apitrace)
15:23 ClaudiusMaximus: i can make one, if you tell me how?
15:24 imirkin: btw, i'm sure you know this, but glGenerateMipmap is a horribly expensive operation
15:24 ClaudiusMaximus: i know
15:24 imirkin: https://github.com/apitrace/apitrace
15:24 imirkin: apitrace dump foo-program
15:24 imirkin: xz -9 foo-program.trace
15:24 imirkin: and make that available.
15:24 imirkin: er
15:24 imirkin: apitrace trace foo-program
15:26 ClaudiusMaximus: ok, building it now
15:49 imirkin: pendingchaos: don't forget to reorder S20_20
15:51 pendingchaos: should be done
15:51 ClaudiusMaximus: imirkin: https://mathr.co.uk/tmp/nouveau/sketchy.trace.xz 33MB
15:52 imirkin: ClaudiusMaximus: fetching
15:56 imirkin: ClaudiusMaximus: i see issues on my GF108 too
16:00 imirkin: ClaudiusMaximus: ok, so looks like every frame is basically "read fbo, texsubimage into texture 0, draw"?
16:01 imirkin: s/fbo/winsys fb/
16:03 imirkin: ClaudiusMaximus: have you identified whether it's the readback that's wrong?
16:04 ClaudiusMaximus: imirkin: each frame it reads stdin, texsubimage, genmipmap, draw, readback, outputs to stdout
16:04 ClaudiusMaximus: imirkin: it's displayed badly on screen too, so i think the readback is ok
16:04 imirkin: oh ok. apitrace splits up the frames slightly differently, but whatever.
16:05 imirkin: are you 100% sure that the texsubimage data isn't messed up?
16:05 ClaudiusMaximus: you mean before uploading to gpu? pretty sure, and i never had this issue with the evilblob
16:05 imirkin: coz it looks like the corruption has ended up in the trace
16:06 imirkin: are you perhaps overwriting the underlying data
16:07 imirkin: from what i'm seeing, the data for frame 41 is wrong. if you look at the "clean" data, it's also shifted
16:08 imirkin: https://i.imgur.com/5QrjPtA.png
16:08 ClaudiusMaximus: i'm allocating a new buffer each frame it seems (this is ghc haskell, so allocaBytes is cheap)
16:08 imirkin: ok. well the data in the trace is wrong. apitrace just reads in whatever is sent via API
16:09 ClaudiusMaximus: ok, that does look wrong :(
16:09 imirkin: so i think something's overwriting something
16:09 imirkin: or there's a missing offset
16:09 imirkin: or ... something
16:10 ClaudiusMaximus: let me debug some more, you can assume the bug is on my side for now i think
16:17 ClaudiusMaximus: imirkin: well, ffmpeg | pnmsplit shows no broken frames; i added checks to ensure i'm reading 100% valid ppm headers in case it got out of sync, all fine there
16:18 imirkin: ClaudiusMaximus: yeah, i suspect that the buffer you're using to feed into glTexSubImage is somehow getting overwritten
16:18 imirkin: before glTexSubImage has returned
16:19 ClaudiusMaximus: imirkin: mm, that would be a severe bug in something if so
16:19 ClaudiusMaximus: imirkin: i'll try a different program with similar tearing issues (that one is in C, sketchy is haskell)
16:20 imirkin: you're not using a PBO are you? (i don't think i saw one, but just checking)
16:22 imirkin: anyways, when replaying the trace, i see the same issues with llvmpipe
16:24 ClaudiusMaximus: no pbo, this is naive code i suppose
16:25 imirkin: yeah, just trying to think about mitigating factors
16:25 ClaudiusMaximus: https://code.mathr.co.uk/clive/blob/refs/heads/thsf-lab:/src/visuals.c is my other program (uploading a trace with this one now)
16:26 imirkin: ok, so no threading
16:27 ClaudiusMaximus: https://mathr.co.uk/tmp/nouveau/visuals.trace.xz 22MB, meant to be "fly vision" style stuff (hexagon tiling over webcam input)
16:29 imirkin: stupid question ... can you print out buf.length?
16:29 imirkin: in read_webcam()
16:29 imirkin: coz you memcpy() buf.length, but it's laid out s.t. it *has* to be tightly packed
16:29 imirkin: oh, but you don't flip
16:30 imirkin: anyways, i'd feel better if you printed buf.length
16:30 imirkin: or had an assert that buf.length == w * h * 3
16:30 imirkin: around line 550
16:31 ClaudiusMaximus: 921600
16:31 ClaudiusMaximus: constant
16:31 ClaudiusMaximus: even when the glitch occurs
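A small sketch of the size check suggested above. The function names are hypothetical, and the 640x480 RGB frame size is an assumption (it happens to match 921600 = 640 * 480 * 3; the actual dimensions are not stated in the log). The row-stride helper reflects the fact that OpenGL's GL_UNPACK_ALIGNMENT defaults to 4, so rows whose byte length is not a multiple of 4 are expected to be padded unless the alignment is changed.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes GL expects per source row for a given GL_UNPACK_ALIGNMENT
 * (valid values are 1, 2, 4 and 8). */
static size_t unpack_row_stride(size_t width, size_t bytes_per_pixel,
                                size_t alignment)
{
    size_t row = width * bytes_per_pixel;

    assert(alignment == 1 || alignment == 2 ||
           alignment == 4 || alignment == 8);
    return (row + alignment - 1) & ~(alignment - 1);  /* round up */
}

/* True if the buffer length matches a tightly packed image. */
static int is_tightly_packed(size_t length, size_t width, size_t height,
                             size_t bytes_per_pixel)
{
    return length == width * height * bytes_per_pixel;
}
```

Under the assumed 640x480 RGB layout, the row length is 1920 bytes, already a multiple of 4, so a tightly packed buffer of 921600 bytes would agree with the default unpack alignment. A constant, correct length does not rule out the contents being shifted or overwritten.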
16:32 imirkin: hrmph
16:32 ClaudiusMaximus: gotta go, may be back in a couple of hours
16:34 imirkin: yeah, i'm looking at the texturesubimage data - it's just wrong there
16:34 imirkin: it really feels like something off in your v4l api usage
16:34 imirkin: or something.
16:35 ClaudiusMaximus: i don't remember these issues with evilblob drivers though
16:35 ClaudiusMaximus: bye for now
21:59 RSpliet: imirkin_: .CC is just carry? Not all condition codes (divide-by-zero, overflow, carry, zero)?
22:00 imirkin: on tesla, there were flags registers
22:00 imirkin: which had multiple bits
22:00 imirkin: on fermi+, it's a single-bit
22:00 imirkin: set by whatever you want