00:08 RSpliet: karolherbst: mind elaborating on the hilarity?
00:08 karolherbst: 450: is dead, but an opt kind of brings it back
00:09 airlied_: imirkin_: uggh it's going to be in the atomic conversion isn't it :-P
00:09 karolherbst: and then it generates another set and which ands 450 and 457
00:10 RSpliet: Why is it dead? 451 still reads it?
00:10 karolherbst: the movs are dead inside BB:17 and BB:18
00:10 karolherbst: there was a phi reading those
00:10 karolherbst: but it is gone
00:11 karolherbst: the phi was replaced by 457
00:11 karolherbst: 457 is basically a merge of 450 + the BB:17 + BB:18
00:12 RSpliet: ah, but the BBs contents (and thus the BBs themselves) aren't hoovered up yet
00:13 RSpliet: "early DCE" sounds fancier than it is though, you'll need to do the right analysis too. Not sure if it's already done at that stage
00:13 karolherbst: no clue
00:13 karolherbst: opts shouldn't break stuff
00:13 RSpliet: opts are always best effort, they can diminish performance in rare cases
00:13 karolherbst: now I have a loadpropagation generating "183: mad u32 $r4 $r10 0x10000000 $r4"
00:13 karolherbst: no
00:13 karolherbst: I meant _breaking_
00:13 karolherbst: not reducing perf
00:14 RSpliet: yes that bit is not recommended no...
00:15 karolherbst: uhm, lol
00:15 karolherbst: "212: mov u32 $r3 0x10000000 + 213: mad u32 $r2 $r13 $r3 $r2" ==> 183: mad u32 $r4 $r10 0x10000000 $r4
00:15 karolherbst: that's illegal, isn't it?
00:16 karolherbst: I should read the SSA forms more often though
00:16 imirkin: i think that's legal since dst == src2
00:16 imirkin: there's a FFMA32I, probably a MAD32I as well
00:16 karolherbst: hum
00:16 imirkin: although i don't think we implement that
00:17 karolherbst: okay
00:17 karolherbst: makes sense
00:17 dboyan_: karolherbst: I've seen your segfault, quite strange, it has nothing to do with shaders
00:17 karolherbst: void nv50_ir::CodeEmitterNVC0::setImmediate(const nv50_ir::Instruction*, int): Assertion `(u32 & 0xfff00000) == 0 || (u32 & 0xfff00000) == 0xfff00000' failed
00:17 karolherbst: dboyan_: I got a different one later on
00:18 dboyan_: Oh, god. I guess I've messed up something.
00:18 karolherbst: it's compute related I am sure
00:18 karolherbst: it hinted in the traces
00:18 karolherbst: maybe related to samplers and something
00:19 RSpliet: karolherbst: it only does sext 21-bit vals?
00:19 karolherbst: it seems so
00:19 RSpliet: oh, I guess for int that's not too bad
00:20 RSpliet: (although sexting an uint is a bit useless)
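The assertion quoted above can be read as a range check: bits 31..20 must be all clear or all set, which is the same as saying the value survives sign extension from 21 bits. A minimal sketch of that reading follows; `passesImmAssert`, `sext21` and `roundTrips` are hypothetical helper names, not part of nv50_ir, and only the masked comparison itself is taken from the assertion text.

```cpp
#include <cstdint>

// Same condition as the assertion in setImmediate(): bits 31..20 are
// either all clear or all set.
constexpr bool passesImmAssert(uint32_t u32)
{
    return (u32 & 0xfff00000u) == 0 || (u32 & 0xfff00000u) == 0xfff00000u;
}

// Hypothetical helper: sign-extend the low 21 bits of a value to 32 bits.
constexpr uint32_t sext21(uint32_t u32)
{
    return (u32 & 0x00100000u) ? ((u32 & 0x001fffffu) | 0xffe00000u)
                               : (u32 & 0x001fffffu);
}

// A value passes the assertion exactly when sign-extending its low
// 21 bits reproduces the original value.
constexpr bool roundTrips(uint32_t u32) { return sext21(u32) == u32; }

static_assert(!passesImmAssert(0x10000000u), "immediate from the mad above");
static_assert(passesImmAssert(0x000fffffu), "largest positive value accepted");
static_assert(passesImmAssert(0xfff00000u), "most negative value accepted");
static_assert(!passesImmAssert(0x00100000u), "one past the positive limit");
static_assert(roundTrips(0xfffffff0u) == passesImmAssert(0xfffffff0u), "");
```

So 0x10000000 is rejected simply because bit 28 is set while the rest of the upper bits are clear.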
00:20 dboyan_: karolherbst: Maybe not, I've also touched other places. I'll check carefully today
00:20 karolherbst: dboyan_: k
00:22 karolherbst: imirkin: any idea how to deal with that? teach insnCanLoad not to allow that or try to find a limm form?
00:22 karolherbst: well
00:22 karolherbst: we need to teach insnCanLoad anyway
00:23 imirkin: karolherbst: it shouldn't be allowed as-is...
00:23 imirkin: karolherbst: iirc you were playing around with some patches to enable it
00:24 karolherbst: no
00:24 karolherbst: it was FMA related
00:24 karolherbst: and I just copied what we do with MAD
00:27 karolherbst: and my postraloadpropagation stuff was for f32 only
00:31 karolherbst: imirkin: gdb tells me otherwise: https://gist.github.com/karolherbst/e7a8dcb2844c8f159e2b78b79344d642
00:32 karolherbst: s is 1 of course
00:32 imirkin: karolherbst: that seems wrong.
00:33 karolherbst: "opInfo[i->op].immdBits != 0xffffffff || typeSizeof(i->sType) > 4" is false
00:34 karolherbst: I guess the "opInfo[i->op].immdBits != 0xffffffff" part needs to be true
00:36 karolherbst: opInfo is {variants = 0x0, op = nv50_ir::OP_MAD, srcTypes = 1024, dstTypes = 1024, immdBits = 4294967295, srcNr = 3 '\003', srcMods = "\002\002\002", dstMods = 4 '\004', srcFiles = {2, 98, 66}, dstFiles = 2, minEncSize = 4, vector = 0,
00:36 karolherbst: predicate = 1, commutative = 1, pseudo = 0, flow = 0, hasDest = 1, terminator = 0}
00:43 karolherbst: anyway, got to sleep
00:45 shuffle2: does a falcon code page being marked "secret" give it special capabilities, or does it just try to prevent code readout?
00:46 shuffle2: and, where can i find actual info about the crypto extensions?
01:06 imirkin: shuffle2: not sure about secret. if you're talking about the "secure" modes, then yes, HS and LS code has more capabilities than unverified code
01:06 shuffle2: i mean when you copy code via CODE_INDEX with secure bit set
01:06 imirkin: sadly things like adjusting fan speeds require having signed code
01:06 imirkin: but the actual reclocking bit of it doesn't :)
01:07 imirkin: [not sure what the restrictions are on pascal]
01:07 shuffle2: weird
01:07 imirkin: shuffle2: sorry, i'm not intimately familiar with it... there's a doc nvidia put out about how it all works
01:07 imirkin: have you looked at it already?
01:07 shuffle2: no..all i found was envytools
01:08 imirkin: k, sec
01:08 Horizon_Brave: anything fun from nvidia... = signed code xD
01:08 imirkin: shuffle2: ftp://download.nvidia.com/open-gpu-doc/Falcon-Security/1/Falcon-Security.html
01:09 shuffle2: ah i have seen that actually
01:09 shuffle2: was hoping for like details of the crypto instructions :P
01:09 imirkin: what crypto instructions?
01:09 imirkin: you mean for the HDCP video decoding stuff?
01:10 shuffle2: just generally better details about the actual instruction set
01:11 shuffle2: https://envytools.readthedocs.io/en/latest/hw/falcon/crypt.html#falcon-crypt
01:11 imirkin: the falcon isa is fairly well documented in envydis
01:11 imirkin: i think mwk might have more info on that.
01:11 shuffle2: if i grep for "lbra" for example there is only one hit...not much help
01:11 shuffle2: what does the l prefix mean :p
01:12 imirkin: probably long
01:12 imirkin: envydis/falcon.c: { 0x0000003e, 0x000000ff, OP4B, N("lbra"), LLBTARG, .fmask = F_FUC4P },
01:12 imirkin: takes a LLBTARG
01:13 imirkin: as opposed to the other variants of BTARG that the other ops take
01:13 shuffle2: also i've noticed in a few places after "ret" insns, there are single 0x01 bytes
01:13 shuffle2: which are just padding afaik
01:14 shuffle2: but the disasm doesn't handle them correctly :(
01:14 shuffle2: i'm working around it, just annoying :p
01:14 imirkin: does it matter? could be data bytes too...
01:14 shuffle2: it's just annoying because then you have to separately disasm starting from the offset where the disasm broke
01:15 shuffle2: then it can't find calls and stuff
01:15 shuffle2: i do seem to have hit 2 not-so-easily explained unknown instructions
01:15 shuffle2: is that just a thing that happens
01:16 shuffle2: f5 3c 00 e0 ??? [unknown: 00 00 00 e0] [unknown instruction]
01:16 shuffle2: i guess this one should xor/clear something
01:17 imirkin: are you supplying the proper -V fuc5 or whatever?
01:17 shuffle2: afaik yes, i think it looks the best out of all i tried
01:17 imirkin: { 0x00003cf5, 0x00003fff, T(ol0), T(cocmd), .fmask = F_CRYPT },
01:18 shuffle2: -m falcon -V fuc5 -F crypt
01:18 shuffle2: is what i pass
01:18 imirkin: =/
01:19 shuffle2: f5 3c 00 ac cxor $c0 $c0
01:19 shuffle2: for example that decodes fine
01:19 imirkin: oh right. tabcocmd doesn't have 0xe0 in it
01:20 imirkin: that OOPS line could be enhanced if one wanted
01:20 imirkin: but wtvr
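A sketch of why both byte sequences reach the same table entry, assuming the first two fields of an envydis entry are a value and a mask compared as `(opcode & mask) == value`, and that the instruction bytes are assembled little-endian. `Entry`, `matches` and `le32` are hypothetical names for illustration; the contents of the tabcocmd sub-table are not reproduced here.

```cpp
#include <cstdint>

// Hypothetical, stripped-down table entry: only the value/mask pair.
struct Entry { uint32_t value; uint32_t mask; };

// An entry applies when the masked opcode bits equal the entry's value.
constexpr bool matches(uint32_t opcode, Entry e)
{
    return (opcode & e.mask) == e.value;
}

// Little-endian assembly of the first four instruction bytes.
constexpr uint32_t le32(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return uint32_t(b0) | uint32_t(b1) << 8 | uint32_t(b2) << 16 | uint32_t(b3) << 24;
}

// Value and mask copied from the F_CRYPT entry quoted above.
constexpr Entry cryptEntry{0x00003cf5u, 0x00003fffu};

static_assert(matches(le32(0xf5, 0x3c, 0x00, 0xac), cryptEntry), "cxor $c0 $c0");
static_assert(matches(le32(0xf5, 0x3c, 0x00, 0xe0), cryptEntry), "the unknown one");
```

Both instructions pass the top-level match, so the difference has to come from the sub-table lookup on the remaining bits (0xac versus 0xe0 in the last byte), which fits the "[unknown: 00 00 00 e0]" output.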
01:47 Horizon_Brave: Hey check it out... this is really neat, it gives you the packages and description of drivers and firmware that you need to work with different Intel graphics cards and chips..
01:47 Horizon_Brave: https://01.org/linuxgraphics/downloads/stack
01:47 Horizon_Brave: you guys probably already know about it
02:02 nyef: Looks like the intel driver passes a flag to drm_mode_set_crtcinfo() to account for frame-packing modes, and uses drm_crtc_get_hv_timing() in a couple of places to obtain the "real" output resolution to use.
02:03 imirkin: here's a suggestion: copy what the only other working driver does :)
02:03 nyef: Sure, great idea. But the two drivers have a sufficiently different structure that I fail to see *how* to do that.
02:03 Horizon_Brave: :( you had to take it to a level way over my head lol
02:04 imirkin: nyef: well, not necessarily the code
02:04 imirkin: but at least the logical implications
02:04 nyef: Beyond throwing in the extra flag for drm_mode_set_crtcinfo(), which was obvious enough.
02:05 nyef: That'd be fine, except that I don't understand either driver enough to be able to say where some of these things apply.
02:06 nyef: I can see the way that these calls account for FP modes, but I don't understand where they need to be used in nouveau.
02:06 imirkin: check nv50_display.c
02:06 imirkin: that configures all this stuff
02:06 imirkin: and sends it down with evo_mthd/data commands
02:06 imirkin: evo = the fifo pushbuf for display engine configuration
02:10 imirkin: in case you get curious, here are docs about the methods available with evo: ftp://download.nvidia.com/open-gpu-doc/Display-Class-Methods/1/
02:11 imirkin: the cl91* are the gk104 ones
02:16 nyef: ... what the?
02:17 nyef: Why on earth is this even calling drm_mode_set_crtcinfo()?
02:17 imirkin: why not
02:18 nyef: Because it doesn't look like any of the parameters that the function sets actually get used.
02:19 imirkin: definitely does look like the sort of thing that should be called *first*
02:19 imirkin: rather than *last*
02:19 imirkin: without delving into impls
02:19 nyef: Exactly.
02:20 nyef: So, unless some other machinery is pulling from these mode->crtc_* fields, it's effectively useless.
02:20 imirkin: well, i'm guessing that interlaced settings aren't exactly popular
02:21 imirkin: unfortunately modesetting is extremely finicky, and these types of "i know better!" approaches have a tendency to backfire =/
02:21 nyef: Only hits I'm getting within nouveau are for dispnv04?
02:22 imirkin: pre-nv50 is very different than post-nv50
02:22 nyef: Right, and I'm not concerned with pre-nv50.
02:22 imirkin: that said nv50_display.c does seem to check for interlace every so often
02:23 imirkin: so yeah - nv50_head_atomic_check_mode - that causes stuff to get divided by ilace
02:23 imirkin: so i suspect that the CRTC_INTERLACE_HALVE_V thing isn't necessary
02:23 imirkin: should be easy to check with a TV - that definitely supports interlaced modes
02:26 nyef: Okay, it's for the IRQ timing damage. That's why we're calling drm_mode_set_crtcinfo().
02:28 imirkin: anyways, i'd definitely support a change to get rid of the ilace/vscan things and have set_crtcinfo fix them up
02:30 nyef: Heh. I was just thinking that I was sorely tempted to move the set_crtcinfo to the top, and derive from the crtc_ parameters instead of the bare parameters.
02:30 imirkin: ok, so we're in agreement =]
02:31 nyef: Yes-and-no. What I'm *actually* likely to do is to slap a FIXME comment to the effect that it might be a good idea, and then duplicate out the compensation logic for frame packing.
02:32 nyef: Then possibly have a separate commit to clean up the mess, since it doesn't actually have to do with stereoscopy.
02:33 nyef: And I don't want a potentially-breaking commit in the middle of a feature series that I'm trying to get merged.
02:34 imirkin: put it first to switch to moving the info up front? dunno
02:36 nyef: Might do the same commit two different ways, then.
02:37 nyef: Okay, yeah, this vscan / ilace stuff has to die.
03:24 nyef: Progress. The panel no longer rejects the mode as being unsupported.
03:25 nyef: And it reports that the signal is a 3D signal.
03:25 nyef: But the display clearly shows the bottom half of one eye image at the top of the screen, the top half of another at the bottom of the screen, and a black band between the two.
03:31 imirkin: off-by-one
03:31 imirkin: i wonder if there's some y-flip going on?
03:32 imirkin: errr wait
03:32 imirkin: ok, it thinks that it's a legit 2200-high mode?
03:33 nyef: The panel does. But I'm thinking that the driver is thinking that it has a 2200-high framebuffer for a 1080-high mode.
03:33 imirkin: yeah, there's probably some confusion between those things
03:34 nyef: So the next step is to track that down.
03:34 imirkin: so ... there's an "image"
03:34 nyef: Well, no. The next step is to clean up the mess from what I've done already. *Then* to track that down.
03:34 imirkin: which is the fb
03:35 imirkin: and then separately there's a mode, which is ... the mode
03:37 shuffle2: also about falcon... "Registers starting from 0x400/0x10000 are engine-specific and described in engine documentation."
03:37 shuffle2: where would that engine documentation be
03:38 imirkin: nyef: so i'd have a very careful look at nv50_head_core_set() vs nv50_base_image_set()
03:38 imirkin: shuffle2: envytools/rnndb
03:39 imirkin: shuffle2: the falcons are embedded inside various engines
03:39 imirkin: and they have access to various things provided by those engines
03:39 nyef: imirkin: Will do, once I have this drm_mode_set_crtcinfo() thing worked up.
03:39 shuffle2: i'm looking for tsec
03:40 imirkin: nyef: also there's a NV917D_HEAD_SET_RASTER_VERT_BLANK2 which has a YSTART and YEND... coincidence?
03:41 nyef: That's for interlace, AIUI.
03:43 imirkin: nyef: oh, i wonder if the nv50_head_view() stuff needs adjusting
03:44 imirkin: [well, the values that set the view.oH/iH stuff]
03:45 nyef: imirkin: Does http://paste.lisp.org/display/341958 look good to you?
03:46 imirkin: nyef: i didn't look SUPER carefully, but yeah
03:46 nyef: Okay, I'll get that sent to the nouveau list... "soon".
03:46 imirkin: drm_mode_set_crtcinfo will apply the dblscan stuff by default, right?
03:46 nyef: Right.
03:46 imirkin: you don't need to set a flag for it
03:46 imirkin: ok cool
03:46 nyef: There's a disable flag for it, unlike the interlace stuff.
03:47 nyef: There's something like two disable flags and two enable flags, for some reason.
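A rough sketch of the vertical-timing adjustments being discussed, written from memory of how drm_mode_set_crtcinfo() behaves and not copied from the kernel. The structs, the `adjust` function and the flag values are hypothetical; the flag names only mirror what I believe the kernel calls CRTC_INTERLACE_HALVE_V, CRTC_NO_DBLSCAN, CRTC_NO_VSCAN and CRTC_STEREO_DOUBLE. The 1080p timings are the usual CEA values and are an assumption as well.

```cpp
// Hypothetical, reduced stand-in for the vertical timing fields of a mode.
struct VTiming { int vdisplay, vsync_start, vsync_end, vtotal; };

// Hypothetical summary of the mode properties that matter here.
struct ModeProps { bool interlace; bool dblscan; bool framePacking; int vscan; };

// Two opt-in flags and two opt-out flags; numeric values are arbitrary.
enum AdjustFlags : unsigned {
    INTERLACE_HALVE_V = 1u << 0, // opt-in: halve vertical timings for interlace
    NO_DBLSCAN        = 1u << 1, // opt-out: skip the doublescan doubling
    NO_VSCAN          = 1u << 2, // opt-out: skip the vscan multiplication
    STEREO_DOUBLE     = 1u << 3, // opt-in: account for frame packing
};

static VTiming scaled(VTiming t, int mul, int div)
{
    return { t.vdisplay * mul / div, t.vsync_start * mul / div,
             t.vsync_end * mul / div, t.vtotal * mul / div };
}

VTiming adjust(VTiming t, ModeProps m, unsigned flags)
{
    if (m.interlace && (flags & INTERLACE_HALVE_V))
        t = scaled(t, 1, 2);
    if (m.dblscan && !(flags & NO_DBLSCAN))
        t = scaled(t, 2, 1);
    if (m.vscan > 1 && !(flags & NO_VSCAN))
        t = scaled(t, m.vscan, 1);
    if (m.framePacking && (flags & STEREO_DOUBLE)) {
        // Frame packing stacks a second frame below the first one,
        // separated by the original vertical blanking.
        const int total = t.vtotal;
        t.vdisplay += total;
        t.vsync_start += total;
        t.vsync_end += total;
        t.vtotal += total;
    }
    return t;
}

// Example data (assumed CEA-style 1080p timings).
const VTiming k1080p = {1080, 1084, 1089, 1125};
const ModeProps kFramePacked = {false, false, true, 1};
const ModeProps kInterlaced = {true, false, false, 1};
const ModeProps kDoubleScan = {false, true, false, 1};
```

With these numbers a frame-packed 1080p mode ends up with 1080 + 1125 = 2205 active lines, which is roughly the "2200-high" mode mentioned earlier, while the framebuffer for each eye stays 1080 lines high.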
03:48 imirkin: yeah wtvr. i'm sure there's a semi-good reason.
03:50 nyef: ... Is this view.[io][HW] an image scaler?
03:51 imirkin: i think so
03:51 imirkin: and it might be doing some scaling :)
03:52 imirkin: btw, there's a debug mode
03:52 imirkin: where you can get all the info
03:52 imirkin: being written into display regs
03:54 nyef: I think I have this figured.
03:54 nyef: ... If the network connection to my test machine will come back.
03:56 nyef: Hrm.
03:57 nyef: This may be a bit sketchy for a while.
04:14 nyef: Success!
04:17 nyef: Some small tidy-up to do, and I'm not at all convinced that I found all the bits that need adjusting for frame-packing, but the basics work.
04:17 nyef: Oh, and I still need to test gt215 + audio + 3D.
04:18 imirkin: cool
04:20 nyef: So, here's hoping to see it in 4.12, right? (-:
04:21 imirkin: yep!
04:34 nyef: Okay, some superficial digging suggests that the X11 protocol can handle non-square pixels. That's a start.
05:01 nyef: ... Multi-Buffering was superseded by Double-Buffering, which doesn't allow for stereoscopy.
05:02 nyef: Guess the next thing to look at is DRI2?
05:02 imirkin: you probably want to talk to people who have already thought this through
05:03 imirkin: i'd specifically advise speaking with ajax
05:03 imirkin: (who's not here, but he's in #dri-devel)
05:03 imirkin: it's likely that airlied_ will have something to say on the matter as well
05:04 nyef: I asked airlied_ a few things about the userland stuff within the past week. I should find and read over the logs of that as well.
05:06 imirkin: ajax = the last man who cares about GLX :)
05:06 airlied_: scary and very true
05:08 nyef: Heh. I was _just_ looking for the GLX protocol specs, to see if they allow for stereoscopy. (-:
05:08 imirkin: sure they do
05:08 imirkin: nvidia did it
05:12 nyef: I think that I've done enough for tonight.
05:14 nyef: And I should re-assess my plans tomorrow, now that the kernel side of things is mostly in shape.
11:04 dboyan: karolherbst: I guess my mind was quite dumb at night. I found I forgot to flip on caching for compute program in the code I gave you yesterday.
11:04 dboyan: karolherbst: So I wonder if you are sure that the new change has actually caused those segfaults and assertion failures.
11:13 karolherbst: dboyan: the crashes only happened on the second+ run
11:14 karolherbst: first run where the cache was built was fine
11:14 karolherbst: the second crashed
11:14 dboyan: So it is very reproducible?
11:14 karolherbst: yes
11:15 dboyan: mmh, I'll devise some way to find out the problem then
11:46 karolherbst: hum, this looks wrong:
11:46 karolherbst: 68: set or u32 %r1392 neu 0x00000000 %r1384 %p3785 + 82: set u32 %r1413 eq %r1392 %r1412 -> 82: set or u32 %r1413 equ 0x00000000 %r1384 %p3785
11:46 karolherbst: shouldn't that be 82: set or u32 %r1413 neu 0x00000000 %r1384 %p3785?
11:47 karolherbst: additional info: 81: mov u32 %r1412 0x00000000
11:48 karolherbst: ohh no, it is fine
11:51 dboyan: karolherbst: btw, do you think it convenient that you give me an apitrace that triggers the shader cache bug?
11:52 karolherbst: can't do
11:52 karolherbst: I think
11:52 dboyan: okay
11:52 karolherbst: it uses buffer storage, so it is just black anyway
11:52 karolherbst: no idea how reliable that would be
11:58 dboyan: karolherbst: Then could you check if https://github.com/dboyan/mesa/tree/nouveau-cache-old work when you have time?
14:29 dboyan: imirkin: do you think we want to merge the glsl/tgsi cache to nouveau? If so, I'll send an updated version of that patch to fix a warning (actually a seemingly harmless error on my side)
14:30 imirkin: dboyan: if you think it's worth merging, sure, i can have a closer look at it
14:34 dboyan: imirkin: I think the tgsi cache is simple enough and helps reduce loading time a lot according to some tests.
14:35 imirkin: ok, so sure, send a v2, will look later on
14:35 dboyan: imirkin: The only concern I have is that there is a patch that changes one of its APIs on the list. https://lists.freedesktop.org/archives/mesa-dev/2017-March/148092.html
14:35 dboyan: I wonder if we should wait until that lands
14:36 imirkin: might be nice
14:36 imirkin: there's no urgency here afaik
14:38 dboyan: okay, I might ask tarceri whether and when it will land later
14:40 karolherbst: imirkin: i got a loading time reduction from 2m 40s to 1m 6s in hitman pro
14:40 karolherbst: just by the tgsi/glsl cache
14:40 karolherbst: the binary cache reduces it to 40s or so
14:41 dboyan: meanwhile I still have to investigate why my "fixes" mess things up
14:52 Mortiarty: imirkin, you got the clip?
14:56 imirkin: oh shoot. forgot. yes.
14:56 imirkin: let's seee heeeere...
15:00 imirkin: Mortiarty: https://filebin.net/xtytnqu6p7g8480u/vdpau-test-clip.mkv
15:02 Mortiarty: yeah
15:02 imirkin: if you're on nvidia, you should confirm that it plays back ok
15:02 imirkin: (with vdpau)
15:03 imirkin: and then do
15:03 Mortiarty: need to boot for nvidia but it has the artifacts
15:03 imirkin: like a LOT of them :)
15:03 imirkin: which is good
15:03 imirkin: hopefully that'll translate into the error being more obvious
15:04 imirkin: and then you can do a mmt trace with
15:04 imirkin: VDPAU_TRACE=1 valgrind --tool=mmt --..... mplayer -vo vdpau vdpau-test-clip.mkv
15:04 * imirkin hopes that the VDPAU_TRACE messages come in useful too
15:05 imirkin: oh, throw in -benchmark while you're at it to avoid silly things
15:05 imirkin: (like waiting for time to happen)
15:06 imirkin: you can kill it pretty early on, as soon as you see the artifacts
15:06 imirkin: no point in making the mmt trace bigger than it needs to be
15:06 Mortiarty: ok
15:06 imirkin: er, i guess you won't see artifacts
15:06 imirkin: but you can kill it as soon as you hit the place where artifacts otherwise happen ;)
15:07 Mortiarty: so just a few seconds.. got it
15:07 imirkin: 5 seconds should be enough
15:07 imirkin: -endpos 5 will make mplayer exit :)
15:07 * imirkin loves mplayer
15:07 Mortiarty: hehe
15:08 imirkin: so... the final thing is ... mplayer -vo vdpau -nofs -benchmark -endpos 5 vdpau-test-clip.mkv
15:08 imirkin: and please set VDPAU_TRACE=1 in the environment
15:08 Mortiarty: will do
15:08 imirkin: (it doesn't provide any real additional information except making it easier for me to find my way around the trace)
15:09 Mortiarty: is that a typo? -tool=mmt --.....
15:10 imirkin: --tool=mmt
15:10 imirkin: oh, there are like 35 other arguments
15:10 imirkin: check the valgrind-mmt wiki page for how to operate it
15:10 Mortiarty: ok
15:23 imirkin: Mortiarty: to double-check that mmt works, try tracing e.g. glxgears
15:23 imirkin: and then check the resulting filesize. it should be hundreds of KB, or MB's
15:23 imirkin: [i.e. that you're invoking it correctly]
15:23 Mortiarty: even with nouveau?
15:24 imirkin: no, with blob
15:49 Mortiarty: imirkin, https://filebin.net/55jmf2zpnlwo9hra
15:54 imirkin: what is that?
15:54 Mortiarty: imirkin, uhm the mmt trace you asked for
15:54 imirkin: awesome =]
15:54 imirkin: just wanted to double-check
15:58 nyef: What's struct nv50_wndw and related functions, the overlay?
15:59 nyef: Ah, no. A plane?
16:00 imirkin: plane == overlay
16:00 nyef: Okay then.
16:01 nyef: So, overlays and hardware cursor may-or-may-not work right with frame-packing, but that's a bit of a "later problem".
16:02 imirkin: Mortiarty: was that with the VDPAU_TRACE stuff?
16:03 Mortiarty: yes
16:03 imirkin: hmmmm... i don't see any of those =/ probably means they don't get logged for some odd reason
16:03 imirkin: they used to
16:03 imirkin: wait, did you see a ton of terminal output? might need a libvdpau compiled in a special way
16:04 Mortiarty: DPAU_TRACE=1 ./valgrind --tool=mmt --mmt-trace-nvidia-ioctls --log-file=file-bin.log mplayer -vo vdpau -nofs -benchmark -endpos 5 /tmp/vdpau-test-clip.mkv
16:05 Mortiarty: and i only saw the mplayer output
16:05 imirkin: VDPAU
16:05 Mortiarty: ups
16:05 imirkin: is that a paste-o? or were you missing the V?
16:05 Mortiarty: i actually missed that one
16:05 imirkin: oops indeed
16:06 Mortiarty: ok this time it came out as you described
16:06 imirkin: grrrr... demmt sucks at decoding these :(
16:07 Mortiarty: imirkin, check again pls https://filebin.net/55jmf2zpnlwo9hra
16:07 imirkin: can you xz -9 it?
16:08 Mortiarty: imirkin, done
16:08 imirkin: thanks
16:14 imirkin: mslusarz: --^ the IB address for the vdec bits appears to be 0
16:17 imirkin: Mortiarty: btw, can you confirm that it all rendered ok?
16:19 Mortiarty: imirkin, yes - it looked perfect - no artifacts
16:19 imirkin: ok. just checking =]
16:29 imirkin: mslusarz: oh hm. looks like there are 2 IB rings. right. URGH.
16:32 imirkin: mslusarz: making it always look for the ib buffer makes it work
16:32 imirkin: need some kind of better solution to that... oh well.
16:32 imirkin: Mortiarty: thanks! (for some reason the VDPAU_TRACE prints didn't make it into the second trace either...)
16:34 imirkin: i guess i'm going to be writing a few demmt adapters to make all this usable. when i did this before, demmt didn't exist, which made things much much harder
16:40 karolherbst: dboyan_: it works on the old branch
16:52 karolherbst: dboyan_: crash with your new branch: https://gist.githubusercontent.com/karolherbst/088ec2539f850c2960a9340e5814078c/raw/c2cfc6712ec2bf53d7d902293042800506f9e903/gistfile1.txt
16:55 nyef: HDMI stereo patch series v2 is almost ready. Remaining to do is the gt215 3D + audio test that has to wait until I have access to my other 3D panel, and rewriting one placeholder commit message.
17:04 karolherbst: .... the heck
17:12 karolherbst: imirkin: can you explain this to me? https://github.com/karolherbst/mesa/commit/a7753a1ee29e17b6df2f8a4940847729148f1557
17:12 imirkin_: yeah so
17:13 imirkin_: there's logic
17:13 imirkin_: in insnCanLoad
17:13 imirkin_: to reject stuff
17:13 karolherbst: ohhh, so it just happens to reject imms due to stupid reasons, and this patch fixes this stupidity for fma/mads?
17:14 imirkin_: if (i->op == OP_MAD || i->op == OP_FMA) {
17:14 imirkin_: // requires src == dst, cannot decide before RA
17:14 imirkin_: // (except if we implement more constraints)
17:14 imirkin_: if (ld->getSrc(0)->asImm()->reg.data.u32 & 0xfff)
17:14 imirkin_: unfortunately, that is wrong
17:14 imirkin_: what it really needs to do
17:15 imirkin_: anyways... do we support FFMA32I or MAD32I anywhere?
17:15 karolherbst: not yet
17:15 imirkin_: [what it really needs to do is tailor that check by type]
17:15 karolherbst: I have a post SSA opt to fix that
17:16 karolherbst: I just don't quite get why " 5: mov u32 $r0 0x00000004 (8) + 6: mad u32 $r0 $r10 $r1 $r0 (8)" isn't opted into " 5: mad u32 $r0 $r10 0x00000014 $r0 (8)"
17:16 imirkin_: heh. arm people trying to use nouveau.
17:17 imirkin_: coz the check is wrong
17:17 imirkin_: the check is designed for floats, not ints
17:17 imirkin_: it needs to check based on type
17:17 karolherbst: ohh I see
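A minimal sketch of what "check based on type" could look like, assuming the short float form keeps only the upper 20 bits of the value (which is what the `& 0xfff` test quoted above suggests) and that the integer form must satisfy the emitter's assertion on bits 31..20. `ImmKind` and `fitsShortImm` are hypothetical names; this is not the actual insnCanLoad code or the patch under discussion.

```cpp
#include <cstdint>

// Hypothetical classification; real code would look at the instruction's
// source type instead of a two-value enum.
enum class ImmKind { Float32, Int32 };

constexpr bool fitsShortImm(uint32_t bits, ImmKind kind)
{
    return kind == ImmKind::Float32
        // float: the low 12 bits get dropped, so they must be zero
        ? (bits & 0x00000fffu) == 0
        // int: bits 31..20 must be all clear or all set
        : ((bits & 0xfff00000u) == 0 || (bits & 0xfff00000u) == 0xfff00000u);
}

// 0x10000000 looks fine to the float-style test but cannot be emitted as an int.
static_assert(fitsShortImm(0x10000000u, ImmKind::Float32), "");
static_assert(!fitsShortImm(0x10000000u, ImmKind::Int32), "");

// 0x00000004 is a perfectly good int immediate that the float-style test rejects.
static_assert(!fitsShortImm(0x00000004u, ImmKind::Float32), "");
static_assert(fitsShortImm(0x00000004u, ImmKind::Int32), "");
```

Under these assumptions, applying the float test to a u32 mad explains both symptoms seen here: the 0x10000000 immediate was propagated and later tripped the emitter assertion, while the small 0x00000004 immediate was never folded in.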
17:29 karolherbst: imirkin: anyway, leaving it as it was could lead to crashes in the emitter later on
17:30 imirkin_: yeah, i'm not sure what's going on here
17:30 imirkin_: seems like this would have come up before
17:30 karolherbst: yeah
17:30 karolherbst: I just hit this by disabling opts randomly
17:30 karolherbst: basically
17:31 karolherbst: should I fix insnCanLoad up for mad/fma as well? Or is this patch good enough for now?
17:31 imirkin_: you can send a patch, but i'm gonna have to do some code reading
17:31 imirkin_: coz i think this used to work.
17:32 imirkin_: right. so with the | 0x8
17:32 imirkin_: hm
17:32 imirkin_: yeah, i think it might always have been broken for ints
17:32 karolherbst: okay
17:33 karolherbst: I think I will still mimic the stuff from further above, maybe a few more ints get immediated
17:33 karolherbst: although
17:33 karolherbst: should be fine
17:34 karolherbst: I think with my patch we could even drop the FMA/MAD block in insnCanLoad
17:34 Mortiarty: imirkin, would you need what was printed to stdout?
17:38 karolherbst: imirkin_: I sent two patches out. This and the ConstantFolding for FMA
17:39 karolherbst: I think the first should also go to CC:stable but I let you decide on this
17:40 imirkin_: Mortiarty: well, it would have been most convenient to have it *inside* the trace
17:40 imirkin_: then i could see which commands were issued in response to which vdpau call
17:40 imirkin_: but i'll be able to work it out, i think
17:41 Mortiarty: imirkin, maybe with VDPAU_TRACE_FILE
17:41 imirkin_: nah - i want it inside the mmt trace
17:41 imirkin_: oh wait
17:42 imirkin_: hold on
17:42 imirkin_: grrrrrrrrr
17:42 imirkin_: --mmt-trace-stdout-stderr
17:42 imirkin_: can you add that when recording the trace?
17:45 Mortiarty: imirkin, done: https://filebin.net/55jmf2zpnlwo9hra
17:45 imirkin_: wow, only 6 more kb?
17:45 * imirkin_ loves xz
17:46 imirkin_: Mortiarty: fantastic
17:46 imirkin_: it worked
17:46 imirkin_: [and kudos to mslusarz for making demmt as awesome as it already is]
17:46 Mortiarty: good! hope to watch accelerated decoding soon :D
17:47 imirkin_: yeah, dunno about that
17:47 imirkin_: all my previous attempts ended in fail
17:47 imirkin_: however i'm hoping that with improved tooling, i'll be able to spot the missing bit
17:49 karolherbst: :(
17:49 karolherbst: I think we have to improve RA
17:49 karolherbst: it obviously does silly things
17:49 karolherbst: in my current example I am like 100% sure that I didn't increase any live ranges
17:49 karolherbst: +2 gprs...
18:02 karolherbst: imirkin_: mind looking over my slct_eq/ne+set to not/mov+set patch again? I plan to run a piglit test over it and then send it to the ML later https://github.com/karolherbst/mesa/commit/5d5a541df6dd00e2e89ddc5aed138c833915a64e
18:02 imirkin_: not right now
18:02 karolherbst: no worries, whenever you have time
18:03 barteks2x: I'm not sure if this is an issue with nouveau but considering I've seen it do similar things before, I ask here first. Is it known issue that when using dri prime javafx applications have rendering issues? (flickering/some frames are partially or completely black)
18:03 imirkin_: i think java likes to multithread
18:03 imirkin_: nouveau does not
18:04 barteks2x: nouveau is not blacklisted by javafx
18:04 barteks2x: in fact, it's in a list of allowed ones
18:04 imirkin_: as a sign of protest, it displays black and occasionally hangs your machine :)
18:04 imirkin_: well, i dunno about javafx specifically
18:04 imirkin_: could be an unrelated issue
18:05 Mortiarty: imirkin, i was curious what the demmt would reveal to me but... i only get this as output: unknown type: 0xa
18:05 Mortiarty: 0a
18:05 Mortiarty: ?
18:06 barteks2x: it doesn't really crash
18:06 barteks2x: it just displays some frames as black
18:06 barteks2x: or partially black
18:06 imirkin_: Mortiarty: demmt -m e6 -l foo-log.xz
18:06 barteks2x: And even then, I've seen multithreading issues and they look differently on my machine
18:07 Mortiarty: thx!
18:07 karolherbst: should I do "piglit run --dmesg tests/gpu.py" for piglit or should I include/exclude something?
18:07 karolherbst: barteks2x: you need to force vsync
18:08 karolherbst: _always_ do vsyncing with prime
18:08 karolherbst: on both sides, window compositor _and_ application
18:08 karolherbst: otherwise you get tearing or other flickering
18:09 barteks2x: I would still consider that nouveau bug if it renders some frames as black. Javafx may just keep showing the same frame until something is updated. Which may be black
18:09 barteks2x: And javafx doesn't have vsync on by default
18:11 barteks2x: the single reason I'm still using nouveau is because with nvidia drivers I can't disable the nvidia gpu without restarting X
18:31 Mortiarty: imirkin, if i did a mmt trace with nouveau - i would get a similar output?
18:32 imirkin_: Mortiarty: sadly no - nouveau will use nvif by default, and demmt doesn't know how to interpret it.
18:32 imirkin_: Mortiarty: you can hack nouveau to not use nvif, in which case it still won't work, but will be a lot closer to working :)
18:34 Mortiarty: imirkin_, you lost me there - you mean without nvif there are still artifacts but as many?
18:34 Mortiarty: not as many?
18:35 imirkin_: i was referring to demmt working
18:35 imirkin_: more than anything else
18:35 imirkin_: nvif vs not-nvif are just different ways of interacting with the kernel driver
18:35 imirkin_: demmt knows about non-nvif, but not nvif. so the decoding won't work
18:35 Mortiarty: ok :) - so good luck I guess
18:35 imirkin_: thanks!
18:36 imirkin_: i plan on spending a bunch of time pimping demmt out for making it easier to analyze these video decoding things
18:36 imirkin_: coz i remember when i did it before, and it was MISERABLE
18:36 imirkin_: and at least writing tools won't end in failure even if the later analysis does
18:36 imirkin_: so i'll feel at least slightly productive :)
19:00 shuffle2: so for falcon/crypto, where do the xfer/dma transactions actually go to? and how does key material work?
19:01 shuffle2: for example "csecret $c1 0x26" ... where does 0x26 come from?
19:02 shuffle2: should i assume all crypto operations are aes128Ecb ?
19:04 shuffle2: what data is cxsin/out actually pulling/pushing?
19:37 karolherbst: huh, only 25k GPU tests in piglit? I thought they would be like 70k or so
20:14 mslusarz: imirkin: demmt looks up only for the first IB because a) it's slow b) I have not seen multi-IB traces before
20:15 mslusarz: so if always looking up makes the trace work with demmt I guess you could enable it... or at least make it an option for now
20:20 nyef: Is there a way to detect the use of a second IB in order to enable repeated lookups?
20:25 mslusarz: imirkin: it seems there is a new version of create_dma ioctl (0x54)
20:37 imirkin_: mslusarz: did you see the trace that Mortiarty made?
20:37 imirkin_: mslusarz: i've also seen it a lot with compute
20:37 imirkin_: basically if you make pb_pointer_found = false in nvrm.c, it all gets found properly
20:38 mslusarz: which one? I see there are 3
20:38 imirkin_: the last one by time, which includes the vdpautrace prints
20:38 imirkin_: although all 3 of them should exhibit the issue
20:38 imirkin_: you have to open it with -m e6 coz we still haven't fixed mmt yet
20:41 mslusarz: I just changed nvrm_get_pb_pointer_found to always return false and demmt decodes even less...
20:41 imirkin_: well, dunno about that
20:41 imirkin_: there's a function with a local var
20:41 mslusarz: so what is missing?
20:41 imirkin_: called pb_pointer_found
20:41 imirkin_: i initialized that to false and all was well
20:42 imirkin_: also did you pull in my change for better handling of GET_CHIPSET ioctl failure?
20:42 mslusarz: yup
20:42 imirkin_: wait fuck
20:42 imirkin_: not nvrm.c
20:42 mslusarz: :)
20:42 imirkin_: buffer_decode.c::buffer_decode_register_write
20:43 imirkin_: line 53
20:43 imirkin_: sorry!
20:43 mslusarz: initialize it to fale instead of function call?
20:43 imirkin_: yes
20:43 mslusarz: false*
20:44 mslusarz: ok, there's a bit more
20:44 imirkin_: so now you should be seeing the OBJ915D or whatever's
20:45 imirkin_: (don't remember the clsid offhand of VLD/DEC/PPP)
20:45 imirkin_: (and i'm going to fix up the decoding later so that it can properly interpret them, and print out useful things)
20:46 mslusarz: also, there's more errors
20:46 imirkin_: meh
20:46 mslusarz: LOG: invalid ib entry, low2: 3
20:46 mslusarz: ERROR: This trace may not be decoded accurately because there are multiple fifo objects and ioctl_creates for some of them were not captured with argument data
20:46 imirkin_: the important thing (to me) is that it decodes stuff.
20:47 mslusarz: I understand, but it's worrying
20:48 imirkin_: i don't have the requisite level of familiarity with the various ioctl's to really dig into it
20:49 mslusarz: in early demmt days it used to try to decode garbage and it got surprisingly far ;)
20:49 imirkin_: i know that the thing i said to do is a giant hack, but it does get me going
20:49 imirkin_:remembers the dedma days
20:49 imirkin_: in fact i did most of the RE work with dedma
20:51 mslusarz: well, my point is: if you investigate those errors you may get even more info out of this trace
20:51 imirkin_: oh, i agree =]
20:52 imirkin_: i've seen this before too with opencl i think
20:52 imirkin_: (/cuda)
20:59 shuffle2: so in addition to previous questions ... (:>), what are the rules around authenticated code?
20:59 shuffle2: for example, can any authenticated blob run on any tsec?
20:59 mslusarz: imirkin: hmm, errors like these: "w 4:0x0014, 0x00000000 IB: address: 0x200160a4, size: 0, not found!" may be caused by missing support for new create_dma ioctl
21:00 shuffle2: and, does the hardware somehow enforce the entrypoint of the secure/authenticated code?
21:06 imirkin_: mslusarz: that's probably a result of saying pb_pointer_found = false all the time, no?
21:06 imirkin_: also 200160a4 looks like a pushbuf command
21:07 shuffle2: who actually figured out all the fuc/crypto stuff?
21:08 imirkin_: shuffle2: the only person who knows *anything* about that stuff is mwk, so i'd recommend trying to get his attention rather than asking into the void.
21:08 shuffle2: ah ok
21:08 shuffle2: mwk: ^ ? :p
21:08 mslusarz: imirkin: hmm, you are probably right
21:10 imirkin_: shuffle2: i think he RE'd a bunch of the blobs being used for video decoding, which is where the crypto stuff is used. however those ops should just be accessible to any falcon code (on the crypto one), so it should be possible to RE what they do directly assuming some fairly intimate knowledge of AES-128
21:12 shuffle2: yea i have a jetson tx1 so i could probably test things, it just requires yak shaving and time, which wouldnt be needed if it were documented :P
21:12 shuffle2: altho i couldnt really test the behavior that requires to be executed from authenticated code
21:40 tobijk: mhm, imirkin is there mesa support for the pascal generation yet? (with mesa master)
21:59 imirkin_: tobijk: mesa support? i think so, yes.
21:59 imirkin_: shuffle2: i don't think the TK1 has a crypto engine (at least not a falcon one)
22:00 imirkin_: or TX1 for that matter
22:00 tobijk: imirkin_: and output source driving? e.g external monitors?
22:00 tobijk: half way there i guess?
22:00 imirkin_: tobijk: should work too
22:01 imirkin_: tobijk: if you're using drm-next you even get to use it. but no xf86-video-nouveau support yet. needs someone with a board to test a 2-line change
22:01 tobijk: <-- i guess im the lucky one
22:02 tobijk: output source driving is not updating screen content btw
22:02 Lyude: imirkin_: right, you asked me about that when I was coming back from PTO
22:02 tobijk: only the mouse cursor for whatever reason
22:03 Lyude: imirkin_: if you can get me the patch I'll give it a quick spin here
22:03 imirkin_: tobijk: are you using drm-next?
22:03 tobijk: not yet, only current -rc3
22:03 imirkin_: Lyude: step 1: write patch. i'll do that ... not this second. maybe tonight. but i've already said i'd be doing 20 things tonight, some of which are actually important.
22:03 imirkin_: [and not fun]
22:04 Lyude: imirkin_: that's fine, I was about to tell you I've got other stuff to do as well before testing that :P
22:04 imirkin_: [so there's still a chance]
22:04 Lyude: unfortunately mine are not fun either, hello my old friend xf86-video-intel…
22:04 imirkin_: hehe
22:04 imirkin_: i thought you guys killed that off
22:04 Lyude: mostly. it's complicated
22:04 imirkin_: or only on gen6+?
22:05 Lyude: gen4+ for fedora
22:05 imirkin_: ah
22:05 tobijk: mhm only having modesetting is fun on its own xrandr --setprovideroutputsource modesetting modesetting :O
22:05 imirkin_: tobijk: that won't work
22:05 imirkin_: tobijk: you want 1 0
22:05 tobijk: i know :D
22:05 tobijk: still nice names ...
22:07 tobijk: so imirkin_ anything special to consider with drm-next?
22:12 imirkin_: don't think so...
22:15 tobijk_: imirkin_: ok thanks will test drm-next and if it works i'll test your patch, if you can forward it to me
22:16 imirkin_: first i'll have to write it
22:16 tobijk_: heh
22:16 imirkin_: but i'll send it to both you and Lyude when i do
22:16 tobijk_: ok never mind
22:16 tobijk_: take your time
22:16 tobijk_: modesetting output driving is good enough for now (if that works with drm-next) :)
22:17 imirkin_: make sure to grab the updated linux-firmware with nvidia fw for accel
22:17 tobijk_:ranting a bit about notebook constructor wiring up HDMI with the discrete gpu instead of the internal gpu
22:17 imirkin_: hmmm... i thought they stopped doing that now
22:18 tobijk_: imirkin_: https://hastebin.com/ireyexeqid.css
22:18 tobijk_: where HDMI1-2 is connected to the nvidia gpu
22:18 imirkin_: i'm not saying you're wrong
22:18 imirkin_: i'm mostly just surprised
22:19 tobijk_: nah just chatting no offense taken
22:21 tobijk_: imirkin_: would choosing the provider work on those setups? e.g DRI_PRIME=1 glxinfo
22:21 tobijk_: i'm confused right now with this setup :/
22:21 imirkin_: sure
22:21 imirkin_: but if you only have dri2, you'd need to set up offloading
22:22 imirkin_: also you'd need acceleration to work, for which you need drm-next.
22:22 tobijk_: ah right :D
22:27 imirkin_: bbl
22:38 airlied: imirkin_: laptop manufacturers do things different for different market segments
22:38 airlied: so some want external connectors on the discrete gpu
22:39 airlied: so they can have more than 3 outputs
22:40 tobijk_: airlied: yet it is a dumb move to power the discrete gpu if you only have 3 connectors in total :/
22:42 airlied: tobijk_: any of them displayport?
22:42 tobijk_: eDP and DP on intel
22:42 airlied: at least the main reason is for docks with MST
22:42 tobijk_: and a lost (not connected) hdmi
22:42 airlied: some Lenovo let you pick in BIOS
22:42 airlied: which gpu controls the MST output
22:43 tobijk_: nope, no bios setting, but i can vgaswitcheroo (intel does oops badly though)
22:43 tobijk_:will be right back, testing drm-next
22:46 tobijk_: imirkin_: airlied: drm-next does drive the nvidia connected HDMI port fine :)
22:48 tobijk_: and acceleration is working
22:52 tobijk_: imirkin_: if you are reading this later: let me know if the bios of this discrete card would be interesting for somebody: NV136
23:10 shuffle2: imirkin_: i'm pretty sure X1 has one :)
23:10 shuffle2: rather it has 2 TSEC blocks, and at least one of them is running fuc code
23:18 imirkin: shuffle2: yes, but not via the nvidia chip
23:18 imirkin: er
23:18 imirkin: not via the gpu chip
23:18 shuffle2: what do you mean?
23:19 imirkin: nvidia makes the SoC
23:19 imirkin: they reuse falcon in a few places
23:19 imirkin: including in bits of it that are not the (logical) GPU chip
23:20 tobijk: imirkin: thanks for the good tip btw :)
23:20 shuffle2: right, i dont think (this instance) is in the core gpu part (which is maxwell)
23:20 shuffle2: but it is still part of the gpu peripherals
23:21 shuffle2: anyway, i dont see how that matters really?
23:24 shuffle2: it *does* have crypto hardware (one of the tsecs is intended for hdcp it seems), if that's what you were referring to
23:26 karolherbst: *sigh* working on hitman is really annoying with those two issues. I think we need to fix those first