16:15imirkin: Lyude: if you get a chance to test xf86-video-nouveau + mst again, mind fetching my master branch again and seeing what happens? i print more stuff in logs now. also after you plug the mst screens, grab the output of 'xrandr'.
16:57juri_: neat. nouveau works on this card even if i do not load the option rom.
17:25juri_: ... and the other three models of card i have.
17:39imirkin: the vbios must be present in order for nouveau to load
17:39imirkin: not running the option rom shouldn't cause problems
17:39imirkin: on resume-from-ram, the option rom isn't run either
17:45imirkin: the only functional difference should be that you get no video output until nouveau loads
18:12masterboy: hi guys wanted to THANK YOU very much for your work my nvidia 8600GT - NV84!!! is working flawlessly - hardware decode of h264 is working on mpv with 5% cpu usage!
18:13masterboy: and the 8th series is the first series that supports hardware decode of h264! it still works on ubuntu 16.04!
18:14imirkin: glad that works out
18:14imirkin: i've actually had a lot of trouble lately with vdpau on more recent kernels
18:14masterboy: at first i thought there is no hardware decode on the open driver
18:14imirkin: also note that h264 decoding has flaws with some videos
18:14imirkin: so i would keep a fallback available
18:15imirkin: more info available here: https://nouveau.freedesktop.org/wiki/VideoAcceleration/
18:15masterboy: somehow hardware decode is not advertised on nouveau enough :D
18:15juri_: imirkin: that's what i was going to start playing with next. ;)
18:15imirkin: juri_: i dunno, i get hangs lately
18:15imirkin: i don't remember when it worked fine last ... maybe like 4.10?
18:15masterboy: imirkin: hw video decode on nv84 is impossible - it works just on mpv
18:16masterboy: imirkin: gstreamer or kodi do not work
18:16imirkin: masterboy: and mplayer.
18:16imirkin: i test with mplayer :)
18:16masterboy: imirkin: yes
18:16imirkin: gstreamer is an unholy mess i avoid touching, and kodi tries to be clever with GL.
18:16masterboy: imirkin: what's up with kodi guys?? i get green stuff around video if i disable their vdpau filter
18:17imirkin: mmm... that sounds like some kind of clipping issue
18:17imirkin: zero in YUV is green in RGB
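A quick check of the "zero in YUV is green in RGB" remark. This is a minimal sketch assuming full-range 8-bit BT.601 coefficients; the chat does not say which matrix or range the hardware uses, and `yuv_to_rgb` is a hypothetical helper name.

```python
def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to 8-bit RGB (assumed: full-range BT.601)."""
    def clamp(x):
        # Round to the nearest integer and clip into the 0..255 range.
        return max(0, min(255, int(round(x))))

    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


# An all-zero YUV pixel: red and blue go negative and clip to 0, green stays positive.
print(yuv_to_rgb(0, 0, 0))        # (0, 135, 0)
# Neutral chroma (U = V = 128) gives a plain gray instead.
print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128)
```

So a surface that was cleared to zero, or never written by the decoder, shows up as dark green rather than black.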
18:17masterboy: imirkin: the farthest i got with kodi is i see video but it has a lot of green. if i don't disable their vdpau filter kodi crashes
18:18imirkin: GL + vdpau on nouveau = fail
18:18imirkin: i'm pretty sure that's exactly what kodi does
18:18imirkin: it's what mpv used to do too, but perhaps they changed things around?
18:18masterboy: with disabled vdpau filter in kodi settings i can see video greenish. i mean vdpau is enabled but there is a setting to turn off just the vdpau filter...
18:20masterboy: imirkin: there is a not so secret setting hwdec=vdpau vo=vdpau
18:20masterboy: imirkin: people don't know how to configure mpv...
18:20imirkin: and then they come complain to me
18:20masterboy: imirkin: i got crashes with mpv until i put this line in
18:20imirkin: i tell them to use mplayer
18:20imirkin: problem solved.
18:21juri_: imirkin: I'll be interested to find out what's required-required from the vbios.
18:21masterboy: imirkin: with mpv you don't really know what settings to pass to it
18:21imirkin: juri_: 95% of it
18:22imirkin: juri_: it has init scripts, display scripts, tables that tell you what connectors are there, what's hooked up to various gpio's for fan control, various reclocking-related things
18:22masterboy: imirkin: so just wanted to say nv84 is rock solid on nouveau with hw decode and it is a MIRACLE! and a lot of work!
18:22imirkin: glad you got it working
18:22masterboy: thank you - especially for hw decode :D
18:22imirkin: do remember that some videos will trigger blocky corruption
18:22imirkin: i was never able to figure out why
18:22imirkin: i did everything like the blob did, and yet ... not.
18:23masterboy: imirkin: maybe mesa or the player... all the youtube h264 videos i tried work
18:23imirkin: nope, depends on the video
18:23imirkin: for example the simpsons movie trailer that i did the initial RE with triggers issues
18:23masterboy: imirkin: yeah could be just did not find the right bad video :P
18:24masterboy: imirkin: give me a link i can try
18:24imirkin: in fact, a lot of movie trailers i was able to find triggered issues. perhaps something with how they're cut together.
18:24masterboy: the pc is a bit far away but i can test and come back in a few days
18:25imirkin: the 720p one is the one i tested with
18:25masterboy: imirkin: thanks, i will test it is the least i can do for you guys.
18:26imirkin: well, no real point in testing - i know the outcome.
18:26masterboy: i am so amazed that nv84 is working and doing open source hw decode
18:26imirkin: i'm just proving that there are issues :)
18:26masterboy: maybe mpv does it good
18:26masterboy: ok i will come back with my results :)
18:27imirkin: if there are no issues, you're not using hw accel
18:27masterboy: but youtube video is not an issue
18:27masterboy: vimeo too
18:27imirkin: yeah, tons of videos work fine
18:27masterboy: imirkin: on pentium 4 video lags so i know i am using hw accel :D
18:28masterboy: 100% cpu all the time 32 bit
18:28masterboy: only nv84 saved this old pc
18:29masterboy: big ups, i will come back in a few days after i test ;)
18:30masterboy: kodi vdpau filter can be fixed i believe but i don't have the guts to get into the bug hunting
18:31masterboy: on kodi vdpau IS working with nouveau but this extra setting does something bad...
18:31masterboy: kodi 17, did not try 18..
18:33masterboy: btw if you have a faq say that mpv should be used with hwdec=vdpau vo=vdpau :)
18:34imirkin: i just say mpv shouldn't be used.
18:34imirkin: much simpler.
18:34masterboy: the truth is hard i know :D
18:34imirkin: they can't handle the truth!
18:36masterboy: imirkin: do your cards hw decode video with kodi 17?
18:36imirkin: no clue. i use mplayer.
18:37masterboy: imirkin: what android app you use to control mplayer from your phone? :D
18:37imirkin: connectbot :p
18:37masterboy: imirkin: lol lol lol :D
18:37masterboy: so true
18:40masterboy: so i mean all the linux distros should make a package so people could install nouveau hw decode more easy. This huge feature should be advertised more by the distros.
18:40masterboy: but i guess the users need to push this
18:41imirkin: can't redistribute the firmware
18:42imirkin: i tried to provide a script + instructions that was easy enough for most people to run
18:43masterboy: yes the script works fine, but it could be made into a package so people could just do sudo apt install ...
18:44imirkin: yeah, could be. there's one for arch and gentoo
18:44imirkin: afaik no one uses any other distros
18:44imirkin: (if they did, there'd be packages made by enthusiastic users)
18:48masterboy: true. i am glad i found the instructions. to be honest i installed the nvidia drivers, then removed them, then ubuntu had broken hw decode with nouveau... SO i reinstalled ubuntu from scratch and did not touch the nvidia driver anymore only then hw decode worked
18:48masterboy: uninstalling nvidia drivers does not work on ubuntu for nouveau hw decode
18:49imirkin: fwiw as an end user, the nvidia drivers are probably a better option than nouveau
18:49imirkin: your call, of course
18:50masterboy: imirkin: i thought so too until i tried nouveau hw decode on mpv
18:51masterboy: nv84 doing fhd 1080p 5% cpu
18:51masterboy: 2gb files
18:51imirkin: yeah, it should be able to do 1080p no problem
18:51imirkin: iirc 2048 is the max width
18:51imirkin: or 2044 or something dumb like that
20:01masterboy: imirkin: kodi 17 "prefer vdpau video mixer" turned off shows something like this https://prod-cdn.sumo.mozilla.net/uploads/images/2017-11-16-07-23-58-3ce42b.png and turned on crashes. but i guess kodi guys need to debug that...
20:01imirkin: you definitely want to use the vdpau video mixer
20:03masterboy: yes, i was using vdpau for hw decode but with the mixer kodi crashes. so kodi guys did a bad bad thing somewhere
20:04masterboy: i mean it is still hw decoding but greenish
20:08HdkR: imirkin: https://www.youtube.com/watch?v=k7D6iDyrrIc Time to do perf optimizations, games aren't running fast enough :P
20:09imirkin: HdkR: is that with nouveau?
20:09HdkR: Should be
20:09karolherbst: imirkin: on a switch
20:09imirkin: is it linux booted on switch
20:09imirkin: ah ok
20:09imirkin: is the clock speed stuff all set up on there?
20:09HdkR: I'm not sure
20:09imirkin: check what's reported in pstate
20:10karolherbst: actually, do we support reclocking on the maxwell tegra?
20:10karolherbst: ahh, we do
20:11HdkR: I've been told they probably didn't set up DDR4 training or didn't jump in to 0a pstate
20:11karolherbst: on tegra we have more than just the normal desktop pstates though
20:11imirkin: yeah ... so clock speeds matter
20:11imirkin: they affect how fast things go, oddly enough
20:11Yoshimo: i hear heat and battery are an issue for now
20:12HdkR: https://www.youtube.com/watch?v=o3wo6MSxISo Seems this one is with correct clocks set :D
20:12imirkin: that seems better
20:12imirkin: this stuff is made for 30fps right?
20:12HdkR: nah, that's a 60fps game
20:13HdkR: Game framerates range from everything including variable framerate
20:15imirkin: HdkR: well, do you know if something as weak as the TX1 would run dolphin titles at full framerate?
20:15imirkin: with, shall we say, more optimal drivers
20:15HdkR: On my SHIELD TV it can run many titles
20:16HdkR: at full speed*
20:16HdkR: Animal Crossing being one of the easier ones :)
20:17imirkin: well, if you record traces that are unusually slow, i might be able to have a look
20:17imirkin: but tbh i dunno if there's much that can be done
20:17imirkin: [that is easy]
20:18imirkin: we need all the things we know we need but no one wants to work on
20:18imirkin: like instruction scheduling
20:19HdkR: Wish I could help, being tainted makes it hard
20:25HdkR: Ends up with me doing a lot of finger wiggling instead
20:25imirkin: yeah, well, wiggle all you want :p
21:00karolherbst: imirkin: with that iadd3 stuff: total instructions in shared programs : 5965774 -> 5964886 (-0.01%) total gprs used in shared programs : 687769 -> 687824 (0.01%)
21:09karolherbst: anyway, it seems like nvidia prefers mad+add over mul+iadd3
21:29karolherbst: oh, this iadd3 opt also fights against shl+add->shladd opt
21:30imirkin: there's probably a way to make it all happier
21:30imirkin: but it's not as obvious of a win as it may seem at first
21:35karolherbst: that's why I had the thought of doing iadd3 after all the other opts
21:35karolherbst: doing iadd3 after or before const folding won't matter
21:36imirkin: try moving it to late
21:36karolherbst: doing it after load propagation won't hurt, because nvidia thinks that doing two adds with c is better than iadd3
21:36imirkin: that's where shl+add gets done iirc
21:36karolherbst: probably a good idea
21:36imirkin: i don't think the late thing was there back when hakzsam originally did it
21:38karolherbst: I had to fix the patches anyway
21:38karolherbst: opInfo table changed as well
21:44karolherbst: but try linking against LLVMCoroutines first
21:44karolherbst: and see if this actually fixes it
21:44karolherbst: so that you have at least something working ;)
21:46karolherbst: uhm.. wrong channel
21:46karolherbst: robclark: ^^
22:05karolherbst: imirkin: shladd can't be used with c as src1?
22:06imirkin: src1 in the nv50 ir has to be an immediate
22:06imirkin: although it's actually src2 in the encoding
22:06karolherbst: I see
22:06imirkin: we do it as (a<<b) + c
22:06imirkin: but for the purposes of the operation, c is actually the src1
22:07imirkin: i think c can be a const
22:07karolherbst: ahh, makes sense
22:07karolherbst: b can be a const, that's for sure
22:07karolherbst: let me play around a little
22:07imirkin: not in any encoding i'm aware of
22:07imirkin: perhaps we just didn't find it
22:07imirkin: or didn't look hard enough
22:08karolherbst: c can't be a constant
22:09karolherbst: (a << 4) + b => ISCADD R0, R0, R2, 0x4;
22:09imirkin: afaik b has to be an immediate
22:09karolherbst: ohh you meant constant == c :(
22:10karolherbst: "ISCADD R0, R0, c[0x0][0x148], 0x4"
22:10imirkin: i would ASSUME that the "c" arg acts like normal src1, and can be a short imm or a constbuf reference. not 100% sure though.
22:10karolherbst: okay, so that works as well
22:10imirkin: not sure if that's piped through insnCanLoad properly. hopefully it is.
22:11karolherbst: I am just wondering about that line in LateAlgebraicOpt: "if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR) return"
22:11imirkin: in the nv50 ir, src0 == a, src1 == b. pretty sure.
22:11karolherbst: "shladd u32 %r173 neg %r156 0x00000002 %r154"
22:11imirkin: that's not gonna end well.
22:11karolherbst: either way
22:12karolherbst: src1 doesn't have to be GPR
22:12imirkin: can it have a neg? that's surprising. maybe it can.
22:12imirkin: well in LateAlgebraicOpt it must be
22:12imirkin: iirc that's before LoadPropagation?
22:12karolherbst: it is the second last
22:12karolherbst: well third if I count DCE
22:12karolherbst: only LocalCSE comes after
22:12imirkin: yeah ok, that's odd.
22:13imirkin: src1->reg.file should == FILE_IMMEDIATE
22:13imirkin: i'm not looking at the code though
22:13karolherbst: ohh, let me check
22:13imirkin: i dunno what src0 and src1 are ;)
22:13karolherbst: it does load propagate the immediate
22:14imirkin: oh. that's in reference to the add.
22:14imirkin: right, so it wants both args of the add to be regs
22:14karolherbst: (src0 << src1) + src2
22:14imirkin: i don't think we supported the third arg being a non-reg, even though it clearly can be.
22:14imirkin: in that context, src0 and src1 are of the ADD()
22:15karolherbst: still not correct
22:15karolherbst: src1 can be clearly c here as well
22:15imirkin: well, not wrong
22:15imirkin: just not as good as it could be.
22:15imirkin: send patches =]
22:15karolherbst: but yeah, I forgot that this is in the context of the add
22:15karolherbst: I am sure that may result in 0 shader-db changes :)
22:16imirkin: eh, dunno
22:16imirkin: shladd was actually a pretty beneficial thing
22:16imirkin: comes up quite a bit in address calculations
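For reference, the shape of the shl+add -> shladd fold being discussed, as a minimal sketch. The helpers `shl`, `add` and `shladd` are hypothetical stand-ins that model 32-bit wraparound; they are not the nv50 ir API, and the array example is only an illustration of the address-calculation case.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def shl(a, b):
    """Plain shift left with 32-bit wraparound."""
    return (a << b) & MASK32


def add(a, b):
    """Plain add with 32-bit wraparound."""
    return (a + b) & MASK32


def shladd(a, b, c):
    """Fused form from the chat: (a << b) + c, with b an immediate shift."""
    return ((a << b) + c) & MASK32


# Typical address calculation: base + index * 4 for an array of 32-bit values.
base, index = 0x1000, 7
two_insns = add(shl(index, 2), base)  # shl followed by add
one_insn = shladd(index, 2, base)     # single fused instruction
print(hex(two_insns), hex(one_insn))  # 0x101c 0x101c
```

The fold saves one instruction whenever the shl result has no other users; if it does, the shl has to stay around anyway.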
22:16karolherbst: yeah, maybe it helps
22:16karolherbst: we will see
22:18karolherbst: ohh mhh
22:18karolherbst: I think I know what the issue might be
22:18karolherbst: imirkin: we don't know which src might be the one being shifted
22:18karolherbst: so easier implementation checks for both being GPR
22:19karolherbst: but yeah, should be easy to fix
22:36karolherbst: imirkin: "total instructions in shared programs : 5965774 -> 5965737 (-0.00%)"
22:36karolherbst: no change in gpr
22:36karolherbst: affects only f1_2015
22:36karolherbst: but yeah :)
22:37karolherbst: seems like that only happens once per shader :)
22:37karolherbst: big surprise
22:40karolherbst: imirkin: those shaders are so massive that if I check just the game: "total instructions in shared programs : 294248 -> 294211 (-0.01%)" :)
22:41imirkin: yeah, this obviously affects only a small section of shaders
22:41imirkin: be careful though - you can only propagate if it's not a long immediate
22:41HdkR: Are any of these stats from address generation in CL?
22:41imirkin: this is only GLSL shaders
22:41karolherbst: yeah, true
22:42karolherbst: "shladd u32 %r4032 %r3732 0x00000001 c2[0xae0]"
22:42karolherbst: this looks correct I think
22:43imirkin: yeah, that sounds legal
22:43karolherbst: imirkin: but the limm thing is also illegal if the last source is a reg, or not?
22:43imirkin: i mean for the last source
22:44imirkin: i.e. if the add() source is an immediate, and the other is the shl, but the immediate is a limm, then you can't do it
22:44imirkin: like shladd dstreg, srcreg, 1, 0x800000 would be illegal
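A sketch of the kind of check this implies. The chat does not give the width of a short immediate; the 20-bit signed range below is an assumption made for illustration, and `fits_short_imm` is a hypothetical helper, not a Mesa function.

```python
SHORT_IMM_BITS = 20  # ASSUMPTION: short immediates are 20-bit signed values


def to_s32(u32):
    """Reinterpret an unsigned 32-bit value as signed."""
    return u32 - (1 << 32) if u32 & 0x80000000 else u32


def fits_short_imm(u32, bits=SHORT_IMM_BITS):
    """True if the value can be encoded as a signed immediate of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= to_s32(u32) <= hi


# The shift amount is always tiny; it is the added immediate that can be too wide.
print(fits_short_imm(0x1))         # True
print(fits_short_imm(0x800000))    # False: the illegal example from the chat
print(fits_short_imm(0xFFFFFFFF))  # True: this is -1 when read as signed
```

Under that assumption, only an add() source that passes this check can be folded into the third operand of the shladd.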
22:44karolherbst: I just added support for c
22:45karolherbst: I guess I can add support for imms as well while at it
22:46imirkin: HdkR: do you have any insights for what's expensive in dolphin's rendering?
22:47karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/203aaae62d67080216ee2d2dd87982d9133e0931
22:48karolherbst: imirkin: I have to check for modifiers as well :(
22:48imirkin: karolherbst: can probably lose the check in handleADD()
22:48karolherbst: can't have a neg on the c source
22:48imirkin: are you sure?
22:48imirkin: i suspect you can
22:48imirkin: look for the flag.
22:48karolherbst: yes, but let me check again
22:49karolherbst: actually, that's legal
22:49karolherbst: and neg on the src0 as well
22:49karolherbst: not on both
22:50imirkin: well that's the usual shtick
22:50karolherbst: I see
22:50karolherbst: so we can't have an add(-a, -b) anyway
22:51karolherbst: well, actually we can
22:51karolherbst: the heck
22:51karolherbst: imirkin: "IADD3 R2, RZ, -c[0x0][0x148], -R2;"
22:52imirkin: finally a practical use for IADD3
22:52imirkin: with regular IADD that'd become PO, i.e. 1+a+b
22:52imirkin: which isn't quite -a-b
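A small numeric check of that point, assuming 32-bit two's-complement arithmetic. `iadd_po` models only what the chat states (a double-negated IADD behaves as 1+a+b), and `iadd3` is a plain three-input add; both are hypothetical helpers, not hardware-accurate models.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def neg(x):
    """Two's-complement negation in 32 bits."""
    return (-x) & MASK32


def iadd_po(a, b):
    """Per the chat: IADD with both negs set is read as 'plus one', 1 + a + b."""
    return (1 + a + b) & MASK32


def iadd3(a, b, c):
    """Three-input integer add with 32-bit wraparound."""
    return (a + b + c) & MASK32


a, b = 5, 9
wanted = (-a - b) & MASK32
print(hex(wanted))                        # 0xfffffff2
print(hex(iadd_po(a, b)))                 # 0xf, not what we wanted
print(hex(iadd3(0, neg(a), neg(b))))      # 0xfffffff2
```

With a zero in the first slot, IADD3 can carry both negations and still produce -a-b.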
22:53karolherbst: I guess adding only add(neg(a), neg(b)) -> iadd3(0, neg(a), neg(b)) for now shouldn't cause any troubles
22:53karolherbst: nvidia does an iadd(iadd(0, -a), -b)
22:56karolherbst: imirkin: so I don't have to check for both args having neg mods on it, because that's illegal to begin with
22:56imirkin: for now yea
22:57karolherbst: I think this add(-a, -b) thing could be quite beneficial actually. I guess a lot of shaders might do things like that, although most would do that for floats
22:59karolherbst: imirkin: I guess I move the check into tryADDToSHLADD, because we still have to do that for both srcs
22:59imirkin: your call
22:59imirkin: just make sure it works.
23:00karolherbst: mhh wait, src0 has to be a GPR, right?
23:00karolherbst: I totally forgot to check against that :)
23:01karolherbst: now add(c << 4, c) -> scadd(c, 4, c) could happen
23:03HdkR: imirkin: It's probably just non-optimal shadergen for a bunch of integer math
23:03HdkR: Most of the shader work ends up being integer things in its emulation of the ArtX GPU's TEV stages
23:03imirkin: we do suck at integers.
23:04imirkin: with maxwell, we have to make extensive use of XMAD
23:04imirkin: and we just don't
23:04karolherbst: xmad is kind of awesome
23:04karolherbst: but we really have to implement it as a super special OP with like 10 subops or something
23:04karolherbst: well, regular op
23:04karolherbst: but a lot of subops
23:05karolherbst: uhh, nice
23:06karolherbst: imirkin: that imm thing seems to be super common actually
23:06karolherbst: I hit actually the imm assert with that: "shladd u32 $r4 $r3 0x00000004 0xa3413100"
23:06karolherbst: so, now test against limm
23:06HdkR: imirkin: Yes, I noticed you only use imad and imul instead :P
23:07imirkin: HdkR: which are horrid.
23:07karolherbst: imirkin: do we actually have any code checking for longImms in peephole or is that done through canInsnLoad?
23:08karolherbst: insnCanLoad actually
23:09karolherbst: imirkin: how can I do that without changing insn->op?
23:09imirkin: dunno, i'd have to rtfs
23:12karolherbst: and it wants an instruction
23:20karolherbst: imirkin: with the imm stuff: total instructions in shared programs : 5965737 -> 5955372 (-0.17%)
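The percentages in these shader-db lines are just the relative change between the two totals, printed with two decimals. A minimal sketch reproducing them from the numbers quoted in the chat; `percent_change` is a hypothetical helper, not part of shader-db.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals quoted in the chat: (label, before, after).
runs = [
    ("iadd3 opt, instructions", 5965774, 5964886),
    ("iadd3 opt, gprs", 687769, 687824),
    ("shladd with c, instructions", 5965774, 5965737),
    ("shladd with imm, instructions", 5965737, 5955372),
]

for label, before, after in runs:
    print(f"{label}: {before} -> {after} ({percent_change(before, after):.2f}%)")
```

The immediate case removes 10365 instructions, which is why it reads as -0.17% while the constbuf-only change rounds to -0.00%.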
23:21imirkin: wow, much better
23:21karolherbst: some shaders are hurt though
23:21imirkin: can't win 'em all
23:24karolherbst: and I broke my opt with c somehow...
23:26karolherbst: ohh.. :( silly me
23:31karolherbst: I removed the op = OP_SHLADD thing
23:31karolherbst: well, shouldn't change the stats though
23:34karolherbst: imirkin: but overall still "total gprs used in shared programs : 687769 -> 687675 (-0.01%)"
23:38karolherbst: imirkin: ohh, the hurt shaders are things where a shl is done, but can't be eliminated
23:39karolherbst: but I tend to ignore those, because usually with scheduling we can minimize those negative impacts
23:39karolherbst: just.. we need one
23:41karolherbst: that shader though: https://gist.github.com/karolherbst/ab37e6179f4137d8d16d9e6fdf48dcd5
23:41karolherbst: part actually
23:41karolherbst: we got things like "shladd u32 %r202 %r199 0x00000001 0x00000001" already though
23:42karolherbst: ohh, earlier opt most likely
23:43karolherbst: explains why it wasn't done in latealgebraicopt
23:44karolherbst: mad(a, b, c) -> shladd(a, log2(b), c)
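That rewrite only applies when b is a known power of two. A minimal sketch of the condition and the equivalence, assuming unsigned 32-bit arithmetic; `mad`, `shladd` and `mad_to_shladd` are hypothetical helpers, not the peephole code in Mesa.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def mad(a, b, c):
    """Integer multiply-add with 32-bit wraparound: a * b + c."""
    return (a * b + c) & MASK32


def shladd(a, shift, c):
    """Shift-left-add with 32-bit wraparound: (a << shift) + c."""
    return ((a << shift) + c) & MASK32


def mad_to_shladd(b):
    """Return the shift amount if b is a power of two, else None (no rewrite)."""
    if b > 0 and (b & (b - 1)) == 0:
        return b.bit_length() - 1  # log2(b) for an exact power of two
    return None


a, c = 0x1234, 0x10
for b in (2, 16, 12):
    shift = mad_to_shladd(b)
    if shift is None:
        print(f"b={b}: not a power of two, keep the mad")
    else:
        assert mad(a, b, c) == shladd(a, shift, c)
        print(f"b={b}: mad(a, {b}, c) -> shladd(a, {shift}, c)")
```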
23:44imirkin: well, the mad() wouldn't come up in the first place
23:44imirkin: i think it'd be a mul -> shl thing
23:44imirkin: so the mad() would never be created
23:45karolherbst: "mad u32 %r202 %r199 %r200 %r201" before opts :)
23:45karolherbst: crazy games
23:45karolherbst: civ 6 in that case
23:45karolherbst: start to use fma and stuff
23:46karolherbst: but uhm... they don't
23:46karolherbst: ohh, TGSI does that
23:46imirkin: should just emit UMAD as mul + add
23:46imirkin: and let the opts do the work if necessary
23:47imirkin: that'd be a good thing to test out to see what it does
23:47karolherbst: :), well I had to disable that for those precise thingies a while ago
23:47imirkin: are you sure?
23:47imirkin: (hint: you're not.)
23:47karolherbst: right, that was for floats
23:57karolherbst: okay, now a better thing for that: https://github.com/karolherbst/mesa/commit/56d13872868547c90cd0412b732ecb693cdfdbcc