00:09 hakzsam: imirkin, yeah, sure; this just folds immediates into MAD and removes the MOV
00:10 imirkin: the instruction forms require specific register patterns
00:10 imirkin: so you can't do it pre-ra
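A minimal sketch of the post-RA MAD folding described above, using a hypothetical toy IR (`Mov`, `Mad` and `foldImmIntoMad` are illustrative names, not nv50_ir code). It assumes the immediate form of MAD requires the destination register to be the same register as the addend (src2). That constraint can only be checked once RA has assigned registers, which is why the fold has to happen post-RA.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical post-RA toy IR: operands are physical register ids.
struct Mov {
    int dst;        // register receiving the immediate
    uint32_t imm;   // immediate bits (e.g. an f32)
    int useCount;   // remaining readers of dst
};

struct Mad {        // dst = src0 * src1 + src2
    int dst, src0, src1, src2;
    bool src1IsImm = false;
    uint32_t imm = 0;
};

// Fold mov's immediate into mad as its src1 operand.
// Returns true on success; movDead is set when the MOV has no readers left.
bool foldImmIntoMad(Mov& mov, Mad& mad, bool& movDead) {
    movDead = false;
    if (mad.src1IsImm || mad.dst != mad.src2)
        return false;                   // already folded, or assumed dst == src2 rule fails
    if (mad.src0 == mov.dst && mad.src1 != mov.dst)
        std::swap(mad.src0, mad.src1);  // a * b == b * a: move the MOV result to src1
    if (mad.src1 != mov.dst)
        return false;                   // MAD does not read the MOV result
    mad.src1IsImm = true;
    mad.imm = mov.imm;
    mad.src1 = -1;                      // src1 no longer reads a register
    movDead = (--mov.useCount == 0);    // the MOV can go once nothing reads it
    return true;
}
```

A real pass would also have to check operand types and modifiers, which this sketch ignores.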
00:10 hakzsam: I remember now
00:16 hakzsam: well, I will have to run a bunch of piglit next week after the release
00:16 Riastradh: imirkin: You can redirect their questions to me. There's no user-facing documentation of anything relevant to NetBSD that's not also relevant to Linux.
00:17 Riastradh: imirkin: Email is my nickname at netbsd.org, if I'm not around.
07:38 karolherbst: hakzsam: with all RA changes the mad folding reduces instruction count by 0.33% overall and yeah, no effect on gpr count
07:43 hakzsam: yep, I saw
13:16 imirkin: Riastradh: would you be willing to do a very short writeup that i can put up on a page in the nouveau wiki? something that will make sense to netbsd folk? (configure your kernel thusly, install such and such package, etc)
13:17 imirkin: things that are expected to work/expected to not work
13:19 Riastradh: imirkin: Sure -- maybe I can find some time this weekend for it.
13:20 imirkin: awesome thanks. i have a friend who uses netbsd who'll be able to double-check whether it makes sense and what bits of info are missing.
13:21 imirkin: [sometimes info for those of us "in the know" is obvious and not worth mentioning, but utterly dumbfounding to those on the outside]
13:23 Riastradh: Right. Mostly I try not to require incantations on pages like that, but I guess it would be good to say `nouveau is disabled by default in 7 and enabled by default in current' and `tested on X, Y, and Z'.
13:24 Riastradh: Gotta run now!
13:28 imirkin: yep. see ya!
13:30 slamd64: hello. I have a problem with nouveau on my laptop. With kernel 4.4 and above it fails to detect the resolution correctly. It is an older NV34 GPU (FX5200Go). 4.2 and older kernels do not have that problem. I can manage to set the resolution with xrandr, and it is 1440x900, but the tty is still broken. nouveau.modeset=0 falls back to 1024x768, and so does connecting an external VGA monitor. Any suggestions? Here is the screenshot http://
13:32 imirkin: your message was cut off
13:33 imirkin: modeset=0 means "disable nouveau entirely"
13:33 imirkin: 4.3 received a substantial rewrite, possible that something got messed up, esp with an eye to pre-nv50 + laptop
13:34 slamd64: Oh, I see, thanks for the answer. Here is what it looks like: http://imgur.com/a/ZUi0j
13:34 slamd64: the screen is divided into 4 parts and mirrored, e.g. left is on the right and vice versa
13:37 imirkin: i like it!
13:38 imirkin: could you ... boot with nouveau.debug=debug and pastebin the kernel log?
13:39 imirkin: oh
13:39 imirkin: and also drm.debug=0x1e
13:39 imirkin: (both of those options)
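For reference, `drm.debug` is a bitmask of debug categories. A small decoding sketch, assuming the DRM_UT_* bit values used by kernels of that era (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08, ATOMIC=0x10); `decodeDrmDebug` is a hypothetical helper. Under that assumption, 0x1e enables everything in that list except the noisy CORE category.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed DRM_UT_* category bits (4.x-era kernels).
struct DebugCategory {
    unsigned bit;
    const char* name;
};

const DebugCategory kDrmCategories[] = {
    {0x01, "CORE"}, {0x02, "DRIVER"}, {0x04, "KMS"},
    {0x08, "PRIME"}, {0x10, "ATOMIC"},
};

// Hypothetical helper: list the categories enabled by a drm.debug value.
std::vector<std::string> decodeDrmDebug(unsigned mask) {
    std::vector<std::string> enabled;
    for (const DebugCategory& c : kDrmCategories)
        if (mask & c.bit)
            enabled.push_back(c.name);
    return enabled;
}
```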
13:39 imirkin: that should hopefully provide a picture of what's going on
13:39 imirkin: this feels like a wrong pitch issue. or ... something.
13:41 imirkin: i'll bbl
13:51 slamd64: imirkin: ok thanks, i'll post results here in a few minutes
14:10 slamd64: imirkin: here's the kernel log http://pastebin.com/YJkXCK8K
14:36 imirkin_: slamd64: huh. well, it says "found EDID in BIOS" and then proceeds to add only 1024x768 and lower modelines
14:37 imirkin_: so ... could you put up your vbios somewhere? cp /sys/kernel/debug/dri/0/vbios.rom /tmp/vbios.rom
14:40 slamd64: yeah, that's what happens. I have to add 1440x900 manually
14:40 slamd64: I have cp vbios.rom to tmp, what should I do with it?
14:40 imirkin_: upload it somewhere. e.g. filebin.ca
14:41 imirkin_: er actually, i take that back -
14:41 imirkin_: i won't have time to look at it right now
14:41 imirkin_: file a bug at bugs.freedesktop.org (xorg -> Driver/nouveau)
14:41 imirkin_: and include (a) the info above, (b) the dmesg [the actual thing, not a pastebin link], and (c) the vbios
14:42 slamd64: ok, I will do that. Thanks a lot for helping me out. For now I will fall back to 4.2 kernel.
14:42 imirkin_: if you want to make progress on it yourself, you could do a bisect between 4.2 and 4.4 to see which change broke it
14:42 imirkin_: however that's probably less-than-fun on a laptop that came with a nv34 gpu :)
14:43 slamd64: thanks, I'll figure out something.
14:43 imirkin_: like i said, nouveau got substantial mechanical updates in v4.3
14:44 imirkin_: however it should all have been no-op changes. however i think as part of it, a few things *did* get changed, related to display on pre-nv50
14:44 imirkin_: so i'd suspect those above all
14:44 imirkin_: (might have been as late as v4.4, i forget)
14:44 imirkin_: slamd64: it may also be instructional to look at the nouveau.debug=debug drm.debug=0x1e log from 4.2.x
14:45 slamd64: that's really interesting. This NV34 has really poor 3D performance with nouveau; I even get graphics glitches when I run an accelerated desktop environment, e.g. Unity or GNOME. dmesg reports a lockup
14:46 imirkin_: that's not extremely surprising.
14:46 imirkin_: i would def recommend staying away from those with nouveau
14:49 imirkin_: slamd64: if you want some stability, i'd recommend removing nouveau_dri.so - the xorg accel stuff is pretty well-tested.
14:50 imirkin_: however the GL driver is kinda teh suck
14:51 slamd64: thanks. I don't really need graphics acceleration. I like it because of 16.4" screen and mostly I write some code and view web on this laptop.
15:00 dcomp: Any idea when karolherbst/stable_reclocking_kepler_v6 will be mainlined and or when i'll be able to use my GM108 without config=NvClkMode=7 runpm=0?
15:01 imirkin_: dcomp: v4.9 ought to contain all or most of that stuff
15:01 imirkin_: dcomp: however that won't help you use your GM108 without those params
15:02 imirkin_: ideally the runpm can be dropped, but it'd require someone to investigate a bit more carefully
15:02 imirkin_: karolherbst: we should do a full reclock sequence when coming out of runpm if we had previously set a clk mode... i dunno that we do. i suspect this is the reason dcomp needs runpm=0
15:03 karolherbst: imirkin_: he needs it, because you can't echo anything into pstate
15:04 karolherbst: if the card is suspended it just hangs the syscall
15:04 karolherbst: *blocks
15:04 imirkin_: NvClkMode=7 should "fix" that since it should reclock on boot
15:04 karolherbst: I think his issue was, that default clocks are unstable
15:04 imirkin_: no
15:04 imirkin_: his issue is that the clocks aren't set up by the vbios
15:05 karolherbst: isn't it the same as default is unstable?
15:05 imirkin_: i suppose.
15:05 imirkin_: anyways - here's my point
15:05 imirkin_: let's say everything works totally fine
15:05 imirkin_: and i echo 0f > pstate
15:05 imirkin_: and then runpm suspends the gpu
15:05 imirkin_: when unsuspending, i would expect it to be back in 0f
15:06 karolherbst: right, I took care of that in my follow up series
15:06 imirkin_: is that how it's supposed to work, or does it go back into "default" right now?
15:06 imirkin_: ah ok cool
15:06 imirkin_: did skeggsb integrate that into his branch?
15:06 karolherbst: currently it goes back to default
15:06 karolherbst: no
15:06 karolherbst: because it is a more architectural kind of thing
15:06 karolherbst: reworking the entire way we do reclocking from macro point of view
15:07 karolherbst: it also adds support for reclocks/revolts on temperature changes
15:07 karolherbst: so that the voltage limit is applied accordingly
15:08 karolherbst: it is in his review queue though
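The behaviour imirkin_ asks for (coming out of runpm in the pstate the user last chose rather than the boot default) can be modeled as remembering the request and re-applying it on resume. This is a hypothetical sketch (`ClockState` is not nouveau code), and it ignores the temperature-driven reclocking karolherbst mentions.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of runtime PM plus a user-selected performance level.
class ClockState {
public:
    static constexpr int kBootDefault = -1;   // clocks as left by the VBIOS

    void userSetPstate(int pstate) {
        requested_ = pstate;   // remember what the user asked for
        current_ = pstate;
    }

    void runtimeSuspend() { current_ = kBootDefault; }   // GPU powered down

    void runtimeResume() {
        // Re-apply the last request instead of staying at boot defaults.
        current_ = requested_ ? *requested_ : kBootDefault;
    }

    int current() const { return current_; }

private:
    std::optional<int> requested_;
    int current_ = kBootDefault;
};
```

Usage: after `userSetPstate(0x0f)`, a suspend/resume cycle ends back in 0x0f; a device the user never touched stays at the boot default.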
15:08 karolherbst: dcomp: but 0f works for you, right?
15:08 karolherbst: dcomp: because I just got a report from another gm108 user with ddr3 memory, and it doesn't work
15:08 karolherbst: just making sure
15:09 karolherbst: although I am not entirely sure if I checked ddr3 gm10x....
15:13 imirkin_: ah ok
15:13 imirkin_: but with those patches, dcomp should be able to drop the runpm=0 thing i think
15:13 imirkin_: karolherbst: do you have those somewhere convenient?
15:14 dcomp: karolherbst: yeah with your branch NvClkMode 0f works
15:14 dcomp: Card won't init without it
15:15 karolherbst: imirkin_: stable_reclocking_kepler_v6
15:15 imirkin_: so with that branch it should reclock to the previously set level?
15:15 karolherbst: I think so, yes. I remember having some issues with that though
15:16 karolherbst: I am sure it works for GPU suspend cycles
15:16 karolherbst: I also think I fixed all the issues I had, have to dig into that
15:16 imirkin_: dcomp: worth giving it a shot again?
15:17 karolherbst: I got the impression he asked, because he knows it works on that branch
15:17 karolherbst: imirkin_: https://github.com/karolherbst/nouveau/commit/e30573f589deaa306651218570729e16042ff3f9 ;)
15:17 karolherbst: right I fixed the issue
15:18 imirkin_: dcomp: give it a shot
15:18 karolherbst: my kernel crashed whenever I loaded nouveau with NvClkMode set, because of silly reasons
15:19 imirkin_: dcomp: to be clear, you still need the NvClkMode
15:20 karolherbst: I don't think so
15:20 karolherbst: ohh
15:20 karolherbst: yeah
15:20 karolherbst: he does
15:20 imirkin_: =]
15:20 karolherbst: I don't trust the code to clock on boot :D
15:21 karolherbst: I guess we will enable that in like 2022 for _some_ gpus :D
15:21 imirkin_: i'm sure nouveau will be long dead by then
15:21 karolherbst: mhhh
15:21 karolherbst: I don't think so
15:21 karolherbst: although it looks grim for now
15:26 karolherbst: anyway, gotta go
17:09 karolherbst: imirkin_: do you think it makes more sense to have 1 class for PostRAPass and just check for chipsets inside the code, or create one class for each chipset?
17:10 imirkin_: you could introduce a new post-ra stage that calls some target-specific pass
17:11 imirkin_: like we do for pre-ssa and post-ssa
17:12 karolherbst: mhh
17:12 karolherbst: well a lot of code would be still shareable though
17:12 karolherbst: like the mad thing also works for the kepler2 and maxwell isa
17:12 karolherbst: just with little differences
17:13 imirkin_: sure
17:13 imirkin_: and we have a lowering_nvc0
17:13 imirkin_: which works for all of those.
17:13 karolherbst: ohh I see
17:13 karolherbst: isn't there a gm107 one as well?
17:14 imirkin_: there is, but it inherits from the nvc0 ones, and only overrides one of the passes iirc
17:14 karolherbst: I see
17:15 karolherbst: well basically I would check with insnCanLoad(mad, 1, imm); and go from there
17:15 karolherbst: but I would say it still belongs more inside the peephole file
17:18 karolherbst: but I like the idea of having target specific passes :)
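One shape the suggested target-specific post-RA hook could take, sketched with hypothetical classes (`Target`, `TargetNVC0`, `TargetGM107`, `Stage` are illustrative, not the nv50_ir class layout): a generic stage dispatcher calls a virtual post-RA hook, NVC0 implements it, and GM107 inherits everything and overrides only the piece whose encoding differs, much like the GM107 lowering is described above as inheriting from the NVC0 one.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Program {
    std::vector<std::string> log;   // records which passes ran
};

// Hypothetical compilation stages at which a target can hook in.
enum class Stage { PreSSA, PostSSA, PostRA };

class Target {
public:
    virtual ~Target() = default;

    // Generic driver: the stage decides which target hook runs.
    void runStage(Program& p, Stage s) const {
        switch (s) {
        case Stage::PreSSA:  runPreSSA(p);  break;
        case Stage::PostSSA: runPostSSA(p); break;
        case Stage::PostRA:  runPostRA(p);  break;
        }
    }

protected:
    virtual void runPreSSA(Program&) const {}
    virtual void runPostSSA(Program&) const {}
    virtual void runPostRA(Program&) const {}   // default: nothing to do
};

class TargetNVC0 : public Target {
protected:
    void runPostRA(Program& p) const override { foldImmediates(p); }
    // Shared by all NVC0-derived targets unless overridden.
    virtual void foldImmediates(Program& p) const { p.log.push_back("nvc0 fold"); }
};

class TargetGM107 : public TargetNVC0 {
protected:
    // Inherit everything, override only the encoding-specific piece.
    void foldImmediates(Program& p) const override { p.log.push_back("gm107 fold"); }
};
```

This keeps the shared logic in one place while still letting a chipset swap out a single step, which addresses the "a lot of code would still be shareable" concern.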
17:25 karolherbst: imirkin_: do you have any clue why there is a check for "getDef(d)->reg.data.id" within Instruction::isDead ?
17:25 karolherbst: "getDef(d)->reg.data.id >= 0"
17:26 karolherbst: aka, when is reg.data.id < 0
17:31 imirkin_: before RA
17:31 imirkin_: .id == -1
17:32 karolherbst: ohh I see
17:32 karolherbst: I was thinking of adding a bool postRa flag to isDead, but I assume I can simply rely on id being >= 0 to know it is post-RA?
17:33 karolherbst: or should I make two methods and just use the same code internally? (like isDead and isDeadPostRa)
17:33 karolherbst: there is this static bool post_ra_dead function I want to eliminate
17:34 karolherbst: because it isn't right anyway
17:35 imirkin_: iirc this stuff is super-subtle :(
17:35 karolherbst: right, that's why I thought adding an argument defaulting to false, or a new method, might be the best idea for now
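A toy model of the check being discussed: `id == -1` means RA has not assigned a register yet, and, as we read the `reg.data.id >= 0` check, a def that already has a register is treated as live. The defaulted `trustRefsPostRA` argument is a hypothetical take on the "argument defaulting to false" idea; it assumes use counts are still accurate after RA, which is exactly the subtle part.

```cpp
#include <cassert>
#include <vector>

// Hypothetical value: id is the assigned register, -1 before RA.
struct Value {
    int id = -1;
    int refCount = 0;   // number of recorded uses
};

struct Instruction {
    std::vector<Value*> defs;
    bool hasSideEffects = false;   // stores, exports, ... are never dead

    // Default: a def with an assigned register (id >= 0) is conservatively
    // treated as live. trustRefsPostRA = true (hypothetical) opts into
    // relying on refCount alone after RA, assuming it is still accurate.
    bool isDead(bool trustRefsPostRA = false) const {
        if (hasSideEffects)
            return false;
        for (const Value* v : defs) {
            if (v->refCount > 0)
                return false;
            if (v->id >= 0 && !trustRefsPostRA)
                return false;
        }
        return true;
    }
};
```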
18:30 karolherbst: mhh, my post RA DCE thing removes a handful of instructions "total instructions in shared programs : 2818227 -> 2817883 (-0.01%)"
18:31 karolherbst: I was under the impression it shouldn't remove anything
18:31 karolherbst: checking
18:34 karolherbst: ohh
18:34 karolherbst: RA indeed removed a set
18:34 karolherbst: and made a value dead due to this
18:35 imirkin_: figure out wtf happens
18:35 karolherbst: odd
18:35 karolherbst: well
18:35 karolherbst: it looks fine
18:35 imirkin_: the solution is not to do post-ra DCE
18:35 karolherbst: just an if with empty branches
18:35 karolherbst: and something decides to remove the if thing
18:35 imirkin_: right
18:36 karolherbst: flattening is doing that
18:36 imirkin_: we should have an empty block removal thing instead
18:36 imirkin_: that happens pre-ra
18:36 karolherbst: right
18:36 imirkin_: although that can end up being a bit tricky
18:37 karolherbst: well we will need a post ra dce anyway maybe
18:37 imirkin_: no.
18:37 karolherbst: if we start doing more and more passes after RA
18:37 karolherbst: the mad thing already needs it
18:37 karolherbst: but it does check itself for it
18:37 imirkin_: that's not a DCE pass
18:37 imirkin_: that's a facility to remove a single instruction that's being replaced.
18:37 karolherbst: ohh I see
18:37 karolherbst: well
18:38 karolherbst: it isn't replaced
18:38 imirkin_: (potentially)
18:38 imirkin_: right.
18:38 imirkin_: to see if the intermediate value is still used.
18:38 karolherbst: right
18:38 karolherbst: I know that it is a bad place to do that in RA cause it is a bit pointless
18:39 karolherbst: mhh, where was my empty branch pass thingy
18:39 imirkin_: that's an example of something that'd be nice to have.
18:39 karolherbst: https://github.com/karolherbst/mesa/commit/d9c247e643a1a65a23fe59626db136e928bc3218
18:40 imirkin_: unfortunately it needs some love and care wrt not messing up phi nodes
18:40 karolherbst: I am well aware
18:40 karolherbst: had a lot of fun with this one
18:41 karolherbst: I think it is in a state where it doesn't mess up things, but who knows for sure
18:41 karolherbst: it is a frigging mess...
18:41 imirkin_: =]
18:42 imirkin_: we should probably rework phi nodes first
18:42 imirkin_: to not rely on the incoming edge order
18:42 karolherbst: especially the "EmptyBranchElim::removeEmptyBB" part will break stuff
18:42 karolherbst: it just looks like it
18:43 karolherbst: well, it is something: https://gist.github.com/karolherbst/5f0b7a17c768f38e68b0debc2ac51bc9
18:43 karolherbst: ...
19:06 karolherbst: imirkin_: well, my idea behind that pass was to have edges only between non-empty BBs: if anything points to an empty BB, let it point to the first non-empty one reached by following those branches. After that, if the bra instruction points to the same BB that the BB containing the bra points to anyway, then the bra can be eliminated, or at least the condition on that bra can be DCEed away
19:06 karolherbst: but maybe you know any less painful way of doing this?
19:06 karolherbst: mhh
19:06 karolherbst: I don't really have to change the edges....
19:06 karolherbst: just need to figure out the actual first non empty BB and compare that
19:07 karolherbst: ........k, got it :D
19:14 imirkin_: karolherbst: the BB's *after* will get messed up by their removal.
19:16 karolherbst: actually, it looks fine though
19:17 karolherbst: mhh
19:17 karolherbst: actually there are other issues
19:17 karolherbst: yeah, I shouldn't touch the edges
19:17 imirkin_: otoh....
19:18 imirkin_: yeah dunno
19:18 karolherbst: well
19:18 karolherbst: it doesn't actually matter
19:18 karolherbst: because if a bb is empty, it is empty
19:18 imirkin_: there actually shouldn't be any phi nodes in the subsequent blocks
19:18 karolherbst: it doesn't do any harm at all
19:19 imirkin_: yeah, if it's 100% empty (not even any phi's), then the later block shouldn't have any phi nodes
19:19 imirkin_: at which point this procedure should be safe
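A minimal sketch of the branch check from this thread on a hypothetical toy CFG (`Block`, `resolve` and `branchIsRedundant` are illustrative names): follow chains of empty, phi-free, single-successor blocks from both edges of a conditional branch, and treat the branch as redundant only if both edges land in the same block and that block has no phi nodes. It assumes `succ[0]` is the fall-through edge, and it compares targets without modifying any edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG node. Assumption: succ[0] is the fall-through edge and
// succ[1] (if present) is the target of the conditional bra ending the block.
struct Block {
    int numInsns = 0;   // instructions other than the terminating bra
    bool hasPhi = false;
    std::vector<int> succ;
};

// Follow empty, phi-free, single-successor blocks to the first block that
// does real work. The step limit guards against cycles of empty blocks.
int resolve(const std::vector<Block>& cfg, int b) {
    for (std::size_t steps = 0; steps < cfg.size(); ++steps) {
        const Block& blk = cfg[b];
        if (blk.numInsns != 0 || blk.hasPhi || blk.succ.size() != 1)
            break;
        b = blk.succ[0];
    }
    return b;
}

// The bra at the end of b is redundant if both edges end up in the same
// block and that block has no phi nodes (a phi would pick its value based on
// which edge was taken).
bool branchIsRedundant(const std::vector<Block>& cfg, int b) {
    const Block& blk = cfg[b];
    if (blk.succ.size() != 2)
        return false;
    const int fall = resolve(cfg, blk.succ[0]);
    const int taken = resolve(cfg, blk.succ[1]);
    return fall == taken && !cfg[fall].hasPhi;
}
```

A real pass would additionally have to keep joinat/join pairs consistent, which this toy ignores.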
19:19 karolherbst: imirkin_: this is the thing I am looking at right now: https://gist.github.com/karolherbst/6610cb934d2ffe47b2a18fc3319aa6b9
19:19 karolherbst: BB:10 for example
19:19 karolherbst: and the set in BB:7 is dead code
19:20 karolherbst: ohh wrong one
19:20 imirkin_: note how it has a phi node
19:20 karolherbst: huh, wait
19:20 imirkin_: you mean BB:6
19:20 karolherbst: nope, I just forgot to turn on my post ra dce
19:22 karolherbst: hu
19:22 karolherbst: my empty bb pass indeed produced the same result
19:22 karolherbst: not that bad as it seems
19:23 karolherbst: https://gist.github.com/karolherbst/6610cb934d2ffe47b2a18fc3319aa6b9
19:24 karolherbst: added the current situation now
19:25 karolherbst: post RA 28-30 are DCEed away
19:25 karolherbst: 26-28 pre RA
19:25 karolherbst: so yeah
19:26 karolherbst: the set in BB:10 is pointless
19:26 karolherbst: and BB:11 and BB:12 don't even contain a phi instruction, just plain bra
19:27 karolherbst: and my idea was to check if, when I follow the empty branches from BB:10 that I get to BB:13 taking either way
19:27 karolherbst: so that means the bra at the end of BB:10 is pointless
19:27 imirkin_: your comments aren't matching up with that paste
19:27 imirkin_: anyways, i can't look at this right now
19:27 karolherbst: a_old pre RA
19:27 karolherbst: k
19:27 imirkin_: you can make a succinct explanation and i can look at it later
19:28 imirkin_: not 100% sure what you're looking for from me
19:28 karolherbst: a simpler way of doing this pass
19:28 karolherbst: which means a way without touching edges
19:28 imirkin_: send patch to list and we can discuss it there.
19:28 karolherbst: well, I've got an idea now anyway
19:51 vita_cell: can someone help me, it seems that I can not still get my Nvidia card as OpenGL render
19:51 karolherbst: vita_cell: prime?
19:51 karolherbst: vita_cell: use dri3
19:51 vita_cell: yeah wait
19:52 vita_cell: look I do this:
19:52 vita_cell: xrandr --setprovideroffloadsink 1 0 && DRI_PRIME=1
19:52 karolherbst: use dri3
19:52 NanoSector: that first thing shouldn't be needed once you use dri3
19:52 vita_cell: DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
19:52 vita_cell: OpenGL renderer string: Gallium 0.4 on NV106
19:52 NanoSector: yes it works then
19:52 vita_cell: no
19:53 vita_cell: games run the same way
19:53 karolherbst: you have to start the game with DRI_PRIME=1 ;)
19:53 NanoSector: that does not mean it's not working
19:53 vita_cell: I do the below, and it seems that the Nvidia card works as the OpenGL renderer
19:53 vita_cell: then I reclock, but games run on Intel
19:53 karolherbst: check clients
19:54 karolherbst: within debugfs
19:54 vita_cell: http://dpaste.com/1MA1R7E
19:54 karolherbst: ...
19:54 vita_cell: everything still runs on Intel
19:54 karolherbst: read again what I wrote
19:55 karolherbst: I already told you what you did wrong
19:55 vita_cell: what is it?
19:55 karolherbst: use DRI_PRIME=1
19:55 vita_cell: ahhh
19:55 vita_cell: ok
19:55 vita_cell: wait
19:55 karolherbst: it isn't a command
19:55 vita_cell: ./program DRI_PRIME=1
19:55 karolherbst: no
19:55 karolherbst: DRI_PRIME=1 command
19:56 karolherbst: DRI_PRIME is just an environment variable
19:56 vita_cell: DRI_PRIME=1 ./program
19:56 karolherbst: and you can define those for each command invocation if you put them in front
19:56 karolherbst: but they don't get exported to your shell
19:58 vita_cell: karolherbs yeah, now it works; probably I missed the DRI_PRIME part, or it is missing from the Arch docs
19:58 vita_cell: now it runs at almost 200fps, but it stutters LOL
19:59 karolherbst: it is pretty much basic shell knowledge
19:59 karolherbst: vita_cell: yeah, use dri3 to remove stuttering
19:59 karolherbst: and enable vsync
19:59 vita_cell: what dri3?
19:59 vita_cell: lib?
19:59 karolherbst: no, just a fancy way of doing offloading
19:59 vita_cell: how do I use it?
19:59 karolherbst: you have to enable dri 3 on the intel ddx
20:00 vita_cell: I have a GMA 4500, which is where my monitor's VGA cable is connected
20:02 karim: hi
20:02 karim: is it possible to have 1920x1080 résolution with nouveau driver and a gtx 1070 ?
20:02 imirkin_: yes
20:02 karim: xrandr doesn't propose it
20:02 karim: on ubuntu 16.04
20:02 imirkin_: then something is wrong
20:02 imirkin_: ah - probably that
20:03 karim: kernel 4.4.0
20:03 imirkin_: you're most likely using a kernel that doesn't support GP10x
20:03 karim: ok
20:03 karolherbst: imirkin_: the first outgoing edge is always the next BB if you don't take the bra?
20:03 imirkin_: you need ... 4.8 i think
20:03 imirkin_: karolherbst: yes.
20:03 karolherbst: k
20:03 vita_cell: karolherbs how to unload dri3?
20:03 vita_cell: modprobe?
20:03 karim: I wonder how ubuntu will maintain a lts version with a 4.4 kernel
20:05 imirkin_: karim: well, dunno how ubuntu works, but usually stable means "if it worked before, it'll keep on working". not "it works" :)
20:05 karim: imirkin_, yes
20:05 imirkin_: anyways, looks like the initial GP10x support went into v4.8
20:05 imirkin_: you won't be able to get accel, but modesetting should work
20:06 karim: but anyway the real issue is rather that you need to change the whole kernel to upgrade one driver
20:06 karim: i will install the nvidia driver
20:06 karim: proprietary, I wanted to avoid that because it's not the usual video card for this computer
20:06 imirkin_: ah - then you want to get an amd gpu - those are well-supported by amd
20:07 karim: it is an amd gpu normally
20:07 karim: i will put it back later
20:08 karolherbst: imirkin_: would it mess up the CFG if I simply remove the predicate of a bra?
20:08 karolherbst: or wouldn't it care much about that
20:08 imirkin_: the CFG is totally separate from instructions.
20:08 imirkin_: there's nothing enforcing bra's pointing to where the CFG says they do :)
20:08 karolherbst: imirkin_: yeah, I know, I was just meaning that if something later will be messed up if you have a BB with two outgoing edges, but one unconditional bra :)
20:49 karolherbst: how can I set an immediate as a predicate?
20:57 imirkin_: i doubt that's particularly well supported
20:57 imirkin_: if it's an immediate, just remove the predication
20:57 karolherbst: yeah, already noticed
20:57 karolherbst: well
20:57 imirkin_: (or remove the instruction, depending on the outcome)
20:57 karolherbst: well
20:57 karolherbst: that won't work out well
20:58 karolherbst: because that will cause some joinats to stay
20:58 karolherbst: which gets removed by flattening
20:59 karolherbst: and some joins and bras and stuff :/
20:59 karolherbst: well, I could try to remove the bra, never tried that
21:00 karolherbst: mhh, that went slightly better
21:01 karolherbst: now I have a joinat and a join in the BB where the set was before
21:03 karolherbst: imirkin_: ohh, how well would it be supported if I just add a new predicate value for the bra and set it to some immediate value
21:03 karolherbst: like doing mov %p... 0x0
21:03 imirkin_: i THINK that might be supported... maybe.
21:03 karolherbst: looks like the cleanest way
21:04 karolherbst: because any other way requires me to cut an edge
21:04 karolherbst: otherwise flattening just crashes
21:04 imirkin_: note that P7 = true, (and !P7 = false)
21:04 karolherbst: ohhhh
21:04 karolherbst: nice
21:04 karolherbst: flattening will remove that bra anyway
21:04 imirkin_: not having a predicate is identical to having a P7 predicate
21:04 imirkin_: (this is literally how the emission works)
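A toy model of the predicate rule imirkin_ describes: P7 always reads as true, so an unpredicated instruction behaves like one predicated on P7, and one predicated on !P7 never executes. `PredRegs`, `PredField` and `executes` are hypothetical names for illustration, not the nv50_ir emitter.

```cpp
#include <array>
#include <cassert>

constexpr int kPT = 7;   // P7: always reads as true

// Hypothetical predicate register file P0..P7.
struct PredRegs {
    std::array<bool, 8> p{};   // P7's stored bit is ignored
    bool read(int idx) const { return idx == kPT ? true : p[idx]; }
};

// Hypothetical predicate field: "no predicate" is just P7, and !P7 means never.
struct PredField {
    int idx = kPT;
    bool neg = false;
};

bool executes(const PredRegs& regs, PredField f) {
    const bool v = regs.read(f.idx);
    return f.neg ? !v : v;
}
```

In this model a bra predicated on !P7 is a constant "never taken" branch, which is the kind of constant condition the "set it to some immediate value" idea is after.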
21:04 karolherbst: good
21:05 karolherbst: how can I do that within SSA the cleanest way?
21:05 imirkin_: dunno
21:06 karolherbst: val0 = new_LValue(func, FILE_PREDICATE);
21:06 karolherbst: at least this is done for the TGSI thing
21:07 karolherbst: and then I would just do a mov setting that one I guess
21:07 karolherbst: mhh
21:07 karolherbst: shouldn't work
21:07 karolherbst: meh
21:08 karolherbst: ohh
21:08 karolherbst: simple
21:08 karolherbst: bld.getSSA(4, FILE_PREDICATE); ....
21:10 karolherbst: \o/
21:10 karolherbst: it works
21:10 karolherbst: 63: mov f32 %p260q 0.000000 (0)
21:10 karolherbst: 64: %p260q bra BB:12 (0)
21:10 karolherbst: and flattening just removed that crap
21:10 imirkin_: cool
21:11 imirkin_: presumably bld.getSSA(1, FILE_PREDICATE). but who's counting.
21:11 karolherbst: to be sure I also did setPredicate(CC_NEVER)
21:11 karolherbst: but I think it has to be NOT
21:11 karolherbst: ...
21:12 imirkin_: well, P7 = true
21:12 karolherbst: uhhh
21:12 karolherbst: CC_NOT_P
21:12 karolherbst: ohh right
21:13 karolherbst: like it matters anyway
21:14 karolherbst: I like my new version much better now :)
21:14 karolherbst: less hacks
21:14 imirkin_: easier not to futz with control flow =]
21:14 karolherbst: yeah
21:14 karolherbst: and just 100loc
21:14 karolherbst: the old one was nearly double in size
21:16 karolherbst: :O
21:16 karolherbst: https://gist.github.com/karolherbst/9ed0ae25db58511581f6a524e3c10936
21:17 karolherbst: I guess that is again that orbital shader which spills
21:19 karolherbst: jo, it is
21:20 karolherbst: 5166 instructions, sure :D
21:20 imirkin_: that seems nice.
21:20 karolherbst: that shader is silly anyway
21:20 karolherbst: or nouveau is with that one
21:20 karolherbst: no clue
21:20 imirkin_: orbital? yeah. it's there as a stress test :)
21:20 karolherbst: it has like 60% movs or so
21:21 karolherbst: there are even BBs with 14 movs and nothing else
21:21 karolherbst: ...
21:22 karolherbst: yeah, I won't care about that one, because there are too many things wrong with it anyway
21:23 karolherbst: funny
21:23 karolherbst: and then 95% of the hurt shaders are from the talos principle
21:25 karolherbst: huh
21:25 karolherbst: a predicated texbar? :D
21:25 karolherbst: how funny
21:25 karolherbst: ohhh I see
21:25 karolherbst: mhhh
21:26 karolherbst: mhhhhhhh
21:26 karolherbst: imirkin_: basically the hurt ones look like this: BB:1 texbar -> BB:2 $p0 texbar; BB:3 not $p0 texbar
21:27 karolherbst: mhhh this looks odd
21:28 imirkin_: weird
21:28 karolherbst: the heck is going on here
21:28 karolherbst: the BBs are empty pre RA
21:28 imirkin_: well, the thing that inserts texbars happens post-RA
21:28 karolherbst: and then they get predicated movs and a texbar
21:28 karolherbst: https://gist.github.com/karolherbst/97e9cda921ac67f90070696a86be4116
21:28 imirkin_: right, that makes sense
21:28 karolherbst: bb:2, 3
21:28 karolherbst: and BB:5,6
21:28 imirkin_: note $r20
21:29 imirkin_: depending on the direction it takes, it's either $r11 or $r9
21:29 karolherbst: but why
21:32 imirkin_: 75: phi u32 %r461 %r449 %r448 (0)
21:32 karolherbst: yeah, just noticed that
21:32 imirkin_: depending on where it comes from, it'll take one or another value
21:33 karolherbst: this will be tricky to detect
21:33 imirkin_: so the bra isn't QUITE so useless =/
21:33 karolherbst: yeah
21:33 karolherbst: I see that now
21:33 imirkin_: basically your thing has to make sure that there are no phi nodes
21:33 imirkin_: in that next block
21:33 karolherbst: simply enough
21:34 imirkin_: or run it after the InsertMovsAllOverThePlacePass
21:34 imirkin_: (in nv50_ir_ra.cpp)
21:34 imirkin_: which is kinda too late
21:34 karolherbst: mhhh
21:34 imirkin_: so yeah, that's not great
21:34 karolherbst: I can check for phis
21:34 karolherbst: the benefit should be big enough
21:34 imirkin_: hopefully.
21:34 imirkin_: but yeah, basically the simple case there is just
21:35 imirkin_: if (a) x = foo.xy; else x = foo.yx;
21:35 imirkin_: use(x)
21:35 imirkin_: that use will be in the next bb
21:35 imirkin_: and there won't be any mov's in those blocks
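Why the branch still matters in the `if (a) x = foo.xy; else x = foo.yx; use(x)` example: both paths reach the same join block, but the phi there picks its value by incoming edge. A hypothetical toy model (`Vec2`, `Phi` and `runShader` are illustrative names):

```cpp
#include <array>
#include <cassert>

struct Vec2 {
    float x, y;
};

// Hypothetical phi node: incoming[k] is the value arriving on edge k.
struct Phi {
    std::array<Vec2, 2> incoming;
    Vec2 select(int edge) const { return incoming[edge]; }
};

// Toy version of: if (a) x = foo.xy; else x = foo.yx; use(x);
Vec2 runShader(bool a, Vec2 foo) {
    const Phi phi{{Vec2{foo.x, foo.y}, Vec2{foo.y, foo.x}}};
    const int edge = a ? 0 : 1;   // which predecessor block we arrived from
    return phi.select(edge);
}
```

After RA the phi is realized as constraint movs placed in the predecessor blocks, so those blocks can look empty before RA even though taking one edge or the other changes `x`.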
21:36 karolherbst: mhhh
21:36 karolherbst: well we could add another stage! :D
21:36 imirkin_: not really.
21:36 karolherbst: something between SSA and RA where we have every preparation done, but still SSA form
21:37 imirkin_: yeah, but you'd want to run DCE after
21:37 karolherbst: uhhh
21:37 karolherbst: that will remove those movs again, right?
21:37 imirkin_: =]
21:37 imirkin_: actually i guess it won't
21:37 karolherbst: well
21:37 karolherbst: they are fixed, right?
21:37 karolherbst: (hopefully)
21:38 imirkin_: a CSE would remove them
21:38 imirkin_: but DCE shouldn't
21:38 karolherbst: k
21:38 karolherbst: well, lets wait for the results first though
21:40 karolherbst: .....
21:40 karolherbst: well, crap
21:40 karolherbst: https://gist.github.com/karolherbst/f179d1a3f3fccb880cb5718d24c17c92
21:41 imirkin_: pretty sure the original thing was just wrong though =/
21:41 karolherbst: yeah
21:41 karolherbst: it broke some shaders
21:41 karolherbst: most likely
21:41 karolherbst: so mhhh
21:41 imirkin_: not quite as impressive
21:42 karolherbst: I could see what my postradce still removes and try to integrate this as well
21:46 karolherbst: imirkin_: will this move thing only happen for things like tex? or is there a bunch of things
21:46 imirkin_: tex is a big source of it
21:46 imirkin_: but hardly the only one
21:47 imirkin_: any time you have a foo.xy vs foo.yx thing
21:47 imirkin_: you have to insert these constraint movs
21:48 karolherbst: the DCE pass: total instructions in shared programs : 2818120 -> 2817861 (-0.01%)
21:48 karolherbst: so there is still a little
21:50 karolherbst: imirkin_: do you think it would be indeed fine to run that branch elim pass inside RA after the constraint movs were added?
21:51 imirkin_: i dunno
21:51 imirkin_: i think you're making this into a bigger deal than it is
21:51 imirkin_: like ... aren't there better opts one could go after
21:52 karolherbst: true
21:52 imirkin_: this seems like it'll just add a lot of complexity
21:52 imirkin_: for fairly limited benefit
21:52 karolherbst: like the post ra folding
21:52 karolherbst: well
21:53 karolherbst: the current implementation touches shaders from nearly just one game
21:53 karolherbst: and they usually go down by 3 instructions each
21:53 karolherbst: I think I will leave it as it is for now and get back to that later
21:54 karolherbst: imirkin_: we could also just do the post ra DCE thing instead ;)
21:55 karolherbst: hey, somebody finished and posted a series for ARB_enhanced_layouts
21:56 karolherbst: and only enables 4.5 for radeonsi...
21:56 karolherbst: wouldn't it just work with nvc0 too?
21:56 imirkin_: probably not. i'll figure it out over the weekend
21:56 karolherbst: k
21:56 karolherbst: the patches are just st/glsl_to_tgsi though
21:56 karolherbst: mainly
21:56 karolherbst: no radeon thing
21:57 imirkin_: i stand by my answer.
21:57 karolherbst: k