00:20karolherbst: imirkin_: no abs with iadd :/
00:21karolherbst: if you think about it, abs on integers is more complicated than any of the other things we got there
00:22HdkR: Nobody needs integer abs anyway right? :P
00:23karolherbst: well, it's painful to implement
00:23imirkin: karolherbst: really? annoying.
00:23imirkin: karolherbst: what about ICMP or some shit like that
00:23imirkin: [not worth it]
00:23karolherbst: there is no easy way to lower iabs
00:24imirkin: well, in theory, maybe ICMP x, -x
00:24imirkin: er, ICMP, x, x, -x
00:24imirkin: although i think that's just == 0 and != 0
00:24imirkin: there's no < 0
00:25karolherbst: let me ask nvidia :D
00:30karolherbst: we have emitCond3 on ICMP
00:30karolherbst: so there is >0
00:30karolherbst: no neg modifier?
00:31karolherbst: yeah... I see no way to do an iabs in one op except i2i
00:34karolherbst: imirkin: also, what did you mean by "Similar question for OP_NEG."?
00:34karolherbst: got it
00:34karolherbst: neg(abs(a)) wouldn't be possible as well :)
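For reference, here is a sketch of the two usual ways to lower an integer abs when there is no single op for it. It uses plain Python with explicit 32-bit wrapping. This is a generic illustration, not nouveau's codegen, and the helper names are hypothetical. `iabs_select` mirrors the compare-and-select idea discussed above (keep x when x > 0, else take -x). `iabs_shift_xor` is the branchless sign-mask trick, which costs three ops (shift, xor, subtract). Either way it is more than one instruction, and `neg(abs(x))` adds one more negation on top.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -(1 << 31)


def to_s32(v: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value (two's complement)."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def iabs_select(x: int) -> int:
    """Compare-and-select lowering: keep x if x > 0, otherwise take -x."""
    x = to_s32(x)
    return x if x > 0 else to_s32(-x)


def iabs_shift_xor(x: int) -> int:
    """Branchless lowering: mask = x >> 31 (arithmetic), abs = (x ^ mask) - mask."""
    x = to_s32(x)
    mask = x >> 31  # 0 when x >= 0, -1 (all bits set) when x < 0
    return to_s32((x ^ mask) - mask)


def ineg_iabs(x: int) -> int:
    """neg(abs(x)): one extra negation on top of either lowering."""
    return to_s32(-iabs_select(x))


# Both forms agree, including the wrap-around case abs(INT32_MIN) == INT32_MIN.
assert all(iabs_select(v) == iabs_shift_xor(v) for v in range(-1000, 1000))
```

Note the wrap-around: with 32-bit two's complement, abs(INT32_MIN) has no positive representation and comes back as INT32_MIN in both forms.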
01:32Lyude: (if anyone could let in the emails I just sent on nouveau's ML that'd be appreciated)
01:33imirkin:has no clue who list admins are
01:35Lyude: yeah same
02:04imirkin: Lyude: i know this is a dumb question before i even ask it ... but with fermi -- no way to get 2 screens to show different content hanging off a single DP port, right?
02:08HdkR: Get an ultrawide display ;)
02:08imirkin: HdkR: too tall.
02:09imirkin: neck starts to hurt from all the looking up-and-down
02:09HdkR: Don't rotate it that way then
02:09imirkin: but then ... too short.
02:09imirkin: the rotated 24" is about the perfect height
02:10HdkR: 1440p is too short for me usually as well
02:10Lyude: imirkin: no such thing as a dumb question! But, yeah unfortunately if it doesn't have MST it can't do daisy chains
02:10imirkin: HdkR: 1920 ftw!
02:10HdkR: 4k life :D
02:10imirkin: although it's not really about the number of pixels
02:10imirkin: but the number of inches.
02:11HdkR: 13" 4k woo?
02:11imirkin: Lyude: =/ the combination of GPUs i have is not ideal. might be time to get a proper one.
02:11imirkin: the GTX960 is borrowed ... only other DP board i have is a GF108
02:12imirkin: GP108 has HDMI + DVI.
02:12imirkin: nothing will do 3 screens =/
02:13imirkin: HdkR: i've never tried this before, but i might end up with 3x rotated 24" screens... should be interesting.
02:13imirkin: no way to do it without a giant seam unfortunately
02:14imirkin: maybe right move is to leave one unrotated.
02:14HdkR: Hm, 3x rotated 4k displays sounds interesting
02:14Lyude: imirkin: not even with pll sharing?
02:15imirkin: Lyude: hm?
02:16imirkin: oh, i see - maybe in theory. i don't think nouveau lets you do that. and i don't know that nvidia hw supports it.
02:16Lyude: imirkin: on my AMD R9 380 I'm able to do triple displays since they're all identical modes
02:16imirkin: yeah, no -- pre-kepler boards only have 2 CRTCs
02:16imirkin: kepler+ boards have 4
02:16imirkin: but that only helps if you have the connectors to hook it up...
02:17imirkin: GP108 has HDMI + DVI... what does GM107 have? hm
02:17karolherbst: Lyude: I was setting up 4 FHD screens on a low end AMD card once ;)
02:17Lyude: karolherbst: oh?
02:17imirkin: ah of course. GM107 has DVI, HDMI, and ...
02:18imirkin: i guess my 3rd monitor does have a VGA input, but ... not sure i want that.
02:18Lyude: Can't do multiple cards at once?
02:18karolherbst: Lyude: was 4 times DP though
02:18Lyude: karolherbst: ahhhh
02:18imirkin: Lyude: sure
02:19imirkin:was hoping to avoid it
02:19imirkin: also while my gpu supply is apparently infinite
02:19imirkin: my cable supply is much more restricted
02:22imirkin: really i should get a primary non-nvidia board
02:22imirkin: so i can more easily hack on nouveau
02:25imirkin: hm. RX 550 for $80 new.
06:03imirkin: huh. what's fermi missing for ES 3.2? i would have assumed it would have gone along for the ride...
06:21imirkin: oh duh! blend_equation_advanced
06:39HdkR: imirkin: Time to implement dumb blending
06:39imirkin: HdkR: the bit i'm having trouble with is fetching the fb data
06:40imirkin: [i think]
07:06imirkin: mwk: any clue how to access the "TIC2" bindings on fermi?
07:07imirkin: mwk: and separately, what's the "cl" flag? is that a clamp on a texfetch?
11:15mwk: imirkin: TIC2? not really no
21:09pmoreau: karolherbst: Thanks!
21:14karolherbst: pmoreau: didn't runtime test, something might break :/
21:15pmoreau: I’ll ping you if that happens ;-D
21:16karolherbst: pmoreau: I am sure it is less broken than my old branch though :D
21:16pmoreau: Could be :-D
21:16karolherbst: basically all memory operations should be fixed
21:16pmoreau: That sounds good
21:16karolherbst: pmoreau: even passing structs by value into the kernel should just work now
21:17pmoreau: I’ll give it a try once I’m out of the meeting
21:18karolherbst: mhh packed memory operations are still broken though :/ *sigh*
21:18karolherbst: really should look into jekstrand's alignment stuff
21:18karolherbst: but that's for next year :p
21:18karolherbst: pmoreau: what GPU do you have?
21:19pmoreau: Quite a few Tesla, a couple Fermi and Kepler, one Maxwell v2 and one Pascal.
21:19karolherbst: mhh, for the pascal one, you could use my "secboot_fixes" branch to enable hmm
21:20karolherbst: the nouveau one
21:20karolherbst: it's 4.18 based though
21:20pmoreau: And some Volta and Turing at work obviously, but those are not for Nouveau. :-/
21:20karolherbst:was too lazy to rebase his nouveau tree for quite a long time
21:20karolherbst: probably do it the next time I reboot
21:20pmoreau: Okay, I’ll try to try that branch before going back home for Christmas.
21:23karolherbst: wondering what kind of totally useless optimization I could work on next... iadd3? :D
21:24karolherbst: mhh, iadd3 isn't _that_ useless though
21:24karolherbst: you can use iadd3 for iadd neg a neg b
21:24pmoreau: Ah, need to pick something else then
21:25pmoreau: Who wants useful stuff
21:25karolherbst: but I have a WIP branch for it, so maybe I just finish it
21:25pmoreau: If you want useless things to work on, I think the current LED driver is monochromatic only at the moment IIRC. ;-)
21:26karolherbst: that's not useless
21:27karolherbst: pmoreau: think about it, if everything would be more or less equal to nvidia, but with nouveau you are able to do more crazy LED stuff, some people would use nouveau then :p
21:27pmoreau: Well… :-D
21:28pmoreau: Call me back when everything else is more or less equal to NVIDIA! :-p
21:29pmoreau: I guess it depends on how large you are with “more or less”.
21:31karolherbst: performance obviously
21:32pmoreau: Hierarchical Z could be nice
21:33karolherbst: I think we do the basics, just not the fast zcull stuff
21:33karolherbst: ohhh, I found something good
21:34HdkR: Something perfectly useless? :)
21:35karolherbst: not quite
21:35karolherbst: I have some slcts optimizations not merged
21:35karolherbst: -0.4% instructions on avg
21:49karolherbst: mhh "total instructions in shared programs : 7614782 -> 7545812 (-0.91%)"
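The percentage in that shader-db line is the plain relative change (after - before) / before. Here is a quick check, assuming the report uses exactly that formula. The function name is illustrative only.

```python
def relative_change(before: int, after: int) -> str:
    """Percent change from `before` to `after`, with an explicit sign."""
    return f"{(after - before) / before * 100:+.2f}%"


print(7614782 - 7545812, relative_change(7614782, 7545812))  # 68970 -0.91%
```

So the series removes roughly 69k instructions across the shared programs, which matches the reported -0.91%.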
21:49karolherbst: ohh, more shaders from games ported by feral were added
23:02karolherbst: HdkR: just found a patch to remove the most questionable optimization: mul(a, 2) to add(a, a)
23:03imirkin_: why is that questionable?
23:03karolherbst: or is add faster on any hw?
23:03imirkin_: no, but a is a register
23:03imirkin_: while 2 is an immediate
23:03karolherbst: imirkin_: mul(c, 2) -> add(c, c) won't work
23:03imirkin_: add a,a can be encoded as 4 bytes on tesla
23:04karolherbst: or neg a
23:04imirkin_: but if there's an imm, that adds 4 bytes to the encoding
23:04karolherbst: well neg a would actually work
23:04karolherbst: imirkin_: mhhhhhh
23:04karolherbst: imirkin_: same with shl?
23:05imirkin_: vs what?
23:05karolherbst: we could prefer doing a shl(a, 1) vs add(a, a)
23:05RSpliet: shl wouldn't work on floats?
23:05karolherbst: yeah, it wouldn't
23:05imirkin_: hm, dunno. would have to check the encoding rules.
23:05imirkin_: (no shl on floats)
23:05karolherbst: anyway, the actual benefits on shader-db are way too small
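The rewrite discussed above can be sketched as a peephole over a toy IR. The `Operand`/`Insn` classes and `fold_mul_by_two` are hypothetical and are not nv50_ir's API. The sketch encodes the points from the thread. `mul(a, 2)` becomes `add(a, a)` only when `a` is a register, because the immediate costs extra encoding space on Tesla and `mul(c, 2) -> add(c, c)` won't work for a constant-buffer operand. `shl(a, 1)` is an option only for integer types, since there is no shl on floats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    kind: str      # hypothetical kinds: "reg", "imm" (immediate) or "cbuf" (constant buffer)
    value: object


@dataclass(frozen=True)
class Insn:
    op: str        # "mul", "add", "shl", ...
    dtype: str     # "f32", "s32" or "u32"
    srcs: tuple


def fold_mul_by_two(insn: Insn, prefer_shl: bool = False) -> Insn:
    """Rewrite mul(a, 2) into add(a, a), or shl(a, 1) for integers if prefer_shl is set."""
    if insn.op != "mul" or len(insn.srcs) != 2:
        return insn
    a, b = insn.srcs
    if a.kind == "imm":
        a, b = b, a  # put the immediate (if any) second
    if b.kind != "imm" or b.value != 2:
        return insn
    if a.kind != "reg":
        # add(c, c) would need the same constant-buffer operand twice, which is
        # assumed here not to be encodable, so keep the mul.
        return insn
    if insn.dtype == "f32":
        # x + x rounds to exactly the same value as x * 2; there is no float shl.
        return Insn("add", "f32", (a, a))
    if prefer_shl:
        return Insn("shl", insn.dtype, (a, Operand("imm", 1)))
    return Insn("add", insn.dtype, (a, a))
```

Whether `shl` or `add` is the better integer choice would depend on the encoding rules and on dual-issue pairing, which the thread leaves open; the flag just makes that choice explicit.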
23:06RSpliet: on Kepler the q is also which is easier to dual-issue... a mul with sth else, or an add with sth else
23:07karolherbst: on kepler that doesn't matter as all of those are alu ops
23:07karolherbst: are shifts special?
23:07imirkin_: they could be
23:08imirkin_: since they only take like 5 bits of immediate
23:08imirkin_: so they could have a diff encoding
23:08imirkin_: not sure
23:08karolherbst: no, I was more wondering in regards to dual issuing for kepler
23:08karolherbst: as we don't dual issue two shifts
23:09karolherbst: I remember adding an exception for min/max back then, no idea why I was never thinking about shifts
23:09imirkin_: oh yeah, dunno
23:13karolherbst: btw, sent out my slct opts again, they have quite an impact on those games ported by feral
23:27imirkin_: ok cool
23:27imirkin_: will have a look
23:28karolherbst: -1.7% instructions and -0.3% gprs if only checking those
23:36HdkR: karolherbst: Does your RA make any tradeoffs for occupancy?
23:36karolherbst: not afaik
23:39karolherbst: I don't even know if we have any kind of priority when spilling as well
23:40karolherbst: smart thing to do would be to spill values you basically never use ;) but I doubt we check on that as we would have to keep loops in mind as well
23:42HdkR: I can't say I've had the pleasure of correctly handling either case :P
23:42HdkR: Usually ends up being a dumb heuristic of "What is something that its next use case is farthest away"
23:44HdkR: So definitely not solved the perfect RA/Spill/Occupancy ratio problem :P
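The "next use farthest away" heuristic HdkR mentions is the classic furthest-next-use (Belady-style) choice. Here is a minimal sketch for straight-line code only. The program representation and function names are assumptions for illustration. As karolherbst points out, this ignores loops, where a value whose next use looks far away may be reused on every iteration.

```python
import math


def next_use_distance(program, pos, value):
    """Instructions from `pos` until the next read of `value` (math.inf if never read again).

    `program` is straight-line code given as a list of sets: the values each instruction reads.
    """
    for i in range(pos, len(program)):
        if value in program[i]:
            return i - pos
    return math.inf


def pick_spill_candidate(program, pos, live):
    """Furthest-next-use heuristic: spill the live value that is needed last."""
    # sorted() only makes tie-breaking deterministic.
    return max(sorted(live), key=lambda v: next_use_distance(program, pos, v))


program = [{"a"}, {"b"}, {"a"}, {"c"}, {"b"}]
print(pick_spill_candidate(program, 1, {"a", "b", "c"}))  # c
```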
23:45karolherbst: mhh you could backtrack from all returns
23:45karolherbst: and just pick any you are comfortable with :D
23:45karolherbst: but yeah, having some kind of loop analysis is more or less required
23:48joepublic: does a gtx 1050 need non-free firmware to work? asking for a friend
23:49RSpliet: HdkR: depending on how you treat a program, that could easily be an incredibly involved heuristic. You'd want to find the longest path in a dependency graph or sth like that...
23:49karolherbst: depends on your definition of work
23:49karolherbst: modesetting works without firmware
23:49karolherbst: but you won't get any hw acceleration
23:49karolherbst: RSpliet: the longest path isn't directly the best one
23:49HdkR: RSpliet: Yes, and it still ends up being a heuristic in the best case. There is no "perfect" version
23:50joepublic: i must be reading the feature matrix page wrong, in that i thought that situation would say extfw. thanks for explaining.
23:51RSpliet: karolherbst: oh I agree it's unlikely to be the best heuristic, but the longest path from gen to use (eh... how would that even work with multiple use) gives you the best approximation of instruction distance.
23:52karolherbst: RSpliet: just that instruction distance is a bad metric here in the first place
23:52karolherbst: the only thing you really care about is the amount of reads/writes
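RSpliet's "longest path in a dependency graph" metric can be sketched as a memoized search over a DAG. The graph format and the `longest_path` name are hypothetical. It only answers the single def-to-use question; how to combine multiple uses is left open in the thread. karolherbst's objection still stands: this measures distance, not the number of reads/writes a spill would cost.

```python
from functools import lru_cache


def longest_path(edges, src, dst):
    """Longest src -> dst path (counted in edges) in a dependency DAG, or None if dst is unreachable.

    `edges` maps each node to the nodes that depend on it. Assumes the graph is acyclic;
    a cycle would recurse forever.
    """
    @lru_cache(maxsize=None)
    def dist(node):
        if node == dst:
            return 0
        best = None
        for succ in edges.get(node, ()):
            d = dist(succ)
            if d is not None and (best is None or d + 1 > best):
                best = d + 1
        return best

    return dist(src)


deps = {"def": ["i1", "i2"], "i1": ["use"], "i2": ["i3"], "i3": ["use"]}
print(longest_path(deps, "def", "use"))  # 3, while the shortest path is 2
```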
23:53RSpliet: if you want to minimise number of spills, you'd have to look at all the instructions' liveness, and find the smallest set such that removing them from liveness reduces the size of live vars enough to satisfy your register allocation
23:53RSpliet: "smallest set of operands"
23:54karolherbst: which in the end is still the amount of reads/writes you want to minimize
23:54RSpliet: By which you mean the number of spill/retrieve ops?
23:55karolherbst: well sure
23:57RSpliet: Yep. If you do this post-instruction scheduling that means identifying the minimal number of operands to push/pop using all livesets with more operands than available registers in the arch
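RSpliet's formulation can be sketched as follows: choose values to spill until every live set fits in the available registers. Finding the true minimum set is a combinatorial search, so this sketch swaps in a simple greedy approximation. At each step it spills the value that appears in the most over-pressure live sets. The function name and input format are assumptions for illustration.

```python
def greedy_spill_set(live_sets, num_regs):
    """Greedily pick values to spill until every live set fits in num_regs registers."""
    spilled = set()
    while True:
        over = [s - spilled for s in live_sets if len(s - spilled) > num_regs]
        if not over:
            return spilled
        counts = {}
        for s in over:
            for v in s:
                counts[v] = counts.get(v, 0) + 1
        # Spill the value present in the most over-pressure live sets
        # (ties broken by name so the result is deterministic).
        best = max(sorted(counts), key=lambda v: counts[v])
        spilled.add(best)


live_sets = [{"a", "b", "c"}, {"a", "c", "d"}, {"b", "e"}]
print(greedy_spill_set(live_sets, 2))  # {'a'}
```

Counting spilled values treats every value as equally expensive, which is exactly what breaks down with loops.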
23:57karolherbst: if ops was operands, then that's not the correct thing to do
23:57karolherbst: keep loops in mind
23:58karolherbst: spilling 5 values might be cheaper if the alternative would mean to spill one value used in a nested loop
23:59RSpliet: Because inside that nested loop you'd be spilling one*x*y values. Yes, you'd have to compensate for repetition due to loop bounds
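The loop point can be made concrete by weighting each spill store or reload by the trip counts of the loops around it. The sketch below compares karolherbst's two options under hypothetical numbers. Real trip counts are rarely known at compile time, so allocators commonly substitute an estimate such as a fixed weight per nesting level; the 8 x 16 loop here is purely illustrative.

```python
import math


def spill_cost(accesses):
    """Dynamic spill/reload count for one value.

    `accesses` has one entry per read or write of the value: the tuple of trip
    counts of the loops enclosing that access (empty tuple = not in a loop).
    """
    return sum(math.prod(trips) for trips in accesses)


# Five values, each written once and read once outside any loop ...
five_cold = 5 * spill_cost([(), ()])
# ... versus one value written before and read inside a nested 8 x 16 loop.
one_hot = spill_cost([(), (8, 16)])
print(five_cold, one_hot)  # 10 129
```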