01:49 danofsatx: anybody seen this one yet? nouveau 0000:01:00.0: fifo: read fault at 003e403000 engine 00 [PGRAPH] client 08 [GPC2/] reason 02 [PAGE_NOT_PRESENT] on channel 21 [005e728000 factorio[16217]]
07:29 orbea: nice, reclocking isn't pushing my system as hard as it used to, it seems :)
08:43 hermier: karolherbst: still there ?
08:47 karolherbst: hermier: yeah
08:48 hermier: karolherbst: I got maybe the dumbest idea ever, did you think that using https://en.wikipedia.org/wiki/Duff%27s_device could help you to generalise pow somehow ?
08:53 hermier: karolherbst: if you don't know about it, it is used for implementing memcpy, but I think it can be reused for pow(float, int)
08:53 karolherbst: I highly doubt that
08:53 karolherbst: cause pow can take any float value
08:54 hermier: yes but you could use it for int values (if that does help)
08:56 hermier: anyway as I said, it may be dumb ^^
09:48 mwk: duff's device is a horrible idea for GPUs...
09:54 karolherbst: I thought the idea was to trick the compiler to generate better code anyway
09:55 karolherbst: and isn't it pretty much useless these days anyway?
15:17 Tom^: karolherbst: have you tried shadow of mordor on linux?
15:20 karolherbst: Tom^: I think pmoreau did
15:20 karolherbst: threading issues with nouveau though
15:20 Tom^: was thinking on how well it runs, and mostly on blob :P
15:22 karolherbst: no clue
15:23 Tom^: was better back in the days when there were demos :(
15:32 karolherbst: Tom^: well, today we have DLC! practically the same, cause for the full game you pay 60€ more!
15:34 Tom^: :D
16:06 pmoreau: karolherbst: I don’t remember ever running Shadow of Mordor, be it on Linux, macOS or Windows
17:10 karolherbst: pmoreau: ohh then it was hakzsam
17:18 jrun: nouveau flooding the dmesg
17:18 jrun: kernel: nouveau 0000:01:00.0: DRM: skipped size 0
17:19 imirkin_: you have a buggy client of some sort
17:19 imirkin_: that's trying to allocate 0-sized BO's
19:20 karolherbst: what was PTE again? "fifo: write fault at 000024c000 engine 00 [GR] client 02 [GPC0/PE_0] reason 02 [PTE] on channel 2"
19:20 imirkin_: page table entry
19:20 karolherbst: ahh
19:21 karolherbst: so basically memory screwed up
19:22 karolherbst: skeggsb: could you tell me which bits/bytes in the vbios aren't handled yet regarding memory reclocking? Just so that I know if I ever encounter those
19:47 imirkin_: karolherbst: PE_0 (whatever that is... some engine, but not one i immediately recognize) tried to write to that virtual memory address, and there was no PTE backing that VM address.
19:47 karolherbst: I see
19:47 karolherbst: I just got a report about a GM108 reclocking with current master and this error
19:48 imirkin_: memory is not screwed up... just missing a PTE. unlikely to be related to reclocking.
19:48 karolherbst: mhh
19:48 karolherbst: it works on default clocks though
19:48 karolherbst: there was also "fifo: write fault at 00001c2000 engine 05 [BAR2] client 08 [HOST_CPU_NB] reason 02 [PTE] on channel -1 [00ffbf6000 unknown]"
19:48 imirkin_: but of course PTE's live in memory, so *if* memory is screwed up, PTEs would be as well.
19:48 imirkin_: ouch
19:48 karolherbst: so yeah
19:49 karolherbst: I guess it has one of those bits/bytes set we don't handle yet
19:49 karolherbst: skeggsb said he has a few cards with reclocking issues
19:49 karolherbst: sadly all of mupuf's GPUs reclock just fine :/
19:50 imirkin_: yes, very sad.
19:50 imirkin_: but kinda happy too :)
19:50 karolherbst: true
19:50 karolherbst: I asked skeggsb (not enough though) to bring those to XDC...
19:50 karolherbst: but somehow he didn't see it and I forgot to ask again
19:50 karolherbst: ...
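[Editor's sketch] imirkin_'s explanation of the fault ("there was no PTE backing that VM address") can be illustrated with a toy model. This is only a sketch under loud assumptions: a single-level table held in a dict, 4 KiB pages, and made-up names (`PAGE_SIZE`, `PageFault`, `translate`). Real GPU page tables are multi-level and nothing here is nouveau code.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages for this toy model


class PageFault(Exception):
    """Raised when a virtual address has no page table entry."""


def translate(page_table, vaddr):
    """Translate a virtual address using a dict of virtual page -> physical page."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        # No PTE backs this address: the situation described for "reason 02 [PTE]".
        raise PageFault(f"fault at {vaddr:010x}: no PTE")
    return page_table[vpage] * PAGE_SIZE + offset


# One mapped page; the address from the log (000024c000) is deliberately unmapped.
page_table = {0x100: 0x7}
mapped = translate(page_table, 0x100010)

try:
    translate(page_table, 0x24C000)
    faulted = False
except PageFault:
    faulted = True
```

The mapped access resolves to a physical address, while the unmapped one faults without any memory being corrupted, which matches the point that a missing PTE is not the same as broken memory.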
20:10 danofsatx: anybody seen this one yet? nouveau 0000:01:00.0: fifo: read fault at 003e403000 engine 00 [PGRAPH] client 08 [GPC2/] reason 02 [PAGE_NOT_PRESENT] on channel 21 [005e728000 factorio[16217]]
20:11 danofsatx: it's a GeForce GTX 580
20:14 danofsatx: That error pops up in the journal (kernel 4.7.5-200.fc24.x86_64) when I try anything with advanced graphics and X freezes. I have to log in from another system to reboot the thing - I can't even restart X or the DM to recover as I was able to do with previous lockups
20:18 imirkin_: danofsatx: are you using KDE?
20:19 imirkin_: and also what application is that? factorio? (it's clearly cut off)
20:19 danofsatx: yes. I'm also seeing quite a few errors like this: https://paste.fedoraproject.org/445195/85092147/
20:19 danofsatx: factorio is a Steam game
20:20 imirkin_: ok, so i guess nouveau hates that steam game?
20:20 imirkin_: note that a ton of people have had a ton of issues with KDE things
20:20 danofsatx: well, it didn't until 4.7.4 - up until 4.7.3 it ran just fine.
20:20 imirkin_: without attempting to lay blame on the KDE folk (chances are it's not their fault), i'd recommend avoiding nouveau + kde.
20:21 imirkin_: are you sure it's the kernel version that changed?
20:21 imirkin_: and not the kde version?
20:21 danofsatx: I was afraid you were going to say something like that... I'd really rather avoid the NVIDIA blob, and I greatly dislike the GNOME interface
20:22 imirkin_: esp QtWebEngine has big issues with nouveau
20:22 imirkin_: and afaik they've been integrating it into more and more things
20:22 danofsatx: Let me check that update log, hang on a sec....
20:22 imirkin_: well, you could try removing nouveau_dri.so from your system, or running with LIBGL_ALWAYS_SOFTWARE=1 to avoid some level of issues.
20:23 imirkin_: fwiw i dislike what both gnome and kde have turned into.
20:23 imirkin_: i personally use WindowMaker, and have been for over 15 years.
20:24 imirkin_: if you're looking for hw that's well-supported by open source, i might recommend looking at AMD or Intel.
20:26 karolherbst: pmoreau, hakzsam: do you also get heap corruptions in shader-db?
20:26 hakzsam: sometimes
20:26 karolherbst: k
20:26 danofsatx: ok, the last major update to KDE before the kernel update was plasma-5.7.5, kf5-16.08, and kde-apps-16.08
20:26 karolherbst: just wanted to make sure it isn't my stuff :D
20:26 hakzsam: reverting the last commit from marek fixes the issue btw
20:26 karolherbst: on shader-db?
20:26 hakzsam: yes
20:27 karolherbst: guess some concurrency thing
20:27 hakzsam: no
20:27 karolherbst: ohh, just bad luck then
20:27 hakzsam: it also happens with -1
20:27 danofsatx: those didn't really change much performance wise, but on my first reboot after application of the 4.7.4 kernel update, I could no longer play any Steam games without the system freezing.
20:27 karolherbst: one opt of mine: https://gist.github.com/karolherbst/34c9b9b4a2465e830ad62ec9ba2fd37b :(
20:28 karolherbst: what a waste of time :D
20:28 danofsatx: Intel doesn't make add-in cards (that I can find)
20:28 hakzsam: karolherbst, hehe
20:28 karolherbst: the benefit is bigger if we run passes multiple times though, not quite sure
20:28 pmoreau: I have been able to run Factorio + KDE on this laptop, but might have been before 4.7.4, I’ll have to retest.
20:28 karolherbst: hakzsam: it is something like that: "optimize set,and,cvt to set"
20:29 karolherbst: and the and+cvt just make a boolean value
20:29 danofsatx: pmoreau: KDE Plasma 5?
20:29 pmoreau: Yup
20:29 hakzsam: karolherbst, okay, but unfortunately this doesn't help much
20:29 karolherbst: true
20:29 karolherbst: I am sure it helps more on more optimized stuff though
20:29 pmoreau: karolherbst: I have never played with shader-db yet, will have to some day.
20:30 karolherbst: hakzsam: anyway, I won't push out that patch, cause it is too big for that little benefit: https://github.com/karolherbst/mesa/commit/deb634900742ba18ade196bd1fdb118e1c195d44
20:30 imirkin_: danofsatx: yeah, but all their non-server CPUs come with moderately capable GPU functionality
20:30 pmoreau: danofsatx: I can’t test right now, but I’ll try over the week-end.
20:30 hakzsam: karolherbst, okay
20:30 karolherbst: I am sure this patch helped a lot more
20:30 karolherbst: but I don't know the dependency yet
20:32 danofsatx: my current guess is that the GPU itself is simply on the way out. This system was "free" - it served as payment for the labor of installing a new system for a customer.
20:33 imirkin_: danofsatx: there were also a bunch of issues a while ago with GF110, but i believe that those were resolved some time ago
20:44 mooch: hey, does anybody here know anything about the differences between nv3 and nv4 dma pusher commands?
20:45 mooch: i have nv4's pusher commands emulated just fine, but i wanted to get something working for nv3 emulation too
20:46 karolherbst: hakzsam: my sel folding pass: https://gist.github.com/karolherbst/9f80823550c38d7b34b24d5ca1cd67fd
20:47 karolherbst: it is a monster though
20:47 hakzsam: much better
20:47 hakzsam: especially for spilling
20:47 karolherbst: https://github.com/karolherbst/mesa/commit/404e434b4cfd0b224aae5d67e862c986b7a679a8 ;)
20:47 imirkin_: in that it spills more? :)
20:47 karolherbst: it isn't even complete
20:47 karolherbst: :D
20:48 karolherbst: imirkin_: actually the shader which spills more deserves it
20:48 hakzsam: outch
20:48 imirkin_: heheh
20:48 karolherbst: shaders/orbital_explorer.shader_test
20:48 karolherbst: is the one
20:52 mooch: yesssssss
20:52 mooch: i'll have to take a look at that one
20:52 mwk: heh
20:52 mwk: I now test everything about the SOLID classes *except* drawing actual shit on screen :)
20:53 karolherbst: I am sure there is a much simpler way to do that SEL folding thing, but ... :/
20:53 mwk: I mean, submitting vertices
20:53 mwk: you'd think it'd be simple, but no
20:53 karolherbst: it is just pretty annoying to write, cause you can screw up everything at any place
20:56 * mwk wonders how hard it'll be to test the rasterization algorithm
20:56 mwk: I suppose I'll need a good understanding of the canvas first
20:58 karolherbst: that's how I like it: https://gist.github.com/karolherbst/718b5bba5c14954daa5af35a43657bb1 :)
20:59 hakzsam: which opt?
20:59 karolherbst: https://github.com/karolherbst/mesa/commit/7447382841e1bd67e8b3e96401a70c9012c45f2b
20:59 karolherbst: smarter CSE thing
21:00 karolherbst: merges things like add(neg(a), b) and add(a, neg(b)) together
21:00 karolherbst: stuff like that
21:00 karolherbst: no idea if this is the right way of doing it though
21:00 hakzsam: cool
21:00 karolherbst: maybe I should just merge it together with the LocalCSE thing
21:00 karolherbst: or something
21:00 karolherbst: no idea though
21:01 karolherbst: but maybe it makes sense to separate those things
21:11 karolherbst: hakzsam: funny though, that the only benefit now comes from the PHI merging...
21:12 hakzsam: ah?
21:14 karolherbst: yeah
21:15 karolherbst: sometimes we generate phis like "phi %r500 %r600 %r700; phi %r501 %r600 %r700"
21:15 karolherbst: no idea why
21:16 karolherbst: maybe I should teach the normal CSE passes to handle this
21:16 karolherbst: cause they should
21:20 karolherbst: hakzsam: one shader without that pass: https://gist.github.com/karolherbst/8a322ff107c5e734f17ca4b2bc8af7cf
21:20 karolherbst: those phis...
21:21 hakzsam: ah yeah I see
21:21 karolherbst: it enables other opts later on due to this
21:21 karolherbst: this shader drops from 44 to 36 instructions
21:22 Shawn|4650M: howdy
21:22 hakzsam: karolherbst, nice opt
21:22 karolherbst: hakzsam: " 28: or u32 $r0 $r0 $r0" :D
21:23 hakzsam: lol
21:23 Shawn|4650M: I hear the nouveau driver is working better in current, is it ready to run on a p4 with an old nvidia geforce 7600gs?
21:23 imirkin_: define 'ready'?
21:23 Shawn|4650M: usable?
21:24 imirkin_: it's all in the eye of the beholder
21:24 imirkin_: easy enough to give it a shot and see if it's satisfactory for your needs
21:24 karolherbst: imirkin_: do you think this phi merging belongs in GlobalCSE or LocalCSE? I don't exactly know the difference, but I would assume both should be able to detect this?
21:24 Shawn|4650M: ah, is there a tutorial somewhere?
21:24 imirkin_: karolherbst: LocalCSE = CSE within a BB
21:25 karolherbst: yeah, makes sense
21:25 karolherbst: I have no idea why phis aren't handled there though
21:25 imirkin_: Shawn|4650M: many distros provide something, i think?
21:25 imirkin_: Shawn|4650M: chances are it's going to be distro-specific
21:25 Shawn|4650M: for netbsd?
21:25 imirkin_: Shawn|4650M: ah, you want to talk to riastradh
21:25 imirkin_: who doesn't appear to be here right now
21:26 imirkin_: i am not aware of the details of his porting efforts to netbsd
21:26 Shawn|4650M: no tutorial?
21:26 imirkin_: generically, you should be able to build nouveau into your kernel (no modules on netbsd, right?)
21:26 imirkin_: and then that should get you basic modesetting
21:26 imirkin_: if that works, you can proceed to the next step which is installing xf86-video-nouveau
21:26 imirkin_: and running X. should work.
21:27 imirkin_: assuming that's the case, you can build mesa and get glxgears running.
21:27 Shawn|4650M: there is this http://pkgsrc.se/x11/xf86-video-nouveau
21:27 imirkin_: yep
21:27 karolherbst: uhhh
21:27 karolherbst: localCSE starts with getEntry...
21:28 imirkin_: karolherbst: dunno why not start with getPhi() tbh
21:28 karolherbst: wouldn't getFirst be okay?
21:28 imirkin_: karolherbst: i don't remember what that does.
21:28 karolherbst: first instruction
21:28 karolherbst: even if it isn't phi
21:29 karolherbst: getPhi just phis ;)
21:29 karolherbst: the pass even does "for (ir = bb->getFirst(); ir; ir = ir->next) ir->serial = serial++;" :D
21:29 imirkin_: ok
21:29 imirkin_: that sounds reasonable.
21:29 imirkin_: i can't think of an obvious reason why you shouldn't be able to CSE phi nodes
21:29 karolherbst: well, let's see what happens with getFirst instead of getEntry
21:29 karolherbst: mhhh
21:30 karolherbst: I think somebody tried to be smart
21:30 karolherbst: and was thinking: well phis won't be mergable with other instructions
21:30 imirkin_: unlikely.
21:30 imirkin_: btw, that GlobalCSE pass is pretty much shit
21:30 karolherbst: well, you can only merge phis with phis
21:30 karolherbst: yay!
21:30 karolherbst: same profit
21:30 karolherbst: nice
21:32 karolherbst: :) that's how I like it: simpe and good enough benefit: https://github.com/karolherbst/mesa/commit/77a9dfa2ae54ec7f396f8427e98765a80ed491a7
21:32 karolherbst: *simple
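[Editor's sketch] The idea discussed above, letting CSE inside a basic block start at the first instruction so that duplicate phis are merged too, can be sketched on a toy IR. Everything here is an assumption made for illustration: instructions are `(dest, op, sources)` tuples, all ops are treated as pure, and `local_cse` is a made-up helper, not the nv50_ir LocalCSE pass.

```python
def local_cse(block):
    """Remove duplicate (op, sources) instructions within one basic block.

    `block` is a list of (dest, op, sources) tuples in a made-up toy IR.
    Returns the surviving instructions and a map of replaced values.
    """
    seen = {}      # (op, sources) -> dest of the first instruction computing it
    replaced = {}  # dropped dest -> surviving dest
    out = []
    for dest, op, srcs in block:
        # Rewrite sources that pointed at an instruction we already dropped.
        srcs = tuple(replaced.get(s, s) for s in srcs)
        key = (op, srcs)
        if key in seen:
            replaced[dest] = seen[key]
        else:
            seen[key] = dest
            out.append((dest, op, srcs))
    return out, replaced


# The duplicated phis from the log, followed by two adds that only become
# identical once the second phi has been merged into the first.
block = [
    ("%r500", "phi", ("%r600", "%r700")),
    ("%r501", "phi", ("%r600", "%r700")),
    ("%r502", "add", ("%r500", "%r1")),
    ("%r503", "add", ("%r501", "%r1")),
]
insns, replaced = local_cse(block)
```

Because the walk starts at the phis rather than after them, merging `%r501` into `%r500` also exposes the second `add` as a duplicate, which is the kind of follow-on win mentioned earlier ("it enables other opts later on").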
21:35 karolherbst: and now or(a, a) :D
21:36 karolherbst: mhh
21:37 karolherbst: after algebraic opt the same or is or(a,b)
21:37 karolherbst: ohh I see
21:37 karolherbst: https://gist.github.com/karolherbst/2b1c6a37e7c9e93211b5f70255e3b467
21:40 karolherbst: mhh
21:40 karolherbst: the second LocalCSE creates this situation
22:00 mupuf: good news! I got a new machine for CI!
22:00 mupuf: a colleague was about to throw it away
22:00 karolherbst: wuhu
22:00 karolherbst: is it fast? :D
22:00 mupuf: it is a very small one
22:00 mupuf: it is a sandy bridge, 4 cores
22:00 karolherbst: decent enough
22:01 mupuf: exactly :D Compared to reator, it should be decent
22:01 karolherbst: I was more thinking about your laptop though
22:01 mupuf: only problem, the box will only fit single-slot GPUs
22:01 karolherbst: ohh
22:01 karolherbst: well
22:01 karolherbst: you have a saw, right?
22:01 hakzsam: mupuf, well, reator is decent :)
22:01 hakzsam: more than my other machine...
22:01 mupuf: hakzsam: not for perf testing with the titan
22:01 mupuf: :D
22:02 hakzsam: right :)
22:02 karolherbst: wut... the heck
22:02 karolherbst: just throwing in a random DCE
22:02 karolherbst: https://gist.github.com/karolherbst/b9388a0c5e08810d5b825f6e1087bb55
22:02 karolherbst: ...
22:02 mupuf: so... I wonder what I can plug in it, aside from the super loud nvc1 (which, btw, I should fix)
22:02 karolherbst: after constantfolding
22:03 karolherbst: "helped local ../nvidia_shaderdb/tomb_raider/2148.shader_test - 1 116 -> 32" well sure
22:03 karolherbst: why not
22:03 karolherbst: ...
22:05 karolherbst: the hell is this shader :O
22:05 karolherbst: there are actually fmas
22:07 karolherbst: jo "1347: mul ftz f32 %r5146 %r5144 158456325028528675187087900672.000000" ...
22:07 karolherbst: this shader is awesome
22:07 karolherbst: 1344: set ftz f32 %r5142 gt %r5132 0.000000 (0)
22:07 karolherbst: 1345: set ftz f32 %r5143 lt %r5132 0.000000 (0)
22:07 karolherbst: 1346: sub ftz f32 %r5144 %r5142 %r5143 (0)
22:07 karolherbst: :D
22:09 karolherbst: I guess this could be opted into a simple slct
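[Editor's sketch] What instructions 1344-1346 compute can be checked with a small model. The assumption, labeled here and in the code, is that an f32 `set` writes 1.0 when the comparison holds and 0.0 otherwise; `ftz` is ignored, and `set_f32` / `sign_via_sets` are made-up names. Under that assumption the three instructions produce sign(x), and the immediate in instruction 1347 is exactly 2^97.

```python
def set_f32(cond):
    """Model an f32 `set`: 1.0 when the comparison holds, else 0.0 (assumed)."""
    return 1.0 if cond else 0.0


def sign_via_sets(x):
    """Mirror instructions 1344-1346: set gt, set lt, then sub."""
    gt = set_f32(x > 0.0)
    lt = set_f32(x < 0.0)
    return gt - lt


# The immediate from instruction 1347; it is a power of two.
SCALE = 158456325028528675187087900672.0
```

In this model a NaN input yields 0.0, since both comparisons fail; whether a single select instruction would preserve that behaviour is not something this sketch settles.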
22:22 imirkin_: Riastradh: someone was in here earlier asking about netbsd support
22:22 imirkin_: Riastradh: is there anything i can point people at?
22:28 karolherbst: imirkin_: what can I do instead of "LValue *val = bld.getScratch();" within LegalizeSSA ?
22:28 karolherbst: getSSA() ?
22:29 karolherbst: mhh, seems right, let me try
22:30 imirkin_: yeah, that's fine
22:30 karolherbst: get a ton of " not uniquely defined" now though... more than normal I think
22:32 karolherbst: mhh "total instructions in shared programs : 2818227 -> 2822911 (0.17%)" :(
22:34 karolherbst: ohhh right
22:34 karolherbst: cause I reuse the SSA value
22:34 karolherbst: ...
22:35 imirkin_: yeah you can't do that.
22:35 karolherbst: yep
22:35 karolherbst: I need three new values
22:35 imirkin_: bld.mkOp2v & co auto-create fresh values
22:35 imirkin_: and return them
22:35 karolherbst: yeah, I know
22:38 hakzsam: mupuf, btw, piglit is running on reator
22:41 karolherbst: still "total instructions in shared programs : 2818227 -> 2822830 (0.16%)" though :(
22:44 karolherbst: ohhh I see it now
22:44 karolherbst: now I keep a mov and lg2
22:44 karolherbst: mov just sets an immediate
22:45 karolherbst: uhm
22:45 karolherbst: imirkin_: do you think it makes sense to check for an immediate value within LegalizeSSA and just fold there?
22:45 imirkin_: why does LoadPropagation not pick it up?
22:46 karolherbst: think again ;)
22:46 imirkin_: i have no idea what you're doing.
22:46 karolherbst: I moved the pow lowering into legalizeSSA
22:46 imirkin_: if you're adding code in LegalizeSSA, yes, you should manually propagate it in there.
22:46 karolherbst: k
22:47 imirkin_: wait
22:47 karolherbst: before I add my pow opt, I want to move the stuff first and make sure I don't do silly things
22:47 imirkin_: why is this in LegalizeSSA?
22:47 imirkin_: why isn't it in ConstantFolding?
22:47 karolherbst: huh?
22:47 imirkin_: i can hit up + enter if you like...
22:47 karolherbst: I move the pow lowering from NVC0LoweringPass to NVC0LegalizeSSA
22:47 imirkin_: ohhhh
22:48 imirkin_: right. gotcha.
22:48 imirkin_: this is the actual pow lowering, not the pow opt you added.
22:48 karolherbst: right
22:48 karolherbst: 1. move pow lowering and make sure not to break
22:48 karolherbst: 2. add opt
22:48 karolherbst: ;)
22:48 imirkin_: sgtm
22:49 hakzsam: karolherbst, fyi about pow lowering, blob starts using the SFU unit with pow(x, 4)
22:49 hakzsam: otherwise, it uses mul
22:49 karolherbst: already with 4?
22:49 hakzsam: yes
22:49 karolherbst: so just -3 to 3?
22:50 karolherbst: or just 1-3?
22:50 hakzsam: 2,3
22:50 karolherbst: right, not that pow 1 makes any sense
22:50 karolherbst: or pow 0
22:50 hakzsam: I didn't check with negative exponents actually
22:50 karolherbst: would be nice to know that too
22:50 hakzsam: but for 2 and 3 it uses mul for sure
22:50 karolherbst: but why not with 4....
22:51 hakzsam: dunno
22:51 hakzsam: but regarding the generated code, it should not use SFU :)
22:51 karolherbst: maybe 2 muls are slower than mul+preex2+ex2?
22:51 karolherbst: would be really odd
22:51 hakzsam: unlikely
22:52 karolherbst: hakzsam: did you only check immediate pows?
22:52 karolherbst: because if you have a random value, you also get a lg2 as well
22:52 hakzsam: yes, only
22:52 karolherbst: mhh
22:52 karolherbst: would be silly to not optimize lg2+mul+preex2+ex2 into 2 or 3 muls actually
22:52 karolherbst: ...
22:52 hakzsam: I mean, pow(a,b), a is a uniform (of course) and b an immediate
22:53 karolherbst: mhhh
22:53 karolherbst: right
22:53 karolherbst: doesn't make sense otherwise
22:53 karolherbst: ...
22:53 hakzsam: yep
22:53 karolherbst: k
22:53 karolherbst: I will do some benchmarks then I guess
22:53 hakzsam: you can write some shader_test files
22:53 karolherbst: no idea which immediates pixmark had
22:54 karolherbst: well, I got some perf improvements in pixmark with this, so I mainly care about perf there
22:54 karolherbst: and if it is still faster with pow 7 => mul, then so be it
22:55 karolherbst: thanks for checking
22:55 hakzsam: not sure if pow(a,7) will be faster with muls though :)
22:55 karolherbst: I am sure it is
22:55 hakzsam: it highly depends on the number of SFU ops you use
22:55 karolherbst: mul+preex2+ex2 ;)
22:55 hakzsam: yeah, but the latency is different
22:55 hakzsam: at least on maxwell
22:55 karolherbst: sure
22:55 karolherbst: but the perf difference was really high
22:56 karolherbst: for lower immediates that is
22:56 hakzsam: yep
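[Editor's sketch] The trade-off discussed above, a multiply chain for small immediate exponents versus the lg2+mul+preex2+ex2 sequence, can be sketched as follows. `lower_pow` and its `max_muls` threshold are hypothetical; the default of 2 is chosen only so that exponents 2 and 3 use multiplies and 4 falls back, mirroring what hakzsam reports for the blob. It is not the Mesa lowering and says nothing about which variant is faster on real hardware.

```python
import math


def lower_pow(x, n, max_muls=2):
    """Return (value, ops) for pow(x, n) with an immediate exponent n.

    Uses a plain multiply chain when that needs at most `max_muls`
    multiplies, otherwise falls back to 2 ** (n * log2(x)).
    The threshold is a hypothetical tuning knob.
    """
    if isinstance(n, int) and n >= 2 and n - 1 <= max_muls:
        value = x
        for _ in range(n - 1):
            value *= x
        return value, ["mul"] * (n - 1)
    # Generic path; only valid for x > 0 in this sketch.
    return 2.0 ** (n * math.log2(x)), ["lg2", "mul", "preex2", "ex2"]


square, square_ops = lower_pow(3.0, 2)
cube, cube_ops = lower_pow(2.0, 3)
fourth, fourth_ops = lower_pow(2.0, 4)
```

The naive chain needs n - 1 multiplies; squaring intermediate results would need fewer for larger exponents (pow(x, 4) in two multiplies), which is one reason the cut-off at 4 looks surprising in the discussion.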
23:05 karolherbst: mhh "total instructions in shared programs : 2818227 -> 2822512 (0.15%)"
23:05 karolherbst: it is getting better
23:06 hakzsam: still + :)
23:07 karolherbst: I am sure that the mov is still there or something
23:08 karolherbst: ...
23:08 karolherbst: " 44: ld u32 $r4 c3[0x8c]"
23:08 karolherbst: all the fun as it seems
23:08 karolherbst: " 45: mul dnz f32 $r2 $r2 c3[0x8c]"
23:09 karolherbst: ...
23:09 karolherbst: this was before
23:11 karolherbst: "109: ld u32 %r77 c3[0x8c]" "110: pow f32 %r78 %r12 %r77"
23:11 karolherbst: mhhh
23:11 karolherbst: tricky situation
23:12 karolherbst: really don't want to do any kind of optimisation within legalizeSSA
23:13 karolherbst: imirkin_: don't you think this will get way too ugly this way?
23:13 imirkin_: pretty easy to check
23:13 imirkin_: insnCanLoad is there for a reason
23:14 imirkin_: i.e. do what LoadPropagation does
23:14 karolherbst: sure, and then there will be 3 or 4 other things as well
23:14 karolherbst: I would have to check if i->getSrc(1) is a ld or something silly
23:14 karolherbst: and then do this
23:14 karolherbst: and then there is other opts I missed there
23:14 karolherbst: *are
23:15 imirkin_: your call.
23:15 karolherbst: and then this is just for nvc0
23:16 imirkin_: you could create a generic pass
23:16 imirkin_: which is run at the end
23:17 karolherbst: you mean instead of lower in the _lowering_ files, just have a lower pass?
23:17 karolherbst: *lowering
23:17 karolherbst: for pow I mean
23:17 karolherbst: not generally
23:18 imirkin_: ya
23:18 karolherbst: the point is, I really don't want to do any kind of optimizations in this pow lowering as well, cause there will be something I will still miss
23:18 karolherbst: or won't do as good as the passes
23:19 karolherbst: having pow within the SSA stage has its benefits though, because we could add a bunch of algebraic opts
23:19 karolherbst: maybe
23:19 imirkin_: well, really you just want something after ConstantFolding
23:19 imirkin_: but before LoadPropagation
23:20 imirkin_: i think hakzsam is adding such a pass
23:20 imirkin_: for his shl-add thing
23:20 karolherbst: ohhh, is he
23:20 karolherbst: I see
23:20 karolherbst: then I wait until he finished his stuff
23:20 karolherbst: and do this postraconstfolding for mads first :)
23:20 hakzsam: it's actually done, but I'm running piglit before
23:21 karolherbst: okay :)
23:21 karolherbst: well
23:21 karolherbst: the post ra constant folding thing for mads is also pretty impressive
23:21 imirkin_: that one has to be post-ra though
23:21 karolherbst: sure
23:21 karolherbst: still impressive
23:21 karolherbst: "total instructions in shared programs : 2573359 -> 2567629 (-0.22%)"
23:22 karolherbst: + "total instructions in shared programs : 2567629 -> 2567241 (-0.02%)"
23:22 karolherbst: + "total instructions in shared programs : 2567241 -> 2564977 (-0.09%)"
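[Editor's sketch] The percentages in these shader-db lines are relative changes in the total instruction count. The arithmetic can be reproduced as below; `pct_change` is a made-up helper, and the numbers are copied from the three lines above, read as consecutive steps because each one starts where the previous one ended.

```python
def pct_change(before, after):
    """Relative change in instruction count, in percent."""
    return (after - before) / before * 100.0


# (before, after) totals from the three consecutive results in the log.
steps = [(2573359, 2567629), (2567629, 2567241), (2567241, 2564977)]
per_step = [round(pct_change(b, a), 2) for b, a in steps]

# Combined effect of all three steps, from the first total to the last.
overall = round(pct_change(steps[0][0], steps[-1][1]), 2)
print(per_step, overall)
```

Taken together the three steps remove 8382 instructions, roughly a third of a percent of the starting total.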
23:25 karolherbst: going to bed now anyway :)
23:27 karolherbst: I sent out the LocalCSE "fix", in case you didn't notice ;)
23:27 imirkin_: i did...
23:27 karolherbst: k
23:31 imirkin_: jason seems to think it's ok. good enough for me.
23:32 * hakzsam wonders if piglit will think it's ok as well :)
23:32 imirkin_: bbl
23:33 hakzsam: imirkin_, btw, the post ra constant folding opt is cool for nvc0, but unfortunately this is not going to reduce the number of GPRs, and in a nice world we could do something much better
23:50 imirkin: hakzsam: post-ra folding for MAD isn't constant folding. and it can't reduce GPRs, but it can compress the instruction stream a bit.