01:49danofsatx: anybody seen this one yet? nouveau 0000:01:00.0: fifo: read fault at 003e403000 engine 00 [PGRAPH] client 08 [GPC2/] reason 02 [PAGE_NOT_PRESENT] on channel 21 [005e728000 factorio]
07:29orbea: nice, reclocking isn't pushing my system as hard as it used to, it seems :)
08:43hermier: karolherbst: still there ?
08:47karolherbst: hermier: yeah
08:48hermier: karolherbst: I have maybe the dumbest idea ever: did you think about whether using https://en.wikipedia.org/wiki/Duff%27s_device could help you generalise pow somehow?
08:53hermier: karolherbst: in case you don't know about it, it is used for implementing memcpy, but I think it can be reused for pow(float, int)
08:53karolherbst: I highly doubt that
08:53karolherbst: cause pow can take any float value
08:54hermier: yes but you could use it for int values (if that does help)
08:56hermier: anyway as I said, it may be dumb ^^
09:48mwk: duff's device is a horrible idea for GPUs...
09:54karolherbst: I thought the idea was to trick the compiler to generate better code anyway
09:55karolherbst: and isn't it pretty much useless these days anyway?
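hermier's suggestion, sketched for illustration only: Duff's device interleaves a `switch` with a `do`/`while` so a loop body can be unrolled (here by four) while still handling counts that are not a multiple of four. Applied to pow with a non-negative integer exponent, it still performs one multiply per unit of exponent and cannot handle arbitrary float exponents, which is karolherbst's objection. The function name `pow_duff` is hypothetical.

```cpp
#include <cassert>

// Hypothetical sketch: x^n for a non-negative integer n, using a
// Duff's-device-style loop unrolled by four. Performs exactly n multiplies.
float pow_duff(float x, unsigned n) {
    float r = 1.0f;
    if (n == 0) return r;               // x^0 == 1
    unsigned passes = (n + 3) / 4;      // number of do/while iterations
    switch (n % 4) {                    // jump into the middle of the body
    case 0: do { r *= x;
    case 3:      r *= x;
    case 2:      r *= x;
    case 1:      r *= x;
            } while (--passes > 0);
    }
    return r;
}
```

For example, `pow_duff(2.0f, 10)` multiplies ten times and returns 1024.0f.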
15:17Tom^: karolherbst: have you tried shadow of mordor on linux?
15:20karolherbst: Tom^: I think pmoreau did
15:20karolherbst: threading issues with nouveau though
15:20Tom^: was thinking on how well it runs, and mostly on blob :P
15:22karolherbst: no clue
15:23Tom^: was better back in the days when there were demos :(
15:32karolherbst: Tom^: well, today we have DLC! practically the same, cause for the full game you pay 60€ more!
16:06pmoreau: karolherbst: I don’t remember ever running Shadow of Mordor, be it on Linux, macOS or Windows
17:10karolherbst: pmoreau: ohh then it was hakzsam
17:18jrun: nouveau flooding the dmesg
17:18jrun: kernel: nouveau 0000:01:00.0: DRM: skipped size 0
17:19imirkin_: you have a buggy client of some sort
17:19imirkin_: that's trying to allocate 0-sized BOs (buffer objects)
19:20karolherbst: what was PTE again? "fifo: write fault at 000024c000 engine 00 [GR] client 02 [GPC0/PE_0] reason 02 [PTE] on channel 2"
19:20imirkin_: page table entry
19:21karolherbst: so basically memory screwed up
19:22karolherbst: skeggsb: could you tell me which bits/bytes in the vbios aren't handled yet regarding memory reclocking? Just so that I know if I ever encounter those
19:47imirkin_: karolherbst: PE_0 (whatever that is... some engine, but not one i immediately recognize) tried to write to that virtual memory address, and there was no PTE backing that VM address.
19:47karolherbst: I see
19:47karolherbst: I just got a report about a GM108 reclocking with current master and this error
19:48imirkin_: memory is not screwed up... just missing a PTE. unlikely to be related to reclocking.
19:48karolherbst: it works on default clocks though
19:48karolherbst: there was also "fifo: write fault at 00001c2000 engine 05 [BAR2] client 08 [HOST_CPU_NB] reason 02 [PTE] on channel -1 [00ffbf6000 unknown]"
19:48imirkin_: but of course PTE's live in memory, so *if* memory is screwed up, PTEs would be as well.
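imirkin_'s explanation can be illustrated with a toy translation routine: a GPU virtual address is split into a page number and an offset, the page number is looked up in the page table, and if no PTE exists the access faults instead of reaching memory. This sketch assumes a single-level table with 4 KiB pages; it is not nouveau's or NVIDIA's actual multi-level layout, and `ToyPageTable` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy model only: single-level page table, 4 KiB pages.
constexpr uint64_t kPageShift = 12;
constexpr uint64_t kPageMask = (1ull << kPageShift) - 1;

struct ToyPageTable {
    std::unordered_map<uint64_t, uint64_t> pte;  // virtual page -> physical page

    void map(uint64_t vaddr, uint64_t paddr) {
        pte[vaddr >> kPageShift] = paddr >> kPageShift;
    }

    // Returns the physical address, or std::nullopt to signal a fault.
    std::optional<uint64_t> translate(uint64_t vaddr) const {
        auto it = pte.find(vaddr >> kPageShift);
        if (it == pte.end()) return std::nullopt;  // no PTE backs this address
        return (it->second << kPageShift) | (vaddr & kPageMask);
    }
};
```

With only 0x1000 mapped, translating 0x24c000 (the address in the fault karolherbst quotes) returns `std::nullopt`, the toy equivalent of a `[PTE]` fault. If the memory holding the table itself were corrupted, lookups could fail the same way, which is imirkin_'s caveat.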
19:48karolherbst: so yeah
19:49karolherbst: I guess it has one of those bits/bytes set we don't handle yet
19:49karolherbst: skeggsb said he has a few cards with reclocking issues
19:49karolherbst: sadly all of mupuf's GPUs reclock just fine :/
19:50imirkin_: yes, very sad.
19:50imirkin_: but kinda happy too :)
19:50karolherbst: I asked skeggsb (not enough though) to bring those to XDC...
19:50karolherbst: but somehow he didn't see it and I forgot to ask again
20:10danofsatx: anybody seen this one yet? nouveau 0000:01:00.0: fifo: read fault at 003e403000 engine 00 [PGRAPH] client 08 [GPC2/] reason 02 [PAGE_NOT_PRESENT] on channel 21 [005e728000 factorio]
20:11danofsatx: it's a GeForce GTX 580
20:14danofsatx: That error pops up in the journal (kernel 4.7.5-200.fc24.x86_64) when I try anything with advanced graphics and X freezes. I have to log in from another system to reboot the thing - I can't even restart X or the DM to recover as I was able to do with previous lockups
20:18imirkin_: danofsatx: are you using KDE?
20:19imirkin_: and also, what application is that? factorio? (it's clearly cut off)
20:19danofsatx: yes. I'm also seeing quite a few errors like this: https://paste.fedoraproject.org/445195/85092147/
20:19danofsatx: factorio is a Steam game
20:20imirkin_: ok, so i guess nouveau hates that steam game?
20:20imirkin_: note that a ton of people have had a ton of issues with KDE things
20:20danofsatx: well, it didn't until 4.7.4 - up until 4.7.3 it ran just fine.
20:20imirkin_: without attempting to lay blame on the KDE folk (chances are it's not their fault), i'd recommend avoiding nouveau + kde.
20:21imirkin_: are you sure it's the kernel version that changed?
20:21imirkin_: and not the kde version?
20:21danofsatx: I was afraid you were going to say something like that... I'd really rather avoid the NVIDIA blob, and I greatly dislike the GNOME interface
20:22imirkin_: esp QtWebEngine has big issues with nouveau
20:22imirkin_: and afaik they've been integrating it into more and more things
20:22danofsatx: Let me check that update log, hang on a sec....
20:22imirkin_: well, you could try removing nouveau_dri.so from your system, or running with LIBGL_ALWAYS_SOFTWARE=1 to avoid some level of issues.
20:23imirkin_: fwiw i dislike what both gnome and kde have turned into.
20:23imirkin_: i personally use WindowMaker, and have been for over 15 years.
20:24imirkin_: if you're looking for hw that's well-supported by open source, i might recommend looking at AMD or Intel.
20:26karolherbst: pmoreau, hakzsam: do you also get heap corruptions in shader-db?
20:26danofsatx: ok, the last major update to KDE before the kernel update was plasma-5.7.5, kf5-16.08, and kde-apps-16.08
20:26karolherbst: just wanted to make sure it isn't my stuff :D
20:26hakzsam: reverting the last commit from marek fixes the issue btw
20:26karolherbst: on shader-db?
20:27karolherbst: guess some concurrency thing
20:27karolherbst: ohh, just bad luck then
20:27hakzsam: it also happens with -1
20:27danofsatx: those didn't really change much performance wise, but on my first reboot after application of the 4.7.4 kernel update, I could no longer play any Steam games without the system freezing.
20:27karolherbst: one opt of mine: https://gist.github.com/karolherbst/34c9b9b4a2465e830ad62ec9ba2fd37b :(
20:28karolherbst: what a waste of time :D
20:28danofsatx: Intel doesn't make add-in cards (that I can find)
20:28hakzsam: karolherbst, hehe
20:28karolherbst: the benefit is bigger if we run passes multiple times though, not quite sure
20:28pmoreau: I have been able to run Factorio + KDE on this laptop, but might have been before 4.7.4, I’ll have to retest.
20:28karolherbst: hakzsam: it is something like that: "optimize set,and,cvt to set"
20:29karolherbst: and the and+cvt just make a boolean value
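A minimal sketch of why a set, and, cvt sequence can collapse into a single set. It assumes, as a simplification not stated in the log, that an integer-typed set yields 0 or 0xFFFFFFFF, the and keeps only the low bit, the cvt produces 0.0f or 1.0f, and a float-typed set yields 0.0f or 1.0f directly. The function names are hypothetical and this is not karolherbst's patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics of the two kinds of "set" (comparison: a < b).
uint32_t set_u32_lt(float a, float b) { return a < b ? 0xFFFFFFFFu : 0u; }
float set_f32_lt(float a, float b) { return a < b ? 1.0f : 0.0f; }

// Original three-instruction sequence: set, and, cvt.
float set_and_cvt(float a, float b) {
    uint32_t s = set_u32_lt(a, b);     // 0 or 0xFFFFFFFF
    uint32_t bit = s & 1u;             // and: boolean as 0 or 1
    return static_cast<float>(bit);    // cvt u32 -> f32: 0.0f or 1.0f
}

// Folded form: one float-typed set produces the same value.
float folded(float a, float b) { return set_f32_lt(a, b); }
```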
20:29danofsatx: pmoreau: KDE Plasma 5?
20:29hakzsam: karolherbst, okay, but unfortunately this doesn't help much
20:29karolherbst: I am sure it helps more on more optimized stuff though
20:29pmoreau: karolherbst: I have never played with shader-db yet, will have to some day.
20:30karolherbst: hakzsam: anyway, I won't push out that patch, cause it is too big for that little benefit: https://github.com/karolherbst/mesa/commit/deb634900742ba18ade196bd1fdb118e1c195d44
20:30imirkin_: danofsatx: yeah, but all their non-server CPUs come with moderately capable GPU functionality
20:30pmoreau: danofsatx: I can’t test right now, but I’ll try over the week-end.
20:30hakzsam: karolherbst, okay
20:30karolherbst: I am sure this patch helped a lot more
20:30karolherbst: but I don't know the dependency yet
20:32danofsatx: my current guess is that the GPU itself is simply on the way out. This system was "free" - it served as payment for the labor of installing a new system for a customer.
20:33imirkin_: danofsatx: there were also a bunch of issues a while ago with GF110, but i believe that those were resolved some time ago
20:44mooch: hey, does anybody here know anything about the differences between nv3 and nv4 dma pusher commands?
20:45mooch: i have nv4's pusher commands emulated just fine, but i wanted to get something working for nv3 emulation too
20:46karolherbst: hakzsam: my sel folding pass: https://gist.github.com/karolherbst/9f80823550c38d7b34b24d5ca1cd67fd
20:47karolherbst: it is a monster though
20:47hakzsam: much better
20:47hakzsam: especially for spilling
20:47karolherbst: https://github.com/karolherbst/mesa/commit/404e434b4cfd0b224aae5d67e862c986b7a679a8 ;)
20:47imirkin_: in that it spills more? :)
20:47karolherbst: it isn't even complete
20:48karolherbst: imirkin_: actually the shader which spills more deserves it
20:48karolherbst: is the one
20:52mooch: i'll have to take a look at that one
20:52mwk: I now test everything about the SOLID classes *except* drawing actual shit on screen :)
20:53karolherbst: I am sure there is a much simpler way to do that SEL folding thing, but ... :/
20:53mwk: I mean, submitting vertices
20:53mwk: you'd think it'd be simple, but no
20:53karolherbst: it is just pretty annoying to write, cause you can screw up everything at any place
20:56mwk:wonders how hard it'll be to test the rasterization algorithm
20:56mwk: I suppose I'll need a good understanding of the canvas first
20:58karolherbst: that's how I like it: https://gist.github.com/karolherbst/718b5bba5c14954daa5af35a43657bb1 :)
20:59hakzsam: which opt?
20:59karolherbst: smarter CSE thing
21:00karolherbst: merges things like add(neg(a), b) and add(a, neg(b)) together
21:00karolherbst: stuff like that
21:00karolherbst: no idea if this is the right way of doing it though
21:00karolherbst: maybe I should just merge it together with the LocalCSE thing
21:00karolherbst: or something
21:00karolherbst: no idea though
21:01karolherbst: but maybe it makes sense to separate those things
21:11karolherbst: hakzsam: funny though that the only benefit now comes from the PHI merging...
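The smarter CSE idea boils down to giving equivalent expressions the same lookup key before comparing them. The sketch below shows only the simplest case, putting the operands of commutative ops into a canonical order so `add(x, y)` and `add(y, x)` are recognized as the same value; it does not model the neg source modifiers karolherbst mentions. The toy IR and `localCSE` are hypothetical, not nv50_ir's LocalCSE.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Toy IR, not nv50_ir: each instruction defines value `dst` as op(a, b).
struct Insn {
    std::string op;
    int dst, a, b;
};

bool isCommutative(const std::string &op) { return op == "add" || op == "mul"; }

// Local CSE over one basic block. Returns a map from each redundant dst
// to the earlier value that can replace it.
std::map<int, int> localCSE(std::vector<Insn> &bb) {
    std::map<std::tuple<std::string, int, int>, int> seen;  // key -> first dst
    std::map<int, int> replaced;
    auto resolve = [&](int v) {
        auto it = replaced.find(v);
        return it == replaced.end() ? v : it->second;
    };
    for (Insn &i : bb) {
        i.a = resolve(i.a);  // rewrite uses of already-eliminated values
        i.b = resolve(i.b);
        int x = i.a, y = i.b;
        if (isCommutative(i.op) && y < x) std::swap(x, y);  // canonical order
        auto key = std::make_tuple(i.op, x, y);
        auto it = seen.find(key);
        if (it != seen.end())
            replaced[i.dst] = it->second;
        else
            seen.emplace(key, i.dst);
    }
    return replaced;
}
```

In a block with `add %10 %1 %2` and `add %11 %2 %1`, `%11` maps to `%10`, while `sub %1 %2` and `sub %2 %1` stay distinct because sub is not commutative.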
21:15karolherbst: sometimes we generate phis like "phi %r500 %r600 %r700; phi %r501 %r600 %r700"
21:15karolherbst: no idea why
21:16karolherbst: maybe I should teach the normal CSE passes to handle this
21:16karolherbst: cause they should
21:20karolherbst: hakzsam: one shader without that pass: https://gist.github.com/karolherbst/8a322ff107c5e734f17ca4b2bc8af7cf
21:20karolherbst: those phis...
21:21hakzsam: ah yeah I see
21:21karolherbst: it enables other opts later on due to this
21:21karolherbst: this shader drops from 44 to 36 instructions
21:22Shawn|4650M: howdy
21:22hakzsam: karolherbst, nice opt
21:22karolherbst: hakzsam: " 28: or u32 $r0 $r0 $r0" :D
21:23Shawn|4650M: I hear the nouveau driver is working better in current, is it ready to run on a p4 with an old nvidia geforce 7600gs?
21:23imirkin_: define 'ready'?
21:23Shawn|4650M: usable?
21:24imirkin_: it's all in the eye of the beholder
21:24imirkin_: easy enough to give it a shot and see if it's satisfactory for your needs
21:24karolherbst: imirkin_: do you think this phi merging belongs in GlobalCSE or LocalCSE? I don't exactly know the difference, but I would assume both should be able to detect this?
21:24Shawn|4650M: ah, is there a tutorial somewhere?
21:24imirkin_: karolherbst: LocalCSE = CSE within a BB
21:25karolherbst: yeah, makes sense
21:25karolherbst: I have no idea why phis aren't handled there though
21:25imirkin_: Shawn|4650M: many distros provide something, i think?
21:25imirkin_: Shawn|4650M: chances are it's going to be distro-specific
21:25Shawn|4650M: for netbsd?
21:25imirkin_: Shawn|4650M: ah, you want to talk to riastradh
21:25imirkin_: who doesn't appear to be here right now
21:26imirkin_: i am not aware of the details of his porting efforts to netbsd
21:26Shawn|4650M: no tutorial?
21:26imirkin_: generically, you should be able to build nouveau into your kernel (no modules on netbsd, right?)
21:26imirkin_: and then that should get you basic modesetting
21:26imirkin_: if that works, you can proceed to the next step which is installing xf86-video-nouveau
21:26imirkin_: and running X. should work.
21:27imirkin_: assuming that's the case, you can build mesa and get glxgears running.
21:27Shawn|4650M: there is this http://pkgsrc.se/x11/xf86-video-nouveau
21:27karolherbst: localCSE starts with getEntry...
21:28imirkin_: karolherbst: dunno why not start with getPhi() tbh
21:28karolherbst: wouldn't getFirst be okay?
21:28imirkin_: karolherbst: i don't remember what that does.
21:28karolherbst: first instruction
21:28karolherbst: even if it isn't phi
21:29karolherbst: getPhi just phis ;)
21:29karolherbst: the pass even does "for (ir = bb->getFirst(); ir; ir = ir->next) ir->serial = serial++;" :D
21:29imirkin_: that sounds reasonable.
21:29imirkin_: i can't think of an obvious reason why you shouldn't be able to CSE phi nodes
21:29karolherbst: well, let's see what happens with getFirst instead of getEntry
21:30karolherbst: I think somebody tried to be smart
21:30karolherbst: and was thinking: well phis won't be mergable with other instructions
21:30imirkin_: btw, that GlobalCSE pass is pretty much shit
21:30karolherbst: well, you can only merge phis with phis
21:30karolherbst: same profit
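Phi CSE in a minimal sketch: two phis in the same block are redundant when their incoming values match predecessor by predecessor, as in `phi %r500 %r600 %r700` and `phi %r501 %r600 %r700`. A phi can only be compared with other phis, and a pass that starts after the phis (as LocalCSE did by starting at getEntry, per the discussion) never sees them at all. The `Phi` struct and `csePhis` are hypothetical, not nv50_ir code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy phi node: dst = phi(srcs[0], srcs[1], ...), where srcs[k] is the
// value flowing in from the k-th predecessor.
struct Phi {
    int dst;
    std::vector<int> srcs;
};

// Merge phis of one block whose incoming values match in predecessor order.
// Returns a map from each redundant dst to the dst that is kept.
std::map<int, int> csePhis(const std::vector<Phi> &phis) {
    std::map<std::vector<int>, int> seen;  // source list -> first dst
    std::map<int, int> replaced;
    for (const Phi &p : phis) {
        auto [it, inserted] = seen.emplace(p.srcs, p.dst);
        if (!inserted) replaced[p.dst] = it->second;
    }
    return replaced;
}
```

For the two phis from the log, `csePhis` maps `501` to `500`, so uses of `%r501` can be rewritten to `%r500`, which in turn can enable further folding downstream.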
21:32karolherbst: :) that's how I like it: simple and good enough benefit: https://github.com/karolherbst/mesa/commit/77a9dfa2ae54ec7f396f8427e98765a80ed491a7
21:35karolherbst: and now or(a, a) :D
21:37karolherbst: after algebraic opt the same or is or(a,b)
21:37karolherbst: ohh I see
21:40karolherbst: the second LocalCSE creates this situation
22:00mupuf: good news! I got a new machine for CI!
22:00mupuf: a colleague was about to throw it away
22:00karolherbst: is it fast? :D
22:00mupuf: it is a very small one
22:00mupuf: it is a sandy bridge, 4 cores
22:00karolherbst: decent enough
22:01mupuf: exactly :D Compared to reator, it should be decent
22:01karolherbst: I was more thinking about your laptop though
22:01mupuf: only problem, the box will only fit single-slot GPUs
22:01karolherbst: you have a saw, right?
22:01hakzsam: mupuf, well, reator is decent :)
22:01hakzsam: more than my other machine...
22:01mupuf: hakzsam: not for perf testing with the titan
22:02hakzsam: right :)
22:02karolherbst: wut... the heck
22:02karolherbst: just throwing in a random DCE
22:02mupuf: so... I wonder what I can plug in it, aside from the super loud nvc1 (which, btw, I should fix)
22:02karolherbst: after constantfolding
22:03karolherbst: "helped local ../nvidia_shaderdb/tomb_raider/2148.shader_test - 1 116 -> 32" well sure
22:03karolherbst: why not
22:05karolherbst: the hell is this shader :O
22:05karolherbst: there are actually fmas
22:07karolherbst: jo "1347: mul ftz f32 %r5146 %r5144 158456325028528675187087900672.000000" ...
22:07karolherbst: this shader is awesome
22:07karolherbst: 1344: set ftz f32 %r5142 gt %r5132 0.000000 (0)
22:07karolherbst: 1345: set ftz f32 %r5143 lt %r5132 0.000000 (0)
22:07karolherbst: 1346: sub ftz f32 %r5144 %r5142 %r5143 (0)
22:09karolherbst: I guess this could be opted into a simple slct
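Instructions 1344 to 1346 compute sign(x) as (x > 0) - (x < 0) using two float sets and a sub. The sketch below checks that against a select-based form. karolherbst does not say exactly which slct form he had in mind, and keeping the zero case here takes two nested selects, so treat `signViaSelect` as a hypothetical rewrite rather than the actual optimization.

```cpp
#include <cassert>

// Models 1344-1346: float sets yield 1.0f or 0.0f, then a sub.
float signViaSets(float x) {
    float gt = x > 0.0f ? 1.0f : 0.0f;  // set gt x 0.0
    float lt = x < 0.0f ? 1.0f : 0.0f;  // set lt x 0.0
    return gt - lt;                      // sub
}

// Hypothetical select-based equivalent: no subtraction needed.
float signViaSelect(float x) {
    return x > 0.0f ? 1.0f : (x < 0.0f ? -1.0f : 0.0f);
}
```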
22:22imirkin_: Riastradh: someone was in here earlier asking about netbsd support
22:22imirkin_: Riastradh: is there anything i can point people at?
22:28karolherbst: imirkin_: what can I do instead of "LValue *val = bld.getScratch();" within LegalizeSSA ?
22:28karolherbst: getSSA() ?
22:29karolherbst: mhh, seems right, let me try
22:30imirkin_: yeah, that's fine
22:30karolherbst: I get a ton of "not uniquely defined" now though... more than normal I think
22:32karolherbst: mhh "total instructions in shared programs : 2818227 -> 2822911 (0.17%)" :(
22:34karolherbst: ohhh right
22:34karolherbst: cause I reuse the SSA value
22:35imirkin_: yeah you can't do that.
22:35karolherbst: I need three new values
22:35imirkin_: bld.mkOp2v & co auto-create fresh values
22:35imirkin_: and return them
22:35karolherbst: yeah, I know
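imirkin_'s point is that in SSA form every value has exactly one definition, so a lowering that writes three results needs three fresh values rather than reusing an existing one. A toy builder makes this concrete: `emit` always allocates a new value, and `isSSA` detects a value defined twice. This is hypothetical and much simpler than nv50_ir's BuildUtil; `emit` is not a real nv50_ir method.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy SSA builder (hypothetical).
struct ToyBuilder {
    struct Insn {
        std::string op;
        int dst, a, b;
    };
    std::vector<Insn> insns;
    int nextValue = 0;

    // Allocate a brand-new SSA value id.
    int newValue() { return nextValue++; }

    // Emit dst = op(a, b) into a fresh value and return it.
    int emit(const std::string &op, int a, int b) {
        int dst = newValue();
        insns.push_back({op, dst, a, b});
        return dst;
    }

    // SSA property: every value has exactly one defining instruction.
    bool isSSA() const {
        std::set<int> defs;
        for (const Insn &i : insns)
            if (!defs.insert(i.dst).second) return false;
        return true;
    }
};
```

Appending a second instruction that writes an already-defined value, which is what reusing the SSA value amounts to, makes `isSSA()` return false.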
22:38hakzsam: mupuf, btw, piglit is running on reator
22:41karolherbst: still "total instructions in shared programs : 2818227 -> 2822830 (0.16%)" though :(
22:44karolherbst: ohhh I see it now
22:44karolherbst: now I keep a mov and lg2
22:44karolherbst: mov just sets an immediate
22:45karolherbst: imirkin_: do you think it makes sense to check for an immediate value within LegalizeSSA and just fold there?
22:45imirkin_: why does LoadPropagation not pick it up?
22:46karolherbst: think again ;)
22:46imirkin_: i have no idea what you're doing.
22:46karolherbst: I moved the pow lowering into legalizeSSA
22:46imirkin_: if you're adding code in LegalizeSSA, yes, you should manually propagate it in there.
22:47karolherbst: before I add my pow opt, I want to move the stuff first and make sure I don't do silly things
22:47imirkin_: why is this in LegalizeSSA?
22:47imirkin_: why isn't it in ConstantFolding?
22:47imirkin_: i can hit up + enter if you like...
22:47karolherbst: I move the pow lowering from NVC0LoweringPass to NVC0LegalizeSSA
22:48imirkin_: right. gotcha.
22:48imirkin_: this is the actual pow lowering, not the pow opt you added.
22:48karolherbst: 1. move pow lowering and make sure not to break
22:48karolherbst: 2. add opt
22:49hakzsam: karolherbst, fyi about pow lowering, blob starts using the SFU unit with pow(x, 4)
22:49hakzsam: otherwise, it uses mul
22:49karolherbst: already with 4?
22:49karolherbst: so just -3 to 3?
22:50karolherbst: or just 1-3?
22:50karolherbst: right, not that pow 1 makes any sense
22:50karolherbst: or pow 0
22:50hakzsam: I didn't check with negative exponents actually
22:50karolherbst: would be nice to know that too
22:50hakzsam: but for 2 and 3 it uses mul for sure
22:50karolherbst: but why not with 4....
22:51hakzsam: but regarding the generated code, it should not use SFU :)
22:51karolherbst: maybe 2 muls are slower than mul+preex2+ex2?
22:51karolherbst: would be really odd
22:52karolherbst: hakzsam: did you only check immediate pows?
22:52karolherbst: because if you have a random value, you also get a lg2 as well
22:52hakzsam: yes, only
22:52karolherbst: would be silly to not optimize lg2+mul+preex2+ex2 into 2 or 3 muls actually
22:52hakzsam: I mean, pow(a,b), a is an uniform (of course) and b an immediate
22:53karolherbst: doesn't make sense otherwise
22:53karolherbst: I will do some benchmarks then I guess
22:53hakzsam: you can write some shader_test files
22:53karolherbst: no idea which immediates pixmark had
22:54karolherbst: well, I got some perf improvements in pixmark with this, so I mainly care about perf there
22:54karolherbst: and if it is still faster with pow 7 => mul, then so be it
22:55karolherbst: thanks for checking
22:55hakzsam: not sure if pow(a,7) will be faster with muls though :)
22:55karolherbst: I am sure it is
22:55hakzsam: it highly depends on the number of SFU ops you use
22:55karolherbst: mul+preex2+ex2 ;)
22:55hakzsam: yeah, but the latency is different
22:55hakzsam: at least on maxwell
22:55karolherbst: but the perf difference was really high
22:56karolherbst: for lower immediates that is
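The two lowerings being compared, sketched under stated assumptions: the SFU path computes pow(x, y) as exp2(y * log2(x)) (the lg2, mul, preex2, ex2 sequence, valid for x > 0; preex2 is a hardware range-reduction step with no counterpart here), while the mul path handles a small non-negative integer immediate by exponentiation by squaring. The `muls` counter shows why pow(a, 4) needs only two multiplies and pow(a, 7) needs four. Which path is faster on a given GPU is the benchmarking question from the log and is not answered by this code; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// SFU-style path: pow(x, y) = exp2(y * log2(x)), valid for x > 0.
float powViaExp2(float x, float y) {
    return std::exp2(y * std::log2(x));
}

// Mul path for a non-negative integer immediate, by exponentiation by
// squaring. `muls` receives the number of multiplies that would be emitted.
float powViaMuls(float x, unsigned n, int &muls) {
    muls = 0;
    float result = 1.0f;
    bool haveResult = false;  // avoid a useless 1.0f * base multiply
    float base = x;
    while (n > 0) {
        if (n & 1u) {
            if (haveResult) { result *= base; ++muls; }
            else { result = base; haveResult = true; }
        }
        n >>= 1;
        if (n > 0) { base *= base; ++muls; }  // square for the next bit
    }
    return result;
}
```

`powViaMuls(2.0f, 7, m)` returns 128.0f with `m == 4` (computing x^2, x^3, x^4, x^7), and pow(x, 2) and pow(x, 3) need one and two multiplies respectively.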
23:05karolherbst: mhh "total instructions in shared programs : 2818227 -> 2822512 (0.15%)"
23:05karolherbst: it is getting better
23:06hakzsam: still + :)
23:07karolherbst: I am sure that the mov is still there or something
23:08karolherbst: " 44: ld u32 $r4 c3[0x8c]"
23:08karolherbst: all the fun as it seems
23:08karolherbst: " 45: mul dnz f32 $r2 $r2 c3[0x8c]"
23:09karolherbst: this was before
23:11karolherbst: "109: ld u32 %r77 c3[0x8c]" "110: pow f32 %r78 %r12 %r77"
23:11karolherbst: tricky situation
23:12karolherbst: really don't want to do any kind of optimisation within legalizeSSA
23:13karolherbst: imirkin_: don't you think this will get way too ugly this way?
23:13imirkin_: pretty easy to check
23:13imirkin_: insnCanLoad is there for a reason
23:14imirkin_: i.e. do what LoadPropagation does
23:14karolherbst: sure, and then there will be 3 or 4 other things as well
23:14karolherbst: I would have to check if i->getSrc(1) is a ld or something silly
23:14karolherbst: and then do this
23:14karolherbst: and then there is other opts I missed there
23:15imirkin_: your call.
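What imirkin_ suggests (doing what LoadPropagation does, guarded by insnCanLoad) can be sketched on a toy IR: when a source was produced by a constant-buffer load and the consuming instruction may read c[] in that slot, the source is replaced by the c[] reference itself, as in the earlier `mul dnz f32 $r2 $r2 c3[0x8c]`. `canLoad` is a hypothetical stand-in for the target's insnCanLoad; which ops and slots it accepts is an assumption, chosen so that `pow` is left alone. The now-unused `ld` would be removed by a later dead-code pass.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy IR (not nv50_ir): an operand is an SSA value or a c<bank>[offset] ref.
struct Operand {
    bool isConst;
    int value;         // SSA id when !isConst
    int bank, offset;  // constant-buffer location when isConst
};

struct Insn {
    std::string op;
    int dst;
    std::vector<Operand> srcs;
};

// Hypothetical insnCanLoad stand-in: only source 1 of mul/add may read c[].
bool canLoad(const Insn &i, std::size_t s) {
    return (i.op == "mul" || i.op == "add") && s == 1;
}

// Replace uses of "ld cB[off]" results with cB[off] where the user allows it.
void propagateLoads(std::vector<Insn> &bb) {
    std::map<int, Operand> loads;  // dst of an ld -> the c[] operand it read
    for (Insn &i : bb) {
        for (std::size_t s = 0; s < i.srcs.size(); ++s) {
            Operand &o = i.srcs[s];
            if (o.isConst) continue;
            auto it = loads.find(o.value);
            if (it != loads.end() && canLoad(i, s)) o = it->second;
        }
        if (i.op == "ld" && i.srcs.size() == 1 && i.srcs[0].isConst)
            loads[i.dst] = i.srcs[0];
    }
}
```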
23:15karolherbst: and then this is just for nvc0
23:16imirkin_: you could create a generic pass
23:16imirkin_: which is run at the end
23:17karolherbst: you mean instead of lower in the _lowering_ files, just have a lower pass?
23:17karolherbst: for pow I mean
23:17karolherbst: not generally
23:18karolherbst: the point is, I really don't want to do any kind of optimizations in this pow lowering as well, cause there will be something I will still miss
23:18karolherbst: or won't do as good as the passes
23:19karolherbst: having pow within the SSA stage has its benefits though, because we could add a bunch of algebraic opts
23:19imirkin_: well, really you just want something after ConstantFolding
23:19imirkin_: but before LoadPropagation
23:20imirkin_: i think hakzsam is adding such a pass
23:20imirkin_: for his shl-add thing
23:20karolherbst: ohhh, is he
23:20karolherbst: I see
23:20karolherbst: then I wait until he finished his stuff
23:20karolherbst: and do this postraconstfolding for mads first :)
23:20hakzsam: it's actually done, but I'm running piglit first
23:21karolherbst: okay :)
23:21karolherbst: the post ra constant folding thing for mads is also pretty impressive
23:21imirkin_: that one has to be post-ra though
23:21karolherbst: still impressive
23:21karolherbst: "total instructions in shared programs : 2573359 -> 2567629 (-0.22%)"
23:22karolherbst: + "total instructions in shared programs : 2567629 -> 2567241 (-0.02%)"
23:22karolherbst: + "total instructions in shared programs : 2567241 -> 2564977 (-0.09%)"
23:25karolherbst: going to bed now anyway :)
23:27karolherbst: I sent out the LocalCSE "fix", in case you didn't notice ;)
23:27imirkin_: i did...
23:31imirkin_: jason seems to think it's ok. good enough for me.
23:32hakzsam:wonders if piglit will think it's ok as well :)
23:33hakzsam: imirkin_, btw, the post ra constant folding opt is cool for nvc0, but unfortunately this is not going to reduce the number of GPRs, and in a nice world we could do something much better
23:50imirkin: hakzsam: post-ra folding for MAD isn't constant folding. and it can't reduce GPRs, but it can compress the instruction stream a bit.