01:07 Hoolootwo: I don't have logs of this or anything, which is lame, but debian x86_64 kernels 4.5 and 4.6 fail, while 4.3 doesn't.
01:08 Hoolootwo: it throws a kernel panic
01:08 Hoolootwo: my card is an NVS 3100M
01:09 Hoolootwo: I could be misremembering the thing about the kernel panic,
01:09 Hoolootwo: I need my laptop for homework stuff today, I'll try and get logs tomorrow
08:45 hakzsam: karolherbst, I'm having some troubles while running shader-db with bioshock_infinite shaders, sometimes it crashes. Is this something familiar to you?
08:45 karolherbst: nope
08:46 karolherbst: where does it crash?
08:46 karolherbst: I think bioshock uses SSO shaders though
08:46 karolherbst: maybe some new commit broke something?
08:47 hakzsam: it happens to crash with other shaders
08:47 hakzsam: #0 _mesa_glsl_parse (state=state@entry=0x58a1600) at ./glsl/glsl_parser.yy:88
08:47 hakzsam: in the parser, fun
08:49 hakzsam: (it worked fine a few weeks ago)
08:51 hakzsam: which version of shader-db are you using? latest ie. c16b43092cbcaa2e4f09bd34e5700a422f67ef57 ?
08:56 hakzsam: okay, someone introduced a regression to shader-db, I will investigate
09:05 karolherbst: hakzsam: please note, that if you run shader-db in parallel, only one of the shaders in the list might crash
09:05 karolherbst: it lists all shaders that were being executed at the time, not just the crashing one
09:09 hakzsam: I don't use omp
09:16 karolherbst: huh, what does omp have to do with that?
09:17 hakzsam: shader-db uses OMP...
09:17 karolherbst: ohh
09:17 karolherbst: really?
09:17 hakzsam: yeah
09:18 hakzsam: that's why there is the -1 option, which disables multithreading
09:19 karolherbst: yeah, that I knew
09:19 karolherbst: but I wasn't aware that shader-db uses openmp
09:22 hakzsam: :)
09:27 karolherbst: uhh
09:27 karolherbst: tesla P40 is a GP102
09:27 karolherbst: I doubt we support that yet
09:28 karolherbst: yep, just GP100 and GP104
09:31 karolherbst: skeggsb: I guess adding a 0x132 would be fine for now until we know more?
11:13 karolherbst: and I guess we should add a GP106 case as well
11:13 karolherbst: or rather wait until we get such hardware and can try things out...
11:14 karolherbst: I just think we shouldn't get a situation like with the GM108 again, where users had to wait unnecessarily long until they could actually use the hw with nouveau
11:27 RSpliet: karolherbst: as soon as someone is able to verify that the subdevices for those cards are equal to the existing cards you're trying to copy-paste, support can be upstreamed
11:28 RSpliet: it's tempting to make educated guesses, but users are better served with a "sorry, chipset unsupported" warning in their logs than a crashing card/driver due to incompatible differences
11:29 karolherbst: mhh, true
11:31 karolherbst: uhh, I also see that nvd7 misses quite a lot :/
11:32 karolherbst: ohh not true, just the pmu
15:01 LuMint: hi guys, I'm having a nouveau issue
15:01 LuMint: the card is FX 5700
15:02 LuMint: https://gist.github.com/be5d02e9e99a4644b45bf2a31d8c36dc the crash occurred while in xscreensaver, the system froze, GPU got "locked up"
15:02 LuMint: there was a message about a driver (nouveau presumably) monopolizing resources
15:03 LuMint: I'm also having these warnings in mpv https://gist.github.com/b1e865cbaa1a13eb259ec661163a4479
16:43 orbea: left before anyone could tell him to not use -hwdec with mpv + nouveau :/
16:43 imirkin_: orbea: hwdec isn't exactly an option for him
16:43 imirkin_: FX 5700 = nv3x
16:43 orbea: heh...
16:43 imirkin_: it could do mpeg1/mpeg2, but we don't expose vdpau support for it. only xvmc. and i think i even turned that off.
16:44 imirkin_: coz it would cause hangs for no apparent reason
18:43 glennk: at least its not a fx 5900, where one wouldn't be able to ever hear the audio over the fan noise
18:45 imirkin_: glennk: the key is using that fan noise to produce audio...
18:46 glennk: the real key is controlling the fan to avoid the pc taking off
18:46 imirkin_: avoid?
18:46 imirkin_: i thought that was the primary purpose!
18:47 glennk: its bad for dvi connector longevity?
18:47 tobijk: meh, anybody else having problems with glxinfo not terminating when using nouveau?
18:48 karolherbst: tobijk: try to revert this commit: https://github.com/karolherbst/nouveau/commit/01bbcb69f80e1058395b737ae399c6f4ef48691b
18:49 karolherbst: for me it causes shader-db to not quit every other time
18:50 tobijk: karolherbst: that does not play nicely with my backtrace: http://hastebin.com/uvobajofam.pas
18:51 karolherbst: it matches my issue
18:51 karolherbst: it also happened in nvc0_screen_destroy
18:51 tobijk: mhm
18:51 tobijk: ok will try, thanks :)
18:51 karolherbst: mupuf also has a similar issue, no idea if a revert also helped him though
18:52 karolherbst: tobijk: what version of nouveau are you using?
18:53 tobijk: which part? :)
18:53 karolherbst: module
18:53 tobijk: vanilla rc5
18:53 karolherbst: k
18:54 tobijk: rc6 is on the way right now
18:54 karolherbst: well if you can confirm that commit breaking stuff, that would help a lot, cause I just tested nouveau master on top of 4.7
18:55 tobijk: karolherbst: after rc6 is done, i'll give it a try
18:55 tobijk: (after compilation is done)
18:55 karolherbst: k, thanks
19:26 tobijk: karolherbst: reverting aff51175cdbf345740ec9203eff88e772af88059 does the trick as you suggested
19:26 tobijk: so it is to blame after all
19:26 karolherbst: gnurou: ^^
19:26 tobijk: but it can be reverted within git (git revert commit)
19:26 tobijk: so nothing depends on it
19:34 karolherbst: famous last words :D
19:35 tobijk: man i like self destruction buttons
19:38 kb9vqf: So, on the Wiki page it almost looks like the Tesla K80 is NVE0. 1.) is that true and 2.) does that mean we don't have to fight with binary signed blobs for that card?
19:38 karolherbst: who cares about a kepler tesla anymore :p
19:38 imirkin_: K80 is a GK210 based on my understanding
19:38 imirkin_: which is largely identical to GK110 except it has more shared memory/register file
19:39 imirkin_: we don't treat it any differently, and i haven't the faintest clue if nouveau even loads on it
19:39 kb9vqf: ok, thanks for the info!
19:39 tobijk: kb9vqf: but it should not require signed blobs :>
19:39 kb9vqf: (and you'd be surprised who might care, loading random binaries into one's kernel can be a showstopper)
19:39 karolherbst: well
19:39 karolherbst: I mean
19:39 karolherbst: you buy such card for a reason
19:40 kb9vqf: anyway, sounds good, thanks!
19:40 karolherbst: mainly support
19:40 imirkin_: kb9vqf: not a ton of people test nouveau on such GPUs, so you'd largely be on your own
19:40 imirkin_: although we'd be happy to help
19:40 karolherbst: uhh
19:40 karolherbst: the k80 has even two cores
19:41 tobijk: Oo
19:41 kb9vqf: I was just trying to get a bead on the current state of the market, specifically if the Tesla cards that were still being sold and tested somewhat recently are completely locked down or not. I'm actually surprised the K80 isn't, though I imagine its successor is?
19:42 karolherbst: anything newer than gm200 is
19:42 karolherbst: or "lexially higher" or something like that
19:42 karolherbst: *lexically
19:43 kb9vqf:still wonders what the future of nouveau is, it sure doesn't look good right now
19:43 karolherbst: huh, why? :D
19:43 karolherbst: well yeah
19:43 karolherbst: it is a bit bad
19:43 karolherbst: but I am sure we will figure something out
19:43 kb9vqf: then the next question is, will those of us in the US be able to legally use whatever you figure out :-P
19:44 kb9vqf: (or the UK, with similarly insane laws in certain areas)
19:44 karolherbst: sure
19:44 tobijk: kb9vqf: why not, as long as nvidia releases the firmware
19:44 karolherbst: well, you can also use the nvidia firmware for video acceleration
19:44 kb9vqf: tobijk: Well, what if nVidia decides not to?
19:45 tobijk: we have a problem on the hand
19:45 tobijk: but legally it should not be a problem if we hack around the firmware (if possible)
19:45 tobijk: though i'm not a lawyer :>
19:46 imirkin_: kb9vqf: future looks grim indeed
19:46 kb9vqf: tobijk: In the US, at least, it's not legal to circumvent any access control, which the nVidia signing definitely is. At minimum it puts any such solution into such a grey area that my company wouldn't touch it with a 10 foot pole :-/
19:46 kb9vqf: at least there's AMD, for now...
19:46 imirkin_: i recommend using AMD
19:47 imirkin_: no blobs in your kernel
19:47 imirkin_: driver supported by a paid team
19:47 imirkin_: clearly the better choice
19:47 kb9vqf: yep, that's what we've been doing. every now and again there's a call for nVidia stuff, so needed to ask
19:48 imirkin_: nouveau is (a) fun to hack on and (b) nice for people stuck with certain hw
19:48 imirkin_: however if you have a choice, definitely avoid nvidia hw
19:49 tobijk: kb9vqf: even if the access control is on your system and the hw is yours?
19:49 kb9vqf: tobijk: According to US law, a.) it's not your hardware and b.) it will get you in legal trouble
19:50 Hoolootwo: it's definitely legal to circumvent access control if it is in fact your hardware
19:50 karolherbst: tobijk: in uk you even have to tell the police your passwords, otherwise jail
19:50 Hoolootwo: see the TI signing key shenanigans
19:50 tobijk: so you have to buy a hw license over there? :O
19:50 kb9vqf: Hoolootwo: Then go rip some Blu-Rays for me and let the MPAA know about it. Get back to me with results in ~20 years :)
19:50 kb9vqf: your hardware, your disks, right?
19:51 kb9vqf: (note: not recommending this, just an extreme example)
19:51 Hoolootwo: hmm okay will do :P
19:51 tobijk: kb9vqf: its local to make copies over here for personal use (if you manage to do it)
19:51 Hoolootwo: yeah, a single backup copy is almost always legal
19:51 kb9vqf: tobijk: I know this isn't an issue in EU law
19:51 tobijk: *local = allowed
19:51 kb9vqf: US and UK law, though...
19:52 tobijk: :/
19:52 kb9vqf: hence my snide remark earlier about whether anything nouveau does will be legal here in the US
19:52 kb9vqf: (regarding new hardware/ hacking FW)
19:52 kb9vqf: anyway I need to run, thanks for the info!
19:52 tobijk: bb
19:53 karolherbst: well
19:53 karolherbst: I don't care what is legal within the US though
19:53 karolherbst: or not
19:54 tobijk: karolherbst: as long as possible it should be aimed for a non-troubled implementation :D
19:54 karolherbst: then we print it out and send it into the US
19:55 tobijk: :D
19:55 tobijk: will make a nice book
19:57 karolherbst: hakzsam: the increased gpr usage though
19:58 hakzsam: karolherbst, yeah, but not a big deal
19:58 karolherbst: it kind of is though
19:58 karolherbst: especially because the reduction of instructions is really slim
19:59 karolherbst: then somebody could argue to opt c = add(a, b); e = add3(a, b, d) to c = add(a,b); e = add(c, d)
19:59 karolherbst: which is the reason the gpr count went up
19:59 karolherbst: that this isn't taken into account
20:01 karolherbst: so you could try to add a check and do that add+add to add3 opt just when the first add has only one use
20:01 karolherbst: and see if that eliminates any gpr count increases
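The check karolherbst suggests (only fold add+add into add3 when the first add has a single use) can be sketched on a toy IR. This is a minimal illustration, not the nv50_ir pass: `Instr`, `countUses` and `fuseAddAdd` are hypothetical names, and the program is assumed to be straight-line SSA (each value defined once).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, not nv50_ir: each instruction defines one value
// and reads a list of named operands.
struct Instr {
    std::string op;                 // "add" or "add3"
    std::string def;                // name of the value being defined
    std::vector<std::string> srcs;  // names of the operands
    bool dead;                      // true once the instruction was folded away
};

// Count how many times `name` is read by live instructions, plus once for
// every appearance in `liveOut` (values still needed after the program).
static int countUses(const std::vector<Instr> &prog,
                     const std::vector<std::string> &liveOut,
                     const std::string &name)
{
    int uses = 0;
    for (const Instr &i : prog) {
        if (i.dead)
            continue;
        for (const std::string &s : i.srcs)
            if (s == name)
                ++uses;
    }
    for (const std::string &s : liveOut)
        if (s == name)
            ++uses;
    return uses;
}

// Turn  c = add(a, b); e = add(c, d)  into  e = add3(a, b, d), but only when
// c has exactly one use, so the first add can be dropped afterwards.
// Returns the number of fused pairs.
int fuseAddAdd(std::vector<Instr> &prog, const std::vector<std::string> &liveOut)
{
    int fused = 0;
    for (std::size_t j = 0; j < prog.size(); ++j) {
        Instr &second = prog[j];
        if (second.dead || second.op != "add" || second.srcs.size() != 2)
            continue;
        bool done = false;
        for (std::size_t i = 0; i < j && !done; ++i) {
            Instr &first = prog[i];
            if (first.dead || first.op != "add" || first.srcs.size() != 2)
                continue;
            for (int k = 0; k < 2; ++k) {
                if (second.srcs[k] != first.def)
                    continue;
                if (countUses(prog, liveOut, first.def) != 1)
                    continue;  // c is needed elsewhere: keep add + add
                const std::string other = second.srcs[1 - k];
                second.op = "add3";
                second.srcs = {first.srcs[0], first.srcs[1], other};
                first.dead = true;
                ++fused;
                done = true;
                break;
            }
        }
    }
    return fused;
}
```

With `c = add(a, b); e = add(c, d)` and only `e` live afterwards, the pair is fused and the first add is marked dead. If `c` is also live afterwards, nothing is changed, which is the case the gpr increase was attributed to above.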
21:29 pktemp: so nouveau broke with kernel 4.6+ when using gm204 cards
21:29 karolherbst: pktemp: well, before that it wasn't supported at all
21:30 karolherbst: pktemp: by any chance, you don't have a GTX 970 with 4GB vram?
21:30 pktemp: well i was running it fine on 4.5. i suppose it was the hware accel that broke things
21:31 pktemp: yes it is indeed a 4gb gtx 970. the problem is the vram partitions 3.5 and .5
21:31 karolherbst: right
21:31 karolherbst: and that's why nouveau fails
21:31 pktemp: is there a rational workaround
21:31 karolherbst: there is indeed if you don't mind compiling your kernel/nouveau
21:32 pktemp: i am using a very hacky patch for it now. recompile are no problem
21:32 karolherbst: what patch are you using?
21:32 karolherbst: that from the bug?
21:32 pktemp: https://bugs.freedesktop.org/show_bug.cgi?id=94990 that bug
21:32 pktemp: the patch posted that takes the mismatched allocation sizes
21:32 karolherbst: you use that hack with the annoying dmesg prints?
21:33 pktemp: yes it is terrible -- value oriented patch
21:33 karolherbst: ...
21:33 karolherbst: I don't even know how that guy came up with that, because it just looks wrong and "no idea what I am doing here"
21:33 pktemp: ya if size == something change it to something else
21:33 karolherbst: exactly
21:33 pktemp: it WORKS tho
21:34 karolherbst: there is something a little cleaner
21:34 karolherbst: pktemp: https://github.com/karolherbst/nouveau/blob/master_4.7/drm/nouveau/nvkm/subdev/fb/ramgf100.c#L567
21:34 karolherbst: set parts to 3
21:34 karolherbst: that limits your vram to 3GB
21:34 pktemp: k let me read
21:34 karolherbst: and shouldn't break anything
21:34 karolherbst: (tm)
21:34 pktemp: ah -- so instmem allocations dont go near the higher partition
21:34 karolherbst: I am sure something will be messed up, but this hack is less crappy imho
21:35 karolherbst: right
21:35 karolherbst: well
21:35 karolherbst: you could even hack the last partition to be just 512MB
21:35 karolherbst: but that requires more changes
21:35 pktemp: let me read the patch, thanks
21:35 hakzsam: karolherbst, I already take care of that actually :) This affects the number of GPRs only in bioshock and the shaders are really similar... so for real, this is not going to hurt
21:36 imirkin_: pktemp: fyi, to get back to how it worked before, just boot with nouveau.noaccel=1 nouveau.nofbaccel=1
21:36 karolherbst: hakzsam: well, increasing number of gprs is still not _that_ good
21:36 karolherbst: hakzsam: I wouldn't trace 1% more gprs for 0.2% less instructions
21:36 karolherbst: *trade
21:36 karolherbst: not even 1% vs 1%
21:37 hakzsam: add3 is implemented exactly as fma/mad, so something else is potentially wrong
21:37 karolherbst: hakzsam: add3 increases live ranges if a temporary add2 value is used elsewhere
21:37 imirkin_: hakzsam: it makes sense that add3 would cause higher gpr usage.
21:37 imirkin_: exactly.
21:37 hakzsam: yeah
21:38 imirkin_: or rather - the opposite - if it's *not* used elsewhere
21:38 karolherbst: that's why it might make sense to fix it up again
21:38 karolherbst: or only opt if it does make sense
21:40 karolherbst: anyway, with a proper scheduler one or two more instructions won't actually matter, cause we would hide latencies with those anyway, and the lost parallel execution through higher gpr counts will hurt more
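To make the live-range argument concrete, here is a small sketch that counts how many values are live at once in a straight-line program, as a rough stand-in for gpr usage. It is not how nouveau's RA measures anything; `Op` and `maxLive` are hypothetical names, and the inputs `a`, `b`, `d` are assumed to be live on entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line instruction: one defined value, named operands.
struct Op {
    std::string def;
    std::vector<std::string> srcs;
};

// Walk the program backwards, maintaining the set of live values, and return
// the largest size that set ever reaches. `liveOut` holds the values that are
// still needed after the last instruction.
std::size_t maxLive(const std::vector<Op> &prog, const std::set<std::string> &liveOut)
{
    std::set<std::string> live = liveOut;
    std::size_t peak = live.size();
    for (auto it = prog.rbegin(); it != prog.rend(); ++it) {
        live.erase(it->def);                            // value is born here
        live.insert(it->srcs.begin(), it->srcs.end());  // operands must be live before
        peak = std::max(peak, live.size());
    }
    return peak;
}
```

For the case from 19:59, where `c` is still used elsewhere: `c = add(a, b); e = add(c, d)` peaks at 3 live values, while `c = add(a, b); e = add3(a, b, d)` peaks at 4, because `a` and `b` have to stay live alongside `c` until the add3. When only `e` is needed afterwards and the first add is dropped, both forms peak at 3.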
21:46 karolherbst: imirkin_: ever thought about a way to make RA smarter about choosing regs for d,t,q reg accesses?
21:48 RSpliet: sounds like the kind of stuff nightmares are made of
21:48 karolherbst: exactly
21:49 imirkin_: d,t,q?
21:49 karolherbst: sometimes a good opt will produce worse code due to RA inserting silly moves
21:49 karolherbst: imirkin_: double, triple, quad
21:49 imirkin_: oh
21:49 imirkin_: it tries.
21:49 karolherbst: I know
21:49 imirkin_: it just fails.
21:49 karolherbst: that's why I said "smarter"
21:51 hakzsam: karolherbst, yeah, that could be eventually improved :)
21:52 skeggsb: karolherbst: i have a gp102 on my desk, i'll add it once i double-check it
21:52 karolherbst: skeggsb: hi, the coherent patch did break things for sure :/
21:52 karolherbst: skeggsb: awesome :)
22:01 karolherbst: regarding my pow to mul lowering, would something like that be "good enough" or should I make the code a little smarter? https://github.com/karolherbst/mesa/commit/6b5c0be3bbaac457129b7796f3ba359a1f728a5d
22:02 karolherbst: uhh, it is dumb anyway
22:02 karolherbst: I tried to be smart though and failed
22:03 imirkin_: i->setSrc(0, bld.loadImm(NULL, 1));
22:03 imirkin_: you probably want 1.0f
22:03 karolherbst: ohh, okay
22:03 imirkin_: tbh i don't really see what the createPOW* helper is getting you
22:03 karolherbst: I tried to make the code work for 0-15
22:04 imirkin_: ah
22:04 karolherbst: I have to think it through
22:04 karolherbst: I could though just create muls and let CSE merge those together
22:04 karolherbst: and make the lowering pretty dumb
22:04 karolherbst: or just chain muls together
22:04 imirkin_: nah
22:04 imirkin_: don't bother with higher powers
22:04 karolherbst: okay, so 0-4 is good enough?
22:05 imirkin_: definitely.
22:05 karolherbst: it did give a decent perf up in pixmark_piano
22:05 imirkin_: you could also do -1..-4
22:05 karolherbst: k
22:06 karolherbst: maybe even some fast path for 0.5?
22:06 imirkin_: if you like.
22:06 karolherbst: mhh, well I would check before if that actually happens often enough
22:06 karolherbst: the same applies for nv50 as well I assume?
22:07 imirkin_: probably.
22:07 karolherbst: mhh
22:07 imirkin_: i'd kinda rather there be a peephole pass for it
22:07 imirkin_: like in the constant folding one
22:07 karolherbst: doing it for tgsi->nv50ir sounds like a worse place to do that though
22:07 karolherbst: mhh I see
22:07 imirkin_: and move the POW handling into legalizessa
22:08 imirkin_: otoh... then you lose out on the DCE for the power. dunno.
22:08 imirkin_: er
22:08 imirkin_: CSE
22:08 imirkin_: otoh... i doubt it matters
22:09 karolherbst: well I would say it fits better while translating tgsi to nv50ir though, if we know the target can't do pow anyway
22:09 imirkin_: yea
22:09 imirkin_: either's fine
22:09 karolherbst: k
22:09 imirkin_: problem is that you might not know it's a const at tgsi processing time
22:09 karolherbst: mhh :/
22:10 imirkin_: so yeah, i'd flip the pow lowering to happen at legalizessa time
22:10 imirkin_: and then add an entry to ConstantFolding to deal with pow(x, imm)
22:10 karolherbst: legalizessa is when the nv50ir is translated into ssa form?
22:10 imirkin_: no
22:10 imirkin_: it's *after* all the ssa opts
22:10 karolherbst: ahh I see
22:10 imirkin_: right before RA
22:11 karolherbst: do you think nvidia hw may support pow in the future?
22:11 karolherbst: because then I would check for something like target->supportsOp(OP_POW)
22:11 karolherbst: and then do the mul conversion
22:11 karolherbst: in constant folding or so
22:15 imirkin_: we can worry about that later
22:16 karolherbst: k
22:21 RSpliet: karolherbst: unlikely, there's little to gain from hw pow over a series of muls
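The arithmetic behind the pow(x, imm) lowering discussed above (exponents 0..4, plus -1..-4 via a reciprocal) can be sketched on the host like this. It is only an illustration of the math, not the nv50_ir ConstantFolding code; `lowerPowToMul` is a hypothetical name, and it simply chains muls as mentioned at 22:04.

```cpp
// Hypothetical helper: if `exponent` is an integer in [-4, 4], compute
// pow(x, exponent) using only multiplies (and one reciprocal for negative
// exponents), store it in `result` and return true. Otherwise return false
// so the caller can fall back to the generic pow lowering.
bool lowerPowToMul(float x, float exponent, float &result)
{
    // Range check first; this also rejects NaN and avoids an out-of-range
    // float-to-int conversion below.
    if (!(exponent >= -4.0f && exponent <= 4.0f))
        return false;
    const int n = static_cast<int>(exponent);
    if (static_cast<float>(n) != exponent)
        return false;  // not an integer, e.g. 0.5

    const int mag = n < 0 ? -n : n;
    if (mag == 0) {
        result = 1.0f;  // pow(x, 0): note the float immediate 1.0f, not integer 1
        return true;
    }

    float acc = x;
    for (int k = 1; k < mag; ++k)
        acc *= x;  // chain of muls: x, x*x, x*x*x, ...

    result = n < 0 ? 1.0f / acc : acc;
    return true;
}
```

For example, `lowerPowToMul(2.0f, 3.0f, r)` returns true with `r == 8.0f`, `lowerPowToMul(2.0f, -2.0f, r)` gives `r == 0.25f`, and exponents such as 0.5 or 5 are rejected and left to the generic path.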