01:41imirkin: dboyan_: your updated proposal is *much* better
02:03mangix: Horizon_Brave: worse yes.
02:12whompy: Excepting anholt. He's a winner.
02:22dboyan_: imirkin: Thanks, will do a minor update later. I think I'll upload the final version this evening or tomorrow morning.
02:22imirkin: dboyan_: sounds good
02:28imirkin: dboyan_: what's the status of your rcp/rsq64 patches?
02:28imirkin: (sorry, i lost track...)
02:38dboyan_: imirkin: iirc I sent out v2 and was waiting for review, especially on rsq
02:39dboyan_: i fixed the issue with rsq in the first version and did some testing. I thought it was precise enough
02:39imirkin: ok cool
02:39dboyan_: btw, I haven't looked at the nir lowering mentioned by Elie yet
02:40imirkin: no need
02:40imirkin: iirc he had glsl passes too
02:41imirkin: oh, but i think that there's nir lowering not from Elie that did rsq/rcp :)
02:41dboyan_: yeah, I was talking about that one
02:42imirkin: oh. *mentioned* by Elie. right. i can't read :)
02:44dboyan_: imirkin: do you want me to port the code to other architectures?
02:48imirkin: dboyan_: hold on that - let's agree on it first before you go through the effort
02:49dboyan_: okay, I won't hurry
02:51dboyan_: google is reminding me that 'less than 39 hours remain to submit your Final PDF Proposal' :)
03:12imirkin: skeggsb: https://hastebin.com/zorujibaba.go :(
03:12imirkin: this was with the G92 rendering onto the GK208
03:13imirkin: or airlied perhaps?
03:20imirkin: 2b:* f3 90 pause <-- trapping instruction
03:20imirkin: oh, i guess it's not that useful, it's just pointing out there's a soft lockup...
04:20dm_comp: does nouveau support NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
04:20gnarface: are you debugging h264 decoding on G92, imirkin?
04:21dm_comp: xrandr --listproviders Providers: number : 1
04:21dm_comp: it should be 2
04:25dm_comp: I also get Could not find provider with name nouveau
04:29gnarface: dm_comp: https://nouveau.freedesktop.org/wiki/FeatureMatrix/
04:30gnarface: dm_comp: (if your experience doesn't match, i doubt i can actually help, but good first guesses usually include things like "did i remember the non-free firmware package?" and "did i forget to blacklist the nvidia binary driver?")
04:31dm_comp: nvidia binary driver not installed
04:31gnarface: dm_comp: lots of this stuff changed relatively recently in distro-release time though, so don't expect the old kernel in debian stable to have all this necessarily - you might need to update
04:31gnarface: dm_comp: (it's possible that card is simply not supported too, but i thought it was, i just don't actually know)
04:32gnarface: seems like everyone else has gone to sleep in here though
04:32gnarface: so despite knowing almost nothing, i decided to share the link to the feature matrix at least then you know what i know
04:35dm_comp: gnarface thank you
09:15eyenseo: Who can I talk to regarding the GSoC?
09:15karolherbst: eyenseo: depends on what you want
09:16eyenseo: Well I know that I'm late to the party but I would like to participate
09:16eyenseo: While I am not very good with maths I do know C++ pretty well so I would like to do the Instruction scheduler
09:17karolherbst: eyenseo: that one is already taken though
09:17eyenseo: ah ok =/
09:17karolherbst: _maybe_ you can help out the person doing that, but I don't know how that would work out
09:18karolherbst: eyenseo: will you still be a student next year?
09:18eyenseo: karolherbst: yes, for at least 2 years - Master degree
09:19karolherbst: then we could do this: until next year you can do smaller tickets/bugs/cards for nouveau or other Xorg related projects and then you can do a proper GSoC next year
09:20eyenseo: karolherbst: yeah we could do that - I asked now because some of my units for the SuSe were canceled
09:20karolherbst: having everything worked out until tomorrow is too much, especially because you also need a mentor (somebody from the nouveau project) and the proposal and so on
09:20karolherbst: ohh, I see
09:20eyenseo: karolherbst: would have been a nice 'filler' ;)
09:22karolherbst: eyenseo: by any chance, are you doing something hardware security related?
09:23eyenseo: no, not really, I had a unit about security and another for testing but it wasn't really about hardware
09:24eyenseo: Currently I have a project running where 'we' compare crypto libraries in different languages - but again it's almost 100% software
09:24karolherbst: no worries, we just have the signed firmware problem and if somebody with proper knowledge would want to tackle this, it would be really helpful
09:26eyenseo: sounds challenging - with my current knowledge I would be of no help :/
09:29airlied: karolherbst: tackle it how btw? find hacks/bypasses?
09:29karolherbst: airlied: finding issues within the hw crypto implementation
09:29karolherbst: there were some serious ones in pre-maxwell2, but they all got fixed
09:30karolherbst: and by serious I mean useful
09:32karolherbst: airlied: another idea is to extract the firmware images from the proprietary driver and find issues there, but this is more work and less useful than doing that through the hw directly
11:27karolherbst: imirkin: I have like 5 opts which only show any effect if I loop the opt passes, but there are also a few bugs if we loop them, I guess I could figure out how to improve the current passes a bit
11:42karolherbst: okay, on my pixmark_piano branch I get +10% perf, let's work on those patches first
11:45mupuf: karolherbst: holy shit, nice!
12:02karolherbst: hakzsam: did you see something special regarding join for the maxwell sched opcodes?
12:07karolherbst: mupuf: I get +4.5% just by reordering instructions a bit to improve the dual issue rate
12:08mupuf: I am not surprised
12:08karolherbst: dboyan_: if you want to measure perf regarding scheduling, do it with gputest pixmark_piano
12:08mupuf: Hopefully, we will have something better than that at the end of the summer!
12:11RSpliet: karolherbst: just because there's a lot to gain? rather measure perf using a diverse set of games people actually play...
12:11karolherbst: RSpliet: no, because scheduling matters there most
12:12RSpliet: karolherbst: my favourite game is glxgears
12:12karolherbst: the issue with testing with games is: better scheduling doesn't have to improve perf, because something else is shit
12:12RSpliet: said nobody ever
12:12mupuf: karolherbst: that is the thing with bottlenecks ;)
12:13karolherbst: and with pixmark_piano our only bottleneck is shader execution time
12:13mupuf: but see it like this: Lower power consumption => can yield higher clocks
12:13karolherbst: mhhh, well....
12:13mupuf: and avoiding stalls is a nice way of increasing the power efficiency
12:13karolherbst: same bottleneck in the end
12:13RSpliet: karolherbst: I got that, but why improve something that only a few benchmark websites care about - rather than actual users?
12:14mupuf: in this case, yes. But what if you were memory-limited and you can suddenly boost the memory clock a little bit
12:14karolherbst: RSpliet: the idea is, if we get proper instruction scheduling and it improves perf a lot in the pixmark_piano benchmark, it means that this pass does something good and may affect other things as well
12:14karolherbst: mupuf: not affected by temp
12:14karolherbst: memory clock isn't boosted
12:15RSpliet: and no, DRAM goes about and power down banks if unused for a while. The more efficient you are with DRAM accesses, the longer your banks can be in low-power state
12:15mupuf: karolherbst: not yet ;p
12:15karolherbst: mupuf: well.... let us first get us a useful PMU :p
12:15mupuf: that is not for debate :D
12:15mupuf: we indeed need to do that
12:15RSpliet: karolherbst: because you are tailoring your optimisation pass for something with insane shaders like pixmark_piano. You probably have to be a bit more clever in your policy for actual games
12:15karolherbst: no idea if pascal boosts memory clocks, I think they do though
12:15RSpliet: or not, but it's important to validate that
12:15karolherbst: but seriously...
12:16mupuf: RSpliet: what would you rather have dboyan_ do then?
12:16karolherbst: RSpliet: true, but pixmark_piano improves motivation, because you _notice_ a difference
12:16mupuf: it will be useful for opencl too
12:16karolherbst: if you try to write a scheduling pass and it has no effect, you have to figure out why
12:16mupuf: and it is self-contained
12:16mupuf: and does not require REing
12:16mupuf: so, pretty good topic for a GSoC
12:16karolherbst: then you may rewrite your scheduling again, but it wasn't the issue, but something else
12:17RSpliet: mupuf: the topic is brill
12:17karolherbst: and then after a day without any changes, you give up, cause it doesn't do anything
12:17karolherbst: that's why something first, where you notice the change, then check where perf changes as well
12:17RSpliet: but make sure to benchmark with real games eventually
12:17karolherbst: yeah, next step
12:18RSpliet: karolherbst: no, in lockstep with figuring out the "instruction picking" part of your sched algo
12:18karolherbst: and then where scheduling had a relatively big effect, the scheduling could be improved with that game as well
12:18RSpliet: you risk overoptimising for the wrong use-case
12:18karolherbst: that isn't the point and never was
12:18karolherbst: we could do that if we would know the bottlenecks in every application
12:18karolherbst: fact is, we don't
12:18karolherbst: except a few examples
12:18karolherbst: and most/all of those are micro-benchmarks
12:19RSpliet: don't we have perf counters to show the efficiency of shaders, thanks to hakzsam?
12:19karolherbst: sure you can say: this game runs terribly, cause we don't schedule properly, but then again: do you know for sure or are you just guessing?
12:20karolherbst: RSpliet: well, they can't really tell you about the bottlenecks, I already tried
12:20karolherbst: we need those other counters for that
12:20RSpliet: thought we had a pipeline stall counter
12:20RSpliet: good strategy -> reduce stalls
12:21karolherbst: well, sure, but I don't think we have those yet
12:22karolherbst: the point I was just trying to make was that starting with pixmark_piano is good for motivation, because scheduling has a big impact here
12:22karolherbst: nothing more
12:24RSpliet: Yes I read that, but you seem to refuse to acknowledge that this might well be a misleading strategy, because it's not a real game
12:24karolherbst: I never said it should _only_ be written against it
12:25RSpliet: diversity, analyse, measure, understand. Don't focus on a single target unless you know that target is what people care about
12:25mupuf:fails to see why real time comes in the picture here. If anything, this reduces the pressure on the bus, which reduces the variance
12:25mupuf: which is good for real time
12:25RSpliet: mupuf: who mentioned real time?
12:25mupuf: just read "real time game" :D
12:26karolherbst: also, it's a fast benchmark, benchmarking games is time consuming
12:27mupuf: karolherbst: because we do not use the right tools
12:27karolherbst: especially if you don't know which game should get a perf boost, and you test 10, because you don't know it
12:27mupuf: we need to finish the perf counter support
12:27mupuf: and use it with apitrace
12:27mupuf: so as we can see actual changes at the draw call level
12:27mupuf: and spend time where necessary
12:28mupuf: there is an open source project from intel (frame_retracer) that could help, but we still need to work on this!
12:29mupuf: and I'm sad I will not get a student this year to finish the GUI for apitrace's perf counter
12:29karolherbst: +1.1% by smarter pow lowering... good enough
12:29karolherbst: mupuf: we should try to finish all the counter work for nouveau first anyway :p
12:29mupuf: karolherbst: yes
12:30mupuf: it is all in the userspace now
12:30mupuf: the kernel space landed IIRC
12:30karolherbst: (ignoring the PMU counters :P but those are pretty useless to begin with)
12:31karolherbst: but I think there was something missing
12:31karolherbst: did all the MP counter work land already?
12:31mupuf: MP counters are mostly fine
12:32mupuf: it's pcounter that is not supported in the userspace
12:32mupuf: one reason for that is that queries in gallium were not meant to be polled at the same time
12:32mupuf: samuel had a patch for that, not sure if it landed
12:33karolherbst: ohh, I think those landed
12:33mupuf: karolherbst: pcounter is the engine that allows you to know everything at the GPU-level
12:33mupuf: number of PCIe requests, split by sizes
12:33mupuf: same with RAM
12:33mupuf: or vdec, etc...
12:34mupuf: there are ~7 clock domains
12:37karolherbst: mhh engine/pm seems to be the right stuff
12:40mupuf: karolherbst: ask hakzsam, he has a list of events reverse engineered and what is still left to do
12:45karolherbst: imirkin: is there a good place for the POW -> MUL lowering for small constants? I don't really want to do that in every ir_lower_* file, but still want to get all the opts
12:48karolherbst:is wondering if we should do opts before lowering....
13:56imirkin: karolherbst: hm? algebraicopt would make sense...
13:56karolherbst: imirkin: we don't have pows in SSA
13:56imirkin: oh, coz we lower them pre-ssa? that's dumb, we should do that in the legalize stage
13:57karolherbst: okay, but then we just miss out the opts on the pow -> stuff result
13:57karolherbst: but I guess this is fine
13:59karolherbst: imirkin: in legalize POST_RA?
14:00imirkin: not post-ra
14:00imirkin: but still ssa
14:00karolherbst: I meant the pow -> stuff translation
14:00karolherbst: mhh I see
14:06imirkin: e.g. nvc0legalizessa
14:08karolherbst: yeah, already found it
14:08karolherbst: if I find some time, I will try to share more code between the chipsets
14:08karolherbst: handlePOW should be always the same
14:09imirkin: nv50 doesn't have the fmz thing
14:11karolherbst: ohh, the gm107 thing is a subclass of nvc0 and there is no gk110...
14:11karolherbst: okay, fair enough
14:17karolherbst: imirkin: the code is still the same
14:31gnurou: a little announcement: https://plus.google.com/+AlexandreCourbot/posts/BdWyfgsfp5J
14:32imirkin: gnurou: good luck!
14:32karolherbst: gnurou: hi :)
14:33RSpliet: gnurou: I'm sorry to see you go...
14:33RSpliet: but good luck with all your future endeavours
14:33RSpliet: (and don't be a stranger ;-))
14:34pmoreau: gnurou: Good luck with what you’re planning next!
14:34gnurou: yeah. I'm sad to go honestly. but I am confident we will get other chances to share a drink
14:34pmoreau: gnurou: Hope you’ll continue hanging around, even if you do not contribute.
14:34gnurou: pmoreau: sure, I will
14:34karolherbst: gnurou: will you come to XDC? :D
14:35RSpliet: if you stick around the OSS GPU communities, I'm sure we'll meet @ XDC, FOSDEM or any of the other numerous events ;-)
14:35gnurou: and if I miss it too much I may even consider *gasp* buying a NVIDIA GPU for myself ;)
14:35gnurou: karolherbst: I hope too - too early to say though
14:37RSpliet: also: thank you for battling with your peers and superiors for years for us. It's much appreciated
14:37karolherbst: mupuf: now we have to find somebody else who will forward our questions to the right people :O
14:37pmoreau: gnurou: We can send you some samples for RE'ing purposes :-p
14:37gnurou: I wish I could have done more, really :/
14:38karolherbst: gnurou: I think after ben you have the most commits now in the kernel :p
14:38pmoreau: karolherbst: I guess Andy or some of the other NVIDIA guys that were at XDC could help.
14:38mupuf: gnurou: yes, you will be missed!
14:39mupuf: and have fun in your new job too!
14:39karolherbst: pmoreau: first we have to get them to talk here inside the channel :D
14:40mupuf: gnurou: there is only so much one person can do, no worries, we are very grateful for your work!
14:40mupuf: and it has been a pleasure working with you
14:40pmoreau: gnurou: Nothing public yet on what you will do next, I assume?
14:41RSpliet: pmoreau: Valve doesn't let him say yet :-P
14:41pmoreau: RSpliet: To work on RadeonSI, right? :-D
14:42gnurou: haha. soon, if I don't get fired on my first week ;)
14:42pmoreau: gnurou: That would be unfortunate… :-/
14:44karolherbst: gnurou: if you get fired, you can help us out in the meantime :p
14:54karolherbst: imirkin: moving the POW lowering after SSA: https://gist.github.com/karolherbst/0ba3adf0456e48d5147f6920f638bede
14:55imirkin: either you did something silly, or there are a lot of situations where one does pow(a, 2), pow(a, 3), pow(a, 4) etc
14:55imirkin: since then those log2's would be CSE'd
14:55karolherbst: another idea: more movs
14:56imirkin: given the number of shaders hurt... unlikely
14:56imirkin: usually mov's are semi-random
14:56imirkin: so some would be hurt, some would be improved
14:56imirkin: here they're all hurt
14:56imirkin: except the one lucky one
14:56karolherbst: well we miss all the opts now
14:57karolherbst: the mul gets one value of the pow
14:57karolherbst: could be an immediate
14:57karolherbst: and isn't immediated anymore
14:59imirkin: check some of the hurt shaders
14:59imirkin: and see what's up
15:00karolherbst: yes, one issue is more movs, but there are others
15:01karolherbst: "lg2 f32 %r451 abs %r274; mul dnz f32 %r452 %r451 15.000000" vs "abs+mov+lg2 f32 %r594 %r428+mul dnz f32 %r595 %r433 %r594"
15:01karolherbst: so modifiers aren't folded in as well :/
15:01karolherbst: that will be a fun lowering pass in the end
15:03karolherbst: I have an idea
15:04imirkin: you could fold them into POW
15:04imirkin: and then make sure to copy them over
15:05karolherbst: I just declare what we could do with a POW
15:05imirkin: that's what i mean.
15:05karolherbst: I meant the table inside the target classes, or did you mean this as well?
15:05karolherbst: yes, you meant this
15:05imirkin: yes, i did.
15:06karolherbst: src0 == lg2.src0 and src1 == mul.src0
15:07imirkin: you could make it mul.src1 for the load propagation aspect
15:07karolherbst: imirkin: can I simply assign the .src(x) objects?
15:08imirkin: but you can just swap out i->op for one of them
15:08karolherbst: I do it for the ex2
15:08karolherbst: do I have to copy something besides .mod?
15:09imirkin: shouldn't be any indirects or any other funny business... i think that's it
15:15dboyan: imirkin: about the ARB_shader_clock thing, I think the blob is doing something wrong there, but the rollover (clocklo overflow) should be taken into account.
15:16imirkin: dboyan: we could also just not worry about it, and feed out only the "low" bits
15:16imirkin: (as long as we put them in the upper 32-bits of the result)
15:16imirkin: dboyan: btw, i assume you saw nha's ARB_shader_ballot patches -- those should be nicely implementable for kepler+
15:16imirkin: (fermi didn't have the SHFL.IDX op)
15:17RSpliet: dboyan: you missed a long heated discussion about how to benchmark scheduling ;-)
15:20dboyan: imirkin: getting clocklo into upper 32 bits is also okay, since it takes a few seconds for clocklo to overflow. But I guess getting clockhi is not that hard either. I came up with an idea, which only needs a loop
15:21imirkin: dboyan: but couldn't you just forget about the high bits and just use the low bits (but stick them high)?
15:22karolherbst: (if a shader runs for more than a second, we have other issues anyway)
15:22dboyan: imirkin: okay, if we decide to do it that way, I think we may even get ARB_shader_clock on nv50 ;)
15:23imirkin: well, the thing is - if it doesn't start at 0
15:23imirkin: then it can overflow whenever
15:24imirkin: perhaps that's the diff between clock and globalclock?
15:24imirkin: dboyan: well, we should def get clock on nv50 -- that one's unambiguous :)
15:24imirkin: i have one plugged in so i can test if needed
15:24karolherbst: imirkin: most likely. ARB_shader_clock defines a shader local clock
15:24karolherbst: so it can start at 0 every time
15:24imirkin: karolherbst: it doesn't define it one way or the other.
15:24imirkin: [the spec doesn't]
15:25dboyan: clockhi/lo seems to start from 0, at least on my card
15:25karolherbst: imirkin: in practice it does define one, because it doesn't guarantee it's usable as a global clock
15:26karolherbst: imirkin: it doesn't even guarantee to be usable as a clock between different shader stages
15:27karolherbst: mhh but this part should be important to your issue: "The returned time will wrap after it exceeds the maximum value representable in 64 bits."
15:27imirkin: karolherbst: sure, but if it's a global clock, it won't complain :)
15:27imirkin: that's what i mean by it's not defined
15:27karolherbst: I see
15:27imirkin: the important part is wrap detection
15:28imirkin: which means that the high bit of precision has to be in the high bit of the 64-bit value
15:28imirkin: otherwise the shader won't be able to detect a wrap event
15:28imirkin: also, having a global clock is probably advantageous for profiling overall draws (rather than just a single shader invoc)
15:29karolherbst: what happens with 0xffffffffffffffff + 1?
15:29imirkin: that's aka a wrap event
15:29karolherbst: does something bad happen?
15:29imirkin: but the shader can detect it.
15:29karolherbst: okay, and why does it have to be detected?
15:29imirkin: because 1 < 0xffffffffffff :)
15:29imirkin: and normally time moves forwards
15:29karolherbst: true, but the spec doesn't care
15:30karolherbst: "The returned time will wrap after it exceeds the maximum value representable in 64 bits."
15:30imirkin: and normally there are various assumptions about that when you're grabbing time.
15:30dboyan: imirkin, karolherbst: One hint here, the blob is using local clock
15:30karolherbst: yeah, because if an application uses this as a global one -> bug
15:30imirkin: karolherbst: not if the application does proper wrap detection
15:31imirkin: which is what we've been discussing this whole time.
15:31karolherbst: it is always a bug
15:31imirkin: the application needs to detect it
15:31karolherbst: it is no global clock, so application shouldn't use it as one
15:31karolherbst: the application
15:31karolherbst: not the driver
15:31imirkin: right. but the driver needs to structure things so that the application *can* detect it
15:32imirkin: e.g. if you stick the value into the lower 32 bits and let that wrap
15:32karolherbst: I don't see anything inside the spec, which tells the driver to do a proper overflow check
15:32imirkin: then the application may never notice.
15:32imirkin: so the value's MSB needs to be in the 64th bit.
15:32dboyan: Even more strangely, the blob puts clockhi in lower bits
15:32karolherbst: no, the spec strictly says: if overflow, then wrap
15:32imirkin: karolherbst: re-read what i said and try to understand it.
15:32RSpliet: "The units of time are not defined and need not be constant.". I wonder what games we can give a "swift kick in the pants" by assuming a clock faster than NVIDIA :-P
15:33imirkin: RSpliet: probably due to reclocking happening? dunno.
15:33karolherbst: imirkin: I just don't see why nouveau has to do this
15:33karolherbst: the spec is clear about this point
15:33imirkin: karolherbst: consider this situation
15:33imirkin: you have a hw 32-bit counter
15:33imirkin: the API returns a 64-bit value.
15:33imirkin: if you return your 32-bit counter in the lower 32 bits
15:34imirkin: then when your 32-bit counter wraps
15:34imirkin: then the API value will go from 0x00000000ffffffff to 0x0000000000000000
15:34RSpliet: (imirkin: https://lkml.org/lkml/2005/7/8/263 )
15:34imirkin: this is bad.
15:34agusyc: Hi, guys.
15:34karolherbst: pro tip: x << 32
15:34agusyc: I'm having some trouble with nouveau on a hybrid graphics laptop.
15:34imirkin: so as i was saying, for wrap detection to work
15:34imirkin: you must stick your value's MSB in the 64th bit.
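[Editor's sketch] The wrap-detection argument above can be illustrated in Python. This assumes a hypothetical 32-bit hardware counter and an application that measures elapsed time by 64-bit modular subtraction; neither the real counter width nor any application's behaviour is confirmed in this log, and the function names are made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def clock_low(counter32):
    """Hypothetical driver A: 32-bit counter returned in the low 32 bits."""
    return counter32 & MASK32

def clock_high(counter32):
    """Hypothetical driver B: same counter shifted up (x << 32), MSB in bit 64."""
    return (counter32 & MASK32) << 32

def elapsed64(start, end):
    """Elapsed time as an application would compute it on 64-bit timestamps."""
    return (end - start) & MASK64

# The 32-bit counter wraps between the two reads: 32 real ticks have passed.
before, after = 0xFFFFFFF0, 0x00000010

# Low placement: the 64-bit value jumps backwards and the delta is garbage.
low_delta = elapsed64(clock_low(before), clock_low(after))

# High placement: the 64-bit subtraction wraps together with the counter,
# so the delta (scaled back down) is still the true tick count.
high_delta = elapsed64(clock_high(before), clock_high(after)) >> 32
```

With the counter in the low bits the application sees a nonsensical elapsed time; with the counter's MSB in the top bit of the 64-bit value, ordinary modular arithmetic keeps working across the wrap.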
15:35agusyc: When I use nouveau (I have it blacklisted now), the touchpad hangs after I resume it from suspend and when I try to turn it off, the Laptop freezes completely.
15:35karolherbst: imirkin: okay, I missed the fact that the hw counter is 32 bits wide
15:35RSpliet: agusyc: first the basics: Kernel 4.10 or 4.11rc?
15:35imirkin: karolherbst: i dunno how wide it is. but irrespective of how wide it is, the high bit of the counter has to be in bit 64 of the returned value.
15:36agusyc: RSpliet: 4.10.6-200.fc25.x86_64
15:36karolherbst: not if the counter is 64 bit wide
15:36imirkin: if the hw counter is 64-bit wide, then the high bit of the counter still goes into bit 64 ;)
15:36dboyan: strange, in Issue 2 "Spec language currently mandates 64-bit, which would preclude implementations from exposing a 32-bit timer."
15:36karolherbst: imirkin: .. true, I was thinking about an overflow bit...
15:37karolherbst: dboyan: :D unresolved...
15:37RSpliet: agusyc: is this a skylake laptop by any chance?
15:37agusyc: RSpliet: Nope, Asus X556UB. It has an i5 and a 940M.
15:37imirkin: agusyc: out of curiosity, are you suspending to ram or to disk?
15:37agusyc: imirkin: RAM.
15:37karolherbst: imirkin, dboyan: but it should be fine if we just fill the 32 high bits of the 64bit value and be done with it, or not?
15:37imirkin: karolherbst: afaik, yes.
15:37karolherbst: even if this is a super silly workaround
15:37RSpliet: agusyc: sounds a lot like my K501U, which is a skylake
15:38agusyc: RSpliet: Mmm... And why may it be?
15:38RSpliet: and which suffers from random hangs when locking screen or suspending like yours
15:38imirkin: agusyc: i wonder if your issue is due to some kind of runtime pm getting hit on the usb device
15:38agusyc: imirkin: The USB device?
15:38agusyc: When did I mention a USB device? :P
15:38imirkin: agusyc: touchpad is most likely hooked up via USB internally
15:39agusyc: I didn't know that.
15:39karolherbst: (or PS/2 ... )
15:39imirkin: you can check with 'lsusb'
15:39imirkin: PS/2 is on its way out, and it sounds like you have a semi-modern device
15:39karolherbst: mine has PS/2
15:39agusyc: Looks like it's not usb.
15:39imirkin: huh, ok
15:39karolherbst: imirkin: and it has a hsw CPU + Kepler GPU :p
15:39agusyc: It's an Elantech Touchpad, by the way.
15:40karolherbst: ... PS/2 it is :p
15:40karolherbst: there is a special config for that in the kernel
15:40imirkin: well, PS/2 is a lot harder to kill :)
15:40imirkin: which is why i wanted to blame usb
15:40karolherbst: I have a Sentelic based one
15:41agusyc: Is there anything I can do?
15:41imirkin: well, i'm guessing fedora says =y or =m to everything...
15:42agusyc: It happens on every distro.
15:42RSpliet: agusyc: I always had the impression it's not the touchpad losing it, but the Intel GPU
15:42agusyc: I tried several.
15:42RSpliet: so the mouse doesn't move, no feedback on screen, but magic sysrq sometimes works, sometimes doesn't
15:42agusyc: RSpliet: Do you think? I'm not using the NVIDIA one right now. I blacklisted the module and the issues don't occur.
15:43RSpliet: ah... hmm... could nouveau be holding the kernel hostage in its resume-from-suspend?
15:43imirkin: agusyc: try loading nouveau with runpm=0
15:43karolherbst: imirkin: a little better now: https://gist.github.com/karolherbst/0ba3adf0456e48d5147f6920f638bede
15:43agusyc: Ok, I'm going to reboot and try. Brb.
15:43agusyc: You mean just adding "runpm=0" as a kernel parameter, right?
15:43karolherbst: some values aren't immediated yet though
15:43karolherbst: agusyc: nouveau.runpm=0
15:43agusyc: karolherbst: Ok, thanks.
15:43dboyan: imirkin: I'll try sticking clocklo to upper 32-bits then
15:44imirkin: dboyan: which is what nvidia does right?
15:44dboyan: except that it puts clockhi to lower bits
15:44imirkin: but ... my guess is that's a hack for some kind of "advanced" software to detect silly things
15:45imirkin: i dunno
15:45karolherbst: clockhi is 1 bit only?
15:45karolherbst: and is cleared on read?
15:45imirkin: karolherbst: hard to say.
15:45imirkin: would require deeper investigation
15:46imirkin: dboyan: i'd start simple, and not do any of the fancy nvidia things with the loop/etc
15:46dboyan: I didn't manage to get clockhi more than 1, but it's clearly not cleared on read
15:46karolherbst: I think nobody will need a 64bit precise clock anytime soon with nouveau anyway
15:47dboyan: It takes about 10 seconds on my card to make clockhi non-zero
15:48dboyan: and when I tried to make a shader run longer, the blob stops it midway
15:48karolherbst: 10 seconds is a lot
15:48agusyc: Ok, I think it worked...
15:48agusyc: At least the touchpad didn't freeze.
15:48agusyc: But I still have to see if it hangs on poweroff, so, brb.
15:48karolherbst: dboyan: is the value reset automatically for every shader?
15:49dboyan: karolherbst: you mean clocklo/clockhi?
15:49dboyan: I think so
15:49karolherbst: good enough then
15:52dboyan: imirkin: I also noticed nha's work on ARB_shader_ballot. might want to work on it if I get some spare time
15:54imirkin: dboyan: cool
16:08karolherbst: I forgot to clear the mod on the final ex2.....
16:15karolherbst: getting there: "total instructions in shared programs : 3931743 -> 3932317 (0.01%)"
16:18karolherbst: imirkin: "mov u32 %r292 0x44fa0000 + lg2 f32 %r442 %r292"
16:21karolherbst: mhh, I guess I need to do that in the lowering then
16:23karolherbst: or do you have any better idea?
16:23imirkin: can lg2 take an imm?
16:24imirkin: i didn't think it could..
16:24karolherbst: we calculate the result in the compiler normally
16:25karolherbst: the mov+lg2+mul -> mul in the old version
16:27imirkin: oh. coz lg2(imm) = easy to compute ;)
16:27imirkin: you could fix that up in ConstantFolding
16:28karolherbst: well, we just moved the lowering post SSA
16:28karolherbst: I could allow pow(imm0, a) but then I would "opt" it to pow -> ex2(preex2(mul(lg2(imm0), a)))
16:29imirkin: post-ssa there is no lowering
16:30karolherbst: *legalizing then
16:30imirkin: you could add special logic to handle it in your lowering pass though
16:30imirkin: since you can use getImmediate there
16:31karolherbst:is wondering if we should make it easy to call ConstantFolding::handleLG2 from outside _peephole
16:32imirkin: i'd just do the imm handling in your new lowering logic
16:32karolherbst: should be trivial enough after looking at the constantfolding code
16:33imirkin: if src.getImmediate(imm): stuff.
16:36karolherbst: is there a non float pow version at all?
16:41karolherbst: I can't call new_ImmediateValue
16:41karolherbst: because the constructor of ImmediateValue calls prog->add
16:42imirkin: why's that a problem?
16:43karolherbst: I guess there is another way to get prog besides i->bb->getProgram()
16:44karolherbst: another issue
16:44karolherbst: I reordered and moved setPosition
16:45imirkin: or don't use the builder at all... wtvr
16:46karolherbst: I use it for getSSA
16:46imirkin: i guess using it is convenient :)
16:46karolherbst: now I have to take more care about the mul, so I put the generated immediate into the second src
16:47karolherbst: it is starting to get complicated
17:06karolherbst: pow(a, 1), well that is easy
17:19karolherbst: ohh wait
18:00karolherbst: imirkin: I am currently wondering, but this should be right: ex2(lg2(a)) == a? I am just confused why this wasn't optimized previously
18:01imirkin: yes, that is correct
18:01imirkin: at least ... correct enough
18:01imirkin: (e.g. NaN & co won't be treated properly)
18:03pmoreau: Might be worth optimising it and adding some sel to handle the NaN & co cases?
18:04karolherbst: I am pretty sure that our current output is also half wrong
18:04imirkin: or not worry about nan since it all tends to be undefined
18:05karolherbst: and if something really depends on that, we figure that out with the first bug report
18:11pq: obviously it must be that a > 0, but you knew that, not dealing with complex numbers I suppose. No idea if you should accept a = 0 or a being almost 0.
18:12karolherbst: pq: we come from a pow actually
18:13karolherbst: a^b -> 2^(b*lg2(a))
18:13pq: does it accept pow(negative real, integer)?
18:15karolherbst: uhhh wait, meh
18:16karolherbst: I am sure it does
18:35imirkin: pq: mathematically, sure. in practice, i don't know if that's legal
18:36imirkin: 99.99987% of the uses of pow are for srgb, i.e. pow(color, 2.2)
18:44pq: I don't know if it's legal either :-)
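[Editor's sketch] The rewrite a^b -> 2^(b*lg2(a)) and pq's negative-base question can be checked numerically. This is a plain Python model, not nouveau code: `lowered_pow` is a hypothetical name, the PREEX2 step is left out, and mapping the math-domain error to NaN is an assumption about how an lg2 of a negative input would surface on hardware.

```python
import math

def lowered_pow(a, b):
    """pow(a, b) rewritten as ex2(b * lg2(a)); NaN where lg2(a) is undefined."""
    try:
        return 2.0 ** (b * math.log2(a))
    except ValueError:
        # math.log2 rejects a <= 0; modelled here as a NaN result (assumption).
        return math.nan

# Positive base: matches the direct computation up to rounding.
srgb_like = lowered_pow(0.5, 2.2)

# Negative base with an integer exponent: mathematically (-2)^2 == 4,
# but the lowered form cannot produce it.
negative_base = lowered_pow(-2.0, 2.0)
```

So whatever the API says about pow(negative, integer), a lowering of this shape will not return the mathematical answer for it.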
18:51karolherbst: imirkin: mhh, now we are missing LocalCSE :/
18:52karolherbst: same base, different exponents -> you could share lg2(base)
18:53ddaymace: currently running free radeon driver with debian stretch gnome; if i switch to geforce 6 card, will it switch to nouveau automatically, or do i have to uninstall old drivers?
18:54karolherbst: imirkin: but currently I am already at "total instructions in shared programs : 3927257 -> 3926939 (-0.01%)", just added special handling if there is a src==1
18:54imirkin: ddaymace: should switch, assuming you have them installed
19:43karolherbst: imirkin: any idea how to solve the missing localCSE?
19:43imirkin: that's one of the downsides of the approach.
19:43karolherbst: mhh, I could have a map and save which lg2 I created
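[Editor's sketch] The "map of created lg2s" idea might look roughly like this. It is a toy model over named values, not the nouveau IR: the tuple format and `lower_pows` are invented for illustration, PREEX2 is omitted, and it assumes all the pows sit in one basic block so that reusing an earlier lg2 is always legal.

```python
def lower_pows(pows):
    """Lower (dst, base, exponent) pows to lg2/mul/ex2, sharing lg2 per base."""
    lg2_of = {}   # base name -> name of the value already holding lg2(base)
    out = []
    for dst, base, exponent in pows:
        if base not in lg2_of:
            tmp = f"lg2_{base}"
            out.append(("lg2", tmp, base))
            lg2_of[base] = tmp
        prod = f"mul_{dst}"
        out.append(("mul", prod, lg2_of[base], exponent))
        out.append(("ex2", dst, prod))
    return out

# Two pows share base "a", so only two lg2 ops are emitted for three pows.
ops = lower_pows([("r0", "a", "2.2"), ("r1", "a", "0.45"), ("r2", "b", "3.0")])
lg2_count = sum(1 for op in ops if op[0] == "lg2")
```

This recovers by hand what LocalCSE would have done if the lowering ran before it.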
19:43imirkin: remind me why you wanted to move it to later on btw?
19:44karolherbst: pow(a , 16) -> tons of muls
19:44karolherbst: well, <5 is fine as well
19:44karolherbst: but this was the idea
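[Editor's sketch] For reference, one way to turn pow(a, n) with a small positive integer n into multiplies is repeated squaring. This is only an illustration of the idea and of the multiply counts involved; it is not the patch under discussion, and `pow_by_muls` is a hypothetical helper.

```python
def pow_by_muls(a, n):
    """Compute a**n for integer n >= 1 using only multiplies.

    Returns (value, number_of_multiplies).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = None   # running product of the selected powers of a
    base = a        # a, a^2, a^4, ... as the loop advances
    muls = 0
    while n:
        if n & 1:
            if result is None:
                result = base          # first selected power: just a copy
            else:
                result = result * base
                muls += 1
        n >>= 1
        if n:
            base = base * base         # square for the next bit
            muls += 1
    return result, muls

value16, muls16 = pow_by_muls(2.0, 16)
value3, muls3 = pow_by_muls(3.0, 3)
```

Whether a chain of multiplies beats the lg2/mul/ex2 sequence for a given exponent depends on the hardware, which is not settled in this log.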
19:44imirkin: well, you could detect the decomposed thing in AlgebraicOpt right?
19:44karolherbst: there are more opts possible based on pow though
19:45karolherbst: like pow(1, a) and pow(a, 1)
19:45karolherbst: and so on
19:45imirkin: well, the latter should work out
19:45karolherbst: both need the same code
19:45imirkin: since exp(lg2(a) * 1) = exp(lg2(a))
19:45imirkin: although we might not have the smarts to make a out of that.
19:47karolherbst: maybe I'll check out that ex2(lg2(a)) == a first
19:47karolherbst: would be a perfect AlgebraicOpt thing
19:48karolherbst: and improves the situation a little regarding pows without much code
20:15karolherbst: imirkin: is PREEX2 always used prior to an EX2? I don't really know what those PRE* instructions do
20:18karolherbst: heh, I can do lg2(ex2(a))==a as well
20:24karolherbst: I know that PRESIN can be/is used before sin and cos
20:24karolherbst: do they somehow prepare the register or do they calculate something as well?
20:25imirkin: they calculate something
20:27karolherbst: *sigh* my opt isn't picked up, cause the mul(a, 1) is still there
20:29imirkin: coz you need to detect lg2(mul(ex2(preex2(a)), b))
20:29imirkin: er, other way around
20:29imirkin: ex2(preex2(mul(lg2(a), b)))
20:29imirkin: might be a prelg2 as well, i forget
20:29karolherbst: the mul(a, 1) is opted away though
20:30karolherbst: there is no prelg2 at least on nvc0
20:30karolherbst: ConstantFolding deals with the mul
20:30imirkin: algebraicopt runs before constantfolding
20:31karolherbst: I think looping over the opts is our best shot now, and it indeed changes a lot
20:34karolherbst: just need to fix all those AlgebraicOpts
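[Editor's sketch] Why looping the passes helps can be shown on a toy expression tree. Everything here is a simplified stand-in: expressions are nested tuples, the two functions only mimic one rule each of AlgebraicOpt and ConstantFolding, PREEX2 is left out, and the NaN caveats mentioned earlier are ignored.

```python
def rewrite(expr, rule):
    """Apply a rewrite rule bottom-up to a nested-tuple expression."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, rule) for e in expr[1:])
    return rule(expr)

def algebraic_opt(e):
    # ex2(lg2(x)) -> x
    if (isinstance(e, tuple) and e[0] == "ex2"
            and isinstance(e[1], tuple) and e[1][0] == "lg2"):
        return e[1][1]
    return e

def constant_folding(e):
    # mul(x, 1.0) -> x
    if isinstance(e, tuple) and e[0] == "mul" and e[2] == 1.0:
        return e[1]
    return e

# Same order as described above: the algebraic pass runs first.
PASSES = [algebraic_opt, constant_folding]

def run_once(expr):
    for p in PASSES:
        expr = rewrite(expr, p)
    return expr

def run_to_fixpoint(expr, limit=8):
    """Repeat the pass list until nothing changes (or the limit is hit)."""
    for _ in range(limit):
        new = run_once(expr)
        if new == expr:
            break
        expr = new
    return expr

# pow(a, 1) after lowering: ex2(mul(lg2(a), 1.0))
lowered = ("ex2", ("mul", ("lg2", "a"), 1.0))
single = run_once(lowered)
looped = run_to_fixpoint(lowered)
```

A single run leaves ex2(lg2(a)) behind because the mul is only removed after the algebraic pass has already looked at it; the second iteration then collapses it to a.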
20:37karolherbst: instructions: -0.36%
20:37karolherbst: locals: -2.62%
20:38karolherbst: but some hurt gprs
21:28karolherbst: imirkin: regarding RA and improving register layout (less movs for d/t/q regs) I think you once said that doing it backwards should be easier and "better". I never dug into the RA code, but I think I will work on that issue sooner or later
21:30karolherbst: I just wanted to ask about this issue before I start working on it. Not that I remember that wrongly and there is a better way to tackle this
21:31karolherbst: or I do "Move loop-invariant defs out of loops" :) this sounds like a lot of perf to gain
21:43karolherbst: found another bug. nice
21:43karolherbst: ohh wait, no, this makes sense actually
22:30mooch2: sorry, i'm a bit new to this