01:35 karolherbst: imirkin: what do you think about a Pass which runs PreSSA and just detects BBs containing only a bra, so that every bra pointing to such an empty BB points directly to where that BB's bra points?
01:37 karolherbst: mhh allthough this could be done in SSA too
01:39 karolherbst: in the end we need something like this: if we find a conditional jump of any kind and both end up in the same BB in the end (respecting chained jumps), the jump can simply be removed
01:40 karolherbst: *both branches
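A minimal sketch of the branch-threading idea above, written in C++ to match the nv50_ir codegen. It works on a hypothetical toy CFG (`Block`, `resolveTarget` and `threadBranches` are illustrative names, not nv50_ir APIs) and assumes each block ends in at most one branch, with the fall-through successor of a conditional branch stored explicitly:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy CFG for illustration only; these are not the nv50_ir classes.
struct Block {
  int numNonFlowInsns = 0;   // instructions other than the terminating branch
  bool conditional = false;  // true for a predicated "(not) $p bra"
  int taken = -1;            // branch target, or -1 for a block that exits
  int fallthrough = -1;      // successor when a conditional branch is not taken
};

// Follow a chain of blocks that hold nothing but an unconditional "bra".
// The visited set stops on empty infinite loops such as "BB:7: bra BB:7".
int resolveTarget(const std::vector<Block>& cfg, int id) {
  std::unordered_set<int> visited;
  while (id >= 0 && !cfg[id].conditional && cfg[id].numNonFlowInsns == 0 &&
         cfg[id].taken >= 0 && visited.insert(id).second) {
    id = cfg[id].taken;
  }
  return id;
}

// Point every edge past such trampoline blocks. A conditional branch whose
// two resolved successors coincide no longer needs its predicate, so it is
// turned into a plain unconditional branch.
void threadBranches(std::vector<Block>& cfg) {
  for (Block& b : cfg) {
    if (b.taken >= 0) b.taken = resolveTarget(cfg, b.taken);
    if (b.conditional) {
      b.fallthrough = resolveTarget(cfg, b.fallthrough);
      if (b.taken == b.fallthrough) {
        b.conditional = false;
        b.fallthrough = -1;
      }
    }
  }
}
```

For a block `BB:0` that branches conditionally to `BB:1` and falls through to `BB:2`, where both `BB:1` and `BB:2` hold only `bra BB:3`, `threadBranches` turns `BB:0` into an unconditional branch to `BB:3`. The trampolines become unreachable and would still need a separate dead-block cleanup.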
03:07 orbea: could this be a nouveau / mplayer issue? http://dpaste.com/2T68MZK With hardware decoding, after a while it plays sound but no video. Software decoding works, and mpv won't even try to use hardware decoding even if told to...
07:37 karolherbst: RSpliet: yes, I have an ina3221
07:48 mupuf: karolherbst1: reading your patches quickly while preparing for work, I am very impressed by how much the whole thing improved!
07:48 karolherbst: yeah
07:49 karolherbst: I never thought it would be that much in the end though...
07:49 mupuf: there is one last thing that I am a little worried about: setting the default boost value to 1. Under a high load, the power consumption may be too high ... and we do not downclock at this time
07:49 mupuf: yeah, the rabbit hole was really deep
07:49 karolherbst: mhh
07:49 mupuf: and until all the pieces were together, nothing could be enabled
07:49 karolherbst: boost 1 is still far below the limits though
07:50 karolherbst: also
07:50 karolherbst: there are GPUs with no boost levels at all
07:50 mupuf: sure, but it is not a clock that is guaranteed to be available
07:50 karolherbst: it kind of is though
07:50 mupuf: well, let's wait until we receive this titan :D
07:50 karolherbst: the boost clock is the clock that should be reached under normal circumstances at any time
07:50 mupuf: if it is OK there, then we may land the thing
07:51 mupuf: otherwise, well, we can always add the power daemon code :)
07:51 karolherbst: and only in rare cases the clock should ever drop below it
07:51 mupuf: right
07:51 karolherbst: yeah, it shouldn't be that hard now
07:51 karolherbst: though we have to RE the power_budget thing right
07:51 karolherbst: and there was a bit I just didn't find
07:51 mupuf: yep!
07:51 mupuf: true
07:51 mupuf: but we should land your code before all of this
07:52 mupuf: I was just saying that being more conservative (boost=0) would be safer
07:52 karolherbst: yeah
07:52 karolherbst: that's usually no problem
07:52 karolherbst: the user can always change it
07:52 mupuf: well, it is not like we have a ton of cases where we checked the power usage
07:52 mupuf: so...
07:53 karolherbst: well at least on the temperature side we are safe
07:53 mupuf: the titan will be the perfect platform for testing
07:53 karolherbst: mhhhh
07:53 karolherbst: I am not quite sure about it, because of its immense power supply
07:53 karolherbst: but there is another thing
07:53 karolherbst: if we are able to fake the current temperature
07:53 karolherbst: maybe
07:53 karolherbst: ...
07:54 karolherbst: well no, how could that work
07:54 mupuf: yep :D
07:54 mupuf: I wish, but no
07:54 karolherbst: right
07:54 mupuf: well, we can bit bang stuff ourselves, but that would be super hard
07:54 mupuf: and error prone
07:54 karolherbst: mhhh
07:55 mupuf: either we change the power budget, or we increase the values of the shunt resistors
07:55 karolherbst: at least the highest power consumption I ever had was with gputest_furmark
07:55 mupuf: or .... we use a power beast
07:55 mupuf: which is highly capable of exceeding its budget
07:55 karolherbst: with it I came close to 70W
07:55 mupuf: and the titan fits this
07:55 karolherbst: and my budgets are like 80W (and the official TDP is 75W)
07:56 mupuf: that is just one sample point ;)
07:56 karolherbst: right
07:56 mupuf: have to go though
07:56 mupuf: I would say, let's lean on the safe side
07:56 mupuf: and wait for us to enable the power capping feature before setting it to 1 or 2
07:57 karolherbst: yeah, sounds okay
07:57 mupuf: after the power capping, we can start thinking about ... DVFS!
07:57 karolherbst: it isn't like the user can't change it at runtime now
07:57 mupuf: yep!
07:57 karolherbst: isn't dvfs like a maxwell thing?
07:57 mupuf: be safe, but let the user be stupid
07:57 mupuf: ah ah, yeah, no
07:57 mupuf: it means dynamic voltage/frequency scaling
07:58 karolherbst: yeah well
07:58 mupuf: it is dynamic reclocking, what you already sort of have a prototype for
07:58 karolherbst: ahhh you mean like real dynamic reclocking
07:58 mupuf: yep
07:58 mupuf: research name = DVFS
07:58 mupuf: get used to it :p
07:58 karolherbst: k
07:59 karolherbst: daemon32 has some weird fan issues by the way
07:59 mupuf: yeah, I need to have a look at it again...
07:59 mupuf: I have a card which also is fucked up
07:59 karolherbst: I am sure that unknown bit means use the FAN_MGMT table
07:59 mupuf: you need to set the div to very high values
08:00 mupuf: but the duty should never go above 0x10
08:00 mupuf: it is really weird
08:00 karolherbst: odd
08:00 mupuf: yep...
08:00 mupuf: see you!
08:00 karolherbst: bye
08:01 mupuf:wonders where the heck this titan is. It arrived in Helsinki on Sunday morning and still no trace of it!
08:01 karolherbst: well
08:02 karolherbst: they keep it for themselves
08:02 karolherbst: oh no, now you told everybody :D
08:26 marisn: I get occasional failures of kwin with following message: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [kwin_x11[3337]] subc 0 mthd 0060 data beef0201
08:26 marisn: is it plasma5/kwin issue or should I bark on nouveau?
08:51 karolherbst: RSpliet: do you know if the Tesla code can handle 100MHz memory clocks?
09:24 martm: karolherbst: seems like you handled, and continuously handle, your part pretty well; basically, if it is needed, don't be scared of this mt+scheduling to get there, we could quite easily make it, i'd have to shift that because resources are occupied
09:26 martm: karolherbst: my primary focus wasn't the reclocking stuff, it could have been but.. anyways i believed in your activity
09:30 martm: karolherbst: the documents are almost readable too; reordering the shader is not needed: if you do the correct stream with pushbufs/pfifo and pgraph is fed correctly, it reorders everything according to that
09:31 martm: both methods are in the list now; the things i could handle have grown to 3 entries in trello
09:34 martm: karolherbst: i wasted all my time at the moment filtering different codebases; those three entries will be a success, we may basically celebrate already
09:38 martm: karolherbst: i am just thinking: when we add a branch per instruction, i.e. without reordering the instructions in the scheduler, adding masks per instruction in the handler, could that bust the lifo stack in hw?
09:51 martm: i think it's not a problem, probably there is no such concept as busting the lifo, as it would just keep all the recent branches if possible
09:52 martm: in that sense you can alternate between taking the most recent previous branch and not taking it leaving it certainly in the recent spot in the lifo
09:52 martm: and moving from bottom to top that way
09:54 martm: yeah cbranch can probably take -offsets too;)
09:54 martm: i saw it used somewhere
10:00 martm: but if it does not, i will have a solution for this too, as i can switch two instructions in the cache because i process them in the handler anyways
10:01 RSpliet: karolherbst: I've seen many 135MHz clocks
10:01 RSpliet: haven't seen a BIOS with 100MHz clocks, but they must be out there
10:02 karolherbst: RSpliet: right, but I mean 100MHz :D
10:02 karolherbst: RSpliet: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-nvidia-linux-nouveau/865809-nouveau-boost-patches-show-much-performance-potential?p=866017#post866017
10:02 RSpliet: it'd have to be run using a PLL, but apart from that I see no big problems
10:04 RSpliet: "Could not calculate MR" is a dead give-away
10:04 RSpliet: probably it has some weird low CL or CWL that's not in the translation table
10:05 RSpliet: get a trace and a VBIOS and it'd be sorted in no time
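RSpliet's guess can be illustrated with a small lookup sketch. The table below is hypothetical: its CAS latencies and mode-register (MR) bit values are placeholders, not the real encodings for any memory type, and `encodeCl` is an illustrative name. It only shows how a VBIOS latency missing from a translation table makes the MR calculation fail:

```cpp
#include <cassert>
#include <optional>

// Hypothetical CAS-latency -> MR field table. The real encodings depend on
// the memory type and live in nouveau's per-memory-type code; these entries
// are placeholders for illustration only.
struct ClEncoding {
  int cl;
  unsigned mrBits;
};
constexpr ClEncoding kClTable[] = {{5, 0x1}, {6, 0x2}, {7, 0x3}, {8, 0x4}};

// Returns the MR field for a CAS latency, or std::nullopt when the VBIOS asks
// for a latency the table does not cover, which is the situation suspected
// behind "Could not calculate MR".
std::optional<unsigned> encodeCl(int cl) {
  for (const ClEncoding& e : kClTable)
    if (e.cl == cl) return e.mrBits;
  return std::nullopt;
}
```

With these placeholder entries, `encodeCl(6)` yields `0x2`, while an unusually low latency such as 3 yields no value and the caller would have to bail out.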
10:29 martm: karolherbst: anyways i don't think there are bigger problems, i have queued up the work to save my life at some point, and i will do that work, the scheduler is very primitive, and for mt if you want to do atomics in c++ or c code, there are very many tools, but i also added the mfci way which does it in LLVM ir automatically
10:29 RSpliet: karolherbst: also, check your mail
10:29 RSpliet: there's a shiny VBIOS in there
10:29 martm: then to fix the stuff we'd need to adjust the Makefiles in nouveaus directories
10:29 martm: adding a pass to the linker there that of mfci
10:29 karolherbst: RSpliet: thanks
10:30 martm: if we have an issue in the drm tree or kernel instead of nouveau, we compile the kernel with clang and add the pass there
10:31 martm: looking at the code, i really would expect that warthunder no longer crashes then
10:31 karolherbst: imirkin: meh.. ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp:2856: bool nv50_ir::FlatteningPass::tryPredicateConditional(nv50_ir::BasicBlock*): Assertion `pred' failed.
10:34 martm: mcfi sorry
10:34 karolherbst: imirkin: uhh.. delete_instruction doesn't rewire bb->exit in SSA=
10:34 karolherbst: ?
10:34 martm: modular control flow integrity, there were several others but that is what i remember
10:35 karolherbst: ohh wait, I have to do something else
10:38 martm: RSpliet: yeah, it isn't like sprinkling a couple of mutexes, but a whole lot of them; still, it can be done automatically instead, though passing them using the runtime trace, which shows exactly where to place them, and fixing up the code without an llvm module pass is also possible
10:41 martm: i have not really hit any gpu lockups , showing that all lockups are on cpu, but if there is any gpu lockup, then this is pgraph filling fault according to the docs in code
10:41 martm: pase it page 167 in envytools right, where it was described quite okish
10:44 karolherbst: imirkin: any idea why the FlatteningPass gets upset by this? https://gist.github.com/karolherbst/c32accf2090d7bb6879c544294756143
10:44 karolherbst: BB:0 has the issue somewhere
10:45 martm: karolherbst: that perfectly makes sense that low-end cards perform better without the scheduler when comparing with nvidia blob which has the scheduling allready done
10:46 karolherbst: imirkin: ohh I see it now, it wants to remove that joinat
10:47 martm: karolherbst: let's say at some point i throw you an approximately 300-line patch and it's a knockout for nvidia's driver entirely
10:47 martm: this is how few lines we need to make the scheduler
10:47 martm: approx
10:49 martm: then when a DX driver is added it will be a knockdown for nvidia and microsoft too
10:52 martm: i basically would have gone for knocking down intel too, but they were smart to acquire altera
10:58 karolherbst: imirkin: well I removed the Predicate on the bra instruction and then FlatteningPass ends up doing this: https://gist.github.com/karolherbst/c32accf2090d7bb6879c544294756143
10:58 karolherbst: ...
10:58 karolherbst: joinat, bra, join, join
10:59 karolherbst: RSpliet: this was a 780 or 780ti?
10:59 karolherbst: ahh 780 I think
11:13 karolherbst: RSpliet: 862.5 and 901.5 are the base/boost clocks
11:13 karolherbst: but we clock those cards differently
11:16 martm: if we get some deadlocks on gpu, i gotta concentrate on the docs and code further more, but i need to run now..
11:16 martm: i gotta/then i gotta
11:17 martm: cheers, bye.
12:13 RSpliet: karolherbst: 780, I think 902MHz is supposed to be the base clock
12:15 RSpliet: actually it appears to be higher
12:16 karolherbst: why does "bra BB:129" -> "join BB:129" make sense?
12:16 karolherbst: RSpliet: mhh odd
12:18 RSpliet: not odd at all, the VBIOS contains a core freq of 1150MHz, which it's not even reaching with nouveau despite NvBoost=2 set
12:19 karolherbst: the vbios also contains 1202MHz
12:19 karolherbst: so what?
12:19 RSpliet: no I'm talking about the PM table
12:19 RSpliet: which is supposed to correspond with the base clock
12:19 karolherbst: PM_Mode?
12:20 karolherbst: no, this has nothing to do with the boosting stuff
12:20 karolherbst: those clocks are just used to build the actual cstate clocks
12:20 karolherbst: according to the boost_table
12:21 karolherbst: forget the PM_Mode table here, it won't help
12:21 karolherbst: and doesn't matter that much actually
12:21 karolherbst: the CSTEP table contains all GPC clocks available
12:21 vita_cell2: karolherbst
12:21 karolherbst: and the BASECLOCK table corresponds to one of them
12:21 vita_cell2: http://hastebin.com/nokicisege.sm
12:21 karolherbst: everything else is unimportant
12:21 vita_cell2: this is normal?
12:22 RSpliet: karolherbst: and the cstep table goes pretty darn high
12:22 karolherbst: RSpliet: right
12:22 karolherbst: but look here:
12:22 karolherbst: https://gist.github.com/karolherbst/2d562513a780c88810fa68d963a7ff80
12:22 karolherbst: can your GPU reach voltage 67 at all?
12:23 karolherbst: your voltage range: -- Mode GPIO (header-generated), Base voltage 1212500 µV, voltage step -12500 µV, acceptable range [825000, 1212500] µV --
12:23 karolherbst: no, so the cstate is dropped
12:23 karolherbst: continue this for the lower ones until you get one that is settable
12:23 RSpliet: that should determine the max boost clock amirite?
12:24 karolherbst: no
12:24 karolherbst: :D
12:24 karolherbst: there is more
12:24 karolherbst: the first step is: drop all cstates whose volt.min entries are above your max_uv
12:24 karolherbst: so cstate 41 is the highest one which _could_ theoretically be set
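A minimal sketch of this first filtering step, assuming a simplified cstate record (`Cstate`, `dropUnreachable` and its field names are hypothetical) and using the regulator's upper bound as `max_uv`, e.g. the 1212500 µV from the range quoted above:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified cstate: an id, its GPC clock and the minimum voltage it needs.
// Field names are illustrative, not nouveau's structs.
struct Cstate {
  int id;
  int freqMHz;
  int minUv;
};

// Drop every cstate whose minimum voltage exceeds what the regulator can
// deliver. Such states can never be reached, so the later
// temperature-dependent checks do not need to look at them.
void dropUnreachable(std::vector<Cstate>& cstates, int regulatorMaxUv) {
  cstates.erase(std::remove_if(cstates.begin(), cstates.end(),
                               [&](const Cstate& c) {
                                 return c.minUv > regulatorMaxUv;
                               }),
                cstates.end());
}
```

The remaining list still has to go through the temperature-dependent limits described next; this step only removes the states that are impossible under any conditions.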
12:24 karolherbst: but then there is more
12:25 karolherbst: 0,1,2: are max entries
12:25 karolherbst: software wise
12:25 karolherbst: and if you look closely: volt = 1261332 + (-96 * T * 5^6) >> 10
12:25 karolherbst: the max voltage depends on the current T (temperature)
12:25 RSpliet: *squints eyes* right
12:25 karolherbst: and can be between 1137500 and 1162500
12:25 karolherbst: and so on
12:26 karolherbst: now you calculate the voltages for each cstate below 41, until you find a good one :D
12:26 karolherbst: I just drop the higher ones, because they can never be reached, and that saves us some CPU cycles
12:27 RSpliet: where T is in degrees Celsius
12:27 RSpliet: I presume
12:27 karolherbst: right
12:27 karolherbst: and S is your GPU speedo value :D
12:27 karolherbst: read out like this: https://github.com/karolherbst/nouveau/commit/8cdfb3cc8a81b2da4592d3bb86549d32cb79def3
12:28 RSpliet: what's the order of precedence? first shift, then addition?
12:29 karolherbst: right
12:29 karolherbst: and the values are always clamped by the min/max ones
12:29 karolherbst: and then the links are summed up
12:30 karolherbst: ...
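The evaluation order described above (shift, then add, then clamp each entry to its min/max, then sum linked entries) might look like the sketch below. The coefficient layout is an assumption: the log does not spell out the full per-entry formula (it also involves the speedo value S), so the hypothetical `VmapEntry` models only a single temperature term, and all names and numbers are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical voltage-map entry: base + ((coef * T) >> shift), clamped to
// [minUv, maxUv], plus the value of an optional linked entry.
struct VmapEntry {
  std::int64_t base;
  std::int64_t coef;
  int shift;
  std::int64_t minUv;
  std::int64_t maxUv;
  int link;  // index of a linked entry, or -1
};

std::int64_t evalEntry(const std::vector<VmapEntry>& map, int idx, int tempC,
                       int depth = 0) {
  assert(depth < 16 && "link chain too long or cyclic");
  const VmapEntry& e = map[idx];
  // Shift first, then add, as confirmed in the discussion. An arithmetic
  // shift of negative values is assumed (guaranteed since C++20).
  std::int64_t v = e.base + ((e.coef * tempC) >> e.shift);
  v = std::clamp(v, e.minUv, e.maxUv);  // clamp each entry on its own
  if (e.link >= 0) v += evalEntry(map, e.link, tempC, depth + 1);
  return v;
}
```

Clamping before summing means a linked entry can push the total past the first entry's own bounds, which matches the order described in the log, not a single clamp at the end.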
12:30 RSpliet: that implies that upon a temperature change you'd want to change the clock as well
12:30 karolherbst: right
12:30 karolherbst: which is done
12:31 karolherbst: here: https://github.com/karolherbst/nouveau/commit/0538947bd176c9a1837532af7c9e17dd5ed0cdf5
12:31 RSpliet: sounds as if you'd want to statically create a mapping temperature -> max_voltage upon boot
12:31 RSpliet: (determine for each voltage the max temperature)
12:31 karolherbst: yeah well
12:31 karolherbst: I could, but then I would have a map with like 100 entries
12:31 karolherbst: and would have to lookup
12:31 karolherbst: mhh
12:31 karolherbst: could be a static array though
12:33 RSpliet: don't know how you'd want to structure that quite yet - but doing the arithmetic over and over again sounds wasteful
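A sketch of the boot-time table RSpliet suggests, covering only the global max voltage as karolherbst proposes, so the reclocking path becomes a lookup instead of repeated arithmetic. The 0 to 127 °C range, the callback standing in for the real vmap evaluation, and the class name are assumptions:

```cpp
#include <array>
#include <cassert>
#include <functional>

constexpr int kMaxTempC = 127;  // assumed upper bound of the sensor range

// Evaluates the temperature-dependent max voltage once per degree at boot.
class MaxVoltageTable {
 public:
  explicit MaxVoltageTable(const std::function<long(int)>& maxUvAt) {
    for (int t = 0; t <= kMaxTempC; ++t) table_[t] = maxUvAt(t);
  }

  long lookup(int tempC) const {
    // Clamp out-of-range readings instead of indexing out of bounds.
    if (tempC < 0) tempC = 0;
    if (tempC > kMaxTempC) tempC = kMaxTempC;
    return table_[tempC];
  }

 private:
  std::array<long, kMaxTempC + 1> table_{};
};
```

At 128 `long` values this stays small; the open question in the log is whether to do the same for every map entry, not just the global maximum.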
12:33 karolherbst: I know
12:33 karolherbst: but I don't want to save an array with 100 entries for each map entry
12:33 karolherbst: I could have a map for the global max voltage, that might be fine
12:33 RSpliet: where each map entry means three entries?
12:34 karolherbst: three?
12:34 RSpliet: how many "map entries" do you have?
12:34 karolherbst: well you have 89 and 28 empty ones
12:34 karolherbst: ohh 38 empty actually
12:35 karolherbst: so it is more like 128 map entries in total usually
12:35 RSpliet: eh, 100 entries per voltage ID? no, a simple temperature -> max voltage ID would suffice
12:35 karolherbst: usually there is 2-3 max entries, one for each Cstate, one for each Pstate
12:35 karolherbst: and than some random linked ones
12:35 karolherbst: *are
12:36 karolherbst: RSpliet: the IDs aren't ordered
12:37 RSpliet: there still is a maximum, and when picking your core speed it's the only one that matters to the best of my knowledge
12:38 karolherbst: yeah, but the painful part is, we have like 128 map entries. And each of them could be temperature dependent
12:38 karolherbst: and then we have those max entries, which could also be temperature dependent
12:38 RSpliet: for every temperature point you can pick the best one, all the other variables are constant
12:39 karolherbst: there is no "best", because there is no order
12:39 RSpliet: nonsense
12:39 karolherbst: but there is no order
12:39 RSpliet: you can pick the best out of an unordered list
12:40 karolherbst: well I could iterate over the cstates at boot time, check up to which temperature they are valid and add a max_t field to them
12:40 karolherbst: but looking at the vmap entries is the wrong approach
12:41 karolherbst: but then you end up calculating the voltage of all cstates, for each temperature
12:42 RSpliet: why? when your problem is "pick a boost core frequency" your steps seem to be "read temperature -> find corresponding max voltage -> pick max cstep with that voltage"
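RSpliet's steps, combined with karolherbst's point that cstep voltages are themselves temperature dependent, can be sketched as a linear scan. Because it takes the best candidate from an unordered list, it does not rely on any ordering of the table. `Cstep`, `pickCstep` and the per-cstep voltage callbacks are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// A cstep with its clock and a callback giving its required voltage (µV)
// at a given temperature.
struct Cstep {
  int freqMHz;
  std::function<long(int)> voltageUvAt;
};

// Picks the fastest cstep whose voltage at the current temperature fits under
// the current max voltage. Returns the index, or nothing if none fits.
std::optional<int> pickCstep(const std::vector<Cstep>& steps, int tempC,
                             long maxUv) {
  std::optional<int> best;
  for (int i = 0; i < static_cast<int>(steps.size()); ++i) {
    if (steps[i].voltageUvAt(tempC) > maxUv) continue;
    if (!best || steps[i].freqMHz > steps[*best].freqMHz) best = i;
  }
  return best;
}
```

The `maxUv` argument would come from the temperature-dependent max voltage. The scan costs one voltage evaluation per cstep, which is the per-temperature work the discussion is trying to avoid.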
12:43 karolherbst: no
12:43 karolherbst: the csteps have different voltages depending on the temperature too
12:43 RSpliet: have you seen cstep tables where the frequency -> voltage mapping is not monotonically increasing?
12:44 karolherbst: yes
12:44 RSpliet: or monotonically decreasing - for all Csteps valid in a P-state
12:44 karolherbst: yes
12:44 RSpliet: mind pasting an example?
12:44 karolherbst: well it is true in general, but there are some funky examples
12:45 RSpliet: are you sure? not frequency->voltage ID but frequency->voltage?
12:48 RSpliet: anyway, I need to move on. give this some serious thought, because I can't imagine there isn't a monotonic (but possibly non-linear) relationship between temperature and max-voltage, nor between voltage and max-frequency with that voltage
12:48 karolherbst: didn't check in detail, I just know that some cstep tables have stuff like this: 810MHz, 540 MHz, 628 MHz -> increasing
12:49 karolherbst: it could be in the end that the first entry is some kind of special entry
12:49 karolherbst: and then it is monotonically increasing in the end if we ignore the first one
12:49 karolherbst: but without having proper knowledge there, I wouldn't depend on it
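Before relying on ordering, a cstep table can be sanity-checked. This hypothetical helper tests whether a frequency list is strictly increasing, optionally skipping leading entries such as the possibly special first one:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if freqs is strictly increasing after skipping the first `skip`
// entries.
bool isIncreasing(const std::vector<int>& freqs, std::size_t skip = 0) {
  for (std::size_t i = skip + 1; i < freqs.size(); ++i)
    if (freqs[i] <= freqs[i - 1]) return false;
  return true;
}
```

With the example above, `isIncreasing({810, 540, 628})` is false, while `isIncreasing({810, 540, 628}, 1)` is true.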
12:49 RSpliet: I believe the boost table lists min and max csteps right? could well be that for lower pstates they added some convenience entries
12:49 karolherbst: usually not
12:51 karolherbst: like in my vbios the min values are far below the lowest cstep
12:51 karolherbst: for every pstate
12:51 RSpliet: anyway, physically I can't think of a reason those relations aren't monotonic, so I think there must be something special about those first few (random fact: non-PLL?) entries that you mentioned
12:52 RSpliet: it's probably worth figuring out, as it may make your life easier ;-)
12:52 karolherbst: yeah, maybe
12:52 karolherbst: but nvidia also builds its own cstates
12:52 karolherbst: but usually you could save the max temperature of a cstate
12:52 karolherbst: this would be more future proof than to depend on the relation between those entries
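karolherbst's alternative, storing a max temperature per cstate at boot, could be sketched as follows. It scans an assumed 0 to 127 °C range and records the last temperature before the first failure, so the result is conservative even if validity turned out not to be monotonic in temperature. The function name and callbacks are hypothetical:

```cpp
#include <cassert>
#include <functional>

// Highest temperature T such that the cstate is valid for every temperature
// from 0 up to T, or -1 if it is not valid even at 0 °C.
int maxValidTempC(const std::function<long(int)>& cstateUvAt,
                  const std::function<long(int)>& maxUvAt, int limitC = 127) {
  int last = -1;
  for (int t = 0; t <= limitC; ++t) {
    if (cstateUvAt(t) > maxUvAt(t)) break;  // first failing temperature
    last = t;
  }
  return last;
}
```

At runtime a cstate would then be usable whenever the current temperature is at most its stored value, without recomputing any voltages.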
13:56 martm: karolherbst: the first pseudo code was meant for an implementation of like 1000-2000 lines, but the current technique is so easy that this work isn't very hard, but.. actually i need to read up on gpu race condition possibilities for a couple more days
13:59 martm: last time there was an open question in my understanding of how the occupancy of those 127 pfifo channels is tested: can a fault currently happen where, on a cache miss, one context spurts in front of the other? i very highly doubt it, i.e. i think if there is a deadlock on the gpu, which is probably extremely rare
13:59 karolherbst: yay
13:59 karolherbst: total instructions in shared programs : 2425258 -> 2404958 (-0.84%) total gprs used in shared programs : 337596 -> 336467 (-0.33%)
14:00 martm: on nouveau, then it's because of shader not other context bits, and in the shader basically there is impossible to get a deadlock after the scheduler is merged
14:02 karolherbst: vita_cell2: mhh it shouldn't error if you have my current branch though
14:02 vita_cell2: I only have 4.4, nothing more, no patches
14:02 vita_cell2: 0a, 0e, 0f working fine
14:02 karolherbst: mhhh
14:02 karolherbst: seems like it doesn't
14:03 karolherbst: did you check that the core clock actually changed?
14:03 vita_cell2: wait
14:04 orbea: karolherbst: you mean like this? 4.4.7 here http://dpaste.com/2X88Y20
14:04 martm: so my diagnosis is that when a little bit of code is added, the cpu race is fixed and an easy sched is added, then the driver will work like gold
14:04 karolherbst: orbea: right
14:04 karolherbst: that happens with stock nouveau
14:04 vita_cell2: 0a: 0a: core 405-1032 MHz memory 1620 MHz AC DC *
14:04 vita_cell2: AC: core 1032 MHz memory 1620 MHz
14:05 orbea: were there some patches for that or ideas when it would be fixed? I remember reading something about that, but forgot to ask more
14:05 martm: i do not bother with the docs, because if the threads are interleaved correctly on the cpu side, it is impossible to see how gpu could deadlock
14:05 karolherbst: vita_cell2: and 0f?
14:05 martm: i already got enough from the docs to understand that then it is no longer possible to screw up in hw
14:06 vita_cell2: wait
14:06 karolherbst: orbea: maybe it comes with 4.7, maybe 4.8
14:06 vita_cell2: karolherbst
14:06 vita_cell2: http://hastebin.com/umovepedoh.mel
14:06 karolherbst: vita_cell2: right
14:06 karolherbst: yeah, stock nouveau does this
14:06 vita_cell2: I think that it reclocks the right way
14:07 martm: karolherbst: currently the biggest work is to get per-instruction data which would take some time to make the shader with all the instructions
14:07 karolherbst: yeah well, but it still fails to clock the engines on 0e, 0f
14:07 karolherbst: vita_cell2: so to get proper clocks you have to go to 0a first
14:07 karolherbst: vita_cell2: but in any case, this will be most likely pretty unstable
14:08 vita_cell2: I know: first the 0a command, after that 0e and 0f
14:08 vita_cell2: to downclock, it's 07
14:10 martm: ouh still, let's say there is competition on the bus, or a cache miss in pfifo, for the first context which was launched before, but..second context did not face a miss in pfifo
14:10 martm: could the second one be added first in the pgraph to be executed?
14:13 martm: imo it just wouldn't be sane to allow this, that would be poor design, because this pfifo order in pgraph is a fifo structure
14:18 martm: or let's say that with so many contexts it is the programmer's responsibility to really make sure of the order, by making use of fence streams
14:18 martm: i.e. sync objects in opengl
14:21 martm: for example, second context is not launched before first context has signalled a fence, only opengl programmer should be able to do that
14:22 martm: then we can be sure that the fifo has been filled in the correct order for pgraph
14:32 karolherbst: imirkin: when I am in the SSA form and I look at bb->getExit(). How do I know which block is executed after the exit instruction?
14:32 imirkin: karolherbst: you look at the cfg
14:32 karolherbst: like when I have "not %p14401 bra BB:16", then if no %p14401 the execution jumps to BB:16, but where otherwise?
14:33 karolherbst: I tried the cfg
14:33 imirkin: look at the outgoing edges
14:33 imirkin: they will tell you where all it can go
14:34 karolherbst: bb->cfg?
14:35 imirkin: yes
14:36 imirkin: bb->cfg should be that bb's node in the cfg
14:36 karolherbst: mhh bb->cfg.out is the "next" Block then?
14:36 imirkin: use the edge iterators
14:36 imirkin: rather than trying to do it by hand
14:36 karolherbst: yeah I know
14:36 karolherbst: I just want to understand the stuff first
14:37 karolherbst: so bb->cfg.outgoing(false)
14:37 karolherbst: mhh and there might be several of them then
14:37 imirkin: right
14:38 karolherbst: and how do I know which is the one which would be taken after the last instruction?
14:38 imirkin: ... you solve the turing termination problem?
14:39 karolherbst: I meant for a non flow instruction
14:39 karolherbst: or can that still lead to different ones?
14:39 imirkin: it should always end in a flow instruction
14:40 imirkin: or multiple flow instructions
14:40 karolherbst: ok, here is what I plan to do: if both directions of a flow instruction end up at the same BB, then I can remove it
14:40 karolherbst: and with end up I mean, if there are only stupid "bra" thingies
14:41 imirkin: you mean if all of the exit instructions point at the same bb
14:41 imirkin: then nuke them?
14:41 imirkin: that makes sense
14:41 karolherbst: right
14:41 imirkin: (or rather, nuke the duplicates, remove predicates, etc)
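imirkin's rule can be sketched on a toy representation of a block's exit flow instructions (`FlowInsn` and `collapseExits` are hypothetical, not nv50_ir types). The implicit fall-through edge, which the CFG lists as an outgoing edge without a `bra`, is modelled here as an explicit unpredicated entry:

```cpp
#include <cassert>
#include <vector>

// Toy exit flow instruction: an optionally predicated branch to `target`.
struct FlowInsn {
  bool predicated;
  int target;
};

// If every exit of a block leads to the same block, keep a single
// unpredicated branch and drop the duplicates. Returns true if it changed
// anything.
bool collapseExits(std::vector<FlowInsn>& exits) {
  if (exits.empty()) return false;
  for (const FlowInsn& f : exits)
    if (f.target != exits.front().target) return false;
  if (exits.size() == 1 && !exits.front().predicated) return false;
  const int target = exits.front().target;
  exits.assign(1, FlowInsn{false, target});
  return true;
}
```

A pair such as `not %p447 bra BB:4` plus a fall-through into `BB:4` collapses to one plain branch; exits to `BB:3` and `BB:2`, as in the pasted CFG, are left alone.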
14:41 karolherbst: like here: https://gist.github.com/karolherbst/c32accf2090d7bb6879c544294756143
14:41 karolherbst: ohh wait
14:41 karolherbst: that's the result
14:41 karolherbst: like that: https://gist.github.com/karolherbst/731f56fbda8619a986e963527736f30e
14:41 imirkin: like where?
14:42 imirkin: the second paste is post-RA
14:42 imirkin: aka irrelevant
14:42 imirkin: the first paste seems like it wouldn't hit your proposed rule.
14:42 karolherbst: wait a second
14:42 karolherbst: https://gist.github.com/karolherbst/36c3b67d237c24c209be0490aab81a20
14:43 karolherbst: 29: not %p447 bra BB:3 (0)
14:43 karolherbst: if the not, and the not not branch would both end up in BB:4, I can nuke this bra
14:43 imirkin: i guess there's an implicit branch to BB:2 at the end
14:43 karolherbst: nope
14:43 karolherbst: ohh implicit
14:43 karolherbst: yeah don't know
14:44 karolherbst: but I want to nuke the bra instruction
14:44 imirkin: look at the CFG
14:44 imirkin: it says -> BB:3, -> BB:2
14:44 imirkin: so... yeah.
14:44 imirkin: look, i understand what you want to do
14:44 imirkin: but you have to be careful with this stuff
14:44 karolherbst: but can I simply nuke the bra or do I have to remove the predicate?
14:44 imirkin: make sure you understand wtf is going on
14:44 karolherbst: yeah, I had already fun with it ...
14:45 karolherbst: but the TGSI is the problem here anyway
14:45 imirkin: you have to detect very specific situations
14:45 imirkin: and act upon them
14:45 imirkin: rather than just try random things
14:45 karolherbst: 9: UIF TEMP[2].xxxx :0 10: ELSE :0 11: ENDIF
14:45 imirkin: so think first... what is the very special situation you're trying to detect?
14:45 karolherbst: this one^
14:45 karolherbst: empty branches
14:46 imirkin: so ideally you could be a little clever
14:46 imirkin: an empty bb doesn't help anyone
14:46 karolherbst: right
14:46 imirkin: so if all there is in a bb is a single flow instruction
14:46 imirkin: AND that bb only has a single incoming edge
14:46 karolherbst: I was already wondering if this could be removed somewhere in the TGSI?
14:46 karolherbst: because nobody wants empty BBs anyway I guess
14:46 karolherbst: *branches
14:46 imirkin: then you can just switch the edges around and no one will be the wiser
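A sketch of that edge switch on a toy CFG with explicit predecessor and successor lists (`Node` and `bypassTrampoline` are hypothetical names). The log does not state why a single incoming edge is required; in this toy it simply keeps the rewrite local to one predecessor:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG node with explicit edge lists (not nv50_ir's Graph).
struct Node {
  int numInsns = 0;  // includes the terminating flow instruction
  std::vector<int> preds, succs;
};

// A block holding exactly one (flow) instruction, one predecessor and one
// successor can be bypassed: point the predecessor straight at the successor.
// If the predecessor already branched to that successor too, it ends up with
// two edges to it, the "all exits point at the same block" case from above.
bool bypassTrampoline(std::vector<Node>& g, int b) {
  Node& n = g[b];
  if (n.numInsns != 1 || n.preds.size() != 1 || n.succs.size() != 1)
    return false;
  const int p = n.preds[0];
  const int s = n.succs[0];
  if (s == b) return false;  // self-loop: nothing to bypass
  std::replace(g[p].succs.begin(), g[p].succs.end(), b, s);
  std::replace(g[s].preds.begin(), g[s].preds.end(), b, p);
  n.preds.clear();  // b is now unreachable
  n.succs.clear();
  return true;
}
```

Afterwards `b` has no edges left and can be removed by whatever pass cleans up unreachable blocks.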
14:47 imirkin: never think about trying to optimize tgsi
14:47 imirkin: it's a waste of time
14:47 karolherbst: yeah well, but the tgsi made this empty branch... I didn't see any empty branches in the glsl stuff
14:47 imirkin: there's absolutely no point to it
14:47 imirkin: doesn't matter
14:48 karolherbst: even for obvious stuff like this?
14:48 imirkin: doesn't matter.
14:48 imirkin: it can happen a million different ways
14:48 imirkin: should handle it properly in the backend anyways.
14:48 karolherbst: okay
14:48 karolherbst: so if done in the TGSI it may only save a few CPU cycles, because the backends have to handle that anyway
14:48 imirkin: thus in fact... waste a few cpu cycles :)
14:49 karolherbst: :D
14:49 karolherbst: okay
14:49 imirkin: although in fact, some stuff in nv50 ir is slower than it should be
14:49 imirkin: when i was trying to do less optimizations on the tgsi, things slowed down a bit =/
14:49 karolherbst: well my idea was to check bra instruction if either way is the same (which includes/are empty branches)
14:49 imirkin: but i also found a bunch of bugs in the process, so it wasn't all a loss.
14:50 martm: karolherbst: so basically this workday was kinda short, there is not much to test, needs to be done, it will work..who gathers the per instruction data?
14:50 karolherbst: imirkin: but I like your idea of doing it, I guess I stick to this then
14:50 karolherbst: but why only one incoming edge?
14:54 martm: karolherbst: may i talk to you, or may i ask what you are currently working on?
14:55 karolherbst: martm: compiler optimizations, which may speed up several shaders by a significant amount
14:57 martm: ok green light from here, by all means, this would not collide with my ideas:) otherwise for scheduling i already have a plan
15:10 karolherbst: imirkin: ohhh, I think I slowly get it
15:12 martm: anyways, i think i am done talking here, don't seem to be important things to discuss, but currently will leave myself logged on
15:18 night199uk: q: are there names for the nvidia chipset revisions (0x907d, 0x917d, 0x927d, etc…)?
15:19 night199uk: i think 0x907d = GF119 directly, but is there a name for the category, if that makes sense?
15:34 karolherbst: imirkin: what does "df = { BB:6 }" mean?
15:41 imirkin_: karolherbst: dominance frontier iirc
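For reference, dominance frontiers like the `df = { BB:6 }` annotation can be computed from immediate dominators with the Cooper, Harvey and Kennedy approach sketched below. This is the textbook algorithm, not necessarily how nv50_ir computes them, and the adjacency-list inputs (with `idom[entry] == entry`) are assumptions:

```cpp
#include <cassert>
#include <set>
#include <vector>

// For every join node b (two or more predecessors), walk up the dominator
// tree from each predecessor until reaching idom(b); every node passed on
// the way has b in its dominance frontier.
std::vector<std::set<int>> dominanceFrontiers(
    const std::vector<std::vector<int>>& preds, const std::vector<int>& idom) {
  std::vector<std::set<int>> df(preds.size());
  for (int b = 0; b < static_cast<int>(preds.size()); ++b) {
    if (preds[b].size() < 2) continue;
    for (int p : preds[b]) {
      for (int runner = p; runner != idom[b]; runner = idom[runner])
        df[runner].insert(b);
    }
  }
  return df;
}
```

For an if/else diamond `0 -> {1, 2} -> 3`, both arms get `df = { 3 }`: the join block is where their dominance ends, which is also where SSA phi nodes are placed.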
15:41 karolherbst: mhh okay
15:44 karolherbst: is there a good way to remove a predicate from an instruction?
15:45 karolherbst: ohh I think I got it
15:45 imirkin_: setPredicate(NULL)?
15:45 karolherbst: yeah
15:45 karolherbst: it requires a condcode though
15:47 karolherbst: mhh something is not right. It looks good in the SSA form, but I miss something, and I don't know what
15:55 karolherbst: imirkin_: what is here wrong? https://gist.github.com/karolherbst/dda83ec5ee8da397e7b25f4a74b757cf
15:55 karolherbst: I really don't see it
15:55 karolherbst: I would say BB:14 goes into BB:15 when the bra branch isn't taken
15:56 karolherbst: but removing this bra introduces an error
15:57 imirkin_: probably the nv50 ir isn't used to seeing such a pattern? dunno
15:57 imirkin_: or perhaps you're doing something funky
15:57 imirkin_: the printer isn't perfect
15:57 karolherbst: maybe something stupid happens in PostRA
15:57 imirkin_: nv50 ir is an in-memory ir, more like nir than like tgsi
15:57 karolherbst: uhhhh
15:58 karolherbst: in PostRA it looks wrong :D
15:58 karolherbst: https://gist.github.com/karolherbst/dda83ec5ee8da397e7b25f4a74b757cf
15:58 karolherbst: bottom half
15:59 karolherbst: there is now this BB:223 block
16:20 karolherbst: stupid steam runtime ...
16:31 karolherbst: uhh starting saints row 2 on the mcp...
16:46 karolherbst: imirkin_: same issue with nvac in saints row 2 :)
16:52 karolherbst: but it looks a bit different
16:52 karolherbst: using twm though
17:27 karolherbst: imirkin: okay, the broken stuff is for sure of type PIPE_RESOURCE_FLAG_MAP_PERSISTENT
17:32 karolherbst: on my nvac I got this: https://gist.github.com/karolherbst/66af89e4b0f0aa5e43fa684b753635f4
17:34 karolherbst: :O :O
17:34 karolherbst: I gixed the persistent stuff
17:35 karolherbst: *fixed
17:35 karolherbst: https://gist.github.com/karolherbst/00c761f87053577f9a8b6e2670fb5853
17:35 karolherbst: but.. why?
17:35 imirkin_: heh. i like your idea of "fixed".
17:36 karolherbst: right
17:36 karolherbst: but it works with that...
17:36 karolherbst: or will this just disable the extension?
17:36 imirkin_: if that's the case, then persistent is _really_ not what they want
17:36 imirkin_: since that will actually break almost every use-case of persistent i can think of
17:37 karolherbst: I'll try with Witcher 2
17:38 karolherbst: ... I just thought, yeah, why not just change that, it is general code and ....
17:39 karolherbst: yeah
17:39 karolherbst: Witcher 2 is also fixed
17:39 imirkin_: although i'm a bit curious why it's getting in that logic...
17:41 imirkin_: ah, i bet i know why it's helping
17:41 imirkin_: give me a minute.
17:42 imirkin_: karolherbst: can you tell me how the coherent buffer is being used?
17:43 karolherbst: I don't think coherent buffers are used, because after I sent them into VRAM, nothing broke. Only after I sent the persistent ones into VRAM was everything messed up
17:44 imirkin_: hm, interesting.
17:44 imirkin_: can you confirm that it's only using persistent and NOT coherent buffers?
17:44 imirkin_: in that case, apitrace should be perfectly fine with it
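The reasoning behind "persistent but not coherent is fine for apitrace" can be sketched as a predicate over the map access bits, read from the GL rules as summarized here. With a non-coherent persistent mapping, the application has to publish its writes with a GL call (glFlushMappedBufferRange, or glMemoryBarrier with GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT), and a call-level tracer can intercept that call. A coherent mapping needs no such call. The bit values match the OpenGL headers. The function name `writes_have_hook_point` is hypothetical, and the logic is a simplification of the GL rules, not apitrace's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* glMapBufferRange/glBufferStorage access bits, with the values from the
 * OpenGL headers, defined locally so no GL headers are needed. */
#define MAP_WRITE_BIT          0x0002u
#define MAP_FLUSH_EXPLICIT_BIT 0x0010u
#define MAP_PERSISTENT_BIT     0x0040u
#define MAP_COHERENT_BIT       0x0080u

/* Returns true if client writes through a mapping with these access bits
 * only become visible to the GL after some call (glUnmapBuffer,
 * glFlushMappedBufferRange or glMemoryBarrier) that a call-level tracer
 * can intercept. */
static bool writes_have_hook_point(unsigned access)
{
   assert(access & MAP_WRITE_BIT);
   if (!(access & MAP_PERSISTENT_BIT))
      return true;   /* non-persistent: visible at unmap or explicit flush */
   /* persistent + coherent: no flush or barrier call is required */
   return !(access & MAP_COHERENT_BIT);
}
```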
17:45 karolherbst: okay
17:45 imirkin_: aha
17:45 imirkin_: karolherbst: copy this code: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nouveau_buffer.c#n550
17:45 imirkin_: karolherbst: into nouveau_buffer_transfer_flush_region
17:46 imirkin_: i.e. the nv->vbo_dirty setting
17:47 karolherbst: the if clause?
17:47 imirkin_: yep
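Below is a minimal sketch of the suggested change. It assumes that a persistently (but not coherently) mapped buffer stays mapped while in use, so only the flush path runs, and that the vertex/index cache invalidation from the linked code is therefore never reached. All types, flags and function names here (`struct ctx`, `struct buffer`, `BIND_VERTEX_BUFFER`, `BIND_INDEX_BUFFER`, `buffer_flush_region`) are hypothetical stand-ins for the Mesa ones (`PIPE_BIND_*`, `nv->vbo_dirty`, `nouveau_buffer_transfer_flush_region`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the gallium bind flags. */
#define BIND_VERTEX_BUFFER (1u << 0)
#define BIND_INDEX_BUFFER  (1u << 1)

/* Hypothetical stand-ins for the driver context and buffer resource. */
struct ctx    { bool vbo_dirty; };
struct buffer { uint32_t bind; };

/* Called for each explicitly flushed range of a mapping. Besides writing
 * the range back (elided here), mark the vertex/index caches dirty when the
 * buffer may be used as a vertex or index buffer, like the linked code. */
static void buffer_flush_region(struct ctx *nv, const struct buffer *buf)
{
   assert(nv && buf);
   /* ... write back the flushed range ... */
   if (buf->bind & (BIND_VERTEX_BUFFER | BIND_INDEX_BUFFER))
      nv->vbo_dirty = true;
}
```

The later experiment of dropping the `bind` check and always setting `vbo_dirty` is the blunter variant of the same idea.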
17:48 karolherbst: after the if in buffer_transfer_flush_region?
17:48 imirkin_: that could be an issue for persistently-but-not-coherently mapped buffers
17:48 imirkin_: anywhere, doesn't matter
17:48 karolherbst: and I revert my false -> true thing?
17:48 imirkin_: ya
17:49 karolherbst: doesn't seem to help
17:50 karolherbst: https://gist.github.com/karolherbst/418e6f4bbf1ef67cdbd77f99eb3b16ed
17:50 imirkin_: can you kill the if (bind & ...) bit?
17:50 imirkin_: i.e. just do nv->vbo_dirty = true
17:51 Yoshimo: imirkin: usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so exists , i just forgot to tell you earlier
17:51 karolherbst: nope, doesn't help either
17:52 imirkin_: hrmph
17:53 karolherbst: Yoshimo: I would try out weston, because there is no glamor involved: if libGL has a problem, we can easily debug it there, and if the GL side is okay, weston will just start
17:53 karolherbst: well libEGL
17:54 Yoshimo: does weston need any other packages installed to work? we could also wait a few days and hope that the dist upgrade to 16.04 fixes it
17:55 karolherbst: wayland
17:55 karolherbst: :D
17:55 karolherbst: oh right you are most likely screwed on your distribution
17:55 karolherbst: Yoshimo: well you could try to start X with LIBGL_DEBUG=verbose set
17:55 karolherbst: maybe it prints something
17:59 Yoshimo: I'll do that on Friday if it still fails
18:16 karolherbst: imirkin_: any other ideas? Or do you want a trace now?
18:17 imirkin_: none of the above :)
18:22 karolherbst: so you know what's going wrong?
18:23 karolherbst: or is it just engine fail?
18:23 imirkin_: no clue
18:23 karolherbst: okay
18:23 imirkin_: if you want to convince them to give me a free key, i might have a look. otherwise this sounds like a pain to investigate.
18:23 karolherbst: even with a trace?
18:29 karolherbst: ohh right mhhh
18:34 Yoshimo: which game imirkin?
18:42 imirkin_: Yoshimo: dunno... one of the virtual-programming ones? the key has to come from them though, i don't want one of you guys to go out and buy the game for me.
18:46 karolherbst: Saints Row 2
18:46 karolherbst: :D
18:46 karolherbst: Yoshimo: but we will ask the VP guys first
18:46 karolherbst: in the end it will be cheaper for them to just give a license to imirkin_
18:47 imirkin_: and this isn't like a promise that i'm going to fix it either
18:48 karolherbst: right
18:48 karolherbst: sad that we can't trace it
19:02 Celti: huh, over half my Steam library has Linux support now
19:36 Yoshimo: nice to hear Celti
22:19 mupuf: karolherbst: /me does not feel like going through the remaining patches now
22:19 mupuf: too tired, and I do not want to miss anything
22:19 mupuf: so... will be left for tomorrow!
22:20 karolherbst: yeah, no worries
22:20 mupuf: if you want to make changes to all the [pc]statei variables, please do :)
22:20 mupuf: let's try to have a clean series by the end of the week so Ben can have a look at it in time for 4.7
22:21 mupuf: well, rc5 is coming closer, so, hard to tell if it will make it in time
22:23 karolherbst: yeah
22:28 karolherbst: mupuf: thanks for reviewing by the way :)
22:32 mupuf: you're welcome
22:37 karolherbst: :)
22:37 karolherbst: CodeXL is building :)
22:37 karolherbst: "Failed to build HSAFdnCommon, 64 bit" no shit...
22:38 karolherbst: nice, can be disabled
23:17 huelter: having random lockups on an NVE4 660 Ti card, similar to a bug on an NVE6 660 (posted a comment on it: https://bugs.freedesktop.org/show_bug.cgi?id=69882). The odd thing is that dmesg fails to log the error and the log gets corrupted (http://pastebin.com/KLeA4Rpn). Using the intel card on my system gives no crashes, so I know the nvidia card is the culprit (on windows it runs fine).
23:17 huelter: mesa is not involved imo
23:18 imirkin_: huelter: what are you doing when this happens?
23:18 huelter: browsing, sometimes idling
23:18 imirkin_: bleh =/
23:19 huelter: totally
23:19 huelter: been trying to use this on each kernel version since 4.1 I guess, but it's still happening
23:19 imirkin_: if it's a new issue between 4.0 and 4.1, perhaps you can bisect?
23:20 imirkin_: should be enough to bisect drivers/gpu/drm/nouveau
23:20 huelter: I'm pretty sure it happened before that as well; maybe this card has had the bug ever since it was supported?
23:20 imirkin_: entirely possible.
23:21 imirkin_: do you have a second computer?
23:21 huelter: yep
23:21 imirkin_: if so, you could try to get logs using netconsole
23:21 imirkin_: depending on just how hard it dies
23:22 huelter: do you think I should make a new bug report?
23:22 imirkin_: definitely
23:23 imirkin_: but collect information first
23:23 huelter: ok
23:23 imirkin_: "i'm having random hangs", while problematic, isn't really debuggable
23:23 karolherbst: imirkin_: I am quite sure something in the GR firmware is faulty (like 75%)
23:23 huelter: yea, also log corruption is not fun
23:23 imirkin_: nor is there any reason to believe that the cause of 1 person's random hangs is the cause of a second person's
23:23 karolherbst: I know
23:24 imirkin_: huelter: why did you mention kernel 4.1? did you just happen to start using nouveau then?
23:24 imirkin_: or did you notice an uptick in hangs?
23:24 huelter: it's the earliest version I remember trying
23:26 karolherbst: ohh xembedsniproxy again...
23:26 huelter: at first I thought it was mesa, but then I learned a bit more about how the stack works
23:28 huelter: what I think is that this card has a firmware peculiarity not seen by driver devs, maybe it was not so popular and nobody bothered to report it yet
23:29 huelter: now with kepler memory reclocking I had hopes this was fixed, but the problem remains
23:29 imirkin_: generally this stuff isn't specific to a single gpu
23:29 karolherbst: huelter: did you reclock?
23:29 imirkin_: it COULD be that the card comes up in some funny under-volted/etc states
23:29 huelter: yep, tried other profiles
23:29 imirkin_: and actually reclocking would *fix* it
23:29 huelter: the same thing happens
23:29 imirkin_: but... unlikely
23:29 imirkin_: yeah
23:29 karolherbst: yeah, well
23:29 karolherbst: with stock nouveau reclocking is a bad idea
23:29 karolherbst: because the cores are like 10% undervolted
23:29 imirkin_: did you see karolherbst's recent series to get reclocking into better shape?
23:30 huelter: not yet, tried just what landed on 4.5
23:30 karolherbst: well, 4.5 was the gddr5 fix I think, right?
23:30 imirkin_: karolherbst: hook him up :)
23:30 karolherbst: :D
23:30 karolherbst: huelter: do you want an out-of-tree build, or to compile your own kernel?
23:31 huelter: I'm running gentoo, so whichever is faster
23:31 imirkin_: (if you already compile your own kernel, the out-of-tree is going to be faster.)
23:32 huelter: the idea is to up the voltage then?
23:33 karolherbst: https://github.com/karolherbst/nouveau.git
23:33 karolherbst: stable_reclocking_kepler_v ....
23:33 karolherbst: I have to think
23:33 karolherbst: stable_reclocking_kepler_v4
23:33 imirkin_: well... the idea is that there could be a chance that the vbios messes things up a bit when bringing the board up
23:34 karolherbst: v5 is currently fixing issues in the last series, so I haven't run anything on it yet
23:34 imirkin_: this wouldn't be noticed when they were testing the card if it's not that far off, and the windows driver reconfigures it correctly
23:34 karolherbst: right
23:34 huelter: I see
23:41 huelter: will nouveau ignore VBIOS and reconfigure?
23:41 imirkin_: nope
23:41 imirkin_: this stuff isn't exactly documented
23:41 imirkin_: and there are a lot more ways to break it than there are to make it work
23:42 imirkin_: so we default to leaving things alone
23:57 karolherbst: imirkin_: yeah I was also thinking about gpus where the defaults are just bad :/