01:49 RSpliet: skeggsb: I thought I heard a bang during my sleep... I see your rewrite landed
01:49 RSpliet: congrats ;-)
01:51 skeggsb: RSpliet: thanks. took me long enough! still need to write proper commit messages with what changed etc, but wanted to push the code out so gnurou could test/fix gk20a since i didn't test it
01:53 RSpliet: all in due time
03:03 mupuf: skeggsb: I guess this won't make it for 4.3 then :D
03:06 mupuf: would be nice to have a small explanation of the point of some changes
03:07 karolherbst: skeggsb: thanks for the push :)
03:07 karolherbst: mhhh
03:07 karolherbst: there are a lot of commits :D
03:08 pq: I count 255 :-o
03:08 karolherbst: pci subdev seems nice
03:08 karolherbst: although it's a lot of code for not so much stuff
03:09 mupuf: gpuobj: type-safe accessor macros --> yeeeeppppeeeeeee!!!
03:09 karolherbst: now I can think of nice pci functions
03:10 karolherbst: changeLinkSpeed(enum SPEED)?
03:10 mupuf: before doing so, you need to parse the vbios for that
03:10 mupuf: I have code for it that I started writing
03:10 mupuf: I can give it to you
03:11 karolherbst: is there anything in the vbios about pci link stuff?
03:11 mupuf: I really need to stop going out every week end and work on it again
03:11 mupuf: but hey, the summer is almost over!
03:11 karolherbst: second function: changeLinkVersion(enum VERSION)
03:11 karolherbst: and changeLinkCap
03:11 mupuf: why would you change the linkcap?
03:11 karolherbst: we have on some cards
03:12 karolherbst: *to
03:12 mupuf: does the blob do that?
03:12 karolherbst: tesla and fermi
03:12 karolherbst: yeah
03:12 karolherbst: they start with a 2.5 link cap
03:12 mupuf: ok then :D
03:12 karolherbst: tesla and fermi are odd anyway
03:12 mupuf: I wonder if it is safe to do for all the gpus or just a few
03:12 karolherbst: there are some which start at pci v1
03:12 karolherbst: and we have to move to v2 first
03:12 karolherbst: then increase cap, then increase speed
03:13 karolherbst: no
03:13 karolherbst: its not safe for all
03:13 karolherbst: like for yours in reator :p
03:13 karolherbst: got it to crash while moving to 8.0
03:13 mupuf: ah ah
03:13 karolherbst: but it was in the non main slot
03:13 mupuf: and how do we check that?
03:13 karolherbst: mhh
03:13 karolherbst: there are some regs
03:13 karolherbst: but on kepler it's different
03:13 karolherbst: usually on tesla/fermi we can read a lot of stuff out from regs
03:13 mupuf: looks like fun for you!
03:13 karolherbst: yeah, a lot
03:14 karolherbst: basically we have to know what is the max of the card
03:14 karolherbst: max of the board
03:14 karolherbst: max of the current slot
03:14 karolherbst: max of the current configuration
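The four limits listed above suggest that the usable link speed is simply the most restrictive one. A minimal sketch of that idea, assuming speeds are encoded in units of 0.1 GT/s; `LinkLimits` and `maxUsableSpeed` are hypothetical names for illustration and not part of nouveau:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical link speeds in units of 0.1 GT/s (25 = 2.5, 50 = 5.0, 80 = 8.0).
struct LinkLimits {
    int card;    // what the GPU itself supports
    int board;   // what the board design allows
    int slot;    // what the slot it sits in allows
    int config;  // what the current multi-card configuration allows
};

// The usable speed is bounded by the most restrictive of the four limits.
inline int maxUsableSpeed(const LinkLimits &l)
{
    assert(l.card > 0 && l.board > 0 && l.slot > 0 && l.config > 0);
    return std::min(std::min(l.card, l.board), std::min(l.slot, l.config));
}
```

With an 8.0-capable card in a 5.0 slot this yields 50, i.e. 5.0. How each limit would actually be read from registers or the vbios is left open, as in the discussion.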
03:14 karolherbst: imagine we have two or more nvidia cards
03:15 karolherbst: one in slot with max 8.0
03:15 karolherbst: the others in 5.0
03:15 karolherbst: but not all can drive at 5.0 for whatever reason
03:15 karolherbst: and if the one wants to do 8.0 all others have to do 2.5
03:15 karolherbst: maybe this is a thing
03:15 karolherbst: who knows
03:15 karolherbst: width management will be also ugly
03:15 karolherbst: especially with dynamic reclocking
03:16 karolherbst: but currently I want to debug my orderInstruction thing, but qtcreator doesn't want to
03:16 karolherbst: so I have to use gdb :(
03:19 mupuf: device: import pciid list and integrate quirks with it --> interesting, we can start displaying the official name of the gpu
03:25 karolherbst: mupuf: wanna test the gddr5 stuff out? I am pretty sure it kind of works, but there are some other issues left :/
03:26 mupuf: possibly friday, yes
03:26 karolherbst: okay, nice
03:26 mupuf: I need to check why my nve6 is not recognized by nouveau drm anymore
03:26 karolherbst: I don't get any hangs, so I can't investigate further :D
03:26 karolherbst: mhh
03:27 karolherbst: with debug=debug there were some class issues
03:27 mupuf: ah, right
03:27 karolherbst: maybe the card reports stupid pci ids or something
03:27 mupuf: that should be pretty easy to fix then
03:27 karolherbst: mine does the same when its off
03:27 karolherbst: have to turn it on with bbswitch before nouveau can use it
03:27 karolherbst: not after boot though
03:27 karolherbst: then nouveau can turn it on
03:27 mupuf: the gpu is on, it displays stuff
03:28 karolherbst: ahh I see
03:28 mupuf: and it is a desktop pc
03:28 karolherbst: yeah, I just wanted to let you know there are some issues like that ;)
03:28 mupuf: thanks :)
03:29 karolherbst: I think I need a kepler card with pretty high mem clocks
03:29 karolherbst: and with pretty high I mean above 8GHz
03:29 mupuf: does it exist?
03:29 karolherbst: checking
03:29 karolherbst: 780 has 7GHz
03:30 karolherbst: nope, 7 is current max
03:30 karolherbst: oh well
03:30 mupuf: 3.5GHz... that is pretty high :D
03:30 karolherbst: /2
03:30 karolherbst: gddr5 is * 4
03:30 mupuf: how do they do that?
03:30 karolherbst: don't know
03:31 karolherbst: wikipedia: "1752.5 (7010)"
03:31 mupuf: maybe two clocks with 90° difference
03:31 karolherbst: 4.0 Gbit/s (1 GHz)
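The "/2" and "* 4" remarks and the "1752.5 (7010)" figure fit the usual GDDR5 clocking scheme: the write clock runs at twice the command clock and data is transferred on both of its edges. A small sketch of that arithmetic, with hypothetical helper names and kHz units chosen so that 1752.5 MHz stays an integer:

```cpp
#include <cassert>

// GDDR5 moves four bits per pin per command-clock cycle, so the marketed
// "effective" rate is four times the command clock (the "* 4" remark above).
inline long gddr5EffectiveKhz(long commandClockKhz)
{
    assert(commandClockKhz >= 0);
    return commandClockKhz * 4;
}

// The write clock runs at twice the command clock, which is the effective
// rate divided by two (the "/2" remark above).
inline long gddr5WriteClockKhz(long commandClockKhz)
{
    assert(commandClockKhz >= 0);
    return commandClockKhz * 2;
}
```

For a 1752.5 MHz command clock this gives 7010 MHz effective and 3505 MHz for the write clock, which matches the "3.5GHz" mentioned above.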
03:32 karolherbst: "However, it can open two memory pages at once, which simulates the dual-port nature of other VRAM technologies."
03:33 karolherbst: yeah I think they do crazy stuff
03:33 karolherbst: I was worried
03:33 karolherbst: okay, I can also go above 8GHz with my card
03:33 karolherbst: no problem
03:33 karolherbst: but I am a bit scared about heat .D
03:33 karolherbst: :D
03:35 mupuf: meh
03:35 mupuf: although RAM definitely does consume quite a lot
03:35 karolherbst: I feel uneasy clocking to double clock
03:35 karolherbst: and above
03:36 mupuf: and you do that without changing the timings?
03:36 karolherbst: I do that with the blob
03:36 mupuf: hmm, I wonder if they recompute the timings
03:37 karolherbst: yeah, maybe
03:37 mupuf: maybe they do a linear interpolation of some sort
03:37 karolherbst: the timings are in the bios though
03:37 mupuf: yes
03:37 mupuf: stored as a blob
03:37 karolherbst: well isn't the entire vbios a blob?
03:37 mupuf: no
03:38 mupuf: I mean, yes, but in this case, I meant they provide you with the value you need to write
03:39 mupuf: hence, there is no semantic other than the register's bitfield
03:39 mupuf: which we may or may not know :_
03:39 mupuf: It is pretty nice that they did it :)
04:43 pmoreau: karolherbst: If you were referring to the "illegal class" message from debug output, this is by construction: when creating the PDISP object, it loops over all possible PDISP classes and tries to create an object of each class. If that fails, it tries the next one until one is found to work. Hence you'll get "illegal class" messages until it finds the correct one.
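The probing loop pmoreau describes can be sketched roughly as follows. This is a toy model only: `tryCreate`, `probeClass` and the class numbers used with it are made up for illustration and are not the nouveau object API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for object creation: succeeds only for the class the
// hardware actually supports; every other attempt counts as an "illegal class".
inline bool tryCreate(unsigned wanted, unsigned supported, int &failures)
{
    if (wanted == supported)
        return true;
    ++failures; // this is where a debug-level "illegal class" line would appear
    return false;
}

// Walk the candidate list in order and return the first class that works.
// Returns 0 when none of the candidates is accepted.
inline unsigned probeClass(const std::vector<unsigned> &candidates,
                           unsigned supported, int &failures)
{
    assert(supported != 0); // 0 is reserved for "nothing found"
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (tryCreate(candidates[i], supported, failures))
            return candidates[i];
    return 0;
}
```

If the supported class is the third candidate, the loop records two failures before succeeding, which is why the messages are expected noise rather than a fault.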
05:37 SolarAquarion: https://usercontent.irccloud-cdn.com/file/aHkTAMLw/irccloudcapture-1084637434.jpg
05:53 karolherbst: something is odd
06:21 karolherbst: imirkin: do I have to check inside the join point for instructions already scheduled?
06:22 karolherbst: like I do insn->getDef(i)->uses[j]->join->defs(k) in alreadyScheduled?
06:26 karolherbst: ohh now the uses in join I have to check I guess
07:07 imirkin: karolherbst: ->join isn't anything you want to touch
07:07 imirkin: karolherbst: it's some stupid RA thing iirc
07:07 imirkin: for coalesced values
07:08 karolherbst: yeah I already saw
07:08 karolherbst: I think I slowly get it
07:08 karolherbst: now I do something like that: for (unsigned int i = 0; i < insn->defCount(); i++)
07:09 karolherbst: Value * def = insn->getDef(i);
07:09 karolherbst: for (std::tr1::unordered_set<ValueRef *>::iterator it = def->uses.begin(); it != def->uses.end(); it++)
07:09 karolherbst: Instruction * ref = (*it)->getInsn();
07:09 karolherbst: and ref can be an instruction inside my list with nonSchedulable instructions
07:09 imirkin: perfect.
07:09 karolherbst: problem though
07:09 imirkin: you can't just mark ref as schedulable of course
07:10 imirkin: you have to recheck whether it is or not
07:10 karolherbst: I try first to add these to my list with srcCount 1
07:10 imirkin: since you could have like
07:10 karolherbst: yeah
07:10 karolherbst: I know
07:10 imirkin: mov a, 1; mov b, 2; add c, a, b
07:10 karolherbst: if this instruction has a srcCount of 1
07:10 imirkin: ok cool :)
07:10 karolherbst: is it always my scheduled instruction?
07:10 imirkin: lol
07:10 imirkin: maybe
07:10 imirkin: but that's such a dumb hack
07:10 karolherbst: yeah I already noticed
07:10 imirkin: when the real thing is so easy to do
07:10 karolherbst: found two indirects of those
07:10 karolherbst: ohh
07:11 karolherbst: theck the one src against my scheduled one?
07:11 karolherbst: *check
07:11 imirkin: just check everything
07:11 karolherbst: yeah I know, but I want to test this first and see if that works
07:11 imirkin: [obviously after updating the scheduled set with the currently-processed instruction]
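The "recheck instead of assuming" advice, with the `mov a, 1; mov b, 2; add c, a, b` case in mind, might look like this on a toy model. `ToyInsn` is a hypothetical stand-in and not the real nv50_ir `Instruction`; the real code would walk `def->uses` as shown earlier in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for an instruction; not the real nv50_ir::Instruction.
struct ToyInsn {
    std::vector<ToyInsn *> srcs;  // instructions producing our sources
    std::vector<ToyInsn *> users; // instructions consuming our result
};

// An instruction is ready only when every source producer is already scheduled.
inline bool allSourcesScheduled(const ToyInsn *insn,
                                const std::set<const ToyInsn *> &scheduled)
{
    for (std::size_t i = 0; i < insn->srcs.size(); ++i)
        if (!scheduled.count(insn->srcs[i]))
            return false;
    return true;
}

// Mark insn as scheduled first, then recheck each user instead of assuming
// that it became ready. Returns the users that really are ready now.
inline std::vector<ToyInsn *> scheduleAndCollectReady(
    ToyInsn *insn, std::set<const ToyInsn *> &scheduled)
{
    assert(insn != 0);
    scheduled.insert(insn);
    std::vector<ToyInsn *> ready;
    for (std::size_t i = 0; i < insn->users.size(); ++i) {
        ToyInsn *user = insn->users[i];
        if (!scheduled.count(user) && allSourcesScheduled(user, scheduled))
            ready.push_back(user);
    }
    return ready;
}
```

Scheduling the first `mov` returns nothing, because the `add` still waits on the second one; only after the second `mov` is scheduled does the `add` show up as ready.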
07:12 karolherbst: yeah
07:12 karolherbst: this is the first thing I do
07:13 karolherbst: ohh wait
07:13 karolherbst: oh no, it's cool
07:13 karolherbst: for a second I thought I forgot that I have to -1 for index access
07:13 karolherbst: but I don't need that anyway
07:18 karolherbst: something is not right here : https://github.com/karolherbst/mesa/commit/65a1d2a02f2d0c458213aeb24ecddcd0aef4dfa2#diff-bb3cc04dda7921a13da7e4e48cc6166dR479
07:18 karolherbst: ohh
07:18 karolherbst: I see it
07:18 karolherbst: ohh
07:18 karolherbst: now
07:18 karolherbst: *no
07:18 karolherbst: I don't
07:19 karolherbst: uhg
07:19 karolherbst: now I see it
07:19 imirkin: and now you don't :)
07:20 karolherbst: much better
07:20 karolherbst: "} else if (ref->getIndirect(i, 0) != NULL || ref->getIndirect(i, 1) != NULL) {" copy paste error
07:20 karolherbst: I used the wrong index
07:20 karolherbst: okay, heaven somehow runs now
07:20 karolherbst: but with minor issues
07:20 karolherbst: less issues than my last try though
07:21 karolherbst: I get some "nvc0_program_translate:567 - shader translation failed: -4" and "ERROR: failed to coalesce phi operands"
07:22 imirkin: that means you accidentally the whole thing
07:23 imirkin: print things out :)
07:23 imirkin: in case you can't tell, i debug with prints
07:23 karolherbst: sadly my test shader works
07:23 imirkin: not with fancy IDE's
07:23 imirkin: does it? i want to see the before/after
07:23 imirkin: just because it doesn't crash doesn't mean it works
07:23 karolherbst: yeah okay
07:23 imirkin: although if it does crash, it definition means it doesn't work :)
07:24 imirkin: definitely*
07:25 karolherbst: http://www.filebin.ca/2CnbYsBMLj3Q/out
07:26 karolherbst: "left to be scheduled 55" is just the number I didn't schedule in a block, so some leftovers because of my really bad check
07:26 karolherbst: the number is higher though without the checks
07:26 imirkin: you can't *not schedule* stuff
07:26 karolherbst: I mean I still add them
07:26 imirkin: oh
07:27 karolherbst: but not in my scheduling logic
07:27 karolherbst: just after that
07:27 imirkin: errrr
07:27 imirkin: fix your schedule logic
07:27 imirkin: no point in looking until you have.
07:27 karolherbst: the thing is I add 5 instructions to my schedulable list and then it doesn't work anymore :/
07:28 imirkin: then fix it.
07:28 karolherbst: yeah okay
07:28 karolherbst: but if I can't find the one with srcCount == 1 I won't find the others without issues anyway
07:29 imirkin: don't restrict it like that
07:29 imirkin: make a thing that actually works
07:29 gtx950: http://nouveau.freedesktop.org/wiki/CodeNames/#nv110familymaxwell can someone please update the Maxwell table here? http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-950/specifications GM206
07:30 karolherbst: yeah well, I plan to remove the restriction afterwards, I just want to go step by step there
07:30 karolherbst: I mean okay, if I have an instruction and all src instructions are equal to my scheduled ones, is this already schedulable or do I have to add other checks?
07:31 imirkin: gtx950: done
07:31 imirkin: karolherbst: it's schedulable.
07:31 gtx950: imirkin: can you please add GM200 also, http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980-ti/specifications
07:32 imirkin: karolherbst: if all sources have been scheduled, or are in a different bb
07:32 karolherbst: ahh okay, I have to check the bb too, right
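The rule imirkin states, that a source counts as available when its producer is already scheduled or lives in a different bb, can be written as a small predicate. `ToySource` and `isSchedulable` are hypothetical names for this sketch, not codegen types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each source is described by the block its producer lives in and
// whether that producer has already been placed in the current schedule.
struct ToySource {
    int producerBB;
    bool producerScheduled;
};

inline bool isSchedulable(const std::vector<ToySource> &srcs, int currentBB)
{
    assert(currentBB >= 0);
    for (std::size_t i = 0; i < srcs.size(); ++i) {
        const bool otherBlock = srcs[i].producerBB != currentBB;
        if (!otherBlock && !srcs[i].producerScheduled)
            return false; // same block, producer not yet emitted
    }
    return true; // also covers instructions with no sources at all
}
```

A producer in another block is treated as done because scheduling here is strictly intra-bb, so its value exists before the current block starts.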
07:32 gtx950: missing from the Maxwell table
07:32 imirkin: gtx950: what's the chip id? 0x200?
07:32 imirkin: err
07:32 imirkin: 0x120?
07:32 gtx950: I'm not sure, sec
07:33 imirkin: mmio register 0 from it should have that.
07:33 imirkin: should you happen to have one lying around :)
07:33 gtx950: http://tpucdn.com/reviews/NVIDIA/GeForce_GTX_980_Ti/images/gpuz_oc.gif device id for GTX 980 Ti
07:33 gtx950: https://tpucdn.com/reviews/NVIDIA/GeForce_GTX_Titan_X/images/gpuz_oc.gif Titan X
07:34 imirkin: that doesn't answer my question
07:34 gtx950: sorry then
07:34 imirkin: that screen doesn't have the info i was looking for
07:34 gtx950: NV120 I guess?
07:35 imirkin: gtx950: yeah, that's a guess
07:35 imirkin: anyways, i don't really care about that table being complete... when someone shows up with one, i can add it.
07:45 karolherbst: imirkin: do you know for which situation "Instruction *Value::getInsn() const" can segfault?
07:45 karolherbst: which means defs.front() returns NULL
07:45 imirkin: yeah, if it has no defs ;)
07:45 karolherbst: but defs.empty() is false
07:45 imirkin: yeah
07:46 imirkin: but defs has ValueDefs
07:46 imirkin: not Values
07:46 imirkin: defs is always a non-0 size
07:46 karolherbst: okay
07:46 imirkin: anyways...
07:46 imirkin: er hm
07:47 imirkin: Value::getInsn() shouldn't die....
07:47 karolherbst: yeah that what I thought
07:47 SolarAquarion: weird. Only sessions that are started with wayland works
07:50 karolherbst: wait what ..
07:50 imirkin: karolherbst: i guess the ValueDef* is null somehow?
07:50 imirkin: that is very odd.
07:50 karolherbst: I checked the Def now too
07:50 karolherbst: mhh
07:50 karolherbst: maybe I did something wrong
07:50 karolherbst: ohhh
07:50 karolherbst: ohh
07:50 karolherbst: I know
07:51 karolherbst: imirkin: remember our talk about std container front() on empty?
07:51 karolherbst: now I hit this
07:52 karolherbst: now this segfaults: "ValueDef *srcDef = src->defs.front();" and src->defs.empty() == false
07:53 imirkin: i have no idea how that can happen other than memory corruption
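As background for the `front()` remark: calling `front()` on an empty standard container is undefined behaviour, so an emptiness check has to come first. The crash discussed here had `empty() == false`, so this sketch only illustrates the general hazard, not the actual bug. `frontOr` is a hypothetical helper and `std::list` is picked only for illustration.

```cpp
#include <cassert>
#include <list>

// Returns the first element, or fallback when the list is empty. Calling
// front() on an empty standard container is undefined behaviour, so the
// emptiness check has to come first.
template <typename T>
T frontOr(const std::list<T> &c, T fallback)
{
    return c.empty() ? fallback : c.front();
}
```

A caller that gets the fallback back (for example a null pointer) can then bail out cleanly instead of dereferencing garbage.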
07:54 imirkin: run with valgrind?
07:54 karolherbst: will go to O0 now :/
07:54 karolherbst: even Og optimized too much
07:54 imirkin: Og isn't about opt
07:54 karolherbst: I know
07:54 imirkin: oh i guess it is
07:55 karolherbst: still got optimized out values
07:55 imirkin: -Og enables optimizations that do not interfere with debugging.
07:55 karolherbst: yeah
07:55 karolherbst: it's not as bad as O0
08:00 karolherbst: okay src was already a bad pointer
08:00 karolherbst: 0x430
08:00 karolherbst: ohh no, src is fine
08:04 karolherbst: ugh
08:04 karolherbst: used wrong srcCount()
08:05 karolherbst: now I get "ERROR: no viable spill candidates left"
08:09 xexaxo: holy diff-stat...
08:10 xexaxo: also hello pci-ids list :)
08:57 imirkin_: skeggsb: no obvious regressions from your latest tree observed on my gk208 (yet)
08:58 karolherbst: imirkin_: how do I have to handle indirects by the way?
08:59 imirkin_: indirect refs are a source like any other
08:59 imirkin_: same as you would any other source
09:01 karolherbst: so I get the indirect Value and check its instruction
09:01 imirkin_: yep
09:02 karolherbst: if insn->getIndirect(2, 0) != NULL will be insn->getSrc(i) == NULL? or is there no connection
09:02 imirkin_: hmmmmmmmmm
09:03 karolherbst: 2 instead of i, but I think you got what I meant
09:03 imirkin_: oh
09:03 imirkin_: i actually didn't
09:03 imirkin_: i thought the i was a 1 ;)
09:03 karolherbst: nope, 2
09:03 imirkin_: so i thought you were asking if the sources were guaranteed to be packed
09:03 karolherbst: ahh
09:03 imirkin_: the answer to which i don't know *for sure*
09:03 imirkin_: but i think they are
09:04 karolherbst: no, I meant it for the same index only one has a value and the other not
09:04 imirkin_: there can be odd cases with e.g. flags/pred sources, which aren't taken care of as carefully as "normal" sources
09:04 karolherbst: no, I have to check against NULL
09:04 karolherbst: mhh
09:04 imirkin_: for the same index, it doesn't make sense to have indirect refs
09:04 karolherbst: so best if I check both
09:04 imirkin_: if there's no actual source
09:04 imirkin_: i.e. that is just a meaningless thing
09:05 imirkin_: so if getSrc(i) == NULL, no need to check indirects
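Treating indirects as "a source like any other" while skipping empty slots could be sketched like this. `ToySlot` only mirrors the shape of `getSrc(i)` and `getIndirect(i, 0/1)`; it is a hypothetical stand-in and not the real codegen interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy source slot: a value plus up to two indirect (address) values. A null
// pointer means "not present".
struct ToySlot {
    const int *value;
    const int *indirect[2];
};

// Collect every value an instruction depends on. An empty slot is skipped
// entirely, because indirects without an actual source carry no meaning.
inline std::vector<const int *> collectDeps(const std::vector<ToySlot> &slots)
{
    std::vector<const int *> deps;
    for (std::size_t s = 0; s < slots.size(); ++s) {
        if (!slots[s].value)
            continue;
        deps.push_back(slots[s].value);
        for (int d = 0; d < 2; ++d)
            if (slots[s].indirect[d])
                deps.push_back(slots[s].indirect[d]);
    }
    return deps;
}
```

Every pointer returned would then go through the same "has its producer been scheduled" check as a normal source.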
09:05 karolherbst: okay
09:06 imirkin_: skeggsb: btw i still get "nouveau 0000:01:00.0: priv: HUB0: 086014 ffffffff (1e70820c)", but i had that before too
09:10 imirkin_: skeggsb: not sure what "priv" means in there... that used to say either PGRAPH or PBAR
09:15 imirkin_: karolherbst: btw, another approach to scheduling is to schedule from the end
09:16 imirkin_: i.e. compute the list of instructions on which nothing depends
09:16 imirkin_: etc
09:17 imirkin_: this has some small implications wrt latency scheduling (i.e. wait 4 instructions before using output of instruction X), but i'm unconvinced whether it's actually better
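Scheduling from the end, as imirkin_ outlines it, starts from the instructions nothing depends on and works backwards. A minimal sketch on plain indices, assuming an acyclic dependency graph within one bb; `scheduleFromEnd` is a hypothetical name and the pick order (last ready first) is arbitrary, with no latency model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// deps[i] lists the instructions that instruction i reads from. Repeatedly
// pick an instruction that no unplaced instruction depends on, append it,
// and finally reverse the list to obtain the forward order.
inline std::vector<int> scheduleFromEnd(const std::vector<std::vector<int> > &deps)
{
    const std::size_t n = deps.size();
    std::vector<int> pendingUsers(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < deps[i].size(); ++j)
            ++pendingUsers[deps[i][j]];

    std::vector<int> ready, order;
    for (std::size_t i = 0; i < n; ++i)
        if (pendingUsers[i] == 0)
            ready.push_back(static_cast<int>(i));

    while (!ready.empty()) {
        const int insn = ready.back();
        ready.pop_back();
        order.push_back(insn);
        for (std::size_t j = 0; j < deps[insn].size(); ++j)
            if (--pendingUsers[deps[insn][j]] == 0)
                ready.push_back(deps[insn][j]);
    }
    assert(order.size() == n); // fails only if the graph had a cycle
    std::reverse(order.begin(), order.end());
    return order;
}
```

For two `mov`s feeding an `add`, the `add` is picked first and therefore ends up last after the reversal.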
09:18 glennk: it gets real funky with loops if one cares about that
09:18 imirkin_: glennk: right now it's just intra-bb scheduling
09:18 imirkin_: glennk: let's not get ahead of ourselves :)
09:18 glennk: hundreds of papers on scheduling algorithms to trawl through otherwise :-)
09:19 imirkin_: which all say conflicting things
09:19 imirkin_: which to me means "just do whatever makes sense, it's all horrid"
09:19 glennk: pretty much, its the hardware specific goo that makes or breaks it
09:19 imirkin_: yeah. calim said he played around with it, but could never beat doing nothing at all
09:20 glennk: probably on fermi then
09:20 imirkin_: doubtful
09:20 imirkin_: he did most of his stuff on a kepler
09:20 imirkin_: i don't know that he even had a fermi
09:40 karolherbst: I think I currently have a spilling issue which I somehow can't get solved
09:40 karolherbst: at least it doesn't segfault for me anymore
09:41 imirkin_: karolherbst: you seem to know C++... does this seem generally reasonable to you? http://patchwork.freedesktop.org/patch/52337/
09:41 karolherbst: I think the problem with heaven are these 700 instruction long blocks with no fixed one
09:42 karolherbst: I just assume you replaced everything fine
09:43 imirkin_: ?
09:44 karolherbst: imirkin_: idea: because I don't like to depend on stuff like that, why don't add a compilation check for <unordered_set>
09:44 karolherbst: shouldn't automake have such?
09:44 imirkin_: well (a) it won't work for older android versions
09:44 karolherbst: ANDROID won't be the only one without <unordered_set>
09:44 imirkin_: and (b) i don't want to bump the min gcc requirement
09:44 karolherbst: I meant you should check if <unordered_set> is there and if not check if <tr1/unordered_set> is there
09:44 imirkin_: that's ~what this patch does
09:44 imirkin_: look at codegen/unordered_set.h
09:45 karolherbst: you check compiler there
09:45 karolherbst: I meant maybe its better you have it in automake
09:45 karolherbst: more generally
09:45 imirkin_: how would having something in automake make the codegen/unordered_set.h simpler?
09:46 imirkin_: i.e. automake would provide you with HAVE_FOO macros
09:46 imirkin_: great
09:46 karolherbst: not simpler, but usable with more platforms
09:46 imirkin_: meh
09:46 karolherbst: but if android is the only one, then maybe its fine this way
09:46 imirkin_: when another platform has trouble with this, they can improve it
09:46 karolherbst: mhh
09:46 karolherbst: I dislike global using declarations
09:46 imirkin_: yeah, i'm not a fan either
09:46 karolherbst: they can be... awkward
09:47 imirkin_: however unordered_set seems like an ok one to do it for
09:47 imirkin_: and i don't see a much better way around it
09:47 imirkin_: it really messes up if you do something like "using std"
09:47 imirkin_: but a single class/function seems less bad
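A compiler-based fallback between `<unordered_set>` and `<tr1/unordered_set>` with a single using-declaration might look roughly like this. The `__cplusplus` test and the `compat` namespace are assumptions made for this sketch; the actual codegen/unordered_set.h may test different conditions and is not reproduced here.

```cpp
// Pick whichever unordered_set the toolchain offers and expose it under one
// name, so the rest of the code never spells out std:: or std::tr1::.
#if __cplusplus >= 201103L
#include <unordered_set>
namespace compat {
using std::unordered_set; // a single using-declaration, scoped to a namespace
}
#else
#include <tr1/unordered_set>
namespace compat {
using std::tr1::unordered_set;
}
#endif

#include <cassert>

// Tiny smoke test helper: works the same with either implementation.
inline bool containsAfterInsert(int key)
{
    compat::unordered_set<int> s;
    s.insert(key);
    return s.count(key) == 1;
}
```

Scoping the using-declaration to a namespace is one way to soften the "global using" concern raised above; a build-system probe producing a HAVE_* macro could replace the `__cplusplus` test without changing the rest.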
09:48 karolherbst: mhh wait, there may be something better
09:49 karolherbst: I don't know what the standard says about adding unnamed namespaces inside std :/
09:49 imirkin_: and apparently that void_ptr_set thing works around some sort of idiocy in the android c++ impl
09:49 imirkin_: yeah i'm not touching that
09:49 imirkin_: half the compilers out there get this stuff wrong
09:50 imirkin_: i'm going to push it (minus the android.mk change)
10:36 imirkin_: calim: ping
10:43 joi: skeggsb: make in main dir generates this: http://paste.ubuntu.com/12136715/
10:53 joi: it builds with this patch: http://paste.ubuntu.com/12136851/
11:03 xexaxo: imirkin_: long story short -> android 4.4 and earlier use a bastard implementation of C++ which even undefines isfinite (remember is std::isfinite suggestion)
11:03 karolherbst: imirkin_: can I do anything about the "ERROR: no viable spill candidates left" or do I have to schedule right before?
11:04 imirkin_: karolherbst: you're probably doing something illegal which is confusing the spill logic
11:04 karolherbst: ohh I see
11:04 karolherbst: some corner case?
11:04 imirkin_: dunno
11:04 imirkin_: get the shader that does that (the tgsi) as well as the before/after for your spilling thing
11:04 tobijk: or the spill logic is bad :/
11:04 imirkin_: you probably mess up spills
11:04 karolherbst: its the same
11:05 karolherbst: I know that if some instruction jumps too far away I get those
11:05 imirkin_: tobijk: spill logic is mostly fine.
11:05 karolherbst: thats why I had this 731 flush thingy
11:05 imirkin_: tobijk: it's not *optimal* but it does generally work.
11:05 tobijk: imirkin_: "mostly" "generally", we might hit corner cases as karolherbst said
11:05 tobijk: as i had hit back in my unspill days :/
11:06 karolherbst: https://gist.github.com/karolherbst/2d6cdbdc26b8ce483de8
11:06 tobijk: not saying i could write a better spill logic ;-)
11:06 karolherbst: old and new shouldn't be that much off
11:06 karolherbst: both are results but one schedules a bit more
11:06 karolherbst: and old succeeds
11:06 tobijk: karolherbst: can you diff old/new?
11:06 karolherbst: wait
11:07 imirkin_: tobijk: i'm not going to sit here and claim that the spill code is perfect. however i'm not aware of any deficiencies in it other than a very minor one which he's not hitting (error would be different)
11:08 karolherbst: mhh
11:08 karolherbst: a diff file won't help that much
11:08 karolherbst: index are too much off and thats confusing
11:08 karolherbst: visual differ is best I guess
11:08 karolherbst: ohh I think this gist is something else
11:08 karolherbst: mhh
11:09 karolherbst: will finish my little cleanup and generate new outputs
11:24 karolherbst: are these with srcCount() == 0 always schedulable?
11:24 imirkin_: they should all be fixed
11:24 imirkin_: (what's an example of something with srcCount==0?)
11:25 karolherbst: I get some or my algorithm is bad
11:25 imirkin_: but in general sure, schedulable whenever
11:25 imirkin_: i can't think of any ops that wouldn't be fixed
11:25 karolherbst: I check
11:25 imirkin_: but have 0 sources
11:26 karolherbst: srcCount: 0 fixed 0
11:26 karolherbst: std::cout << "srcCount: " << insn->srcCount() << " fixed " << insn->fixed << std::endl;
11:27 karolherbst: I get this like 4 times inside this big shader
11:28 karolherbst: and if I schedule only those, unigine isn't happy
11:28 karolherbst: I mean it compiles fine and all
11:28 karolherbst: but visually there is something wrong
11:31 stole: I have a Quadro 1000M on a W520. How is nouveau support for this card? I'm using nouveau now, and I'm experiencing a lot of tearing in browsers mostly with GLXVBlank=TRUE and SwapLimit=2
11:32 karolherbst: ohh could be my mistake though, because I'm losing exactly them
11:32 karolherbst: imirkin_: joinat should be fixed?
11:32 imirkin_: ya
11:32 imirkin_: er
11:32 imirkin_: i guess not
11:32 karolherbst: okay
11:32 imirkin_: you can schedule it whenever
11:32 karolherbst: I have some that are and some which are not
11:33 imirkin_: it doesn't really matter where in the bb you execute it
11:33 karolherbst: but I entirely lose them, so it's my fault
11:33 imirkin_: just... before any branches ;)
11:33 paulk-collins: hi there
11:33 paulk-collins: I asked the same question in #radeon, I wish to learn about the situation on nvidia:
11:33 tobijk: but joinat needs to be behind all insn we have in front of it at the beginning of the schedule pass?
11:34 imirkin_: tobijk: doesn't matter
11:34 tobijk: mh wouldn't it change the instruction flow then?
11:34 paulk-collins: are there free firmwares for video decoding videos and are they using the GPU's ISA or is it a separate CPU?
11:34 tobijk: joinat is for joining after if/for, right?!
11:34 paulk-collins: is it running on a separate GPU*
11:35 karolherbst: tobijk: the situation is, some joinat are fixed and some are not, at least for me here. Could be that I didn't get all joinats though
11:36 imirkin_: paulk-collins: no free firmware, separate processor.
11:36 tobijk: karolherbst: i'm just thinking out loud here, being less an expert compared to you :)
11:36 paulk-collins: imirkin_, thanks!
11:36 imirkin_: paulk-collins: actually several separate processors, and depends on GPU generation
11:36 glennk: paulk-collins, basically all gpu video decoders are at least partially hardwired for power consumption reasons
11:36 karolherbst: I am no expert
11:37 tobijk: yeah and i'm less :P
11:37 paulk-collins: glennk, right, thanks
11:37 karolherbst: did you ever write a compiler?
11:37 karolherbst: for whatever
11:37 paulk-collins: any clue what ISA it's doing and how hard it would be to replace that?
11:37 tobijk: a really simple, never got so spilling :/
11:37 tobijk: *to
11:37 imirkin_: karolherbst: i don't think any joinat's have to be fixed
11:37 karolherbst: tobijk: so you are more of an expert than I am :D
11:38 karolherbst: okay
11:38 glennk: paulk-collins, custom generated ISA for the specific set of codecs
11:38 imirkin_: karolherbst: could be either a mistake or could be that i'm missing something
11:38 karolherbst: I first have to fix my missing instructions
11:38 imirkin_: paulk-collins: which GPU are you targeting?
11:39 imirkin_: karolherbst: joinat just pushes some stuff onto the call stack... maybe it also flips the gpu into single-thread mode so you want to schedule it later? not sure.
11:40 paulk-collins: glennk, ok
11:40 karolherbst: would make sense
11:40 karolherbst: tex* stuff early, joinat late
11:40 paulk-collins: imirkin_, oh nothing in particular, just learning about the situation
11:40 imirkin_: paulk-collins: it'd be a significant effort
11:41 imirkin_: once you figure out how the damn thing works you're still stuck with the non-trivial task of actually implementing the codec decoding
11:41 imirkin_: and these custom vector isa's follow no rhyme or reason... they're designed for decoding codecs a, b, c, d, and have specialized logic that makes *that* faster, not general computation.
11:42 paulk-collins: right
11:42 paulk-collins: thanks
11:42 paulk-collins: looks like a big chunk of work indeed
11:42 karolherbst: mhh
11:42 karolherbst: strange
11:42 karolherbst: I am pretty sure I push those joinats out
11:42 karolherbst: maybe in heaven it's something else
11:43 tobijk: make them fixed for once and see if its the problem? :D
11:44 Airwave: karolherbst: Follow-up on yesterday (regarding kernel panics with nouveau): I was apparently misremembering things. They weren't kernel panics, just X11 freezing.
11:44 karolherbst: maybe heaven has another problem, will print out operation
11:44 karolherbst: ohhh
11:44 karolherbst: yeah, that happens sometimes
11:45 karolherbst: actually for me kwin just freezes and has to be restarted
11:45 Airwave: Happens daily for me with nouveau.
11:45 karolherbst: ddx bug then I guess
11:45 Airwave: Curiously there is no log activity when the freeze happens. There are plenty of errors during run though.
11:45 Airwave: http://ur1.ca/nht31
11:45 karolherbst: Airwave: do you know if you use exa or glamor?
11:45 karolherbst: ohh these are indeed kernel issues
11:46 Airwave: karolherbst: I'm afraid I don't know what either of those are.
11:46 imirkin_: good ol' 0x00406040
11:46 imirkin_: nfc why it happens. but you're not alone.
11:46 imirkin_: Airwave: did you say that some older kernel was more reliable?
11:47 karolherbst: I think he said it was always shit :D
11:47 Airwave: All the log output on Aug 19 was way before the freeze. No output during the freeze. The last output on the morning of Aug 20 is during X11 restart.
11:47 Airwave: Yeah.
11:48 karolherbst: imirkin_: seems to know the bug, so he is the guy now :p have fun you two
11:48 Airwave: imirkin_: It's been about the same as long as I've been trying it (for about two years).
11:48 Airwave: ;-)
11:48 imirkin_: i know the bug as in i know a bunch of other people also get it
11:48 imirkin_: i haven't the faintest idea why it happens
11:48 imirkin_: but basically the command submission processor thing dies
11:48 imirkin_: and that does not bode well for applications using it :)
11:49 Airwave: I don't know if that log output is related to the freeze though. The output happened before the freeze, with seemingly no adverse effects during the output.
11:49 imirkin_: basically everything is a result of that first error
11:49 imirkin_: nouveau E[ PFIFO][0000:02:00.0] DMA_PUSHER - ch 2 [Xorg[2287]] get 0x0020024aec put 0x0020024b30 ib_get 0x00000107 ib_put 0x00000118 state 0x8000f5e0 (err: INVALID_CMD) push 0x00406040
11:49 Airwave: Okay
11:49 imirkin_: that's basically saying "hey, you know those commands you sent me, i'm going to ignore a random set of them"
11:49 Airwave: :-/
11:49 karolherbst: okay, joinat is NOT the issue
11:49 karolherbst: there are other crazy instructions with fixed false and srcCount 0
11:50 imirkin_: perhaps it's just something that the hw does and we have to handle in the driver somehow
11:50 imirkin_: karolherbst: such as?
11:50 Airwave: Anything I can do to help?
11:50 karolherbst: checking
11:51 Airwave: I'll try to report what I'm doing while it happens. The last freeze was while I was sleeping and the screensaver was running.
11:52 karolherbst: Airwave: I don't think this will help that much, because this is pretty low-level already? I could be wrong, but..
11:53 Airwave: Yeah, I thought so.
11:53 karolherbst: imirkin_: either 0, 56 or 57, checking which exactly
11:54 imirkin_: karolherbst: i was thinking of which opcode ;)
11:54 imirkin_: not the item in the list
11:54 karolherbst: insn->op
11:54 karolherbst: didn't check the list yet
11:55 imirkin_: karolherbst: print insn->print()
11:55 imirkin_: or something like that
11:55 Airwave: I also have another issue (not fatal, but maybe worth mentioning): When X starts up, for about a second a flash of screen contents from the /previous boot/ gets shown.
11:55 karolherbst: I think it's 57, yeah will print this next
11:55 karolherbst: yep, 57
11:56 imirkin_: Airwave: that's semi-normal. vram doesn't always get cleared
11:56 Airwave: imirkin_: Okay.
11:56 Airwave: Potential security problem I guess, since sensitive information could be on that screen.
11:56 imirkin_: Airwave: we should probably be careful about displaying old buffers, but... aren't
11:57 Airwave: imirkin_: Would be nice to solve that issue, yeah.
11:58 karolherbst: this is gonna be messy
11:58 Airwave: But like I said, not exactly fatal.
11:58 imirkin_: Airwave: i don't think patches would be turned down that resolved it :)
11:58 imirkin_: but apparently no one has cared enough to resolve it
11:59 Airwave: Okay.
12:00 karolherbst: imirkin_: prebreak BB:3 (0)
12:00 Airwave: Thanks for the input, and thanks for the work you're doing on nouveau.
12:00 imirkin_: karolherbst: ah yeah. same deal as joinat.
12:00 karolherbst: but changing position changes visual in unigine
12:00 karolherbst: ohh wait
12:00 imirkin_: is there also a joinat?
12:00 imirkin_: you can't flip the relative order of joinat and prebreak :)
12:01 karolherbst: ahhh
12:01 karolherbst: couldn't this be added as a dep?
12:03 karolherbst: mhh
12:03 karolherbst: I could just ignore them and they will be pushed down anyway
12:04 karolherbst: imirkin_: yep
12:04 karolherbst: if I schedule one of them, visual changes
12:04 karolherbst: if I ignore both, everything is fine
12:05 imirkin_: you can't flip their order
12:05 imirkin_: if you flip their order, then the world will end
12:05 karolherbst: okay
12:05 karolherbst: are there others?
12:05 imirkin_: and/or you get visual corruption ;)
12:05 imirkin_: preret
12:06 imirkin_: and precont
12:06 imirkin_: if there is such a thing, i forget
12:06 karolherbst: yeah I see both
12:06 karolherbst: is it good if all of them are late?
12:10 imirkin_: calim: what is the reason for the logic in needNewElseBlock vs just using something like isCriticalEdge?
12:10 imirkin_: calim: why is it bad for this situation but ok for other critical edges?
12:11 imirkin_: calim: also it looks like your edge type management is off... you create a lot of FORWARD edges that should be CROSS as far as i can tell
12:11 imirkin_: calim: like for an if/else situation, one of the joining edges should be CROSS but i think they both end up as FORWARD for you
12:13 imirkin_: whereas it should be TREE/CROSS i think. at least based on the usual tree/forward/back/cross definitions
12:30 karolherbst: wow, a mov got lost :O
12:30 imirkin_: mov's are important
12:30 karolherbst: mov u32 %r16919 0x3f374bc7 (0)
12:30 karolherbst: yeah but I don't know why
12:31 karolherbst: I mean why it got lost
12:31 imirkin_: you print right before and right after your pass right?
12:31 imirkin_: otherwise there's a ton of possible transformations
12:31 karolherbst: yes
12:38 karolherbst: ohh my output array isn't increasing in a case
12:38 karolherbst: imirkin_: is it possible that one instruction is used twice?
12:38 imirkin_: karolherbst: no
12:43 imirkin_: tobijk: any surprises from running piglit with DRI_PRIME?
12:43 imirkin_: tobijk: should i just run with gbm?
12:46 tobijk: imirkin_: huh? i never had problems
12:46 imirkin_: awesome
12:47 tobijk: do you have any problems or are you just seeking advice before testing? ;-)
12:50 tobijk: thinking back i had problems with piglit hanging my system, but that was fixed quite a while ago
12:51 imirkin_: just seeking advice
12:54 tobijk: imirkin_: i just do (from a console in my normal desktop env) LD_LIBRARY_PATH DRI_PRIME=1 ./piglit-run.py ...
12:54 imirkin_: tobijk: yeah, kicked that off already
14:15 imirkin_: grrrrrr tes clipdistance input fails.
14:17 tobijk: imirkin_: where does it fail
14:17 tobijk: i'm just in nvc0's state validate
14:17 imirkin_: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
14:21 imirkin_: hmmmmmmmmmm
14:21 imirkin_: i think we're doing stuff right
14:21 imirkin_: it's the gpu that's wrong!
14:21 imirkin_: ;)
14:22 tobijk: heh
14:22 tobijk: state_tracker/st_program.c:360:st_prepare_vertex_program: Assertion `attr >= VARYING_SLOT_VAR0 || (attr >= VARYING_SLOT_TEX0 && attr <= VARYING_SLOT_TEX7)' failed.
14:22 tobijk: i clearly do it right ;-)
14:22 karolherbst: ... and I still have the same stupid issue
14:22 imirkin_: i bet consuming clip distance somewhere in the middle makes it mad
14:22 imirkin_: without re-emitting it
14:22 imirkin_: oh well. this will never happen. i'll leave that failure alone.
14:23 imirkin_: i should just trace it on nv
14:24 tobijk: damn mesa why isn't that working, the masks look fine: 00001111 / 11110000
14:24 tobijk: for a 4clip 4 cull :/
14:24 imirkin_: ... where it naturally passes
14:24 imirkin_: it's almost as if they tested their driver
14:25 tobijk: heh
14:25 tobijk: they have like 10 people only for the qa of their driver i guess :)
14:28 karolherbst: mhh
14:28 karolherbst: I am pretty sure I insert the right amount of instructions into the list
14:28 karolherbst: but the print is missing one
14:29 tobijk: karolherbst: i'm stuck, if you want i look over yours :)
14:30 karolherbst: tobijk: https://github.com/karolherbst/mesa/commit/922ac57ac2e3a6ffe60ce8ab4f5c5d6c0f840680
14:30 karolherbst: there is a lot of debugging stuff in it
14:30 karolherbst: currently this reschedules one instruction per bb
14:31 karolherbst: one which is scheduleable
14:31 karolherbst: but very rarely I lose one instruction
14:31 karolherbst: but reproducible
14:32 karolherbst: orderInstructions is the function where the fun begins
14:32 karolherbst: and if I skip "calcSchedulable(depIns, noDepIns, output, bb);" it works
14:32 karolherbst: but calcSchedulable(depIns, noDepIns, output, bb); doesn't do anything wrong as far as I know
14:33 imirkin_: karolherbst: are you still doing the thing that is not the thing i suggested?
14:34 karolherbst: no
14:34 karolherbst: I debug something stupid
14:34 imirkin_: if (source->bb == bb)
14:34 imirkin_: that is not going to work.
14:34 karolherbst: I know, but this isn't the issue I have currently
14:35 karolherbst: I think this check can just be removed now anyway
14:36 tobijk: mh i dont see it where it fails :/
14:36 karolherbst: funny thing
14:36 karolherbst: print before size 4118
14:37 karolherbst: output.size() 4119 (is one bigger than the exec size because of exit instruction)
14:37 karolherbst: print after: size 4117
14:37 karolherbst: I actually push 4119 instructions out
14:37 karolherbst: but the print only prints 4117
14:38 karolherbst: maybe I fail at establishing the order consistency
14:38 karolherbst: that would be in finalizeList then
14:40 tobijk: the loop there looks fine as well, meh
14:45 karolherbst: in heaven I get a "heaven_x64: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_util.cpp:119: bool nv50_ir::Interval::extend(int, int): Assertion `a <= b' failed."
14:45 karolherbst: so I am pretty sure somewhere in the list is odd
14:47 karolherbst: although
14:47 karolherbst: is ++end() == end() ?
14:48 glennk: uhm, stepping past end is pretty undefined what its going to do
14:48 imirkin_: you can't ++end
14:49 karolherbst: mhh, no I don't ++end() actually
14:50 karolherbst: begin() == end() <=> empty() right?
14:50 karolherbst: then its fine
14:50 imirkin_: yes
14:50 karolherbst: yeah I never call this with an empty list
14:50 karolherbst: because this only happens with an empty bb
14:50 karolherbst: added some checks, still the issue
14:51 karolherbst: ohhh
14:52 karolherbst: something is odd with the switch (insn->op) { part
14:55 tobijk: whats with that?
14:56 karolherbst: I have to save the order of those, maybe I will add a special list for them instead :/
15:03 karolherbst: okay, now I have an algorithm which will schedule everything, but doesn't respect the order of those pre* instructions, also I get a serious spilling issue
15:07 karolherbst: imirkin_: spilling issues is just the result of a real bad order?
15:08 imirkin_: welllll
15:08 imirkin_: it should still _work_
15:08 imirkin_: just be super-slow
15:08 imirkin_: when you try to have 800 live values
15:08 karolherbst: yeah I see that
15:08 karolherbst: 2-3 fps
15:08 tobijk: hehe
15:08 karolherbst: some shader fail in compilation
15:08 imirkin_: there may be untested issues if you spill more than like 256 values
15:08 imirkin_: or 64, dunno
15:08 karolherbst: error -4?
15:08 imirkin_: yeah i have no idea what that is
15:09 karolherbst: okay
15:09 imirkin_: i suspect that spilling + reordering might be fighting too
15:09 imirkin_: so you might want to just reorder once before the whole "ra; spill; repeat" thing
15:10 karolherbst: ahh and skip the second one?
15:10 imirkin_: but the old orderInstructions did stuff
15:10 imirkin_: so you still need to do what it did before
15:10 karolherbst: okay, but that was trivial
15:11 imirkin_: yeah
15:11 karolherbst: I should keep that in "src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp" ?
15:11 imirkin_: but it still needs to be done, maybe
15:11 imirkin_: huh?
15:11 imirkin_: you shouldn't be touching that file
15:11 imirkin_: all your stuff should be generic
15:11 imirkin_: and call out to the target when you need to get e.g. latency info
15:11 imirkin_: (prog->target->...)
15:11 karolherbst: there is a "fn->orderInstructions(insns);"
15:11 imirkin_: oh right
15:11 imirkin_: after the texbar
15:12 imirkin_: mmmmmmmmmmmmmm
15:12 imirkin_: i'd leave it out
15:12 karolherbst: okay
15:12 imirkin_: perhaps you should make your thing be fn::reorderInstructions
15:12 karolherbst: like completly or do the old stuff there
15:12 imirkin_: and only call it once
15:12 imirkin_: i think do the old stuff
15:13 imirkin_: and/or evaluate whether that old stuff is even necessary, i really don't know
15:17 karolherbst: why do I get those "ERROR: failed to coalesce phi operands"?
15:18 karolherbst: I don't touch the phi instructions except that I push them out immediately
15:18 imirkin_: aha! it's an opt-gone-bad that breaks tes clipdist input
15:18 imirkin_: that's nice
15:19 tobijk: imirkin_: hm which opt? what does it do
15:20 imirkin_: errrrrr
15:20 tobijk:is curious
15:20 imirkin_: wtf. it just started passing
15:20 karolherbst: okay, my stuff kind of works, as long as I don't move the instructions too much away
15:21 karolherbst: I think I have to fix this OP_JOINAT first
15:21 karolherbst: and stuff
15:21 tobijk: karolherbst: if you move em far away you need too many registers, imho
15:23 tobijk: imirkin_: what am i missing for relaying cull behind clip: http://hastebin.com/ajuquvomah.coffee
15:23 tobijk: that should be enough i hope...
15:23 imirkin_: not sure what your question is
15:23 imirkin_: but i don't have time to look into it right now, sorry
15:26 imirkin_: tobijk: pushed out the cvt folding patch
15:26 imirkin_: tobijk: only took a year ;)
15:26 tobijk: imirkin_: we break every record with that :)
15:27 tobijk: imirkin_: huh why is that my commit, it's yours entirely now
15:27 imirkin_: wtvr, too late now
15:34 tobijk: imirkin_: are you aware of some easy to use function to write from a piglit shader_test to the console (directly from the shaders)
15:34 tobijk: worst thing you can imagine for performance :D
15:36 glennk: tobijk, write color to pixel, use probe to check value
15:36 tobijk: glennk: ah right, thx :)
15:36 imirkin_: tobijk: i've often wanted this
15:36 imirkin_: tobijk: but no.
15:37 imirkin_: you have to do it through stupid colors/etc
15:37 imirkin_: atomic counters should make it interesting
15:37 imirkin_: since you can basically use them to be like "condition X was hit N times"
15:37 tobijk: will make a nice color map to interpret :D
15:38 glennk: can look at the dx12 vs integration and drool a bit in the meanwhile
15:41 RSpliet: you should be able to make OpenCL kernels do printf
15:42 RSpliet: it's horrendous, but nothing more than a buffer write being read out by a driver of choice (well, except NVIDIA since they don't support CL 1.2)
15:42 glennk: is that per workgroup?
15:43 imirkin_: RSpliet: yeah, i was thinking of doing something like that with GL too :)
15:43 imirkin_: once ssbo is supported
15:43 glennk: the hardware has some debugging support
15:43 glennk: breakpoints, stepping etc
15:43 imirkin_: glennk: yeah, nvidia has single-step etc
15:43 imirkin_: glennk: but it's not integrated
15:43 imirkin_: at least not in nouveau
15:43 glennk: so does intel and radeon
15:43 imirkin_: we know how to operate it though
15:44 imirkin_: (on fermi at least... kepler is a bit diff)
15:44 glennk: not sure how one would expose it
15:45 RSpliet: with an eclipse-forked IDE of course
15:46 RSpliet: I think you can write a GDB stub if you want to...
15:46 glennk: would make more sense in apitrace or vogl
15:46 glennk: need the frame capture for it to work
15:55 imirkin_: glennk: so this is what the nvc0 insbf opcode does: ((a->data.u32 << offset) & bitmask) | (c->data.u32 & ~bitmask)
15:55 imirkin_: do you think it's worthwhile searching for that pattern?
15:56 imirkin_: i think we should just rewrite packUnorm/Snorm using insbf and be good for the most part
15:57 imirkin_: maybe based on whether ARB_gpu_shader5 is supported
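The insbf expression quoted above can be written out as a small standalone function. This is only a sketch of that one formula: the helper `fieldMask` and the idea of deriving `bitmask` from an offset plus a width are assumptions made for illustration, not taken from the nvc0 code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask of `width` bits starting at bit `offset`.
inline uint32_t fieldMask(unsigned offset, unsigned width) {
    assert(offset < 32);
    const uint32_t low = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    return low << offset;
}

// Same shape as the expression in the log:
//   ((a << offset) & bitmask) | (c & ~bitmask)
// i.e. insert the low bits of `a` into `c` at the given field.
inline uint32_t insbf(uint32_t a, uint32_t c, unsigned offset, unsigned width) {
    const uint32_t bitmask = fieldMask(offset, width);
    return ((a << offset) & bitmask) | (c & ~bitmask);
}
```

For example, `insbf(0xAB, 0xFFFFFFFF, 8, 8)` yields `0xFFFFABFF`: bits 8..15 come from `a`, everything else from `c`.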
15:57 karolherbst: imirkin_: I can't move these joinat or PRE* instruction across fixed ones, can I?
15:57 tobijk: grml, i still overwrite clip with cull
15:58 imirkin_: karolherbst: mmmmmmm you probably can
15:58 imirkin_: karolherbst: but you shouldn't
15:59 glennk: imirkin_, yeah, the gs5 opcodes should cover it
15:59 karolherbst: imirkin_: okay, so I should just scan all instructions between fixed ones to find those, and just add them last?
15:59 karolherbst: in the same relative order
15:59 glennk: i was actually a bit surprised those pack/unpack functions were 4.1 and not 4.0
16:00 karolherbst: because tobijk is right, these are causing these issues
16:00 glennk: karolherbst, i would presume "fixed" here means don't move any ops across it
16:00 karolherbst: I can now reschedule everything, maybe <10% aren't scheduled and it works as expected
16:01 karolherbst: yeah, but imirkin_ also said these can be moved elsewhere inside a BB
16:01 karolherbst: so
16:01 karolherbst: what is stronger
16:01 imirkin_: karolherbst: yeah, i mean in general you can move them, but in practice you should leave them where they are
16:01 imirkin_: i.e. not treat them specially
16:02 glennk: think of it in terms of dependencies
16:02 glennk: fixed - has dependency on all prior instructions in bb
16:02 glennk: and all later instructions are dependent on it in turn
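glennk's dependency view of a "fixed" instruction can be sketched as follows. The function name `barrierDeps` and the index-based representation are hypothetical. As a simplification, each fixed instruction only lists the instructions since the previous fixed one (plus that one), on the assumption that the rest follows transitively.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: isFixed[i] marks instruction i of a basic block as
// fixed. Returns deps, where deps[i] lists the instructions that must be
// scheduled before instruction i because of fixed instructions alone.
std::vector<std::vector<size_t>> barrierDeps(const std::vector<bool> &isFixed) {
    std::vector<std::vector<size_t>> deps(isFixed.size());
    bool haveBarrier = false;
    size_t lastBarrier = 0;
    for (size_t i = 0; i < isFixed.size(); ++i) {
        if (isFixed[i]) {
            // A fixed op depends on everything before it; older instructions
            // are already ordered before the previous fixed op.
            const size_t first = haveBarrier ? lastBarrier : 0;
            for (size_t j = first; j < i; ++j)
                deps[i].push_back(j);
            haveBarrier = true;
            lastBarrier = i;
        } else if (haveBarrier) {
            // Every later instruction depends on the most recent fixed op.
            deps[i].push_back(lastBarrier);
        }
    }
    return deps;
}
```

Real data dependencies between sources and definitions would be added on top of these edges; they are left out here.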
16:02 karolherbst: imirkin_: okay so these are just instructions which are best to be last
16:03 glennk: i don't understand what you mean by "last" here
16:03 imirkin_: karolherbst: don't treat them specially.
16:03 imirkin_: oh, but you can't randomly reorder them
16:03 imirkin_: i see the issue.
16:03 karolherbst: yeah
16:03 karolherbst: that's why
16:03 imirkin_: treat them as fixed then
16:03 karolherbst: mhh okay
16:04 imirkin_: and we'll sort out the details later
16:04 karolherbst: good idea by the way :/
16:04 karolherbst: should have thought about that too
16:04 karolherbst: decreases scheduling potential, but well
16:05 glennk: sometimes one can speculate on values but that's more complex
16:07 karolherbst: imirkin_: OP_PRESIN and OP_PREEX2 too?
16:07 imirkin_: karolherbst: no, those are normal ops
16:08 karolherbst: okay
16:08 imirkin_: karolherbst: basically various math cleverness
16:08 glennk: sin/exp2 initial estimate?
16:08 imirkin_: i think presin normalizes the angle
16:08 imirkin_: and preex2 does... who knows.
16:10 karolherbst: okay, still get spilling issues :/
16:10 karolherbst: ahh I should reorder once
16:11 karolherbst: mhh
16:11 karolherbst: where is the right position for this?
16:11 imirkin_: outside the loop where it sits right now
16:11 imirkin_: (before it)
16:11 karolherbst: okay
16:12 karolherbst: okay...
16:12 karolherbst: "nouveau_compiler: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:322: void nv50_ir::RegAlloc::BuildIntervalsPass::addLiveRange(nv50_ir::Value*, const nv50_ir::BasicBlock*, int): Assertion `bb->getExit()->serial + 1 >= end' failed."
16:12 karolherbst: I guess this is validated somewhere
16:12 karolherbst: and I have to fill the list
16:13 imirkin_: you still might have to do the old logic
16:13 imirkin_: of the orderInstructions
16:13 imirkin_: in the spot where it used to be
16:13 karolherbst: yep
16:17 karolherbst: imirkin_: where do I have to save the list then? Because otherwise my changes are for nothing I assume
16:17 imirkin_: outside the loop
16:17 imirkin_: just once
16:18 karolherbst: yeah I understood that part
16:18 karolherbst: but the ArrayList where I write those instructions in, these have to put into the cfg
16:19 karolherbst: or should I just do it in orderInstructions then?
16:30 karolherbst: mhh, doesn't get better although I only run it once now
17:10 karolherbst: imirkin_: OP_LOAD is load from memory?
17:10 imirkin_: karolherbst: yeah
17:10 karolherbst: treat as fixed?
17:10 imirkin_: karolherbst: can be lmem though (which is where spills go)
17:11 imirkin_: no
17:11 imirkin_: er
17:11 imirkin_: yes.
17:11 karolherbst: for now
17:11 imirkin_: for now :)
17:11 karolherbst: :D
17:11 imirkin_: you could assume it depends on any previous store to the same memory file
17:11 imirkin_: or to be fancier, only to the same memory location or indirect store
17:12 karolherbst: sadly moving the stuff up causes serious other issues somewhere else :/ so I try rather not to destroy what the spilling logic did
17:12 imirkin_: but you can do that in v2
17:12 karolherbst: current state now is that I schedule everything, but get this spilling fails
17:13 karolherbst: what I don't understand is, why I get something like that: "0: linterp pass f32 %r5894 a[0x7c] (0)"
17:13 karolherbst: shouldn't be the reg number much lower?
17:17 karolherbst: ohh wow, after the first scheduling pass the instructions are so bad, that because of the load restriction I hardly get more than blocks of 10 instructions to schedule :/
17:17 karolherbst: imirkin_: OP_STORE also fixed?
17:18 karolherbst: ok, slightly better now
17:19 imirkin_: yes
17:19 imirkin_: er wait
17:19 imirkin_: OP_LOAD can also be a constbuf load
17:19 imirkin_: which isn't fixed
17:20 karolherbst: how can I check against that?
17:21 imirkin_: src->... something
17:21 imirkin_: grep around for FILE_
17:22 karolherbst: mhh in which file? or all?
17:22 karolherbst: ohh
17:22 karolherbst: I think I found it
17:22 karolherbst: FILE_MEMORY_CONST ?
17:23 karolherbst: I see stuff like "if (i->srcExists(0) && i->src(0).getFile() == FILE_MEMORY_CONST)"
17:23 karolherbst: ohhh srcExists...
17:23 karolherbst: what a nice function
17:23 imirkin_: yeah, basically you want to do like
17:24 imirkin_: for (i = 0; srcExists(i); i++)
17:24 imirkin_: instead of the srcCount nonsense
17:24 karolherbst: okay
17:24 karolherbst: and then insn->getSrc(i) is never NULL
17:24 karolherbst: yep, looks much nicer
17:28 karolherbst: for load I have to check the src(0) against FILE_MEMORY_CONST I guess
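The rule worked out in this exchange (stores fixed, loads fixed unless they read the constant buffer, sources walked with `srcExists`) might look roughly like this. The `Insn` struct and both enums are simplified stand-ins invented for the sketch, not the real nv50_ir classes. Only the names `srcExists`, `OP_LOAD`, `OP_STORE` and `FILE_MEMORY_CONST` come from the discussion.

```cpp
#include <cassert>
#include <vector>

// Stand-in types for illustration only (assumptions, not the real nv50_ir API).
enum DataFile { FILE_GPR, FILE_MEMORY_CONST, FILE_MEMORY_LOCAL };
enum Op { OP_MOV, OP_LOAD, OP_STORE };

struct Insn {
    Op op;
    std::vector<DataFile> srcFiles; // register/memory file of each source

    bool srcExists(unsigned s) const { return s < srcFiles.size(); }
    DataFile srcFile(unsigned s) const {
        assert(srcExists(s));
        return srcFiles[s];
    }
};

// Stores are always treated as fixed; loads are fixed unless they read
// constant memory, which never changes and can be loaded at any point.
bool treatAsFixed(const Insn &i) {
    if (i.op == OP_STORE)
        return true;
    if (i.op == OP_LOAD)
        return !(i.srcExists(0) && i.srcFile(0) == FILE_MEMORY_CONST);
    return false;
}

// Walking sources with srcExists() instead of a separate source count.
unsigned countConstSources(const Insn &i) {
    unsigned n = 0;
    for (unsigned s = 0; i.srcExists(s); ++s)
        if (i.srcFile(s) == FILE_MEMORY_CONST)
            ++n;
    return n;
}
```

This is the conservative "for now" version; the finer-grained idea of depending only on earlier stores to the same memory file is not modelled.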
17:29 karolherbst: okay, now its worse with the spilling again :/
17:32 karolherbst: imirkin_: want to see some stuff from simpler gpu code?
17:33 imirkin_: simpler?
17:33 imirkin_: not sure what you're suggesting
17:33 karolherbst: I mean this heaven shader is pretty big
17:33 imirkin_: right
17:33 imirkin_: it's not a good starting example
17:33 karolherbst: https://gist.github.com/karolherbst/58baf0b57ec88f54dd62
17:33 imirkin_: i just extracted it for you to help you test
17:33 karolherbst: this is glxspheres
17:33 karolherbst: "left to be scheduled 1" these are for indirects I think
17:33 karolherbst: because I don't handle them right now
17:34 karolherbst: ohh
17:34 karolherbst: nothing happens there
17:34 imirkin_: there are no indirects...
17:34 karolherbst: mhhh
17:34 karolherbst: maybe they are really not schedulable
17:34 karolherbst: but this should never happen :D
17:35 karolherbst: maybe all fixed?
17:35 imirkin_: lol
17:35 imirkin_: no
17:35 imirkin_: but read the code
17:35 imirkin_: each op depends on the result of the initial vfetch
17:35 imirkin_: and a bunch of interdependencies further on too
17:35 imirkin_: it's doing a matrix multiply
17:36 karolherbst: ohh
17:36 karolherbst: I disabled my scheduling in the code :(
17:37 karolherbst: updated the gist
17:37 karolherbst: "left to be scheduled 0" :)
17:37 karolherbst: nice
17:38 karolherbst: and why is this split now? :/
17:38 karolherbst: imirkin_: https://gist.github.com/karolherbst/a0e94c1919b5465e0515
17:39 imirkin_: what do you want me to look at
17:39 karolherbst: is there something really bad or odd or something
17:39 imirkin_: oh i see
17:39 karolherbst: or is it fine this way
17:39 imirkin_: sec
17:40 imirkin_: i'm a bit confused as to why your thing is running post-RA
17:40 imirkin_: it should run pre-RA
17:40 imirkin_: oh, it's the texbar thing
17:40 imirkin_: fix that up :)
17:40 imirkin_: i.e. call the old logic in insertTextureBarriers()
17:41 imirkin_: not that this would improve things, but "26: ld u32 %r188 c0[0x6c] (0)" can happen a lot earlier
17:41 imirkin_: i.e. it doesn't depend on anything
17:41 karolherbst: yeah I set load as fixed for now
17:41 karolherbst: or should I remove that again
17:42 karolherbst: ohh const
17:42 karolherbst: wait
17:42 imirkin_: const is const
17:43 imirkin_: you can load it whenever
17:43 karolherbst: updated: https://gist.github.com/karolherbst/a0e94c1919b5465e0515
17:45 karolherbst: "schedulables: 7" these prints just tell how many instructions were schedulable at the start of processing a chunk of instructions
17:48 imirkin_: cool
17:49 karolherbst: I try to move the tex instruction up a bit, want to see how the texbar behaves in that case
17:54 karolherbst: wow, also inserting the last instruction first really messes the compiler up :O
17:54 karolherbst: like infinite loop
18:00 karolherbst: ohh the tex instructions can't be moved up :/
19:14 karolherbst: okay, indirects also done now
19:18 karolherbst: imirkin_: any idea with what I should start now to schedule?
19:19 imirkin_: did you do the thing i said
19:19 imirkin_: or are you still computing lists and scheduling in bulk?
19:19 karolherbst: with bulk you mean what?
19:20 imirkin_: i mean... are you computing the "all ops without deps" bs and scheduling them first
19:20 imirkin_: and THEN scheduling other ops
19:20 imirkin_: or did you do the thing i said
19:20 karolherbst: not anymore
19:20 imirkin_: and keep track of a schedulable list at all times
19:20 imirkin_: and then schedule one instruction
19:20 imirkin_: update list
19:20 imirkin_: repeat
19:20 karolherbst: yeah should work now
19:20 imirkin_: ok cool -- so try to implement the random strategy
19:20 karolherbst: I don't have any leftovers in unigine
19:20 imirkin_: i.e. pick a *random* schedulable instruction
19:20 imirkin_: and schedule it
19:21 karolherbst: random ala pseudo or real random? :D
19:21 imirkin_: well, it should be a diff thing on every run
19:21 imirkin_: but you can seed it
19:21 karolherbst: okay
19:21 imirkin_: basically this will allow you to really verify your logic and have confidence in it
19:21 imirkin_: the next thing is to start keeping track of live values
19:21 imirkin_: i.e. how many values does this BB have in play
19:22 karolherbst: okay
19:22 imirkin_: (at the current scheduling-decision-making time)
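The loop imirkin_ describes (keep a schedulable list, pick one entry at random, update the list, repeat) is sketched below on plain indices. The function name `randomSchedule` and the `deps` representation are assumptions for illustration. Live-value tracking is not included.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: deps[i] lists the instructions that must be scheduled
// before instruction i. Returns one valid order, chosen pseudo-randomly from
// the given seed.
std::vector<size_t> randomSchedule(const std::vector<std::vector<size_t>> &deps,
                                   unsigned seed) {
    const size_t n = deps.size();
    std::vector<size_t> pending(n, 0);        // unscheduled deps per instruction
    std::vector<std::vector<size_t>> users(n); // reverse edges
    for (size_t i = 0; i < n; ++i) {
        pending[i] = deps[i].size();
        for (size_t d : deps[i]) {
            assert(d < n);
            users[d].push_back(i);
        }
    }

    std::vector<size_t> ready; // the schedulable list, kept current at all times
    for (size_t i = 0; i < n; ++i)
        if (pending[i] == 0)
            ready.push_back(i);

    std::mt19937 rng(seed);
    std::vector<size_t> order;
    while (!ready.empty()) {
        // Pick one random schedulable instruction and schedule it.
        std::uniform_int_distribution<size_t> pick(0, ready.size() - 1);
        const size_t slot = pick(rng);
        const size_t insn = ready[slot];
        ready[slot] = ready.back();
        ready.pop_back();
        order.push_back(insn);

        // Update the list: users whose last dependency was just scheduled
        // become schedulable.
        for (size_t u : users[insn])
            if (--pending[u] == 0)
                ready.push_back(u);
    }
    return order;
}
```

If the returned order is shorter than `deps.size()`, some instruction never became schedulable. Checking that size after every run is a cheap way to catch a lost instruction.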
19:22 karolherbst: by the way: using the last in the list didn't work
19:22 karolherbst: but ohh
19:22 imirkin_: i have no clue what that means
19:22 karolherbst: wait
19:22 karolherbst: this could have been an issue on my side
19:23 karolherbst: I mean I just get the last one from the list of schedulable instructions
19:23 imirkin_: that should work
19:23 imirkin_: if it doesn't, you're in trouble.
19:24 karolherbst: seems like I'm in trouble then :/
19:25 karolherbst: oh well
19:42 karolherbst: I am too tired now anyway
20:56 imirkin: skeggsb: good times with dp?
20:56 skeggsb: imirkin: wow, you noticed those quickly, i only just pushed it
20:56 skeggsb: debugging a board a guy in the office dumped on my desk :P
20:58 imirkin: i hit reload a lot :)
21:06 imirkin: there was also someone in bugs.fd.o who was having DP issues on a GM20x
21:07 imirkin: perhaps related
22:51 imirkin: skeggsb: btw, what do you think about karolherbst's patch to fix up that second clock on kepler? it seems to have made reclocking work for at least a few people with dGPU's... not 100% stable, but a whole lot more stable than before
22:52 imirkin: skeggsb: afaik it's this one: https://github.com/karolherbst/nouveau/commit/6933ebb2480bb62534648c180501c5bad6d2c514.patch