01:47 vducuy: Hi all, when I start the ChromeOS UI, it shows me this error:
01:47 vducuy: nouveau D[ DEVICE][0000:01:00.0][0x80000080] illegal class 0xb06f nouveau D[ DEVICE][0000:01:00.0][0x80000080] handle 0xcccc0000 not found
01:49 vducuy: after a few tries I see nouveau D[chrome[4452]] ib channel create, -22
01:49 vducuy: anyone have ideas about this bug?
01:49 vducuy: thanks
02:01 karolherbst: imirkin: this list of instructions looks right, doesn't it? https://gist.github.com/karolherbst/82098b504204c61a9e2e I still get a range assertion error: bool nv50_ir::Interval::extend(int, int): Assertion `a <= b' failed. a=46-49 (different each run) and b=45
02:06 karolherbst: here is one full run: https://gist.github.com/karolherbst/bca061bc008ee21071c2
02:30 karolherbst: I guess something gets scheduled which can't be, but why ...
02:31 karolherbst: 45 needs 47 first? :/ sad
02:31 karolherbst: okay, found it
04:32 karolherbst: ohh, I need to change the entry instruction of a block :O totally forgot about this
04:55 karolherbst: okay, so I'm missing some dependencies :/
05:26 RSpliet: shakesoda: sorry for chasing you, but are you game for seeing if my patches fix reclocking to the highest level for you? I'd be really grateful if you could test it, since I don't have a card that expects similar behaviour, but if you're not, then I'll go and look for a different testing mechanism
06:04 karolherbst: mhhh strange
06:05 karolherbst: wow
06:05 karolherbst: imirkin: can OP_MUL have side effects?
06:06 karolherbst: these changes really mess glxgears up: https://gist.github.com/karolherbst/352c69840736f4c71b37
06:06 karolherbst: only some muls are switched
06:07 RSpliet: karolherbst: you missed a dependency
06:08 karolherbst: where
06:08 RSpliet: line 1043, 1044, 1045 require $0 before modified in line 1046
06:09 karolherbst: ohhhhhh
06:09 RSpliet: your scheduling pass flipped them around
06:09 karolherbst: that's a tricky one
06:09 karolherbst: I am sure it doesn't show up as a dep then
06:09 RSpliet: it isn't in SSA, but... after RA it gets more interesting :-P
06:10 karolherbst: mhh
06:10 karolherbst: so I have a const memory issue somewhere
06:11 karolherbst: RSpliet: will the instruction defs/source tell me this dependency?
06:11 karolherbst: all three clearly depend on mul ftz f32 $r0 $r0 c0[0x0] then
06:11 RSpliet: I don't know NV50IR that well
06:11 karolherbst: I have this issue somewhere else with other ops too
06:11 karolherbst: I try to treat const memory as fixed for now
06:12 RSpliet: basically you should assume the mul ftz f32 $r0 $r0 c0[0x0] (0) isn't schedulable yet, because there are remaining uses of $r0
06:13 RSpliet: liveness has two bounds :-P
06:14 RSpliet: not sure what the best way is to represent schedulability in a graph, I'm sure there's literature on it though
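RSpliet's point can be made concrete. The sketch below is a hypothetical toy (the `Insn` struct and `buildDeps` are illustrative names, not nv50_ir API): after register allocation, an instruction that overwrites a register must stay after every earlier reader of that register (write-after-read) and after its previous writer (write-after-write), in addition to the usual read-after-write edges. Before RA, SSA gives every definition a fresh value, so only read-after-write edges exist, which is why scheduling there is easier.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy instruction after register allocation: the physical
// registers it writes (defs) and reads (srcs). Not nv50_ir's types.
struct Insn {
    std::vector<int> defs;
    std::vector<int> srcs;
};

// deps[i] = earlier instructions that must stay before instruction i.
std::vector<std::set<int>> buildDeps(const std::vector<Insn> &code) {
    std::vector<std::set<int>> deps(code.size());
    std::map<int, int> lastWriter;            // reg -> index of its last def
    std::map<int, std::vector<int>> readers;  // reg -> reads since that def
    for (int i = 0; i < (int)code.size(); ++i) {
        for (int r : code[i].srcs) {
            auto w = lastWriter.find(r);
            if (w != lastWriter.end())
                deps[i].insert(w->second);    // RAW: reads what was written
        }
        for (int r : code[i].defs) {
            for (int j : readers[r])
                deps[i].insert(j);            // WAR: the edge SSA hides
            auto w = lastWriter.find(r);
            if (w != lastWriter.end())
                deps[i].insert(w->second);    // WAW: two writes, same reg
        }
        // Record this instruction's reads and writes only now, so that
        // something like `mul $r0 $r0 c0[0x0]` never depends on itself.
        for (int r : code[i].srcs)
            readers[r].push_back(i);
        for (int r : code[i].defs) {
            lastWriter[r] = i;
            readers[r].clear();               // earlier readers already ordered before i
        }
    }
    return deps;
}
```

With three readers of register 0 followed by a write to it (the pattern RSpliet spotted), the write picks up WAR edges to all three readers, and a later reader depends on the write.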
06:17 karolherbst: basically I do this:
06:17 karolherbst: find all instructions without dependencies inside a chunk of instructions
06:17 karolherbst: a chunk is between non-schedulable instructions like exit, phi, fixed and so on
06:17 RSpliet: "chunk of instructions" being a BB?
06:17 karolherbst: smaller
06:18 karolherbst: so, then I have some schedulable instructions
06:18 karolherbst: of those I schedule one
06:18 karolherbst: then I check all defs of that instruction
06:19 RSpliet: well, but anyway, yes, it sounds likely that within any chunk of a program, if you reschedule while adhering to liveness you will not violate execution beyond that chunk
06:19 karolherbst: if one source of a def is still in my list of non schedulable instructions, I can't schedule this def
06:19 karolherbst: the others I add to my list of schedulable instructions
06:19 karolherbst: and go on
06:20 karolherbst: it also works as long as I take the first instruction of that list
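The procedure described above is a form of list scheduling. A minimal sketch, assuming the dependencies inside one chunk are already known as predecessor lists (`listSchedule` and the `pick` callback are hypothetical names, not the code from the linked commit): an instruction enters the ready list only when all of its predecessors have been scheduled, so any choice from the ready list, first, last or random, yields a valid order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// preds[i] lists the instructions that must be scheduled before i.
// pick() returns an index into the ready list, so front, back and random
// policies all plug in here.
std::vector<int> listSchedule(const std::vector<std::vector<int>> &preds,
                              const std::function<std::size_t(const std::vector<int> &)> &pick) {
    const int n = (int)preds.size();
    std::vector<std::vector<int>> succs(n);
    std::vector<int> pending(n, 0);              // unscheduled predecessors
    for (int i = 0; i < n; ++i)
        for (int p : preds[i]) {
            succs[p].push_back(i);
            ++pending[i];
        }
    std::vector<int> ready, order;
    for (int i = 0; i < n; ++i)
        if (pending[i] == 0)
            ready.push_back(i);
    while (!ready.empty()) {
        std::size_t k = pick(ready);
        assert(k < ready.size());
        int insn = ready[k];
        ready.erase(ready.begin() + k);
        order.push_back(insn);
        // A successor becomes ready only once *all* of its predecessors
        // are scheduled, not as soon as this one is.
        for (int s : succs[insn])
            if (--pending[s] == 0)
                ready.push_back(s);
    }
    assert((int)order.size() == n);              // fails if preds had a cycle
    return order;
}
```

If picking the last element breaks while picking the first works, that usually means an instruction was admitted to the ready list before all of its predecessors were done.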
06:20 RSpliet: so, that notion of "no dependencies" sounds a bit vague to me
06:20 karolherbst: if I take the last, I get issues
06:20 RSpliet: every instruction depends on its inputs
06:20 RSpliet: whether it's a register or a register location
06:21 RSpliet: *memory location
06:21 karolherbst: RSpliet: https://github.com/karolherbst/mesa/commit/7060b186748325606c61fcd477c230057e1fb77e#diff-bb3cc04dda7921a13da7e4e48cc6166dR465
06:22 karolherbst: bad name: "isInList"; it's basically the list of instructions which are not schedulable yet in a chunk
06:22 RSpliet: sorry, I'm not in a position currently to read code
06:22 karolherbst: ohh I see
06:22 karolherbst: but do I have to check anything other than getIndirect and getSrc?
06:22 karolherbst: I mean I just check the instruction of those
06:23 karolherbst: but thanks for the pointer with the c memory access
06:23 karolherbst: wouldn't have found that
06:30 karolherbst: RSpliet: I bet you have no clue how I could check that?
06:32 RSpliet: well, if you look at the SSA you'll see that $r0 is originally %r113, and the write is to %r115
06:33 RSpliet: in other words, before RA they are not the same register (impossible in SSA, hence it's so nice for liveness analysis and thus optimisations)
06:35 RSpliet: that's not very helpful I guess :-P
06:35 karolherbst: ahhh
06:35 karolherbst: okay
06:36 karolherbst: imirkin already said it may be best to run only once
06:36 karolherbst: but somehow I really messed that up
06:36 karolherbst: and not in RA
06:36 RSpliet: before RA is definitely easier
06:36 karolherbst: yeah
06:36 karolherbst: I mean it also runs in pre RA
06:36 karolherbst: but it also runs in RA
06:37 RSpliet: that's probably undesirable regardless
06:37 karolherbst: wait a second
06:37 karolherbst: that's strange
06:38 karolherbst: the instructions are also switched in pre-RA
06:38 RSpliet: that's what you get when you take the last in your block instead of the first :-P
06:38 RSpliet: but in pre-RA this is not invalid
06:38 karolherbst: but it has to work this way
06:39 karolherbst: because in the end a well-chosen one is picked from that list
06:39 karolherbst: or a random one
06:39 karolherbst: and it shouldn't matter which one
06:39 RSpliet: we're talking past each other now
06:39 karolherbst: okay
06:40 karolherbst: " that's what you get when you take the last in your block instead of the first" I am not doing this
06:40 karolherbst: I create a list of schedulable instructions, which have no dependencies on each other or on any of the remaining instructions, and just take the last of them
06:41 RSpliet: okay, so not the last of the block, but the last of the list
06:41 RSpliet: which is those four instructions
06:41 karolherbst: mhh wait, I think I found the issue
06:42 karolherbst: ohh I don't check the list of schedulable instructions
06:42 karolherbst: ...
06:42 RSpliet: but, as I said, the same instruction shuffle in SSA is not incorrect
06:42 karolherbst: nice, works now
06:42 RSpliet: (just pointless, but we'll get to the meaningful scheduling later :-P)
06:42 karolherbst: https://github.com/karolherbst/mesa/commit/7060b186748325606c61fcd477c230057e1fb77e#diff-bb3cc04dda7921a13da7e4e48cc6166dR495 this line
06:43 glennk: scheduling before RA - lots of freedom, issues with register pressure, after RA, little wiggle room to do anything significant
06:43 karolherbst: had to add something like that "&& isSchedulable(insn, output, bb, schedulable)"
06:43 karolherbst: mhh
06:43 karolherbst: okay, issue found, nice
06:44 karolherbst: glxspheres64 also runs now
06:44 karolherbst: ohhhh well
06:44 karolherbst: if heaven runs now I am angry
06:45 RSpliet: I'd recommend a more joyous emotion, if you have the choice
06:45 karolherbst: I mean I knew it was some bs like this all along :D
06:46 karolherbst: and I shouldn't add something inside a bb which doesn't belong there
06:46 karolherbst: wow
06:46 karolherbst: this shouldn't happen
06:46 karolherbst: not only once
06:46 karolherbst: another bs issue
06:46 karolherbst: oh well
06:48 RSpliet: value your bs, you might want to fertilise your soil at some point
06:48 RSpliet: glennk: scheduling after RA feels like walking on eggs
06:54 tobijk: so pre-ra just schedule somehow and make another pass post-ra for minor optimizations?! :)
06:54 glennk: usually just spill/reloads that get scheduled post RA
06:55 tobijk: mh ok
06:55 RSpliet: glennk: I keep asking everyone I bump into: got literature? :-D
06:56 tobijk:would be interested as well :D
06:56 imirkin: karolherbst: your pass needs to run once, before RA
06:56 glennk: usually a good bet to look at what production compilers do and read the papers they refer to
06:56 imirkin: i didn't say that just for the fun of it -- it won't work otherwise.
07:02 karolherbst: yeah I tried and failed to do so :/ have to figure out how to feed the cfg with the list of instructions without messing something up
10:29 pmoreau: Shouldn't lines 1118 and 1119 be swapped in nouveau_bo.c? Or will the initialisation of .exec by {} be different from NULL?
10:34 imirkin_: link for the lazy
10:34 pmoreau: :D Have to fire up a browser and start an X session
10:34 pmoreau: But will do ;)
10:35 pmoreau: imirkin_: http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nouveau_bo.c#n1118
10:42 xexaxo: pmoreau: iirc there was some issues reported so that one is intentional
10:42 pmoreau: xexaxo: Why not drop the line completely then?
10:43 imirkin_: pmoreau: ah yeah, it's been that way for a while
10:43 imirkin_: apparently PCRYPT is just way slower on nv98/nvaa/nvac
10:43 imirkin_: i haven't personally tested/seen this though
10:43 pmoreau: I saw that it was there from at least 2012 :)
10:43 imirkin_: i noticed this same issue like 2 years ago :)
10:43 pmoreau: :D
10:43 imirkin_: a comment would be nice
10:43 pmoreau: Yeah, +1
10:43 imirkin_: since it is a bit of a wtf
11:14 karolherbst: do I have to actually update this->insns, or is this only there for RA and it's already fine if I update the instructions inside the BB?
11:15 imirkin_: sorry, not sure
11:15 imirkin_: you'll have to work that out yourself i'm afraid
11:15 karolherbst: okay
11:16 karolherbst: because there is stuff like "BasicBlock::get(func->cfg.getRoot())->insertHead(nop);" going on before that loop
11:16 karolherbst: will try that out
11:16 karolherbst: seems the easiest way
11:25 karolherbst: wow seems to work
11:25 karolherbst: yeah
11:26 karolherbst: also getting them in reverse order works
11:26 imirkin_: nice
11:26 karolherbst: I should listen more to imirkin_ :D
11:26 imirkin_: i knew you'd learn eventually
11:26 imirkin_: ;)
11:26 karolherbst: although I knew I should do that, I gave up too early
11:27 karolherbst: current version: https://github.com/karolherbst/mesa/commit/3d4c29315573d973bab5182259c087631d3eb201
11:27 karolherbst: ohh right, for a stupid shader in heaven I fix the BB wrongly :/
11:27 karolherbst: will get to that later
11:28 imirkin_: you should leave const OP_LOAD's
11:33 karolherbst: completely?
11:33 imirkin_: i mean, you shouldn't treat them as fixed
11:33 karolherbst: ahh, the MAD thing is still in there :/
11:33 imirkin_: when they load constbuf values
11:33 karolherbst: yeah I know
11:33 karolherbst: okay
11:33 karolherbst: yeah was trying out stuff, there are a lot of leftovers
11:33 imirkin_: ok
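imirkin_'s advice about constant loads, combined with the chunking described earlier (chunks end at non-schedulable instructions such as exit, phi and fixed ones), could look like the following. The opcode and file enums are hypothetical stand-ins, not nv50_ir's; only the rule that a constbuf load is not a barrier comes from the discussion.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy opcodes and memory files; nv50_ir has its own enums.
enum class Op { Mul, Add, Load, Store, Phi, Exit };
enum class File { Reg, Const, Global, Local };

struct ToyInsn {
    Op op;
    File srcFile;   // file of the memory operand for Load/Store
};

// Instructions that end a scheduling chunk. A load from a constant buffer
// is *not* a barrier: constbufs are read-only from the shader's point of
// view, so such a load can move like an ALU op. Other loads are kept in
// place here only to stay conservative.
bool isBarrier(const ToyInsn &i) {
    switch (i.op) {
    case Op::Phi:
    case Op::Exit:
    case Op::Store:
        return true;
    case Op::Load:
        return i.srcFile != File::Const;
    default:
        return false;
    }
}

// Split a block into chunks of freely reorderable instructions; each
// barrier stays in place and separates the chunks around it.
std::vector<std::vector<ToyInsn>> splitChunks(const std::vector<ToyInsn> &bb) {
    std::vector<std::vector<ToyInsn>> chunks(1);
    for (const ToyInsn &i : bb) {
        if (isBarrier(i)) {
            if (!chunks.back().empty())
                chunks.emplace_back();
            continue;               // the barrier itself is not rescheduled
        }
        chunks.back().push_back(i);
    }
    if (chunks.back().empty())
        chunks.pop_back();
    return chunks;
}
```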
11:34 karolherbst: if picking random works for sure I will clean up a lot :D
11:34 imirkin_: what's wrong with MAD?
12:44 karolherbst: imirkin_: testing why glxgears failed
12:44 karolherbst: but its okay now
12:44 imirkin_: various opcodes have various restrictions, but i can't imagine why MAD would be in any way special wrt order of operations
12:44 imirkin_: ah ok
12:45 imirkin_: like most opcodes can only load immediates/constbufs from the second source
12:45 imirkin_: (and there's various logic to flip things around to try to make it happen, but it's not perfect... sadly has funny interactions with CSE)
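A hypothetical sketch of the operand flipping imirkin_ mentions, under the simplifying assumption that src0 must be a register and that swapping is only legal for commutative opcodes. The real legalization in nv50_ir handles more cases (and interacts with CSE, as noted); none of these names come from it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical operand model: many opcodes can only take an immediate or
// constbuf operand in the second source slot.
enum class Kind { Reg, Imm, ConstBuf };
enum class Op { Add, Mul, Sub };

struct BinOp {
    Op op;
    Kind src0, src1;
};

bool isCommutative(Op op) { return op == Op::Add || op == Op::Mul; }

// Try to make the instruction encodable by moving a non-register operand
// into src1. Returns false if it still needs a separate mov/load first.
bool legalizeSrc1(BinOp &i) {
    if (i.src0 == Kind::Reg)
        return true;                 // only src1 is (possibly) non-register
    if (i.src1 == Kind::Reg && isCommutative(i.op)) {
        std::swap(i.src0, i.src1);   // a + c[0] == c[0] + a
        return true;
    }
    // e.g. c[0] - a (real hardware may use negate modifiers; not modelled),
    // or two non-register sources
    return false;
}
```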
12:46 karolherbst: https://github.com/karolherbst/mesa/commit/007e1eb7be89bed2166bb2eb7d6f1a6c1d448909
12:47 imirkin_: does heaven work?
12:48 imirkin_: iirc there's a DefIterator type fyi
12:48 imirkin_: also please use ++it instead of it++
12:49 imirkin_: it++ can generate more code
12:49 imirkin_: [at least with C++98, not sure if it's been fixed in C++11]
12:50 RSpliet: imirkin_: well, it's still two different operations, but I can't imagine compilers don't optimise away the case where the value is unused
12:50 imirkin_: also things like alreadyScheduled should be an unordered_set, not a list, no?
12:50 imirkin_: RSpliet: well, iirc it was an issue with at least gcc 4.1
12:51 imirkin_: RSpliet: i think that the ++ does something which makes it very hard to optimize away
12:51 RSpliet: that's like saying there's an issue with nouveau on kernel 3.13 :-p
12:51 karolherbst: got nothing after my last comment (will look it up in the log)
12:51 imirkin_: RSpliet: ssssort of. i think the pace of gcc improvements is much lower.
12:51 RSpliet: var++ means "return old, increment", ++var means "increment, return new", I reckon the first one might require an extra register
12:51 imirkin_: RSpliet: heh, if only
12:52 karolherbst: mhh
12:52 imirkin_: RSpliet: actually ++it means "call void operator++()" while it++ means "call Foo& operator++()"
12:52 RSpliet: hah, oh yes, C++
12:52 imirkin_: or something along those lines
12:52 karolherbst: yeah I usually also use stuff++
12:52 imirkin_: and the two things need to do different things
12:52 karolherbst: don't know why I did that different there
12:53 imirkin_: and the ++it thing does less than it++
12:53 karolherbst: but for C++ it's a different story
12:53 RSpliet: imirkin_ they need to do precisely what I explained, but in order to do the first you can't get away with just a value and return type, but you need a ptr to your value
12:53 karolherbst: because its an operator call anyway
12:53 imirkin_: yeah, in C i always just do x++ (unless there's reason to the contrary)
12:53 imirkin_: while in C++, for iterators, always do ++it
12:53 imirkin_: because it++ does extra work that's not necessary
12:54 karolherbst: I think ++stuff is actually using stuff.operator++() too
12:54 imirkin_: or at least it used to
12:54 imirkin_: karolherbst: yeah, just a diff one
12:54 imirkin_: karolherbst: there are two diff operator++'s
12:54 karolherbst: ohh really
12:54 karolherbst: that sounds like a lot of error fun
12:54 imirkin_: Point& operator++(); // Prefix increment operator.
12:54 imirkin_: Point operator++(int); // Postfix increment operator.
12:54 imirkin_: that's the diff between them
12:54 karolherbst: ohhh
12:54 imirkin_: so postfix creates a *copy* of the object
12:55 karolherbst: yeah
12:55 karolherbst: it's a no-op in C++11
12:55 imirkin_: the "int" arg is totally fake, just to distinguish the two cases
12:55 karolherbst: but even in C++98 it's a no-op
12:55 imirkin_: since you can't do return value-based overloading
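The two overloads quoted above, made observable. This `Point` (a made-up example type) counts copy constructions: the prefix form returns a reference and copies nothing, while the postfix form has to copy the old value before incrementing.

```cpp
#include <cassert>

// Counts copy constructions so the difference between the two forms is
// visible at run time.
struct Point {
    int x = 0;
    static int copies;

    Point() = default;
    Point(const Point &o) : x(o.x) { ++copies; }
    Point &operator=(const Point &) = default;

    // Prefix: increment in place and return *this by reference. No copy.
    Point &operator++() {
        ++x;
        return *this;
    }

    // Postfix: the dummy int only selects this overload. The old value
    // must be saved, so a copy is made and returned by value.
    Point operator++(int) {
        Point old(*this);
        ++x;
        return old;
    }
};

int Point::copies = 0;
```

Because this copy constructor has a visible side effect, the compiler must keep the copy. For a trivially copyable iterator whose result is unused, the as-if rule lets an optimizer drop the copy entirely, which is why results differ between compilers and optimization levels; writing `++it` sidesteps the question.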
12:55 karolherbst: compilers are allowed to optimize it
12:55 karolherbst: even if it changes code
12:55 karolherbst: I played around with copies and stuff
12:55 imirkin_: karolherbst: yeah, but it still has to use the copy constructor/etc
12:55 karolherbst: like printing stuff into std::cout
12:55 karolherbst: nope
12:56 karolherbst: this is optimized away
12:56 karolherbst: even prints disappear randomly
12:56 imirkin_: anyways, the final thing was that there were things that gcc couldn't get rid of when doing "it++"
12:56 imirkin_: so you should always just do "++it"
12:56 imirkin_: karolherbst: not randomly. the rules are very deterministic (but very complex)
12:56 karolherbst: the compiler is allowed to change state and operations with copies
12:56 karolherbst: yeah okay.
12:57 karolherbst: but I am sure with gcc it doesn't matter
12:57 RSpliet: karolherbst: one way to find out
12:57 karolherbst: if you don't use the copy, one will never be made
12:57 RSpliet: objdump it
12:57 karolherbst: also return copy() is optimized away
12:58 karolherbst: okay
12:58 RSpliet: hehe, well, I didn't explicitly mean "you have to do it", rather just "I'm too lazy to" :-P
13:00 karolherbst: but yes it++ should generate a copy. I think with O1 it makes a difference?
13:00 imirkin_: karolherbst: i'm sure it used to matter. i haven't checked lately.
13:00 imirkin_: easy enough to just do ++it and move on
13:02 karolherbst: and actually no, heaven doesn't run yet
13:02 karolherbst: but this seems to be my fault
13:02 karolherbst: heaven_x64: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp:247: void nv50_ir::BasicBlock::remove(nv50_ir::Instruction*): Assertion `insn->bb == this' failed.
13:03 karolherbst: I would be glad to hear better ideas for finalizeList
13:03 karolherbst: ohh
13:03 karolherbst: I bet I call it with an empty list
13:03 karolherbst: ohh no, I don't
13:05 karolherbst: uhh, its something inside post RA
13:05 imirkin_: ok...
13:05 imirkin_: are you saying that post ra can't be affected by pre-ra? :)
13:06 karolherbst: I thought the error came from one of my calls to remove
13:07 karolherbst: at least something like glxspheres runs :/
13:09 karolherbst: hey ...
13:10 karolherbst: now my serials are messed up
13:17 karolherbst: shouldn't bb->insertHead(current); make current the new entry?
13:21 karolherbst: ohhh I see now
13:26 karolherbst: I should start with front() in finalizeList and not change it to back :O
13:28 karolherbst: nice, furmark runs with scheduling from behind
13:28 karolherbst: no performance penalty
13:29 imirkin_: does it use shaders with multiple instructions? i thought it was just fillrate
13:29 karolherbst: furmark is a bit intense
13:29 karolherbst: around 40 fps here
13:29 karolherbst: windowed
13:30 karolherbst: but it seems also highly memory limited
13:30 karolherbst: ohh
13:30 karolherbst: several programs with 100+ instructions
13:31 karolherbst: seems like a good start to play around?
13:31 karolherbst: some tex instructions
13:31 karolherbst: presin
13:32 karolherbst: split?
13:32 karolherbst: what does split do
13:32 imirkin_: it's a virtual op
13:32 glennk: imirkin_, furmark tries to mix fill and shader to burn as much power as possible
13:32 karolherbst: the one program actually starts with 12 schedulables
13:33 imirkin_: it converts a 2- 3- or 4-wide register into N 1-wide ones
13:33 glennk: that said on more recent cards its probably more of a fill test than a shader one
13:33 imirkin_: and merge is the opposite of that
13:33 karolherbst: okay
13:33 karolherbst: wanna see the programs?
13:33 imirkin_: sure
13:33 imirkin_: did you try the random thing with furmark?
13:34 imirkin_: you should do that + print stuff out + record measurements
13:34 karolherbst: imirkin_: https://gist.github.com/karolherbst/8f5f8cbb7903e7908d8b
13:34 imirkin_: maybe we'll find a really awesome schedule for it and would be good to know what it was
13:34 karolherbst: mhh I tried back() now
13:34 imirkin_: heh
13:34 karolherbst: I mean, this is with back
13:35 karolherbst: also nice "Failed to release test userptr object! (9) i915 kernel driver may not be sane!"
13:35 karolherbst: for a second I was worried
13:36 karolherbst: imirkin_: 07 pstate 10 fps, 0f pstate 42 fps
13:36 karolherbst: actually 50
13:37 karolherbst: 0a "only" 20
13:37 karolherbst: so I guess this benchmark is highly memory limited
13:37 karolherbst: lowest cstate on 0f: 33fps
13:38 karolherbst: wow, actually this is a nice benchmark to play with
13:38 karolherbst: raising the core clock improves performance up to 730 MHz
13:38 karolherbst: after that no change
13:39 karolherbst: imirkin_: I was thinking if it would improve anything if we move the tex instruction far away from texbar
13:39 karolherbst: if possible
13:40 imirkin_: karolherbst: yeah, but that requires cleverness
13:40 imirkin_: for now, just pick a random element
13:41 karolherbst: okay
13:41 karolherbst: ohh mhhh
13:41 karolherbst: about that unordered_set idea
13:41 karolherbst: don't know if it's a good thing currently
13:41 karolherbst: because hashing pointers => meh
13:41 imirkin_: ?
13:42 karolherbst: although, why not
13:42 karolherbst: right
13:42 imirkin_: worried about 36-bit pointers that don't fit into 32-bit words for when you plug this into a VAX?
13:42 karolherbst: nope
13:42 karolherbst: more about randomness thingies
13:42 karolherbst: using front() on an unordered_set should already be a little bit random, or not?
13:42 karolherbst: if there is front
13:43 imirkin_: well, i'm asking you to pick a random element, so i'm clearly not too concerned about randomness ;)
13:43 karolherbst: :D
13:43 imirkin_: anyways, you could stick them in hashed by their id
13:43 imirkin_: iirc that's stable
13:43 imirkin_: across invocations
13:43 imirkin_: and you can get them back out based on id as well
13:43 imirkin_: i think it's an index into the arena from which they're allocated
13:43 imirkin_: [hence the new_Instruction/etc idiocy]
13:44 karolherbst: nah, I just hash them by value
13:44 imirkin_: it has been a while since i looked into how that stuff worked tbh
13:44 imirkin_: i think i glanced at it, decided it was fine, then promptly forgot the details
13:44 karolherbst: std containers always work on the value
13:44 karolherbst: and if the value is a pointer, the pointer value will be hashed
13:45 imirkin_: right
13:44 karolherbst: these are not intended for use with pointers
13:45 imirkin_: but my point is you could hash on id
13:45 imirkin_: if you want the thing to be deterministic
13:45 karolherbst: mhhh
13:45 karolherbst: this is messy
13:45 imirkin_: with a custom hash thingie
13:45 karolherbst: but yeah, I could do that
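A sketch of hashing on a stable id rather than on the pointer value, assuming each instruction carries an `id` that is the same from run to run (imirkin_ recalls nv50_ir's being stable across invocations; `ToyInstruction`, `HashById` and `EqById` here are hypothetical). Pointer values can change between runs, for example with address-space randomization, so a set hashed on them may iterate in a different order each time; hashing on the id makes the order reproducible for the same insertion sequence and the same standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Toy stand-in for an IR instruction; `id` is assumed stable across runs.
struct ToyInstruction {
    int id;
};

// Hash and compare by id instead of by pointer value.
struct HashById {
    std::size_t operator()(const ToyInstruction *i) const {
        return std::hash<int>()(i->id);
    }
};
struct EqById {
    bool operator()(const ToyInstruction *a, const ToyInstruction *b) const {
        return a->id == b->id;
    }
};

using InsnSet = std::unordered_set<ToyInstruction *, HashById, EqById>;
```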
13:47 karolherbst: ohh right, sets are maps with value == key
13:47 imirkin_: :)
13:48 karolherbst: I always miss the get() operator
13:48 imirkin_: ->find
13:48 imirkin_: usually one also creates GetWithDefault or GetOrNull helpers
13:49 karolherbst: yeah but for find you need the value
13:49 karolherbst: you get the iterator for a value
13:49 karolherbst: unordered_set::find is something like exists
13:50 imirkin_: oh yeah, isn't that a thing?
13:50 karolherbst: for sets its odd
13:50 imirkin_: no you still need to find
13:50 imirkin_: and check that find != end
13:50 karolherbst: yeah I know
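The lookup idiom under discussion: `find()` returns an iterator that must be compared against `end()`. The helpers below are hypothetical versions of the `GetOrNull`/`GetWithDefault` wrappers imirkin_ mentions; for a set, the same idiom only answers a membership question, since the key is the value.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Pointer to the mapped value, or nullptr if the key is absent.
template <typename Map>
const typename Map::mapped_type *GetOrNull(const Map &m,
                                           const typename Map::key_type &k) {
    auto it = m.find(k);
    return it == m.end() ? nullptr : &it->second;
}

// Copy of the mapped value, or `def` if the key is absent.
template <typename Map>
typename Map::mapped_type GetWithDefault(const Map &m,
                                         const typename Map::key_type &k,
                                         const typename Map::mapped_type &def) {
    auto it = m.find(k);
    return it == m.end() ? def : it->second;
}

// For a set there is nothing to "get": find() can only tell you whether
// the value you already hold is in there.
template <typename Set>
bool Contains(const Set &s, const typename Set::key_type &k) {
    return s.find(k) != s.end();
}
```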
13:51 karolherbst: but you find with a value which may be in the set already
13:51 karolherbst: I mean, yeah its okay and right this way
13:51 karolherbst: but not like get()
13:51 karolherbst: like: I want a random value out of this set, but sets don't have anything like that
13:52 karolherbst: no front() or back()
13:52 imirkin_: no random algorithm?
13:52 karolherbst: not easily
13:52 karolherbst: especially not with O(1) access
13:53 karolherbst: for sets it's even O(n)
13:53 karolherbst: because you have to iterate and count
13:53 imirkin_: yea
13:53 karolherbst: there is only begin()
13:53 imirkin_: for that you need a list
13:53 karolherbst: yes
13:53 imirkin_: er, vector
13:53 karolherbst: right
13:53 karolherbst: so the question is: is set the right type for that anyway?
13:54 karolherbst: I mean, what will we mostly do with the instructions? We will always read all of them
13:55 karolherbst: and then pick the best
13:55 karolherbst: and remove it and insert new stuff
13:55 karolherbst: so remove and insert should be O(1)
13:55 karolherbst: and iteration O(n)
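One way to meet those requirements, sketched under the assumption that the order of the ready list does not matter: keep it in a `std::vector` and remove by moving the last element into the freed slot. Insertion and removal are O(1), iteration is O(n), and a random element can be picked in O(1), which neither `std::list` nor `std::unordered_set` offers. `ReadyList` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

template <typename T>
class ReadyList {
public:
    void insert(T v) { items_.push_back(std::move(v)); }
    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }
    const std::vector<T> &items() const { return items_; }

    // O(1) "swap and pop": the last element fills the hole.
    T removeAt(std::size_t k) {
        assert(k < items_.size());
        T v = std::move(items_[k]);
        if (k + 1 != items_.size())
            items_[k] = std::move(items_.back());
        items_.pop_back();
        return v;
    }

    // Uniformly random pick, also O(1).
    template <typename Rng>
    T removeRandom(Rng &rng) {
        assert(!items_.empty());
        std::uniform_int_distribution<std::size_t> dist(0, items_.size() - 1);
        return removeAt(dist(rng));
    }

private:
    std::vector<T> items_;
};
```

The catch is that swap-and-pop scrambles the order, so "take the front" or "take the back" no longer means anything in particular.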
13:56 RSpliet: if every iteration is O(n), you'll end up with a O(n^2) algo
13:56 karolherbst: yeah well
13:56 imirkin_: well, ultimately you want a pqueue
13:56 karolherbst: how can you iterate faster over a list if you need to read all?
13:56 karolherbst: yes
13:56 imirkin_: also the size of that list should ideally never be too large
13:56 RSpliet: the trick is to be clever and not read all :-P
13:56 karolherbst: right
13:57 imirkin_: so it'll be more like O(n*m)
13:57 karolherbst: so we should give it a priority at insertion time?
13:57 imirkin_: karolherbst: wellllllll heh
13:57 imirkin_: maybe
13:57 karolherbst: ...
13:57 imirkin_: i dunno.
13:57 karolherbst: sounds wrong
13:57 imirkin_: it's a hard problem.
13:57 karolherbst: because we have to know what the others are
13:57 imirkin_: so for now, doing something a little slow is ok
13:57 imirkin_: and once we know what we want
13:57 imirkin_: we can optimize it
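For reference, a priority queue keyed at insertion time could look like this. The key used here, the longest latency-weighted path from an instruction to the end of its chunk, is a standard static heuristic swapped in for illustration, not something settled in the discussion; and as karolherbst points out, a fixed key cannot take the other ready instructions into account. The sketch assumes instruction indices are already in a valid topological (program) order, and `criticalPathSchedule` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// succs[i] lists instructions that depend on i (all with index > i).
std::vector<int> criticalPathSchedule(const std::vector<std::vector<int>> &succs,
                                      const std::vector<int> &latency) {
    const int n = (int)succs.size();
    std::vector<int> height(n, 0), pending(n, 0);
    // Height = own latency + tallest successor, computed bottom-up.
    for (int i = n - 1; i >= 0; --i) {
        int h = 0;
        for (int s : succs[i]) {
            assert(s > i);
            h = std::max(h, height[s]);
        }
        height[i] = latency[i] + h;
    }
    for (int i = 0; i < n; ++i)
        for (int s : succs[i]) ++pending[s];
    // Max-heap on (height, -index): tallest first, ties in program order.
    std::priority_queue<std::pair<int, int>> ready;
    for (int i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push({height[i], -i});
    std::vector<int> order;
    while (!ready.empty()) {
        int i = -ready.top().second;
        ready.pop();
        order.push_back(i);
        for (int s : succs[i])
            if (--pending[s] == 0) ready.push({height[s], -s});
    }
    return order;
}
```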
13:57 karolherbst: for example: does it make sense to start an async memory operation if we only put memory reads after it?
13:58 karolherbst: RSpliet: usually the list is pretty small
13:58 karolherbst: will check with the heaven thingy
14:00 karolherbst: okay, with that 4000-instruction program, 200 was the max size
14:01 imirkin_: so i think the basic idea has to be around tracking the current number of live values
14:01 karolherbst: 82 with back scheduling
14:01 imirkin_: we can decree that e.g. 16 is the "right" number of live values to have
14:01 imirkin_: so you can be in either live value creation mode
14:01 imirkin_: or in live value "closing" mode
14:01 imirkin_: depending on thresholds
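imirkin_'s two-mode idea, sketched with hysteresis so the mode doesn't flip on every instruction. The `Candidate` summary (values defined vs. values whose last use this is) and the thresholds are assumptions for illustration; the 16 mentioned above is used only as an example high-water mark, not a known-good value for any GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-candidate summary: how many new values it defines and
// how many live values it ends (this is their last use).
struct Candidate {
    int defs;
    int kills;
    int delta() const { return defs - kills; }
};

// Grow the number of live values until `high`, then prefer instructions
// that close values until the count drops to `low`.
class PressureHeuristic {
public:
    PressureHeuristic(int low, int high) : low_(low), high_(high) {}

    std::size_t pick(const std::vector<Candidate> &ready, int live) {
        assert(!ready.empty());
        if (live >= high_) closing_ = true;
        else if (live <= low_) closing_ = false;
        std::size_t best = 0;
        for (std::size_t k = 1; k < ready.size(); ++k) {
            int d = ready[k].delta(), b = ready[best].delta();
            if (closing_ ? d < b : d > b) best = k;
        }
        return best;
    }
    bool closing() const { return closing_; }

private:
    int low_, high_;
    bool closing_ = false;
};
```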
14:02 karolherbst: mhh
14:02 imirkin_: also you can use the throughput values to keep track of when things complete
14:02 imirkin_: er, latency
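A rough cost model for using latencies, assuming a single-issue, in-order machine with made-up per-instruction latencies (`estimateCycles` is hypothetical). It can be used to compare orders produced by different picking policies, in the spirit of recording measurements for random schedules.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// An instruction issues no earlier than one cycle after the previous one,
// and no earlier than each predecessor's issue cycle plus that
// predecessor's latency. Returns the cycle the last result is available.
int estimateCycles(const std::vector<int> &order,
                   const std::vector<std::vector<int>> &preds,
                   const std::vector<int> &latency) {
    std::vector<int> issue(preds.size(), -1);
    int next = 0, done = 0;
    for (int i : order) {
        int t = next;
        for (int p : preds[i]) {
            assert(issue[p] >= 0 && "order violates a dependency");
            t = std::max(t, issue[p] + latency[p]);
        }
        issue[i] = t;
        next = t + 1;
        done = std::max(done, t + latency[i]);
    }
    return done;
}
```

With `preds = {{}, {0}, {}}` and latencies `{10, 1, 1}` (instruction 0 standing in for a long-latency texture fetch consumed by instruction 1), the order `{0, 2, 1}` finishes one cycle earlier than `{0, 1, 2}`, which is the kind of gain hoped for from moving a tex away from its texbar.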
14:02 karolherbst: doesn't it rather depend on how many registers I have?
14:02 imirkin_: well
14:02 karolherbst: ohh right
14:02 imirkin_: so going over max registers is REALLY bad
14:02 karolherbst: parallelism?
14:02 imirkin_: but if you use fewer regs, you get higher parallelism
14:02 karolherbst: ahhh
14:02 karolherbst: I see
14:02 karolherbst: so best is to use only 1 :D
14:03 imirkin_: i doubt it works quite like that
14:03 karolherbst: :p
14:03 karolherbst: yeah I know
14:03 RSpliet: if spills were free
14:03 RSpliet: :-P
14:03 imirkin_: i suspect that there are allocation granularities, but i don't know what they are
14:03 karolherbst: hey my memory is higher clocked than core
14:03 karolherbst: :p
14:03 RSpliet: yeah, but it's like centimeters away!
14:04 karolherbst: ohh well
14:04 karolherbst: I am always scared to open US websites
14:04 karolherbst: because they are so far away :p
14:04 RSpliet: that's why you have Akamai, the world's L2
14:04 karolherbst: ohhhh right
14:04 karolherbst: that's why
14:05 karolherbst: isn't there a akamai for my gpu too?
14:05 imirkin_: in your gpu, in fact :)
14:05 RSpliet: there's definitely an L2, but expect 30 cycle latencies
14:05 karolherbst: mhhh
14:05 karolherbst: is it currently used?
14:06 RSpliet: I assume it's on by default and transparent
14:06 karolherbst: okay
14:06 imirkin_: so like constbufs are crazy-cached
14:06 karolherbst: still have to fix heaven somehow
14:06 karolherbst: but there are too many spills
14:06 imirkin_: i have no experience with actual gmem read/write
14:06 imirkin_: oh yeah, lmem is super-cached too
14:07 imirkin_: but definitely slower than registers
14:07 imirkin_: or immediates
14:07 RSpliet: imirkin_: is that in the... (hmm, terminology), 16KB of unmanaged L2-memory?
14:08 RSpliet: iirc, NVIDIA had this block of 64K (at some point in time, numbers might have changed), which you could configure as 48KB L2/16KB storage or 16KB L2/48KB storage
14:08 karolherbst: ...
14:08 imirkin_: RSpliet: not sure.
14:08 imirkin_: RSpliet: there's a whole bunch of stuff with semi-fixed alloc
14:08 imirkin_: like the call stack
14:09 imirkin_: but if you go over a certain size, you fall off into memory
14:10 karolherbst: do you see something wrong here? https://gist.github.com/karolherbst/0ef5684ab67d0e7b09de
14:11 imirkin_: 0: nop u32 %r933 (0)
14:11 imirkin_: urgh, i hate those
14:11 imirkin_: stupid undefs
14:11 imirkin_: (not your fault)
14:11 karolherbst: yeah I know
14:11 karolherbst: but I get an assertion error
14:11 karolherbst: nouveau_compiler: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp:247: void nv50_ir::BasicBlock::remove(nv50_ir::Instruction*): Assertion `insn->bb == this' failed.
14:11 karolherbst: want to fix this before I proceed
14:12 imirkin_: that means you stick some instruction into the wrong bb
14:12 imirkin_: hmmm
14:13 karolherbst: at least no BB changes in size
14:13 karolherbst: and its really unlikely that it will happen with my change :/
14:13 imirkin_: yeah, just double-checked that too
14:13 karolherbst: I think I missed a prev / next somewhere
14:13 karolherbst: or did something wrong there
14:14 imirkin_: yea dunno
14:14 imirkin_: perhaps you're linking things you're not supposed to
14:14 karolherbst: maybe
14:14 imirkin_: figure out what inst it's dying on
14:14 karolherbst: will dump the list again
14:14 karolherbst: ohh right
14:14 imirkin_: and what bb it links to
14:14 imirkin_: and what this bb is
14:14 imirkin_: gdb is nice ;)
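Following the advice to find the failing instruction and the block it links to, a list-invariant checker run right after the pass can catch a bad link where it is made, rather than later inside `BasicBlock::remove()`. This is a toy model (`TBB`, `TInsn` and `checkBB` are hypothetical), assuming a block keeps its phis at the front and `entry` points to the first non-phi instruction.

```cpp
#include <cassert>
#include <string>

// Toy model of a basic block's intrusive instruction list.
struct TInsn;
struct TBB {
    TInsn *first = nullptr;   // first instruction; phis come first
    TInsn *entry = nullptr;   // first non-phi instruction
};
struct TInsn {
    int serial;
    bool isPhi;
    TBB *bb = nullptr;
    TInsn *prev = nullptr, *next = nullptr;
};

// Walk the list and report the first broken invariant with its serial.
// Returns an empty string if the list is sane.
std::string checkBB(const TBB &bb, int maxLen = 100000) {
    const TInsn *prev = nullptr;
    bool seenEntry = false;
    int n = 0;
    for (const TInsn *i = bb.first; i; prev = i, i = i->next) {
        std::string at = " at serial " + std::to_string(i->serial);
        if (++n > maxLen) return "cycle in list" + at;
        if (i->bb != &bb) return "insn->bb != this" + at;
        if (i->prev != prev) return "prev link broken" + at;
        if (i->next == i) return "insn->next == insn" + at;
        if (i == bb.entry) seenEntry = true;
        if (i->isPhi && seenEntry) return "phi after entry" + at;
        if (!i->isPhi && !seenEntry) return "non-phi before entry" + at;
    }
    if (bb.entry && !seenEntry) return "entry not in list";
    return "";
}
```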
14:16 karolherbst: serial 79
14:17 karolherbst: ohh
14:17 karolherbst: that's only valid inside a bb
14:17 karolherbst: its a phi
14:18 karolherbst: with bb==NULL
14:18 imirkin_: which phi
14:18 karolherbst: this is odd
14:18 karolherbst: next == prev == NULL
14:18 imirkin_: coz it's been removed probably
14:19 karolherbst: okay
14:19 karolherbst: bb id 3
14:20 karolherbst: yeah serial 79 would fit :/
14:20 karolherbst: okay
14:20 karolherbst: nice
14:20 karolherbst: "79: phi u32 %r528 %r965 %r968 (0)" this one then?
14:20 imirkin_: so do you have an off-by-one with your phi handling?
14:21 karolherbst: for (Instruction *insn = bb->getFirst(); insn != bb->getEntry(); insn = insn->next)
14:22 imirkin_: do you update things like bb->getEntry
14:22 karolherbst: yeah
14:22 karolherbst: through bb->insertHead()
14:24 karolherbst: but the bb entry looks fine
14:24 karolherbst: bb->entry->serial == 80
14:24 imirkin_: ok, well i dunno what's happening
14:24 imirkin_: but clearly you're doing something to upset it :)
14:24 karolherbst: yeah
14:25 karolherbst: I could assert that I never touch any phis
14:25 imirkin_: perhaps you're forgetting to update some other bit of structure
14:25 karolherbst: do you want to look into finalizeList?
14:26 karolherbst: ohh wait
14:26 karolherbst: now I don't know if thats true
14:26 karolherbst: does entry point to a phi instruction in prev?
14:27 imirkin_: i assume so
14:27 karolherbst: mhh no, I push all the phis into the list
14:28 karolherbst: but the phi still has prev = next => NULL
14:30 karolherbst: yeah, I set those right
14:30 karolherbst: mhh
14:40 karolherbst: it actually happens in NVC0LegalizePostRA::visit
14:40 karolherbst: so it tries to remove this phi operation
14:41 karolherbst: imirkin_: did you see the bioshock infinite crash fix?
14:42 karolherbst: phh you saw
14:43 karolherbst: now even radeonsi users can enjoy the game :D
14:44 karolherbst: watch points wtf
14:48 RSpliet: whoa, and someone dropped a crate of Nine fixes
14:48 foul_owl: Hi folks. I'm trying to set up four monitors with my two nvidia cards
14:49 RSpliet: foul_owl: and in what part of the "ZaphodHeads" manual did you fail?
14:50 foul_owl: Well I'm reading this: http://nouveau.freedesktop.org/wiki/MultiMonitorDesktop/
14:50 foul_owl: I'm trying to map the connector names in /sys/class to the physical connectors
14:51 karolherbst: oh wow
14:51 karolherbst: is xinerama actually required for dual gpu?
14:51 RSpliet: foul_owl: isn't it easier to call xrandr to find out?
14:51 RSpliet: oh... ok, the manual states otherwise :-)
14:52 foul_owl: I would prefer to use some sort of tool, yes
14:52 karolherbst: isn't xrandr the better way to do this anyway?
14:52 RSpliet: karolherbst: no, you need a proper xorg.conf for multi-card set-ups
14:52 karolherbst: ohh I see
14:53 RSpliet: foul_owl: tools aren't going to automate the work for you
14:53 karolherbst: pity
14:53 RSpliet: foul_owl: but why is the mapping between physical connectors and names in sysfs not trivial?
14:54 foul_owl: I couldn't figure out which card is which
14:54 karolherbst: try it out and you'll see
14:55 karolherbst: but doesn't nouveau report this?
14:55 foul_owl: I figured it out now
14:55 RSpliet: well, if the cards are identical then the computer doesn't even know :-P
14:56 foul_owl: My question though is about the syntax of the pci line in 20-nouveau.conf
14:56 RSpliet: you mean the BusID?
14:56 foul_owl: Yeah
14:56 foul_owl: It's shown in the docs as "PCI:1:0:0"
14:56 foul_owl: Whereas the output of lspci shows: 01:00.0
14:57 imirkin_: foul_owl: actually it might be that xrandr names are required
14:57 imirkin_: foul_owl: also 01 would work just as well as 1
14:57 foul_owl: How do I get the xrandr names of the cards?
14:57 imirkin_: foul_owl: do you have an xorg log + xorg conf that you're failing with?
14:57 karolherbst: imirkin_: somehow phi->next == phi
14:57 karolherbst: don't know where this comes from
14:57 imirkin_: foul_owl: by knowing very carefully how the driver works. i.e. you have no good way ;)
14:58 RSpliet: imirkin_: to be fair I don't know which of the four components in /sys/bus/pci_express/devices entries I should pick either :-P
14:58 imirkin_: foul_owl: but it's BASICALLY the kernel names
14:58 imirkin_: RSpliet: that thing is crazy... the first component is the root complex
14:58 RSpliet: would 0000:00:01.0:pcie01 translate to PCI:0:0:1 or PCI:0:1:0? :-D
14:58 imirkin_: for the apparently super-common case where you actually have *MULTIPLE PCIe ROOTS*
14:58 imirkin_: that would translate to 0:1:0
14:59 imirkin_: that first 0000 is for your 65536 pcie root complexes
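The BusID format discussed here can be illustrated with a small conversion sketch. It assumes the card lives in PCI domain 0 (the `0000:` prefix imirkin_ describes), and `to_xorg_busid` is a hypothetical helper, not part of any X or nouveau tooling. One detail worth knowing: lspci and sysfs print bus, device and function in hexadecimal, while the numbers in an Xorg `BusID "PCI:bus:device:function"` line are decimal, so a bus shown as `0a` becomes `PCI:10:0:0`. Swapping the period for a colon is only enough when every field is below 10.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: convert a PCI address as printed by lspci ("01:00.0")
// or by sysfs ("0000:01:00.0", "0000:00:01.0:pcie01") into an Xorg BusID
// string such as "PCI:1:0:0". The input fields are hex; Xorg's BusID fields
// are decimal. The PCI domain (leading "0000:") is dropped, so this assumes
// the device sits in domain 0.
std::string to_xorg_busid(const std::string& addr) {
    unsigned domain = 0, bus = 0, dev = 0, fn = 0;
    // Try the full sysfs form (domain first), then the short lspci form.
    if (std::sscanf(addr.c_str(), "%x:%x:%x.%x", &domain, &bus, &dev, &fn) != 4 &&
        std::sscanf(addr.c_str(), "%x:%x.%x", &bus, &dev, &fn) != 3)
        return "";  // not a recognised PCI address
    return "PCI:" + std::to_string(bus) + ":" + std::to_string(dev) + ":" +
           std::to_string(fn);
}
```

For a card that lspci reports as `01:00.0` this yields `PCI:1:0:0`, the form shown in the wiki; for RSpliet's `0000:00:01.0:pcie01` it yields `PCI:0:1:0`, matching imirkin_'s answer.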
14:59 RSpliet: the brochure said I had that in my laptop...
14:59 foul_owl: My system boots fine to x, but only two monitors are working, both off one of my cards
14:59 imirkin_: foul_owl: pastebin xorg log and xorg.conf
14:59 foul_owl: There is no xorg.conf
14:59 foul_owl: I thought it was deprecated years ago
15:00 imirkin_: foul_owl: i thought you were following the instructions for zaphod heads
15:00 imirkin_: anyways, i think you can also use prime offloading to make it work
15:00 foul_owl: Yes, creating a 20-nouveau.conf from scratch
15:00 imirkin_: but both paths are sadly full of fail
15:00 imirkin_: just depends on which fail you're looking to minimize :)
15:01 RSpliet: foul_owl: oh... well, that's the same thing as an xorg.conf, only distributed over potentially multiple files
15:01 foul_owl: Gotcha
15:01 imirkin_: the nice thing about xinerama is that it works
15:02 imirkin_: the downside is that you lose direct rendering
15:02 foul_owl: I just need to know BusID lines and the zaphodhead lines
15:02 imirkin_: foul_owl: lspci -nn -d 10de:
15:02 foul_owl: I _think_ I have it correct
15:02 foul_owl: Same syntax as the output from lspci? Or do I have to change those periods to colons?
15:02 RSpliet: change the period to a colon, just to be sure...
15:02 foul_owl: Then my question about the zaphodhead lines
15:03 foul_owl: In /sys/class/drm/card2/ I see card2-DVI-I-3
15:03 imirkin_: foul_owl: grep . /sys/class/drm/card*-*/status
15:03 foul_owl: In the config file, should I put "card2-DVI-I-3" or "DVI-I-3"
15:03 RSpliet: the latter
15:03 imirkin_: DVI-I-3 i think
15:03 foul_owl: Oh damn, that last grep you sent is awesome
15:03 foul_owl: Ok perfect
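The connector-naming answer can be sketched the same way. sysfs names each connector `card<N>-<connector>`, and per the answers here the ZaphodHeads value is the part after the `cardN-` prefix. The code assumes that layout; `zaphod_name` and `list_connectors` are hypothetical names, and `list_connectors` is only a rough C++ equivalent of the `grep . /sys/class/drm/card*-*/status` one-liner (C++17 for `<filesystem>`).

```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <system_error>

// Map a sysfs entry like "card2-DVI-I-3" to the connector name "DVI-I-3".
// Entries that are not connectors ("card0", "renderD128", "version") give "".
std::string zaphod_name(const std::string& sysfs_entry) {
    auto dash = sysfs_entry.find('-');
    if (sysfs_entry.rfind("card", 0) != 0 || dash == std::string::npos)
        return "";
    return sysfs_entry.substr(dash + 1);
}

// Print every connector with its status (connected/disconnected/unknown).
void list_connectors(const std::filesystem::path& drm = "/sys/class/drm") {
    std::error_code ec;  // if the directory is missing, the loop simply does nothing
    for (const auto& entry : std::filesystem::directory_iterator(drm, ec)) {
        std::string name = entry.path().filename().string();
        std::string connector = zaphod_name(name);
        if (connector.empty())
            continue;
        std::ifstream status(entry.path() / "status");
        std::string state;
        if (std::getline(status, state))
            std::cout << name << " -> \"" << connector << "\": " << state << '\n';
    }
}
```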
15:04 foul_owl: I'll come back if I can't get it, just had those syntax questions. Thanks everyone!
15:06 RSpliet: I guess we should write out those parameter extraction methods in a nice unambiguous chomsky normal form
15:07 karolherbst: imirkin_: found my issue
15:07 karolherbst: bb->phi->prev == bb->entry->prev
15:07 karolherbst: I guess bb->phi->prev should be always NULL?
15:10 imirkin_: i mean... it should be a list
15:10 imirkin_: the very first one should be null yeah
15:11 imirkin_: (i think)
15:11 imirkin_: i'm not really sure how all that linkage is done
15:11 imirkin_: that stuff just sorta works
15:11 imirkin_: i haven't touched it much :)
15:11 karolherbst: yeah
15:11 karolherbst: I created a cycle among the phis
15:11 imirkin_: you can look at nv50_ir_build_util.cpp
15:11 karolherbst: this sounds bad
15:11 imirkin_: which knows how to add/remove instructions
15:11 imirkin_: and presumably keeps things consistent
15:12 karolherbst: ohh wait, maybe
15:12 tobijk: are the bb's linked or was this just thought as a nice must have and was never done? :D
15:12 karolherbst: I think insertHead doesn't work for PHIs
15:14 karolherbst: ....
15:15 karolherbst: ohhh
15:15 karolherbst: p->prev = q->prev; in insertBefore
15:15 karolherbst: well, this should stay NULL
15:16 karolherbst: this sounds hacky now, but well
15:17 tobijk: you have something not "hacky"? :)
15:18 karolherbst: a lot
15:19 tobijk: karolherbst: sorry, don't want to offend you
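The `insertBefore` issue is an instance of a general intrusive doubly linked list invariant: when a node is inserted in front of the current head, its copied `prev` is null and the list's head pointer has to move to the new node; otherwise the old first element keeps a stale `prev` or a cycle appears. The sketch uses a hypothetical `Node`/`List` pair, not nv50_ir's actual instruction linkage (which additionally tracks `phi` and `entry` pointers per block), and adds a Floyd tortoise-and-hare cycle check that could serve as a debugging assertion.

```cpp
// Hypothetical intrusive list node, only to illustrate the invariant.
struct Node {
    int id;
    Node* prev = nullptr;
    Node* next = nullptr;
};

struct List {
    Node* head = nullptr;
};

// Insert p before q (q must be in the list). If q was the head, p.prev is
// copied as null and p becomes the new head.
void insert_before(List& l, Node* q, Node* p) {
    p->next = q;
    p->prev = q->prev;  // null when q is the head
    if (q->prev)
        q->prev->next = p;
    else
        l.head = p;
    q->prev = p;
}

// Floyd's cycle detection: true if following next pointers loops forever.
bool has_cycle(const Node* head) {
    const Node* slow = head;
    const Node* fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

// Invariants: head has no prev, no cycle, and prev/next agree pairwise.
bool links_consistent(const List& l) {
    if (l.head && l.head->prev)
        return false;
    if (has_cycle(l.head))
        return false;
    for (const Node* n = l.head; n; n = n->next)
        if (n->next && n->next->prev != n)
            return false;
    return true;
}
```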
15:21 tobijk: nice input for clip = 1,2,3,4 and cull = 5,10,15,20:
15:21 tobijk: IMM[0] FLT32 { 0.0000, 1.0000, 5.0000, 2.0000}
15:21 tobijk: IMM[1] FLT32 { 10.0000, 3.0000, 15.0000, 4.0000}
15:21 tobijk: IMM[2] FLT32 { 20.0000, 0.0000, 0.0000, 0.0000}
15:21 tobijk: that is off :O
15:22 imirkin_: how so?
15:23 tobijk: if i knew ;-)
15:28 shakesoda: RSpliet: I won't be able to test your patches until at least next week :(
15:31 RSpliet: shakesoda: that's all right, please do! thanks for the notification :-)
15:38 karolherbst: imirkin: the final program binary can change depending on the order of scheduled instructions?
15:40 imirkin_: karolherbst: not sure what your question is
15:40 imirkin_: is your question whether different code generates different binaries?
15:41 imirkin_: if so, yes :)
15:41 karolherbst: nope, if for the same code, depending on scheduling the binary can be different in length
15:41 imirkin_: ah, different *length*
15:41 imirkin_: well, with spilling, easily
15:41 karolherbst: is RA deterministic?
15:42 imirkin_: i think so, yes
15:42 imirkin_: what scheduling are you talking about?
15:42 imirkin_: are you talking about reordering instructions?
15:42 karolherbst: change between front first or back first
15:42 karolherbst: yeah
15:43 imirkin_: so not *the same program*
15:44 imirkin_: if you change it so that there are enough live values that it has to spill
15:44 imirkin_: then spilling generates loads/stores
15:44 karolherbst: mhh okay
15:44 imirkin_: also, depending on instruction order you might have more or less texbar's
15:44 imirkin_: those are about the only things that i can think of that would alter the code due to different ordering
15:45 karolherbst: mhh the last print sometimes has " 1: vfetch u32 $r5 a[0x18]" and sometimes "1: vfetch u32 $r4 a[0x18] (8)"; is this an "issue", or can RA depend on other things that make it not exactly deterministic?
15:45 imirkin_: RA is fully deterministic for a particular program
15:45 imirkin_: however changing the order of instructions changes the program
15:46 imirkin_: so you end up with diff results
15:47 karolherbst: I don't find it, but somewhere there is some minor randomness in my code :/
15:47 karolherbst: I think std::list orders by value, but not sure about that
15:48 imirkin_: list is ordered by insertion...
15:48 tobijk: and you have unordered_set
15:48 imirkin_: set orders by value
15:48 tobijk: that is random :)
15:48 karolherbst: not anymore
15:48 karolherbst: only list
15:48 tobijk: ah ok
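For reference, the iteration-order guarantees in this exchange: `std::list` iterates in insertion order, `std::set` in sorted (value) order, and `std::unordered_set` in an unspecified order. A related trap in compiler passes is a `std::set` keyed by pointers, which orders by address and can therefore differ between runs. A minimal sketch; the function names are illustrative.

```cpp
#include <algorithm>
#include <list>
#include <set>
#include <unordered_set>
#include <vector>

// std::list iterates in insertion order.
std::vector<int> list_order(const std::vector<int>& in) {
    std::list<int> l(in.begin(), in.end());
    return std::vector<int>(l.begin(), l.end());
}

// std::set iterates in sorted (value) order.
std::vector<int> set_order(const std::vector<int>& in) {
    std::set<int> s(in.begin(), in.end());
    return std::vector<int>(s.begin(), s.end());
}

// std::unordered_set iteration order is unspecified; if a pass must not
// depend on it, copy the elements out and sort them before iterating.
std::vector<int> stable_order(const std::unordered_set<int>& s) {
    std::vector<int> v(s.begin(), s.end());
    std::sort(v.begin(), v.end());
    return v;
}
```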
15:53 karolherbst: oh wow, that's odd. sometimes my reordering produces different outputs
15:54 karolherbst: but I don't see that I depend on uninitialized stuff
15:54 tobijk: maybe the input is already off then
15:54 karolherbst: nope
15:54 karolherbst: input is fine
15:54 karolherbst: will check again
15:55 karolherbst: nope, input is the same
15:55 karolherbst: ohh wait
15:56 karolherbst: no, I always use push_back
15:59 karolherbst: ha found that
15:59 karolherbst: again begin() on empty list :(
15:59 karolherbst: meh
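The bug found here is a common one: on an empty `std::list`, `begin()` equals `end()`, and dereferencing it is undefined behaviour, which can surface as run-to-run differences rather than a clean crash. A minimal guard, with `first_or_none` as a hypothetical name (C++17 for `std::optional`):

```cpp
#include <list>
#include <optional>

// Return the first element, or nothing if the list is empty.
// Dereferencing begin() of an empty list is undefined behaviour.
std::optional<int> first_or_none(const std::list<int>& l) {
    if (l.empty())
        return std::nullopt;
    return *l.begin();  // safe: the list has at least one element
}
```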
16:01 karolherbst: imirkin_: ohh right, I didn't ask you yet. I was thinking about the PCIe API and had the idea to add 3 functions: changePCIVersion, changePCILinkCap, changePCILinkStatus
16:02 karolherbst: what do you think about this idea?
16:02 imirkin_: karolherbst: lose the lowerCamelCase :)
16:02 karolherbst: ohh right C
16:02 imirkin_: karolherbst: what's the diff between link cap and link status?
16:02 karolherbst: change_pci_version, change_pci_link_cap, change_pci_link_status :p
16:02 karolherbst: mhh not sure yet
16:02 imirkin_: i think there should just be one api
16:02 karolherbst: but the cap is the cap, and status the current speed
16:03 imirkin_: ->request_speed()
16:03 imirkin_: which takes either 2.5, 5.0, or 8.0
16:03 karolherbst: or 16.0 :p
16:03 imirkin_: we should upgrade to pci v2 automagically
16:03 imirkin_: on init
16:03 RSpliet: as enum values?
16:03 karolherbst: yeah was thinking about that too
16:03 imirkin_: RSpliet: well def not as floats ;)
16:03 RSpliet: :-D
16:03 karolherbst: I would also add PCI_SPEED_MAX to the enum
16:04 karolherbst: because 8.0 => 16.0 transition might be ugly otherwise
16:04 RSpliet: why?
16:04 karolherbst: if we have hardcoded 8.0
16:04 RSpliet: wouldn't the VBIOS always dictate the exact speed requested?
16:04 karolherbst: mhh
16:04 karolherbst: I don't think so
16:04 imirkin_: or the user
16:04 karolherbst: is there anything in the vbios about this ?
16:05 imirkin_: anyways, i have no problem with a MAX request as well
16:05 imirkin_: yeah
16:05 RSpliet: yes. performance modes have a link speed associated
16:05 imirkin_: pstate specifies it
16:05 karolherbst: ohh right
16:05 imirkin_: but there could also be something in debugfs
16:05 imirkin_: i don't mind
16:05 karolherbst: yeah
16:05 karolherbst: I plan to add a bunch of stuff there
16:05 imirkin_: imho all the internal nvkm components should gain debugfs stuff
16:05 karolherbst: pstate, cstate, mem clock, stuff
16:06 karolherbst: but coupling it to pstate seems to make sense
16:06 karolherbst: the blob does it too
16:07 RSpliet: well, the actual code belongs in nvkm/subdev/pci
16:07 RSpliet: pstate should be the one making the calls
16:08 karolherbst: yes
16:08 karolherbst: question
16:08 imirkin_: oh, there should also be a lane width thing in there
16:08 karolherbst: what should be done, if 8.0 is requested, but it fails
16:08 imirkin_: except we don't know how to set it for kepler+
16:08 imirkin_: (right?)
16:08 karolherbst: yeah
16:08 karolherbst: but the blob doesn't change it
16:08 karolherbst: for kepler
16:09 imirkin_: karolherbst: return error
16:09 karolherbst: so...
16:09 karolherbst: mhhh
16:09 karolherbst: sounds wrong
16:09 karolherbst: pstate change shouldn't be affected by that
16:09 karolherbst: because there are several valid reasons
16:09 karolherbst: that it can't be done
16:09 RSpliet: in principle the VBIOS should never contain invalid configurations
16:09 karolherbst: that's not the point
16:09 karolherbst: maybe you have only 5.0 slots
16:09 RSpliet: in practice you can always make pstate try the next one down or leave it with the old value
16:10 karolherbst: how should the vbios know
16:10 karolherbst: or same goes for width change: what if there aren't enough available
16:10 karolherbst: because of other pcie cards
16:10 RSpliet: then the same answer applies ;-)
16:11 karolherbst: but it shouldn't prevent pstate change to 0f
16:11 karolherbst: that sounds wrong
16:11 RSpliet: don't try to make the changing code smart, rather make your error handling smart
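One way to combine the suggestions in this exchange (a speed enum with a MAX entry, the actual link change living in the PCI subdev, pstate making the call, and falling back to the next lower speed instead of failing the pstate change) is sketched below. Everything here is hypothetical: `pcie_speed`, `request_speed` and the `try_speed` callback are illustrative names, not nouveau's API, and real code would be C inside nvkm; C++ is used only to keep the examples in one language. Returning an error, as imirkin_ suggests, is the other valid policy; this sketch shows the fallback variant.

```cpp
#include <functional>

// Hypothetical link speeds, lowest first; max_supported means "whatever the
// hardware allows", so callers need not hardcode 8.0 (or later 16.0).
enum class pcie_speed { gen1_2_5 = 0, gen2_5_0, gen3_8_0, max_supported };

// Hypothetical PCI-subdev hook: returns true if the link trained at `s`.
using try_speed_fn = std::function<bool(pcie_speed)>;

// Pstate-side policy: clamp the request to what the hardware reports, then
// step down one speed at a time on failure so the pstate change still goes
// ahead on boards that cannot reach the requested speed.
pcie_speed request_speed(pcie_speed wanted, pcie_speed highest,
                         const try_speed_fn& try_speed) {
    if (wanted == pcie_speed::max_supported || wanted > highest)
        wanted = highest;
    for (int s = static_cast<int>(wanted); s >= 0; --s) {
        auto speed = static_cast<pcie_speed>(s);
        if (try_speed(speed))
            return speed;
    }
    return pcie_speed::gen1_2_5;  // nothing succeeded: assume the base rate
}
```

For a board whose slot tops out at 5.0 GT/s, asking for 8.0 ends up at 5.0 and the pstate change can proceed.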
16:11 karolherbst: like, I am totally playing that game => no 0f for you, because your board sucks
16:11 RSpliet: should've bought a better motherboard, tough luck
16:11 RSpliet: :-P
16:11 karolherbst: :D
16:12 karolherbst: yeah but you know what I mean
16:12 RSpliet: that's what you get for buying a Packard Bell
16:12 karolherbst: especially because pcie link speed might have no real effect on performance
16:12 karolherbst: usually it isn't a big perf gain
16:12 karolherbst: I measured 10% in general use cases
16:12 karolherbst: from 2.5 to 8.0
16:12 RSpliet: that's pretty steep tbh
16:13 karolherbst: 25% in the talos principle
16:13 karolherbst: or 100% in glxspheres
16:13 karolherbst: :D
16:13 karolherbst: it does increase stuff, sometimes, and usually it always does, but well
16:13 RSpliet: playing Portal, nouveau on my laptop is within 10% from the blob speed... if PCIe link speed change gives me the final 10% it's pretty darn awesome
16:13 karolherbst: I won't call it a critical thing
16:14 RSpliet: but coming back, the impact is usually more in load times
16:14 karolherbst: you misunderstood me
16:14 karolherbst: 10% from current fps
16:14 karolherbst: not compared to blob
16:14 karolherbst: so if you get 30 fps at 2.5 you might get 33 at 8.0
16:14 karolherbst: 31-32 is more likely though
16:15 karolherbst: it's noticeable with gallium_hud and always there
16:15 karolherbst: there is no doubt, but it's usually no big deal
16:15 RSpliet: I didn't misunderstand you, but I rejected the difference for insignificance
16:15 karolherbst: :D
16:15 karolherbst: I see
16:15 RSpliet: because if I recall correctly, nouveau gets portal running at 48fps on my laptop, nvidia at 52
16:15 karolherbst: :)
16:15 RSpliet: or something like that
16:15 karolherbst: then you found the missing part
16:15 karolherbst: :p
16:16 RSpliet: well, the proof of the pudding is in the eating, shoo, go code you :-P
16:16 karolherbst: but portal might also have optimisation passes for mesa
16:16 karolherbst: and other stuff... don't
16:17 RSpliet: but as I said, the biggest impact is in upload times
16:18 karolherbst: okay, cleanup time now
16:18 karolherbst: please comment: https://github.com/karolherbst/mesa/commit/9e7f04005ac775d9d272db9cd219dce5cd2d4dbd
16:18 karolherbst: yeah
16:18 karolherbst: I noticed, that usually higher mem clock gives more performance than gpu clock
16:18 karolherbst: at least with nouveau
16:20 RSpliet: so an increase in fps from 30->33 is not spectacular (but good nonetheless!), but saving 5 to 10 seconds on load times because texture, vertexbuf, constbuf, vertarrays uploads go faster is quite valuable for gameplay experience :-)
16:20 karolherbst: yeah
16:20 RSpliet: and yes, memory is usually the bottleneck on performance
16:20 karolherbst: bioshock infinite is pretty laggy
16:20 karolherbst: if it comes to switching areas
16:20 karolherbst: or just running a bit
16:21 karolherbst: how does the temperature go for memory by the way?
16:21 karolherbst: imagine I overclock my memory by x2
16:21 karolherbst: how "bad" is this
16:22 RSpliet: I never did do any experiments with that tbh
16:22 karolherbst: mhh okay
16:23 RSpliet: if the VBIOS tells me the speed is ok, it's ok, never bothered worrying about overclocking
16:23 karolherbst: yeah I usually don't do that too
16:23 RSpliet: don't think we have a sensor on the GPU for measuring RAM temperatures
16:23 karolherbst: but temperature stayed pretty much around 85°C even after doing max reclock in blob
16:24 karolherbst: anyway, I don't think I missed something now
16:25 karolherbst: will do random thingy
16:26 RSpliet: random comment: doMain sounds a bit non-descriptive in my ears
16:26 karolherbst: I know the names are bad
16:26 karolherbst: but thanks for reminding me :)
16:41 karolherbst: imirkin_: random works now
16:41 imirkin_: karolherbst: yay!
16:42 imirkin_: for something like unigine?
16:42 imirkin_: without if (serial == 732) hacks?
16:42 karolherbst: there I have those spilling issues
16:42 karolherbst: no
16:42 karolherbst: furmark runs though
16:42 karolherbst: will try other games
16:44 karolherbst: imirkin_: antichamber seems to run
16:45 imirkin_: hmmmmm what spilling issues?
16:45 imirkin_: it should run SUPER slow
16:45 imirkin_: but it should still work
16:48 karolherbst: segfault inside nv50_ir::GCRA::calculateSpillWeights :/
16:48 karolherbst: after "ERROR: no viable spill candidates left"
16:48 karolherbst: I doubt you want to go through the output though
16:49 imirkin_: ah ok
16:49 imirkin_: not really ;)
16:49 karolherbst: it's one of those 4000+ instruction shaders
16:49 imirkin_: at least not now
16:49 imirkin_: anyways, that's awesome
16:49 karolherbst: yeah
16:49 karolherbst: didn't even hit my assert
16:49 karolherbst: so every single instruction got somehow scheduled
16:50 karolherbst: will try bioshock
16:51 karolherbst: happy compiling
16:51 karolherbst: ohh right extension
16:51 karolherbst: :D
16:51 karolherbst: but it got into the main menu
16:52 karolherbst: makes sense though, unigine fails but bioshock works
16:53 karolherbst: have some failing shaders though
16:53 karolherbst: mhh it "partly" works
16:53 karolherbst: the parts which work look fine
16:54 imirkin_: bbiab
16:54 karolherbst: usually get those "ERROR: failed to coalesce phi operands" and "nvc0_program_translate:567 - shader translation failed: -4"
16:54 imirkin_: hmmmm
16:54 imirkin_: that's a weird RA fail
16:54 karolherbst: I think it's just too much
16:55 karolherbst: too bad order or something
16:55 karolherbst: I could extract those programs
16:55 karolherbst: and check what's wrong
16:55 karolherbst: hopefully I find a smaller one
16:56 karolherbst: okay random is strange
16:56 karolherbst: no other parts don't work
16:57 karolherbst: awesome :O
16:57 karolherbst: ingame works
16:57 karolherbst: only main menu failed
16:58 karolherbst: no issue found yet
16:58 karolherbst: except perf
16:59 karolherbst: okay nice
16:59 karolherbst: one failing shader: https://gist.github.com/karolherbst/ff23bbb5a05112cc2622
17:00 karolherbst: 200 instr big
17:02 karolherbst: this was random
17:02 karolherbst: and this is front: https://gist.github.com/karolherbst/5125cc835efaa79ac2d8
17:06 karolherbst: some random wine games also work
17:08 karolherbst: also compilation speed isn't the best yet
18:22 karolherbst: ohhhhh
18:22 karolherbst: imirkin_: there is a thing like --eon_disable_arb_copy_image
18:42 karolherbst: I disabled scheduling for some instruction so that the diff is smaller, still I can't see where this shader fails or why