00:33 espes__: mwk: a couple years later: https://www.youtube.com/watch?v=3LoHErtgxgI
00:39 imirkin: espes__: nice. feel free to ask questions about nv2a...
00:41 imirkin: sounds like you have a lot of it covered already though (looking over hw/xbox/nv2a.c)
00:47 karolherbst: imirkin_: ping
00:47 karolherbst: did you solve the voltage thing?
00:47 karolherbst: because I know what that could be about
00:48 imirkin: karolherbst: no, i didn't try
00:48 imirkin: btw, i'm on GF108 right now... what did you want me to try?
00:48 karolherbst: you should check if you also have this PWM GPIO ;)
00:48 imirkin: pretty sure i don't
00:48 karolherbst: mhh
00:49 karolherbst: then its just a mismatch
00:49 karolherbst: nouveau is pretty strict about voltage
00:49 karolherbst: something like this should help https://github.com/karolherbst/nouveau/commit/5554a27415b61a59f1667074cd2162c9f2470cdf
00:49 karolherbst: voltage fails if the requested and the "looked up" value aren't the same
00:49 imirkin: meh, i'm perfectly happy at the higher voltage
00:50 imirkin: did you want me to check some reg on my fermi?
00:50 karolherbst: yeah it should be fine as long as the voltage is high enough
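The difference between a strict lookup and a "high enough" lookup can be sketched roughly as below. This is only an illustration, not nouveau's code or the linked commit: `VOLT_TABLE_UV`, `lookup_strict` and `lookup_at_least` are hypothetical names, and the table values are made up.

```python
from typing import Optional, Sequence

# Hypothetical voltage table in microvolts; real entries come from the VBIOS.
VOLT_TABLE_UV = [850_000, 900_000, 950_000, 1_000_000]


def lookup_strict(table: Sequence[int], requested_uv: int) -> Optional[int]:
    """Return the index only if the table holds exactly the requested value."""
    for i, uv in enumerate(table):
        if uv == requested_uv:
            return i
    return None


def lookup_at_least(table: Sequence[int], requested_uv: int) -> Optional[int]:
    """Return the index of the lowest entry that is >= the requested value."""
    best = None
    for i, uv in enumerate(table):
        if uv >= requested_uv and (best is None or uv < table[best]):
            best = i
    return best
```

With a request of 925000 uV the strict variant finds nothing, while the lenient one settles on the 950000 uV entry.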
00:50 karolherbst: yeah, the temp reg
00:50 karolherbst: 0x02044c
00:51 imirkin: 0
00:52 karolherbst: 0 or nothing there
00:52 imirkin: i don't get a MMIO read error if that's what you mean
00:52 karolherbst: mhh
00:52 karolherbst: what about 0x020450
00:52 imirkin: 0.
00:52 karolherbst: okay
00:53 imirkin: probably just not hooked up?
00:53 karolherbst: maybe
00:53 imirkin: anyways...
00:53 imirkin: &
00:53 karolherbst: it is also strange, that it is there since nvd
00:53 karolherbst: *ncd
00:53 karolherbst: ..
00:54 karolherbst: still tired, meh
00:59 karolherbst: okay
01:02 karolherbst: imirkin: this is the function where the GPIO for voltage is selected? https://github.com/karolherbst/nouveau/blob/master/drm%2Fnouveau%2Fnvkm%2Fsubdev%2Fvolt%2Fbase.c#L165-L204
01:27 karolherbst: ohhhh I was totally wrong with the volt values, the blob uses much higher ones
02:18 mupuf: karolherbst: oh, forgot to tell you, I plugged an nve6 and GK208
02:18 mupuf: I could not find my nve7 this morning
02:39 karolherbst: ohh nice
02:39 karolherbst: I checked but I only saw the old cards
02:39 karolherbst: maybe I checked too early
02:39 karolherbst: thanks by the way
02:39 karolherbst: mupuf: do you know how I get the right voltage values with my pwm?
02:39 karolherbst: blob is between 0x26 and 0x3d
02:40 mupuf: no, I do not know
02:40 mupuf: we need to find this information in the vbios
02:40 karolherbst: okay
02:40 mupuf: I tried to find some datasheets of voltage controllers with PWM input
02:40 mupuf: but I could not find any
02:40 mupuf: I discussed this with a hw engineer
02:41 mupuf: and he agreed that it is likely not using a low-pass filter to generate an analog signal
02:41 karolherbst: okay
02:41 mupuf: it is likely monitoring the Ton/(Ton + Toff)
02:42 mupuf: I guess we just need to find this table
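The Ton/(Ton + Toff) idea can be written down as a small sketch. The linear mapping from duty cycle to voltage is purely an assumption for illustration; `base_uv` and `range_uv` are hypothetical parameters, and the real mapping would have to come from the VBIOS table being looked for.

```python
def duty_cycle(t_on: float, t_off: float) -> float:
    """Fraction of the PWM period during which the signal is high."""
    period = t_on + t_off
    if period <= 0:
        raise ValueError("PWM period must be positive")
    return t_on / period


def duty_to_voltage_uv(duty: float, base_uv: int, range_uv: int) -> int:
    """Assumed linear mapping: base voltage plus duty cycle times the range."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must be within [0, 1]")
    return round(base_uv + duty * range_uv)
```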
02:42 karolherbst: I slowly get the feeling, that the blob doesn't care that much about the vbios though
02:42 karolherbst: because the pwm stuck at 0x26 for 862-+135MHz
02:42 karolherbst: *0x3d
02:43 mupuf: should not be too hard to compare the vbios of a PWM-based VR and a GPIO-based one
02:43 karolherbst: also it seems to use only three or four values in total
02:43 karolherbst: yeah
02:43 mupuf: well, maybe it is because there are only 4 values defined
02:44 karolherbst: nouveau detected only the GK208 card
02:44 karolherbst: but its fine for now
02:45 mupuf: well, the voltage table seems to have been adjusted
02:46 karolherbst: https://gist.github.com/karolherbst/38cb88b19db03773f300 this is what nvbios tells me
02:47 mupuf: this is not the vbios table I was talking about
02:47 karolherbst: which one do you mean?
02:48 mupuf: Voltage table
02:48 mupuf: look for it
02:48 karolherbst: its also there
02:48 mupuf: it is the one before the temperature table
02:48 karolherbst: line 258
02:48 mupuf: wow, you have a ton of values
02:48 karolherbst: yeah
02:48 karolherbst: patch
02:49 karolherbst: https://github.com/karolherbst/envytools/commit/f684eadaa33eb8d4677f5900b80db0e1e7788aec
02:49 karolherbst: there are some cards with 64 values ;)
02:49 karolherbst: without the patch 0 are shown
02:49 mupuf: will have to look at that later
02:49 mupuf: see you!
02:49 karolherbst: yeah, later
02:50 mupuf: and no, both the GK106 and the GK208 are there
02:50 mupuf: nvalist says so
02:50 karolherbst: okay
02:50 karolherbst: dmesg only printed information about one
02:50 karolherbst: so I was confused
02:51 karolherbst: ohh
02:51 karolherbst: there is only one card in sysfs, too
02:51 karolherbst: I will figure that out
02:51 mupuf: let me check it out
02:51 mupuf: stop using reator for a sec
02:51 karolherbst: k
02:52 karolherbst: the GK208 is ddr3 though
02:53 mupuf: what the heck is wrong with this nve6....
02:53 mupuf: lspci finds it, and so does nva
02:54 mupuf: anyway, will check it out later
02:54 karolherbst: okay
02:54 karolherbst: but the GK208 doesn't help me ;)
02:55 karolherbst: but I think I will find a way
02:55 mupuf: then it will have to wait for tonight, sorry
02:55 karolherbst: no problem
02:56 karolherbst: "[ 867.099023] nouveau D[ DEVICE][0000:05:00.0][0x80000080] illegal class 0x9570"
04:51 karolherbst: imirkin: what is the best way to extract any kind of nvidia gpu assembly from a mmt?
04:51 karolherbst: I tried "-d all -e shader" but I guess there is more
05:22 karolherbst: RSpliet: you said something about a crystal clock != 27MHz. What kind of cards or under which circumstances is the clock different?
05:22 RSpliet: generally that's older than Tesla
05:23 karolherbst: okay
05:23 karolherbst: because I highly doubt my gddr5 patch works if the crystal is different
05:24 karolherbst: or it won't work as expected
05:24 karolherbst: which means I would have to think about how to handle different crystal clocks
05:24 RSpliet: you should make it so
05:25 karolherbst: also I noticed, that my voltage was much lower than the blob ones, so this could also explain some hangs
05:25 RSpliet: there's perfectly good functions for reading out the source clock frequency
05:25 karolherbst: yeah, that's not the big problem
05:25 karolherbst: I basically just hardcoded usable refclocks for the first PLL output
05:25 karolherbst: that should be more dynamic then
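One way to avoid a hard-coded reference clock is to treat it as an input and search for coefficients. The sketch below uses the generic integer PLL relation out = ref * N / (M * P); the coefficient ranges are invented, and real hardware limits (VCO range, fractional N, chained PLLs) are ignored, so this is an assumption-laden illustration rather than what the blob or nouveau does.

```python
from typing import Iterable, Tuple


def pll_output_khz(ref_khz: int, n: int, m: int, p: int) -> int:
    """Generic integer PLL: output = ref * N / (M * P), truncated to kHz."""
    return ref_khz * n // (m * p)


def find_coefficients(
    ref_khz: int,
    target_khz: int,
    n_range: Iterable[int] = range(8, 256),   # hypothetical limits
    m_range: Iterable[int] = range(1, 8),     # hypothetical limits
    p_range: Iterable[int] = (1, 2, 4, 8),    # hypothetical limits
) -> Tuple[int, int, int]:
    """Brute-force the (N, M, P) triple whose output is closest to the target."""
    n_values, m_values = list(n_range), list(m_range)
    best = None
    for p in p_range:
        for m in m_values:
            for n in n_values:
                err = abs(pll_output_khz(ref_khz, n, m, p) - target_khz)
                if best is None or err < best[0]:
                    best = (err, n, m, p)
    if best is None:
        raise ValueError("empty coefficient ranges")
    return best[1], best[2], best[3]
```

Passing the crystal frequency read from the hardware as `ref_khz` means the same search works for 27 MHz and for other crystals.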
05:26 karolherbst: but I don't know what the blob does on such cards
05:26 RSpliet: neither do we, but I doubt it's supposed to be a hard-coded clock
05:27 karolherbst: yeah
05:27 karolherbst: the PLL values kind of form a pattern
05:27 karolherbst: if you check this table https://gist.github.com/karolherbst/a5dd956189a533ff3e6d#file-blob-values-for-pll-csv
05:27 karolherbst: first row is the first PLL reg
05:28 karolherbst: the question is if I should use these reg values or rather try to find values for a specific clock range
05:28 karolherbst: ... it shouldn't be voltage in the table, but refclock...
05:34 RSpliet: would you mind dropping a copy of nvbios -v <yourbios.bin> somewhere on a paste website?
05:34 karolherbst: my stuff is in git
05:35 karolherbst: nve6/$email/vbios.rom
05:35 RSpliet: I have no access to git from work
05:35 RSpliet: nor the envytools
05:35 karolherbst: okay
05:35 karolherbst: wait
05:38 karolherbst: RSpliet: https://gist.github.com/karolherbst/706944aac588b37d0da8
05:45 karolherbst: what are you searching for?
05:51 RSpliet: hints to determine the refclk
05:51 RSpliet: but none
05:51 RSpliet: (and I only have so much time to waste)
05:53 karolherbst: RSpliet: the PLL values are the same across cards on the blob
05:53 karolherbst: didn't check all, but the ones I checked were the same
05:53 karolherbst: I really don't think its card dependent what to use or maybe its just the same in the bios for all
05:54 RSpliet: yes, it was unlikely, but couldn't resist my urge
06:02 karolherbst: I see
06:16 karolherbst: now nearly only performance is missing for my card
06:24 karolherbst: RSpliet: what do you think about a memory over/under-clocking debugfs file?
06:25 karolherbst: gpu core clocking looks too challenging for the moment, because of the cstate stuff going on
06:25 RSpliet: I'd say don't bother
06:25 RSpliet: it's not something we'd want to support just like that
06:25 karolherbst: mhh I wanted to implement it like its on the blob
06:26 karolherbst: only for highest pstate
06:26 RSpliet: well, you can... but I don't expect it to be accepted upstream
06:26 karolherbst: mhh okay
06:27 karolherbst: because of danger or just because the gain isn't worth it
06:27 RSpliet: because of liability
06:27 karolherbst: ohh
06:28 karolherbst: okay
06:28 karolherbst: even if there is an option for that and stuff? Because I don't see why nouveau should not include something like that if nvidia does
06:29 RSpliet: it will need a lot of thought
06:29 karolherbst: yeah I am aware of that
06:29 RSpliet: on how we'd want to support it, how we inform the user, and what the options are
06:29 karolherbst: okay
06:30 RSpliet: I'd say it would take a *lot* of effort to get all this mapped out (determining boundaries, bikeshedding, politics) for a relatively little gain
06:30 RSpliet: there's bigger issues to tackle to get more perf from nouveau
06:30 karolherbst: okay
06:31 karolherbst: I was just thinking that as long as everything seems to work for me, I could take a look on the performance side. But maybe improving the gpu code is the better way to do it
06:32 RSpliet: it most likely is... although I don't have a clear vision on what the bottlenecks currently are
06:32 karolherbst: mhh
06:33 karolherbst: I am pretty sure for me its the gpu code or putting enough stuff into the gpu or something in this kind of area
06:33 RSpliet: I reckon it's memory bandwidth utilisation, but there could be a million different hidden ways of increasing that
06:33 karolherbst: lower temp just doesn't come by itself if voltage/clocks/pci link is the same
06:33 karolherbst: I think the gpu is doing not as much as with the blob
06:34 RSpliet: lower temps can be achieved by clock gating
06:34 karolherbst: yeah, but blob has higher ones ;)
06:34 karolherbst: at full load
06:34 RSpliet: and lower zeroes?
06:34 RSpliet: oh
06:34 RSpliet: higher temperatures
06:34 karolherbst: yes
06:34 karolherbst: that's why I think it has something to do with what the gpu does or gets
06:35 karolherbst: I think the difference is like 5 or 6°C ?
06:35 RSpliet: possibly
06:35 karolherbst: or its just gpu boost
06:36 karolherbst: never encountered that in nvidia-settings though
06:36 karolherbst: ohhhh
06:36 karolherbst: ...
06:36 RSpliet: or nouveau being more conservative with driving the fan...
06:36 karolherbst: I just found something
06:36 karolherbst: gpu boost target seems to be 80°C for the titan card
06:37 karolherbst: and nvidia ran the card at like this temp at normal 0f clocks
06:37 karolherbst: could be coincidence though
06:38 karolherbst: mhh
06:38 karolherbst: okay gpu boost is windows only
06:39 karolherbst: RSpliet: nouveau doesn't control my fan
06:39 karolherbst: and it's also not as loud as with the blob
06:49 imirkin: karolherbst: just demmt -l foo.mmt, and then search for START_ID
06:49 karolherbst: okay, thanks
06:49 imirkin: there's probably a clever cmdline you can give to just dump those, but tbh i've never needed that
06:55 karolherbst: imirkin: the CODE section?
06:55 karolherbst: okay, that looks nice
06:55 imirkin: that has the code, yes ;)
06:56 karolherbst: I see a lot of scheds there ..
06:56 imirkin: every 6th instruction
06:56 karolherbst: *8
06:56 imirkin: er right
06:57 karolherbst: does it tell the priority of the next 7 instructions?
06:58 karolherbst: mhh
06:58 imirkin: read SchedDataCalculator
06:59 imirkin: it basically gives it an idea of the inter-op dependencies
06:59 imirkin: latencies
06:59 imirkin: etc
06:59 karolherbst: okay
06:59 karolherbst: but that's not the part I am interested in currently :D
06:59 karolherbst: don't know enough to implement it anyway
07:00 karolherbst: so I will just compare code and look if something is a lot different
07:01 imirkin: well, one thing that the blob compiler likes to do is to do conditional branches over long predicated sections if all invocations have that predicate = 0
07:02 karolherbst: and it likes to have code length of v * 8
07:02 karolherbst: and fills up with nops
07:02 karolherbst: or does mesa the same?
07:02 imirkin: and a branch that jumps to itself after the exit... heh.
07:02 karolherbst: yeah
07:02 karolherbst: saw that
07:02 karolherbst: 00000038: 00001de7 80000000 exit
07:02 karolherbst: 00000040: e0001de7 4003ffff B bra 0x40
07:02 imirkin: somehow i doubt that has anything to do with perf though
07:02 karolherbst: like this?
07:03 imirkin: yea
07:03 imirkin: we don't do that
07:03 imirkin: doesn't seem necessary
07:03 karolherbst: found something REALLY strange now
07:03 karolherbst: https://gist.github.com/karolherbst/53e892641c1aa316386c
07:03 karolherbst: the bra should be a sched
07:03 karolherbst: but it isn't
07:03 karolherbst: still 7 instructions after it
07:03 karolherbst: even all nops
07:03 imirkin: yeah, it's just filled in with junk
07:03 imirkin: don't worry about it
07:04 imirkin: nothing after the exit matters
07:04 karolherbst: I am more worried about the bra
07:04 karolherbst: okay
07:04 karolherbst: ohh
07:04 karolherbst: it always does this anyway
07:04 imirkin: there might be some reason why they have it... perhaps if you single-step over the exit, you need the bra there... who knows.
07:04 karolherbst: debugging?
07:04 imirkin: yea
07:05 imirkin: but in practice it doesn't matter
07:06 karolherbst: how hard is it to implement scheduling?
07:06 karolherbst: even a really trivial one
07:07 imirkin: not too hard
07:08 imirkin: you figure out which opcodes in a bb have no dependencies on any other opcodes in that bb
07:08 imirkin: and then pick the "right" one, repeat
07:08 imirkin: which one is the right one? that's why scheduling is tricky
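The "find the ops with no pending dependencies, pick one, repeat" loop can be sketched as a tiny list scheduler. Ops are plain integers here and `pick` stands in for the heuristic; both are simplifications for illustration and not mesa's data structures.

```python
from typing import Callable, Dict, List, Set


def list_schedule(
    deps: Dict[int, Set[int]],
    pick: Callable[[List[int]], int],
) -> List[int]:
    """Order the ops of one basic block.

    deps[i] is the set of ops in the same block that op i depends on.
    pick chooses the "right" op among those that are currently ready.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    order: List[int] = []
    while remaining:
        ready = sorted(op for op, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle inside the block")
        chosen = pick(ready)
        order.append(chosen)
        del remaining[chosen]
        # The chosen op no longer blocks anything that depended on it.
        for d in remaining.values():
            d.discard(chosen)
    return order
```

Swapping `pick` is where the real difficulty lives: `min` keeps the original order, while any other choice is a scheduling heuristic.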
07:09 karolherbst: okay
07:09 karolherbst: so checking dest and stuff
07:09 karolherbst: and sometimes there are instructions with strange effects I guess
07:09 karolherbst: or more stuff gets changed than in dest/src
07:11 imirkin: no
07:11 imirkin: there could be implicit deps on memory though
07:11 imirkin: e.g. x = a; b = x;
07:11 imirkin: where x is some memory location
07:11 karolherbst: okay
07:12 imirkin: and there are some other subtleties i won't mention for now
07:12 imirkin: since they're relatively minor
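The implicit memory dependency from the `x = a; b = x;` example can be sketched as extra edges between memory ops that touch the same location. The `(kind, address)` tuples are a made-up representation, and the sketch assumes addresses can be compared exactly; with possible aliasing a real pass would have to be more conservative.

```python
from typing import Dict, List, Optional, Set, Tuple

# (kind, address): kind is "load", "store" or "alu"; address is None for alu.
Op = Tuple[str, Optional[str]]


def memory_deps(ops: List[Op]) -> Dict[int, Set[int]]:
    """Add an edge whenever two ops touch the same location and one is a store."""
    deps: Dict[int, Set[int]] = {i: set() for i in range(len(ops))}
    for i, (kind, addr) in enumerate(ops):
        if kind not in ("load", "store"):
            continue
        for j in range(i):
            prev_kind, prev_addr = ops[j]
            if prev_kind not in ("load", "store") or prev_addr != addr:
                continue
            # load-after-load is harmless; every other pair must keep its order.
            if kind == "store" or prev_kind == "store":
                deps[i].add(j)
    return deps
```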
07:12 karolherbst: NV50_PROG_DEBUG prints native code like the one in mmt?
07:13 imirkin: sssssort of
07:13 karolherbst: I already saw mad = fma
07:13 imirkin: (a) it prints the code that nouveau *thinks* it's emitting
07:13 karolherbst: :D
07:13 imirkin: (b) it prints the opcodes in nv50 ir, which generally but not perfectly map to actual isa opcodes
07:14 imirkin: the first point is an important one... tons of emitter bugs here and there
07:14 imirkin: http://cgit.freedesktop.org/mesa/mesa/log/src/gallium/drivers/nouveau/codegen
07:15 imirkin: like this one for example, although that was a simple minor one:
07:15 imirkin: http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau/codegen?id=8c8a71f0d125bb655b17a32914ffecf8d159593b
07:16 imirkin: this one was fun: http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau/codegen?id=c3215ef204c0fdfc44230adbd423720169d44dcb
07:19 karolherbst: found unknown stuff
07:20 karolherbst: imirkin: https://gist.github.com/karolherbst/e49119952ca21ed74a88
07:20 imirkin: that looks like BS
07:20 karolherbst: but this one kind of seems unimportant
07:20 karolherbst: okay
07:21 imirkin: there aren't short opcodes on kepler
07:21 karolherbst: never happens again in my file
07:22 imirkin: nvdisasm doesn't like it either
07:22 imirkin: karolherbst: this is how i run nvdisasm btw: http://hastebin.com/raw/qihijesolo
07:23 imirkin: (SM30 = kepler1)
07:23 karolherbst: okay
07:23 imirkin: no relation to the shader model 3.0 from DX9
07:24 imirkin: if you're feeling clever/creative, you could hack support for running the shader code through nvdisasm instead of envytools
07:24 imirkin: er, envydis
07:24 imirkin: might be too slow though
07:25 karolherbst: its okay
07:26 karolherbst: I have like 9 different codes from the blob
07:26 karolherbst: should be fine
07:26 karolherbst: 8
07:27 imirkin: if you're looking to improve nouveau's compiler, look for code that you think is stupidly written, and try to figure out optimizations that would make it smarter
07:27 imirkin: (i mean generated code, of course)
07:28 imirkin: also note that using more registers = less parallelism
07:28 imirkin: not a fact that nouveau's RA takes into account
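Why more registers means less parallelism can be shown with a back-of-the-envelope calculation. The defaults below (65536 registers per multiprocessor, 32 threads per warp, at most 64 resident warps) are assumed Kepler-like figures, and allocation granularity is ignored, so treat the numbers as illustrative only.

```python
def resident_warps(
    regs_per_thread: int,
    regfile_size: int = 65536,    # assumed registers per multiprocessor
    threads_per_warp: int = 32,   # assumed warp width
    max_warps: int = 64,          # assumed hardware limit on resident warps
) -> int:
    """Warps that fit into the register file for a given per-thread usage."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    by_registers = regfile_size // (regs_per_thread * threads_per_warp)
    return min(max_warps, by_registers)
```

Under these assumptions a shader using 32 registers per thread can keep 64 warps resident, while one using 63 registers drops to 32, leaving fewer warps to hide latency.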
07:28 karolherbst: okay
07:28 karolherbst: that might explain it
07:28 karolherbst: gpu does less parallelism => it does less stuff => produces less heat?
07:31 karolherbst: imirkin: I think these are the same? https://gist.github.com/karolherbst/6c40fb151c295eb6d324
07:32 karolherbst: I mean, same gl/glsl whatever code
07:33 karolherbst: wow
07:33 karolherbst: nouveau uses a lot more registers
07:34 karolherbst: ohh, maybe only the numbers are strange
07:35 imirkin: yeah, so with clever scheduling
07:35 imirkin: it's able to put the texbar *way down*
07:36 imirkin: with like 10-20 instructions in between the tex and their use
07:36 imirkin: whereas since nouveau doesn't reorder
07:36 imirkin: it happens almost immediately
07:36 karolherbst: I am surprised that this code doesn't have any branches ...
07:36 imirkin: shader code usually has few if any branches
07:36 karolherbst: usually there are a lot of them and I just give up reading them
07:36 karolherbst: ohh
07:37 karolherbst: I saw a lot of them
07:37 karolherbst: maybe I started with something strange
07:37 imirkin: like shadertoy.com? :)
07:37 karolherbst: don't know
07:38 karolherbst: maybe I should start with glxspheres first :/ instead of furmark
07:38 imirkin: which has shaders that *only* have branches
07:38 imirkin: hehe
07:38 karolherbst: glxgears has 3x perf on blob
07:38 imirkin: that's a call overhead test
07:38 imirkin: more than anything else
07:38 karolherbst: mhh
07:38 karolherbst: gpu load at 80%?
07:38 karolherbst: ohh
07:39 imirkin: yeah, coz they have very low call overhead ;)
07:39 karolherbst: yeah I see
07:40 karolherbst: these stupid regs just polluted the traces like always ...
07:41 karolherbst: okay, but from the code I don't really see what nouveau does bad
07:41 karolherbst: except scheduling
07:42 karolherbst: whats cvt?
07:42 imirkin: convert
07:42 karolherbst: ohh
07:42 imirkin: usually from one datatype to another
07:42 imirkin: but it can also be used for rounding, abs/neg, etc
07:43 imirkin: bbl
07:49 karolherbst: got blob unstable at highest perf mode with specific clocks :D
07:50 karolherbst: imirkin: memory performance is awesome now
07:50 karolherbst: furmark: blob 55fps, nouveau 52fps
07:50 karolherbst: and it's heavily memory limited
07:51 karolherbst: 75% mem clock: blob only 40fps
08:01 karolherbst: blob : 1700 points, nouveau 1433 points
08:02 karolherbst: 85% with a memory bottlenecked benchmark
08:16 imirkin_: karolherbst: you wanted my gk208 vbios earlier... http://people.freedesktop.org/~imirkin/traces/gk208/gk208-vbios.rom
08:17 karolherbst: I wanted it?
08:17 karolherbst: mhh
08:17 karolherbst: do you know why?
08:17 imirkin_: voltage?
08:19 karolherbst: don't know
08:19 karolherbst: maybe I will know later and then I have it
08:19 karolherbst: anyway I think the performance problem is somehow core related
08:19 karolherbst: memory seems to be fast enough
08:24 karolherbst: imirkin_: stupid idea but maybe the gpu expects sched instructions to be there and it runs slower just when there is something else... :D
08:28 imirkin_: yes, that's how kepler works
08:29 karolherbst: I meant like, would it improve performance if every 8th instruction is sched with arguments that do nothing?
08:30 martm: karolherbst: mind telling how does your gk208 bench against blob using some framework, can you get to 80% of blobs perf?
08:30 karolherbst: gk106
08:30 karolherbst: and furmark is at 85% speed
08:31 karolherbst: mhh basically 0f pstate does the trick
08:31 martm: aight, that should be even more powerful gpu, if i remember right, well yeah then this could be state tracker overhead
08:31 karolherbst: furmark depends a lot of memory clock
08:31 martm: are you able to run it at highest perf level?
08:31 karolherbst: decreasing mem clock by 25% also decreases fps by 25%
08:32 karolherbst: yes
08:32 martm: well nice!
08:32 karolherbst: guess what I've been working on the last days :D
08:32 karolherbst: martm: https://github.com/karolherbst/nouveau/commit/6933ebb2480bb62534648c180501c5bad6d2c514
08:33 karolherbst: but GK208 seems to be mostly ddr3
08:35 imirkin_: karolherbst: the only thing that i'm fairly sure works well in the nouveau codegen is the sched code calculation on kepler
08:35 imirkin_: karolherbst: the sched codes aren't *instructions* per se, even though they're presented that way
08:35 karolherbst: ohh okay
08:36 imirkin_: the gpu just grabs stuff in 0x40 chunks, and the first 8 bytes of that chunk is the sched info
08:36 karolherbst: ahhhh
08:36 imirkin_: (or 0x20 chunks on maxwell)
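The Kepler layout described here (0x40-byte chunks whose first 8 bytes carry the sched info) can be sketched as a small splitter. Little-endian 64-bit words and the helper name `split_chunks` are assumptions made for the example; the meaning of the sched bits is not decoded.

```python
import struct
from typing import List, Tuple

CHUNK_BYTES = 0x40  # Kepler fetch group size, as stated in the discussion
WORDS_PER_CHUNK = 8  # 1 sched word + 7 instruction words of 8 bytes each


def split_chunks(code: bytes) -> List[Tuple[int, List[int]]]:
    """Split shader code into (sched_word, [7 instruction words]) per chunk."""
    if len(code) % CHUNK_BYTES:
        raise ValueError("code length is not a multiple of 0x40")
    chunks = []
    for off in range(0, len(code), CHUNK_BYTES):
        # Assumption: each slot is a little-endian 64-bit word.
        words = struct.unpack_from("<%dQ" % WORDS_PER_CHUNK, code, off)
        chunks.append((words[0], list(words[1:])))
    return chunks
```

This also explains the "every 8th instruction" observation: a disassembler that prints one line per 8-byte slot shows the sched word as if it were an instruction.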
08:36 karolherbst: okay and the sched part is empty now with stuff that does nothing
08:37 imirkin_: for maxwell? yeah
08:37 karolherbst: and kepler?
08:37 imirkin_: kepler = good
08:38 karolherbst: mhh
08:38 karolherbst: so mesa does schedule on kepler?
08:38 imirkin_: no
08:38 imirkin_: it computes the sched codes
08:39 imirkin_: however the instructions are still ordered very poorly
08:39 karolherbst: ohh okay
08:39 imirkin_: but *given* that ordering, the sched codes are computed correctly
08:39 karolherbst: I see
08:39 martm: karolherbst: but gk208 is some different kepler , does gddr5 reclock work on it too now?
08:39 karolherbst: no
08:39 karolherbst: gk208 is ddr3 mostly
08:39 imirkin_: sched codes just have various latency/dependency/etc info
08:39 karolherbst: and this isn't mainlined
08:39 karolherbst: okay
08:39 imirkin_: huh?
08:40 martm: aight, what about maxwell?
08:40 karolherbst: I mean it works kind of, if you got the patch
08:40 karolherbst: but there are still issues
08:40 imirkin_: worksforme
08:40 imirkin_: (gk208 + ddr3)
08:40 karolherbst: yeah
08:41 martm: well, dudes you seem to be doing intelligent stuff:), nice
08:41 karolherbst: I think martm thinks it is gddr5
08:41 martm: yeah i got it, yeah i thought it could be
08:41 karolherbst: there is one
08:41 karolherbst: GeForce GT 640 Rev. 2 is gddr5
08:42 imirkin_: but not gk208 :)
08:42 imirkin_: it's a GK107
08:42 karolherbst: and GeForce GT 730 (GDDR5)
08:42 imirkin_: [i think]
08:42 imirkin_: maybe i'm thinking of GT 630 rev2
08:42 karolherbst: the GT 640 seems strange anyway
08:42 karolherbst: there is even a fermi version of gt640
08:42 karolherbst: yes
08:43 karolherbst: 630 rev 2 is ddr3
08:44 karolherbst: imirkin_: so if I understand you right: nouveau currently orders the instructions badly and the "sched" instructions just give some additional information
08:45 RSpliet: well, badly is a strong word
08:45 imirkin_: it just leaves them in the initial order
08:45 RSpliet: it orders them in a way that the program behaves as expected
08:45 karolherbst: okay
08:45 imirkin_: but there's no reordering that happens later
08:45 RSpliet: it doesn't attempt to optimise for memory access
08:45 imirkin_: aka instruction scheduling
08:46 karolherbst: k, got it
08:46 RSpliet: nor pipeline stalls (?)
08:46 RSpliet: nor dual-issue (??)
08:46 martm: i seem to have a ban still on most relevant intel channels on my name, does anyone know how intel GL_INTEL_TESSELATION should work on android using windows drivers? mupuf do you know how they do it?
08:47 karolherbst: mhhh okay
08:48 karolherbst: maybe I could try to work on that, but I mostly think that this is a lot of work to get this right somehow
08:49 RSpliet: it is, why else would you think nobody done it yet :-P
08:50 karolherbst: :/
08:50 RSpliet: +has
08:51 karolherbst: I am just worried that it takes like a week to get the first result or anything
08:51 RSpliet: then split up the work in smaller milestones
08:51 karolherbst: is it possible with instructions reordering?
08:52 imirkin_: karolherbst: well, milestone 1 is to create logic that reorders it based on random heuristics
08:53 imirkin_: karolherbst: and later milestones will hopefully improve on "random" :)
08:53 imirkin_: if you can't beat random, then... heh.
08:53 karolherbst: I mean, if I do something wrong the code just does stupid stuff I guess
08:54 karolherbst: maybe I could start reorder specific instructions first
08:54 imirkin_: no, you have to build up dependency graphs
08:54 imirkin_: the graph basically already exists
08:54 imirkin_: you just have to process it
08:54 karolherbst: ohh
08:55 imirkin_: and only reorder within a basic block
08:55 karolherbst: then I can't do that much wrong if the graph is right
08:55 imirkin_: as long as you respect the graph ;)
08:55 karolherbst: yeah
08:55 imirkin_: nothing in nouveau will prevent you from ordering stuff wrong
08:56 karolherbst: but then again I need to know what is a better ordering
08:56 martm: hmm, in what cases should instruction reordering be beneficial?
08:56 imirkin_: which is why milestone 1 = random ordering
08:57 karolherbst: okay, so maybe I got lucky and see a performance increase or performance drop
08:57 imirkin_: so that you don't have to worry about such things and can just make sure your reordering logic is good
08:57 karolherbst: ahh okay
08:57 imirkin_: and then you can implement heuristics
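Milestone 1 can be sketched as a random pick among ready ops plus a checker that the result respects the graph. Since nothing stops a wrong ordering from being emitted, the checker is the part worth keeping around; the integer ops and both function names are hypothetical stand-ins, not nouveau code.

```python
import random
from typing import Dict, List, Set


def random_order(deps: Dict[int, Set[int]], rng: random.Random) -> List[int]:
    """Pick a random op among those whose in-block dependencies are all placed."""
    remaining = {op: set(d) for op, d in deps.items()}
    order: List[int] = []
    while remaining:
        ready = sorted(op for op, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle inside the block")
        chosen = rng.choice(ready)
        order.append(chosen)
        del remaining[chosen]
        for d in remaining.values():
            d.discard(chosen)
    return order


def respects_deps(order: List[int], deps: Dict[int, Set[int]]) -> bool:
    """True if every op appears exactly once and after everything it depends on."""
    if len(order) != len(deps) or set(order) != set(deps):
        return False
    position = {op: i for i, op in enumerate(order)}
    return all(position[d] < position[op] for op, ds in deps.items() for d in ds)
```

Running the checker over many seeds exercises the reordering logic before any heuristic is involved.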
09:00 karolherbst: imirkin_: which part of mesa do I have to look?
09:01 imirkin_: codegen
09:01 imirkin_: look for reorderInstructions
09:04 karolherbst: can I use the shader cache to try stuff out?
09:05 karolherbst: ohh
09:05 karolherbst: its only an option
09:05 karolherbst: nvm then
09:06 karolherbst: imirkin_: you mean orderInstructions?
09:06 imirkin_: ya
09:07 karolherbst: :D
09:07 karolherbst: nice function though
09:07 imirkin_: insert code here :)
09:07 karolherbst: Graph::Node
09:07 karolherbst: this is part of gallium?
09:07 imirkin_: no
09:07 karolherbst: okay, so nouveau already
09:09 karolherbst: imirkin_: is this "BasicBlock::get(reinterpret_cast<Graph::Node *>(it->get()));" these blocks I saw in the dumped shader codes?
09:09 imirkin_: yes
09:09 karolherbst: okay
09:09 imirkin_: bb has no control flow
09:09 imirkin_: only at the end of it
09:10 karolherbst: okay
09:10 karolherbst: so I don't need to worry much there
09:10 imirkin_: the graph controls the linkages
09:10 karolherbst: I currently look how I get the dependencies
09:10 imirkin_: instr->getSrc(n)->getInsn()
09:11 imirkin_: will tell you the instruction that defines that source
09:11 imirkin_: if it's in a diff bb then it doesn't matter
09:11 karolherbst: mhh okay
09:12 karolherbst: I could create some lists first and mix them together
09:12 karolherbst: first list: no deps inside block
09:13 imirkin_: right
09:13 imirkin_: and then add stuff in every time you sched an p
09:13 imirkin_: or you can go the other way
09:13 imirkin_: start at the end
09:14 imirkin_: http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
09:14 karolherbst: p?
09:15 imirkin_: op :)
10:16 imirkin_: karolherbst: btw, you may be interested in reading up a bit on SSA form. basically each thing is only ever assigned once.
10:17 imirkin_: and there are these things called "phi" nodes that deal with merging assignments from multiple branches
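The "assigned only once" property can be illustrated by renaming straight-line code into SSA form. This sketch handles a single block only, so no phi nodes are needed; merging values from several branches is exactly the part it leaves out. The `(dest, sources)` tuples and the `name.N` naming are made up for the example.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, List[str]]  # (destination, sources)


def to_ssa(stmts: List[Stmt]) -> List[Stmt]:
    """Give every assignment a fresh versioned name within one basic block."""
    version: Dict[str, int] = {}
    out: List[Stmt] = []
    for dest, srcs in stmts:
        # Sources refer to the latest version; unversioned names are block inputs.
        new_srcs = [f"{s}.{version[s]}" if s in version else s for s in srcs]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}.{version[dest]}", new_srcs))
    return out
```

After renaming, `x = a; x = x + b; y = x` becomes `x.1 = a; x.2 = x.1 + b; y.1 = x.2`, so every name has exactly one definition.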
10:21 karolherbst: imirkin_: do I have to iterate over all instructions to check if an instruction is inside a block?
10:21 imirkin_: insn->bb == bb seems faster
10:22 karolherbst: ahh
10:22 karolherbst: I usually code always with methods, so I am not used to checking members :D
10:27 karolherbst: okay, first test
10:29 imirkin_: well, don't go too crazy with adding methods
10:29 imirkin_: but obviously add what makes sense
10:29 imirkin_: i'm not too big a fan of random getFoo() functions that just return some member
10:29 imirkin_: neither was calim apparently
10:29 imirkin_: :)
10:30 imirkin_: [he wrote all the code]
10:30 imirkin_: [or at least a lot of it]
10:30 karolherbst: I also often code with interfaces ;)
10:30 karolherbst: but that was for software for more general usage
10:30 imirkin_: karolherbst: well, there are definitely a few ugly odds and ends
10:30 karolherbst: yeah
10:30 imirkin_: if you have ideas for cleanups, run them by me, and if they make sense let's do them
10:30 karolherbst: for mesa performance is a bit important, so I can understand why stuff is done that way
10:31 karolherbst: but seriously, ArrayList could have been a template ;)
10:32 karolherbst: maybe I take a look to make stuff c++11 compatible if its not done already
10:32 karolherbst: makes iteration much easier
10:32 imirkin_: karolherbst: that's one of the few uglies
10:32 karolherbst: okay, so file installed
10:32 imirkin_: karolherbst: the reason is that originally c++ templates were frowned upon, so calim did it without using std::stuff
10:32 karolherbst: now let the fun begin
10:32 karolherbst: :D
10:32 karolherbst: ahhh
10:32 karolherbst: yeah
10:32 karolherbst: no
10:32 karolherbst: its okay
10:32 karolherbst: std::stuff is slow
10:33 imirkin_: karolherbst: and no one's had the heart to clean it up. but ideally it should be used
10:33 imirkin_: really? my experience is that it's quite good
10:33 karolherbst: string conversion is slow as hell
10:33 imirkin_: conversion?
10:33 karolherbst: boost::lexical_cast is much faster
10:33 karolherbst: string to int
10:33 karolherbst: and stuff
10:33 imirkin_: that's not the std stuff i'm talking about :p
10:33 karolherbst: I know
10:33 imirkin_: i'm talking about std::list std::vector std::unordered_* etc
10:33 karolherbst: but the c++ std lib tries to be safe
10:34 karolherbst: I mean, its not that slow, but well...
10:34 karolherbst: there are some SIMD implementation out there
10:34 karolherbst: which do things a lot faster
10:34 imirkin_: probably. either way, it's not like ArrayList is particularly clever.
10:34 karolherbst: yeah
10:34 imirkin_: it's just like a not-templates array impl
10:35 karolherbst: a std container should have been fine there
10:35 karolherbst: mhh
10:35 karolherbst: I don't like templates either with gcc older than 4.9
10:35 karolherbst: :D
10:35 karolherbst: hard to debug
10:35 karolherbst: I always had to check with clang before
10:35 imirkin_: heh
10:35 imirkin_: use c++filt
10:35 karolherbst: otherwise some errors just didn't tell me anything
10:35 imirkin_: or wait... is it not that one?
10:35 imirkin_: iirc there was a script to basically trim out 99% of the errors of gcc's template junk
10:36 imirkin_: i got pretty good at reading them back in the gcc 4.0 days though, so i don't mind them
10:36 karolherbst: wow, glxgears still works...
10:36 imirkin_: or maybe it was 3.3 days
10:36 karolherbst: wait
10:36 karolherbst: I have to install mesa right
10:36 imirkin_: and you have to save the new order ;)
10:36 karolherbst: okay, now I bricked mesa
10:37 karolherbst: but why
10:37 karolherbst: ohhh "libGL: dlopen /usr/lib64/dri/nouveau_dri.so failed (/usr/lib64/dri/nouveau_dri.so: undefined symbol: ST_DEBUG)"
10:38 karolherbst: imirkin_: howso?
10:39 imirkin_: -ENOPATCH
10:39 karolherbst: ?
10:39 karolherbst: now I am confused
10:40 imirkin_: i can't tell you what you did wrong if i don't see a patch
10:40 karolherbst: ahh no, I think dirty build directory or something
10:40 karolherbst: I do a clean build now
10:40 karolherbst: intel got this: /usr/lib64/dri/i965_dri.so failed (/usr/lib64/dri/i965_dri.so: undefined symbol: nir_validate_shader)
10:41 karolherbst: I built mesa like half a year ago?
10:41 karolherbst: I mean in my local repository
10:41 imirkin_: clean.
10:41 karolherbst: yes
10:41 imirkin_: save your changes somewhere and 'git clean -fdx'
10:41 imirkin_: watch out
10:41 imirkin_: coz it'll erase all the various non-tracked files
10:42 imirkin_: so if you keep patches/etc in there... it'll nuke those too
10:42 karolherbst: segfault now
10:42 karolherbst: this could be my mistake
10:42 karolherbst: now, it's fine
10:42 karolherbst: make clean was enough
10:42 imirkin_: it's been known to happen :)
10:44 karolherbst: https://gist.github.com/karolherbst/64281feb34942a8025a7 this looks kind of good to me :/
10:45 karolherbst: I know it's wrong logically though
10:45 imirkin_: errrr wha
10:45 imirkin_: oh i see
10:46 imirkin_: i wonder what insn->serial is
10:46 karolherbst: I have to check dest too
10:46 imirkin_: no
10:46 karolherbst: if the instructions get inserted at the end?
10:46 imirkin_: but getInsn() can return null
10:46 imirkin_: e.g. imagine an immediate
10:46 karolherbst: ohh
10:46 karolherbst: okay
10:46 imirkin_: it can also be a bit more subtle
10:47 imirkin_: like imagine it's an indirect memory reference
10:47 imirkin_: you need to also look at getSrc(i)->hasIndirect(0) and ->hasIndirect(1)
10:47 imirkin_: s/has/is/
10:47 imirkin_: or just do ->getIndirect(i, 0) and ->getIndirect(i, 1)
10:47 karolherbst: glxgears: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_util.cpp:119: bool nv50_ir::Interval::extend(int, int): Assertion `a <= b' failed.
10:48 imirkin_: but glxgears won't have any such trickery
10:48 karolherbst: yeah well
10:48 karolherbst: it seems it has
10:49 karolherbst: mhh
10:49 imirkin_: no
10:49 imirkin_: definitely no indirect stuff.
10:49 karolherbst: okay, still I get this assertion
10:49 imirkin_: figure out why
10:49 karolherbst: maybe I should check dest
10:49 imirkin_: the thing i said will just cause you not to notice deps that exist
10:49 karolherbst: okay
10:49 imirkin_: the current insn is the one that defines the dest...
10:50 imirkin_: it's SSA
10:50 imirkin_: so each dest is only assigned once
10:50 karolherbst: okay
10:50 imirkin_: SSA = Static Single Assignment
10:51 imirkin_: that means if you have like if (foo) a = 5; else a = 6;
10:51 imirkin_: that becomes something like
10:51 imirkin_: if (foo) a1 = 5; else a2 = 6; a = phi(a1, a2);
10:52 imirkin_: where phi is the magical thing that joins results from multiple branches
10:53 karolherbst: I really should add -ggdb
10:54 karolherbst: ohh okay
10:55 imirkin_: but each thing is assigned exactly once
10:55 imirkin_: which makes optimizations a LOT easier to think about
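The if/else example above can be modelled in a few lines of C++. This is only an illustrative sketch: `Pred`, `phi` and `ssaExample` are made-up names, and a real compiler resolves phi nodes during register allocation instead of calling a function at run time.

```cpp
#include <cassert>

// Which predecessor branch control flow arrived from.
enum class Pred { Then, Else };

// Model of a phi node: pick the value that belongs to the branch taken.
int phi(Pred from, int fromThen, int fromElse) {
    return from == Pred::Then ? fromThen : fromElse;
}

// SSA form of: if (foo) a = 5; else a = 6;
int ssaExample(bool foo) {
    const int a1 = 5;  // the definition made in the "then" branch
    const int a2 = 6;  // the definition made in the "else" branch
    const Pred from = foo ? Pred::Then : Pred::Else;
    const int a = phi(from, a1, a2);  // every name is assigned exactly once
    return a;
}
```

Declaring every value `const` mirrors the single-assignment property: a name never changes after its definition, so a use can always be traced back to exactly one defining instruction.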
11:26 karolherbst: wow
11:26 karolherbst: compiler error
11:26 imirkin_: ICE?
11:26 karolherbst: this assert came from a skipped segfault
11:27 karolherbst: added a "insn->getSrc(i) == NULL" check
11:27 karolherbst: and now it works
11:27 karolherbst: kindof
11:27 karolherbst: ohh wait
11:27 imirkin_: errr.... that's odd
11:27 imirkin_: i guess most of the loops iterating over instructions are
11:27 karolherbst: nope, was something else
11:27 imirkin_: for (i = 0; insn->hasSrc(i); i++)
11:27 imirkin_: or something
11:27 karolherbst: but I still didn't get the segfault before
11:28 karolherbst: imirkin_: https://gist.github.com/karolherbst/f77951b539cd304ba096
11:28 imirkin_: my personal guess is that it has to do with the serial thing
11:28 imirkin_: i suspect it might have to be sequential
11:28 imirkin_: dunno
11:28 imirkin_: check how it's used
11:28 karolherbst: okay
11:28 imirkin_: i've never seen or thought about it before
11:28 karolherbst: but its overwritten
11:29 imirkin_: (ok, i've probably seen it before, i just have no recollection what it does)
11:29 karolherbst: void insert(void *item, int& id)
11:29 karolherbst: id = ids.getSize() ? ids.pop().u.i : size++;
11:29 imirkin_: oh, ok
11:29 imirkin_: that makes sense
11:29 imirkin_:hates passing by reference
11:29 karolherbst: serial is just the index in the current list?
11:29 imirkin_: yea
11:29 imirkin_: i thought the value was being passed in
11:29 karolherbst: yeah, you should return the new index
11:29 karolherbst: ...
11:29 imirkin_: either that, or pass &inst->serial
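The two call styles being argued about can be put side by side. The class below is a simplified stand-in, not the actual nv50_ir list: `IdList` and its members are hypothetical, and only the "reuse a freed id, else grow" rule is taken from the quoted line.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the quoted list: ids freed by remove() are
// recycled before the list grows.
class IdList {
public:
    // Out-parameter style, like the quoted insert(void *item, int& id):
    // the caller's variable is overwritten, which is not visible at the call site.
    void insert(void *item, int &id) { id = insert(item); }

    // Return-value style: the new index is explicit at the call site.
    int insert(void *item) {
        int id;
        if (!freeIds.empty()) {
            id = freeIds.back();          // reuse a previously freed slot
            freeIds.pop_back();
            items[id] = item;
        } else {
            id = static_cast<int>(items.size());
            items.push_back(item);        // no free slot, append
        }
        return id;
    }

    void remove(int id) {
        assert(id >= 0 && id < static_cast<int>(items.size()));
        items[id] = nullptr;
        freeIds.push_back(id);
    }

private:
    std::vector<void *> items;
    std::vector<int> freeIds;
};
```

With the reference overload, `list.insert(insn, insn->serial)` silently replaces whatever was stored in `serial`, which is why assigning serials by hand before the call has no effect.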
11:29 karolherbst: no such stupid reference thingy
11:30 karolherbst: mhhh
11:30 karolherbst: pointers are unsafe ;)
11:30 imirkin_: uhhhh what?
11:30 imirkin_: since when?
11:30 karolherbst: references are fine to use
11:30 karolherbst: but they get overused sometimes
11:30 imirkin_: i tend to err on the other side
11:30 karolherbst: if you don't want to have the NULL possibility
11:30 karolherbst: use references
11:30 imirkin_: you can get null with references too
11:30 karolherbst: yeah
11:30 imirkin_: ;)
11:31 karolherbst: but then its API violation
11:31 karolherbst: and you did hacky tricks before
11:31 imirkin_: anyways, pointers are plenty safe
11:31 imirkin_: esp if you're not an idiot
11:31 karolherbst: yeah its okay usually
11:31 karolherbst: &stuff is so C by the way :p
11:32 karolherbst: no serious C++ dev would ever want to do that
11:32 imirkin_: i guess i'm not a serious C++ dev
11:32 karolherbst: I mean a dev serious about C++
11:32 imirkin_: ah hehe
11:32 imirkin_: that i'm definitely not.
11:32 karolherbst: yeah
11:32 imirkin_: see curro for that
11:32 karolherbst: usually in C++ you use the stack
11:32 karolherbst: and pass by reference
11:32 imirkin_: look at how clover is written
11:32 karolherbst: global stuff you use poiters
11:33 karolherbst: *pointers
11:33 mupuf: imirkin_: fun, I have just said the same thing about passing by reference!
11:33 mupuf: Doing some python scripting right now
11:33 imirkin_: immensely confusing but very succinct
11:33 karolherbst: but even in c++11 you don't use pointers anymore
11:33 imirkin_: mupuf: well, languages like java/python/etc *only* allow you to pass by reference
11:33 karolherbst: std::shared_ptr<Type> ptr;
11:33 mupuf: imirkin_: yes...
11:33 karolherbst: well ...
11:33 imirkin_: [java has the autoboxed types, but that's a minor wart]
11:33 karolherbst: java, this is something I know a lot about
11:34 imirkin_: sadly me too. i wish i could lose that knowledge
11:34 karolherbst: fun fact
11:34 karolherbst: Integer.valueOf(5) == Integer.valueOf(5)
11:34 imirkin_: but not for 129
11:34 karolherbst: but Integer.valueOf(2048) != Integer.valueOf(2048)
11:34 karolherbst: yes
11:34 imirkin_: which leads to fun bugs when you hire employee #129
11:34 karolherbst: :)
11:34 imirkin_: not that i've debugged that very bug before
11:35 imirkin_: or anything like that
11:35 karolherbst: if you encounter something like that in the project, fire the project leader :D
11:35 imirkin_: strings can also be interned
11:35 imirkin_: well... bugs get written. and things work
11:35 karolherbst: Integer and stuff were introduced for containers
11:35 karolherbst: nothing more
11:35 imirkin_: and then all of a sudden things don't work for some random poor guy
11:35 karolherbst: yes
11:35 imirkin_: coz his id is > than the "small integer" limit
11:36 karolherbst: thinking is a good way to introduce bugs
11:36 imirkin_: not thinking is an even better one
11:36 karolherbst: you can't win
11:36 karolherbst: also knowing
11:36 karolherbst: really tricky one
11:38 karolherbst: imirkin_: do the instructions have some kind of id?
11:38 imirkin_: but the truly enjoyable bugs are when you end up dealing with 2 instances of the same class object but from different classloaders
11:38 karolherbst: yeah
11:38 karolherbst: but this is normal
11:38 imirkin_: and so "x instanceof class" doesn't work, even though it clearly is an instance of that class
11:38 karolherbst: can also happen with C
11:38 imirkin_: leads to a bit of head-scratching
11:39 karolherbst: or C++
11:39 imirkin_: -fno-rtti solves all those problems though
11:39 imirkin_: aka "don't do that"
11:39 karolherbst: did you know that typeid is compiler specific and can contain garbage if the compiler wants
11:39 imirkin_: whereas in java there's no way around it
11:39 karolherbst: don't use custom classloader stuff?
11:40 karolherbst: :p
11:40 karolherbst: but I guess for plugins you want that
11:40 imirkin_: except then you have some dumb tomcat container thing
11:40 imirkin_: but some things are loaded by tomcat directly (e.g. for servlet filters)
11:40 karolherbst: usually you want something like pluginContainer->lookupClassObject(String fullClassName)
11:40 karolherbst: and compare with that object
11:40 karolherbst: ohh
11:40 imirkin_: and data is oddly passed around
11:40 karolherbst: okay
11:41 karolherbst: never worked on tomcat
11:41 imirkin_: well, any j2ee servlet container works like that
11:41 karolherbst: yeah I know
11:41 karolherbst: all jetty based or something?
11:41 imirkin_: leads to some very perplexing issues sometimes
11:41 imirkin_: jetty is another one
11:41 imirkin_: (also j2ee)
11:41 karolherbst: I think tomcat already uses most of jetty code
11:41 imirkin_: tomcat was around much earlier :p
11:42 imirkin_: anyways... this isn't #java
11:42 karolherbst: right
11:42 imirkin_: or even #java-is-the-worst
11:42 imirkin_: heh
11:42 karolherbst: :D
11:42 karolherbst: ohhh
11:42 karolherbst: this sounds like fun
11:42 karolherbst: well "int id;"
11:42 karolherbst: part of Instruction
11:42 karolherbst: maybe I have to give them new ids?
11:43 imirkin_: yeah... there's some sort of incremental id
11:43 karolherbst: I am scared now
11:43 imirkin_: unlikely
11:43 karolherbst: int serial; // CFG order
11:43 imirkin_: perhaps something uses id instead of serial though
11:43 imirkin_: dunno
11:43 imirkin_: i haven't really looked at that stuff. it generally works ok :)
11:43 karolherbst: I will use the sorted out instructions first
11:43 karolherbst: maybe this changes something
11:44 karolherbst: mhh same error
11:44 karolherbst: but with different values
11:45 karolherbst: ohh no
11:45 karolherbst: something else now
11:45 karolherbst: imirkin_: https://gist.github.com/karolherbst/f77951b539cd304ba096#file-file2
11:46 karolherbst: I think I moved something behind exit :D
11:46 karolherbst: and now the compiler is unhappy
11:46 imirkin_: oh yeah
11:46 karolherbst: so I have to find the exit instruction
11:46 karolherbst: and insert it last
11:47 imirkin_: karolherbst: yeah, don't touch bb->getExit()
11:47 imirkin_: karolherbst: and also insert all the phi nodes first
11:47 karolherbst: I didn't touch it
11:47 karolherbst: ohh okay
11:47 karolherbst: how can I check if something is a phi node?
11:47 karolherbst: at least I know, that I change something with glxgears
11:47 imirkin_: i forget... look around
11:47 imirkin_: isNop() maybe
11:47 imirkin_: or perhaps there's a isPhi
11:47 imirkin_: you can always check if op == OP_PHI
11:48 imirkin_: anyways, the idea is that first you have all the phi nodes, then all the regular instructions, then all the exit branches (there can be several)
11:48 karolherbst: maybe with dType?
11:48 imirkin_: no.
11:48 karolherbst: okay operation op;
11:48 karolherbst: yepp, there is OP_PHI
11:51 karolherbst: okay, exit branches
11:52 karolherbst: how do I check against that?
11:53 karolherbst: ohh OP_EXIT
11:54 imirkin_: no
11:54 imirkin_: everything after bb->getExit()
11:54 karolherbst: mhh okay
11:55 karolherbst: so I check insn == bb->getExit() and if I reached that, everything that comes after has to be collected
11:56 karolherbst: there is also Instruction *getPhi()
11:56 karolherbst: and Instruction *getEntry()
11:56 karolherbst: this makes stuff easier
11:58 imirkin_: yes
11:58 imirkin_: but bb->getPhi() points to the start
11:58 imirkin_: as does bb->getExit()
11:58 imirkin_: but you want the *last* phi
11:58 imirkin_: oh, there's also a bb->getEntry i think?
11:58 karolherbst: yes
11:58 imirkin_: i forget what all these things are
11:58 imirkin_: but one of them may be the one you want
11:58 karolherbst: and getFirst() const { return phi ? phi : entry; }
11:59 imirkin_: right. so you want to start your reordering with ->getEntry
11:59 karolherbst: entry is the first non phi
11:59 imirkin_: ya
11:59 imirkin_: but you still have to insert all the phi's ;)
11:59 imirkin_: but do your logic from ->getEntry until ->getExit
12:00 karolherbst: yes
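The block layout described above (phis first, then regular instructions, then everything from the exit onwards) can be sketched as follows. The types are hypothetical stand-ins for the nv50_ir classes, and reversing the body is only a placeholder for a real reordering pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Op { Phi, Mov, Add, Bra };

struct Insn {
    Op op;
    std::string name;
};

// Reorder only the "regular" part of a block. exitIndex plays the role of
// bb->getExit(): everything from there to the end is left untouched.
std::vector<Insn> reorderBody(const std::vector<Insn> &bb, std::size_t exitIndex) {
    assert(exitIndex <= bb.size());

    // The first non-phi instruction plays the role of bb->getEntry().
    std::size_t entry = 0;
    while (entry < exitIndex && bb[entry].op == Op::Phi)
        ++entry;

    std::vector<Insn> out(bb);  // phis and exit branches keep their places
    std::reverse(out.begin() + static_cast<std::ptrdiff_t>(entry),
                 out.begin() + static_cast<std::ptrdiff_t>(exitIndex));
    return out;
}
```

For a block `phi, phi, mov, add, bra, bra` with `exitIndex == 4`, only `mov` and `add` swap places.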
12:11 karolherbst: wow it works
12:11 karolherbst: order should change a lot too
12:11 karolherbst: okay
12:12 karolherbst: okay heaven doesn't run
12:14 karolherbst: "ERROR: no viable spill candidates left"
12:27 karolherbst: oh no
12:27 karolherbst: forgot a continue
12:31 imirkin_: karolherbst: well, also make sure you handle the indirect stuff
12:31 imirkin_: coz heaven might actually use that
12:31 karolherbst: yeah
12:31 karolherbst: seems that way
12:32 karolherbst: okay
12:32 karolherbst: so what should I do with ->getIndirect(i, 0) and ->getIndirect(i, 1) ?
12:32 karolherbst: != NULL => with dep
12:32 imirkin_: yes.
12:32 karolherbst: okay
12:32 imirkin_: it's effectively yet-another source
12:33 karolherbst: gl extension?
12:33 imirkin_: it's to have sources like a[0x80+$r1] or whatever
12:33 imirkin_: could also be c[$r1][$r2]
12:34 imirkin_: never more than 2d though ;)
12:34 karolherbst: what is the "i" for?
12:34 karolherbst: src?
12:34 imirkin_: i?
12:34 imirkin_: oh yeah
12:34 imirkin_: each src might have indirect references
12:35 imirkin_: [not really, there are limits on this specified in the target, but this code should just iterate over all of it]
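A dependency check that treats the indirect address values as extra sources might look like the sketch below. `Value`, `Source`, `Insn` and `hasLocalDep` are simplified, hypothetical types; in the real code the corresponding lookups would go through getSrc(i), getIndirect(i, 0) and getIndirect(i, 1).

```cpp
#include <cassert>
#include <vector>

struct Insn;

// A value knows which instruction defined it. Immediates have no defining
// instruction, mirroring getInsn() returning null.
struct Value {
    const Insn *def = nullptr;
};

// One source operand plus up to two indirect address values,
// e.g. c[$r1][$r2] carries a value and two indirects.
struct Source {
    const Value *value = nullptr;
    const Value *indirect[2] = {nullptr, nullptr};
};

struct Insn {
    int bb = 0;                 // id of the owning basic block
    std::vector<Source> srcs;
};

static bool definedIn(const Value *v, int bb) {
    return v && v->def && v->def->bb == bb;
}

// True if any source, direct or indirect, is produced inside the same block.
bool hasLocalDep(const Insn &insn) {
    for (const Source &s : insn.srcs) {
        if (definedIn(s.value, insn.bb))
            return true;
        for (const Value *ind : s.indirect)
            if (definedIn(ind, insn.bb))
                return true;
    }
    return false;
}
```

Skipping the indirect check would classify an instruction like `ld a[0x80+$r1]` as dependency-free even when `$r1` is computed earlier in the same block.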
12:35 imirkin_: of course note that this scheduling approach will lead to more register usage
12:35 imirkin_: and thus less parallelism
12:36 karolherbst: still get this spilling error
12:36 karolherbst: and segfault
12:36 imirkin_: find the tgsi shader it's dying on
12:36 imirkin_: and then debug it with nouveau_compiler
12:36 imirkin_: and NV50_PROG_DEBUG=255
12:36 imirkin_: should shed some light on the situation
12:37 Eliasvan: Hi all, I've got a quick question about a lockup of my desktop on my GTX 760.
12:37 Eliasvan: Yesterday I installed a fresh new Fedora 22 on my machine (kernel=4.1.4-200.fc22.x86_64, mesa=10.6.3, xorg-x11-drv-nouveau=1.0.11), and I'm getting random lockups (in which case I can no longer switch to a TTY terminal, but I *can* log into the machine with ssh).
12:37 Eliasvan: When logging in with ssh, I see one single line that occurs at the moment of the lockup:
12:37 karolherbst: imirkin_: nouveau_compiler how?
12:37 Eliasvan: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [UNSUPPORTED_KIND] from PBDMA0/HOST on channel 0x007edd2000 [unknown]
12:37 Eliasvan: Is there anyone here who can help me, or at least give some pointers to find what causes the issue?
12:37 imirkin_: karolherbst: src/gallium/drivers/nouveau/nouveau_compiler
12:37 imirkin_: Eliasvan: which gpu?
12:38 Eliasvan: GTX760
12:38 imirkin_: oh wait, you mentioned it
12:38 imirkin_: which is a GK104 iirc?
12:38 Eliasvan: I have dmesg here, so if you need further details?
12:38 imirkin_: full dmesg wouldn't hurt
12:39 Eliasvan: oh, ok, one second
12:39 imirkin_: can't guarantee it'll help though :)
12:39 karolherbst: imirkin_: https://gist.github.com/karolherbst/70159c4b310e98152b9e
12:40 imirkin_: wow
12:40 imirkin_: %r16000 -- it's going a little crazy
12:40 imirkin_: good thing that's an int
12:40 imirkin_: karolherbst: that's not the full output btw
12:40 karolherbst: yeah
12:40 karolherbst: do you need all?
12:40 Eliasvan: Here you go: http://pastebin.com/W4yB7qWC
12:41 karolherbst: have to increase konsole buffer
12:41 imirkin_: Eliasvan: are you changing clocks?
12:42 imirkin_: [i note you have pstate=1]
12:42 karolherbst: also GDDR5
12:42 Eliasvan: imirkin_: I think did in this session, but I'm sure I reverted it to the boot-clock-speed before the lockup happened
12:42 karolherbst: wanna test something? :D
12:43 imirkin_: Eliasvan: hmmm ok
12:43 imirkin_: karolherbst: he seems to be having no trouble hanging his machine without your help
12:43 karolherbst: :D
12:43 karolherbst: I meant for 0f pstate gddr5
12:43 Eliasvan: imirkin_: and if I remember correctly, the lockup also occurred when I had no pstate=1 in the boot-command
12:43 imirkin_: Eliasvan: well, pstate=1 on its own does very little
12:43 imirkin_: the act of changing clocks can mess things up though
12:45 Eliasvan: imirkin_: yes, but I remember I got a lockup yesterday on the clean system install (no pstate=1 or reclocking done)
12:45 imirkin_: Eliasvan: there were some fixes in 4.2 for odd things
12:45 imirkin_: that said i have no idea whether they'd help you
12:45 Eliasvan: because before this, I was using 3.11, and it was very stable
12:46 imirkin_: little things like http://cgit.freedesktop.org/~darktama/nouveau/commit/?h=linux-4.2&id=e024058fb8b3fb75f350dcc545ad55b35349fa57
12:46 imirkin_: yeah, a *ton* has changed since 3.11
12:46 Eliasvan: imirkin_: yes, and the performance increase is one of them, I can say at least ;)
12:47 imirkin_: you mean pstate?
12:47 imirkin_: or even without changing clocks?
12:47 Eliasvan: imirkin_: no, even without reclocking, performance almost doubled on my card
12:47 imirkin_: that's... surprising
12:47 karolherbst: yeah
12:47 imirkin_: perhaps something in mesa though
12:48 imirkin_: or was this with the same mesa but older kernel?
12:48 Eliasvan: imirkin_: yeah, probably
12:48 Eliasvan: thanks for the suggestions, I'll see if I can get some shiny 4.2 then ;)
12:48 karolherbst: imirkin_: how much do you need from my output?
12:49 karolherbst: Eliasvan: if you want you can test some patches for gddr5 0e 0f stability, but first you want to figure out your issue I guess
12:49 Eliasvan: imirkin_: no, everything was older (Fedora 19 instead of 22)
12:49 imirkin_: with git mesa you'll get GL 4.1 on that card, in case that matters
12:50 Eliasvan: karolherbst: so you mean, that I can do "echo 0e/0f > pstate" without getting a colorful checkerboard? That would be awesome!
12:50 karolherbst: mhh
12:51 karolherbst: maybe the colorful stuff will still be there, don't know, but the gpu should not hang anymore then
12:51 imirkin_: Eliasvan: sounds like you want new and exciting ways to hang the board :)
12:51 karolherbst: but maybe, who knows
12:51 Eliasvan: imirkin_: GL4.1 would be nice as well
12:51 karolherbst: ohh
12:51 karolherbst: thats already there
12:51 imirkin_: Eliasvan: use mesa from git
12:52 Eliasvan: imirkin_: for the record, the nvidia driver runs less stable than nouveau...
12:52 karolherbst: bioshock infinite running great here :)
12:52 imirkin_: Eliasvan: that's a bad sign
12:52 Eliasvan: imirkin_: you mean my card could be broken?
12:52 imirkin_: Eliasvan: nvidia driver tends to know what it's doing, so if it's dying... could be various unhappiness
12:52 karolherbst: ohh I got my card to die with nvidia too
12:52 imirkin_: maybe, i guess
12:52 karolherbst: but that was my fault maybe
12:52 imirkin_: well, you can always make it happen when you try hard
12:53 karolherbst: still nvidia shouldn't accept every memory clock
12:53 imirkin_: but during normal operation, it should be mostly fine
12:53 Eliasvan: imirkin_: well, at least my Fedora 19 had a great stable time without any issues, although it ran at the lowest clocks, maybe that explains the increased stability
12:53 karolherbst: imirkin_: you know what, you get a file link now :p
12:54 imirkin_: Eliasvan: could be yeah
12:55 Eliasvan: karolherbst: if I use mesa from git, will I be able to test those gddr5 0e 0f patches?
12:55 karolherbst: they are kernel patches
12:55 Eliasvan: ah, ok, so if I pull linux-next?
12:55 karolherbst: actually one patch
12:55 karolherbst: mhh
12:56 karolherbst: 4.1 should be fine
12:56 karolherbst: at least thats what I use
12:56 imirkin_: Eliasvan: mesa is unrelated to clock speeds
12:56 karolherbst: imirkin_: http://www.filebin.ca/2CbBR52lNdS9/out.xz have fun
12:58 karolherbst: I think this is just an array bound issue
12:59 karolherbst: maybe
13:02 karolherbst: I try to test something less intense
13:05 imirkin_: karolherbst: well here's your shader: http://hastebin.com/raw/azejayojix
13:05 imirkin_: nice and succinct ;)
13:05 karolherbst: imirkin_: it works on bioshock infinite :D
13:05 karolherbst: speaking of less intense
13:05 imirkin_: look at the order your thing is producing though
13:05 imirkin_: i bet it ends up being like 10000 live values at once
13:06 karolherbst: imirkin_: I get like the same perf with bioshock
13:07 imirkin_: you should add a thing that prints the shader before and after your reorder
13:07 imirkin_: for a specific bit of NV50_PROG_DEBUG
13:07 imirkin_: and only run with that one
13:07 imirkin_: (and maybe the 1 bit)
13:08 karolherbst: 1 prints the code
13:08 imirkin_: heh
13:08 imirkin_: there's like 50 points in the compiler
13:08 imirkin_: where the code might get printed
13:08 karolherbst: :D
13:09 karolherbst: is operation enough?
13:09 karolherbst: or is there a nice function to do so
13:09 karolherbst: ->print() ?
13:09 imirkin_: func->print :p
13:09 imirkin_: or prog->print
13:10 RSpliet: imirkin_: are there any satanic texts hidden in that shader?
13:10 Eliasvan: karolherbst: can I apply your gddr5 patch on a 4.1 kernel, and if yes, where can I find that patch? (yeah, I like some adventure ;) )
13:10 imirkin_: RSpliet: hm?
13:10 RSpliet: I don't recall shaders being 3000+ lines of TGSI :-P
13:11 imirkin_: RSpliet: i've seen shaders with 1000 *temporaries* :)
13:11 karolherbst: Eliasvan: https://github.com/karolherbst/nouveau/commit/6933ebb2480bb62534648c180501c5bad6d2c514
13:11 Eliasvan: thanks!
13:11 karolherbst: it's for stability, it won't necessarily fix the messed-up output ;)
13:11 RSpliet: I'd be surprised if you saw a shader with a perm
13:12 imirkin_: RSpliet: yeah, few reach that level of awesome
13:12 Eliasvan: karolherbst: so, I'll recompile my kernel, and let you know whether the 0e/0f pstate works ;)
13:12 karolherbst: okay
13:14 karolherbst: well
13:14 karolherbst: RSpliet: ohh there are some
13:14 karolherbst: unigine heaven is really nice to hardware
13:17 karolherbst: ...
13:17 karolherbst: well
13:18 karolherbst: why do all func() invocations lead to a segfault
13:18 karolherbst: ...
13:18 imirkin_: karolherbst: take the shader i gave you a link to
13:18 imirkin_: and just feed it to nouveau_compiler
13:18 karolherbst: k
13:18 imirkin_: should speed up your change/test cycle considerably
13:19 karolherbst: nouveau_compiler is in envytools?
13:19 imirkin_: no
13:19 imirkin_: in mesa
13:19 karolherbst: ohh
13:19 imirkin_: src/gallium/drivers/nouveau/nouveau_compiler
13:19 karolherbst: ok
13:19 imirkin_: -a e4 -
13:19 imirkin_: (for a nve4 target, reading from stdin)
13:19 imirkin_: actually you probably want to stick that thing into a file :)
13:19 imirkin_: but you get the idea
13:20 karolherbst: 1382: MUL TEFailed to parse TGSI shader
13:24 karolherbst: don't know but the shader seems kind of strange
13:24 imirkin_: oh right
13:24 imirkin_: edit nouveau_compiler
13:24 imirkin_: er, nouveau_compiler.c
13:24 imirkin_: and increase the number of tokens it'll parse
13:24 imirkin_: just add a 0
13:26 karolherbst: mhh
13:26 karolherbst: ahh
13:27 karolherbst: "TGSI asm error: Unknown register file [1340 : 25]"
13:28 imirkin_: good ol' line 1340
13:29 imirkin_: hmmmm odd
13:29 imirkin_: didn't increase things enough?
13:29 imirkin_: i think it'll read a fixed number of bytes from the file too
13:29 karolherbst: ohh
13:29 imirkin_: feel free to fix it up to be less idiotic btw
13:30 karolherbst: added to the list I may do in the future
13:30 karolherbst: yes
13:30 karolherbst: char text[65536] = {0};
13:30 imirkin_: you like that? :)
13:30 karolherbst: a lot
13:30 imirkin_:is really lazy
13:30 karolherbst: so much fun
13:31 imirkin_: i just wanted the damn thing to work
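One way to avoid a fixed-size buffer is to read the whole input into a growable string. The sketch below is C++ for illustration only (nouveau_compiler.c itself is C), and `readAll` is a hypothetical helper name.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Read everything from a stream, however long it is, instead of copying
// into something like char text[65536].
std::string readAll(std::istream &in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Passing `std::cin` covers the "reading from stdin" case, and an `std::ifstream` covers reading the shader from a file.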
13:31 karolherbst: segfault now
13:31 imirkin_: same one as in unigine?
13:31 karolherbst: I guess so
13:31 karolherbst: at least I can use gdb now
13:31 imirkin_: if only you could use gdb to find out
13:31 imirkin_: anyways, should speed up your edit/build/test cycle a lot
13:32 imirkin_: you don't have to make install for nouveau_compiler btw
13:32 karolherbst: https://gist.github.com/karolherbst/e677a02da9ee3756f793
13:32 imirkin_: errrr... weird
13:32 imirkin_: that usually means that you have a print but didn't do NV50_PROG_DEBUG
13:32 karolherbst: ahhh
13:32 karolherbst: that makes sense
13:33 imirkin_: there's some silly init thing that needs to happen
13:34 karolherbst: okay, got compiled now
13:34 imirkin_: it worked??
13:34 imirkin_: but failed in unigine?
13:34 karolherbst: no, it compiled with same ordering
13:35 imirkin_: not sure what you're saying
13:35 karolherbst: without my stuff it compiles
13:35 imirkin_: ah, well that's expected ;)
13:35 karolherbst: segfault with my stuff
13:35 imirkin_: coincidence? :)
13:35 karolherbst: yes
13:35 karolherbst: https://gist.github.com/karolherbst/e677a02da9ee3756f793
13:36 imirkin_: must be!
13:36 karolherbst: that would be unusual that there is a link
13:36 karolherbst: how often does that happen, to be honest
13:36 imirkin_: well
13:36 imirkin_: you probably do something unexpected
13:36 imirkin_: that the code doesn't cope with
13:36 imirkin_: that's why i was saying to print out the code before and after your logic
13:36 imirkin_: so that we can see wtf it's doing
13:37 karolherbst: okay
13:37 imirkin_: whoa whoa whoa
13:37 imirkin_: wait
13:38 imirkin_: this orderInstructions thing runs a lot later than i thought
13:38 karolherbst: no problem though
13:38 imirkin_: oh, i guess it's ok actually
13:38 martm: well this conversation is too long to jump into , :) hehee
13:39 imirkin_: the call inside NVC0LegalizePostRA::insertTextureBarriers may be problematic though
13:39 imirkin_: but that's post-ra, you're not getting that far ;)
13:40 karolherbst: imirkin_: https://gist.github.com/karolherbst/e677a02da9ee3756f793#file-output-duh
13:40 karolherbst: ey ...
13:40 karolherbst: truncated
13:40 karolherbst: will throw stuff out
13:40 imirkin_: errr what
13:41 imirkin_: you're moving OP_JOIN up
13:41 imirkin_: you can't do that
13:41 imirkin_: ok, so that's the first issue
13:41 imirkin_: you're ignoring "fixed" instructions
13:41 imirkin_: there's an insn->fixed
13:41 imirkin_: which roughly speaking means "no touch!"
13:41 karolherbst: ohhh
13:41 karolherbst: mhhh
13:42 karolherbst: so how should I handle these then?
13:43 imirkin_: also you didn't do the prints like i said
13:43 imirkin_: you need to add a debug flag, and print *right before* and *right after* your manipulations
13:43 imirkin_: and enable *only* that debug flag
13:43 karolherbst: func->print(); func->orderInstructions(this->insns); func->print();
13:43 karolherbst: like so?
13:43 imirkin_: yes.
13:43 karolherbst: I did that
13:44 imirkin_: hmmmm
13:44 imirkin_: can you add some ****************** prints?
13:44 imirkin_: i don't see where that would have taken effect
13:44 karolherbst: there are 4 prints in that file I think
13:45 imirkin_: if (prog->dbgFlags & NV50_IR_DEBUG_REG_ALLOC) {
13:45 karolherbst: "nodep inserted 68 dep inserted 38 nodep inserted 6" and so on is stuff in orderInstructions
13:45 imirkin_: see that?
13:45 karolherbst: yes
13:45 imirkin_: add your own debug flag
13:45 imirkin_: and condition based on that flag being set
13:45 imirkin_: and then run it with only that flag set
13:45 karolherbst: okay
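The flag-gated before/after print could be sketched like this. The flag values and the `Function` type are made up for illustration; the real NV50_IR_DEBUG_* bits and print routines live in the Mesa sources and are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical flag values, chosen only for this sketch.
constexpr std::uint32_t DEBUG_BASIC   = 1u << 0;
constexpr std::uint32_t DEBUG_REORDER = 1u << 6;  // assumed to be a free bit

struct Function {
    std::string body;
    void print(std::ostream &os) const { os << body << '\n'; }
    void reorder() { body = "reordered: " + body; }  // placeholder pass
};

// Print right before and right after the pass, but only when our bit is set.
void runReorder(Function &fn, std::uint32_t dbgFlags, std::ostream &os) {
    const bool dbg = (dbgFlags & DEBUG_REORDER) != 0;
    if (dbg) {
        os << "*** before reorder ***\n";
        fn.print(os);
    }
    fn.reorder();
    if (dbg) {
        os << "*** after reorder ***\n";
        fn.print(os);
    }
}
```

Running with only the new bit set keeps the output down to the two dumps that matter, instead of every print point in the compiler.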
13:46 imirkin_: well, if your stuff is supposed to have run after the "nodep bla" stuff
13:46 imirkin_: then it's not working
13:46 imirkin_: e.g. look at BB:0
13:46 imirkin_: i would have expected e.g. mov u32 %r5896 0x00000000 to have moved up to the "top" section
13:46 imirkin_: while "rcp f32 %r5895 %r5894" would be in the bottom section
13:47 imirkin_: but basically what your logic is doing is super-extending the live ranges of all values
13:47 imirkin_: which is probably not great
13:47 imirkin_: as for fixed instructions
13:47 imirkin_: just do a "flush" when you hit a fixed instruction
13:47 imirkin_: i.e. send out all of the "dep" instructions
13:47 imirkin_: and clear the list
13:47 imirkin_: and then emit the fixed instruction
13:47 imirkin_: and continue
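The flush rule just described can be sketched as below. The `Insn` struct with its `hasDep` and `fixed` fields is a hypothetical simplification, and emitting "nodep" instructions immediately while deferring "dep" ones is an assumption about how the two buckets are used.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Insn {
    std::string text;
    bool hasDep;  // reads a value produced earlier in the same block
    bool fixed;   // must not move, e.g. a join
};

// Hoist dependency-free instructions, defer the rest, and never let
// anything cross a fixed instruction.
std::vector<Insn> reorderWithFlush(const std::vector<Insn> &in) {
    std::vector<Insn> out, pending;
    auto flush = [&] {
        out.insert(out.end(), pending.begin(), pending.end());
        pending.clear();
    };
    for (const Insn &i : in) {
        if (i.fixed) {
            flush();              // send out all deferred "dep" instructions
            out.push_back(i);     // then the fixed one, in its original slot
        } else if (i.hasDep) {
            pending.push_back(i);
        } else {
            out.push_back(i);
        }
    }
    flush();
    return out;
}
```

For the input `A(dep), B(nodep), J(fixed), C(dep), D(nodep)` this yields `B, A, J, D, C`: reordering happens on each side of `J`, but nothing moves across it.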
13:48 martm: whoa things are getting interesting, imirkin_: i want you to join some channel where we can talk about fpga's;) its after i am doing some couple of days reading about my issue (where i just probably handle it myself though)
13:49 karolherbst: what should go first btw: deps or nodeps ?
13:49 imirkin_: nodeps :)
13:49 imirkin_: otherwise it won't work
13:49 imirkin_: heh
13:49 karolherbst: yeah
13:49 karolherbst: I noticed that already
13:50 imirkin_: since the dep stuff depends on the nodep stuff
13:50 imirkin_: but imagine you have 1000 instructions
13:50 imirkin_: of which 500 are "nodep"
13:50 imirkin_: that means that you're creating 500 live values
13:50 imirkin_: before you start using them
13:50 imirkin_: which means that you need 500 registers
13:50 karolherbst: check line 11183 and so on
13:50 imirkin_: or you start spilling like it's going out of style
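The register-pressure argument can be checked on a toy example by counting how many values are live at once for a given instruction order. This is a rough sketch with hypothetical types: it ignores values that are live into or out of the block and assumes every use comes after its definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

struct Insn {
    int def;                 // id of the value this instruction produces
    std::vector<int> uses;   // ids of the values it reads
};

std::size_t peakLiveValues(const std::vector<Insn> &order) {
    // Position of the last reader of each value.
    std::map<int, std::size_t> lastUse;
    for (std::size_t i = 0; i < order.size(); ++i)
        for (int v : order[i].uses)
            lastUse[v] = i;

    std::set<int> live;
    std::size_t peak = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        // Sources read for the last time here stop being live.
        for (int v : order[i].uses)
            if (lastUse[v] == i)
                live.erase(v);
        // The result is live only if something reads it later.
        if (lastUse.count(order[i].def))
            live.insert(order[i].def);
        peak = std::max(peak, live.size());
    }
    return peak;
}
```

With four independent definitions feeding a small tree of adds, emitting all four definitions first gives a peak of 4 live values, while interleaving each pair with its add gives 3. The gap grows with the number of "nodep" instructions that get hoisted.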
13:50 karolherbst: these are the stuff I parse
13:51 karolherbst: wow, is that site slow
13:51 karolherbst: too much
13:51 karolherbst: will do the fixed stuff thingy
13:51 imirkin_: what you really want to do
13:51 imirkin_: is manage the number of live values
13:51 karolherbst: okay
13:51 imirkin_: obviously the underlying code has a minimum that you won't be able to get under
13:51 karolherbst: so basically a id->ins map
13:51 imirkin_: but you should also have a target of, say, 16 live values
13:51 karolherbst: and whenever I inserted id-1 instructions I insert the right fixed one
13:52 imirkin_: once you hit 16 live values, start scheduling instructions that use those values
13:52 imirkin_: which means that you need to build a "schedulable instruction" list
13:52 karolherbst: mhhh I thought I should do something easy first...
13:52 imirkin_: which to begin with will be all instructions that have no deps on values in that bb
13:52 trusktr: Hey guys, my desktop freezes, and I see "chromium.desktop[3393]: nouveau: kernel rejected pushbuf: Invalid argument". Is there a fix for this?
13:53 imirkin_: karolherbst: yeah.... but every algo will require that list
13:53 imirkin_: karolherbst: so you might as well do it
13:53 karolherbst: but what does "fixed" mean. fixed in position or something else?
13:53 imirkin_: trusktr: a few people have reported issues like that, cause unknown sadly
13:53 imirkin_: trusktr: if you're using libdrm-2.4.60, don't
13:53 imirkin_: but allegedly people have issues even with other libdrm versions
14:05 karolherbst: okay so basically I just create another tree?
14:05 imirkin_: karolherbst: you need to build a list
14:05 imirkin_: of schedulable instructions
14:05 imirkin_: a priority queue maybe, dunno
14:06 karolherbst: but what does schedulable mean in this context then
14:06 imirkin_: pqueue doesn't quite work since order has external dependencies
14:06 karolherbst: if I hit a fixed instruction
14:06 imirkin_: instruction that you can emit
14:06 karolherbst: where does it have to be?
14:06 imirkin_: and have it do what you want
14:06 imirkin_: like if you have
14:06 imirkin_: mov a, b
14:06 imirkin_: mov c, d
14:07 imirkin_: add e, a, c
14:07 imirkin_: you can't emit the add before you emit the mov's
14:07 imirkin_: so at the beginning, only the 2 mov's would be schedulable
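The "schedulable instruction" list for the mov/mov/add example can be computed as in the sketch below. Names and types are hypothetical; an instruction counts as ready when every value it reads is either produced outside the block or by an instruction that has already been emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Insn {
    std::string def;                // value produced
    std::vector<std::string> uses;  // values read
};

// Indices of the not-yet-emitted instructions that could be emitted next.
std::vector<std::size_t> schedulable(const std::vector<Insn> &bb,
                                     const std::set<std::size_t> &emitted) {
    // Which instruction in this block produces each value.
    std::map<std::string, std::size_t> producer;
    for (std::size_t i = 0; i < bb.size(); ++i)
        producer[bb[i].def] = i;

    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < bb.size(); ++i) {
        if (emitted.count(i))
            continue;
        bool ok = true;
        for (const std::string &u : bb[i].uses) {
            auto it = producer.find(u);
            // Blocked if the producer is in this block and not emitted yet.
            if (it != producer.end() && !emitted.count(it->second)) {
                ok = false;
                break;
            }
        }
        if (ok)
            ready.push_back(i);
    }
    return ready;
}
```

A scheduler would repeatedly pick one entry from this list (for example, one that consumes live values once the target count is reached), emit it, and recompute. Picking policy is the part left open here.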
14:07 karolherbst: yeah obviously
14:07 trusktr: Thanks imirkin_ I'll try not using that version of libdrm and see what happens.
14:07 imirkin_: trusktr: are you actually on libdrm-2.4.60?
14:08 imirkin_: trusktr: what distro ships it by default?
14:08 trusktr: I'm not sure at the moment as I'm booted into OS X. I wrote a note to check when I get back to Linux.
14:08 trusktr: I'm using Arch Linux.
14:08 imirkin_: ah ok
14:08 Eliasvan: karolherbst: what cards do you have to test on?
14:08 imirkin_: arch is probably much later by now
14:09 karolherbst: mine gk106 (770M)
14:09 trusktr: For sure. The problem in my case only ever seems to be triggered by Chromium (chromium.desktop in the logs).
14:11 Eliasvan: karolherbst: well, congratulations, your patch works perfectly!: http://filebin.ca/2CbX37Vwb5Pr
14:12 karolherbst: Eliasvan: nice
14:12 Eliasvan: and FlightGear is still running, no lockup yet
14:12 karolherbst: I bet you get a lot more performance?
14:13 Eliasvan: (I left my plane flying, seems it hasn't crashed yet ;) )
14:13 karolherbst: more than doubled is possible
14:13 karolherbst: try to run glxspheres with vblank_mode=1
14:13 karolherbst: vblank_mode=1 glxspheres
14:13 Eliasvan: in FlightGear from 23 to 30 fps
14:13 karolherbst: ohh okay
14:13 karolherbst: still a little bit low :/
14:13 Eliasvan: but Flightgear seems heavily CPU-bound
14:13 karolherbst: or gpu core bound
14:13 Eliasvan: I'll test another one, any suggestions?
14:14 karolherbst: unigine heaven :D
14:14 Eliasvan: hmm, ok
14:14 imirkin_: karolherbst: get him on pcie v3 8GT/s?
14:14 karolherbst: :D
14:14 Eliasvan: I'll have to download that one
14:14 karolherbst: now stuff gets serious
14:14 karolherbst: Eliasvan: do you have envytools installed?
14:14 Eliasvan: karolherbst: no, I don't have envytools installed
14:15 karolherbst: mhh you could install it and increase the pcie link speed
14:15 karolherbst: may give you more performance too
14:15 karolherbst: but only some games will really speed up
14:15 tobijk: karolherbst: just do the link reclocking already ;-)
14:15 karolherbst: :D
14:15 karolherbst: gddr5 0f stable is the new shit tobijk ;)
14:16 imirkin_: karolherbst: awesome detective work btw :)
14:16 tobijk: hehe :)
14:17 karolherbst: I like mmiotraces a lot, they show so much stuff
14:19 karolherbst: imirkin_: it is still strange that it just works for others too, I mean how unreal is that...
14:19 karolherbst: I still wait for the big clash somewhere
14:20 imirkin_: i'm sure it won't work on some boards
14:20 imirkin_: but it sounds like it works on a good fraction of boards out there
14:20 karolherbst: yeah
14:21 karolherbst: I know that crystal clock != 27 MHz is a corner case I should take care of
14:21 karolherbst: but other than that? mhhh
14:21 imirkin_: is that a thing?
14:21 imirkin_: i guess you could have an external xtal
14:21 karolherbst: RSpliet said for pre tesla there were some
14:24 Eliasvan: ok, glxspheres without vblank: 470fps for "07", and approx 940fps for "0a", "0e" and "0f"
14:24 karolherbst: glxspheres needs pcie link reclock
14:24 Eliasvan: probably no big difference for the last two, since they mostly increase memclock
14:24 karolherbst: or you hit just the pci link bottleneck
14:25 Eliasvan: should I test Unigine Heaven?
14:25 karolherbst: you should install envytools
14:25 Eliasvan: to do what?
14:25 karolherbst: pci link from 2.5GT/s to 8.0GT/s
14:25 Eliasvan: aha, so nouveau doesn't do that?
14:25 karolherbst: nope
14:25 Eliasvan: and nvidia, does that one do it?
14:25 karolherbst: because it mostly doesn't give any perf boost
14:26 karolherbst: yeah, nvidia does
14:26 karolherbst: there are some games and benchmarks which will get much better results
14:26 Eliasvan: ah, maybe that's the reason why nvidia is less stable?
14:26 karolherbst: but in general, maybe 5% more performance is possible
14:26 karolherbst: doubt that
14:27 Eliasvan: anyway, I'll give Unigine a go (without link upgrading)
14:27 Eliasvan: oh no... guess what: lockup
14:27 karolherbst: yeah
14:27 karolherbst: this still happens
14:27 Eliasvan: it was at 07...
14:28 Eliasvan: it's another issue I guess
14:28 karolherbst: maybe voltage?
14:28 karolherbst: don't know
14:28 karolherbst: there is no change for 07 or 0a from my patch
14:28 karolherbst: it only affects mem clocks higher than 2.4GHz
14:29 karolherbst: Eliasvan: do you get any issues after pstate change?
14:29 Eliasvan: oh wait, I'm getting some dmesg output related to the reclocking
14:29 Eliasvan: nouveau E[ CLK][0000:01:00.0] failed to raise voltage: -22
14:30 karolherbst: okay
14:30 karolherbst: I know that one
14:30 karolherbst: but you seem to have a voltage gpio
14:30 Eliasvan: karolherbst: apart from the lockup that happened before your patch as well, no
14:30 karolherbst: at least this
14:30 karolherbst: now we come to the really off stuff
14:30 karolherbst: mhh
14:31 karolherbst: this doesn't make much sense
14:31 karolherbst: default voltage too low?
14:31 karolherbst: imirkin_: is this a thing?
14:31 Eliasvan: that dmesg message is unrelated to the lockup: the lockup happens 6 min later with the same message as before:
14:31 Eliasvan: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [UNSUPPORTED_KIND] from PBDMA0/HOST on channel 0x007edd2000 [unknown]
14:31 karolherbst: voltage is a tricky thing
14:32 karolherbst: mhh
14:32 karolherbst: Eliasvan: which kernel?
14:32 Eliasvan: karolherbst: do you mind I send you my dmesg?
14:32 karolherbst: nope, go ahead
14:33 imirkin_: karolherbst: i know very little about this, sorry
14:33 karolherbst: k
14:33 imirkin_: karolherbst: i was just happy to get my ddr3 card reclocking so that it's faster than the intel igpu
14:34 karolherbst: I am still a bit confused about what I have to do for the reordering now :/
14:34 Eliasvan: http://pastebin.com/Ej2umSV0
14:35 imirkin_: er hm... error setting pstate, that should mean that the clocks don't change
14:35 imirkin_: probably the failure happens too late
14:35 karolherbst: yeah your clock stays too low for the core
14:35 karolherbst: but this shouldn't matter that much...
14:35 karolherbst: there are others with the same problem though
14:35 Eliasvan: oh, stupid...
14:36 karolherbst: I have the same problem, but I worked around it with this hack: https://github.com/karolherbst/nouveau/commit/5554a27415b61a59f1667074cd2162c9f2470cdf
14:36 Eliasvan: but I did "cat pstate", and the "*" clearly indicated a change...
14:36 karolherbst: but this won't help you I think
14:36 karolherbst: no, its fine
14:36 karolherbst: its like that
14:36 karolherbst: mem clock first
14:36 karolherbst: core clock second
14:37 imirkin_: seems like voltage increase should come first
14:37 imirkin_: and voltage decrease should come last
14:37 karolherbst: imirkin_: mem and core have different voltage thingies
14:37 imirkin_: ah
14:38 Eliasvan: karolherbst: so, should I apply your patch in the link you sent at 23:36?
14:38 karolherbst: there are only two voltages for memory I think
14:38 karolherbst: 1.35 and 1.5?
14:38 karolherbst: not sure
14:38 karolherbst: Eliasvan: it shouldn't matter
14:38 karolherbst: the voltage is fine for 07 clock
14:38 karolherbst: and I guess it changes voltage right for 0a
14:39 Eliasvan: karolherbst: once I extracted my vbios, would that help you?
14:40 karolherbst: a bit I guess
14:40 karolherbst: I am not that good with the stuff
14:40 karolherbst: just think of me as a lucky guesser who gets things right
14:41 Eliasvan: I sent it to the mmio trace dumps email, some time ago
14:41 karolherbst: others here have way more knowledge than me
14:41 Eliasvan: not sure you can access that
14:41 karolherbst: mhh wait
14:41 karolherbst: maybe I can
14:42 karolherbst: ohh I guess I don't find it
14:42 karolherbst: well
14:43 karolherbst: I mean I kind of know why the voltaging fails
14:43 karolherbst: it just doesn't calculate the value that well
14:44 Eliasvan: ah, ok
14:45 karolherbst: imirkin_: I still don't get how I should handle the fixed instructions. I just saw in the source it's an indicator to not do dead code elimination, but aside from that, what does it really mean
14:45 imirkin_: it means "no touch!"
14:46 karolherbst: position?
14:46 imirkin_: everything
14:46 karolherbst: mhh
14:46 imirkin_: basically the instructions that happen before it should happen before it
14:46 imirkin_: and the instructions that happen after it should happen after it
14:46 imirkin_: it's a compiler barrier effectively
14:46 karolherbst: so if the 374th instruction is a fixed one, it has to be at the 374th position after the reorder?
14:46 karolherbst: ahhh
14:46 karolherbst: okay
14:47 karolherbst: so after I hit such an instruction I should flush my current lists of collected stuff, reorder stupidly, insert that fixed one and continue
14:48 karolherbst: okay, should be easy enough
14:48 imirkin_: right
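The "flush, reorder, insert the fixed one, continue" plan can be sketched as follows. This is a minimal sketch, not the nv50_ir implementation: the `Insn` struct and `reorderAroundFixed` are hypothetical, and reversing each segment is only a placeholder for a real, dependency-aware scheduling algorithm.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical toy instruction: a serial number plus the "fixed" flag.
struct Insn {
    int serial;
    bool fixed;
};

// Treats every fixed instruction as a barrier: instructions collected before
// it are reordered among themselves and emitted, then the fixed instruction
// is emitted unchanged, then collection starts again.
std::vector<Insn> reorderAroundFixed(const std::vector<Insn>& block)
{
    std::vector<Insn> result;
    std::vector<Insn> pending;

    auto flush = [&]() {
        // Placeholder reorder: a real pass would schedule by dependencies.
        std::reverse(pending.begin(), pending.end());
        result.insert(result.end(), pending.begin(), pending.end());
        pending.clear();
    };

    for (const Insn& insn : block) {
        if (insn.fixed) {
            flush();                 // nothing may move across the barrier
            result.push_back(insn);  // the fixed op keeps its position
        } else {
            pending.push_back(insn);
        }
    }
    flush();  // emit whatever follows the last fixed instruction
    return result;
}
```

Because each segment is flushed before the fixed instruction is appended, a fixed instruction ends up at the same index it had in the input, which matches the "374th stays 374th" reading above.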
14:48 imirkin_: now, we could determine what things are safe to move across fixed instructions
14:48 imirkin_: however i'd avoid that discussion for now
14:48 karolherbst: yeah
14:48 imirkin_: fixed ops are pretty rare
14:48 karolherbst: it should stay simple
14:48 karolherbst: for now
14:57 karolherbst: imirkin_: anything else I have to take care of? Don't think I will do much anymore today, so it would be nice to know that stuff for the time you are not available ;)
14:57 imirkin_: make sure you look at the indirect bits
14:57 imirkin_: pretty sure any predicates should end up as sources
14:58 imirkin_: there might be some unpleasantness with nv50 and its equivalent of EFLAGS
14:58 imirkin_: i forget if we track those deps explicitly or not
15:06 Eliasvan: ok guys, thanks for the help!
15:07 Eliasvan: (I'm gonna leave now ;) )
15:08 imirkin_: karolherbst: this also gets tricky if you have code like
15:08 imirkin_: mov l[0], a
15:08 imirkin_: mov l[1], b
15:08 imirkin_: mov l[2], c
15:08 imirkin_: mov x, l[d]
15:08 imirkin_: l = lmem
15:09 imirkin_: this can happen e.g. for temp arrays with indirect indices
15:09 imirkin_: not sure how to track that dep tbh
15:09 imirkin_: you can just treat any read access from lmem as a "fixed" instruction
15:10 Eliasvan: karolherbst: before I leave, are there things I can help you with (for testing)? I'm almost never on IRC, so that's why I ask now
15:10 imirkin_: if you want to be more thorough look at what the MemoryOpt pass does
15:46 karolherbst: imirkin_: segfault in https://gist.github.com/karolherbst/e44b3d0e45d09fb495c0
15:47 imirkin_: karolherbst: you're probably doing something you're not supposed to
15:47 imirkin_: karolherbst: or perhaps you just have more live values than it can deal with
15:47 imirkin_: karolherbst: either way, do the prints i mentioned
15:47 imirkin_: they are the ones that will contain info, not the segfault backtrace
15:49 imirkin_: karolherbst: but really you should impl the thing i mentioned
15:49 imirkin_: coz what you're doing will heavily pessimize most shaders
15:49 karolherbst: I did already this fixed thing a bit
15:49 imirkin_: can i see the before/after ?
15:50 karolherbst: mhhh strange
15:51 karolherbst: ahh my issue
15:51 karolherbst: *mistake
15:51 imirkin_: well, when you're patching, no matter what the issue, it'll always be your issue ;)
15:51 karolherbst: imirkin_: https://gist.github.com/karolherbst/802980a748b89606cf4b
15:52 karolherbst: no I was wondering why I didn't get the output
15:52 karolherbst: and I just commented that out again :/
15:52 imirkin_: is it doing anything?
15:52 imirkin_: look at the start of BB:0
15:53 imirkin_: looks identical...
15:54 karolherbst: mhh I will throw it into a diff tool
15:54 karolherbst: mhh
15:54 karolherbst: maybe func->print(); isn't good?
15:55 imirkin_: lol
15:55 imirkin_: yeah, way more likely that the print function is broken than your pass is broken.
15:55 karolherbst: :D who knows
15:56 karolherbst: I mean maybe I have to do something before the output gets updated?
15:56 karolherbst: I am pretty sure that I change something in the order
15:57 karolherbst: yeah, serial is changing for a lot of instructions
15:58 karolherbst: or wait
15:58 karolherbst: that's odd
15:58 karolherbst: ahh I skipped the deps one
15:58 karolherbst: printing changes
16:00 karolherbst: so the instructions get sorted into the insert list
16:03 karolherbst: yeah, even if I skip the fixed ins handling, the output is the same, but I get another segfault
16:03 karolherbst: I think the print doesn't take the list passed into orderInstructions into account?
16:03 imirkin_: i think that your orderInstructions thing isn't correctly modifying the instruction list :p
16:04 imirkin_: expect to see problems until it does.
16:04 imirkin_: iirc there's a insn->next
16:04 imirkin_: make sure to fix that up too
16:04 karolherbst: ohh, I have to fix all the pointers too
16:04 karolherbst: okay
16:05 imirkin_: so yeah, you kept the original order but messed up all the serials
16:05 imirkin_: probably not a great thing to do ;)
16:05 karolherbst: :D
16:06 karolherbst: can I iterate backwards through ArrayList? i guess not :(
16:06 imirkin_: wow, that is a MONSTER shader btw... ugh. 36816 bytes of compiled code
16:06 karolherbst: yeah
16:06 karolherbst: it only gets worse in the future
16:06 karolherbst: ohh maybe not
16:07 karolherbst: guess there is a lot of optimization potential? :D
16:07 imirkin_: unlikely... it's just an enormous shader
16:07 imirkin_: oooh, interesting pattern
16:07 imirkin_: 3534: and u32 $r24 $r1 0x000000ff (8)
16:07 imirkin_: 3535: cvt f32 $r24 u32 $r24 (8)
16:07 imirkin_: that could be optimized into cvt f32 $r24 u8 $r1
16:08 karolherbst: :)
16:08 imirkin_: 3537: shr u32 $r25 $r1 0x00000008 (8)
16:08 imirkin_: 3538: and u32 $r25 $r25 0x000000ff (8)
16:08 imirkin_: 3539: cvt f32 $r25 u32 $r25 (8)
16:09 imirkin_: there's actual vector ops that allow you to access a specific byte of a register
16:09 imirkin_: although i don't think there's a vcvt :)
16:09 karolherbst: so I guess there can be something done for this shader?
16:11 imirkin_: yeah
16:11 imirkin_: it's basically taking 4 packed bytes in an integer
16:11 imirkin_: and converting them to 4 floats
16:11 imirkin_: i'm guessing they really just want to be using NV_gpu_shader5's u8vec4
16:12 imirkin_: more generally though, shift + low-bits-and = extbf
16:12 imirkin_: should probably add that op, that should be easy
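The "shift + low-bits-and = extbf" equivalence can be written out in plain C++. This is a minimal sketch: `extbf` and `byteToFloat` here are illustrative helper names, not the compiler's opcode or any Mesa function, and the width is assumed to be between 1 and 31 with the offset below 32.

```cpp
#include <cstdint>

// Extract `width` bits of `value` starting at bit `offset`.
// Assumes 0 < width < 32 and offset < 32.
inline std::uint32_t extbf(std::uint32_t value, unsigned offset, unsigned width)
{
    return (value >> offset) & ((1u << width) - 1u);
}

// The pattern seen in the shader: pick one of 4 packed bytes and convert it
// to float (shr + and + cvt in the dump above).
inline float byteToFloat(std::uint32_t packed, unsigned byteIndex)
{
    return static_cast<float>(extbf(packed, 8u * byteIndex, 8u));
}
```

For example, `extbf(0x12345678, 8, 8)` is the same as `(0x12345678 >> 8) & 0xff`, i.e. the second-lowest byte.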
16:13 imirkin_: give me a minute....
16:13 karolherbst: I will compare performance then I guess
16:13 karolherbst: but it shouldn't change much
16:14 imirkin_: no, it's just a handful of instructions probably
16:14 imirkin_: actually looks like they do it a bunch
16:14 imirkin_: but still, it's just 1 instruction saved every so often
16:14 imirkin_: out of a million instructions ;)
16:14 karolherbst: yeah well
16:14 karolherbst: saving one out of two is nice
16:15 karolherbst: I guess there are patterns where you only save one out of 10 or something
16:16 karolherbst: is there something else I need to fix? got an infinite loop somehow :(
16:16 karolherbst: ahh prev
16:16 karolherbst: makes sense
16:17 karolherbst: wtf...
16:17 karolherbst: do you know what?
16:17 karolherbst: this shader you saw was a joke
16:17 karolherbst: "14646: exit - # (0)" ...
16:17 karolherbst: I hope I messed something up :D
16:18 karolherbst: yeah I messed up
16:29 karolherbst: okay, got multiple exit, this can't be right
16:35 imirkin_: karolherbst: early version: http://hastebin.com/aqakeyuvex.coffee
16:35 imirkin_: seems to not be entirely broken
16:37 imirkin_: oh, but even better, there IS a way to convert a value straight out
16:37 imirkin_: that'll require a lot more hacking though
16:40 imirkin_: I2F.F32.U8 R0, R1.B1
16:40 imirkin_: nice. it does look it exists!
16:40 karolherbst: :D
16:40 karolherbst: nice
16:40 imirkin_: so that means i should be able to detect cvt + extbf with appropriate params
16:41 imirkin_: let's see... where do i do that? algebraic opt maybe?
16:42 karolherbst: whats extbf?
16:42 imirkin_: extract bitfield
16:42 karolherbst: sounds like algebraic to me
16:42 karolherbst: or where would you put cvt?
16:43 imirkin_: heh
16:43 imirkin_: well there's already some funky cvt handling in there
16:43 karolherbst: so it's getting more funky
16:44 karolherbst: yeah somehow I give the exit instructions a next value
16:44 marcosps1: Good night :)
16:45 karolherbst: maybe I shouldn't do that
16:45 karolherbst: ohh wait
16:45 karolherbst: imirkin_: should any instruction point to exit with next?
16:46 imirkin_: nfc
16:46 imirkin_: you def need to fix up the bb and function entry/exit
16:46 imirkin_: there are like a million things that have to be consistent
16:46 imirkin_: i have no idea what they all are, sorry
16:47 marcosps1: imirkin_: what are these bbs..? Branches or something like it...?
16:47 karolherbst: https://gist.github.com/karolherbst/1660a9e8c74474dda66c
16:48 karolherbst: there is a difference now, but the exit got wrong
16:48 imirkin_: marcosps1: basic block
16:49 marcosps1: imirkin_: these basic blocks are "blocks of operations that will be executed by the GPU" ?
16:49 imirkin_: marcosps1: no, it's a compiler concept
16:49 imirkin_: marcosps1: it's a block of instructions without control flow in the middle
16:49 marcosps1: imirkin_: ok :)
16:50 imirkin_: makes it easy to reason about how things are related when you don't have jumps all over the place
16:50 imirkin_: or at least easier :)
16:51 imirkin_: karolherbst: this is how iteration is done: for (i = bb->getEntry(); i; i = next)
16:51 imirkin_: the last instruction in the bb should have a null next pointer
16:51 karolherbst: yeah
16:51 karolherbst: I know
16:51 imirkin_: instructions are a singly-linked list
16:51 karolherbst: I do that
16:51 imirkin_: exit should be the last instruction, so no next ptr :)
16:52 karolherbst: https://gist.github.com/karolherbst/1660a9e8c74474dda66c#file-tmp-cpp
16:52 karolherbst: maybe I am too optimistic here
16:52 imirkin_: this seems confusing
16:52 imirkin_: first off, why i = 1?
16:52 karolherbst: Instruction *current = static_cast<Instruction *>(result.get(0));
16:53 karolherbst: I read the first one before the loop
16:53 karolherbst: don't like ugly ifness inside the loop
16:53 imirkin_: secondly, usually you just do current->prev->next = current
16:53 imirkin_: instead of doing that last bs
16:53 karolherbst: ahhh
16:53 karolherbst: makes sense somehow
16:54 imirkin_: :)
16:54 karolherbst: but then again
16:54 karolherbst: I have to set current->prev before
16:54 imirkin_: hehe
16:54 imirkin_: yes.
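The pointer fix-up being discussed (set `current->prev` first, then `current->prev->next = current`) could look roughly like this. It is a sketch under assumptions: the `Insn` struct and `relink` are hypothetical stand-ins, not the nv50_ir `Instruction` class, and a real pass would also have to fix up the basic block's entry/exit pointers as mentioned earlier.

```cpp
#include <vector>

// Hypothetical toy instruction with the two link pointers from the chat.
struct Insn {
    int serial;
    Insn* next;
    Insn* prev;
};

// Rewrites prev/next so that the linked order matches `order`.
// Returns the new first instruction, or nullptr for an empty list.
Insn* relink(const std::vector<Insn*>& order)
{
    if (order.empty())
        return nullptr;

    Insn* previous = nullptr;
    for (Insn* current : order) {
        current->prev = previous;
        current->next = nullptr;              // the last one keeps a null next
        if (current->prev)
            current->prev->next = current;    // the one-liner from the chat
        previous = current;
    }
    return order.front();
}
```

Clearing `next` on every node before linking it from its predecessor means the final instruction (the exit, in a well-formed block) is guaranteed to end the list, which avoids the stale-pointer loops mentioned above.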
16:54 karolherbst: so whats the point
16:54 karolherbst: :D
16:54 imirkin_: so.......
16:54 karolherbst: mhh
16:54 karolherbst: still
16:55 karolherbst: I don't get where this exit comes from
16:55 imirkin_: right.
16:55 karolherbst: I clearly set the next pointer to NULL
16:55 karolherbst: even if I do it always
16:55 imirkin_: yeah, you managed to multiply it
16:55 imirkin_: well done
16:55 karolherbst: I transformed the 4000 instruction thingy into a 14.000 big one
16:55 karolherbst: that's why I tried something smaller
16:55 imirkin_: all loads/stores for lmem right
16:56 imirkin_: coz you end up with a bajillion live values
16:56 imirkin_: so everything is spilled
16:56 imirkin_: coz you only have 64 regs
16:56 karolherbst: I thought it was clear, that we don't do anything smart at first? :D
16:57 karolherbst: first I want to fix this stupid exit thingy
16:59 mupuf: karolherbst: weird, nouveau works on my nve6 on linux 4.0, but is not recognized on 4.1
16:59 mupuf: what the heck? Nouveau did not get any changes for 4.1 :o
17:00 karolherbst: :)
17:00 karolherbst: magic
17:00 karolherbst: mupuf: by the way, I found 2 volunteers for the patch
17:01 karolherbst: seems to work fine so far
17:01 karolherbst: its "improving the situation"
17:02 mupuf: very well!
17:02 mupuf: I need to give it a try at some point then!
17:03 mupuf: well, I guess I will have a look at this problem later
17:03 mupuf: good night!
17:03 karolherbst: night
17:03 karolherbst: imirkin_: ohhhhhhhhh
17:04 karolherbst: the result list doesn't respect bb boundaries
17:04 karolherbst: I need a bb local result list
17:04 karolherbst: and copy the stuff over after fixing the stuff
17:04 karolherbst: ...
17:06 karolherbst: now it looks better
17:13 karolherbst: imirkin_: okay something here is wrong https://gist.github.com/karolherbst/2d6cdbdc26b8ce483de8
17:13 karolherbst: maybe the "all loads/stores for lmem right" thingy?
17:13 karolherbst: it crashes in nv50_ir::GCRA::simplify
17:14 imirkin_: heheheh yeah.
17:14 imirkin_: you're destroying the spilling logic
17:15 karolherbst: I am not that good with compilers, I know that something like spilling exist, but never looked up what it means
17:15 imirkin_: let's say you have 2 registers
17:16 karolherbst: ohhh I think I already got it
17:16 imirkin_: but you want to compute a*b + c*d + e*f
17:16 karolherbst: spilling is when you have to access memory instead of register?
17:16 imirkin_: you compute a*b, c*d, e*f
17:16 imirkin_: now you have 3 values
17:16 imirkin_: but you only have 2 registers
17:16 karolherbst: or the other way around?
17:16 imirkin_: now in this case you could be clever about it
17:16 imirkin_: spilling means that you store the value of a register to memory
17:16 imirkin_: and then restore it down the line when you need it
17:16 karolherbst: okay
17:17 imirkin_: and in the meanwhile use that register for other purposes
17:17 karolherbst: I hope performance get a lot worse after this stuff works
17:17 imirkin_: btw, i'm highly confused by the old code as well
17:17 imirkin_: can i see the tgsi?
17:18 imirkin_: i think there's some sort of horrible bug in... something
17:18 karolherbst: how do I dump that?
17:18 imirkin_: scroll up
17:18 imirkin_: NV50_PROG_DEBUG=1
17:18 imirkin_: should produce the tgsi
17:18 karolherbst: think I found it
17:19 karolherbst: is it normal that there are several nouveau thingies?
17:19 imirkin_: ld u32 %r17310 l[0x10]
17:19 imirkin_: one is pre-ra, the other is post
17:19 imirkin_: anyways... wtf is that thing. nothing has written to lmem
17:19 imirkin_: where is that op coming from
17:19 karolherbst: wait
17:19 karolherbst: the shader file I use for nouveau_compiler, is that tgsi already?
17:19 imirkin_: that's in the *old*
17:19 imirkin_: yes
17:19 karolherbst: ohh okay
17:20 karolherbst: https://gist.github.com/karolherbst/2d6cdbdc26b8ce483de8#file-tgsi
17:20 imirkin_: wait
17:21 imirkin_: that's the same one i've been using
17:21 karolherbst: yeah
17:21 karolherbst: you gave me that one
17:21 imirkin_: i don't see any lmem stores
17:22 imirkin_: perhaps it's a weird intermediate stage thing
17:23 karolherbst: so do I have to recalculate the spilling information somehow?
17:23 karolherbst: or is there something utterly broken with the BB?
17:24 karolherbst: s
17:28 imirkin_: something utterly broken
17:28 imirkin_: but not sure how.
17:29 imirkin_: imho orderInstructions is getting called at a weird time
17:29 imirkin_: i see why calim did it that way
17:29 imirkin_: but i'd call it strictly before RA
17:29 imirkin_: not in that weirdo loop
17:29 imirkin_: where crazy things happen
17:29 karolherbst: we can move it I guess?
17:33 imirkin_: maybe
17:33 imirkin_: all that stuff is pretty subtle
17:33 imirkin_: i'd start with a smaller shader
17:33 imirkin_: and get a feel for how things go
17:33 imirkin_: and play with diff scheduling algo's
17:34 imirkin_: that will also give you some familiarity with the whole compiler system
17:34 imirkin_: and then you'll be better placed to futz with the orderInstructions() placement
17:38 marcosps1: imirkin_: I removed your if in target_nvc0.c and added a printf statement there to verify whether a double immediate is coming, but now, how can I test it? I tried to run the shader_runner to verify if the print is shown, but it's not...
17:39 imirkin_: marcosps1: not overly surprising
17:39 imirkin_: there's probably other things protecting against that case
17:40 karolherbst: imirkin_: the other shaders in heaven seem to compile fine
17:40 marcosps1: imirkin_: So, deeper and deeper into mesa :) I'll verify it now...
17:40 karolherbst: glxgears also runs without issues
17:41 karolherbst: could try bioshock though, it ran before
17:42 imirkin_: nice. my code is now generating stuff like I2F.F32.U8 R1, R1.B2
17:42 imirkin_: now to find how the encoding works on gk110 =/
17:43 imirkin_: so that i can actually test it
17:43 imirkin_: heh
17:49 marcosps1: karolherbst: I have an nvc0, if you want to run some tests, please let me know :)
17:53 imirkin_: karolherbst: where did that shader come from? heaven?
17:54 karolherbst: yes
17:54 karolherbst: glxspheres is working by the way
17:54 karolherbst: but with only 50% perf
17:55 imirkin_: ;)
17:56 imirkin_: coz you spill too many registers
17:56 imirkin_: or rather use
17:56 imirkin_: so you don't get as much parallelism
17:57 karolherbst: strange
17:57 karolherbst: stock seems to be as fast as
17:58 karolherbst: imirkin_: but your patch seems to improve performance
17:58 imirkin_: karolherbst: well i have a better one coming up
17:58 karolherbst: could be random though
17:58 imirkin_: just putting some finishing touches
17:58 karolherbst: glxspheres fps isn't stable across runs
17:59 karolherbst: but usually I got like 1000
17:59 karolherbst: and now only 500
17:59 imirkin_: oh, it's unlikely that my patch helped glxspheres
17:59 imirkin_: i highly doubt they do integer math
18:00 karolherbst: yeah, I think its random
18:00 imirkin_: oh, i did add a thing to make it so that AND(x, 0) == 0
18:01 imirkin_: my glxspheres number stayed pretty constant
18:01 karolherbst: maybe installing self compiled mesa isn't that good
18:01 karolherbst: don't know
18:02 tobijk: karolherbst: you installed it with debug info?
18:03 karolherbst: could be, will reinstall from system and see what I got
18:04 tobijk: btw i would install it to some extra dir and do LD_LIBRARY_PATH=/path/to/folder
18:04 tobijk: lets you keep the system mesa :)
18:04 karolherbst: usually I don't mind
18:05 marcosps1: imirkin_: finally: https://paste.fedoraproject.org/256537/39946285/
18:05 marcosps1: imirkin_: sorry, wrong guess
18:06 imirkin_: marcosps1: that's for merging constant split/merge pairs
18:06 imirkin_: which happen as a result of how the tgsi integration is done
18:07 marcosps1: imirkin_: yes, but the print was wrong...
18:08 imirkin_: yes :)
18:09 karolherbst: ohhh
18:09 karolherbst: its the fault of my intel card
18:09 karolherbst: glxspheres fps stay constant on resize
18:09 karolherbst: but MP/s goes up a lot
18:09 marcosps1: imirkin_: now i'm reaching some place: https://paste.fedoraproject.org/256538/46551143/
18:09 marcosps1: imirkin_: it just emits the load when the dType is 32bit...
18:10 imirkin_: LOAD is for loading stuff from various memory files
18:10 imirkin_: not for immediates
18:10 imirkin_: CONST is a constbuf, not an immediate
18:11 marcosps1: imirkin_: so, FILE_IMMEDIATE, of enum DataFile, is the right place to look? I'm kind of lost now :)
18:12 karolherbst: imirkin_: okay
18:12 karolherbst: bioshock infinite runs with my modifications
18:12 karolherbst: same performance
18:12 imirkin_: marcosps1: so basically you want to look at the ConstantFolding peephole pass
18:13 imirkin_: marcosps1: tbh i'm not sure offhand how to add support... basically you want to be able to detect that an immediate can be loaded. check out how getImmediate() works
18:13 imirkin_: you'll have to extend that to "see" through merge ops
18:15 marcosps1: imirkin_: ok, thanks :)
18:15 imirkin_: karolherbst: look at the top 2 commits at https://github.com/imirkin/mesa/commits/texture_samples
18:21 karolherbst: wow
18:22 karolherbst: the last program dropped from 4050 to 3660 instructions
18:22 imirkin_: uh oh
18:22 tobijk: some magic patch it seems :)
18:23 imirkin_: oh hm, actually that seems reasonable
18:23 imirkin_: wow they do that junk a whole lot
18:23 karolherbst: benchmark
18:23 karolherbst: maybe they want to generate garbage
18:23 imirkin_: what's a little frightening is that it doesn't matter which byte i extract, it still looked right
18:24 imirkin_: maybe it was some silly weird object that looked wrong and i didn't notice though
18:24 karolherbst: wow, that got wrong
18:24 imirkin_: oh, it's messed up? :(
18:25 karolherbst: nouveau failed to load
18:25 imirkin_: huh?
18:26 imirkin_: just checked the nve4 code that's generated, nvdisasm likes it
18:26 karolherbst: will be my mistake
18:27 karolherbst: "libGL error: driver pointer missing"
18:27 karolherbst: then again a clean rebuild
18:27 imirkin_: pretty sure my patch didn't do that ;)
18:30 karolherbst: this also explains poor glxspheres performance
18:33 karolherbst: will run a benchmark now
18:37 imirkin_: this is highly unlikely to affect real programs
18:37 imirkin_: unigine is probably alone
18:37 imirkin_: maybe dolphin
18:38 tobijk: benchmarks with on-purpose crappy shaders? :>
18:38 karolherbst: ohh right dolphin
18:39 imirkin_: tobijk: probably not. but i just don't see why they'd do it that way
18:39 karolherbst: but less instructions is still a good thing in general?
18:39 imirkin_: basically they're retrieving values from a texture
18:39 imirkin_: and then extracting the low 24 bits into separate numbers, converting them to float
18:39 imirkin_: a supposedly float texture, no less!
18:41 karolherbst: I knew I saw something
18:41 karolherbst: its like ultra pretty rare
18:41 karolherbst: but sometimes there is a red flicker on the ground in unigine
18:41 karolherbst: only saw it like 3 times?
18:41 imirkin_: only with my patch?
18:41 karolherbst: nope
18:41 karolherbst: I saw a flicker before
18:41 karolherbst: something odd
18:42 karolherbst: testing without your patch currently anyway
18:42 imirkin_: weird
18:43 karolherbst: imirkin_: no difference with your patch in performance
18:44 imirkin_: yeah not surprising
18:44 imirkin_: i think that shader is probably used for something silly
18:44 karolherbst: yes
18:44 karolherbst: still its good to have such a optimization
18:48 karolherbst: mhh
18:48 karolherbst: still got the segfault :/
18:49 tobijk: i produced one right now as well :D
18:49 karolherbst: but I think this is in the next compilation now
18:50 karolherbst: imirkin_: the one shader previously producing "ERROR: no viable spill candidates left" doesn't do it anymore
18:50 karolherbst: and the crashing one does that
18:50 karolherbst: now the one after that crashes
18:50 karolherbst: like I can do one pass more now
18:51 karolherbst: what does GCRA do?
18:52 tobijk: imirkin_: every gallium driver uses the clip lowering to float[8] -> vec4[2]? that really surprises me
18:52 imirkin_: tobijk: it's the gallium api
18:52 imirkin_: tobijk: it comes in as CLIPDIST[0] and [1].xyzw
18:53 tobijk: and here i am wanting to change that :D
18:53 imirkin_: you could put it behind a cap
18:53 imirkin_: i'm not immensely a fan of that pass
18:53 tobijk: yeah i wanted to make a cap for combinedclipcull
18:54 imirkin_: it does horrible things to dynamic indexing
18:54 tobijk: which indeed disables the single lowering
18:54 imirkin_: i suspect that Brian is going to hate it
18:54 tobijk: :/
18:54 imirkin_: check with the group before implementing your plan
18:55 karolherbst: any idea why I get it==NULL here? http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n1213
18:55 imirkin_: hehe
18:55 imirkin_: no.
18:55 imirkin_: i've never touched the RA logic
18:55 tobijk: karolherbst: oh that one, i was there too often :(
18:56 imirkin_: and i have no plans on ever touching it
18:56 imirkin_: tobijk: different one.
18:56 karolherbst: tobijk: oh sounds like you've got a plan :p
18:56 karolherbst: yeah I really mess up instruction ordering in the BBs
18:56 imirkin_: tobijk's troubles were in the SpillCodeInserter
18:56 tobijk: imirkin_: while i tried i always ran into that place with a SEGFAULT
18:56 imirkin_: iirc we fixed it though no?
18:57 tobijk: we left it like it was for good :)
18:57 tobijk: maybe you have done something in the meantime, not sure
18:57 imirkin_: there were def issues in there at one point
18:57 imirkin_: which i fixed
18:58 imirkin_: relating to spilling merges or something
18:58 tobijk: that came in before i tried to spill more cleverly
19:00 karolherbst: what does "buildLiveSets(BasicBlock::get(func->cfg.getRoot()));" do?
19:01 imirkin_: if i said it built the live sets for the function, would you hate me?
19:01 tobijk: :P
19:01 karolherbst: ...
19:01 imirkin_: liveness is a concept
19:01 imirkin_: it means whether the variable is in play or not
19:02 imirkin_: multiple variables might be "live" at once
19:02 karolherbst: ahh k
19:02 imirkin_: so at each instruction you have a list of values that are "live"
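The idea of a list of live values at each instruction can be illustrated for a single straight-line block with a backward scan. This is only a sketch: the `Insn` struct and `liveAfter` are hypothetical, it ignores control flow entirely, and it is not what `buildLiveSets` actually does internally.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: one destination, any number of sources.
struct Insn {
    std::string dst;
    std::vector<std::string> srcs;
};

// For each instruction, returns the set of values that are still needed
// after it executes. `liveOut` is what must survive past the block's end.
std::vector<std::set<std::string>> liveAfter(const std::vector<Insn>& block,
                                             const std::set<std::string>& liveOut)
{
    std::vector<std::set<std::string>> result(block.size());
    std::set<std::string> live = liveOut;

    // Walk backwards: a definition ends a value's live range (going up),
    // a use starts it.
    for (std::size_t i = block.size(); i-- > 0;) {
        result[i] = live;
        live.erase(block[i].dst);
        for (const std::string& src : block[i].srcs)
            live.insert(src);
    }
    return result;
}
```

Applied to the earlier a*b + c*d + e*f example (three multiplies followed by two adds), the set after the third multiply contains all three products at once, which is exactly the situation where two registers are not enough and something has to be spilled.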
19:08 karolherbst: got a -4 now
19:09 karolherbst: I thought a break and an it=NULL check may do something useful, seems like I was wrong
19:09 karolherbst: :D
21:56 imirkin_: mwk: have you figured out precisely what F2I.{S8,S16}.F32 do? i'm specifically interested in what happens to the "upper" bits
22:21 imirkin_: mwk: urgh, fail! looks like F2I.S16.F32 sets the high bits :(
22:21 imirkin_: but i think that S8 doesn't
22:21 imirkin_: which is extremely odd.
22:25 imirkin_: oh actually it's throwing invalid opcode errors. great.
22:27 imirkin_: [this is on a GK208]
22:27 imirkin_: oh well!
22:57 imirkin_: mwk: the ptx isa docs say it'll sign-extend
23:19 imirkin_: mwk: but i haven't been able to get the opcode to execute in the first place =/
23:19 imirkin_: i should probably see what ptxas does... urgh
23:30 imirkin_: mwk: looks like it does F2I + I2I pairs