00:13 agrecascino: oh jesus christ i fucked up
00:13 agrecascino: as soon as nouveau loads
00:13 agrecascino: everything goes black
00:13 agrecascino: what did i do wrong
00:14 imirkin_: agrecascino: mmmm... potentially nothing
00:14 imirkin_: agrecascino: how are your screens connected? which gpu do you have? can you grab dmesg after this (e.g. by ssh'ing in)?
00:14 agrecascino: the only reason i'm here is because i logged in blind
00:14 agrecascino: and X used vesa apparently
00:15 imirkin_: errrr.... that's weird
00:15 imirkin_: dmesg + xorg log would be great
00:16 agrecascino: https://0x0.st/N6G.txt
00:16 agrecascino: dmesg
00:17 agrecascino: https://0x0.st/N6D.txt
00:17 agrecascino: xorg
00:17 agrecascino: it seems to unload nouveau at one point
00:17 imirkin_: so...
00:18 imirkin_: it looks like nouveau gets very upset somewhere in the disp logic
00:18 imirkin_: which is what causes the screens to go black
00:18 imirkin_: however X manages to get it out of its funk?
00:18 agrecascino: what i learned:
00:18 imirkin_: either way, you have an ancient version of X
00:18 agrecascino: bow down to the proprietary overlords
00:18 agrecascino: and
00:18 agrecascino: learn how to login blind
00:19 imirkin_: with a definitely not-new-enough modesetting + glamor config
00:19 imirkin_: er, s/config/logic
00:19 imirkin_: the actual moral of the story is "buy amd"
00:20 agrecascino: AMD's MXM cards haven't been updated in years, last time i checked
00:20 imirkin_: no clue. i do know that amd supports open-source efforts while nvidia does not.
00:20 agrecascino: i would be using an amd machine
00:20 agrecascino: however, the power supply to go in that machine is too wide for the case
00:21 agrecascino: so rip
00:22 agrecascino: also yeah
00:22 agrecascino: i'm right
00:22 agrecascino: the latest MXM cards on AMD's side are all based on 7XXX cards
00:22 agrecascino: that's pretty depressing
00:23 agrecascino: is nouveau supposed to break text consoles though
00:23 agrecascino: i mean
00:24 agrecascino: if it fails loading horribly
00:24 agrecascino: or 2d acceleration is WIP
00:24 agrecascino: i mean that's overkill for failing horribly
00:24 imirkin_: text console works... just the display is off
00:24 imirkin_: which is of little consolation to you :)
00:24 agrecascino: i really love not seeing anything
00:25 agrecascino: gives me tons of assurance that everything is ok
00:25 imirkin_: either way, no, that's not supposed to happen - getting this stuff right is hard =/
00:26 agrecascino: especially when the manufacturer of said product is trying to make it harder to do it right
00:26 imirkin_: basically the hw is unhappy with how we're driving it... not a whole lot the driver can do.
00:26 imirkin_: it's not some adaptable intelligent actor
00:28 agrecascino: any possible workarounds?
00:28 imirkin_: provide a detailed bug report and hope ben can figure out what's wrong
00:29 imirkin_: (e.g. one that includes your full dmesg, not just a small extract)
00:29 agrecascino: where to report it?
00:29 imirkin_: bugs.freedesktop.org xorg -> Driver/nouveau
00:30 skeggsb: evo is pissed because we're doing a modeset for heads 2/3 (with sors 0/1), and leaving heads 0/1 displaying but without an OR attached
00:31 skeggsb: nfi how that'd actually happen, but that's what the state says
00:31 imirkin_: skeggsb: note that he has a super-old Xorg + modesetting version
00:31 imirkin_: 1.14
00:33 agrecascino: :\ I think I'm going to bow down to NVIDIA.
00:44 imirkin_: skeggsb: any prospect of integrating (some of?) karolherbst's work in 4.7?
00:44 karolherbst: imirkin_: he told me he is short on time ;)
00:45 karolherbst: imirkin_: also for 4.7 it is a bit too late now
00:45 imirkin_: meh. he puts the pull request together at the last second usually :)
00:45 imirkin_: and it's not like your patches are new
00:47 karolherbst: well
00:47 karolherbst: I think there are still some design issues or something
00:47 karolherbst: anyway, I would be surprised if ben has nothing to complain about ;)
00:47 imirkin_: he _is_ a complainer, that ben :p
00:50 airlied: skeggsb: the last second is approaching for -next :)
01:00 agrecascino: off topic question but
01:00 agrecascino: my cursor is invisible, how do i fix this
01:02 karolherbst: agrecascino: usually a software bug. How did it disappear?
01:02 agrecascino: it was never there
01:02 karolherbst: agrecascino: and usually logout/login fixes it
01:02 agrecascino: uhhhh
01:02 karolherbst: agrecascino: using ubuntu?
01:03 agrecascino: no, i just can't see my console
01:03 karolherbst: well
01:03 agrecascino: hmm
01:03 karolherbst: on an ubuntu machine I found the bug that when at login time there is no cursor, after login you still have none
01:03 agrecascino: i'll try it
01:05 karolherbst: imirkin_: now that I enabled SSO in like every shader which enables it, most of those "WARNING: value %92 not uniquely defined" warnings are gone and I already thought something went wrong :D
01:05 agrecascino: didn't fix
01:05 agrecascino: it
01:05 agrecascino: but i got an odd visual
01:07 karolherbst: yeah well
01:07 karolherbst: "2149818 -> 2149790 (-0.00%)" :/
01:07 karolherbst: meh
01:07 karolherbst: imirkin_: seems like all the effort is indeed for nothing (for now)
01:08 imirkin_: =/
01:08 imirkin_: fwiw it did seem quite odd
01:08 agrecascino: let me describe the visuals
01:08 agrecascino: i guess
01:08 agrecascino: i logged out
01:08 karolherbst: imirkin_: right
01:08 agrecascino: x stayed on screen, even though it was not running
01:08 agrecascino: and when i started up x again
01:09 agrecascino: the screen got filled with garbage for a few seconds
01:09 agrecascino: before coming to xfce
01:09 karolherbst: imirkin_: at least my PostRADCE pass finds useful stuff now: helped inst ../nvidia_shaderdb/tomb_raider/8430.shader_test - 1 705 -> 651
01:09 imirkin_: figure out why :p
01:10 karolherbst: meh
01:10 karolherbst: ohh maybe it is fine
01:15 karolherbst: https://gist.github.com/karolherbst/a80c3b1457ed9bb89756c7657a764e86
01:16 karolherbst: 20,21,29,30,37,38
01:17 imirkin_: karolherbst: you kinda need 20,21 - they're used in 24
01:17 imirkin_: your post-ra dce pass needs some help
01:17 karolherbst: i thought so
01:18 karolherbst: https://github.com/karolherbst/mesa/blob/c8fb248ddff9a68a54741565d6c6398b816d7838/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#L64-L86
01:18 karolherbst: i guess something is missing here
01:18 imirkin_: i guess the refcounts are messed up post-ra? dunno.
01:19 imirkin_: sorry =/
01:19 karolherbst: let me check
01:19 imirkin_: (it kinda makes sense)
01:20 imirkin_: there used to be a merge constraint
01:20 imirkin_: which was being ref'd
01:20 imirkin_: which tells the RA to give things the same values
01:20 imirkin_: but i guess we don't up the ref count
01:24 karolherbst: seems like it
01:34 karolherbst: imirkin_: but depending on the shaders linked together, my pass could still optimize some branches away? Or wouldn't that be possible with the current code?
01:35 imirkin_: anything's possible
01:35 imirkin_: some things are just unlikely.
01:36 karolherbst: okay
01:36 karolherbst: at least a valley shader benefits a lot
01:36 imirkin_: well, the current PostRADCE is a non-starter
01:36 imirkin_: it messes up shaders
01:36 imirkin_: coz it relies on refcount
01:37 imirkin_: which is not accurate post-ra
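The problem imirkin describes is that a per-value reference count stops being a trustworthy "is this still used?" signal once register allocation has coalesced values into shared registers (e.g. through the merge constraint). The usual post-RA alternative is a backward liveness walk over physical registers. Below is a minimal sketch of that idea on a hypothetical toy IR (`Insn`, `post_ra_dce` are illustrative names, not nv50_ir classes), assuming a single basic block and that only instructions marked `side_effect` (exports, stores) must be kept regardless of their outputs:

```python
from dataclasses import dataclass


@dataclass
class Insn:
    """Toy register-allocated instruction (hypothetical, not nv50_ir::Instruction)."""
    op: str
    dst: tuple = ()            # physical registers written
    src: tuple = ()            # physical registers read
    side_effect: bool = False  # e.g. export/store: never removable


def post_ra_dce(insns, live_out=()):
    """Drop instructions in one basic block whose results are never read.

    Walks the block backwards and tracks which physical registers are
    still read later (liveness) instead of trusting per-value refcounts.
    """
    live = set(live_out)
    kept = []
    for insn in reversed(insns):
        needed = insn.side_effect or any(r in live for r in insn.dst)
        if not needed:
            continue                      # nothing later reads what it writes
        live.difference_update(insn.dst)  # this definition kills its registers
        live.update(insn.src)             # ...and makes its sources live
        kept.append(insn)
    kept.reverse()
    return kept


prog = [
    Insn("mov", dst=("r0",)),
    Insn("mov", dst=("r1",)),
    Insn("mov", dst=("r2",)),                     # dead: r2 is never read
    Insn("tex", dst=("r4",), src=("r0", "r1")),   # reads r0:r1 as one pair
    Insn("export", src=("r4",), side_effect=True),
]
result = post_ra_dce(prog)
```

Because the `tex` reads `r0` and `r1` as registers, the movs feeding them survive even though no single SSA value "owns" the pair, which is exactly the case (instructions 20/21 feeding 24) that the refcount-based pass got wrong. A real pass would also need per-block live-out sets across the CFG.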
01:37 karolherbst: well
01:37 karolherbst: only 5 shaders are changing anyway
01:37 karolherbst: stupid shader-db
01:37 karolherbst: why wasn't that sso stuff merged :D
04:15 Tom^: what happens when you run out of vram? is there some sort of oom killer? or does everything simply crash and burn?
04:15 imirkin: everything simply crashes and burns
04:16 imirkin: the pushbuf submit fails, and we don't handle pushbuf submit failures in mesa
04:17 Tom^: ok
04:17 imirkin: which means we end up starting to submit commands that try to read/write to unmapped addresses
04:17 imirkin: it's a good system :)
12:57 karolherbst: hakzsam: well, I think those ipc metrics are kind of important. In saints row I get usually around 0.5 IPC :/
12:57 karolherbst: that sounds low
12:57 karolherbst: especially because in pixmark_piano I get above 4
13:32 RSpliet: karolherbst: artificial benchmarks usually obtain a much higher IPC
13:33 mupuf: RSpliet: you mean ALU-bound benchmarks
13:33 mupuf: there are no artificial benchmarks :D
13:33 RSpliet: all of them are artificial ;-)
13:33 mupuf: that's another way of looking at it :D
13:34 mupuf: karolherbst: you get a poor utilisation because ... you know ... memory accesses?
13:35 RSpliet: it's a very useful metric
13:35 RSpliet: not to compare two applications, but to measure the improvement of your optimisations
13:35 * mupuf never doubted that
13:39 karolherbst: RSpliet: right, but having 0.5 seems low
13:40 karolherbst: RSpliet: especially because in other scenes in the same game I get much higher (usually due to reduce scene complexity)
13:40 karolherbst: *reduced
13:41 karolherbst: low IPC usually comes from low dual issuing or stalls in the shader?
13:42 mupuf: karolherbst: or spilling
13:42 mupuf: or poor instruction scheduling to hide the memory latency
13:43 karolherbst: yeah, with the latter I also meant that
13:43 karolherbst: in fact, if something has to wait
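The chat does not spell out how the IPC figure is derived. A common definition is instructions issued divided by active cycles; the sketch below assumes the issued count is built from the two counters discussed later in this log as `inst_issued1 + 2 * inst_issued2` (each dual-issue event issuing two instructions). Treat that formula as an assumption rather than a statement of what Mesa's metric does. As RSpliet says, the number is most useful for comparing the same shader before and after an optimisation:

```python
def issued_instructions(inst_issued1, inst_issued2):
    # Assumption: inst_issued2 counts dual-issue events (two instructions
    # each) and inst_issued1 counts single-issue events.
    return inst_issued1 + 2 * inst_issued2


def issued_ipc(inst_issued1, inst_issued2, active_cycles):
    """Instructions issued per active cycle; 0.0 if nothing ran."""
    if active_cycles == 0:
        return 0.0
    return issued_instructions(inst_issued1, inst_issued2) / active_cycles


# Made-up counter values: the same 500 instructions, but in the second case
# more of them are dual-issued and the shader stalls less.
before = issued_ipc(400, 50, 1000)
after = issued_ipc(300, 100, 800)
```

A low value by itself does not say why; stalls on memory latency, spilling, or poor scheduling (as mupuf lists) all show up the same way here.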
13:45 mupuf: karolherbst: would be good for you to find out which shader is responsible for the biggest perf drop and optimize it :)
13:45 mupuf: either you can check the exec time of each draw call
13:45 karolherbst: right
13:45 karolherbst: now that I have patched SSO support into shader-db
13:45 mupuf: and then map that to the shaders used based on the program id
13:45 karolherbst: that should be quite possible
13:46 mupuf: shader-db?
13:46 mupuf: what you need is a way to replace shaders :D
13:46 karolherbst: yeah, shader-db doesn't support sso
13:46 karolherbst: :D
13:46 mupuf: separate shader objects
13:46 mupuf: either replace shaders or use gl to monitor the length of each drawcall
13:47 mupuf: length == exec time
13:47 karolherbst: apitrace can do that
13:47 mupuf: yep, make sure you are not cpu-limited
13:47 karolherbst: I just create a trace and turn on gpu profiling on retrace
13:47 karolherbst: that's my smallest concern
13:47 mupuf: yop, and you can dump perf counters too
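mupuf's suggestion (time each draw call, then map the cost back to shaders via the program id) can be sketched as a simple aggregation. The `(call_id, program_id, gpu_ns)` tuple layout is a hypothetical format; converting apitrace's GPU-profiling output into it is not shown:

```python
from collections import defaultdict


def rank_programs(draw_calls):
    """Sum GPU time per program; return (program, total_ns, calls), costliest first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _call_id, program_id, gpu_ns in draw_calls:
        totals[program_id] += gpu_ns
        counts[program_id] += 1
    return sorted(((p, totals[p], counts[p]) for p in totals),
                  key=lambda row: row[1], reverse=True)


# Hypothetical per-draw timings: (call id, program id, GPU time in ns)
calls = [(101, 7, 1200), (102, 3, 300), (110, 7, 900), (115, 12, 2500)]
ranking = rank_programs(calls)
```

Summing per program rather than looking at the single slowest draw matters because a moderately expensive shader used in many draws can cost more in total than one heavy draw. The numbers are only meaningful if the retrace is not CPU- or IO-bound, as mupuf points out.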
13:47 karolherbst: anyway, nouveau does busy waiting on fences
13:47 karolherbst: so CPU usage is often pretty high
13:48 mupuf: sure, but apitrace can be the bottleneck
13:48 karolherbst: not with those traces
13:48 karolherbst: *games
13:48 mupuf: lucky you then
13:48 karolherbst: k. then I will trace SR3 because perf sucks
13:48 mupuf: trtt and I will be working again on apitrace and perf counters this summer
13:49 karolherbst: :)
13:49 karolherbst: yay, I have a 6.6GB sr3 trace already
13:50 karolherbst: mupuf: and if I am CPU limited, I can simply downclock my GPU
13:50 mupuf: hmm, make sure IO is not a bottleneck too :p
13:50 karolherbst: mupuf: 16gb ram
13:50 karolherbst: file cache
13:50 karolherbst: usually my cache is around 11GB
13:51 karolherbst: 70% cpu usage, sounds fine
13:52 karolherbst: mupuf: "pixels drawn profiling" does this makes sense?
13:52 karolherbst: pixels drawn per draw call or something
13:54 karolherbst: mupuf: what is odd, is that while I retrace it, I get hardly above 24W
13:54 karolherbst: and I idle at 19W on 0f
14:00 karolherbst: heh
14:00 karolherbst: gpu core load is around 50%
14:01 karolherbst: memory load below 5%
14:01 karolherbst: it's like the CPU and the GPU are both super bored
14:03 karolherbst: mhh
14:03 karolherbst: maybe if the IPC drops below 100% the pmu counter also drops below it?
14:10 mupuf: karolherbst: force the CPU speed
14:11 karolherbst: mupuf: what do you mean by that?
14:11 mupuf: force the cpu frequency
14:11 karolherbst: you mean besides intel_pstate performance governor?
14:11 mupuf: yes
14:11 karolherbst: so I should disable sleep state
14:12 mupuf: nah, just force the frequency to 100%
14:12 karolherbst: yeah, right, and how should I do that otherwise? Because I don't drop below 2.4GHz
14:12 mupuf: ok, then it is not the issue
14:13 karolherbst: just the effective freqs drop due to sleeping
14:13 karolherbst: "Avg_MHz" in the "good" turbostat tool
14:13 mupuf: /sys/devices/system/cpu/intel_pstate <-- that is where you can force frequencies
14:13 karolherbst: ..
14:14 karolherbst: https://gist.github.com/karolherbst/61fe2174ce54009d7d7ee41ba4b270c6
14:14 karolherbst: mupuf: well, those things don't do anything for me
14:14 karolherbst: maybe due to the performance governor
14:14 mupuf: ack
14:14 karolherbst: performance governor means max perf, always
14:14 karolherbst: :D
14:14 mupuf: no, performance also downclocks
14:15 karolherbst: no
14:15 mupuf: but performance is sometimes slower than ondemand, go figure
14:15 karolherbst: only between base and max boost
14:15 mupuf: anyway, this is not your issue
14:15 karolherbst: right
14:16 karolherbst: mupuf: I can simply get the shaders from qapitrace with the call id and write my own shader_test file right?
14:16 mupuf: good luck..
14:17 mupuf: you may try juha-pekka's c-file writer for apitrace
14:18 karolherbst: why? well I get the shaders
14:18 karolherbst: or do I need something else?
14:18 mupuf: because running shaders without data is just ... wrong?
14:19 mupuf: https://github.com/juhapekka/apitrace-1/tree/c_source_code_writer
14:20 karolherbst: yeah, but what kind of information does this give me? My plan was to look at the most expensive draw call and see what the shader looks like
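Karol's plan of copying the GLSL shown by qapitrace for an expensive call into a shader-db test can be sketched as below. The section names follow the piglit-style `.shader_test` layout that shader-db reads (`[require]`, `[vertex shader]`, `[fragment shader]`); the helper name and the default `GLSL >= 1.30` requirement line are assumptions to adjust per shader. As mupuf notes, this captures only the code, not the data, so it is useful for reading the compiled output rather than for timing:

```python
def make_shader_test(vertex_src, fragment_src, glsl_version="1.30"):
    """Build the text of a shader-db style .shader_test file (hypothetical helper)."""
    parts = [
        "[require]",
        f"GLSL >= {glsl_version}",
        "",
        "[vertex shader]",
        vertex_src.rstrip("\n"),
        "",
        "[fragment shader]",
        fragment_src.rstrip("\n"),
        "",
    ]
    return "\n".join(parts)


text = make_shader_test("void main() { gl_Position = vec4(0.0); }\n",
                        "void main() { gl_FragColor = vec4(1.0); }\n")
```

The returned string can then be written to a file such as `sr3_call_N.shader_test` (name illustrative) and run through shader-db like any other test.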
14:26 karolherbst: ohhh wait
14:26 karolherbst: I saw this once
14:27 karolherbst: yep
14:27 karolherbst: something is odd with the sched opcode
14:28 karolherbst: hakzsam: any reasons why inst_issued2 should give me a value not 0, when I return false in canDualIssue?
14:29 karolherbst: mupuf: dual issuing more than the hardware can results in a _big_ perf penalty
14:30 karolherbst: and with big I mean like 90% perf drops
14:30 mupuf: this makes absolutey no sense
14:30 mupuf: but I cannot work with you now
14:30 mupuf: still at work
14:31 karolherbst: yeah
14:31 karolherbst: this has to be it
14:31 karolherbst: the situation: I disabled canDualIssue by returning false, always.
14:31 karolherbst: inst_issued is usually around 60M in the trace
14:32 karolherbst: but in the frames where the perf is really really bad, inst_issued2 is unusually high
14:32 karolherbst: "1k"
14:32 karolherbst: although I explicitly disabled dual issuing in the mesa code
14:33 karolherbst: maybe there is a max value for the opcode and by overflowing we dual issue?
14:44 karolherbst: fun, I set all sched codes to 0x20 and get dual issuing although we thought 0x04 is the dual issue code
14:46 karolherbst: I am wondering now.... could we use the inst_issued1 and inst_issued2 perf counters to get the right sched codes?
14:50 karolherbst: hakzsam: do you think there could be an overflow somewhere when reading the inst_issued values?
14:57 karolherbst: hakzsam: especially because the per frame values are multiples of 10
15:02 karolherbst: mhhh
15:03 karolherbst: when a vertex shader produces OUT 0-6, but the fragment shader only has IN 0-5, OUT6 could be dropped, right?
15:03 imirkin: check the semantic... the actual OUT/IN index doesn't matter
15:04 karolherbst: ohh okay
15:04 karolherbst: one OUT is POSITION, the others are GENERICs, do you mean this for example?
15:04 imirkin: yes.
15:04 imirkin: and it'll be GENERIC[1]
15:04 imirkin: etc
15:04 karolherbst: yeah
15:04 imirkin: and GENERIC[1] matches up to GENERIC[1]
15:04 imirkin: irrespective of the IN/OUT index itself
15:04 karolherbst: all GENERICs are used in the fragment shader
15:05 karolherbst: is POSITION a special case which is always used?
15:05 imirkin: it's a semantic
15:05 imirkin: just like generic is a semantic
15:05 imirkin: not sure what you mean
15:05 imirkin: you can check the tgsi docs for what these things mean
15:05 karolherbst: okay
15:06 karolherbst: well
15:06 karolherbst: I was thinking about something else
15:06 karolherbst: what if in a fragment shader, the opts lead to only two of those IN being used
15:06 karolherbst: that would mean we could drop the OUTs in the vertex, right?
15:06 imirkin: we don't do any linking optimizations
15:06 imirkin: but yes, it does.
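The linking idea can be sketched as matching by semantic, never by register index: an FS `IN[0]` declared as `GENERIC[1]` pairs with whichever VS `OUT` is also `GENERIC[1]`. Outputs whose semantic no FS input reads are candidates for dropping. `POSITION` is always kept because it feeds the fixed-function rasterizer, not the fragment shader. The dict layout and function name are hypothetical; as imirkin says, nouveau does no linking optimizations, and since gallium binds shader stages separately such a check could only run at draw time:

```python
def unused_outputs(vs_outputs, fs_inputs, always_keep=("POSITION",)):
    """Return VS output registers whose (semantic, index) no FS input reads.

    vs_outputs / fs_inputs map register index -> (semantic name, semantic index).
    Matching is by semantic only; the OUT/IN register numbers are irrelevant.
    """
    read = set(fs_inputs.values())
    return sorted(
        reg for reg, sem in vs_outputs.items()
        if sem not in read and sem[0] not in always_keep
    )


vs = {0: ("POSITION", 0), 1: ("GENERIC", 0), 2: ("GENERIC", 1), 3: ("GENERIC", 5)}
fs = {0: ("GENERIC", 1), 1: ("GENERIC", 0)}   # IN[0] is GENERIC[1], matching OUT[2]
dead = unused_outputs(vs, fs)                 # only OUT[3] (GENERIC[5]) is unread
```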
15:06 karolherbst: mhh okay
15:06 karolherbst: in the nv50 ir, how are those INs accessed?
15:07 imirkin: in a vertex shader, vfetch. in a fragment shader, interp
15:07 karolherbst: and then a[0xac] maps to o[0xac]?
15:07 imirkin: something like that, yes
15:08 karolherbst: mhh okay
15:08 karolherbst: then I can at least check if it might make a difference
15:08 imirkin: the gallium interface doesn't lend itself nicely to such optimizations
15:09 imirkin: since all the shader stages are completely separate
15:09 imirkin: until draw time
15:09 imirkin: and can be rebound in various ways with diff shaders
15:09 karolherbst: well
15:09 imirkin: effectively all gallium shaders are sso
15:09 karolherbst: if this could effectively cut it in half, then it has to be fixed
15:09 imirkin: could? sure. dunno that that would happen very often though.
15:10 karolherbst: well, that's what I am checking now
15:13 karolherbst: mhhh
15:13 karolherbst: in the one shader "export b128 # o[0x70] $r20q" -> "export b32 # o[0x7c] $r23"
15:13 karolherbst: this would be possible
15:14 karolherbst: 3 out of 28 b32 values unsued
15:14 karolherbst: *unused
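The arithmetic behind narrowing that export: a `b128` export at `o[0x70]` writes four consecutive 32-bit slots, while a `b32` export at `o[0x7c]` writes only the last of them, so three slots would no longer be written (presumably the "3 out of 28" counted above). A small sketch, assuming 4-byte output slots as the offsets in the dump suggest:

```python
def slots(offset, bits):
    """Byte offsets of the 32-bit output slots covered by one export."""
    return [offset + 4 * i for i in range(bits // 32)]


wide = slots(0x70, 128)     # export b128 # o[0x70]
narrow = slots(0x7c, 32)    # export b32  # o[0x7c]
dropped = [hex(s) for s in wide if s not in narrow]
```

As imirkin points out next, in this shader `0x70` is the position output, so this particular narrowing is probably not safe.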
15:15 imirkin: that's position
15:15 imirkin: chances are things would get unhappy if that were not emitted
15:15 imirkin: but not sure
15:15 karolherbst: mhh
15:16 karolherbst: still an interesting thing to do, maybe just complicated to implement due to gallium
15:16 karolherbst: but, the game engines also use SSO
15:16 karolherbst: and I could think, that devs could just not care enough to optimize that on their end
15:17 karolherbst: especially if nvidia already does it
15:21 karolherbst: ehm
15:21 karolherbst: floor ftz f32 $r2 abs $r0; sub ftz f32 $r6 abs $r0 $r2
15:22 karolherbst: isn't there some native instruction for this?
15:23 imirkin: FRC? no.
15:23 karolherbst: mhh but this shader is especially interesting here
15:23 karolherbst: late
15:23 karolherbst: set ftz u8 $p0 ge f32 $r0 neg $r0
15:24 karolherbst: not $p0 neg ftz f32 $r6 $r6
15:24 karolherbst: *later
15:29 karolherbst: this is
15:29 karolherbst: mhh
15:30 karolherbst: like neg(frc($r0))
15:33 karolherbst: yeah
15:34 karolherbst: this can be shortened to this:
15:34 karolherbst: floor ftz f32 $r2 $r0
15:34 karolherbst: add ftz f32 $r6 neg $r0 $r2
15:45 karolherbst: mhh okay, doesn't work for negative ones yet
15:51 karolherbst: am I stupid or is there no easy way to get the negated fractional part
15:55 karolherbst: ohhhh I am stupid
15:55 karolherbst: floor(-4.2) is -5
17:06 mlankhorst: indeed! :D
17:23 karolherbst: but there has to be a way to do fract in less than 4 instructions
17:24 karolherbst: or at least in 4 instructions without branching
17:25 imirkin_: should be 2 ops iirc
17:26 karolherbst: imirkin_: for positive ones, yes
17:26 imirkin_: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp#n3096
17:27 karolherbst: imirkin_: -5.4 - math.floor(-5.4) == 0.6
17:27 karolherbst: at least in python
17:27 imirkin_: right.
17:27 karolherbst: and I doubt that on the gpu floor(-5.4) is -5
17:28 imirkin_: http://docs.gl/sl4/fract
17:28 imirkin_: hopefully that answers your question? :)
17:28 karolherbst: :O
17:28 karolherbst: how insane
17:28 karolherbst: because the shader wants the real thing
17:29 karolherbst: imirkin_: the generated code does something like this: 5.4 => -0.4 and -5.4 => 0.4
17:31 karolherbst: and currently nouveau generates floor/sub/set/ predicated neg
17:32 imirkin_: ok
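Read literally, the four-instruction sequence quoted above (floor of |x|, subtract, `set ge`, predicated `neg`) computes sign(x) · (|x| − floor(|x|)), which equals x − trunc(x): 0.4 for 5.4 and −0.4 for −5.4. The values quoted in the chat are the negation of that, presumably applied by a later use. Either way it is a truncation-based fractional part. GLSL's `fract(x)` is x − floor(x), always in [0, 1), which is why the two-op floor rewrite breaks for negative inputs (floor(−4.2) is −5). The sketch models each variant in Python; the `trunc` variant assumes a single truncating conversion instruction is available, which the chat does not confirm:

```python
import math


def frac_signed_4op(x):
    """The generated sequence: floor |x|, subtract, set ge, predicated neg."""
    r2 = math.floor(abs(x))   # floor ftz f32 $r2 abs $r0
    r6 = abs(x) - r2          # sub ftz f32 $r6 abs $r0 $r2
    p0 = x >= -x              # set ... ge $r0 neg $r0  (true when x >= 0)
    return r6 if p0 else -r6  # not $p0 neg $r6


def neg_fract_2op(x):
    """The first shortened attempt: floor(x) - x, i.e. -fract(x) in GLSL terms."""
    return math.floor(x) - x


def frac_signed_trunc(x):
    """Same value as the 4-op sequence, assuming trunc is a single op."""
    return x - math.trunc(x)
```

For positive inputs `neg_fract_2op` is just the negation of the 4-op result, but for −5.4 it gives −0.6 instead of the expected 0.4.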
17:35 karolherbst: \o/
17:35 karolherbst: that set and that predicated neg could be merged into a slct
17:45 imirkin_: karolherbst: it may be beneficial to have a pass that looks for if () x = foo; else x = bar;
17:45 imirkin_: it's a pretty standard kind of thing to look for.
17:45 imirkin_: known as a "sel peephole" i think.
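A minimal sketch of the "sel peephole" pattern on a hypothetical nested-tuple IR (not nv50_ir): `if (c) x = a; else x = b;` becomes `x = sel(c, a, b)`. Following imirkin's advice, it works on structured code before SSA, so no basic blocks or phi nodes have to be rewritten afterwards. Because a select evaluates both sides unconditionally, a real pass should only fire when both sides are cheap and free of side effects; the toy assumes all expressions are pure:

```python
def sel_peephole(stmt):
    """Rewrite ("if", c, [x = a], [x = b]) into ("assign", x, ("sel", c, a, b)).

    Toy IR (hypothetical): ("if", cond, then_list, else_list) and
    ("assign", var, expr). Anything that does not match is returned unchanged.
    """
    if stmt[0] != "if":
        return stmt
    _, cond, then_body, else_body = stmt
    if len(then_body) == 1 and len(else_body) == 1:
        t, e = then_body[0], else_body[0]
        if t[0] == e[0] == "assign" and t[1] == e[1]:
            return ("assign", t[1], ("sel", cond, t[2], e[2]))
    return stmt


# The fract case: keep f when x >= 0, otherwise negate it.
stmt = ("if", ("ge", "x", 0.0),
        [("assign", "r6", "f")],
        [("assign", "r6", ("neg", "f"))])
rewritten = sel_peephole(stmt)
```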
17:46 karolherbst: ahh
17:46 karolherbst: yeah, currently thinking exactly about that
17:46 imirkin_: i would recommend doing that pre-ssa tbh
17:46 karolherbst: I already removed the floor/sub thing in my example
17:46 imirkin_: since messing with BB's after SSA is done is next to impossible
17:46 karolherbst: ohh, I think I got it now
17:47 karolherbst: messing with BB's isn't that hard actually
17:47 imirkin_: no, it is.
17:47 imirkin_: trust me.
17:47 karolherbst: well at least I can remove the edges without breaking stuff now
17:47 imirkin_: no.
17:47 imirkin_: you can't.
17:47 imirkin_: you just think you can.
17:47 karolherbst: sure I can
17:47 karolherbst: mhh
17:48 karolherbst: doesn't node->cut removes the edges too?
17:48 karolherbst: *remove
17:48 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n375
17:48 karolherbst: uhh right, phi nodes
17:49 karolherbst: well maybe we should clean that CFG mess up a bit and store instruction <=> edge pointers
17:49 karolherbst: this would make things so much easier in the end
17:50 imirkin_: llvm stores the incident BB as part of the PHI node argument
17:50 imirkin_: which i think is the right approach
17:51 imirkin_: https://trello.com/c/GGfLYwbD/76-attach-bb-sources-to-phi-nodes-directly-instead-of-relying-on-inbound-edge-order
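The difference between the two phi representations can be shown with toy structures (`Phi` and `positional_phi_value` are hypothetical, not nv50_ir classes). With positional operands, operand i silently belongs to the i-th inbound edge, so reordering edges corrupts every phi; when each operand carries its incoming block, as LLVM's `PHINode` does, the lookup stays correct and an operand can be dropped together with its edge:

```python
from dataclasses import dataclass


@dataclass
class Phi:
    """A phi whose operands carry their incoming block (LLVM-style)."""
    dst: str
    incoming: dict  # predecessor block name -> value

    def value_from(self, pred):
        return self.incoming[pred]

    def remove_pred(self, pred):
        # Deleting a CFG edge only needs to drop the matching operand.
        self.incoming.pop(pred, None)


def positional_phi_value(phi_srcs, preds, pred):
    """Edge-order lookup: operand i belongs to the i-th inbound edge."""
    return phi_srcs[preds.index(pred)]


preds = ["then", "else"]
srcs = ["a", "b"]
phi = Phi("x", {"then": "a", "else": "b"})

preds.reverse()                                   # some CFG edit reorders inbound edges
stale = positional_phi_value(srcs, preds, "then")  # wrong operand now
robust = phi.value_from("then")                    # still correct
phi.remove_pred("else")
```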
17:51 karolherbst: yeah, but the same problem also exists for branch instructions
17:51 imirkin_: heh
17:51 karolherbst: you have a bra, but only implicit edge ordering
17:51 imirkin_: that's a different issue
17:52 karolherbst: looks like same issue, just different side
17:52 imirkin_: the phi nodes thing makes edge manipulation a non-starter
17:52 imirkin_: i think the bra thing is more easily fixable by removing the implicit stuff
17:52 karolherbst: yeah, right, but it is the same issue: depending on edge ordering
17:53 imirkin_: right... just with a much simpler fix
17:57 karolherbst: imirkin_: for the example above: https://gist.github.com/karolherbst/be50b52ad6e1df0fb47f8f2f861befca
17:59 imirkin_: karolherbst: right.
18:00 hakzsam: karolherbst, what's the issue with inst_issued2 ?
18:02 karolherbst: hakzsam: it isn't 0 when I think it should be
18:03 karolherbst: hakzsam: I disabled dualIssuing in the emiter and forced for every instruction the same sched value
18:03 hakzsam: and it's not 0?
18:03 karolherbst: hakzsam: but then inst_issued2 suddenly was not 0 anymore (depending on inst_issued?)
18:03 karolherbst: hakzsam: well it is, but not the entire trace
18:04 hakzsam: I think this perf counter returns the correct result though
18:04 karolherbst: tell me what I should check and I will check it :)
18:06 hakzsam: well, checking all perf counters is really not trivial, but if you want to do it you have to write a minimal GL test and use GL_AMD_performance_monitor into it to monitor perf counters
18:07 hakzsam: and to write a minimal shader of course :)
18:08 karolherbst: "minimal"?
18:08 hakzsam: this is a general solution though
18:08 karolherbst: I think the issue only triggers when you have like 10M+ inst per frame
18:09 hakzsam: yeah, but if you want to make sure that inst_executed should return N instructions for a shader, you can write it, monitor inst_executed and see what's happening :)
18:09 hakzsam: karolherbst, if the "issue" only happens with a big application, that's really hard to figure out if it's a real problem or not
18:10 karolherbst: well
18:10 karolherbst: inst_issued2 shouldn't be non-zero when nothing dual issues
18:10 karolherbst: hakzsam: but the value doesn't depend on the metric-inst_issued :/
18:11 hakzsam: are you sure that nothing should be dual issued in that specific case?
18:11 karolherbst: yeah
18:12 karolherbst: pretty sure
18:12 karolherbst: maybe something slips through, but I really doubt that
18:12 hakzsam: I mean, maybe the hardware is lying to you, and dual issues even if you don't want it to (for some instructions)
18:12 hakzsam: I don't know if this can happen though
18:12 hakzsam: imirkin_, is that possible?
18:13 karolherbst: I will show you and you will immediately think "wtf..."
18:13 imirkin_: i don't know anything about dual-issue
18:13 imirkin_: or what those counters are counting
18:13 karolherbst: especially the correlation with the fps
18:14 karolherbst: https://i.imgur.com/sTGzez7.png
18:14 hakzsam: I don't know much about dual-issue too, that's why I'm asking and guessing that maybe the hw does something :)
18:14 karolherbst: hakzsam: well I return false in "TargetNVC0::canDualIssue" for everything
18:14 hakzsam: okay, 110 vs 83M
18:14 karolherbst: yeah
18:14 karolherbst: exactly
18:15 karolherbst: so maybe something slips through
18:15 karolherbst: and maybe this causes this big perf issue?
18:15 karolherbst: I know that dual issuing too much can cut off like 90% of perf
18:15 imirkin_: probably the blit shaders
18:15 imirkin_: they're hardcoded
18:15 karolherbst: imirkin_: like in binary hardcoded?
18:16 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c#n801
18:16 hakzsam: imirkin_, that's a theory yes
18:16 karolherbst: because if those instructions went through SchedDataCalculator::setDelay then they would also be affected by canDualIssue
18:16 karolherbst: uhhh
18:16 hakzsam: they won't reach that point
18:16 karolherbst: well
18:16 karolherbst: 00 sched means: we don't know
18:17 karolherbst: so they shouldn't get dual issued either
18:17 karolherbst: but maybe we could add sched information to those?
18:17 hakzsam: 00 is unknown right?
18:17 karolherbst: no idea if that makes sense for performance reasons
18:17 karolherbst: yes
18:17 karolherbst: 0x04 is dual issue
18:17 hakzsam: so, how can you be sure that nothing is dual issued? :)
18:18 karolherbst: because 0x00 is "default"
18:18 karolherbst: hakzsam: https://envytools.readthedocs.io/en/latest/hw/graph/fermi/cuda/isa.html?highlight=dual#notes-about-scheduling-data-and-dual-issue-on-gk104
18:18 karolherbst: 00: no scheduling info, suspend warp for 32 cycles
18:18 hakzsam: oh okay, so we know 00
18:18 hakzsam: I see
18:19 hakzsam: makes more sense
18:19 karolherbst: hakzsam: yep, cause kepler can't schedule
18:19 karolherbst: there is no hw scheduler anymore
18:20 hakzsam: karolherbst, can you reproduce the issue with something else? or this really only happens with big traces and ton of instructions ?
18:20 karolherbst: also note the "weak" correlation between inst_issued2 and metric-inst_issued
18:20 karolherbst: hakzsam: let me check
18:21 hakzsam: it's not obvious :)
18:21 karolherbst: hakzsam: pixmark_piano: 2.88G inst_issued
18:21 karolherbst: 0 inst_issued2
18:21 karolherbst: when in doubt, now you know what to check :D
18:22 karolherbst: now those 80M look like nothing really
18:22 hakzsam: sure, but I don't have any gpus right now (not at home)
18:22 hakzsam: it works with pixmark_piano and not with your trace?
18:22 hakzsam: what's that game btw?
18:23 karolherbst: saints_row 3
18:23 karolherbst: which runs at like 15% nvidia performance ;)
18:23 hakzsam: sadness
18:23 karolherbst: right
18:23 karolherbst: maybe this is one of the issues
18:24 karolherbst: mhh
18:24 hakzsam: karolherbst, did you remove the sched instructions from codegen/lib btw?
18:24 hakzsam: I don't know if they are used though
18:24 karolherbst: hakzsam: yeah
18:24 karolherbst: I can even force everything to 0x3f
18:24 hakzsam: karolherbst, probably not the only perf issue to be honest
18:24 hakzsam: but that's a good start
18:24 karolherbst: right
18:24 karolherbst: let me try something
18:24 karolherbst: I will dual issue everything
18:25 karolherbst: if the perf stays the same...
18:25 hakzsam: okay
18:25 hakzsam: but seriously 110 vs 83M is nothing
18:25 hakzsam: you can call that "noise" or whatever you want :)
18:25 karolherbst: well
18:26 karolherbst: 0 vs 3G sounds better :D
18:26 hakzsam: right
18:26 hakzsam: so, the perf counter is most likely correct?
18:26 hakzsam: I mean it seems to work in different cases
18:26 karolherbst: maybe
18:26 hakzsam: and doesn't with saints row 3
18:26 karolherbst: I didn't check
18:27 karolherbst: yeah
18:27 karolherbst: well
18:27 karolherbst: the only thing we know for sure: when I return false in canDualIssue or force a specific sched code in SchedDataCalculator::setDelay
18:27 hakzsam: it would be good to check with a different chip
18:27 karolherbst: inst_issued2 shows not 0 in sr3
18:27 hakzsam: fermi or kepler2
18:27 karolherbst: lol
18:27 karolherbst: I dual issue everything
18:27 karolherbst: same perf
18:28 karolherbst: ....
18:28 karolherbst: seriously
18:28 hakzsam: karolherbst, okay, I'll try to remember that thing and test when I will be back home
18:28 karolherbst: hakzsam: nope, I think your counter is good
18:28 karolherbst: something is odd elsewhere
18:28 hakzsam: I'll trust you then :)
18:29 karolherbst: the perf should be shit with dual issue everything
18:32 hakzsam: we definitely need those graphics perf counters
18:32 hakzsam: sorry for the delay :/
18:32 karolherbst: hakzsam: in pixmark_piano: 16 -> 2 fps
18:32 karolherbst: when I dual issue _everything_
18:32 hakzsam: big performance drop
18:33 hakzsam: but that makes sense
18:34 karolherbst: 14 fps when I dual issue nothing
18:34 karolherbst: checking with heaven
18:34 hakzsam: Tom^, do you still have tour gk110?
18:34 hakzsam: *your
18:34 Tom^: yes sir.
18:35 hakzsam: time to test something?
18:35 Tom^: sure
18:36 hakzsam: do you have mesa master? or a mesa not too old like two weeks ago ?
18:36 Tom^: built it yesterday iirc.
18:36 Tom^: or on saturday
18:36 hakzsam: nice
18:36 hakzsam: I just want to make sure that enabling compute support by default on GK110 won't break the universe because it's done at initialization time (ie. context creation)
18:37 Tom^: ok
18:37 hakzsam: I already asked you few months ago to test something like that but it didn't work correctly
18:37 Tom^: indeed
18:37 hakzsam: so, you just need to run some application with export NVF0_COMPUTE=1
18:37 hakzsam: like heaven
18:37 karolherbst: hakzsam: well in heaven performance drops from 14 to 7 fps
18:37 hakzsam: or whatever you want, but not glxgears :)
18:37 karolherbst: I guess it highly depends on the IPC value how much it hurts
18:37 hakzsam: karolherbst, half!
18:38 karolherbst: yeah well half is boring :D
18:38 hakzsam: yeah, probably :)
18:38 karolherbst: anyway, test with sr3 trace again
18:38 hakzsam: Tom^, I remember that heaven did not work correctly with NVF0_COMPUTE=1 before, it hung your gpu IIRC
18:38 hakzsam: Tom^, now, it should just work like a charm
18:38 Tom^: works like a charm.
18:39 Tom^: as long as it actually got activated i guess
18:39 Tom^: no way to confirm its on? :p
18:39 hakzsam: sure
18:39 hakzsam: glxinfo| grep "core"
18:40 hakzsam: it should return 4.3 :)
18:41 hakzsam: err, 4.2
18:41 karolherbst: hakzsam: well either the nouveau code is soooo bad that it doesn't matter whether we dual issue completely wrong or not, or we indeed have a sched issue
18:42 Tom^: hakzsam: http://i.imgur.com/PhSIQag.png
18:42 karolherbst: our normal sched code vs sched 0x04 everything has the same perf
18:42 hakzsam: karolherbst, maybe we have a sched issue
18:42 hakzsam: Tom^, and without NVF0_COMPUTE I guess it returns GL 4.1?
18:42 Tom^: not really no
18:43 hakzsam: oh right, I'm stupid :)
18:43 hakzsam: 4.2 in any case
18:43 karolherbst: hakzsam: anyway, inst_issued2 should return 0 if we don't dual issue
18:43 karolherbst: hakzsam: one way or another we should fix that
18:43 hakzsam: Tom^, but it should expose GL_ARB_compute_shader with NVF0_COMPUTE=1?
18:43 hakzsam: karolherbst, right
18:44 Tom^: hakzsam: seems so indeed.
18:45 Tom^: now i need to benchmark things to see if anything happens :P
18:45 karolherbst: imirkin_, hakzsam: is there anything which might get invoked 10*n times per frame? or some_constant*n times?
18:45 hakzsam: Tom^, cool, thanks for testing
18:45 karolherbst: hakzsam: funny enough with GALLIUM_HUD_PERIOD=0 those values are always 10*n
18:45 imirkin_: karolherbst: oh, heh. those perf counter compute shaders :)
18:46 karolherbst: :D
18:46 hakzsam: Tom^, compute support will be enabled by default soon on your chip
18:46 karolherbst: imirkin_: so you say, when I disable some, the value should... drop?
18:46 hakzsam: karolherbst, yeah, we use compute shaders to read out perf counters
18:46 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c#n112
18:46 Tom^: hakzsam: cool
18:46 karolherbst: :D
18:46 imirkin_: which does appear to do dual-issue
18:47 hakzsam: oh right
18:47 hakzsam: how did I forget that? :)
18:47 karolherbst: imirkin_: but why does it only happen in sr3?
18:47 hakzsam: karolherbst, so, you know what to do
18:47 karolherbst: no, that doesn't make sense though
18:47 hakzsam: try to disable them anyway
18:48 karolherbst: yeah
18:48 hakzsam: imirkin_, well, compute support seems good on gk110
18:50 karolherbst: imirkin_: nope, that wasn't it
18:50 hakzsam: still 110?
18:51 karolherbst: hakzsam: same values
18:51 hakzsam: mmh
18:51 karolherbst: hakzsam: did you look at the graph? it isn't like it is a constant value ;)
18:51 karolherbst: and that's why it was odd from the beginning that those shaders are counted
18:51 karolherbst: cause
18:51 karolherbst: they should be counted for every frame then too
18:52 karolherbst: and the value shouldn't be 0 at any time
18:52 hakzsam: didn't look at the details ;)
18:52 karolherbst: :D
18:52 hakzsam: != 0 I would say
18:52 karolherbst: peak is 890 ;)
18:53 hakzsam: maybe you could try to dump generated code and check for sched codes?
18:53 hakzsam: that's a pain but heh
18:53 Tom^: hakzsam: with it on http://i.imgur.com/2UgD0Ow.png so yea it seems fine as long as unigine-heaven makes any use of it i guess
18:53 karolherbst: hakzsam: right
18:53 karolherbst: hakzsam: this is my very last resort
18:53 hakzsam: Tom^, cool
18:53 karolherbst: hakzsam: I could check the emitter though
18:53 karolherbst: hakzsam: somewhere is a sched => byte translation
18:54 hakzsam: yeah, but make sure that no hardcoded codes (like the blitter one) are used
18:57 karolherbst: mhh
18:57 Tom^: so just GL_ARB_robust_buffer_access_behavior and GL_ARB_shader_image_size and then i have gl 4.3 wohoo :)
18:58 karolherbst: hakzsam: well there is a NVC0_DEBUG_SCHED_DATA define :D
19:01 hakzsam: Tom^, ARB_shader_image_size should be already exposed
19:02 Tom^: hakzsam: oh i was just checking https://mesamatrix.net/
19:02 hakzsam: karolherbst, and no NV50_PROG_DUMP=filename :)
19:02 hakzsam: karolherbst, misreading
19:02 karolherbst: grep "sched 04" :)
19:03 hakzsam: karolherbst, but having an envvar which dumps generated code could help at some point
19:03 karolherbst: mhhh
19:03 karolherbst: I don't get anything though
19:03 karolherbst: maybe it is indeed some binary code
19:04 hakzsam: Tom^, yeah, only this robustness thing needs to be done, but it's sort of bullshit ;)
19:05 karolherbst: mhh src/gallium/drivers/nouveau/codegen/lib/gk104.asm
19:05 karolherbst: builtins
19:05 hakzsam: I asked you earlier if you removed the sched codes from the codegen lib
19:06 karolherbst: ohh, then I misunderstood you
19:06 karolherbst: sorry
19:06 Tom^: hakzsam: oh i also have a gk110b but i guess it's pretty much the same card
19:06 Tom^: hakzsam: as a gk110, that is
19:06 karolherbst: would be funny if there is something fishy inside those builtins :D
19:07 hakzsam: karolherbst, there is probably
19:08 karolherbst: couldn't we write those builtins inside TGSI and compile it normally?
19:09 karolherbst: hakzsam: and how do I compile those asm files now?
19:09 hakzsam: karolherbst, go to codegen/lib and make
19:10 karolherbst: okay
19:10 karolherbst: yep, looks good
19:10 karolherbst: now compiling
19:10 hakzsam: the idea of those builtins is to be precompiled
19:11 hakzsam: using TGSI will be useless
19:12 karolherbst: ahh okay
19:12 karolherbst: well at least we shouldn't need to put sched opcodes there
19:13 karolherbst: mhh
19:13 hakzsam: we need to, but maybe some sched codes are not totally correct
19:13 hakzsam: karolherbst, so, 0 now?
19:13 karolherbst: yeah
19:13 hakzsam: nice
19:13 karolherbst: ohh wait
19:13 karolherbst: no, I replaced the 0x04 with 0 :D
19:14 hakzsam: what's 0x04?
19:14 hakzsam: dual issue?
19:14 karolherbst: yeah
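The manual experiment here (find the 0x04 "dual issue" sched bytes in the builtin library and blank them to 0) can be scripted as a small text filter. This is a sketch under assumptions: it assumes the builtin source (src/gallium/drivers/nouveau/codegen/lib/gk104.asm) spells scheduling info as a line starting with `sched` followed by whitespace-separated hex bytes, and `stripDualIssue` is a hypothetical helper, not part of Mesa.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Assumed sched line format: "sched 0x28 0x04 0x28 0x04 0x28 0x28 0x28".
constexpr unsigned long kDualIssue = 0x04;  // the value identified above as "dual issue"

// Replaces every 0x04 sched byte in `line` with 0x00 and returns how many were
// replaced. Lines not starting with "sched" are left untouched. Note that a
// rewritten line is re-joined with single spaces.
std::size_t stripDualIssue(std::string &line) {
    std::istringstream in(line);
    std::string keyword;
    if (!(in >> keyword) || keyword != "sched")
        return 0;

    std::ostringstream out;
    out << keyword;
    std::size_t replaced = 0;
    for (std::string tok; in >> tok;) {
        // Base 16 accepts both "0x04" and "04".
        if (std::stoul(tok, nullptr, 16) == kDualIssue) {
            tok = "0x00";
            ++replaced;
        }
        out << ' ' << tok;
    }
    line = out.str();
    return replaced;
}
```

Summing the return value over the whole file gives the count that `grep "sched 04"` approximates, while also producing the blanked version.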
19:14 karolherbst: \o/
19:15 karolherbst: performance is still pretty bad
19:15 hakzsam: cool
19:15 karolherbst: but most of those builtins use 0x00 as sched
19:15 hakzsam: saints row 3 now returns 0 for inst_issued2?
19:15 karolherbst: yeah
19:16 hakzsam: awesome
19:17 karolherbst: well, I'll "optimize" those builtins a bit then :D
19:18 hakzsam: okay
19:19 karolherbst: who needs this anyway :)
19:20 hakzsam: all shaders which use OP_CALL ;)
19:22 karolherbst: I hope I didn't mess up too much now
19:23 karolherbst: well at least it should be easy to find out which builtin is getting used
19:24 hakzsam: I think so
19:24 karolherbst: yep, I just dual issue one function and see if inst_issued2 increases :D
19:24 hakzsam: but if the function is not called...
19:25 hakzsam: you won't see anything useful :)
19:26 karolherbst: well
19:26 karolherbst: I just wiped out all the functions inside the asm file
19:26 karolherbst: and no visual change
19:27 hakzsam: what?
19:28 karolherbst: yeah, seems like only a minor thing calls those builtins
19:29 hakzsam: makes sense
19:29 karolherbst: and with this, I am where I was at the beginning :D
19:30 hakzsam: but now, you have inst_issued2 at 0 with Saints Row 3
19:30 karolherbst: right
19:30 karolherbst: which doesn't really help me at all
19:30 hakzsam: right
19:30 hakzsam: this just confirms that inst_issued2 is correct
19:31 karolherbst: hakzsam: the thing is, the GPU is bored (around 50% engine load, <10% memory load) and the CPU is bored (~60%)
19:31 karolherbst: and something causes the performance to be insanely bad
19:32 hakzsam: mmh, that's pretty bad yeah
19:33 karolherbst: maybe we stall the gpu like crazy
19:33 karolherbst: but then we should have some more memory load
19:34 karolherbst: ohhh wait
19:34 karolherbst: this is the CPU
19:35 karolherbst: while the really bad frames are drawn the GPU core load drops below 10% but cpu is at 100%
19:37 karolherbst: hakzsam: maybe in the end it is something as stupid as the compiler compiling each frame
19:37 karolherbst: which means....
19:37 hakzsam: cpu bound you mean?
19:37 karolherbst: how can I disable all opts?
19:37 hakzsam: NV50_PROG_OPTIMIZE=0 ?
19:37 karolherbst: yeah
19:38 hakzsam: but TF2, which seems to be CPU bound, has good performance with nouveau
19:38 karolherbst: yay
19:38 karolherbst: more perf
19:39 hakzsam: yup
19:39 karolherbst: well a bit more perf
19:39 karolherbst: not so much that I would call it playable though
19:39 hakzsam: without OPTIMIZE?
19:39 hakzsam: with OPTIMIZE=0 I mean?
19:39 karolherbst: yeah
19:39 karolherbst: not much, but usually 1 or 2 fps more
19:39 karolherbst: anyway
19:40 karolherbst: I also did a --pcpu run
19:40 hakzsam: not a big issue I would say
19:40 karolherbst: :D
19:40 karolherbst: https://gist.github.com/karolherbst/ab029778c105dd05445e721fb907625c
19:40 karolherbst: guess what
19:41 hakzsam: no idea what those numbers are
19:41 hakzsam: are you using apitrace?
19:42 karolherbst: yeah
19:42 karolherbst: # call no gpu_start gpu_dura cpu_start cpu_dura vsize_start vsize_dura rss_start rss_dura pixels program name
19:42 karolherbst: anyway
19:42 karolherbst: column 6: cpu time
19:43 karolherbst: those are the most CPU expensive calls in order
19:43 hakzsam: I see
19:43 karolherbst: there is a glTexStorage2D at position 214
19:43 karolherbst: or a glclear at 229
19:43 karolherbst: all those bs stuff
19:44 hakzsam: maybe you could replay the trace with perf?
19:44 hakzsam: and see if you find some cpu bottlenecks
19:44 karolherbst: at 1500+ there are also some glReadPixels calls
19:44 karolherbst: at 1900+ some "useful" stuff comes, but yeah
19:45 karolherbst: the issue is that those glLinkProgram calls are done like every single frame
19:45 karolherbst: and glCompileShader
19:45 karolherbst: and a lot of those
19:46 karolherbst: maybe an in memory cache might help already?
19:46 hakzsam: no clue
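The in-memory cache idea amounts to memoizing compiles by shader source, so a program that re-submits identical sources every frame only pays for the first compile. A minimal sketch, assuming the app really does resubmit byte-identical sources; `CompiledShader`, `ShaderCache` and the compile callback are hypothetical placeholders, not Mesa's shader cache.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled shader; not a Mesa type.
struct CompiledShader {
    std::string binary;
};

// Memoizes compiles keyed by the full source text. Keying on the text itself
// (rather than only a hash of it) rules out collisions at the cost of memory.
class ShaderCache {
public:
    explicit ShaderCache(std::function<CompiledShader(const std::string &)> compile)
        : compile_(std::move(compile)) {}

    const CompiledShader &get(const std::string &source) {
        auto it = cache_.find(source);
        if (it != cache_.end())
            return it->second;  // hit: skip the expensive compile
        ++misses_;
        // References into an unordered_map stay valid across rehashes.
        return cache_.emplace(source, compile_(source)).first->second;
    }

    std::size_t misses() const { return misses_; }

private:
    std::function<CompiledShader(const std::string &)> compile_;
    std::unordered_map<std::string, CompiledShader> cache_;
    std::size_t misses_ = 0;
};
```

Whether this helps depends on where the time goes: it removes repeated compiles, but not any per-link cost that is independent of the sources.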
19:49 karolherbst: mhh maybe we could async those compile calls and join on upload time or something like that
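"Async the compile and join on upload" maps naturally onto a future: start compiling as soon as the source is known, and block only when the binary is first needed. A minimal sketch using `std::async`; `compileShader` and `PendingShader` are hypothetical placeholders, and on older glibc toolchains this needs `-pthread` to link.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled shader; not a Mesa type.
struct CompiledShader {
    std::string binary;
};

// Placeholder for the expensive backend compile (an assumption, not a real API).
CompiledShader compileShader(const std::string &source) {
    return CompiledShader{"bin:" + source};
}

// Starts compiling on another thread at construction time and only waits
// when the result is first requested (e.g. at upload/draw time).
class PendingShader {
public:
    explicit PendingShader(std::string source)
        : result_(std::async(std::launch::async, compileShader, std::move(source))) {}

    // Joins on first use; later calls return the stored result.
    const CompiledShader &get() {
        if (!ready_) {
            value_ = result_.get();
            ready_ = true;
        }
        return value_;
    }

private:
    std::future<CompiledShader> result_;
    CompiledShader value_;
    bool ready_ = false;
};
```

This only hides latency if there is other CPU work between the compile call and first use; if the draw follows immediately, the join still stalls.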
19:52 karolherbst: let's compile with O3 before I trace with perf
19:53 RSpliet: karolherbst: seems that potential stability patch thing runs fine, let's find out tomorrow morning whether my 780Ti crashed (angel)
19:53 karolherbst: RSpliet: yeah, it seemed okay on mine too
19:57 karolherbst: hakzsam: funny, it seems better after I compiled with O3 :D but still awesomely bad
19:58 hakzsam: O3 is definitely better than O0, especially when it's cpu bound :)
19:59 karolherbst: yeah but it isn't significantly better
19:59 karolherbst: well it's better and that's what matters
19:59 hakzsam: yeah, this won't increase performance a lot
19:59 karolherbst: but now the top 500 changed a lot
20:00 karolherbst: 9: call 93884 0 0 797993074 23063609 0 0 0 0 0 0 glClear
20:00 hakzsam: hehe
20:00 karolherbst: hakzsam: 374 glClears among the top 500
20:00 karolherbst: 508 among the top 5000
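The ranking above can be reproduced from the profile output using the header pasted earlier (`# call no gpu_start gpu_dura cpu_start cpu_dura ... name`): counting `call` as column 1, `cpu_dura` is column 6 and the call name is the last field. A sketch assuming each data line starts with the literal `call`, as in the pasted glClear line; the `9:` rank prefix in that paste looks like it came from a separate sort step and is not handled here. All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct ProfiledCall {
    unsigned long long cpuDuration;  // "cpu_dura" column
    std::string name;                // last column, e.g. "glClear"
};

// Parses "call <no> <gpu_start> <gpu_dura> <cpu_start> <cpu_dura> ... <name>";
// header lines (starting with '#') and other lines are skipped.
std::vector<ProfiledCall> parseProfile(const std::vector<std::string> &lines) {
    std::vector<ProfiledCall> calls;
    for (const std::string &line : lines) {
        std::istringstream in(line);
        std::vector<std::string> fields;
        for (std::string f; in >> f;)
            fields.push_back(f);
        if (fields.size() < 7 || fields[0] != "call")
            continue;
        // fields[5] is column 6 (cpu_dura) when "call" is column 1.
        calls.push_back({std::stoull(fields[5]), fields.back()});
    }
    return calls;
}

// Counts how often each call name appears among the topN most CPU-expensive calls.
std::map<std::string, std::size_t> countTopCalls(std::vector<ProfiledCall> calls,
                                                 std::size_t topN) {
    std::sort(calls.begin(), calls.end(),
              [](const ProfiledCall &a, const ProfiledCall &b) {
                  return a.cpuDuration > b.cpuDuration;
              });
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i < calls.size() && i < topN; ++i)
        ++counts[calls[i].name];
    return counts;
}
```

`countTopCalls(calls, 500)["glClear"]` is the kind of number quoted above.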
20:01 karolherbst: what does glClear do anyway? :D
20:01 karolherbst: well anyway, it is so full of glCompileShader, glClear and glLinkProgram that it actually hurts
20:01 hakzsam: it clears :)
20:01 karolherbst: okay, now perf
20:03 karolherbst: perf -g, but what should I pass too?
20:03 hakzsam: usually I use 'perf record'
20:03 hakzsam: and then 'perf report'
20:03 hakzsam: but there are ton of options
20:03 karolherbst: right
20:04 karolherbst: I think -g is enough
20:04 urmet: what cards are you perf-testing?
20:04 hakzsam: gk106 I guess
20:04 karolherbst: yeah
20:05 hakzsam: piglit is soooo long
20:05 urmet: aw. i have gm.. :(
20:05 hakzsam: and I need two runs :/
20:05 karolherbst: hakzsam: meh, well at least on my GPU I don't need -1 :)
20:06 hakzsam: I always use -1
20:06 karolherbst: yeah, that's your issue :p
20:07 hakzsam: each time I tried to run concurrent tests my gpu hung miserably :)
20:07 karolherbst: try RSpliets patch :D
20:08 hakzsam: which ones?
20:09 karolherbst: https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/12c76f4c4f4dfc3af5fecae26715b30cbe8505fd
20:09 karolherbst: RSpliet: do you think it might help with concurrent piglit?
20:11 hakzsam: first run is almost done :)
20:12 karolherbst: 49.07%-- snappy::RawUncompress :/
20:14 karolherbst: maybe I should just start the game and run that under perf
20:16 RSpliet: karolherbst: depends on the symptoms
20:16 RSpliet: if it's "mouse moves, hangs otherwise, CTX_TIMEOUT in logs" then I hope so
20:16 RSpliet: (or... something like CTX timeout, don't remember the exact string)
20:18 hakzsam: maybe I should try concurrent piglit then
20:23 hakzsam: karolherbst, piglit seems to be much slower on gm107 than gk208 for the same number of tests
20:31 hakzsam: second run now
20:33 imirkin_: hakzsam: for fewer tests... maxwell doesn't have tess
20:34 hakzsam: imirkin_, yeah, whatever it should take 30 minutes or so
20:38 hakzsam: imirkin_, btw, what's the issue with tess on maxwell?
20:38 imirkin_: i got lazy
20:39 imirkin_: fetching inputs and outputs doesn't work (and storing outputs is unlikely to work as well on TCS)
20:40 hakzsam: are there some piglit tests which fail?
20:40 imirkin_: anything that uses tess
20:40 imirkin_: start with nop.shader_test
20:40 imirkin_: and move up from there
20:41 hakzsam: okay, that's a good start
20:41 hakzsam: (it's on my todolist just after images as you already know)
20:41 imirkin_: yea
20:42 imirkin_: it's just something that'll take a day of concentration from me, and i haven't had the motivation to do it
20:42 hakzsam: only one day? :)
20:42 imirkin_: in large part because nvidia kinda gave open-source a big "fuck you" with GM20x, which leads me to be less inclined to investigate.
20:42 hakzsam: yeah, I see
20:43 imirkin_: i obviously won't nack code that implements it... just... i have better things to do in the time i put towards nouveau
20:44 imirkin_: and it wasn't trivially easy to do :)
20:44 hakzsam: sure, I guess it's not easy yeah
20:44 imirkin_: fermi and kepler were pretty similar... i only discovered differences pretty late in the tess development cycle
20:44 imirkin_: (to do with indirect accesses)
20:44 hakzsam: oh okay
20:44 imirkin_: while maxwell accesses these from a totally different place
20:45 imirkin_: i think it's semi-similar to how GS works, but i dunno if it's the same (or if GS was even done correctly)
20:45 hakzsam: I don't know how GS works on maxwell, but I'll have a look when I work on tess
20:46 hakzsam: anyway, if we get images before tess, we will be able to bump from 3.3 to 4.2 in one shot :)
20:46 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp#n168
20:46 hakzsam: this is for what? GS?
20:47 imirkin_: yeah
20:47 imirkin_: for fetching vertex attribs in GS
20:47 imirkin_: since you get it primitive-at-a-time
20:47 hakzsam: okay
20:47 imirkin_: vfetch takes an extra arg
20:47 imirkin_: which is the "lane" to fetch from
20:47 imirkin_: or... something
20:47 imirkin_: and this is the result of pfetch
20:48 imirkin_: however it's different for tess
20:48 hakzsam: well, tracing the blob with mmt will help for sure :)
20:48 imirkin_: yeah, there are traces already
20:48 imirkin_: https://people.freedesktop.org/~imirkin/traces/gm107/
20:48 imirkin_: sanity = sanity.shader_test, quads = quads.shader_test
20:49 hakzsam: cool
20:49 hakzsam: imirkin_, btw, are you going to have some time to look into the frag/comp issue for images on fermi?
20:50 hakzsam: I'll probably have another look later, but I'm a bit lazy right now
20:50 hakzsam: especially because I have tried a ton of different things
20:50 hakzsam: without any success
20:50 hakzsam: except the patch I pasted you earlier today :)
20:51 imirkin_: hakzsam: not today, most likely
20:51 imirkin_: hakzsam: maybe tomorrow? not sure.
20:51 hakzsam: np
21:15 karolherbst: hakzsam: somehow I get the feeling that perf is completely broken on my system
21:15 karolherbst: because perf report doesn't respect any parameters at all
21:35 karolherbst: hakzsam: mhh 20% inside the kernel
21:36 karolherbst: and another 20% inside libdrm_nouveau
21:53 karolherbst: okay, as it seems running under real conditions changes quite a lot
22:18 karolherbst: imirkin_: where should be a pass put for: set ge $r1 neg $r1 => set ge $r1 0? Algebraic?
22:18 imirkin_: i think so yea
22:19 karolherbst: maybe this is enough to trigger other passes
22:26 karolherbst: imirkin_: what are those LTU, NEU cond codes? U=unsigned?
22:26 imirkin_: if only
22:27 karolherbst: ?
22:27 imirkin_: iirc they're the stupid unordered things from floating comparisons
22:27 karolherbst: ohh
22:27 karolherbst: unordered
22:27 karolherbst: the heck
22:27 imirkin_: i.e. foo > nan. do you want true or false.
22:27 karolherbst: oh
22:28 imirkin_: ge = false. geu = true.
22:28 karolherbst: ahh okay
22:28 imirkin_: or something along those lines. mwk will know for sure.
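What imirkin_ describes matches the IEEE notion of ordered vs. unordered comparisons: the ordered form is false whenever a NaN is involved, while the "U" form is also true in that case. A tiny illustration with hypothetical helper names; it shows the semantics only and makes no claim about the exact hardware encoding.

```cpp
#include <cmath>

// Ordered ">=": false whenever either operand is NaN.
bool compareGE(float a, float b) { return a >= b; }

// Unordered ">=" ("GEU"): true if a >= b OR the operands are unordered (a NaN is involved).
bool compareGEU(float a, float b) { return a >= b || std::isunordered(a, b); }
```

So for `foo >= nan`, `compareGE` gives false and `compareGEU` gives true; for ordinary numbers the two agree.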
22:28 karolherbst: is there a short code for cc == CC_LT || cc == CC_LE || cc == CC_LTU ...
22:28 imirkin_: what are you trying to do?
22:28 imirkin_: there's reverseCondCode()
22:28 imirkin_: and another related helper
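Checks like `cc == CC_LT || cc == CC_LE || cc == CC_LTU ...` stay short if a condition code is treated as a bitmask of accepted outcomes. The sketch assumes the layout bit 0 = less, bit 1 = equal, bit 2 = greater, bit 3 = unordered, which appears to be how nv50_ir's CondCode values are arranged; the enum and helpers below are a hypothetical re-creation, so confirm the values against nv50_ir.h. Under that layout, swapping the operands (presumably what reverseCondCode() is for) exchanges the less and greater bits, and logical negation flips all four bits.

```cpp
#include <cstdint>

// Hypothetical condition-code encoding: each bit marks an outcome the
// comparison accepts (layout assumed, see above).
enum CondCode : std::uint8_t {
    CC_LT = 1, CC_EQ = 2, CC_LE = 3, CC_GT = 4, CC_NE = 5, CC_GE = 6,
    CC_U = 8,  // "unordered" bit: also accept when a NaN is involved
    CC_LTU = 9, CC_EQU = 10, CC_LEU = 11, CC_GTU = 12, CC_NEU = 13, CC_GEU = 14,
};

constexpr std::uint8_t kLess = 1, kEqual = 2, kGreater = 4, kUnordered = 8;

// True for LT, LE, LTU, LEU: accepts "less" but not "greater".
bool isLessFamily(std::uint8_t cc) {
    return (cc & kLess) && !(cc & kGreater);
}

// Condition to use after swapping the operands (a < b  <=>  b > a):
// exchange the "less" and "greater" bits, keep "equal" and "unordered".
std::uint8_t swapOperands(std::uint8_t cc) {
    std::uint8_t out = static_cast<std::uint8_t>(cc & (kEqual | kUnordered));
    if (cc & kLess) out |= kGreater;
    if (cc & kGreater) out |= kLess;
    return out;
}

// Logical negation: !(a < b)  <=>  (a >= b) or unordered.
std::uint8_t negate(std::uint8_t cc) {
    return static_cast<std::uint8_t>(cc ^ (kLess | kEqual | kGreater | kUnordered));
}
```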
22:29 karolherbst: well if you compare a number with its own negation, you are effectively only testing its sign, so a comparison against 0 is enough
22:29 karolherbst: which makes it a bit easier to deal with that instruction
22:30 karolherbst: like i >= -i <==> i >= 0
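The fold `set ge $r1 neg $r1` to `set ge $r1 0` relies on a >= -a being equivalent to a >= 0. A brute-force check is cheap. This sketch assumes wrapping two's-complement negation for integers and uses 8 bits so every value can be enumerated. For integers there is exactly one counterexample, the most negative value, because its negation wraps back to itself (INT_MIN for 32 bits). For floats the equivalence holds on the edge cases tried here, including NaN, signed zeros and infinities.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Wrapping negation on 8 bits (stand-in for 32-bit GPU integer negation).
// The cast back to int8_t wraps modulo 2^8: guaranteed since C++20, and what
// mainstream compilers do before that.
std::int8_t wrapNeg(std::int8_t a) {
    return static_cast<std::int8_t>(-static_cast<int>(a));
}

// Every a for which (a >= -a) differs from (a >= 0).
std::vector<int> geCounterexamples() {
    std::vector<int> bad;
    for (int v = -128; v <= 127; ++v) {
        std::int8_t a = static_cast<std::int8_t>(v);
        if ((a >= wrapNeg(a)) != (a >= 0))
            bad.push_back(v);
    }
    return bad;
}

// Same check for floats on edge-case values (NaN, signed zeros, infinities).
bool floatGeFoldHolds() {
    const float samples[] = {0.0f, -0.0f, 1.5f, -1.5f,
                             std::numeric_limits<float>::infinity(),
                             -std::numeric_limits<float>::infinity(),
                             std::numeric_limits<float>::quiet_NaN(),
                             std::numeric_limits<float>::denorm_min()};
    for (float a : samples)
        if ((a >= -a) != (a >= 0.0f))
            return false;
    return true;
}
```

An integer version of the pass therefore either has to accept the INT_MIN difference or restrict itself to float compares.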
22:38 karolherbst: imirkin_: and with that we can merge set cc $r0 $r63 + predicated mod insn => slct
22:38 karolherbst: or + anything really
22:39 karolherbst: and then we have no predicate set anymore (and most likely one instruction less)
22:45 karolherbst: or is this already part of the sel peephole and I should just create a new pass and deal with that there completely?
22:45 imirkin_: dunno
22:49 karolherbst: mhh okay, maybe this way: the target of a sel peephole is to optimize simple conditional code into slcts. This can be anything set-related plus a simple dependent instruction (like mov, abs, neg...)
22:50 karolherbst: in short anything like "x = condition ? foo : bar"
22:51 karolherbst: yeah, maybe this makes sense
22:51 karolherbst: then we can iterate over all BBs and get the last instruction
22:52 karolherbst: and if the condition and the result of that condition are simple enough, we can just turn the result into a slct
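The pattern discussed here, a compare against zero whose predicate only guards two moves into the same register, collapsing into one select, can be prototyped on a toy IR before touching nv50_ir. Everything below is hypothetical: the three opcodes, the `Insn` layout and the pass are invented for illustration. SLCT is modelled, as an assumption, as `dst = (src2 >= 0) ? src0 : src1`, which is why reducing a compare to one against 0 matters. The pass matches only one instruction order, bails out if the predicate has later users, and ignores the phi-node question entirely.

```cpp
#include <cstddef>
#include <vector>

// Toy IR for illustration only; this is NOT nv50_ir.
enum class Op { SetGE0, Mov, Slct };

struct Insn {
    Op op;
    int dst;                       // register (predicate index for SetGE0)
    int src0 = -1, src1 = -1, src2 = -1;
    int pred = -1;                 // guarding predicate, -1 if unpredicated
    bool predNot = false;          // true: execute when the predicate is false
    Insn(Op o, int d) : op(o), dst(d) {}
};

// p = (a >= 0), i.e. "set ge p a $r63" with $r63 read as zero.
Insn setGE0(int p, int a) { Insn i(Op::SetGE0, p); i.src0 = a; return i; }

// dst = src, executed only when predicate p is true (or false if ifNot).
Insn mov(int dst, int src, int p, bool ifNot) {
    Insn i(Op::Mov, dst);
    i.src0 = src; i.pred = p; i.predNot = ifNot;
    return i;
}

struct Machine {
    std::vector<int> regs;
    std::vector<bool> preds;
};

// Reference interpreter used to check that the rewrite preserves results.
void run(const std::vector<Insn> &code, Machine &m) {
    for (const Insn &i : code) {
        if (i.pred >= 0 && m.preds[i.pred] == i.predNot)
            continue;  // guard not satisfied
        switch (i.op) {
        case Op::SetGE0: m.preds[i.dst] = m.regs[i.src0] >= 0; break;
        case Op::Mov:    m.regs[i.dst] = m.regs[i.src0]; break;
        case Op::Slct:
            m.regs[i.dst] = m.regs[i.src2] >= 0 ? m.regs[i.src0] : m.regs[i.src1];
            break;
        }
    }
}

// Does any instruction from `from` onwards use predicate p as its guard?
bool predUsedFrom(const std::vector<Insn> &code, std::size_t from, int p) {
    for (std::size_t k = from; k < code.size(); ++k)
        if (code[k].pred == p)
            return true;
    return false;
}

// Folds "set p, a; mov d, x (if p); mov d, y (if !p)" into "slct d, x, y, a".
// The write to p disappears, so p must have no later users.
std::vector<Insn> selPeephole(const std::vector<Insn> &code) {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (i + 2 < code.size()) {
            const Insn &set = code[i], &t = code[i + 1], &f = code[i + 2];
            bool match = set.op == Op::SetGE0 && set.pred < 0 &&
                         t.op == Op::Mov && f.op == Op::Mov && t.dst == f.dst &&
                         t.pred == set.dst && !t.predNot &&
                         f.pred == set.dst && f.predNot &&
                         !predUsedFrom(code, i + 3, set.dst);
            if (match) {
                Insn sel(Op::Slct, t.dst);
                sel.src0 = t.src0;    // value when a >= 0
                sel.src1 = f.src0;    // value otherwise
                sel.src2 = set.src0;  // operand compared against zero
                out.push_back(sel);
                i += 2;               // skip the two folded movs
                continue;
            }
        }
        out.push_back(code[i]);
    }
    return out;
}
```

Running both versions through `run` with the same inputs is a quick way to confirm the rewrite before porting the idea to real passes.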
22:52 karolherbst: mhhh
22:52 karolherbst: and then we could also have phi instructions depending on that
22:52 karolherbst: yeah.. maybe pre SSA is really the easiest way to do that