00:01orbea: Besides asking in here with an trace what is the best way to determine if a graphical issue in a game is an nouveau issue or an issue of the emulator / game itself? Example, http://ks392457.kimsufi.com/orbea/stuff/trace/Persona3_PCSX2.trace.xz The windows on top strobe / flash as the camera angel is changed, so far it seems restricted to that area of the game.
00:01karolherbst: imirkin_: wuuuut
00:02karolherbst: imirkin_: just for giggles, I used current->cut() and attach on while iterating the edges
00:02karolherbst: and it worked
00:03karolherbst: no clue what happend with the unconnected BB though
00:03karolherbst: just vanished I think?
00:07imirkin_: karolherbst: that's a huge program
00:07imirkin_: you need to make something like ... 3 lines
00:07imirkin_: can't tell what's going on here
00:07karolherbst: well in my GUI differ it looks quite nice :)
00:08imirkin_: orbea: i get the same on i965
00:08karolherbst: imirkin_: I try to have up a small TGSI
00:08orbea: cool, so its probably pcsx2 or the game then
00:09imirkin_: or a mesa-wide thing
00:09orbea: yea, that too
00:09imirkin_: 1876: message: major shader compiler issue 171: 0:406(9): warning: `t' used uninitialized
00:09imirkin_: i get a bunch of those
00:10imirkin_: probably related
00:10orbea: yea I saw them too when I replay it
00:10imirkin_: #define PS_DEPTH_FMT 0
00:11imirkin_: but setting it is only defined for PS_DEPTH_FMT 1/2/3
00:11imirkin_: oh, but it's probably ok
00:12imirkin_: yeah, that sample_depth function is never called
00:13karolherbst: imirkin_: realod, is that better now?
00:13orbea: I'm thinking it might be worth reporting on the pcsx2 github page
00:14imirkin_: orbea: go for it
00:14karolherbst: imirkin_: funny enough in RA new BBs are added because of those Edges :D
00:15karolherbst: I am pretty sure this is completly messed up
00:15imirkin_: so we don't stick an explicit branch at the end there
00:15imirkin_: i dunno how that even works, but that's how it's always been =/
00:17karolherbst: well as long as nothing breaks
00:17imirkin_: perhpas it's encoded somewhere clever
00:17imirkin_: that i forgot about
00:17karolherbst: the pre-RA think looks good though
00:19karolherbst: imirkin_: and how do I know which edge is taken when " 29: not %p154 bra BB:3 (0)" evaluates to false?
00:19karolherbst: being the last instruction in the BB
00:20imirkin_: that's what i'm saying.
00:20imirkin_: that branch is missing.
00:20imirkin_: it follows the CFG
00:20imirkin_: but how does it know which edge? dunno.
00:21karolherbst: it doesn't matter I think anyway
00:22karolherbst: because I detect that this bra is allowed to be rebinded to something else
00:22karolherbst: and it will be the only bra in that BB
00:22karolherbst: so I guess if I have only "tree" edges pointing to the same Node, I am fine to remove the predicate
00:32karolherbst: but maybe the issue is just that multiple edges declare the same connection
00:40karolherbst: okay, having only one edge setups RA
01:04karolherbst: imirkin_: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp:2863: bool nv50_ir::FlatteningPass::tryPredicateConditional(nv50_ir::BasicBlock*): Assertion `pred' failed.
01:04karolherbst: seems like it is unhappy
01:05imirkin: seems like.
01:06karolherbst: had to teach this pass some manners, now it deals with that
01:07karolherbst: yeah lol, now the last DCE is skipped :/
01:08karolherbst: 206->189 instructions :)
01:10karolherbst: imirkin: any suggestion when to run this pass?
01:13imirkin: after DCE, and stick a second DCE after if if it does something
01:13karolherbst: well we have two DCEs already
01:13imirkin: (you coudl even invoke the DCE pass directly from it if you wanted)
01:13imirkin: DCE is cheap in most cases
01:14imirkin: [i.e. when it does nothing]
01:14karolherbst: so just before the last DCE
01:14imirkin: just after.
01:14imirkin: and then add another one after it
01:14karolherbst: instruction count doesn't make the pass more expensive
01:14karolherbst: bb count is what makes
01:21karolherbst: now I remove BBs even if it is something like BB:0 -> BB:1 1: exit
01:43karolherbst: my gpu crashed
02:11karolherbst: meh flattening pass destroys all the benefits now :/
02:12karolherbst: leaving stupid join/joinat/join constructs
02:13karolherbst: one shader preRA: 326 -> 279, but postRA: 208->233
02:13karolherbst: but for bioshock_infinite over all: total gprs used in shared programs : 18673 -> 18612 (-0.33%)
02:13karolherbst: and no hurt in gpr count
02:14karolherbst: just instruction count is messed up :/
02:14karolherbst: will take a look tomorrow
02:14karolherbst: ohh right
02:14karolherbst: because those BBs doesn't exist anymore
02:17karolherbst: now only joins are left
02:26karolherbst: total instructions in shared programs : 177159 -> 173804 (-1.89%)
10:09karolherbst: removing exit instructions sounds bad
10:09karolherbst: ohh I removed a -> BB:3 (cross)
10:10karolherbst: because BB:3 is empty besides a simple edge to BB:1 and BB:1 contains the exit
11:37karolherbst: how can I disassemble the output of nouveau_compiler?
11:55karolherbst: huh, envydis doesn't parse the scheds...
11:56karolherbst: prebrk 0x178 -> prebrk 0xffffffffffffffa8
11:56karolherbst: yeah, that looks bad
12:51mupuf: I tested nouveau's backlight support on my nvd9
12:51mupuf: well, while the firmware does set the enable bit, nouveau does not
12:52mupuf: and you know what? It does not matter
12:52mupuf: but it is not a clock selector
12:52mupuf: mwk: ^^
13:26mwk: mupuf: hm?
13:26mupuf: I am talking about the SOR PWM registers
13:26mupuf: you were the one adding "initial documentation" for them
13:26mwk: I'm afraid I don't remember this
13:27mupuf: it would seem that this clock selector bit is actually an enable bit (bit 30)
13:27mwk: as in, adding the documentation
13:27mupuf: but ... on SOR, it does nothing
13:27mwk: are you certain of it?
13:27mupuf: on SOR it really disables the clock
13:27mwk: a clock selector bit with one input tied to DC could easily look like an enable
13:27mupuf: well, on SOR, I set the frequency to ~2Hz
13:28mupuf: that would result in a constant output then
13:28mupuf: which is not the case, I set the div so as to get a PWM freq of 2Hz
13:28mupuf: and changed the bit
13:28mupuf: and no changes in frequency
13:29mupuf: on SOR, I get no output at all
13:31mwk: hold on
13:31mwk: so, on SOR, the bit doesn't do much?
13:31mwk: doesn't change the frequency?
13:31mwk: and on SOR, it acts as a disable?
13:31mwk: or as enable?
13:32mwk: have you been able to determine the clock source frequency for either?
13:33mwk: and that's true for SOR regardless of bit 30?
13:33mupuf: pretty easy, nvidia set the output clock to 100 Hz for the logo and 200 Hz for the backlight
13:33mwk: what video mode do you have active on the display?
13:33mupuf: why does it matter?
13:34mwk: because my bet is that one of the settings is a pixel clock
13:34mupuf: that would be insane
13:34mwk: and the other is some other clock
13:34mupuf: ok, I was using KMS
13:34mwk: likely crystal
13:34mupuf: yeah, it would be crystal
13:34mwk: and the SOR is actually in use, right?
13:35mwk: could you try changin the resolution?
13:35mwk: or pixel clock directly, for that matter?
13:37mupuf: well, ok, I will try to get my hand on the laptop again and test :D
13:38mupuf: but the pixel clock has 0 chances of being close to 27Mhz
13:38mupuf: so, I would expect to see a sharp difference
13:39mupuf: you know, going from 2 Hz to ... a big value since the pixel clock is likely in the GHz range
13:39mupuf: which is ... 500 times more than 27 MHz
13:39mwk: pixel clocks are in ~100MHz range
13:40mwk: one of the VGA modes (I forget which) has exactly 27MHz pixel clock, for that matter
13:40mwk: this is why they used the frequency for crystals in the first place
13:40mupuf: for VGA, I can understand because it is parrallel, but what about eDP?
13:40mupuf: in any case, I will try
13:40mwk: oh, eDP
13:41mwk: that's.. a different beast
13:41mupuf: well, it is an internal panel
13:41mwk: I thought we were talking LVDS
13:41mupuf: oh, could be that too, right
13:41mwk: then the pixel clock is unrelated to the resolution
13:42mupuf: in any case, SOR's PWM has always been used for the backlight and nothing else
13:42mwk: that's true
13:42mwk: but someone could have the idea of feeding it with pixel clock
13:42mupuf: so, I would not expect nvidia to do anything complicated for this ... but hey, they did it for the fans!
13:42mupuf: they finally came to their senses in ... maxwell
13:42mwk: I'd say check if that laptop uses eDP, and if so ditch it and try on some other one
13:43mupuf: no, pretty sure it is LVDS
17:42karolherbst: mupuf: we have somebody with fan problems on fermi, fans are always at max, if you want to take a look, I've uploaded the nvc1 vbios
17:58karolherbst: yay, finally I got a BB removed without changing the generated binary at all :)
18:00mupuf: karolherbst: yeah, I have one nvc1 with the same issue...
18:00karolherbst: mupuf: nvc1/gouchi
18:01karolherbst: ohh wait
18:01karolherbst: big announcement tomorrow for nvidia?
18:01Tom^: so it seems
18:01Tom^: probably new cards
18:01Tom^: in line with battlefield announcement
18:02karolherbst: ohh no
18:02karolherbst: that's like in the night
18:02karolherbst: +9 hours
18:03karolherbst: stupid nvidia :D
18:25karolherbst: "total instructions in shared programs : 45395 -> 45382 (-0.03%)" in valley :D
18:26karolherbst: but less branching too!
18:32karolherbst: imirkin: seems like using node.attach and node.cut it is rather safe to remove/add edges, because in the entire shader-db run I get no crash
18:34karolherbst: now I have to deal with the flattening pass being a bit stupid now :/
18:36karolherbst: yay, and now bioshock also runs :)
18:43hakzsam_: yeah probably pascal tomorrow :)
18:43karolherbst: I kind of hope they use 512bit GDDR5 for the high end GPUs :D
18:43karolherbst: but maybe the 1080 gets hbm2
18:44karolherbst: and the 1070 512bit gddr5
18:44karolherbst: or maybe both hbm2
18:44hakzsam_: we will see
18:44karolherbst: gddr5 will be good for nouveau
18:44karolherbst: but f16 will be fun already
18:45karolherbst: a lot of codegen changes :D
18:45karolherbst: maybe there will be fmul with two immediates possible or something crazy like that
18:46karolherbst: two 16 bit values
18:46karolherbst: or are the instructions 32bit encoded? :/
18:46karolherbst: ohh wait
18:46karolherbst: we don't know yet
18:47hakzsam_: we don't know, and you will have to wait more :)
18:47karolherbst: argh, stupid flattening pass :/
18:48karolherbst: https://gist.github.com/karolherbst/8347eeb56dd8e6fd55029e757ce0ec06 :/
18:49karolherbst: this is all so super usefull
18:50karolherbst: my pass does the right thing, but flattening can't deal with it
18:51karolherbst: but it is trivial in theory
18:53karolherbst: ohh wait
18:53karolherbst: RA is being stupid
18:54hakzsam_: something is wrong
18:54karolherbst: hakzsam_: why?
18:54karolherbst: hakzsam_: because both edges point to the same BB?
18:55hakzsam_: oh no sorry I was looking at the wrong paste
18:55karolherbst: I've udpated it
18:55karolherbst: but now you see what my Pass is doing
18:55karolherbst: and what RA does out of it
18:56karolherbst: Pass is called EmptyBranchElim, guess what it does ;)
18:56hakzsam_: yeah, the name is correct ;)
18:56karolherbst: flattening removed all those useless joinats and joins
18:56karolherbst: but after I run my pass it doesn't anymore
18:56karolherbst: and I am left with all this crap
18:57karolherbst: and I think it is RAs fault
18:57mgoodwin: off topic but does anyone know where to link up DFP-1,2,3 info with HDMI-0,DVI-D-0,DP-0 information
18:57karolherbst: BB:3 looks messed up already
18:57mgoodwin: Every source I find either lists the display as one or the other
18:58karolherbst: hakzsam_: maybe I can teach flattening to unmess it, :/ I think I've done it already, but had other issues and reverted it
18:59hakzsam_: maybe yeah
18:59karolherbst: getting rid of the joinats was easy
19:00karolherbst: and the remaining issue is eomthing like this:
19:00karolherbst: BB:88 -> BB:69 (forward) 270: join BB:69 (8)
19:00karolherbst: next BB
19:00karolherbst: the join is pretty pointless here
19:01karolherbst: BB:69 -> BB:70 (forward) 272: join BB:70 (8)
19:01karolherbst: and so on
19:02karolherbst: hakzsam_: do you know how join works?
19:02karolherbst: because I have no clue
19:03hakzsam_: sorry, I have to go, will be back later in ~1h
19:04karolherbst: joinat sets a join point and join joins the splitted execution
20:42imirkin_: mwk: can you remind me what i need to do to operate a nv2x graph class on a nv34? do i just use it, or is there more to it?
20:44karolherbst: imirkin_: mind taking a look at this: https://gist.github.com/karolherbst/8347eeb56dd8e6fd55029e757ce0ec06 and tell me why the flattening pass can't optimize those joins away?
20:45karolherbst: the pass seems to work (at least I could play bioshock allthough the pass touches most of the shaders)
20:45imirkin_: karolherbst: i dunno offhand, sorry
20:45imirkin_: would have to see what all your pass does
20:47mwk: imirkin_: I think you just use it...
20:47mwk: but then I never tested that
20:48mwk: you might need to do some extra context setup
20:49imirkin_: i guess we'll find out
20:49imirkin_: i want to test nouveau_vieux on a NV34
22:33karolherbst1: imirkin: here is the pass: https://github.com/karolherbst/mesa/commit/7e3e74262aa8244ba318a5759d6d7c3a1a5ec736
22:33karolherbst1: imirkin: I am sure my messing in flattening messed it up :/
22:34karolherbst: ahh shit, I removed some local changes, but it shouldn't matter that much
22:34karolherbst: just an improved targetOfBasicBlock
22:37karolherbst: uhh I removed also important stuff
22:38karolherbst: ahh no, it should be fine
22:46karolherbst: imirkin: do you think it may be a problem when I change the tree edges from for example BB:5 and BB:6 both to BB:8 and this messes the flattening pass or RA up a bit?
22:46imirkin_: i think you have too many BB's
22:46imirkin_: you need to make a SIMPLE example
22:46imirkin_: and then make sure you get that one *right*
22:47imirkin_: not "working", but actually right
22:47imirkin_: where you can easily reason about what is going on and making sure that the proper transformations are being applied
22:47imirkin_: then you can move on to a more complex example, repeat
22:48karolherbst: simple examples already work, it just gets messy when there are many join joinat joins
22:48karolherbst: besides that everything else looks fine
22:51karolherbst: mhh maybe it would help if I would just reduce it to one Edge of each kind and then the issue resolves itself. Having two Tree edges pointing to the same BB sounds wrong no matter how you look at it
22:55calim: *yawn* ... what was that about branches and edges ... the edges are supposed to be ordered like the branches
22:55karolherbst: calim: in which order?
22:56imirkin_: calim: it seems like sometimes the converter doesn't add a branch... it's implied where it goes
22:56imirkin_: calim: like for a if/else
22:57calim: when there is no branch there can only be 1 edge
22:57imirkin_: hm, i think we were seeing a situation like (not %p0) bra BB:n
22:57imirkin_: and no unconditional branch at the end
22:57calim: the order of edges shall be as the iterator says
22:58imirkin_: does that sound wrong to you?
22:58imirkin_: basically karolherbst is trying to nuke empty if/else's
22:58calim: in that case the first edge will go to BB:n and the 2nd edge indicates where to fall through to
22:58imirkin_: since otherwise can't dce the if's condition
22:59karolherbst: calim: any reasons why it is ordered that way?
22:59calim: so that we can have implicit branches ;)
22:59karolherbst: calim: I always got the feeling it would be easier to have a FlowInstruction -> Edge mapping
23:00calim: which might not be the best idea, if you like just add the explicit unconditional branches everywhere
23:00karolherbst: and BB -> default Edge
23:00calim: best idea <- the implicit branches
23:00karolherbst: it makes the code kind of not easy understandable
23:00karolherbst: and hard to mess with the edges
23:01calim: hm, a FlowInstruction to Edge mapping sounds like a good idea; I mean, it exists, it's just not ... surjective (I hope that fits here)
23:02karolherbst: it isn't explicitly stated
23:02karolherbst: calim: well what I need is something like this: change the Edge of a FlowInstruction and change the default Edge of a BasicBlock
23:02karolherbst: that's all what I need to do
23:03karolherbst: like if BB:1 -> BB:2 // BB:1 --(conditional bra)-> BB:3 // BB:2 bra -> BB:4 // BB:3 bra -> BB:4
23:03karolherbst: all I want to do is: BB:1 bra -> BB:4
23:03karolherbst: and nuke the useless empty BBs away
23:04karolherbst: there are also shaders with like over 30 empty branches
23:04karolherbst: and tons of join/joinats
23:05calim: so, nuke all edges and branches and empty blocks and just add an edge from BB:1 to BB:4
23:06karolherbst: yeah, that's my plan now
23:06karolherbst: my current code already uses node->attach and node->cut
23:06calim: and add an unconditional branch if you like :)
23:06karolherbst: calim: nah, the branch is already there
23:06karolherbst: calim: I need to do that in two phases
23:06karolherbst: 1. phase: rebind the nodes
23:06karolherbst: 2. rebind the flowInstructions
23:07karolherbst: and in the 2. phase conditional branches may become unconditonal branches
23:07karolherbst: and enable DCE to nuke more instructions
23:07karolherbst: but before I can reduce them I have to be sure of a BB ends in the same BB no matter which path is taken
23:10calim: the graph manipulation interface is a bit ... crappy
23:10karolherbst: well I think I managed now somewhat
23:11karolherbst: well bioshock infinite still runs without issue and shader-db run didn't crash
23:11karolherbst: so that's at least something
23:13calim: the inconvenient part is that you have to maintain the order of the edges when you want to change the target of one
23:14karolherbst: there will be just one though :/
23:14karolherbst: but the thing is, I also have to nuke the branch at the end away then I guess?
23:15calim: that shouldn't be necessary, there just shouldn't be more branches than edges
23:15karolherbst: I am sure flattening will crash
23:16calim: then flattening will likely need fixing
23:17karolherbst: calim: example tgsi: https://gist.github.com/karolherbst/bcad9d2bab6165272834108f9d650097
23:17karolherbst: after 158: is the good part
23:17calim: flattening is very powerful
23:18calim: yay shader compiler
23:18calim: is NOT very powerful
23:18calim: it's dumb
23:18karolherbst: I don't think those empty branches exists in the glsl phase
23:18karolherbst: no idea where they come from
23:20karolherbst: I think the "( Const0.y ) / ( ( Temp.y == 0.0 ) ? 3.0E-37 : ( Temp.y ) );" thingies are those empty branches
23:20karolherbst: but well
23:31karolherbst: what is the term parameter in bool reachableBy(const Node *node, const Node *term) const; by the way?+
23:35calim: a node that counts as a blockage of reachability
23:35karolherbst: ahh okay
23:36calim: so if it's only reachable via term, it's not reachable (helps terminate the search early)
23:36calim: yes that should be commented
23:36karolherbst: at least I got the doubled edges removed now
23:36karolherbst: and no crash! :D
23:36karolherbst: well I have to revert the flattening changes too now
23:39karolherbst: RA isn't stupid anymore
23:39karolherbst: and doesn't create new BBs
23:42karolherbst: I even can nuke those bra instructions now
23:42karolherbst: calim: maybe you have an idea why flattening isn't nuking those join/joinats? https://gist.github.com/karolherbst/c2e8657807add2333cf86648332b4518
23:44karolherbst: calim: without my pass: https://gist.github.com/karolherbst/cdc554404035ea6e748ee88fbf3c4e79
23:46karolherbst: maybe flattening needs conditional branches and two edges two work?