02:23 mupuf: hehe
02:24 mupuf: mwk: it had to be, right? :D
02:25 mooch2: welp, time to put in some of this nonsense into my emulator
10:07 mwk: mupuf: NV1's isn't :)
10:43 mwk: ugh, the 2d hwdocs are fucking hopeless
14:50 kourbou: Hey I'm getting freezes in X with a GTX 660 Ti - log: https://ptpb.pw/gVKx
14:51 kourbou: "nouveau 0000:01:00.0: fifo: write fault at * engine 00 [GR] client 0e [GPC2/GPCCS] reason 02 [PTE]"
14:55 kourbou: If you need more specifics, I'm happy to provide.
14:55 karolherbst: any idea which application was at those coordinates?
14:56 kourbou: Yes, probably Chromium.
14:57 karolherbst: I see
14:57 kourbou: Probably a video I was watching on Kickstarter.
14:57 karolherbst: did you change the pstate?
14:57 kourbou: No? What's that?
14:57 kourbou: Nah I don't overclock. :P
14:57 karolherbst: it isn't overclocking
14:58 karolherbst: any special webpage? I might be able to reproduce it, but I guess I could just use chromium and see when it messes up my gpu
14:58 kourbou: Oh you think it's probably reproducible?
14:58 karolherbst: might be, no clue
14:59 karolherbst: will run chromium on my nvidia gpu and see what I can do
14:59 kourbou: Okay thanks. I saw people mention the 660 is being buggy online.
14:59 karolherbst: doesn't matter
14:59 kourbou: Link to video: https://www.kickstarter.com/projects/1003614822/ponomusic-where-your-soul-rediscovers-music
15:00 karolherbst: at random position?
15:00 kourbou: Uhm. I can try remembering position.
15:01 kourbou: Yup. 6 minutes 50 seconds.
15:01 kourbou: After that.
15:01 karolherbst: reproducible?
15:01 kourbou: Shall I test?
15:01 karolherbst: I think it will happen randomly though
15:02 kourbou: Yeah, not reproducible, no.
15:03 kourbou: But I have had many freezes before switching to the proprietary drivers.
15:03 kourbou: Don't know if journalctl still has those in the log.
15:03 kourbou: I'll check.
15:06 kourbou: Just checked, I had a similar crash yesterday actually.
15:08 kourbou: karolherbst: https://ptpb.pw/jcab <- Both write faults.
15:27 kourbou: Finding stuff online about NvGrUseFW=1. Would that help and where do I enable it?
15:42 imirkin: kourbou: a number of people with GK106's have had issues with the nouveau context-switching fw. see https://bugs.freedesktop.org/show_bug.cgi?id=93629 for some advice.
15:44 kourbou: Thanks imirkin :)
15:54 karolherbst: imirkin: marek managed to improve borderlands 2 performance by 70% by implementing gl threading :) but somehow I don't see the magic in his commit :O
15:55 karolherbst: ohh I guess radeonsi has already something like that, and now it is just implemented by r600 or so
15:56 kourbou: imirkin: What comment 14 refers to as "nvidia blob" is NvGrUseFW=1?
15:57 imirkin: kourbou: that's an option to load ctxsw fw off the filesystem. what you stick in there is up to you.
16:01 kourbou: Alright I'll try it.
16:07 kourbou: Uhm, I set it but it got uglier.
16:07 kourbou: I still managed to get into X though.
16:07 kourbou: "DRM: failed to create kernel channel, -22"
16:08 kourbou: "bus: MMIO read of 00000000 FAULT at 001940 [ !ENGINE ]"
16:08 imirkin: you probably failed some bit of it
16:08 kourbou: Maybe I need the nouveau-fw package on AUR.
16:09 karolherbst: most likely not
16:09 kourbou: Then uhhhh.
16:09 imirkin: kourbou: stuff got renamed, there's no current package that provides the stuff
16:09 kourbou: Dammit.
16:10 kourbou: Also: "Direct firmware load for nvidia/gk106/fecs_inst.bin failed with error -2"
16:10 imirkin: that means the accel stuff didn't load.
16:10 kourbou: Oh. Does that explain the kernel channel fail?
16:13 imirkin: yep.
16:17 kourbou: I don't even have gk106 folder.
16:34 kourbou: Where do I find nouveau/fuc409c etc... ?
16:35 kourbou: From the source?
16:36 kourbou: Oh no I have to extract it, right?
16:42 pmoreau: kourbou: Just renaming what you get from nouveau-fw should be good enough.
16:44 kourbou: Thanks pmoreau
16:47 pmoreau: kourbou: 409c -> fecs_inst, 409d -> fecs_data, 41ac -> gpccs_inst, 41ad -> gpccs_data
16:56 kourbou: Giving it a shot.
16:59 kourbou: Seems to have worked. :D Thanks again pmoreau and imirkin.
17:03 pmoreau: kourbou: Cool! Hopefully it works a bit better with them.
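The renaming pmoreau lists above can be sketched as a small script. This is a minimal sketch under assumptions: the old files are taken to be named `fuc409c` and so on (as in kourbou's `nouveau/fuc409c` question), the new names are taken to end in `.bin` under `nvidia/gk106/` (as in the "Direct firmware load" message), and the `/lib/firmware` prefix in the usage note is an assumption about where the kernel looks. `plan_copies` and `copy_firmware` are hypothetical helper names.

```python
from pathlib import Path
import shutil

# Old fuc suffix -> new base name, exactly as listed in the chat.
RENAMES = {
    "409c": "fecs_inst",
    "409d": "fecs_data",
    "41ac": "gpccs_inst",
    "41ad": "gpccs_data",
}


def plan_copies(src_dir, dst_dir):
    """Return (source, destination) pairs without touching the filesystem."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    return [
        (src_dir / f"fuc{old}", dst_dir / f"{new}.bin")
        for old, new in RENAMES.items()
    ]


def copy_firmware(src_dir, dst_dir):
    """Copy every old-style file that exists to its new name; return the new paths."""
    copied = []
    for src, dst in plan_copies(src_dir, dst_dir):
        if not src.is_file():
            continue  # skip files the extracted package did not provide
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

Under those assumptions it might be run as `copy_firmware("/lib/firmware/nouveau", "/lib/firmware/nvidia/gk106")` with root privileges. Copying rather than moving keeps the old names in place in case an older kernel still asks for them.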
17:10 karolherbst: imirkin: how can I increase the shader storage in vram?
17:13 imirkin: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#n803
17:13 karolherbst: thanks
17:21 karolherbst: \o/ kernel bug within nouveau
17:23 karolherbst: just got this one: https://gist.github.com/karolherbst/bfb1418dedf950b6612d27320494b6e4
17:29 karolherbst: imirkin: does a page fault here look like a threading/multi context issue to you? https://github.com/karolherbst/nouveau/blob/master_4.8/drm/nouveau/nvkm/subdev/mmu/gf100.c#L183
17:29 karolherbst: nvkm_vm_unmap_at called it
17:32 imirkin: karolherbst: there's a mutex
17:32 imirkin: and mutexes make everything better
17:32 imirkin: unless something else modifies that list without holding the mutex :)
17:32 imirkin: gr. you're going to make me look at code, aren't you...
17:34 imirkin: seems like all uses of pgt_list are locked by the mutex
17:34 karolherbst: I've created a bug
17:35 karolherbst: gdb isn't really trustworthy here, but I think the previous and the next line won't cause such a page fault
17:35 imirkin: er, pgd_list
17:36 imirkin: [43305.187176] IP: [<ffffffffa00cdff8>] gf100_vm_flush+0xc8/0x1b0 [nouveau]
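The `IP:` line quoted above has the form `[<address>] symbol+0xoffset/0xsize [module]`: the faulting instruction sits `offset` bytes into a function that is `size` bytes long. A minimal sketch of pulling those fields apart follows; `parse_ip_line` is a hypothetical helper, and the regular expression only targets the exact layout shown in this log.

```python
import re

# Matches the x86 oops "IP:" layout shown in the log line above.
IP_LINE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_ip_line(line):
    """Return the fault location as a dict, or None if the line does not match."""
    m = IP_LINE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "symbol": m.group("symbol"),
        "module": m.group("module"),
        "offset": offset,
        "size": int(m.group("size"), 16),
        # Address of the first instruction of the function.
        "function_start": addr - offset,
    }


info = parse_ip_line(
    "[43305.187176] IP: [<ffffffffa00cdff8>] gf100_vm_flush+0xc8/0x1b0 [nouveau]"
)
```

For the line from the log this yields `gf100_vm_flush` in module `nouveau`, at offset `0xc8` of `0x1b0` bytes, which is the position to look for in a disassembly of that function.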
17:36 imirkin: let's see what decodecode says
17:36 karolherbst: decodedecode?
17:37 imirkin: decodecode :)
17:37 karolherbst: ohh
17:37 karolherbst: no clue what it is, I checked with gdb
17:37 karolherbst: I have debug symbols enabled in my kernel and stuff
17:38 imirkin: it's a script
17:38 imirkin: in the linux kernel
17:38 imirkin: which decodes the "Code: " line
17:38 imirkin: for those of us who can't decode an x86 instruction stream in our heads :)
17:39 karolherbst: ohh I see, that's why I never used it
17:39 imirkin: coz you can decode an x86 instruction stream in your head? :p
17:39 karolherbst: well, my head can do this, I don't have to think, I just see it
17:39 imirkin: like with the matrix? :)
17:39 karolherbst: exactly
17:40 * imirkin wonders what the redhead opcode sequence looks like...
17:40 karolherbst: I also don't get the concept of colours, because everything has the same, just in a different shade
17:40 imirkin: anyways, if you can include that disassemble result, that'd be convenient
17:41 karolherbst: already done
17:47 imirkin: gr. i hate how clever gcc is =/
17:50 imirkin: karolherbst: yeah, so you're definitely right. and obj == 41ad09000
17:50 imirkin: which is a very odd value for a kernel pointer
17:52 imirkin: er hm. i guess it's a mapped thing...
18:03 karolherbst: most likely
18:04 karolherbst: well I am still on that kernel
18:04 karolherbst: anything I can read out until I have to reboot?
18:08 imirkin: dunno i doubt it
18:08 imirkin: this seems like a pretty straightforward data race
18:08 imirkin: unfortunately i don't know enough about the data :)
18:10 karolherbst: should be easy to solve then :D
18:12 imirkin: just have to get ben interested in fixing it
18:12 karolherbst: uhm, it isn't always locked
18:12 imirkin: i glanced at things and it seems like there are no *obvious* thinkos in there
18:12 imirkin: which leaves the less-than-obvious thinkos, unfortunately
18:12 karolherbst: nvkm_vm_del and nvkm_vm_unmap_pgt don't lok, but they are also static functions
18:13 karolherbst: *lock
18:13 karolherbst: and nvkm_vm_map_pgt
18:13 karolherbst: nvkm_vm_ref -> nvkm_vm_del no lock
18:14 karolherbst: or am I wrong?
18:15 karolherbst: ohh wait that's a refcounting thing
18:22 imirkin: vm_del happens at teardown
18:22 imirkin: so not an issue either way
18:23 karolherbst: true
18:24 karolherbst: maybe it was caused by garbage that got into the kernel from marek's branch
18:49 netz_work: karolherbst: hello o/
19:35 karolherbst: hakzsam_: when I add a new entry to nvc0_sw_query_drv_stat_names, where do I have to add the actual implementation?
20:59 karolherbst: hakzsam_: uhh, post_ra_dead doesn't work with my kepler, no instruction is considered dead by it
21:00 karolherbst: which is very odd
21:00 karolherbst: ohhh wait
21:00 karolherbst: my mistake
21:10 karolherbst: hakzsam_, imirkin: interesting fact regarding that mad optimisation: folding in the immediates only changes performance from 1026->1028 and dropping those movs improves perf to 1044
21:10 karolherbst: I always thought movs are like super cheap
21:11 imirkin: no mov is cheaper than a mov :)
21:12 karolherbst: sure
21:12 imirkin: i'm highly surprised folding immediates but not getting rid of mov's makes any diff at all
21:12 karolherbst: but I thought the reduced waits might have a bigger impact than those movs
21:12 karolherbst: less waits
21:12 imirkin: ah i guess
21:12 imirkin: what's the wait after a mov though?
21:12 karolherbst: well, seems like there is one
21:13 karolherbst: every instruction has a wait though
21:13 karolherbst: even movs
21:13 karolherbst: most of the time it doesn't matter though
22:07 karolherbst: imirkin: how do I check for an immediate best in RA? getImmediate too? or is there something better?
22:08 imirkin: karolherbst: no, at that point, it should be there
22:08 imirkin: can just check if it's FILE_IMMEDIATE
22:08 imirkin: (same as with FILE_GPR)
22:08 karolherbst: keep in mind I also have to do this for src0
22:08 imirkin: karolherbst: and really you should check if it's a LIMM i think
22:09 imirkin: hm?
22:09 imirkin: src0 can't ever be an immed
22:09 karolherbst: if the limm is at src0 I swap with src1
22:09 imirkin: only src1
22:09 imirkin: that can't happen
22:09 karolherbst: I meant if the mov is there
22:09 imirkin: well
22:09 imirkin: among other things
22:09 karolherbst: we didn't load propagate it yet )
22:10 imirkin: the LoadPropagation pass
22:10 imirkin: will try to swap
22:10 karolherbst: it doesn't
22:10 karolherbst: not in this case
22:10 karolherbst: maybe it does now
22:10 imirkin: i don't think you can get into a situation like that, at least not offhand
22:10 imirkin: maybe there's some very weird funny scenario i'm not thinking of
22:10 karolherbst: I know that the swap src0/src1 in postraloadpropagation helps
22:11 imirkin: i'd be curious to see the particulars
22:11 karolherbst: k
22:11 karolherbst: also
22:11 imirkin: either way, that's fine i guess
22:11 karolherbst: I moved the emits down, so that the pass can be merged without those
22:11 imirkin: if you want to do the swap it's ok
22:12 karolherbst: I know there are some issues somewhere or at least I think
22:12 karolherbst: not quite sure
22:12 imirkin: well, making the emit functions handle those cases
22:12 imirkin: doesn't seem like it could possibly hurt anything
22:12 karolherbst: no, I mean the emit is wrong
22:12 imirkin: then fix it :p
22:12 karolherbst: I think I mess with a neg mod somewhere
22:12 karolherbst: have to be awake enough to be able to think about it
22:12 imirkin: as much as possible, i want the diff gpu versions to generate identical code
22:13 karolherbst: I know
22:13 imirkin: esp as far as "new" passes are concerned.
22:13 karolherbst: I didn't implement the emit code for no reason :p
22:14 karolherbst: I like the nvc0 version of that pass though, so much cleaner than the nv50 one
22:14 karolherbst: I think I remember that the swap had a very slim benefit ~ 0.02%, currently checking
22:15 karolherbst: mhh, seems to be some specific case
22:18 karolherbst: imirkin: yep: total instructions in shared programs : 3491772 -> 3491273 (-0.01%)
22:18 karolherbst: that's just the swap
22:19 imirkin: well, if you want, we can look at one of the affected programs in detail
22:20 imirkin: either way, i don't terribly mind the swap
22:20 karolherbst: there is a small one with 45 instructions :)
22:22 karolherbst: imirkin: last mad https://gist.github.com/karolherbst/93dd4e4987dfed4a1b02f1bbef81700e
22:24 imirkin: including tgsi please
22:24 karolherbst: https://gist.github.com/karolherbst/93dd4e4987dfed4a1b02f1bbef81700e#file-tgsi
22:24 imirkin: interesting.
22:24 imirkin: 23: mov u32 %r130 0x3fcc49ba (0)
22:24 imirkin: 24: mad ftz f32 %r131 %r130 %r126 %r121 (0)
22:25 * imirkin wonders how that happened
22:25 imirkin: time to RTFS
22:25 karolherbst: uhhh
22:25 karolherbst: that's without the RA change
22:25 imirkin: oh but curious
22:26 imirkin: it's only the very last MAD that gets folded by your thing
22:26 karolherbst: mhh
22:26 karolherbst: even with the RA change it stays this way
22:26 imirkin: 31: mad ftz f32 $r2 $r4 2.018000 $r2 (8)
22:27 karolherbst: yeah
22:27 karolherbst: just saw it
22:27 karolherbst: 28 even
22:27 imirkin: either way, let's RTFS
22:28 imirkin: a-HA
22:28 imirkin: if ((isCSpaceLoad(i0) || isImmdLoad(i0)) && targ->insnCanLoad(insn, 1, i0)) {
22:28 imirkin: so we only do the swap if the target can load the relevant op
22:28 imirkin: if it can't, we leave it alone
22:28 imirkin: ok, so that's all reasonable then. have the swap in your pass too.
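The rule imirkin reads out of the source can be illustrated with a toy model. This is not Mesa's nv50_ir code: `Operand`, `is_foldable_load`, and `maybe_swap_sources` are hypothetical names, and the `can_load_into_src1` callback only stands in for the target query quoted above. The swap itself is safe because a mad computes src0 * src1 + src2 and the two multiplicands commute.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    kind: str      # "gpr", "imm_load" or "const_load" (labels made up for this sketch)
    value: object


def is_foldable_load(op):
    """True when the operand comes from an immediate or constant-space load."""
    return op.kind in ("imm_load", "const_load")


def maybe_swap_sources(srcs, can_load_into_src1):
    """Swap src0 and src1 only if src0 is a load the target accepts in slot 1."""
    if is_foldable_load(srcs[0]) and can_load_into_src1(srcs[0]):
        srcs[0], srcs[1] = srcs[1], srcs[0]
        return True
    return False  # otherwise the instruction is left alone


# Operands of "mad %r131 %r130 %r126 %r121", where %r130 is a mov of 0x3fcc49ba.
mad = [
    Operand("imm_load", 0x3FCC49BA),
    Operand("gpr", "%r126"),
    Operand("gpr", "%r121"),
]
swapped = maybe_swap_sources(mad, lambda op: True)
```

With a callback that returns `False`, the operands stay in their original order, which matches the unswapped mad in the dump above.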
22:30 karolherbst: yeah, I already thought about adjusting insnCanLoad, but... every time I look at it, I don't understand much
22:30 imirkin: leave it alone
22:30 imirkin: it's for pre-ra
22:31 karolherbst: ahh I see
22:31 karolherbst: okay, now pimping RA even more
22:32 karolherbst: so I should simply check if the file of src(0,1) is FILE_IMMEDIATE?
22:33 karolherbst: mhh, too optimistic I guess
22:34 imirkin: something like that
22:35 karolherbst: uhh by the way, a pretty handy extension for github (file tree on the left side): https://github.com/buunguyen/octotree
22:38 karolherbst: mhh, somehow using getImmediate looks like the best solution to me
22:38 karolherbst: really don't want to deal with all that stuff there again
22:39 karolherbst: although maybe it is fine
22:39 karolherbst: there shouldn't be any chained muls anymore in RA
22:41 imirkin: no, don't use getImmediate
22:49 karolherbst: so at least that flag thing doesn't have any affect
22:49 karolherbst: *effect
22:49 imirkin: i'm sure it doesn't
22:49 imirkin: it'd be eminently rare to have a mad with a flag
22:49 karolherbst: k
22:56 imirkin: but it's weird to have unnecessary restrictions
22:56 imirkin: it confuses people who read the code later on
22:56 karolherbst: the ra code was for short encodings to begin with
22:56 karolherbst: I should adjust the comment
22:57 imirkin: you should split it up into a nv50 and nvc0 section
22:57 imirkin: the nv50 section should have the flagsDef
22:57 karolherbst: there is no other difference though, right?
22:57 imirkin: the nvc0 should have that src0/src1 point to a mov from an immediate
22:58 karolherbst: I could simply do a (targ >= 0xc0 || flagsDef) thing
23:00 imirkin: but you also need to check for the mov
23:01 karolherbst: with the check for immediates: https://gist.github.com/karolherbst/6f160103a1212524215b73952290700f
23:01 karolherbst: total change
23:02 imirkin: that's more like it
23:02 karolherbst: https://gist.github.com/karolherbst/6f160103a1212524215b73952290700f
23:02 karolherbst: added the old thing below
23:02 karolherbst: yeah, looks better
23:02 karolherbst: now I just have to implement it without using getImmediate
23:04 imirkin: hmmmmmmmmmmmmmmmmm
23:04 imirkin: now i'm starting to doubt this strategy.
23:05 imirkin: actually no. it should be fine.
23:05 karolherbst: :D
23:05 imirkin: as long as you don't use getImmediate
23:05 imirkin: that uses getUniqueInsn
23:05 imirkin: you need to use getInsn()
23:05 karolherbst: mhh
23:06 imirkin: i need to come up with a plan to fix it up
23:06 imirkin: but ... me so lazy!
23:06 imirkin: [and tired of futzing with this bs]
23:06 karolherbst: why is getUniqueInsn bad inside getImmediate?
23:07 imirkin: getUniqueInsn assumes there's a unique insn
23:07 karolherbst: and?
23:07 karolherbst: wouldn't it just return null?
23:07 imirkin: which, while true while in SSA, isn't true pre- or post-ssa
23:07 imirkin: it doesn't
23:07 imirkin: getInsn returns null
23:07 imirkin: getUniqueInsn returns the first def
23:07 imirkin: (iirc)
23:07 karolherbst: uhh
23:08 karolherbst: null if def is empty
23:08 karolherbst: *defs
23:08 imirkin: heh, sure. if there are no defs. (like for a const or whatever)
23:08 karolherbst: but yeah, otherwise the first one
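The hazard discussed above can be shown with a toy model. It is not Mesa's nv50_ir code, and `first_def`, `only_def`, and `immediate_of` are hypothetical names that do not map one-to-one onto `getInsn`, `getUniqueInsn`, or `getImmediate`. The sketch only illustrates why "take the first def" is reliable while every value has a single definition (SSA) and misleading once one register has several writers. The second immediate, `0x40012345`, is made up for the example.

```python
class Value:
    """A value plus the list of instructions that write it."""

    def __init__(self, name):
        self.name = name
        self.defs = []


def first_def(value):
    """First definition, or None when there are no defs at all."""
    return value.defs[0] if value.defs else None


def only_def(value):
    """The definition only if it is the sole one, else None."""
    return value.defs[0] if len(value.defs) == 1 else None


def immediate_of(value, lookup):
    """Immediate written by the def that `lookup` picks, if that def is a mov of one."""
    insn = lookup(value)
    if insn is not None and insn[0] == "mov_imm":
        return insn[1]
    return None


# In SSA form: exactly one writer, so both lookups agree.
ssa_value = Value("%r130")
ssa_value.defs.append(("mov_imm", 0x3FCC49BA))

# After register allocation: one register, two writers.
reg = Value("$r4")
reg.defs += [("mov_imm", 0x3FCC49BA), ("mov_imm", 0x40012345)]

trusting = immediate_of(reg, first_def)   # reports the first writer's immediate
cautious = immediate_of(reg, only_def)    # refuses to guess
```

Here `trusting` reports `0x3FCC49BA` even though a later write may have replaced it, while `cautious` is `None`. That is the kind of silent wrong answer a post-RA pass has to avoid.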
23:09 karolherbst: mhh, that RA change dropped pixmark piano perf
23:09 karolherbst: 1046 -> 1043
23:10 imirkin: ok, but i don't really care about that - i care about opt passes that make sense and do what they're advertised to do
23:11 karolherbst: sure, just wondering why the impact is that big
23:12 karolherbst: one mov more
23:12 karolherbst: hu
23:13 karolherbst: that should have gotten load propagated
23:14 karolherbst: uhh
23:15 karolherbst: https://gist.github.com/karolherbst/4662561f99f7571a07322cdc6e4ee323
23:16 karolherbst: r12176 isn't immediated into the third mad anymore
23:16 imirkin: perhaps you removed your swap logic?
23:17 imirkin: or you did the check incorrectly?
23:17 karolherbst: check the first and the 5th mad
23:17 karolherbst: same immediate
23:17 karolherbst: and they get the src2 == def preference
23:17 karolherbst: but not the third one
23:18 imirkin: preference doesn't mean requirement
23:18 imirkin: although it does try to allocate them to the same node
23:18 imirkin: either way, hard for me to tell without seeing your patches
23:18 karolherbst: https://gist.github.com/karolherbst/4662561f99f7571a07322cdc6e4ee323#file-post-sa
23:18 karolherbst: I just check the getImmediate output
23:19 karolherbst: added a "if (insn->src(0).getImmediate(imm) || insn->src(1).getImmediate(imm))" after the MAD/SAD check
23:31 imirkin: come to think of it, before running allocateRegisters, the phi nodes are all still there
23:31 imirkin: so getImmediate is fine there
23:32 imirkin: but it wouldn't be fine in your postRA pass
23:44 netz: what is the package that does nouveau+wayland? is it part of xf86-video-nouveau or sommat?
23:45 imirkin: netz: wayland is a protocol...
23:46 netz: imirkin: I get that, I think I got the answer I need in #sway, though :)
23:48 imirkin: ok cool