07:59 vpelletier: I got nouveau to work on PPC64BE with 4k pages by using noaccel=1 and nofbaccel=1. Either may be optional, but I did get the "gpu lockup" log from fbcon, so I thought I would disable fb accel too.
09:20 vpelletier: woo, disabling only fbaccel I get 3D acceleration in X
09:21 karolherbst: vpelletier: interesting
09:22 vpelletier: ...or at least glxinfo tells me I'm getting 3D acceleration. glxgears does not rotate nor emit any fps counts
09:23 vpelletier: I did start in "spam" log level though, maybe it is killing performance (kernel option help text did mention it hurting performance)
09:24 karolherbst: doubtful
09:26 vpelletier: looks like glxgears is getting stuck in poll syscall
09:29 vpelletier: https://pastebin.com/0crqeXuN syslog of starting glxgears with "nouveau.debug=spam nouveau.nofbaccel=1"
09:29 vpelletier: https://pastebin.com/8sYUwjgC and when exiting it
09:32 vpelletier: glxgears is getting stuck in "poll([{fd=3, events=POLLIN}], 1, -1)", fd 3 being the X unix socket
09:32 vpelletier: would vsync be somehow affected by endianness ?
09:33 karolherbst: maybe
09:34 vpelletier: imirkin_: ping ?
09:37 vpelletier: at least, a big lesson (to me) is that fb acceleration can screw nouveau in my case
09:38 vpelletier: I would guess that part should be easier to fix than if it was the 3D accel part
09:47 vpelletier: and indeed disabling vblank I see the gears rotating
09:48 vpelletier: and fps count scroll by
12:16 imirkin: vpelletier: noaccel + nofbaccel disable all acceleration and just keep the modesetting part of nouveau
12:17 imirkin: vpelletier: that should have worked with 64k pages as well... maybe
12:18 vpelletier: imirkin: at least console worked with 64k pages
12:19 vpelletier: the new/weird thing is that nofbaccel only makes X work, and 3D acceleration looks working except for vsync
12:20 vpelletier: so fbaccel seems to put the card in a bad state from which X cannot take over
12:22 imirkin: well, fbaccel doesn't do anything that X can't do
12:22 imirkin: so probably just delaying the inevitable
12:23 imirkin: hmmm... vsync and endianness... interesting idea
12:23 imirkin: normally it's reported by an intr, but not sure what the pre-nv50 situation is
12:25 vpelletier: would there be some counter associated with it ?
12:25 imirkin: DVI or VGA screen?
12:25 vpelletier: dvi
12:38 Tom^: karolherbst: "gamer settings" and a 1000hz zen kernel http://i.imgur.com/NkvoJKM.jpg MOOAAAAR fps MOAAAAAARR!!!
12:38 karolherbst: Tom^: :p
12:39 hakzsam: pmoreau: yeah, shouldn't be hard to do
12:40 pmoreau: Ok
12:45 imirkin: vpelletier: hmmm... the code at nvkm/engine/disp/rootnv04.c is questionable. if it's really 32-bit values it should be fine, but if it's secretly 16-bit values then it'll get read in wrong.
12:46 imirkin: vpelletier: can you change those to nvkm_rd16's?
12:46 imirkin: or adjust the code in a different way if you want
12:51 vpelletier: all the rd32 ?
12:51 vpelletier: to pairs plus a lonely one
12:55 vpelletier: ah, the lonely one is used as two pairs of 16bits, so it makes no sense replacing it
12:56 vpelletier: s/two pairs of 16bits/two 16bits values/ (getting late..)
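For readers following the rd32/rd16 question: once a 32-bit value has been loaded into a CPU register, extracting its two 16-bit halves with shifts and masks gives the same numbers on big- and little-endian hosts; byte order only matters at the load itself. A minimal sketch, with hypothetical helper names (lo16, hi16, first_halfword_in_memory) and a made-up example value:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: split a 32-bit register value into two 16-bit
 * fields. Shifts and masks act on the value, not on its memory layout,
 * so they give the same answer on big- and little-endian hosts. */
uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xffffu); }
uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }

/* Contrast: the halfword stored at the lowest address of the same word
 * DOES depend on host byte order. */
uint16_t first_halfword_in_memory(uint32_t v)
{
    uint16_t h;
    memcpy(&h, &v, sizeof h);
    return h;
}

/* Runtime probe: 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a PPC64BE host, first_halfword_in_memory(0x12345678) yields 0x1234, while lo16() still yields 0x5678.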
12:59 imirkin: the whole engine may get unhappy about receiving 16-bit reads though... dunno
13:00 vpelletier: as long as it does not kill the card, that's fine for me
13:00 imirkin: well - it might kill it for that boot
13:00 imirkin: but always resettable :)
13:02 imirkin: otoh, glxgears/etc did work fine for me on a nv34, which means that this is actually unlikely to be the cause
13:02 imirkin: otoh, i haven't tested lately, and there have been vblank changes
13:02 imirkin: [in the larger drm subsystem]
13:03 imirkin: oh - i guess i did test on 4.11
13:04 vpelletier: on a totally different topic, while kernel is building, I notice I get opengl 2.0 level on this card. from a wikipedia read, it looks like it is EXT_texture_sRGB which is missing. Grepping for this extension in mesa finds it on intel and radeon but not on nouveau. Is it a known limitation of nouveau (besides this card maybe ?)
13:08 imirkin: supported on this board ;)
13:08 imirkin: let's see... which ext is this...
13:09 imirkin: hmmmmm odd
13:09 imirkin: *should* be supported on older hw i thought
13:09 imirkin: https://people.freedesktop.org/~imirkin/glxinfo/#p=compat
13:10 imirkin: support on nv3x and nv4x just fine
13:11 vpelletier: glxingo | grep texture_sRGB -> empty
13:11 vpelletier: there are other sRGB hits though
13:11 imirkin: i believe you ... it's just odd and it should be supported. probably something silly.
13:11 vpelletier: glxinfo* (not typing from the same box, obviously)
13:11 imirkin: let's see... how is it flipped on...
13:12 imirkin: ahhhh
13:12 imirkin: heh
13:12 imirkin: it wants one of PIPE_FORMAT_A8B8G8R8_SRGB or PIPE_FORMAT_B8G8R8A8_SRGB
13:14 imirkin: whereas we expose BGRA8888_SRGB, which maps to the byte-swapped thing on BE
13:14 imirkin: i think that check needs to be adjusted in st_extensions.c
13:15 vpelletier: so, booted on https://pastebin.com/jzpyf8Xb and glxgears behaves the same (frozen when run alone, rotating with vblank_mode=0)
13:15 imirkin: and potentially some light format work in st_formats.c
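To see why a "packed" BGRA8888 value corresponds to a different byte-order format on a big-endian host, this sketch stores one pixel as a native uint32_t laid out as 0xAARRGGBB and inspects the bytes in address order. It only illustrates the memory layout; it does not reproduce the actual check in st_extensions.c, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack one pixel as a native 32-bit word: A in bits 31..24, B in bits 7..0. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8) | (uint32_t)b;
}

/* Copy the word to memory and return its bytes in address order.
 * Little-endian host: B,G,R,A (the byte layout named B8G8R8A8).
 * Big-endian host:    A,R,G,B (the byte layout named A8R8G8B8). */
void bytes_in_memory(uint32_t word, uint8_t out[4])
{
    memcpy(out, &word, 4);
}

/* Runtime probe: 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

So a check that only accepts the B,G,R,A or A,B,G,R byte layouts can miss a driver whose packed sRGB format lands on A,R,G,B in memory on BE.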
13:16 imirkin: ok
13:16 imirkin: this will require serious investigation ... i'd focus more on why fbdev breaks. that should be easier.
13:16 imirkin: since it's the definition of a trivial client...
13:17 vpelletier: while kernel was building I was hoping it would somehow also be vsync which would make it unhappy
13:18 vpelletier: actually, I'll reboot on this kernel and enable fbdev while I'm at it
13:19 vpelletier: fbaccel*
13:19 vpelletier: same "GPU lockup" message
13:20 vpelletier: and X broken
13:24 imirkin: forget about X
13:24 imirkin: figure out why fbdev accel breaks
13:29 vpelletier: any advice where to start ? I'm starting to read nv04_fbcon_accel_init
13:29 imirkin: work backwards from the gpu lockup
13:29 imirkin: i.e. what is it doing when the gpu locks up
13:30 imirkin: [i'm assuming you're a developer btw... is that a fair assumption?]
13:32 vpelletier: developer, yes. kernel developer, not much :)
13:35 imirkin: kernel's just another piece of software
13:36 imirkin: hmmmm... too bad NV5 can't work on BE =/ then i could have plugged my PCI NV17 and NV5 into the ppc and had coverage that way... o well.
13:50 vpelletier: I mean, at kernel level there is less room for introspection, pausing, etc. My daily job happens in python, which is excessively comfortable. I did debug kernel a bit on a raspberry pi, and kgdb over on-board serial port is nice, but I do not think I have this luxury here. So printf debugging I guess.
13:54 imirkin: yeah, i only ever do printf debugging, irrespective of environment
13:55 imirkin: gdb on rare occasions
14:14 vpelletier: so, it is the RING_SPACE call in nv04_fbcon_imageblit which returns -16
14:14 vpelletier: dsize=960 RING_SPACE(chan, 129) = -16
14:14 vpelletier: debug line: NV_ERROR(drm, "dsize=%i RING_SPACE(chan, %i) = %i\n", dsize, iter_len + 1, ret);
14:18 vpelletier: fwiw, the screen plugged on DVI is 1920x1080
14:32 vpelletier: and it was the first write of a 1920x16 image, which gives 0x7800, or 960 once shifted
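The numbers in the debug line can be reproduced with a little arithmetic. This sketch assumes the 1bpp image is rounded to whole 32-bit words (width aligned to 8 pixels, total bits aligned to 32) and pushed in chunks of at most 128 words, each chunk needing one extra ring slot for its method header. That matches dsize=960 and RING_SPACE(chan, 129) above, but the function names are hypothetical, not the nouveau ones.

```c
#include <assert.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))
#define MAX_CHUNK 128u  /* assumed per-method data limit, in 32-bit words */

/* Number of 32-bit words needed for a 1-bit-per-pixel image. */
unsigned image_dwords(unsigned width, unsigned height)
{
    unsigned bits = ALIGN_UP(width, 8u) * height;
    return ALIGN_UP(bits, 32u) >> 5;   /* bits / 32 */
}

/* Ring slots requested for the next chunk: data words plus one header. */
unsigned next_ring_request(unsigned remaining)
{
    unsigned len = remaining > MAX_CHUNK ? MAX_CHUNK : remaining;
    return len + 1;
}

/* How many chunks (and so ring-space checks) the whole image needs. */
unsigned chunk_count(unsigned dwords)
{
    return (dwords + MAX_CHUNK - 1) / MAX_CHUNK;
}
```

Under these assumptions, 1920x16 gives 960 words, requested as seven chunks of 128 plus one of 64. A remaining count of 448, as in the later dsize=448 message, would be consistent with four full chunks having already been queued (960 - 4*128 = 448).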
14:36 imirkin: ok
14:36 imirkin: so RING_SPACE is basically the thing that ensures that there's still room to write commands into the buffer
14:36 imirkin: in case that there *isn't* room, it will flush the existing commands
14:36 imirkin: now, the underlying idea is that the GPU is processing stuff
14:37 imirkin: and so as long as you're not writing commands in TOO quickly, it all works out
14:37 imirkin: however if the GPU isn't making progress, you eventually run out of room entirely to write commands
14:37 imirkin: (aka "GPU has hung")
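The RING_SPACE behaviour described above can be modelled with a toy ring: the CPU advances PUT as it writes, the GPU advances GET as it consumes, and a writer needing more room than is free polls for GET to move and gives up after a bounded number of tries. On Linux, -16 is -EBUSY, which fits the value in the log, but the struct and function names here are a simplified assumption, not nouveau's nouveau_dma.c.

```c
#include <assert.h>
#include <errno.h>

struct toy_ring {
    unsigned size;   /* total slots */
    unsigned put;    /* next slot the CPU writes */
    unsigned get;    /* next slot the GPU reads */
    int gpu_alive;   /* nonzero if the simulated GPU makes progress */
};

/* Free slots, keeping one slot empty so that put == get means "empty". */
static unsigned ring_free(const struct toy_ring *r)
{
    return (r->get + r->size - r->put - 1) % r->size;
}

/* Simulated GPU: consume one pending command if it is alive. */
static void gpu_step(struct toy_ring *r)
{
    if (r->gpu_alive && r->get != r->put)
        r->get = (r->get + 1) % r->size;
}

/* Wait until 'need' slots are free, polling at most max_polls times. */
int toy_ring_space(struct toy_ring *r, unsigned need, unsigned max_polls)
{
    for (unsigned i = 0; i <= max_polls; i++) {
        if (ring_free(r) >= need)
            return 0;
        gpu_step(r);
    }
    return -EBUSY;  /* GPU never freed enough room: looks like a lockup */
}

/* Record that the CPU filled 'n' slots. */
void toy_ring_advance(struct toy_ring *r, unsigned n)
{
    r->put = (r->put + n) % r->size;
}
```

With gpu_alive set to 0 the wait always times out, no matter how small the request is once the ring has filled, which is the "insta-lockup" pattern discussed below.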
14:37 vpelletier: makes sense
14:38 imirkin: so ... do you have a full dmesg?
14:38 imirkin: probably should have led off with that question ;)
14:40 vpelletier: I have a syslog containing all from dmesg
14:40 imirkin: ok, that works
14:40 vpelletier: https://pastebin.com/2GtzPtFK
14:40 imirkin: [oh, and these are chained command buffers, so basically you have The Grand Command Buffer, which just contains jumps to other buffers, which in turn have commands at the end of them to jump back to the grand one
14:41 imirkin: RING_SPACE will only fail if the grand one has no more room to add jumps
14:41 imirkin: [this is true of like ... nv25..nv50 or so. nv50 changes it all around. and previous there were no jumps.]
14:45 imirkin: ok, so one observation is that this is an insta-lockup. the GPU is basically not processing the commands for some reason.
14:46 imirkin: this will require digging =/
14:47 vpelletier: friday, pmoreau noticed the lines about ioctl returning -22. not sure this is necessarily abnormal at such verbosity level (some probing happening and the feature is just not there ?)
14:47 imirkin: it's expected
14:47 imirkin: basically some code doesn't want to hardcode stuff by chipset
14:47 imirkin: and so it just tries all of the various versions until one works
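The probing described above, trying each known version until one is accepted, looks roughly like this. The class IDs and the try_create() stub are hypothetical; the point is that each rejected candidate comes back as -EINVAL (-22 on Linux), so those lines appear at high debug levels without anything being wrong.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stub standing in for an ioctl: accepts only 'supported'. */
static int try_create(int cls, int supported)
{
    return cls == supported ? 0 : -EINVAL;
}

/* Try candidates in order (e.g. newest first). On success store the
 * accepted class in *chosen and return 0; otherwise return the last error. */
int probe_class(const int *candidates, size_t n, int supported, int *chosen)
{
    int ret = -ENODEV;  /* returned if the list is empty */
    for (size_t i = 0; i < n; i++) {
        ret = try_create(candidates[i], supported);
        if (ret == 0) {
            *chosen = candidates[i];
            return 0;
        }
        /* -EINVAL here only means "not this version"; keep going */
    }
    return ret;
}
```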
14:48 imirkin: ok so the most suspect thing is going to be the fence stuff...
14:48 imirkin: that could also explain why vblank isn't "working"
14:49 imirkin: aaaaahhhhhhh i think i see it.
14:49 imirkin: maybe.
14:51 imirkin: nouveau_bo_rd32/wr32 don't byteswap. the question is whether they go through iomem or not on G70... unfortunately i don't know. (and if they do, whether ioread/write32_native will cause a byteswap to happen)
14:51 imirkin: vpelletier: try booting with nouveau.debug=NvPCIE=0
14:52 vpelletier: .config ?
14:53 imirkin: of course yea
14:53 imirkin: sorry :)
14:54 vpelletier: failed again, looks like the lock is not instant... ssh'ing
14:55 vpelletier: mmh, maybe I was impatient, timestamps look similar
14:55 vpelletier: should I post the syslog again ?
14:55 imirkin: yeah
14:57 vpelletier: https://pastebin.com/p06fb85C
14:57 imirkin: unfortunately i don't have any nv41+ GPUs to test with on my PPC board - nv41 is what introduced the PCIE MMU logic, which allows for cool things to happen.
14:59 imirkin: was your mouse cursor messed up in X by any chance?
14:59 vpelletier: when the issue happens, "nothing" coherent is displayed
14:59 imirkin: right, i understand - even with all the noaccel stuff though
15:00 vpelletier: ah, with noaccel everything is rendered nice
15:00 imirkin: ok
15:01 vpelletier: noaccel + nofbaccel and nofbaccel alone
15:02 imirkin: ok, well let's try it anyways, mostly coz i'm curious
15:03 imirkin: hmmmmm
15:04 imirkin: right. so in nouveau_bo.c:nouveau_bo_wr32 change *mem = val to *mem = cpu_to_le32(val);
15:05 imirkin: and in nouveau_bo.c:nouveau_bo_rd32 change "return *mem" to "return le32_to_cpu(*mem)"
15:06 imirkin: [the thing is that the iowrite32_native maps to iowrite32be which will write the value as be, however i THINK that the gpu will byteswap all IO accesses to LE internally. not so with regular DMA-style writes.]
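The byte-order point above can be tried in user space. This sketch stands in for the kernel's cpu_to_le32()/le32_to_cpu() with portable byte-by-byte helpers (the helper names are hypothetical). It shows that a plain native store, like "*mem = val", puts a different byte sequence in memory on a big-endian host, while an explicit little-endian store produces the layout a little-endian device expects on either kind of host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v into buf as little-endian, whatever the host byte order. */
void store_le32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
}

/* Load a little-endian 32-bit value from buf. */
uint32_t load_le32(const uint8_t buf[4])
{
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* Equivalent of "*mem = val" with no conversion: host byte order. */
void store_native32(uint8_t buf[4], uint32_t v)
{
    memcpy(buf, &v, 4);
}
```

On a BE host, a device reading the native store as little-endian would see 0x78563412 instead of 0x12345678; the explicit LE store avoids that.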
15:07 vpelletier: should I pass NvPCIE=0 again on reboot ?
15:07 imirkin: nah
15:08 imirkin: the nv17_fence.c code is one of the very few pieces of code using those helpers, so i'm thinking it could be related
15:08 imirkin: it *does* work on nv34, but i think that it ends up taking slightly different paths, because (a) AGP, (b) AGP is disabled and (c) it doesn't have a MMU
15:09 imirkin: in theory NvPCIE=0 should make the G70 work the same way, but i don't know for sure, so may as well try the theory directly
15:09 vpelletier: still locking up, but it changed a bit:
15:09 vpelletier: DRM: dsize=448 (was 1920x16 (+ align) = 30720, >> 5) RING_SPACE(chan, 129) = -16
15:09 vpelletier: no longer on 960
15:10 imirkin: full log please
15:11 vpelletier: https://pastebin.com/X4SVPVUA
15:11 imirkin: errr WTF?!
15:11 imirkin: nv41_mmu_new(struct nvkm_device *device, int index, struct nvkm_mmu **pmmu)
15:11 imirkin: {
15:11 imirkin: if (device->type == NVKM_DEVICE_PCIE ||
15:11 imirkin: !nvkm_boolopt(device->cfgopt, "NvPCIE", true))
15:11 imirkin: return nv04_mmu_new(device, index, pmmu);
15:11 imirkin: ok that's wrong
15:11 imirkin: that should be if (device->type *!=* NVKM_DEVICE_PCIE
15:12 imirkin: skeggsb: !! :)
15:12 imirkin: ok, so you've been essentially running in the no-pcie mode no matter what
15:12 vpelletier: heh
15:12 karolherbst: :O
15:13 vpelletier: err
15:13 imirkin: ok. so let's try that again. undo my thing with nouveau_bo_rd/wr (stash it somewhere)
15:13 vpelletier: if (device->type == NVKM_DEVICE_AGP ||
15:13 vpelletier: this is how it reads here
15:13 imirkin: vpelletier: in nvkm/subdev/mmu/nv41.c ?
15:13 vpelletier: nvkm/subdev/mmu/nv41.c
15:13 imirkin: i may have a local patch ;)
15:14 imirkin: i was dicking around with that logic earlier and may have messed it up
15:14 imirkin: ah yes. in fact i very much did do precisely that. nevermind!
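For reference, the selection logic as it reads in vpelletier's tree (AGP test first, then the NvPCIE option from the quoted snippet) boils down to a small predicate. The enums and pick_mmu() are hypothetical stand-ins for the nvkm types; only the condition is taken from the code discussed above.

```c
#include <assert.h>
#include <stdbool.h>

enum bus_type { BUS_PCI, BUS_AGP, BUS_PCIE };   /* hypothetical */
enum mmu_impl { MMU_NV04, MMU_NV41 };           /* hypothetical */

/* AGP boards, or NvPCIE=0, fall back to the nv04 MMU. */
enum mmu_impl pick_mmu(enum bus_type bus, bool nvpcie_opt)
{
    if (bus == BUS_AGP || !nvpcie_opt)
        return MMU_NV04;
    return MMU_NV41;
}
```

Under this predicate, booting a PCIe G70 with NvPCIE=0 does switch it to the nv04 MMU path, which is what the earlier experiment was meant to test.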
15:18 imirkin: vpelletier: ok, so an interesting value to print out would be the fence value
15:18 vpelletier: fwiw I reverted the vsync patch (calling 16bits variants instead of 32) when we started working on fb
15:18 imirkin: e.g look at nv17_fence_sync
15:18 imirkin: yeah that's probably right
15:19 imirkin: oh, and another thing to check - try booting with nouveau.vram_pushbuf=1
15:22 vpelletier: lockup still there with vram_pushbuf=1, back to dsize=960
15:22 vpelletier: editing fence_sync
15:24 vpelletier: what should I print from nouveau_fence struct ?
15:24 vpelletier: or just print anything so fence becomes visible ?
15:26 imirkin: the sequence number is the interesting bit
15:26 vpelletier: (fwiw, it's a bit past midnight in my timezone, so I'm getting slower and slower)
15:27 imirkin: hmmmm ... i wonder where it reads that fence ...
15:27 imirkin: oh right, it just sticks waits in
15:27 imirkin: so it must be sticking a wait in, and the writes aren't happening
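The fence idea, reduced to its core: the GPU writes an increasing sequence number to memory as it completes work, and whoever waits compares it with the number it needs. A common way to compare 32-bit sequence numbers that survives wraparound is a signed difference. This is a generic sketch with a hypothetical function name, not the nv10/nv17 fence code; if the GPU never performs the write, the condition can never become true.

```c
#include <assert.h>
#include <stdint.h>

/* True once 'completed' (the last value the GPU wrote) has reached
 * 'wanted'. The signed difference keeps the test correct across 2^32
 * wraparound (assumes two's-complement conversion, as on all common
 * compilers). */
int fence_signalled(uint32_t completed, uint32_t wanted)
{
    return (int32_t)(completed - wanted) >= 0;
}
```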
15:28 imirkin: unfortunately i'm not wholly familiar with how this code SHOULD work
15:29 imirkin: sounds like you're in skeggsb's timezone though. try to bug him for help.
15:29 imirkin: he should be able to give you some good pointers.
15:32 vpelletier: mmh, I'm not getting my fence log
15:32 vpelletier: I may be doing something dumb
15:32 vpelletier: printk(KERN_INFO "nv17_fence_sync value = %i (before +2)\n", value);
15:33 vpelletier: this is what I added right below "value"'s assignment
15:33 imirkin: sounds reasonable...
15:33 vpelletier: as I have no nouveau_drm handy to continue abusing NV_ERROR
15:33 imirkin: yeah, printk is fine
15:35 vpelletier: I didn't include anything, btw, but gcc didn't complain (and I guess kernel.h gets imported at least indirectly in virtually every source file)
15:36 vpelletier: also, there is a nv04_fence.c file, and I've been editing nv04_fbcon.c before. maybe that's the interesting one ?
15:37 imirkin: oh crap
15:38 imirkin: fctx->base.emit = nv10_fence_emit;
15:38 imirkin: fctx->base.read = nv10_fence_read;
15:38 imirkin: fctx->base.sync = nv17_fence_sync;
15:38 vpelletier: oh
15:38 vpelletier: indeed
15:38 imirkin: the fence_sync is for switching between channels maybe? not sure.
15:39 imirkin: so there you want to print the fence->base.seqno and whatever that nvif_rd32 returns
15:40 vpelletier: there = ?
15:40 vpelletier: nv04 vs nv10 vs nv17
15:41 imirkin: nv10_fence_emit/read
15:41 imirkin: basically various functionality evolved throughout the various GPUs, among them - fencing mechanisms
15:41 imirkin: the later ones are substantially better than the previous ones, hence all the variants
15:41 imirkin: gtg
15:42 vpelletier: I'll post the result here, with a link to pastebin
15:42 vpelletier: I'll certainly sleep before you are back
15:48 vpelletier: mmh, disappointing: log not visible
15:49 vpelletier: in any case, here is the latest version of my code, over vanilla 4.11.5 (be8335f6b5ee1cbca72121dd495f922c72141b74): https://pastebin.com/6NYBu7Fe
15:51 vpelletier: and the syslog: https://pastebin.com/JUaV2rv9
15:54 vpelletier: -> Zzz
15:55 dboyan: imirkin: Why does the ValueDef in some instructions pointing to NULL value? I was seeing it in OP_EXIT
16:01 imirkin: dboyan: the instruction doesn't have any defs
16:09 dboyan: imirkin: But I was seeing one in gdb like "defs = std::deque with 1 element = {{value = 0x0, ..."
16:09 imirkin: probably coz of ssa? dunno
16:11 dboyan: well, I was testing my DAG building, and I didn't handle this case, so the program segfaults
16:11 dboyan: I guess it's safe to just ignore null values?
16:14 imirkin: well, if the instruction has ->fixed = 1
16:14 imirkin: then you can't touch it anyways
16:14 imirkin: any ->fixed = 1 instruction should act as a compiler barrier, i suspect
16:14 imirkin: (compiler barrier = instructions shall not move across the compiler barrier)
16:14 dboyan: yeah, I have handled that case
16:16 dboyan: I guess I can come up with some simple policies tomorrow and try how my code mess things up :)
16:16 dboyan: Haven't taken latencies, register pressure, etc. into account
16:17 imirkin: there should be a defCount or something?
16:18 imirkin: which may be 0
16:18 dboyan: defCount is just defs.size(), so it's 1 there
16:19 dboyan: there is another defCount that takes 2 arguments though
16:19 imirkin: hmmm
16:19 imirkin: yeah, i never remember how that stuff works
16:19 imirkin: thankfully it generally does and i don't have to care ;)
16:19 imirkin: find a place that iterates over all defs, and do the same thing
16:21 dboyan: well yeah, there is an exists() in both ValueRef and ValueDef that checks if value != NULL
16:23 imirkin: perhaps for a reason ;)
16:25 dboyan: well, defCount() seems never used for iterating, iterations are like 'for (int d = 0; defExists(d); ++d)'
16:25 imirkin: yeahhh that's right.
16:25 imirkin: that's what i was thinking of
16:39 dboyan: okay, issue fixed, will try some reordering tomorrow
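The idiom dboyan found, for (int d = 0; defExists(d); ++d), stops at the first slot whose value is NULL, so a def slot holding a null value (as seen on OP_EXIT) is never visited, even though defs.size() counts it. The codegen is C++; here is the same idea in C with a hypothetical fixed-size def array, to show why the two counts can disagree.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFS 4

struct value { int id; };                               /* hypothetical */
struct insn  { struct value *defs[MAX_DEFS]; size_t slots; };

/* Analogue of defExists(d): slot in range and pointing at a real value. */
int def_exists(const struct insn *i, size_t d)
{
    return d < i->slots && i->defs[d] != NULL;
}

/* Count defs the way the codegen loops visit them. */
size_t real_def_count(const struct insn *i)
{
    size_t d = 0;
    while (def_exists(i, d))
        ++d;
    return d;
}
```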
19:16 karolherbst: mhhh. fun, when I dual issue breaks, the inst_issued2 counter goes up, but perf decreases by around 1.5% if instructions aren't reordered for dual issuing, but perf goes up if they are reordered
19:19 karolherbst: ahhh there are some instructions too much issued, that makes sense
19:19 karolherbst: *many
19:23 karolherbst: mhh actually instructions get issued which most likely aren't executed at all
20:02 karolherbst: Lyude: mind checking something for me on a prime system you can test stuff on?
21:39 pmoreau: RSpliet: I fixed indirect loads/stores from/to shared memory. I’ll start cleaning up the code, before continuing on the events stuff.
21:53 karolherbst: pmoreau: ohh by the way, did I get a final answer regarding whether you plan/want to attend SHA2017?
21:54 karolherbst: it's like in 1.5 months and I somehow don't really plan on doing nouveau stuff alone there
22:24 RSpliet: pmoreau: nice!!
23:25 vpelletier: well that looks fun: https://pastebin.com/B1cF2k1f
23:25 vpelletier: and this may explain why I'm not getting my fence debug printk...
23:27 vpelletier: mmh... or these are normal values to get ?
23:27 imirkin: NV40_CHANNEL_DMA = 0x0000406e
23:28 imirkin: aka 16494
23:28 imirkin: oh hrm
23:28 imirkin: why does it keep going though?
23:28 vpelletier: oh, you're still up !
23:28 imirkin: hope you're already up, not still up ;)
23:29 imirkin: feels like that should have i < n && ret != 0
23:30 imirkin: although ... who knows
23:30 imirkin: it's probably fine
23:32 vpelletier: ah, but those breaks are for the switch statement, not for the "for" statement
23:32 vpelletier: so err on loop exit will always be for the last sclass
23:33 vpelletier: then again, for the last sclass among the oclass expected here, so should not matter :s
23:36 vpelletier: yeah, ret=0 at loop exit
23:36 vpelletier: and it only changes once in the loop, on the 16494 oclass entry
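The point about break is standard C: a break inside a switch leaves only the switch, so the enclosing for loop keeps running, and a variable assigned in the loop ends up holding its last assignment. A minimal sketch; the class list and function names are hypothetical, and only 0x406e (NV40_CHANNEL_DMA, 16494 decimal) comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

#define NV40_CHANNEL_DMA 0x406e   /* 16494 */

/* Returns how many entries were examined; only the matching class
 * changes *ret, as in the loop under discussion. */
size_t scan_classes(const int *cls, size_t n, int *ret)
{
    size_t visited = 0;
    *ret = -1;                     /* hypothetical "not found" */
    for (size_t i = 0; i < n; i++) {
        visited++;
        switch (cls[i]) {
        case NV40_CHANNEL_DMA:
            *ret = 0;
            break;                 /* leaves the switch, not the for loop */
        default:
            break;
        }
    }
    return visited;
}

/* Variant that stops at the first match by testing *ret in the loop
 * condition, along the lines of "i < n && ret != 0". */
size_t scan_classes_stop(const int *cls, size_t n, int *ret)
{
    size_t i;
    *ret = -1;
    for (i = 0; i < n && *ret != 0; i++) {
        if (cls[i] == NV40_CHANNEL_DMA)
            *ret = 0;
    }
    return i;
}
```

Both versions end with ret == 0 when the class is present; the first just keeps scanning to the end of the list.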
23:40 vpelletier: and it really should fence, right ?
23:41 vpelletier: it = some code before the "GPU lockup"
23:52 vpelletier: nv17_fence_context_new is called, once, and returns 0