02:14karolherbst: mwk: any idea what you did while creating nvc1-trace.txt.xz ?
02:14karolherbst: there is some pgraph clock gate stuff going on
02:29karolherbst: mupuf: seems like the PGRAPH CG CTRL is the only reg really touched besides in suspend/init parts
02:57mwk: karolherbst: that's been years ago and I didn't pay attention in the first place
02:57mwk: so, no :p
02:57karolherbst: I see :D
03:09pecisk: btw, stupid noobie question - why Nouveau doesn't use LLVM? Curious
03:15karolherbst: pecisk: ask imirkin :p
03:24pecisk: imirkin: why Nouveau doesn't use LLVM? :)
03:25RSpliet: pecisk: earlier generations of chips (< NV50) kind of naturally fitted well on top of TGSI
03:25RSpliet: with its 4-way SIMD ISA
03:26karolherbst: can anybody tell me what is happening here? https://gist.github.com/karolherbst/e5d0675d05c51c726893
03:26RSpliet: I *guess* that when the NV50 compiler was developed (mostly calim's work) LLVM was not decided upon in gallium and/or mature
03:26RSpliet: karolherbst: a lot is happening there
03:26RSpliet: all those *BLCG <= 0xd04b are initialisation values for the block-level clock gating units
03:27RSpliet: there's no magic calculation behind those values
03:27karolherbst: yeah I want to understand why this happens
03:27RSpliet: just poke and run
03:27karolherbst: I am interested in understanding why 0x020200 is set to full RUN
03:27karolherbst: 0x27724545 => 0x27724544 near the top
03:28karolherbst: 0x27724544 => 0x27724540 little later
03:28karolherbst: and then back again
03:28karolherbst: to 0x45
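[editor's note] The register values quoted above differ only in a few low bits. A minimal sketch for spotting which bits flip between two values from a trace follows. The helper name `changed_bits` is made up for illustration, and nothing here says what those bits mean in hardware; it only shows the arithmetic.

```python
def changed_bits(old, new):
    """Return the indices of the bits that differ between two 32-bit values."""
    diff = (old ^ new) & 0xFFFFFFFF
    return [bit for bit in range(32) if diff & (1 << bit)]


# The two transitions quoted in the log for 0x020200.
steps = [(0x27724545, 0x27724544), (0x27724544, 0x27724540)]
for old, new in steps:
    print(f"{old:#010x} -> {new:#010x}: bits {changed_bits(old, new)}")
```

With the values from the log, the first write clears bit 0 and the second clears bit 2; the later write "back to 0x45" sets both again.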
03:28karolherbst: I don't see this pattern in many traces
03:29karolherbst: mostly nvc1 only
03:29karolherbst: maybe other fermi cards as well
03:30karolherbst: and before that there is like nothing
03:33RSpliet: karolherbst: https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/drivers/gpu/nvgpu/gk20a/gk20a_gating_reglist.c
03:35karolherbst: RSpliet: somehow this doesn't help me understand _why_ the blob changes those regs
03:35karolherbst: what is the reason for the blob to do so
03:36karolherbst: this paste is some clock gate on=>off=>on sequence
03:36karolherbst: and I don't get why it does it
03:36karolherbst: something in there has to be important enough, that active clock gates actually do some harm
03:39RSpliet: I think some of it might need to be disabled before changing clocks
03:39RSpliet: you might want to start with nv50, the world was simpler back then, could be easier to recognise patterns :-)
03:39karolherbst: where does it change clocks?
03:40karolherbst: this is like the most important part: https://gist.github.com/karolherbst/e5d0675d05c51c726893#file-gistfile1-txt-L36-L114
03:41RSpliet: it invokes PDAEMON to do... stuff
03:41RSpliet: find out what it makes PDAEMON do, and perhaps you'll find your answer
03:42RSpliet: (there's some ptrs to partially-reverse-engineered PDAEMON firmware)
03:42RSpliet: (lots of manual labour, but can be rewarding ;-))
03:43karolherbst: I somehow think it isn't worth the effort to investigate this until we actually run into some kind of problem :/
03:43RSpliet: it might be worth seeing what that routine does regardless
03:44RSpliet: some say "pain" and "work", others say "challenge" and "general knowledge" :-P
03:45karolherbst: I bet this is some stupid silly stuff like: let's check how many ticks our gpu core does per second for real
03:45karolherbst: or something
03:45karolherbst: don't know
03:45RSpliet: more accurate load-measurements... possibly
03:45karolherbst: mhhh, for this you don't have to disable those clock gates :/
03:45RSpliet: aren't you at least a little curious now? :-D
03:46RSpliet: there goes my hope of karol becoming a fantastic scientist :-P
03:46karolherbst: you can't provoke me that easily :p
03:47karolherbst: I could check this on my nvc1 though
09:29AndrewR: imirkin, hello
09:30AndrewR: imirkin, it seems vp2 (or its parts) still running at very low speed on this nv92 even if I hacked a bit latest nouveau module from ben's tree to allow reclock
09:30imirkin: AndrewR: vp2 doesn't crash your G92?
09:31AndrewR: imirkin, maybe you remember some magic poke to put it into full speed (I test with mpeg2)
09:31imirkin: iirc vdec is a separate clock domain, perhaps we don't adjust it properly on reclock
09:31imirkin: but also iirc, vdec hangs G92's
09:32AndrewR: imirkin, yeah, I just updated my bug today (not test h264 again, assume it will likely to hang. will test ..soon)
09:32AndrewR: NOUVEAU_PMPEG=1 mplayer ~/botva/vid/dreamtime.mpg -vc ffmpeg12 -vo xvmc - this works fast
09:33imirkin: right, that doesn't use vp2 though
09:33imirkin: that uses PMPEG
09:34imirkin: also... what cpu do you have? and perhaps more importantly, what target do you build mesa for?
09:34imirkin: e.g. would SSE2 be auto-enabled (as it would be on any x86_64 build)
09:35AndrewR: imirkin, this is athlon x2 3800 , currently at 1Ghz * 2 cores
09:35imirkin: the thing is that vp2's accel is not very well suited to vdpau
09:35AndrewR: imirkin, mesa build was ..generic (no cflags added by me)
09:35imirkin: it only does idct/mc for mpeg (and vc-1)
09:36imirkin: is the target arch x86 or x86_64?
09:36imirkin: does the cpu have sse2?
09:36AndrewR: imirkin, I use this mesa build on another machine(s)
09:36imirkin: you could try building mesa with -msse2
09:37imirkin: iirc i got the vdpau stuff to be as fast as xvmc on my core i7-920
09:37AndrewR: imirkin, ok, will try this too ..but it should only affect speed, not hang, right?
09:37imirkin: but the big problem is that we're "competing" with super-duper-optimized libraries like ffmpeg for doing the VLD decoding, and we're just not as fast
09:38imirkin: ah yes, indeed. shouldn't hang.
09:38imirkin: but iirc G92's hang whenever you use BSP... which i guess you're not using for mpeg2 :)
09:38imirkin: don't remember if anyone tried VP on its own
09:38AndrewR: imirkin, because right now it seems to be bottlenecked on the videocard itself ..I see very little cpu load with vdpau. with working pmpeg - it's like 50+ % on both cores for VIDEO: MPEG2 1280x720 (aspect 3) 59.940 fps 45000.0 kbps (5625.0 kbyte/s)
09:39AndrewR: imirkin, I'm glad to test ..but by what kind of prog, exactly? yours re-vp2?
09:39imirkin: yeah, but re-vp2 only has h264 stuff
09:39imirkin: i did the mpeg stuff after i had h264 figured out already and didn't need those helpers
09:40imirkin: it was way simpler
09:40imirkin: (and having figured out h264 already certainly helped)
09:40AndrewR: imirkin, so, you tried it from hardest end first
09:40imirkin: but you're not going to get much better performance than with NOUVEAU_PMPEG on your gpu
09:42AndrewR: imirkin, anyway, as of now I'm definitely much more interested in getting it all to work correctly (not at 1/4 speed, nor hang on h264). additional speed ...well, nice...but after, if anyone finds any time for some optimizing
09:43imirkin: btw, there are also some sse2-related opts in the nouveau ddx for xv yuv <-> nv12 conversion
09:43imirkin: iirc i only flip them on #ifdef __SSE2__
09:43imirkin: feel free to send patches to make it runtime-detected
09:44AndrewR: imirkin, but then mplayer defaulted to vdpau if not told otherwise, and all installed correctly ...so I keep it this way
09:44imirkin: oh btw
09:44imirkin: i just realized you weren't using xvmc
09:44AndrewR: imirkin, first I'll try to bench them, on this 720p but 60 fps vid for example
09:45imirkin: er wait, nm
09:45AndrewR: VO: [xvmc] 1280x720 => 1280x720 MPEG1/2 Motion Compensation and IDCT
09:46imirkin: check the ffmpeg bit
09:46AndrewR: imirkin, ffmpeg inside mplayer sometimes changes defaults, I read help for -vc
09:47AndrewR: imirkin, "ffmpeg12mc ffmpeg problems FFmpeg MPEG-1/2 (XvMC) - deprecated, just use ffmpeg12 [mpegvideo_xvmc]" - from -vc help (but then my mplayer/ffmpeg not fully new)
09:49imirkin: ah heh
09:49imirkin: yeah, the ffmpeg12mc thing is old i guess
09:50imirkin: been a while :)
09:50imirkin: btw, you can greatly reduce cpu utilization if you use mplayer -quiet
09:50imirkin: those term updates can be quite expensive =/
09:50AndrewR: imirkin, nouveau 0000:05:00.0: fb: invalid/missing rammap entry - hopefully this doesn't mean it silently failed to reclock? (using commit 392c5d11201ac38737602ab5c5bc448268b02557 aka clk/g84: Enable reclocking for GDDR3 G94-G200 commit from Ben's 4.3 branch, slightly changed for allowing nv92 too)
09:51imirkin: check pstate for current clock info
09:51AndrewR: imirkin, yes, for 'production' film watching i'll use quiet
09:51AndrewR: imirkin, it seems updated
09:51imirkin: you're looking at the AC line right?
09:52imirkin: the *'s are there solely to mislead you :)
09:53imirkin: looks like memory stayed put
09:53imirkin: but the core/shader clocks are close-ish
09:55AndrewR: imirkin, perf counters stuff still not in mesa and even on ML?
09:56AndrewR: imirkin, I wonder if vp2 also counted ..or can be
09:56imirkin: ask hakzsam . i think he's waiting for the new libdrm nvif stuff to happen.
10:02AndrewR: imirkin, decode_frame still hang card a bit (as before).
10:07imirkin: AndrewR: right, don't see why it wouldn't :)
10:10AndrewR: imirkin, do you have any idea about "Mismatch on try 0 for insn 0x78f2167de1522d89" https://bugs.freedesktop.org/attachment.cgi?id=118979 ?
10:10AndrewR: imirkin, does this mean my vp2 not exactly as your
10:10imirkin: actually those are tests mwk wrote, not me
10:11imirkin: and they test the underlying vp engine
10:11imirkin: anyways, sounds like the engine's not really running
10:11imirkin: which is what causes the test failure
10:11imirkin: [and for the vp decoding thing to hang]
10:14AndrewR: imirkin, then I hope it turned out to be something simple ..not requiring rewriting nouveau 3rd time
10:17imirkin: there's just some engine enable bit missing somewhere
10:19pmoreau: Interesting work from Google on compiling CUDA… I'll have to dig it up! :-)
10:20imirkin: as long as nouveau can consume something that llvm can output, should be useful
10:22imirkin: skeggsb: btw, looks like that wayward buffer is a RT, which lends credence to some prime situation
10:22imirkin: skeggsb: or other bo sharing
10:23imirkin: skeggsb: could it be a buffer coming in from the ddx?
10:23AndrewR: imirkin, I just recall your work on nv17 (?) mpeg2 engine ...back then you said using it will require at least libdrm rewrite
10:23imirkin: AndrewR: yeah
10:24imirkin: AndrewR: the kernel end of it has been (semi-)done actually
10:24imirkin: AndrewR: but on your G92 you have an engine capable of doing the decoding just fine
10:25imirkin: i mean, it has the nv3x version of the engine which has a PFIFO interface
10:25imirkin: [actually it has the G80 version of the engine, but the differences are minor and not really related to the decoding itself]
10:26AndrewR: imirkin, I'll try to dig out this trace file from blob and replay it (I saved it on ext hdd it seems). Not sure if mmio replay is a good thing to do in general ..but if it will cure the hang - then it will be just some cut-it-down work on what exactly is missing on nouveau side ...not right now, still (have chat in another window)
10:43Caterpillar: how can I retrieve nouveau driver version?
10:46RSpliet: Caterpillar: there is no such thing
10:46RSpliet: Nouveau consists of various components, and generally follows the version numbering of the framework containing that component
10:47RSpliet: (not always though)
10:48Caterpillar: RSpliet: ok
10:48RSpliet: the question is: what do you *really* want to know?
10:48Caterpillar: RSpliet: I asked because I wanted to fill the Version textbox in https://bugs.freedesktop.org/show_bug.cgi?id=92192
10:49imirkin_: ignore that
10:49RSpliet: or fill in your kernel version
10:50RSpliet: (your dmesg already gives that away, so I'd say don't bother)
10:53AndrewR: brute force mmioreplay failed, haha (at least with nouveau loaded)
11:35hakzsam: imirkin_, yeah, I'm still waiting for nvif
12:05hakzsam: mupuf, you were right! Writing a feature matrix is just a pain :)
12:08imirkin_: cut down on the features
12:21mwk: cue non-C11 compliant compilers in 3... 2... 1...
12:22imirkin_: mwk: what was the idea behind vstream? to generate sample streams to feed to vdec engines?
12:22mwk: imirkin_: pretty much yes
12:22mwk: it worked, too
12:23mwk: for H.264, at least
12:23mwk: I don't think I ever got to test H.262
12:24mwk: "noreturn void"... why on earth they didn't make noreturn imply void...
12:24mwk: or even replace it, it's not like you could have a noreturn int
12:24imirkin_: mwk: btw, AndrewR verified that the vp2 hwtest fails on his G92 -- it looks like at least the macro engine isn't "going", probably none of the VP engine
12:24imirkin_: mwk: i think this is an issue on all G92's
12:25RSpliet: mwk: I'm scared of the semantics of that...
12:25RSpliet: "this function does not return an int"
12:25imirkin_: if you figure out how to turn the vp engine "on", that'd be much appreciated by all the G92 people out there who can't use h264 decoding
12:26mwk: umm it should already be on..
12:26imirkin_: mwk: yeah, should be
12:26imirkin_: but isn't :)
12:26imirkin_: it's only an issue on G92's
12:27imirkin_: everything else works fine
12:27RSpliet: imirkin_: does it have a clock?
12:27mwk: Attaching to program: /home/mwk/envytools/hwtest/hwtest, process 420
12:27mwk: ptrace: Operation not permitted.
12:27mwk: what the fuck...
12:28imirkin_: RSpliet: haven't the faintest clue. last time someone looked at it, the xtensa cpu was running though
12:28RSpliet: imirkin_: nvatiming should give an easy answer to that... I think?
12:28imirkin_: but any sort of things that caused interactions with the underlying engines immediately hung
12:28RSpliet: hmm, not sure about that actually
12:28RSpliet: maybe mupuf knows :-P
12:30mwk: I'm not certain I have an operational G92 right now
12:30imirkin_: oh well
12:30mwk: I kind of chose today to start getting my test machines out of mothballs
12:30mwk: so I have exactly 1 right now
12:31imirkin_: well, if you have instructions for AndrewR, i'm sure he'd be able to perform them
12:31imirkin_: mwk: see https://bugs.freedesktop.org/show_bug.cgi?id=82835 -- last attachment has his hwtest log
12:32mwk: well, something's definitely wrong with this vp2...
12:32mwk: seems like... MMIO reads failing
12:33mwk: there's such a thing as IBUS timeout
12:33mwk: which deals with max response time from registers
12:33mwk: perhaps it should be bumped
12:35mwk: imirkin_: well, that's my guess
12:35mwk: MMIO interface too slow
12:35mwk: and hits timeout
12:35mwk: maybe some clock along the way is ridiculously slow
12:35imirkin_: mwk: well, experimentally, the xtensa cpu is up and running and working fine
12:36imirkin_: mwk: but once you send it some command that will talk to the macro/hw engines, ka-boom
12:36mwk: xtensa CPU and macro/hw *are* on distinct clocks
12:36mwk: overzealous clock gating? also a possibility
12:39imirkin_: yeah. something. but it seems like all G92's have this issue, and no other cards i've heard of have it
12:40mwk: ok, I may look into it
12:40mwk: but no promises
12:40mwk: I need to stuff some progress bar into hwtest
12:41mwk: I can't tell if it's hung or just doing iteration #1234 out of 1 million...
15:09imirkin_: skeggsb: ping
15:11mupuf: mwk: nice to see you having fun :D
15:12imirkin_: skeggsb: i wonder if it's a situation where nouveau_ttm_fault_reserve_notify sticks a faulted-out vram rt buffer into gart memory instead of vram
15:13imirkin_: and then that buffer is subsequently shared
15:20hakzsam: mwk, any ideas why I can't assemble 'exit' with 'envyas -m nv50 kernel.asm', do I need more options ?
15:20hakzsam: I have this error "nv50_mp.asm:1.1-2.1: No match"
15:22imirkin_: hakzsam: pastebin the file
15:23imirkin_: hakzsam: also try adding -V g84 -O cp
15:23hakzsam: imirkin_, the file only contains 'exit'
15:23hakzsam: same error with "envyas -m nv50 -V g84 -O cp kernel.asm"
15:23imirkin_: is there an exit on nv50?
15:24imirkin_: hakzsam: http://hastebin.com/yomawaroji.rb
15:24imirkin_: it's an instruction flag
15:24hakzsam: why the nop is needed?
15:24imirkin_: nop is the instruction
15:24hakzsam: it is not with gf100 as
15:25imirkin_: exit is just a flag
15:25imirkin_: yeah, on nvc0 there's an actual exit instruction
15:25hakzsam: ah okay
15:25imirkin_: you could just as well have like exit add $r1 $r2 $r3
15:26imirkin_: er i guess not with add
15:26imirkin_: only with some ops :)
15:26imirkin_: maybe only with long ops?
15:27hakzsam: I don't know :)
15:27imirkin_: you can always look at g80.c and see how it's all defined
15:27imirkin_: i'm too lazy though
15:27hakzsam: yeah, I'll have a look
15:27imirkin_: remember that nv50 isa is different between the shader types
15:27imirkin_: so you have to make sure to target the right shader
15:28hakzsam: 'cp' seems the one I want
15:28hakzsam: for compute
15:29imirkin_: that lets you access gmem
15:31imirkin_: sadly ARB_ssbo requires gmem access from fragment programs as well
15:31imirkin_: so ARB_compute_shader is not going to happen on nv50
15:33hakzsam: does the blob support ARB_compute_shader?
15:33imirkin_: not on nv50
15:34imirkin_: it of course does on nvc0
15:34imirkin_: (it's core in GL 4.3)
15:38hakzsam: mmh "tidx" doesn't seem to be a "special reg" on nv50
15:39imirkin_: hakzsam: physid
15:39hakzsam: physid is tidx on nv50?
15:41imirkin_: check handleRDSV()
15:43hakzsam: oh nice!
15:43imirkin_: physid = first 16 bits: x, next 10 bits = y, last 6 bits = z
15:43imirkin_: that stuff all reads out of shared memory
15:43imirkin_: not 100% sure who writes in there :)
15:44imirkin_: maybe it's magically written by the shader executor logic
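[editor's note] The physid layout described above (16 bits x, 10 bits y, 6 bits z) can be unpacked with plain shifts and masks. This is only a sketch: it assumes "first" means the least significant bits, which the log does not state explicitly, and the function name `decode_physid` is invented for the example.

```python
def decode_physid(physid):
    """Split a 32-bit physid into (x, y, z).

    Assumed layout (an interpretation of the log, not confirmed):
      bits  0..15 -> x (16 bits)
      bits 16..25 -> y (10 bits)
      bits 26..31 -> z (6 bits)
    """
    x = physid & 0xFFFF
    y = (physid >> 16) & 0x3FF
    z = (physid >> 26) & 0x3F
    return x, y, z


# Pack x=7, y=5, z=3 using the same assumed layout, then unpack it again.
packed = (3 << 26) | (5 << 16) | 7
print(decode_physid(packed))
```

If the fields turn out to be packed from the most significant end instead, only the shift amounts change; the field widths still add up to 32 bits.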
16:20Boohbah_: fixes my plasmashell
16:20Boohbah_: imirkin_: genius :)
16:21imirkin_: glad it worked
16:21imirkin_: it's some very obscure situation happening here... i think i might know why but waiting for skeggsb to appear
16:24Boohbah_: maybe duplicates
16:25imirkin_: yeah, i had originally misdiagnosed the issue as running out of vram
16:27imirkin_: pinged those bugs, perhaps we'll get some more confirmations
21:46imirkin: skeggsb: actually after a bit more thought i think that my patch in https://bugs.freedesktop.org/show_bug.cgi?id=92504#c20 might actually be correct, even for the pre-nv50 case. what do you think?
23:29imirkin: mlankhorst: when writing your fence/pushbuf -> context patches, did you think about cross-channel buffer usage?
23:30imirkin: mlankhorst: specifically the case i'm thinking of is when you have one buffer with a fence from one context, and then that same buffer gets used in another context and gets another fence attached to it
23:31imirkin: mlankhorst: this will "forget" the first fence, and since those fences are in diff contexts, they might get flushed out at different times
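[editor's note] The concern above can be shown with a toy model. This is not nouveau or Mesa code: the class names, the `(context, seqno)` fence tuples and the `completed` dictionary are all hypothetical, chosen only to show how a buffer that keeps a single fence slot "forgets" an earlier fence from another context.

```python
class SingleFenceBuffer:
    """Toy buffer that remembers only the most recently attached fence."""

    def __init__(self):
        self.fence = None

    def attach(self, fence):
        # Overwrites whatever was there, even a fence from another context.
        self.fence = fence

    def is_idle(self, completed):
        ctx, seqno = self.fence
        return completed.get(ctx, 0) >= seqno


class PerContextFenceBuffer:
    """Toy buffer that keeps the latest fence for every context."""

    def __init__(self):
        self.fences = {}

    def attach(self, fence):
        ctx, seqno = fence
        self.fences[ctx] = max(seqno, self.fences.get(ctx, 0))

    def is_idle(self, completed):
        return all(completed.get(ctx, 0) >= seqno
                   for ctx, seqno in self.fences.items())


# Context A still has work pending (fence 5), context B already finished fence 1.
completed = {"A": 0, "B": 1}

single = SingleFenceBuffer()
multi = PerContextFenceBuffer()
for buf in (single, multi):
    buf.attach(("A", 5))
    buf.attach(("B", 1))

print(single.is_idle(completed))  # only looks at B's fence
print(multi.is_idle(completed))   # also waits for A's fence
```

In this model the single-slot buffer reports idle while context A is still using it, which is the "forgotten" fence case described in the log; tracking one fence per context is just one possible way to avoid it.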