00:24 phillipsjk256: OK, tested tux racer with Randr. about 3x performance improvement over xinerama. NV driver is still 50% faster.
00:24 phillipsjk256: Bonus: tux did not clobber my screen settings.
00:31 phillipsjk256: I wonder if that is because I explicitly set them in xorg.conf (in the past, the problem was that everything would revert to mirroring)
05:20 karolherbst: who is koriakin in IRC?
05:20 mwk: that'd be me
05:21 karolherbst: ahh k
05:21 karolherbst: yeah, the values I found are a bit strange
05:21 karolherbst: the CARD_LNK_CTL2.SPEED reg seems to adjust bits 0-1 according to the PCIe version the card is currently in
05:21 karolherbst: if I put my kepler into v1 mode, the value jumps down to 0
05:22 karolherbst: if I put it into v2 mode, it goes to 1
05:22 karolherbst: but LnkCtl2 is at 2.5 in v2 mode then
05:22 karolherbst: usually v1-only cards have a value of 0 there, and cards which are put into v2 mode also get a 1 there (tesla cards)
05:24 karolherbst: strange that the reg is there on G84 though
05:24 karolherbst: I guess it is, but it's always 0 and the blob doesn't read it
05:27 mwk: seems so
05:28 karolherbst: I was a bit worried about this, too, because I didn't see the PCIe version relation
05:28 karolherbst: 0 and 1 for the same thing
05:29 karolherbst: there are some cards where the blob does this v1->v2 transition, will check again on these traces
05:34 karolherbst: mwk: any better name for 2_5GT_V1 then?
05:35 mwk: wasn't it called gen2?
05:35 karolherbst: mhh
05:36 karolherbst: I always think gen2 means like a card which can do v2, but usually gen2 == V2?
05:36 mwk: no idea, I thought the protocol was called gen2
05:36 mwk: anyway, _V2 is fine
05:37 mwk: could use a comment though
05:37 karolherbst: usually I see v in more "official" stuff
05:37 karolherbst: yeah
05:38 karolherbst: yeah the term "version" is used in pcie specs and not generation
05:38 mwk: v2 it is, then
05:39 karolherbst: anyway, these PCIe v3 cards can also do v1, so it seems like this is all brutally compatible with older versions :D
05:40 karolherbst: what seems strange is that the card doesn't have a v3 switch at all
05:40 mwk: brutally compatible, eh
05:40 karolherbst: this has been bothering me for a long time already
05:40 karolherbst: yeah well
05:41 karolherbst: the spec seems to be pretty strict about compatibility
05:41 karolherbst: an x16 card should also work in an x1 slot if you want to put it there
05:41 mwk: don't believe everything you hear...
05:41 karolherbst: and with working I mean, it shouldn't break ;) the card can still put some "I refuse work" picture on your display
05:42 mwk: IME cards need all these extra ground/power connectors or will just refuse to talk to you
05:42 karolherbst: mhh therre are there in x1 slots
05:42 karolherbst: *they are there
05:42 mwk: I *tried* to make use of these damned slots
05:42 karolherbst: ohh I see
05:43 mwk: you mean electrically x1 or physically x1?
05:43 karolherbst: but the power pins are all in the first 10 pins
05:43 karolherbst: physically x1
05:44 karolherbst: but I think the spec allows this 75W power mode for x16 gpus only
05:44 karolherbst: so yeah
05:44 karolherbst: although the power stuff is also there on smaller slots
05:44 mwk: hmm.
05:45 karolherbst: ohhh
05:45 mwk: maybe the prsnt thing was the problem then
05:45 karolherbst: the slot itself tells the card how big it is? nice
05:45 mwk: in any case...
05:45 mwk: before you put the saw to the motherboard, think twice
05:45 mwk: I can attest that the GPUs are not happy about such arrangements
05:45 karolherbst: yeah, of course :D
05:46 mwk: for whatever reason
05:46 karolherbst: mhh
05:46 mwk: it's not like I have a logic analyzer on hand
05:46 mwk: maybe I should have...
05:46 karolherbst: do they refuse entirely or print something on the display?
05:46 mwk: entirely
05:46 karolherbst: mhh
05:46 karolherbst: they shouldn't if they follow the spec
05:46 mwk: if it gets as far as putting something on the display, that already counts as working :p
05:46 karolherbst: the spec clearly states, they should begin at 25W mode and then jump to 75W if possible
05:47 mwk: though I do have a funny nv30 AGP card
05:47 mwk: if you neglect to connect the extra power plug, it puts random colors on the display
05:47 karolherbst: ....
05:47 karolherbst: mhh
05:47 mwk: as if the RAMDAC was unpowered or something
05:47 mwk: or maybe memory
05:47 mwk: scary shit
05:47 karolherbst: strange
05:48 karolherbst: the old GTX 560 I had just printed a nice notice on the display when I forgot them
05:48 mwk: GTX 560 is several generations newer than that :p
05:48 karolherbst: I know
05:48 mwk: and yeah, they're better behaved now
05:49 mwk: either print a message, or make a hellish beeping noise that will make you correct the problem immediately
05:49 karolherbst: mhhh
05:50 karolherbst: do you have a G84 card?
05:50 karolherbst: I bet the reg just contains 0x0 there?
05:50 karolherbst: the 0x0880a8 reg
05:51 karolherbst: and what do you mean by "this should have some variants" ?
05:51 karolherbst: ohh you mean because 8.0 isn't supported on all cards
05:51 karolherbst: mhhh
05:52 mwk: the reg contains 0
05:52 mwk: the low 2 bits aren't settable
05:53 mwk: *but* bits 4-5 of that reg are settable
05:53 mwk: so, the reg exists, the bitfield... maybe, maybe not
05:53 karolherbst: the 0-1 bits are only settable in v2 mode
05:53 karolherbst: otherwise they will still be 0 even on my kepler
05:53 karolherbst: seems like it is one of the smart regs
05:54 karolherbst: but there are more bits on newer cards there as well
05:54 karolherbst: 0x1f0000 and 0x3f0000 on kepler
05:54 karolherbst: or 0x10000 on a nv108
05:55 karolherbst: 0x1e0000 on a nv117
05:55 karolherbst: no idea what it means though
05:57 karolherbst: mwk: but do you really want a variant for the 8.0 value?
05:58 mwk: would be nice
05:59 mwk: but if you're not able to find it quickly, screw it
05:59 karolherbst: usually all PCIe v3 cards?
05:59 karolherbst: but I don't know which is the first chipset with it
08:41 phillipsjk256: So upon restoring from sleep mode, my card appeared to be doing scan-out at the wrong frequency or bit-depth.
08:43 phillipsjk256: The screen went all rainbow static. I blindly switched to the console and that was scrambled as well (but black and white)
08:45 phillipsjk256: I even captured the output of dmesg before restarting. apparently the memory was clocked at only 60Mhz
08:48 phillipsjk256: technically that may be enough for scan-out, but I am not sure. (my old Virge had a similar clock speed)
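Whether a 60 MHz memory clock can sustain scan-out comes down to comparing the bytes the display engine must fetch per second with the raw memory bandwidth. The sketch below does that arithmetic in C. The 1920x1080, 32 bpp, 60 Hz mode and the 128-bit DDR bus are assumptions chosen for illustration, not values from the log or from this card.

```c
#include <stdint.h>

/* Bytes per second the display engine reads to refresh one framebuffer:
 * every pixel is fetched once per refresh. Ignores blanking, cursor and
 * overlay planes, and any framebuffer compression. */
static uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel,
                                      uint32_t refresh_hz)
{
    return (uint64_t)width * height * bytes_per_pixel * refresh_hz;
}

/* Theoretical peak memory bandwidth: clock * bus width in bytes *
 * transfers per clock (2 for DDR-style memory). Sustained bandwidth
 * is always lower than this. */
static uint64_t mem_bytes_per_sec(uint64_t clock_hz, uint32_t bus_width_bits,
                                  uint32_t transfers_per_clock)
{
    return clock_hz * (bus_width_bits / 8) * transfers_per_clock;
}
```

With those assumed figures, scan-out needs about 0.50 GB/s against a theoretical peak of 1.92 GB/s, so on paper 60 MHz could be enough; the real margin depends on the card's actual bus width and memory type.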
09:00 phillipsjk256: apparently the floppy drive did not wake properly either.
09:05 RSpliet: pmoreau: that NV94 of yours is hardly a challenge :-P
09:06 imirkin: RSpliet: older cards don't have as many reclocking options
09:06 RSpliet: imirkin: I was only semi-serious
09:08 karolherbst: is there something like bool in linux?
09:08 RSpliet: but to bring it slightly more positive: it reclocks nicely now :-)
09:09 imirkin: airlied: skeggsb: benh: with the nouveau from 4.2, everything appears to be fine. which is extremely odd.
09:09 imirkin: perhaps the card got into some very sad state
09:09 RSpliet: karolherbst: there are very few cases in the kernel where you'd want a bool and not an integer
09:09 karolherbst: pcie_cap_speed: not full speed or full speed
09:09 karolherbst: there isn't something else
09:10 RSpliet: I'm not saying there's no case, but... well, normally integers are used
09:10 karolherbst: but why is "bool" not as good as int in general?
09:11 karolherbst: I know that in pre C99 bool was a mess, ...
09:11 RSpliet: well, functions returning a bool are not as informative as functions returning an integer
09:11 karolherbst: ahh, because of error state?
09:11 RSpliet: exactly
09:11 karolherbst: mhh
09:11 RSpliet: and since they both should take 32 bits of space
09:11 RSpliet: (because unaligned memory access is a spawn of the devil)
09:11 RSpliet: I don't think there's been much incentive for adding bools
09:12 karolherbst: okay, then I won't use bool as a return type
09:12 karolherbst: but bool parameter should be fine I guess
09:14 RSpliet: judging by nvkm_clk_create, it should be fine
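The convention discussed here (an int return that can carry a negative errno alongside a yes/no result, while a plain bool stays acceptable as a parameter) might look like this in C. The function names and the use of bits 0-1 as the speed field are hypothetical illustrations, not nouveau code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns 1 if the (hypothetical) speed field in bits 0-1 is nonzero,
 * 0 if it is zero, or a negative errno if there is nothing to read.
 * A bool return value could not carry the error case. */
static int pcie_is_full_speed(const unsigned int *reg)
{
    if (!reg)
        return -ENODEV;
    return (*reg & 0x3u) != 0;
}

/* A bool parameter is fine: the caller only ever means yes or no.
 * The return value is still an int so failures can be reported. */
static int pcie_set_full_speed(unsigned int *reg, bool full)
{
    if (!reg)
        return -ENODEV;
    *reg = (*reg & ~0x3u) | (full ? 1u : 0u);
    return 0;
}
```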
09:15 RSpliet: but... what does "full speed" mean?
09:15 karolherbst: I think I use too many function pointers and check against NULL too often
09:15 karolherbst: but I don't want a NULL dereference...
09:15 RSpliet: and what does "not full speed' mean?
09:15 karolherbst: not full speed is 2.5, and full speed is either 5.0 or 8.0
09:15 karolherbst: v2 and v3 cards use the same gpu reg value
09:16 karolherbst: but the pcie reg will show different speeds
09:16 RSpliet: hmm ok
09:16 karolherbst: and the blob also uses both values only
09:16 karolherbst: since tesla
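Taken together, the observations above suggest a decoding along these lines: the GPU-side field reads 0 for 2.5 GT/s and 1 for "full speed", and full speed only becomes 5.0 or 8.0 GT/s once the card's maximum PCIe version is known. This is a hypothetical sketch; the enum, the function name and the treatment of field values other than 0 and 1 (only those two were observed) are assumptions.

```c
/* Hypothetical speed enum for illustration. */
enum pcie_speed { PCIE_SPEED_2_5, PCIE_SPEED_5_0, PCIE_SPEED_8_0 };

/* Decode bits 0-1 of the GPU-side speed field. The field alone cannot
 * tell 5.0 from 8.0 GT/s, so the caller must supply the card's maximum
 * PCIe version from another source. Any nonzero value is treated as
 * "full speed" (an assumption: only 0 and 1 were observed). */
static enum pcie_speed decode_gpu_speed(unsigned int reg, int max_version)
{
    if ((reg & 0x3u) == 0 || max_version < 2)
        return PCIE_SPEED_2_5;
    return max_version >= 3 ? PCIE_SPEED_8_0 : PCIE_SPEED_5_0;
}
```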
09:16 RSpliet: so how can you distinguish between the two in nouveau?
09:17 karolherbst: I don't have to
09:17 RSpliet: that's not what I asked ;-)
09:17 karolherbst: at loading time I just bump pcie version, pcie link cap and pcie link ctl to max speed
09:17 karolherbst: this is what the blob does at least for my kepler
09:18 karolherbst: link ctl can stay lower for fermi and older cards for reasons I don't fully understand
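The load-time sequence described above can be modelled on a plain struct: raise the presented version to v2, then raise the link capability and, where it matters, the link control speed to the card's maximum. The struct, the speed units (tenths of GT/s) and the `ctl_matters` flag are hypothetical; real code writes GPU registers whose offsets are not given here.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the three settings; speeds are in tenths
 * of GT/s (25, 50, 80) to avoid floating point. */
struct pcie_state {
    int version;    /* PCIe version the card presents (1, 2 or 3) */
    int cap_speed;  /* advertised link capability */
    int ctl_speed;  /* link control target speed */
};

/* Raise the version to at least v2, then raise cap and (if it matters
 * for this chipset, as observed on Kepler) link control to max_speed.
 * On older cards link control may be left lower. */
static void pcie_speed_init_sketch(struct pcie_state *s, int max_speed,
                                   bool ctl_matters)
{
    if (s->version < 2)
        s->version = 2;
    s->cap_speed = max_speed;
    if (ctl_matters)
        s->ctl_speed = max_speed;
}
```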
09:19 karolherbst: RSpliet: or what do you mean by "between the two" ?
09:19 karolherbst: you mean the same value for 5.0 and 88.0 or something else?
09:19 karolherbst: *8.0
09:19 RSpliet: can you tell in nouveau after setting the max link cap, what the result was (5.0 vs. 8.0)?
09:20 karolherbst: mhhhh
09:21 RSpliet: sorry for the distraction btw, I'm sure it's not required to get your patches carried into nouveau
09:21 karolherbst: no, it's a good question, that's a problem I haven't solved yet
09:21 karolherbst: I need to find a good and stable way to determine the max speed the card can be bumped to
09:22 karolherbst: sadly it's not completely clear to me what I have to check against
09:22 karolherbst: but the cap doesn't matter there as much as other things
09:22 karolherbst: if the cap stays at 2.5, I clearly can't go higher than that
09:25 RSpliet: pmoreau: FPS in portal 76,7->100,6
09:25 karolherbst: RSpliet: what did you do? :D
09:26 RSpliet: fixed reclocking for his card
09:26 karolherbst: ahh I see
09:28 RSpliet: which is unfortunate, because now I ran out of cards again
09:31 karolherbst: do you have a kepler card?
09:31 karolherbst: with gddr5? you could test my gddr5 patches ;)
09:32 imirkin: karolherbst: the only gddr5 tesla is the one i have, nva3 + gddr5
09:32 imirkin: not all nva3's have gddr5
09:32 imirkin: but a few do
09:32 karolherbst: :/
09:32 karolherbst: mhh
09:33 karolherbst: yeah, but I asked about kepler ;)
09:33 RSpliet: and they're messed up beyond imagination... imirkin: does it even work?
09:33 RSpliet: karolherbst: nope, the only kepler I have is DDR3
09:33 karolherbst: k
09:34 imirkin: RSpliet: yeah works fine now
09:35 RSpliet: interesting, I recall Ben saying his was still not stable or good
09:36 imirkin: yeah.... it'll probably have more issues if i try to drive a display off of it
09:37 RSpliet: but who does that with a graphics card anyway
09:37 huehner: karolherbst: i have a kepler nve7 with gddr5 here (optimus laptop)
09:37 imirkin: RSpliet: right, it's a fairly rare use-case ;)
09:37 imirkin: at least for me!
09:38 imirkin: skeggsb: just for reference, this is what dmesg looks like with nouveau 4.2 (+ of patch): http://hastebin.com/zisotebeyu.css
09:38 karolherbst: huehner: and pstate 0f is unstable for you?
09:38 huehner: karolherbst: never tried anything with it, if you need some specific testing let me know
09:39 karolherbst: huehner: try changing to 0f pstate with stock nouveau and try to run stuff and change while running stuff and so on
09:39 karolherbst: and then try again with this patch: https://github.com/karolherbst/nouveau/commit/b4224088f66e848b01caa3eac35fd6e4d390968f
09:39 karolherbst: if it feels more stable with the patch, I am happy
09:39 imirkin: i suspect he doesn't have pstate=1 set ;)
09:39 karolherbst: yeah, right
09:40 karolherbst: this is also needed
09:40 karolherbst: and current nouveau tree
09:40 karolherbst: aka after big cleanup
09:42 karolherbst: huehner: would be also nice to know where you are with your card compared to blob, I am usually around 65% blob speed at 0f
09:42 huehner: karolherbst: will try to do later tonight, i assume nouveau.ko from darktama's repo on top of 4.1 or 4.2 is enough?
09:43 karolherbst: yeah
09:43 huehner: karolherbst: any example perf-test to do that with?
09:43 karolherbst: games?
09:43 imirkin: huehner: glxspheres is a pretty good measure of memory bw
09:43 karolherbst: I usually test with borderlands pre-sequel/2, bioshock infinite, talos principle, antichamber
09:43 karolherbst: imirkin: and gpu core
09:44 karolherbst: but only as a bottleneck factor
09:44 karolherbst: if pcie and memory are clocked really high, you see the perf drop when the core gets too slow
09:44 huehner: karolherbst: will try to find some time later, to set something up
09:44 karolherbst: nice, thanks
09:45 karolherbst: I am still waiting for the first one to tell me that the patch doesn't work
09:46 imirkin: grrrrrrrrr
09:46 imirkin: why didn't i test this in the first place
09:46 karolherbst: ? :D
09:46 imirkin: skeggsb: your rewrite broke BE. don't ask me how. but with 4.2, regular colors, with 4.3, messed up colors
09:47 karolherbst: happy git bisect :)
09:48 imirkin: not _too_ much fun
09:48 imirkin: since i have to keep applying patches
09:49 imirkin: [and also with 4.3, i need the extra byteswap]
09:49 imirkin: benh: fyi: --^
09:49 karolherbst: I see
10:07 karolherbst: pbus->max_bus_speed seems to be independent of anything that's in the card. even if my kepler sets everything down to 2.5, this is still at 8.0
10:57 karolherbst: imirkin_, RSpliet: is it fine to set the PCIe speed to 5.0 when 8.0 is not possible, but requested and current speed is at 2.5?
10:58 imirkin_: karolherbst: yea
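The fallback agreed on here (program 5.0 GT/s when 8.0 is requested but not achievable) amounts to clamping the request to what the link supports. A minimal sketch, with speeds in tenths of GT/s and a hypothetical helper name:

```c
/* Pick the speed to program: the requested speed, clamped to the
 * highest speed the card/slot supports. Returns 0 when the link is
 * already at that speed, so the caller can skip touching the hardware.
 * Speeds are in tenths of GT/s (25, 50, 80). */
static int pcie_pick_speed(int requested, int max_supported, int current)
{
    int target = requested < max_supported ? requested : max_supported;

    return target == current ? 0 : target;
}
```

For example, `pcie_pick_speed(80, 50, 25)` yields 50: the 8.0 request is unachievable, so the link is raised from 2.5 to 5.0 instead.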
11:10 karolherbst: this would be the generic code so far (there are many func pointers, but I will look which one can be removed later after I finished tesla and fermi, too): https://github.com/karolherbst/nouveau/commit/553445f6089442ee581db6ee7bf4d01ac918a8bd
11:12 karolherbst: and I should use nvkm_error sometimes, too
11:13 imirkin_: you should upgrade to v2 on init
11:13 karolherbst: yeah
11:13 karolherbst: I know
11:14 karolherbst: its down below: pci->func->pcie_speed_init(pci);
11:14 imirkin_: ok cool
11:14 karolherbst: I just check if the version is 2+
11:14 karolherbst: that's the kepler magic: https://github.com/karolherbst/nouveau/commit/8d971a9b73d7f7ed860663fa4fb081f63a1cb4c3
11:14 karolherbst: I figured I should move some bits into chipset code, because they behave slightly different
11:15 karolherbst: for example: control speed matters for kepler, but not for tesla/fermi?
11:15 imirkin_: gk104_pci2_mask(pci, GK104_PCIE_SPEED_OFFSET, GK104_PCIE_SPEED_MASK, mask_value)
11:16 imirkin_: i'd stay away from using these defines for now... the rest of the code doesn't have them
11:16 karolherbst: even if I use them more than twice?
11:16 imirkin_: even if you use them 100x
11:16 karolherbst: k
11:16 imirkin_: also instead of reading it in twice
11:16 imirkin_: i'd do a single read
11:16 imirkin_: then mask in the bits
11:16 imirkin_: write, then mask the 1 in, and write
11:16 karolherbst: where do I read them twice?
11:17 imirkin_: mask = read + write
11:17 karolherbst: ohh
11:17 karolherbst: so I should do rather read, write, write
11:17 imirkin_: imho yes
11:17 imirkin_: not a huge deal though
11:18 imirkin_: this isn't exactly a perf-sensitive bit
11:18 karolherbst: mhh
11:18 karolherbst: the blob also reads twice
11:18 imirkin_: hehe ok
11:18 karolherbst: but I think this is for verification
11:19 karolherbst: ohh I was wrong
11:19 karolherbst: the blob does this: write, read, write, read
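The difference imirkin_ points out can be made concrete with a fake register that counts accesses: a mask helper is a read followed by a write, so two mask calls cost two reads, while reading once and building both values from that read costs one. The register layout, including which bit is "the 1" set in the second write (bit 2 here), is hypothetical.

```c
#include <stdint.h>

/* Simulated MMIO register with access counters, standing in for the
 * real GPU register. */
struct fake_mmio {
    uint32_t reg;
    int reads, writes;
};

static uint32_t rd32(struct fake_mmio *m) { m->reads++; return m->reg; }
static void wr32(struct fake_mmio *m, uint32_t v) { m->writes++; m->reg = v; }

/* Read-modify-write mask helper: one read plus one write. */
static uint32_t mask32(struct fake_mmio *m, uint32_t clear, uint32_t set)
{
    uint32_t old = rd32(m);

    wr32(m, (old & ~clear) | set);
    return old;
}

/* Two mask calls: read, write, read, write. */
static void set_speed_two_masks(struct fake_mmio *m, uint32_t speed)
{
    mask32(m, 0x3, speed);  /* speed field in bits 0-1 */
    mask32(m, 0x4, 0x4);    /* hypothetical follow-up bit */
}

/* Suggested variant: read once, then write twice. */
static void set_speed_single_read(struct fake_mmio *m, uint32_t speed)
{
    uint32_t v = rd32(m);

    v = (v & ~0x3u) | speed;
    wr32(m, v);
    v |= 0x4;
    wr32(m, v);
}
```

Both paths leave the same value in the register; as noted above, the saved read is not performance-relevant here, so the choice is mostly about readability.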
11:38 Soukyuu: (hopefully) a quick question: the status page says that most 3D operations on NV50 are slow because power management is not properly implemented - can I just disable it and get the "full" performance?
11:38 Soukyuu: thing is, my GPU seems to be only using the highest power state anyway, either on windows or linux
11:40 imirkin_: Soukyuu: in that case there should be no slowness
11:40 imirkin_: Soukyuu: most teslas tend to boot to a high pstate, so it's often not a problem... except on laptops
11:40 imirkin_: starting with fermi, they boot to lower pstates
11:41 Soukyuu: interesting. I've tried switching to nouveau earlier this year, and mpv opengl playback at my settings stuttered horribly, sadly
11:41 karolherbst: why opengl backend?
11:41 imirkin_: probably for totally unrelated reasons
11:41 Soukyuu: mainly for interpolation
11:41 Soukyuu: (aka frame blending)
11:41 imirkin_: Soukyuu: vdpau has advanced interp...
11:42 Soukyuu: it's what helps to at least reduce judder of playing 23.97fps content at 60fps
11:42 karolherbst: yeah, I would also use either vaapi or vdpau, depending on the card I use
11:42 imirkin_: anyways, i've definitely heard reports of suckage when using GL_NV_vdpau_interop
11:42 imirkin_: i haven't investigated due to lack of caring
11:43 Soukyuu: the card is pretty old now, is someone even still working on NV50?
11:43 imirkin_: Soukyuu: or are you saying that you were using sw decode and still had trouble with the opengl backend?
11:43 imirkin_: if so, that'd be surprising
11:43 karolherbst: yeah, will probably break it....
11:43 karolherbst: ;)
11:44 Soukyuu: imirkin_: I'm using --vo=opengl-hq:interpolation, I think that's sw decode
11:44 karolherbst: Soukyuu: is it really bad with vdpau?
11:45 Soukyuu: karolherbst: it's only really bad with :interpolation, and it's only usable when using --vo=opengl
11:45 karolherbst: ahh I see
11:45 Soukyuu: without interpolation, stuff is not stuttering, but there is judder then
11:45 Soukyuu: so sadly, hwdec is not viable for me
11:45 imirkin_: Soukyuu: well, if you want to play around with it again, we can do some debugging
11:46 Soukyuu: sure, that would be nice
11:46 imirkin_: Soukyuu: vdpau (with hwdec) + advanced interp has worked very well for me for interlaced content, and i've never noticed any issues playing back regular progressive content without any additional settings
11:46 imirkin_: and i'm extremely picky about that stuff
11:46 Soukyuu: it's a different kind of interpolation
11:47 Soukyuu: it's close to injecting interpolated frames as mvtools does it
11:47 imirkin_: so i'm guessing there's something else in your setup that's upsetting things
11:47 karolherbst: wild guess: did you check cpu usage while watching?
11:47 imirkin_: well presumably this all works fine on blob drivers?
11:47 Soukyuu: but instead of interpolating frames, it blends them - it's to get 23.97fps (progressive) to 60fps
11:47 Soukyuu: and yes
11:47 Soukyuu: blob works
11:48 Soukyuu: karolherbst: can't remember what the usage was, will switch and test it again
11:48 karolherbst: this sounds strange
11:49 karolherbst: maybe the interpolation bits are also coded in OpenGL and mesa can't keep up with the requested speed?
11:50 Soukyuu: might be, since it requires you to use the openGL backend
11:50 Soukyuu: https://github.com/mpv-player/mpv/wiki/Interpolation <- wiki article, the one I'm having troubles with is the smoothmotion version
11:51 karolherbst: uhh, sounds demanding
11:52 karolherbst: will test it on my intel card nevertheless
11:52 karolherbst: and my nouveau card
11:52 Soukyuu: I'll come back once I switch to nouveau
11:56 karolherbst: mhh, works nicely here
11:56 karolherbst: 6% cpu usage
11:59 karolherbst: 40% for full hd though
11:59 imirkin_: well nv50 can't do 4k ;) [i think]
12:00 imirkin_: pretty sure it'll only do 2560x1440
12:00 karolherbst: :D
12:00 imirkin_: and i don't think anyone's been crazy enough to record videos at that resolution
12:00 imirkin_: s/record/encode/
12:01 imirkin_: [also fairly sure that pre-kepler vdec didn't do 4k videos either]
12:01 karolherbst: yeah, but this is sw encoding
12:01 karolherbst: *decoding
12:01 karolherbst: so most of my fullhd videos run at around 40% cpu usage
12:02 karolherbst: on nouveau with this option
12:02 imirkin_: depends on cpu :)
12:02 imirkin_: i doubt my shiny new PowerPC 970FX chips would be able to keep up
12:02 karolherbst: ohh I bet it can
12:02 karolherbst: ohh wait
12:02 karolherbst: no altivec
12:02 karolherbst: okay it can't
12:02 karolherbst: :p
12:03 imirkin_: no, they got altivec
12:03 karolherbst: really?
12:03 karolherbst: I thought the G5 didn't have it
12:03 imirkin_: The execution pipelines were lengthened compared to the POWER4 to achieve higher clock frequencies. It has eight execution units: two arithmetic logic units (ALUs), two double precision floating-point units, two load/store units and two AltiVec units.
12:03 imirkin_: you were mistaken
12:03 glennk: it does but the altivec unit is on the other side of the chip
12:04 imirkin_: it's a long walk to the other side? :)
12:04 glennk: yes, moving stuff between alu/fpu and altivec takes longer on the g5 than the g4
12:05 karolherbst: ohh I see
12:05 karolherbst: but high clocked G5 should still have better altivec perf
12:06 karolherbst: I always thought altivec was G4 only
12:06 karolherbst: mhhh
12:06 karolherbst: but why...
12:07 Soukyuu: ok, I'm on nouveau now, video playback is still stuttering
12:08 imirkin_: Soukyuu: pastebin dmesg, xorg log, and glxinfo output
12:09 karolherbst: Soukyuu: what is your exact command you execute for playing? I would like to test the same here
12:09 karolherbst: but I got pretty high cpu usage for fullhd videos, so mhh
12:16 Soukyuu: karolherbst: --no-config --display-fps=60 --vo=opengl-hq:scale=ewa_lanczossharp:cscale=ewa_lanczossoft:dscale=mitchell:tscale=oversample:scale-antiring=0.8:cscale-antiring=0.9:scaler-resizes-only:sigmoid-upscaling:interpolation:fancy-downscaling:source-shader=~~/shaders/deband_high.glsl:temporal-dither:pbo --vf=vapoursynth=/home/azure/.config/mpv/scripts/mv_interpolation.py
12:16 Soukyuu: it's a mouthfull, but most of it is done via the config
12:17 Soukyuu: (and some scripts)
12:17 karolherbst: ohhh mhh
12:17 karolherbst: the script would be interesting
12:17 Soukyuu: karolherbst: https://github.com/Soukyuu/dotfiles/tree/master/config/mpv
12:20 karolherbst: "Option vf: vapoursynth doesn't exist."
12:21 Soukyuu: imirkin_: dmesg: http://pastebin.com/YnLEidSC | glxinfo: http://pastebin.com/fDGWm1Rh | xorg.log: http://pastebin.com/DgyEZCxS
12:22 Soukyuu: karolherbst: i think you need a recent mpv-git version for that to be implemented, plus vapoursynth and vapoursynth-mvtools plugin to work
12:22 karolherbst: I see
12:22 Soukyuu: you can see the stuttering without --vf too, though
12:22 Soukyuu: it looks even more ugly
12:23 karolherbst: I see
12:23 karolherbst: mhh
12:23 karolherbst: my cpu usage is around 35% with a full hd video
12:23 karolherbst: how is yours?
12:23 Soukyuu: 35-50% with a 720p one
12:23 Soukyuu: it's h264 hi10p though
12:24 Soukyuu: ah, scratch that, it's not
12:24 karolherbst: ohh on my intel card it stutters
12:24 Soukyuu: "regular" 8bit h264
12:25 Soukyuu: I'm running kwin btw, but I also disable compositing completely once mpv is running
12:25 Soukyuu: same problem with compositing active though
12:25 pmoreau: RSpliet: Awesome! Well, I could send you some more cards if you find any to your liking, or you could work on some power/clock gating on those you have. O:-)
12:25 Soukyuu: the nvidia blob has smooth playback with compositing off, but stutters just like nouveau with compositing on
12:26 karolherbst: Soukyuu: could you remove just cscale-antiring=0.9 ?
12:26 karolherbst: and check if its better
12:27 imirkin_: Soukyuu: ok, so looks like it boots to a middle clock
12:27 imirkin_: Soukyuu: fwiw RSpliet has patches for reclocking nva0
12:27 imirkin_: i think they should be in drm-next... maybe
12:27 Soukyuu: karolherbst: i think that somehow broke frame blending completely, I see afterimages
12:28 Soukyuu: imirkin_: can I make it stick to max clocks instead?
12:28 karolherbst: I use "--no-config --display-fps=60 --vo=opengl-hq:scale=ewa_lanczossharp:cscale=ewa_lanczossoft:dscale=mitchell:tscale=oversample:scale-antiring=0.8:scaler-resizes-only:sigmoid-upscaling:interpolation:fancy-downscaling:source-shader=~/shaders/deband_high.glsl:temporal-dither:pbo" now
12:28 karolherbst: and it seems fine with my intel card
12:28 imirkin_: Soukyuu: no... you get boot clocks by default. if you want to change that, you have to reclock.
12:28 karolherbst: you could check which of these options is really bad for your card
12:28 imirkin_: Soukyuu: the whole trick isn't deciding when to change clocks.. it's actually changing them :)
12:28 imirkin_: sadly it's not just like a register where you write a new value and move on
12:30 imirkin_: on the bright side, doesn't look like you have any crazy settings
12:30 imirkin_: if you really want triplebuffer, you need to set SwapLimit to 2. but that's a mostly untested setting, so i'd recommend against it.
12:30 Soukyuu: karolherbst: it's already enough to specify --vo=opengl-hq:interpolation and it stutters
12:30 Soukyuu: imirkin_: should i disable triple buffer completely then?
12:30 imirkin_: well, it already is disabled :)
12:31 imirkin_: the option in your xorg.conf has no effect... that's not a valid option
12:31 imirkin_: [ 4.501] (WW) NOUVEAU(0): Option "TripleBuffer" is not used
12:31 Soukyuu: oh
12:32 Soukyuu: as long as setting SwapLimit to 2 doesn't fry my GPU, I could try it
12:33 Soukyuu: does export __GL_YIELD=USLEEP do something for nouveau?
12:33 imirkin_: certainly not for nouveau specifically
12:34 karolherbst: this is blob only
12:34 Soukyuu: ok
12:34 imirkin_: Soukyuu: i'd recommend you try it without your crazy filter though
12:35 imirkin_: Soukyuu: and you might try the gpu-assisted vdec
12:35 imirkin_: although you need blob firmware for that...
12:35 imirkin_: and there are some known issues, although if your videos don't happen to hit them, it's all fine
12:35 imirkin_: Soukyuu: although note that your memory speed is SUPER slow at boot clocks
12:36 imirkin_: so ... i suspect that has something to do with the stuttering
12:36 Soukyuu: imirkin_: which of the crazy filter specifically?
12:36 imirkin_: Soukyuu: looks like drm-next should have the patches to enable reclocking on your gpu
12:36 imirkin_: so you might want to try it out
12:37 imirkin_: Soukyuu: http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next
12:40 Soukyuu: imirkin_: so I'd have to get that and compile a custom kernel, essentially?
12:41 imirkin_: if you want to try to upgrade to the highest speed, yes
12:42 imirkin_: either that or force the gpu into a high speed with the blob, and then kexec a kernel with nouveau :)
12:42 imirkin_: note that you also have to boot with pstate=1
12:42 imirkin_: er, nouveau.pstate=1
12:42 imirkin_: which will enable user-facing ability to change clocks
12:59 karolherbst: so, testing
13:00 Soukyuu: let's see if this modified pkgbuild works...
13:00 karolherbst: nice, its working
13:00 Soukyuu: what is?
13:01 Soukyuu: any idea why the font rendering is subtly different on nouveau vs nvidia?
13:02 karolherbst: imirkin_: done with kepler: https://github.com/karolherbst/nouveau/commit/6d007d729830755b7c227d60e745aca0f9b55a14
13:02 imirkin_: Soukyuu: i've noticed this as well. no clue.
13:02 imirkin_: Soukyuu: i tend to prefer the nouveau way, so i'm not too concerned :)
13:03 imirkin_: the blob seems to smooth things a lot more, making them harder to read
13:03 Soukyuu: yeah i think i like nouveau way more
13:03 karolherbst: gk104_pcie_speed_init is a hell of a function :/ I just hope I got this right
13:04 mlankhorst: imirkin_: I think that's from compressed textures or something..
13:04 RSpliet: karolherbst: doesn't look too bad really...
13:04 karolherbst: at least it works on my kepler
13:05 karolherbst: put the card into v1 mode with cap, link control and link status set to 2.5
13:05 RSpliet: (apart from me tripping over the choice of vowel; "bump" doesn't sound as neutral as "raise" or "increase" :-P )
13:05 karolherbst: after loading it was at v2, cap and link control at 8.0, status at 2.5
13:05 karolherbst: :D
13:05 RSpliet: and with vowerl
13:05 karolherbst: right
13:05 RSpliet: *vowel
13:05 RSpliet: I mean verb
13:05 karolherbst: yeah, raise is better
13:06 RSpliet: oh, and I think C-style /* */ comments are preferred in kernel, and the comments are kind of non-informative given the nvkm_debug statements contain the same information
13:06 imirkin_: mlankhorst: really? we didn't enable compressed textures on nv50 before and i still saw the difference
13:07 RSpliet: but... that's all just polishing, I think functionally it looks all right
13:07 karolherbst: does somebody want to check this on their kepler cards too? https://github.com/karolherbst/envytools/commit/8c7cb00b8425011f056f536acee1b3da617382b1
13:07 RSpliet: maybe demote the nvkm_info to nvkm_debug as well
13:07 karolherbst: I rely on that in my code and want to use it for kepler+
13:07 karolherbst: mhhh
13:07 karolherbst: I think it would be nice to know what the card can do
13:08 mlankhorst: imirkin_: I think it's some kind of power saving thing though
13:08 RSpliet: karolherbst: it
13:08 mlankhorst: framebuffer compression, not texture compression
13:08 imirkin_: mlankhorst: oh that one.
13:08 imirkin_: mlankhorst: i doubt it. the emacs window is of a different size with blob vs nouveau
13:08 mlankhorst: at least that's when I notice fonts getting blurry on blob :)
13:08 RSpliet: it's probably better to somehow interleave that with the pstate debugfs entry (which is now a sysfs entry... ssssh)
13:08 mlankhorst: ah
13:09 karolherbst: RSpliet: :D
13:09 karolherbst: ohh I really have to clean up all my stuff
13:09 RSpliet: or... well, maybe if you can compress the information as a one-liner only showing the most important bits
13:09 karolherbst: mhh
13:09 karolherbst: yeah
13:09 karolherbst: I have to rework the prints anyway
13:09 karolherbst: its kind of too ugly for info
13:09 karolherbst: "nouveau 0000:01:00.0: pci: link control speed: 2" but what does the 2 mean?
13:10 karolherbst: or maybe I just print the version and max speed?
13:10 karolherbst: or only the max speed?
13:10 RSpliet: that's probably the only two bits that people care about anyway
13:10 karolherbst: mhh, yeah only max speed should be enough
13:10 karolherbst: version is implied by max speed anyway
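A line like "link control speed: 2" exposes a raw field value. Printing only the maximum speed in GT/s, as settled on above, could be done with a small lookup. The enum and function names are hypothetical, and `snprintf` stands in for the kernel's nvkm logging macros so the sketch runs in userspace.

```c
#include <stdio.h>

/* Hypothetical speed enum for illustration. */
enum pcie_speed { PCIE_SPEED_2_5, PCIE_SPEED_5_0, PCIE_SPEED_8_0 };

static const char *pcie_speed_str(enum pcie_speed s)
{
    switch (s) {
    case PCIE_SPEED_2_5: return "2.5 GT/s";
    case PCIE_SPEED_5_0: return "5.0 GT/s";
    case PCIE_SPEED_8_0: return "8.0 GT/s";
    }
    return "unknown";
}

/* One-line summary such as "pcie: max speed 8.0 GT/s". Returns the
 * length snprintf reports. */
static int pcie_format_max_speed(char *buf, size_t len, enum pcie_speed max)
{
    return snprintf(buf, len, "pcie: max speed %s", pcie_speed_str(max));
}
```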
13:11 karolherbst: the thing is 0x8c1c0 doesn't show the card max speed
13:11 karolherbst: but rather what is currently possible with the current hardware configuration
13:11 karolherbst: at least I think so
13:11 karolherbst: look at this trace from mupufs card: https://gist.github.com/karolherbst/8d5906c8a334d3d040a5
13:11 RSpliet: and yes, I think the number of function pointers can be greatly reduced; you may think a bit more ad-hoc :-)
13:12 karolherbst: yeah
13:12 RSpliet: if cap speed, supported version and linkctl speed are never changed beyond boot, keep them private
13:12 karolherbst: some of the stuff is working on fermi+ too
13:12 karolherbst: mhh, right
13:12 karolherbst: but I want to clean this up when I'm done with tesla
13:13 karolherbst: fermi comes next
13:13 karolherbst: still need to find some regs there though :/
13:13 RSpliet: if you want to share them with fermi, make them non static gf100_ variants and add them to the pci priv.h (or something) so you can call them from your kepler sources too
13:13 RSpliet: well, their prototypes that is
13:14 karolherbst: yeah
13:14 karolherbst: or just add the gf100 functions into priv.h and just call them?
13:14 karolherbst: don't know yet
13:14 karolherbst: thing is, gk104_pcie_speed_set and gk104_pcie_speed_init will be a bit different on fermi
13:14 karolherbst: and not because of different regs
13:15 karolherbst: link control speed seems to not matter on fermi
13:16 RSpliet: you don't need function pointers for re-usability
13:16 RSpliet: you want function pointers for other engines to invoke them
13:16 RSpliet: (and other subdevs)
13:16 karolherbst: or if I use them in base.c?
13:17 karolherbst: this is the "base" part https://github.com/karolherbst/nouveau/commit/9d5ca371c71829bed3fa6c9b62a83700cd066089
13:21 RSpliet: you might want to reconsider the function name "linux_speed_to_nvkm_speed"
13:24 RSpliet: skeggsb: mind giving a bit of feedback on https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/bd58c9a44ec748b63fc778faefe39fe6710ddcb3 ?
13:24 RSpliet: functionally it's all right I think
13:25 RSpliet: I'm just curious whether the approach in nv50_ram_timing_read is nice, and whether we should go the same route for Kepler as well (eg: extracting crucial timing information in ramgk104 rather than duplicating it in [g|s]ddr[2|3|5])
14:04 karolherbst: nice, my alsa is kind of messed up :/
14:29 karolherbst: do we need some mmiotraces for GM107?
14:29 imirkin_: no
14:29 imirkin_: or rather... i know i don't :)
14:29 imirkin_: not sure who the 'we' was referring to
14:30 karolherbst: mhh depends
14:30 karolherbst: I could try to figure out gddr5 reclocking there
14:30 karolherbst: but I really don't know maxwell state in total for that one
14:30 imirkin_: GM107 is mostly sort-of ok
14:31 imirkin_: the non-okness comes from mesa fail afaik... although it sure did like to hang when i tried to run piglit on it
14:31 imirkin_: but it worked fine with gbm
14:31 imirkin_: just not X
14:31 karolherbst: I see
14:31 imirkin_: like anyone still uses that ;)
14:32 karolherbst: I heard that OpenGL is working on blob now through EGL :D
14:36 karolherbst: I guess gddr5 reclock doesn't work on that card, too?
14:36 imirkin_: no clock code at all for maxwell
14:36 karolherbst: k
14:54 pmoreau: mwk: Ok, finally updated my pull request for renaming. I certainly wasn't the best fit for undertaking that task.
15:06 pmoreau: mwk: And i added various PDISP caps regs in rnndb, on the evo pull request
15:09 mwk: pmoreau: taking a look
15:10 Soukyuu: imirkin_: I compiled and booted the kernel with nouveau.pstate=1, what do I do next to try and reclock?
15:10 imirkin_: cat /sys/class/drm/card0/device/pstate
15:10 imirkin_: you should see a bunch of entries in there
15:10 mwk: pmoreau: you live and learn :)
15:11 Soukyuu: imirkin_: yes
15:11 imirkin_: then you do 'echo 20 > /sys/class/drm/card0/device/pstate' or whichever level you want to switch to
15:11 pmoreau: mwk: Eh eh, let's hope so! ;)
15:11 imirkin_: [i've already closed the window with your dmesg to know the exact level names]
15:12 Soukyuu: 0f seems to be what i want
15:13 Soukyuu: how do I check if it worked/where do I see the current clocks?
15:13 Soukyuu: oh nvm
15:13 Soukyuu: there is a * beside it
15:13 Soukyuu: interesting though:
15:13 Soukyuu: 03: core 300 MHz shader 600 MHz memory 100 MHz
15:13 Soukyuu: 07: core 400 MHz shader 800 MHz memory 300 MHz
15:13 Soukyuu: 0f: core 655 MHz shader 1408 MHz memory 1050 MHz AC DC *
15:13 Soukyuu: AC: core 648 MHz shader 1404 MHz memory 1053 MHz
15:14 imirkin_: AC shows your current setting
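To compare the selected level with the actual (AC) clocks without eyeballing, the sysfs text can be parsed. A minimal sketch, assuming the line format shown in this session (`<label>: <name> <n> MHz ...`, with `*` marking the selected level); `parse_pstate` is a hypothetical helper, not part of nouveau or envytools.

```python
import re

# Each clock appears as "<name> <value> MHz".
CLOCK_RE = re.compile(r"(\w+) (\d+) MHz")

# Lines copied from the pstate output pasted above.
SAMPLE = """\
03: core 300 MHz shader 600 MHz memory 100 MHz
07: core 400 MHz shader 800 MHz memory 300 MHz
0f: core 655 MHz shader 1408 MHz memory 1050 MHz AC DC *
AC: core 648 MHz shader 1404 MHz memory 1053 MHz
"""


def parse_pstate(text):
    """Return (levels, selected): clocks in MHz keyed by label, and the
    label of the line marked with '*' (None if no line is marked)."""
    levels = {}
    selected = None
    for line in text.strip().splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        levels[label] = {name: int(mhz) for name, mhz in CLOCK_RE.findall(rest)}
        if rest.rstrip().endswith("*"):
            selected = label
    return levels, selected


levels, selected = parse_pstate(SAMPLE)
for clock, target in levels[selected].items():
    print(f"{clock}: requested {target} MHz, actual {levels['AC'][clock]} MHz")
```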
15:14 Soukyuu: yeah, why the mismatch?
15:14 imirkin_: hard to get the clocks exactly right
15:14 Soukyuu: ah
15:14 imirkin_: bunch of PLL's with multipliers and dividers
15:14 imirkin_: and various funny limits on both
15:14 imirkin_: which are entirely undocumented
15:15 imirkin_: but you can't multiply by 1000000000000 and then divide by 999999999999
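To illustrate why the AC clocks land near, but not on, the requested values: with integer multipliers and dividers in limited ranges, only certain frequencies are reachable. A simplified sketch, assuming a single-stage PLL (`ref * N / M`), a 27 MHz reference and made-up N/M limits; real NVIDIA PLLs add VCO limits and post-dividers that are not modeled here.

```python
def best_pll(ref_khz, target_khz, n_range, m_range):
    """Try every multiplier N and divider M in the inclusive ranges and
    return (n, m, freq_khz) whose ref * N / M is closest to the target.
    Ties keep the first solution found (smallest M)."""
    best = None
    for m in range(m_range[0], m_range[1] + 1):
        for n in range(n_range[0], n_range[1] + 1):
            freq = ref_khz * n / m
            err = abs(freq - target_khz)
            if best is None or err < best[3]:
                best = (n, m, freq, err)
    return best[:3]


# Hypothetical limits: N in 1..255, M in 1..13, 27 MHz reference.
print(best_pll(27000, 655000, (1, 255), (1, 13)))
```

Under these made-up limits, 655 MHz cannot be hit exactly (655/27 would need M = 27), so the closest reachable clock is off by a fraction of a MHz.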
15:16 Soukyuu: ok, the playback is nearly smooth now, but still slight stutters that aren't there with the blob + no compositing...
15:16 imirkin_: ;(
15:17 Soukyuu: i wonder if those few MHz missing on core could be the reason
15:17 karolherbst: maybe the compiled stuff is just really bad :/
15:17 imirkin_: no :)
15:17 imirkin_: the memory speed is the big thing
15:18 karolherbst: ahh
15:18 imirkin_: a jump from 300mhz to 1ghz is the big change, not the few mhz here and there
15:18 Soukyuu: interestingly, I'm not having any tearing without kwin compositing though with nouveau... now if only it was smooth D:
15:18 karolherbst: should we raise pcie too?
15:18 imirkin_: oh yeah
15:18 imirkin_: can you guide him through it?
15:18 karolherbst: yeah
15:18 karolherbst: no worries
15:18 Soukyuu: i'm listening
15:18 karolherbst: Soukyuu: Is your GPU at 01:00.0 in lspci?
15:19 imirkin_: step 1: grab envytools
15:19 karolherbst: ohh right, envytools
15:19 karolherbst: but I think he can simply install it?
15:19 imirkin_: what are the other options?
15:19 karolherbst: compiling?
15:19 Soukyuu: karolherbst: 01:00.0 VGA compatible controller: NVIDIA Corporation GT200 [GeForce GTX 260] (rev a1) <- yes
15:19 imirkin_: karolherbst: and "installing" is?
15:20 karolherbst: then I need the output of this, run as root or with sudo: lspci -s 01:00.0 -vv
15:20 karolherbst: imirkin_: package manager?
15:20 Soukyuu: https://aur.archlinux.org/packages/envytools-git/ <- those?
15:20 imirkin_: karolherbst: my package manager compiles ;)
15:20 karolherbst: seems about right
15:20 karolherbst: :D
15:20 karolherbst: I know, I know
15:20 karolherbst: yeah, grab mupuf's stuff
15:21 karolherbst: now that I think about it, pcie should have a real impact on this
15:21 karolherbst: if you want to push 24 frames per second through the pcie link...
15:21 karolherbst: and do a bunch of GL stuff on top of it
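A back-of-the-envelope check of how much of the link such a stream would need. The frame size and pixel format are assumptions, not from the log: 1920x1080 frames at 4 bytes per pixel (RGBA; decoded video is usually smaller, e.g. 4:2:0 YUV at 1.5 bytes per pixel), 24 fps, over a PCIe v1 x16 link with 8b/10b encoding. Packet and protocol overhead are ignored.

```python
def frame_bytes_per_sec(width, height, bytes_per_pixel, fps):
    """Bandwidth needed to move uncompressed frames of the given size."""
    return width * height * bytes_per_pixel * fps


def pcie_bytes_per_sec(gt_per_s, lanes, payload_bits, line_bits):
    """Link throughput after line encoding (no packet overhead)."""
    return gt_per_s * 1e9 * lanes * payload_bits / line_bits / 8


video = frame_bytes_per_sec(1920, 1080, 4, 24)
link = pcie_bytes_per_sec(2.5, 16, 8, 10)
print(f"{video / 1e6:.0f} MB/s of {link / 1e9:.1f} GB/s ({video / link:.1%})")
```

Under these assumptions, raw frame uploads use only a few percent of even a v1 x16 link.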
15:21 Soukyuu: karolherbst: http://pastebin.com/6ryuQrtu
15:22 karolherbst: wow, this looks ancient :D
15:22 karolherbst: okay, install envytools
15:22 Soukyuu: shh, i'm poor
15:22 Soukyuu: xD
15:22 karolherbst: nah, its fine
15:22 imirkin_: Soukyuu: does your motherboard have pcie v2?
15:22 karolherbst: imirkin_: on tesla most of the cards start in v1 mode
15:23 imirkin_: karolherbst: i know, but many tesla's are plugged into v1-only boards i'm sure
15:23 karolherbst: mhh
15:23 specing: GTX260 is quite new, right?
15:23 karolherbst: "quite"
15:23 karolherbst: 2008
15:23 specing: newer than my G84M
15:24 imirkin_: specing: it was the second flagship of the tesla line, after the original G80 which preceded the rest of that line by a year or so
15:24 karolherbst: at least the GTX 260 should do v2
15:24 mwk: pmoreau: ok, I'm done
15:24 Soukyuu: imirkin_: good question, it's a gigabyte 990-FXA-UD3
15:24 mwk: basically a few minor issues left with the rename commit and it's good to merge
15:24 karolherbst: Soukyuu: we will try
15:24 karolherbst: installed envytools already?
15:24 imirkin_: 4 PCI-E 2.0 interfaces for 2way AMD CrossFireX and SLI multi-graphics support
15:24 imirkin_: so... yes
15:25 Soukyuu: karolherbst: slow internet connection, still dling
15:25 Soukyuu: ok, installed
15:26 imirkin_: mattst88: hey, should i file a bug to request removing the gentoo x11-base/nouveau-drm package, or can you just do it?
15:26 karolherbst: now everything as root
15:26 karolherbst: what does nvapeek 0x00154c give you?
15:26 mattst88: imirkin_: I actually masked it for removal a couple of days ago :)
15:26 Soukyuu: karolherbst: 0000154c: 0000017c
15:26 imirkin_: mattst88: ah ok. so wheels are in motion?
15:26 mattst88: long overdue, but it'll be another few weeks before I can delete it from the tree
15:27 mattst88: yep.
15:27 imirkin_: sounds good
15:27 imirkin_: thanks!
15:27 mattst88: once masked, you have to wait 30 days before removal
15:27 mattst88: yw!
15:27 karolherbst: Soukyuu: nvapoke 0000154c 0000017d
15:27 pmoreau: mwk: Ok, thanks! Fixing things right now. :-)
15:27 karolherbst: then lspci -s 01:00.0 -vv again
15:27 karolherbst: it should say something about v2?
15:27 pmoreau: mwk: Did you have a look at pull request #11? I added two new commits
15:28 Soukyuu: karolherbst: Capabilities: [78] Express (v2) Endpoint, MSI 00 <- this?
15:28 karolherbst: yeah
15:28 karolherbst: okay, nice
15:28 karolherbst: LnkCap is still at 2.5?
15:29 Soukyuu: yes
15:29 karolherbst: nvapoke 0000154c 000001fd
15:29 karolherbst: then it should be at 5.0
15:30 Soukyuu: LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <1us
15:30 Soukyuu: doesn't seem like it
15:30 karolherbst: mhh
15:31 karolherbst: I am not quite sure what to do on tesla exactly, so maybe it's fine? I doubt it, will check something
15:31 karolherbst: nvapeek 0x088460
15:32 Soukyuu: 00088460: b0602220
15:33 karolherbst: nvapeek 0x088088
15:33 Soukyuu: 00088088: 01010008
15:35 karolherbst: nvapoke 0x088460 b0602220
15:35 karolherbst: and then the full lspci output again
15:35 pmoreau: mwk: I changed every use of "block" (where block != bigtile) to "unit", but it might be hard to understand what unit refers to in later paragraphs as it is kind of vague. What would be better in your opinion?
15:35 Soukyuu: karolherbst: LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <1us
15:36 karolherbst: :D
15:36 karolherbst: nice
15:36 karolherbst: LnkSta?
15:36 Soukyuu: LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
15:36 Soukyuu: LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
15:36 karolherbst: imirkin_: apparently poking a value inside a reg which was there already may also increase LnkCap :D
15:36 karolherbst: okay
15:36 karolherbst: now the last part
15:37 karolherbst: nvapoke 0x088460 b0602221
15:37 karolherbst: does LnkSta change to 5.0 after that?
15:37 Soukyuu: LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
15:37 Soukyuu: yup
15:37 karolherbst: nice
15:37 karolherbst: now check the video again
15:38 karolherbst: it may or may not be better now :D
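The pokes above are read-modify-write steps that each set a single bit. This sketch replays them against a plain dict standing in for MMIO; `peek`, `poke` and `set_bits` are hypothetical stand-ins for nvapeek/nvapoke, and no hardware is touched. The bit meanings in the comments are only what was observed in this session on one GT200, not documented semantics.

```python
# Values read back during this session (Soukyuu's GT200), standing in for MMIO.
regs = {0x00154C: 0x0000017C, 0x088460: 0xB0602220}


def peek(addr):
    return regs[addr]


def poke(addr, value):
    regs[addr] = value & 0xFFFFFFFF


def set_bits(addr, mask):
    """Read-modify-write: set `mask` in the register, return the new value."""
    poke(addr, peek(addr) | mask)
    return peek(addr)


# 0x00154c bit 0: afterwards lspci reported "Express (v2)".
assert set_bits(0x00154C, 0x001) == 0x17D
# 0x00154c bit 7: meant to raise LnkCap to 5GT/s, but it still said 2.5GT/s.
assert set_bits(0x00154C, 0x080) == 0x1FD
# Writing 0x088460 back unchanged: LnkCap then reported 5GT/s.
poke(0x088460, peek(0x088460))
# 0x088460 bit 0: afterwards LnkSta reported 5GT/s.
assert set_bits(0x088460, 0x001) == 0xB0602221
```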
15:39 imirkin_: Soukyuu: btw you might consider double-checking whether *not* having that filter still causes you visual artifacts
15:39 Soukyuu: karolherbst: it's still stuttering the same as with 2.5, sadly
15:39 karolherbst: k
15:40 karolherbst: I really need a tesla card to play with :/
15:40 Soukyuu: imirkin_: I can't tell because the stutters are replaced by judder if I disable the filter
15:41 Soukyuu: and judder looks as if it stutters more than it does with interpolation enabled
15:41 imirkin_: Soukyuu: is the video somehow of especially bad quality or something? i've just never seen issues like this
15:42 Soukyuu: 23.97fps on 60Hz screen just never goes well
15:42 imirkin_: well, it's a pretty common use-case
15:42 imirkin_: mplayer automatically does an ivtc i think
15:42 Soukyuu: it's not de-interlacing though
15:42 imirkin_: (or is it 3:2 pull-down? i'm forgetful in my old age)
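For reference, 3:2 pulldown means that on a 60 Hz display, 24 fps frames alternately stay up for 2 and 3 refreshes. A minimal sketch of the ideal cadence (not what mpv or mplayer implement), assuming each frame first appears at the first vsync after its timestamp and stays until the next frame replaces it, with no dropping or blending. Exact fractions are used so 23.976 (24000/1001) fps is not distorted by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def pulldown_cadence(fps, refresh_hz, frames):
    """Number of refreshes each of the first `frames` source frames is shown.

    Frame i covers the vsyncs j with i*ratio < j <= (i+1)*ratio,
    where ratio = refresh_hz / fps.
    """
    ratio = Fraction(refresh_hz) / Fraction(fps)
    return [floor((i + 1) * ratio) - floor(i * ratio) for i in range(frames)]


print(pulldown_cadence(24, 60, 6))                             # exact 24 fps
print(pulldown_cadence(Fraction(24000, 1001), 60, 202)[197:])  # 23.976 fps
```

At exactly 24 fps the pattern is a steady 2, 3, 2, 3. At 23.976 fps on a true 60 Hz display it occasionally breaks (two 3-refresh frames in a row around frame 200), which shows up as periodic judder on smooth pans.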
15:43 imirkin_: i dunno what all mpv does
15:43 Soukyuu: the only visual artifact is that the content doesn't scroll smoothly
15:43 imirkin_: ah hm
15:43 Soukyuu: frame blending solves that, so does injecting interpolated frames
15:44 Soukyuu: interpolated frames sometimes produce their own artifacts, but that's not the issue I'm having
15:44 Soukyuu: karolherbst: i wish i had a spare tesla card to donate
15:44 karolherbst: I don't have a desktop system anyway
15:44 karolherbst: only two laptops here
15:45 karolherbst: and only for pcie stuff it would be a little overkill
15:45 Soukyuu: i guess
15:45 Soukyuu: so, any idea what could be the problem now that I'm supposedly running the card on max speed?
15:47 imirkin_: Soukyuu: sorry, no great ideas. i think that even clock-for-clock, nouveau only performs at about 60-80% of blob speed, so perhaps that last factor makes the difference
15:47 karolherbst: Soukyuu: you could try to run it on blob again and check the core/mem/pcie usage in nvidia-settings
15:48 karolherbst: if one of them is near 80%+ then it's most likely a compiler issue, I would assume
15:48 Soukyuu: ok, will do that. compiler issue?
15:49 karolherbst: for the nvidia card inside mesa
15:49 karolherbst: or something similar
15:49 karolherbst: you have to compile binaries for the gpu and mesa isn't as good as the binary driver with that
15:49 Soukyuu: ah, i see
15:54 Soukyuu: karolherbst: back on blob, 19% gpu usage, 48% memory usage, 0% PCIe
15:54 Soukyuu: sidenote: i love btrfs snapshots
15:55 imirkin_: Soukyuu: would probably need to debug what's happening
15:56 imirkin_: no idea offhand why it's so different
15:56 karolherbst: I can switch between nouveau and nvidia without rebooting at all :p
15:56 karolherbst: mhhh
15:56 karolherbst: imirkin_: maybe the generated binaries are really bad?
15:56 imirkin_: ?
15:56 karolherbst: don't see why performance should suffer that much
15:57 Soukyuu: karolherbst: yes, but you're modifying your "live" system, i just have to boot a different snapshot and mess with it as much as i like
15:57 imirkin_: coz we upload textures to vram, for example
15:57 imirkin_: instead of keeping them in gart, for single-use situations
15:57 Soukyuu: imirkin_: I'd be happy to assist with debugging if you walk me through
15:57 karolherbst: I don't modify my live system
15:58 karolherbst: imirkin_: ohh I see
15:58 karolherbst: but the perf is bad for me on my intel card
15:58 karolherbst: too
15:58 karolherbst: nouveau is fine even on lowest pstate
15:58 imirkin_: Soukyuu: unfortunately there isn't anything straightforward... need to play around with it, test theories, etc
15:59 Soukyuu: I wish I knew more about this whole thing
16:00 Soukyuu: in any case, thanks for your time
16:00 imirkin_: Soukyuu: what's the exact command line you're running mpv with? perhaps i'll take a look
16:01 karolherbst: imirkin_: it's somewhere in the log here ;) will catch it for ya
16:01 Soukyuu: I'm running a script that puts it together and also loads a shader and the mvtools plugin
16:01 Soukyuu: --no-config --display-fps=60 --vo=opengl-hq:scale=ewa_lanczossharp:cscale=ewa_lanczossoft:dscale=mitchell:tscale=oversample:scale-antiring=0.8:cscale-antiring=0.9:scaler-resizes-only:sigmoid-upscaling:interpolation:fancy-downscaling:source-shader=~~/shaders/deband_high.glsl:temporal-dither:pbo --vf=vapoursynth=/home/azure/.config/mpv/scripts/mv_interpolation.py
16:01 Soukyuu: it boils down to this ^
16:02 imirkin_: nice and simple ;)
16:02 Soukyuu: too many options
16:02 Soukyuu: but it works
16:02 karolherbst: still 50% gpu memory usage just for watching videos ...
16:02 imirkin_: can i see your shader?
16:02 pmoreau: mwk: Pushed a last round of modifications. If everything seems fine, I'll close this pull request and reopen a new one with proper commits and commit messages. :D And once it is merged, do the same thing for the EVO pull request.
16:03 Soukyuu: imirkin_: my dotfiles repo has it: https://github.com/Soukyuu/dotfiles/tree/master/config/mpv
16:03 Soukyuu: it's not my shader though, it's linked on mpv wiki
16:04 imirkin_: i like the 'rand' function ;)
16:04 karolherbst: :D
16:07 imirkin_: hopefully the ref[4] doesn't become an actual array
16:07 imirkin_: pretty sure it doesn't though since there are no indirect references
16:14 Soukyuu: imirkin_: you can also test with the basic --no-config --display-fps=60 --vo=opengl-hq:interpolation, but it's harder to tell because mvtools makes it much smoother
16:14 Soukyuu: so stutters are easy to see
16:15 karolherbst: fermi is annoying though with the pcie thingy ...
16:15 Soukyuu: https://dl.dropboxusercontent.com/u/19330332/sm_test.mp4 <- a test file with a horizontal pan
16:18 karolherbst: is this some kind of link training? https://gist.github.com/karolherbst/82d3dd1cccf1d5896928
16:21 karolherbst: mhhhh
16:21 karolherbst: I think the blob counts pci relinks on fermi
16:21 karolherbst: 0x088188 increases after a relink
16:48 karolherbst: imirkin_: did you find the Shuffle instruction for kepler already?
16:49 imirkin_: yeah... it should be in envydis no?
16:49 imirkin_: shfl
16:49 imirkin_: or maybe quadop
16:49 imirkin_: :)
16:49 karolherbst: just reading whats new with kepler
16:49 imirkin_: there's also fswz on maxwell i think
16:49 karolherbst: and there is no hardware scheduling anymore
16:49 imirkin_: right.
16:50 imirkin_: hence the sched codes
16:50 karolherbst: yeah
16:50 karolherbst: this could also explain the big perf gap
16:50 karolherbst: because now the compiler really has to do it right
16:50 karolherbst: but wiki is funny: "Additional die spaces are acquired by replacing the complex hardware scheduler with simple software scheduler."
16:50 karolherbst: :D
16:52 airlied: hey, does anyone awake know offhand how demmt decodes the mmap writes?
16:52 imirkin_: airlied: errr
16:52 imirkin_: not sure what you mean by 'how'
16:53 imirkin_: using code? :)
16:53 imirkin_: valgrind puts all the stuff there
16:53 imirkin_: we know which ioctls are which, so we know where the pushbufs are
16:54 airlied: so at least on fglrx the ioctl happen around a big load of mmap writes
16:54 airlied: so does it capture all the mmap writes then wait for the ioctl that uses them to decode it?
16:54 airlied: or does it decode on a mmap write line by line basis
16:54 imirkin_: i believe it waits
16:55 imirkin_: there used to be a dedma which did it line-by-line
16:55 imirkin_: but that's extremely unreliable for a number of reasons
16:55 imirkin_: there's an off-chance it has some look-ahead
16:56 imirkin_: i'm not 100% sure, joi went through a few iterations of it
16:56 imirkin_: he's the one who wrote it though and would have definitive information
16:56 imirkin_: iirc he even added very preliminary fglrx support
16:56 imirkin_: airlied: https://github.com/envytools/envytools/blob/master/demmt/fglrx.c
16:56 airlied: yeah I'm just using that
16:56 imirkin_: ah ok :)
16:56 airlied: but I want to enhance it to actually decode the command streams
16:56 imirkin_: right
16:56 imirkin_: so... look at...
16:56 airlied: I'm just not sure where I can hook in
16:57 imirkin_: https://github.com/envytools/envytools/blob/master/demmt/nvrm.c
16:58 imirkin_: i'm weak on the _actual_ details
16:58 imirkin_: i've only modified stuff once it's already all fairly processed and interpreted
16:59 karolherbst: imirkin_: I'm currently wondering whether I should move the pcie version raise into the generic part? I doubt that any card will use a different procedure
16:59 imirkin_: karolherbst: well e.g. only g84+ can do it
16:59 imirkin_: or g92+
17:00 karolherbst: yeah, but then its the same
17:00 karolherbst: read a reg, write it, read it done
17:00 imirkin_: sure, but you can't have that code run on older gpus
17:00 karolherbst: mhhh, right
17:00 karolherbst: I would only do it for pcie cards anyway, but mhh, don't know if there is a way to tell if I can do it at all
17:01 imirkin_: there have been pcie cards for a long time
17:01 imirkin_: longer than there's been a pcie v2, believe it or not ;)
17:01 karolherbst: yeah I know
17:02 karolherbst: 2004 seems to be the first nvidia one?
17:02 karolherbst: maybe 2003
17:02 imirkin_: there are even some nv18's with pcie
17:02 imirkin_: (using a bridge chip, of course)
17:02 imirkin_: native was nv41+ i think
17:02 karolherbst: yeah, saw it
17:03 karolherbst: GeForce PCX4300
17:04 karolherbst: ohh I could just add a generic function for tesla+ and just call the function pointer ...
17:04 karolherbst: ohh wait
17:04 karolherbst: there are also v1 only tesla cards
17:04 imirkin_: yeah, g8x
17:05 karolherbst: mhhh
17:06 karolherbst: I need a trace of one of those
17:06 karolherbst: because there are also v2 g8x cards
17:06 imirkin_: no
17:06 imirkin_: there are not.
17:06 karolherbst: there are
17:07 imirkin_: show me
17:07 karolherbst: the one nv84 from mupuf operates at 5.0 speed
17:08 imirkin_: errrrr
17:08 imirkin_: interesting.
17:08 karolherbst: GeForce 9650M GS is some kind of hybrid
17:08 karolherbst: NB9P-GS1(G84)
17:08 imirkin_: is it actually a G84?
17:08 karolherbst: Quadro NVS 320M also v2
17:08 karolherbst: G84M
17:08 karolherbst: will check the trace folder deeper
17:08 imirkin_: i mean, does it show up as a G84 in the trace
17:09 karolherbst: his is a 01:00.0 VGA compatible controller [0300]: nVidia Corporation G84 [Quadro FX 370] [10de:040a] (rev a1) (prog-if 00 [VGA controller])
17:10 imirkin_: heh
17:10 imirkin_: i'm talking about the trace
17:10 imirkin_: not lspci
17:10 imirkin_: there's a 370LP which was a G98 (and i have it)
17:10 karolherbst: yes
17:10 karolherbst: [0] 63.125694 MMIO32 R 0x000000 0x084a00a2 PMC.ID => { STEPPING = 0xa2 | DEVICE_ID = 0xa | CHIPSET = G84 | FOUNDRY = TSMC }
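The PMC.ID line can be split into its fields by hand. A minimal sketch whose field positions are inferred from this one decoded line (0x084a00a2 gives STEPPING 0xa2, DEVICE_ID 0xa, CHIPSET 0x84); they are not copied from rnndb, and the FOUNDRY field is left out because its position cannot be inferred from this value. `decode_pmc_id` is a hypothetical helper.

```python
def decode_pmc_id(boot0):
    """Split a PMC.ID (MMIO 0x000000) value into the fields seen in the trace.

    Bit positions are inferred from the decoded line 0x084a00a2 ->
    STEPPING 0xa2, DEVICE_ID 0xa, CHIPSET 0x84 (G84).
    """
    return {
        "chipset": (boot0 >> 20) & 0xFF,   # bits 20-27
        "device_id": (boot0 >> 16) & 0xF,  # bits 16-19
        "stepping": boot0 & 0xFF,          # bits 0-7
    }


print({k: hex(v) for k, v in decode_pmc_id(0x084A00A2).items()})
```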
17:11 imirkin_: and check that it actually gets gen2/5GT/s in the trace
17:11 karolherbst: trace: https://gist.github.com/karolherbst/5e6ad051426ea36bedb3
17:11 imirkin_: ok. i give up! :)
17:11 karolherbst: :D
17:12 karolherbst: but the blob tries to get to v2 for all tesla cards
17:12 karolherbst: it actually tries multiple times, too
17:12 karolherbst: pretty persistent
17:12 imirkin_: hehe
17:13 karolherbst: like this one: https://gist.github.com/karolherbst/34fcc09a0f1e93f88757
17:13 karolherbst: its a nv86 though
17:14 karolherbst: but I think the system is the problem here and not the card?
17:14 karolherbst: maybe the "CARD_SPEED = 5_0GT" is wrong?
17:14 imirkin_: probably not.
17:15 imirkin_: it probably tries to set v2 and then looks at some error state
17:15 imirkin_: and then notices that there's an error and gives up
17:15 imirkin_: [or rather, tries, tries, tries again]
17:15 karolherbst: but v2 G86 is somewhat common
17:15 karolherbst: there are three quadro v2
17:15 karolherbst: and one desktop and one mobile chip with v2
17:16 karolherbst: at least there can't be any harm in trying, so I will keep it simple and try it too, but only once
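A sketch of the "try it once, no retries" approach with the chipset and PCIe checks discussed above folded in. Everything here is hypothetical: the register address and bit are parameters (0x00154c bit 0 is only what was poked by hand earlier in this log), and `min_chipset=0x84` follows imirkin's "g84+ (or g92+)" remark rather than confirmed hardware behaviour. `read`/`write` are injected so the sketch runs without hardware.

```python
def try_raise_to_v2(chipset, is_pcie, read, write, addr, bit, min_chipset=0x84):
    """Attempt the PCIe v1 -> v2 switch exactly once.

    Returns True if `bit` reads back as set. A set bit only means the
    write stuck; the resulting link state still has to be checked
    separately (e.g. LnkCap/LnkSta).
    """
    if not is_pcie or chipset < min_chipset:
        return False  # older or non-PCIe boards: never touch the register
    value = read(addr)
    write(addr, value | bit)
    return bool(read(addr) & bit)


regs = {0x154C: 0x17C}
print(try_raise_to_v2(0xA0, True, regs.get, regs.__setitem__, 0x154C, 0x1))
```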
17:24 karolherbst: are there any v1 fermi cards at all?
17:24 imirkin_: doubtful
17:25 karolherbst: maybe NVS 310/315?
17:25 karolherbst: could be wiki's fault though
17:26 karolherbst: yeah, nvidia says v2 for these too
17:26 karolherbst: mhh
17:27 karolherbst: somehow this is boring, because this means there are only v2 fermi cards
17:27 karolherbst: I guess kepler is the mixed v2/v3 gen then
17:28 imirkin_: probably kepler is all v3
17:28 karolherbst: GK208 is sometimes v2
17:29 karolherbst: and some quadros?
17:29 karolherbst: yeah
17:29 karolherbst: some kepler quadros are v2 only, too
17:30 karolherbst: imirkin_: http://www.nvidia.de/content/PDF/data-sheet/DS_NV_Quadro_K4000_OCT13_NV_US_LR.pdf
17:30 karolherbst: ...
17:30 karolherbst: whatever
17:31 imirkin_: there's no way that's true
17:31 karolherbst: here another one http://www.nvidia.com/docs/IO/140231/NV_DS_QUADRO_K5000_11_05_NV_US_LR.pdf
17:32 karolherbst: there are 3 or 4 more kepler v2 quadros :D
17:32 imirkin_: well i dunno
17:32 imirkin_: my GK208 is v3
17:32 imirkin_: although it's only x8
17:32 imirkin_: and 3*8 < 2*16 ;)
17:32 karolherbst: :D
17:32 karolherbst: by a slim margin
17:33 imirkin_: by a wider margin if you compare the real thing of 8*8 and 5*16
17:33 karolherbst: nope
17:33 imirkin_: oh yeah
17:33 imirkin_: er
17:33 karolherbst: 7.877 Gbit/s * 8 compared to 4 Gbit/s * 16
17:34 imirkin_: yeah. not a wider margin.
17:34 imirkin_: oh, coz v2 didn't do 128/130?
17:34 karolherbst: right
17:34 karolherbst: still 8b/10b
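The comparison above comes down to line encoding: v1 and v2 use 8b/10b, v3 uses 128b/130b. A small sketch of the same arithmetic with exact fractions (packet and protocol overhead ignored); the table and function names are illustrative only.

```python
from fractions import Fraction

# version: (transfer rate in GT/s, payload bits, line bits of the encoding)
PCIE_GEN = {
    1: (Fraction(5, 2), 8, 10),
    2: (Fraction(5), 8, 10),
    3: (Fraction(8), 128, 130),
}


def lane_gbit_per_s(version):
    """Usable bit rate of one lane after line encoding."""
    rate, payload, line = PCIE_GEN[version]
    return rate * Fraction(payload, line)


def link_gbit_per_s(version, lanes):
    return lane_gbit_per_s(version) * lanes


print(float(link_gbit_per_s(3, 8)), float(link_gbit_per_s(2, 16)))
```

A v3 x8 link gives about 63.0 Gbit/s, while v2 x16 gives 64 Gbit/s, so the x8 v3 card is behind by a slim margin.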
17:34 imirkin_: good times
17:35 karolherbst: GTS 315?
17:35 karolherbst: *GT
17:35 karolherbst: *635
17:35 karolherbst: ...
17:35 karolherbst: or mobile?
17:35 imirkin_: bbl
17:57 imirkin: skeggsb: your mega-rewrite is bisectable right?
17:57 skeggsb: imirkin: yep
17:58 imirkin: ok. gonna try to see how BE got broken
17:58 imirkin: dunno if you saw but it looks like it works with the nouveau in 4.2.0
17:58 skeggsb: yeah, i seen that in the backlog
18:00 karolherbst: skeggsb: if you have some time, would you like to look over my gddr5 kepler changes? https://github.com/karolherbst/nouveau/commit/b4224088f66e848b01caa3eac35fd6e4d390968f
18:32 imirkin: skeggsb: this guy look familiar? http://hastebin.com/nizasebilo.avrasm
18:34 imirkin: i'm on this commit, ftr: device: convert user class to new-style nvkm_object
18:55 imirkin: weird, working load:
18:55 imirkin: Sep 1 01:30:06 ppc64 kernel: nouveau 0000:f0:10.0: i2c: ccb 05: type 00 drive 00 sense 00 share ff auxch ff
18:55 imirkin: broken load:
18:55 imirkin: Sep 1 01:31:48 ppc64 kernel: nouveau 0000:f0:10.0: i2c: ccb 05: type 05 drive 04 sense ff share 2f auxch ff
18:55 imirkin: and it dies after that
19:18 imirkin: weird. reloading it a second time and it works.
19:23 marcosps1: imirkin: Good night :)
19:25 marcosps1: imirkin: Now I'm more familiar with the compiler... and some questions came to mind. I'm now looking at ConstantFolding::visit, which is called for each instruction in optimizeSSA.
19:26 marcosps1: imirkin: You said I needed to "teach" the code to check whether the value is a double immediate, and that it needs to be done somewhere around getImmediate or some immediate-related code inside ConstantFolding. Right?
19:27 imirkin: marcosps1: don't necessarily worry about ConstantFolding
19:27 imirkin: there's a LoadPropagation thing iirc
19:27 imirkin: or something like that
19:28 imirkin: which is what should move the immediate from a separate load/mov directly as an arg of an instruction
19:30 imirkin: skeggsb: this is weird... the i2c table seems to get corrupted every so often
19:30 imirkin: on like every 3rd load or so
19:30 imirkin: could it be reading off the end?
19:31 marcosps1: imirkin: Hum, I'm looking at it now. Thanks!
19:36 imirkin: skeggsb: bef1fab512565193b40cd5a8356b4007ffa8c501 is first bad
19:36 imirkin: skeggsb: gr: convert user classes to new-style nvkm_object
19:39 imirkin: skeggsb: NVOBJ_FLAG_ZERO_ALLOC -- did that do something useful that we don't anymore maybe?
19:40 imirkin: oh, there's a "zero" bool
19:42 imirkin: skeggsb: errrrr
19:42 imirkin: nv04_gr_object_bind looks messed up no?
19:47 skeggsb: imirkin: no, it wasn't used before (we write all the members anyway, no point zeroing first)
19:47 imirkin: skeggsb: yep, that was it
19:48 imirkin: patch on its way soon
19:48 skeggsb: that... doesn't make sense
19:48 imirkin: no
19:48 imirkin: sec
19:48 imirkin: you'll see. it makes sense.
19:50 imirkin: sent
19:51 karolherbst: imirkin: would have complained about the same
19:52 imirkin: karolherbst: ?
19:52 karolherbst: I saw your patch
19:52 skeggsb: imirkin: ah right, that makes sense, the zero thing didn't :P
19:52 skeggsb: i looked at the diff of nv40 too, not nv04 :)
19:52 imirkin: skeggsb: right. esp since you still set zero = true
19:52 karolherbst: and it was the first thing which looked strange
19:52 imirkin: skeggsb: yeah, they look VERY similar
20:00 imirkin: alright. now i can move up in the world to dealing with X
20:01 imirkin: skeggsb: btw, hate to remind you but... thoughts on the proper way to deal with OF?
20:02 skeggsb: ah, thought we covered that yesterday?
20:02 imirkin: oh we did?
20:02 imirkin: i meant in terms of like... api's
20:02 imirkin: right now there's no way to get the size
20:02 imirkin: should i add a ->size() function?
20:02 imirkin: like the initial read tries to read 0x1000 of it, but my vbios isn't even 0x1000 big
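A sketch of the size-aware read being proposed, assuming a VBIOS source that can report its own length. `ImageSource`, `size()` and `initial_read` are hypothetical names that only mirror the `->size()` idea from the chat; they are not nouveau's actual nvbios source interface.

```python
class ImageSource:
    """Hypothetical VBIOS source that knows how big its image is."""

    def __init__(self, data):
        self._data = bytes(data)

    def size(self):
        return len(self._data)

    def read(self, offset, length):
        # Clamp to the image so callers never read past its end.
        end = min(offset + length, self.size())
        return self._data[offset:end] if offset < end else b""


def initial_read(src, want=0x1000):
    """Read up to `want` bytes, but never more than the image holds."""
    return src.read(0, min(want, src.size()))


small = ImageSource(b"\x55\xaa" + bytes(0x7FE))  # a 0x800-byte image
print(hex(len(initial_read(small))))
```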
20:21 marcosps1: imirkin: In LoadPropagation::visit, I can see the "// propagate !" comment, but, all it does is setSrc and setIndirect...
20:21 marcosps1: I'm missing something ?
20:21 imirkin: marcosps1: no
20:21 imirkin: so instead of having like
20:22 imirkin: mov reg, 5; add dst, src, reg;
20:22 imirkin: you have
20:22 imirkin: add dst, src, 5;
20:23 marcosps1: imirkin: Hum... so all these setSrc calls and other changes are stored in the bb to be used after this method finishes...
20:24 imirkin: a bb stores a list of instructions
20:24 imirkin: each instruction has a list of defs and srcs
20:24 imirkin: there's a CSE pass at the end which removes instructions with unused defs
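A toy model of the two steps just described: fold `mov reg, imm` into the instructions that read `reg`, then drop instructions whose defs are never used. This is plain Python over a made-up `Insn` type and assumes SSA form (each register defined once); it is not nv50_ir's class layout or pass API, and the removal step is modeled as a simple sweep for unused defs.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    op: str
    defs: list = field(default_factory=list)
    srcs: list = field(default_factory=list)  # register names (str) or immediates (int)


def propagate_immediates(bb):
    """Replace register sources that were loaded by `mov reg, imm`.

    The real pass first asks insnCanLoad() whether the target instruction
    can encode the immediate in that slot; this toy skips that check.
    """
    imm_of = {i.defs[0]: i.srcs[0] for i in bb
              if i.op == "mov" and isinstance(i.srcs[0], int)}
    for insn in bb:
        insn.srcs = [imm_of.get(s, s) if isinstance(s, str) else s
                     for s in insn.srcs]


def remove_unused(bb, live_out):
    """Keep only instructions that define something read later or live-out."""
    used = set(live_out)
    for insn in bb:
        used.update(s for s in insn.srcs if isinstance(s, str))
    return [i for i in bb if any(d in used for d in i.defs)]


bb = [Insn("mov", ["r1"], [5]), Insn("add", ["r2"], ["r0", "r1"])]
propagate_immediates(bb)
print(remove_unused(bb, ["r2"]))  # only the add remains, with 5 as a source
```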
20:25 marcosps1: imirkin: Ok, I'm really starting to understand the shader compiler.
20:25 marcosps1: imirkin: I was looking inside insnCanLoad to see where I can check for the double immediate, since a lot of checks on types and sizes are done there.
20:27 imirkin: yeah, so you need to modify that function
20:27 imirkin: to return true for situations where the insn can load the double immediate
20:27 imirkin: now... note that not all insns can load all double immediates
20:27 imirkin: look for isSIMM() type stuff
20:27 imirkin: you need something like that
20:27 imirkin: (SIMM = short immediate)
20:27 imirkin: as opposed to LIMM == long immediate (where you get all 32 bits)
20:30 marcosps1: imirkin: I found isLIMM, but not isSIMM. About the second one, I only found isIMM, which returns an immediate from ValueRef
20:31 imirkin: oh, i guess LIMM is the short one
20:31 imirkin: weird.
20:32 imirkin: wtvr
20:32 imirkin: look at the comment that says "// not all instructions support full 32 bit immediates"
20:33 marcosps1: imirkin: So, "full 32bit immediates" == double immediates?
20:36 marcosps1: imirkin: I know that we're looking for 64bit values, but, in this case, I'm confused ...
20:36 imirkin: no, those are 32-bit immediates
20:36 imirkin: there's no support in the code for 64-bit immediates (at least not really)
20:36 imirkin: you'll need to modify insnCanLoad among others
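To make the short/long distinction concrete for floats: a value can only use a short float immediate if the bits the short form drops are already zero. This sketch assumes the short float form keeps only the top 20 bits of the value (check against the encodings in envydis' gf100.c); applying the same 20-bit rule to 64-bit doubles is a further assumption, not something the current code supports. The helper names are hypothetical.

```python
import struct


def f32_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def f64_bits(x):
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def fits_short_float_imm(bits, width, kept_bits=20):
    """True if only the top `kept_bits` of a `width`-bit float are non-zero.

    Assumption: the short form stores just those top bits, so every
    lower bit must already be zero for the value to survive unchanged.
    """
    low_mask = (1 << (width - kept_bits)) - 1
    return (bits & low_mask) == 0


for v in (1.0, 2.5, 0.1):
    print(v, fits_short_float_imm(f32_bits(v), 32), fits_short_float_imm(f64_bits(v), 64))
```

Simple constants such as 1.0 or 2.5 pass, while 0.1 needs the long form (or a separate load) in either width.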
20:37 marcosps1: imirkin: Also, you said to me to verify envytools: https://github.com/envytools/envytools/blob/master/envydis/gf100.c
20:38 imirkin: for the specific encodings, and instructions which support the double imms, yes
20:41 marcosps1: imirkin: I think I'll need to dig through envytools to understand all this :)
23:11 joi: airlied: for nvrm: demmt inspects fifo create ioctls to find where IB buffer lies, then finds which cpu mapping matches it and then looks at each write to that cpu mapping (see buffer_decode.c)
23:12 joi: for nouveau it catches GEM_PUSHBUF ioctl
23:16 joi: if you want start decoding on memory write, hook into buffer_decode_register_write with mapping->fdtype==FDFGLRX check
23:28 joi: (fifo create inspection is in handle_nvrm_ioctl_create)
23:34 joi: ... and the next 2 steps are in buffer_decode.c
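A simplified model of the three steps joi describes: learn where the IB buffer lives from the fifo-create ioctl, find the CPU mapping that backs it, then look only at writes into that mapping. The `Mapping` type and the match-by-GPU-address-range rule are assumptions for illustration; demmt's actual logic is in `handle_nvrm_ioctl_create` and `buffer_decode.c`.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    cpu_addr: int
    size: int
    gpu_addr: int

    def contains_gpu(self, addr):
        return self.gpu_addr <= addr < self.gpu_addr + self.size

    def contains_cpu(self, addr):
        return self.cpu_addr <= addr < self.cpu_addr + self.size


def find_ib_mapping(mappings, ib_gpu_addr):
    """Step 2: pick the CPU mapping that backs the IB buffer (step 1, the
    ioctl inspection, is assumed to have produced ib_gpu_addr)."""
    for m in mappings:
        if m.contains_gpu(ib_gpu_addr):
            return m
    return None


def ib_writes(mapping, writes):
    """Step 3: keep only CPU writes (addr, value) that land in that mapping."""
    return [(addr, val) for addr, val in writes if mapping.contains_cpu(addr)]


maps = [Mapping(0x7F0000, 0x1000, 0x20000), Mapping(0x800000, 0x2000, 0x40000)]
ib = find_ib_mapping(maps, 0x40100)
print(ib_writes(ib, [(0x7F0010, 1), (0x800004, 2), (0x802000, 3)]))
```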
23:42 airlied: joi: cool I'll try and look into it
23:42 airlied: I ended up hacking the output from mmt into a tool of glisse's that worked well enough in the end
23:47 imirkin: airlied: having it all integrated is pretty nice since you can easily identify other buffers, like code or texture descriptors or whatever
23:47 imirkin: i guess it'd be more compelling if you had it all integrated with rnndb
23:48 imirkin: and/or envydis
23:57 joi: airlied: i'll bb in 9hrs if you have any questions about demmt