07:43 pmoreau: RSpliet: Anything I could test on my G96?
07:49 nchauvet: hello, where is the development git repository for nouveau ? before entering the airlied tree and the kernel
07:49 imirkin: nchauvet: http://cgit.freedesktop.org/~darktama/nouveau/
07:50 nchauvet: thx ; I was also searching for the vga_switcheroo tree if any ? (nothing in airlied tree wrt switcheroo)
07:50 imirkin: nchauvet: although very technically speaking it's http://cgit.freedesktop.org/nouveau/linux-2.6
07:50 imirkin: however in practice, that first tree is the one you want
07:51 imirkin: i doubt there's a separate switcheroo tree/maintainer
07:51 imirkin: just send patches to airlied
07:55 nchauvet: okay, thx
07:59 RSpliet: pmoreau: nothing in particular springs to mind
07:59 RSpliet: for now ;-)
07:59 RSpliet: I've done a rebase to 4.3 yesterday, but won't have time to compile and test it
07:59 RSpliet: after that I might look into G96
08:01 pmoreau: :-)
08:02 pmoreau: Well, I'll still have the G96 in the coming days, so it's no problem
08:15 RSpliet: yeah... I'm not sure whether I'll get to it this weekend though
08:15 RSpliet: perhaps a friend from NL is coming over... priorities ;-)
12:39 msgctl: Hi, anyone using GTX 750?
12:40 imirkin_: did you have questions about it, or just taking a poll?
12:40 msgctl: my X session restarts every time an application closes opengl context
12:40 msgctl: imirkin_: ^^
12:40 imirkin_: that seems... unfortunate
12:40 imirkin_: solution: don't close GL contexts ;) can you pastebin your xorg log after such a crash?
12:40 imirkin_: there should be one in Xorg.0.log.old
12:40 msgctl: imirkin_: sure, sec.
12:48 msgctl: imirkin_: http://pastebin.com/jzFPBUr8
12:48 imirkin_: that's not at all related to nouveau.
12:49 imirkin_: you must have built an X server with xproto-7.0.28
12:49 imirkin_: downgrade to 7.0.27 and rebuild xorg-xserver, and the good times should roll
12:50 imirkin_: additionally i'd strongly recommend against using xf86-video-nouveau with the maxwell, and instead use the 'modesetting' driver. the easiest way to achieve this is to just uninstall xf86-video-nouveau
12:50 imirkin_: you should receive identical functionality, but GL core contexts won't be broken
12:50 msgctl: are you sure that's the issue? I tested on gentoo-stable with 7.0.27 and it was the same. I don't have the VM image anymore, but I'll double check.
12:50 msgctl: OK, thanks!
12:51 imirkin_: fwiw it doesn't matter what version of xproto is installed *now*, merely the one that was installed when the X server was built
12:52 msgctl: imirkin_: yes, it was .27 back then
12:52 msgctl: imirkin_: are you perhaps a nouveau developer? Is there a way I could contribute without putting in a billion hours?
12:52 imirkin_: perhaps i am
12:53 msgctl: ;-)
12:53 imirkin_: what kind of things can you do?
12:53 imirkin_: e.g. can you write code?
12:53 msgctl: yup, had some experience with the kernel, more with embedded
12:53 imirkin_: awesome
12:54 imirkin_: well, skeggsb would frown on this, but you could finish up EXA support in xf86-video-nouveau for maxwell
12:54 imirkin_: another thing is that i left finishing up tessellation on maxwell to another day, which is something you could work on if interested
12:54 msgctl: is there any writeup on either that?
12:54 msgctl: of *
12:54 imirkin_: no, but i could provide the info for either
12:55 imirkin_: i also keep a trello with things here: https://trello.com/b/ZudRDiTL/nouveau
12:55 msgctl: as long as it doesn't burden you, please do
12:55 msgctl: yup, seen that
12:55 imirkin_: exa over at https://trello.com/c/E0kZXA2r/18-maxwell-2d-exa
12:55 imirkin_: tess is at https://trello.com/c/oaOt6jdd/118-gm100-tessellation
12:56 imirkin_: neither of those will make TOO much sense to you though if you're not already familiar with a lot of the concepts
12:56 imirkin_: these were largely written in "note to self" style
12:56 imirkin_: rather than "here is all the info a beginner would need" style
12:56 imirkin_: if you have compiler + GL experience, the tess one should be pretty easy
12:56 imirkin_: if you only have compiler experience, the tess one still won't be that hard
12:57 imirkin_: if you have no compiler experience the tess one might be tricky :)
12:57 imirkin_: but a good way to learn about compilers! :)
13:01 msgctl: I'm just trying to make sense out of this
13:07 imirkin_: ask early, ask often
13:07 karolherbst: imirkin_: you didn't show me the trello stuff :p
13:07 imirkin_: karolherbst: really?
13:08 karolherbst: nope
13:08 imirkin_: :(
13:08 imirkin_: i suck.
13:08 karolherbst: ohh no github login :/
13:08 karolherbst: sad
13:08 imirkin_: i can add you if you want
13:08 karolherbst: will use my google account then :/
13:08 karolherbst: ohh trello wants to know that I am on google, then no way I use my google account ...
13:09 imirkin_: don't seem to have your email
13:09 karolherbst: yeah, wait, still have to figure out how to login there
13:09 imirkin_: karolherbst: give me your email and i'll send you an invite
13:11 karolherbst: imirkin_: karolherbst in trello
13:19 SolarAquarion: Is nouveau a kernel driver
13:19 SolarAquarion: As in does it exist within the kernel
13:19 imirkin_: SolarAquarion: nouveau is not any single thing
13:19 imirkin_: SolarAquarion: see http://nouveau.freedesktop.org/wiki/ for the relevant list of software
13:20 SolarAquarion: imirkin_: I'm asking because of the nouveau/linux-2.6 branch
13:20 SolarAquarion: Which has branches for kernels
13:20 imirkin_: that's because it's a kernel tree
13:21 SolarAquarion: imirkin_: so there's an Xorg driver and there's also the kernel driver
13:21 imirkin_: msgctl: anyways, we'd _love_ more help, on nearly any aspect of nouveau. just let me know some bit that interests you and i can help you get started
13:21 SolarAquarion: So linux 4.3 seemingly has some interesting changes
13:22 imirkin_: SolarAquarion: look at that wiki page. it lists everything.
13:22 karolherbst: SolarAquarion: on Linux you usually need 4 things: kernel driver, drm userspace library, DDX module for X and an OpenGL library
13:23 karolherbst: imirkin_: this is the thing skeggsb talked about? https://trello.com/c/2Zd5nMU9/91-linebuffer-iso-hub
13:23 SolarAquarion: The kernel driver is compiled with the kernel correct
13:25 SolarAquarion: karolherbst: correct
13:25 imirkin_: karolherbst: no
13:26 imirkin_: errr... maybe
13:26 karolherbst: but it makes sense somehow
13:26 karolherbst: that the card hangs currently, when you access memory, while it's getting reclocked
13:26 imirkin_: yeah
13:27 imirkin_: but it's not supposed to ;)
13:27 karolherbst: yeah
13:27 SolarAquarion: skeggsb hi
13:27 karolherbst: blob turns the fb off I think while reclocking
13:27 imirkin_: we do too
13:27 karolherbst: ahh I see
13:28 imirkin_: or at least we try :)
13:28 karolherbst: :D
13:29 karolherbst: mhh I still didn't figure out what the last problem is on my card
13:29 karolherbst: or never dug deep enough
13:29 karolherbst: usually I just say core voltage is the problem for me now
13:30 SolarAquarion: karolherbst, my question so does the kernel have nouveau within the tree itself
13:30 karolherbst: yeah
13:30 SolarAquarion: as in does linus merge it in
13:30 SolarAquarion: karolherbst, yes?
13:31 karolherbst: yes
13:31 imirkin_: the drm maintainer merges it in
13:31 imirkin_: and then sends to linus
13:31 SolarAquarion: ok
13:31 imirkin_: as with any kernel driver
13:31 SolarAquarion: so linux 4.3 has some interesting changes
13:31 imirkin_: like what?
13:32 SolarAquarion: imirkin_, mostly the rewrites to the new style stuff?
13:32 SolarAquarion: it seemed interesting to me
13:32 karolherbst: I think this is more interesting for devs
13:32 SolarAquarion: the nvkm stuff?
13:33 SolarAquarion: Karolherbst so what is the new styled nvkm stuff
13:34 karolherbst: API
13:34 SolarAquarion: Better API I guess for development?
13:34 SolarAquarion: Or is it also user side
13:36 SolarAquarion: Karolherbst does it only help developer's
13:38 karolherbst: imirkin_: seems like I was so occupied with pcie upclocking, that I never checked if downclocking works :/
13:39 karolherbst: ohh it always goes up to max
13:39 karolherbst: pity
13:40 kb9vqf: Hi all, just wondering if anyone might have some idea where to look to fix this issue: https://bugs.freedesktop.org/show_bug.cgi?id=90453
13:40 kb9vqf: Specifically, I see some indication this might be userspace related (nouveau just doesn't recover from the userspace fault)
13:40 kb9vqf: is this correct?
13:41 imirkin_: kb9vqf: i have no idea how to debug such errors unfortunately
13:41 imirkin_: kb9vqf: but yes, the triggering issue is often some sort of error from userspace
13:42 kb9vqf: imirkin_: Yeah, I remember that. I was hoping someone with the specific knowledge required might be on here ;-)
13:42 imirkin_: don't know that such people exist =/
13:42 kb9vqf: I do wonder what would be triggering the fault from userspace when activating a second X11 session though
13:42 kb9vqf: I can understand the fault when changing OpenGL apps
13:42 imirkin_: a separate X session isn't that different
13:43 kb9vqf: how so?
13:43 imirkin_: it gets its own channel and all
13:43 kb9vqf: ah
13:43 kb9vqf: is there a document explaining this channel concept anywhere?
13:43 imirkin_: not really
13:43 imirkin_: but it's basically a hardware-switchable unit
13:43 imirkin_: which holds all the context state
13:43 imirkin_: like a process
13:43 imirkin_: but a process is like 64 bytes, while a context is like many megabytes
13:44 kb9vqf: so there are a finite number of these channels, with the max set by hardware?
13:44 imirkin_: mmmmmaybe
13:44 imirkin_: you might only be able to have like 128 scheduled at a time
13:44 kb9vqf: or is it more like a VMCB where you can have as many as you want but only one is active at a time?
13:44 imirkin_: not 100% sure on the details.
13:45 imirkin_: the HW can switch between contexts on its own
13:45 imirkin_: only one is ever active
13:45 imirkin_: but the HW is aware of some finite number of them
13:45 kb9vqf: gotcha
13:45 imirkin_: and then the software can probably hold an infinite quantity of other ones
13:45 imirkin_: but i don't know that nouveau enables this theoretical functionality
13:45 karolherbst: imirkin_: how many pstate changes per seconds would be a "good" benchmark for stability?
13:45 imirkin_: [i also don't know that it doesn't :) ]
13:45 kb9vqf: so then the first place I'd look is to see if the X11 session I'm switching away from doesn't release the channel
13:45 kb9vqf: ?
13:46 imirkin_: karolherbst: ask mupuf or RSpliet. i believe "tight loop" is a good metric.
13:46 imirkin_: kb9vqf: huh?
13:46 karolherbst: so rather if I change every 0.001 seconds and it still works, its "good"
13:46 imirkin_: the issue is that some resource gets written when it's not mapped
13:46 kb9vqf: hmmm, ok
13:46 imirkin_: we change the VM on every context switch
13:46 kb9vqf: I'm still very new to anything GPU related :-)
13:46 imirkin_: aren't we all
13:47 kb9vqf: well, more new than any of you I think ;-)
13:47 imirkin_: the blind leading the blind...
13:47 kb9vqf: heh
13:47 imirkin_: except maybe skeggsb :)
13:47 imirkin_: who's merely one-eyed with blurry vision
13:47 kb9vqf: I guess I was wondering if the old X11 session could have locked this hardware resource, causing problems with the new session trying to use it
13:48 imirkin_: "hardware resource"?
13:48 kb9vqf: you were saying a channel is a context that controls some bit of hardware
13:48 imirkin_: as in "the gpu"?
13:48 imirkin_: the gpu decides when to switch contexts
13:48 kb9vqf: oh, ok
13:49 imirkin_: and invokes user-defined firmware when it makes that decision
13:49 kb9vqf: so what exactly is in a context?
13:49 imirkin_: whose job it is to save some stuff off, maybe load some new things, and flip the VMs (i guess)
13:49 imirkin_: all gpu state
13:49 karolherbst: imirkin_: mhhhhh bash echo totally messed up, but gpu still running like nothing is the problem
13:50 kb9vqf: so if I understand correctly, a new X11 session consumes one context by itself
13:50 imirkin_: [or at least a lot of it]
13:50 imirkin_: yes
13:50 kb9vqf: and each opengl application consumes another?
13:50 imirkin_: well, the DDX does
13:50 imirkin_: that's right
13:50 kb9vqf: hmm, ok
13:50 kb9vqf: make sense
13:50 kb9vqf: *makes
13:50 imirkin_: it's not an ideal situation
13:51 imirkin_: i think the blob driver might have everything share a single channel
13:51 kb9vqf: well, actually I like Nouveau's approach for one reason: the hung channel doesn't kill the entire system
13:51 imirkin_: hung gpu does though
13:51 kb9vqf: yes, well
13:51 imirkin_: and hung channels often lead to that
13:51 imirkin_: since you'll end up trying to wait on some fence
13:51 imirkin_: or... something
13:51 kb9vqf: huh
13:51 kb9vqf: ok
13:52 karolherbst: imirkin_: I was thinking, maybe the gpu could be "full reset" sometimes to solve stuff?
13:52 kb9vqf: so I guess I could interpret the error message as "X11 thinks it should be receiving control of the GPU but the GPU is not switching to the proper context in time"?
13:53 imirkin_: karolherbst: yeah that'd be nice
13:53 karolherbst: I saw these GPU_RESET regs sometimes
13:53 karolherbst: and wanted to play with them
13:53 imirkin_: karolherbst: the ctxsw timeout is that the firmware was supposed to send an interrupt to the cpu
13:53 imirkin_: and the cpu never received it
13:53 imirkin_: which means the gpu is stuck
13:53 karolherbst: I see
13:54 karolherbst: and if that happens, the driver could trigger a gpu reset
13:54 karolherbst: which would mean to restore the state afterwards, too?
13:54 imirkin_: right, so you have to (a) figure out how to reset the gpu
13:54 imirkin_: (b) notify userspace that the gpu's been reset and it needs to reinit
13:54 karolherbst: I know that intel has some mechanics for that
13:55 imirkin_: (c) update userspace to perform these actions when receiving such a notification
13:55 imirkin_: i'm not saying it's impossible...
13:55 imirkin_: intel also has documentation
13:55 karolherbst: yeah, but userspace doesn't crash
13:55 karolherbst: and seems to work as if nothing happened
13:55 imirkin_: yeah, it's a good system
13:55 karolherbst: although it hangs for some time
13:56 karolherbst: ohh mhhh
13:56 karolherbst: the pstate interface itself has an issue somewhere
13:56 imirkin_: yeah, so they got their shit figured out
13:56 imirkin_: we don't
13:57 kb9vqf: imirkin_: Aha, there's a bit more info here: http://nouveau.freedesktop.org/wiki/NouveauCompanion_5/
13:57 karolherbst: imirkin_: https://gist.github.com/karolherbst/41c68ae5ce9f3303f1d5 any idea?
13:58 imirkin_: good thing i didn't entirely nuke those when cleaning up the wiki :)
13:58 imirkin_: karolherbst: there's lines missing before
13:58 karolherbst: mhh?
13:58 karolherbst: ohh no, its the process stack
13:58 karolherbst: no dmesg issue
13:58 kb9vqf: imirkin_: Yes, it gives important info that should be copied somewhere else. Are we still doing context switches on interrupt?
13:59 karolherbst: echo into pstate just hangs
13:59 imirkin_: karolherbst: some locking fail then
13:59 karolherbst: doesn't get nouveau unloaded though :/ meh
14:00 imirkin_: kb9vqf: sorry dunno
14:00 kb9vqf: ok, was just thinking as it would be interesting to see if the card just needed to be poked with another IRQ
14:03 imirkin_: you don't poke cards with an irq
14:03 imirkin_: a card pokes you with an irq
14:03 kb9vqf: yeah, brain cramp
14:04 kb9vqf: so do we control any of the firmware doing the context switching?
14:05 kb9vqf: ...or maybe I should just figure out what changed from this bisect: https://bugs.freedesktop.org/show_bug.cgi?id=90276#c15
14:05 imirkin_: yeah
14:05 imirkin_: http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/engine/gr/fuc
14:05 imirkin_: enjoy
14:06 imirkin_: aka "i want a compiler, but all i have is a macro processor"
14:07 kb9vqf: so nouveau actually uploads its own firmware to the card and this controls the context switch (among a lot of other things)? interesting
14:08 imirkin_: well, it's gotta upload *something* in there
14:08 kb9vqf: Like I said, I know very little about these cards
14:09 kb9vqf: My experience is in other systems, so I can piece together what's going on but I need correct info to do so :-)
14:09 imirkin_: don't know that such info could be provided
14:09 imirkin_: merely best guesses
14:10 kb9vqf: understood
14:29 mupuf: karolherbst: unless you got rid of the wait for vblank, you cannot reclock faster than the refresh rate of your screen
14:29 mupuf: oh! laptop gpu
14:29 mupuf: well, I guess it solves the problem :p
14:36 karolherbst: :)
14:36 karolherbst: I was running glxspheres though
14:36 karolherbst: at around 900 fps
14:37 karolherbst: reclocking every 0.1 seconds kind of worked for a long time though
14:37 karolherbst: but now the module is in some kind of dead lock :/ so I have to reboot
14:38 karolherbst: mupuf: I just wanted to check if its more stable with higher core voltage
14:38 mupuf: how long did it last?
14:39 karolherbst: mhh minutes
14:39 mupuf: considers it stable after about a million reclocks
14:39 karolherbst: around 5 or 6
14:39 karolherbst: yeah well
14:39 karolherbst: the gpu was still running
14:39 karolherbst: no problem
14:39 karolherbst: just the echo call got dead locked
14:39 mupuf: usually, if it lasted that long, it would last forever
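A rough back-of-the-envelope check of the numbers in this exchange, as a minimal sketch. The function names are hypothetical, and the 60 Hz figure is an assumption standing in for "the refresh rate of your screen"; only the 0.1 s interval, the 5 to 6 minutes, and the "about a million reclocks" target come from the chat.

```python
def reclocks_done(interval_s, minutes):
    """Number of reclocks performed when switching once every interval_s seconds."""
    return round(minutes * 60 / interval_s)


def hours_needed(target_reclocks, reclocks_per_second):
    """Wall-clock hours needed to reach target_reclocks at a fixed rate."""
    return target_reclocks / reclocks_per_second / 3600


if __name__ == "__main__":
    # One reclock every 0.1 s for 5 to 6 minutes, as reported above.
    print(reclocks_done(0.1, 5), reclocks_done(0.1, 6))
    # Time to reach a million reclocks at 10 per second.
    print(hours_needed(1_000_000, 10))
    # Same target at an assumed 60 Hz vblank-bound rate.
    print(hours_needed(1_000_000, 60))
```

Under these assumptions the 5 to 6 minute run covers a few thousand reclocks, well short of the million mentioned as a stability threshold.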
14:40 karolherbst: I also noticed a fps impact by around 40% in glxspheres, but well
14:40 karolherbst: this could be due to the 0a pstate it got locked on
14:41 karolherbst: mhh
14:41 karolherbst: rmmod hangs here: https://gist.github.com/karolherbst/3ab07d4469c2974f7390
14:45 karolherbst: mupuf: the PWM issue for me is, that the card is at 0x2e with nouveau, but the blob uses 0x3e of 0x3f at highest pstate, so I assume this is another issue I encountered before (the gpu got stuck after around 1000 reclocks then)
14:50 mupuf: well, increasing the voltage may help as you will reach the necessary voltage faster
14:51 karolherbst: with increased voltage the gpu didn't get stuck at 0f so far
14:52 karolherbst: without it after 30 minutes it hangs sometimes
14:52 karolherbst: but this should happen on 0a as well, need to investigate it a bit
15:03 mupuf: it is likely due to us not waiting for something to be over before pausing the gpu
15:03 mupuf: or us reclocking while some transfer is going on
15:04 karolherbst: mupuf: this is the stack from bash: https://gist.github.com/karolherbst/6b7708bdc3e616b61ede
15:11 mupuf: hmm, looks weird
15:12 mupuf: well, there may be a bug in our code to schedule reclocks
15:12 mupuf: but why the heck would kstrtoull call nvif?
15:12 mupuf: the stack looks weird
15:12 mupuf: it looks like it got interrupted
15:12 mupuf: and then deadlocked
15:20 karolherbst: yeah
15:20 karolherbst: maybe
15:20 karolherbst: the kstrtoull thing is strange
15:31 karolherbst: mupuf: by the way, do you still want to test the gddr5 patch? skeggsb plans to land a proper fix for 4.4 and I would like to catch some corner cases or other issues before that
15:37 pmoreau: Cool! A regression to track: connecting an external screen did work in 3.19 as long as powerlvl >= 2.
15:38 pmoreau: And apparently running glxgears didn't make the screen flicker like hell... Weird
15:39 pmoreau: Ah ah, but it does when disconnecting the screen! O.O
15:39 mupuf: karolherbst: sure, will try it during the weekend
15:39 mupuf: need to arrive early at work tomorrow
15:39 mupuf: and finished the xorg board meeting 40 minutes ago
15:40 karolherbst: k
15:40 karolherbst: thanks
15:49 Karlton: karolherbst: is that a fix for kepler reclocking?
15:50 karolherbst: gddr5, yes
15:51 imirkin_: pmoreau: bisect?
15:51 karolherbst: for memory clocks above 2.4 GHz usually
15:51 karolherbst: Karlton: didn't you test something like that already?
15:52 karolherbst: I got the impression it worked for you
15:53 Karlton: karolherbst: no my problem was that after I reclocked, it would hang if I just tried to use it to play a video or something :P
15:53 Karlton: GK106 (NVE6)
15:53 karolherbst: gddr5?
15:53 Karlton: yeah
15:53 karolherbst: reclocked to 0f?
15:54 Karlton: yeah 0f: core 549-1293 MHz memory 6008 MHz
15:54 karolherbst: then try if that helps: https://github.com/karolherbst/nouveau/commit/57cfbc0acd3c8e8c8d58c91eb7cf4c813b819c3d
15:54 Karlton: i'll try that :)
15:54 karolherbst: with that you should also be able to reclock to 0f while something is running
15:54 karolherbst: it shouldn't matter then anymore
15:55 karolherbst: although it can still hang the gpu for reasons I don't understand but skeggsb does
15:55 karolherbst: so if it hangs for the first try it could be still bad luck
15:55 karolherbst: but it should work in 99.9% of all cases
15:55 imirkin_: twice... coincidence
15:55 karolherbst: :)
15:56 imirkin_: third time -- it's a pattern ;)
15:57 karolherbst: I never understood the fN part really though, so if somebody wants to help out with that, I would be glad
15:58 RSpliet: karolherbst: I can't tell you the electrotechnical properties, but it really is just a factor in the equation
15:59 karolherbst: imirkin_: it seems like the high clock PLL doesn't have a fN parameter, this could explain why the blob is still sometimes like 6MHz off the requested value
15:59 karolherbst: RSpliet: yeah I know
15:59 karolherbst: but I didn't manage to convert it into a simple formula
16:00 RSpliet: why would you want the formula to be simple?
16:00 karolherbst: I don't want to have a trial and error approach there
16:00 RSpliet: you don't have to, right ?
16:01 karolherbst: I have already two nested for loops
16:01 karolherbst: don't want a third one
16:01 RSpliet: since it has such a limited effect, it makes sense to use it for fine-tuning afterwards instead
16:01 karolherbst: also the fN values can be anything from 0x0 to 0xffff
16:01 RSpliet: so do your calculations assuming fN is 0
16:01 karolherbst: ohh right, but still
16:01 karolherbst: I just think it's easier to calculate the nearest fN
16:02 karolherbst: than to iterate with O(log) over 0x0 to 0xffff
16:02 RSpliet: well, then that's what you need to do ;-)
16:02 imirkin_: log2(0x10000) = 16
16:02 imirkin_: not SUCH a huge iteration...
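To illustrate the "log2(0x10000) = 16" remark, here is a minimal sketch of a binary search over the 16-bit fractional value. The function names, the kHz units, and the example numbers are assumptions for illustration; the expression being searched is the one karolherbst quotes just below in the log, with the search run over the already-offset 16-bit value v = (u16)(fN + 4096) so that the function is monotonic.

```python
def contribution(v, clk, M, P):
    """Clock contributed by the fractional part, with v = (u16)(fN + 4096)."""
    return ((v * clk) >> 13) // (M * P)


def search_fn(diff, clk, M, P):
    """Find the largest 16-bit v whose contribution does not exceed diff.

    Returns (fN, steps), where fN is the value re-encoded as a 16-bit field.
    """
    lo, hi, steps = 0, 0xFFFF, 0
    while lo < hi:
        mid = (lo + hi + 1) // 2          # upper midpoint so the range always shrinks
        if contribution(mid, clk, M, P) <= diff:
            lo = mid
        else:
            hi = mid - 1
        steps += 1
    fN = (lo - 4096) & 0xFFFF             # undo the +4096 offset, wrap to 16 bits
    return fN, steps


if __name__ == "__main__":
    # Hypothetical inputs: 27000 kHz reference, 5000 kHz to make up, M = P = 1.
    print(search_fn(5000, 27000, 1, 1))
```

Because the candidate range halves exactly on every pass, the loop runs 16 times for a 16-bit field regardless of the inputs.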
16:02 karolherbst: usually I need to resolve this to fN: (((u16)(fN + 4096) * clk) >> 13) = diff
16:03 karolherbst: mhh
16:03 imirkin_: i've seen longer for loops :p
16:03 karolherbst: ohh wrong, diff = (((u16)(fN + 4096) * clk) >> 13) / (M * P)
16:04 karolherbst: input should be diff, clk, M and P
16:04 karolherbst: output fN obviously
16:04 RSpliet: why a for-loop for a linear thing like fN?
16:04 karolherbst: ?
16:04 karolherbst: 0xffff possible values?
16:05 karolherbst: how else should I do that
16:05 RSpliet: http://hastebin.com/lupejawaru.coffee
16:06 karolherbst: yeah well, I got that far
16:06 RSpliet: now you know what clk, target, N, M and P are because this is the final fine-tuning
16:06 RSpliet: then let's peel further :-)
16:07 karolherbst: I just need to get a formula like fN= ... something
16:08 karolherbst: but this bit shift and the cast is a bit annoying and I don't get around those
16:08 RSpliet: http://hastebin.com/mafowezotu.coffee ?
16:09 RSpliet: ignore the cast
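One way to isolate fN from the expression quoted earlier, sketched under the "ignore the cast" simplification. This is not necessarily what the linked pastes contain; the function names, kHz units, and example values are assumptions, and the result is only meaningful while fN + 4096 stays inside the 16-bit range.

```python
def fn_contribution(fN, clk, M, P):
    """diff = (((u16)(fN + 4096) * clk) >> 13) / (M * P), as quoted in the chat."""
    return ((((fN + 4096) & 0xFFFF) * clk) >> 13) // (M * P)


def solve_fn(diff, clk, M, P):
    """Closed-form estimate of fN for a wanted diff, ignoring u16 wrap-around.

    Rearranged: fN + 4096 ~= (diff * M * P * 2**13) / clk
    """
    scaled = ((diff * M * P) << 13) + clk // 2   # + clk // 2 rounds to nearest
    return (scaled // clk - 4096) & 0xFFFF       # encode as a 16-bit field


if __name__ == "__main__":
    # Hypothetical inputs: 27000 kHz reference clock, 5000 kHz wanted from fN.
    fN = solve_fn(5000, 27000, 1, 1)
    print(hex(fN), fn_contribution(fN, 27000, 1, 1))
```

The round trip is not exact because one step of fN is worth clk / 8192, so the recovered diff can be off by a few kHz, which is the rounding concern raised right after this in the log.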
16:09 karolherbst: mhhh
16:09 karolherbst: okay
16:09 RSpliet: I would double-check that though, this is just from the back of a beer-coaster
16:09 karolherbst: I see
16:10 karolherbst: I didn't want to ignore the cast, so well
16:10 RSpliet: (and especially see if you can run into rounding errors, because that number on the right could get big)
16:10 karolherbst: I want to ignore the (clk * N) part though
16:10 karolherbst: and just put the error in
16:10 RSpliet: that's a good idea yes
16:10 karolherbst: because I already have the error
16:11 karolherbst: the bad thing is though, I only got the error from the last PLL :/
16:11 karolherbst: its a bit messy somehow
16:11 karolherbst: so I need to transpose the error from the one pll to the other one
16:11 RSpliet: you can divide the error by the last PLL multiplication factor?
16:11 karolherbst: yeah
16:12 karolherbst: as long as P=1
16:12 karolherbst: but P=2 is also a possible value
16:12 karolherbst: which I ignore so far
16:13 RSpliet: anyway, basic mathematics, I trust you can push that to something elegant :-)
16:13 karolherbst: yeah, I think so
16:13 karolherbst: what does the P parameter do exactly by the way?
16:13 RSpliet: and the reason why I ignored the cast is because it only defines an upper limit on the fN value
16:14 karolherbst: or what is the difference between M and P?
16:14 RSpliet: for the maths, it shouldn't make a difference because overflow is bad :-)
16:15 RSpliet: doesn't seem to be any, but they might be a different type of divider physically (
16:15 karolherbst: mhh
16:15 karolherbst: M stays 1 anyway
16:16 RSpliet: it used to be that one is a linear factor, and the other is a 2^x kind of divider
16:16 RSpliet: but judging by the code, that's not the case on Kepler
16:17 karolherbst: mhh, but the calculate is taken from tesla
16:17 karolherbst: *calculation
16:18 karolherbst: orr wait mhh not sure though yet
16:18 karolherbst: no, its the same on tesla
16:20 karolherbst: mhh with P=2 it would be 4 calculations per refclock instead of 2
16:20 karolherbst: and there are 3*7 possible refclocks
16:21 RSpliet: no, the calculation is taken from 2nd gen tesla :-P
16:21 karolherbst: ohh right
16:21 karolherbst: gt215
16:21 RSpliet: that factor 2 is not going to make a difference
16:22 karolherbst: I know
16:23 karolherbst: I am just thinking if its needed to check P=2, because I should get near enough with fN already
16:23 RSpliet: and if it does; pre-calculate and store the values in the c-state table
16:24 karolherbst: RSpliet: mem clocks are pstate fixed so far
16:24 RSpliet: you get the idea
16:24 karolherbst: right
16:24 karolherbst: mhhh
16:24 RSpliet: hmm, you might want to check this: with P = 2, does it always just use odd values of N?
16:25 karolherbst: I could calculate these values on pstate creation time though....
16:25 karolherbst: RSpliet: was thinking already about this, so that I only have 3 calculations
16:25 karolherbst: will check that after I get to reboot :D
16:26 karolherbst: stupid deadlocks
16:26 karolherbst: RSpliet: mhhh I was thinking
16:26 RSpliet: good!
16:26 karolherbst: currently I do cur_N = target_khz / cur_clk; check; cur_N+=1; check;
16:27 karolherbst: but I could do a P+=1; cur_N *2 +1 check in between
16:27 karolherbst: or P+=1; cur_N * 2 - 1 last
16:28 karolherbst: cur_clk is lower PLL clock
16:28 karolherbst: output
16:29 RSpliet: what I would personally do, is find an "N*2" rather than an "N". Then if found: if even, P = 1, divide by two; if odd: P = 2, keep your N as the "N*2"
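The "N*2" suggestion can be sketched as follows. It assumes M stays 1 (as stated earlier in the log) and that the output is simply ref * N / P; the function names and the example frequencies are hypothetical, and any hardware limits on N or restrictions on when P=2 is allowed are not modelled.

```python
def pick_n_p(target_khz, ref_khz):
    """Choose (N, P) by searching for a doubled multiplier first.

    n2 approximates 2 * target / ref. An even n2 means P = 1 with N = n2 / 2;
    an odd n2 means P = 2 with N = n2, which gives half-step resolution.
    """
    n2 = (2 * target_khz + ref_khz // 2) // ref_khz   # nearest integer
    if n2 % 2 == 0:
        return n2 // 2, 1
    return n2, 2


def out_khz(ref_khz, n, p):
    """Resulting clock for a given multiplier and post-divider (M assumed 1)."""
    return ref_khz * n // p


if __name__ == "__main__":
    for target in (300000, 350000):
        n, p = pick_n_p(target, 100000)
        print(target, n, p, out_khz(100000, n, p))
```

With this scheme an odd doubled multiplier, and therefore P = 2, only shows up when the target sits closer to a half step than to a whole one.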
16:29 karolherbst: mhh
16:30 karolherbst: makes somehow sense
16:30 RSpliet: but many ways lead to Rome
16:30 karolherbst: but mostly P=1 on blob
16:31 karolherbst: I fear there are more restrictions to P=2 because its not that common
16:31 karolherbst: with this method it should be like 33% of all cases
16:31 karolherbst: but its far less
16:33 karolherbst: but I will investigate it
16:33 RSpliet: in fact, I don't see why *that* should be an exhaustive search either tbh; you iterate through your "low PLL" values, and you can calculate the P that is either underestimated or overestimated... you probably want an underestimate
16:33 RSpliet: because for that you can slightly account using the fN
16:33 RSpliet: sorry
16:33 RSpliet: the N
16:34 RSpliet: no need to try all of them :-)
16:34 karolherbst: yeah
16:35 karolherbst: but it doesn't matter if the actual clock is a bit higher
16:35 karolherbst: in the blob it is also like 1.5MHz higher than in the pstate
16:35 RSpliet: well, you can try and get pretty close with a heuristic like that :-)
16:35 karolherbst: yeah, I already do
16:35 karolherbst: except for the P=2 and fN part
16:35 karolherbst: I think the biggest difference was like 40MHz with that
16:36 karolherbst: which isn't much if you think about it
16:36 RSpliet: well, 40MHz is in the "I'd worry" range
16:36 karolherbst: it's less than 1%
16:36 RSpliet: yes, but esp. the top clock, they probably try to squeeze as much perf out of it as they can
16:36 karolherbst: I can safely overclock my memory by over 1000MHz with the blob
16:37 RSpliet: question is whether all memory out there is that overclockable
16:37 karolherbst: right
16:37 karolherbst: blob allows +4000 though
16:37 karolherbst: even more actually
16:37 karolherbst: stock clock is 4008MHz which is pretty low
16:38 pmoreau: imirkin_: Of course! :-) 4.2 was fine as well, so probably occurred during the rework. But I'll finish the bisect tomorrow.
16:38 RSpliet: and whether stability could change under load... I've seen some eye diagram of GDDR5, and those aren't pretty :-D
16:38 RSpliet: (or age for that matter...)
16:38 karolherbst: I see
16:38 imirkin_: pmoreau: ah ok. probably another casualty of cut & paste
16:39 pmoreau: Could be
16:39 RSpliet: so with tesla I always took the approach of "let's not worry about 3 MHz, but take the safe side if I can" (for as far as I did any of the real clock tree work, that is)
16:40 RSpliet: even though I was talking clocks of 810MHz, so... less than one percent :-)
16:40 RSpliet: skeggsb probably had some fun experience with that as well :-P
16:42 imirkin_: pmoreau: found one diff
16:43 imirkin_: - if (!nv_wait(disp, 0x610200 + (chid * 0x10), 0x00000000, 0x00000000)) {
16:43 imirkin_: + if (!(nvkm_rd32(device, 0x610200 + (chid * 0x10)) & 0x00030000))
16:43 imirkin_: seems like the old code was buggy, but... different.
16:43 imirkin_: skeggsb: --^
16:43 skeggsb: yeah, the old code was buggy
16:43 skeggsb: though.. that doesn't mean the new one isn't ;)
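A small model of why the two lines in the diff behave differently. It assumes the old helper polls until (register & mask) == value, which is how the four-argument call reads; that semantic, the function name `wait_masked`, and the stuck register value are assumptions for illustration, not taken from the driver source.

```python
def wait_masked(read_reg, mask, value, max_polls=1000):
    """Poll until (reg & mask) == value; True on match, False on timeout."""
    for _ in range(max_polls):
        if (read_reg() & mask) == value:
            return True
    return False


def stuck_busy():
    """Hypothetical register read that keeps both 0x00030000 bits set."""
    return 0x00030000


if __name__ == "__main__":
    # Old-style call: mask 0 and value 0 match on the first poll, so nothing is waited for.
    print(wait_masked(stuck_busy, 0x00000000, 0x00000000))
    # New-style condition: wait for the 0x00030000 bits to clear, which times out here.
    print(wait_masked(stuck_busy, 0x00030000, 0x00000000))
```

Under that assumption the old call could never report a timeout, while the new condition can, which may be enough to change behaviour on hardware where those bits stay set.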
16:44 RSpliet: :-D
16:44 imirkin_: well... new one checks with that 30000 mask later on
16:44 pmoreau: I'll try to revert it back and see how it goes :)
16:44 imirkin_: this is in nv50_disp_pioc_init
16:44 skeggsb: yeah i know
16:44 skeggsb: it inherited the new condition from gf119 fwiw
16:45 RSpliet: doesn't seem written as a timeout though
16:46 imirkin_: RSpliet: i didn't paste the whole new thing
16:46 pmoreau: From Nouveau point of view, is plugging a mDP -> HDMI adaptor different from plugging a mDP -> VGA adaptor?
16:46 RSpliet: ok :-)
16:46 imirkin_: the new timeout style is a little trippy... highly surprised cpp allows it
16:46 skeggsb: pmoreau: yes, the former is likely tmds passthrough, whereas the latter is actual displayport
16:46 imirkin_: pmoreau: yes
16:47 skeggsb: not necessarily though, if you have an active dp->hdmi adapter
16:47 imirkin_: skeggsb: does pioc play in any of that?
16:47 skeggsb: nope
16:47 pmoreau: Ok, so I'll probably need a mmiotrace for all three of them, as only one of them works with Nouveau.
16:47 imirkin_: didn't think so
16:48 imirkin_: well, i just audited the timer conversion for disp and that was the only diff i foun
16:48 imirkin_: found*
16:48 imirkin_: the reason i'm picking on timer is that it's really easy to make a typo
16:49 pmoreau: Or on regs (cf. gt215 reclocking) ;)
16:49 RSpliet: that was in a timeout :-P
16:49 pmoreau: Damn
16:49 pmoreau: I had forgotten
17:09 Karlton: karolherbst: what kernel do I need to apply that commit to?
17:11 Karlton: I tried linux4.2 but they are too different xD
17:25 karolherbst: ohh right, you need the nouveau rework for that
17:25 karolherbst: Karlton: you can build nouveau from this branch though: https://github.com/karolherbst/nouveau/commits/gddr5
17:25 karolherbst: and just load the module
17:41 sarnex: imirkin_: does nouveau support 970m?
17:41 imirkin: sarnex: depends what chip is inside it
17:41 imirkin: i don't do marketing names
17:42 sarnex: i have someone in #d3d9 getting 'nouveau_drm_screen_create: unknown chipset nv124'
17:42 skeggsb: then no, not supported - requires signed firmware, and secure boot support - neither of which we currently have
17:42 sarnex: is it old kernel/ddx or really unsupported
17:43 sarnex: ah i see. thanks guys
17:44 karolherbst: what? secure boot is also required?
17:44 karolherbst: oh man :/
17:45 sarnex: nvidia is brutal
17:45 skeggsb: karolherbst: secure boot of the gpu, not what you're thinking of
17:45 karolherbst: ahh k
17:46 karolherbst: I already wanted to ask if the key has to be inside a whitelist or something ...
17:46 karolherbst: but what does "secure boot of the gpu" mean?
17:47 skeggsb: a fun, convoluted, process of loading the pmu/graphics falcon ucode in such a way that it can't be tampered with
17:48 karolherbst: sounds nice
17:48 skeggsb: not really, it's the same crap that's preventing us from doing acceleration on gm2xx right now
17:48 skeggsb: the graphics falcon can't even touch its own registers anymore..
17:48 karolherbst: uhhh
17:49 karolherbst: thats harsh
17:49 skeggsb: (unless the falcon is booted in light secure mode, which requires pmu heavy secure, which requires signed firmware)
17:49 skeggsb: the hilarious thing is that the host can still touch all those registers directly...
17:49 karolherbst: :D
17:50 imirkin: skeggsb: but not fast enough ;)
17:50 skeggsb: i have a host-based port of our gr ucode that works on gm2xx, it's horrendously laggy though compared to doing it from the falcon
17:50 Karlton: karolherbst: I am playing a video at 0f now :-D
17:50 karolherbst: Karlton: good
17:50 skeggsb: (unusably so for desktop usage)
17:50 karolherbst: Karlton: try switching pstates
17:51 karolherbst: while playing video
17:51 Karlton: k...
17:51 karolherbst: and do it until the gpu hangs :p
17:52 imirkin: skeggsb: how big is a context?
17:52 skeggsb: it varies a lot, mostly depending on how many TPCs you have
17:52 Karlton: crap
17:52 Karlton: my screen fucked up xD
17:52 skeggsb: in the hundreds of kilobytes though, on average
17:53 karolherbst: Karlton: you gotta be careful :p
17:53 imirkin: skeggsb: oh hm. i thought it was at least 1MB
17:53 karolherbst: Karlton: how many switches did it take?
17:53 skeggsb: imirkin: it's entirely possibly i'm misremembering too :P
17:53 skeggsb: possible*
17:54 Karlton: karolherbst: I went from 07 to 0f and started getting artifacts
17:54 karolherbst: Karlton: also how high is your memclock at 0f?
17:54 Karlton: 0f: core 549-1293 MHz memory 6008 MHz AC DC *
17:54 karolherbst: mhhh
17:54 karolherbst: I meant the last line
17:54 karolherbst: the "actual" clock
17:54 karolherbst: this is the clock you should use
17:54 imirkin: skeggsb: well i'm not basing it on any real measurement... just a vague thought
17:55 imirkin: Karlton: try stepping to 0a first?
17:55 karolherbst: imirkin: shouldn't matter generally :/
17:55 Karlton: I switched to 0f and everything is fine again
17:55 karolherbst: maybe the clock got too high
17:55 karolherbst: :D
17:55 karolherbst: yeah
17:55 karolherbst: it unscrews itself, nice
17:56 karolherbst: Karlton: but really, what does the last row say inside the pstate file at 0f?
17:56 Karlton: AC: core 1293 MHz memory 5976 MHz
17:57 karolherbst: mhh okay
17:57 karolherbst: 32MHz difference
17:57 karolherbst: Karlton: is it okay now at 0f?
17:57 karolherbst: or does the screen always get screwed up?
17:57 karolherbst: or only sometimes?
17:58 Karlton: yeah I can switch again without anything messing up
17:58 Karlton: but the video stopped
17:58 Karlton: err well finished playing I mean
17:58 karolherbst: yeah, that's expected I guess then
17:58 karolherbst: ohh
17:58 karolherbst: k
17:58 Karlton: it kept playing the whole time though
17:58 karolherbst: I think it's still a bit unstable if there is a lot of pressure on the memory
17:59 karolherbst: but a screwed up screen is better than a screwed up gpu I would say
18:00 Karlton: better than before when I couldn't even use 0d without it hanging
18:00 karolherbst: skeggsb: is this your isohub thingy you were talking about? screen screwed up while reclocking memory?
18:00 karolherbst: Karlton: ahh 0d and 0f are the same for you?
18:01 Karlton: they look the same but I could switch to 0f but never 0d for some reason
18:01 karolherbst: mhhh
18:01 karolherbst: it depends on some factors though
18:01 karolherbst: with stock nouveau, it makes a difference if you switch from 07 or 0a to 0d/0f
18:08 Karlton: karolherbst: I killed it with flightgear :D
18:08 karolherbst: mhh
18:08 karolherbst: how?
18:09 karolherbst: you just switched a lot or after the first switch already?
18:09 Karlton: I was already on 0f and just started flightgear
18:09 karolherbst: I see
18:09 karolherbst: you could try a second time though
18:10 Karlton: k...
18:14 karolherbst: mhh
18:15 Karlton: nope
18:15 karolherbst: but with normal nouveau you couldn't even change while doing nothing?
18:15 karolherbst: or what's the difference with the patch?
18:22 karolherbst: Karlton: it's a bit strange that starting something messes it up that much :/ usually changing the clock is the bigger problem
18:22 karolherbst: at least for me
18:33 karolherbst: imirkin: so it seems there are indeed some configurations which still have big troubles even with the patch :/
18:38 imirkin: not surprising
18:39 Karlton: karolherbst: sorry, I change to 0f and then start flightgear
18:40 Karlton: after it loads it plays for a few seconds and then hangs
18:40 karolherbst: Karlton: if you would like, could you give me the output of "nvapeek 0x132000 0x40" while in 0f?
18:40 karolherbst: without running anything
18:40 karolherbst: I see
18:40 karolherbst: Karlton: how is it with stock nouveau?
18:40 karolherbst: I am more interested in the general difference
18:41 karolherbst: if there is no difference, then it's bad, because I might still be missing something; if there is a little one, then it's fine already
18:42 Karlton: I think stock nouveau died quicker
18:42 Karlton: iirc
18:43 karolherbst: I see
18:44 karolherbst: but if you want to help you could change to 0f and then just do a "nvapeek 0x132000 0x40" and give me the output
18:44 karolherbst: maybe there is really something odd going on
18:44 Karlton: should dmesg be flooded with stuff?
18:44 karolherbst: when?
18:45 karolherbst: and it depends on what kind of stuff
18:45 karolherbst: when the gpu messed up, then yes
18:45 Karlton: yeah when changing pstates
18:46 karolherbst: really depends
18:46 karolherbst: what does dmesg say?
18:46 Karlton: like a bunch of [448.941216] nouveau 0000:01:00.0: gr: TRAP ch 2 [023faf0000 Xorg[2423]]
18:46 Karlton: and [448.941140] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [INVALID_OPCODE]
18:46 karolherbst: mhhh
18:47 karolherbst: the latter one is a bit odd :/
18:48 karolherbst: Karlton: does "nvapeek 0x20344" print anything?
18:50 Karlton: karolherbst: here is my dmesg: http://sprunge.us/WSZf
18:50 Karlton: karolherbst: I have to install that first :D
18:50 karolherbst: I see
18:50 karolherbst: the package is called envytools by the way
18:51 marcosps1: imirkin: based on your comments by email, so no need to have a isImmd64Load?
18:52 imirkin: marcosps1: no, you still need it...
18:52 imirkin: but it needs to work a little diff than the 32-bit one
18:54 marcosps1: imirkin: I really need to review how SSA works and then adapt the code as you said. It's not so clear how to implement it in your SSA approach. I was not that good at compilers at university ...
18:54 imirkin: marcosps1: SSA = static single assignment
18:55 imirkin: ok, iirc i asked you if you knew that stuff and you said 'yes' :p
18:55 imirkin: with SSA you can very easily know how a particular value is defined
18:55 marcosps1: imirkin: yes, I know a little about compilers, but not in these technical english names hehehe
18:55 imirkin: since it's only ever set once
18:56 Karlton: karolherbst: "nvapeek 0x20344" is "..."
18:56 marcosps1: imirkin: But, no problem at all, I like new challenges. I'll be stronger when I finish this task :)
18:57 karolherbst: Karlton: good
18:57 Karlton: karolherbst: at 0f "00132000: 98030001 00001001 10000000 00000000
18:57 Karlton: 00132010: 00000000 00000fff 00000000 00000000
18:57 Karlton: 00132020: 20030001 00062901 f0000000 00000300
18:58 karolherbst: I need 0x132030, too
18:58 Karlton: for "nvapeek 0x132000 0x40"
18:59 Karlton: "WARN: Can't probe 0000:01:00.0
18:59 Karlton: err derp hold on
18:59 Karlton: karolherbst: it's "00132030: 00001007"
19:00 karolherbst: then nvapeek 0x137320 and nvapeek 0x137330
19:01 Karlton: "nvapeek 0x137320" is nothing and "nvapeek 0x137330" is 00137330: 81200634
19:03 karolherbst: okay
19:03 karolherbst: thanks
19:18 marcosps1: imirkin: http://pastebin.com/7R8vwQiv
19:18 marcosps1: much better now. I believe now I got what you're trying to tell me.
19:19 imirkin: marcosps1: seems reasonable
19:19 marcosps1: imirkin: :)
19:19 imirkin: obviously hacky, but i assume that was the point
19:20 marcosps1: imirkin: should I try a better way to detect it, maybe when generating these instructions?
19:24 marcosps1: imirkin: in this same hacky way, I need to get both values from merge instr and change set inst to use it?
19:25 imirkin: no, your detection is fine
19:25 imirkin: in fact isImmd64Load() is largely right
19:25 imirkin: i think the name 'ld' is misleading
19:26 imirkin: i'd just call it 'i'
19:26 imirkin: you need to make sure that it's a double-wide merge
19:26 imirkin: merges can be 2-, 3- or 4-wide
19:27 imirkin: you can probably look at typeSizeof(dType) == 8
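A minimal C++ sketch of the detection discussed above. Because every SSA value has exactly one definition, the check can follow the value to its defining instruction and test whether that instruction is a 2-wide, 8-byte merge of two immediates. The `Value`, `Instr` and `Op` types and the `isImm64Merge` name are hypothetical stand-ins for illustration, not the nv50_ir classes.

```cpp
#include <vector>

// Hypothetical, simplified IR types (NOT the real nv50_ir classes).
enum class Op { Mov, Merge, Set };

struct Instr;

struct Value {
    bool isImmediate = false;
    Instr *def = nullptr;  // SSA: at most one defining instruction
};

struct Instr {
    Op op;
    unsigned dstSize;            // size of the result in bytes
    std::vector<Value *> srcs;   // source operands
};

// True if v is defined by a merge of exactly two immediates into an
// 8-byte result, i.e. a candidate 64-bit immediate.
bool isImm64Merge(const Value &v)
{
    const Instr *d = v.def;      // the single definition, if any
    if (!d || d->op != Op::Merge)
        return false;
    if (d->dstSize != 8 || d->srcs.size() != 2)
        return false;            // merges can also be 3- or 4-wide
    return d->srcs[0]->isImmediate && d->srcs[1]->isImmediate;
}
```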
19:30 marcosps1: imirkin: nice, done here. And now, how to get the value. Do I need to get the values from merge itself?
19:30 imirkin: you need to retrieve the 2 immediates
19:30 imirkin: put them together into one
19:30 imirkin: and then do something like bld.mkImm() to create a 64-bit immediate
19:30 imirkin: note that chances are code for that doesn't *quite* exist
19:32 * marcosps1 is looking at how to put them together
19:33 imirkin: a << 32 | b
19:33 imirkin: ;)
19:33 imirkin: we're not fancy here.
19:34 marcosps1: :)
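A short C++ sketch of the `a << 32 | b` combination. The function name `combineImm64` is made up for illustration. The one detail worth spelling out is the widening cast: if `a` is a 32-bit integer, shifting it by 32 is undefined behaviour in C++, so the high word has to be converted to 64 bits before the shift.

```cpp
#include <cstdint>

// Combine two 32-bit halves into one 64-bit value.
// The cast must happen before the shift; the parentheses only make the
// (already implied) precedence of << over | explicit.
constexpr uint64_t combineImm64(uint32_t hi, uint32_t lo)
{
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```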
20:05 marcosps1: imirkin: Hum, bld.mkImm exists... :)
20:05 marcosps1: exists with uint64_t as parameter
20:06 imirkin: yeah, so just give it the combination of the 2 32-bit values
20:06 imirkin: and you should be good to go, at least on that part
20:06 marcosps1: imirkin: the next tricky will be to put this new value on set instead of the reg...
20:10 imirkin: marcosps1: i->setSrc(1, bld.mkImm(foo))
20:11 marcosps1: imirkin: wow, that easy :o
20:21 marcosps1: imirkin: http://pastebin.com/vtkJiDWi
20:22 marcosps1: I'll fix the shift part, but for now the mkImm is crashing. It seems the prog is null. I'm investigating about it...
20:22 imirkin: marcosps1: sure, that seems reasonable.
20:22 imirkin: you also need to validate that the imm can actually be loaded of course
20:22 imirkin: oh right
20:22 imirkin: ignore the builder
20:22 imirkin: do
20:22 imirkin: new_ImmediateValue(prog, the-value)
20:22 imirkin: [i think]
20:23 imirkin: grep around for the proper usage... it's something like that
20:23 imirkin: [note that it's *not* 'new ImmediateValue'... it's all allocated from an arena]
20:28 marcosps1: imirkin: Hum... not we have another problem: newImmediateValue only accepts uint32_t
20:29 marcosps1: *now we have
20:29 * marcosps1 is sleepy
20:31 imirkin: yeah you need to fix it up
20:31 marcosps1: imirkin: yes, we love to add new macros :)
20:31 imirkin: macros don't care about types
20:31 marcosps1: imirkin: now I understand why you created this trello card :)
20:33 marcosps1: imirkin: I'm looking at that arena allocation.
20:35 imirkin: marcosps1: just add the proper constructor
20:35 imirkin: (wait, how does bld.mkImm work then??)
20:35 marcosps1: imirkin: Yes, it seems ImmediateValue needs this new ctor
20:36 marcosps1: mkImm with uint64_t uses (uint32_t) 0 as parameter of new_Immediate
20:37 imirkin: haha ok
20:37 imirkin: problem solved then ;)
20:37 marcosps1: :)
20:38 marcosps1: imirkin: now, can you please help me with the shift op?
20:39 imirkin: add parens?
20:39 imirkin: oh, and cast to u64
20:44 marcosps1: imirkin: almost there... I'm getting a -nan now :)
20:45 imirkin: probably the print functions aren't ready for it
20:46 marcosps1: imirkin: hehe, it's getting huge :)
20:49 marcosps1: imirkin: also, I'm hitting a crash in emitter: nouveau_compiler: codegen/nv50_ir_emit_nvc0.cpp:340: void nv50_ir::CodeEmitterNVC0::setImmediate(const nv50_ir::Instruction*, int): Assertion `!(u32 & 0x00000fff)' failed.
20:49 marcosps1: can this be related to validating which ops we'll enable double immediates for?
20:51 imirkin: right, so you also need to fix up the emitter
20:51 imirkin: to know how to properly emit double immediates
20:52 marcosps1: imirkin: I believe now things are getting more difficult :)
20:53 imirkin: well i didn't say it'd be a 2-line change :p
20:54 marcosps1: imirkin: Yes, but I'm quite happy so far :) I'm learning a lot of things... as you said, there are a lot of things that need to be done in nouveau :)
20:55 marcosps1: AFAICS, the emitter is always checking for a u32 value. This needs to verify our ImmediateValue to check for our new ctor :)
20:56 imirkin: well
20:56 imirkin: you need to look at the type
20:56 imirkin: if the type is F64, then different rules apply
20:56 imirkin: the value needs to be able to fit in the top N bits of the double... 20 iirc?
20:59 marcosps1: imirkin: this I don't know :P
20:59 imirkin: that's why i was suggesting you look at the encodings in envydis
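A hedged C++ sketch of the "fits in the top N bits" rule mentioned above. The existing assertion `!(u32 & 0x00000fff)` rejects a 32-bit value whose low 12 bits are set, i.e. it keeps only the top 20 bits. The same idea for a 64-bit double would mean the low 64 - N bits must be zero. N = 20 is the figure recalled from memory in the chat, not a verified encoding detail, and the helper names `doubleBits` and `fitsInTopBits` are invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its raw IEEE 754 bit pattern.
inline uint64_t doubleBits(double d)
{
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// True if only the top n bits of a 64-bit pattern may be non-zero.
inline bool fitsInTopBits(uint64_t bits, unsigned n)
{
    if (n == 0)
        return bits == 0;
    if (n >= 64)
        return true;
    const uint64_t lowMask = (uint64_t{1} << (64 - n)) - 1;
    return (bits & lowMask) == 0;
}
```

With n = 20, values such as 1.0 or 1.5 pass because their mantissa bits all sit near the top of the word, while 0.1 does not because its mantissa uses all 52 bits.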
21:02 marcosps1: imirkin: I'll try to understand envydis code and talk with you again...
21:03 marcosps1: imirkin: I'm falling asleep here.. so, thank you for all the help today! See ya!