01:46 kloofy: on SMP there seem to be roughly two options, looking at the Bochs x86 PC, to bootstrap CPUs and exchange cores: it's done via the SIPI startup sequence and IPI messages, or it can be done even with interrupts, something they call APIC INIT level de-assert, after which the kernel scheduler changes the shared memory
01:46 kloofy: even with/even without
01:59 kloofy: although technically this de-assert still looks like an interrupt; not sure how to control in shared memory, without an interrupt, which core runs what code
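The INIT and SIPI messages kloofy describes are both sent by writing the local APIC's Interrupt Command Register (ICR), which is why the "de-assert" still looks like an interrupt: it is another IPI. Below is a minimal sketch of how the low ICR dword is encoded, assuming xAPIC mode and the field layout from the Intel SDM (vector in bits 0-7, delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15). The function names are hypothetical, and the actual MMIO write (which needs ring 0) is left out.

```c
#include <stdint.h>

/* Local APIC ICR (low dword) fields, xAPIC mode, per the Intel SDM. */
#define ICR_DM_INIT      (5u << 8)   /* delivery mode 101b: INIT */
#define ICR_DM_STARTUP   (6u << 8)   /* delivery mode 110b: Start-Up (SIPI) */
#define ICR_LEVEL_ASSERT (1u << 14)  /* level: 1 = assert, 0 = de-assert */
#define ICR_TRIGGER_LVL  (1u << 15)  /* trigger mode: 1 = level, 0 = edge */

/* INIT IPI, level-triggered and asserted. */
uint32_t icr_init_assert(void)
{
    return ICR_DM_INIT | ICR_TRIGGER_LVL | ICR_LEVEL_ASSERT;
}

/* "INIT level de-assert": the same INIT message with the level bit cleared.
 * Mainly relevant to older APICs; it is still an IPI, not a shared-memory write. */
uint32_t icr_init_deassert(void)
{
    return ICR_DM_INIT | ICR_TRIGGER_LVL;
}

/* Startup IPI: the vector is the 4 KiB page number of the real-mode
 * trampoline, so the AP starts executing at physical address vector << 12.
 * Returns 0 if the address cannot be encoded (not page-aligned or >= 1 MiB). */
uint32_t icr_sipi(uint32_t trampoline_phys)
{
    if ((trampoline_phys & 0xFFFu) != 0 || trampoline_phys >= 0x100000u)
        return 0;
    return ICR_DM_STARTUP | (trampoline_phys >> 12);
}

/* xAPIC destination: the target APIC ID goes in bits 24-31 of the ICR high dword. */
uint32_t icr_dest_high(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}
```

A kernel would write `icr_dest_high(id)` to the high dword first, then the low dword, sending INIT followed by two SIPIs with short delays in between.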
02:07 imirkin_: skeggsb: ping on https://lists.freedesktop.org/archives/nouveau/2016-August/025901.html
03:02 kloofy: gotta sleep it off, back pain... either way it's a bit easier to look at the FreeBSD source code, the Linux sources were a bit too big to start with
03:02 kloofy: http://fxr.watson.org/fxr/source/amd64/amd64/cpu_switch.S
03:03 kloofy: it really seems that interrupts are used, but it can be done in userspace; it really is done so that kernel threads are reentrant, while userspace threads map to them and switch contexts, i.e. save and restore registers to memory, and run on top of the kernel threads
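The "save and restore to memory" context switch between user threads can be illustrated with the ucontext functions (`getcontext`, `makecontext`, `swapcontext`). They were removed from POSIX.1-2008 but glibc still provides them, so this sketch assumes Linux/glibc. Each `swapcontext` call stores the current registers into one `ucontext_t` and loads another, with no kernel scheduling decision involved. All names other than the ucontext API are hypothetical.

```c
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];  /* user-allocated stack for the worker */
static char trace[16];
static size_t trace_len;

/* Append one character to the trace of who ran when. */
static void record(char c)
{
    if (trace_len + 1 < sizeof trace)
        trace[trace_len++] = c;       /* buffer was zeroed, so it stays NUL-terminated */
}

/* Runs on its own stack; each swapcontext() saves this context's registers
 * into worker_ctx and restores main_ctx, entirely in user space. */
static void worker(void)
{
    record('W');
    swapcontext(&worker_ctx, &main_ctx);
    record('w');
    /* returning resumes uc_link, i.e. main_ctx */
}

/* Returns the interleaving of main ('M', 'm', 'e') and worker ('W', 'w') steps,
 * or NULL if the context could not be captured. */
const char *run_demo(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;

    if (getcontext(&worker_ctx) == -1)
        return NULL;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    record('M');
    swapcontext(&main_ctx, &worker_ctx);  /* worker runs until its first swap */
    record('m');
    swapcontext(&main_ctx, &worker_ctx);  /* worker finishes and returns here */
    record('e');
    return trace;
}
```

`run_demo()` yields `"MWmwe"`: control bounces between the two stacks purely through saved register state.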
04:33 skeggsb: imirkin_: i actually thought i pushed it on the weekend, but apparently it was only local somehow
04:41 imirkin: skeggsb: ah ok. no worries.
04:41 imirkin: i doubt there are hordes of nv3x users looking to run nouveau_vieux on them :)
04:41 skeggsb: it's pushed now regardless :)
04:41 skeggsb: thanks for the ping!
04:41 imirkin: along with some other goodies i see
04:42 imirkin: oh, karol's series. nice!
04:44 imirkin: skeggsb: https://github.com/skeggsb/nouveau/commit/7ffeb1d32676033f738b0491fd62b2c148da6df4#diff-5498975b27c2b12d3163fd24db81ddd1R52
04:44 imirkin: i think those are supposed to all be 'static inline void ...'
04:45 skeggsb: yup, that seems likely
04:45 imirkin: [and can also drop the 'return;' in there]
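imirkin's review comment refers to a common kernel-header pattern: when a feature is compiled out, the header supplies empty `static inline void` stubs so callers need no `#ifdef`s, and a `void` function needs no trailing `return;`. This sketch uses hypothetical names (`CONFIG_EXAMPLE_FEATURE`, `example_feature_enable`); it does not reproduce the actual functions from the linked commit.

```c
struct example_dev { int enabled; };

#ifdef CONFIG_EXAMPLE_FEATURE
void example_feature_enable(struct example_dev *dev);
void example_feature_disable(struct example_dev *dev);
#else
/* Feature compiled out: empty stubs.
 * - 'static': every .c file including the header gets a private copy, so there
 *   are no duplicate-symbol link errors; a plain C99 'inline' could instead
 *   leave an undefined reference if the compiler chose not to inline it.
 * - 'inline': the call disappears entirely.
 * - no 'return;': a void function simply falls off the end. */
static inline void example_feature_enable(struct example_dev *dev) { (void)dev; }
static inline void example_feature_disable(struct example_dev *dev) { (void)dev; }
#endif
```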
04:46 skeggsb: i'll make a note and deal with that.. soon.. tried a different mst monitor today, and it's making me hate life
04:49 imirkin: heh, i'm sure airlied has a complete supply of life-sucking mst devices
04:49 imirkin: have you made a loop yet?
04:56 kloofy: https://www.cs.umd.edu/class/fall2015/cmsc412/percpu.pdf it looks very much like the thread switching is done via the GDT, which programs some sort of segment registers
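On the segment-register point: on x86-64 Linux, each CPU's GS segment base (loaded through an MSR, since a 64-bit base does not fit in a GDT descriptor) points at that CPU's per-CPU area, so the same `%gs:offset` access reaches a different copy on each CPU. The plain-C model below stands in an array for the GS base registers. It only illustrates the base-plus-offset idea, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define NR_CPUS_MODEL 4          /* hypothetical CPU count for the model */

/* Layout of one CPU's per-CPU area (the kernel builds this from a linker
 * section; here it is just a struct). */
struct percpu_area {
    int  current_task_id;        /* which task this CPU is running */
    long irq_count;
};

static struct percpu_area areas[NR_CPUS_MODEL];
static char *gs_base[NR_CPUS_MODEL];   /* stand-in for each CPU's GS base */

void percpu_init(void)
{
    memset(areas, 0, sizeof areas);
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        gs_base[cpu] = (char *)&areas[cpu];
}

/* Model of a "%gs:offset" access: same offset on every CPU, different base. */
static void *this_cpu_ptr_model(int cpu, size_t offset)
{
    return gs_base[cpu] + offset;
}

void set_current_task(int cpu, int task_id)
{
    *(int *)this_cpu_ptr_model(cpu, offsetof(struct percpu_area, current_task_id)) = task_id;
}

int get_current_task(int cpu)
{
    return *(int *)this_cpu_ptr_model(cpu, offsetof(struct percpu_area, current_task_id));
}
```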
05:00 gnurou: skeggsb: for a few weeks now I have been getting this error when compiling as a module from the Nouveau tree: drm/nouveau/uapi/drm/nouveau_drm.h:30:17: fatal error: drm.h: No such file or directory
05:01 gnurou: skeggsb: I can fix this by including drm/drm.h instead, but am surprised to see it in the first place - have you noticed this error?
05:03 imirkin: gnurou: are you working on anything nouveau-related of late?
05:04 btborg: Hi
05:04 btborg: I was wondering if anybody has managed to get Nouveau running on an Nvidia GeForce GTX 980Ti yet?
05:04 imirkin: btborg: it should work
05:05 btborg: GM200 is listed as supported on the Codenames page, but I only see Titan X
05:05 imirkin: btborg: it's a GM200 iirc... you might need a "late" kernel
05:05 btborg: from what I've heard 980Ti are Titan Xs (Maxwell) that failed the binning process
05:05 btborg: https://en.wikipedia.org/wiki/GeForce_900_series#GeForce_900_.289xx.29_series
05:05 gnurou: imirkin: still am, although I had to pull a bit from work last month
05:06 btborg: I'm wondering, has anybody gotten Nouveau working on 980Ti yet?
05:06 imirkin: btborg: like i said... should work.
05:07 skeggsb: gnurou: odd, it works for me
05:07 kloofy: reading the context of this difficult CPU stuff, maybe it could be done without segment registers; they would just be fast and convenient, but if done in userspace only, it can't be done without filling in those tables
05:07 gnurou: skeggsb: I tried both ARM and x86 builds, and got this for both...
05:08 imirkin: btborg: in v4.6+ by the looks of it
05:08 btborg: It should work, but I haven't found any evidence of a successful nouveau installation on a 980Ti system. I guess I'll try it again myself, with some bleeding edge distro sporting a super-recent kernel.
05:08 gnurou: skeggsb: well let's keep it that way for now if I'm the only one seeing this
05:08 btborg: thanks
05:08 imirkin: btborg: i'm fairly sure people have reported that GM200 works. dunno about 980 Ti specifically.
05:12 btborg: I'm wondering, how do they bin GPUs? Is it literally the same GPU, just restricted by software not to use certain cores or memory modules?
05:12 btborg: 980Ti vs. Titan X Maxwell, I'm asking
05:13 imirkin: fused off in hardware usually
05:13 btborg: damn
05:13 btborg: That could be one reason they locked down the firmware, though
05:14 imirkin: firmware has nothing to do with it
05:14 btborg: Like the recent hack for Non-K processor overclocking on Intel Skylake chips
05:14 btborg: recently patched by microcode updates, but still
05:15 btborg: Does Nvidia literally brick components on lower-binned boards?
05:19 btborg: Hmm... it seems they blow fuses on components they want to disable. It's apparently cheaper than making another board.
06:17 btborg: Hi
06:17 btborg: So it was a partial success
06:17 btborg: I got Nouveau working in Arch Linux on my GTX 980Ti
06:18 btborg: I could tell by the console framebuffer resolution
06:19 btborg: I was hoping to try out Wayland, but I couldn't get any Display Servers/GUIs/WMs to work with it, X, Wayland, or otherwise.
06:20 btborg: I probably could get it running, but it would require DRM trickery beyond my current scope of knowledge
06:20 imirkin: you need a recent linux-firmware
06:20 imirkin: otherwise you don't get acceleration
06:21 imirkin: you also need a semi-recent mesa... like 11.2 i think
06:21 imirkin: and a recent xorg :)
06:21 imirkin: moral of the story: you need recent stuff.
06:22 btborg: I live on the upstream edge, and constantly yaourt -Syyu
06:24 btborg: It's not the recommended way to install AUR packages, but I'm like screw it. I don't have time to optimize and compile every single package from source, Gentoo style.
06:25 btborg: I guess I'll just have to wait for kernel 4.6+, mesa and wayland support to become more ubiquitous
07:12 karolherbst: skeggsb, gnurou did you read my ping about those coherent patches breaking stuff for me?
07:13 karolherbst: and for mupuf as well
07:19 karolherbst: skeggsb: and thanks for merging
07:25 gnurou: karolherbst: missed that - what is it about?
07:30 karolherbst: shader-db (using egl-gbm) hangs roughly every second run, at screen_destroy time
07:31 karolherbst: gnurou: https://github.com/skeggsb/nouveau/commit/8fc2737c6b4da675358f12b3b714af0f4a9c390c
07:31 karolherbst: this one
07:31 karolherbst: and then after a while I get fence timeouts
07:42 gnurou: karolherbst: and reverting this commit fixes your issues?
07:45 gnurou: karolherbst: as in, just this commit and not the previous one? (6d58b87)
07:45 karolherbst: just this one is enough actually
07:46 karolherbst: yes
07:46 gnurou: that's unexpected, if anything it should enforce *more* coherency
07:46 karolherbst: no clue really
07:46 karolherbst: you can test it yourself with shader-db quite easily
07:46 karolherbst: just run it for one shader_test file
07:46 karolherbst: and it will hang quite often
08:07 gnurou: karolherbst: shader-db's README says "Currently it supports Mesa's i965 and radeonsi drivers", do you need to do anything to make it run on Nouveau?
08:09 karolherbst: gnurou: a little commit
08:09 karolherbst: but doesn't matter
08:09 karolherbst: you can still run it with nouveau
08:09 karolherbst: it is mainly for parsing the results and some other little things
08:10 gnurou: karolherbst: ok, so just e.g. "./run shaders/supertuxkart" should be enough to repro?
08:10 karolherbst: gnurou: yeah, should be
08:11 karolherbst: and I still need some help resetting the PMU :/ somehow I didn't manage to get it to run the nouveau firmware again
08:24 gnurou: karolherbst: mmm, "./run shaders" consistently passes with flying colors on my gm206, weird...
08:25 gnurou: and I test on 4.8-rc5, so I have the offending patch
08:28 karolherbst: mhh odd
08:29 karolherbst: I still run a 4.7 kernel though
08:29 gnurou: let me try with a kepler board
08:29 karolherbst: maybe something within drm is needed so that it won't break?
08:29 karolherbst: but mupuf also had this issue
08:29 gnurou: not that I know of...
08:29 gnurou: with 4.7 are you using Nouveau's current master? I can try to repro on a similar setup
08:32 karolherbst: gnurou: https://github.com/karolherbst/nouveau/commits/master_4.7
08:32 karolherbst: same as master from two days ago, just with some drm-next commit removed
08:33 gnurou: karolherbst: thanks, let me try that
08:35 karolherbst: gnurou: it might be that some drm-next commit was indeed important for that
08:36 gnurou: if I can repro on your branch and not on 4.8-rc5, then it might be that indeed
08:36 gnurou: for now git fetch is stalled, what's wrong with github...
08:37 karolherbst: it happens sometimes
08:37 karolherbst: abort and retry
08:37 karolherbst: :D
08:49 kloofy: only minor details left about the task descriptor EIP stack: whether it is a hardware stack pointer per process or global. basically, when an iret (return from interrupt) instruction is met, it pops the EIP (extended instruction pointer) entry off the stack for that particular core
08:50 kloofy: per processor
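On where IRET gets its instruction pointer: in 64-bit mode the CPU pushes SS, RSP, RFLAGS, CS and RIP onto the stack when an interrupt arrives, and IRETQ pops them back, so the return address comes from the current stack rather than from the TSS. On a transition from user mode, that stack comes from RSP0 in the CPU's own TSS, which is what makes it per-processor. The struct below models the frame as seen from the stack pointer after any error code; the helper is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame the CPU pushes on an interrupt in 64-bit mode, lowest address first.
 * IRETQ pops it in this order: RIP, CS, RFLAGS, RSP, SS. */
struct iret_frame {
    uint64_t rip;     /* where execution resumes after IRETQ */
    uint64_t cs;      /* code segment selector in the low 16 bits */
    uint64_t rflags;
    uint64_t rsp;     /* interrupted stack pointer */
    uint64_t ss;
};

_Static_assert(offsetof(struct iret_frame, rip) == 0, "RIP is popped first");
_Static_assert(sizeof(struct iret_frame) == 5 * 8, "five 8-byte slots");

/* Changing what IRETQ returns to only requires editing this frame (plus any
 * general-purpose registers the handler saved itself). Hypothetical helper. */
void redirect_frame(struct iret_frame *f, uint64_t new_rip, uint64_t new_rsp)
{
    f->rip = new_rip;
    f->rsp = new_rsp;
}
```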
08:59 gnurou: karolherbst: well "./run shaders" again always completes successfully on your branch (and kernel 4.7), on both gm206 and gk106
09:00 gnurou: karolherbst: which test are you running exactly? it shouldn't matter as I am running them all, but just in case
09:02 karolherbst: I am sure it happened with 0ad
09:02 karolherbst: but I will retry again at home I guess
09:06 gnurou: even doing a "while true" on this particular one I have no error
09:13 karolherbst: mhh I see
09:13 karolherbst: well I will check again
09:14 karolherbst: maybe it only happens with something special set in the kernel config, or something else odd
09:54 kloofy: The TSS may contain saved values of all the x86 registers. This is used for task switching. The operating system may load the TSS with the values of the registers that the new task needs and after executing a hardware task switch (such as with an IRET instruction) the x86 CPU will load the saved values from the TSS into the appropriate registers. Note that some modern operating systems such as Windows and Linux[1] do
09:54 kloofy: not use these fields in the TSS as they implement software task switching.
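For reference, this is the 32-bit TSS layout the quoted text describes, per the Intel SDM. Selector fields are 16 bits followed by 16 reserved bits. A hardware task switch (far JMP/CALL to a TSS or task gate, or IRET with EFLAGS.NT set) saves the outgoing registers here and loads the incoming task's. The struct name is hypothetical; the offsets are the architectural ones.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 Task-State Segment. Kernels doing software task switching mainly
 * use ss0/esp0 (ring-0 stack) and the I/O bitmap, not the saved-register fields. */
struct tss32 {
    uint16_t prev_task_link, r0;
    uint32_t esp0;
    uint16_t ss0, r1;
    uint32_t esp1;
    uint16_t ss1, r2;
    uint32_t esp2;
    uint16_t ss2, r3;
    uint32_t cr3;          /* page-table base of the task */
    uint32_t eip;          /* saved/loaded instruction pointer */
    uint32_t eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, r4;
    uint16_t cs, r5;
    uint16_t ss, r6;
    uint16_t ds, r7;
    uint16_t fs, r8;
    uint16_t gs, r9;
    uint16_t ldt_selector, r10;
    uint16_t trap;         /* bit 0: T flag, debug trap on task switch */
    uint16_t iomap_base;   /* offset of the I/O permission bitmap */
};

_Static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 at 0x1C");
_Static_assert(offsetof(struct tss32, eip) == 0x20, "EIP at 0x20");
_Static_assert(sizeof(struct tss32) == 104, "minimal TSS is 104 bytes");
```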
09:54 kloofy: so it's almost fully understood now
09:56 kloofy: Linux and Windows instead simply use FIFOs for the task lists in their various states
10:00 kloofy: simple FIFOs, so when a userspace task quits, it takes the program counter data out of the FIFO, quits the program, and, based on a flag, does not insert it at the top of the list anymore
10:01 kloofy: that userspace process will be killed
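A minimal round-robin run queue along the lines kloofy describes: take the task at the head, run it for one time slice, and put it back at the tail unless it has exited. This is closer to the per-priority FIFO queues used for SCHED_FIFO/SCHED_RR; Linux's default CFS class instead keeps runnable tasks in a red-black tree ordered by virtual runtime. The capacity and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define RQ_CAP 16   /* hypothetical fixed capacity */

struct task {
    int id;
    int remaining;  /* work units left; <= 0 means the task has exited */
};

struct runqueue {
    struct task *slots[RQ_CAP];
    size_t head, count;
};

static bool rq_push(struct runqueue *rq, struct task *t)
{
    if (rq->count == RQ_CAP)
        return false;
    rq->slots[(rq->head + rq->count) % RQ_CAP] = t;
    rq->count++;
    return true;
}

static struct task *rq_pop(struct runqueue *rq)
{
    if (rq->count == 0)
        return NULL;
    struct task *t = rq->slots[rq->head];
    rq->head = (rq->head + 1) % RQ_CAP;
    rq->count--;
    return t;
}

/* Round robin: records the id of each task run into 'order' and returns
 * how many slices were executed (at most 'max'). */
size_t rr_schedule(struct runqueue *rq, int slice, int *order, size_t max)
{
    size_t n = 0;
    struct task *t;
    while (n < max && (t = rq_pop(rq)) != NULL) {
        order[n++] = t->id;
        t->remaining -= slice;
        if (t->remaining > 0)
            rq_push(rq, t);   /* still runnable: requeue at the tail */
        /* otherwise the task exited and is simply not reinserted */
    }
    return n;
}
```

With tasks 1, 2, 3 needing 3, 1 and 2 slices, the run order is 1, 2, 3, 1, 3, 1.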
10:01 karolherbst: gnurou: do you think you will find some time to look into that pmu reset thing?
10:04 kloofy: it seems to be very simple, but there are too many possible ways to implement it
10:07 kloofy: it's like in the Linux kernel, there are so many complexities, different schedulers and different priority-based schedulers, not only basic round-robin FIFO; doing that in userspace seems to be a reasonable amount of work
10:15 kloofy: but in the end it's not so difficult either: when you know the number of processes and their priorities, the arithmetic for how many or how long slices each one gets should be based on them
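The arithmetic kloofy mentions can be as simple as sharing a scheduling period in proportion to each task's weight: slice_i = period × weight_i / Σ weights. Linux CFS follows the same proportional idea, with nice 0 mapped to weight 1024. The period and weights in this sketch are illustrative only, not the kernel's tunables.

```c
#include <stddef.h>

/* Proportional time slices: each task gets period * weight / sum(weights). */
void compute_slices(unsigned period_us, const unsigned *weight,
                    unsigned *slice_us, size_t n)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += weight[i];
    for (size_t i = 0; i < n; i++)
        slice_us[i] = total
            ? (unsigned)((unsigned long long)period_us * weight[i] / total)
            : 0;
}
```

For a 24 ms period and weights {1024, 1024, 2048}, the slices come out as 6, 6 and 12 ms.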
10:23 RSpliet: karolherbst: congrats on your merge
10:24 RSpliet: (and sorry I didn't take the time to review... *hides in a corner*)
10:36 kloofy: ah anyway, the process FIFOs are simple, but even with longer time slices they would not guarantee performance; if one used caching too, then there are more piles of code
10:42 kloofy: but still onger slices would obviously be faster, though it may still miss the cache line or such occasionally; i remember they used mlock for realtime
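What `mlock` buys a real-time program is that the locked pages stay resident in RAM, so later accesses cannot page-fault to disk. It does not keep anything in the CPU caches, so the occasional cache miss kloofy mentions remains. Whole-process setups typically call `mlockall(MCL_CURRENT | MCL_FUTURE)` instead. Unprivileged processes are limited by `RLIMIT_MEMLOCK`, so this sketch (helper names hypothetical) treats failure as a normal outcome.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Pin a buffer in RAM. On Linux a successful mlock() also faults the pages in.
 * Returns 0 on success, otherwise the errno reported by mlock(). */
int lock_buffer(void *buf, size_t len)
{
    return mlock(buf, len) == 0 ? 0 : errno;
}

/* Undo lock_buffer(). Returns 0 on success, otherwise errno. */
int unlock_buffer(void *buf, size_t len)
{
    return munlock(buf, len) == 0 ? 0 : errno;
}
```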
10:43 kloofy: *longer
11:10 Tomin: +
11:14 Tomin: oops, sorry, I rested my hand on numeric keyboard and accidentally pressed some keys
11:16 karolherbst: RSpliet: well there are even more patches :D
11:17 karolherbst: I count 16 patches
11:17 karolherbst: but this is more for the update on temperature change part, so it is a little bit more complex
11:53 karolherbst: k, in the next step I will add that boost debugfs file I guess
13:22 RSpliet: gnurou: not attending XDC this year? :-C
13:24 kloofy: not running the whole Linux kernel would make my life harder, since i'd need to implement the scheduler on my own, but not sure if Phoronix is on the right track; they say the Linux scheduler is not as good as people may think
13:24 RSpliet: then again, Nouveau is already pretty much outnumbered by NVIDIA there
13:24 kloofy: theoretically, again same as for a GPU, a little layer of run-ahead decoding into the instruction cache could be done
13:25 pmoreau: RSpliet: He was planning to, but had a setback. :-/
13:26 RSpliet: that's a shame!
13:27 kloofy: but i don't think there are big problems with the kernel's scheduler on the CPU, because actually, both with and without the kernel, the software-pipelining run-ahead decoding could be implemented in hardware
13:33 kloofy: i am just speculating that i could make it work faster by not running the whole kernel, but for that a tiny bit more reading is needed, because it needs the cache and page tables; i think that part comes with Altera's documentation
13:37 karolherbst: RSpliet: :O well, gnurou is actually both, so we have to find a referee … D:
13:38 kloofy: gnurou is the almightiest nouveau and NVIDIA duck :)
13:39 karolherbst: a football match would be funny or something :D
13:42 karolherbst: what would be the proper word for it, if it has to be understood by americans, but also not that silly word starting with s? :D
13:44 kloofy: shit?
13:44 kloofy: :) yeah i know soccer is what they call it in america
13:45 waltercool: Guys, where can I fetch the nouveau patches for boost?
13:50 karolherbst: waltercool: current master
13:50 karolherbst: https://github.com/skeggsb/nouveau
13:52 waltercool: thanks :) Didn't know that repo, I will add it into my kernel to do some testing
13:54 waltercool: karolherbst: do I need to have nouveau modularized in my kernel, compile and override the nouveau.ko, that's all?
14:00 waltercool: don't worry, found the documentation :thumbsup:
14:23 Tom^: karolherbst: it's not any different from your _v5, no?
14:23 Tom^: besides perhaps cosmetics :p
14:36 karolherbst: and some left out stuff, no
14:38 Tom^: oki
14:49 Tom^: cool just found a bug
14:50 Tom^: setting my 144hz monitor to 144hz in xorg with xrandr makes it garbled and flickering
14:50 imirkin: is that new?
14:50 Tom^: well its the first time i test it, got the monitor a few days ago
14:51 imirkin: how is it connected, and what resolution is it?
14:51 Tom^: DVI at 1920x1080
14:51 Tom^: the displayport didnt detect it at all
14:51 imirkin: errrrr
14:51 imirkin: can i see the xorg log?
14:51 imirkin: the modeline cvt comes up with is a 452MHz clock... which is clearly too high
14:52 Tom^: imirkin: https://gist.github.com/gulafaran/da8271ab93e360b4e0cbd945afc01772
14:53 imirkin: [ 13.410] (II) NOUVEAU(0): Modeline "1920x1080"x144.0 325.08 1920 1944 1976 2056 1080 1083 1088 1098 +hsync +vsync (158.1 kHz e)
14:53 RSpliet: cvt can't do reduced blanking for 144Hz?
14:53 imirkin: which should fit just fine in a dual-link cable
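The numbers in that modeline check out: refresh = pixel clock / (htotal × vtotal), and line rate = pixel clock / htotal. Single-link DVI is limited to a 165 MHz TMDS clock, and dual-link splits pixels over two links, roughly doubling that (commonly quoted as 330 MHz). So 325.08 MHz fits dual-link but not single-link. The struct and function names in this sketch are hypothetical.

```c
/* Values from a modeline: pixel clock in kHz plus horizontal/vertical totals. */
struct modeline {
    unsigned clock_khz;
    unsigned htotal, vtotal;
};

/* Vertical refresh in millihertz: clock / (htotal * vtotal). */
unsigned long long refresh_millihz(const struct modeline *m)
{
    unsigned long long pixels_per_frame = (unsigned long long)m->htotal * m->vtotal;
    return (unsigned long long)m->clock_khz * 1000ULL * 1000ULL / pixels_per_frame;
}

/* Horizontal line rate in Hz: clock / htotal. */
unsigned long long hsync_hz(const struct modeline *m)
{
    return (unsigned long long)m->clock_khz * 1000ULL / m->htotal;
}

#define DVI_SINGLE_LINK_MAX_KHZ 165000u  /* TMDS limit of one DVI link */
#define DVI_DUAL_LINK_MAX_KHZ   330000u  /* two links, commonly quoted limit */

/* Returns 1 if the mode's pixel clock fits the given number of DVI links. */
int fits_dvi(const struct modeline *m, int links)
{
    unsigned max = links >= 2 ? DVI_DUAL_LINK_MAX_KHZ : DVI_SINGLE_LINK_MAX_KHZ;
    return m->clock_khz <= max;
}
```

For the "1920x1080"x144.0 modeline (325080 kHz, htotal 2056, vtotal 1098), this gives 144000 mHz and 158112 Hz, matching the log's 144.0 Hz and 158.1 kHz.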
14:53 imirkin: Tom^: what about the 120hz setting?
14:54 Tom^: same thing
14:54 imirkin: are you SURE you're connecting it over DVI?
14:54 imirkin: coz it says DP1
14:54 Tom^: i had it booted in DP
14:54 imirkin: the cables do look pretty similar :p
14:54 Tom^: replaced it since it didnt detect it
14:54 Tom^: i guess i could restart X with it plugged in.
14:55 Tom^: to give you a proper Xorg log
14:55 imirkin: hold on
14:55 imirkin: DVI-D-1 i take it?
14:55 Tom^: DVI-I-1
14:56 Tom^: imirkin: https://gist.github.com/gulafaran/f5fd75eb3c4f850e1ece8b06a562f282 xrandr
14:57 imirkin: interesting. i thought that stuff normally came out in xorg logs
14:57 Tom^: imirkin: http://i.imgur.com/15Hjfzi.jpg this is how it looks like green static uh flicker :P
14:58 Tom^: works fine in 60hz tho :p
14:58 imirkin: what about 120hz?
14:58 Tom^: same
14:58 imirkin: or 100hz?
14:58 imirkin: same being... flicker or fine?
14:58 Tom^: flicker
14:58 Tom^: 99.93 gave me red flicker instead
14:58 Tom^: xD
14:59 RSpliet: Tom^: maybe silly, but did you double-check that the DVI cable you used is dual-link enabled? Also: which GPU?
14:59 Tom^: 780ti
14:59 Tom^: RSpliet: and yea didnt check that, will do.
14:59 RSpliet: ok, that rules out bandwidth problems I guess... :-)
15:00 Tom^: derp single link
15:00 Tom^:facepalms
15:02 Tom^: lol silly monitor vendor, a 144hz monitor but it ships with a single-link cable
15:04 Tom^: guess that leaves me with trying to figure out why the displayport didnt work.
15:04 Tom^: or is that a bit of a WIP in nouveau?
15:13 imirkin: Tom^: DP should work... DP-MST does not (but skeggsb is working on it)
15:13 imirkin: Tom^: normally we should be detecting single-link DVI cables and trimming those modelines out
15:14 Tom^: well, that doesn't seem to happen =D
15:14 imirkin: and even more normally, the *monitor* shouldn't send modelines over that the cable can't handle
15:15 imirkin: Tom^: http://www.abacus24-7.com/images/z-dviports.gif
15:16 ajax: the monitor can't tell if it's only connected with a sl-dvi cable
15:16 ajax: (well, maybe it could, but nobody does so)
15:16 ajax: which is why the first few generations of 2560x1600 monitors listed 1280x800 as the first mode
15:17 imirkin: lol
15:18 imirkin: seems like if the VGA-side can, the display side can too...
15:18 imirkin: but what do i know
15:19 ajax: the host side doesn't know if it's an sl-dvi cable either
15:19 ajax: it can know that it isn't equipped with a dl-dvi _transmitted_
15:19 ajax: but the cable itself is a mystery
15:19 ajax: transmitter, even
15:19 imirkin: oh
15:19 imirkin: interesting.
15:29 Tom^: imirkin: hm yea something isnt quite right with the DP port. Xorg.0.log seems to read its edid, but xrandr simply lists it as disconnected.
15:29 Tom^: imirkin: and at boot when nouveau loads up you can see the monitor powering on, but it simply ends up at no signal. with the DVI it clones the monitors at least.
15:31 imirkin: Tom^: find a dual-link cable :)
15:31 imirkin: Tom^: or get skeggsb to help you debug your DP woes
15:32 Tom^: but he lives in the wrong timezone
15:32 imirkin: he *loves* DP. there's no other explanation for him spending so much time on it.
15:32 Tom^: he might as well be on Mars :(. but yeah, i'll probably just fetch a dual-link DVI cable instead
15:32 imirkin: boot with nouveau.debug=debug drm.debug=0x1e
15:33 imirkin: (and with just the DP monitor attached)
15:33 Tom^: oki
15:34 imirkin: er
15:34 imirkin: nouveau.debug=debug,bios=trace
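`drm.debug` is a bitmask of DRM debug categories. Going by the `DRM_UT_*` bits in drm from that era (0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME, 0x10 ATOMIC; later kernels add more), `0x1e` enables everything except the core messages. The decoder below is a hypothetical helper for reading such masks.

```c
#include <stddef.h>
#include <string.h>

/* DRM debug categories (DRM_UT_* bits). */
static const struct { unsigned bit; const char *name; } drm_debug_bits[] = {
    { 0x01, "CORE" },
    { 0x02, "DRIVER" },
    { 0x04, "KMS" },
    { 0x08, "PRIME" },
    { 0x10, "ATOMIC" },
};

/* Writes the enabled category names for 'mask' into 'out' as "A|B|C". */
void decode_drm_debug(unsigned mask, char *out, size_t len)
{
    size_t used = 0;
    if (len == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof drm_debug_bits / sizeof drm_debug_bits[0]; i++) {
        if (!(mask & drm_debug_bits[i].bit))
            continue;
        const char *name = drm_debug_bits[i].name;
        size_t need = strlen(name) + (used ? 1 : 0);   /* name plus '|' separator */
        if (used + need + 1 > len)
            break;                                     /* no room: stop, stay terminated */
        if (used)
            out[used++] = '|';
        strcpy(out + used, name);
        used += strlen(name);
    }
}
```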
15:35 Tom^: imirkin: that's funny, with only it connected it works.
15:36 imirkin: hilarious.
15:36 Tom^: dmesg without ,bios=trace https://gist.github.com/gulafaran/abb65774f8ecfdfd5bffa03ed8a66ed0
15:36 imirkin: i'm rofl.
15:41 Tom^: ok now after uh 4 reboots? it powered on and works with both.
15:41 Tom^: something fishy is going on :P
15:42 Tom^: got dmesg for both when it was solo with trace, and when i had both plugged in and it didnt power on.
15:43 Tom^: and changing resolution on it killed it. :P
15:45 Tom^: with both plugged in and it didnt turn on. https://gist.github.com/gulafaran/accc6d5ff2e1bfa480a58291814ca8cc
15:47 Tom^: im buying dual-link tomorrow..
16:02 imirkin: that's a log from it not working?
16:03 imirkin: wtf??
16:03 imirkin: [ 5.762810] nouveau 0000:01:00.0: DRM: display: 4x270000 dpcd 0x12
16:03 imirkin: but it tries to link at 4x540 - [ 5.586727] nouveau 0000:01:00.0: disp: outp 05:0006:0f44: 4 lanes at 540000 KB/s
16:04 imirkin: skeggsb: something seems fishy
16:04 imirkin: skeggsb: and i don't see it try to retrain after it notices that the max is 270
16:05 imirkin: for some reason training is succeeding at 4x540 but it shouldn't? ugh.
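imirkin's point is that the link configuration should be clamped to what both the source and the sink (DPCD) support. Here the sink reports 4 × 270000 (HBR), and `0x12` appears to be the DPCD revision (1.2), yet training ran at 540000 (HBR2). The sketch below shows one possible selection policy: lowest rate first, then fewest lanes, that still carries the mode. It assumes nouveau's logged units, where 270000 KB/s per lane is the payload of a 2.7 Gbit/s lane after 8b/10b coding. It is not nouveau's actual code, and the names are hypothetical.

```c
/* Per-lane payload rates in the units nouveau logs (KB/s):
 * 162000 = RBR, 270000 = HBR, 540000 = HBR2. */
static const unsigned dp_rates[] = { 162000, 270000, 540000 };

struct dp_link { unsigned lanes, rate; };

/* Pick the lowest rate (then fewest lanes) allowed by BOTH source and sink
 * maxima that can carry 'pixel_khz' at 'bpp' bits per pixel; {0, 0} if none. */
struct dp_link dp_pick_link(unsigned src_max_lanes, unsigned src_max_rate,
                            unsigned sink_max_lanes, unsigned sink_max_rate,
                            unsigned pixel_khz, unsigned bpp)
{
    struct dp_link none = { 0, 0 };
    unsigned max_lanes = src_max_lanes < sink_max_lanes ? src_max_lanes : sink_max_lanes;
    unsigned max_rate = src_max_rate < sink_max_rate ? src_max_rate : sink_max_rate;
    /* required payload in KB/s: kHz * bits per pixel / 8 */
    unsigned long long need = (unsigned long long)pixel_khz * bpp / 8;

    for (unsigned i = 0; i < sizeof dp_rates / sizeof dp_rates[0]; i++) {
        if (dp_rates[i] > max_rate)
            break;                                   /* never exceed the sink's max */
        for (unsigned lanes = 1; lanes <= max_lanes; lanes *= 2) {  /* 1, 2, 4 */
            if ((unsigned long long)lanes * dp_rates[i] >= need) {
                struct dp_link l = { lanes, dp_rates[i] };
                return l;
            }
        }
    }
    return none;
}
```

With a 4 × 540000 source and a 4 × 270000 sink, a 325080 kHz mode at 24 bpp needs 975240 KB/s and lands on 4 × 270000, never 540000.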
16:11 waltercool: Nouveau for Maxwell is OpenGL3, right?
16:11 waltercool: maxwell2
16:12 ajax: yes. fermi and kepler are 4.3, maxwell is still only 3.3 atm
16:13 ajax: technically would be 4.1 if tessellation shaders were implemented
16:13 ajax: afaict
16:14 waltercool: Oh, is related with the signed blob?
16:14 ajax: no idea.
16:15 waltercool: oh :( Thanks for the info
16:29 kloofy: but for the calculations, where around 2000 threads process an ALU op, it completes in 4 cycles; almost precise data, based on AMD Radeon
16:30 kloofy: but... when the same number of threads traverse the L1 in parallel, that is satisfied in 1 cycle; i wonder, can it really be true that L1 is so fast?
16:33 kloofy: so let's do a calc: 2000 x 4 = 8000 bytes (about 8 KB), since a word has 4 bytes, but it can do something like 64 KB of traversal in 1 cycle; why is mov so slow compared to this? this can't be accurate information?
16:35 imirkin_: ajax: fyi, in mesa-git it's 4.1 for maxwell
16:36 imirkin_: and it's related to us not computing sched information properly in shaders, which starts to REALLY matter once you start playing with memory
16:49 kloofy: probably it is correct information then; seems like it has 12 KB per SM, the thread count is a bit larger, and mov is also 1 cycle
17:56 kloofy: on a GPU that is quite understandable: everything is virtually contiguous, so the cache will be accessed fast, but for a CPU the data is just amusing; the cache access pattern due to context switches is absolutely random
18:00 karolherbst: gnurou: nope, with current master on 4.7 I still have the same issue :/
18:19 kloofy: actually yeah, ALU being 1 cycle and an L1 cache miss 8 cycles on a CPU could make sense, with a hit also serviced in 1 cycle
18:32 kloofy: https://devtalk.nvidia.com/default/topic/517591/kepler-global-memory-latency-what-is-it-/
19:58 kloofy: http://www.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=onur-447-spring15-lecture18-caches-afterlecture.pdf this is the basic theory
20:00 kloofy: the thing is that a fully associative cache does tag lookups across the whole tag array; it is pretty weird, because then basically the cache access can have variable latency, depending on how soon the hit in that particular level is found
20:00 kloofy: so i don't understand how could one measure the exact hit latency
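On the variable-latency worry: in hardware, the tags of every way in a set (or of every line, for a fully associative cache) are compared simultaneously, with one comparator per entry. A hit therefore costs the same regardless of which entry matches. The sequential loop below only models that parallel compare in software. The geometry (64-byte lines, 64 sets, 4 ways) and round-robin replacement are hypothetical choices; a fully associative cache is the special case of a single set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: 64-byte lines, 64 sets, 4 ways (16 KiB). */
#define LINE_BITS 6
#define SET_BITS  6
#define NUM_SETS  (1u << SET_BITS)
#define NUM_WAYS  4

struct cache {
    bool     valid[NUM_SETS][NUM_WAYS];
    uint64_t tag[NUM_SETS][NUM_WAYS];
    unsigned next_victim[NUM_SETS];   /* simple round-robin replacement */
};

void cache_init(struct cache *c) { memset(c, 0, sizeof *c); }

/* Returns true on a hit; on a miss the line is filled into its set.
 * Address split: | tag | set index | line offset |. */
bool cache_access(struct cache *c, uint64_t addr)
{
    uint64_t set = (addr >> LINE_BITS) & (NUM_SETS - 1);
    uint64_t tag = addr >> (LINE_BITS + SET_BITS);

    /* Hardware checks all ways of the set at once; this loop only models it. */
    for (unsigned w = 0; w < NUM_WAYS; w++)
        if (c->valid[set][w] && c->tag[set][w] == tag)
            return true;

    unsigned v = c->next_victim[set];
    c->valid[set][v] = true;
    c->tag[set][v] = tag;
    c->next_victim[set] = (v + 1) % NUM_WAYS;
    return false;
}
```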
20:22 kloofy: it's possible to optimize the cache, but a lot of the data on the web does not make sense, since some people don't understand how it currently works; the last numbers from the NVIDIA site could have been almost ok, as seen it is in ns, not cycles
20:23 kloofy: cycles not ns.. hmm, this does not look ok to me
20:26 kloofy: https://devtalk.nvidia.com/default/topic/496975/cuda-programming-and-performance/fermi-l2-cache-how-fast-is-the-l2-cache-how-do-i-access-it-/
20:27 kloofy: oh yeah, that is consistent with Google Books; it could be roughly right, but it is some sort of average
20:28 kloofy: http://www.hsafoundation.com/html/Content/SysArch/Topics/02_Details/cache_entry.htm
20:28 kloofy: that one makes the most sense, that there is maximum l1 hit latency
20:40 kloofy: wall of text again, i am afraid i should not bother with this cache myself; i'd use what the FPGA vendors provide in hard IP cores, and just use larger register files
20:59 karolherbst: does anybody mind if I sort the kernel module parameters for nouveua?
20:59 karolherbst: *nouveau
21:00 karolherbst: ohh they are already sorted except nvAGP
21:02 karolherbst: I've added the NvBoost parameter to the page
21:11 karolherbst: skeggsb: by the way, how many of the Kepler cards you have access to won't reclock even with my patches?
21:13 karolherbst: skeggsb: if possible, could you bring one of them to XDC and give it to mupuf, so I can take a look through reator
21:13 karolherbst: one issue I am kind of aware of is that some really high-clocked GPUs (above 1.1GHz or something) also cause some trouble