01:33g5ntoo: hi, i'm using a NV43 [GeForce 6600 LE] in a powermac G5 (ppc64, big endian). it's not really working all that well :)
01:33Fervi: glxinfo :)
01:34Fervi: and send to pastebin
01:34g5ntoo: wait a moment
01:38Fervi: ok; you have nouveau :D
01:38Fervi: please paste /var/log/Xorg.0.log
01:38Fervi: and what do you mean "it's not really working all that well"?
01:39g5ntoo: when i start glxgears, i can only see the first frame
01:39Fervi: yhm, ok
01:40Fervi: but still give Xorg.0.log file :P
01:40g5ntoo: coming :)
01:42imirkin: g5ntoo: yeah, the BE experience isn't so great
01:42imirkin: g5ntoo: but hey - at least all the colors aren't reversed now ;)
01:42Fervi: it's something :P
01:42gruetzkopf: https://paste.ccc.ac/?9d679fee9441c466#ISPpUpai+pRZJwmdBa0JqoOwzX6xAtCq3WewlKukOB8= <xorg log
01:42g5ntoo: that's true, the colors looked correct
01:43g5ntoo: ^ gruetzkopf is sitting next to me
01:43imirkin: g5ntoo: what's the specific issue?
01:43Fervi: glx show only first frame
01:43Fervi: or just glxgear
01:43imirkin: hmmmm glxgears should work
01:43imirkin: oh wait, is this a NV46?
01:44imirkin: hm, nope. NV43.
01:44gruetzkopf: is NV46 extra dead?
01:44g5ntoo: input doesn't seem to work, but that might have a different reason
01:44imirkin: no, but it has messed up MSI
01:44imirkin: which we for a long time failed to account for properly
01:45imirkin: are you seeing nouveau errors galore in dmesg by any chance?
01:46Fervi: imirkin - does the GeForce 6600 LE support DRI3?
01:46g5ntoo: nope, no nouveau messages past the initial configuration in dmesg
01:46imirkin: Fervi: not sure, but xorg 1.17.4 almost certainly doesn't
01:47Fervi: g5ntoo, can you upgrade your kernel, xorg etc?
01:47imirkin:wouldn't advise that
01:47imirkin: Fervi: the DRI3 thing is a bug that was introduced in xorg 1.19
01:47gruetzkopf: that'd take a while
01:47g5ntoo: https://ipfs.io/ipfs/QmRpsB15o31bqDRLco1yvjA4bm91vgDuSNUaBo7QYevfT9 <-- lspci btw, in case someone cares
01:48imirkin: can you 'watch cat /proc/interrupts' and see whether the gpu's intr counter is increasing quickly
01:49imirkin: [should be 60/s for vblank, but it should be a lot faster with 3d going on i think]
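The check imirkin asks for can also be scripted. Below is a minimal Python sketch. It assumes the usual /proc/interrupts layout: a header row of CPU columns, then one row per IRQ with per-CPU counts followed by the controller and handler name. It also assumes nouveau's handler is listed as `nvkm`, the name g5ntoo reports below. `irq_counts` and `irq_rate` are hypothetical helpers, not part of any existing tool.

```python
def irq_counts(text, name="nvkm"):
    """Total interrupt count for each IRQ row of /proc/interrupts that mentions `name`."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        if name not in line:
            continue
        fields = line.split()
        irq = fields[0].rstrip(":")
        # the next `ncpu` fields are the per-CPU counters; sum them
        counts[irq] = sum(int(f) for f in fields[1:1 + ncpu] if f.isdigit())
    return counts


def irq_rate(before, after, interval_s):
    """Interrupts per second for each IRQ between two irq_counts() snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / interval_s for irq in after}
```

To use it, read /proc/interrupts twice about a second apart (for example with `time.sleep(1)` between the reads) and pass both parsed results to `irq_rate`. A counter that stays at 1 means the GPU is not raising interrupts at all.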
01:50gruetzkopf: thats staying at 1
01:50gruetzkopf: exactly one
01:50g5ntoo: also, at 1680x1050 pixels, which is automatically configured because it's the screen's size, the screen doesn't turn on; only at smaller resolutions
01:51imirkin: can i see dmesg?
01:51gruetzkopf: not incrementing in any way
01:51imirkin: the "init messages" might contain a gpu hang in them :)
01:52RSpliet: mmm... where to find a 64-bit GPU bound benchmark
01:52g5ntoo: https://ipfs.io/ipfs/QmaQrbqrGDyKak9vyBpFaGk5Ccx6y2WhRz1RVc6iRo7g7u <-- dmesg
01:58imirkin: two separate ideas -
01:58imirkin: (a) you don't have some of my fixes/workarounds from "recent" mesa versions. although i think that 11.0 (your mesa version) did have them. will check
01:59imirkin: (b) i checked on an agp gpu where the GART is effectively disabled due to the G5 AGP GART being ... broken. you have a PCIe model. can you try booting with nouveau.config=NvPCIE=0 which should disable the PCIE gart.
02:00gruetzkopf: we'll try b first ;)
02:00g5ntoo: imirkin: is that part of the kernel command line?
02:01g5ntoo: i mean: should i put it on the kernel command line?
02:01imirkin: ok, looks like those changes actually went into mesa 11.1, however i did tag them for stable, so your 11.0.x version should have them.
02:01imirkin: g5ntoo: yes.
02:01g5ntoo: good, will try
02:09imirkin: gruetzkopf: https://bugs.freedesktop.org/show_bug.cgi?id=98630 - at least someone else has gotten a lot further with a nv4x on a g5.
02:23g5ntoo: it took a little while to reboot
02:26imirkin: did it help?
02:29g5ntoo: not in any obvious way
02:33g5ntoo: nvkm isn't firing any interrupts :(
02:33g5ntoo: well, one
02:33imirkin: that's surprising.
02:33imirkin: and things are generally working well?
02:34imirkin: other than the fact that glxgears doesn't make progress?
02:34gruetzkopf: vty console works fine, cmatrix in a xterm in x works fine
02:35imirkin: does glxgears print stuff like "xyz fps" every 5 seconds? or does it never print anything?
02:36g5ntoo: which seems to make some sense. if you render stuff on the gpu, it probably has to finish asynchronously, but if you just push a rendered image, the gpu doesn't need to fire an interrupt
02:36imirkin: the DDX performs accel
02:36imirkin: and the fbcon is accelerated as well
02:36imirkin: i think all that stuff needs interrupts to operate properly
02:37g5ntoo: no, it renders one frame and then it hangs (glxgears)
02:38imirkin: ok, so i think it's something in the DRI2 stuff where it's getting blocked up
02:38imirkin: and can you confirm that you're using xorg 1.17.x?
02:39imirkin: yeah, i don't remember that having any DRI2 issues.
02:39imirkin: in fact i dunno that it had dri3 at all. probably... i think that was the first release with it
02:40imirkin: oh, but dri3 won't work on nv4x coz of the whole sharing thing, i think. so nevermind.
02:41g5ntoo: i'll try turning off the openfirmware framebuffer driver (not sure if it helps)
02:43imirkin: nah, that shouldn't matter
02:43imirkin: nouveau takes over from offb
02:47g5ntoo: we have a kernel 4.4 here, maybe we should try a newer one
02:47imirkin: shouldn't matter iirc
03:58g5ntoo: imirkin: hmm, currently mesa 11.0.6 is installed. should i upgrade to 13.0.2 and see what i get?
03:59imirkin: wouldn't hurt, but i doubt it'll fix things. i think 11.0.x has the fixes.
05:04mrrhq: I hate to ask, but will there be full Maxwell/Pascal support?
05:05imirkin: define 'full'
05:25mrrhq: Full 2D/3D support. I wish all the laptops sold today were Kepler ones, but they're still nearly just as expensive as ones with the GTX 1070.
05:25mrrhq: It's getting harder and harder to find GTX 780 Ti's around for a decent price.
05:26mrrhq: First-world problems. :^)
05:27imirkin: well, among other things, we don't support video encoding accel on kepler
05:27imirkin: and don't support e.g. opencl or cuda on any generations
05:27imirkin: so "full" can have a lot of definitions :)
05:28imirkin: either way, i suspect that the GM20x support will be in place in a few years
05:28imirkin: by the time it's totally and utterly obsolete
05:31imirkin: GM20x enablement is pretty much at the mercy of nvidia, who has no incentive to release the various firmware
05:31imirkin: it'd be nice if someone spent time working on a firmware extractor (unfortunately the firmware has become much harder to get at as well)
05:32imirkin: but that would still not be redistributable even if we could get it
05:51imirkin: skeggsb: what does "protect against concurrent access to semaphore buffers" fix? (i.e. what kinds of errors would that produce)
05:52skeggsb: list corruption coming from nv84_fence_context_new/del()
05:52imirkin: which would usually show up as a BUG/WARN? or something else?
05:52skeggsb: nouveau_bo_vma_add/del() are the functions touching the list
05:53skeggsb: if kernel list debugging is on, it'll show as a warn
05:53imirkin: k. don't think i've ever seen such a thing reported. although kernel list debugging is probably not a very popular feature.
05:55skeggsb: i was triggering it randomly, and rarely
05:55skeggsb: don't know 100% for sure it's fixed, but, it hasn't happened again :P
05:57imirkin: well, certainly that particular race seems fixed
05:57imirkin: could be others lurking of course
05:57skeggsb: no doubt
05:57skeggsb: i'm squashing them as i find them
05:58imirkin: i'm really happy you're spending some time on that
05:58imirkin: i think there's a lot of stuff like that which happens rarely, but combines to general instability
05:58skeggsb: running full piglit runs pretty much the entire day will show those up quite well :P
05:59skeggsb: i tell you what though.. the 3d driver is held together with some magic..
05:59imirkin: that's why i don't run piglit - tends to kill my boxes :)
05:59skeggsb: so damn fragile
06:03imirkin: some of it is "it was like that when i got there", but i'm sure a bunch of it is my bad too...
06:03skeggsb: i think a lot of it is "just make it work, it's too much effort to undo the bad"
06:04imirkin: definitely a thought i've had in the past :)
06:04skeggsb: meh, it's understandable :P
06:05imirkin: btw - you probably haven't gotten to it yet - but there are basically no ssbo/compute/etc tests in piglit
06:05imirkin: however dEQP has a ton of good ones for gles3.1
06:06imirkin: [and i assume GL-CTS covers it too if you have access]
06:07imirkin: i deleted my 'locking' branch btw, since it sounds like *distros* were looking to carry those patches, which is basically the worst idea ever. let me know if you need to see those patches for whatever reason.
06:08imirkin: although i assume you're well past them by now
06:08skeggsb: well, honestly, i'm still on the "untangling some mess" stage, i'm hoping i'll be able to reduce the required locking to the minimal amount
06:08imirkin: sounds good
06:09skeggsb: i'd be far further ahead if i didn't keep hitting weird unrelated issues :P
06:09imirkin: the *biggest* thing to avoid is ... lock; sleep; unlock;
06:09skeggsb: but, it's kinda good too, because this shit needs fixing
06:09imirkin: which can happen esp with fence waits, etc
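imirkin's "lock; sleep; unlock" warning is a general concurrency rule rather than anything nouveau-specific. The following is a hedged illustration in Python, not kernel code; `Fence` and the two `submit_and_wait_*` functions are hypothetical. The bad version holds a shared lock across a fence wait. The good version does its bookkeeping under the lock and sleeps without holding it.

```python
import threading


class Fence:
    """Toy fence: wait() blocks until signal() has been called."""

    def __init__(self):
        self._cond = threading.Condition()
        self._signalled = False

    def signal(self):
        with self._cond:
            self._signalled = True
            self._cond.notify_all()

    def wait(self, timeout=None):
        # Condition.wait_for() releases the condition's own lock while sleeping
        # and returns the predicate's value (False if the timeout expired).
        with self._cond:
            return self._cond.wait_for(lambda: self._signalled, timeout)


def submit_and_wait_bad(state_lock, fence):
    # Anti-pattern: lock; sleep; unlock. Every other thread needing state_lock
    # (possibly including the one that would signal the fence) stalls meanwhile.
    with state_lock:
        return fence.wait()


def submit_and_wait_good(state_lock, fence, state):
    # Update shared state under the lock, then sleep without holding it.
    with state_lock:
        state["in_flight"] += 1
    done = fence.wait()
    with state_lock:
        state["in_flight"] -= 1
    return done
```

In kernel terms, the same idea means dropping the mutex before a fence wait and re-taking it afterwards. Any state that may have changed in between must then be re-validated.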
06:17mrrhq: imirkin: Btw, video support on an external GPU is pretty much negligible to me; in fact, I bet even something like the HD 620 could support 4K video decoding/rendering easily. Besides, MPV can just use OpenGL instead of VAAPI or VDPAU.
06:18mrrhq: Not that I would know how to use OpenGL extensions or shaders to make videos run smoother or look better.
06:18mrrhq: It's not my forte.
06:18imirkin: mrrhq: opengl is used for presentation, not the actual decoding.
06:18imirkin: mrrhq: and while it may not be important to you, it's part of "full support"
06:18imirkin: i also mentioned video *encoding*, not decoding
06:18mrrhq: But could I still use a modern Intel/AMD CPU to decode video?
06:19imirkin: decoding with dedicated hw will always be way more efficient
06:19mrrhq: Well, by encoding do you mean changing from one video format to another?
06:19RAOF: And also on the video; even modern CPUs might struggle with 4K h265.
06:19RAOF: (On the low-end CPUs)
06:20imirkin: i mean taking any video stream and encoding it
06:20mrrhq: I should really look at Kaby Lake reviews and see if that's true. I still use 1080p for many good reasons. :p
06:20imirkin: e.g. what's being rendered to your screen. or a pre-existing video. or whatever.
06:21mrrhq: Intel Iris is still a lot better than Intel HD as well.
06:21imirkin: but again - diff people have diff bits that are important to them
06:21mrrhq: imirkin: Oh okay, like streaming or recording video?
06:22imirkin: for example.
06:22imirkin: anyways, i gtg.
06:22RAOF: Steam streaming, for example, will use hardware acceleration where available.
06:23mrrhq: Yeah even Firefox uses hardware acceleration, but does it only include video? I wonder.
06:23RAOF: No, I mean Steam streaming will use the hardware to encode the video stream sent to the remote box.
06:24mrrhq: I'm one of those oddballs that prefers to use ffmpeg or OBS on Linux for streaming.
06:24RAOF: That doesn't get you playing your Steam games on your remote box :)
06:24mrrhq: Oh you mean something like Chromecast?
06:25mrrhq: Yeah I don't know all of what Steam can do.
06:25RAOF: No, I mean http://store.steampowered.com/streaming/
06:26mrrhq: I've heard of the Steam Link though.
06:26mrrhq: It's practically a thin-client machine.
08:20mrrhq: Does NVIDIA G-Sync work with Nouveau?
08:21karolherbst_work: no, due to the lack of hardware afaik
08:22Calinou: it won't work, yeah
08:32mrrhq: Thanks guys. Great to know. I've been thinking of getting a customized laptop from Eurocom, and one of the options asked me if I wanted G-Sync.
09:28Calinou: >gaming laptop
10:21mwk: a laptop with G-Sync? what?
10:24mwk: so they made a new thing and called it G-Sync, ignoring the previous thing called G-Sync. nevermind.
10:48mooch: hey mwk
10:54mwk: hi there
10:54mooch: mwk: i just merged my nv3 and nv4 emulations into one source code file
10:56mwk: seems we're doing the same thing lately :p
10:57mooch: eh, i got the idea from you lol
10:58mooch: i'm not going to throw in nv1 into that mess tho, oh HELL no
10:58mwk: yeah, probably a good idea
11:34mooch2: mwk, what does palt do on nv3?
11:41RSpliet: karolherbst_work: Do you have a GPU-bound benchmark that I can use?
11:41RSpliet: preferably 64-bit - steam doesn't like my (32-bit) build of mesa
11:42mwk: mooch2: it's an interface to an external bus on the physical NV3 chip
11:43mwk: a card maker could use it to connect some kind of extra chip to the nv3
11:43mwk: PALT is the interface to whatever is connected there
11:43mwk: that thing is only present on NV1 and NV3, and it's unused on all my cards
11:44mooch2: well then
11:44mwk: FWIW the external bus is a simple ISA-ish parallel bus, exactly identical to the one used for the flash ROM
11:45mooch2: i don't know how the flash rom writing even works, so
11:45mwk: writing is kind of complicated, but not very complicated
11:46mwk: but reading is simple and important :)
11:47mwk: PROM and PALT work basically the same way, just forward the accesses to whatever's on the bus
11:47mooch2: wow, really? so you just write to PROM and it writes to the actual rom???
11:48mwk: it's not that simple
11:48mwk: there's protection against accidental writes, you have to write some magic sequence first
11:48mwk: and you have to write in block-sized units
11:48mwk: but - yes, flashing the ROM is done by writing the data to PROM :)
11:49mooch2: also, my riva tnt doesn't work on some BIOSes because the vbios is 64k
11:49mwk: what's the problem with that?
11:50mooch2: battler said it's because some bioses expect the vbios to be 32k or under
11:50mooch2: which doesn't make sense to me
11:50mwk: maybe some very old ones...
11:50mooch2: i was having this issue on a pentium board
11:50mooch2: the riva 128 worked fine, but plug in the riva tnt, and it just refused to boot!
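Some context on the 64k vs 32k question: a legacy PC option ROM starts with the bytes 0x55 0xAA, and byte 2 gives the image length in 512-byte blocks. A 64 KiB image therefore declares 0x80 there. Whether a given board BIOS actually refuses images above 32 KiB is battler's hypothesis and is not checked here. The sketch only reads the declared size, and the function names are hypothetical.

```python
def option_rom_size(rom: bytes) -> int:
    """Image size in bytes declared by a legacy option ROM header."""
    if len(rom) < 3 or rom[0] != 0x55 or rom[1] != 0xAA:
        raise ValueError("no 0x55 0xAA option ROM signature")
    return rom[2] * 512  # byte 2: image length in 512-byte blocks


def fits_in(rom: bytes, limit: int = 32 * 1024) -> bool:
    """True if the declared image size is within `limit` bytes."""
    return option_rom_size(rom) <= limit
```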
11:51mwk: oh, so you have the hardware now?
11:51mooch2: i'm talking about on the emulator
11:51mooch2: we emulate specific boards
11:51mooch2: for dat higher accuracy :)
11:52mooch2: oddly enough, bochs's bios doesn't work here
11:52mooch2: it has ALL KINDS of issues
12:04karolherbst_work: RSpliet: pixmark piano
12:04karolherbst_work: RSpliet: but best are those which aren't
12:04karolherbst_work: RSpliet: so that you see actually lower clocks used
12:05karolherbst_work: RSpliet: and if you really want to test it, branch: better_dr
12:29RSpliet: karolherbst_work: sure pixmark piano isn't CPU bound?
12:31RSpliet: I did this "optimise for minimal register file bank conflicts" thing following the hint from a research paper I stumbled across last night, and observed no difference at all in pixmark piano perf
12:31karolherbst_work: it does consume 100% CPU though
12:31karolherbst_work: I see
12:31RSpliet: and 1.5fps increase in xonotic (on ~240)
12:31RSpliet: so not a lot :-D
12:31RSpliet: now the pixmark piano shader is a 4000-line dragon, not sure how representative that is for real games(tm)
12:31karolherbst_work: pixmark piano reacts quite strongly to changes in the shader binaries
12:32karolherbst_work: much more than anything else
12:32RSpliet: not the RA change that should (and shows to) reduce conflicts :-D
12:32mupuf_: RSpliet: no changes should not be a reason not to merge something :D
12:32mupuf_: There may be another bottleneck right now
12:33mupuf_: although, in this case, it does not seem like it
12:33RSpliet: mupuf_: sure, but it's not trivial code :-)
12:33mupuf_: ah, I see
12:33RSpliet: (and needs a bit of cleaning up, it hard-codes four banks currently, probably unnecessary for NV50 at least)
12:33karolherbst_work: RSpliet: I wouldn't bet on it
12:34RSpliet: karolherbst_work: bet on what?
12:34karolherbst_work: RSpliet: did you verify that the shader changes according to your expectations?
12:35RSpliet: of course I did!
12:35RSpliet: I have a diff at home
12:35RSpliet: it's a little naive in the sense that it doesn't consider dual-issue - so a reduction in conflicts inside individual instructions does not necessarily mean reduction of conflicts per cycle
12:36RSpliet: and conservative in that it tries hard not to increase total GPR usage... that can maybe be relaxed a bit
12:38karolherbst_work: if you hurt dual issuing you hurt perf by a lot
12:38RSpliet: I don't hurt dual-issue
12:38karolherbst_work: you can't ignore it
12:38karolherbst_work: ahh okay
12:38karolherbst_work: so what is the goal of that RA pass?
12:39karolherbst_work: oh I think I got it
12:39RSpliet: paper claims that registers are in four (groups of) banks, interleaved. Highest throughput is achieved when source regs for instructions are spread over these banks
12:40RSpliet: so trying to avoid allocations like add r0 r0 r4 - instead use for the second reg something whose modulo four is not zero
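RSpliet's bank heuristic can be made concrete with a small cost function. This sketch follows the paper's claim of four interleaved banks selected by register index modulo 4. That bank count, and the rule that a repeated source register is read only once, are assumptions rather than documented NVIDIA behaviour. This is not the code in nouveau's register allocator.

```python
NUM_BANKS = 4  # assumption taken from the paper RSpliet cites


def bank(reg: int, num_banks: int = NUM_BANKS) -> int:
    """Bank of a GPR under simple modulo interleaving (r0, r4, r8, ... share bank 0)."""
    return reg % num_banks


def bank_conflicts(srcs, num_banks: int = NUM_BANKS) -> int:
    """Extra register-file reads an instruction pays because sources share a bank.

    Assumes a repeated source (e.g. r0 used twice) is only read once.
    """
    per_bank = {}
    for reg in set(srcs):
        b = bank(reg, num_banks)
        per_bank[b] = per_bank.get(b, 0) + 1
    return sum(n - 1 for n in per_bank.values())
```

Under these assumptions, `add r0 r0 r4` has sources r0 and r4, both in bank 0, so it costs one conflict. Swapping r4 for r5 removes it.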
12:40karolherbst_work: wait a second
12:42karolherbst_work: ohh crap, I removed the paste :/ I have the nvidia generated binary of pixmark piano
12:42karolherbst_work: but only at home
12:43RSpliet: on a best-effort basis. Live ranges give a (very) conservative estimate of GPR usage (assumes the size of a reg is always 1), and if the "preferred bank reg" is beyond that boundary, it falls back to the old first-fit allocation strategy
12:43karolherbst_work: did your pass change the instruction count?
12:43RSpliet: to avoid expanding GPR. Could probably take the upper bound GPR for max warps in flight instead (which is what, 32?)
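The fallback RSpliet describes could look roughly like the sketch below: prefer a register in an unused bank, never go past the conservative GPR bound, and otherwise revert to first-fit. `pick_register` and its parameters are hypothetical names for illustration. The sketch also ignores multi-register (double/quad) values, which the real allocator constrains separately.

```python
def pick_register(free, other_src_banks, gpr_bound, num_banks=4):
    """Choose a GPR, preferring one whose bank no other source uses.

    free            -- non-empty list of free register indices, sorted ascending
    other_src_banks -- set of banks already used by the instruction's other sources
    gpr_bound       -- conservative GPR-usage estimate; never grow past it
    """
    for reg in free:
        if reg >= gpr_bound:
            break  # preferred reg would expand GPR usage: drop the bank preference
        if reg % num_banks not in other_src_banks:
            return reg
    return free[0]  # old behaviour: first-fit
```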
12:43RSpliet: no it didn't
12:43RSpliet: just RA
12:44karolherbst_work: well sure, but you have still layouting for tex
12:44karolherbst_work: and stuff like that
12:44RSpliet: those kind of special restrictions trump the regular allocation anyway, unaffected
12:44karolherbst_work: like if you add a ton of movs to fit those quad/triple/double regs, the performance impact _is_ noticeable in pixmark
12:44RSpliet: again, there's no difference in instruction count
12:45RSpliet: I'll show you the diff tonight if you're interested
12:45karolherbst_work: but maybe the difference in perf is too small with less parallelism
12:45RSpliet: but in the meanwhile, I seek a benchmark that is representative for a modern game. openarena and xonotic are too old, pixmark is too static
12:46RSpliet: I expect a bigger impact for shaders with high reg count, as they can't hide the register access penalties as well in the pipeline
12:46karolherbst_work: test my traces then :p
12:46RSpliet: karolherbst_work: I'd be happy to
12:47RSpliet: are they public? can I replay them easily? 64-bit?
12:54karolherbst_work: RSpliet: https://drive.google.com/drive/folders/0B78S7GSrzebIemFQZlJyaExySlU?usp=sharing
12:55karolherbst_work: some are CPU bound though
12:55karolherbst_work: they are mainly for bug testing
12:55karolherbst_work: except the saints one
12:55karolherbst_work: and civ5
12:55karolherbst_work: and antichamber
12:55karolherbst_work: the only dangerous one is the metro trace
13:11mupuf_: RSpliet: valley may help, but it still is heavily memory-bw limited
13:12mupuf_: the gfxbench 4 suite's ALU tests may find some improvements
13:12mupuf_: you can also try Manhattan (3.0 and 3.1)
13:13karolherbst_work: mupuf_: do you think you might find some time this weekend?
13:13mupuf_: karolherbst_work: for what? the fan issue?
13:14mupuf_: the only time I have free is sunday evening
13:15karolherbst_work: no, discussing the groundwork for the dynamic reclocking things aka how to not fire an interrupt all the time. I think I got it working now, but I have one issue I don't find a simple solution for
13:15mupuf_: karolherbst_work: oh, that should fit my morning commute
13:16karolherbst_work: I can explain the issue really easily though
13:16mupuf_: go for it then
13:16karolherbst_work: whenever I request a reclock on the PMU, I save the read out values
13:16karolherbst_work: when I don't get an ack, I only send a new request on more extreme values
13:16mupuf_:will think about it after his Finnish exam
13:16karolherbst_work: so, this works in general
13:17karolherbst_work: one issue is though when this happens:
13:17karolherbst_work: gr: 80%, vid: 10% -> request reclock -> no ACK -> gr: 10%, vid: 80% -> request reclock -> no ACK -> gr: 80%, vid: 10% ....
13:17karolherbst_work: and so on
13:17karolherbst_work: ohh, there are max/min instructions right?
13:18karolherbst_work: nvm then :D
13:18karolherbst_work: solution: if the PMU requests a reclock to higher ones, it only overwrites the old values with higher values
13:18karolherbst_work: this should fix this
13:18mupuf_: you should treat each domain independently
13:19karolherbst_work: didn't want messy code though on the PMU
13:19karolherbst_work: but I could mark the reclock direction
13:19mupuf_: yeah, I can get this :D
13:19karolherbst_work: and adjust all domains with min/max
13:19karolherbst_work: so I only save either _just_ higher values or _just_ lower values
13:19mupuf_: sounds reasonable
13:19karolherbst_work: k :)
13:19mupuf_: or... you let the kernel tell you the range it cares about
13:19karolherbst_work: that requires too many answers
13:20karolherbst_work: there are other issues anyway like temperature
13:20karolherbst_work: things change on the host
13:20karolherbst_work: so I don't want to depend on anything except a silly ACK on the PMU side
13:21karolherbst_work: ACK basically means: set your saved values to the target loads and go from there
13:21karolherbst_work: so if a request was at 90% and the target load is 70%, the saved value is 90% on request, and if the PMU gets the ACK, it resets it to 70%
13:22karolherbst_work: and hopefully the load stays around 70% + margin, so no requests are sent anymore
13:22karolherbst_work: but these are details now :D
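Here is a hedged sketch of the PMU-side request logic karolherbst describes. It saves the loads sent with each request and only sends again when a new sample is more extreme in the same direction. Values are merged per domain with max (upclock) or min (downclock), so alternating gr/vid spikes stop producing requests, and the saved values reset to the target loads on ACK. Python stands in for the PMU firmware here. The class name, the margin parameter, and the rule that "any domain above target+margin means up, all domains below target-margin means down" are assumptions.

```python
UP, DOWN = 1, -1


class ReclockRequester:
    """Decides when the PMU should (re)send a reclock request to the host."""

    def __init__(self, target, margin):
        self.target = dict(target)  # domain -> target load in percent
        self.margin = margin        # tolerated deviation before asking for a reclock
        self.saved = None           # loads reported with the last un-ACKed request
        self.direction = None       # UP or DOWN for that request

    def _direction(self, loads):
        if any(loads[d] > self.target[d] + self.margin for d in loads):
            return UP
        if all(loads[d] < self.target[d] - self.margin for d in loads):
            return DOWN
        return None

    def sample(self, loads):
        """Feed one set of per-domain loads; return True if a request should be sent."""
        direction = self._direction(loads)
        if direction is None:
            return False
        if self.saved is None or direction != self.direction:
            self.saved, self.direction = dict(loads), direction
            return True
        # Same direction as the pending request: only keep more extreme values,
        # per domain, so alternating gr/vid spikes cannot retrigger forever.
        pick = max if direction == UP else min
        merged = {d: pick(self.saved[d], loads[d]) for d in loads}
        if merged == self.saved:
            return False
        self.saved = merged
        return True

    def ack(self):
        """Host handled the request: pretend we sit at the target loads again."""
        self.saved = dict(self.target)
```

With a target of 70% and a margin of 5, the alternating pattern from the log (gr 80/vid 10, then gr 10/vid 80, repeated) sends two requests. After that it stays quiet until the host ACKs.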
13:23karolherbst_work: mupuf_: by the way, do you know some theories about how to implement really nice algorithms? we could be as dumb as nvidia, but I plan to do something better than they do :p
13:24mupuf_: karolherbst_work: PIDs are often used for this sort of work
13:24mupuf_: but ... not sure if the raw signal will not be too noisy for it
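For reference, a textbook PID controller applied to GPU load might look like the sketch below. The gains and the target are hypothetical choices. So is the exponential moving average used to tame a noisy load signal, which addresses mupuf's concern. This is not a description of what nouveau or NVIDIA implement.

```python
class PIDGovernor:
    """Toy PID controller nudging a clock so that measured load tracks a target."""

    def __init__(self, kp, ki, kd, target, alpha=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target      # desired load in percent
        self.alpha = alpha        # EMA smoothing of the load signal (1.0 = no smoothing)
        self.filtered = None
        self.integral = 0.0
        self.prev_error = None

    def update(self, load, dt):
        """Return a clock adjustment (arbitrary units) for one load sample taken dt seconds apart."""
        if self.filtered is None:
            self.filtered = load
        else:
            self.filtered = self.alpha * load + (1 - self.alpha) * self.filtered
        error = self.filtered - self.target  # positive: busier than wanted, raise clock
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```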
13:33karolherbst_work: uhh well sure, that makes sense on the CPU side
13:35karolherbst_work: mupuf_: well ideally we would want to have the timer set really fast, maybe to display freq *4 or so
13:35karolherbst_work: and just depend on the smart algorithm to not interrupt the host the entire time
13:36mupuf_:may be tempted to have a fast upclock and slow downclock path
13:36mupuf_: like nvidia
13:36karolherbst_work: this is easy to do
13:36mupuf_: fancy stuff for lowering, reflex actions for increasing the perf
13:37karolherbst_work: nvidia isn't fancy with lowering
13:37karolherbst_work: it just waits :D
13:37mupuf_: nope, it has a threshold and a wait :D
13:37karolherbst_work: … :D
13:38karolherbst_work: anyway, we can do better
13:38mupuf_: I guess the learning should be on how quickly to clock down the gpu
13:38karolherbst_work: I like the idea about having a fast timer on the PMU
13:38mupuf_: it should depend on how often we switch from high to a lower
13:38karolherbst_work: 1ms even worked here, but the host got upset at some point
13:39karolherbst_work: mupuf_: is there any reason to do anything smarter than "if not 20 seconds below these thresholds, don't request a down clock"?
13:42mupuf_: 1ms could be a good reflex action ... but we will expose ourselves to un-necessary upclocks when ... scrolling a page in the browser?
13:42mupuf_: 20s :D
13:42mupuf_: if you have a 1ms upclock timer, you can definitely allow yourself to downclock in 1s
13:42karolherbst_work: if it works well, we could just remove that 20 second pause
13:43RSpliet: karolherbst_work: thanks for the traces. I'll look into them tonight or tomorrow
13:43mupuf_: but yeah, 5s could be a good threshold for a first implementation
13:43karolherbst_work: I would like to set the timer to target framerate * 4 to be sure
13:44karolherbst_work: so we can reclock 4 times within a frame
13:44mupuf_: well, we don't want that
13:44karolherbst_work: which is still annoying for spikey work loads
13:44karolherbst_work: messy are things where you have like 10 seconds nothing really and then a short spikey thing
13:44mupuf_: but having a latency of up to a quarter of a frame would be quite nice
13:44karolherbst_work: I don't know how much my algorithm pushes the clock currently
13:44mupuf_: nvidia is not being too smart here though
13:45karolherbst_work: no idea if doubling is the max thing or 3x or just 1.5x
13:45mupuf_: so let's not try to be too aggressive at first
13:46mupuf_: I would say, set the polling rate to 10ms, that does not sound too crazy low
13:46karolherbst_work: I ignore memory for now anyway
13:46mupuf_: and no clock down for 5 seconds
13:47mupuf_: or a small clock down for every 100ms spent under a threshold (like 40% usage)
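mupuf's suggestion amounts to a fast-up/slow-down governor. It polls every 10 ms, reacts immediately to high load, and only steps down after the load has stayed under a threshold (40% here) for a hold time. Below is a minimal sketch, assuming a fixed list of clock levels. The 80% up threshold and the jump-to-top upclock are hypothetical choices. Setting `down_hold_ms=100` gives the "small clock down every 100ms" variant, and `down_hold_ms=5000` gives the "no clock down for 5 seconds" one.

```python
class ThresholdGovernor:
    """Fast-up / slow-down clock governor, polled at a fixed interval."""

    def __init__(self, clocks, up_threshold=80.0, down_threshold=40.0,
                 poll_ms=10, down_hold_ms=5000):
        self.clocks = sorted(clocks)  # available clock levels, lowest first
        self.level = 0                # index into self.clocks
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.poll_ms = poll_ms
        self.down_hold_ms = down_hold_ms
        self.below_ms = 0             # time spent continuously under down_threshold

    def poll(self, load):
        """Feed one load sample (percent); return the clock to run at."""
        if load >= self.up_threshold:
            # reflex action: jump straight to the top level
            self.level = len(self.clocks) - 1
            self.below_ms = 0
        elif load < self.down_threshold:
            self.below_ms += self.poll_ms
            if self.below_ms >= self.down_hold_ms and self.level > 0:
                # slow path: one step down, then wait out the hold time again
                self.level -= 1
                self.below_ms = 0
        else:
            self.below_ms = 0
        return self.clocks[self.level]
```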
13:53karolherbst_work: down clocks are according to load anyway
13:53karolherbst_work: but the 1ms interval might not be enough to get reliable loads
13:53karolherbst_work: so if you are 10% under the target it clocks down by 7.5% or something like this currently
13:54karolherbst_work: why do I do it linear anyway
13:55karolherbst_work: 10% load should lead to a drastic down clock
13:55karolherbst_work: but meh
13:55karolherbst_work: but it would most likely only halve the clock at most
13:55karolherbst_work: doesn't matter though
13:55mupuf_: yeah, nvidia has thresholds, but after this, the drop/increase depends on how far from the threshold it is
13:55karolherbst_work: yeah sure
13:55karolherbst_work: I do it already too
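The proportional step karolherbst describes, combined with the dead band mupuf attributes to NVIDIA, can be sketched as below. A gain of 0.75 reproduces the example above (10 percentage points under target gives a 7.5% lower clock), assuming "10%" means percentage points of load. The cap at halving follows the "only halve the clock at most" remark, and the 2x upper cap is a hypothetical bound.

```python
def next_clock(clock, load, target, deadband=0.0, gain=0.75,
               min_factor=0.5, max_factor=2.0):
    """Return a new clock scaled by how far `load` (percent) is from `target`.

    Inside +/- deadband nothing changes; outside it, the step is proportional
    to the distance past the edge of the band.
    """
    error = load - target
    if abs(error) <= deadband:
        return clock
    excess = error - deadband if error > 0 else error + deadband
    factor = 1.0 + gain * excess / 100.0
    factor = max(min_factor, min(max_factor, factor))  # clamp the step size
    return clock * factor
```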
13:55mupuf_: hence why I said: Do a prototype in the userspace
13:55karolherbst_work: except the threshold thing on downclocks
13:55mupuf_: make it work and then push it to the PMU
13:56mupuf_: you are just doing things in the wrong order here :D
13:56karolherbst_work: uhh everything is basically done
13:56karolherbst_work: I had a userspace daemon once
13:56mupuf_: done but untested
13:56karolherbst_work: it was messy and not reliable
13:56mupuf_: and probably suboptimal
13:56karolherbst_work: sure, but I rather tweak the module than a userspace daemon
13:56mupuf_: this is a research project, not upstream code ;)
13:57mupuf_: you are very very weird
13:57karolherbst_work: yeah, and I want an reliable environment
13:57karolherbst_work: mupuf_: it takes 5 second to reload nouveau for me :p
13:57karolherbst_work: while X is running
13:57mupuf_: yeah, lucky you
13:57mupuf_: then, go for the module-side
13:57mupuf_: polling at 10ms on the cpu side is not a biggy
13:58RSpliet: speaking of which - is the LLVM fµc back-end going places?
14:00mupuf_: RSpliet: mwk is probably busy with another project that will be released soon
14:01RSpliet: mupuf_: I've seen lots of work on hw validation, but curious whether someone else might have picked this up :-)
14:01mupuf_: not as far as I know
14:02karolherbst_work: mupuf_: I don't poll on the CPU :p the PMU side is totally unrelated to the actual algorithm and implementation
14:02karolherbst_work: except the timer
14:03karolherbst_work: it is just a bit smart about when to request reclocks, but it holds no information about the algorithm itself
14:03karolherbst_work: except the thresholds...
14:05karolherbst_work: mhh I could let the host be able to reconfigure those
14:10mwk: I'm not sure if it's worth anything to anyone at this point, esp. since it's been dead for three years and I have no intentions of reviving it, but...
14:36mooch2: well then
15:04imirkin: RSpliet: note that different RA strategy *can* affect instruction counts, both as a result of different spilling, as well as a result of constraint register moves. (stuff like mov $r0 $r0 gets removed at the end, but depending on RA those may or may not be identical registers.)
15:04imirkin: RSpliet: you may also be interested in looking at the various perf counters available, they may show you what you're looking for
15:05RSpliet: imirkin: it *can* also affect GPR count in the presence of doubles and quads - if we suddenly leave more gaps of single reg size
15:06RSpliet: but the latter can arguably be fought by assigning registers from large to small
15:08RSpliet: imirkin: RA does a pass of eliminating those moves + preferred register pinning prior to assignment
15:10imirkin: well, it can only eliminate those moves post assignment
15:10imirkin: it's done in the PostRASomething
15:15RSpliet: mmm, I guess game benchmarks are the best judge of which strategy contributes more to performance in the end :-)
15:15imirkin: the usual suspects are things like valley and heaven
15:16RSpliet: I'll take Karol's traces tonight and see what happens
15:16karolherbst_work: in worst case all traces are CPU bound by apitrace
15:17RSpliet: oh, he has neither of those traces :-) thanks
15:18RSpliet: FX6300 is unfortunately not the beefiest of CPUs
15:18RSpliet: not complaining, but I realise my limitations
15:19RSpliet: karolherbst_work: no worries, I can install Unigine heaven
15:21RSpliet: (would be silly if you can hack compilers and kernels but not install a benchmark, right?)
15:24karolherbst_work: I wouldn't be so sure about that in general :O
15:39karolherbst_work: functional dependencies are in for 4.10 :) sounds like something we could use for the audio stuff
15:40karolherbst_work: Lekensteyn: you saw it?
15:42karolherbst_work: funny, whenever I ask anybody to work on nouveau, the usual response is "I never reverse engineered" or "never done any kernel programming", weak excuses I won't accept anymore!
15:49RSpliet: karolherbst_work: "Remember the first time you had sex? Yeah, your first experience with reverse engineering will be something like that...!"
15:50Lekensteyn: karolherbst_work: haven't noticed it, will keep in mind for next year :)
15:52imirkin: RSpliet: i prefer the analogy from lord of war...
15:52RSpliet: imirkin: enlighten me?
15:52imirkin: Selling a gun for the first time is a lot like having sex for the first time. You're excited but you don't really know what the hell you're doing. And some way, one way or another, it's over too fast.
15:53karolherbst_work: of course it is about guns :p
15:53imirkin: well... it's Lord of War.
15:53imirkin: what did you expect.
15:54imirkin: [great movie btw]
15:54RSpliet: write-after-read hazards
15:54karolherbst_work: I am sure americans rather discard sex than weapons :D
15:54RSpliet: but we're diverging...
16:08imirkin: gnurou: what's the holdup with releasing gp104/gp106/gp107 gr accel firmware?
16:09imirkin: are they really so different from gp100?
17:42imirkin_: skeggsb: if you're really trying to make the kernel more solid, you might want to tilt at the max-texture-size windmill - mlankhorst says that running a few of those in parallel loops always kills things
17:55bloblo: I have gtx 770, Linux lcls 4.8.0-1-686-pae #1 SMP Debian 4.8.7-1 (2016-11-13)
17:55bloblo: 0f: core 405-1241 MHz memory 7010 MHz AC DC *
17:55imirkin_: the AC: line is the one that matters.
17:55bloblo: nouveau 0000:02:00.0: clk: failed to raise voltage: -22
17:56bloblo: nouveau 0000:02:00.0: clk: error setting pstate 3: -22
17:56bloblo: when i put 0f pstate
17:56imirkin_: you want either linus's git, or drm-next
17:56bloblo: AC: core 405 MHz memory 7009 MHz
17:57imirkin_: it should be fixed in the upcoming 4.10-rc1 release
17:57bloblo: i am using debian kernel
17:57bloblo: ah i see
17:57bloblo: when is 4.10-rc1 out, approximately?
17:57imirkin_: 2 weeks from now
17:58imirkin_: (or more like 1.5)
17:58bloblo: ok thanks
18:43RSpliet: karolherbst: on your civ5 benchmark I see a consistent 0.5fps improvement... nothing too shocking is it
18:44RSpliet: but at the same time a 3.5% perf improvement - sounds a little better
18:48RSpliet: unigine heaven renders only a blank screen on my machine?
18:56RSpliet: does work with my home-brew mesa... but that's not very useful for comparison
18:57imirkin_: heaven used to work =/
18:57RSpliet: imirkin_: it works with upstream mesa
18:58RSpliet: not with Fedora mesa
18:58RSpliet: so... I'll have to complain to the Fedora folks
18:58imirkin_: oh ok
18:58imirkin_: what version is fedora mesa?
18:58imirkin_: maybe they have some awesome patches on top :)
18:58imirkin_: that sounds like a recent version
18:59RSpliet: two weeks old apparently
18:59karolherbst: RSpliet: civ5 is mostly CPU bound though
19:00karolherbst: especially this trace
19:00karolherbst: and with especially, I mean especially for real
19:01RSpliet: well, getting consistently 3.5% more FPS over the stock fedora mesa (which I wrongfully but optimistically assume would be the same as when I'd build mesa without the patches... wrongfully because the linker matters)
19:01RSpliet: what about saints 3
19:02karolherbst: no clue, we do something terribly wrong there
19:02karolherbst: generally not CPU bound
19:02karolherbst: but also no high gpu load
19:02karolherbst: worth a try though
19:02RSpliet: I just need a few benchmarks that work :-D
19:03karolherbst: you could use the grimrock one
19:03karolherbst: it should be light enough on apitrace
19:03RSpliet: is that 32-bit?
19:03karolherbst: those are traces
19:03RSpliet: and will it be when I apitrace that?
19:04karolherbst: I have no clue if that makes a difference in the trace, I would be surprised if it would
19:10RSpliet: 9.86322 fps -> 10.4404 fps
19:10RSpliet: for saints 3
19:11RSpliet: it's something... and there's a few knobs I still need to tweak
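The figures RSpliet quotes can be cross-checked with a little arithmetic: saints 3 goes from 9.86322 to 10.4404 fps, and a civ5 gain that is both 0.5 fps and 3.5% implies a baseline of roughly 14 fps. A small Python sketch; the function names are illustrative:

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def implied_baseline(delta_abs, delta_pct):
    """Baseline value implied by an absolute gain that equals delta_pct percent."""
    return delta_abs / (delta_pct / 100.0)


saints3 = relative_change(9.86322, 10.4404)  # saints 3, numbers from the log
civ5_baseline = implied_baseline(0.5, 3.5)   # civ5: +0.5 fps == +3.5%

print(f"saints 3: {saints3:+.2f}%")               # saints 3: +5.85%
print(f"civ5 baseline: {civ5_baseline:.1f} fps")  # civ5 baseline: 14.3 fps
```

As imirkin_ and karolherbst point out next, a gain of this size still has to be separated from run-to-run noise and from build differences before it means much.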
19:11karolherbst: do you have cpu boosting disabled and everything?
19:12karolherbst: because I expect that the binary quality of mesa matters a lot here
19:12RSpliet: I don't think AMD does boosting
19:12imirkin_: RSpliet: not to diminish your work, but such changes could result from just random changing of RA policy. definitely would want to get some wider testing.
19:12RSpliet: imirkin_: no that's my feeling as well
19:12karolherbst: also it sounds like you compare your local compilation against the system-installed one? or did I misunderstand this part?
19:13RSpliet: at the very least I should try a better apples-to-apples comparison
19:15RSpliet: imirkin: with civ5 I noticed GPR usage went up by one for two or three shaders after patches, but down by one in one or two occasions as well
19:16karolherbst: mhh I wouldn't imagine that any such compiler changes would make any difference with that trace
19:16karolherbst: it does around 400k gl calls a frame most of the time
19:17karolherbst: but yeah, if your patches indeed improve performance there, I am happy as well
19:17karolherbst: just didn't expect that it would matter
19:17RSpliet: karolherbst: the linker has a ridiculous impact on performance of CPU bound operations
19:17RSpliet: so I'm not cheering just yet
19:17karolherbst: you should compare against your own build :p
19:18RSpliet: that's better, but still no guarantee
19:18karolherbst: good enough though
19:18RSpliet: ideally I have a debugging flag that I can abuse to disable it, and use the same build for both benchmarks
19:19karolherbst: well sure, but you know, if we couldn't rely on performance difference between two commits getting built the same, we would have serious issues elsewhere already
19:20karolherbst: anyway, on pixmark_piano the CPU impact is... not there
19:21karolherbst: and even full debug vs full release build have the same performance
19:21karolherbst: and you can actually run it 50 times and you would get the same result +-1 difference in absolute value there
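karolherbst's point is that a benchmark is only useful for A/B work if repeated runs agree closely. One way to make that concrete, sketched in Python under the assumption that you have several per-run fps numbers for each build (at least two per side), is to compare the difference in means against its standard error (Welch's t statistic) instead of eyeballing single runs. The threshold of 2.0 is a rough rule of thumb chosen here, not something from the discussion:

```python
import math
import statistics


def welch_t(baseline, patched):
    """Welch's t statistic for mean(patched) - mean(baseline)."""
    diff = statistics.mean(patched) - statistics.mean(baseline)
    se = math.sqrt(statistics.variance(baseline) / len(baseline)
                   + statistics.variance(patched) / len(patched))
    if se == 0:
        # No spread in either sample: any nonzero difference is "infinitely" clear.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def compare(baseline, patched, threshold=2.0):
    """Summarise per-run fps from two builds; threshold is a rule of thumb."""
    mb, mp = statistics.mean(baseline), statistics.mean(patched)
    t = welch_t(baseline, patched)
    return {
        "baseline_mean": mb,
        "patched_mean": mp,
        "pct_change": (mp - mb) / mb * 100.0,
        "t": t,
        "significant": abs(t) >= threshold,
    }


print(compare([100, 101, 99, 100, 100], [102, 103, 101, 102, 102])["significant"])  # True
print(compare([100, 101, 99, 100, 100], [100, 102, 98, 101, 99])["significant"])    # False
```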
19:58thican: Good evening everyone, could I ask please if you know when the support for the nvidia's GPU 10XX generation will be available in nouveau?
19:58thican: I am unfortunately unable to use nvidia's proprietary binary on my kernel, because of PaX.
19:59karolherbst: thican: when nvidia releases the signed firmware
20:00thican: isn't it in the 375 version?
20:01karolherbst: well we don't know where the firmwares are
20:01thican: this? http://www.nvidia.com/download/driverResults.aspx/112992/en-us
20:01karolherbst: the driver isn't equal to the firmware
20:02thican: ah ok, my bad.
20:02karolherbst: they are somewhere inside the driver though, we just don't know where
20:03thican: ok, thanks for the answers. :-)
20:04thican: Just a question, tho, why isn't it possible to have at least minimal support for this generation? Are they completely different?
20:04karolherbst: there should be minimal support
20:04karolherbst: just no hw acceleration
20:04thican: Is it normal nouveau says "unknown chipset" about this 1050 Ti?
20:05karolherbst: ohh, you will need the 4.10 driver for this
20:05thican: ok, so I have something wrong on my system
20:05thican: 4.10? you talk about the kernel? :'(
20:05thican: sad, it will be in 2 months, I guess :/
20:06thican: what about xf86-video-nouveau? I have the version 1.0.13
20:06thican: but you said I should have minimal support
20:06thican: currently, I only have a 80x24 terminal in TTY
20:06karolherbst: maybe there is a drm-next package for you
20:07thican: and Xorg isn't able to start
20:07thican: I am using libdrm 2.4.74
20:07thican: Mesa 13.0.2
20:08karolherbst: you need a more recent kernel
20:09thican: I have 4.8.11
20:10karolherbst: yeah, that's too old
20:10karolherbst: look for a drm-next package or compile a recent kernel yourself
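karolherbst's answer boils down to a version check: thican runs 4.8.11, and the 1050 Ti needs 4.10. A Python sketch of that check; comparing release strings directly would get it wrong, because `'4.8' > '4.10'` as text. The minimum version comes from this discussion, and the helper names are illustrative:

```python
import platform
import re

# Minimum kernel for the 1050 Ti according to the discussion above.
MIN_VERSION = (4, 10)


def kernel_version(release=None):
    """Parse (major, minor) from a release string like '4.8.11-hardened-r1'."""
    release = release or platform.release()
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def new_enough(release=None, minimum=MIN_VERSION):
    """Compare as integer tuples, not strings, so 4.10 sorts after 4.8."""
    return kernel_version(release) >= minimum


print(new_enough("4.8.11"))     # False
print(new_enough("4.10.0-rc1"))  # True
```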
20:10thican: Well, the problem is the lack of support for my Hardened profile :/
20:11thican: GRsec and stuff
20:11karolherbst: ohhh, that makes sense
20:11thican: Well, I should have said I am using Gentoo with a Hardened profile, maybe :-)
20:11thican: Hence my inability to use nvidia-drivers
20:12thican: I wish I could contribute, but my skills are so low :/
20:12karolherbst: mhh ogg
20:12karolherbst: last possibility is to compile nouveau yourself
20:12karolherbst: or a kernel with the patches
20:12karolherbst: I am currently making a 4.9 based branch with all recent changes
20:13karolherbst: 4.8 wouldn't work for various reasons
20:14karolherbst: mhh 4.9 doesn't work either cause of the atomic stuff, meh
20:14thican: sad :/
20:14thican: Because the GPU 1080 got released 2 months ago, no?
20:14karolherbst: you could apply the patches yourself though
20:14karolherbst: gp106 right?
20:14thican: do you know where I could get those patches you are talking about, please?
20:15karolherbst: try this one: https://github.com/skeggsb/linux/commit/1fe487d7d2858265e23f10fa6b4565112f9a17fe
20:16thican: Thanks again, karolherbst :-)
20:16karolherbst: append .patch for the raw file
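Appending `.patch` to a GitHub commit URL returns the commit as a mailbox-format patch, which `git am` can apply with its author and message intact. A minimal Python sketch, assuming a git checkout of the kernel tree without pending changes at a path you supply; the helper names and the example path are hypothetical:

```python
import subprocess
import urllib.request
from urllib.parse import urlparse


def raw_patch_url(commit_url):
    """GitHub serves a commit as an mbox-style patch when '.patch' is appended."""
    parsed = urlparse(commit_url)
    if parsed.netloc != "github.com" or "/commit/" not in parsed.path:
        raise ValueError("expected https://github.com/<owner>/<repo>/commit/<sha>")
    return commit_url.rstrip("/") + ".patch"


def fetch_and_apply(commit_url, kernel_tree):
    """Download the patch and feed it to `git am` on stdin."""
    with urllib.request.urlopen(raw_patch_url(commit_url)) as resp:
        patch = resp.read()
    # `git am` keeps the original author and commit message.
    subprocess.run(["git", "am"], input=patch, cwd=kernel_tree, check=True)


# Example (hypothetical path to a 4.8 checkout):
# fetch_and_apply(
#     "https://github.com/skeggsb/linux/commit/1fe487d7d2858265e23f10fa6b4565112f9a17fe",
#     "/usr/src/linux",
# )
```

If `git am` stops with conflicts on 4.8, that is the case karolherbst mentions where a handful of additional patches would be needed first.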
20:20karolherbst: mhh, maybe I indeed managed to rebase on 4.9, had to throw out all the atomic modesetting stuff :/
20:21thican: Another question about this patch, I should have the 4.9 branch to apply it, right? It won't work with 4.8, will it?
20:21Yoshimo: So there is no intention to beat NVIDIA in grabbing the firmware before they officially release it? ;)
20:21karolherbst: thican: mhh, try it with 4.8
20:21karolherbst: ohh wait
20:21karolherbst: let me think
20:22karolherbst: thican: try with your 4.8 kernel
20:22thican: Don't worry, I am not in that much of a hurry :-)
20:22karolherbst: should work
20:22thican: I still have my old graphics card.
20:22karolherbst: if not, only a handful additional patches should be needed
20:22thican: ok, thanks for support, karolherbst :-)
20:23karolherbst: Yoshimo: :D help if you want
20:23Yoshimo: our last attempt was just a file with garbage iirc
20:26thican: Yoshimo: I don't understand, why do we need Nvidia's firmware? What does this file (or stuff) provide?
20:27Yoshimo: the ability to control the fan
20:27karolherbst: and more
20:27karolherbst: we are talking about pascal
20:27thican: isn't it the same code between the other GPU?
20:27thican: You mean you reverse it from this file?
20:28karolherbst: no, pascal is a lot different
20:28Yoshimo: even if they were, they wouldn't have the official nvidia signature
20:28karolherbst: and we need the _signed_ firmware, because we can't sign it, only nvidia can
20:28karolherbst: and the gpu won't accept anything else
20:28thican: ah ok!
20:28karolherbst: leaving the hw accel units inaccessible
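Why a firmware dump alone does not help: the GPU checks a signature it can verify with public values, but producing a valid signature needs a private key that only the vendor holds. The toy Python model below illustrates that asymmetry with textbook RSA and a deliberately tiny key. It is not NVIDIA's signing scheme, and all names are hypothetical:

```python
import hashlib

# Textbook RSA with a deliberately tiny key (p=61, q=53). Toy model only.
N = 61 * 53  # public modulus (3233), known to the GPU
E = 17       # public exponent, known to the GPU
D = 2753     # private exponent, vendor only (E*D = 1 mod 3120)


def digest(firmware: bytes) -> int:
    """Hash the firmware and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(firmware).digest(), "big") % N


def vendor_sign(firmware: bytes) -> int:
    """Vendor side: producing a signature requires the private exponent D."""
    return pow(digest(firmware), D, N)


def gpu_accepts(firmware: bytes, signature: int) -> bool:
    """GPU side: checking a signature needs only the public values N and E."""
    return pow(signature, E, N) == digest(firmware)


fw = b"gr firmware blob"
print(gpu_accepts(fw, vendor_sign(fw)))                # True
print(gpu_accepts(fw, (vendor_sign(fw) + 1) % N))      # False
```

Changing the signature at all makes verification fail, because raising to the power E modulo N is a bijection for a valid RSA key; with a real-size key there is no practical way to compute a signature for a modified blob without D.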
21:22thican: I didn't understand this was a signed component
21:23thican: and with the GPU requiring it to be signed
21:23thican: I guess we don't need the firmware from nVidia, but in fact their keys
21:24karolherbst: yeah, that would indeed help
21:24thican: why not their source code, then? :D
21:24karolherbst: or maybe not
21:24karolherbst: who knows
21:24thican: and their blue prints!
21:24thican: and their patents!
21:24karolherbst: the patents are public
21:24thican: yes, true
21:24karolherbst: anyway, if one of us had access to those blueprints, that would be a legally very dangerous situation
21:25thican: why is that?
21:25karolherbst: well, those are company secrets
21:25thican: yes, indeed
21:25karolherbst: and if they got out, they would want to find out how that happened
21:26thican: I guess we should not tell them, therefore :D
22:08RSpliet: unigine finds avoiding reg bank conflicts worth .0239fps - apples to apples :')
22:08karolherbst: mhh, at least something
22:08RSpliet: that's nothing in my book :-D
22:08karolherbst: I didn't want to be negative
22:09karolherbst: now you know why I use pixmark_piano for shader improvements :p
22:09karolherbst: if you get pixmark_piano 2% faster, it accounts for ~0.4% across everything else
22:10karolherbst: ohh right
22:10karolherbst: nvidia shader
22:10imirkin_: RSpliet: make sure you're building your test mesa with the same opt flags as your non-test one
22:10RSpliet: imirkin_: it's the same mesa
22:10imirkin_: RSpliet: ah ok
22:10RSpliet: I hacked up an optimization level 4 and made sure the debugging flag is read out even in a non-debug build
22:11RSpliet: (and before you ask: it works, GPR count and binary sizes differed when trying it on saints row apitrace ;-))
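RSpliet's approach, one build with the optimisation switched by a runtime flag, takes compiler and linker differences out of the comparison. A hedged Python sketch of such an A/B driver follows. It assumes the flag is an environment variable; Mesa's nouveau codegen reads `NV50_PROG_OPTIMIZE` (normally only in debug builds), but whether that is the knob RSpliet patched, and what his level 4 means, are assumptions. It also assumes the `glretrace --benchmark` summary line contains `average of <N> fps`:

```python
import os
import re
import subprocess

# Assumed summary format, e.g. "Rendered 1000 frames in 46.2 secs, average of 21.6 fps".
FPS_RE = re.compile(r"average of\s+([0-9.]+)\s+fps")

# Assumed knob: the nouveau codegen optimisation level variable.
OPT_VAR = "NV50_PROG_OPTIMIZE"


def parse_fps(output):
    """Extract the average fps from glretrace's benchmark summary."""
    m = FPS_RE.search(output)
    if m is None:
        raise ValueError("no fps summary found in glretrace output")
    return float(m.group(1))


def env_for_level(level, base=None):
    """Copy of the environment with the optimisation level set."""
    env = dict(os.environ if base is None else base)
    env[OPT_VAR] = str(level)
    return env


def run_once(trace, level):
    """Replay `trace` once with the same Mesa build and return its fps."""
    result = subprocess.run(
        ["glretrace", "--benchmark", trace],
        env=env_for_level(level),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fps(result.stdout + result.stderr)


def ab_runs(trace, level_a, level_b, repeats=5):
    """Alternate A and B runs so slow drift affects both sides equally."""
    a, b = [], []
    for _ in range(repeats):
        a.append(run_once(trace, level_a))
        b.append(run_once(trace, level_b))
    return a, b
```

Interleaving the runs, rather than doing all A runs and then all B runs, keeps thermal or background drift from landing on one side only.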
22:12karolherbst: RSpliet: nvidia version of pixmark_piano: https://gist.github.com/karolherbst/7cb64c31804e9009e36f2d001a79727c
22:13RSpliet: oh yes, I promised you the reg allocation diff... from yesterday
22:14RSpliet: karolherbst: http://paste.fedoraproject.org/506561/17536541/
22:15karolherbst: RSpliet: do you notice how much space nvidia tries to put between defs and sources?
22:16RSpliet: karolherbst: I do, but that's mostly a scheduling problem, very little an RA problem?
22:16RSpliet: you can predict and generate that distance on SSA too
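RSpliet's point that def-to-use distance can be predicted on SSA can be made concrete: with each value defined exactly once, the distance from a definition to its first use falls out of a single pass over the instruction list. A Python sketch over a hypothetical `(dest, sources)` instruction list, showing that interleaving two independent chains doubles the minimum distance:

```python
def def_use_distances(instrs):
    """Distance from each SSA value's definition to its first use.

    instrs: list of (dest, sources) in program order; dest may be None.
    Values that are never used do not appear in the result.
    """
    def_pos, first_use = {}, {}
    for pos, (dest, srcs) in enumerate(instrs):
        for s in srcs:
            if s in def_pos and s not in first_use:
                first_use[s] = pos - def_pos[s]
        if dest is not None:
            def_pos[dest] = pos
    return first_use


# Two independent chains a -> b and c -> d (hypothetical program).
SEQUENTIAL = [("a", []), ("b", ["a"]), ("c", []), ("d", ["c"])]
INTERLEAVED = [("a", []), ("c", []), ("b", ["a"]), ("d", ["c"])]

print(def_use_distances(SEQUENTIAL))   # {'a': 1, 'c': 1}
print(def_use_distances(INTERLEAVED))  # {'a': 2, 'c': 2}
```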
22:17karolherbst: I think we should have a basic SSA based scheduler
22:17karolherbst: would be fun to try out
22:17karolherbst: and just treat every instruction the same for now
22:17RSpliet: thought you did that half a year ago? :-P
22:17karolherbst: post RA
22:17karolherbst: or during RA
22:18RSpliet: mmm, yes, generally a bad idea to mix those two responsibilities up
22:18karolherbst: I have a trivial rescheduling pass to improve dual issuing though
22:18karolherbst: makes up for ~6% perf in pixmark_piano
22:19karolherbst: okay, more like 3%
22:19RSpliet: yes, I've been thinking of that... I feel that that is part of a bigger whole of determining the fitness of instructions in a window and picking the fittest - where dual-issue probably is the fittest of all
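RSpliet's idea of scoring the instructions in a window and picking the fittest maps onto classic greedy list scheduling over a dependency graph. A minimal Python sketch; the fitness function and the pairing rule (an `alu` op next to a `mem` op can dual-issue) are hypothetical placeholders, not the hardware's real dual-issue constraints:

```python
def list_schedule(deps, fitness):
    """Greedy list scheduling.

    deps: {instr: set of instrs it must come after}; must be a DAG.
    fitness(candidate, previous): higher score is scheduled first.
    Returns one valid order, chosen greedily by fitness.
    """
    remaining = {i: set(d) for i, d in deps.items()}
    order, prev = [], None
    while remaining:
        ready = sorted(i for i, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        # max() keeps the first of equal scores, so ties go to the lowest name.
        best = max(ready, key=lambda i: fitness(i, prev))
        order.append(best)
        del remaining[best]
        for d in remaining.values():
            d.discard(best)
        prev = best
    return order


# Hypothetical example: unit kinds and a made-up pairing rule.
KIND = {"i0": "alu", "i1": "alu", "i2": "mem", "i3": "alu"}
DEPS = {"i0": set(), "i1": set(), "i2": set(), "i3": {"i0", "i2"}}


def can_dual_issue(a, b):
    """Placeholder rule: an alu op and a mem op may issue together."""
    return KIND[a] != KIND[b]


def fitness(candidate, previous):
    # Pairing with the previous instruction is treated as the fittest choice.
    return 1 if previous is not None and can_dual_issue(previous, candidate) else 0


print(list_schedule(DEPS, fitness))  # ['i0', 'i2', 'i1', 'i3']
```

Other goals, such as latency hiding or keeping live ranges short, would slot in as extra terms in `fitness` without changing the scheduling loop.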
22:19karolherbst: with the other it was 6%
22:19karolherbst: RSpliet: https://github.com/karolherbst/mesa/commit/1ff5e7eaf137b6d344c75228948b396206ff06b4
22:20karolherbst: we could actually do both things
22:20karolherbst: scheduling pre SSA
22:20karolherbst: do the RA stuff
22:20karolherbst: and then tweak for dual issuing later
22:21karolherbst: this might actually work
22:21RSpliet: don't jump back and forth straight away ;-)
22:21karolherbst: well, with a proper scheduling pass, we have a higher degree of flexibility post RA
22:22karolherbst: and can move things way further
22:22RSpliet: thinking a bit more, dual issue depends on your register assignment, so optimise for minimal latency sounds like a good target for post-RA scheduling
22:22RSpliet: will very likely overlap with some of hakzsam's work if you want to consider this kind of stuff
22:23karolherbst: I have other things to finish before that anyway
22:23RSpliet: pre-RA scheduling should target GPR use minimisation by limiting liveness ranges
22:23RSpliet: very non-trivial :-D
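Why instruction order drives GPR use: a value occupies a register from its definition to its last use, so hoisting loads far ahead of their consumers raises the number of simultaneously live values. A Python sketch that measures peak liveness for a given order, assuming every instruction defines exactly one value named after itself and ignoring register classes and multi-register values:

```python
def max_live(order, uses):
    """Peak number of simultaneously live values for a schedule.

    order: instruction names in schedule order (each defines one value).
    uses: {instr: list of values it reads}.
    """
    last_use = {}
    for pos, ins in enumerate(order):
        for v in uses.get(ins, ()):
            last_use[v] = pos
    live, peak = set(), 0
    for pos, ins in enumerate(order):
        # Sources whose last use is here die first, so the result may reuse them.
        for v in uses.get(ins, ()):
            if last_use[v] == pos:
                live.discard(v)
        live.add(ins)
        peak = max(peak, len(live))
        if ins not in last_use:  # never read: frees right after definition
            live.discard(ins)
    return peak


# Hypothetical program: c = a + b, f = d + e, g = c + f.
USES = {"c": ["a", "b"], "f": ["d", "e"], "g": ["c", "f"]}
ORDER_EARLY_ADD = ["a", "b", "c", "d", "e", "f", "g"]
ORDER_LOADS_FIRST = ["a", "b", "d", "e", "c", "f", "g"]

print(max_live(ORDER_EARLY_ADD, USES))    # 3
print(max_live(ORDER_LOADS_FIRST, USES))  # 4
```

Both orders compute the same result, but computing `c` before loading `d` and `e` needs one fewer register; a pre-RA scheduler aiming at GPR minimisation is, roughly, searching for orders with a lower peak.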
22:24karolherbst: like this? "total bytes used in shared programs : 17375696 -> 17284272 (-0.53%)"
22:24karolherbst: ohh wait
22:24karolherbst: this is bytes :D
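The line karolherbst pasted is a shader-db style summary, and it is a code-size figure in bytes, not register usage. A small Python sketch that parses such a line and recomputes the percentage, assuming the `total <what> in shared programs : <before> -> <after>` shape shown in the log:

```python
import re

STAT_RE = re.compile(
    r"total (?P<what>.+?) in shared programs\s*:\s*"
    r"(?P<before>\d+)\s*->\s*(?P<after>\d+)"
)


def parse_stat(line):
    """Return (what, before, after, percent_change) for one summary line."""
    m = STAT_RE.search(line)
    if m is None:
        raise ValueError(f"not a shader-db summary line: {line!r}")
    before, after = int(m.group("before")), int(m.group("after"))
    return m.group("what"), before, after, (after - before) / before * 100.0


what, before, after, pct = parse_stat(
    "total bytes used in shared programs : 17375696 -> 17284272 (-0.53%)"
)
print(f"{what}: {pct:.2f}%")  # bytes used: -0.53%
```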
22:24nullbyte_: how can i block nouveau in opensuse?
22:24nullbyte_: i have tried everything, but i don't know how?
22:24RSpliet: nullbyte_: I think opensuse already does that for you, but ask the suse people if you need a distro-specific option
22:25nullbyte_: what option? i have already asked them, they didn't know
22:25RSpliet: the generic solution is blacklisting the nouveau kernel module
22:26RSpliet: https://en.opensuse.org/SDB:NVIDIA_the_hard_way ?
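The generic approach RSpliet mentions is a modprobe.d fragment. A Python sketch that writes one; `blacklist` and `options nouveau modeset=0` are standard modprobe.d and nouveau syntax, while the file name and helper names are arbitrary choices made here:

```python
from pathlib import Path


def blacklist_conf():
    """modprobe.d fragment: ignore nouveau's aliases, and disable modesetting
    in case something still loads it explicitly."""
    return "blacklist nouveau\noptions nouveau modeset=0\n"


def write_blacklist(confdir="/etc/modprobe.d", name="blacklist-nouveau.conf"):
    """Write the fragment; the default location needs root."""
    path = Path(confdir) / name
    path.write_text(blacklist_conf())
    return path
```

`blacklist` only stops alias-based autoloading, so nouveau can still come in from the initramfs; regenerating it (on openSUSE typically with `mkinitrd` or `dracut -f`) and rebooting is usually needed. The openSUSE page RSpliet links covers the distro-specific steps.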
23:58gnurou: imirkin_: they come from a different driver branch (r367 vs r361) and this seems to be the cause
23:58gnurou: imirkin_: I say "seems" because the firmware is opaque to me too and I have no control at all over the process