00:13 urjaman:found one more thing that works on NV34+nouveau: OpenSCAD :) (does warn, but works)
00:20 Doctors: imirkin_; idk if you've done any more work to that one patch set, but it's been going pretty well.
00:37 imirkin: Doctors: you're gonna have to be a LOT more specific than that...
00:38 Doctors: imirkin_; The one on GPU locking.
00:38 imirkin: urjaman: i have a NV34 plugged in now, hopefully i'll be able to carve out some time + motivation to fix it up a bit
00:38 imirkin: Doctors: oh, the multithread thing?
00:39 Doctors: imirkin_; Yep
00:39 imirkin: Doctors: cool... what was your use-case? did you have an actual app that was using multiple contexts concurrently?
00:40 imirkin: [sorry, i remember most things for like ... 10 minutes. this seems to have happened more than 10 minutes ago]
00:41 Doctors: imirkin_; My git log for the branch with the fix I needed says it was from around jun 5
00:41 imirkin: [i'm like the guy in memento]
00:42 Doctors: imirkin_; I was having issues with modded MC with the mesa library, and tested a patch you had on my GPU (and have been using that patched mesa version I built since then)
00:43 imirkin: ah cool. and MC = ?
00:43 imirkin: oh. MineCraft
00:43 imirkin: neat-o
00:43 imirkin: well, unfortunately the underlying approach i took is a little off, so it may require a bit of rethinking
00:43 imirkin: and equally unfortunately, working on this stuff requires keeping a LOT of stuff in your head at once, which is ... tiring.
00:45 Doctors:nods
00:45 Doctors: Yep get the gist on why that would be difficult to keep a bunch of things in your head
00:46 imirkin: you do something, and then you test, and then it's like "ah shit, forgot about that other case", etc
00:46 imirkin: and there's a lot of going around in circles
05:13 xut_xut: I'm interested in learning RE, specifically reversing Nvidia drivers for OpenPOWER motherboards. How would I benefit from online courses like those offered by the Linux Foundation? Any recommendations?
07:10 karolherbst_work: pmoreau: k, dmesg should show some pmu timeouts
07:49 karolherbst_work: :O there are no M variants of pascal for mobile systems
07:49 karolherbst_work: just the plain desktop gpus with 5% lower clocks or so
08:01 gregory38: maybe they will throttle more
08:02 karolherbst_work: doesn't make sense
08:02 karolherbst_work: they will just boost the hell out of the gpu until it gets too hot
08:02 karolherbst_work: as usual, but the chips will be the same
08:03 karolherbst_work: they kind of are already, just a 770m has fewer cores than a 770
08:03 gregory38: same chips == less production cost
08:03 karolherbst_work: it makes sense kind of, because it makes it comparable
08:03 karolherbst_work: yeah, that too
08:03 gregory38: and much faster time to market
08:04 karolherbst_work: I would assume for intel it is the same already, too. just that the desktop gpus have higher base clocks
08:05 gregory38: Intel has also the CPU so it is more complex to handle. But I guess they try/do the same
08:05 gregory38: during design you need to select low power/high performance cells (transistors)
08:06 karolherbst_work: I really doubt that matters today
08:06 karolherbst_work: even on a desktop you really want high efficient cells
08:06 gregory38: yes, you mean with clock/voltage gaint
08:06 gregory38: gating*
08:06 gregory38: maybe
08:06 karolherbst_work: nope, in total
08:06 karolherbst_work: less efficient cpus -> less performance
08:06 karolherbst_work: that's todays rule
08:07 gregory38: well, if you want to have a high frequency you need fast transistor
08:07 gregory38: however they will leak more.
08:08 karolherbst_work: I really doubt that this matters today
08:08 karolherbst_work: the frequencies are stable for like 10 years now
08:08 gregory38: because silicon is still the same ;) For PC I don't know. But it seems to be useful on smartphone
08:08 gregory38: with big-little CPU implementation
08:09 karolherbst_work: so you usually improve performance by either adding more specialized instructions or make the entire chip more power efficient
08:09 karolherbst_work: and just throw a bunch of more instructions on the die
08:09 karolherbst_work: *transistors
08:12 gregory38: yes but not all transistors are equal, that's why there used to be several techs (like low power or high performance metal gate)
08:12 gregory38: I dunno if it is still true in 16/14nm
08:13 karolherbst_work: I am pretty sure intel uses the same on haswell for desktop/mobile chips, have to check though
08:13 gregory38: oh yes likely. It is super expensive to have different chips
08:14 karolherbst_work: yep, same TDP for same freqs
08:14 karolherbst_work: and mobile TDP is always unboosted
08:14 karolherbst_work: that's why desktop chips have such high TDPs usually
08:15 gregory38: and potentially they sort the chip with less voltage requirement for mobile
08:15 karolherbst_work: possible
08:15 gregory38: I'm pretty sure
08:16 gregory38: couple of watt won't kill any desktop
08:16 gregory38: and it improve yields
08:17 karolherbst_work: the difference for kepler gpus is around 25% here
08:17 gregory38: in Watt ?
08:18 karolherbst_work: voltage requirements...
08:18 karolherbst_work: wait
08:18 karolherbst_work: I think it is less in fact
08:18 karolherbst_work: more like 15%
08:18 karolherbst_work: mhh, no I think 25% is good
08:19 karolherbst_work: I can run my GPU at nearly 1GHz at 0.9V
08:19 karolherbst_work: desktop gpus with such a clock are usually at 1.05V or more
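A rough sanity check on the two figures quoted above (0.9 V versus 1.05 V at a similar clock). This is only a sketch: it assumes dynamic power scales with V² at a fixed frequency, which is the usual first-order CMOS approximation and is not something stated in the log. The function names are made up for the illustration.

```python
def voltage_reduction(v_low, v_high):
    """Fractional drop in voltage when going from v_high to v_low."""
    return 1.0 - v_low / v_high


def dynamic_power_reduction(v_low, v_high):
    """Fractional drop in dynamic power at a fixed clock.

    Assumption: P_dyn is proportional to V^2 * f, so at equal f the
    ratio of powers is (v_low / v_high) ** 2. Leakage is ignored.
    """
    return 1.0 - (v_low / v_high) ** 2
```

With these assumptions, `voltage_reduction(0.9, 1.05)` is about 0.14 and `dynamic_power_reduction(0.9, 1.05)` is about 0.27, so the 15% and 25% guesses could both be reasonable depending on whether voltage or power is meant.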
08:19 gregory38: they binned the chip
08:19 karolherbst_work: not really
08:20 karolherbst_work: there is a value you can read out from the gpu
08:20 karolherbst_work: and it affects voltage calculation
08:20 karolherbst_work: and for my gpu, this value is pretty good
08:21 karolherbst_work: ohh I have an idea
08:21 karolherbst_work: I just rebuild my current laptop into a desktop
08:21 karolherbst_work: and buy a new low voltage laptop :O
08:21 karolherbst_work: then I can add awesome fans and OC the hell out of everything
08:22 karolherbst_work: the laptop MB even has a MXM and a CPU slot
08:22 gregory38: well, will the chip be able to achieve the same frequency
08:22 karolherbst_work: sure
08:22 karolherbst_work: fans are just shitty
08:22 karolherbst_work: my GPU has a "base" clock of 705MHz
08:22 karolherbst_work: and full boosted 862MHz
08:22 karolherbst_work: but even under nvidia 997MHz is no problem
08:23 karolherbst_work: with the same voltage as for 862MHz of course
08:23 karolherbst_work: and I can still go up to 1.2V
08:23 karolherbst_work: so there is plenty of room
08:24 gregory38: Hum. Seem like desktop chip are the defects of the production. Or you get lucky
08:25 gregory38: by the way, may I ask when do you plan (if it isn't already done) your kepler reclocking stuff in main kernel ?
08:27 karolherbst_work: as soon as possible
08:27 gregory38: Ok. Cool :)
08:28 karolherbst_work: skeggsb wanted to review those patches and kind of stopped again
08:29 gregory38: no worry. I know that code/review takes time
08:32 gregory38: Need to go. Have a good day :)
08:41 pmoreau: karolherbst_work: Indeed it did :-)
08:46 karolherbst_work: pmoreau: as I expected, something is funky with the pmu
08:47 karolherbst_work: maybe it doesn't even run at all
08:47 pmoreau: :-O
08:48 karolherbst_work: it is a secure falcon after all
08:48 pmoreau: True
08:48 karolherbst_work: mhh we could try this
08:48 karolherbst_work: pmu_counters branch
08:49 pmoreau: If you need any data, or direct access, I should be able to provide it (direct access might be more difficult, but still doable)
08:49 karolherbst_work: and there we could see if the pmu counters are configured at all
08:49 pmoreau: K, will try that this evening
08:51 karolherbst_work: nvapeek 0x10a500 0x80
08:51 karolherbst_work: there are the counters
08:58 pmoreau: Noted
08:59 karolherbst_work: maybe it is a simple isa issue and resolved fairly simply
09:06 karolherbst_work: pmoreau: did you actually try to run anything on the gpu after reclocking?
09:17 pmoreau: karolherbst_work: I had glxgears with vblank off the whole time while reclocking
09:18 pmoreau: And checking the output from sensors; I didn’t know about `watch`: really handy!
09:25 karolherbst_work: yeah
09:25 karolherbst_work: so it didn't get too hot then
09:25 karolherbst_work: although you didn't clock to max anyway
09:27 pmoreau: No, it stayed around 30 C
09:27 pmoreau: I did clock to 0f, but it had absolutely no impact.
09:27 karolherbst_work: ahh right
09:27 karolherbst_work: memory
09:27 karolherbst_work: I guess the fps also stayed pretty much the same?
09:27 pmoreau: Even to 0a, the core wasn’t clocked to max, and it didn’t move when going up to 0f
09:28 karolherbst_work: mhh you have to increase the boost level too
09:28 karolherbst_work: 0 just means: doesn't matter what happens, the gpu stays below power budget and max temperature
09:28 karolherbst_work: that's why the clock was low
09:28 karolherbst_work: 1: mostly fine, 2: max possible
09:29 pmoreau: But does the clock range reported in pstate include the boost clock?
09:29 karolherbst_work: yes
09:29 karolherbst_work: and even more
09:30 karolherbst_work: kind of
09:30 pmoreau: Oh ok, didn’t know
09:30 pmoreau: bbl
09:30 karolherbst_work: with my patches it isn't guaranteed, that even with boost:2 that you get the max clock
09:30 karolherbst_work: pmoreau: https://github.com/karolherbst/nouveau/commit/b2a5efca3538bf6eb22186151dd9e69ba2b6e018
10:41 pmoreau: Got it
11:18 kloofy: hello, allmighty ducks of nouveau! was a training week an almost more then half handicap feel now, however things are looking good overall
11:31 kloofy: that good even that it's time for a rant;)!
11:32 fling: yay!
11:33 kloofy: well karolherbst_work mmiotrace way i once told it's pretty slow , cause handling pagefaults requires pagetable walk, i can not remember my notes precisely but there is another way with pointers and debug registers
11:45 kloofy: this one would be lot faster the idea is to alias every memory mapped reg to a debug register, once it they are written, system should raise an interrupt
11:50 karolherbst_work: kloofy: you are aware, that there are like 0x1000000 addresses in the mmio space?
11:51 kloofy: karolherbst_work: so, yeah i am?
11:51 karolherbst_work: how do you think of mapping all those addresses to debug registers?
11:52 karolherbst_work: also
11:52 karolherbst_work: read(x) != read(x)
11:52 kloofy: use a loop to alias them to the pointer, debug interrupt will give you wether it was write or read, and respective pointer *reg and reg will give the value and address of that reg
11:52 karolherbst_work: we are in kernel space here
11:53 kloofy: hmm, yeah thought about that one too
11:54 karolherbst_work: I am not saying there isn't a faster way, but mmiotrace is pretty fast already
11:54 karolherbst_work: and it doesn't walk the pagetables
11:54 karolherbst_work: it handles page faults
11:55 kloofy: well it was just a thought actually, as i classified this under a rant where about that problem i do not care about, if it is fast allready than it's all good
11:55 kloofy: i just never have used mmiotrace:) so i would not know, i just heard in the past that it was slow back then
11:55 karolherbst_work: it is slower, but still fast enough so that it doesn't matter
11:57 karolherbst_work: mhh, actually I have an idea
11:57 kloofy: karolherbst_work: yeah probably, it's actually what i think too, that is why it was classified as rant
11:57 karolherbst_work: pq: I might have an idea for an improved mmiotracer
11:57 pq: great! Don't pull me into it. ;-)
11:58 karolherbst_work: pq: instead of fake handling the page faults, why not remap the same region twice (once for the driver, once for mmiotrace) and then while handling the driver mapped region, just read out the mmiotrace mapped one
11:58 karolherbst_work: spoiler alert: I have no clue how page faults are working in linux
11:59 karolherbst_work: :p
12:00 pq: karolherbst_work, a) how would you synchronize; b) there are write-only regs; c) there are regs where a *read* triggers a side-effect.
12:01 karolherbst_work: pq: mhhhhh right
12:01 kloofy: atomic
12:01 pq: karolherbst_work, so a basic requirement is that tracing must not change the hw IO patterns at all.
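pq's point (c), and the earlier `read(x) != read(x)` remark, can be shown with a toy model. The class below is entirely hypothetical: it does not model any real nouveau register, it only mimics a status register whose pending bits clear when read, to show why a tracer that issues its own extra read changes what the driver sees.

```python
class ReadToClearRegister:
    """Toy status register: reading returns the pending bits and clears them."""

    def __init__(self):
        self._pending = 0

    def hw_raise(self, bits):
        # The "hardware" latches new pending bits.
        self._pending |= bits

    def read(self):
        # A read has a side effect: it consumes the pending bits.
        value = self._pending
        self._pending = 0
        return value


def driver_sees(tracer_peeks_first):
    """Return what the driver reads, optionally after a tracer's extra read."""
    reg = ReadToClearRegister()
    reg.hw_raise(0x5)
    if tracer_peeks_first:
        reg.read()  # the tracer's own read silently eats the bits
    return reg.read()
```

Here `driver_sees(False)` returns `0x5` while `driver_sees(True)` returns `0`, which is the kind of changed IO pattern a tracer has to avoid.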
12:02 kloofy: anyways about the pointer stuff where i always recap how it worked in c, is a nice tutorial on youtube
12:02 karolherbst_work: pq: right, so the only sane solution would be indeed an instruction emulator
12:02 karolherbst_work: kloofy: we are in kernel space. what C can doesn't matter at all.
12:03 kloofy: "pointers tutorial 25" is the one though it is easy i and other people tend to forget
12:03 karolherbst_work: pq: yeah, I wasn't thinking enough about that
12:03 pq: karolherbst_work, yes, and then you still need page faults to have it triggered, but you *could* avoid changing the page flags by diverting to a secondary map while you emulate. I think that might be a big win.
12:04 karolherbst_work: pq: yeah
12:04 karolherbst_work: pq: could we do stuff like this? driver map -> mmiotrace map -> hw
12:04 karolherbst_work: and only handle the driver -> mmiotrace mappings
12:04 kloofy: karolherbst_work: why so? something really must give it a userspace virtual address
12:04 kloofy: to that pointer
12:05 karolherbst_work: kloofy: there are no userspace virtual addresses in play here
12:05 pq: karolherbst_work, I'm not sure what you mean.
12:05 karolherbst_work: pq: the point of mmiotrace is, that every page is marked as non existent, always
12:06 pq: karolherbst_work, that would be optimal, except it doesn't work like that.
12:06 karolherbst_work: umph :/
12:06 kloofy: karolherbst_work: it's how soft-iommu works too, there should be a way obviously
12:06 pq: if it did that, it would not need to disable SMP too
12:07 pq: karolherbst_work, currently mmiotrace changes the page markings twice every time it handles one access.
12:07 karolherbst_work: right
12:08 karolherbst_work: and I wanted to eliminate one of those
12:08 pq: and to make that "reliable", it saw no other option than to disable SMP too
12:08 kloofy: karolherbst_work: so the pointer can be in kernel this does not matter at all, since do you agree that kernel does the right, if that is kernel virtual address, the pointer will alias it?
12:08 kloofy: if there is a write it will go straight to debug reg
12:08 pq: you could eliminate both - I did not count the initial mapping that marks everything not present
12:08 karolherbst_work: kloofy: we must not read the pointer value at all, never
12:09 karolherbst_work: pq: mhhh
12:09 kloofy: anyways well i just do not seem to care either
12:09 pq: karolherbst_work, so yes, mmiotrace currently is terribly inefficient, but hey...
12:09 kloofy: i could make the patch but it seems that noone including me really cares a lot, and it seems i have other stuff to do
12:10 karolherbst_work: pq: maybe if I have enough time, I will spend a bit on writing a new one from scratch, which is faster and more awesome :p
12:11 pq: karolherbst_work, add a proper instruction emulator, add a second mapping of the iomem (if possible?), and you can get rid of 2 page status changes and an interrupt per access, leaving you with just one (page) fault interrupt.
12:11 karolherbst_work: pq: I think the smp issue could be eliminated if we queue accesses to the same armed area and be smart about it
12:12 pq: karolherbst_work, how do you queue?
12:12 kloofy: i also saw the synchronization i.e among different hw cores/threads the biggest issue to be handled, i.e that already mentioned SMP stuff
12:12 kloofy: but i had some sort of solution there
12:12 karolherbst_work: pq: no idea yet
12:12 pq: karolherbst_work, really the SMP issue should be solved by an instruction emulator so we don't need to single-step nor toggle the page attributes
12:12 karolherbst_work: pq: but for example, if we get concurrent accesses to different areas, we could assume it is safe to handle both concurrently
12:13 karolherbst_work: I see
12:14 kloofy: now lets move to another rant karolherbst_work don't you think that at some point we should get dx11.3 done for mesa&gallium
12:14 kloofy: ?
12:14 pq: if you had concurrent access to the same register, you'd be screwed to begin with, so
12:14 karolherbst_work: kloofy: ask in #d3d9
12:14 karolherbst_work: pq: right
12:14 karolherbst_work: we have to assume what the driver does is safe enough
12:15 pq: if you had concurrent access to the same page but not register, then getting rid of the single-stepping and page attribute toggling should solve it completely
12:15 karolherbst_work: and if it accesses the same reg concurrently, it is most likely safe to do so
12:16 karolherbst_work: but how awesome would it be to also include the kernel thread id to the mmiotrace log :O
12:17 pq: that shouldn't be hard to add...
12:17 karolherbst_work: it would give us a bit of context information
12:18 kloofy: karolherbst_work: well i talked to mannerov; who is actually one of the good and relaxed guys in community in contrast to many others, we talked about design and found that it is actually quite doable
12:18 kloofy: either the people are just too busy or lazy though
12:20 karolherbst_work: pq: sadly the one guy with the mmiotrace issue didn't respond anymore :/ really annoying
12:24 kloofy: so let's move to another one, now how do you think packing the masks into virtual address in the spot of index+offset field gonna work out, or into macro/micro tiles, i kinda dry tested the theory and it this one should indeed work
12:24 kloofy: now but i again don't know if i could materialize the theory into a patch too
12:25 kloofy: among with the new precompiler based of llvm's linker, and with couple of other optimizations it should really lift the perf to heaven basically
12:26 kloofy: karolherbst_work: i assume you still want to do the scheduling the nvidia way , while i think that my way is faster
12:27 kloofy: the sourceforge project kinda disassembled the scheduling opcodes , i have a hunch but i am not really interested in their way
12:28 mmaret: Hi guys, I'm looking for some help !
12:28 mmaret: I'm facing a issue with a nvidia "Grid" card. I'm using a K520(GK104) on a amazon virtual machine (Xen)
12:29 mmaret: most of the time this card produce a big number of interrupt (like 100000/sec)
12:29 mmaret: cat /proc/interrupts reports those interrupts from nvkm
12:29 mmaret: that's why i'm asking here :)
12:30 mmaret: I said that this issue happens sometimes because on some amazon computer it seems that it does not happens
12:31 mmaret: This happens with the kernel of the ubuntu 14.04 but also when I upgrade the kernel to 4.4.13
12:32 mmaret: I think that's only related to the kernel module because the issue happens even without X; Just when nouveau is inserted
12:33 mmaret: If you have any idea ...
12:34 mmaret: or If I should directly make a bug report let me know !
12:35 pmoreau: mmaret: Can you upload the output of `dmesg` somewhere please?
12:36 kloofy: i did not understand the key sentence which was the sixth you composed, but those seem to be processor interrupts, should really look how they work in fact
12:37 kloofy: so the hw generates an interrupt and kernel polls it, would it show up in /proc/interrupts?
12:37 kloofy: does anyone know?
12:37 pmoreau: no idea
12:38 mmaret: http://pastebin.com/rpDSRYq4
12:42 pmoreau: Nothing weird in it
12:43 mmaret: it's always when you are requesting some help and people are answering that your computer crashes :)
12:44 pmoreau: mmaret: You should open a bug report, that will make the issue logged somewhere at least.
12:44 pmoreau: :-D
12:44 mmaret: I was trying to say that If you want more debug (like insmod nouveau debug=spam?) let me know
12:44 mmaret: ok I'll do that
12:45 kloofy: i just had one internet buddy suffering with hw interrupts, did not really get whole lot of it, but it may almost seem that even today irq sharing can be troublesome on pcie
12:45 kloofy: at least from internet i digged up some threads where it really was
12:45 mmaret: Is it possible that this issue could come from a difference in nvidia bios version ?
12:46 pmoreau: I was hoping for some output in the dmesg to help, but… let’s leave it to more capable persons. :-)
12:47 kloofy: and off the top of my head i do not remember the userspace way to rearrange interrupt lines in sw, too tired to read today too, but it can be done from bios at least
12:47 mmaret: Thanks for having look at it pmoreau, I really appreciate !
12:48 pmoreau: Could be, if yours is not initialising some things on the GPU whereas others do, and Nouveau does not initialise it either, it could make the card unhappy.
12:49 pmoreau: (And for reporting bugs, use bugs.freedesktop.org, not bugzilla.kernel.org please; just in case)
12:49 kloofy: possibly yeah:)!
12:53 mmaret: thanks !
12:55 imirkin: mmaret: chances are those interrupts are generating a good bit of load, and should be visible with perf
12:56 imirkin: mmaret: my personal guess is it's something extremely dumb, like an hpd storm. i think you can boot with nouveau.modeset=2 to disable the outputs.
12:57 mmaret: I hardly know about perf tools but, yes, you are right, system can hardly respond sometimes
12:57 mmaret: I'll try the nouveau.modeset=2
12:58 imirkin: i think the dolphin guys were playing with nouveau on amazon too, and also were running into weird issues
12:58 imirkin: you could try reaching out to delroth in #dolphin-dev
12:58 mmaret: nice to know and good idea !
12:58 imirkin: [dolphin is a wii/gamecube emulator]
13:00 mmaret: If that lead me to play wii for testing ... :)
13:06 imirkin: yes... testing...
13:23 mmaret: nouveau.modeset=2 indeed reduce the number of IRQ to a normal level
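The interrupt rate mmaret describes can be measured by sampling `/proc/interrupts` twice. The sketch below assumes the usual layout (a header row naming the CPUs, then one row per IRQ with per-CPU counts followed by a description) and that the rows of interest mention `nvkm`, as reported above. The helper names are made up for the illustration.

```python
import time


def irq_total(text, name):
    """Sum the per-CPU counts of every /proc/interrupts row mentioning `name`."""
    lines = text.splitlines()
    if not lines:
        return 0
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label, then ncpu counters, then the description.
        description = " ".join(fields[1 + ncpu:])
        if name in description:
            total += sum(int(f) for f in fields[1:1 + ncpu] if f.isdigit())
    return total


def irq_rate(name="nvkm", interval=1.0, path="/proc/interrupts"):
    """Interrupts per second for `name`, from two samples `interval` seconds apart."""
    with open(path) as f:
        first = irq_total(f.read(), name)
    time.sleep(interval)
    with open(path) as f:
        second = irq_total(f.read(), name)
    return (second - first) / interval
```

Calling `irq_rate()` once with default outputs and once after booting with `nouveau.modeset=2` would give a before/after figure to attach to the bug report.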
13:38 karolherbst_work: I guess we do something odd while polling for outputs or so
13:38 karolherbst_work: maybe nouveau goes into a crazy state if there is no display attached
13:40 pmoreau: Could be the VBIOS being slightly borked. I know the one from my old laptop reports some wrong data regarding outputs if I set the discrete card as the main one.
13:47 karolherbst_work: I should write more beginners trello cards :)
13:49 mmaret: kloofy, sorry I missed your question. And indeed I was not very clear... When you run some Linux Image on Amazon, you do not choose the computer that it will run on. Only the "kind" of computer, something like (little, medium, big, big with a gpu card).
13:50 mmaret: And I'm having the issue on the "big with a gpu card" sometimes
13:51 mmaret: so may be not on the same computer (e.g. vbios version are different time to times)
13:53 kloofy: mmaret: sounds as if you are really close to nailing it down, because honestly it's a mystification to me
13:53 mmaret: interrupts on pcie looks like a good idea. But from my (poor) understanding, having modeset=2 reducing the interrupts number discard this solution ?
13:53 imirkin_: mmaret: open a bug, include your vbios
13:53 mmaret: ^^ to mee too !
13:53 kloofy: i mean do you kinda remotely dommand you gpu via amazon or is it a physical connection?
13:53 imirkin_: mmaret: and also include the output of a boot with nouveau.debug=debug and drm.debug=0xe
13:54 kloofy: *command
13:54 mmaret: imirkin, I've checked, I've come across at least 3 different vbios versions
13:54 mmaret: kloofy, i remotely dommand it
13:55 kloofy: so then they pin another vbios to memory and send commands to your card which has another vbios?
13:56 kloofy: ouh god, that is some system then
13:56 imirkin_: mmaret: doesn't matter... just include one of them
13:56 imirkin_: it's in /sys/kernel/debug/dri/0/vbios.rom
13:57 mmaret: ok. The info in dmesg is not enough ? "[ 2.187055] nouveau 0000:00:03.0: bios: version 80.04.d4.00.04"
13:58 imirkin_: no :)
14:00 mmaret: do I have to enable some debug for having vbios.rom ? The folder only contains bufs "clients gem_names name VGA-1 vm vma"
14:03 imirkin_: mmmmm... wrong directory maybe?
14:03 imirkin_: as long as you have debugfs it should be there
14:03 kloofy: mmaret: in that case yes, that could be an issue if some wrong vga commands are sent to your card, however i do not know even what info is in vbios , it's pretty clear that they should remotely send api commands that are more in higher levels like opengl
14:03 imirkin_: but this could be the "real" gpu
14:04 mmaret: imirkin, you are right.. wrong directory ... mine is on /dri/1
14:05 imirkin_: mmaret: btw, just curious - are you planning on using nouveau on there for some gpu-related reasons? or your vm just happens to have an nvidia gpu and you felt like playing with it?
14:06 mmaret: yep, we really want to use the GPU
14:07 imirkin_: then you might be interested in karol's branch since it'll be able to reclock the gpu properly
14:07 imirkin_: chances are upstream won't
14:07 kloofy:is getting smarter and smarter every day, basically shocked how people use remote computers:)
14:08 mmaret: To my knowledge, these k520 cards are one of the rare solutions to have GPU access inside a virtual machine
14:09 karolherbst_work: mmaret: well, it depends on _what_ you want to do and how much performance is important
14:09 mmaret: I was looking at his github profile :) Good to know!
14:10 imirkin_: well, my guess is anytime you're playing with gpu's, perf is important
14:10 karolherbst_work: sometimes you just want to test stuff
14:10 karolherbst_work: in a few cases
14:11 karolherbst_work: is opencl even usable under nouveau?
14:11 imirkin_: not at all
14:11 mmaret: In theory i do not look for really heavy performances in a first step. But it's still interesting. And if I can test things for you karolherbst_work
14:11 imirkin_: but GL compute shaders work fine
14:11 karolherbst_work: yeah well
14:11 kloofy: mmaret: but theoretically let's pretend that is some sort either wrong vbios or vga arbitration issue , in the first case how would you be able to work around the problem:) you do not probably know what kind of vbios they have:)?
14:12 karolherbst_work: compute shaders don't even come close to opencl :p
14:12 imirkin_: if you say so
14:12 karolherbst_work: well you can't target an amd, intel and nvidia gpu at the same time with compute shaders :p
14:13 imirkin_: huh?
14:13 karolherbst_work: or have kernels running on all gpus at the same time
14:13 karolherbst_work: within the same application
14:13 mmaret: kloofy, indeed, it could be annoying
14:13 karolherbst_work: and distribute work across all
14:13 karolherbst_work: with opencl you can
14:13 imirkin_: ok ... i think at least some of that is possible with EGL
14:13 imirkin_: either way, he only has 1 gpu in there, so that's not likely to be an issue
14:14 karolherbst_work: right, although I would use compute shaders only if I also have to use OpenGL anyway, otherwise OpenCL is the superior solution usually
14:16 mmaret: And your work won't be upstreamed karolherbst_work ?
14:16 imirkin_: maybe in kernel 5.0 at this rate
14:17 karolherbst_work: mmaret: ping skeggsb :p
14:17 mmaret: :)
14:17 karolherbst_work: I am sure he will look more at this, if he gets 10 mails daily because of that :p
14:17 imirkin_: ben has other things he's supposed to be doing, and this isn't a priority for him. [otherwise he would have just picked up the patches a long time ago and fixed them up to his heart's content]
14:18 karolherbst_work: yeah I know :/
14:18 kloofy: this all nuclear science happened when i started to rant, karolherbst_work i was not even capable of thinking about such complex scenarios, but as compute shaders do not have geometry you could be right
14:18 karolherbst_work: but he indeed looked over them at least once already
14:18 karolherbst_work: I would say it is like 80% done in the process or so
14:18 imirkin_: you could make your patches easier to deal with by cutting them up into logical reviewable chunks
14:18 imirkin_: coz you just have this giant pile o' stuff
14:19 karolherbst_work: :O
14:19 karolherbst_work: huh
14:19 imirkin_: (at least last i looked)
14:19 kloofy: kloofy: cause you can not split the geometry or place interrupts on the rasterizer, then probably yeah it can't be used so
14:19 karolherbst_work: there are 40 patches on the branch
14:19 kloofy: now i started to talk to myself:)
14:19 karolherbst_work: or slightly less now
14:19 imirkin_: right, which is not a reviewable chunk
14:19 imirkin_: a reviewable chunk is 5 patches
14:19 imirkin_: maybe 10 at most.
14:19 karolherbst_work: it is a lot of stuff though
14:19 imirkin_: that doesn't mean you should go around squashing your stuff
14:19 imirkin_: but it does mean that you should take it and organize it into logical groups
14:19 karolherbst_work: and I doubt that partial merges are fine too
14:20 imirkin_: and then send them out one group at a time
14:20 imirkin_: when one group is done, move on to the next
14:20 imirkin_: at least that's what i'd do
14:20 karolherbst_work: I could extract fixes though
14:20 karolherbst_work: but that would only cut it in half
14:20 imirkin_: if the fixes are small and obvious, those tend to get insta-applied
14:21 imirkin_: that's what i mean about your series just being a pile o' stuff
14:21 karolherbst_work: k
14:21 karolherbst_work: I could indeed do that today
14:21 karolherbst_work: patches like this one can be applied alone: https://github.com/karolherbst/nouveau/commit/11ba261a6842e8ea65c493edf10696ce32a8d6d0
14:22 imirkin_: yep
14:22 imirkin_: and then it doesn't end up waiting on some of the more "speculative" patches
14:22 karolherbst_work: guess I extract all the fixes and make a separate boost series
14:23 imirkin_: if it's gonna be a lot of work for you, double-check with ben before doing it
14:23 imirkin_: since ultimately he's applying it, not me
14:23 karolherbst_work: he said something related to that though, but it sounded like that he wanted to choose what to apply
14:24 karolherbst_work: anyway, I can extract those easy fix patches and then it looks already better
14:27 pmoreau: karolherbst_work: Depends what OpenCL features you want. If you are after summing two vectors, that works. But I doubt anyone will be happy with that alone.
14:27 imirkin_: pmoreau: but if you want to subtract, you're on your own!
14:27 kloofy: i am in depression i've gone ahead and done too much research and there is not much to be researched anymore:( boring and i miss all the time wasted on it
14:27 pmoreau: xD
14:28 pmoreau: No, subtraction, mul, div and mod should work as well
14:29 karolherbst_work: :)
14:29 karolherbst_work: seems enough to do some crypto cracking
14:29 karolherbst_work: :D
14:29 pmoreau: Comparisons are stored in predicates, so I assume that will fail completely if you try to store them in a global mem vector
14:29 kloofy: yeah i thought exactly the same thing what you are discussing here
14:29 pmoreau: No ifs needed?
14:29 pmoreau: Or loops?
14:29 pmoreau: Cause I do not support those yet
14:29 karolherbst_work: if you are smart, then no
14:30 karolherbst_work: you just compensate with more launches :D
14:30 pmoreau: :-D
14:30 pmoreau: True
14:30 karolherbst_work: most useless thing in opencl: loops
14:30 kloofy: it might be possible to split the , grid -- blocks too
14:30 karolherbst_work: second useless thing: ifs :p
14:31 imirkin_: third useless thing: math
14:32 karolherbst_work: pmoreau: for example: you can do FXAA completely without loops
14:32 karolherbst_work: in opencl
14:32 RSpliet: karolherbst_work: but what if you need a large prime number of threads? Then you might need to over-launch threads if you exceed the dims of your work grid. Would be nice to have an if statement in place to make sure the useless threads actually don't do anything ;-)
14:32 kloofy:goes to cure his depression with couple rounds of beers
14:33 karolherbst_work: RSpliet: that's why ifs are more useless than loops :p
14:33 karolherbst_work: *usefull
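RSpliet's over-launch case can be sketched on the CPU. The code below only simulates the pattern in plain Python (it is not OpenCL): the global size is rounded up to a multiple of the work-group size, and a bounds check keeps the surplus work-items from touching memory. The function names and the doubling "kernel" are placeholders chosen for the illustration.

```python
def padded_global_size(n_items, local_size):
    """Round n_items up to the next multiple of local_size."""
    return ((n_items + local_size - 1) // local_size) * local_size


def run_kernel(data, local_size):
    """Simulate launching one work-item per element, with a guard.

    Returns the output list and the number of surplus work-items
    that hit the guard and did nothing.
    """
    n = len(data)
    out = [0] * n
    idle = 0
    for gid in range(padded_global_size(n, local_size)):
        if gid >= n:  # the `if` RSpliet is asking for
            idle += 1
            continue
        out[gid] = data[gid] * 2  # stand-in for the real per-item work
    return out, idle
```

For 1009 items (a prime) and a work-group size of 64, the launch is padded to 1024 work-items, so 15 of them fall through the guard.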
14:34 RSpliet: fair enough, although I'd argue that you might be focussing too much on the more trivial kernels ;-)
14:34 karolherbst_work: fxaa kernels are 500+ lines usually
14:35 karolherbst_work: for image processing and stuff
14:36 RSpliet: for loops use-case, think problems where for each data point you need to find a convergence point
14:36 karolherbst_work: I would go with the argument, that static sized loops are fine though
14:36 karolherbst_work: so that the compiler can unroll them
14:37 pmoreau: Sadness, it seems that SPIR-V doesn’t have an equivalent to CUDA’s `__shf*l()` :-/
14:38 pmoreau: * shfl*
14:38 RSpliet: pmoreau: what's the English equivalent to CUDA's __shf*l() ?
14:38 imirkin_: is that like lshf?
14:38 imirkin_: or is it like shfl?
14:39 pmoreau: RSpliet: Being able to pass values between threads within a warp: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
14:39 imirkin_: ah. so shfl.
14:39 imirkin_: aka shuffle
14:39 pmoreau: Yeah, I corrected it
14:39 imirkin_: there's a NV ext for all that junk
14:39 imirkin_: (in opengl)
14:40 RSpliet: oh hmm, yeah, last time I checked OpenCL didn't have those kind of ops
14:40 RSpliet: but then
14:40 imirkin_: that was in 1995? :p
14:40 RSpliet: last time I checked OpenCL 1.2 was the state of the art
14:40 pmoreau: imirkin_: Close enough :-D
14:42 RSpliet: ahh yes, the good old days of "Smack my bitch up" and running my OpenCL 1.2 programs on a Matrox Impression
14:42 pmoreau: OpenCL has a shuffle, but it applies on a vector, so within a single thread
14:42 imirkin_: pmoreau: ah yeah. that's something else :)
14:42 pmoreau: Yeah :-D
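The difference between the two kinds of shuffle discussed above can be sketched in plain Python. This is a simulation under stated assumptions, not GPU code: `shfl_down`, `warp_reduce_sum` and `vector_shuffle` are hypothetical helper names, and a warp is modelled as a list of 32 per-lane values. The first two mimic a CUDA-style shuffle-down, where each lane reads a register from another lane of the same warp; the last mimics an OpenCL-style vector shuffle, which only permutes the components of one thread's own vector.

```python
WARP_SIZE = 32


def shfl_down(values, delta, width=WARP_SIZE):
    """Each lane i reads the value held by lane i + delta.

    Lanes whose source would fall outside the warp keep their own value.
    """
    return [values[i + delta] if i + delta < width else values[i]
            for i in range(width)]


def warp_reduce_sum(values):
    """Tree reduction: after log2(32) = 5 steps, lane 0 holds the total."""
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]


def vector_shuffle(vec, mask):
    """Per-thread shuffle: permute the components of one vector value."""
    return [vec[m] for m in mask]


print(warp_reduce_sum(range(WARP_SIZE)))            # cross-lane exchange
print(vector_shuffle([10, 20, 30, 40], [3, 2, 1, 0]))  # within one thread
```

The cross-lane version moves data between threads without going through shared or global memory, which is what the vector shuffle cannot do.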
14:43 karolherbst_work: those opencl kernels for music postproduction :3
14:47 pmoreau: So, they did introduce some in OpenCL 2.0: `work_group_broadcast()`, `work_group_reduce()`, `work_group_scan()`, `work_group_all()` and `work_group_any()`. But that is not as flexible as the CUDA equivalents.
14:48 kloofy: pmoreau: so you are working on opencl support?
14:49 pmoreau: I am, from time to time. Finally getting back on track
14:49 kloofy:got back with couple of bottled meds, namely alcohol, more precisely beers
14:50 karolherbst_work: pmoreau: I kind of forgot, but do you also have your own talk at xdc?
14:50 kloofy:buries one sided love also to the bottom of the bottle that way
14:52 kloofy: it turns out that new fpga's from xilinx with around 1000pins on 16finfet technologies arria10 specific model from altera and ku13P from xilinx have 10billion transistors on the die
14:53 kloofy: and pretty good price, still thinking about writing firmwares for them
14:54 kloofy: imirkin: we basically have an offer, to be a sw mogul for those hw's in the future, all that is needed is to write software for them:)
14:54 imirkin_: mmaret: could you include a boot without nouveau.modeset=2 and with nouveau.debug=debug drm.debug=0xe
14:55 pmoreau: karolherbst_work: We need to finish discussing it with hakzsam and mupuf, but maybe I could join their talk and do a small presentation.
14:56 mmaret: imirkin, ok I will. A bit later, I have a meeting now
14:57 karolherbst_work: pmoreau: huh, what do they talk about? just the status update talk?
14:57 karolherbst_work: I thought everybody will participate there :D
14:57 pmoreau: If I understood correctly, yes
14:58 pmoreau: But as I said, we still need to discuss that later today
14:58 karolherbst_work: k
14:58 kloofy: imirkin: any insights about those new codecs you have, HEVC/H.265 and whatnot? Are you dealing with video decoding these days too?
14:58 karolherbst_work: I wouldn't mind to talk a bit on the status talk too
14:59 karolherbst_work: pmoreau: on my list what I've done so far (which should be also the stuff since last XDC I guess): gddr5 fixes, pcie relink, hwmon (voltage, power)
15:00 kloofy: imirkin_: is like sissy little insaulted girl, still does not talk to me
15:00 pmoreau: karolherbst_work: PCIe relink?
15:01 karolherbst_work: pmoreau: pcie link speed changes
15:01 pmoreau: Ah k
15:01 karolherbst_work: reclocking is a stupid word somehow
15:01 karolherbst_work: no idea what the best word is though
15:42 imirkin_: mmaret: in case you're looking to run android-y stuff and need ES 3.1, make sure to update mesa to 12.0.1. still no AEP though... getting close on that.
16:27 mmaret: damn! I had another try with modeset=2 and I got a high number of IRQs. This issue is driving me mad!
16:35 imirkin_: mmaret: with the boot you pasted, did you have the high interrupt issue?
16:39 mmaret: yes
16:39 imirkin_: weird :(
16:40 mmaret: :(
16:43 RSpliet: karolherbst_work: reclocking is silly :-P PM (either power management or performance management) is a more generic term
16:43 imirkin_: mmaret: would be great if you could get a perf trae
16:43 imirkin_: trace*
16:45 Yoshimo: have you guys managed to fry the maxwell2 cards yet? ;)
16:45 RSpliet: DVFS might be fairly accurate, if you define "dynamic" to be "at run-time", and the criteria then not load-based but user-requested
16:46 RSpliet: Yoshimo: my deep-fat fryer isn't big enough for a Maxwell2 card :-(
16:47 RSpliet: karolherbst_work: either way DVFS is the final goal, and we've made forward progress towards that goal by fixing PLL-related issues (karolherbst), pcie link reconfiguration (karolherbst), hwmon (karolherbst et al.), voltage selection (karolherbst) - some of it already upstream :-P
16:47 Yoshimo: on a more serious note, were you successful in changing the speed without fan support?
16:48 RSpliet: Yoshimo: I don't think anyone tried, lacking the infrastructure to run our memory clock change scripts on PDAEMON for >= Maxwell2
16:49 Yoshimo: ok what did i read a few lines back then?
16:49 RSpliet: other PLLs might be possible and shouldn't be much different from Maxwell1
16:49 pmoreau: Yoshimo: Memory did not reclock on my GM206 when I tried yesterday
16:50 pmoreau: Nor did it fry ;-)
16:50 Yoshimo: 206, is that a 970?
16:50 imirkin_: 960
16:51 pmoreau: Right, 960
16:51 RSpliet: karolherbst: in case you missed the IRC logs: reclocking is silly :-P PM (either power management or performance management) is a more generic term. DVFS might be fairly accurate, if you define "dynamic" to be "at run-time", and the criteria then not load-based but user-requested. Either way DVFS is the final goal, and we've made forward progress towards that goal by fixing PLL-related issues (karolherbst), pcie link reconfiguration (ka
16:52 karolherbst: RSpliet: your message is cut off and yes I missed that
16:52 pmoreau: "pcie link reconfiguration (karolherbst), hwmon (karolherbst et al.), voltage selection (karolherbst) - some of it already upstream :-P" for the end
16:53 karolherbst: :D
16:53 karolherbst: I see
16:55 karolherbst: pmoreau: if you don't look into that, I will reclock maxwell2 memory this weekend :p
16:56 karolherbst: RSpliet: well I guess I use the DVFS term then
17:00 RSpliet: it's widely accepted :-)
17:04 karolherbst: ... it seems like my ISP is pretty much screwed these days
17:06 Yoshimo: maybe they shouldn't sell their contracts so cheap and invest more in infrastructure instead ;)
17:06 karolherbst: lol
17:06 karolherbst: if they would be cheap
17:07 karolherbst: a damn ripp of it is here
17:07 karolherbst: *rip
17:07 Yoshimo: 100mb for 35 isn't bad
17:08 karolherbst: it is
17:08 karolherbst: it is very bad
17:08 karolherbst: in other countries you get 1G for less
17:08 imirkin_: or in the USA, where you pay more for dialup :p
17:08 karolherbst: :D
17:08 karolherbst: only the US is worse
17:08 Yoshimo: you have to compare inside the country karol
17:09 karolherbst: that won't work at all here
17:09 karolherbst: and is stupid anyway
17:09 karolherbst: ohh wait, nice monopoly laws you got here, everything seems fair indeed then :p
17:09 karolherbst: if everybody costs 100€ for 20M it is fair, right ;)
17:09 karolherbst: *everyone
17:09 Yoshimo: although i heard they are very good at enforcing net neutrality in USA
17:10 karolherbst: hihi
17:10 karolherbst: sure
17:11 Yoshimo: last offtopic comment on that: maybe the local power,gas &water company will bring fibre to you. UnitedInternet starts to build such contracts with local infrastructure holders
17:11 Yoshimo: looking forward to the maxwell weekend
17:19 pmoreau: karolherbst: I certainly won’t look into it! You fool! :-D (Apart from testing your pmu branch tonight.) I have no idea how that works, and the OpenCL stuff is enough to keep me busy ;-)
17:25 mlankhorst: +
17:26 karolherbst: pmoreau: k
17:35 karolherbst: hakzsam: still doing something with Reator?
17:35 hakzsam: no, have fun
17:36 karolherbst: yay gm206 :)
17:36 karolherbst: this gonna be fun
17:45 karolherbst: okay, the pmu doesn't seem to run at all
17:46 karolherbst: :O
17:46 karolherbst: the heck
17:47 karolherbst: mupuf: on your gm206 at 0f engine clocked to max
17:47 karolherbst: enabling clock gating
17:47 karolherbst: 45W -> 30W
17:47 Yoshimo: < imirkin> Yoshimo: AoA should be supported on maxwell v2. in order to expose GL 4.3, we need to expose images on maxwell, which in turn requires some instruction scheduling work to be done , hakzsam is that something you are currently working on?
17:47 mupuf: karolherbst: nice!
17:48 karolherbst: frigging down by 33%
17:48 karolherbst: mupuf: by the way, any idea how I could start the pmu on gm206?
17:49 karolherbst: mupuf: on lowest perfmode: 14.8W -> 12.8W
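The two clock-gating measurements above can be checked with a few lines of Python. `percent_saved` is a hypothetical helper name; the wattages are the ones quoted in the log.

```python
def percent_saved(before_w, after_w):
    """Relative power reduction, in percent of the original draw."""
    return (before_w - after_w) / before_w * 100.0


# pstate 0f with the engine clocked to max: 45 W -> 30 W
print(f"{percent_saved(45.0, 30.0):.1f}%")
# lowest perfmode: 14.8 W -> 12.8 W
print(f"{percent_saved(14.8, 12.8):.1f}%")
```

The first case matches the "down by 33%" figure; the second works out to roughly 13.5%, so the relative saving is smaller at the lowest performance level.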
17:50 mupuf: karolherbst: what do you mean by "start"?
17:50 karolherbst: well
17:50 karolherbst: I have no clue if the pmu runs or not
17:50 mupuf: you need to extract the pmu from the blob first
17:50 karolherbst: at least my pmu counters code isn't executed
17:50 karolherbst: nah, I wanted to use the nouveau pmu code first (if that's possible)
17:51 mupuf: well, maybe these regs are also privileged?
17:51 mupuf: you can write to a scratch register and read it back
17:51 karolherbst: let's see
17:52 karolherbst: mupuf: nope, I can write them from the host
17:53 karolherbst: something is funky with the pmu, because even nvkm_send_pmu doesn't get anything
17:53 mupuf: ok, first instruction in your code: write to a scratch register
17:53 mupuf: a certain value
17:53 mupuf: this way, you can know for sure
17:53 karolherbst: the pmu counters work too
17:53 karolherbst: k
17:54 mupuf: oh, but IIRC, the seqno is written by the current pmu code to one scratch reg
17:55 karolherbst: mhh odd
17:55 karolherbst: I am wondering why the voltage changes
17:56 karolherbst: odd
17:58 karolherbst: uhh, the heck
17:58 hakzsam: I was on summer holidays so I didn't start to work on this yet, but it's my plan for the next few days :)
17:58 hakzsam: Yoshimo, ^
17:58 karolherbst: I think it reads out the gpios...
17:59 Yoshimo: nice
18:00 karolherbst: "bios: found ranged based VIDs"
18:00 karolherbst: mhhh
18:00 karolherbst: maxwell2 with gpio vids?
18:00 karolherbst: :O
18:01 karolherbst: indeed
18:02 karolherbst: ahh k, something is odd reading out the temperature
18:04 Yoshimo: what is a vid exactly used for?
18:04 imirkin_: the gpio's are used for setting voltages
18:05 karolherbst: what is pgob?
18:05 imirkin_: there's also a pwm-based voltage setter
18:05 imirkin_: karolherbst: power gating something
18:05 karolherbst: k
18:06 pmoreau: karolherbst: So, since you can play with Reator, I no longer need to try the PMU stuff, as you have already tried it, right?
18:06 karolherbst: exactly
18:07 pmoreau: Nice! One thing less to do! :-D Since I have a working VBIOS now, I’ll go back to that bug report.
18:12 karolherbst: mupuf: scratch is badf5040
18:13 mupuf: karolherbst: oh oh, fun
18:13 karolherbst: yeah
18:13 mupuf: they may have changed a lot of things on maxwell
18:13 mupuf: maybe pmu is not even fuc-based :D
18:13 karolherbst: I am sure it works on maxwell1
18:13 karolherbst: like 100% sure
18:14 karolherbst: now I would need gnurou for this
18:14 karolherbst: do you know skeggsb knows something about this?
18:14 karolherbst: *if
18:15 mupuf: really? How about trying to deasm what the blob uploads?
18:15 karolherbst: maybe because I don't have it?
18:16 karolherbst: and I wouldn't be allowed to begin with :p
18:17 karolherbst: or did you mean for maxwell1?
18:22 Yoshimo: didn't we have a rule that decompiling is allowed to make your stuff compatible or did i get that wrong?
18:25 karolherbst: Yoshimo: you got it wrong
18:25 karolherbst: well
18:25 karolherbst: in theory yes
18:25 karolherbst: but in practise it means shit
18:25 karolherbst: it also means, you have to ask first and stuff
18:25 karolherbst: if they say: give us 1M €, you aren't allowed anymore, cause there is a way besides decompiling
18:25 karolherbst: more or less ;)
18:29 Yoshimo: would have been too easy
18:30 karolherbst: Yoshimo: you know, freedom as long as it isn't hurting economy ;) otherwise you could just re like every commercial software, what a world would that be if that would be actually allowed :p
18:30 karolherbst: *disassemble
18:32 Yoshimo: you still would have to make your case that you have something that needs to be compatible with it and you otherwise can't
18:32 karolherbst: and it is the only way to get compatibility ,)
18:35 karolherbst: mupuf: :D when in doubt, disable secboot :D
18:36 karolherbst: pmu counters up and running now
18:36 karolherbst: :O
18:36 karolherbst: the heck?
18:36 karolherbst: why do I get fan data now
18:37 karolherbst: mupuf: are you home?
18:37 karolherbst: \o/
18:38 karolherbst: AC: core 1240 MHz memory 7009 MHz on gm206
18:43 karolherbst: okay...
18:43 karolherbst: I think I know the issue
18:43 karolherbst: the pmu is used to do most of the secboot crap
18:43 karolherbst: but is left in the HS state
18:43 karolherbst: most likely
18:44 pmoreau: karolherbst: You managed to reclock memory? O.O
18:44 karolherbst: sure
18:44 karolherbst: :D
18:44 karolherbst: thing is
18:44 karolherbst: no secboot is done now
18:44 karolherbst: so it is pretty pointless
18:44 pmoreau: "Obviously, it was so easy!" :-p
18:44 karolherbst: mhhh
18:44 karolherbst: actually
18:44 karolherbst: I can unload nouveau
18:44 karolherbst: and do secboot now
18:44 pmoreau: Gj!
18:44 karolherbst: and memory might stay upclocked :O
18:45 karolherbst: that 50W power consumption though
18:45 karolherbst: ...
18:45 karolherbst: with fans at 900rpm
18:45 pmoreau: :-D
18:46 imirkin_: and temp slowly rising...
18:46 karolherbst: strike
18:46 karolherbst: AC: core 1392 MHz memory 7009 MHz after nouveau reload with secboot
18:47 karolherbst: hihi: AC: core 405 MHz memory 7009 MHz
18:47 karolherbst: damn pmu
18:47 karolherbst: we need a solution for that
18:48 karolherbst: \o/
18:48 karolherbst: 30935 frames in 5.0 seconds = 6186.897 FPS
18:49 karolherbst: uhhh
18:49 karolherbst: now it is getting interesting
18:49 karolherbst: 75682 frames in 5.0 seconds = 15136.304 FPS
18:50 karolherbst: 60W @ 57°C
18:50 karolherbst: sounds sane
18:50 karolherbst: https://gist.github.com/karolherbst/8d1fc37a40d580b02745d91aaef408d8
18:51 karolherbst: I guess I am done now :D
18:52 karolherbst: I told ya it works on maxwell2 :p
18:53 Yoshimo: there must be a catch somewhere
18:54 karolherbst: Yoshimo: you need to load nouveau twice
18:54 karolherbst: once with secboot disabled -> reclock memory -> reload nouveau with secboot enabled
18:58 Yoshimo: is that something that can somehow be automated?
18:58 karolherbst: sure
18:59 karolherbst: insmod non_secboot_nouveau.ko; echo 0f > /sys/kernel/debug/dri/0/pstate; echo 0 > /sys/class/vtconsole/vtcon1/bind; rmmod nouveau; insmod secboot_nouveau.ko
18:59 karolherbst: done
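As a dry-run sketch of how that two-step load could be scripted, the Python below only builds and prints the command sequence; it does not execute anything. `reclock_commands` is a hypothetical helper, the two module file names are the ones from the one-liner above, and the default paths assume the usual debugfs and vtconsole locations (`/sys/kernel/debug/dri/0/pstate`, `/sys/class/vtconsole/vtcon1/bind`), which may differ on a given system.

```python
def reclock_commands(pstate="0f",
                     non_secboot_module="non_secboot_nouveau.ko",
                     secboot_module="secboot_nouveau.ko",
                     pstate_path="/sys/kernel/debug/dri/0/pstate",
                     vtcon_bind_path="/sys/class/vtconsole/vtcon1/bind"):
    """Return the shell commands for the load / reclock / reload cycle.

    All paths and file names are assumptions; adjust them before use.
    """
    return [
        f"insmod {non_secboot_module}",    # 1. load without secboot
        f"echo {pstate} > {pstate_path}",  # 2. request the performance level
        f"echo 0 > {vtcon_bind_path}",     # 3. unbind the framebuffer console
        "rmmod nouveau",                   # 4. unload the driver
        f"insmod {secboot_module}",        # 5. reload with secboot enabled
    ]


for cmd in reclock_commands():
    print(cmd)
```

Printing the commands rather than running them keeps the sketch safe to try; actually executing them needs root and, as noted in the log, is risky without working fan control.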
19:06 Yoshimo: i only have one nouveau.ko, is that a trivial change to get two versions?
19:06 karolherbst: cp
19:07 karolherbst: but you shouldn't do that without fan controls
19:08 Yoshimo: probably won't come before volta
19:08 karolherbst: no clue, maybe
20:13 karolherbst: imirkin: do you think 20 patches are fine for now, where 10 patches of those are super trivial?
20:14 karolherbst: mhh doesn't matter though, because ben didn't have much to complain about them and I already fixed the issues
20:14 imirkin_: i would advise against a chunk of more than 10 patches.
20:14 karolherbst: well, those are already reviewed by ben
20:14 karolherbst: and it actually implements all required things to reclock the cards without issues
20:15 imirkin_: so ... more than just little fixes
20:15 karolherbst: it doesn't contain the update on temperature code, which is what the other patches are for
20:15 karolherbst: well
20:15 karolherbst: it is just fixes though
20:15 karolherbst: + vpstate table
20:15 karolherbst: but that's all
20:15 imirkin_: well, you can take or leave my advice
20:15 imirkin_: but my advice is to not do more than 10 patches at a time
20:16 karolherbst: I know, but if I only fix it partially, it won't improve anything and kind of makes it even worse
20:16 imirkin_: so ... these things aren't just individual obvious fixes?
20:16 karolherbst: they are, but if you fix some parts
20:16 karolherbst: nouveau actually tries to reclock one gpu
20:16 karolherbst: and crashes due to undervolting
20:16 karolherbst: before that, reclocking would simply fail
20:17 imirkin_: ok, well do whatever
20:17 imirkin_: then order it s.t. it keeps failing
20:17 imirkin_: until everything is in
20:17 karolherbst: so voltage map table fix first
20:28 karolherbst: mhh, it is actually fine, because I really don't think it should go in partly, because there is also no point in doing so. and then people see stuff got merged, think it is fixed and then complain it isn't
20:48 imirkin_: heh. and you haven't fixed the "0ed" to be "zeroed" even though iirc even ben asked for that...
20:55 imirkin_: patches 13+ could have been left off to another series...
22:18 karolherbst: imirkin_: ohh, totally forgot about that :/
22:20 karolherbst: imirkin_: mhh, the most important patch is 19 though, without that, reclocking is still unstable