01:28 hakzsam: imirkin, I just fixed occlusion_query_meta_no_fragments on nv50, trivial
01:31 mupuf: hakzsam: famous last words!
01:39 hakzsam: imirkin, https://github.com/hakzsam/mesa/commit/6f0fafc6ad82a4615b9d4ae1f07e96c57da667d1
01:40 hakzsam: you did the same for nvc0 a few days ago, that's why I said it's trivial ;)
01:40 karolherbst: mhh, I think I'll give up on the mmiotrace for now
01:40 karolherbst: my entire kernel behaves really strangely then
01:41 karolherbst: inside pstore: https://gist.github.com/karolherbst/b88cec852d0ad1779f38
01:41 karolherbst: just for loading the nvidia module
02:26 hakzsam: imirkin, http://paste.awesom.eu/YrOw this fixes dmesg errors for max-samplers on nv50, but the test still fails...
02:47 karolherbst: mupuf: I don't think I can help with anything which involves generating a mmiotrace, I really don't know why my system gets pretty unstable with it built into the kernel
02:50 pq: karolherbst, why is the kernel backtrace mentioning nvidia symbols when you are just enabling mmiotrace?
02:50 karolherbst: I enabled mmiotrace and then loaded nvidia
02:50 karolherbst: then the system crashes
02:50 pq: that's not what the log looks like
02:50 karolherbst: it's pstore
02:51 karolherbst: the order of the logs is quite strange
02:51 pq: it looks like you load nvidia first, then enable mmiotrace, which then fails somewhere and has nvidia symbols in the trace
02:51 karolherbst: so you have to look in higher numbered parts first
02:51 pq: karolherbst, are you perhaps using a script to enable mmiotrace and immediately then load nvidia.ko?
02:51 karolherbst: Oops#1 Part2 comes before Oops#1 Part1
02:51 karolherbst: no
02:52 pq: huh?
02:52 pq: why is your paste unordered? makes reading it very hard
02:52 karolherbst: as I said: pstore
02:52 karolherbst: I just cat the files
02:53 pq: so... you can't do that in the right order?
02:53 karolherbst: pq: the system is gone after loading the kernel
02:53 karolherbst: gone as in: nothing happens anymore
02:53 karolherbst: I could reverse the ls output and cat in that order :D
02:54 pq: bbswitch? is that bumblebee?
02:55 karolherbst: yeah
02:55 karolherbst: it's fine, I've never had any problems with it so far
02:55 karolherbst: nouveau can't always deactivate my card either
02:55 karolherbst: but I have to use a hack for now anyway
02:56 pq: "so far", yeah... I'd get rid of that first
02:56 pq: no idea what it does, why would you need it for mmiotracing?
02:57 pq: anyway, the kernel crash is a NULL deref, which means there is a bug somewhere.
02:58 karolherbst: only happens if mmiotrace is inside the kernel though
02:58 pq: um, how else could it be?
02:58 pq: I'd imagine mmiotrace to be good in exposing races, because it makes things crawl when drivers might not expect it.
02:59 pq: also, because bbswitch pokes the nvidia card, I'd imagine it may interfere with mmiotracing
03:00 karolherbst: it does not
03:00 pq: I don't know what happens if one driver mmaps an IO region, then mmiotrace enables, and then another driver mmaps the same region.
03:01 karolherbst: bbswitch does nothing nvidia related
03:01 karolherbst: it just executes ACPI methods
03:01 pq: oookay...
03:01 karolherbst: if it's told to do so
03:01 karolherbst: it just provides a handy sysfs interface for executing the ACPI methods to turn on/off the dedicated gpu
03:01 pq: I'd still recommend getting rid of all kernel modules you don't strictly need, if you want to see if it might work.
03:02 karolherbst: yeah, but I had the problem before, it's nothing new :/
03:02 pq: oh?
03:02 pq: that the command 'nvidia-smi' causes a kernel panic?
03:03 karolherbst: yeah
03:03 karolherbst: simply if mmiotrace is built into the kernel
03:03 karolherbst: even if it's deactivated
03:03 pq: ahha
03:03 karolherbst: I removed it now, because otherwise I can't do much
03:03 pq: I'd like to blame nvidia.ko for being broken, but have no evidence
03:04 karolherbst: also nouveau does crash the kernel
03:04 pq: a slight incompatibility between nvidia.ko and your kernel perhaps?
03:04 karolherbst: nearly everything can mess up my kernel then
03:04 karolherbst: maybe
03:04 pq: by a NULL deref?
03:04 karolherbst: running 4.1
03:05 karolherbst: fun fact: sometimes my kernel doesn't really crash
03:05 karolherbst: but drm seems to be messed up
03:05 karolherbst: in a way that I can't even switch to ttys
03:05 karolherbst: but my disks are doing stuff
03:05 pq: so you have issues all over the place?
03:05 karolherbst: a bit
03:05 karolherbst: I should enable sshd again :D
03:06 pq: bummer, sounds like the kernel is "broken" - could be anything.
03:06 karolherbst: yeah
03:06 karolherbst: could be gcc though
03:06 karolherbst: overoptimizations or something
03:06 karolherbst: since the beginning there were always isra calls inside the traces
03:07 karolherbst: once it was pretty much obvious that it was isra-related
03:08 karolherbst: maybe I just have a bad combination of kernel features enabled
03:09 karolherbst: but its not too bad actually, I have a long list of stuff I would like to test with the nouveau driver
03:16 karolherbst: mhhh
03:21 karolherbst: I think some wine troubles will come up in the near future
03:21 karolherbst: a process just thought it would be a nice idea to probe for the nvidia kernel module every two seconds while nouveau was loaded
03:24 karolherbst: faster actually
03:25 karolherbst: isn't that nice? https://gist.github.com/karolherbst/f52cd0090aadd9ed08c6
03:25 karolherbst: and it doesn't stop
04:17 hakzsam: imirkin, the problem seems to be related to vertex shaders only
04:21 karolherbst: effractur:
04:21 effractur: ?
04:22 karolherbst: ...
04:22 karolherbst: ohh
04:22 karolherbst: my mistake
04:22 karolherbst: wasn't aware I wrote this
04:23 effractur: np
04:58 hakzsam: imirkin, okay, i think i'm on the right track to fix it :)
05:08 hakzsam: imirkin, PIGLIT: {"result": "pass" } ;)
05:10 hakzsam: I don't really like the way I fixed it but I'll improve that right now
06:10 hakzsam: imirkin, actually, there is still a problem with max-samplers, but I have a partial fix which allows binding more than 16 samplers on nv50 https://github.com/hakzsam/mesa/commit/02309d038f8f0aa413b2fc399302032168b2197b
06:11 hakzsam: this patch also fixes dmesg errors
06:44 karolherbst: :O I am running out of memory :O
06:45 karolherbst: https://gist.github.com/karolherbst/33d8d39bcdc771312328
06:45 karolherbst: strange
06:46 karolherbst: shouldn't the kernel free some of the file cache first?
06:59 karolherbst: :D
07:00 karolherbst: http://lkml.iu.edu/hypermail/linux/kernel/1507.1/01758.html
07:57 karolherbst: mupuf, imirkin: if nvidia uses PRAMIN to read the BIOS, shouldn't demmio show this? Or will it always display PROM?
07:58 imirkin: karolherbst: it should show this
07:58 karolherbst: mhh
07:58 karolherbst: because it only displays PROM
08:00 karolherbst: mupuf mentioned that nvidia checks PRAMIN even on my card, but if it doesn't, how much can I achieve with nvafakebios at all?
08:00 imirkin: not much :)
08:00 imirkin: perhaps they changed their drivers around
08:01 imirkin: or perhaps they have a diff policy for mobile chips
08:01 karolherbst: the trace was made with the 343.22 driver
08:01 karolherbst: mhh
08:02 karolherbst: but you already said you couldn't find anything useful in the trace I gave you?
08:03 imirkin: well, i also didn't see the gpio thing
08:03 imirkin: but now that i know which one to look for
08:03 imirkin: i might have more luck
08:03 imirkin: although i assume mupuf looked at your trace too?
08:03 karolherbst: I think so
08:04 karolherbst: but that was before we found out about the pwm stuff
08:12 karolherbst: whats with that "PCLOCK.CLK7" stuff?
08:18 karolherbst: new phoronix benchmarks show really bad performance for 750 Ti
08:18 karolherbst: but he couldn't switch pstates either
08:20 imirkin: and i'm moderately sure that it misrenders... at least it does with xonotic
08:20 karolherbst: checking
08:20 karolherbst: it's not listed in some benchmarks
08:21 imirkin: no idea why... last i checked valley ran just fine
08:21 karolherbst: maybe it's good now?
08:21 karolherbst: linux 4.1 and mesa-dev were checked
08:21 imirkin: maybe what's good now? xonotic?
08:22 karolherbst: nothing really
08:22 karolherbst: although xonotic was at 20fps
08:22 imirkin: off chance that it was affected by my ftz change
08:22 karolherbst: maybe he didn't check the output
08:22 imirkin: maybe?
08:22 imirkin: i can guarantee it
08:22 karolherbst: :D
08:22 karolherbst: but it's insane
08:22 karolherbst: 650 clocked at 0f
08:22 karolherbst: 750 TI at 07
08:22 karolherbst: and still 20%
08:23 karolherbst: sometimes more
08:24 imirkin: i'm actually concerned by those gputest outputs
08:24 imirkin: they suggest that we have really high command overhead
08:25 karolherbst: yeah
08:25 imirkin: but... given the current perf situation, hard to care
08:25 karolherbst: would be something for me, since I can't RE and tracing fails
08:25 karolherbst: :D
08:26 karolherbst: analyzing C++ code perf is something I've done sometimes
08:26 imirkin: mostly C code actually
08:26 karolherbst: really? :/
08:26 imirkin: i doubt the compiler is getting invoked too much
08:27 karolherbst: I thought gallium would be c++
08:27 imirkin: the compiler is C++
08:27 karolherbst: I see
08:27 imirkin: nah, gallium is mostly C
08:27 karolherbst: should be still the same
08:27 karolherbst: tracing some native code is a piece of cake
08:27 karolherbst: finding the bottlenecks not so much, but I believe finding bottlenecks in gpu code is much harder
08:28 imirkin: :)
08:37 imirkin: mupuf: could i trouble you to plug a G80 in when you get the chance? i want to play around with this LOD sampler thing
08:58 karolherbst: this is just insane here :/
08:58 karolherbst: guess what, while nouveau was loaded, gputest came up with the brilliant idea of loading the nvidia module
09:00 karolherbst: and as if this weren't bad enough, nvidia unloads uncleanly, leaving some sysfs stuff behind, and totally messes up on a second load
09:01 tobijk: hehe i see you have a nice time there :O
09:02 karolherbst: really
09:03 imirkin: just blacklist it and be done with it
09:03 karolherbst: guess what it is
09:03 imirkin: that should avoid it getting accidentally loaded
09:03 imirkin: ah heh
09:03 karolherbst: I know that libcuda.so tries to load the nvidia module for example
09:03 karolherbst: or nvidia_uvm gets loaded and tries to pull in nvidia
09:03 karolherbst: something like that
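If the goal is only to keep nvidia.ko out while testing nouveau, a modprobe.d fragment along these lines should do it (the file name is just an example). Per modprobe.d(5), `blacklist` only stops loading through module aliases; an explicit `modprobe nvidia`, like one triggered from a userspace library, still goes through, hence the extra `install` line. As far as I know this should also make a dependent module such as nvidia_uvm fail to pull it in, but treat that as an assumption. The file has to be removed again before tracing the blob.

```
# /etc/modprobe.d/no-nvidia.conf  (example file name)
# Stop alias-based autoloading (udev/modalias matches):
blacklist nvidia
# Run /bin/false instead of inserting the module on explicit requests:
install nvidia /bin/false
```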
09:03 karolherbst: imirkin: his setup is completely messed up
09:04 karolherbst: on 07 I already have like 10.000 points with the first benchmark
09:04 imirkin: well, it's not about # of points
09:04 imirkin: that's highly specific to... a lot of stuff
09:04 imirkin: it's more about driver overhead during the triangle test
09:05 karolherbst: mhh
09:06 karolherbst: only 13026 points with 0a
09:06 karolherbst: that scales well
09:07 karolherbst: running at half cpu speed now
09:08 karolherbst: but I think the plot3d one is unimportant, fur is more critical
09:09 imirkin: maybe. but triangle is the one i'm concerned by :)
09:09 imirkin: coz that test does *nothing*
09:09 karolherbst: yeah, running it now
09:10 imirkin: so if we're slow, that means that we suck somehow in the driver
09:10 karolherbst: yeah, the triangle is nice though
09:10 karolherbst: 18k points with half cpu clock
09:12 karolherbst: seems like cpu clock doesn't change much or intel_pstate is just doing nothing
09:12 karolherbst: 18k at full speed
09:13 karolherbst: uha, fur mark
09:13 karolherbst: a lot of glitches
09:13 imirkin: ok, well it might not be cpu overhead... perhaps we're waiting somewhere
09:13 imirkin: which would also suck
09:13 karolherbst: to be clear: I disabled turbo boost and set the max clock to 75%, which drops the max clock from 3.4GHz to 1.8GHz
09:13 karolherbst: I run 3.2GHz stable single-core through boost
09:14 karolherbst: fur mark: a lot of PRIME tearing + old frames are displayed again? Or wrong order
09:15 karolherbst: okay, fur mark: 100% cpu usage
09:16 karolherbst: triangle same, strange
09:16 karolherbst: never mind, will try my luck with callgrind, maybe it shows something
09:31 karolherbst: imirkin: found something
09:31 karolherbst: nouveau_fence_update: self 30%
09:31 karolherbst: nouveau_fence_wait: self 15%
09:31 imirkin: ouch.
09:31 imirkin: so... yeah. we're basically sleeping.
09:31 karolherbst: nvc0_screen_fence_update: 4.4%
09:31 imirkin: fence_update taking 30% seems a bit high
09:32 imirkin: heh
09:32 karolherbst: then everything below w%
09:32 imirkin: that's a lot of fencing!
09:32 karolherbst: yeah
09:32 karolherbst: 50%
09:32 karolherbst: should I run with intel?
09:32 imirkin: which is also what i was afraid of
09:32 imirkin: you can, but it won't really affect the issue
09:32 karolherbst: I just want to compare
09:33 karolherbst: I built with -O3 though, so it might also be inlined stuff in the functions
09:34 karolherbst: intel: 20.5k points
09:34 imirkin: not really, look at those functions :p
09:35 karolherbst: ra_add_transitive_reg_conflict with 3.7% self is max
09:36 imirkin: that's just startup cost for intel iirc
09:36 karolherbst: yeah, but everything is below that
09:36 karolherbst: will compile with -Og to get more accurate results
09:37 karolherbst: maybe the for loop is the evil part
09:37 karolherbst: don't know how many fences there could be
09:38 imirkin: a lot :)
09:38 imirkin: depends on what the app does
09:38 karolherbst: ohh
09:38 karolherbst: I see
09:38 karolherbst: triangle gputest ;)
09:38 karolherbst: nothing
09:38 imirkin: if it's like "draw; wait; draw; wait;" then a lot.
09:39 karolherbst: yeah, but the driver shouldn't lose so much performance there
09:39 imirkin: what do you think fence_wait is? :p
09:40 karolherbst: no idle wait
09:40 imirkin: bbl
10:02 karolherbst: imirkin: 47% of the entire benchmark is happening inside glXSwapBuffers
10:03 imirkin_: enable_vblank=0 perhaps?
10:04 karolherbst: k
10:06 karolherbst: nothing really changes
10:07 karolherbst: I think the loops are simply causing trouble
10:07 karolherbst: how many fences can be in the list?
10:07 imirkin_: dunno... hopefully not too many
10:07 imirkin_: it's basically the depth of the render
10:08 imirkin_: unless something is doing nouveau_fence_next like mad somewhere
10:08 karolherbst: maybe I should run it inside a debugger
10:08 karolherbst: wait
10:08 imirkin_: so basically the idea is that you have
10:08 imirkin_: draw; fence; draw; fence; draw; fence
10:08 karolherbst: nouveau_fence_next is called 60k times
10:08 imirkin_: except we don't actually always emit the fence there
10:09 imirkin_: ok, so there are 60k fences
10:09 karolherbst: ...
10:09 imirkin_: but not all live at once :)
10:09 imirkin_: it's a sequence in time
10:09 imirkin_: to sync cpu and gpu
10:09 imirkin_: cpu issues commands to gpu
10:09 imirkin_: but gpu doesn't perform them synchronously with the issuing
10:09 imirkin_: so you tell the gpu to write X, X+1, X+2, etc every so often
10:09 imirkin_: this is known as a fence
10:09 karolherbst: update is called 60M times
10:10 imirkin_: and this way you know how much progress it's made
10:10 imirkin_: that seems high
10:10 karolherbst: 60 seconds
10:10 karolherbst: at 300 fps
10:10 imirkin_: at least based on what i remember it doing
10:10 karolherbst: wow
10:10 karolherbst: this is heavy
10:10 karolherbst: nouveau_fence_wait: call 20k times
10:11 karolherbst: and each wait calls fence_update 60M/20k times
10:11 karolherbst: so basically 30k each time
10:11 imirkin_:goes to read code
10:11 karolherbst: no, 3k times
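A quick consistency check of the figures above, assuming roughly one `nouveau_fence_wait` per frame (an assumption, not something measured in the log):

$$
60\,\mathrm{s} \times 300\,\mathrm{fps} = 18{,}000 \text{ frames} \approx 20\text{k waits}
$$

$$
\frac{60\text{M updates}}{20\text{k waits}} = 3{,}000 \text{ updates per wait}
$$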
10:11 imirkin_: oh right, but i bet fence_update just does the screen->fence.update and it's too early
10:12 karolherbst: but this is nvc0_screen_fence_update, right?
10:12 karolherbst: only 4% cpu time
10:12 imirkin_: right...
10:12 karolherbst: but still, same amount of calls
10:14 imirkin_: so ideally instead of busy-waiting we'd have a kernel mechanism for this
10:15 imirkin_: fence writes can be made to trigger interrupts
10:15 imirkin_: but... that won't make anything faster
10:15 karolherbst: nope
10:15 imirkin_: it'll just reduce cpu usage
10:15 imirkin_: so the fundamental issue is... is it waiting on the right fences?
10:16 karolherbst: what does nouveau_fence_trigger_work do?
10:16 karolherbst: this is called for each fence in the loop
10:16 imirkin_: you can attach work to a fence
10:16 imirkin_: so that when that fence is hit, it can do whatever
10:16 karolherbst: ahh
10:16 imirkin_: generally used to free resources
10:16 karolherbst: mhh
10:16 imirkin_: coz you can't free something while the gpu is using it
10:16 karolherbst: no, it should make stuff faster
10:17 karolherbst: if we assume there is no waiting at all, but just iteration and bad luck
10:17 karolherbst: like the right one always being the last fence
10:17 karolherbst: then a lot of cpu time is spent just iterating through the fence list
10:17 karolherbst: which could be reduced, which means glXSwapBuffers would return earlier
10:18 karolherbst: which would enable the application to continue drawing stuff, right?
10:18 imirkin_: my bet is that the time spent iterating over the list of fences is ~0
10:18 karolherbst: also the last loop is kind of heavy
10:18 imirkin_: if it isn't, then that's a problem in and of itself
10:18 imirkin_: if (sequence == screen->fence.sequence_ack)
10:18 imirkin_: break;
10:18 imirkin_: that seems wrong
10:19 imirkin_: that should probably be if (sequence >= screen->fence.sequence_ack)
10:19 karolherbst: should I try it out?
10:19 imirkin_: ya
10:20 karolherbst: will do full compile though, don't want to mess with system gl again :D
10:20 imirkin_: yeah, i never touch system installs of anything
10:22 karolherbst: I mean I let portage still install the patched version, but I know it works that way and I don't mind messing up my nouveau mesa thingy :D
10:22 imirkin_: fair enough.
10:23 imirkin_: e.g. i use nfsroot to boot my android boards coz i don't want to touch the system install :)
10:23 karolherbst: :D
10:23 imirkin_: and do bind mounts when overriding stuff in android land.
10:23 karolherbst: I always do full roots
10:23 imirkin_: i _really_ don't like touching system installs ;)
10:23 karolherbst: I like touching system installs on android
10:24 karolherbst: removing all the bloat
10:24 karolherbst: I really hate non-uninstallable software on android
10:24 imirkin_: oh, well i don't actually use it for anything other than dev
10:24 imirkin_: so it's a lot more important to me that the thing boots than it not have bloatware
10:24 karolherbst: usually I go custom rom, but for some models it's just not usable
10:25 imirkin_: of course these jokers sometimes build kernels without nfs support, and that's when i have to do something custom :)
10:25 karolherbst: there is awesome software though, and software to test against (like API interceptors, so you can deny address book access at a low-level layer)
10:25 karolherbst: :D
10:26 Karlton: use replicant: http://www.replicant.us/ :)
10:27 imirkin_: can't imagine that'd boot on any of my boards
10:27 karolherbst: yeah well
10:27 imirkin_: they explicitly hate on qualcomm
10:27 karolherbst: the supported device list is kind of a joke ;)
10:27 imirkin_: and i play with freedreno too, which is what this is all for
10:27 karolherbst: yeah I thought of that too
10:28 imirkin_: lots to do on that driver too if you're interested :)
10:28 karolherbst: wow
10:28 karolherbst: and no 3D graphics with replicant
10:28 imirkin_: they'd have 3D graphics if they used freedreno
10:28 imirkin_: heh
10:28 karolherbst: also no wifi and no bluetooth?
10:28 karolherbst: Karlton: are you serious?
10:28 karolherbst: at least the camera works for two devices
10:28 imirkin_: meh, all i need is gpu and network
10:28 Karlton: they don't support non-free firmware due to backdoors common in mobile devices
10:29 karolherbst: and software video decoding?
10:29 karolherbst: really?
10:29 karolherbst: yeah well
10:29 Karlton: its a "free" replacement
10:29 karolherbst: then you should throw away your phone
10:29 karolherbst: seriously
10:29 Karlton: I don't even have one :D
10:29 karolherbst: what about the cellular network ROM?
10:29 karolherbst: it's always non-free
10:29 imirkin_: well it's the same as the RMS-approved laptop list
10:29 karolherbst: so it's backdoored already
10:30 karolherbst: what you do over bluetooth or wifi is a joke compared to what you do over your phone network
10:30 Karlton: yeah, there are only 2 computers that the FSF endorses
10:30 karolherbst: both lenovo, right? :D
10:31 Karlton: yeah
10:31 Karlton: on the x200 they replaced Intel's ME crap :)
10:31 imirkin_: mwk: are you aware of anything specific to the G80 wrt its sampler or texture descriptors being different, as far as base level/last level (i.e. hard-max-lod) are concerned?
10:31 karolherbst: I think the FSF is a little bit too serious there
10:31 karolherbst: I mean, yes okay, but what about the bios?
10:32 karolherbst: it would be a good sign to support something like coreboot or so
10:32 karolherbst: or are both lenovos able to run with coreboot?
10:32 Karlton: they run libreboot, which is coreboot without support for non-free firmware that intel has
10:32 karolherbst: I see
10:33 karolherbst: imirkin_: testing now
10:33 karolherbst: the triangle is still nice
10:33 Karlton: most things running coreboot still require firmware blobs xD
10:33 karolherbst: yeah
10:33 karolherbst: I know
10:34 imirkin_: i actually should probably think about that condition...
10:34 imirkin_: we might be iterating over the fence list in the wrong order
10:34 karolherbst: :D
10:34 karolherbst: yay
10:34 karolherbst: more performance
10:34 imirkin_: moar fps? :)
10:34 karolherbst: mhhh
10:34 karolherbst: I meant the idea
10:34 karolherbst: the benchmark is still the same
10:35 karolherbst: nothing changed really in the call amount
10:35 imirkin_: it shouldn't have...
10:35 imirkin_: but the relative amount of cpu time spent?
10:35 karolherbst: the same
10:35 imirkin_: k
10:35 karolherbst: by "really" I meant it's still somewhere around 60M
10:35 imirkin_: give me a min
10:36 imirkin_: ok, looks like fences are appended to the end
10:36 imirkin_: so we're iterating in the right order
10:36 imirkin_: also... wtf
10:37 imirkin_: oh no, it's cool
10:37 karolherbst: couldn't the lower loop be merged into the upper one?
10:37 karolherbst: mhh
10:37 karolherbst: not really I think
10:38 imirkin_: wouldn't matter
10:38 imirkin_: they iterate over diff sets of things
10:38 karolherbst: k
10:38 karolherbst: yeah see it now
10:39 imirkin_: can you throw an assert in there for like assert(sequence != ~0U)
10:40 imirkin_: oh, you said only 20k fences, nevermind
10:40 imirkin_: but if it ever overflows, we'd be in unhappy-land
10:41 karolherbst: 60k actually
10:41 imirkin_: wtvr. not 4B
10:41 karolherbst: not even close
10:45 karolherbst: oh no "Failed to release test userptr object! (9) i915 kernel driver may not be sane!" :O
10:45 imirkin_: that's for #intel-gfx
10:46 karolherbst: yeah I know
10:46 karolherbst: first time I saw it though and under callgrind
10:46 karolherbst: imirkin_: it's worse for the fur test
10:47 karolherbst: 45% fence_update
10:47 karolherbst: 22% fence_wait
10:47 karolherbst: 6% nvc0_screen_fence_wait
10:47 karolherbst: 75% glXSwapBuffers in total
10:47 karolherbst: that's heavy
10:47 imirkin_: this will require thought re wtf is going on
10:48 karolherbst: should I check some games?
10:48 imirkin_: this won't show up in real applications
10:48 karolherbst: maybe everything is hit by this
10:48 karolherbst: mhh
10:48 karolherbst: testing though
10:48 imirkin_: but sure, go ahead. i love to be proven wrong.
10:48 imirkin_: happens so rarely :p
10:49 karolherbst: mhh which game mhh
10:49 karolherbst: oh no
10:49 karolherbst: man
10:49 karolherbst: this stupid valgrind
10:49 karolherbst: seriously
10:49 karolherbst: I hate it
10:49 imirkin_: 64-bit?
10:50 karolherbst: yeah
10:50 imirkin_: whereas your game is 32-bit
10:50 karolherbst: man
10:50 imirkin_: heh
10:50 karolherbst: wontfix bug
10:50 karolherbst: ...
10:50 imirkin_: but look at it on the bright side, the 32-bit valgrind doesn't know about half the instructions that your game will execute
10:50 imirkin_: apparently it was never updated for sse/etc
10:50 karolherbst: yeah
10:50 imirkin_: (or like sse4)
10:50 karolherbst: they left out everything after sse2 or so
10:50 karolherbst: so valgrind just stops
10:50 karolherbst: this is painful
10:50 imirkin_: dota2 reborn is 64-bit
10:51 karolherbst: because my 32-bit libraries are full of sse4 and avx stuff
10:51 imirkin_: that'll teach you to optimize for your cpu!
10:51 karolherbst: once I had to rebuild like 100 libs just to get valgrind working
10:51 karolherbst: :D
10:52 karolherbst: imirkin_: https://bugs.kde.org/show_bug.cgi?id=337475
10:52 karolherbst: "x86 (32-bit) support stopped at SSSE3. Please use 64 bit mode for anything more modern." I would if I COULD!
10:52 karolherbst: ...
10:53 Karlton: oh, replicant actually documents the backdoors they find: http://redmine.replicant.us/projects/replicant/wiki/SamsungGalaxyBackdoor D:
10:53 imirkin_: maintain your patchset
10:53 karolherbst: which patchset?
10:53 imirkin_: for ssse3+ support
10:54 karolherbst: yeah well, do you know one?
10:54 karolherbst: I bet this is just some entries inside a table
10:54 karolherbst: like 100 loc in total
10:54 karolherbst: just to cover all SSE and AVX variants
10:55 imirkin_: no clue
10:55 imirkin_: just sayin'... i'm sure a lot of people would appreciate
10:56 karolherbst: but this is just insane
10:56 karolherbst: there are some bug reports
10:56 karolherbst: and always the same reply: "it's unsupported"
10:56 karolherbst: yeah well
10:56 karolherbst: what should you do then?
10:56 imirkin_: as with everything, limited resources, etc
10:57 imirkin_: they might even have some deeper reason for it, like it's not easy to add support for some dumb 32-bit reasons
10:57 karolherbst: maybe
10:57 imirkin_: the point is, if people have written patches for this stuff already
10:58 imirkin_: it shouldn't be too difficult to collect them all into a tree and then send a message somewhere saying "hey, if you want working 32-bit valgrind, look here"
11:03 karolherbst: I don't think there are any :/
11:03 karolherbst: maybe because there are other tools to trace applications
11:03 karolherbst: so other tools got used
11:09 karolherbst: ohh got some 64bit games
11:27 karolherbst: mhh yeah, seems to be okay
11:32 karolherbst: imirkin_: did you run the fur test on nvidia hardware?
11:32 karolherbst: or wait, I will do a trace
11:40 karolherbst: okay, the stutters are DRI_PRIME problems
11:41 karolherbst: imirkin_: the fur example is really heavy
11:41 karolherbst: 200 waits
11:41 karolherbst: but 34M update
11:42 karolherbst: and only 500 fence next
11:42 imirkin_: ok, well you realize that it just does update in a busy loop right?
11:42 imirkin_: that means that for one reason or another, the gpu is taking its sweet time
11:43 imirkin_: either we're waiting for the wrong thing
11:43 imirkin_: or... something
11:43 karolherbst: yeah
11:44 karolherbst: maybe I could collect some data about where the right one usually is or what the right one is
11:46 karolherbst: imirkin_: what's bothering me is that there are more than 100k update calls per wait call
11:46 imirkin_: each call is "did the gpu complete"
11:46 imirkin_: until the gpu writes that fence, it'll just spin
11:46 karolherbst: I see
11:47 karolherbst: so its busy waiting the entire time
11:47 imirkin_: right.
11:48 karolherbst: so the longer one frame takes, the more time it spends inside this loop
11:48 imirkin_: or whatever fence is being waited on
11:48 imirkin_: fences are also used to keep track of staging buffer writes completing, etc
11:49 karolherbst: well the calls always came from glXSwapBuffers
11:49 imirkin_: right, so that's probably just a pipe->flush() call
11:51 karolherbst: couldn't there be something like one mutex per fence, released from nouveau_fence_trigger_work, and then a loop waiting on all the mutexes with a timeout?
11:52 karolherbst: or wherever the work is triggered on a fence
11:59 imirkin_: uhm
12:00 imirkin_: it's waiting for the GPU to do something
12:00 imirkin_: the work is only triggered when the gpu has done something.
12:16 karolherbst: yeah I figured
12:17 karolherbst: imirkin_: I was thinking about a workflow like this: fences are triggered to start and added to a wait queue, on which the fence_wait function would wait until somebody triggers a "finished" event on that fence object, and then the wait would unlock
12:17 imirkin_: that's what it does now.
12:17 karolherbst: yeah but busy waiting
12:17 imirkin_: exactly
12:17 imirkin_: to avoid busy-waiting it needs a kernel assist to notify when the event is fired
12:18 imirkin_: but it won't speed anything up, just reduce cpu usage
12:18 karolherbst: who does set the fence to finished?
12:18 karolherbst: yeah I know
12:18 imirkin_: the gpu writes the new value to some agreed-upon memory location
12:18 imirkin_: and optionally triggers an interrupt as it does that
12:18 karolherbst: k
12:19 imirkin_: (the option is selected at fence write time)
12:19 imirkin_: so in-kernel fence writes trigger interrupts, but userspace ones have no way to receive the interrupt notification, so they don't
12:19 karolherbst: wouldn't it make sense to always trigger such an event if mesa will wait upon that fence?
12:19 karolherbst: ahh
12:19 imirkin_: we can do something like nouveau_bo_wait which will use the in-kernel mechanism
12:20 imirkin_: but we suballocate some bo's for smaller buffer allocations
12:20 imirkin_: so we don't want to wait on the wrong thing
12:20 imirkin_: i suspect this is a situation where we could do a nouveau_bo_wait if it's not a suballocated thing
12:20 karolherbst: could nouveau_fence_trigger_work be called on a fence which does nothing?
12:20 imirkin_: you're looking at it in the wrong direction
12:21 imirkin_: trigger work is *triggered* when the fence has been hit
12:21 imirkin_: not the other way around
12:21 imirkin_: fences don't do anything
12:21 imirkin_: they're just ways of telling how far the gpu has gotten
12:21 imirkin_: it's a really crappy word
12:21 imirkin_: but one that's commonly used in the industry i think
12:21 imirkin_: another way to think about it is watermark
12:21 karolherbst: then what does nouveau_fence_trigger_work do?
12:22 imirkin_: it performs work once the GPU is done doing something
12:22 imirkin_: e.g. you give some data to the GPU in a buffer
12:22 imirkin_: and you can't free the buffer until the GPU is done reading from it
12:22 imirkin_: so you can attach work to a fence that's emitted afterwards in the command stream
12:23 imirkin_: which will free that bo when triggered
12:23 imirkin_: (bo = buffer object btw)
12:23 karolherbst: yeah
12:23 karolherbst: so there is work on a fence attached which will be executed after the fence is "reached" by the gpu
12:24 karolherbst: the only place I encountered fences was in creating thread-safe singletons, where memory fences are required to make them 100% safe
12:26 imirkin_: yeah that's ssssort of similar
12:26 imirkin_: but in this case it's just a recorded watermark
12:26 imirkin_: literally an integer
12:26 imirkin_: that's incremented by the gpu every so often
12:26 imirkin_: you can also use it to sync multiple engines on the same gpu, but that's much less frequently done.
12:26 karolherbst: I usually know how locks and such work, but I never read about the differences between a mutex and a fence or such
12:27 karolherbst: although I always get the feeling it's basically all the same
12:27 imirkin_: sorta yeah
12:27 imirkin_: in order to avoid busy waits, you need kernel support
12:27 karolherbst: :/
12:27 imirkin_: for everything, including mutexes
12:28 imirkin_: the cpu always has to be executing *something* :)
12:29 karolherbst: yeah, well
12:29 karolherbst: it could sleep though :p
12:29 imirkin_: how?
12:32 karolherbst: sleep state
12:32 imirkin_: what is the userspace-accessible instruction to be executed
12:32 karolherbst: yeah okay, userspace
12:32 imirkin_: and is it the same on each cpu? what about kvm? etc
12:32 karolherbst: don't know any, but I am also not the assembler guy
12:33 imirkin_: there's the cpuidle driver which does "the right thing" for a particular cpu going into idle
12:33 imirkin_: otherwise it just spins nops
12:33 imirkin_: (which are generally pretty cheap instructions)
12:33 karolherbst: mhh
12:33 karolherbst: I think cpuidle is not the right thing today for that
12:33 imirkin_: quite sure that it is
12:34 karolherbst: yeah, but this is for idle only. Usually you have something like intel_pstate, which will set cstates and pstates accordingly
12:34 imirkin_: which has nothing to do with spinning on a mutex
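imirkin's point that a waiter is always executing *something* shows up clearly in a userspace spinlock. This sketch uses a C11 `atomic_flag`, issues the x86 `pause` hint while spinning (only when built for x86 with GCC or Clang), and after a bounded number of spins calls `sched_yield()`, a system call that lets the kernel run something else. Real mutexes such as glibc's go further and sleep in the kernel via futexes; the type name `yield_lock` and the spin limit are arbitrary assumptions.

```c
#include <sched.h>
#include <stdatomic.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define cpu_relax() __builtin_ia32_pause()  /* "pause": cheap spin-wait hint */
#else
#define cpu_relax() ((void)0)               /* plain busy loop elsewhere */
#endif

#define SPIN_LIMIT 1000                      /* arbitrary; tune per workload */

typedef struct {
    atomic_flag held;
} yield_lock;

#define YIELD_LOCK_INIT { ATOMIC_FLAG_INIT }

static void yield_lock_acquire(yield_lock *l)
{
    unsigned spins = 0;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (++spins < SPIN_LIMIT) {
            cpu_relax();    /* still burning CPU, just politely */
        } else {
            sched_yield();  /* ask the kernel to schedule someone else */
            spins = 0;
        }
    }
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int yield_lock_try(yield_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void yield_lock_release(yield_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Even the `sched_yield()` path only avoids spinning by involving the kernel, which is the point imirkin makes.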
12:35 karolherbst: mhh, right
12:37 karolherbst: on a side note: http://blog.cr4.sh/2015/07/building-reliable-smm-backdoor-for-uefi.html Karlton
13:16 Karlton: karolherbst: yeah and this too: http://hackaday.com/2015/06/08/hard-drive-rootkit-is-frighteningly-persistent/
13:19 karolherbst: yeah
13:20 karolherbst: since the leaks I get the feeling we have to throw everything away, but then again: how do you protect against something like this
13:20 karolherbst: ever
13:22 karolherbst: Karlton: do you know the Ken Thompson Hack?
13:24 Karlton: yeah, and there was an interesting thing someone did with cpu acoustics to decrypt a gpg key
13:24 karolherbst: yeah
13:24 karolherbst: it's awesome what can be done
13:25 karolherbst: I think the Ken Thompson Hack is underestimated a lot
13:25 karolherbst: you simply need to hack the "main" compiler at microsoft and you backdoored a lot of computers
13:25 Karlton: they did with a lenovo: http://www.cs.tau.ac.il/~tromer/acoustic/
13:25 Karlton: s/did/did it/
13:26 Yoshimo: still doesn't mean we should give up on pgp and open source
13:26 karolherbst: right
13:26 karolherbst: we should think about how to protect against new threats and new ways of attacking us
13:26 karolherbst: but for this, we have to know the threats and attacks ;)
13:27 mupuf: imirkin_: I can do that
13:27 Yoshimo: first: get the easy mass surveillance down, secure messaging and browsing on the net and after that you can move to protecting the individual targets
13:27 karolherbst: Yoshimo: and how do we do that?
13:27 Yoshimo: but this channel is about nvidia cards ;)
13:27 karolherbst: the first part
13:28 karolherbst: then we talk about backdoored nvidia cards
13:28 Yoshimo: well, cert pinning, pgp, dnssec ;)
13:28 karolherbst: screen reading :)
13:28 karolherbst: there was a valid attack, where you could read old framebuffer content from uninitialized GL contexts
13:28 karolherbst: and so get maybe passwords and stuff
13:29 karolherbst: not uninitialized context, but buffers
13:32 karolherbst: Yoshimo: https://hsmr.cc/palinopsia/
13:35 Yoshimo: is that something a driver could prevent?
13:36 RSpliet: Yoshimo: yes, by clearing the buffer prior to freeing it
13:36 RSpliet: that could however put quite some pressure on performance
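RSpliet's fix, clearing memory before it is released, looks like this on the CPU side. A plain `memset` right before `free` may legally be optimized away, so this sketch writes through a `volatile` pointer instead (glibc 2.25+ and the BSDs also provide `explicit_bzero`). The helper names are hypothetical; a GPU driver would need the equivalent for VRAM allocations, and touching every byte of every freed buffer is where the performance cost comes from.

```c
#include <stddef.h>
#include <stdlib.h>

/* Zero a buffer in a way the optimizer must not elide. */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Hypothetical helper: wipe a sensitive buffer, then release it. */
static void secure_free(void *p, size_t n)
{
    if (!p)
        return;
    secure_zero(p, n);
    free(p);
}
```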
13:36 karolherbst: intel isn't affected by this attack ;)
13:36 karolherbst: yeah
13:36 RSpliet: of course they are
13:37 karolherbst: that's most of the reason some software is not safe enough
13:37 karolherbst: because performance is sometimes more important
13:37 karolherbst: that's what caused heartbleed in fact
13:37 karolherbst: RSpliet: doesn't work here
13:37 karolherbst: "The internal graphics card seems to be unaffected at the moment. Tests showed that in this setup only programs forced to run on the dedicated card will leak data to VRAM."
13:37 karolherbst: it's also on the website
13:38 karolherbst: would be interesting to know whether integrated intel gpus are affected
13:38 RSpliet: IGPs don't have VRAM, but I can't imagine them emptying buffers before or after use
13:38 karolherbst: most likely yes
13:38 karolherbst: wow, that would be an attack
13:39 karolherbst: mapping previously used RAM for passwords into VRAM of IGPs
13:39 karolherbst: then read the buffer to get passwords :D
13:39 AlbertP: is system RAM not cleared when a system reboots?
13:39 RSpliet: not necessarily
13:39 karolherbst: AlbertP: try to clear TBs of RAM
13:39 karolherbst: there are servers out there with such amount of RAM
13:39 RSpliet: karolherbst: that's easy, just cut the power to RAM for a second or two
13:40 karolherbst: :)
13:40 karolherbst: RSpliet: but this may cost you 1M $ ;)
13:42 RSpliet: anyway, yes this is a known attack; it would be good if say PEM clears password buffers after use
13:42 RSpliet: don't know whether it does
13:43 karolherbst: mhh with intel there are only black buffers, so I think they are cleared
13:43 karolherbst: with nouveau there is just randomness after the gpu was off
13:46 RSpliet: imirkin_: 076543210 ?
13:47 RSpliet: is that an intentional cryptic way of writing 0x48FF4EA ?
13:47 karolherbst: mhhhh
13:47 karolherbst: ....
13:47 karolherbst: :D
13:47 imirkin_: RSpliet: well, it's the same way as it's done in nvc0_validate_fb
13:48 imirkin_: RSpliet: and also it's a list of 3-bit mappings, so octal is a lot more readable
13:48 karolherbst: why not 87654321 though?
13:48 karolherbst: ahhh
13:48 imirkin_: because there are only 8 RT's
13:48 imirkin_: 0..7
13:48 imirkin_: but you can use that register for some crazy switcheroo thing, which we never make use of
13:48 karolherbst: switcheroo are for lazy people
13:48 imirkin_: i.e. rendering to RT 0 in the shader actually goes to RT 5 or whatever
13:48 karolherbst: *is
13:49 imirkin_: perhaps it's more useful in DX10, dunno
13:49 karolherbst: mhh
13:49 imirkin_: or perhaps someone was like "oh, that seems like a cute idea, let's implement it", and then it was never actually used
13:50 imirkin_: either way, we only ever put in the identity mapping in
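imirkin's explanation of the `076543210` constant can be checked directly: each octal digit is one 3-bit field, one per render target. This sketch assumes field *i* occupies bits `3*i` to `3*i+2` and names the RT that shader output *i* lands in, which matches the identity value reading 7 down to 0 from the most significant digit; the exact register semantics are an assumption, not taken from nvc0_validate_fb. The helper names are hypothetical.

```c
#include <stdint.h>

/* Identity mapping: output i -> RT i, one octal digit (3 bits) per RT. */
#define RT_MAP_IDENTITY 076543210u

/* Which RT does shader output `i` (0..7) go to? */
static unsigned rt_map_get(uint32_t map, unsigned i)
{
    return (map >> (3 * i)) & 7u;
}

/* Return a copy of `map` with output `i` redirected to `target`. */
static uint32_t rt_map_set(uint32_t map, unsigned i, unsigned target)
{
    map &= ~(7u << (3 * i));
    return map | ((target & 7u) << (3 * i));
}
```

For example, `rt_map_set(RT_MAP_IDENTITY, 0, 5)` gives the "render to RT 0, land in RT 5" switcheroo. Written in hex the identity value is `0xFAC688`, where the 3-bit fields no longer line up with the digits, which is why octal is the readable choice here.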
13:50 RSpliet: imirkin_: it's decimal though, not octal. I'm not sure if I understand your reply :-)
13:50 karolherbst: maybe used in some crazy _NV_ extension
13:50 imirkin_: RSpliet: look at C perhaps
13:50 RSpliet: ...
13:50 imirkin_: C literals, that is
13:50 imirkin_: 123 = decimal
13:50 imirkin_: 0123 = octal
13:50 imirkin_: 0x123 = hex
13:51 imirkin_: 0b111 = binary (C11 only i think)
13:51 RSpliet: ...
13:51 RSpliet: *face*
13:51 RSpliet: *desk*
13:51 RSpliet: okay
13:52 imirkin_: always fun when people write months as like "04", and then start wondering why august won't work
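A quick check of the literal rules listed above, including RSpliet's misreading of the digits as decimal. One caveat worth adding: `0b` binary literals are a GCC/Clang extension in older C and were only standardized in C23 (C++ has had them since C++14), so they are left out of this portable sketch. The month problem shows up at compile time: `04` is still 4, but `08` and `09` are not valid octal literals at all.

```c
/* The same digits in three bases. */
static const int dec = 123;     /* decimal: 123 */
static const int oct = 0123;    /* octal:   1*64 + 2*8 + 3 = 83 */
static const int hex = 0x123;   /* hex:     1*256 + 2*16 + 3 = 291 */

/* The digits 76543210 read as decimal vs. as octal. */
static const unsigned long as_decimal = 76543210ul;  /* == 0x48FF4EA */
static const unsigned long as_octal   = 076543210ul; /* == 0xFAC688  */

/* Zero-padded months silently become octal: 04 is still 4,
 * but `int aug = 08;` fails with "invalid digit '8' in octal constant". */
static const int april = 04;
```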
13:52 RSpliet: that's the first time I find something in C which i'd label sheer insanity
13:52 imirkin_: i dunno. octal comes up a lot and is pretty useful
13:52 imirkin_: file permissions is another big on
13:52 imirkin_: one*
13:53 imirkin_: so you can write 0644 instead of something dumb and unreadable
13:53 imirkin_: i.e. either those stupid incomprehensible macros, or worse, the hex/decimal equivalent
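The file-permission case, sketched with POSIX `<sys/stat.h>`: the octal literal, the permission macros and the hex value all spell the same mode, but only the octal form keeps one digit per owner/group/other triplet. `chmod` is the real POSIX call; the `make_world_readable` wrapper is a hypothetical helper for illustration.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* rw-r--r--: owner read/write, group read, others read. */
#define MODE_OCTAL  0644
#define MODE_MACROS (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
#define MODE_HEX    0x1A4   /* same value, much harder to read */

/* Hypothetical helper: apply rw-r--r-- to a path; returns 0 on success. */
static int make_world_readable(const char *path)
{
    return chmod(path, (mode_t)MODE_OCTAL);
}
```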
13:53 imirkin_: and iirc e.g. PDP-11 assembly was all octal-only too
13:53 imirkin_: so it's got a nice history
13:53 RSpliet: it is actually, just, I'd have tried harder to find a less confusing notation
13:54 imirkin_: as confusing as 0x for hex :p
13:54 karolherbst: I mean someone could have used 0o7654321
13:54 RSpliet: well, at least with 0x it doesn't look like somebody tried desperately to align their values
13:54 imirkin_: (i suspect the octality of PDP-11's was somewhat linked to having 18-bit words)
13:58 karolherbst: mupuf: do you think we might be now able to find something inside the mmiotrace I made a while ago? Now that we know a little what we are searching for?
14:12 mupuf: karolherbst: well. I already have my nv117 trace
14:13 mupuf: and that is what I am working on
14:13 mupuf: I have been comparing two mmiotraces side by side
14:13 mupuf: one with gpio-based voltage management
14:13 karolherbst: ahh okay
14:13 mupuf: and one with pwm-based
14:13 karolherbst: nice
14:14 mupuf: unfortunately, it looks like the voltage management is not done at the same place in the driver depending on how it is handled
14:14 mupuf: bad sw design or does it take longer to signal the voltage change?
14:16 karolherbst: I can only say, that the driver took its time until it started to clock down
14:16 karolherbst: after nvidia load and nvidia-settings start I always have to wait like half a minute or so until it's doing anything
14:18 karolherbst: but usually you have to wait until the pwm clock triggers a pulse, so don't know
14:28 mupuf: imirkin_: plugged
14:29 imirkin_: mupuf: awesome, thanks
14:29 imirkin_: you can leave it off for now though
14:29 imirkin_: probably won't get to it for a few hours
14:29 mupuf: good that I have 2 of them!
14:29 imirkin_: SLI!
14:29 mupuf: the quadro one does not fit inside the box
14:29 mupuf: it is WAAAAAYYYY too long
14:29 imirkin_: hehehe
14:30 RSpliet: mupuf: for my NVA0, I had to remove the hard drive bracket from my case
14:30 RSpliet: including the hard drive inside
14:30 mupuf: 31.5cm
14:30 RSpliet: (and the card reader, collateral... :-p)
14:30 mupuf: RSpliet: yeah, I know the feeling!
14:30 RSpliet: glad I'm done hacking it for now :-P
14:33 karolherbst: mupuf: I was thinking: how does the tegra k1 do stuff?
14:34 karolherbst: or is it too different so that we can't look there?
14:34 mupuf: karolherbst: the voltage regulator is for both the CPU and the GPU
14:34 karolherbst: ohh :/
14:35 mupuf: anyway, I really need to sleep now, going to Estonia tomorrow and I need to wake up in 5.5h
14:35 karolherbst: k, have fun
15:17 karolherbst: imirkin_: how bad is a sse2_unaligned called on a hsw cpu?
15:17 karolherbst: memcpy
15:18 karolherbst: I have a game here with 37% of __memcpy_sse2_unaligned, where a fourth of that is caused by st_TexSubImage
15:19 imirkin_: well, the copy's gotta happen *somehow*
15:19 karolherbst: I thought there would be faster variants of memcpy
15:19 imirkin_: and it's gonna take time
15:19 imirkin_: so the fact that it's 37% isn't that interesting in itself
15:20 imirkin_: that said, texture uploads are a little suboptimal
15:20 imirkin_: you (the application) write to some buffer and pass it to glTexImage
15:20 imirkin_: then the impl copies that into a GART buffer object
15:21 imirkin_: and then copies that into VRAM
15:21 imirkin_: kinda the opposite of zero-copy ;)
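A toy model of the upload path described above that counts the bytes moved: the application's buffer is copied into a staging buffer (standing in for the GART BO), which is then copied into "VRAM". Plain arrays stand in for both, and none of the names correspond to Mesa or nouveau functions. The second function models the alternative RSpliet raises later, where the application writes straight into the mapped staging memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: total bytes copied along the upload path. */
struct upload_stats {
    size_t bytes_copied;
};

/* glTexImage-style path: app buffer -> staging (GART) -> VRAM, two copies. */
static void upload_two_copies(const void *app, void *staging, void *vram,
                              size_t n, struct upload_stats *st)
{
    memcpy(staging, app, n);   /* CPU copy into the GART-mapped buffer */
    st->bytes_copied += n;
    memcpy(vram, staging, n);  /* in a real driver, typically a GPU transfer */
    st->bytes_copied += n;
}

/* If the app generates data directly in the mapped staging buffer,
 * only the staging -> VRAM transfer remains. */
static void upload_direct(void *staging, void *vram, size_t n,
                          void (*fill)(void *dst, size_t n),
                          struct upload_stats *st)
{
    fill(staging, n);
    memcpy(vram, staging, n);
    st->bytes_copied += n;
}

/* Example "application" producing texel data. */
static void fill_pattern(void *dst, size_t n)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(i & 0xFF);
}
```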
15:21 karolherbst: yeah I know
15:21 karolherbst: I just thought there would be a faster implementation of memcpy
15:21 karolherbst: :/
15:22 karolherbst: I thought there would be something sse4 based
15:22 imirkin_: i played around with that stuff a bunch... long story short, cpu's are really good at memcpy now
15:23 karolherbst: yeah, that's why I thought there would be something faster ;)
15:23 imirkin_: the dumbest and smartest way to memcpy are roughly the same speed
15:23 imirkin_: unless you're on some weirdo platform
15:23 imirkin_: it knows about rep stosb
15:23 imirkin_: it knows about the fancy sse things
15:23 imirkin_: etc
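A rough way to test the claim that the dumbest and smartest copies run at similar speeds: time a byte-by-byte loop against libc `memcpy`, which glibc dispatches at run time to one of its SSE2/SSSE3/AVX variants. Treat this as a sketch: `clock()` measures CPU time with coarse resolution, the sizes and repeat counts are arbitrary, and an optimizing compiler may vectorize the naive loop or turn it into a `memcpy` call, which rather proves the point. (For copies the string instruction is `rep movsb`; `rep stosb` is the memset counterpart.)

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Deliberately naive copy; at -O2 the compiler may vectorize it anyway. */
static void naive_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Adapter so libc memcpy fits the same function-pointer type. */
static void libc_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* CPU seconds spent doing `reps` copies of `n` bytes. */
static double time_copy(void (*copy)(unsigned char *, const unsigned char *, size_t),
                        unsigned char *dst, const unsigned char *src,
                        size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running it at several sizes (a few KiB up to many MiB) is more informative than one number, since, as imirkin notes below, the copy size also feeds into glibc's choice of variant.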
15:28 karolherbst: yeah, there is indeed a __memcpy_avx_unaligned inside libc
15:28 karolherbst: mhh
15:28 karolherbst: there might be reasons I don't know of why it doesn't get used
15:28 imirkin_: size of the copy matters in the heuristics too
15:28 karolherbst: ahhh
15:29 RSpliet: is there only an unaligned version?
15:29 karolherbst: there are plenty
15:29 imirkin_: there are opcodes for aligned
15:29 karolherbst: __memcpy_ssse3 is aligned?
15:29 imirkin_: but i think it's more expensive to check for alignment than it is to just use the unaligned opcodes
15:29 karolherbst: seems like avx is only unaligned
15:30 karolherbst: yeah, there are sse2 and ssse3 aligned, and sse2 and avx unaligned memcpys
15:30 karolherbst: mhh
15:30 RSpliet: or AVX aligned would be just as quick as "XXX" aligned, and nobody bothered implementing
15:31 karolherbst: yeah I was talking about what is in my glibc
15:31 karolherbst: 2.20 by the way
15:31 RSpliet: probably not interesting; I reckon it might be better if the application could write straight to the GART-mapped memory
15:32 karolherbst: http://sourceware.org/ml/libc-alpha/2014-04/msg00070.html
15:32 karolherbst: its not much faster :/
15:32 karolherbst: 2% to 12%
15:33 RSpliet: quite an improvement...
15:33 karolherbst: still
15:33 karolherbst: more performance
15:34 karolherbst: this is the test: testl $bit_AVX_Usable, __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip)
15:36 karolherbst: I really don't see why it shouldn't get used
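glibc chooses its memcpy variant once, through an IFUNC resolver that consults its internal `__cpu_features` table; that is what the `testl $bit_AVX_Usable` line above checks. A program can ask a similar question with the GCC/Clang builtin `__builtin_cpu_supports`. This sketch only reports what the CPU advertises; it cannot tell you which variant glibc actually bound, since glibc also applies its own tuning preferences. The wrapper names are hypothetical.

```c
/* x86-only feature probes via GCC/Clang builtins; 0 elsewhere. */
static int cpu_has_avx(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();    /* only strictly required in early constructors */
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

static int cpu_has_avx2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}
```

To see which variant is really used, a profiler that resolves symbols (callgrind, or perf as suggested below) is the practical answer.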
15:38 karolherbst: I bet mesa just calls memcpy and lets glibc do the work
15:44 glennk: what tool are you profiling with?
15:44 imirkin_: karolherbst: precisely.
15:44 karolherbst: callgrind currently
15:44 glennk: ah, don't use that for measuring absolute times, it's way off for things like memcpy
15:45 glennk: use "perf" or another sampling profiler
15:45 karolherbst: I don't use it for absolute times
15:45 karolherbst: still, it showed me that the sse2 version of memcpy is used and not the avx one
15:45 glennk: basically callgrind counts ops, not actual cpu ticks
15:46 karolherbst: yeah, well cpu ticks ....
15:46 karolherbst: I don't think you can count time with them either
15:46 glennk: it also counts each op as taking one cycle
15:47 glennk: so it'll lie badly for something like memcpy with sse etc
15:47 glennk: just double check your profile with "perf top" ;-)
15:48 karolherbst: and how should I collect perf data for just one process?
15:51 glennk: don't start focusing on a process, start looking at total system overview
15:52 imirkin_: you can tell perf to only look at a single process
15:52 karolherbst: yeah well, I can't use this then, because it's out of context
15:52 imirkin_: and it'll sample it statistically so it'll tell you how much time is spent quite nicely
15:52 karolherbst: okay
15:52 glennk: unlike callgrind there's no system bias, just a few % overhead from the sampling
15:52 glennk: the two tools are fully complementary :-)
16:01 karolherbst: imirkin_: found an old IRC log "< imirkin> the nouveau overhead is actually quite worrying " :)
16:02 imirkin_: karolherbst: not that old. like a day ago.
16:02 karolherbst: :D
16:02 karolherbst: but there was also discussions about < imirkin> the nouveau overhead is actually quite worrying
16:02 karolherbst: ...
16:02 karolherbst: __memcpy_avx_unaligned
16:03 imirkin_: that's not where the overhead is.
16:03 karolherbst: but why is it slower than the ssse3 versions
16:03 imirkin_: or rather, i'm not worried about that overhead.
16:03 karolherbst: yeah I know
16:03 karolherbst: I think I will trust glibc doing the right thing
16:03 glennk: which memcpy method to use when is super system-dependent
16:04 karolherbst: I thought on hsw avx should be pretty fast most of the time
16:04 karolherbst: at least in perf top it shows up
16:06 glennk: imirkin, gart is uncached?
16:07 karolherbst: mhh, another thing: I read sometimes about mtrr and PAT, but never had time to find something which would give me info I needed about that
16:08 karolherbst: I only have write-back entries inside the mtrr, but I don't know if that's good or bad or could be better
16:08 imirkin_: PAT = good
16:08 glennk: mtrr is ancient cruft, leave it be :-)
16:09 karolherbst: yeah, I see that PAT is doing stuff to my mtrr
16:09 karolherbst: I know that in the past some people "improved" stuff inside their mtrr, but I get the feeling that with PAT nothing should be changed anymore
16:10 karolherbst: it's somehow hard to find information to rely on
16:13 rpirea: imirkin_ ctxsw should be extracted from nvidia driver?
16:14 imirkin_: rpirea: no
16:14 rpirea: is in linux-firmware?
16:15 imirkin_: no
16:16 imirkin_: nouveau supplies its own
16:16 imirkin_: unless you're talking about GK20A
16:20 rpirea: imirkin_ does nvidia have some open documentation about the gpu?
16:20 imirkin_: rpirea: very minimal
16:28 karolherbst: how can I view the shaders in qapitrace?
16:29 imirkin_: karolherbst: apply https://github.com/apitrace/apitrace/commit/a468a7afdfbcef5c76d95580f04c8ab1b1d8d1f9 to your build
16:29 karolherbst: nice thanks
16:29 imirkin_: the ones from the call you pointed out are http://hastebin.com/leqocihoxo.cs (frag shader) and http://hastebin.com/ucudicacax.cs (vertex shader)
16:30 imirkin_: but i noticed that the GL_TEXTURE1 thing looks different already by then
16:30 imirkin_: the nouveau one is a lot bluer
16:30 karolherbst: luckily I added epatch_user support to all ebuilds :)
16:31 karolherbst: mhh
16:31 imirkin_: bbiab
16:49 rpirea: imirkin_ http://pastebin.com/ZR6Bt6Qj
16:50 rpirea: someone responsible for GM107 needs to see that. NVIDIA 840M
16:52 karolherbst: yeah well, I had this issue like an entire year
16:52 karolherbst: if its the same
16:52 karolherbst: ahh no, its something else
16:53 karolherbst: but this looks strange
16:53 karolherbst: entirley
16:53 karolherbst: *entirly
16:53 rpirea: karolherbst i had added manually CASE 0x118:
16:53 karolherbst: like RAM size
16:53 karolherbst: and current clock
16:53 karolherbst: rpirea: where?
16:54 rpirea: 2 sec
16:55 rpirea: karolherbst nvkm/engine/device/gf100.c
16:55 rpirea: sorry
16:55 rpirea: gm100.c
16:55 karolherbst: rpirea: gf would be strange
16:56 karolherbst: where did you add the case?
16:56 rpirea: above 0x117
16:56 karolherbst: mhh
16:58 karolherbst: then it's most likely that something fails
16:59 rpirea: if you can fix that i will give you a beer :)
16:59 karolherbst: I doubt I can
16:59 karolherbst: but the clk class should be wrong
17:01 rpirea: ram size should be 2 GB
17:01 imirkin: rpirea: you need kernel 4.1
17:01 rpirea: imirkin it isn't in arch
17:01 imirkin: rpirea: there's an additional patch to make sure to post the GM108...
17:01 imirkin: rpirea: not sure how that's relevant
17:01 imirkin: you need kernel 4.1 irrespective of its availability in arch
17:02 imirkin: you may still need to also add nouveau.config=NvForcePost=1
17:03 rpirea: brb
17:04 karolherbst: imirkin: https://github.com/karolherbst/nouveau/commit/5ecb79cc3794bcca08a38c7e7c408c3ec53714c2 this one?
17:05 karolherbst: seems like it
17:10 imirkin: that's the one :)
17:11 karolherbst: what's the CTXSW?
17:11 karolherbst: currently poking through the headers generated by nvidia
17:12 karolherbst: maybe I am lucky and find something
17:13 imirkin: context switching
17:16 karolherbst: ohh, I think I found the stuff to auto power off gpu on overheating
17:17 rpirea: tomorrow i will be with kernel 4.1 :|
17:17 rpirea: good night :)
17:17 karolherbst: night
17:22 karolherbst: I get the feeling that the reference headers only contain uninteresting stuff
17:59 baozich: Hi, I'm recently trying a GTX 650 card on an arm64 board. So far, I can get the output from framebuffer console. However, it is said 'No devices detected.' when I was trying to launch X. I checked the kernel boot log and found that it failed to initialize the PGRAPH. Any ideas?
18:00 imirkin: baozich: pastebin dmesg and xorg log
18:00 baozich: And here is the error log of dmesg: http://paste.debian.net/281243/
18:00 imirkin: can you do the full dmesg?
18:00 imirkin: the nouveau.debug=debug thing is unnecessary btw
18:01 imirkin: or rather trace
18:02 baozich: wait for a sec :)
18:04 waltercool: Guys, just a question, is there an "easy way" to help with the project?
18:04 waltercool: renouveau seems like rather old code
18:04 imirkin: waltercool: renouveau is ancient
18:05 imirkin: what kind of help are you able to give?
18:05 waltercool: IDK, I mean, I have a Maxwell card, and looking at the current status of the project, you need some help with that
18:05 waltercool: is there some way to deliver some useful info?
18:05 imirkin: which one?
18:05 waltercool: 840m
18:06 imirkin: lspci -nn -d 10de:
18:06 imirkin: should say GMxxx
18:06 waltercool: GM108M
18:06 imirkin: that should work with nouveau with a minor kernel patch
18:07 imirkin: just stick a "case 0x118:" above "case 0x117:" in drivers/gpu/drm/nouveau/more/directories/engine/device/gm100.c
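Why a one-line `case 0x118:` is enough: in a C `switch`, a case label with no statements of its own falls through to the next label, so GM108 simply reuses GM107's setup. This is a hypothetical, heavily simplified stand-in for the chipset switch in `gm100.c`; the real nouveau code fills in many engine and subdev pointers per chipset, and `struct chip_cfg` and `chip_init` are invented names.

```c
#include <stddef.h>

/* Hypothetical per-chipset configuration. */
struct chip_cfg {
    const char *cname;
    int has_accel;
};

static int chip_init(unsigned chipset, struct chip_cfg *cfg)
{
    switch (chipset) {
    case 0x118:          /* GM108: label added with no body, falls through */
    case 0x117:          /* GM107 */
        cfg->cname = "GM107";
        cfg->has_accel = 1;
        break;
    default:
        cfg->cname = NULL;
        cfg->has_accel = 0;
        return -1;       /* chipset not handled by this sketch */
    }
    return 0;
}
```

A side effect of this shortcut is that a GM108 reports itself with GM107's name, and any register-init differences between the two chips are ignored, as imirkin notes below.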
18:07 waltercool: let me try
18:08 imirkin: oh, and it needs to be linux 4.1 or later
18:08 imirkin: otherwise you don't get accel
18:09 waltercool: 118 is like 117?
18:09 imirkin: ya
18:10 waltercool: compiling...
18:10 imirkin: there are probably some minor differences in the register init, but should mostly work
18:10 waltercool: Nouveau doesn't need a blob like the radeon mesa module, right?
18:11 waltercool: (I have a de-blobbed kernel)
18:11 imirkin: should be fine... it needs some firmware code, but it's built into the kernel
18:11 imirkin: i believe the linux-libre deblobber leaves it alone
18:11 waltercool: OK
18:19 baozich: imirkin: http://paste.debian.net/281245/ http://paste.debian.net/281246/
18:20 imirkin: baozich: hrmph.
18:20 imirkin: i figured init tables might not be getting run
18:20 imirkin: but... they are
18:21 imirkin: i seem to recall that lynxeye got an nvidia board running on arm
18:21 imirkin: i forget if it was armv7 or armv8
18:21 imirkin: but he had to make some patches as i recall... let me see if i can find them
18:24 imirkin: baozich: does arm64 define __arm__ or __arm64__?
18:24 baozich: imirkin: __aarch64__
18:24 imirkin: so __arm__ is not defined?
18:24 baozich: it is the armv7
18:25 baozich: __arm__ only stands for 32-bit arm
18:25 waltercool: OK, my system panicked
18:25 imirkin: baozich: i suspect you need to fix up ttm_io_prot in drivers/gpu/drm/ttm/ttm_bo_util.c
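baozich's answer is the crux: `__arm__` is only predefined for 32-bit ARM, while GCC and Clang define `__aarch64__` on arm64, so a branch guarded only by `defined(__arm__)` silently drops out on armv8 and the code falls through to whatever the default is. The sketch shows the pattern with a hypothetical function; it does not reproduce which page protections ttm_io_prot actually picks per architecture.

```c
/* Hypothetical: select an arch-specific branch the way arch-conditional
 * kernel code such as ttm_io_prot() is structured. */
static const char *io_mapping_branch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return "x86 branch";
#elif defined(__arm__) || defined(__aarch64__)
    /* Without the __aarch64__ test, arm64 would land in the fallback. */
    return "arm branch";
#else
    return "generic fallback";
#endif
}
```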
18:25 imirkin: waltercool: did you save a dmesg?
18:25 baozich: imirkin: yes
18:25 waltercool: I mean, it was Xorg, not the system, sorry
18:26 baozich: imirkin: I have amd radeon booted already.
18:26 imirkin: baozich: well, tbh i have no idea what's wrong. i was just looking over lucas's patches and noticed that one
18:26 waltercool: let me analyze the logs first if I can get something useful
18:27 baozich: imirkin: are there other places where arch-dependent defines should be inserted?
18:27 imirkin: baozich: commit 2fc2dd781
18:28 imirkin: baozich: btw, you might want to remove nvidiafb
18:28 imirkin: it seems to bail early, but who knows
18:29 waltercool: sorry, my fault, I hadn't recompiled nouveau
18:29 baozich: imirkin: ok, let me have a try
18:30 imirkin: baozich: i don't see any obvious patches from lucas other than that one that were arch-related
18:30 imirkin: others were one-time hookups into other subsystems, not arch-specific
18:30 imirkin: perhaps gnurou might have ideas, since he's an nvidia employee who's getting nouveau up on tegra, which while different, are still moderately similar.
18:31 imirkin: and i assume that gm20b is also on armv8
18:31 waltercool: imirkin, seems to work fine, let me do some extra testing
18:32 baozich: imirkin: thanks. I suspect that error might be introduced by a buggy pcie initialization, since the current pcie driver for armv8 is still under review.
18:33 imirkin: well, it does get moderately far
18:33 karolherbst: baozich: GTX 650 or Ti?
18:33 imirkin: you have modesetting on there already...
18:33 baozich: karolherbst: GTX 650
18:33 imirkin: just no accel
18:34 baozich: it looks like writing to PGRAPH mmio has something wrong.
18:35 imirkin: do you have 16GB of ram in this board?
18:35 baozich: 32GB in fact
18:35 imirkin: [TTM] Zone kernel: Available graphics memory: 16450600 kiB
18:35 imirkin: interesting
18:36 baozich: do you mean the memory on the graphic card or the development board?
18:36 baozich: 32GB is the main memory
18:36 imirkin: main memory
18:36 baozich: for the GTX 650, 2GB
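imirkin's 16GB guess follows from how TTM sizes its kernel zone: assuming, as in TTM's `ttm_mem_init_kernel_zone()` of this era, that the zone is capped at half of the system's (low) memory, the logged value points back to roughly 32GB of RAM. The helpers below are pure arithmetic under that assumption, with hypothetical names.

```c
#include <stdint.h>

/* Assumption: TTM's kernel zone limit is half of system (low) memory. */
static uint64_t ttm_zone_kib(uint64_t system_kib)
{
    return system_kib >> 1;
}

/* Reverse it: what system RAM does a reported zone size imply? */
static uint64_t implied_system_kib(uint64_t zone_kib)
{
    return zone_kib << 1;
}
```

For the logged 16450600 kiB this gives 32901200 kiB, about 31.4 GiB, which is consistent with a 32GB board once firmware and kernel reservations are subtracted.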
18:37 baozich: the error is triggered by mmio write in gk104_gr_init
18:38 imirkin: yeah...
18:39 baozich: but I have no idea how to decode its semantic
18:40 karolherbst: waltercool: I don't know yet how I can help nouveau, so I try stuff out like RE, searching bugs, finding perf bottlenecks, helping fix bugs and stuff until I find an area where I find something useful
18:40 waltercool: RE?
18:40 karolherbst: reverese engeneering
18:40 karolherbst: *engineering
18:40 karolherbst: to find out what the cards are doing and how they work without having specs
18:41 waltercool: yeah, I know
18:41 waltercool: uhmm, so it looks like I should study A) the nouveau kernel module and B) the Mesa driver, right?
18:42 karolherbst: mhh, I don't think A is needed at all
18:42 karolherbst: its not that complicated so far
18:42 imirkin: you can also contribute by filing good bug reports
18:42 waltercool: OK, that seems easy
18:42 karolherbst: yeah I spend time running some games on my nouveau card and search for issues
18:43 waltercool: what do I need to do that? mesa compiled with debug flag, right?
18:43 karolherbst: sadly the one I found already exists
18:43 waltercool: ahahha
18:43 karolherbst: not really
18:43 karolherbst: just run something on that card
18:43 karolherbst: and if nothing bothers you, everything is fine
18:43 karolherbst: if you see something strange, like flickering, wrong colors, then its a bug
18:43 karolherbst: usually you should create an apitrace then
18:44 karolherbst: but this isn't that hard if you use nouveau
18:44 karolherbst: although I would like to find optimization paths inside the shaders, but I doubt that I will be lucky and just find one
18:44 waltercool: but that's only visual stuff, there is some way to debug something more technical? I mean, memory issues, primitive rendering problems... idk
18:44 waltercool: ?
18:45 karolherbst: mhh
18:45 karolherbst: thats technically enough
18:45 karolherbst: imagine there is something wrongly rendered
18:45 karolherbst: then it leads somewhere inside shader code
18:45 imirkin: waltercool: figuring out where the bug lies would also be extremely helpful ;)
18:45 karolherbst: or somewhere else
18:45 waltercool: a good backstack you mean
18:46 karolherbst: maybe something is wrongly calculated, because the shaders are calculating in a strange way
18:46 karolherbst: don't think this works that way on a gpu :D
18:46 karolherbst: apitrace is the way to go
18:46 waltercool: that's like strace?
18:46 karolherbst: waltercool: https://apitrace.github.io/
18:46 karolherbst: more or less I guess
18:46 waltercool: haha I'm already compiling it :)
18:47 karolherbst: with apitrace you record all gl calls the application made
18:47 karolherbst: so you can replay it on other hardware
18:47 karolherbst: and inspect the calls more deeply
18:47 waltercool: that's cool, you just execute it and say where you want to dump the data, isn't?
18:48 karolherbst: you should compile qapitrace too
18:48 karolherbst: its a viewer for the traces
18:48 waltercool: qt4 UI?
18:48 karolherbst: yes
18:48 karolherbst: there you can inspect the current buffer states and view all shaders and stuff
18:48 waltercool: OK, let me add the build param
18:49 imirkin: should auto-build
18:49 imirkin: if you have the reqs
18:49 waltercool: I haven't haha, I'm recompiling mesa
18:49 karolherbst: I don't think this is really needed
18:50 waltercool: qapitrace needs qtwebkit...? damn...
18:50 karolherbst: its not like you will find the issues through debug information inside mesa
18:50 waltercool: yeah, I need to build mesa with gles 1.0 support
18:50 karolherbst: or at least I highly doubt that
18:50 karolherbst: waltercool: no distribution package?
18:50 waltercool: Gentoo 8)
18:50 karolherbst: I see
18:51 karolherbst: you know that USE=debug isn't for debug flags?
18:51 imirkin: waltercool: why do you need gles1? there's not any gles1 applications that exist
18:51 karolherbst: mhh right I also have it disabled
18:51 waltercool: karolherbst: yeah, for Gentoo there is other way for that, I can append it into compiler flags
18:52 karolherbst: I would suggest package.env
18:52 waltercool: imirkin: Looks like apitrace require that
18:52 imirkin: wtf?!
18:52 waltercool: karolherbst: how? where?
18:52 karolherbst: yeah
18:52 karolherbst: qt4? (... qtwebkit)
18:52 waltercool: I assume it's a legacy requirement for old cards
18:52 imirkin: weird.
18:53 imirkin: >=media-libs/mesa-8.0[gles1,gles2]
18:53 karolherbst: I will check
18:53 imirkin: under egl
18:53 waltercool: yup
18:53 imirkin: i dunno if that's real
18:53 karolherbst: it really links against qtwebkit
18:53 imirkin: gles has little to do with egl
18:53 egl: hm
18:53 waltercool: haha
18:53 eagle: ;D
18:54 imirkin: good move :)
18:54 karolherbst: yeah
18:54 karolherbst: egl != gles
18:54 waltercool: but egl is something like a layer between GLes and GL?
18:54 karolherbst: best sentence so far: "EGL is the OpenGL ES API for wayland like GLX is it for OpenGL on X"
18:54 karolherbst: no
18:54 imirkin: EGL is like GLX
18:54 karolherbst: yeah
18:55 waltercool: oh! Didn't know that
18:55 imirkin: you can do GL, or GL ES, or other junk, with EGL
18:55 karolherbst: usually you should use EGL for new code, but EGL seems to lack some features compared to GLX
18:55 imirkin: like openvg
18:55 karolherbst: openvg...
18:55 imirkin: it's dead.
18:55 karolherbst: I figured
18:55 imirkin: but the point is the EGL isn't just for GL things
18:55 waltercool: that's for vectoring, right?
18:56 imirkin: ya
18:56 baozich: imirkin: hmmm, disable FB_NVIDIA doesn't help...
18:57 imirkin: baozich: didn't think it necessarily would, but worth a shot
18:57 karolherbst: imirkin: what do you think was blury in the talos trace?
18:58 imirkin: karolherbst: huh? at the draw call you pointed out, GL_TEXTURE1 looks different on nouveau and i965
18:59 karolherbst: I didn't compare anything so far
18:59 waltercool: Hey guys, a hard and maybe funny question: why would you be interested in Nvidia if they are kind of a PITA with nouveau? Challenging? Fun? IDK?
19:00 karolherbst: I vote for IDK
19:00 imirkin: fun challenges
19:00 waltercool: ahhaha
19:00 karolherbst: :D
19:00 karolherbst: fun IDK challenging
19:00 imirkin: it's like a puzzle
19:01 waltercool: but, how did you handle the legal problems with it?
19:01 imirkin: what legal problems?
19:01 waltercool: there is no bullying from Nvidia?
19:01 karolherbst: what legal problems?
19:02 imirkin: not to my knowledge
19:02 imirkin: we don't do anything untoward
19:02 karolherbst: but hey, wine had a lot of MS fun I think
19:02 karolherbst: :D
19:02 karolherbst: or was it mono?
19:02 waltercool: uhgg don't mention this garbage :P
19:02 karolherbst: it's not garbage
19:03 waltercool: mono? It's good software with good faith and a good team, but it's a Microsoft strategy to do "something"
19:04 waltercool: but luckily, Mono is a very big help for Wine team
19:04 karolherbst: I don't know, seems like a valid project to me
19:05 waltercool: Yup, indeed is a good project, but only the Mono community
19:05 karolherbst: imirkin: somebody said something about the trace today, I thought it was you
19:06 karolherbst: yeah well, it's not their choice what to implement, is it?
19:06 imirkin: karolherbst: yeah... see above. about GL_TEXTURE1
19:07 karolherbst: okay, checking on intel
19:07 karolherbst: now I like having two gpus :)
19:07 karolherbst: I can like compare on the same screen
19:07 waltercool: mux or muxless?
19:07 waltercool: oh, muxless
19:07 karolherbst: optimus
19:07 imirkin: yeah. on my other box i have i965 and a gk208 but the gk208 goes into a vnc window... less convenient unfortunately. i should get dri3 going on there
19:07 waltercool: DRI_PRIME is still working?
19:08 karolherbst: what means "stills"?
19:08 karolherbst: still
19:08 karolherbst: yeah it does work
19:08 waltercool: nice
19:08 karolherbst: although I am on DRI3, stuff is a little bit different there
19:08 karolherbst: imirkin: what should be different?
19:08 imirkin: the image
19:09 waltercool: I mean, optirun/primusrun is very energy efficient, but I don't like the idea to "compile" a module externally
19:09 imirkin: look at GL_TEXTURE1 for both
19:09 imirkin: the nouveau one is a lot bluer on the right hand side
19:09 karolherbst: mhh, at the draw call, both seem to look equal
19:09 karolherbst: ahh okay, let me check
19:09 karolherbst: which one?
19:09 karolherbst: the wall surface texture?
19:09 imirkin: GL_TEXTURE1 :p
19:09 imirkin: it's a semi-transparent one
19:10 imirkin: maybe it was 2?
19:10 karolherbst: there are a lot of GL_TEXTURE1
19:10 imirkin: they're all the same
19:10 imirkin: just different levels
19:10 imirkin: pick the level=0 one
19:10 imirkin: maybe it was GL_TEXTURE2
19:11 imirkin: it's a weird texture, half-transparent, the other half also has strong alpha.
19:11 karolherbst: mhh
19:11 karolherbst: what does the texture show?
19:12 karolherbst: the first GL_TEXTURE1 has the wall surface
19:12 karolherbst: alpha
19:12 karolherbst: for the surface shadows I guess
19:12 imirkin: ok, so skip that one. i got it wrong
19:12 imirkin: GL_TEXTURE2
19:13 karolherbst: the one with blue at the bottom and sand yellow/pink at the top?
19:13 imirkin: yea
19:13 imirkin: i think there might be 2 of those
19:13 imirkin: but yeah that sounds right
19:14 karolherbst: it's only 16x16 :/
19:15 imirkin: no
19:15 imirkin: there's one that's at least 256x256
19:15 karolherbst: then you mean the GL_TEXTURE1 with blue bottom and nearly white at the top
19:16 karolherbst: alpha at the top actually
19:16 imirkin: yes!
19:16 imirkin: so it *was* GL_TEXTURE1
19:17 karolherbst: yeah, but there is also a GL_TEXTURE1 with the wall surface more at the top of the list
19:17 imirkin: unlikely
19:17 karolherbst: but there is
19:17 imirkin: that one's probably GL_TEXTURE10
19:17 karolherbst: no, it's 1
19:17 imirkin: or perhaps one's GL_TEXTURE_2D and the other is GL_TEXTURE_CUBE or whatever
19:17 karolherbst: ahh yeah
19:17 karolherbst: wall is 2D
19:18 karolherbst: okay, looks the same on nouveau and intel for me
19:18 imirkin: hmmmm
19:18 imirkin: one's not a little bluer on the right hand side?
19:19 imirkin: look at them side-by-side
19:19 imirkin: for me one of them had a lot more blue
19:20 karolherbst: https://i.imgur.com/h3jxsDX.png?1
19:21 imirkin: wait no
19:21 imirkin: wrong texture
19:21 imirkin: the one i'm thinking of also had a bunch of yellow
19:21 imirkin: and a bit of red
19:22 karolherbst: another GL_TEXTURE1, I see
19:22 imirkin: hehe
19:23 karolherbst: https://i.imgur.com/ii3N6ZS.png?1
19:24 karolherbst: but this looks like the sky to me
19:24 karolherbst: just saying
19:24 imirkin: nope
19:24 imirkin: keep going
19:24 imirkin: the red was more horizontal
19:24 imirkin: and/or yellow
19:25 karolherbst: oh right there is another GL_TEXTURE1
19:25 imirkin: chances are if you make the side panel bigger you'll see that they're not actually all GL_TEXTURE1
19:26 karolherbst: https://i.imgur.com/kF5NhsD.png?1
19:26 imirkin: no
19:26 imirkin: these are all sky
19:26 imirkin: and too big
19:26 imirkin: look for one that tops out at 256x256
19:26 karolherbst: there are al 256x256
19:26 karolherbst: *all
19:26 imirkin: oh. but you zoomed them in, i see
19:26 karolherbst: yeah
19:27 imirkin: hold on, let me see what's going on here
19:27 waltercool: OK, nouveau doesn't work with my videocard
19:28 imirkin: waltercool: how so?
19:28 karolherbst: waltercool: do you have the 4.1 kernel?
19:28 waltercool: yup
19:28 waltercool: is bad?
19:28 waltercool: just loading the module, it crashes
19:28 imirkin: do you have logs from such a crash?
19:29 waltercool: What kind of log would be useful?
19:29 waltercool: I have some info from dmesg
19:29 imirkin: dmesg after the panic
19:29 waltercool: http://pastebin.com/eRutSXjt
19:30 waltercool: Nopaste since module load to now
19:30 waltercool: I mean, a dmesg since...
19:30 karolherbst: yeah well, your video card is no video card
19:30 imirkin: karolherbst: ok, i think it's the GL_TEXTURE1, GL_TEXTURE_CUBE_MAP_NEGATIVE_Z
19:30 waltercool: yeah, muxless VC
19:31 imirkin: waltercool: that seems reasonable enough... what makes you say there's a kernel crash?
19:32 karolherbst: imirkin: this was this one: https://i.imgur.com/ii3N6ZS.png?1
19:32 imirkin: hm right.
19:32 imirkin: and it was the same on intel?
19:32 imirkin: oh wait no
19:32 karolherbst: one is nouveau and one is intel
19:32 imirkin: POSITIVE_X
19:33 karolherbst: so this one: https://i.imgur.com/kF5NhsD.png?1
19:33 imirkin: maybe i dreamed it
19:33 karolherbst: maybe
19:33 karolherbst: there are only 3 wall stuff there
19:33 karolherbst: GL_TEXTURE0
19:33 imirkin: nah, it was one of those
19:33 imirkin: wtvr
19:34 karolherbst: mhh
19:34 karolherbst: but why should the sky cause this issue?
19:34 karolherbst: it has to do something with dynamic lightnind somehow anyway
19:34 karolherbst: *lightning
19:36 imirkin: who knows what changes based on that feature
19:36 karolherbst: if you look more carefully at the green area, it looks like the lightning caused by the sun; it's also on the tree
19:37 karolherbst: the green on the right tree changes a bit
19:43 imirkin: ugh. probably a shader miscompile. that thing is enormous though =/
19:44 imirkin: at least we know that the TGSI is (probably) ok
19:44 karolherbst: yeah
19:44 karolherbst: softpipe is fine
19:45 karolherbst: maybe some stupid division
19:45 karolherbst: or wrong optimisation
19:46 waltercool: Ok, I have more log http://pastebin.com/iBG9knHb
19:47 waltercool: what could it be?
19:47 waltercool: :S
19:47 karolherbst: I already told you
19:47 karolherbst: "yeah well, your video card is no video card" :p
19:47 karolherbst: maybe it's not propagated as one, so how could X pick it up?
19:47 imirkin: waltercool: you're trying to do something odd. or perhaps systemd is screwing you...
19:48 karolherbst: imirkin: what about "vgaarb: this pci device is not a vga device" ?
19:48 imirkin: waltercool: oh, you have bumblebee. you'll have a lot of trouble with that.
19:48 imirkin: karolherbst: it's fine.
19:48 karolherbst: okay
19:48 imirkin: karolherbst: 3d accelerator vs vga controller
19:48 karolherbst: I see
19:49 imirkin: waltercool: remove (or otherwise don't use) bumblebee
19:49 karolherbst: although bumblebee should start fine
19:49 karolherbst: maybe the config is messed up
19:49 karolherbst: mhhh
19:49 imirkin: karolherbst: too many variables, and it's completely unnecessary for 99% of users
19:49 karolherbst: I have an idea
19:49 karolherbst: either DRI_PRIME
19:50 karolherbst: or set the PCI address in xorg.conf.ouveau
19:50 karolherbst: *xorg.conf.nouveau
19:50 karolherbst: that the card is on 3:00.0 is really strange
19:50 waltercool: imirkin: Yeah, I disabled bumblebeed
19:51 karolherbst: imirkin: oh wait
19:51 karolherbst: I think I might have found something
19:51 waltercool: DRI_PRIME doesn't return anything
19:52 karolherbst: waltercool: http://nouveau.freedesktop.org/wiki/Optimus/
19:52 waltercool: DRI_PRIME=1 returns my intel card
19:53 karolherbst: waltercool: do you have DRI2 or DRI3?
19:53 waltercool: there is a way to verify that?
19:53 karolherbst: xorg log
19:54 waltercool: oh right, let me check
19:54 karolherbst: nope found nothing
19:55 waltercool: DRI2
19:56 waltercool: but I have mesa compiled with DRI3
19:56 waltercool: well, if Xorg didn't use it, it should be the same
19:56 karolherbst: mhh dri3 is disabled in xf86-video-intel
19:58 karolherbst: waltercool: you have to follow the instructions in the dri2 part
19:59 karolherbst: you need xf86-video-nouveau installed
19:59 karolherbst: and do these xrandr calls
19:59 karolherbst: most likely you have to restart X after loading nouveau
19:59 waltercool: Yeah, I'm on that current status
20:00 waltercool: I can't rmmod nouveau
20:00 karolherbst: right
20:00 waltercool: nouveau loads, restarted X, and can't be unloaded
20:00 karolherbst: yeah, right
20:00 waltercool: X was restarted by force, just crashed
20:00 karolherbst: nouveau won't be removable that way anymore
20:00 karolherbst: because xorg just picks it up and loads the nouveau ddx driver
20:01 waltercool: so, how would you turn off a device?
20:01 waltercool: oh
20:01 waltercool: no muxless so?
20:01 karolherbst: switcheroo and runpm
20:01 waltercool: well, switcheroo detects the videocard
20:01 karolherbst: switcheroo has to be enabled in the kernel
20:01 waltercool: no idea about runpm
20:01 karolherbst: it should turn off after some time
20:01 waltercool: I already got it ;)
20:02 waltercool: but DRI_PRIME=1 only detects the crappy intel
20:02 karolherbst: mhh
20:02 karolherbst: is mesa built with nouveau support?
20:02 waltercool: also, I have connected xrandr offload
20:02 waltercool: yup
20:03 karolherbst: so xrandr --listproviders gives you two cards
20:03 karolherbst: one Intel and one nouveau?
20:03 waltercool: yup, already connected there
20:03 waltercool: Provider 1: id: 0x4f cap: 0x5, Source Output, Source Offload crtcs: 0 outputs: 0 associated providers: 1 name:nouveau
20:04 karolherbst: well then it should work
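The `cap:` field in that `xrandr --listproviders` line is a RandR capability bitmask: 0x5 is Source Output (0x1) plus Source Offload (0x4), and Source Offload is what a GPU needs to advertise to act as a PRIME render-offload source. A minimal Python sketch that decodes such a line, assuming the line layout shown above; the function names are hypothetical, and reading a non-zero "associated providers" count as a sign that the offload sink has been set up is an assumption.

```python
import re

# Provider capability bits as defined by the RandR protocol (randr.h).
RR_CAPABILITY_SOURCE_OUTPUT = 0x1
RR_CAPABILITY_SINK_OUTPUT = 0x2
RR_CAPABILITY_SOURCE_OFFLOAD = 0x4
RR_CAPABILITY_SINK_OFFLOAD = 0x8

# Assumed layout of one `xrandr --listproviders` provider line.
PROVIDER_RE = re.compile(
    r"Provider (\d+): id: (0x[0-9a-fA-F]+) cap: (0x[0-9a-fA-F]+).*?"
    r"associated providers: (\d+) name:(\S+)"
)


def parse_provider(line):
    """Hypothetical helper: turn one provider line into a dict."""
    match = PROVIDER_RE.search(line)
    if match is None:
        raise ValueError(f"not a provider line: {line!r}")
    index, provider_id, cap, associated, name = match.groups()
    return {
        "index": int(index),
        "id": int(provider_id, 16),
        "cap": int(cap, 16),
        "associated": int(associated),
        "name": name,
    }


def can_render_offload(provider):
    """True if the provider advertises the Source Offload capability."""
    return bool(provider["cap"] & RR_CAPABILITY_SOURCE_OFFLOAD)


if __name__ == "__main__":
    line = ("Provider 1: id: 0x4f cap: 0x5, Source Output, Source Offload "
            "crtcs: 0 outputs: 0 associated providers: 1 name:nouveau")
    provider = parse_provider(line)
    print(provider["name"], can_render_offload(provider), provider["associated"])
```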
20:04 karolherbst: what gives you LIBGL_DEBUG=verbose DRI_PRIME=1 glxinfo
20:05 waltercool: damn
20:05 waltercool: libGL: screen 0 does not appear to be DRI3 capable
20:05 karolherbst: wait
20:05 karolherbst: set LIBGL_DRI3_DISABLE=1
20:05 waltercool: but... BUT, it seems like glxgears detects the card, I think
20:05 karolherbst: it falls back to intel anyway
20:06 karolherbst: you can't be sure
20:06 waltercool: nope, just opens intel card
20:07 karolherbst: even with LIBGL_DRI3_DISABLE=1 LIBGL_DEBUG=verbose DRI_PRIME=1 glxinfo ?
20:07 karolherbst: would be interesting to see what the debug output is at the top
20:07 waltercool: but I mean, glxgears without DRI_PRIME=1 is SLOWER indeed
20:07 waltercool: let me check
20:08 waltercool: karolherbst: no interesting debug on top :(
20:08 karolherbst: what does slower mean? It should be at 60fps anyway
20:09 waltercool: Without DRI_PRIME: 406 frames in 5.0 seconds = 81.095 FPS, With DRI_PRIME: 27940 frames in 5.0 seconds = 5587.829 FPS
20:09 karolherbst: ...
20:09 karolherbst: the first just sounds wrong, or do you have an 80 Hz display?
20:10 waltercool: let me check with switcheroo status with DRI_PRIME
20:10 waltercool: 60Hz
20:10 karolherbst: then it should be 60
20:10 waltercool: Well, the second line, without primusrun: 300 frames in 5.0 seconds = 59.999 FPS
20:11 karolherbst: in fact both runs should be at 60
20:11 karolherbst: you don't use primusrun anymore
20:11 waltercool: sasjdaisj DRI_PRIME sorry
20:11 waltercool: I'm just too used to say primusrun, haha
20:11 karolherbst: anyway, both calls should give you 60 fps
20:12 karolherbst: even if the second one is on the nvidia card
20:12 waltercool: DRI_PRIME=0 -> 60fps, DRI_PRIME=1 5xxx fps
20:12 waltercool: hmmm
20:12 karolherbst: then most likely your vsync settings are a bit strange or DRI2 behaves really different
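The reasoning here is that when vsync is honored, glxgears should report roughly the display refresh rate (60 Hz) no matter which GPU renders, so 81 FPS or 5587 FPS points to vsync not being applied. A minimal Python sketch of that check, assuming glxgears' usual "N frames in S seconds = F FPS" summary line; the helper names and the 5% tolerance are hypothetical choices for illustration.

```python
import re

# Matches a glxgears summary line such as
# "300 frames in 5.0 seconds = 59.999 FPS".
GEARS_RE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_gears_fps(line):
    """Return the FPS value reported on a glxgears summary line."""
    match = GEARS_RE.search(line)
    if match is None:
        raise ValueError(f"not a glxgears summary line: {line!r}")
    return float(match.group(3))


def looks_vsynced(fps, refresh_hz=60.0, tolerance=0.05):
    """True if fps is within `tolerance` (a fraction) of the refresh rate.

    The 5% default is an arbitrary illustrative threshold.
    """
    return abs(fps - refresh_hz) <= refresh_hz * tolerance


if __name__ == "__main__":
    samples = [
        "300 frames in 5.0 seconds = 59.999 FPS",
        "406 frames in 5.0 seconds = 81.095 FPS",
        "27940 frames in 5.0 seconds = 5587.829 FPS",
    ]
    for sample in samples:
        fps = parse_gears_fps(sample)
        print(f"{fps:9.3f} FPS  vsynced={looks_vsynced(fps)}")
```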
20:12 karolherbst: what about DRI_PRIME=1 glxspheres
20:13 karolherbst: it should display which driver it uses
20:13 waltercool: Intel
20:13 waltercool: both
20:13 imirkin: what's your glxinfo output?
20:14 waltercool: and with dri_prime 1 is annoying
20:14 imirkin: (pastebin)
20:14 waltercool: yup
20:14 imirkin: with DRI_PRIME=1
20:15 waltercool: http://pastebin.com/Tm65gnvP
20:15 waltercool: I attached both
20:15 waltercool: with LIBGL_DEBUG
20:16 imirkin: did you ever run 'xrandr --setprovideroffloadsink 1 0' ?
20:16 waltercool: nope, I used the interface names, let me try
20:16 imirkin: works fine with the names too
20:16 imirkin: i was just lazy to type them out
20:16 imirkin: DRI_PRIME=1 glxinfo | grep "OpenGL vendor string"
20:16 imirkin: what does that print?
20:16 waltercool: let me restart, maybe primusrun just screwed something
20:16 waltercool: Intel
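The check imirkin suggests (`DRI_PRIME=1 glxinfo | grep "OpenGL vendor string"`) can be scripted so the vendor string is compared directly. A minimal Python sketch, assuming glxinfo is on the PATH and prints an "OpenGL vendor string: ..." line; `opengl_vendor` and `query_vendor` are hypothetical helper names, and only the parsing part is exercised without a running X server.

```python
import os
import subprocess
from typing import Optional


def opengl_vendor(glxinfo_output: str) -> Optional[str]:
    """Extract the value of the 'OpenGL vendor string' line, if present."""
    for line in glxinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "OpenGL vendor string":
            return value.strip()
    return None


def query_vendor(prime: bool = True) -> Optional[str]:
    """Run glxinfo, optionally with DRI_PRIME=1, and return the GL vendor.

    LIBGL_DEBUG=verbose makes libGL print its driver selection to stderr,
    which is where a silent fallback to the integrated GPU shows up.
    """
    env = dict(os.environ)
    if prime:
        env["DRI_PRIME"] = "1"
    env["LIBGL_DEBUG"] = "verbose"
    result = subprocess.run(["glxinfo"], env=env, capture_output=True,
                            text=True, check=True)
    return opengl_vendor(result.stdout)
```

If `query_vendor(prime=True)` and `query_vendor(prime=False)` return the same vendor, offloading is not taking effect.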
20:17 imirkin: are there errors in dmesg?
20:17 waltercool: Indeed
20:17 imirkin: pastebin?
20:18 waltercool: http://pastebin.com/nSPsuwtc
20:18 waltercool: haha I was doing that
20:18 karolherbst: hehe
20:18 karolherbst: why?
20:18 waltercool: why I was uploading into pastebin?
20:19 imirkin: hmmm... unclear how bad those are
20:19 karolherbst: the ACPI thikn is pretty normal
20:19 karolherbst: *thing
20:19 imirkin: could be we're missing some bit of init for GM108
20:19 imirkin: we have a mmiotrace for one, but haven't processed the differences in gr init yet
20:20 imirkin: and by 'we' i mean the main nouveau dev
20:20 karolherbst: it's kind of annoying that DRI_PRIME fails silently
20:20 waltercool: yeah, no damn log, just says Intel
20:21 waltercool: would it do something if I load DRI3 on Xorg?
20:21 karolherbst: waltercool: you need to compile xf86-video-intel with dri3 support
20:22 karolherbst: but it shouldn't change much
20:22 karolherbst: it's just easier to use then
20:22 waltercool: I can do that
20:22 waltercool: let me modify my ebuild
20:23 karolherbst: it may cause other bugs though
20:23 karolherbst: dri3 seems to be pretty unstable, but for me it has always worked better than dri2
20:24 waltercool: naaah, but we can discard that
20:24 karolherbst: but I think I am the only one
20:24 waltercool: this is debugging, we can do it :P
20:24 waltercool: I will rollback later
20:25 waltercool: what's life without risks?
20:26 waltercool: ok, let me restart... gg
20:27 karolherbst: imirkin: did you also notice that glretrace is slower on nouveau than on intel?
20:28 waltercool: and I'm back with DRI3 without major issues
20:28 karolherbst: then try out DRI_PRIME=1 glxinfo
20:28 waltercool: works!
20:28 karolherbst: :D
20:28 waltercool: yey!
20:29 waltercool: well, that's why risks are very interesting :P
20:29 karolherbst: mhhh
20:29 karolherbst: is the nouveau module in use?
20:29 karolherbst: shown in lsmod
20:29 waltercool: indeed
20:29 karolherbst: mhhh
20:29 karolherbst: stupid X server
20:29 waltercool: I also have the nvidia driver, but it isn't signed, so it won't load
20:29 gnurou: baozich: you may want to try this patch on your kernel: https://github.com/Gnurou/linux/commit/c27c0f2cdad3caa30337f4730c5159414c2aaa32
20:30 karolherbst: ugly workaround: https://gist.github.com/karolherbst/19205eeacc9e9453c231
20:30 karolherbst: waltercool: but you would have to modify the PCI address of the nvidia card
20:30 karolherbst: but it's ugly
20:30 waltercool: yeah, Xorg should autoconfig that
20:30 waltercool: let me try both
20:30 karolherbst: no
20:30 karolherbst: its not what I meant
20:30 waltercool: no?
20:30 karolherbst: X should NOT load the nouveau module with DRI3
20:31 karolherbst: because its not needed
20:31 waltercool: ahhh
20:31 karolherbst: it only disallows you do unload nouveau
20:31 karolherbst: *to
20:31 karolherbst: it's messy if you want to use bumblebee sometimes
20:31 karolherbst: because you would have to restart X every time
20:31 karolherbst: and reconfigure your system
20:32 karolherbst: does nouveau turn your card off?
20:32 waltercool: I'm watching just that, hehe
20:32 karolherbst: should happen after 5 seconds or something
20:33 waltercool: how can I say that?
20:33 waltercool: vgaswitcheroo just says DynPwr
20:33 waltercool: No + there
20:33 karolherbst: waltercool: http://nouveau.freedesktop.org/wiki/Optimus/
20:33 karolherbst: "Checking the current power state"
20:34 waltercool: yeah
20:34 waltercool: but only says DynPwr
20:34 karolherbst: mhhh
20:34 karolherbst: it took too long already
20:34 waltercool: so, is On, right?
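Each line of `/sys/kernel/debug/vgaswitcheroo/switch` describes one GPU: an index, IGD (integrated) or DIS (discrete), a `+` marking the GPU driving the display, a power state, and the PCI address. A "Dyn" prefix means the driver manages power dynamically, so DynPwr means the card is currently powered on and DynOff means it is off. A minimal Python sketch that decodes one such line, assuming the layout described above; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SwitcherooClient:
    index: int
    kind: str          # "IGD" or "DIS", possibly with an "-Audio" suffix
    active: bool       # '+' marks the GPU currently driving the display
    dynamic: bool      # "Dyn" prefix: power controlled by the driver (runpm)
    powered: bool      # "Pwr" means on, "Off" means off
    pci_address: str


def parse_switch_line(line):
    """Parse a line like '1:DIS: :DynPwr:0000:03:00.0'.

    maxsplit=4 keeps the colons inside the PCI address intact.
    """
    index, kind, active, power, pci = line.rstrip("\n").split(":", 4)
    dynamic = power.startswith("Dyn")
    state = power[3:] if dynamic else power
    if state not in ("Pwr", "Off"):
        raise ValueError(f"unknown power state: {power!r}")
    return SwitcherooClient(int(index), kind, active == "+", dynamic,
                            state == "Pwr", pci)


if __name__ == "__main__":
    client = parse_switch_line("1:DIS: :DynPwr:0000:03:00.0")
    print(client.kind, "on" if client.powered else "off")
```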
20:35 karolherbst: currently I still use bbswitch to turn it off, because it won't work for me either
20:35 karolherbst: but that's expected because of a hack
20:35 karolherbst: does "echo OFF > /sys/kernel/debug/vgaswitcheroo/switch" help?
20:35 waltercool: oh, let me try
20:36 waltercool: nope
20:36 waltercool: Neither bbswitch
20:36 karolherbst: bbswitch won't work as long as nouveau is loaded
20:36 karolherbst: only after
20:36 karolherbst: uload
20:36 karolherbst: *unload
20:37 waltercool: hmmm
20:37 karolherbst: you see the problem? :D
20:38 waltercool: I'm currently seeing some problems :P Like... IDK how to turn off the videocard haha
20:39 karolherbst: unload nouveau
20:39 karolherbst: tell bbswitch to turn it off
20:39 karolherbst: I don't really know how much bbswitch messes up internally with switcheroo
20:39 karolherbst: sadly
20:39 waltercool: can't unload nouveau :(
20:39 karolherbst: but as long as I only play around with nouveau and will use bumblebee for most serious stuff, I still use bbswitch
20:39 waltercool: but... a good news, Civ5 works great
20:39 karolherbst: yeah, because x server picked up the card
20:40 karolherbst: I see
20:40 waltercool: hehe
20:40 karolherbst: never tried it until now
20:40 waltercool: would work with your hack?
20:40 karolherbst: seems to work for me so far
20:41 karolherbst: (tm)
20:41 waltercool: driver dummy.... ugggggh
20:41 karolherbst: yeah, doesn't matter
20:41 karolherbst: with dri you don't need the nouveau ddx driver
20:41 karolherbst: *dri3
20:42 karolherbst: but switcheroo may work if nouveau is loaded before starting X the first time after boot
20:42 karolherbst: and not unloaded later
20:42 karolherbst: at least it seems to be this way for me
20:43 waltercool: yup, I agree it shouldn't disable nouveau
20:43 waltercool: it's nasty
20:44 waltercool: that works for muxed videocard, but shouldn't for muxless
20:44 waltercool: btw, why sna?
20:45 waltercool: to avoid uxa?
20:45 karolherbst: sna is default anyway
20:45 karolherbst: I just put it there a year ago
20:45 karolherbst: I think you can safely remove it
20:46 waltercool: haha I remember the whole transition from exa -> uxa -> sna
20:46 waltercool: was PITA
20:47 waltercool: nouveau uses glamor, right?
20:47 karolherbst: yeah
20:47 karolherbst: intel wants to use it too
20:47 karolherbst: but it's not fast enough yet
20:47 waltercool: well, it should be the standard for that indeed
20:48 waltercool: effort less, maintenance more
20:49 waltercool: ok, let me restart X
20:49 karolherbst: xwayland supports only glamor anyway
20:49 waltercool: really?
20:49 karolherbst: yeah
20:49 waltercool: What would happen with all Xorg apps? Will they be dead? Will there be some compatibility layer?
20:50 waltercool: I know wayland is far different...
20:50 karolherbst: xwayland
20:50 waltercool: but reduces a lot of layers
20:50 karolherbst: on wayland you don't need glamor
20:50 waltercool: xwayland isn't just wayland inside X?
20:51 karolherbst: it's X inside wayland
20:51 waltercool: oh you are right, weston is wayland inside X
20:51 karolherbst: no
20:51 karolherbst: weston is a wayland compositor
20:51 waltercool: just compositor?
20:52 karolherbst: you can run weston without x
20:52 waltercool: but I'm afraid it's still hard to use
20:52 waltercool: I mean, for users
20:52 karolherbst: we will see
20:52 waltercool: lot of work to do
20:53 waltercool: ok, let me restart to do this test
21:03 waltercool: OK, nouveau loads fine, but PRIMUS doesn't like nouveau again
21:04 karolherbst: mhh
21:04 waltercool: no DRI3 error this time
21:05 karolherbst: I think I will go to bed now, can't think anymore :/
21:05 waltercool: haha go friend, let's sleep something
21:05 waltercool: It's late here too, hahaha, and I'm still at work :( I should go home a few hours earlier, hehe
21:06 waltercool: thank you anyways for the help btw
21:06 imirkin: gnurou: that was basically the patch i suggested ;)
21:24 gnurou: imirkin: ah I missed it, sorry :P
21:25 imirkin: gnurou: is that the only arch weirdness you found for gm20b?
21:30 gnurou: imirkin: that and the tiling issue that Ben merged yesterday
21:31 gnurou: but that later should not be a concern for dgpu
21:31 imirkin: it should be. but only maxwell ones
21:31 imirkin: his is a kepler
21:32 gnurou: s/later/latter
21:32 gnurou: should be good then