01:03 sxi: hi does anyone know where the soft strap bios offset is for the gtx 680?
01:03 sxi: i'm trying to do one of those gpu mods to enable tcc
01:42 mwk: tcc?
01:43 sxi_: yea the tcc vs wddm driver mode
01:44 sxi_: it's supposed to be faster on the tcc driver with cuda
01:45 sxi_: i think the gtx 680 is supposed to be modifiable to a k10 since the device id for gtx 680 is "10DE 1180" vs k10's "10DE 118F"
01:45 sxi_: so i have to turn on 4 bits in the "or" soft strap, but i don't know where that is in the bios
01:45 sxi_: in the rom sorry
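The "4 bits" sxi_ mentions follow directly from the two device IDs quoted above. A minimal sketch of that arithmetic, using only the IDs from the chat; how the strap bits map onto the device ID is not shown here and is left as stated in the conversation:

```python
# Device IDs quoted in the chat (vendor 10DE): GTX 680 = 0x1180, K10 = 0x118F.
GTX680_ID = 0x1180
K10_ID = 0x118F

# XOR leaves a 1 wherever the two IDs differ.
diff = GTX680_ID ^ K10_ID
bits_to_set = [bit for bit in range(16) if (diff >> bit) & 1]

print(f"differing mask: {diff:#06x}")
print(f"bits to set:    {bits_to_set}")
print(f"result:         {GTX680_ID | diff:#06x}")
```

The IDs differ only in the low nibble, so OR-ing in four bits is enough to get from one to the other.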
01:50 mwk: I thought that Quadrifying or Teslifying GeForces has been impossible since the NV40s?
01:50 mwk: or maybe since G80
01:51 sxi_: well i got my info from this thread: http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-their-professional-counterparts/
01:52 sxi_: they were using resistors but somebody said that it is still possible to just mod the bios to turn a gtx 680 into a k10
01:54 mwk: yes, the straps come from two sources
01:54 mwk: the actual resistors, and a value in bios
01:55 mwk: the eeprom straps used to be 0x58:0x68
01:55 mwk: but ISTR they changed it around Kepler, so might not be true for GTX 680
01:56 sxi_: yea that's what this article also says: https://www.altechnative.net/2013/11/25/virtualized-gaming-nvidia-cards-part-3-how-to-modify-a-fermi-based-geforce-into-a-quadro-geforce-gts450gtx470gtx480-to-quadro-200050006000/
01:59 mwk: I don't quite get why people bother with hardware modifications
01:59 mwk: wouldn't it be simpler to get a hex editor and hack the driver instead?
02:00 mwk: the dev id whitelists are in there somewhere
02:00 mwk: they changed the driver to check hw straps only? you can just nop it out...
02:01 sxi_: i agree, but i don't know how hard the driver is to reverse
02:02 mwk: well, I do... it's hard, but not *that* hard
02:03 mwk: and has a high "hack once, use everywhere" factor
02:03 mwk: which cannot be said for soldering
02:03 sxi_: yup exactly
02:04 sxi_: on windows the patches might need to be kept up to date
02:04 sxi_: maybe bindiff can do the job but i haven't played with that
02:04 mwk: hmm, right, windows involves driver signing too
02:06 sxi_: so how did you guys find the strap offsets before? was it driver reversing or is it somewhere in nvflash.exe
02:07 mwk: neither
02:07 mwk: there are hw registers containing a copy of the overrides from eeprom, which I found a long time ago
02:08 mwk: and then some day I was studying vbios format and found the stuff at 0x58 looked very similar to the values commonly found in these registers
02:23 sxi_: cool
02:23 sxi_: i read some old logs here http://people.freedesktop.org/~cbrill/dri-log/?channel=nouveau&date=2013-11-11
02:24 sxi_: @gordan mentioned that there's this 0x400 uefi header, and the bytes at 0x458 do look a bit like soft straps
02:25 sxi_: do you know how the strap checksum is calculated? i can see if the checksum is correct
02:27 mwk: there is no strap checksum that I know of
02:27 mwk: there is, however, whole-bios checksum
02:27 mwk: which is calculated as usual for PCI biosen
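The usual PCI option ROM convention mwk refers to is that all bytes of the image sum to zero modulo 256. A minimal sketch of checking and repairing that sum after an edit. The toy image, the edited offset and the choice of the last byte as the adjustable byte are assumptions for illustration; a real VBIOS (especially one with the 0x400 UEFI header mentioned above) may place the image and its checksum byte elsewhere:

```python
def rom_checksum(image: bytes) -> int:
    """Return the 8-bit sum of all bytes; 0 means the image checks out."""
    return sum(image) & 0xFF


def fix_checksum(image: bytes, pad_index: int = -1) -> bytes:
    """Return a copy with the byte at pad_index adjusted so the sum is 0."""
    patched = bytearray(image)
    patched[pad_index] = 0
    patched[pad_index] = (-sum(patched)) & 0xFF
    return bytes(patched)


# Toy 512-byte image: 0x55 0xAA signature, then the size in 512-byte units.
image = bytearray(512)
image[0:3] = b"\x55\xAA\x01"
image[0x58] ^= 0x0F            # pretend some strap bits were edited
fixed = fix_checksum(bytes(image))
print(rom_checksum(bytes(image)), rom_checksum(fixed))
```

A non-zero result from `rom_checksum` on a modified ROM means the adjustable byte still needs updating before flashing.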
02:48 sxi_: ok i'll give this a shot and see if it works
03:56 karolherbst: imirkin: are you sure that catalyst doesn't implement ARB_shading_language_include?
04:25 RSpliet: karolherbst: Apple implements it across all GPUs: https://developer.apple.com/opengl/capabilities/
04:25 karolherbst: yeah I checked that already
04:25 karolherbst: but he said to me that catalyst doesn't on Linux
04:26 karolherbst: which would be weird, cause the game kind of does run on catalyst just with visual issues (divinity)
04:26 RSpliet: don't have one of those cards yourself do you? glxinfo would easily confirm it
04:26 karolherbst: nah, I don't have any amd cards here
04:28 karolherbst: well maybe some radeon guy knows
04:28 karolherbst: but would be somehow fun to actually implement this
04:28 karolherbst: I doubt they use anything else besides the #include thing
04:28 karolherbst: which is kind of a really nice thing to have actually
04:29 RSpliet: could be a nice way of diving into the crypts of Mesa
04:29 RSpliet: I don't think any of the drivers need to provide additional support
04:29 karolherbst: no
04:29 karolherbst: it is a glsl only thing
04:30 karolherbst: it's pretty simple actually
04:30 RSpliet: well, make sure you get all the details of the extension right
04:30 RSpliet: but yes
04:31 RSpliet: do it! ;-)
04:31 karolherbst: you do like NamedStringARB(SHADER_INCLUDE_ARB, name_len, "engine/awesome/shader.glh", str_len, "your glsl code here");
04:31 karolherbst: and then in your glsl you do "#include <engine/awesome/shader.glh>"
04:31 karolherbst: and this gets somehow replaced by the above string
04:32 karolherbst: well writing piglit tests should be easy enough for this
04:33 RSpliet: uhuh, but details like the __FILE__ macro, leading /... you know, the details ;-)
04:35 karolherbst: RSpliet: yeah and those included shaders can also contain an include
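The replacement karolherbst describes, including nested includes, can be modelled as plain text substitution. This is a toy sketch only, not Mesa's or any driver's implementation: the dictionary stands in for the named-string table, the names and shader text are made up, and details RSpliet alludes to (relative paths, `__FILE__`, `#line`) are ignored. The names start with `/`, which the extension's path rules appear to require:

```python
import re

# Toy stand-in for the named-string table that glNamedStringARB fills in.
named_strings = {
    "/engine/common.glh": "float half_of(float x) { return 0.5 * x; }",
    "/engine/awesome/shader.glh": '#include "/engine/common.glh"\nvec4 tint;',
}

INCLUDE_RE = re.compile(r'^\s*#\s*include\s+[<"]([^>"]+)[>"]\s*$')


def expand(source: str, stack: tuple = ()) -> str:
    """Textually expand #include lines, recursing into included strings."""
    out = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if not match:
            out.append(line)
            continue
        name = match.group(1)
        if name in stack:
            raise ValueError(f"recursive include of {name}")
        if name not in named_strings:
            raise KeyError(f"no named string {name}")
        out.append(expand(named_strings[name], stack + (name,)))
    return "\n".join(out)


shader = '#include </engine/awesome/shader.glh>\nvoid main() {}'
print(expand(shader))
```

The `stack` argument is only there to stop an include chain that loops back on itself.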
04:36 karolherbst: anyway, I will wait until they respond to my message, I think they were just sloppy and use this extension whenever they detect a nvidia card
04:36 karolherbst: because catalyst doesn't seem to have it
04:57 wvvu: what's the difference between '/dev/card0' and '/dev/dri/card0'?
04:58 wvvu: I'm trying to figure out why valgrind-mt didn't produce any sort of info.
05:03 karolherbst: RSpliet: and I really got a cold now :/
05:19 imirkin: karolherbst: at least it didn't... http://people.freedesktop.org/~imirkin/glxinfo/glxinfo.html#v=Vendor
05:30 karolherbst: imirkin: k, because that would mean they decide upon the vendor_id or something :/
05:30 karolherbst: right, I could try to run it with my intel gpu and see what it does there
06:36 martm: MichaelLong: i read radeon docs, and it seems quite easy there to make such a scheduler, if nvidia hw is similar, and likely it is, then i stand by my words, making the perf fly is one of the easiest tasks
06:45 martm: full ssa where the hw backend deals with the stuff would be quite an easy task
07:47 robclark: btw, anyone see an issue w/ out of tree kernel build (ie. O=/some/path/not/root/of/kernel/tree) in nouveau recently?
07:48 imirkin_: robclark: with ben's tree, or the regular kernel tree?
07:48 robclark: regular kernel tree
07:48 imirkin_: (the answer to both is "never tried" for me, but good to know specifically :) )
07:48 robclark: looks like include path issue..
07:49 robclark: I guess not too hard to fix, but if there was an upstream patch already I'd just take that instead..
07:49 imirkin_: everything got pretty majorly rejiggered in 4.3
07:49 karolherbst: robclark: you mean a build in the top level or inside drm?
07:50 karolherbst: ohh wait, my mistake, I thought you meant the out of tree module
07:50 robclark: I mean toplevel kernel build, but using O=..
07:50 karolherbst: I think I tried this like a month ago
07:50 karolherbst: and it kind of worked for me
07:50 robclark: bunch of:
07:50 robclark: error: implicit declaration of function ‘agp_backend_release’ [-Werror=implicit-function-declaration]
07:50 robclark: agp_backend_release(pci->agp.bridge);
07:50 imirkin_: karolherbst: out-of-tree build is different from regular.
07:56 karolherbst: yeah I figured, I am pretty sure I tried it out some time ago though
08:02 imirkin_: robclark: anyways, i'm not aware of any patches flying by that would have fixed this
08:02 imirkin_: robclark: are you, perchance, building on an odd platform, like arm?
08:02 imirkin_: one that perhaps doesn't have AGP support :)
08:02 robclark: no.. but I know what is going on..
08:02 imirkin_: ah ok
08:02 robclark: with all the reshuffling I ended up w/ two slightly different os.h
08:02 imirkin_: oh weird
08:03 karolherbst: robclark: did you also get some function prototype mismatches?
08:03 karolherbst: like kernfs_find_and_get?
08:04 robclark: no.. but if it picked up the old nvif/os.h which missed the #include for agp_backend.h...
08:04 karolherbst: ohh
08:05 karolherbst: using O= in the kernel seems to be slightly unreliable somehow anyway :/
08:06 robclark: normally uses O= all the time..
08:07 karolherbst: yeah I usually build in the root directory
08:07 karolherbst: I think you shouldn't ever mix this :/
08:07 karolherbst: even mrproper didn't really work in this case
08:08 karolherbst: could be my fault though, cause my build script is pretty much customized
08:10 karolherbst: imirkin_: by the way, I guess you didn't find time for the zcull integration thing?
08:10 imirkin_: no
08:10 imirkin_: i suck.
08:11 karolherbst: no problem though
08:11 imirkin_: hold on, give me a minute
08:11 karolherbst: but I was thinking, about it a little, could it be, that EarlyZ runs after vertex shaders?
08:12 imirkin_: it most certainly does.
08:12 karolherbst: I am not quite sure how the OpenGL pipeline works, but this would be the only time where it actually would make sense
08:12 karolherbst: ahh right
08:12 karolherbst: and zculling would run before vertex shaders
08:13 imirkin_: let me forward you a thing i wrote for... pmoreau maybe? i forget. it was an overview of the GL pipeline
08:13 karolherbst: wait
08:13 karolherbst: I found something already
08:13 karolherbst: https://www.opengl.org/wiki/Rendering_Pipeline_Overview
08:13 karolherbst: can't be too wrong I guess
08:14 imirkin_: too late. i sent it.
08:14 imirkin_: you don't have to read it.
08:14 karolherbst: uhhh, so dropping vertexes would reduce shader invocations quite a lot actually
08:14 karolherbst: ahh nice, thanks
08:15 karolherbst: imirkin_: unigine heaven is awesome to show what tessellation is :) you could like add it to the text, because you can just disable it at runtime and immediately see the difference
08:16 imirkin_: well... it makes good use of tessellation
08:16 imirkin_: but it in no way explains how the tess stages work
08:16 karolherbst: okay right, I meant it can show you what you can do with it
08:29 karolherbst: ohhh, okay, I think I got it then
08:29 karolherbst: Zcull is used to speed up the entire pixel discarding thing, so that the gpu can do more useful stuff in total :)
08:30 imirkin_: yeah
08:30 imirkin_: it makes early depth tests faster
08:30 imirkin_: [and presumably stencil]
08:31 karolherbst: seriously? now there are seriously nvidia-linux only games? :/ ......... this can't be a good sign
08:31 karolherbst: I hope that really doesn't become a thing, because that would mean mesa has to implement all kind of stuff
08:32 mwk: karolherbst: what games?
08:32 karolherbst: xcom2
08:32 karolherbst: and divinity
08:33 karolherbst: I mean they say that at first it is like nvidia only, but there might be no support at all later on
08:34 karolherbst: there may be others as well
08:36 karolherbst: mwk: ever looked at ARB_shading_language_include?
08:45 mwk: karolherbst: no
08:46 karolherbst: this could be something mesa would have to implement to support divinity (and maybe other games, because that's a thing in HLSL)
08:47 imirkin_: karolherbst: i think it's been nak'd by some people who know what's up... you should figure out why first.
08:48 imirkin_: karolherbst: totally untested beyond compilation: http://hastebin.com/qosuyazuga.coffee
08:48 karolherbst: I mean okay, but won't this be reconsidered if there is actually stuff requiring it without changing that? But yeah I try to find out why
08:48 karolherbst: thanks
08:49 imirkin_: nvc0->zcull might be null when you hit the zcull validate thing -- you need to check for that explicitly and clear things out
08:49 imirkin_: (i didn't do that)
08:49 karolherbst: ohh I found something: http://lists.freedesktop.org/archives/mesa-dev/2012-April/020692.html
08:49 imirkin_: also there's a bit more to it
08:49 imirkin_: since you need to be careful about blits and whatnot messing up the various state
08:50 imirkin_: so more things need to cause zcull to get deleted
08:51 imirkin_: karolherbst: oh you should also dirty the NEW_FRAMEBUFFER state in that if (...) in nvc0_clear
08:51 karolherbst: k
08:51 imirkin_: i.e. explicitly do nvc0->dirty |= NVC0_NEW_FRAMEBUFFER
08:51 imirkin_: anyways, i'm reverting the changes. enjoy.
08:54 karolherbst: k
09:14 imirkin_: karolherbst: oh, my rounding code is wrong too :)
09:15 imirkin_: er no, it's fine
09:24 karolherbst: I will try it out and see what it does visually :D
09:24 imirkin_: it will crash until you fix the thing i mentioned in validate_zcull
09:27 karolherbst: ohh nice, page allocation failures in the kernel...
09:28 karolherbst: not because of your stuff though
09:28 karolherbst: have to reboot as it seems
09:33 karolherbst: ohhh
09:34 karolherbst: weird, why is my intel gpu used
09:35 karolherbst: ha, no dri3 :/
09:39 karolherbst: imirkin: do you mean "/* XXX tmpl.format = ? this has implications for the memtype too */" or something else?
09:41 imirkin_: no
09:41 imirkin_: in validate_zcull
09:42 karolherbst: I think you didn't leave any comment there or I just don't see what you mean
09:45 karolherbst: ohh
09:45 karolherbst: you meant "nvc0->zcull" being NULL
09:45 imirkin_: i didn't leave a comment
09:46 imirkin_: which is why i wrote it here :p
09:46 karolherbst: yeah I noticed, but because I had to restart I somehow forgot and had to open the logs to check it :)
09:54 karolherbst: imirkin_: also what is currently written into the zcull buffer?
09:55 imirkin_: ?
09:56 karolherbst: I mean what is inside nvc0->zcull
09:57 imirkin_: information that the gpu reads and writes about the zcull buffer?
09:57 karolherbst: I meant more like is it a texture with actual content now or is there nothing in it, despite being a buffer
09:58 imirkin_: it has content. the content is whatever's written by the gpu as it's storing info for later zcull usage.
09:58 karolherbst: ahhh okay
09:58 imirkin_: it has to store that data somewhere. the zcull buffer is it.
09:58 imirkin_: but perhaps it doesn't need to be a texture, but instead needs to be a buffer
09:58 imirkin_: i'd have to check a trace
09:58 imirkin_: look at the memtype
09:58 imirkin_: etc
09:59 karolherbst: okay, so all I basically have to figure out is what commands I have to send to the gpu so that it actually does the right thing
09:59 imirkin_: pretty much always the question before us :)
10:00 karolherbst: clearing the zcull thing seems easy though: send region == 0 and done
10:01 karolherbst: ohh and there is a test_mask thing
10:01 karolherbst: that sounds important
10:58 imirkin_: hakzsam: could you review the nv50 compute patch i just sent?
10:59 imirkin_: erg, of course i forgot to remove the other free
11:19 karolherbst: it is somehow painful that I don't see that zculling is doing anything :/
11:19 karolherbst: I also don't know what I should expect to change
11:19 imirkin_: moar fps
11:20 karolherbst: moar like 10% or more like 2%?
11:20 imirkin_: moar like depends on the exact app :)
11:20 imirkin_: could be 0
11:20 karolherbst: mhh
11:20 karolherbst: I am testing with heaven
11:20 glennk: 0-20% for the particular operation
11:20 RSpliet: and if they are countable (are they?) less fragment shader "threads"
11:21 karolherbst: glennk: ahh yeah, do you know any game/application which benefits really much from this?
11:21 imirkin_: RSpliet: i don't think so
11:21 karolherbst: well mesa can tell us the number of fs-invocations
11:21 imirkin_: it's just a faster depth test
11:21 glennk: well, apps that do a lot of depth tested overdraw
11:21 karolherbst: anyway, I disabled msaa for this, because it makes zcull less effective anyway
11:22 karolherbst: glennk: yeah, but do you have any in mind?
11:22 glennk: shadow mapping would be one place
11:22 glennk: far cry?
11:22 karolherbst: well, it has to be somehow run on linux :)
11:22 glennk: just that game in particular has "awesome" visibility culling
11:22 karolherbst: and I am not doing stuff like that with wine
11:23 karolherbst: ahh mhh
11:23 karolherbst: other cry engine based games should too then I suppose
11:23 glennk: as in "we have view frustum culling" and "depth testing"
11:23 glennk: well, the newer ones have a software z occlusion buffer on pc
11:23 RSpliet: imirkin_: wait, zculling gets rid of vertices that are out of sight right? I ehh... wait, help, do fs work on viewport pixels? they should work on object pixels right - and since there's less...?
11:24 karolherbst: RSpliet: it is faster
11:24 karolherbst: RSpliet: same amount of pixels, just faster eliminated
11:24 glennk: they no longer do stuff like draw the trees behind the mountain behind more trees behind two sets of walls when you are indoors
11:24 karolherbst: in theory
11:24 karolherbst: glennk: right
11:24 karolherbst: glennk: I think unigine heaven just sucks completely here anyway
11:24 glennk: which btw answers the question "why is far cry so taxing on hardware?"
11:24 imirkin_: glennk: wouldn't that happen anyways with early depth tests?
11:25 karolherbst: because it simply doesn't matter if I look directly on to a wall or not, I get nearly the same fps
11:25 karolherbst: except looking into the sky helps...
11:25 imirkin_: glennk: just a question of what gets rasterized, but not necessarily full frag invocations
11:25 glennk: the unigine demos are useless for this
11:25 karolherbst: thanks for verifying this, I feared as much
11:25 glennk: imirkin_, well, hier-z can reject blocks of fragments in one go
11:26 karolherbst: glennk: I also have borderlands, bioshock and saints row iv, do you think any of them would benefit a lot?
11:26 glennk: a simple test: glDepthFunc(GL_LESS), then draw the same quad a bunch of times
11:26 imirkin_: glennk: right, which goes faster. but same # of frag invocations...
11:27 karolherbst: glennk: I have no clue about OpenGL, sorry :D
11:27 glennk: if its >> theoretical pixel fill rate of drawing those n quads, you have working hier-z
11:27 imirkin_: glennk: i guess it can also help with ARB_conservative_depth
11:28 glennk: imirkin_, frag invocations is frag shader invocations right?
11:28 imirkin_: yes
11:28 glennk: only those that pass the early depth/stencil tests increment that counter
11:29 glennk: so you could have a tile that gets fully rejected, say about 64 fragments, by a single check to hier-z
11:29 imirkin_: right, which is way faster than doing them one at a time
11:29 imirkin_: but it doesn't result in a diff # of frag shader invocs
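The point glennk and imirkin_ converge on (a whole tile rejected by one coarse check, with an unchanged fragment shader invocation count) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any GPU implements it: one 64-fragment tile, quads that cover the whole tile at a constant depth, a GL_LESS test, early depth testing, and a coarse value that is simply the farthest depth stored in the tile:

```python
TILE = 64  # fragments per tile, the figure glennk mentions


def draw_quads(depths, use_hiz):
    """Draw full-tile quads at the given depths with a GL_LESS depth test."""
    zbuf = [1.0] * TILE          # per-fragment depth, cleared to the far plane
    tile_max = 1.0               # coarse value: farthest depth in the tile
    stats = {"coarse": 0, "fine": 0, "fs": 0}
    for z in depths:
        if use_hiz:
            stats["coarse"] += 1
            if z >= tile_max:    # nothing in the tile can pass GL_LESS
                continue
        for i in range(TILE):
            stats["fine"] += 1
            if z < zbuf[i]:
                zbuf[i] = z
                stats["fs"] += 1  # fragment shader runs only on a pass
        tile_max = max(zbuf)
    return stats


same_quad = [0.5] * 10           # the same quad drawn ten times
print(draw_quads(same_quad, use_hiz=False))
print(draw_quads(same_quad, use_hiz=True))
```

In both runs the `fs` count is identical; only the number of per-fragment depth tests drops, which matches the "faster depth test, same invocations" reading in the chat.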
11:29 karolherbst: at least it shouldn't
11:29 glennk: ah yes i see what you mean now
11:30 glennk: you want the fragments rasterized counter
11:30 glennk: or at least the workload counter of the fixed function rasterizer block
11:31 glennk: so yeah, mostly a benefit if thats the block you have a bottleneck in
11:31 glennk: glxgears fullscreen should be some difference for example
11:32 benwaffle: I just switched from nvidia to nouveau. When I start an X session everything looks like it's shaking
11:32 karolherbst: glennk: I have a laptop
11:32 karolherbst: glennk: pcie bus is the bottleneck here usually
11:32 glennk: karolherbst, that is an unwise choice for gpu hacking :-p
11:32 karolherbst: why?
11:32 karolherbst: if I mess my driver I don't have to reboot
11:34 imirkin_: benwaffle: well-timed earthquake maybe?
11:34 benwaffle: unfortunately, no
11:34 imirkin_: do you have a nForce2 NV1F gpu?
11:34 benwaffle: everything's flickering in X, not a VT
11:34 benwaffle: NVC0
11:34 imirkin_: literally nvc0? i.e. GF100?
11:34 benwaffle: GF114 i think
11:35 benwaffle: yeah GF114
11:35 imirkin_: pastebin dmesg + xorg log... it'll answer a bunch of questions
11:35 karolherbst: glennk: soo mhh, could you check some games if hiz affects anything in them? I don't know how to find any game which would be affected by this, because 1. I don't know how to disable this with nvidia, if it is possible at all and 2. I have no radeon gpu to find such myself
11:35 hakzsam: imirkin_, yeah, I'll do a bit later
11:35 karolherbst: glennk: or would you say every "normal" game should be affected?
11:35 glennk: glxgears fullscreen on a 6950 is about 35% difference
11:36 karolherbst: :O
11:36 karolherbst: k
11:36 karolherbst: glennk: how many fps?
11:36 glennk: 1000 vs 1350
11:37 benwaffle: http://termbin.com/683g
11:37 benwaffle: http://termbin.com/4rl9
11:38 karolherbst: glennk: sadly I am maxed out at lowest core clock already :/
11:38 benwaffle: ah, imirkin_, removing my xorg.conf fixed it
11:38 karolherbst: at 625 fps
11:38 imirkin_: cool :)
11:39 karolherbst: and my pcie load is pretty high
11:39 benwaffle: sweet it solved all my problems
11:39 imirkin_: [ 558.827572] nouveau 0000:01:00.0: disp: ERROR 4 [INVALID_VALUE] 84 [] chid 0 mthd 0828 data 000099bb
11:39 imirkin_: that's not great
11:39 glennk: karolherbst, if i'm not mistaken the fast clear also requires htile buffer, so nohyperz would disable that as well
11:40 benwaffle: imirkin_: do you know what it means
11:40 imirkin_: not in any useful manner
11:40 imirkin_: appears that value 0x99bb is invalid for disp method 828 :)
11:41 imirkin_: (duh!)
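imirkin_'s reading of the error line (value 0x99bb rejected for disp method 0x828) can be reproduced by pulling the fields out of the message. A small sketch; the regular expression is shaped for this one quoted line only and is not a description of the general nouveau log format:

```python
import re

line = ("[ 558.827572] nouveau 0000:01:00.0: disp: ERROR 4 [INVALID_VALUE] "
        "84 [] chid 0 mthd 0828 data 000099bb")

pattern = re.compile(
    r"disp: ERROR (?P<code>\d+) \[(?P<name>\w*)\].*"
    r"chid (?P<chid>\d+) mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+)"
)

m = pattern.search(line)
fields = {
    "name": m.group("name"),
    "chid": int(m.group("chid")),
    "mthd": int(m.group("mthd"), 16),   # method offset is printed in hex
    "data": int(m.group("data"), 16),   # so is the rejected value
}
print(f"{fields['name']}: chid {fields['chid']}, "
      f"method {fields['mthd']:#x}, data {fields['data']:#x}")
```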
11:41 karolherbst: okay, I think the zcull stuff does something, but... maybe the pcie overhead kind of kills performance in that case, because there is actually something uploaded on the gpu, don't know though
11:42 karolherbst: ohh wait
11:45 benwaffle: now, gnome-shell wayland causes a lock up
11:47 imirkin_: can't win 'em all
11:47 benwaffle: i can ping it
11:47 benwaffle: nmap: Host seems down
11:50 benwaffle: ah, gnome-shell on X doesn't work either from gdm
13:07 karolherbst: mhh, somehow starting a second X server without -sharevts hangs the screen refresh of my intel X server :/
13:07 imirkin_: coz it tries to switch vt's
13:07 karolherbst: with 1.17 it just exits
13:07 imirkin_: and your intel screen isn't being driven
13:08 karolherbst: ohh okay
13:09 karolherbst: seems like glxgears -fullscreen doesn't work on the other X server :/
13:09 imirkin_: well you don't have any displays attached
13:09 karolherbst: well at least there is the geometry parameter
13:10 karolherbst: ohhh
13:10 karolherbst: over 3000 fps
13:10 imirkin_: helps to not have to display anything
13:10 karolherbst: ohh I thought it actually helps if you don't need to copy over the pcie bus
13:11 imirkin_: benwaffle: btw if you want to debug further, please try to obtain logs, esp dmesg. and make sure to update mesa.
13:11 karolherbst: but does displaying to a display really make so much of a difference?
13:11 benwaffle: i have latest packages
13:11 benwaffle: but i don't want to debug, just want gnome to work, idk if its nouveau problem anymore
13:12 imirkin_: probably is.
13:21 karolherbst: imirkin_: in the mmt I see rows like this: PM: 0x02a1e89c GK104_3D.VERTEX_ARRAY_START_LOW[0] = 0x2a1e89c [0x102a1e89c] [0x1029a0000+0x7e89c] [GK104_3D.ZCULL_LIMIT_LOW+0x7e89c], I am not quite sure what to do with that. Does it tell the gpu where the vertices are that may be eliminated through zcull?
13:22 imirkin_: it says that it's setting to that address. oh and btw, here's where it has also seen this address.
13:22 imirkin_: could be purely coincidental
13:22 imirkin_: or not.
13:22 karolherbst: well
13:22 karolherbst: I get this stuff rather often
13:23 karolherbst: okay, but the ZCULL thing just means the same value was set for this as well
13:25 karolherbst: mhhhh
13:25 karolherbst: let me check something
13:25 imirkin_: not same
13:25 imirkin_: value+x
13:25 imirkin_: it's just an "FYI" type of thing
13:25 imirkin_: buffers are allocated sequentially
13:25 imirkin_: so it's likely that things are adjacent to one another :)
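The bracketed annotations in that trace row are just address arithmetic, which is why imirkin_ calls them an "FYI". A short sketch using only the numbers from the quoted row; the variable names are made up for illustration:

```python
addr_low = 0x02A1E89C          # value written to VERTEX_ARRAY_START_LOW[0]
full_addr = 0x102A1E89C        # full address shown in the first bracket
buffer_base = 0x1029A0000      # address the trace had already seen elsewhere

# The tool reports the new address as "known address + offset".
offset = full_addr - buffer_base
low_matches = (full_addr & 0xFFFFFFFF) == addr_low

print(f"offset into known buffer: {offset:#x}")
print(f"low 32 bits match:        {low_matches}")
```

The offset comes out as 0x7e89c, the same "+x" shown in the row, so the annotation says nothing more than "this address lies that far past one seen earlier".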
13:26 karolherbst: okay
13:26 karolherbst: I think
13:26 karolherbst: this zcull buffer is only uploaded once
13:26 karolherbst: or
13:26 imirkin_: never
13:26 karolherbst: maybe it isn't something like that
13:26 imirkin_: zcull is purely a gpu-private buffer
13:26 karolherbst: I mean, these zcull_address thingies
13:26 karolherbst: are only set once
13:27 imirkin_: that means depth buffer never changes
13:27 imirkin_: or they're very clever about how they manage zcull stuff.
13:27 karolherbst: I even resized the window actually
13:27 karolherbst: the size changes
13:27 karolherbst: but never the address
13:27 karolherbst: or the limit
13:28 karolherbst: I should try out some game for this
13:28 imirkin_: well, they don't need to change it
13:28 imirkin_: they probably just invalidate its contents
13:28 imirkin_: and adjust size/etc
13:29 imirkin_: so as long as you just allocate a really big one up-front, all good? dunno
13:29 karolherbst: will check this
13:30 karolherbst: .... this valgrind really starts to annoy me... it never works with 32bit stuff on 64bit systems
13:53 karolherbst: imirkin_: I got like 9 address changes at the beginning (first 1%) of a 7GB mmt, but that's pretty much it
13:53 imirkin_: ok
13:55 karolherbst: usually after some ZETA, CB_something, scissor, viewport stuff
13:56 karolherbst: maybe only when GK104_3D.SCREEN_SCISSOR_HORIZ and GK104_3D.SCREEN_SCISSOR_VERT are also set
13:56 benwaffle: imirkin_: i think you're right. gnome & i3 both lock up under X
13:56 karolherbst: yeah
13:56 karolherbst: I think it is the SCREEN_SCISSOR stuff
13:56 karolherbst: in this game they are set to full hd then back to 1x1
13:58 imirkin_: benwaffle: weird... well i have a fermi and it's pretty stable for me as long as i don't do crazy stuff
13:58 imirkin_: benwaffle: if debugging isn't your thing sounds like it's time to go back to the blob
13:59 benwaffle: imirkin_: i'll debug
13:59 benwaffle: i got two lines in red from nouveau :o
13:59 imirkin_: benwaffle: try to get dmesg when the hang happens
14:00 benwaffle: imirkin_: systemd's journal grabs dmesg
14:00 benwaffle: right
14:00 benwaffle: it's showing me kernel messages
14:00 imirkin_: i don't care where you get it from
14:00 imirkin_: i just want what's in it :)
14:01 karolherbst: imirkin_: any idea what the region could be? in simple stuff like glxgears and glxspheres it was either set to 0x3f (maybe max value of something?), 0x0 or 0x1, but in divinity it is also set to 0x5 or 0x2 or various different values
14:01 imirkin_: karolherbst: nfc
14:01 karolherbst: k
14:02 benwaffle: imirkin_: http://termbin.com/jpp4 those last 2 lines
14:02 imirkin_: is "system-logind" secretly Xorg?
14:03 imirkin_: benwaffle: please confirm your mesa version
14:03 benwaffle: 11.1.1
14:04 imirkin_: hrmph
14:04 benwaffle: kernel 4.4
14:04 benwaffle: sorry, 4.3.5
14:04 benwaffle: lets try 4.4
14:06 imirkin_: unlikely to matter =/
14:06 imirkin_: i have no idea. what's worse is i have no idea why i never see such issues
14:06 imirkin_: but on occasion others do
14:07 imirkin_: maybe something to do with running fancy desktops
14:07 benwaffle: imirkin_: it locked up in i3
14:07 imirkin_: did you have some compositing bs in there?
14:08 benwaffle: idk
14:09 benwaffle: imirkin_: weston (wayland) seems fine though
14:10 benwaffle: shit nvm
14:10 imirkin_: it sounds like 3d accel is just hosed on your card =/
14:10 imirkin_: you could try running with blob firmware
14:10 benwaffle: it broke it this morning when i updated
14:10 benwaffle: what about nouveau.noaccel
14:11 imirkin_: that'll fix everything and disable accel :)
14:11 imirkin_: you'd still have modesetting though
14:12 benwaffle: everything is support slow
14:12 benwaffle: super
14:13 imirkin_: right. no accel.
14:13 benwaffle: ughj
14:14 benwaffle: imirkin_: so where does the problem lie? mesa?
14:14 imirkin_: no idea
14:14 imirkin_: fun, right?
14:14 benwaffle: \o/
14:14 imirkin_: you could try using blob fw to see if it's in our context switching logic
14:14 imirkin_: whereby your gpu could have some sort of "funky" layout we don't properly account for... or something
14:14 benwaffle: DRM: GPU lockup - switching to software fbcon
14:22 benwaffle: my lockup is a little different on a livecd
14:22 benwaffle: can move the cursor but thats it
15:01 glennk: imirkin_, zcull is definitely per buffer, otherwise you'd have constant resolves with apps switching between fbos
15:01 glennk: r200/r300 era hardware had a single on-chip hier-z ram, so could only use it with a single buffer per frame
15:05 imirkin_: glennk: actually on G80-class it's an on-card buffer ... or something. on GF100 it's a separate buffer.
15:07 glennk: i think its why r300 supports a kernel-managed shared depthbuffer, so that apps can use hyperz
15:30 mwk: on G80, it's 8 on-card buffers
15:30 mwk: NV20-NV40 and GF100+ have it in VRAM
15:31 mwk: I suppose they moved it to dedicated RAM on G80 for speed, but moved it back on GF100 since now they had L2?
15:44 glennk: that, and the use case moved towards apps using lots of fbos
16:50 karolherbst: imirkin_: I think the buffer has to be a multiple of 0x20000 in size
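If the size really does have to be a multiple of 0x20000, rounding a requested size up is a one-liner. A minimal sketch; the 0x20000 granularity is karolherbst's guess from traces and is not confirmed anywhere in the chat, and the helper name is made up:

```python
ZCULL_ALIGN = 0x20000  # granularity karolherbst suspects; not confirmed


def align_up(size: int, alignment: int = ZCULL_ALIGN) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    return (size + alignment - 1) & ~(alignment - 1)


for size in (1, 0x20000, 0x20001):
    print(hex(size), "->", hex(align_up(size)))
```

The mask trick only works because the alignment is a power of two; for other granularities a division-based round-up would be needed.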
16:54 Guest36475: hello
16:54 Guest36475: is there a difference between '/dev/card0' and '/dev/dri/card0'?
16:55 airlied: there shouldn't be a /dev/card0
16:57 Guest36475: well it's there
16:57 karolherbst: what does udevadm /dev/card0 tell you?
16:57 airlied: not sure where you got it from, nobody should be creating it
16:57 karolherbst: I meant udevadm info /dev/card0
17:02 imirkin: karolherbst: ok. well you should perhaps try just doing pipe_buffer_create. but then the memtype won't be set... you need to carefully look at the expected memtype...
17:02 karolherbst: yay my message regarding divinity and ARB_shading_language_include and some of their sloppy work was directed to the linux lead programmer :)
17:02 karolherbst: imirkin: yeah I want to first figure out what nvidia is doing
17:02 karolherbst: no idea when they set the size to 0x40000 though
17:02 karolherbst: a 4k+ window of glxgears didn't make the driver resize the zcull buffer
17:03 Guest36475: 'missing or unknown command'
17:03 karolherbst: so I would say it isn't much related to the actual screen size but to something more
17:03 karolherbst: Guest36475: "udevadm info /dev/card0"
17:04 karolherbst: imirkin: what should I look for to see how the nvidia driver allocates the memory for this area?
17:05 imirkin: should be in the ioctl params to allocate that memory space
17:07 karolherbst: mhh what string should I search for in the mmt? Because there doesn't seem to be any allocation near the zcull stuff :/
17:08 imirkin: look for where that address gets allocated
17:10 Guest36475: karolherbst: they are both the same
17:10 karolherbst: this thing? LOG: NVRM_IOCTL_VSPACE_MAP post, fd: 9, cid: 0xc1d00071, handle: 0xcaf0001b [class: 0x0002 NV1_DMA_FROM_MEMORY], dev: 0xbeef0003 [class: 0x0080 NVRM_DEVICE_0], vspace: 0xbeef0202, base: 0x0000000000000000, size: 0x0000000000020000, flags: 0x00000000, addr: 0x0000000100a20000, status: SUCCESS
17:10 karolherbst: Guest36475: ohh weird
17:10 karolherbst: Guest36475: because udevadm info shouldn't print this message
17:11 karolherbst: okay yeah, the addr is the same as the address put into zcull, so I suppose this is the right allocation
17:46 imirkin: skeggsb: should the subchan semaphore machinery work with a gart buffer?
17:47 skeggsb: yes, i don't see why not, we currently use coherent buffers too so i can't imagine we'd need flushes or anything
17:49 imirkin: i don't see why not either, but... heh. it's not working.
17:49 imirkin: although in hindsight what i'm doing might be stupid
17:49 imirkin: i'll think more.
17:49 karolherbst: skeggsb: by the way: if I set a timer too fast on the pmu (like around 1 ms) and then do some pmu requests from the host, the timer might be cancelled (for whatever reason) and my actual requests don't get any response. Further pmu requests are working fine though, but the timer loop is gone, any idea?
17:51 karolherbst: skeggsb: by the way, the timer and my pmu requests are both for the same process
17:58 karolherbst: ohh I could imagine it is the same place as before and now the pmu just acks this timer before it got executed or something :/
18:07 karolherbst: okay I think I got it, just have to test it out
20:45 discipline: Hello, which channel is more appropriate to talk about backlight control on nvidia 980m card? Do you believe it is done through the nvidia driver? It seems to work, but I would like to understand how it works :-)