00:36karolherbst: imirkin_: mhh, I can't find a statement for either case
00:37karolherbst: the only thing the spec is quite precise about is if the same image attached to a fbo is used as a source and the same fbo was bound for drawing
02:18karolherbst: *sigh*, I'll just run the entire WebGL through valgrind :/
02:29glisse: does anyone know how the low bits of 0x619f04 work? like if i wanted to set a sys memory address there
02:29glisse: 1 seems to be for vram
02:30glisse: so i was guessing 2 for system memory cache coherent
02:31imirkin: don't know for sure, but perhaps the same enum as in PTE's?
02:31glisse: yeah that's what i tried
02:31imirkin: may i inquire as to what you're trying to do?
02:31glisse: and shifting the dma addr in various increments of 4
02:32imirkin: also, from your questions, you already have quite an advanced understanding of nvidia board operations...
02:32glisse: trying to use that to test dma ie can the gpu access main memory
02:32glisse: right now i am trying on x86 where the gpu works
02:33imirkin: ok, and so you're feeding the address into 0x1700?
02:33imirkin: while setting 619f04 to 0 or 2 or whatever?
02:33imirkin: note that the 8 bit also has to be set
02:34imirkin: i mean, 0x8, not the 8th bit
02:34glisse: looking at the bios thing i am not sure how 0x1700 works, i thought that writing addr to 619f04 and then reading at 0x700000 was it
02:35imirkin: in order to ensure you're using it semi-correctly, try reading/writing vram through that window
02:35imirkin: so 0x1700 is the base address for accesses at 0x70000000
02:35imirkin: er, too many 0's
02:36imirkin: have a look at nvkm/subdev/instmem/nv50.c
02:37imirkin: (the *_slow methods)
02:37glisse: yeah that one has many 0 :)
02:45imirkin: glisse: https://github.com/envytools/envytools/blob/master/rnndb/display/g80_pdisplay.xml#L787
02:46imirkin: and 619f00
02:47imirkin: no indication of how it might be used
02:47imirkin: i think it's a separate mechanism
02:47imirkin: you really need to use 0x1700 + 0x700000
02:47karolherbst: imirkin: we more or less use that window in nvafakebios to upload a vbios into vram though
02:48imirkin: karolherbst: right, but ... it has no connection to the 0x700000 thing
02:48imirkin: (or does it?)
02:48karolherbst: it does
02:48karolherbst: it's the pramin method used
02:48glisse: what is shr for ?
02:48glisse: shift right
02:49imirkin: karolherbst: yeah, but read nvafakebios -- it reads the address from 619f04 and then writes it into 0x1700
02:49imirkin: glisse: yes, shr = shift right
02:49karolherbst: mhh, nvidia uses 0x619f04 to somehow know how much vram the GPU has, but that value isn't necessarily set
02:51glisse: what is ram window ?
02:51glisse: like it sounds i should be using ram window rather than rom
02:51imirkin: you should be using neither
02:51imirkin: what address are you trying to write to?
02:51imirkin: i.e. have the gpu write (or read)
02:52glisse: well right now i am trying a page that i dma map
02:52glisse: on x86
02:52glisse: i am only reading to avoid bad things
02:52imirkin: so you have the physical (or dma) address, yes?
02:52glisse: dma address
02:52imirkin: crap, lost my place. hold on.
02:54imirkin: then you write (addr >> 16) | (0x2 << 24) to 0x1700
02:54imirkin: and then you read from 0x700000 + (addr & 0xffff)
02:55imirkin: and if the dma address is higher than 40-bit, you weep
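The window arithmetic from the messages above can be sketched as follows. This is only an illustration, not nouveau code: the function name `pramin_window` and the constant names are made up here, and the target value 0x2 for system memory and the 40-bit limit are taken from the chat as stated, not checked against hardware documentation.

```python
PRAMIN_BASE_REG = 0x1700   # register holding the window base (from the chat)
PRAMIN_WINDOW = 0x700000   # MMIO offset where the window is visible
TARGET_SYSMEM = 0x2        # target value used in the chat; an assumption here
MAX_ADDR_BITS = 40         # address limit mentioned in the chat


def pramin_window(dma_addr):
    """Return (value to write to 0x1700, MMIO offset to read) for dma_addr."""
    if dma_addr < 0 or dma_addr >> MAX_ADDR_BITS:
        raise ValueError("address does not fit in 40 bits")
    # Upper address bits select the 64 KiB window, the target goes in bits 24+.
    reg_value = (dma_addr >> 16) | (TARGET_SYSMEM << 24)
    # Parenthesised on purpose: in C, `+` binds tighter than `&`.
    mmio_offset = PRAMIN_WINDOW + (dma_addr & 0xFFFF)
    return reg_value, mmio_offset
```

For a DMA address of 0x12345678 this gives the pair (0x02001234, 0x705678): the first value goes into 0x1700, the second is the MMIO offset to read.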
02:56imirkin: if this is pascal, good chance that's slightly different, since it has 48-bit addresses? dunno.
02:57glisse: they haven't updated most of this stuff
02:57glisse: i think they updated them only in volta or turing
02:57karolherbst: should be volta
02:58imirkin: oh ok. i kinda stopped caring around kepler, for obvious reasons
02:58glisse: that works :)
02:58imirkin: yw :)
02:58glisse: well now i need to wait my turn to get time on the ppc thing ...
02:59imirkin: oh. you're testing on x86. which you have in your possession. makes sense.
02:59imirkin: like the supercomputers of yore...
02:59glisse: x86 is the super computer for kids
03:00karolherbst: I am sure x86 is one of the platforms where you have _more_ issues the more CPUs you have
03:00imirkin: having lots of slow cpu's is almost universally worse than one fast one...
03:01karolherbst: I am sure by default everything gets slower if you have two x86 CPUs instead of one
03:01karolherbst: identical ones
03:01glisse: that's amd issue on windows with threadripper
03:01glisse: on linux in most cases it works ok
03:02glisse: after all people have spent the last 10 years optimizing for NUMA
03:02karolherbst: okay, right, because linux usually schedules threads smart enough
03:02glisse: also memory placement
03:02karolherbst: and tries to make use of caches
03:02glisse: that matters much more
03:02glisse: well depends on the workload
03:02glisse: bandwidth across socket or chiplet sucks
03:02karolherbst: memory bandwidth sucks as well
03:02glisse: half iirc of the bandwidth of your local connected RDIMM
03:03karolherbst: you are good as long as you hit the L3 cache, which gets harder if you get a second CPU on the board :) or your workload is just crappy
03:04glisse: try atomic to remote memory ...
03:04karolherbst: fun, I guess?
03:04glisse: those things are joy to work with
03:04glisse: some folks consider each socket as an independent computer
03:04karolherbst: smart thing to do
03:04karolherbst: solves most of your issues
03:05karolherbst: because you think different about your application
03:05karolherbst: the hdckahdkjasdhkasjchnkjasch1!
03:05karolherbst: valgrind caught an error
03:05glisse: yeah, again depends on your workload, some poor schmuck has a dataset too big for a single socket
03:05karolherbst: imirkin: guess what happened then
03:05imirkin: it crashed
03:05karolherbst: much worse
03:05imirkin: it hung
03:05karolherbst: nope, the terminal window closed
03:06karolherbst: and you know, I was smart
03:06karolherbst: I added the "--noclose" flag
03:06imirkin: sorry, it's impolite to laugh at another man's pain...
03:06imirkin: but ... hahahah
03:07karolherbst: mhh... maybe I should log into a file...
03:08glisse: damn, someone made a 10h video of haha and uploaded it to youtube ... https://www.youtube.com/watch?v=2LM0CZZ9Uw8
03:08glisse: the internet brings out the best in humans
03:08karolherbst: imirkin: I am sure, if I tell valgrind to log into a file, chromium is smart and restarts the thread and valgrind just overwrites the old log file
03:08karolherbst: mhh, log into an fd
03:08karolherbst: that sounds like what I want
03:08karolherbst: or a pipe
03:08karolherbst: mhh, that sounds like a good idea
03:09karolherbst: uhm... fifo I meant
03:11imirkin: oh, it's the stupid thing where chrome restarts the thing? very annoying
03:11karolherbst: okay, next try
03:12imirkin: you can attach gdb to the valgrind
03:12imirkin: and have it pause on error
03:12karolherbst: I rather not
03:12karolherbst: gdb + valgrind brings back bad memories
03:12karolherbst: I just let valgrind write into a fifo and have a cat | tee somewhere else
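The FIFO idea can be sketched like this. It assumes valgrind is started with its `--log-file=` option pointing at a named pipe created beforehand; the helper name `drain_fifo` and the paths are hypothetical. The reader appends to the log, so a restarted process that reopens the FIFO cannot overwrite earlier output.

```python
import os


def drain_fifo(fifo_path, log_path):
    """Append everything one writer sends through the FIFO to log_path."""
    copied = 0
    # Opening a FIFO for reading blocks until a writer opens the other end.
    with open(fifo_path, "rb", buffering=0) as fifo, open(log_path, "ab") as log:
        while True:
            chunk = fifo.read(4096)
            if not chunk:  # EOF: the writer closed its end
                break
            log.write(chunk)
            log.flush()
            copied += len(chunk)
    return copied
```

Calling `drain_fifo` in a loop picks up each new writer in turn, which roughly matches the `cat | tee` setup described above.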
03:12karolherbst: should work fine as well
03:13karolherbst: mhh, although right, might be easier to debug with gdb, but I hope valgrind's output is good enough
03:16imirkin: well, you can just continue in gdb as necessary
03:16karolherbst: right, but I never had luck reproducing such issues when I used gdb + valgrind at the same time
03:16imirkin: yeah, dunno
03:17karolherbst: maybe it would work in that case, dunno
03:18karolherbst: imirkin: or those GLES3 CTS crashes, never happened when I ran the entire CTS within gdb :/ stuff like that is just annoying
03:23glisse: when a program crashes you can have the kernel dump the program state in /var/crash or something like that
03:23glisse: then you can run gdb after the fact
03:23glisse: forgot how it all works as it tends to be distro specific
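A minimal sketch of the core dump setup, assuming Linux: the core size limit has to be non-zero for the kernel to write a dump, and `/proc/sys/kernel/core_pattern` says where the dump goes (a file name template, or a pipe to a distro-specific handler). The helper name `enable_core_dumps` is made up for this example.

```python
import resource

CORE_PATTERN = "/proc/sys/kernel/core_pattern"  # Linux-specific


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit; report where dumps go."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    try:
        with open(CORE_PATTERN) as f:
            pattern = f.read().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is unavailable
    return hard, pattern
```

The limit is inherited by child processes, so this would be called before spawning the program under test. Once a core file exists, `gdb <program> <core file>` opens it after the fact.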
03:29imirkin: glisse: btw, it occurs to me ... this is a pascal board, right?
03:29imirkin: how is the option rom execution handled on it?
03:29imirkin: [on ppc]
03:30glisse: it is pascal, x86 gpu on a ppc, i don't think anything gets executed
03:30glisse: the bootloader is just a linux kernel
03:30glisse: that kexecs your kernel
03:30imirkin: ok... i don't think that'll work with the firmware loading mechanism
03:30imirkin: iirc it relies on there already being something there
03:30imirkin: not sure.
03:31imirkin: although ... hm. if that were the case, it wouldn't work on resume, but it does.
03:31imirkin: so perhaps i'm just mistaken.
03:34imirkin: ben would know for sure
03:59maxthecat_: when I rmmod nouveau, I got rmmod: ERROR: Module nouveau is in use.
03:59maxthecat_: so I follow the instructions in https://nouveau.freedesktop.org/wiki/KernelModeSetting/, stop lightdm service, echo 0 > /sys/class/vtconsole/vtcon1/bind, vtcon1 is frame buffer device, then my system stops working.
03:59maxthecat_: Can anybody help me fix it? I use Linux 4.4 . Thanks.
04:00karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/3fa9d15e0a7f146e8d265c70cc889f9f/raw/1426aa59a94d136d84509d61acbfb6963dfb8607/gistfile1.txt
04:02karolherbst: looks like use after free
04:03karolherbst: somehow looks like fallout from my mt fixes to be honest
04:46imirkin: maxthecat_: please elaborate on "stops working"
07:22maxthecat_: imirkin: After entering 'echo 0 > /sys/class/vtconsole/vtcon1/bind', I was unable to enter commands or switching terminal. have to reboot.
08:00mtf8: so I've got my dual-dvi GeForce 610 working nicely using the free nouveau driver and I have not dl'd, extracted, and placed the non-free firmware on my disk under /lib/firmware/nouveau. I'm trying to understand if I care...
13:36karolherbst: interesting, it has nothing to do with my mt fixes
14:16karolherbst: imirkin: nice.. I found an application which causes a hw channel kill in less than a minute :)
15:03karolherbst: imirkin: okay, I think I know what the cause of the memory corruption is: at some point we resize the text area, but the pushbuffer still has references to it. With bad luck, we deleted it, but the internal list still has the pointer to it (pushbuf_flush frees, pushbuf_validate accesses)
15:04karolherbst: and this happens inside the same thread
15:06karolherbst: mhh, seems like we don't del bos with a refcnt of != 0, so maybe we just forget to take a ref somewhere
15:10karolherbst: I made the initial text size 1 << 15 instead of 19, now it crashed in under a minute :)
15:12karolherbst: mhh, but there are like 40 active contexts... maybe it is indeed mt related?, wondering
15:14karolherbst: also, with all the fixes, GLES3 deqp: Failed: 4/42653 (0.0%)
16:04imirkin: karolherbst: hmmm ... the codesize thing is tricky. iirc i ran into that bug and already fixed it :)
16:04imirkin: er, code page
16:04karolherbst: imirkin: ahh, so you have a patch but it isn't merged yet?
16:05imirkin: no, pushed ages ago
16:05karolherbst: mhh, seems like there are more issues :)
16:06karolherbst: mhh, I guess there is more to it
16:06imirkin: i wonder if you need a nouveau_pushbuf_validate in there
16:06imirkin: but that did fix the issue i was having
16:06imirkin: which was with bindless + dirt showdown or whatever
16:07karolherbst: yeah.. I mean I only found that one through chromium running the WebGL cts for over an hour
16:07imirkin: iirc skeggsb and i decided that this was enough at the time though
16:07karolherbst: there is some weird stuff going on though
16:07imirkin: so perhaps something else odd
16:07imirkin: the way libdrm works is very confusing.
16:07imirkin: which is one of the reasons i want to kill it off
16:07karolherbst: there are tons of channel creations and destroys
16:09karolherbst: but that shouldn't really matter as the pushbuf is used by all contexts
16:10karolherbst: imirkin: after reading some of the code today, it seemed like we could make the libdrm much much simpler if we just assume one bo/pushbuf/whatever is only ever used in one thread and just simplify things towards that (or well, reimplement the API or whatever into a second version of something)
16:37imirkin: karolherbst: ideally i'd like to import the replacement into mesa
16:37imirkin: since it's all very tightly integrated anyways
16:38imirkin: libdrm_nouveau's API is appropriate for much simpler use-cases like the one in the ddx
16:46imirkin: airlied: any clue what this "lock vga" stuff is about? https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/disp/vga.c#L127
17:21karolherbst: imirkin: so, it seems like that bo gets deleted on the next flush, but it's still in the bctx->pending list
17:21imirkin: coz it's ref'd
17:21imirkin: i.e. pushbuf_refn
17:21karolherbst: well, a nouveau_bufref is in the pending list with a reference to it
17:22imirkin: er, hm. right.
17:22imirkin: we have a bufctx bin for all that stuff
17:22imirkin: do we not clear it out / remove that one from it?
17:22imirkin: that'd be bad.
17:23karolherbst: you mean the reference to the text bo?
17:23imirkin: er, yes
17:23imirkin: hold on
17:24imirkin: ok, so right after we do
17:24imirkin: nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_3D_TEXT);
17:25imirkin: and dump the new text bo in
17:26imirkin: so i think the current logic should work ok...
17:26imirkin: at least so far as i understand it.
17:26karolherbst: I don't think the problem is that we don't keep a reference, we just never unref the old bo
17:27imirkin: why not
17:33karolherbst: ohh, no, you are right. The bo gets deleted because its refcnt is 0, but it still seems to be used somewhere
17:33karolherbst: or uhm, at least my statement wasn't correct
17:34imirkin: i wonder if we need to add a pushbuf_validate to go with the PUSH_REFN
17:34imirkin: before the bo delete
17:34karolherbst: yeah, maybe
17:35imirkin: i'm gonna go play with ppc for a bit. good luck :)
17:37karolherbst: imirkin: doing a pushbuf_validate right after the PUSH_REFN deletes the bo as well :)
17:39imirkin: ok, we have a work thing
17:39imirkin: fence_work thing, rather
17:39imirkin: that allows you to delay such things
17:45karolherbst: okay... the bo is refed _after_ it gets deleted
17:46karolherbst: in nvc0_blit
17:48karolherbst: nv50_miptree(pipe_blit_info.src.resource)->base.bo is the text bo
17:48karolherbst: or at least the address is the same
17:55imirkin: that can very easily happen
17:55imirkin: you free() and then malloc(), you get the same address
17:59karolherbst: only difference, there was no new bo created with the same address
18:00karolherbst: ohh wait, got my printfs wrong again
18:03karolherbst: okay, I can confirm there is no new bo with the same address until the corruption appears
18:05karolherbst: or... not always
18:14imirkin: crap. i think my g5 has died =/
18:17karolherbst: imirkin: how to unref a bo from a bufctx?
18:17imirkin: no clue
18:18imirkin: i mean bufctx_reset() for the most part
18:19karolherbst: nvc0->bufctx_3d keeps the reference
18:19karolherbst: to the old text bo
18:20kisak: karolherbst: that sandybridge / fermi 525M the other day ... so far the 525M with nouveau is acting subjectively better than the intel chip with the few games I tried to run. marginally better framerate and less/no rendering artifacts vs intel
18:21karolherbst: kisak: huh? how's that possible
18:21karolherbst: the intel driver ain't that bad
18:22karolherbst: kisak: sure it uses the intel driver?
18:22karolherbst: not that it does like.. software rendering
18:22kisak: sure, it's using the gpu in both cases
18:22karolherbst: how sure?
18:22imirkin: kisak: SNB is a pretty shitty chip
18:23imirkin: (in terms of graphics, cpu is fine)
18:23karolherbst: imirkin: but compared to a 525m with default clocks?
18:23kisak: glxinfo vs DRI_PRIME=1 glxinfo sure
18:23imirkin: well, more features could be getting used with the fermi
18:23imirkin: kisak: if you want, there's a branch for fermi reclocking
18:23karolherbst: it might be that the intel chip is busy with the desktop though
18:23imirkin: however it didn't work for me
18:24imirkin: but i had a display attached, which increases the complications
18:24karolherbst: there are some issues with higher clocks on fermi
18:24imirkin: kisak: https://github.com/skeggsb/nouveau/tree/devel-clk
18:24imirkin: karolherbst: i used precisely the same GPU as skeggsb did for developing :)
18:24imirkin: same Quadro 400 board. except apparently somehow different.
18:25karolherbst: imirkin: there are magic clock ranges where you need to do things differently, might be that you had slightly higher clocks or something :)
18:25imirkin: karolherbst: in theory it's the same board, should have the same everything
18:25imirkin: i don't think we ever compared vbios's precisely though
18:26karolherbst: in practice you also have that weirdo speedo value which might affect the highest clock selected, etc...
18:26karolherbst: it's already a thing on fermi
18:26karolherbst: just a question whether that was tested before or after my voltage patches landed
18:26karolherbst: anyway, there are always odd things going on
18:27kisak: karolherbst: main issue with the sandybridge was broken alpha transparency on some textures with wined3d9->ogl which are fine on the fermi. not really expecting decent perf anywhere anyway
18:27karolherbst: yeah, intel can't do that
18:27karolherbst: I mean, gallium nine
18:27karolherbst: so nouveau uses gallium nine, which is a totally different path to begin with :)
18:27karolherbst: and usually faster
18:28imirkin: if you have the nine stuff set up
18:28kisak: sandybridge also somehow has input lag with a steam controller on a native game that feels fine with nouveau
18:29imirkin: well, enjoy =]
18:29kisak: I don't have gallium nine setup yet
18:29imirkin: if you do run into issues, make an apitrace and report them
18:29imirkin: in general, nouveau's fairly standards-conforming
18:32kisak: I was trying to run opengl deqp tests yesterday, but eventually gave up from segfaults for sandybridge and kernel lockups with fermi, testing ended with not being able to run mustpass lists from an unexpected int returned in a transform feedback test
18:32kisak: ^just to play around with the test suite
18:33karolherbst: imirkin: ohh right, GLES3 is fail free (except those preprocessor tests) :)
18:33imirkin: kisak: yeah, those tests do weird things
18:33karolherbst: the deqp tests
18:33imirkin: for regular software, it's pretty good though
20:48imirkin: rhyskidd: do you have push access to xf86-video-nouveau?
20:48rhyskidd: i don't think so,
20:48rhyskidd: i have to a bunch of other mesa and xorg stuff though
20:49rhyskidd: am i right that xf86-video-nouveau never moved to gitlab fdo?
20:50kisak: looks like it's not on gitlab
20:51imirkin: rhyskidd: what tree is that "radeon" commit in?
20:57imirkin: rhyskidd: the radeon code just stops calling that function entirely
21:02rhyskidd: but they call it at least once where xf86_reload_cursors was called, within their gamma function -- nouveau just never does that second one
21:03imirkin: have you tested it?
21:04rhyskidd: only on a tesla
21:04imirkin: did it work? =]
21:05imirkin: ok, i'll try it out locally as well
21:05imirkin: assuming all goes well, will push it out
21:05imirkin: we need to cut a new release too
21:06rhyskidd: i'm looking at the other two new warnings that were introduced building against xorg-server 1.20, they look legit
21:27karolherbst: imirkin: okay... I think in order to do it correctly, we would need to validate the pushbuffer with each bufctx_3d and bufctx_cp attached from each context
21:28karolherbst: well, nouveau_pushbuf_validate moves all items from the pending to the current list and a flush should do the trick then
22:00karolherbst: imirkin: do you think we need those "BCTX_REFN_bo(nvc0->bufctx_3d, 3D_TEXT, flags, screen->text); BCTX_REFN_bo(nvc0->bufctx_cp, 3D_TEXT, flags, screen->text);" calls?
22:00karolherbst: because removing those solves the issue
22:02karolherbst: mhh, but then we never ref the text bo :/
22:47karolherbst: but nouveau + 970m runs quite well in general :)
23:14imirkin: karolherbst: until those buffers get evicted, that works fine
23:14imirkin: rhyskidd: send patches to list if you want them reviewed by me
23:15karolherbst: imirkin: what are you referring to precisely?
23:16imirkin: karolherbst: well, the whole reason why we do this buffer tracking stuff, is so that ttm can ensure that buffers are in vram/gart
23:16imirkin: while the code is executing
23:16imirkin: er, while the commands are executing
23:16imirkin: i.e. until a fence is reached
23:23karolherbst: imirkin: ohh btw, we might have a nice solution for that pid/name issue on dead channels. We just copy the name once the channel was created
23:24imirkin: yeah, i had a temp patch to do that
23:24imirkin: passing it through everywhere is a _giant_ pain
23:25karolherbst: imirkin: ohh, cool. I've discussed this with skeggsb and I think this was his preferred solution as well
23:25karolherbst: or at least one simple trick we can do
23:25karolherbst: and which would ease our pain quite a lot
23:25karolherbst: imirkin: regarding the text bo, right. The thing is just, that we have several objects which store the reference to that bo and I don't really see a way on how to manage that correctly. Maybe we shouldn't add those to the bufctx on context creation time but just ref those whenever we execute that shader and basically track it like we do for other resources?
23:27imirkin: that's the whole purpose of a bufctx
23:27imirkin: every time that you kick, you get a new bo list
23:27imirkin: the bufctx allows you to stick things onto that bo list every time
23:27karolherbst: okay, sure, but now we have the issue that we have like 40 contexts
23:27karolherbst: and each of those refs the text bo twice
23:28karolherbst: and apparently even after a kick, at least one of those still has the reference to the old text bo
23:28karolherbst: so at least it doesn't seem to work all that great for shared resources
23:29imirkin: hm, interesting point --
23:29imirkin: the text bo is a screen resource
23:30imirkin: but we're managing it in a per-context bufctx
23:30imirkin: that's doomed to fail
23:30imirkin: very interesting and relevant point.
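The mismatch described above (a screen-wide buffer tracked in per-context bins) can be illustrated with a toy model. None of this is the libdrm_nouveau API: `Bo`, `Context`, `Screen` and `resize_text` are hypothetical stand-ins, and the model assumes that a bin entry alone does not keep the buffer alive.

```python
class Bo:
    """Toy buffer object; `deleted` marks memory that has been freed."""
    def __init__(self, name):
        self.name = name
        self.deleted = False


class Context:
    """Stand-in for a per-context bufctx: bin name -> list of tracked buffers."""
    def __init__(self):
        self.bins = {}

    def refn(self, bin_name, bo):
        self.bins.setdefault(bin_name, []).append(bo)

    def reset(self, bin_name):
        self.bins.pop(bin_name, None)

    def stale(self):
        """Buffers this context would still put on its bo list after deletion."""
        return [bo for bos in self.bins.values() for bo in bos if bo.deleted]


class Screen:
    """Holds the one text buffer shared by every context."""
    def __init__(self):
        self.text = Bo("text-old")


def resize_text(screen, resizing_ctx):
    """Swap the screen-wide text buffer, fixing up only the resizing context."""
    old, screen.text = screen.text, Bo("text-new")
    resizing_ctx.reset("3D_TEXT")
    resizing_ctx.refn("3D_TEXT", screen.text)
    old.deleted = True  # assumption: a bin entry alone does not keep the bo alive
    return old
```

With two contexts that both track the text buffer, a resize triggered from the first leaves the second still holding the old, deleted buffer. That is the shape of the problem discussed here, scaled down from roughly 40 contexts to two.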
23:30karolherbst: mhh, with my approach to get each context its pushbuf we could eventually just ref the old bo on each bufctx of each context :/
23:30karolherbst: dunno if that would work out
23:31karolherbst: will add locking pain at least
23:31imirkin: so the reason that the text is screen-wide is that it introduces a stall to try to just change it
23:32karolherbst: currently the text bo is 512 kB by default... we could just increase it and never evict it .... :/
23:32karolherbst: uhm, resize
23:32karolherbst: but.. we have the same issue for the tls buffer as well I guess
23:32karolherbst: just unlikely that this will be a problem
23:33imirkin: this whole resizing is a "new" thing
23:33imirkin: the original logic had it at a fixed size
23:33imirkin: we added it to thread the needle between taking up too much vram for each program that touches GL and being able to have a large text segment for games
23:34karolherbst: maybe we could have it 2MB, but up to 0.5% of vram
23:34karolherbst: or something
23:34karolherbst: still fixed
23:34karolherbst: worst thing I noticed was bioshock infinite which caused 3 resizes
23:34karolherbst: would be 4MB
23:35imirkin: so ... remember that like some earlier tesla boards had 128mb
23:35imirkin: i realize this is nvc0 code, which could have different defaults for fermi+
23:35karolherbst: or we just say 0.5MB up to 0.5%
23:35karolherbst: and if that's not enough, the workload is probably too heavy anyway
23:36karolherbst: 0.5MB is the current default size afaik
23:36karolherbst: size is 1 << 19 for nouveau_bo_new, whatever that means in the end
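The numbers in this exchange can be checked with a little arithmetic. The sketch assumes each resize doubles the buffer, which is inferred from "3 resizes ... would be 4MB" and not confirmed by the code; the helper names are made up.

```python
INITIAL_TEXT_SIZE = 1 << 19  # 512 KiB, the nouveau_bo_new size quoted in the chat


def text_size_after(resizes, initial=INITIAL_TEXT_SIZE):
    """Size after a number of resizes, assuming each resize doubles the buffer."""
    return initial << resizes


def vram_share(vram_bytes, per_mille=5):
    """0.5% of VRAM in bytes (5 per mille), using integer arithmetic."""
    return vram_bytes * per_mille // 1000
```

Three doublings from 512 KiB give 4 MiB, while 0.5% of a 128 MiB board is 671088 bytes (about 655 KiB). On such a board the 0.5% cap would sit only slightly above the current 512 KiB default.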
23:40karolherbst: mhh, but that is annoying the more OpenGL applications you have running
23:48karolherbst: those CTXSW_TIMEOUTs though :/
23:49karolherbst: if there would be _any_ way to debug those
23:50karolherbst:should really work on the shader trap handler at some point, could be useful