03:31karolherbst: is it normal, that someone has to wait quite some time until one gets a response to a xorg-server bug?
03:35airlied: there may or may not be anyone looking at xserver bugs
03:53airlied: skeggsb, imirkin , mlankhorst : what stops multiple threads from hitting nouveau_mm.c in the mesa drivers?
03:55karolherbst: airlied: the thing is, there is an out of bound access to a global array and I don't think I may be able to debug this, because the function where my server crashes is just called like always
03:57airlied: karolherbst: stick the bug number in here, I'll look tomorrow maybe
03:57karolherbst: airlied: https://bugs.freedesktop.org/show_bug.cgi?id=91316
03:57karolherbst: thanks a lot
03:58karolherbst: its just annoying, because my X server crashes like all the time now :/
03:58mlankhorst: airlied: pain and suffering?
03:58mlankhorst: I think that one's mostly per context though
04:00airlied: mlankhorst: they are part of the screen
04:00airlied: which is shared amongst contexts
04:00airlied: karolherbst: does not having AutoAddGPU FALSE help?
04:01karolherbst: airlied: no
04:01airlied: I'm not sure why you see GPU removals for card1
04:01airlied: that sounds like a kernel bug
04:01karolherbst: moving the modesetting driver helped though, but that's no stable solution
04:01karolherbst: airlied: removed nouveau driver
04:01karolherbst: its normal
04:02karolherbst: or maybe still wrong, but I guess its fine that this happens after the nouveau kernel module is unloaded
04:02airlied: karolherbst: you removed nouveau while X is running?
04:02karolherbst: yeah, why not?
04:02karolherbst: I have an intel card
04:02karolherbst: laptop with hybrid graphics
04:02airlied: it's just probably not a tested path
04:03karolherbst: its unrelated though
04:03karolherbst: because it also happens without nouveau being loaded once
04:03airlied: either way I don't think you have an X server problem
04:03airlied: more likely an intel driver one
04:03karolherbst: its just, that runpm doesn't work and I don't want to have my nvidia card be on all the time
04:04airlied: it won't turn off by not loading the driver afaik
04:04karolherbst: I also have bumblebee installed and stuff
04:04karolherbst: nouveau just began to work since 4.1
04:05karolherbst: for me
04:05airlied: so you ran it under gdb and it crashed? and you got that backtrace?
04:05airlied: you should probably attach the whole gdb output
04:05karolherbst: I have also coredumps
04:05airlied: because what you've attached doesn't help
04:06karolherbst: mupuf seems to have the same issue
04:06karolherbst: just got the message in #intel-gfx
04:07airlied: yeah as I said more likely an intel bug
04:07airlied: hopefully they can figure it out
04:07karolherbst: would a coredump help with a Xorg binary with debug symbols?
04:07airlied: a live gdb capture would be best
04:07airlied: but with all the info in it
04:08mupuf: karolherbst: what bug are you talking about? The one about the X server failing to boot?
04:08karolherbst: mupuf: random crashes
04:08mupuf: oh, then I do not see the same bug
04:08karolherbst: mupuf: "[13:04] <ickle> mupuf: karolherbst has your random crash bug"
04:08airlied:zzz time &
04:08mupuf: kwin randomly freezes, but that is likely due to the weird mesa I am using
04:09mupuf: but the xserver has a corruption at boot time
04:09mupuf: more like a ton actually
04:09mupuf: and I have been trying to fix it for some time
04:09karolherbst: mupuf: like this one: https://bugs.kde.org/show_bug.cgi?id=342500
04:09mupuf: well, fix the one that annoys me
04:10mupuf: well, never experienced this one
04:11karolherbst: mupuf: so you don't have any crash in FlushAllOutput?
04:11mupuf: I do have this crash in FlushAllOutput
04:11karolherbst: so do I
04:11mupuf: ok, but I only get them at startup
04:11karolherbst: I see
04:12karolherbst: may be the same reason though
04:12mupuf: Tried going back in time in all the projects
04:12mupuf: xserver, ddx and mesa
04:12karolherbst: this happens for a week or so
04:12karolherbst: but really don't know what really changed it
04:12karolherbst: maybe gcc-4.9.3?
04:12mupuf: using arch too?
04:12mupuf: nah, I used gcc 5.1
04:13karolherbst: I see
04:13karolherbst: but since when does it happen for you?
04:13mupuf: a week
04:13mupuf: that's interesting
04:13karolherbst: I check what I've updated since then
04:14mupuf: let's move this discussion to #intel-gfx
04:53chewitt: what's the correct command for starting the xserver? .. something like: /usr/bin/xorg-launch vt01
04:56chithead: "startx -- vt1" or something similar
05:17chewitt: turns out to simply be "Xorg" :)
05:17chewitt: spot anything obvious in this verbose?: http://sprunge.us/gEVX
05:19chewitt: sorry.. http://sprunge.us/ddGi
05:53chewitt: googling.. "xf86OpenConsole: setsid failed: Operation not permitted" seems to be a failure to open a device?
05:58neoraider: chewitt, -19 is ENODEV, maybe the Nouveau kernel module is not loaded correctly, or your GPU is not supported yet?
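For reference, the mapping from the `-19` in the Xorg log to `ENODEV` can be checked with a small Python sketch. The helper name `describe_drm_error` is made up for illustration, and the numeric value is the Linux one; other platforms may number errno codes differently.

```python
import errno
import os


def describe_drm_error(code: int) -> str:
    """Translate a negative kernel-style return code into its errno name and message."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{code}: {name} ({os.strerror(number)})"


# The code reported by "(EE) [drm] Failed to open DRM device ...: -19"
print(describe_drm_error(-19))
```

On Linux this should report `ENODEV`, which fits the suggestion that no usable device was exposed by the kernel module.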
05:58chewitt: nv46 .. an oldie
05:59chewitt: I have nomodeset in kernel params as without.. it locks up the box
06:00chewitt: imirkin was telling I can rmmod nouveau, then modprobe to test
06:01chewitt: or do I need to rmmod all the dependencies as well?
06:01chewitt: any special order to load things?
06:12pq: chewitt, nomodeset basically disables nouveau, so
06:13chewitt: so the unload/reload is cancelled out by this?
06:17pq: I suppose you need to somehow undo the nomodeset setting when loading again
06:18pq: I'm just saying that nomodeset is a likely cause for the ENODEV
06:19xexaxo: with nomodeset we load the module but we don't error out.
06:19pq: xexaxo, when X starts?
06:19chewitt: I need to figure out a way to stop Xorg from starting on boot... as when Xorg starts the box locks up and I can't access via SSH to debug further
06:20xexaxo: pq: I won't start as the ddx cannot communicate with the module.
06:20xexaxo: the module is in a "don't care, haven't touched anything" state.
06:20chewitt: if I can stop it starting I can start manually with output on SSH console
06:20xexaxo: s/I/It/ even
06:20xexaxo:checks what the intel driver does in case of nomodeset
06:20chewitt: can it be changed once the OS is running?
06:21pq: xexaxo, yeah, we're talking about the same thing :-)
06:22xexaxo: chewitt: there might be something in /sys/module but I doubt it
06:22xexaxo: pq: oops :)
06:22chewitt: I have to go afk for a while (kids) .. I will be back in a couple of hours for more pain :)
06:22xexaxo: there is another approach - blacklist nouveau on boot and modprobe at a later stage
06:23xexaxo: this way you can drop the nomodeset
06:33xexaxo: the intel kernel module does the same fwiw. it has nice comment though - Silently fail loading to not upset userspace :)
06:51imirkin: chewitt: rmmod/insmod should work. insmod doesn't pass the kernel cmdline in. although... hm. nomodeset might still make it through, but nouveau.modeset=0 wouldn't.
06:52imirkin: airlied: nouveau + multiple concurrently used contexts = mega fail. in *so* many places. nouveau_mm is the *least* of your concerns.
06:54xexaxo: imirkin: looked at Tom's threadsafe driver/wrapper ? From a quick look it doesn't seem like it would work for nouveau but who knows.
06:55imirkin: xexaxo: it mostly would
06:56imirkin: in the past when i've proposed such a thing people were unhappy about the added mutex overhead for the 99.9999% non-concurrent case.
06:58imirkin: even though the fastpath shouldn't be that bad
06:58imirkin: mlankhorst also had a patch to clean up a bunch of the things, but far from everything
06:58xexaxo: hmm... iirc your locking was at a different level, wasn't it ? perhaps it would have had the same impact though - don't think I've looked at the patch.
06:59imirkin: yeah, my locking was just on nv50_vbo
06:59imirkin: i.e. way less locking ;)
07:00mlankhorst: imirkin: yeah gave up on it for now
07:00xexaxo: that's what I recall as well
07:00imirkin: mlankhorst: but you hated the mutex idea coz of the overhead yes?
07:01imirkin: even though mutexes have a fastpath...
07:01mlankhorst: imirkin: won't help if one thread waits for the other to complete its download of a pixmap
07:02imirkin: mlankhorst: majority use-case is single-context-at-a-time though
07:02imirkin: the 2 simultaneously used context thing is completely broken right now, so i don't really care if we make it "slower"
07:02imirkin: i just want to keep the currently-working things as fast as they are
07:56chewitt: using nouveau.modeset=0 and rmmod/insmod followed by Xorg -verbose.. the "xf86OpenConsole: setsid failed: Operation not permitted" is gone
07:57chewitt: still see "(EE) [drm] Failed to open DRM device for pci:0000:01:00.0: -19" though
07:57imirkin: chewitt: pastebin dmesg
07:58imirkin: you seem generally competent at debugging this stuff, but if you just show these tiny snippets, i won't be able to get the full picture. so please include full logs, both dmesg and xorg.
07:59imirkin: if you're having trouble retrieving these logs, then that's by far and away the first thing you should be addressing
08:05chewitt: Xorg.0.log: http://sprunge.us/EiXX
08:06chewitt: dmesg after a reboot.. buffer has overflowed
08:13imirkin: and/or use netconsole
08:14night199uk: q: for the scripts included in the BIOS tables (and other places) for nvidia is there a standalone parser?
08:14imirkin: night199uk: nvbios
08:14night199uk: i think envytools can parse the ones that are actually in the bios
08:14night199uk: imirkin: can parse a script from elsewhere? let me check
08:14imirkin: mmmmm.... maybe not
08:15imirkin: but... i think that things like SUBDIRECT and whatnot take an absolute offset into the bios, not a relative one
08:16imirkin: so the code won't make much sense if you don't know where it starts
08:16imirkin: HOWEVER, i think you can tell it to decode a script that starts at location x
08:16imirkin: and nothing prevents you from saying that x = 0
08:16night199uk: let me check out the source
08:16night199uk: see if it can easily be adapted
08:16imirkin: i.e. nvbios -i 0 foo.script
08:16night199uk: i’m manually doing them by hand for short scripts right now
08:16night199uk: oh, interesting
08:17night199uk: let me give that a go
08:17imirkin: but if there are call or subroutines, then it won't know to decode those
08:17imirkin: or perhaps it will, but it will do a bad job
08:17night199uk: i imagine not, given the source
08:19imirkin: hmmm... i guess that might not work for a few reasons
08:19imirkin: first off, offset 0 doesn't work
08:19imirkin: secondly, it still assumes that it's a vbios and will do the other vbios decoding
08:19imirkin: try -i 1 and see how far it gets
08:19night199uk: i can use different offsets
08:20night199uk: maybe pass in a .efi image with the offset of the script if that’s known
08:20night199uk: let me try a few others, otherwise maybe i can hack the source :-(
08:20imirkin: it does look like it works -- i just fed it a random file with -i 1, and it decodes the "script" just fine
08:20night199uk: actually i can’t pass it a .efi, damn
08:20night199uk: okay, let me try a few things - thanks
08:21imirkin: just insert a random byte at the front of your file, and use -i 1 ;)
08:21night199uk: heh, the script isn’t stored linearly :-(
08:22imirkin: that's the issue i was talking about... the CALL's take absolute offsets
08:22night199uk: can extract it to a .bin and work on it though, i’ve got enough to get there i think :-)
08:22night199uk: no, more like its populated on the stack
08:22night199uk: mov [rsp+168h+Script], 6Eh
08:22night199uk: that sort of stuff ;-)
08:22imirkin: sure, i just meant for the benefit of the nvbios parser
08:22night199uk: so i need to pull out the instructions
08:22imirkin: not "for real"
08:23night199uk: this will be a big help from where i am now though
08:23night199uk: tx :-)
08:23night199uk: i notice nouveau & envy don’t implement all instructions i see btw
08:23imirkin: but yeah, normally these vbioses start with some x86 opcodes anyways
08:23imirkin: and then there's a table that tells you where the real vbios stuff is
08:23imirkin: nvbios parses all that...
08:23imirkin: PCIR? something like that.
08:23night199uk: yeah, PCIR header
08:24night199uk: i have a 010editor script that does all that stuff
08:24imirkin: so your scripts don't actually start at offset 0
08:24imirkin: at least not the scripts that nvbios parses
08:25night199uk: well these scripts are stored in .efi images anyway
08:25night199uk: which are relocatable
08:25night199uk: so i imagine the code in them avoids subs and calls
08:25imirkin: ... not relevant
08:25night199uk: or uses relative addressing where needed
08:25imirkin: the relocatability is in reference to the x86 code
08:26night199uk: actually they’re always called as offset 0
08:26imirkin: not in reference to the vbios tables
08:26night199uk: yeah, i just realised that as well
08:26night199uk: but they’re always run with PC=0 at init, thinking about it
08:26imirkin: offset 0 of the vbios will always be some x86 code that will execute the actual tables
08:26night199uk: so all offsets would be relative to 0
08:27imirkin: you may want to study the nouveau execution logic to gain a better understanding of what these things do
08:27night199uk: i have the EFI execution VM
08:27night199uk: but yeah the nouveau one i used quite heavily to figure that out
08:28imirkin: e.g. this is the jump instruction:
08:28imirkin: note how the offset is an absolute one, and relative to the start of the vbios
08:28imirkin: [which includes the x86 to execute it, etc]
08:29night199uk: relative to the start of the vbios?
08:29imirkin: well sure... it's just a u16
08:29night199uk: i haven’t come across 0x5c in the efi images yet
08:29imirkin: the vbios is only ever 64kb
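To make the absolute-offset point concrete, here is a minimal sketch of decoding a jump. It assumes the layout is one opcode byte (0x5c, as named in the discussion) followed by a little-endian 16-bit target measured from the start of the image; that layout is an assumption for illustration, not a verified description of every vbios.

```python
import struct

INIT_JUMP = 0x5C  # opcode value mentioned in the discussion


def decode_jump(image: bytes, pc: int) -> int:
    """Return the absolute target of a jump instruction located at `pc`."""
    if image[pc] != INIT_JUMP:
        raise ValueError(f"no jump opcode at offset {pc:#x}")
    # Assumed layout: 16-bit little-endian offset right after the opcode.
    (target,) = struct.unpack_from("<H", image, pc + 1)
    return target


# Toy image: a jump at 0x10 that targets 0x20.
image = bytearray(64)
image[0x10] = INIT_JUMP
image[0x11:0x13] = (0x0020).to_bytes(2, "little")

target = decode_jump(bytes(image), 0x10)
print(f"jump at 0x10 -> {target:#06x}")
```

Because the target is not relative to the instruction, cutting a script out of the image and decoding it at a different base offset leaves such targets pointing at the wrong bytes. A 16-bit field is also enough to address a 64kb image in full.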
08:29imirkin: what about 0x5b?
08:29night199uk: i guess it would be avoided as the scripts encoded in here are not run from vbios at all
08:29imirkin: jump is a bit outmoded
08:29night199uk: neither 0x5b
08:30night199uk: let me see how the vm in the efi image executes 0x5b
08:30night199uk: not yet
08:31imirkin: that's about it for opcodes that manipulate the offset directly
08:31night199uk: so far most of the scripts are pretty short, i just found a bunch of longer ones hence the ask so lets see what they throw up
08:31imirkin: happy to send you some more complex vbioses :)
08:32night199uk: have you extracted just the .EFI ROMs from many?
08:32night199uk: if you have some interesting vbios’ that might be useful
08:32night199uk: only interested in UEFI BIOS tho
08:33imirkin: ah. i don't know anything about uefi or how it would be in any way different... but... who knows.
08:33imirkin: i guess they could have switched away from the vbios table model
08:33imirkin: and just do it all in code directly
08:33night199uk: well, UEFI implements a basic nvidia driver
08:33night199uk: nah, they still use the tables
08:34night199uk: but the UEFI driver has a bunch of scripts hard coded directly in code, too
08:34imirkin: well, it still has to modeset
08:34imirkin: and it still has to do a bunch of stuff
08:34night199uk: and they implement a script runner ‘VM’ almost exactly the same in working as the nouveau one
08:34imirkin: so the vbios scripts ought to be the same
08:34imirkin: however outside of *really* old cards, i've never seen a short vbios script
08:35night199uk: i’m not sure how much they leverage the vbios scripts yet though
08:35imirkin: anything after nv30 is sure to be quite long.
08:35night199uk: and how much they use internal mode setting scripts in the EFI image
08:35night199uk: haha, i can give you some :-)
08:35imirkin: right, that's what i'm saying -- they could have migrated away from using the vbios tables ;)
08:35imirkin: which of course has nothing to do with efi
08:35night199uk: i’ll tell you when i get it finished
08:35imirkin: and why they'd use different logic for efi and non-efi is a bit odd to me
08:36imirkin: i guess maybe efi runs in 32-bit real mode or some junk like that?
08:36night199uk: the ones encoded in the EFI image may just be fallback
08:36imirkin: [is there such a beast?]
08:36night199uk: nah, this must be a DXE driver i guess so it’s 64-bit
08:36night199uk: didn’t think about it too much but it’s 64-bit code
08:37imirkin: ok, so... not 16-bit real mode ;)
08:37night199uk: its at least 32-bit flat mode, i think probably 64-bit
08:37imirkin: that may explain why they opted for a diff mechanism
08:37imirkin: vbios is 16-bit real mode
08:38night199uk: yeah, old fashioned vbios?
08:38imirkin: well, option rom, really
08:39imirkin: anyways, bbl
08:39night199uk: np - thanks for the help :-)
08:43tobijk: mh somebody familiar with Xserver internals? i have a reproducible segfault :/
08:45tobijk: imirkin: we are "lucky" its not nouveau after all, nouveau just complains about the xserver dying :)
09:14chewitt: imirkin: dmesg of modprobe after booting with nouveau blacklisted: http://pastebin.com/9nAYW8D6
09:15not_karolherbst: at least I can post here
09:16chewitt: the modprobe is at/about line 42
09:16chewitt: you can see wmi loaded
09:26imirkin__: chewitt: that's it?
09:26chewitt: beyond that point the box has locked up
09:27imirkin_: can you add 'config=NvMSI=0' to the insmod line?
09:27chewitt: it's not the full dmesg.. but there is nothing much of interest earlier
09:27imirkin_: i.e. 'insmod nouveau config=...'
09:28chewitt: insmod or modprobe?
09:28imirkin_: [also, why is it debugging at the trace level?]
09:28imirkin_: or adjust your modprobe.conf
09:28imirkin_: modprobe doesn't let you pass options on the cmdline
09:28imirkin_: despite having related functionality, the two tools are *quite* different
09:29imirkin_: oh wait, i see. you blacklisted, then did modprobe
09:29imirkin_: then you can stick a 'options nouveau config=NvMSI=0' to your modprobe.conf
09:29imirkin_: and use modprobe
09:31chewitt: I have the blacklist line and that in the same file.. I presume it makes no difference?
09:32chewitt: I got marginally more output http://pastebin.com/JJiXjSmb
09:32imirkin_: yea that's fine
09:32chewitt: then the box rebooted
09:32imirkin_: progress ;)
09:32imirkin_: btw, is there something funny about this box that you're not mentioning?
09:32imirkin_: like it's really a VAX or something?
09:33chewitt: mk1 AppleTV
09:33imirkin_: what does that translate to, in terms of hw?
09:34chewitt: 1GHz Pentium M CPU, 64MB VRAM, 256MB RAM, HDD.. 10/100 Ethernet, etc.
09:34chewitt: quite low spec
09:34imirkin_: and presumably that G72 is soldered on there?
09:35imirkin_: explains your obsession with kodi too :)
09:35chewitt: there's a small number of late shipping units with 128MB VRAM.. but the rest are fixed config
09:35chewitt: I historically "curate" the AppleTV build for OpenELEC
09:35chewitt: (if you heard of that)
09:35imirkin_: not really, sorry
09:36imirkin_: ah nice, just an appliance-ish distro
09:36chewitt: OE is the #2 method for running XBMC/Kodi
09:36imirkin_: i assume "directly" is the #1 method?
09:36chewitt: Windows (sadly)
09:36chewitt: but we overtook the Linux side of the house
09:37chewitt: we stopped shipping the ATV build recently as the boxes are under-size on RAM and have some issues
09:38chewitt: I always wanted to see how much lighter nouveau was than the nvidia blob we have been shipping
09:38imirkin_: that won't stop you! :)
09:38chewitt: we originally did use nouveau
09:38chewitt: but only as nvidia borked hdmi audio output for a while
09:38imirkin_: there's no (real) hdmi audio support on pre-nva3
09:38imirkin_: [in nouveau]
09:39imirkin_: sometimes it happens to work
09:39imirkin_: but it's purely coincidental
09:39chewitt: once I persuaded them to fix the driver I switched over to the blob.. and it was fine for a while
09:39chewitt: but we are starved on RAM
09:39karolherbst: chewitt: is the driver really so heavy?
09:39chewitt: once you play a few things and stuff is moved to swap on disk.. playback is okay
09:39karolherbst: or is RAM just really really low
09:39imirkin_: what do you do that takes up ram (and is gpu-related)?
09:40imirkin_: karolherbst: 256M ram, 64M vram
09:40chewitt: no idea.. as you've noticed I am not really a developer person :(
09:40karolherbst: whats the size output in lsmod about?
09:40imirkin_: chewitt: well, the thing is that xbmc likes to do crazy stuff
09:40imirkin_: like it'll use GL for all sorts of bs
09:40imirkin_: i bet if you nuked that, it'd all be fine
09:41Karlton: 256M is enough for kodi
09:41Karlton: you start it directly
09:41chewitt: but.. way (way) back I remember that the amount of RAM free under nouveau was much better than nvidia
09:41chewitt: so.. hello :)
09:41imirkin_: can you set up netconsole?
09:41imirkin_: i suspect there's more logs in there that you're not catching
09:42imirkin_: it doesn't make sense that a write to 0x12c4 would hang the box
09:42imirkin_: and it's not even executing, it's just reading the vbios
09:43Karlton: imirkin_: they call it Kodi now because it doesn't work on Xbox anymore
09:43imirkin_: yeah, but i call it xbmc because i hate pointless name changes
09:45chewitt: there were some issues with rights to the name xbox :)
09:46chewitt: but the holders were quite good about that
09:46Karlton: yes it isn't just to be less confusing, there is also some legal issues
09:49Karlton: the holder is Microsoft, so probably it was best that they changed it
09:52chewitt: also changed to ensure they hold the rights to the name correctly.. to prevent some of the abuse taking place with xbmc
09:53chewitt: rebuilding the kernel with netconsole enabled .. my MacBook is all setup to receive the stream :)
09:57Yoshimo: what kind of graphic chip is on the xbox , while we are at it?
09:57imirkin_: i.e. a GF4 Ti-style chip
09:59Yoshimo: and the playstation 4 has a radeon, not a surprise
10:01imirkin_: i was talking about the original xbox
10:02imirkin_: one of the playstations had a G70-ish chip too
10:02imirkin_: ps3 i think
10:03Karlton: and playstation 4 is FreeBSD in a proprietary DRM form
10:04Karlton: and they use their own graphic driver based on catalyst
10:30chewitt: can't seem to get netconsole working :(
10:31chewitt: added to kernel config.. but doesn't appear to be in the image I build
10:35imirkin_: you said =y right?
10:39chewitt: I did.. but instructions I found since say =m .. so changing that
10:39chewitt: I bumped libdrm at the same time.. running current now
10:43imirkin_: no, don't do =m... those instructions suck. they probably include some stupid configfs and dynamic config too right?
10:46chewitt: easy enough to change back
10:51imirkin_: you do have to supply the netconsole stuff on the cmdline though
10:51imirkin_: what arg did you pass in (exactly) -- it's very picky
11:02chewitt: it's in a .conf file, not in kernel boot params
11:02imirkin_: oh, that's why it didn't work
11:02imirkin_: it has to be in the kernel cmdline
11:03imirkin_: (if it's built-in)
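For a built-in netconsole, the kernel command line argument follows the pattern `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]` from the kernel's netconsole documentation. A small sketch that assembles such a string is below; the helper name, addresses, ports and interface name are all placeholders, not values from this setup.

```python
def netconsole_arg(src_port: int, src_ip: str, dev: str,
                   tgt_port: int, tgt_ip: str, tgt_mac: str = "") -> str:
    """Build a netconsole= kernel command line argument.

    An empty tgt_mac leaves the MAC field blank, in which case the
    kernel is expected to fall back to the broadcast address.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


# Hypothetical values: the box sends from eth0 to a listener on the LAN.
arg = netconsole_arg(6665, "192.168.1.10", "eth0", 6666, "192.168.1.20")
print(arg)
```

The receiving machine then only needs something listening for UDP on the target port.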
12:17imirkin_: karolherbst: which game was it that needed ARB_copy_image and didn't check for it? bioshock? witcher2?
13:14airlied: imirkin_: okay so I'd like to try and fix it, and objections about slow downs can be taken with a single extended digit
13:14airlied: correctness over speed
13:14airlied: I'd kinda like virgl to work on nouveau
13:15imirkin_: airlied: ok, well this is a multi-step process
13:15imirkin_: each step will be pointless until all the steps are completed
13:15airlied: find everything in screen, add a big lock :-)
13:16airlied: iterate until it no longer hangs my machine
13:16imirkin_: so... tom sent a virtual driver which did that
13:16imirkin_: but i think that's too drastic, and unnecessary
13:18imirkin_: anyways, there's currently no real program that has multiple contexts that it uses from diff threads at the same time. you're going to contend with bugs in every driver if you do that.
13:19airlied: imirkin_: I'm pretty sure vmware does
13:19airlied: we have tests you know
13:19airlied: you just don't run them
13:19imirkin_: coz they destroy the universe
13:20imirkin_: the non-multithreaded glx tests also destroy the universe
13:20imirkin_: so there are much deeper problems
13:20airlied: they don't on other drivers
13:20airlied: or at least they destroy a lot less
13:20imirkin_: that's coz their gpu drivers can recover
13:21imirkin_: nouveau can't
13:21airlied: no it isn't
13:21airlied: I don't get any gpu resets on radeonsi or cayman
13:21airlied: with a full piglit run
13:21airlied: to be honest most radeon resets fail badly
13:21airlied: I haven't had a successful reset in a long time
13:21imirkin_: ok, well the fact that their gpu's reset semi-properly definitely enables them to run and debug these things
13:22imirkin_: whereas when i'm on my desktop, i'm not going to debug tests that hang my box
13:22imirkin_: i'm just going to not run them :)
13:22imirkin_: i have enough tests to debug that don't hang my box
13:22imirkin_: anyhow, if you want to do the BKL solution, improve tom's thing, and add a flag to optionally throw it into the mix
13:23imirkin_: (tom's thing == the pseudo driver he sent recently which does precisely this)
13:24imirkin_: if you want to solve the issue for real inside nouveau let me know, and i can point out some issues to start fixing
13:25airlied: no I was just going to add a lock inside nouveau screen
13:25imirkin_: please don't do that
13:25imirkin_: use tom's pseudo-driver
13:25airlied: and anywhere a context does something to the screen, lock it
13:25airlied: I don't think that'll scale
13:25airlied: it'll at least be a lot harder to push down the lock with
13:25imirkin_: how so? it's the same thing, but outside the driver
13:26airlied: you generally lock data not code
13:26imirkin_: the solution long term is not to push down locks
13:26airlied: that thing locks code
13:26imirkin_: yeah, but there's a lot of implicit shit that goes on
13:26imirkin_: and it's really hard to lock properly
13:26airlied: I'd rather have locks attached to things that you can see what to lock
13:26imirkin_: the solution is to not have to lock it in the first place
13:26airlied: like the mm and vbo seem obvious
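The "lock data, not code" idea can be illustrated with a toy allocator that owns its own lock, so every caller is serialized no matter which code path reaches it. This is a generic Python sketch, not nouveau code; `SharedAllocator` is a hypothetical stand-in for a per-screen structure such as the mm.

```python
import threading


class SharedAllocator:
    """Hypothetical stand-in for a per-screen allocator shared by all contexts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # the lock lives with the data it guards
        self._next = 0

    def alloc(self, size: int) -> int:
        """Reserve `size` bytes and return the start offset of the reservation."""
        with self._lock:
            offset = self._next
            self._next += size
            return offset


def worker(allocator: SharedAllocator, out: list, count: int) -> None:
    # Each thread plays the role of one context allocating from the shared screen.
    for _ in range(count):
        out.append(allocator.alloc(16))


allocator = SharedAllocator()
results = [[] for _ in range(4)]
threads = [threading.Thread(target=worker, args=(allocator, results[i], 1000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

offsets = [o for r in results for o in r]
print(len(offsets), "allocations,", len(set(offsets)), "distinct offsets")
```

A wrapper that locks around every entry point would serialize the same calls, but the lock would then be tied to call sites rather than to the structure it protects.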
13:27imirkin_: ok, well big problem #1 is that all the contexts share a single pushbuf
13:27imirkin_: they should instead just each get their own
13:27imirkin_: and yeah, a BKL around the draw_vbo call is basically required
13:27imirkin_: unless you want to get into doing complex things
13:28imirkin_: keep in mind you're not just locking data structures, you're also "locking" card state
13:28imirkin_: and yeah, any long-term solution will require locking nouveau_mm =/
13:28imirkin_: unless we make one per context... hm
13:29imirkin_: but if you're just going to add a lock around every function in screen and resource, then just use tom's thing -- that's precisely what it does
13:32tobijk: imirkin_: are you aware of a build breakage with --enable-debug in mesa?
13:32imirkin_: i am not
13:32imirkin_: was it my bad?
13:32tobijk: i dont think so
13:33tobijk: maybe its just my llvm :/
13:33imirkin_: oh, if you have a svn llvm, that always breaks
13:33imirkin_: and is fully expected
13:33tobijk: nah its a "stable" one
13:34imirkin_: they change their api back and forth every other day
13:34imirkin_: pastebin error
13:34tobijk: i hoped for a missing build include ;-)
13:35airlied: imirkin_: okay so make pushbuf per context, and locks to mm/vbo, pray
13:35imirkin_: airlied: not quite enough
13:36imirkin_: airlied: even though the pushbufs will be per-context (yay) , they will still submit into the same logical command stream
13:36imirkin_: there are a few places outside of draw_vbo, which will have a mega-lock
13:36imirkin_: which submit significant amount of junk into the command stream
13:36imirkin_: specifically various resource copies
13:36imirkin_: those probably also will need to acquire a BKL
13:37airlied: into the command stream isn't into the pushbuf?
13:37imirkin_: sadly no
13:37hakzsam: tobijk, probably related to your llvm version
13:37imirkin_: each pushbuf is a user-space cache of commands
13:37imirkin_: when it gets kicked, it actually gets submitted to the hw
13:37airlied: yeah like all sane drivers
13:38airlied: what is submitting things to the hw that isn't a pushbuf?
13:38imirkin_: it doesn't just get kicked when you ask though
13:38tobijk: hakzsam: yeah, nice standard package *urgh*
13:38imirkin_: but my point is... a single logical set of commands can span multiple submits
13:38airlied: okay so in sane drivers we use a context
13:38airlied: for things that aren't the context
13:38airlied: so the blitter gets its own context
13:38airlied: transfers use their own
13:38imirkin_: well, in nouveau we use a single hw context for *everything*
13:38airlied: but I suppose that requires contexts not to be insanely broken
13:39imirkin_: i believe that's also what the blob driver does
13:39airlied: well it's fine to use a single hw context
13:39imirkin_: context switches aren't particularly cheap
13:39airlied: it's having separation above that
13:39imirkin_: ok, well the way a resource copy works is like "set dst address; write a bunch of data"
13:39imirkin_: that bunch of data might be more than you can stick into a single pushbuf submit
13:39airlied: okay so it should do that in its own sw context
13:40imirkin_: i dunno what a sw context is
13:40airlied: it's a pushbuf
13:40imirkin_: a user-side cache of a small quantity of commands?
13:40airlied: a user side cache of state
13:40airlied: that ends up in a pushbuf
13:40imirkin_: ah, no such thing exists for nouveau
13:40airlied: so two sw contexts can construct pushbufs independently and queue them on one hw context
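A rough model of that arrangement: each software context fills a private buffer without any locking, and only the hand-off to the shared hardware channel is serialized. The class names are invented for this sketch and do not correspond to any existing nouveau or libdrm API.

```python
import threading


class HwChannel:
    """Toy model of one hardware context receiving whole pushbufs."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.stream = []

    def submit(self, commands: list) -> None:
        # Only the hand-off is locked, so each pushbuf lands contiguously.
        with self._lock:
            self.stream.extend(commands)


class SwContext:
    """Toy software context that builds its own pushbuf independently."""

    def __init__(self, name: str, channel: HwChannel) -> None:
        self.name = name
        self.channel = channel
        self.pushbuf = []

    def emit(self, cmd: int) -> None:
        self.pushbuf.append((self.name, cmd))  # no lock: buffer is private

    def kick(self) -> None:
        self.channel.submit(self.pushbuf)
        self.pushbuf = []


def run(ctx: SwContext, rounds: int) -> None:
    for _ in range(rounds):
        for cmd in range(3):
            ctx.emit(cmd)
        ctx.kick()


channel = HwChannel()
contexts = [SwContext("a", channel), SwContext("b", channel)]
threads = [threading.Thread(target=run, args=(c, 100)) for c in contexts]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every group of three commands should come from a single context, in order.
chunks = [channel.stream[i:i + 3] for i in range(0, len(channel.stream), 3)]
intact = all(len({n for n, _ in c}) == 1 and [k for _, k in c] == [0, 1, 2]
             for c in chunks)
print(len(chunks), "submissions, intact:", intact)
```

The model only keeps a single submission contiguous. An operation that spans several submits, which is the concern raised about resource copies, would still need something extra to keep the other context from interleaving.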
13:41airlied: I wonder if I'd be quicker writing a vulkan driver :-P
13:41hakzsam: imirkin_, btw, backporting the edgeflag stuff from nvc0 to nv50 should fix the gl-1.0-edgeflag-* tests, right?
13:41imirkin_: you'd have to build something like that. i can't think of a particularly efficient way to do something like that
13:41imirkin_: hakzsam: if done correctly, yes :)
13:41hakzsam: imirkin_, correctly is the problem, indeed ;)
13:41airlied: imirkin_: being efficiently wrong isn't being efficient :)
13:43imirkin_: airlied: ok, well the current thing works for all reasonable applications.
13:43hakzsam: imirkin_, well, I'll try to do that too because I didn't make progress about the point-vertex-id fail yesterday
13:43airlied: imirkin_: it really doesn't, lots of real apps use contexts
13:43imirkin_: airlied: if there's a reasonable application for which this breaks, we can look into how to fix it properly
13:43airlied: they aren't games
13:43imirkin_: airlied: i believe this is all fixable by means that don't involve sw contexts
13:43imirkin_: airlied: multiple contexts work fine.
13:44airlied: imirkin_: they clearly don't
13:44imirkin_: airlied: concurrent calls from multiple GL threads doesn't
13:44airlied: that isn't working fine though
13:44imirkin_: but you can have a bunch of contexts and flip between them with glXMakeCurrent or whatever
13:44airlied: I'm not even sure this is what makes virgl fall over
13:44imirkin_: airlied: is virgl multithreaded?
13:44imirkin_: i.e. can there ever be a situation with 2 threads calling glFoo() ?
13:44airlied: qemu is, but I don't think multiple threads enter the renderer
13:45imirkin_: if not, then nouveau should work perfectly fine
13:45airlied: yeah it clearly falls over running piglit inside virgl
13:45imirkin_: any piglit in particular where it tends to die?
13:45imirkin_: like glx-bla or streaming-texture-leak or max-texture-size?
13:46airlied: inside virgl they work fine
13:46airlied: since virgl doesn't exactly leak glx into the host
13:46imirkin_: i have no idea why the glx tests die
13:46imirkin_: but i suspect it has little to do with them being glx-specific
13:46imirkin_: certainly the glx-multithreaded one explicitly does bad things
13:46imirkin_: but other ones also cause death
13:47airlied: imirkin_: can you run piglit without glx but without -1?
13:47imirkin_: that destroys the world
13:47airlied: so that seems wrong, for a card that has hw context switching
13:47imirkin_: yeah, but i'm in no position to debug it
13:47imirkin_: *most* of the tests run fine with that
13:47airlied: esp when gnome-shell and glamor in the future
13:47imirkin_: but anything that interacts with X tends to break
13:48imirkin_: yeah, you see why i'm not keen on dumping xf86-video-nouveau? :)
13:48airlied: I wonder if I should divert hans' attention into this if he has time
13:49imirkin_: in any case, i don't want anything to have to do with a gnome-shell future :p
13:49airlied: well really X should have the same problem
13:49airlied: since EXA causes context switches
13:50imirkin_: it does
13:50imirkin_: that's why piglits that touch X crash when run in parallel
13:50airlied: does piglit with gbm work okay?
13:50imirkin_: i assume so, but i haven't checked
13:50imirkin_: some tests leak out of the gbm sandbox btw
13:50imirkin_: and do X things
14:24Grinchier: hello all, i had a weird issue last night, was unable to wake monitor from suspend. Here is my log.
14:25Grinchier: I was hoping someone could help me understand what any of this means.... http://dpaste.com/2GWXK6X
14:26Grinchier: it ends with about 150,000 lines of nouveau E[ PFIFO][0000:01:00.0] PBDMA0: ACQUIRE
14:27Grinchier: PFIFO][0000:01:00.0] PBDMA0: ch 2 [Xorg] subc 0 mthd 0x001c data 0x00001004
14:31Grinchier: anyone? PGRAPH][0000:01:00.0] ROP0 0x80000000 0x80000001
14:31Grinchier: woop http://dpaste.com/2GWXK6X
14:45Grinchier: I added to this bug report, although I am using ubuntu not fedora lol https://bugs.freedesktop.org/show_bug.cgi?id=72315
14:49imirkin_: Grinchier: unfortunately such issues are very hard to debug
14:49imirkin_: Grinchier: among other things it's a good idea to mention what hardware you have
14:49imirkin_: nouveau supports 15 years worth of different hardware
15:16Grinchier: imirkin_: gtx 650 ti
15:16Grinchier: VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 650 Ti] (rev a1)
15:40Grinchier: imirkin: sorry i lost my connection
15:41imirkin_: happens to the best of us
15:41imirkin_: unfortunately i don't have a whole lot to add to your issue
15:41imirkin_: i'm unfamiliar with a lot of the low-level details of how the gpu operates
15:53Grinchier: hopefully it was just a one time thing
23:26chewitt: right.. so.. in order to capture lockup stuff I built netconsole into the kernel
23:27chewitt: oddly.. if I build with =y the module doesn't exist.. but =m and it does
23:27chewitt: [ 7938.073797] netconsole: Unknown symbol __netpoll_cleanup (err 0)
23:27chewitt: [ 7938.073829] netconsole: Unknown symbol netpoll_parse_options (err 0)
23:27chewitt: [ 7938.073851] netconsole: Unknown symbol netpoll_setup (err 0)
23:27chewitt: [ 7938.073865] netconsole: Unknown symbol netpoll_send_udp (err 0)
23:27chewitt: [ 7938.073894] netconsole: Unknown symbol netpoll_cleanup (err 0)
23:27chewitt: modprobe: ERROR: could not insert 'netconsole': Unknown symbol in module, or unknown parameter (see dmesg)
23:27chewitt: modprobe netconsole firstname.lastname@example.org/eth0,email@example.com/80:e6:50:14:ef:8e
23:28chewitt: nothing wrong that I can see in the params .. so must be something in the kernel, or..?
23:28pq: chewitt, well, yes, if you build with =y, it is built into the kernel and there is no module. :-)
23:28pq: and no need to modprobe it as it is already loaded
23:29chewitt: I guess you understand my n00b level now :)
23:29pq: and because it loads at kernel boot, you need to configure it on the kernel command line, otherwise it is too late
23:29chewitt: either way it doesn't appear to work with =y .. nothing being sent
23:30imirkin: additionally modprobe doesn't take module options on the cmdline
23:30imirkin: insmod does though
23:30chewitt: ahh.. ok
23:30imirkin: oh hm. maybe it does.
23:30imirkin: also, my experience is that that's overly specific...
23:30chewitt: insmod: ERROR: could not load module netconsole: No such file or directory
23:31imirkin: like i said, i just do: firstname.lastname@example.org/eth0,@192.168.3.1/ console=ether
23:31imirkin: on the *kernel cmdline*
23:31pq: there is no module to insmod if you build with =y
23:31chewitt: cat .config/modprobe.d/netconsole.conf
23:31chewitt: options netconsole email@example.com/eth0,firstname.lastname@example.org/80:e6:50:14:ef:8e
23:32chewitt: and plain "modprobe netconsole" generates the same Unknown symbol stuff
23:32pq: chewitt, you need that on the kernel command line, that is, set in your boot loader config.
23:32imirkin: you see how things go easier if you just follow my advice? build it with =y
23:32imirkin: and stick that on the kernel cmdline. and you'll be done.
23:32imirkin: instead you're messing around with modules :p
23:32chewitt: ok.. but I did build with =y and I did put stuff in the boot params.. and still nada.. hence experimenting
23:33imirkin: did you do "console=ether"?
23:33imirkin: do you see how i have it in my example?
23:33imirkin: both the one i just pasted, and the same thing i pasted yesterday or whenever? :p
23:33pq: once more: if you build it with =y, commands insmod and modprobe are not useful - the module is already loaded. Trying to load a stale module file left over from a previous kernel build will not do any good.
23:34imirkin: anyhoo... i'm off
23:34imirkin: good luck!
23:34imirkin: when you run into issues with the recommended thing, just ask
23:34imirkin: chances are it's something silly
23:34chewitt: so I need console=ether in addition to the netconsole part.. anything else in params?
23:35imirkin: just those... and i'd nuke the ports and the mac address
23:35imirkin: use the thing i have, but just replace the ip addresses (and interface name as necessary)
23:35imirkin: the first address is the ip address of the machine, the second one is the ip address of the destination
23:36chewitt: console=ether netconsole=172.16.20.10/eth0,@172.16.20.11
23:36chewitt: console=ether netconsole=172.16.20.10/eth0,172.16.20.1 <= no @
23:37imirkin: email@example.com/eth0,@192.168.3.1/ console=ether
23:37imirkin: *only* replace the ip addresses, nothing else
23:38chewitt: kernel rebuilding..
23:39imirkin: and note that this is going to be coming in as udp, not tcp
23:39imirkin: so 'nc -l -u -p 6666'
23:39imirkin: [some versions of nc inexplicably want somewhat different arguments... not sure about yours]
23:41chewitt: no -p on MacOS
23:48imirkin: some way to specify the port though
23:54chewitt: can't use -l with -p
23:55imirkin: well, you need to listen on udp port 6666. there's a way to make it work
23:55imirkin: i think with -l it might just expect a port? i forget.
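
[editor's note] Since the nc flags differ between platforms, one way to sidestep the question is a tiny UDP listener. The following is only a minimal sketch, assuming netconsole messages arrive as plain-text UDP datagrams on port 6666 as described above; the function names open_listener and receive are made up for illustration and are not part of any tool mentioned in the log.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket. Port 6666 is the one named in the log above;
    the default host (all interfaces) is an assumption."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock, max_packets=None):
    """Yield (sender_ip, text) for each datagram received.

    Runs until interrupted unless max_packets is given.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        # Kernel messages are normally text; replace undecodable bytes
        # instead of raising.
        yield sender_ip, data.decode("utf-8", errors="replace")
        count += 1
```

To watch messages as they arrive, loop over receive(open_listener()) and print each text with end="" so the kernel's own newlines are kept. Stop it with Ctrl-C.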