~shefty/rdma-dev.git
Vipul Pandya [Mon, 29 Apr 2013 04:04:41 +0000 (04:04 +0000)]
cxgb4vf: Support CPL_SGE_EGR_UPDATEs encapsulated in a CPL_FW4_MSG

Newer firmware can post CPL_SGE_EGR_UPDATE messages encapsulated in a
CPL_FW4_MSG as follows:

flit0 rss_header (if DropRSS == 0 in IQ context)
flit1 CPL_FW4_MSG cpl
flit2 rss_header w/opcode CPL_SGE_EGR_UPDATE
flit3 CPL_SGE_EGR_UPDATE cpl

So FW4_MSG CPLs with a newly created type of FW_TYPE_RSSCPL have the
CPL_SGE_EGR_UPDATE CPL message in flit 2 of the FW4_MSG. Firmware can still
post regular CPL_SGE_EGR_UPDATE messages, so the drivers need to handle
both.

This patch also writes a new parameter to firmware requesting encapsulated
EGR_UPDATE. This allows firmware with this support to not break older drivers.
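
A minimal sketch of how a receive path might tell the two layouts apart. It assumes each flit is 8 bytes with the opcode in byte 0 and the FW4_MSG type in byte 1; the numeric constants and the names `locate_egr_update` and `flit` are illustrative assumptions, not taken from the patch.

```python
# Illustrative constants (assumed); the real values live in the driver headers.
CPL_SGE_EGR_UPDATE = 0xA5
CPL_FW4_MSG = 0xC0
FW_TYPE_RSSCPL = 4


def locate_egr_update(flits, drop_rss=False):
    """Return the index of the CPL_SGE_EGR_UPDATE flit, or None.

    `flits` is a list of 8-byte `bytes` objects. Byte 0 is treated as the
    opcode and byte 1 of a CPL_FW4_MSG as its type (a simplification).
    """
    i = 0 if drop_rss else 1          # skip the leading rss_header, if present
    opcode = flits[i][0]
    if opcode == CPL_FW4_MSG and flits[i][1] == FW_TYPE_RSSCPL:
        i += 1                        # encapsulated rss_header
        opcode = flits[i][0]          # it carries the inner opcode
        i += 1                        # the encapsulated CPL follows it
    return i if opcode == CPL_SGE_EGR_UPDATE else None


def flit(opcode, second=0):
    """Build a dummy 8-byte flit for this sketch (hypothetical helper)."""
    return bytes([opcode, second]) + bytes(6)
```

Handling both layouts in one function mirrors the requirement that drivers accept regular and encapsulated updates alike.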

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vipul Pandya [Mon, 29 Apr 2013 04:04:40 +0000 (04:04 +0000)]
cxgb4: Support CPL_SGE_EGR_UPDATEs encapsulated in a CPL_FW4_MSG

Newer firmware can post CPL_SGE_EGR_UPDATE messages encapsulated in a
CPL_FW4_MSG as follows:

flit0 rss_header (if DropRSS == 0 in IQ context)
flit1 CPL_FW4_MSG cpl
flit2 rss_header w/opcode CPL_SGE_EGR_UPDATE
flit3 CPL_SGE_EGR_UPDATE cpl

So FW4_MSG CPLs with a newly created type of FW_TYPE_RSSCPL have the
CPL_SGE_EGR_UPDATE CPL message in flit 2 of the FW4_MSG. Firmware can still
post regular CPL_SGE_EGR_UPDATE messages, so the drivers need to handle
both.

This patch also writes a new parameter to firmware requesting encapsulated
EGR_UPDATE. This allows firmware with this support to not break older drivers.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vipul Pandya [Mon, 29 Apr 2013 04:04:39 +0000 (04:04 +0000)]
cxgb4: Fix pci_device_id structure initialization with correct PF number

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Mon, 29 Apr 2013 08:44:51 +0000 (08:44 +0000)]
tcp: reset timer after any SYNACK retransmit

Linux immediately returns SYNACK on (spurious) SYN retransmits, but
keeps the SYNACK timer running independently. Thus the timer may
fire right after the SYNACK retransmit and cause a SYN-SYNACK
cross-fire burst.

Adopt the fast retransmit/recovery idea in established state by
re-arming the SYNACK timer after the fast (SYNACK) retransmit. The
timer may fire up to 500ms late due to the current SYNACK timer wheel,
but it's OK to be conservative when the network is congested. Eric's new
listener design should address this issue.
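
A toy model of the re-arming idea, assuming integer millisecond timestamps and a fixed timeout. The class `SynAckTimer` and its methods are hypothetical and only illustrate the timing; they do not reflect the kernel's request-socket code.

```python
class SynAckTimer:
    """Toy model of one pending connection's SYNACK retransmit timer."""

    def __init__(self, now_ms, timeout_ms):
        self.timeout_ms = timeout_ms
        self.expires = now_ms + timeout_ms
        self.sent = [now_ms]              # times at which a SYNACK was sent

    def on_syn_retransmit(self, now_ms, rearm=True):
        # A retransmitted SYN is answered with an immediate SYNACK.
        self.sent.append(now_ms)
        if rearm:
            # Push the deadline out so the timer does not fire right after.
            self.expires = now_ms + self.timeout_ms

    def on_tick(self, now_ms):
        # Timer-driven retransmit, only once the deadline has passed.
        if now_ms >= self.expires:
            self.sent.append(now_ms)
            self.expires = now_ms + self.timeout_ms
```

With `rearm=False` the model sends two SYNACKs 100ms apart (at 900 and 1000); with re-arming, the second one is suppressed.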

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 29 Apr 2013 08:39:56 +0000 (08:39 +0000)]
net: Add MIB counters for checksum errors

Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.

$ nstat -a | grep  Csum
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
Icmp6InCsumErrors               75                 0.0
Udp6InCsumErrors                173442             0.0
IpExtInCsumErrors               10884              0.0
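
A small sketch for pulling these counters out of `nstat`-style text, assuming the three-column layout shown above (name, value, rate). The function name `parse_csum_counters` is made up for illustration.

```python
def parse_csum_counters(nstat_output):
    """Map each counter whose name contains 'Csum' to its integer value."""
    counters = {}
    for line in nstat_output.splitlines():
        fields = line.split()
        # Expect at least "<name> <value>"; skip anything else.
        if len(fields) >= 2 and "Csum" in fields[0] and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


SAMPLE = """\
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
"""
```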

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Mon, 29 Apr 2013 07:08:07 +0000 (07:08 +0000)]
tg3: shows HW time stamping support only if ptp_capable is present

Currently tg3 reports hardware timestamping support for all devices,
when that is true only for hardware with the PTP_CAPABLE flag
present.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 29 Apr 2013 05:58:52 +0000 (05:58 +0000)]
net: defer net_secret[] initialization

Instead of feeding net_secret[] at boot time, defer the init
to the point where the first socket is created.

This permits some platforms to use better entropy sources than
the ones available at boot time.
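
The deferral amounts to lazy initialization. A user-space sketch of the pattern, assuming `os.urandom` as a stand-in entropy source and a hypothetical accessor `get_net_secret`; it is not thread-safe and does not mirror the kernel code.

```python
import os

_net_secret = None  # stays empty until something actually needs it


def get_net_secret(nbytes=64):
    """Return the secret, generating it on first use rather than at startup."""
    global _net_secret
    if _net_secret is None:
        # By now a better entropy source may be available than at boot.
        _net_secret = os.urandom(nbytes)
    return _net_secret
```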

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Sun, 28 Apr 2013 22:22:29 +0000 (22:22 +0000)]
be2net: FLR must be first cmd issued to Lancer FW

Lancer FW requires that the first cmd issued by the host-driver be an FLR.
So, re-order be_probe() to move be_cmd_function_reset() ahead of
be_cmd_fw_init().

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Sun, 28 Apr 2013 22:21:13 +0000 (22:21 +0000)]
be2net: Use GET_FUNCTION_CONFIG V1 cmd

Skyhawk-R requires the V1 version of the GET_FUNCTION_CONFIG cmd to be used for
querying resources available per function.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Apr 2013 18:29:06 +0000 (14:29 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
The following patchset contains relevant updates for the Netfilter
tree. They are:

* Enhancements for ipset: Add the counter extension for sets; this
  information can be used from the iptables set match to change
  the matching behaviour. Jozsef needed to add the extension
  infrastructure and moved the existing timeout support on top of it.
  This also includes a change in net/sched/em_ipset to adapt it to
  the new extension structure.

* Performance enhancements for nfnetlink_queue: Add new
  configuration flags that allow user-space to receive big packets (GRO)
  and to disable checksum calculation. These were proposed by Eric
  Dumazet during the Netfilter Workshop 2013 in Copenhagen. Florian
  Westphal was kind enough to find the time to materialize the proposal.

* A sparse fix from Simon: he noticed it in the SCTP NAT helper; the fix
  required a change in the interface of sctp_end_cksum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 27 Apr 2013 10:44:24 +0000 (10:44 +0000)]
sh_eth: add R8A77781 support

Add support for another ARM member of the R-Car family, R-Car M1A, also known as
R8A77781 -- it will share the code with previously added R8A77790.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 19 Apr 2013 01:54:58 +0000 (10:54 +0900)]
sctp: Correct type and usage of sctp_end_cksum()

Change the type of the crc32 parameter of sctp_end_cksum()
from __be32 to __u32 to reflect the fact that it is passed
to cpu_to_le32().
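
A user-space sketch of why the parameter is a CPU-order value: the function takes a plain integer and produces little-endian bytes. It assumes the helper inverts the CRC before converting it, which the message above does not state; `struct.pack("<I", ...)` stands in for cpu_to_le32().

```python
import struct


def sctp_end_cksum(crc32):
    """Take a CPU-order 32-bit CRC and return 4 little-endian bytes.

    The bitwise inversion is an assumption of this sketch.
    """
    return struct.pack("<I", ~crc32 & 0xFFFFFFFF)
```

Because the input is an ordinary integer, annotating it as big-endian would be misleading, which is what sparse flagged.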

There are five in-tree users of sctp_end_cksum().
The following four had warnings flagged by sparse which are
no longer present with this change.

net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum()
net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check()
net/sctp/input.c:sctp_rcv_checksum()
net/sctp/output.c:sctp_packet_transmit()

The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt().
It has been updated to pass a __u32 instead of a __be32,
the value in question was already calculated in cpu byte-order.

net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also
been updated to assign the return value of sctp_end_cksum()
directly to a variable of type __le32, matching the
type of the return value. Previously the return value
was assigned to a variable of type __be32 and then that variable
was finally assigned to another variable of type __le32.

Problems flagged by sparse.
Compile and sparse tested only.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 19 Apr 2013 04:58:27 +0000 (04:58 +0000)]
netfilter: nfnetlink_queue: avoid expensive gso segmentation and checksum fixup

Userspace can now indicate that it can cope with larger-than-mtu sized
packets and packets that have invalid ipv4/tcp checksums.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 19 Apr 2013 04:58:26 +0000 (04:58 +0000)]
netfilter: nfnetlink_queue: add skb info attribute

Once we allow userspace to receive gso/gro packets, userspace
needs to be able to determine when checksums appear to be
broken, but are not.

NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
later, pretend they are ok'.

NFQA_SKB_GSO could be used for statistics, or to determine when
packet size exceeds mtu.
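
A sketch of how a user-space consumer might interpret the two flags. The bit values are assumptions made for illustration, and `describe_skb_info` is a hypothetical helper, not part of any netfilter library.

```python
# Assumed bit positions for this sketch.
NFQA_SKB_CSUMNOTREADY = 1 << 0
NFQA_SKB_GSO = 1 << 1


def describe_skb_info(flags, packet_len, mtu):
    """Summarize what the skb info attribute implies for a queued packet."""
    return {
        # Checksums that are "not ready" will be fixed by the kernel later,
        # so verifying them in user space would give a false alarm.
        "verify_checksum": not (flags & NFQA_SKB_CSUMNOTREADY),
        "is_gso": bool(flags & NFQA_SKB_GSO),
        "exceeds_mtu": packet_len > mtu,
    }
```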

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 19 Apr 2013 04:58:25 +0000 (04:58 +0000)]
netfilter: move skb_gso_segment into nfnetlink_queue module

skb_gso_segment is expensive, so it would be nice if we could
avoid it in the future. However, userspace needs to be prepared
to receive larger-than-mtu-packets (which will also have incorrect
l3/l4 checksums), so we cannot simply remove it.

The plan is to add a per-queue feature flag that userspace can
set when binding the queue.

The problem is that in nf_queue, we only have a queue number,
not the queue context/configuration settings.

This patch should have no impact other than the skb_gso_segment
call now being in a function that has access to the queue config
data.

A new size attribute in nf_queue_entry is needed so
nfnetlink_queue can duplicate the entry of the gso skb
when segmenting the skb while also copying the route key.

The follow-up patch adds a switch to disable skb_gso_segment when
the queue config says so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 19 Apr 2013 04:58:23 +0000 (04:58 +0000)]
netfilter: nf_queue: move device refcount bump to extra function

Required by a future patch that will need to duplicate the
nf_queue_entry, bumping the refcounts of the copy.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Sat, 27 Apr 2013 12:40:50 +0000 (14:40 +0200)]
netfilter: ipset: set match: add support to match the counters

The new revision of the set match supports matching on the counters
and suppressing counter updates on match.

For the list:set type, updating the subcounters can be
suppressed as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 21:11:32 +0000 (23:11 +0200)]
netfilter: ipset: The list:set type with counter support

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 21:11:02 +0000 (23:11 +0200)]
netfilter: ipset: The hash types with counter support

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 21:10:22 +0000 (23:10 +0200)]
netfilter: ipset: The bitmap types with counter support

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Sat, 27 Apr 2013 12:38:56 +0000 (14:38 +0200)]
netfilter: ipset: Introduce the counter extension in the core

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Thu, 4 Apr 2013 10:21:02 +0000 (12:21 +0200)]
netfilter: ipset: list:set type using the extension interface

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 20:50:55 +0000 (22:50 +0200)]
netfilter: ipset: Hash types using the unified code base

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 19:05:44 +0000 (21:05 +0200)]
netfilter: ipset: Unified hash type generation

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Sat, 27 Apr 2013 12:37:01 +0000 (14:37 +0200)]
netfilter: ipset: Bitmap types using the unified code base

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 19:00:52 +0000 (21:00 +0200)]
netfilter: ipset: Unified bitmap type generation

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Sat, 27 Apr 2013 12:28:55 +0000 (14:28 +0200)]
netfilter: ipset: Introduce extensions to elements in the core

Introduce extensions to elements in the core and prepare timeout as
the first one.

This patch also modifies the em_ipset classifier to use the new
extension struct layout.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 18:54:37 +0000 (20:54 +0200)]
netfilter: ipset: Move often used IPv6 address masking function to header file

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Mon, 8 Apr 2013 19:51:25 +0000 (21:51 +0200)]
netfilter: ipset: Make possible to test elements marked with nomatch

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Haiyang Zhang [Fri, 26 Apr 2013 08:25:55 +0000 (08:25 +0000)]
hyperv: Fix a compiler warning in netvsc_send()

Fixed: warning: cast from pointer to integer of different size

The Hyper-V hosts always use a 64-bit request id. The guests can have 32- or
64-bit pointers, which are the same size as the ulong type, so we cast the
pointer to ulong. Assigning a 32-bit integer to a 64-bit variable works fine.

The VMBus returns the same id in the completion packet. But the value has no
effect on the host side.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Apr 2013 17:59:07 +0000 (13:59 -0400)]
Merge branch 'pegasus'

Petko Manolov says:

====================
This series of patches fixes a bug related to multiple control URB
submissions (noted by Sarah Sharp), optimizes the read and write_mii_word
routines, and removes the socket buffer pool used in the receive path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Thu, 25 Apr 2013 22:41:50 +0000 (22:41 +0000)]
drivers: net: usb: pegasus: fix control urb submission

The Pegasus driver used a single callback for sync and async control URBs.
Special flags were employed to distinguish between the two, but due to flawed
logic this didn't always work.  As a result of this change,
[get|set]_registers() are now much simpler.  Async write is also leaner
and does not use a single, statically allocated usb_ctrlrequest,
which was another potential race when asynchronously submitting URBs.

Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Thu, 25 Apr 2013 22:41:36 +0000 (22:41 +0000)]
drivers: net: usb: pegasus: read/write_mii_word optimised

Duplicated code in routines reading and writing MII registers is now
packed in __mii_op().

Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Thu, 25 Apr 2013 22:41:21 +0000 (22:41 +0000)]
drivers: net: usb: pegasus: remove skb pool

The socket buffer pool for the receive path is now gone.  Its existence
didn't make much difference (performance-wise) and the code is better off
without the spinlocks protecting it.

Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Thu, 25 Apr 2013 11:08:30 +0000 (11:08 +0000)]
ipv6: Kill ipv6 dependency of icmpv6_send().

The following patch adds an icmp-registration module for ipv6.  It allows
the ipv6 protocol to register an icmp_sender, which is used for sending
ipv6 icmp msgs.  This extra layer allows us to kill the ipv6 dependency
for sending icmp packets.

This patch also fixes an ip_tunnel compilation problem when ip_tunnel
is statically compiled into the kernel but ipv6 is a module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 25 Apr 2013 09:52:25 +0000 (09:52 +0000)]
net: increase frag hash size

Increase the fragmentation hash table size to 1024 buckets from the old 64.

After we increased the frag mem limits in commit c2a93660 (net: increase
fragment memory usage limits), the hash size of 64 elements is simply
too small, especially considering that the mem limit is per netns while
the hash table is shared by all netns.

For the embedded people, note that this increase will change the hash
table/array from using approx 1 Kbytes to 16 Kbytes.
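
The memory figures follow from simple multiplication. The sketch below assumes 16 bytes per bucket, a value inferred from the quoted 1 Kbyte for 64 buckets rather than stated in the message; `hash_table_bytes` is a hypothetical helper.

```python
def hash_table_bytes(buckets, bytes_per_bucket=16):
    """Approximate size of the hash array; 16 bytes per bucket is inferred."""
    return buckets * bytes_per_bucket


old_size = hash_table_bytes(64)      # 1024 bytes, about 1 Kbyte
new_size = hash_table_bytes(1024)    # 16384 bytes, about 16 Kbytes
```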

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 14 Mar 2013 14:21:36 +0000 (15:21 +0100)]
atm: he: use mdelay instead of large udelay constants

ARM cannot handle udelay for more than 2 milliseconds, and
it is rather bad style to block the cpu for 16ms anyway,
so let's use msleep instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Apr 2013 17:22:07 +0000 (13:22 -0400)]
Merge branch 'pktdiag'

Nicolas Dichtel says:

====================
The goal of this patchset is to be able to get all the information exported
via /proc/net/packet, and also to be able to get the filters attached to
af_packet sockets.

As usual, the patch against iproute2 will be sent once the patches are included
and net-next merged. I can send it on demand.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 25 Apr 2013 06:53:54 +0000 (06:53 +0000)]
sock_diag: allow to dump bpf filters

This patch allows dumping BPF filters attached to a socket with
SO_ATTACH_FILTER.
Note that we check CAP_SYS_ADMIN before allowing this info to be dumped.

For now, only AF_PACKET sockets use this feature.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 25 Apr 2013 06:53:53 +0000 (06:53 +0000)]
packet_diag: disclose meminfo values

sk_rmem_alloc is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 25 Apr 2013 06:53:52 +0000 (06:53 +0000)]
packet_diag: disclose uid value

This value is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 Apr 2013 23:08:00 +0000 (23:08 +0000)]
selftests: psock_tpacket: fix status check

Testing like this for TP_STATUS_AVAILABLE clearly is a stupid bug
since it always returns true. Fix this by only checking for flags
where the kernel owns the packet and negate this result, since we
also could run into the non-zero status TP_STATUS_WRONG_FORMAT
and need to reclaim frames.
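
A sketch of the two checks side by side. The status values are assumptions of this sketch, with TP_STATUS_AVAILABLE taken to be zero, and the exact form of the buggy expression is assumed rather than quoted from the test.

```python
# Assumed TX ring status values for this sketch.
TP_STATUS_AVAILABLE = 0
TP_STATUS_SEND_REQUEST = 1 << 0
TP_STATUS_SENDING = 1 << 1
TP_STATUS_WRONG_FORMAT = 1 << 2


def buggy_is_available(status):
    # Masking with zero always yields zero, so this is True for any status.
    return (status & TP_STATUS_AVAILABLE) == TP_STATUS_AVAILABLE


def is_available(status):
    # A frame is usable unless the kernel still owns it. WRONG_FORMAT is
    # non-zero but the frame can be reclaimed, so it counts as available.
    return not (status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))
```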

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:57 +0000 (11:31 +0000)]
vxlan: allow choosing destination port per vxlan

Allow configuring the default destination port on a per-device basis.
Add a new netlink parameter, IFLA_VXLAN_PORT, to allow setting the destination
port when creating a new vxlan.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:56 +0000 (11:31 +0000)]
vxlan: compute source port in network byte order

Rather than computing source port and returning it in host order
then swapping later, go ahead and compute it in network order to
start with. Cleaner and less error prone.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:55 +0000 (11:31 +0000)]
vxlan: source compatiablity with IFLA_VXLAN_GROUP (v2)

Source compatibility for building iproute2 was broken by:
  commit c7995c43facc6e5dea4de63fa9d283a337aabeb1
  Author: Atzm Watanabe <atzm@stratosphere.co.jp>
    vxlan: Allow setting destination to unicast address.

Since this commit has not made it upstream (it is still in net-next),
and it is better to avoid gratuitous changes to exported APIs,
go back to the original definition and add a comment.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:54 +0000 (11:31 +0000)]
vxlan: fix byte order issues with NDA_PORT

The NDA_PORT attribute was added, but the author wasn't careful
about width (port is 16 bits), or byte order.  The attribute was
being dumped as 16 bits, but only 32 bit value would be accepted
when setting up a device. Also, the remote port is in network
byte order and was being compared with default port in host byte
order.
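
A user-space sketch of the two pitfalls, width and byte order. The port number 8472 is only an example value, and `to_be16` and `same_port` are hypothetical helpers; `struct` is used so the result does not depend on the machine running it.

```python
import struct


def to_be16(port):
    """Encode a port as exactly 2 bytes in network (big-endian) order."""
    return struct.pack("!H", port)


def same_port(remote_port_be, default_port):
    """Compare a wire-format port with a host-order default correctly."""
    return remote_port_be == to_be16(default_port)


def misread_as_little_endian(port_be):
    """What a little-endian host sees if it skips the byte swap."""
    return struct.unpack("<H", port_be)[0]
```

Comparing `misread_as_little_endian(to_be16(8472))` with 8472 fails, which is the kind of mismatch the fix removes.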

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:53 +0000 (11:31 +0000)]
vxlan: document UDP default port

The default port for VXLAN is not the same as the IANA-assigned value.
Document this.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 27 Apr 2013 11:31:52 +0000 (11:31 +0000)]
vxlan: update mail address and copyright date

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
roopa [Mon, 22 Apr 2013 12:56:49 +0000 (12:56 +0000)]
bridge: Add fdb dst check during fdb update

The current bridge fdb update code does not seem to update the port
during fdb update. This patch adds a check for an fdb dst (port)
change during fdb update. It also rearranges the call to
fdb_notify to send only one notification for create and update.

Changelog:
v2 - Change notify flag to bool

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 27 Apr 2013 03:33:41 +0000 (23:33 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to e1000e, igb and ixgbe.

There are 2 patches in this series which could be applied to net,
but since Linus is so very close to releasing 3.9, I do not think
it prudent to try and push these into net at this time.  I have CC'd
stable on these patches so that they can queue them up as soon as
3.9 gets released.

The 2 patches are:
  e1000e: fix numeric overflow in phc settime method
  ixgbe: fix EICR write in ixgbe_msix_other

Richard provides a fix for e1000e by using a helper function from time.h
to resolve an unintended overflow in the PTP settime function.

Bruce provides a fix to wait for NAPI to be done with the current context
after disabling interrupts and then disable NAPI when the interface
is going down.  This fixes a possible "unable to handle kernel paging
request" panic in net-next.

Andi Kleen provides a patch for igb to use mdelay instead of udelay
when we needed 100000us.

Jacob provides a fix for ixgbe to simply mask the lower 16bits off so that
ixgbe_msix_other does not write them in the EICR, which causes them to
remain high and be properly handled by the clean_rings interrupt routine
as normal.

Emil cleans up the logic in ixgbe_setup_loopback_test() to only access
registers applicable to the MAC type.  In addition, he removes the majority
of the AUTOC register reads by using a cached value instead, to avoid
writing corrupted values to AUTOC due to bad FW.  Emil also adds support
for disabling link during boot time.  Lastly, he provides a patch which
adds the MAC type to the version in ethtool_regs, which will make it
easier to check the MAC type when dumping registers with ethtool.

There is a separate ethtool patch which is dependent upon Emil's
last patch of the series to add the MAC type to the version in
ethtool_regs, which will be sent separately.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 27 Apr 2013 03:29:21 +0000 (23:29 -0400)]
Merge branch 'mlx4'

Or Gerlitz says:

====================
This series adds support for the SRIOV ndo_set_vf callbacks to the mlx4 driver.

Series done against the net-next tree as of commit 37fe0660981d7a "net:
fix address check in rtnl_fdb_del"

We have successfully tested the series on net-next, except for the
VF link info issue that I reported earlier today on netdev; we
see that problem for both ixgbe and mlx4 VFs. To make sure that get
VF config works with patch #6, we have also run it over 3.8.8.

We added to the V1 series two patches that disable HW timestamping
when running over a VF, as this isn't supported yet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:30 +0000 (05:22 +0000)]
net/mlx4: Add support to get VF config

Support getting VF config.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:29 +0000 (05:22 +0000)]
net/mlx4: Add VF MAC spoof checking support

Add ndo_set_vf_spoofchk support

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:28 +0000 (05:22 +0000)]
net/mlx4: Add set VF default vlan ID and priority support

Add support for ndo_set_vf_vlan in the driver. Once this call is used, the vport
is considered to be in VST mode. In this mode, the PPF driver configures
Ethernet QPs created by this VF to use this vlan id and priority. Currently,
RoCE isn't supported in that mode.

The special values of VID=4095 or VID=0,UP=0 are considered as VGT.
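
The special-value rule can be written as a one-line predicate. This is only a sketch; the name `is_vgt` and its parameter names are made up for illustration.

```python
def is_vgt(vid, up):
    """Return True when the requested VLAN id / priority pair means VGT."""
    return vid == 4095 or (vid == 0 and up == 0)
```

Under this reading, any other pair, such as a VLAN id of 100 or a VLAN id of 0 with a non-zero priority, puts the vport in VST mode.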

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:27 +0000 (05:22 +0000)]
net/mlx4: Add set VF mac address support

Add ndo_set_vf_mac support, which allows setting the MAC address
for mlx4 VF Ethernet NICs from the host.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:26 +0000 (05:22 +0000)]
net/mlx4: Add structures to keep VF Ethernet ports information

This patch adds struct mlx4_vport_state, where all the parameters related
to management of VF ports (virtual ports of the NIC eswitch) are kept.

The driver keeps an administrative and operational copy of the settings.
The current administrative copy becomes operational on the event of probing
a VF either on a VM or on the host.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rony Efraim [Thu, 25 Apr 2013 05:22:25 +0000 (05:22 +0000)]
net/mlx4: Add reference counting to MAC registeration

Add reference counting to the driver MAC registration code. This is
needed for cases where a mac is registered more than once, e.g.
when both the host and the VM driver register the same mac, the host
for mac spoof protection purposes and the VM for its regular needs.
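
The idea can be sketched with a small user-space table. This is only an
illustration: the structure, the table size and the function names are
assumptions and do not reflect the actual mlx4 data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_TABLE_SIZE 4 /* hypothetical table size */

struct mac_entry { uint64_t mac; int refs; };
struct mac_table { struct mac_entry e[MAC_TABLE_SIZE]; };

static void mac_table_init(struct mac_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < MAC_TABLE_SIZE; i++) {
        t->e[i].mac = 0;
        t->e[i].refs = 0;
    }
}

/* Returns the slot index, or -1 if the table is full. */
static int mac_register(struct mac_table *t, uint64_t mac)
{
    int free_idx = -1;

    assert(t != NULL);
    for (int i = 0; i < MAC_TABLE_SIZE; i++) {
        if (t->e[i].refs > 0 && t->e[i].mac == mac) {
            t->e[i].refs++; /* already registered: take another reference */
            return i;
        }
        if (t->e[i].refs == 0 && free_idx < 0)
            free_idx = i;
    }
    if (free_idx < 0)
        return -1;
    t->e[free_idx].mac = mac;
    t->e[free_idx].refs = 1;
    return free_idx;
}

/* Returns the remaining reference count, or -1 if the MAC is unknown. */
static int mac_unregister(struct mac_table *t, uint64_t mac)
{
    assert(t != NULL);
    for (int i = 0; i < MAC_TABLE_SIZE; i++) {
        if (t->e[i].refs > 0 && t->e[i].mac == mac)
            return --t->e[i].refs; /* slot becomes free again at zero */
    }
    return -1;
}
```

With this shape, the host and the VM can both register the same address, and
the slot is only released when the last of them unregisters.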

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx4_en: Disable HW clock overflow check when no HW support
Amir Vadai [Thu, 25 Apr 2013 05:22:24 +0000 (05:22 +0000)]
net/mlx4_en: Disable HW clock overflow check when no HW support

The HW clock overflow check should not run if the HW clock is not supported. Also,
since this watchdog is the only user of service_task, there is no need to start it
in that case.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx4_core: Disable HW timestamping for VFs
Amir Vadai [Thu, 25 Apr 2013 05:22:23 +0000 (05:22 +0000)]
net/mlx4_core: Disable HW timestamping for VFs

Disable timestamp capability on virtual functions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agogenetlink: fix possible memory leak in genl_family_rcv_msg()
Wei Yongjun [Fri, 26 Apr 2013 15:34:16 +0000 (15:34 +0000)]
genetlink: fix possible memory leak in genl_family_rcv_msg()

'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr &&
family->parallel_ops, thus should be freed before leaving from the error
handling cases, otherwise it will cause memory leak.

Introduced by commit def3117493eafd9dfa1f809d861e0031b2cc8a07
(genl: Allow concurrent genl callbacks.)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: Avoid diagnostic test in certain versions of firmware to avoid NIC freeze.
Suresh Reddy [Thu, 25 Apr 2013 23:03:22 +0000 (23:03 +0000)]
be2net: Avoid diagnostic test in certain versions of firmware to avoid NIC freeze.

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: Renamed rx_address_mismatch_errors to rx_address_filtered
Suresh Reddy [Thu, 25 Apr 2013 23:03:21 +0000 (23:03 +0000)]
be2net: Renamed rx_address_mismatch_errors to rx_address_filtered

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: Add support for setting and getting rx flow hash options
Suresh Reddy [Thu, 25 Apr 2013 23:03:20 +0000 (23:03 +0000)]
be2net: Add support for setting and getting rx flow hash options

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoixgbe: add mac type to the version in ethtool_regs
Emil Tantilov [Fri, 19 Apr 2013 09:31:17 +0000 (09:31 +0000)]
ixgbe: add mac type to the version in ethtool_regs

This patch adds the mac type to the version in ethtool_regs.

This will make it easier to check the mac type when dumping registers with
ethtool. The drawback of this is that older versions of ethtool will only
be able to dump in hex format for 82599 and above  when used with the updated
driver.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: add support for disabling link at boot time on 82599
Emil Tantilov [Fri, 12 Apr 2013 08:36:47 +0000 (08:36 +0000)]
ixgbe: add support for disabling link at boot time on 82599

This patch adds support for disabling link during boot time. This
feature was requested by customers and is configurable through the EEPROM.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: cache AUTOC reads
Emil Tantilov [Fri, 12 Apr 2013 08:36:42 +0000 (08:36 +0000)]
ixgbe: cache AUTOC reads

This patch removes the majority of the AUTOC register reads by using a cached
value instead.

The reason for this change is to avoid writing corrupted values to AUTOC
due to bad FW.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: fix register access during ethtool loopback test
Emil Tantilov [Fri, 12 Apr 2013 02:10:25 +0000 (02:10 +0000)]
ixgbe: fix register access during ethtool loopback test

This patch cleans up the logic in ixgbe_setup_loopback_test() to only access
registers applicable to the MAC type. AUTOC is only valid on MACs older than
X540. MACC is used for X540.

In addition it removes a read of AUTOC and uses the stored value to force the
link up.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: fix EICR write in ixgbe_msix_other
Jacob Keller [Sat, 2 Mar 2013 07:51:42 +0000 (07:51 +0000)]
ixgbe: fix EICR write in ixgbe_msix_other

Previously, the ixgbe_msix_other was writing the full 32bits of the set
interrupts, instead of only the ones which the ixgbe_msix_other is
handling. This resulted in a loss of performance when the X540's PPS feature is
enabled due to sometimes clearing queue interrupts which resulted in the driver
not getting the interrupt for cleaning the q_vector rings often enough. The fix
is to simply mask the lower 16bits off so that this handler does not write them
in the EICR, which causes them to remain high and be properly handled by the
clean_rings interrupt routine as normal.
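
The masking step can be illustrated as follows. This is a sketch only: the
macro and function names are hypothetical, and the split into "lower 16 bits
are queue causes" is taken from the description above rather than from the
driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the text: the lower 16 bits belong to the queue vectors. */
#define OTHER_CAUSE_MASK 0xFFFF0000u

/* Value the "other" handler would write back to clear only its own causes. */
static uint32_t other_causes_to_clear(uint32_t eicr)
{
    uint32_t v = eicr & OTHER_CAUSE_MASK;

    assert((v & 0x0000FFFFu) == 0); /* queue bits are never written back */
    return v;
}
```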

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: stable <stable@vger.kernel.org>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoigb: limit udelay for phy changes to 10000us
Andi Kleen [Mon, 22 Apr 2013 07:46:40 +0000 (07:46 +0000)]
igb: limit udelay for phy changes to 10000us

If you really want 100000us, you should use mdelay or similar.

Found by the LTO kernel build

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoe1000e: panic caused by Rx traffic arriving while interface going down
Bruce Allan [Sat, 20 Apr 2013 05:37:29 +0000 (05:37 +0000)]
e1000e: panic caused by Rx traffic arriving while interface going down

An "unable to handle kernel paging request" panic can occur when receiving
traffic while the interface is going down.  Wait for NAPI to be done with
current context after disabling interrupts and then disable NAPI.

See https://bugzilla.vyatta.com/show_bug.cgi?id=8837.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoe1000e: fix numeric overflow in phc settime method
Richard Cochran [Tue, 23 Apr 2013 01:56:34 +0000 (01:56 +0000)]
e1000e: fix numeric overflow in phc settime method

The PTP Hardware Clock settime function in the e1000e driver
computes nanoseconds from a struct timespec. The code converts the
seconds field .tv_sec by multiplying it with NSEC_PER_SEC. However,
both operands are of type long, resulting in an unintended overflow.
The patch fixes the issue by using the helper function from time.h.
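
The overflow can be reproduced in user space. The sketch below is not the
driver code: the function names are made up, and the 32-bit case uses an
unsigned type so that the wrap-around is well defined in C and can be shown
safely.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000

/* Models a 32-bit multiply: the product wraps modulo 2^32. */
static uint32_t sec_to_ns_wrapped32(uint32_t sec)
{
    return sec * (uint32_t)NSEC_PER_SEC_SKETCH;
}

/* Widens to 64 bits before multiplying, so the product fits. */
static int64_t ts_to_ns64(int64_t sec, int32_t nsec)
{
    assert(nsec >= 0 && nsec < NSEC_PER_SEC_SKETCH);
    return sec * (int64_t)NSEC_PER_SEC_SKETCH + nsec;
}
```

Five seconds already exceeds what a 32-bit product can hold, which is why the
conversion has to be done in a 64-bit type.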

CC: stable <stable@vger.kernel.org>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agonet: fix address check in rtnl_fdb_del
Vlad Yasevich [Tue, 23 Apr 2013 11:05:23 +0000 (11:05 +0000)]
net: fix address check in rtnl_fdb_del

Commit 6681712d67eef14c4ce793561c3231659153a320
vxlan: generalize forwarding tables

relaxed the address checks in rtnl_fdb_add() to use is_zero_ether_addr().
This allows users to add multicast addresses using the fdb API.  However,
the check in rtnl_fdb_del() still uses the stricter
is_valid_ether_addr(), which rejects multicast addresses.  Thus it
is possible to add an fdb entry that cannot be removed later.
Relax the check in rtnl_fdb_del() as well.
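
The difference between the two checks can be shown with user-space stand-ins.
The functions below are illustrative re-implementations with hypothetical
names; they are meant to mirror the semantics of is_zero_ether_addr() and
is_valid_ether_addr() as described above, not to reproduce the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool addr_is_zero(const uint8_t a[6])
{
    assert(a != NULL);
    return (a[0] | a[1] | a[2] | a[3] | a[4] | a[5]) == 0;
}

/* The group bit is the least significant bit of the first octet. */
static bool addr_is_multicast(const uint8_t a[6])
{
    assert(a != NULL);
    return (a[0] & 0x01) != 0;
}

/* Strict check: rejects the all-zero address and every multicast address. */
static bool addr_is_valid_unicast(const uint8_t a[6])
{
    return !addr_is_multicast(a) && !addr_is_zero(a);
}
```

A multicast address passes the relaxed "not zero" test but fails the strict
one, which is the asymmetry between add and delete that the patch removes.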

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: fix irq_disable() with threaded interrupts
Sebastian Siewior [Wed, 24 Apr 2013 08:48:25 +0000 (08:48 +0000)]
net/cpsw: fix irq_disable() with threaded interrupts

During high throughput it is likely that we receive both: an RX and TX
interrupt. The normal behaviour is that once we enter the ISR the
interrupts are disabled in the IRQ chip and so the ISR is invoked only
once and the interrupt line is disabled once. It will be re-enabled
after napi completes.
With threaded interrupts, on the other hand, the interrupt
is disabled immediately and the ISR is marked for "later". By having TX
and RX interrupt marked pending we invoke them both and disable the
interrupt line twice. The napi callback is still executed once and so
after it completes we remain with interrupts disabled.

The initial patch simply removed the cpsw_{enable|disable}_irq() calls
and it worked well on my AM335X ES1.0 (beagle bone). On ES2.0 (beagle
bone black) it caused a never-ending interrupt (even after the mask via
cpsw_intr_disable()) according to Mugunthan V N. Since I don't have the
ES2.0 and have no idea what is going on, this patch tracks the state of the
irq_disable() call and executes it only when not yet done.
The bookkeeping is done on the first struct since with dual_emac we can
have two of those and only one interrupt line.
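
A toy model of that bookkeeping, assuming a nesting disable counter on the
interrupt line. The struct and function names are hypothetical and only
illustrate the "disable at most once" idea, not the cpsw implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct irq_line {
    int disable_depth;       /* models the nesting count kept for the line */
    bool disabled_by_driver; /* the state the driver tracks */
};

/* Called from both the RX and the TX handler; only the first call counts. */
static void driver_irq_disable(struct irq_line *l)
{
    assert(l != NULL);
    if (l->disabled_by_driver)
        return;
    l->disabled_by_driver = true;
    l->disable_depth++;
}

/* Called once when NAPI completes. */
static void driver_irq_enable(struct irq_line *l)
{
    assert(l != NULL);
    if (!l->disabled_by_driver)
        return;
    l->disabled_by_driver = false;
    l->disable_depth--;
}
```

Two pending handlers followed by one NAPI completion leave the depth at zero,
so the line ends up enabled again.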

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: optimize the for_each_slave_macro()
Sebastian Siewior [Wed, 24 Apr 2013 08:48:24 +0000 (08:48 +0000)]
net/cpsw: optimize the for_each_slave_macro()

text    data     bss     dec     hex filename
15530      92       4   15626    3d0a cpsw.o.before
15478      92       4   15574    3cd6 cpsw.o.after

52 bytes smaller, 13 for each invocation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: make sure modules remove does not leak any ressources
Sebastian Siewior [Wed, 24 Apr 2013 08:48:23 +0000 (08:48 +0000)]
net/cpsw: make sure modules remove does not leak any ressources

This driver does not clean up properly after leaving. Here is a list:
- Use unregister_netdev(). free_netdev() is good but not enough
- Use the above also on the other ndev in case of dual mac
- Free data.slave_data. The name of the structure makes it look like
  it is platform_data but it is not. It is just a trick!
- Free all irqs. Again: freeing one irq is a good start, but freeing all
  of them is better.

With this rmmod & modprobe of cpsw seems to work. The remaining issue
is:
|WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0x9c/0xd4()
|sysfs: cannot create duplicate filename '/devices/ocp.2/4a100000.ethernet/4a101000.mdio'
|WARNING: at lib/kobject.c:196 kobject_add_internal+0x1a4/0x1c8()

coming from of_platform_populate() and I am not sure that this belongs
here.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ti: add MODULE_DEVICE_TABLE + MODULE_LICENSE
Sebastian Siewior [Wed, 24 Apr 2013 08:48:22 +0000 (08:48 +0000)]
net/ti: add MODULE_DEVICE_TABLE + MODULE_LICENSE

If compiled as modules each one of these modules is missing something.
With this patch the modules are loaded on demand and don't taint the
kernel due to license issues.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: redo rx skb allocation in rx path
Sebastian Siewior [Tue, 23 Apr 2013 07:31:39 +0000 (07:31 +0000)]
net/cpsw: redo rx skb allocation in rx path

In case that we run into OOM during the allocation of the new rx-skb we
don't get one and we have one skb less than we used to have. If this
continues to happen then we end up with no rx-skbs at all.
This patch changes the following:
- if we fail to allocate the new skb, then we treat the currently
  completed skb as the new one and so drop the currently received data.
- instead of testing multiple times whether the device is gone, we rely on
  the status field, which is set to -ENOSYS in case the channel is going
  down and incomplete requests are purged.
  cpdma_chan_stop() removes most of the packets with -ENOSYS. The
  currently active packet which is removed has the "tear down" bit set.
  So if that bit is set, we send ENOSYS as well; otherwise we pass the
  status bits which are required to figure out which of the two possible
  just finished.

Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/davinci_cpdma: remove unused argument in cpdma_chan_submit()
Sebastian Siewior [Tue, 23 Apr 2013 07:31:38 +0000 (07:31 +0000)]
net/davinci_cpdma: remove unused argument in cpdma_chan_submit()

The gfp_mask argument is not used in cpdma_chan_submit() and always set
to GFP_KERNEL even in atomic sections. This patch drops it since it is
unused.

Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: don't rely only on netif_running() to check which device is active
Sebastian Siewior [Tue, 23 Apr 2013 07:31:37 +0000 (07:31 +0000)]
net/cpsw: don't rely only on netif_running() to check which device is active

netif_running() reports false before the ->ndo_stop() callback is
called. That means if one executes "ifconfig down" and the system
receives an interrupt before the interrupt source has been disabled we
hang forever for two reasons:
- we never disable the interrupt source because devices claim to be
  already inactive and don't feel responsible.
- since the ISR always reports IRQ_HANDLED the line is never deactivated
  because it looks like the ISR feels responsible.

This patch changes the logic in the ISR a little:
- If none of the status registers reports an active source (RX or TX,
  misc is ignored because it is not activated) we leave with IRQ_NONE.
- the interrupt is deactivated
- The first active network device is taken and napi is scheduled. If
  none are active (a small race window between ndo_stop() and the
  interrupt) then we leave and should not come back because the
  source is off.
  There is no need to schedule the second NAPI because both share the
  same dma queue.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cpsw: don't continue if we miss to allocate rx skbs
Sebastian Siewior [Tue, 23 Apr 2013 07:31:36 +0000 (07:31 +0000)]
net/cpsw: don't continue if we miss to allocate rx skbs

If during "ifconfig up" we run out of memory, we continue regardless of how
many skbs we got. In the worst case we have zero RX skbs and can't ever
receive further packets since the RX skbs are never reallocated. If
cpdma_chan_submit() fails we even leak the skb.
This patch changes the behavior here:
If we fail to allocate an skb during bring up we don't continue and
report that error. Same goes for errors from cpdma_chan_submit().
While here I changed to __netdev_alloc_skb_ip_align() so GFP_KERNEL can
be used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/davinci_cpdma: don't check for jiffies with interrupts
Sebastian Siewior [Tue, 23 Apr 2013 07:31:35 +0000 (07:31 +0000)]
net/davinci_cpdma: don't check for jiffies with interrupts

__cpdma_chan_process() holds the lock with interrupts off (and its
caller as well), same goes for cpdma_ctlr_start(). With interrupts off,
jiffies will not make any progress and if the wait condition never becomes
true we wait forever.
This patch adds a simple udelay and a countdown of attempts.
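
The pattern of polling with a bounded number of attempts, instead of comparing
against a clock that cannot advance, looks roughly like this. Everything here
is a user-space stand-in: struct fake_hw and both functions are hypothetical,
and the place where kernel code would call udelay() is only marked by a
comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a device that becomes ready after a number of polls. */
struct fake_hw { int polls_until_ready; };

static bool fake_hw_ready(struct fake_hw *hw)
{
    if (hw->polls_until_ready > 0) {
        hw->polls_until_ready--;
        return false;
    }
    return true;
}

/* Returns 0 once the device is ready, -1 when the attempts are used up. */
static int wait_ready_bounded(struct fake_hw *hw, int max_tries)
{
    assert(hw != NULL && max_tries >= 0);
    while (max_tries-- > 0) {
        if (fake_hw_ready(hw))
            return 0;
        /* kernel code would insert a short udelay() here */
    }
    return -1;
}
```

The loop terminates after at most max_tries iterations regardless of whether
any timer is ticking.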

Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ipv4: typo issue, remove erroneous semicolon
Chen Gang [Mon, 22 Apr 2013 20:45:42 +0000 (20:45 +0000)]
net: ipv4: typo issue, remove erroneous semicolon

Remove an erroneous semicolon, found by building with EXTRA_CFLAGS=-W.
The related commit is c54419321455631079c7d6e60bc732dd0c5914c5
("GRE: Refactor GRE tunneling code").

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnx2x, bnx2fc: Use per port max exchange resources
Bhanu Prakash Gollapudi [Mon, 22 Apr 2013 19:22:30 +0000 (19:22 +0000)]
bnx2x, bnx2fc: Use per port max exchange resources

The firmware supports a maximum of 4K FCoE exchanges. In 4-port devices,
or when working in multi-function mode, this resource needs to be distributed
between the various possible FCoE functions.

This information needs to be calculated by bnx2x and propagated into bnx2fc
via cnic. bnx2fc can then use this value to calculate corresponding xid
resources instead of using global constants.

Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fec: Enable imx6 enet checksum acceleration.
Jim Baxter [Fri, 19 Apr 2013 08:10:49 +0000 (08:10 +0000)]
net: fec: Enable imx6 enet checksum acceleration.

Enables hardware generation of IP header and
protocol specific checksums for transmitted
packets.

Enables hardware discarding of received packets with
invalid IP header or protocol specific checksums.

The feature is enabled by default but can be
enabled/disabled by ethtool.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: calxedaxgmac: fix condition in xgmac_set_features()
Dan Carpenter [Thu, 25 Apr 2013 07:44:20 +0000 (10:44 +0300)]
net: calxedaxgmac: fix condition in xgmac_set_features()

The "changed" variable should be a 64 bit type, otherwise it can't store
all the features.  The way the code is now the test for whether
NETIF_F_RXCSUM changed is always false and we return immediately.
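
The truncation effect can be sketched as follows. The feature bit used here is
hypothetical and chosen above bit 31 purely for illustration; it is not the
real position of NETIF_F_RXCSUM, and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature flag living in the upper half of a 64-bit mask. */
#define FEATURE_HIGH_BIT (UINT64_C(1) << 40)

/* Buggy shape: the XOR result is stored in a type that is too narrow. */
static bool feature_changed_narrow(uint64_t old_f, uint64_t new_f)
{
    uint32_t changed = (uint32_t)(old_f ^ new_f);

    assert(FEATURE_HIGH_BIT > UINT32_MAX); /* the flag cannot fit in 32 bits */
    return (changed & FEATURE_HIGH_BIT) != 0; /* always false */
}

/* Fixed shape: the variable is as wide as the feature mask. */
static bool feature_changed_wide(uint64_t old_f, uint64_t new_f)
{
    uint64_t changed = old_f ^ new_f;

    return (changed & FEATURE_HIGH_BIT) != 0;
}
```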

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoopenvswitch: Use parallel_ops genl.
Pravin B Shelar [Tue, 23 Apr 2013 07:48:48 +0000 (07:48 +0000)]
openvswitch: Use parallel_ops genl.

OVS locking was recently changed to have a private OVS lock, which
simplified overall locking.  Therefore there is no need to have
another global genl lock to protect OVS data structures.  The following
patch uses the parallel_ops genl family for OVS.  This also allows
more granular OVS locking using ovs_mutex for protecting OVS data
structures, which gives more concurrency.  E.g. multiple genl
operations OVS_PACKET_CMD_EXECUTE can run in parallel, etc.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agogenl: Allow concurrent genl callbacks.
Pravin B Shelar [Tue, 23 Apr 2013 07:48:30 +0000 (07:48 +0000)]
genl: Allow concurrent genl callbacks.

All genl callbacks are serialized by genl-mutex. This can become a
bottleneck in the multi-threaded case.
The following patch adds a parameter to genl_family so that a
particular family can get concurrent netlink callbacks without
genl_lock held.
A new rw-sem is used to protect genl callbacks from genl family unregister.
In the case of a parallel_ops genl family, the read lock is taken for callbacks
and the write lock is taken for registration or unregistration of any family.
In the case of a locked genl family, the semaphore and genl-mutex are locked for
any operation.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoirda: irlmp_reasons[] can be static
Wu Fengguang [Fri, 19 Apr 2013 17:10:45 +0000 (17:10 +0000)]
irda: irlmp_reasons[] can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: remove redundant code in dev_hard_start_xmit()
Eric Dumazet [Mon, 22 Apr 2013 14:31:34 +0000 (14:31 +0000)]
net: remove redundant code in dev_hard_start_xmit()

This reverts commit 068a2de57ddf4f4 (net: release dst entry while
cache-hot for GSO case too)

Before GSO packet segmentation, we already take care of skb->dst if it
can be released.

There is no point adding extra test for every segment in the gso loop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: account statistics only in tpacket_stats_u
Daniel Borkmann [Fri, 19 Apr 2013 06:12:29 +0000 (06:12 +0000)]
packet: account statistics only in tpacket_stats_u

Currently, packet_sock has a struct tpacket_stats stats member for
TPACKET_V1 and TPACKET_V2 statistic accounting, and with TPACKET_V3
``union tpacket_stats_u stats_u'' was introduced, where however only
statistics for TPACKET_V3 are held, and when copied to user space,
TPACKET_V3 does some hackery and also accesses tpacket_stats' stats,
although everything could have been done within the union itself.

Unify accounting within the tpacket_stats_u union so that we can
remove 8 bytes from packet_sock that are there unnecessarily. Note that
even if we switch to TPACKET_V3 and use the non-mmap(2)ed option,
this still works due to the union with the same types + offsets that are
exposed to user space.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: reorder a member in packet_ring_buffer
Daniel Borkmann [Fri, 19 Apr 2013 06:12:28 +0000 (06:12 +0000)]
packet: reorder a member in packet_ring_buffer

There's a 4 byte hole in packet_ring_buffer structure before
prb_bdqc, that can be filled with 'pending' member, thus we can
reduce the overall structure size from 224 bytes to 216 bytes.
This also has the side-effect, that in struct packet_sock 2*4 byte
holes after the embedded packet_ring_buffer members are removed,
and overall, packet_sock can be reduced by 1 cacheline:

Before: size: 1344, cachelines: 21, members: 24
After:  size: 1280, cachelines: 20, members: 24
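
The effect of filling a padding hole can be reproduced with a toy structure.
The two structs below are invented for illustration and have nothing to do
with the real packet_ring_buffer layout; exact sizes depend on the ABI, so the
sketch only relies on the reordered variant never being larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit member followed by a 64-bit one usually leaves a 4-byte hole. */
struct ring_with_hole {
    uint32_t head;
    uint64_t cookie;
    uint32_t pending;
};

/* Moving 'pending' next to 'head' lets the two 32-bit members share a slot. */
struct ring_packed {
    uint32_t head;
    uint32_t pending;
    uint64_t cookie;
};

static size_t bytes_saved_by_reordering(void)
{
    assert(sizeof(struct ring_packed) <= sizeof(struct ring_with_hole));
    return sizeof(struct ring_with_hole) - sizeof(struct ring_packed);
}
```

On a typical 64-bit ABI this saves 8 bytes, the same order of saving the
commit reports for the real structure.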

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'af_packet-timestamp'
David S. Miller [Thu, 25 Apr 2013 05:22:53 +0000 (01:22 -0400)]
Merge branch 'af_packet-timestamp'

Daniel Borkmann says:

====================
This is a joint effort with Willem to bring optional i) tx hw/sw
timestamping into PF_PACKET, that was reported by Paul Chavent,
and ii) to expose the type of timestamp to the user, which is in
the current situation not possible to distinguish with the RX_RING
and TX_RING API (but distinguishable through the normal timestamping
API), reported by Richard Cochran. This set is based on top of
``packet: account statistics only in tpacket_stats_u''. Related
discussion can be found in: http://patchwork.ozlabs.org/patch/238125/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: doc: update timestamping part
Daniel Borkmann [Tue, 23 Apr 2013 00:39:32 +0000 (00:39 +0000)]
packet: doc: update timestamping part

Bring the timestamping section in sync with the implementation.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: if hw/sw ts enabled in rx/tx ring, report which ts we got
Daniel Borkmann [Tue, 23 Apr 2013 00:39:31 +0000 (00:39 +0000)]
packet: if hw/sw ts enabled in rx/tx ring, report which ts we got

Currently, there is no way to find out which timestamp is reported in
tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of
SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE,
SOF_TIMESTAMPING_SOFTWARE, or a fallback variant late call from the
PF_PACKET code in software.

Therefore, report in the tp_status member of the ring buffer which
timestamp has been reported for RX and TX path. This should not break
anything for the following reasons: i) in RX ring path, the user needs
to test for tp_status & TP_STATUS_USER, and later for other flags as
well such as TP_STATUS_VLAN_VALID et al, so adding other flags will
do no harm; ii) in TX ring path, time stamps with PACKET_TIMESTAMP
socketoption are not available resp. had no effect except that the
application setting this is buggy. Next to TP_STATUS_AVAILABLE, the
user also should check for other flags such as TP_STATUS_WRONG_FORMAT
to reclaim frames to the application. Thus, in case TX ts are turned
off (default case), nothing happens to the application logic, and in
case we want to use this new feature, we now can also check which of
the ts source is reported in the status field as provided in the docs.
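
From the application side, checking the reported source could look like the
sketch below. The flag names and bit positions are placeholders defined
locally so the example is self-contained; the real constants should be taken
from linux/if_packet.h, and the enum and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values (assumption); use the if_packet.h definitions. */
#define SKETCH_TS_SOFTWARE     (UINT32_C(1) << 29)
#define SKETCH_TS_SYS_HARDWARE (UINT32_C(1) << 30)
#define SKETCH_TS_RAW_HARDWARE (UINT32_C(1) << 31)

enum ts_source {
    TS_SRC_UNREPORTED, /* no source bit set, e.g. the software fallback */
    TS_SRC_SOFTWARE,
    TS_SRC_SYS_HARDWARE,
    TS_SRC_RAW_HARDWARE
};

/* Reads the status word of a ring frame and decodes the timestamp source. */
static enum ts_source frame_ts_source(const uint32_t *tp_status)
{
    assert(tp_status != NULL);
    if (*tp_status & SKETCH_TS_RAW_HARDWARE)
        return TS_SRC_RAW_HARDWARE;
    if (*tp_status & SKETCH_TS_SYS_HARDWARE)
        return TS_SRC_SYS_HARDWARE;
    if (*tp_status & SKETCH_TS_SOFTWARE)
        return TS_SRC_SOFTWARE;
    return TS_SRC_UNREPORTED;
}
```

Because the source is encoded in otherwise unused status bits, existing tests
against the lower flag bits keep working unchanged.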

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: minor: convert status bits into shifting format
Daniel Borkmann [Tue, 23 Apr 2013 00:39:30 +0000 (00:39 +0000)]
packet: minor: convert status bits into shifting format

This makes it more readable and clearer what bits are still free to
use. The compiler reduces this to a constant for us anyway.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: enable hardware tx timestamping on tpacket ring
Daniel Borkmann [Tue, 23 Apr 2013 00:39:29 +0000 (00:39 +0000)]
packet: enable hardware tx timestamping on tpacket ring

Currently, we only have software timestamping for the TX ring buffer
path, but this limitation stems rather from the implementation. By
just reusing tpacket_get_timestamp(), we can also allow hardware
timestamping just as in the RX path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: tx timestamping on tpacket ring
Willem de Bruijn [Tue, 23 Apr 2013 00:39:28 +0000 (00:39 +0000)]
packet: tx timestamping on tpacket ring

When transmit timestamping is enabled at the socket level, record a
timestamp on packets written to a PACKET_TX_RING. Tx timestamps are
always looped to the application over the socket error queue. Software
timestamps are also written back into the packet frame header in the
packet ring.

Reported-by: Paul Chavent <paul.chavent@onera.fr>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Thu, 25 Apr 2013 04:55:27 +0000 (00:55 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe, igb and pci.

The ixgbe changes contain a fix for a possible divide by zero by bailing
out of the ixgbe_update_itr() function if the last interrupt timeslice is
zero.  In addition, support is added for the new OCP x520 adapter as well
as LX support for 82599 devices.  Jacob provides a patch to change
variable wol_supported to wol_enabled to better reflect what the code
is actually doing (i.e. checking if WoL is enabled).

Alex adds SRIOV helper function to pci that will determine if a PF
has any VFs that are currently assigned to a guest.

The remaining 8 patches are against igb and contain the following changes:
* implement SERDES loopback configuration for i210 devices by unsetting
  sigdetect bit, so as to fix Ethtool loopback test failure
* add support for the SMBI semaphore for I210/I211 devices
* implement the new generic pci_vfs_assigned helper function (Alex's PCI
  helper function)
* display warning when link speed is downgraded due to Smartspeed
* ensure that VLAN hardware filtering remains enabled when the device is
  in promiscuous mode and VT mode simultaneously
* cleanup dead code in igb
* bump the driver version

v2: updated the PCI patch to add SRIOV helper function to remove extern
    from the declaration of pci_vfs_assigned in pci.h and return 0 if
    SR-IOV is disabled which is inline with other PCI SR-IOV functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Thu, 25 Apr 2013 04:53:40 +0000 (00:53 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
The following patchset contains fixes for recently applied
Netfilter/IPVS updates to the net-next tree, most relevantly
they are:

* Fix sparse warnings introduced in the RCU conversion, from
  Julian Anastasov.

* Fix wrong endianness in the size field of IPVS sync messages,
  from Simon Horman.

* Fix missing if checking in nf_xfrm_me_harder, from Dan Carpenter.

* Fix off by one access in the IPVS SCTP tracking code, again from
  Dan Carpenter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoigb: Bump version of driver
Carolyn Wyborny [Wed, 17 Apr 2013 16:44:53 +0000 (16:44 +0000)]
igb: Bump version of driver

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>